16:00:07 #startmeeting tor anti-censorship meeting
16:00:07 Meeting started Thu May 20 16:00:07 2021 UTC. The chair is cohosh. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:07 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:18 here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:01:00 i'll wait a bit to let people add things to the agenda
16:01:30 or feel free to just bring stuff up here since it is currently empty :)
16:03:32 hey arlolra
16:04:06 the only thing I have for the meeting is to ask dcf1 if he is ok with anti-censorship/pluggable-transports/snowflake!38 being merged
16:04:19 meskio: it's good to merge
16:04:27 oh right, the attribution
16:04:33 exactly
16:04:51 it's fine with me
16:04:56 no need to alter the name on the commit imo
16:05:01 :)
16:05:22 great, then I'll merge it as it is
16:06:14 we do have a possible ongoing issue with snowflake: now that we've figured out which proxies work for clients with symmetric NATs, we don't have a lot of them
16:06:56 so the situation is a lot better than before, but sometimes these users can't get proxies fast enough before tor gives up
16:07:01 don't have a lot of suitable proxies, you mean?
16:07:07 yes
16:07:39 we're thinking of doing a push to get people to run the Go proxies on VPSs
16:07:43 snowflake#40045 probably is that issue, do you think?
16:07:55 https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40045
16:08:25 yes i think that's possible
16:08:33 'Esp. on iOS, I always seem to get only the "unrestricted" types.'
16:08:49 which is what is expected for a client on a restricted NAT
16:09:18 what kind of push? blog/twitter/etc.?
16:09:20 yeah, and mobile networks seem to be very restrictive
16:09:48 yeah, all of those; i haven't worked out the details with the community or comms people yet
16:10:07 we've been working on improving documentation for setting them up
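[Editor's note: the NAT-aware matching constraint discussed above boils down to a simple compatibility rule: a proxy behind an unrestricted NAT can serve any client, while a proxy behind a restricted or unknown NAT can only serve clients that are themselves unrestricted. A minimal Go sketch of that rule, with illustrative names rather than the actual snowflake identifiers:]

```go
// Sketch of the NAT compatibility rule discussed above.
// Type and function names are illustrative, not the actual
// snowflake identifiers.
package main

import "fmt"

type NATType int

const (
	NATUnknown NATType = iota
	NATRestricted
	NATUnrestricted
)

// canServe: a proxy behind an unrestricted NAT can serve any client;
// a proxy behind a restricted or unknown NAT can only serve clients
// that are themselves unrestricted.
func canServe(proxy, client NATType) bool {
	if proxy == NATUnrestricted {
		return true
	}
	return client == NATUnrestricted
}

func main() {
	// A mobile client behind a restrictive carrier NAT needs an
	// unrestricted proxy, which is the scarce kind.
	fmt.Println(canServe(NATRestricted, NATRestricted))   // false: broker answers "no proxies available"
	fmt.Println(canServe(NATUnrestricted, NATRestricted)) // true
}
```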
16:10:21 about go proxies on VPSs, I'm looking into dockerizing the snowflake proxy, it might help on the way there
16:10:32 meskio: ah, we actually have a docker image for that
16:10:45 ahh, cool, where is that?
16:10:55 https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/wikis/home#docker-setup
16:10:56 we should probably push it to hub.docker.com
16:11:09 https://hub.docker.com/repository/docker/thetorproject/snowflake-proxy
16:11:16 * meskio remembers that he should look into the wiki more often when he has questions
16:11:20 https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/19409#note_2736272 here's discussion from today re docker
16:11:35 ohh, I did search for it, but I guess the docker hub search tool is not so great
16:11:44 ah well, the wiki is a bit tough; i see now that you commented on the debian package ticket, i should've responded faster
16:12:32 it's ok, I didn't do anything more than think about it
16:12:38 I'm happy it's done :)
16:13:16 but yeah, my main worry is that, especially with orbot rolling out snowflake, we will have even more clients with restrictive NATs
16:13:43 proportionally to desktop clients
16:15:09 sounds like (a) it's super important to get the matching right then, if random assignment is likely to lead to repeated sadness,
16:15:23 well, the matching is actually really good
16:15:32 but also (b) if we have static cloud snowflakes, the enumeration-and-blocking issue gets more important too
16:15:53 we already do NAT-aware matching, the issue is there's not enough of the right kind of proxy for the clients that need that type
16:16:05 the problem restricted clients are having is that the broker says "no proxies available" because there aren't any that will work
16:16:24 headless go flakes can serve more than one client, right?
16:16:32 yup
16:16:39 the default is 10
16:17:03 but if we're getting a lot of VPSs running them, perhaps a higher default is better
16:17:51 so the issue is that they're full and not currently willing to do more? or is it a timing thing where they don't offer at the right rate, so there aren't enough offering at the moment we need them?
16:18:07 the standalone poll rate is pretty high
16:18:49 increasing each standalone proxy's capacity and getting more of them out there will both help with this problem
16:19:28 i guess we can set the default higher to encourage it
16:20:04 maybe we want some sort of low water mark / high water mark caps? because, if we match a client to a goflake but then the client never follows up, the goflake thinks it's "in use" but it isn't
16:20:32 there's a timeout that's pretty good at detecting that
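[Editor's note: a hypothetical sketch of the standalone proxy behavior described here: poll the broker on a fixed interval, but go quiet once the configured client capacity is reached. Only the 5-second interval and the default capacity of 10 come from the discussion; every name and the overall structure are illustrative, not the actual proxy code:]

```go
package main

import (
	"log"
	"sync/atomic"
	"time"
)

// Sketch of a standalone proxy's poll loop. Names are hypothetical;
// only the interval and default capacity are taken from the meeting.
const (
	pollInterval    = 5 * time.Second // "standalone proxies already poll every 5 seconds"
	defaultCapacity = 10              // "the default is 10"
)

// currentClients is incremented when a client session starts and
// decremented when it closes or times out.
var currentClients int64

func pollLoop(pollBroker func() bool) {
	for range time.Tick(pollInterval) {
		if atomic.LoadInt64(&currentClients) >= defaultCapacity {
			// Full: the proxy skips this poll, so from the broker's
			// point of view it no longer exists until a slot frees up.
			continue
		}
		if pollBroker() {
			atomic.AddInt64(&currentClients, 1)
		}
	}
}

func main() {
	// Stub broker poll; a real proxy would exchange a WebRTC offer here.
	pollLoop(func() bool {
		log.Println("polling broker for a client offer")
		return false // no client matched this round
	})
}
```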
16:22:34 i think no matter what optimizations we do, we're going to have to do some community work to get more capacity
16:23:15 yep. i think we'll want to pair that community growth with "assign load properly given however many nodes we have",
16:23:44 that is, if we have a pool of headless proxies and a client who wants one, the client should get paired with one always, rather than fail
16:24:46 I'm confused. That's how it works now, according to my understanding?
16:24:49 okay, so you're suggesting the broker do load balancing instead of the proxies deciding how often to poll or how many clients to support?
16:25:27 i'm confused too. if that's how it works now, and there is at least 1 go snowflake, how is it that clients are failing to get a match?
16:25:51 so right now the existence of the pool is decided by whether or not the proxies poll
16:26:05 the 1 go snowflake is at its capacity, presumably
16:26:11 i think there's some misunderstanding that comes from conflating this with how bridgedb works
16:26:27 we only have proxies that are currently polling
16:26:33 and they decide whether they want to poll
16:26:53 and our issue is that we have some proxies, yes, but they fill up all their slots with clients and stop polling
16:27:11 so at that point we don't "have" them anymore
16:27:22 ok, great. what are the downsides of filling a single proxy with an arbitrary number of clients? bandwidth use? anything else?
16:27:53 right, so like cohosh suggested, two remedies are to increase the number of proxies or increase the capacity per proxy
16:28:27 I don't think there's much of a downside, and the default limit should probably be much higher
16:28:34 yeah
16:28:43 especially if they are on a VPS
16:28:47 ok. i think what i'm suggesting is: no matter how big the volunteer pool is, we should be using a matching algorithm that handles whatever the pool is.
16:29:15 that is, yes, we should get more volunteers, but if there are edge cases where we don't have the perfect number of volunteers -- and that's always going to keep happening as we grow -- then we want smarter matching too.
16:29:19 hm, there is a point to be made on what the broker *should* handle though
16:29:28 I think you're still confused about how it works. This isn't about matching, which is something the broker does.
16:29:50 i think arma2 is suggesting the broker keep track of how many clients each proxy currently has
16:29:54 by matching i include "the proxy needs to offer often enough for the match to work"
16:30:27 Okay, I still think it's about capacity, and not about polling frequency, though.
16:30:39 we have snowflake#25598
16:30:43 standalone proxies already poll every 5 seconds
16:31:01 so the only time they become unavailable is when they hit their capacity
16:31:13 i guess the broker could tell proxies to increase their capacity
16:31:24 yeah. so we have an error case here where there are proxies but they don't poll
16:31:53 changing 10 to 20 will mean we hit the error case less, and getting more proxies will also mean that, but we're still going to hit it
16:32:41 i am curious about what would happen if we remove the capacity entirely
16:32:43 so for example, one change would be: the proxy continues to poll but also says how full it is, and then the broker can make smart choices
16:33:12 okay
16:33:49 or the proxy can poll at an interval dependent on capacity
16:33:52 that sounds sensible, so the broker basically always chooses the one with the lowest fullness
16:33:59 ah, so instead of doing a FIFO heap, do an ordered heap
16:34:30 arlolra: yep. except, that could put us back in the error case where proxies don't poll often enough for the number of clients that are hoping to find a match
16:34:48 i think the "broker says when to come back" feature will be really useful here
16:35:03 because the goal is to have "enough" proxies offering, for whatever rate of clients have arrived
16:35:13 i see
16:35:26 It's not a FIFO queue currently though, AFAIK the broker *does* prioritize less-loaded proxies
16:35:29 https://gitweb.torproject.org/pluggable-transports/snowflake.git/tree/broker/snowflake-heap.go?id=0054cb2dec19e89e07b8c5a6d8b9d23589842deb#n26
16:35:46 "Snowflakes serving less clients should sort earlier."
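[Editor's note: the quoted comment corresponds to a min-heap ordered by client count, so the broker pops the least-loaded proxy first. A self-contained container/heap sketch of that idea; field and type names are illustrative, not the actual ones in snowflake-heap.go:]

```go
package main

import (
	"container/heap"
	"fmt"
)

// Sketch of a broker-side heap ordering proxies by load, per the
// quoted comment "Snowflakes serving less clients should sort
// earlier". Field names are illustrative.
type Snowflake struct {
	id      string
	clients int // load as tracked (or reported) for this proxy
	index   int // heap position, maintained by Swap/Push/Pop
}

type SnowflakeHeap []*Snowflake

func (h SnowflakeHeap) Len() int { return len(h) }

// Less-loaded proxies sort earlier, so Pop returns the emptiest one.
func (h SnowflakeHeap) Less(i, j int) bool { return h[i].clients < h[j].clients }

func (h SnowflakeHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
	h[i].index, h[j].index = i, j
}

func (h *SnowflakeHeap) Push(x interface{}) {
	s := x.(*Snowflake)
	s.index = len(*h)
	*h = append(*h, s)
}

func (h *SnowflakeHeap) Pop() interface{} {
	old := *h
	s := old[len(old)-1]
	*h = old[:len(old)-1]
	return s
}

func main() {
	h := &SnowflakeHeap{{id: "a", clients: 3}, {id: "b", clients: 0}}
	heap.Init(h)
	fmt.Println(heap.Pop(h).(*Snowflake).id) // "b": the least-loaded proxy
}
```

[This matches the later observation that it is "not a FIFO queue": less-loaded proxies already sort earlier. The open question in the discussion is whether the client counts feeding Less() are reliable.]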
16:36:09 I don't know how reliable the client count tracking is, though.
16:36:13 so the proxies already report their fullness
16:36:20 ?
16:36:22 yeah. i didn't mean for this to be a deep idea. i simply meant: if snowflakes stop polling when they're full, then no matter how many snowflakes we have and no matter what fixed limit they each use, we're going to have this conversation again at some point.
16:36:35 Well, or not.
16:37:12 dcf1: hm, looks like that is not used
16:37:15 It could be that we set the limit to 500 and natural proxy growth or whatever makes it not be an issue
16:37:21 https://gitweb.torproject.org/pluggable-transports/snowflake.git/tree/broker/broker.go#n169
16:38:01 okay, maybe the client counting thing was aspirational
16:38:22 dcf1: yep. that's another approach that could work: raise the limit way bigger than we think we need right now, and then watch the load and raise it again when we start approaching it.
16:38:26 meskio: I think the idea was for the broker to itself track how many clients it has given to each proxy (keyed by the proxy's self-chosen random ID)
16:38:26 okay, so the next steps here seem to be: finish implementing this feature, and raise the default capacity to something very high
16:38:48 dcf1: but the broker doesn't know if the connection with the client is still there
16:39:18 meskio: you're right, perhaps there was meant to be some kind of termination message
16:39:23 it does seem like having the proxy declare its load will give a more accurate number. i hesitate a bit because privacy, but.
16:39:52 yeah, it would require less message passing as well
16:39:58 I'm thinking about a proxy that lies about always having 0 load in order to get all the clients
16:40:20 not that we're currently robust against adversarial proxies in general
16:40:23 dcf1: with the current self-chosen random ID thing that is an issue too, right?
16:40:33 "i am always new and fresh"
16:40:51 correct; at least the ID should be paired with the IP address for better resistance to manipulation
16:41:09 you could use the load as a weight on a random picker
16:41:25 yep. it is a good point that we want to (a) always make sure clients match to somebody and (b) prevent an enthusiastic proxy from getting too much attention. these goals conflict.
16:41:32 arma2: it already generates a new id for each client slot https://gitweb.torproject.org/pluggable-transports/snowflake.git/tree/proxy/snowflake.go#n601
16:42:39 so i guess it's already seemingly always new and fresh
16:43:50 so can we just use IP addresses here?
16:43:59 ok. so *if* we want the broker to make choices based on load, we need to either fix the design so the broker can recognize proxies it's used before, or change things so the proxy declares its load.
16:45:03 ...or both. we could imagine a redundancy approach where we track what we think the load might be, and also the proxy declares its load, and we use the declared number because it's more accurate, unless it's wildly different than what we think it should be
16:45:10 given that the broker can't see when a client session ends, i think these are two separate issues
16:45:47 unless the broker makes an assumption about snowflake session length
16:46:04 maybe that's the way to go
16:46:15 it won't be a perfect matching of current load
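[Editor's note: a hypothetical sketch of the session-length assumption floated above: since the broker cannot see when a client session ends, it could treat each match as live for a fixed window and estimate a proxy's load from its recent assignments. The 30-minute window and all names here are illustrative assumptions, not anything from the snowflake code:]

```go
package main

import (
	"fmt"
	"time"
)

// assumedSessionLength is an assumption, not a measured value: the
// broker pretends every match stays active for this long.
const assumedSessionLength = 30 * time.Minute

// estimatedLoad counts assignments to one proxy (keyed, say, by IP
// address, as suggested above) that are recent enough to still be
// "live" under the assumed session length.
func estimatedLoad(assignments []time.Time, now time.Time) int {
	load := 0
	for _, t := range assignments {
		if now.Sub(t) < assumedSessionLength {
			load++
		}
	}
	return load
}

func main() {
	now := time.Now()
	assignments := []time.Time{now.Add(-5 * time.Minute), now.Add(-2 * time.Hour)}
	fmt.Println(estimatedLoad(assignments, now)) // 1: only the recent match still counts
}
```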
16:46:35 i wonder how many snowflake matches result in no load, i.e. the client learns about the snowflake but doesn't really use it
16:46:35 but it's probably pretty good if the broker assigns proxies according to how frequently it has given out that IP address
16:47:26 yeah, apparently quite a few: snowflake#30498
16:48:06 it seems weird that we'd set ourselves up to try to model how many clients you 'should' have, when we could just be told
16:48:32 but yeah, we're going to need to do the modeling part at some point, for recognizing people who lie
16:49:02 anyway, sounds good, we're all back on the same page
16:50:10 (if we try to build the model, and also we get told, we have a way to validate and improve our model)
16:50:15 yeah, thanks for the discussion
16:50:44 i'm tempted to just do the proxy self reporting for now, since we're already not that secure against malicious proxies anyway
16:51:13 and see what happens
16:51:21 unless we think it is too big a privacy issue, having the self reporting can only help, for visibility into what's actually happening. whether we use it now or later can be a separate choice.
16:51:22 since that's something we'll have to address later no matter what
16:51:43 arma2: which privacy issue are you worried about?
16:51:57 the broker already has an exact client count
16:52:04 maybe we want to keep secret from the broker which clients ended up on which snowflake
16:52:39 doesn't the broker assign clients to snowflakes?
16:52:54 that goal would seem tough to accomplish though, because yeah, the broker could just say "here is your only snowflake"
16:53:32 yeah, i think the broker already has easier access to that information
16:53:44 not that it's stored
16:53:55 in that case, let's say everything explicitly, rather than trying to guess what the implicit numbers are
16:54:28 especially since it will give us "the other half" of #30498
16:55:15 true, i'm curious about how the count of client requests compares to proxy load
16:55:29 okay, we're running up against the end of the meeting
16:55:39 that was a really good discussion
16:55:49 does anyone have other topics to bring up or things they need help with?
16:57:31 also, damn, arlolra, that was quick with snowflake#26092 :P
16:57:57 there wasn't much doing?
16:58:09 there's a chain of patches
16:58:22 i haven't had time to look at it yet
16:58:25 the earlier ones just refactor in-process so the logic is out of the http handlers
16:59:01 then there's a WIP to make a separate http frontend, but that can be left to discuss at a later date
16:59:30 I tried to respect the "don't design this myself" warning
16:59:51 nice, i'm excited to take a look
17:00:43 no rush, maybe we can discuss people's ideas for how this should be architected next week
17:01:32 okay, sounds good
17:01:51 i'll wait another minute or so and then end the meeting for today
17:02:46 hi folks
17:02:52 one minor thing, wearing my comms hat
17:03:00 there is a turbotunnel security audit by cure53, do you wish to publish a blog post about it?
17:03:17 i had a to-do in the comms pad to ping dcf1 about it
17:03:45 lemme know offline!
17:03:49 ok
17:03:50 on the "woo security audit, we want to normalize everybody getting them" theory, yes please
17:04:30 :)
17:04:35 okay, i'll end it here
17:04:39 #endmeeting