15:59:48 <meskio> #startmeeting tor anti-censorship meeting
15:59:48 <MeetBot> Meeting started Thu Mar 3 15:59:48 2022 UTC. The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:59:48 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:59:52 <irl> hi
15:59:53 <cohosh> hi!
15:59:54 <meskio> hello :)
16:00:07 <meskio> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:00:16 <meskio> feel free to add what you've been working on and put items on the agenda
16:02:01 <meskio> I kept the obfs4proxy point in the agenda just to say that we are coordinating with the default bridges so they get updated
16:02:08 <meskio> and many have updated already
16:02:23 <meskio> and there is a new docker image that ships the new version
16:02:32 <meskio> I guess there is nothing else new since last week, is there?
16:02:56 <irl> should i be deploying the new obfs4proxy package onto the s96/125 bridges?
16:03:04 <irl> this hasn't been clear for me
16:03:14 <meskio> irl: yes, I think you should
16:03:22 <irl> ok i'll make that happen
16:03:37 <irl> should acute be putting it into backports to get it globally available in debian?
16:03:47 <dcf1> great work with your automated bridge deployment irl
16:03:56 <irl> thanks (:
16:04:17 <meskio> I've tested the connection with an old obfs4proxy to the new one and the problem is similar to the other way around, I expect most clients to be updated because of auto-update in TB, but the old ones will just take longer to connect
16:04:42 <meskio> irl: yes, it will be great to get the new version into backports, AFAIK it is already in testing
16:05:00 <irl> yes, we paused on pushing to backports because of confusion over whether it was good
16:05:30 <meskio> I think it is not worse than keeping it as it is
16:05:37 <irl> cool
16:05:44 <irl> thanks for clarifying that
16:06:09 <acute> meskio: good timing, the package just cleared testing
16:06:10 <meskio> no prob, thanks for the help there
16:06:50 <dcf1> meskio, acute, thanks for your expertise in debian packaging
16:08:16 <meskio> anything more on this topic?
16:08:51 <meskio> the next point on the agenda is 'snowflake load and bottlenecks'
16:09:37 <meskio> does anybody want to introduce it?
16:09:46 <cohosh> sure, there have been some questions asked lately on whether we can use more proxies
16:10:08 <cohosh> more proxies = more ips so it's always a good thing as long as the broker can handle it and it looks like it can
16:10:34 <arma2> in particular, i had been wondering about getting more "totally not firewalled" proxies, e.g. headless snowflake proxies run at universities, for the users whose nat situation needs those
16:10:36 <cohosh> but if i'm not mistaken, it seems our current pressing bottleneck is still the bridge and we can't use our full proxy capacity anymore
16:11:29 <dcf1> Besides the broker, there is probetest, which is constantly chugging, but we don't know if that's just load or a bug in probetest, if I'm not mistaken
16:11:56 <cohosh> ah yeah true
16:12:31 <cohosh> probetest is doing a decent enough job, enough to get us an adequate number of proxies in each pool
16:12:38 <cohosh> but it's definitely at capacity
16:12:49 <shelikhoo> And we can partially solve this by monitoring and restarting it automatically?
16:13:09 <meskio> could we set up multiple probetests? would it be hard to implement/worth it?
16:13:30 <dcf1> shelikhoo: I think so, probetest doesn't keep any meaningful long-term state, right?
16:13:32 <cohosh> meskio: maybe we can use the same load balancing trick as the bridge
16:13:53 <cohosh> i think it's at capacity even without the potential bug that needs restarting
16:14:13 <cohosh> partly because it's still working right now but we have a fair number of unknown proxies which should theoretically be quite low
16:14:16 <dcf1> It's actually strange that probetest is at 100% CPU, now that I think about it: it indicates all the load is in one goroutine
16:14:32 <dcf1> i.e., could be some catastrophic garbage collection or inefficient data structure on the main thread
16:14:47 <arma2> probetest is a tool we run alongside the broker, to connect to snowflake and decide what nat category they're in, so we can match them to the right clients? (as opposed to other tools we have called probetest)
16:14:58 <arma2> s/connect to snowflake/connect to snowflakes/
16:15:09 <cohosh> arma2: yes, it was in retrospect a bad name that i gave it
16:15:15 <cohosh> "nattest" would be better lol
16:15:16 <shelikhoo> dcf1: I don't think there is any state, but I am not very sure
16:15:24 <dcf1> arma2: correct, it's https://gitlab.torproject.org/cohosh/probetest
16:15:35 <dcf1> next tool will be called "testprobe" to reduce confusion
16:15:40 <cohosh> lol
16:15:47 <meskio> XD
16:15:58 <shelikhoo> This confused me a lot
16:16:12 <arma2> oh wait so this *is* the same as the other probetest? :)
16:16:13 <meskio> maybe we need to profile probetest more, we had lightly done that and didn't find much
16:16:17 <cohosh> cutting edge security by obscurity technique
16:16:56 <shelikhoo> If we care about the CPU issue we can bundle pprof into it
16:17:09 <shelikhoo> but I am not very sure if we should do this
16:17:27 <dcf1> arma2: ack, no! sorry
16:17:38 <dcf1> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/tree/main/probetest
16:17:46 <arma2> dcf1: there is a thing in that probetest named stun-test, for extra fun
16:17:56 <dcf1> ↑ this is the one we are talking about now, the one that runs on the broker
16:18:03 <meskio> crazy idea to put on the table, does it make sense to modify standalone proxies to be their own bridge and connect directly to Tor?
16:18:13 <dcf1> sorry, I was confused too
16:18:20 <arma2> whew
16:18:22 <meskio> they might not be enough to make much of a change in the main bridge, and too much work to implement...
16:19:23 <cohosh> meskio: the main problem there is we still have the issues outlined in https://lists.torproject.org/pipermail/tor-project/2022-March/003303.html
16:19:24 <arma2> meskio: the challenge there would be the same one dcf1 has been facing with setting up bridge clones -- the snowflake client needs a way to know which bridge fingerprint it will be connecting to
16:19:43 <arma2> otherwise it looks like a mitm attempt, to tor
16:19:56 <dcf1> shelikhoo: I don't think there's any problem running a profiled probetest. It only takes a few minutes to reach CPU saturation after being restarted, it's not a rare edge case or anything.
16:20:25 <shelikhoo> dcf1: Yes...
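
probetest is a Go program, so a low-cost way to see where that single busy goroutine spends its time is Go's built-in net/http/pprof handler. A minimal sketch, assuming only that probetest's main can grow a localhost-only debug listener (the port 6060 is arbitrary):

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/ handlers on http.DefaultServeMux
    )

    func main() {
        // Serve the profiling endpoints on localhost only, so they are not
        // reachable from outside the broker host.
        go func() {
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()
        // ... probetest's existing main would continue here ...
    }

A 30-second CPU profile can then be pulled with "go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30", which should show whether the time goes to garbage collection or to one hot data structure, as dcf1 speculates above.
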
16:20:50 <meskio> cohosh: what I mean is if standalone proxies were not sending traffic to the snowflake bridge but directly to tor we might reduce the load on the bridge, but maybe not enough
16:22:25 <shelikhoo> Another thing we can do is run more than one pool
16:22:30 <dcf1> what meskio is saying makes some sense, if I understand it right. The standalone proxies, rather than re-wrap their streams as WebSocket, could make a TCP connection directly to the ExtORPort of the tor bridge(s) that snowflake-server forwards to
16:22:58 <dcf1> i.e., an optimization to cut out one part of the connection pipeline, in the special case where proxies can make direct TCP connections
16:22:59 <cohosh> we'd lose turbotunnel though right?
16:23:12 <dcf1> oh hm right
16:23:16 <meskio> ouch, true
16:23:29 <meskio> we need the central snowflake-server, I forgot about that
16:23:30 <shelikhoo> This will increase the speed for users in different regions
16:23:51 <shelikhoo> and buy us more time to find a better solution
16:24:16 <shelikhoo> Let it work in the same way as meek
16:24:56 <dcf1> shelikhoo: it really doesn't work without the turbo tunnel component, it's not only a performance optimization
16:25:21 <dcf1> you have to restart the browser every few minutes, it's not nice
16:25:25 <cohosh> shelikhoo: hm that's true, we could probably add logic to the webextension too that groups broker-bridge pairs and then randomly chooses which pool to be in
16:25:39 <cohosh> just run two of both
16:26:17 <shelikhoo> dcf1: What I mean is that "Another thing we can do is run more than one pool"
16:26:26 <dcf1> shelikhoo: aha, I see now, thanks shelikhoo
16:26:49 <arma2> with two brokers and two pools, one could get rebooted and the world wouldn't end quite so much
16:27:14 <dcf1> cohosh posted about communicating a random bridge selection to the broker in https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/28651#note_2782955
16:28:00 <shelikhoo> Or we have just have separate brokers and snowflake-server
16:28:07 <dcf1> what I understand by "run more than one pool" is that the bridge selection could be *implicit* in the broker selection: each broker knows of one and only one bridge, and clients know what bridge to expect from the broker they are using
16:28:11 <shelikhoo> Or we can have just have separate brokers and snowflake-server
16:28:24 <dcf1> use broker A, expect bridge A; use broker B, expect bridge B
16:29:10 <meskio> yep, so in a single session a client will always reconnect using the same broker
16:29:25 <arma2> dcf1: i liked your "The broker could maintain a mapping from fingerprint to IP:port, which would simultaneously serve as an allowlist." sentence on that ticket
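
As a sketch of the kind of mapping that sentence describes (the names and data here are illustrative, not actual broker code), the same structure that routes a fingerprint to a bridge address also acts as the allowlist, because an unknown fingerprint simply has no entry:

    package main

    import "fmt"

    // bridges maps a bridge fingerprint (as requested by the client) to that
    // bridge's IP:port. Placeholder data, not the real deployment.
    var bridges = map[string]string{
        "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA": "203.0.113.5:443",
        "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB": "198.51.100.7:443",
    }

    func main() {
        // A fingerprint that is not in the map gets no answer, which is what
        // makes the mapping double as an allowlist.
        if addr, ok := bridges["AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"]; ok {
            fmt.Println("forward to", addr)
        }
    }
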
16:29:26 <cohosh> (re multiple pools): it's easy to do that separation for clients by including broker domains in the bridge line instead of the ClientTransportPlugin line
16:30:02 <arma2> dcf1: and i think having the client specify a fingerprint, and the broker provide the allowlist, is orthogonal to (i.e. a separate design decision from) whether each broker has its own set of bridges or what
16:30:15 <shelikhoo> Yes, maybe we could move the broker domain to the bridge line
16:30:27 <cohosh> this multiple pools thing is actually an orthogonal approach to the multiple bridges idea in snowflake#28651
16:30:30 <dcf1> the broker domain is already in the bridge line, it's url=
16:30:40 <arma2> cohosh: yes! i think / hope so too
16:30:42 <cohosh> because farther down the line, each pool could have its own set of bridges
16:30:57 <cohosh> dcf1: yes but not in Tor Browser yet
16:31:33 <arma2> cohosh: yep. though in that case, we should remember that the client will need to know *which* broker to ask, to reach a given bridge. that is, that info has to go into the client-side config (e.g. the bridge line)
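
For concreteness, a client-side snowflake bridge line carrying the broker URL looks roughly like the following (the address, fingerprint, and hostnames are placeholders, not the deployed values); with several such lines, each one could name a different broker and a different destination bridge fingerprint:

    Bridge snowflake 192.0.2.3:80 0123456789ABCDEF0123456789ABCDEF01234567 url=https://snowflake-broker.example/ front=cdn.example.com ice=stun:stun.example.net:3478
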
16:32:19 <meskio> another bottleneck for parallelization is the snowflake-server having in-memory state, it might be possible to move it to a DB and have several snowflake-servers running in different machines consuming that DB, but I guess this is a problem to solve farther down the line
16:32:56 <dcf1> I was thinking, we could even run the 2 brokers on the same host, as long as load allows... but that reduces to the client telling a (single) broker which bridge/pool it wants to use
16:33:35 <shelikhoo> We should allow the client/proxy to choose which pool it wants to use for speed reasons
16:33:46 <dcf1> but there's a lot of merit to the idea of partitioning snowflake-server instances and bridges -- if the client's random pool/whatever selection consistently selects the same snowflake-server instance, that's a way to scale snowflake-server over multiple hosts
16:34:43 <shelikhoo> Let's say we can ask the client to submit a route key?
16:35:01 <arma2> dcf1: right. i think the endpoint here needs to be "we have multiple snowflake servers/bridges" and the only question is how best to get there
16:35:48 <shelikhoo> X-Wants: pool=EU; routeKey=12345
16:36:18 <shelikhoo> or X-Wants-Pool: EU
16:36:39 <shelikhoo> X-RouteKey: 12345
16:36:45 <shelikhoo> And let nginx do the routing?
16:37:54 <meskio> what is the routekey? a number so you are always assigned to the same snowflake-server and don't need to share state between multiple snowflake-servers?
16:38:43 <shelikhoo> Yes, a random number to route the client to the correct snowflake-server
16:38:47 <dcf1> shelikhoo: I think I understand. But the client still needs to specify exactly one bridge fingerprint in the pool, in addition to an identifier for the pool
16:39:16 <meskio> could the fingerprint be given by the broker?
16:39:18 <arma2> yep. and if the routekey is the same as the fingerprint, are we all set?
16:39:19 <shelikhoo> Yes, the one bridge fingerprint limitation comes from Tor.....
16:39:46 <shelikhoo> Oh, we can use the fingerprint as the route key!
16:39:52 <shelikhoo> great!
16:40:00 <dcf1> meskio: no, unfortunately I don't think so.
16:40:06 <dcf1> Bridge line format is
16:40:28 <shelikhoo> but we can have something like bridgedb...
16:40:31 <meskio> ahh, sure, we can't give the fp from the pt
16:40:36 <dcf1> Bridge snowflake 192.0.2.3:1 FINGERPRINT args=vals
16:40:43 <arma2> i am imagining we have several snowflake bridge lines in tor browser's torrc, and each snowflake bridge line points to a different destination tor bridge
16:40:44 <shelikhoo> to give use a list of bridges to try
16:40:58 <shelikhoo> to give the user a list of bridges to try
16:41:10 <arma2> and then whether to have one broker per bridge, or one pool per something, etc are all design choices to be made :)
16:41:28 <dcf1> and the PT protocol doesn't provide a way for the PT to tell tor what the FINGERPRINT should be, that information only flows in the other direction
16:42:00 <meskio> makes sense
16:42:22 <dcf1> arma2: the problem I see with multiple bridges is a "thundering herd" where tor tries to connect to all simultaneously, and keeps one
16:42:43 <dcf1> arlolra suggested dynamically writing the torrc file: choose a bridge line at random, and write a torrc containing only that one line
16:43:05 <dcf1> *"the problem I see with multiple bridge lines in tor browser's torrc"
16:43:11 <arma2> dcf1: right. i think i am fine with this thundering herd thing. since it's actually kind of lightweight to do a connection and then not use it much after that
16:43:21 <cohosh> yeah in snowflake's case that "opportunistically try all configured bridges" becomes worse because trying a bridge involves the full broker poll-connect to snowflake dance
16:43:24 <arma2> dcf1: and i think the failover is a huge win, i.e. there is one next in the list
16:43:39 <dcf1> it's not super lightweight, it's N broker transactions, N STUN exchanges
16:43:45 <cohosh> so thundering herd places a lot of strain on the system
16:44:16 <dcf1> potentially N-1 proxies held idle, if tor or snowflake-client doesn't disconnect the unused ones
16:44:22 <arma2> right. how about a tor patch that shuffles your bridges and then tries them in order and doesn't try later ones if it has one it likes?
16:44:31 <shelikhoo> 1. We can first create separate pools for different regions 2. make an obfs4-like system where users host their own snowflake server
16:44:47 <arma2> s/shuffles your bridges/after it shuffles your bridges/
16:44:48 <dcf1> arma2: something like that would be great
16:45:24 <arma2> dcf1: the people in russia with 500 bridges in their torrc want that feature too
16:47:19 <arma2> though i do still think that if there are e.g. five snowflake bridge lines, 5x the bootstrap-but-don't-actually-use-it connections isn't so bad. but apparently i am the minority opinion there :)
16:48:45 <arma2> what are the new building blocks we need from wherever this design goes? (a) client needs a way to specify to the snowflake what route-id-fingerprint it wants. (b) snowflake learns from broker how to map a fingerprint to an allowed ip:port.
16:48:59 <arma2> those two building blocks seem to be a part of all the designs we're considering?
16:49:36 <dcf1> yes, I think so
16:50:13 <dcf1> putting the bridge fingerprint into a SOCKS param and sending it to the broker along with the offer seems pretty straightforward
16:50:41 <dcf1> seems a necessary component in any case
16:50:53 <arlolra> I was gonna write that patch
16:51:00 <cohosh> oh hey arlolra
16:51:08 <arlolra> hi
16:51:08 <dcf1> haha I was just about to say, "don't want to speak for arlolra"
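
A rough sketch of what that client-side piece could look like, assuming a hypothetical fingerprint= arg in the bridge line (tor hands per-bridge args to the PT as SOCKS args) and a hypothetical Fingerprint field in the client-to-broker poll message; this is not the actual snowflake code, and the patch arlolra mentions above may end up looking different:

    package main

    import (
        "encoding/json"
        "fmt"

        pt "git.torproject.org/pluggable-transports/goptlib.git"
    )

    // clientPoll is a stand-in for the client-to-broker poll message; the
    // Fingerprint field is the hypothetical addition being discussed.
    type clientPoll struct {
        Offer       string `json:"offer"`
        Fingerprint string `json:"fingerprint,omitempty"`
    }

    // pollMessage copies a hypothetical "fingerprint=..." bridge-line arg,
    // delivered to the PT as a SOCKS arg, into the poll sent with the offer.
    func pollMessage(args pt.Args, offer string) clientPoll {
        fp, _ := args.Get("fingerprint")
        return clientPoll{Offer: offer, Fingerprint: fp}
    }

    func main() {
        args := pt.Args{"fingerprint": []string{"0123456789ABCDEF0123456789ABCDEF01234567"}}
        body, _ := json.Marshal(pollMessage(args, "...sdp offer..."))
        fmt.Println(string(body))
    }
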
16:52:58 <shelikhoo> Yes, so the Tor tell snowflake the fingerprint to connect, this information is then shared to broker and then proxy then specific stable snowflake-server. snowflake-server then connect to named tor with that fingerprint?
16:53:25 <shelikhoo> Yes, so the Tor tell snowflake-client the fingerprint to connect, this information is then shared to broker and then proxy then specific stable snowflake-server. snowflake-server then connect to named tor with that fingerprint?
16:54:08 <meskio> wouldn't it make sense to run one snowflake-server per fingerprint?
16:54:08 <dcf1> yes, I think that's right
16:54:47 <dcf1> If there is only one bridge for each snowflake-server, then only the broker and the proxy need to know the routing. If there can be more than one bridge per snowflake-server, then the bridge fingerprint needs to be passed all the way through to snowflake-server.
16:55:18 <arma2> typically the bridge will sit very close to the snowflake-server, right?
16:55:21 <dcf1> (The pass-through can happen in the same way we currently pass client IP information for geolocation; i.e., in a URL query parameter in the WebSocket HTTP request.)
16:55:24 <arma2> i.e. they will come in pairs
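
A minimal sketch of that pass-through on the proxy side, reusing the same mechanism as the geolocation hint; both query parameter names here are illustrative, and only the general idea (routing hints carried as query parameters on the proxy's WebSocket request to snowflake-server) comes from the discussion:

    package main

    import (
        "fmt"
        "net/url"
    )

    // relayURL builds the WebSocket URL a proxy would dial on snowflake-server,
    // attaching routing hints as query parameters.
    func relayURL(base, clientIP, bridgeFingerprint string) (string, error) {
        u, err := url.Parse(base)
        if err != nil {
            return "", err
        }
        q := u.Query()
        q.Set("client_ip", clientIP)            // geolocation-style hint (illustrative name)
        q.Set("fingerprint", bridgeFingerprint) // hypothetical routing hint
        u.RawQuery = q.Encode()
        return u.String(), nil
    }

    func main() {
        s, _ := relayURL("wss://snowflake.example/", "192.0.2.7",
            "0123456789ABCDEF0123456789ABCDEF01234567")
        fmt.Println(s)
    }
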
16:55:28 <shelikhoo> meskio: this should work as well, so long as snowflake-server's websocket address is also passed by the broker
16:55:55 <meskio> that will make it possible to place bridges in different locations and not have the snowflake-server as a bottleneck
16:56:03 <shelikhoo> I like this solution
16:56:09 <meskio> anyway a single turbotunnel connection should use a single fingerprint
16:56:31 <dcf1> arma2: yes, ideally they are close network-wise. I.e. each snowflake-server has its cadre of a few or a few dozen bridges that it knows about in the same data center.
16:56:41 <cohosh> this is basically the idea in https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/28651#note_2782955? sorry i'm losing track of the thread
16:56:57 <cohosh> we have some backwards compatibility issues to consider there
16:57:13 <dcf1> that was one problem with the current hosting: the data center that the snowflake bridge is in now (Amsterdam) is full, and new VPSes are being created in Miami
16:57:18 <meskio> cohosh: I'm sorry, I hadn't read your idea yet, but it might be the same :)
16:57:56 <dcf1> I was thinking of the possibility of creating e.g. a WireGuard tunnel for the ExtORPort connections between the snowflake-server host and the bridges host, latency be damned
16:58:05 <dcf1> but obviously better if they are close
16:58:06 <cohosh> no worries, i'm just trying to figure out where we're at heh
16:58:56 <dcf1> I think the key insight I'm taking from this discussion is that it's a good idea to partition bridges so that each instance of snowflake-server has its own disjoint set.
16:59:23 <dcf1> (Even if all snowflake-servers ↔ all bridges might use available resources more efficiently)
17:00:07 <meskio> not sure I understand this last comment
17:00:17 <dcf1> A bridge fingerprint not only maps you to the bridge you want, it defines your affinity to a specific snowflake-server, which solves the problem of sharing turbo tunnel state
17:00:34 <dcf1> I guess we're out of time, but I can elaborate on what I mean after the meeting
17:00:50 <meskio> sure, let's close the meeting here
17:00:56 <meskio> #endmeeting