15:59:48 <meskio> #startmeeting tor anti-censorship meeting
15:59:48 <MeetBot> Meeting started Thu Mar  3 15:59:48 2022 UTC.  The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:59:48 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:59:52 <irl> hi
15:59:53 <cohosh> hi!
15:59:54 <meskio> hello :)
16:00:07 <meskio> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:00:16 <meskio> feel free to add what you've been working on and put items on the agenda
16:02:01 <meskio> I kept the obfs4proxy point in the agenda just to say that we are coordinating with the default bridge operators so they get updated
16:02:08 <meskio> and many have updated already
16:02:23 <meskio> and there is a new docker image that ships the new version
16:02:32 <meskio> I guess there is nothing else new since last week, is there?
16:02:56 <irl> should i be deploying the new obfs4proxy package onto the s96/125 bridges?
16:03:04 <irl> this hasn't been clear to me
16:03:14 <meskio> irl: yes, I think you should
16:03:22 <irl> ok i'll make that happen
16:03:37 <irl> should acute be putting it into backports to get it globally available in debian?
16:03:47 <dcf1> great work with your automated bridge deployment irl
16:03:56 <irl> thanks (:
16:04:17 <meskio> I've tested the connection from an old obfs4proxy to the new one and the problem is similar to the other way around; I expect most clients to be updated because of auto-update in TB, but the old ones will just take longer to connect
16:04:42 <meskio> irl: yes, it will be great to get the new version into backports, AFAIK it is already in testing
16:05:00 <irl> yes, we paused on pushing to backports because of confusion over if it was good
16:05:30 <meskio> I think it is not worse than keeping it as it is
16:05:37 <irl> cool
16:05:44 <irl> thanks for clarifying that
16:06:09 <acute> meskio: good timing, the package just cleared testing
16:06:10 <meskio> no prob, thanks for the help there
16:06:50 <dcf1> meskio, acute, thanks for your expertise in debian packaging
16:08:16 <meskio> anything more on this topic?
16:08:51 <meskio> the next point in the agenda is 'snowflake load and bottlenecks'
16:09:37 <meskio> does anybody want to introduce it?
16:09:46 <cohosh> sure, there have been some questions asked lately about whether we can use more proxies
16:10:08 <cohosh> more proxies = more ips so it's always a good thing as long as the broker can handle it and it looks like it can
16:10:34 <arma2> in particular, i had been wondering about getting more "totally not firewalled" proxies, e.g. headless snowflake proxies run at universities, for the users whose nat situation needs those
16:10:36 <cohosh> but if i'm not mistaken, it seems our current pressing bottleneck is still the bridge, and we can't use our full proxy capacity anymore
16:11:29 <dcf1> Besides the broker, there is probetest, which is constantly chugging, but we don't know if that's just load or a bug in probetest, if I'm not mistaken
16:11:56 <cohosh> ah yeah true
16:12:31 <cohosh> probetest is doing a decent enough job, enough to get us an adequate amount of proxies in each pool
16:12:38 <cohosh> but it's definitely at capacity
16:12:49 <shelikhoo> And we can partially solve this by monitoring it and restarting it automatically?
16:13:09 <meskio> could we set up multiple probetests? would it be hard to implement / worth it?
16:13:30 <dcf1> shelikhoo: I think so, probetest doesn't keep any meaningful long-term state, right?
16:13:32 <cohosh> meskio: maybe we can use the same load balancing trick as the bridge
16:13:53 <cohosh> i think it's at capacity even without the potential bug that needs restarting
16:14:13 <cohosh> partly because it's still working right now but we have a fair number of unknown proxies which should theoretically be quite low
16:14:16 <dcf1> It's actually strange that probetest is at 100% CPU, now that I think about it: it indicates all the load is in one goroutine
16:14:32 <dcf1> i.e., could be some catastrophic garbage collection or inefficient data structure on the main thread
16:14:47 <arma2> probetest is a tool we run alongside the broker, to connect to snowflakes and decide what nat category they're in, so we can match them to the right clients? (as opposed to other tools we have called probetest)
16:15:09 <cohosh> arma2: yes, it was in retrospect a bad name that i gave it
16:15:15 <cohosh> "nattest" would be better lol
16:15:16 <shelikhoo> dcf1: I don't think there is any state, but I am not very sure
16:15:24 <dcf1> arma2: correct, it's https://gitlab.torproject.org/cohosh/probetest
16:15:35 <dcf1> next tool will be called "testprobe" to reduce confusion
16:15:40 <cohosh> lol
16:15:47 <meskio> XD
16:15:58 <shelikhoo> This confused me a lot
16:16:12 <arma2> oh wait so this *is* the same as the other probetest? :)
16:16:13 <meskio> maybe we need to profile probetest more, we had lightly done that and didn't find much
16:16:17 <cohosh> cutting edge security by obscurity technique
16:16:56 <shelikhoo> If we care about the CPU issue we can bundle pprof into it
16:17:09 <shelikhoo> but I am not very sure if we should do this
16:17:27 <dcf1> arma2: ack, no! sorry
16:17:38 <dcf1> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/tree/main/probetest
16:17:46 <arma2> dcf1: there is a thing in that probetest named stun-test, for extra fun
16:17:56 <dcf1> ↑ this is the one we are talking about now, the one that runs on the broker
16:18:03 <meskio> crazy idea to put on the table: does it make sense to modify standalone proxies to be their own bridge and connect directly to Tor?
16:18:13 <dcf1> sorry, I was confused too
16:18:20 <arma2> whew
16:18:22 <meskio> they might not be enough to make much of a change in the main bridge, and it might be too much work to implement...
16:19:23 <cohosh> meskio: the main problem there is we still have the issues outlined in https://lists.torproject.org/pipermail/tor-project/2022-March/003303.html
16:19:24 <arma2> meskio: the challenge there would be the same one dcf1 has been facing with setting up bridge clones -- the snowflake client needs a way to know which bridge fingerprint it will be connecting to
16:19:43 <arma2> otherwise it looks like a mitm attempt, to tor
16:19:56 <dcf1> shelikhoo: I don't think there's any problem running a profiled probetest. It only takes a few minutes to reach CPU saturation after being restarted, it's not a rare edge case or anything.
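(For reference, probetest is written in Go, so wiring in pprof only takes a couple of lines. A minimal sketch, assuming a loopback-only listen address chosen here for illustration, not what probetest actually does:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ handlers on the default mux
)

func main() {
	// Serve the profiling endpoints on a loopback-only port so they are
	// not reachable from outside the broker host.
	go func() {
		log.Println(http.ListenAndServe("127.0.0.1:6060", nil))
	}()

	// Stand-in for probetest's real main loop.
	select {}
}

Pointing "go tool pprof http://127.0.0.1:6060/debug/pprof/profile" at it for a while would show whether the time is going to garbage collection or to one hot goroutine.)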
16:20:25 <shelikhoo> dcf1: Yes...
16:20:50 <meskio> cohosh: what I mean is if standalone proxies were not sending traffic to the snowflake bridge but directly to tor we might reduce the load on the bridge, but maybe not enough
16:22:25 <shelikhoo> Another thing we can do is run more than one pool
16:22:30 <dcf1> what meskio is saying has some sense, if I understand it right. The standalone proxies, rather than re-wrap their streams as WebSocket, could make a TCP connection directly to the ExtORPort of the tor bridge(s) that snowflake-server forwards to
16:22:58 <dcf1> i.e., an optimization to cut out one part of the connection pipeline, in the special case where proxies can make direct TCP connections
16:22:59 <cohosh> we'd lose turbotunnel though right?
16:23:12 <dcf1> oh hm right
16:23:16 <meskio> ouch, true
16:23:29 <meskio> we need the central snowflake-server, I forgot about that
16:23:30 <shelikhoo> This will increase the speed for users in different regions
16:23:51 <shelikhoo> and buy us more time to find a better solution
16:24:16 <shelikhoo> Let it work in the same way as meek
16:24:56 <dcf1> shelikhoo: it really doesn't work without the turbo tunnel component, it's not only a performance optimization
16:25:21 <dcf1> you have to restart the browser every few minutes, it's not nice
16:25:25 <cohosh> shelikhoo: hm that's true, we could probably add logic to the webextension too that groups broker-bridge pairs and then randomly chooses which pool to be in
16:25:39 <cohosh> just run two of both
16:26:17 <shelikhoo> dcf1: What I meant is that "Another thing we can do is run more than one pool"
16:26:26 <dcf1> shelikhoo: aha, I see now, thanks shelikhoo
16:26:49 <arma2> with two brokers and two pools, one could get rebooted and the world wouldn't end quite so much
16:27:14 <dcf1> cohosh posted about communicating a random bridge selection to the broker in https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/28651#note_2782955
16:28:00 <shelikhoo> Or we can just have separate brokers and snowflake-servers
16:28:07 <dcf1> what I understand by "run more than one pool" is that the bridge selection could be *implicit* in the broker selection: each broker knows of one and only one bridge, and clients know what bridge to expect from the broker they are using
16:28:24 <dcf1> use broker A, expect bridge A; use broker B, expect bridge B
16:29:10 <meskio> yep, so in a single session a client will always reconnect using the same broker
16:29:25 <arma2> dcf1: i liked your "The broker could maintain a mapping from fingerprint to IP:port, which would simultaneously serve as an allowlist." sentence on that ticket
16:29:26 <cohosh> (re multiple pools): it's easy to do that separation for clients by including broker domains in the bridge line instead of the ClientTransportPlugin line
16:30:02 <arma2> dcf1: and i think having the client specify a fingerprint, and the broker provide the allowlist, is orthogonal to (i.e. a separate design decision from) whether each broker has its own set of bridges or what
16:30:15 <shelikhoo> Yes, maybe we could move the broker domain to the bridge line
16:30:27 <cohosh> this multiple pools thing is actually an orthogonal approach to the multiple bridges idea in snowflake#28651
16:30:30 <dcf1> broker domain is already in the bridge line, it's url=
16:30:40 <arma2> cohosh: yes! i think / hope so too
16:30:42 <cohosh> because farther down the line, each pool could have its own set of bridges
16:30:57 <cohosh> dcf1: yes but not in Tor Browser yet
16:31:33 <arma2> cohosh: yep. though in that case, we should remember that the client will need to know *which* broker to ask, to reach a given bridge. that is, that info has to go into the client-side config (e.g. the bridge line)
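(To make the pool-per-broker idea concrete in configuration terms: a hypothetical torrc could carry one snowflake bridge line per pool, with the broker selected via the url= argument and the expected bridge fingerprint in its usual place, matching dcf1's "use broker A, expect bridge A; use broker B, expect bridge B". The addresses, fingerprints, and broker domains below are placeholders, not real deployments:

Bridge snowflake 192.0.2.3:80 1111111111111111111111111111111111111111 url=https://broker-a.example/
Bridge snowflake 192.0.2.4:80 2222222222222222222222222222222222222222 url=https://broker-b.example/
)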
16:32:19 <meskio> another bottleneck for parallelization is the snowflake-server having in-memory state; it might be possible to move it to a DB and have several snowflake-servers running on different machines consuming that DB, but I guess this is a problem to solve farther down the line
16:32:56 <dcf1> I was thinking, we could even run the 2 brokers on the same host, as long as load allows... but that reduces to the client telling a (single) broker which bridge/pool it wants to use
16:33:35 <shelikhoo> We should allow client/proxy to choose which pool it wants to use for speed reasons
16:33:46 <dcf1> but there's a lot of merit to the idea of partitioning snowflake-server instances and bridges -- if the client's random pool/whatever selection consistently selects the same snowflake-server instance, that's a way to scale snowflake-server over multiple hosts
16:34:43 <shelikhoo> Let's say we ask the client to submit a route key?
16:35:01 <arma2> dcf1: right. i think the endpoint here needs to be "we have multiple snowflake servers/bridges" and the only question is how best to get there
16:35:48 <shelikhoo> X-Wants: pool=EU; routeKey=12345
16:36:18 <shelikhoo> or X-Wants-Pool: EU
16:36:39 <shelikhoo> X-RouteKey: 12345
16:36:45 <shelikhoo> And let nginx do the routing?
16:37:54 <meskio> what is the routekey? a number so you are always assigned to the same snowflake-server and don't need to share state between multiple snowflake-servers?
16:38:43 <shelikhoo> Yes, a random number to route the client to the correct snowflake-server
16:38:47 <dcf1> shelikhoo: I think I understand. But the client still needs to specify exactly one bridge fingerprint in the pool, in addition to an identifier for the pool
16:39:16 <meskio> could the fingerprint be given by the broker?
16:39:18 <arma2> yep. and if the routekey is the same as the fingerprint, are we all set?
16:39:19 <shelikhoo> Yes, the one bridge fingerprint limitation comes from Tor.....
16:39:46 <shelikhoo> Oh, we can use fingerprint as route key!
16:39:52 <shelikhoo> great!
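(As a thought experiment of what "fingerprint as route key" in front of several snowflake-server instances could look like: the sketch below is in Go rather than the nginx config shelikhoo mentioned, and the query-parameter name, fingerprints, and backend addresses are all made up. Whether the key rides in a header like X-RouteKey or in a query parameter is still an open choice.

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// Hypothetical mapping from bridge fingerprint (the "route key") to the
// snowflake-server instance that owns that bridge's turbo tunnel state.
var backends = map[string]*url.URL{
	"1111111111111111111111111111111111111111": mustParse("http://127.0.0.1:8081"),
	"2222222222222222222222222222222222222222": mustParse("http://127.0.0.1:8082"),
}

func mustParse(s string) *url.URL {
	u, err := url.Parse(s)
	if err != nil {
		panic(err)
	}
	return u
}

func main() {
	proxy := &httputil.ReverseProxy{
		Director: func(r *http.Request) {
			// Route on a fingerprint query parameter; fall back to one
			// instance if the request did not carry a known key.
			target, ok := backends[r.URL.Query().Get("fingerprint")]
			if !ok {
				target = backends["1111111111111111111111111111111111111111"]
			}
			r.URL.Scheme = target.Scheme
			r.URL.Host = target.Host
		},
	}
	// httputil.ReverseProxy passes WebSocket upgrades through (Go 1.12+),
	// so proxies' WebSocket connections keep working. TLS termination is
	// assumed to happen elsewhere.
	log.Fatal(http.ListenAndServe("127.0.0.1:8080", proxy))
}
)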
16:40:00 <dcf1> meskio: no, unfortunately I don't think so.
16:40:06 <dcf1> Bridge line format is
16:40:28 <shelikhoo> but we can have something like bridgedb...
16:40:31 <meskio> ahh, sure, we can't give the fp from the pt
16:40:36 <dcf1> Bridge snowflake 192.0.2.3:1 FINGERPRINT args=vals
16:40:43 <arma2> i am imagining we have several snowflake bridge lines in tor browser's torrc, and each snowflake bridge line points to a different destination tor bridge
16:40:44 <shelikhoo> to give the user a list of bridges to try
16:41:10 <arma2> and then whether to have one broker per bridge, or one pool per something, etc are all design choices to be made :)
16:41:28 <dcf1> and the PT protocol doesn't provide a way for the PT to tell tor what the FINGERPRINT should be, that information only flows in the other direction
16:42:00 <meskio> makes sense
16:42:22 <dcf1> arma2: the problem I see with multiple bridge lines in tor browser's torrc is a "thundering herd" where tor tries to connect to all of them simultaneously, and keeps one
16:42:43 <dcf1> arlolra suggested dynamically writing the torrc file: choose a bridge line at random, and write a torrc containing only that one line
16:43:11 <arma2> dcf1: right. i think i am fine with this thundering herd thing, since it's actually kind of lightweight to do a connection and then not use it much after that
16:43:21 <cohosh> yeah in snowflake's case that "opportunistically try all configured bridges" becomes worse because trying a bridge involves the full broker poll-connect to snowflake dance
16:43:24 <arma2> dcf1: and i think the failover is a huge win, i.e. there is one next in the list
16:43:39 <dcf1> it's not super lightweight, it's N broker transactions, N STUN exchanges
16:43:45 <cohosh> so thundering herd places a lot of strain on the system
16:44:16 <dcf1> potentially N-1 proxies held idle, if tor or snowflake-client doesn't disconnect the unused ones
16:44:22 <arma2> right. how about a tor patch that, after it shuffles your bridges, tries them in order and doesn't try later ones if it has one it likes?
16:44:31 <shelikhoo> 1. We can first create separate pools for different regions 2. make an obfs4-like system where users host their own snowflake server
16:44:48 <dcf1> arma2: something like that would be great
16:45:24 <arma2> dcf1: the people in russia with 500 bridges in their torrc want that feature too
16:47:19 <arma2> though i do still think that if there are e.g. five snowflake bridge lines, 5x the bootstrap-but-don't-actually-use-it connections isn't so bad. but apparently i am the minority opinion there :)
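(A minimal sketch of arlolra's dynamic-torrc idea: pick one snowflake bridge line at random and write a torrc containing only that line, so tor never sees more than one and the thundering herd is avoided. The file name and bridge lines are hypothetical placeholders:

package main

import (
	"fmt"
	"math/rand"
	"os"
	"time"
)

// Hypothetical candidate bridge lines; in practice these would be the
// snowflake bridge lines shipped with Tor Browser.
var bridgeLines = []string{
	"Bridge snowflake 192.0.2.3:80 1111111111111111111111111111111111111111 url=https://broker-a.example/",
	"Bridge snowflake 192.0.2.4:80 2222222222222222222222222222222222222222 url=https://broker-b.example/",
}

func main() {
	// Choose exactly one bridge line at random.
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	line := bridgeLines[r.Intn(len(bridgeLines))]

	// Write a torrc fragment containing only that one bridge line.
	torrc := fmt.Sprintf("UseBridges 1\n%s\n", line)
	if err := os.WriteFile("torrc-snowflake", []byte(torrc), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
)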
16:48:45 <arma2> what are the new building blocks we need from wherever this design goes? (a) client needs a way to specify to the snowflake what route-id-fingerprint it wants. (b) snowflake learns from broker how to map a fingerprint to an allowed ip:port.
16:48:59 <arma2> those two building blocks seem to be a part of all the designs we're considering?
16:49:36 <dcf1> yes, I think so
16:50:13 <dcf1> putting the bridge fingerprint into a SOCKS param and sending it to the broker along with the offer seems pretty straightforward
16:50:41 <dcf1> seems a necessary component in any case
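(A hedged sketch of the client side of that: the bridge-line args reach the PT as SOCKS args, so snowflake-client could read a fingerprint arg and copy it into the poll message it sends the broker. The struct and field names here are hypothetical, not the actual snowflake messaging code:

package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical poll message with an extra field carrying the bridge
// fingerprint the client expects, alongside the usual offer and NAT type.
type clientPoll struct {
	Offer       string `json:"offer"`
	NAT         string `json:"nat"`
	Fingerprint string `json:"fingerprint,omitempty"`
}

// buildPoll assumes the fingerprint was already read from the SOCKS args
// that tor passes through from the bridge line (e.g. fingerprint=...).
func buildPoll(offer, natType, fingerprint string) ([]byte, error) {
	return json.Marshal(clientPoll{Offer: offer, NAT: natType, Fingerprint: fingerprint})
}

func main() {
	body, _ := buildPoll("<sdp offer>", "unrestricted", "1111111111111111111111111111111111111111")
	fmt.Println(string(body))
}
)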
16:50:53 <arlolra> I was gonna write that patch
16:51:00 <cohosh> oh hey arlolra
16:51:08 <arlolra> hi
16:51:08 <dcf1> haha I was just about to say, "don't want to speak for arlolra"
16:52:58 <shelikhoo> Yes, so Tor tells snowflake-client the fingerprint to connect to; this information is then shared with the broker, then the proxy, then a specific stable snowflake-server. snowflake-server then connects to the named tor with that fingerprint?
16:54:08 <meskio> wouldn't it make sense to run one snowflake-server per fingerprint?
16:54:08 <dcf1> yes, I think that's right
16:54:47 <dcf1> If there is only one bridge for each snowflake-server, then only the broker and the proxy need to know the routing. If there can be more than one bridge per snowflake-server, then the bridge fingerprint needs to be passed all the way through to snowflake-server.
16:55:18 <arma2> typically the bridge will sit very close to the snowflake-server, right?
16:55:21 <dcf1> (The pass-through can happen in the same way we currently pass client IP information for geolocation; i.e., in a URL query parameter in the WebSocket HTTP request.)
16:55:24 <arma2> i.e. they will come in pairs
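(For the pass-through dcf1 describes, the proxy side could look roughly like this, appending the fingerprint to the snowflake-server WebSocket URL the same way the client IP is passed today. The parameter name and relay URL below are assumptions for illustration:

package main

import (
	"fmt"
	"net/url"
)

// addFingerprint appends the bridge fingerprint as a query parameter to the
// snowflake-server WebSocket URL, so the server can pick which local tor
// bridge to forward the ExtORPort connection to.
func addFingerprint(relayURL, fingerprint string) (string, error) {
	u, err := url.Parse(relayURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("fingerprint", fingerprint)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	out, _ := addFingerprint("wss://snowflake.example/", "1111111111111111111111111111111111111111")
	fmt.Println(out) // wss://snowflake.example/?fingerprint=1111...
}
)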
16:55:28 <shelikhoo> meskio: this should work as well, so long as snowflake-server's websocket address is also passed by the broker
16:55:55 <meskio> that will make it possible to place bridges in different locations and not have the snowflake-server as a bottleneck
16:56:03 <shelikhoo> I like this solution
16:56:09 <meskio> anyway a single turbotunnel connection should use a single fingerprint
16:56:31 <dcf1> arma2: yes, ideally they are close network-wise. I.e. each snowflake-server has its cadre of a few or a few dozen bridges that it knows about in the same data center.
16:56:41 <cohosh> this is basically the idea in https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/28651#note_2782955? sorry i'm losing track of the thread
16:56:57 <cohosh> we have some backwards compatibility issues to consider there
16:57:13 <dcf1> that was one problem with the current hosting: the data center that the snowflake bridge is in now (Amsterdam) is full, and new VPSes are being created in Miami
16:57:18 <meskio> cohosh: I'm sorry, but I hadn't read your idea yet, but might be the same :)
16:57:56 <dcf1> I was thinking of the possibility of creating e.g. a WireGuard tunnel for the ExtORPort connections between the snowflake-server host and the bridges host, latency be damned
16:58:05 <dcf1> but obviously better if they are close
16:58:06 <cohosh> no worries, i'm just trying to figure out where we're at heh
16:58:56 <dcf1> I think the key insight I'm taking from this discussion is that it's a good idea to partition bridges so that each instance of snowflake-server has its own disjoint set.
16:59:23 <dcf1> (Even if all snowflake-servers ↔ all bridges might use available resources more efficiently)
17:00:07 <meskio> not sure I understand this last comment
17:00:17 <dcf1> A bridge fingerprint not only maps you to the bridge you want, it defines your affinity to a specific snowflake-server, which solves the problem of sharing turbo tunnel state
17:00:34 <dcf1> I guess we're out of time, but I can elaborate on what I mean after the meeting
17:00:50 <meskio> sure, let's close the meeting here
17:00:56 <meskio> #endmeeting