#tor-meeting log

16:15:39 <onyinyang> #startmeeting tor anti-censorship meeting
16:15:39 <MeetBot> Meeting started Thu Apr 25 16:15:39 2024 UTC.  The chair is onyinyang. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:15:39 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:15:39 <onyinyang> hello everyone!
16:15:39 <onyinyang> here is our meeting pad: https://pad.riseup.net/p/r.9574e996bb9c0266213d38b91b56c469
16:15:47 <theodorsm> hi
16:16:32 <meskio> hello
16:17:18 <shelikhoo> hi!
16:21:00 <onyinyang> ok let's start the meeting :)
16:21:12 <shelikhoo> yes!
16:21:34 <onyinyang> The first item is from cohosh I think:
16:21:38 <onyinyang> What is the current repo for the snowflake website?
16:22:04 <cohosh> yeah, i'm still working on some followups to the snowflake webextension work
16:22:26 <cohosh> and i realized that i don't know what the current state of the snowflake webpage is
16:23:08 <meskio> I think there was some redesign going on in tpo/web, but I don't think it is finished
16:23:08 <cohosh> i think it's still pulling from the original snowflake-webext repository, but this other repo also exists: https://gitlab.torproject.org/tpo/web/snowflake
16:23:11 <meskio> not sure the state
16:23:20 <meskio> I hope ggus knows
16:24:01 <meskio> or maybe gaba
16:24:17 <cohosh> okay, i can ask in another channel later
16:24:48 <cohosh> it's not urgent, for now i will just fix the privacy policy link to point to the mozilla addon page
16:25:11 <meskio> if the website is not in the webext anymore maybe we should remove the copy of that repo...
16:25:28 <meskio> but I got lost with what was happening with this
16:27:12 <onyinyang> ok, let's move on to the next item
16:27:31 <onyinyang> I think this one is from shelikhoo
16:27:33 <onyinyang> Snowflake Performance Improvement:
16:27:33 <onyinyang> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_reque
16:27:38 <shelikhoo> yes!
16:27:47 <onyinyang> There is possible alternative design that would reduce complexity, should we do it? https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/219#note_3011810
16:28:09 <shelikhoo> snowflake udp like transport's review is finished
16:28:47 <shelikhoo> one of the issues with it is that it is kind of unnecessarily complex in an attempt to make sure webrtc is strictly used as a dumb pipe
16:29:03 <shelikhoo> but since we are already using the opening message to transfer a string
16:29:20 <shelikhoo> we could transfer additional information like ConnectionID in it
16:29:53 <dcf1> The key difficulty is "tagging" the data stream with a client ID, so the snowflake server knows how to associate different connections (same client but different snowflakes) into one session
16:29:54 <shelikhoo> in order to reduce the complexity of the udp like communication channel in webrtc
16:30:28 <shelikhoo> yes, it is client id... sorry my memory is messed up..
16:30:46 <dcf1> and there is some tradeoff here with how aware the proxy should be of the client ID (i.e., should it have to parse out the client ID, or can it pass everything straight through to the snowflake server without interpretation in a "dumb pipe" fashion)
16:31:31 <dcf1> shelikhoo had an implementation that retained the dumb pipe property, but I was worried that it created a lot of states to manage
16:33:06 <shelikhoo> after not working on it for a while, forgetting my initial mindset, and looking at it again, I do have some new realizations
16:33:11 <dcf1> shelikhoo: so if I understand correctly, the new proposal is to send the client ID in the WebRTC data channel initialization (the DCEP), which is "out of band" of the remainder of the data stream, have the proxy read the client ID from there, and then prepend the WebSocket stream with the client ID (whereas formerly it was the client that would prepend the client ID)?
16:33:56 <dcf1> I am a bit confused, I thought you had proposed a new approach of sending the client ID with the rendezvous message, and having the proxy retrieve the client ID along with the SDP offer etc., but I may be misremembering.
16:34:39 <shelikhoo> dcf1: it is sending it with data channel initialization message, not rendezvous
16:34:51 <shelikhoo> the initial idea was with rendezvous message
16:35:09 <shelikhoo> then it changed to use a layered protocol in the udp like channel
16:35:12 <dcf1> Err, sorry, I said "prepend the WebSocket with the client ID" but you actually propose to send it as a query parameter in the WebSocket request, that is fine too.
16:35:38 <dcf1> ok, I remembered the order wrong. I thought it had been 1. layered protocol 2. rendezvous 3. DCEP
16:35:58 <shelikhoo> and now, the potential design is to send the clientid with DCEP
16:36:08 <cohosh> are there limitations on the size of the initialization message?
16:36:51 <dcf1> This is DCEP "data channel establishment protocol": https://datatracker.ietf.org/doc/html/rfc8832
16:37:46 <cohosh> ah okay, looks like it can be pretty long
16:37:50 <shelikhoo> yes, the data would be sent via websocket's path
16:37:51 <dcf1> When you call https://pkg.go.dev/github.com/pion/webrtc#RTCPeerConnection.CreateDataChannel with a particular `label` and `protocol`, that's how those bits of information get transmitted to the peer
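To make the size question concrete, here is a minimal sketch of how a DATA_CHANNEL_OPEN message is laid out according to RFC 8832: a 12-byte fixed header followed by the `label` and `protocol` strings, each with a 16-bit length field. The function name and the 16-character hex label are assumptions for illustration only; this is not code from Snowflake or pion, and it ignores the SCTP and DTLS overhead added on the wire.

```python
import struct

DATA_CHANNEL_OPEN = 0x03      # message type defined in RFC 8832
DATA_CHANNEL_RELIABLE = 0x00  # channel type: reliable, ordered


def encode_data_channel_open(label: str, protocol: str = "",
                             channel_type: int = DATA_CHANNEL_RELIABLE,
                             priority: int = 0,
                             reliability: int = 0) -> bytes:
    """Build a DATA_CHANNEL_OPEN message (hypothetical helper name)."""
    label_b = label.encode("utf-8")
    protocol_b = protocol.encode("utf-8")
    # Both length fields are 16 bits wide, so each string is capped at 65535 bytes.
    if len(label_b) > 0xFFFF or len(protocol_b) > 0xFFFF:
        raise ValueError("label and protocol are limited to 65535 bytes each")
    # Network byte order: type, channel type, priority, reliability parameter,
    # label length, protocol length.
    header = struct.pack("!BBHIHH", DATA_CHANNEL_OPEN, channel_type,
                         priority, reliability, len(label_b), len(protocol_b))
    return header + label_b + protocol_b


# Assumed example: a client ID carried as a 16-character hex label.
message = encode_data_channel_open("a1b2c3d4e5f60718")
print(len(message))
```

Every byte added to the label grows the message by exactly one byte, so a fixed-length label yields a fixed-size message before transport overhead.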
16:38:26 <dcf1> shelikhoo: I kind of like the idea of putting metadata in the websocket URL query, rather than having it in-band in the data stream as currently
16:39:14 <shelikhoo> yes, and I would design it in a way that the proxy doesn't interpret or process the extra data, so that we can upgrade it later
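A rough sketch of the "metadata in the WebSocket URL query" idea, assuming the proxy copies an opaque string from the client into a query parameter without looking inside it. The parameter name `extra_data`, the helper names, and the example host are hypothetical and not taken from the merge request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

EXTRA_DATA_PARAM = "extra_data"  # hypothetical parameter name


def add_extra_data(ws_url: str, extra_data: str) -> str:
    """Proxy side: append the opaque string to the WebSocket URL query."""
    parts = urlsplit(ws_url)
    query = urlencode({EXTRA_DATA_PARAM: extra_data})
    if parts.query:
        # Keep whatever query parameters were already present.
        query = parts.query + "&" + query
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       query, parts.fragment))


def read_extra_data(ws_url: str):
    """Server side: recover the opaque string, or None if it is absent."""
    values = parse_qs(urlsplit(ws_url).query).get(EXTRA_DATA_PARAM)
    return values[0] if values else None


url = add_extra_data("wss://bridge.example/?client_ip=192.0.2.1", "q83vEjRWeJA=")
print(url)
print(read_extra_data(url))
```

Because the proxy only percent-encodes and forwards the value, the client and server could change what the string contains without a proxy update.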
16:39:16 <dcf1> the only mild concern I have is packet size fingerprinting, if we send fixed strings within the DCEP, our packets may always have an identifiable size, and this is a place where we cannot really do any external shaping
16:39:39 <shelikhoo> this would allow us to send extra data in the label
16:39:54 <shelikhoo> and this could be used for padding the initial message
16:40:02 <dcf1> theodorsm: this part may interest you too. it's not DTLS handshake fingerprinting, but it would be the size of one of the first packets sent after the DTLS handshake finishes. the question, I guess, is what other popular implementations do with their DCEP.
16:40:19 <theodorsm> I will check it out
16:40:23 <dcf1> right, we can pad larger if needed, but we cannot easily make it smaller, especially if we start using it for other things
16:40:25 <shelikhoo> or change what communication protocol/padding protocol used in the udp like channel
16:40:45 <shelikhoo> yes, we won't be able to reduce its size later
16:41:54 <theodorsm> This paper talks a bit about fingerprinting padding in encrypted TLS packets: https://www.usenix.org/system/files/sec24summer-prepub-465-xue.pdf
16:42:33 <dcf1> it's becoming more and more of a hot topic
16:43:04 <cohosh> what's the concern with sending it in the rendezvous? was it complexity?
16:43:46 <shelikhoo> cohosh: a bad proxy would be able to collect many client IDs and use them to drain the clients' packets
16:43:55 <dcf1> I definitely like the idea of transmitting the client ID in the DCEP and WebSocket request path, with the only caveat being possible packet length fingerprinting of DCEP packets. I'm confident we can devise a protocol to do shaping/padding however we like (up to DTLS constraints) in the remainder of the protocol.
16:45:45 <dcf1> Currently, proxies are ignorant of the client ID. The client prepends the ID to its reliable data channel data stream, which the proxy passes through verbatim as WebSocket to the server.
16:46:01 <dcf1> This only works because we are currently using reliable data channels.
16:46:37 <dcf1> The crux of the issue is that we need to send the client ID to the server, and ensure it has been received, *before* the rest of the unreliable data channel packets can have any meaning to the server.
16:47:04 <shelikhoo> in my pet protocol equivalent to turbotunnel (VLite), the client ID (known as connID there) is combined with a connection counter before encryption and then sent over the untrusted channel, and connections with a higher counter can just kick out connections with a lower counter, so the untrusted channel couldn't collect the initial message to drain the client's packets
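The counter rule described here can be sketched as a small server-side registry: a connection is admitted only if its counter is strictly higher than the last one seen for that connID, so a replayed initial message cannot take over the session. This is an illustration under assumptions, not VLite's code; the class and method names are hypothetical, and the encryption step that protects the connID and counter is left out entirely.

```python
class SessionRegistry:
    """Tracks the highest connection counter seen per connID (hypothetical)."""

    def __init__(self):
        self._latest = {}  # connID -> highest admitted counter

    def admit(self, conn_id: bytes, counter: int) -> bool:
        """Admit a connection only if its counter beats the current one."""
        current = self._latest.get(conn_id)
        if current is not None and counter <= current:
            # Replayed or stale initial message: reject it.
            return False
        # A higher counter replaces (kicks out) the older connection.
        self._latest[conn_id] = counter
        return True

    def is_current(self, conn_id: bytes, counter: int) -> bool:
        """True if this counter is the one currently attached to the session."""
        return self._latest.get(conn_id) == counter


registry = SessionRegistry()
print(registry.admit(b"conn-A", 1))  # first connection
print(registry.admit(b"conn-A", 2))  # newer connection replaces it
print(registry.admit(b"conn-A", 1))  # replay of the old initial message
```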
16:47:49 <cohosh> shelikhoo: how is this worse than what bad proxies can currently do, though?
16:48:11 <dcf1> yeah I was wondering if there's some kind of crypto trick that would make it nicer to use rendezvous as an alternative
16:48:23 <cohosh> i think using the DCEP is a good idea but if we run into padding issues, sending it to the proxy via the rendezvous seems like a good fallback
16:48:37 <cohosh> and we probably wouldn't have to change anything on the websocket side if we switch
16:48:43 <shelikhoo> cohosh: currently a bad proxy has to connect to the client to get the connid, which is more expensive than just getting it from rendezvous
16:48:53 <cohosh> sorry, fingerprinting issues not padding issues
16:48:56 <theodorsm> Rather than transmitting in first packet after the handshake, you could send a few random packets with variable size before sending the id
16:49:08 <theodorsm> just junk data
16:50:01 <dcf1> theodorsm: the problem with that, though, is that you can't guarantee the packets arrive successfully, because it's now using an unreliable data channel
16:50:41 <shelikhoo> cohosh: yes, we could use rendezvous as a fallback, but that would mean we need to update the proxy again to apply that update
16:51:02 <shelikhoo> the reason I wish to kind of make the proxy a dumb pipe is that updating it is so difficult
16:51:03 <dcf1> theodorsm: that idea is pretty much what https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/219 does: it defines two packet types, one with client ID and one without, and the client keeps transmitting the client ID with *every* packet until it gets acknowledgement that the server has received and interpreted it
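The "two packet types" scheme can be sketched as follows: the client tags every packet with its ID until it learns the server has seen it, then switches to the shorter form. The type bytes, the 8-byte ID length, and all names below are assumptions for illustration; the real wire format in the merge request may differ.

```python
TYPE_WITH_ID = 0x01   # hypothetical type byte: packet carries the client ID
TYPE_PLAIN = 0x02     # hypothetical type byte: packet carries payload only
CLIENT_ID_LEN = 8     # assumed ID length in bytes


class ClientFramer:
    """Client side: include the client ID until the server acknowledges it."""

    def __init__(self, client_id: bytes):
        if len(client_id) != CLIENT_ID_LEN:
            raise ValueError("client ID must be %d bytes" % CLIENT_ID_LEN)
        self.client_id = client_id
        self.acked = False

    def frame(self, payload: bytes) -> bytes:
        if self.acked:
            return bytes([TYPE_PLAIN]) + payload
        # Unreliable channel: any packet may be lost, so repeat the ID each time.
        return bytes([TYPE_WITH_ID]) + self.client_id + payload

    def on_ack(self) -> None:
        self.acked = True


def parse(packet: bytes):
    """Server side: return (client_id or None, payload)."""
    if not packet:
        raise ValueError("empty packet")
    kind = packet[0]
    if kind == TYPE_WITH_ID:
        if len(packet) < 1 + CLIENT_ID_LEN:
            raise ValueError("truncated client ID")
        return packet[1:1 + CLIENT_ID_LEN], packet[1 + CLIENT_ID_LEN:]
    if kind == TYPE_PLAIN:
        return None, packet[1:]
    raise ValueError("unknown packet type")


framer = ClientFramer(b"\x01\x02\x03\x04\x05\x06\x07\x08")
print(parse(framer.frame(b"hello")))
framer.on_ack()
print(parse(framer.frame(b"world")))
```

Even this reduced version has two client states (acked or not) and two packet forms on the server, which hints at where the state-management concern comes from.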
16:51:22 <shelikhoo> especially for standalone ones
16:51:32 <cohosh> shelikhoo: true, proxy updates are nontrivial
16:51:44 <theodorsm> ahh okay, I haven't really read the full MR and idea
16:52:17 <cohosh> since we have to update the proxy anyway, could we have it look for a field from the rendezvous as well as in the DCEP
16:52:27 <cohosh> and then use whichever one it finds
16:53:03 <cohosh> and then if we have to switch to sending it via the rendezvous, the proxies will already be looking for it
16:53:04 <dcf1> and if it finds both?
16:53:14 <cohosh> hmm, we could give priority to one of them
16:53:34 <shelikhoo> or just send both extra fields with websocket connection
16:53:38 <cohosh> maybe to the DCEP, since that's directly from the client
16:53:50 <shelikhoo> or just send both extra fields with websocket connection's path
16:54:10 <shelikhoo> and keep the one from rendezvous empty for now
16:54:17 <shelikhoo> so that we can update them later
16:54:22 <shelikhoo> without updating the proxy
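The two options discussed here can be sketched side by side: either the proxy picks one source with DCEP taking priority, or it forwards both fields untouched and leaves the rendezvous one empty for now. The function and field names are hypothetical and only illustrate the choice being weighed.

```python
def pick_extra_data(from_dcep, from_rendezvous):
    """Option 1: the proxy chooses, preferring the value sent by the client
    directly over the data channel."""
    if from_dcep:
        return from_dcep
    if from_rendezvous:
        return from_rendezvous
    return None


def forward_both(from_dcep, from_rendezvous):
    """Option 2: the proxy forwards both fields without choosing, so the
    server decides and later changes need no proxy update."""
    return {
        "extra_data_dcep": from_dcep or "",              # hypothetical field name
        "extra_data_rendezvous": from_rendezvous or "",  # empty for now
    }


print(pick_extra_data("id-from-dcep", "id-from-rendezvous"))
print(forward_both("id-from-dcep", None))
```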
16:54:25 <dcf1> one thing I was uncomfortable with in !219 was there were some ambiguous/undefined state combinations. but also, thinking about the challenge of updating proxies is good foresight.
16:55:30 <dcf1> yeah so it's a tradeoff between increasing upgrade friction and introducing unused/little-used code paths
16:56:40 <shelikhoo> dcf1: yes, sending the extra data with DCEP basically both allows future updates and also reduces these undefined states
16:57:20 <theodorsm> Could you implicitly create an ID without sending one? For example hashing ip/port + time in random field to create a unique id?
16:57:29 <shelikhoo> and adding an extra data field to rendezvous can allow smoother updates in the future
16:57:41 <shelikhoo> without a lot of benefit for now
16:57:49 <theodorsm> might open up to hijacking attacks tho
16:58:31 <cohosh> theodorsm: another problem is that the client id needs to be consistent across multiple connections which will use different ports and have different connection times
16:58:56 <theodorsm> ahh
16:59:10 <dcf1> theodorsm: for some background see https://www.bamsoftware.com/papers/turbotunnel/#sec:implementing
16:59:23 <shelikhoo> I think deriving the clientid would just make it more complex...
16:59:29 <dcf1> "The encapsulation must also associate with each packet a session identifier, a unique value that enables the recipient to distinguish packets belonging to different sessions, analogous to the four-tuple in TCP. The session identifier, being decoupled from any network address, enables roaming by the client, as in Mosh [27 §2.2] and WireGuard [5 §II-A]. The details of roaming depend on the obfuscation,
16:59:35 <dcf1> but generally, a server receiving upstream packets tagged with a certain session identifier from a particular network address assumes that downstream packets for that session may be sent to that same network address."
17:00:24 <dcf1> you could maybe derive it from the client IP address, if that's stable, but not the port, since that changes with every proxy connection, and even the IP address wouldn't work with NAT, or if you wanted to run two snowflake clients at once on the same computer
17:00:59 <meskio> a single IP address could have multiple snowflake clients
17:01:19 <dcf1> that's the key idea behind the turbo tunnel design: we decouple the session from any particular network connection, but in so doing we need to introduce an identifier to serve the purpose that IP/port 4-tuple does in ordinary network protocols
17:01:42 <dcf1> QUIC also learned this lesson, it has a connection identifier separate from 4-tuple
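A toy illustration of the decoupling described above: the server keys its state on the session identifier rather than on the remote address, so packets arriving through different proxies land in the same session. This is a sketch under assumptions with hypothetical names, not the turbo tunnel implementation.

```python
class SessionTable:
    """Server-side sessions keyed by session ID, not by IP/port."""

    def __init__(self):
        self._sessions = {}

    def receive(self, session_id: bytes, remote_addr, payload: bytes):
        session = self._sessions.setdefault(
            session_id, {"addr": None, "received": []})
        # Remember the most recent address so downstream packets can follow
        # the client as it moves between proxies.
        session["addr"] = remote_addr
        session["received"].append(payload)
        return session

    def downstream_addr(self, session_id: bytes):
        session = self._sessions.get(session_id)
        return session["addr"] if session else None

    def __len__(self):
        return len(self._sessions)


table = SessionTable()
table.receive(b"client-1", ("198.51.100.7", 40001), b"part one")
table.receive(b"client-1", ("203.0.113.9", 52000), b"part two")
print(len(table))
print(table.downstream_addr(b"client-1"))
```

Two clients behind the same NAT address would still get separate sessions here, because their identifiers differ even when their addresses do not.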
17:03:18 <dcf1> shelikhoo: the rough outlines of what you have proposed sound good to me though. I can tell you have been thinking about it carefully.
17:04:39 <shelikhoo> yes, I think I can go ahead and implement the extradata over DCEP. and the extradata over rendezvous can wait for a while
17:05:32 <shelikhoo> this is the design with a good balance between complexity and upgrade friendliness
17:06:05 <shelikhoo> over
17:06:07 <onyinyang> ok, I think we can move on to the final topic after that very informative discussion? :)
17:06:11 <dcf1> maybe we'll get lucky, and common implementations already send a hex GUID `label` in the DCEP or something, giving us room to play with and a padding target
17:06:19 <dcf1> seacrest out
17:06:48 <onyinyang> Ok the only remaining thing is the interesting links:
17:06:50 <onyinyang> Maybe already known, Brave Browser has a snowflake proxy feature (since 2023) https://github.com/brave/brave-browser/issues/25315
17:07:07 <dcf1> I added that note, for some reason I had not been aware of it
17:07:35 <onyinyang> I hadn't heard about it either
17:07:35 <shelikhoo> nice!
17:07:37 <dcf1> I'm guessing the brave browser proxies get counted with webext and we don't have a separate estimate of how prevalent they are
17:07:58 <meskio> AFAIK yes, they just embed the webext
17:08:14 <onyinyang> cool :)
17:10:01 <onyinyang> that looks like it, is there anything else anyone would like to add before I end the meeting?
17:10:11 <cohosh> JackWampler[m]: i see you joined the meeting today
17:10:17 <shelikhoo> eof from me
17:10:27 <onyinyang> ohhi JackWampler[m]! :)
17:11:34 <JackWampler[m]> Hi! Just wanted to listen in for today. I have been working on some pluggable transport things in rust, I am hoping to get it into a more share-able state soon
17:11:44 <cohosh> nice, welcome!
17:11:53 <cohosh> sorry to put you on the spot
17:12:10 <cohosh> we can provide a link to the agenda pad if you want to add something for a future meeting
17:12:15 <meskio> nice
17:13:48 <JackWampler[m]> Sounds good! thanks
17:14:01 <onyinyang> looking forward to hearing more about it!
17:14:29 <onyinyang> on that note:
17:14:30 <onyinyang> #endmeeting