16:15:39 <onyinyang> #startmeeting tor anti-censorship meeting
16:15:39 <MeetBot> Meeting started Thu Apr 25 16:15:39 2024 UTC. The chair is onyinyang. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:15:39 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:15:39 <onyinyang> hello everyone!
16:15:39 <onyinyang> here is our meeting pad: https://pad.riseup.net/p/r.9574e996bb9c0266213d38b91b56c469
16:15:47 <theodorsm> hi
16:16:32 <meskio> hello
16:17:18 <shelikhoo> hi!
16:21:00 <onyinyang> ok let's start the meeting :)
16:21:12 <shelikhoo> yes!
16:21:34 <onyinyang> The first item is from cohosh I think:
16:21:38 <onyinyang> What is the current repo for the snowflake website?
16:22:04 <cohosh> yeah, i'm still working on some followups to the snowflake webextension work
16:22:26 <cohosh> and i realized that i don't know what the current state of the snowflake webpage is
16:23:08 <meskio> I think there was some redesign going on in tpo/web, but I don't think it is finished
16:23:08 <cohosh> i think it's still pulling from the original snowflake-webext repository, but this other repo also exists: https://gitlab.torproject.org/tpo/web/snowflake
16:23:11 <meskio> not sure the state
16:23:20 <meskio> I hope ggus knows
16:24:01 <meskio> or maybe gaba
16:24:17 <cohosh> okay, i can ask in another channel later
16:24:48 <cohosh> it's not urgent, for now i will just fix the privacy policy link to point to the mozilla addon page
16:25:11 <meskio> if the website is not anymore in the webext maybe we should remove the copy of that repo...
16:25:28 <meskio> but I got lost with what was happening with this
16:27:12 <onyinyang> ok, let's move on to the next item
16:27:31 <onyinyang> I think this one is from shelikhoo
16:27:33 <onyinyang> Snowflake Performance Improvement:
16:27:33 <onyinyang> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_reque
16:27:38 <shelikhoo> yes!
16:27:47 <onyinyang> There is a possible alternative design that would reduce complexity, should we do it? https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/219#note_3011810
16:28:09 <shelikhoo> snowflake udp like transport's review is finished
16:28:47 <shelikhoo> one of the issues with it is that it is kind of unnecessarily complex in an attempt to make sure webrtc is absolutely used as a dumb pipe
16:29:03 <shelikhoo> but since we are already using the opening message to transfer a string
16:29:20 <shelikhoo> we could transfer additional information like ConnectionID in it
16:29:53 <dcf1> The key difficulty is "tagging" the data stream with a client ID, so the snowflake server knows how to associate different connections (same client but different snowflakes) into one session
16:29:54 <shelikhoo> in order to reduce the complexity of the udp like communication channel in webrtc
16:30:28 <shelikhoo> yes, it is client id... sorry my memory is messed up..
16:30:46 <dcf1> and there is some tradeoff here with how aware the proxy should be of the client ID (i.e., should it have to parse out the client ID, or can it pass everything straight through to the snowflake server without interpretation in a "dumb pipe" fashion)
16:31:31 <dcf1> shelikhoo had an implementation that retained the dumb pipe property, but I was worried that it created a lot of states to manage
16:33:06 <shelikhoo> after not working on it for a while and forgetting my initial mindset, and looking at it again, I do have more realizations
16:33:11 <dcf1> shelikhoo: so if I understand correctly, the new proposal is to send the client ID in the WebRTC data channel initialization (the DCEP), which is "out of band" of the remainder of the data stream, have the proxy read the client ID from there, and then prepend the WebSocket stream with the client ID (whereas formerly it was the client that would prepend the client ID)?
16:33:56 <dcf1> I am a bit confused, I thought you had proposed a new approach of sending the client ID with the rendezvous message, and having the proxy retrieve the client ID along with the SDP offer etc., but I may be misremembering.
16:34:39 <shelikhoo> dcf1: it is sending it with data channel initialization message, not rendezvous
16:34:51 <shelikhoo> the initial idea was with rendezvous message
16:35:09 <shelikhoo> then it changed to use a layered protocol in the udp like channel
16:35:12 <dcf1> Err, sorry, I said "prepend the WebSocket with the client ID" but you actually propose to send it as a query parameter in the WebSocket request, that is fine too.
16:35:38 <dcf1> ok, I remembered the order wrong. I thought it had been 1. layered protocol 2. rendezvous 3. DCEP
16:35:58 <shelikhoo> and now, the potential design is to send the clientid with DCEP
16:36:08 <cohosh> are there limitations on the size of the initialization message?
16:36:51 <dcf1> This is DCEP "data channel establishment protocol": https://datatracker.ietf.org/doc/html/rfc8832
16:37:46 <cohosh> ah okay, looks like it can be pretty long
16:37:50 <shelikhoo> yes, the data would be sent via websocket's path
16:37:51 <dcf1> When you call https://pkg.go.dev/github.com/pion/webrtc#RTCPeerConnection.CreateDataChannel with a particular `label` and `protocol`, that's how those bits of information get transmitted to the peer
16:38:26 <dcf1> shelikhoo: I kind of like the idea of putting metadata in the websocket URL query, rather than having it in-band in the data stream as currently
16:39:14 <shelikhoo> yes, and I would design it in a way that the proxy doesn't interpret or process the extra data, so that we can upgrade it later
16:39:16 <dcf1> the only mild concern I have is packet size fingerprinting, if we send fixed strings within the DCEP, our packets may always have an identifiable size, and this is a place where we cannot really do any external shaping
16:39:39 <shelikhoo> this would allow us to send extra data in the label
16:39:54 <shelikhoo> and this could be used for padding the initial message
16:40:02 <dcf1> theodorsm: this part may interest you too. it's not DTLS handshake fingerprinting, but it would be the size of one of the first packets sent after the DTLS handshake finishes. the question, I guess, is what other popular implementations do with their DCEP.
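To make the size question concrete, here is a minimal sketch of how a DATA_CHANNEL_OPEN message is laid out per RFC 8832: a 12-byte fixed header followed by the label and protocol strings, each with a 16-bit length field, so each can be at most 65535 bytes. The `pad_label` helper, its `~` separator, and the idea of padding a hex client ID up to a fixed target length are assumptions made purely for illustration; nothing in the discussion fixes a label format.

```python
import secrets
import struct

DATA_CHANNEL_OPEN = 0x03      # message type defined by RFC 8832
DATA_CHANNEL_RELIABLE = 0x00  # channel type: reliable, ordered
HEADER_LEN = 12               # fixed fields that precede label and protocol


def encode_open(label: str, protocol: str = "",
                channel_type: int = DATA_CHANNEL_RELIABLE,
                priority: int = 0, reliability: int = 0) -> bytes:
    """Encode a DATA_CHANNEL_OPEN message (RFC 8832 field layout)."""
    label_b = label.encode("utf-8")
    proto_b = protocol.encode("utf-8")
    if len(label_b) > 0xFFFF or len(proto_b) > 0xFFFF:
        raise ValueError("label and protocol are limited to 65535 bytes each")
    # type(1) channel type(1) priority(2) reliability(4) label len(2) protocol len(2)
    header = struct.pack("!BBHIHH", DATA_CHANNEL_OPEN, channel_type,
                         priority, reliability, len(label_b), len(proto_b))
    return header + label_b + proto_b


def pad_label(client_id: bytes, target_len: int) -> str:
    """Hypothetical label format: hex client ID, '~', then random hex filler."""
    prefix = client_id.hex() + "~"
    filler_len = target_len - len(prefix)
    if filler_len < 0:
        raise ValueError("target_len is too small for this client ID")
    filler = secrets.token_hex((filler_len + 1) // 2)[:filler_len]
    return prefix + filler
```

The message size is `HEADER_LEN + len(label) + len(protocol)`, which is why a fixed-length label gives a fixed, potentially recognizable packet size, and why padding can only grow the message, never shrink it.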
16:40:19 <theodorsm> I will check it out
16:40:23 <dcf1> right, we can pad larger if needed, but we cannot easily make it smaller, especially if we start using it for other things
16:40:25 <shelikhoo> or change what communication protocol/padding protocol used in the udp like channel
16:40:45 <shelikhoo> yes, we won't be able to reduce its size later
16:41:54 <theodorsm> This paper talks a bit about fingerprinting padding in encrypted TLS packets: https://www.usenix.org/system/files/sec24summer-prepub-465-xue.pdf
16:42:33 <dcf1> it's becoming more and more of a hot topic
16:43:04 <cohosh> what's the concern with sending it in the rendezvous? was it complexity?
16:43:46 <shelikhoo> cohosh: a bad proxy would be able to collect many client ids and use them to drain the client's packets
16:43:55 <dcf1> I definitely like the idea of transmitting the client ID in the DCEP and WebSocket request path, with the only caveat being possible packet length fingerprinting of DCEP packets. I'm confident we can devise a protocol to do shaping/padding however we like (up to DTLS constraints) in the remainder of the protocol.
16:45:45 <dcf1> Currently, proxies are ignorant of the client ID. The client prepends the ID to its reliable data channel data stream, which the proxy passes through verbatim as WebSocket to the server.
16:46:01 <dcf1> This only works because we are currently using reliable data channels.
16:46:37 <dcf1> The crux of the issue is that we need to send the client ID to the server, and ensure it has been received, *before* the rest of the unreliable data channel packets can have any meaning to the server.
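The proxy side of the DCEP idea can be sketched as follows: the proxy takes whatever string arrived in the data channel label and forwards it, uninterpreted, as a query parameter on the WebSocket request to the server. The parameter name `extra_data`, the function names, and the relay URL are hypothetical; this is a sketch under those assumptions, not the format Snowflake uses.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

EXTRA_DATA_PARAM = "extra_data"  # hypothetical query parameter name


def relay_url(base_url: str, extra_data: str) -> str:
    """Proxy side: attach the opaque DCEP label to the WebSocket URL."""
    parts = urlsplit(base_url)
    query = urlencode({EXTRA_DATA_PARAM: extra_data})
    if parts.query:
        # Keep any query string the relay URL already carries.
        query = parts.query + "&" + query
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       query, parts.fragment))


def server_extract(url: str) -> Optional[str]:
    """Server side: recover the opaque string; only the server parses it."""
    values = parse_qs(urlsplit(url).query).get(EXTRA_DATA_PARAM)
    return values[0] if values else None
```

Because the proxy only percent-encodes and forwards the string, the client and server can later change what the string contains (client ID, padding, protocol hints) without a proxy update, which is the upgrade property shelikhoo asks for.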
16:47:04 <shelikhoo> in my pet protocol equivalent to turbotunnel = VLite , the clientid(known as connID there) is combined with a connection counter before encryption then sent to the untrusted channel, and connections with higher counter can just kickout connections with lower counter number, so untrusted channel couldn't collect the initial message to drain the client's packets
16:47:49 <cohosh> shelikhoo: how is this worse than what bad proxies can currently do, though?
16:48:11 <dcf1> yeah I was wondering if there's some kind of crypto trick that would make it nicer to use rendezvous as an alternative
16:48:23 <cohosh> i think using the DCEP is a good idea but if we run into padding issues, sending it to the proxy via the rendezvous seems like a good fallback
16:48:37 <cohosh> and we probably wouldn't have to change anything on the websocket side if we switch
16:48:43 <shelikhoo> cohosh: currently a bad proxy has to connect to the client to get the connid, which is more expensive than just getting it from rendezvous
16:48:53 <cohosh> sorry, fingerprinting issues not padding issues
16:48:56 <theodorsm> Rather than transmitting in first packet after the handshake, you could send a few random packets with variable size before sending the id
16:49:08 <theodorsm> just junk data
16:50:01 <dcf1> theodorsm: the problem with that, though, is that you can't guarantee the packets arrive successfully, because it's now using an unreliable data channel
16:50:41 <shelikhoo> cohosh: yes, we could use rendezvous as a fallback, but that would mean we need to update the proxy again to apply that update
16:51:02 <shelikhoo> the reason I wish to kind of make the proxy a dumb pipe is updating it is so difficult
16:51:03 <dcf1> theodorsm: that idea is pretty much what https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/219 does: it defines two packet types, one with client ID and one without, and the client keeps transmitting the client ID with *every* packet until it gets acknowledgement that the server has received and interpreted it
16:51:22 <shelikhoo> especially for standalone ones
16:51:32 <cohosh> shelikhoo: true, proxy updates are nontrivial
16:51:44 <theodorsm> ahh okay, I haven't really read the full MR and idea
16:52:17 <cohosh> since we have to update the proxy anyway, could we have it look for a field from the rendezvous as well as in the DCEP
16:52:27 <cohosh> and then use whichever one it finds
16:53:03 <cohosh> and then if we have to switch to sending it via the rendezvous, the proxies will already be looking for it
16:53:04 <dcf1> and if it finds both?
16:53:14 <cohosh> hmm, we could give priority to one of them
16:53:34 <shelikhoo> or just send both extra fields with websocket connection
16:53:38 <cohosh> maybe to the DCEP, since that's directly from the client
16:53:50 <shelikhoo> or just send both extra fields with websocket connection's path
16:54:10 <shelikhoo> and keep the one from rendezvous empty for now
16:54:17 <shelikhoo> so that we can update them later
16:54:22 <shelikhoo> without updating the proxy
16:54:25 <dcf1> one thing I was uncomfortable with with !291 was there were some ambiguous/undefined state combinations. but also, thinking about the challenge of updating proxies is good foresight.
16:55:30 <dcf1> yeah so it's a tradeoff between increasing upgrade friction and introducing unused/little-used code paths
16:56:40 <shelikhoo> dcf1: yes, sending the extra data with DCEP basically both allows future updates and also reduces these undefined states
16:57:20 <theodorsm> Could you implicitly create a ID without sending one? For example hashing ip/port + time in random field to create a unique id?
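The in-band scheme dcf1 attributes to merge request 219 (two packet types, with the client ID repeated until the server acknowledges it) can be sketched roughly as below. The one-byte type tags, the 8-byte ID length, and the class names are invented for illustration and are not the wire format of the actual merge request.

```python
from typing import Optional, Tuple

CLIENT_ID_LEN = 8       # assumed ID length for this sketch
TYPE_DATA = 0           # payload only
TYPE_DATA_WITH_ID = 1   # client ID followed by payload


class Client:
    def __init__(self, client_id: bytes) -> None:
        assert len(client_id) == CLIENT_ID_LEN
        self.client_id = client_id
        self.id_acked = False

    def wrap(self, payload: bytes) -> bytes:
        """Tag every packet with the client ID until the server acks it."""
        if self.id_acked:
            return bytes([TYPE_DATA]) + payload
        return bytes([TYPE_DATA_WITH_ID]) + self.client_id + payload

    def on_ack(self) -> None:
        self.id_acked = True


class ServerConn:
    """Server-side state for one proxy connection."""

    def __init__(self) -> None:
        self.client_id: Optional[bytes] = None

    def unwrap(self, packet: bytes) -> Optional[Tuple[bytes, bytes, bool]]:
        """Return (client_id, payload, should_ack), or None if unattributable."""
        if not packet:
            return None
        kind, body = packet[0], packet[1:]
        if kind == TYPE_DATA_WITH_ID and len(body) >= CLIENT_ID_LEN:
            self.client_id = body[:CLIENT_ID_LEN]
            return self.client_id, body[CLIENT_ID_LEN:], True
        if kind == TYPE_DATA and self.client_id is not None:
            return self.client_id, body, False
        return None  # untagged packet arrived before any client ID
```

Even in this small form there are states to reason about (ID seen or not, ack sent or lost, untagged data arriving first), which gives a feel for the state-management concern raised in the discussion.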
16:57:29 <shelikhoo> and add extra data field to rendezvous can allow more smooth update in the future
16:57:41 <shelikhoo> without a lot of benefit for now
16:57:49 <theodorsm> might open up to hijacking attacks tho
16:58:31 <cohosh> theodorsm: another problem is that the client id needs to be consistent across multiple connections which will use different ports and have different connection times
16:58:56 <theodorsm> ahh
16:59:10 <dcf1> theodorsm: for some background see https://www.bamsoftware.com/papers/turbotunnel/#sec:implementing
16:59:23 <shelikhoo> I think derive the clientid would just make it more complex...
16:59:29 <dcf1> "The encapsulation must also associate with each packet a session identifier, a unique value that enables the recipient to distinguish packets belonging to different sessions, analogous to the four-tuple in TCP. The session identifier, being decoupled from any network address, enables roaming by the client, as in Mosh [27 §2.2] and WireGuard [5 §II-A]. The details of roaming depend on the obfuscation,
16:59:35 <dcf1> but generally, a server receiving upstream packets tagged with a certain session identifier from a particular network address assumes that downstream packets for that session may be sent to that same network address."
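The quoted passage can be illustrated with a small session table keyed by client ID rather than by network address. This is a simplified sketch of the idea only; the class and method names are made up and do not correspond to the Snowflake server code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Addr = Tuple[str, int]  # (IP address, port) of whichever proxy carried the packet


@dataclass
class Session:
    last_addr: Addr
    inbox: List[bytes] = field(default_factory=list)


class SessionTable:
    """Sessions are keyed by client ID, never by network address."""

    def __init__(self) -> None:
        self._sessions: Dict[bytes, Session] = {}

    def on_upstream(self, client_id: bytes, addr: Addr, packet: bytes) -> None:
        session = self._sessions.get(client_id)
        if session is None:
            session = Session(last_addr=addr)
            self._sessions[client_id] = session
        # Roaming: downstream traffic follows the most recent upstream address.
        session.last_addr = addr
        session.inbox.append(packet)

    def downstream_addr(self, client_id: bytes) -> Optional[Addr]:
        session = self._sessions.get(client_id)
        return session.last_addr if session else None

    def inbox(self, client_id: bytes) -> List[bytes]:
        session = self._sessions.get(client_id)
        return list(session.inbox) if session else []
```

Two clients behind the same IP address stay distinct because their IDs differ, and one client moving between proxies keeps a single session because its ID does not change.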
17:00:24 <dcf1> you could maybe derive it from the client IP address, if that's stable, but not the port, since that changes with every proxy connection, and even the IP address wouldn't work with NAT, or if you wanted to run two snowflake clients at once on the same computer
17:00:59 <meskio> a single IP address could have multiple snowflake clients
17:01:19 <dcf1> that's the key idea behind the turbo tunnel design: we decouple the session from any particular network connection, but in so doing we need to introduce an identifier to serve the purpose that IP/port 4-tuple does in ordinary network protocols
17:01:42 <dcf1> QUIC also learned this lesson, it has a connection identifier separate from 4-tuple
17:03:18 <dcf1> shelikhoo: the rough outlines of what you have proposed sound good to me though. I can tell you have been thinking about it carefully.
17:04:39 <shelikhoo> yes, I think I can go ahead and implement the extradata over DCEP. and the extradata over rendezvous can wait for a while
17:05:32 <shelikhoo> this is the design with a good balance between complexity and upgrade friendliness
17:06:05 <shelikhoo> over
17:06:07 <onyinyang> ok, I think we can move on to the final topic after that very informative discussion? :)
17:06:11 <dcf1> maybe we'll get lucky, and common implementations already send a hex GUID `label` in the DCEP or something, giving us room to play with and a padding target
17:06:19 <dcf1> seacrest out
17:06:48 <onyinyang> Ok the only remaining thing is the interesting links:
17:06:50 <onyinyang> Maybe already known, Brave Browser has a snowflake proxy feature (since 2023) https://github.com/brave/brave-browser/issues/25315
17:07:07 <dcf1> I added that note, for some reason I had not been aware of it
17:07:35 <onyinyang> I hadn't heard about it either
17:07:35 <shelikhoo> nice!
17:07:37 <dcf1> I'm guessing the brave browser proxies get counted with webext and we don't have a separate estimate of how prevalent they are
17:07:58 <meskio> AFAIK yes, they just embed the webext
17:08:14 <onyinyang> cool :)
17:10:01 <onyinyang> that looks like it, is there anything else anyone would like to add before I end the meeting?
17:10:11 <cohosh> JackWampler[m]: i see you joined the meeting today
17:10:17 <shelikhoo> eof from me
17:10:27 <onyinyang> ohhi JackWampler[m]! :)
17:11:34 <JackWampler[m]> Hi! Just wanted to listen in for today. I have been working on some pluggable transport things in rust, I am hoping to get it into a more share-able state soon
17:11:44 <cohosh> nice, welcome!
17:11:53 <cohosh> sorry to put you on the spot
17:12:10 <cohosh> we can provide a link to the agenda pad if you want to add something for a future meeting
17:12:15 <meskio> nice
17:13:48 <JackWampler[m]> Sounds good! thanks
17:14:01 <onyinyang> looking forward to hearing more about it!
17:14:29 <onyinyang> on that note:
17:14:30 <onyinyang> #endmeeting