15:58:28 <meskio> #startmeeting tor anti-censorship meeting
15:58:28 <MeetBot> Meeting started Thu Mar 30 15:58:28 2023 UTC. The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:58:28 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:58:32 <meskio> hello all!!!
15:58:34 <meskio> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
15:58:38 <meskio> feel free to add what you've been working on and put items on the agenda
15:59:03 <shelikhoo> Hi~ Hi~
15:59:25 <onyinyang[m]> hellooo o/
16:00:43 <meskio> I have added the first point to the agenda
16:01:04 <meskio> I just noticed we've been ignoring merge requests on the snowflake webextension
16:01:28 <meskio> we don't have that repo connected to the triage bot, and sometimes it takes weeks until one of us notices those MRs
16:01:44 <meskio> how do you feel about adding that repo to triage bot?
16:01:53 <meskio> or any better ideas to keep an eye on it?
16:02:45 <cohosh> i think triage bot is a good idea
16:03:05 <shelikhoo> +1, we can always reassign if needed
16:03:16 <meskio> sounds good
16:03:21 <meskio> who should be included there?
16:03:23 <meskio> I volunteer
16:03:26 <meskio> shelikhoo? cohosh?
16:03:31 <cohosh> you can include me
16:03:34 <shelikhoo> I can be included too!
16:04:08 <meskio> I might need to reassign them sometimes as my knowledge of the webextension is not great, but I'll do my best there
16:04:23 <meskio> good, I'll configure triagebot with the three of us for now
16:04:32 <meskio> if anybody wants to be included feel free to poke me :)
16:04:39 <shelikhoo> It is fine, I might need to reassign some of them as well...
16:05:48 <meskio> BTW, cohosh I already assigned you one that you had interacted with, but if you want to drop it into triagebot feel free to unassign yourself :)
16:06:39 <cohosh> meskio: thanks, no worries
16:06:53 <meskio> anything more on this?
16:07:23 <meskio> maybe we can move to the next topic: Update on Analysis of speed deficiency of Snowflake in China, 2023 Q1
16:07:27 <meskio> shelikhoo: is it you?
16:07:31 <shelikhoo> Yes yes!
16:08:04 <shelikhoo> I have made the necessary analysis to determine the packet loss pattern of snowflake's webrtc connection in China
16:08:23 <shelikhoo> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40251#note_2883879
16:08:53 <shelikhoo> (there are 2 updates since the last discussion)
16:09:22 <shelikhoo> and the second one is about enabling datagram transport on webrtc
16:09:28 <shelikhoo> to deal with the packet loss situation
16:10:22 <meskio> what are the tradeoffs of enabling the datagram transport?
16:10:41 * meskio is still trying to understand what it means
16:11:04 <dcf1> I'm in favor of switching to unreliable and unordered data channels in webrtc. it's a nontrivial change though.
16:11:04 <shelikhoo> it means enabling unreliable, udp-like delivery
16:11:37 <dcf1> before we had the turbotunnel feature, we weren't passing discrete "packets" through the proxy to the bridge,
16:11:38 <shelikhoo> yes, it is a complex change, but should improve the network performance
16:11:42 <meskio> I see, we already have kcp to be a reliable channel, so we don't need it in webrtc, do we?
16:12:11 <dcf1> instead it was one continuous "stream", and we relied on the default reliability of webrtc data channels to provide the uninterrupted stream
16:12:37 <shelikhoo> yes, and two retransmission systems resulted in bad performance
16:12:42 <dcf1> when the turbotunnel feature was added, we switched conceptually from transmitting a "stream" through the proxy, to transmitting discrete "packets" (i.e., KCP packets)
16:13:17 <dcf1> but when we did that, we overlaid those discrete "packets" on top of the existing reliable stream infrastructure. in effect, just sending a length prefix before each packet
16:14:20 <dcf1> and like shelikhoo says, this gives rise to inefficiencies, like stale KCP packets being kept in a buffer during a connection interruption, even when their retransmissions have been sent later.
16:15:09 <dcf1> it's just kind of an unfortunate side effect of the way snowflake evolved, first from a reliable stream abstraction that could not tolerate the failure of a proxy, then to a session-based turbotunnel model that sends packets through the tunnel
16:16:14 <dcf1> meskio: correct, the internal KCP, whose main purpose is just to give a continuous session identifier that outlives a single proxy, also does all the reliability and retransmission stuff that TCP (or the SCTP inside WebRTC data channels) normally does
16:16:35 <meskio> that makes sense, being a complex/sizeable change, is there a way to measure if the change actually works without doing the whole reengineering? or will constructing that test be as big as doing the work itself?
16:16:52 <dcf1> but there's a way to turn off reliability and ordering in SCTP inside data channels, making it work effectively like UDP datagrams, which would be ideal for the way we use it
16:18:03 <meskio> with 'measure if it actually works' I mean, if it improves the connections from china
16:18:16 <dcf1> one question I'm not sure about though is, whether this issue is the main culprit in Snowflake slowness in China. it seems like it would affect everyone equally, not just China? or is this exacerbated by the "Great Bottleneck"?
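Editor's sketch of the framing dcf1 describes at 16:13:17: each discrete packet is written to the reliable stream with its length in front, so the receiver can recover the packet boundaries. The 2-byte big-endian prefix and the names `write_packet` and `read_packet` are assumptions for illustration; the log does not say which encoding Snowflake actually uses.

```python
import io
import struct


def write_packet(stream, packet: bytes) -> None:
    """Write one packet preceded by its length (assumed: 2 bytes, big-endian)."""
    if len(packet) > 0xFFFF:
        raise ValueError("packet too large for a 2-byte length prefix")
    stream.write(struct.pack(">H", len(packet)) + packet)


def read_packet(stream):
    """Read one length-prefixed packet; return None at a clean end of stream."""
    header = stream.read(2)
    if len(header) == 0:
        return None
    if len(header) < 2:
        raise EOFError("stream ended inside a length prefix")
    (length,) = struct.unpack(">H", header)
    body = stream.read(length)
    if len(body) < length:
        raise EOFError("stream ended inside a packet")
    return body


# Round trip over an in-memory stand-in for the reliable data channel stream.
stream = io.BytesIO()
for outgoing in (b"kcp-1", b"", b"kcp-3"):
    write_packet(stream, outgoing)

stream.seek(0)
packets = []
while (incoming := read_packet(stream)) is not None:
    packets.append(incoming)
```

Because the carrier here is a reliable, ordered stream, a packet held up by loss also holds up every packet written after it, which is the kind of inefficiency raised at 16:14:20.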
16:19:19 <dcf1> meskio: the big challenge I foresee is backward compatibility, this change will require proxies to change and be aware of the new transmission model, so it will require a staged upgrade process like we used for the multi-bridge support
16:19:28 <cohosh> when i looked into this before, i noticed that sctp in "unreliable" mode doesn't actually mean full unreliability, there's still a notion of how much packet loss is tolerated
16:19:44 <shelikhoo> the packet loss rate in other regions is usually not as high as in china
16:20:10 <dcf1> one way to go about it would be not to worry about backward compatibility at first, but just create a separate testing fork that works the way we want it to, that way we could test its effectiveness without worrying about all the compatibility complications
16:20:15 <cohosh> i think this is because the ideal use case was audio/video streams where losing a few packets isn't as noticeable but losing too many affects the quality of the stream
16:20:23 <dcf1> then if it turns out to be beneficial, we can integrate it into the existing system somehow
16:20:54 <shelikhoo> china has a unique network topology
16:20:56 <dcf1> this is how I staged the turbotunnel development, first I made some forks that were turbotunnel-only, then later merged one of them and added a magic token to the beginning of the data stream to distinguish turbotunnel from legacy connections
16:21:15 <dcf1> cohosh: it's a little different, media streams are a separate thing from data channels
16:21:27 <cohosh> yeah but datachannels still use sctp, right?
16:21:27 <meskio> dcf1: that plan sounds good
16:21:48 <dcf1> media streams are always considered lossy, and afaik there's no way to configure that. data channels may be configured to be either reliable/unreliable (and on a separate axis, either ordered/unordered).
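Editor's sketch of the "magic token" approach dcf1 mentions at 16:20:56 for telling new-style connections from legacy ones on the same endpoint. The token value, the `classify` helper, and the mode names are hypothetical; the log does not give the real token or how the server handles it.

```python
import io

# Hypothetical 8-byte token, chosen only for this example.
TOKEN = b"\x00EXAMPLE"


def classify(stream):
    """Peek at the start of a stream and decide which protocol the peer speaks.

    Returns (mode, stream) where the returned stream yields only payload bytes.
    """
    prefix = stream.read(len(TOKEN))
    if prefix == TOKEN:
        # New-style client: the token is consumed, the rest is packet data.
        return "turbotunnel", stream
    # Legacy client: the bytes already read are payload, so put them back in front.
    return "legacy", io.BytesIO(prefix + stream.read())


new_mode, new_rest = classify(io.BytesIO(TOKEN + b"data"))
old_mode, old_rest = classify(io.BytesIO(b"plain legacy bytes"))
```

A similar marker is one possible way to stage the unreliable-channel change alongside existing proxies, though the meeting leaves the integration method open ("somehow").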
16:21:57 <cohosh> and because sctp was designed with media channels in mind it has this property where it will still retransmit even in unreliable mode
16:22:16 <dcf1> cohosh: correct, but it's a feature of SCTP itself that it can be either reliable/unreliable or ordered/unordered. It's a feature SCTP has that TCP does not have.
16:22:28 <dcf1> See the "U" flag in SCTP chunks.
16:22:45 <dcf1> cohosh: sctp is not used at all for media streams, only for data channels.
16:22:53 <dcf1> media streams use STRP.
16:22:56 <dcf1> *SRTP
16:23:20 <cohosh> oh i see, when i looked into sctp though it was still retransmitting in unreliable mode
16:23:40 <cohosh> the retransmission happened when the loss passed a threshold that was considered acceptable
16:23:51 <shelikhoo> there is an unordered mode
16:23:52 <cohosh> if I'm remembering correctly
16:23:58 <dcf1> this is a separate issue from the question of whether snowflake should tunnel through media streams rather than data channels; that is also a worthwhile discussion; but even staying within the paradigm of data channels we can do better than we do now
16:24:04 <shelikhoo> and a retransmission limit system
16:24:20 <dcf1> https://lists.torproject.org/pipermail/anti-censorship-team/2023-March/000286.html is my analysis from a few weeks ago
16:25:02 <cohosh> okay yeah it was the partial reliability i was thinking of, thanks
16:26:00 <dcf1> I think the "partial" reliability means it will never give you half a datagram; datagrams are still all-or-nothing and atomic like in UDP.
16:26:43 <shelikhoo> in SCTP the message boundary is always preserved
16:27:10 <shelikhoo> so every write results in a read
16:27:40 <shelikhoo> it may fragment the message or put more than one message in an ethernet frame
16:27:55 <dcf1> er, actually, "partial" is because you can configure reliability separately for each message in an SCTP association: https://www.rfc-editor.org/rfc/rfc3758#section-1.2
16:28:02 <shelikhoo> but these are hidden from the application
16:28:06 <dcf1> "We define partially reliable transport service as a service that allows the user to specify, on a per message basis"
16:28:52 <dcf1> But according to my research, there's no way to have varying reliability inside a WebRTC data channel, the abstraction doesn't expose that feature of SCTP, you can only be all-reliable or all-unreliable. So for us, it's basically just "unreliable" that we want.
16:30:04 <shelikhoo> yes, and it is already quite complex to support per-connection reliability settings...
16:30:09 <dcf1> cohosh: the threshold you are thinking of may be the feature where you can limit retransmission either by number or by time.
16:30:11 <shelikhoo> we won't need that part either
16:30:11 <cohosh> dcf1: thanks for clearing that up, i'm not sure where i picked up the idea of sctp having a max loss tolerance, it could've been from srtp
16:30:25 <cohosh> or that yeah
16:30:31 <shelikhoo> cohosh: dtls has this for sequence numbers
16:30:33 <dcf1> what we would do is turn both knobs to zero, for no retransmission at all
16:30:48 <shelikhoo> but it is not like it is going to matter for our application
16:30:49 <dcf1> https://www.rfc-editor.org/rfc/rfc8831#name-sctp-protocol-consideration "Limiting the number of retransmissions to zero, combined with unordered delivery, provides a UDP-like service where each user message is sent exactly once and delivered in the order received."
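Editor's sketch of the data channel settings under discussion. The field names in `to_init_dict` (`ordered`, `maxRetransmits`, `maxPacketLifeTime`) are the members of the W3C `RTCDataChannelInit` dictionary; the `DataChannelOptions` class itself is a hypothetical stand-in, not part of any WebRTC library or of Snowflake. One detail worth noting: the W3C API accepts only one of the two retransmission limits per channel, so this sketch expresses "no retransmission at all" as `maxRetransmits` of 0 rather than setting both.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataChannelOptions:
    """Hypothetical mirror of the reliability-related RTCDataChannelInit members."""

    ordered: bool = True
    max_retransmits: Optional[int] = None
    max_packet_lifetime_ms: Optional[int] = None

    def __post_init__(self):
        # The browser API rejects a channel that sets both limits at once.
        if self.max_retransmits is not None and self.max_packet_lifetime_ms is not None:
            raise ValueError("set at most one of max_retransmits and max_packet_lifetime_ms")

    @property
    def reliable(self) -> bool:
        """A channel is fully reliable only when neither limit is set."""
        return self.max_retransmits is None and self.max_packet_lifetime_ms is None

    def to_init_dict(self) -> dict:
        """Build the dictionary shape that createDataChannel would receive."""
        init = {"ordered": self.ordered}
        if self.max_retransmits is not None:
            init["maxRetransmits"] = self.max_retransmits
        if self.max_packet_lifetime_ms is not None:
            init["maxPacketLifeTime"] = self.max_packet_lifetime_ms
        return init


# Default behaviour: reliable and ordered, i.e. stream-like.
DEFAULT = DataChannelOptions()

# The UDP-like mode from the RFC 8831 quote: unordered, zero retransmissions.
UDP_LIKE = DataChannelOptions(ordered=False, max_retransmits=0)
```

With settings like `UDP_LIKE`, reliability and ordering would be left entirely to the KCP layer inside the tunnel, which is the division of labour dcf1 argues for.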
16:32:08 <dcf1> Based on what research I have done, I am in favor of this development, and thanks to shelikhoo and WofWca for the analysis they have done.
16:32:33 <WofWca[m]> 😉
16:32:37 <shelikhoo> ^~^
16:32:47 <cohosh> yes thanks for moving this forward!
16:32:49 <meskio> nice work
16:33:03 <onyinyang[m]> this sounds great :) nice work!
16:33:47 <meskio> good, it looks like we have a way forward, anything else on this topic?
16:33:57 <shelikhoo> I have do the research and try to draft a plan for this, and once it is ready we have another discussion about enabling unreliable udp like webrtc
16:34:29 <shelikhoo> I will do the research and try to draft a plan for this, and once it is ready we can have another discussion about enabling unreliable, udp-like webrtc
16:34:58 <shelikhoo> nothing more from me... it seems... will take some thought to get it right
16:35:10 <meskio> :)
16:35:35 <meskio> I don't see more points to discuss, if you have something raise your voice...
16:36:10 <meskio> dcf1: I see you put that you need help reviewing an MR, it looks assigned to cohosh, is that fine or do you need some help over there?
16:36:37 <dcf1> cohosh is a good reviewer for that one
16:36:44 <cohosh> dcf1: oh sorry about that, i just noticed
16:36:56 <cohosh> i really need to fix my gitlab notifications
16:36:57 <meskio> :)
16:37:09 <cohosh> the important ones are getting drowned out heh
16:38:42 <meskio> cohosh: I have the same feeling, I tend to rely more and more on the TODO of gitlab to not miss important things
16:39:20 <meskio> onyinyang[m]: I see you need help with rdsys updates, I'm going to be AFK until next thursday, but I'm happy to answer things about it then
16:39:35 <meskio> maybe others can help you in the meantime if you have something more urgent
16:40:04 <meskio> or we can have a conversation just after this meeting :)
16:40:30 <onyinyang[m]> yeah I think that the code I wrote for handling the bridges from rdsys in the distributor doesn't really match with rdsys' behaviour so I just want to confirm some things
16:40:39 <onyinyang[m]> a quick sync after the meeting would be helpful
16:41:06 <meskio> sounds good
16:41:15 <meskio> anything else for today?
16:41:54 <shelikhoo> EOF
16:42:05 <meskio> I guess we can end the meeting here
16:42:17 <meskio> #endmeeting