15:58:28 <meskio> #startmeeting tor anti-censorship meeting
15:58:28 <MeetBot> Meeting started Thu Mar 30 15:58:28 2023 UTC.  The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:58:28 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:58:32 <meskio> hello all!!!
15:58:34 <meskio> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
15:58:38 <meskio> feel free to add what you've been working on and put items on the agenda
15:59:03 <shelikhoo> Hi~ Hi~
15:59:25 <onyinyang[m]> hellooo o/
16:00:43 <meskio> I have added the first point in the agenda
16:01:04 <meskio> I just noticed we've been ignoring merge requests on the snowflake webextension
16:01:28 <meskio> we don't have that repo connected to the triage bot, and sometimes it takes weeks until one of us notices those MRs
16:01:44 <meskio> how do you feel about adding that repo to triage bot?
16:01:53 <meskio> or any better ideas to keep an eye on it?
16:02:45 <cohosh> i think triage bot is a good idea
16:03:05 <shelikhoo> +1, we can always reassign if needed
16:03:16 <meskio> sounds good
16:03:21 <meskio> who should be included there?
16:03:23 <meskio> I volunteer
16:03:26 <meskio> shelikhoo? cohosh?
16:03:31 <cohosh> you can include me
16:03:34 <shelikhoo> I can be included too!
16:04:08 <meskio> I might need to reassign them sometimes as my knowledge of the webextension is not great, but I'll do my best there
16:04:23 <meskio> good, I'll configure triagebot with the three of us for now
16:04:32 <meskio> if anybody wants to be included feel free to poke me :)
16:04:39 <shelikhoo> It is fine, I might need to reassign some of them as well...
16:05:48 <meskio> BTW, cohosh, I already assigned you one that you had interacted with, but if you want to drop it into triagebot feel free to unassign yourself :)
16:06:39 <cohosh> meskio: thanks, no worries
16:06:53 <meskio> anything more on this?
16:07:23 <meskio> maybe we can move to the next topic: Update on Analysis of speed deficiency of Snowflake in China, 2023 Q1
16:07:27 <meskio> shelikhoo: is it you?
16:07:31 <shelikhoo> Yes yes!
16:08:04 <shelikhoo> I have done the analysis needed to determine the packet loss pattern of snowflake's webrtc connection in China
16:08:23 <shelikhoo> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40251#note_2883879
16:08:53 <shelikhoo> (there are 2 updates since last discussion)
16:09:22 <shelikhoo> and the second one is about enabling datagram transport on webrtc
16:09:28 <shelikhoo> to deal with the packet loss situation
16:10:22 <meskio> what are the tradeoffs of enabling the datagram transport?
16:10:41 * meskio is still trying to understand what it means
16:11:04 <dcf1> I'm in favor of switching to unreliable and unordered data channels in webrtc. it's a nontrivial change though.
16:11:04 <shelikhoo> it means enabling an unreliable, udp-like mode
16:11:37 <dcf1> before we had the turbotunnel feature, we weren't passing discrete "packets" through the proxy to the bridge,
16:11:38 <shelikhoo> yes, it is a complex change, but should improve the network performance
16:11:42 <meskio> I see, we already have kcp to be a reliable channel, so we don't need it in webrtc, right?
16:12:11 <dcf1> instead it was one continuous "stream", and we relied on the default reliability of webrtc data channels to provide the uninterrupted stream
16:12:37 <shelikhoo> yes, and two retransmission systems result in bad performance
16:12:42 <dcf1> when the turbotunnel feature was added, we switched conceptually from transmitting a "stream" through the proxy, to transmitting discrete "packets" (i.e., KCP packets)
16:13:17 <dcf1> but when we did that, we overlaid those discrete "packets" on top of the existing reliable stream infrastructure. in effect, just sending a length prefix before each packet
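[editor's note] The length-prefix overlay dcf1 describes can be sketched roughly as follows. This is a minimal illustration, not Snowflake's actual wire format: the 2-byte big-endian prefix and the names `encode_packet` and `decode_packets` are assumptions made for the example.

```python
import struct
from typing import List, Tuple

# Assumed framing for illustration only: a 2-byte big-endian length,
# followed by the packet payload. The real Snowflake encoding may differ.
LENGTH_PREFIX = struct.Struct(">H")


def encode_packet(payload: bytes) -> bytes:
    """Prefix one discrete packet with its length so it can ride on a stream."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 16-bit length prefix")
    return LENGTH_PREFIX.pack(len(payload)) + payload


def decode_packets(stream: bytes) -> Tuple[List[bytes], bytes]:
    """Split buffered stream bytes into complete packets.

    Returns (packets, leftover), where leftover holds the bytes of a
    packet that has not fully arrived yet.
    """
    packets = []
    offset = 0
    while len(stream) - offset >= LENGTH_PREFIX.size:
        (length,) = LENGTH_PREFIX.unpack_from(stream, offset)
        start = offset + LENGTH_PREFIX.size
        end = start + length
        if end > len(stream):
            break  # incomplete packet; wait for more bytes
        packets.append(stream[start:end])
        offset = end
    return packets, stream[offset:]
```

Because the receiver has to read the stream strictly in order to find packet boundaries, a delayed segment holds up every packet behind it, which is the inefficiency discussed in the following messages.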
16:14:20 <dcf1> and like shelikhoo says, this gives rise to inefficiencies, like stale KCP packets being kept in a buffer during a connection interruption, even when their retransmissions have been sent later.
16:15:09 <dcf1> it's just kind of an unfortunate side effect of the way snowflake evolved, first from a reliable stream abstraction that could not tolerate the failure of a proxy, then to a session-based turbotunnel model that sends packets through the tunnel
16:16:14 <dcf1> meskio: correct, the internal KCP, whose main purpose is just to give a continuous session identifier that outlives a single proxy, also does all the reliability and retransmission stuff that TCP (or the SCTP inside WebRTC data channels) normally does
16:16:35 <meskio> that makes sense, being a complex/sizeable change, is there a way to measure if the change actually works without doing the whole reengineering? or will constructing that test be as big as doing the work itself?
16:16:52 <dcf1> but there's a way to turn off reliability and ordering in SCTP inside data channels, making it work effectively like UDP datagrams, which would be ideal for the way we use it
16:18:03 <meskio> with 'measure if it actually works' I mean, if it improves the connections from China
16:18:16 <dcf1> one question I'm not sure about though is, whether this issue is the main culprit in Snowflake slowness in China. it seems like it would affect everyone equally, not just China? or is this exacerbated by the "Great Bottleneck"?
16:19:19 <dcf1> meskio: the big challenge I foresee is backward compatibility, this change will require proxies to change and be aware of the new transmission model, so it will require a staged upgrade process like we used for the multi-bridge support
16:19:28 <cohosh> when i looked into this before, i noticed that sctp in "unreliable" mode doesn't actually mean full unreliability, there's still a notion of how much packet loss is tolerated
16:19:44 <shelikhoo> the packet loss rate in other regions is usually not as high as in China
16:20:10 <dcf1> one way to go about it would be not to worry about backward compatibility at first, but just create a separate testing fork that works the way we want it to, that way we could test its effectiveness without worrying about all the compatibility complications
16:20:15 <cohosh> i think this is because the ideal use case was audio/video streams where losing a few packets isn't as noticeable but losing too many affects the quality of the stream
16:20:23 <dcf1> then if it turns out to be beneficial, we can integrate it into the existing system somehow
16:20:54 <shelikhoo> China has a unique network topology
16:20:56 <dcf1> this is how I staged the turbotunnel development, first I made some forks that were turbotunnel-only, then later merged one of them and added a magic token to the beginning of the data stream to distinguish turbotunnel from legacy connections
16:21:15 <dcf1> cohosh: it's a little different, media streams are a separate thing from data channels
16:21:27 <cohosh> yeah but datachannels still use sctp, right?
16:21:27 <meskio> dcf1: that plan sounds good
16:21:48 <dcf1> media streams are always considered lossy, and afaik there's no way to configure that. data channels may be configured to be either reliable/unreliable (and on a separate axis, either ordered/unordered).
16:21:57 <cohosh> and because sctp was designed with media channels in mind they have this property where it will still retransmit even in unreliable mode
16:22:16 <dcf1> cohosh: correct, but it's a feature of SCTP itself that it can be either reliable/unreliable or ordered/unordered. It's a feature SCTP has that TCP does not have.
16:22:28 <dcf1> See the "U" flag in SCTP chunks.
16:22:45 <dcf1> cohosh: sctp is not used at all for media streams, only for data channels.
16:22:53 <dcf1> media streams use STRP.
16:22:56 <dcf1> *SRTP
16:23:20 <cohosh> oh i see, when i looked into sctp though it was still retransmitting in unreliable mode
16:23:40 <cohosh> the retransmission happened when the loss passed a threshold that was considered acceptable
16:23:51 <shelikhoo> there is an unordered mode
16:23:52 <cohosh> if I'm remembering correctly
16:23:58 <dcf1> this is a separate issue from the question of whether snowflake should tunnel through media streams rather than data channels; that is also a worthwhile discussion; but even staying within the paradigm of data channels we can do better than we do now
16:24:04 <shelikhoo> and a retransmission limit system
16:24:20 <dcf1> https://lists.torproject.org/pipermail/anti-censorship-team/2023-March/000286.html is my analysis from a few weeks ago
16:25:02 <cohosh> okay yeah it was the partial reliability i was thinking of, thanks
16:26:00 <dcf1> I think the "partial" reliability means it will never give you half a datagram; datagrams are still all-or-nothing and atomic like in UDP.
16:26:43 <shelikhoo> in SCTP the message boundary is always preserved
16:27:10 <shelikhoo> so every write results in a read
16:27:40 <shelikhoo> it may fragment the message or put more than one message in an ethernet frame
16:27:55 <dcf1> er, actually, "partial" is because you can configure reliability separately for each message in an SCTP association: https://www.rfc-editor.org/rfc/rfc3758#section-1.2
16:28:02 <shelikhoo> but these are hidden to application
16:28:06 <dcf1> "We define partially reliable transport service as a service that allows the user to specify, on a per message basis"
16:28:52 <dcf1> But according to my research, there's no way to have varying reliability inside a WebRTC data channel, the abstraction doesn't expose that feature of SCTP, you can only be all-reliable or all-unreliable. So for us, it's basically just "unreliable" that we want.
16:30:04 <shelikhoo> yes, and it is already quite complex to support per connection reliability setting...
16:30:09 <dcf1> cohosh: the threshold you are thinking of may be the feature where you can limit retransmission either by number or by time.
16:30:11 <shelikhoo> we won't need that part either
16:30:11 <cohosh> dcf1: thanks for clearing that up, i'm not sure where i picked up the idea of sctp having a max loss tolerance, it could've been from srtp
16:30:25 <cohosh> or that yeah
16:30:31 <shelikhoo> cohosh: dtls has this for sequence number
16:30:33 <dcf1> what we would do is turn both knobs to zero, for no retransmission at all
16:30:48 <shelikhoo> but it is not like it is going to matter for our application
16:30:49 <dcf1> https://www.rfc-editor.org/rfc/rfc8831#name-sctp-protocol-consideration "Limiting the number of retransmissions to zero, combined with unordered delivery, provides a UDP-like service where each user message is sent exactly once and delivered in the order received."
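[editor's note] The "UDP-like" configuration quoted above can be sketched as a plain options dictionary. The field names `ordered`, `maxRetransmits`, and `maxPacketLifeTime` are the ones the W3C WebRTC API uses for data channel options; the helper functions `udp_like_init`, `validate_init`, and `is_reliable` are hypothetical names invented for this sketch. Note that the W3C API accepts only one of the two retransmission limits at a time, so the sketch sets the retransmission count to zero and leaves the time limit unset.

```python
from typing import Any, Dict


def udp_like_init() -> Dict[str, Any]:
    """Options for an unordered data channel that never retransmits."""
    return {"ordered": False, "maxRetransmits": 0}


def validate_init(init: Dict[str, Any]) -> None:
    """Reject option sets that specify both retransmission limits.

    This mirrors the W3C rule that setting both maxRetransmits and
    maxPacketLifeTime is a TypeError.
    """
    if (init.get("maxRetransmits") is not None
            and init.get("maxPacketLifeTime") is not None):
        raise TypeError(
            "set at most one of maxRetransmits and maxPacketLifeTime")


def is_reliable(init: Dict[str, Any]) -> bool:
    """A channel is reliable only when neither limit is set."""
    validate_init(init)
    return (init.get("maxRetransmits") is None
            and init.get("maxPacketLifeTime") is None)
```

In a browser, a dictionary of this shape would be passed as the second argument to `RTCPeerConnection.createDataChannel(label, options)`; how the equivalent is set in Snowflake's own WebRTC stack is not covered in this discussion.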
16:32:08 <dcf1> Based on what research I have done, I am in favor of this development, and thanks to shelikhoo and WofWca for the analysis they have done.
16:32:33 <WofWca[m]> 😉
16:32:37 <shelikhoo> ^~^
16:32:47 <cohosh> yes thanks for moving this forward!
16:32:49 <meskio> nice work
16:33:03 <onyinyang[m]> this sounds great :) nice work!
16:33:47 <meskio> good, it looks like we have a way forward, anything else on this topic?
16:34:29 <shelikhoo> I will do the research and try to draft a plan for this, and once it is ready we can have another discussion about enabling unreliable, udp-like webrtc
16:34:58 <shelikhoo> nothing more from me... it seems... will take some thought to get it right
16:35:10 <meskio> :)
16:35:35 <meskio> I don't see more points to discuss, if you have something raise your voice...
16:36:10 <meskio> dcf1: I see you noted that you need help reviewing an MR, it looks assigned to cohosh, is that fine or do you need some help over there?
16:36:37 <dcf1> cohosh is a good reviewer for that one
16:36:44 <cohosh> dcf1: oh sorry about that, i just noticed
16:36:56 <cohosh> i really need to fix my gitlab notifications
16:36:57 <meskio> :)
16:37:09 <cohosh> the important ones are getting drowned out heh
16:38:42 <meskio> cohosh: I have the same feeling, I tend to rely more and more on the gitlab TODO list so I don't miss important things
16:39:20 <meskio> onyinyang[m]: I see you need help with rdsys updates, I'm going to be AFK until next thursday, but I'm happy to answer things about it then
16:39:35 <meskio> maybe others can help you in the meantime if you have something more urgent
16:40:04 <meskio> or we can have a conversation just after this meeting :)
16:40:30 <onyinyang[m]> yeah I think that the code I wrote for handling the bridges from rdsys in the distributor doesn't really match with rdsys' behaviour so I just want to confirm some things
16:40:39 <onyinyang[m]> a quick sync after the meeting would be helpful
16:41:06 <meskio> sounds good
16:41:15 <meskio> anything else for today?
16:41:54 <shelikhoo> EOF
16:42:05 <meskio> I guess we can end the meeting here
16:42:17 <meskio> #endmeeting