15:59:17 <cohosh> #startmeeting anticensorship meeting
15:59:17 <MeetBot> Meeting started Thu Dec  3 15:59:17 2020 UTC.  The chair is cohosh. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:59:17 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:59:37 <phw> o/
15:59:53 <cohosh> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
15:59:55 <dmbb> Hi all, I'm Diogo Barradas and I'm attending today's meeting per Cecylia and David's invitation
16:00:04 <cohosh> dmbb: yay! welcome :)
16:00:09 <phw> dmbb: welcome and thanks for joining us!
16:00:09 <cohosh> i'm cecylia
16:00:36 <agix> welcome!
16:00:47 <cohosh> we usually go through our regular meeting agenda first
16:00:54 <cohosh> and the reading group follows at the end of the meeting
16:00:55 <dmbb> Thank you all, it is good to be part of the resistance =)
16:01:30 <cohosh> 8)
16:01:43 <dmbb> Certainly, I'll wait for the proper time to contribute to the discussion
16:01:51 <cohosh> phw: i think the first announcement is yours?
16:02:29 <phw> yes, so rdsys and bridgestrap are now deployed and we have https://bridges.torproject.org/status?id=FINGERPRINT which allows bridge operators to look up the status of their bridge
16:02:48 <phw> we currently only test obfs2, obfs3, obfs4, and scramblesuit because rdsys doesn't yet have a parser for vanilla bridges
16:03:22 <agix> nice \(^-^)/
16:03:23 <phw> tor is going to log this url, so bridge operators can simply click on it to learn if their setup works
16:03:37 <dunqan> hey everyone, I'll be standing in for anto over the next few months :)
16:03:41 <dunqan> (don't mind me lurking)
16:03:59 <phw> (you can also provide a hashed fingerprint, so you can share the url with others)
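The status lookup phw describes can be sketched as below. This is a minimal illustration, not bridgestrap's or rdsys's actual code: the log does not say how the hashed fingerprint is computed, so the sketch assumes the convention used elsewhere in Tor (SHA-1 over the 20-byte binary fingerprint, rendered as uppercase hex). The helper names `hashed_fingerprint` and `status_url` are hypothetical.

```python
import hashlib

# URL pattern quoted in the meeting; FINGERPRINT is replaced by the bridge's ID.
STATUS_URL = "https://bridges.torproject.org/status?id={}"


def hashed_fingerprint(fingerprint: str) -> str:
    """Hash a 40-hex-digit bridge fingerprint.

    Assumption: SHA-1 over the binary fingerprint, uppercase hex output.
    """
    raw = bytes.fromhex(fingerprint)  # raises ValueError on non-hex input
    if len(raw) != 20:
        raise ValueError("expected a 40-hex-digit fingerprint")
    return hashlib.sha1(raw).hexdigest().upper()


def status_url(fingerprint: str, hashed: bool = True) -> str:
    """Build the status URL, by default using the shareable hashed form."""
    ident = hashed_fingerprint(fingerprint) if hashed else fingerprint.upper()
    return STATUS_URL.format(ident)
```

Sharing the hashed form lets others check the bridge's status without learning the fingerprint itself.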
16:04:08 <phw> hi dunqan and thanks for joining! o/
16:04:19 <dunqan> thanks for having me o/
16:04:42 <phw> that's it for bridge testing. i may as well do the announcement too
16:04:47 <cohosh> dunqan: welcome!
16:05:03 <phw> please take a look at our november 2020 report and change/add items as you see fit: https://pad.riseup.net/p/U4o0LNYPgm7SCxuF-1Sm
16:05:12 <phw> my plan is to publish it later today
16:06:21 <cohosh> will do
16:06:33 <cohosh> any other discussion or announcements?
16:06:42 <cohosh> if not we can move on to reviews
16:07:05 <phw> i'm good
16:08:29 <cohosh> cool
16:08:38 <cohosh> i need a review of snowflake!21
16:09:33 <cohosh> phw needs snowflake!22 and bridgestrap#10
16:09:40 <cohosh> i can take both of those phw
16:10:16 <phw> happy to review snowflake!21 unless dcf (who doesn't seem to be here) wants to
16:10:41 <cohosh> agix: do you need any reviews?
16:10:55 <phw> agix: i took a look at tpo/anti-censorship/rdsys#5 yesterday and haven't made up my mind yet on how to approach this
16:11:18 <agix> oh ok thanks, take your time
16:11:26 <phw> in particular, i was thinking about moving the code to a dedicated persistence layer. i'll add my thoughts to the ticket
16:11:48 <agix> do you have any other ticket I can help you with regarding rdsys?
16:11:59 <phw> oh for sure, let me take a look at the open issues
16:12:11 <agix> cool
16:13:19 <cohosh> i wonder if we should wait a bit to see if dcf shows up before the reading group
16:13:38 <phw> tpo/anti-censorship/rdsys#6 or testing more broadly is an important issue
16:13:56 <phw> but i can also think of a few other issues that i haven't gotten around to filing yet
16:14:00 <phw> i'll let you know
16:14:17 <agix> ok thanks
16:16:41 <cohosh> okay that's it for our regular agenda it seems
16:17:31 <cohosh> hm
16:17:34 <dmbb> i don't mind waiting a few minutes for dcf as well, if you prefer
16:18:23 <cohosh> let's wait like 5 more minutes and then start?
16:18:31 <phw> sounds good. just enough time to make coffee
16:18:35 <phw> good call cohosh
16:18:51 * cohosh makes tea
16:23:44 <cohosh> okay let's get started \o/
16:23:55 <cohosh> i have a quick high level summary
16:24:01 <cohosh> oh yay dcf1
16:24:07 <cohosh> that is perfect timing
16:24:24 <cohosh> lol
16:24:26 <agix> right on time
16:25:16 <cohosh> alright, pasting summary:
16:25:43 <cohosh> Protozoa is a new anti-censorship system design and implementation that uses existing WebRTC services as covert channels for censorship resistance traffic.
16:25:46 <cohosh> It works by having a client and a proxy visit the same WebRTC service through the Protozoa-modified Chromium browser.
16:25:49 <cohosh> One user creates a password-protected conferencing room and shares that URL with the other user out of band.
16:25:52 <cohosh> The camera and mic of each user record video, but these frames are replaced with covert traffic after they are encoded but before they are sent on the wire by hooks in the WebRTC stack. A corresponding downstream hook retrieves the covert traffic and replaces it in turn with a blank keyframe to prevent software malfunction.
16:25:57 <cohosh> This covert channel behaves like a SOCKS5 proxy and allows the tunneling of arbitrary traffic.
16:26:00 <cohosh> The authors evaluate the throughput and detection resistance of this channel by using statistical properties of the packet flows. They send a variety of downloaded YouTube videos over the channel and try to detect when these videos are being replaced with covert traffic.
16:26:04 <cohosh> The results are good: both high throughput and high detection resistance
16:26:07 <cohosh> They also test their tool in the wild to evade censorship in China, Russia, and India.
16:26:10 <cohosh> </summary>
16:26:44 <dcf1> I got in touch with the authors and they say they are working on a proposal to make Protozoa into a Tor pluggable transport
16:26:55 <dcf1> Did any of the authors come to the meeting?
16:26:56 <cohosh> yup dmbb is here :)
16:27:03 <dcf1> great
16:27:04 <dmbb> Yes, this is Diogo :)
16:27:27 <dmbb> Regarding this proposal:
16:28:17 <dmbb> It is something that we intend to discuss with you. I will prepare a well-formatted document and share it with you
16:29:08 <cohosh> nice!
16:30:03 <dcf1> So I think it's clear that the key innovative idea in this work is the "encoded media tunneling"
16:30:26 <dmbb> While the overall plumbing of Protozoa would require some adaptation to the PT API, I think we can do some more work on the bridge infrastructure part of the thing
16:30:51 <dcf1> Where you don't try to encode payloads as pixel data that has to survive transcoding, but just replace the existing pixel data entirely
16:31:48 <dmbb> yes, that is correct
16:31:59 <dcf1> For me it brought up similarities to Slitheen. Protozoa cover video :: Slitheen overt user simulator; Protozoa replaces all video bitstream data :: Slitheen replaces payloads of leaf resources
16:32:08 <phw> dmbb: what changes to the PT API were you envisioning?
16:32:27 <dcf1> In both cases, the result is to yield strongly indistinguishable packet length and timing distributions
16:32:42 <cohosh> yeah, it's like end to end traffic replacement
16:33:02 <cohosh> personally, i think this approach is a lot cleaner than slitheen
16:33:08 <cohosh> which requires a state machine
16:33:42 <cohosh> and is subject to a lot of bandwidth loss due to out of order packets and packet boundaries splitting headers
16:34:11 <dmbb> phw: I would not say changes to the PT API itself, but changes to the current way we are picking up and packaging IP packets in Protozoa's covert channel. For now, we are picking up IP packets within a network namespace, and we can do this part better by directly hooking Tor's port to feed its packets to Protozoa
16:35:00 <cohosh> for reference, we did try doing video frame replacement in slitheen and we had some trouble: https://uwspace.uwaterloo.ca/bitstream/handle/10012/13595/Bocovich_Cecylia.pdf#subsection.4.2.2
16:35:19 <cohosh> i really like the protozoa approach
16:35:19 <phw> dmbb: gotcha
16:35:26 <dcf1> Right, currently they have a VPN-like model where the circumvention transport carries raw IP datagrams (encoded), so the OS kernels at both ends are responsible for reliability and retransmission
16:36:06 <dcf1> Whereas Tor PT is all userspace and application layer, transports have to implement their own reliability if they run over lossy carriers
16:37:46 <dcf1> dmbb I wanted to ask about your experience in modifying Chromium
16:37:48 <dmbb> dcf1: This is right. We also did several experiments with different network conditions and the covert channel's throughput does not seem to be largely affected. Whether this + Tor circuitry would slow down the channel is something that we need to ascertain
16:38:11 <dcf1> Formerly, in Snowflake, we used the WebRTC stack of Chromium, separated into a standalone library: https://github.com/keroserene/go-webrtc
16:39:04 <dcf1> But this approach became an extreme maintenance burden and was preventing ports to platforms that were not supported by Chromium's cross-compiling (e.g. Windows)
16:39:48 <dcf1> What's your impression, would the changes you had to make be likely to re-apply cleanly after a Chromium major version upgrade, say?
16:40:45 <dmbb> dcf1: Well, I modified Chromium's WebRTC stack for two particular versions (about a year and a half prior to the submission and then just a few months before). There were indeed some changes to the code, but nothing too harsh to accommodate.
16:41:19 <dmbb> In particular, they refactored a few functions that deal with media, which forced me to create a couple of getters/setters to the portion of the frames we were replacing
16:41:48 <dmbb> The hooks we placed within Chromium's code are quite simple
16:42:58 <dmbb> For instance, for encoding covert data, one hook tells Protozoa the size of the frame. Protozoa then packs IP packets up to that length, and gives it back so that the hook can replace the media content
16:43:56 <dmbb> So, in my experience, I would say we probably cannot completely rely on a "perfect" compatibility between versions
16:44:42 <dmbb> But the effort for adapting between versions seems rather small. And this is because the media containers kung fu seem to be rather static portions of the code
16:45:04 <dmbb> .
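The encoding hook dmbb describes at 16:42:58 (the hook reports the frame size, Protozoa packs IP packets up to that length and hands the buffer back) can be sketched roughly as follows. This is an illustrative sketch only, not Protozoa's code: the 2-byte length prefix, the zero padding, and the names `fill_frame` and `read_frame` are all assumptions, and the sketch does not fragment packets that are larger than a frame.

```python
import struct
from collections import deque

# Hypothetical framing: each packet is prefixed by a 2-byte big-endian length.
HEADER = struct.Struct("!H")


def fill_frame(frame_size: int, queue: deque) -> bytes:
    """Return exactly frame_size bytes: queued packets, then zero padding.

    Packets that do not fit stay in the queue for a later frame.
    """
    out = bytearray()
    while queue and len(out) + HEADER.size + len(queue[0]) <= frame_size:
        packet = queue.popleft()
        out += HEADER.pack(len(packet)) + packet
    out += bytes(frame_size - len(out))  # pad so the frame length is unchanged
    return bytes(out)


def read_frame(payload: bytes) -> list:
    """Recover the packets from a frame produced by fill_frame."""
    packets = []
    offset = 0
    while offset + HEADER.size <= len(payload):
        (length,) = HEADER.unpack_from(payload, offset)
        if length == 0:  # a zero length marks the start of the padding
            break
        offset += HEADER.size
        packets.append(payload[offset:offset + length])
        offset += length
    return packets
```

Keeping the output exactly `frame_size` bytes long is the point of the design: the encoded frame's length on the wire is whatever the video encoder chose, regardless of how much covert data is waiting.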
16:45:08 <dcf1> I see, thanks
16:45:12 <cohosh> dmbb: did you have an idea about how to do proxy distribution for the PT version of protozoa?
16:45:24 <dcf1> I don't think there would be a problem in representing Protozoa bridges in rdsys, it would be probably a service name and a chat room name.
16:45:26 <cohosh> from what i can tell, the design in the paper requires a lot of manual set up
16:45:55 <dcf1> But in terms of anti-enumeration, rdsys may not provide enough protection, because besides the WebRTC obfuscation, Protozoa's model does not really differ from obfs4's
16:46:23 <cohosh> hmm yeah
16:46:25 <dcf1> I.e., volunteers set up their Protozoa bridges (at relatively static IP addresses), censored users learn about them and use them
16:47:10 <dcf1> As I understand it, the Whereby servers are only used for signaling and STUN, the actual video connection is direct peer-to-peer between client and bridge
16:47:38 <cohosh> there has to be a TURN server in use then
16:47:51 <dmbb> cohosh: definitely. In our paper we largely dismiss the proxy distribution problem and show how the system could work if you had a friend/family outside the censored region. One way we thought about to do this better under the PT model was to use these chat room names and rely on some kind of trusted bridge infrastructure where bridges were able to rotate their IP quite often
16:48:12 <dcf1> (Perhaps there are alternative servers that do route all the peer-to-peer connections through their own servers, which would provide some degree of collateral damage, but then those servers would also *not* have to try to decode the video stream they are passing through.)
16:48:29 <dmbb> we do rely on STUN only for our paper
16:48:33 <cohosh> dmbb: sounds like a good fit for rdsys then :)
16:48:40 <cohosh> hm
16:48:45 <dmbb> dcf1: i checked your thoughts on discord
16:48:51 <cohosh> yeah i wonder how whereby deals with incompatible NATs
16:49:06 <dmbb> about the use of a central server that inspects the content of discord connections
16:49:20 <cohosh> the advantage of protozoa is that there's a collateral damage factor here that snowflake does not have
16:49:26 <dmbb> it makes me think about WebRTC gateways (used to re-encode, inspect, forward) media data
16:49:54 <dcf1> Yeah I found this blog post: https://medium.com/tenable-techblog/lets-reverse-engineer-discord-1976773f4626 that says Discord uses WebRTC but not peer-to-peer, a Discord middlebox intercepts and decodes all connections
16:50:06 <dmbb> If we could have a service that relies on a WebRTC gateway to forward data between peers *without* exposing their IPs, that could be useful for hiding bridges IPs
16:50:31 <dmbb> (but of course, which did not inspect our traffic for malformed content)
16:50:36 <dcf1> cohosh: collateral damage on the Whereby web, STUN, and signaling servers maybe, but not on the bridge IP addresses, as I understand it
16:51:04 <cohosh> dcf1: yeah i mean collateral damage for whatever central server they are using to proxy between users with incompatible NATs
16:51:07 <dmbb> WhereBy itself uses one of these WebRTC gateways (with 4+ participants), but I'm not sure about the kind of inspection / validation they perform
16:51:27 <dcf1> A censor could harvest rdsys for Protozoa bridge addresses and block them, without blocking Whereby
16:51:48 <dcf1> cohosh: oh I see what you're saying and agree, if the service has a TURN server or something similar
16:52:10 <cohosh> yup, the same is true for snowflake to an extent. a censor can continuously poll the broker for all the snowflake addresses
16:52:23 <cohosh> the advantage there is how lightweight the proxies are so we hope to get a lot of them
16:52:56 <cohosh> but i would be interested to know how much blocking of existing proxies is required to degrade the performance enough to make it unusable anyway
16:53:25 <dmbb> yes, that's why we thought about setting up an infrastructure of bridges which could rotate their IP (for instance, after each videocall), or to take advantage of one of these TURN/WebRTC gateway boxes
16:53:52 <dcf1> yeah it's good to think about
16:54:07 <dcf1> and also worth thinking in mind that the system doesn't have to be 100% proof against 100% of censors to be useful
16:54:13 <dcf1> *keeping in mind
16:54:35 <cohosh> yup
16:54:55 <dmbb> indeed. Still related to snowflake, how aware are you of attempts to fingerprint its traffic?
16:55:18 <cohosh> dmbb: we haven't noticed it yet
16:55:32 <dmbb> i've stumbled across some work that tried to take a deeper look at it (https://arxiv.org/pdf/2008.03254.pdf)
16:55:35 <dcf1> only by researchers so far
16:55:38 <cohosh> the most blocking we've had is china blocking google's STUN server which used to be the default
16:55:55 <cohosh> and then some blocking of individual proxies back when we only had like 10
16:55:59 <dmbb> dcf1: is that the one you know about, or is there more research on that?
16:56:14 <dcf1> Yes, we have a fingerprinting wiki page and that paper is already listed
16:56:15 <dcf1> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/wikis/Fingerprinting
16:56:15 <cohosh> that's the only one i've seen
16:56:31 <dcf1> We have had some conversations with Kyle MacMillan in these meetings in the past
16:56:41 <dmbb> cohosh: about the STUN servers, are you able to use any STUN server for Snowflake?
16:56:48 <cohosh> dmbb: yeah
16:56:52 <dmbb> say, a server controlled by google or Whereby?
16:57:11 <cohosh> definitely
16:57:20 <dmbb> Hum, so there's not any kind of app-dependent auth, right
16:57:41 <cohosh> no, other than stun we don't use any third party infrastructure
16:57:56 <cohosh> we use our own domain-fronted service for signaling
16:58:57 <dmbb> dcf1: thank you for the link
16:59:28 <cohosh> dmbb: so for the PT, to run proxy, the user would install and use the protozoa-modified chromium browser?
17:00:50 <dmbb> cohosh: as far as we thought about it, yes. There may also be the possibility to do this kind of change in the Tor Browser itself, I think?
17:01:07 <cohosh> Tor Browser doesn't have webrtc enabled
17:01:35 <dmbb> right, so I think we would need a separate browser, then
17:01:40 <cohosh> though i definitely think that getting this incorporated into an existing browser would be a good way to go
17:02:11 <cohosh> you can ask the tor browser team about the difficulties of maintaining even a patchset on top of an existing browser :)
17:02:53 <dmbb> at first we tried to take a look at browser extensions to see if it was possible to change some of the WebRTC inner workings through browser extensions
17:03:22 <dmbb> but then we gave up on that as we really need to control the native code
17:03:25 <cohosh> i'm guessing the don'
17:03:28 <cohosh> blah
17:03:31 <dmbb> yeah
17:03:42 <cohosh> guessing they don't expose the hooks you need to replace the video frames
17:03:55 <dmbb> Yes, nothing like that, really
17:04:13 <dmbb> We get to interact (open webrtc sessions and the like) but not much else
17:04:18 <cohosh> maybe brave would be open to something like this?
17:04:35 <dmbb> One thing we did not do was to change audio frames
17:05:12 <dmbb> i believe it's also doable, but probably gives us less bandwidth. And coordinating video + audio delivery may not prove to give that many benefits
17:06:09 <dmbb> cohosh: if brave were interested in doing this, I think it could be easier to deploy a PT, for sure (at least between updates)
17:06:10 <cohosh> could you use turbotunnel to coordinate it?
17:06:47 <dmbb> i'll have to check your docs first, sorry :)
17:06:54 <cohosh> we found dcf1's turbotunnel pretty much a requirement for snowflake
17:07:17 <dmbb> alright, i will take a look at it!
17:07:39 <cohosh> though i suspect that protozoa would be more reliable than snowflake is just because of using an existing service for NAT punching
17:08:05 <dcf1> I thought that the network namespace + kernel IP would take care of any coordination you need, but yes, something in userspace could make it more portable, though probably less efficient
17:08:40 <dmbb> dcf1: exactly, i was thinking about the need to depend on the kernel
17:09:25 <cohosh> another thing to consider: are there any merits to applying the video frame replacement idea to snowflake?
17:09:48 <dcf1> yeah, the big benefit IMO is media streams vs. data streams
17:10:02 <dmbb> dcf1: some other difficulties I faced when deploying protozoa hooks within chromium were that "the code is the doc" and that there were two concurrent implementations of the video engine (one being slowly phased out)
17:10:41 <dcf1> Section 2 of https://arxiv.org/pdf/1605.08805.pdf "Media vs. data transport"
17:10:43 <cohosh> lol this was my experience with modifying firefox to do video replacement for web-based video streams too
17:10:43 <dmbb> cohosh: i would say resistance to fingerprinting may be another
17:11:05 <dmbb> ^^'
17:12:00 <cohosh> dmbb: we use https://github.com/pion/webrtc for snowflake and the developer of that project is very keen on censorship resistance efforts fwiw
17:12:27 <cohosh> i suspect he would be amenable to implementing the hooks needed to manipulate video and audio frames upstream
17:12:56 <dmbb> regarding other issues like DDoS, I feel like Protozoa is on the same board as Snowflake. I see you are working on Salmon to perform a judicious distribution of bridges
17:12:57 <cohosh> he has already upstreamed patches for snowflake
17:13:02 <dcf1> It would be a possibility to create a dummy session with e.g. Whereby before starting the Snowflake peer-to-peer connection, in case a sudden peer-to-peer connection out of the blue is identifiable
17:13:26 <cohosh> dcf1: oh that's a good idea
17:13:28 <dcf1> But it couldn't fully take the place of the broker, even Protozoa has an out-of-band data transfer at the beginning to do what the Snowflake broker does
17:14:16 <cohosh> dmbb: yeah, phw is working on the salmon implementation :)
17:14:51 <dmbb> cohosh: I think Protozoa would benefit from it as well, yes
17:16:07 <dmbb> dcf1: yes, we also require something similar to a broker, unless the client does know someone outside the censored area. I don't know how common it is to make this kind of connection via WebRTC data VS video channels
17:16:07 <cohosh> so it looks like there's two paths for integrating protozoa work into tor: 1) a protozoa PT that integrates the proxy into an existing browser and uses rdsys(+salmon) to distribute protozoa bridges, and 2) using some ideas from protozoa to enhance snowflake
17:16:15 <cohosh> and both paths are worth pursuing
17:17:02 <cohosh> seems like rdsys would fill the role of the broker
17:17:04 <dcf1> Unfortunately I don't think the encoded media tunneling can apply to Snowflake, because a browser extension won't have the necessary level of access to the media stream
17:17:26 <cohosh> dcf1: it could if we talk to pion
17:17:27 <dcf1> At least with browser-based proxies
17:17:33 <cohosh> aha right
17:17:39 <cohosh> just the standalone proxies would have it
17:17:45 <dmbb> yes, so 1) is kind of the subject of our proposal I was telling you about the other day. I can share it with you by tomorrow
17:18:13 <dmbb> Would definitely be interested in having a chat with you, after you got the chance to look at it
17:18:38 <cohosh> dmbb: yes let's do that! we started an email chain about it earlier this week
17:18:46 <cohosh> dcf1: do you want to be a part of the discussion?
17:18:51 <cohosh> (or anyone else here)
17:19:17 <cohosh> i'm sure roger will also be interested and i just realized he's away this and next week
17:19:37 <dmbb> cohosh: yes, thanks! I will reply to that thread with our draft
17:20:13 <dcf1> I'm afraid I may be not so available over the next week, but you can Cc me in the thread
17:20:24 <cohosh> cool, we can continue to coordinate after the reading group ends
17:20:31 <dcf1> Did anyone write a longer summary of the paper this week? agix? If not, I have one to post to bbs.
17:20:32 <dmbb> perfect
17:21:01 <agix> dcf1 i didn't prepare one, so feel free to use yours
17:21:19 <dmbb> dcf1: let me thank you again for the summary, it's rally on point
17:21:27 <dmbb> really*
17:21:36 <dcf1> np
17:21:57 <dcf1> just trying to build the constructive and cooperative research field I want
17:22:03 <cohosh> :D
17:22:41 <dmbb> hehe. does anyone have some other question about the paper?
17:23:07 <cohosh> dmbb: this was really impressive work, i appreciated the attention to detail with the implementation and the quality of the performance and detection evaluations
17:23:16 <agix> dmbb yeah a quick one from me. considering that Protozoa uses 98.8% of the available frame space for transmitting covert data, do you see any opportunities to enhance the throughput in the future?
17:24:00 <dmbb> cohosh: thank you! I've really been learning a lot from all the work you people have been doing!
17:24:17 <dmbb> agix: one possibility would be to use the audio channel as well
17:24:44 <dmbb> agix: i suppose the same kind of replacement can also be done there
17:25:11 <dmbb> agix: we did focus on video only since the majority of the bandwidth is allocated to video anyway
17:26:08 <dmbb> agix: for now, I don't see many other possibilities for us to replace more content from the video frames. We need the header for knowing which frame size to replace at the receptor
17:27:12 <dmbb> cohosh: I hope the code and artifacts may also be useful. I tried to build a full walkthrough for performing our experiments at https://github.com/dmbb/Protozoa
17:27:21 <agix> dmbb thanks for the info and great work btw!
17:28:06 <dmbb> agix: thank you!
17:29:21 <cohosh> alright, it seems discussion is winding down and it's been about an hour
17:29:30 <cohosh> i'll wait another minute and then close the meeting
17:30:11 <phw> dcf1: i agreed to review https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/21 but would leave it to you if you're interested
17:30:53 <dcf1> I am afraid I won't have time this week, you review it please
17:31:01 <phw> will do
17:31:04 <cohosh> thanks phw
17:31:12 <cohosh> #endmeeting