15:59:27 <cohosh> #startmeeting anti-censorship meeting
15:59:27 <MeetBot> Meeting started Thu Nov 12 15:59:27 2020 UTC.  The chair is cohosh. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:59:27 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:59:32 <cohosh> hey everyone
15:59:44 <dcf1> hi
15:59:45 <agix> hi
16:00:02 <cohosh> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:00:44 <cohosh> phw: you around?
16:02:08 <cohosh> i wonder if we should move the meeting later now with the time change
16:02:36 <agix> might be a good idea
16:03:37 <phw> ugh, i forgot about the different time.. again
16:04:53 <cohosh> heh covid time + daylight savings time is a rough combo
16:06:05 <cohosh> we can give a few mins for updating the pad in case you want to add something to the agenda
16:06:28 <cohosh> there are again not very many items but we have a reading group discussion today \o/
16:08:06 <phw> brief announcement/discussion item: i'm planning on setting up rdsys/bridgestrap on polyanthum soon, but only for the sake of bridge testing
16:08:29 <phw> the idea is to test our bridges and expose a status page that operators can take a look at. no distributors for now
16:08:50 <phw> the status page tells you if your bridge is 1) untested, 2) tested + works, 3) tested + does not work
16:09:31 <cohosh> cool!
16:10:27 <agix> sorry for asking but polyanthum is the bridges.torproject.org host, right?
16:10:52 <phw> agix: yes, exactly
16:11:24 <phw> i believe our hostnames are named after flowers and onions
16:11:59 <agix> I see :D
16:14:02 <cohosh> any other discussion items?
16:14:08 <cohosh> i see we have a new default bridge
16:14:22 <cohosh> tor-browser#40212
16:14:42 <phw> nothing from my side
16:14:52 <cohosh> okay, any needs help with?
16:15:01 <cohosh> it looks like someone caught the bridge outage on the bug reporting pad
16:15:10 <cohosh> https://pad.riseup.net/p/tor-anti-censorship-bugs-keep
16:15:21 <cohosh> but then commented later that it started working again
16:15:29 <dcf1> yeah I'm going to copy to a comment
16:15:33 <phw> i'm blocking on other teams for now
16:15:37 <cohosh> thanks dcf1
16:17:32 <phw> cohosh: fwiw, i spent quite some time looking into i18n and prometheus-based metrics in go, just in case that may come in handy in snowflake or other projects
16:17:46 <phw> i'll sum up all i learned in the respective issues
16:17:59 <cohosh> cool :) most of the snowflake pieces that need i18n are not in Go
16:18:23 <cohosh> but that sounds useful for GetTor and just to have in case that changes
16:19:28 <cohosh> GetTor for sure
16:20:16 <cohosh> anything else before we move on to reading group?
16:20:44 <phw> not from me
16:21:04 <cohosh> yey, reading group time
16:21:19 <cohosh> anyone have a summary ready? i can do a quick one if no
16:21:26 <agix> i prepared one
16:21:31 <cohosh> agix: oh yay!
16:21:50 <agix> <summary>
16:21:56 <agix> The paper analyzes real-world TLS traffic from over 11.8 billion TLS connections in order to identify the wide range of TLS client implementations that are actually used on the Internet. The data included counts and time-stamps of unique Client Hello messages, a sample of SNI and metadata for each Client Hello and Server Hello responses.
16:22:05 <agix> For each connection a fingerprint was generated by calculating the SHA1 hash over several specific Client Hello fields including the TLS record version, handshake version, cipher suite list, compression method list, elliptic curve list, EC point format list, extension list, signature algorithm list and ALPN list.
16:22:19 <agix> The collected fingerprints are then used to analyze how distinguishable certain censorship circumvention tools are from real-world traffic.
16:22:19 <agix> In total, 230000 unique fingerprints were collected.
16:22:24 <agix> Some of the key findings:
16:22:33 <agix> Some TLS implementations generate several fingerprints, like Google Chrome, which generates at least 4 fingerprints even from the same device, due to sending different combinations of extensions depending on the context and size of TLS requests.
16:22:47 <agix> To measure how quickly fingerprints change and how this might impact a censor using a whitelist approach, a list of new fingerprints (1 week old) was compiled and compared to the collected amount in the following weeks, showing a steady but small increase of 0.33%. However, TLS updates in Chrome or iOS would cause a whitelist approach to block half of all connections after 6 months.
16:23:00 <agix> As of August 2018 Psiphon was able to mimic Chrome 58-64 making it less likely to be blocked, followed by Outline (which uses a randomized protocol to look like nothing), meek, Snowflake, Lantern, TapDance and Signal.
16:23:18 <agix> In order to assist censorship circumvention tools, a TLS library named uTLS (a fork of Golang's TLS library) was created, which allows developers to mimic arbitrary Client Hello messages. As of August 2018 the library has been adopted by Psiphon, Lantern and TapDance.
16:23:24 <agix> </summary>
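The fingerprinting step in the summary above can be sketched roughly as follows. This is only an illustration: the summary says a SHA1 hash is computed over the listed Client Hello fields, but the length-prefixed serialization, the field order, the dictionary layout, and the names `client_hello_fingerprint` and `EXAMPLE_HELLO` are all assumptions made here, not the paper's actual encoding.

```python
import hashlib
import struct


def _pack_u16_list(values):
    """Serialize a list of 16-bit values with a 16-bit length prefix."""
    body = b"".join(struct.pack("!H", v) for v in values)
    return struct.pack("!H", len(body)) + body


def _pack_bytes(data):
    """Serialize a byte string with a 16-bit length prefix."""
    return struct.pack("!H", len(data)) + data


def client_hello_fingerprint(hello):
    """Return a hex SHA-1 digest over selected Client Hello fields.

    The encoding is hypothetical; only the choice of fields and the use
    of SHA-1 come from the summary. List order is preserved, so the same
    values in a different order give a different fingerprint.
    """
    alpn = b"".join(_pack_bytes(p.encode("ascii")) for p in hello["alpn"])
    parts = [
        struct.pack("!H", hello["record_version"]),
        struct.pack("!H", hello["handshake_version"]),
        _pack_u16_list(hello["cipher_suites"]),
        _pack_bytes(bytes(hello["compression_methods"])),
        _pack_u16_list(hello["extensions"]),
        _pack_u16_list(hello["elliptic_curves"]),
        _pack_bytes(bytes(hello["ec_point_formats"])),
        _pack_u16_list(hello["signature_algorithms"]),
        _pack_bytes(alpn),
    ]
    return hashlib.sha1(b"".join(parts)).hexdigest()


# Made-up example values, not taken from any real client.
EXAMPLE_HELLO = {
    "record_version": 0x0301,
    "handshake_version": 0x0303,
    "cipher_suites": [0x1301, 0x1302, 0xC02B],
    "compression_methods": [0],
    "extensions": [0, 10, 11, 13, 16],
    "elliptic_curves": [29, 23, 24],
    "ec_point_formats": [0],
    "signature_algorithms": [0x0403, 0x0804],
    "alpn": ["h2", "http/1.1"],
}

if __name__ == "__main__":
    print(client_hello_fingerprint(EXAMPLE_HELLO))
```

Because the hash covers ordered lists, a client that sends the same cipher suites in a different order gets a different fingerprint, which is consistent with one implementation such as Chrome producing several fingerprints.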
16:26:10 <cohosh> one of the things that really stands out in this paper to me is the amount of real world data used
16:27:03 <dcf1> They've used their CU Boulder network tap in other research as well
16:27:13 <cohosh> 9 months of collecting TLS fingerprint data is impressive
16:27:34 <dcf1> Like "Detecting Probe-resistant Proxies" which we've discussed
16:28:51 <cohosh> i'd be interested in a study that compares differences in the fingerprints from CU Boulder to another location
16:29:08 <cohosh> obviously this type of data collection is difficult
16:29:37 <dcf1> Yeah there's a big opportunity there.
16:30:02 <agix> I wonder how difficult the talks might have been with the university staff to permit the experiment
16:30:28 <cohosh> heh
16:30:36 <dcf1> One of the strengths of "Seeing Through Network-Protocol Obfuscation" (https://censorbib.nymity.ch/#Wang2015a) is that they used some real-world traces to evaluate false classification rates.
16:30:40 <cohosh> is sergey here this time?
16:33:59 <cohosh> this paper is also interesting because it carves out an edge case in the parrot is dead rule
16:34:24 <cohosh> uTLS hides TLS traffic by mimicking the fingerprints of other tools
16:34:50 <dcf1> IMO it's not really an edge case; parrot is dead was always somewhat overstated
16:34:56 <cohosh> the way i explain why this works better than the other types of mimicry called out in that paper (like skype) is because TLS is relatively simple
16:35:18 <cohosh> dcf1: that's fair, it was a theoretical result, not censorship seen in the wild
16:35:50 <dcf1> But yeah, uTLS demonstrably works at what it is trying to do
16:36:10 <dcf1> meek (obfs4proxy meek_lite) in Tor Browser uses uTLS since October 2019
16:36:20 <agix> Could anyone give me a short overview on how complex it is to adjust the fingerprint of transports like meek or Snowflake?
16:37:02 <cohosh> meek uses uTLS, which makes it a lot simpler from the perspective of someone using the uTLS library
16:37:27 <cohosh> i wonder how much maintenance work is put into keeping uTLS up to date with current fingerprints
16:37:37 <dcf1> agix: you can choose from one of the premade fingerprints with something like `utls=hellorandomized`, `utls=hellofirefox_65`, `utls=helloios_auto` in the bridge line
16:37:43 <cohosh> the paper gave some measurements on how fingerprints change over the timeline of the data collection
16:38:45 <dcf1> Currently it's obfs4proxy lagging behind, as its built-in list of fingerprints isn't up to date with what uTLS offers now
16:39:16 <dcf1> Though the "auto" fingerprints will still use the most recent, as I understand it, even if obfs4proxy's built-in list doesn't know about all the ones available
16:39:29 <phw> that sounds like a good 'first contribution' ticket
16:39:57 <dcf1> Well, except obfs4proxy isn't maintained by us or at gitlab.torproject.org, really
16:40:03 <agix> dcf1: is hellofirefox_65 still commonly used?
16:40:05 <cohosh> that requires pulling the latest versions of uTLS right?
16:40:14 <dcf1> yeah
16:40:17 <phw> i don't see the problem. ask yawning to merge and if that doesn't work out, we can fork
16:40:18 <cohosh> we need to do that manually for tor browser builds
16:40:30 <cohosh> so that is something we can do
16:40:41 <dcf1> But it's true it's a concern with keeping uTLS up to date. The last commit is from August.
16:40:50 * cohosh nods
16:41:19 <dcf1> There's some good community involvement, for example from Psiphon contributing code https://github.com/refraction-networking/utls/issues?q=is%3Aissue+author%3Arod-hynes
16:41:51 <agix> when did meek start to use uTLS?
16:42:38 <dcf1> It was work too, to check the fingerprint with each new ESR release, and sometimes it required configuration changes for the headless browser, like re-enabling TLS session tickets https://trac.torproject.org/projects/tor/ticket/26241
16:43:11 <sysrqb> cohosh: dcf1: phw: we can discuss that process after your meeting (whether relying on new obfs4proxy versions or updating it in tor-browser-build)
16:43:26 <dcf1> agix: https://blog.torproject.org/new-release-tor-browser-90 https://gitlab.torproject.org/tpo/applications/tor-browser/-/issues/29430
16:43:40 <agix> dcf1: nice thanks!
16:44:09 <dcf1> Both the mainline meek code and obfs4proxy have support for uTLS. At the same time, Tor Browser switched from the mainline to obfs4proxy.
16:44:55 <dcf1> obfs4proxy's internal transport name is meek_lite, because it originally did not support TLS camouflage, even though it was compatible with the protocol otherwise. Now that there's uTLS support, meek_lite is effectively the same as meek.
16:45:35 <cohosh> i've been meaning to ask what the story behind meek_lite is
16:45:47 <cohosh> maybe this is a digression
16:46:36 <dcf1> That's the story, Yawning implemented the meek protocol in obfs4proxy but without browser camouflage, and gave it an incompatible transport name so people wouldn't be fooled into thinking it had all the same blocking resistance.
16:47:19 <dcf1> Then with uTLS, Tor Browser started using obfs4proxy because it permitted getting rid of some project dependencies.
16:48:52 <dcf1> phw: Oh, one other complication, obfs4proxy doesn't actually use the mainline uTLS, but a fork (with an incompatible license) also made by Yawning. https://gitlab.com/yawning/utls
16:49:10 <cohosh> lol gdi that license thing
16:49:59 <phw> that's a great readme in yawning's fork
16:50:01 <phw> "Your tears are delicious, and your code will burn."
16:51:23 <dcf1> The fork is also unchanged for half a year
16:51:52 <dcf1> So yeah, we may be approaching a situation as with meek before, where it accurately imitated the fingerprint it intended to imitate, but that fingerprint was out of date and no longer common
16:52:56 <cohosh> that's frustrating, it's not that uTLS is out of date
16:53:03 <cohosh> right?
16:53:26 <cohosh> just that licenses, and project maintainership are making it difficult to easily update
16:54:32 <dcf1> Well both forks have unsurprisingly diverged
16:54:45 <dcf1> Each has some fixes that the other does not, I believe
16:55:47 <cohosh> so where are the biggest pain points in keeping uTLS up to date?
16:56:03 <cohosh> you mentioned finding new fingerprints for new versions of browsers is a lot of work
16:56:11 <dcf1> I don't know. I've asked sergey but I don't remember anything specific.
16:56:18 <dcf1> cohosh: no, that's not what I meant
16:56:23 <cohosh> i thought the discussions of GREASE in the paper are interesting
16:56:25 <cohosh> ah
16:57:07 <dcf1> I haven't done it, but there is apparently code in uTLS where you can give it a pcap and it will generate code to give you that fingerprint
16:57:22 <cohosh> aha wow!
16:57:31 <dcf1> but it's not always that easy, as I understand it uTLS required a lot of refactoring to support the changes in crypto/tls for TLS 1.3
16:57:34 <agix> https://tlsfingerprint.io/pcap
16:58:36 <dcf1> It looks like both versions have the same fingerprint list, last added Chrome 83 in June 2020. https://github.com/refraction-networking/utls/commits/master/u_parrots.go https://gitlab.com/yawning/utls/-/commits/obfs4proxy-dev/u_parrots.go
16:58:58 <dcf1> But they have different fixes beyond that, "Fix GREASE repeating values";
16:59:16 <dcf1> "Support more than one KeyShare extension correctly", "Add support for TLS Certificate Compression" in the other
16:59:35 <dcf1> It's a bummer how it worked out
17:00:49 <cohosh> yeah
17:01:43 <cohosh> as far as snowflake, it looks like they were using the old chrome-based version
17:02:12 <dcf1> That shouldn't matter for TLS purposes, for domain fronting, then and now, Snowflake uses crypto/tls
17:02:27 <cohosh> oh this was for the communication with the broker
17:02:33 <cohosh> ?
17:02:45 <dcf1> Yes, nothing to do with WebRTC in this paper
17:03:06 <cohosh> hmm so we might want to consider moving to uTLS for snowflake
17:03:09 <dcf1> We need to add uTLS (or equivalent) to Snowflake at some point
17:03:22 <cohosh> cool
17:03:52 <dcf1> It is not that hard, but unfortunately using uTLS with HTTP/2 is the most complicated use case and requires some tricks that are not entirely satisfying
17:03:52 <cohosh> i was wondering, since we're using pion/webrtc if a uTLS-like equivalent for DTLS would be useful later on
17:04:07 <cohosh> (for the webrtc part of the connection)
17:04:54 <dcf1> It was actually Yawning that figured out a decent way to do it https://lists.torproject.org/pipermail/tor-dev/2019-January/013633.html
17:04:59 <cohosh> dcf1: is it because of APLN? I don't see immediately why HTTP/2 would be difficult
17:05:13 <cohosh> *ALPN
17:05:35 <dcf1> This is what the meek implementation looks like, there are many comments here: https://gitweb.torproject.org/pluggable-transports/meek.git/commit/?id=b8fb876145cda3c14d335a3fc88b5e422a926150
17:06:35 <cohosh> lol now that i look at it i think i reviewed this commit awhile ago
17:06:55 <dcf1> I believe that even this scheme can fail if an HTTPS server switches from HTTP/1 to HTTP/2 in between requests (which could conceivably happen with a load balancer or something, though it's unlikely with the kinds of CDNs we are interested in)
17:07:58 <dcf1> But at any rate, it's more complicated than https://github.com/refraction-networking/utls#migrating-from-cryptotls
17:10:27 <dcf1> Oh yeah you did https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/meek/-/issues/29077#note_2602629
17:10:35 <cohosh> :)
17:16:46 <cohosh> seems like discussion is winding down
17:17:17 <cohosh> anything else before we end the meeting?
17:18:12 <cohosh> okay let's end it here
17:18:21 <cohosh> thanks agix for the summary and for choosing this reading!
17:18:46 <cohosh> #endmeeting