16:00:16 <cohosh> #startmeeting tor anti-censorship meeting
16:00:16 <MeetBot> Meeting started Thu Apr 29 16:00:16 2021 UTC.  The chair is cohosh. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:16 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:21 <cohosh> hey everyone!
16:00:36 <cohosh> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:01:55 <cohosh> there's not a lot on the agenda for today
16:02:01 <cohosh> but we do have a reading group discussion
16:02:44 <dcf1> are you planning to post the Belarus report?
16:02:54 <cohosh> dcf1: yes!
16:03:12 <cohosh> i was just going to post it to the existing belarus thread on net4people
16:03:16 <cohosh> and maybe make a blog post
16:03:20 <cohosh> and put it on ntc.party
16:03:31 <dcf1> better make a new thread imo, it's distinct enough
16:03:40 <cohosh> ah okay, good call
16:04:10 <dcf1> I only made a grab-bag Belarus thread because I didn't have time to sort through it all
16:04:54 <cohosh> good to know, i think there is an existing thread on ntc.party about tor blocking specifically that i can post to
16:05:34 <agix> hi
16:05:40 <cohosh> hey agix!
16:06:11 <cohosh> okay our first announcement for the day is that tor is switching our default git branches to main
16:06:28 <cohosh> i created a ticket for it at anti-censorship/team#6
16:06:46 <cohosh> there are tickets that other teams have made as well that are linked there
16:07:26 <cohosh> we have a lot of repositories and many are maintained by volunteers so i don't want to step on peoples' toes there
16:08:14 <cohosh> feel free to comment on the ticket if you have opinions or feedback
16:08:49 <cohosh> for many repositories it will not be a big deal to change
16:10:01 <cohosh> i'll probably send an email to the anti-censorship team list about it as well
16:10:13 <dcf1> I'm curious why the ticket prioritizes GitLab repos, rather than git.tpo ones
16:10:31 <dcf1> Since there's a big warning on gitlab pages saying that it's not meant as a primary code repository, or something
16:11:06 <dcf1> "GitLab is used only for code review, issue tracking and project management. Canonical locations for source code are still https://gitweb.torproject.org/ https://git.torproject.org/ and git-rw.torproject.org."
16:11:43 <cohosh> hm, yeah you're right
16:12:11 <cohosh> when i was going through the repos i noticed many were on gitlab and not git.tpo
16:12:52 <dcf1> yeah the warning banner never stood a chance
16:13:16 <cohosh> it's also the case that all the repos the org maintains are on gitlab
16:13:53 <cohosh> and while there are many anti-censorship repos not listed in this ticket, they are either completely maintained by people outside TPI or haven't been updated in years
16:14:01 <dcf1> at least a fork/mirror is on gitlab for active projects, because of merge requests
16:14:07 <cohosh> yeah
16:14:16 <cohosh> it's also a bit disorganized there though
16:14:27 <cohosh> most of our mirrors are git.tpo --> gitlab (like snowflake)
16:14:39 <cohosh> but i believe there are some gitlab --> git.tpo mirrors
16:14:44 <cohosh> and these are not well documented
16:15:03 <dcf1> oh bummer
16:15:28 <cohosh> perhaps we should document all of these in the team wiki
16:15:49 <cohosh> actually yes we certainly should
16:16:08 <cohosh> because right now i'm just keeping track in my head of which place to push to
16:16:21 <cohosh> pushing to the wrong place breaks the mirror
16:16:34 * cohosh adds better repo documentation to the list of todos
16:17:10 <cohosh> as far as this change, i was not planning on updating the default branch for really old, discontinued repositories
16:17:25 <cohosh> unless they start receiving active development
16:17:42 <cohosh> but i'm happy to do that as well
16:19:29 <cohosh> okay let's move on and we can continue to sort out details in the ticket and emails
16:19:47 <cohosh> april is ending and it's time for our monthly report: https://pad.riseup.net/p/3v7ixS9pkbAdtS7TBxYW
16:20:04 <cohosh> i've started it and feel free to add items you think should be there
16:20:54 <cohosh> any reviews or needs help with items for this week?
16:21:12 <agix> not from me
16:21:37 <cohosh> agix: i just saw your comment on bridgedb#27984 and i'll merge that after the meeting :)
16:21:49 <agix> cohosh thanks :)
16:22:23 <dcf1> I am scheduled to do AMP cache rendezvous in about 8 weeks, so I will be watching / helping with snowflake#29293
16:22:59 <cohosh> dcf1: ah awesome \o/
16:23:00 <dcf1> wrt requirements engineering
16:23:23 <cohosh> i am actually planning on getting to that this week heh
16:23:43 <cohosh> if you ever need me to move something up my priority list i am happy to fwiw
16:24:59 <cohosh> i could use a review for the moat shim bridgedb#32276 but i can ask meskio (our new developer) to review it next week as well :)
16:26:11 <cohosh> should we move on to our reading group?
16:26:58 <dcf1> i'm ready
16:28:00 <cohosh> cool let's start
16:28:24 <cohosh> our reading for this week was on Domain Shadowing: Leveraging Content Delivery Networks for Robust Blocking-Resistant Communications
16:28:33 <cohosh> link to paper: https://www.usenix.org/system/files/sec21fall-wei.pdf
16:28:44 <cohosh> anyone have a summary prepared?
16:28:52 <agix> yep, i got a short one
16:28:58 <cohosh> agix: awesome!
16:29:10 <agix> <summary>
16:29:23 <agix> This paper presents a novel censorship evasion technique called Domain Shadowing, which takes advantage of the fact that CDNs allow their customers to bind their front-end domain to any back-end domain.
16:29:38 <agix> A user only needs to register a new domain with a CDN service that is accessible from the censored country and bind it to the actual target domain, in other words the censored domain that he/she wants to visit. Within the CDN user account a rule needs to be specified that rewrites the Host header of incoming requests to the target domain, otherwise the requests will be rejected.
16:29:51 <agix> Once these steps have been completed, the user sends a request to the registered domain from within the censored area. The request will be sent to the CDN, where the Host header will be rewritten according to the specified rule and the request will be forwarded to the target domain. The subsequent response will be delivered under the user's registered domain name.
16:30:10 <agix> During this process, a censor sees only an HTTPS connection to the CDN requesting the previously registered user domain and thus will not block the connection.
16:30:22 <agix> Additionally the author proposes the use of DfDs, which combines domain fronting and domain shadowing into an enhanced blocking-resistant evasion technique.
16:30:30 <agix> </summary>
16:30:49 <dcf1> thanks
16:31:09 <dcf1> "a rule needs to be specified that rewrites the Host header of the incoming requests to the target domain, otherwise the requests will be rejected."
16:31:46 <dcf1> I think this is not true in general (i.e., it's not an HTTP rule or anything), but it is true for various important sites, the paper gives Facebook and Amazon S3 as examples
16:32:35 <agix> good point, thanks!
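As a concrete illustration of the Host-rewriting step described in the summary, here is a minimal Go sketch of roughly what the CDN edge does with a shadowed request: it accepts a request addressed to the front-end (shadow) domain and forwards it to the back-end (target) domain with the Host header rewritten. The domain names are placeholders, and this is a local approximation for clarity, not the paper's setup or any CDN's actual implementation.

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
    )

    func main() {
        // Back-end: the censored site the user actually wants to reach (placeholder).
        backend, err := url.Parse("https://target.example")
        if err != nil {
            log.Fatal(err)
        }

        proxy := httputil.NewSingleHostReverseProxy(backend)
        defaultDirector := proxy.Director
        proxy.Director = func(r *http.Request) {
            defaultDirector(r)
            // The rewrite rule from the summary: without this, origins that
            // check the Host header (the paper gives Facebook and Amazon S3
            // as examples) would reject the forwarded request.
            r.Host = backend.Host
        }

        // Front-end: requests arrive here addressed to the shadow domain.
        log.Fatal(http.ListenAndServe("127.0.0.1:8080", proxy))
    }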
16:33:22 <dcf1> Let me explain how I am mentally categorizing this research
16:34:32 <dcf1> A few years back there was CacheBrowsing (https://censorbib.nymity.ch/#Holowczak2015a) / CDNBrowsing (https://censorbib.nymity.ch/#Zolfaghari2016a). (I always have trouble remembering which of these two papers does what.)
16:35:31 <dcf1> Essentially the idea is you build a mapping of what websites are on what CDNs, and you use a browser extension to front every request with a front domain that is appropriate for that CDN
16:35:56 <dcf1> Of course, it only works for sites that actually are on a CDN.
16:36:53 <dcf1> I see Domain Shadowing as patching this gap in CacheBrowsing: since you create your own CDN domain bindings, it is as if *every* site were on a CDN, and you don't need a site->CDN mapping anymore.
16:38:19 <dcf1> After that, it's mechanically the same: you go directly to the destination web server (not via an HTTP-based tunnel, like meek)
16:39:07 <dcf1> And I believe the security and privacy caveats are the same as with CacheBrowsing
16:39:19 * cohosh nods
16:40:04 <dcf1> One noteworthy difference is that the user pays for their own traffic (since it is using their own CDN bindings), rather than the publisher paying for the traffic.
16:41:17 <agix> so do you think it would make more sense to combine Domain Shadowing with approaches like CacheBrowsing rather than Domain Fronting
16:41:28 <arma2> does this one also work better in places that don't allow domain fronting? that is, if you have to name what site you're going to, in cachebrowsing, you name the publisher's site and then maybe the censor knows they want to block it
16:41:53 <arma2> whereas here you name your own personal nonsense site and the censor has never heard of it
16:42:19 <dcf1> agix: no, that's not what I mean. CacheBrowsing itself is based on domain fronting. It's using preexisting CDN bindings rather than user-created ones, is the critical distinction, I think.
16:42:51 <agix> dcf1 got ya
16:44:43 <dcf1> arma2: yes, I think one use case is as a personal secure-through-obscurity proxy. You use your own domain name for all your domain shadowing needs, and as long as it doesn't get too big, you don't get blocked. Kind of like a very cheap / agile mirror site setup.
16:45:11 <arma2> right. and in that sense, it's one of the few cdn-using variations that doesn't need domain fronting too.
16:45:31 <cohosh> arma2: there's a section in the CDN browsing paper that talks about sni
16:45:34 <arma2> but it's unsatisfying in that it doesn't scale well for that scenario
16:45:41 <dcf1> (Where mentally, I also file mirror sites into the secure-through-obscurity bucket, because they are easy to block once discovered)
16:45:51 <dcf1> well, this paper has a few other interesting variations
16:45:52 <cohosh> it seems that at the time of writing it was not deployed widely
16:46:46 <dcf1> 1. you point your own domain at someone else's domain
16:47:14 <dcf1> 2. you point a nonexistent domain (Section 3.5) at someone else's domain
16:47:55 <dcf1> 3. you point someone else's domain at someone else's else's domain (Section 4.2.2 has www.facebook.com shadowed by www.forbes.com)
16:49:04 <dcf1> Kind of surprising that CDNs will let you claim some random domains and if the traffic manages to get to their edge servers, they'll follow the rules you set up, but the paper gives some technical reasons why things work that way (Section 6.2)
16:49:20 <arma2> yea, that part is fun :)
16:49:40 <cohosh> arma2: so at the time of writing, cache/CDNBrowsing had a built-in assumption that all a censor sees is the CDN and not the domain of that CDN's customer, because things like sni weren't widely used by CDNs yet
16:50:00 <arma2> cohosh: ah ha. so back in cachebrowsing time, everybody domain fronted by default, kinda
16:50:11 <cohosh> they have a section in their paper where they say "The described HTTPS deployments may leak the identity of the customer CDN websites in one of the following ways, enabling low-cost censorship"
16:50:25 <dcf1> cohosh: ah, thanks for that, I was not remembering clearly and thought that they were doing domain fronting, whereas it was actually "domainless" fronting.
16:50:32 <cohosh> and then go on to list sni and certificates and dedicated IPs but then say that a lot of cdns don't use them
16:51:33 <arma2> how did cdns scale back then if they didn't use sni? nobody used https? don't you need to say the sni for them to know what https cert to serve, if many sites are on one ip address?
16:51:42 <cohosh> heh this paper was in 2016, times change so quickly :)
16:52:36 <dcf1> Let's Encrypt launched 2016, that changed the balance a lot
16:53:06 <arma2> oh gosh. so, let's encrypt killed domainless fronting?
16:53:08 <dcf1> I think "nobody used https" is probably the answer
16:53:31 <dcf1> those foes of privacy and security!
16:54:40 <cohosh> lol
16:55:09 <arma2> to get back to the domain shadowing thing: one other piece that really worried me was mushing together every website on the internet into your single obscurity domain, and what that does to browser security
16:55:32 <arma2> since now all the cookies are for the same domain (or they aren't for your domain at all), and all the things that are more complex than cookies
16:56:07 <arma2> an earlier version of the domain shadowing paper said it wasn't a big deal, but i convinced them that actually it kind of was if you wanted this thing to work in practice and to be able to visit arbitrary websites
16:56:37 <dcf1> They talk about cookies in 7.2, but I was vaguely uneasy reading it, as I have a sense that the rules are somehow more complicated than strict subdomain checks.
16:58:01 <cohosh> can you use this technique to set up a general purpose HTTP tunnel by having your shadow domain point to, say, a meek server?
16:58:09 <arma2> right. i guess by our previous observation that 'times change so quickly', deity only knows what the browser isolation rules will be in 4 years, and nobody has "keep this domain shadowing thing working" on their browser security plate
16:58:14 <dcf1> cohosh: yes, I think so.
16:58:49 <dcf1> https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies#domain_attribute
16:59:15 <arma2> cohosh: yes. and this would be a great addition to tor browser, except, each user has to do their own cdn work.
16:59:26 <dcf1> "The Domain attribute specifies which hosts are allowed to receive the cookie. If unspecified, it defaults to the same host that set the cookie, excluding subdomains."
17:00:06 <dcf1> So at least, you have to rewrite cookies so that they specify a Domain attribute, otherwise all the cookies get scoped to shadow.com, is my reading.
17:00:08 <arma2> dcf1: i think the domain shadowing plans involved rewriting the cookies at the browser side, with the little domain shadowing plugin
17:00:13 <arma2> right
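A rough sketch of the kind of browser-side cookie rewriting being discussed here, assuming a scheme (not necessarily the paper's) in which each target domain is mapped to its own subdomain of the shadow domain so that cookies from different sites do not all collapse into one jar for the shadow domain:

    package main

    import (
        "fmt"
        "net/http"
    )

    // rescopeCookies rewrites the Domain attribute of every Set-Cookie header in
    // resp so that a cookie set by targetDomain (e.g. "www.facebook.com") is
    // scoped to a per-target subdomain of the shadow domain
    // (e.g. "www.facebook.com.shadow.example") rather than to the shadow domain
    // itself. Purely illustrative; a real extension would also have to handle
    // Path, Secure, SameSite, and the other attributes dcf1 is uneasy about.
    func rescopeCookies(resp *http.Response, targetDomain, shadowDomain string) {
        cookies := resp.Cookies()
        resp.Header.Del("Set-Cookie")
        for _, c := range cookies {
            c.Domain = targetDomain + "." + shadowDomain
            resp.Header.Add("Set-Cookie", c.String())
        }
    }

    func main() {
        resp := &http.Response{Header: http.Header{}}
        resp.Header.Add("Set-Cookie", "session=abc; Path=/")
        rescopeCookies(resp, "www.facebook.com", "shadow.example")
        fmt.Println(resp.Header["Set-Cookie"])
    }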
17:01:15 <dcf1> We haven't yet talked about how CDN APIs make this practical for live, dynamic browsing
17:01:42 <dcf1> A typical site contains resources from dozens of other sites, and you need CDN bindings for all of them.
17:01:50 <arma2> apparently fastly's binding api is quite fast
17:01:54 <arma2> (so says the paper)
17:02:01 <arma2> i wonder if that's true for most or just few
17:02:05 <cohosh> putting the fast in fastly
17:02:14 <dcf1> But it turns out that CDNs provide APIs that are sufficiently fast (a few seconds) that you can create new bindings on demand, right in the browser extension.
17:03:25 <dcf1> Imagine in Tor Browser, instead of entering a bridge line, the user enters a CDN API key
17:03:56 <dcf1> Bridge domainshadow 192.0.2.5:1 1234123412341234 apikey=abcdabcdabcdabcd
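What the pluggable transport might do with that API key is something like the sketch below. The endpoint, payload fields, and authentication header are entirely hypothetical, invented for illustration; real CDN APIs (Fastly's included) look different, and the domains are placeholders.

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    // createBinding asks a hypothetical CDN API to bind frontDomain to backend
    // and rewrite the Host header to backend, authenticated with the user's key.
    // Endpoint and field names are made up for illustration only.
    func createBinding(apiKey, frontDomain, backend string) error {
        payload, err := json.Marshal(map[string]string{
            "front_domain": frontDomain,
            "backend":      backend,
            "host_rewrite": backend,
        })
        if err != nil {
            return err
        }
        req, err := http.NewRequest("POST", "https://cdn-api.example/v1/bindings", bytes.NewReader(payload))
        if err != nil {
            return err
        }
        req.Header.Set("Authorization", "Bearer "+apiKey)
        req.Header.Set("Content-Type", "application/json")
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return fmt.Errorf("binding failed: %s", resp.Status)
        }
        return nil
    }

    func main() {
        // Values echo the hypothetical bridge line above; all placeholders.
        if err := createBinding("abcdabcdabcdabcd", "random-front.example", "bridge.example"); err != nil {
            log.Fatal(err)
        }
    }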
17:04:46 <arma2> ah so now all they need to do beforehand is make a fastly account?
17:05:00 <dcf1> right
17:05:10 <arma2> intriguing
17:05:15 <dcf1> this is kind of like what we were trying to do with https://github.com/katherinelitor/GAEuploader
17:05:30 <dcf1> searching for ways to make meek more sustainable with a "user-pays" model
17:06:37 <arma2> but in this case it doesn't need domain fronting. though i guess it wouldn't object to having domain fronting work too. and the tor browser could make up a nonsense domain -- wait, the tor browser needs a surprising new domain to resolve to fastly, right?
17:07:00 <arma2> i guess "no, if domain fronting works, you just domain front"
17:08:42 <arma2> seems sad to make fastly handle k*n new bindings, where k is number of users doing this and n is number of domains they visit
17:09:22 <dcf1> well, in the tor browser case, there wouldn't be an n, because we'd tunnel everything
17:09:47 <arma2> ah ha right it's just "send it to the tor bridge" for every user
17:09:48 <dcf1> a bridge wouldn't even have cryptographic access to segregate things by domain
17:10:24 <arma2> and that's a feature
17:10:26 <dcf1> but yes, probably different bindings for every user, and change them every 10 minutes, why not
17:10:45 <arma2> so, fastly costs $50 per month or something. that's an expensive way to reach tor.
17:11:21 <arma2> so, the race is on for finding the cheapest cdn that has quick binding api's and allows fronting and is too big to fail
17:11:25 <dcf1> True, that's within an order of magnitude of a paid VPN though
17:12:04 <dcf1> Also there's cloudflare, which the paper notes doesn't support Host rewriting in the free plan, so you just make the bridge not care about Host
17:12:16 <arma2> oooo
17:12:42 <arlolra> quick binding might not be so important if you're only making one?
17:13:02 <arma2> arlolra: yes, also true. you just need to set up your one binding when you make your account.
17:13:27 <dcf1> yeah, that's the best application of domain shadowing I can think of offhand
17:13:31 <dcf1> create your CDN account
17:13:27 <arma2> does the anti-active-probing trick work when there's only one long-term binding and you ignore Host, etc.?
17:13:59 <dcf1> enter the details into Tor Browser (whether that's an API key or a binding you have manually configured)
17:14:31 <dcf1> the pluggable transport resolves the front-end domain to the CDN edge server and the CDN routes it to a bridge (which can be a centralized component)
17:15:03 <dcf1> arma2: you may be right, this may fall to active probing
17:16:19 <arma2> doing this with cloudflare could be fun because nothing beats free
17:16:50 <arma2> do you... if cloudflare doesn't do domain fronting then what domain do you use?
17:17:25 <dcf1> I was thinking a randomly generated domain
17:17:53 <arma2> do you then need to register it? or no because you already know the ip address to connect to
17:18:01 <dcf1> Maybe even frequently changing, though that's not entirely satisfying
17:18:32 <dcf1> You don't need to register it. You just need to hack the client to "resolve" the name locally and deliver it to the CDN edge server as if it were so registered
17:19:40 <dcf1> "For instance, we have successfully set 5f4dcc3b5aa765d61d8327deb882cf99.com, the MD5 value of the word “password”, as the front-end in Fastly, and connected it to www.facebook.com."
17:19:59 <dcf1> Funnily enough, it seems 5f4dcc3b5aa765d61d8327deb882cf99.com has been registered since then.
17:20:44 <dcf1> The non-existent domain thing, though, is probably one of the easier things for CDNs to prevent, if they wanted to.
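For the non-existent-domain variant, a minimal client-side sketch of "resolving" the name locally might look like the following: dial a known CDN edge address directly and present the made-up name in the SNI and the URL. The edge address and names are placeholders, and certificate handling is glossed over, since no real certificate can cover an unregistered name.

    package main

    import (
        "context"
        "crypto/tls"
        "io"
        "log"
        "net"
        "net/http"
    )

    func main() {
        const (
            edgeAddr  = "192.0.2.10:443"       // known CDN edge address (placeholder)
            frontName = "aa11bb22cc33.example" // made-up, unregistered front-end name (placeholder)
        )

        tr := &http.Transport{
            // Never consult DNS for the made-up name; always dial the edge.
            DialTLSContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
                return tls.Dial("tcp", edgeAddr, &tls.Config{
                    ServerName: frontName,
                    // The edge cannot present a certificate for a name that was
                    // never registered, so a real client would have to verify the
                    // CDN's certificate some other way (e.g. pinning).
                    InsecureSkipVerify: true,
                })
            },
        }

        client := &http.Client{Transport: tr}
        resp, err := client.Get("https://" + frontName + "/")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        body, _ := io.ReadAll(resp.Body)
        log.Printf("read %d bytes via %s", len(body), frontName)
    }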
17:22:45 <maxbee> I think that frequent changing would be a defense against active probing?
17:24:29 <dcf1> It might have to be quite frequent, at least calibrating against the GFW's active probing.
17:24:38 <dcf1> https://github.com/net4people/bbs/issues/22 "The first replay probes usually arrive within seconds of a genuine client connection."
17:26:00 <dcf1> And if the domains are nonexistent, the censor has a highly efficient first-pass filter, which is to try resolving the name that appears in the SNI, and see if it resolves to the destination IP address. Only those TLS connections that fail that test need to be active-probed.
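A sketch of that first-pass filter from the censor's side: resolve the SNI name and check whether the connection's destination address is among the answers; only flows that fail the check need active probing. (In practice CDN DNS answers rotate, so a real censor would have to tolerate some slack; this is just the basic idea.)

    package main

    import (
        "fmt"
        "net"
    )

    // sniMatchesDst reports whether sniName resolves, from this vantage point,
    // to the destination address dstIP of the observed TLS connection.
    func sniMatchesDst(sniName string, dstIP net.IP) bool {
        addrs, err := net.LookupIP(sniName)
        if err != nil {
            return false // NXDOMAIN or lookup failure: immediately suspicious
        }
        for _, a := range addrs {
            if a.Equal(dstIP) {
                return true
            }
        }
        return false
    }

    func main() {
        // Example: SNI "example.com" observed on a connection to 192.0.2.10.
        fmt.Println(sniMatchesDst("example.com", net.ParseIP("192.0.2.10")))
    }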
17:26:31 <maxbee> huh - anecdotally I had been thinking that while the probes might arrive quickly, the blocking didn't usually take effect until something on the order of minutes later, but I could have been misreading scenarios or not being thorough enough w/ our data
17:28:13 <dcf1> no, dynamic blocking can be pretty much immediate, it was like that with ESNI blocking last August too https://github.com/net4people/bbs/issues/43
17:31:18 <maxbee> I guess you could pair this w/ something like Frolov's HTTPT? https://www.usenix.org/system/files/foci20-paper-frolov.pdf but that would be starting to add a lot of different pieces together
17:33:13 <dcf1> yeah that should be possible
17:34:28 <dcf1> it might require the CDN to support WebSocket proxying (Cloudflare does, I don't know if others do)
17:35:30 <dcf1> that's a much more efficient way to construct a meek-like tunnel
17:38:21 <dcf1> feels like we're wrapping up here. any other points to discuss?
17:38:38 <cohosh> not from me :)
17:39:18 <maxbee> nope, just that I really enjoyed the reading and thanks for doing this!
17:40:27 <dcf1> thanks agix for the summary
17:41:02 <cohosh> yeah thanks for joining!
17:41:10 <cohosh> i'll end the meeting here
17:41:14 <cohosh> #endmeeting