16:00:16 <cohosh> #startmeeting tor anti-censorship meeting
16:00:16 <MeetBot> Meeting started Thu Apr 29 16:00:16 2021 UTC. The chair is cohosh. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:16 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:21 <cohosh> hey everyone!
16:00:36 <cohosh> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:01:55 <cohosh> there's not a lot on the agenda for today
16:02:01 <cohosh> but we do have a reading group discussion
16:02:44 <dcf1> are you planning to post the Belarus report?
16:02:54 <cohosh> dcf1: yes!
16:03:12 <cohosh> i was just going to post it to the existing belarus thread on net4people
16:03:16 <cohosh> and maybe make a blog post
16:03:20 <cohosh> and put it on ntc.party
16:03:31 <dcf1> better make a new thread imo, it's distinct enough
16:03:40 <cohosh> ah okay, good call
16:04:10 <dcf1> I only made a grab-bag Belarus thread because I didn't have time to sort through it all
16:04:54 <cohosh> good to know, i think there is an existing thread on ntc.party on tor blocking specifically that i can post to there
16:05:34 <agix> hi
16:05:40 <cohosh> hey agix!
16:06:11 <cohosh> okay our first announcement for the day is that tor is switching our default git branches to main
16:06:28 <cohosh> i created a ticket for it at anti-censorship/team#6
16:06:46 <cohosh> there are tickets that other teams have made as well that are linked there
16:07:26 <cohosh> we have a lot of repositories and many are maintained by volunteers so i don't want to step on people's toes there
16:08:14 <cohosh> feel free to comment on the ticket if you have opinions or feedback
16:08:49 <cohosh> for many repositories it will not be a big deal to change
16:10:01 <cohosh> i'll probably send an email to the anti-censorship team list about it as well
16:10:13 <dcf1> I'm curious why the ticket prioritizes GitLab repos, rather than git.tpo ones
16:10:31 <dcf1> Since there's a big warning on gitlab pages saying that it's not meant as a primary code repository, or something
16:11:06 <dcf1> "GitLab is used only for code review, issue tracking and project management. Canonical locations for source code are still https://gitweb.torproject.org/ https://git.torproject.org/ and git-rw.torproject.org."
16:11:43 <cohosh> hm, yeah you're right
16:12:11 <cohosh> when i was going through the repos i noticed many were on gitlab and not git.tpo
16:12:52 <dcf1> yeah the warning banner never stood a chance
16:13:16 <cohosh> it's also the case that all the repos the org maintains are on gitlab
16:13:53 <cohosh> and while there are many anti-censorship repos not listed in this ticket, they are either completely maintained by people outside TPI or haven't been updated in years
16:14:01 <dcf1> at least a fork/mirror is on gitlab for active projects, because of merge requests
16:14:07 <cohosh> yeah
16:14:16 <cohosh> it's also a bit disorganized there though
16:14:27 <cohosh> most of our mirrors are git.tpo --> gitlab (like snowflake)
16:14:39 <cohosh> but i believe there are some gitlab --> git.tpo mirrors
16:14:44 <cohosh> and these are not well documented
16:15:03 <dcf1> oh bummer
16:15:28 <cohosh> perhaps we should document all of these in the team wiki
16:15:49 <cohosh> actually yes we certainly should
16:16:08 <cohosh> because i am just keeping in my mind which place to push to all the time
16:16:21 <cohosh> pushing to the wrong place breaks the mirror
16:16:34 * cohosh adds better repo documentation to the list of todos
16:17:10 <cohosh> as far as this change goes, i was not planning on updating the default branch for really old, discontinued repositories
16:17:25 <cohosh> unless they start receiving active development
16:17:42 <cohosh> but i'm happy to do that as well
16:19:29 <cohosh> okay let's move on and we can continue to sort out details in the ticket and emails
16:19:47 <cohosh> april is ending and it's time for our monthly report: https://pad.riseup.net/p/3v7ixS9pkbAdtS7TBxYW
16:20:04 <cohosh> i've started it and feel free to add items you think should be there
16:20:54 <cohosh> any reviews or needs-help-with items for this week?
16:21:12 <agix> not from me
16:21:37 <cohosh> agix: i just saw your comment on bridgedb#27984 and i'll merge that after the meeting :)
16:21:49 <agix> cohosh thanks :)
16:22:23 <dcf1> I am scheduled to do AMP cache rendezvous in about 8 weeks, so I will be watching / helping with snowflake#29293
16:22:59 <cohosh> dcf1: ah awesome \o/
16:23:00 <dcf1> wrt requirements engineering
16:23:23 <cohosh> i am actually planning on getting to that this week heh
16:23:43 <cohosh> if you ever need me to move something up my priority list i am happy to, fwiw
16:24:59 <cohosh> i could use a review for the moat shim bridgedb#32276 but i can ask meskio (our new developer) to review it next week as well :)
16:26:11 <cohosh> should we move on to our reading group?
16:26:58 <dcf1> i'm ready
16:28:00 <cohosh> cool let's start
16:28:24 <cohosh> our reading for this week was on Domain Shadowing: Leveraging Content Delivery Networks for Robust Blocking-Resistant Communications
16:28:33 <cohosh> link to paper: https://www.usenix.org/system/files/sec21fall-wei.pdf
16:28:44 <cohosh> anyone have a summary prepared?
16:28:52 <agix> yep, i got a short one
16:28:58 <cohosh> agix: awesome!
16:29:10 <agix> <summary>
16:29:23 <agix> This paper presents a novel censorship evasion technique called Domain Shadowing, which takes advantage of the fact that CDNs allow their customers to bind their front-end domain to any back-end domain.
16:29:38 <agix> A user only needs to register a new domain with a CDN service that is accessible from the censored country and bind that domain to the actual target domain, in other words the censored domain he/she wants to visit. Within the CDN user account, a rule needs to be specified that rewrites the Host header of the incoming requests to the target domain, otherwise the requests will be rejected.
16:29:51 <agix> Once these steps have been established, the user sends a request to the registered domain from within the censored area. The request will be sent to the CDN, where the Host header will be rewritten according to the specified rule and the request will be forwarded to the target domain. The subsequent response will be delivered under the user's registered domain name.
16:30:10 <agix> During this process, a censor sees only an HTTPS connection to the CDN requesting the previously registered user domain and thus will not block the connection.
16:30:22 <agix> Additionally, the author proposes the use of DfDs, which combines the efforts of domain fronting and domain shadowing for an enhanced blocking-resistant evasion technique.
16:30:30 <agix> </summary>
16:30:49 <dcf1> thanks
16:31:09 <dcf1> "a rule needs to be specified that rewrites the Host header of the incoming requests to the target domain, otherwise the requests will be rejected."
16:31:46 <dcf1> I think this is not true in general (i.e., it's not an HTTP rule or anything), but it is true for various important sites, the paper gives Facebook and Amazon S3 as examples
16:32:35 <agix> good point, thanks!
16:33:22 <dcf1> Let me explain how I am mentally categorizing this research
16:34:32 <dcf1> A few years back there was CacheBrowsing (https://censorbib.nymity.ch/#Holowczak2015a) / CDNBrowsing (https://censorbib.nymity.ch/#Zolfaghari2016a). (I always have trouble remembering which of these two papers does what.)
16:35:31 <dcf1> Essentially the idea is you build a mapping of what websites are on what CDNs, and you use a browser extension to front every request with a front domain that is appropriate for that CDN
16:35:56 <dcf1> Of course, it only works for sites that actually are on a CDN.
16:36:53 <dcf1> I see Domain Shadowing as patching this gap in CacheBrowsing: since you create your own CDN domain bindings, it is as if *every* site were on a CDN, and you don't need a site->CDN mapping anymore.
16:38:19 <dcf1> After that, it's mechanically the same: you go directly to the destination web server (not via an HTTP-based tunnel, like meek)
16:39:07 <dcf1> And I believe the security and privacy caveats are the same as with CacheBrowsing
16:39:19 * cohosh nods
16:40:04 <dcf1> One noteworthy difference is that the user pays for their own traffic (since it is using their own CDN bindings), rather than the publisher paying for the traffic.
16:41:17 <agix> so do you think it would make more sense to combine Domain Shadowing with approaches like CacheBrowsing rather than Domain Fronting?
16:41:28 <arma2> does this one also work better in places that don't allow domain fronting? that is, if you have to name what site you're going to, in cachebrowsing, you name the publisher's site and then maybe the censor knows they want to block it
16:41:53 <arma2> whereas here you name your own personal nonsense site and the censor has never heard of it
16:42:19 <dcf1> agix: no, that's not what I mean. CacheBrowsing itself is based on domain fronting. It's using preexisting CDN bindings rather than user-created ones, is the critical distinction, I think.
16:42:51 <agix> dcf1 got ya
16:44:43 <dcf1> arma2: yes, I think one use case is as a personal secure-through-obscurity proxy. You use your own domain name for all your domain shadowing needs, and as long as it doesn't get too big, you don't get blocked. Kind of like a very cheap / agile mirror site setup.
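For concreteness, here is a minimal Go sketch of the client side of the flow summarized above. It assumes a hypothetical shadow domain, shadow.example.com, has already been registered with a CDN and bound to a censored back-end site, with a rule that rewrites the Host header; all the interesting work happens in that CDN configuration, and on the wire the client only ever names the shadow domain (in DNS, SNI, and the encrypted Host header).

    // Minimal sketch of the client side of domain shadowing.
    // shadow.example.com is a made-up name assumed to already be bound to a
    // censored back-end domain in a CDN account, with the CDN configured to
    // rewrite the Host header to that back-end.
    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        // An ordinary HTTPS request to the shadow domain; the CDN edge
        // terminates TLS, rewrites Host to the censored target, fetches the
        // content, and returns it under the shadow domain.
        resp, err := http.Get("https://shadow.example.com/")
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()
        body, err := io.ReadAll(resp.Body)
        if err != nil {
            panic(err)
        }
        fmt.Printf("%s, %d bytes\n", resp.Status, len(body))
    }

The client needs no special protocol support; the censor-visible part of the connection is indistinguishable from a visit to the (unknown) shadow domain.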
16:45:11 <arma2> right. and in that sense, it's one of the few cdn-using variations that doesn't need domain fronting too.
16:45:31 <cohosh> arma2: there's a section in the CDN browsing paper that talks about sni
16:45:34 <arma2> but it's unsatisfying in that it doesn't scale well for that scenario
16:45:41 <dcf1> (Where mentally, I also file mirror sites into the secure-through-obscurity bucket, because they are easy to block once discovered)
16:45:51 <dcf1> well, this paper has a few other interesting variations
16:45:52 <cohosh> it seems that at the time of writing it was not deployed widely
16:46:46 <dcf1> 1. you point your own domain at someone else's domain
16:47:14 <dcf1> 2. you point a nonexistent domain (Section 3.5) at someone else's domain
16:47:55 <dcf1> 3. you point someone else's domain at someone else's else's domain (Section 4.2.2 has www.facebook.com shadowed by www.forbes.com)
16:49:04 <dcf1> Kind of surprising that CDNs will let you claim some random domains and if the traffic manages to get to their edge servers, they'll follow the rules you set up, but the paper gives some technical reasons why things work that way (Section 6.2)
16:49:20 <arma2> yea, that part is fun :)
16:49:40 <cohosh> arma2: so at the time of writing, cache/CDNBrowsing had an assumption built in that all a censor sees is the CDN and not the customer of that CDN's domain, because things like sni weren't widely used by CDNs yet
16:50:00 <arma2> cohosh: ah ha. so back in cachebrowsing time, everybody domain fronted by default, kinda
16:50:11 <cohosh> they have a section in their paper where they say "The described HTTPS deployments may leak the identity of the customer CDN websites in one of the following ways, enabling low-cost censorship"
16:50:25 <dcf1> cohosh: ah, thanks for that, I was not remembering clearly and thought that they were doing domain fronting, whereas it was actually "domainless" fronting.
16:50:32 <cohosh> and then go on to list sni and certificates and dedicated IPs, but then say that a lot of cdns don't use them
16:51:33 <arma2> how did cdns scale back then if they didn't use sni? nobody used https? don't you need to say the sni for them to know what https cert to serve, if many sites are on one ip address?
16:51:42 <cohosh> heh this paper was in 2016, times change so quickly :)
16:52:36 <dcf1> Let's Encrypt launched in 2016, that changed the balance a lot
16:53:06 <arma2> oh gosh. so, let's encrypt killed domainless fronting?
16:53:08 <dcf1> I think "nobody used https" is probably the answer
16:53:31 <dcf1> those foes of privacy and security!
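As a side note on the "domainless" fronting discussed just above: a rough Go sketch of what such a connection looks like on the wire, namely a TLS handshake with a CDN edge address that carries no SNI at all. The edge address 192.0.2.10 is a placeholder, certificate verification is skipped purely to keep the sketch short, and whether a given CDN still accepts and usefully routes SNI-less connections today is exactly the open question in the exchange above.

    // Sketch of a "domainless" TLS connection: dial a CDN edge IP directly
    // and send no SNI, so the censor observes no domain name at all.
    package main

    import (
        "crypto/tls"
        "fmt"
    )

    func main() {
        conn, err := tls.Dial("tcp", "192.0.2.10:443", &tls.Config{
            // With an IP literal as the address and no ServerName set,
            // crypto/tls omits the SNI extension entirely.
            // Verification is skipped only for brevity in this sketch.
            InsecureSkipVerify: true,
        })
        if err != nil {
            panic(err)
        }
        defer conn.Close()
        fmt.Println("TLS handshake complete; no SNI was sent")
    }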
16:54:40 <cohosh> lol
16:55:09 <arma2> to get back to the domain shadowing thing: one other piece that really worried me was mushing together every website on the internet into your single obscurity domain, and what that does to browser security
16:55:32 <arma2> since now all the cookies are for the same domain (or they aren't for your domain at all), and all the things that are more complex than cookies
16:56:07 <arma2> an earlier version of the domain shadowing paper said it wasn't a big deal, but i convinced them that actually it kind of was if you wanted this thing to work in practice and to be able to visit arbitrary websites
16:56:37 <dcf1> They talk about cookies in 7.2, but I was vaguely uneasy reading it, as I have a sense that the rules are somehow more complicated than strict subdomain checks.
16:58:01 <cohosh> can you use this technique to set up a general-purpose HTTP tunnel by having your shadow domain point to, say, a meek server?
16:58:09 <arma2> right. i guess by our previous observation that 'times change so quickly', deity only knows what the browser isolation rules will be in 4 years, and nobody has "keep this domain shadowing thing working" on their browser security plate
16:58:14 <dcf1> cohosh: yes, I think so.
16:58:49 <dcf1> https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies#domain_attribute
16:59:15 <arma2> cohosh: yes. and this would be a great addition to tor browser, except, each user has to do their own cdn work.
16:59:26 <dcf1> "The Domain attribute specifies which hosts are allowed to receive the cookie. If unspecified, it defaults to the same host that set the cookie, excluding subdomains."
17:00:06 <dcf1> So at least, you have to rewrite cookies so that they specify a Domain attribute, otherwise all the cookies get scoped to shadow.com, is my reading.
17:00:08 <arma2> dcf1: i think the domain shadowing plans involved rewriting the cookies at the browser side, with the little domain shadowing plugin
17:00:13 <arma2> right
17:01:15 <dcf1> We haven't yet talked about how CDN APIs make this practical for live, dynamic browsing
17:01:42 <dcf1> A typical site contains resources from dozens of other sites, and you need CDN bindings for all of them.
17:01:50 <arma2> apparently fastly's binding api is quite fast
17:01:54 <arma2> (so says the paper)
17:02:01 <arma2> i wonder if that's true for most or just a few
17:02:05 <cohosh> putting the fast in fastly
17:02:14 <dcf1> But it turns out that CDNs provide APIs that are sufficiently fast (a few seconds) that you can create new bindings on demand, right in the browser extension.
17:03:25 <dcf1> Imagine in Tor Browser, instead of entering a bridge line, the user enters a CDN API key
17:03:56 <dcf1> Bridge domainshadow 192.0.2.5:1 1234123412341234 apikey=abcdabcdabcdabcd
17:04:46 <arma2> ah so now all they need to do beforehand is make a fastly account?
17:05:00 <dcf1> right
17:05:10 <arma2> intriguing
17:05:15 <dcf1> this is kind of like what we were trying to do with https://github.com/katherinelitor/GAEuploader
17:05:30 <dcf1> searching for ways to make meek more sustainable with a "user-pays" model
17:06:37 <arma2> but in this case it doesn't need domain fronting. though i guess it wouldn't object to having domain fronting work too. and the tor browser could make up a nonsense domain -- wait, the tor browser needs a surprising new domain to resolve to fastly, right?
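A rough sketch of the "create bindings on demand with an API key" idea above, in Go. The Fastly endpoint path, the Fastly-Key header, and all identifiers below (service ID, version, the shadow domain name) are assumptions made for illustration and would need to be checked against Fastly's current API documentation; the point is only that a single HTTPS call, reportedly taking a few seconds, is enough to bind a new front-end domain, which is the step a hypothetical "domainshadow" pluggable transport would automate.

    // Hypothetical helper that binds a new front-end (shadow) domain to an
    // existing CDN service using the customer's API key. Endpoint path and
    // parameters are illustrative assumptions, not a tested integration.
    package main

    import (
        "fmt"
        "net/http"
        "net/url"
        "strings"
    )

    func addShadowDomain(apiKey, serviceID, version, domain string) error {
        endpoint := fmt.Sprintf("https://api.fastly.com/service/%s/version/%s/domain",
            serviceID, version)
        form := url.Values{"name": {domain}}
        req, err := http.NewRequest("POST", endpoint, strings.NewReader(form.Encode()))
        if err != nil {
            return err
        }
        req.Header.Set("Fastly-Key", apiKey)
        req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            return fmt.Errorf("unexpected status: %s", resp.Status)
        }
        return nil
    }

    func main() {
        // Placeholder values; a real client would take these from its
        // bridge line or configuration, then activate the service version.
        if err := addShadowDomain("abcdabcdabcdabcd", "SERVICE_ID", "1",
            "example-shadow-12345.com"); err != nil {
            panic(err)
        }
        fmt.Println("shadow domain bound")
    }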
17:07:00 <arma2> i guess "no, if domain fronting works, you just domain front"
17:08:42 <arma2> seems sad to make fastly handle k*n new bindings, where k is the number of users doing this and n is the number of domains they visit
17:09:22 <dcf1> well, in the tor browser case, there wouldn't be an n, because we'd tunnel everything
17:09:47 <arma2> ah ha right it's just "send it to the tor bridge" for every user
17:09:48 <dcf1> a bridge wouldn't even have cryptographic access to segregate things by domain
17:10:24 <arma2> and that's a feature
17:10:26 <dcf1> but yes, probably different bindings for every user, and change them every 10 minutes, why not
17:10:45 <arma2> so, fastly costs $50 per month or something. that's an expensive way to reach tor.
17:11:21 <arma2> so, the race is on for finding the cheapest cdn that has quick binding apis and allows fronting and is too big to fail
17:11:25 <dcf1> True, that's within an order of magnitude of a paid VPN though
17:12:04 <dcf1> Also there's cloudflare, which the paper notes does not support Host rewriting in the free plan, so you just make the bridge not care about Host
17:12:16 <arma2> oooo
17:12:42 <arlolra> quick binding might not be so important if you're only making one?
17:13:02 <arma2> arlolra: yes, also true. you just need to set up your one binding when you make your account.
17:13:27 <dcf1> yeah, that's the best application of domain shadowing I can think of offhand
17:13:31 <dcf1> create your CDN account
17:13:36 <arma2> does the anti-active-probing trick work when there's only one long-term binding and you ignore host, etc?
17:13:59 <dcf1> enter the details into Tor Browser (whether that's an API key or a binding you have manually configured)
17:14:31 <dcf1> the pluggable transport resolves the front-end domain to the CDN edge server and the CDN routes it to a bridge (which can be a centralized component)
17:15:03 <dcf1> arma2: you may be right, this may fall to active probing
17:16:19 <arma2> doing this with cloudflare could be fun because nothing beats free
17:16:50 <arma2> do you... if cloudflare doesn't do domain fronting then what domain do you use?
17:17:25 <dcf1> I was thinking a randomly generated domain
17:17:53 <arma2> do you then need to register it? or no, because you already know the ip address to connect to
17:18:01 <dcf1> Maybe even frequently changing, though that's not entirely satisfying
17:18:32 <dcf1> You don't need to register it. You just need to hack the client to "resolve" the name locally and deliver it to the CDN edge server as if it were so registered
17:19:40 <dcf1> "For instance, we have successfully set 5f4dcc3b5aa765d61d8327deb882cf99.com, the MD5 value of the word “password”, as the front-end in Fastly, and connected it to www.facebook.com."
17:19:59 <dcf1> Funnily enough, it seems 5f4dcc3b5aa765d61d8327deb882cf99.com has been registered since then.
17:20:44 <dcf1> The non-existent domain thing, though, is probably one of the easier things for CDNs to prevent, if they wanted to.
17:22:45 <maxbee> I think that frequent changing would be a defense against active probing?
17:24:29 <dcf1> It might have to be quite frequent, at least calibrating against the GFW's active probing.
17:24:38 <dcf1> https://github.com/net4people/bbs/issues/22 "The first replay probes usually arrive within seconds of a genuine client connection."
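A sketch of the nonexistent-front-end trick quoted above, in Go: the client never resolves the made-up name through DNS, it dials a CDN edge address directly and presents the name in the SNI and Host header as if it were registered. The edge address and the front-end name are placeholders, and the certificate question is waved away with InsecureSkipVerify for brevity, since the edge generally cannot hold a valid certificate for a name nobody owns.

    // Sketch: "resolve" an unregistered front-end name locally by dialing a
    // CDN edge address directly, then present the name in SNI and Host.
    package main

    import (
        "bufio"
        "crypto/tls"
        "fmt"
        "net/http"
    )

    func main() {
        edgeAddr := "192.0.2.20:443"               // placeholder CDN edge address
        frontEnd := "example-nonexistent-f3a9.com" // made-up, unregistered name

        conn, err := tls.Dial("tcp", edgeAddr, &tls.Config{
            ServerName: frontEnd, // carried in the SNI extension
            // No valid certificate can exist for an unregistered name;
            // verification concerns are out of scope for this sketch.
            InsecureSkipVerify: true,
        })
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        req, err := http.NewRequest("GET", "https://"+frontEnd+"/", nil)
        if err != nil {
            panic(err)
        }
        req.Host = frontEnd // Host header matches the shadow binding
        if err := req.Write(conn); err != nil {
            panic(err)
        }
        resp, err := http.ReadResponse(bufio.NewReader(conn), req)
        if err != nil {
            panic(err)
        }
        fmt.Println("status:", resp.Status)
    }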
17:26:00 <dcf1> And if the domains are nonexistent, the censor has a highly efficient first-pass filter, which is to try resolving the name that appears in the SNI, and see if it resolves to the destination IP address. Only those TLS connections that fail that test need to be active-probed.
17:26:31 <maxbee> huh - anecdotally I had been thinking that while the probes might arrive quickly, the blocking didn't usually take place for something like on the order of minutes, but I could have been misreading scenarios or not being thorough enough w/ our data
17:28:13 <dcf1> no, dynamic blocking can be pretty much immediate, it was like that with ESNI blocking last august too https://github.com/net4people/bbs/issues/43
17:31:18 <maxbee> I guess you could pair this w/ something like Frolov's HTTPT? https://www.usenix.org/system/files/foci20-paper-frolov.pdf but that would be starting to add a lot of different pieces together
17:33:13 <dcf1> yeah that should be possible
17:34:28 <dcf1> it might require the CDN to support WebSocket proxying (Cloudflare does, I don't know if others do)
17:35:30 <dcf1> that's a much more efficient way to construct a meek-like tunnel
17:38:21 <dcf1> feels like we're wrapping up here. any other points to discuss?
17:38:38 <cohosh> not from me :)
17:39:18 <maxbee> nope, just that I really enjoyed the reading and thanks for doing this!
17:40:27 <dcf1> thanks agix for the summary
17:41:02 <cohosh> yeah thanks for joining!
17:41:10 <cohosh> i'll end the meeting here
17:41:14 <cohosh> #endmeeting
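For reference, the first-pass filter described in the active-probing discussion is simple enough to sketch in a few lines of Go: resolve the name seen in the SNI and check whether the observed destination IP is among the answers. The inputs below are placeholders; in practice DNS answers for CDN-hosted names vary by resolver and time, so a real filter would need some tolerance, but a nonexistent name fails the check trivially.

    // Sketch of a censor-side first-pass check: does the SNI name resolve
    // to the destination IP the client actually connected to? Connections
    // that fail the check would be candidates for active probing.
    package main

    import (
        "fmt"
        "net"
    )

    // sniMatchesDestination reports whether sniName resolves to dstIP.
    func sniMatchesDestination(sniName string, dstIP net.IP) (bool, error) {
        ips, err := net.LookupIP(sniName)
        if err != nil {
            // NXDOMAIN or other failure: the name does not even resolve,
            // which is itself suspicious for a nonexistent front-end.
            return false, err
        }
        for _, ip := range ips {
            if ip.Equal(dstIP) {
                return true, nil
            }
        }
        return false, nil
    }

    func main() {
        ok, err := sniMatchesDestination("example.com", net.ParseIP("192.0.2.20"))
        fmt.Println(ok, err)
    }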