15:58:23 <cohosh> #startmeeting anti-censorship meeting
15:58:23 <MeetBot> Meeting started Thu Oct  1 15:58:23 2020 UTC.  The chair is cohosh. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:58:23 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:58:30 <cohosh> hi
15:58:49 <dcf1> hi
15:58:58 <user_gfw-report> hi
15:58:59 <cohosh> this is the weekly anti-censorship meeting and here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
15:59:31 <agix> hi
15:59:55 <cohosh> looks like there isn't much on the agenda for today
16:00:24 <phw> o/
16:00:31 <hanneloresx> hi
16:00:41 <cohosh> phw: shall we jump to your discussion point?
16:00:47 <phw> yes
16:01:01 <phw> the problem is that the terminology we use for bridges is a bit confusing
16:01:15 <phw> for example, if a bridge doesn't report itself to bridgedb, we call it "private"
16:01:30 <phw> but that may suggest that bridges who do report themselves aren't private, which isn't exactly true
16:01:53 <phw> it would be helpful to come up with terms that are more clear
16:02:26 <phw> here's the corresponding ticket: https://gitlab.torproject.org/tpo/anti-censorship/rdsys/-/issues/8
16:03:16 <phw> i figured it may be helpful to brainstorm this for a few minutes
16:03:39 <phw> i like "built-in" instead of "default" bridges
16:04:00 <cohosh> what are exclusive bridges again?
16:04:12 <gaba> +1 built-in
16:04:13 <cohosh> are these ones that are sorted by bridgedb into a bucket we hand out manually?
16:05:28 <phw> cohosh: yes, we call these "reserved": https://bridges.torproject.org/info
16:06:56 <phw> as far as users are concerned, "reserved" is the same as "private" (since neither are distributed by bridgedb), so we may want to abandon the term "reserved"
16:08:14 <cohosh> as far as bridge operators are concerned they're different though right?
16:08:24 <cohosh> where private means bridgedb doesn't know about it
16:08:28 <cohosh> and you hand it out yourself
16:08:41 <cohosh> and reserved means that bridgedb does know about it and tor hands it out
16:09:00 <cohosh> maybe i'm also getting confused by terms lol
16:09:26 <phw> correct, but the motivation to improve our terminology is to make it easier for users
16:10:08 <phw> cc antonela
16:10:27 <cohosh> where bridge operators aren't users?
16:11:27 <phw> bridge operators tend to be technical and have an easier time navigating the different terms that we use. it's not that easy for users
16:12:23 <phw> users understand that one needs a bridge to reach the tor network if it's blocked
16:12:49 <phw> but we should minimise the confusion around different types of bridges
16:13:25 <cohosh> okay that makes sense
16:14:09 <hanneloresx> is this just a matter of changing the user-facing info page then or is it also changing terms throughout documentation/code?
16:14:30 <phw> hanneloresx: both, but mostly the latter
16:15:50 <phw> ideally we would come up with less confusing terms that we use consistently, across all user-facing components (tor browser, documentation, web site, etc)
16:17:48 <phw> anyway, we won't fix this today
16:17:58 <phw> if y'all have any ideas, please comment on the ticket!
16:18:15 <cohosh> thanks phw!
16:18:48 <cohosh> okay I guess we can assign reviews now
16:19:34 <cohosh> i've got snowflake!12
16:19:43 <cohosh> and snowflake-webext!4
16:20:14 <cohosh> these are for snowflake#33157
16:20:47 <phw> dcf1: are you interested in reviewing any of these? i'll pick whatever you don't want to review
16:21:13 <dcf1> I understand they are essentially the same, right? One in Go and one in JavaScript.
16:21:17 <cohosh> dcf1: i'll look at snowflake#30510 again to make sure i have access
16:21:32 <cohosh> dcf1: yeah they are the same feature
16:21:57 <dcf1> I am looking at snowflake!12 now and I will comment something today
16:22:43 <cohosh> thanks!
16:23:03 <phw> i'll take snowflake-webext!4 then
16:23:20 <phw> no review left behind
16:23:46 <cohosh> hehe thanks
16:24:03 <cohosh> alright any other discussion?
16:24:11 <phw> not from me
16:25:24 <cohosh> cool, it looks like we have reading group today
16:26:17 <cohosh> to discuss https://www.usenix.org/conference/foci20/presentation/anonymous
16:27:33 <user_gfw-report> Here is a summary on the paper by dcf1: https://github.com/net4people/bbs/issues/47
16:28:29 <user_gfw-report> I am here for any questions/comments/feedback.
16:28:34 <cohosh> nice!
16:28:41 <cohosh> welcome user_gfw-report!
16:28:52 <dcf1> So there have been many papers analyzing the GFW's practice of DNS injection
16:29:18 <dcf1> Partly because it is comparatively easy to analyze from the outside
16:30:06 <dcf1> One of the common themes in papers going back to, I don't know, 2010 or earlier, is that the poisoned IP addresses are drawn from a fixed pool that changes over time
16:30:47 <dcf1> This paper discovers something new and major about GFW DNS injection: there is more than one injector, and the different ones handle different subsets of domain names
16:31:24 <dcf1> And the different injectors have radically different network fingerprints, though they all appear to be co-located at the level of router hops
16:32:11 <dcf1> So when you do a DNS query, you may get 1, 2, or even 3 injected responses, one from each injector acting independently
16:32:30 <dcf1> But it varies depending on what network you query from
16:33:05 <user_gfw-report> > there is more than one injector
16:33:05 <user_gfw-report> Yes, this is summarized in Figure 9: https://www.usenix.org/system/files/foci20-paper-anonymous_0.pdf#page=6
16:33:05 <user_gfw-report> > you may get 1, 2, or even 3 injected responses
16:33:05 <user_gfw-report> Yes, in one or two subnets, we got up to 5 responses
16:33:14 <dcf1> Another interesting outcome of this research is that they caught the moment that the pool of poison IP addresses changed. One day it decreased in size from 1510 to 216
16:35:43 <dcf1> I have heard it said through the grapevine that a large fraction of poison IP addresses belong to organizations like Facebook. I might be mistaken, but this may be the first peer-reviewed paper to document that curious fact.
16:36:00 <cohosh> that's cool
16:37:44 <dcf1> Compare Figure 4(a) from this paper with Figure 4 from a 2014 study:
16:37:49 <dcf1> https://www.usenix.org/system/files/foci20-paper-anonymous_0.pdf#page=5
16:37:53 <dcf1> https://www.usenix.org/system/files/conference/foci14/foci14-anonymous.pdf#page=5
16:38:29 <dcf1> Clearly similar behavior, but in the 2014 one they didn't realize they were looking at only one of many injector processes (or maybe there only was one back then)
16:39:41 <cohosh> i wonder why the three fingerprints for the current injectors are so different
16:40:23 <dcf1> That's my question too. user_gfw-report, do you have any speculation? I wondered if perhaps the GFW takes bids from different contractors who supply them with hardware to host, or something like that.
16:41:00 <dcf1> Could be one is a legacy system they are in the process of replacing, and they run them concurrently for now
16:41:36 <dcf1> Maybe different sub-bureaus that have mandates to block different subject matter, and they all share access to a network tap but do not cooperate
16:41:43 <user_gfw-report> Yes, we speculate that is likely because there are different contractors: https://github.com/net4people/bbs/issues/47#issuecomment-685836862
16:42:36 <user_gfw-report> Also, it lets the censors avoid a single point of failure.
16:43:04 <arma2> in this case, it seems like non-uniformity of censorship leads to more confusion, which leads to users not feeling confident about getting around the blocking (which is a feature for the censors)
16:43:51 <dcf1> I wonder if it would be possible to identify differences in DNS parsing for the different injectors. Find a domain name that is detected by 2 or more, then construct a tricky DNS query packet that is parsed by one but not the other
16:44:45 <user_gfw-report> That would be interesting. BTW, www.google.sm is the one that could trigger all three injectors.
16:44:47 <dcf1> Like, maybe one of them does a grep in the packet, so it would trigger even if the domain name pattern appeared in EDNS(0) padding, or something like that, while another only looks in the QUESTION section.
16:45:04 <dcf1> Or try something with name compression so that the literal pattern is broken up in the packet.
16:45:07 <studentmain> arma2: We Chinese users already have so much confusion that it won't be a problem...
16:45:35 <DuckSoft> One notable feature is that the polluted DNS responses contain very unreasonable resolution results: something like 127.0.0.1 and ::1
16:46:02 <user_gfw-report> And maybe one aspect to check the internal parsing logic is whether they parse pointers (correctly): https://gfw.report/blog/gfw_looking_glass/en/
16:46:19 <dcf1> yeah that's what I mean by name compression
16:47:42 <dcf1> Or maybe some only look at the first QUESTION and some look at all of them. (While I was learning about DNS recently, I learned that queries with more than one question (QDCOUNT>1) are not really well-defined.)
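The tricky query packets dcf1 describes (a domain name split up by name compression, more than one QUESTION, or the pattern hidden in EDNS(0) padding) can be sketched as below. This is a minimal, hypothetical Python sketch: the function names are invented for illustration, it only builds and decodes bytes without sending anything, and it is not the list of constructions dcf1 offered to share. Whether any injector actually parses these differently is exactly the open question. It uses www.google.sm because user_gfw-report reports that name triggers all three injectors. Because a compression pointer may only refer to a prior occurrence of a name, the first question cannot usefully point anywhere, so the split is done by having the second question point back into the first.

```python
import struct

# Hypothetical helpers for illustration; not from the paper or any DNS library.
QTYPE_A = 1
QCLASS_IN = 1
OPT_TYPE = 41        # EDNS(0) OPT pseudo-RR type (RFC 6891)
PADDING_OPTION = 12  # EDNS(0) Padding option code (RFC 7830)


def encode_name(name: str) -> bytes:
    """Encode a domain name as uncompressed length-prefixed labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def header(txid: int, qdcount: int, arcount: int = 0) -> bytes:
    """12-byte DNS header with only the RD (recursion desired) flag set."""
    return struct.pack("!HHHHHH", txid, 0x0100, qdcount, 0, 0, arcount)


def question(name_bytes: bytes) -> bytes:
    return name_bytes + struct.pack("!HH", QTYPE_A, QCLASS_IN)


def split_name_query(txid: int, prefix: str, suffix: str) -> bytes:
    """QDCOUNT=2: Q1 asks for `suffix`; Q2 is the `prefix` labels followed by
    a compression pointer to Q1's name, so Q2's full name never appears as
    one contiguous byte string in the packet."""
    q1 = question(encode_name(suffix))
    labels = b"".join(bytes([len(l)]) + l.encode("ascii") for l in prefix.split("."))
    pointer = struct.pack("!H", 0xC000 | 12)  # Q1's name starts right after the header
    return header(txid, 2) + q1 + question(labels + pointer)


def padded_query(txid: int, asked: str, hidden: str) -> bytes:
    """Ask for `asked`, but carry `hidden` as the (normally all-zero)
    payload of an EDNS(0) Padding option in the additional section."""
    data = hidden.encode("ascii")
    option = struct.pack("!HH", PADDING_OPTION, len(data)) + data
    # OPT RR: root name, TYPE=41, CLASS=UDP payload size, TTL=0, RDLENGTH, RDATA
    opt_rr = b"\x00" + struct.pack("!HHIH", OPT_TYPE, 4096, 0, len(option)) + option
    return header(txid, 1, arcount=1) + question(encode_name(asked)) + opt_rr


def decode_name(pkt: bytes, off: int) -> tuple[str, int]:
    """Decode a possibly compressed name; return (name, offset just after it)."""
    labels, end, hops = [], None, 0
    while True:
        length = pkt[off]
        if length & 0xC0 == 0xC0:  # compression pointer
            if end is None:
                end = off + 2
            off = ((length & 0x3F) << 8) | pkt[off + 1]
            hops += 1
            if hops > 16:
                raise ValueError("pointer loop")
            continue
        if length == 0:
            return ".".join(labels), (end if end is not None else off + 1)
        labels.append(pkt[off + 1 : off + 1 + length].decode("ascii"))
        off += 1 + length
```

For example, `split_name_query(0x1234, "www", "google.sm")` yields a packet whose second question decodes to `www.google.sm` even though the bytes `\x03www\x06google\x02sm` never occur in it. An injector that greps the raw packet would see `google.sm` but not the full name, while one that properly follows pointers would see both questions.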
16:48:20 <user_gfw-report> We couldn't corroborate this from our dataset. Different from DNS censorship in Iran, all poisoned IPs used by GFW are foreign public IPs.
16:48:26 <arma2> studentmain: right, but this is the sort of "design decision" that makes sure you keep the confusion :) like, if several people get together to compare notes on what censorship they see, they will seem like they have contradictory info.
16:49:05 <user_gfw-report> BTW, another interesting thing we found after this work is that some of those “bad IPs” used for poisoning were spoofed by the GFW. They allow a TCP handshake on any port and wait for a data packet to be sent before closing the connection.
16:49:05 <user_gfw-report> Look at Figure 3 of the paper: https://www.usenix.org/system/files/foci20-paper-anonymous_0.pdf#page=4
16:49:05 <user_gfw-report> Isn’t it a bit strange that 0.4% of the IP-port pairs were reachable in China but not in the US?
16:49:05 <user_gfw-report> We conjecture that’s because the GFW was using those IPs to secretly learn what people would send after connecting to the server.
16:49:25 <arma2> i like dcf1's notion of building a fingerprint for each injector. anything that helps automation in watching them, learning when they change, etc.
16:49:56 <user_gfw-report> That sounds very interesting.
16:50:34 <dcf1> Like https://censorbib.nymity.ch/#Bock2020a, where they found two different systems installed in series, and found that they could bypass one of the systems selectively in order to isolate the other
16:51:50 <cohosh> how much does the GFW rely on DNS poisoning for blocking content? Is it the main censorship technique for sites with domains? Or is there also IP blocking?
16:52:34 <studentmain> Not so much, they prefer IP/port blocking.
16:52:47 <DuckSoft> there are both pollutions and blockings
16:53:28 <DuckSoft> They generally pollute domains hosted on CDNs, pointing them to localhost
16:53:37 <dcf1> I am not sure how up to date it is, but https://hikinggfw.org/blocked_sites lists popular sites and says for each whether it is blocked by HTTP keyword, DNS injection, or IP blocking.
16:53:51 <cohosh> ah cool
16:53:55 <studentmain> We developed some anti-pollution tools a long time ago
16:54:34 <studentmain> And now DoH is getting popular (and maybe getting blocked)
16:54:58 <DuckSoft> the oldest attempt, if i am correct, was something called GoogleHosts
16:54:59 <user_gfw-report> I remember there is a paper characterizing this.
16:54:59 <user_gfw-report> Here, in Figure 4: https://www.usenix.org/system/files/foci19-paper_chai_update.pdf#page=4
16:55:10 <dcf1> There is "Hold-On", which I think means to ignore the injected responses and wait for the genuine response: https://censorbib.nymity.ch/#Duan2012a
16:56:35 <dcf1> In #ooni the other day they were talking about some encrypted DNS servers like 1.1.1.1 and 9.9.9.9 being blocked in Russia.
16:58:06 <DuckSoft> sure since DoT/DoH are encrypted but not anonymous. distinguishable ALPN and even port number make them somewhat unusable under censorship
16:59:17 <dcf1> E.g. https://explorer.ooni.org/search?until=2020-09-29&domain=1.1.1.1&test_name=web_connectivity&probe_cc=RU
16:59:46 <dcf1> Yeah the big DoH servers are probably easy to block, the small ones will probably be like other proxies, blocked as they are used and discovered
16:59:50 <user_gfw-report> > Or is there also IP blocking?
16:59:50 <user_gfw-report> It seems that 80% of the blocked Alexa top 1M websites are under IP blocking.
16:59:52 <studentmain> That's mostly about DoT. DoH is just HTTP; the problem with DoH is that the servers' addresses are well known.
17:02:45 <cohosh> it seems like we're winding down on discussion
17:03:01 <dcf1> I guess the main benefit of this work, from Tor's point of view, is giving some insight into how the GFW may be organized internally. I don't think that much of what this team does depends on the availability of DNS, usually.
17:03:13 <arma2> cohosh: right, and that IP blocking is one of the reasons i'm not quite as excited about geneva as they currently frame it: if you have a website that They want to censor, no packet level tricks will get around "they blackholed the IP address"
17:03:16 <cohosh> yeah that's a good point
17:03:48 * dcf1 wonders if Geneva could help construct wacky DNS packets
17:04:04 <cohosh> XD
17:04:05 <DuckSoft> dcf1: that will be interesting!
17:04:09 <arma2> dcf1: i was wondering that too. the next thought was: you need ground truth somehow. like, to know which wacky behavior matches which injector.
17:04:39 <arma2> though, maybe geneva can generate packets that show different behaviors. and then the human can go in to decide how to divide them.
17:05:33 <dcf1> Yeah. user_gfw-report, if you are interested, I can send you a short list of weird DNS packet constructions that have a chance of being parsed differently.
17:05:54 <DuckSoft> maybe also we can train a model to tell how genuine a DNS response is
17:06:19 <user_gfw-report> Yes, please. That would be super cool to try it out.
17:06:36 <dcf1> ok
17:07:43 * cohosh waits a few more minutes to wrap up the meeting
17:08:06 <user_gfw-report> >like, to know which wacky behavior matches which injector.
17:08:06 <user_gfw-report> Right, we can try using domains that exclusively trigger a single injector. Or we can fingerprint the packets to tell which injector each one was sent from.
17:10:17 <cohosh> okay i'll end the meeting here, feel free to stick around and chat for a bit after
17:10:28 <cohosh> user_gfw-report: thanks for joining today to talk about your work!
17:10:36 <cohosh> #endmeeting