16:01:41 #startmeeting Anti-Censorship team meeting, 7th October 2021
16:01:41 Meeting started Thu Oct 7 16:01:41 2021 UTC. The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:41 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:01:51 hello o/
16:01:59 hi!
16:02:41 here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:03:36 dcf1: you want to take the first item on the agenda?
16:04:09 Our nix packager suggested making the broker URL a configurable parameter, with snowflake-broker.bamsoftware.com as a default
16:04:12 https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40070#note_2753884
16:04:42 I was surprised to find that the defaultBrokerURL and defaultRelayURL in the proxy still point to bamsoftware.com domains
16:05:11 for historical reasons we use bamsoftware.com, freehaven.net, and torproject.net in different places, though they all functionally point to the same place
16:05:24 should we change them all to torproject.net?
16:05:33 yeah, iirc the standalone proxy wasn't affected by the bamsoftware issues
16:05:51 and someone in the UK raised a concern with torproject* domains being blocked on some networks
16:06:14 idk if that was a UK-specific thing
16:06:30 but i suppose there are coffee shops i used to go to that blocked torproject domains
16:06:31 ok, then freehaven.net for the webext and torproject.net everywhere else
16:06:37 * meskio got trapped in the colorful website of bamsoftware.com
16:06:41 yeah that makes sense
16:07:02 I assume all the domains point to the same broker
16:07:05 lol woah dcf1 you updated your home page
16:07:45 it's a cool visual effect, with not much code
16:07:58 okay, then I can open an issue to make the change in the golang proxy
16:08:01 ahh my brain
16:08:06 it's pretty cool, I couldn't look at the irc window with all those colors moving on the other screen
16:08:14 meskio: yeah it's the same broker
16:08:18 cohosh: I have seen *.torproject.org but *not* *.torproject.net being blocked.
16:08:33 and the same goes for each snowflake bridge url
16:09:05 anadahz: oh okay, it might be fine then
16:09:15 BTW what is the difference between org and net in this context?
16:09:35 we usually use net for "third party" things that are not administered by our sysadmin team
16:09:53 the snowflake broker and bridge are jointly admin'd by dcf1 and the anti-censorship team
16:10:13 AFAIK the bridge url is reached over the snowflake channel, so it should not be censored in any case, right?
16:10:30 meskio: this is censored for the proxies, not the clients
16:10:50 so if a proxy volunteer is in a coffee shop that blocks torproject.net, their proxy won't work
16:11:09 ahh, I see, makes sense
16:12:09 neat, anything else on the agenda before we move on to the reading group?
16:12:19 not from my side
16:13:38 * cohosh waits another min
16:13:54 I have a summary draft, but I haven't heard back from the authors yet about it
16:13:57 https://share.riseup.net/#-tBnwEOvYlgPS4hN8yZGcg
16:14:09 are any of the other authors here?
16:14:25 i reached out to them, but haven't heard back
16:16:09 the main authors were undergrads during the time of this work
16:16:21 dcf1: nice summary
16:16:48 has this paper turned into actual code in dnstt?
16:18:06 no, I did not find anything immediately actionable, I just posted a link at https://www.bamsoftware.com/software/dnstt/security.html#2021-foci
16:18:23 I am planning to add uTLS support though
16:19:09 I was initially reluctant because I wanted the dnstt code to be a clean example of a turbo tunnel design for other developers, without a lot of extra features to complicate it, but it's probably better than not to have built-in TLS camouflage
16:19:55 dcf1: do you think dnstt can eventually be useful for snowflake broker rendezvous?
16:20:10 as in, is it worth taking steps towards implementing that?
16:20:48 yes, with one complication
16:21:19 it would be easy to do in one DNS query+response, if there were more capacity in a DNS query
16:21:51 but we cannot fit an entire client registration into one DNS query. Capacity of a query is about 100 bytes, we need more like 1000
16:22:44 so you need to fragment the registration message somehow and send it over multiple queries, which means the receiver needs like a reconstruction buffer, etc.
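[editor's note] A minimal sketch of the fragmentation and reconstruction-buffer idea from 16:22:44, not taken from dnstt or Snowflake. It assumes the rough 100-byte per-query capacity quoted above and a hypothetical 4-byte header (message id, fragment index, fragment count); the names `fragment`, `Reassembler`, and `QUERY_CAPACITY` are invented for illustration. It ignores DNS name encoding, retransmission, and buffer expiry, which a real design would have to decide on.

```python
import struct

QUERY_CAPACITY = 100            # assumed usable bytes per DNS query (rough figure from the log)
HEADER = struct.Struct("!HBB")  # hypothetical header: message id, fragment index, fragment count


def fragment(msg_id: int, message: bytes, capacity: int = QUERY_CAPACITY) -> list[bytes]:
    """Split one registration message into packets that each fit in one query."""
    chunk = capacity - HEADER.size
    pieces = [message[i:i + chunk] for i in range(0, len(message), chunk)] or [b""]
    if len(pieces) > 255:
        raise ValueError("message too large for a one-byte fragment count")
    return [HEADER.pack(msg_id, idx, len(pieces)) + piece
            for idx, piece in enumerate(pieces)]


class Reassembler:
    """Receiver-side reconstruction buffer, keyed by message id."""

    def __init__(self):
        self._partial = {}  # msg_id -> {fragment index: payload}

    def receive(self, packet: bytes):
        """Store one fragment; return the full message once every index is present."""
        msg_id, idx, count = HEADER.unpack_from(packet)
        if count == 0 or idx >= count:
            return None  # malformed fragment, ignore it
        parts = self._partial.setdefault(msg_id, {})
        parts[idx] = packet[HEADER.size:]
        if not all(i in parts for i in range(count)):
            return None  # still waiting for more fragments
        del self._partial[msg_id]
        return b"".join(parts[i] for i in range(count))
```

With these assumed numbers each query carries 96 payload bytes, so a 1024-byte message becomes 11 fragments, and the receiver can accept them in any order.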
16:23:05 many things would work, it needs a design decision though
16:24:02 This is one of the conclusions of the paper, that the DNS tunnel was detectable mainly because of its high speed
16:24:21 * cohosh nods
16:24:25 (high speed relative to normal DNS queries, though still low speed in comparison to other tunnels)
16:25:20 A DNS-based rendezvous for snowflake would look like ~15 DNS exchanges through a DoH resolver, which is not very unusual, considering queries often happen in bursts like that when loading a web page
16:25:43 the idea here would be to use cloudflare's 1.1.1.1 or any other well-known DoH resolver to connect to the broker, right?
16:25:48 I'm wondering how hard it is for the censor to just block the popular DoH servers if we start using dnstt, I'm not expecting major implications for users if you block the cloudflare, google and similar DoH servers
16:25:52 I asked the authors for their rate-limiting patch, but have not heard back about that either
16:26:45 yes, unfortunately secure DNS has a fallback to insecure DNS in many cases, so blocking secure DNS does not result in many negative outcomes
16:27:09 so it might be too early to move to something like dnstt and we need to wait for more adoption
16:27:18 so it doesn't end up like esni in china
16:27:43 actually 3 of the big secure DNS resolvers were blocked in Russia a couple of weeks ago, because they were hardcoded in a voting-related app that the government wanted to block
16:27:47 https://github.com/net4people/bbs/issues/81
16:28:13 https://ntc.party/t/tls-youtube/1311/24
16:28:15 woah
16:28:34 ouch
16:29:01 I unpacked the app and found the array {"dns.google", "1.1.1.1", "1.0.0.1", "doh.opendns.com"}
16:29:59 so yes, the security of circumvention reduces to the blocking resistance of some proxy in the middle
16:30:23 i wonder if the "wait until it gets more adoption" strategy is a bit undercut by the control most places have over the applications they want people to use
16:31:26 yes, I'm not sure what the best strategy is here, I think it would be great to include dnstt in snowflake but not enable it by default, have it as a backup
16:31:55 not necessarily dnstt, just dns-based rendezvous in general
16:32:22 I say that because it may not require the full overhead of KCP and Noise for registration messages
16:33:03 I'm not sure. Certainly KCP would suffice for fragmenting messages over many queries, and maybe that's the best/simplest way
16:34:32 This aspect of the paper was interesting:
16:34:47 "The modified dnstt sessions managed to evade detection on the average payload length attack, for the bidirectional and incoming attacks. Recall and precision were significantly decreased for the outgoing direction as well. For packet rate and throughput attacks, our rate limiting was insufficient to evade detection."
16:35:06 So they invented 3 attacks: avg payload length, packet rate, throughput.
16:35:32 They added rate limiting to the dnstt server, and this was effective against the payload length attack but not the other two attacks.
16:36:05 They do not present any modification that is effective against all the posited attacks
16:36:24 So I wondered what further traffic modification to make it even more resistant would look like
16:38:24 I guess you could add some delay between packets
16:38:54 mmm, rate limiting is already kind of delaying, but maybe too predictable
16:39:26 there is actually some support (at the protocol level) in dnstt for traffic shaping, in terms of padding packets, and it would be possible to send packets according to a defined schedule with some additional hacking
16:40:13 afaik it's still an open research question what a good schedule would look like
16:40:15 hm there was a paper that i'm looking for now that used GANs to generate user models for sending emails
16:40:46 I can sympathize with the difficulties of modeling.
16:41:09 is it https://eprint.iacr.org/2021/686
16:41:32 oh it's not that one, but i hadn't seen that yet
16:41:58 yes, that was one of the questions I was thinking about while reading the paper, the paper makes some made-up DoH requests to use as 'normal' traffic, it's hard to know if all that made-up traffic has any relation to real users' traffic
16:42:02 the non-circumventor data set in this paper comes from browsing the home pages of alexa top sites, which is common in modeling, though one can argue how realistic it is
16:42:15 this was about using GANs to decide when to send the emails used in the tunnel and how long to make the emails
16:42:45 yeah that's a good point
16:42:46 it's also the case that the circumventor data set contains the *contents* (not just the tunneled DNS queries) of those same alexa top sites, so it's necessarily much bigger
16:43:09 we reached out to the CU Boulder researchers who had a tap on the university network
16:43:40 even without considering the DNS tunnel aspect of it, you can probably distinguish a data set that contains only DNS traffic from one that contains DNS traffic + HTTP traffic based on timing features, etc.
16:43:57 which would have been a nicer dataset to use
16:44:02 (Which is not to take away from the true observation that in common use, the DNS tunnel traffic will be used to tunnel HTTP traffic)
16:44:35 cohosh: let me know if you find that email paper, sounds interesting
16:45:21 it's a big challenge, and it's unfortunate that more empirical modeling may be limited to large organizations with lots of resources, though things seem to be moving in that direction
16:46:20 I would like to see some kind of benchmark comparison paper, like
16:46:55 "if you had this hypothesis and conducted these statistical tests, you would reach conclusion A if you used alexa top sites and conclusion B if you used our university tap"
16:47:23 work on how to lower the barrier to researching these problems would be awesome
16:47:27 I feel that currently we are dealing with an unquantified level of unreality
16:47:51 (though as I said I sympathize, to do much better than the status quo is not easy)
16:47:58 ooh yeah that comparison paper would be great
16:48:15 it reminds me of the tor shadow papers
16:48:23 I'm wondering about the ethics of the taps for research, I guess you can anonymize some pieces to make it not too bad
16:48:33 https://www.robgjansen.com/publications/neverenough-sec2021.pdf
16:49:30 basically, a meta paper on how to set up experiments so they are more accurate ^^
16:49:43 yeah
16:49:58 it was interesting that this paper used both alexa and umbrella top sites, in different experiments
16:50:11 there is a third one I've seen used too in other papers, can't remember the name
16:51:05 https://tranco-list.eu/
16:51:48 it also bothers me that papers never cite nor publish the specific date of the alexa list they used; afaik it changes daily
16:53:45 yea, that's a good point
16:54:03 I guess one other feature that is usually used in this type of analysis is interpacket timing
16:54:24 not sure if it was considered for this paper and turned out not to be useful, or what
16:54:40 one of the goals of the attacks was to make them low-effort attacks
16:54:48 I guess you can infer average interpacket times from average packet rate
16:54:54 or to consider only low-effort attacks
16:55:13 of the form "once this flow crosses this threshold, or once you see this behaviour, cut the connection"
16:55:25 i guess you could have a running average
16:55:37 that's a good point
16:55:58 or if some bucket in the IAT histogram is especially predictive, keep a running count of that one bucket
16:56:16 I assume these kinds of techniques are still far from what censors do, it's risky to base censorship on statistical analysis
16:56:38 yea
16:56:55 I think that's the current belief, though in combination with other signals like active probing it can become powerful
16:57:12 though active probing doesn't work in the dnstt threat model
16:57:23 yep :)
16:57:33 i gotta run :/
16:57:43 bye bye
16:57:55 thanks for the discussion, i will ping the main authors again about the code and the summary!
16:58:20 nice
16:58:25 and thanks dcf1 for all the feedback for the paper :D
16:58:31 e.g. found this with shadowsocks
16:58:35 I will also need to leave soon
16:58:38 https://gfw.report/talks/imc20/en/
16:58:44 "Now you can understand this process of active probing as a way of increasing precision or reducing cost in network classification. If you were to write a purely passive classifier for Shadowsocks, it may yield unacceptably high false positives. On the other hand, if you were to try to active probe every single connection that passes through the firewall that may be more probes than you can manage to
16:58:50 send. So you can think of step one as being a sort of pre-filter for step two."
16:59:35 makes sense
16:59:39 and it's also scary
16:59:54 ok let's wait until next week to choose a future paper then
17:00:12 yes, let's talk about it next week
17:00:19 anything more about it?
17:00:50 I'll wait a minute to close the meeting
17:01:54 #endmeeting
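[editor's note] A minimal sketch of the low-effort attack form discussed around 16:55 ("once this flow crosses this threshold ... cut the connection"), combining the running average and the single-bucket count that were suggested. This is not the paper's classifier: the class name, the thresholds, the bucket bounds, and the smoothing factor are all hypothetical values chosen for illustration. The packet rate is estimated as the inverse of the running-average interarrival time (IAT), following the remark at 16:54:48.

```python
class LowEffortDetector:
    """Toy per-flow detector: flag a flow once a running statistic crosses a threshold."""

    def __init__(self, rate_threshold, bucket=(0.0, 0.01), bucket_threshold=50, alpha=0.1):
        self.rate_threshold = rate_threshold      # packets/second (hypothetical)
        self.bucket = bucket                      # bounds of one IAT histogram bucket, seconds (hypothetical)
        self.bucket_threshold = bucket_threshold  # hits in that bucket before flagging (hypothetical)
        self.alpha = alpha                        # smoothing factor of the running average
        self.avg_iat = None
        self.bucket_count = 0
        self.last_time = None

    def observe(self, timestamp):
        """Feed one packet arrival time in seconds; return True if the flow is flagged."""
        if self.last_time is not None:
            iat = timestamp - self.last_time
            if self.avg_iat is None:
                self.avg_iat = iat
            else:
                # Exponentially weighted running average of the interarrival time.
                self.avg_iat = (1 - self.alpha) * self.avg_iat + self.alpha * iat
            lo, hi = self.bucket
            if lo <= iat < hi:
                self.bucket_count += 1  # running count of the single watched bucket
        self.last_time = timestamp
        return self.flagged()

    def flagged(self):
        if self.avg_iat is None:
            return False  # fewer than two packets seen
        rate = 1.0 / self.avg_iat if self.avg_iat > 0 else float("inf")
        return rate > self.rate_threshold or self.bucket_count >= self.bucket_threshold
```

The detector keeps only a few numbers per flow, which is the point of a low-effort attack. As noted in the discussion, a censor would likely pair such a passive signal with others, since on its own it risks false positives.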