16:01:41 <meskio> #startmeeting Anti-Censorship team meeting, 7th October 2021
16:01:41 <MeetBot> Meeting started Thu Oct  7 16:01:41 2021 UTC.  The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:01:41 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:01:51 <meskio> hello o/
16:01:59 <cohosh> hi!
16:02:41 <cohosh> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:03:36 <cohosh> dcf1: you want to take the first item on the agenda?
16:04:09 <dcf1> Our nix packager suggested making the broker URL a configurable parameter, with snowflake-broker.bamsoftware.com as a default
16:04:12 <dcf1> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40070#note_2753884
16:04:42 <dcf1> I was surprised to find that the defaultBrokerURL and defaultRelayURL in the proxy still point to bamsoftware.com domains
16:05:11 <dcf1> for historical reasons we use bamsoftware.com, freehaven.net, and torproject.net in different places, though they all functionally point to the same place
16:05:24 <dcf1> should we change them all to torproject.net?
16:05:33 <cohosh> yeah, iirc the standalone proxy wasn't affected by the bamsoftware issues
16:05:51 <cohosh> and someone in the UK raised a concern with torproject* domains being blocked on some networks
16:06:14 <cohosh> idk if that was a UK specific thing
16:06:30 <cohosh> but i suppose there are coffee shops i used to go to that blocked torproject domains
16:06:31 <dcf1> ok, then freehaven.net for the webext and torproject.net everywhere else
16:06:37 * meskio got trapped in the colorful website of bamsoftware.com
16:06:41 <cohosh> yeah that makes sense
16:07:02 <meskio> I assume all the domains point to the same broker
16:07:05 <cohosh> lol woah dcf1 you updated your home page
16:07:45 <dcf1> it's a cool visual effect, with not much code
16:07:58 <dcf1> okay, then I can open an issue to make the change in the golang proxy
16:08:01 <cohosh> ahh my brain
16:08:06 <meskio> it is pretty cool, I couldn't look at the irc window with all those colors moving on the other screen
16:08:14 <cohosh> meskio: yeah it's the same broker
16:08:18 <anadahz> cohosh: I have seen *.torproject.org but not *.torproject.net being blocked.
16:08:33 <cohosh> and the same goes for each snowflake bridge url
16:09:05 <cohosh> anadahz: oh okay, it might be fine then
16:09:15 <anadahz> BTW what is the difference between org and net in this context?
16:09:35 <cohosh> we usually use net for "third party" things that are not administrated by our sysadmin team
16:09:53 <cohosh> the snowflake broker and bridge are jointly admin'd by dcf1 and the anti-censorship team
16:10:13 <meskio> AFAIK the bridge url is reached over the snowflake channel, so it should not be censored in any case, right?
16:10:30 <cohosh> meskio: this is censored for the proxies, not the clients
16:10:50 <cohosh> so if a proxy volunteer is in a coffee shop that blocks torproject.net, their proxy won't work
16:11:09 <meskio> ahh, I see, makes sense
16:12:09 <cohosh> neat, anything else on the agenda before we move onto reading group?
16:12:19 <meskio> not from my side
16:13:38 * cohosh waits another min
16:13:54 <dcf1> I have a summary draft, but I haven't heard back from the authors yet about it
16:13:57 <dcf1> https://share.riseup.net/#-tBnwEOvYlgPS4hN8yZGcg
16:14:09 <cohosh> are any of the other authors here?
16:14:25 <cohosh> i reached out to them, but haven't heard back
16:16:09 <cohosh> the main authors were undergrads during the time of this work
16:16:21 <meskio> dcf1: nice summary
16:16:48 <meskio> has this paper turned into actual code in dnstt?
16:18:06 <dcf1> no, I did not find anything immediately actionable, I just posted a link at https://www.bamsoftware.com/software/dnstt/security.html#2021-foci
16:18:23 <dcf1> I am planning to add uTLS support though
16:19:09 <dcf1> I was initially reluctant because I wanted the dnstt code to be a clean example of a turbo tunnel design for other developers, without a lot of extra features to complicate it, but it's probably better than not to have built-in TLS camouflage
16:19:55 <cohosh> dcf1: do you think dnstt can eventually be useful for snowflake broker rendezvous?
16:20:10 <cohosh> as in, is it worth taking steps towards implementing that?
16:20:48 <dcf1> yes, with one complication
16:21:19 <dcf1> it would be easy to do in one DNS query+response, if there were more capacity in a DNS query
16:21:51 <dcf1> but we cannot fit an entire client registration into one DNS query. Capacity of a query is about 100 bytes, we need more like 1000
16:22:44 <dcf1> so you need to fragment the registration message somehow and send it over multiple queries, which means the receiver needs like a reconstruction buffer, etc.
16:23:05 <dcf1> many things would work, it needs a design decision though
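The fragmentation and reconstruction buffer described above could look roughly like the following. This is a minimal sketch and not dnstt's or Snowflake's actual design: the 4-byte header layout, the `fragment` function, and the `ReassemblyBuffer` class are hypothetical names invented for illustration, and the 100-byte capacity is only the rough figure mentioned in the discussion. It ignores timeouts, retransmission, and authentication, all of which a real design would need.

```python
import struct
from typing import Dict, List, Optional

# Hypothetical fragment header: message id (2 bytes), fragment index (1), fragment count (1).
HEADER = struct.Struct("!HBB")


def fragment(message: bytes, msg_id: int, capacity: int = 100) -> List[bytes]:
    """Split a message into fragments of at most `capacity` bytes, header included."""
    chunk_size = capacity - HEADER.size
    if chunk_size <= 0:
        raise ValueError("capacity too small for the header")
    chunks = [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)] or [b""]
    if len(chunks) > 255:
        raise ValueError("message needs more than 255 fragments")
    return [HEADER.pack(msg_id, i, len(chunks)) + c for i, c in enumerate(chunks)]


class ReassemblyBuffer:
    """Collect fragments (possibly reordered or duplicated) until a message is complete."""

    def __init__(self) -> None:
        self._pending: Dict[int, Dict[int, bytes]] = {}

    def add(self, frag: bytes) -> Optional[bytes]:
        """Store one fragment; return the whole message once every piece has arrived."""
        msg_id, index, count = HEADER.unpack_from(frag)
        if index >= count:
            raise ValueError("fragment index out of range")
        parts = self._pending.setdefault(msg_id, {})
        parts[index] = frag[HEADER.size:]
        if any(i not in parts for i in range(count)):
            return None
        del self._pending[msg_id]
        return b"".join(parts[i] for i in range(count))
```

With these assumed numbers, a 1000-byte registration message splits into 11 fragments of at most 100 bytes each, and the receiver gets the original message back from `add` when the last missing fragment arrives, in whatever order they come.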
16:24:02 <dcf1> This is one of the conclusions of the paper, that the DNS tunnel was detectable mainly because of its high speed
16:24:21 * cohosh nods
16:24:25 <dcf1> (high speed relative to normal DNS queries, though still low speed in comparison to other tunnels)
16:25:20 <dcf1> A DNS-based rendezvous for snowflake would look like ~15 DNS exchanges through a DoH resolver, which is not very unusual, considering queries often happen in bursts like that when loading a web page
16:25:43 <meskio> the idea here would be to use cloudflare's 1.1.1.1 or any other famous DoH to connect to the broker, right?
16:25:48 <meskio> I'm wondering how hard it is for the censor to just block the popular DoH servers if we start using dnstt, I'm not expecting major implications for users if you block the cloudflare, google and similar DoH servers
16:25:52 <dcf1> I asked the authors for their rate-limiting patch, but have not heard back about that either
16:26:45 <dcf1> yes, unfortunately secure DNS has a fallback to insecure DNS in many cases, so blocking secure DNS does not result in many negative outcomes
16:27:09 <meskio> so it might be too early to move to something like dnstt and we need to wait for more adoption
16:27:18 <meskio> so it doesn't become like esni in china
16:27:43 <dcf1> actually 3 of the big secure DNS resolvers were blocked in Russia a couple of weeks ago, because they were hardcoded in a voting-related app that the government wanted to block
16:27:47 <dcf1> https://github.com/net4people/bbs/issues/81
16:28:13 <dcf1> https://ntc.party/t/tls-youtube/1311/24
16:28:15 <cohosh> woah
16:28:34 <meskio> ouch
16:29:01 <dcf1> I unpacked the app and found the array {"dns.google", "1.1.1.1", "1.0.0.1", "doh.opendns.com"}
16:29:59 <dcf1> so yes, the security of circumvention reduces to the blocking resistance of some proxy in the middle
16:30:23 <cohosh> i wonder if the "wait until it gets more adoption" strategy is a bit undercut by the control most places have over the applications they want people to use
16:31:26 <meskio> yes, I'm not sure what the best strategy is here, I think it would be great to include dnstt in snowflake but not enable it by default, have it as a backup
16:31:55 <dcf1> not necessarily dnstt, just dns-based rendezvous in general
16:32:22 <dcf1> I say that because it may not require the full overhead of KCP and Noise for registration messages
16:33:03 <dcf1> I'm not sure. Certainly KCP would suffice for fragmenting messages over many queries, and maybe that's the best/simplest way
16:34:32 <dcf1> This aspect of the paper was interesting:
16:34:47 <dcf1> "The modified dnstt sessions managed to evade detection on the average payload length attack, for the bidirectional and incoming attacks. Recall and precision were significantly decreased for the outgoing direction as well. For packet rate and throughput attacks, our rate limiting was insufficient to evade detection."
16:35:06 <dcf1> So they invented 3 attacks: avg payload length, packet rate, throughput.
16:35:32 <dcf1> They added rate limiting to the dnstt server, and this was effective against the payload length attack but not the other two attacks.
16:36:05 <dcf1> They do not present any modification that is effective against all the posited attacks
16:36:24 <dcf1> So I wondered what further traffic modification to make it even more resistant would look like
16:38:24 <meskio> I guess you could add some delay between packets
16:38:54 <meskio> mmm, rate limiting is already kind of delaying, but maybe too predictable
16:39:26 <dcf1> there is actually some support (at the protocol level) in dnstt for traffic shaping, in terms of padding packets, and it would be possible to send packets according to a defined schedule with some additional hacking
16:40:13 <dcf1> afaik it's still an open research question what a good schedule would look like
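As a rough illustration of padding packets and sending them on a defined schedule, here is a minimal sketch. It is not dnstt's padding format: the 2-byte length prefix and the `pad`, `unpad`, and `constant_schedule` helpers are hypothetical. It also uses the simplest possible schedule (one fixed-size packet per fixed interval, with dummy packets when there is nothing to send), which is only one candidate, since what a good schedule looks like is the open question.

```python
import struct
from collections import deque
from typing import Deque, List, Tuple


def pad(payload: bytes, size: int) -> bytes:
    """Prefix a 2-byte length, then zero-pad so the packet is exactly `size` bytes."""
    if len(payload) + 2 > size:
        raise ValueError("payload too large for the packet size")
    return struct.pack("!H", len(payload)) + payload + bytes(size - 2 - len(payload))


def unpad(packet: bytes) -> bytes:
    """Recover the payload from a packet produced by pad()."""
    (length,) = struct.unpack_from("!H", packet)
    return packet[2:2 + length]


def constant_schedule(queue: Deque[bytes], size: int, interval: float,
                      slots: int, start: float = 0.0) -> List[Tuple[float, bytes]]:
    """Plan one fixed-size packet per slot; send an empty (dummy) payload if the queue is empty."""
    plan = []
    for i in range(slots):
        payload = queue.popleft() if queue else b""
        plan.append((start + i * interval, pad(payload, size)))
    return plan
```

For example, `constant_schedule(deque([b"hello", b"dns"]), size=32, interval=0.5, slots=4)` plans four 32-byte packets at half-second intervals, the last two being dummies. An observer then sees the same packet size and rate whether or not real data is flowing, at the cost of bandwidth and latency.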
16:40:15 <cohosh> hm there was a paper that i'm looking for now that used GANs to generate user models for sending emails
16:40:46 <dcf1> I can sympathize with the difficulties of modeling.
16:41:09 <dcf1> is it https://eprint.iacr.org/2021/686
16:41:32 <cohosh> oh it's not that one, but i hadn't seen that yet
16:41:58 <meskio> yes, that was one of the questions I had while reading the paper: it makes up DoH requests to use as 'normal' traffic, and it's hard to know whether that made-up traffic bears any relation to real users' traffic
16:42:02 <dcf1> the non-circumventor data set in this paper comes from browsing the home pages of alexa top sites, which is common in modeling, though one can argue how realistic it is
16:42:15 <cohosh> this was about using GANs to decide when to send the emails used in the tunnel and how long to make the emails
16:42:45 <cohosh> yeah that's a good point
16:42:46 <dcf1> it's also the case that the circumventor data set contains the *contents* (not just the tunneled DNS queries) of those same alexa top sites, so it's necessarily much bigger
16:43:09 <cohosh> we reached out to the CU Boulder researchers who had a tap on the university network
16:43:40 <dcf1> even without considering the DNS tunnel aspect of it, you can probably distinguish a data set that contains only DNS traffic from one that contains DNS traffic + HTTP traffic based on timing features, etc.
16:43:57 <cohosh> which would have been a nicer dataset to use
16:44:02 <dcf1> (Which is not to take away from the true observation that in common use, the DNS tunnel traffic will be used to tunnel HTTP traffic)
16:44:35 <dcf1> cohosh: let me know if you find that email paper, sounds interesting
16:45:21 <dcf1> it's a big challenge, and it's unfortunate that more empirical modeling may be limited to large organizations with lots of resources, though things seem to be moving in that direction
16:46:20 <dcf1> I would like to see some kind of benchmark comparison paper, like
16:46:55 <dcf1> "if you had this hypothesis and conducted these statistical tests, you would reach conclusion A if you used alexa top sites and conclusion B if you used our university tap"
16:47:23 <cohosh> work on how to lower the barrier of researching these problems would be awesome
16:47:27 <dcf1> I feel that currently we are dealing with an unquantified level of unreality
16:47:51 <dcf1> (though as I said I sympathize, to do much better than the status quo is not easy)
16:47:58 <cohosh> ooh yeah that comparison paper would be great
16:48:15 <cohosh> it reminds me of the tor shadow papers
16:48:23 <meskio> I'm wondering about the ethics of the taps for research, I guess you can anonymize some pieces to make it not too bad
16:48:33 <cohosh> https://www.robgjansen.com/publications/neverenough-sec2021.pdf
16:49:30 <cohosh> basically, a meta paper on how to set up experiments so they are more accurate ^^
16:49:43 <dcf1> yeah
16:49:58 <dcf1> it was interesting that this paper used both alexa and umbrella top sites, in different experiments
16:50:11 <dcf1> there is a third one I've seen used too in other papers, can't remember the name
16:51:05 <dcf1> https://tranco-list.eu/
16:51:48 <dcf1> it also bothers me that papers never cite nor publish the specific date of the alexa list they used; afaik it changes daily
16:53:45 <cohosh> yea, that's a good point
16:54:03 <dcf1> I guess one other feature that is usually used in this type of analysis is interpacket timing
16:54:24 <dcf1> not sure if it was considered for this paper and turned out not to be useful, or what
16:54:40 <cohosh> one of the goals of the attacks was to make them low effort attacks
16:54:48 <dcf1> I guess you can infer average interpacket times from average packet rate
16:54:54 <cohosh> or to consider only low effort attacks
16:55:13 <cohosh> of the form "once this flow crosses this threshold, or once you see this behaviour, cut the connection"
16:55:25 <cohosh> i guess you could have a running average
16:55:37 <dcf1> that's a good point
16:55:58 <dcf1> or if some bucket in the IAT histogram is especially predictive, keep a running count of that one bucket
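The low-effort checks suggested here (a running average compared against a threshold, plus a running count of one interarrival-time bucket) can be sketched as below. This is not the classifier from the paper: the `FlowMonitor` class, its parameters, and every threshold value are hypothetical, chosen only to show how little per-flow state such a check needs.

```python
from typing import Optional, Tuple


class FlowMonitor:
    """Constant-memory per-flow state for a threshold check (all thresholds hypothetical)."""

    def __init__(self, max_mean_len: float, iat_bucket: Tuple[float, float],
                 max_bucket_hits: int) -> None:
        self.max_mean_len = max_mean_len
        self.iat_bucket = iat_bucket          # [low, high) interarrival range, in seconds
        self.max_bucket_hits = max_bucket_hits
        self.count = 0
        self.mean_len = 0.0
        self.bucket_hits = 0
        self.last_time: Optional[float] = None

    def observe(self, timestamp: float, payload_len: int) -> bool:
        """Update the running statistics; return True once the flow crosses a threshold."""
        self.count += 1
        # Incremental mean: no need to store earlier packets.
        self.mean_len += (payload_len - self.mean_len) / self.count
        if self.last_time is not None:
            iat = timestamp - self.last_time
            low, high = self.iat_bucket
            if low <= iat < high:
                self.bucket_hits += 1
        self.last_time = timestamp
        return (self.mean_len > self.max_mean_len
                or self.bucket_hits > self.max_bucket_hits)
```

A censor running something like this would call `observe` for each packet of a flow and cut the connection the first time it returns True. The state is a handful of numbers per flow, which is what makes this kind of attack cheap.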
16:56:16 <meskio> I assume these kinds of techniques are still far from what censors do, it is risky to base censorship on statistical analysis
16:56:38 <cohosh> yea
16:56:55 <dcf1> I think that's the current belief, though in combination with other signals like active probing it can become powerful
16:57:12 <dcf1> though active probing doesn't work in the dnstt threat model
16:57:23 <meskio> yep :)
16:57:33 <cohosh> i gotta run :/
16:57:43 <dcf1> bye bye
16:57:55 <cohosh> thanks for the discussion, i will ping the main authors again about the code and the summary!
16:58:20 <meskio> nice
16:58:25 <cohosh> and thanks dcf1 for all the feedback for the paper :D
16:58:31 <dcf1> e.g. found this with shadowsocks
16:58:35 <meskio> I will also need to leave soon
16:58:38 <dcf1> https://gfw.report/talks/imc20/en/
16:58:44 <dcf1> "Now you can understand this process of active probing as a way of increasing precision or reducing cost in network classification. If you were to write a purely passive classifier for Shadowsocks, it may yield unacceptably high false positives. On the other hand, if you were to try to active probe every single connection that passes through the firewall that may be more probes than you can manage to
16:58:50 <dcf1> send. So you can think of step one as being a sort of pre-filter for step two."
16:59:35 <meskio> makes sense
16:59:39 <meskio> and also is scary
16:59:54 <dcf1> ok let's wait until next week to choose a future paper then
17:00:12 <meskio> yes, let's talk about it next week
17:00:19 <meskio> anything more about it?
17:00:50 <meskio> I'll wait a minute to close the meeting
17:01:54 <meskio> #endmeeting