17:59:18 <phw> #startmeeting anti-censorship meeting
17:59:18 <MeetBot> Meeting started Thu Apr 16 17:59:18 2020 UTC.  The chair is phw. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:59:18 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
17:59:21 <arma2> dcf1: i am using your turbo tunnel snowflake a11 build and it is working great
17:59:21 <phw> hello everyone
17:59:25 <cjb> hi!
17:59:40 <agix> hi
17:59:40 <phw> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
18:00:08 <cohosh> hi
18:00:09 <juggy> hi
18:00:22 <dcf1> arma2: thanks, same here. I haven't had any CircuitBuildTimeout-related problems either.
18:00:57 <phw> one discussion item: should https://gitweb.torproject.org/snowflake-webext.git/ be in the pluggable-transports directory?
18:00:59 * cohosh is also enjoying the tasty dogfood
18:01:06 <cohosh> ah yeah
18:01:12 * phw takes a look
18:01:17 <cohosh> so we have a new repository now and can move ahead with the split
18:01:36 <cohosh> irl set it up but i forgot to ask whether we want it nested in the pluggable-transports directory
18:01:36 <dcf1> I vote for /pluggable-transports/snowflake-webext.git
18:01:42 <cohosh> okay
18:01:44 * phw thinks the answer should be "yes"
18:01:50 <cohosh> i think that's more people for that option then
18:02:07 <cohosh> i'll leave a comment for irl in case they don't get the pings here
18:02:13 <cohosh> thanks!
18:02:41 <phw> good, that was a productive discussion
18:03:02 <phw> let's take a look at each other's help sections in the pad
18:03:59 <phw> #32912, #33593, and #33666 for cohosh
18:04:06 <phw> #33365 for arlolra
18:04:17 <cohosh> i'll ping hiro about #32912 next week
18:04:38 <cohosh> and arlolra already took a look at #33593 for me
18:04:52 <cohosh> so that just leaves #33666
18:05:08 <phw> cohosh: i can also review #32912 if you'd like to get it done sooner
18:05:19 <cohosh> i can also take #33365, i started reviewing it this morning but didn't finish before the meeting
18:05:19 <arlolra> I can look at that too, but seems like there might be some conflicts with #33365
18:05:25 <cohosh> phw: ah okay that would be great
18:05:45 <arlolra> that = #33366
18:05:57 <arlolra> i mean 33666, sorry
18:06:01 <cohosh> arlolra: ah you're right
18:06:03 <arlolra> :(
18:06:46 <cohosh> just the missing feature part of it conflicts, right?
18:07:16 <arlolra> yeah
18:07:21 <cohosh> arlolra: how about i review yours first and then rebase on top of your branch?
18:07:31 <cohosh> that sounds the easiest to me
18:07:34 <arlolra> sure
18:07:51 <cohosh> i'm also interested in general feedback about the options proposed in #33666
18:08:03 <phw> i'll certainly take a look at #33666
18:08:09 <cohosh> thanks!
18:09:12 <phw> i think that's it for today. did i forget anyone? any other help needed?
18:09:56 <phw> i'll wait for 2 minutes. if there's nothing else, we can move on to today's reading group
18:12:19 <phw> ok, let's start our reading group
18:12:40 <cohosh> cool
18:12:42 <phw> we're discussing wang et al's ndss'20 paper today: https://censorbib.nymity.ch/#Wang2020a
18:13:14 <phw> may i hand the mic over to you, cohosh?
18:13:18 <cohosh> sure
18:13:34 <cohosh> so for a short summary
18:14:28 <cohosh> this paper looks at how differences in protocol state machines at DPI boxes vs. endpoint implementations can allow the DPI features of the box to be evaded
18:15:42 <cohosh> these differences can for example allow a client in a censored area to send some unusual traffic that will confuse the DPI box but be accepted at the other endpoint
18:15:58 <cohosh> which can allow users to evade censorship in some cases
18:16:13 <cohosh> this technique has been used before, for example the "your state is not mine" paper
18:16:30 <cohosh> https://censorbib.nymity.ch/#Wang2017a
18:17:09 <cohosh> what makes this paper different is that they are trying to more completely enumerate the evasion techniques against several different DPI boxes
18:18:04 <cohosh> they do this using software analysis techniques by exploring the state machines of popular endpoint implementations of TCP
18:18:39 <cohosh> and present evasion attacks on several DPI systems, including ones they have tested on the GFW
18:19:22 <cohosh> </summary>
18:19:48 <dcf1> In this paper, the only endpoint TCP implementation they consider is Linux.
18:20:01 <cohosh> aha yes
18:20:20 <dcf1> I wasn't so clear on whether it would be easy to do if you don't have the source code (e.g. Windows)
18:20:40 <phw> ...and it's about linux servers, specifically. it would be useful to know about evasion strategies for windows clients. then, bridges could do evasion without the clients even knowing.
18:20:47 <dcf1> Also they are looking at servers only, though the same idea could possibly apply to clients
18:21:22 <cohosh> yeah, they repeatedly state that knowing the endpoint source allowed them to come up with test cases
18:21:50 <dcf1> The "Sym" part of the title comes from symbolic execution; they use that to automatically explore the code paths of the TCP implementation and come up with candidate packet sequences
18:22:06 <cohosh> my understanding is that they could expand their test cases by looking at other implementations, but they could also use the ones they came up with and test them against a windows endpoint to see if they work?
18:22:31 <dcf1> Then they use their intuition (I think) to winnow down the set of candidates into ones that may also happen to insert/evade against a middlebox.
18:24:21 <phw> i was surprised by how easy it is to make zeek tear down its tcb. and i was equally surprised to learn that snort implements os-specific state machines
18:24:38 <dcf1> Me too re Snort.
18:24:52 <phw> but zeek didn't surprise you? ;)
18:25:01 <dcf1> No, not really, because
18:25:12 <dcf1> https://www.mattblaze.org/papers/internet-tap.pdf "The Eavesdropper's Dilemma"
18:25:44 <dcf1> There's an inherent tension in middleboxes between precision and generality (I forget what exact terms they use)
18:26:24 <cohosh> so the workflow I was imagining here with actually using the results of this paper to evade censorship is to 1) use symbolic execution on whatever endpoints you have available to come up with candidates, 2) narrow them down and test them against middle boxes to see if they work, 3) test them with whichever endpoint implementations you'd like to actually use to see if they work for you, 4) deploy them
18:26:30 <dcf1> Zeek could try and be very precise and reject packets that would cause it to tear down its TCB, or it could be very liberal and accept any that might do so
18:26:42 <dcf1> You get errors on either side
18:27:12 <dcf1> You kind of have to pick a place on the continuum you want to inhabit, or maybe simulate multiple possible worlds simultaneously
18:28:00 <dcf1> "Sensitivity" and "selectivity" are the words they use
18:28:09 <phw> dcf1: yes, makes sense on second thought
18:28:33 <cohosh> dcf1: thanks for linking that
18:30:06 <phw> regarding gfw evasion strategies: i'm intrigued by the lack of checksum verification. i suspect that was a conscious decision to increase firewall throughput
18:30:26 <dcf1> So one possible application of this research is to make a brdgrd-like program that you can run on the server
18:30:47 <dcf1> Same like with Geneva (I don't know if Geneva released that code yet)
18:31:44 <cohosh> ah nice
18:31:46 <dcf1> This paper and Geneva take two quite different approaches to discovering candidate packet sequences. SymTCP: offline, white box. Geneva: online, black box.
18:32:23 <dcf1> Though SymTCP is not fully offline, only the first part. After that they need online checks because that's the only access they assume they have to the DPI system.
18:32:57 <cjb> Is there a way to compare the approaches, see which might end up with more coverage over the possible relevant sequences?  I saw SymTCP describes itself as "more principled" than Geneva.
18:33:18 <dcf1> Yeah I flinched a bit at that statement, I thought it was a bit out of line.
18:33:41 <dcf1> I wish people who write papers wouldn't write that kind of thing, but to some extent they are forced to do so by program committees.
18:34:15 <cohosh> :/
18:34:30 <cjb> It would have been great to see a comparison that was more like "here's an evasion we found that Geneva would not find because reasons"
18:34:40 <dcf1> cjb: I agree
18:34:59 <cohosh> in addition to coverage, there are other things to consider as well. like the amount of traffic that's generated by each side in the online phases
18:36:20 <cohosh> i'm assuming both systems want each client/server to perform these steps separately. is there another route here where either or both of symTCP/Geneva are run and the successful evasion techniques gathered to be integrated into tools manually?
18:36:38 <cohosh> perhaps that was the original intent
18:36:49 <dcf1> SymTCP seems like it requires still a fair amount of manual work, I'm looking for the passages that highlight that.
18:36:57 <phw> i'd love to experiment with some of these strategies. in particular, the ones based on bad checksums and timestamps. i assume that's conceptually harder for the gfw to fix than many of the other strategies
18:37:27 <cohosh> phw: yeah, anything that makes DPI more expensive to fix is interesting :D
18:37:53 <dcf1> Geneva's actual genetic algorithm is not public yet, apparently. https://github.com/Kkevsterrr/geneva "We will be releasing the genetic algorithm at a later date."
18:38:22 <dcf1> SymTCP source code is at https://github.com/seclab-ucr/SymTCP
18:41:25 <cohosh> does anyone know if the techniques from "your state is not mine" were ever implemented and deployed?
18:42:34 <phw> there's a paragraph that talks about the ambiguity of tcp's urgent pointer. in particular, it mentions that payload that's marked as urgent could be "pushed to the application layer using a separate interface". i wonder what that means because all the application has, is a socket, no?
18:43:16 <dcf1> phw: there's a special syscall for it, I think. Some telnet implementations use it to move user keystrokes to the head of the line for a small amount of out-of-order processing.
18:44:05 <phw> dcf1: oh, interesting
18:44:13 <yanmaani> I don't understand this - pardon me if it's far below your level, but isn't it a simple Bayesian thing? If you have someone who usually gets flagged for proxies, their packets can be checked much harder.
18:44:21 * dcf1 opens up UNIX Network Programming... send(..., MSG_OOB)
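The "separate interface" for urgent data that phw asks about can be illustrated with a minimal sketch over a loopback TCP connection. It assumes the default socket configuration (SO_OOBINLINE off) and Linux-like behaviour, where pending urgent data shows up as an exceptional condition in select(); the function name urgent_demo is hypothetical and only for illustration.

```python
import select
import socket


def urgent_demo():
    """Send one urgent byte over loopback TCP.

    Returns a tuple (urgent_byte, inline_bytes) as seen by the receiver.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    server.settimeout(5.0)
    try:
        client.sendall(b"normal")          # ordinary in-band data
        client.send(b"!", socket.MSG_OOB)  # one byte marked as urgent

        # Urgent data is signalled as an "exceptional condition" on the socket.
        _, _, exceptional = select.select([], [], [server], 5.0)
        if not exceptional:
            raise TimeoutError("no urgent data was signalled")

        # Same socket, but a different receive path: the MSG_OOB flag.
        urgent = server.recv(1, socket.MSG_OOB)
        # A plain recv() only ever sees the in-band bytes.
        inline = server.recv(16)
        return urgent, inline
    finally:
        client.close()
        server.close()
        listener.close()


if __name__ == "__main__":
    print(urgent_demo())
```

So the application still only has a socket, but the urgent byte is fetched through a flag on recv() rather than through the normal byte stream. A middlebox that reassembles the stream has to guess which of the two views the endpoint will take.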
18:44:55 <yanmaani> Like, isn't the basic assumption that if you get through, you get through, and if you don't, then you try again tomorrow and hope things are better or your software is fixed?
18:46:13 <phw> the paper also reminded me of vecna's sniffjoke: https://tools.kali.org/sniffingspoofing/sniffjoke i wonder what he would think of this paper
18:46:13 <cohosh> yanmaani: this is a good insight. at the moment, we're not aware of different DPI techniques being used for traffic that corresponds to users who generate a lot of known proxy traffic
18:46:33 <cohosh> at least as far as i know
18:46:34 <dcf1> yanmaani: I don't understand what you mean. Are you referring to a topic of this paper or is it a question in general about proxy detection?
18:47:04 <yanmaani> I was under the impression that the GFW worked like this - in times of harder political pressure, they quite literally "cranked it up"
18:47:25 <yanmaani> dcf1: A question in general
18:47:52 <yanmaani> So, intuitively, if you live in the sort of country to implement large-scale internet censorship, you will most likely also live in the sort of country which is not overkeen on anonymous internet access
18:48:54 <yanmaani> And then it seems like you could just keep a "reverse blacklist" of IPs, just like with spam: This guy here is in his 20s, reasonably good with technology, and has previously had packets blocked for firewall evasion - let's expend a few extra cycles on checking his egress
18:49:43 <yanmaani> And vice versa - this is a lifelong party member in her 70s who only got a smartphone last year and has never uttered any unorthodoxy thus far, so let's only do the very basic filtering
18:50:12 <dcf1> yanmaani: I feel like this discussion could take us far afield from the topic of this week's reading group
18:50:16 <cohosh> yanmaani: my intuition is that this kind of blacklist would be really difficult to implement at scale
18:50:33 <yanmaani> right, not gonna go further off-topic. thanks
18:51:28 <cohosh> so going back to the original questions we had to think about for this
18:51:28 <dcf1> So one of the common objections to this style of evasion is that it requires low-level socket access (e.g. root), which is harder to deploy.
18:52:03 <dcf1> Like, if the strategy calls for us to send a packet with a bad TCP checksum, how do we actually implement that on an iPhone?
18:52:07 <cohosh> dcf1: right, so it's better as a bridge-side defence?
18:52:36 <phw> i bet that the "bad checksum" strategy works in both directions, so the bridge could take care of that
18:52:41 <dcf1> Maybe yes maybe no, even on the bridge side there's a step down in usability when something requires root
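As a rough sketch of what the "bad checksum" strategy relies on: the TCP checksum is the standard Internet checksum (RFC 1071) over a pseudo-header plus the segment. An endpoint that verifies it drops a corrupted segment, while a DPI box that skips verification would still process it. The code below only builds and checks bytes in memory; actually putting such a segment on the wire needs a raw socket, which is the root requirement discussed above. All addresses, ports, and helper names are made up for illustration.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header followed by the TCP segment."""
    pseudo = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment))
    )
    return internet_checksum(pseudo + segment)


def build_segment(src_ip, dst_ip, payload: bytes, corrupt: bool = False) -> bytes:
    """Build a minimal 20-byte TCP header plus payload (hypothetical values)."""
    # src port, dst port, seq, ack, data offset, flags (PSH|ACK), window,
    # checksum (zero while computing), urgent pointer
    header = struct.pack("!HHIIBBHHH", 40000, 443, 1, 0, 5 << 4, 0x18, 65535, 0, 0)
    csum = tcp_checksum(src_ip, dst_ip, header + payload)
    if corrupt:
        csum ^= 0x0001  # flip one bit so that verification must fail
    return header[:16] + struct.pack("!H", csum) + header[18:] + payload


def checksum_ok(src_ip, dst_ip, segment: bytes) -> bool:
    """What a verifying endpoint does: the checksum over everything must be 0."""
    return tcp_checksum(src_ip, dst_ip, segment) == 0


if __name__ == "__main__":
    src, dst = "192.0.2.1", "198.51.100.2"  # documentation-only addresses
    print(checksum_ok(src, dst, build_segment(src, dst, b"hello")))
    print(checksum_ok(src, dst, build_segment(src, dst, b"hello", corrupt=True)))
```

A verifying endpoint would run something like checksum_ok() and discard the corrupted segment, so an "insertion" packet built this way would only be seen by a middlebox that does not check.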
18:53:31 <dcf1> One nice thing about Geneva is that at least a few strategies don't require low-level packet access, you can do them in user space, if I recall correctly.
18:53:50 <phw> this is something that we could deploy in our obfs4 docker container because the complex setup process would be done by us, and not the operator
18:53:56 <cohosh> does geneva not limit itself to TCP?
18:54:04 <dcf1> But still most of Geneva's are similar in nature to what SymTCP finds, is my impression
18:54:06 <cohosh> (haven't read that paper unfortunately)
18:54:20 <dcf1> cohosh: see https://github.com/net4people/bbs/issues/23 for a summary
18:54:53 <cohosh> cool thanks
18:55:16 <cohosh> phw: that's a good point
18:55:43 <cohosh> although, i don't think these techniques will help overly much with the kind of blocking we typically see of obfs4 bridges, correct?
18:55:50 <cohosh> since those are blocked by IP address?
18:55:59 <phw> thymbahutymba: you may find the above interesting
18:56:33 <phw> cohosh: yes, correct.
18:56:47 <dcf1> cohosh: yes, that's another part of it. The other common objection to systems like this is, aren't the evasion techniques fragile / security by obscurity / only work until the censor knows you are using them?
18:57:11 <dcf1> I think both SymTCP and Geneva make the claim that by automating the discovery of new evasions, they mitigate that risk.
18:57:23 <cohosh> dcf1: for that i think the ones that make DPI expensive are especially interesting though
18:57:39 <dcf1> But of course no one has put that idea to the test, tried deploying something and then reacting when the censor notices it.
18:57:42 <cohosh> even if the censor patches their stuff, maybe we cost them money and it works for a while in the meantime?
18:57:59 <dcf1> cohosh: yes, that too. That was a point made by...
18:58:02 <cohosh> although yeah idk, i guess it's hard to tell what the impact of that is
18:58:17 <dcf1> https://censorbib.nymity.ch/#Khattak2013a
18:58:30 <cohosh> ah nice thanks
18:58:51 <dcf1> "ways that users can alter network traffic that will both avoid detection and, crucially, require the censor to make expensive changes to the system’s basic model to remedy."
18:59:10 * cohosh 's reading list is growing exponentially lol
18:59:30 <phw> tcp-based evasion may be attractive for, say, web servers where PTs aren't an option. but of course, dns-based blocking would still remain an issue.
18:59:35 <dcf1> So that might be an interesting way to push work like this further, make hypotheses about what evasions would be expensive (i.e. rank candidates) and then try them.
19:00:12 <dcf1> phw: yes, it could be an interesting short-term unblocking option for web server operators who can't train all their users to use circumvention software, in some cases
19:00:15 <cohosh> yup a long-term study on DPI changes would be super interesting
19:01:21 <dcf1> cohosh: I have short informal summaries of some papers at https://www.bamsoftware.com/papers/thesis/summaries.txt
19:01:29 <dcf1> In case you want to bias yourself before reading them ;)
19:01:56 <cohosh> haha nice
19:02:52 <phw> dcf1: i'd love to incorporate that in censorbib somehow. eg, the summaries could show up when you move your mouse over a paper
19:05:10 <cohosh> it seems like we've pretty much covered the questions we were wanting to ask ourselves about this paper
19:05:58 <phw> yes, shall we wrap it up?
19:06:01 <cohosh> the actions we could take being updating the obfs4 docker, although it's probably not going to help much with our obfs4 bridges being blocked at the moment
19:06:32 <cohosh> and future work being getting people to try these techniques and see what happens
19:06:40 <cohosh> yeah, anything else to add?
19:08:00 <phw> it would be exciting to see how many evasion opportunities there are in ipv4 and ipv6. wouldn't it be great if we could unblock torproject.org on the server side?
19:08:21 <cohosh> lol nice
19:08:41 <dcf1> About IPv6, I did suggest that to some research group, I don't remember which...
19:09:02 <dcf1> You notice on SymTCP how much trouble TCP options gave them, because they are variable-length and otherwise variable
19:09:41 <dcf1> IPv6 extension options are very similar in that respect. Probably a lot of middleboxes don't actually fully and correctly parse them, only recognizing the most common stereotyped patterns.
19:10:30 <dcf1> Ah yes I suggested it to the Geneva group. "In India in Section 5.3, I was curious about the potential with IPv6 extension options—if they can't handle TCP options they may not be able to handle IPv6 extension option layouts that differ from the most common ones."
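To make the "variable-length" point concrete, here is a minimal sketch of a TCP options parser. It follows the standard kind/length/value layout (kind 0 is End of Option List, kind 1 is a single-byte NOP, everything else carries a length byte that counts the kind and length bytes too). The function name and error messages are illustrative, and this is not how SymTCP or any particular middlebox handles options.

```python
def parse_tcp_options(raw: bytes):
    """Parse a TCP options field into a list of (kind, data) tuples.

    Raises ValueError on truncated or inconsistent options, which is exactly
    where a strict parser and a lenient parser can disagree.
    """
    options = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:  # End of Option List: stop parsing here
            options.append((0, b""))
            break
        if kind == 1:  # No-Operation: a single byte with no length field
            options.append((1, b""))
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated option: missing length byte")
        length = raw[i + 1]  # counts the kind and length bytes as well
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        options.append((kind, raw[i + 2 : i + length]))
        i += length
    return options


if __name__ == "__main__":
    # MSS (kind 2), NOP, window scale (kind 3), End of Option List
    print(parse_tcp_options(bytes([2, 4, 0x05, 0xB4, 1, 3, 3, 7, 0])))
```

Every branch here is a place where an implementation that only recognises the most common option layouts can end up with a different view of the packet than the endpoint does.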
19:11:10 <phw> dave levin reads both our anti-censorship list and ooni's channels. it would be great if we could convince him to tackle some of these open questions.
19:11:59 <phw> i'll point him to the meeting log of this reading group
19:12:34 <cohosh> cool :)
19:12:55 <phw> would anyone like to suggest a paper for the next reading group?
19:13:10 <dcf1> I am reading MassBrowser
19:13:42 <phw> okay, let's discuss massbrowser on april 30?
19:13:46 <dcf1> https://censorbib.nymity.ch/#Nasr2020a or at least that was the next summary I planned to post to bbs
19:13:56 <cohosh> sounds good to me
19:15:08 <phw> that was a great conversation today, thanks!
19:15:11 <phw> let's wrap it up
19:15:15 <phw> #endmeeting