17:59:18 <phw> #startmeeting anti-censorship meeting
17:59:18 <MeetBot> Meeting started Thu Apr 16 17:59:18 2020 UTC. The chair is phw. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:59:18 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
17:59:21 <arma2> dcf1: i am using your turbo tunnel snowflake a11 build and it is working great
17:59:21 <phw> hello everyone
17:59:25 <cjb> hi!
17:59:40 <agix> hi
17:59:40 <phw> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
18:00:08 <cohosh> hi
18:00:09 <juggy> hi
18:00:22 <dcf1> arma2: thanks, same here. I haven't had any CircuitBuildTimeout-related problems either.
18:00:57 <phw> one discussion item: should https://gitweb.torproject.org/snowflake-webext.git/ be in the pluggable-transports directory?
18:00:59 * cohosh is also enjoying the tasty dogfood
18:01:06 <cohosh> ah yeah
18:01:12 * phw takes a look
18:01:17 <cohosh> so we have a new repository now and can move ahead with the split
18:01:36 <cohosh> irl set it up but i forgot to ask whether we want it nested in the pluggable-transports directory
18:01:36 <dcf1> I vote for /pluggable-transports/snowflake-webext.git
18:01:42 <cohosh> okay
18:01:44 * phw thinks the answer should be "yes"
18:01:50 <cohosh> i think that's more people for that option then
18:02:07 <cohosh> i'll leave a comment for irl in case they don't get the pings here
18:02:13 <cohosh> thanks!
18:02:41 <phw> good, that was a productive discussion
18:03:02 <phw> let's take a look at each other's help sections in the pad
18:03:59 <phw> #32912, #33593, and #33666 for cohosh
18:04:06 <phw> #33365 for arlolra
18:04:17 <cohosh> i'll ping hiro about #32912 next week
18:04:38 <cohosh> and arlolra already took a look at #33593 for me
18:04:52 <cohosh> so that just leaves #33666
18:05:08 <phw> cohosh: i can also review #32912 if you'd like to get it done sooner
18:05:19 <cohosh> i can also take #33365, i started reviewing it this morning but didn't finish before the meeting
18:05:19 <arlolra> I can look at that too, but seems like there might be some conflicts with #33365
18:05:25 <cohosh> phw: ah okay that would be great
18:05:45 <arlolra> that = #33366
18:05:57 <arlolra> i mean 33666, sorry
18:06:01 <cohosh> arlolra: ah you're right
18:06:03 <arlolra> :(
18:06:46 <cohosh> just the missing feature part of it conflicts, right?
18:07:16 <arlolra> yeah
18:07:21 <cohosh> arlolra: how about i review yours first and then rebase on top of your branch?
18:07:31 <cohosh> that sounds the easiest to me
18:07:34 <arlolra> sure
18:07:51 <cohosh> i'm also interested in general feedback about the options proposed in #33666
18:08:03 <phw> i'll certainly take a look at #33666
18:08:09 <cohosh> thanks!
18:09:12 <phw> i think that's it for today. did i forget anyone? any other help needed?
18:09:56 <phw> i'll wait for 2 minutes. if there's nothing else, we can move on to today's reading group
18:12:19 <phw> ok, let's start our reading group
18:12:40 <cohosh> cool
18:12:42 <phw> we're discussing wang et al's ndss'20 paper today: https://censorbib.nymity.ch/#Wang2020a
18:13:14 <phw> may i hand the mic over to you, cohosh?
18:13:18 <cohosh> sure
18:13:34 <cohosh> so for a short summary
18:14:28 <cohosh> this paper looks at how differences in protocol state machines at DPI boxes vs. endpoint implementations can allow the DPI features of the box to be evaded
18:15:42 <cohosh> these differences can, for example, allow a client in a censored area to send some unusual traffic that will confuse the DPI box but be accepted at the other endpoint
18:15:58 <cohosh> which can allow users to evade censorship in some cases
18:16:13 <cohosh> this technique has been used before, for example the "your state is not mine" paper
18:16:30 <cohosh> https://censorbib.nymity.ch/#Wang2017a
18:17:09 <cohosh> what makes this paper different is that they are trying to more completely enumerate the evasion techniques against several different DPI boxes
18:18:04 <cohosh> they do this using software analysis techniques by exploring the state machines of popular endpoint implementations of TCP
18:18:39 <cohosh> and present evasion attacks on several DPI systems, including ones they have tested on the GFW
18:19:22 <cohosh> </summary>
18:19:48 <dcf1> In this paper, the only endpoint TCP implementation they consider is Linux.
18:20:01 <cohosh> aha yes
18:20:20 <dcf1> I wasn't so clear on whether it would be easy to do if you don't have the source code (e.g. Windows)
18:20:40 <phw> ...and it's about linux servers, specifically. it would be useful to know about evasion strategies for windows clients. then, bridges could do evasion without the clients even knowing.
18:20:47 <dcf1> Also they are looking at servers only, though the same idea could possibly apply to clients
18:21:22 <cohosh> yeah, they repeatedly state that knowing the endpoint source allowed them to come up with test cases
18:21:50 <dcf1> The "Sym" part of the title comes from symbolic execution; they use that to automatically explore the code paths of the TCP implementation and come up with candidate packet sequences
18:22:06 <cohosh> my understanding is that they could expand their test cases by looking at other implementations, but they could also use the ones they came up with and test them against a windows endpoint to see if they work?
18:22:31 <dcf1> Then they use their intuition (I think) to winnow down the set of candidates into ones that may also happen to insert/evade against a middlebox.
18:24:21 <phw> i was surprised by how easy it is to make zeek tear down its tcb. and i was equally surprised to learn that snort implements os-specific state machines
18:24:38 <dcf1> Me too re Snort.
18:24:52 <phw> but zeek didn't surprise you? ;)
18:25:01 <dcf1> No, not really, because
18:25:12 <dcf1> https://www.mattblaze.org/papers/internet-tap.pdf "The Eavesdropper's Dilemma"
18:25:44 <dcf1> There's an inherent tension in middleboxes between precision and generality (I forget what exact terms they use)
18:26:24 <cohosh> so the workflow I was imagining here with actually using the results of this paper to evade censorship is to 1) use symbolic execution on whatever endpoints you have available to come up with candidates, 2) narrow them down and test them against middleboxes to see if they work, 3) test them with whichever endpoint implementations you'd like to actually use to see if they work for you, 4) deploy them
18:26:30 <dcf1> Zeek could try and be very precise and reject packets that would cause it to tear down its TCB, or it could be very liberal and accept any that might do so
18:26:42 <dcf1> You get errors on either side
18:27:12 <dcf1> You kind of have to pick a place on the continuum you want to inhabit, or maybe simulate multiple possible worlds simultaneously
18:28:00 <dcf1> "Sensitivity" and "selectivity" are the words they use
18:28:09 <phw> dcf1: yes, makes sense on second thought
18:28:33 <cohosh> dcf1: thanks for linking that
18:30:06 <phw> regarding gfw evasion strategies: i'm intrigued by the lack of checksum verification. i suspect that was a conscious decision to increase firewall throughput
18:30:26 <dcf1> So one possible application of this research is to make a brdgrd-like program that you can run on the server
18:30:47 <dcf1> Same as with Geneva (I don't know if Geneva released that code yet)
18:31:44 <cohosh> ah nice
18:31:46 <dcf1> This paper and Geneva take two quite different approaches to discovering candidate packet sequences. SymTCP: offline, white box. Geneva: online, black box.
18:32:23 <dcf1> Though SymTCP is not fully offline, only the first part. After that they need online checks because that's the only access they assume they have to the DPI system.
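The idea in cohosh's summary and in dcf1's description of the pipeline (derive candidate packets from the endpoint, then keep the ones a middlebox treats differently) can be illustrated with a deliberately tiny differential check. This is a sketch only: SymTCP uses symbolic execution over the real Linux TCP code and then confirms candidates online, whereas this swaps in brute-force enumeration over two hypothetical toy acceptance functions (`endpoint_accepts`, `middlebox_accepts`) that do not model Linux, Zeek, Snort, or the GFW. The checksum-skipping middlebox merely echoes phw's remark about the GFW not verifying checksums, and the `odd_option` field and window size are made up for illustration.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical receive window (in sequence-number units) used by the toy endpoint.
RCV_WINDOW = 1000


@dataclass(frozen=True)
class Packet:
    """A drastically simplified, hypothetical view of one TCP data segment."""
    checksum_ok: bool  # does the TCP checksum verify?
    seq_offset: int    # sequence number minus the receiver's next expected one
    odd_option: bool   # carries an option the toy endpoint rejects (hypothetical)


def endpoint_accepts(p: Packet) -> bool:
    # Toy endpoint: verifies the checksum, rejects the odd option, and
    # buffers any segment that falls inside its receive window.
    return p.checksum_ok and not p.odd_option and 0 <= p.seq_offset < RCV_WINDOW


def middlebox_accepts(p: Packet) -> bool:
    # Toy middlebox: skips checksum and option validation, and only
    # processes the segment it expects next (no out-of-order reassembly).
    return p.seq_offset == 0


def find_discrepancies(candidates):
    """Split candidates into insertion packets and evasion packets."""
    insertion, evasion = [], []
    for p in candidates:
        ep, mb = endpoint_accepts(p), middlebox_accepts(p)
        if mb and not ep:
            insertion.append(p)  # the middlebox processes it, the endpoint drops it
        elif ep and not mb:
            evasion.append(p)    # the endpoint accepts it, the middlebox ignores it
    return insertion, evasion


# Enumerate every combination of a few interesting field values.
candidates = [
    Packet(c, s, o)
    for c, s, o in product([True, False], [-1, 0, 500, 1000], [True, False])
]
insertion, evasion = find_discrepancies(candidates)
for p in insertion:
    print("insertion:", p)
for p in evasion:
    print("evasion:  ", p)
```

In this toy setup, the packets with a bad checksum or the odd option at the expected sequence number come out as insertion candidates, and the in-window but out-of-order segment comes out as an evasion candidate. Real candidates would still need the online confirmation step dcf1 describes.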
18:32:57 <cjb> Is there a way to compare the approaches, see which might end up with more coverage over the possible relevant sequences? I saw SymTCP describes itself as "more principled" than Geneva.
18:33:18 <dcf1> Yeah I flinched a bit at that statement, I thought it was a bit out of line.
18:33:41 <dcf1> I wish people who write papers wouldn't write that kind of thing, but to some extent they are forced to do so by program committees.
18:34:15 <cohosh> :/
18:34:30 <cjb> It would have been great to see a comparison that was more like "here's an evasion we found that Geneva would not find because reasons"
18:34:40 <dcf1> cjb: I agree
18:34:59 <cohosh> in addition to coverage, there are other things to consider as well, like the amount of traffic that's generated by each side in the online phases
18:36:20 <cohosh> i'm assuming both systems want each client/server to perform these steps separately. is there another route here where either or both of SymTCP/Geneva are run and the successful evasion techniques gathered to be integrated into tools manually?
18:36:38 <cohosh> perhaps that was the original intent
18:36:49 <dcf1> SymTCP seems like it still requires a fair amount of manual work, I'm looking for the passages that highlight that.
18:36:57 <phw> i'd love to experiment with some of these strategies. in particular, the ones based on bad checksums and timestamps. i assume that's conceptually harder for the gfw to fix than many of the other strategies
18:37:27 <cohosh> phw: yeah, anything that makes DPI more expensive to fix is interesting :D
18:37:53 <dcf1> Geneva's actual genetic algorithm is not public yet, apparently. https://github.com/Kkevsterrr/geneva "We will be releasing the genetic algorithm at a later date."
18:38:22 <dcf1> SymTCP source code is at https://github.com/seclab-ucr/SymTCP
18:41:25 <cohosh> does anyone know if the techniques from "your state is not mine" were ever implemented and deployed?
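phw's interest in bad-checksum strategies depends on how the TCP checksum is built. A minimal sketch, assuming IPv4 and only the Python standard library: it computes the RFC 1071 ones'-complement checksum over the IPv4 pseudo-header plus the segment (the TCP checksum rule from RFC 793), then flips one bit to produce a deliberately invalid value. The ports, sequence numbers, and payload are illustrative values, and `tcp_header` is a hypothetical helper. Actually putting such a segment on the wire needs a raw socket or a packet-crafting tool, which is not shown.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """TCP checksum over the IPv4 pseudo-header plus the TCP segment."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment)))
    return internet_checksum(pseudo + segment)


def tcp_header(sport, dport, seq, ack, flags, window, checksum=0) -> bytes:
    """20-byte TCP header with no options (data offset = 5 words)."""
    return struct.pack("!HHIIBBHHH", sport, dport, seq, ack,
                       5 << 4, flags, window, checksum, 0)


# Illustrative values; the addresses are RFC 5737 documentation addresses.
SRC, DST = "192.0.2.1", "198.51.100.2"
PSH_ACK = 0x18
payload = b"GET / HTTP/1.1\r\n\r\n"

good = tcp_checksum(SRC, DST, tcp_header(40000, 80, 1, 1, PSH_ACK, 65535) + payload)
bad = good ^ 0x0001  # flip one bit so the checksum no longer verifies

good_segment = tcp_header(40000, 80, 1, 1, PSH_ACK, 65535, good) + payload
bad_segment = tcp_header(40000, 80, 1, 1, PSH_ACK, 65535, bad) + payload

# A receiver that verifies checksums recomputes over the whole segment
# (checksum field included) and expects the result to be 0.
print(tcp_checksum(SRC, DST, good_segment) == 0)  # True
print(tcp_checksum(SRC, DST, bad_segment) == 0)   # False
```

Flipping a single bit is used rather than complementing the whole value because, in ones'-complement arithmetic, 0x0000 and 0xFFFF represent the same number, so some "corruptions" can accidentally still verify.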
18:42:34 <phw> there's a paragraph that talks about the ambiguity of tcp's urgent pointer. in particular, it mentions that payload that's marked as urgent could be "pushed to the application layer using a separate interface". i wonder what that means because all the application has is a socket, no?
18:43:16 <dcf1> phw: there's a special syscall for it, I think. Some telnet implementations use it to move user keystrokes to the head of the line for a small amount of out-of-order processing.
18:44:05 <phw> dcf1: oh, interesting
18:44:13 <yanmaani> I don't understand this - pardon me if it's far below your level, but isn't it a simple Bayesian thing? If you have someone who usually gets flagged for proxies, their packets can be checked much harder.
18:44:21 * dcf1 opens up UNIX Network Programming... send(..., MSG_OOB)
18:44:55 <yanmaani> Like, isn't the basic assumption that if you get through, you get through, and if you don't, then you try again tomorrow and hope things are better or your software is fixed?
18:46:13 <phw> the paper also reminded me of vecna's sniffjoke: https://tools.kali.org/sniffingspoofing/sniffjoke i wonder what he would think of this paper
18:46:26 <cohosh> yanmaani: this is a good insight. at the moment, we're not aware of different DPI techniques being used for traffic that corresponds to users who generate a lot of known proxy traffic
18:46:33 <cohosh> at least as far as i know
18:46:34 <dcf1> yanmaani: I don't understand what you mean. Are you referring to a topic of this paper or is it a question in general about proxy detection?
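dcf1's pointer to `send(..., MSG_OOB)` (18:44) answers phw's question about the "separate interface": with BSD-style sockets, urgent data is sent with the `MSG_OOB` flag to `send()` and read back with the same flag to `recv()`, on the same socket. A minimal loopback sketch, assuming a Linux host with `SO_OOBINLINE` left at its default (off), where the single urgent byte is pulled out of the normal stream; `demo_urgent_byte` is a hypothetical helper name.

```python
import select
import socket


def demo_urgent_byte(timeout=5.0):
    """Send b"hello" normally and b"!" as urgent (out-of-band) data over loopback.

    Returns (urgent_byte, normal_data) as seen by the receiving side.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        client.sendall(b"hello")
        client.send(b"!", socket.MSG_OOB)  # sets the TCP URG flag and urgent pointer

        # Pending urgent data is reported to select() as an "exceptional condition".
        _, _, exceptional = select.select([], [], [server], timeout)
        if not exceptional:
            raise TimeoutError("urgent data did not arrive")

        urgent = server.recv(1, socket.MSG_OOB)  # read the out-of-band byte
        normal = server.recv(100)                # ordinary in-band stream data
        return urgent, normal
    finally:
        client.close()
        server.close()
        listener.close()


if __name__ == "__main__":
    print(demo_urgent_byte())
```

The urgent byte should come back through the `MSG_OOB` read, separate from the ordinary stream. Whether that byte counts as part of the stream depends on socket options and on how each stack interprets the urgent pointer, which is the kind of ambiguity the paper points at.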
18:47:04 <yanmaani> I was under the impression that the GFW worked like this - in times of harder political pressure, they quite literally "cranked it up"
18:47:25 <yanmaani> dcf1: A question in general
18:47:52 <yanmaani> So, intuitively, if you live in the sort of country to implement large-scale internet censorship, you will most likely also live in the sort of country which is not overkeen on anonymous internet access
18:48:54 <yanmaani> And then it seems like you could just keep a "reverse blacklist" of IPs, just like with spam: This guy here is in his 20s, reasonably good with technology, and has previously had packets blocked for firewall evasion - let's expend a few extra cycles on checking his egress
18:49:43 <yanmaani> And vice versa - this is a lifelong party member in her 70s who only got a smartphone last year and has never uttered any unorthodoxy thus far, so let's only do the very basic filtering
18:50:12 <dcf1> yanmaani: I feel like this discussion could take us far afield from the topic of this week's reading group
18:50:16 <cohosh> yanmaani: my intuition is that this kind of blacklist would be really difficult to implement at scale
18:50:33 <yanmaani> right, not gonna go further off-topic. thanks
18:51:28 <cohosh> so going back to the original questions we had to think about for this
18:51:28 <dcf1> So one of the common objections to this style of evasion is that it requires low-level socket access (e.g. root), which is harder to deploy.
18:52:03 <dcf1> Like, if the strategy calls for us to send a packet with a bad TCP checksum, how do we actually implement that on an iPhone?
18:52:07 <cohosh> dcf1: right, so it's better as a bridge-side defence?
18:52:36 <phw> i bet that the "bad checksum" strategy works in both directions, so the bridge could take care of that
18:52:41 <dcf1> Maybe yes, maybe no. Even on the bridge side there's a step down in usability when something requires root
18:53:31 <dcf1> One nice thing about Geneva is that at least a few strategies don't require low-level packet access, you can do them in user space, if I recall correctly.
18:53:50 <phw> this is something that we could deploy in our obfs4 docker container because the complex setup process would be done by us, and not the operator
18:53:56 <cohosh> does geneva not limit itself to TCP?
18:54:04 <dcf1> But still most of Geneva's are similar in nature to what SymTCP finds, is my impression
18:54:06 <cohosh> (haven't read that paper unfortunately)
18:54:20 <dcf1> cohosh: see https://github.com/net4people/bbs/issues/23 for a summary
18:54:53 <cohosh> cool thanks
18:55:16 <cohosh> phw: that's a good point
18:55:43 <cohosh> although, i don't think these techniques will help overly much with the kind of blocking we typically see of obfs4 bridges, correct?
18:55:50 <cohosh> since those are blocked by IP address?
18:55:59 <phw> thymbahutymba: you may find the above interesting
18:56:33 <phw> cohosh: yes, correct.
18:56:47 <dcf1> cohosh: yes, that's another part of it. The other common objection to systems like this is, aren't the evasion techniques fragile / security by obscurity / only work until the censor knows you are using them?
18:57:11 <dcf1> I think both SymTCP and Geneva make the claim that by automating the discovery of new evasions, they mitigate that risk.
18:57:23 <cohosh> dcf1: for that i think the ones that make DPI expensive are especially interesting though
18:57:39 <dcf1> But of course no one has put that idea to the test, tried deploying something and then reacting when the censor notices it.
18:57:42 <cohosh> even if the censor patches their stuff, maybe we cost them money and it works for a while in the meantime?
18:57:59 <dcf1> cohosh: yes, that too. That was a point made by...
18:58:02 <cohosh> although yeah idk, i guess it's hard to tell what the impact of that is
18:58:17 <dcf1> https://censorbib.nymity.ch/#Khattak2013a
18:58:30 <cohosh> ah nice thanks
18:58:51 <dcf1> "ways that users can alter network traffic that will both avoid detection and, crucially, require the censor to make expensive changes to the system’s basic model to remedy."
18:59:10 * cohosh 's reading list is growing exponentially lol
18:59:30 <phw> tcp-based evasion may be attractive for, say, web servers where PTs aren't an option. but of course, dns-based blocking would still remain an issue.
18:59:35 <dcf1> So that might be an interesting way to push work like this further, make hypotheses about what evasions would be expensive (i.e. rank candidates) and then try them.
19:00:12 <dcf1> phw: yes, it could be an interesting short-term unblocking option for web server operators who can't train all their users to use circumvention software, in some cases
19:00:15 <cohosh> yup a long-term study on DPI changes would be super interesting
19:01:21 <dcf1> cohosh: I have short informal summaries of some papers at https://www.bamsoftware.com/papers/thesis/summaries.txt
19:01:29 <dcf1> In case you want to bias yourself before reading them ;)
19:01:56 <cohosh> haha nice
19:02:52 <phw> dcf1: i'd love to incorporate that in censorbib somehow. eg, the summaries could show up when you move your mouse over a paper
19:05:10 <cohosh> it seems like we've pretty much covered the questions we were wanting to ask ourselves about this paper
19:05:58 <phw> yes, shall we wrap it up?
19:06:01 <cohosh> the actions we could take being updating the obfs4 docker, although it's probably not going to help much with our obfs4 bridges being blocked at the moment
19:06:32 <cohosh> and future work being getting people to try these techniques and see what happens
19:06:40 <cohosh> yeah, anything else to add?
19:08:00 <phw> it would be exciting to see how many evasion opportunities there are in ipv4 and ipv6. wouldn't it be great if we could unblock torproject.org on the server side?
19:08:21 <cohosh> lol nice
19:08:41 <dcf1> About IPv6, I did suggest that to some research group, I don't remember which...
19:09:02 <dcf1> You notice on SymTCP how much trouble TCP options gave them, because they are variable-length and otherwise variable
19:09:41 <dcf1> IPv6 extension options are very similar in that respect. Probably a lot of middleboxes don't actually fully and correctly parse them, only recognizing the most common stereotyped patterns.
19:10:30 <dcf1> Ah yes I suggested it to the Geneva group. "In India in Section 5.3, I was curious about the potential with IPv6 extension options—if they can't handle TCP options they may not be able to handle IPv6 extension option layouts that differ from the most common ones."
19:11:10 <phw> dave levin reads both our anti-censorship list and ooni's channels. it would be great if we could convince him to tackle some of these open questions.
19:11:59 <phw> i'll point him to the meeting log of this reading group
19:12:34 <cohosh> cool :)
19:12:55 <phw> would anyone like to suggest a paper for the next reading group?
19:13:10 <dcf1> I am reading MassBrowser
19:13:42 <phw> okay, let's discuss massbrowser on april 30?
19:13:46 <dcf1> https://censorbib.nymity.ch/#Nasr2020a or at least that was the next summary I planned to post to bbs
19:13:56 <cohosh> sounds good to me
19:15:08 <phw> that was a great conversation today, thanks!
19:15:11 <phw> let's wrap it up
19:15:15 <phw> #endmeeting
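dcf1's point at 19:09 that TCP options and IPv6 extension headers are both variable-length chains is easy to see in code. A minimal sketch of a careful walker for each, written from the TCP option rules (kinds 0 and 1 are single bytes, every other option carries a length byte that counts itself and the kind byte) and the IPv6 extension-header rules in RFC 8200 (Hop-by-Hop, Routing, and Destination Options lengths are in 8-octet units beyond the first 8; Fragment is fixed at 8). The function names are hypothetical, this is not code from SymTCP or Geneva, and it deliberately skips AH, ESP, and per-option semantics. A middlebox that shortcuts any of these length rules can end up out of sync with the endpoint.

```python
# IPv6 Next Header values whose length field counts 8-octet units beyond
# the first 8 octets (RFC 8200).
EIGHT_OCTET_EXT = {0, 43, 60}  # Hop-by-Hop, Routing, Destination Options
FRAGMENT = 44                  # Fragment header: always 8 octets


def parse_tcp_options(opts):
    """Walk TCP options as (kind, data) pairs; raise on malformed lengths."""
    out, i = [], 0
    while i < len(opts):
        kind = opts[i]
        if kind == 0:            # End of Option List: stop parsing
            break
        if kind == 1:            # No-Operation: a single padding byte
            out.append((1, b""))
            i += 1
            continue
        if i + 1 >= len(opts):
            raise ValueError("option kind %d has no length byte" % kind)
        length = opts[i + 1]     # includes the kind and length bytes
        if length < 2 or i + length > len(opts):
            raise ValueError("option kind %d has bad length %d" % (kind, length))
        out.append((kind, bytes(opts[i + 2:i + length])))
        i += length
    return out


def find_upper_layer(next_header, payload):
    """Follow an IPv6 extension-header chain; return (protocol, offset).

    Handles only Hop-by-Hop, Routing, Destination Options, and Fragment
    headers; any other value (e.g. AH, ESP, TCP) ends the walk.
    """
    offset = 0
    while next_header in EIGHT_OCTET_EXT or next_header == FRAGMENT:
        if offset + 2 > len(payload):
            raise ValueError("truncated extension header")
        if next_header == FRAGMENT:
            hdr_len = 8
        else:
            hdr_len = (payload[offset + 1] + 1) * 8
        if offset + hdr_len > len(payload):
            raise ValueError("extension header runs past the packet")
        next_header, offset = payload[offset], offset + hdr_len
    return next_header, offset


# MSS (1460), two NOPs, then a Timestamps option with zeroed values.
tcp_opts = bytes([2, 4, 0x05, 0xB4, 1, 1, 8, 10] + [0] * 8)
print(parse_tcp_options(tcp_opts))

# Two 8-octet Destination Options headers (each padded with a PadN option),
# the second one pointing at TCP (Next Header 6).
dest_opts_then_dest_opts = bytes([60, 0, 1, 4, 0, 0, 0, 0])
dest_opts_then_tcp = bytes([6, 0, 1, 4, 0, 0, 0, 0])
print(find_upper_layer(60, dest_opts_then_dest_opts + dest_opts_then_tcp))  # (6, 16)
```

A parser that, for example, assumed at most one extension header, or a fixed option layout after the TCP header, would compute a different offset for the upper-layer data than the endpoint does, which is the mismatch dcf1 suggests probing.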