17:59:03 #startmeeting anti-censorship meeting
17:59:03 Meeting started Thu Apr 2 17:59:03 2020 UTC. The chair is phw. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:59:03 Useful Commands: #action #agreed #help #info #idea #link #topic.
17:59:10 hello everybody
17:59:20 here is today's meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
17:59:22 o/
17:59:30 hi!
17:59:37 o/
18:00:08 hi
18:00:09 we have an empty agenda today. does anyone have impromptu announcements, questions, or discussion topics?
18:00:49 Could I know where to find the relevant code for studying Moat in Tor's source code?
18:01:40 let me find that for you
18:02:10 On the client I believe it's part of Tor Launcher
18:02:11 https://gitweb.torproject.org/tor-launcher.git/tree/src/modules/tl-bridgedb.jsm
18:02:29 that should be the server-side component: https://gitweb.torproject.org/pluggable-transports/meek.git/tree/meek-server
18:02:37 * antonela is late, o/
18:02:45 Thank you!
18:03:02 (hi folks!)
18:03:08 juggy: did i forward you the email explaining moat?
18:03:26 No, I don't think so
18:03:29 sysrqb once wrote a lengthy email, which is the best moat documentation we have :)
18:03:48 phw: can you cc me on that? or maybe post it to the anti-censorship mailing list?
18:04:03 phw: cc me please too :)
18:04:05 me too thank you :)
18:04:09 cohosh: that's a good idea, i might as well send it to our anti-censorship list
18:04:38 for those who aren't subscribed, you can do so here: https://lists.torproject.org/cgi-bin/mailman/listinfo/anti-censorship-team
18:06:34 sysrqb: are you ok with me forwarding your email (<20190215165707.xag6rl334h3f7vrj@localhost>) to our public team list?
18:06:51 anything else before we take a look at reviews?
18:07:52 ok, let's get started with reviews
18:08:11 i have #29686 and #30941
18:08:15 * phw looks at cohosh
18:08:23 yep i can take those :)
18:08:30 thanks! :)
18:08:33 cohosh has #33666
18:08:53 I'm going to run some tests of my own for #33666
18:09:01 ok cool dcf1
18:09:15 and I'll comment on the ticket, short summary is that I think (3) is a good option
18:09:20 Is reviews for requesting others to take a look at what you've done for a ticket?
18:09:45 juggy: yes, exactly. we don't merge code unless it has been reviewed by somebody else
18:09:49 juggy: yup!
18:10:02 for juggy i see #10831 needs a review
18:10:07 i can take a look at that
18:10:09 #10831
18:10:12 Yes, thanks!
18:10:53 i think arlo's #19026 is from last week because it's already fixed
18:11:03 thanks for working on that one juggy
18:11:05 juggy: it's also a good idea to change the ticket to needs_review state (if your account lets you do that)
18:11:25 i see, let me try
18:11:28 and if it doesn't, we should upgrade your account
18:12:33 agix has a question wrt #19997. let me take a look
18:12:59 thanks
18:13:52 Hmm under "modify ticket" I only see one disable radio button for "leave as assigned"
18:14:04 disabled*
18:14:16 huh, i wasn't even aware of this ticket. i think i'll have to do some more digging before i'll be able to answer your question agix :/
18:14:41 cohosh: i can do trac changes if any are needed
18:14:51 cohosh: come to think of it, maybe i should make it so somebody else here can too, if they can't :)
18:15:02 arma2: can you give juggy trac powers?
18:15:11 no problem :) is it okay if i still start with this one or should i just tackle a different ticket?
18:15:22 yes. GRP_user enough? or GRP_devel? i don't actually know how they differ
18:15:37 i'll try GRP_devel because it sounds more impressive
18:15:50 agix: and regarding "How many bridges can currently be requested by a user from a particular Tor exit address?": a client can request bridges over and over but tor exits are supposed to have their own pool of bridges. that is, if you request bridges from an exit relay, you shouldn't be getting bridges that are allocated for https, moat, or email
18:16:20 i see, thanks for that!
18:16:22 does that answer your question?
18:16:26 juggy: it is done
18:16:43 thanks!
18:17:18 hmm, we should probably verify that this is actually happening. the code doesn't always match our understanding of the code.
18:17:37 yup, do you want me to work on a different ticket in the meantime?
18:18:16 phw: and if it turns out to still be true (hopefully so), i wonder if we should tag it as a new bonus https-tor distribution bucket on atlas
18:18:32 agix: #19997 seems like a great thing to be working on. i believe i also owe you one or two reviews. i'll try to get to them today or tomorrow
18:19:03 cool thanks :) i'll be on irc or just send me a mail if you got some info on it
18:19:07 arma2: i suggest calling it the "gfw distribution bucket"
18:19:37 lol
18:19:39 works for me. just a few lines down the list from the gfy distribution bucket.
18:20:02 (also an apt name)
18:20:33 did i forget anyone's work?
18:21:28 catalyst: oops, i owe you a response in #5304. sorry about that
18:22:30 phw: thanks. i'm not currently blocking on it, at least until i finish evaluating the current patch
18:22:57 ok, good
18:23:07 shall we conclude our meeting and move on to the reading group?
18:23:55 by "concluding the meeting" i mean changing topics. there's no need to end the meeting and lose logs
18:24:12 sounds good to me
18:24:29 a reminder: we're discussing https://censorbib.nymity.ch/#Frolov2020a today
18:24:48 we also have a new section, "reading group", on our pad
18:25:24 let me briefly summarise the paper and start with some background
18:26:29 several years ago, china's great firewall was enhanced with an "active probing" infrastructure which turned out to be highly effective at blocking circumvention protocols
18:27:44 it works in two stages: first, it identifies traffic that *may* be circumvention protocols. then, it actively talks to the circumvention proxies it found in the first step, and if they speak the protocol it suspected, it can block them.
18:27:58 vanilla tor and many other protocols fell prey to it
18:28:10 so we started developing pluggable transport protocols that are resistant to this kind of attack
18:28:28 the idea is to not talk to a client unless the client can prove knowledge of a shared secret in the first few bytes it sends to the server
18:29:17 for obfs4, this effectively means that you must have gotten the obfs4 bridge from bridgedb. if you only have its ip address and port, you cannot get it to talk to you
18:29:35 a bunch of other protocols started implementing this idea and it turned out to be relatively effective, i would say
18:30:41 "effective" meaning that the gfw has yet to block these probing-resistant protocols. we're still struggling though because distributing these obfs4 bridges is still a big issue
18:31:06 anyway, the goal of the paper was to look at how one can tell apart probing-resistant protocols from "ordinary" protocols
18:32:00 their idea was to use two data sets (one passive and one active) and then look for ways to distinguish probing-resistant protocols from all other protocols
18:32:53 for these two datasets, they were reasonably successful and found a handful of clever and effective attacks. it is not perfectly clear how the findings generalise to the entire internet, though
18:34:08 i think their datasets give us a good idea of how hard, approximately, the problem is but there's still a lot of uncertainty left
18:35:26 in practice, obfs4 still works. that can mean two things: it may be (too) hard for the gfw to block it, and/or the gfw is successful enough at getting obfs4 bridges from bridgedb, so there's no need to block the protocol itself.
18:36:26 and with that, i want to stop my monologue and ask y'all for your thoughts. what do you conclude from the paper? and how does it apply to our problem of distributing obfs4 bridges?
18:36:58 to be fair it looks like obfs4 made this change in response? https://gitweb.torproject.org/pluggable-transports/obfs4.git/commit/?id=1a6129b66ff3e66c347b54fbae203c1c61d12d74
18:37:21 oh, right. i believe sergey reached out to yawning, who then fixed this issue.
18:37:39 Something I was wondering about was, have they shared their findings regarding which servers they were able to identify and if so did we check if those addresses actually were Tor servers speaking obfs4?
18:39:30 agix: according to Table IV on page 11, they only found 2 servers that were classified as obfs4
18:39:49 that's a good question. i haven't heard from the authors but then again, i believe they ran their experiments before i started working full-time on tor.
18:40:21 Section V.D on page 9 says that those 2 are unlikely to be actual obfs4 servers (i.e. they are false positives)
18:40:27 there definitely wasn't a mail from them to us about bridges they found. i'm glad they talked to yawning though.
18:41:25 "Both server are in China, and on serves a TLS certificate valid for several subdomains of baofeng.com." I interpret that to mean that they didn't really find any obfs4 bridges worth reporting
18:41:40 i was curious whether this was related at all to the shadowsocks active probing https://gfw.report/blog/gfw_shadowsocks/
18:42:23 agix: Their Lampshade match also looks like a false positive, as are likely almost all of their MTProto matches, but in the case of OSSH they were able to verify with Psiphon that 7/8 of the classifications were correct.
18:43:43 dcf1: I guess you are right, sounds like false positives to me as well
18:44:02 cohosh: it's possible that some of the different shadowsocks probes are aimed at classification of this type. Especially the ones that are 0-50 bytes long, they seem to match certain byte thresholds for various ways of using shadowsocks.
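The probe resistance described at 18:28:28 (stay silent unless the client's first bytes prove knowledge of a shared secret) and the byte thresholds mentioned at 18:44:02 can be made concrete with a toy model. This is a hypothetical sketch, not obfs4's wire format: obfs4's real handshake is more involved (it hides an Elligator2-encoded key and HMAC-based marks in the client's first message), and the nonce/tag layout, the `threshold` and `timeout` values, and all function names below are assumptions made for illustration. The per-bridge secret roughly plays the role of the `cert=` parameter in an obfs4 bridge line.

```python
"""Toy model of a probe-resistant server (hypothetical scheme, NOT obfs4's wire format).

1. A client proves knowledge of a per-bridge secret in its first bytes:
   a random nonce followed by a truncated HMAC-SHA256 of that nonce.
2. A prober without the secret never gets an answer. What it *can* observe
   is how and when the server eventually closes the connection.
"""
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Optional

NONCE_LEN = 16  # hypothetical layout: 16-byte nonce ...
TAG_LEN = 16    # ... followed by a 16-byte truncated HMAC tag


def client_hello(secret: bytes) -> bytes:
    """First bytes a legitimate client sends."""
    nonce = os.urandom(NONCE_LEN)
    tag = hmac.new(secret, nonce, hashlib.sha256).digest()[:TAG_LEN]
    return nonce + tag


def server_accepts(secret: bytes, first_bytes: bytes) -> bool:
    """True only if the first bytes carry a valid tag for `secret`."""
    if len(first_bytes) < NONCE_LEN + TAG_LEN:
        return False  # too short to authenticate; the server stays silent
    nonce = first_bytes[:NONCE_LEN]
    tag = first_bytes[NONCE_LEN:NONCE_LEN + TAG_LEN]
    expected = hmac.new(secret, nonce, hashlib.sha256).digest()[:TAG_LEN]
    return hmac.compare_digest(tag, expected)  # constant-time comparison


@dataclass(frozen=True)
class Observation:
    close: str              # "FIN", "RST", or "none" (connection stays open)
    after: Optional[float]  # seconds until the close, None if it never closes


def react_to_failed_probe(probe_len: int, threshold: Optional[int],
                          timeout: Optional[float]) -> Observation:
    """What a prober sees after sending `probe_len` unauthenticated bytes.

    Simplifying assumption: a server with a byte threshold reads exactly
    `threshold` bytes and then closes. Closing while unread bytes remain in
    the receive buffer makes typical TCP stacks (e.g. Linux, following
    RFC 1122) send RST instead of FIN.
    """
    if threshold is not None and probe_len >= threshold:
        return Observation("RST" if probe_len > threshold else "FIN", 0.0)
    if timeout is not None:
        # Assumes the server keeps draining the socket until the timeout,
        # so nothing is left unread and the close is a clean FIN.
        return Observation("FIN", timeout)
    return Observation("none", None)  # never time out, never stop reading


def distinct_reactions(threshold: Optional[int], timeout: Optional[float],
                       lengths=range(0, 20000, 512)) -> set:
    """Sweep probe lengths; more than one distinct reaction leaks the policy."""
    return {react_to_failed_probe(n, threshold, timeout) for n in lengths}


if __name__ == "__main__":
    secret = os.urandom(32)  # stands in for the secret distributed with the bridge line
    print(server_accepts(secret, client_hello(secret)))           # True
    print(server_accepts(secret, os.urandom(64)))                 # False: prober lacks the secret
    print(len(distinct_reactions(threshold=4096, timeout=45.0)))  # 3
    print(len(distinct_reactions(threshold=None, timeout=45.0)))  # 1
```

The sweep illustrates the concern raised in the discussion: with a byte threshold, a prober sees three different reactions depending on probe length, which reveals the threshold, whereas a fixed timeout yields one reaction that is still a fingerprint of its own (every failed probe is closed after the same delay). Never timing out, as noted at 19:06:48 for MTProto, removes that signal at the cost of holding connections open. A check like `server_accepts` alone would also accept a replayed hello, so a real protocol needs replay protection on top.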
18:44:33 cohosh: but other probe types like plain replay are more straightforward than the subtle attacks of this paper.
18:45:00 i'm very curious what their scan results would look like over udp. for example, take all udp end points that their university clients talked to, and then try to get them to talk.
18:45:01 dcf1: okay cool
18:46:02 phw: that's a good angle. Lots of UDP protocols are "probe-resistant" in that their default response is no response unless you know exactly what protocol to send.
18:46:26 oh, nice
18:46:26 I.e., why Nmap classifies non-responsive TCP ports as "filtered" and non-responsive UDP ports as "open|filtered"
18:46:58 besides, universities are a somewhat "sterile" environment in that they are unlikely to have a lot of, say, torrent traffic. i wonder what else they're missing.
18:47:52 (re udp, see also blanu's original 'dust' idea, which is kind of like scramblesuit but for udp)
18:48:12 The paper's selection of probes designed to elicit a response (HTTP/TLS/Modbus/S7/random/empty) is pretty reasonable, but it's easy to imagine tweaks to that list that would result in different counts for what you consider responsive hosts.
18:48:47 here's the dust paper: https://censorbib.nymity.ch/#Wiley2011a
18:49:11 does the paper look just at protocols that are popular at an american university, or is there any attempt to use protocols that are popular over the gfw?
18:49:33 I guess they were influenced by what probes are available by default in Zgrab (https://github.com/zmap/zgrab): -modbus, -s7
18:50:32 yes, modbus and s7 seemed like unexpected choices to me
18:51:15 what do you mean, arma2? Their Tap dataset is more about endpoints, not about protocols per se.
18:51:28 arma2: i don't think so. the problem is that we don't have a good idea (beyond anecdotal evidence) of what's popular behind the gfw.
18:52:16 yep. doesn't make it invalid, but it turns it more into a "somebody could" paper than a "we did" paper
18:52:25 I don't think the choice of probing protocols matters very much; it's just a prefilter to get rid of endpoints that ever respond (which can never be one of the circumvention protocols they're looking for)
18:52:38 it sure would be interesting to learn how different things would look for tsinghua university
18:52:47 ok, that's an even better question: "how much does having the right set of protocols matter here"
18:52:48 What you're left with is a superset of hosts that includes probe-resistant proxy servers
18:53:26 So you could send more protocol probes and filter out a few more candidates, but I don't think it changes much
18:53:51 I wonder how the authors came up with the hypothesis that FIN vs RST might differ, and how we could be sure that they thought of all the varying characteristics to test.
18:54:54 Like, would it make sense to approach the problem from another way, run each of these projects locally, and point some kind of TCP-level fuzzer at it, varying size and timing of bytes sent?
18:55:00 cjb: or, was it not a "here's a hypothesis, let's test it, oh look we're right", but more of a "let's stare at the tcpdumps, hey that's weird"
18:55:31 cjb: for that last idea, see also the recent geneva paper, which essentially tries to fuzz firewalls like that
18:55:49 (geneva mostly focuses on tcp packet level tricks, not tcp flow level tricks, i think)
18:56:02 * phw continues to be arma2's link bot: https://censorbib.nymity.ch/#Bock2019a
18:56:11 (thanks!)
18:56:40 ok so to reframe that question: "could something like geneva have found that obfs4 fin/rst identifier?"
18:57:03 or is the search space not a thing that's easy or productive to search over
18:57:30 tbh, i'm not sure whether this attack violates the "long tail" model of how obfs4 tries to hide. the fact that there were false positives suggests it doesn't
18:57:38 Geneva also does intra-TCP stuff: https://geneva.cs.umd.edu/posts/iran-whitelister/
18:58:23 cohosh: it's different, though, when you already have reason to suspect that a server is a proxy, like with obfs3 back in the day. That's more of a closed world.
18:58:40 okay so it's about filtering things out more
18:59:02 i would worry about this attack in combination with some other attack that gets you a different set of hints
18:59:17 like, this + the zig-zag attack where you already suspect your user of using obfs and you look at all their destinations
18:59:37 okay yup good point
18:59:41 Neither of their data sets is likely to contain obfs4 servers; I understand that they also didn't have as much control over their tap as a GFW would. They just got a list of IP:port from the IT department, or something like that, and couldn't run any passive protocol analysis as a prefilter.
19:00:00 also, if you find a potential obfs4 bridge, you can port scan it and see if it has an open OR port
19:00:34 Actually I'm surprised that they found so many Psiphon servers in their passive tap. I wonder who the users are.
19:00:44 dcf1: Wouldn’t it be more likely to find obfs4 servers in the ZMap dataset?
19:01:26 compared to the TAP dataset
19:01:58 agix: I don't think so. There's only what, like 2,000 obfs4 bridges, and they only scanned 20,000 random hosts in their Zmap dataset, the probability of intersection is small.
19:02:06 dcf1: they are collaborating with psiphon on some decoy routing projects
19:02:12 perhaps that has something to do with it
19:02:17 i wonder if "many psiphon connections" equals "many psiphon users" or if very few users could result in many connections due to how psiphon allocates proxies to users
19:02:35 dcf1: makes sense
19:02:49 The fact, though, that they made datasets out of both random scanning and a passive tap is a strong point. A lot of other projects wouldn't have the idea or the means to do that, and not know what they're missing.
19:02:51 dcf1: did their traffic set overlap with the psiphon refraction network experiment?
19:03:21 See Table III on page 6, the two are quite different qualitatively.
19:03:55 cohosh: haha, I didn't think of that. Yes, it's possible the Psiphon users are members of the research group.
19:04:15 (right, that was my same question)
19:04:56 to get back to our reading group agenda: what action do you all believe we should take based on this work?
19:05:57 From talking to members of the team, an open question is whether the fixed timeout adopted by some projects including obfs4 in response to this work is adequate.
19:06:07 phw: i liked your idea of investigating udp-based transports more
19:06:48 It seems the best behavior is to be like MTProto, never time out and never stop receiving bytes. But, I suppose because of concerns about resource usage, some projects didn't go that far in their mitigation.
19:07:40 i still think that proxy-specific randomisation is a potentially strong aspect of obfs4. we haven't relied on it a lot in the past, mostly because we didn't know what's worth randomising. the paper has some concrete ideas of how we could move forward here.
19:08:25 Outline changed to a fixed 59-second timeout (https://github.com/Jigsaw-Code/outline-ss-server/commit/c70d512e78525eba36bb1e6ad7a0868593166cf9)
19:08:37 phw: what's proxy-specific randomisation?
19:08:40 obfs4 removed its byte threshold but kept its time thresholds (https://gitlab.com/yawning/obfs4/-/commit/1a6129b66ff3e66c347b54fbae203c1c61d12d74)
19:09:17 cjb: obfs4 bridges have the ability to do some simple flow obfuscation, e.g., pad data bursts
19:09:37 cjb: see Section 4.3 of https://censorbib.nymity.ch/#Winter2013b
19:09:45 thanks
19:09:57 but they don't all do it the same way. the first time an obfs4 bridge starts, it randomly derives probability distributions that dictate how much padding is added
19:11:11 does obfs4 just close the (failed handshake) connection after 30 seconds always?
19:11:11 a brief reminder that we scheduled an extra 15 minutes for our anti-censorship meeting, so we only have three more minutes
19:12:07 i suggest continuing the technical discussion in #tor-dev, and wrapping it up here
19:12:21 nice
19:13:04 what do y'all think of the reading group format? i think it's important to stick with our agenda questions as much as possible because there's always potential to drift off
19:13:54 i'd do it again
19:14:11 also, an extra 15 minutes is not a lot of time. if we have a packed agenda before the reading group, there's little to no time for the reading group
19:14:26 so we may want to allocate 30 or maybe even 60 minutes after our meeting.
19:14:52 * cohosh nods
19:14:54 cohosh: yes, me too
19:15:04 +1
19:15:34 we had plenty of good questions and thoughts. i would like to turn this into a blog post, if possible.
19:16:23 any other thoughts on what we can improve?
19:16:27 or mailing list post
19:16:58 cohosh: right, i like that even more because it facilitates further discussion and the barrier of writing it is lower
19:17:31 also, we already have a post for this: https://github.com/net4people/bbs/issues/26 :)
19:18:32 so my revised plan is to distill the above conversation and post it to the net4people thread
19:19:44 i suggest we do this again in two weeks and add a 60 minute slot after our anti-censorship meeting. how does that sound?
19:19:45 cool, sounds good
19:19:51 +1
19:20:11 +1
19:20:38 ok!
19:20:41 is there gonna be a different paper to discuss?
19:20:52 yes, i suggest not discussing this one again :)
19:20:59 does anyone want to volunteer to pick the next one?
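The proxy-specific randomisation explained at 19:09:17 to 19:09:57 (each obfs4 bridge derives its own padding distributions the first time it starts) can be sketched as below. The class name, the length bounds, the number of buckets, and the weighting scheme are all hypothetical; obfs4's actual implementation differs, and a real transport would derive the distribution from a cryptographic DRBG rather than Python's `random` module.

```python
"""Sketch of per-bridge padding randomisation (hypothetical, not obfs4's code).

Each bridge draws a seed once, on first start, and derives from it a weighted
distribution over burst lengths. All of its connections then pad according to
that distribution, so different bridges produce different length histograms.
"""
import os
import random
from typing import List

MIN_LEN = 0     # hypothetical lower bound on a padded burst, in bytes
MAX_LEN = 1448  # hypothetical upper bound (roughly one TCP segment's payload)


class PaddingDistribution:
    """Weighted distribution over burst lengths, fixed for the bridge's lifetime."""

    def __init__(self, seed: bytes, buckets: int = 16) -> None:
        # Same seed -> same distribution. random.Random is NOT a CSPRNG;
        # it is used here only to keep the sketch short.
        shape = random.Random(seed)
        self.lengths: List[int] = sorted(shape.sample(range(MIN_LEN, MAX_LEN + 1), buckets))
        self.weights: List[float] = [shape.random() for _ in range(buckets)]
        # Separate RNG for per-burst draws, so sampling never changes the shape.
        self._draw = random.Random(os.urandom(16))

    def sample(self) -> int:
        """Target length for the next burst, drawn from this bridge's distribution."""
        return self._draw.choices(self.lengths, weights=self.weights, k=1)[0]

    def padding_for(self, payload_len: int) -> int:
        """Padding bytes needed to reach the sampled length (0 if the payload is longer)."""
        return max(0, self.sample() - payload_len)


def new_bridge_seed() -> bytes:
    """Done once per bridge; the result would be persisted alongside its keys."""
    return os.urandom(32)


if __name__ == "__main__":
    bridge_a = PaddingDistribution(new_bridge_seed())
    bridge_b = PaddingDistribution(new_bridge_seed())
    # The two bridges almost certainly pick different candidate lengths.
    print(bridge_a.lengths[:4], bridge_b.lengths[:4])
    print(bridge_a.padding_for(200))
```

Because the distribution is stable per bridge but differs between bridges, a censor cannot write a single packet-length rule that fits every bridge; the cost is the extra bandwidth spent on padding.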
19:21:09 i can volunteer
19:21:16 * phw passes the baton to cohosh
19:21:55 i wanted to check out YMTCP: Eluding Stateful Deep Packet Inspection with Automated Discrepancy Discovery
19:22:07 to err *SYMTCP
19:22:29 https://censorbib.nymity.ch/#Wang2020a
19:22:38 to switch gears and look at dpi techniques
19:22:44 cool!
19:23:16 ok, let's wrap it up and discuss the symtcp paper on april 16
19:23:23 thanks for volunteering, cohosh
19:23:38 #endmeeting