#tor-meeting log

17:59:03 <phw> #startmeeting anti-censorship meeting
17:59:03 <MeetBot> Meeting started Thu Apr  2 17:59:03 2020 UTC.  The chair is phw. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:59:03 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
17:59:10 <phw> hello everybody
17:59:20 <phw> here is today's meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
17:59:22 <gaba> o/
17:59:30 <cohosh> hi!
17:59:37 <catalyst> o/
18:00:08 <juggy> hi
18:00:09 <phw> we have an empty agenda today. does anyone have impromptu announcements, questions, or discussion topics?
18:00:49 <juggy> Could I know where to find the relevant code for studying Moat in Tor's source code?
18:01:40 <phw> let me find that for you
18:02:10 <dcf1> On the client I believe it's part of Tor Launcher
18:02:11 <dcf1> https://gitweb.torproject.org/tor-launcher.git/tree/src/modules/tl-bridgedb.jsm
18:02:29 <phw> that should be the server-side component: https://gitweb.torproject.org/pluggable-transports/meek.git/tree/meek-server
18:02:37 * antonela is late, o/
18:02:45 <juggy> Thank you!
18:03:02 <cjb> (hi folks!)
18:03:08 <phw> juggy: did i forward you the email explaining moat?
18:03:26 <juggy> No, I don't think so
18:03:29 <phw> sysrqb once wrote a lengthy email, which is the best moat documentation we have :)
18:03:48 <cohosh> phw: can you cc me on that? or maybe post it to the anti-censorship mailing list?
18:04:03 <agix> phw: cc me please too :)
18:04:05 <juggy> me too thank you :)
18:04:09 <phw> cohosh: that's a good idea, i might as well send it to our anti-censorship list
18:04:38 <cohosh> for those who aren't subscribed, you can do so here: https://lists.torproject.org/cgi-bin/mailman/listinfo/anti-censorship-team
18:06:34 <phw> sysrqb: are you ok with me forwarding your email (<20190215165707.xag6rl334h3f7vrj@localhost>) to our public team list?
18:06:51 <phw> anything else before we take a look at reviews?
18:07:52 <phw> ok, let's get started with reviews
18:08:11 <phw> i have #29686 and #30941
18:08:15 * phw looks at cohosh
18:08:23 <cohosh> yep i can take those :)
18:08:30 <phw> thanks! :)
18:08:33 <phw> cohosh has #33666
18:08:53 <dcf1> I'm going to run some tests of my own for #33666
18:09:01 <cohosh> ok cool dcf1
18:09:15 <dcf1> and I'll comment on the ticket, short summary is that I think (3) is a good option
18:09:20 <juggy> Is reviews for requesting others to take a look at what you've done for a ticket?
18:09:45 <phw> juggy: yes, exactly. we don't merge code unless it has been reviewed by somebody else
18:09:49 <cohosh> juggy: yup!
18:10:02 <phw> for juggy i see #10831 needs a review
18:10:07 <phw> i can take a look at that
18:10:09 <juggy> #10831
18:10:12 <juggy> Yes, thanks!
18:10:53 <phw> i think arlo's #19026 is from last week because it's already fixed
18:11:03 <antonela> thanks for working on that one juggy
18:11:05 <dcf1> juggy: it's also a good idea to change the ticket to needs_review state (if your account lets you do that)
18:11:25 <juggy> i see, let me try
18:11:28 <cohosh> and if it doesn't, we should upgrade your account
18:12:33 <phw> agix has a question wrt #19997. let me take a look
18:12:59 <agix> thanks
18:13:52 <juggy> Hmm under "modify ticket" I only see one disable radio button for "leave as assigned"
18:14:04 <juggy> disabled*
18:14:16 <phw> huh, i wasn't even aware of this ticket. i think i'll have to do some more digging before i'll be able to answer your question agix :/
18:14:41 <arma2> cohosh: i can do trac changes if any are needed
18:14:51 <arma2> cohosh: come to think of it, maybe i should make it so somebody else here can too, if they can't :)
18:15:02 <cohosh> arma2: can you give juggy trac powers?
18:15:11 <agix> no problem :) is it okay if i still start with this one or should i just tackle a different ticket
18:15:22 <arma2> yes. GRP_user enough? or GRP_devel? i don't actually know how they differ
18:15:37 <arma2> i'll try GRP_devel because it sounds more impressive
18:15:50 <phw> agix: and regarding "How many bridges can currently be requested by a user from a particular Tor exit address?": a client can request bridges over and over but tor exits are supposed to have their own pool of bridges. that is, if you request bridges from an exit relay, you shouldn't be getting bridges that are allocated for https, moat, or email
18:16:20 <agix> i see, thanks for that!
18:16:22 <phw> does that answer your question?
18:16:26 <arma2> juggy: it is done
18:16:43 <juggy> thanks!
18:17:18 <phw> hmm, we should probably verify that this is actually happening. the code doesn't always match our understanding of the code.
18:17:37 <agix> yup, do you want me to work on a different ticket in the meantime?
18:18:16 <arma2> phw: and if it turns out to still be true (hopefully so), i wonder if we should tag it as a new bonus https-tor distribution bucket on atlas
18:18:32 <phw> agix: #19997 seems like a great thing to be working on. i believe i also owe you one or two reviews. i'll try to get to them today or tomorrow
18:19:03 <agix> cool thanks :) ill be on irc or just send me a mail if you got some info on it
18:19:07 <phw> arma2: i suggest calling it the "gfw distribution bucket"
18:19:37 <cohosh> lol
18:19:39 <arma2> works for me. just a few lines down the list from the gfy distribution bucket.
18:20:02 <arma2> (also an apt name)
18:20:33 <phw> did i forget anyone's work?
18:21:28 <phw> catalyst: oops, i owe you a response in #5304. sorry about that
18:22:30 <catalyst> phw: thanks. i'm not currently blocking on it, at least until i finish evaluating the current patch
18:22:57 <phw> ok, good
18:23:07 <phw> shall we conclude our meeting and move on to the reading group?
18:23:55 <phw> by "concluding the meeting" i mean changing topics. there's no need to end the meeting and lose logs
18:24:12 <cohosh> sounds good to me
18:24:29 <phw> a reminder: we're discussing https://censorbib.nymity.ch/#Frolov2020a today
18:24:48 <phw> we also have a new section, "reading group", on our pad
18:25:24 <phw> let me briefly summarise the paper and start with some background
18:26:29 <phw> several years ago, china's great firewall was enhanced with an "active probing" infrastructure which turned out to be highly effective at blocking circumvention protocols
18:27:44 <phw> it works in two stages: first, it identifies traffic that *may* be circumvention protocols. then, it actively talks to the circumvention proxies it found in the first step, and if they speak the protocol it suspected, it can block them.
18:27:58 <phw> vanilla tor and many other protocols fell prey to it
18:28:10 <phw> so we started developing pluggable transport protocols that are resistant to this kind of attack
18:28:28 <phw> the idea is to not talk to a client unless the client can prove knowledge of a shared secret in the first few bytes it sends to the server
18:29:17 <phw> for obfs4, this effectively means that you must have gotten the obfs4 bridge from bridgedb. if you only have its ip address and port, you cannot get it to talk to you
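The "prove knowledge of a shared secret in the first few bytes" idea can be illustrated with a minimal sketch. This is not obfs4's actual handshake, which is more involved; the names `client_hello` and `server_accepts`, the 16-byte nonce, and the truncated HMAC-SHA256 tag are all assumptions made for illustration. A server that gets `False` here would simply stay silent instead of answering.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16  # hypothetical nonce size
MAC_LEN = 16    # hypothetical truncated tag size


def client_hello(secret: bytes) -> bytes:
    """First bytes a client sends: a random nonce plus a MAC keyed with the shared secret."""
    nonce = os.urandom(NONCE_LEN)
    tag = hmac.new(secret, nonce, hashlib.sha256).digest()[:MAC_LEN]
    return nonce + tag


def server_accepts(secret: bytes, first_bytes: bytes) -> bool:
    """Return True only if the first bytes prove knowledge of the secret."""
    if len(first_bytes) < NONCE_LEN + MAC_LEN:
        return False
    nonce = first_bytes[:NONCE_LEN]
    tag = first_bytes[NONCE_LEN:NONCE_LEN + MAC_LEN]
    expected = hmac.new(secret, nonce, hashlib.sha256).digest()[:MAC_LEN]
    # Constant-time comparison so the check itself does not leak timing information.
    return hmac.compare_digest(tag, expected)
```

A prober that only knows the IP address and port cannot produce a valid tag, so from its point of view the server never speaks. A real design also has to handle replayed hellos, which this sketch ignores.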
18:29:35 <phw> a bunch of other protocols started implementing this idea and it turned out to be relatively effective, i would say
18:30:41 <phw> "effective" meaning that the gfw has yet to block these probing-resistant protocols. we're still struggling though because distributing these obfs4 bridges is still a big issue
18:31:06 <phw> anyway, the goal of the paper was to look at how one can tell apart probing-resistant protocols from "ordinary" protocols
18:32:00 <phw> their idea was to use two data sets (one passive and one active) and then look for ways to distinguish probing-resistant protocols from all other protocols
18:32:53 <phw> for these two datasets, they were reasonably successful and found a handful of clever and effective attacks. it is not perfectly clear how the findings generalise to the entire internet, though
18:34:08 <phw> i think their datasets give us a good idea of how hard, approximately, the problem is but there's still a lot of uncertainty left
18:35:26 <phw> in practice, obfs4 still works. that can mean two things: it may be (too) hard for the gfw to block it, and/or the gfw is successful enough at getting obfs4 bridges from bridgedb, so there's no need to block the protocol itself.
18:36:26 <phw> and with that, i want to stop my monologue and ask y'all for your thoughts. what do you conclude from the paper? and how does it apply to our problem of distributing obfs4 bridges?
18:36:58 <cohosh> to be fair it looks like obfs4 made this change in response? https://gitweb.torproject.org/pluggable-transports/obfs4.git/commit/?id=1a6129b66ff3e66c347b54fbae203c1c61d12d74
18:37:21 <phw> oh, right. i believe sergey reached out to yawning, who then fixed this issue.
18:37:39 <agix> Something I was wondering about was, have they shared their findings regarding which servers they were able to identify and if so did we check if those addresses actually were Tor servers speaking obfs4?
18:39:30 <dcf1> agix: according to Table IV on page 11, they only found 2 servers that were classified as obfs4
18:39:49 <phw> that's a good question. i haven't heard from the authors but then again, i believe they ran their experiments before i started working full-time on tor.
18:40:21 <dcf1> Section V.D on page 9 says that those 2 are unlikely to be actual obfs4 servers (i.e. they are false positives)
18:40:27 <arma2> there definitely wasn't a mail from them to us about bridges they found. i'm glad they talked to yawning though.
18:41:25 <dcf1> "Both servers are in China, and one serves a TLS certificate valid for several subdomains of baofeng.com." I interpret that to mean that they didn't really find any obfs4 bridges worth reporting
18:41:40 <cohosh> i was curious whether this was related at all to the shadowsocks active probing https://gfw.report/blog/gfw_shadowsocks/
18:42:23 <dcf1> agix: Their Lampshade match also looks like a false positive, as are likely almost all of their MTProto matches, but in the case of OSSH they were able to verify with Psiphon that 7/8 of the classifications were correct.
18:43:43 <agix> dcf1: I guess you are right, sounds like false positives to me as well
18:44:02 <dcf1> cohosh: it's possible that some of the different shadowsocks probes are aimed at classification of this type. Especially the ones that are 0-50 bytes long, they seem to match certain byte thresholds for various ways of using shadowsocks.
18:44:33 <dcf1> cohosh: but other probe types like plain replay are more straightforward than the subtle attacks of this paper.
18:45:00 <phw> i'm very curious what their scan results would look like over udp. for example, take all udp end points that their university clients talked to, and then try to get them to talk.
18:45:01 <cohosh> dcf1: okay cool
18:46:02 <dcf1> phw: that's a good angle. Lots of UDP protocols are "probe-resistant" in that their default response is no response unless you know exactly what protocol to send.
18:46:26 <cohosh> oh, nice
18:46:26 <dcf1> I.e., why Nmap classifies non-responsive TCP ports as "filtered" and non-responsive UDP ports as "open|filtered"
18:46:58 <phw> besides, universities are a somewhat "sterile" environment in that they are unlikely to have a lot of, say, torrent traffic. i wonder what else they're missing.
18:47:52 <arma2> (re udp, see also blanu's original 'dust' idea, which is kind of like scramblesuit but for udp)
18:48:12 <dcf1> The paper's selection of probes designed to elicit a response (HTTP/TLS/Modbus/S7/random/empty) is pretty reasonable, but it's easy to imagine tweaks to that list that would result in different counts for what you consider responsive hosts.
18:48:47 <phw> here's the dust paper: https://censorbib.nymity.ch/#Wiley2011a
18:49:11 <arma2> does the paper look just at protocols that are popular at an american university, or is there any attempt to use protocols that are popular over the gfw?
18:49:33 <dcf1> I guess they were influenced by what probes are available by default in Zgrab (https://github.com/zmap/zgrab): -modbus, -s7
18:50:32 <phw> yes, modbus and s7 seemed like unexpected choices to me
18:51:15 <dcf1> what do you mean, arma2? Their Tap dataset is more about endpoints, not about protocols per se.
18:51:28 <phw> arma2: i don't think so. the problem is that we don't have a good idea (beyond anecdotal evidence) of what's popular behind the gfw.
18:52:16 <arma2> yep. doesn't make it invalid, but it turns it more into a "somebody could" paper than a "we did" paper
18:52:25 <dcf1> I don't think the choice of probing protocols matters very much; it's just a prefilter to get rid of endpoints that ever respond (which can never be one of the circumvention protocols they're looking for)
18:52:38 <phw> it sure would be interesting to learn how different things would look for tsinghua university
18:52:47 <arma2> ok, that's an even better question: "how much does having the right set of protocols matter here"
18:52:48 <dcf1> What you're left with is a superset of hosts that includes probe-resistant proxy servers
18:53:26 <dcf1> So you could send more protocol probes and filter out a few more candidates, but I don't think it changes much
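The prefilter dcf1 describes can be sketched as a set filter: an endpoint that answers any probe is discarded, and what remains is a superset of the probe-resistant proxies. The function name `candidate_endpoints` and the shape of the input (endpoint mapped to bytes received per probe) are assumptions for illustration, not the paper's tooling.

```python
def candidate_endpoints(responses):
    """Keep endpoints that stayed silent for every probe.

    `responses` maps an endpoint string to a dict of
    probe name -> bytes received (empty bytes means no reply).
    """
    return {
        endpoint
        for endpoint, by_probe in responses.items()
        if not any(by_probe.values())  # any non-empty reply disqualifies the endpoint
    }


observed = {
    "192.0.2.1:443": {"http": b"HTTP/1.1 400 Bad Request", "tls": b"", "empty": b""},
    "192.0.2.2:8443": {"http": b"", "tls": b"", "empty": b""},
}
silent = candidate_endpoints(observed)
```

Adding another probe type can only remove endpoints from the result, never add any, which is why extra probes shrink the candidate set a little without changing the overall approach.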
18:53:51 <cjb> I wonder how the authors came up with the hypothesis that FIN vs RST might differ, and how we could be sure that they thought of all the varying characteristics to test.
18:54:54 <cjb> Like, would it make sense to approach the problem from another way, run each of these projects locally, and point some kind of TCP-level fuzzer at it, varying size and timing of bytes sent.
18:55:00 <arma2> cjb: or, was it not a "here's a hypothesis, let's test it, oh look we're right", but more of a "let's stare at the tcpdumps, hey that's weird"
18:55:31 <arma2> cjb: for that last idea, see also the recent geneva paper, which essentially tries to fuzz firewalls like that
18:55:49 <arma2> (geneva mostly focuses on tcp packet level tricks, not tcp flow level tricks, i think)
18:56:02 * phw continues to be arma2's link bot: https://censorbib.nymity.ch/#Bock2019a
18:56:11 <cjb> (thanks!)
18:56:40 <arma2> ok so to reframe that question: "could something like geneva have found that obfs4 fin/rst identifier?"
18:57:03 <arma2> or is the search space not a thing that's easy or productive to search over
18:57:30 <cohosh> tbh, i'm not sure whether this attack violates the "long tail" model of how obfs4 tries to hide. the fact that there were false positives suggests it doesn't
18:57:38 <dcf1> Geneva also does intra-TCP stuff: https://geneva.cs.umd.edu/posts/iran-whitelister/
18:58:23 <dcf1> cohosh: it's different, though, when you already have reason to suspect that a server is a proxy, like with obfs3 back in the day. That's more of a closed world.
18:58:40 <cohosh> okay so it's about filtering things out more
18:59:02 <arma2> i would worry about this attack in combination with some other attack that gets you a different set of hints
18:59:17 <arma2> like, this + the zig-zag attack where you already suspect your user of using obfs and you look at all their destinations
18:59:37 <cohosh> okay yup good point
18:59:41 <dcf1> Neither of their data sets is likely to contain obfs4 servers; I understand that they also didn't have as much control over their tap as a GFW would. They just got a list of IP:port from the IT department, or something like that, and couldn't run any passive protocol analysis as a prefilter.
19:00:00 <phw> also, if you find a potential obfs4 bridge, you can port scan it and see if it has an open OR port
19:00:34 <dcf1> Actually I'm surprised that they found so many Psiphon servers in their passive tap. I wonder who the users are.
19:00:44 <agix> dcf1: Wouldn’t it be more likely to find obfs4 servers in the ZMap dataset?
19:01:26 <agix> compared to the TAP dataset
19:01:58 <dcf1> agix: I don't think so. There's only what, like 2,000 obfs4 bridges, and they only scanned 20,000 random hosts in their Zmap dataset, so the probability of intersection is small.
19:02:06 <cohosh> dcf1: they are collaborating with psiphon on some decoy routing projects
19:02:12 <cohosh> perhaps that has something to do with it
19:02:17 <phw> i wonder if "many psiphon connections" equals "many psiphon users" or if very few users could result in many connections due to how psiphon allocates proxies to users
19:02:35 <agix> dcf1: makes sense
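dcf1's intersection estimate can be checked with back-of-the-envelope arithmetic. The sketch below takes the rough figures from the discussion (about 2,000 bridges, 20,000 sampled hosts) and assumes hosts are drawn uniformly from the full 2^32 IPv4 space, that each bridge occupies one address, and that ports do not matter. All three are simplifying assumptions.

```python
def expected_overlap(bridges: int, sampled: int, address_space: int = 2**32) -> float:
    """Expected number of bridges that land in a uniform random sample."""
    return sampled * bridges / address_space


def prob_at_least_one(bridges: int, sampled: int, address_space: int = 2**32) -> float:
    """Probability that the sample contains at least one bridge."""
    p_hit = bridges / address_space  # chance a single sampled host is a bridge
    return 1.0 - (1.0 - p_hit) ** sampled


overlap = expected_overlap(2_000, 20_000)
chance = prob_at_least_one(2_000, 20_000)
```

Under these assumptions both values come out just under one percent, consistent with the random-scan dataset being unlikely to contain any obfs4 bridge.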
19:02:49 <dcf1> The fact, though, that they made datasets out of both random scanning and a passive tap is a strong point. A lot of other projects wouldn't have the idea or the means to do that, and wouldn't know what they're missing.
19:02:51 <arma2> dcf1: did their traffic set overlap with the psiphon refraction network experiment?
19:03:21 <dcf1> See Table III on page 6, the two are quite different qualitatively.
19:03:55 <dcf1> cohosh: haha, I didn't think of that. Yes, it's possible the Psiphon users are members of the research group.
19:04:15 <arma2> (right, that was my same question)
19:04:56 <phw> to get back to our reading group agenda: what action do you all believe we should take based on this work?
19:05:57 <dcf1> From talking to members of the team, an open question is whether the fixed timeout adopted by some projects including obfs4 in response to this work is adequate.
19:06:07 <cohosh> phw: i liked your idea of investigating udp-based transports more
19:06:48 <dcf1> It seems the best behavior is to be like MTProto, never time out and never stop receiving bytes. But, I suppose because of concerns about resource usage, some projects didn't go that far in their mitigation.
19:07:40 <phw> i still think that proxy-specific randomisation is a potentially strong aspect of obfs4. we haven't relied on it a lot in the past, mostly because we didn't know what's worth randomising. the paper has some concrete ideas of how we could move forward here.
19:08:25 <dcf1> Outline changed to a fixed 59-second timeout (https://github.com/Jigsaw-Code/outline-ss-server/commit/c70d512e78525eba36bb1e6ad7a0868593166cf9)
19:08:37 <cjb> phw: what's proxy-specific randomisation?
19:08:40 <dcf1> obfs4 removed its byte threshold but kept its time thresholds (https://gitlab.com/yawning/obfs4/-/commit/1a6129b66ff3e66c347b54fbae203c1c61d12d74)
19:09:17 <phw> cjb: obfs4 bridges have the ability to do some simple flow obfuscation, e.g., pad data bursts
19:09:37 <dcf1> cjb: see Section 4.3 of https://censorbib.nymity.ch/#Winter2013b
19:09:45 <cjb> thanks
19:09:57 <phw> but they don't all do it the same way. the first time an obfs4 bridge starts, it randomly derives probability distributions that dictate how much padding is added
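Proxy-specific randomisation can be sketched as follows: each bridge derives its own padding-length distribution from a per-bridge seed, so two bridges pad differently while a single bridge behaves consistently across restarts. This is an illustration only and not obfs4's implementation; the function names, the bucket count, and the 1400-byte maximum are hypothetical.

```python
import hashlib
import random


def derive_padding_distribution(bridge_seed: bytes, buckets: int = 8, max_pad: int = 1400):
    """Deterministically derive (lengths, probabilities) from a per-bridge seed."""
    seed_int = int.from_bytes(hashlib.sha256(bridge_seed).digest(), "big")
    rng = random.Random(seed_int)
    # Pick distinct candidate padding lengths, then assign each a random weight.
    lengths = sorted(rng.sample(range(max_pad + 1), buckets))
    weights = [rng.random() for _ in range(buckets)]
    total = sum(weights)
    return lengths, [w / total for w in weights]


def sample_padding(rng: random.Random, lengths, probs) -> int:
    """Draw one padding length according to the bridge's distribution."""
    return rng.choices(lengths, weights=probs, k=1)[0]


lengths_a, probs_a = derive_padding_distribution(b"bridge-A-secret")
pad = sample_padding(random.Random(), lengths_a, probs_a)
```

The seed would be generated once and stored with the bridge's state. The `random` module is used here only to keep the sketch short; it is not suitable for security-sensitive randomness.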
19:11:11 <cjb> does obfs4 just close the (failed handshake) connection after 30 seconds always?
19:11:11 <phw> a brief reminder that we scheduled an extra 15 minutes for our anti-censorship meeting, so we only have three more minutes
19:12:07 <phw> i suggest continuing the technical discussion in #tor-dev, and wrapping it up in here
19:12:21 <cohosh> nice
19:13:04 <phw> what do y'all think of the reading group format? i think it's important to stick with our agenda questions as much as possible because there's always potential to drift off
19:13:54 <cohosh> i'd do it again
19:14:11 <phw> also, an extra 15 minutes is not a lot of time. if we have a packed agenda before the reading group, there's little to no time for the reading group
19:14:26 <phw> so we may want to allocate 30 or maybe even 60 minutes after our meeting.
19:14:52 * cohosh nods
19:14:54 <phw> cohosh: yes, me too
19:15:04 <agix> +1
19:15:34 <phw> we had plenty of good questions and thoughts. i would like to turn this into a blog post, if possible.
19:16:23 <phw> any other thoughts on what we can improve?
19:16:27 <cohosh> or mailing list post
19:16:58 <phw> cohosh: right, i like that even more because it facilitates further discussion and the barrier of writing it is lower
19:17:31 <phw> also, we already have a post for this: https://github.com/net4people/bbs/issues/26 :)
19:18:32 <phw> so my revised plan is to distill the above conversation and post it to the net4people thread
19:19:44 <phw> i suggest we do this again in two weeks and add a 60 minute slot after our anti-censorship meeting. how does that sound?
19:19:45 <cohosh> cool, sounds good
19:19:51 <cohosh> +1
19:20:11 <agix> +1
19:20:38 <phw> ok!
19:20:41 <juggy> is there gonna be a different paper to discuss?
19:20:52 <phw> yes, i suggest not discussing this one again :)
19:20:59 <phw> does anyone want to volunteer to pick the next one?
19:21:09 <cohosh> i can volunteer
19:21:16 * phw passes the baton to cohosh
19:21:55 <cohosh> i wanted to check out YMTCP: Eluding Stateful Deep Packet Inspection with Automated Discrepancy Discovery
19:22:07 <cohosh> to err *SYMTCP
19:22:29 <phw> https://censorbib.nymity.ch/#Wang2020a
19:22:38 <cohosh> to switch gears and look at dpi techniques
19:22:44 <phw> cool!
19:23:16 <phw> ok, let's wrap it up and discuss the symtcp paper on april 16
19:23:23 <phw> thanks for volunteering, cohosh
19:23:38 <phw> #endmeeting