16:00:06 <phw> #startmeeting anti-censorship team meeting
16:00:06 <MeetBot> Meeting started Thu May 28 16:00:06 2020 UTC.  The chair is phw. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:06 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:18 <phw> hi all, here's our pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:00:23 <antonela> hello
16:00:25 <agix> hi
16:00:28 <cohosh> hi!
16:00:42 <arma2> (hi! i am nearby but still catching up on things)
16:00:58 <phw> antonela: let's start with the s30 ux discussion?
16:01:03 <antonela> sure thing!
16:01:24 <antonela> so, first of all, this is a first approach, please feel free to tell me if i am crazy or not
16:01:44 <antonela> i wanted to drop down all the things we informally talked about so we have some base to discuss
16:02:08 <antonela> personas is in progress, duncan is back with some hours as volunteer and we are making progress there
16:02:19 <gaba> hi!
16:02:32 <antonela> i used the EFF circumvention guide to define some scenarios, please let me know if you want to include more/different use cases
16:02:59 <antonela> and finally, i made a first attempt of what could/would be a tor browser ten flow for censored users
16:03:34 <antonela> i think we can plan from now to six months the next major release, and that release can be focused on all the efforts we are doing around anticensorship
16:03:54 <phw> oh, exciting
16:04:05 <cohosh> cool!
16:04:06 <antonela> we did it with onion services and the 9.5 release which will happen next week and i think was a great process (we still need to release :)
16:04:17 <cohosh> is there a link for the persona work?
16:04:32 <antonela> yes, our notes are linked in that notion
16:04:50 <gaba> #32811 is the ticket
16:04:54 * dcf1 had to look up what s30 and objectives were: https://trac.torproject.org/projects/tor/wiki/org/sponsors/Sponsor30
16:05:00 <phw> antonela: how would you like to get feedback? email?
16:05:10 <antonela> yes, will update the tickets after this meeting but wanted to briefly discuss it live first
16:05:26 <antonela> phw: i dont know, maybe export all this and paste in a ticket is useful?
16:05:35 <antonela> is a simple markup export
16:05:41 <antonela> sorry, markdown
16:06:33 <antonela> so, the main question from is how do you feel about changing the narrative in tor browser from making users to find their bridges vs tor browser giving the best bridge
16:06:42 <antonela> *from me
16:07:09 <antonela> if you feel confident that we are in a place where we can try something around it, then we can move forward
16:07:19 <gaba> antonela: just a second that maybe people need a review of what those activites were
16:07:32 <antonela> oh yes
16:07:53 <cohosh> i like the idea of automating the bridge usage process more. particularly in getting the bridges/choosing the transports automatically
16:08:33 <antonela> im contemplating a step where users might need to solve a puzzle or give sensitive information, im not sure exactly what you may need, but im taking that moment into account
16:08:41 <phw> a lot of this will require tor browser knowing what country it's in
16:08:46 <antonela> right
16:08:49 <cohosh> i think detecting interference will be difficult, since there are a lot of non-censorship reasons for network trouble, but perhaps this can be helped by phw's country-specific work?
16:09:03 <cohosh> ah yes +1
16:09:06 <antonela> and that is something we may want users to give consent
16:09:38 <antonela> discussing these ideas with sysrqb, he reminded me that we have some users with specific locales who can skip that step
16:09:53 <antonela> but we have a lot of users using the us version who are in places other than the states
16:10:02 <phw> right, that's a good point
16:10:59 <cohosh> our supported locales in general are much fewer than the number of countries (or ASes) people can be in
16:11:15 <arma2> right, we've heard from folks in e.g. arabic-speaking countries that they use the english version because the words are clearer
16:11:25 <arma2> (and also, "arabic-speaking countries" is a lot of countries, each with their own censorship situation)
16:11:29 <antonela> yep
16:11:35 <antonela> so is useful, but partially
16:11:48 <antonela> as you may see in these early wireframes, the plan is removing tor launcher from the bootstrapping, this will require a deep discussion with tb dev folks but i think it's worth it
16:12:13 <antonela> as i said, this is a super early concept and im happy to incorporate all your feedback and iterate on these mocks until we have a prototype we are happy with
16:13:05 <phw> antonela: there's not a lot i can say now. i'll need an hour to go over your work and think about it
16:13:12 <antonela> yes sure
16:13:17 <antonela> i'll export all this and paste in tickets
16:13:19 <gaba> ok. Let's try to put all this feedback and discussion in trac tickets
16:13:24 <antonela> and we can continue working there
16:13:26 <phw> sounds good, thanks
16:13:26 <antonela> sounds good?
16:13:27 <gaba> thanks!
16:13:36 <cohosh> thanks antonela!
16:13:48 <antonela> no problem, im happy to be back focusing on this!
16:14:14 <antonela> we can ship really good improvements in tor browser
16:14:25 <antonela> you are a stellar team :)
16:14:36 <phw> :)
16:14:45 <phw> anything else regarding sponsor 30?
16:14:52 * cohosh wants to be a lunar team
16:14:57 <antonela> not really, the other part is more about mirrors
16:15:13 <antonela> i'll comment in tickets so we can look on what is in trac about it too
16:15:18 <cohosh> ohh the mirrors stuff yeah, is that for gettor?
16:15:23 <antonela> for everything
16:15:40 <antonela> we need a table with all the mirrors and some green and red dots that show if they are running or not
16:15:48 <antonela> is very useful for support and also for us to monitor what is going on
16:15:53 <antonela> right now im lost on mirrors
16:16:05 * antonela nice title for movie
16:16:15 <arma2> phw: my very very early impression is, one of the good roles for you and others here, re antonela's workflow, is to figure out which technical steps are feasible. like, what are our options for deciding that we're in a censored area, and which options will actually work reliably.
16:16:30 <cohosh> yeah i am not even sure what mirrors exist. i can say that if there are functioning gettor mirrors they are extremely out of date
16:16:31 <phw> fwiw, bridgedb doesn't have a mirror. its domain-fronted endpoint (moat) should work everywhere as far as we know, so i don't see a need for a mirror
16:16:50 <antonela> for bridgedb specific, should we have tpi brand there?
16:17:12 <antonela> should we have bridgedb under the lektor / www flow?
16:17:28 <phw> antonela: does that mean changing its website UI to match torproject.org?
16:17:30 <antonela> arma2: yes, that is indeed what we need next
16:17:33 <antonela> phw: yes
16:18:08 <phw> antonela: i would like that but that will require some work that's currently not on our roadmap
16:18:21 <antonela> oki, good to know
16:18:29 <phw> arma2: yes, there are already sponsor30 tickets for that
16:19:38 <phw> bridgedb seems to use the bootstrap framework. i'm not sure how much work it would be to make it match torproject.org
16:19:56 <antonela> if it is using bootstrap, it is just loading a css over that
16:20:14 <antonela> we can include it in the www roadmap if we want it
16:20:22 <antonela> (i personally can work on it)
16:20:22 <dcf1> yeah metrics.torproject.org is bootstrap already too
16:20:26 <phw> huh, if it's as simple as that, i may give it a shot later today
16:20:43 <antonela> yes, you can give a try phw
16:21:21 <antonela> i think that is all from me, i'll update the tickets and we can continue from there
16:21:26 <phw> thanks antonela
16:21:31 <antonela> np
16:21:53 <phw> next up is snowflake QoL improvements
16:21:58 <phw> ...written by a mysterious green person
16:22:12 <cohosh> lol
16:22:16 <antonela> haha
16:22:45 <cohosh> so there are currently some clients who have a really hard time connecting to a snowflake
16:22:55 <cohosh> only ~12% of snowflakes work for them
16:23:18 <cohosh> which means that, even with things like requesting multiple snowflakes, it takes a very long time to get a working connection
16:23:43 <cohosh> we've found the issue to be related to NAT implementations and there are some solutions
16:23:58 <cohosh> which involve handing these clients proxies with less restrictive NATs
16:24:21 <cohosh> but any time we prioritize some snowflakes over others using a metric the proxies self report, we are making ourselves more vulnerable to DoS
16:25:35 <cohosh> i would like to keep as much in line with the concept of snowflake having many ephemeral, lightweight proxies as possible
16:26:06 <cohosh> but i think we need to do something more here to match up clients with proxies that work for them
16:26:16 <cohosh> so i was hoping for thoughts/feedback on how to handle this tradeoff
16:26:33 <phw> the self-reported proxy metric would be "hey, i'm behind a NAT of type X"?
16:26:40 <cohosh> yup
16:27:25 <cohosh> in general we don't have great DoS defenses
16:27:34 <cohosh> so maybe i am getting too in the weeds here
16:27:42 <cohosh> and we should just make it work well first
16:27:45 <arma2> seems like we need to have the info, to give users snowflakes that work
16:28:13 <cohosh> my concern is that now in order to flood the broker, a censor can just claim to be a super permissive snowflake and do it much more easily
16:28:34 <cohosh> but like i said, perhaps we are already too weak against this attack for this change to matter much
16:29:03 <arma2> right. maybe it's conceivable that we could measure the snowflakes and decide what kind of nat they are. but then the attacker could respond to our probe in a way that makes us decide 'most permissive'
16:29:13 <cohosh> i also understand if people would rather discuss this on the ticket than here, but wanted to raise my concern
16:29:25 <phw> by "flood" do you mean "be matched with as many clients as possible"?
16:29:59 <cohosh> phw: basically by spinning up a bunch of malfunctioning snowflakes, and making sure these get handed out to clients more easily than honest ones
16:30:10 <phw> i see
16:30:35 * dcf1 is constantly impressed by cohosh's productive collaboration with pion upstream (https://github.com/pion/stun/pull/33)
16:30:52 <cohosh> lol that is all pion. they are great
16:31:07 <cohosh> about getting back quickly on things
16:31:13 <cohosh> it's like minutes
16:32:02 <phw> technically, we have the same problem with bridges. a censor could set up non-functioning bridges to harm the user experience
16:32:10 <phw> granted, it's easier to set up a snowflake than a bridge
16:33:05 <arma2> and, for both snowflakes and bridges, we could spot-check them for correct behavior, and they could grow more and more subtle with their failures
16:33:27 <dcf1> cohosh: I think I agree with your assessment of the situation. A change like self-throttling if you don't have a good NAT type does increase risk of DoS, but for Snowflake in its current state the increase is somewhat marginal.
16:34:17 <dcf1> One way to look at it is that the snowflakes with a non-favorable NAT type are already "DoS"ing the network for some clients somewhat, which can be a good thing, because it lets us learn to deal with a situation like that in a somewhat safe and controlled environment.
16:34:52 <dcf1> Two existing tickets that can help against broker-flooding DoS are #25723 and #34080
16:34:53 <cohosh> hmm yeah, that's true. a DoS that disproportionately affects some users more than others
16:35:33 <cohosh> yup, my worry is that with a 12% failure rate these tickets won't do enough by themselves
16:35:46 <dcf1> If it's possible to cycle through enough snowflakes quickly enough (possibly simultaneously), then that 88% failure rate starts to quickly decay to 0.
16:36:31 <cohosh> oh right sorry 12% success rate*
16:36:45 <cohosh> dcf1: yeah, and imo that is slightly more in the spirit of snowflake
16:37:27 <dcf1> So for me, it's been taking 5 or 6 minutes, with fairly high variance, to connect with the new 9.5a13. That's without one other commit (can't find it now) that reduced one of the timeouts.
16:37:30 <cohosh> get a blizzard, and eventually some of the snowflakes will work for you
16:38:06 <arma2> dcf1: i guess you're behind the pessimal kind of nat?
16:38:20 <cohosh> hmm 5 or 6 minutes is a lot. i had a friend with a symmetric NAT run some tests for me and with a 12% success rate it took 10 minutes
16:38:24 <dcf1> Right, but still I have doubts about whether that really works or if there's some other failure mode we will run into (like perhaps conspicuous network activity)
16:38:42 <cohosh> that's a good point
16:38:50 <cohosh> i guess there's no reason why we can't experiment
16:38:55 <dcf1> It should be a binomial distribution right? So a long tail.
16:39:01 <cohosh> yup
16:39:34 <arma2> cohosh, dcf1: i liked the framing of it as fairness -- here you are trying to be fair to everybody, but now you need to define what you mean by fair
16:39:51 <arma2> that's the same question that all the congestion control protocols face
16:40:04 <cohosh> so matching up NAT types will make it more fair
16:40:22 <cohosh> and we can do this either by throttling the poll rate of restrictive NATs or doing smart matching at the broker
16:40:25 <dcf1> So the first option in comment:11:ticket:34129 seems pretty non-invasive and good to experiment with.
16:40:44 <cohosh> cool, replacing our current poll throttling
16:40:47 <dcf1> The pairwise matching, you're right, seems like it will require some protocol changes.
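The "smart matching at the broker" option mentioned above could look roughly like the following. This is only a minimal sketch under stated assumptions, not the Snowflake broker's actual code: the `Broker` class, its method names, and the two-level "restricted"/"unrestricted" NAT labels are all hypothetical. As raised in the discussion, the NAT type is self-reported by the proxy, so a malicious proxy could lie about it.

```python
from collections import deque

# Hypothetical NAT labels; the real broker protocol may classify differently.
RESTRICTED = "restricted"
UNRESTRICTED = "unrestricted"


class Broker:
    """Toy broker that pairs clients with proxies by self-reported NAT type."""

    def __init__(self):
        # One FIFO queue of waiting proxies per NAT type.
        self.pools = {RESTRICTED: deque(), UNRESTRICTED: deque()}

    def add_proxy(self, proxy_id, nat_type):
        # nat_type is self-reported and unverified (the DoS concern above).
        self.pools[nat_type].append(proxy_id)

    def match(self, client_nat_type):
        if client_nat_type == RESTRICTED:
            # Clients behind restrictive NATs only get permissive proxies.
            order = [UNRESTRICTED]
        else:
            # Permissive clients take restrictive proxies first, saving
            # permissive proxies for the clients that need them.
            order = [RESTRICTED, UNRESTRICTED]
        for nat_type in order:
            if self.pools[nat_type]:
                return self.pools[nat_type].popleft()
        return None  # no compatible proxy is currently available


broker = Broker()
broker.add_proxy("proxy-a", RESTRICTED)
broker.add_proxy("proxy-b", UNRESTRICTED)
first = broker.match(RESTRICTED)      # skips proxy-a, takes proxy-b
second = broker.match(UNRESTRICTED)   # takes the restrictive proxy-a
third = broker.match(RESTRICTED)      # nothing compatible is left
```

The design choice illustrated here is that permissive proxies are treated as the scarce resource and held back for the clients that cannot use anything else.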
16:41:04 <dcf1> arma2: yes I think I have the bad kind of NAT.
16:43:19 <cohosh> alright i can move forward with this, thanks!
16:43:36 <cohosh> i appreciate the discussion
16:43:53 <cohosh> and would welcome future input if anyone has more to add later
16:44:42 <dcf1> sorry, a geometric distribution, not a binomial distribution (time to first success, not the number of successes)
16:45:27 <cohosh> okay that's all from me for now unless others have more to add
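The geometric-distribution point above can be worked through numerically. A minimal sketch, assuming each snowflake attempt succeeds independently with the 12% probability quoted in the discussion; the function names are made up for illustration and the per-attempt timing is not modelled.

```python
import math


def expected_attempts(p):
    """Mean of a geometric distribution: attempts until the first success."""
    return 1.0 / p


def failure_probability(p, n):
    """Probability that all of the first n independent attempts fail."""
    return (1.0 - p) ** n


def attempts_for_confidence(p, confidence):
    """Smallest n such that at least one of n attempts succeeds with the
    given probability, i.e. (1 - p) ** n <= 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


p = 0.12  # success rate per snowflake, as quoted in the meeting
mean_attempts = expected_attempts(p)
n95 = attempts_for_confidence(p, 0.95)
```

With p = 0.12 the mean is about 8.3 attempts, and 24 attempts are needed to reach a 95% chance of at least one working snowflake, which illustrates the long tail. Trying several snowflakes at once shortens the wall-clock time but does not change the number of snowflakes consumed.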
16:46:52 <phw> next up is our 'needs help with'. cohosh has #34129 (which we just discussed) and #34286 (which i'll snatch)
16:46:57 <phw> anything else?
16:48:15 <phw> *crickets* means "no"
16:48:20 <cohosh> :)
16:48:29 <phw> the final item on our agenda is today's reading group
16:48:36 <cohosh> \o/
16:48:38 <phw> which is the conjure paper: https://censorbib.nymity.ch/#Frolov2019b
16:48:51 <cohosh> i prepared a summary but if someone else has one i'm happy if they jump in
16:49:16 <phw> cohosh: that would be great. i have to confess that i only made it to page 2, so i'm not a great session lead
16:49:48 <cohosh> cool, prepare for wall of paste
16:50:03 <cohosh> This paper is about Conjure, a new approach to refraction networking (aka decoy routing) that uses unused IP address space as destination addresses for proxies.
16:50:06 <cohosh> Refraction networking is sometimes referred to as end-to-middle proxying because a client wishing to circumvent censorship is actually trying to reach a router deployed in the middle of the network, rather than a proxy endpoint, to tunnel their traffic.
16:50:10 <cohosh> This is usually done by a cryptographically secure steganographic tag inserted in the client's traffic to a legitimate IP address that is not involved with the system. The client's traffic passes through the router on its way to the destination address and is recognized and redirected to a circumvention proxy by the router in a way transparent to the censor.
16:50:15 <cohosh> So a censor still thinks the client is talking to the (decoy) destination IP while the client is actually having their traffic proxied by the router.
16:50:19 <cohosh> For some context: refraction networking is difficult to deploy since it requires the cooperation of ISPs.
16:50:22 <cohosh> The tag detection and redirection is expensive and risky when you take into account the amount of traffic that these routers see and the Terms of Service agreements the routers have with the potential decoy IP addresses.
16:50:26 <cohosh> TapDance (which Conjure builds on) is the only large-scale deployment of refraction networking used for censorship resistance.
16:50:29 <cohosh> This is made possible by design decisions that make the refraction networking process less risky for ISPs.
16:50:37 <cohosh> Conjure seeks to make the process even easier by using the ISP's *unused* IP(v6) addresses as decoy destinations that the client is seen connecting to, rather than real HTTPS sites.
16:50:41 <cohosh> To connect, clients perform a unidirectional registration process with the router by inserting a steganographic tag into a connection to a real site behind the router.
16:50:44 <cohosh> This tag contains: a public key that allows the client (who already knows the router's public key) and the router to compute a shared secret, and a message encrypted with that secret that contains a seed.
16:50:48 <cohosh> The seed is used by both the router and client to compute a destination IP address that exists within a range of pre-shared unused IP address space.
16:50:51 <cohosh> If the registration was successful, all packets from the client to the phantom destination address are forwarded by the router to its proxy server.
16:50:54 <cohosh> Another difference in Conjure is that there are multiple available transports that can be used to add another obfuscation layer onto the forwarded traffic. These include obfs4 and mask sites (aiui similar to HTTPS proxy).
16:50:58 <cohosh> As a result of its design, Conjure allows for much larger uploads than TapDance (the upload traffic isn't being sent to a legitimate HTTPS site that terminates the connection after the client sends ~32KB of traffic).
16:51:02 <cohosh> </summary>
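The seed-to-phantom-address step in the summary can be illustrated with a small sketch. This is not the derivation used by Conjure itself: hashing the seed with SHA-256 and reducing it modulo the size of the prefix is an assumption made purely for illustration, as are the function name and the example prefix (2001:db8::/32 is the IPv6 documentation range). The point is only that client and router, given the same seed and the same pre-shared prefix, arrive at the same address without further communication.

```python
import hashlib
import ipaddress


def phantom_address(seed, prefix):
    """Deterministically map a shared seed to one address inside prefix.

    Hypothetical derivation for illustration; the real system may differ.
    """
    network = ipaddress.IPv6Network(prefix)
    host_bits = 128 - network.prefixlen
    # Hash the seed so that the offset looks uniformly random in the range.
    digest = hashlib.sha256(seed).digest()
    offset = int.from_bytes(digest, "big") % (1 << host_bits)
    # Indexing a network returns the address at that offset from its start.
    return network[offset]


PREFIX = "2001:db8::/32"  # stand-in for the ISP's unused address space
client_view = phantom_address(b"example seed", PREFIX)
router_view = phantom_address(b"example seed", PREFIX)
other_seed = phantom_address(b"another seed", PREFIX)
```

Both sides compute the same phantom address from the same seed, while a different seed lands on a different address within the same prefix.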
16:52:24 <arma2> the thing i found fascinating about conjure is that it seems like it's totally shifting the assumptions about what it's trying to blend with. in the early decoy routing approaches, there's a website, and you're talking to it, and it's legit to be talking to it.
16:52:46 <arma2> but here, there's a...network. and you're talking to somebody surprising in it. but it's supposed to be legit to be talking to the network because there are *other* people in it that look reasonable
16:53:29 <arma2> this seems like a different assumption (like, if this were a crypto paper, they just swapped out their underlying hardware assumption)
16:53:37 <dcf1> another way to look at it is you've got an obfs4 server (or similar) that can magically be at whatever IP address you need it to be at, within an ISP's network.
16:53:38 <arma2> s/hardware/hardness/ oops
16:54:22 <phw> (unhelpful comment: i find it amusing that conjure's decoy endpoints are conceptually very similar to the gfw's active probing decoy clients)
16:54:27 <cohosh> yeah i like the flexibility here
16:54:36 <dcf1> so yeah, they break with previous refraction networking papers in not necessarily needing to pretend to carry all traffic within a legit TLS session.
16:54:53 <cohosh> the registration is still old school
16:55:09 <dcf1> although the "mask sites" is one possible option for the carrier channel, and it still looks like a TLS connection to somewhere.
16:58:19 <arma2> phw: speaking of that, it seems that a new research question they bring in is: how easy/hard is it to pretend to be a leaf in your network, when you're actually in the middle of the network? like, you have to get ttl right, but is there a huge list of things you need to get right
16:59:02 <arma2> we had that challenge in the 'defiance' paper long ago
16:59:03 <arma2> https://www.freehaven.net/anonbib/#foci12-defiance
16:59:16 <arma2> where we wanted to wrap up packets at one point, transport them to another point, and pretend that we were an ordinary computer
16:59:38 <arma2> and that means having varied tcp kernel fingerprints, etc etc
16:59:56 <dcf1> Yes in 7.1 they say something about that, "we can filter mask sites by those that have identical TCP/IP stacks to ours" but admit there's not a comprehensive satisfying solution.
16:59:58 * cohosh again finds this discussion brings up so many more cool papers to read
17:01:46 <cohosh> i found the collateral damage claims in the paper interesting
17:02:21 <cohosh> where conjure will share IP address ranges with clients that include legitimate sites
17:03:19 <cohosh> because in ipv6 address space the chance of choosing them as the phantom destination is low, but it prevents a censor from blocking the whole range
17:03:50 <cohosh> i wonder how much this collateral damage amounts to given that a lot of popular sites also have ipv4 addresses
17:04:35 <dcf1> I agree it's interesting territory, saying that a censor doesn't mind deploying a /80 or /16 block rule, but doesn't like maintaining a sparse list with lots of holes in it.
17:05:02 <dcf1> We do know, from observations of censors in e.g. Russia and China, that censors are not good at maintaining even the IP blocking rules they have now.
17:06:23 <cohosh> what do you mean by maintaining? like updating the IP addresses of sites they want to block?
17:06:51 <dcf1> Yes, and pruning stale and obsolete rules.
17:09:02 <dcf1> I'm thinking about https://censoredplanet.org/russia (Section VI) and findings such as that they have many duplicates in their blocklist.
17:09:21 <cohosh> oh cool, thanks for the link
17:11:18 <agix> What is the probability that a top tier isp might actually agree on deploying such a system?
17:11:50 <cohosh> that's a good question, and one that eric and the other tapdance folks are in a better position to answer
17:12:00 <cohosh> fwiw, tapdance currently has an impressive deployment
17:12:08 <cohosh> but at a smaller ISP in Michigan
17:12:12 <dcf1> agix: the prior knowledge on that point is https://censorbib.nymity.ch/#Frolov2017a
17:12:48 <agix> cool thanks
17:13:08 <agix> is the ISP on Michigan the only one to currently deploy Tapdance?
17:13:09 <phw> aiui, the good news is that you don't necessarily need a tier 1 isp like level 3 to adopt this
17:13:10 <dcf1> My understanding is that they applied it at a CU Boulder router and at a regional ISP.
17:13:13 <agix> *in
17:13:34 <dcf1> And some fraction of users are actually using the deployment in the linked paper.
17:14:29 <dcf1> I think the history of how we got to this point was that the Refraction Networking team was actually proactive in trying to pitch Telex to various ISPs, and move it beyond just a research idea
17:15:00 <cohosh> yup, it was a combination of tapdance being an easier-to-deploy design and also hard work from the tapdance team in having these conversations with isps
17:15:23 <dcf1> But the response they got from ISPs was "No way. We cannot install any flow-blocking elements in our network." (At least that's what I got from talking to one of the authors about it.) It's a super-hard requirement from ISPs that they can't install flow-blocking middleboxes.
17:15:25 <cohosh> anecdotally, they mentioned to me that they got laughed out of the room by [major US ISP]
17:16:16 <dcf1> Incidentally, we tend to see the same preference in censors, for on-the-side blocking devices (e.g. RST and DNS injection) rather than in-line packet dropping.
17:17:26 <dcf1> And the GFW is known to "fail open" rather than closed, probably for the same reason.
17:18:42 <cohosh> that's an interesting insight
17:20:49 <dcf1> TapDance is limited to about 32 KB of upload per decoy host, which comes from the decoy's TCP receive window. Out-of-window packets will get reset, which is detectable.
17:21:16 <dcf1> Conjure removes that restriction, which gives it something like 1000x greater upload.
17:21:59 <cohosh> yeah that part is awesome
17:22:02 <dcf1> It means they must have had some way of doing session continuity in TapDance. I was just checking today to see how they do it, but didn't find it yet. I wondered if they kept the same mechanism in Conjure, or if the removal of the 32 KB limit makes it unnecessary.
17:22:53 <dcf1> some fraction of *Psiphon* users I meant to say earlier.
17:23:51 <dcf1> So are there any implications of Conjure for the anti-censorship team?
17:24:09 <cohosh> Yeah! We've been talking to eric about possibly using it for snowflakes
17:24:25 <cohosh> or another kind of pluggable transport
17:24:39 <dcf1> Hmm, so like a proxy-go that's multi-homed to many phantom IPs.
17:24:45 <dcf1> that's great
17:26:19 <dcf1> oh and about mask sites, I had trouble understanding the concept just from reading the paper. Here's the longer explanation I wrote up after consulting with the authors: https://github.com/net4people/bbs/issues/18#issuecomment-561336149
17:26:23 <cohosh> all in all i think conjure is a really cool idea and i'm excited about the possibilities here
17:26:50 <agix> nice thx
17:28:40 <phw> any more thoughts? or should we wrap it up?
17:29:11 <cohosh> i'm good on my end
17:29:46 <agix> same here
17:30:07 <phw> ok! any suggestions for our next paper or project?
17:31:36 <phw> we don't have to decide now. if anything comes to mind over the next few days, please let our mailing list know
17:31:39 <phw> #endmeeting