16:00:06 #startmeeting anti-censorship team meeting 16:00:06 Meeting started Thu May 28 16:00:06 2020 UTC. The chair is phw. Information about MeetBot at http://wiki.debian.org/MeetBot. 16:00:06 Useful Commands: #action #agreed #help #info #idea #link #topic. 16:00:18 hi all, here's our pad: https://pad.riseup.net/p/tor-anti-censorship-keep 16:00:23 hello 16:00:25 hi 16:00:28 hi! 16:00:42 (hi! i am nearby but still catching up on things) 16:00:58 antonela: let's start with the s30 ux discussion? 16:01:03 sure thing! 16:01:24 so, first of all, this is a first approach, please feel free to tell me you are crazy or not 16:01:44 i wanted to drop down all the things we informally talked about so we have some base to discuss 16:02:08 personas is in progress, duncan is back with some hours as volunteer and we are making progress there 16:02:19 hi! 16:02:32 i used the EFF circumvention guide to define some escenarios, please let me know if you want to include more/different use cases 16:02:59 and finally, i made a first attempt of what could/would be a tor browser ten flow for censored users 16:03:34 i think we can plan from now to six months the next major release, and that release can be focused on all the efforts we are doing around anticensorship 16:03:54 oh, exciting 16:04:05 cool@ 16:04:06 we did it with onion services and the 9.5 release which will happen next week and i think was a great process (we still need to release :) 16:04:17 is there a link for the persona work? 16:04:32 yes, our notes are linked in that notion 16:04:50 #32811 is the ticket 16:04:54 * dcf1 had to look up what s30 and objectives were: https://trac.torproject.org/projects/tor/wiki/org/sponsors/Sponsor30 16:05:00 antonela: how would you like to get feedback? email? 16:05:10 yes, will update the tickets after this meeting but wanted to briefly discuss it live first 16:05:26 phw: i dont know, maybe export all this and paste in a ticket is useful? 
16:05:35 is a simple markup export 16:05:41 sorry, markdown 16:06:33 so, the main question from is how do you feel about changing the narrative in tor browser from making users to find their bridges vs tor browser giving the best bridge 16:06:42 *from me 16:07:09 if you feel confident that we are in a place where we can try something around it, then we can move forward 16:07:19 antonela: just a second that maybe people need a review of what those activites were 16:07:32 oh yes 16:07:53 i like the idea of automating the bridge usage process more. particularly in getting the bridges/choosing the transports automatically 16:08:33 im contemplating a step that users might need to solve a puzzle or give sensitive information, im not sure exactly what you may need, but im having that moment into account 16:08:41 a lot of this will require tor browser knowing what country it's in 16:08:46 right 16:08:49 i think detecting interference will be difficult, since there are a lot of non-censorship reasons for network trouble, but perhaps this can be helped by phw's country-specific work? 16:09:03 ah yes +1 16:09:06 and that is something we may want users to give consent 16:09:38 discussing this ideas with sysrqb, he remembered me that we have some users with specific locales which can jump that step 16:09:53 but we have a lot of users using the us version who are in another places that is not the states 16:10:02 right, that's a good point 16:10:59 our supported locales in general are much fewer than the number of countries (or ASes) people can be in 16:11:15 right, we've heard from folks in e.g. 
arabic-speaking countries that they use the english version because the words are clearer 16:11:25 (and also, "arabic-speaking countries" is a lot of countries, each with their own censorship situation) 16:11:29 yep 16:11:35 so is useful, but partially 16:11:48 as you may see in this early wireframes, the plan is removing tor launcher from the boostrapping, this will require a deep discussion with tb dev folks but i think it worth it 16:12:13 as i said, this is a super early concept and im happy to incorporate all your feedback and iterate this mocks until we have a prototype we are happy with 16:13:05 antonela: there's not a lot i can say now. i'll need an hour to go over your work and think about it 16:13:12 yes sure 16:13:17 i'll export all this and paste in tickets 16:13:19 ok. Let's try to put all this feedback and discussion in trac tickets 16:13:24 and we can continue working there 16:13:26 sounds good, thanks 16:13:26 sounds good? 16:13:27 thanks! 16:13:36 thanks antonela! 16:13:48 no problem, im happy to back on focusing on this! 16:14:14 we can ship really good improvements in tor browser 16:14:25 you are a stellar team :) 16:14:36 :) 16:14:45 anything else regarding sponsor 30? 16:14:52 * cohosh wants to be a lunar team 16:14:57 not really, the other part is more about mirrors 16:15:13 i'll comment in tickets so we can look on what is in trac about it too 16:15:18 ohh the mirrors stuff yeah, is that for gettor? 16:15:23 for everything 16:15:40 we need a table with all the mirrors and some green and red dots that show if they are running or not 16:15:48 is very useful for support and also for us to monitor what is going on 16:15:53 right now im lost on mirrors 16:16:05 * antonela nice title for movie 16:16:15 phw: my very very early impression is, one of the good roles for you and others here, re antonela's workflow, is to figure out which technical steps are feasible. 
like, what are our options for deciding that we're in a censored area, and which options will actually work reliably. 16:16:30 yeah i am not even sure what mirrors exist. i can say that if there are functioning gettor mirrors they are extremely out of date 16:16:31 fwiw, bridgedb doesn't have a mirror. its domain-fronted endpoint (moat) should work everywhere as far as we know, so i don't see a need for a mirror 16:16:50 for bridgedb specific, should we have tpi brand there? 16:17:12 should we have bridgedb under the lektor / www flow? 16:17:28 antonela: does that mean changing its website UI so match torproject.org? 16:17:30 arma2: yes, that is indeed what we need next 16:17:33 phw: yes 16:18:08 antonela: i would like that but that will require some work that's currenty not on our roadmap 16:18:21 oki, good to know 16:18:29 arma2: yes, there are already sponsor30 tickets for that 16:19:38 bridgedb seems to use the bootstrap framework. i'm not sure how much work it would be to make it match torproject.org 16:19:56 if is using boostrap is just loading a css over that 16:20:14 we can include it in the www roadmap if we want it 16:20:22 (i personally can work on it) 16:20:22 yeah metrics.torproject.org is bootstrap already too 16:20:26 huh, if it's as simple as that, i may give it a shot later today 16:20:43 yes, you can give a try phw 16:21:21 i think that is all from me, i'll update the tickets and we can continue from there 16:21:26 thanks antonela 16:21:31 np 16:21:53 next up is snowflake QoL improvements 16:21:58 ...written by a mysterious green person 16:22:12 lol 16:22:16 haha 16:22:45 so there currently some clients who have a really hard time connecting to a snowflake 16:22:55 only ~12% of snowflakes work for them 16:23:18 which means that, even with things like requesting multiple snowflakes, it takes a very long time to get a working connection 16:23:43 we've found the issue to be related to NAT implementations and there are some solutions 16:23:58 which 
involve handing these clients proxies with less restrictive NATs 16:24:21 but any time we prioritize some snowflakes over others using a metric the proxies self report, we are making ourselves more vulnerable to DoS 16:25:35 i would like to keep as much in line with concept of snowflake having many ephemeral, lightweight proxies as possible 16:26:06 but i think we need to do something more here to match up clients with proxies that work for them 16:26:16 so i was hoping for thoughts/feedback on how to handle this tradeoff 16:26:33 the self-reported proxy metric would be "hey, i'm behind a NAT of type X"? 16:26:40 yup 16:27:25 in general we don't have great DoS defenses 16:27:34 so maybe i am getting too in the weeds here 16:27:42 and we should just make it work well first 16:27:45 seems like we need to have the info, to give users snowflakes that work 16:28:13 my concern is that now in order to flood the broker, a censor can just claim to be a super permissive snowflake and do it much more easily 16:28:34 but like i said, perhaps we are already too weak against this attack for this change to matter much 16:29:03 right. maybe it's conceivable that we could measure the snowflakes and decide what kind of nat they are. but then the attacker could respond to our probe in a way that makes us decide 'most permissive' 16:29:13 i also understand if people would rather discuss this on the ticket than here, but wanted to raise my concern 16:29:25 by "flood" do you mean "be matched with as many clients as possible"? 16:29:59 phw: basically by spinning up a bunch of malfunctioning snowflakes, and making sure these get handed out to clients more easily than honest ones 16:30:10 i see 16:30:35 * dcf1 is constantly impressed by cohosh's productive collaboration with pion upstream (https://github.com/pion/stun/pull/33) 16:30:52 lol that is all pion. 
they are great 16:31:07 about getting back quickly on things 16:31:13 it's like minutes 16:32:02 technically, we have the same problem with bridges. a censor could set up non-functioning bridges to harm the user experience 16:32:10 granted, it's easier to set up a snowflake than a bridge 16:33:05 and, for both snowflakes and bridges, we could spot-check them for correct behavior, and they could grow more and more subtle with their failures 16:33:27 cohosh: I think I agree with your assessment of the situation. A change like self-throttling if you don't have a good NAT type does increase risk of DoS, but for Snowflake in its current state the increase is somewhat marginal. 16:34:17 One way to look at it is that the snowflakes with a non-favorable NAT type are already "DoS"ing the network for some clients somewhat, which can be a good thing, because it lets us learn to deal with a situation like that in a somewhat safe and controlled environment. 16:34:52 Two existing tickets that can help against broker-flooding DoS are #25723 and #34080 16:34:53 hmm yeah, that's true. a DoS that disproportionately affects some users more than others 16:35:33 yup, my worry is that with a 12% failure rate these tickets won't do enough by themselves 16:35:46 If it's possible to cycle through enough snowflakes quickly enough (possibly simultaneously), then that 88% failure rate starts to quickly decay to 0. 16:36:31 oh right sorry 12% success rate* 16:36:45 dcf1: yeah, and imo that is slightly more in the spirit of snowflake 16:37:27 So for me, it's been taking 5 or 6 minutes, with fairly high variance, to connect with the new 9.5a13. That's without one other commit (can't find it now) that reduced one of the timeouts. 16:37:30 get a blizzard, and eventually some of the snowflakes will work for you 16:38:06 dcf1: i guess you're behind the pessimal kind of nat? 16:38:20 hmm 5 or 6 minutes is a lot. 
i had a friend with a symmetric NAT run some tests for me and with a 12% success rate it took 10 minutes 16:38:24 Right, but still I have doubts about whether that really works or if there's some other failure mode we will run into (like perhaps conspicuous network activity) 16:38:42 that's a good point 16:38:50 i guess there's no reason why we can't experiment 16:38:55 It should be a binomial distribution right? So a long tail. 16:39:01 yup 16:39:34 cohosh, dcf1: i liked the framing of it as fairness -- here you are trying to be fair to everybody, but now you need to define what you mean by fair 16:39:51 that's the same question that all the congestion control protocols face 16:40:04 so matching up NAT types will make it more fair 16:40:22 and we can do this either by throttling the poll rate of restrictive NATs or doing smart matching at the broker 16:40:25 So the first option in comment:11:ticket:34129 seems pertty non-invasive and good to experiment. 16:40:44 cool, replacing our current poll throttling 16:40:47 The pairwise matching, you're right, seems like it will require some protocol changes. 16:41:04 arma2: yes I think I have the bad kind of NAT. 16:43:19 alright i can move forward with this, thanks! 16:43:36 i appreciate the discussion 16:43:53 and would welcome future input if anyone has more to add later 16:44:42 sorry, a geometric distribution, not a binomial distribution (time to first success, not the number of successes) 16:45:27 okay that's all from me for now unless others have more to add 16:46:52 next up is our 'needs help with'. cohosh has #34129 (which we just discussed) and #34286 (which i'll snatch) 16:46:57 anything else? 16:48:15 *crickets* means "no" 16:48:20 :) 16:48:29 the final item on our agenda is today's reading group 16:48:36 \o/ 16:48:38 which is the conjure paper: https://censorbib.nymity.ch/#Frolov2019b 16:48:51 i prepared a summary but if someone else has one i'm happy if they jump in 16:49:16 cohosh: that would be great. 
i have to confess that i only made it to page 2, so i'm not a great session lead 16:49:48 cool, prepare for wall of paste 16:50:03 This paper is about Conjure, a new approach to refraction networking (aka decoy routing) that uses unused IP address space as destination addresses for proxies. 16:50:06 Refraction networking is sometimes referred to as end-to-middle proxying because a client wishing to circumvent censorship is actually trying to reach a router deployed in the middle of the network rather than a proxy endpoint to tunnel their traffic. 16:50:10 This is usually done by a cryptographically secure steganographic tag inserted in the client's traffic to a legitimate IP address that is not involved with the system. The client's traffic passes through the router on its way to the destination address and is recognized and redirected to a circumvention proxy by the router in a way transparent to the censor. 16:50:15 So a censor still thinks the client is talking to the (decoy) destination IP while the client is actually having their traffic proxied by the router. 16:50:19 For some context: refraction networking is difficult to deploy since it requires the cooperation of ISPs. 16:50:22 The tag detection and redirection is expensive and risky when you take into account the amount of traffic that these routers see and the Terms of Service agreements the routers have with the potential decoy IP addresses. 16:50:26 TapDance (which Conjure builds on) is the only large-scale deployment of refraction networking used for censorship resistance. 16:50:29 This is made possible by design decisions that make the refraction networking process less risky for ISPs. 16:50:37 Conjure seeks to make the process even easier by using the ISP's *unused* IP(v6) addresses as decoy destinations that the client is seen connecting to, rather than real HTTPS sites. 
16:50:41 To connect, clients perform a unidirectional registration process with the router by inserting a steganographic tag into a connection to a real site behind the router. 16:50:44 This tag contains: a public key that allows the client (who already knows the router's public key) and the router to compute a shared secret, and a message encrypted with that secret that contains a seed. 16:50:48 The seed is used by both the router and client to compute a destination IP address that exists within a range of pre-shared unused IP address space. 16:50:51 If the registration was successful, all packets from the client to the phantom destination address are forwarded by the router to its proxy server. 16:50:54 Another difference in Conjure is that there are multiple available transports that can be used to add another obfuscation layer onto the forwarded traffic. These include obfs4 and mask sites (aiui similar to HTTPS proxy). 16:50:58 As a result of its design, Conjure allows for much larger uploads than TapDance (the upload traffic isn't being sent to a legitimate HTTPS site that terminates the connection after the client sends ~32KB of traffic). 16:52:24 the thing i found fascinating about conjure is that it seems like it's totally shifting the assumptions about what it's trying to blend with. in the early decoy routing approaches, there's a website, and you're talking to it, and it's legit to be talking to it. 16:52:46 but here, there's a...network. and you're talking to somebody surprising in it. but it's supposed to be legit to be talking to the network because there are *other* people in it that look reasonable 16:53:29 this seems like a different assumption (like, if this were a crypto paper, they just swapped out their underlying hardware assumption) 16:53:37 another way to look at it is you've got an obfs4 server (or similar) that can magically be at whatever IP address you need it to be at, within an ISP's network. 
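[editor's note] A minimal sketch of the seed-to-address step from the summary above: once registration succeeds, client and router each hold the same seed and can independently compute the same phantom destination inside a pre-shared unused range. Everything concrete here is an assumption made for illustration: the function name `phantom_address`, the HMAC label, the placeholder seed, and the `2001:db8::/32` documentation prefix. It is not the derivation the Conjure authors use, and the key exchange that produces the seed is left out.

```python
import hashlib
import hmac
import ipaddress


def phantom_address(seed: bytes, prefix: str) -> ipaddress.IPv6Address:
    """Map a shared seed to one address inside an unused IPv6 prefix.

    Hypothetical derivation for illustration only; the real Conjure
    key schedule and address selection are not reproduced here.
    """
    network = ipaddress.IPv6Network(prefix)
    # Expand the seed into pseudorandom bytes (the label is made up).
    digest = hmac.new(seed, b"phantom-address-example", hashlib.sha256).digest()
    # Reduce the first 128 bits to an offset inside the prefix.
    offset = int.from_bytes(digest[:16], "big") % network.num_addresses
    return network.network_address + offset


# Both sides hold the same seed after registration, so both compute
# the same phantom address without any further communication.
seed = bytes(range(32))        # placeholder seed, not a real secret
prefix = "2001:db8:1234::/48"  # documentation prefix, not a deployment
client_view = phantom_address(seed, prefix)
router_view = phantom_address(seed, prefix)
print(client_view, client_view == router_view)
```

Because the prefix is large and the offset is pseudorandom, a censor who wants to block phantom hosts would have to block the whole range, which is the collateral-damage argument the paper relies on.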
16:53:38 s/hardware/hardness/ oops 16:54:22 (unhelpful comment: i find it amusing that conjure's decoy endpoints are conceptually very similar to the gfw's active probing decoy clients) 16:54:27 yeah i like the flexibility here 16:54:36 so yeah, they break with previous refraction networking papers in not necessarily needing to pretend to carry all traffic within a legit TLS session. 16:54:53 the registration is still old school 16:55:09 although the "mask sites" is one possible option for the carrier channel, and it still looks like a TLS connection to somewhere. 16:58:19 phw: speaking of that, it seems that a new research question they bring in is: how easy/hard is it to pretend to be a leaf in your network, when you're actually in the middle of the network? like, you have to get ttl right, but is there a huge list of things you need to get right 16:59:02 we had that challenge in the 'defiance' paper long ago 16:59:03 https://www.freehaven.net/anonbib/#foci12-defiance 16:59:16 where we wanted to wrap up packets at one point, transport them to another point, and pretend that we were an ordinary computer 16:59:38 and that means having varied tcp kernel fingerprints, etc etc 16:59:56 Yes in 7.1 they say something about that, "we can filter mask sites by those that have identical TCP/IP stacks to ours" but admit there's not a comprehensive satisfying solution. 
16:59:58 * cohosh again finds this discussion brings up so many more cool papers to read 17:01:46 i found the collateral damage claims in the paper interesting 17:02:21 where conjure will share IP address ranges with clients that include legitimate sites 17:03:19 because in ipv6 address space the chance of choosing them as the phantom destination is low, but it prevents a censor from blocking the whole range 17:03:50 i wonder how much this collateral damage amounts to given that a lot of popular sites also have ipv4 addresses 17:04:35 I agree it's interesting territory, saying that a censor doesn't mind deploying a /80 or /16 block rule, but doesn't like maintaining a sparse list with lots of holes in it. 17:05:02 We do know, from observations of censors in e.g. Russia and China, that censors are not good at maintaining even the IP blocking rules they have now. 17:06:23 what do you mean by maintaining? like updating the IP addresses of sites they want to block? 17:06:51 Yes, and pruning stale and obsolete rules. 17:09:02 I'm thinking about https://censoredplanet.org/russia (Section VI) and findings such as that they have many duplicates in their blocklist. 17:09:21 oh cool, thanks for the link 17:11:18 What is the probability that a top tier isp might actually agree on deploying such a system? 17:11:50 that's a good question, and one that eric and the other tapdance folks are in a better position to answer 17:12:00 fwiw, tapdance currently has an impressive deployment 17:12:08 but at a smaller ISP in Michigan 17:12:12 agix: the prior knowledge on that point is https://censorbib.nymity.ch/#Frolov2017a 17:12:48 cool thanks 17:13:08 is the ISP on Michigan the only one to currently deploy Tapdance? 17:13:09 aiui, the good news is that you don't necessarily need a tier 1 isp like level 3 to adopt this 17:13:10 My understanding is that they applied it at a CU Boulder router and at a regional ISP. 
17:13:13 *in 17:13:34 And some fraction of users are actually using the deployment in the linked paper. 17:14:29 I think the history of how we got to this point was that the Refraction Networking team was actually proactive in trying to pitch Telex to various ISPs, and move it beyond just a research idea 17:15:00 yup, it was a combination of tapdance being an easier-to-deploy design and also hardwork from the tapdance team in having these conversations with isps 17:15:23 But the response they got from ISPs was "No way. We cannot install any flow-blocking elements in our network." (At least that's what I got from talking to one of the authors about it.) It's a super-hard requirement from ISPs that they can't install flow-blocking middleboxes. 17:15:25 anecdotally, they mentioned to me that they got laughed out of the room by [major US ISP] 17:16:16 Incidentally, we tend to see the same preference in censors, for on-the-side blocking devices (e.g. RST and DNS injection) rather than in-line packet dropping. 17:17:26 And the GFW is known to "fail open" rather than closed, probably for the same reason. 17:18:42 that's an interesting insight 17:20:49 TapDance is limited to about 32 KB of upload per decoy host, which comes from the decoy's TCP receive window. Out-of-window packets will get reset, which is detectable. 17:21:16 Conjure removes that restriction, which give it something like 1000x greater upload. 17:21:59 yeah that part is awesome 17:22:02 It means they must have had some way of doing session continuity in TapDance. I was just checking today to see how they do it, but didn't find it yet. I wondered if they kept the same mechanism in Conjure, or if the removal of the 32 KB limit makes it unnecessary. 17:22:53 some fraction of *Psiphon* users I meant to say earlier. 17:23:51 So are there any implications of Conjure for the anti-censorship team? 17:24:09 Yeah! 
We've been talking to eric about possibly using it for snowflakes 17:24:25 or another kind of pluggable transport 17:24:39 Hmm, so like a proxy-go that's multi-homed to many phantom IPs. 17:24:45 that's great 17:26:19 oh and about mask sites, I had trouble understanding the concept just from reading the paper. Here's the longer explanation I wrote up after consulting with the authors: https://github.com/net4people/bbs/issues/18#issuecomment-561336149 17:26:23 all in all i think conjure is a really cool idea and i'm excited about the possibilities here 17:26:50 nice thx 17:28:40 any more thoughts? or should we wrap it up? 17:29:11 i'm good on my end 17:29:46 same here 17:30:07 ok! any suggestions for our next paper or project? 17:31:36 we don't have to decide now. if anything comes to mind over the next few days, please let our mailing list know 17:31:39 #endmeeting
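[editor's note] On the Snowflake item earlier in this meeting: the log puts the fraction of working snowflakes for badly NATed clients at about 12% and settles on a geometric distribution for the time to first success. The sketch below works through that arithmetic. It assumes every attempt is independent and succeeds with the same probability, which real proxy assignment may not satisfy, and `attempts_needed` is a name chosen here for illustration.

```python
import math


def attempts_needed(p_success: float, confidence: float) -> int:
    """Smallest n with P(at least one success in n tries) >= confidence."""
    # P(no success in n tries) = (1 - p)^n, so solve (1 - p)^n <= 1 - confidence.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))


p = 0.12  # fraction of snowflakes that work, as quoted in the meeting
mean_attempts = 1.0 / p  # mean of a geometric distribution
print(f"mean attempts until first success: {mean_attempts:.1f}")
for confidence in (0.5, 0.9, 0.99):
    print(f"{confidence:.0%} chance of a working snowflake within "
          f"{attempts_needed(p, confidence)} attempts")
```

The gap between the median and the 99th percentile is the long tail mentioned in the discussion, and it is why cycling through snowflakes faster, or trying several at once, shortens the worst-case wait.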