17:59:14 <phw> #startmeeting anti-censorship meeting
17:59:14 <MeetBot> Meeting started Thu Apr 30 17:59:14 2020 UTC.  The chair is phw. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:59:14 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
17:59:17 <phw> hello everybody!
17:59:21 <cohosh> hi
17:59:24 <phw> here is today's meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
17:59:37 <phw> we have a packed schedule today, so let's jump right in
17:59:48 <agix> hi
17:59:58 <phw> first off, an announcement. we have a new list: anti-censorship-alerts@lists.tp.o
18:00:08 <phw> the idea is for our monitoring systems to send their alerts to this list
18:00:34 <phw> cohosh and i are on it. feel free to subscribe (or send reports there) if you're interested
18:00:40 <arma2> (hi! i am nearby if you have questions for me, and will try to watch the meeting too :)
18:01:21 <phw> when i say "monitoring systems" i mean gman999's sysmon instance, our prometheus instance, and the nagios instances that check gettor and bridgedb's email autoresponder
18:01:35 <agix> do you have a link where i can subscribe to that mailing list?
18:01:49 <phw> agix: https://lists.torproject.org/cgi-bin/mailman/listinfo/anti-censorship-alerts
18:01:54 <agix> thx
18:02:01 <phw> keep in mind that it may be the most boring mailing list you've ever been on
18:02:07 <phw> but i appreciate your enthusiasm
18:02:18 <dcf1> "No messages have been posted to this list yet" my kind of list
18:02:32 <agix> haha :D
18:02:46 <phw> you'll be drowning in alerts soon, no worries!
18:03:18 <cjb> hi! unrelated: I've tried to sign up for the anti-censorship-team list but I think my join request's stuck in moderation.
18:03:46 <phw> ok, next up is a discussion about indicators that can tell us if we're on the right path as a team. it's too easy to get obsessed with short-term goals and forget about the long-term ones
18:04:22 <phw> cjb: hmm, nothing's stuck in the moderation queue. pm me your email address and i'll add you manually
18:04:50 <cjb> thanks
18:05:11 <phw> regarding indicators: an obvious one is the # of bridge users. if this number is falling, we may be doing something wrong
18:05:32 <phw> needless to say, we shouldn't get obsessed with these numbers and do whatever it takes to make them go up
18:06:48 <phw> my point is that we should occasionally revisit these numbers to make sure that we're on the right track. in the end, we're doing all of this for our users and if they're jumping ship, we're doing something wrong
18:07:09 <phw> can anyone think of other indicators that we should periodically be revisiting?
18:07:36 <cohosh> by on the right track, you mean "we're generally making anti-censorship better" as opposed to "stuff isn't failing", right?
18:09:07 <phw> the former, mostly
18:09:15 <cohosh> okay cool
18:09:42 <cohosh> hmm i haven't really thought about this outside of usage metrics
18:09:46 <phw> this topic came up because a sponsor wants us to report "indicators of success". while i'm generally not a big fan of these, they can serve as a good reminder that we're doing what we should be doing
18:09:58 <cohosh> ah
18:10:00 <dcf1> there are sometimes tickets like #33884 that say something doesn't work, though it's usually hard to get any details from them. Sometimes in blog comments too.
18:10:26 <dcf1> If it were easier for users to report such, maybe we would get enough high-quality reports to act on, and the number of such reports could be an indicator.
18:10:43 <cohosh> o.O man i wish we'd get cc'd on tickets when they change components
18:11:04 <dcf1> cohosh: yes I know what you mean, I find tickets like that randomly sometimes.
18:11:20 <phw> dcf1: right, that's a good point
18:11:45 <dcf1> https://blog.torproject.org/comment/287638#comment-287638 "Snowflake and Meek-Azure connections still not working. Will this ever get fixed? Both of these have been non functional for quite a long time now....
18:12:10 <dcf1> Obviously this user's experience is not representative, but still it represents at least one person having a bad time.
18:12:53 <phw> right, that's also why i've been meaning to blog more. there should be more contact with users and while blog comments aren't perfect, it's better than doing nothing
18:14:52 <agix> It wouldn’t be very representative of our work, but perhaps it would be nice to track how many lines of code have been contributed to a specific project over each month
18:14:54 <phw> the frontdesk is yet another great resource but i'm not reading it :/
18:15:15 <dcf1> agix: yes, amount of code or number of commits
18:16:21 <gaba> phw: we are getting more people into frontdesk now
18:16:26 <gaba> hopefully we can get it better organized
18:16:39 <arma2> phw: it is a good idea, that (a) somebody from anti censorship team should be talking to the frontdesk people to see what censorship questions get asked and how to help, and (b) the rate of questions, or the rate of *resolving* questions, is a neat (though expensive to track) indicator
18:17:31 <phw> yes, a) for sure
18:18:35 <arma2> (in theory frontdesk should be escalating things they can't resolve themselves. i haven't seen them doing that lately, so maybe they're just trying to resolve everything themselves, even if that means not reporting issues 'upstream'. worth finding out actual facts here. :)
18:19:49 <arma2> phw: for other indicator ideas: find out what other people do in other contexts, and see if they make sense to apply here. since we didn't just name a ton.
18:20:12 <phw> arma2: yes, thanks for the suggestions
18:20:26 <cohosh> i guess we could also do rate of closing user-reported tickets
18:23:07 <phw> thanks all, these are good suggestions. we should keep them in mind but not obsess over them, to avoid goodhart's law, which says that "when a measure becomes a target, it ceases to be a good measure."
18:24:07 <phw> next up is our monthly report. please add your april 2020 highlights to our monthly report pad: https://pad.riseup.net/p/CANAMyvGfb-wtKfzN2R_
18:25:21 <phw> let's move on to our 'needs help with' sections
18:26:19 <phw> #30794 for phw, #34061 and #34062 for cohosh, #33365 for arlolra
18:26:36 <phw> i can take #34061 and #34062
18:26:54 <cohosh> thanks!
18:27:04 <cohosh> arlolra: you're ready for another review of #33365 now?
18:27:22 <arlolra> cohosh: I responded to your requests for information, yes
18:27:38 <cohosh> oh i see ok
18:27:43 <cohosh> i'll take another look then
18:27:58 <cohosh> phw: i can take #30794
18:28:04 <arlolra> I thought we were going to get that in before your patch, but maybe you resolved the conflict?
18:28:06 <phw> thanks cohosh. should be a very quick review
18:28:36 <cohosh> arlolra: yeah i used a different method
18:28:41 <arlolra> ok
18:28:43 <cohosh> that didn't require conflicting changes
18:28:52 <cohosh> sorry i misunderstood that you were still working on it
18:29:48 <arlolra> not a problem, I was just waiting on that ordering to review your patch
18:29:53 <arlolra> but glad you weren't blocked
18:30:18 <dcf1> #34061 looks fine to me, but I don't know GetTor.
18:31:47 <phw> shall we move on to today's reading group?
18:32:49 <dcf1> I have a brief summary to paste to start the discussion.
18:32:59 * phw takes the sound of crickets to mean "yes"
18:33:14 <phw> dcf1: sounds good!
18:33:18 <dcf1> censorbib link: https://censorbib.nymity.ch/#Nasr2020a
18:33:35 <dcf1> long draft summary (not reviewed by the authors yet): https://framabin.org/p/?803eae8e7a2829d8#hPFXVNDvLtX2c8D6h2Y1Il1z9RCfDE84zvKMfd4xzVA=
18:33:40 <dcf1> MassBrowser is a circumvention system, deployed now for more than a year. It combines several circumvention technologies in order to overcome problems with other systems, such as high cost or low quality of service.
18:33:52 <dcf1> The client software chooses least expensive / highest performance option for every connection. In order:
18:33:58 <dcf1> - selective proxying (non-censored sites do not use the circumvention system at all)
18:34:06 <dcf1> - CacheBrowsing (domain fronting but only for HTTPS sites on the same CDN, less expensive and less general than meek)
18:34:12 <dcf1> - volunteer proxies (like Snowflake) and user-to-user proxying
18:34:20 <dcf1> - Tor (tunnelled through a volunteer proxy)
18:34:32 <dcf1> MassBrowser is relevant to our current interests because of its volunteer proxy subsystem, which is similar to Snowflake in many ways.
18:34:41 <dcf1> MassBrowser's Operator is like the Snowflake broker in that it matches clients and proxies, though the Operator has additional responsibilities such as distributing the databases needed to enable CacheBrowsing.
18:34:50 <dcf1> MassBrowser's volunteer proxies (called Buddies) are not browser-based, and so are able to be more free in their protocol support.
18:34:59 <dcf1> end of summary
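The ordering in the summary above can be read as a simple fallback chain. What follows is only a minimal sketch of that reading, not MassBrowser's actual code: the `SiteInfo` fields, the `Route` names, and the condition for falling back to Tor are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    DIRECT = "direct"                # selective proxying: site is not censored
    CACHEBROWSING = "cachebrowsing"  # domain fronting within the same CDN
    BUDDY = "buddy"                  # volunteer or user-to-user proxy
    TOR_VIA_BUDDY = "tor-via-buddy"  # Tor tunnelled through a volunteer proxy


@dataclass
class SiteInfo:
    # Hypothetical fields; the paper does not define this structure.
    censored: bool
    cachebrowsable: bool
    buddy_allows_category: bool


def choose_route(site: SiteInfo) -> Route:
    """Pick the first (cheapest) option that applies, in the summary's order."""
    if not site.censored:
        return Route.DIRECT
    if site.cachebrowsable:
        return Route.CACHEBROWSING
    if site.buddy_allows_category:
        return Route.BUDDY
    # Assumption: Tor is the last resort when no Buddy will exit the traffic.
    return Route.TOR_VIA_BUDDY
```

The point of the ordering is that each later option costs more (in volunteer bandwidth or latency) than the one before it, so the client only reaches for it when the cheaper ones do not apply.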
18:36:08 <dcf1> So I haven't used MassBrowser yet. I meant to ask if we could have some invitation code but I forgot to.
18:36:15 <cohosh> i found that last bit interesting because i remember in the early days of mass browser a poster about a webextension for the buddies
18:36:32 <dcf1> It's noteworthy that it's been in beta deployment for a year, so it's a real working system facing real challenges.
18:36:51 <dcf1> cohosh: that's interesting, I didn't know that.
18:36:59 <cohosh> it's possible it's a false memory
18:37:10 <cohosh> but i remember discussion about buddies having exit policies as well
18:37:17 <cohosh> which i don't see in this paper
18:37:28 <dcf1> They call it "Content Categories"
18:37:40 <cohosh> ahh i thought that was just for deciding how to route it
18:37:45 <cohosh> not whether or not to route it
18:37:51 <phw> i think the paper may misunderstand the "slowness" of existing circumvention systems. it assumes that the circumvention systems are slow while it really may be (at least in part) the international links.
18:38:33 <dcf1> As I understand it, Buddies express what subset of categories (from Table III, page 10) they are willing to proxy.
18:38:39 <arma2> cohosh: we talked to them in january and i think at that point they were still talking about buddies being webextensions. or...things run in web browsers. i don't want to assume the wrong modern browser terminology. :)
18:38:55 <cohosh> ok got it
18:39:16 <dcf1> When a user wants to make a connection, they have to tell the Operator what content category they want to access (just the category, not the specific site), and the Operator finds a compatible Buddy
18:39:20 <dcf1> is my understanding
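As a rough illustration of the matching step described here, the sketch below has an Operator pick the first Buddy whose allowed categories include the one the Client asked for. The `Buddy` class, `find_buddy`, and the category names are hypothetical and are not taken from the paper or from Table III; NAT compatibility is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class Buddy:
    name: str
    # Categories this Buddy is willing to proxy (hypothetical labels).
    allowed_categories: Set[str] = field(default_factory=set)


def find_buddy(buddies: Iterable[Buddy], category: str) -> Optional[Buddy]:
    """Return the first Buddy willing to carry the requested category.

    The Client reveals only the category, not the specific site.
    """
    for buddy in buddies:
        if category in buddy.allowed_categories:
            return buddy
    return None


buddies = [
    Buddy("buddy-a", {"news"}),
    Buddy("buddy-b", {"news", "social"}),
]
match = find_buddy(buddies, "social")
```

Here `match` is `buddy-b`, and a request for a category nobody allows returns `None`.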
18:39:50 <arma2> i certainly find the "buddies get to spy on everything" and "buddies get blamed for everything" angles interesting, because they're one of the differences from tor. i wonder if it affects which people / how many people sign up to use it or to be a buddy.
18:40:02 <dcf1> phw: I agree with you. One part of this work that really didn't resonate with me is the taxonomy of circumvention systems and their drawbacks (Section II, Table I)
18:40:44 <dcf1> I think their "Key shortcomings of existing systems" is mostly pretty sound, but I don't think circumvention systems break down quite so neatly as they present.
18:41:39 <dcf1> Especially the term "proxy-based" used to refer only to a certain form of static, unshielded proxy, when everything including MassBrowser uses a "proxy" of some kind or another.
18:41:40 <phw> i set up massbrowser earlier today. it's got a slick control UI, similar to vidalia back in the day
18:42:09 <dcf1> This isn't really a complaint about the system, I think the boundaries are blurry and it's hard to characterize in a few pages in any case.
18:42:23 <dcf1> phw: you set up a Client or a Buddy?
18:42:41 <phw> dcf1: i'm using the client
18:43:26 <arma2> phw: the vidalia reference makes me wonder how/whether they do secure updates for their software
18:43:49 <arma2> (and all those other messy software engineering things that you have to deal with when you give actual humans your software)
18:44:03 <phw> arma2: yes, i was wondering about that earlier today. the latest massbrowser download is based on firefox esr.
18:44:21 <dcf1> arma2: page 10 talks about Client-Buddy protocol obfuscation. "We also implement traffic obfuscation to protect MassBrowser’s traffic against traffic analysis attack. Particularly, we have built a custom implementation of ... obfsproxy ... The obfuscation algorithm protocol look like benign peer-to-peer traffic, e.g., p2p gaming or file sharing traffic."
18:45:10 <dcf1> The part that makes me think it's not a browser extension is Appendix A, "We have coded our Buddy software in Javascript ES6 using NodeJS with a graphical user interface developed with the Electron framework"
18:45:22 <phw> you have to give them credit for subjecting massbrowser to an audit. subgraph is apparently working on it (or was, at the time they prepared their camera ready)
18:46:18 <dcf1> Yeah it really looks like they are making a proper effort of real deployment, bundling a browser, considering users, and everything that goes with that.
18:46:27 <cohosh> the thing i'm most interested in right now is how they're matching up clients with buddies
18:46:37 <cohosh> i sent an email to them earlier today to ask about NAT issues
18:46:41 <arma2> phw: they have funding from otf, and otf demands an audit. so yes it makes sense that they are doing it, and also yes good for them for doing it for real. :)
18:46:51 <cohosh> since in the paper they say they are determining the type of NAT by running their own STUN server
18:47:03 <cohosh> but we know now that they are using 3rd party STUN servers
18:47:05 <dcf1> V.B page 8, "Assigning Buddies to Clients by the Operator"
18:47:35 <dcf1> yes, quite interesting that peers know their own NAT type, and the Operator tries to match them up compatibly.
18:47:42 <cjb> It looks like they handle updates using Electron's own updater API: https://github.com/SPIN-UMass/MassBrowser/blob/9c2fdc878cdef4c743f452a8371209113e252134/app/src/common/services/autoUpdater.js
18:47:46 <cohosh> yeah I'm wondering how they do that
18:47:46 <cjb> > import { autoUpdater as eAutoUpdater } from 'electron-updater'
18:47:57 <cohosh> (determine the NAT type that is)
18:48:06 <cohosh> because from what I can tell this isn't trivial
18:49:21 <cohosh> they say "MassBrowser’s Operator serves as a STUN server to discover the NAT type of each peer."
18:49:30 <cohosh> but I'd think you'd need two different servers
18:49:43 <cohosh> and their STUN server got blocked so they switched to using 3rd party STUN servers
18:50:13 <phw> cohosh: could it be two different services rather than servers? that is, on the same ip address but behind different ports?
18:50:14 <dcf1> cohosh: good point. I was only thinking about the STUN server being blocked, I hadn't considered the mechanics of figuring out NAT type.
18:50:37 <phw> either way, i'm also curious to read amir's response
18:51:19 <dcf1> cjb: thanks for finding that re the updater
18:51:23 <cjb> and they find out whether there's a new release in the updater by hitting api.github.com's releases endpoint, so I guess it doesn't have to talk to a MassBrowser updater server
18:51:36 <cjb> I wonder if it goes through the proxy to try to hit api.github.com, or if they're assuming it as unblocked :)
18:51:40 <dcf1> phw: did you have to do anything special to run the client? I hadn't tried yet but the paper says it's still invite-only.
18:51:41 <cjb> (same source file)
18:52:01 <phw> dcf1: yes, it's invite only. let me send you my invitation code. we can probably share it
18:52:07 <phw> (happy to share with anyone else who's interested)
18:53:37 <dcf1> there's an interesting aspect of the peer-to-peer proxying that differs from snowflake
18:53:53 <dcf1> III.C "Client-to-Client Proxying"
18:54:18 <dcf1> They may have a user in China, for example, be a proxy for a user in Iran, because what is blocked in China is not blocked in Iran, and vice versa.
18:54:58 <arma2> how does that work when their 'exit policies' are just broad categories?
18:55:08 <dcf1> And they can do that kind of thing more easily because they may choose different circumvention routes for every distinct destination, instead of forcing everything through the same tunnel.
18:55:16 <cohosh> ah they get away with that because they don't need to connect to tor I guess
18:55:39 <dcf1> arma2: I think this part may be a local client lookup of blocked sites, I'm not sure.
18:55:49 <arma2> like, do they need to have 'chinese political news' and 'farsi political news' as sub-categories
18:55:50 <phw> dcf1: yes, that's a neat observation, and may even bring with it performance improvements if the two clients are geographically close
18:55:57 <phw> (or topologically)
18:56:10 <dcf1> Clients periodically download databases from the Operator, which contain things like what domains are CacheBrowseable and what sites are blocked in what countries.
18:56:35 <dcf1> But I'm not sure how a Client requests a Buddy that is compatible both with a content category and a per-country known-blocked list.
18:57:22 <arma2> maybe the buddy can pass it into tor if it can't handle it directly
18:57:34 <arma2> but, how do buddies in china do this properly
18:57:54 <dcf1> The destination-aware proxying of MassBrowser imposes a different and interesting set of design constraints from what Snowflake-with-Tor has.
18:58:17 <dcf1> Buddies exit traffic directly, which means they can eavesdrop and also that they have outgoing traffic attributed to them.
18:58:42 <dcf1> That means Buddies need to have exit policy content category whitelists, to avoid carrying anything they don't want to be associated with.
18:58:59 <dcf1> Which means the Client has to leak its content categories to the Operator.
18:59:22 <dcf1> Which necessitates the local TLS interception they use, in order to find out what the destination is so it can determine the content category.
18:59:58 <dcf1> The advantage is that knowing the destination gives you a lot of extra flexibility
19:00:21 <dcf1> to, for example, route different destinations differently, or proxy some traffic and not all
19:01:46 <dcf1> I think they have done a good job of identifying their design goals and building the system to suit them, understanding that you can't have everything.
19:04:08 <dcf1> There's one major claim in the paper that I think is questionable and it's relevant to Snowflake too.
19:04:33 <arlolra> the goal being circumvention over privacy, etc?  are there ways to game the integrity of the content delivered?
19:04:39 <dcf1> The claim is that a censor cannot afford to block residential NATed peer-to-peer connections (Sections V.B and VII.C).
19:05:09 <dcf1> arlolra: the goal being to fix the "key shortcomings of existing systems" from Section I.
19:05:33 <cohosh> dcf1: yeah that's a good point. we only have a bit of data on this
19:05:48 <phw> dcf1: right, i had the same thought when reading that part
19:05:53 <dcf1> I agree with phw that the "poor QoS" is perhaps a bit overstated, and people can disagree with the no-privacy-unless-you-ask design.
19:06:03 <dcf1> I have some quotes that stuck out to me.
19:06:08 <dcf1> "a censor cannot block the Buddies that she obtains from the Operator, nor can she identify their clients (since Buddy IPs are NATed)"
19:06:14 <dcf1> "The IP enumeration techniques that censors practice against traditional circumvention systems like Tor will *not* work against MassBrowser Buddies."
19:06:19 <dcf1> "This is because the censors can only obtain the NAT IPs of the Buddies; blacklisting such IPs will have similar collateral damage as blocking domain fronting systems."
19:06:47 <dcf1> I think this is a major unknown factor for Snowflake too, and my intuition says that the claim is not as strong as is stated.
19:07:15 <dcf1> There's something to be said for agility, that is, the cost of updating a frequently changing blacklist, but that's still pretty unknown to us as well, I think.
19:07:41 * cohosh digs up ticket with measurements on this
19:08:55 <dcf1> The question in my mind is, does a censor hesitate to block *.adsl.comcast.net the same way they hesitate to block *.github.io, say?
19:09:03 <phw> (my intuition is that the collateral damage of blocking NATed addresses of buddies is very low, but the real cost comes from the agility that dcf1 mentioned)
19:10:01 <cohosh> #30368
19:10:05 <dcf1> The collateral damage argument can make sense if you think that one way to compete against proxy agility, without constant updates, is to block *really big* address ranges, and then for sure you are blocking something you don't intend to.
19:10:33 <phw> i would love to learn more how "agile" the set of buddies/snowflakes really is. people often claim that proxies jump from one coffee shop to the next one, and that's probably true for some proxies, but i'm afraid that the majority doesn't change their ip address much at all.
19:10:44 <cohosh> it would be cool if we could figure out how to measure this better
19:10:51 <dcf1> phw: I feel the same!
19:11:39 <cohosh> dcf1: phw: perhaps we could add some proxy churn metrics
19:11:56 <cohosh> like count of new ips we've never seen before
19:12:43 <phw> cohosh: i wonder how cellular providers handle ip address churn, and if snowflakes on cell phones make for better proxies because they're more agile
19:13:09 <cohosh> well we don't have any snowflakes on cell phones yet :)
19:13:19 <cohosh> but yeah good question
19:13:22 <phw> (we should distill these ideas and add them to https://research.torproject.org/ideas/ )
19:13:32 <cohosh> so maybe we could have these new churn metrics in place for when we do
19:13:38 <cohosh> and then watch to see if that number goes up
19:13:48 <cohosh> idk if all these are research ideas
19:13:57 <cohosh> so much as things we need to implement to get the right data
19:15:15 <cohosh> seeing whether the blocking of individual proxies happens is totally something a research group could do though
19:16:02 <dcf1> one way is to do what a censor would have to do, constantly poll the broker and make a list of all the proxies you see, then do it again the next day and see how many are repeats.
19:16:38 <cohosh> yeah that's true
19:17:19 <cohosh> would doing metrics on our end be useful though?
19:17:39 <dcf1> for sure, to answer questions like these
19:18:19 <cohosh> alright i'll make a ticket for churn metrics
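The churn measurement discussed above (new IPs never seen before, and how many of one day's proxies repeat from the previous day) could be computed along these lines. This is only a sketch under assumptions: the input format, the `churn_report` name, and the output fields are hypothetical, the addresses are documentation-range examples rather than real data, and a real deployment would likely avoid storing raw IP addresses at all.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def churn_report(daily_ips: Iterable[Tuple[str, Iterable[str]]]) -> List[Dict]:
    """For each day, count proxies never seen before and repeats from the previous day.

    daily_ips is a chronologically ordered sequence of (day, ips) pairs.
    """
    seen: Set[str] = set()             # every address observed so far
    previous: Optional[Set[str]] = None
    report = []
    for day, ips in daily_ips:
        today = set(ips)
        report.append({
            "day": day,
            "total": len(today),
            "new": len(today - seen),
            "repeats_from_previous_day": len(today & previous) if previous is not None else 0,
        })
        seen |= today
        previous = today
    return report


# Made-up observations using documentation-range addresses.
observations = [
    ("day1", ["192.0.2.1", "192.0.2.2", "192.0.2.3"]),
    ("day2", ["192.0.2.2", "192.0.2.3", "192.0.2.4"]),
    ("day3", ["192.0.2.1", "192.0.2.5"]),
]
report = churn_report(observations)
```

A consistently low "new" count with a high repeat count would suggest the proxy pool is not very agile, which is the worry raised earlier in the discussion.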
19:20:00 <dcf1> We've covered the topics I meant to highlight for this paper.
19:20:28 <dcf1> If people have time to try running a MassBrowser Client or Buddy and want to write a paragraph of what the experience was like, I for one would appreciate that.
19:21:07 <phw> i'll contribute my experience to the net4people thread
19:21:57 <phw> let's wrap up our reading group
19:22:40 <phw> does anyone want to suggest our next paper? or project?
19:24:38 <phw> if not, then i'll pick one, but not right now. i'll announce it to the anti-censorship-team@ list
19:24:49 <phw> anything else for today?
19:24:56 <cohosh> not from me
19:25:14 * phw wants a minute before closing the meeting
19:26:39 <phw> #endmeeting