15:57:52 <phw> #startmeeting anti-censorship team meeting
15:57:52 <MeetBot> Meeting started Thu Dec 17 15:57:52 2020 UTC.  The chair is phw. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:57:52 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:57:59 <phw> good morning, folks
15:58:07 <phw> here's our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
15:58:20 <agix> hi
15:58:25 <phw> hi agix!
15:58:26 <cohosh> hi!
15:58:31 <phw> o/
15:59:36 <phw> let's start with our announcement. tor is moving to ed25519 identifiers for relays and bridges. that may bring with it changes to some of our code bases
15:59:47 <phw> let me see if i can conjure the ahf
16:00:14 <phw> ideally, we would figure out what's going to break ahead of time
16:00:36 <ahf> im here in a second
16:01:05 <cohosh> :D
16:01:06 <phw> bridgedb is definitely affected but it relies on stem, and it shouldn't be too hard to adapt as long as stem did
16:01:22 <phw> (famous last words, i know)
16:01:28 <agix> ^^
16:01:34 <cohosh> would this change bridge lines?
16:01:48 <phw> good question. i... think so?
16:02:01 <cohosh> so perhaps previously distributed bridges wouldn't work
16:02:13 <cohosh> unless there's some backwards compatibility
16:02:31 <dcf1> Offhand I don't think the change will affect the IPC PT interface
16:02:50 <phw> backwards compatibility sounds like a desirable feature. at least for a few months
16:02:54 <ahf> hey, sorry, my grandmother called right when the meeting started
16:03:06 <phw> no worries, grandmas always have priority
16:03:11 <ahf> yeah, no doubt
16:03:12 <ahf> lol
16:03:23 <ahf> i don't think it breaks anything for current bridge lines, but i think it would be smart for us to find a clever way to detect these new lines
16:03:38 <ahf> right now we can check it simply by the length of the hash, since we use SHA1(RSA identifier) iirc
16:04:11 <cohosh> yeah i think the biggest UX pain point would be if people's bridges just suddenly stopped working because their bridge lines are now not accepted
16:04:12 <ahf> and that is 160 bits encoded in hex. for ed25519 identifiers, which are 256 bits long, we can spot it based on that
16:04:32 <ahf> but in the future we might have another 256-bit format (hash of some PQ identifier???) so we might wanna think one step into the future here
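
The length-based check ahf describes could be sketched as follows. This is a minimal illustration, not code from tor or bridgedb: the function name `classify_identifier` and the returned labels are hypothetical, and it assumes identifiers arrive either as hex or as the unpadded base64 seen in `master-key-ed25519` descriptor lines. It labels 256-bit values by size rather than by algorithm, because length alone cannot tell ed25519 apart from some future 256-bit format.

```python
import base64
import re


def classify_identifier(identifier: str) -> str:
    """Guess the kind of identifier from its encoded length alone.

    Hypothetical helper for illustration; the labels are made up.
    """
    s = identifier.strip()
    # SHA1(RSA identity key): 160 bits, i.e. 40 hex characters.
    if re.fullmatch(r"[0-9A-Fa-f]{40}", s):
        return "rsa-sha1"
    # Any 256-bit value written as hex: 64 hex characters.
    if re.fullmatch(r"[0-9A-Fa-f]{64}", s):
        return "256-bit-hex"
    # Any 256-bit value as unpadded base64: 43 characters.
    if re.fullmatch(r"[A-Za-z0-9+/]{43}", s):
        try:
            if len(base64.b64decode(s + "=")) == 32:
                return "256-bit-base64"
        except ValueError:
            pass
    return "unknown"
```

A caller that needs to distinguish ed25519 from later 256-bit formats would need an explicit type tag in the bridge line, which this sketch does not attempt.
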
16:04:42 <ahf> yeah i think we should find a way to avoid that, cohosh
16:04:49 <ahf> that would also break people's current configs at an upgrade point
16:04:54 <dcf1> there may be some change necessary in tor-launcher as iirc it does some light parsing of manually entered bridge lines
16:05:12 <ahf> to be fair, this isn't *only* for tor. we found this during a hackathon for arti (the rust implementation of tor) and arti assumes that all nodes have an ed25519 identifier
16:05:56 <phw> our hard-coded default bridges will also need an update but that's not a big deal
16:06:55 <ahf> if we find a way forward where tor can use both at the same time, it wont be that urgent to update though. i don't know with tor-launcher as dcf1 says
16:07:46 <phw> are the ed25519 ids already in descriptors? so in theory we could make these changes already?
16:08:49 <ahf> hm. i don't know if they are in the bridge descriptors, but the nodes do have the identifier already
16:10:17 <ahf> do you have a bridge descriptor nearby? i don't even see any in our tests :o
16:10:36 <phw> https://collector.torproject.org/recent/bridge-descriptors/server-descriptors/
16:10:45 <ahf> perfect
16:10:49 <dcf1> https://metrics.torproject.org/bridge-descriptors.html
16:11:00 <phw> i see "master-key-ed25519 TLAshERmIgIdJlx3ibxD6niayV+lsvoOna/AGw3NbGA" lines
16:11:44 <ahf> yeah, i was just wondering if that is the name
16:12:28 <phw> ok, it would also be useful to know what the new descriptor format would be like, so we can start adapting unit tests and figure out what's catching fire
16:14:01 <dcf1> I just checked tor-launcher, and it looks like it's okay, the only processing it does is to remove a "bridge" prefix from each line.
16:14:04 <dcf1> https://gitweb.torproject.org/tor-launcher.git/tree/src/chrome/content/network-settings.js?h=0.2.26#n2417
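
The kind of light processing dcf1 describes, removing a "bridge" prefix from each entered line, might look like the sketch below. It is not a port of tor-launcher's JavaScript: the function name `strip_bridge_prefix` is hypothetical, and skipping blank lines and matching the prefix case-insensitively are assumptions. Because nothing here inspects the fingerprint field, a longer identifier would pass through untouched.

```python
import re
from typing import List


def strip_bridge_prefix(text: str) -> List[str]:
    """Return non-empty lines with a leading "bridge" keyword removed.

    Hypothetical helper; case-insensitive matching is an assumption.
    """
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Drop the keyword only when it is a separate leading word.
        line = re.sub(r"^bridge\s+", "", line, flags=re.IGNORECASE)
        lines.append(line)
    return lines
```
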
16:14:04 <phw> and if there won't be backwards compatibility, i wonder how we should hand out bridge lines if there will be new-tor *and* old-tor in the wild? should we just omit the bridge identifier?
16:14:06 <ahf> yeah, it sounds like i should take a look at how we handle the ed25519 part for bridges in tor right now, create a ticket in tor with you guys in CC and then we try to figure out the upgrade paths for the user side of things?
16:14:26 <dcf1> tbh I don't know how Tor Browser and tor-launcher interact anymore now that the settings are part of about:preferences
16:14:26 <ahf> i think we must aim for backwards compatibility here :-/ and i think we can in little-t-tor
16:14:35 <ahf> but in arti we will likely break some legacy things
16:14:36 <phw> sounds like a great plan ahf, thanks!
16:14:44 <ahf> but arti wont happen over the weekend in the new year
16:15:06 <ahf> phw: awesome, i will do that
16:16:21 <phw> any more thoughts on this?
16:17:19 <phw> if not, then let's move on to our next topic
16:17:30 <phw> this is about tpo/anti-censorship/pluggable-transports/meek#40001
16:17:53 <phw> the problem is that our monitoring wasn't set up to detect these sorts of issues
16:18:12 <phw> we're monitoring a bunch of things in isolation but not the overall process
16:18:41 <phw> so the question is: what can we do to catch this (and similar) problems in the future?
16:19:13 <phw> in theory, we can give monit (our monitoring system) and executable or script, and it raises an alert if the return code is != 0
16:19:17 <dcf1> one place where the process fell down,
16:19:17 <ahf> maybe network health can help?
16:19:19 <phw> s/and/an/
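
A check that a monitoring system can run and judge by return code, as phw describes, could be structured like this. It is a sketch under assumptions: `run_check` and the exit codes 1 and 2 are hypothetical choices, and the probe itself (for example, fetching a moat captcha over meek_lite) is left as a callable to be supplied, since the log does not specify how it would be done.

```python
import sys
from typing import Callable


def run_check(probe: Callable[[], bool]) -> int:
    """Run a probe and translate its outcome into a process exit code.

    0 means healthy; any non-zero value should raise an alert.
    """
    try:
        ok = probe()
    except Exception as exc:  # a crashing probe is also a failure
        print(f"probe raised: {exc}", file=sys.stderr)
        return 2
    if not ok:
        print("probe reported failure", file=sys.stderr)
        return 1
    return 0


# In a real script the last line would be something like:
#     sys.exit(run_check(my_probe))
# where my_probe is the (hypothetical) end-to-end test.
```

Keeping the probe separate from the exit-code logic means the same wrapper can be reused for other end-to-end checks.
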
16:20:00 <dcf1> is I got an email in oct about a pending TLS cert change, I knew that obfs4proxy was pinning certificates, I spent a few hours investigating, and came to the incorrect conclusion that we were not affected
16:20:42 <dcf1> I think I got the email because I'm the admin/whatever of the CDN setup. I'm not sure if any others got the email since the time we were trying to get more admins enrolled.
16:21:32 <phw> dcf1: both cohosh and i are finally able to administer our azure setup. i'll have to take a look at the portal and see where notifications are sent to
16:21:59 <cohosh> yeah i haven't spent enough time figuring out the admin side of azure
16:22:26 <cohosh> so this is good to look at more, but i think phw is right that it would be good to expand our monitoring
16:22:36 <cohosh> so that we're not relying on users to tell us something is down
16:22:43 <cohosh> phw: did we find out about it through frontdesk?
16:22:56 <phw> it would be simple-ish to fetch a moat captcha over obfs4proxy's meek_lite. the problem is that we gotta use the same obfs4proxy and tor as it's used in tor browser
16:23:10 <phw> cohosh: i think ggus did, yes
16:24:33 <phw> the "best" solution i can think of is to do some scripting to create an environment that's identical to tor browser and then attempt to talk to moat
16:25:27 <cohosh> hmm
16:25:29 <phw> this would probably require a lot of duct tape and chewing gum, so maybe anyone has a better idea?
16:25:32 <dcf1> for true end to end, you could download the distribution package, verify sigs, then run it with different configurations
16:25:51 <dcf1> but even then, what if there's a bug that only affects Windows packages, etc.
16:26:27 <phw> right, we won't be able to cover all possible failure modes
16:26:49 <dcf1> Another point: what did affected users see when the failure started happening? Was the error message useful and conducive to having the problem reported? Could we improve that?
16:27:45 <dcf1> According to meek#40001 it was `Problem bootstrapping. Stuck at 10% (conn_done): Connected to a relay. (TLS_ERROR; TLS_ERROR; count 1; recommendation warn; host 97700DFE9F483596DDA6264C4D7DF7641E1E39CE at 0.0.2.0:2)`, not so good
16:28:09 <dcf1> I'm guessing that `TLS_ERROR; TLS_ERROR` comes from obfs4proxy, it may be possible to make a change there.
16:28:33 <dcf1> On second thought maybe not; that text may come from tor
16:28:58 <phw> i think the way it broke was also particularly bad. obfs4proxy panicked, so tor hit this bug: tpo/core/tor#33669
16:30:06 <phw> oh, hold on, i may be wrong
16:31:48 <phw> correction: obfs4proxy didn't panic. it logged a warning (which basically nobody saw) and tor would log the tls error that dcf1 mentioned
16:33:01 <phw> and yes, these bootstrapping messages are useless to users but at least we can sometimes infer what went wrong :/
16:33:21 <cohosh> is this a good use case for https://gitweb.torproject.org/torspec.git/tree/pt-spec.txt#n634 ?
16:33:47 <dcf1> cohosh: yes, LOG or even just printing something to stderr would have caused a user-visible message in the tor log
16:34:15 <dcf1> LOG is somewhat problematic because it causes a buffering bug and a deadlock in older versions of tor
16:34:31 <dcf1> but maybe those versions are far enough behind us at this point
16:34:58 <phw> (also, we should have a system in place that monitors the stats of our meek bridge and alerts us if the numbers fall below a given threshold)
16:35:10 <dcf1> good idea
16:35:34 <cohosh> phw: +1 !
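
One possible shape for the threshold alert phw suggests is to compare the latest daily user estimate against a recent baseline. The sketch below is an assumption-laden illustration: the function name `below_threshold`, the 7-day window, and the 0.5 ratio are all hypothetical, and fetching the numbers from the metrics site is out of scope.

```python
from statistics import median
from typing import Sequence


def below_threshold(daily_users: Sequence[float],
                    window: int = 7,
                    ratio: float = 0.5) -> bool:
    """Return True if the newest value fell below ratio * recent median.

    daily_users is ordered oldest to newest. With too little history
    the function returns False rather than guessing.
    """
    if len(daily_users) < window + 1:
        return False
    # Baseline: median of the `window` values before the newest one.
    baseline = median(daily_users[-(window + 1):-1])
    return daily_users[-1] < ratio * baseline
```

A median baseline is used here so that a single odd day in the history does not shift the threshold much; a fixed absolute threshold would be simpler still.
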
16:35:39 <dcf1> numbers are still diminished since the event
16:35:41 <dcf1> https://metrics.torproject.org/userstats-bridge-transport.html?start=2020-11-01&end=2020-12-31&transport=meek
16:36:00 <dcf1> (12-05)
16:36:10 <phw> i worry that the problem isn't completely fixed. benjamin from the guardian project is still struggling with the bug despite using the latest version
16:36:19 <phw> i hope that it's just an issue with their build system but who knows
16:36:41 <dcf1> it's possible that there are different intermediate certs used depending on edge server, and that the ones hardcoded in obfs4proxy are not complete
16:36:53 <phw> yes, that is my suspicion too
16:37:09 <dcf1> there's an azure web page that gives fingerprints for the certs in use, phw you may have consulted it
16:37:12 <phw> i asked him to try a patched obfs4proxy that logs the certs that it sees
16:37:58 <phw> https://gitlab.com/yawning/obfs4/-/blob/master/transports/meeklite/hpkp_lite.go#L106 -- all our new fingerprints are from the page you once told me about
16:38:29 <dcf1> the 2 links I know about are
16:38:38 <dcf1> https://docs.microsoft.com/en-us/azure/security/fundamentals/tls-certificate-changes#what-is-changing
16:38:42 <dcf1> https://www.microsoft.com/pki/mscorp/cps/default.htm
16:38:42 <phw> we have all six of them pinned
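
To illustrate why an incomplete pin set fails only for some edge servers, here is a sketch of an HPKP-style pin comparison. It assumes pins are the base64-encoded SHA-256 of each certificate's DER-encoded SubjectPublicKeyInfo, which is the HPKP convention; whether obfs4proxy's `hpkp_lite.go` matches this exactly should be checked against its source. The function names are hypothetical, and extracting the SPKI bytes from a live TLS connection is not shown.

```python
import base64
import hashlib
from typing import Iterable, List, Set


def spki_pin(spki_der: bytes) -> str:
    """HPKP-style pin: base64 of SHA-256 over the DER-encoded SPKI."""
    return base64.b64encode(hashlib.sha256(spki_der).digest()).decode("ascii")


def chain_matches_pins(chain_spkis: Iterable[bytes], pins: Set[str]) -> bool:
    """True if at least one certificate in the chain is pinned."""
    return any(spki_pin(spki) in pins for spki in chain_spkis)


def seen_pins(chain_spkis: Iterable[bytes]) -> List[str]:
    """Pins for every certificate in the chain, useful for logging."""
    return [spki_pin(spki) for spki in chain_spkis]
```

If one edge server presents a chain through an intermediate that is absent from the pin set, `chain_matches_pins` returns False for that server only, and logging `seen_pins` would reveal which pin is missing.
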
16:39:58 <phw> hmm. i hope i didn't retire a root ca that's still used. i'll check with benjamin
16:41:05 <phw> ok, all of this was very useful. i'll file a few tickets after the meeting
16:41:41 <cohosh> great job debugging and responding to this phw
16:41:46 <phw> "Is this the last meeting until 2021?" -> i'd say yes :)
16:42:16 <phw> i'll also mostly be around on irc if anyone needs anything
16:43:38 <cohosh> yeah i'll be monitoring messages off and on
16:44:13 <phw> ok, let's do reviews
16:44:26 <phw> i don't see any
16:45:15 <dcf1> I will say that I am happy that cohosh is looking at KCP layering issues in snowflake#40026 etc
16:45:38 <dcf1> This is a topic that definitely needed additional experienced eyes on it
16:45:39 <phw> yes, a very important (and complicated!) effort
16:45:46 <cohosh> thanks for the comments on that dcf1
16:47:53 <phw> any last words in 2020?
16:48:09 <cohosh> lol
16:48:16 <cohosh> that sounds so ominous
16:48:45 <phw> no, we have a bright future ahead of us!
16:48:59 <cohosh> :D
16:49:22 <phw> #endmeeting