15:57:52 <phw> #startmeeting anti-censorship team meeting
15:57:52 <MeetBot> Meeting started Thu Dec 17 15:57:52 2020 UTC. The chair is phw. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:57:52 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:57:59 <phw> good morning, folks
15:58:07 <phw> here's our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
15:58:20 <agix> hi
15:58:25 <phw> hi agix!
15:58:26 <cohosh> hi!
15:58:31 <phw> o/
15:59:36 <phw> let's start with our announcement. tor is moving to ed25519 identifiers for relays and bridges. that may bring with it changes to some of our code bases
15:59:47 <phw> let me see if i can conjure the ahf
16:00:14 <phw> ideally, we would figure out what's going to break ahead of time
16:00:36 <ahf> im here in a second
16:01:05 <cohosh> :D
16:01:06 <phw> bridgedb is definitely affected but it relies on stem, and it shouldn't be too hard to adapt as long as stem did
16:01:22 <phw> (famous last words, i know)
16:01:28 <agix> ^^
16:01:34 <cohosh> would this change bridge lines?
16:01:48 <phw> good question. i... think so?
16:02:01 <cohosh> so perhaps previously distributed bridges wouldn't work
16:02:13 <cohosh> unless there's some backwards compatibility
16:02:31 <dcf1> Offhand I don't think the change will affect the IPC PT interface
16:02:50 <phw> backwards compatibility sounds like a desirable feature. at least for a few months
16:02:54 <ahf> hey, sorry, my grandmother called right when the meeting started
16:03:06 <phw> no worries, grandmas always have priority
16:03:11 <ahf> yeah, no doubt
16:03:12 <ahf> lol
16:03:23 <ahf> i don't think it breaks anything for current bridge lines, but i think it would be smart for us to find a clever way to detect these new lines
16:03:38 <ahf> right now we can check it by simply the length of the hash, since we use SHA1(RSA identifier) iirc
16:04:11 <cohosh> yeah i think the biggest UX pain point would be if people's bridges just suddenly stopped working because their bridge lines are now not accepted
16:04:12 <ahf> and that is 160 bits encoded in hex. for ed25519 identifiers, which are 256-bit long, we can spot it based on that
16:04:32 <ahf> but in the future we might have another 256-bit format (hash of some PQ identifier???) so we might wanna think it one step into the future here
16:04:42 <ahf> yeah i think we should find a way to avoid that, cohosh
16:04:49 <ahf> that would also break people's current configs at an upgrade point
16:04:54 <dcf1> there may be some change necessary in tor-launcher as iirc it does some light parsing of manually entered bridge lines
16:05:12 <ahf> to be fair, this isn't *only* for tor. we found this during a hackathon for arti (the rust implementation of tor) and arti assumes that all nodes have an ed25519 identifier
16:05:56 <phw> our hard-coded default bridges will also need an update but that's not a big deal
16:06:55 <ahf> if we find a way forward where tor can use both at the same time, it wont be that urgent to update though. i don't know with tor-launcher as dcf1 says
16:07:46 <phw> are the ed25519 ids already in descriptors? so in theory we could make these changes already?
16:08:49 <ahf> hm. i don't know if they are in the bridge descriptors, but the nodes do have the identifier already
16:10:17 <ahf> do you have a bridge descriptor nearby? i don't even see any in our tests :o
16:10:36 <phw> https://collector.torproject.org/recent/bridge-descriptors/server-descriptors/
16:10:45 <ahf> perfect
16:10:49 <dcf1> https://metrics.torproject.org/bridge-descriptors.html
16:11:00 <phw> i see "master-key-ed25519 TLAshERmIgIdJlx3ibxD6niayV+lsvoOna/AGw3NbGA" lines
16:11:44 <ahf> yeah, i was just wondering if that is the name
16:12:28 <phw> ok, it would also be useful to know what the new descriptor format would be like, so we can start adapting unit tests and figure out what's catching fire
16:14:01 <dcf1> I just checked tor-launcher, and it looks like it's okay, the only processing it does is to remove a "bridge" prefix from each line.
16:14:04 <dcf1> https://gitweb.torproject.org/tor-launcher.git/tree/src/chrome/content/network-settings.js?h=0.2.26#n2417
16:14:04 <phw> and if there won't be backwards compatibility, i wonder how we should hand out bridge lines if there will be new-tor *and* old-tor in the wild? should we just omit the bridge identifier?
16:14:06 <ahf> yeah, it sounds like i should take a look at how we handle the ed25519 part for bridges in tor right now, create a ticket in tor with you guys in CC and then we try to figure out the upgrade paths for the user side of things?
16:14:26 <dcf1> tbh I don't know how Tor Browser and tor-launcher interact anymore now that the settings are part of about:preferences
16:14:26 <ahf> i think we must aim for backwards compatibility here :-/ and i think we can in little-t-tor
16:14:35 <ahf> but in arti we will likely break some legacy things
16:14:36 <phw> sounds like a great plan ahf, thanks!
16:14:44 <ahf> but arti wont happen over the weekend in the new year
16:15:06 <ahf> phw: awesome, i will do that
16:16:21 <phw> any more thoughts on this?
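ahf's point that the two identifier formats can be told apart by length can be sketched as follows. This is a minimal Python sketch, not how tor, stem, or BridgeDB actually parse bridge lines. It assumes that a new-style bridge line would carry the ed25519 identity as unpadded base64 (43 characters), the same encoding as the `master-key-ed25519` descriptor line phw quoted; the real bridge-line syntax had not been decided at the time of this meeting. The helpers `classify_identity` and `bridge_identity` are hypothetical names.

```python
import base64
import re

# 160-bit SHA-1 fingerprint of the RSA identity key, hex-encoded (40 chars).
_RSA_SHA1_HEX = re.compile(r"[0-9A-Fa-f]{40}")
# Assumption: a 256-bit ed25519 key written as unpadded base64 (43 chars),
# the same encoding as the master-key-ed25519 descriptor line.
_ED25519_B64 = re.compile(r"[A-Za-z0-9+/]{43}")


def classify_identity(token):
    """Return "rsa-sha1", "ed25519", or None for a bridge-line token."""
    if _RSA_SHA1_HEX.fullmatch(token):
        return "rsa-sha1"
    if _ED25519_B64.fullmatch(token):
        # Re-add the stripped padding and confirm we really have 32 bytes.
        if len(base64.b64decode(token + "=")) == 32:
            return "ed25519"
    return None


def bridge_identity(line):
    """Find the identity field of a bridge line such as
    "obfs4 192.0.2.1:443 <FINGERPRINT> cert=... iat-mode=0".

    Returns a (kind, token) tuple, or None if no identity is present.
    """
    fields = line.split()
    if fields and fields[0].lower() == "bridge":
        fields = fields[1:]  # tolerate a leading "Bridge" keyword
    for token in fields:
        if "=" in token:
            break  # key=value transport arguments follow the identity
        kind = classify_identity(token)
        if kind is not None:
            return kind, token
    return None
```

As ahf notes, length alone cannot separate an ed25519 key from some future 256-bit identifier, so a durable format would probably need an explicit marker rather than this heuristic.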
16:17:19 <phw> if not, then let's move on to our next topic
16:17:30 <phw> this is about tpo/anti-censorship/pluggable-transports/meek#40001
16:17:53 <phw> the problem is that our monitoring wasn't set up to detect these sorts of issues
16:18:12 <phw> we're monitoring a bunch of things in isolation but not the overall process
16:18:41 <phw> so the question is: what can we do to catch this (and similar) problems in the future?
16:19:13 <phw> in theory, we can give monit (our monitoring system) and executable or script, and it raises an alert if the return code is != 0
16:19:17 <dcf1> one place where the process fell down,
16:19:17 <ahf> maybe network health can help?
16:19:19 <phw> s/and/an/
16:20:00 <dcf1> is I got an email in oct about a pending TLS cert change, I knew that obfs4proxy was pinning certificates, I spent a few hours investigating, and came to the incorrect conclusion that we were not affected
16:20:42 <dcf1> I think I got the email because I'm the admin/whatever of the CDN setup. I'm not sure if any others got the email since the time we were trying to get more admins enrolled.
16:21:32 <phw> dcf1: both cohosh and i are finally able to administer our azure setup. i'll have to take a look at the portal and see where notifications are sent to
16:21:59 <cohosh> yeah i haven't spent enough time figuring out the admin side of azure
16:22:26 <cohosh> so this is good to look at more, but i think phw is right that it would be good to expand our monitoring
16:22:36 <cohosh> so that we're not relying on users to tell us something is down
16:22:43 <cohosh> phw: did we find out about it through frontdesk?
16:22:56 <phw> it would be simple-ish to fetch a moat captcha over obfs4proxy's meek_lite. the problem is that we gotta use the same obfs4proxy and tor as it's used in tor browser
16:23:10 <phw> cohosh: i think ggus did, yes
16:24:33 <phw> the "best" solution i can think of is to do some scripting to create an environment that's identical to tor browser and then attempt to talk to moat
16:25:27 <cohosh> hmm
16:25:29 <phw> this would probably require a lot of duct tape and chewing gum, so maybe anyone has a better idea?
16:25:32 <dcf1> for true end to end, you could download the distribution package, verify sigs, then run it with different configurations
16:25:51 <dcf1> but even then, what if there's a bug that only affects Windows packages, etc.
16:26:27 <phw> right, we won't be able to cover all possible failure modes
16:26:49 <dcf1> Another point: what did affected users see when the failure started happening? Was the error message useful and conducive to having the problem reported? Could we improve that?
16:27:45 <dcf1> According to meek#40001 it was `Problem bootstrapping. Stuck at 10% (conn_done): Connected to a relay. (TLS_ERROR; TLS_ERROR; count 1; recommendation warn; host 97700DFE9F483596DDA6264C4D7DF7641E1E39CE at 0.0.2.0:2)`, not so good
16:28:09 <dcf1> I'm guessing that `TLS_ERROR; TLS_ERROR` comes from obfs4proxy, it may be possible to make a change there.
16:28:33 <dcf1> On second thought maybe not; that text may come from tor
16:28:58 <phw> i think the way it broke was also particularly bad. obfs4proxy panicked, so tor hit this bug: tpo/core/tor#33669
16:30:06 <phw> oh, hold on, i may be wrong
16:31:48 <phw> correction: obfs4proxy didn't panic. it logged a warning (which basically nobody saw) and tor would log the tls error that dcf1 mentioned
16:33:01 <phw> and yes, these bootstrapping messages are useless to users but at least we can sometimes infer what went wrong :/
16:33:21 <cohosh> is this a good use case for https://gitweb.torproject.org/torspec.git/tree/pt-spec.txt#n634 ?
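For reference, pt-spec defines the message cohosh links to as `LOG SEVERITY=Severity MESSAGE=Message`, with the severities error, warning, notice, info, and debug, and with the message given as a String or CString. The following is a minimal Python sketch of formatting such a line. The octal-escaping scheme, the function names, and the example message are assumptions for illustration, not obfs4proxy's implementation, and dcf1's caveat about the buffering bug and deadlock in older tor versions still applies.

```python
import sys

# Severity values accepted by the LOG message in pt-spec.
SEVERITIES = ("error", "warning", "notice", "info", "debug")


def c_string(message):
    """Quote a message as a C-style string.

    Printable ASCII is kept as-is; every other byte, plus the double quote
    and backslash, becomes a three-digit octal escape such as \\042.
    """
    out = ['"']
    for byte in message.encode("utf-8"):
        if 0x20 <= byte <= 0x7E and byte not in (0x22, 0x5C):
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)
    out.append('"')
    return "".join(out)


def pt_log_line(severity, message):
    """Build one line of the form LOG SEVERITY=... MESSAGE=..."""
    if severity not in SEVERITIES:
        raise ValueError("unknown severity: %r" % severity)
    return "LOG SEVERITY=%s MESSAGE=%s" % (severity, c_string(message))


def pt_log(severity, message):
    # Managed-proxy messages are read from the PT's stdout; flush so tor
    # sees the line immediately instead of when the buffer fills.
    print(pt_log_line(severity, message), file=sys.stdout, flush=True)
```

With something like this, a (hypothetical) certificate-pin failure could be reported as `pt_log("warning", "meek_lite: TLS certificate pin mismatch")`, which tor would surface in its own log instead of leaving only the generic `TLS_ERROR` bootstrap message.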
16:33:47 <dcf1> cohosh: yes, LOG or even just printing something to stderr would have caused a user-visible message in the tor log
16:34:15 <dcf1> LOG is somewhat problematic because it causes a buffering bug and a deadlock in older versions of tor
16:34:31 <dcf1> but maybe those versions are far enough behind us at this point
16:34:58 <phw> (also, we should have a system in place that monitors the stats of our meek bridge and alerts us if the numbers fall below a given threshold)
16:35:10 <dcf1> good idea
16:35:34 <cohosh> phw: +1 !
16:35:39 <dcf1> numbers are still diminished since the event
16:35:41 <dcf1> https://metrics.torproject.org/userstats-bridge-transport.html?start=2020-11-01&end=2020-12-31&transport=meek
16:36:00 <dcf1> (12-05)
16:36:10 <phw> i worry that the problem isn't completely fixed. benjamin from the guardian project is still struggling with the bug despite using the latest version
16:36:19 <phw> i hope that it's just an issue with their build system but who knows
16:36:41 <dcf1> it's possible that there are different intermediate certs used depending on edge server, and that the ones hardcoded in obfs4proxy are not complete
16:36:53 <phw> yes, that is my suspicion too
16:37:09 <dcf1> there's an azure web page that gives fingerprints for the certs in use, phw you may have consulted it
16:37:12 <phw> i asked him to try a patched obfs4proxy that logs the certs that it sees
16:37:58 <phw> https://gitlab.com/yawning/obfs4/-/blob/master/transports/meeklite/hpkp_lite.go#L106 -- all our new fingerprints are from the page you once told me about
16:38:29 <dcf1> the 2 links I know about are
16:38:38 <dcf1> https://docs.microsoft.com/en-us/azure/security/fundamentals/tls-certificate-changes#what-is-changing
16:38:42 <dcf1> https://www.microsoft.com/pki/mscorp/cps/default.htm
16:38:42 <phw> we have all six of them pinned
16:39:58 <phw> hmm. i hope i didn't retire a root ca that's still used. i'll check with benjamin
16:41:05 <phw> ok, all of this was very useful. i'll file a few tickets after the meeting
16:41:41 <cohosh> great job debugging and responding to this phw
16:41:46 <phw> "Is this the last meeting until 2021?" -> i'd say yes :)
16:42:16 <phw> i'll also mostly be around on irc if anyone needs anything
16:43:38 <cohosh> yeah i'll be monitoring messages off and on
16:44:13 <phw> ok, let's do reviews
16:44:26 <phw> i don't see any
16:45:15 <dcf1> I will say that I am happy that cohosh is looking at KCP layering issues in snowflake#40026 etc
16:45:38 <dcf1> This is a topic that definitely needed additional experienced eyes on it
16:45:39 <phw> yes, a very important (and complicated!) effort
16:45:46 <cohosh> thanks for the comments on that dcf1
16:47:53 <phw> any last words in 2020?
16:48:09 <cohosh> lol
16:48:16 <cohosh> that sounds so ominous
16:48:45 <phw> no, we have a bright future ahead of us!
16:48:59 <cohosh> :D
16:49:22 <phw> #endmeeting
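phw's suggestion to alert when meek usage falls below a threshold could reuse the monit contract he described (run a script, alert on a non-zero return code). The sketch below assumes the daily user estimates have already been exported to a text file, one integer per line, oldest first; fetching them from Tor Metrics is left out. The 7-day median baseline and the 50% drop factor are arbitrary illustrative choices, not values the team agreed on.

```python
import sys
from statistics import median


def below_threshold(daily_users, window=7, drop=0.5):
    """Return True if the newest daily estimate is below `drop` times the
    median of the `window` days before it.

    daily_users: daily user estimates, ordered oldest to newest.
    """
    if len(daily_users) < window + 1:
        return False  # not enough history to judge
    baseline = median(daily_users[-window - 1:-1])
    return daily_users[-1] < drop * baseline


def main(daily_users):
    """Return a monit-style exit status: 0 means OK, 1 means alert."""
    if below_threshold(daily_users):
        print("meek user estimate fell below threshold", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical input file: one daily user estimate per line, oldest first.
    with open(sys.argv[1]) as f:
        sys.exit(main([int(x) for x in f.read().split()]))
```

A relative baseline adapts as usage drifts over time, and the median keeps a single noisy day from skewing it; a fixed absolute floor would work just as well for a first version. monit could run such a script through a `check program` entry with an `if status != 0 then alert` rule.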