15:59:14 #startmeeting tor anti-censorship meeting
15:59:14 Meeting started Thu Jan 27 15:59:14 2022 UTC. The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:59:14 Useful Commands: #action #agreed #help #info #idea #link #topic.
15:59:18 hello everybody!!
15:59:23 Hi~
15:59:32 here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
15:59:46 please add what you've been working on
15:59:55 and any items for the agenda you want to talk about
16:00:16 hi!
16:01:24 meskio: is there anything to say about the obfs4 upstream changes?
16:01:46 mmm
16:01:55 I haven't done much on that this week
16:02:11 the Debian package is in progress, I think, but I haven't checked how it is going
16:02:18 no problem, just wanted to see if it should be on the agenda
16:02:31 and I don't know what the status is for TB to update it
16:02:49 I don't have updates, but maybe I should put some time into this next week
16:03:10 dcf1: do you want to start with the load balancing?
16:03:43 at the top of https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40095 you can see the progress so far
16:04:38 The DNS switch 2 days ago was pretty smooth, we just had to debug a problem with a file descriptor limit
16:05:01 * meskio notices that gitlab seems to be down
16:05:03 I posted updates from monitoring it the past 2 days at https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40095#note_2772325 and the comment following
16:05:28 gitlab is working for me currently
16:05:48 weird, it's not loading here
16:05:55 Yes. Gitlab seems to be down for me.....
16:05:57 the 4 tor instances are combining to use more than 100% of a CPU core, which was the whole reason for doing any of this, so that's working
16:06:42 nice, congrats
16:06:44 The 8 CPU cores are pretty much maxed out between snowflake-server, haproxy, the tors, and extor-static-cookie though
16:06:53 cohosh lost access to the meeting (message from Signal)
16:06:57 At least during the day in Russia, at night it's a little more chill
16:07:31 I have an idea for preventing onion key rotation, which is the last piece of the load balancing setup we haven't solved yet
16:08:09 It's to make the onion key files read-only. arma2 looked at the code and says there's a chance it will work. https://forum.torproject.net/t/tor-relays-how-to-reduce-tor-cpu-load-on-a-single-bridge/1483/16
16:08:52 one side effect is that snowflake metrics are possibly going to be messed up
16:09:07 hey, chives is having issues and many people dropped from irc
16:09:11 including cohosh
16:09:19 the ExtORPort stuff is working, and the bridge is uploading descriptors with a mix of countries
16:09:24 hmm bummer
16:09:33 I'll type and paste again if I need to
16:09:40 there :)
16:09:43 (hi, got disconnected)
16:10:02 hello back
16:10:10 e.g. dirreq-v3-ips ru=8304,us=728,cn=272,by=248,de=160,gb=128
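For context, here is a minimal sketch of the kind of haproxy load balancing described above, assuming snowflake-server's ExtORPort traffic is spread over four local tor instances. The addresses and port numbers are illustrative assumptions, not the values from the actual bridge, and in the real setup each backend would likely be a per-instance extor-static-cookie listener, since each tor instance otherwise generates its own ExtORPort auth cookie.

    # Hypothetical haproxy.cfg fragment (addresses and ports are assumptions)
    defaults
        mode tcp
        timeout connect 5s
        timeout client 60s
        timeout server 60s

    frontend snowflake-extor
        # the address snowflake-server is pointed at
        bind 127.0.0.1:10000
        default_backend tor-instances

    backend tor-instances
        # spread incoming connections across the four tor instances
        balance roundrobin
        server tor1 127.0.0.1:10001
        server tor2 127.0.0.1:10002
        server tor3 127.0.0.1:10003
        server tor4 127.0.0.1:10004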
16:10:46 I was saying the 8 CPU cores are pretty much maxed out between snowflake-server, haproxy, the tors, and extor-static-cookie during the time of the most usage, which corresponds to daytime in Russia
16:11:10 I need to test an idea to make the onion key files read-only, to prevent onion key rotation https://forum.torproject.net/t/tor-relays-how-to-reduce-tor-cpu-load-on-a-single-bridge/1483/16
16:11:33 https://metrics.torproject.org/rs.html#details/5481936581E23D2D178105D44DB6915AB06BFB7F
16:11:53 a side effect of running multiple tor instances with the same identity key is that metrics might get messed up
16:12:22 the bandwidth graph is going down on 25 and 26 January, but that's definitely not really the case
16:12:47 sounds good, a pity not to have proper metrics, but we could collect them from another place if needed
16:13:09 the bridge is currently doing about 30 MB/s incoming, 40 MB/s outgoing, which is more than we had before
16:13:14 hmm, I would have expected metrics to rise too much because clients would get double counted, why would it fall?
16:13:38 Maybe the data overwrite each other?
16:13:38 cecylia: the hypothesis is that it's only counting metrics from 1/4 of the instances
16:13:42 ahh
16:13:45 that makes sense
16:14:05 taking the most recent descriptor with a certain identity fingerprint, instead of adding them together
16:14:22 But I am not sure exactly what's happening
16:14:28 is it worth talking to the network team and trying to get at least a temporary fix in place here?
16:14:33 Maybe it's something the network health team can help with
16:14:39 dcf1: Thank you for all this work!
16:14:42 * cecylia nods
16:14:46 tx anadahz
16:14:48 thanks~
16:15:07 probably today or tomorrow I'll take down the production bridge and do the load balancing installation there
16:15:10 yep, great work
16:15:15 then maybe next Monday try switching the DNS back
16:15:16 nice \o/
16:15:50 yeah!!
16:16:00 (and actually there are currently 5 instances running, the 4 load balanced + the 1 production that is running concurrently, so that might also be affecting metrics)
16:16:15 Do we have a plan in case the clients get diverging onion keys?
16:16:39 I got an alert last night saying the bridge had already used 3 TB of bandwidth, in about 1.5 days
16:16:56 shelikhoo: the plan is to try making the onion key files read-only to prevent the tor relay from rotating onion keys
16:17:27 an alternative is to apply a patch to the tor relay, but I have less preference for that, as a custom-patched tor makes it more likely to miss a security upgrade
16:17:34 Yes, but what if it does not work and clients have cached an incorrect onion key?
16:17:57 shelikhoo: well, I will test it first, and if it does not work, we will not do it.
16:18:02 Yes!
16:18:13 Sounds great
16:18:20 the other thing you can do is to stop tor for 30 seconds, then erase the LastRotatedOnionKey line, then restart tor, and tor will refresh the timestamp
16:18:41 that is effective, but it means you have to shut down the relay for ~1 minute every 3 weeks
16:19:08 which is not too bad on its own, the real risk is automating / remembering to do it
16:19:27 cron job?
16:19:29 Let's hope the read-only trick works
16:19:43 yeah, I hope it works. but we can cope if it does not.
16:19:43 otherwise I can create something like a systemd timer
16:20:25 which is similar to cron
16:21:17 :)
16:21:55 should we move on to the next topic?
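A hedged sketch of the systemd timer fallback mentioned above, automating the stop / erase LastRotatedOnionKey / restart cycle a couple of times a month. The unit names, tor service name, state file path, tor user, and schedule are all assumptions; the preferred plan remains the read-only key files, which would avoid the downtime entirely.

    # tor-onionkey-refresh.service (hypothetical unit; paths and service name assumed)
    [Unit]
    Description=Reset tor's onion key rotation timestamp

    [Service]
    Type=oneshot
    ExecStart=/usr/bin/systemctl stop tor@default.service
    # keep tor stopped briefly, matching the ~30 seconds mentioned above
    ExecStart=/usr/bin/sleep 30
    # deleting the LastRotatedOnionKey line makes tor write a fresh timestamp
    # at startup instead of rotating the onion key
    ExecStart=/usr/bin/sed -i '/^LastRotatedOnionKey/d' /var/lib/tor/state
    # make sure the state file stays owned by the tor daemon user
    # (debian-tor on Debian; an assumption)
    ExecStart=/usr/bin/chown debian-tor:debian-tor /var/lib/tor/state
    ExecStart=/usr/bin/systemctl start tor@default.service

    # tor-onionkey-refresh.timer (hypothetical; 1st and 15th of each month,
    # well inside the ~3-week rotation window mentioned above)
    [Unit]
    Description=Periodically reset tor's onion key rotation timestamp

    [Timer]
    OnCalendar=*-*-01,15 00:00:00
    Persistent=true

    [Install]
    WantedBy=timers.target

Enabling it would be "systemctl enable --now tor-onionkey-refresh.timer" (again assuming these unit names).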
16:22:23 ok
16:22:31 just an announcement
16:22:44 the last few months I've been working on replacing the bridgedb backend with rdsys
16:23:00 I plan to set up a test bridgedb with it next week
16:23:38 only @torproject.org email addresses and people.tpo for the website will be able to reach it
16:23:40 \o/
16:23:48 yeah!
16:24:00 I'll send an email to our team list with information if others want to try it and see if they find problems
16:24:06 it will be distributing real bridges
16:24:15 :D
16:24:17 that is why the @tpo limitation
16:24:34 I hope to push it into production soonish in February, but let's see
16:24:54 I guess there is not much to discuss about it, just a note that I will poke people to test it
16:25:15 the last point on the agenda is the Hetzner networking issue
16:25:16 Is gettor included in this deployment?
16:25:27 shelikhoo: not yet
16:25:39 but gettor will be next once rdsys is in production, thanks to your work
16:25:51 I'll talk with you when everything is ready to start deploying gettor
16:25:58 Okay, looking forward to testing it in real environments....
16:26:10 no problem! let me know when this happens
16:26:19 sure :)
16:27:49 Yes, we could move on to the Hetzner networking issue....
16:27:50 about Hetzner, I have no idea what parts of our infra are there, it seems to be back now
16:28:09 Yes, it is quite annoying
16:28:19 but not that bad
16:28:26 I think it's worth adding to the metrics timeline, there are some bridges there
16:28:38 also polyanthum (bridgedb)
16:29:06 bridges.tpo seems to be reachable now
16:29:12 the outage wasn't long
16:29:22 but there were some routing issues the other day
16:29:35 so this could be a longer-term event that involves intermittent disruptions
16:30:06 Has this also affected Snowflake users?
16:30:15 I don't think it would
16:30:32 the bridge isn't hosted on Hetzner (either the production or staging server)
16:30:33 * anadahz goes to read the #tor-dev backlog
16:32:46 I hope the problem doesn't continue
16:32:58 anything more on this topic? something else to talk about?
16:33:03 you can see the outages on our anti-censorship alerts mailing list
16:33:25 yep, there have been a bunch of them lately
16:35:56 if we don't have anything more we can finish this meeting
16:36:07 a reminder: next week we have the reading group
16:36:49 one note, I'm going to try to run some profiling on the staging bridge before we switch it back
16:37:02 in order to help with https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40086
16:37:27 dcf1: awesome! thanks for doing that!
16:38:46 I'll wait a minute to close the meeting if someone has something more
16:39:59 #endmeeting
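On the profiling note near the end: snowflake-server is written in Go, so one plausible way to profile the bridge is Go's built-in net/http/pprof support. The sketch below is generic and hypothetical, not the actual snowflake code; the listen address is an assumption.

    // Hypothetical sketch: exposing Go's pprof endpoints in a long-running daemon.
    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/ handlers on the default mux
    )

    func main() {
        go func() {
            // Bind to localhost only, so the profiler is not publicly reachable.
            log.Println(http.ListenAndServe("127.0.0.1:6060", nil))
        }()
        // ... the daemon's real work would go here ...
        select {}
    }

A 30-second CPU profile could then be captured while the bridge is under load with: go tool pprof "http://127.0.0.1:6060/debug/pprof/profile?seconds=30"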