15:59:14 <meskio> #startmeeting tor anti-censorship meeting
15:59:14 <MeetBot> Meeting started Thu Jan 27 15:59:14 2022 UTC. The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:59:14 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:59:18 <meskio> hello everybody!!
15:59:23 <shelikhoo> Hi~
15:59:32 <meskio> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
15:59:46 <meskio> please add what you've been working on
15:59:55 <meskio> and any items for the agenda you want to talk about
16:00:16 <cohosh> hi!
16:01:24 <dcf1> meskio: is there anything to say about the obfs4 upstream changes?
16:01:46 <meskio> mmm
16:01:55 <meskio> I haven't done much on that this week
16:02:11 <meskio> the debian package is in process, I think, but I haven't checked how it's going
16:02:18 <dcf1> no problem, just wanted to see if it should be on the agenda
16:02:31 <meskio> and I don't know what the status is for TB to update it
16:02:49 <meskio> I don't have updates, but maybe I should put some thought into this next week
16:03:10 <meskio> dcf1: do you want to start with the load balancing?
16:03:43 <dcf1> at the top of https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40095 you can see the progress so far
16:04:38 <dcf1> The DNS switch 2 days ago was pretty smooth, we just had to debug a problem with a file descriptor limit
16:05:01 * meskio notices that gitlab seems to be down
16:05:03 <dcf1> I posted updates from monitoring it the past 2 days at https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40095#note_2772325 and the comment following
16:05:28 <dcf1> gitlab is working for me currently
16:05:48 <meskio> weird, it's not loading here
16:05:55 <shelikhoo> Yes. Gitlab seems to be down for me.....
16:05:57 <dcf1> the 4 tor instances are combining to use more than 100% of a CPU core, which was the whole reason for doing any of this, so that's working
16:06:42 <meskio> nice, congrats
16:06:44 <dcf1> The 8 CPU cores are pretty much maxed out between snowflake-server, haproxy, the tors, and extor-static-cookie though
16:06:53 <shelikhoo> cohosh lost access to the meeting (message from signal)
16:06:57 <dcf1> At least during the day in Russia, at night it's a little more chill
16:07:31 <dcf1> I have an idea for preventing onion key rotation, which is the last piece of the load balancing setup we haven't solved yet
16:08:09 <dcf1> It's to make the onion key files read-only. arma2 looked at the code and says there's a chance it will work. https://forum.torproject.net/t/tor-relays-how-to-reduce-tor-cpu-load-on-a-single-bridge/1483/16
16:08:52 <dcf1> one side effect is that snowflake metrics are possibly going to be messed up
16:09:07 <gaba> hey, chives is having issues and many people dropped from irc
16:09:11 <gaba> including cohosh
16:09:19 <dcf1> the ExtORPort stuff is working, and the bridge is uploading descriptors with a mix of countries
16:09:24 <dcf1> hmm bummer
16:09:33 <dcf1> I'll type and paste again if I need to
16:09:40 <gaba> there :)
16:09:43 <cecylia> (hi, got disconnected)
16:10:02 <meskio> hello back
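On the file descriptor limit dcf1 mentions having to debug after the DNS switch: for a systemd-managed service, the usual fix is to raise the limit with a drop-in override. A minimal sketch, assuming snowflake-server runs as a systemd service of that name (the log does not say how the limit was actually raised on the staging bridge):

    # Sketch: raise the open-file limit for a hypothetical snowflake-server unit.
    sudo systemctl edit snowflake-server
    # In the override file that opens, add:
    #   [Service]
    #   LimitNOFILE=65536
    sudo systemctl restart snowflake-server
    # Confirm the limit the running process actually got:
    grep 'Max open files' /proc/"$(pidof snowflake-server)"/limits

The same kind of override would apply to haproxy or the tor instances if they were the processes hitting the limit.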
16:10:10 <dcf1> e.g. dirreq-v3-ips ru=8304,us=728,cn=272,by=248,de=160,gb=128
16:10:46 <dcf1> I was saying the 8 CPU cores are pretty much maxed out between snowflake-server, haproxy, the tors, and extor-static-cookie during the time of the most usage, which corresponds to daytime in Russia
16:11:10 <dcf1> I need to test an idea to make the onion key files read-only, to prevent onion key rotation https://forum.torproject.net/t/tor-relays-how-to-reduce-tor-cpu-load-on-a-single-bridge/1483/16
16:11:33 <dcf1> https://metrics.torproject.org/rs.html#details/5481936581E23D2D178105D44DB6915AB06BFB7F
16:11:53 <dcf1> a side effect of running multiple tor instances with the same identity key is that metrics might get messed up
16:12:22 <dcf1> the bandwidth graph is going down on 25 and 26 January, but that's definitely not really the case
16:12:47 <meskio> sounds good, a pity not to have proper metrics, but we could collect them from another place if needed
16:13:09 <dcf1> the bridge is currently about 30 MB/s incoming, 40 MB/s outgoing, which is more than we had before
16:13:14 <cecylia> hmm, i would have expected metrics to rise too much because clients would get double counted, why would it fall?
16:13:38 <shelikhoo> Maybe the data overwrite each other?
16:13:38 <dcf1> cecylia: hypothesis is because it's only counting metrics from 1/4 of the instances
16:13:42 <cecylia> ahh
16:13:45 <cecylia> that makes sense
16:14:05 <dcf1> taking the most recent descriptor with a certain identity fingerprint, instead of adding them together
16:14:22 <dcf1> But I am not sure exactly what's happening
16:14:28 <cecylia> is it worth talking to the network team and trying to get at least a temporary fix in place here?
16:14:33 <dcf1> Maybe it's something the network health team can help with
16:14:39 <anadahz> dcf1: Thank you for all this work!
16:14:42 * cecylia nods
16:14:46 <dcf1> tx anadahz
16:14:48 <shelikhoo> thanks~
16:15:07 <dcf1> probably today or tomorrow I'll take down the production bridge and do the load balancing installation there
16:15:10 <meskio> yep, great work
16:15:15 <dcf1> then maybe next Monday try switching the DNS back
16:15:16 <cecylia> nice \o/
16:15:50 <meskio> yeah!!
16:16:00 <dcf1> (and actually there are currently 5 instances running, the 4 load balanced + the 1 production that is running concurrently, so that might also be affecting metrics)
16:16:15 <shelikhoo> Do we have a plan in case the clients get diverging onion keys?
16:16:39 <dcf1> got an alert last night saying the bridge had already used 3 TB of bandwidth, in about 1.5 days
16:16:56 <dcf1> shelikhoo: the plan is to try making the onion key files read-only to prevent the tor relay from rotating onion keys
16:17:27 <dcf1> an alternative is to apply a patch to the tor relay, but I have less preference for that as a custom-patched tor makes it more likely to miss a security upgrade
16:17:34 <shelikhoo> Yes, but what if it does not work and clients cache an incorrect onion key?
16:17:57 <dcf1> shelikhoo: well, I will test it first, and if it does not work, we will not do it.
16:18:02 <shelikhoo> Yes!
16:18:13 <shelikhoo> Sounds great
16:18:20 <dcf1> the other thing you can do is to stop tor for 30 seconds, then erase the LastRotatedOnionKey line, then restart tor and tor will refresh the timestamp
16:18:41 <dcf1> that is effective, but it means you have to shut down the relay for ~1 minute every 3 weeks
16:19:08 <dcf1> which is not too bad on its own, the real risk is automating / remembering to do it
16:19:27 <meskio> cron job?
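To illustrate the fallback meskio and shelikhoo are converging on here (in case the read-only trick does not pan out): a scheduled job could briefly stop each tor instance, delete the LastRotatedOnionKey line from its state file, and restart it, so that tor writes a fresh rotation timestamp, as dcf1 describes above. This is a sketch only; the instance names, paths, and schedule are hypothetical and nothing like this was deployed during the meeting:

    # Hypothetical /etc/cron.d entry; cron cannot express an exact 3-week
    # cadence, so a real setup might prefer shelikhoo's systemd timer idea.
    #   0 4 */21 * * root /usr/local/sbin/refresh-onion-key-timestamp
    #
    # /usr/local/sbin/refresh-onion-key-timestamp (sketch):
    for instance in snowflake1 snowflake2 snowflake3 snowflake4; do  # assumed instance names
        systemctl stop "tor@$instance"
        sleep 30   # mirror dcf1's "stop tor for 30 seconds"
        sed -i '/^LastRotatedOnionKey/d' "/var/lib/tor-instances/$instance/state"
        systemctl start "tor@$instance"
    done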
16:19:29 <shelikhoo> Let's hope the read-only trick works
16:19:43 <dcf1> yeah, I hope it works. but we can cope if it does not.
16:19:43 <shelikhoo> otherwise I can create something like a systemd timer
16:20:25 <shelikhoo> which is similar to cron
16:21:17 <meskio> :)
16:21:55 <meskio> should we move on to the next topic?
16:22:23 <dcf1> ok
16:22:31 <meskio> just an announcement
16:22:44 <meskio> the last few months I've been working on replacing the bridgedb backend with rdsys
16:23:00 <meskio> I plan to set up a test bridgedb with it next week
16:23:38 <meskio> only @torproject.org email addresses and people.tpo for the website will be able to reach it
16:23:40 <gaba> \o/
16:23:48 <anadahz> yeah!
16:24:00 <meskio> I'll send an email to our team list with information if others want to try it and see if they find problems
16:24:06 <meskio> it will be distributing real bridges
16:24:15 <cohosh> :D
16:24:17 <meskio> that is why the @tpo limitation
16:24:34 <meskio> I hope to push it into production soonish in February, but let's see
16:24:54 <meskio> I guess there is not much to discuss about it, just a note that I will poke people to test it
16:25:15 <meskio> the last point on the agenda is the hetzner networking issue
16:25:16 <shelikhoo> Is gettor included in this deployment?
16:25:27 <meskio> shelikhoo: not yet
16:25:39 <meskio> but gettor will be next once rdsys is in production, thanks to your work
16:25:51 <meskio> I'll talk with you when everything is ready to start deploying gettor
16:25:58 <shelikhoo> Okay, looking forward to testing it in real environments....
16:26:10 <shelikhoo> no problem! let me know when this happens
16:26:19 <meskio> sure :)
16:27:49 <shelikhoo> Yes, we could move on to the hetzner networking issue....
16:27:50 <meskio> about hetzner, I have no idea what of our infra is hosted there, it seems to be back now
16:28:09 <shelikhoo> Yes, it is quite annoying
16:28:19 <shelikhoo> but not that bad
16:28:26 <cohosh> i think it's worth adding to the metrics timeline, there are some bridges there
16:28:38 <cohosh> also polyanthum (bridgedb)
16:29:06 <meskio> bridges.tpo seems to be reachable now
16:29:12 <cohosh> the outage wasn't long
16:29:22 <cohosh> but there were some routing issues the other day
16:29:35 <cohosh> so this could be a longer term event that involves intermittent disruptions
16:30:06 <anadahz> Has this also affected Snowflake users?
16:30:15 <cohosh> i don't think it would
16:30:32 <cohosh> the bridge isn't hosted on hetzner (either the production or staging server)
16:30:33 * anadahz goes to read #tor-dev backlog
16:32:46 <meskio> I hope the problem doesn't continue
16:32:58 <meskio> anything more on this topic? something else to talk about?
16:33:03 <cohosh> you can see the outages on our anti-censorship alerts mailing list
16:33:25 <meskio> yep, there have been a bunch of them lately
16:35:56 <meskio> if we don't have anything more we can finish this meeting
16:36:07 <meskio> a reminder, next week we have reading group
16:36:49 <dcf1> one note, I'm going to try to run some profiling on the staging bridge before we switch it back
16:37:02 <dcf1> in order to help with https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40086
16:37:27 <cohosh> dcf1: awesome! thanks for doing that!
16:38:46 <meskio> I'll wait a minute to close the meeting if someone has something more
16:39:59 <meskio> #endmeeting
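A note on the profiling dcf1 plans to run on the staging bridge for issue 40086: the log does not say how it will be done. One common way to profile a Go service like snowflake-server, assuming a net/http/pprof debug endpoint were enabled on a local port (an assumption, not something the bridge is known to expose), would be:

    # Capture a 30-second CPU profile from a hypothetical pprof endpoint
    # and print the hottest functions.
    go tool pprof -top 'http://127.0.0.1:6060/debug/pprof/profile?seconds=30'
    # A heap snapshot from the same hypothetical endpoint:
    go tool pprof -top 'http://127.0.0.1:6060/debug/pprof/heap'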