13:01:18 #startmeeting network-health 2025-05-12
13:01:18 Meeting started Mon May 12 13:01:18 2025 UTC. The chair is hiro. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:18 Useful Commands: #action #agreed #help #info #idea #link #topic.
13:07:40 https://grafana1.torproject.org/d/xfpJB9FGz/8b2109d?var-interval=2m&orgId=1&from=now-2d&to=now&timezone=browser&var-origin_prometheus=&var-job=node&var-hostname=$__all&var-node=colchicifolium.torproject.org:9100&var-device=$__all&var-maxmount=%2Fhome&var-show_hostname=alberti&var-total=94 is the link for those interested
13:07:41 now that it seems things have stabilized a bit I was thinking to write a post-mortem of the latest issues. GeKo (IRC) where do you think this could go? the wiki or some of the analysis tickets?
13:07:42 I was worried that if I write it up again in one of the fix issues it gets buried into gitlab so to speak
13:07:43 other than this I do not have anything else besides what is in my tasks list and some bugs for the current metrics website reg some of the graphs not behaving properly that I have to dig into
13:07:49 GeKo (IRC), juga do you have anything to discuss for this week sync?
13:08:09 hiro: hmm, i don't think so
13:09:53 hiro: i'd be fine in some of the tickets
13:10:25 the fix tickets for the service or the analysis one?
13:10:42 or you mean analysis#96?
13:10:42 Uhm, which one of [tpo/network-health/analysis, tpo/network-health/metrics/analysis] did you mean?
13:10:52 yeah
13:10:57 i'd reserve that for the actual ddos
13:11:08 ok then
13:11:27 collector is kind of 2nd order collateral damage
13:12:07 oh GeKo (IRC) one operator at the meetup said that they have observed a ddos to exits that we didn't know about (the specific to exits part)
13:12:22 interesting
13:12:30 do we have more info about that?
13:12:33 we have asked for more info and to write to bad-relays so if that mail arrives we will know
13:12:40 ah, okay
13:13:33 hiro: is it expected that the memory usage for collector is still jumping?
13:13:56 it's essentially the same pattern as before, just the spikes are not that high right now
13:14:35 and the RAM cache + buffer does not folow the RAM used, hrm
13:14:39 *follow
13:15:44 sometimes it is possible but the service seems more stable now
13:15:45 the average is 60% free memory
13:15:45 also we shouldn't use more than 80% memory anyways
13:16:02 okay
13:16:20 i don't have anything else
13:16:39 hiro: oh
13:16:48 did the votes document thing succeed?
13:16:50 great I'll clean up the MR for collector and add the post mortem then
13:17:01 yes
13:17:07 we should have that in the db
13:17:24 awesome, so you could fix up that mr as well and we are done with that part
13:17:26 but creating statuses takes 6h nowadays... so me and sarthik are working on a solution for that
13:17:35 yeah...
13:18:13 so yeah I can fix up the MR for the vote part .. probably doing a rebase
13:18:18 but the long term solution is this: https://gitlab.torproject.org/tpo/network-health/metrics/aggreagator.rs/-/issues/1
13:18:29 moving creating statuses out of the parser
13:19:42 ok so that's all I guess
13:20:11 if everybody is good we can end the meeting
13:21:17 +1
13:21:27 #endmeeting