13:01:18 <hiro> #startmeeting network-health 2025-05-12
13:01:18 <MeetBot> Meeting started Mon May 12 13:01:18 2025 UTC.  The chair is hiro. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:18 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
13:07:40 <hiro> https://grafana1.torproject.org/d/xfpJB9FGz/8b2109d?var-interval=2m&orgId=1&from=now-2d&to=now&timezone=browser&var-origin_prometheus=&var-job=node&var-hostname=$__all&var-node=colchicifolium.torproject.org:9100&var-device=$__all&var-maxmount=%2Fhome&var-show_hostname=alberti&var-total=94 is the link for those interested
13:07:41 <hiro> now that it seems things have stabilized a bit I was thinking to write a post-mortem of the latest issues. GeKo (IRC) where do you think this could go? the wiki or some of the analysis tickets?
13:07:42 <hiro> I was worried that if I write it up again in one of the fix issues it gets buried into gitlab so to speak
13:07:43 <hiro> other than this I do not have anything else besides what is in my tasks list and some bugs for the current metrics website regarding some of the graphs not behaving properly that I have to dig into
13:07:49 <hiro> GeKo (IRC), juga do you have anything to discuss for this week sync?
13:08:09 <juga> hiro: hmm, i don't think so
13:09:53 <GeKo> hiro: i'd be fine in some of the tickets
13:10:25 <hiro> the fix tickets for the service or the analysis one?
13:10:42 <GeKo> or you mean analysis#96?
13:10:42 <tor> Uhm, which one of [tpo/network-health/analysis, tpo/network-health/metrics/analysis] did you mean?
13:10:52 <hiro> yeah
13:10:57 <GeKo> i'd reserve that for the actual ddos
13:11:08 <hiro> ok then
13:11:27 <GeKo> collector is kind of 2nd order collateral damage
13:12:07 <hiro> oh GeKo (IRC) one operator at the meetup said that they have observed a ddos to exits that we didn't know about (the specific to exits part)
13:12:22 <GeKo> interesting
13:12:30 <GeKo> do we have more info about that?
13:12:33 <hiro> we have asked for more info and to write to bad-relays so if that mail arrives we will know
13:12:40 <GeKo> ah, okay
13:13:33 <GeKo> hiro: is it expected that the memory usage for collector is still jumping?
13:13:56 <GeKo> it's essentially the same pattern as before, just the spikes are not that high right now
13:14:35 <GeKo> and the RAM cache + buffer does not folow the RAM used, hrm
13:14:39 <GeKo> *follow
13:15:44 <hiro> sometimes it is possible but the service seems more stable now
13:15:45 <hiro> the average is 60% free memory
13:15:45 <hiro> also we shouldn't use more than 80% memory anyways
13:16:02 <GeKo> okay
13:16:20 <GeKo> i don't have anything else
13:16:39 <GeKo> hiro: oh
13:16:48 <GeKo> did the votes document thing succeed?
13:16:50 <hiro> great I'll clean up the MR for collector and add the post mortem then
13:17:01 <hiro> yes
13:17:07 <hiro> we should have that in the db
13:17:24 <GeKo> awesome, so you could fix up that mr as well and we are done with that part
13:17:26 <hiro> but creating statuses takes 6h nowadays... so me and sarthik are working on a solution for that
13:17:35 <GeKo> yeah...
13:18:13 <hiro> so yeah I can fix up the MR for the vote part .. probably doing a rebase
13:18:18 <hiro> but the long term solution is this: https://gitlab.torproject.org/tpo/network-health/metrics/aggreagator.rs/-/issues/1
13:18:29 <hiro> moving creating statuses out of the parser
13:19:42 <hiro> ok so that's all I guess
13:20:11 <hiro> if everybody is good we can end the meeting
13:21:17 <GeKo> +1
13:21:27 <hiro> #endmeeting