13:01:41 <hiro> #startmeeting network-health 2025-06-16
13:01:41 <MeetBot> Meeting started Mon Jun 16 13:01:41 2025 UTC.  The chair is hiro. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:41 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
13:01:56 <hiro> https://pad.riseup.net/p/tor-nethealthteam-2025-keep
13:04:05 <hiro> oook my updates are in the pad, but my priority is fixing the data update on meronense...
13:04:37 <hiro> I am really not sure why suddenly the system has decided to kill the process
13:05:35 <GeKo> can we figure out what changed around the time when this issue started?
13:05:46 <GeKo> there was no trixie update yet, right?
13:05:57 <hiro> no just usual sys update
13:06:12 <GeKo> what got updated during that?
13:06:16 <GeKo> just kernel?
13:06:24 <hiro> I haven't checked that yet
13:06:32 <anarcat> meronense?
13:06:35 <GeKo> might be worth it
13:06:41 <GeKo> yeah
13:06:47 <anarcat> still bookworm
13:06:54 <GeKo> thanks
13:06:56 <hiro> one thing I saw was that there was a reboot and there was a temp file from the onion stats that wasn't allowing the modules to finish
13:07:00 <anarcat> batch 2, in progress: https://gitlab.torproject.org/tpo/tpa/team/-/issues/42070
13:07:03 <anarcat> marked as sensitive
13:07:22 <hiro> hence I deleted that and then it started not being able to finish anymore in the usual time frame
13:07:29 <hiro> and then the sys would just kill it
13:07:51 <hiro> anarcat (IRC): I have asked zen-fu not to update the machine till it is able to finish a run successfully
13:07:51 <GeKo> i guess the box is already on edge due to the anticipated switch to trixie and misbehaving because of that :)
13:08:01 <anarcat> hiro: right
13:08:12 <anarcat> yeah, it's scared like going to the dentist
13:08:17 <GeKo> exactly
13:08:19 <anarcat> it's going to be better after, i promise meronense
13:08:31 <GeKo> hard to convince, that one
13:08:33 * anarcat hiding the needle behind his huge smile
13:08:42 <GeKo> it knows you
13:08:53 <anarcat> we know each other very well, don't we meronense
13:09:05 <anarcat> now this is getting creepy, sorry
13:09:08 <hiro> well maybe I could just dump the data it has imported and we could update it anyways
13:09:11 <hiro> I have a few dumps already
13:09:39 <GeKo> hiro: can we split the update to be done into smaller pieces
13:09:46 <GeKo> so it does not choke on the biiiiig one?
13:09:57 <hiro> I tried
13:10:27 <GeKo> what's happening?
13:10:36 <hiro> but wasn't successful... for some reason... those sql tables aren't easy to debug
13:10:50 <hiro> I tried to process the data day by day
13:11:04 <GeKo> i see
13:11:06 <hiro> but still was killed ... then I got the generated ids out of sync and had to fix that too
13:11:16 <GeKo> *sigh*
13:11:49 <GeKo> what is killing those day by day processing tasks?
13:11:49 <hiro> I tried to give more memory to the db and less to the app thinking that could help but it didn't so not sure... let's hope this final tweak works
13:12:07 <GeKo> oom?
13:12:11 <hiro> yeah
13:12:17 <hiro> maybe there is a memory leak
13:12:48 <GeKo> i wonder how the day-by-day processing is working usually without hitting the oom then?
13:12:49 <GeKo> huh
13:12:59 <hiro> It was
13:13:15 <hiro> since the 30th of may it hasn't worked give or take
13:13:36 <hiro> but the data is processed all at once
13:13:48 <hiro> there are some partial tables involved
13:14:29 <hiro> there https://gitlab.torproject.org/tpo/network-health/metrics/website/-/blob/master/src/main/sql/clients/init-userstats.sql?ref_type=heads#L191
13:14:41 <GeKo> hrm
13:14:45 <hiro> I broke it down a few times already
13:14:52 <hiro> if you remember
13:15:11 <GeKo> hazily :)
13:15:39 <hiro> yeah anyways
13:16:16 <hiro> does anyone have anything else?
13:16:21 <GeKo> so, we have some stats on the website back for some days, right?
13:16:40 <GeKo> did some runs succeed then?
13:16:48 <hiro> yeah that was when the ids got out of sync and the other modules managed to finish
13:17:01 <GeKo> okay
13:17:07 <hiro> because the clients module finished on its own with a sql exception
13:17:14 <hiro> yeah fun stuff
13:17:41 <GeKo> let us know whether and how we can help <3
13:18:34 <hiro> thank you!
13:19:29 <hiro> @juga thanks for reviewing that tor_fusion patch
13:19:50 <juga> hiro: yw
13:21:00 <hiro> ok does anyone have some updates to share?
13:21:14 <juga> i think i've caught up with mrs after being afk. More this week on chunk stuff, erpc and/or collector
13:21:27 <juga> nothing else from my side
13:22:10 <GeKo> nothing from my side
13:22:14 <GeKo> worth mentioning
13:22:22 <GeKo> i'll be out on thu, likely
13:22:26 <sarthikg[mds]> not much from my side. continuing on the aggregator. i think once i have the effective family working, i can work on the adapters for server_status. i hope there's no more blockers, and it gets done this week
13:22:26 <hiro> @juga let me know if you want to sync about anything during this week, we have our 1-1 on thursday but maybe you need it earlier, whatever works
13:22:30 <hiro> thanks sarthikg
13:22:34 <GeKo> but apart from that working on p183 and bad-relay stuff
13:22:47 <hiro> no worries GeKo (IRC)
13:22:50 <juga> hiro: yes, thanks, i think we can wait to Thursday
13:23:17 <hiro> ok all groot then
13:23:22 <hiro> if everyone is ok we can continue async and end the meeting
13:23:36 <GeKo> wfm
13:23:41 <juga> +1
13:23:44 <sarthikg[mds]> i'm good
13:23:47 <hiro> #endmeeting