13:01:41 <hiro> #startmeeting network-health 2025-06-16
13:01:41 <MeetBot> Meeting started Mon Jun 16 13:01:41 2025 UTC. The chair is hiro. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:41 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
13:01:56 <hiro> https://pad.riseup.net/p/tor-nethealthteam-2025-keep
13:04:05 <hiro> oook my updates are in the pad, but my priority is fixing the data update on meronense...
13:04:37 <hiro> I am really not sure why suddenly the system has decided to kill the process
13:05:35 <GeKo> can we figure out what changed around the time when this issue started?
13:05:46 <GeKo> there was no trixie update yet, right?
13:05:57 <hiro> no just usual sys update
13:06:12 <GeKo> what got updated during that?
13:06:16 <GeKo> just kernel?
13:06:24 <hiro> I haven't checked that yet
13:06:32 <anarcat> meronense?
13:06:35 <GeKo> might be worth it
13:06:41 <GeKo> yeah
13:06:47 <anarcat> still bookworm
13:06:54 <GeKo> thanks
13:06:56 <hiro> one thing I saw was that there was a reboot and there was a temp file from the onion stats that wasn't allowing the modules to finish
13:07:00 <anarcat> batch 2, in progress: https://gitlab.torproject.org/tpo/tpa/team/-/issues/42070
13:07:03 <anarcat> marked as sensitive
13:07:22 <hiro> hence I deleted that and then it started not being able to finish anymore in the usual time frame
13:07:29 <hiro> and then the sys would just kill it
13:07:51 <hiro> anarcat (IRC): I have asked zen-fu not to update the machine till it is able to finish a run successfully
13:07:51 <GeKo> i guess the box is already on edge due to the anticipated switch to trixie and misbehaving because of that :)
13:08:01 <anarcat> hiro: right
13:08:12 <anarcat> yeah, it's scared like going to the dentist
13:08:17 <GeKo> exactly
13:08:19 <anarcat> it's going to be better after, i promise meronense
13:08:31 <GeKo> hard to convince, that one
13:08:33 * anarcat hiding the needle behind his huge smile
13:08:42 <GeKo> it knows you
13:08:53 <anarcat> we know each other very well, don't we meronense
13:09:05 <anarcat> now this is getting creepy, sorry
13:09:08 <hiro> well maybe I could just dump the data it has imported and we could update it anyways
13:09:11 <hiro> I have a few dumps already
13:09:39 <GeKo> hiro: can we split the update to be done into smaller pieces
13:09:46 <GeKo> so it does not choke on the biiiiig one?
13:09:57 <hiro> I tried
13:10:27 <GeKo> what's happening?
13:10:36 <hiro> but wasn't successful... for some reasons... those sql tables aren't easy to debug
13:10:50 <hiro> I tried to process the data day by day
13:11:04 <GeKo> i see
13:11:06 <hiro> but still was killed ... then I got the generated ids out of sync and had to fix that too
13:11:16 <GeKo> *sigh*
13:11:49 <GeKo> what is killing those day by day processing tasks?
13:11:49 <hiro> I tried to give more memory to the db and less to the app thinking that could help but it didn't so not sure... let's hope this final tweak works
13:12:07 <GeKo> oom?
13:12:11 <hiro> yeah
13:12:17 <hiro> maybe there is a memory leak
13:12:48 <GeKo> i wonder how the day-by-day processing is working usually without hitting the oom then?
13:12:49 <GeKo> huh
13:12:59 <hiro> It was
13:13:15 <hiro> since the 30th of may it hasn't worked give or take
13:13:36 <hiro> but the data is processed all at once
13:13:48 <hiro> there are some partial tables involved
13:14:29 <hiro> there https://gitlab.torproject.org/tpo/network-health/metrics/website/-/blob/master/src/main/sql/clients/init-userstats.sql?ref_type=heads#L191
13:14:41 <GeKo> hrm
13:14:45 <hiro> I broke it down a few times already
13:14:52 <hiro> if you remember
13:15:11 <GeKo> hazily :)
13:15:39 <hiro> yeah anyways
13:16:16 <hiro> does anyone have anything else?
13:16:21 <GeKo> so, we have some stats on the website back for some days, right?
13:16:40 <GeKo> did some runs succeed then?
13:16:48 <hiro> yeah that was when the ids got out of sync and the other modules managed to finish
13:17:01 <GeKo> okay
13:17:07 <hiro> because the clients module finished on its own with a sql exception
13:17:14 <hiro> yeah fun stuff
13:17:41 <GeKo> let us know whether and how we can help <3
13:18:34 <hiro> thank you!
13:19:29 <hiro> @juga thanks for reviewing that tor_fusion patch
13:19:50 <juga> hiro: yw
13:21:00 <hiro> ok does anyone have some updates to share?
13:21:14 <juga> i think i've caught up with mrs after being afk. More this week on chunk stuff, erpc and/or collector
13:21:27 <juga> nothing else from my side
13:22:10 <GeKo> nothing from my side
13:22:14 <GeKo> worth mentioning
13:22:22 <GeKo> i'll be out on thu, likely
13:22:26 <sarthikg[mds]> not much from my side. continuing on the aggregator. i think once i have the effective family working, i can work on the adapters for server_status. i hope there's no more blockers, and it gets done this week
13:22:26 <hiro> @juga let me know if you want to sync about anything during this week, we have our 1-1 on thursday but maybe you need it earlier whatever works
13:22:30 <hiro> thanks sarthikg
13:22:34 <GeKo> but apart from that working on p183 and bad-relay stuff
13:22:47 <hiro> no worries GeKo (IRC)
13:22:50 <juga> hiro: yes, thanks, i think we can wait to Thursday
13:23:17 <hiro> ok all groot then
13:23:22 <hiro> if anyone is ok we can continue async and end the meeting
13:23:36 <GeKo> wfm
13:23:41 <juga> +1
13:23:44 <sarthikg[mds]> i'm good
13:23:47 <hiro> #endmeeting