13:01:41 #startmeeting network-health 2025-06-16
13:01:41 Meeting started Mon Jun 16 13:01:41 2025 UTC. The chair is hiro. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:41 Useful Commands: #action #agreed #help #info #idea #link #topic.
13:01:56 https://pad.riseup.net/p/tor-nethealthteam-2025-keep
13:04:05 oook my updates are in the pad, but my priority is fixing the data update on meronense...
13:04:37 I am really not sure why suddenly the system has decided to kill the process
13:05:35 can we figure out what changed around the time when this issue started?
13:05:46 there was no trixie update yet, right?
13:05:57 no just usual sys update
13:06:12 what got updated during that?
13:06:16 just kernel?
13:06:24 I haven't checked that yet
13:06:32 meronense?
13:06:35 might be worth it
13:06:41 yeah
13:06:47 still bookworm
13:06:54 thanks
13:06:56 one thing I saw was that there was a reboot and there was a temp file from the onion stats that wasn't allowing the modules to finish
13:07:00 batch 2, in progress: https://gitlab.torproject.org/tpo/tpa/team/-/issues/42070
13:07:03 marked as sensitive
13:07:22 hence I deleted that and then it started not being able to finish anymore in the usual time frame
13:07:29 and then the sys would just kill it
13:07:51 anarcat (IRC): I have asked zen-fu not to update the machine till it is able to finish a run successfully
13:07:51 i guess the box is already on edge due to the anticipated switch to trixie and misbehaving because of that :)
13:08:01 hiro: right
13:08:12 yeah, it's scared like going to the dentist
13:08:17 exactly
13:08:19 it's going to be better after, i promise meronense
13:08:31 hard to convince, that one
13:08:33 * anarcat hiding the needle behind his huge smile
13:08:42 it knows you
13:08:53 we know each other very well, don't we meronense
13:09:05 now this is getting creepy, sorry
13:09:08 well maybe I could just dump the data it has imported and we could update it anyways
13:09:11 I have a few dumps already
13:09:39 hiro: can we split the update to be done into smaller pieces
13:09:46 so it does not choke on the biiiiig one?
13:09:57 I tried
13:10:27 what's happening?
13:10:36 but wasn't successful... for some reasons... those sql tables aren't easy to debug
13:10:50 I tried to process the data day by day
13:11:04 i see
13:11:06 but still was killed ... then I got the generated ids out of sync and had to fix that too
13:11:16 *sigh*
13:11:49 what is killing those day by day processing tasks?
13:11:49 I tried to give more memory to the db and less to the app thinking that could help but it didn't so not sure... let's hope this final tweak works
13:12:07 oom?
13:12:11 yeah
13:12:17 maybe there is a memory leak
13:12:48 i wonder how the day-by-day processing is working usually without hitting the oom then?
13:12:49 huh
13:12:59 It was
13:13:15 since the 30th of may it hasn't worked give or take
13:13:36 but the data is processed all at once
13:13:48 there are some partial tables involved
13:14:29 there https://gitlab.torproject.org/tpo/network-health/metrics/website/-/blob/master/src/main/sql/clients/init-userstats.sql?ref_type=heads#L191
13:14:41 hrm
13:14:45 I broke it down a few times already
13:14:52 if you remember
13:15:11 hazily :)
13:15:39 yeah anyways
13:16:16 does anyone have anything else?
13:16:21 so, we have some stats on the website back for some days, right?
13:16:40 did some runs succeed then?
13:16:48 yeah that was when the ids got out of sync and the other modules managed to finish
13:17:01 okay
13:17:07 because the clients module finished on its own with a sql exception
13:17:14 yeah fun stuff
13:17:41 let us know whether and how we can help <3
13:18:34 thank you!
13:19:29 @juga thanks for reviewing that tor_fusion patch
13:19:50 hiro: yw
13:21:00 ok does anyone have some updates to share?
13:21:14 i think i've caught up with mrs after being afk. More this week on chunk stuff, erpc and/or collector
13:21:27 nothing else from my side
13:22:10 nothing from my side
13:22:14 worth mentioning
13:22:22 i'll be out on thu, likely
13:22:26 not much from my side. continuing on the aggregator. i think once i have the effective family working, i can work on the adapters for server_status. i hope there's no more blockers, and it gets done this week
13:22:26 @juga let me know if you want to sync about anything during this week, we have our 1-1 on thursday but maybe you need it earlier whatever works
13:22:30 thanks sarthikg
13:22:34 but apart from that working on p183 and bad-relay stuff
13:22:47 no worries GeKo (IRC)
13:22:50 hiro: yes, thanks, i think we can wait to Thursday
13:23:17 ok all groot then
13:23:22 if anyone is ok we can continue async and end the meeting
13:23:36 wfm
13:23:41 +1
13:23:44 i'm good
13:23:47 #endmeeting