13:01:34 #startmeeting network-health 2026-03-09
13:01:34 Meeting started Mon Mar 9 13:01:34 2026 UTC. The chair is hiro. Information about MeetBot at https://wiki.debian.org/MeetBot.
13:01:34 Useful Commands: #action #agreed #help #info #idea #link #topic.
13:02:01 and the pad
13:02:02 #link https://pad.riseup.net/p/tor-nethealthteam-2025-keep
13:02:36 It is https://pad.riseup.net/p/tor-nethealthteam-2026-keep
13:02:36 umm, this one? https://pad.riseup.net/p/tor-nethealthteam-2026-keep
13:02:53 oh yeah!
13:02:55 sorry
13:03:03 #link https://pad.riseup.net/p/tor-nethealthteam-2026-keep
13:04:26 ok who wants to go?
13:04:39 My updates:
13:04:39 Last week i worked on some exitmap, sbws and services issues, reviewed others and continued to work on tor_anomalies.
13:04:42 This week will continue with tor_anomalies and collector.
13:04:53 I have been fighting OOM errors lol
13:05:42 hiro: those are hard...
13:06:08 yeah can't wait for collector-rs \o/
13:06:19 >_>
13:07:41 i guess we discuss p183 issues in our monthly sync later, right?
13:07:53 apart from that i had a week of reviews
13:07:58 and i survived
13:07:59 My update: Added the clickhouse backend to sea-query. Currently extracting it as a separate crate, since maintaining a fork of sea-query is harder in the long term than maintaining this crate. In parallel, migrating the queries to use the new crate. Also in discussion about adding support for happy families in aggregator.
13:08:15 I think so GeKo (IRC)
13:10:12 hiro: for the grafana dashboards, i think i found some issues
13:10:16 alright! so if there isn't anything else we can talk at our voice sync in 1h
13:10:30 i'll stop at that and see how fixing those goes
13:10:32 GeKo (IRC): which one?
13:10:43 both dashboards
13:11:05 i filed issues in datastore
13:11:14 the bw one needs some work for sure... and I have to review the clients/onions counts again but the oom issues have blocked me there
13:11:23 https://gitlab.torproject.org/tpo/network-health/metrics/datastore/-/issues/14 is the one for the bw
13:11:36 there are some issues with the bw_line_stream at least
13:12:02 well, i am not concerned with the layout of the panes etc. but with the data displayed/used
13:12:04 maybe I am filtering too aggressively
13:12:24 and unrelated but while we are here we should talk about the happy family part
13:12:46 we need to display that somehow on relay-search
13:13:08 so, just having family-cert and -id in onionoo doesn't cut it, i think
13:13:21 although it's a necessary step
13:13:25 @geko the MR for onionoo is almost there and I left you a comment there
13:13:35 I think I am able to keep both systems
13:13:48 I just hope onionoo doesn't choke in the process
13:13:57 yeah, the question is relevant for aggregator-rs as well
13:14:14 as we want to have something related to happy families in the statuses as well
13:14:23 so, it's not onionoo specific
13:14:38 well for aggregator it is easier because the result is just a query away
13:14:49 i was wondering whether we could just reuse effective_family here
13:15:03 yes
13:15:06 if there is no happy family configured use the old mechanism
13:15:17 otherwise populate it with the new one
13:15:24 so effective family should be everything that is verified via the family_ids
13:15:35 for declared_family it's a bit trickier
13:15:45 as we have all the fingerprints in the relay descs
13:15:53 but for happy family it's just the cert
13:16:08 yes
13:16:22 this is what I have done in onionoo
13:16:44 so, maybe we just leave declared_family empty in case happy families is configured
13:17:18 why?
13:17:34 because the relay only declares a certificate
13:17:45 well it saves strings in the db
13:17:45 and no family in the server desc anymore
13:17:54 but for onionoo as well?
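The fallback rule discussed above (use the happy-family data when a relay has family IDs, otherwise the legacy `family` line) could look roughly like the following Python sketch. It is hypothetical and not the code in the onionoo MR or aggregator-rs. The `Relay` record, its field names, and the assumption that two happy-family relays are related when they share at least one family ID are illustrative choices.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Relay:
    """Hypothetical, simplified relay record (not Onionoo's or aggregator-rs's schema)."""

    fingerprint: str
    # Fingerprints listed in the legacy `family` line of the server descriptor.
    declared_family: set[str] = field(default_factory=set)
    # Family IDs for this relay as published by the dir-auths (happy families).
    family_ids: set[str] = field(default_factory=set)


def effective_family(relay: Relay, relays: dict[str, Relay]) -> set[str]:
    """Return the fingerprints considered to be in the same family as `relay`.

    Assumed rule: if the relay has any family IDs, use only the happy-family
    data; otherwise fall back to the legacy mutual-declaration check.
    """
    if relay.family_ids:
        # Happy families: every other relay sharing at least one family ID.
        return {
            other.fingerprint
            for other in relays.values()
            if other.fingerprint != relay.fingerprint
            and relay.family_ids & other.family_ids
        }
    # Legacy families: both relays must list each other in `family`.
    return {
        fp
        for fp in relay.declared_family
        if fp != relay.fingerprint
        and fp in relays
        and relay.fingerprint in relays[fp].declared_family
    }
```

Under this rule, a relay whose legacy `family` line disagrees with its family IDs is judged only by the IDs. That matches the suggestion to go with the happy families config when the two systems contradict each other.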
13:18:00 i don't follow
13:18:33 declared_family in the old system contains all the fingerprints in the relay descriptors
13:18:52 but there is no such thing with happy families
13:18:56 well if you want to visualize the effective family the same way, why don't you want the alleged family to match?
13:18:58 sorry declared
13:19:14 ah because you are thinking of the field in the descriptor ok
13:19:19 yes
13:19:35 ok makes sense then
13:19:44 they might not even match because folks made a mistake with the family configuration
13:20:03 sarthikg[mds]: ^
13:20:08 yeah that's why I was doing the check
13:20:13 does that make sense?
13:20:14 to see if it was matching
13:20:31 I thought that was relevant information to have
13:20:39 i am not sure what we should do in case the old and new systems contradict each other
13:21:01 maybe we just pick the happy families config in that case and run with that
13:21:38 GeKo: yeah, for the purpose of happy families, we will only be having effective_family based on the certs/id matching? but should we also introduce a new field to mark that the relay uses happy_families instead of the legacy families?
13:22:06 i don't think we'd need that
13:22:19 hiro: how do you take several family-certs into account?
13:22:43 because that's possible now while we only had one `family` in the server desc
13:22:52 do you merge them somehow?
13:23:49 sarthikg[mds]: but, yeah, i think we can leave declared_family empty and focus only on the effective_family for now
13:24:15 one thing i was thinking about was to use declared_family in the happy family case to somehow display all the family-certs
13:24:34 because several are possible now, meaning a relay can now belong to several families
13:24:36 I was treating the cert like declared fps
13:24:40 and then using the ids to verify... but not sure what we should do when we have two ids for example...
13:25:05 that's why I thought in the beginning that the effective_family is not something we should maintain
13:25:06 okay, well, we can have several certs to begin with
13:25:12 just list the certs and the ids
13:25:50 we don't need to maintain that but just populate that from the microdescs
13:26:00 we get that for free from the dir-auths
13:26:10 I mean I think we should keep the two systems separated
13:26:31 yes and just publish the family_ids
13:26:47 and then one could query all the relays with the same family_id(s)
13:27:41 how would we populate the effective family members on relay search?
13:28:25 from the old system
13:28:41 and for the new system you would be able to visualize the ids the relay belongs to
13:28:42 but there are relays with the new system only
13:29:07 yes and you'd get the ids for those
13:30:00 and when you click on an id, you'd get a list of relays
13:30:20 hrm
13:30:38 I think happy families might open more possibilities for relays belonging to different families and that might have to be visualized differently than a list imo
13:30:47 i believe happy_families requires having a family as its own entity, since a relay can be part of multiple families, and not all families have the same set of relays. I'm not sure how it will work with the current design, but that's just what i think for now...
13:31:47 okay, so in the old system we keep the fingerprint list in effective_family
13:31:49 but I am happy to add all the relays sharing the same family ids to the effective family if that's what one would expect.
13:32:07 and in the new one we only add the family ids?
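The "click on an id, get a list of relays" and "query all the relays with the same family_id(s)" ideas above amount to inverting the relay-to-ID mapping. Here is a minimal Python sketch. It assumes the family IDs per relay have already been extracted. The `family_ids_by_relay` input, the function names, and the placeholder IDs are hypothetical and do not come from onionoo or relay-search.

```python
from __future__ import annotations

from collections import defaultdict


def build_family_index(family_ids_by_relay: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert relay fingerprint -> family IDs into family ID -> relay fingerprints."""
    index: dict[str, set[str]] = defaultdict(set)
    for fingerprint, family_ids in family_ids_by_relay.items():
        for family_id in family_ids:
            index[family_id].add(fingerprint)
    # Return a plain dict so lookups of unknown IDs don't insert empty entries.
    return dict(index)


def relays_with_family_id(index: dict[str, set[str]], family_id: str) -> list[str]:
    """Fingerprints listing `family_id`, sorted for stable display; [] if unknown."""
    return sorted(index.get(family_id, set()))
```

A relay with two family IDs simply appears under both keys, so several IDs per relay need no special handling at this level.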
13:32:21 yes that's what I thought in the beginning
13:33:02 we are fingerprint based, though, so i expect no one can make use of the family ids
13:33:08 at least not on relay-search
13:33:23 and i'd expect everyone clicking on the ids to get the fingerprints/other relays
13:33:25 but one could search by family_id
13:33:26 or family_cert
13:33:42 yes that should be the idea, that you get a list
13:33:46 of fps sharing that id
13:33:57 yeah, searching by id makes sense
13:34:36 but i think we should populate the Effective Family Members part with the actual fingerprints directly
13:35:06 the looking glass next to it could contain the link to the family id(s) or something
13:35:30 where we now have https://metrics.torproject.org/rs.html#search/family:E4F9B844BB53B27EC4394C34A75CFBCC06E5F266
13:36:09 but things get tricky quickly with several ids
13:36:58 this is what onionoo is doing atm in the MR
13:37:14 great
13:37:21 but the relationship now is more graph-like imo and the list might be deceiving
13:38:17 we can test whether we get the right thing with: https://metrics.torproject.org/rs.html#search/contact:applied
13:38:46 GeKo: how about we rename effective_family & declared_family to legacy_effective_family & legacy_declared_family? and introduce a family_ids key which contains all the listed ids. happy families will only use family_ids? and the family_id will be searchable to get all relays that list that family_id on the website?
13:38:56 they only have happy families configured and i thought all their fps should show up in the effective family section
13:39:17 quetzacoal I think is on both systems for example
13:39:17 and there should be a (104) again after the nicknames
13:39:31 which is another candidate
13:39:50 sarthikg[mds]: we could do that
13:40:29 it might be cleaner that way
13:40:51 i was a bit unhappy with legacy_* stuff
13:41:00 if we use the family_ids key we do not need to rename the old system? it'll still be valid to analyse old data
13:41:03 but maybe we need to bite that bullet for better clarity
13:41:53 hiro: we can just do the renaming in the website maybe, not in the db
13:42:06 yeah, i was more thinking about having effective_family_new and declared_family_new or something
13:42:23 so, something in parallel
13:42:52 because in the end family_cert and family_id are something in parallel as well
13:43:21 (compared to "family")
13:44:06 I am not sure about having all these family fields in onionoo apis
13:44:26 yeah
13:44:28 as far as the DB is concerned we can add as many fields as we want
13:45:02 GeKo: but we won't be able to represent the graph structure well enough with the lists imo... or at all tbh.
13:45:08 that's why i was thinking we might try to bolt the new system on top of what we have with onionoo as well as we can
13:45:23 sarthikg[mds]: true
13:45:30 yes that's what is happening
13:45:59 so, maybe we try to stick to what we have with onionoo as much as possible
13:46:08 anyways I have to change location before the 183 meeting. let'
13:46:10 while not following those constraints with the db
13:46:24 and do the right thing there instead
13:46:32 hiro: kk
13:47:21 let me end the meeting. I'll be 5 minutes late for 183 I think
13:47:29 #endmeeting
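Here is a concrete illustration of the "graph-like" concern raised in the meeting. This Python sketch assumes two relays count as family when they share at least one family ID. That rule, the function names, and the IDs `X`/`Y` are assumptions for illustration. Under this rule the relation is not transitive. A and C each share an ID with B but not with each other. So the direct list for A is {B}, while everything reachable through shared IDs is {B, C}, and a single flat `effective_family` list cannot express both views.

```python
from __future__ import annotations

from collections import deque


def direct_family(fp: str, family_ids_by_relay: dict[str, set[str]]) -> set[str]:
    """Relays sharing at least one family ID with `fp` (assumed membership rule)."""
    mine = family_ids_by_relay[fp]
    return {
        other
        for other, ids in family_ids_by_relay.items()
        if other != fp and mine & ids
    }


def reachable_relays(fp: str, family_ids_by_relay: dict[str, set[str]]) -> set[str]:
    """All relays reachable from `fp` by repeatedly following shared IDs (BFS)."""
    seen = {fp}
    queue = deque([fp])
    while queue:
        current = queue.popleft()
        for neighbour in direct_family(current, family_ids_by_relay):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(fp)
    return seen


ids = {"A": {"X"}, "B": {"X", "Y"}, "C": {"Y"}}
print(sorted(direct_family("A", ids)))     # ['B']
print(sorted(reachable_relays("A", ids)))  # ['B', 'C']
```

Keeping families as their own entities in the DB (family ID -> members) would let either view be derived later. The onionoo output could then stay close to the existing fields.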