17:00:11 #startmeeting Network team meeting, 9th august 2021
17:00:11 Meeting started Mon Aug 9 17:00:11 2021 UTC. The chair is ahf. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:11 Useful Commands: #action #agreed #help #info #idea #link #topic.
17:00:13 hello everybody
17:00:21 pad is at https://pad.riseup.net/p/tor-netteam-2021.1-keep
17:00:26 o/
17:00:33 o/
17:00:38 o/
17:00:42 woo! hi!
17:01:10 o/
17:01:25 o/ woh looks like we are many people today, very nice
17:01:32 how are people doing with their boards?
17:01:47 o/
17:01:57 o/
17:02:08 my board is good; arti can be harder than I had expected :)
17:02:24 I need to offload arti#128 to the whole team, I think. I don't have a coherent thing to write there
17:02:39 very nice, i am mostly in s30 land right now
17:02:53 ah, we can lift it over to the tpo/core/team repo if we want to?
17:03:25 ah, yes please
17:03:55 i haven't given this much thought other than the conversations we had last week for the thursday meeting
17:04:24 i will get it moved when i have my admin account around
17:05:23 works for me
17:05:28 release things look unchanged from last week from what i could tell, but dgoulet, nickm and i should probably talk afterwards about TROVE-2021-007
17:05:34 the james bond trove
17:06:24 is that correct? :-)
17:06:37 makes sense to me
17:06:57 yes
17:07:13 don't see anything from other teams
17:07:58 no discussion items or announcements w00tw00t
17:08:04 ok then i think it's s61 time
17:08:07 i thought i added a discussion item
17:08:08 hang on
17:08:10 huh
17:08:22 you did!
17:08:33 it was in the wrong place; I moved it up, sorry
17:08:46 2021-08-09 [nickm] It looks like we never made the tickets for TROVE-2021-00[356] public. Can we safely do so now?
17:09:09 i would say we can, yeah.
i don't remember number 5, but 3 and 6 i think was OK to do with
17:10:31 ok
17:10:34 doing now
17:11:13 ok
17:11:19 next item is: 2021-08-09 [nickm] Plan dates for next releases, and TROVE-2021-007 fix.
17:11:53 it sounds like we need to chat a bit about 2021-007 after, but we are also talking about how dgoulet and i need to get more involved with releases
17:12:05 maybe this is a good opportunity for us to dive into it head first
17:12:14 looks like next ff release date is early september if we want to sync with that
17:12:28 tbb-team: can you pick up a security release earlier than that if we have one?
17:12:49 oh hm
17:12:57 ?
17:13:02 nickm: i will take a look at arti#128
17:13:10 yeah, syncing with them is fine, nothing there
17:13:22 gaba: you already did; we're just moving it into the team issues list
17:13:43 we can discuss it on thursday
17:14:03 dgoulet, ahf: I think it would be reasonable to target August 16 (1 week) for the releases with this fix; what do you think?
17:14:34 plausible!
17:14:41 i think that is OK
17:15:23 and only 3 releases to do, since 047 isn't releasing yet and 044 is EOL
17:15:29 nice!
17:15:42 shall we try to get a new set of fallbacks by that date too?
17:15:53 yes absolutely
17:16:02 dgoulet: ok. can I leave that to you? :)
17:16:05 yes ofc
17:16:08 woot
17:16:16 awesome
17:16:29 then david and i can continue our plan with talking about release process tomorrow i think
17:16:34 cool!
17:16:40 cool; pull me in if you have any questions
17:16:43 i think that was all for discussion items. let's move to s61?
17:16:48 nickm: ya, we will for sure
17:16:58 2 notes. 1: current process is in doc/HACKING/ReleasingTor.md
17:17:07 yep
17:17:13 ahf: yes, let's do that
17:17:14 2: I forget what my second note was
17:17:15 :)
17:17:20 ok, s61 now :)
17:17:21 goto 1;
17:17:23 :-D
17:17:28 mikeperry: you're on
17:17:34 ok
17:17:46 so ppl seem back from vacations; yay
17:18:43 :-)
17:18:44 I updated the Sponsor61 section as best I could.
I think it captures the stuff we went over in the meeting last week, for those who were out
17:18:57 we have some blockers in that the shadow box is busted
17:19:22 and we also need to figure out how to get its output to match metrics.tpo
17:19:38 [i have something for after the s61 section; sorry! it will be short]
17:20:11 I am not sure if the shadow issue needs input from jnewsome to diagnose the log, or if lavamind is still looking into it: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40350
17:20:27 oh, it was the sudden gitlab issue?
17:20:48 yeah it is not accepting jobs. but there was a lot of gitlab runner damage last week with disk space, etc
17:21:04 my impression is that lavamind is still looking into it
17:21:20 * gaba still needs to look at anything from s61 from last week
17:22:43 once we get shadow running again, we need to figure out how to make its baseline match the metrics website onionperf data
17:23:09 I think what needs to be done there is to import a handful of instances of the onionperf models into shadow
17:23:24 so with acute last week we were discussing that the models from onionperf could maybe be used in shadow
17:23:48 to get a representative amount of data points from an hour sim, I think we will need multiple copies of the onionperf models
17:24:04 she was confident some of the code could be reused, but I haven't looked into that yet
17:24:17 nickm: regarding earlier Tor Browser release - yes, we can release earlier if needed
17:24:29 great!
17:24:51 afaik the main difference is just how the data is reported - the shadow postprocessing (tornettools) reports aggregate data over the onionperf instances in the simulation. the raw data is there for the individual instances, but we don't have scripts to graph it
17:25:13 hiro,jnewsome,acute: this probably requires some coordination on how to add the onionperf tgen models to the shadow sim.
which I imagine requires a working gitlab runner to test
17:25:23 wom 2
17:26:04 sporksmith[m]: you mean how the data is graphed in onionperf or in metrics website?
17:26:42 in the shadow sim postprocessing - it shows aggregate data, vs the web site showing individual instances
17:26:54 uhm
17:27:21 so it's a matter of understanding how shadow aggregates the data and having it graph individual instances if requested?
17:27:45 or the other way around? aggregate the onionperf data in metrics website?
17:28:15 I think graphing individual instances of the shadow sim data, though I guess we could do the other way around too
17:28:45 maybe this is too much detail for this meeting. should we try syncing up again this week?
17:29:02 ok sounds good. I think I'll create a ticket to track this in the website
17:29:05 though it's going to be hard to do very much before the runner is working again
17:29:53 jnewsome: can you check with lavamind to see if he needs anything from you to help diagnose the runner failure?
17:30:10 mikeperry: will do
17:30:44 is that it for s61 things?
17:31:00 I have one thing about the overload metrics
17:31:13 https://gitlab.torproject.org/tpo/network-health/metrics/relay-search/-/issues/40005
17:31:59 so we found some performance issues in onionoo, because of which we might not be able to expose all the information about the overload-ratelimits and overload-fd-exhausted lines right away
17:33:03 hrmm, is this due to extra-info handling? the overload-general line is ok?
17:33:15 * hiro < https://matrix.org/_matrix/media/r0/download/matrix.org/wsHmhslGEYcJxPreXWNhmIgy/message.txt >
17:33:39 hm
17:33:39 the overload-general could be ok
17:33:48 unless we want the operators to know what is overloading
17:34:35 not sure i understand, but you see nodes hitting the fd limit from the overload-fd-exhausted entry in the extra-infos?
17:34:43 yes
17:35:23 interesting.
if there is a way to cluster it we might be able to find out if they run tor by hand or use some init system that forgets to bump these limits for tor. could be bugs in distros' init scripts
17:35:33 but because of the way onionoo processes the extrainfos we might not be able to expose all the info on relay-search
17:35:50 ah
17:35:51 I think we only want the operator to know that it is "overloaded"
17:35:54 for overload-general, we should give them an alert that includes instructions on how to get the metricsport details into prometheus for their own diagnosis
17:35:59 and then the operator can go on the MetricsPort to learn why
17:36:10 there, what mikeperry says :) /me shuts up
17:36:21 ok!
17:36:38 so that's useful to know thanks
17:36:50 for the fd-exhausted issue, I imagine geko and arma2 will inspect that while doing reachability tests, etc
17:37:11 but that is just a ulimit change to fix, or it should be
17:37:44 yep
17:37:59 yeah
17:38:03 I guess if getting that data causes perf issues on the metrics server, that is not surprising. we ran into that in early testing
17:38:12 and i agree with dgoulet on just showing that relays are overloaded
17:38:33 with some hint on how to figure out what is going on
17:38:38 looking at the metrics port
17:38:49 I would put it like "red" or something very noticeable!
17:38:52 so we will consume the fd-exhausted information? and we can offer the operators just the boolean flag?
17:39:20 because we can expose an fd-exhausted flag with the bandwidth information
17:39:40 and don't expose that on relay-search
17:39:52 maybe just on the bandwidth graph
17:40:06 we could experiment a bit i guess
17:40:12 sounds good
17:40:22 to figure out what approach is not confusing operators too much
17:40:24 I am not sure what the perf issue is, but the theory is that fd-exhausted should be an easy ulimit fix.
so a boolean is fine there, if that is easier
17:40:59 it's just on the onionoo data models and how the endpoints output the documents it produces
17:41:19 I think that's ok mikeperry
17:42:50 very good
17:42:51 It's all from me
17:42:57 * GeKo does not have anything else for s61
17:43:01 juga: while you were away, ggus experimented with a research prototype for unlisted exits. some of that work might be useful for the sbws pinned exit ticket: https://gitlab.torproject.org/tpo/network-health/sbws/-/issues/40022#note_2746514
17:43:03 * ahf good too
17:43:10 qq from me not on s61: Anybody mind if I take off from 30 Aug through Sep 3?
17:43:23 nickm: please do!
17:43:39 mikeperry: i looked at that, but i think we'll run into the onion service issue we mentioned
17:43:56 juga: that was a way to do it without the onion service
17:44:22 mikeperry: ok, let's talk later, cause i think bridge uses onion service there
17:44:34 nickm: nope, hope you enjoy it :-)
17:44:44 oh interesting. ok
17:46:10 well I think that is it for the s61 part of the meeting then
17:46:44 sweet <3
17:47:04 i am also gonna take some holiday later this month but figuring that out this week. last month while i was away from work it was more doing emotional paperwork /o\
17:47:10 ok, i don't think we have anything else for our meeting today
17:47:12 everybody good?
17:47:25 yes
17:47:34 👍️
17:47:45 ok w me. do we still need to talk about the TROVE after the meeting?
17:47:55 or should we do that when we cover backports and releases tomorrow?
17:48:06 later imo
17:48:19 ok. we'll confer about that tomorrow.
17:48:22 dgoulet: later today or later tomorrow? :-S
17:48:32 lol the second option Nick game :)
17:48:39 former vs later lol
17:48:51 ok, let's chat about it tomorrow?
17:49:00 yes
17:49:12 spelling is latter :)
17:49:18 knew it ...
17:49:18 hence the confusion
17:49:28 ah!
17:49:34 ok, we talk tomorrow then
17:49:44 thanks all for the meeting.
nice to have everybody back
17:49:47 o/
17:49:48 #endmeeting