16:58:54 #startmeeting Network team meeting, 31st May 2022
16:58:54 Meeting started Tue May 31 16:58:54 2022 UTC. The chair is ahf. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:58:54 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:58:58 yoyo
16:59:07 o/
16:59:11 welcome to the last meeting in may, which means on monday we have the s61 meeting on its own
16:59:17 our pad is at https://pad.riseup.net/p/tor-netteam-2022.1-keep
16:59:26 o/
16:59:36 o/
16:59:46 o/
16:59:57 hi all!
17:00:13 o/
17:00:37 * ahf is completely off today since it's tuesday
17:00:58 how are folks doing with their boards? Board: https://gitlab.torproject.org/groups/tpo/core/-/boards
17:01:17 stable
17:01:50 LOL, that is a good way to look at it
17:01:54 exponential growth this week
17:02:35 okay with me
17:02:46 we are entering the last month of Q2 now, so this week is a good week to look a bit at whether the Q2 situation looks good for yourself
17:02:48 hoping that i can find more stuff to work on actually :)
17:02:59 nice
17:03:16 good problem to have! :)
17:03:43 nickm: at some point i would really love to have a sync with you, dgoulet, and gaba about tpo/core/tor and see how many tickets we can close and/or turn into arti/torspec-specific tickets
17:03:53 we have ~850 tickets in tpo/core/tor right now
17:04:09 +1
17:04:47 ok, don't see anything off with the board
17:04:53 dgoulet: anything on tor releases this week?
17:05:08 nope
17:05:09 i think you are waiting for some merges from me if we want to roll out any updates there?
17:05:12 cool!
17:05:24 yeah, no hurry, but for sure a 045 and 046 are coming
17:05:29 ya
17:05:31 excellent
17:05:52 we don't have anything incoming
17:06:30 we have a discussion item:
17:06:32 [2022-05-31] [nickm] I've been asked when we plan to build the improved MyFamily. Thoughts?
17:07:03 i do not have any specific thoughts there other than i remember you had a design for it
17:07:31 It is in exactly the wrong spot wrt fundable-ness: it is too small to be its own proposal, but too big to just do in a day.
17:07:49 prop321 ?
17:07:54 I guess it would take around two weeks to do it in C and arti.
17:08:04 dgoulet: yes
17:08:31 the benefit is that (eventually) it saves a lot of bandwidth, and that it enables bridges to meaningfully belong to families.
17:08:41 It is a prerequisite for walking onions
17:09:16 would you be interested in trying to fit it in, given you are looking for some tasks right now?
17:09:55 * juga joined
17:10:16 o/ juga
17:10:26 Hm. I'll think about it, but I think it would be good to work ahead on arti stuff instead.
17:10:43 If I wind up way ahead of schedule on Q2 arti stuff then I'll consider?
17:10:56 nickm: oki, i have no rush on this at all. wouldn't it be something we could batch up with the walking onion grant proposal(s)?
17:10:58 ya
17:10:59 sounds good
17:11:30 i think mikeperry can do s61 next then?
17:11:40 ok
17:12:21 so hiro and I took a look at the onionperf instances with congestion control: https://gitlab.torproject.org/tpo/network-health/analysis/-/issues/37
17:12:56 it looks like they are only 1.5-2X faster than non-congestion control, which is less than shadow predicted. (it predicted 3-4X)
17:13:33 I am not sure if this is because everyone has not upgraded yet, or other reasons
17:14:01 it is def odd that consumed bandwidth has not risen, despite the TBB upgrade: https://metrics.torproject.org/bandwidth-flags.html?start=2022-05-20&end=2022-05-31
17:14:53 we can make the congestion control params more aggressive.. that is one option
17:15:21 we could also see what shadow says if we set our exact exit upgrade level, but hold back all non-perfclients from upgrading
17:15:52 the consumed bw data on metrics goes until (and including) 05/28, hrm
17:16:17 it is a bit of a head scratcher
17:16:56 since the TB upgrade is being distributed now, does it make sense to wait a little bit and see if this corrects itself (the consumed bw gets a larger %)?
17:17:24 even if tor browser usage were not the main usage of tor, i think it should still be visible on the consumed bw graph
17:17:31 ok
17:17:50 so, yeah, maybe waiting a bit more?
17:18:17 otherwise we could maybe look at the data manually and figure out whether we have a bug in the consumed bw graph?
17:18:31 the sim idea is interesting; would give us a bound for what "low uptake" looks like in shadow
17:18:56 jnewsome: is it possible with shadow to put all markovclients on 0.4.6, but all exits and perfclients on 0.4.7? that might be a similar situation to what we have now
17:19:13 juga: can you run the exit consensus upgrade check script real quick?
17:19:28 0.84
17:19:32 mikeperry: I don't remember if that option exists in the pipeline now, but if not it'd be easy to add
17:19:44 juga: thanks
17:19:49 np :)
17:20:44 do we know that exits are mostly on 0.4.7? I guess we have that from the relay descriptors?
17:21:09 jnewsome: that's the number juga just gave. we're at 84% on 0.4.7 by consensus weight
17:21:22 ah cool
17:21:56 so we could input that fraction for Exits on 0.4.7 in shadow, and just hold back all markov clients, and see what that looks like
17:22:02 mikeperry: not sure how important it is, but i don't see longclaw approximating gabelmoo at https://metrics.torproject.org/totalcw.html and it's already using CC exits (since the bwscanner_cc param was changed)
17:22:02 does that mean we'd expect e.g. 84% of the onionperf measurements to be through an upgraded exit?
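[editor's note] juga's "exit consensus upgrade check script" is not shown in the log; the discussion that follows (weighting by consensus weight, filtering exits that allow port 443 and lack the BadExit flag) implies roughly this computation. A minimal sketch with made-up data and hypothetical names:

```python
# Hypothetical sketch of an exit-upgrade-fraction check: weight each exit by
# its consensus weight (the "w Bandwidth=N" line) and compute the fraction of
# that weight held by relays on 0.4.7.x. All data below is invented.
from dataclasses import dataclass

@dataclass
class Exit:
    version: str       # e.g. "0.4.7.7", from the consensus "v Tor ..." line
    weight: int        # consensus weight from the "w Bandwidth=N" line
    bad_exit: bool     # whether the BadExit flag is set
    allows_443: bool   # whether the exit policy summary permits port 443

def upgrade_fraction(exits, prefix="0.4.7"):
    usable = [e for e in exits if e.allows_443 and not e.bad_exit]
    total = sum(e.weight for e in usable)
    upgraded = sum(e.weight for e in usable if e.version.startswith(prefix))
    return upgraded / total if total else 0.0

exits = [
    Exit("0.4.7.7", 8000, False, True),
    Exit("0.4.6.10", 1500, False, True),
    Exit("0.4.7.7", 500, True, True),    # BadExit: excluded
    Exit("0.4.5.9", 500, False, False),  # no exit to 443: excluded
]
print(round(upgrade_fraction(exits), 2))  # prints 0.84 for this made-up data
```

Note the number depends on which filter is applied: juga later clarifies the sbws code keys on exits with `2` in `flowctrl`, not the tor version, which is one candidate explanation for the 0.84 vs 63.46% discrepancy discussed below.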
17:22:12 hhmmm I have 63.46%
17:22:13 3946: 0.4.7 [63.46 %] (MAJOR)
17:22:19 (that is weighted ^)
17:22:34 (we can confirm after the meeting)
17:22:47 dgoulet: i filter the ones that allow exiting to 443 and don't have the BadExit flag
17:23:09 hm, interesting, running a sim like that
17:23:12 you filter out those that allow 443 ?
17:23:15 there is also descriptor weight vs the sbws "w Bandwidth=N Measured=1" line
17:23:45 dgoulet: yes, and i take consensus weight
17:24:08 can we directly check the onionperf logs to see which/how many of the measurements went through an upgraded exit?
17:25:21 it is some work. we'd need onionperf to listen to CIRC_BW events, and cross-reference those with CIRC events
17:25:41 ok, yeah maybe not worth it yet then
17:26:16 though if we have it start listening to CIRC_BW now, the data will be there if we decide it's worth checking
17:26:39 but yeah that could be a factor. it did not look similar to the 10%, 25%, or 50% upgrade runs in terms of CDFs
17:26:51 it looked like a larger fraction than that
17:27:44 juga: longclaw and gabelmoo are both on 0.4.7.7 and sbws 1.5.2 now?
17:27:52 juga: strange that I don't get the same :S ... would be curious to see your script so we can fix the health team helper scripts
17:28:16 mikeperry: i think gabelmoo is not using sbws 1.5.2 yet, i can check
17:28:39 (and i don't think it's on 0.4.7.7 either)
17:28:56 dgoulet: yes, i'm looking at exits with 2 in flowctrl, not the tor version, can pass you the link to the code in some secs
17:28:59 i'd assume sebastian would have notified the dir-auth thread otherwise
17:30:25 dgoulet: you would need to dig into the other functions too :/ https://gitlab.torproject.org/tpo/network-health/sbws/-/blob/m15/sbws/core/flowctrl2.py#L183
17:30:39 awesome, thanks
17:31:26 so for sbws, yeah let's get those two upgraded. is bastet still yoloing?
17:31:43 faravahar you mean?
17:32:00 I thought bastet was on 0.4.7 but not sbws 1.5.x
17:32:08 ah!
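[editor's note] the cross-referencing mikeperry describes (CIRC events say which exit a circuit used; CIRC_BW events say how much it carried) could be sketched like this. The event lines below are simplified stand-ins, not exact control-port syntax, and the parser is illustrative rather than onionperf code:

```python
# Map per-circuit bandwidth to exit relays by joining two event streams:
# CIRC ... BUILT gives the path (last hop is the exit), CIRC_BW gives byte
# counts for a circuit id. Event format here is simplified for illustration.
def parse_events(lines):
    circ_exit = {}        # circuit id -> exit fingerprint
    bytes_by_exit = {}    # exit fingerprint -> total READ bytes
    for line in lines:
        parts = line.split()
        if parts[0] == "CIRC" and parts[2] == "BUILT":
            path = parts[3].split(",")          # "$FP1~n1,$FP2~n2,$FP3~n3"
            circ_exit[parts[1]] = path[-1].split("~")[0]
        elif parts[0] == "CIRC_BW":
            fields = dict(p.split("=") for p in parts[1:])
            exit_fp = circ_exit.get(fields["ID"])
            if exit_fp:
                bytes_by_exit[exit_fp] = (
                    bytes_by_exit.get(exit_fp, 0) + int(fields["READ"])
                )
    return bytes_by_exit

events = [
    "CIRC 5 BUILT $AAA~guard,$BBB~mid,$CCC~exit",
    "CIRC_BW ID=5 READ=1000 WRITTEN=200",
    "CIRC_BW ID=5 READ=500 WRITTEN=100",
]
print(parse_events(events))  # {'$CCC': 1500}
```

With the exit fingerprints in hand, one could look up each exit's tor version in the relay descriptors to answer "how many measurements went through an upgraded exit".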
17:32:09 i think that's right
17:32:12 yes, i think so
17:32:45 it is interesting that all of them went up around the time that cc_alg=2 was set
17:32:56 maybe they are all yoloing on 0.4.7, despite the ask
17:33:06 i can check
17:33:23 in which case, might as well just get them to yolo onto sbws 1.5.2 too :)
17:33:36 lol
17:34:24 only 2 using 0.4.7.7
17:34:30 (guess longclaw and bastet)
17:34:56 mikeperry: you should be able to do the sim you want now with PL_TORV2_EXIT_BW_UP_FRAC and PL_TORV2_BG_CLIENT_BW_UP_FRAC
17:35:31 very strange.. this could mean that the upgrade to congestion control freed up capacity for sbws to measure even with 0.4.6
17:35:47 that could mean that in some cases, 0.4.6 can in fact out-compete congestion control
17:36:01 which might explain some of this behavior
17:36:27 jnewsome: very nice, live shadow hacks!
17:36:35 ok I can try a sim after the meeting
17:36:38 nice
17:36:43 mikeperry: yes, re. longclaw and bastet, only longclaw is using sbws 1.5.x
17:38:30 GeKo: did the overload we saw last week go down? there was a spike but I think it was onion service noise again
17:38:48 curious about that and any other net-health reports
17:39:05 yeah, as mentioned on another channel, here is an updated graph:
17:39:07 https://share.riseup.net/#0_wdcsiggs-LI9ptk9LeWQ
17:39:17 so the guard overload does indeed seem to be going down
17:39:51 hmmm
17:39:54 there is a spike in exit overloads where about 100 additional ones get added on 05/28
17:39:55 GeKo: ooh but exit overload is increasing as of the tbb update
17:40:02 i don't think so
17:40:03 oh, so unrelated?
17:40:17 i looked at it and that's niftybunny's relays
17:40:18 they got added and immediately were overloaded?
17:40:24 which are still on 0.4.6.8 o_O
17:40:43 and that version still had some bugs we fixed later on
17:40:43 ohh so that had false positives in it still
17:40:50 nifty indeed
17:41:02 yeah, i am inclined to think this is unrelated to cc
17:41:35 so from the overload side things look okay-ish imo
17:41:54 i got no new reports either, by relay operators etc. complaining on irc or somewhere else
17:42:31 ok.. so I will run a sim or two with jnewsome's bg client hax and we can see if perf is similar to what we see now on live. in which case we should not mess with things
17:42:57 but if shadow says it still should be faster.. hrmm.. I might get itchy to jack up the cc params, esp if overload stays low
17:43:42 dgoulet: also I see https://gitlab.torproject.org/tpo/core/tor/-/issues/40620.. that one might be annoying to find.. the function it is in gets called from all over the place :/
17:44:07 connection_start_reading()
17:44:25 so one or more callpoints is probably not checking the XOFF state first
17:44:50 right
17:44:52 :S
17:45:25 anyway it is doing the "right thing" there.. it maybe just should be rate limited
17:45:55 or info idk
17:46:52 ok I think that is all I have for s61
17:47:32 nice
17:47:35 anything else for today?
17:47:41 * juga is good
17:48:13 <- too
17:48:38 let's call it then, thanks all for joining!
17:48:41 #endmeeting
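[editor's note] tor's connection_start_reading() is C, but the fix discussed for tpo/core/tor#40620 (check the XOFF state before resuming reads, and rate-limit the warning rather than log at every call site) is a general pattern. A language-neutral sketch in Python; all names and structure here are illustrative, not tor's actual code:

```python
# Sketch of a "check XOFF first, warn with rate limiting" guard around a
# start-reading call. The connection is modelled as a plain dict.
import time

class RateLimitedLog:
    def __init__(self, interval=60.0, clock=time.monotonic):
        self.interval = interval    # minimum seconds between emitted warnings
        self.clock = clock          # injectable for testing
        self.last = None
        self.suppressed = 0

    def warn(self, msg):
        now = self.clock()
        if self.last is None or now - self.last >= self.interval:
            print(f"{msg} ({self.suppressed} similar messages suppressed)")
            self.last = now
            self.suppressed = 0
            return True
        self.suppressed += 1        # swallow the message, but count it
        return False

def start_reading(conn, log):
    # The suspected bug: some call sites resume reading without checking
    # whether the stream is in XOFF state. Guard here, and warn (rate
    # limited) instead of spamming the log from every call point.
    if conn.get("xoff"):
        log.warn("start_reading called while in XOFF; ignoring")
        return False
    conn["reading"] = True
    return True
```

The injectable clock makes the rate limiter testable without sleeping; the suppressed-message counter preserves the signal ("this is happening a lot") that per-call logging would give, without the noise.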