16:58:12 #startmeeting Network team meeting, 13 March 2023
16:58:12 Meeting started Mon Mar 13 16:58:12 2023 UTC. The chair is ahf. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:58:12 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:58:14 hello hello
16:58:28 pad is at https://pad.riseup.net/p/tor-netteam-2023.1-keep
16:58:32 hello
16:58:35 o/
16:59:14 o/
16:59:50 o/
17:00:16 okay, let's go
17:00:25 pad is very slow for me to load today
17:00:32 but i am having some other network issues locally
17:00:42 how are folks doing with their boards @ https://gitlab.torproject.org/tpo/core/team/-/issues ?
17:00:53 hi
17:01:04 o/
17:02:25 i am chugging ahead on "next" and "doing"; I need to clean out my "backlog" or make it more realistic
17:03:25 nice, feel free to drop reviews on me this week if you need to while ian is away
17:03:39 wait, that was the wrong link i gave above
17:03:48 or maybe it was a test to see how many clicks my links get
17:04:21 https://gitlab.torproject.org/groups/tpo/core/-/boards
17:04:29 i didn't see anything off here
17:05:33 dgoulet: anything on releases?
17:05:37 nothing
17:05:52 nextTopic() :)
17:06:18 perfect
17:06:21 no announcements
17:06:27 no discussion items
17:06:34 i assume y'all are not blocked on anything here
17:06:39 mikeperry: you wanna move to s61?
17:07:27 yeah, not much here. dgoulet and I are fixing issues in conflux, mostly wrt finishing touches and unit tests
17:08:01 i'm back today - lmk if you need anything (cc runners?)
17:08:06 as we add behaviors for circuit prediction, it keeps hitting missing pieces of the unit tests.. but the tests are also protecting us from regressions, so we've got that going for us
17:08:21 wb jnewsome!
17:10:43 jnewsome: not yet, though gabi had some questions about getting end-to-end socks timings out of tgen
17:10:46 jnewsome: wb! I think we might need some oniontrace changes to plot some additional stuff (i.e. circuit build error reasons) before we can close core/tor#40717
17:11:09 wrong ticket
17:11:17 I meant core/tor#40570
17:12:03 I left a comment on the oniontrace PR about this (let me know if what I'm asking for doesn't make sense) https://github.com/shadow/oniontrace/pull/7
17:12:40 gabi: mikeperry: ok - yeah, it should be simple enough to make that change on the oniontrace branch
17:12:47 mikeperry: right, I think it would be nice to have those too!
17:13:59 i think this oniontrace branch is probably mergeable without too much fuss, but I'll wait until we're happy with it
17:14:07 nice
17:14:17 nice, ty
17:15:19 re: #40570 -- I had a chat with dgoulet earlier today and I think we can probably close that ticket (the perf issue seems to be a side effect of the DoS). However, it's still worth figuring out why disabling cannibalization led to an increase in circuit build failures
17:15:33 (I'll leave a comment with my findings)
17:15:57 I still think there's a problem where it will wait a minute due to intro/hsdir repurposing
17:16:16 the socks timings from tgen, and the socks failures, will help confirm this
17:17:10 have we created a ticket for that investigation? the socks failure one? or do we need to split that out of another ticket?
17:17:19 to clarify, I don't see any socks failures, so the connection _does_ succeed eventually, despite some circuits failing
17:17:33 (the circuits are simply relaunched and eventually succeed)
17:18:01 ahf: it's probably worth opening separate tickets for each of these problems, yeah
17:19:17 yeah, let's do that
17:19:19 are we going to close that ticket by disabling cannibalization then? because that ticket is about a problem seen in shadow, where there was no ddos
17:19:43 i think we can do that
17:19:59 the original problem isn't really a problem though (we were measuring CBT wrong), right?
17:20:18 so we don't _have_ to disable cannibalization (though we might still want to)
17:20:23 yeah, I think the cause here turned out to be different from the hypothesis
17:20:56 +1
17:21:02 oh I see. so the oniontrace branch measures it correctly
17:21:18 and so that ticket can be closed, and we can look at stream/socks behavior independently.
17:21:23 ok
17:21:27 +1
17:21:29 yep, agreed
17:21:40 nod, yeah, I think fixing the reporting in oniontrace would fix this issue. though if we've convinced ourselves that cannibalization doesn't provide enough benefit to justify its complexity, maybe disabling it is not a bad idea
17:22:13 yeah, no, that is less clear. probably needs another ticket in this case, I agree
17:23:17 cool, I'll open some follow-up tickets after the meeting
17:23:23 ok. so, juga: you are still looking at the bwscanner_cc=2 stuff? gonna try disabling it?
17:24:54 it's disabled right now on longclaw
17:25:29 ok. the ddos attack is also letting up again, it seems. so be sure to take that into consideration when looking at results.
17:25:32 and suddenly the number of measured relays rises substantially :(
17:25:35 https://gitlab.torproject.org/tpo/network-health/sbws/-/issues/40152#note_2885627
17:25:41 it seems to be backing off and resuming in sputters
17:25:51 so, something new to investigate
17:25:57 yeah
17:26:32 and if it does not resume this week i might have enough data to finally do the outlier analysis properly
17:27:19 hm. weird. do we think that is because the ddos activity is lower right now?
17:27:26 or is that an open question?
17:29:07 no, i think it's because we disabled uploading in sbws
17:29:36 and went back to downloading on longclaw
17:29:42 the congestion windows are growing on our test relay, so it makes sense that things would appear faster to sbws now. but that doesn't explain why there are more relays that can be measured
17:29:46 maybe more reliability
17:30:21 i see
17:30:51 so yeah, avoid conclusions like "upload broken" until we can do some more controlled tests without the ddos changing things out from under us
17:31:59 yeah, i am not saying it's broken
17:32:29 it's just that after one hour of it being disabled, the number of measured relays is suddenly rising again
17:32:49 that was on march 10
17:33:01 and the ddos had already been gone for a while at that point
17:33:27 anyway, we'll see what juga finds out
17:33:58 yeah, it fell off on march 7 but came back for a bit on march 8. it has been sputtering at a lower volume since march 11..
17:37:00 do we have more? :-)
17:37:00 ok. anything else?
17:37:02 :D
17:37:09 * ahf is good
17:39:15 ok then. I guess we can call it
17:40:19 ya, thanks all
17:40:21 #endmeeting
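
The #40570 discussion above turns on getting circuit build failure reasons plotted out of oniontrace data. As a rough illustration only (this is not the code in the oniontrace PR linked above), the sketch below counts REASON values from FAILED CIRC events as they appear on Tor's control port; the input file name "circ-events.log" and the assumption of one plain-text event per line are placeholders.

import re
from collections import Counter

# "650 CIRC <id> FAILED ... REASON=<...>" is the shape of an asynchronous
# circuit event on Tor's control port; REASON values such as TIMEOUT or
# DESTROYED come from the control protocol.
CIRC_FAILED = re.compile(r"650 CIRC \d+ FAILED\b(?P<rest>.*)")
REASON = re.compile(r"\bREASON=(?P<reason>[A-Z_]+)")

def count_failure_reasons(path):
    """Map each failure reason to the number of FAILED circuits showing it."""
    reasons = Counter()
    with open(path) as fh:
        for line in fh:
            match = CIRC_FAILED.search(line)
            if match is None:
                continue
            reason = REASON.search(match.group("rest"))
            reasons[reason.group("reason") if reason else "UNKNOWN"] += 1
    return reasons

if __name__ == "__main__":
    # "circ-events.log" is a placeholder for whatever capture of CIRC events
    # the simulation produces.
    for reason, count in count_failure_reasons("circ-events.log").most_common():
        print(f"{reason:20s} {count}")

Feeding the resulting Counter into a simple bar chart would give the kind of per-reason plot discussed for #40570.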
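
juga's observation that the number of measured relays jumped after upload was disabled, and the pending outlier analysis, can both be eyeballed from the v3bw files sbws publishes. A minimal sketch, assuming only that relay lines in the bandwidth file carry space-separated key=value pairs including node_id= and bw=; the path "latest.v3bw" and the MAD-based cutoff are illustrative choices, not what sbws or the actual analysis uses.

import statistics

def relay_bandwidths(path):
    """Collect the bw= value of every relay line in a v3bw bandwidth file."""
    bws = []
    with open(path) as fh:
        for line in fh:
            if "node_id=" not in line:
                continue  # header lines carry no node_id= and are skipped
            pairs = dict(p.split("=", 1) for p in line.split() if "=" in p)
            if pairs.get("bw", "").isdigit():
                bws.append(int(pairs["bw"]))
    return bws

def mad_outliers(values, cutoff=3.5):
    """Flag values far from the median using a simple MAD-based rule."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1
    return [v for v in values if abs(v - med) / mad > cutoff]

if __name__ == "__main__":
    bws = relay_bandwidths("latest.v3bw")  # placeholder path
    print(f"relay lines with a bw value: {len(bws)}")
    print(f"possible outliers (> 3.5 MADs from the median): {len(mad_outliers(bws))}")

Comparing the relay counts across files from before and after march 10 would show the jump juga describes without depending on sbws internals.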