16:58:12 <ahf> #startmeeting Network team meeting, 13 March 2023
16:58:12 <MeetBot> Meeting started Mon Mar 13 16:58:12 2023 UTC.  The chair is ahf. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:58:12 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:58:14 <ahf> hello hello
16:58:28 <ahf> pad is at https://pad.riseup.net/p/tor-netteam-2023.1-keep
16:58:32 <nickm> hello
16:58:35 <gabi> o/
16:59:14 <dgoulet> o/
16:59:50 <jnewsome> o/
17:00:16 <ahf> okay, let's go
17:00:25 <ahf> pad is very slow for me to load today
17:00:32 <ahf> but i am having some other network issues locally
17:00:42 <ahf> how are folks doing with their boards @ https://gitlab.torproject.org/tpo/core/team/-/issues ?
17:00:53 <GeKo> hi
17:01:04 <ahf> o/
17:02:25 <nickm> i am chugging ahead on "next" and "doing"; I need to clean out my "backlog" or make it more realistic
17:03:25 <ahf> nice, feel free to drop reviews on me this week if you need to while ian is away
17:03:39 <ahf> wait that was the wrong link i gave above
17:03:48 <ahf> or maybe it was a test to see how many of you click on my links
17:04:21 <ahf> https://gitlab.torproject.org/groups/tpo/core/-/boards
17:04:29 <ahf> i didn't see anything off here
17:05:33 <ahf> dgoulet: anything on releases?
17:05:37 <dgoulet> nothing
17:05:52 <dgoulet> nextTopic() :)
17:06:18 <ahf> perfect
17:06:21 <ahf> no announcements
17:06:27 <ahf> no discussion items
17:06:34 <ahf> i assume y'all are not blocked on anything here
17:06:39 <ahf> mikeperry: you wanna move to s61?
17:07:27 <mikeperry> yeah, not much here. dgoulet and I are fixing issues in conflux, mostly wrt finishing touches and unit tests
17:08:01 <jnewsome> i'm back today - lmk if you need anything (cc runners?)
17:08:06 <mikeperry> as we add behaviors for circuit prediction, it keeps exposing missing pieces of the unit tests.. but the tests are also protecting us from regressions, so we got that going for us
17:08:21 <ahf> wb jnewsome!
17:10:43 <mikeperry> jnewsome: not yet, though gabi had some questions about getting end-to-end socks timings out of tgen
17:10:46 <gabi> jnewsome: wb! I think we might need some oniontrace changes to plot some additional stuff (i.e. circuit build error reasons) before we can close core/tor#40717
17:11:09 <gabi> wrong ticket
17:11:17 <gabi> I meant core/tor#40570
17:12:03 <gabi> I left a comment on the oniontrace PR about this (let me know if what I'm asking for doesn't make sense) https://github.com/shadow/oniontrace/pull/7
17:12:40 <jnewsome> gabi: mikeperry ok - yeah should be simple enough to make that change on the oniontrace branch
17:12:47 <gabi> mikeperry: right, I think it would be nice to have those too!
17:13:59 <jnewsome> once we're happy with this oniontrace branch i think it's probably mergeable without too much fuss, but I'll wait until we're happy with it
17:14:07 <ahf> nice
17:14:17 <gabi> nice, ty
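
The per-reason breakdown gabi is asking oniontrace to plot comes from the control port's CIRC events. Below is a minimal sketch of tallying those reasons with stem; it is only an illustration of the data involved, not the actual oniontrace change, and the control port 9051 and the ten-minute window are assumptions.

    # Minimal sketch (not the oniontrace patch): tally circuit build failure
    # reasons from control-port CIRC events via stem. Control port 9051 and
    # the ten-minute observation window are assumptions for illustration.
    import time
    from collections import Counter

    from stem import CircStatus
    from stem.control import Controller, EventType

    failure_reasons = Counter()

    def on_circ(event):
        # FAILED circuit events carry REASON/REMOTE_REASON fields when known.
        if event.status == CircStatus.FAILED:
            failure_reasons[(event.reason, event.remote_reason)] += 1

    with Controller.from_port(port=9051) as controller:
        controller.authenticate()
        controller.add_event_listener(on_circ, EventType.CIRC)
        time.sleep(600)  # observe for ten minutes

    for (reason, remote_reason), count in failure_reasons.most_common():
        print(f"{reason or '-'} / {remote_reason or '-'}: {count}")
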
17:15:19 <gabi> re: #40570 -- I had a chat with dgoulet earlier today and I think we can probably close that ticket (the perf issue seems to be a side effect of the DoS). However, it's still worth figuring out why disabling cannibalization led to an increase in circuit build failures
17:15:33 <gabi> (I'll leave a comment with my findings)
17:15:57 <mikeperry> I still think there's a problem where it will wait a minute due to intro/hsdir repurposing
17:16:16 <mikeperry> the socks timing of tgen, and socks failures, will help confirm this
17:17:10 <ahf> have we created a ticket for that investigation? the socks failure one? or do we need to split that out of another ticket?
17:17:19 <gabi> to clarify, I don't see any socks failures, so the connection _does_ succeed eventually, despite some circuits failing
17:17:33 <gabi> (the circuits are simply relaunched and eventually succeed)
17:18:01 <gabi> ahf: it's probably worth opening separate tickets for each of these problems, yeah
17:19:17 <ahf> yeah, let's do that
17:19:19 <mikeperry> are we going to close that ticket by disabling cannibalization then? because that ticket is about a problem seen in shadow, where there was no ddos
17:19:43 <ahf> i think we can do that
17:19:59 <gabi> the original problem isn't really a problem though (we were measuring CBT wrong), right?
17:20:18 <gabi> so we don't _have_ to disable cannibalization (though we might still want to)
17:20:23 <dgoulet> yeah I think the actual cause here was different from the hypothesis
17:20:56 <gabi> +1
17:21:02 <mikeperry> oh I see. so the onionperf branch measures it correctly
17:21:18 <mikeperry> and so that ticket can be closed, and we can look at stream/socks behavior independently.
17:21:23 <mikeperry> ok
17:21:27 <dgoulet> +1
17:21:29 <gabi> yep, agreed
17:21:40 <jnewsome> nod yeah I think fixing the reporting in oniontrace would fix this issue. though if we've convinced ourselves that cannibalization doesn't provide enough benefit to justify its complexity, maybe disabling it is not a bad idea
17:22:13 <mikeperry> yeah, no that is less clear. probably needs another ticket in this case, I agree
17:23:17 <gabi> cool, I'll open some follow up tickets after the meeting
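
For context on the CBT point above: circuit build time is the per-circuit delta between the LAUNCHED and BUILT circuit events, and failed circuits never get one. A small self-contained sketch of that definition follows; the event list is made up for illustration and is not real measurement data.

    # Hedged sketch of the CBT definition under discussion: build time is the
    # LAUNCHED -> BUILT delta per circuit id; FAILED/CLOSED circuits get none.
    def circuit_build_times(events):
        """events: iterable of (timestamp_seconds, circuit_id, status) tuples."""
        launched_at = {}
        build_times = {}
        for ts, circ_id, status in events:
            if status == "LAUNCHED":
                launched_at[circ_id] = ts
            elif status == "BUILT" and circ_id in launched_at:
                build_times[circ_id] = ts - launched_at.pop(circ_id)
            elif status in ("FAILED", "CLOSED"):
                launched_at.pop(circ_id, None)
        return build_times

    # Made-up example events, not real measurement data:
    print(circuit_build_times([
        (0.00, "1", "LAUNCHED"),
        (0.35, "2", "LAUNCHED"),
        (0.80, "1", "BUILT"),   # circuit 1: CBT = 0.80s
        (1.10, "2", "FAILED"),  # circuit 2: no CBT, only a failure
    ]))
    # -> {'1': 0.8}
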
17:23:23 <mikeperry> ok. so, juga: you are still looking at the bwscanner_cc=2 stuff? gonna try disabling it?
17:24:54 <GeKo> it's disabled right now on longclaw
17:25:29 <mikeperry> ok. the ddos attack is also letting up again, it seems. so be sure to take that into consideration when looking at results.
17:25:32 <GeKo> and suddenly the amount of measured relays rises substantially :(
17:25:35 <GeKo> https://gitlab.torproject.org/tpo/network-health/sbws/-/issues/40152#note_2885627
17:25:41 <mikeperry> it seems to be backing off and resuming in sputters
17:25:51 <GeKo> so, something new to investigate
17:25:57 <GeKo> yeah
17:26:32 <GeKo> and if it does not resume this week i might have enough data to finally do the outlier analysis properly
17:27:19 <ahf> hm. weird. do we think that is because the ddos activity is less right now?
17:27:26 <ahf> or is that an open question
17:29:07 <GeKo> no, i think it's because we disabled uploading in sbws
17:29:36 <GeKo> and went back to downloading on longclaw
17:29:42 <mikeperry> the congestion windows are growing on our test relay. so it makes sense that things would appear faster to sbws now. but that doesn't explain why there are more relays that can be measured
17:29:46 <mikeperry> maybe more reliability
17:30:21 <ahf> i see
17:30:51 <mikeperry> so yeah, avoid conclusions like "upload broken" until we can do some more controlled tests without the ddos changing things out from under us
17:31:59 <GeKo> yeah, i am not saying it's broken
17:32:29 <GeKo> it's just that after one hour of disabling it the number of measured relays is suddenly rising again
17:32:49 <GeKo> that was on march 10
17:33:01 <GeKo> and the ddos was gone at that time for a while already
17:33:27 <GeKo> anyway, we'll see what juga will find out
17:33:58 <mikeperry> yeah it fell off on march 7 but came back for a bit on march 8. it has been sputtering at a lower volume since march 11..
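
The bwscanner_cc setting being toggled here is published as a consensus parameter of the same name. One way to check its current value is to read the "params" line of a cached consensus document; the sketch below does that, with the file path being an assumption (clients using microdescriptors cache cached-microdesc-consensus instead).

    # Hedged sketch: read the bwscanner_cc value from a cached consensus. The
    # path below is an assumption; the "params" line is a standard
    # space-separated list of key=value pairs.
    consensus_path = "/var/lib/tor/cached-consensus"  # assumed location

    with open(consensus_path) as f:
        for line in f:
            if line.startswith("params "):
                params = dict(kv.split("=") for kv in line.split()[1:])
                print("bwscanner_cc =", params.get("bwscanner_cc", "not set"))
                break
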
17:37:00 <ahf> do we have more? :-)
17:37:00 <mikeperry> ok. anything else?
17:37:02 <ahf> :D
17:37:09 * ahf is good
17:39:15 <mikeperry> ok then. I guess we can call it
17:40:19 <ahf> ya, thanks all
17:40:21 <ahf> #endmeeting