17:00:14 #startmeeting network team meeting, 3 Jun 2019
17:00:14 Meeting started Mon Jun 3 17:00:14 2019 UTC. The chair is nickm. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:14 Useful Commands: #action #agreed #help #info #idea #link #topic.
17:00:20 o/
17:00:22 hi! I think it's that time again!
17:00:25 o/
17:00:28 https://pad.riseup.net/p/tor-netteam-2019.1-keep
17:00:29 hello hello
17:00:29 (around but will soon go for dinner with family)
17:00:30 hello
17:00:31 nothing from me!
17:01:10 asn: ok! Anything you want to make sure we talk about while you are around?
17:01:12 o/
17:01:19 nope, I'm covered
17:01:44 I've been in contact with the s27 team and the scaling team, and they are aware of my next moves
17:01:51 woo
17:02:11 So actually I suggest we skip past CI for now and talk about it when we do rotation handoff and discussion
17:02:19 let's go straight to 041 release status
17:02:27 https://trac.torproject.org/projects/tor/wiki/org/teams/NetworkTeam/CoreTorReleases/041Status
17:03:14 we have a bunch fewer than we did before, and once this week's reviews are done, we'll have fewer still, I hope
17:04:09 I triaged a bunch of stuff out of the 0.4.1.x milestone last week; sent a tor-dev email about that at https://lists.torproject.org/pipermail/tor-dev/2019-May/013840.html
17:04:28 Here are the tickets currently marked as 041-must: https://trac.torproject.org/projects/tor/query?status=accepted&status=assigned&status=merge_ready&status=needs_information&status=needs_review&status=needs_revision&status=new&status=reopened&keywords=~041-must&col=id&col=summary&col=status&col=type&col=priority&col=milestone&col=component&order=status
17:05:32 If everybody with one of those can focus on closing it, and everybody with reviews to do for 041 can get reviews done, we'll get 0.4.1.2-alpha out this week
17:05:56 Does that sound like an okay plan?
17:06:19 yes
17:06:46 nickm: (1) is not simple; I've been at it all morning
17:07:03 nickm: (3) will probably be closed through another one in needs_review, I think
17:07:20 dgoulet: wrt (1), would you like some help, or do you want to poke at it longer?
17:08:14 nickm: I'm hoping the person on the ticket will send me more debug logs, or that I'm finally able to reproduce, but if both fail, yes, help++
17:09:17 ok
17:09:38 another option is to log more information when this happens (like, what kind of circuit and which hop the sendme came from)
17:10:07 nickm: I have a relay logging 5 lines of text every time it happens :) so far not helping much, I need the end2end correlation :S
17:10:16 nickm: somehow the deliver/package windows are out of sync :S
17:10:33 weird
17:10:38 I'll ask more on #tor-dev :)
17:10:49 very... especially code I didn't change (stream level :S)
17:10:52 next thing to do is roadmap
17:10:53 nickm: great
17:11:33 everybody please take a look at that kanban, filter it, and move stuff to a good place :)
17:12:57 any questions/issues there?
17:13:10 If not, let's move on to reviews...
17:13:44 looks like I only got 1 this week, so if anybody needs help, please feel free to pass me something if you're overloaded
17:14:23 I'd really like some extra eyes on #25140; I think it looks good, but it's pretty big and has had a ton of iterations
17:14:39 we have found something at every iteration, I believe
17:15:08 ahf: ok, I can help. Let's plan a time tomorrow to look at it together?
17:15:25 sounds good, at your morning-ish time?
17:16:15 I have a 9am meeting my time, so my 10am would be good?
17:16:26 == 1400 UTC I think
17:16:34 sounds good! yep
17:17:18 Rotations are under discussion too, so let's move to rotations/discussion :)
17:18:19 I'm passing CI to teor, but I plan to keep working on test-stem and test-rebind stuff
17:18:40 one thing we should talk about is whether we disable the intermittently failing stuff that we have not yet been able to fix
17:18:51 This is test-stem and test_rebind.py
17:18:59 That is, we would have to make it allow_failures
17:19:08 and keep working on a fix so we can turn it back on
17:19:12 What do people think about that?
17:20:22 the test_rebind.py failure seems to be macOS-only, so we could allow_failures the macOS builds
17:20:31 nice
17:20:34 or disable test_rebind on them
17:20:45 anyone seen the test_rebind failure on not-macOS?
17:21:02 I thought I had, but I turned out to be wrong
17:21:50 Does anybody think we should or shouldn't allow_failures these for now?
17:22:38 Maybe we should note this stuff on the CI status page, and have a note in ReleasingTor.md saying that we should manually try all the CI-disabled tests before releasing
17:23:16 we could also conditionally not run test_rebind on macOS somehow, so we don't have to disable all the macOS jobs
17:23:21 s/disable/allow_failures/
17:23:28 yeah
17:23:42 If nobody objects, I think this is the approach we should go with
17:25:03 we could also make it depend on running in Travis, so a developer on macOS running `make check` still has it run
17:25:28 I'd just add an environment variable for disabling test_rebind.py, and add it to the relevant entries in the travis file
17:25:42 that works too
17:26:11 maybe we should also have a process for the CI rotation person to check the allow_failures Travis jobs occasionally?
17:26:44 +1
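[A minimal sketch of the environment-variable approach described above. The variable name (TOR_DISABLE_TEST_REBIND) and the guard placement are illustrative assumptions, not the actual patch; the idea is that the .travis.yml entries for the macOS jobs would export the variable, so a developer running `make check` elsewhere still gets the test.]

    # Hypothetical opt-out guard near the top of test_rebind.py.
    # TOR_DISABLE_TEST_REBIND is an assumed name, not a real Tor or
    # Travis setting; only the macOS Travis jobs would set it.
    import os
    import sys

    if os.environ.get("TOR_DISABLE_TEST_REBIND"):
        print("test_rebind.py disabled via TOR_DISABLE_TEST_REBIND; skipping.")
        # Exit status 77 is the conventional "test skipped" code under
        # automake-style test harnesses.
        sys.exit(77)

[Gating on an explicit variable rather than on platform detection keeps the skip visible in the Travis config and trivially reversible once the macOS failure is fixed, matching the "keep working on a fix so we can turn it back on" plan above.]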
17:27:01 if nothing else on this, let's look at teor's other questions about sbws deployment?
17:27:53 they are:
17:27:57 Should we deploy sbws to half the bandwidth authorities?
17:27:58 Should we raise AuthDirMaxServersPerAddr to 4?
17:28:09 any thoughts?
17:30:24 is this a suggestion we make to the dirauth operators?
17:30:27 about AuthDirMaxServersPerAddr, there's a thread on tor-dev
17:30:36 the discussion there seems to be advancing
17:31:16 In my ideal world this is a question dirauth operators would figure out
17:32:06 I think we made that value tunable because we didn't know what it should be
17:32:13 I don't mind keeping it at 2 or raising it to 4
17:32:20 it's not just a measurement problem... operators need multiple instances to use all their bandwidth because of single-threaded CPU limits
17:32:23 good to see the thread is making progress
17:32:54 for sbws -- I'd like us to ask ourselves how sure we are this won't break, and how fast we can change back if it does break
17:33:21 also maybe we should ask "are we measuring the right things so that we will notice any performance changes that will happen as a result of this?"
17:33:28 so maybe we should pull in metrics too
17:35:49 isn't it unlikely to change metrics until relay operators actually run more than 2 relays per IP?
17:36:07 I was talking about the sbws change
17:36:10 it might take a bit for enough relay operators to take advantage and run more relay instances
17:36:13 oh
17:37:13 mikeperry: I agree with you about the MaxServers change
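[For reference, AuthDirMaxServersPerAddr is a directory-authority torrc option. A sketch of the proposed change as it might appear in a dirauth's configuration; the current value of 2 and the proposed 4 are taken from the discussion above.]

    # Hypothetical excerpt from a directory authority's torrc.
    # AuthDirMaxServersPerAddr caps how many relays on a single IP
    # address the authority will accept; raising it from 2 would let
    # operators run more single-threaded tor instances per machine,
    # per the tor-dev thread mentioned above.
    AuthDirMaxServersPerAddr 4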
17:38:31 we had some good aggregate graphs from metrics that we used when we last did a major torflow update, many years ago, to verify similar distributions between old and new instances
17:39:18 mikeperry: could you introduce those to the tor-dev thread that teor linked?
17:40:10 I am trying to find it... it was a long, long time ago
17:40:17 ok
17:40:36 That's the end of the discussion section on the pad. Do we have any other items for this week's meeting?
17:40:50 * ahf has none
17:41:59 okay. Let's have a great week hacking, then. Thanks, everybody!
17:42:41 o/
17:42:49 o/ thanks
17:43:51 #endmeeting