16:57:48 #startmeeting network team meeting, 12th of april 2021
16:57:48 Meeting started Mon Apr 12 16:57:48 2021 UTC. The chair is ahf. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:57:48 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:57:51 hello hello
16:57:54 hi
16:58:02 our pad is at https://pad.riseup.net/p/tor-netteam-2021.1-keep
16:58:06 o/
16:58:07 o/ dgoulet
16:58:09 o/ juga
16:58:14 o/
16:58:15 hi everybody!
16:59:12 o/
16:59:21 how are folks doing with their tickets in https://gitlab.torproject.org/groups/tpo/core/-/boards ?
16:59:37 moving forward!
16:59:41 * asn looking & also writing report
16:59:52 nice
17:00:17 does someone have the pad link handy? i'm migrating computers and don't have the bookmark
17:00:35 https://pad.riseup.net/p/tor-netteam-2021.1-keep
17:01:28 0.4.5 post-stable and 0.4.6 stable - anything we need to discuss there?
17:01:37 thx
17:01:54 i saw dgoulet has a patch in MR for the bridge bootstrap issue
17:02:27 yah!
17:02:30 that one was rough...
17:02:40 I went a bit crazy on the commit message :P
17:02:49 We talked some on thursday; current plan is to put out 046 on the 15th this week, and not do an 045 yet
17:02:53 Let me know if that should change?
17:03:04 yeah, sounds good
17:03:15 Also let's try to not do any merges starting wednesday until the release comes out.
17:03:16 also new fallbackdir MRs (tor and arti) are up
17:03:56 I should do a new geoip file as well
17:04:03 ok, no merges from wednesday
17:04:50 the only new ticket from another team is the bridge one that david is on
17:05:05 i don't see any discussion or announcement items from us? so maybe it's s61 discussion time?
17:06:36 i take that to mean it is s61 discussion time. mikeperry, do you want to lead this?
17:06:59 the main plan for s61 was to start implementation on congestion control, but funding proposals and analysis and other issues are eating my month
17:07:22 I've finally!! been able to make progress on prop325 here
17:07:29 so at least that is that
17:07:44 mikeperry: something we can do to help there?
17:07:45 dgoulet: nice
17:07:52 very exciting
17:08:42 we have done some preliminary analysis on usage of extra-info by sbws. we aren't done with looking at overhead byte counts of desc vs extra-info for overload-general yet. we ran into stem issues
17:09:53 ok!
17:09:54 sbws also only needs to look at overload-general, and not the other lines. so there is a question of looking at that line separately for consideration in descs, vs considering them all together
17:10:43 we're also unclear on the goals here. so we're analyzing everything. that is time consuming
17:11:07 here's our preliminary script: https://gitlab.torproject.org/tpo/network-health/helper-scripts/-/merge_requests/4
17:11:42 the other lines = the lines that are already in extra-info with geoip db and whatever?
17:12:42 i think overload-X
17:13:01 ok
17:13:09 overload-ratelimits and overload-fd-exhausted
17:13:51 those ones aren't needed by sbws, and do not need to be checked as often by network-health. they can be checked from disk, or other easier ways to get extra-info
17:14:11 overload-general is needed by sbws from every fresh consensus desc set
17:14:58 so the big discussion on where this belongs seems to be converging on: it would be easier for you to have overload-general in the server descriptor and all the other ones in the extra-info, is that correct?
17:15:13 i haven't looked at the benchmark scripts
17:15:49 i don't think it's easier to have the other things in the extra-info desc
17:15:50 yes
17:16:11 it might just not be too important whether they are there or in the server descs
17:16:28 it's not easier to have them in extra-info either, but they are there now, and keeping them there may keep overhead bytes smaller for descs, if that is the root concern
17:16:37 yeah
17:19:14 overload-general is also not expected to be present for all relays. ideally, we would reduce load so it goes away. so it should add minimal overhead bytes to descs
17:19:27 okay, how do we untangle this? this has been discussed for a month now. so far the network team has thought that the overhead is negligible (i think, i have been out of the loop) and you guys think it is causing overhead for sbws (i think?) and in terms of development time for future applications too, right?
17:19:41 but we have not yet done analysis on its prevalence in relays that have upgraded already
17:19:46 network-team: help me out here because i want this to be solved today; this has been discussed for too long now and i think we keep leaving it open-ended
17:20:00 so one thing I want to ask
17:20:12 sbws is taking server descriptors from where: caches or dirauths?
17:20:43 sometimes dirauths because of the fetchextraearly
17:20:48 torrc option
17:20:54 currently dirauth, to get the latest consensus as quickly as possible. but that's another issue. it can pivot to dircache in the event of dos, if it can keep using descs
17:21:06 but we lose this ability to pivot if we have to use extra-info
17:21:11 right
17:21:40 (Unless we have some caches start fetching and serving extrainfo)
17:21:49 Did any of the measurements so far change what we expected?
17:22:00 but isn't that a larger change and also something that requires talking with humans to do?
17:22:10 yup.
17:22:18 in terms of "what we expected" I mean:
17:23:11 That downloading extrainfos is a significant increase in bandwidth needed for directory<-->sbws communication, but that such bandwidth is _not_ a significant part of the directories' total load, nor is it a significant part of SBWS's total load.
17:24:17 but without the "getting caches to serve extra-info", they will have to use dirauths all the time, no? that sounds bad to me just by the fact that we have some 6-7k nodes in the network we /could/ use, but we force ourselves down to 9 of them?
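The prevalence question raised above (how many already-upgraded relays report overload-general) can be answered from a concatenated descriptor dump. The following is a minimal Python sketch, not the network-health helper script linked earlier. It assumes the line format `overload-general VERSION YYYY-MM-DD HH:MM:SS`, as we understand it from proposal 328; all function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional

# Descriptor boundaries: server descriptors start with "router ",
# extra-info documents with "extra-info ". The trailing space keeps
# "router-signature" and "extra-info-digest" lines from matching.
DESC_START = ("router ", "extra-info ")


def split_descriptors(document: str) -> List[List[str]]:
    """Split a concatenated descriptor file into per-relay line lists."""
    descs: List[List[str]] = []
    for line in document.splitlines():
        if line.startswith(DESC_START):
            descs.append([line])
        elif descs:
            descs[-1].append(line)
    return descs


def overload_general(desc: List[str]) -> Optional[datetime]:
    """Return the overload-general timestamp of one descriptor, if present.

    Assumed format: overload-general VERSION YYYY-MM-DD HH:MM:SS
    """
    for line in desc:
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "overload-general":
            return datetime.strptime(f"{parts[2]} {parts[3]}", "%Y-%m-%d %H:%M:%S")
    return None


def fingerprint(desc: List[str]) -> Optional[str]:
    """Relay fingerprint from a server descriptor or an extra-info document."""
    first = desc[0].split()
    if first[0] == "extra-info" and len(first) >= 3:
        return first[2]  # "extra-info NICKNAME FINGERPRINT"
    for line in desc:
        parts = line.split()
        if parts and parts[0] == "fingerprint":
            return "".join(parts[1:])  # drop the spaces between hex groups
    return None


def overloaded_relays(document: str) -> Dict[str, datetime]:
    """Map fingerprint -> overload-general timestamp for overloaded relays."""
    result: Dict[str, datetime] = {}
    for desc in split_descriptors(document):
        stamp, fp = overload_general(desc), fingerprint(desc)
        if stamp is not None and fp is not None:
            result[fp] = stamp
    return result


def overload_prevalence(document: str) -> float:
    """Fraction of descriptors in the document that carry overload-general."""
    descs = split_descriptors(document)
    if not descs:
        return 0.0
    return sum(overload_general(d) is not None for d in descs) / len(descs)
```

Because a relay only emits the line while overloaded, `overload_prevalence` over a fresh descriptor set gives a rough upper bound on how many extra bytes the line would add to the server descriptors sbws already fetches.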
17:24:49 could we do a crazy thing and merge extrainfo + server desc into 1 single document (let's say server desc because it's cached by default)? :)
17:25:01 I mean, neither is used at all for the network to function, except for monitoring
17:25:04 I also expected many other things to break with this, and that has been top priority to find first. stem can't get extra-info from control port, and fetching extra-infos via stem's http methods reduces compression of the docs, and takes far longer (30m to get just 46 descriptors). so it's a massive engineering hole.
17:25:23 we have not done the analysis on the effects of bytes on this, but the perf is awful
17:25:32 dgoulet: i was thinking a bit about that too, i am not even sure why they are split up today when we use microdescs in clients
17:25:37 so basically: our tooling is too awful to use extrainfos
17:25:39 ?
17:26:02 we have other questions too
17:26:12 i think extra-info might have been needed too rarely for the tooling to care about it? i guess that's a yes :-S
17:26:24 we don't know how many relays opt out of extra-info yet. and are they updated immediately, or only once per 24 hours?
17:27:06 * ahf doesn't know
17:27:09 oh wow true, relays can opt out ... huh
17:27:17 (extra info)
17:27:21 The original distinction between router descriptors and extrainfo documents was that routerinfos have the "information that you need to use a router" and extrainfo has "misc statistics that aren't so useful"
17:27:27 (see prop#166)
17:27:38 right but server desc are not used anymore
17:27:42 nickm: in the tor timeline, this predates microdescs right?
17:27:43 yeah.
17:27:44 nickm: ahf: i've not played with it enough to know whether it's just the stem part of not using control port or the script i wrote is wrong so far
17:27:47 That too.
17:28:11 (And all this predates diffs)
17:28:22 ((which aren't used for descriptors))
17:28:28 how wild is david's idea as a medium-term one? and then the short-term one is to promote overload-general into the server descriptor to make life easier for the only user of this value right now, so they can move on with their exciting tasks and we can get decent bw scanning again?
17:28:45 dgoulet's idea being to merge the two documents?
17:28:48 yes
17:29:23 could be neat. We'd need a proposal, and we'd probably want to do other stuff along with it. Like, perhaps, make it so that most relays don't cache descriptors at all.
17:29:40 We'd need to look at tooling a lot and see if anything breaks if extrainfo documents go away
17:29:47 if there's anything that absolutely needs descriptors
17:29:55 why? don't we want to future-proof it exactly by having most relays cache it so we avoid this situation in the future?
17:29:59 how hard migrating the metrics code would be
17:30:00 etc
17:30:02 ya
17:30:06 relays don't need this info at all
17:30:17 and it changes relatively frequently
17:30:18 right, but consumers of the cached data might want it
17:30:22 *nod*
17:30:38 what about the short-term suggestion? this discussion has been going on for a month now and i feel we are wasting energy on it
17:30:47 having 5000 caches vs <10 tools that want this seems off :)
17:30:48 I think the right infrastructure there would be to use the caching mechanism at the authorities instead of all relays
17:30:58 (maybe)
17:31:14 if this info were in the descriptors today, would sbws actually use it for anything yet?
17:31:30 i think it would start using it as part of the work the team is doing on taking overload into account
17:31:48 i think that is what they are trying to get to so they can do the development for that and start testing it out as the network upgrades
17:32:06 what is it about the tooling that makes it so terrible? i export my moria1-cached-extrainfo files hourly to my webserver. maybe the tooling is interacting poorly with the dir auth dos defenses?
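The byte-overhead question discussed above (how much an overload-general line adds to the server descriptors, compared with fetching whole extra-info documents) can be estimated from a local dump such as an exported cached-extrainfo file. This is a minimal sketch; the function names and file paths are hypothetical, and it only counts raw uncompressed bytes, not what a directory actually sends after compression.

```python
from pathlib import Path
from typing import Tuple


def keyword_bytes(document: bytes, keyword: bytes = b"overload-general") -> int:
    """Total bytes (newlines included) of lines whose first word is `keyword`."""
    total = 0
    for line in document.splitlines(keepends=True):
        words = line.split(None, 1)
        if words and words[0] == keyword:
            total += len(line)
    return total


def overhead_share(document: bytes, keyword: bytes = b"overload-general") -> Tuple[int, int, float]:
    """Return (keyword bytes, total bytes, fraction) for one descriptor dump."""
    extra = keyword_bytes(document, keyword)
    total = len(document)
    return extra, total, (extra / total if total else 0.0)


def overhead_share_of_file(path: str, keyword: bytes = b"overload-general") -> Tuple[int, int, float]:
    """Same as overhead_share, reading a dump from disk (hypothetical path)."""
    return overhead_share(Path(path).read_bytes(), keyword)
```

Running `overhead_share_of_file` on an extra-info dump gives the bytes the line would cost if moved into server descriptors; comparing that with the total size of the same dump shows how much sbws would download to obtain the line via extra-info instead. A single line like `overload-general 1 2021-04-12 16:00:00` is 39 bytes including its newline.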
17:32:22 arma2: nobody can figure out how to download extrainfos.
17:32:29 arma2: it is too hard and takes too much programming
17:32:33 not sure about Stem, I use stem to download extrainfo but from dirauth directly
17:32:38 oh. but isn't it just a url?
17:32:40 downloading extrainfo with Stem is simple....
17:32:59 but it won't do it from caches iirc
17:33:27 arma2: please read the backlog a bit, there is also discussion about the lack of caching of the extra-info
17:33:44 yes, i read all the backlog. i found the "6000 caches for <10 users" line compelling.
17:34:13 nickm: we are workingo on that as part of s61, so, yes?
17:34:23 *working
17:34:31 i mean we need to write code to do so
17:34:41 but the tickets are filed
17:34:44 GeKo: which "that" do you mean?
17:34:46 and work started to do so
17:34:49 the "using the info"?
17:34:59 yes
17:35:12 that is while sbws is about to roll out
17:35:30 we want to start work on making sure the overload is taken into account during bw measurements
17:36:05 there are already relays which are reporting overload-general, so this is needed
17:36:06 what about: sbws uses extra-info *for now* to get going, and the network team works on getting rid of extra-info and merging it into the server descriptor?
17:36:28 I don't think we should cache extra-info at caches, either. I don't think it makes sense to merge them. I think having just this overload-general line in descs is less overhead total, is more reliable, is less risk in the event of dos, is less engineering cost, etc
17:36:30 ahf: just moving this field into the server descriptor would be easier, IMO
17:36:32 that way i think you guys are unblocked, we promise to resolve it in the future, and at some point in the future you can avoid using extra-info for all the reasons listed?
17:36:45 nickm: i am all for that too
17:36:50 then let's do that
17:36:52 ok let's move this to server desc and move on
17:36:55 (is there any point in extrainfos?)
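On "isn't it just a url?": the directory protocol does expose extra-info documents over plain HTTP on a DirPort, as we understand dir-spec, at `/tor/extra/all` or `/tor/extra/fp/FP1+FP2...`, with a `.z` suffix requesting a zlib-compressed body. Per the discussion, in practice only directory authorities serve them. The sketch below uses only the Python standard library, not Stem; the directory address is a hypothetical placeholder, and it assumes the `.z` response is a single zlib stream.

```python
import urllib.request
import zlib
from typing import Iterable, Optional


def extra_info_url(dir_addr: str, fingerprints: Optional[Iterable[str]] = None) -> str:
    """Build a dir-spec style URL for extra-info documents.

    With no fingerprints, ask for every extra-info document the server has;
    otherwise ask only for the listed relays (hex fingerprints, no spaces).
    """
    fps = list(fingerprints or [])
    if fps:
        return f"http://{dir_addr}/tor/extra/fp/{'+'.join(fps)}.z"
    return f"http://{dir_addr}/tor/extra/all.z"


def decode_z_body(body: bytes) -> str:
    """Inflate a ".z" response body, assuming it is a single zlib stream."""
    return zlib.decompress(body).decode("utf-8", errors="replace")


def fetch_extra_info(dir_addr: str,
                     fingerprints: Optional[Iterable[str]] = None,
                     timeout: float = 120.0) -> str:
    """Download extra-info documents from a DirPort (needs network access)."""
    url = extra_info_url(dir_addr, fingerprints)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return decode_z_body(resp.read())
```

For example, `fetch_extra_info("192.0.2.1:9030")` would request everything from a hypothetical authority at that address. Rate limits or DoS defenses on the authority may still throttle such requests, which is one reason the meeting preferred putting overload-general into server descriptors that caches already serve.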
17:36:59 yes, please
17:37:16 sounds good
17:37:19 if it's easier implementation-wise, all the lines could move to server descs
17:37:32 i am not sure how weird it is having just some of them in server descs
17:37:34 Question: I'd like to get a general principle out of this so that we don't have to do this over every time we have a question about what info goes where.
17:37:44 Any thoughts?
17:37:57 "Never put anything into extrainfos that some tool might need"?
17:37:59 nothing goes into extra-info and we try to deprecate it?
17:38:08 ^
17:38:17 "Only metrics should use extrainfo"?
17:38:17 in the mindset of simplifying things, we should get rid of one
17:38:28 why are caches caching relay descriptors either? nothing asks the caches for them, right? except clients who set usemicrodescriptors to 0?
17:38:33 i think we should get rid of it entirely over time
17:38:46 so if we want to do the architecture shift, the one to do is to deprecate cached-consensus and cached-descriptors
17:39:01 arma2: that's correct. We should migrate caches to not serving descriptors by default either. But that's a bigger and separate change.
17:39:37 When we do that we should form a plan to keep _some_ (like 100) caches serving descriptors, so metrics and sbws and whatever can use them
17:39:47 perhaps
17:41:05 yeah
17:41:28 ok, mikeperry, juga, GeKo: will the above stuff unblock you guys and make it possible to proceed?
17:41:52 ahf, yes
17:41:55 like move overload-general into the server descriptor, and the netteam works on finding a way of probably not using extra-info in the future
17:42:09 sounds good
17:42:10 thanks
17:42:37 ok, i'll create a ticket for it in tpo/core/tor after the meeting
17:42:51 <3
17:43:12 what is next on our agenda?
17:43:15 yay
17:44:10 that's it for s61, I think, unless there are any questions about congestion control, etc or anything else
17:45:46 * ahf is good
17:46:15 thanks ahf for facilitating this hard discussion
17:46:29 np, sorry if i sound pushy on it
17:46:48 thanks all for arriving at a solution
17:46:54 i am gonna call endmeeting now
17:46:56 #endmeeting