16:57:48 <ahf> #startmeeting network team meeting, 12th of april 2021
16:57:48 <MeetBot> Meeting started Mon Apr 12 16:57:48 2021 UTC.  The chair is ahf. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:57:48 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:57:51 <ahf> hello hello
16:57:54 <dgoulet> hi
16:58:02 <ahf> our pad is at https://pad.riseup.net/p/tor-netteam-2021.1-keep
16:58:06 <juga> o/
16:58:07 <ahf> o/ dgoulet
16:58:09 <ahf> o/ juga
16:58:14 <GeKo> o/
16:58:15 <nickm> hi everybody!
16:59:12 <asn> o/
16:59:21 <ahf> how are folks doing with their tickets in https://gitlab.torproject.org/groups/tpo/core/-/boards ?
16:59:37 <nickm> moving forward!
16:59:41 * asn looking & also writing report
16:59:52 <ahf> nice
17:00:17 <jnewsome> does someone have the pad link handy? i'm migrating computers and don't have the bookmark
17:00:35 <ahf> https://pad.riseup.net/p/tor-netteam-2021.1-keep
17:01:28 <ahf> 0.4.5 post-stable and 0.4.6 stable - anything we need to discuss there?
17:01:37 <jnewsome> thx
17:01:54 <ahf> i saw dgoulet has a patch in MR for the bridge bootstrap issue
17:02:27 <dgoulet> yah!
17:02:30 <dgoulet> that one was rough...
17:02:40 <dgoulet> I went a bit crazy on the commit message :P
17:02:49 <nickm> We talked some on thursday; current plan is to put out 046 on the 15th this week, and not do an 045 yet
17:02:53 <nickm> Let me know if that should change?
17:03:04 <ahf> yeah, sounds good
17:03:15 <nickm> Also let's try to not do any merges starting wednesday until the release comes out.
17:03:16 <dgoulet> also new fallbackdir MRs (tor and arti) are up
17:03:56 <nickm> I should do a new geoip file as well
17:04:03 <ahf> ok, no merges from wednesday
17:04:50 <ahf> the only new ticket from another team is the bridge one that david is on
17:05:05 <ahf> i don't see any discussion or announcement items from us? so maybe it's s61 discussion time?
17:06:36 <ahf> i take that as it is s61 discussion time. mikeperry you want to lead this?
17:06:59 <mikeperry> the main plan for s61 was to start implementation on congestion control, but funding proposals and analysis and other issues are eating my month
17:07:22 <dgoulet> I've finally!! been able to make progress on prop325 here
17:07:29 <dgoulet> so at least that is that
17:07:44 <ahf> mikeperry: something we can do to help there?
17:07:45 <ahf> dgoulet: nice
17:07:52 <ahf> very exciting
17:08:42 <mikeperry> we have done some preliminary analysis on usage of extra-info by sbws. we aren't done with looking at overhead byte counts of desc vs extra-info for overload-general yet. we ran into stem issues
17:09:53 <ahf> ok!
17:09:54 <mikeperry> sbws also only needs to look at overload-general, and not the other lines. so there is a question of looking at that line separately for consideration in descs, vs considering them all together
17:10:43 <mikeperry> we're also unclear on the goals here. so we're analyzing everything. that is time consuming
17:11:07 <mikeperry> here's our preliminary script: https://gitlab.torproject.org/tpo/network-health/helper-scripts/-/merge_requests/4
17:11:42 <ahf> the other lines = the lines that are already in extra-info with geoip db and whatever?
17:12:42 <juga> i think overload-X
17:13:01 <ahf> ok
17:13:09 <mikeperry> overload-ratelimits and overload-fd-exhausted
17:13:51 <mikeperry> those ones aren't needed by sbws, and do not need to be checked as often by network-health. they can be checked from disk, or other easier ways to get extra-info
17:14:11 <mikeperry> overload-general is needed by sbws from every fresh consensus desc set
17:14:58 <ahf> so the big discussion on where this belongs seems to be going towards that it would be easier for you to have overload-general in the server descriptor and have all the other ones in the extra-info, is that correct?
17:15:13 <ahf> i haven't looked at the benchmark scripts
17:15:49 <GeKo> i don't think it's easier to have the other things in the extra-info desc
17:15:50 <mikeperry> yes
17:16:11 <GeKo> it might just be not too important if they are there or in the server descs
17:16:28 <mikeperry> it's not easier to have them in extra-info either, but they are there now, and keeping them there may keep overhead bytes smaller for descs, if that is the root concern
17:16:37 <GeKo> yeah
17:19:14 <mikeperry> overload-general is also not expected to be present for all relays. ideally, we would reduce load so it goes away. so it should be minimal overhead bytes in descs
17:19:27 <ahf> okay, how do we untangle this? this has been discussed for a month now. so far the network team have thought that the overhead is negligible (i think, i have been out of the loop) and you guys think it is causing overhead for sbws (i think?) and in terms of development time for future applications too, right?
17:19:41 <mikeperry> but we have not yet done analysis on its prevalence in relays that have upgraded already
17:19:46 <ahf> network-team: help me out here because i want this to be solved today since this has been discussed for too long now and we keep having an open end to it i think
17:20:00 <dgoulet> so one thing I want to ask
17:20:12 <dgoulet> sbws is taking server descriptors from where: caches or dirauth?
17:20:43 <juga> sometimes dirauths cause of the fetchextraearly
17:20:48 <juga> torrc option
17:20:54 <mikeperry> currently dirauth, to get the latest consensus as quickly as possible. but that's another issue. it can pivot to dircache in the event of dos, if it can keep using descs
17:21:06 <mikeperry> but we lose this ability to pivot if we have to use extra-info
17:21:11 <dgoulet> right
17:21:40 <nickm> (Unless we have some caches start fetching and serving extrainfo)
17:21:49 <nickm> Did any of the measurements so far change what we expected?
17:22:00 <ahf> but isn't that a larger change and also something that requires talking with humans to do?
17:22:10 <nickm> yup.
17:22:18 <nickm> in terms of "what we expected" I mean:
17:23:11 <nickm> That downloading extrainfos is a significant increase in bandwidth needed for directory<-->sbws communication, but that such bandwidth is _not_ a significant part of the directories' total load, nor is it a significant part of SBWS's total load.
17:24:17 <ahf> but without the "getting caches to serve extra-info", they will have to use dirauth's all the time, no? that sounds bad to me just by the fact that we have some 6-7k nodes in the network we /could/ use, but we force ourselves down to 9 of them?
17:24:49 <dgoulet> could we do a crazy thing and merge extrainfo + server desc into 1 single document (let's say server desc because cached by default) ? :)
17:25:01 <dgoulet> I mean, both are unused at all for the network to function except for monitoring
17:25:04 <mikeperry> I also expected many other things to break with this, and that has been top priority to find first. stem can't get extra-info from control port, and fetching extra-infos via stem's http methods reduces compression of the docs, and takes far longer (30m to get just 46 descriptors). so it's a massive engineering hole.
17:25:23 <mikeperry> we have not done the analysis on the effects of bytes on this, but the perf is awful
17:25:32 <ahf> dgoulet: i was thinking a bit about that too, i am not even sure why they are split up today when we use microdesc's in clients
17:25:37 <nickm> so basically: our tooling is too awful to use extrainfos
17:25:39 <nickm> ?
17:26:02 <mikeperry> we have other questions too
17:26:12 <ahf> i think extra-info might not have been needed badly enough for the tooling to care for it? i guess that's a yes :-S
17:26:24 <mikeperry> we don't know how many relays opt out of extra-info yet. and are they updated immediately, or only once per 24 hours?
17:27:06 * ahf don't know
17:27:09 <dgoulet> oh wow true, relays can opt-out ... huh
17:27:17 <dgoulet> (extra info)
17:27:21 <nickm> The original distinction between router descriptors and extrainfo documents was that routerinfos have the "information that you need to use a router" and extrainfo has "misc statistics that aren't so useful"
17:27:27 <nickm> (see prop#166)
17:27:38 <dgoulet> right but server desc are not used anymore
17:27:42 <ahf> nickm: in the tor timeline, this predates microdescs right?
17:27:43 <nickm> yeah.
17:27:44 <juga> nickm: ahf: i've not played with it enough to know whether it's just the stem part of not using control port or the script i wrote is wrong so far
17:27:47 <nickm> That too.
17:28:11 <nickm> (And all this predates diffs)
17:28:22 <nickm> ((which aren't used for descriptors))
17:28:28 <ahf> how wild is david's idea as a medium-term one? and then the short term one is to promote the overload-general into the server descriptor to make the life easier of the only user of this value right now so they can move on with their exciting tasks so we can get decent bw scanning again ?
17:28:45 <nickm> dgoulet's idea being to merge the two documents?
17:28:48 <ahf> yes
17:29:23 <nickm> could be neat.  We'd need a proposal, and we'd probably want to do other stuff along with it.  Like, perhaps, make it so that most relays don't cache descriptors at all.
17:29:40 <nickm> We'd need to look at tooling a lot and see if anything breaks if extrainfo documents go away
17:29:47 <nickm> if there's anything that absolutely needs descriptors
17:29:55 <ahf> why? don't we want to future proof it exactly by having most relays cache it so we avoid this situation in the future?
17:29:59 <nickm> how hard migrating the metrics code would be
17:30:00 <nickm> etc
17:30:02 <ahf> ya
17:30:06 <nickm> relays don't need this info at all
17:30:17 <nickm> and it changes relatively frequently
17:30:18 <ahf> right, but consumers of the cached data might want it
17:30:22 <ahf> *nod*
17:30:38 <ahf> what about the short term suggestion? this discussion has been going on for a month now and i feel we are wasting energy on it
17:30:47 <nickm> having 5000 caches vs <10 tools that want this seems off :)
17:30:48 <dgoulet> I think the right infrastructure there would be to use the caching mechanism at the authorities instead of all relays
17:30:58 <dgoulet> (maybe)
17:31:14 <nickm> if this info were in the descriptors today, would sbws actually use it for anything yet?
17:31:30 <ahf> i think it would start using it as part of the work the team is trying to solve with taking overload into account
17:31:48 <ahf> i think that is what they are trying to get to so they can do the development for that and start testing it out as the network upgrades
17:32:06 <arma2> what is it about the tooling that makes it so terrible? i export my moria1-cached-extrainfo files hourly to my webserver. maybe the tooling is interacting poorly with the dir auth dos defenses?
17:32:22 <nickm> arma2: nobody can figure out how to download extrainfos.
17:32:29 <nickm> arma2: it is too hard and takes too much programming
17:32:33 <dgoulet> not sure about Stem, I use stem to download extrainfo but from dirauth directly
17:32:38 <arma2> oh. but isn't it just a url?
17:32:40 <dgoulet> downloading extrainfo with Stem is simple....
17:32:59 <dgoulet> but it won't do it to caches iirc
17:33:27 <ahf> arma2: please read the backlog a bit, there is also discussion about the lack of caching of the extra-info
17:33:44 <arma2> yes, i read all the backlog. i found the "6000 caches for <10 users" line compelling.
17:34:13 <GeKo> nickm: we are workingo on that as part of s61, so, yes?
17:34:23 <GeKo> *working
17:34:31 <GeKo> i mean we need to write code to do so
17:34:41 <GeKo> but the tickets are filed
17:34:44 <nickm> GeKo: which "that" do you mean?
17:34:46 <GeKo> and work started to do so
17:34:49 <nickm> the "using the info"?
17:34:59 <GeKo> yes
17:35:12 <GeKo> that is while sbws is about to roll out
17:35:30 <GeKo> we want to start work on making sure the overload is taken into account during bw measurements
17:36:05 <GeKo> there are already relays which are reporting overload-general, so this is needed
17:36:06 <ahf> what about sbws uses extra-info *for now* to get going, network team works on getting rid of extra-info and merging it into server descriptor?
17:36:28 <mikeperry> I don't think we should cache extra-info at caches, either. I don't think it makes sense to merge them. I think having just this overload-general line in descs is less overhead total, is more reliable, is less risk in the event of dos, is less engineering cost, etc
17:36:30 <nickm> ahf: just moving this field into server-descriptor would be easier, IMO
17:36:32 <ahf> that way i think you guys are unblocked, we promise to resolve it in the future, at some point in the future you can avoid using extra-info for all the reasons listed?
17:36:45 <ahf> nickm: i am all for that too
17:36:50 <ahf> then let's do that
17:36:52 <dgoulet> ok lets move this to server desc and move on
17:36:55 <nickm> (is there any point in extrainfos?)
17:36:59 <ahf> yes, please
17:37:16 <arma2> sounds good
17:37:19 <GeKo> if it's easier implementation-wise all the lines could move to server descs
17:37:32 <GeKo> i am not sure how weird it is having just some of them in server descs
17:37:34 <nickm> Question: I'd like to get a general principle out of this so that we don't have to do this over every time we have a question about what info goes where.
17:37:44 <nickm> Any thoughts?
17:37:57 <nickm> "Never put anything into extrainfos that some tool might need"?
17:37:59 <ahf> nothing goes into extra-info and we try to deprecate it?
17:38:08 <dgoulet> ^
17:38:17 <nickm> "Only metrics should use extrainfo"?
17:38:17 <dgoulet> if the mindset is simplifying things, we should get rid of one
17:38:28 <arma2> why are caches caching relay descriptors either? nothing asks the caches for them, right? except clients who set usemicrodescriptors to 0?
17:38:33 <ahf> i think we should get rid of it entirely over time
17:38:46 <arma2> so if we want to do the architecture shift, the one to do is to deprecate cached-consensus and cached-descriptors
17:39:01 <nickm> arma2: that's correct.  We should migrate caches to not serving descriptors by default either.  But that's a bigger and separate change.
17:39:37 <nickm> When we do that we should form a plan to keep _some_ (like 100) caches serving descriptors, so metrics and sbws and whatever can use them
17:39:47 <nickm> perhaps
17:41:05 <ahf> yeah
17:41:28 <ahf> ok, mikeperry, juga, GeKo: will the above stuff unblock you guys and make it possible to proceed?
17:41:52 <juga> ahf, yes
17:41:55 <ahf> like move overload-general into the server descriptor, and the netteam works on finding a way of probably not using extra-info in the future
17:42:09 <GeKo> sounds good
17:42:10 <GeKo> thanks
17:42:37 <ahf> ok, i'll create a ticket for it after the meeting to tpo/core/tor
17:42:51 <GeKo> <3
17:43:12 <ahf> what is next on our agenda?
17:43:15 <mikeperry> yay
17:44:10 <mikeperry> that's it for s61, I think. unless there are any questions about congestion control, etc or anything else
17:45:46 * ahf is good
17:46:15 <gaba> thanks ahf for facilitating this hard discussion
17:46:29 <ahf> np, sorry if i sound pushy on it
17:46:48 <juga> thanks all for arriving at a solution
17:46:54 <ahf> i am gonna call endmeeting now
17:46:56 <ahf> #endmeeting