14:59:14 <karsten> #startmeeting metrics team meeting
14:59:14 <MeetBot> Meeting started Thu Aug 27 14:59:14 2020 UTC.  The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:59:14 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:59:16 <karsten> hi mikeperry!
14:59:18 <dennis_jackson> o/
14:59:22 <karsten> hi dennis_jackson!
14:59:22 <jnewsome> o/
14:59:31 <karsten> hi jnewsome!
14:59:42 <mikeperry> hello all
15:00:01 <gaba> hi!
15:00:10 <karsten> https://pad.riseup.net/p/tor-metricsteam-2020.1-keep <- pad
15:00:12 <karsten> hi gaba!
15:00:28 <acute> hi everyone!
15:00:37 <karsten> hi acute!
15:02:05 <karsten> okay, I added two topics for today. do we have more?
15:03:38 <karsten> let's start, and if more topics come up, append them to the agenda.
15:03:44 <karsten> OnionPerf 0.7 release
15:03:51 <mikeperry> I added one. if you're going on leave at end of sept we should try to do some trial experiments on CBT to make sure the workflow is ok and I understand how to get data, etc
15:04:01 <karsten> today's the final date of the current roadmap.
15:04:04 <karsten> mikeperry: sounds great!
15:04:27 <karsten> for 0.7, we have two changes according to the change log.
15:04:47 <karsten> https://gitlab.torproject.org/tpo/metrics/onionperf/-/blob/develop/CHANGELOG.md
15:05:17 <karsten> I'm wondering what to do about #33399 here.
15:05:37 <karsten> mikeperry, you mentioned on #33420 that we'll want to drop timeouts together with guards.
15:05:45 <karsten> we're not doing that yet.
15:05:54 <karsten> should we try to include that in 0.7?
15:06:11 <karsten> otherwise, if we change it later, the behavior of 0.7 and 0.8+ will be different.
15:06:26 <mikeperry> hrm yah
15:06:38 <karsten> I'm just not sure what happens if we send a DROPTIMEOUTS command and tor doesn't understand that.
15:06:50 <karsten> or, if we can handle that, how we should handle that.
15:07:03 <karsten> ignore that we cannot drop timeouts, or die?
15:07:22 <mikeperry> just adding in a DROPTIMEOUTS call where DROPGUARDS is done should be sufficient.. but yeah if tor doesn't support it, or if we're not properly able to remove measurements before a new timeout is learned, then the data is not useful
15:07:26 <karsten> (dying seems harsh, just throwing it out here.)
15:07:40 <acute> die with a warning then?
15:08:20 <karsten> maybe?
15:08:25 <mikeperry> dying seems ok so long as we only die if --drop-guards was specified (ie we don't try DROPTIMEOUTS if --drop-guards was enabled)
15:08:34 <karsten> yes, right.
15:08:41 <mikeperry> err don't try=>only try...
15:08:48 <acute> yes, exactly
15:09:10 <karsten> okay, let's try that.
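The behavior agreed on above can be sketched as follows. This is a minimal sketch, not OnionPerf's actual code: `send_command` is a hypothetical callable that sends one control-port command and returns tor's reply line, and `ControlCommandFailed` is a name made up for illustration. The sketch assumes tor answers `250 OK` on success and a non-250 status (for example `510 Unrecognized command`) for a command it does not know.

```python
class ControlCommandFailed(RuntimeError):
    """Hypothetical error: tor rejected a command that --drop-guards needs."""


def drop_guards_and_timeouts(send_command, drop_guards_enabled):
    """Send DROPGUARDS and DROPTIMEOUTS together, but only when requested.

    send_command is a hypothetical callable that takes a command string and
    returns tor's reply line, e.g. "250 OK".
    Returns the list of commands that were sent successfully.
    """
    sent = []
    if not drop_guards_enabled:
        # Without --drop-guards, DROPTIMEOUTS is never attempted.
        return sent
    for command in ("DROPGUARDS", "DROPTIMEOUTS"):
        reply = send_command(command)
        if not reply.startswith("250"):
            # Die with a warning: measurements taken with stale timeouts
            # would not be useful.
            raise ControlCommandFailed(
                "tor rejected %s: %s" % (command, reply))
        sent.append(command)
    return sent
```

With a fake `send_command` that always returns `"250 OK"`, the function reports both commands as sent; with one that returns a `510` reply for `DROPTIMEOUTS`, it raises instead of silently continuing.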
15:09:29 <karsten> who picks #33399? would be good to get this resolved this week.
15:09:44 <karsten> (I can pick it if nobody wants.)
15:10:30 <karsten> okay, picking!
15:10:45 <karsten> that's all for 0.7 from me.
15:11:04 <karsten> moving on to the next roadmap?
15:11:27 <karsten> Roadmap for OnionPerf 0.8
15:11:43 <karsten> I'll be gone in 3.5 weeks.
15:12:02 <karsten> and ideally we'll have 0.8 out earlier than that, so that we can start new measurements with 0.8.
15:12:34 <karsten> how about we go through the tickets on the board and see which one fits into 0.8?
15:12:48 <karsten> https://gitlab.torproject.org/tpo/metrics/onionperf/-/boards
15:13:01 <gaba> sounds good
15:13:14 <karsten> going from right to left, top to bottom:
15:13:21 <karsten> tpo/metrics/onionperf#33260
15:13:36 <karsten> we're almost there, let's include it.
15:13:50 <acute> +1
15:14:21 <mikeperry> great. so long as we can remove a list of fingerprints, then I can test the Fast and Guard relay cutoffs too
15:14:26 <karsten> tpo/metrics/onionperf#33399 is already part of 0.7.
15:14:36 <mikeperry> assuming I can get access to data and apply those filters + graph
15:14:52 <karsten> mikeperry: that should work, yes.
15:15:11 <karsten> mikeperry: you can also start trying that as soon as #33260 is merged to the develop branch, if you want.
15:15:44 <karsten> speaking of, should we include a section in the readme for filtering?...
15:15:56 <mikeperry> ok. that would be great if you can walk me through that process as soon as develop is ready
15:15:57 <karsten> let's put that on the list...
15:16:04 <karsten> yes, will do!
15:16:09 <acute> karsten: happy to have a go at this
15:16:24 <karsten> acute: the readme?
15:16:30 <acute> yes
15:16:36 <karsten> cool! commenting on the ticket now.
15:17:21 <karsten> done.
15:17:36 <karsten> tpo/metrics/onionperf#34231
15:17:42 <mikeperry> is https://gitlab.torproject.org/tpo/metrics/onionperf/-/issues/33328 a dup?
15:18:17 <gaba> yes, it is the objective from the project
15:18:19 <karsten> mikeperry: that's a "Project" ticket.
15:18:43 <karsten> I have been ignoring those as well as I could.
15:19:07 <mikeperry> oh from trac's old parent tickets?
15:19:32 <acute> does this mean the objective is complete once we implement #33260?
15:20:27 <mikeperry> #33260 sounds like it meets what I need. if so, then yes
15:20:44 <mikeperry> so long as it doesn't explode if the list of relays to remove is too large or something like that
15:20:58 <karsten> we never tried, but it shouldn't. :)
15:21:13 <karsten> filtering by fingerprints was the most basic way of filtering we came up with.
15:21:13 <mikeperry> then I can just generate fingerprint lists in stem on the side and filter arbitrarily that way
15:21:22 <karsten> right. that was the plan.
15:21:38 <karsten> the original plan was to import tor descriptors and do more sophisticated filters in onionperf.
15:21:57 <karsten> we can still do that at a later time. but for now, using stem to generate fingerprints and handing those over to onionperf is the way to go.
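The "generate fingerprints on the side" step could look roughly like the sketch below. It is only an illustration under assumptions: the relay records are hard-coded `(fingerprint, flags, bandwidth)` tuples rather than descriptors fetched with stem, the function names are made up, and the one-fingerprint-per-line output is an assumed file layout, not a confirmed OnionPerf input format.

```python
def select_fingerprints(relays, required_flag=None, max_bandwidth=None):
    """Return sorted, upper-cased fingerprints of relays matching the criteria.

    relays is an iterable of (fingerprint, flags, bandwidth) tuples; this
    layout is hypothetical and stands in for data read via stem.
    """
    selected = []
    for fingerprint, flags, bandwidth in relays:
        if required_flag is not None and required_flag not in flags:
            continue
        if max_bandwidth is not None and bandwidth > max_bandwidth:
            continue
        selected.append(fingerprint.upper())
    return sorted(selected)


def format_fingerprint_list(fingerprints):
    """Render fingerprints one per line (assumed file layout)."""
    return "".join(fingerprint + "\n" for fingerprint in fingerprints)


# Made-up sample relays: select Guard relays at or below a bandwidth cutoff.
sample_relays = [
    ("A" * 40, {"Fast", "Guard"}, 5000),
    ("B" * 40, {"Fast"}, 100),
    ("c" * 40, {"Fast", "Guard"}, 50),
]
slow_guards = select_fingerprints(
    sample_relays, required_flag="Guard", max_bandwidth=1000)
```

The resulting list can then be written to a file and handed to the filtering step, so that arbitrary selection criteria stay outside of onionperf.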
15:22:51 <karsten> okay, going back to tpo/metrics/onionperf#34231:
15:22:58 <karsten> acute: should we move that to backlog?
15:23:19 <karsten> with the reasoning that we already have a way to map tgen and tor parts.
15:23:35 <karsten> we can still do the more elegant way later, but it's not a blocker right now.
15:23:40 <acute> I don't think there is any rush to include it in 0.8
15:23:49 <acute> so we can
15:23:52 <karsten> okay. moving it.
15:24:18 <karsten> tpo/metrics/onionperf#33420
15:24:37 <karsten> I'd like to keep that for 0.8.
15:25:07 <karsten> it's also related to mikeperry's trial experiment/analysis idea.
15:25:26 <mikeperry> yeah I will likely need to work with that before you come back
15:25:37 <mikeperry> so making sure it does stuff properly first is wise
15:25:45 <karsten> yep. let's keep it then.
15:26:47 <karsten> tpo/metrics/onionperf#40001
15:27:08 <karsten> I wonder if I could have some help with that.
15:27:39 <karsten> for example, part of this documentation includes the setup of our long-running instances.
15:27:44 <mikeperry> I can help by trying to use the docs and whining and crying when I get confused :)
15:27:53 <karsten> yes, that _is_ helpful!
15:28:10 <karsten> let's try that as soon as filters are in the develop branch, okay?
15:28:15 <mikeperry> ok
15:28:30 <acute> ok, so I've actually not set up one of our onionperfs
15:28:50 <acute> but I did examine the setup of op-ab, so I could attempt to draft something
15:28:51 <karsten> would you want to do that together with me, and we write the documentation as we go?
15:29:02 <acute> yes, that sounds great!
15:29:08 <karsten> awesome!
15:30:20 <karsten> great. let's pick a date and time offline.
15:30:36 <acute> cool!
15:30:56 <karsten> tpo/metrics/onionperf#33421
15:31:17 <karsten> it's still a lot of work.
15:31:37 <karsten> and the first part would be to understand how exactly guards work.
15:31:39 <karsten> ;)
15:31:56 <karsten> mikeperry: maybe you could help with the first part there?
15:31:56 <dennis_jackson> :P
15:32:09 <mikeperry> also one of the experiments I want to do is use more than one guard at once. this should improve long-tail performance
15:32:12 <karsten> https://gitlab.torproject.org/tpo/metrics/onionperf/-/issues/33421#note_2706521
15:32:43 <mikeperry> via torrc Num*Guards settings
15:33:08 <karsten> that sounds doable.
15:33:19 <karsten> adding more torrc options is easy in onionperf.
15:33:53 <karsten> the hard part of this issue is to find out what exactly in the tor logs we'd like to process in onionperf.
15:34:07 <karsten> well, as the comment on the issue says.
15:34:26 <mikeperry> can't we just use GUARD events and compare to circuit path lines from the control port?
15:34:45 <karsten> the GUARD events, even the recently fixed ones, are possibly insufficient for this.
15:34:54 <mikeperry> like if I have a data file that records GUARD events, and also path lines, in theory I can do checks on that myself
15:35:01 <mikeperry> oh
15:35:36 <karsten> again, this whole guards thing is a mystery. with all the different sets of candidates, primary guards, and so on.
15:35:42 <karsten> maybe I'm wrong, and they are sufficient.
15:35:56 <karsten> that would be the good result of this first analysis.
15:36:12 <karsten> the not-so-good result would be that we'll have to fix GUARD events even more.
15:36:20 <karsten> because we're not going to parse tor logs in onionperf, just torctl logs.
15:36:43 <dennis_jackson> Question: Are the experiments intended to find bugs in how Tor handles Guards etc?
15:37:10 <dennis_jackson> If not, just using stem directly steps over that issue right? At least, that's what I've done to avoid having to dig into the issue too much
15:37:27 <karsten> how did you use stem?
15:37:35 <dennis_jackson> Programmatically building the circuits I wanted directly
15:37:52 <karsten> ah, that would be a huge change to what onionperf does right now.
15:37:57 <mikeperry> onionperf lets tor itself choose paths
15:38:18 <dennis_jackson> Sure yes, but that's why I asked what you want to measure
15:38:28 <karsten> we might use stem to ask tor what guards it uses.
15:38:34 <karsten> and log that.
15:38:39 <mikeperry> so we need to record those paths, and the output of GUARD events, and see if tor is doing the right thing when we tell it to use 1 guard, or 2 guards, or 3 guards
15:38:44 <karsten> that would work around relying on events.
15:38:47 <dennis_jackson> ah okay
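The path-recording check described above might be sketched like this. It assumes log lines that contain tor's `CIRC` control event followed by a circuit ID, a status, and a comma-separated path of `$FINGERPRINT~nickname` hops; any prefix in front of `CIRC` is ignored. The function name and the sample lines are made up, and the sketch does not distinguish circuit purposes, so one-hop or onion-service circuits would need extra filtering.

```python
def first_hops_of_built_circuits(lines):
    """Collect the distinct first-hop fingerprints of BUILT circuits."""
    hops = set()
    for line in lines:
        tokens = line.split()
        if "CIRC" not in tokens:
            continue
        # After "CIRC": circuit ID, status, then (optionally) the path.
        rest = tokens[tokens.index("CIRC") + 1:]
        if len(rest) < 3 or rest[1] != "BUILT":
            continue
        path = rest[2]
        if not path.startswith("$"):
            continue  # no path present, only KEY=VALUE fields
        first_hop = path.split(",")[0]
        # Strip the leading "$" and any "~nickname" or "=nickname" suffix.
        fingerprint = first_hop[1:].split("~")[0].split("=")[0]
        hops.add(fingerprint.upper())
    return hops


# Made-up sample lines for illustration.
sample_lines = [
    "650 CIRC 7 BUILT $" + "A" * 40 + "~g1,$" + "B" * 40 + "~m,$"
    + "C" * 40 + "~e PURPOSE=GENERAL",
    "650 CIRC 8 BUILT $" + "A" * 40 + "~g1,$" + "D" * 40 + "~m,$"
    + "E" * 40 + "~e PURPOSE=GENERAL",
    "650 CIRC 9 LAUNCHED PURPOSE=GENERAL",
]
observed_guards = first_hops_of_built_circuits(sample_lines)
```

If tor is configured to use one guard, the set of observed first hops should contain a single fingerprint over a measurement period; more entries than the configured number would be worth a closer look.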
15:38:56 <mikeperry> this GUARD event is a sad stateful mess
15:39:19 <karsten> how about this: I can spend a few hours on this to get this analysis started.
15:39:36 <karsten> I'm just not sure if we'll get it resolved in time for 0.8.
15:39:37 <mikeperry> it should have just told us what Tor thinks the current guards are right now instead of all this stateful per-guard UP/DOWN info
15:40:05 <karsten> never too late to add another event type...
15:40:43 <mikeperry> UP/DOWN also seem not necessarily correlated with in-use
15:40:54 <mikeperry> they might just mean possible to use
15:41:09 <mikeperry> same thing for BAD/GOOD
15:41:37 <karsten> mikeperry: do you want to take a closer look at this first and comment on the ticket before I do something there?
15:42:43 <karsten> in any case, let's keep it in the roadmap, though it might turn out to be too big for 0.8.
15:43:13 <karsten> quickly looking through "Backlog".
15:43:33 <mikeperry> yeah I think this GUARD event requires knowledge of prop271 internal tor state to make use of
15:43:35 <karsten> I don't think there's room for more, if we want to finish in 2-2.5 weeks.
15:43:47 <karsten> mikeperry: sounds like it.
15:43:48 <mikeperry> it is just telling us about *potential* guards, not the one that prop271 decided was the best
15:43:56 <mikeperry> if I am reading the patch right
15:44:04 <mikeperry> ugh I should have looked at that earlier
15:44:19 <karsten> do you want to write a better tor patch, and we run that in onionperf for a while?
15:45:20 <mikeperry> yeah I should at least try.. but you're right.. it is not clear how the hell this GUARD event is *supposed* to tell you the current in-use guard(s)
15:45:28 <mikeperry> because of all this primary secondary business
15:46:05 <karsten> can I assign the issue to you for possible next steps?
15:46:08 <mikeperry> yah
15:47:30 <karsten> done.
15:47:33 <karsten> thanks!
15:47:48 <karsten> okay, I think that's the plan for 0.8 then.
15:48:02 <karsten> anything else on the roadmap topic?
15:48:32 <karsten> moving on:
15:48:33 <karsten> CBT trial experiments/analysis
15:49:04 <karsten> 15:05:25 <+mikeperry> I added one. if you're going on leave at end of sept we should try to do some trial experiments on CBT to make sure the workflow is ok and I understand how to get data, etc
15:49:29 <karsten> assuming we implement something like I suggested on tpo/metrics/onionperf#33420 today,
15:49:30 <mikeperry> so for that, I will want to run my own onionperf instance and examine and graph output most likely
15:50:13 <mikeperry> it already smells fishy if those values you posted on tpo/metrics/onionperf#33420 are real
15:50:22 <karsten> they are.
15:50:47 <karsten> I wonder if it's easier to just analyze the torctl logs directly first.
15:51:05 <karsten> if the goal is to get a feel of the data.
15:51:56 <karsten> can you write down what exactly you're interested in, and I run a quick analysis on the torctl logs locally?
15:52:01 <mikeperry> well the goal is to make sure I know enough onionperf kungfu to be able to diagnose and fix the issue later, as well as tune the quantile value via a custom tor patch
15:53:09 <mikeperry> for the tuning, I want to see what different values of cutoff_quantile do to the actual timeout rate
15:53:13 <karsten> I'm just not sure if we should add all the buildtimeout values to graphs and/or the CSV output.
15:53:22 <mikeperry> and to TTFB and throughput metrics
15:54:14 <karsten> okay. in that case let's try to do this in onionperf, as part of #33420.
15:54:30 <mikeperry> if it is simpler to keep the fields you already put in, that is sufficient
15:54:44 <mikeperry> I can do debugging and analysis with torctl logs, as you said, yah
15:54:50 <mikeperry> of a custom onionperf
15:55:08 <karsten> well, that might be easier.
15:55:16 <karsten> we can later add new stuff to onionperf for this.
15:55:27 <karsten> but knowing what exactly we're interested in would help with that.
15:55:50 <mikeperry> when we do the full experiment on the live network, we will want to be able to mark which sections of the onionperf graphs used what cutoff_quantile
15:56:01 <karsten> you don't even need a custom onionperf (in terms of patched). onionperf already writes torctl logs containing all those events.
15:56:16 <mikeperry> and also know what their timeout_rate (and what onionperf thinks is the timeout+failure rates) at those times
15:56:23 <mikeperry> ah ok
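A first look at the logged events could be as simple as the sketch below. It assumes the torctl log lines contain tor's `BUILTIMEOUT_SET`-style event in the form `BUILDTIMEOUT_SET <type> KEY=VALUE ...` with `CUTOFF_QUANTILE` and `TIMEOUT_RATE` fields, and it ignores whatever prefix precedes the event name. The function names and the sample values are made up for illustration.

```python
from collections import defaultdict


def parse_buildtimeout_set(line):
    """Parse one BUILDTIMEOUT_SET event line into a dict, or return None."""
    marker = "BUILDTIMEOUT_SET "
    if marker not in line:
        return None
    fields = line.split(marker, 1)[1].split()
    if not fields:
        return None
    event = {"TYPE": fields[0]}
    for field in fields[1:]:
        key, separator, value = field.partition("=")
        if separator:
            event[key] = value
    return event


def mean_timeout_rate_by_quantile(lines):
    """Average TIMEOUT_RATE per CUTOFF_QUANTILE over all parsed events."""
    rates = defaultdict(list)
    for line in lines:
        event = parse_buildtimeout_set(line)
        if event is None:
            continue
        if "CUTOFF_QUANTILE" not in event or "TIMEOUT_RATE" not in event:
            continue
        quantile = float(event["CUTOFF_QUANTILE"])
        rates[quantile].append(float(event["TIMEOUT_RATE"]))
    return {quantile: sum(values) / len(values)
            for quantile, values in rates.items()}


# Made-up sample values, not real measurements.
sample_log = [
    "650 BUILDTIMEOUT_SET COMPUTED TOTAL_TIMES=100 TIMEOUT_MS=1500 "
    "CUTOFF_QUANTILE=0.75 TIMEOUT_RATE=0.25",
    "650 BUILDTIMEOUT_SET COMPUTED TOTAL_TIMES=120 TIMEOUT_MS=1400 "
    "CUTOFF_QUANTILE=0.75 TIMEOUT_RATE=0.75",
    "650 BUILDTIMEOUT_SET COMPUTED TOTAL_TIMES=90 TIMEOUT_MS=900 "
    "CUTOFF_QUANTILE=0.5 TIMEOUT_RATE=0.5",
]
summary = mean_timeout_rate_by_quantile(sample_log)
```

Grouping by the logged quantile keeps the analysis independent of how the quantile was changed, whether through a patched tor or a consensus parameter.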
15:56:49 <karsten> sounds like we'll need to discuss that more.
15:56:52 <mikeperry> I do need to patch tor if I want to change cutoff_quantile locally (as opposed to network-wide in consensus)
15:56:53 <karsten> (90 seconds left)
15:57:00 <karsten> oh, right.
15:57:08 <karsten> I mean, sounds plausible. I wouldn't know for sure.
15:57:24 <karsten> let's use the last minute for the last topic:
15:57:28 <karsten> Simply Secure and Tor UX are running a survey to collect user feedback about the metrics website. Please, participate! I plan to email lists the next month to encourage people to do it (antonela)
15:57:32 <karsten> https://tools.simplysecure.org/survey/index.php?r=survey/index&sid=39865&lang=en
15:57:45 <karsten> I added something to the metrics website for that.
15:57:48 <antonela> yes, thanks Karsten for pushing the banner live!
15:57:50 <karsten> with a link.
15:57:53 <karsten> sure!
15:58:02 <antonela> this work has OTF funding and given the current situation, the work has a stop order. We will use the survey to collect info until things are back to regular mode.
15:58:14 <antonela> I'll email the lists next month to call for participation
15:58:22 <karsten> sounds great!
15:58:26 <karsten> thanks for this!
15:58:28 <antonela> all people here should jump in!
15:58:34 <antonela> of course
15:58:46 <karsten> time's up! I think there's another meeting after this.
15:58:58 <karsten> thanks, everyone! talk to you next week. o/
15:59:01 <acute> will do :)
15:59:10 <karsten> #endmeeting