14:58:48 <karsten> #startmeeting metrics team meeting
14:58:48 <MeetBot> Meeting started Thu May 28 14:58:48 2020 UTC.  The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:58:48 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:58:56 <karsten> https://pad.riseup.net/p/tor-metricsteam-2020.1-keep <- agenda pad
14:59:11 <karsten> please add topics you want to discuss today.
14:59:41 <acute> hi!
14:59:46 <karsten> hi!
15:00:07 <gaba> hi!
15:00:11 <karsten> hi!
15:01:45 <karsten> giving another minute or so for people to add more topics.
15:01:55 <gaba> ok
15:02:44 <karsten> alright.
15:02:50 <karsten> let's start!
15:02:55 <karsten> Find out why onion service measurements have gotten slower (#34303)
15:03:09 <karsten> so, it looks like this is soon going to be resolved.
15:03:21 <karsten> right not it's in network team land, I think.
15:03:21 <gaba> nice!
15:03:25 <karsten> right now*
15:03:30 <acute> really good catch!
15:03:35 <karsten> :)
15:03:53 <karsten> if you're still running op-ab, I think you can stop that now. but don't delete the instance just yet.
15:04:06 <acute> understood
15:04:12 <karsten> thanks!
15:04:20 <karsten> so, the question on the pad:
15:04:25 <karsten> How do we catch future '#34303's?
15:04:42 <karsten> monitoring with expected bounds?
15:05:05 <karsten> where we would have to define these bounds per instance.
15:05:25 <karsten> and if measurements get faster or slower, we'll put out a warning.
15:05:33 <dennis_jackson> This would be my suggestion. There's obviously a lot of incoming time series, there's no way the metrics team can catch them all by manual inspection
15:05:34 <karsten> something like that?
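The "expected bounds per instance" idea above could look roughly like the following. This is a minimal sketch, not existing OnionPerf code: the instance name `op-example`, the bounds table `EXPECTED_BOUNDS`, the placeholder numbers, and the choice of the median as the monitored statistic are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical per-instance bounds in seconds (low, high).
# Both the instance name and the numbers are placeholders.
EXPECTED_BOUNDS = {
    "op-example": (0.5, 2.0),
}


def check_bounds(instance, samples, bounds=EXPECTED_BOUNDS):
    """Return a warning string if the median of `samples` leaves the
    expected range for `instance`, or None if it is within bounds."""
    low, high = bounds[instance]
    mid = median(samples)
    if mid < low:
        return f"WARNING {instance}: median {mid:.2f}s is below expected {low:.2f}s"
    if mid > high:
        return f"WARNING {instance}: median {mid:.2f}s is above expected {high:.2f}s"
    return None
```

The check warns in both directions, since measurements getting unexpectedly faster are as worth a look as measurements getting slower.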
15:05:34 <acute> we could have some instances running the latest op/tor code, that get updated when the software does?
15:05:57 <dennis_jackson> I think that's a really nice idea. Diversity in Tor version and/or config
15:06:45 <dennis_jackson> #34257 is also quite related to this - where Karsten's eye catches some strange behaviour
15:07:34 <acute> we would at least see if software changes produce changes in our results
15:08:01 <karsten> note that the current state of things is that we don't have any real monitoring in place.
15:08:32 <karsten> monitoring the three long-running instances is a matter of me running a local script that fetches the latest tgen logs, greps them, and I look at the last heartbeat message.
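The manual check described above (fetch the tgen logs, grep them, look at the last heartbeat message) could be sketched as follows. This is not karsten's actual script: it assumes heartbeat lines can be recognized by the substring "heartbeat", and the function name `last_heartbeat` is hypothetical.

```python
def last_heartbeat(lines, marker="heartbeat"):
    """Return the last line containing `marker`, or None if there is none.

    `lines` can be any iterable of strings, such as an open log file.
    The marker substring is an assumption about the log format.
    """
    last = None
    for line in lines:
        if marker in line:
            last = line.rstrip("\n")
    return last


# Possible usage, assuming a local copy of a tgen log file:
#     with open("tgen.log") as log:
#         print(last_heartbeat(log))
```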
15:09:07 <karsten> but yes, it's good to keep this in mind when we improve our monitoring capabilities.
15:09:43 <acute> hmm, reminds me of #28271
15:09:46 <karsten> and yes, #34257 could be part of this, too. another metric to keep an eye on. though harder to spot.
15:10:35 <karsten> yeah, #28271 needs more attention.
15:11:33 <karsten> it's on the list, so we're not going to forget about it.
15:12:15 <karsten> okay. noted as something to keep in mind in the future! moving on?
15:12:59 <karsten> Analyze unusual distribution of time to extend to first hop in circuit (#34257)
15:13:10 <karsten> this one is yet unresolved.
15:13:33 <karsten> one question here: do we still need that other hong kong instance?
15:13:52 <karsten> and do we need other measurements?
15:14:28 <karsten> and do we need other measurements? <- last thing I wrote before you left.
15:14:39 <dennis_jackson> Sorry, lost connection for a moment
15:15:02 <dennis_jackson> To my mind, not immediately
15:15:29 <karsten> what's the best way forward to further investigate this?
15:16:15 <dennis_jackson> I was thinking of sitting down with the raw logs and looking at the actual initiation times
15:16:24 <karsten> this is also less urgent than #34303.
15:16:36 <karsten> yes, that makes sense.
15:16:49 <dennis_jackson> To me, it really feels like an onionperf bug, because the performance is definitely not this bad with Tor Browser
15:17:21 <karsten> do you have times to compare?
15:17:28 <karsten> circuit build times, that is?
15:17:52 <karsten> but why would those be different?
15:18:01 <dennis_jackson> I think I have start2req times from GCP Instances in Hong Kong and they look normal
15:18:18 <karsten> ah, I was thinking of the time to build the first hop.
15:18:41 <dennis_jackson> Well I think the error for start2req is much larger in magnitude
15:18:46 <karsten> the start2req suffer from #34303, of course.
15:19:00 <karsten> in the onion case.
15:19:56 <dennis_jackson> hm. could #34303 not also be impacting the timings here?
15:20:05 <karsten> yes!
15:20:14 <karsten> well, the start2req.
15:20:26 <karsten> unclear about circuit build times.
15:20:54 <karsten> the impact could be that newly built circuits are different from preemptively built circuits.
15:20:59 <karsten> well, I don't know.
15:21:19 <dennis_jackson> me neither, but I think we are on the same page that we need to cast the net deeper rather than wider
15:21:22 <karsten> should I change the ec2 hong kong instance to run a #34303-patched tor version?
15:21:35 <dennis_jackson> Ah, that would be great
15:21:52 <karsten> okay, I'll do that and let it collect measurements over the next days.
15:22:26 <karsten> great!
15:22:34 <dennis_jackson> Fantastic, I next hope to scrape together a few hours for Tor analysis on Saturday morning and happy to have a look if there's some data available by then
15:22:37 <karsten> so many mysteries. and we're only starting here.
15:22:45 <dennis_jackson> Haha, indeed
15:23:02 <karsten> yup, will add measurements by friday evening then.
15:23:24 <dennis_jackson> great
15:23:37 <karsten> cool! moving on:
15:23:41 <karsten> Harmonize TTFB/TTLB definitions with Tor Metrics plots (#34215)
15:23:59 <karsten> maybe we can decide what to do here.
15:24:40 <karsten> I'd like to change the onionperf graphs to show the same TTFB/TTLB as the metrics website graphs.
15:25:00 <karsten> if we don't do that, we'll need to make a new plan.
15:25:19 <dennis_jackson> I have never used onionperf to do plotting, so I can't make much of a comment
15:25:20 <karsten> any objections here to make that change? (the patch is trivial.)
15:25:24 <dennis_jackson> Harmonising sounds great though
15:25:35 <karsten> ok.
15:25:42 <karsten> acute: what do you think?
15:26:01 <acute> have just had a look at this
15:26:53 <acute> I think it makes sense to include tor part of the measurement in the total time, so I'd say we should do it
15:27:01 <karsten> great!
15:27:29 <karsten> I'll go ahead then. thanks!
15:27:36 <karsten> Split visualizations into public server vs. v2 onion server vs. v3 onion server measurements (#34216)
15:27:50 <karsten> this is another important change to the visualizations to make them actually useful.
15:28:10 <karsten> before this change, all measurements would be plotted together; but that doesn't work so well with public+onion measurements.
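One way to separate public, v2 onion, and v3 onion measurements is to classify each measurement by the server hostname. The sketch below is not the patch under review in #34216; the function name `classify_server` is hypothetical. It relies only on the address format: v2 onion addresses have a 16-character label before `.onion`, v3 onion addresses a 56-character one.

```python
def classify_server(hostname):
    """Classify a server hostname as 'public', 'v2 onion', or 'v3 onion'.

    Anything ending in .onion with an unexpected label length is
    reported as 'unknown onion' rather than guessed.
    """
    host = hostname.lower().rstrip(".")
    if not host.endswith(".onion"):
        return "public"
    # Take the label directly before ".onion", ignoring any subdomain.
    label = host[: -len(".onion")].split(".")[-1]
    if len(label) == 16:
        return "v2 onion"
    if len(label) == 56:
        return "v3 onion"
    return "unknown onion"
```

Grouping measurements by this label before plotting would give one series per server type instead of one mixed series.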
15:28:26 <karsten> this is less about the decision to do it, but about the code to review.
15:28:34 <acute> happy to review this
15:28:38 <karsten> it touches all visualizations in the onionperf code.
15:28:50 <karsten> that would be wonderful!
15:29:02 <karsten> at least the changes are pretty much the same for all graphs there.
15:29:14 <acute> cool, I'll accept it :D
15:29:21 <karsten> yay! :)
15:29:31 <karsten> thanks!
15:29:46 <karsten> Update metrics-web to only plot "official" data (#33397)
15:30:01 <karsten> I had this on the agenda for last week, but we ran out of time.
15:30:22 <karsten> I was thinking that we might want to reconsider archiving all measurements in collector.
15:30:38 <karsten> we're doing that with long-running instances, and we should keep doing that.
15:30:48 <karsten> but I'm less sure about experimental measurements.
15:30:59 <karsten> like the ones I did for #34303 and #34257.
15:31:16 <karsten> if we want to archive them, we'll want to archive more than just the .json files.
15:31:37 <karsten> I only found the issue in #34303 by reading the tor logs, for example.
15:32:14 <karsten> the question is whether we should define some guidelines for ourselves rather than build a tool.
15:32:38 <karsten> we could say that we archive a tarball of the onionperf-data/ directory after running an experiment and put that somewhere.
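The tarball guideline above could be followed with a few lines of standard-library Python. This is only a sketch of the idea: the function name `archive_experiment` and the output file name in the usage comment are hypothetical, and where the tarball is put afterwards is left open, as in the discussion.

```python
import tarfile
from pathlib import Path


def archive_experiment(data_dir, out_path):
    """Write a gzip-compressed tarball of `data_dir` to `out_path`.

    The directory is stored under its own name, so unpacking the
    tarball recreates e.g. onionperf-data/ with logs and .json files.
    """
    data_dir = Path(data_dir)
    with tarfile.open(str(out_path), "w:gz") as tar:
        tar.add(str(data_dir), arcname=data_dir.name)
    return out_path


# Possible usage after an experiment has finished:
#     archive_experiment("onionperf-data", "experiment-34303.tar.gz")
```

Archiving the whole directory keeps the tor and tgen logs alongside the .json files, which matters because the logs are where the #34303 issue was found.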
15:33:14 <karsten> it's just a thought.
15:33:17 <dennis_jackson> I think it is not unlikely that there would be a need for long term non-plotted measurements. But maybe there is no rush to do the work required to support that
15:33:29 <acute> experimental measurements tend to generally be more short-lived
15:33:42 <acute> so we should think about how long we keep the data for as well
15:34:33 <dennis_jackson> But experiments run for one purpose can be useful for others
15:35:06 <karsten> I'm not yet sure about long term non-plotted measurements.
15:35:13 <dennis_jackson> E.g. when I looked back at latency measurements in the early 2010s, I would have loved to have additional high resolution samples for shorter periods.
15:35:16 <karsten> how would they differ from short-term measurements?
15:36:27 <dennis_jackson> Well, maybe you want to run OnionPerf on {X,Y,Z} Tor versions with a set of different configs
15:36:55 <karsten> yes, but we could do that.
15:37:02 <dennis_jackson> But maybe only the current release with normal config should be plotted as official?
15:37:21 <karsten> right now, we tell collector which onionperf .json files to fetch and archive.
15:37:29 <karsten> and everything that collector archives goes on the metrics website.
15:37:54 <karsten> these other long-term measurements would then run, but not be archived by collector.
15:38:03 <karsten> the files could still be available via their own web server.
15:38:05 <dennis_jackson> How would they be distributed?
15:38:56 <dennis_jackson> Okay, well, I do think having things live in Collector is easier for downstream users, but I totally see it would be effort to implement
15:39:33 <acute> karsten: this sounds like a good compromise
15:39:37 <karsten> okay. I guess we'll have to reconsider as we learn which are our main use cases.
15:39:46 <karsten> good to hear. :)
15:40:11 <karsten> okay, moving to the last topic:
15:40:12 <karsten> Fix message logging and filtering (#29369)
15:41:00 <karsten> this is "implementation-ready". :) but I'm not sure if you're looking for more work right now.
15:41:10 <karsten> maybe I should ask phw if he's interested.
15:41:21 <karsten> I hear friday is his onionperf day.
15:41:42 <karsten> let me try that.
15:41:56 <karsten> Anybody need any help with anything?
15:42:01 <karsten> last topic on the agenda.
15:42:06 <karsten> good question!
15:42:07 <phw> karsten: sure, i can take that
15:42:12 <karsten> hey!
15:42:22 <karsten> perfect!
15:42:42 <acute> :)
15:42:58 <karsten> anything we can do to unblock anyone here?
15:43:33 <acute> things are ok for me at the moment, thank you very much for all the feedback!
15:43:44 <karsten> thank you for all the input! :)
15:43:52 <dennis_jackson> All good here. Could just do with another few days in the week
15:43:58 <acute> haha
15:44:17 <karsten> that would be cool!
15:44:30 <karsten> but you would turn them into weekdays, not weekend days? ok.
15:44:50 <karsten> great!
15:45:00 <dennis_jackson> I think it'd let me turn the weekends back into actual weekends but yes :P
15:45:03 <karsten> if something comes up before the next meeting, just use email or trac.
15:45:12 <karsten> heh, good point!
15:45:28 <acute> dennis_jackson: indeed
15:45:47 <karsten> thanks, everyone! have a good rest of the week and a wonderful weekend!
15:45:52 <karsten> bye! o/
15:45:59 <acute> bye!
15:46:03 <dennis_jackson> o/ :)
15:46:17 <karsten> #endmeeting