14:58:48 #startmeeting metrics team meeting
14:58:48 Meeting started Thu May 28 14:58:48 2020 UTC. The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:58:48 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:58:56 https://pad.riseup.net/p/tor-metricsteam-2020.1-keep <- agenda pad
14:59:11 please add topics you want to discuss today.
14:59:41 hi!
14:59:46 hi!
15:00:07 hi!
15:00:11 hi!
15:01:45 giving another minute or so for people to add more topics.
15:01:55 ok
15:02:44 alright.
15:02:50 let's start!
15:02:55 Find out why onion service measurements have gotten slower (#34303)
15:03:09 so, it looks like this is soon going to be resolved.
15:03:21 right now it's in network team land, I think.
15:03:21 nice!
15:03:30 really good catch!
15:03:35 :)
15:03:53 if you're still running op-ab, I think you can stop that now. but don't delete the instance just yet.
15:04:06 understood
15:04:12 thanks!
15:04:20 so, the question on the pad:
15:04:25 How do we catch future '#34303's?
15:04:42 monitoring with expected bounds?
15:05:05 where we would have to define these bounds per instance.
15:05:25 and if measurements get faster or slower, we'll put out a warning.
15:05:33 This would be my suggestion. There's obviously a lot of incoming time series; there's no way the metrics team can catch them all by manual inspection.
15:05:34 something like that?
15:05:34 we could have some instances running the latest op/tor code that get updated when the software does?
15:05:57 I think that's a really nice idea. Diversity in Tor version and/or config.
15:06:45 #34257 is also quite related to this, where Karsten's eye catches some strange behaviour.
15:07:34 we would at least see if software changes produce changes in our results.
15:08:01 note that the current state of things is that we don't have any real monitoring in place.
15:08:32 monitoring the three long-running instances is a matter of me running a local script that fetches the latest tgen logs and greps them, and then I look at the last heartbeat message.
15:09:07 but yes, it's good to keep this in mind when we improve our monitoring capabilities.
15:09:43 hmm, reminds me of #28271
15:09:46 and yes, #34257 could be part of this, too. another metric to keep an eye on, though harder to spot.
15:10:35 yeah, #28271 needs more attention.
15:11:33 it's on the list, so we're not going to forget about it.
15:12:15 okay. noted as something to keep in mind in the future! moving on?
15:12:59 Analyze unusual distribution of time to extend to first hop in circuit (#34257)
15:13:10 this one is still unresolved.
15:13:33 one question here: do we still need that other Hong Kong instance?
15:13:52 and do we need other measurements?
15:14:28 and do we need other measurements? <- last thing I wrote before you left.
15:14:39 Sorry, lost connection for a moment
15:15:02 To my mind, not immediately
15:15:29 what's the best way forward to further investigate this?
15:16:15 I was thinking of sitting down with the raw logs and looking at the actual initiation times
15:16:24 this is also less urgent than #34303.
15:16:36 yes, that makes sense.
15:16:49 To me, it really feels like an onionperf bug, because the performance is definitely not this bad with Tor Browser
15:17:21 do you have times to compare?
15:17:28 circuit build times, that is?
15:17:52 but why would those be different?
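For the kind of comparison discussed here, a minimal Python sketch like the one below prints percentile summaries of per-measurement times from two or more instances. It assumes the first-hop extend or start2req times have already been pulled out of the raw tgen/tor logs into plain text files with one value in seconds per line; the file names and the extraction step are placeholders, not OnionPerf output.

    #!/usr/bin/env python3
    # Summarize per-measurement times from one or more instances.
    # Assumes each input file holds one value in seconds per line,
    # already extracted from the raw logs (the extraction is not shown).
    import statistics
    import sys

    def load_times(path):
        with open(path) as f:
            return [float(line) for line in f if line.strip()]

    def summarize(times):
        # quantiles(n=100) returns the 1st..99th percentiles.
        q = statistics.quantiles(times, n=100)
        return {"n": len(times), "median": q[49], "p90": q[89], "p99": q[98]}

    if __name__ == "__main__":
        # e.g.: python3 compare_times.py hk-ec2.txt hk-gcp.txt
        for path in sys.argv[1:]:
            print(path, summarize(load_times(path)))

One summary line per instance is usually enough to spot a shifted median or a heavier tail between instances.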
15:18:01 I think I have start2req times from GCP instances in Hong Kong and they look normal
15:18:18 ah, I was thinking of the time to build the first hop.
15:18:41 Well, I think the error for start2req is much larger in magnitude
15:18:46 the start2req times suffer from #34303, of course.
15:19:00 in the onion case.
15:19:56 hm. could #34303 not also be impacting the timings here?
15:20:05 yes!
15:20:14 well, the start2req.
15:20:26 unclear about circuit build times.
15:20:54 the impact could be that newly built circuits are different from preemptively built circuits.
15:20:59 well, I don't know.
15:21:19 me neither, but I think we are on the same page that we need to cast the net deeper rather than wider
15:21:22 should I change the EC2 Hong Kong instance to run a #34303-patched tor version?
15:21:35 Ah, that would be great
15:21:52 okay, I'll do that and let it collect measurements over the next few days.
15:22:26 great!
15:22:34 Fantastic, I next hope to scrape together a few hours for Tor analysis on Saturday morning and am happy to have a look if there's some data available by then
15:22:37 so many mysteries. and we're only starting here.
15:22:45 Haha, indeed
15:23:02 yup, will add measurements by Friday evening then.
15:23:24 great
15:23:37 cool! moving on:
15:23:41 Harmonize TTFB/TTLB definitions with Tor Metrics plots (#34215)
15:23:59 maybe we can decide what to do here.
15:24:40 I'd like to change the onionperf graphs to show the same TTFB/TTLB as the metrics website graphs.
15:25:00 if we don't do that, we'll need to make a new plan.
15:25:19 I have never used onionperf to do plotting, so I can't offer much of a comment
15:25:20 any objections to making that change? (the patch is trivial.)
15:25:24 Harmonising sounds great though
15:25:35 ok.
15:25:42 acute: what do you think?
15:26:01 have just had a look at this
15:26:53 I think it makes sense to include the tor part of the measurement in the total time, so I'd say we should do it
15:27:01 great!
15:27:29 I'll go ahead then. thanks!
15:27:36 Split visualizations into public server vs. v2 onion server vs. v3 onion server measurements (#34216)
15:27:50 this is another important change to the visualizations to make them actually useful.
15:28:10 before this change, all measurements would be plotted together; but that doesn't work so well with public+onion measurements.
15:28:26 this is less about the decision to do it and more about the code to review.
15:28:34 happy to review this
15:28:38 it touches all visualizations in the onionperf code.
15:28:50 that would be wonderful!
15:29:02 at least the changes are pretty much the same for all graphs there.
15:29:14 cool, I'll accept it :D
15:29:21 yay! :)
15:29:31 thanks!
15:29:46 Update metrics-web to only plot "official" data (#33397)
15:30:01 I had this on the agenda for last week, but we ran out of time.
15:30:22 I was thinking that we might want to reconsider archiving all measurements in collector.
15:30:38 we're doing that with long-running instances, and we should keep doing that.
15:30:48 but I'm less sure about experimental measurements.
15:30:59 like the ones I did for #34303 and #34257.
15:31:16 if we want to archive them, we'll want to archive more than just the .json files.
15:31:37 I only found the issue in #34303 by reading the tor logs, for example.
15:32:14 the question is whether we should define some guidelines for ourselves rather than build a tool.
15:32:38 we could say that we archive a tarball of the onionperf-data/ directory after running an experiment and put that somewhere.
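A rough sketch of that tarball guideline, just to make it concrete: it assumes the experiment's results live in a local onionperf-data/ directory, the output directory and naming scheme are made up here, and where the tarball ends up afterwards is left open, as in the discussion. Archiving the whole directory rather than only the .json files keeps the tor and tgen logs that were needed to spot #34303.

    #!/usr/bin/env python3
    # Archive a finished OnionPerf experiment as a timestamped tarball.
    # Directory names below are assumptions, not OnionPerf defaults.
    import tarfile
    import time
    from pathlib import Path

    def archive_experiment(data_dir="onionperf-data", out_dir="archived-experiments"):
        data_path = Path(data_dir)
        out_path = Path(out_dir)
        out_path.mkdir(parents=True, exist_ok=True)
        # Timestamped name so repeated experiments do not overwrite each other.
        stamp = time.strftime("%Y-%m-%d-%H%M%S", time.gmtime())
        tarball = out_path / (data_path.name + "-" + stamp + ".tar.gz")
        # Include the whole directory, not just the .json files, so the
        # tor and tgen logs stay available for later debugging.
        with tarfile.open(str(tarball), "w:gz") as tar:
            tar.add(str(data_path), arcname=data_path.name)
        return tarball

    if __name__ == "__main__":
        print(archive_experiment())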
15:33:14 it's just a thought.
15:33:17 I think it is not unlikely that there would be a need for long-term non-plotted measurements. But maybe there is no rush to do the work required to support that.
15:33:29 experimental measurements generally tend to be more short-lived
15:33:42 so we should think about how long we keep the data for as well
15:34:33 But experiments run for one purpose can be useful for others
15:35:06 I'm not yet sure about long-term non-plotted measurements.
15:35:13 E.g., when I looked back at latency measurements in the early 2010s, I would have loved to have additional high-resolution samples for shorter periods.
15:35:16 how would they differ from short-term measurements?
15:36:27 Well, maybe you want to run OnionPerf on {X,Y,Z} Tor versions with a set of different configs
15:36:55 yes, but we could do that.
15:37:02 But maybe only the current release with the normal config should be plotted as official?
15:37:21 right now, we tell collector which onionperf .json files to fetch and archive.
15:37:29 and everything that collector archives goes on the metrics website.
15:37:54 these other long-term measurements would then run, but not be archived by collector.
15:38:03 the files could still be available via their own web server.
15:38:05 How would they be distributed?
15:38:56 Okay, well, I do think having things live in Collector is easier for downstream users, but I totally see it would take effort to implement
15:39:33 karsten: this sounds like a good compromise
15:39:37 okay. I guess we'll have to reconsider as we learn what our main use cases are.
15:39:46 good to hear. :)
15:40:11 okay, moving to the last topic:
15:40:12 Fix message logging and filtering (#29369)
15:41:00 this is "implementation-ready". :) but I'm not sure if you're looking for more work right now.
15:41:10 maybe I should ask phw if he's interested.
15:41:21 I hear Friday is his onionperf day.
15:41:42 let me try that.
15:41:56 Anybody need any help with anything?
15:42:01 last topic on the agenda.
15:42:06 good question!
15:42:07 karsten: sure, I can take that
15:42:12 hey!
15:42:22 perfect!
15:42:42 :)
15:42:58 anything we can do to unblock anyone here?
15:43:33 things are ok for me at the moment, thank you very much for all the feedback!
15:43:44 thank you for all the input! :)
15:43:52 All good here. Could just do with another few days in the week
15:43:58 haha
15:44:17 that would be cool!
15:44:30 but you would turn them into weekdays, not weekend days? ok.
15:44:50 great!
15:45:00 I think it'd let me turn the weekends back into actual weekends but yes :P
15:45:03 if something comes up before the next meeting, just use email or trac.
15:45:12 heh, good point!
15:45:28 dennis_jackson: indeed
15:45:47 thanks, everyone! have a good rest of the week and a wonderful weekend!
15:45:52 bye! o/
15:45:59 bye!
15:46:03 o/ :)
15:46:17 #endmeeting