14:58:26 #startmeeting metrics team meeting
14:58:27 Meeting started Thu Sep 10 14:58:26 2020 UTC. The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:58:27 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:58:30 okay, let's start!
14:58:32 o/
14:58:33 https://pad.riseup.net/p/tor-metricsteam-2020.1-keep
14:58:36 hi mikeperry!
14:58:55 are there more topics for today's agenda?
14:59:48 unfortunately, acute cannot be here today. we briefly talked before the meeting. she'll read the notes.
15:00:27 okay, let's start with the agenda.
15:00:33 Priority on OnionPerf project to finish in the next week.
15:00:48 it's indeed just 1.5 weeks until I go on leave.
15:01:09 we should try to get one more onionperf version out by then. and deployed.
15:01:36 we should probably try to get all features merged by next thursday.
15:01:48 I successfully built 0.7ish. started it up last night
15:01:52 and postpone everything that's not ready.
15:01:58 yay!
15:02:12 0.7ish means master?
15:02:16 yah
15:02:21 okay, cool!
15:02:30 c8275b25e4afda9328634ec6be56ff46c7ee1cfe
15:02:48 yep.
15:02:52 any surprises?
15:03:33 the path to tgen was slightly off from the docs. there is an additional 'src' in my tgen build
15:03:40 otherwise it was mostly cut+paste
15:04:01 https://gitlab.torproject.org/tpo/metrics/onionperf#starting-and-stopping-measurements is the section with the tgen path
15:04:03 will check that 'src' thing. there was a change between tgen 0.0.1 and 1.0.0.
15:04:15 ok.
15:04:42 i am on 048bcc8e2421320c9a27763612b82e86c3c6e683 for tgen fwiw
15:05:05 yep, that's the latest.
15:05:12 I'll check that.
15:05:49 let us know how this works out, including analysis/visualization.
15:06:29 shall we go through the open issues for 0.8?
15:06:56 https://gitlab.torproject.org/tpo/metrics/onionperf/-/boards
15:07:14 tpo/metrics/onionperf#33260 is almost done.
15:07:50 acute and I discussed some final tweaks today. we're good there.
15:07:55 just a final revision and review.
15:08:25 tpo/metrics/onionperf#33421 has a comment from mikeperry that I just replied to.
15:08:44 I think the concept is clear now. this needs some code cleanup, then review, possibly revisions, and the merge.
15:08:51 on track for 0.8, I'd say.
15:09:10 tpo/metrics/onionperf#40001 is also in good shape.
15:09:45 there might still be placeholders in the new documentation for parts that we'll write later, but the important stuff will be documented.
15:09:52 also in time for 0.8.
15:10:11 tpo/metrics/onionperf#33420 is more critical.
15:10:20 we may not have enough time to do this properly.
15:10:45 I'd like to leave it at the end just in case there's still time to do it.
15:10:54 but we shouldn't rush this, or we won't do it right.
15:10:54 is it possible to have onionperf just record the BUILDTIMEOUT_SET line directly on the side?
15:11:07 that might be enough for initial testing/experiments
15:11:19 onionperf should already log that event.
15:11:33 I'll have to check, but I believe it's written to the .torctl.log files.
15:11:42 that would be in `onionperf measure`.
15:11:54 ok
15:11:56 onionperf would not do anything with that event though.
15:12:14 in the analyze/visualize modes.
15:12:31 you could check in your logs.
15:13:16 okay, that's all about those last remaining four issues.
15:13:22 that might be ok. we should not be changing the tuning parameter (cbtquantile) that much.. logs may suffice
15:13:31 so it is fine for that one to be lower priority
15:13:37 sounds good.
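A rough illustration of the "check in your logs" suggestion above: BUILDTIMEOUT_SET control events are logged as lines of KEY=VALUE fields, so pulling out the computed timeout and cutoff quantile takes only a few lines of Python. This is a minimal sketch, not part of OnionPerf itself; the log path pattern and the assumption that the fields of interest appear on the event line should be checked against what `onionperf measure` actually writes on your instance.

```python
# Minimal sketch: extract BUILDTIMEOUT_SET events from OnionPerf's tor
# controller logs. The glob pattern below is an assumption -- point it at
# wherever your `onionperf measure` run writes its .torctl.log files.
import glob
import re

def buildtimeout_events(log_glob="onionperf-data/**/*.torctl.log"):
    """Yield dicts of KEY=VALUE fields from BUILDTIMEOUT_SET log lines."""
    for path in sorted(glob.glob(log_glob, recursive=True)):
        with open(path) as log_file:
            for line in log_file:
                if "BUILDTIMEOUT_SET" not in line:
                    continue
                fields = dict(re.findall(r"(\w+)=(\S+)", line))
                fields["_source"] = path
                yield fields

if __name__ == "__main__":
    for event in buildtimeout_events():
        # TIMEOUT_MS and CUTOFF_QUANTILE are the values relevant to
        # cbtquantile experiments, if present on the logged line.
        print(event.get("TIMEOUT_MS"), event.get("CUTOFF_QUANTILE"), event["_source"])
```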
15:14:04 in fact, having some real data would make it easier to implement that feature.
15:14:16 so, in 2-3 months it might be easier to build it.
15:14:42 let's briefly discuss one thing about tpo/metrics/onionperf#40001 here:
15:15:06 acute and I have been talking about serving measurement data tarballs from somewhere.
15:15:21 those tarballs are ~100G right now and more in the future.
15:15:34 one option would be asking for a tp.o host.
15:15:38 another option would be S3.
15:16:08 where S3 would be related to also moving instances to AWS.
15:16:21 and that is the data needed to re-graph and filter results?
15:16:43 no, that data is tiny in comparison.
15:16:46 it would be the full logs.
15:17:04 tarballs of the onionperf-data/ directories produced by `onionperf measure`.
15:17:28 those would be relevant for extracting other parts using `onionperf analyze` than we're extracting right now.
15:17:37 or for grepping/parsing the logs for other things.
15:18:14 the json files required for filtering/re-graphing would still be archived by collector.
15:19:00 is this a discussion to have with the admins?
15:20:01 I'll bring it up there. :)
15:20:27 hrm my tbb is failing to download the instructions.md from 40001
15:21:02 just says failed in the tbb download manager :/
15:21:15 :(
15:21:21 what about the .html?
15:22:14 aha I was in some other downloads directory other than system one. permissions issue :)
15:23:16 okay.
15:23:35 we'll work on making those instructions even more accessible then. ;)
15:23:49 so my main concern right now is how do I graph and examine my custom onionperf instance data
15:23:57 do I need my own collector instance for that?
15:24:02 no!
15:24:17 just use `onionperf visualize` for that.
15:24:29 README.md has some instructions.
15:24:40 ok and if I want to add any custom graphs? are there examples?
15:24:41 that mode produces a PDF file and a CSV file.
15:25:17 hmm. you'll probably want to look at onionperf/visualization.py and go from the existing code.
15:25:40 or you could take the CSV file and use another graphing tool that you're more familiar with.
15:26:13 and if you need even more, you could always ask acute or phw_ for help at these meetings.
15:26:19 I am a newbie-level grapher.. so any examples, especially python ones, will help me get going
15:26:51 I've used python things before.. stuff in numpy I think
15:27:20 take a look through https://gitlab.torproject.org/tpo/metrics/onionperf/-/blob/master/onionperf/visualization.py
15:27:30 the part about extracting data is craziness.
15:27:43 but the parts about visualization things are quite readable.
15:28:08 I think it's easiest to modify the code there and re-run the visualize mode.
15:28:09 does an 'onionperf analyze' step always have to run before 'visualize'?
15:28:14 no.
15:28:33 the output from `onionperf analyze` is the json file that is the input to `onionperf visualize`.
15:28:52 or json file_s_. you can have directories of those as input to `onionperf visualize`.
15:29:20 has dennis_jackson worked with this data? he is very good at dataviz I have noticed. perhaps I can pester him too
15:29:27 oh, right!
15:29:33 he can for sure help.
15:30:50 okay, let's move to the second agenda item?
15:31:22 sounds good to me.
15:31:26 if nothing else you could use https://app.rawgraphs.io/ with a csv :P
15:31:28 sounds good
15:31:47 Funding proposal for next OnionPerf phase (link in the mail that was sent).
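Since python examples were explicitly requested above: a minimal sketch of the "take the CSV file and use another graphing tool" route, reading the CSV written by `onionperf visualize` with pandas and plotting a CDF with matplotlib. The file name and the "time_to_last_byte" column are assumptions for illustration; check the header of your own CSV and substitute the columns you actually want to graph.

```python
# Sketch: plot a CDF from the CSV produced by `onionperf visualize`.
# File name and column name are placeholders, not guaranteed by OnionPerf.
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed on a server
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("onionperf.viz.csv")            # hypothetical output file
values = df["time_to_last_byte"].dropna().sort_values()
cdf = values.rank(method="first") / len(values)  # cumulative fraction per value

plt.figure()
plt.plot(values, cdf)
plt.xlabel("time to last byte (s)")
plt.ylabel("cumulative fraction of transfers")
plt.title("Custom OnionPerf CDF (example)")
plt.savefig("custom-cdf.pdf")
```

From a starting point like this it is a short step to porting a plot into onionperf/visualization.py and re-running the visualize mode, as suggested in the discussion.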
15:32:14 we have an opportunity to apply for amazon’s ‘AWS imagine grant’
15:32:14 so, this funding would be for AWS resources and development?
15:32:21 please go ahead.
15:32:45 yes - as karsten said, this would be for a grant that involves AWS resources and development.
15:33:40 i know we would like to increase the geographic diversity of the network by spinning up new onionperf instances — and we could pay for the work involved with that, plus a year of aws hosting as one objective
15:33:52 yay!
15:34:07 that would be awesome
15:34:14 but there’s more $ available. i spoke with gaba and we brainstormed — would it be possible to take the onionperf work from the old MOSS proposal (work that hasn’t been done yet) and glue these ideas together
15:34:29 ?*
15:34:54 good question.
15:35:09 regarding hosting, does there have to be a one year limit?
15:35:19 in the pad, i took two objectives from our old versions of the MOSS proposal and copy/pasted
15:35:19 stated differently, what happens when that year is over?
15:35:42 karsten - yes, it is limited to a year. so we would need to be sure we can pay for the ~$4k out of tor’s pocket when that is over
15:36:34 regarding development work, we already did a lot in the past few months.
15:36:51 the parts that I suggested in my mail were related to large-scale deployment and monitoring.
15:37:16 given how long we took last week to set up the latest set of onionperf instances,
15:37:33 it would be really important to automate that more if we go from 3 to 9 instances or more.
15:38:10 ok, that makes sense. so step 0 to ‘increasing onionperf geographical diversity’ is automation work
15:38:11 same with monitoring that those 9 instances stay online.
15:38:18 yes, I think so.
15:39:21 is there visualization improvement work that we could include? i think making the results of this project publicly consumable will be important
15:40:26 maybe we should work on the Tor Metrics graphs for this.
15:40:32 how about stability and monitoring work? things like failover instances, data merging in the event of failure, etc?
15:40:53 right now they're designed for 3 instances, but we already reach a limit there with changing sets of 3 instances over time.
15:41:38 gaps in our measurements are a big problem I have had while casually digging on https://metrics.torproject.org
15:42:08 I think stability has become better in the past few months.
15:42:16 idk if/what more can be done
15:42:27 monitoring is important.
15:42:32 we can do more there.
15:42:47 your mail talks about Monit
15:42:51 yep.
15:43:30 other than that I think we should have enough data that we can tolerate missing data from single failing instances.
15:43:38 well, make sure we have enough data, that is.
15:44:19 in all these scaling considerations we'll have to keep in mind that resources are available for 1 year only.
15:44:35 ah yes
15:44:40 if we scale too much now, we'll have to scale down more in 1 year.
15:44:40 right, yes
15:45:11 or it will be hard to maintain.
15:45:13 I mentioned serving data in my mail.
15:45:26 we'll want to serve measurement data of all these instances.
15:45:36 we need processes and documentation and guidelines for that.
15:46:05 we already need this for our three instances, but it will be more work for 9 or even more instances.
15:46:21 this will be clearer once tpo/metrics/onionperf#40001 is a thing.
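To make "serving measurement data" slightly more concrete, here is a minimal sketch of what publishing one onionperf-data tarball could look like if the S3 option mentioned earlier is chosen. The bucket name, key prefix, and tarball file name are hypothetical placeholders, not decisions from this meeting; linking the uploads on the right pages and validating tarball contents would still be separate steps.

```python
# Sketch of the "serve measurement data tarballs from S3" idea using boto3.
# Bucket, prefix, and file name below are illustrative placeholders.
import os
import boto3

def upload_tarball(path, bucket="example-onionperf-archive", prefix="tarballs"):
    """Upload one onionperf-data tarball and return the object key used."""
    key = f"{prefix}/{os.path.basename(path)}"
    s3 = boto3.client("s3")
    s3.upload_file(path, bucket, key)
    return key

if __name__ == "__main__":
    # e.g. a tarball of one instance's onionperf-data/ directory
    print(upload_tarball("op-example-2020-09.tar.xz"))
```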
15:46:23 would that kind of work go under a ‘visualization’ objective? sorry, i’m not totally sure what serving the data means — like getting it to metrics.tpo?
15:46:42 making sure that tarballs go to S3 and are linked on the right pages.
15:47:06 together with configuration details, maybe after passing a validation script that everything in the tarball is good data.
15:47:19 this will be even more important for experiments.
15:47:24 i see
15:47:33 processing
15:47:39 the boring part of doing experiments: documenting what you did.
15:48:33 right now we have 4 objectives that would be important (they are in the meeting pad). where would documentation go?
15:48:39 nevermind
15:48:41 obj 4
15:48:54 i added it to obj 4 — but that can be moved if it doesn’t make sense
15:48:58 what's the difference between 1 and 2?
15:49:29 my understanding is that we need to develop automated deployment tools
15:49:33 1 is writing the scripts, and 2 is executing them and making sure everything's deployed?
15:49:35 then use them to deploy 9 instances
15:49:51 right, yes
15:50:03 does that make sense?
15:50:04 right. they could be combined into one objective.
15:50:16 got it
15:50:24 hmm.
15:50:38 they can be separate, I just didn't understand the difference. I do understand now.
15:50:47 maybe objective 2 should be at the start.
15:50:58 that's what we want to do: have more measurements in more places.
15:51:22 automating this should happen early in the project, but we might start with setting up things manually and improving automation over time.
15:51:36 same with monitoring. we would start with the simple monitoring we do now and improve over time.
15:51:48 the important thing is to start doing measurements as soon as we have the resources available.
15:52:03 and note how we set up a new set of measurement instances every month right now.
15:52:20 we would likely start with a manual setup on day 1, even if that takes the whole day.
15:52:29 and be happy how it only takes 2 hours the month after.
15:52:42 ok
15:53:25 we should include acute in this conversation.
15:53:40 she probably has many ideas on the automation part.
15:54:09 phw_: do you have thoughts on scaling up monitoring if we have 9 or more onionperf instances?
15:54:31 ooh could we use aws for large shadow simulations?
15:54:40 oh, maybe!
15:54:43 that could be a temporary usage of the 1yr capacity
15:54:49 absolutely.
15:55:02 I will need machines for sims like that
15:55:22 as a next step, alsmith: are you ok with adding all this to the google doc where we have the proposal, and we all continue writing it there?
15:55:29 the previous plan was to beg pastly but I bet that would be complicated because of NRL
15:55:40 gaba - yes
15:56:07 mikeperry: maybe we'll need an objective for making sure that shadow and onionperf results are comparable in some regard.
15:56:26 rather than just "give us resources so that we can run large simulations."
15:56:36 karsten: right now we have a single monitor that checks our instances, right?
15:56:53 phw: we have a monitor on each instance, and they all check themselves and each other.
15:56:57 what’s the best way to flesh out these objectives and activities outside of the meeting, once we move it to the google doc? i need to jump to another meeting in 2 min
15:57:12 phw: but we do not have a central monitoring instance.
15:57:36 alsmith: can we invite everyone who participated in this discussion, plus acute, and talk more via mail/gdoc?
15:57:40 karsten: i have nothing useful to add off the top of my head. my monit experience is based on a central monitoring instance
15:57:44 alsmith: what about changing the objectives and proposal based on what we talked about now, and then continuing over email?
15:57:58 phw: okay!
15:57:59 gaba & karsten - sounds good
15:58:05 we should also ask hiro!
15:58:08 we can have a voice meeting if we need to after the discussion over email
15:58:14 ok!
15:58:17 great!
15:58:21 gotta end the meeting now.
15:58:28 thanks, everyone! bye! o/
15:58:31 thank you everyone o/
15:58:34 o/
15:58:34 clearing the channel in 5, 4, ...
15:58:38 #endmeeting