14:59:14 #startmeeting metrics team meeting
14:59:14 Meeting started Thu Aug 27 14:59:14 2020 UTC. The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:59:14 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:59:16 hi mikeperry!
14:59:18 o/
14:59:22 hi dennis_jackson!
14:59:22 o/
14:59:31 hi jnewsome!
14:59:42 hello all
15:00:01 hi!
15:00:10 https://pad.riseup.net/p/tor-metricsteam-2020.1-keep <- pad
15:00:12 hi gaba!
15:00:28 hi everyone!
15:00:37 hi acute!
15:02:05 okay, I added two topics for today. do we have more?
15:03:38 let's start, and if more topics come up, append them to the agenda.
15:03:44 OnionPerf 0.7 release
15:03:51 I added one. if you're going on leave at end of sept we should try to do some trial experiments on CBT to make sure the workflow is ok and I understand how to get data, etc
15:04:01 today's the final date of the current roadmap.
15:04:04 mikeperry: sounds great!
15:04:27 for 0.7, we have two changes according to the change log.
15:04:47 https://gitlab.torproject.org/tpo/metrics/onionperf/-/blob/develop/CHANGELOG.md
15:05:17 I'm wondering what to do about #33399 here.
15:05:37 mikeperry, you mentioned on #33420 that we'll want to drop timeouts together with guards.
15:05:45 we're not doing that yet.
15:05:54 should we try to include that in 0.7?
15:06:11 otherwise, if we change it later, the behavior of 0.7 and 0.8+ will be different.
15:06:26 hrm yah
15:06:38 I'm just not sure what happens if we send a DROPTIMEOUTS command and tor doesn't understand that.
15:06:50 or, if we can handle that, how we should handle that.
15:07:03 ignore that we cannot drop timeouts, or die?
15:07:22 just adding in a DROPTIMEOUTS call where DROPGUARDS is done should be sufficient.. but yeah if tor doesn't support it, or if we're not properly able to remove measurements before a new timeout is learned, then the data is not useful
15:07:26 (dying seems harsh, just throwing it out here.)
15:07:40 die with a warning then?
15:08:20 maybe?
15:08:25 dying seems ok so long as we only die if --drop-guards was specified (ie we don't try DROPTIMEOUTS if --drop-guards was enabled)
15:08:34 yes, right.
15:08:41 err don't try=>only try...
15:08:48 yes, exactly
15:09:10 okay, let's try that.
15:09:29 who picks #33399? would be good to get this resolved this week.
15:09:44 (I can pick it if nobody wants.)
15:10:30 okay, picking!
15:10:45 that's all for 0.7 from me.
15:11:04 moving on to the next roadmap?
15:11:27 Roadmap for OnionPerf 0.8
15:11:43 I'll be gone in 3.5 weeks.
15:12:02 and ideally we'll have 0.8 out earlier than that, so that we can start new measurements with 0.8.
15:12:34 how about we go through the tickets on the board and see which one fits into 0.8?
15:12:48 https://gitlab.torproject.org/tpo/metrics/onionperf/-/boards
15:13:01 sounds good
15:13:14 going from right to left, top to bottom:
15:13:21 tpo/metrics/onionperf#33260
15:13:36 we're almost there, let's include it.
15:13:50 +1
15:14:21 great. so long as we can remove a list of fingerprints, then I can test the Fast and Guard relay cutoffs too
15:14:26 tpo/metrics/onionperf#33399 is already part of 0.7.
15:14:36 assuming I can get access to data and apply those filters + graph
15:14:52 mikeperry: that should work, yes.
15:15:11 mikeperry: you can also start trying that as soon as #33260 is merged to the develop branch, if you want.
15:15:44 speaking of, should we include a section in the readme for filtering?...
15:15:56 ok. that would be great if you can walk me through that process as soon as develop is ready
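
As a rough illustration of the fingerprint-list filtering discussed above: a minimal sketch, assuming the list is built with stem outside of OnionPerf and then handed to the new filter option; the bandwidth cutoff and output file name are made up for the example.

    # Sketch: build a fingerprint list of relays to exclude from measurements,
    # e.g. everything below a bandwidth cutoff or without the Guard flag.
    import stem.descriptor.remote

    BANDWIDTH_CUTOFF = 2000  # made-up consensus weight threshold

    consensus = stem.descriptor.remote.get_consensus().run()

    with open('relays-to-exclude.txt', 'w') as out:  # assumed file name
        for router in consensus:
            if router.bandwidth < BANDWIDTH_CUTOFF or 'Guard' not in router.flags:
                out.write(router.fingerprint + '\n')
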
15:15:57 let's put that on the list...
15:16:04 yes, will do!
15:16:09 karsten: happy to have a go at this
15:16:24 acute: the readme?
15:16:30 yes
15:16:36 cool! commenting on the ticket now.
15:17:21 done.
15:17:36 tpo/metrics/onionperf#34231
15:17:42 is https://gitlab.torproject.org/tpo/metrics/onionperf/-/issues/33328 a dup?
15:18:17 yes, it is the objective from the project
15:18:19 mikeperry: that's a "Project" ticket.
15:18:43 I have been ignoring those as well as I could.
15:19:07 oh from trac's old parent tickets?
15:19:32 does this mean the objective is complete once we implement #33260?
15:20:27 #33260 sounds like it meets what I need. if so, then yes
15:20:44 so long as it doesn't explode if the list of relays to remove is too large or something like that
15:20:58 we never tried, but it shouldn't. :)
15:21:13 filtering by fingerprints was the most basic way of filtering we came up with.
15:21:13 then I can just generate fingerprint lists in stem on the side and filter arbitrarily that way
15:21:22 right. that was the plan.
15:21:38 the original plan was to import tor descriptors and do more sophisticated filters in onionperf.
15:21:57 we can still do that at a later time. but for now, using stem to generate fingerprints and handing those over to onionperf is the way to go.
15:22:51 okay, going back to tpo/metrics/onionperf#34231:
15:22:58 acute: should we move that to backlog?
15:23:19 with the reasoning that we already have a way to map tgen and tor parts.
15:23:35 we can still do the more elegant way later, but it's not a blocker right now.
15:23:40 I don't think there is any rush to include it in 0.8
15:23:49 so we can
15:23:52 okay. moving it.
15:24:18 tpo/metrics/onionperf#33420
15:24:37 I'd like to keep that for 0.8.
15:25:07 it's also related to mikeperry's trial experiment/analysis idea.
15:25:26 yeah I will likely need to work with that before you come back
15:25:37 so making sure it does stuff properly first is wise
15:25:45 yep. let's keep it then.
15:26:47 tpo/metrics/onionperf#40001
15:27:08 I wonder if I could have some help with that.
15:27:39 for example, part of this documentation includes the setup of our long-running instances.
15:27:44 I can help by trying to use the docs and whining and crying when I get confused :)
15:27:53 yes, that _is_ helpful!
15:28:10 let's try that as soon as filters are in the develop branch, okay?
15:28:15 ok
15:28:30 ok, so I've actually not set up one of our onionperfs
15:28:50 but I did examine the setup of op-ab, so I could attempt to draft something
15:28:51 would you want to do that together with me, and we write the documentation as we go?
15:29:02 yes, that sounds great!
15:29:08 awesome!
15:30:20 great. let's pick a date and time offline.
15:30:36 cool!
15:30:56 tpo/metrics/onionperf#33421
15:31:17 it's still a lot of work.
15:31:37 and the first part would be to understand how exactly guards work.
15:31:39 ;)
15:31:56 mikeperry: maybe you could help with the first part there?
15:31:56 :P
15:32:09 also one of the experiments I want to do is use more than one guard at once. this should improve long-tail performance
15:32:12 https://gitlab.torproject.org/tpo/metrics/onionperf/-/issues/33421#note_2706521
15:32:43 via torrc Num*Guards settings
15:33:08 that sounds doable.
15:33:19 adding more torrc options is easy in onionperf.
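
For the Num*Guards experiment just mentioned, a minimal sketch of launching a tor client with more than one guard via stem, assuming NumEntryGuards is the knob being tuned; OnionPerf's own way of passing extra torrc lines may differ, and the ports and data directory below are illustrative only.

    # Sketch: start a tor client that keeps more than one guard in use,
    # via the NumEntryGuards torrc option.
    import stem.process

    tor_process = stem.process.launch_tor_with_config(
        config={
            'SocksPort': '9050',                       # illustrative ports
            'ControlPort': '9051',
            'DataDirectory': '/tmp/tor-multi-guard',   # illustrative path
            'NumEntryGuards': '2',                     # use two guards at once
        },
    )
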
15:33:53 the hard part of this issue is to find out what exactly in the tor logs we'd like to process in onionperf.
15:34:07 well, as the comment on the issue says.
15:34:26 can't we just use GUARD events and compare to circuit path lines from the control port?
15:34:45 the GUARD events, even the recently fixed ones, are possibly insufficient for this.
15:34:54 like if I have a data file that records GUARD events, and also path lines, in theory I can do checks on that myself
15:35:01 oh
15:35:36 again, this whole guards thing is a mystery. with all the different sets of candidates, primary guards, and so on.
15:35:42 maybe I'm wrong, and they are sufficient.
15:35:56 that would be the good result of this first analysis.
15:36:12 the not-so-good result would be that we'll have to fix GUARD events even more.
15:36:20 because we're not going to parse tor logs in onionperf, just torctl logs.
15:36:43 Question: Are the experiments intended to find bugs in how Tor handles Guards etc?
15:37:10 If not, just using stem directly steps over that issue right? At least, that's what I've done to avoid having to dig into the issue too much
15:37:27 how did you use stem?
15:37:35 Programmatically building the circuits I wanted directly
15:37:52 ah, that would be a huge change to what onionperf does right now.
15:37:57 onionperf lets tor itself choose paths
15:38:18 Sure yes, but that's why I asked what you want to measure
15:38:28 we might use stem to ask tor what guards it uses.
15:38:34 and log that.
15:38:39 so we need to record those paths, and the output of GUARD events, and see if tor is doing the right thing when we tell it to use 1 guard, or 2 guards, or 3 guards
15:38:44 that would work around relying on events.
15:38:47 ah okay
15:38:56 this GUARD event is a sad stateful mess
15:39:19 how about this: I can spend a few hours on this to get this analysis started.
15:39:36 I'm just not sure if we'll get it resolved in time for 0.8.
15:39:37 it should have just told us what Tor thinks the current guards are right now instead of all this stateful per-guard UP/DOWN info
15:40:05 never too late to add another event type...
15:40:43 UP/DOWN also seem not necessarily correlated with in-use
15:40:54 they might just mean possible to use
15:41:09 same thing for BAD/GOOD
15:41:37 mikeperry: do you want to take a closer look at this first and comment on the ticket before I do something there?
15:42:43 in any case, let's keep it in the roadmap, though it might turn out to be too big for 0.8.
15:43:13 quickly looking through "Backlog".
15:43:33 yeah I think this GUARD event requires knowledge of prop271 internal tor state to make use of
15:43:35 I don't think there's room for more, if we want to finish in 2-2.5 weeks.
15:43:47 mikeperry: sounds like it.
15:43:48 it is just telling us about *potential* guards, not the one that prop271 decided was the best
15:43:56 if I am reading the patch right
15:44:04 ugh I should have looked at that earlier
15:44:19 do you want to write a better tor patch, and we run that in onionperf for a while?
15:45:20 yeah I should at least try.. but you're right.. it is not clear how the hell this GUARD event is *supposed* to tell you the current in-use guard(s)
15:45:28 because of all this primary secondary business
15:46:05 can I assign the issue to you for possible next steps?
15:46:08 yah
15:47:30 done.
15:47:33 thanks!
15:47:48 okay, I think that's the plan for 0.8 then.
15:48:02 anything else on the roadmap topic?
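
A minimal sketch of the "ask tor what guards it uses and log that" idea from the guards discussion above, assuming a stem Controller on the standard control port; the polling interval, logging format, and port are arbitrary choices for the example.

    # Sketch: periodically record the guards tor reports on the control port,
    # to compare against circuit path lines later.
    import logging
    import time

    from stem.control import Controller

    logging.basicConfig(level=logging.INFO)

    with Controller.from_port(port=9051) as controller:
        controller.authenticate()
        while True:
            # GETINFO entry-guards lists the guards tor currently knows about,
            # one per line; log the raw reply with a timestamp.
            guards = controller.get_info('entry-guards')
            logging.info('entry-guards at %d:\n%s', int(time.time()), guards)
            time.sleep(60)
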
15:48:32 moving on:
15:48:33 CBT trial experiments/analysis
15:49:04 15:05:25 <+mikeperry> I added one. if you're going on leave at end of sept we should try to do some trial experiments on CBT to make sure the workflow is ok and I understand how to get data, etc
15:49:29 assuming we implement something like I suggested on tpo/metrics/onionperf#33420 today,
15:49:30 so for that, I will want to run my own onionperf instance and examine and graph output most likely
15:50:13 it already smells fishy if those values you posted on tpo/metrics/onionperf#33420 are real
15:50:22 they are.
15:50:47 I wonder if it's easier to just analyze the torctl logs directly first.
15:51:05 if the goal is to get a feel for the data.
15:51:56 can you write down what exactly you're interested in, and I run a quick analysis on the torctl logs locally?
15:52:01 well the goal is to make sure I know enough onionperf kungfu to be able to diagnose and fix the issue later, as well as tune the quantile value via a custom tor patch
15:53:09 for the tuning, I want to see what different values of cutoff_quantile do to the actual timeout rate
15:53:13 I'm just not sure if we should add all the buildtimeout values to graphs and/or the CSV output.
15:53:22 and to TTFB and throughput metrics
15:54:14 okay. in that case let's try to do this in onionperf, as part of #33420.
15:54:30 if it is simpler to keep the fields you already put in, that is sufficient
15:54:44 I can do debugging and analysis with torctl logs, as you said, yah
15:54:50 of a custom onionperf
15:55:08 well, that might be easier.
15:55:16 we can later add new stuff to onionperf for this.
15:55:27 but knowing what exactly we're interested in would help with that.
15:55:50 when we do the full experiment on the live network, we will want to be able to mark which sections of the onionperf graphs used what cutoff_quantile
15:56:01 you don't even need a custom onionperf (in terms of patched). onionperf already writes torctl logs containing all those events.
15:56:16 and also know what their timeout_rate was (and what onionperf thinks the timeout+failure rates were) at those times
15:56:23 ah ok
15:56:49 sounds like we'll need to discuss that more.
15:56:52 I do need to patch tor if I want to change cutoff_quantile locally (as opposed to network-wide in consensus)
15:56:53 (90 seconds left)
15:57:00 oh, right.
15:57:08 I mean, sounds plausible. I wouldn't know for sure.
15:57:24 let's use the last minute for the last topic:
15:57:28 Simply Secure and Tor UX are running a survey to collect user feedback about the metrics website. Please participate! I plan to email the lists next month to encourage people to do it (antonela)
15:57:32 https://tools.simplysecure.org/survey/index.php?r=survey/index&sid=39865&lang=en
15:57:45 I added something to the metrics website for that.
15:57:48 yes, thanks Karsten for pushing the banner live!
15:57:50 with a link.
15:57:53 sure!
15:58:02 this work has OTF funding, and given the current situation, the work has a stop order. We will use the survey to collect info until things go back to normal.
15:58:14 I'll email the lists next month to call for participation
15:58:22 sounds great!
15:58:26 thanks for this!
15:58:28 all people here should jump in!
15:58:34 of course
15:58:46 time's up! I think there's another meeting after this.
15:58:58 thanks, everyone! talk to you next week. o/
15:59:01 will do :)
15:59:10 #endmeeting
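
A minimal sketch of the kind of torctl-log analysis discussed under the CBT topic, assuming the log contains raw BUILDTIMEOUT_SET event lines with the KEY=VALUE fields from the control spec; the file name and exact line layout are assumptions.

    # Sketch: extract CUTOFF_QUANTILE and TIMEOUT_RATE from BUILDTIMEOUT_SET
    # events found in a torctl log, to relate the configured quantile to the
    # timeout rate tor actually observed over time.
    import re

    EVENT_RE = re.compile(r'BUILDTIMEOUT_SET \S+ (.*)')  # fields follow the set type

    def buildtimeout_sets(path):
        with open(path) as log:
            for line in log:
                match = EVENT_RE.search(line)
                if not match:
                    continue
                fields = dict(kv.split('=', 1) for kv in match.group(1).split() if '=' in kv)
                if 'CUTOFF_QUANTILE' in fields and 'TIMEOUT_RATE' in fields:
                    yield float(fields['CUTOFF_QUANTILE']), float(fields['TIMEOUT_RATE'])

    for quantile, timeout_rate in buildtimeout_sets('onionperf.torctl.log'):  # assumed path
        print(quantile, timeout_rate)
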