15:01:16 #startmeeting metrics team meeting
15:01:16 Meeting started Thu Feb 20 15:01:16 2020 UTC. The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:16 Useful Commands: #action #agreed #help #info #idea #link #topic.
15:01:27 please add more topics, if you want.
15:03:15 let's start. if more topics come up, we can append them.
15:03:20 Review tasks from roadmap session (ticket creation, old cards in trello)
15:03:33 gaba: ^
15:04:28 yes
15:04:47 there are still some tickets that need to be created. I wanted to check with you all before going and creating them myself.
15:05:02 they are the issues marked as NEED_TICKET in the temporal roadmap pad
15:05:30 the update (shown as strikethrough) is that I already added all other tickets and imported them into the trello roadmap
15:06:51 sorry about that, I thought all onionperf tickets were done
15:07:10 and this brings us to the item about migrating from gitlab to trac
15:07:27 I am creating the remaining ones just now
15:07:27 that irl added?
15:07:45 thanks acute!
15:08:01 we didn't move the onionperf tickets yet
15:08:08 if that's what you're asking
15:08:22 we have moved all the metrics-cloud PRs, and even processed, reviewed and merged them
15:08:50 ok
15:10:53 I added a comment to a NeedsTicket line. do you need anything else from me?
15:11:03 nop. I think we are fine
15:11:43 okay.
15:12:06 what else remains on this topic?
15:13:00 I think we are done. I will include those into the roadmap once we have them.
15:13:14 sounds great!
15:13:26 okay, moving on.
15:13:32 GeoIP database (karsten)
15:14:04 last week we considered moving to another database provider. but it turns out we'd be running into similar issues as with the current database.
15:14:10 CCPA.
15:14:23 yeah
15:14:30 (California Consumer Privacy Act)
15:14:43 we should talk about potential workarounds.
15:14:49 i'm sure this has to be a bad interpretation of the act
15:14:57 or the legislators are incompetent
15:15:26 that's a fine question for a lawyer.
15:15:33 i like the idea of removing the need for clients to have a database
15:15:44 yes, that's one option.
15:15:51 which doesn't fully solve the problem, but part of it.
15:15:54 i also like the idea of distributing a cut down version of the database as a new dirauth document
15:16:10 cut down how?
15:16:20 subnet -> country code is i think all we need
15:16:27 ah, yes.
15:16:59 I mentioned cutting out californian addresses, and you didn't like that. why?
15:17:29 a) it's ineffective and i don't want to encourage stuff like this, b) it's a slippery slope
15:18:06 the two ideas above (no need for clients to use database, distribute via dirauths)
15:18:13 only work in 6 months to 2 years from now.
15:18:26 depending on how fast we get this implemented and depending on release cycles.
15:18:52 right but until then nothing actually breaks
15:18:53 cutting out addresses might be a short-term solution.
15:18:59 well, yes.
15:19:33 and we have plenty of relays submitting stats with old geoip files
15:20:04 (enthusiastic students watching should think about extrapolating network stats from only those relays with recent geoip files and seeing how the metrics change)
15:20:12 hehe
15:20:29 okay, I can see both sides here.
15:20:37 I can see how it would be effective in a legal sense.
15:20:47 and I can see how we survive 6 months to 2 years without new geoip data.
15:21:20 are these the two ideas we should pursue?
15:21:35 we would still have to find out whether they would be acceptable.
15:21:39 it's the being able to distribute updates that could be the problem
15:21:41 and then talk about design and code.
15:21:48 which part?
15:22:12 i remember reading we could ship a database with the product but each user had to sign up to receive updates
15:22:20 i don't remember which provider this was
15:22:33 yes, that's something to discuss with them.
15:22:57 I could imagine getting an exception for that. it's the part where they need to make sure that we make sure our users get updates.
15:23:24 so, yes, we should be sure about that before writing a proposal or any code.
15:23:28 let's write a document to describe what we want to do
15:23:57 I think regardless of the questions to IP2Location we should consult a lawyer
15:24:17 i think we need to work out what to ask the lawyer first
15:24:34 unless we have some big fund for lawyer questions
15:24:37 my plan was to write that document in an email.
15:25:01 we can discuss that internally before asking them. should we do that?
15:25:08 yeah
15:25:11 ok
15:25:31 and yes, asking a lawyer should be part of the process. not necessarily step 1, but somewhere.
15:25:52 okay.
15:26:05 that's all from me on this topic. moving on?
15:26:08 ok
15:26:24 Onionoo 8.0 upgrade (karsten)
15:26:28 currently in progress.
15:26:42 is RS going to work with this?
15:27:08 I guess we'll find out.
15:27:21 yes
15:27:24 it will work
15:27:42 the updater is currently running, and I'm going to update the server after this meeting. and the metrics website documentation.
15:27:45 cool!
15:28:06 nice catch with running tests in 1990, by the way. ;)
15:28:21 okay, that's all on this topic.
15:28:24 i have expenses to submit for my time travel
15:28:33 ah, things were cheap back then.
15:28:37 heh
15:28:48 Simplifying (but also breaking) TorDNSEL (irl)
15:29:23 we can make a simple dns service to replace the current thing, instead of replicating the functionality
15:29:38 it would tell you if an exit relay is an exit relay or not, but not look at exit policies
15:30:03 exit lists don't contain exit policies so this gets a lot more complicated than i had originally imagined it would be
15:30:28 for consumers i know of, they don't care so much if it's the simple method or not
15:30:30 hmm, how does the current thing solve this?
15:30:35 but exit operators might notice
15:30:49 the current thing downloads server descriptors from collector on a schedule
15:31:40 this is a fine question...
15:31:41 so if it's just for exit relays and not for exit relays by dest ip/port, we can actually just write out a bind compatible zone file
15:31:55 and that means that the task is 1 pt
15:32:55 how can we ask users?
15:33:17 i have privately spoken to irc network operators
15:33:38 i don't know who the other users might be
15:33:54 the exit operators through the tor-relay mailing list?
15:34:09 i could write a mail on tor-relays@
15:34:21 and exit relay or not is determined by having the Exit flag?
15:34:32 or having an exit policy that is not reject *?
15:34:49 exit relay ip addresses are ip addresses we found in exit lists
15:35:44 how do we decide which relays we scan?
15:35:55 and is that different with the current and the new thing?
15:36:13 exits that are allowed to connect to https websites
15:36:27 we did a comparison on how it is different, and it's not that different
15:36:40 okay.
15:36:50 the difference is that in the past you could reject say freenode:6667
15:36:54 I like simple. I just don't feel able to answer whether it's good enough.
15:37:02 and then freenode wouldn't list your ip as a tor exit when they queried
15:37:16 the alternative is spending two more weeks on this
15:37:26 I can imagine.
15:37:27 maybe 3 or 4 when i find the complications
15:38:12 okay, I think asking on tor-relays@ is a reasonable next step. and then decide based on feedback.
15:38:18 yeah agreed
15:38:33 ok that's probably all on this topic
15:38:46 great!
15:38:52 Onionoo new hosts (irl)
15:39:01 we got these hosts
15:39:03 should we use them?
15:39:21 how many?
15:39:24 we have 2 now
15:39:34 to replace omeiense and oo-hetzner-03?
15:39:44 yeah
15:39:58 let's do it. when?
15:40:06 next week?
15:40:13 good question
15:40:49 4th march?
15:41:08 or 5th march
15:41:30 the plan would be to set up 2 new hosts, but not switch immediately?
15:41:40 and when everything looks good, switch and keep the old ones around?
15:41:50 and then when everything still looks good, kill the old ones?
15:41:53 yeah
15:42:08 i think they are currently masked in the varnish config
15:42:36 how long do we need for step 1? an hour or two?
15:43:12 if so, maybe we can do it sooner, and then do the next step in march.
15:43:21 leaving enough time between steps to do checks.
15:43:40 the time consuming bit is syncing the state
15:43:56 you mean copying over the directory from omeiense?
15:44:04 yeah
15:44:51 okay. let's note down march 4 or 5, and do it sooner if we find free time.
15:44:54 but also i have none of this in my head right now, i have to read the docs again
15:45:31 we can look again at the next meeting to fix when we do it
15:45:57 ok.
15:46:30 (Quick) Questions on Metrics DB / Metrics Website Refresh (dj)
15:46:39 My current understanding of the Metrics Backend is drawn from the March 2019 report by Karsten and irl which was super helpful. (Found at https://research.torproject.org/techreports/metrics-evaluation-2019-03-25.pdf )
15:46:49 It suggests the back end is currently mostly Java + R reading from CSVs / PostgreSQL databases which are created from various tor-specific flat files.
15:47:00 1) Is this correct / up to date?
15:47:13 yes
15:47:15 A few weeks ago it was mentioned that the Metrics Website is getting a refresh and there will be some UX research prior to that.
15:47:22 2) Any idea on timeline? When is the work expected to take place and the new website land?
15:47:33 (I tried to find this myself, but could not track it down)
15:47:35 not the metrics website, but a portal to metrics data
15:48:08 So this will be an alternative set of visualisations of the same data? Using the same backend?
15:48:33 it's more of an index although we might have visualisations in it
15:48:39 the portal work on the user research part will be done in August. We will be implementing it after that depending on capacity.
15:49:00 Okay - so it is more like Onionoo? I don't quite know what to picture sorry
15:49:23 https://data.gov.uk/ but for tor
15:49:36 simply secure is working on the assumption that we are going to have visualizations there and it will be a useful place for researchers, journalists and whoever wants to use the data from Tor
15:49:43 yeah
15:50:27 Okay, great. So will this involve back end work on how metrics are stored?
15:50:34 if it does, we did it wrong
15:50:39 Haha. okay :)
15:51:02 this will also help us to make other data available more easily
15:51:08 That's super helpful, thank you both for saving me a lot of time looking through Tracs and Pads!
15:51:13 say if we did a one-off analysis and have some CSV files that will never be updated
15:51:19 Ah! Gotcha!
15:51:36 Yeah - there's a ton of stuff like that hanging around and it would be great to keep it together
15:51:46 Final question / motivation
15:52:03 Was any thought given to using a backend for time series metrics like InfluxDB or Grafana or whatever?
15:52:21 Has it been tried? Or rejected for some reason?
15:52:50 i have been working on/off on a backend using apache beam for windowed batch jobs to get more real-time output
15:53:12 like extrapolating network traffic from the subset of relays that updated extra-info descriptors more recently
15:53:50 it would be awesome to have a funded project and make this a real thing, but we're never going to have the time unless it's funded
15:54:02 and metrics is not the big cool thing that gets funded
15:54:20 Okay, that's interesting. Thanks!
15:54:40 Yeah, I appreciate that, although really (IMO) it should be the first line in any perf / scaling proposal
15:54:47 "All the metrics!"
15:55:03 yes we should have metrics in every proposal
15:55:08 otherwise how do you know if you did anything?
15:55:14 Amen!
15:55:47 Thanks for the info. Saves me a lot of time. Context: students looking for mini projects and something like this as PoC came up.
15:55:55 while we're at it, let's all pay for the current metrics proposal to come through.
15:55:58 pray*
15:55:58 Wanted to check it wasn't already being done / had been done / didn't work
15:56:00 not pay.
15:56:19 lol
15:56:24 so many things to do.
15:56:51 dennis_jackson: i can email you a "requirements" doc if you have students that will make it a thing
15:57:09 The mozilla proposal? Any idea when they announce?
15:57:42 irl: That would be super handy. I can't promise anything though.
15:57:52 ok
15:57:54 hopefully very soon :)
15:57:56 I.e. if you one already. please do send it, but don't make one or spend time on it
15:58:05 if you have one already*
15:58:21 gaba: fingers crossed then :)
15:58:35 i have all the things on postit notes, i can put them in a doc
15:58:42 approaching the hour. anything else for today?
15:58:54 irl: please copy me, if you write something. curious.
15:59:04 nothing else from me
15:59:06 will do
15:59:07 irl: that would be great then
15:59:10 nothing else from me
15:59:28 thanks, everyone! talk to you next week!
15:59:32 #endmeeting
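
Note on the TorDNSEL item (15:31:41, "we can actually just write out a bind compatible zone file"): a minimal sketch of what that could look like, not the team's implementation. It assumes the input is a plain list of IPv4 addresses already taken from exit lists, and it follows the common DNSBL convention of reversed-octet names answering with 127.0.0.2. The zone name dnsel.example.org., the TTL, the serial and the SOA timers are hypothetical placeholders, and exit policies are deliberately not consulted, matching the simplified service described in the meeting.

```python
import ipaddress


def reverse_name(ip: str) -> str:
    """Return the DNSBL-style owner name for an IPv4 address (octets reversed)."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split(".")))


def build_zone(exit_ips, origin="dnsel.example.org.", ttl=1800, serial=2020022001):
    """Render a BIND-style zone file listing every exit address once.

    origin, ttl, serial and the SOA timers are placeholders (assumptions),
    as is the 127.0.0.2 answer, which is only the usual DNSBL convention.
    """
    lines = [
        f"$ORIGIN {origin}",
        f"$TTL {ttl}",
        f"@ IN SOA ns.{origin} hostmaster.{origin} ( {serial} 3600 600 86400 {ttl} )",
        f"@ IN NS ns.{origin}",
    ]
    # De-duplicate and sort numerically so the output is stable between runs.
    for ip in sorted(set(exit_ips), key=ipaddress.IPv4Address):
        lines.append(f"{reverse_name(ip)} IN A 127.0.0.2")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Documentation-range addresses stand in for real exit list entries.
    print(build_zone(["198.51.100.7", "192.0.2.1"]), end="")
```

A consumer would then query, for example, 1.2.0.192.dnsel.example.org and treat any answer as "this address was seen in an exit list". Per-destination answers (the old "reject freenode:6667" behaviour) cannot be expressed in a static zone like this, which is the trade-off raised in the discussion.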