15:01:16 <karsten> #startmeeting metrics team meeting
15:01:16 <MeetBot> Meeting started Thu Feb 20 15:01:16 2020 UTC. The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:16 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:01:27 <karsten> please add more topics, if you want.
15:03:15 <karsten> let's start. if more topics come up, we can append them.
15:03:20 <karsten> Review tasks from roadmap session (ticket creation, old cards in trello)
15:03:33 <karsten> gaba: ^
15:04:28 <gaba> yes
15:04:47 <gaba> there are still some tickets that need to be created. I wanted to check with you all before going and creating them myself.
15:05:02 <gaba> they are the issues marked as NEED_TICKET in the temporal roadmap pad
15:05:30 <gaba> the updated (that is strikethrough) is that already added all other tickets and imported them into the trello roadmap
15:06:51 <acute> sorry about that, I thought all onionperf tickets were done
15:07:10 <gaba> and this brings us to the item about migrating from gitlab to trac
15:07:27 <acute> I am creating the remaining ones just now
15:07:27 <gaba> that irl added?
15:07:45 <gaba> thanks acute!
15:08:01 <irl> we didn't move the onionperf tickets yet
15:08:08 <irl> if that's what you're asking
15:08:22 <irl> we have moved all the metrics-cloud PRs, and even processed, reviewed and merged them
15:08:50 <gaba> ok
15:10:53 <karsten> I added a comment to a NeedsTicket line. do you need anything else from me?
15:11:03 <gaba> nop. I think we are fine
15:11:43 <karsten> okay.
15:12:06 <karsten> what else remains on this topic?
15:13:00 <gaba> I think we are done. I will include those into the roadmap once we have them.
15:13:14 <karsten> sounds great!
15:13:26 <karsten> okay, moving on.
15:13:32 <karsten> GeoIP database (karsten)
15:14:04 <karsten> last week we considered moving to another database provider. but it turns out we'd be running into similar issues as with the current database.
15:14:10 <karsten> CCPA.
15:14:23 <irl> yeah
15:14:30 <karsten> (California Consumer Privacy Act)
15:14:43 <karsten> we should talk about potential workarounds.
15:14:49 <irl> i'm sure this has to be a bad interpretation of the act
15:14:57 <irl> or the legislators are incompetent
15:15:26 <karsten> that's a fine question for a lawyer.
15:15:33 <irl> i like the idea of removing the need for clients to have a database
15:15:44 <karsten> yes, that's one option.
15:15:51 <karsten> which doesn't fully solve the problem, but part of it.
15:15:54 <irl> i also like the idea of distributing a cut down version of the database as a new dirauth document
15:16:10 <karsten> cut down how?
15:16:20 <irl> subnet -> country code is i think all we need
15:16:27 <karsten> ah, yes.
15:16:59 <karsten> I mentioned cutting out californian addresses, and you didn't like that. why?
15:17:29 <irl> a) it's ineffective and i don't want to encourage stuff like this, b) it's a slippery slope
15:18:06 <karsten> the two ideas above (no need for clients to use database, distribute via dirauths)
15:18:13 <karsten> only work in 6 months to 2 years from now.
15:18:26 <karsten> depending on how fast we get this implemented and depending on release cycles.
15:18:52 <irl> right but until then nothing actually breaks
15:18:53 <karsten> cutting out addresses might be a short-term solution.
15:18:59 <karsten> well, yes.
15:19:33 <irl> and we have plenty of relays submitting stats with old geoip files
15:20:04 <irl> (enthusiastic students watching should think about extrapolating network stats from only those relays with recent geoip files and seeing how the metrics change)
15:20:12 <karsten> hehe
15:20:29 <karsten> okay, I can see both sides here.
15:20:37 <karsten> I can see how it would be effective in a legal sense.
15:20:47 <karsten> and I can see how we survive 6 months to 2 years without new geoip data.
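irl's cut-down alternative reduces the database to subnet -> country code mappings. As a minimal sketch of what a client-side lookup against such a table could involve, the Python below assumes a plain-text format modeled on tor's existing geoip file (one `IPLOW,IPHIGH,CC` line per range, IPv4 addresses written as integers) and does a binary search over the range starts. The format of the proposed dirauth document was not decided in this meeting, so `parse_ranges`, `lookup`, the `??` fallback, and the sample ranges are hypothetical and purely illustrative.

```python
import bisect
import ipaddress

# Hypothetical cut-down table, modeled on tor's geoip file format:
# "IPLOW,IPHIGH,CC" with IPv4 addresses as integers. Illustrative values only.
SAMPLE = """\
# comments and blank lines are ignored
16777216,16777471,AU
16777472,16778239,CN
16778240,16779263,AU
"""


def parse_ranges(text):
    """Return a list of (low, high, country_code) tuples sorted by low."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        low, high, cc = line.split(",")
        ranges.append((int(low), int(high), cc.strip()))
    ranges.sort()
    return ranges


def lookup(ranges, lows, address):
    """Return the country code for an IPv4 address, or '??' if no range matches."""
    ip = int(ipaddress.IPv4Address(address))
    i = bisect.bisect_right(lows, ip) - 1  # last range starting at or below ip
    if i >= 0 and ranges[i][0] <= ip <= ranges[i][1]:
        return ranges[i][2]
    return "??"


ranges = parse_ranges(SAMPLE)
lows = [r[0] for r in ranges]  # precomputed range starts for bisect
print(lookup(ranges, lows, "1.0.0.1"))  # AU
print(lookup(ranges, lows, "1.0.1.5"))  # CN
print(lookup(ranges, lows, "8.8.8.8"))  # ??
```

Sorting once and bisecting keeps each lookup logarithmic in the number of ranges, which matters little for a single client but keeps the table format trivially simple.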
15:21:20 <karsten> are these the two ideas we should pursue?
15:21:35 <karsten> we would still have to find out whether they would be acceptable.
15:21:39 <irl> it's the being able to distribute updates that could be the problem
15:21:41 <karsten> and then talk about design and code.
15:21:48 <karsten> which part?
15:22:12 <irl> i remember reading we could ship a database with the product but each user had to sign up to receive updates
15:22:20 <irl> i don't remember which provider this was
15:22:33 <karsten> yes, that's something to discuss with them.
15:22:57 <karsten> I could imagine getting an exception for that. it's the part where they need to make sure that we make sure our users get updates.
15:23:24 <karsten> so, yes, we should be sure about that before writing a proposal or any code.
15:23:28 <irl> let's write a document to describe what we want to do
15:23:57 <gaba> I think regardless of the questions to IP2Location we should consult a lawyer
15:24:17 <irl> i think we need to work out what to ask the lawyer first
15:24:34 <irl> unless we have some big fund for lawyer questions
15:24:37 <karsten> my plan was to write that document in an email.
15:25:01 <karsten> we can discuss that internally before asking them. should we do that?
15:25:08 <irl> yeah
15:25:11 <gaba> ok
15:25:31 <karsten> and yes, asking a lawyer should be part of the process. not necessarily step 1, but somewhere.
15:25:52 <karsten> okay.
15:26:05 <karsten> that's all from me on this topic. moving on?
15:26:08 <irl> ok
15:26:24 <karsten> Onionoo 8.0 upgrade (karsten)
15:26:28 <karsten> currently in progress.
15:26:42 <karsten> is RS going to work with this?
15:27:08 <karsten> I guess we'll find out.
15:27:21 <irl> yes
15:27:24 <irl> it will work
15:27:42 <karsten> the updater is currently running, and I'm going to update the server after this meeting. and the metrics website documentation.
15:27:45 <karsten> cool!
15:28:06 <karsten> nice catch with running tests in 1990, by the way. ;)
15:28:21 <karsten> okay, that's all on this topic.
15:28:24 <irl> i have expenses to submit for my time travel
15:28:33 <karsten> ah, things were cheap back then.
15:28:37 <irl> heh
15:28:48 <karsten> Simplifying (but also breaking) TorDNSEL (irl)
15:29:23 <irl> we can make a simple dns service to replace the current thing, instead of replicating the functionality
15:29:38 <irl> it would tell you if an exit relay is an exit relay or not, but not look at exit policies
15:30:03 <irl> exit lists don't contain exit policies so this gets a lot more complicated than i had originally imagined it would be
15:30:28 <irl> for consumers i know of, they don't care so much if it's the simple method or not
15:30:30 <karsten> hmm, how does the current thing solve this?
15:30:35 <irl> but exit operators might notice
15:30:49 <irl> the current thing downloads server descriptors from collector on a schedule
15:31:40 <karsten> this is a fine question...
15:31:41 <irl> so if it's just for exit relays and not for exit relays by dest ip/port, we can actually just write out a bind compatible zone file
15:31:55 <irl> and that means that the task is 1 pt
15:32:55 <karsten> how can we ask users?
15:33:17 <irl> i have privately spoken to irc network operators
15:33:38 <irl> i don't know who the other users might be
15:33:54 <gaba> the exit operators through the tor-relay mailing list?
15:34:09 <irl> i could write a mail on tor-relays@
15:34:21 <karsten> and exit relay or not is determined by having the Exit flag?
15:34:32 <karsten> or having an exit policy that is not reject *?
15:34:49 <irl> exit relay ip addresses are ip addresses we found in exit lists
15:35:44 <karsten> how do we decide which relays we scan?
15:35:55 <karsten> and is that different with the current and the new thing?
15:36:13 <irl> exits that are allowed to connect to https websites
15:36:27 <irl> we did a comparison on how it is different, and it's not that different
15:36:40 <karsten> okay.
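To illustrate the "just write out a bind compatible zone file" idea: if the service only answers "is this address an exit?" and ignores destination IP/port, the zone can be generated straight from the ExitAddress lines of an exit list. This is an assumption-laden sketch, not the planned implementation. The zone origin `dnsel.example.org.`, the name server `ns1.example.net.`, the reversed-octet query names, and the 127.0.0.2 answer (borrowed from the common DNSBL convention) are hypothetical choices, and the exit list excerpt uses documentation addresses and placeholder fingerprints.

```python
import ipaddress

# Hypothetical exit-list excerpt (documentation addresses, placeholder
# fingerprints). Only "ExitAddress <ip> <date> <time>" lines are used.
EXIT_LIST = """\
ExitNode 1111111111111111111111111111111111111111
Published 2020-02-20 08:05:33
LastStatus 2020-02-20 10:03:07
ExitAddress 192.0.2.10 2020-02-20 10:16:41
ExitNode 2222222222222222222222222222222222222222
Published 2020-02-20 09:12:00
LastStatus 2020-02-20 11:00:00
ExitAddress 198.51.100.7 2020-02-20 11:05:12
ExitAddress 192.0.2.10 2020-02-20 11:20:30
"""


def exit_addresses(text):
    """Collect the unique IPv4 addresses found on ExitAddress lines."""
    addrs = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "ExitAddress":
            addr = ipaddress.ip_address(parts[1])
            if addr.version == 4:
                addrs.add(addr)
    return addrs


def zone_file(addrs, origin="dnsel.example.org.", serial=2020022001):
    """Render a BIND-style zone with one A record per exit address.

    A query for the reversed octets of an address (e.g. 10.2.0.192 for
    192.0.2.10) under `origin` resolves to 127.0.0.2 if it is an exit and
    gets NXDOMAIN otherwise. No exit policy or destination port is considered.
    """
    lines = [
        f"$ORIGIN {origin}",
        "$TTL 1800",
        f"@ IN SOA ns1.example.net. hostmaster.example.net. ({serial} 3600 600 86400 1800)",
        "@ IN NS ns1.example.net.",
    ]
    for addr in sorted(addrs):
        name = ".".join(reversed(str(addr).split(".")))
        lines.append(f"{name} IN A 127.0.0.2")
    return "\n".join(lines) + "\n"


print(zone_file(exit_addresses(EXIT_LIST)), end="")
```

Because every exit gets the same static answer, a zone like this cannot express the older per-destination behavior discussed in this meeting (an exit rejecting, say, freenode:6667 so that freenode would not list its IP as a Tor exit); that is the trade-off being put to tor-relays@.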
15:36:50 <irl> the difference is that in the past you could reject say freenode:6667
15:36:54 <karsten> I like simple. I just don't feel able to answer whether it's good enough.
15:37:02 <irl> and then freenode wouldn't list your ip as a tor exit when they queried
15:37:16 <irl> the alternative is spending two more weeks on this
15:37:26 <karsten> I can imagine.
15:37:27 <irl> maybe 3 or 4 when i find the complications
15:38:12 <karsten> okay, I think asking on tor-relays@ is a reasonable next step. and then decide based on feedback.
15:38:18 <irl> yeah agreed
15:38:33 <irl> ok that's probably all on this topic
15:38:46 <karsten> great!
15:38:52 <karsten> Onionoo new hosts (irl)
15:39:01 <irl> we got these hosts
15:39:03 <irl> should we use them?
15:39:21 <karsten> how many?
15:39:24 <irl> we have 2 now
15:39:34 <karsten> to replace omeiense and oo-hetzner-03?
15:39:44 <irl> yeah
15:39:58 <karsten> let's do it. when?
15:40:06 <karsten> next week?
15:40:13 <irl> good question
15:40:49 <irl> 4th march?
15:41:08 <irl> or 5th march
15:41:30 <karsten> the plan would be to set up 2 new hosts, but not switch immediately?
15:41:40 <karsten> and when everything looks good, switch and keep the old ones around?
15:41:50 <karsten> and then when everything still looks good, kill the old ones?
15:41:53 <irl> yeah
15:42:08 <irl> i think they are currently masked in the varnish config
15:42:36 <karsten> how long do we need for step 1? an hour or two?
15:43:12 <karsten> if so, maybe we can do it sooner, and then do the next step in march.
15:43:21 <karsten> leaving enough time between steps to do checks.
15:43:40 <irl> the time consuming bit is syncing the state
15:43:56 <karsten> you mean copying over the directory from omeiense?
15:44:04 <irl> yeah
15:44:51 <karsten> okay. let's note down march 4 or 5, and do it sooner if we find free time.
15:44:54 <irl> but also i have none of this in my head right now, i have to read the docs again
15:45:31 <irl> we can look again at the next meeting to fix when we do it
15:45:57 <karsten> ok.
15:46:30 <karsten> (Quick) Questions on Metrics DB / Metrics Website Refresh (dj)
15:46:39 <dennis_jackson> My current understanding of the Metrics Backend is drawn from the March 2019 report by Karsten and irl which was super helpful. (Found at https://research.torproject.org/techreports/metrics-evaluation-2019-03-25.pdf )
15:46:49 <dennis_jackson> It suggests the back end is currently mostly Java + R reading from CSVs / PostgreSQL Databases which are created from various tor-specific flat files.
15:47:00 <dennis_jackson> 1) Is this correct / up to date?
15:47:13 <irl> yes
15:47:15 <dennis_jackson> A few weeks ago it was mentioned that the Metrics Website is getting a refresh and there will be some UX research prior to that.
15:47:22 <dennis_jackson> 2) Any idea on timeline? When is the work expected to take place and the new website land?
15:47:33 <dennis_jackson> (I tried to find this myself, but could not track it down)
15:47:35 <irl> not the metrics website, but a portal to metrics data
15:48:08 <dennis_jackson> So this will be an alternative set of visualisations of the same data? Using the same backend?
15:48:33 <irl> it's more of an index although we might have visualisations in it
15:48:39 <gaba> the portal work on the user research part will be done in August. We will be implementing it after that depending on capacity.
15:49:00 <dennis_jackson> Okay - so it is more like Onionoo? I don't quite know what to picture sorry
15:49:23 <irl> https://data.gov.uk/ but for tor
15:49:36 <gaba> simply secure is working on the assumption that we are going to have visualizations there and it will be a useful place for researchers, journalists and whoever wants to use the data from Tor
15:49:43 <irl> yeah
15:50:27 <dennis_jackson> Okay, great. So will this involve back end work on how metrics are stored?
15:50:34 <irl> if it does, we did it wrong
15:50:39 <dennis_jackson> Haha. okay :)
15:51:02 <irl> this will also help us to make other data available more easily
15:51:08 <dennis_jackson> That's super helpful, thank you both for saving me a lot of time looking through Tracs and Pads!
15:51:13 <irl> say if we did a one-off analysis and have some CSV files that will never be updated
15:51:19 <dennis_jackson> Ah! Gotcha!
15:51:36 <dennis_jackson> Yeah - there's a ton of stuff like that hanging around and it would be great to keep it together
15:51:46 <dennis_jackson> Final question / motivation
15:52:03 <dennis_jackson> Was any thought given to using a backend for time series metrics like InfluxDB or grafana or whatever?
15:52:21 <dennis_jackson> Has it been tried? Or rejected for some reason?
15:52:50 <irl> i have been working on/off on a backend using apache beam for windowed batch jobs to get more real-time output
15:53:12 <irl> like extrapolating network traffic from the subset of relays that updated extra-info descriptors more recently
15:53:50 <irl> it would be awesome to have a funded project and make this a real thing, but we're never going to have the time unless it's funded
15:54:02 <irl> and metrics is not the big cool thing that gets funded
15:54:20 <dennis_jackson> Okay, that's interesting. Thanks!
15:54:40 <dennis_jackson> Yeah, I appreciate that, although it really (IMO) it should be the first line in any perf / scaling proposal
15:54:47 <dennis_jackson> "All the metrics!"
15:55:03 <irl> yes we should have metrics in every proposal
15:55:08 <irl> otherwise how do you know if you did anything?
15:55:14 <dennis_jackson> Amen!
15:55:47 <dennis_jackson> Thanks for the info. Saves me a lot of time. Context: students looking for mini projects and something like this as PoC came up.
15:55:55 <karsten> while we're at it, let's all pay for the current metrics proposal to come through.
15:55:58 <karsten> pray*
15:55:58 <dennis_jackson> Wanted to check it wasn't already being done / had been done / didn't work
15:56:00 <karsten> not pay.
15:56:19 <gaba> lol
15:56:24 <karsten> so many things to do.
15:56:51 <irl> dennis_jackson: i can email you a "requirements" doc if you have students that will make it a thing
15:57:09 <dennis_jackson> The mozilla proposal? Any idea when they announce?
15:57:42 <dennis_jackson> irl: That would be super handy. I can't promise anything though.
15:57:52 <irl> ok
15:57:54 <gaba> hopefully very soon :)
15:57:56 <dennis_jackson> I.e. if you one already. please do send it, but don't make one or spend time on it
15:58:05 <dennis_jackson> if you have one already*
15:58:21 <dennis_jackson> gaba: fingers crossed then :)
15:58:35 <irl> i have all the things on postit notes, i can put them in a doc
15:58:42 <karsten> approaching the hour. anything else for today?
15:58:54 <karsten> irl: please copy me, if you write something. curious.
15:59:04 <irl> nothing else from me
15:59:06 <irl> will do
15:59:07 <dennis_jackson> irl: that would be great then
15:59:10 <dennis_jackson> nothing else from me
15:59:28 <karsten> thanks, everyone! talk to you next week!
15:59:32 <karsten> #endmeeting
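irl's description of windowed batch jobs that extrapolate network traffic from the subset of relays with recent extra-info descriptors can be illustrated with a small plain-Python sketch. It does not use Apache Beam (where a fixed-window transform would replace the manual grouping); it buckets hypothetical per-relay traffic reports into one-day windows by hand, then scales each window's reported bytes by the fraction of total consensus weight held by the relays that reported. The `Report` record, the weights, and the assumption that a relay's weight share approximates its traffic share are simplifications for illustration, not necessarily the method Tor Metrics uses.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 24 * 60 * 60  # assumed fixed (tumbling) window of one day


@dataclass
class Report:
    """Hypothetical record: bytes a relay reported in one extra-info descriptor."""
    fingerprint: str
    timestamp: int  # seconds since the epoch
    bytes_reported: int


def extrapolate(reports, weight_by_relay):
    """Estimate total network traffic per window from the relays that reported.

    Assumes a relay's share of total consensus weight approximates its share
    of total traffic, so: estimate = reported_bytes / covered_weight_fraction.
    Returns {window_start_timestamp: estimated_total_bytes}.
    """
    total_weight = sum(weight_by_relay.values())
    windows = defaultdict(lambda: {"bytes": 0, "relays": set()})
    for r in reports:
        w = windows[r.timestamp // WINDOW_SECONDS]
        w["bytes"] += r.bytes_reported
        w["relays"].add(r.fingerprint)

    estimates = {}
    for index, w in sorted(windows.items()):
        covered = sum(weight_by_relay.get(fp, 0) for fp in w["relays"])
        if covered > 0:  # skip windows where no known relay reported
            estimates[index * WINDOW_SECONDS] = w["bytes"] * total_weight / covered
    return estimates


weights = {"A": 50, "B": 30, "C": 20}  # illustrative consensus weights
reports = [
    Report("A", 100, 500),
    Report("B", 200, 300),
    Report("C", WINDOW_SECONDS + 10, 150),
]
print(extrapolate(reports, weights))  # {0: 1000.0, 86400: 750.0}
```

In the first window, relays A and B cover 80% of the weight and report 800 bytes, giving an estimate of 1000; in the second, only C (20%) reports, so the estimate rests on far less data, which is exactly the kind of effect irl suggests students examine.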