15:05:02 <karsten> #startmeeting metrics team
15:05:02 <MeetBot> Meeting started Thu Dec 5 15:05:02 2019 UTC. The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:05:02 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:05:04 <karsten> hi!
15:05:10 <acute> hi!
15:05:16 <karsten> hi acute!
15:05:29 <irl> hi!
15:05:46 <karsten> gaba wrote that she might not be able to join us today.
15:06:03 <karsten> anything else for the agenda?
15:06:13 <karsten> Public pad URL is: https://nc.torproject.net/s/Znd4MkHebAznEgA
15:07:07 <acute> not from me
15:07:08 <karsten> if not, let's start.
15:07:23 <irl> ok
15:07:30 <karsten> Exit scanner status (irl)
15:08:06 <irl> good progress, need to ask for a machine to deploy the scanner on, next step is to get check working
15:08:39 * irl waits for trac to find the ticket
15:09:06 <irl> #32186
15:09:36 <irl> I was hoping that this would be done by someone who knows that codebase
15:09:39 <irl> and programming language
15:10:22 <irl> but I've written academic grade Go before, so i should be able to do this
15:10:47 <irl> the final piece in the puzzle is the DNS server which i don't have a plan for yet
15:11:07 <karsten> but doesn't the last comment on that ticket indicate that nothing needs to be rewritten?
15:11:36 <irl> ooh i didn't see the comment
15:11:48 <irl> that's odd trac should have emailed me
15:11:55 <irl> yes, actually that looks like it will be ok
15:12:06 <karsten> just put metrics-team in Cc.
15:12:40 <karsten> how's the DNS server implemented right now?
15:12:53 <irl> i think it is part of the haskell
15:13:17 <karsten> and do we know how much it is being used?
15:13:27 <irl> no
15:13:36 <irl> but i do believe it's the right way to do it
15:13:47 <irl> if it's not being used it's because we failed at making it work
15:14:00 <irl> dnsbl systems scale to internet mail spam
15:14:33 <karsten> so, the current plan is to rewrite it in python somehow?
15:14:58 <irl> i might do it in C
15:15:20 <irl> but yes, rewrite it in something we know
15:15:58 <karsten> are there maybe libraries for doing this?
15:16:30 <irl> i might end up transmogrifying https://www.nlnetlabs.nl/projects/nsd/about/
15:17:21 <irl> i will contact someone about whether this is a good idea, i have a couple of people to ask
15:17:27 <karsten> okay, good.
15:18:50 <karsten> I wrote a quick summary, feel free to tweak that. anything else on this topic?
15:19:09 <irl> not from me
15:19:33 <karsten> Metrics website daily updater (#25924) (karsten)
15:19:37 <karsten> just a quick update:
15:19:42 <karsten> It's deployed now. Last execution times of daily update runs in hours are: 13.6, 14.4, 17.8, 15.8, 5.4. Those 5.4 hours are from the first execution running this branch. Yay!
15:19:56 <irl> woo!
15:19:59 <karsten> this is more improvement than expected.
15:20:02 <acute> nice!
15:20:04 <irl> less than half
15:20:31 <irl> are we out of low-hanging fruit there or do we still have some tricks left?
15:20:58 <karsten> as you wrote, we were lucky this time.
15:21:15 <irl> hmm ok
15:21:23 <karsten> I didn't spot any obvious next candidates.
15:21:34 <karsten> we would find something if we looked harder.
15:21:53 <irl> right
15:21:57 <karsten> the question is how much it matters if execution times stay around 6-9 hours.
15:22:12 <karsten> those 17.8 hours were too much.
15:22:29 <irl> yeah
15:22:48 <karsten> there are other possible improvements in onionoo (next topic) and in collector's webstats module.
15:23:06 <karsten> let's move on to the onionoo thing:
15:23:11 <karsten> Onionoo change to reduce writes (#32660) (karsten)
15:23:19 <irl> yeah this isn't great
15:23:34 <karsten> did you see my latest comment on that?
15:23:39 <irl> looking now
15:24:43 <irl> right yes this looks sane
15:24:52 <karsten> should we try it out?
15:24:53 <irl> reads are a lot cheaper than writes in this setup afaict
15:24:56 <irl> we should try it out
15:24:59 <karsten> it sounds like it.
15:25:12 <karsten> okay, then I'll put out a release and deploy that on both remaining hosts tomorrow.
15:25:48 <karsten> in theory we should have a config option for this, so that we don't have to put out another release to take it out again.
15:25:51 <gaba> hey, i'm half here but reading
15:25:55 <karsten> but we're not great at config options.
15:25:59 <karsten> hi gaba!
15:26:14 <irl> less config options are better
15:26:18 <irl> so many untested code paths
15:26:23 <karsten> ah, do you want to review the code first?
15:26:41 <karsten> it's a small patch. I can put it on gitweb.tp.o after the meeting.
15:26:59 <irl> yeah i can look this evening
15:27:04 <karsten> okay!
15:27:31 <irl> i'm at a conference tomorrow but if you are working you could aim to deploy then
15:27:51 <karsten> yep.
15:28:28 <karsten> and then we let it run over the weekend and ask for another graph.
15:28:39 <irl> yeah sounds good
15:28:43 <karsten> alright. moving on?
15:28:46 <irl> ok
15:28:51 <karsten> Apache Spark + webstats (irl)
15:29:19 <irl> i was asked about what we can do for a dashboard for webstats
15:29:30 <irl> we have played with the idea of spark before
15:29:44 <irl> i think this is something that we would not do, but services team would do
15:30:04 <karsten> I'd be curious to see it. what do we have to do?
15:30:25 <irl> if we have any suggestions on how to get from our logs to fancy dashboard
15:30:40 <irl> for which i've learned the cloudspeak is "business intelligence"
15:31:11 <irl> then i'm going to write an email to just summarise things we thought about before
15:31:38 <irl> one thing i did find: our logs are not apache combined log format
15:31:47 <irl> they are truncated
15:31:56 <irl> and that breaks awstats
15:31:59 <karsten> oh.
15:32:10 <karsten> we do have a spec for them.
15:32:23 <irl> yeah, the spec says they are compatible with apache combined log
15:32:26 <irl> it's lying
15:32:35 <irl> but other than that we do follow the spec
15:32:40 <karsten> we should fix that.
15:32:49 <irl> :/
15:32:53 <irl> reprocess everything?
15:32:59 <irl> or fix the spec?
15:33:16 <karsten> find out how the spec is broken exactly.
15:33:25 <karsten> and why.
15:33:34 <karsten> reprocessing everything would be intense.
15:33:39 <irl> it looks like we are missing two quoted strings from the end of the lines
15:33:50 <karsten> but wait.
15:33:55 <karsten> The result is still supposed to be fully compatible with the Common Log Format and can be processed by any tools being capable of processing that format.
15:34:00 <karsten> *common*, not *combined*
15:34:05 <irl> oh
15:34:12 <karsten> https://metrics.torproject.org/web-server-logs.html
15:34:23 <irl> wow those are different things
15:34:25 <karsten> I'd think that awstats can handle that.
15:34:51 <gaba> about the webstats is that something grafana can do?
15:35:06 <irl> gaba: grafana only visualises stuff
15:35:11 <irl> it doesn't parse/analyse
15:35:14 <gaba> we have one installed
15:35:21 <irl> err do we?
15:35:31 <irl> i think tsa has a grafana that they use
15:35:50 <gaba> grafana.torproject.org/
15:35:53 <gaba> anarcat installed one
15:36:13 <irl> but they wouldn't want other stuff on there, with good reason, you can slow it down pretty efficiently if you don't do things carefully
15:36:47 <irl> ok, if i say it is "common" log format then awstats does parse it
15:37:07 <irl> so i will make an example awstats site (which i just did) and then see if that is good enough for the needs
15:37:14 <gaba> ok
15:37:23 <karsten> sounds good.
15:37:36 <karsten> if there are any issues with that, please open tickets.
15:37:46 <irl> yeah, will do
15:37:48 <karsten> the format was supposed to be usable in tools like awstats.
15:38:07 <karsten> but who knows whether we screwed up somewhere in the process.
15:38:10 <irl> i just didn't read the manual well enough, who would have thought "ACL" and "ACL" were different
15:38:17 <karsten> so far nobody tried it, so this is when we'll learn.
15:38:28 <karsten> different in the fineprint, yeah.
15:38:32 <irl> (and also neither are access control lists)
15:38:43 <irl> we should ban acronyms
15:38:52 <karsten> ;)
15:39:07 <irl> ok i think that's all on that topic then
15:39:12 <irl> it got a lot easier
15:39:41 <karsten> great!
15:40:27 <irl> roadmap?
15:40:44 <karsten> another possible error source could be that our webstats files change
15:40:56 <karsten> every time they're being sanitized.
15:41:11 <karsten> I think this is an issue quite similar to the one in onionoo, though the code is completely different.
15:41:19 <karsten> this is the other low-hanging fruit I mentioned earlier.
15:41:36 <karsten> the effect could be that importing these files into awstats is less efficient than it could be.
15:41:43 <karsten> well, importing them every x hours.
15:42:03 <irl> ah right yes
15:42:10 <irl> i think that awstats is meant to cope with this
15:42:15 <karsten> hopefully.
15:42:16 <irl> because web server access logs change every access
15:42:27 <irl> we should at least not see duplicates
15:42:31 <karsten> right.
15:42:48 <karsten> okay,
15:42:49 <karsten> Roadmap (gaba)
15:43:53 <gaba> how are we doing with it?
15:43:54 <karsten> is there any urgency in doing #32126?
15:43:56 <irl> #32264 and #32473 i think are done, #32186 goes to in progress, #32265 remains in progress
15:44:07 <irl> i think that is not urgent
15:44:17 <gaba> not urgent
15:44:21 <karsten> backlog?
15:44:25 <irl> yeah
15:44:59 <karsten> ah, great, #32473 is done.
15:45:06 <karsten> as is #25924.
15:45:31 <karsten> should I write a new card for #32660?
15:45:50 <karsten> and another one for the webstats issue I mentioned before?
15:45:55 <gaba> please
15:47:48 <karsten> okay.
15:48:24 <karsten> that's all for the roadmap?
15:48:31 <gaba> seems so
15:48:37 <irl> yep
15:48:55 <karsten> and all for today?
15:49:02 <acute> I had a question on new op tickets
15:49:02 <irl> i believe so
15:49:03 <gaba> it seems so :)
15:49:04 <irl> oh
15:49:24 <karsten> yes, please.
15:49:35 <acute> I'm currently doing some onionperf work which is not tracked
15:49:50 <irl> https://dip.torproject.org/torproject/metrics/onionperf/issues is where we are putting all the tickets
15:50:02 <irl> I think this is working now
15:50:15 <acute> so no more trac tickets from now on
15:50:20 <irl> hiro is also tracking the MR problems
15:50:30 <irl> yeah for onionperf no more trac tickets
15:50:45 <karsten> what about existing onionperf trac tickets?
15:50:57 <acute> I've migrated them all to gitlab
15:51:05 <gaba> we will merge those tickets into gitlab once we migrate
15:51:10 <gaba> ohh
15:51:10 <gaba> ok
15:51:24 <irl> yeah the migration will be confusing
15:51:29 <gaba> should we close the onionperf in trac?
15:51:30 <irl> adding a bunch of closed tickets
15:51:49 <gaba> right now we are migrating into a legacy project and the merge is going to be manual
15:51:58 <gaba> but in this case it may makes sense to lock onionperf in trac
15:52:02 <gaba> to not create new tickets there
15:52:14 <irl> we're doing that by not creating new tickets there
15:52:15 <karsten> are they already closed in trac?
15:52:20 <irl> but if there is some lock, that would be good
15:52:27 <irl> i think we did close them in trac
15:52:28 <gaba> yes, I will look into that
15:52:53 <acute> no, we did not
15:52:56 <irl> oh
15:53:32 <karsten> so, there's Archive/Onionperf.
15:53:35 <karsten> Archived*
15:53:41 <gaba> ah
15:53:46 <acute> we were not sure at the time what the migration plan was
15:53:49 <karsten> but they're all closed.
15:53:50 * gaba can not look at it now but can look at it later today
15:54:14 <karsten> I don't think there's a way to create new Onionperf tickets in trac at this point.
15:54:21 <gaba> ok
15:54:23 <irl> acute will also be at the conference tomorrow, so we won't look at this until monday
15:54:26 <slacktopus> <hellais> Re: https://bugs.torproject.org/32126 (Add OONI’s Vanilla Tor measurement data to Tor Metrics) Let us know if there is anything we can do to help out
15:55:13 <karsten> well, that's not true. somebody could create a ticket using that component.
15:55:18 <karsten> but who would do that.
15:55:23 <irl> spam bot
15:55:40 <karsten> I was thinking of another case where it wasn't possible to create new tickets in deleted components.
15:55:45 <karsten> but it's still there, not deleted.
15:55:57 <irl> yeah i don't think you can delete it without orphaning the tickets
15:56:02 <karsten> I think you can.
15:56:15 <irl> oh, then perhaps that is the thing to do
15:56:16 <karsten> they'll still keep that component as a string.
15:56:19 <karsten> maybe.
15:56:19 <irl> ooh
15:56:27 <karsten> requires testing!
15:56:32 <irl> who needs referential integrity anyway
15:56:44 <karsten> indeed.
15:56:55 <karsten> anyway.
15:57:18 <karsten> slacktopus/hellais: okay, will do, thanks!
15:57:33 <irl> gaba: will you let us know how we should handle the tickets for monday?
15:57:37 <gaba> yes
15:57:45 <irl> acute: does this solve the tickets issue?
15:58:01 <acute> yes
15:58:04 <irl> awesome
15:58:08 <karsten> cool!
15:58:17 <karsten> next meeting next thursday as usual?
15:58:21 <irl> yep
15:58:24 <gaba> still if we are using trello we should add onionperf stuff there
15:58:29 <gaba> yes about next meeting
15:59:08 <karsten> great! talk to you next week then! o/
15:59:11 * gaba is having issues with her internet and getting disconnected from server quite a bit
15:59:14 <gaba> ok
15:59:18 <irl> bye!
15:59:19 <gaba> o/
15:59:20 <gaba> bye
15:59:23 <karsten> #endmeeting
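
Editor's note on the webstats topic (15:31 to 15:38): the confusion between the Common Log Format and the Apache Combined Log Format comes down to the two trailing quoted strings (referer and user agent) that irl noticed were missing. Below is a minimal Python sketch of telling the two apart. The regular expressions, the `classify` helper, and the sample lines are illustrative assumptions, not taken from the web-server-logs spec or from awstats.

```python
import re

# Common Log Format: host ident authuser [timestamp] "request" status bytes
COMMON = (
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)
# Combined Log Format appends two quoted strings: "referer" "user-agent"
COMBINED = COMMON + r' "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'

COMMON_RE = re.compile(COMMON)
COMBINED_RE = re.compile(COMBINED)


def classify(line):
    """Return 'combined', 'common', or 'unknown' for one access-log line."""
    line = line.rstrip("\n")
    # Try the longer format first; fullmatch means trailing fields are not ignored.
    if COMBINED_RE.fullmatch(line):
        return "combined"
    if COMMON_RE.fullmatch(line):
        return "common"
    return "unknown"


# Hypothetical sample lines, made up for illustration only.
common_line = '127.0.0.1 - - [05/Dec/2019:15:33:39 +0000] "GET /index.html HTTP/1.1" 200 2326'
combined_line = common_line + ' "-" "ExampleAgent/1.0"'

if __name__ == "__main__":
    for sample in (common_line, combined_line, "not a log line"):
        print(classify(sample), "<-", sample)
```

A check like this could be run over a handful of sanitized log lines before pointing a log analyser at them, to confirm which format setting the tool should be given.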