15:01:55 <karsten> #startmeeting metrics team
15:01:55 <MeetBot> Meeting started Thu Apr 11 15:01:55 2019 UTC.  The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:55 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:02:19 <karsten> alright, starting with agenda item #1:
15:02:22 <karsten> - grants team is planning grants strategy for 2020. Let's briefly discuss priorities for metrics team (gaba)
15:02:46 <karsten> to be honest, I didn't have a chance to look much at that pad yet.
15:02:49 <gaba> that. I made a summary of project ideas from different places. We are meeting next week to do a grant strategy for next year
15:03:12 <gaba> if you don't have time then no problem. Probably the priority is scalability
15:03:24 <irl> "overhead for metrics" was relevant in the meeting currently in our regular channel, where they really could have done with more timely webstats and some better analysis pipeline there
15:03:36 <gaba> yep
15:03:49 <gaba> now the plan is to try to add it to all projects
15:03:57 <irl> i see three priorities for metrics if you want a nice list
15:04:02 <gaba> yes please
15:04:21 <irl> 1) more timely analysis of data coming in, and more flexible, so we can answer the questions people have more easily
15:04:39 <irl> 2) automating analysis and alerting, for things like attacks and censorship (s19 related)
15:05:26 <irl> 3) more robust systems, including boosting trust in the data by using external verification and trusted timestamps etc
15:05:53 <karsten> hi acute!
15:05:54 <irl> i could probably come up with something better worded if i think about it more
15:05:58 <acute> hi!
15:06:01 <irl> but those are the things off the top of my head
15:06:07 <karsten> these all sound like things we'd want to have.
15:06:14 <gaba> by 1) you mean time for analysis of data?
15:06:21 <karsten> but I find it hard to say whether these are *the* priorities.
15:06:44 <irl> i mean time from the data being generated (a line written to an apache log) to it being a datapoint plotted on a visualisation
15:07:01 <irl> it's great to know that we see a spike in bridge users in egypt, but it's not so great to only find out 2 days after it happened
15:07:24 <karsten> agreed, more timely data would be good.
15:07:41 <gaba> ok
15:07:51 <karsten> again, I'm not sure if it's #1 on the list.
15:07:55 <irl> we might also have immediate emergencies, like the exit scanner, that take priority over these longer term goals
15:08:14 <gaba> I added the exit scanner to the list
15:08:46 <gaba> for 1, roughly how long could that project take for 1 person?
15:08:52 <karsten> err
15:08:55 <karsten> wait.
15:09:24 <irl> rewrite our entire data pipeline to use high availability stateful stream processing frameworks? 37864873 points
15:09:31 <gaba> ha
15:09:37 <karsten> yeah, indeed.
15:09:44 <irl> it's a long term goal, something to aim for
15:09:53 <irl> we'd break it down into smaller projects
15:10:02 <irl> the collector work as part of s13 was a step in this direction
15:10:28 <gaba> yep
15:10:31 <karsten> so,
15:10:51 <irl> ideally we can hook these smaller projects as part of other team's projects, like if tor browser wanted a one-click circumvention thing, then we'd need to work on timeliness of bridge user data
15:10:56 <karsten> this is good feedback to have, that more timely data is something that people would like to have.
15:11:21 <karsten> but let's not write a funding proposal just yet.
15:11:29 <irl> and for the deployment of the new website, it would have been good to get webstats data more quickly, people have been logging into the hosts to read out the logs directly
15:11:48 <gaba> ok, I'm including it in the list of things to try to include in any project
15:11:48 <irl> yeah, that sounds good
15:12:03 <gaba> we are not writing a funding proposal but this will feed our calendar of grants to look for
15:12:12 <gaba> this is good for me
15:12:12 <irl> there is still metrics overhead for ongoing maintenance and things that should be justified to include on its own
15:12:13 <gaba> thanks
15:12:52 <irl> but if the case needs to be strengthened for including metrics time, then these are the areas that we could talk about
15:12:53 <irl> and we make "some progress" towards those goals
15:14:04 <gaba> ok. We can move in the agenda.
15:14:16 <karsten> I'm not happy with this, to be honest.
15:14:17 <irl> ok
15:14:19 <irl> oh
15:14:29 <gaba> oh
15:14:34 <gaba> how so karsten?
15:14:37 <karsten> I think it's not a good basis to make funding decisions. or pre-decisions.
15:14:43 <karsten> this needs more thoughts.
15:15:00 <karsten> the three items above sound like important things to do.
15:15:13 <gaba> it will get more thought when grant opportunities come in and projects develop
15:15:13 <gaba> or you mean the priorities part?
15:15:14 <karsten> but the scalability thing is something that will eat up a lot of time, too, and it's going to be a priority.
15:15:15 <gaba> if it is the priorities part yes..
15:15:21 <gaba> yes
15:15:39 <gaba> scalability is in the list and will have a higher priority
15:15:39 <karsten> adding more data from various sources is going to eat up a lot of time.
15:15:40 <karsten> I'm probably forgetting about many other things that will keep us busy.
15:16:01 <karsten> I feel like we should have a separate meeting to decide what our priorities are for 2020.
15:16:08 <karsten> maybe even an in-person meeting.
15:16:12 <karsten> like in july.
15:16:24 <gaba> Yes. That makes sense
15:16:29 <gaba> This will help us to start the conversation in the grants team
15:16:50 <irl> i guess we can reject any funding proposals that come along, the decision rests with us?
15:16:50 <gaba> yes
15:17:03 <karsten> yes, that is true.
15:17:10 <gaba> And I agree with karsten in having an in person session to discuss goals of the team and priorities for next year.
15:17:55 <irl> how is metrics team not included in the sbws project?
15:17:55 <karsten> I just want to avoid repeating mistakes. and one of our last funding proposals (without naming it) was a mistake.
15:18:03 <irl> it's even under the metrics heading!
15:18:15 <irl> there are many of these projects that i think implicate metrics but we haven't been put on the list
15:18:35 <gaba> can you add metrics to those projects in the list?
15:18:37 <gaba> feel free to add comments
15:18:49 <karsten> that sounds like a good plan.
15:18:51 <irl> i will take an action to do that after the meeting
15:18:56 <gaba> ok
15:19:04 <karsten> should I read the pad after you made additions, irl?
15:19:25 <irl> yes, it will be done by tomorrow morning
15:19:38 <karsten> okay. maybe leave a comment somewhere or ping me when you're done?
15:19:59 <irl> ok, can send a mail
15:19:59 <karsten> perfect!
15:20:22 <karsten> sorry for being the concerned one, I just think that we should spend more time on decisions that reach that far into the future.
15:20:35 <karsten> thanks!
15:20:51 <gaba> it makes sense
15:20:52 <irl> no problem
15:21:00 <karsten> ok. moving on?
15:21:14 <irl> ok
15:21:21 <gaba> ok
15:21:24 <karsten> - Sponsor 13 blog post (irl)
15:21:25 <irl> https://blog.torproject.org/collecting-aggregating-and-presenting-data-tor-network
15:21:34 <karsten> I saw the conversation going on behind the scenes.
15:21:45 <karsten> glad to see it's out!
15:22:08 <irl> "7 unique clones" on the bushel source code on github
15:22:11 <karsten> and now I'm seeing a lot of comments.
15:22:25 <karsten> what's a unique clone? a fork?
15:22:35 <irl> someone running git clone from a unique ip address
15:22:40 <karsten> ah!
15:23:20 <karsten> maybe bushel is going to attract more contributions because of python.
15:23:21 <irl> we also have metrics on engagement from twitter, and we need to pull out how many people downloaded the collector report from webstats
15:23:37 <irl> yeah, maybe, i've been doing some commits on it recently to show it's still active
15:23:51 <irl> which is how i found #30105
15:24:12 <irl> pr lines in consensuses have trailing whitespace ):
15:24:29 <karsten> huh!
15:24:40 <karsten> feel free to cc metrics-team or me on such issues.
15:24:59 <karsten> (done this myself in this case.)
15:25:15 <karsten> is trac slow for others, too?
15:25:18 <acute> yep
15:25:18 <irl> yeah
15:25:43 <irl> i was thinking if i should cc metrics-team on that ticket
15:25:47 <karsten> okay, nothing more on this topic from me. thanks for writing the post!
15:26:09 <irl> i didn't because i wanted to get some feedback from network-team before deciding if this was a problem
15:26:17 <karsten> sure, that's fine.
15:26:45 <karsten> I didn't see it, because I unsubscribed from that bugs mailing list a while ago.
15:26:57 <irl> i'll add metrics-team to cc in future
15:27:02 <karsten> I only see things where metrics-team or I am cc'ed.
15:27:13 <irl> i guess there may be metrics-lib related things there
15:27:22 <irl> anything formatting consensuses would need to implement this bug
15:27:25 <karsten> okay. just saying that you know why I'm not responding to a ticket.
15:27:31 <irl> yeah ok
15:27:53 <karsten> yes, it might cause some trouble on our side. we'll see.
15:27:54 <karsten> fun bug.
15:28:10 <karsten> okay, moving on?
15:28:20 <irl> i'm probably going to lock comments on the blog post later this evening
15:28:31 <irl> juga has done a post about sbws now, so it's not the newest post anymore
15:28:42 <irl> other than that, we can move on
15:28:43 <stephw> irl: comments need to stay open 2 weeks
15:28:49 <irl> really?
15:28:50 <stephw> as part of the moderation policy
15:28:54 <irl> oh
15:29:04 <gaba> they can be open overnight and deal with it tomorrow...
15:29:09 <irl> ok, so i should write something that tells me if people have commented
15:29:26 <stephw> checking once a day is fine
15:29:50 <irl> ok
15:29:59 <karsten> I'll take a look, too.
15:30:38 <karsten> moving on?
15:30:41 <irl> ok
15:30:47 <karsten> - OnionPerf failure run analysis (karsten)
15:31:08 <acute> yes
15:31:12 <karsten> so, acute and I have been working on #29787.
15:31:24 <karsten> and #29374. (no zwiebelbot?)
15:31:28 <karsten> ah!
15:31:35 <irl> slow trac
15:31:39 <karsten> yeah.
15:32:24 <karsten> everything's on the ticket, I mostly wanted to point out that work is happening there.
15:32:48 <karsten> also, thanks acute, for working on those issues!
15:33:10 <karsten> unless you have questions?
15:33:32 <acute> thanks! I enjoy running the data analysis
15:33:39 <karsten> glad to hear!
15:34:10 <acute> I will update #29787 once I've run my script over the same data as you
15:34:25 <karsten> sounds good!
15:34:44 <karsten> alright, moving on to the next, related agenda item?
15:35:37 <karsten> let's do that and jump back if necessary.
15:35:41 <karsten> - op-ab certificate (karsten)
15:35:51 <karsten> looks like the Let's Encrypt certificate expired yesterday.
15:36:01 <karsten> which is why collector has trouble getting op-ab data since yesterday.
15:36:08 <karsten> irl: can you take a look?
15:36:17 <irl> yes, can do after the meeting
15:36:21 <karsten> maybe it's a matter of kicking it once or twice.
15:36:42 <karsten> I don't think it affects measurements, right?
15:36:49 <irl> i don't think so
15:36:54 <karsten> ok.
15:37:17 <karsten> okay, moving on?
15:38:04 <karsten> - TorDNSEL (karsten)
15:38:20 <karsten> I'm afraid that I almost gave up on the haskell rescue project.
15:38:35 <karsten> I had asked a volunteer to help with that, but that didn't work out.
15:38:56 <karsten> I had hoped to find somebody reading our monthly report or the vegas meeting notes and wanting to help us out.
15:39:18 <karsten> I heard from a haskell person "oh boy" after briefly seeing the necessary refactoring for the hashtables thing.
15:39:38 <karsten> I don't think I can fix this myself.
15:39:40 <irl> ok
15:39:59 <karsten> we'll have to pick one of the alternatives.
15:40:02 * gaba needs to go out take the kids to school. I will check the logs later. o/
15:40:07 <karsten> o/ gaba!
15:40:26 <irl> i have so far got a formatter for the new exit list format, and now i need the bit that takes pathspider json and turns it into an exit list
15:40:47 <karsten> does pathspider json already contain all relevant data?
15:41:06 <irl> then i need the bit that makes the pathspider json, and i have something that does that, but it's old code that was since removed and is going to be hard to rebase to add it back
15:41:20 <irl> so i might have to rewrite it
15:41:31 <irl> but that gives us everything to have active measurement generating exit lists at least
15:42:20 <karsten> what if we run the tordnsel service ourselves?
15:42:29 <karsten> the current one on debian oldstable?
15:42:40 <karsten> if the admins don't want it anymore.
15:42:50 <irl> i've not yet looked at all the services involved
15:42:55 <karsten> I mean, what you describe sounds good.
15:43:00 <irl> if we had the VM image then we could keep it going
15:43:04 <karsten> I'm just not sure how long it will take to be robust enough.
15:43:08 <irl> if we can run it somewhere
15:43:25 <irl> do they want it off their infrastructure, or do they just want to not be responsible for it?
15:43:31 <karsten> good question.
15:43:42 <karsten> we can find out.
15:43:59 <karsten> in any case, that's one alternative.
15:44:02 <karsten> to buy us time.
15:44:26 <karsten> another alternative is to just let it die and put up a replacement as soon as we have it and are confident enough that it will work.
15:44:53 <karsten> this will make people unhappy. but it's what happens if nobody cares about maintenance.
15:45:05 <karsten> and by care I also mean fund.
15:45:36 <irl> you could also say this is what happens when you write things in esoteric languages
15:45:41 <karsten> hehe
15:46:08 <karsten> I mean, there's truth in that statement.
15:46:31 <karsten> I'll bring this up at the next vegas meeting.
15:46:38 <irl> ok cool
15:46:43 <karsten> again, your plan sounds solid.
15:46:56 <karsten> but we have said we'd deploy something by mid-april.
15:46:57 <irl> i can't really give any time estimates yet
15:47:04 <karsten> I totally agree.
15:47:25 <irl> i want to get to having a system that collects data as quickly as possible to avoid losing any data
15:47:36 <irl> and then the services that consume it will have to catch up after maybe
15:48:13 <karsten> well, if the service is gone, there's no data to miss.
15:48:34 <karsten> tordnsel also misses anything ipv6 related.
15:48:49 <irl> the data is there, you just have to write it down before it fades in time
15:49:28 <irl> did you see my exit list addresses graph?
15:49:36 <karsten> hmm? maybe not?
15:49:37 <irl> https://iain.learmonth.me/blog/2019/2019w151/
15:49:52 <karsten> is that from the ticket that I should review?
15:49:54 <irl> "Using the list of addresses in the consensus would also result in underblocking. In this period, we see an average of 73 addresses that are only to be found in the exit lists, and are not found in the consensus."
15:50:05 <irl> possibly
15:50:27 <karsten> it's on my list.
15:50:38 <karsten> looks interesting at first sight!
15:50:57 <irl> if we need to explain to anyone why exit lists are useful, i think those are all the right numbers to do that with
15:51:51 <karsten> so, alternative 3 would be to keep it running for another few weeks.
15:51:57 <karsten> even on oldstable.
15:52:18 <irl> maybe it is like the brexit extension, we are allowed more time because we demonstrate that we have a plan
15:52:24 <karsten> LOL
15:52:29 <karsten> very good argument!
15:52:39 <karsten> so, I'm asking for november something?
15:52:39 <irl> (:
15:52:42 <irl> aha
15:52:49 <karsten> cool!
15:53:16 <irl> we can ask for a long extension but maybe we have to demonstrate our progress along the way
15:54:22 <irl> the milestones are probably: having another scanner running in parallel that we feed into collector, verifying that the results look similar, updating check to read the new format, replacing the dns server
15:55:08 * gaba is back
15:55:16 <irl> once the dns server is replaced, that's the last thing and we can turn off the oldstable vm
15:55:17 <karsten> that was quick!
15:55:38 <karsten> it's a lot of work.
15:55:47 <karsten> and can go wrong at each step.
15:56:00 <karsten> it just takes time.
15:56:08 <karsten> okay, speaking of time, 4 minutes left.
15:56:15 <irl> one topic to go
15:56:16 <karsten> very quickly:
15:56:17 <karsten> - CollecTor bwauth (karsten)
15:56:20 <gaba> yep, school is close by :)
15:56:31 <karsten> we got a request to start collecting bwauth bandwidth files.
15:56:34 <irl> i want to check that these are actually correctly formatted first
15:56:49 <karsten> okay, do you want to take this ticket for now?
15:56:50 <irl> to fix any bugs in sbws that have been missed like we found in the consensus pr lines
15:56:53 <irl> yeah, ok
15:57:04 <karsten> is it sbws files?
15:57:07 <karsten> or the previous format?
15:57:27 <karsten> anyway, I mostly need to know when I should do something there.
15:57:40 <karsten> something == write collector code.
15:58:13 <irl> ok, i will assign to you when i've done the checks?
15:58:18 <karsten> sounds good to me!
15:58:34 <irl> cool
15:58:42 <karsten> alright, last topic is just the reminder that we're going to have a meeting on next monday.
15:58:50 <karsten> where we do the roadmap update.
15:58:57 <gaba> sounds good
15:58:57 <karsten> and the retrospective. right?
15:59:04 <gaba> yep
15:59:05 <irl> let's also have some more reminders
15:59:12 <karsten> yes?
15:59:14 <irl> 19th and 22nd are public holidays for me and karsten
15:59:19 <irl> next week and following week
15:59:22 <irl> 4-day weekend
15:59:31 <karsten> yep.
15:59:37 <karsten> meeting next thursday?
15:59:44 <gaba> yes
16:00:01 <irl> yes, 1500 utc
16:00:10 <karsten> alright!
16:00:19 <karsten> I gotta run now to the next meeting.
16:00:28 <karsten> thanks, and bye! o/
16:00:30 <gaba> logs off, right?
16:00:31 <irl> bye!
16:00:33 <karsten> #endmeeting