14:00:03 <karsten> #startmeeting metrics team
14:00:03 <MeetBot> Meeting started Thu Aug 18 14:00:03 2016 UTC.  The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:03 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:00:09 <karsten> who's here for the meeting?
14:04:08 <karsten> hi iwakeh!
14:04:13 <iwakeh> hi karsten!
14:04:19 <karsten> saw you writing on the agenda pad.
14:04:27 <karsten> https://pad.riseup.net/p/zUNzEIFRq5S4
14:04:28 <iwakeh> yes :-)
14:04:37 <karsten> cool. looks like it's just us today.
14:04:52 <iwakeh> ok, many topics anyway.
14:05:03 <karsten> yep. and thanks for running the meeting last week!
14:05:10 <iwakeh> no problem.
14:05:17 <karsten> okay, want to start with your topics?
14:05:22 <iwakeh> fine
14:05:38 <iwakeh> * CollecTor
14:06:17 <iwakeh> should there be a hotfix release for the OOM error fix?
14:06:27 <karsten> probably, yes.
14:06:45 <iwakeh> so, CollecTor 1.0.1
14:06:49 <karsten> should I put that out tomorrow?
14:07:00 <karsten> did you test the fix on your instance?
14:07:03 <iwakeh> It runs fine on the mirror.
14:07:06 <karsten> okay, cool.
14:07:32 <karsten> what's the ticket number again?
14:07:47 <iwakeh> 1.1.0 depends on metrics-lib 1.4.0
14:07:52 <iwakeh> #19913
14:08:34 <karsten> I just noticed that in https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam#PlannedMilestones
14:08:44 <karsten> nice way to visualize that in trac.
14:08:52 <iwakeh> the next topic
14:08:59 <iwakeh> :-)
14:09:03 * karsten looks at 1.4.0
14:09:33 <karsten> some of those tickets might take a while.
14:09:37 <karsten> #16225
14:09:44 <karsten> #18797
14:09:49 <karsten> #17831
14:09:50 <iwakeh> right.
14:09:53 <karsten> #19640
14:10:00 <iwakeh> add metrics-lib 1.5.0
14:10:07 <karsten> as a milestone?
14:10:12 <iwakeh> sure.
14:10:18 <karsten> let me quickly do that..
14:10:41 <karsten> done.
14:10:46 <iwakeh> great.
14:10:59 <karsten> what was your idea for releasing 1.4.0?
14:11:05 <iwakeh> I'm currently implementing
14:11:18 <iwakeh> the index.json downloader.
14:11:23 <karsten> ah, cool.
14:11:36 <karsten> (and by idea I mean when did you want to have a 1.4.0 release?)
14:11:43 <iwakeh> and think, this could be ready
14:12:07 <iwakeh> for review beginning next week i.e. monday.
14:12:28 <iwakeh> But, as I find myself adding comments
14:12:28 <karsten> ok.
14:12:32 <iwakeh> like
14:12:50 <iwakeh> //TODO: log warning when logging is available.
14:13:01 <iwakeh> I'd like to also include
14:13:10 <iwakeh> #19643
14:13:19 <karsten> ok.
14:13:29 <iwakeh> Just these two in 1.4.0?
14:13:41 <karsten> oh, #19893 is trivial.
14:14:15 <iwakeh> sure, all these things, too.
14:14:26 <karsten> the rest might wait for 1.5.0.
14:14:35 <iwakeh> it's merge ready, anyway :-)
14:14:42 <iwakeh> yes.
14:14:49 <karsten> hehe
14:14:53 <karsten> even better.
14:15:05 <iwakeh> I'll also increase coverage a bit.
14:15:33 <iwakeh> as part of the main tickets.
14:15:39 <karsten> okay, #19643, #19791, and #19893 for 1.4.0.
14:16:01 <iwakeh> sounds fine.
14:16:10 <karsten> let me quickly move the remaining ones to 1.5.0.
14:18:13 <karsten> done, I think.
14:18:31 <karsten> aiming for when?
14:18:48 <iwakeh> hmm, depends on the review.
14:19:05 <karsten> depends on the patch length. ;)
14:19:15 <iwakeh> August 30; longer patch ;-)
14:19:37 <karsten> what about other releases in august?
14:19:52 <karsten> should we put out collector 1.1.0 in august?
14:19:55 <iwakeh> CollecTor 1.1.0 is related.
14:20:48 <iwakeh> But I'm going to be offline next weekend.
14:21:05 <karsten> lots of tickets for 1.1.0.
14:21:15 <iwakeh> reduce?
14:21:28 <iwakeh> and move?
14:21:42 <karsten> it's currently scheduled for aug 31?
14:21:45 <iwakeh> a few are minor.
14:21:57 <iwakeh> We could aim at it.
14:22:31 <karsten> if we do that, should we schedule metrics-lib 1.4.0 for end of next week?
14:22:36 <karsten> so that there's a bit of time between releases?
14:22:55 <iwakeh> Sure.
14:23:01 <karsten> aug 26?
14:23:28 <iwakeh> Well, I'm offline from Aug 25-29
14:23:37 <karsten> okay, aug 24?
14:23:45 <iwakeh> sure.
14:23:48 <karsten> I can prioritize review there.
14:23:54 <karsten> deadlines help sometimes.
14:23:56 <karsten> okay.
14:24:01 <iwakeh> yes :-)
14:24:28 <karsten> sounds like a good plan.
14:24:33 <iwakeh> yep.
14:25:19 <iwakeh> * roadmap ... ?
14:25:34 <karsten> yes, which one? :)
14:25:37 <iwakeh> https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam#ReleasesandMilestones
14:25:48 <iwakeh> you noticed already.
14:25:53 <karsten> ah that one!
14:26:08 <iwakeh> I'd like to add a graph there for dependencies.
14:26:18 <iwakeh> graphviz works in trac I think.
14:26:32 <iwakeh> and remove "next steps"
14:26:33 <karsten> does it need... a plugin?
14:26:37 <iwakeh> replace the
14:26:46 <iwakeh> open topics with tickets.
14:26:59 <iwakeh> It's too cumbersome to track in that list.
14:27:09 <iwakeh> https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam#NextSteps
14:28:02 <iwakeh> regarding graphviz: I just need to find the time to check.
14:28:30 <karsten> removing that Next Steps list seems like a good idea.
14:28:37 <iwakeh> But that shouldn't be a problem either way, with or without a plugin.
14:28:38 <karsten> though we'll have to turn some tasks into tickets.
14:28:43 <iwakeh> right.
14:28:49 <karsten> but many can probably just go away.
14:29:05 <iwakeh> yes, it's more like a scratch pad.
14:29:55 <karsten> okay, I'll clean up a bit there this weekend.
14:30:06 <iwakeh> great!
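
The dependency graph proposed for the team page could be generated as Graphviz DOT text. The following is a minimal sketch, not something shown in the meeting: the function name `to_dot` and the graph name `milestones` are hypothetical, and the only edge listed is the one stated earlier in this log (CollecTor 1.1.0 depends on metrics-lib 1.4.0). Whether trac renders DOT without a plugin was left unchecked in the discussion.

```python
def to_dot(dependencies):
    """Return Graphviz DOT text for (milestone, requirement) pairs.

    Hypothetical helper for illustration; each pair becomes one
    directed edge pointing from a milestone to what it depends on.
    """
    lines = ["digraph milestones {"]
    for milestone, requirement in dependencies:
        lines.append(f'  "{milestone}" -> "{requirement}";')
    lines.append("}")
    return "\n".join(lines)


# The single dependency mentioned in the meeting.
DEPENDENCIES = [("CollecTor 1.1.0", "metrics-lib 1.4.0")]

if __name__ == "__main__":
    print(to_dot(DEPENDENCIES))
```

Keeping the edges in a plain list means new milestones can be added by appending a pair, without touching the DOT formatting.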
14:30:26 <iwakeh> * onionoo versioning?
14:30:30 <karsten> yes?
14:30:47 <iwakeh> Looking at the planned milestones it
14:31:02 <iwakeh> is odd that we have 3.1.1 even before the first release.
14:31:06 <iwakeh> I think, we
14:31:31 <iwakeh> should use onionoo-3.1-1.0.0 as version like
14:31:38 <iwakeh> junit4-4.11
14:31:51 <iwakeh> or commons-lang-3-3.5
14:32:11 <karsten> that might work.
14:32:14 <iwakeh> It's quite a few numbers. but it fits better into the planning.
14:32:29 <karsten> we could drop the last digit.
14:32:53 <iwakeh> Just onionoo-x.y-a.b ?
14:33:02 <karsten> yes.
14:33:30 <iwakeh> So, only major releases for protocol changes?
14:33:40 <karsten> ah, no.
14:33:43 <karsten> x.y is protocol.
14:33:48 <iwakeh> right.
14:33:49 <karsten> a.b is implementation.
14:34:18 <karsten> I mean, I can also imagine x.y-a.b.c.
14:34:29 <iwakeh> That's better.
14:34:32 <karsten> but you're right that x.y.a doesn't work so well.
14:34:46 <karsten> well, might not work so well.
14:34:52 <iwakeh> No, I wondered even if i missed a release :-)
14:34:52 <karsten> okay, x.y-a.b.cd?
14:35:03 <karsten> errr
14:35:05 <iwakeh> d < 10
14:35:10 <karsten> okay, x.y-a.b.c?
14:35:16 <iwakeh> yes :-)
14:35:19 * karsten types cd too often.
14:35:53 <karsten> okay, that means I should rename some milestones?
14:36:02 <iwakeh> yes
14:36:19 <karsten> "Onionoo 3.1-1.0.0"
14:36:20 <iwakeh> not urgent.
14:36:24 <karsten> right?
14:36:37 <iwakeh> that's right. Onionoo 3.1-1.0.0
14:36:37 <karsten> or "Onionoo 3.1-3.1.0"?
14:36:40 <karsten> ok.
14:37:05 <iwakeh> unless we want to create a milestone inflation ;-)
14:37:16 <iwakeh> Exonerator 10.11.0
14:37:17 <karsten> renamed.
14:37:21 <karsten> hehe
14:37:26 <iwakeh> thanks.
14:37:45 <iwakeh> I'll adapt the wiki pages later.
14:37:48 <karsten> we have Onionoo 3.1-1.0.0 and 3.1-1.1.0 now.
14:37:51 <karsten> ok.
14:37:56 <iwakeh> fine.
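
The scheme agreed on above is x.y-a.b.c, where x.y is the protocol version and a.b.c is the implementation version. A minimal sketch of how such a string could be split and compared follows; the names `OnionooVersion` and `parse_version` are hypothetical and not part of Onionoo.

```python
import re
from typing import NamedTuple, Tuple


class OnionooVersion(NamedTuple):
    """Hypothetical container for an x.y-a.b.c version string."""
    protocol: Tuple[int, int]             # x.y
    implementation: Tuple[int, int, int]  # a.b.c


# Two dot-separated numbers, a hyphen, then three dot-separated numbers.
_PATTERN = re.compile(r"([0-9]+)\.([0-9]+)-([0-9]+)\.([0-9]+)\.([0-9]+)")


def parse_version(text: str) -> OnionooVersion:
    """Parse text such as '3.1-1.0.0'; raise ValueError otherwise."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not in x.y-a.b.c form: {text!r}")
    x, y, a, b, c = (int(group) for group in match.groups())
    return OnionooVersion((x, y), (a, b, c))


if __name__ == "__main__":
    first = parse_version("3.1-1.0.0")
    second = parse_version("3.1-1.1.0")
    # Tuples compare element by element: protocol first, then implementation.
    print(first < second)
```

Because the protocol part is compared before the implementation part, a protocol bump always sorts after every implementation release of the previous protocol version.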
14:38:20 <iwakeh> * webstats?
14:38:23 <karsten> yep!
14:38:39 <karsten> Sebastian and I started adding sanitized web logs to a database.
14:38:57 <iwakeh> where are the import scripts?
14:39:07 <karsten> that was not really planned, but we happened to meet and found this to be a fun thing to do.
14:39:26 <karsten> so far, on my laptop. I can paste them if you're curious.
14:39:38 <Sebastian> (hi)
14:39:41 <iwakeh> oh, for documentation.
14:39:42 <karsten> hi!
14:39:48 <iwakeh> hi sebastian!
14:39:53 <karsten> at some point they should go into metrics-web.git.
14:39:59 <karsten> maybe for now they could live in metrics-tasks.git.
14:40:04 <iwakeh> yes.
14:40:19 <iwakeh> just to know what was implemented.
14:40:22 <karsten> okay, I'll put them there.
14:40:41 <iwakeh> I think the db idea is good, because we have
14:41:00 <iwakeh> other questions than usual web-statistics.
14:41:11 <karsten> right.
14:41:18 <karsten> and different data.
14:41:20 <Sebastian> It might be an idea to have a read-only export somewhere
14:41:28 <Sebastian> that people can query on a tpo host
14:41:33 <Sebastian> like people or so
14:41:46 <karsten> people.tp.o.
14:41:47 <iwakeh> with phpadmin access?
14:42:04 <karsten> with psql access.
14:42:05 <iwakeh> or similar.
14:42:12 <iwakeh> oh, command line?
14:42:17 <karsten> yep.
14:42:32 <iwakeh> well, if that's sufficient for the intended users.
14:42:43 <Sebastian> We grouped users into two camps
14:42:44 <karsten> fine question. we discussed that, too.
14:42:59 * karsten lets Sebastian explain.
14:42:59 <Sebastian> one camp is those who probably are fine with it, and the other camp is those who probably need manual help anyway.
14:43:22 <Sebastian> and we enable people from camp one to help those in camp two. In addition to the metrics team.
14:43:45 <iwakeh> well, I saw people use SQL
14:43:57 <iwakeh> comfortably in the gui of phpadmin who
14:44:12 <iwakeh> would not be able to use ssh login and cmdline
14:44:30 <iwakeh> But, I actually have no idea who would access.
14:44:52 <karsten> there are people who are in neither of the two groups.
14:45:18 <karsten> we just thought about the effort required for building an interface for them vs. the time to just help them get their answers.
14:45:28 <karsten> I don't know about phpadmin, but I sense potential security issues.
14:45:30 <irl> if there's a read only dump there's no reason people can't access it through whatever system they want to
14:45:35 <iwakeh> phpadmin is out of the box.
14:45:43 <irl> the model used for the ultimate debian database seems relevant here
14:46:02 <irl> one central system you can access with psql from trusted hosts, but also read-only dumps where you can easily mirror and do whatever you want
14:46:19 <iwakeh> yes.
14:46:20 <irl> if you want a fancy php gui, then fine, but we don't run it
14:46:58 <irl> https://wiki.debian.org/UltimateDebianDatabase/ (for context)
14:47:01 <iwakeh> do you consider phpadmin fancy?
14:47:17 <irl> i need to use a mouse to use it, it's fancy
14:47:31 <iwakeh> no, the tabbing works :-)
14:47:46 <irl> if people are using the data to do analysis for certain cases, then the end result is probably not the raw output that you want, it's some post-processing that makes a report or visualisation
14:47:59 <irl> so if people want to produce such things, they write something that does that and maybe we host that
14:48:09 <iwakeh> There are simple csv exports for those.
14:48:42 <karsten> okay, I'd say let's get the database ready first before thinking about how to make it available.
14:48:55 <iwakeh> yes, one step after the other.
14:49:06 <karsten> what we did was 1 day of work, not more. we extracted some basic results.
14:49:07 <iwakeh> so read-only db is a good idea.
14:49:15 <karsten> but we need to put more effort into it.
14:49:20 <iwakeh> yes.
14:49:24 <karsten> including getting feedback from tor browser people.
14:49:55 <karsten> I'd say let's revisit this topic when we have a good database to share.
14:50:17 <iwakeh> will the schema stay as it is?
14:50:34 <Sebastian> It's ad hoc
14:50:49 <Sebastian> just to see what we can do with a couple of queries
14:50:50 <karsten> so, maybe, but probably not.
14:51:09 <iwakeh> ok.
14:51:10 <karsten> we also only imported ~2 weeks of data.
14:51:27 <iwakeh> how many weeks should be in there?
14:51:37 <karsten> hmmmm
14:52:01 * karsten finds https://webstats.torproject.org/out/archeotrichon.torproject.org/archive.torproject.org-access.log-20150920.xz
14:52:02 <Sebastian> at least 18 months
14:52:33 <karsten> ah, you mean by throwing out data that's older than 18 months?
14:52:56 <karsten> it looks like the oldest web logs are from 2015-09-20.
14:53:15 <karsten> ideally, we wouldn't have to throw out data.
14:53:27 <iwakeh> depends, if any historic analysis could be of value later?
14:53:49 <iwakeh> that's what dbs are for - long time storage.
14:53:54 <karsten> yep.
14:54:02 <Sebastian> karsten: yes
14:54:12 <karsten> so, we should try to import the year of data we have.
14:54:20 <iwakeh> yes.
14:54:20 <Sebastian> I'm saying that if we have to throw away data, we should store at least 18 months.
14:54:25 <karsten> ok.
14:54:47 <karsten> alright, next topic? (5 mins left)
14:54:52 <iwakeh> fine.
14:55:07 <karsten> * ExoneraTor database cleanup (karsten)
14:55:25 <karsten> Sebastian and I (see the pattern here?) cleaned up the exonerator database a bit.
14:55:30 <iwakeh> hehe.
14:55:33 <karsten> note to self: vacuum full can take a while.
14:55:45 <karsten> so, we threw out some fields and reduced size by 11%.
14:55:59 <karsten> today I implemented more changes and reduced my sample table from
14:56:03 <iwakeh> ok, what data is missing now?
14:56:04 <karsten> 805 MB
14:56:09 <karsten> to 66 MB.
14:56:20 <karsten> nothing is missing, just stored more efficiently.
14:56:25 <iwakeh> ok.
14:56:31 <karsten> in the 11% we threw out unused data.
14:57:11 <karsten> the next change will move stuff to new tables.
14:57:22 <karsten> (which are 640k and 392k in my sample database.)
14:57:33 <Sebastian> the old schema was very suboptimal with lots of duplicated information
14:57:42 <karsten> yep.
14:57:46 <Sebastian> Hopefully there'll be some easy gains wrt query performance, not just table size.
14:57:55 * Sebastian out, have a fun rest of the meeting
14:58:00 <karsten> bye!
14:58:10 <karsten> I'm optimistic wrt query performance.
14:58:26 <iwakeh> Well, let's see ...
14:58:56 <iwakeh> are the changes visible somewhere in the git repo?
14:59:03 <karsten> okay, I'm mostly mentioning this to let you know that I'm cleaning up a bit of these exonerator changes now.
14:59:21 <iwakeh> sounds good.
14:59:23 <karsten> before focusing on something entirely different. because if I don't, I'll lose a lot of context here.
14:59:32 <karsten> first change is pushed to master.
14:59:48 <karsten> second is still in a text file here. would you want to review that change before I push to master?
15:00:10 <iwakeh> is it a tricky change?
15:00:28 <karsten> the migration part from the current schema is tricky. the change is simple.
15:00:47 <iwakeh> It can be reverted? rolled back?
15:01:07 <karsten> the schema change?
15:01:25 <iwakeh> applying the change to the db?
15:01:42 <karsten> not really.
15:02:04 <iwakeh> is there no db backup?
15:02:29 <karsten> ah, yes, there are host backups.
15:02:48 <karsten> but no archives.
15:02:59 <karsten> anyway, I guess we can create a backup before making the change.
15:03:11 <karsten> just to avoid re-importing 10 years of data. :)
15:03:15 <iwakeh> that might be useful.
15:03:42 <karsten> ok.
15:03:54 <karsten> guess we ran out of time and topics.
15:03:57 <iwakeh> will there be application changes, too? b/c of the schema changes?
15:04:02 <karsten> nope.
15:04:07 <karsten> ah
15:04:09 <iwakeh> Oh, 6 past 17:00 :-)
15:04:11 <karsten> tiny ones.
15:04:26 <karsten> but 98% of changes are in the schema.
15:04:44 <karsten> stored procedures for inserting and querying.
15:04:52 <iwakeh> well, we'll see how the performance changes.
15:04:56 <iwakeh> :-)
15:05:03 <karsten> heh
15:05:29 <karsten> okay,
15:05:50 <karsten> I have a few action items here.
15:05:57 <karsten> as do you, I think.
15:05:58 <iwakeh> e.g.
15:06:06 <iwakeh> oh
15:06:15 <iwakeh> yes. there are.
15:06:18 <karsten> clean up Next Steps on team page
15:06:19 <karsten> review #19913 and release CollecTor 1.0.1
15:06:19 <karsten> #19643, #19791, and #19893 for metrics-lib 1.4.0, aug 24
15:06:19 <karsten> collector 1.1.0 aug 31
15:06:19 <karsten> make exonerator db backup before applying next change
15:06:33 <karsten> those are mine.
15:07:11 <iwakeh> I'll adapt the wiki pages and work on metrics-lib 1.4.0 and collector 1.1.0
15:07:43 <karsten> sounds good! talk to you next week and on trac tickets until then?
15:08:02 <iwakeh> yes. all set.
15:08:13 <iwakeh> bye, bye!
15:08:13 <karsten> great! bye! :)
15:08:16 <karsten> #endmeeting