14:00:03 <karsten> #startmeeting metrics team
14:00:03 <MeetBot> Meeting started Thu Aug 18 14:00:03 2016 UTC. The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:03 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:00:09 <karsten> who's here for the meeting?
14:04:08 <karsten> hi iwakeh!
14:04:13 <iwakeh> hi karsten!
14:04:19 <karsten> saw you writing on the agenda pad.
14:04:27 <karsten> https://pad.riseup.net/p/zUNzEIFRq5S4
14:04:28 <iwakeh> yes :-)
14:04:37 <karsten> cool. looks like it's just us today.
14:04:52 <iwakeh> ok, many topics anyway.
14:05:03 <karsten> yep. and thanks for running the meeting last week!
14:05:10 <iwakeh> no problem.
14:05:17 <karsten> okay, want to start with your topics?
14:05:22 <iwakeh> fine
14:05:38 <iwakeh> * CollecTor
14:06:17 <iwakeh> should there be a hotfix release for the OOM error fix?
14:06:27 <karsten> probably, yes.
14:06:45 <iwakeh> so, CollecTor 1.0.1
14:06:49 <karsten> should I put that out tomorrow?
14:07:00 <karsten> did you test the fix on your instance?
14:07:03 <iwakeh> It runs fine on the mirror.
14:07:06 <karsten> okay, cool.
14:07:32 <karsten> what's the ticket number again?
14:07:47 <iwakeh> 1.1.0 depends on metrics-lib 1.4.0
14:07:52 <iwakeh> #19913
14:08:34 <karsten> I just noticed that in https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam#PlannedMilestones
14:08:44 <karsten> nice way to visualize that in trac.
14:08:52 <iwakeh> the next topic
14:08:59 <iwakeh> :-)
14:09:03 * karsten looks at 1.4.0
14:09:33 <karsten> some of those tickets might take a while.
14:09:37 <karsten> #16225
14:09:44 <karsten> #18797
14:09:49 <karsten> #17831
14:09:50 <iwakeh> right.
14:09:53 <karsten> #19640
14:10:00 <iwakeh> add metrics-lib 1.5.0
14:10:07 <karsten> as a milestone?
14:10:12 <iwakeh> sure.
14:10:18 <karsten> let me quickly do that..
14:10:41 <karsten> done.
14:10:46 <iwakeh> great.
14:10:59 <karsten> what was your idea for releasing 1.4.0?
14:11:05 <iwakeh> I'm currently implementing
14:11:18 <iwakeh> the index.json downloader.
14:11:23 <karsten> ah, cool.
14:11:36 <karsten> (and by idea I mean when did you want to have a 1.4.0 release?)
14:11:43 <iwakeh> and think, this could be ready
14:12:07 <iwakeh> for review beginning next week i.e. monday.
14:12:28 <iwakeh> But, as I find myself adding comments
14:12:28 <karsten> ok.
14:12:32 <iwakeh> like
14:12:50 <iwakeh> /TODO: log warning when logging is available.
14:13:01 <iwakeh> I'd like to also include
14:13:10 <iwakeh> #19643
14:13:19 <karsten> ok.
14:13:29 <iwakeh> Just these two in 1.4.0?
14:13:41 <karsten> oh, #19893 is trivial.
14:14:15 <iwakeh> sure, all these things, too.
14:14:26 <karsten> the rest might wait for 1.5.0.
14:14:35 <iwakeh> it's merge ready, anyway :-)
14:14:42 <iwakeh> yes.
14:14:49 <karsten> hehe
14:14:53 <karsten> even better.
14:15:05 <iwakeh> I'll also increase coverage a bit.
14:15:33 <iwakeh> as part of the main tickets.
14:15:39 <karsten> okay, #19643, #19791, and #19893 for 1.4.0.
14:16:01 <iwakeh> sounds fine.
14:16:10 <karsten> let me quickly move the remaining ones to 1.5.0.
14:18:13 <karsten> done, I think.
14:18:31 <karsten> aiming for when?
14:18:48 <iwakeh> hmm, depends on the review.
14:19:05 <karsten> depends on the patch length. ;)
14:19:15 <iwakeh> August 30; longer patch ;-)
14:19:37 <karsten> what about other releases in august?
14:19:52 <karsten> should we put out collector 1.1.0 in august?
14:19:55 <iwakeh> CollecTor 1.1.0 is related.
14:20:48 <iwakeh> But I'm going to be offline next weekend.
14:21:05 <karsten> lots of tickets for 1.1.0.
14:21:15 <iwakeh> reduce?
14:21:28 <iwakeh> and move?
14:21:42 <karsten> it's currently scheduled for aug 31?
14:21:45 <iwakeh> a few are minor.
14:21:57 <iwakeh> We could aim at it.
14:22:31 <karsten> if we do that, should we schedule metrics-lib 1.4.0 for end of next week?
14:22:36 <karsten> so that there's a bit of time between releases?
14:22:55 <iwakeh> Sure.
14:23:01 <karsten> aug 26?
14:23:28 <iwakeh> Well, I'm offline from Aug 25-29
14:23:37 <karsten> okay, aug 24?
14:23:45 <iwakeh> sure.
14:23:48 <karsten> I can prioritize review there.
14:23:54 <karsten> deadlines help sometimes.
14:23:56 <karsten> okay.
14:24:01 <iwakeh> yes :-)
14:24:28 <karsten> sounds like a good plan.
14:24:33 <iwakeh> yep.
14:25:19 <iwakeh> * roadmap ... ?
14:25:34 <karsten> yes, which one? :)
14:25:37 <iwakeh> https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam#ReleasesandMilestones
14:25:48 <iwakeh> you noticed already.
14:25:53 <karsten> ah that one!
14:26:08 <iwakeh> I'd like to add a graph there for dependencies.
14:26:18 <iwakeh> graphviz works in trac I think.
14:26:32 <iwakeh> and remove "next steps"
14:26:33 <karsten> does it need... a plugin?
14:26:37 <iwakeh> replace the
14:26:46 <iwakeh> open topics with tickets.
14:26:59 <iwakeh> It's too cumbersome to track in that list.
14:27:09 <iwakeh> https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam#NextSteps
14:28:02 <iwakeh> regarding graphviz: I just need to find the time to check.
14:28:30 <karsten> removing that Next Steps list seems like a good idea.
14:28:37 <iwakeh> But, that shouldn't be a problem either way. with or without plugin.
14:28:38 <karsten> though we'll have to turn some tasks into tickets.
14:28:43 <iwakeh> right.
14:28:49 <karsten> but many can probably just go away.
14:29:05 <iwakeh> yes, it's more like a scratch pad.
14:29:55 <karsten> okay, I'll clean up a bit there this weekend.
14:30:06 <iwakeh> great!
14:30:26 <iwakeh> * onionoo versioning?
14:30:30 <karsten> yes?
14:30:47 <iwakeh> Looking at the planned milestones it
14:31:02 <iwakeh> is odd that we have 3.1.1 even before the first release.
14:31:06 <iwakeh> I think, we
14:31:31 <iwakeh> should use onionoo-3.1-1.0.0 as version like
14:31:38 <iwakeh> junit4-4.11
14:31:51 <iwakeh> or commons-lang-3-3.5
14:32:11 <karsten> that might work.
14:32:14 <iwakeh> It's quite a few numbers. but it fits better into the planning.
14:32:29 <karsten> we could drop the last digit.
14:32:53 <iwakeh> Just onionoo-x.y-a.b ?
14:33:02 <karsten> yes.
14:33:30 <iwakeh> So, only major releases for protocol changes?
14:33:40 <karsten> ah, no.
14:33:43 <karsten> x.y is protocol.
14:33:48 <iwakeh> right.
14:33:49 <karsten> a.b is implementation.
14:34:18 <karsten> I mean, I can also imagine x.y-a.b.c.
14:34:29 <iwakeh> That's better.
14:34:32 <karsten> but you're right that x.y.a doesn't work so well.
14:34:46 <karsten> well, might not work so well.
14:34:52 <iwakeh> No, I wondered even if i missed a release :-)
14:34:52 <karsten> okay, x.y-a.b.cd?
14:35:03 <karsten> errr
14:35:05 <iwakeh> d < 10
14:35:10 <karsten> okay, x.y-a.b.c?
14:35:16 <iwakeh> yes :-)
14:35:19 * karsten types cd too often.
14:35:53 <karsten> okay, that means I should rename some milestones?
14:36:02 <iwakeh> yes
14:36:19 <karsten> "Onionoo 3.1-1.0.0"
14:36:20 <iwakeh> not urgent.
14:36:24 <karsten> right?
14:36:37 <iwakeh> that's right. Onionoo 3.1-1.0.0
14:36:37 <karsten> or "Onionoo 3.1-3.1.0"?
14:36:40 <karsten> ok.
14:37:05 <iwakeh> unless we want to create a milestone inflation ;-)
14:37:16 <iwakeh> Exonerator 10.11.0
14:37:17 <karsten> renamed.
14:37:21 <karsten> hehe
14:37:26 <iwakeh> thanks.
14:37:45 <iwakeh> I'll adapt the wiki pages later.
14:37:48 <karsten> we have Onionoo 3.1-1.0.0 and 3.1-1.1.0 now.
14:37:51 <karsten> ok.
14:37:56 <iwakeh> fine.
14:38:20 <iwakeh> * webstats?
14:38:23 <karsten> yep!
14:38:39 <karsten> Sebastian and I started adding sanitized web logs to a database.
14:38:57 <iwakeh> where are the import scripts?
14:39:07 <karsten> that was not really planned, but we happened to meet and found this to be a fun thing to do.
14:39:26 <karsten> so far, on my laptop. I can paste them if you're curious.
14:39:38 <Sebastian> (hi)
14:39:41 <iwakeh> oh, for documentation.
14:39:42 <karsten> hi!
14:39:48 <iwakeh> hi sebastian!
14:39:53 <karsten> at some point they should go into metrics-web.git.
14:39:59 <karsten> maybe for now they could live in metrics-tasks.git.
14:40:04 <iwakeh> yes.
14:40:19 <iwakeh> just to know what was implemented.
14:40:22 <karsten> okay, I'll put them there.
14:40:41 <iwakeh> I think the db idea is good, because we have
14:41:00 <iwakeh> other questions than usual web-statistics.
14:41:11 <karsten> right.
14:41:18 <karsten> and different data.
14:41:20 <Sebastian> It might be an idea to have a read-only export somewhere
14:41:28 <Sebastian> that people can query on a tpo host
14:41:33 <Sebastian> like people or so
14:41:46 <karsten> people.tp.o.
14:41:47 <iwakeh> with phpadmin access?
14:42:04 <karsten> with psql access.
14:42:05 <iwakeh> or similar.
14:42:12 <iwakeh> oh, command line?
14:42:17 <karsten> yep.
14:42:32 <iwakeh> well, if that's sufficient for the intended users.
14:42:43 <Sebastian> We grouped users into two camps
14:42:44 <karsten> fine question. we discusses that, too.
14:42:48 <karsten> discussed*
14:42:59 * karsten lets Sebastian explain.
14:42:59 <Sebastian> one camp is those who probably are fine with it, and the other camp is those who probably need manual help anyway.
14:43:22 <Sebastian> and we enable people from camp one to help those in camp two. In addition to the metrics team.
14:43:45 <iwakeh> well, I saw people use SQL
14:43:57 <iwakeh> comfortably in the gui of phpadmin who
14:44:12 <iwakeh> would not be able to use ssh login and cmdline
14:44:30 <iwakeh> But, I actually have no idea who would access.
14:44:52 <karsten> there are people who are in neither of the two groups.
14:45:18 <karsten> we just thought about the effort required for building an interface for them vs. the time to just help them get their answers.
14:45:28 <karsten> I don't know about phpadmin, but I sense potential security issues.
14:45:30 <irl> if there's a read only dump there's no reason people can't access it through whatever system they want to
14:45:35 <iwakeh> phpadmin is out of the box.
14:45:43 <irl> the model used for the ultimate debian database seems relevant here
14:46:02 <irl> one central system you can access with psql from trusted hosts, but also read-only dumps where you can easily mirror and do whatever you want
14:46:19 <iwakeh> yes.
14:46:20 <irl> if you want a fancy php gui, then fine, but we don't run it
14:46:58 <irl> https://wiki.debian.org/UltimateDebianDatabase/ (for context)
14:47:01 <iwakeh> do you consider phpadmin fancy?
14:47:17 <irl> i need to use a mouse to use it, it's fancy
14:47:31 <iwakeh> no, the tabbing works :-)
14:47:46 <irl> if people are using the data to do analysis for certain cases, then the end result is probably not the raw output that you want, it's some post-processing that makes a report or visualisation
14:47:59 <irl> so if people want to produce such things, they write something that does that and maybe we host that
14:48:09 <iwakeh> There are simple csv exports for those.
14:48:42 <karsten> okay, I'd say let's get the database ready first before thinking about how to make it available.
14:48:55 <iwakeh> yes, one step after the other.
14:49:06 <karsten> what we did was 1 day of work, not more. we extracted some basic results.
14:49:07 <iwakeh> so read-only db is a good idea.
14:49:15 <karsten> but we need to put more effort into it.
14:49:20 <iwakeh> yes.
14:49:24 <karsten> including getting feedback from tor browser people.
14:49:55 <karsten> I'd say let's revisit this topic when we have a good database to share.
14:50:17 <iwakeh> will the schema stay as it is?
14:50:34 <Sebastian> It's ad hoc
14:50:49 <Sebastian> just to see what we can do with a couple of queries
14:50:50 <karsten> so, maybe, but probably not.
14:51:09 <iwakeh> ok.
14:51:10 <karsten> we also only imported ~2 weeks of data.
14:51:27 <iwakeh> how many weeks should be in there?
14:51:37 <karsten> hmmmm
14:52:01 * karsten finds https://webstats.torproject.org/out/archeotrichon.torproject.org/archive.torproject.org-access.log-20150920.xz
14:52:02 <Sebastian> at least 18 months
14:52:33 <karsten> ah, you mean by throwing out data that's older than 18 months?
14:52:56 <karsten> it looks like the oldest web logs are from 2015-09-20.
14:53:15 <karsten> ideally, we wouldn't have to throw out data.
14:53:27 <iwakeh> depends, if any historic analysis could be of value later?
14:53:49 <iwakeh> that's what dbs are for - long time storage.
14:53:54 <karsten> yep.
14:54:02 <Sebastian> karsten: yes
14:54:12 <karsten> so, we should try to import the year of data we have.
14:54:20 <iwakeh> yes.
14:54:20 <Sebastian> I'm saying that if we have to throw away data, we should store at least 18 months.
14:54:25 <karsten> ok.
14:54:47 <karsten> alright, next topic? (5 mins left)
14:54:52 <iwakeh> fine.
14:55:07 <karsten> * ExoneraTor database cleanup (karsten)
14:55:25 <karsten> Sebastian and I (see the pattern here?) cleaned up the exonerator database a bit.
14:55:30 <iwakeh> hehe.
14:55:33 <karsten> note to self: vacuum full can take a while.
14:55:45 <karsten> so, we threw out some fields and reduced size by 11%.
14:55:59 <karsten> today I implemented more changes and reduced my sample table from
14:56:03 <iwakeh> ok, what data is missing now?
14:56:04 <karsten> 805 MB
14:56:09 <karsten> to 66 MB.
14:56:20 <karsten> nothing is missing, just stored more efficiently.
14:56:25 <iwakeh> ok.
14:56:31 <karsten> in the 11% we threw out unused data.
14:57:11 <karsten> the next change will move stuff to new tables.
14:57:22 <karsten> (which are 640k and 392k in my sample database.)
14:57:33 <Sebastian> the old schema was very suboptimal with lots of duplicated information
14:57:42 <karsten> yep.
14:57:46 <Sebastian> Hopefully there'll be some easy gains wrt query performance, not just table size.
14:57:55 * Sebastian out, have a fun rest of the meeting
14:58:00 <karsten> bye!
14:58:10 <karsten> I'm optimistic wrt query performance.
14:58:26 <iwakeh> Well, let's see ...
14:58:56 <iwakeh> are the changes visible somewhere in the git repo?
14:59:03 <karsten> okay, I'm mostly mentioning this to let you know that I'm cleaning up a bit of these exonerator changes now.
14:59:21 <iwakeh> sounds good.
14:59:23 <karsten> before focusing on something entirely different. because if I don't, I'll lose a lot of context here.
14:59:32 <karsten> first change is pushed to master.
14:59:48 <karsten> second is still in a text file here. would you want to review that change before I push to master?
15:00:10 <iwakeh> is it a tricky change?
15:00:28 <karsten> the migration part from the current schema is tricky. the change is simple.
15:00:47 <iwakeh> It can be reverted? rolled back?
15:01:07 <karsten> the schema change?
15:01:25 <iwakeh> applying the change to the db?
15:01:42 <karsten> not really.
15:02:04 <iwakeh> is there no db backup?
15:02:29 <karsten> ah, yes, there are host backups.
15:02:48 <karsten> but no archives.
15:02:59 <karsten> anyway, I guess we can create a backup before making the change.
15:03:11 <karsten> just to avoid re-importing 10 years of data. :)
15:03:15 <iwakeh> that might be useful.
15:03:42 <karsten> ok.
15:03:54 <karsten> guess we ran out of time and topics.
15:03:57 <iwakeh> will there be application changes, too? b/c of the schema changes?
15:04:02 <karsten> nope.
15:04:07 <karsten> ah
15:04:09 <iwakeh> Oh, 6 past 17:00 :-)
15:04:11 <karsten> tiny ones.
15:04:26 <karsten> but 98% of changes are in the schema.
15:04:44 <karsten> stored procedures for inserting and querying.
15:04:52 <iwakeh> well, we'll see how the performance changes.
15:04:56 <iwakeh> :-)
15:05:03 <karsten> heh
15:05:29 <karsten> okay,
15:05:50 <karsten> I have a few action items here.
15:05:57 <karsten> as do you, I think.
15:05:58 <iwakeh> e.g.
15:06:06 <iwakeh> oh
15:06:15 <iwakeh> yes. there are.
15:06:18 <karsten> clean up Next Steps on team page
15:06:19 <karsten> review #19913 and release CollecTor 1.0.1
15:06:19 <karsten> #19643, #19791, and #19893 for metrics-lib 1.4.0, aug 24
15:06:19 <karsten> collector 1.1.0 aug 31
15:06:19 <karsten> make exonerator db backup before applying next change
15:06:33 <karsten> those are mine.
15:07:11 <iwakeh> I adapt the wiki pages and work on metrics-lib/collector 1.1.0 1.4.0
15:07:43 <karsten> sounds good! talk to you next week and on trac tickets until then?
15:08:02 <iwakeh> yes. all set.
15:08:13 <iwakeh> bye, bye!
15:08:13 <karsten> great! bye! :)
15:08:16 <karsten> #endmeeting
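During the meeting, the team settled on an Onionoo version scheme of the form x.y-a.b.c. Here x.y is the protocol version and a.b.c is the implementation version, as in the milestones Onionoo 3.1-1.0.0 and 3.1-1.1.0. A small parser can illustrate that scheme. The following is a minimal Python sketch under stated assumptions. The names `parse_version` and `OnionooVersion` are hypothetical and are not part of Onionoo. Ordering versions by protocol version first and implementation version second is also an assumption; the meeting did not decide it.

```python
import re
from typing import NamedTuple, Tuple

# Hypothetical helper, not part of Onionoo: matches "x.y-a.b.c", where
# x.y is the protocol version and a.b.c the implementation version.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)-(\d+)\.(\d+)\.(\d+)")


class OnionooVersion(NamedTuple):
    # Assumption: NamedTuples compare field by field, so versions sort by
    # protocol version first and implementation version second.
    protocol: Tuple[int, int]
    implementation: Tuple[int, int, int]


def parse_version(text: str) -> OnionooVersion:
    """Parse a version string such as '3.1-1.0.0'."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an x.y-a.b.c version: {text!r}")
    x, y, a, b, c = (int(group) for group in match.groups())
    return OnionooVersion(protocol=(x, y), implementation=(a, b, c))


if __name__ == "__main__":
    old = parse_version("3.1-1.0.0")
    new = parse_version("3.1-1.1.0")
    print(old.protocol, old.implementation)  # (3, 1) (1, 0, 0)
    print(old < new)  # True
```

The parser rejects a three-part string in the old style, such as "3.1.1", with a ValueError. This is the ambiguity raised in the meeting: that form does not show which digits belong to the protocol and which belong to the implementation.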