14:00:03 #startmeeting metrics team
14:00:03 Meeting started Thu Aug 18 14:00:03 2016 UTC. The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:03 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:00:09 who's here for the meeting?
14:04:08 hi iwakeh!
14:04:13 hi karsten!
14:04:19 saw you writing on the agenda pad.
14:04:27 https://pad.riseup.net/p/zUNzEIFRq5S4
14:04:28 yes :-)
14:04:37 cool. looks like it's just us today.
14:04:52 ok, many topics anyway.
14:05:03 yep. and thanks for running the meeting last week!
14:05:10 no problem.
14:05:17 okay, want to start with your topics?
14:05:22 fine
14:05:38 * CollecTor
14:06:17 should there be a hotfix release for the OOM error fix?
14:06:27 probably, yes.
14:06:45 so, CollecTor 1.0.1
14:06:49 should I put that out tomorrow?
14:07:00 did you test the fix on your instance?
14:07:03 It runs fine on the mirror.
14:07:06 okay, cool.
14:07:32 what's the ticket number again?
14:07:47 1.1.0 depends on metrics-lib 1.4.0
14:07:52 #19913
14:08:34 I just noticed that in https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam#PlannedMilestones
14:08:44 nice way to visualize that in trac.
14:08:52 the next topic
14:08:59 :-)
14:09:03 * karsten looks at 1.4.0
14:09:33 some of those tickets might take a while.
14:09:37 #16225
14:09:44 #18797
14:09:49 #17831
14:09:50 right.
14:09:53 #19640
14:10:00 add metrics-lib 1.5.0
14:10:07 as a milestone?
14:10:12 sure.
14:10:18 let me quickly do that..
14:10:41 done.
14:10:46 great.
14:10:59 what was your idea for releasing 1.4.0?
14:11:05 I'm currently implementing
14:11:18 the index.json downloader.
14:11:23 ah, cool.
14:11:36 (and by idea I mean when did you want to have a 1.4.0 release?)
14:11:43 and think, this could be ready
14:12:07 for review beginning next week i.e. monday.
14:12:28 But, as I find myself adding comments
14:12:28 ok.
14:12:32 like
14:12:50 /TODO: log warning when logging is available.
14:13:01 I'd like to also include
14:13:10 #19643
14:13:19 ok.
14:13:29 Just these two in 1.4.0?
14:13:41 oh, #19893 is trivial.
14:14:15 sure, all these things, too.
14:14:26 the rest might wait for 1.5.0.
14:14:35 it's merge ready, anyway :-)
14:14:42 yes.
14:14:49 hehe
14:14:53 even better.
14:15:05 I'll also increase coverage a bit.
14:15:33 as part of the main tickets.
14:15:39 okay, #19643, #19791, and #19893 for 1.4.0.
14:16:01 sounds fine.
14:16:10 let me quickly move the remaining ones to 1.5.0.
14:18:13 done, I think.
14:18:31 aiming for when?
14:18:48 hmm, depends on the review.
14:19:05 depends on the patch length. ;)
14:19:15 August 30; longer patch ;-)
14:19:37 what about other releases in august?
14:19:52 should we put out collector 1.1.0 in august?
14:19:55 CollecTor 1.1.0 is related.
14:20:48 But I'm going to be offline next weekend.
14:21:05 lots of tickets for 1.1.0.
14:21:15 reduce?
14:21:28 and move?
14:21:42 it's currently scheduled for aug 31?
14:21:45 a few are minor.
14:21:57 We could aim at it.
14:22:31 if we do that, should we schedule metrics-lib 1.4.0 for end of next week?
14:22:36 so that there's a bit of time between releases?
14:22:55 Sure.
14:23:01 aug 26?
14:23:28 Well, I'm offline from Aug 25-29
14:23:37 okay, aug 24?
14:23:45 sure.
14:23:48 I can prioritize review there.
14:23:54 deadlines help sometimes.
14:23:56 okay.
14:24:01 yes :-)
14:24:28 sounds like a good plan.
14:24:33 yep.
14:25:19 * roadmap ... ?
14:25:34 yes, which one? :)
14:25:37 https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam#ReleasesandMilestones
14:25:48 you noticed already.
14:25:53 ah that one!
14:26:08 I'd like to add a graph there for dependencies.
14:26:18 graphviz works in trac I think.
14:26:32 and remove "next steps"
14:26:33 does it need... a plugin?
14:26:37 replace the
14:26:46 open topics with tickets.
14:26:59 It's too cumbersome to track in that list.
14:27:09 https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam#NextSteps
14:28:02 regarding graphviz: I just need to find the time to check.
14:28:30 removing that Next Steps list seems like a good idea.
14:28:37 But, that shouldn't be a problem either way. with or without plugin.
14:28:38 though we'll have to turn some tasks into tickets.
14:28:43 right.
14:28:49 but many can probably just go away.
14:29:05 yes, it's more like a scratch pad.
14:29:55 okay, I'll clean up a bit there this weekend.
14:30:06 great!
14:30:26 * onionoo versioning?
14:30:30 yes?
14:30:47 Looking at the planned milestones it
14:31:02 is odd that we have 3.1.1 even before the first release.
14:31:06 I think, we
14:31:31 should use onionoo-3.1-1.0.0 as version like
14:31:38 junit4-4.11
14:31:51 or commons-lang-3-3.5
14:32:11 that might work.
14:32:14 It's quite a few numbers. but it fits better into the planning.
14:32:29 we could drop the last digit.
14:32:53 Just onionoo-x.y-a.b ?
14:33:02 yes.
14:33:30 So, only major releases for protocol changes?
14:33:40 ah, no.
14:33:43 x.y is protocol.
14:33:48 right.
14:33:49 a.b is implementation.
14:34:18 I mean, I can also imagine x.y-a.b.c.
14:34:29 That's better.
14:34:32 but you're right that x.y.a doesn't work so well.
14:34:46 well, might not work so well.
14:34:52 No, I wondered even if I missed a release :-)
14:34:52 okay, x.y-a.b.cd?
14:35:03 errr
14:35:05 d < 10
14:35:10 okay, x.y-a.b.c?
14:35:16 yes :-)
14:35:19 * karsten types cd too often.
14:35:53 okay, that means I should rename some milestones?
14:36:02 yes
14:36:19 "Onionoo 3.1-1.0.0"
14:36:20 not urgent.
14:36:24 right?
14:36:37 that's right. Onionoo 3.1-1.0.0
14:36:37 or "Onionoo 3.1-3.1.0"?
14:36:40 ok.
14:37:05 unless we want to create a milestone inflation ;-)
14:37:16 Exonerator 10.11.0
14:37:17 renamed.
14:37:21 hehe
14:37:26 thanks.
14:37:45 I'll adapt the wiki pages later.
14:37:48 we have Onionoo 3.1-1.0.0 and 3.1-1.1.0 now.
14:37:51 ok.
14:37:56 fine.
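[Editor's note: a minimal sketch of the x.y-a.b.c scheme agreed on in the versioning discussion, assuming, as stated there, that x.y is the protocol version and a.b.c is the implementation version. The names parse_onionoo_version and OnionooVersion are hypothetical and only illustrate the format; this is not code from Onionoo.]

```python
import re
from typing import NamedTuple, Tuple

# Hypothetical pattern for the "x.y-a.b.c" form discussed in the meeting.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)-(\d+)\.(\d+)\.(\d+)")


class OnionooVersion(NamedTuple):
    protocol: Tuple[int, int]             # x.y
    implementation: Tuple[int, int, int]  # a.b.c


def parse_onionoo_version(text: str) -> OnionooVersion:
    """Split a version string such as '3.1-1.0.0' into its two parts."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not in x.y-a.b.c form: {text!r}")
    x, y, a, b, c = (int(part) for part in match.groups())
    return OnionooVersion(protocol=(x, y), implementation=(a, b, c))
```

[With this sketch, parse_onionoo_version("3.1-1.0.0") yields protocol (3, 1) and implementation (1, 0, 0). Because the parts are integer tuples, the two renamed milestones sort as expected: 3.1-1.0.0 comes before 3.1-1.1.0.]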
14:38:20 * webstats?
14:38:23 yep!
14:38:39 Sebastian and I started adding sanitized web logs to a database.
14:38:57 where are the import scripts?
14:39:07 that was not really planned, but we happened to meet and found this to be a fun thing to do.
14:39:26 so far, on my laptop. I can paste them if you're curious.
14:39:38 (hi)
14:39:41 oh, for documentation.
14:39:42 hi!
14:39:48 hi sebastian!
14:39:53 at some point they should go into metrics-web.git.
14:39:59 maybe for now they could live in metrics-tasks.git.
14:40:04 yes.
14:40:19 just to know what was implemented.
14:40:22 okay, I'll put them there.
14:40:41 I think the db idea is good, because we have
14:41:00 other questions than usual web-statistics.
14:41:11 right.
14:41:18 and different data.
14:41:20 It might be an idea to have a read-only export somewhere
14:41:28 that people can query on a tpo host
14:41:33 like people or so
14:41:46 people.tp.o.
14:41:47 with phpadmin access?
14:42:04 with psql access.
14:42:05 or similar.
14:42:12 oh, command line?
14:42:17 yep.
14:42:32 well, if that's sufficient for the intended users.
14:42:43 We grouped users into two camps
14:42:44 fine question. we discusses that, too.
14:42:48 discussed*
14:42:59 * karsten lets Sebastian explain.
14:42:59 one camp is those who probably are fine with it, and the other camp is those who probably need manual help anyway.
14:43:22 and we enable people from camp one to help those in camp two. In addition to the metrics team.
14:43:45 well, I saw people use SQL
14:43:57 comfortably in the gui of phpadmin who
14:44:12 would not be able to use ssh login and cmdline
14:44:30 But, I actually have no idea who would access.
14:44:52 there are people who are in neither of the two groups.
14:45:18 we just thought about the effort required for building an interface for them vs. the time to just help them get their answers.
14:45:28 I don't know about phpadmin, but I sense potential security issues.
14:45:30 if there's a read only dump there's no reason people can't access it through whatever system they want to
14:45:35 phpadmin is out of the box.
14:45:43 the model used for the ultimate debian database seems relevant here
14:46:02 one central system you can access with psql from trusted hosts, but also read-only dumps where you can easily mirror and do whatever you want
14:46:19 yes.
14:46:20 if you want a fancy php gui, then fine, but we don't run it
14:46:58 https://wiki.debian.org/UltimateDebianDatabase/ (for context)
14:47:01 do you consider phpadmin fancy?
14:47:17 i need to use a mouse to use it, it's fancy
14:47:31 no, the tabbing works :-)
14:47:46 if people are using the data to do analysis for certain cases, then the end result is probably not the raw output that you want, it's some post-processing that makes a report or visualisation
14:47:59 so if people want to produce such things, they write something that does that and maybe we host that
14:48:09 There are simple csv exports for those.
14:48:42 okay, I'd say let's get the database ready first before thinking about how to make it available.
14:48:55 yes, one step after the other.
14:49:06 what we did was 1 day of work, not more. we extracted some basic results.
14:49:07 so read-only db is a good idea.
14:49:15 but we need to put more effort into it.
14:49:20 yes.
14:49:24 including getting feedback from tor browser people.
14:49:55 I'd say let's revisit this topic when we have a good database to share.
14:50:17 will the schema stay as it is?
14:50:34 It's ad hoc
14:50:49 just to see what we can do with a couple of queries
14:50:50 so, maybe, but probably not.
14:51:09 ok.
14:51:10 we also only imported ~2 weeks of data.
14:51:27 how many weeks should be in there?
14:51:37 hmmmm
14:52:01 * karsten finds https://webstats.torproject.org/out/archeotrichon.torproject.org/archive.torproject.org-access.log-20150920.xz
14:52:02 at least 18 months
14:52:33 ah, you mean by throwing out data that's older than 18 months?
14:52:56 it looks like the oldest web logs are from 2015-09-20.
14:53:15 ideally, we wouldn't have to throw out data.
14:53:27 depends, if any historic analysis could be of value later?
14:53:49 that's what dbs are for - long time storage.
14:53:54 yep.
14:54:02 karsten: yes
14:54:12 so, we should try to import the year of data we have.
14:54:20 yes.
14:54:20 I'm saying that if we have to throw away data, we should store at least 18 months.
14:54:25 ok.
14:54:47 alright, next topic? (5 mins left)
14:54:52 fine.
14:55:07 * ExoneraTor database cleanup (karsten)
14:55:25 Sebastian and I (see the pattern here?) cleaned up the exonerator database a bit.
14:55:30 hehe.
14:55:33 note to self: vacuum full can take a while.
14:55:45 so, we threw out some fields and reduced size by 11%.
14:55:59 today I implemented more changes and reduced my sample table from
14:56:03 ok, what data is missing now?
14:56:04 805 MB
14:56:09 to 66 MB.
14:56:20 nothing is missing, just stored more efficiently.
14:56:25 ok.
14:56:31 in the 11% we threw out unused data.
14:57:11 the next change will move stuff to new tables.
14:57:22 (which are 640k and 392k in my sample database.)
14:57:33 the old schema was very suboptimal with lots of duplicated information
14:57:42 yep.
14:57:46 Hopefully there'll be some easy gains wrt query performance, not just table size.
14:57:55 * Sebastian out, have a fun rest of the meeting
14:58:00 bye!
14:58:10 I'm optimistic wrt query performance.
14:58:26 Well, let's see ...
14:58:56 are the changes visible somewhere in the git repo?
14:59:03 okay, I'm mostly mentioning this to let you know that I'm cleaning up a bit of these exonerator changes now.
14:59:21 sounds good.
14:59:23 before focusing on something entirely different. because if I don't, I'll lose a lot of context here.
14:59:32 first change is pushed to master.
14:59:48 second is still in a text file here. would you want to review that change before I push to master?
15:00:10 is it a tricky change?
15:00:28 the migration part from the current schema is tricky. the change is simple.
15:00:47 It can be reverted? rolled back?
15:01:07 the schema change?
15:01:25 applying the change to the db?
15:01:42 not really.
15:02:04 is there no db backup?
15:02:29 ah, yes, there are host backups.
15:02:48 but no archives.
15:02:59 anyway, I guess we can create a backup before making the change.
15:03:11 just to avoid re-importing 10 years of data. :)
15:03:15 that might be useful.
15:03:42 ok.
15:03:54 guess we ran out of time and topics.
15:03:57 will there be application changes, too? b/c of the schema changes?
15:04:02 nope.
15:04:07 ah
15:04:09 Oh, 6 past 17:00 :-)
15:04:11 tiny ones.
15:04:26 but 98% of changes are in the schema.
15:04:44 stored procedures for inserting and querying.
15:04:52 well, we'll see how the performance changes.
15:04:56 :-)
15:05:03 heh
15:05:29 okay,
15:05:50 I have a few action items here.
15:05:57 as do you, I think.
15:05:58 e.g.
15:06:06 oh
15:06:15 yes. there are.
15:06:18 clean up Next Steps on team page
15:06:19 review #19913 and release CollecTor 1.0.1
15:06:19 #19643, #19791, and #19893 for metrics-lib 1.4.0, aug 24
15:06:19 collector 1.1.0 aug 31
15:06:19 make exonerator db backup before applying next change
15:06:33 those are mine.
15:07:11 I adapt the wiki pages and work on metrics-lib/collector 1.1.0 1.4.0
15:07:43 sounds good! talk to you next week and on trac tickets until then?
15:08:02 yes. all set.
15:08:13 bye, bye!
15:08:13 great! bye! :)
15:08:16 #endmeeting