14:01:18 #startmeeting Measurement Team meeting #4
14:01:18 Meeting started Wed Jul 29 14:01:18 2015 UTC. The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:18 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:01:30 I saw phw. who else is here for the meeting?
14:02:32 shall we give it another 5 minutes, phw?
14:03:14 sure, sounds good.
14:03:23 phw: want to take a look at the roadmap draft in the meantime?
14:03:27 https://people.torproject.org/~karsten/volatile/measurement-roadmap.pdf
14:03:30 * phw looks
14:03:39 * karsten makes coffee and is back in 4
14:06:39 * karsten has coffee
14:06:56 did anybody else arrive for the meeting?
14:08:23 phw: should we talk about measurement things anyway?
14:08:38 let's do it.
14:08:42 ok :)
14:08:53 so, that roadmap is the result of our last two meetings.
14:09:04 with some content I added yesterday.
14:09:30 it looks good so far! is it on git? if so, i can go over it and make minor patches.
14:09:31 it's mostly for ourselves, though we may be able to use it in the future.
14:09:42 oh, that would be awesome!
14:09:52 it's not on git, because I didn't know where to put it. any suggestions?
14:10:03 tech-reports.git maybe?
14:10:16 though it's technically not decided that it will be a tech report.
14:10:21 which might not matter.
14:10:36 it's just prettier in latex.
14:10:42 tech-reports.git is where i would have looked.
14:11:19 great. let me put it there now...
14:11:52 thanks
14:13:07 https://gitweb.torproject.org/user/karsten/tech-reports.git/log/?h=measurement-roadmap
14:13:10 thank you!
14:13:31 so, this is already part of the 1-1-1 task exchange I envisioned for today.
14:13:49 one thing I wanted to ask for was that somebody reviews/revises that document.
14:13:59 is there something that I can review for you?
14:14:10 or, let me explain the "rules" first:
14:14:13 - 1-1-1 task exchange: you get 1 minute to describe a task that would take somebody else roughly 1 hour and that they will do for you within 1 week (review a document, write some analysis code, fix a small bug, etc.; better come prepared to get the most out of this; give 1, take 1)
14:14:56 the idea is that having a fresh set of eyes on something might help you make more progress than spending that hour on the thing yourself.
14:15:06 oh, that sounds useful.
14:16:10 if you need to think a bit about this, feel free to send me something later today or tomorrow.
14:16:24 (it took me a bit to go through trac, email, todo lists, etc. to find good tasks.)
14:16:37 i don't have anything right now, but i have a question regarding collector's data.
14:16:39 (which I'll save for next week.)
14:16:42 sure
14:17:15 have you ever experimented with putting (parts of) collector's data in a database for easy querying?
14:17:46 we're using a database for parts of the metrics website, yes.
14:17:52 i'm currently experimenting with ways to make the data easier to analyse. see also my recent mail to damian on tor-dev@.
14:18:08 and we're using a database for exonerator.
14:18:42 the idea is to keep this database general purpose, not make a new database for a new problem?
14:19:42 is the current database flexible enough to answer questions such as "which guards changed their ip address more than X times?"
14:19:52 not at all.
14:20:10 so,
14:20:26 I think a major problem is that data is distributed to more than one descriptor.
14:20:27 ah. because that's the kind of question i find myself asking a lot when analysing bad relays. and answering them involves a bit of manual work.
14:20:39 which doesn't matter in this specific case,
14:20:54 but for many problems you want to combine consensuses with server descriptors and even extra-info descriptors.
14:21:28 and table joins are expensive.
14:21:59 by saying "not at all", do you mean it's impossible or just very slow?
14:22:02 though in this case you'd be happy to wait a bit, right?
14:22:31 the current database is written specifically for the purpose of producing the exact aggregate statistics shown on the metrics website.
14:22:36 personally, i'm find with waiting several seconds. several minutes would make it a little bit annoying.
14:22:44 s/find/fine/
14:22:47 it's not flexible at all. you could use it as inspiration, but not to solve your problem.
14:23:09 who would use that database? just you?
14:23:14 like, not the internet?
14:23:45 unfortunately, it's quite possible that you'll have to wait for minutes or even longer. until you figure out which index you're missing.
14:23:58 it's a huge amount of data, and it's easy to screw up performance-wise.
14:24:12 i would like it to be used by anyone who wants. either by setting up a dedicated service (which might be difficult) or by asking people to set up their own service, locally.
14:24:29 the latter sounds good as a start.
14:24:55 do you already have a database schema for this?
14:25:02 i should probably find a database person at the university and have a chat.
14:25:12 oh, if you can find such a person, yes.
14:25:13 no, nothing.
14:25:25 I can also take a look. but I'm not a database person.
14:25:32 but I could comment on the tor specifics.
14:25:46 like, which data could be missing, or what's potentially expensive to join, etc.
14:25:58 i was thinking it would be cool to have the data in a python shell eventually, which is more flexible and would facilitate exploratory analysis.
14:26:13 I suggest you also look at the exonerator database schema, which is better designed, though still not perfect. let me find a link.
14:26:28 well, there's also psql. :)
14:26:55 you just need to write the importer, possibly using python.
14:27:23 psql is postgresql?
14:27:24 for extra performance, write the importer in a way that produces .sql files that you can then import with psql.
14:27:32 ah, yes, its command-line tool.
14:27:53 https://gitweb.torproject.org/exonerator.git/tree/db/exonerator.sql
14:28:35 here's another example for a metrics thing using psql: https://gitweb.torproject.org/metrics-tasks.git/tree/task-8462
14:28:52 with the importer being https://gitweb.torproject.org/metrics-tasks.git/tree/task-8462/src/Parse.java
14:29:02 look at the end. it writes files that psql can import.
14:29:11 very useful, thanks!
14:29:12 that's faster than using any binding.
14:29:17 to python/java/etc.
14:29:44 sure. fun stuff! :)
14:30:13 anything else we should talk about while we're here?
14:30:41 that's basically what kept me busy. maybe something you would like to talk about?
14:31:44 no, I think we talked about two important things. nothing else comes to mind now.
14:32:00 ok.
14:32:11 did the meeting reminder reach you on time?
14:32:20 like, is 24 hours in advance good? or too late?
14:32:37 i use tor's google calendar, so i actually don't need a reminder.
14:32:44 oh!
14:32:55 okay, that works, too.
14:33:06 great, I'll send out the next reminder 24 hours in advance for the folks who don't use it.
14:33:38 okay, let's end this meeting early then. be sure to send me something to review for the 1-1-1 thing if you want.
14:33:47 will do!
14:33:52 :)
14:33:54 #endmeeting
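The importer approach suggested in this meeting, which writes .sql files and loads them with psql instead of inserting rows through a Python or Java database binding, could look roughly like the sketch below. It assumes that consensus entries have already been parsed from CollecTor's data into (valid-after, fingerprint, address, guard-flag) tuples. The table name `statusentry`, its columns, and the threshold of 3 in the example query are hypothetical illustrations. They are not the exonerator or metrics-website schema. The example query targets the question raised above: "which guards changed their ip address more than X times?"

```python
# Hypothetical importer sketch: writes a .sql script that psql can load with
#   psql -f statusentry.sql DBNAME
# Parsing CollecTor descriptors is assumed to happen elsewhere; each entry
# here is a (valid_after, fingerprint, address, is_guard) tuple.
import io
from datetime import datetime

# Hypothetical table; not the exonerator or metrics-website schema.
CREATE_TABLE = """CREATE TABLE IF NOT EXISTS statusentry (
  validafter TIMESTAMP WITHOUT TIME ZONE NOT NULL,
  fingerprint CHARACTER(40) NOT NULL,
  address TEXT NOT NULL,
  isguard BOOLEAN NOT NULL,
  PRIMARY KEY (fingerprint, validafter)
);
"""

# Example query: guards whose address changed more than 3 times. WHERE is
# applied before the window function, so LAG compares each guard-flagged
# entry with the previous guard-flagged entry of the same relay.
GUARD_IP_CHANGES = """SELECT fingerprint, COUNT(*) AS address_changes
FROM (SELECT fingerprint, address,
             LAG(address) OVER (PARTITION BY fingerprint
                                ORDER BY validafter) AS previous
      FROM statusentry WHERE isguard) AS entries
WHERE previous IS NOT NULL AND address <> previous
GROUP BY fingerprint
HAVING COUNT(*) > 3;
"""


def copy_escape(value):
    """Escape one value for PostgreSQL's COPY text format."""
    if value is None:
        return "\\N"  # COPY's default NULL marker
    text = str(value)
    # Escape backslashes first so the escapes added below are not doubled.
    return (text.replace("\\", "\\\\").replace("\t", "\\t")
            .replace("\n", "\\n").replace("\r", "\\r"))


def write_import_sql(entries, out):
    """Write CREATE TABLE plus one COPY ... FROM stdin block to `out`."""
    out.write(CREATE_TABLE)
    out.write("COPY statusentry (validafter, fingerprint, address, isguard)"
              " FROM stdin;\n")
    for valid_after, fingerprint, address, is_guard in entries:
        row = (valid_after.strftime("%Y-%m-%d %H:%M:%S"), fingerprint,
               address, "t" if is_guard else "f")
        # COPY text format: tab-separated columns, one row per line.
        out.write("\t".join(copy_escape(v) for v in row) + "\n")
    out.write("\\.\n")  # end-of-data marker for COPY ... FROM stdin


if __name__ == "__main__":
    demo = [(datetime(2015, 7, 29, 14, 0), "A" * 40, "192.0.2.1", True)]
    buf = io.StringIO()
    write_import_sql(demo, buf)
    print(buf.getvalue(), end="")
```

Bulk-loading with a single `COPY` block avoids one round trip per row, which is the likely reason for the speed advantage mentioned in the meeting. In practice you would write the script to a file, load it with psql, and then run `GUARD_IP_CHANGES` from the psql prompt. The primary key on `(fingerprint, validafter)` is chosen so that the per-relay, time-ordered scan has an index to use. As noted in the meeting, a missing index can easily turn seconds into minutes on this amount of data.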