14:01:18 <karsten> #startmeeting Measurement Team meeting #4
14:01:18 <MeetBot> Meeting started Wed Jul 29 14:01:18 2015 UTC. The chair is karsten. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:18 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
14:01:30 <karsten> I saw phw. who else is here for the meeting?
14:02:32 <karsten> shall we give it another 5 minutes, phw?
14:03:14 <phw> sure, sounds good.
14:03:23 <karsten> phw: want to take a look at the roadmap draft in the meantime?
14:03:27 <karsten> https://people.torproject.org/~karsten/volatile/measurement-roadmap.pdf
14:03:30 * phw looks
14:03:39 * karsten makes coffee and is back in 4
14:06:39 * karsten has coffee
14:06:56 <karsten> did anybody else arrive for the meeting?
14:08:23 <karsten> phw: should we talk about measurement things anyway?
14:08:38 <phw> let's do it.
14:08:42 <karsten> ok :)
14:08:53 <karsten> so, that roadmap is the result of our last two meetings.
14:09:04 <karsten> with some content I added yesterday.
14:09:30 <phw> it looks good so far! is it on git? if so, i can go over it and make minor patches.
14:09:31 <karsten> it's mostly for ourselves, though we may be able to use it in the future.
14:09:42 <karsten> oh, that would be awesome!
14:09:52 <karsten> it's not on git, because I didn't know where to put it. any suggestions?
14:10:03 <karsten> tech-reports.git maybe?
14:10:16 <karsten> though it's technically not decided that it will be a tech report.
14:10:21 <karsten> which might not matter.
14:10:36 <karsten> it's just prettier in latex.
14:10:42 <phw> tech-reports.git is where i would have looked.
14:11:19 <karsten> great. let me put it there now...
14:11:52 <phw> thanks
14:13:07 <karsten> https://gitweb.torproject.org/user/karsten/tech-reports.git/log/?h=measurement-roadmap
14:13:10 <karsten> thank you!
14:13:31 <karsten> so, this is already part of the 1-1-1 task exchange I envisioned for today.
14:13:49 <karsten> one thing I wanted to ask for was that somebody reviews/revises that document.
14:13:59 <karsten> is there something that I can review for you?
14:14:10 <karsten> or, let me explain the "rules" first:
14:14:13 <karsten> - 1-1-1 task exchange: you get 1 minute to describe a task that would take somebody else roughly 1 hour and that they will do for you within 1 week (review a document, write some analysis code, fix a small bug, etc.; better come prepared to get the most out of this; give 1, take 1)
14:14:56 <karsten> the idea is that having a fresh set of eyes on something might help you make more progress than spending that hour on the thing yourself.
14:15:06 <phw> oh, that sounds useful.
14:16:10 <karsten> if you need to think a bit about this, feel free to send me something later today or tomorrow.
14:16:24 <karsten> (it took me a bit to go through trac, email, todo lists, etc. to find good tasks.)
14:16:37 <phw> i don't have anything right now, but i have a question regarding collector's data.
14:16:39 <karsten> (which I'll save for next week.)
14:16:42 <karsten> sure
14:17:15 <phw> have you ever experimented with putting (parts of) collector's data in a database for easy querying?
14:17:46 <karsten> we're using a database for parts of the metrics website, yes.
14:17:52 <phw> i'm currently experimenting with ways to make the data easier to analyse. see also my recent mail to damian on tor-dev@.
14:18:08 <karsten> and we're using a database for exonerator.
14:18:42 <karsten> the idea is to keep this database general purpose, not make a new database for a new problem?
14:19:42 <phw> is the current database flexible enough to answer questions such as "which guards changed their ip address more than X times?"
14:19:52 <karsten> not at all.
14:20:10 <karsten> so,
14:20:26 <karsten> I think a major problem is that data is distributed to more than one descriptor.
14:20:27 <phw> ah. because that's the kind of question i find myself asking a lot when analysing bad relays. and answering them involves a bit of manual work.
14:20:39 <karsten> which doesn't matter in this specific case,
14:20:54 <karsten> but for many problems you want to combine consensuses with server descriptors and even extra-info descriptors.
14:21:28 <karsten> and table joins are expensive.
14:21:59 <phw> by saying "not at all", do you mean it's impossible or just very slow?
14:22:02 <karsten> though in this case you'd be happy to wait a bit, right?
14:22:31 <karsten> the current database is written specifically for the purpose of producing the exact aggregate statistics shown on the metrics website.
14:22:36 <phw> personally, i'm find with waiting several seconds. several minutes would make it a little bit annoying.
14:22:44 <phw> s/find/fine/
14:22:47 <karsten> it's not flexible at all. you could use it as inspiration, but not to solve your problem.
14:23:09 <karsten> who would use that database? just you?
14:23:14 <karsten> like, not the internet?
14:23:45 <karsten> unfortunately, it's quite possible that you'll have to wait for minutes or even longer. until you figure out which index you're missing.
14:23:58 <karsten> it's a huge amount of data, and it's easy to screw up performance-wise.
14:24:12 <phw> i would like it to be used by anyone who wants. either by setting up a dedicated service (which might be difficult) or by asking people to set up their own service, locally.
14:24:29 <karsten> the latter sounds good as a start.
14:24:55 <karsten> do you already have a database schema for this?
14:25:02 <phw> i should probably find a database person at the university and have a chat.
14:25:12 <karsten> oh, if you can find such a person, yes.
14:25:13 <phw> no, nothing.
14:25:25 <karsten> I can also take a look. but I'm not a database person.
14:25:32 <karsten> but I could comment on the tor specifics.
14:25:46 <karsten> like, which data could be missing, or what's potentially expensive to join, etc.
14:25:58 <phw> i was thinking it would be cool to have the data in a python shell eventually, which is more flexible and would facilitate exploratory analysis.
14:26:13 <karsten> I suggest you also look at the exonerator database schema, which is better designed, though still not perfect. let me find a link.
14:26:28 <karsten> well, there's also psql. :)
14:26:55 <karsten> you just need to write the importer, possibly using python.
14:27:23 <phw> psql is postgresql?
14:27:24 <karsten> for extra performance, write the importer in a way that produces .sql files that you can then import with psql.
14:27:32 <karsten> ah, yes, its command-line tool.
14:27:53 <karsten> https://gitweb.torproject.org/exonerator.git/tree/db/exonerator.sql
14:28:35 <karsten> here's another example for a metrics thing using psql: https://gitweb.torproject.org/metrics-tasks.git/tree/task-8462
14:28:52 <karsten> with the importer being https://gitweb.torproject.org/metrics-tasks.git/tree/task-8462/src/Parse.java
14:29:02 <karsten> look at the end. it writes files that psql can import.
14:29:11 <phw> very useful, thanks!
14:29:12 <karsten> that's faster than using any binding.
14:29:17 <karsten> to python/java/etc.
14:29:44 <karsten> sure. fun stuff! :)
14:30:13 <karsten> anything else we should talk about while we're here?
14:30:41 <phw> that's basically what kept me busy. maybe something you would like to talk about?
14:31:44 <karsten> no, I think we talked about two important things. nothing else comes to mind now.
14:32:00 <phw> ok.
14:32:11 <karsten> did the meeting reminder reach you on time?
14:32:20 <karsten> like, is 24 hours in advance good? or too late?
14:32:37 <phw> i use tor's google calendar, so i actually don't need a reminder.
14:32:44 <karsten> oh!
14:32:55 <karsten> okay, that works, too.
14:33:06 <karsten> great, I'll send out the next reminder 24 hours in advance for the folks who don't use it.
14:33:38 <karsten> okay, let's end this meeting early then. be sure to send me something to review for the 1-1-1 thing if you want.
14:33:47 <phw> will do!
14:33:52 <karsten> :)
14:33:54 <karsten> #endmeeting
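
A minimal sketch of the importer approach karsten describes at 14:27:24: instead of inserting rows through a database binding, the importer writes a .sql file that psql can load in one go. Everything specific here is an assumption made for illustration and is not taken from exonerator.sql, Parse.java, or the metrics database: the table name `statusentry`, its columns, the input rows, and the threshold of 3 in the example query. The query counts distinct addresses per guard, which is only a rough proxy for phw's "changed their ip address more than X times" question (a relay that flips between two addresses repeatedly would be undercounted).

```python
import io

# Hypothetical schema for illustration only; the real schemas live in the
# repositories linked in the log and may look quite different.
SCHEMA = """CREATE TABLE IF NOT EXISTS statusentry (
  validafter TIMESTAMP NOT NULL,
  fingerprint CHARACTER(40) NOT NULL,
  address TEXT NOT NULL,
  is_guard BOOLEAN NOT NULL
);
"""

# Example query (threshold 3 is an arbitrary stand-in for X).
QUERY = """SELECT fingerprint, COUNT(DISTINCT address) AS addresses
FROM statusentry
WHERE is_guard
GROUP BY fingerprint
HAVING COUNT(DISTINCT address) > 3;
"""


def copy_escape(value):
    """Escape one field for PostgreSQL's COPY text format."""
    return (str(value)
            .replace("\\", "\\\\")
            .replace("\t", "\\t")
            .replace("\n", "\\n")
            .replace("\r", "\\r"))


def write_sql(rows, out):
    """Write the schema plus a COPY block for rows to the file object out.

    rows is an iterable of (validafter, fingerprint, address, is_guard)
    tuples, assumed to come from a descriptor parser that is not shown here.
    """
    out.write(SCHEMA)
    out.write("COPY statusentry (validafter, fingerprint, address, is_guard) "
              "FROM stdin;\n")
    for validafter, fingerprint, address, is_guard in rows:
        fields = [validafter, fingerprint, address, "t" if is_guard else "f"]
        out.write("\t".join(copy_escape(f) for f in fields) + "\n")
    # A backslash followed by a period on its own line ends the COPY data.
    out.write("\\.\n")
```

The resulting file could then be loaded with something like `psql -f statusentries.sql relays`, where both the file name and the database name `relays` are placeholders.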