15:00:26 #startmeeting SponsorR
15:00:26 Meeting started Tue Apr 21 15:00:26 2015 UTC. The chair is asn. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:26 Useful Commands: #action #agreed #help #info #idea #link #topic.
15:00:32 hello meeting
15:00:34 greets
15:00:37 * nickm in attendence, distracted, cheerful
15:00:46 hehe
15:00:48 (and bad at spelling)
15:00:54 * syverson likewise I guess
15:00:55 :)
15:01:03 ok i see ohmygodel i see syverson i see dgoulet
15:01:10 maybe not karsten, but he sent an email about his potential absence
15:01:28 * teor just ended up in the SponsorR meeting by hanging around too long
15:01:31 so let's start with status reports
15:01:45 during past week, I looked at the graphs in #15513
15:02:01 i ended up co-mentoring two students in SoP
15:02:17 i got plenty of feedback on the direct onion services proposal
15:02:46 i opened #15743 and #15714. the latter has split up further.
15:03:03 and I was accepted for SoP under Yawning and asn, I proposed the Onion Name System
15:03:04 i'm still undecided about #13667.
15:03:13 and that's that from me I think.
15:03:27 i have a few ideas about further stats, but they are not very good.
15:03:31 who wants next?
15:03:32 is kernelcorn jesse victors ?
15:03:36 ohmygodel_: correct
15:03:43 ok
15:03:50 * ohmygodel_ can go next
15:03:51 kernelcorn: ohmygodel_ is aaron johnson.
15:03:52 ohmygodel_: please go
15:04:02 i went to the sponsor r meeting last week
15:04:17 mostly david and rob did actual work
15:04:36 but i am working with SRI to collect timing information for their crawler
15:04:43 to see what kind of performance they are seeing
15:04:46 i saw that. what's the end goal?
15:04:47 ah i see
15:05:04 our whole justification for improving HS performance
15:05:18 depends on such stats
15:05:27 yes.
15:05:30 thats also why Rob made OnionPerf
15:05:33 this might be handy client-side stats indeed
15:05:38 are they using one-hop circs?
15:05:47 for crawling?
15:05:54 no
15:05:58 (shouldn't they?)
15:06:11 maybe ?
15:06:15 ack
15:06:20 maybe they do care about anonymity though
15:06:31 since they are using full circs, then the client-side stats might be representative
15:06:39 they keep getting kicked off various sketchy sites for being too obviously botlike
15:07:10 no surprises here
15:07:17 fair enough
15:07:18 also i worked on the updated technical report about HS stats obfuscation techniques
15:07:28 yes i havent forgotten about that
15:07:29 and no robots.txt equivalent for .onion sites
15:07:48 actually its basically done, just some minor thing to add before sending it to you all
15:07:53 ohmygodel_: ack
15:08:02 its about 12 pages now
15:08:19 im not sure how to get it pulled into tor’s official git repo
15:08:35 pulled into? pushed into?
15:08:36 but maybe somebody can advise me on that later when you get a chance to look at it
15:08:51 pulled, pushed, shoved, slid, whatever
15:08:57 i don't think there is an official git repo for tech reports.
15:09:02 or maybe there is and i just don't know
15:09:22 i think karsten can figure this out.
15:09:26 well then maybe i can just pull request to https://git.torproject.org/user/karsten/tech-reports.git
15:09:30 yes
15:09:31 … if i know how to do that
15:09:39 ok FIN
15:09:40 you can't really pull request like on github
15:09:43 you would have to open a trac ticket
15:09:45 or send karsten an email
15:09:55 ohmygodel: ack thanks!
15:09:56 next?
15:10:26 * syverson can go I guess.
15:10:30 syverson: please
15:10:35 clone to Github, then make pull request
15:10:45 or make a diff or .patch
15:10:48 Attended Sponsor R QPR. Worked with SRI and others on challenges a bit. Taught some people there more about how Tor works and why. Began a response to the "Why is Tor working with DARPA" question that was much improved by Roger and issued by him. Fought with ohmygodel about naming and design issues for [successor-to-hidden] services. Attended the PM departure celebration and talked to the new PM about our goals.
15:10:53 Done.
15:11:26 great thx. you also answered the popularity thread, but i haven't had time to look at it yet.
15:11:30 next?
15:11:34 o/
15:11:39 dgoulet: pleas
15:11:40 e
15:12:07 Right something I forgot. Probably forgot other things, but others please go.
15:13:05 QPR like you all know so some PR and helping other people understand Tor, did some practical work on #15714 and #14917
15:13:21 FYI, #15714 is now split into 3 tickets, you can find them as child tickets
15:13:34 and also I'm now wiser on SMC thanks to ohmygodel :)
15:13:38 * dgoulet done
15:13:41 thx
15:13:48 * nickm did very little for R, but tried to follow the various discussions. I wrote an implementation of PCSA for my approx_counting branch. Maybe others will find it useful.
15:13:50 Done.
15:13:55 PCSA?
15:14:04 Probabilistic Counting with Stochastic Averages
15:14:05 the hash counting trick?
15:14:08 ah ok
15:14:14 the trick on top of the hash counting trick
15:14:23 where you have multiple counters that you choose between.
15:14:23 :)
15:14:50 ok so discussion phase
15:14:55 what would you like to discuss?
15:15:04 i have a few ideas on further stats, but they need to be refined.
15:15:11 we could talk about that if you want.
15:15:11 roadmap refinement?
15:15:18 yay!
15:15:21 roadmap refinement sure
15:15:35 asn: i would like to hear about that
15:15:36 * isabela guess what topic I have (top down list :))
15:15:37 dgoulet: nice find on #14917, well done on that
15:15:41 ah https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorR
15:15:45 because the one in here ^
15:15:48 is up to April
15:16:10 ok then it probably makes sense to refine the roadmap
15:16:26 fwiw, my exam period starts these days, so I will be low activity for the next 10 days. and after that I will be super low activity for 2 weeks.
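For readers unfamiliar with the PCSA that nickm mentions (Flajolet-Martin probabilistic counting with stochastic averaging), here is a minimal Python sketch of the general technique. Each item is hashed, and part of the hash picks one of m bitmaps (the "multiple counters that you choose between"). The position of the lowest set bit in the remaining hash bits is recorded in that bitmap. This is not nickm's approx_counting code. The keyed BLAKE2b hash, the 64-bitmap default, and the class and method names are all our own assumptions.

```python
import hashlib
import os

PHI = 0.77351  # Flajolet-Martin correction constant


class PCSA:
    """Approximate distinct-count sketch (Flajolet-Martin PCSA).

    Only m small bitmaps are kept; the counted items themselves are never stored.
    """

    def __init__(self, num_maps=64, key=None):
        self.m = num_maps
        self.maps = [0] * num_maps
        # Keyed hash (our assumption) so bit positions are not predictable
        # by someone who does not know the key.
        self.key = key if key is not None else os.urandom(16)

    def _hash(self, item: bytes) -> int:
        digest = hashlib.blake2b(item, key=self.key, digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item: bytes) -> None:
        h = self._hash(item)
        bucket = h % self.m  # stochastic averaging: choose one of m bitmaps
        rest = h // self.m
        # rho = index of the lowest set bit of the remaining hash bits
        rho = (rest & -rest).bit_length() - 1 if rest else 63
        self.maps[bucket] |= 1 << rho

    def estimate(self) -> float:
        total = 0
        for bm in self.maps:
            r = 0
            while bm & (1 << r):  # index of the lowest unset bit in this bitmap
                r += 1
            total += r
        return self.m / PHI * 2 ** (total / self.m)
```

Adding the same item twice sets the same bit again, so duplicates do not inflate the count. The trade-off is a relative error of roughly 0.78/sqrt(m). The estimator is also biased upward when the true count is small compared with m.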
15:16:30 just saying :)
15:16:35 asn: ack
15:16:54 asn: you know, a HS monitoring system is easier if you have a DNS, then it's very easy to get lists of HSs
15:17:15 meep
15:17:24 ok. so let's talk a bit about more stats
15:17:32 since that's going to be useful during roadmap making
15:17:43 so we did #15513, which I think was great.
15:17:55 dgoulet: did you have any HSDir-related graphs in the end?
15:18:34 asn: nope unfortunately, I still plan to try to have some by the end of april and actually have the hs health running full time
15:18:42 15513 is very useful yes!
15:18:45 ok
15:18:59 so hsdir stats are coming up. that's ok.
15:19:13 after that, we are undecided on the next stats
15:19:16 here is the top down list: https://docs.google.com/spreadsheets/d/1mY8wax7FBUIAPAmDLpGcEtYCYCcsxW21KtLCntRbSk0/edit?pli=1#gid=0
15:19:27 some ideas from that list:
15:20:00 - on the stress testing part, we could play with the profiler results from #15515 etc. and identify choke points and places to be improved
15:20:17 we have already done a bit of this, and we could present some pie charts and stuff on where the CPU is spent on a busy HS.
15:20:28 potentially useful. potentially useless too.
15:21:04 - still on #15513, I am a bit curious about how the bandwidth graphs of IPs of busy hidden services look.
15:21:11 asn: hmmm we already have tickets for the chokepoints we could start working on
15:21:19 like we could check the IPs of the hidden services when they have lots of traffic, and see if we notice any bumps on their bandwidth graphs.
15:21:40 but that's not too exciting either.
15:22:02 - i'm still curious about #15714 and how we could find a better number than 16384
15:22:23 we could try to think of more statistics that could help us here, without completely leaking popularity
15:22:32 but this does not seem doable cleanly, without an SMC system.
15:22:55 another stupid suggestion would be to run those statistics on a few of our relays for a few weeks, and locally check the results
15:23:10 in that case we are not worrying about the stats being forever or global.
15:23:20 and it would give us a better idea of what the numbers look like.
15:23:29 but it's very cheap to run stats only on our own relays. so i don't really like this idea either.
15:23:31 asn: I can see that working (for IPs)
15:23:40 asn: i think local, private measurements is a useful technique in general
15:23:54 ohmygodel: i think so too, but it's a bit rude maybe.
15:24:09 i think it's useful because without some initial measurements it's hard to understand if our fears are reasonable or not
15:24:57 academic tor researchers do it all the time
15:24:58 anyway, these are my ideas so far.
15:25:03 ohmygodel: true
15:25:12 (that doesnt make it not rude, though)
15:25:37 so are you going to fix that popularity leak ?
15:25:41 which one?
15:25:42 by removing the IP cell cap ?
15:25:54 i think nickm and dgoulet are working on the 16384 one
15:26:03 ohmygodel: #15745 ;)
15:26:25 but we still need to decide if we want to officially ditch the balancing algorithm
15:26:48 i think we should. at least in its current form.
15:27:07 dgoulet: a randomized cap is good, no cap is better
15:27:14 the fact that it doesn't have memory makes it rotate through a big number of IPs quickly.
15:27:17 ohmygodel: not sure no cap is better...
15:27:24 for privacy anyway
15:27:37 asn: we did have a ticket for ditching the algorithm? (can't find it :S)
15:27:43 * kernelcorn is now AFK
15:27:51 that's #4862
15:27:59 asn: great
15:28:30 anyway, i'm not totally against running local measurements to figure out better numbers for this ticket.
15:29:20 no idea how private measurement will give us a better number for 16384 tbh...
15:29:39 we would check how many intros we get in intro circs for a month
15:29:50 and see if we get any of them with more than 16384.
15:30:08 if lots of circuits see more than 16384 intros, then that number is probably too low.
15:30:27 ah! wait "private measurement" means run that on our own relay?
15:30:30 yes
15:30:33 (ok sorry I thought it was in a private network)
15:30:59 yeah I think we could definitely do that
15:31:55 ok
15:32:00 that could potentially be a next stat then
15:32:12 because the profiler output is not that exciting.
15:32:13 this might be more fun.
15:32:55 asn: will you have time to make a patch for this?
15:33:04 ehhhhmm
15:33:13 :)
15:33:22 maybe
15:33:25 if we decide what it needs to do
15:33:40 asn: ok let me know if you can't make it, I can jump on it I guess
15:33:41 but i would prefer to do it in 3 weeks tbh
15:33:52 although it doesn't seem too hard.
15:33:57 ok
15:34:09 let's use the ticket
15:34:11 in the following days
15:34:12 to discuss this
15:34:16 and see what we should measure
15:34:19 asn: yeah I think it's going to be pretty straightforward so we could do it soon-ish and run it for a while, I just don't want to put you in trouble for your Uni ;)
15:34:24 asn: ack
15:34:54 simply counting intro1s per circuit would be very straightforward. making the collection a bit more privacy-preserving would require more thought.
15:35:19 ok
15:35:23 let's use https://trac.torproject.org/projects/tor/ticket/15744 for this task.
15:35:30 hash the onion with siphash24() and count them so when we log we don't know which .onion (anyway ticket talk :)
15:35:41 perfect
15:35:46 IPs don't get to learn the onion anyway
15:35:51 ok
15:36:10 let's talk on that ticket for the next week or so. if we decide on something good, maybe we can hack something up soon.
15:36:15 otherwise, I will do it when I finish exams.
15:36:20 ok
15:36:43 so OK, one potentially controversial stat got added.
15:36:52 maybe we can move to roadmap refinement?
15:36:53 If this is onion counting we can do even better for privacy if we accept some uncertainty. See my approx_count branch.
15:37:26 nickm: i'm not very familiar with the approx_count technique
15:37:42 but i'm not sure if it adds something here
15:37:57 nickm: its not onion counting - its introduction cell counting at an introduction point
15:37:59 it's about not keeping IPs, when we count unique IPs right?
15:38:37 shit IP is an overloaded term in this conversation.
15:38:48 yeah there is no count-unique-streaming-items issue here, which nickm’s approximate counting is for
15:39:11 so it's count-INTRODUCE2-cells-per-IP ?
15:39:26 INTRODUCE1, i believe
15:39:35 yes. probably intro1.
15:39:42 ah
15:39:43 maybe per-service.
15:39:46 or per-circuit.
15:40:03 cant be per-service (ips dont know service), right?
15:40:14 correct
15:40:27 it would be per-circuit. (and assuming that it's one circuit per service)
15:40:33 hrm isn't there the service pub key in the establish_intro cell ?
15:40:59 that's a fresh key for each IP, not the normal pubkey of the hidden service
15:41:21 IIRC, each HS keeps an asymmetric keypair for each one of its IPs.
15:41:38 the pubkeys are exposed in the descriptor.
15:41:45 this is to not leak the identity of the HS to the IPs. it's smart.
15:41:55 clever
15:41:57 asn: you could keep a running noisy histogram and add in each circuit’s count once it’s closed
15:42:18 ohmygodel: yes, that would be the verbose version.
15:42:29 you would still have a running total for an open circuit, though
15:43:06 i'm not completely against a histogram, provided that the time period is big enough. maybe a week or a month.
15:43:39 the more privacy-preserving technique would be a simple 'True' or 'False' on whether any circuits saw more than 16384 INTRO1s during the past week.
15:43:40 if you dont want the histogram, then a noisy running total count of all INTRO1s is even easier, of course
15:44:03 total count would not help us figure out whether the 16384 number is good, right?
15:44:43 asn: im not sure how paranoid you need to be here. the worry is that you get hacked at some point?
15:44:56 no
15:45:01 Who's "you"?
15:45:05 asn: yeah, you're right.
15:45:06 just that the results should not be too revealing when we publish them.
15:45:09 “you” = asn
15:45:27 grrr
15:45:30 oh then fuck all this shit
15:45:32 (we are planning to publish these results, right?)
15:45:46 store everything you want and publish nothing
15:45:57 just use it to inform your protocol improvements
15:46:30 asn: hrm I wasn't thinking we should publish those
15:46:32 that's also a tactic. store everything for our use. and in the end publish a sanitized report.
15:47:35 anyway, let's use #15744 to discuss these techniques, ok?
15:47:56 Hmmm. That seems a pretty serious divergence from general Tor openness principles.
15:48:05 i agree
15:48:21 i disagree
15:48:26 haha
15:48:30 paul you yourself did this
15:48:33 Surprise!
15:48:42 in your measurement study with matt edman
15:49:05 you stored a bunch of shit during the study, and then only published very aggregate statistics afterwards
15:49:31 Right because we realized that we were not as privacy preserving as we had thought.
15:49:50 and rob is doing this with his retransmission observations on micah’s relay
15:49:57 and ivan did this with his HSDir observations
15:50:05 and gareth owen did it with his HSDir observations
15:50:26 and RPW did it with his onion router TCP connections observations
15:50:29 etc. etc. etc.
15:50:33 it's true. but i guess we could display higher openness standards.
15:51:08 I'm not saying we shouldn't do this at all. I'm saying to have eyes open and recognize what principles you're following. How you might be breaking them. If that's a good idea. How you should revise your principles. etc.
15:51:15 oh and roger did it when he started collecting HS stats on moria during the sponsor r kickoff
15:51:36 this is well established practice
15:51:57 It may be that the oft-stated principle is crap. That then requires revisiting.
15:52:00 not that the issue is closed
15:52:06 just its hardly “pretty serious divergence from general Tor openness principles”
15:52:27 oft-stated?
15:52:36 Roger says it all the time.
15:53:00 We should only collect anything that we would be willing to publicly share.
15:53:23 ok ok
15:53:29 let me think about it for the next few days
15:53:36 and i will suggest a few different techniques in the ticket
15:53:43 interesting - i believe roger was actually collecting HSDir *fetches*, which we know of no way to safely publicly share
15:53:57 ohmygodel: I was underscoring the problem with the principle and drawing a bright line between principle and practice, which is not good. Kinda my point.
15:54:09 * nickm 's principle is subtly different: If I would accept a hypothetical patch that prevented anybody from collecting a statistic, then the information is private. :)
15:54:19 but this principle too is suspect
15:54:20 i dont believe in that principle, and i dont know anybody that does
15:54:27 in fact violation of it is built into Tor itself
15:54:39 because Tor collects data for bridge statistics that it would never publish
15:54:56 I just wanted to note it. I don't think this is the time for this discussion.
15:55:20 nickm: what do you think about collecting info for solving #15744 on our relays?
15:55:36 nickm: if we collected that info on the whole network, we would leak popularity, which we have decided not to do, supposedly.
15:55:47 nickm: do you think that collecting such research info on our own boxes for a few weeks is too naughty?
15:55:52 For your values of "we".
15:55:59 tor devs
15:56:30 nickm: we would then look at the info, analyze it to derive conclusions, and publish whatever we think is appropriate to the public.
15:56:46 I disagree with you asn that tor devs decided this except perhaps by fiat of some, but let's leave that aside for now too.
15:57:44 let's defer the topic, I say
15:57:53 ok
15:58:00 let's talk in #15744 about this
15:58:01 over the next week
15:58:02 or so
15:58:03 (btw nickm, i was referring to syverson’s principle, not yours. your msg came out while i was writing that)
15:58:08 * karsten is around now
15:58:13 hello karsten!
15:58:18 hi asn
15:58:20 eok
15:58:21 ok
15:58:27 so does it make sense to do roadmap refinement now?
15:58:30 or leave it for next week?
15:58:39 That is *not* my principle. I'm just the messenger.
15:58:42 asn: hrm couple of things
15:58:45 dgoulet: do say
15:59:06 tor perf tickets, seems Rob took the ball on that, which would defer them to after April
15:59:17 ok
15:59:24 what is rob's thing btw?
15:59:28 onionperf or something?
15:59:29 OnionPerf
15:59:31 is it like torperf but better?
15:59:36 or different use case?
15:59:38 I hope he updates tor-dev soon with it
15:59:41 ok
15:59:54 yeah it's TorPerf 2 but with a service component also as far as I know
15:59:55 it currently fetches some files of a fixed size over a hidden service he’s running
16:00:04 btw i talked to rob yesterday
16:00:17 he doesnt want to tackle torperf 2 as described in the proposal
16:00:24 (offtopic, I really need to steal dgoulet for a day from sponsor R and lock him in a room with all the accumulated torsocks patches)
16:00:28 its like tedious software engineering stuff
16:00:41 there is a torperf2 proposal?
16:00:50 e.g. split into these modules, use these packaging techniques, etc.
16:00:51 written by someone who is not Rob?
16:01:09 tedious software engineering stuff is my bread and butter, my rice and beans, my pasta and tomatoes.
16:01:10 ohmygodel: hire slave^wgradstudents?
16:01:11 there's a tech report where we sketched out a design for a torperf 2.
16:01:11 Yawning: ack, I'm *way* too far behind on that, need to take half a day to merge all and release, please continue to pressure me this week if I don't do it ;)
16:01:19 karsten: aha
16:01:20 that said, my time is hyperlimited over the next N months.
16:01:52 ohmygodel: ok so Rob just wants the quick results of it or improve the proposal or ?
16:02:07 sorry nickm, we’re not lazy, but we don’t get rewarded for spending much time on that kind of stuff
16:02:59 dgoulet: so what about torperf?
16:03:08 dgoulet: why did you mention it i mean?
16:03:28 it's in the April HS roadmap
16:03:35 ah
16:03:41 we kind of had a plan to have it on metrics soonish
16:03:42 dgoulet: if someone in tor is excited to incorporate this into metrics measurements, then i think it can go forward. if not, we’ll keep collecting the stats for our use in sponsor r only.
16:04:08 we could start by adding a link to metrics and think about doing more.
16:04:11 but actually you should talk to rob about it
16:04:28 a link being a page with a short description that can be found on the main metrics page.
16:04:31 ohmygodel: ack
16:04:44 ohmygodel: sounds good.
16:04:58 karsten: right but we really don't want to use current TorPerf right, instead maybe use R to build v2 ?
16:05:42 dgoulet: depends on who's going to do it.
16:05:59 current torperf is a bunch of scripts. better not touch anything or you'll break it.
16:06:14 yeah that's my concern ^ ... ENOTENOUGH resource/time I think
16:06:17 building v2 requires some effort, but may save a lot of trouble long-term.
16:06:59 but did we promise anything in april, which ends in about 9 days?
16:07:24 karsten: not that I know of no
16:07:29 ok good.
16:07:46 i think we are ok yes.
16:07:57 anyway, let's do roadmap refinement next week ok
16:07:58 ?
16:08:07 sounds good
16:08:18 i may stop coming to these meetings for a while
16:08:19 btw it's likely that someone might have to run this meeting during the first week of may.
16:08:34 bc im working on stuff now that doesnt really involve you all much
16:08:41 ohmygodel: ok
16:08:52 ohmygodel: you will still continue on SMC a bit?
16:09:05 dgoulet: yeah, thats a summer project
16:09:06 dgoulet: karsten: you have things for next week?
16:09:09 ohmygodel: great
16:09:16 asn: yeah this week is booked for me so no worries
16:09:27 ok
16:09:35 asn: I'll spend a bit of time on that new intro point graph and look at onionperf.
16:09:44 should I be doing anything else? (haven't read backlog yet.)
16:09:59 maybe CC yourself to #15744?
16:10:07 ok.
16:10:09 i will try to post some techniques there this week
16:10:10 asn: I can help with the meetings while you are away
16:10:27 isabela: great. will tell you.
16:10:32 ok thanks for the meeting folks
16:10:35 have a good day!
16:10:36 #endmeeting