16:03:55 #startmeeting
16:03:55 Meeting started Tue Dec 9 16:03:55 2014 UTC. The chair is asn. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:03:55 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:04:02 ohmygodel: ack.
16:04:23 ok that’s it haha
16:04:26 great
16:04:33 so let's start with what we've been doing past week
16:04:38 and we move to discussion in a bit.
16:04:44 so personally, I've been doing non-SponsorR stuff
16:04:47 but on the SponsorR front
16:04:55 I've been running a relay that is collecting stats
16:04:57 for the past 5-6 days
16:05:03 so far the stats look like this:
16:05:19 [warn] Original rp_relay_cells_seen: 96291. Obfuscated: 90112 Original hs_stats->hs_seen_as_hsdir: 54. Obfuscated: 56
16:05:22 [warn] Original rp_relay_cells_seen: 1278. Obfuscated: 1024 Original hs_stats->hs_seen_as_hsdir: 83. Obfuscated: 88
16:05:25 [warn] Original rp_relay_cells_seen: 18068. Obfuscated: 19456 Original hs_stats->hs_seen_as_hsdir: 108. Obfuscated: 112
16:05:28 [warn] Original rp_relay_cells_seen: 22504. Obfuscated: 8192 Original hs_stats->hs_seen_as_hsdir: 119. Obfuscated: 136
16:05:31 ponder on these for a bit
16:05:36 i also posted a new version of proposal 238 on tor-dev
16:05:44 that details how exactly we are currently doing obfuscation
16:05:54 yeah i just sent a reply
16:05:55 and I just noticed that ohmygodel replied with a different way of doing this.
16:06:07 yeah i think you should flip the order of noise and binning
16:06:18 ohmygodel: btw, is "binning" an established term with a name?
16:06:23 because i'm calling it round-up obfuscation.
16:06:24 anyway
16:06:38 i also looked at the tech report a bit. i have a small patch, but I could do more there so I haven't published it.
16:06:39 yes:
16:06:41 and that's that from me.
16:06:43 ohmygodel: cheers
16:06:48 who wants next?
16:06:57 * karsten can go next
16:06:59 karsten: go!
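The four [warn] lines above are consistent with round-up obfuscation using a bin size of 1024 for rp_relay_cells_seen and 8 for hs_seen_as_hsdir: every obfuscated value is a multiple of its bin, and some land below the original (96291 becomes 90112), which rounding up alone cannot do, so the noise must be applied before the final round-up. The sketch below contrasts that order with the flipped one ohmygodel suggests (round up first, then add noise). It is a minimal illustration under stated assumptions: Laplace noise drawn by inverse CDF from a caller-supplied uniform value u, and hypothetical delta_f/epsilon arguments that are placeholders, not values from proposal 238 or from tor.

```python
import math
import random


def round_up(value: float, bin_size: int) -> int:
    """Round value up to the next multiple of bin_size (also correct for negatives)."""
    return math.ceil(value / bin_size) * bin_size


def laplace_noise(scale: float, u: float) -> float:
    """Inverse-CDF draw from Laplace(0, scale); u must lie in the open interval (-0.5, 0.5)."""
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def draw_u(rng: random.Random) -> float:
    """Uniform value in (-0.5, 0.5); -0.5 is redrawn because log(0) is undefined."""
    u = rng.random() - 0.5
    while u == -0.5:
        u = rng.random() - 0.5
    return u


def noise_then_round(count: int, bin_size: int, delta_f: float, epsilon: float, u: float) -> int:
    """Order consistent with the log output: add noise, then round up to the bin."""
    return round_up(count + laplace_noise(delta_f / epsilon, u), bin_size)


def round_then_noise(count: int, bin_size: int, delta_f: float, epsilon: float, u: float) -> int:
    """Flipped order: round the true count up to its bin, then add noise."""
    return round(round_up(count, bin_size) + laplace_noise(delta_f / epsilon, u))


if __name__ == "__main__":
    rng = random.Random()
    u = draw_u(rng)
    # Hypothetical noise parameters (2048, 0.3); only the bin size is inferred from the log.
    print(noise_then_round(96291, 1024, 2048, 0.3, u))
    print(round_then_noise(96291, 1024, 2048, 0.3, u))
```

Both calls reuse the same draw u, so the two outputs differ only in where the rounding sits. One visible consequence of the flip is that the published value is no longer a multiple of the bin size.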
16:07:10 I picked up asn's branch that implements said proposal 238
16:07:21 and added obfuscation as we discussed it last week.
16:07:34 I also tested it in a local chutney network.
16:07:51 and I looked at asn's revised proposal. like it.
16:07:54 that's all.
16:08:03 great
16:08:06 great work with the coding btw
16:08:10 * dgoulet can go next
16:08:14 asn: :)
16:08:15 dgoulet: go!!
16:08:50 also pretty quick, last week was mostly for me to finalize part of the measurement framework I've been working on, my goal is to have the report and doc. this week
16:09:04 reviewed karsten/asn code also
16:09:14 dgoulet is this for measuring hs performance ?
16:09:16 read the proposal which is pretty fine by me
16:09:17 I talked to Aaron (ohmygodel) about obfuscation and how it is both more useful and more secure to do this on the global aggregate than locally
16:09:20 ohmygodel: yes
16:09:25 ok thx
16:09:39 good morning, world
16:09:44 ohmygodel: private network, we still have to set it up on Shadow (with rob) but it's in the making
16:09:58 dgoulet: great.
16:10:05 done
16:10:08 who next?
16:10:16 syverson: ack!
16:10:22 syverson you done ?
16:10:48 What? Sorry.
16:10:53 syverson: you wanna do a status report?
16:11:01 apart from 16:09 < syverson> I talked to Aaron (ohmygodel) about obfuscation and how it is both more useful and more secure to do this on the global aggregate than locally
16:11:11 or that summarizes your activity sufficiently?
16:11:24 Oh. Ermm thinking.
16:11:47 actually what does "how it is both more useful and more secure to do this on the global aggregate than locally" mean?
16:12:04 doing obfuscation on the global value instead of individually in each meter?
16:12:13 Read through the tech report and made a bunch of hand notes. Wasn't sure about editing it.
16:12:43 that would be very interesting feedback, syverson.
16:13:23 assuming you mean the tech report where we outline what stats we might gather about hidden services in the future.
16:13:33 and what we should rather not gather.
16:14:06 ok great
16:14:07 Well if you are adding noise per HSDir, then the cumulative effect can be bad.
16:14:23 yes
16:14:27 too much noise, you mean?
16:14:54 well more than is necessary
16:15:02 ye
16:15:17 because all we currently want is a single number
16:15:27 Yes. That report. Nothing deep so far. One thing I was doing was changing "hidden service" to ".onion site" everywhere as per some other conversations.
16:15:30 ok ohmygodel wanna finish with status report?
16:15:35 reporting per-relay is only because it is the easiest to do
16:15:39 or you are fine with moving to discussion directly?
16:15:53 yes ill go
16:16:11 ive been thinking about privacy threats for the proposed stats
16:16:14 * aagbsn waits turn
16:16:19 aagbsn: ack!
16:16:24 one minor issue
16:16:41 is that currently the security stuff only exists in the proposal
16:16:58 there is this "risk" section in every stat in the tech report
16:17:02 but the tech report is the place where we're collecting discussion of all stats
16:17:06 yes
16:17:07 i agree
16:17:20 i also get a weird feeling reading the tech report
16:17:24 so, should that be ported back from the proposal ?
16:17:33 i'm not sure if the "details / benefits / risks" model is the best model.
16:17:41 not at all, asn.
16:17:48 but i can't think of a much better model either.
16:17:50 it was just useful for collecting ideas.
16:17:53 karsten: yes
16:18:03 the tech report needs quite some love.
16:18:05 ohmygodel: yes, ideally all those security problems from the proposal should have a nice place in the tech report.
16:18:17 the tech report should ideally have a section on obfuscation too...
16:18:44 note that the proposal is almost (?) ready for publication, whereas the tech report is for january 15.
16:18:47 or 12.
16:18:48 ok that answers that problem
16:18:48 karsten: yes
16:18:59 so my approach is to consider
16:19:10 tbh I can see the tech report dragging past jan15.
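The "cumulative effect" of per-HSDir noise can be quantified. A Laplace(0, b) draw has variance 2b², and variances of independent draws add, so if each of n HSDirs noises its own count before the counts are summed, the error in the total has standard deviation sqrt(2n)·b, versus sqrt(2)·b when noise is added once to the global aggregate. A small simulation under hypothetical values (100 HSDirs, b = 10) checks that sqrt(n) gap. It says nothing about who is allowed to see the un-noised aggregate, which is the hard part of aggregating globally.

```python
import math
import random
import statistics
from typing import List


def laplace(rng: random.Random, scale: float) -> float:
    """Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def analytic_std(n_noise_draws: int, scale: float) -> float:
    """Std. dev. of a sum of n independent Laplace(0, scale) draws: Var = 2 * n * scale**2."""
    return math.sqrt(2 * n_noise_draws) * scale


def simulated_std(counts: List[int], scale: float, per_relay: bool,
                  trials: int = 2000, seed: int = 0) -> float:
    """Empirical std. dev. of the error in the estimated total over many trials."""
    rng = random.Random(seed)
    true_total = sum(counts)
    errors = []
    for _ in range(trials):
        if per_relay:
            # Every HSDir noises its own count; the reported values are then summed.
            estimate = sum(c + laplace(rng, scale) for c in counts)
        else:
            # Noise is added once, to the global aggregate.
            estimate = true_total + laplace(rng, scale)
        errors.append(estimate - true_total)
    return statistics.pstdev(errors)


if __name__ == "__main__":
    counts = [50] * 100  # hypothetical: 100 HSDirs, each having seen 50 onions
    scale = 10.0         # hypothetical Laplace scale (delta_f / epsilon)
    print("per-HSDir noise:", simulated_std(counts, scale, True), "expected", analytic_std(100, scale))
    print("global noise:   ", simulated_std(counts, scale, False), "expected", analytic_std(1, scale))
```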
but we should definitely have a decent first version ready by then.
16:19:14 1. what things we want to make sure to keep private
16:19:25 asn: makes sense.
16:19:37 karsten: :)
16:19:47 2. what background knowledge the adversary could plausibly have
16:20:23 so i guess that follows the “risks” approach, but i think it's a good way to think about it
16:20:23 (and we are now moving towards the discussion phase of this meeting btw)
16:20:38 (we should let aagbsn say his report too)
16:20:40 ohmygodel: yes I agree
16:20:47 ok sure that ends my status update
16:20:54 ohmygodel: ack thanks!
16:20:56 aagbsn: next?
16:21:21 ok
16:21:27 dropping by per roger's recommendation and attempting to catch up / see where I can be useful here. dgoulet sent me a few links (SponsorR, SponsorRtasklist). are these being assigned to people as they pick them? are there any tasks no one else wants to do?
16:21:41 hm
16:21:45 that's a good question.
16:21:48 we should think about this.
16:21:51 that is, where you can fit.
16:22:19 i don't have a good answer atm, because my view is quite narrow towards the first deadline on jan 15th.
16:22:22 I got roger's email earlier today and haven't had a lot of time to look at things yet
16:22:30 and till then, we have a good idea on who does what.
16:22:41 but I think we can definitely find tasks for people.
16:22:49 is anybody working on figuring out how much our stats are polluted by crawlers ?
16:22:55 ohmygodel: no.
16:22:58 probably I can look at tor controller related code
16:22:59 ohmygodel: no one is working on this.
16:23:25 ok, theres an idea
16:23:32 not sure if thats the best use of aagbsn’s time
16:23:58 perhaps there are things more directly applicable to the jan15 deadline?
16:24:30 is there a set of things that are supposed to be done by then?
16:24:36 yes
16:24:46 *** learn number of HSes / how much HS traffic
16:24:47 *** privnet benchmarking // interesting observations that lead to surprising results
16:24:49 *** have a list of tasks/projects ready (read the thread)
16:24:50 this is the rough list.
16:24:52 *** performance baseline
16:24:54 I didn't see it as safe statistics per se, but I have been describing to ohmygodel (and a little to Roger) setting up a reporting site and also "honey" detection for when a crawl is taking place.
16:25:08 sysrqb: interesting
16:25:26 syverson: interesting
16:25:30 syverson: donncha of oniontip was talking about a honey system
16:25:40 might want to loop him in here too
16:25:42 This is sort of like exonerator (but only sort of)
16:26:08 aagbsn: so, the above 4 bullet points are what we need to deliver by jan15th supposedly.
16:26:15 aagbsn: the first one is proposal238 that I just sent to [tor-dev]
16:26:31 aagbsn: the second one is the work that dgoulet is doing with chutney
16:26:39 aagbsn: the third one is https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorRtasklist
16:26:46 aagbsn: the fourth one is also related to what dgoulet has been doing
16:26:56 Mostly I just wanted to detect when crawls were taking place, but it could also be used to distinguish "benign" crawls from scanning for attack surfaces, etc.
16:27:12 wait, bullet 3 is all the tasks on that list?
16:27:23 make the list, not do the things.
16:27:30 oh, ok :)
16:27:40 syverson: isn't this a bit of an arms race though?
16:28:01 syverson: that is, we detect crawling. they change software to crawl in a different way. we detect new crawling. they change software etc.
16:28:26 syverson: but I like the idea.
16:28:28 one thing for Jan 15th that could be good is to present performance improvements that are possible on the host side, like #13739 (I think there are other tickets where slow paths have been identified)
16:28:54 I'm mostly not concerned with adversarial crawling, though we can consider it. I'm mostly concerned with recognizing benign crawls to see how they affect statistics.
16:28:56 asn: what’s the arms race? its passive observation
16:29:25 what is adversarial crawling and what is benign crawling? :)
16:29:38 they seem kind of the same to me tbh.
16:29:53 ohmygodel: well, the arms race is that of detecting crawling as an action.
16:29:58 differentiating between normal client and crawler.
16:30:06 from the PoV of a website operator, I guess.
16:30:10 what is the incentive to change your crawling behavior ?
16:30:23 to not get detected by the crawling detectors
16:30:27 if you even care so.
16:30:37 Sorry. Somebody trying to see the HSes that are out there as opposed to scanning HSes to see which are vulnerable to attacks.
16:30:41 not trying to kill the discussion, because it's a good thing to do, imo. but this is something for post-jan15, right?
16:30:50 ok, but the purpose of crawling detectors is just to adjust our statistical inferences
16:30:53 karsten: yes, I think so.
16:31:11 karsten: i think the above 4 bullet points I posted are what we are doing for jan15. everything else is discussion for now :)
16:31:17 ok.
16:31:19 ohmygodel: ah, I see.
16:31:38 asn: i think that adjusting for crawling behavior is plausibly doable and interesting for jan 15
16:31:41 i was imagining that the purpose of crawling detectors would be to *stop* the crawling.
16:31:45 if there is someone who has time
16:31:55 I'm going to start on it soon, but I may be able to mostly coordinate with SRI folk and not involve others who are too busy with other things.
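One simple way to "adjust our statistical inferences" for crawling, assuming crawl periods can be identified at all (for example from crawl announcements or honey HSes), is to compare a statistic on days with and without a known crawl. The sketch below uses hypothetical day labels and counts, not measurements. Because "no crawl detected" is not the same as "no crawl", undetected crawls on the baseline days would make this difference understate the crawl effect.

```python
from statistics import mean
from typing import Dict, Set, Tuple


def crawl_effect(daily_counts: Dict[str, int], crawl_days: Set[str]) -> Tuple[float, float]:
    """Return (excess per day during crawls, share of crawl-day counts that excess represents).

    daily_counts maps a day label to a statistic such as rendezvous cells seen;
    crawl_days holds the days on which a crawl was announced or detected.
    """
    with_crawl = [c for day, c in daily_counts.items() if day in crawl_days]
    without_crawl = [c for day, c in daily_counts.items() if day not in crawl_days]
    if not with_crawl or not without_crawl:
        raise ValueError("need at least one day with and one day without a crawl")
    excess = mean(with_crawl) - mean(without_crawl)
    return excess, excess / mean(with_crawl)


if __name__ == "__main__":
    # Hypothetical numbers, not measurements.
    days = {"d1": 100, "d2": 100, "d3": 160, "d4": 140}
    print(crawl_effect(days, {"d3", "d4"}))
```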
16:31:59 i was hoping the SRI people might do it, but if Tor has a person, great
16:32:10 maybe aagbsn could be the person!
16:32:19 but can you explain a bit more in depth what you mean?
16:32:46 figure out how much of the RP activity we see is due to crawler fetches
16:32:52 like, we run some HSes, we check the amount of crawling that happens, and then we put this knowledge into the statistics to weed out crawling traffic from real traffic?
16:33:08 a design I overheard was to run some HS that are never listed publicly
16:33:15 At least two things. At simplest you just have a site where people can announce they are doing (or did if they don't want to announce in realtime) a crawl.
16:33:36 to look for crawling that is done by running hsdir and snooping
16:33:52 aagbsn: yes! watching the watchers
16:34:05 Second, set up a bunch of HSes that don't have much other purpose and watch the pattern of accesses to them. Announce their results on the same site.
16:34:30 without telling which sites these are.
16:34:42 I'm currently working on detecting HSDirs that are crawling the DHT
16:35:26 Current plan is to set up some relays, then to run some HSes which only publish their descriptors to my relays.
16:35:30 syverson: and what's the end goal? to learn how many entities are crawling?
16:35:40 syverson: or to learn how *much* they are crawling?
16:35:44 There are separate possibilities for what you want to detect. If the "honey" HSes are not publicly announced anywhere, that might indicate misuse by an HSDir.
16:36:17 But for the publicly announced ones, it's just showing benign (sorry) crawling.
16:36:34 I'm then going to publish a descriptor for a unique HS to each HSDir (modifying the descriptor_id and re-signing).
16:36:59 DonnchaC: that's nice.
16:37:07 asn: my thought is that crawling the web does not disrupt the web much, but crawling a tiny thing like .onion space really displaces the statistics.
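DonnchaC's plan reduces to a lookup: if each HSDir is the only party given a particular honey onion's descriptor, any later fetch of or connection to that onion points at that HSDir. A minimal sketch with hypothetical fingerprints and addresses; it assumes the honey onions are never advertised anywhere else and that the experimenters' own test accesses are filtered out, and it ignores whether a given HSDir will accept the descriptor in the first place.

```python
from collections import Counter
from typing import Dict, Iterable, List


def assign_honey_onions(hsdir_fingerprints: List[str], honey_onions: List[str]) -> Dict[str, str]:
    """Give each HSDir its own never-advertised onion; return onion -> the one HSDir that got it."""
    if len(honey_onions) < len(hsdir_fingerprints):
        raise ValueError("need at least one honey onion per HSDir")
    if len(set(honey_onions)) != len(honey_onions):
        raise ValueError("honey onions must be unique")
    return dict(zip(honey_onions, hsdir_fingerprints))


def implicated_hsdirs(assignment: Dict[str, str], accessed_onions: Iterable[str]) -> Counter:
    """Count accesses per HSDir: an access to an onion known only to one HSDir implicates it."""
    hits: Counter = Counter()
    for onion in accessed_onions:
        hsdir = assignment.get(onion)
        if hsdir is not None:
            hits[hsdir] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical fingerprints and addresses, for illustration only.
    assignment = assign_honey_onions(["HSDIR_A", "HSDIR_B", "HSDIR_C"],
                                     ["honeyaaaa.onion", "honeybbbb.onion", "honeycccc.onion"])
    log = ["honeybbbb.onion", "honeybbbb.onion", "someotherhs.onion"]
    print(implicated_hsdirs(assignment, log))
```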
16:37:21 By monitoring the resulting descriptor fetches and connections, it should be possible to figure out the malicious HSDirs, potentially blacklisting them
16:37:21 DonnchaC: i think the pubkey also needs to match because HSDirs will check whether they are in the correct slice before serving a desc
16:38:09 DonnchaC: see hid_serv_responsible_for_desc_id()
16:38:11 DonnchaC: yeah what asn is saying is a problem here, the HSDir won't accept to store the descriptor if the pubkey is not in the range of it
16:38:29 of the HS*
16:38:38 syverson: yes, that might be true.
16:38:43 DonnchaC: what asn and dgoulet say. but when you create a new pubkey, be sure to reuse introduction points.
16:38:50 dgoulet: I'll recheck, I thought it checked the descriptor id but not the public key
16:38:55 DonnchaC: that was something I was considering, but I mostly was focused on the statistical impact of crawling first.
16:39:05 syverson: but do you think we will be able to learn "So crawling is X% of the HS traffic" from this experiment?
16:39:27 syverson: we might be able to learn "there are N different crawlers crawling the Tor network"
16:39:45 syverson: or even "crawler Y is _this_ fast, and crawler Z is _that_ fast"
16:39:56 syverson: but not sure how to get the crawler's influence on the statistics.
16:40:08 but it seems like an interesting project.
16:40:17 shedding more light on the crawling activity seems worthwhile.
16:40:18 DonnchaC: not on the HSDir part, the client had that issue (maybe #13214 is what you are talking about)
16:40:23 Not sure. At the least you should be able to say, here are the statistics during a period when we knew crawling (or n crawls) were taking place, and here's the stats when no crawls were detected.
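The "correct slice" asn and dgoulet refer to comes from v2 descriptor placement in rend-spec: a descriptor is stored on the three HSDirs whose identity digests follow its descriptor ID on the ring of HSDir fingerprints, and an HSDir checks that it is one of them when a descriptor is uploaded. A simplified sketch of that rule, not tor's hid_serv_responsible_for_desc_id(): it takes a plain list of digests and ignores consensus flags and exact tie-breaking. For the experiment, this means a descriptor only reaches a chosen HSDir if its ID falls just before that HSDir's digest.

```python
import bisect
from typing import List


def responsible_hsdirs(desc_id: bytes, hsdir_ids: List[bytes], n: int = 3) -> List[bytes]:
    """The n HSDir identity digests that follow desc_id on the ring, wrapping at the end."""
    ring = sorted(hsdir_ids)
    if not ring:
        return []
    start = bisect.bisect_right(ring, desc_id)  # first digest strictly greater than desc_id
    return [ring[(start + i) % len(ring)] for i in range(min(n, len(ring)))]


def is_responsible(my_id: bytes, desc_id: bytes, hsdir_ids: List[bytes], n: int = 3) -> bool:
    """Simplified analogue of the check an HSDir makes before accepting a descriptor."""
    return my_id in responsible_hsdirs(desc_id, hsdir_ids, n)


if __name__ == "__main__":
    # Hypothetical 20-byte identity digests.
    dirs = [bytes([b]) * 20 for b in (0x10, 0x40, 0x80, 0xC0)]
    desc = bytes([0x50]) * 20
    print([d[:1].hex() for d in responsible_hsdirs(desc, dirs)])
```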
16:40:52 hmmmm
16:40:59 plausible
16:41:20 although "no crawls were detected" might actually be "no crawlers were stupid enough to hit our weird unknown HS"
16:41:35 but the smart crawlers will still be crawling the public HSes that are not run by us.
16:42:05 yeah and there are definitely crawlers focused on only certain specific HSes
16:42:25 Even just differentiating when SRI's darkcrawler and ahmia were crawling and when they weren't would be useful.
16:42:25 might be able to insert links to a honion (honey-onion) on other public hs in a way that humans wouldn't click them
16:42:25 e.g. Grams
16:43:07 Similarly if somebody else we don't know about is doing research crawling we can find that.
16:43:44 aagbsn: do you find this an interesting problem to work on
16:43:46 ?
16:43:48 it's fun.
16:43:53 dgoulet: I don't think the HSDir recalculates the descriptor id when it receives a descriptor, it just uses the one provided in the descriptor and checks if it is responsible for the provided descriptor_id
16:44:06 DonnchaC: yup exactly
16:44:39 we can also set up HSes of different publicity.
16:44:42 This is initial stuff, to perhaps have something to say for January. The adversarial aspects can be addressed but we should at least try to know how the non-adversarial activities affect things.
16:44:49 one of them can be completely private, the other can be on ahmia, the other can be wherever, etc.
16:45:16 asn: could be, depends what we decide to do :)
16:45:39 i think differentiating adversarial (finding vulns) from benign crawling (search engines or whatever this might be) will be extremely hard
16:45:44 it's basically the IDS game.
16:46:14 aagbsn: well, this seems like something useful and tractable
16:46:18 might want to know if 2 requests are from the same crawler
16:46:26 e.g. fuzzing, not walking
16:46:36 The goal should probably not be to win the arms race but to get the low hanging fruit and then quit the race.
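For reference, rend-spec defines the v2 descriptor ID as descriptor-id = H(permanent-id | secret-id-part), where permanent-id is the first 80 bits of the SHA-1 digest of the service's public key and secret-id-part = H(time-period | descriptor-cookie | replica). Because a v2 descriptor carries its permanent-key and secret-id-part, a parser can recompute the ID from those fields and reject a mismatch, so rewriting descriptor-id and re-signing is not enough on its own. A minimal Python sketch of the derivation, assuming the key is passed as already-encoded bytes (a placeholder here, not a real RSA key) and omitting signature checks.

```python
import hashlib
import struct
from typing import Optional

TIME_PERIOD_SECONDS = 86400  # v2 descriptors rotate daily


def permanent_id(pubkey_der: bytes) -> bytes:
    """First 80 bits (10 bytes) of the SHA-1 digest of the encoded public key."""
    return hashlib.sha1(pubkey_der).digest()[:10]


def time_period(now: int, perm_id: bytes) -> int:
    """Offset by the first permanent-id byte so that services do not all rotate at once."""
    return (now + perm_id[0] * TIME_PERIOD_SECONDS // 256) // TIME_PERIOD_SECONDS


def secret_id_part(period: int, replica: int, cookie: Optional[bytes] = None) -> bytes:
    """H(time-period | descriptor-cookie | replica), with time-period as 4 big-endian bytes."""
    return hashlib.sha1(struct.pack(">I", period) + (cookie or b"") + bytes([replica])).digest()


def descriptor_id(pubkey_der: bytes, now: int, replica: int, cookie: Optional[bytes] = None) -> bytes:
    """descriptor-id = H(permanent-id | secret-id-part)."""
    pid = permanent_id(pubkey_der)
    return hashlib.sha1(pid + secret_id_part(time_period(now, pid), replica, cookie)).digest()


def parsed_id_matches(claimed_id: bytes, pubkey_der: bytes, parsed_secret_id_part: bytes) -> bool:
    """What a parser can check using only fields taken from the descriptor itself."""
    return hashlib.sha1(permanent_id(pubkey_der) + parsed_secret_id_part).digest() == claimed_id


if __name__ == "__main__":
    key = b"placeholder for a DER-encoded RSA public key"
    now = 1418140800  # hypothetical timestamp
    print(descriptor_id(key, now, replica=0).hex())
```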
16:46:44 I'm going to keep working on detecting adversarial HSDirs (logging DHT requests) anyways. I'll let you know what kind of results I get.
16:46:46 syverson: heh
16:46:49 also look for scanners that try to connect to ports other than 80, etc
16:46:54 DonnchaC: yes, seems worthwhile.
16:46:57 DonnchaC: keep us in the loop.
16:47:42 not sure how to make tor listen to all ports, for example
16:47:47 I'd imagine it should be possible to identify some malicious relays, and correlate what relays are possibly run by the same people/groups depending on the fingerprint of the crawl
16:47:52 aagbsn: a patch will go in soon for that, but yeah, whether HSes get portscanned could be a very interesting stat too: https://trac.torproject.org/projects/tor/ticket/13667
16:48:09 OK, so let's move to the next discussion topic?
16:48:21 aagbsn: Can try to log the rendezvous request too, rather than just connections to port 80
16:48:28 if so id like to discuss more complicated protocols for stats collection
16:48:32 (i gtg in 12min)
16:48:45 ohmygodel: ok let's talk about this!
16:49:00 ohmygodel: you mean some way for all relays to communicate with each other and exchange stats
16:49:00 so it would be really great if we could do something like privex
16:49:06 right
16:49:10 DonnchaC: 4583 log_warn(LD_REND, "Parsed descriptor ID does not match "
16:49:10 4584 "computed descriptor ID.");
16:49:10 something like that yeah
16:49:24 DonnchaC: look for that in src/or/routerparse.c
16:49:33 is this something worth shooting for in the next few months ?
16:49:37 my first thoughts are "big project. not completely unreasonable."
16:49:44 karsten: Thanks, I'll check it out
16:49:47 ohmygodel: unclear.
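The PrivEx-like aggregation floated here can be illustrated with one common building block for such schemes: additive secret sharing. Each relay splits its count into random shares, one per tally server; each server only sums the shares it receives, and only the combination of all servers' sums reveals the total. This is a minimal sketch of the general technique, not the PrivEx protocol itself. The number of tally servers and the modulus are hypothetical, and it leaves out noise, authentication, and relays that drop out mid-round. The sample input reuses the four original rp_relay_cells_seen values from the log above purely as example numbers.

```python
import secrets
from typing import List

MODULUS = 2 ** 64  # hypothetical; totals must stay below it to be recovered exactly


def split_into_shares(value: int, n_servers: int) -> List[int]:
    """Split value into n_servers shares that sum to value mod MODULUS.

    Any n_servers - 1 of the shares are uniformly random, so they reveal nothing about value.
    """
    if n_servers < 1:
        raise ValueError("need at least one tally server")
    shares = [secrets.randbelow(MODULUS) for _ in range(n_servers - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def aggregate(per_relay_counts: List[int], n_servers: int = 3) -> int:
    """Simulate relays sending shares to tally servers and the servers combining their sums."""
    server_sums = [0] * n_servers
    for count in per_relay_counts:
        for i, share in enumerate(split_into_shares(count, n_servers)):
            # Server i sees one random-looking share per relay, never a relay's count.
            server_sums[i] = (server_sums[i] + share) % MODULUS
    # Only the sum over all servers is meaningful: it equals the total count.
    return sum(server_sums) % MODULUS


if __name__ == "__main__":
    print(aggregate([96291, 1278, 18068, 22504]))
```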
16:49:49 and what are the big problems to solve
16:49:49 ohmygodel: my plan was
16:50:02 ohmygodel: to let roger and dgoulet go to this meeting on jan15
16:50:15 and use the tasklist https://trac.torproject.org/projects/tor/wiki/org/sponsors/SponsorRtasklist
16:50:19 + any other tasks we add in there
16:50:30 and come out of the meeting with some tasks that we and the funder want us to do.
16:50:51 i have not read the privex paper
16:50:59 so i cannot really evaluate how much time it would take
16:51:11 224 would be nice also
16:51:13 i believe that karsten has
16:51:15 but it seems like a multi-month project.
16:51:32 ohmygodel: I can't say how much effort it would be.
16:51:50 ohmygodel: as a reference, implementing this simple laplace thing with tests and all that took us 1, maybe 2 days.
16:51:50 ohmygodel: i will definitely add privex-like things to the tasklist though
16:52:11 ok that sounds good and we can discuss it at the jan mtg then
16:52:16 ohmygodel: I think we'll need to write a lot of code for privex that seems simple from a design perspective, but that is quite hard to write.
16:52:26 im not advocating privex per se
16:52:30 ohmygodel: yes
16:52:42 karsten: it might not be uber hard, since relays talk to each other with cells already.
16:52:45 im suggesting a centralized statistics aggregation method
16:52:49 so we would have to add new cell types that carry statistics
16:52:56 asn: I'm open to discussing it.
16:53:01 yee...
16:53:04 one that could be just for a few HS stats initially
16:53:16 although in the long term, its a good model for very many Tor metrics
16:53:21 yes maybe
16:53:24 yup.
16:53:36 i dont see a new cell type being needed
16:53:40 this could be a completely separate module
16:53:44 completely separate code
16:54:10 i see two issues
16:54:30 1. implementation using some cryptographic tools not previously used
16:54:48 e.g. some homomorphic encryption scheme
16:54:55 oh god
16:55:36 2. setting up infrastructure to support the collection (e.g. something like directory auths but for stats)
16:55:36 asn: it's just exempli gratia
16:55:54 ye sure
16:56:07 it just seems like a big project just for the sake of stats :)
16:56:11 asn: yes if there simply doesnt exist a reliable implementation of the needed cryptosystem, then were SOL
16:56:58 anyway, I guess we need to read the privex thing first.
16:57:03 Tor gets lots of money for stats
16:57:07 stats are really useful and interesting
16:57:14 i don't think I should comment too much without reading the paper first.
16:57:32 * asn shrugs
16:57:39 ok so thx for putting it on the task list
16:57:40 i would prefer to spend that money on performance
16:57:48 but anyway, I don't mind.
16:57:52 yeah and i would prefer to spend that money on privacy
16:57:53 we should think about this for sure.
16:57:55 No idea, but we could try to mine what's come out of the DARPA PROCEED program.
16:58:04 ohmygodel: :)
16:58:25 #fundingissues
16:58:42 ponies
16:58:46 so this discussion topic is done too.
16:58:49 what's next?
16:59:29 ohmygodel: i need to think more about your obfuscation reply
16:59:29 btw
16:59:41 ohmygodel: that is, inverting the order of that function composition.
16:59:45 it probably makes sense, but I don't get it yet.
16:59:52 ok
16:59:59 and now its noon
17:00:07 peace friends
17:00:15 ohmygodel: o/
17:00:16 ok thx
17:00:34 syverson: btw, would you be interested in doing a blog post about mixed-latency anonymity? :)
17:01:20 https://trac.torproject.org/projects/tor/ticket/13192#comment:21 hah
17:01:41 yeah I laughed
17:02:03 funny cause it's true
17:02:07 but anyway
17:02:11 ermm I might participate, but if it's something that needs a publication release I should let someone else be the author.
17:02:23 syverson: no, not really.
17:02:33 syverson: there is just lots of interest in doing higher-latency anonymity lately
17:02:35 hey
17:02:47 syverson: and most of us don't really know how to start thinking about this problem
17:02:57 is there a community board for the tor project? I got this idea that will improve anonymity for all outproxy users.
17:03:00 syverson: and I was imagining that you would have some ideas.
17:03:27 syverson: anyway, think about it, and if you are interested in writing something send me an email :)
17:03:33 syverson: blog post, tor-dev post, whatever all is fine.
17:03:40 it doesn't even need to be big.
17:04:17 ideas != time unfortunately. I'm trying to get myself up to speed on the stats and HS stuff that I haven't been looking at enough.
17:04:40 syverson: ye :)
17:04:43 syverson: no problem :)
17:05:00 is there anyone looking at multi-path circuits?
17:05:12 e.g. so connections can survive a relay failure
17:05:33 I would like to look at mixed-latency stuff and mixing. The important thing is not to get too distracted by the interesting cool things when there's work to do. ;)
17:06:00 syverson: you mean not get distracted by stats when there is actual anonymity work to be done, right? :)
17:06:03 syverson: j/k
17:06:06 lol :)
17:06:33 aagbsn: nope. mainly mikeperry has been looking into this.
17:06:41 @asn The idea is putting all out-proxies behind a hidden service so out-proxy IPs are not public anymore. It would make Tor very hard to block and add plausible deniability for out-proxy users.
17:06:43 aagbsn: actually that is the funding we had to work on onion routing when we designed Tor.
17:07:22 admin-pc: outproxy is I2P terminology right?
17:07:28 admin-pc: it's the same as exit nodes in Tor? or not?
17:07:34 @asn no
17:07:48 out-proxy = socks-5 proxy server behind hidden service
17:08:00 in tor we dont have it
17:08:04 we simply have exit nodes
17:08:14 and they know way too much :(
17:09:18 %endmeeting
17:09:20 #endmeeting