18:00:35 #startmeeting weekly network team meeting, 16 Jan 2018
18:00:35 Meeting started Tue Jan 16 18:00:35 2018 UTC. The chair is nickm. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:35 Useful Commands: #action #agreed #help #info #idea #link #topic.
18:00:38 hi all!
18:00:46 https://pad.riseup.net/p/l6fPZUFt4NwQ is the pad
18:01:02 hey
18:01:45 helloooo
18:01:48 * asn was writing an email
18:01:50 * asn drafts it
18:01:51 hi
18:02:24 helloooo!
18:02:35 thanks for putting up with the changed date this week
18:03:30 it's nice that we just pushed it a day instead of waiting a week, i think that is a good change
18:03:48 agreed
18:04:57 well, it's going to be a busy week! The 0.3.3 freeze is next week, so it's a good time to get things done.
18:05:19 (The freeze was originally going to be yesterday, but we postponed because of the 0.3.2 delay.
18:05:22 )
18:06:36 o/
18:06:41 hi, isis! How are you doing?
18:07:11 \o
18:07:15 * isis lurks to stay-up-to-date on what other people are up to
18:07:31 i'm doing okay, how are you?
18:07:48 not so bad. Gearing up for the freeze
18:07:55 pad is at https://pad.riseup.net/p/l6fPZUFt4NwQ
18:08:07 isabela: are you around?
18:09:46 okay, let's get started!
18:10:03 let's all read updates and add boldface?
18:10:44 (Everybody knows the freeze is monday, yeah?)
18:11:06 yep
18:11:11 yep
18:11:15 yes
18:11:59 okay. there are still a few needs_review issues in review-group-29. I'll review #22798, but I need somebody else to look at #21074 and #24652.
18:12:13 ill take review-group-29 task too
18:12:17 They're mostly osx issues, but you shouldn't have to be an osx hacker to grok the code
18:12:27 ugh
18:12:28 ok
18:12:33 and they're small too
18:12:44 after that I'll open up review-group-30
18:12:59 should i be worrying that #23101 is not in r-g-29?
18:13:10 been trying to get that for 0.3.3
18:13:22 let's talk afterwards. If it's in rg30 it has a chance. Is it bit?
18:13:25 err,
18:13:27 is it big?
18:13:33 medium
18:13:37 hm, ok
18:13:55 review has been going on for a while between me and mike
18:13:56 #action nickm and asn and maybe mikeperry talk about 23101 afterwards
18:14:02 mikeperry: (you here?)
18:14:09 (it's #13837 and #23101)
18:14:11 ack
18:14:40 it took so long because we wanted it to be very good before you look at it
18:14:47 also let's remember that 0.3.4 will freeze in may, so it's not a -huge- wait for anything that does need to wait.
18:14:49 in hindsight perhaps we should have put it in merge_ready before that
18:14:53 it's unrealistic to get the dtrace probes in if i have a decent patch ready around thursday morning US/canada time?
18:15:04 let's see how we're doing!
18:15:07 its not super important to get in, since i mostly do testing with master anyway, but would be nice to get in
18:15:22 ahf: ack; we'll have a look at the patch when it arrives
18:15:29 ack, yes, let's do that
18:15:44 catalyst: hey what kind of refactorings are u thinking for #24661?
18:15:59 havent had time to reply to the ticket yet
18:16:07 i can look at the two macos patches, i'm having a mac opened anyway the last two weeks
18:16:20 the review-group-29 macos changes
18:16:38 asn: renaming a bunch of is_live stuff to is_reasonably_live or something like that. consolidating them so they're a little less scattered
18:18:14 catalyst: ack. do you know the right places to hit with your hammer? aka which places can be safely renamed and which not?
18:18:27 Also, does anybody have guidance for catalyst on the travis issue they raise on the pad?
18:18:41 Somebody who uses travis more than I do should probably opine.
18:20:12 https://github.com/travis-ci/travis-ci/issues/9033 is the Travis issue. no action by Travis for almost a week it looks like
18:20:49 how hard is the workaround?
18:21:03 asn: was going to try to scope it to things that entrynodes.c needs
18:21:33 nickm: difficulty: very easy. cost: slightly slows down clang builds
18:21:40 nickm: two possible workarounds -- use sudo or tolerate clang build failures. probably better to use sudo
18:22:12 "sudo: true" doesn't mean that the build runs as root
18:22:21 where "tolerate clang failures" means "make it so if a clang build fails, the entire build doesn't show as failed"
18:22:22 it means that sudo is allowed
18:22:35 Hello71: you're right, i was confused
18:22:42 okay. I'd say, workaround early if you want it soon, but maybe give them to the end of the week?
18:22:47 that would be my guess
18:22:54 adjust depending on how much you like travis to work :)
18:23:05 and whether you can get Hello71 to do it for you :)
18:23:34 Hello71: do you know of a way to enable sudo only for clang and not for gcc? (i haven't tested the speed difference sudo-vs-not in a while)
18:24:00 My issue: should I do stable releases this week or next? I'd be backporting the DESTROY-queue fix. But maybe dgoulet will have other code we should backport too, and I should wait for next week?
18:24:12 based on how long it took to support new Ubuntu versions, I would say "hope for the best, but don't expect anything major to be done within a year :/"
18:24:28 catalyst: I believe sudo is an allowed key in the matrix
18:24:30 nickm: there are some fixes that need very much backporting such as the MAX REND FAILURE patch and the is client I believe
18:24:41 catalyst: ok. there are a few places in entrynodes.c doing is_live checks. finding the rationale for those might be hard.
18:24:48 dgoulet: are these patches that are merged now, or not?
18:24:50 nickm: but these should be ACKed this week I hope
18:24:54 catalyst: i remember some of those were discussed during prop#271 review rounds
18:25:10 nickm: branches exists, review ongoing last I saw and one is waiting on Roger to modify it
18:25:24 catalyst: basically you use matrix:, and I think you manually specify every entry instead of the default (cross product everything)
18:25:58 perhaps I am mistaken about the second part, but I believe there is some way
18:26:19 Hello71: thanks. i'll try that if global sudo enabling slows down stuff too much
18:26:22 ok. so next week, _with_ the stuff we merge this week, is likelier than "this week"
18:26:57 dgoulet: can you lead us in a discussion about 24902 ?
18:27:05 +1
18:27:23 ok sure
18:27:34 catalyst: please let me know if i can help you with renaming is_live to is_reasonably_live. i imagine it might be quite hard to find out whether something is safe to rename.
18:27:40 the actual build speed should be roughly the same, but it will take longer to start and slightly more likely to be queued
18:27:58 asn: thanks!
18:28:02 also certain features are disabled
18:28:23 bottom line is that we've been able to identify two types of "DoS" on the network, a mass circuit creation from clients and the second is many concurrent connections from the same address creating 1 or 2 circuits, we are 99% sure they are tor2web nodes
18:28:50 #24902 is an attempt at mitigating the circuit creation DoS by introducing a "DoS mitigation" subsystem with the circuit creation feature
18:29:17 what does it do?
18:29:41 this ^ works at the Guard level basically, it monitors connections from *client address* that is keep stats on the number of concurrent conn. and circuit creation
18:29:42 ug, are we doing anything about the second bug?
18:29:50 nickm: I'll get to this :)
18:30:50 nickm: if for the threshold is reached of concurrent conn (currently 3) and some threshold of circuit creation (function of concurrent conn) over a time period, a defense is triggered for the IP address
18:31:11 currently the defense is to accept CREATE, send back CREATED and from then drop all cells on the circuit(s)
18:31:21 huh
18:31:33 there is a design doc on the ticket (badly written but it has the gist at least)
18:32:13 there are culprits to that but the idea is for it to be used in special circumstances with consensus param
18:32:21 and in normal circumstances, be disabled
18:32:40 okay. I'll look at it once rg30 is open; I hope mikeperry will too
18:32:43 so all in all, we make the Guard soak up the load by dropping the cells and not send it into the network
18:32:47 hi sorry was on the phone
18:32:49 !
18:33:03 since the defense is kind of related to some of the stuff that his cbt code is meant to sniff out
18:33:06 maybe
18:33:15 "rg30" ?
18:33:19 review group 30
18:34:05 * isabela opens pad etc
18:34:30 dgoulet: have you tested how clients respond to the CREATED thing?
18:34:31 so far after 5 days on my relay, it identified over 500 client IPs, 98% from Hetzner, my relay dropped ~11GB of mostly EXTEND2 cells in the last 12h
18:34:53 wow
18:35:17 nickm: they consider that they can use the circuit, we get EXTEND2 cells and then circuit is killed and that process repeats millions of times
18:35:49 nickm: what I haven't looked at in depth is how that behavior affects Guard selection and CBT that is having a circuit created client side but the EXTENDED never comes back
18:36:17 or if client start switching Guards a lot if they can't get the EXTEND circuit working
18:36:43 why is the circuit killed? we don't kill it, do we?
18:36:47 (also the defense type we can use is a parameter so we can use different defenses depending on the circumstances)
18:37:02 nickm: I believe the client destroys it
18:37:08 ok, but we should make sure
18:37:19 this ongoing load btw is due to massive amount of clients reaching onion addresses
18:37:23 nickm: I am here now. sorry, the day delay in meeting threw me
18:37:30 np, sorry mikeperry
18:37:35 glad to have you here
18:37:47 can you see backlog & grep for your mentions?
18:38:22 that is the first issue (circuit creation DoS), the second is due to tor2web clients (and a lot of them), we think they crawl the .onion space
18:38:35 the collateral damage of this is high number of TCP connections filling the limits on relays
18:38:45 with, like, separate connections? That's a disgusting bug on their end
18:38:52 also, it's okay IMO if we break crawlers
18:38:58 Roger has apparently a patch (or working on a patch) to help mitigate that at the relay side (RENDEZVOUS)
18:39:29 nickm: yes concurrent connections from the same address... we see 200-300 concurrent connections from one single clients for which they are for rendezvous attempt
18:39:36 wow
18:39:40 that's horrible.
18:39:40 and that for many many IPs
18:39:43 from*
18:39:48 ugh
18:40:01 at least, we've identified lots of IPs that are ALL coming from LeaseWeb so I'm working on getting in contact with them
18:40:10 is this normal tor2web behavior?
18:40:16 * isabela left updates on the pad
18:40:39 it is not (I'm still investigating code for this), for now I would go for each connection is a tor instance
18:40:56 okay, then to heck with those resource hogs
18:40:57 and imagine tens of thousands of them behind few IPs
18:41:43 Roger plan I believe is to count concurrent connections (like the DoS patch I did) and associate them with a threshold of RDV1 cell, then if detected, throw a defense
18:41:56 in this case it would be close TCP conn as fast as possible since socket exhaustion is the problem
18:42:04 ok. i hope that your counters can share code?
18:42:36 we'll make it happen! That is the goal of the "DoS mitigation" subsystem :)
18:43:29 ok
18:43:40 also if you want your stuff backported, remember to base it on the appropriate branches from the start
18:43:58 have we run out of discussion topics for today?
18:44:11 yeah all this will be some code to backport but at least I made it in its own file with very few entry point to help backporting, we'll let you know how it goes
18:45:34 asn: I'm really curious to know what you have to say about wide-block ciphers :)
18:45:42 any more for this week?
18:46:06 nickm: ha. i talked with Jean Paul Degabriele in rwc
18:46:11 i think you have also talked with him over email
18:46:14 or that's what he said
18:46:35 he presented a tagging-ish attack that's also possible with wide-blocks
18:46:38 it's possible! A couple of crypto groups have emailed me about wide block ciphers
18:46:51 ok, I'd love to know about it
18:47:00 by having the guard flip the RELAY field to RELAY_EARLY, and checking that on the last hop
18:47:42 nickm: ack perhaps post-meeting
18:47:57 (although i have to relocate after the meeting)
18:48:17 np
18:48:21 or email tor-dev to summarize
18:48:24 whatever you prefer
18:48:31 asn: ah, I had thought about that
18:48:43 the relay_early bit has to be part of the input to the wide-block cipher
18:48:56 right
18:49:18 but then can the guard node count the relay_earlies to reject infinite circuits?
18:49:30 (afaik that's the original point of relay_early)
18:49:44 I think so; let's try to write out a design tho
18:49:46 anything else for today?
18:49:48 ack
18:49:57 not from me :)
18:50:08 got tons of things to do for the rest of the week and things to get up to date to!
18:50:15 gonna be a good one :)
18:50:26 okay. peace all. thanks for the hacking!
18:50:32 #endmeeting
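
Note on the #24902 discussion (18:29 to 18:32): the circuit-creation defense dgoulet describes can be pictured roughly as below. This is only a sketch under stated assumptions, not the code on the ticket. The class and function names, the per-connection circuit allowance, and the 60-second window are all made up for illustration; the only number taken from the log is the concurrent-connection threshold of 3. The log also says the real defense drops cells per circuit and is gated by a consensus parameter, which this sketch does not model.

```python
from collections import deque
from dataclasses import dataclass, field

# Illustrative values only. The "3" comes from the meeting log;
# the other two numbers are assumptions made for this sketch.
MIN_CONCURRENT_CONNS = 3   # "currently 3" per the discussion
CIRCUITS_PER_CONN = 10     # hypothetical allowance per open connection
WINDOW_SECONDS = 60.0      # hypothetical observation window


@dataclass
class ClientStats:
    """Per-client-address counters kept by the guard."""
    concurrent_conns: int = 0
    circuit_times: deque = field(default_factory=deque)
    defense_active: bool = False


class DosMitigation:
    """Hypothetical per-address tracker; not Tor's actual subsystem."""

    def __init__(self):
        self.clients = {}

    def _stats(self, addr):
        return self.clients.setdefault(addr, ClientStats())

    def connection_opened(self, addr):
        self._stats(addr).concurrent_conns += 1

    def connection_closed(self, addr):
        stats = self._stats(addr)
        stats.concurrent_conns = max(0, stats.concurrent_conns - 1)

    def circuit_created(self, addr, now):
        """Record one CREATE from addr; return True if the defense is active."""
        stats = self._stats(addr)
        stats.circuit_times.append(now)
        # Forget circuit creations that have fallen out of the window.
        while stats.circuit_times and now - stats.circuit_times[0] > WINDOW_SECONDS:
            stats.circuit_times.popleft()
        # The circuit threshold is a function of the connection count.
        limit = CIRCUITS_PER_CONN * stats.concurrent_conns
        if (stats.concurrent_conns >= MIN_CONCURRENT_CONNS
                and len(stats.circuit_times) > limit):
            stats.defense_active = True
        return stats.defense_active

    def should_drop_cell(self, addr):
        """After CREATED has been sent, drop further cells from flagged addresses."""
        stats = self.clients.get(addr)
        return stats is not None and stats.defense_active
```

In this sketch a guard would still answer CREATE with CREATED and then consult should_drop_cell() for every later cell from that address, so the load is absorbed at the guard instead of being forwarded into the network.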