18:00:35 <nickm> #startmeeting weekly network team meeting, 16 Jan 2018
18:00:35 <MeetBot> Meeting started Tue Jan 16 18:00:35 2018 UTC. The chair is nickm. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:35 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
18:00:38 <nickm> hi all!
18:00:46 <nickm> https://pad.riseup.net/p/l6fPZUFt4NwQ is the pad
18:01:02 <ahf> hey
18:01:45 <asn> helloooo
18:01:48 * asn was writing an email
18:01:50 * asn drafts it
18:01:51 <dgoulet> hi
18:02:24 <nickm> helloooo!
18:02:35 <nickm> thanks for putting up with the changed date this week
18:03:30 <ahf> it's nice that we just pushed it a day instead of waiting a week, i think that is a good change
18:03:48 <asn> agreed
18:04:57 <nickm> well, it's going to be a busy week! The 0.3.3 freeze is next week, so it's a good time to get things done.
18:05:19 <nickm> (The freeze was originally going to be yesterday, but we postponed because of the 0.3.2 delay.)
18:06:36 <isis> o/
18:06:41 <nickm> hi, isis! How are you doing?
18:07:11 <Samdney> \o
18:07:15 * isis lurks to stay up to date on what other people are up to
18:07:31 <isis> i'm doing okay, how are you?
18:07:48 <nickm> not so bad. Gearing up for the freeze
18:07:55 <nickm> pad is at https://pad.riseup.net/p/l6fPZUFt4NwQ
18:08:07 <nickm> isabela: are you around?
18:09:46 <nickm> okay, let's get started!
18:10:03 <nickm> let's all read updates and add boldface?
18:10:44 <nickm> (Everybody knows the freeze is monday, yeah?)
18:11:06 <ahf> yep
18:11:11 <asn> yep
18:11:15 <dgoulet> yes
18:11:59 <nickm> okay. there are still a few needs_review issues in review-group-29. I'll review #22798, but I need somebody else to look at #21074 and #24652.
18:12:13 <asn> i'll take a review-group-29 task too
18:12:17 <nickm> They're mostly osx issues, but you shouldn't have to be an osx hacker to grok the code
18:12:27 <asn> ugh
18:12:28 <asn> ok
18:12:33 <nickm> and they're small too
18:12:44 <nickm> after that I'll open up review-group-30
18:12:59 <asn> should i be worrying that #23101 is not in r-g-29?
18:13:10 <asn> been trying to get that in for 0.3.3
18:13:22 <nickm> let's talk afterwards. If it's in rg30 it has a chance. Is it big?
18:13:33 <asn> medium
18:13:37 <nickm> hm, ok
18:13:55 <asn> review has been going on for a while between me and mike
18:13:56 <nickm> #action nickm and asn and maybe mikeperry talk about #23101 afterwards
18:14:02 <nickm> mikeperry: (you here?)
18:14:09 <asn> (it's #13837 and #23101)
18:14:11 <asn> ack
18:14:40 <asn> it took so long because we wanted it to be very good before you look at it
18:14:47 <nickm> also let's remember that 0.3.4 will freeze in may, so it's not a -huge- wait for anything that does need to wait.
18:14:49 <asn> in hindsight perhaps we should have put it in merge_ready before that
18:14:53 <ahf> is it unrealistic to get the dtrace probes in if i have a decent patch ready around thursday morning US/canada time?
18:15:04 <nickm> let's see how we're doing!
18:15:07 <ahf> it's not super important to get in, since i mostly do testing with master anyway, but it would be nice to have
18:15:22 <nickm> ahf: ack; we'll have a look at the patch when it arrives
18:15:29 <ahf> ack, yes, let's do that
18:15:44 <asn> catalyst: hey what kind of refactorings are you thinking of for #24661?
18:15:59 <asn> haven't had time to reply to the ticket yet
18:16:07 <ahf> i can look at the two macos patches, i've had a mac open anyway the last two weeks
18:16:20 <ahf> the review-group-29 macos changes
18:16:38 <catalyst> asn: renaming a bunch of is_live stuff to is_reasonably_live or something like that. consolidating them so they're a little less scattered
18:18:14 <asn> catalyst: ack. do you know the right places to hit with your hammer? aka which places can be safely renamed and which not?
18:18:27 <nickm> Also, does anybody have guidance for catalyst on the travis issue they raised on the pad?
18:18:41 <nickm> Somebody who uses travis more than I do should probably opine.
18:20:12 <catalyst> https://github.com/travis-ci/travis-ci/issues/9033 is the Travis issue. no action by Travis for almost a week, it looks like
18:20:49 <nickm> how hard is the workaround?
18:21:03 <catalyst> asn: was going to try to scope it to things that entrynodes.c needs
18:21:33 <Hello71> nickm: difficulty: very easy. cost: slightly slows down clang builds
18:21:40 <catalyst> nickm: two possible workarounds -- use sudo or tolerate clang build failures. probably better to use sudo
18:22:12 <Hello71> "sudo: true" doesn't mean that the build runs as root
18:22:21 <catalyst> where "tolerate clang failures" means "make it so if a clang build fails, the entire build doesn't show as failed"
18:22:22 <Hello71> it means that sudo is allowed
18:22:35 <catalyst> Hello71: you're right, i was confused
18:22:42 <nickm> okay. I'd say, work around it early if you want it soon, but maybe give them to the end of the week?
18:22:47 <nickm> that would be my guess
18:22:54 <nickm> adjust depending on how much you like travis to work :)
18:23:05 <nickm> and whether you can get Hello71 to do it for you :)
18:23:34 <catalyst> Hello71: do you know of a way to enable sudo only for clang and not for gcc? (i haven't tested the speed difference sudo-vs-not in a while)
18:24:00 <nickm> My issue: should I do stable releases this week or next? I'd be backporting the DESTROY-queue fix. But maybe dgoulet will have other code we should backport too, and I should wait for next week?
18:24:12 <Hello71> based on how long it took to support new Ubuntu versions, I would say "hope for the best, but don't expect anything major to be done within a year :/"
18:24:28 <Hello71> catalyst: I believe sudo is an allowed key in the matrix
18:24:30 <dgoulet> nickm: there are some fixes that very much need backporting, such as the MAX REND FAILURE patch and the is-client one, I believe
18:24:41 <asn> catalyst: ok. there are a few places in entrynodes.c doing is_live checks. finding the rationale for those might be hard.
18:24:48 <nickm> dgoulet: are these patches that are merged now, or not?
18:24:50 <dgoulet> nickm: but these should be ACKed this week I hope
18:24:54 <asn> catalyst: i remember some of those were discussed during prop#271 review rounds
18:25:10 <dgoulet> nickm: branches exist, review was ongoing last I saw, and one is waiting on Roger to modify it
18:25:24 <Hello71> catalyst: basically you use matrix:, and I think you manually specify every entry instead of the default (cross product everything)
18:25:58 <Hello71> perhaps I am mistaken about the second part, but I believe there is some way
18:26:19 <catalyst> Hello71: thanks. i'll try that if global sudo enabling slows down stuff too much
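[A minimal sketch of the workaround Hello71 describes, assuming a hand-written build matrix in .travis.yml; this is an untested illustration based on the discussion above, not tor's actual Travis config:]

    # Hypothetical .travis.yml fragment: list the matrix entries explicitly
    # so that "sudo: true" (which selects the sudo-enabled infrastructure;
    # the build itself still does not run as root) applies only to clang.
    language: c
    matrix:
      include:
        - compiler: gcc
          sudo: false  # keep gcc jobs on the faster container infrastructure
        - compiler: clang
          sudo: true   # works around travis-ci issue #9033 for clang builds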
18:26:22 <nickm> ok. so next week, _with_ the stuff we merge this week, is likelier than "this week"
18:26:57 <nickm> dgoulet: can you lead us in a discussion about #24902?
18:27:05 <asn> +1
18:27:23 <dgoulet> ok sure
18:27:34 <asn> catalyst: please let me know if i can help you with renaming is_live to is_reasonably_live. i imagine it might be quite hard to find out whether something is safe to rename.
18:27:40 <Hello71> the actual build speed should be roughly the same, but it will take longer to start and be slightly more likely to be queued
18:27:58 <catalyst> asn: thanks!
18:28:02 <Hello71> also certain features are disabled
18:28:23 <dgoulet> bottom line is that we've been able to identify two types of "DoS" on the network: mass circuit creation from clients, and many concurrent connections from the same address each creating 1 or 2 circuits. we are 99% sure the latter are tor2web nodes
18:28:50 <dgoulet> #24902 is an attempt at mitigating the circuit-creation DoS by introducing a "DoS mitigation" subsystem, starting with the circuit-creation defense
18:29:17 <nickm> what does it do?
18:29:41 <dgoulet> this ^ works at the Guard level basically. it monitors connections per *client address*, that is, it keeps stats on the number of concurrent connections and circuit creations
18:29:42 <nickm> ugh, are we doing anything about the second bug?
18:29:50 <dgoulet> nickm: I'll get to this :)
18:30:50 <dgoulet> nickm: if the concurrent-connection threshold is reached (currently 3) and some circuit-creation threshold (a function of the concurrent connections) is exceeded over a time period, a defense is triggered for the IP address
18:31:11 <dgoulet> currently the defense is to accept CREATE, send back CREATED, and from then on drop all cells on the circuit(s)
18:31:21 <nickm> huh
18:31:33 <dgoulet> there is a design doc on the ticket (badly written but it has the gist at least)
18:32:13 <dgoulet> there are caveats to that, but the idea is for it to be used in special circumstances via a consensus param
18:32:21 <dgoulet> and in normal circumstances, be disabled
18:32:40 <nickm> okay. I'll look at it once rg30 is open; I hope mikeperry will too
18:32:43 <dgoulet> so all in all, we make the Guard soak up the load by dropping the cells and not sending them into the network
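[To make the mechanism dgoulet describes concrete, here is a minimal C sketch of per-address accounting; every name, the window length, and the threshold function are illustrative assumptions, not the actual #24902 patch:]

    /* Hypothetical per-client-address DoS accounting, loosely following
     * the description above: if an address holds too many concurrent
     * connections AND creates too many circuits within a time window,
     * flag it so the guard answers CREATE with CREATED but then drops
     * all further cells instead of relaying them. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>

    #define CONCURRENT_CONN_THRESHOLD 3   /* "currently 3" above */
    #define CIRCUIT_WINDOW_SECS 60        /* assumed observation window */

    typedef struct client_stats_t {
      uint32_t concurrent_conns;    /* open TCP connections right now */
      uint32_t circuits_in_window;  /* CREATE cells seen this window */
      time_t window_start;          /* when the current window began */
      bool defense_active;          /* once set, soak up and drop cells */
    } client_stats_t;

    /* Circuit-creation threshold as a function of concurrent connections;
     * the actual function used by the patch is not specified above. */
    static uint32_t
    circuit_threshold(uint32_t concurrent_conns)
    {
      return 10 * concurrent_conns;  /* assumed scaling */
    }

    /* Called for each CREATE cell from a tracked address; returns true
     * if the defense should be (or already is) triggered for it. */
    static bool
    dos_note_create_cell(client_stats_t *st, time_t now)
    {
      if (now - st->window_start >= CIRCUIT_WINDOW_SECS) {
        st->window_start = now;       /* rotate the counting window */
        st->circuits_in_window = 0;
      }
      st->circuits_in_window++;
      if (st->concurrent_conns >= CONCURRENT_CONN_THRESHOLD &&
          st->circuits_in_window > circuit_threshold(st->concurrent_conns))
        st->defense_active = true;
      return st->defense_active;
    }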
18:32:47 <isabela> hi sorry was on the phone
18:32:49 <isabela> !
18:33:03 <nickm> since the defense is kind of related to some of the stuff that his cbt code is meant to sniff out
18:33:06 <nickm> maybe
18:33:15 <dgoulet> "rg30" ?
18:33:19 <nickm> review group 30
18:34:05 * isabela opens pad etc
18:34:30 <nickm> dgoulet: have you tested how clients respond to the CREATED thing?
18:34:31 <dgoulet> so far, after 5 days on my relay, it identified over 500 client IPs, 98% from Hetzner. my relay dropped ~11GB of mostly EXTEND2 cells in the last 12h
18:34:53 <ahf> wow
18:35:17 <dgoulet> nickm: they consider that they can use the circuit, we get EXTEND2 cells, and then the circuit is killed, and that process repeats millions of times
18:35:49 <dgoulet> nickm: what I haven't looked at in depth is how that behavior affects Guard selection and CBT, that is, having a circuit created client side but the EXTENDED never coming back
18:36:17 <dgoulet> or if clients start switching Guards a lot when they can't get the EXTEND circuit working
18:36:43 <nickm> why is the circuit killed? we don't kill it, do we?
18:36:47 <dgoulet> (also the defense type we can use is a parameter, so we can use different defenses depending on the circumstances)
18:37:02 <dgoulet> nickm: I believe the client destroys it
18:37:08 <nickm> ok, but we should make sure
18:37:19 <dgoulet> this ongoing load btw is due to a massive number of clients reaching onion addresses
18:37:23 <mikeperry> nickm: I am here now. sorry, the day delay in the meeting threw me
18:37:30 <nickm> np, sorry mikeperry
18:37:35 <nickm> glad to have you here
18:37:47 <nickm> can you see backlog & grep for your mentions?
18:38:22 <dgoulet> that is the first issue (circuit-creation DoS). the second is due to tor2web clients (and a lot of them); we think they crawl the .onion space
18:38:35 <dgoulet> the collateral damage of this is a high number of TCP connections filling the limits on relays
18:38:45 <nickm> with, like, separate connections? That's a disgusting bug on their end
18:38:52 <nickm> also, it's okay IMO if we break crawlers
18:38:58 <dgoulet> Roger apparently has a patch (or is working on one) to help mitigate that on the relay side (RENDEZVOUS)
18:39:29 <dgoulet> nickm: yes, concurrent connections from the same address... we see 200-300 concurrent connections from one single client, all of them for rendezvous attempts
18:39:36 <nickm> wow
18:39:40 <nickm> that's horrible.
18:39:40 <dgoulet> and that from many many IPs
18:39:48 <asn> ugh
18:40:01 <dgoulet> at least, we've identified lots of IPs that are ALL coming from LeaseWeb, so I'm working on getting in contact with them
18:40:10 <nickm> is this normal tor2web behavior?
18:40:16 * isabela left updates on the pad
18:40:39 <dgoulet> it is not (I'm still investigating the code for this); for now my best guess is that each connection is a tor instance
18:40:56 <nickm> okay, then to heck with those resource hogs
18:40:57 <dgoulet> and imagine tens of thousands of them behind a few IPs
18:41:43 <dgoulet> Roger's plan, I believe, is to count concurrent connections (like the DoS patch I did) and associate them with a threshold of RDV1 cells, then if detected, throw a defense
18:41:56 <dgoulet> in this case it would be to close the TCP conn as fast as possible, since socket exhaustion is the problem
18:42:04 <nickm> ok. i hope that your counters can share code?
18:42:36 <dgoulet> we'll make it happen! That is the goal of the "DoS mitigation" subsystem :)
18:43:29 <nickm> ok
18:43:40 <nickm> also if you want your stuff backported, remember to base it on the appropriate branches from the start
18:43:58 <nickm> have we run out of discussion topics for today?
18:44:11 <dgoulet> yeah, all this will be some code to backport, but at least I put it in its own file with very few entry points to help backporting. we'll let you know how it goes
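[A similar sketch for the second defense dgoulet attributes to Roger: per-address concurrent-connection counting tied to a RENDEZVOUS1-cell threshold, with "close the TCP connection" as the defense. Thresholds and names are again assumptions, not the pending patch:]

    /* Hypothetical tor2web/rendezvous defense: when one address holds
     * many concurrent connections and sends many RENDEZVOUS1 cells,
     * close its connections as early as possible, since socket
     * exhaustion on relays is the real problem here. */
    #include <stdbool.h>
    #include <stdint.h>

    #define REND_CONN_THRESHOLD 50   /* assumed; 200-300 were observed */
    #define REND_CELL_THRESHOLD 100  /* assumed RDV1-cell threshold */

    typedef struct rend_stats_t {
      uint32_t concurrent_conns;  /* open TCP connections from address */
      uint32_t rdv1_cells;        /* RENDEZVOUS1 cells seen from it */
    } rend_stats_t;

    /* Returns true when the relay should simply close the connection
     * rather than service further rendezvous attempts from it. */
    static bool
    rend_dos_should_close_conn(const rend_stats_t *st)
    {
      return st->concurrent_conns >= REND_CONN_THRESHOLD &&
             st->rdv1_cells >= REND_CELL_THRESHOLD;
    }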
18:45:34 <nickm> asn: I'm really curious to know what you have to say about wide-block ciphers :)
18:45:42 <nickm> any more for this week?
18:46:06 <asn> nickm: ha. i talked with Jean Paul Degabriele at RWC
18:46:11 <asn> i think you have also talked with him over email
18:46:14 <asn> or that's what he said
18:46:35 <asn> he presented a tagging-ish attack that's also possible with wide-blocks
18:46:38 <nickm> it's possible! A couple of crypto groups have emailed me about wide-block ciphers
18:46:51 <nickm> ok, I'd love to know about it
18:47:00 <asn> by having the guard flip the RELAY field to RELAY_EARLY, and checking that on the last hop
18:47:42 <asn> nickm: ack, perhaps post-meeting
18:47:57 <asn> (although i have to relocate after the meeting)
18:48:17 <nickm> np
18:48:21 <nickm> or email tor-dev to summarize
18:48:24 <nickm> whatever you prefer
18:48:31 <nickm> asn: ah, I had thought about that
18:48:43 <nickm> the relay_early bit has to be part of the input to the wide-block cipher
18:48:56 <asn> right
18:49:18 <asn> but then can the guard node count the relay_earlies to reject infinite circuits?
18:49:30 <asn> (afaik that's the original point of relay_early)
18:49:44 <nickm> I think so; let's try to write out a design tho
18:49:46 <nickm> anything else for today?
18:49:48 <asn> ack
18:49:57 <asn> not from me :)
18:50:08 <asn> got tons of things to do for the rest of the week and things to get up to date on!
18:50:15 <asn> gonna be a good one :)
18:50:26 <nickm> okay. peace all. thanks for the hacking!
18:50:32 <nickm> #endmeeting
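[A note on the wide-block point above: if the relay_early bit is bound into the cipher's input as nickm suggests, a guard that flips RELAY to RELAY_EARLY garbles the whole cell at the next hop instead of producing a recognizable tag. A rough C illustration follows; wideblock_encrypt() and the tweak layout are entirely hypothetical, since no such primitive exists in tor:]

    #include <stddef.h>
    #include <stdint.h>

    /* Assumed wide-block primitive: encrypts cell_len bytes in place
     * under key and tweak; changing any tweak bit scrambles the whole
     * ciphertext rather than a predictable part of it. */
    void wideblock_encrypt(const uint8_t key[32],
                           const uint8_t *tweak, size_t tweak_len,
                           uint8_t *cell, size_t cell_len);

    static void
    encrypt_relay_cell(const uint8_t key[32], int is_relay_early,
                       uint8_t *cell_body, size_t cell_len)
    {
      /* Bind the RELAY vs RELAY_EARLY distinction into the tweak, so a
       * command field flipped in transit makes decryption fail outright
       * instead of surviving as a taggable signal at the last hop. */
      uint8_t tweak[1] = { (uint8_t)(is_relay_early ? 1 : 0) };
      wideblock_encrypt(key, tweak, sizeof(tweak), cell_body, cell_len);
    }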