18:00:35 <nickm> #startmeeting weekly network team meeting, 16 Jan 2018
18:00:35 <MeetBot> Meeting started Tue Jan 16 18:00:35 2018 UTC.  The chair is nickm. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:35 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
18:00:38 <nickm> hi all!
18:00:46 <nickm> https://pad.riseup.net/p/l6fPZUFt4NwQ is the pad
18:01:02 <ahf> hey
18:01:45 <asn> helloooo
18:01:48 * asn was writing an email
18:01:50 * asn drafts it
18:01:51 <dgoulet> hi
18:02:24 <nickm> helloooo!
18:02:35 <nickm> thanks for putting up with the changed date this week
18:03:30 <ahf> it's nice that we just pushed it a day instead of waiting a week, i think that is a good change
18:03:48 <asn> agreed
18:04:57 <nickm> well, it's going to be  a busy week!  The 0.3.3 freeze is next week, so it's a good time to get things done.
18:05:19 <nickm> (The freeze was originally going to be yesterday, but we postponed because of the 0.3.2 delay.
18:05:22 <nickm> )
18:06:36 <isis> o/
18:06:41 <nickm> hi, isis!  How are you doing?
18:07:11 <Samdney> \o
18:07:15 * isis lurks to stay-up-to-date on what other people are up to
18:07:31 <isis> i'm doing okay, how are you?
18:07:48 <nickm> not so bad. Gearing up for the freeze
18:07:55 <nickm> pad is at https://pad.riseup.net/p/l6fPZUFt4NwQ
18:08:07 <nickm> isabela: are you around?
18:09:46 <nickm> okay, let's get started!
18:10:03 <nickm> let's all read updates and add boldface?
18:10:44 <nickm> (Everybody knows the freeze is monday, yeah?)
18:11:06 <ahf> yep
18:11:11 <asn> yep
18:11:15 <dgoulet> yes
18:11:59 <nickm> okay.  there are still a few needs_review issues in review-group-29.  I'll review #22798, but I need somebody else to look at #21074 and #24652.
18:12:13 <asn> ill take review-group-29 task too
18:12:17 <nickm> They're mostly osx issues, but you shouldn't have to be an osx hacker to grok the code
18:12:27 <asn> ugh
18:12:28 <asn> ok
18:12:33 <nickm> and they're small too
18:12:44 <nickm> after that I'll open up review-group-30
18:12:59 <asn> should i be worrying that #23101 is not in r-g-29?
18:13:10 <asn> been trying to get that for 0.3.3
18:13:22 <nickm> let's talk afterwards.  If it's in rg30 it has a chance.  Is it bit?
18:13:25 <nickm> err,
18:13:27 <nickm> is it big?
18:13:33 <asn> medium
18:13:37 <nickm> hm, ok
18:13:55 <asn> review has been going on for aw hile between me and mike
18:13:56 <nickm> #action nickm and asn and maybe mikeperry talk about 23101 afterwards
18:14:02 <nickm> mikeperry: (you here?)
18:14:09 <asn> (it's #13837 and #23101)
18:14:11 <asn> ack
18:14:40 <asn> it took so long because we wanted it to be very good before you look at it
18:14:47 <nickm> also let's remember that 0.3.4 will freeze in may, so it's not a -huge- wait for anything that does need to wait.
18:14:49 <asn> in hindsight perhaps we should have put it in merge_ready before that
18:14:53 <ahf> it's unrealistic to get the dtrace probes in if i have a decent patch ready around thursday morning US/canada time?
18:15:04 <nickm> let's see how we're doing!
18:15:07 <ahf> its not super important to get in, since i mostly do testing with master anyway, but would be nice to get in
18:15:22 <nickm> ahf: ack; we'll have a look at the patch when it arrives
18:15:29 <ahf> ack, yes, let's do that
18:15:44 <asn> catalyst: hey what kind of refactorings are u thinking for #24661?
18:15:59 <asn> havent had time to reply to the ticket yet
18:16:07 <ahf> i can look at the two macos patches, i'm having a mac opened anyway the last two weeks
18:16:20 <ahf> the review-group-29 macos changes
18:16:38 <catalyst> asn: renaming a bunch of is_live stuff to is_reasonably_live or something like that. consolidating them so they're a little less scattered
18:18:14 <asn> catalyst: ack. do you know the right places to hit with your hammer? aka which places can be safely renamed and which not?
18:18:27 <nickm> Also, does anybody have guidance for catalyst on the travis issue they raise on the pad?
18:18:41 <nickm> Somebody who uses travis more than I do should probably opine.
18:20:12 <catalyst> https://github.com/travis-ci/travis-ci/issues/9033 is the Travis issue. no action by Travis for almost a week it looks like
18:20:49 <nickm> how hard is the workaround?
18:21:03 <catalyst> asn: was going to try to scope it to things that entrynodes.c needs
18:21:33 <Hello71> nickm: difficulty: very easy. cost: slightly slows down clang builds
18:21:40 <catalyst> nickm: two possible workarounds -- use sudo or tolerate clang build failures. probably better to use sudo
18:22:12 <Hello71> "sudo: true" doesn't mean that the build runs as root
18:22:21 <catalyst> where "tolerate clang failures" means "make it so if a clang build fails, the entire build doesn't show as failed"
18:22:22 <Hello71> it means that sudo is allowed
18:22:35 <catalyst> Hello71: you're right, i was confused
18:22:42 <nickm> okay. I'd say, workaround early if you want it soon, but maybe give them to the end of the week?
18:22:47 <nickm> that would be my guess
18:22:54 <nickm> adjust depending on how much you like travis to work :)
18:23:05 <nickm> and whether you can get Hello71 to do it for you :)
18:23:34 <catalyst> Hello71: do you know of a way to enable sudo only for clang and not for gcc? (i haven't tested the speed difference sudo-vs-not in a while)
18:24:00 <nickm> My issue: should I do stable releases this week or next?  I'd be backporting the DESTROY-queue fix.  But maybe dgoulet will have other code we should backport too, and I should wait for next week?
18:24:12 <Hello71> based on how long it took to support new Ubuntu versions, I would say "hope for the best, but don't expect anything major to be done within a year :/"
18:24:28 <Hello71> catalyst: I believe sudo is an allowed key in the matrix
18:24:30 <dgoulet> nickm: there are some fixes that need very much backporting such as the MAX REND FAILURE patch and the is client I believe
18:24:41 <asn> catalyst: ok. there are a few places in entrynodes.c doing is_live checks. finding the rationale for those might be hard.
18:24:48 <nickm> dgoulet: are these patches that are merged now, or not?
18:24:50 <dgoulet> nickm: but these should be ACKed this week I hope
18:24:54 <asn> catalyst: i remember some of those were discussed during prop#271 review rounds
18:25:10 <dgoulet> nickm: branches exists, review ongoing last I saw and one is waiting on Roger to modify it
18:25:24 <Hello71> catalyst: basically you use matrix:, and I think you manually specify every entry instead of the default (cross product everything)
18:25:58 <Hello71> perhaps I am mistaken about the second part, but I believe there is some way
18:26:19 <catalyst> Hello71: thanks. i'll try that if global sudo enabling slows down stuff too much
18:26:22 <nickm> ok. so next week, _with_ the stuff we merge this week, is likelier than "this week"
18:26:57 <nickm> dgoulet: can you lead us in a discussion about 24902 ?
18:27:05 <asn> +1
18:27:23 <dgoulet> ok sure
18:27:34 <asn> catalyst: please let me know if i can help you with renaming is_live to is_reasonably_live. i imagine it might be quite hard to find out whether something is safe to rename.
18:27:40 <Hello71> the actual build speed should be roughly the same, but it will take longer to start and slightly more likely to be queued
18:27:58 <catalyst> asn: thanks!
18:28:02 <Hello71> also certain features are disabled
18:28:23 <dgoulet> bottom line is that we've been able to identify two types of "DoS" on the network, a mass circuit creation from clients and the second is many concurrent connections from the same address creating 1 or 2 circuits, we are 99% sure they are tor2web nodes
18:28:50 <dgoulet> #24902 is an attempt at mitigating the circuit creation DoS by introducing a "Dos mitigation" subsystem with the circuit creation feature
18:29:17 <nickm> what does it do?
18:29:41 <dgoulet> this ^ works at the Guard level basically, it monitors connections from *client address* that is keep stats on the number of concurent conn. and circuit creation
18:29:42 <nickm> ug, are we doing anything about the second bug?
18:29:50 <dgoulet> nickm: I'll get to this :)
18:30:50 <dgoulet> nickm: if for the threshold is reached of concurrent conn (currently 3) and some threshold of circuit creation (function of concurrent conn) over a time period, a defense is triggered for the IP address
18:31:11 <dgoulet> currently the defense is to accept CREATE, send back CREATED and from then drop all cells on the circuit(s)
18:31:21 <nickm> huh
18:31:33 <dgoulet> there is a design doc on the ticket (badly written but it has the gist at least)
18:32:13 <dgoulet> there are culprits to that but the idea is for it to be used in special circumstances with consensus param
18:32:21 <dgoulet> and in normal circumstances, be disabled
18:32:40 <nickm> okay. I'll look at it once rg30 is open; I hope mikeperry will too
18:32:43 <dgoulet> so all in all, we make the Guard soak up the load by dropping the cells and not send it into the network
18:32:47 <isabela> hi sorry was on the phone
18:32:49 <isabela> !
18:33:03 <nickm> since the defense is kind of related to some of the stuff that his cbt code is meant to sniff out
18:33:06 <nickm> maybe
18:33:15 <dgoulet> "rg30" ?
18:33:19 <nickm> review group 30
18:34:05 * isabela opens pad etc
18:34:30 <nickm> dgoulet: have you tested how clients respond to the CREATED thing?
18:34:31 <dgoulet> so far after 5 days on my relay, it idenitfied over 500 clietn IPs, 98% from Hetzner, my relay dropped ~11GB of mostly EXTEND2 cells in the last 12h
18:34:53 <ahf> wow
18:35:17 <dgoulet> nickm: they consider that they can use the circuit, we get EXTEND2 cells and then circuit is killed and that process repeats millions of times
18:35:49 <dgoulet> nickm: what I haven't looked at in depth is how that behaviro affects Guard selection and CBT that is having a circuit created client side but the EXTENDED never comes back
18:36:17 <dgoulet> or if client start switching Guards a lot if they can't get the EXTEND circuit working
18:36:43 <nickm> why is the circuit killed?  we don't kill it, do we?
18:36:47 <dgoulet> (also the defense type we can use is a parameter so we can use different defenses depending on the circumstances)
18:37:02 <dgoulet> nickm: I beleive the client destroys it
18:37:08 <nickm> ok, but we should make sure
18:37:19 <dgoulet> this ongoing load btw is due to massive amount of clients reaching onion addresses
18:37:23 <mikeperry> nickm: I am here now. sorry, the day delay in meeting threw me
18:37:30 <nickm> np, sorry mikeperry
18:37:35 <nickm> glad to have you here
18:37:47 <nickm> can you see backlog & grep for your mentions?
18:38:22 <dgoulet> that is the first issue (circuit creation DoS), the second is due to tor2web clients (and a lot of them), we think they crawl the .onion space
18:38:35 <dgoulet> the collateral damage of this is high number of TCP conections filling the limits on relays
18:38:45 <nickm> with, like, separate connections?  That's a disgusting bug on their end
18:38:52 <nickm> also, it's okay IMO if we break crawlers
18:38:58 <dgoulet> Roger has apparently a patch (or working on a patch) to help mitigate that at the relay side (RENDEZVOUS)
18:39:29 <dgoulet> nickm: yes concurrent connections from the same address... we see 200-300 concurrent connectiosn from one single clients for which they are for rendezvous attempt
18:39:36 <nickm> wow
18:39:40 <nickm> that's horrible.
18:39:40 <dgoulet> and that for many many IPs
18:39:43 <dgoulet> from*
18:39:48 <asn> ugh
18:40:01 <dgoulet> at least, we've identified lots of IPs that are ALL coming from LeaseWeb so I'm working on getting in contact with them
18:40:10 <nickm> is this normal tor2web behavior?
18:40:16 * isabela left updates on the pad
18:40:39 <dgoulet> it is not (I'm still investigating code for this), for now I would go for each connection is a tor instance
18:40:56 <nickm> okay, then to heck with those resource hogs
18:40:57 <dgoulet> and imagine tens of thousands of them behind few IPs
18:41:43 <dgoulet> Roger plan I believe is to count concurrent connections (like the DoS patch I did) and associate them with a threhsold of RDV1 cell, then if detected, throw a defense
18:41:56 <dgoulet> in this case it would be close TCP conn as fast as possible since socket exhaustion is the problem
18:42:04 <nickm> ok. i hope that your counters can share code?}
18:42:36 <dgoulet> we'll make it happen! That is the goal of the "DoS mitigation" subsytem :)
18:43:29 <nickm> ok
18:43:40 <nickm> also if you want your stuff backported, remember to base it on the appropriate branches from the start
18:43:58 <nickm> have we run out of discussion topics for today?
18:44:11 <dgoulet> yeah all this will be some code to backport but at least I made it in its own file with very few entry point to help backporting, we'll let you know how it goes
18:45:34 <nickm> asn: I'm really curious to know what you have to say about wide-block ciphers :)
18:45:42 <nickm> any more for this week?
18:46:06 <asn> nickm: ha. i talked with Jean Paul Degabriele in rwc
18:46:11 <asn> i think you have also talked with him over email
18:46:14 <asn> or that's what he said
18:46:35 <asn> he presented a tagging-ish attack that's also possible with wide-blocks
18:46:38 <nickm> it's possible! A couple of crypto groups have emailed me about wide block ciphers
18:46:51 <nickm> ok, I'd love to know about it
18:47:00 <asn> by having the guard flip the RELAY field to RELAY_EARLY, and checkign that on the last hop
18:47:42 <asn> nickm: ack perhaps post-meeting
18:47:57 <asn> (although i have to relocate after the meeting)
18:48:17 <nickm> np
18:48:21 <nickm> or email tor-dev to summarize
18:48:24 <nickm> whatever you prefer
18:48:31 <nickm> asn: ah, I had thought about that
18:48:43 <nickm> the relay_early bit has to be part of the input to the wide-block cipher
18:48:56 <asn> right
18:49:18 <asn> but then can the guard node count the relay_earlies to reject infinite circuits?
18:49:30 <asn> (afaik that's the original point of relay_early)
18:49:44 <nickm> I think so; let's try to write out a design tho
18:49:46 <nickm> anything else for today?
18:49:48 <asn> ack
18:49:57 <asn> not from me :)
18:50:08 <asn> got tons of things to do for the rest of the week and things to get up to date to!
18:50:15 <asn> gonna be a good one :)
18:50:26 <nickm> okay. peace all. thanks for the hacking!
18:50:32 <nickm> #endmeeting