15:29:22 <nickm> #startmeeting proposals for authority hardening
15:29:22 <MeetBot> Meeting started Fri Feb 19 15:29:22 2016 UTC.  The chair is nickm. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:29:22 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:30:24 <nickm> So I think we're talking about proposals prop#257 and prop#258 by me and andrea respectively.
15:30:33 <nickm> Proposal 257: Refactoring authorities and taking parts offline
15:30:34 <nickm> Proposal 258: Denial-of-service resistance for directory authorities
15:30:50 <nickm> I think isis and dgoulet and athena can make it?
15:30:55 <nickm> armadev: ?
15:31:34 <nickm> again, I apologize for getting the time wrong :p
15:31:35 <athena> hi hi meeting
15:31:39 <nickm> hi hi
15:32:03 <nickm> so, let's do prop#258 first because it's shorter?
15:32:29 <nickm> anybody (other than athena) want to summarize prop#258, or should I ?
15:32:42 <nickm> (I generally want non-authors to summarize, so that the author can correct if they're wrong)
15:33:29 <nickm> ok, I'll try.  Let's hope more people filter in too.
15:33:54 <nickm> The basic idea is that if a dirauth gets too many connections from an IP, or too many begindir requests on a circuit, it should start rejecting.
15:34:09 <nickm> sections 3 and onwards explain how.
15:34:34 * isis is here
15:34:40 <isis> https://gitweb.torproject.org/torspec.git/tree/proposals/257-hiding-authorities.txt
15:34:45 <isis> https://gitweb.torproject.org/torspec.git/tree/proposals/258-dirauth-dos.txt
15:34:51 <nickm> isis: we're doing 258 now
15:34:56 <isis> got it
15:35:12 <isis> that is the one i haven't read yet…
15:35:29 <nickm> the approach is to classify requests in several categories, and have a count per-IP or per-circuit of how many we've seen, using the EWMA algorithm so that we don't have to store a lot of data.
15:35:44 <nickm> we can probably be a bit aggressive about the timeout here, even
15:35:53 <nickm> since remembering old requests isn't important.
15:36:08 <nickm> This is going to need some thinking for IPv6, where everybody can trivially generate a billion addresses.
15:36:12 <nickm> athena: did I get it about right?
15:37:28 * dgoulet here
15:37:34 <dgoulet> yeah it's about right!
15:37:37 <nickm> hi dgoulet
15:38:01 <isis> it sounds like it's not per-IP but per bucket in some hash table
15:38:26 <dgoulet> "get a single bucket per source IP each"
15:39:21 <dgoulet> #4581 is the ticket
15:39:40 <nickm> athena: did you agree that the description is more or less right?
15:39:41 <athena> nickm: yes, that's about correct
15:39:42 <nickm> all: what did I miss?
15:40:07 <nickm> at what point do we increment the counter?
15:40:32 <isis> dgoulet: ah, i see.  but, if we want to limit memory exhaustion attacks, don't we want multiple source IPs per bucket?  particularly for IPv6?
15:40:35 <athena> and yes, for IPv6 it would benefit from some tinkering - possibly, buckets for ranges, and an IP bumps every bucket for the successive /ns containing it, and higher thresholds for larger buckets
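The nested-range idea athena describes for IPv6 can be sketched roughly as follows. This is only an illustration: the prefix lengths, the per-prefix thresholds, and the names `PREFIX_THRESHOLDS`, `containing_prefixes`, and `record` are assumptions, not taken from prop#258, and plain counters stand in for the EWMA values to keep it short.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix lengths and limits: larger ranges get higher thresholds.
PREFIX_THRESHOLDS = {128: 10, 64: 40, 48: 160, 32: 640}


def containing_prefixes(addr):
    """Return the successive networks (one per prefix length) containing addr."""
    return [
        ipaddress.IPv6Network(f"{addr}/{plen}", strict=False)
        for plen in PREFIX_THRESHOLDS
    ]


def record(counts, addr):
    """Bump every bucket containing addr; return True if any bucket is over its limit."""
    over = False
    for net in containing_prefixes(addr):
        counts[net] += 1
        if counts[net] > PREFIX_THRESHOLDS[net.prefixlen]:
            over = True
    return over


counts = Counter()
# A client hopping between addresses inside one /64 never trips a /128 bucket,
# but the shared /64 bucket still fills up.
for i in range(40):
    record(counts, f"2001:db8::{i:x}")
tripped = record(counts, "2001:db8::ffff")
```

With these made-up limits, the 41st distinct address in the same /64 is the first one rejected, even though every individual address was seen only once.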
15:41:13 <athena> nickm: at the point of an incoming connection to the dirport for direct, or at the point of receiving a begindir cell
15:41:43 <dgoulet> we probably want that for IPv4 also unless we can live with _large_ ok-ish HT
15:41:58 <nickm> well, the HT doesn't need to get too big, right?
15:42:20 <nickm> like, if we make it so that we expire after ~30 seconds, we're probably in the clear and the HT doesn't store too much.
15:42:43 <athena> yes, HT entries that drop below a threshold by EWMA get purged
15:43:30 <nickm> and the threshold doesn't have to be super-low, since we can accept that anybody doing less than e.g. one request per 30-60 seconds is probably not doing a DoS of this kind
15:43:31 <isis> okay, it shouldn't grow too large in that case then
15:43:55 <isis> how will the timeout be specified?
15:45:04 <nickm> probably by one of those config parameters.
15:45:10 <isis> oh, i see, §4
15:45:13 <nickm> yeah
15:45:16 <isis> sorry!  still reading…
15:45:22 <athena> hmm, it isn't an explicit timeout in the version i wrote, it's a decay constant for the EWMA and then it drops the counters when they get below (IIRC) 0.01 * the relevant DoS blocking threshold
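A minimal sketch of the scheme as discussed so far: one decaying counter per source, rejection above a blocking threshold, and purging of entries once they decay below 0.01 times that threshold. The class name `EwmaTable`, the 30-second half-life, and the threshold value are assumptions for illustration; the real parameters are the ones in §4 of the proposal, which are not reproduced here.

```python
import math


class EwmaTable:
    """Per-source request scores with exponential decay (illustrative only)."""

    def __init__(self, half_life=30.0, block_threshold=10.0, purge_fraction=0.01):
        # Decay constant chosen so a score halves every `half_life` seconds.
        self.decay = math.log(2) / half_life
        self.block_threshold = block_threshold
        self.purge_below = purge_fraction * block_threshold
        self.entries = {}  # key -> (score, time of last update)

    def _decayed(self, key, now):
        """Return the stored score for key, decayed to time `now`."""
        score, last = self.entries.get(key, (0.0, now))
        return score * math.exp(-self.decay * (now - last))

    def record(self, key, now):
        """Count one request; return True if the source is over the threshold."""
        score = self._decayed(key, now) + 1.0
        self.entries[key] = (score, now)
        return score > self.block_threshold

    def purge(self, now):
        """Drop entries whose decayed score fell below the purge level."""
        for key in list(self.entries):
            if self._decayed(key, now) < self.purge_below:
                del self.entries[key]


table = EwmaTable(half_life=30.0, block_threshold=5.0)
results = [table.record("192.0.2.1", 0.0) for _ in range(6)]
table.purge(300.0)  # ten half-lives later the entry has decayed away
```

Only a score and a timestamp are kept per source, so there is no explicit timeout: how long an idle source lingers in the table follows from the decay constant and the purge level.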
15:46:51 <nickm> isis: how are you typing all of those §s?
15:47:31 <isis> compose_key + s + o
15:48:55 <isis> compose_key is menu for me, specified in /etc/default/keyboard as `XKBOPTIONS="compose:menu,grp:sclk_toggle,grp_led:num,lv3:switch,keypad:future,caps:swapescape"
15:50:06 <dgoulet> so I'm re-reading the ticket since I looked at the code for that proposal a while back and one comment could be useful for this discussion:
15:50:50 <dgoulet> about the choice of the parameters, the code uses some values but I don't see anything in the proposal explaining why "32" or a table of possible values
15:51:08 <dgoulet> maybe it's something we have no idea so we have to try it and tweak it as we go?
15:53:03 <athena> dgoulet: they are rather arbitrary - the algorithm is really "high enough they don't trip on the dirauths under normal conditions, low enough they cut off attacker traffic"
15:53:50 <athena> i suspect the real answer to this is "add a mode that blocks nothing but records stats on 95th percentile values or whatever, and see what the dirauths get in practice"
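The measurement mode suggested here could look something like the sketch below: it records observed per-source rates, never blocks, and reports a percentile that an operator could use to pick thresholds. The class `ObserveOnlyStats` and its methods are hypothetical names, and the nearest-rank percentile is just one simple choice.

```python
class ObserveOnlyStats:
    """Record observed request rates without ever blocking (illustrative only)."""

    def __init__(self):
        self.samples = []

    def observe(self, rate):
        """Store one observed rate; always returns False, i.e. never reject."""
        self.samples.append(rate)
        return False

    def percentile(self, p):
        """Nearest-rank percentile for an integer p between 1 and 100."""
        if not self.samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples)
        # Integer ceiling of p * n / 100, avoiding floating-point rounding.
        rank = max(1, (p * len(ordered) + 99) // 100)
        return ordered[rank - 1]


stats = ObserveOnlyStats()
for rate in range(1, 101):
    stats.observe(rate)
p95 = stats.percentile(95)
```

Running a dirauth in a mode like this for a while would give measured values to compare against the otherwise arbitrary constants.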
15:55:33 <dgoulet> hrm ok
15:56:24 <isis> do we all agree that prop#258 should be accepted?
15:57:44 <dgoulet> yes I think it's a useful addition!
15:57:54 <athena> concur, but i wrote it
15:58:20 <dgoulet> we should be very careful at first and monitor our dirauth quite a bit so we don't end up actually denying valid requests but that's the tuning part
15:58:30 <dgoulet> we can start with one single dirauth anyway
15:58:39 <nickm> +1 to accept
16:00:13 <isis> #action change prop#258 status to accepted
16:00:24 <nickm> more on 258, or on to prop#257?
16:00:39 <dgoulet> can we make prop258 go in 028 ?
16:01:49 <athena> dgoulet: depends how much you care about getting good unit test coverage and how sophisticated you want to get on that load-measuring mode, i think
16:03:31 <dgoulet> hrm we should then probably take a #yolo dirauth (winks at armadev :) and make it use that patch so we can tune it before making it stable and all 9 dirauth updates?
16:05:30 <athena> well, i don't think you should deploy a patch like this on all nine dirauths at once because most of its real-world load-testing will happen on the dirauths
16:05:53 <nickm> this is sounding like "let's try to get this into the first week of 0.2.9 merges" then?
16:05:59 <dgoulet> yeah ^
16:06:00 <athena> and if it turns out to have some bug we didn't know about in advance, finding out on all nine dirauths at once will suck
16:06:07 <dgoulet> athena: very true
16:08:01 <nickm> so on to prop#257 ? :)
16:08:11 <dgoulet> yup
16:09:28 <nickm> somebody other than me summarize?
16:10:10 <dgoulet> I can try, I've read it a while back but I skimmed it again before the meeting so I might get it right
16:11:14 <nickm> go for it, sez I
16:11:57 <dgoulet> basically we want to split the dirauth role 1) from little-t tor and 2) into modules that are basically different processes. An "Upload" component where relays send their descriptors; this goes to the voting process to create a consensus; then a publishing module pushes it to the dircaches (which I assume is the Distribution component in the proposal)
16:12:36 <dgoulet> I'm unsure where the bwauth are plugged in that infrastructure but I assume "Upload"
16:13:30 <dgoulet> then there are some considerations on how communication happens, which would be TLS for relays uploading their descriptor and then in between modules it could be something else like rsync-ssh
16:13:45 <dgoulet> that's all I have in my head ^ --
16:14:50 <nickm> sounds roughly right.  There's more detail in the proposal.
16:15:16 <nickm> basically, the idea is to separate the "parts that need a public IP", "parts that need to hold a key", and "parts that need lots of bandwidth" into logically distinct parts.
16:15:37 <nickm> because authorities do like 5-10 separate things right now, and there's no reason that they need to be done on the same host.
16:16:24 <dgoulet> is it the Upload module that tests the relays?
16:16:33 <nickm> it doesn't have to be.
16:16:38 <dgoulet> hrm
16:16:52 <dgoulet> as long as that information is fed somehow into the Voting module?
16:16:57 <nickm> but I bet it might be efficient if it does an initial "does it work at all" test before passing it on.
16:17:00 <nickm> right
16:17:36 <dgoulet> so here is a question
16:18:03 <dgoulet> ah nvm
16:19:43 <nickm> so, IMO this proposal is too big and vague to actually sit down and build as it stands.  It needs to get its pieces prioritized and individually specified.
16:19:44 <dgoulet> I think the proposal could gain from adding modules that interact with "Voting"; there are at least two now: bwauth and "testing relays"
16:20:31 <dgoulet> nickm: indeed
16:20:36 <dgoulet> very much
16:21:03 <nickm> So maybe we should talk about the basic idea, and then try to figure out how the specifics would go
16:21:58 <dgoulet> yes
16:22:07 <dgoulet> nickm: anything you want to start with?
16:22:47 <nickm> not really.  Maybe, what order would we want to implement this stuff in?  Is it worth it?  How hard would it be?
16:23:43 <dgoulet> I think it totally is worth it, what I like is that the "Voting" module, which I think here is basically the dirauth creating a consensus from data fed from different places, can be hidden in some capacity
16:24:28 <nickm> it would be good to be clear about which parts need to be trusted, and how much they need to be trusted.
16:24:39 <dgoulet> yah that was my next question
16:24:40 <nickm> Like, the upload module needs to be trusted not to drop descriptors in a hostile way.
16:25:29 <dgoulet> it does yes
16:27:37 <dgoulet> well I guess we need to break down each module into "what task they do", security considerations, and communication mode (IPC/Network)
16:27:52 <dgoulet> then we can start breaking down dirauth roles into those modules and implement
16:27:55 <dgoulet> big task
16:28:22 <dgoulet> as for what to start with, I would say Upload since it's the inbound "untrusted" data from the network
16:28:34 <dgoulet> the rest we kind of control the flow
16:29:44 <isis> i presume we would not have the Voting module also serve the documents produced?
16:29:51 <nickm> right
16:30:10 <isis> so we would kind of need three modules to start with: Upload, Voting, Serving
16:30:21 <nickm> yeah.  If we pull voting out of upload and serving...
16:30:28 <isis> where the BWAuth talks to Serving?
16:30:39 <dgoulet> Voting
16:30:50 <nickm> voting needs the info from bwauth.
16:31:01 <nickm> I don't think I have a solid idea how voting gets that info
16:31:08 <isis> but then that would require that Voting has network access?  which i thought we were trying to avoid?
16:31:26 <nickm> well, voting has to get info somehow.
16:31:33 <dgoulet> isis: in some way it will need network access if you want to isolate them
16:31:49 <dgoulet> isis: both inbound data from Upload/bwauth/testing and then send them to Publishing
16:32:04 <dgoulet> and outbound*
16:32:05 <isis> right, but i assumed that part was IPC on the same machine
16:32:23 <isis> whereas the BWAuths are (all?) on separate machines
16:32:29 <dgoulet> I'm guessing at first we'll do that with IPC ^ and then we maybe have the luxury to move them to different machines
16:32:36 <nickm> I think that IPC on the same machine might be a default implementation, but I'd like to have them on separate machines and IPs.
16:32:47 <isis> ah, i see
16:32:52 <nickm> One easy way to prepare would be to separate the following:
16:32:59 <nickm> IP:fingerprint for servers where you must upload descriptors.
16:33:08 <isis> so this proposal is less about taking parts "offline" and more about process isolation
16:33:17 <nickm> IP:fingerprint for servers where you may download canonical consensuses that were just signed.
16:33:29 <nickm> fingerprints for authorities that sign consensuses.
16:33:47 <nickm> (Only authorities need to care about this) IP:fingerprint for where you should upload/download votes
16:34:07 <dgoulet> isis: yeah that would be more accurate, dirauth roles into separate processes
16:34:20 <nickm> This would only be a matter of splitting our authority list into a few separate lists.
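As a rough picture of the split nickm lists above, the single authority list could become four separate lists, one per role. Everything in this sketch is hypothetical: the type names, the field names, the helper `split_legacy_entry`, and the placeholder address and fingerprint; it is not how tor stores its authority list.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Endpoint:
    address: str       # "IP:port" of the server
    fingerprint: str   # identity fingerprint of the server


@dataclass
class AuthorityLists:
    # Servers where relays must upload descriptors.
    upload_servers: list = field(default_factory=list)
    # Servers where freshly signed canonical consensuses may be downloaded.
    consensus_servers: list = field(default_factory=list)
    # Fingerprints of the authorities that sign consensuses (no address needed).
    signing_fingerprints: list = field(default_factory=list)
    # Only authorities care about this: where votes are uploaded and downloaded.
    vote_exchange_servers: list = field(default_factory=list)


def split_legacy_entry(address, fingerprint):
    """Expand one combined authority entry into the four role-specific lists."""
    endpoint = Endpoint(address, fingerprint)
    return AuthorityLists(
        upload_servers=[endpoint],
        consensus_servers=[endpoint],
        signing_fingerprints=[fingerprint],
        vote_exchange_servers=[endpoint],
    )


# Placeholder values from documentation ranges, not a real authority.
lists = split_legacy_entry("192.0.2.10:9030", "0123456789ABCDEF0123456789ABCDEF01234567")
```

Starting from identical entries in every list keeps today's behaviour, while letting a later change point each role at a different host.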
16:35:15 <dgoulet> yeah
16:35:38 <nickm> well, the motivation for doing separate processes is to minimize the amount of code that holds trust here...
16:35:46 <nickm> AND to make it easier to isolate parts by taking them offline-ish.
16:36:05 <isis> okay, so "offline" still sort of makes sense, in that we can hide the Voting module of each DirAuth from the public network
16:36:14 <nickm> like, I think it would be reasonable to have a voter in a separate VM that interacts with the outside world only through another VM that does its communications for it.
16:36:34 <nickm> the voter needs some connection to the outside world, but that doesn't mean it needs to be opening connections directly.
16:36:35 <isis> maybe we should change the proposal to clarify that?  when i read it, i thought "offline" meant, well, actually offline.
16:36:57 <nickm> how would offline work?
16:37:08 <isis> so… basically we're going to build QubesOS into the DirAuths.  awesome :)
16:37:09 <nickm> #action clarify that we mean offline-ish
16:37:14 <dgoulet> consensus created offline is superb magic :D
16:37:28 <nickm> or rather, make dirauths able to exploite QubesOS
16:37:33 <nickm> *exploit
16:39:39 <isis> i don't know how it would work… i guess in my head i was thinking: ORs → (network) → Upload → (ipc) → Voting → (ipc) → Serving → (network) → OPs
16:40:11 <nickm> so, another way to do that is
16:40:16 <isis> and that the Voting module wouldn't have a public IP at all, or be in a VM or something
16:40:39 <dgoulet> isis: yeah doable but that forces Upload and Voting on the same machine
16:40:40 <nickm> ORs -> network -> Upload -> rsync+ssh -> filesystem -> Voting -> filesystem -> rsync+ssh -> serving -> OPs
16:40:49 <nickm> Is that more offline or less?
16:41:43 <dgoulet> clearly Upload and Voting have massive trust so that sounds ok to me for the case we want them in separate data centers
16:41:44 <isis> both are good designs with slightly different threat models
16:42:15 <nickm> Upload needs high availability and a fair amount of BW.
16:42:26 <isis> (i'm not trying to argue for the version of this i had in my head, just point out that i didn't understand very clearly what this would look like in implementation)
16:43:01 <nickm> Voting needs far less BW, Publishing needs far more.
16:43:28 <nickm> Voting doesn't need to accept or open connections from unknown parties ever
16:43:41 <isis> separate machines would make more sense, security-wise, i think.  plus it would give DirAuth operators the ability to rebuild/move a machine when there's a threat or a breakin.
16:44:00 <dgoulet> isis: yes!
16:45:22 <isis> i have a slight preference for avoiding more usage of rsync+ssh in the network… mostly just because it feels sort of incomplete to have core parts of the network rely on cronjobs being correctly configured, rather than having some solution which is built into tor itself.
16:45:48 <isis> maybe "incomplete" is not the right word…
16:45:54 <nickm> feel free to s/rsync+ssh+cron/tor-rsync-ssh-cron-driver/ ;)
16:46:12 <dgoulet> stitched together? :)
16:46:14 <isis> more like "let's stop duct-taping things together"
16:47:12 <nickm> I agree every thing that makes the network run should be packaged and tested and have a well-defined way that it's integrated.
16:47:21 <nickm> I don't mind that way requiring ssh+rsync
16:47:37 <nickm> I do agree that "and then you set up ssh+rsync and you had better do it right" should not be part of our instructions
16:47:50 <isis> also because it seems like a lot of that configuration is hidden/undocumented, and if we all got hit by a bus tomorrow, i would very much pity the poor bastards who have to figure out how to re-setup the DirAuths
16:47:59 <nickm> yup
16:48:28 * nickm .oO ( one case of salmonella-tainted club mate ... )
16:48:45 * isis spits out her mate
16:48:52 <isis> wat
16:49:11 <nickm> Trying to make it smaller than a bus
16:49:27 <nickm> it's hypothetical
16:49:30 <isis> ah, phew
16:49:47 <dgoulet> so what do we do with this proposal? accept it as a "meta proposal" that is we needs moar proposal breaking it down? :)
16:49:48 * isis stops pouring the mate down the kitchen sink
16:50:36 <nickm> Could be.
16:50:40 <isis> i think breaking it down would make it more reasonable to implement sections of it?
16:50:53 <isis> but i am probably not the person to implement this so…
16:50:58 <nickm> Even for implementing sections, we'll need to specify them all more closely
16:51:09 <isis> well, for the bridgeauth i would gladly help
16:51:22 <dgoulet> oh we have a "Meta" status wow
16:51:39 <isis> i should write a proposal for redesigning the bridgeauth to actually do useful stuff
16:51:42 <dgoulet> doesn't apply much to this one though
16:52:14 <isis> having the "test the bridge reachability" part as a separate machine would definitely be extremely useful
16:52:19 <nickm> Meta seems useful.
16:52:23 <nickm> #action call this proposal meta
16:52:48 <dgoulet> cool
16:53:23 <dgoulet> now let's pitch that idea to a sponsor :D
16:53:48 <nickm> #action open a ticket for that dirauth addr/key splitting in 0.2.9
16:54:37 <dgoulet> and I'm sure C (as much as I love it) is probably the best choice for those components unless we need crazy performance
16:54:57 <nickm> are you missing a "not" in that sentence? :)
16:55:09 <dgoulet> YES! haha
16:55:13 <dgoulet> is probably ONT
16:55:14 <dgoulet> NOT*
16:55:15 <dgoulet> ...
16:55:32 <isis> oh!  maybe a chance to write some more rust?
16:55:36 <isis> :D
16:55:44 <nickm> or python
16:55:47 <nickm> or go
16:55:53 <nickm> or haskell
16:56:07 <nickm> the voter part seems like it would be very well suited to a functional language
16:58:24 <isis> oh, or julia could be interesting because then we could use the builtin distributed computation abilities to assign remote jobs (e.g. "check reachability for X", "deduplicate these descriptors")
16:58:38 <isis> this is starting to seem more fun now
16:58:59 <nickm> somebody needs to rewrite the beatles' Julia to be about the programming language
16:59:15 <nickm> that somebody should probably know the Julia language better than I do.
16:59:33 <isis> #action cover the beatles' "julia" to be about the programming language
17:00:04 <dgoulet> ok I'll go afk for food, thanks all :)
17:00:13 <nickm> ok. ready to endmeeting?
17:00:15 <nickm> going once
17:00:20 <isis> dgoulet: see you!
17:00:23 <nickm> going twice
17:00:30 <dgoulet> isis: o/
17:00:31 <dgoulet> bbl
17:00:32 <nickm> so mote it be
17:00:34 <nickm> #endmeeting