00:16:30 <teor> #startmeeting prop280
00:16:30 <MeetBot> Meeting started Wed Sep 13 00:16:30 2017 UTC. The chair is teor. Information about MeetBot at http://wiki.debian.org/MeetBot.
00:16:30 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
00:16:58 <teor> So we started meetbot late, I'll replay the open questions and then we'll get to question 1
00:17:14 <teor> The privcount in tor proposal is here: https://gitweb.torproject.org/torspec.git/tree/proposals/280-privcount-in-tor.txt
00:17:26 <teor> It deals with the low-level blinding, noise, and aggregation; specific statistics are for a later proposal
00:17:38 <teor> 1. How do we make sure the protocol survives data collector and tally reporter failures? (recall data collectors are on tor relays, and tally reporters combine the share keeper and tally server roles)
00:17:43 <teor> 2. How do we make sure the protocol survives outlier or broken measurements from relays?
00:18:16 <teor> 3. How do we make sure the added noise is sufficient, particularly as we add and remove statistics? What if we think more noise is safer? What if we want to add different noise? (This might be out of scope)
00:18:24 <teor> 4. (how) do we measure different statistics over different time periods?
00:18:54 <teor> On to question 1
00:19:22 <teor> So the proposed design for redundancy is to have multiple subsets of tally reporters, where each subset of tally reporters handles a subset of data collectors
00:19:34 <armadev> seems like question 1 depends in part on how we're going to pick the tally reporters (and how many)
00:19:59 <teor> For example, we have 9 tally reporters in 3 sets of 3, and each set handles 1/3 of the relays
00:20:07 <ohmygodel> armadev: how so?
00:20:21 <armadev> it leads to different trust assumptions
00:20:25 <nickm> I've seen three ideas here:
00:20:33 <nickm> "do nothing; assume everybody's honest"
00:20:35 <armadev> like, "we'll have the dir auths run them" vs "rob and aaron and teor will run them"
00:20:53 <nickm> "use different instances of the algorithm with different members; hope one works."
00:21:12 <nickm> "as above but use k-of-n secret sharing instead of multiple instances"
00:21:26 <nickm> i haven't seen either of the last two fully worked out
00:21:55 <ohmygodel> secret sharing seems like the best to me to deal with TR failures
00:22:03 <Samdney> +1
00:22:09 <teor> (given the challenge of getting the dir auths to run bandwidth authorities [which to be fair is a code quality issue as well], let's make a design that doesn't require that level of trust)
00:22:30 <nickm> My objection to secret sharing as it stands is "there is no specification and nobody has volunteered to write one"
00:22:38 <nickm> but that's easily resolved :)
00:22:53 <Samdney> really not? oh!
00:23:00 <ohmygodel> it is cheap for both DCs and TRs and it can survive the failure of any n-k servers
00:23:06 <ohmygodel> sorry, n-k TRs
00:23:43 <nickm> I'd also like to know if secret sharing can be implemented in a similarly efficient and forward-secure way as the current code uses
00:23:52 <armadev> just so i'm following correctly, by failure do we mean 'missing' or 'byzantine'?
00:23:59 <teor> Yes
00:23:59 <nickm> byzantine
00:24:01 <ohmygodel> fair enough nickm, one drawback is that implementation is somewhat more complex
00:24:02 <teor> Both
00:24:08 <armadev> ok
00:24:15 <nickm> ohmygodel: a little complexity, I know, will be there...
00:24:15 <ohmygodel> armadev: i meant missing
00:24:28 <nickm> ...but adding a bunch of logic to the critical path would be sad
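A quick sketch to ground nickm's performance and forward-secrecy concerns. This is not prop280's exact construction (the modulus, the per-TR blinding, and every name below are illustrative), but it shows why secret sharing need not touch the hot path: the counter starts at blinding plus noise, each event is a single modular addition, and forward secrecy comes from wiping the blinding values once they have been handed off to the TRs.

```python
import secrets

MODULUS = 2**64  # illustrative; prop280 fixes its own counter modulus

class BlindedCounter:
    """One DC-side counter, blinded at initialization (sketch only)."""

    def __init__(self, num_trs, noise):
        # One random blinding value per TR. Each is sent (encrypted) to its
        # TR and then forgotten locally; that wipe is the forward secrecy.
        blinding = [secrets.randbelow(MODULUS) for _ in range(num_trs)]
        self.value = (sum(blinding) + round(noise)) % MODULUS
        self.outgoing = blinding  # hand to the transport, then delete

    def increment(self, amount=1):
        # The hot path nickm is worried about: one addition, the same cost
        # whether or not the blinding values were also secret-shared.
        self.value = (self.value + amount) % MODULUS
```

At tally time the TRs subtract the blinding values they hold from the sum of the published counters, leaving only increments plus noise.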
00:24:37 <ohmygodel> byzantine adversary can cause outputs to be garbage
00:24:42 <nickm> so if incrementing a counter gets much slower, that would be bad
00:24:58 <teor> Which is why we split the data collectors into independent subsets
00:25:03 <nickm> and if forward secrecy gets much worse, that would be bad
00:25:22 <nickhopper> but the secret sharing is just for the secrets part, not the counter increments, right?
00:25:23 <ohmygodel> incrementing the counter would be the same as before
00:25:39 <nickm> nickhopper: nobody knows, there is no spec
00:25:43 <ohmygodel> forward secrecy at the DCs would be the same
00:26:09 <ohmygodel> yes nickm, this definitely needs to be written up
00:26:33 <teor> So if we were to do k-of-n secret sharing, we could split the secret, then encrypt, and wipe the original secret
00:26:46 <teor> I think that gives us forward secrecy
00:26:59 <ohmygodel> but to make it a bit clearer, the way I envision it, it would work like this:
00:27:08 <ohmygodel> 1. DCs do blinding with TRs as before
00:27:14 <ohmygodel> 2. DCs increment counters as before
00:27:37 <ohmygodel> 3. DCs send k-of-n secret shares of blinded value to TRs
00:27:54 <ohmygodel> 4. TRs add current blinded values to received secret shares
00:28:24 <ohmygodel> 5. TRs reveal secret shares to each other (or some designated party) to allow reconstruction of the secret, which is the desired aggregate value
00:29:04 <ohmygodel> 1.5. The noise gets added into the counter before step 2 (sorry, I skipped this between steps 1 and 2)
00:29:48 <ohmygodel> Then as long as k TRs are online and reveal their shares, the secret can be reconstructed
00:30:01 <ohmygodel> And as long as no more than k-1 TRs collude, no private inputs can be learned
00:30:30 <teor> I am confused about the meaning of "blinded values" and "secret shares". Which steps produce which of these?
00:31:04 <ohmygodel> by “blinded value” I meant the value stored in the counter, which includes a blinding value, the noise, and any increments
00:31:52 <ohmygodel> “secret shares” are produced by DCs from the blinded value (aka the counter) in step 3 (which is at the end of the measurement period) and sent immediately to the TRs
00:33:39 <teor> So do we need a spec for k-of-n secret shares, and a spec revision to prop280 that uses them?
00:33:56 <ohmygodel> yeah
00:34:00 <Samdney> lol
00:34:20 <teor> Any volunteers? Otherwise I will just note them down as actions
00:34:26 <armadev> and what they get us is that some talliers can fall out of the picture but we can still recover aggregate values?
00:34:46 <ohmygodel> ok yeah, I got the steps mixed up
00:34:47 <Samdney> me, maybe. Have to think about it ;)
00:34:47 <nickm> wait, i am confused.
00:34:54 <robgjansen> ohmygodel: i don't understand your proposal
00:34:55 <ohmygodel> nickhopper was right
00:35:02 <nickm> the blinding value is added to the blinded value.
00:35:05 <ohmygodel> the secret sharing happens at step 1
00:35:19 <nickm> If any single DC is broken, its part of the blinding value won't be recoverable
00:35:40 <Samdney> (that was the answer to "volunteer?")
00:35:41 <ohmygodel> *of* the blinding value (only one is produced, not pairwise as before)
00:35:43 <robgjansen> if the blinding value (i.e., the random value added to the counter to make it appear random upon inspection) is not secret shared, and some TRs holding those go offline, how can we reconstruct them?
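Since the recurring answer above is "there is no spec", here is a toy version of the missing primitive, assuming Shamir secret sharing over a prime field; the field, the API, and the parameters are placeholders, not a proposal.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime, comfortably larger than 64-bit counters

def split(secret, k, n):
    # Random polynomial of degree k-1 with constant term = secret;
    # share i is the point (i, f(i)).
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(k - 1)]
    f = lambda x: sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0; any k distinct shares suffice.
    total = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

shares = split(secret=42, k=3, n=5)
assert reconstruct(shares[:3]) == 42  # survives the loss of any n-k = 2 TRs
```

Shares at the same x-coordinate add homomorphically, so a TR can sum the shares it received from all surviving DCs before anything is revealed; that is what the TR aggregation steps in the corrected walkthrough below rely on.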
00:36:07 <ohmygodel> right, yes robgjansen
00:36:25 <robgjansen> sure, we can reconstruct the final blinded value...
00:36:29 <teor> Ok, so we need:
00:36:41 <robgjansen> but we also need to reconstruct the blinding value in order to remove it
00:36:43 <ohmygodel> ok, let me try again
00:37:11 <ohmygodel> 1. Each DC chooses a random blinding value, sends secret shares of it to the TRs, and adds the blinding value into the counter
00:37:14 <nickm> ohmygodel: maybe try in a specification?
00:37:17 <nickm> :D
00:37:25 <ohmygodel> 2. The DC increments the counter as before
00:37:40 <teor> 1.5. The DC adds noise
00:37:44 <ohmygodel> 3. The DC adds in noise to the counter
00:38:18 <robgjansen> ohmygodel: ahh, i missed in step 1 that the blinding values are also secret shared
00:38:37 <robgjansen> seems ok to me then
00:38:37 <ohmygodel> 4. At the end of measurement, the DCs broadcast their counters / send them to a Tally Server / send them to the TRs / whatever
00:39:08 <ohmygodel> 5. The TRs add their shares (actually just those shares from DCs that didn’t fail before broadcasting their counters)
00:39:20 <ohmygodel> 6. The TRs broadcast their secret shares to reconstruct the secret
00:39:26 <teor> Ok, so for forward secrecy, it's best that the noise is added before any increments (1.5, not 3.)
00:39:38 <Samdney> +1
00:39:38 <ohmygodel> 7. The secret (aka the sum of the blinding values) and the broadcast counters get added to yield the aggregate
00:40:15 <teor> And for state management, it's best that 1. becomes "encrypt secret shares to the TRs"
00:40:25 <teor> And then all the data is sent in one hit at the end.
00:40:29 <teor> Let's do this in a spec
00:40:36 <robgjansen> #agreed
00:40:39 <ohmygodel> yes teor, that seems right
00:40:44 <Samdney> oh yes please
00:41:01 <robgjansen> (sorry, i don't know how to use meetbot)
00:41:03 <teor> #action write a k-of-n secret sharing spec
00:41:25 <teor> #action revise prop280 to use k-of-n secret sharing
00:41:28 <teor> (I hope that works)
00:41:49 <ohmygodel> so that sketch also includes my suggestion to deal with DC failures - just have the TRs use only the shares from DCs that successfully submitted their stuff at the end of measurement
00:42:09 <nickm> how do we handle DCs being deliberately junky?
00:42:17 <armadev> what if there's disagreement about which DCs successfully submitted their stuff?
00:42:19 <teor> Let's move on to the next question, because we have 20 minutes left
00:42:29 <teor> 2. How do we make sure the protocol survives outlier or broken measurements from relays?
00:43:06 <Samdney> this question depends on the ratio of broken measurements to all measurements, I think
00:43:31 <ohmygodel> ok, so for this question, the subset idea seems like a fine one to me
00:43:40 <teor> The current proposal is to split the DCs into multiple independent subsets, calculate an aggregate for each subset, and then take the median (or whatever)
00:43:51 <nickhopper> defining "broken" as byzantine, yes?
00:44:06 <ohmygodel> here broken is byzantine, yes
00:44:27 <teor> If we make the subsets depend on a shared random value released *after* results are submitted, then relays can't game their subsets
00:45:15 <nickm> teor: to be fair, there is no spec for doing this part either. The current proposal assumes that the subsets have been constructed and that's that
00:45:18 <teor> This also handles a small amount of disagreement about which DCs submitted, for example, if a DC crashes during results upload
00:45:26 <armadev> fun math. (notice that to game the median, you only need to get a liar into half of the subsets)
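A sketch of the subset-and-median defense just described, assuming subsets are assigned by hashing the post-submission shared random value with each relay's fingerprint. hashlib and statistics are stdlib; the glue is invented, and in the real protocol each subset's sum would come out of its own blinding run, never from per-relay plaintexts.

```python
import hashlib
import statistics

def subset_index(srv: bytes, fingerprint: bytes, num_subsets: int) -> int:
    # The SRV is released only after results are submitted, so a relay
    # cannot choose its inputs to land in a particular subset.
    digest = hashlib.sha256(srv + fingerprint).digest()
    return int.from_bytes(digest, "big") % num_subsets

def robust_estimate(contributions, srv, num_subsets):
    # contributions: {fingerprint: submitted total}. A stand-in only: the
    # real protocol only ever materializes the per-subset aggregates.
    sums = [0] * num_subsets
    for fpr, value in contributions.items():
        sums[subset_index(srv, fpr, num_subsets)] += value
    # The median tolerates a minority of junk subsets; scaling back up by
    # the subset count estimates the network total if subsets are similar.
    return statistics.median(sums) * num_subsets
```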
00:45:35 <teor> Yes, that's true
00:45:59 <teor> But you have to get the right number of liars in each subset
00:46:07 <Samdney> and of course the size of a subset...
00:46:08 <nickhopper> 1 = right number
00:46:30 <nickhopper> if all you are worried about is disrupting the result
00:46:40 <ohmygodel> yeah, this is a bit of a sad hack to deal with the lack of robustness against bad DC inputs in the protocol
00:47:35 <teor> #action update the proposal to deal with post-submission shared-random-based relay subset selection
00:47:51 <teor> ^ in reply to nickm
00:48:16 <nickm> thanks
00:48:36 <ohmygodel> I really don’t think it can handle a strategic adversary
00:48:54 <teor> No, but neither can the current statistics, tbh
00:49:15 <ohmygodel> because in order to have good statistics you want reasonably large subsets, which means an adversary is likely to be in it
00:49:41 <armadev> makes sense. alas
00:50:03 <teor> We have about 10 minutes left, so let's leave that for future research?
00:50:04 <ohmygodel> we can’t just have a huge number of subsets, because in the limit that is just releasing per-relay statistics, which is what Tor does now
00:50:19 <ohmygodel> teor: I do want to mention something about this
00:50:32 <ohmygodel> You must account for the number of subset outputs that are being produced when generating noise
00:50:45 <armadev> right, more subsets means more noise
00:50:48 <ohmygodel> k subsets = k times the noise per subset to get the same privacy guarantee
00:51:13 <teor> #action increase the noise added in the spec for each subset of relays that produces a result
00:51:18 <ohmygodel> and that’s the real reason to limit the number of subsets
00:51:40 <teor> 3. How do we make sure the added noise is sufficient, particularly as we add and remove statistics? What if we think more noise is safer? What if we want to add different noise? (This might be out of scope)
00:52:16 <nickm> well, it's essential if we want to deploy
00:52:32 <teor> Do we have a basic idea of how version upgrades will work?
00:52:58 <nickm> between sets of statistics?
00:53:17 <teor> Yes
00:53:22 <armadev> straw person #1: we treat all relays doing the wrong version as bad, and discard their votes
00:53:29 <teor> The current proposal says that TRs can add zeroes for missing counters, and then notes that they will need to add noise as well
00:54:13 <teor> But we also need minimum thresholds for activating a new statistic (and removing an old one)
00:54:36 <Samdney> in physics you would choose armadev's version ;) (sorry, I'm a physician)
00:54:46 <ohmygodel> teor: that seems like a good approach to me: one stats regime at a time, switchover when enough have upgraded
00:54:58 <teor> For example: when a new counter is supported by 10% of relays, report it. When an old counter is supported by < 5% of relays, remove it.
00:55:00 <nickm> Samdney: "physicist"; physician is different ;)
00:55:05 <teor> Or we could say "set of statistics"
00:55:31 <Samdney> (oh! thank you nickm, my english!)
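teor's 10% / 5% switchover rule from just above, written out; the thresholds come straight from the log, while the function shape and the counter names are assumptions:

```python
ACTIVATE_AT = 0.10  # report a counter once 10% of relays support it
RETIRE_AT = 0.05    # drop it once support falls below 5%

def next_active(currently_active, support):
    # support: {counter_or_set_name: fraction of relays supporting it}
    active = set()
    for name, frac in support.items():
        if name in currently_active:
            if frac >= RETIRE_AT:       # hysteresis: keep until < 5%
                active.add(name)
        elif frac >= ACTIVATE_AT:       # activate at >= 10%
            active.add(name)
    return active

print(next_active({"exit-bytes"}, {"exit-bytes": 0.04, "hs-circuits": 0.12}))
# {'hs-circuits'}: the old counter retires, the new one activates
```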
00:55:33 <nickm> teor: in practice we could have that be in the consensus, and let the authorities decide
00:55:36 <teor> I think it's less complex and less risky to switch an entire statistics set
00:55:46 <nickm> Samdney: your english is still better than my anything-else :)
00:56:02 <nickm> teor: but let's think it through
00:56:15 <armadev> so everybody adds enough noise as if 100% of relays are reporting, even when only 10% of them are, and the rest are filled in as 0's?
00:56:20 <teor> But it makes for slower upgrades
00:56:32 <ohmygodel> armadev: the noise is independent of the number of relays
00:56:33 <nickm> this would mean that if we were on statistics set X, we would never learn statistics from routers that did not support set X.
00:57:02 <nickm> But if such routers had a different value for some counters within X, we would not see their values
00:57:03 <teor> Indeed. Which is very sad.
00:57:16 <nickm> if we were looking for signs of an attack, a clever attacker would just attack the old routers
00:57:33 <nickm> so, here's a countersuggestion:
00:57:36 <armadev> or heck, add old routers to move us back to statistics set X-2
00:57:45 <nickm> let there be multiple named sets of statistics.
00:57:53 <nickm> each set can be turned on or off independently in the consensus
00:58:25 <teor> So the problem with this is that the noise distribution is a function of the entire set of statistics being collected
00:59:48 <armadev> that is tied into my question 4
00:59:51 <teor> So it's not safe in the general case to combine new stats with old stats
01:00:00 <teor> Or quick stats with slow stats
01:00:27 <armadev> unless we run the whole apparatus in parallel, one for each type of stat
01:00:34 <Samdney> what exactly are "quick" or "slow" stats?
01:00:39 <robgjansen> just gonna write that
01:00:40 <armadev> and make sure none of our stats are dependent on each other
01:00:43 <ohmygodel> you could divide the “privacy budget” (i.e. the noise allocation) evenly among the sets of statistics that are available at a given time
01:01:14 <teor> Samdney: stats collected over different periods
01:01:15 <armadev> Samdney: quick ones would be ones where the numbers each relay publishes have to do with a small period, and slow ones would be for large periods
01:01:24 <ohmygodel> but you need that number to stay constant
01:02:04 <nickm> Is there a formula that actually works as sets of statistics evolve?
01:02:34 <ohmygodel> nickm: the way we handled that was that our privacy definition only covered a given period of time
01:03:19 <teor> Ok, so we need to do something about continuous collection?
01:03:20 <ohmygodel> that is, we hide some amount of “activity” (i.e. making a circuit, sending bytes) within some period of time (e.g. 24 hours)
01:04:25 <ohmygodel> so reasonable activity within k hours should not be discernible from the statistics
01:04:31 <teor> ohmygodel: re: evolving stats: can you fix the sigma values for the old statistics, and then put all the new privacy budget on the new statistics?
01:04:54 <nickm> Is this a calculation that is easy to automate?
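ohmygodel's "divide the privacy budget evenly" suggestion above and the earlier "k subsets = k times the noise" observation are the same arithmetic: every extra simultaneous output gets a smaller epsilon slice, and Laplace noise calibrated the standard way (scale = sensitivity / epsilon) grows in proportion. All numbers below are invented:

```python
TOTAL_EPSILON = 0.3  # illustrative privacy budget for one collection period

def laplace_scale(sensitivity, epsilon):
    # Standard epsilon-DP Laplace calibration.
    return sensitivity / epsilon

def per_output_scales(outputs, total_epsilon=TOTAL_EPSILON):
    # outputs: {name: sensitivity}, one entry per simultaneous statistics
    # set (or relay subset). An even split of the budget.
    eps_each = total_epsilon / len(outputs)
    return {name: laplace_scale(s, eps_each) for name, s in outputs.items()}

print(per_output_scales({"circuits": 6}))
print(per_output_scales({"circuits": 6, "bytes": 1_000_000, "conns": 12}))
# "circuits" needs scale 20 on its own, 60 when the budget is split 3 ways
```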
01:05:22 <ohmygodel> teor: yes you can, although changing the privacy budget allocation requires a delay period between the two collections
01:05:55 <armadev> (i guess you could implement the delay by the talliers deciding not to tally)
01:06:03 <ohmygodel> the reason for that was your k hours of activity might span the two collection periods running under different budget allocations, which could violate the privacy guarantee
01:06:19 <armadev> lots of moving parts here
01:06:21 <teor> if we have versions in the consensus, we can have a version "off"
01:06:56 <nickm> but if we regularly turn some statistics off, that means we can't use statistics for ongoing incident detection so well
01:07:03 <ohmygodel> nickm: I think the answer is yes. For example, everything is basically automated in PrivCount now, except choosing exactly which statistics you want to collect (which requires a human to decide).
01:07:23 <nickm> do those statistics require any annotations?
01:07:55 <ohmygodel> They need a “sensitivity” (the max amount by which the limited amount of user activity can change them)
01:08:14 <nickm> how do we derive that value?
01:08:31 <teor> ohmygodel: and an expected value
01:09:01 <teor> nickm: existing statistics about user activity
01:09:08 <ohmygodel> teor: right, for accuracy, some guess about the likely value will help optimize the noise allocation
01:09:43 <nickm> sounds like there's a bootstrapping issue there...?
01:09:52 <teor> nickm: or estimates, or the amount of activity we *want* to protect
01:09:56 <nickm> what are the risks if we just make a wild-assed guess?
01:10:12 <ohmygodel> nickm: In general, you have to reason about it, but often there are just a few sensitivities shared across many types of statistics (that differ in ways irrelevant to the sensitivity)
01:10:40 <teor> Either: exposing as much information as relays currently do, or a signal that's swamped by the noise
01:11:21 <teor> Oh, but you get an aggregate, so the "too little noise" case is still better than tor's current stats
01:11:27 <ohmygodel> nickm: you might have a very noisy (aka inaccurate) answer, which you will likely recognize because you know the noise distribution
01:11:28 <nickm> yeah
01:11:52 <ohmygodel> teor: there is no privacy issue from choosing the expected value incorrectly
01:12:01 <robgjansen> without good ideas of expected values for noise "optimization", you risk some counters having too much noise
01:12:17 <teor> what about the sensitivity?
01:12:22 <robgjansen> but after collection, you can always compute the fraction of the result that noise accounts for
01:12:42 <ohmygodel> sensitivity must be right or the differential privacy guarantee may be violated
01:12:47 <robgjansen> if it's too high, use your updated estimate in the next round
01:13:34 <teor> #action specify how to estimate sensitivity and expected values for each counter, and how to turn that into a set of sigmas
01:14:35 <teor> #action specify how to safely change the set of counters that is collected (or the noise on those counters) as new tor versions that support new counters are added to the network (and old versions leave)
01:15:10 <teor> Is that a good summary?
01:15:27 <teor> Do we have time to move on to question 4, or do we want to leave that for later?
01:15:28 <teor> 4. (how) do we measure different statistics over different time periods?
01:15:37 <ohmygodel> I’m fine discussing it.
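The shape of the bookkeeping behind the first #action above, loosely modeled on the PrivCount approach described here: sensitivity is safety-critical, the expected value only tunes accuracy, and robgjansen's post-collection check tells you when to revise the estimate. The field names and numbers are placeholders:

```python
from dataclasses import dataclass

@dataclass
class CounterSpec:
    sensitivity: float     # max change from the protected amount of user
                           # activity; must be right, or DP is violated
    expected_value: float  # rough guess; errors here only cost accuracy

def laplace_scale(spec, epsilon):
    return spec.sensitivity / epsilon

def noise_fraction(spec, epsilon):
    # Post-collection sanity check: roughly what fraction of the expected
    # result is noise? (Laplace standard deviation = sqrt(2) * scale.)
    return (2 ** 0.5) * laplace_scale(spec, epsilon) / spec.expected_value

circuits = CounterSpec(sensitivity=6, expected_value=2_000_000)
print(noise_fraction(circuits, epsilon=0.1))  # ~4e-5: signal swamps noise
```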
01:15:50 <armadev> it seems related to the discussion we had on 3
01:15:54 <nickm> (for the record, I think having only one set of statistics at a time will be trouble.)
01:15:59 <nickm> (from a deployability POV)
01:16:04 <armadev> agreed
01:16:12 <Samdney> +1
01:16:19 <armadev> i also worry about cases where we have a whole lot of numbers, like the per-country counts.
01:16:39 <teor> I agree. I'd like there to be a way to safely add and remove individual counters as needed.
01:16:51 <teor> armadev: it might turn out that those counters were never safe
01:17:02 <nickm> and have old relays still report counters.
01:17:06 <armadev> (oh, and for the per-country thing, which geoip file each relay is using fits into it. ugh.)
01:18:12 <ohmygodel> nickm: An easy way would just be to treat each set of statistics as independent. That is what Tor does currently. We tried to do better by considering how user activity can affect all statistics being collected, but maybe incremental progress is better.
01:20:01 <ohmygodel> armadev: I agree that it isn’t clear how private and accurate the entire collection of Tor statistics could be if it was all ported to using differential privacy.
01:20:02 <armadev> we don't want it on today's list of questions, but we might also want to pick a policy where relays only collect things that we'd be ok having published per relay, if stuff breaks
01:20:02 <nickm> so the privcount design assumes some kind of worst case about how user activity is exposed by non-independent statistics?
01:21:02 <ohmygodel> armadev: I would further add that some individual statistics are unlikely to be able to be collected with reasonable accuracy and reasonable DP privacy (e.g. countries with few users, as Karsten discovered).
01:21:50 <teor> nickm: yes
01:22:18 <nickm> I wonder how close we are to the worst case here
01:22:20 <ohmygodel> nickm: yes, it considers how much user activity (given that it is within some limits) can affect each statistic and then takes the worst-case view that possibly *all* stats could be simultaneously affected by those amounts
01:22:59 <armadev> does that mean that separating statistics, i.e. running several in parallel and assuming they're not correlated, can really reduce the amount of noise that it feels it needs to add?
01:24:07 <teor> Well, it reduces the differential privacy guarantee
01:24:16 <ohmygodel> yes, treating different sets independently can reduce the amount of noise it feels it needs to add to each one
01:24:33 <armadev> but we'd best be right about the independence
01:24:47 <teor> Well, it would be no worse than the current state
01:25:00 <armadev> except if we decide to collect something new, which we weren't comfortable collecting before
01:25:01 <ohmygodel> but yes, it means the DP guarantee only applies to each set and not to them simultaneously (although DP composes, and so it doesn’t just explode, it degrades)
01:25:28 <teor> We could probably deal with that as a first cut
01:25:37 <teor> Particularly if the number of sets was small
01:26:15 <teor> For example, we have 5 different tor versions with significant presence (> 10%) on the network: https://metrics.torproject.org/versions.html
01:26:53 <teor> If we added 1-2 sets of statistics per version, then we'd be looking at about ~8 sets of simultaneous statistics
01:27:01 <ohmygodel> armadev: It seems to me like most statistics will not be independent, and so maybe it is better just to call this accepting a lower privacy guarantee than assuming they are independent.
01:27:28 <teor> although from a design perspective, it would still make sense to group related statistics together
01:27:34 <nickm> yeah. I wonder if we could define sets logically rather than per-version
01:27:40 <armadev> ohmygodel: yeah. even for things that seem quite different, like "user counts" and "bandwidth use", they won't be independent
01:28:46 <teor> nickm: for example, we might have a "version 3 onion service" set in 0.3.3, and then an "extra version 3 onion service set" in 0.3.4
01:28:55 <armadev> defining sets logically is better because the noise will be better suited for them too
01:29:07 <nickm> and a "basic bandwidth usage" set that's very stable over time
01:29:12 <ohmygodel> nickm: could we limit the number of simultaneous sets active at a given time?
01:29:32 <nickm> ohmygodel: programmatically or via good sense?
01:29:33 <teor> It would seem that the consensus and the protocols would be the way to do this
01:29:58 <teor> There's no point in collecting sets supported by very few relays
01:29:59 <ohmygodel> programmatically using good sense? :-)
01:30:37 <ohmygodel> by that I mean, could we say “Tor will allow no more than 5 sets of statistics to be reported at a given time”?
01:30:46 <teor> Or, maybe it's better to say "there's no point in collecting sets only supported by relay versions we don't support"
01:30:57 <ohmygodel> and then Tor (via the DCs and/or TRs) would enforce that?
01:31:42 <teor> Yes, it's possible. We could use the existing protocol version infrastructure for that
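ohmygodel's composition remark made concrete: if each active set runs with its own epsilon and independence is not assumed, basic sequential composition still bounds the simultaneous guarantee; it degrades additively rather than failing outright. The cap of 5 is the number floated above; the epsilon is invented:

```python
MAX_SIMULTANEOUS_SETS = 5  # the cap floated above, enforced via DCs/TRs

def composed_epsilon(per_set_epsilon, num_active_sets):
    # Basic (non-tight) sequential composition across statistics sets.
    assert num_active_sets <= MAX_SIMULTANEOUS_SETS
    return per_set_epsilon * num_active_sets

print(composed_epsilon(0.1, 5))  # five sets at eps 0.1 each: eps 0.5 overall
```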
01:32:36 <ohmygodel> because then we could just build that into the privacy budget: 1/5 of it for each possible set
01:33:39 <robgjansen> the total number of sets we want may increase over time though
01:33:52 <ohmygodel> changing that budget dynamically could also be done, but it should have some limit so that an adversary can’t destroy all stats by running all Tor versions and making the per-version budget too low
01:34:16 <teor> why don't we estimate how many sets we think we'll have, build it into the privacy budget, and then *if* we go over, add new sets using the new budget?
01:34:35 <nickm> or maybe have the counter/budget/something values in the consensus?
01:34:46 <teor> then we degrade slightly if we go over, but it's less complex, and less dynamic
01:35:01 <robgjansen> do we want to build in some kind of mechanism to allow for a "reset" of the sets and privacy
01:35:15 <teor> I don't understand
01:35:36 <Samdney> a "default button"?
01:35:53 <robgjansen> you guess ahead of time that you want 8 sets and your privacy budget is 200...
01:36:03 <robgjansen> after 1 year you realize you were way off
01:36:20 <robgjansen> and you actually have 30 sets and need a budget of 2000
01:36:28 <teor> Ok, so this is what a consensus parameter would be useful for
01:37:00 <teor> #action specify the privacy budget parameters that we need to turn into consensus parameters
01:37:57 <armadev> consensus params make me nervous because they are potentially new every hour, and relays don't necessarily have history of them
01:38:17 <nickm> if not consensus params, some similar mechanism?
01:38:27 <armadev> yep. something. another moving part. :)
01:38:49 <armadev> like the shared-random-value that the subset-creation module needs. hoof.
01:39:28 <nickm> armadev: https://www.youtube.com/watch?v=jy5vjfOLTaA
01:39:30 <ohmygodel> I’d just like to mention again that while changing the budgets and allocations over time can be done, it requires some mechanism to make the guarantee apply. Options include (1) enforce a delay between measurement periods (what we do now), (2) reduce accuracy temporarily, (3) change the privacy guarantee to apply to activity within a specific time period (and not activity over any time period of a given length).
01:39:46 <teor> armadev: we already have a suitable shared random value
01:41:05 <armadev> do we? does it go public at the time we need?
01:41:21 <armadev> i mean, i agree we have one. i'm not yet convinced it's suitable.
01:41:54 <armadev> (we should probably do what it takes to make it suitable. but that might involve constraints that make us sad.)
01:42:26 <nickm> We might need to have some that last a long time, or have some way to get old ones, or such
01:42:27 <robgjansen> ohmygodel: even though privcount does (1) now, i think that's not the best for a continuous deployment
01:42:46 <ohmygodel> armadev: The TRs could do subset selection among themselves as well at the end of the collection period
01:43:02 <teor> armadev: it might mean waiting for 12 hours for stats, because it's secure as long as we don't know the final set of reveals, which can be revealed 12 hours before the SRV
01:43:20 <ohmygodel> robgjansen: yeah, I actually think (2) might work better.
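One way ohmygodel's option (2) might look, purely as a sketch: while a k-hour window of activity can straddle the old and new budget allocations, give each period only half its budget (roughly doubling the noise) so the straddling activity still stays within the total. Nothing here is specified anywhere:

```python
def period_epsilon(total_epsilon, in_transition):
    # During a reallocation, activity can span both the outgoing and the
    # incoming period, and its exposure composes across them. Halving each
    # period's budget keeps old + new within the total guarantee.
    return total_epsilon / 2 if in_transition else total_epsilon

print(period_epsilon(0.3, in_transition=True))   # 0.15: blurrier for a bit
print(period_epsilon(0.3, in_transition=False))  # 0.3: back to normal
```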
01:43:40 <teor> ohmygodel is right, there are protocols that let you select subsets as long as one party is trusted
01:43:42 <ohmygodel> that way you don’t lose statistics, some just get a bit blurrier for a bit
01:43:53 <nickm> I'm afraid I need to sign off soon
01:43:57 <armadev> big fan of (2)
01:44:23 <armadev> so, we have a bunch of #action lines, and no names attached to them. what could go wrong? :)
01:44:25 <robgjansen> agreed about (2); (3) leaves room open for privacy attacks
01:44:41 <teor> #action specify how to maintain privacy guarantees when the set of statistics changes, probably by reducing accuracy
01:44:59 <ohmygodel> robgjansen: indeed, woe to those who cross the International Privacy Line
01:45:15 <robgjansen> :)
01:46:17 <teor> Ok, any more to add before I turn off the meetbot in about 1 minute's time?
01:46:32 <ohmygodel> Nothing from me
01:46:40 <teor> Thank you everyone for helping us make tor's statistics better
01:46:50 <armadev> i was wondering about a 'question 6', something like 'is there sufficient evidence that this whole thing is worth the trouble'
01:46:51 <nickm> yaay.
01:46:57 <Samdney> :)
01:47:05 <armadev> (i hope the answer is yes, but we would be wise to collect enough of it first probably)
01:47:07 <nickm> well, it depends how complicated we make it
01:47:36 <Samdney> I have the feeling it will become very complicated ;)
01:47:36 <nickm> i think in the long term it makes our existing statistics more private, and makes collecting other statistics safely possible.
01:47:37 <armadev> i do agree that getting rid of our current stats, which are probably harmful in surprising ways, would be good if we can get 'most' of it still in this new way
01:47:53 <nickm> but the complications are all tbd right now IMO
01:48:12 <nickm> I suspect that we might see the next version of the spec and look for places to simplify
01:48:27 <nickm> the in-tor implementation for the current prop280 is dead simple, fwiw
01:48:44 <ohmygodel> armadev: There are attacks right now using Tor’s current stats (HS guard discovery).
01:49:30 <armadev> yep
01:49:46 <robgjansen> privcount is better than current methods
01:49:55 <armadev> i assume that when we go to shift things over to privcount, we will... we will decide that none of them can be collected? lots of questions remain :)
01:50:13 <robgjansen> do we want to keep it simple and make progress, or try to wait until we have the perfect solution?
01:50:40 <Samdney> can we have both? ;)
01:50:40 <ohmygodel> Also, I believe that most of your statistics could be gathered using this system with similar utility and no worse privacy (actually better, because the privacy methodology isn’t ad hoc).
01:50:43 <armadev> like, say, the per-relay bandwidth stats
01:50:56 <armadev> i imagine when we go to do that in privcount, we will do away with per-relay bandwidth stats
01:51:10 <nickm> robgjansen: not wait for perfection, that's for sure
01:51:24 <ohmygodel> armadev: yes, an exception is per-relay statistics that you actually want per relay
01:51:26 <robgjansen> IMO, make simple progress now and don't try to design the perfect end-all solution; we can update to better solutions later
01:51:26 <nickm> (note for the metrics team: we will not strand you suddenly)
01:51:38 <robgjansen> where better solutions != privcount
01:51:56 <nickm> well, i'm not clear whether this k-of-n thing is still privcount or not :)
01:52:02 <nickm> but that's up to you
01:52:14 <nickm> ok, i need to sign off. i'm getting silly
01:52:21 <nickm> good night everyone! or good morning as the case may be!
01:52:26 <ohmygodel> good night!
01:52:30 <Samdney> night
01:52:31 <robgjansen> goodnight!
01:52:34 <armadev> thanks all
01:52:42 <ohmygodel> so is there a plan here?
01:52:48 <armadev> teor: are you going to commit tweaks to prop280? i made some too while reading it (grammar etc)
01:53:09 <armadev> i assumed that the plan was that teor had a plan, since he's been saying #action without trying to attach names to things :)
01:53:28 <teor> armadev: #23492
01:53:34 <armadev> great
01:53:40 <teor> You can add to my branch
01:54:01 <teor> And re: the plan, I don't know how to split up the workload
01:54:29 <ohmygodel> teor: yeah, suddenly none of us is working on transitioning PrivCount :-/
01:55:13 <ohmygodel> Sorry, that smiley looks more malevolent than I expected
01:55:27 <teor> I think we will work it out over the next few weeks, or at the dev meeting. Network team is focused on 0.3.1 and 0.3.2 right now.
01:55:53 <teor> Anyway, I think that's a good point to end the meetbot
01:55:56 <teor> #endmeeting