00:16:30 <teor> #startmeeting prop280
00:16:30 <MeetBot> Meeting started Wed Sep 13 00:16:30 2017 UTC.  The chair is teor. Information about MeetBot at http://wiki.debian.org/MeetBot.
00:16:30 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
00:16:58 <teor> So we started meetbot late, I'll replay the open questions and then we'll get to question 1
00:17:14 <teor> The privcount in tor proposal is here: https://gitweb.torproject.org/torspec.git/tree/proposals/280-privcount-in-tor.txt
00:17:26 <teor> It deals with the low-level blinding, noise, and aggregation, specific statistics are for a later proposal
00:17:38 <teor> 1. How do we make sure the protocol survives data collector and tally reporter failures? (recall data collectors are on tor relays, and tally reporters combine the share keeper and tally server roles)
00:17:43 <teor> 2. How do we make sure the protocol survives outlier or broken measurements from relays?
00:18:16 <teor> 3. How do we make sure the added noise is sufficient, particularly as we add and remove statistics? What if we think more noise is safer? What if we want to add different noise? (This might be out of scope)
00:18:24 <teor> 4. (how) do we measure different statistics over different time periods?
00:18:54 <teor> On to question 1
00:19:22 <teor> So the proposed design for redundancy is to have multiple subsets of tally reporters, where each subset of tally reporters handles a subset of data collectors
00:19:34 <armadev> seems like question 1 depends in part on how we're going to pick the tally reporters (and how many)
00:19:59 <teor> For example, we have 9 tally reporters in 3 sets of 3, and each set handles 1/3 of the relays
00:20:07 <ohmygodel> armadev how so ?
00:20:21 <armadev> it leads to different trust assumptions
00:20:25 <nickm> I've seen three ideas here:
00:20:33 <nickm> "do nothing; assume everybody's honest"
00:20:35 <armadev> like, "we'll have the dir auths run them" vs "rob and aaron and teor will run them"
00:20:53 <nickm> "use different instances of the algorithm with different members; hope one works."
00:21:12 <nickm> "as above but use k-of-n secret sharing instead of multiple instances"
00:21:26 <nickm> i haven't seen either of the last two fully worked out
00:21:55 <ohmygodel> secret sharing seems like the best to me to deal with TR failures
00:22:03 <Samdney> +1
00:22:09 <teor> (given the challenge of getting the dir auths to run bandwidth authorities [which to be fair is a code quality issue as well], let's make a design that doesn't require that level of trust)
00:22:30 <nickm> My objection to secret sharing as it stands is "there is no specification and nobody has volunteered to write one"
00:22:38 <nickm> but that's easily resolved :)
00:22:53 <Samdney> really not? oh!
00:23:00 <ohmygodel> it is cheap for both DCs and TRs, and it can survive the failure of any n-k TRs
00:23:43 <nickm> I'd also like to know if secret-sharing can be implemented in a similarly efficient and forward-secure way as the current code uses
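[Editor's note: since the objection above is "there is no specification", here is a minimal sketch in Python of what k-of-n (Shamir) secret sharing looks like. The field modulus and function names are illustrative, not a proposed wire format: any k shares reconstruct the secret, and fewer than k reveal nothing about it.]

    import random

    PRIME = 2**127 - 1  # example field modulus; a real spec would pin this down

    def split_secret(secret, k, n):
        """Return n points of a random degree-(k-1) polynomial f with f(0) = secret."""
        coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
        def f(x):
            return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
        return [(x, f(x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        """Lagrange-interpolate f(0) from any k of the n shares."""
        total = 0
        for xi, yi in shares:
            num, den = 1, 1
            for xj, _ in shares:
                if xj != xi:
                    num = num * -xj % PRIME
                    den = den * (xi - xj) % PRIME
            total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
        return total

    # e.g. 3-of-5: any 3 of the 5 shares recover the secret
    assert reconstruct(split_secret(12345, 3, 5)[:3]) == 12345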
00:23:52 <armadev> just so i'm following correctly, by failure do we mean 'missing' or 'byzantine'?
00:23:59 <teor> Yes
00:23:59 <nickm> byzantine
00:24:01 <ohmygodel> fair enough nickm, one drawback is that implementation is somewhat more complex
00:24:02 <teor> Both
00:24:08 <armadev> ok
00:24:15 <nickm> ohmygodel: a little complexity, I know will be there...
00:24:15 <ohmygodel> armadev: i meant missing
00:24:28 <nickm> ... but adding a bunch of logic to the critical path would be sad
00:24:37 <ohmygodel> byzantine adversary can cause outputs to be garbage
00:24:42 <nickm> so if incrementing a counter gets much slower, that would be bad
00:24:58 <teor> Which is why we split the data collectors into independent subsets
00:25:03 <nickm> and if forward secrecy gets much worse, that would be bad
00:25:22 <nickhopper> but the secret sharing is just for the secrets part, not the counter increments right?
00:25:23 <ohmygodel> incrementing the counter would be the same as before
00:25:39 <nickm> nickhopper: nobody knows, there is no spec
00:25:43 <ohmygodel> forward secrecy at the DCs would be the same
00:26:09 <ohmygodel> yes nickm, this definitely needs to be written up
00:26:33 <teor> So if we were to do k-n secret sharing, we could split the secret, then encrypt, and wipe the original secret
00:26:46 <teor> I think that gives us forward secrecy
00:26:59 <ohmygodel> but to make it a bit clearer the way I envision it, it would work like this:
00:27:08 <ohmygodel> 1. DCs do blinding with TRs as before
00:27:14 <ohmygodel> 2. DCs increment counters as before
00:27:37 <ohmygodel> 3. DCs send k-of-n secret shares of blinded value to TRs
00:27:54 <ohmygodel> 4. TRs add current blinded values to received secret shares
00:28:24 <ohmygodel> 5. TRs reveal secret shares to each other (or some designated party) to allow reconstruction of the secret, which is the desired aggregate value
00:29:04 <ohmygodel> 1.5 The noise gets added into the counter before the increments (sorry, I skipped this between steps 1 and 2)
00:29:48 <ohmygodel> Then as long as k TRs are online and reveal their shares, the secret can be reconstructed
00:30:01 <ohmygodel> And as long as no more than k-1 TRs collude, no private inputs can be learned
00:30:30 <teor> I am confused about the meaning of "blinded values" and "secret shares". Which steps produce which of these?
00:31:04 <ohmygodel> by “blinded value” I meant the value stored in the counter, which includes a blinding value, the noise, and any increments
00:31:52 <ohmygodel> “secret shares” are produced by DCs from the blinded value (aka the counter) in step 3 (which is at the end of the measurement period) and sent immediately to the TRs
00:33:39 <teor> So do we need a spec for k-n secret shares, and a spec revision to prop280 that uses them?
00:33:56 <ohmygodel> yeah
00:34:00 <Samdney> lol
00:34:20 <teor> Any volunteers? Otherwise I will just note them down as actions
00:34:26 <armadev> and what they get us is that some talliers can fall out of the picture but we can still recover aggregate values?
00:34:46 <ohmygodel> ok yeah I got the steps mixed up
00:34:47 <Samdney> me, maybe. Have to think about it ;)
00:34:47 <nickm> wait, i am confused.
00:34:54 <robgjansen> ohmygodel: i dont understand your proposal
00:34:55 <ohmygodel> nickhopper was right
00:35:02 <nickm> the blinding value is added to the blinded value.
00:35:05 <ohmygodel> the secret sharing happens at step 1
00:35:19 <nickm> If any single DC is broken, its part of the blinding value won't be recoverable
00:35:40 <Samdney> (that was the answer to "volunteer?")
00:35:41 <ohmygodel> *of* the blinding value (only one is produced, not pairwise as before)
00:35:43 <robgjansen> if the blinding value (i.e., the random value added to the counter to make it appear random upon inspection) is not secret shared, and some TRs holding those values go offline, how can we reconstruct them?
00:36:07 <ohmygodel> right yes robgjansen
00:36:25 <robgjansen> sure we can reconstruct the final blinded value...
00:36:29 <teor> Ok, so we need:
00:36:41 <robgjansen> but we also need to reconstruct the blinding value in order to remove it
00:36:43 <ohmygodel> ok let me try again
00:37:11 <ohmygodel> 1. Each DC chooses a random blinding value, sends secret shares to the TRs, and adds the blinding value into the counter
00:37:14 <nickm> ohmygodel: maybe try in a specification ?
00:37:17 <nickm> :D
00:37:25 <ohmygodel> 2. The DC increments the counter as before
00:37:40 <teor> 1.5. The DC adds noise
00:37:44 <ohmygodel> 3. The DC adds in noise to the counter
00:38:18 <robgjansen> ohmygodel: ahh, i missed in step 1 that the blinding values are also secret shared
00:38:37 <robgjansen> seems ok to me then
00:38:37 <ohmygodel> 4. At the end of measurement, the DCs broadcast their counters / send them to a Tally Server / send them to the TRs / whatever
00:39:08 <ohmygodel> 5. The TRs add their shares (actually just those shares from DCs that didn’t fail before broadcasting their counters)
00:39:20 <ohmygodel> 6. The TRs broadcast their secret shares to reconstruct the secret
00:39:26 <teor> Ok, so for forward secrecy, it's best that the noise is added before any increments (1.5, not 3.)
00:39:38 <Samdney> +1
00:39:38 <ohmygodel> 7. The secret (aka the sum of the blinding values) and the broadcast counters get added to yield the aggregate
00:40:15 <teor> And for state management, it's best that 1. becomes "encrypt secret shares to the TRs"
00:40:25 <teor> And then all the data is sent in one hit at the end.
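[Editor's note: a hedged sketch of the corrected flow above, reusing split_secret/reconstruct from the earlier sketch. Class and function names are invented for illustration; real DCs would encrypt each share to its TR and send everything at the end, as teor says, and the noise distribution is a placeholder.]

    import random

    def noise_sample():
        return 0  # placeholder; the real value is drawn from the DP noise distribution

    class DataCollector:
        def __init__(self, n_trs, k):
            blinding = random.randrange(PRIME)
            # step 1: secret-share the blinding value for the TRs
            # (in practice: encrypt each share to its TR, send in one hit at the end)
            self.shares_for_trs = split_secret(blinding, k, n_trs)
            # step 1.5: add noise before any increments, for forward secrecy
            self.counter = (blinding + noise_sample()) % PRIME
            del blinding  # wipe the original secret

        def increment(self, amount=1):
            # step 2: counting is a plain add, no slower than before
            self.counter = (self.counter + amount) % PRIME

    def aggregate(counters, tr_share_sums):
        # steps 4-6: surviving DCs publish counters; each TR publishes the sum of
        # the shares it holds (pointwise sums are shares of the summed blinding)
        blinded_total = sum(counters) % PRIME
        total_blinding = reconstruct(tr_share_sums)  # any k TRs suffice
        # step 7: remove the summed blinding to get the noisy aggregate
        return (blinded_total - total_blinding) % PRIME

    # usage: 2 DCs, 3 TRs, threshold 2; one TR may vanish without losing the tally
    k, n = 2, 3
    dcs = [DataCollector(n, k) for _ in range(2)]
    for dc in dcs:
        dc.increment(5)
    tr_points = [(x, sum(dc.shares_for_trs[i][1] for dc in dcs) % PRIME)
                 for i, (x, _) in enumerate(dcs[0].shares_for_trs)]
    assert aggregate([dc.counter for dc in dcs], tr_points[:k]) == 10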
00:40:29 <teor> Let's do this in a spec
00:40:36 <robgjansen> #agreed
00:40:39 <ohmygodel> yes teor that seems right
00:40:44 <Samdney> oh yes please
00:41:01 <robgjansen> (sorry, i don't know how to use meetbot)
00:41:03 <teor> #action write a k-of-n secret sharing spec
00:41:25 <teor> #action revise prop280 to use k-of-n secret sharing
00:41:28 <teor> (I hope that works)
00:41:49 <ohmygodel> so that sketch also includes my suggestion to deal with DC failures - just have the TRs use only the shares from DCs that successfully submitted their stuff at the end of measurement
00:42:09 <nickm> how do we handle DCs being deliberately junky?
00:42:17 <armadev> what if there's disagreement about which DCs successfully submitted their stuff?
00:42:19 <teor> Let's move onto the next question, because we have 20 minutes left
00:42:29 <teor> 2. How do we make sure the protocol survives outlier or broken measurements from relays?
00:43:06 <Samdney> this question depends on the ratio of broken measurements to all measurements, I think
00:43:31 <ohmygodel> ok so for this question, the subset idea seems like a fine one to me
00:43:40 <teor> The current proposal is to split the DCs into multiple independent subsets, calculate an aggregate for each subset, and then take the median (or whatever)
00:43:51 <nickhopper> defining "broken" as byzantine, yes?
00:44:06 <ohmygodel> here broken is byzantine, yes
00:44:27 <teor> If we make the subsets depend on a shared random value released *after* results are submitted, then relays can't game their subsets
00:45:15 <nickm> teor: to be fair, there is no spec for doing this part either.  The current proposal assumes that the subsets have been constructed and that's that
00:45:18 <teor> This also handles a small amount of disagreement about which DCs submitted, for example, if a DC crashes during results upload
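[Editor's note: a sketch of what post-submission subset assignment could look like, assuming a shared random value (SRV) that is only published after all counters are in. The hashing and median step are illustrative, and this deliberately ignores the blinding layer; it only shows why relays cannot predict or game their subset.]

    import hashlib
    import statistics

    def subset_index(srv: bytes, relay_fpr: bytes, num_subsets: int) -> int:
        # unpredictable until the SRV is revealed, deterministic afterwards
        digest = hashlib.sha256(srv + relay_fpr).digest()
        return int.from_bytes(digest, "big") % num_subsets

    def robust_aggregate(srv: bytes, submissions: dict, num_subsets: int):
        """submissions maps relay fingerprint -> that relay's (unblinded, noisy)
        contribution; relays that never submitted are simply absent."""
        totals = [0] * num_subsets
        for fpr, value in submissions.items():
            totals[subset_index(srv, fpr, num_subsets)] += value
        # each subset covers a comparable slice of relays, so the median
        # resists a small number of subsets with broken or outlier inputs
        return statistics.median(totals)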
00:45:26 <armadev> fun math. (notice that to game median, you only need to get a liar into half of the subsets)
00:45:35 <teor> Yes, that's true
00:45:59 <teor> But you have to get the right number of liars in each subset
00:46:07 <Samdney> and of course the size of each subset ..
00:46:08 <nickhopper> 1 = right number
00:46:30 <nickhopper> if all you are worried about is disrupting the result
00:46:40 <ohmygodel> yeah this is a bit of a sad hack to deal with the lack of robustness against bad DC inputs in the protocol
00:47:35 <teor> #action update the proposal to deal with post-submission shared-random-based relay subset selection
00:47:51 <teor> ^ in reply to nickm
00:48:16 <nickm> thanks
00:48:36 <ohmygodel> I really don’t think it can handle a strategic adversary
00:48:54 <teor> No, but neither can the current statistics, tbh
00:49:15 <ohmygodel> because in order to have good statistics you want reasonably large subsets, which means an adversary is likely to be in it
00:49:41 <armadev> makes sense. alas
00:50:03 <teor> We have about 10 minutes left, so let's leave that for future research?
00:50:04 <ohmygodel> we can’t just have a huge number of subsets, because in the limit that is just releasing per-relay statistics, which is what Tor does now
00:50:19 <ohmygodel> teor: I do want to mention something about this
00:50:32 <ohmygodel> You must account for the number of subset outputs that are being produced when generating noise
00:50:45 <armadev> right, more subsets means more noise
00:50:48 <ohmygodel> k subsets = k times the noise per subset to get the same privacy guarantee
00:51:13 <teor> #action increase the noise added in the spec for each subset of relays that produces a result
00:51:18 <ohmygodel> and that’s the real reason to limit the number of subsets
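[Editor's note: the arithmetic behind "k subsets = k times the noise", assuming Laplace noise for concreteness (the actual distribution is up to the spec). Splitting one privacy budget epsilon across k independent per-subset releases gives each release epsilon/k, and the Laplace noise scale is inversely proportional to epsilon.]

    import math

    def laplace_scale(sensitivity: float, epsilon: float) -> float:
        return sensitivity / epsilon

    k, total_eps = 3, 0.3
    per_subset_scale = laplace_scale(1.0, total_eps / k)
    single_release = laplace_scale(1.0, total_eps)
    assert math.isclose(per_subset_scale, k * single_release)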
00:51:40 <teor> 3. How do we make sure the added noise is sufficient, particularly as we add and remove statistics? What if we think more noise is safer? What if we want to add different noise? (This might be out of scope)
00:52:16 <nickm> well, it's essential if we want to deploy
00:52:32 <teor> Do we have a basic idea of how version upgrades will work?
00:52:58 <nickm> between sets of statistics?
00:53:17 <teor> Yes
00:53:22 <armadev> straw person #1: we treat all relays doing the wrong version as bad, and discard their votes
00:53:29 <teor> The current proposal says that TRs can add zeroes for missing counters, and then notes that they will need to add noise as well
00:54:13 <teor> But we also need minimum thresholds for activating a new statistic (and removing an old one)
00:54:36 <Samdney> in physics you would choose armadev's version ;) (sorry I'm physician)
00:54:46 <ohmygodel> teor that seems like a good approach to me: one stats regime at a time, switchover when enough have upgraded
00:54:58 <teor> For example: when a new counter is supported by 10% of relays, report it. When an old counter is supported by < 5% of relays, remove it.
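[Editor's note: the thresholds above as a tiny hysteresis rule; the 10%/5% numbers come straight from teor's line, while the function name and the consensus plumbing that would feed it are hypothetical.]

    ACTIVATE_AT = 0.10  # start reporting once >= 10% of relays support the counter
    RETIRE_AT = 0.05    # stop once support drops below 5%

    def counter_active(support_fraction: float, currently_active: bool) -> bool:
        if currently_active:
            return support_fraction >= RETIRE_AT
        return support_fraction >= ACTIVATE_AT

The gap between the two thresholds keeps a counter from flapping on and off when relay support hovers near a single cutoff.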
00:55:00 <nickm> Samdney: "physicist"; physician is different ;)
00:55:05 <teor> Or we could say "set of statistics"
00:55:31 <Samdney> (oh! thank you nickm, my english!)
00:55:33 <nickm> teor: in practice we could have that be in the consensus, and let the authorities decide
00:55:36 <teor> I think it's less complex and less risky to switch an entire statistics set
00:55:46 <nickm> Samdney: your english is still better than my anything-else  :)
00:56:02 <nickm> teor: but let's think though
00:56:15 <armadev> so everybody adds enough noise as if 100% of relays are reporting, even when only 10% of them are, and the rest are filled in as 0's?
00:56:20 <teor> But it makes for slower upgrades
00:56:32 <ohmygodel> armadev: the noise is independent of the number of relays
00:56:33 <nickm> this would mean that if we were on statistics set X, we would never learn statistics from routers that did not support set X.
00:57:02 <nickm> But if such routers had a different value for some counters within X, we would not see their values
00:57:03 <teor> Indeed. Which is very sad.
00:57:16 <nickm> if we were looking for signs of an attack, a clever attacker would just attack the old routers
00:57:33 <nickm> so, here's a countersuggestion:
00:57:36 <armadev> or heck, add old routers to move us back to statistics set X-2
00:57:45 <nickm> let there be multiple named sets of statistics.
00:57:53 <nickm> each set can be turned on or off independently in the consensus
00:58:25 <teor> So the problem with this is that the noise distribution is a function of the entire set of statistics being collected
00:59:48 <armadev> that is tied into my question 4
00:59:51 <teor> So it's not safe in the general case to combine new stats with old stats
01:00:00 <teor> Or quick stats with slow stats
01:00:27 <armadev> unless we run the whole apparatus in parallel, one for each type of stat
01:00:34 <Samdney> what exactly are "quick" or "slow" stats?
01:00:39 <robgjansen> just gonna write that
01:00:40 <armadev> and make sure none of our stats are dependent on each other
01:00:43 <ohmygodel> you could divide the “privacy budget” (i.e. the noise allocation) evenly among the sets of statistics that are available at a given time
01:01:14 <teor> Samdney: stats collected over different periods
01:01:15 <armadev> samdney: quick ones would be one where the numbers each relay publishes have to do with a small period, and slow ones would be for large periods
01:01:24 <ohmygodel> but you need that number to stay constant
01:02:04 <nickm> Is there a formula that actually works as sets of statistics evolve?
01:02:34 <ohmygodel> nickm: the way we handled that was that our privacy definition only covered a given period of time
01:03:19 <teor> Ok, so we need to do something about continuous collection?
01:03:20 <ohmygodel> that is, we hide some amount of “activity” (i.e. making a circuit, sending bytes) within some period of time (e.g. 24 hours)
01:04:25 <ohmygodel> so reasonable activity within k hours should not be discernible from the statistics
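[Editor's note: for reference, the guarantee being described has the usual (epsilon, delta)-differential-privacy shape; the notation below is illustrative rather than quoted from the PrivCount paper:]

    \Pr[S(A) \in T] \le e^{\varepsilon} \Pr[S(A') \in T] + \delta

for all output sets T and all pairs of activity histories A, A' that differ by at most one user's bounded activity within the k-hour window.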
01:04:31 <teor> ohmygodel: re: evolving stats: can you fix the sigma values for the old statistics, and then put all the new privacy budget on the new statistics?
01:04:54 <nickm> Is this a calculation that is easy to automate?
01:05:22 <ohmygodel> teor: yes you can, although changing the privacy budget allocation required a delay period between the two collections
01:05:55 <armadev> (i guess you could implement the delay by the talliers deciding not to tally)
01:06:03 <ohmygodel> the reason for that was your k hours of activity might span the two collection periods running under different budget allocations, which could violate the privacy guarantee
01:06:19 <armadev> lots of moving parts here
01:06:21 <teor> if we have versions in the consensus, we can have a version "off"
01:06:56 <nickm> but if we regularly turn some statistics off, that means we can't use statistics for ongoing incident detection so well
01:07:03 <ohmygodel> nickm: I think the answer is yes. For example, everything is basically automated in PrivCount now, except choosing exactly which statistics you want to collect (which requires a human to decide).
01:07:23 <nickm> do those statistics require any annotations?
01:07:55 <ohmygodel> They need a “sensitivity” (the max amount by which the limited amount of user activity can change them)
01:08:14 <nickm> how do we derive that value?
01:08:31 <teor> ohmygodel: and an expected value
01:09:01 <teor> nickm: existing statistics about user activity
01:09:08 <ohmygodel> teor: right, for accuracy, some guess about the likely value will help optimize the noise allocation
01:09:43 <nickm> sounds like there's a bootstrapping issue there...?
01:09:52 <teor> nickm: or estimates, or the amount of activity we *want* to protect
01:09:56 <nickm> what are the risks if we just make a wild-assed guess?
01:10:12 <ohmygodel> nickm: In general, you have to reason about it, but often there are just a few sensitivities shared across many types of statistics (that differ in ways irrelevant to the sensitivity)
01:10:40 <teor> Either: exposing as much information as relays currently do, or a signal that's swamped by the noise
01:11:21 <teor> Oh, but you get an aggregate, so the "too little noise" case is still better than tor's current stats
01:11:27 <ohmygodel> nickm: you might have a very noisy (aka inaccurate) answer, which you will likely recognize because you know the noise distribution
01:11:28 <nickm> yeah
01:11:52 <ohmygodel> teor: there is no privacy issue by choosing the expected value incorrectly
01:12:01 <robgjansen> without good ideas of expected values for noise "optimization", you risk some counters having too much noise
01:12:17 <teor> what about the sensitivity?
01:12:22 <robgjansen> but after collection, you can always compute the fraction of the result that noise accounts for
01:12:42 <ohmygodel> sensitivity must be right or the differential privacy guarantee may be violated
01:12:47 <robgjansen> if it's too high, use your updated estimate in the next round
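[Editor's note: a sketch of the two per-counter annotations discussed here, capturing the asymmetry in the exchange above: a wrong sensitivity silently breaks the differential-privacy guarantee, while a wrong expected value only costs accuracy, and the post-collection noise fraction tells you to re-tune. The formula and names are illustrative (Laplace scale assumed), not PrivCount's actual allocation.]

    def sigma_for_counter(sensitivity: float, epsilon: float) -> float:
        # sensitivity: max change in the counter from one user's bounded activity;
        # this must be right, or the DP guarantee may be violated
        return sensitivity / epsilon  # Laplace scale, assumed for simplicity

    def noise_fraction(result: float, sigma: float) -> float:
        """After collection: roughly what fraction of the published value is
        noise? If it is too high, raise this counter's expected value (and so
        its share of the noise budget) for the next round."""
        return 1.0 if result == 0 else min(1.0, sigma / abs(result))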
01:13:34 <teor> #action specify how to estimate sensitivity and expected values for each counter, and how to turn that into a set of sigmas
01:14:35 <teor> #action specify how to safely change the set of counters that is collected (or the noise on those counters) as new tor versions that support new counters are added to the network (and old versions leave)
01:15:10 <teor> Is that a good summary?
01:15:27 <teor> Do we have time to move on to question 4, or do we want to leave that for later?
01:15:28 <teor> 4. (how) do we measure different statistics over different time periods?
01:15:37 <ohmygodel> I’m fine discussing it.
01:15:50 <armadev> it seems related to the discussion we had on 3
01:15:54 <nickm> (for the record, I think having only one set of statistics at a time will be trouble.)
01:15:59 <nickm> (from a deployability POV)
01:16:04 <armadev> agreed
01:16:12 <Samdney> +1
01:16:19 <armadev> i also worry about cases where we have a whole lot of numbers, like the per-country counts.
01:16:39 <teor> I agree. I'd like there to be a way to safely add and remove individual counters as needed.
01:16:51 <teor> armadev: it might turn out that those counters were never safe
01:17:02 <nickm> and have old relays still report counters.
01:17:06 <armadev> (oh, and for the per country thing, which geoip file each relay is using fits into it. ugh.)
01:18:12 <ohmygodel> nickm: An easy way would just be to treat each set of statistics as independent. That is what Tor does currently. We tried to do better by considering how user activity can affect all statistics being collected, but maybe incremental progress is better.
01:20:01 <ohmygodel> armadev: I agree that it isn’t clear how private and accurate the entire collection of Tor statistics could be if it was all ported to using differential privacy.
01:20:02 <armadev> we don't want it on today's list of questions, but we might also want to pick a policy where relays only collect things that we'd be ok having published per relay, if stuff breaks
01:20:02 <nickm> so the privcount design assumes some kind of worst-case about how user activity is exposed by non-independent statistics?
01:21:02 <ohmygodel> armadev: I would further add that some individual statistics are unlikely to be collectable with reasonable accuracy and reasonable DP privacy (e.g. countries with few users, as Karsten discovered).
01:21:50 <teor> nickm: yes
01:22:18 <nickm> I wonder how close we are to the worst-case here
01:22:20 <ohmygodel> nickm: yes, it considers how much user activity (given that it is within some limits) can affect each statistic and then takes the worst-case view that possibly *all* stats could be simultaneously affected by those amounts
01:22:59 <armadev> does that mean that separating the statistics, i.e. running several in parallel and assuming they're not correlated, can really reduce the amount of noise that the system feels it needs to add?
01:24:07 <teor> Well, it reduces the differential privacy guarantee
01:24:16 <ohmygodel> yes, treating different sets independently can reduce the amount of noise it feels it needs to add to each one
01:24:33 <armadev> but we'd best be right about the independence
01:24:47 <teor> Well, it would be no worse than the current state
01:25:00 <armadev> except if we decide to collect something new, which we weren't comfortable collecting before
01:25:01 <ohmygodel> but yes, it means the DP guarantee only applies to each set and not them simultaneously (although DP composes, and so it doesn’t just explode, it degrades)
01:25:28 <teor> We could probably deal with that as a first cut
01:25:37 <teor> Particularly if the number of sets was small
01:26:15 <teor> For example, we have 5 different tor versions with significant presence (> 10%) on the network: https://metrics.torproject.org/versions.html
01:26:53 <teor> If we added 1-2 sets of statistics per version, then we'd be looking at ~8 sets of simultaneous statistics
01:27:01 <ohmygodel> armadev: It seems to me like most statistics will not be independent, and so maybe it is better just to call this accepting a lower privacy guarantee than assuming they are independent.
01:27:28 <teor> although from a design perspective, it would still make sense to group related statistics together
01:27:34 <nickm> yeah. I wonder if we could define sets logically rather than per-version
01:27:40 <armadev> ohmygodel: yeah. even for things that seem quite different, like "user counts" and "bandwidth use", they won't be independent
01:28:46 <teor> nickm: for example, we might have a "version 3 onion service" set in 0.3.3, and then an "extra version 3 onion service set" in 0.3.4
01:28:55 <armadev> defining sets logically is better because the noise will be better suited for them too
01:29:07 <nickm> and a "basic bandwidth usage" set that's very stable over time
01:29:12 <ohmygodel> nickm: could we limit the number of simultaneous sets active at a given time ?
01:29:32 <nickm> ohmygodel: programmatically or via good sense?
01:29:33 <teor> It would seem that the consensus and the protocols would be the way to do this
01:29:58 <teor> There's no point in collecting sets supported by very few relays
01:29:59 <ohmygodel> programmatically using good sense :-) ?
01:30:37 <ohmygodel> by that I mean, could we say “Tor will allow no more than 5 sets of statistics to be reported at a given time” ?
01:30:46 <teor> Or, maybe it's better to say "there's no point in collecting sets only supported by relay versions we don't support"
01:30:57 <ohmygodel> and then Tor (via the DCs and/or TRs) would enforce that?
01:31:42 <teor> Yes, it's possible. We could use the existing protocol version infrastructure for that
01:32:36 <ohmygodel> because then we could just build that into the privacy budget: 1/5 of it for each possible set
01:33:39 <robgjansen> the total number of sets we want may increase over time though
01:33:52 <ohmygodel> changing that budget dynamically could also be done, but it should have some limit so that an adversary can’t destroy all stats by running all Tor versions and making the per-version budget too low
01:34:16 <teor> why don't we estimate how many sets we think we'll have, build it into the privacy budget, and then *if* we go over, add new sets using the new budget?
01:34:35 <nickm> or maybe have the counter/budget/something values in the consensus?
01:34:46 <teor> then we degrade slightly if we go over, but it's less complex, and less dynamic
01:35:01 <robgjansen> do we want to build in some kind of mechanism to allow for a "reset" of the sets and privacy budget?
01:35:15 <teor> I don't understand
01:35:36 <Samdney> a "default button"?
01:35:53 <robgjansen> you guess ahead of time that you want 8 sets and your privacy budget is 200...
01:36:03 <robgjansen> after 1 year you realize you were way off
01:36:20 <robgjansen> and you actually have 30 sets and need a budget of 2000
01:36:28 <teor> Ok, so this is what a consensus parameter would be useful for
01:37:00 <teor> #action specify the privacy budget parameter that we need to turn into consensus parameters
01:37:57 <armadev> consensus params make me nervous because they are potentially new every hour, and relays don't necessarily have history of them
01:38:17 <nickm> if not consensus params, some similar mechanism?
01:38:27 <armadev> yep. something. another moving part. :)
01:38:49 <armadev> like the shared-random-value that the subset-creation module needs. hoof.
01:39:28 <nickm> armadev: https://www.youtube.com/watch?v=jy5vjfOLTaA
01:39:30 <ohmygodel> I’d just like to mention again that while changing the budgets and allocations over time can be done, it requires some mechanism to make the guarantee apply. Options include (1) enforce a delay between measurement periods (what we do now), (2) reduce accuracy temporarily, (3) change the privacy guarantee to apply to activity within a specific time period (and not activity over any time period of a given length).
01:39:46 <teor> armadev: we already have a suitable shared random value
01:41:05 <armadev> do we? does it go public at the time we need?
01:41:21 <armadev> i mean, i agree we have one. i'm not yet convinced it's suitable.
01:41:54 <armadev> (we should probably do what it takes to make it suitable. but that might involve constraints that make us sad.)
01:42:26 <nickm> We might need to have some that last a long time, or have some way to get old ones, or such
01:42:27 <robgjansen> ohmygodel: even though privcount does (1) now, i think that's not the best for a continuous deployment
01:42:46 <ohmygodel> armadev: The TRs could do subset selection among themselves as well at the end of the collection period
01:43:02 <teor> armadev: it might mean waiting for 12 hours for stats, because it's secure as long as we don't know the final set of reveals, which can be revealed 12 hours before the SRV
01:43:20 <ohmygodel> robgjansen: yeah, I actually think (2) might work better.
01:43:40 <teor> ohmygodel is right, there are protocols that let you select subsets as long as one party is trusted
01:43:42 <ohmygodel> that way you don’t lose statistics, some just get a bit blurrier for a bit
01:43:53 <nickm> I'm afraid I need to sign off soon
01:43:57 <armadev> big fan of (2)
01:44:23 <armadev> so, we have a bunch of #action lines, and no names attached to them. what could go wrong? :)
01:44:25 <robgjansen> agreed about (2); (3) leaves room open for privacy attacks
01:44:41 <teor> #action specify how to maintain privacy guarantees when the set of statistics changes, probably by reducing accuracy
01:44:59 <ohmygodel> robgjansen: indeed, woe to those who cross the International Privacy Line
01:45:15 <robgjansen> :)
01:46:17 <teor> Ok, any more to add before I turn off the meet bot in about 1 minute's time?
01:46:32 <ohmygodel> Nothing from me
01:46:40 <teor> Thank you everyone for helping us make tor's statistics better
01:46:50 <armadev> i was wondering about a 'question 6', something like 'is there sufficient evidence that this whole thing is worth the trouble'
01:46:51 <nickm> yaay.
01:46:57 <Samdney> :)
01:47:05 <armadev> (i hope the answer is yes, but we would be wise to collect enough of it first probably)
01:47:07 <nickm> well, it depends how complicated we make it
01:47:36 <Samdney> I have the feeling it will become very complicated ;)
01:47:36 <nickm> i think in the long term it makes our existing statistics more private, and makes collecting other statistics safely possible.
01:47:37 <armadev> i do agree that getting rid of our current stats, which are probably harmful in surprising ways, would be good if we can get 'most' of it still in this new way
01:47:53 <nickm> but the complications are all tbd right now IMO
01:48:12 <nickm> I suspect that we might see the next version of the spec and look for places to simplify
01:48:27 <nickm> the in-tor implementation for the current prop280 is dead simple, fwiw
01:48:44 <ohmygodel> armadev: There are attacks right now using Tor’s current stats (HS guard discovery).
01:49:30 <armadev> yep
01:49:46 <robgjansen> privcount is better than current methods
01:49:55 <armadev> i assume that when we go to shift things over to privcount, we will... decide that none of them can be collected? lots of questions remain :)
01:50:13 <robgjansen> do we want to keep it simple and make progress, or try to wait until we have the perfect solution
01:50:40 <Samdney> can we have both? ;)
01:50:40 <ohmygodel> Also, I believe that most of your statistics could be gathered using this system with similar utility and no worse privacy (actually better because the privacy methodology isn’t ad hoc).
01:50:43 <armadev> like, say, the per relay bandwidth stats
01:50:56 <armadev> i imagine when we go to do that in privcount, we will do away with per relay bandwidth stats
01:51:10 <nickm> robgjansen: not wait for perfection, that's for sure
01:51:24 <ohmygodel> armadev: yes, an exception is per-relay statistics that you actually want per relay
01:51:26 <robgjansen> IMO, make simple progress now and don't try to desgin the perfect end-all solution; we can update to better solutions later
01:51:26 <nickm> (note for the metrics team: we will not strand you suddenly)
01:51:38 <robgjansen> where better solutions != privcount
01:51:56 <nickm> well, i'm not clear whether this k-of-n thing is still privcount or not :)
01:52:02 <nickm> but that's up to you
01:52:14 <nickm> ok, i need to sign off. i'm getting silly
01:52:21 <nickm> good night everyone! or good morning as the case may be!
01:52:26 <ohmygodel> good night !
01:52:30 <Samdney> night
01:52:31 <robgjansen> goodnight!
01:52:34 <armadev> thanks all
01:52:42 <ohmygodel> so is there a plan here ?
01:52:48 <armadev> teor: are you going to commit tweaks to prop280? i made some too while reading it (grammar etc)
01:53:09 <armadev> i assumed that the plan was that teor had a plan, since he's been saying #action without trying to attach names to things :)
01:53:28 <teor> armadev: #23492
01:53:34 <armadev> great
01:53:40 <teor> You can add to my branch
01:54:01 <teor> And re: the plan, I don't know how to split up the workload
01:54:29 <ohmygodel> teor: yeah, suddenly none of us is working on transitioning PrivCount :-/
01:55:13 <ohmygodel> Sorry, that smiley looks more malevolent than I expected
01:55:27 <teor> I think we will work it out over the next few weeks, or at the dev meeting. Network team is focused on 0.3.1 and 0.3.2 right now.
01:55:53 <teor> Anyway, I think that's a good point to end the meetbot
01:55:56 <teor> #endmeeting