#ooni log

16:00:42 <hellais> #startmeeting OONI gathering 2017-01-09
16:00:42 <MeetBot> Meeting started Mon Jan  9 16:00:42 2017 UTC.  The chair is hellais. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:42 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:01:14 <hellais> Hello there!
16:01:25 <hellais> Add things you would like to discuss to the pad: https://pad.riseup.net/p/ooni-irc-pad
16:01:53 <darkk> Yo!
16:04:20 <hellais> I guess I have one thing I would like to share with you today
16:04:42 <agrabeli_> hello
16:05:04 <hellais> I have been working on the specification for the OONI Probe Orchestration System (OPOS) and have made a bit of progress on it: https://github.com/TheTorProject/ooni-spec/blob/ac60b1ed237a7179e627f7c276cd42eb6febe5da/OONI-Probe-Orchestration-System-Design.md
16:05:38 <hellais> For the moment I ended up leaving out the definition of the transport protocol to be used (websockets, HTTP long polling, ZeroMQ or whatever) and focusing just on the messaging layer
16:06:06 <hellais> I believe that with the 3 actions specified in there and those types of triggering methods we should be able to cover quite a lot of things
16:06:59 <hellais> In doing this I also realised that if we have some sort of way of knowing which probes are "alive" and support some way for them to report back how long they were offline for we could potentially also support detecting internet shutdown type things, but I fear adding this at this stage to the spec would lead to scope creep
16:07:55 <hellais> anyways this is what I had, if you have questions or comments we can discuss it here or in https://github.com/TheTorProject/ooni-spec/issues/80
16:08:32 <darkk> hellais: IMHO, it's not OPOS. That's just recorded failures of measurements that will be submitted when (if?) the connectivity is restored
16:10:34 <darkk> tracking probes at OPOS level is as good as giving each probe a unique ID, and we were against doing that last time it was discussed. I'm still unsure if the lack of probe IDs is the right decision, but I'm not a privacy advocate (just a reminder) :)
16:10:37 <hellais> darkk: yeah I also suspect that it's probably something else. To ensure that a shutdown has happened we also probably want to do specific measurements that are not just "I can't reach the OPOS Event feed", but more of the sort "I can't reach these 10 very popular sites + traceroutes towards these 5 destinations stop 3 hops away"
16:11:47 <darkk> yep, that's autonomous auto-start of in-depth measurements on detected anomalies :)
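A minimal sketch of the shutdown heuristic hellais describes above, assuming each probe records whether each popular site was reachable and how many hops a traceroute towards each destination got before stopping. The thresholds (10 sites, 5 destinations, 3 hops) are taken from the example in the discussion; the `ProbeObservation` type and the `looks_like_shutdown` function are hypothetical and not part of ooni-probe or the OPOS spec.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ProbeObservation:
    """Hypothetical record of one probe's connectivity checks."""
    site_reachable: Sequence[bool]  # one entry per popular site tested
    traceroute_hops: Sequence[int]  # last hop reached towards each destination


def looks_like_shutdown(
    obs: ProbeObservation,
    min_sites: int = 10,
    min_destinations: int = 5,
    max_hops: int = 3,
) -> bool:
    """Flag a *possible* shutdown: no popular site is reachable AND every
    traceroute dies close to the probe. Too little data is never evidence."""
    if len(obs.site_reachable) < min_sites:
        return False
    if len(obs.traceroute_hops) < min_destinations:
        return False
    no_site_reachable = not any(obs.site_reachable)
    paths_stop_early = all(hops <= max_hops for hops in obs.traceroute_hops)
    return no_site_reachable and paths_stop_early
```

Requiring both signals is what separates "this probe's network looks cut off" from "a single service is down". As darkk notes, such results could only be submitted once connectivity is restored.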
16:11:50 <hellais> darkk: well we actually haven't ruled out that possibility yet entirely especially if it's an opt-in feature. I can see there being a huge benefit in being able to trigger measurements on specific probes especially if they are probes we control.
16:12:10 <darkk> https://github.com/TheTorProject/ooni-probe/issues/647
16:16:04 <hellais> yes being more reactive and adjusting future measurements (or triggering measurements based on past measurements) would be a great thing to have
16:16:34 * sbs is here!
16:16:40 <hellais> but yeah this is out of scope for OPOS
16:16:52 <anadahz> hellais: is OPOS (or whatever it's called) implementable for ooni-mobile probes?
16:18:45 <nuke> Hi!
16:18:50 <hellais> it's implementable, but the target platform for this first iteration is not mobile
16:19:02 <darkk> IMHO, that's the whole point of `leaving out the definition of the transport protocol`, as the mobile probe will likely have a different method for push notification delivery
16:20:15 <hellais> darkk: yes exactly. I suspect the choice of the pub-sub technology for mobile will not be the same used for desktop.
16:25:21 <darkk> (4 minutes of silence) what about the next topic?
16:26:29 <hellais> yup
16:26:31 <hellais> go for it
16:27:28 <darkk> hellais: long story short -- are you OK with non-strict chronological ordering of the numerical _values_ of measurement identifiers? It makes things a lot easier as it allows trivial parallel processing.
16:28:55 <hellais> can we ensure that there is at least partial ordering?
16:29:47 <darkk> IMHO (3) from https://github.com/TheTorProject/ooni-pipeline/issues/48 is easier to achieve with filtering API that queries metadata DB, but last time we discussed that you had an opinion that the numerical value itself is valuable.
16:30:36 <hellais> The reason why I wanted to have chronological ordering on the ID is that it would allow people that are consuming the measurements to say: "Give me measurements since ID XXX" and we would be able to return all the measurements that are more recent than that by just doing an ordering on that
16:31:09 <darkk> Do you really mean [-recent-]{+unconsumed+} ?
16:31:11 <hellais> I guess we can in the end have something like that by just indexing the timestamps of the measurements and then retrieving the ordering from that
16:32:10 <hellais> darkk: I am unfamiliar with the [-xxx-]{+yyy+} notation
16:32:30 <darkk> sorry, that's diff :)
16:32:53 <darkk> I mean: do you have in mind an implementable `tail -f`, or do you literally mean _more recent_?
16:33:00 <darkk> Imagine that you have several boxes with timestamps diverging a bit and a constant incoming flow of measurements.
16:33:18 <darkk> Being strict about time ordering & ID ordering becomes hard in this case.
16:34:49 <hellais> What I was thinking is that if we publish the measurements only every, say, 30 minutes, and we can ensure that each 30-minute batch has IDs that are greater than those of the previous 30-minute batch and smaller than those of the next one, it should be ok
16:36:13 <darkk> So I suggest that 1) measurements with higher IDs should _LIKELY_ be more recent 2) there may be gaps in measurement IDs 3) one may query both based on IDs (to do `tail -f` or some other partial sync WRT the data) and on timestamps
16:37:22 <hellais> 2) is for sure not a requirement. I don't care if the measurement ID space is not compact
16:38:47 <darkk> It seems we have the same understanding. I brought that up to be sure that _strict_ alignment in ordering of IDs and timestamps is not a requirement (I've never understood the reason to have that and, it seems, there is no such reason)
16:40:45 <hellais> yes, I will anyways update #48 (3) to better explain what I mean by ordering.
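A minimal sketch of the ordering agreed on above, assuming measurements are published in whole 30-minute batches and that the publishing step (not each box's clock) assigns the batch number, which sidesteps the clock divergence darkk mentions. The batch number sits in the high bits of the ID, so every ID from a later batch sorts above every ID from an earlier one. Inside a batch, each parallel worker uses its own sub-range, so IDs are only loosely time-ordered and may have gaps. The bit widths and the names `make_id`, `batch_of` and `since` are hypothetical, not part of ooni-pipeline.

```python
from typing import Dict, List, Tuple

WORKER_BITS = 8  # hypothetical: up to 256 parallel workers per batch
SEQ_BITS = 24    # hypothetical: up to 2**24 IDs per worker per batch


def make_id(batch: int, worker: int, seq: int) -> int:
    """Pack (batch, worker, seq) into one integer measurement ID."""
    if batch < 0 or not 0 <= worker < 2 ** WORKER_BITS or not 0 <= seq < 2 ** SEQ_BITS:
        raise ValueError("batch, worker or seq out of range")
    return (batch << (WORKER_BITS + SEQ_BITS)) | (worker << SEQ_BITS) | seq


def batch_of(measurement_id: int) -> int:
    """Recover the publication batch from an ID."""
    return measurement_id >> (WORKER_BITS + SEQ_BITS)


def since(published: Dict[int, str], cursor: int) -> List[Tuple[int, str]]:
    """`tail -f`-style sync: every published measurement with an ID above the
    consumer's cursor, in ID order. Safe as long as batches are published whole."""
    return sorted((mid, m) for mid, m in published.items() if mid > cursor)


if __name__ == "__main__":
    published: Dict[int, str] = {}
    # Batch 0: two workers in parallel. Worker 1's ID sorts above worker 0's
    # second measurement even if that one arrived later (non-strict ordering).
    published[make_id(0, 0, 0)] = "m-a"
    published[make_id(0, 1, 0)] = "m-b"
    published[make_id(0, 0, 1)] = "m-c"
    cursor = max(published)  # the consumer remembers the highest ID it has seen
    # Batch 1 is published later; only it is returned to the consumer.
    published[make_id(1, 0, 0)] = "m-d"
    print(since(published, cursor))  # [(4294967296, 'm-d')]
```

Unused (worker, seq) slots are simply skipped, so the ID space is not compact, which matches hellais's point that gaps are fine. Queries by timestamp would still go through a separate index on the metadata DB.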
16:40:59 <hellais> anything else?
16:42:51 <darkk> EOF
16:44:03 <hellais> do we have more agenda items or things people would like to discuss?
16:45:24 <agrabeli_> as a note: I'll be writing a sort of guide this week on how to investigate internet shutdowns based on third-party data sources
16:46:12 <agrabeli_> OONI has published a few reports over the last months looking at Google traffic and other data sources when examining whether an internet shutdown has occurred in certain places (e.g. Ethiopia)
16:46:37 <agrabeli_> anyhow, it seems that our methodology would be of interest/use to many digital rights orgs out there, such as Access Now
16:47:01 <agrabeli_> the "guide" I have in mind would basically list the sources one can look at and how to examine them for shutdowns
16:47:27 <agrabeli_> ideas/thoughts/suggestions from you on what to add to this resource would be appreciated!
16:47:37 <darkk> agrabeli_: by the way, do you have any knowledge that may help interpreting regular spikes in Egyptian google traffic?
16:48:36 <hellais> darkk: I suspect the spikes in the transparency report traffic are "normal"; you see them in more or less every country, and I think it's due to people using the internet more during peak hours (work hours vs non-work hours)
16:49:12 <darkk> hellais: I mean sharp spikes, not usual daily U-shaped smooth curve
16:49:52 <darkk> agrabeli_: the list we discussed in Berlin, though I bet you have it in your notebook :) -- https://github.com/OpenObservatory/gatherings/blob/master/internal/2016-11-berlin/rnd.darkk.md#random-stuff
16:50:47 <agrabeli_> darkk: hehe yes I have it in my notebook, but my notebooks are quite chaotic so it's great to find this all here, thanks :)
16:51:26 <agrabeli_> darkk: there were claims that Google was blocked in Egypt recently, but there are also sharp spikes in Google traffic data?
16:53:10 <darkk> agrabeli_: regarding the claims -- yep, it was, but for a rather brief period
16:54:04 <agrabeli_> darkk: I'm not sure why there would be sharp spikes in the data, but it sounds like something worth looking into
16:54:43 <darkk> But I mean regular sharp dips https://share.riseup.net/#-d6vm8Mx-pRRBdrtuWuprg that look like some sort of measurement / aggregation / rounding / visualization bug, but I've not seen the same pattern for other countries.
16:54:59 <hellais> darkk: At first glance it doesn't seem like the EG data is particularly more spiky than other countries with a comparable number of internet users: https://www.google.com/transparencyreport/traffic/explorer/?r=AR&l=WEBSEARCH&csd=1482766200000&ced=1483975800000
16:57:34 <darkk> hellais: ah, it makes sense, I was likely looking at countries with higher e-population.
16:59:30 <hellais> do we have more things to discuss?
17:01:39 <darkk> .
17:02:00 <hellais> ok well I guess that's it, folks
17:02:06 <hellais> have a great week!
17:02:11 * darkk always hesitates if sending explicit EOF is good or noise :)
17:02:11 <hellais> #endmeeting