13:59:11 #startmeeting OONI Community Gathering 2018-01-30
13:59:11 Meeting started Tue Jan 30 13:59:11 2018 UTC. The chair is hellais. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:59:11 Useful Commands: #action #agreed #help #info #idea #link #topic.
14:00:07 Hello everyone! Welcome to the first OONI Community Meeting of 2018! :slightly_smiling_face:
14:00:11 \o/
14:00:16 <171> Hello!
14:00:18 Hi @agrabeli and others!
14:00:19 :tada:
14:01:00 #topic Accessing OONI data and using it on demand: OONI-SYNC and data visualization.
14:01:16 As a reminder, please add topics that you would like to discuss as part of this meeting in this pad: https://pad.riseup.net/p/ooni-community-meeting
14:01:36 And feel free to introduce yourselves asynchronously as you join the meeting :slightly_smiling_face:
14:02:21 I raised that topic since OONI has accumulated this data and we may be sitting on vast findings
14:02:47 hello
14:02:47 <171> Hi, Everyone! I am shashiknath. I have recently joined SFLC.in as technologist.
14:02:49 But because it requires extra skills to make sense of the data, then it is not being optimised.
14:02:49 hello!
14:03:25 I work as the Africa program researcher with the Committee to Protect Journalists
14:03:28 The task is not an OONI thing, so I am not placing this on you - you have done impressive work with the collection.
14:04:52 Whoever is responsible for data processing at OONI perhaps can shed more light on how we can move from tests run to findings visualized.
14:06:02 @tafiti one thing that I should point out is that we have made some pretty significant improvements to our data processing pipeline that can't really be fully appreciated due to the current state of OONI Explorer, but that is in the process of being revamped
14:06:20 hi, everyone :slightly_smiling_face:
14:06:24 What I mean by this is that we are now "annotating" reports that contain interesting findings directly as part of a data processing stage
14:06:44 What this means is that with the new OONI Explorer you will be able to more easily filter for reports that contain anomalies
14:07:20 I guess I can share a sneak preview of this (that is heavy work in progress), to make you understand what I mean:
14:07:26 This is music to many: journalists and researchers, especially.
14:07:34 Maybe the other piece of this is, if there are specific "views" of the data, or visualizations that would be worth having for each country rather than specific to a single context, it can be worth tracking bugs to see if we can have those available in the explorer
14:07:41 For example this link: https://ooni-explorer-next.test.ooni.io/search?only=anomalies&probe_cc=IT
14:07:50 Will show you all the `anomalous` measurements for Italy
14:07:57 Or this link: https://ooni-explorer-next.test.ooni.io/search?only=confirmed&probe_cc=IT
14:08:06 Will show you all the "confirmed" blocked sites in Italy
14:08:20 This means sites that show a blockpage in Italy
14:08:46 One aspect where I think we could greatly benefit from community contributions is that of flagging what blockpages look like in every country
14:09:23 So we are thinking of adding some buttons to explorer where people can directly report to us an interesting finding so that we can add the blockpage to our fingerprint database and in the future detect it
14:10:03 Though this is only pertaining to website blocking and also only website blocking when there is a blockpage
14:10:20 I should also point out that the data that OONI collects goes beyond just blocked sites
14:10:31 This is great, @hellais
14:10:34 I think one thing that would be useful for us to understand is basically what @willscott said
14:11:00 I have checked through and it ticks some of the most critical boxes
14:11:14 @hellais - i think https://github.com/TheTorProject/ooni-explorer/issues/70 is the issue tracking your button suggestion?
14:12:14 @willscott yeah I guess so. It's also part of the master ticket for the revamped explorer: https://github.com/OpenObservatory/design/issues/6#issuecomment-326552777
14:13:45 @tafiti do you think there is something we are missing in terms of "questions" we should be sure are answered thanks to OONI Explorer or something else?
14:14:11 @willscott you mean having specific ‘views’ for a country but then you go ahead “rather than specific to a single context”. You lost me there. Do you mean customized views for each country or test?
14:14:15 I would particularly be interested in hearing about things that are not just about website blocking
14:15:21 @tafiti with the new OONI Explorer we plan to have country pages where each country page presents the top censorship findings in terms of websites blocked, middleboxes detected, speed and performance
14:15:33 (and IM tests)
14:15:50 (and censorship circumvention tool test findings)
14:15:55 (and censorship methods)
14:16:07 @hellais I think it would be interesting to have timelines baked in. If Facebook, for example, is blocked in country x, it would be great if one could see the days when tests showed it was up, down and up again.
14:16:13 And the idea is that all of these findings will automatically be updated as new data comes in
14:16:38 That may make a far more interesting story. Leaving that to journalists may lead to mixed-up interpretations based on what one accessed.
14:17:46 @agrabeli I guess I should be a little bit patient :slightly_smiling_face:
14:17:51 The tricky thing with timelines is that if you don't have consistency in vantage points they can sometimes be misleading. We are thinking, though, of having historical views on a per-country and per-site basis (and possibly also per-site + per-country, which is what you are saying I guess)
14:18:41 Yes. There could be three options to this: Tested - on, Tested - down, Not-Tested - N/A
14:18:46 @tafiti I really like your suggestion on visualizing the data in a timeline
14:19:59 @tafiti the thing is that in many cases we have many measurements of the same site in the same day. So it's tricky to come up with "conflict resolution"
14:20:25 The data is multidimensional and therefore it's tricky to flatten it onto a bi-dimensional timeline
14:20:54 The dimensions are: (ASN, time, target, status)
14:21:18 And the data also often has "holes"
14:21:42 I have been using OONI-SYNC and viz tools and I think there is a lot to do with the data. Maybe explorer can keep it simple, then interested parties can follow up with more detailed analysis and visualization. I do not think you will solve all the viz needs for every single case so you may as well focus on the general but crucial ones.
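The timeline idea discussed above (one of three states per day: tested and up, tested and down, not tested) can be sketched roughly as follows. This is only an illustration under stated assumptions: the record layout, the status values, and the `daily_timeline` helper are hypothetical and are not the OONI data format or API. The "conflict resolution" rule used here (any non-ok measurement marks the whole day as down) is one arbitrary choice among many, and the ASNs seen each day are kept alongside the label so that changes in vantage point stay visible.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical flat records: (asn, day, target, status).
# Field names and status values are illustrative, not the OONI format.
MEASUREMENTS = [
    ("AS1", date(2018, 1, 1), "example.com", "ok"),
    ("AS1", date(2018, 1, 1), "example.com", "anomaly"),
    ("AS2", date(2018, 1, 3), "example.com", "ok"),
]


def daily_timeline(measurements, target, start, end):
    """Collapse all measurements of one target into one label per day."""
    by_day = defaultdict(list)
    for asn, day, tgt, status in measurements:
        if tgt == target:
            by_day[day].append((asn, status))

    timeline = []
    day = start
    while day <= end:
        seen = by_day.get(day, [])
        if not seen:
            label = "not-tested"   # a "hole" in the data
        elif any(status != "ok" for _, status in seen):
            label = "tested-down"  # assumed rule: any anomaly wins
        else:
            label = "tested-on"
        # Keep the vantage points so a reader can tell when they changed.
        asns = sorted({asn for asn, _ in seen})
        timeline.append((day, label, asns))
        day += timedelta(days=1)
    return timeline


for day, label, asns in daily_timeline(
    MEASUREMENTS, "example.com", date(2018, 1, 1), date(2018, 1, 3)
):
    print(day.isoformat(), label, ",".join(asns) or "-")
```

Note that in this sample the "down" day and the "up" day come from different ASNs, so the change in label says nothing certain about what changed in the country as a whole.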
14:22:32 I think the issue with it not being continuous on a per-network basis is going to some extent improve the situation, but it's likely to still be present and lead to possible wrong interpretations
14:23:40 Perhaps having incomplete data per site is not necessarily a blocker
14:23:46 In any case, we definitely have plans to try out some ideas in this area, but we can't promise they will end up in the final version of OONI Explorer
14:24:12 That said we could maybe publish some partial pre-processed data that can be useful to people wanting to plot their own timelines and the like, without having to download GBs of data
14:25:06 and by partial preprocessed data, I mean something like a flat CSV file
14:25:10 If we can find a way to visualize the connectivity of a site over time (even in a timeline similar to what is used by Google transparency reports), then we can (a) show "normal flow" when the site is accessible, (b) show spikes when there are anomalies (and by hovering on the spikes, the user sees what types of anomalies), (c) show nothing when there is no data for that specific time period.
14:25:43 This probably needs more thought :P
14:26:34 explorer + pre-processed data sounds like a good compromise.
14:30:15 @agrabeli the missing datapoints when no tests were conducted could be ignored. Then a disclaimer could be made: “only dates when tests were run are shown”.
14:31:22 <171> @hellais you have mentioned many measurements on the same day and conflict resolution. What are the other factors that can be coupled with the timeline to get a bi-dimensional visualization?
14:31:39 <171> location? ISP?
14:32:46 @tafiti maybe that could work, not sure...
14:33:14 @171 I would say the primary issue is with having inconsistencies in the "vantage point dimension" (location, ISP, etc.)
14:34:07 For example you will have on `day[1]` only from `ISP[1]`, then on `day[2]` you have data from `ISP[2], ISP[3]`
14:34:48 If measurements on `day[1]` are conflicting with measurements from `day[2]` it's hard to say anything meaningful in terms of what changed in `country[1]`
14:35:25 But it's also hard to highlight this kind of nuance inside of a two-dimension timeline type visualization
14:36:25 <171> Oh so the measurements do not necessarily include every ISP everyday.
14:36:34 @171 correct
14:36:36 I guess a lot of this depends on the type and amount of data available per country, and the focus of the research (e.g. in some countries, there are only measurements from one ASN)
14:37:06 @171 no, the measurements depend on where volunteers run tests (which varies from country to country)
14:37:19 Currently this problem is even more acute, since we don't have a way of ensuring clients test sites we care about routinely automatically on the mobile devices
14:37:45 We are working on a solution to this and hopefully this will lead to more consistent data
14:38:23 <171> @agrabeli where can i find information regarding how to volunteer for running tests?
14:38:54 https://ooni.torproject.org/install/
14:38:55 @171 you can find OONI Probe installation information here: https://ooni.torproject.org/install/ :slightly_smiling_face:
14:39:12 OONI Probe is available for Linux, macOS, Android, iOS, and Raspberry Pis
14:39:59 @171 feel free to ping us directly with any questions you may have in relation to installation
14:40:03 <171> @hellais great.
14:40:17 <171> @willscott @agrabeli Thanks!
14:40:32 Agenda 2?
14:40:36 yep!
14:41:08 #topic 2. Are there technical measurements of the "internet photo op" in Cameroon?
14:41:25 <171> It's great that the app is available on F-droid. Awesome!
14:41:40 @zakkai would you like to share a few words on your proposed topic? :slightly_smiling_face:
14:41:58 #topic Are there technical measurements of the "internet photo op" in Cameroon?
14:42:05 (repeating here so that meetbot captures it)
14:43:29 Maybe we can move onto the next one and if @zakkai re-appears we can get back to 2
14:43:42 #topic What features would you like us to be sure to include in the upcoming OONI Probe desktop apps?
14:43:52 <171> Can someone help me out with where i can find info about Topic 2?
14:44:53 @171 I'm not sure what the "internet photo op" is, let's w8 for @zakkai :slightly_smiling_face:
14:45:11 So basically what I had in mind for this was that, as maybe you know, we are working on developing an OONI Probe desktop app that is going to be targeting Windows & macOS platforms
14:45:13 <171> Ok. :+1:
14:45:58 Since this is still in the very early stages of development, now is a great time to collect feedback on features that are currently missing and you would really like to see in the new upcoming OONI Probe desktop apps
14:46:42 Maybe something that you have found to be annoying or not working as you expected in the current "desktop" apps or even things that are missing from the mobile app that you would like to do thanks to a desktop app.
14:47:45 one of the things that i wonder about is that each time i run web connectivity tests on my phone, i get a different subset of domains i test
14:47:53 I would like to be able to test URLs directly via the new desktop apps :slightly_smiling_face:
14:48:06 for consistency, i feel like it might be more valuable if i were to test the same URLs every day
14:49:53 <171> How are the domains selected for testing?
14:49:59 @willscott the reason why we do that on mobile is that it's a sort of workaround for the problem of "distributing" the load of testing in a distributed fashion
14:50:10 yep, and it makes some sense
14:50:17 but especially if i schedule a recurring measurement
14:50:22 there's a lot of value in the consistency
14:50:32 so it's worth thinking about selection strategy
14:51:00 @171 - they come from the citizen lab list: https://github.com/citizenlab/test-lists
14:51:00 We have just rolled out in proteus a feature that allows for dynamic URL list fetching (without needing to hardcode the testing lists in the app), that will allow us to better manage which URLs are given to each probe and optimise the testing strategy without requiring an app upgrade
14:51:29 @171 https://ooni.torproject.org/get-involved/contribute-test-lists/
14:51:30 I think there are tradeoffs with each approach, but at least moving the "intelligence" into the backend means we can try out different strategies and see which ones play out better (sort of A-B testing style)
14:52:07 For scheduling of measurements that is quite a different beast. I think that what we probably want to optimise for is having some sort of feedback between the orchestration and the data processing pipeline
14:52:25 So for example if we detect that a site is blocked on a particular network, we probably want to be sure that a particular client keeps testing that site
14:52:39 While sites that are known to be working can perhaps be put on a lower priority
14:53:02 I think though, this is an interesting research question, in a way, to which we don't really have a good answer as to what is the best approach
14:53:39 On the one hand we want consistency, but we also need to work within the bandwidth constraints of mobile and still ensure we are doing extensive testing of all the URLs in the lists
14:53:50 So it's not something that is easy
14:54:24 @hellais I'd highlight that we've rolled out proteus feature, but we're not going to turn it on in public OONI Probe build till security audit is done, right?
14:54:26 <171> @hellais yes, being able to try out different strategies does help in improving the strategy
14:54:34 We will probably have some sort of "unmetered" mode available, that allows us to run through all the URLs in the testing lists, but we probably cannot push this onto every user (even in the desktop deployment scenario)
14:54:55 @hellais - regarding apps on desktop. It is a pain at times to clear app cache. A 20MB app unzips to 150MB, keeps taking more space with each run and by the time you know it, it’s way out of line. I hope the app can self-clean right after uploading the tests as a default option. If one wants to keep the data (I highly doubt it), one can opt in.
14:55:10 > I'd highlight that we've rolled out proteus feature, but we're not going to turn it on in public OONI Probe build till security audit is done, right? This is not related to orchestration. This is just for URL list fetching. We will probably include this as part of a new mobile release in 1 month or so
14:55:38 And by URL list fetching I mean when you click on the Web Connectivity `RUN` button, what is the list of URLs that is going to be tested
14:55:39 <171> is there a selection strategy for the type of tests to be conducted on a domain?
14:56:01 @171 every domain is tested, to some extent, in the same way
14:56:41 right. web_connectivity tries to establish a connection, and can differentiate several ways in which it can fail
14:56:51 from DNS, to TCP, to DPI
14:57:28 @tafiti > regarding apps on desktop. It is a pain at times to clear app cache. A 20MB app unzips to 150MB, keeps taking more space with each run and by the time you know it, it’s way out of line. I hope the app can self-clean right after uploading the tests as a default option. If one wants to keep the data (I highly doubt it), one can opt in. Yes this is a very good point
14:57:35 <171> @hellais do you mean that as part of testing a domain all the tests are conducted?
14:57:41 Currently we have an option to delete all the past measurements, but it's a bit painful to use, I agree
14:57:44 what do you mean by 'all the tests'?
14:58:10 Our plan for the new OONI Probe mobile app is that after the measurement is uploaded and some days have passed it's deleted on the device and we only store the report_id
14:58:22 Some metadata about the result will be kept on the device
14:58:31 But if you need to access the JSON it will be fetched from the API directly
14:58:37 @171 - there's only one test that is meaningful for an arbitrary domain. the other tests are about your connection as a whole - like your speed, and if IM apps work
14:59:32 @171 this OONI test (called Web Connectivity) measures the DNS, TCP, and HTTP blocking of URLs: https://ooni.torproject.org/nettest/web-connectivity/
14:59:36 We expect that this will drastically reduce the app disk space footprint
14:59:36 This makes it easier to convince more people to run tests. Most people in SSA are all about: will it use my data bundle? how much ‘space’ will be used?
14:59:39 <171> @willscott understood.
15:00:07 @tafiti another thing we are going to be introducing in 2.0 of the mobile app (and desktop) is data usage and quotas
15:00:28 You will be able to know how many MBs of data OONI Probe has used and pick data usage quotas for when you are on mobile or wifi
15:00:54 The app itself will also give an estimate of the data usage of a given test run and total test runtime
15:01:28 <171> @agrabeli @willscott Thanks
15:01:42 Fair enough. Oh, by the way I did not mean OONI app is taking lots of space - well, I have never checked. I meant some desktop apps.
15:02:16 @tafiti the on disk usage of OONI Probe desktop app will be comparable to the disk usage of the Slack desktop app
15:02:17 @171 You can learn all about other OONI Probe tests here: https://ooni.torproject.org/nettest/ (and feel free to ping us directly with questions)
15:02:37 @tafiti do you feel like the disk usage of the desktop app itself is also a concern for people?
15:02:43 <171> @agrabeli Cool!
15:03:24 @tafiti like would you say a desktop app that is say 30MB compressed (180MB expanded) is too big?
15:03:45 @tafiti does it make sense to be able to measure with one network and upload measurements with another one? That's actually three questions: 1. is upload metered? 2. do people have access to two networks "interesting" (to measure) and "cheap" (unmetered?), 3. are people motivated enough to follow some manual procedure to reduce data usage by factor of two (not even ten)?
15:04:51 @hellais desktop running out of space receives a more drastic response than mobile. I know folks who prune apps just because they don’t want to risk degraded performance. You don’t want someone ranking apps and thinking ‘this one is using too much space so, off you go’
15:05:28 150MB expanded is fair game.
15:05:51 Anything over 250MB is too much. At least on Mac.
15:06:52 @darkk I think most desktop connections are over wifi or fixed unlimited connections.
15:07:20 Ok I will keep this in mind. We sort of have a lower bound to the on-disk usage given by electron
15:07:20 we should maybe be respectful of time and wrap up?
15:07:42 Mobile apps may have the option of SIM and wifi so perhaps if it's possible to run the tests on SIM and upload on wifi, that would be something.
15:08:10 This is all very useful feedback, thanks!
15:08:16 We can continue this chat elsewhere.
15:08:19 Thanks for the feedback -- I'll be documenting it into a ticket so that it can feed into the dev of the desktop apps
15:08:20 Cheers
15:08:48 We can continue discussions on this channel for those interested, but for now, let's officially wrap up the meeting since we're over time
15:09:45 Thanks everyone for attending! The next monthly community meeting will take place on this channel at 14:00 UTC on the last Tuesday of the month. If you haven't already, please sign up on the ooni-talk mailing list (where we'll be announcing the next meeting): https://lists.torproject.org/cgi-bin/mailman/listinfo/ooni-talk
15:10:37 We're open to any suggestions you may have for a time and date that might work better for you for the next meeting
15:10:37 I'd be interested in reading this discussion later, but for now I have other commitments. I would appreciate if the discussion continued here, though I don't have anything to add to it. Thanks for an interesting discussion, everyone! Bye!
15:11:00 @sukarn thanks for joining us!
15:11:06 And thanks to everyone else :slightly_smiling_face:
15:11:21 #endmeeting