02:03:25 #startmeeting
02:03:25 Meeting started Wed May 13 02:03:25 2015 UTC. The chair is isis. Information about MeetBot at http://wiki.debian.org/MeetBot.
02:03:25 Useful Commands: #action #agreed #help #info #idea #link #topic.
02:03:40 :)
02:03:55 isis: is it bi-weekly or weekly?
02:04:14 bi-weekly, but both Yawning and I missed the last one
02:04:37 ok, next one in two weeks? I should add it to the calendar
02:04:48 my right index finger is currently being held together with superglue… so i might be typing slow
02:04:59 !
02:05:01 isabela: that would be great! thanks
02:06:12 what happened to your finger?
02:06:29 so, i'll report back first, and then whoever wants to go next can go, then questions/discussion/brainstorming/etc
02:06:50 * isabela has a point at discussion time (roadmaps)
02:06:51 isabela: uh, i got in a fight with a very sharp knife and lost
02:07:17 apparently, this is why i can't have nice things :)
02:07:28 :( fuen
02:07:29 it's healing okay though
02:09:12 these last few weeks i worked on finishing #12505. in the process, because the tasks turned out to be much more intertwined than i had expected, i finished #12029, #11330, #1839, and very nearly all of #12506
02:10:26 now i am currently a tiny bit stuck on whether i should do a bit of bending over backwards to keep the new hashring structures compatible with the old database schema
02:12:27 or if i should update the schema and transition the data to the new one (this would give us sub-hashring persistence, so e.g. bridges for the email-riseup.net sub-hashring would always go to riseup.net users, and bridges for the email-gmail.com sub-hashring would always go to gmail.com users)
02:13:02 or if i should just start doing #12030 now
02:13:22 and switch to the new databases outlined in prop#226
02:13:53 i'd kind of prefer option #3, but that is a lot of changes to be making at once on a live system
02:15:16 okay, i think that is it for me
02:15:35 who would like to go next?
02:17:15 do we have a dcf? I have some domain fronting questions/thoughts
02:17:18 okay, perhaps i am not doing so well at corralling people into attending meetings…
02:18:05 mikeperry: i do not see a dcf, no.
02:19:27 #action isis send out announcement email for pt+bridges meeting a day or so in advance
02:19:54 the main takeaways from my last tor-dev post were: a) if we make domain fronting the default way of getting bridges in Tor Launcher, it will be used by TBB, Tails, TorBirdy, Tor Messenger, and probably also OrBot. We're going to need a lot more bridges for that than 20% of the current HTTPS pool size, I think. IMO, this should be the biggest pool, even if separated from the HTTPS one
02:20:58 and b) can we get analytics on the probing? What if Tor Launcher gave you a request parameter that said the last set of bridges you gave it just failed.. like &justfailed=obfs4
02:21:53 if you could break the count of that parameter down by GeoIP country and export that statistic, it might tell us which countries are either harvesting lots of IPs or blocking some transport by DPI
02:24:59 hmm, so you wouldn't tell me which bridges in particular failed, but it would be like POST /report?justfailed=obfs4&cc=cn ?
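
To make the "&justfailed=" idea above concrete, here is a minimal sketch, assuming a Twisted web service and the pygeoip bindings that come up later in the meeting. The resource name, parameter handling, and in-memory counter are illustrative assumptions, not BridgeDB's actual implementation.

    from collections import Counter

    import pygeoip
    from twisted.web import resource, server
    from twisted.internet import reactor

    GEOIP = pygeoip.GeoIP('/usr/share/GeoIP/GeoIP.dat')  # assumed DB location
    FAILURE_COUNTS = Counter()  # keyed by (transport, country_code)

    class JustFailedReport(resource.Resource):
        """Accepts e.g. POST /report?justfailed=obfs4 and tallies it by country."""
        isLeaf = True

        def render_POST(self, request):
            # Hypothetical parameter name taken from the discussion above.
            transport = request.args.get(b'justfailed', [b'unknown'])[0].decode('utf-8')
            country = GEOIP.country_code_by_addr(request.getClientIP()) or '??'
            FAILURE_COUNTS[(transport, country)] += 1
            request.setResponseCode(204)  # no body needed in the reply
            return b''

    if __name__ == '__main__':
        reactor.listenTCP(8080, server.Site(JustFailedReport()))
        reactor.run()

Exporting FAILURE_COUNTS periodically would give the per-country, per-transport breakdown mikeperry describes; the deduplication question is picked up below.
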
02:25:38 yes, i can easily do that, once prop#226 is done
02:26:21 currently the structure for persisting, collecting, and studying data is… um… nice words nice words… "lacking"
02:28:03 yes, though you would be the one inferring the country code
02:28:58 okay, so the connection looks like: TorLauncher → BridgeDB's Domain Front → BridgeDB
02:29:13 and from #13171, i pull out the header with the IP address
02:29:17 with an X-DomainFronted-For header with the original IP
02:30:06 okay, yeah, that is a great idea
02:31:26 and would TorLauncher also want/need a way to query like GET /recommended_transport ?
02:32:08 you may want a filter that ensures this request can't be counted more frequently than the current IP turnover rate for the hashrings, both to guard against DoS and to avoid overcounting users who keep trying over and over again with the same bridge lines because they don't know any better
02:32:25 (meta note: i guess the floor is open for discussion/questions/brainstorming now, but if anyone else shows up and would like to report back, please feel free to jump in and do so)
02:32:26 yes! /recommended_transport could use GeoIP to decide the answer
02:32:42 which would greatly simplify the UX for deciding which transport to use for your situation
02:32:58 that is a great idea
02:34:01 oh… mapping IPs to the time they last queried is harder…
02:34:32 it means i would have to store IP addresses… and there really isn't any way to do that privately
02:34:37 oh right, the hashring mapping is stateless.. hrmm :/
02:34:56 hashring mapping?
02:35:15 oh, you mean like IP → which bridges
02:35:16 ?
02:35:19 yes
02:35:40 uh… that is "pseudo-deterministic"
02:36:06 meaning that, yes, it would be entirely deterministic iff we didn't add/remove items from the hashring
02:37:38 but since we re-parse bridge descriptors and update the hashring every 30 minutes, there is a chance, inversely proportional to the size of the hashring and the size of the set difference between the old and new hashrings, that the new "deterministic" mapping of IP → which bridges could produce a different answer
02:37:45 what about a post-processing step then? if you recorded the statistics as (hashring_position, hashring_time_epoch, GeoIP_country) tuples for each transport, you could just count uniques
02:38:14 ugh
02:38:23 so add a reparse_epoch to that tuple? ;)
02:38:52 there is a reparse_epoch, see bridgedb.schedule.ScheduledInterval
02:41:40 though if reparse_epoch changes much more frequently than hashring_time_epoch, we may have problems counting uniques :/
02:45:05 maybe we just ignore reparse_epoch? how much does reparsing change which bridges you get?
02:45:16 by "reparse_epoch", you mean the frequency with which BridgeDB reparses bridge descriptors (i.e. every 30 minutes)?
02:45:21 yes
02:46:15 and by "hashring_time_epoch", you mean the current rotation interval for any rotating sub-hashring the distributor might have, i think
02:46:29 yes, for that transport
02:50:22 for that transport? the sub-hashrings (and sub-sub-hashrings, ad infinitum) are *mostly* bridge-type agnostic
02:51:35 The idea (IIUC) is to be able to track censorship levels for each transport type, and BridgeDB is just a convenient place to do that.
02:53:34 isis: does the sub-sub-hashring name/type also need to be included in the tuple then, in addition to the position? or is the position alone enough?
02:53:35 So I guess the only reason for tracking it per-subring would be if the subrings have non-overlapping user sets...?
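
For the GET /recommended_transport idea, a rough sketch of how GeoIP could drive the answer is below. The country-to-transport table and the default are invented placeholders; as discussed later in the meeting, real recommendations would be adjusted from the blockage statistics rather than hardcoded.

    import pygeoip
    from twisted.web import resource, server
    from twisted.internet import reactor

    GEOIP = pygeoip.GeoIP('/usr/share/GeoIP/GeoIP.dat')

    # Hypothetical policy table, keyed by GeoIP country code.
    RECOMMENDED = {
        'CN': 'obfs4',
        'IR': 'fte',
    }
    DEFAULT_TRANSPORT = 'obfs3'

    class RecommendedTransport(resource.Resource):
        """Answers GET /recommended_transport with a transport name, chosen by country."""
        isLeaf = True

        def render_GET(self, request):
            country = GEOIP.country_code_by_addr(request.getClientIP()) or '??'
            transport = RECOMMENDED.get(country, DEFAULT_TRANSPORT)
            request.setHeader(b'Content-Type', b'text/plain')
            return transport.encode('utf-8')

    if __name__ == '__main__':
        reactor.listenTCP(8081, server.Site(RecommendedTransport()))
        reactor.run()

In the domain-fronted deployment sketched above, the client IP would instead be taken from the X-DomainFronted-For header (#13171) rather than the TCP connection.
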
02:54:04 so i am imagining a redis store which has a SET for each transport type, and then we use mikeperry's idea of using "(hashring_position, interval, country_code)" as a unique string for a report during the period that those bridges were blocked
02:55:10 str4d: yes. also, the subrings as I understand them mostly represent different types of restrictions on the bridges you get (like TCP port, IPv4 vs IPv6, etc). I thought that transport type was also something that caused a subring to be made, but I really only barely understand bridgedb's operation ;)
02:55:14 then, at the end of each interval, we clear each set and increment a counter for "(transport_type, country_code)", iff there was a matching item in the set
02:56:28 oh, yeah, if we're tracking per sub-hashring, then the unique string should be like "(hashring_name, hashring_position, interval, country_code)"
02:57:07 i mean, that is only if we want this system to also apply to the HTTPS and email distributors
02:58:09 which we don't, i guess, because then we'd need something like a way to report back "hey! everything you just gave me doesn't work!" over email… and ugh
02:59:35 mikeperry: okay, that is totally doable
03:00:14 mikeperry: will users ever be hitting BridgeDB's domain front over Tor?
03:03:39 hrmm.. maybe? I think we shouldn't rule it out.. how are Tor users handled by the HTTPS distributor again?
03:03:54 are they treated like a single IP, or otherwise any differently than any other non-Tor IP?
03:04:20 (maybe we could count them as country code "Proxies" or something)
03:05:56 my guess is that it is an edge case that we shouldn't spend a lot of time making perfect, but at minimum we should ensure that the Tor IPs don't affect our country counts for non-Tor IPs
03:06:57 the web tells me that MaxMind has a special GeoIP country code for proxies
03:07:03 so it might "just work"
03:07:38 GeKo, mikeperry: just upgraded to 5.0a1 -- still have issues with the resize thing :/ had to disable it.
03:07:55 currently, they are treated like a single IP with four disjoint subgroups (the subgroup is deterministically computed from the client's exit node, so using "New Tor Circuit For This Site" would give a client up to four sets of bridges in a time period)
03:08:00 hell yeah, represent nation "A1" ;)
03:08:03 https://dev.maxmind.com/geoip/legacy/codes/iso3166/
03:09:42 no, A1 is not applied consistently, nor are Tor exits given the A1 classification by MaxMind
03:09:54 iirc
03:12:14 nope:
03:12:18 In [81]: import pygeoip
03:12:23 In [82]: geo4 = pygeoip.GeoIP('/usr/share/GeoIP/GeoIP.dat')
03:12:25 In [83]: geo4.country_code_by_addr('106.187.37.158')
03:12:26 Out[83]: 'JP'
03:12:33 that's my exit
03:12:39 and it is in japan
03:13:17 was the package that installed /usr/share/GeoIP/GeoIP.dat updated since you started that exit on that IP?
03:13:32 (and when was the upstream version of that package released?)
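
A minimal sketch of the redis layout described at 02:54:04-02:56:28: one SET per transport holding "(hashring_position, interval, country_code)" strings, rolled up into per-(transport, country) counters at the end of each interval. Key names and helper functions are invented for illustration, and uniques are counted per SET member; this is not BridgeDB's actual schema.

    import redis

    r = redis.StrictRedis(host='localhost', port=6379, db=0)

    def record_blockage_report(transport, hashring_position, interval, country_code):
        """Store one report; duplicates within an interval collapse in the SET."""
        member = "(%s, %s, %s)" % (hashring_position, interval, country_code)
        r.sadd("blockage:%s" % transport, member)

    def rollup_interval(transports):
        """At the end of an interval, count unique reports per country and reset the sets."""
        for transport in transports:
            key = "blockage:%s" % transport
            for member in r.smembers(key):
                # member looks like b'(position, interval, country_code)'
                country_code = member.decode('utf-8').strip('()').split(', ')[-1]
                r.incr("blocked:(%s, %s)" % (transport, country_code))
            r.delete(key)

    # Example: one report of an obfs4 bridge apparently blocked from China.
    record_blockage_report('obfs4', 42, '2015-05-13 00:00', 'CN')
    rollup_interval(['obfs4', 'obfs3', 'scramblesuit'])

If tracking is done per sub-hashring, the member string would also carry the hashring_name, as noted at 02:56:28.
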
03:14:46 that exit has been running for like three or four years, so yeah
03:16:15 and that DB is the latest version from MaxMind, from 1 April 2015
03:16:36 anyway
03:17:22 there are a few more things we'd need to iron out for the design of the TorLauncher Distributor, but the blockage statistics are definitely doable
03:17:59 also /recommended_transport ftw
03:18:18 yeah, less probing
03:19:07 maybe in aggregate, but I suspect the statistics will show us that IP blocking is common and will be the main reason why users try another transport type
03:19:30 but we should get the data on this before just guessing and changing things, IMO
03:21:18 would it be mean to respond to e.g. US clients asking for /recommended_transport with "obfs3" or "scramblesuit" first?
03:21:35 recommended_transport mainly gives us more agility and localized approaches. that's why I'm excited about it
03:22:36 rather than going straight to the latest-and-greatest for a country which only does occasional blocking (mostly depending on ISP and corporate/university network, etc.)
03:22:39 good question. do we conserve our favorite transports for other users in other countries?
03:22:53 I wonder if we can find other metrics to help answer that..
03:23:26 I bet, actually, that US censorship is pretty fucking hardcore where it happens
03:23:42 we have the best gear after all... everyone is buying their shit from us
03:23:53 (meta note: i'm going to end the meeting at 03:30 UTC)
03:24:12 and it's used in libraries, schools, companies, etc
03:25:21 so maybe we don't try to guess anything about recommended_transport without more data, but the API should be there so we can adjust as needed. like, if Iran suddenly blocks SSL or anything that looks encrypted during their elections, then at that point we'll want to tell .IR users to use FTE
03:26:24 ack
03:26:51 bridgedb is almost going to need an admin interface :P
03:26:54 (kidding)
03:27:47 okay, more questions? comments?
03:28:04 isabela: did you have a request for roadmapping?
03:28:32 any more Tor Browser resizing bug reports? ;)
03:33:30 sorry
03:33:41 was distracted
03:34:18 I am pinging teams to update their roadmaps ->> https://trac.torproject.org/projects/tor/wiki/org/roadmaps/TorObfuscation
03:34:32 I will add a status column there for you folks
03:35:10 we missed april, so I am asking teams to look at march and april - move things around if you didn't get a chance to do them, and note whether you're doing them in may or in another month
03:35:24 that's it
03:36:18 isabela: okay, will do
03:36:39 tx
03:36:58 isabela: i'll make sure Yawning knows too
03:37:13 :)
03:38:12 i forgot to mention that i also revised BridgeDB's error pages to make them cuter: http://static.inky.ws/image/5205/error.png http://static.inky.ws/image/5206/not-found.png http://static.inky.ws/image/5207/maintenance.png
03:39:07 aww
03:39:27 i am excited about the maintenance page because i get like 5 false bug reports every time the thing goes down for reparsing
03:39:55 good point
03:40:12 :)
03:40:33 okay, i think the meeting is done?
03:40:44 #endmeeting