16:00:20 #startmeeting tor anti-censorship meeting
16:00:20 Meeting started Thu Oct 28 16:00:20 2021 UTC. The chair is cohosh. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:20 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:28 welcome :)
16:00:42 here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:00:56 hello
16:01:13 feel free to add topics to the agenda
16:01:26 we also have a reading group at the end of the meeting today :)
16:03:15 first item is about bumping the major version of the snowflake library
16:03:31 the ticket tracking changes we've made is snowflake#40063
16:05:07 if there's more work we plan to do on the API, i'd prefer to do it before bumping the version
16:05:32 so please take a look, particularly at the proxy, client, and server libraries, and comment on the ticket if there are changes you'd like to see
16:05:51 you mean since https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/60 ?
16:05:57 also thanks to idk (from i2p) for turning the proxy code into a library that can be called from other programs
16:06:00 dcf1: yup!
16:06:02 that was merged
16:07:03 also, does anyone remember the discussion on logging from PTIM the other week?
16:07:23 we have a note on this ticket about looking into modifying how we do logging
16:07:27 from sbs (ooni)
16:07:44 i had another meeting at that time and missed the details
16:07:56 it's still linked under the interesting links on the meeting pad
16:07:59 https://lists.torproject.org/pipermail/tor-project/2021-October/003194.html
16:08:05 http://meetbot.debian.net/tor-meeting/2021/tor-meeting.2021-09-30-16.00.html
16:08:22 https://github.com/Pluggable-Transports/Pluggable-Transports-spec/blob/70bc1c5115639411cf05eec300c52645c174312b/proposals/0011%20-%20Improve%20Logging%20in%20APIs.pdf
16:08:30 oh no i mean sbs's talk at PTIM
16:08:33 is the proposal (candidate for PT 3.0 I think)
16:08:41 oh no, sorry, I didn't see it
16:08:41 about difficulties with snowflake logs
16:08:45 in the ooni tests
16:08:47 okay no worries
16:08:53 i'll follow up with them directly
16:08:55 I think I missed it
16:09:15 ok, that's it for me on this subject
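
Some background on why it helps to land any remaining API changes before the bump: with Go's semantic import versioning, a new major version changes the module path itself, so every consumer (for example, a program embedding the proxy library the way i2p does) has to update its import paths at the same time. A minimal sketch of what that looks like, assuming the usual convention; the module path, subpackage path, and Go version below are illustrative, not taken from the ticket:

    // go.mod of the library after a hypothetical v2 release. The /v2 suffix
    // becomes part of every consumer's import path, e.g.
    //   import ".../pluggable-transports/snowflake/v2/proxy/lib"
    // which is why it is cheaper to finish API changes before bumping.
    module gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/v2

    go 1.17
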
16:09:48 next is rebooting the snowflake vpses
16:09:49 https://lists.torproject.org/pipermail/anti-censorship-team/2021-October/000196.html
16:10:08 meskio identified a good time of day to do this
16:10:18 great
16:10:35 maybe I can do this Saturday night
16:11:00 It looks like it's something I have to do in the web configuration panel after shutting down the hosts
16:11:20 an alternative is to do it next monday and have people ready to act if something goes wrong
16:11:44 I think we have things set up so that even an uncontrolled reboot should start everything running properly again, though
16:12:08 yeah
16:12:19 thanks dcf1
16:12:31 it might be pretty late for me anyway, so if others will be around I'll be happy not to need to be
16:12:52 ok I'll plan to do it myself on Saturday and report when it's done
16:13:05 sounds good :)
16:13:18 * cohosh was just looking into meskio's other comment on that email
16:13:42 dcf1: thanks, good luck
16:14:05 my other comment in the email is the next point in the agenda if we have finished with this
16:14:14 yep go for it
16:15:06 in grafana it looks like we have a problem identifying the nat type of proxies/clients since a couple of days ago
16:15:42 related to that or not, we are seeing a lot of moments where there are no proxies idling and many clients are being denied
16:16:02 some spikes of denied clients hit 1.5k at one point
16:16:15 the denied clients also have an 'unknown' type of nat
16:16:36 and all of that started on the 25th, so three days ago
16:16:44 any idea what could be causing it?
16:17:00 yeah, i think you're right that it's an issue with our NAT probe test service
16:17:20 I just sent a screenshot of the grafana plot
16:17:32 so, there is a service that is being polled to check the nat type?
16:17:38 yeah
16:17:52 proxies make a connection to it when they start up
16:18:04 and every 24h after that?
16:18:18 I see, we need to check if something is happening with it
16:18:24 it's basically a webrtc peer run in its own network namespace on the broker
16:18:33 24h would make sense, seeing how the graph basically changes with a 24h period
16:18:37 set up to have a symmetric NAT
16:18:58 it looks like it's running
16:19:14 but we did have issues with it using a lot of CPU earlier
16:19:33 sorry I always have such a hard time keeping these straight - what does "symmetric NAT" mean in the language of rfc4787?
16:19:33 and maybe we should add an alert on that kind of behaviour (or at least open a ticket about it)
16:19:42 https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40039
16:20:12 maxbee: a symmetric NAT has ip- and possibly port-dependent NAT mapping
16:20:36 probetest is using 100% CPU at the moment
16:20:39 ahh, I recall probetest now, I did look at the code and try to fix some of the 100% CPU issues
16:20:41 meskio: yeah the idea with prometheus metrics was to slowly add alerts as we discover behaviour we would want to be alerted on, so now is a great time to add it :)
16:21:07 if the number of unknown proxies is higher than the number of restricted ones, alert...
16:21:09 cohosh: and the filtering behavior can be any?
16:21:32 maxbee: yeah, though i think it's unusual to have a symmetric NAT without filtering
16:21:53 cohosh: thanks!!!
16:22:04 actually i just saw that pion has a wiki page about it that is also really nice: https://github.com/pion/webrtc/wiki/Network-Address-Translation
16:22:14 maxbee: AFAIK from the snowflake perspective there are two kinds of nat: restricted and unrestricted: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/wikis/NAT-matching#proxy-assignment
16:22:31 i was thinking of updating our page to use this terminology as well
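
A toy model of the rfc4787 terms that came up above, in case it helps: an endpoint-independent mapping reuses one external port for every destination, while an address- and port-dependent mapping (a "symmetric" NAT, as probetest is set up to have) allocates a fresh external port per destination. This only illustrates the terminology, it is not snowflake or pion code, and all names, addresses, and ports are made up.

    package main

    import "fmt"

    type endpoint struct {
        host string
        port int
    }

    func main() {
        src := endpoint{"192.168.1.2", 50000}
        dsts := []endpoint{{"198.51.100.1", 3478}, {"203.0.113.9", 3478}}

        // Endpoint-independent mapping: one external port per internal source.
        eim := map[endpoint]int{}
        // Address- and port-dependent mapping: one external port per (src, dst)
        // pair, so a peer that learned one mapping cannot reach the other.
        apdm := map[[2]endpoint]int{}

        next := 40000
        alloc := func() int { next++; return next }

        for _, dst := range dsts {
            if _, ok := eim[src]; !ok {
                eim[src] = alloc()
            }
            key := [2]endpoint{src, dst}
            if _, ok := apdm[key]; !ok {
                apdm[key] = alloc()
            }
            fmt.Printf("to %v: endpoint-independent port %d, symmetric port %d\n",
                dst, eim[src], apdm[key])
        }
    }
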
16:22:52 but anyway, it's possible that our NAT check service is just getting overloaded
16:22:59 we do have > 12,000 proxies
16:23:26 maybe we need to run more than one, just DNS round robin might work
16:24:08 we have a sort of "fail safe" logic in the proxy code where if it was previously marked as "unrestricted", it won't revert to "unknown", but if it was marked as "restricted" then i think it does
16:24:20 one thing to try is restarting probetest and seeing if it goes back up to 100% immediately
16:24:29 or takes time to get there
16:24:39 good idea
16:24:42 sounds good
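
A minimal sketch of the "fail safe" behaviour described above, as one might paraphrase it in Go. It only illustrates the described logic, it is not the actual snowflake proxy code, and the constant and function names are made up.

    package main

    import "fmt"

    const (
        NATUnknown      = "unknown"
        NATRestricted   = "restricted"
        NATUnrestricted = "unrestricted"
    )

    // updateNATType picks the NAT type a proxy reports after a probe. A proxy
    // that already proved it is unrestricted keeps that status even when the
    // probe comes back inconclusive, while a previously restricted proxy falls
    // back to "unknown", which is consistent with the flood of "unknown"
    // proxies seen when probetest is overloaded.
    func updateNATType(current, probed string) string {
        if probed == NATUnknown && current == NATUnrestricted {
            return current
        }
        return probed
    }

    func main() {
        fmt.Println(updateNATType(NATUnrestricted, NATUnknown)) // unrestricted
        fmt.Println(updateNATType(NATRestricted, NATUnknown))   // unknown
    }
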
16:25:52 so we have a few items here
16:26:05 install some kind of monitoring for the NAT type assignment
16:26:23 restart probetest and watch it for a few days
16:26:48 do we need to do the prometheus sensor first, or just watch probetest manually?
16:27:21 you mean for CPU usage?
16:27:36 or for how NAT assignments are being done
16:27:40 ah we have that already
16:27:46 all we're missing is an auto alert
16:28:00 it seems probetest gets pushed into a failure state (which is what we want to monitor for)
16:28:06 it's displayed in a graph-based dashboard
16:28:16 that's what I mean, sorry, installing an alert for the existing monitoring
16:28:32 ah :) i don't think we need an alert before restarting it, no
16:29:01 it's pretty easy to monitor on the dashboard if we're already looking for it
16:29:24 I'll make a ticket for this and link #40039
16:29:59 thanks
16:30:31 and good catch meskio!
16:30:58 it was lucky that I opened grafana and it looked pretty weird
16:31:28 yea
16:31:42 i had just checked earlier this week
16:31:46 who can do the alertmanager config? do we have access to that machine? or do we need to ask the metrics team?
16:31:53 oh we can do it
16:32:09 i set it up with anarcat during the last hackweek so that all we need to do is make an MR
16:32:28 ahh, cool, so the config file is in a repo
16:32:42 I can do that, I've never touched alertmanager, but it is on my list of things to learn
16:32:50 https://gitlab.torproject.org/tpo/tpa/prometheus-alerts
16:32:58 sure
16:33:58 awesome :)
16:34:05 anything else before reading group?
16:34:32 not from my side
16:36:02 okay let's start the discussion on this week's reading :)
16:36:59 we decided to discuss "Characterizing Transnational Internet Performance and the Great Bottleneck of China"
16:37:20 by Zhu et al.
16:37:36 https://censorbib.nymity.ch/#Zhu2020a
16:37:54 anyone have a quick summary they'd like to share?
16:38:34 so generally the paper is about measuring "transnational" Internet performance
16:38:52 where a transnational path begins in one country and ends in another
16:39:21 they did an experiment with 29 countries downloading from each other pairwise and found abnormally slow performance in 5 African countries and in China
16:39:44 the measurements from China were especially weird so most of the paper is investigating specifically that
16:40:16 they designed several experiments aimed at testing the two hypotheses:
16:40:35 1: the slow performance in China is caused by or related to Great Firewall censorship
16:41:06 2: the slow performance in China is caused by underprovisioning of transnational links (perhaps deliberate, in order to give an advantage to domestic Chinese services)
16:41:47 By my reading, I don't think there was a clear conclusion as to the cause
16:42:12 but it is not clearly GFW-related, and the dynamics of the slowdown are consistent in some ways with congestion
16:42:28 yeah, they also noted that the two possible causes are not necessarily unrelated
16:42:48 one part of the experiments is to identify which routers are the bottleneck nodes
16:43:15 70% of the bottlenecks are interior to China (not on the border); 28% are within one hop of the border
16:43:19 (Fig. 13)
16:43:58 but they tried some circumvention protocols and did not see them treated any differently, nor did TCP, UDP, ICMP seem to make a difference
16:44:14 however the elevated packet loss occurred in only one direction: inbound to China
16:45:02 it's pretty interesting how it doesn't apply to hong kong, so it is a great place for proxies or in our case bridges :)
16:45:40 it doesn't apply == there are no network performance problems in the connection to hong kong or connecting from hong kong to abroad
16:46:50 Hong Kong had no slowdowns when accessing data from the rest of the world... that it also has much less frequent slowdowns when being accessed by nodes in mainland China.
16:46:59 "... India, Japan, and Korea are the next best senders (relatively speaking), presumably because of their physical proximity to China, though they still suffer from 4 to 8 hours on average daily."
16:48:40 yeah it could be that HK has good performance, but i think the GFW censorship is being more heavily applied there, so i'm not sure how viable running bridges there is :-S
16:49:25 Yeah I want to say that there were once some users looking for SNI proxies (https://www.bamsoftware.com/computers/sniproxy/) especially in HK, because of better performance (circa 2016)
16:49:45 However the page where I remember reading that, https://github.com/phuslu/goproxy/issues/853, is gone now and I neglected to archive it
16:50:24 ohh, I was expecting the GFW to be less powerful in HK :(
16:50:55 one of the snowflake infra machines was hosted in HK until greenhost had to give up that data center
16:51:53 Oh actually you can still see "hk" in some of the hostnames copied from the github issue https://www.bamsoftware.com/computers/sniproxy/#goproxy
16:53:47 so what's the main observation of this research, for us? that the evidence is against the network bottleneck being caused by overloading of GFW nodes?
16:54:11 i was interested because of snowflake performance testing
16:54:18 because I think that GFW interference is the first thing that would jump to anyone's mind
16:54:27 oh good call
16:55:41 snowflake performance is already limited by proxies run on home networks with slow upload speeds
16:56:06 i have been doing reachability and throughput tests from china
16:56:27 and sometimes it's hard to tell if it's blocked
16:56:31 or just performing too badly
16:57:32 this chart is a bit hard to see but: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/32657#note_2728218
16:57:44 it looks like timing the tests during the chinese night might help avoid being affected by the network bottleneck
16:58:03 most tor connections do not bootstrap to 100% within a 3 minute timeout
16:58:26 which is bad news for users who are actually trying to use this
16:58:51 :(
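
The kind of reachability test mentioned above could be approximated by starting a tor client configured for snowflake and checking whether it logs "Bootstrapped 100%" before a 3 minute deadline. This is only a sketch of that idea, not the actual test harness; the torrc filename is a placeholder, and it assumes tor is logging notices to stdout as it does by default in the foreground.

    package main

    import (
        "bufio"
        "context"
        "fmt"
        "os/exec"
        "strings"
        "time"
    )

    // bootstrapped runs tor with the given config and reports whether it
    // reaches "Bootstrapped 100%" within the timeout. A false result cannot
    // distinguish blocking from a connection that is merely too slow, which
    // is exactly the ambiguity discussed above.
    func bootstrapped(torrc string, timeout time.Duration) (bool, error) {
        ctx, cancel := context.WithTimeout(context.Background(), timeout)
        defer cancel()

        cmd := exec.CommandContext(ctx, "tor", "-f", torrc)
        stdout, err := cmd.StdoutPipe()
        if err != nil {
            return false, err
        }
        if err := cmd.Start(); err != nil {
            return false, err
        }
        defer cmd.Wait()
        defer cmd.Process.Kill()

        scanner := bufio.NewScanner(stdout)
        for scanner.Scan() {
            if strings.Contains(scanner.Text(), "Bootstrapped 100%") {
                return true, nil
            }
        }
        return false, scanner.Err()
    }

    func main() {
        ok, err := bootstrapped("torrc-snowflake", 3*time.Minute) // placeholder torrc
        fmt.Println(ok, err)
    }
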
16:58:58 i also double checked some of the global network data the shadow simulation software uses for their tests
16:59:19 from this paper: https://tmodel-ccs2018.github.io/
16:59:48 they ran ping tests between RIPE atlas nodes to get an idea of the *latency* of transnational links
17:00:14 and then used speedtest.net data to determine typical upload/download throughput for nodes in the simulation
17:00:27 for example: https://www.speedtest.net/global-index/china#fixed
17:00:54 in a spot check, i saw that the latency between a CN-US link in their dataset was not very different from the latency of a US-JP link
17:00:58 which is suspicious
17:01:49 i didn't do a thorough check though
17:03:20 "China Telecom Global’s official website explicitly claims four tiers of services to connect to Chinese users. (1) China Access, (2) ChinaNet Paid-Peer, (3) Global Transit (GT), (4) Global Internet Access (GIA)."
17:03:30 I think I found the page for these tiers:
17:03:33 https://www.chinatelecomglobal.com/expertise?category=product-and-services&subcategory=internet&pid=gis
17:04:04 "the first three share the same point-of-presence or international gateway and therefore similar potential bottleneck, while Global Internet Access has a different dedicated CN2 international gateway."
17:04:14 possibly speedtest.net is using one of these prioritized links?
17:04:29 hmm, or speedtest.net is using a server in each country
17:04:55 i think it chooses the closest server out of a set
17:05:25 i'm more suspicious about the RIPE atlas latency tests
17:07:15 https://www.speedtest.net/speedtest-servers
17:08:05 but yeah, maybe the RIPE nodes were using a prioritized link
17:09:30 an interesting design choice is that they capped the curl download speed to 4 Mbps
17:09:53 which makes for easily observable features in e.g. Fig 10 (page 14)
17:10:31 their logic is that they were mainly interested in slow speeds, and therefore did not run the links as fast as possible
17:11:04 but it also means that the graphs are missing potentially interesting data for what happens above 4 Mbps
17:11:33 Fig 10(b) looks like turning on and off a faucet, but we don't really know what the shape of the data would be in the brief periods when it was fast
17:11:55 anything you have in mind that could be learned from that data?
17:12:25 i'm wondering, is the nature of the "slowdown hours" discrete or continuous?
17:12:40 ah
17:12:53 does performance keep ramping up smoothly to a maximum outside the slowdown hours, or does it reach a flat plateau and stay there?
17:13:07 having 600Mbps at home, 4Mbps sounds tiny, but I guess it is enough for a 'decent' web browsing experience
17:13:56 I think designing their experiment that way was actually a pretty clever decision
17:14:43 Like in Fig. 1 you see almost all pairs glued to 4 Mbps, except for the anomalous ones they noted.
17:15:05 If those fast links were all floating on the graph at different speeds, it would not be so clear
17:15:36 I'm wondering if you could achieve a similar effect by measuring to a higher cap, and then applying a maximum before making the graphs
17:17:46 I think that's all my notes
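
The "measure at a higher cap, clamp only for the graphs" suggestion above is cheap to do in post-processing. A small sketch of that idea; the sample numbers and the 4 Mbps threshold are only for illustration.

    package main

    import "fmt"

    // clip returns a copy of the measured speeds with everything above capMbps
    // flattened to capMbps, reproducing the "glued to 4 Mbps" look of Fig. 1
    // for plotting while keeping the raw, uncapped measurements around for
    // later analysis of what happens outside the slowdown hours.
    func clip(speedsMbps []float64, capMbps float64) []float64 {
        out := make([]float64, len(speedsMbps))
        for i, s := range speedsMbps {
            if s > capMbps {
                s = capMbps
            }
            out[i] = s
        }
        return out
    }

    func main() {
        raw := []float64{0.3, 2.1, 9.8, 47.0, 3.7} // made-up samples
        fmt.Println(clip(raw, 4.0))                // [0.3 2.1 4 4 3.7]
    }
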
17:18:52 Oh, potentially related to the observation about RIPE Atlas:
17:18:52 yeah, i'm looking for studies on whether the new security bill is being heavily applied in HK
17:19:00 "We cross verify our results with M-Lab’s NDT tests, which collect the China’s transnational link speed since the beginning of 2019."
17:19:04 and all i see are news reports, but no extensive studies
17:19:12 "In 64% of the 75,464 tests (with each test lasting 9 to 60 seconds), the download speed was less than 500 kbps, which generally accords with our finding of broad and severe slowdowns."
17:19:42 oh thanks dcf1
17:22:28 okay anything else?
17:22:53 i really liked this paper :)
17:23:19 yes, it was fun to read, I like how it was building up
17:26:17 If we're looking for another one, I'm planning to read the Geneva "Come as you are" paper on server-side evasion soon: https://geneva.cs.umd.edu/papers/come-as-you-are.pdf
17:26:37 Or another short alternative might be https://dl.acm.org/doi/10.1145/3473604.3474560 "Measuring QQMail's automated email censorship in China"
17:27:03 nice! i'm down to read/discuss both :)
17:27:18 honestly it's a bit rude of authors to keep writing new papers before we've finished reading the old ones
17:27:30 lol XD
17:27:39 we discussed geneva in the past: https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Anti-censorship-reading-group
17:27:45 but not the server-side evasion paper
17:28:02 okay let's plan to do the QQMail one
17:28:11 cool
17:28:16 in two weeks?
17:28:17 I can read the server-side one and find out if we want to spend another session on it
17:28:55 that works for me, Armistice Day
17:29:12 sounds good, we have the next paper
17:30:25 awesome, i'll close the meeting here then :)
17:30:30 #endmeeting