16:00:20 <cohosh> #startmeeting tor anti-censorship meeting
16:00:20 <MeetBot> Meeting started Thu Oct 28 16:00:20 2021 UTC.  The chair is cohosh. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:20 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:28 <cohosh> welcome :)
16:00:42 <cohosh> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:00:56 <meskio> hello
16:01:13 <cohosh> feel free to add topics to the agenda
16:01:26 <cohosh> we also have a reading group at the end of the meeting today :)
16:03:15 <cohosh> first item is about bumping the major version of the snowflake library
16:03:31 <cohosh> the ticket tracking changes we've made is snowflake#40063
16:05:07 <cohosh> if there's more work we plan to do on the API, i'd prefer to do it before bumping the version
16:05:32 <cohosh> so please take a look at particularly the proxy, client, and server libraries and comment on the ticket if there are changes you'd like to see
16:05:51 <dcf1> you mean since https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/60 ?
16:05:57 <cohosh> also thanks to idk (from i2p) for turning the proxy code into a library that can be called from other programs
16:06:00 <cohosh> dcf1: yup!
16:06:02 <cohosh> that was merged
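For context, a rough sketch of what running the proxy as a library from another Go program might look like. The import path, struct fields, and URLs below are assumptions for illustration, not a confirmed API; see the snowflake repository's proxy library package for the real interface.

```go
// Hypothetical sketch of embedding the snowflake proxy in another program.
// The import path, field names, and URLs are illustrative assumptions,
// not the confirmed library API.
package main

import (
	"log"

	sf "gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/proxy/lib"
)

func main() {
	proxy := sf.SnowflakeProxy{
		Capacity:  10, // maximum number of concurrent clients to serve
		BrokerURL: "https://snowflake-broker.torproject.net/",
		STUNURL:   "stun:stun.l.google.com:19302",
	}
	if err := proxy.Start(); err != nil {
		log.Fatal(err)
	}
}
```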
16:07:03 <cohosh> also, does anyone remember the discussion on logging from PTIM the other week?
16:07:23 <cohosh> we have a note on this ticket about looking into modifying how we do logging
16:07:27 <cohosh> from sbs (ooni)
16:07:44 <cohosh> i had another meeting at that time and missed the details
16:07:56 <dcf1> it's still linked under the interesting links on the meeting pad
16:07:59 <dcf1> https://lists.torproject.org/pipermail/tor-project/2021-October/003194.html
16:08:05 <dcf1> http://meetbot.debian.net/tor-meeting/2021/tor-meeting.2021-09-30-16.00.html
16:08:22 <dcf1> https://github.com/Pluggable-Transports/Pluggable-Transports-spec/blob/70bc1c5115639411cf05eec300c52645c174312b/proposals/0011%20-%20Improve%20Logging%20in%20APIs.pdf
16:08:30 <cohosh> oh no i mean sbs's talk at PTIM
16:08:33 <dcf1> is the proposal (candidate for PT 3.0 I think)
16:08:41 <dcf1> oh no, sorry, I didn't see it
16:08:41 <cohosh> about difficulties with snowflake logs
16:08:45 <cohosh> in the ooni tests
16:08:47 <cohosh> okay no worries
16:08:53 <cohosh> i'll follow up with them directly
16:08:55 <meskio> I think I missed it
16:09:15 <cohosh> ok, that's it for me on this subject
16:09:48 <dcf1> next is rebooting the snowflake vpses
16:09:49 <dcf1> https://lists.torproject.org/pipermail/anti-censorship-team/2021-October/000196.html
16:10:08 <dcf1> meskio identified a good time of day to do this
16:10:18 <cohosh> great
16:10:35 <dcf1> maybe I can do this Saturday night
16:11:00 <dcf1> It looks like it's something I have to do in the web configuration panel after shutting down the hosts
16:11:20 <dcf1> an alternative is to do it next monday and have people ready to act if something goes wrong
16:11:44 <dcf1> I think we have things set up so that even an uncontrolled reboot should start everything running properly again, though
16:12:08 <cohosh> yeah
16:12:19 <cohosh> thanks dcf1
16:12:31 <meskio> it might be pretty late for me anyway, so if others will be around I'll be happy not to need to be
16:12:52 <dcf1> ok I'll plan to do it myself on Saturday and report when it's done
16:13:05 <cohosh> sounds good :)
16:13:18 * cohosh was just looking into meskio's other comment on that email
16:13:42 <meskio> dcf1: thanks, good luck
16:14:05 <meskio> my other comment in the email is the next point in the agenda if we have finished with this
16:14:14 <cohosh> yep go for it
16:15:06 <meskio> in grafana it looks like we have had a problem identifying the NAT type of proxies/clients for a couple of days
16:15:42 <meskio> related to that or not, we are seeing a lot of moments where there are no proxies idling and many clients are being denied
16:16:02 <meskio> some spikes of denied clients hit 1.5k at one point
16:16:15 <meskio> the denied clients also have an 'unknown' NAT type
16:16:36 <meskio> and all of that started on the 25th, so three days ago
16:16:44 <meskio> any idea what could be causing it?
16:17:00 <cohosh> yeah, i think you're right that it's an issue with our NAT probe test service
16:17:20 <cohosh> I just sent a screenshot of the grafana plot
16:17:32 <meskio> so, there is a service that is being polled to check the nat type?
16:17:38 <cohosh> yeah
16:17:52 <cohosh> proxies make a connection to it when they start up
16:18:04 <cohosh> and every 24h? after that
16:18:18 <meskio> I see, we need to check if something is happening with it
16:18:24 <cohosh> it's basically a webrtc peer run in its own network namespace on the broker
16:18:33 <meskio> 24h would make sense, seeing how the graph basically changes with a 24h period
16:18:37 <cohosh> set up to have a symmetric NAT
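A minimal sketch of the probe pattern described above, assuming a check at startup followed by a re-check roughly every 24 hours; checkNATType and probeURL are placeholders, not the actual snowflake proxy code.

```go
// Rough sketch of the probe pattern described above: check the NAT test
// service once at startup, then re-check on a 24h timer. checkNATType and
// probeURL are placeholders, not the real snowflake proxy code.
package main

import (
	"log"
	"time"
)

// checkNATType stands in for the real check, which opens a WebRTC
// connection to the probe service (a peer behind a symmetric NAT) and
// classifies the proxy as "restricted", "unrestricted", or "unknown"
// when the check fails.
func checkNATType(probeURL string) (string, error) {
	return "unknown", nil
}

func main() {
	const probeURL = "https://example.net/probe" // placeholder
	update := func() {
		natType, err := checkNATType(probeURL)
		if err != nil {
			log.Printf("NAT check failed: %v", err)
			return
		}
		log.Printf("NAT type: %s", natType)
	}
	update()
	for range time.Tick(24 * time.Hour) {
		update()
	}
}
```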
16:18:58 <cohosh> it looks like it's running
16:19:14 <cohosh> but we did have issues with it using a lot of CPU earlier
16:19:33 <maxbee> sorry I always have such a hard time keeping these straight - what does "symmetric NAT" mean in the language of rfc4787?
16:19:33 <meskio> and maybe we should add an alert on that kind of behaviour (or at least open a ticket about it)
16:19:42 <cohosh> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40039
16:20:12 <cohosh> maxbee: a symmetric NAT has IP- and possibly port-dependent NAT mapping
16:20:36 <dcf1> probetest is using 100% CPU at the moment
16:20:39 <meskio> ahh, I recall probetest now, I did look at the code and try to fix some of the 100% CPU issues
16:20:41 <cohosh> meskio: yeah the idea with prometheus metrics was to slowly add alerts as we discover behaviour we would want to be alerted on, so now is a great time to add it :)
16:21:07 <meskio> if the number of unknown proxies is higher than the number of restricted ones, alert...
16:21:09 <maxbee> cohosh: and the filtering behavior can be any?
16:21:32 <cohosh> maxbee: yeah, though i think it's unusual to have a symmetric NAT without filtering
16:21:53 <maxbee> cohosh: thanks!!!
16:22:04 <cohosh> actually i just saw that pion has a wiki page about it that is also really nice: https://github.com/pion/webrtc/wiki/Network-Address-Translation
16:22:14 <meskio> maxbee: AFAIK from snowflake's perspective there are two kinds of NAT: restricted and unrestricted: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/wikis/NAT-matching#proxy-assignment
16:22:31 <cohosh> i was thinking of updating our page to use this terminology as well
16:22:52 <cohosh> but anyway, it's possible that our NAT check service is just getting overloaded
16:22:59 <cohosh> we do have > 12,000 proxies
16:23:26 <meskio> maybe we need to run more than one, just DNS round robin might work
16:24:08 <cohosh> we have a sort of "fail safe" logic in the proxy code where if it was previously marked as "unrestricted", it won't revert to "unknown", but if it was marked as "restricted" then i think it does
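A minimal sketch of the "fail safe" behaviour described above, assuming the NAT classifications are plain strings; this is not the actual proxy code.

```go
package main

import "fmt"

// updateNATType sketches the "fail safe" described above (not the actual
// proxy code): a proxy previously classified as unrestricted keeps that
// classification when a later probe comes back unknown, while a proxy
// previously classified as restricted falls back to unknown.
func updateNATType(current, probed string) string {
	if probed == "unknown" && current == "unrestricted" {
		return current // keep the known-good result
	}
	return probed
}

func main() {
	fmt.Println(updateNATType("unrestricted", "unknown")) // unrestricted
	fmt.Println(updateNATType("restricted", "unknown"))   // unknown
}
```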
16:24:20 <dcf1> one thing to try is restarting probetest and seeing if it goes back up to 100% immediately
16:24:29 <dcf1> or takes time to get there
16:24:39 <cohosh> good idea
16:24:42 <meskio> sounds good
16:25:52 <dcf1> so we have a few items here
16:26:05 <dcf1> install some kind of monitoring for the NAT type assignment
16:26:23 <dcf1> restart probetest and watch it for a few days
16:26:48 <dcf1> do we need to do the prometheus sensor first, or just watch probetest manually?
16:27:21 <cohosh> you mean for CPU uage?
16:27:25 <cohosh> *usage
16:27:36 <dcf1> or for how NAT assignments are being done
16:27:40 <cohosh> ah we have that already
16:27:46 <cohosh> all we're missing is an auto alert
16:28:00 <dcf1> it seems probetest gets pushed into a failure state (which is what we want to monitor for)
16:28:06 <cohosh> it's displayed in a graph-based dashboard
16:28:16 <dcf1> that's what I mean, sorry, installing an alert for the existing monitoring
16:28:32 <cohosh> ah :) i don't think we need an alert before restarting it, no
16:29:01 <cohosh> it's pretty easy to monitor at the dashboard if we're already looking for it
16:29:24 <dcf1> I'll make a ticket for this and link #40039
16:29:59 <cohosh> thanks
16:30:31 <cohosh> and good catch meskio!
16:30:58 <meskio> it was lucky that I opened grafana and it looked pretty weird
16:31:28 <cohosh> yea
16:31:42 <cohosh> i had just checked earlier this week
16:31:46 <meskio> who can do the alertmanager config? do we have access to that machine? or do we need to ask the metrics team?
16:31:53 <cohosh> oh we can do it
16:32:09 <cohosh> i set it up with anarcat during the last hackweek so that all we need to do is make an MR
16:32:28 <meskio> ahh, cool, so the config file is in a repo
16:32:42 <meskio> I can do that, I've never touched alertmanager, but it is on my list of things to learn
16:32:50 <cohosh> https://gitlab.torproject.org/tpo/tpa/prometheus-alerts
16:32:58 <cohosh> sure
16:33:58 <cohosh> awesome :)
16:34:05 <cohosh> anything else before reading group?
16:34:32 <meskio> not from my side
16:36:02 <cohosh> okay let's start the discussion on this week's reading :)
16:36:59 <cohosh> we decided to discuss "Characterizing Transnational Internet Performance and the Great Bottleneck of China"
16:37:20 <cohosh> by Zhu et al.
16:37:36 <cohosh> https://censorbib.nymity.ch/#Zhu2020a
16:37:54 <cohosh> anyone have a quick summary they'd like to share?
16:38:34 <dcf1> so generally the paper is about measuring "transnational" Internet performance
16:38:52 <dcf1> where a transnational path begins in one country and ends in another
16:39:21 <dcf1> they did an experiment with 29 countries downloading from each other pairwise and found abnormally slow performance in 5 African countries and in China
16:39:44 <dcf1> the measurements from China were especially weird so most of the paper is investigating specifically that
16:40:16 <dcf1> they design several experiments aiming at testing the two hypotheses:
16:40:35 <dcf1> 1: the slow performance in China is caused by or related to Great Firewall censorship
16:41:06 <dcf1> 2: the slow performance in China is caused by underprovisioning of transnational links (perhaps deliberate, in order to give an advantage to domestic Chinese services)
16:41:47 <dcf1> By my reading, I don't think there was a clear conclusion as to the cause
16:42:12 <dcf1> but it is not clearly GFW-related, and the dynamics of the slowdown are consistent in some ways with congestion
16:42:28 <cohosh> yeah, they also noted that the two possible causes are not necessarily unrelated
16:42:48 <dcf1> one part of the experiments is to identify which routers are the bottleneck nodes
16:43:15 <dcf1> 70% of the bottlenecks are interior to China (not on the border); 28% are within one hop of the border
16:43:19 <dcf1> (Fig. 13)
16:43:58 <dcf1> but they tried some circumvention protocols and did not see them treated any differently, nor did TCP, UDP, ICMP seem to make a difference
16:44:14 <dcf1> however the elevated packet loss occurred in only one direction: inbound to China
16:45:02 <meskio> it's pretty interesting how it doesn't apply to hong kong, which makes it a great place for proxies, or in our case bridges :)
16:45:40 <meskio> it doesn't apply == there are no network performance problems in the connection to hong kong or connecting from hong kong to abroad
16:46:50 <dcf1> Hong Kong had no slowdowns when accessing data from the rest of the world... that it also has much less frequent slowdowns when being accessed by nodes in mainland China.
16:46:59 <dcf1> "... India, Japan, and Korea are the next best senders (relatively speaking), presumably because of their physical proximity to China, though they still suffer from 4 to 8 hours on average daily."
16:48:40 <cohosh> yeah it could be that HK has good performance, but i think the GFW censorship is being more heavily applied there, so i'm not sure how viable running bridges there is :-S
16:49:25 <dcf1> Yeah I want to say that there were once some users looking for SNI proxies (https://www.bamsoftware.com/computers/sniproxy/) especially in HK, because of better performance (circa 2016)
16:49:45 <dcf1> However the page where I remember reading that, https://github.com/phuslu/goproxy/issues/853, is gone now and I neglected to archive it
16:50:24 <meskio> ohh, I was expecting the GFW to be less powerful in HK :(
16:50:55 <dcf1> one of the snowflake infra machines was hosted in HK until greenhost had to give up that data center
16:51:53 <dcf1> Oh actually you can still see "hk" in some of the hostnames copied from the github issue https://www.bamsoftware.com/computers/sniproxy/#goproxy
16:53:47 <dcf1> so what's the main observation of this research, for us? that the evidence is against the network bottleneck being caused by overloading of GFW nodes?
16:54:11 <cohosh> i was interested because of snowflake performance testing
16:54:18 <dcf1> because I think that GFW interference is the first thing that would jump to anyone's mind
16:54:27 <dcf1> oh good call
16:55:41 <cohosh> snowflake performance is already limited by proxies run on home networks with slow upload speeds
16:56:06 <cohosh> i have been doing reachability and throughput tests from china
16:56:27 <cohosh> and sometimes it's hard to tell if it's blocked
16:56:31 <cohosh> or just performing too badly
16:57:32 <cohosh> this chart is a bit hard to see but: https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/32657#note_2728218
16:57:44 <meskio> it looks like timing the tests during the chinese night might help avoid being affected by the network bottleneck
16:58:03 <cohosh> most tor connections do not bootstrap to 100% within a 3 minute timeout
16:58:26 <cohosh> which is bad news for users who are actually trying to use this
16:58:51 <meskio> :(
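A rough sketch of the kind of bootstrap check behind those numbers: poll tor's control port until it reports PROGRESS=100 or a 3-minute deadline passes. It assumes a local tor with ControlPort 9051 and no control-port password, which is an assumption about the test setup rather than a description of the actual test harness.

```go
// Rough sketch of a bootstrap check like the one described above: poll the
// tor control port for status/bootstrap-phase until PROGRESS=100 or a
// 3-minute deadline passes. Assumes ControlPort 9051 with no password.
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
	"time"
)

func bootstrapped() (bool, error) {
	conn, err := net.Dial("tcp", "127.0.0.1:9051")
	if err != nil {
		return false, err
	}
	defer conn.Close()
	r := bufio.NewReader(conn)
	fmt.Fprintf(conn, "AUTHENTICATE \"\"\r\n")
	if line, _ := r.ReadString('\n'); !strings.HasPrefix(line, "250") {
		return false, fmt.Errorf("authentication failed: %s", line)
	}
	fmt.Fprintf(conn, "GETINFO status/bootstrap-phase\r\n")
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return false, err
		}
		if strings.Contains(line, "PROGRESS=100") {
			return true, nil
		}
		if strings.HasPrefix(line, "250 OK") {
			return false, nil
		}
	}
}

func main() {
	deadline := time.Now().Add(3 * time.Minute)
	for time.Now().Before(deadline) {
		if ok, err := bootstrapped(); err == nil && ok {
			fmt.Println("bootstrapped to 100%")
			return
		}
		time.Sleep(5 * time.Second)
	}
	fmt.Println("did not reach 100% within 3 minutes")
}
```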
16:58:58 <cohosh> i also double checked some of the global network data the shadow simulation software uses for their tests
16:59:19 <cohosh> from this paper: https://tmodel-ccs2018.github.io/
16:59:48 <cohosh> they ran ping tests between RIPE atlas nodes to get an idea of the *latency* of transnational links
17:00:14 <cohosh> and then used speedtest.net data to determine typical upload/download throughput for nodes in the simulation
17:00:27 <cohosh> for example: https://www.speedtest.net/global-index/china#fixed
17:00:54 <cohosh> in a spot check, i saw that the latency of a CN-US link in their dataset was not very different from the latency of a US-JP link
17:00:58 <cohosh> which is suspicious
17:01:49 <cohosh> i didn't do a thorough check though
17:03:20 <dcf1> "China Telecom Global’s official website explicitly claims four tiers of services to connect to Chinese users. (1) China Access, (2) ChinaNet Paid-Peer, (3) Global Transit (GT), (4) Global Internet Access (GIA)."
17:03:30 <dcf1> I think I found the page for these tiers:
17:03:33 <dcf1> https://www.chinatelecomglobal.com/expertise?category=product-and-services&subcategory=internet&pid=gis
17:04:04 <dcf1> "the first three share the same point-of-presence or international gateway and therefore similar potential bottleneck, while Global Internet Access has a different dedicated CN2 international gateway."
17:04:14 <dcf1> possibly speedtest.net is using one of these prioritized links?
17:04:29 <cohosh> hmm, or speedtest.net is using a server in each country
17:04:55 <cohosh> i think it chooses the closest server out of a set
17:05:25 <cohosh> i'm more suspicious about the RIPE atlas latency tests
17:07:15 <cohosh> https://www.speedtest.net/speedtest-servers
17:08:05 <cohosh> but yeah, maybe the RIPE nodes were using a prioritized link
17:09:30 <dcf1> an interesting design choice is that they capped the curl download speed to 4 Mbps
17:09:53 <dcf1> which makes for easily observable features in e.g. Fig 10 (page 14)
17:10:31 <dcf1> their logic is that they were mainly interested in slow speeds, and therefore did not run the links as fast as possible
17:11:04 <dcf1> but it also means that the graphs are missing potentially interesting data for what happens above 4 Mbps
17:11:33 <dcf1> Fig 10(b) looks like turning on and off a faucet, but we don't really know what the shape of the data would be in the brief periods when it was fast
17:11:55 <cohosh> anything you have in mind that could be learned from that data?
17:12:25 <dcf1> i'm wondering, is the nature of the "slowdown hours" discrete or continuous?
17:12:40 <cohosh> ah
17:12:53 <dcf1> does performance keep ramping up smoothly to a maximum outside the slowdown hours, or does it reach a flat plateau and stay there?
17:13:07 <meskio> having 600Mbps at home, 4Mbps sounds tiny, but I guess it's enough for a 'decent' web browsing experience
17:13:56 <dcf1> I think designing their experiment that way was actually a pretty clever decision
17:14:43 <dcf1> Like in Fig. 1 you see almost all pairs glued to 4 Mbps, except for the anomalous ones they noted.
17:15:05 <dcf1> If those fast links were all floating on the graph at different speeds, it would not be so clear
17:15:36 <dcf1> I'm wondering if you could achieve a similar effect by measuring to a higher cap, and then applying a maximum before making the graphs
17:17:46 <dcf1> I think that's all my notes
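The paper's measurements capped curl downloads at 4 Mbps; below is a rough sketch of a similarly capped download measurement in Go, using golang.org/x/time/rate as the throttle. The authors used curl, so this Go version is only an analogy, and the target URL is a placeholder.

```go
// Rough sketch of a capped download measurement in the spirit of the
// paper's 4 Mbps curl limit (the authors used curl; this Go version and
// the URL are illustrative only). It throttles reads to ~4 Mbps and
// reports the average goodput actually achieved.
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	const capBytesPerSec = 4_000_000 / 8 // 4 Mbps expressed in bytes/s
	limiter := rate.NewLimiter(rate.Limit(capBytesPerSec), 64*1024)

	resp, err := http.Get("https://example.com/testfile") // placeholder URL
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	start := time.Now()
	buf := make([]byte, 32*1024)
	var total int64
	for {
		n, err := resp.Body.Read(buf)
		if n > 0 {
			total += int64(n)
			// Block until the limiter allows another n bytes.
			if werr := limiter.WaitN(context.Background(), n); werr != nil {
				panic(werr)
			}
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
	}
	secs := time.Since(start).Seconds()
	fmt.Printf("downloaded %d bytes in %.1fs (%.2f Mbps)\n",
		total, secs, float64(total)*8/secs/1e6)
}
```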
17:18:52 <dcf1> Oh, potentially related to the observation about RIPE Atlas:
17:18:52 <cohosh> yeah, i'm looking for studies on whether the new security bill is being heavily applied in HK
17:19:00 <dcf1> "We cross verify our results with M-Lab’s NDT tests, which collect the China’s transnational link speed since the beginning of 2019."
17:19:04 <cohosh> and all i see are news reports, but no extensive studies
17:19:12 <dcf1> "In 64% of the 75,464 tests (with each test lasting 9 to 60 seconds), the download speed was less than 500 kbps, which generally accords with our finding of broad and severe slowdowns."
17:19:42 <cohosh> oh thanks dcf1
17:22:28 <cohosh> okay anything else?
17:22:53 <cohosh> i really liked this paper :)
17:23:19 <meskio> yes, it was fun to read, I like how it was building up
17:26:17 <dcf1> If we're looking for another one, I'm planning to read soon the Geneva "Come as you are" paper on server-side evasion https://geneva.cs.umd.edu/papers/come-as-you-are.pdf
17:26:37 <dcf1> Or another short alternative might be https://dl.acm.org/doi/10.1145/3473604.3474560 "Measuring QQMail's automated email censorship in China"
17:27:03 <cohosh> nice! i'm down to read/discuss both :)
17:27:18 <dcf1> honestly it's a bit rude of authors to keep writing new papers before we've finished reading the old ones
17:27:30 <cohosh> lol XD
17:27:39 <cohosh> we discussed geneva in the past: https://gitlab.torproject.org/tpo/anti-censorship/team/-/wikis/Anti-censorship-reading-group
17:27:45 <cohosh> but not the server-side evasion paper
17:28:02 <dcf1> okay let's plan to do the QQMail one
17:28:11 <cohosh> cool
17:28:16 <cohosh> in two weeks?
17:28:17 <dcf1> I can read the server-side one and find out if we want to spend another session on it
17:28:55 <dcf1> that works for me, Armistice Day
17:29:12 <meskio> sounds good, we have the next paper
17:30:25 <cohosh> awesome, i'll close the meeting here then :)
17:30:30 <cohosh> #endmeeting