16:00:27 <onyinyang> #startmeeting tor anti-censorship meeting
16:00:27 <MeetBot> Meeting started Thu Jun 26 16:00:27 2025 UTC.  The chair is onyinyang. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:27 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:42 <meskio> hello
16:01:48 <meskio> while people update the pad a reminder:
16:02:12 <meskio> Next week is the mid-year Tor break, so all of us will be (mostly) AFK and there will be no meeting
16:02:26 <meskio> next meeting is July 11
16:02:30 <meskio> is it 11th?
16:03:06 <meskio> yes, the 11th; I won't be around that week either
16:03:23 <shelikhoo> hi~hi~
16:04:51 <cohosh> hi
16:06:14 <onyinyang> ok, let's start
16:06:40 <onyinyang> We have a couple of discussion points and we'll end with the reading group :)
16:06:54 <onyinyang> The first discussion point is on the Iran network shutdown
16:07:07 <meskio> I guess we should update the title
16:07:12 <meskio> as the network is coming back
16:07:22 <meskio> and we can see how snowflake is being overloaded
16:07:25 <meskio> again
16:08:07 <meskio> and now we have access to a vantage point in the country, so maybe we can investigate whether they are blocking it by fingerprint or by listing proxies
16:08:29 <shelikhoo> yes, I have yet to check that vantage point...
16:08:50 <meskio> I ssh'd into it and it was slow but worked, but I didn't have time to run anything from there
16:08:56 <cohosh> the broker is very overloaded
16:09:07 <cohosh> or perhaps just the proxy pool
16:10:05 <cohosh> the restricted proxy pool is totally used up and it looks like we have an equal number of matched vs idle proxies overall
16:10:53 <cohosh> but the bridge also shows that the number of currently connected clients has almost doubled since the 20th: https://metrics.torproject.org/userstats-bridge-transport.html?start=2025-06-01&end=2025-06-26&transport=snowflake
16:11:50 <meskio> and 10k of those ~35k are from Iran: https://metrics.torproject.org/userstats-bridge-combined.html?start=2025-03-28&end=2025-06-26&country=ir
16:11:56 <onyinyang> woah
16:11:59 <cohosh> it could be that it isn't blocked, and difficulty connecting is due to proxy pool and broker capacity
16:13:06 <meskio> but we've seen more users in the past
16:13:14 <shelikhoo> or maybe everyone just got the internet back and needs to access the network at the same time
16:13:28 <shelikhoo> let's say someone checks email once per week
16:13:30 <meskio> now we are at 35k; we hit 90k in 2023: https://metrics.torproject.org/userstats-bridge-transport.html?start=2021-03-01&end=2025-06-26&transport=snowflake
16:13:46 <cohosh> ah good point
16:14:06 <shelikhoo> when network is back everyone will check email at the same time
16:14:07 <shelikhoo> over
16:14:38 <meskio> I agree, everybody is trying to connect at once from Iran, but I also have the feeling that there is something happening there depleting our proxy pool
16:15:11 <cohosh> the broker does seem disproportionately overwhelmed
16:15:38 <ggus> re: snowflake lack of proxies: i believe the ac-team needs to feed the comms team more specific needs; just saying 'we need more proxies' is not very engaging or appealing for potential volunteers outside our community, and right now we need more volunteers outside the tor community. is there a number for how many proxies are needed, or how many snowflake users we have in iran right now, or even how many
16:15:47 <ggus> users in iran are being rejected because of a lack of proxies? i can create a ticket about finding new volunteers, but i'm lacking that information from the ac-team.
16:15:52 <dcf1> the broker seems ok on resource consumption currently, though I will put it back to the configuration it was at after this month so it remains under the credit
16:16:40 <ggus> that said, i asked arturo to post a call for snowflake proxies in our social channels: https://mastodon.social/@torproject/114745109066226826
16:17:55 <meskio> ggus: you lack information from us because we don't know exactly what is happening
16:18:09 <meskio> we need to investigate it, and we couldn't do it without a vantage point
16:18:17 <meskio> that is why we've been so vague
16:18:33 <meskio> thanks for the call for proxies
16:19:16 <shelikhoo> yes! we do need more proxies
16:19:42 <meskio> but it was not even clear whether more standalone proxies would help or create more problems...
16:19:54 <cohosh> ggus: yes, thanks. i think a ticket that we can update as we get more information would be best
16:20:56 <cohosh> more info to potential volunteers is a good idea
16:21:34 <cohosh> it's unfortunate our scraped prometheus metrics aren't publicly available; that's the best source of info we have on real-time proxy pool usage
16:21:57 <cohosh> but we could maybe give updated screenshots to show the number of client polls that are matched
16:22:31 <cohosh> there are also some things we could do with the snowflake-stats metrics in CollecTor
16:24:11 <ggus> ^yeah, i think screenshots to show the number of client polls is a good idea; it's like when you have a crowdfunding campaign and you see the project reaching the goal in a status bar
16:25:32 <meskio> I plan to be partially online until next Thursday; I can help produce those screenshots if someone is going to be around to use them
16:27:46 <ggus> https://gitlab.torproject.org/tpo/community/relays/-/issues/116
16:28:28 <ggus> cohosh: we don't know whether the clients rejected because of a lack of proxies are from iran or another country, right?
16:29:10 <meskio> I'll keep this ticket updated until I go AFK
16:29:38 <shelikhoo> thanks
16:29:46 <cohosh> ggus: from prometheus we do have country information for client polls
16:30:55 <meskio> is this from the new broker metrics change?
16:31:05 <meskio> or have I missed it the whole time?
16:31:20 <cohosh> no, it's been there the whole time
16:31:25 <meskio> wow, nice
16:31:25 <cohosh> i think, let me check heh
16:31:44 <meskio> I can update the dashboard so we have a country selector
16:32:03 <cohosh> yeah https://snowflake-broker.torproject.net/prometheus
16:32:24 <cohosh> snowflake_rounded_client_poll_total has a "cc" field
16:32:51 <meskio> :)
16:33:16 <onyinyang> anything else on this topic?
16:33:32 <shelikhoo> nothing from me
16:33:40 <meskio> after the break I hope someone can investigate the situation in Iran
16:33:54 <meskio> if I find the time I'll have a look and document whatever I find
16:34:31 <meskio> nothing more from me
16:35:03 <onyinyang> hopefully next week will be uneventful o_o
16:35:41 <onyinyang> anyway, the next topic is the webtunnel bridges block in Russia
16:36:04 <meskio> for a couple of days there have been many user reports saying that webtunnel bridges are being blocked in Russia
16:36:10 <shelikhoo> I have tested this and found the block was based on SNI
16:36:35 <meskio> so the censor is listing bridges and blocking them
16:36:37 <shelikhoo> so, it is likely that censor has collected bridge lines and blocked their SNI name
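An SNI-based block of the kind shelikhoo describes only needs the `server_name` extension (RFC 6066) from the first TLS record. A minimal censor-side sketch, assuming the whole ClientHello arrives in one buffer and the censor holds a set of harvested bridge domains; `build_client_hello`, `extract_sni`, `sni_blocked` and the domain names are hypothetical.

```python
import struct


def build_client_hello(sni: str) -> bytes:
    """Minimal TLS ClientHello record with a server_name extension (test input only)."""
    name = sni.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name        # name_type 0 = host_name
    server_name_list = struct.pack("!H", len(entry)) + entry
    sni_ext = struct.pack("!HH", 0x0000, len(server_name_list)) + server_name_list
    body = (
        b"\x03\x03"                              # legacy_version (TLS 1.2)
        + bytes(32)                              # random
        + b"\x00"                                # empty legacy_session_id
        + struct.pack("!H", 2) + b"\x13\x01"     # one cipher suite
        + b"\x01\x00"                            # one compression method: null
        + struct.pack("!H", len(sni_ext)) + sni_ext
    )
    handshake = b"\x01" + struct.pack("!I", len(body))[1:] + body  # type 1 = ClientHello
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


def extract_sni(record: bytes):
    """Return the host_name of the ClientHello in `record`, or None (no reassembly)."""
    try:
        if record[0] != 0x16 or record[5] != 0x01:
            return None                          # not a handshake record / not a ClientHello
        pos = 5 + 4 + 2 + 32                     # record hdr, handshake hdr, version, random
        pos += 1 + record[pos]                   # legacy_session_id
        (cs_len,) = struct.unpack_from("!H", record, pos)
        pos += 2 + cs_len                        # cipher_suites
        pos += 1 + record[pos]                   # compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0x0000 and record[pos + 2] == 0:  # server_name, host_name entry
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        pass                                     # truncated or malformed input
    return None


def sni_blocked(record: bytes, blocklist: set) -> bool:
    """Censor-side decision: block if the ClientHello names a listed domain."""
    sni = extract_sni(record)
    return sni is not None and sni in blocklist
```

Because the decision looks only at the SNI, a client that puts a different name there is not matched, which is consistent with the altered-SNI connections dcf1 asks about next.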
16:36:54 <meskio> makes sense; it's even surprising it took them so long to do it
16:37:03 <shelikhoo> yes, listing bridge is my assumption
16:37:05 <dcf1> because connecting to the bridge with an altered SNI works, correct?
16:37:15 <meskio> yes, people are doing that
16:37:41 <meskio> there is a modified version of webtunnel that does domain fronting
16:37:52 <meskio> and there are reports that it works for half of the bridges
16:38:53 <ggus> i think it's worth sharing the context that this block is on top of the obfs4 block (and that's why we did a webtunnel campaign in december 2024 - feb 2025), so mobile tor users are practically blocked
16:41:15 <meskio> I assume the bridges that are not working with this domain fronting patch might be because their webserver rejects requests with an SNI they don't host
16:41:26 <meskio> what surprises me is that it actually works for many
16:41:56 <meskio> people are just using google.com or youtu.be as the SNI
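The workaround described here separates two names that normally coincide: the SNI in the ClientHello (a decoy such as google.com) and the HTTP `Host` header (the bridge's real domain), while still connecting to the bridge's own address. A minimal sketch with Python's `ssl` module; the request headers are an illustrative assumption, not webtunnel's exact wire format, and `fronted_upgrade_request`, `open_fronted` and the `pin_check` callback are hypothetical.

```python
import socket
import ssl


def fronted_upgrade_request(real_host: str, path: str) -> bytes:
    """HTTP/1.1 upgrade request whose Host header names the real bridge domain.

    Header set is an assumption for illustration; a real WebSocket handshake
    also needs Sec-WebSocket-Key and Sec-WebSocket-Version.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {real_host}",
        "Connection: Upgrade",
        "Upgrade: websocket",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def open_fronted(addr, front_sni, real_host, path, pin_check=None):
    """Connect to the bridge at `addr`, showing `front_sni` in the ClientHello.

    The bridge's certificate will not match front_sni, so hostname checks are
    disabled; pin_check (HYPOTHETICAL, takes the DER certificate) is what would
    authenticate the server. Without it this TLS layer is unauthenticated.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False           # must be cleared before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    raw = socket.create_connection(addr)
    tls = ctx.wrap_socket(raw, server_hostname=front_sni)       # SNI = decoy name
    if pin_check is not None and not pin_check(tls.getpeercert(binary_form=True)):
        tls.close()
        raise ssl.SSLError("bridge certificate does not match the pin")
    tls.sendall(fronted_upgrade_request(real_host, path))       # Host = bridge domain
    return tls
```

For example, `open_fronted(("203.0.113.10", 443), "www.google.com", "bridge.example.net", "/hypothetical-path")` (documentation address, made-up names). This also shows why certificate pinning and a configurable `Host` header come up later in the discussion: once the SNI is a decoy, those are what still tie the connection to the intended bridge.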
16:44:16 <shelikhoo> actually, in the background I have been developing a protocol that could bypass tls sni-based blocking while still looking like tls
16:44:32 <shelikhoo> but it is still in development and has not reached a working stage yet
16:44:44 <shelikhoo> but I think this event does make something like this more important
16:44:53 <meskio> :)
16:45:20 <onyinyang> indeed :)
16:45:31 <onyinyang> are there any other actions we can take in the meantime, or anything else to discuss on this topic?
16:46:15 <shelikhoo> nothing from me other than we should keep monitoring the situation
16:46:20 <meskio> not from me, I'll try to look more into this and see if I have any concrete proposals
16:46:27 <shelikhoo> and maybe ingest the patch into main
16:46:46 <shelikhoo> if it does cover a use case unsupported by our main branch
16:47:03 <meskio> some of the things in the patch are already in lyrebird, like uTLS support
16:47:18 <meskio> or it is adding cert pinning that you have already mostly done
16:47:37 <meskio> but the host http header setting should be included, I agree
16:47:38 <shelikhoo> maybe it is cert pinning? I think it might still be missing in lyrebird
16:47:48 <shelikhoo> yes, and the http header setting
16:48:26 <meskio> yes
16:48:32 <shelikhoo> that's all from me on this topic
16:48:49 <onyinyang> ok well. . .that leaves 10 min for the reading group, which isn't very much
16:49:11 * meskio can do another 10min of overtime if needed, but not way more
16:49:44 <shelikhoo> I am happy to push it a week as well, if this paper is important
16:49:49 <shelikhoo> and need more discussion
16:49:58 <onyinyang> I'm good with either
16:50:41 <meskio> I think it is an interesting paper, but maybe there is not much to discuss, as not many things in it affect us
16:50:49 <onyinyang> I think probably we need more than 10 minutes, so if we are all ok with extending the discussion by 10 extra minutes, let's go ahead
16:50:55 <onyinyang> otherwise, let's push it to next time?
16:51:35 <meskio> let's do it now
16:52:03 <meskio> my quick TL;DR is: the authors found two things:
16:52:29 <meskio> * Chinese censorship is no longer centralized at the edge of the country; now there is one region with its own extra firewall
16:52:42 <dcf1> https://gfw.report/publications/sp25/en/
16:52:57 <meskio> * the GFW is not perfectly bidirectional and some things are only censored in outgoing connections
16:53:21 <meskio> the paper mostly looks into web blockades
16:53:41 <meskio> did I miss any interesting key points?
16:54:08 <shelikhoo> one interesting thing I wish to discuss is the censor's limitations
16:54:16 <dcf1> the partial bidirectionality is interesting. they say it was first observed by GFWeb, 2024 https://censorbib.nymity.ch/#Hoang2024a
16:54:21 <shelikhoo> like it assumes the tcp header length is 20 bytes
16:54:37 <shelikhoo> and is unable to process fragmentation
16:54:54 <dcf1> also the fact that the Henan firewall seems totally different, technically, than the GFW. like its blocking behavior and network fingerprinting is not even close.
16:55:01 <onyinyang> I'm not sure this was ever _not_ the case. I'm not sure I read it correctly, but I understood it more as: ignoring regional differences in censorship may miss a lot
16:55:03 <shelikhoo> both of them can give us some hint about how to avoid this censorship in unprivileged userspace
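As one illustration of evasion from unprivileged userspace: if the regional firewall does not reassemble a ClientHello split across TCP segments (an assumption drawn from the fragmentation point above, not something tested here), a client could cut its first record so the SNI straddles two segments using only an ordinary socket. `split_around` and `send_segmented` are hypothetical helpers.

```python
import socket


def split_around(payload: bytes, needle: bytes):
    """Cut `payload` in two so that `needle` (len >= 2) straddles the cut."""
    i = payload.find(needle)
    if i < 0 or len(needle) < 2:
        return [payload]                 # nothing to split around
    cut = i + len(needle) // 2
    return [payload[:cut], payload[cut:]]


def send_segmented(sock: socket.socket, payload: bytes, needle: bytes):
    """Write each chunk separately with Nagle's algorithm disabled.

    TCP_NODELAY makes separate segments likely, not guaranteed: the kernel may
    still coalesce writes, e.g. when data is queued behind a full window.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    for chunk in split_around(payload, needle):
        sock.sendall(chunk)
```

Cutting inside the hostname, rather than at a fixed offset, means the name never appears whole in any single segment, so a middlebox that matches per packet sees neither half as a listed domain.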
16:55:08 <dcf1> yeah, like those qualities for example.
16:55:16 <dcf1> and the TCP header length = 20 thing is so so weird
16:56:06 <shelikhoo> onyinyang: yes, there are also different levels of censorship for different ISPs
16:56:20 <dcf1> Figure 3 shows cross-province connections as well as international connections https://gfw.report/publications/sp25/en/#fig:3-client-to-sink-server-data-matrix
16:56:51 <dcf1> at least according to this, regional firewalls is not something widespread, but only in Henan. (with the caveat that they say they were not able to test all provinces)
16:57:18 <onyinyang> one thing that struck me as strange/suspicious was that they used a different vps for henan only; it would have been interesting if they had used the same vps and/or multiple vpses to compare behaviour
16:57:25 <onyinyang> but I also don't know if this would have mattered at all
16:57:32 <dcf1> The TCP header of 20 bytes thing is especially weird considering a further experiment they did: "the Henan Firewall did parse the TCP header length field in the TCP header, but had a condition to only block a connection when its TCP header length is 20 bytes."
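The condition quoted here reads the TCP data offset field (upper 4 bits of byte 12, counted in 32-bit words, per the TCP header layout in RFC 9293). A minimal sketch of that check, plus a helper that builds headers with and without the timestamp option (kind 8, length 10, defined since RFC 1323) to show how options move a segment out of the 20-byte case; all function names are hypothetical.

```python
import struct


def tcp_header_length(segment: bytes) -> int:
    """TCP header length in bytes: the 4-bit data offset counts 32-bit words."""
    return (segment[12] >> 4) * 4


def henan_style_candidate(segment: bytes) -> bool:
    """The behavior quoted above: only segments with a 20-byte header (no options)."""
    return tcp_header_length(segment) == 20


def build_tcp_header(options: bytes = b"", flags: int = 0x18) -> bytes:
    """Minimal TCP header (checksum left at 0); options are NOP-padded to 4 bytes."""
    options += b"\x01" * (-len(options) % 4)
    data_offset = (20 + len(options)) // 4
    return struct.pack(
        "!HHIIBBHHH",
        40000, 443,          # source / destination port
        1, 1,                # sequence / acknowledgment number
        data_offset << 4,    # data offset in the high nibble, reserved bits 0
        flags,               # 0x18 = PSH|ACK, typical for a data segment
        65535, 0, 0,         # window, checksum, urgent pointer
    ) + options


# Timestamp option (kind 8, length 10): TSval and TSecr, 4 bytes each.
TS_OPTION = b"\x08\x0a" + struct.pack("!II", 123456, 654321)
```

A stack that puts timestamps on every segment carries 12 bytes of options (10 for the option plus 2 of padding), so its ClientHello segment has a 32-byte header and, under this reading, would not be acted on.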
16:57:36 <shelikhoo> actually there are also reports of regional firewalls in other regions
16:58:15 <shelikhoo> however, henan is the one that is easier to publish a paper on
16:58:22 <dcf1> hmm, ok
16:58:40 <shelikhoo> since in other place like Fujian, getting a vps with regional censorship is much harder
16:59:00 <shelikhoo> and typically requires a real residential network
16:59:18 <shelikhoo> which has ethical concerns
16:59:21 <shelikhoo> so...
16:59:35 <meskio> the paper clearly states that they avoided that so as not to put people at risk
16:59:42 <shelikhoo> yes
17:00:23 <onyinyang> ah ok
17:01:16 <shelikhoo> but otherwise that 20 byte tcp header assumption is a very interesting point as well
17:01:43 <shelikhoo> I suspect that this works mostly fine for censoring windows machine's traffic
17:02:08 <shelikhoo> which works well enough, at least for people checking how the censorship is working
17:02:39 <dcf1> shelikhoo: what makes you think it is specific to windows?
17:03:00 <dcf1> this is the Nmap TCP/IP fingerprint database: https://svn.nmap.org/nmap/nmap-os-db
17:03:12 <dcf1> The `O=` fields record TCP options: https://nmap.org/book/osdetect-methods.html#osdetect-o
17:03:38 <dcf1> The `%` is a delimiter, so you can search for OSes with empty TCP options by searching for `%O=%`
17:03:40 <shelikhoo> I downloaded a few pcap files captured from Windows machines, and their first payload packet has no options
17:04:17 <shelikhoo> I didn't say it is specific to windows
17:04:27 <shelikhoo> I just said it works well enough for the censor
17:04:59 <dcf1> "this works mostly fine for censoring windows machine's traffic" I'm curious if you know that windows tends to use zero TCP options sometimes, or something like that
17:05:30 <shelikhoo> no... I just inspected a few downloaded pcap from wireshark's website
17:05:45 <shelikhoo> https://gitlab.com/wireshark/wireshark/-/wikis/uploads/__moin_import__/attachments/SampleCaptures/smb-on-windows-10.pcapng
17:05:46 <dcf1> In nmap-os-db, look at the OPS line "contains the TCP options received for each of the probes (the test names are O1 through O6)"
17:05:54 <shelikhoo> https://gitlab.com/wireshark/wireshark/-/wikis/uploads/__moin_import__/attachments/SampleCaptures/nspi.pcap
17:06:05 <dcf1> E.g.
17:06:09 <dcf1> Fingerprint Microsoft Windows 10 1607 - 11 23H2
17:06:12 <dcf1> OPS(O1=MFFD7NW8ST11%O2=MFFD7NW8ST11%O3=MFFD7NW8NNT11%O4=MFFD7NW8ST11%O5=MFFD7NW8ST11%O6=MFFD7ST11)
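The OPS string above can be decoded with the option letters from the Nmap OS-detection documentation linked earlier: L = end of option list, N = NOP, M = MSS, W = window scale, S = SACK permitted, T = timestamp followed by two 0/1 flags saying whether TSval and TSecr are nonzero. A minimal sketch that reads the M and W arguments as hex; `decode_options` and `parse_ops` are hypothetical names.

```python
import re

# Option letters as documented for Nmap's O tests.
TOKEN_RE = re.compile(r"L|N|S|M[0-9A-F]+|W[0-9A-F]+|T[01]{2}")
SIMPLE = {"L": ("EOL", None), "N": ("NOP", None), "S": ("SACK", None)}


def decode_options(field: str):
    """Decode an Nmap options string such as 'MFFD7NW8ST11' into (name, value) pairs."""
    out = []
    pos = 0
    while pos < len(field):
        m = TOKEN_RE.match(field, pos)
        if m is None:
            raise ValueError(f"unexpected option text at {pos}: {field[pos:]!r}")
        tok = m.group(0)
        if tok[0] == "M":
            out.append(("MSS", int(tok[1:], 16)))
        elif tok[0] == "W":
            out.append(("WS", int(tok[1:], 16)))
        elif tok[0] == "T":
            out.append(("TS", tok[1:]))      # e.g. "11": TSval and TSecr both nonzero
        else:
            out.append(SIMPLE[tok])
        pos = m.end()
    return out


def parse_ops(line: str):
    """Split 'OPS(O1=...%...%O6=...)' into {'O1': [...], ..., 'O6': [...]}."""
    inner = line[line.index("(") + 1:line.rindex(")")]
    result = {}
    for item in inner.split("%"):
        key, _, value = item.partition("=")
        result[key] = decode_options(value)
    return result
```

Decoding the Windows 10 1607 - 11 23H2 line shows an MSS option and a timestamp option in all six entries. These entries describe the target's replies to Nmap's SYN probes, so they say what the stack puts on SYN/ACKs, not necessarily on later data segments.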
17:06:43 <dcf1> This is the part that doesn't make sense to me. Things like TCP timestamps are ubiquitous on all but the tiniest network stacks, at least according to my understanding.
17:07:11 <dcf1> I would have thought that limiting censorship to TCP segments without options would affect almost no traffic.
17:07:20 <dcf1> But clearly it must have affected enough traffic for users to notice.
17:07:49 <dcf1> There might be some common situation where TCP connections are set up without options that I'm not aware of. They have a graph showing it's about 20%.
17:08:09 <shelikhoo> we should maybe check more about Windows XP or Windows 7
17:08:17 <dcf1> nothing really to say about it, just it's quite counter to my intuition
17:08:40 <shelikhoo> I think a lot of enterprise users don't really upgrade to the most recent OS
17:08:49 <dcf1> those are represented in the database as well
17:09:10 <dcf1> it's not like the TCP timestamp option, for example, is new technology: https://www.rfc-editor.org/rfc/rfc1323 is from 1992!
17:09:59 <dcf1> that's the part that stood out to me the most, because it's so weird. but it's probably not the most important point.
17:10:16 <meskio> maybe they are recycling an old version of the GFW for these local firewalls :P
17:10:33 <dcf1> The 01020304050607080900 RST payload is quite a strange thing too.
17:11:16 <dcf1> The Nmap OS detection documentation comments on that too: https://nmap.org/book/osdetect-methods.html#osdetect-rd
17:11:24 <shelikhoo> at least in smb-on-windows-10.pcapng
17:11:30 <dcf1> "Some operating systems return ASCII data such as error messages in reset packets. This is explicitly allowed by section 4.2.2.12 of RFC 1122." "Some of the few operating systems that may return data in their reset packets are HP-UX and versions of Mac OS prior to Mac OS X."
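The unusual RST payload could serve as a simple marker when reading packet captures from a vantage point. A minimal sketch, taking the hex value quoted above as the payload of the firewall's injected RSTs (an assumption from the discussion, not re-verified here) and assuming each segment starts at the TCP header with the IP header already stripped; `looks_like_henan_rst` and `make_rst` are hypothetical helpers.

```python
import struct

HENAN_RST_PAYLOAD = bytes.fromhex("01020304050607080900")
TCP_FLAG_RST = 0x04


def looks_like_henan_rst(segment: bytes) -> bool:
    """True if a TCP segment (starting at the TCP header) is a RST with that payload."""
    if len(segment) < 20:
        return False
    header_len = (segment[12] >> 4) * 4      # data offset, in bytes
    flags = segment[13]
    return bool(flags & TCP_FLAG_RST) and segment[header_len:] == HENAN_RST_PAYLOAD


def make_rst(payload: bytes = b"") -> bytes:
    """Minimal 20-byte-header RST|ACK segment carrying `payload` (test input only)."""
    return struct.pack("!HHIIBBHHH", 443, 40000, 1, 1, 5 << 4, 0x14, 0, 0, 0) + payload
```

Data in a RST is legal per the RFC 1122 section quoted above, so the check keys on the exact byte pattern rather than on the mere presence of a payload.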
17:11:33 <shelikhoo> the first payload packet does not have an option
17:11:50 <dcf1> shelikhoo: yes, I assume there must be something I don't understand
17:11:51 <shelikhoo> while subsequent packet might have these option
17:12:17 <shelikhoo> so for nmap, it detect whether it EVER has these options set
17:12:35 <dcf1> But the MSS option, for example, goes on the SYN packet, and it's pretty common
17:12:39 <shelikhoo> but for the censor, it just needs to make sure the client hello packet has a tcp header length = 20
17:12:58 <shelikhoo> syn packet is not the first payload packet
17:13:02 <dcf1> shelikhoo: no, the Nmap data are from individual response packets. not full established connections.
17:13:13 <shelikhoo> okay...
17:13:24 <shelikhoo> sorry, I'm not an expert on nmap
17:13:25 <dcf1> I apologize. I didn't mean to start an argument. As I said, there must be something I don't understand.
17:13:43 <dcf1> I don't have anything else to add.
17:13:48 <shelikhoo> yes... I think we should look at packet captures to find out
17:14:00 <shelikhoo> rather than looking at rfcs
17:14:01 <shelikhoo> over
17:14:34 <onyinyang> I guess we can end it there for today then
17:15:07 <shelikhoo> yes thanks~
17:15:08 <onyinyang> Thanks everyone for the discussion!
17:15:28 <onyinyang> #endmeeting