16:00:04 <meskio> #startmeeting tor anti-censorship meeting
16:00:04 <MeetBot> Meeting started Thu Mar 6 16:00:04 2025 UTC. The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:04 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:08 <meskio> hello everybody
16:00:11 <meskio> here is our meeting pad: https://pad.riseup.net/p/r.9574e996bb9c0266213d38b91b56c469
16:00:13 <meskio> ask me in private to give you the link of the pad to be able to edit it if you don't have it
16:00:15 <meskio> I'll wait a few minutes for everybody to add what you've been working on and put items on the agenda
16:00:24 <cohosh> hi
16:00:51 <onyinyang> hihi
16:00:52 <theodorsm> hi
16:01:07 <WofWca[m]> 👋
16:02:29 <shelikhoo> hi~
16:02:38 <meskio> I guess we can start
16:03:02 <meskio> there are a couple of discussion points that were already in the pad from last week marked as to discuss today
16:03:18 <shelikhoo> hi
16:03:23 <meskio> Should we user test snowflake with covert-dtls? It is difficult to force the Snowflake client to become the DTLS client:
16:03:50 <meskio> theodorsm?
16:04:22 <theodorsm> Long story short: we have to deploy custom proxies with the DTLS role set to server to enforce setup:active in the SDP answer and the Snowflake client becoming the DTLS client
16:05:17 <theodorsm> Is that feasible to do? Deploying test proxies for user testing?
16:05:19 <cohosh> nice work tracking that down theodorsm
16:06:17 <theodorsm> I don't have any stats to back it up, but I assume most Snowflake clients are actually the DTLS server in the handshake, thus less prone to client fingerprinting, but proxies are more vulnerable to fingerprinting
16:07:05 <theodorsm> Secondly, Firefox has adopted DTLS 1.3 by default in WebRTC, so to mimic the Firefox fingerprint I had to update covertDTLS to support the new key_share extension
16:08:28 <theodorsm> That took some effort, but I am soon merging it to main and deploying a new version of covertDTLS.
16:08:40 <meskio> wow
16:08:49 <meskio> nice work there
16:08:58 <shelikhoo> nice work!
16:09:00 <theodorsm> tnx :)
16:09:35 <cohosh> we have a few options for testing
16:10:23 <cohosh> we could run some proxies with the production broker that have this setup:active feature you mentioned, but it will be difficult to test because we don't have anything in place to control whether as a client you get one of these proxies
16:10:54 <cohosh> we did discuss running a staging broker with a few proxies last week for testing another change
16:11:23 <theodorsm> Ah, I guess we have multiple use cases for a staging broker then
16:11:53 <theodorsm> I think that would be a nice option, if that is something that is being worked on
16:12:23 <shelikhoo> X~X I have got some work done on containerizing the snowflake stack, and I'm ready to explore how to deploy it once I get approval
16:12:56 <shelikhoo> https://gist.github.com/xiaokangwang/0aecf8e40789a91ca3426038045b35f3
16:13:16 <cohosh> shelikhoo: nice, do you need something from the rest of us for that?
16:13:58 <shelikhoo> cohosh: not yet. I will let you know if there is any
16:14:09 <shelikhoo> thanks!!!!
16:14:12 <cohosh> cool, thanks for working on that!
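For context on the proxy change theodorsm describes above: Snowflake's proxies are built on pion/webrtc, whose SettingEngine lets the answering side pin its DTLS role. A minimal sketch of a test proxy that forces itself into the server role, so that the connecting Snowflake client ends up as the DTLS client, might look like the following. This is an illustration under that assumption, not the actual proxy patch.

```go
package main

import (
	"log"

	"github.com/pion/webrtc/v3"
)

// newServerRolePeerConnection builds a PeerConnection whose answers pin the
// local (proxy) side to the DTLS server role, as described in the meeting.
func newServerRolePeerConnection() (*webrtc.PeerConnection, error) {
	se := webrtc.SettingEngine{}
	if err := se.SetAnsweringDTLSRole(webrtc.DTLSRoleServer); err != nil {
		return nil, err
	}
	api := webrtc.NewAPI(webrtc.WithSettingEngine(se))
	return api.NewPeerConnection(webrtc.Configuration{})
}

func main() {
	pc, err := newServerRolePeerConnection()
	if err != nil {
		log.Fatal(err)
	}
	defer pc.Close()
	log.Println("peer connection created with pinned DTLS role")
	// A real test proxy would go on to receive the client's offer from the
	// broker and return its answer as usual.
}
```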
16:14:39 <cohosh> theodorsm: we could also run some proxies on the regular network with a covertdtls fingerprint to make sure that works
16:15:02 <meskio> I've been doing that
16:15:02 <cohosh> this is a little tricky, leaving it as a bridge-configurable option at the client will have much more agility in the presence of censorship events
16:15:10 <cohosh> so it does seem like ultimately that's what we want
16:15:35 <meskio> I haven't checked the metrics for a while, but in my proxies it looks fine, as in getting traffic from many places
16:15:40 <cohosh> ah nice
16:16:00 <theodorsm> meskio: nice! I guess we also have to deploy a new version with the new key_share extension etc.
16:16:39 <meskio> ok, I can do that, and now I don't need to hack country metrics, they got merged in snowflake
16:17:12 <meskio> is this extension released? can I just update covertdtls on your merge request and it will be there?
16:17:15 <theodorsm> Cool! I will update the MR with the new covertDTLS version soon and tag you.
16:17:18 <meskio> anything specific to configure?
16:17:29 <meskio> cool, I'll wait for that, thanks
16:17:29 <cohosh> do we have a goal for testing? i suppose there aren't any active dtls fingerprinting events that we know of, so it might be mostly testing reliability at this point? and that the fingerprints are being applied properly?
16:18:10 <theodorsm> cohosh: yes, I am a bit concerned about reliability and validation of fingerprints.
16:18:36 <cohosh> ok great it sounds like the staging environment will help with that then :)
16:19:11 <cohosh> thanks for your patience theodorsm, these changes take time to deploy but they'll definitely be worth it
16:20:05 <theodorsm> ^^
16:20:31 <meskio> cool, we have next steps for this and we'll have to wait for the testing server to be there
16:21:04 <meskio> should we move to the next topic?
16:21:10 <theodorsm> Yes, all from me
16:21:28 <meskio> snowflake broker match failure rate is high?
16:21:36 <meskio> cohosh?
16:21:52 <cohosh> yeah, i've been doing a deep dive into snowflake rendezvous failures
16:21:59 <cohosh> and i think it's happening more than we realized
16:22:13 <cohosh> partially because of a bug in our metrics
16:22:40 <cohosh> where snowflake-client-denied is only counted if the snowflake wasn't matched with a polling proxy
16:23:08 <cohosh> *client-denied-count that is
16:23:26 <meskio> ohh
16:23:26 <cohosh> and client-$rendezvous_method-count is only counted if the snowflake received an answer
16:23:42 <cohosh> so there is an unknown number of snowflakes that are timing out and those polls and timeouts aren't counted
16:24:04 <cohosh> from my own unscientific experience, it looks like when i make two simultaneous polls at least one of them times out
16:24:16 <cohosh> and the broker logs are full of time out messages
16:24:39 <cohosh> i don't think this is a disaster for usability because clients do get a snowflake eventually but i'd like to find out the cause of this problem
16:25:20 <cohosh> my first proposal is to fix the metrics, but i also wanted to brainstorm some things to check
16:25:30 <shelikhoo> did we isolate the component that is causing this issue?
16:25:43 <cohosh> WofWca[m] had the idea that maybe it's due to proxies taking too long to do ICE gathering
16:26:02 <shelikhoo> yes...
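To make the metrics gap cohosh describes concrete: if the broker only bumps client-denied-count when no proxy was matched, and client-$rendezvous_method-count when an answer came back, then polls that get matched but later time out are never counted anywhere. A hypothetical sketch of recording every poll outcome (the type and counter names here are illustrative, not the broker's real metrics fields) could look like:

```go
package metrics

import "sync"

// PollOutcome classifies what happened to a single client rendezvous poll.
// The names are illustrative, not the broker's actual metrics identifiers.
type PollOutcome int

const (
	OutcomeMatched PollOutcome = iota // a proxy answer was relayed to the client
	OutcomeDenied                     // no proxy was available to match
	OutcomeTimeout                    // a proxy was matched but never answered in time
)

// ClientPollCounters accumulates one count per poll, whatever its outcome,
// so that timed-out polls are no longer invisible.
type ClientPollCounters struct {
	mu      sync.Mutex
	matched uint64
	denied  uint64
	timeout uint64
}

// Record is called exactly once per client poll.
func (c *ClientPollCounters) Record(o PollOutcome) {
	c.mu.Lock()
	defer c.mu.Unlock()
	switch o {
	case OutcomeMatched:
		c.matched++
	case OutcomeDenied:
		c.denied++
	case OutcomeTimeout:
		c.timeout++
	}
}
```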
16:26:20 <cohosh> we've merged a fix for that in the standalone proxy code but it will take a while to deploy
16:26:41 <cohosh> i'm also suspicious of the sqs rendezvous failures indicating some sort of resource limits at the broker
16:27:01 <cohosh> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40447#note_3169787
16:27:26 <cohosh> my first thought was open file limits but i checked and they look fine to me
16:27:51 <cohosh> i wonder if there's a way to figure out if goroutines are being blocked for too long
16:28:36 <shelikhoo> https://pkg.go.dev/os/signal
16:28:44 <shelikhoo> A SIGQUIT, SIGILL, SIGTRAP, SIGABRT, SIGSTKFLT, SIGEMT, or SIGSYS signal causes the program to exit with a stack dump.
16:28:54 <shelikhoo> we can send a SIGQUIT to a golang program
16:29:08 <shelikhoo> and it will print a stack trace, and the status of each goroutine
16:29:17 <cohosh> i don't feel great about doing that on the production broker
16:29:18 <shelikhoo> including how long it has been waiting
16:29:21 <shelikhoo> oh yes
16:29:32 <shelikhoo> I don't like that either...
16:29:37 <cohosh> and it seems like a problem caused by load, so it might be difficult to simulate
16:29:43 <cohosh> maybe i can figure out something in shadow
16:29:56 <cohosh> ah nope
16:30:04 <WofWca[m]> There should be some tool to make a "flame graph".
16:30:06 <cohosh> shadow can't simulate CPU congestion
16:30:44 <shelikhoo> https://pkg.go.dev/net/http/pprof
16:30:54 <cohosh> yeah i suppose we could use some sort of benchmarking/instrumentation tool
16:30:55 <shelikhoo> I think there is such a tool
16:31:03 <shelikhoo> but I don't know if this one will work
16:31:21 <cohosh> we've used pprof before for production snowflake pieces
16:31:21 <meskio> killing the broker for a second in production is not that bad, it will not kill any existing snowflake traffic, clients will just take a bit longer to connect, it is similar to a restart...
16:31:23 <cohosh> i think the server
16:31:47 <cohosh> it may impact performance
16:32:19 <shelikhoo> I agree we should reduce service interruption when possible
16:32:35 <meskio> sure, pprof might be a better option
16:32:44 <cohosh> meskio: yeah that's true, it will impact metrics no more than a restart, but i think what we learn from a single stack trace is limited
16:32:45 <shelikhoo> let's see if we could run the analysis non-invasively
16:34:05 <cohosh> ok i'll put together a MR for a pprof patch and we can discuss deploying it for a short period of time
16:34:18 <meskio> sounds good
16:34:25 <cohosh> in the meantime i'll work on metrics so we can see how often this is happening
16:34:26 <shelikhoo> nice! thanks!!!
16:34:27 <WofWca[m]> So are we sure right now that the timeouts are not caused by proxies not sending the answer in time?
16:34:38 <cohosh> WofWca[m]: no we're not sure about that
16:34:50 <cohosh> there may be many things going on
16:35:01 <cohosh> the sqs failures are not due to the proxies but the bulk of timeouts may be
16:35:23 <cohosh> so we can move forward with deploying the patch you wrote as well
16:35:40 <WofWca[m]> Maybe it makes sense to make a release and see if the timeout rate goes down?
16:36:00 <cohosh> yeah, at the moment we don't even know what the timeout rate is
16:36:03 <WofWca[m]> Because our Docker compose automatically updates the proxies.
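For reference, the net/http/pprof route shelikhoo links needs only a debug HTTP listener added to the broker; binding it to loopback keeps it reachable over an SSH tunnel without exposing it publicly. A minimal sketch follows (the port and placement are illustrative, not the contents of the planned MR):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
)

func main() {
	// Loopback-only profiling endpoint; enable temporarily while investigating.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the rest of the broker would run here ...
	select {}
}
```

The goroutine dump at /debug/pprof/goroutine?debug=2 includes how long each goroutine has been waiting, which speaks to the question about blocked goroutines, and `go tool pprof http://localhost:6060/debug/pprof/profile` collects a CPU profile without restarting the process.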
16:36:29 <WofWca[m]> For IPtProxy, we'll need to wait for an Orbot release
16:36:49 <cohosh> that's right
16:37:01 <WofWca[m]> And I guess the extension needs to be updated too
16:37:22 <cohosh> yes
16:37:56 <cohosh> i'll create and link some issues to https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40447
16:38:10 <cohosh> so we can track all of this
16:38:32 <meskio> +1
16:38:45 <cohosh> thanks everyone for the brainstorming!
16:39:07 <meskio> anything more on this topic? any other discussion points?
16:39:10 <shelikhoo> hehe! thanks!!!
16:39:23 <WofWca[m]> Not from me
16:39:28 <shelikhoo> eof from shell
16:39:31 <meskio> on the interesting links we have the snowflake daily operations for february
16:39:33 <meskio> https://forum.torproject.org/t/snowflake-daily-operations-february-2025-update/17759
16:39:40 <meskio> snowflake users keep declining
16:39:50 <meskio> I hope this means less censorship, not issues with snowflake
16:40:38 <meskio> it's weird that the declines in ir and ru are very similar, maybe as dcf suggests it is related to azure going down
16:41:00 <meskio> anyway, still ~16k daily users
16:41:09 <meskio> anything else for this meeting?
16:41:22 <onyinyang> nothing from me
16:41:25 <shelikhoo> eof
16:41:34 <emmapeel> i have a little advertisement
16:41:45 <meskio> go ahead emmapeel
16:41:51 * onyinyang plays jingle
16:42:03 <shelikhoo> hi~hi~ emmapeel!!!!
16:42:03 <emmapeel> the new version of the Snowflake website is up, and you can help to translate it at https://hosted.weblate.org/projects/tor/snowflake-web/
16:42:21 <meskio> https://snowflake.torproject.org/
16:42:22 <cohosh> <3
16:42:28 <shelikhoo> ^~^
16:42:41 <meskio> https://blog.torproject.org/snowflake-refresh-to-help-more-people-get-online/
16:42:54 <meskio> yes, I don't think we have announced it in this meeting
16:42:57 <meskio> thanks for the reminder
16:43:25 <emmapeel> please help us to make a nice translation and don't hesitate to contact me if your language is not available
16:43:43 <onyinyang> <3 Thanks emmapeel!
16:44:17 <emmapeel> thanks for all these great tools!
16:44:40 <emmapeel> here is more information about becoming a translator: https://community.torproject.org/localization/becoming-tor-translator/ i shut up now
16:44:50 <meskio> :)
16:45:34 <meskio> I'll end the meeting then
16:45:44 <meskio> #endmeeting