16:00:04 <meskio> #startmeeting tor anti-censorship meeting
16:00:04 <MeetBot> Meeting started Thu Mar  6 16:00:04 2025 UTC.  The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:04 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:08 <meskio> hello everybody
16:00:11 <meskio> here is our meeting pad: https://pad.riseup.net/p/r.9574e996bb9c0266213d38b91b56c469
16:00:13 <meskio> ask me in private to give you the link of the pad to be able to edit it if you don't have it
16:00:15 <meskio> I'll wait a few minutes for everybody to add what you've been working on and put items on the agenda
16:00:24 <cohosh> hi
16:00:51 <onyinyang> hihi
16:00:52 <theodorsm> hi
16:01:07 <WofWca[m]> 👋
16:02:29 <shelikhoow> hi~
16:02:38 <meskio> I guess we can start
16:03:02 <meskio> there are a couple of discussion points that were already in the pad from last week marked as to discuss today
16:03:18 <shelikhoo> hi
16:03:23 <meskio> Should we user-test snowflake with covert-dtls? It is difficult to force the Snowflake client to become the DTLS client:
16:03:50 <meskio> theodorsm?
16:04:22 <theodorsm> Long story short: we have to deploy custom proxies with the DTLS role set to server (setup:passive in the SDP answer), so that the Snowflake client ends up with setup:active and becomes the DTLS client
16:05:17 <theodorsm> Is that feasible to do? Deploying test proxies for user testing?
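(For reference, a minimal sketch of how such a test proxy could pin the DTLS role, assuming it is built on pion/webrtc's SettingEngine like the standalone proxy; illustrative only, not the actual Snowflake proxy code.)

    package main

    import (
        "fmt"

        "github.com/pion/webrtc/v3"
    )

    func main() {
        // Pin the answering side (the proxy) to the DTLS server role, so its
        // SDP answer carries setup:passive and the remote Snowflake client
        // ends up as the DTLS client.
        se := webrtc.SettingEngine{}
        if err := se.SetAnsweringDTLSRole(webrtc.DTLSRoleServer); err != nil {
            panic(err)
        }
        api := webrtc.NewAPI(webrtc.WithSettingEngine(se))
        pc, err := api.NewPeerConnection(webrtc.Configuration{})
        if err != nil {
            panic(err)
        }
        defer pc.Close()
        fmt.Println("test proxy peer connection created with DTLS role pinned to server")
    }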
16:05:19 <cohosh> nice work tracking that down theodorsm
16:06:17 <theodorsm> I don't have any stats to back it up, but I assume most Snowflake clients are actually the DTLS server in the handshake, and thus less prone to client fingerprinting, while proxies are more vulnerable to fingerprinting
16:07:05 <theodorsm> Secondly, Firefox has adopted DTLS 1.3 by default in WebRTC, so to mimic the Firefox fingerprint I had to update covertDTLS to support the new key_share extension
16:08:28 <theodorsm> That took some effort, but I am soon merging it to main and deploying a new version of covertDTLS.
16:08:40 <meskio> wow
16:08:49 <meskio> nice work there
16:08:58 <shelikhoo> nice work!
16:09:00 <theodorsm> tnx:)
16:09:35 <cohosh> we have a few options for testing
16:10:23 <cohosh> we could run some proxies with the production broker that have this setup:active feature you mentioned, but it will be difficult to test because we don't have anything in place to control whether as a client you get one of these proxies
16:10:54 <cohosh> we did discuss running a staging broker with a few proxies last week for testing another change
16:11:23 <theodorsm> Ah, I guess we have multiple usecases for a staging broker then
16:11:53 <theodorsm> I think that would be a nice option, if that is something that is being worked on
16:12:23 <shelikhoo> X~X I have got some work done on containerizing the snowflake stack, and I'm ready to explore how to deploy it once I get approval
16:12:56 <shelikhoo> https://gist.github.com/xiaokangwang/0aecf8e40789a91ca3426038045b35f3
16:13:16 <cohosh> shelikhoo: nice, do you need something from the rest of us for that?
16:13:58 <shelikhoo> cohosh: not yet. I will let you know if there is any
16:14:09 <shelikhoo> thanks!!!!
16:14:12 <cohosh> cool, thanks for working on that!
16:14:39 <cohosh> theodorsm: we could also run some proxies on the regular network with a covertdtls fingerprint to make sure that works
16:15:02 <meskio> I've been doing that
16:15:02 <cohosh> this is a little tricky; leaving it as a bridge-configurable option at the client will give us much more agility in the presence of censorship events
16:15:10 <cohosh> so it does seem like ultimately that's what we want
16:15:35 <meskio> I haven't checked the metrics for a while, but my proxies look fine, as in getting traffic from many places
16:15:40 <cohosh> ah nice
16:16:00 <theodorsm> meskio: nice! I guess we also have to deploy a new version with the new key_share extension etc.
16:16:39 <meskio> ok, I can do that, and now I don't need to hack country metrics, they got merged in snowflake
16:17:12 <meskio> is this extension released? can I just update covertdtls on your merge request and it will be there?
16:17:15 <theodorsm> Cool! I will update the MR with the new covertDTLS version soon and tag you.
16:17:18 <meskio> anything specific to configure?
16:17:29 <meskio> cool, I'll wait for that, thanks
16:17:29 <cohosh> do we have a goal for testing? i suppose there aren't any active dtls fingerprinting events that we know of, so it might be mostly testing reliability at this point? and that the fingerprints are being applied properly?
16:18:10 <theodorsm> cohosh: yes, I am a bit concerned about reliability and validation of fingerprints.
16:18:36 <cohosh> ok great it sounds like the staging environment will help with that then :)
16:19:11 <cohosh> thanks for your patience theodorsm, these changes take time to deploy but they'll definitely be worth it
16:20:05 <theodorsm> ^^
16:20:31 <meskio> cool, we have next steps for this and we'll have to wait for the testing server to be there
16:21:04 <meskio> should we move to the next topic?
16:21:10 <theodorsm> Yes, all from me
16:21:28 <meskio> snowflake broker match failure rate is high?
16:21:36 <meskio> cohosh?
16:21:52 <cohosh> yeah, i've been doing a deep dive into snowflake rendezvous failures
16:21:59 <cohosh> and i think it's happening more than we realized
16:22:13 <cohosh> partially because of a bug in our metrics
16:22:40 <cohosh> where snowflake-client-denied is only counted if the snowflake wasn't matched with a polling proxy
16:23:08 <cohosh> *client-denied-count that is
16:23:26 <meskio> ohh
16:23:26 <cohosh> and client-$rendezvous_method-count is only counted if the snowflake received an answer
16:23:42 <cohosh> so there is an unknown number of snowflakes that are timing out and those polls and timeouts aren't counted
16:24:04 <cohosh> from my own unscientific experience, it looks like when i make two simultaneous polls at least one of them times out
16:24:16 <cohosh> and the broker logs are full of time out messages
16:24:39 <cohosh> i don't think this is a disaster for usability because clients do get snowflake eventually but i'd like to find out the cause of this problem
16:25:20 <cohosh> my first proposal is to fix the metrics, but i also wanted to brainstorm some things to check
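(To make the counting gap concrete, a hedged sketch with hypothetical counter and function names, not the real broker code: the idea is to count matched-but-unanswered polls explicitly instead of letting them fall through both existing counters.)

    package main

    import (
        "fmt"
        "time"
    )

    // Hypothetical counters, for illustration only; the real broker metrics differ.
    type brokerMetrics struct {
        clientDeniedCount   uint // client polled, no proxy was available
        clientAnsweredCount uint // client polled, matched proxy answered in time
        clientTimeoutCount  uint // matched with a proxy, but no answer before the deadline
    }

    // recordPollOutcome counts all three outcomes, so timeouts are no longer invisible.
    func (m *brokerMetrics) recordPollOutcome(matched bool, answer <-chan string, deadline time.Duration) {
        if !matched {
            m.clientDeniedCount++
            return
        }
        select {
        case <-answer:
            m.clientAnsweredCount++
        case <-time.After(deadline):
            m.clientTimeoutCount++
        }
    }

    func main() {
        m := &brokerMetrics{}
        // The answer channel never receives, so this poll is counted as a timeout.
        m.recordPollOutcome(true, make(chan string), 10*time.Millisecond)
        fmt.Printf("%+v\n", *m)
    }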
16:25:30 <shelikhoo> did we isolate the component that is causing this issue?
16:25:43 <cohosh> WofWca[m] had the idea that maybe it's due to proxies taking too long to do ICE gathering
16:26:02 <shelikhoo> yes...
16:26:20 <cohosh> we've merged a fix for that in the standalone proxy code but it will take a while to deploy
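(Roughly, that kind of fix means bounding how long the proxy waits for ICE gathering before returning its answer; the sketch below shows the pattern with pion's GatheringCompletePromise, using an offer only to stay self-contained, and is not necessarily the change that was merged.)

    package main

    import (
        "fmt"
        "time"

        "github.com/pion/webrtc/v3"
    )

    func main() {
        pc, err := webrtc.NewPeerConnection(webrtc.Configuration{})
        if err != nil {
            panic(err)
        }
        defer pc.Close()
        if _, err := pc.CreateDataChannel("test", nil); err != nil {
            panic(err)
        }

        // A proxy would do this with CreateAnswer after SetRemoteDescription;
        // an offer is used here only to keep the example runnable on its own.
        desc, err := pc.CreateOffer(nil)
        if err != nil {
            panic(err)
        }
        gatherDone := webrtc.GatheringCompletePromise(pc)
        if err := pc.SetLocalDescription(desc); err != nil {
            panic(err)
        }

        // Return whatever candidates have been gathered once the deadline hits,
        // instead of holding the broker poll until gathering fully completes.
        select {
        case <-gatherDone:
        case <-time.After(5 * time.Second):
        }
        fmt.Println(pc.LocalDescription().SDP)
    }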
16:26:41 <cohosh> i'm also suspicious of the sqs rendezvous failures indicating some sort of resource limits at the broker
16:27:01 <cohosh> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40447#note_3169787
16:27:26 <cohosh> my first thought was open file limits but i checked and they look fine to me
16:27:51 <cohosh> i wonder if there's a way to figure out if goroutines are being blocked for too long
16:28:36 <shelikhoo> https://pkg.go.dev/os/signal
16:28:44 <shelikhoo> A SIGQUIT, SIGILL, SIGTRAP, SIGABRT, SIGSTKFLT, SIGEMT, or SIGSYS signal causes the program to exit with a stack dump.
16:28:54 <shelikhoo> we can send a sigquit to a golang program
16:29:08 <shelikhoo> and it will print stack trace, and the status of each goroutine
16:29:17 <cohosh> i don't feel great about doing that on the production broker
16:29:18 <shelikhoo> including how long it has been waiting
16:29:21 <shelikhoo> oh yes
16:29:32 <shelikhoo> I don't like as well...
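(A less disruptive variant worth noting: the same per-goroutine stacks, including how long each has been blocked, can be written out on a harmless signal without exiting, using runtime/pprof; this is a sketch of that pattern, not something decided here.)

    package main

    import (
        "os"
        "os/signal"
        "runtime/pprof"
        "syscall"
    )

    func main() {
        // ... the broker's real work would run in other goroutines ...
        sigs := make(chan os.Signal, 1)
        signal.Notify(sigs, syscall.SIGUSR1)
        for range sigs {
            // On SIGUSR1, write every goroutine's stack to stderr (debug level 2
            // includes wait durations) and keep running, unlike SIGQUIT's
            // default of exiting with a stack dump.
            _ = pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
        }
    }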
16:29:37 <cohosh> and it seems like a problem caused by load, so it might be difficult to simulate
16:29:43 <cohosh> maybe i can figure out something in shadow
16:29:56 <cohosh> ah nope
16:30:04 <WofWca[m]> There should be some tool to make a "flame graph".
16:30:06 <cohosh> shadow can't simulate CPU congestion
16:30:44 <shelikhoo> https://pkg.go.dev/net/http/pprof
16:30:54 <cohosh> yeah i suppose we could use some sort of benchmarking/instrumentation tool
16:30:55 <shelikhoo> I think there is such a tool
16:31:03 <shelikhoo> but I don't know if this one will work
16:31:21 <cohosh> we've used pprof before for production snowflake pieces
16:31:21 <meskio> killing the broker for a second in production is not that bad, it will not kill any existing snowflake traffic, clients will just take a bit longer to connect, it's similar to a restart...
16:31:23 <cohosh> i think the server
16:31:47 <cohosh> it may impact performance
16:32:19 <shelikhoo> I agree we should reduce service interruption when possible
16:32:35 <meskio> sure, pprof might be a better option
16:32:44 <cohosh> meskio: yeah that's true, it will impact metrics, no more than a restart, but i think what we learn from a single stack trace is limited
16:32:45 <shelikhoo> let's see if we can run the analysis non-invasively
16:34:05 <cohosh> ok i'll put together a MR for a pprof patch and we can discuss deploying it for a short period of time
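(For reference, the usual shape of such a patch, assuming a localhost-only debug listener is acceptable; the actual MR may differ.)

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
    )

    func main() {
        // In the broker this would run in its own goroutine alongside the existing
        // listeners; binding to localhost keeps the profiles private, so they can be
        // fetched over an SSH tunnel, e.g.:
        //   go tool pprof http://localhost:6060/debug/pprof/profile
        log.Fatal(http.ListenAndServe("localhost:6060", nil))
    }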
16:34:18 <meskio> sounds good
16:34:25 <cohosh> in the meantime i'll work on metrics so we can see how often this is happening
16:34:26 <shelikhoo> nice! thanks!!!
16:34:27 <WofWca[m]> So are we sure right now that the timeouts are not caused by proxies not sending the answer in time?
16:34:38 <cohosh> WofWca[m]: no we're not sure about that
16:34:50 <cohosh> there may be many things going on
16:35:01 <cohosh> the sqs failures are not due to the proxies but the bulk of timeouts may be
16:35:23 <cohosh> so we can move forward with deploying the patch you wrote as well
16:35:40 <WofWca[m]> Maybe it makes sense to make a release and see if the timeout rate goes down?
16:36:00 <cohosh> yeah, at the moment we don't even know what the timeout rate is
16:36:03 <WofWca[m]> Because our Docker compose automatically updates the proxies.
16:36:29 <WofWca[m]> For IPtProxy, we'll need to wait for an Orbot release
16:36:49 <cohosh> that's right
16:37:01 <WofWca[m]> And I guess the extension needs to be updated too
16:37:22 <cohosh> yes
16:37:56 <cohosh> i'll create and link some issues to https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40447
16:38:10 <cohosh> so we can track all of this
16:38:32 <meskio> +1
16:38:45 <cohosh> thanks everyone for the brainstorming!
16:39:07 <meskio> anything more on this topic? any other discussion points?
16:39:10 <shelikhoo> hehe! thanks!!!
16:39:23 <WofWca[m]> Not from me
16:39:28 <shelikhoo> eof from shell
16:39:31 <meskio> on the interesting links we have the snowflake daily operations for february
16:39:33 <meskio> https://forum.torproject.org/t/snowflake-daily-operations-february-2025-update/17759
16:39:40 <meskio> snowflake users keep declining
16:39:50 <meskio> I hope this means less censorship not issues with snowflake
16:40:38 <meskio> it's weird that the declines in ir and ru are very similar, maybe as dcf suggests it's related to azure going down
16:41:00 <meskio> anyway, still daily ~16k users
16:41:09 <meskio> anything else for this meeting?
16:41:22 <onyinyang> nothing from me
16:41:25 <shelikhoo> eof
16:41:34 <emmapeel> i have a little advertisement
16:41:45 <meskio> go ahead emmapeel
16:41:51 * onyinyang plays jingle
16:42:03 <shelikhoo> hi~hi~ emmapeel!!!!
16:42:03 <emmapeel> the new version of the Snowflake website is on, and you can help to translate it at https://hosted.weblate.org/projects/tor/snowflake-web/
16:42:21 <meskio> https://snowflake.torproject.org/
16:42:22 <cohosh> <3
16:42:28 <shelikhoo> ^~^
16:42:41 <meskio> https://blog.torproject.org/snowflake-refresh-to-help-more-people-get-online/
16:42:54 <meskio> yes, I don't think we have announced it in this meeting
16:42:57 <meskio> thanks for the reminder
16:43:25 <emmapeel> please help us to make a nice translation and don't hesitate to contact me if your language is not available
16:43:43 <onyinyang> <3 Thanks emmapeel!
16:44:17 <emmapeel> thanks for all these great tools!
16:44:40 <emmapeel> here is more information about translation: https://community.torproject.org/localization/becoming-tor-translator/ ... i'll shut up now
16:44:50 <meskio> :)
16:45:34 <meskio> I'll end the meeting then
16:45:44 <meskio> #endmeeting