16:00:10 <meskio> #startmeeting tor anti-censorship meeting
16:00:10 <MeetBot> Meeting started Thu Feb  3 16:00:10 2022 UTC.  The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:10 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:13 <shelikhoo> hi~
16:00:15 <meskio> hello everybod!!
16:00:19 <cohosh> hi!
16:00:25 <meskio> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:00:36 <meskio> feel free to add what you've been working on and put items on the agenda
16:00:58 <meskio> I see the discussion points are still from last week
16:01:21 <dcf1> no, top 2 are refreshed
16:01:32 <meskio> ok, removing the other two
16:02:04 <meskio> should we start with the snowflake bridge? and others can add extra points if they come up with more stuff
16:02:08 <meskio> how is it going dcf1?
16:02:33 <dcf1> give me a sec, just want to quickly upload graphs that have 1 more data point
16:02:41 <meskio> :)
16:02:51 <irl> hi
16:02:55 <acute> hi o/
16:03:56 <dcf1> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40095#note_2774428
16:04:04 <dcf1> refresh if you had it open already
16:04:25 <dcf1> the summary is what's written in the pad
16:04:39 <dcf1> the load balancing migration was successful, the snowflake bridge is no longer bottlenecking on tor
16:04:45 <cohosh> \o/
16:04:49 <dcf1> because it that, it's able to provide more bandwidth
16:05:01 <anadahz> o/
16:05:04 <dcf1> but also because of that, it has reached the limit of its hardware and won't be able to go any faster
16:05:04 <meskio> yeah!
16:05:36 <dcf1> metrics are confused by multiple instances sharing the same fingerprint, so I had to make graphs manually from extra-info descriptors
16:06:23 <cohosh> does collector keep the descriptors from each instance, or did you have to get them from the snowflake server itself?
16:06:44 <shelikhoo> \0/
16:06:57 <dcf1> collector has everything
16:07:06 <cohosh> nice :) that's good at least
16:07:13 <dcf1> the data are all there, making the graphs is a matter of interpreting them differently
16:08:08 <dcf1> I want to start a discussion with core tor about:
16:08:31 <dcf1> 1. finding a different way to do ExtORPort authentication (remove the need for extor-static-cookie), and
16:08:44 <dcf1> 2. having a supported way to disable onion key rotation.
16:09:31 <dcf1> The trick I found to disable onion key rotation was to create a directory at the target path that tor tries to rename its files to when doing the rotation, which causes it to fail before replacing the file
16:09:37 <cohosh> ahf: ^ so you can read the backlog when you are free :)
16:09:43 <dcf1> needless to say that's a little fragile, though it works for now
16:10:48 <dcf1> snowflake has a lot more potential. if we had a bigger bridge with more processing power, it could go a lot faster, I think.
16:11:26 <cohosh> yeah this is awesome
16:11:33 <dcf1> based on estimates of the 6 days on the staging bridge, though, it's no longer just a matter of CPU, we would need a lot of network transfer per month, too. at least 50 TB / month in both directions, though obviously if it's going faster it will need even more.
16:13:37 <dcf1> the next point is kind of related
16:13:38 <meskio> I think there are still some server providers that allow unlimited bandwith
16:13:47 <dcf1> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40086#note_2773087
16:13:55 <anadahz> dcf1: Asking at tor-relays maybe of help to get some feedback about servers and performance.
16:14:16 <dcf1> anadahz: there is a thread on tor-relays about this issue already.
16:14:44 <dcf1> snowflake-server alone uses about half the CPU resources of the bridge (4*tor, 4*extor-static-cookie, haproxy use the other half)
16:15:06 <dcf1> anadahz: https://forum.torproject.net/t/tor-relays-how-to-reduce-tor-cpu-load-on-a-single-bridge/1483
16:15:56 <anadahz> great! https://lists.torproject.org/pipermail/tor-relays/2022-January/020183.html
16:16:30 <dcf1> there were not any obvious easy wins in the CPU profile. a lot of the graph is understandably under kcp-go's scheduler, which is the driver for a lot of other operations
16:17:55 <ahf> cohosh, dcf1: very nice! i saw the BUG() ticket in tpo/core/tor related to this and i love this hack you have found here
16:18:52 <dcf1> thanks ahf. the BUG is not causing any problems, to be clear. just something incidentally discovered while doing these other things.
16:19:02 <ahf> ya, i also read it as not blocking anything there
16:21:10 <cohosh> dcf1: are we at a point where we should look for other hosting options for snowflake and possibly reach out to people about taking this on?
16:22:25 <shelikhoo> I think in long term we need to aim for allowing a more distributed deployment of snowflake server,
16:22:25 <dcf1> yes, I think so. I was thinking about asking around to see if there's any groups interested.
16:22:57 <meskio> maybe ggus can help here
16:24:13 <meskio> dcf1: do you already have groups in mind to ask? do you need help there?
16:24:53 <dcf1> i'll let you know
16:25:14 <meskio> great
16:25:39 <meskio> anything more about this? any other topics before we move to the reading group?
16:25:44 <dcf1> I'm done
16:26:02 <cohosh> thanks dcf1 for your work on this
16:26:50 <cohosh> it's awesome to see snowflake growing so much :D
16:27:12 <meskio> \o/
16:27:40 <shelikhoo> thanks dcf1 ! \o/
16:27:46 <meskio> I guess we can move to the reading group
16:28:07 <meskio> for this session we had readed 'Meteor: Cryptographically Secure Steganography for Realistic Distributions'
16:28:16 <meskio> https://dl.acm.org/doi/10.1145/3460120.3484550
16:28:25 <meskio> anybody has a summary of the paper to share?
16:29:59 <shelikhoo> TLDR: This is a paper that purposed a method to encode data into English natural texts that fit the context.
16:31:04 <meskio> my knowledge of steganografphy is very limited and I had a hard time to grasp the details of the paper
16:32:09 <meskio> is interested how they use english language generators like GPT-2 to cover traffic inside
16:33:00 <meskio> but see their usage of CPU for very small texts it looks pretty limited in reality
16:33:34 <dcf1> I think, think of it like this:
16:34:04 <dcf1> both sides have a shared key (so they can encrypt/decrypt from each other, and equally importantly, they can produce a random-looking ciphertext stream between them.
16:35:03 <dcf1> When a generator like GPT-2 generates a text, for each word it finds a probability distribution for what the next word should be (A bunch of probability values that sum to 1.0). Then it internally does a weighted random selection over that probability distribution.
16:35:45 <dcf1> Meteor is replacing GPT-2 random number generators with (cryptographically) pseudo-random bits taken from the ciphertext they are exchanging
16:36:15 <dcf1> so it's actually deterministic, but an adversary who is watching the distributions being generated cannot tell that, without knowledge of the shared key
16:36:39 <shelikhoo> This can be used to transfer metadata, like Bridge line, on platform controlled by adversary like Chat Apps. I remember that things like OnionShare will generate a onion link that is a quite obvious target for chat app observers. If the link is encoded by this, then it is not possible for chat app observers to know people are using OnionShare by scanning for .onion keyword in chat log
16:37:30 <dcf1> On the receiving side, the receiver looks at what word ws generated, looks at the range of random values that could have resulted in that word, and takes the common prefix of those values (if any) as a few bits of the covert message.
16:37:47 <dcf1> they can do that because both sides share the same generative text model.
16:39:01 <meskio> dcf1: nice summary, thanks :)
16:39:21 <shelikhoo> thanks for the summary...
16:40:52 <cohosh> this needs a handshake to take place, does that handshake information to construct the model as well as the PRG key?
16:41:15 <dcf1> cohosh: the model and shared key are exchanged in advance
16:41:36 <dcf1> the model can be known to the adversary but the key needs to stay secret
16:41:49 <dcf1> I think GPT-2 models are GBs or TBs large
16:42:08 <cohosh> wow
16:42:36 <meskio> why could you not do some kind of DH over that cover channel to set up the key? you could use a default key = 0 (or whatever)
16:42:41 <meskio> maybe will be easy to fingerprint?
16:43:44 <shelikhoo> The DH process itself will be visible for adversary...
16:44:07 <meskio> yep, but plausible deniable
16:44:13 <meskio> (I guess)
16:44:35 <dcf1> https://en.wikipedia.org/wiki/GPT-2#Limitations "GPT-2 deployment is resource-intensive; the full version of the model is larger than five gigabytes..."
16:45:38 <dcf1> They talk about making a hybrid public-key + symmetric-key crypto system in a few places, but they propose to use existing publikc-key steganography for the symmetric key exchange, then Meteor for the symmetric part.
16:46:20 <cohosh> okay so GPT-2 is one specific model, not a general generative method
16:46:58 <meskio> yes, in the paper they use other models that they trained themselves from wikipedia and http headers
16:47:02 <dcf1> But yes, I think one reason the shared key is necessary is because you need something that looks like a uniformly random bitstream, which ciphertexts are and plaintexts generally are not.
16:47:31 <cohosh> and the advantage of using something large like GPT-2 over a smaller model like wikipedia is the efficiency of the scheme in terms of overhead length?
16:47:56 <meskio> GPT-2 model is hard to distinguise from real humans talking
16:48:02 <dcf1> In their summary of public-key setganography, they give an example of hashing the stego tokens in order to achieve that unbiased distribution. Or something like that.
16:48:07 <cohosh> (if i'm reading table 3 correctly)
16:48:24 <shelikhoo> Yes, it generates a more natural speech...
16:48:35 <cohosh> ah i see so it is steganographically better
16:48:43 <meskio> GPT-2 is actually scarily human
16:50:33 <dcf1> Can't resist sharing this video, a novelist using generative text models as an aid to writing https://www.youtube.com/watch?v=cIpErjWBqm0
16:50:50 <meskio> :D
16:51:36 <dcf1> So, are there any immediate implications of this work for us?
16:52:34 <dcf1> one significant practical limitation is that even in the imagined world where all encrypted communication is banned, steganography does not hide who you are talking do, which is usually the easiest basis for censoring something
16:52:57 <dcf1> So an unencrypted steganographic request to @GetBridgesBot would still be detected and blocked
16:53:32 <shelikhoo> Yes, unless there is a broadcast channel...
16:53:52 <cohosh> i like shelikhoo's use cases here, users sharing between themselves onion addresses or bridge lines
16:54:00 <dcf1> shelikhoo: that is a good point.
16:54:43 <cohosh> the model size seems like too big a barrier
16:55:03 <cohosh> unless there is a way to get good enough steganography with something much smaller and more accessible
16:55:03 <meskio> meteor has 12x overhead, so it requires a fairly long conversation for small amount of data, but I guess duable
16:55:22 <dcf1> I'm thinking something like a long-running Facebook chat between two users (bots). The goal here is to avoid detection by Facebook as bots.
16:55:47 <meskio> the metor paper says that is a first stem, I don't think they see it as actually usable in the real world
16:55:48 <dcf1> These two bot users carry on a chat with each other using Meteor  messages for text, and then at either side it's bridged into a proxy.
16:55:54 <meskio> in a phone it will kill your battery
16:56:12 <dcf1> so the actual users of the proxy don't necessarily need to do all the expensive processing.
16:57:09 <meskio> I guess facebook bot detection is not only about the content but about the ratio, so you have a limited bandwidth
16:57:33 <cohosh> dcf1: woah
16:57:40 <dcf1> sure, my point is that the heavy processing needs are not necessarily a limitation, because you can have a semi-centralized proxy do that for you
16:57:58 <meskio> how many chars a human can type per second? 12? 1B/s of meteor actuall bandwidth
16:58:47 <dcf1> haha okay so maybe it can serve the same purpose as a numbers station
16:59:12 <dcf1> but yes, the low bandwidth kind of puts it outside the realm of most of what we do
17:00:24 <meskio> you can have many bots talking and parallelize, but still
17:03:02 <meskio> something more about meteor?
17:03:27 <meskio> BTW, I just found the website and it has a nice description: https://meteorfrom.space/
17:03:51 <meskio> do we want to choose the next paper for our reading group?
17:04:32 <dcf1> censorbib is updated with a bunch of 2021 papers https://censorbib.nymity.ch/
17:04:36 <cohosh> meskio: nice, thanks for the link!
17:06:02 <meskio> I'm curious about the weaponizing middleboxes one, doesn't have much implications for us, but might be fun to read
17:06:14 <meskio> https://censorbib.nymity.ch/#Bock2021b
17:06:42 <cohosh> sounds good to me
17:07:26 <meskio> feb 17?
17:07:36 <shelikhoo> (related: Gfw middle box repurposed as DDoS dropper: https://arstechnica.com/information-technology/2015/04/ddos-attacks-that-crippled-github-linked-to-great-firewall-of-china/ )
17:07:57 <meskio> :D
17:08:48 <meskio> ok, I guess we are done for today
17:09:04 <meskio> I'll live the meeting open for one more minute just in case
17:09:12 <meskio> s/live/leave/
17:10:12 <meskio> #endmeeting