16:00:10 #startmeeting tor anti-censorship meeting
16:00:10 Meeting started Thu Feb 3 16:00:10 2022 UTC. The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:10 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:13 hi~
16:00:15 hello everybody!!
16:00:19 hi!
16:00:25 here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
16:00:36 feel free to add what you've been working on and put items on the agenda
16:00:58 I see the discussion points are still from last week
16:01:21 no, the top 2 are refreshed
16:01:32 ok, removing the other two
16:02:04 should we start with the snowflake bridge? and others can add extra points if they come up with more stuff
16:02:08 how is it going dcf1?
16:02:33 give me a sec, just want to quickly upload graphs that have 1 more data point
16:02:41 :)
16:02:51 hi
16:02:55 hi o/
16:03:56 https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40095#note_2774428
16:04:04 refresh if you had it open already
16:04:25 the summary is what's written in the pad
16:04:39 the load balancing migration was successful, the snowflake bridge is no longer bottlenecking on tor
16:04:45 \o/
16:04:49 because of that, it's able to provide more bandwidth
16:05:01 o/
16:05:04 but also because of that, it has reached the limit of its hardware and won't be able to go any faster
16:05:04 yeah!
16:05:36 metrics are confused by multiple instances sharing the same fingerprint, so I had to make graphs manually from extra-info descriptors
16:06:23 does collector keep the descriptors from each instance, or did you have to get them from the snowflake server itself?
16:06:44 \0/
16:06:57 collector has everything
16:07:06 nice :) that's good at least
16:07:13 the data are all there, making the graphs is a matter of interpreting them differently
16:08:08 I want to start a discussion with core tor about:
16:08:31 1. finding a different way to do ExtORPort authentication (remove the need for extor-static-cookie), and
16:08:44 2. having a supported way to disable onion key rotation.
16:09:31 The trick I found to disable onion key rotation was to create a directory at the target path that tor tries to rename its files to when doing the rotation, which causes it to fail before replacing the file
16:09:37 ahf: ^ so you can read the backlog when you are free :)
16:09:43 needless to say that's a little fragile, though it works for now
16:10:48 snowflake has a lot more potential. if we had a bigger bridge with more processing power, it could go a lot faster, I think.
16:11:26 yeah this is awesome
16:11:33 based on estimates of the 6 days on the staging bridge, though, it's no longer just a matter of CPU, we would need a lot of network transfer per month, too. at least 50 TB / month in both directions, though obviously if it's going faster it will need even more.
16:13:37 the next point is kind of related
16:13:38 I think there are still some server providers that allow unlimited bandwidth
16:13:47 https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/40086#note_2773087
16:13:55 dcf1: Asking on tor-relays may help to get some feedback about servers and performance.
16:14:16 anadahz: there is a thread on tor-relays about this issue already.
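
An aside on the onion key rotation workaround described at 16:09:31: below is a minimal sketch of the idea, assuming tor rotates its onion keys by renaming them to a ".old" path before writing fresh ones, so that a directory already occupying that path makes the rename fail and the existing key survive. The data directory path and key file names are assumptions for illustration, not the bridge's actual layout.

    # Minimal sketch, not the actual bridge setup: plant directories where tor
    # would rename its onion keys during rotation, so the rename fails and the
    # existing keys are kept.
    import os

    KEYS_DIR = "/var/lib/tor-instances/snowflake/keys"  # hypothetical DataDirectory/keys

    for blocker in ("secret_onion_key.old", "secret_onion_key_ntor.old"):
        # A directory at the ".old" target path makes tor's rename fail,
        # aborting the rotation before the current key is replaced.
        os.makedirs(os.path.join(KEYS_DIR, blocker), exist_ok=True)
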
16:14:44 snowflake-server alone uses about half the CPU resources of the bridge (4*tor, 4*extor-static-cookie, and haproxy use the other half)
16:15:06 anadahz: https://forum.torproject.net/t/tor-relays-how-to-reduce-tor-cpu-load-on-a-single-bridge/1483
16:15:56 great! https://lists.torproject.org/pipermail/tor-relays/2022-January/020183.html
16:16:30 there were not any obvious easy wins in the CPU profile. a lot of the graph is understandably under kcp-go's scheduler, which is the driver for a lot of other operations
16:17:55 cohosh, dcf1: very nice! i saw the BUG() ticket in tpo/core/tor related to this and i love this hack you have found here
16:18:52 thanks ahf. the BUG is not causing any problems, to be clear. just something incidentally discovered while doing these other things.
16:19:02 ya, i also read it as not blocking anything there
16:21:10 dcf1: are we at a point where we should look for other hosting options for snowflake and possibly reach out to people about taking this on?
16:22:25 I think in the long term we need to aim for allowing a more distributed deployment of the snowflake server,
16:22:25 yes, I think so. I was thinking about asking around to see if there are any groups interested.
16:22:57 maybe ggus can help here
16:24:13 dcf1: do you already have groups in mind to ask? do you need help there?
16:24:53 i'll let you know
16:25:14 great
16:25:39 anything more about this? any other topics before we move to the reading group?
16:25:44 I'm done
16:26:02 thanks dcf1 for your work on this
16:26:50 it's awesome to see snowflake growing so much :D
16:27:12 \o/
16:27:40 thanks dcf1 ! \o/
16:27:46 I guess we can move to the reading group
16:28:07 for this session we read 'Meteor: Cryptographically Secure Steganography for Realistic Distributions'
16:28:16 https://dl.acm.org/doi/10.1145/3460120.3484550
16:28:25 does anybody have a summary of the paper to share?
16:29:59 TLDR: This is a paper that proposed a method to encode data into natural English text that fits the context.
16:31:04 my knowledge of steganography is very limited and I had a hard time grasping the details of the paper
16:32:09 it's interesting how they use English language generators like GPT-2 to hide traffic inside
16:33:00 but seeing their CPU usage for very small texts, it looks pretty limited in reality
16:33:34 I think, think of it like this:
16:34:04 both sides have a shared key, so they can encrypt/decrypt from each other and, equally importantly, they can produce a random-looking ciphertext stream between them.
16:35:03 When a generator like GPT-2 generates a text, for each word it finds a probability distribution for what the next word should be (a bunch of probability values that sum to 1.0). Then it internally does a weighted random selection over that probability distribution.
16:35:45 Meteor replaces GPT-2's random number generator with (cryptographically) pseudo-random bits taken from the ciphertext they are exchanging
16:36:15 so it's actually deterministic, but an adversary who is watching the distributions being generated cannot tell that without knowledge of the shared key
16:36:39 This can be used to transfer metadata, like a bridge line, on a platform controlled by an adversary, like chat apps. I remember that things like OnionShare will generate an onion link that is a quite obvious target for chat app observers.
If the link is encoded with this, then it is not possible for chat app observers to know people are using OnionShare by scanning for the .onion keyword in chat logs
16:37:30 On the receiving side, the receiver looks at what word was generated, looks at the range of random values that could have resulted in that word, and takes the common prefix of those values (if any) as a few bits of the covert message.
16:37:47 they can do that because both sides share the same generative text model.
16:39:01 dcf1: nice summary, thanks :)
16:39:21 thanks for the summary...
16:40:52 this needs a handshake to take place, does that handshake include the information to construct the model as well as the PRG key?
16:41:15 cohosh: the model and shared key are exchanged in advance
16:41:36 the model can be known to the adversary but the key needs to stay secret
16:41:49 I think GPT-2 models are GBs or TBs large
16:42:08 wow
16:42:36 why could you not do some kind of DH over that cover channel to set up the key? you could use a default key = 0 (or whatever)
16:42:41 maybe it will be easy to fingerprint?
16:43:44 The DH process itself will be visible to the adversary...
16:44:07 yep, but plausibly deniable
16:44:13 (I guess)
16:44:35 https://en.wikipedia.org/wiki/GPT-2#Limitations "GPT-2 deployment is resource-intensive; the full version of the model is larger than five gigabytes..."
16:45:38 They talk about making a hybrid public-key + symmetric-key crypto system in a few places, but they propose to use existing public-key steganography for the symmetric key exchange, then Meteor for the symmetric part.
16:46:20 okay so GPT-2 is one specific model, not a general generative method
16:46:58 yes, in the paper they use other models that they trained themselves from Wikipedia and HTTP headers
16:47:02 But yes, I think one reason the shared key is necessary is because you need something that looks like a uniformly random bitstream, which ciphertexts are and plaintexts generally are not.
16:47:31 and the advantage of using something large like GPT-2 over a smaller model like Wikipedia is the efficiency of the scheme in terms of overhead length?
16:47:56 the GPT-2 model is hard to distinguish from real humans talking
16:48:02 In their summary of public-key steganography, they give an example of hashing the stego tokens in order to achieve that unbiased distribution. Or something like that.
16:48:07 (if i'm reading table 3 correctly)
16:48:24 Yes, it generates more natural speech...
16:48:35 ah i see so it is steganographically better
16:48:43 GPT-2 is actually scarily human
16:50:33 Can't resist sharing this video, a novelist using generative text models as an aid to writing https://www.youtube.com/watch?v=cIpErjWBqm0
16:50:50 :D
16:51:36 So, are there any immediate implications of this work for us?
16:52:34 one significant practical limitation is that even in the imagined world where all encrypted communication is banned, steganography does not hide who you are talking to, which is usually the easiest basis for censoring something
16:52:57 So an unencrypted steganographic request to @GetBridgesBot would still be detected and blocked
16:53:32 Yes, unless there is a broadcast channel...
16:53:52 i like shelikhoo's use cases here, users sharing onion addresses or bridge lines between themselves
16:54:00 shelikhoo: that is a good point.
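
An aside on the encode/decode walkthrough at 16:35:03-16:37:47: below is a toy sketch of the idea under heavy simplifying assumptions. A tiny hard-coded next-token distribution stands in for GPT-2, the key-stream-encrypted payload is just a string of '0'/'1' characters, and the paper's actual arithmetic-coding and context handling are left out; the names (MODEL, PRECISION, encode, decode) are invented for illustration and this is not the Meteor implementation.

    # Toy sketch: each token owns a sub-interval of [0, 2^PRECISION), sized by its
    # probability. The sender "samples" the next token by reading PRECISION bits
    # of ciphertext; the receiver recovers the bits that every value in the chosen
    # token's interval has in common.

    PRECISION = 8                          # sample values are 8-bit integers
    MODEL = [("the", 128), ("a", 64),      # token -> share of the 2^8 sample space,
             ("this", 32), ("that", 32)]   # i.e. a fixed next-token distribution

    def intervals():
        """Partition [0, 2^PRECISION) among tokens, proportional to probability."""
        lo, table = 0, {}
        for token, width in MODEL:
            table[token] = (lo, lo + width)
            lo += width
        return table

    def common_prefix_len(lo, hi):
        """Number of leading bits shared by every value in [lo, hi)."""
        n = 0
        while n < PRECISION:
            shift = PRECISION - 1 - n
            if (lo >> shift) != ((hi - 1) >> shift):
                break
            n += 1
        return n

    def encode(cipher_bits):
        """Spend random-looking cipher bits as the text generator's 'randomness'."""
        table, tokens, i = intervals(), [], 0
        while i < len(cipher_bits):
            window = (cipher_bits[i:] + "0" * PRECISION)[:PRECISION]
            r = int(window, 2)
            token = next(t for t, (lo, hi) in table.items() if lo <= r < hi)
            tokens.append(token)
            # Only the bits the receiver can recover are consumed; every interval
            # in this toy MODEL shares at least one leading bit.
            i += common_prefix_len(*table[token])
        return tokens

    def decode(tokens):
        """Recover the shared leading bits of each chosen token's interval."""
        table, bits = intervals(), ""
        for token in tokens:
            lo, hi = table[token]
            n = common_prefix_len(lo, hi)
            bits += format(lo >> (PRECISION - n), "0%db" % n)
        return bits

    hidden = "1011001110001111"            # stand-in for key-stream-encrypted data
    print(decode(encode(hidden)))          # recovers 'hidden', plus trailing padding bits

The real scheme in the paper handles the cases this toy skips, such as tokens whose intervals reveal no bits and re-keying across messages.
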
16:54:43 the model size seems like too big a barrier
16:55:03 unless there is a way to get good enough steganography with something much smaller and more accessible
16:55:03 Meteor has 12x overhead, so it requires a fairly long conversation for a small amount of data, but I guess it's doable
16:55:22 I'm thinking something like a long-running Facebook chat between two users (bots). The goal here is to avoid detection by Facebook as bots.
16:55:47 the Meteor paper says that it is a first step, I don't think they see it as actually usable in the real world
16:55:48 These two bot users carry on a chat with each other using Meteor messages for text, and then at either side it's bridged into a proxy.
16:55:54 on a phone it will kill your battery
16:56:12 so the actual users of the proxy don't necessarily need to do all the expensive processing.
16:57:09 I guess Facebook bot detection is not only about the content but about the rate, so you have limited bandwidth
16:57:33 dcf1: woah
16:57:40 sure, my point is that the heavy processing needs are not necessarily a limitation, because you can have a semi-centralized proxy do that for you
16:57:58 how many chars can a human type per second? 12? that's 1 B/s of actual Meteor bandwidth
16:58:47 haha okay so maybe it can serve the same purpose as a numbers station
16:59:12 but yes, the low bandwidth kind of puts it outside the realm of most of what we do
17:00:24 you can have many bots talking and parallelize, but still
17:03:02 anything more about Meteor?
17:03:27 BTW, I just found the website and it has a nice description: https://meteorfrom.space/
17:03:51 do we want to choose the next paper for our reading group?
17:04:32 censorbib is updated with a bunch of 2021 papers https://censorbib.nymity.ch/
17:04:36 meskio: nice, thanks for the link!
17:06:02 I'm curious about the weaponizing middleboxes one, it doesn't have many implications for us, but might be fun to read
17:06:14 https://censorbib.nymity.ch/#Bock2021b
17:06:42 sounds good to me
17:07:26 feb 17?
17:07:36 (related: GFW middlebox repurposed as a DDoS dropper: https://arstechnica.com/information-technology/2015/04/ddos-attacks-that-crippled-github-linked-to-great-firewall-of-china/ )
17:07:57 :D
17:08:48 ok, I guess we are done for today
17:09:04 I'll live the meeting open for one more minute just in case
17:09:12 s/live/leave/
17:10:12 #endmeeting