15:59:14 #startmeeting tor anti-censorship meeting
15:59:14 Meeting started Thu Oct 19 15:59:14 2023 UTC. The chair is onyinyang[m]. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:59:14 Useful Commands: #action #agreed #help #info #idea #link #topic.
15:59:20 hi~
15:59:22 hello
15:59:32 hello everyone!
15:59:45 here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
15:59:54 I am just going to restart my client quickly while we fill out the pad
16:03:57 I removed the running flag discussion, I don't think we need to talk about it, but if someone disagrees let me know
16:04:37 Cool, I was just going to start with the Armored bridge line discussion topic since it seems to be the only one
16:05:19 So the first discussion topic today is continuing our discussion from last week:
16:05:19 * Armored Bridge line Spec(Oct-19: let's discuss again)
16:05:19 https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/126#note_2954127
16:06:13 yes!
16:06:54 last week, we decided to read the spec and discuss it this week
16:07:08 I think the spec looks good, thank you for the work there shelikhoo
16:07:19 it is about sharing bridges in a way that has better OS integration and detects errors
16:07:20 there are a few things I think would be nice to add there:
16:07:43 we should have some examples of bridgelines and their conversion, so people implementing it have something to test
16:08:09 and I think we should decide on a domain name and include specifics on how the bridge URL will look with it
16:08:25 I have two domain names bought: brdg.es and bridge.st
16:08:38 I think I like the first, but I'm ok with either or another
16:09:17 I have no preference on the domain name
16:09:28 With .es, it should be puent.es :)
16:09:51 just joking
16:09:52 yes, I thought about it, but maybe confusing for non-spanish speakers :D
16:10:40 speaking of non-english languages, right now the armored bridge line does not support non-ascii characters
16:10:47 as a result of a compression step
16:11:04 do we wish to address this issue, or let it be
16:11:34 I don't expect to have non-ascii bridgelines in the near future, things like webtunnel urls can be converted to ascii chars...
16:11:58 but maybe it is something to say explicitly so implementations throw an error if non-ascii is input
16:12:23 what is the data type of the arguments that bridges publish now in their descriptors?
16:12:48 i.e., in the bridge descriptor specs? Probably u8[] or UTF-8, I would guess.
16:13:08 yes, I will add that it should return an error if non-ascii characters are encountered
16:13:22 what happens if a bridge publishes some arguments that are non-ascii? Presumably there would be an error, at what stage of the pipeline would the error happen?
16:14:02 good questions, I don't know, I'm not even sure which spec that should be in
16:14:57 when an obfs4 bridge publishes e.g. cert= and iat-mode= parameters, there's a protocol for that. I'm looking for it now.
16:16:09 it doesn't say in the extra-info spec: https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/spec/dir-spec/extra-info-document-format.md
16:16:13 it just says arglist
16:16:25 so implementation specific...
16:16:54 there is also the pt-spec
16:16:58 I think there may be a blanket requirement that descriptors have to be UTF-8 (added after the core team started experimenting with rust, since it's less convenient to deal with byte strings)
16:18:00 In goptlib, you call SmethodArgs to add arguments you want to have published in the descriptor:
16:18:03 https://pkg.go.dev/git.torproject.org/pluggable-transports/goptlib.git#SmethodArgs
16:18:22 The data type there is a Go string, which is a byte array.
16:18:44 * arma2 is nearby, listening in case anything needed from him
16:19:14 It gets communicated to the tor process using the SMETHOD message
16:19:15 arma2: any idea if bridgelines were supposed to support UTF-8 chars?
16:19:15 ah the pt spec says ascii: https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/spec/pt-spec/ipc.md
16:19:16 https://spec.torproject.org/pt-spec/ipc.html#pluggable-transport-server-messages-server-messages
16:19:40 meskio: the original plans were only 'normal' characters, but, i don't know if things have changed since then
16:19:57 the thing is we can, let's say, add a single bit at the beginning of the bridgeline
16:19:58 but, we are also planning to change how we deal with PTs and core tor
16:20:02 the armored one
16:20:04 cohosh: ASCII is for the variable name, the value is ::=
16:20:22 so that we could add unicode support later without a workaround
16:20:26 dcf1: ah i see
16:20:51 Yeah so as far as the PT spec goes, I think the type is byte arrays, not even guaranteed UTF-8.
16:21:02 having an extra byte for the type at the beginning might make sense so we are ready to support other formats...
16:21:37 or even just a bit
16:21:59 mmm, a bit is a bit limited as we might need extra ones in the future...
16:22:00 cannot contain newline or \0, though, the encoding of SMETHOD args doesn't handle that case
16:22:26 yes, then we could just add an empty byte
16:22:37 as a way to future proof
16:22:51 oh sorry cohosh, I pasted without thinking. ArgChar itself says US-ASCII, it doesn't contradict what you said
16:23:22 maybe a byte is too much, as you work on making it small, I don't know
16:23:42 There are some PT messages where non-ASCII bytes can be escaped (e.g. STATUS), but SMETHOD ARGS uses its own custom escaping scheme that doesn't have support for that.
16:24:51 we have to worry about the dir spec more than the PT spec though if we are scrapping it soon anyway?
16:24:56 (for some definition of soon)
16:24:56 dcf1: the 0 byte or bit(s) will be removed during the bridgeline unwrap process, so it doesn't matter if it couldn't be processed by the pt spec
16:25:49 shelikhoo: no, what I'm worried about is that if bridgeline armoring supports a narrower data type than is generally permitted for bridge lines, then what kind of error results, is it easy to detect and debug, etc.
16:26:25 so I'm thinking of ways it would be possible to get a non-ASCII bridgeline, and I'm thinking one possible way would be if the server PT supplies non-ASCII args to tor which are then published in a descriptor
16:26:55 yes... I think the result of discussion will be we just add a type indicator, and the error will be encountered when we try to encode to the armored bridge line
16:27:12 this will happen when we try to distribute an armored bridgeline
16:27:22 (Of course one could circumvent the normal reporting mechanism and just maliciously post descriptors of whatever contents to Collector, but I'm not thinking about deliberate attacks here, I'm thinking of possible inadvertent consequences)
16:28:31 the only inadvertent consequence that could already be happening is webtunnel urls with utf chars, not sure if they are encoded before publishing them
16:28:33 Through our research just now, it looks like the way I was thinking of would not actually work to get non-ASCII into a descriptor. (goptlib enforces it, though who knows what tor does once it gets the bytes)
16:29:01 Yeah, so it seems like it may not actually be a serious problem in normal use then.
16:29:17 sounds good then the 7 bit encoding
16:29:30 I will say, though, that the fact we are having this discussion makes me think that maybe the armoring is trying to be a bit too clever o_O
16:29:33 yes, so should we have a type indicator or not
16:30:07 and maybe could be solved by removing an element of encoding (trading compression for simplicity), rather than further complicating by adding a type indicator
16:30:10 I think it is a good idea to have it for future proofing, but I'm not sure if a bit, a byte or 4 bits...
16:30:31 But I haven't really looked at the proposal, this is just an outsider's impression. I'll support your decision.
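For reference, the idea discussed up to this point (reject non-ASCII input when encoding, and reserve a leading type indicator that is stripped again on unwrap) could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the function names, the one-byte width, and the value 0x00 are hypothetical, the example bridge line is made up, and the compression and encoding steps of the actual spec are not shown.

```python
# Hypothetical sketch: type indicator + ASCII check for an armored bridge line.
# Names, the one-byte width, and the value 0x00 are assumptions, not the spec.

TYPE_ASCII = 0x00  # assumed indicator meaning "payload is a 7-bit ASCII bridge line"


def wrap_bridge_line(line: str) -> bytes:
    """Prefix the bridge line with a type byte, failing early on unsupported input."""
    if "\n" in line or "\0" in line:
        raise ValueError("bridge line must not contain newline or NUL")
    try:
        payload = line.encode("ascii")
    except UnicodeEncodeError as exc:
        # The error surfaces at encode time, i.e. when the line is distributed.
        raise ValueError("non-ASCII character in bridge line") from exc
    return bytes([TYPE_ASCII]) + payload


def unwrap_bridge_line(data: bytes) -> str:
    """Check and strip the type byte, returning the plain bridge line."""
    if not data:
        raise ValueError("empty input")
    if data[0] != TYPE_ASCII:
        raise ValueError(f"unsupported type indicator: {data[0]:#04x}")
    return data[1:].decode("ascii")


if __name__ == "__main__":
    example = "obfs4 192.0.2.1:443 cert=EXAMPLE iat-mode=0"  # made-up bridge line
    wrapped = wrap_bridge_line(example)
    print(wrapped[0], unwrap_bridge_line(wrapped) == example)
```

Because the indicator is removed during unwrap, the plain bridge line handed to tor is unchanged; a later format (for example UTF-8 payloads) would only need a new indicator value.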
16:30:49 (there are already 2 filters dropped to favor simplicity)
16:31:12 I think we can go with one byte, just in case
16:31:33 the extra length should not matter that much in the end
16:31:53 and processing things bitwise could get out of hand quickly
16:32:26 (although it is already so in the compression step)
16:32:42 I'm ok with any solution (either having a type indicator or UTF-8 support)
16:33:24 okay I will come back with these suggestions adopted
16:33:40 sounds good
16:33:50 It seems like we've mostly come to a conclusion on this topic and can move on.
16:34:00 should we decide a domain name? I haven't heard opinions, should we use brdg.es as it is shorter?
16:34:21 I think we can go with brdg.es
16:34:30 great
16:34:36 onyinyang[m]: I'm finished now with this
16:34:51 cool :)
16:34:58 The next topic is from last week but is there anything further to discuss about the snowflake broker?
16:35:23 If not we can move to interesting links
16:35:28 I have already deployed a new version this monday. nothing to discuss from me
16:35:38 nice
16:35:47 ok great. Let's discuss the interesting links then
16:36:02 The first is: "On Precisely Detecting Censorship Circumvention in Real-World Networks"
16:36:02 https://www.robgjansen.com/publications/precisedetect-ndss2024.html
16:36:12 yes there seems to be a lot of new interesting links...
16:36:19 it looks like an interesting collection of papers? should we pick one for a reading group?
16:36:58 did a conference just post a bunch of accepted papers that I missed? XD
16:37:11 I think picking one for a reading group sounds like a good idea!
16:37:12 I found these 3 papers last week. I'll post them to the mailing list probably.
16:37:33 "On Precisely Detecting..." is the one I would recommend for a reading group.
16:37:49 sounds good, and rob is around if we want to invite him
16:38:15 nice
16:38:41 what is the time frame we usually give for people to read the paper before the discussion?
16:38:52 usually we give two weeks
16:39:14 but I will be AFK in two weeks, and I see others in our team will be AFK in the following weeks
16:39:41 ahh, no, nov 12 looks good
16:39:42 yes, I was just checking that
16:39:47 this is 3 weeks from now
16:40:00 I will be away XD but can probably join for the discussion
16:40:19 do you mean 9th?
16:40:25 from a quick look at the chart, it seems snowflake rendezvous is the one considered to be the weakest part for censorship resistance
16:40:34 ahh, true, you are AFK
16:40:54 yes but also the 12th is not a thursday, as cohosh points out lol
16:40:56 nov 12th is not a thursday, the 9th works for me
16:41:07 9th also works for me :)
16:41:11 ahh, yes I mean 9
16:41:16 I was looking at october
16:41:17 my head
16:41:28 hehe
16:41:31 9 sounds good to everyone
16:41:37 Ok, November 9th it is!
16:41:56 I'll poke rob about it just in case he wants to join
16:42:35 Is there anything else from any of the interesting links or otherwise that anyone wants to bring up?
16:42:35 we should invite ryan too
16:43:15 ahh, he is the main author, I saw rob's picture and I assumed...
16:43:24 I won't know ryan
16:43:42 I think...
16:44:11 I'll drop them an email
16:44:12 https://www.rwails.org/research/wails_precisely_ndss24.pdf if you prefer, rob's page for it is nicer though :D
16:44:22 :D
16:44:48 meskio: I've already emailed them about it btw
16:45:03 ohh, great, one less thing in my queue, thanks
16:45:18 We added a citation to the snowflake paper last week after finding it
16:46:14 shelikhoo: no, you are reading the chart backwards, they say rendezvous is by far the most difficult to classify (using the features they use, anyway) because it is basically HTTPS
16:46:48 "Detecting the TLS connections to the broker performed the least-well among the four circumvention protocols (FPR = 0.18), which is expected: our network capture contains mostly TLS flows, and the Snowflake broker connections are genuine TLS connections."
16:47:18 DTLS data transfer, they say, is the most easily detectable, more than obfs4.
16:47:24 dcf1: oh no... I will read it more carefully...
16:47:59 But one of the main observations is that even with their enhanced classifiers, there are too many false positives to be practical, so they turn to multiple observations per host to increase precision
16:48:03 It's an interesting paper.
16:48:44 "For more realistic base rates, such as λ > 1 × 10⁶, the precision attained by any of the classifiers is near-zero."
16:49:40 that sounds promising
16:49:44 I think a real censor could in theory block all international dtls traffic from a certain ip if that ip contacted the broker's sni
16:50:31 but in reality, we didn't see this kind of thing in practice
16:51:17 let's leave this to the discussion, I will need to read the paper in detail..
16:51:53 ok, I think that's it for today. I'm going to end the meeting now.
16:52:10 #endmeeting
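The base-rate point from the paper discussion (too many false positives once circumvention flows are rare) can be illustrated with a small calculation. This is a sketch under assumptions: `negatives_per_positive` is a hypothetical parameter standing for how many ordinary flows there are per circumvention flow, and the perfect true positive rate is chosen for illustration; only FPR = 0.18 comes from the quote above.

```python
# Illustrative only: classifier precision as circumvention flows become rare.
# The parameter names and the TPR value are assumptions, not the paper's setup.

def precision(tpr: float, fpr: float, negatives_per_positive: float) -> float:
    """Precision = TP / (TP + FP) for one positive flow per N negative flows."""
    true_positives = tpr * 1.0
    false_positives = fpr * negatives_per_positive
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Even with a perfect detection rate, a fixed FPR swamps the true positives.
    for n in (1, 1_000, 1_000_000):
        print(f"{n:>9} negatives per positive: precision = {precision(1.0, 0.18, n):.6f}")
```

With one circumvention flow per million ordinary flows, nearly every flagged flow is a false positive, which is why combining multiple observations per host matters for a classifier to be usable.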