15:58:34 <meskio> #startmeeting tor anti-censorship meeting
15:58:34 <MeetBot> Meeting started Thu Sep 8 15:58:34 2022 UTC. The chair is meskio. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:58:34 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
15:58:39 <meskio> Hello everybody!!
15:58:44 <cohosh> hi
15:58:52 <meskio> here is our meeting pad: https://pad.riseup.net/p/tor-anti-censorship-keep
15:59:01 <meskio> feel free to add what you've been working on and put items on the agenda
15:59:53 <shelikhoo> hi~
16:00:00 <meskio> First of all, an announcement: there will not be a meeting next week, some of us will be at the Tor meeting, talking in person :)
16:00:09 <meskio> next meeting will be Sept 22
16:01:27 <meskio> shelikhoo: I kept the point about webtunnel, is there anything to talk about there? or should we move on?
16:02:17 <shelikhoo> the next step that needs to be discussed is how to get rdsys to distribute connection info
16:02:33 <shelikhoo> I would like to discuss this part
16:02:40 <meskio> +1
16:04:49 <meskio> can we just use goptlib to add the needed params to the bridge descriptors so rdsys discovers them?
16:05:12 <anadahz> (hi)
16:06:09 <shelikhoo> yes, and is this info sent to rdsys only?
16:07:12 <meskio> AFAIK it gets sent to the bridge authority and the bridge authority sends it to rdsys; metrics.tpo will clear it up before publishing it
16:08:44 <shelikhoo> yes, just to confirm this info is not going to be something one can get from tor in any other way...
16:09:59 <meskio> I think you need to use SmethodArgs from goptlib, but you can test it by creating a bridge and looking into what gets published on metrics.tpo and what you have in polyanthum
16:10:35 <shelikhoo> yes, or just try to get info from an existing obfs4 bridge...
16:10:49 <meskio> +1
16:12:00 <meskio> do we have a plan here?
16:12:41 <shelikhoo> The issue is that tor is quite complex, and I fear I could miss something
16:13:06 <cohosh> you're worried about the parameters getting leaked?
16:13:34 <shelikhoo> yes, I am unsure who will be able to receive these parameters
16:13:57 <shelikhoo> since I am not familiar with the design of other parts of tor
16:13:58 <dcf1> use SmethodArgs, that's what obfs4proxy does to distribute its secret credentials
16:14:03 <cohosh> and the consequence of them getting leaked is the bridge could be discovered and blocked?
16:14:09 <shelikhoo> yes
16:14:18 <cohosh> okay fwiw obfs4 has the same threat model
16:14:25 <shelikhoo> it will contain the domain name
16:14:32 <shelikhoo> which can be blocked
16:14:41 <shelikhoo> okay, that is reassuring
16:15:17 <shelikhoo> I think in this case, the client will pass this info via the command line, which will then be sent to tor
16:15:21 <dcf1> metrics removes such extra parameters before publishing:
16:15:23 <dcf1> https://metrics.torproject.org/bridge-descriptors.html#transport
16:15:53 <dcf1> shelikhoo: using SOCKS args (i.e. key=val in a bridge line) is generally better and more flexible than command line arguments
16:16:18 <dcf1> oh sorry, you're talking about the server side, never mind
16:16:26 <shelikhoo> dcf1: these are for the server... yes
16:16:43 <dcf1> You can use ServerTransportOptions in torrc then
16:17:16 <dcf1> Then get them from Bindaddr.Args
16:17:32 <dcf1> TOR_PT_SERVER_TRANSPORT_OPTIONS is how it works internally if you want to read about it
16:17:45 <shelikhoo> yes! I think this is better than the command line
16:17:58 <shelikhoo> let's do it this way
16:18:01 <meskio> that is nice :)
16:18:31 <dcf1> One limitation is that you cannot run multiple instances of the same transport with different options... but that's a limitation of torrc syntax
16:19:27 <shelikhoo> this should be fine for us... I think most users will only run one webtunnel managed by one tor
16:19:50 <meskio> you can always have several tor processes if you really have that use case...
16:19:59 <meskio> with different torrc files
16:20:30 <dcf1> meskio: you have to run separate instances in practice anyway, for accurate metrics
16:20:40 <meskio> yep
16:21:37 <meskio> should we move on to the next topic?
16:22:06 <shelikhoo> nothing more from me on this topic
16:22:16 <meskio> "Proposal for Outreachy"
16:22:41 <meskio> I have submitted a proposal to do some work on other distributors for GetTor at Outreachy
16:22:43 <meskio> https://gitlab.torproject.org/tpo/team/-/issues/67#note_2834285
16:23:06 <meskio> Outreachy is an internship program focused on underrepresented people
16:23:25 <meskio> so we might have an intern Dec-Feb
16:23:47 <meskio> I will mentor them, but I'll be happy to get any help there
16:23:59 <cohosh> nice :)
16:24:09 <meskio> or if someone wants to co-mentor, that is officially possible and I'll be happy to share the load
16:24:49 <meskio> we might see contributions arriving to rdsys from candidates in the coming months, I'll need to create more newcomer tickets for that...
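[Editor's aside on the ServerTransportOptions mechanism discussed above: tor serializes the `ServerTransportOptions` lines from torrc into the TOR_PT_SERVER_TRANSPORT_OPTIONS environment variable as semicolon-separated `transport:key=value` entries, which goptlib then exposes to the server process as Bindaddr.Args. The stdlib-only sketch below just illustrates that encoding; it ignores the backslash escaping that goptlib handles for real, and the `url` option name for webtunnel is hypothetical, not settled in this meeting.]

```go
package main

import (
	"fmt"
	"strings"
)

// parseServerTransportOptions parses the semicolon-separated
// transport:key=value format tor uses for the
// TOR_PT_SERVER_TRANSPORT_OPTIONS environment variable.
// (Backslash escaping of ';', ':', and '=' is omitted for brevity;
// goptlib handles it in real deployments.)
func parseServerTransportOptions(s string) map[string]map[string]string {
	opts := make(map[string]map[string]string)
	for _, entry := range strings.Split(s, ";") {
		parts := strings.SplitN(entry, ":", 2)
		if len(parts) != 2 {
			continue
		}
		kv := strings.SplitN(parts[1], "=", 2)
		if len(kv) != 2 {
			continue
		}
		if opts[parts[0]] == nil {
			opts[parts[0]] = make(map[string]string)
		}
		opts[parts[0]][kv[0]] = kv[1]
	}
	return opts
}

func main() {
	// Hypothetical option name; corresponds to a torrc line like:
	//   ServerTransportOptions webtunnel url=https://example.com/path
	opts := parseServerTransportOptions("webtunnel:url=https://example.com/path")
	fmt.Println(opts["webtunnel"]["url"]) // → https://example.com/path
}
```

(In practice a Go PT server never parses this itself: it calls pt.ServerSetup and reads the parsed key/value pairs from each Bindaddr's Args.)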
16:26:02 <meskio> I think we can move to the next topic
16:26:24 <shelikhoo> yes
16:26:30 <meskio> "A new format for placeholder addresses in PT bridge lines"
16:26:37 <dcf1> okay, quick note about placeholder addresses
16:27:12 <dcf1> we have been using placeholder addresses with incrementing port numbers :1, :2, :3, etc., because tor requires all PT bridges to have different IP:port, or it gets them confused
16:27:56 <dcf1> this causes a problem when tor is configured with ReachableAddresses or FascistFirewall, because tor thinks the PT is going to try to actually make a TCP connection to those placeholder addresses
16:28:22 <dcf1> and it says "port 1 is not one of the ports permitted by FascistFirewall, therefore I will not attempt this bridge connection"
16:28:35 <dcf1> imo it's a bug in core tor, but it's been WONTFIX for years now
16:29:15 <meskio> it might not get fixed until arti
16:29:23 <dcf1> so the proposal is to move this counter into the IP address and always use port 80 for placeholder addresses, to make them more likely to work with ReachableAddresses and FascistFirewall
16:30:49 <dcf1> Way back in flash proxy, which also needed a placeholder address, I tried to make the placeholder look as different from an actually usable IP address as possible, to reduce confusion
16:31:20 <dcf1> since then, the placeholders have been slowly morphing to look more and more like real IP addresses :)
16:31:38 <dcf1> the progression was something like:
16:31:57 <dcf1> 0.0.0.0:0 <- no good, tor uses the all-zero address as a sentinel internally
16:32:09 <dcf1> 0.0.0.1:1 <- no good, 0.0.0.X is used by SOCKS
16:32:35 <dcf1> 0.0.1.0:1 <- we used this for a while, but it ran into problems with 0/8 being considered "internal" by tor
16:32:59 <dcf1> 192.0.2.1:1 <- what we use now, using a special non-routable IP range reserved for documentation
16:33:13 <dcf1> 192.0.2.1:80 <- new proposal
16:34:20 <dcf1> there may not be much to discuss, if it seems okay I'm planning to open a merge request in tor-browser-build
16:34:54 <shelikhoo> is it possible for us to use an IPv6 address in this role?
16:35:10 <shelikhoo> in this way, we could just randomly generate an address
16:35:39 <dcf1> IPv6 is an interesting idea, I hadn't thought of that. It must be possible, because there is an IPv6 default obfs4 bridge
16:35:40 <shelikhoo> without the need to make sure different bridges have different "addresses"
16:36:24 <dcf1> I am not sure randomly generated is a good idea, though. One reason for using "artificial-looking" placeholder addresses is to reduce the risk in case tor tries to do anything with the address, for example connect to it
16:37:05 <shelikhoo> there are some reserved IP blocks in IPv6 we could use as a prefix
16:37:12 <dcf1> tor did, in fact, have a bug, where if you configured a "Bridge snowflake" line, but did not configure a "ClientTransportPlugin snowflake" line, it would initiate a direct TCP connection to the placeholder IP address!
16:37:55 <dcf1> I'm not sure if that bug has been fixed, but in that case, the decision to use a non-routable address like 0.0.3.0 was a good one, it prevented tor from making random outgoing TCP connections in the case of a misconfiguration
16:38:30 <shelikhoo> Yes, there should be similar addresses in IPv6 as well
16:38:35 <dcf1> shelikhoo: but yes, you are right, IPv6 could give some more flexibility
16:38:39 <shelikhoo> yes
16:39:19 <dcf1> okay well I will propose a merge request with the 192.0.2.(16(n−1)+t):80 format, it's good enough for our needs for now
16:39:43 <shelikhoo> yes...
16:39:55 <meskio> yes, I think it's fine for now, but we should keep an eye on it because there is not a huge space for snowflake bridges, and maybe one day we'll want to revisit it
16:39:58 <dcf1> an IPv6 placeholder would need more testing, for example I would be wary of tor internally doing something like, "no IPv6 interfaces detected, therefore I will not use this bridge that has an IPv6 address"
16:40:46 <meskio> (16 bridges, should be fine for a while)
16:41:01 <shelikhoo> yes... I hope that in arti, the anti-censorship part can be taken into consideration in the initial design
16:41:08 <dcf1> I did notice in a ticket about PT support in arti there is a line about "transports that don't use bridge addresses", so maybe this is something that will be less of a problem in the future
16:41:44 <meskio> AFAIK the arti team knows that these kinds of bridges exist
16:42:25 <meskio> anything else on this topic?
16:42:34 <dcf1> that's all from me
16:42:37 <shelikhoo> nothing from me
16:42:49 <meskio> any other topics before the reading group?
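[Editor's aside: the placeholder format dcf1 proposes, 192.0.2.(16(n−1)+t):80, can be sketched as a small function. This is an illustration only; whether the transport index t is 0- or 1-based was not specified in the meeting, and 1-based is assumed here. With 16 slots per bridge the last octet tops out at 255, i.e. the 16-bridge limit meskio mentions.]

```go
package main

import "fmt"

// placeholderAddr returns the dummy bridge-line address proposed in the
// meeting: 192.0.2.(16*(n-1)+t):80, where n is the 1-based bridge
// number and t a per-transport index (assumed 1-based here).
// 192.0.2.0/24 (TEST-NET-1, RFC 5737) is reserved for documentation and
// never routed, and port 80 is chosen to pass ReachableAddresses /
// FascistFirewall checks.
func placeholderAddr(n, t int) string {
	return fmt.Sprintf("192.0.2.%d:80", 16*(n-1)+t)
}

func main() {
	fmt.Println(placeholderAddr(1, 1)) // → 192.0.2.1:80
	fmt.Println(placeholderAddr(2, 1)) // → 192.0.2.17:80
}
```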
16:43:18 <dcf1> https://gitlab.torproject.org/tpo/core/arti/-/blob/main/doc/BridgeIssues.md is the doc I was thinking of
16:44:17 <dcf1> "Problem 6: Existing bridge-line format"
16:44:32 <dcf1> "Make addresses optional"
16:44:56 <meskio> nice :)
16:45:11 <dcf1> Okay, I'll kick off the reading group
16:45:16 <meskio> thanks
16:45:30 <dcf1> Our paper is "An Empirical Analysis of Plugin-Based Tor Traffic over SSH Tunnel" https://ieeexplore.ieee.org/document/9020938
16:46:47 <dcf1> It is a bit difficult to tease out exactly what this paper is about and what experiments were done, it's sort of scattered
16:47:31 <dcf1> Briefly, it's about looking at various properties of Tor pluggable transports (FTE, meek, obfs3, ScrambleSuit, obfs4), when those transports are configured to use an SSH upstream proxy before reaching their bridge
16:47:40 <dcf1> The topology is like:
16:48:12 <dcf1> client ---> SSH proxy ---> Tor guard ---> Tor middle ---> Tor exit ---> dest
16:48:22 <dcf1> They do 3 different experiments:
16:48:53 <dcf1> 1. distinguish different PTs (inside SSH) from each other, and from "normal" SSH
16:49:21 <dcf1> 2. distinguish different application protocols inside SSH-tunneled obfs4 (for this experiment they use only obfs4)
16:50:24 <dcf1> 3. do ML-based traffic correlation on the PT traffic -- for this experiment only, they do not look at the SSH link, but rather the links between the SSH proxy and Tor guard, and between the Tor exit and dest
16:51:01 <dcf1> as far as I could tell, experiment 3 does not use the SSH part of the topology at all, so I don't see why it matters, though I suppose having passed through an SSH tunnel could have some effect on traffic features
16:51:51 <dcf1> they do show a graph (Fig. 2) that shows SSH-tunneled obfs4 being differently shaped from plain obfs4 (though this graph is suspect for reasons I will get into)
16:52:44 <dcf1> BTW see Table I (page 4) for a breakdown of the experiments. First 6 rows are experiment #1 as I have called it, next 5 rows are experiment #2, last 3 rows are experiment #3.
16:53:24 <dcf1> The most interesting thing to me, I think, is the feature selection in section III-B
16:54:00 <dcf1> they borrowed features from the paper "Deciphering malware's use of TLS (without decryption)", which is a pretty well-known and foundational work on encrypted traffic analysis (it was a precursor to a product called ETA by cisco)
16:54:23 <dcf1> I gave a rump session talk on this line of research at PETS 2017: https://www.bamsoftware.com/talks/pets-2017-menace/index.html
16:55:31 <dcf1> There is a probability distribution over all byte values (256 features), plus binned packet sizes and interarrival times, 456 features in total
16:56:04 <dcf1> They also use the cisco tool Joy to separate pcaps into upstream and downstream flows: https://github.com/cisco/joy
16:57:24 <dcf1> For experiment #3, the correlation experiment, they don't use the 256 byte distribution features, because there they are looking at obfs4, not SSH, and the uniform distribution doesn't provide any useful information
16:58:16 <dcf1> They run all these features through some Scikit-learn classifiers, and get high accuracy, as usual with papers of this kind
16:58:41 <dcf1> (though what they call low false positive rates are pretty high: 1-3%!)
16:59:12 <cohosh> (especially considering a low base rate)
16:59:29 <dcf1> I have some critical comments about the paper, but I will let anyone else chime in
16:59:41 <meskio> I find it weird that they pick SSH proxying, is proxying tor over ssh actually common?
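[Editor's aside, to make the feature vector concrete: the first 256 entries of the paper's 456-element vector are the distribution over the flow's byte values. Here is a minimal sketch of that one feature family (the binned packet sizes and interarrival times that make up the other 200 entries are omitted). For obfs4 payloads this distribution is close to uniform, which is why the authors drop these features in experiment #3.]

```go
package main

import "fmt"

// byteDistribution computes the normalized distribution over the 256
// possible byte values of a flow's payload -- the first 256 entries of
// the 456-element feature vector described in the paper.
func byteDistribution(payload []byte) [256]float64 {
	var dist [256]float64
	if len(payload) == 0 {
		return dist
	}
	for _, b := range payload {
		dist[b]++
	}
	for i := range dist {
		dist[i] /= float64(len(payload))
	}
	return dist
}

func main() {
	// 'a' occurs twice out of three bytes, 'b' once.
	d := byteDistribution([]byte("aab"))
	fmt.Println(d['a'], d['b'])
}
```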
17:00:07 <dcf1> yes, that stood out to me too
17:00:37 <dcf1> the introduction (and throughout) shows a fairly poor understanding of pluggable transports and why people might use them
17:00:46 <meskio> as you say, I was not expecting that using ssh or not makes any difference in categorizing tor traffic
17:00:56 <cohosh> every once in a while we get users asking us how to do something like this and we try to dissuade them from it
17:00:56 <dcf1> "Considering it is not safe to totally trust one single node, more users rely on fronting proxies to forward their traffic to the entry node of Tor."
17:01:00 <shelikhoo> I think there is one issue that is common to both obfs4 and VMess: neither hides the original traffic's general traffic shape
17:01:12 <meskio> they might be searching for an excuse to write something "new" in the field...
17:01:25 <dcf1> Well, this is totally a supported configuration in Tor Browser. It's what is supposed to happen when you set a proxy in the connection settings dialog.
17:01:50 <dcf1> tor sets the environment variable TOR_PT_PROXY, and the PT client is supposed to understand that (send back PROXY OK) and act on it.
17:01:55 <shelikhoo> like downloading a 4 MB file from an HTTP 1.0 server will always result in a connection with the same shape
17:02:15 <dcf1> I find the justification weak, but there is some intuition for it in section II-C:
17:02:52 <shelikhoo> that after the proxy handshake, the client sends a small request, then the server sends a 4 MB chunk back
17:03:04 <dcf1> by looking only at the SSH part of the link (except, confusingly, in experiment #3, but ignore that), they remove some features that could be potentially useful, like the TLS ciphersuites of meek, or the entropy profile of FTE
17:03:42 <dcf1> so in a sense, they are making the challenge *harder* for themselves, by forcing themselves to only look at flow features like size and timing (and byte distribution, but ignore that)
17:04:14 <dcf1> shelikhoo: yes, and they kind of admit as much somewhere, let me find it...
17:04:29 <dcf1> "the effect of application identification is better than plugin identification"
17:05:17 <dcf1> which is basically saying that the application features overshadow the PT features
17:05:29 <dcf1> (also disappointing that they don't mention the different iat-modes for obfs4)
17:05:57 <dcf1> I can kind of imagine that an SSH tunnel could affect packet sizes, in this sense:
17:06:20 <dcf1> obfs4 is going to try to send packets that are MTU-sized, when possible (at least in iat-mode=0)
17:06:55 <dcf1> the overhead added by the SSH proxy will cause those packets to be re-segmented when they are sent back out
17:07:18 <dcf1> I was thinking of this when looking at Fig. 2, which made me think, why are there so many data points > 1500 in Fig. 2?
17:07:32 <cohosh> going back to what you said about Table I, that means they only did open-world experiments for the 3rd experiment with the "campus net" background traffic, but all other experiments were closed-world and only classified traffic that was tor traffic?
17:07:36 <dcf1> I find it suspicious, does anyone have a good explanation?
17:08:17 <dcf1> They even mention an MTU of 1500 bytes in III-B; what conditions were they running in that they could measure *average* packet sizes greater than 1500?
17:08:42 <dcf1> That kind of thing can happen on loopback localhost connections, but it's also possible there's an error in their experiment.
17:09:23 <cohosh> o.O that is weird
17:09:36 <anadahz> Does iat-mode > 0 actually make any difference to the detection of obfs4?
17:09:56 <dcf1> cohosh: that, I'm not sure about. They don't really say anything about what their traffic mix was, or how they define "normal" SSH, and they don't report classification rates for "normal" SSH.
17:10:32 <dcf1> anadahz: iat-mode=1 and iat-mode=2 do affect the packet size distribution a lot, they cause obfs4 to send more sub-MTU packets than normal, at least
17:11:07 <dcf1> anadahz: see https://www.bamsoftware.com/talks/pets-2017-menace/index.html "this is your Tor on obfs4 with timing obfuscation" and "this is your Tor on obfs4 with aggressive timing obfuscation"
17:12:01 <dcf1> obfs4's timing obfuscation clearly leaves a lot of features (like the big gap after the handshake), but I'll bet a classifier trained on one would not work on another.
17:12:25 <anadahz> thx dcf1
17:12:42 <dcf1> I guess what I'm getting at is, unfortunately, there's not a lot for us to take from this paper
17:13:07 <dcf1> other than a general notion that there are researchers doing this kind of thing and this is their level of awareness
17:13:46 <dcf1> the experiments are not well defined, the evaluation is unconvincing, one gets the feeling it would fall apart if not done by the authors themselves
17:14:37 <cohosh> agreed
17:14:49 <meskio> yes, which is good news for us :)
17:14:59 <dcf1> one interesting aspect is that the paper is written from a fairly sympathetic point of view towards tor and pluggable transports
17:15:01 <shelikhoo> add this to other patents done against V2Ray, and I think we need to consider hiding the application connection shape in the future when designing anti-censorship protocols
17:15:18 <dcf1> it is written more like a defense paper than an attack paper, even though it is about adversarial detection
17:15:41 <cohosh> even though they didn't go into detail about this "campus net" background traffic, it sounds somewhat similar to what large university-based research groups are trying to do to analyze this type of traffic analysis
17:15:57 <cohosh> CU Boulder has done this, for example
17:15:58 <meskio> but in a defense paper I would expect some recommendations on changes to make to PTs to fix them...
17:16:19 <dcf1> yes, just because there are 1000s of weak ML classification papers doesn't mean there are not some good ones
17:16:51 <dcf1> Also, I appreciated the paragraph about safety and privacy in section IV-A, which I think is actually on the mark.
17:16:54 <shelikhoo> there is an already deployed ssh -D connection blocking system that allows shell and sftp, but blocks socket forwarding over the ssh tunnel
17:17:13 <dcf1> "Our research does not raise any privacy issues during data collection. All traffic captured in the experiments is generated by ourselves, and the self-built Obfs4 bridge is set to be unpublished in the configuration file called torrc."
17:17:25 <dcf1> "In addition, considering the shortage of available Tor bridges, we only request to the Tor project one time for each kind of plugins."
17:17:50 <dcf1> shelikhoo: that's the other big cloud hanging over the communication model in this paper
17:18:26 <dcf1> besides "how common is it to use an SSH proxy with tor in this way?", it's "well, now SSH *is* your pluggable transport"
17:18:49 <dcf1> "why are you trying to distinguish different PTs inside the SSH tunnel, why do you care?"
17:20:12 <meskio> fair questions
17:20:24 <dcf1> that's all I have about this paper, anything else?
17:21:01 <meskio> not from my side
17:21:11 <shelikhoo> It would be interesting to see if there is anything to detect the type of application over KCP
17:21:46 <dcf1> I hadn't read this one in advance before suggesting it, but I can probably find more papers of this kind that may have more to teach us
17:22:17 <meskio> yes, we can give others a try, I hope we can find papers that are a bit less thin
17:22:23 <shelikhoo> right now most stream-in-stream proxies don't hide traffic shape and reveal the inner application type
17:23:01 <dcf1> shelikhoo: yes, that's true. A few years ago there was not really evidence that this kind of detection was happening, now there is starting to be a little evidence.
17:23:05 <anadahz> Though in IV-A: "We send emails to bridges@torproject.org to acquire available Tor bridges when collecting the traffic of different plugins"
17:23:08 <dcf1> still not a lot, but it's better to be ahead of these things.
17:23:16 <shelikhoo> KCP or a mux may be able to make it more difficult to get at what is inside
17:23:21 <anadahz> So they did use other bridges instead of their own.
17:24:11 <dcf1> anadahz: it's not totally clear, but I interpret it to mean they used their own obfs4 bridge (maybe only for experiment #3); for the other transports they used BridgeDB bridges, but only 1 of each transport
17:24:31 <shelikhoo> I think we should invest in creating a proxy protocol that doesn't reveal it is a proxy, and doesn't reveal the tunneled application type
17:24:32 <dcf1> to me there's no ethics or privacy problem with doing that
17:24:55 <shelikhoo> that's all from me
17:25:07 <anadahz> dcf1: yes, it makes sense
17:25:10 <dcf1> shelikhoo: have you seen https://people.torproject.org/~dcf/obfs4-timing/ ?
17:25:37 <dcf1> it shows how to get almost arbitrary shaping by using a "pull" data model internally, rather than a "push" model
17:25:41 <dcf1> https://lists.torproject.org/pipermail/tor-dev/2017-June/012310.html
17:26:14 <dcf1> For timing+size obfuscation, I think it's the right way to structure an application internally
17:26:37 <dcf1> similar shaping is possible with Shadowsocks AEAD, using zero-byte payload packets for padding when needed
17:26:38 <meskio> the problem with those mechanisms is that they increase latency, isn't it?
17:26:50 <meskio> in places like china that might mean becoming even less usable
17:26:54 <dcf1> meskio: not necessarily, it depends on the shaping model
17:27:14 <dcf1> if your traffic model has large gaps without sending, then yes, increased latency is unavoidable
17:27:32 <shelikhoo> I have not read it yet... will read it after the meeting
17:27:35 * cohosh gotta go
17:27:47 <dcf1> if your traffic model is constant bitrate 1 Mbps, then you are sending mostly padding, and filling in with actual data when available, in that case it doesn't slow down the useful payload
17:27:52 <cohosh> thanks for the paper suggestion, dcf1!
17:28:02 <dcf1> bye cohosh
17:28:10 <meskio> ciao cohosh
17:28:15 <shelikhoo> bye cohosh~
17:28:34 <meskio> yes, but then you might care about mobile connections where you pay per MB
17:28:48 <meskio> it's a hard balance, but very interesting to investigate
17:29:03 <shelikhoo> there are a lot of issues when it comes to padding and traffic shaping
17:29:07 <dcf1> meskio: yes, I am not saying that constant bit rate is a good model for circumvention either, I'm saying everything depends on what model you choose
17:29:18 <shelikhoo> but it should be investigated
17:29:24 <dcf1> *but* it's possible to conform to any given model, once you have chosen a model
17:29:25 <anadahz> bye cohosh
17:29:34 <meskio> :)
17:29:57 <meskio> should we wrap it up?
17:30:01 <dcf1> yes
17:30:12 <meskio> #endmeeting
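[Closing editor's note on the shaping discussion: the "pull" model dcf1 describes, where the traffic model decides when and how much to send and the application fills each frame with real data when it has any and padding otherwise, can be sketched as below. This is an illustration only, not any deployed protocol; the 512-byte frame size is arbitrary, and a real implementation would encrypt frames so padding is indistinguishable from data, and emit them on the traffic model's schedule.]

```go
package main

import "fmt"

// frameSize is an arbitrary fixed frame size for this sketch.
const frameSize = 512

// nextFrame is called whenever the traffic model says it is time to
// send. It pulls as much queued application data as fits into a
// fixed-size frame and leaves the rest of the frame as zero padding,
// so the on-wire sizes are independent of the payload.
func nextFrame(queue *[]byte) []byte {
	f := make([]byte, frameSize)
	n := copy(f, *queue) // pull real data, if any
	*queue = (*queue)[n:]
	// f[n:] remains zero: padding
	return f
}

func main() {
	queue := []byte("hello")
	f := nextFrame(&queue)
	fmt.Println(len(f), len(queue)) // → 512 0
}
```

With a constant-bitrate model, real data rides inside frames that would have been sent anyway, so (as dcf1 notes) the useful payload is not delayed; the cost shifts to bandwidth overhead instead, which is meskio's pay-per-MB concern.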