16:00:08 <shelikhoo> #startmeeting tor anti-censorship meeting
16:00:08 <shelikhoo> here is our meeting pad: https://pad.riseup.net/p/r.9574e996bb9c0266213d38b91b56c469
16:00:08 <shelikhoo> editable link available on request
16:00:08 <MeetBot> Meeting started Thu Feb 27 16:00:08 2025 UTC. The chair is shelikhoo. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:08 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:12 <shelikhoo> hi~hi~
16:00:24 <cohosh> hi
16:01:01 <meskio> hello
16:04:08 <shelikhoo> okay, I think we can start with the first discussion point:
16:04:09 <shelikhoo> Next Step for Datagram transport mode for Snowflake
16:04:09 <shelikhoo> The broker can now reject older proxies based on the version number
16:04:09 <shelikhoo> The new server, broker, and proxy are designed to work with both new and old clients
16:04:09 <shelikhoo> We still need to add support for the new protocol to the webextension version of the proxy
16:04:10 <shelikhoo> Should we add both versions of the protocol to the client? or should we just merge the proxy, broker, and server code now, and wait long enough before merging the client?
16:04:32 <shelikhoo> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/315
16:05:19 <shelikhoo> It is possible to add both stream and datagram transport modes to the client, but this work would not be strictly necessary
16:06:00 <shelikhoo> We can proceed with aiming to merge the proxy, broker, and server first
16:06:07 <shelikhoo> and wait for proxy deployment
16:06:16 <shelikhoo> before going ahead with merging the client
16:06:33 <shelikhoo> so that we don't have to support both modes in the client, then delete it later
16:07:00 <shelikhoo> we might still want to run a staging broker and some proxies for testing
16:07:20 <shelikhoo> before the proxy part is merged
16:07:44 <shelikhoo> so there are 2 discussion points: should we go ahead and deploy a staging server
16:07:59 <shelikhoo> should we add both versions of the protocol to the client?
16:09:04 <cohosh> we've had one instance before where we rolled out a feature that required all proxies to update before we updated clients
16:09:29 <cohosh> that was when we added support for multiple snowflake bridges
16:09:31 <cohosh> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/issues/28651
16:09:47 <cohosh> we ended up deploying some metrics to track the proxy update process https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/95
16:10:21 <shelikhoo> yes, and in that case the clients already had code to support both single bridge mode
16:10:30 <shelikhoo> and multiple bridge mode
16:10:43 <shelikhoo> we just changed an option on the client
16:11:11 <shelikhoo> in the current case, we are discussing whether to support both transport modes in a single version of the code base
16:11:11 <cohosh> that's right, so it was a little different
16:11:38 <shelikhoo> this will increase the complexity of the code
16:13:22 <meskio> seeing that we can take time to do this deployment and wait until most proxies have upgraded, to me it sounds like the complexity maybe is not needed
16:13:33 <cohosh> i think the plan of merging and deploying support in proxies first, and watching how it rolls out is a good one
16:13:50 <meskio> +1
16:13:59 <meskio> we can change our mind if the roll out is too slow
16:14:09 <meskio> BTW, is the webextension also ready for this?
16:14:25 <cohosh> i think it would be good for performance testing purposes to try out both old and new clients with the deployed proxies
16:14:44 <cohosh> since that's the goal of this work
16:14:51 <shelikhoo> no, but webextension deployment is much faster
16:15:04 <shelikhoo> while the standalone proxy will take some time to update
16:15:12 <cohosh> it would be nice to get an idea of how successful it was before we flip the switch
16:15:25 <cohosh> well, i guess it's one goal of this work
16:15:51 <shelikhoo> yes, in our current configuration, if the client asks for udp transport mode and the proxy does not support it
16:16:01 <cohosh> yes
16:16:02 <shelikhoo> the connection would fail or not work
16:16:46 <shelikhoo> so unless the broker is rejecting old proxies, connecting with the new client will not work as expected
16:16:47 <dcf1> so, is the question about whether to implement backward compatibility at the broker or the client?
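[Editor's note: the broker-side version gating discussed above might look roughly like the sketch below. This is a hypothetical illustration in Python, not the actual Snowflake broker code; the function names, the version format, and the cutoff value are all invented.]

```python
# Hypothetical sketch of broker-side version gating: reject proxies whose
# advertised protocol version is too old to support the datagram transport.
# MIN_PROXY_VERSION and the dotted-version format are illustrative only.

MIN_PROXY_VERSION = (1, 4)  # assumed minimum version with datagram support


def parse_version(s: str) -> tuple:
    """Parse a dotted version string like "1.3" into a comparable tuple."""
    return tuple(int(part) for part in s.split("."))


def accept_proxy(proxy_version: str) -> bool:
    """Return True if the proxy is new enough to be matched with clients."""
    try:
        return parse_version(proxy_version) >= MIN_PROXY_VERSION
    except ValueError:
        return False  # unparseable version string: treat as too old


print(accept_proxy("1.4"))  # new enough: True
print(accept_proxy("1.3"))  # rejected: False
```

A more subtle broker (as dcf1 suggests later) could instead keep old proxies in the pool and only avoid matching them with new clients.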
16:17:23 <dcf1> "the connection would fail or not work" could be addressed in at least two ways: the broker could avoid making matches that don't work; or all clients could be universally compatible
16:17:23 <shelikhoo> we are thinking about whether to add the code to support both transport modes into the client
16:17:44 <cohosh> i think they are linked, if we decide not to make the client backwards compatible, we will need the broker to reject old proxies
16:17:49 <shelikhoo> the broker can already avoid making matches that don't work by rejecting old proxies
16:18:02 <shelikhoo> this part is done
16:18:13 <cohosh> we've used this before
16:18:22 <dcf1> Ok, to me these seem like two alternatives, not something where you would do both.
16:18:58 <shelikhoo> yes, and my current proposal is to merge the proxy, server, and broker part first
16:19:11 <shelikhoo> then wait for enough proxies to switch
16:19:26 <dcf1> also I'm mentally reserving that the broker could be more subtle than rejecting old proxies, it could instead avoid matching new clients with old proxies
16:19:36 <shelikhoo> while we write the web extension proxy support for udp transport mode
16:19:51 <shelikhoo> before we reject the old proxies
16:20:04 <shelikhoo> dcf1: it would make the matching process more complex
16:20:32 <dcf1> yes, granted
16:20:46 <shelikhoo> and the version that does a broker-assisted protocol negotiation was rejected in merge review
16:20:48 <dcf1> and universal protocol support in clients is also a cost in complexity
16:21:07 <dcf1> that's why I was mentally framing it as two alternatives, but my understanding might be bad
16:21:21 <cohosh> we also don't need to make this decision now, we can wait and see what the proxy metrics look like after the deployment
16:21:22 <shelikhoo> and the client-selects-protocol approach was used
16:21:34 <dcf1> "the version that does a broker assisted protocol negotiation was rejected in merge review" that's not because it's a bad idea, it's because that was not the focus of that part of the patch development
16:22:04 <shelikhoo> okay... but anyway there are 2 things we could do now:
16:22:21 <dcf1> that was a feature I asked you to leave out of the protocol development, specifically to defer the discussion of how to do backward compatibility for later, which is what we are doing now
16:22:28 <shelikhoo> 1. I can create a merge request with only the proxy, broker, and server update
16:23:05 <shelikhoo> 2. I can deploy a testing broker, so that we can test the new protocol without impacting the current proxy pool
16:24:29 <shelikhoo> alternatively we could consider other plans such as adding more tolerance to version differences
16:25:23 <cohosh> a testing broker would be nice for catching bugs, we'll be limited in what we can learn about performance
16:26:05 <shelikhoo> yes, I could just deploy a testing broker
16:26:13 <shelikhoo> without deploying testing proxies
16:26:33 <cohosh> why without?
16:26:56 <shelikhoo> in this way when testing, the proxies' network environment can be controlled by the tester
16:27:13 <cohosh> ah i see
16:28:20 <shelikhoo> oh, actually we can have them both: first we run a broker with proxies to test for bugs
16:28:33 <shelikhoo> before we shut down the proxies and have another round of testing
16:29:33 <cohosh> that sounds like a reasonable testing plan to me
16:30:01 <meskio> +1
16:30:16 <cohosh> we can also test proxies that have support for the new feature in the real snowflake network, if some operators are willing
16:30:30 <shelikhoo> yes. So, my plan is as follows: I will create a testing broker deployment with some proxies for testing
16:30:50 <cohosh> (just to make sure the proxy backwards compatibility works for now)
16:31:46 <shelikhoow> cohosh: I assume we need to merge and update the broker and server for the backwards compatibility to actually work
16:32:02 <cohosh> shelikhoo: ah, i missed that part of it
16:32:06 <cohosh> ok
16:32:11 <shelikhoow> otherwise if the proxies are updated first, then the result will be undefined
16:32:50 <cohosh> can this testing broker be used with both new and old clients?
16:33:06 <shelikhoow> both new and old clients will work
16:33:19 <shelikhoow> but proxies, broker, and server must be updated
16:34:49 <cohosh> ok so the test setup would need a server too
16:34:51 <shelikhoo> so my plan will be to create a testing network with server, broker, and proxies
16:35:13 <cohosh> ok, it's a lot of work but this is a big feature
16:35:21 <shelikhoo> yes, but things like that aren't hard, compared to getting our current pool to update
16:35:48 <cohosh> we can make some tor browser builds with the new client and build in the test bridge line if we want to do a wider testing call
16:36:33 <shelikhoo> yes! that being said the speed won't be directly comparable
16:36:40 <cohosh> yep
16:36:48 <shelikhoo> unless we distribute 2 tor browsers
16:36:58 <shelikhoo> one with the new client, and another one with the old client
16:37:07 <shelikhoo> and no one else is running a proxy
16:37:36 <shelikhoo> but anyway I think we already reached a conclusion for the next step: setting up a testing network
16:37:42 <shelikhoo> let's move to the next topic
16:37:46 <meskio> :)
16:38:04 <cohosh> in the meantime, it also makes sense to me to prepare a MR with just the proxy, broker, and server changes since regardless of how we handle backwards compatibility that will be the first step
16:38:27 <cohosh> oh, we can move on, since there's also no rush on that i suppose
16:38:38 <shelikhoo> yes, we are running out of time
16:38:46 <shelikhoo> there is no rush
16:38:47 <shelikhoo> Gitlab Dependency Proxy and our merge request pipeline
16:38:47 <shelikhoo> https://gitlab.torproject.org/tpo/anti-censorship/pluggable-transports/snowflake/-/merge_requests/522
16:38:47 <shelikhoo> The pipeline will stop working when we are developing from our "personal" fork of it
16:39:13 <shelikhoo> with the new dependency proxy system
16:39:39 <shelikhoo> the pipeline would use the dependency proxy if it is running from the team namespace
16:39:58 <shelikhoo> and pull directly without it when running from a personal namespace
16:40:02 <shelikhoo> so...
16:40:16 <shelikhoo> if we continue to develop with our existing workflow
16:40:44 <shelikhoo> then the personal forks that we develop on will have malfunctioning pipelines
16:40:57 <shelikhoo> should we: develop in the team namespace
16:41:00 <meskio> ohh, I guess I missed that part when writing it, I just copypasted from what the network team is using
16:41:15 <shelikhoo> or find some better plan for the gitlab CI pipelines
16:41:45 <shelikhoo> we could, obviously, run the CI jobs locally on our own machines
16:42:03 <meskio> I like the personal forks as they don't pollute the main repo with branches
16:42:10 <meskio> but I don't have strong opinions there
16:42:18 <shelikhoo> but it would make it significantly harder to collaborate and do merge reviews
16:42:34 <meskio> yes, it is nice to get CI on our merge requests
16:43:07 <meskio> I could check with TPA on this and see if we can find a solution that includes personal forks
16:43:30 <meskio> I haven't really looked into their template
16:43:36 <cohosh> +1 checking that sounds good
16:43:38 <shelikhoo> okay, we can discuss this again next week
16:43:47 <meskio> okay, I'll dig into it
16:43:57 <shelikhoo> here are the announcements:
16:43:59 <shelikhoo> rdsys 1.0 release
16:43:59 <shelikhoo> https://blog.torproject.org/making-connections-from-bridgedb-to-rdsys/
16:44:01 <shelikhoo> yeah!
16:44:18 <shelikhoo> interesting links:
16:44:19 <shelikhoo> TURN/STUN server networks from https://www.petsymposium.org/foci/2025/foci-2025-0003.php "Using TURN Servers for Censorship Evasion"
16:44:19 <shelikhoo> https://developers.cloudflare.com/calls/turn/
16:44:19 <shelikhoo> https://www.metered.ca/tools/openrelay/
16:44:19 <shelikhoo> https://www.expressturn.com/
16:44:20 <shelikhoo> https://xirsys.com/
16:44:32 <shelikhoo> ===================
16:44:33 <shelikhoo> https://arxiv.org/abs/2409.06247 "Differential Degradation Vulnerabilities in Censorship Circumvention Systems"
16:44:33 <shelikhoo> Several recently proposed censorship circumvention systems use encrypted network channels of popular applications to hide their communications. For example, a Tor pluggable transport called Snowflake uses the WebRTC data channel, while a system called Protozoa substitutes content in a WebRTC video-call application. By using the same channel as the cover application and (in the case of Protozoa) matching its observable traffic characteristics, these systems aim to resist powerful network-based censors capable of large-scale traffic analysis. Protozoa, in particular, achieves a strong indistinguishability property known as behavioral independence.
16:44:36 <shelikhoo> We demonstrate that this class of systems is generically vulnerable to a new type of active attacks we call "differential degradation." These attacks do not require multi-flow measurements or traffic classification and are thus available to all real-world censors. They exploit the discrepancies between the respective network requirements of the circumvention system and its cover application. We show how a censor can use the minimal application-level information exposed by WebRTC to create network conditions that cause the circumvention system to suffer a much bigger degradation in performance than the cover application. Even when the attack causes no observable differences in network traffic and behavioral independence still holds, the censor can block circumvention at a low cost, without resorting to traffic analysis, and with minimal collateral damage to non-circumvention users.
16:44:47 <shelikhoo> ================================
16:44:49 <shelikhoo> Wallbleed: A Memory Disclosure Vulnerability in the Great Firewall of China
16:44:51 <shelikhoo> https://gfw.report/publications/ndss25/en/
16:44:56 <dcf1> I suppose none of you on the snowflake team was contacted about the "Differential Degradation" manuscript?
16:45:07 <cohosh> no
16:45:14 <shelikhoo> not from me
16:45:24 <dcf1> I'm guessing this is yet another instance of the research anti-pattern.
16:45:47 <dcf1> I found out about it in the references when reading another paper draft. I haven't read it yet.
16:46:00 <shelikhoo> Wallbleed was previously discussed in last year's FOCI online ver.
16:46:34 <shelikhoo> okay, let's discuss it once we have read it
16:46:42 <shelikhoo> and now is the reading group
16:46:43 <shelikhoo> Identifying VPN Servers through Graph-Represented Behaviors
16:46:46 <meskio> yes, maybe for another reading group
16:46:57 <shelikhoo> https://dl.acm.org/doi/10.1145/3589334.3645552
16:46:57 <shelikhoo> https://dl.acm.org/doi/pdf/10.1145/3589334.3645552
16:47:38 <shelikhoo> do we want to have a summary of the paper, or can we move to discussion directly?
16:47:44 <dcf1> I wrote a summary of this one, unfortunately I had trouble keeping it concise.
16:47:48 <dcf1> https://github.com/net4people/bbs/issues/455
16:48:10 <meskio> dcf1: thank you for the summary, it helped me understand parts of the paper I was struggling with
16:48:26 <meskio> and also to see that some parts that are confusing to me are also unclear to you
16:48:35 <dcf1> The main reason I was interested in this one is the "graph-represented" term in the title. I supposed that meant it was going to be a paper about using server access patterns to identify VPNs.
16:48:51 <onyinyang> yes thanks for this write up and pointing to the open reviews dcf1
16:49:30 <dcf1> It turns out that is true (they do do that), but only in part. Because they also represent a whole bunch of active probing features, and store those in a graph as well, which muddies the concept somewhat.
16:50:28 <dcf1> These are the "communication graph" and the "probing graph". They use a technique called GraphSAGE to aggregate features from nearby nodes in each graph, then concatenate the features from a particular server to be classified and feed it to a normal ML classifier.
16:50:50 <shelikhoo> I think one of the important observations is that "no feature" is a feature
16:51:18 <shelikhoo> like when receiving a bad request, keep reading the connection
16:51:23 <dcf1> Ultimately I was disappointed, because they don't have much to say about what makes a communication graph characteristic of a VPN. All they say (and they do not back it up with any evidence) is that VPNs have fewer distinct users than "normal" servers.
16:51:33 <shelikhoo> or do not send a close signal
16:51:57 <dcf1> shelikhoo: yes, that is relevant. However it's an active probing feature, not an "access relation" feature as I hoped to read more about.
16:52:38 <cohosh> this work reminded me a little of the host-based classification from https://censorbib.nymity.ch/#Wails2024a
16:53:01 <shelikhoo> yes, I think once there is machine learning involved, the method is no longer human-explainable
16:53:13 <dcf1> If you check Table 4 "Ablation Experiments" on page 7, you can see in the "W/PG" row (which means "without probing graph"; i.e., with communication graph only) that the access relation features, on their own, are not good at all. E.g. Accuracy = 0.5582.
16:53:50 <shelikhoo> so even if they were able to make machine-learning-based detection work, we could learn very little about it, even with all the weights open access
16:54:10 <dcf1> shelikhoo: I disagree, I mean they are doing feature engineering in selecting the features, they must have some motivation for them. But my guess is that the authors were just kind of throwing some things together to see if they worked, without having a strong insight.
16:54:41 <shelikhoo> dcf1: I agree the feature inputs are important
16:55:02 <shelikhoo> we should try to make our protocol match other protocols' "feature inputs"
16:55:13 <dcf1> If I had to guess, I would venture that the motivation for writing this paper was to try classification using communication graph features only, because, as they say in the introduction, that offers the possibility of classifying probe-resistant servers.
16:56:09 <dcf1> Then they discovered they were not able to make that work well, so they added a bunch of traditional probing features (which they were then obligated to cast in the form of a graph for compatibility with what they had already done).
16:56:48 <shelikhoo> yeah, I think this theory would explain the structure of the paper
16:57:08 <dcf1> Because their way of constructing the probing graph is really arbitrary. Every node represents a server, and edges between servers indicate similarity of open port patterns? I mean sure, but why not any of a dozen other things?
16:58:09 <dcf1> I am still quite interested in access relation–based classification, and will keep on the lookout for such. I listed a few other possibilities at https://github.com/net4people/bbs/issues/455#issuecomment-2683821673.
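[Editor's note: to make concrete what "aggregate features from nearby nodes" means in dcf1's description of GraphSAGE, here is a minimal one-layer mean-aggregation sketch with toy data. The paper's actual model, feature dimensions, and trained weights are not reproduced; everything below is illustrative.]

```python
# Minimal sketch of one GraphSAGE-style layer with mean aggregation:
# each node's new embedding is ReLU(concat(self features, mean of
# neighbor features) @ W). Toy graph and random weights for illustration.
import numpy as np


def sage_mean_layer(features, adjacency, weight):
    """Apply one mean-aggregator GraphSAGE layer to all nodes."""
    out = []
    for node, neighbors in enumerate(adjacency):
        if neighbors:
            neigh_mean = features[neighbors].mean(axis=0)
        else:
            neigh_mean = np.zeros(features.shape[1])
        h = np.concatenate([features[node], neigh_mean])
        out.append(np.maximum(h @ weight, 0.0))  # ReLU nonlinearity
    return np.stack(out)


# 4 nodes with 2 features each; adjacency given as neighbor index lists
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
adj = [[1, 2], [0], [0, 3], [2]]
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))  # (2 self + 2 aggregated) -> 3 output dims

embeddings = sage_mean_layer(feats, adj, w)
print(embeddings.shape)  # (4, 3)
```

In the paper's setup (per the summary), embeddings like these from the two graphs would be concatenated for the server under test and fed to an ordinary ML classifier.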
16:58:35 <onyinyang> it seemed suspicious to me that open port patterns were a good indicator of anything
16:58:50 <onyinyang> especially when some of the ones they listed were port 22, and ports 80 and 443 -__-
16:59:02 <dcf1> The "Web Proxy Detection based on Multiple Features Analysis" one is the only one I've read, and it actually is pretty clear in the motivation, saying things like, a user that accesses one proxy is likely to access another proxy.
16:59:16 <dcf1> https://jcs.iie.ac.cn/xxaqxben/ch/reader/view_abstract.aspx?file_no=20180404
16:59:28 <shelikhoo> 80 443 22 is like any web server...
16:59:34 <dcf1> There is an English translation of this that I'm not sure is online anywhere yet.
16:59:34 <onyinyang> exactly
17:00:00 <meskio> they also look into hidden ports, which is a bit more of an indicator
17:00:26 <dcf1> There is some similarity there with the idea of, say, an obfs4 bridge being compromised because it is also a vanilla bridge. That is, a server offers multiple ports/protocols, which in itself makes it look like a VPN or proxy server.
17:01:25 <cohosh> yeah i think the access pattern technique could still work to identify proxies/bridges, i was wondering if combining it with the host-based threshold idea would make it stronger too
17:01:32 <dcf1> Yeah, the "stealth ports" part is good. I was a bit confused about how they represent it as a feature. As I understood it, the concrete feature is the *number* of stealth ports detected? Alongside other features like the *number* of "observed ports" and the *number* of scanned ports?
17:02:03 <dcf1> cohosh: I think you are absolutely right about the similarity with host-based classification of https://censorbib.nymity.ch/#Wails2024a
17:03:06 <dcf1> Let me gripe a little bit about the notation.
17:03:39 <dcf1> meskio, you said some parts were hard to understand, I think part of the cause is that the authors are trying to pull a trick, using a lot of $$ LaTeX in an effort to appear intimidating.
17:04:02 <onyinyang> lol
17:04:12 <dcf1> But it really falls apart in places like Equation 1 on page 3.
17:04:19 <meskio> XD
17:04:25 <shelikhoo> we are a little over time
17:04:48 <dcf1> Where they are using the ∑ operator to somehow sum *sets*?!? And then taking the union of these now *integers*?
17:04:57 <cohosh> lmao yes formula 1 just to say all servers
17:05:00 <dcf1> shelikhoo: I'll only gripe a moment longer.
17:05:07 <dcf1> cohosh: yes!!! infuriating
17:05:12 * meskio is ok with a bit of overtime
17:05:17 <cohosh> shelikhoo: i think it's okay to run over time, unless you need to go
17:05:18 <dcf1> And Equation 2 is supposed to be Jaccard similarity
17:05:19 <shelikhoo> yes, dcf1: you have the lock now
17:05:45 <dcf1> But they got the |·| in the wrong places, so again they're taking intersections and unions of integers
17:06:37 <dcf1> It really irritates me when authors use a pile of notation to try and look obscure, but what they're explaining isn't really all that complicated, and then they mess up basic details like writing ">=" instead of "≥".
17:07:00 <shelikhoo> cohosh: yes, I'm in no rush to leave...
17:07:21 <shelikhoo> I think they are trying to impress their boss to get more KPI
17:07:28 <meskio> maybe a way to cover when you don't have so much real meat for your paper
17:07:34 <dcf1> Rule of thumb: whenever you see a paper break out the \mathcal, it either means you are reading something difficult but quite real and meaningful, or you are reading a lot of fluff that probably doesn't mean much.
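[Editor's note: for reference, the standard Jaccard similarity that the paper's Equation 2 presumably intends, with the cardinality bars in the right places (intersect and union the sets first, then take sizes):]

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
```

Here A and B would be the open-port sets of two servers; the misprinted version takes |A| ∩ |B| and |A| ∪ |B|, i.e. intersections and unions of integers, which is what dcf1 is objecting to.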
17:07:43 <dcf1> And this paper sure does love its \mathcal
17:07:51 <cohosh> i have a high level discussion question from this work
17:08:11 <cohosh> that's related to something i think roya asked me a while ago
17:08:38 <cohosh> which was whether bridges or proxies with more users could be subject to more scrutiny and blocking
17:08:56 <cohosh> and if that's the case, what kinds of defenses could we come up with against that
17:09:03 <cohosh> or how would we detect it
17:09:23 <cohosh> i don't necessarily trust the outcomes of this paper to say how feasible it is
17:09:24 <dcf1> My last point about notation: the equations in 3.4 look intimidating if you haven't seen them before, but it's not anything novel, they're just re-stating some standard neural network classification stuff. E.g., search for "tanh", "softmax", "ReLU" at https://en.wikipedia.org/wiki/Convolutional_neural_network.
17:09:31 <shelikhoo> I once read a paper where the author overrode math symbols with their own functions; as a result a reader needs to read other papers first to understand it
17:10:03 <dcf1> cohosh: yes, exactly, that's the source of my interest too.
17:10:27 <shelikhoo> cohosh: it is a lot more difficult to run experiments on such claims
17:10:38 <shelikhoo> since we would need a lot of clients
17:10:39 <cohosh> we do have some usage metrics from collector, and we have shell's vantage point data
17:10:52 <dcf1> (Oh yeah, and Equations 7 and 8, what a bad way to express what they're doing.)
17:11:05 <cohosh> i was thinking at some point to do some data analysis on the rotating bridge reachability data we have to see if there is a correlation between use and blocks
17:11:06 <shelikhoo> yes...
17:11:20 <dcf1> cohosh: I imagine it could also go the other direction; that is, 1:1, one user using one server, a sign of a personal VPN perhaps.
17:11:37 <cohosh> dcf1: that's true
17:11:47 <dcf1> Is rotating bridges something that happens with Tor Browser default bridges?
17:11:52 <cohosh> low use also means a trivial tradeoff for a censor to block it
17:12:21 <cohosh> dcf1: no, i think these might have been telegram bridges? or maybe bridges we handed out manually with the community team?
17:12:27 <cohosh> maybe others remember more
17:13:07 <dcf1> https://www.bamsoftware.com/proxy-probe/ is kind of similar, but we were measuring time intervals, not levels of use.
17:13:30 <cohosh> yes our vantage point scripts are based on that originally :)
17:13:32 <meskio> I have always assumed high use brings high blocks because if people keep sharing a bridge a censor will hear about it at some point, more than the censor noticing the traffic
17:13:49 <dcf1> meskio: there are at least 2 possible causes of that though
17:14:09 <cohosh> meskio: yeah it's hard to track down what the cause would be
17:14:20 <dcf1> one is that the censor observes high use through passive monitoring, this is the "access relation" type classification I am talking about
17:14:56 <dcf1> another is that, the more users there are, the higher the chance that one of them is a censor agent, this is the "bridge distribution"/Lox type classification
17:15:18 <meskio> yes, and I've been assuming the former is more common
17:15:21 <cohosh> lox does also limit the number of users per bridge in a way that our other distribution methods don't
17:15:57 <onyinyang> at the initial access point yes, but not in how many users can learn of it through invitations
17:16:03 <dcf1> Yeah. Which would mitigate the "censor agent" attack and have an unknown effect on the "access relation" attack (probably mitigates it tho).
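[Editor's note: the use-vs-blocks analysis cohosh proposes could start as simply as the sketch below. The per-bridge numbers are entirely made up for illustration; the real inputs would come from CollecTor usage metrics and the vantage-point reachability data mentioned above.]

```python
# Rough sketch: is bridge usage correlated with how quickly a bridge gets
# blocked? Hypothetical data; a strongly negative Pearson r would suggest
# heavily used bridges are blocked sooner.
import numpy as np

# hypothetical per-bridge data: mean daily users, days until first block
users = np.array([120, 45, 300, 10, 80, 500, 25, 150])
days_until_blocked = np.array([12, 40, 6, 90, 25, 4, 60, 15])

r = np.corrcoef(users, days_until_blocked)[0, 1]
print(round(r, 3))
```

Even a clear correlation would not by itself distinguish the two causes dcf1 lists (passive "access relation" monitoring vs. censor agents among the users); it would only show that the risk grows with use.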
17:16:10 <cohosh> oh true good point onyinyang
17:16:14 <onyinyang> though it takes longer to build up a user base through invitations
17:16:15 <meskio> I tend to believe that if I have a high tech and a low tech (social) explanation, the low tech one is the true one
17:16:59 <meskio> but I know there are many examples where I'm wrong
17:17:17 <shelikhoo> there is another possibility that some users have a compromised device, like a windows machine with an app that uploads the user's bridge line to the censor
17:17:49 <dcf1> shelikhoo: yes, that's right, in that case real users are acting as censor agents without knowing it
17:17:49 <shelikhoo> so they can be helping the censor to block proxies without opting in to such a thing
17:17:54 <shelikhoo> yes
17:18:30 <dcf1> Let me just write a few questions about this paper.
17:18:39 <shelikhoo> so from this point of view, a server with a lot of users does increase the risk, no matter how the increased risk actually works
17:18:54 <dcf1> I was not sure what to make of this strange sentence in the introduction: "Given our limited dataset, the result may not fully represent the actual capabilities of these detection engines, and our purpose is not to distinguish which is the best."
17:19:21 <cohosh> heh yeah that was interesting
17:19:31 <dcf1> It's not clear how they integrated "probing graph" features into the "offline" evaluation. The ISP gave them a list of IP addresses; did they then do their active probing after the fact? After what time delay?
17:19:55 <shelikhoo> and if a server has only one user, but generates a lot of traffic, this will be another pattern, as usually with an unowned server, it is not appropriate to exchange a lot of traffic
17:20:04 <shelikhoo> considering how expensive the traffic is
17:20:21 <shelikhoo> like 0.3 USD per GB with many providers
17:20:39 <dcf1> The data set seems unbalanced. See Table 2 on page 6. The servers represented in the access logs are 6.6% Psiphon? That seems unreasonably high. Is this supposed to be a uniform sample, or is it something already filtered by the ISP?
17:21:29 <meskio> maybe a true usage percentage in their corner of china
17:21:39 <meskio> but for sure not representative of other parts of the world
17:21:44 <cohosh> oh wow, that's a hell of a base rate lol
17:21:48 <dcf1> There's a really ambiguous sentence in Section 3.3, describing what the nodes are that make up the communication graph: "Where L_i represents a node of client IP, server IP, and domain"
17:22:23 <dcf1> The only way I could make this make sense to me is if I change the "and" to an "or": "Where L_i represents a node of client IP, server IP, or domain"; i.e., there are three types of nodes in the communication graph.
17:23:05 <dcf1> But even that doesn't quite work, because the domain names come from PTR records and TLS certificates; they should not stand on their own, they should be somehow attached to server nodes.
17:23:07 <meskio> their ethics review claims that they didn't have access to client IPs
17:23:09 <meskio> ...
17:23:27 <cohosh> they might have been replaced with identifiers
17:23:48 <meskio> maybe
17:23:58 <dcf1> meskio: https://github.com/chenxuStep/VPNChecker/blob/main/dataset/cipIndex_time_sip.csv, they have identifiers such as ClientIP1
17:24:15 <meskio> I see
17:24:49 <dcf1> Also https://github.com/chenxuStep/VPNChecker/tree/main/dataset, there are long lists of reasonable-looking server IP addresses, I don't know if they have been tweaked in some way
17:25:25 <shelikhoo> I also believe this kind of analysis (flow data analysis) will be less useful with CGNAT deployment
17:25:32 <dcf1> Also of note regarding the communication graph, they never actually say that an edge in the graph means that some communication was detected between two hosts (even though I can't imagine they would mean anything else).
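[Editor's note: dcf1's reading of the communication graph (three node types: client IP, server IP, domain, with domains attached to server nodes and client-server edges meaning "communication observed") can be sketched as below. This is purely an illustration of that interpretation, not the paper's actual data model; the IPs and domain are invented, though "ClientIP1" echoes the identifier style in the published dataset.]

```python
# Sketch of a heterogeneous communication graph with three node types:
# "client" and "server" nodes joined by observed-communication edges,
# and "domain" nodes (from PTR / TLS certificates) attached to servers.
from collections import defaultdict

edges = defaultdict(set)   # node -> set of neighboring nodes
node_type = {}             # node -> "client" | "server" | "domain"


def add_flow(client_ip, server_ip):
    """Record that some communication was observed between the two hosts."""
    node_type[client_ip] = "client"
    node_type[server_ip] = "server"
    edges[client_ip].add(server_ip)
    edges[server_ip].add(client_ip)


def add_domain(server_ip, domain):
    """Attach a domain name to an existing server node."""
    node_type[domain] = "domain"
    edges[server_ip].add(domain)
    edges[domain].add(server_ip)


add_flow("ClientIP1", "203.0.113.5")
add_flow("ClientIP2", "203.0.113.5")
add_domain("203.0.113.5", "vpn.example.com")

# a crude "access relation" feature: number of distinct client neighbors
clients = sum(1 for n in edges["203.0.113.5"] if node_type[n] == "client")
print(clients)  # 2
```

Under the paper's (unsubstantiated) claim that VPNs have fewer distinct users than normal servers, a feature like this distinct-client count is presumably what the communication graph is meant to capture.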
17:25:57 <dcf1> They just say, circularly, "R(L_i) signifies the edges between node L_i and others"
17:26:35 <shelikhoo> an IP address no longer represents a single client, but a random set of clients assigned to a NAT gateway
17:26:37 <dcf1> I looked at the https://github.com/chenxuStep/VPNChecker code by the way. It only has some of the active probing code, none of the GraphSAGE aggregation or analysis code.
17:27:18 <dcf1> Oh yeah, I don't want to get into this point because it will take too long, but there is mention of NAT in Section 4.1 around "seed IPs".
17:27:36 <dcf1> To me, this sequence of sentences is a non-sequitur, though it seems to be trying to say something:
17:28:12 <dcf1> "Owing to IP-sharing access technologies, such as Network Address Translation (NAT), multiple endpoints might be behind one client IP."
17:28:15 <dcf1> "VPN software allows a portion of application traffic to be routed through the VPN tunnel, while it routes other traffic outside this tunnel."
17:28:18 <dcf1> "As a result, clients may access multiple VPN servers within a short time."
17:28:57 <dcf1> Like, it's hard to see how these three sentences are related, but it seems to be hinting at the idea of multiple clients behind one client IP address.
17:29:25 <dcf1> But if that's the case, it would be a reason *not* to use "seed" server IPs to discover more server IPs.
17:29:44 <dcf1> But, they don't even seem to use the "seed IPs" in that way at all! This part was thoroughly confusing to me.
17:30:14 <dcf1> In conclusion, I apologize for once again suggesting a problematic paper, but maybe we can learn a little from it.
17:30:31 <cohosh> it was an interesting discussion!
17:30:35 <meskio> it has been a nice conversation, so it was useful
17:30:45 <meskio> and I find it funny how they call VPN providers "attackers"
17:30:53 <onyinyang> agree :0
17:31:00 <shelikhoo> personally I think the active probing part is very interesting, so many things we should avoid when designing new protocols
17:31:00 <onyinyang> oops I meant :)
17:31:12 <shelikhoo> haha attackers....
17:31:23 <meskio> also how they say their motivation is netflix blocking VPNs, but they only care about the client-VPN traffic, not the outgoing traffic from the VPN
17:32:33 <dcf1> meskio: the "attackers" thing was called out by one of the reviewers too.
17:33:06 <cohosh> solidarity means attack!
17:33:07 <dcf1> https://openreview.net/forum?id=7024czziih&noteId=3RLxwrJDQC "In Section 5, who are the "attackers"? This term first appears in this section and does not make sense."
17:33:26 <meskio> :D
17:33:31 <cohosh> maybe it's positive :P
17:33:33 <shelikhoo> do we really want them to say "you are the property of your ruler, who will decide what you can read or write; some external attackers want to violate the property rights of rulers"
17:34:09 <shelikhoo> it would make sense if we put it into context like that 233333
17:34:30 <dcf1> I have a collection of first-paragraph rationales from papers like this.
17:34:39 <dcf1> One even claimed the goal was to prevent ticket scalping...
17:35:34 <meskio> lol
17:36:34 <cohosh> lol
17:36:41 <shelikhoo> preventing editing wikipedia
17:37:29 <dcf1> Let's keep this "access relation" idea in the back of our mind, and maybe something good will come up.
17:37:52 <meskio> :)
17:37:54 <shelikhoo> yes...
17:38:05 <dcf1> I'll mention there is a connection to "zig-zag between bridges and users" from "10 ways to discover Tor bridges" https://research.torproject.org/techreports/ten-ways-discover-tor-bridges-2011-10-31.pdf#page=5
17:40:28 <meskio> I guess we are done with the reading group
17:40:31 <shelikhoo> I imagine this would only work if the IP:port combination is used for a single purpose
17:40:45 <shelikhoo> with an httpupgrade/websocket based proxy
17:40:59 <shelikhoo> the same ip:port is used for more than one purpose
17:41:22 <shelikhoo> so it is harder to say someone connecting to a specific port is a user of the proxy service
17:41:31 <shelikhoo> and yes, we can call this a meeting
17:41:41 <shelikhoo> is there anything else we would like to discuss in this meeting?
17:42:13 <meskio> not from me
17:42:26 <shelikhoo> #endmeeting