16:00:18 #startmeeting network-health
16:00:18 Meeting started Mon Jun 7 16:00:18 2021 UTC. The chair is GeKo. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:18 Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:53 o/
16:00:54 alright, let's get started for the weekly meeting
16:00:59 https://pad.riseup.net/p/tor-netteam-2021.1-keep is the pad
16:01:07 ah, no
16:01:09 hey!
16:01:12 let me get the right onw
16:01:14 *one
16:01:27 hi
16:01:34 kfahv6wfkbezjyg4r6mlhpmieydbebr5vkok5r34ya464gqz6c44bnyd.onion/p/tor-nethealthteam-2021.1-keep
16:02:31 hi!
16:04:29 okay, let's get started
16:04:45 if someone needs to add things to the pad, please do while we are chatting here
16:05:19 irl[m]: so, for the metrics.tpo outages what are the next steps here?
16:05:40 as yet, no idea, i'm still mostly working on collector outage
16:05:58 i've filed https://gitlab.torproject.org/tpo/metrics/team/-/issues/15
16:06:01 irl[m]: what specifically did you do to bring metrics.tpo back?
16:06:08 turned it off and on again
16:06:17 and added an item on the status page to indicate we have issues
16:06:33 i've not restarted it since the weekend, and it looks happy again now
16:06:44 i had issues this morning
16:07:00 showing a 502 proxy error on relay search
16:07:10 that would imply it's a load-related problem then
16:07:20 until i've got the prometheus stuff set up i have no visibility of any of this
16:07:23 so it seems these problems are at least only intermittently happening
16:07:34 what is the ticket for that?
16:07:52 or do we need to create one?
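The 502 proxy error on relay search and the remark about having no visibility until monitoring exists describe exactly what an external HTTP probe is meant to catch. The following is a minimal Python sketch of that idea, not the team's Prometheus/blackbox exporter configuration (which this log does not show). `fetch_status`, `probe` and `should_alert` are hypothetical names. The probe treats any non-2xx status, or a failed connection, as "down". Because the outages seem intermittent, the alert fires only after several consecutive failed probes, which is roughly what a `for:` duration on a Prometheus alerting rule achieves.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Sequence


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for `url`, or None if no response arrived at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx replies, e.g. a 502 from a proxy.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, ...
        return None


def probe(url: str,
          fetch: Callable[[str], Optional[int]] = fetch_status) -> dict:
    """Blackbox-style check: the target counts as up only on a 2xx status."""
    status = fetch(url)
    up = status is not None and 200 <= status < 300
    return {"url": url, "status": status, "up": up}


def should_alert(history: Sequence[bool], threshold: int = 3) -> bool:
    """Alert only after `threshold` consecutive failed probes (oldest first)."""
    if len(history) < threshold:
        return False
    return not any(history[-threshold:])
```

For example, `probe("https://metrics.torproject.org/")` would report `up: False` with `status: 502` if the proxy returned the error seen this morning. The `fetch` parameter lets the check be exercised without network access, and requiring several failures in a row trades a little detection delay for fewer alerts on one-off blips.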
16:08:54 irl[m]: ^
16:10:04 https://gitlab.torproject.org/tpo/tpa/team/-/issues/40280 is the ticket that is blocking the prometheus exporter being set up for collector, then i was going to add collector into the prometheus, and then go from there
16:10:04 it's a whole new thing, to replace the old metrics nagios that seems to have been turned off while i was gone and nothing replaced it
16:10:04 i think the biggest problem here is that i only knew metrics was broken because someone told me
16:10:04 not a single alert was triggered anywhere
16:10:16 the second problem is that the logs are very noisy, because metrics-web does a lot more than it did when the logging was initially devised, so without monitoring you just have a mountain of logs
16:10:30 you don't know where to look because you have no timestamp
16:10:37 i see
16:10:54 i'll look to see if we made a ticket for the larger thing
16:11:01 thanks
16:11:16 https://gitlab.torproject.org/tpo/tpa/team/-/issues/40216 is related
16:11:24 https://gitlab.torproject.org/tpo/tpa/team/-/issues/40274 is related
16:11:31 right
16:11:34 there isn't a "project" ticket as such that i can see
16:11:42 i remember the last one
16:12:04 the ticket would probably be titled "Monitor Metrics services with Prometheus"
16:12:36 anarcat has set up some git stuff to make it easier for me to directly write the prometheus configs and have them deployed
16:12:45 can we take some shortcuts here so that the issue potentially bugging metrics.tpo is caught first?
16:13:00 i am not sure what logging infra needs to get set up for that
16:13:14 as i don't really know all the pieces involved here
16:13:40 yes, i need to refresh my knowledge of blackbox exporter and then write the config for that
16:13:47 instead of matrix it will just send me emails, which is better than nothing
16:14:05 but if collector is e.g. not involved in the metrics.tpo outage we could postpone setting prometheus alerts up for that one
16:14:23 and start with a different part first
16:14:38 right yes, that is the plan
16:14:48 great
16:14:52 so the idea is to do this in nagios?
16:14:55 and not prometheus
16:14:59 no, prometheus
16:15:02 ook
16:15:10 it all used to be in nagios but i guess people didn't like nagios as it got turned off
16:15:44 the prometheus is being used by anti-censorship too, so there's redundancy of knowledge
16:15:53 yeah
16:16:07 i don't know anything about why the nagios part got turned off
16:16:35 but we should not start with it again if we move to prometheus i guess
16:17:03 okay
16:17:11 that's all i had for that item
16:17:19 the other is the roadmap
16:17:25 http://kfahv6wfkbezjyg4r6mlhpmieydbebr5vkok5r34ya464gqz6c44bnyd.onion/p/IutVYvgMq9614nDk-KFm
16:17:29 yes, not sure. We never talked about retiring nagios
16:17:46 i've cleaned it up and created tickets and we started triaging them
16:18:11 so for this week it would be useful if any of you could go over it and think about things that are missing
16:18:20 or even mis-categorized
16:18:26 arma2: mikeperry: ^
16:18:51 we tried to put things in Needed and Wanted etc. according to what we came up with during the meeting
16:19:05 and by me thinking about it afterwards
16:19:10 but things are not set in stone
16:19:28 so, if there is anything we should fix here, let gaba or me know
16:19:45 ggus: should we deal with the remaining community items?
16:19:49 GeKo: yes
16:20:10 so, i have the meetup in Needed
16:20:20 anything else we should put into that?
16:20:31 the otf fellow and operator census work?
16:20:48 yes, the operator census work is needed.
16:20:59 do we have a ticket for that work?
16:21:14 mmmh, let me check
16:21:21 arma2: we also assigned you a ticket.
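The point about noisy metrics-web logs is that, without an alert, there is no timestamp to start searching from. Once a probe records when a failure was first seen, that time narrows the search to a small slice of the logs. Below is a minimal Python sketch of that step. It rests on an assumption: that each log line begins with an ISO 8601 timestamp followed by a space. The real metrics-web log format is not described in this log, and `lines_near` is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator


def lines_near(lines: Iterable[str], alert_time: datetime,
               window: timedelta = timedelta(minutes=5)) -> Iterator[str]:
    """Yield log lines whose leading timestamp lies within `window` of alert_time.

    Hypothetical format: '2021-06-07T09:10:00 message...'. Lines that do not
    start with a parseable timestamp (e.g. stack-trace continuations) are
    skipped. Timestamps are compared as naive datetimes, so `alert_time`
    must be naive too.
    """
    start, end = alert_time - window, alert_time + window
    for line in lines:
        stamp, _, _ = line.partition(" ")
        try:
            ts = datetime.fromisoformat(stamp)
        except ValueError:
            continue
        if start <= ts <= end:
            yield line
```

A generator keeps memory use flat even on a "mountain of logs", since lines can be streamed straight from an open file. A real version would probably keep continuation lines attached to the timestamped line that precedes them, rather than dropping them.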
16:22:00 and could easily assign more :)
16:22:31 :)
16:23:51 ggus: no need to find it now, if it takes too long (yeah gitlab search is horrible)
16:23:51 GeKo: https://gitlab.torproject.org/tpo/community/team/-/issues/39
16:23:58 :)
16:24:33 there are three items in the wanted section
16:24:43 which i marked with "XXX Ticket"
16:24:57 i guess we a) want to have them as wanted
16:25:10 and b) there should be tickets for them?
16:25:28 could you file them if they are missing and add the links to the pad?
16:25:47 i'll clean up things around them afterwards
16:26:43 > Understand where relay operators try to go to get support (UX side)
16:26:48 who created this one?
16:27:01 dunno
16:27:13 maybe arma2
16:27:16 i think it was arma2
16:27:32 ok, i will create a ticket for that one. this will also part of the new fellow
16:27:41 part of the work
16:27:45 nice
16:27:48 thanks
16:28:10 the final item i had to discuss is the website blocking tor one
16:28:36 i guess we can keep that as wanted given that we have a gsoc project running
16:28:44 which is providing the infra for that
16:29:18 ggus: at some point we should connect both worlds, the advocacy one with the tools one
16:29:28 and the comms world too
16:29:28 so the former can start using the latter
16:29:31 yes
16:29:48 when will this project be released?
16:30:05 i'll leave that for you to decide when the right time is to get started with that
16:30:11 let me see
16:31:02 i am actually not sure when gsoc ends
16:31:13 but i think end of july
16:31:16 or beginning of august
16:31:26 https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/wikis/GSoC-2021 is the page for the project
16:32:05 ggus: i'll put you as the comms liaison on the pad, too
16:32:11 not just the community one :)
16:32:33 and we can then put that item on the wishlist for comms folks
16:32:33 ok!
16:32:41 i'll create a ticket after the meeting
16:32:47 and then we can take it from there
16:33:11 but the tool is a thing (or will be) and it can be useful for the advocavy part i think
16:33:18 *advocacy
16:33:20 GeKo: it would be nice to have woswos and _ranchak_ presenting both projects during a Tor demo day.
16:33:28 right!
16:33:29 GeKo: yeah!
16:33:31 if you have any wishlist for the gsoc project, please let me/us know
16:33:34 good idea
16:33:52 woswos: you could think about the demo day idea, too
16:34:07 would be awesome to have it presented there
16:34:38 is there a link for getting more information about it?
16:34:43 woswos: yes, one sec
16:34:47 ggus: that's all i had from my side
16:36:04 woswos: example - https://lists.torproject.org/pipermail/tor-project/2021-February/003047.html
16:36:12 while ggus is looking for the link let me know if there is anything else to discuss today
16:36:33 we will announce the next demo day on torproject mailing list. but it should happen in august
16:36:58 nice
16:36:59 5 - 10 minutes, open for community members, small crowd (~40 ppl)
16:37:02 end of August
16:37:09 thanks for the link
16:37:51 ggus: are we good wrt the roadmap for now?
16:38:17 GeKo: yes, i will create the 2 tickets that are missing
16:38:26 thanks
16:38:47 okay. i heard nothing getting raised for discussion
16:38:57 so thanks for being here and have a nice week
16:38:59 #endmeeting