19:29:27 <TheSnide> #startmeeting
19:29:27 <MeetBot> Meeting started Wed Jan 27 19:29:27 2016 UTC. The chair is TheSnide. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:29:27 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
19:29:30 <TheSnide> hi all
19:31:18 <TheSnide> another meeting. these have been quite sporadic lately, i'll try to make them much more regular.
19:32:18 <TheSnide> since last time, *lots* has been done on the debian package, which has enabled much more testing :)
19:33:03 <TheSnide> notably dmm0 has been updated, first with the latest beta, then with the latest daily builds.
19:33:23 <TheSnide> the daily builds are also back, but not automated yet. They should be by tomorrow.
19:34:07 <TheSnide> ... They will be auto-updated from both github:devel and alioth:debian/experimental.
19:35:19 <TheSnide> i only have the resources (time, knowledge & incentive) to do it for the debian package, but i *do* encourage any other distro to do the same.
19:35:59 <TheSnide> ... the deb package should also work on ubuntu, but i won't test it there.
19:37:15 <TheSnide> also, you should thank shapirus for quite a number of bugfixes.
19:37:55 <TheSnide> ... and ssm for the deb packaging (as usual). So if the package doesn't work for you, it's him. And if the package is great, it's him also :-D
19:38:45 <TheSnide> that said, if the auto-built debian package isn't working, it's usually me.
19:39:24 <TheSnide> chteuchteu: i think what you made for the new UI is quite great now. most of the things are just working as planned.
19:40:19 <TheSnide> chteuchteu also redesigned our institutional website, which looks quite great now. you can have a peek at http://mm0.eu/n
19:40:28 <shapirus> > ... the deb package should also work on ubuntu <-- as far as 14.10 is concerned, not without a few patches: there's a test failing, and List::Util doesn't have the "any" and "all" functions. Also the init scripts have to be worked on.
19:40:54 <TheSnide> shapirus: yeah, the init scripts are systemd-based IIRC.
19:41:48 <shapirus> yes, and systemd isn't in 14.10. If we want the packages to work for that distro, we'll have to provide upstart or sysv-style configs.
19:42:06 <TheSnide> i'd prefer sysv-style if i have any say in it.
19:42:20 <TheSnide> as it's compatible with any /sbin/init
19:42:27 <shapirus> upstart sounds easier to me, but sysv is way more portable
19:42:57 <shapirus> also think of debian systems with sysv instead of systemd (yes, it is possible to choose either of them in jessie)
19:43:00 <TheSnide> ... but i'd take anything that makes it work on ubuntu
19:43:18 <shapirus> so it's not only ubuntu-related
19:43:23 <TheSnide> yeah, that's why i'd prefer sysv-style.
19:43:56 <shapirus> I have a working munin-node sysv script already (took it from 2.1.9 I believe, it works almost as is)
19:43:59 <TheSnide> the good part with sysv-style is that i can do it myself if needed.
19:44:16 <TheSnide> (only time would be missing, not skills)
19:44:26 <shapirus> I can share it, but I have no idea how to inject it into the deb packages, so that'll be up to you guys
19:44:47 <shapirus> as for munin-asyncd, I currently run that one under supervisord
19:44:48 <TheSnide> shapirus: put it in a PR as a contrib/
19:45:05 <shapirus> it should probably not be difficult to make a sysv script for it as well
19:45:18 <shapirus> I'll look at that
19:45:34 <TheSnide> sysv scripts aren't that difficult if you don't care about babysitting the processes
19:45:45 <shapirus> I'm also working on a mysql plugin that provides a bunch of TokuDB status graphs
19:45:46 <TheSnide> (restart upon failure, etc)
19:45:51 <shapirus> will contribute it when it's ready
19:46:14 <TheSnide> also, now is the time to make the graphs look more 2016-ish :)
19:46:19 <shapirus> I thought of incorporating it into the existing mysql_ plugin, but that one's too bloated and maintained outside of the munin repo
19:46:26 <shapirus> and it's not multigraph-capable
19:46:29 <TheSnide> the new CPU colors are quite great IMHO
19:46:35 <shapirus> hence the decision to make a separate plugin
19:47:02 <shapirus> thanks, I've spent quite a bit of time picking all those colors
19:47:45 <shapirus> so that adjacent fields don't visually merge into one, yet still remain contrasty, readable and easy on the eyes
19:48:03 <TheSnide> oh, and i'm really thinking about providing a 1sec variant for all the "system" plugins
19:48:13 <shapirus> and there's only 16.7M colors to choose from
19:48:18 <TheSnide> since it has a great WOW factor.
19:49:08 <TheSnide> shapirus: now you can proceed to the others :)
19:49:25 <shapirus> yeah, and some good documentation on 1sec (or anything with higher than 5min resolution, for that matter) would be nice
19:49:36 <TheSnide> +1 on doc
19:49:53 <TheSnide> I'll review the whole guide and write more doc
19:50:01 <shapirus> then, what comes to mind at once is the logging and configuration issues
19:50:13 <TheSnide> as it's only a copy/paste from a former blog article of mine right now
19:50:18 <shapirus> munin-httpd: prefork or not, how many processes to run
19:50:54 <TheSnide> the CPAN lib might not be in debian. so we have to choose at runtime, based on its availability.
19:50:55 <shapirus> munin-asyncd or others: log level, log destination (if it is feasible to switch between syslog and files)
19:51:47 <TheSnide> the logging master is ssm, i mostly delegated everything to him. let's ping him. If he doesn't reply, I'll see to it.
19:52:18 <TheSnide> #action TheSnide will write more details in the Guide about 1sec plugins
19:52:19 <shapirus> then there's that issue of the host subdirectories under /var/spool/munin/async
19:52:35 <TheSnide> shapirus: yup. it was by design.
19:52:41 <shapirus> which I think has been agreed upon, but is hardly documented anywhere
19:52:53 <TheSnide> ... but after some time, it just feels wrong.
19:53:23 <TheSnide> shapirus: yes, what we agreed on is much better. can you write the guide blurb about it?
19:53:28 <shapirus> oh, and the "1 hour bug"
19:53:49 <TheSnide> it's called the GhostBug™ :)
19:53:50 <shapirus> I don't have the slightest idea how to debug it
19:54:09 <shapirus> other than recording a full strace log of the process
19:54:33 <shapirus> or there has to be a way to turn on the most verbose debug output in the process itself
19:54:40 <TheSnide> shapirus: that's what i did before, and discovered several races. but i guess i didn't find them all.
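
For reference, here is a minimal sketch of what a standalone multigraph plugin such as the TokuDB one shapirus mentions above could look like. The graph names, fields and the read_status() stub are purely hypothetical; a real plugin would query the TokuDB status variables instead of returning fixed numbers.

#!/usr/bin/env python3
# Minimal multigraph munin plugin sketch. Graph and field names are
# hypothetical, not the actual TokuDB plugin discussed in the meeting.
import sys

def read_status():
    # Placeholder: a real plugin would read the TokuDB status counters here.
    return {"cachetable_size": 1234, "cachetable_evictions": 5}

def config():
    # Emitted when munin-node calls the plugin with the "config" argument.
    print("multigraph tokudb_cachetable_size")
    print("graph_title TokuDB cachetable size")
    print("graph_category mysql")
    print("graph_vlabel bytes")
    print("size.label cachetable size")
    print("")
    print("multigraph tokudb_cachetable_evictions")
    print("graph_title TokuDB cachetable evictions")
    print("graph_category mysql")
    print("graph_vlabel evictions per second")
    print("evictions.label evictions")
    print("evictions.type DERIVE")
    print("evictions.min 0")

def fetch():
    # Emitted on a normal fetch run (no argument).
    status = read_status()
    print("multigraph tokudb_cachetable_size")
    print("size.value %d" % status["cachetable_size"])
    print("")
    print("multigraph tokudb_cachetable_evictions")
    print("evictions.value %d" % status["cachetable_evictions"])

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "config":
        config()
    else:
        fetch()

The "multigraph <name>" markers are what split the output into separate graphs, which is the capability the monolithic mysql_ plugin lacks.
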
19:54:52 <shapirus> upon which I can set up some alerting and debug it when it hangs
19:55:28 <TheSnide> if it hangs, it's already too late, as it issues a sleep(3600)
19:55:44 <TheSnide> the key to fixing it is the *why*.
19:56:00 <shapirus> that's already something
19:56:18 <shapirus> there must be some condition that makes it call sleep(3600), right?
19:56:47 <shapirus> well, probably recording a strace log is not a bad idea at all
19:57:35 <shapirus> if only it happened predictably...
19:57:38 <TheSnide> shapirus: i did that previously, and it paid off.
19:57:52 <TheSnide> but it's a little bit tedious and HDD-unfriendly :)
19:58:30 <shapirus> I have a couple dozen VMs
19:58:49 <shapirus> I'll see what it takes to record that output
19:58:54 <hugin> [munin] steveschnepp commented on issue #625: Recent versions of the deb package are ok. Closing it. https://git.io/vzHBk
19:59:04 <shapirus> I'd love to catch it
19:59:13 <TheSnide> also, madduck has a good point here: https://github.com/munin-monitoring/munin/issues/619
20:00:04 <shapirus> right
20:00:37 <shapirus> it should be easy to convert that to "send as you read"
20:00:53 <shapirus> unless some data aggregation happens after it's read completely
20:01:03 <TheSnide> i don't think there is.
20:01:08 <TheSnide> also, https://github.com/munin-monitoring/munin/issues/617 is quite relevant
20:01:49 <TheSnide> https://github.com/munin-monitoring/munin/issues/612 obviously has to be fixed
20:01:59 <shapirus> hard-coded stuff bites again
20:02:40 <shapirus> the latter looks like a pure coding mistake
20:02:49 <shapirus> or something under-implemented
20:03:35 <shapirus> I'd also mention this one: https://github.com/munin-monitoring/munin/issues/634
20:04:22 <shapirus> at the very least it needs a cron job schedule workaround merged
20:04:40 <shapirus> until there's a decision on what to do for a permanent solution
20:05:55 <TheSnide> oh, the update-async struggle :)
20:06:51 <TheSnide> but i don't really understand the first part of the issue
20:07:07 <hugin> [munin] Skaronator commented on issue #592: ah, completely forgot this issue, but yeah, rebuilding the whole graph system in HTML5 with some fancy JS graphs would be the best solution. https://git.io/vzH0k
20:07:13 <shapirus> what I was thinking about in that regard is a mechanism that allows munin-master to connect to the nodes as soon as new data is available
20:07:32 <shapirus> instead of fixed cron runs
20:07:52 <shapirus> but that has to involve support for some callback from the nodes
20:08:17 <shapirus> which doesn't sound impossible, however a new daemon on the master server would be needed
20:08:47 <shapirus> the first part of the issue is easy
20:09:03 <shapirus> well, basically it's all in the steps to reproduce
20:09:10 <shapirus> try them and see what happens :)
20:10:48 <TheSnide> i was thinking about allowing some POSTs to munin-httpd from the nodes
20:10:59 <shapirus> yes
20:11:07 <TheSnide> but that got dropped from 3.0
20:11:08 <shapirus> node->master: "I am ready"
20:11:19 <TheSnide> nah, directly the data :D
20:11:20 <shapirus> master->node: "ok, I'm coming, give me your data"
20:11:32 <shapirus> how does that sound?
20:11:37 <madduck> go TheSnide go! ;)
20:11:46 <shapirus> it'll also spread the load over time
20:12:08 <shapirus> since munin-update will not connect to all nodes at once, but only when there is new data
20:12:18 <madduck> shapirus: this poking could also presumably be done with SSH?
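
As a starting point for the alerting shapirus mentions for the GhostBug above, here is a rough, purely illustrative watchdog sketch: it flags any munin-update process that has been running far longer than a normal 5-minute cron cycle. It assumes a Linux master (it reads /proc), that the process command line contains "munin-update", and a 900-second threshold; none of these details come from the meeting itself.

#!/usr/bin/env python3
# Illustrative GhostBug watchdog: warn when a munin-update process has been
# running much longer than a normal cron cycle. Linux-only (uses /proc);
# the threshold and the process-name match are assumptions.
import os
import time

THRESHOLD = 900  # seconds; a healthy run finishes well inside the 5-min slot
HZ = os.sysconf("SC_CLK_TCK")

def boot_time():
    # System boot time (epoch seconds), needed to turn starttime into an age.
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("btime "):
                return int(line.split()[1])
    raise RuntimeError("btime not found in /proc/stat")

def process_age(pid, btime):
    with open("/proc/%s/stat" % pid) as f:
        stat = f.read()
    # starttime is field 22; split after the ')' that closes the comm field
    fields = stat.rsplit(")", 1)[1].split()
    start = btime + int(fields[19]) / HZ
    return time.time() - start

def main():
    btime = boot_time()
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open("/proc/%s/cmdline" % pid, "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
            if "munin-update" not in cmdline:
                continue
            age = process_age(pid, btime)
        except OSError:
            continue  # process vanished between listdir() and the reads
        if age > THRESHOLD:
            print("possible GhostBug: pid %s running for %ds: %s"
                  % (pid, int(age), cmdline.strip()))

if __name__ == "__main__":
    main()

Run from cron every few minutes, its output (mailed by cron or fed into any existing alerting) would at least pinpoint when the sleep(3600) kicks in, so an strace can be attached while the process is still hanging.
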
20:12:26 <TheSnide> as it would enable monitoring of loosely connected nodes (those behind the infamous mobile internet NAT)
20:12:26 <shapirus> which in turn will allow updates more frequent than every 5 min
20:12:41 <shapirus> or less frequent as well
20:12:45 <madduck> shapirus: or simply keep persistent connections from the server to the nodes and poll.
20:12:59 <shapirus> that doesn't sound as scalable
20:13:11 <shapirus> the notify->poll method sounds better to me
20:13:14 <madduck> you mean past 64k connections?
20:13:23 <TheSnide> ;)
20:13:27 <shapirus> think of poor connections
20:13:50 <TheSnide> madduck: well, if you have more than 1 IP for all the nodes, you can have more than 64k
20:13:53 <shapirus> I'm imagining something like DNS zone transfers here
20:14:05 <madduck> YUK ;)
20:14:11 <shapirus> where, on a serial change, the master sends notifies
20:14:16 <shapirus> and the slaves come for the fresh zone copy
20:14:23 <TheSnide> shapirus: i have the FTP data connection in mind when i look at your proposal, and it doesn't sound pretty
20:14:32 <shapirus> that approach suits munin perfectly, I think
20:15:00 <madduck> so wait a minute (sorry that I am talking as a bystander, but…)
20:15:05 <TheSnide> shapirus: even better would be "node POSTs the new data to munin-httpd". period.
20:15:24 <TheSnide> madduck: we're quite tolerant & open :)
20:15:29 <madduck> asyncd collects data and you are suggesting that it reaches out to newdaemond to inform it about new data, such that munin then opens SSH to asyncd?
20:15:34 <shapirus> FTP isn't bad either, it's just that the technology turned out to be not very suitable for where people started using it
20:15:55 <TheSnide> madduck: ... basically, yes.
20:15:59 <madduck> why not let asyncd push the data instead of poking for a pickup?
20:16:16 <TheSnide> madduck: *precisely* what i'd prefer :)
20:16:17 <madduck> without replacing pull support ;)
20:16:31 <shapirus> the problem with the "node POSTs the new data to munin-httpd" approach is that you'll have to handle the situation when the server isn't available
20:16:44 <shapirus> and mark a portion of the data as "failed to transfer"
20:16:49 <TheSnide> shapirus: bah. the async already takes care of that.
20:16:50 <madduck> you just accumulate, as when the server doesn't fetch
20:17:01 <TheSnide> ... remember the "node is *dumb*" approach?
20:17:19 <shapirus> yes, but you've just said the opposite: that the node has to push data
20:17:20 <TheSnide> the callback will *only* be done by async implementations.
20:17:40 <TheSnide> shapirus: ok. i lied. async will directly push data to the master.
20:18:35 <TheSnide> madduck: i think you're thinking along the same lines as my original design.
20:18:49 <TheSnide> (which i didn't have time to implement)
20:19:15 <TheSnide> what's currently missing is: #1 a persistent SQL db, #2 a POST handler on munin-httpd
20:19:29 <shapirus> [22:16] <madduck> asyncd collects data and you are suggesting that it reaches out to newdaemond to inform it about new data such that munin then opens SSH to
20:19:32 <shapirus> asyncd?
20:19:40 <shapirus> that's exactly what I'm proposing
20:19:58 <shapirus> as to the "why not push data" thing
20:20:14 <shapirus> well, it's just gonna be more difficult to implement, I think
20:20:19 <shapirus> more prone to bugs
20:20:24 <shapirus> and therefore less reliable
20:20:46 <shapirus> whereas the notify->poll method will be bulletproof even in the case of a poor connection
20:20:53 <shapirus> think dns :)
20:21:26 <shapirus> then again, think of the load
20:21:43 <shapirus> if a hundred nodes chime in and try to push their data at once
20:21:48 <shapirus> it may overload the master
20:22:19 <shapirus> and you'll have to implement a "sorry, come back later" mechanism and make the node handle it properly
20:23:03 <shapirus> whereas with the notify->poll method, the master will know the list of nodes which have fresh data and work on it at its convenience
20:23:03 <kenyon> like what HTTP already has with its status codes
20:24:05 <shapirus> and that will feel more like a 'distributed' rather than a 'centralized' architecture
20:25:00 <shapirus> then you can have the node update its spool files as frequently as you wish (think 1sec graphs) and notify the master about that in a lightweight way (every time, or no more frequently than a configured interval)
20:25:30 <shapirus> and then the master may poll "when the node asks for it, but not more frequently than every <configvalue> seconds"
20:25:42 <TheSnide> shapirus: i agree with most of what you said. But, as the spooling is already done, avoiding callbacks altogether means that we don't need to open the firewall in both directions
20:26:01 <shapirus> that's right
20:26:02 <TheSnide> and retrying is mostly implemented
20:26:20 <shapirus> in what I suggest, there must be an open port on the master
20:26:32 <TheSnide> (since the biggest issue in retrying is retaining the failed data. which we *already* do.)
20:26:40 <shapirus> that's the only drawback as I see it
20:27:01 <TheSnide> so, "the async POSTs *data* to the master" has mostly only advantages.
20:27:18 <shapirus> not until you think of scalability
20:27:21 <TheSnide> ... the only drawback i see is "security"
20:27:46 <shapirus> as soon as the number of nodes grows high enough, the master is stuck
20:27:46 <TheSnide> shapirus: bah. scaling HTTP requests is something that is well understood nowadays :D
20:28:14 <shapirus> no, I mean the case of N nodes coming with their data all at the same time
20:28:21 <TheSnide> ... and with a persistent DB comes a *multi-host* master
20:28:48 <TheSnide> N nodes coming with their data all at the same time <-- that *will* happen, as we are time-sensitive
20:28:55 <shapirus> the security issue is still there, no matter whether it's a POST or a new-data notification for polling
20:29:29 <shapirus> my approach will achieve time-insensitivity as a side bonus
20:29:34 <TheSnide> with notifications, you can at worst DoS the platform, not inject wrong data
20:29:35 <shapirus> think of that :)
20:30:23 <TheSnide> shapirus: i'm _really_ not in favor of callbacks. Got my back burnt too often by that :)
20:31:05 <shapirus> and I've had my share of trouble with centralized systems ;)
20:31:23 <shapirus> I've spent 8 years designing and working with HA and high-load systems
20:31:24 <TheSnide> ... it becomes a nightmare to open holes in most firewalls, and you are at the mercy of NAT :)
20:31:44 <shapirus> I can see where scalability issues are part of the design
20:31:59 <shapirus> but then, as far as the firewall goes
20:32:01 <TheSnide> shapirus: that said, _nothing_ prevents you from providing a contrib tool to do that.
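
To make the notify->poll idea above concrete, here is a small sketch of what the "I have new data" hint could look like. The port number, the message format and the choice of UDP as transport (UDP only comes up a bit later in the meeting) are all illustrative assumptions; nothing like this exists in munin today.

#!/usr/bin/env python3
# Sketch of the notify->poll hint: the node sends a tiny "new data" datagram
# and the master polls at its own pace. Port, hostnames and message format
# are made up for illustration.
import socket

MASTER = ("munin-master.example.com", 14949)  # hypothetical notification port

def notify(hostname):
    """Node side: fire-and-forget hint; a lost datagram only delays the poll."""
    msg = ("NEWDATA %s" % hostname).encode()
    # stays well under a typical 1500-byte MTU (and the conservative 512 bytes)
    assert len(msg) < 512
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(msg, MASTER)

def listen(queue_poll):
    """Master side: collect hints; rate-limiting and scheduling happen in
    the queue_poll callback, so N nodes chiming in at once only queue work."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind(("", MASTER[1]))
        while True:
            data, addr = s.recvfrom(512)
            if data.startswith(b"NEWDATA "):
                node = data.split(b" ", 1)[1].decode(errors="replace")
                queue_poll(node, addr)

if __name__ == "__main__":
    notify(socket.getfqdn())

Losing a datagram is harmless here: the master simply polls on its next regular schedule, which is what makes the hint cheap enough to send even over poor connections.
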
20:32:17 <shapirus> let's assume your HTTP data POST method
20:32:24 <TheSnide> i'd even merge it if it works
20:32:25 <shapirus> how are you going to secure it?
20:32:51 <TheSnide> i was naively thinking about SSL
20:33:04 <TheSnide> client-side certs
20:33:16 <TheSnide> _if_ you want to secure it.
20:33:39 <TheSnide> or simply HTTPS + HTTP Basic Auth.
20:33:47 <TheSnide> that might also be enough
20:33:48 <shapirus> well, if it's not secured, then anyone can inject wrong data or simply DoS the server
20:33:57 <TheSnide> yup
20:34:14 <shapirus> but then, secured in any way, the node callbacks can be secured in just the same way
20:34:31 <shapirus> so the security issue is the same for both approaches
20:34:39 <TheSnide> i said: "node callbacks are *more* secure than node POSTs"
20:34:50 <shapirus> how are they more secure?
20:35:02 <TheSnide> data injection is impossible. only DoS is.
20:35:23 <shapirus> agreed
20:35:51 <shapirus> and then with some firewall rules (e.g., allow LAN only) it protects against DoS as well
20:35:58 <TheSnide> I also wish for some UDP data protocol :)
20:36:14 <shapirus> that's a bright idea
20:36:30 <shapirus> the perfect choice for notifications
20:36:41 <shapirus> we don't care about packet loss
20:36:42 <TheSnide> ... as a typical "$plugin fetch" usually fits in a 1.5K packet
20:37:01 <shapirus> 512 bytes.
20:37:12 <TheSnide> ha?
20:37:22 <shapirus> UDP packet size is up to 512 bytes
20:37:30 <TheSnide> #wut ?
20:37:31 <shapirus> lower than the typical MTU :)
20:38:00 <be0rn> That's not true. UDP packets can easily be bigger.
20:38:01 <TheSnide> well, that's the *guaranteed* size.
20:38:10 <shapirus> well, theoretically it can be up to 65k
20:38:17 * TheSnide enlarged his UDP packets.
20:38:26 <be0rn> Bloat
20:38:45 <TheSnide> but i'd say a 1k packet is fair
20:38:50 <shapirus> either way, I don't think it's a good idea to use UDP for the actual data fetch: too much hassle for no profit
20:39:16 <shapirus> very unlike the lightweight "new data" notifications sent from the nodes to the master, if we think that way
20:39:18 <TheSnide> super-duper lightweight node. no async.
20:39:29 <TheSnide> if data is lost, so be it.
20:39:52 <shapirus> as an additional feature?
20:39:59 * TheSnide has a WIP of munin-node-c with a streaming plugin via UDP
20:40:00 <shapirus> might be useful
20:40:42 <TheSnide> the cherry on the cake would be a statsd gateway (that's what i'm _really_ thinking of)
20:40:44 <shapirus> won't you eventually be creating another graphite that way? :)
20:41:30 <TheSnide> shapirus: graphite does many things right.
20:41:35 <TheSnide> from a user perspective.
20:41:52 <TheSnide> anyway... i'm closing the meeting
20:41:56 <TheSnide> #endmeeting
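
For completeness, a sketch of what the "async pushes spooled data straight to munin-httpd over HTTPS + Basic Auth" idea discussed above could look like from the node side. The URL path, port, credentials and payload framing are all hypothetical: the POST handler mentioned in the meeting does not exist yet.

#!/usr/bin/env python3
# Sketch of the node-push approach: POST one chunk of spooled plugin data to
# munin-httpd over HTTPS with Basic Auth. Endpoint, credentials and payload
# format are assumptions made for illustration only.
import base64
import urllib.request

MASTER_URL = "https://munin-master.example.com:4948/api/update"  # hypothetical
USER, PASSWORD = "node01", "secret"

def push_spool(spool_payload: bytes) -> int:
    """POST one spool chunk; return the HTTP status code."""
    req = urllib.request.Request(MASTER_URL, data=spool_payload, method="POST")
    req.add_header("Content-Type", "text/plain; charset=utf-8")
    token = base64.b64encode(("%s:%s" % (USER, PASSWORD)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status

if __name__ == "__main__":
    # On failure (exception or non-2xx) the spool entry simply stays on disk,
    # which is the retry behaviour munin-async already provides.
    sample = b"multigraph load\nload.value 0.42\n"
    print(push_spool(sample))

Client-side certificates, the stronger option floated in the meeting, would replace the Basic Auth header with TLS client authentication; the rest of the flow stays the same.
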