16:17:47 <h01ger> #startmeeting snapshot.debian.org
16:17:47 <MeetBot> Meeting started Mon May 6 16:17:47 2024 UTC. The chair is h01ger. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:17:47 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:17:48 <noahm> Hey all. I'm here for the meeting, representing both the cloud team and my employer, who would like to provide a mirror of snapshot.d.o in the Microsoft Azure cloud.
16:17:53 <h01ger> #chair ln5
16:17:53 <MeetBot> Current chairs: h01ger ln5
16:18:04 <h01ger> ln5: so you can also say
16:18:12 <ln5> ok tnx
16:18:16 <h01ger> #topic agenda is at https://pad.sigsum.org/p/2024-05-06_snapshot.do
16:18:26 <h01ger> is the agenda ok or do we need anything else?
16:18:47 <h01ger> #info can be used by anyone to add noteworthy stuff to the log
16:18:53 <ln5> let's add noahm (hi!) offer
16:19:16 <h01ger> oh, nice one!
16:19:22 <noahm> updated the agenda to note that
16:19:27 <ln5> noahm: thanks
16:19:45 <h01ger> shall we start?
16:19:50 <ln5> so, feel free to add to the agenda if you figure out more
16:19:58 <ln5> h01ger: let's go
16:19:59 <weasel> h01ger: please.
16:20:07 <h01ger> #topic status updates
16:20:55 <ln5> what's the protocol here? i'm kinda lost :)
16:21:24 <weasel> somebody needs to run this meeting.
16:21:32 <weasel> ln5: do you want to? h01ger, you?
16:21:38 <ln5> h01ger: would you run the meeting please?
16:22:22 <ln5> ok, i'll do something
16:22:37 <ln5> lucas said earlier: update about hw for new primary site: the machine has been built and mostly tested; ETA for DSA access to BMC is a week from today
16:22:40 <ln5> meh
16:22:45 <ln5> cut'n'paste fail
16:23:00 <ln5> lucas said earlier: hi, I made progress on my side, which I documented in https://lists.debian.org/debian-snapshot/2024/05/msg00000.html
16:23:13 <h01ger> (sorry, was fetching tee as i expected people with updates share them)
16:23:14 <weasel> that document is about s3 backend for snapshot
16:23:28 <h01ger> tea
16:23:30 <weasel> lucas: great news, and great summary.
16:23:42 <weasel> lucas: do you want us to consider the open questions here now?
16:23:56 <ln5> he's not around, will read backlog
16:24:06 <ln5> so let's take a stab at them now
16:24:37 <noahm> many of the questions that lucas is facing will also apply to an azure-hosted service, fwiw
16:24:56 <noahm> it will be very similar to an S3-backed implementation
16:25:05 <weasel> seems likely
16:25:24 <ln5> so one question is if we should come up with another kind of "snapshot mirror"?
16:25:44 <ln5> ie, not the type already running at lw
16:26:05 <weasel> yes, my gut feeling is that a sane approach would be for the primary site to just import things into its local storage,
16:26:16 <ln5> which iiuc has some quirks like files and db not appearing at the same time
16:26:18 <weasel> and then, once an import is finished, various mirror things can be triggered
16:26:50 <waldi> so option A?
16:26:52 <ln5> weasel: like what we have today, right?
16:27:00 <weasel> right now the only kind of mirror is to sync individual files to a secondary site, but it could just as easily be a (parallel) upload to s3/azure
16:27:11 <weasel> waldi: yes
16:27:20 <ln5> individual files plus the database
16:27:27 <weasel> as far as farm content is concerned
16:27:52 <weasel> if we run full VMs in azure, it could also host a DB replica
16:28:11 <h01ger> if there are 3 implementations, we would have 2 mirrors, no?
16:28:23 <weasel> not sure if AWS would have anything to run the/a DB
16:28:34 <waldi> does the snapshot stuff have a log of changed things? because trying to find out which objects exists are no good idea on s3 and similar stores
16:28:52 <weasel> waldi: yes, there's something like that IIRC. and if not, it's easy to add
16:29:03 <waldi> either a local db on the web node or rds
16:29:04 <ln5> a journal?
16:29:08 <waldi> ln5: yes
16:29:09 <weasel> (things also never change, there just are new things)
16:30:00 <weasel> right now the way to get the database copied to another place is pg streaming replication
16:30:03 <h01ger> arent there removals from snapshot.d.o for legal reasons too?
16:30:12 <weasel> they only get marked unreadable
16:30:17 <h01ger> ah, ic
16:30:30 <ln5> the upside of this (A) is that there's one and only one view
16:30:31 <weasel> (and yes, also there's a trivial list which an s3 backend could use)
16:31:55 <weasel> I don't really know if we want the DB in amazon and azure as well, or if we just want the storage parts there.
16:32:00 <ln5> what are the downsides with keeping with only a single primary?
16:32:03 <weasel> does anyone know?
16:32:40 <waldi> yes, there should be db copies
16:32:49 <weasel> ln5: if things break, stuff is broken until somebody gets around to fixing things.
16:33:07 <weasel> waldi: ok. can be done, makes it more challenging.
16:33:20 <weasel> is this something we want from the start?
16:33:54 <noahm> is the db involved in serving of content, or just in tracking and generating repo metadata? (sorry, not super familiar with the architecture)
16:34:52 <weasel> the DB is what has the file structure. the mapping of (archive, timestamp, path, filename) to <sha1 of the file>.
16:35:07 <weasel> the storage just has blobs named like their sha1
16:35:54 <noahm> so the db is in the critical path for file access, since it needs to map URIs to blobs?
16:35:58 <noahm> How big is the db?
16:35:58 <weasel> yes
16:36:12 <ln5> noahm: depends on what the client knows -- the db is needed to map file path (in url) to file content
16:36:28 <ln5> noahm: 60-70G iirc
16:36:35 <weasel> noahm: << 100gb on disk
16:37:23 <weasel> and right now our method of "mirroring" that is postgres streaming replication (backed by wal shipping)
16:37:52 <weasel> so there's a tight sync between primary and replica(s), including version and arch constraints
16:38:14 <waldi> what might be challenging as well: a high throughput require that the web frontend only issues redirects to the storage. but the storage does not yet know file names and content type it can tell the client
16:38:38 <weasel> (there is an older way of getting a secondary database from the days before PG had wal shipping, where we dump the metadata of each mirror run to a text file and then import it on the other side. not sure if it has rotten)
16:38:46 <waldi> weasel: logical replication is easy in the meantime
16:38:56 <weasel> waldi: yes, also an option
16:39:04 <waldi> and does not have such limitations
16:39:15 <weasel> right
16:39:43 <weasel> in general, a web frontend really wants a local copy of the DB
16:40:13 <weasel> we don't have that right now at leaseweb (the DB we use at leaseweb.nl is actually at manda in .de), but that's because of local hw constraints.
16:40:56 <ln5> other considerations for which alternatives make sense?
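The split weasel describes above, with a database mapping (archive, timestamp, path, filename) to a sha1 and a storage "farm" holding blobs named by that sha1, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `index` and `farm` dictionaries and the `import_file` and `resolve` helpers are hypothetical stand-ins, not names from the snapshot code base, and the real service keeps the mapping in PostgreSQL rather than in memory.

```python
import hashlib

# Hypothetical in-memory stand-ins for the two components discussed above.
index = {}  # (archive, timestamp, path, filename) -> sha1 hex digest (the "DB")
farm = {}   # sha1 hex digest -> blob bytes (the content-addressed storage)


def import_file(archive, timestamp, path, filename, content):
    """Record one file seen during a mirror run and store its blob once."""
    digest = hashlib.sha1(content).hexdigest()
    # Identical content is stored only once, whatever name or run it came from.
    farm.setdefault(digest, content)
    index[(archive, timestamp, path, filename)] = digest
    return digest


def resolve(archive, timestamp, path, filename):
    """Map a requested file to its blob location, or None if it is unknown."""
    digest = index.get((archive, timestamp, path, filename))
    if digest is None:
        return None
    # The log mentions a redirect to /file/<sha1>; the same shape is used here.
    return "/file/" + digest
```

The lookup in `resolve` is the step that puts the database in the critical path: without the index, a client that only knows a path has no way to find the blob.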
16:41:10 <ln5> lucas seems to prefer starting at C, for example
16:41:24 <noahm> I wonder how hard it would be to cache the entire db in a giant nginx config file with a bunch of "location" directives. I'm not sure that nginx would like loading 100 GB of config data, but I don't love the idea of a db in the critical path if it can be avoided.
16:42:11 <weasel> most (all?) of the dynamically created stuff can be cached for <long time>
16:42:51 <waldi> noahm: no, you don't have that amount of memory
16:42:52 <weasel> the location directives would probably be many. #files × #mirrorruns?
16:43:02 <ln5> needs quite frequent updates though, but yes -- an append-only config file for each "client" is basically what's needed
16:43:22 <noahm> there are hosts with 100+ gb of memory
16:43:37 <waldi> weasel: just to think about: it might be required to store the filename with the objects, which means the checksum is now over content and filename
16:44:08 <weasel> that would be a radical change
16:44:57 <jas4711> hi! fwiw, i am importing data into git lfs effectively creating another variant of snapshot.debian.org. not yet sure it can scale to snapshot.debian.org sizes, but archive.d.o 2TB is no problem
16:45:01 <ln5> anyhow, let's continue design discussions in the "other" section?
16:45:10 <waldi> yes
16:45:20 <ln5> to get "updates" done
16:45:45 <ln5> jas4711: hi! please add an entry in "other" in https://pad.sigsum.org/p/2024-05-06_snapshot.do and we'll get there
16:45:47 <jas4711> offering this as a "alternative idea". however i really hope you get current snapshot into better hosting so will not disturb :)
16:46:31 <ln5> cue next update item: i said earlier that the server for a new primary site is almost built and will be ready for DSA in ~1w
16:47:11 <weasel> \o/
16:47:17 <ln5> it's a 2x12x20TB machine very much like the one we specced earlier
16:47:44 <h01ger> \o/
16:47:55 <ln5> i will exchange ip addr/s and wg keys and whatnot with DSA later this week/early next
16:48:02 <ln5> any other updates?
16:48:37 <ln5> #topic open questions
16:49:07 <ln5> i guess some open questions from lucas report fit in this section; cf. https://lists.debian.org/debian-snapshot/2024/05/msg00000.html
16:49:29 <ln5> but we've talked about them in the previous already and will do more in next, maybe
16:49:51 <ln5> there was a question prior to the meeting about the ETA, let me find it
16:51:16 <ln5> axhn asked "does 'rough ETA for resumed imports of all archives, "before october"' still hold?"
16:51:32 <ln5> and i think it holds fine
16:51:46 <axhn> thanks
16:51:58 <ln5> more open questions?
16:52:06 <axhn> I'll keep asking the next time :)
16:52:20 <ln5> :)
16:52:34 <h01ger> the software will - for now - stay more or less the same, or?
16:52:54 <h01ger> (eg sha1 but also its exists :)
16:53:03 <ln5> i guess that depends on who's going to own this... :)
16:53:09 <weasel> I'm more than open for anyone to change it,
16:53:27 <weasel> I don't expect to find any time to do redesigns myself.
16:53:38 <h01ger> ok, thats good enough as an answer for this for now.
16:53:41 <ln5> i mean, i'd love to fix things but don't have the time for that for a long while
16:53:42 <weasel> so whoever owns snapshot going forward gets to decide whether to redesign, change things.
16:53:47 * h01ger nods
16:54:08 <weasel> I expect replacing sha1 with sha256/512 might be high on the list of things one would want,
16:54:14 <weasel> but it's probably not the most pressing issue
16:54:38 <weasel> which is why i'm sceptical of radical changes that require changing the addressing system
16:54:43 <ln5> i'm willing to try to own snapshot but will be busy getting things running in its current incarnation, before making changes
16:55:35 <ln5> if anyone else has more time and want to rebuild the thing i can be part of supporting things i understand, but not much more atm
16:56:03 <axhn> I could check my old databases (ten years) whether we ever had a md5 collision, and I strongly doubt that. So, going away from SHA-1 should happen some day but it's not really urgent.
16:57:07 <h01ger> .oO( and nobody was ever fired for buying IBM, so why change that? )
16:57:16 <h01ger> any other open questions?
16:57:17 <weasel> h01ger: it's MS today :)
16:57:26 <h01ger> weasel: gugle
16:57:28 <ln5> more open questions? or we move further
16:57:46 <ln5> #topic other
16:58:02 <waldi> i would just say that importing into azure/aws is not useful right now. because we can't use the storage format long enough
16:58:20 <weasel> "long enough"?
16:58:38 <h01ger> weasel: -v?
16:58:46 <h01ger> i meant waldi, sorry
16:59:21 <waldi> h01ger: what i just said. for high throughput we need to do redirects to the storage. for this the storage needs to tell the client what the filename is
16:59:41 <waldi> or clients will just store files named after the checksum
16:59:49 <h01ger> ic, thx
16:59:50 <waldi> but we also can't update existing objects
17:00:28 <waldi> so import now, fixup later is also not easy
17:00:29 <h01ger> apt should query for objects with sha256 hashes indeed, and stop using filenames :)
17:00:49 <waldi> h01ger: we have a web interface, so people use it manually, or?
17:01:09 <weasel> right now the web application sends a redirect to /file/<sha1> and that redirect is then magically dealt with in varnish and apache
17:01:16 <weasel> the client never sees the redirect
17:01:21 <h01ger> maybe we should give noahm some space to explain their plans
17:02:58 <noahm> would it not be possible to store the files as (for example) /pool/p/<checksum/pkg.deb We can reference the file by its actual name in the Packages files, so apt will still work, and we've still got checksum-based deduplication.
17:03:37 <noahm> err, small typo, that was /pool/p/<checksum>/pkg.deb
17:03:55 <noahm> it would definitely be a rearchitecture of things, so not trivial.
17:04:23 <weasel> noahm: file names do not usually uniquely refer to content. archive and time are also factors
17:04:57 <ln5> noahm: given the current architecture, would you want to host farms only or farms and the db?
17:05:01 <weasel> a given foo.deb name may not change its content, but snapshot makes no such assumptions (and I think we had such cases in the past)
17:05:23 <waldi> the archive does not make such promises
17:05:28 <noahm> right, which is why we still encode the checksum in the URI path
17:05:50 <noahm> just not as the filename itself, since apt cares about that.
17:06:22 <weasel> and then rewrite Packages files?
17:06:28 <noahm> yes
17:06:35 <weasel> snapshot doesn't do that right now
17:06:57 <weasel> it gives you a file system tree as it was on import time. it doesn't particularly care that it's a debian archive
17:07:10 <weasel> sure, could be done, but that's a different piece of software :)
17:07:35 <weasel> (and it's probably not entirely trivial. not all archives look exactly like the ftp.debian.org main one)
17:07:36 <noahm> right. I don't mean to suggest that this would be trivial. But IMO it seems like it might scale better by virtue of taking the db out of the critical path.
17:07:46 <weasel> would it?
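The layout noahm proposes above, with the checksum as a path component and the real file name as the last component, can be sketched as below. This is a rough illustration and not an agreed design: `pool_uri` is a hypothetical helper, the choice of sha256 is an assumption (the current service addresses blobs by sha1), and treating the `p` component as the first letter of the file name is a guess at what the example path means.

```python
import hashlib


def pool_uri(filename, content):
    """Build a /pool/<prefix>/<checksum>/<filename> style URI for one file.

    Assumptions for illustration only: sha256 as the checksum, and the
    prefix taken from the first character of the file name.
    """
    checksum = hashlib.sha256(content).hexdigest()
    prefix = filename[0]
    return f"/pool/{prefix}/{checksum}/{filename}"
```

Two different builds that share the name `pkg.deb` end up under different checksum directories, so the name alone never has to identify the content, while apt still sees `pkg.deb` as the final path component. The cost discussed above remains: the Packages files would have to be rewritten to carry these paths.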
17:07:54 <weasel> the Packages file would have to be built somewhere
17:08:03 <noahm> the db would be involved in generating the packages files, but that's asynchronous.
17:08:14 <noahm> a db outage does not prevent clients from accessing the archive
17:08:19 <noahm> as it does today
17:08:52 <noahm> it also means that replica sites don't need a local db
17:08:56 <olasd> my experience with serving redirects to cloud storage buckets is you have to generate a somewhat short lived access signature with your bucket key, and within that signature you tell the bucket the filename / content type that you want it to present to the client
17:08:59 <weasel> sure, it's one option, but that's not the software we have.:)
17:09:26 <noahm> olasd: or you just make them all public. since this is all public data anyway.
17:09:26 <weasel> a storage object also does not uniquely refer to one file
17:09:35 <waldi> olasd: s3 is able to do that, others not
17:09:36 <weasel> we have plenty of objects that are known by different names
17:10:24 <jas4711> re-generating packages files would also break pgp signatures and validation by apt so this is indeed a rather different approach and needs tooling to work
17:10:25 <noahm> yeah, and blob storage systems don't usually support a notion of symlinks.
17:11:30 <weasel> (and not all apts support http redirects)
17:11:47 <weasel> but maybe it's ok to ignore those in this day and age
17:12:13 <ln5> with 5m left of the meeting, i'd like to jump to "next meeting" and then back to jas4711 offer
17:12:48 <ln5> i propose monday june 10 at 1600 UTC for a sync like this one
17:13:28 <weasel> +1
17:14:02 <ln5> going....
17:14:30 <h01ger> works for me
17:14:33 <ln5> ... going, gone. 2024-06-10 16:00Z
17:14:35 <noahm> +1
17:14:44 <ln5> great, thanks
17:15:06 <ln5> noahm: want to explain more about what you'd like to do? with current or future software?
17:16:49 <weasel> jas4711: re git lfs. I don't expect sheer size of the blobs to be an issue. the question is how does it look over 80k commits of filesystem trees of 1.5M files each
17:17:15 <ln5> #agreed next meeting 2024-06-10 16:00Z
17:17:58 <waldi> jas4711: is on dated snapshot one commit, or do you have one tree with all snapshots?
17:18:32 <jas4711> weasel: i don't know yet, it is an experiment. i'm playing with it on e.g. https://gitlab.com/debdistutils/archives/debian/ftp.debian.org
17:19:06 <noahm> ln5: the goal is to support a snapshot-like service as a scalable production service (e.g. can support an arbitrary number of Debian systems pointing apt at it) such that clients can be configured to point to admin-determined versions of the repo. The admin can then roll forward to a new repo version in a controlled fashion to test and deploy updates.
17:19:27 <jas4711> for snapshot i would expect that you would have all files available in sub-directories debian/20240501T024440Z/, debian/20240501T024441Z/, debian/20240501T024442Z/ etc
17:20:01 <weasel> jas4711: that's not really how things could work, though, is it?
17:20:22 <weasel> jas4711: your "source" would be a debian mirror having /debian/, and you add it, then it changes, you update, etc
17:21:41 <jas4711> i think it is possible to cut in two ways: 1) one git repository with commits showing the evolution of the archives, 2) one git repository with exploded view a'la current snapshot.d.o
17:22:12 <jas4711> i'm not sure git lfs handle 90M files well. but 1.5M files is no problem
17:23:17 <jas4711> or even the 3M files of archive.debian.org, works fast on my laptop
17:23:27 <h01ger> in coordination with ln5 i intend to close the meeting in 3min, unless something super interesting comes up :)
17:23:45 <h01ger> (everyone can continue talking after the meeting, there just wont be logging)
17:23:46 <weasel> h01ger: I think closing it is fine
17:24:24 <ln5> thanks h01ger, and thanks all. i need to move away from keyboard.
17:24:50 <weasel> cheers
17:25:44 <h01ger> thank you all!
17:25:47 <h01ger> #endmeeting