#debian-kernel log

20:02:05 <bwh> #startmeeting
20:02:05 <MeetBot> Meeting started Wed Feb 26 20:02:05 2025 UTC.  The chair is bwh. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:02:05 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
20:02:37 <bwh> #chair waldi carnil
20:02:37 <MeetBot> Current chairs: bwh carnil waldi
20:03:00 <bwh> Hi all
20:03:01 <ukleinek> before we start, I want to highlight that carnil closed quite a few old bugs. Thanks a lot \o/.
20:03:29 <bwh> Thanks carnil, that's always appreciated
20:03:31 <ukleinek> There was some negative feedback and caring for his mental health I wonder if there is some need for support from the rest of us.
20:03:50 <bwh> #topic Bugs #1076372 and #1090717: NVMe corruption
20:05:26 * ukleinek didn't follow the discussion. does someone else know if there is something new?
20:05:30 <bwh> I don't see any news from the reporter on the upstream bug
20:05:51 <carnil> I haven't dug into the long bug either
20:06:00 <carnil> just noticed another comment that says "So BIOS firmware 4.10 seems to have solved the problem."
20:06:11 <bwh> Right
20:06:11 <carnil> (comment 128 in the upstream issue)
20:06:35 <carnil> so at least there seems to be some indication that's not actually a Linux issue to handle
20:06:57 <bwh> In any case I don't think there's anything we can do at this stage
20:07:11 <bwh> #topic Bug #1098661: linux: fails to boot on VisionFive 2: Unhandled exception: Store/AMO access fault
20:07:56 <ukleinek> there was also some discussion in #d-arm about efi zboot
20:08:01 <bwh> I think MR #1384 is related to this?
20:08:03 <ukleinek> (also involving aurel32)
20:08:45 <ukleinek> salsa 500s for me, but yes, there is an MR by aurel32 related to that
20:08:47 <bwh> OK, the MR subject only mentions riscv64 (I can't load the page for it at the moment)
20:09:16 <ukleinek> Topic in #salsa is "Maintenance in progress"
20:09:28 <waldi> and it is not fully right. even that system can boot the kernel. except if run via grub
20:09:36 <carnil> (ah that's unlucky for the meeting)
20:10:07 * ukleinek is happy to have a clone of meeting.git :-)
20:10:18 <ukleinek> but that doesn't help that much
20:10:28 <aurel32> yes, it's just a revert of the riscv64 specific part of the commit
20:11:39 <bwh> aurel32: I didn't follow why it is broken for riscv64 and not arm64?
20:11:40 <aurel32> https://paste.debian.net/1356413/
20:11:49 <ukleinek> waldi: I'm not aware what the motivation was to enable that efi-zboot stuff. Is there a masterplan somewhere?
20:12:20 <waldi> ukleinek: be able to compress the stuff on arm64 as well
20:12:38 <aurel32> bwh: arm64 seems also borken, just i haven't tested it
20:12:59 <ukleinek> If I understand correctly vmlinuz.efi is (theoretically) superior to Image, but people/machines are not prepared to handle that?
20:13:22 <ukleinek> waldi: what is "the stuff"?
20:14:11 <aurel32> there are two issues on riscv64: 1) a bug somewhere that prevents the decompressor from working when using grub 2) kernel stopped working in the non-uefi case
20:14:24 <aurel32> AFAIK arm64 is only affected by 2)
20:14:55 <aurel32> 1) is probably fixable, just that debugging on real hardware takes time (it can't be reproduced under QEMU, or at least I have not been able to so far)
20:15:26 <waldi> aurel32: which bootloader uses the non-uefi case?
20:15:44 <waldi> i know about flash-kernel
20:15:51 <bwh> Are most Debian arm64 systems booting with EFI?
20:16:16 <bwh> I would have guessed not, but I just don't know
20:16:18 <aurel32> waldi: u-boot with extlinux.conf, kvm
20:16:18 * ukleinek thinks most arm64 systems are booted by U-Boot
20:16:36 <aurel32> (or the rpi bootloader)
20:16:36 <waldi> aurel32: "kvm"?
20:16:56 <ukleinek> aurel32: ack, was just about to mention that.
20:17:10 <bwh> ukleinek: Yes or whatever Android uses, but Debian may be different
20:17:11 <waldi> aurel32: rpi uses flash-kernel, which copies stuff around. so this needs zboot support
20:17:14 <aurel32> waldi: qemu -enable-kvm, current S-mode is not supported under KVM, so you need to load the kernel directly
20:17:46 <ukleinek> "my" rpi doesn't use flash-kernel
20:18:16 <aurel32> also IIRC, i guess a few testsuites are calling qemu with the default kernel and initrd, they need to be updated
20:18:33 <ukleinek> raspi-firmware handles kernel updates I think, but the intention is similar to f-k
20:19:13 <bwh> It seems like we need to revert this for both arm64 and riscv64 for now, then plan a transition together with the relevant boot loader maintainers
20:19:17 <waldi> aurel32: we already had an arch with zboot before, so they need to support it anyway
20:19:27 <ukleinek> I think the goal of migrating to efi-zboot is consensus?
20:19:37 <waldi> bwh: for experimental?
20:20:07 <ukleinek> waldi: if this influences backports, IMHO yes
20:20:12 <aurel32> note also that the situation between arm64 and riscv64 is a bit different as the latter is already using EFI_STUB=y with an uncompressed kernel, so the kernel already works fine for both EFI and non-EFI, even with systemd-boot
20:20:31 <aurel32> so efi-zboot only brings compression
20:20:38 <bwh> waldi: Less important for experimental, but still
20:20:56 <aurel32> on arm64 it is needed to keep compression with systemd-boot
20:21:43 <ukleinek> Is systemd-boot one (or the?) motivation to migrate to efi?
20:22:38 <ukleinek> Just for me: efi-zboot is a compressed binary, the bios/bootloader is expected to extract it to the right location in memory and then dive into it?
20:23:00 <waldi> ukleinek: no. the included efi binary decompresses it
20:23:19 <ukleinek> ah, so it's like a zImage, just as efi binary.
20:23:29 <waldi> if you don't use efi, you need to recreate that decompressor somehow
20:23:57 <waldi> yes. there are even patches floating around to convert x86-64 to zboot
20:24:51 <aurel32> qemu/arm64 is also able to do the decompression, but only for gzip, not zstd
20:24:59 <aurel32> qemu/riscv64 is not able to
20:25:12 <waldi> issue is reported. someone needs to implement it
20:25:31 <waldi> neither does loong64
20:25:39 <ukleinek> And there is no zImage for arm64 and riscv64? That would be an alternative, right?
20:25:49 <bwh> Right, there is no zImage
20:26:01 * ukleinek guesses upstream doesn't want zImage
20:26:32 <waldi> zboot is superior, as it runs in a capable environment already
20:27:00 <aurel32> the alternative is uncompressed kernel with CONFIG_EFI_STUB=y. It's what we were using on riscv64 before that change
20:27:23 <bwh> It really seems premature to make this switch when we know some boot loaders and QEMU do not support it
20:27:50 <carnil> maybe the question could be: We know this is experimental only, and trixie is not yet released so trixie-backports is not yet impacted, but would a temporary revert be helpful to get some baseline done first on the other fronts and then re-apply the implementations (or maybe postpone it for after the trixie release at all?)
20:27:51 <aurel32> that makes the kernel way bigger, but if you look at kernel + initrd the difference is not so important
20:27:54 <bwh> I know this is in experimental but that will go to unstable in the middle of the year
20:28:33 <ukleinek> For a softer transition it would be good if we could have both. Maybe a separate kernel image package providing the efi-zboot image to work on bootloader support?
20:28:56 <waldi> ukleinek: add a script to /etc/kernel that decompresses the kernel
20:29:18 <aurel32> well also people are using experimental during the freeze to get a newer kernel for newer hardware
20:29:46 <ukleinek> waldi: fine for me, that script should run automatically at install time to ensure that unprepared machines can still boot.
20:31:12 <ukleinek> maybe plus making sure that only Image or the efi-image is in /boot to not excessively eat partition space?
20:31:51 <bwh> So let's revert the change until we have something like that implemented
20:31:56 <ukleinek> ack
20:32:14 <waldi> bwh: if we set a definitive date
20:32:42 <ukleinek> waldi: what do you imagine approximately?
20:33:14 <ukleinek> something like "trixie release + X"?
20:33:22 <waldi> six months. this should be enough for people to follow
20:34:16 <ukleinek> trixie + six months?
20:34:19 <bwh> We also need a NEWS entry for any change like this
20:34:59 <aurel32> i still don't get what it brings besides compression to the riscv64 case
20:35:42 <ukleinek> Does upstream push into the efi direction? On both arm64 and riscv64?
20:37:26 * ukleinek eyes the agenda and wonders how we can close this discussion to have opportunity to handle the other issues, too.
20:37:54 <bwh> #agreed EFI_ZBOOT should be disabled for now on arm64 and riscv64
20:38:23 <ukleinek> I can care to look over aurel32's MR and merge it
20:38:26 <waldi> #agreed trixie + 6 months is re-enable
20:38:38 <waldi> ukleinek: no. just revert the original one
20:38:59 <bwh> #topic Bugs #1086028, #1087809, #1093200: mips spurious EFAULTs
20:39:24 <carnil> bwh: asked upstream to backport two commits to 6.1: https://lore.kernel.org/stable/Z79tTfjD-rCIa6EV@eldamar.lan/T/#u
20:39:34 <bwh> Yes, I saw that, thanks!
20:40:04 <ukleinek> waldi: fine for me
20:40:07 <bwh> So do we want to apply those already to unblock builds, or should we wait for a stable update?
20:41:24 <carnil> bwh: given it's mips64el and that we have the problem since 6.1.37-1 I'm not sure we should hurry up a next upload. I have 6.1.129 already prepared and a point release is upcoming so latest then we should have all in I believe
20:41:35 <carnil> 2025-03-15 is point release
20:41:43 <carnil> but if you think it should happen earlier I can do that
20:42:02 <bwh> That makes sense to me
20:42:05 <ukleinek> looking at https://buildd.debian.org/stats/graph-week-big.png the situation doesn't seem too bad. (But where is mipsel?)
20:42:16 <carnil> so I would have waited until upstream really queues up at least the two commits
20:43:10 <bwh> So no specific action needed here, I think
20:43:28 <bwh> #topic Bug #1071562: nfsd blocks indefinitely in nfsd4_destroy_session
20:43:49 <carnil> this one has two commits in 6.1.129-1 as well
20:44:17 <carnil> according to Chuck and other upstream people there are still known issues, for the above bug one reporter said that after applying the patches situation is stable
20:44:29 <carnil> so I have added closer for this bug in 6.1.129-1
20:44:41 <bwh> which is not released yet, right?
20:44:51 <carnil> no not in Debian
20:45:02 <bwh> OK
20:45:11 <carnil> 6.1.129-1 is just in https://salsa.debian.org/kernel-team/linux/-/merge_requests/1381
20:45:28 <bwh> Well, we can come back to this bug if it turns out not to be fixed
20:45:39 <carnil> yes
20:45:46 <bwh> #topic #1085178: linux-signed-amd64: Some BPF fentry hooks silently fail
20:46:24 <carnil> Relevant comment here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1085178#30 "I would guess that series will merge into 6.15, but we'll have to see."
20:46:57 <ukleinek> ..ooOO(The reporter has an @crowdstrike.com address ...)
20:47:04 <bwh> I have no idea what's going on here
20:47:28 <bwh> ukleinek: Yep, they use BPF to crash^Wsecure Linux systems now
20:48:18 <bwh> I don't think we have anything to do here
20:48:39 <bwh> If and when the fixes land upstream we maybe need to request backporting to stable
20:48:44 <carnil> wait until it lands in mainline, eventually maybe they go back to stable then
20:48:53 <ukleinek> apart from keeping an eye on that and check it's properly backported
20:49:02 <bwh> (but again I don't really understand the bug and I don't know whether it's practical to backport)
20:49:33 <bwh> #topic Bug #1095745: rockchip: NVMe unavailable on rk3568 platform
20:50:25 <carnil> this is a regression for all stable series. There is a commit/patch acked, but still (not checked today) it did not get up to mainline
20:51:13 <carnil> > the patch already is in the fixes-branch of the phy-tree [0], so should
20:51:16 <carnil> make its way into 6.14-rc shortly.
20:51:21 <carnil> https://lore.kernel.org/lkml/6647031.K2JlShyGXD@diego/
20:51:55 <bwh> It's in next
20:52:26 <carnil> ok
20:52:29 <ukleinek> It's not in linus/master
20:52:40 <bwh> Maybe we should cherry-pick it?
20:53:07 <carnil> I tried here to get it a bit faster in: https://lore.kernel.org/lkml/Z7gosm7PJMR0zCg4@eldamar.lan/ but apparently it does not seem that important
20:53:13 <bwh> because it's probably going to miss the point release otherwise
20:53:24 <ukleinek> sounds reasonable to cherry-pick
20:54:04 <ukleinek> hmm, the commit that is noted in the Fixes: line is only in v6.13-rc5? That's not what I expected.
20:54:24 <carnil> ok yes we can do that, so cherry-pick for the next experimental (debian/latest) and unstable upload (and then cherry-pick it as well for 6.1.y for bookworm)
20:55:09 <carnil> ukleinek: fbcbffbac994aca1264e3c14da96ac9bfd90466e is in 6.13-rc5 but it got backported to 6.1.123, 6.6.69, 6.12.8
20:55:25 <ukleinek> carnil: ah, that explains it. thx
20:55:44 <bwh> carnil: Will you take an action for that? (Only if you have the time)
20:55:59 <carnil> bwh: yes you can assign an action for that to me
20:56:23 <bwh> #action carnil will apply upstream fix for #1095745 to affected branches
20:56:27 <carnil> I prefer that it lands officially in the stable series but if that fails I will cherry-pick it
20:56:37 <ukleinek> carnil: If you hit problems, feel free to get in touch. I probably have more time than usual for such things next week.
20:56:41 <carnil> "if that fails in time for the next upload I mean"
20:56:43 <bwh> #topic Bug #1050578: linux-image-6.1.0-11-amd64: kernel disk device cache coherency issue: stale reads on /dev/sda1
20:56:58 <carnil> ukleinek: ok noted!
20:57:18 <bwh> I think the user did something crazy and this is not a bug
20:57:44 <ukleinek> bwh: that's what I thought when I saw "hexedit /dev/sda"
20:58:05 <carnil> bwh: some background on this if you all are interested (but we are short in time)
20:58:08 <carnil> I will try to be brief
20:58:25 <carnil> - this was an old bug without relevant action, I closed
20:58:31 <carnil> - reporter did not agree
20:58:43 <carnil> - short interaction to let him explain, where he mentions he reported upstream
20:59:02 <bwh> I think the issue is the page cache of whole-disk and partition block devices are independent; this is known upstream and wontfix
20:59:05 <carnil> - since conversation was a bit difficult, researched and there is https://lore.kernel.org/lkml/CA+jjjYTk=5wn2o46uNB+bJYX8xLgMP==dsJuvC94DvtN2f_6Yw@mail.gmail.com/ upstream
20:59:22 <carnil> which is "intersting to read"
20:59:38 <carnil> but at this point I think we can try to close the bug again and hoping reporter does not play pingpong on reopening
20:59:39 <bwh> So I propose I will tag this as wontfix
20:59:44 <bwh> and downgrade to normal
20:59:45 <ukleinek> ..ooOO(If it stings, I won't read it :-)
21:00:14 <bwh> Does anyone disagree with that resolution?
21:00:20 <ukleinek> I don't know
21:00:22 <carnil> bwh: yes please in this case.
21:00:38 <carnil> not sure about to close it as well, as this brings it away from open bugs plate, but we might disagree here
21:00:44 <bwh> #action bwh will mark #1050578 wontfix and reduce severity
21:00:59 <ukleinek> (FTR: I don't know = I don't disagree)
21:01:00 <carnil> (my goal is to keep our bugs somehow overviewable in the BTS)
21:01:05 <bwh> #topic Bug #1087981: linux-image-6.1.0-27-amd64: detected stalls in kernel log, system very slow on IO (regression)
21:01:28 <bwh> carnil: Yes, I get that. Can you easily exclude bugs marked wontfix?
21:01:43 <carnil> bwh: yes sure
21:02:18 <carnil> bwh: about #1087981: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087981#37 but not sure if that's helpful
21:03:06 <carnil> if someone has better ideas that would be welcome I think
21:03:35 <ukleinek> I like the reporter testing a newer upstream version
21:03:36 <bwh> Uh why are they doing drop_caches periodically?
21:04:31 <bwh> That's probably not going to help with "computer got slow"
21:04:48 <ukleinek> bwh: where do you see that?
21:04:59 <bwh> In the latest kernel.log.xz
21:05:15 <bwh> The times are several minutes away from the stall reports though
21:06:13 <bwh> So I have no idea what's going on
21:06:30 <carnil> so maybe let's wait that Willi reports back
21:06:34 <bwh> OK
21:06:45 <ukleinek> +1, so I think testing upstream and then eventually involving upstream sounds reasonable
21:06:57 <bwh> #topic Bug #1098354: linux-image-6.1.0: FriendlyElec R5S, one Ethernet-Port / PCI-Device missing on Kernels beyond 6.1.0-28-arm64
21:07:32 <carnil> duplicate of the previously discussed one, together with #1095745
21:07:37 <bwh> Oh right
21:07:48 <carnil> (and #1098250)
21:08:09 <bwh> #topic Bug #1098698:  linux: Segfault and system hang on larger network file transfers
21:08:55 <bwh> carnil: I think you believe this to be fixed, but the reporter ran into another regression, right?
21:09:52 <carnil> bwh: yes the original trace posted was a known bug which got fixed, but reporter experiences those hangs in my understanding with the most recent kernels still (asked to explicitly confirm). But then this is still an open issue (again unrelated to the original posted trace)
21:10:34 <bwh> Can you ask for another crash log?
21:11:05 <carnil> ok yes. Is there anything else we can ask at this stage already on the problem?
21:11:46 <bwh> I'm guessing there may be some difficulty getting a log
21:12:20 <bwh> So you may need to point to netconsole documentation
21:12:24 <carnil> right because the system gets unresponsive, but maybe attaching a netconsole could get enough information
21:12:27 <carnil> ah same idea :)
21:12:42 <ukleinek> Is earlyprintk + sysrq a thing on amd64?
21:12:55 <ukleinek> That might be easier than netconsole
21:13:01 <waldi> yes, it is
21:13:27 <bwh> I'm not seeing how that would help
21:14:11 <ukleinek> bwh: because the output would appear, but getting that in a mail is difficult?
21:14:57 <bwh> This is not a boot failure so earlyprintk is irrelevant
21:15:56 <bwh> #action carnil will ask for new kernel log for #1088826
21:15:59 <ukleinek> ack, that's just an automatism of mine to thing earlyprintk when considering sysrq
21:16:12 <ukleinek> s/thing/think/
21:16:32 <bwh> Scratch that
21:16:39 <bwh> #action carnil will ask for new kernel log for #1098698
21:16:58 <bwh> #topic Bug #1088826: /usr/share/bug/linux-image-686-pae/presubj: Fails to boot after 6.1.0-22
21:17:10 <carnil> I do not expect an answer here on the moreinfo question
21:17:14 <ukleinek> #1088826 waits for report feedback
21:17:28 <ukleinek> the reporter's email address looks suspicious
21:17:37 <carnil> has "sapammer @ ..." address
21:17:37 <bwh> Right
21:18:08 <bwh> So, nothing to do for this bug for now
21:18:25 <bwh> I will skip all the bugs < important
21:18:55 <bwh> #topic Issue linux#6: trixie kernel maintenance
21:19:09 <bwh> I still need to talk to the release team about this
21:19:40 <bwh> #topic New upstream versions
21:19:54 <bwh> I need to update firmware-nonfree, ktls-utils, and wireless-regdb.
21:20:19 <bwh> Did anyone look at a linux update to 6.14 yet?
21:20:33 * ukleinek didn't
21:20:39 * carnil is doing basic testing with the current 6.12.17-rc2 and 6.13.5-rc2 but no work done on 6.14 at all
21:21:06 <carnil> and I wonder when it's the best time to switch from 6.13.y stable series in experimental to an RC version of 6.14
21:21:06 <bwh> Well, if I ever get through my actions I will take a look at it
21:21:14 <carnil> early would be nice so that we "keep the pace"
21:21:21 <bwh> yes
21:21:49 <bwh> #topic Merge requests
21:22:20 <bwh> I looked at all the initramfs-tools MRs but didn't fully review all of them yet. I'm planning to make a release this week.
21:22:46 <bwh> We discussed linux#1384. Do either of the others need discussion?
21:23:09 <carnil> there would be as well #1359 where I think there is disagreement
21:23:24 <carnil> but we are almost over the time and not sure how pressing a decision is here
21:23:38 * ukleinek doesn't understand the problem in !1359
21:23:39 <carnil> and waldi is anyway in the best position to explain the current issue
21:23:44 <bwh> We are well over time :-/
21:24:15 <bwh> I haven't looked at #1359 yet
21:25:00 <ukleinek> I'll look into the arm64 MRs (!1313 + !1301)
21:25:34 <bwh> ukleinek: Thank you
21:25:55 <bwh> I will try to look at !1359 but can't promise it
21:26:02 <ukleinek> there are some more, !1295
21:26:53 <bwh> The script only shows MRs that have been changed in the last week
21:27:06 * ukleinek also looks into !1321 as he was involved into that already
21:27:31 <carnil> does it skip as well such in Draft and with failed CI?
21:27:41 <bwh> It skips drafts, yes
21:27:47 <carnil> (do not need to answer, can look up in code myself later)
21:27:53 <carnil> bwh: ack
21:28:01 <ukleinek> though a look from someone with more knowledge about the debian/rules targets would be welcome
21:28:03 <bwh> failed CI should be indicated in the St(atus) column
21:28:18 <bwh> #topic AOB
21:28:33 * ukleinek will not be able to attend the meeting next week.
21:28:55 <carnil> for chair: I think it is meanwhile my turn again
21:29:02 <bwh> Thank you
21:29:24 <ukleinek> so in two weeks it will be my turn. Feel free to assign me next week.
21:29:36 <carnil> ukleinek: ok!
21:30:18 <bwh> #endmeeting