19:00:01 <carnil> #startmeeting
19:00:01 <MeetBot> Meeting started Wed Oct  1 19:00:01 2025 UTC.  The chair is carnil. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:01 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
19:00:09 <yunseongkim[m]> Hi
19:00:14 <carnil> #chair bwh ukleinek waldi
19:00:14 <MeetBot> Current chairs: bwh carnil ukleinek waldi
19:00:15 <waldi> hi
19:00:21 <carnil> hi all!
19:00:42 <kathara> helloo
19:00:52 <carnil> before we start with the agenda, is there anything else to be discussed beforehand and put on top of the list?
19:01:13 <bwh> Hello
19:01:37 <bwh> I don't have anything
19:01:42 <waldi> me neither
19:01:44 <carnil> ack
19:01:48 <carnil> so let's start
19:01:54 <carnil> #topic Build time and limits in Salsa CI
19:02:10 <carnil> I think we have a problem. Hoped that maybe santiago might be able to join the meeting tonight.
19:02:18 <waldi> yes we do
19:02:31 <carnil> our build time seems to hit the time limit of 3h now
19:02:31 <bwh> On which branch(es)?
19:02:41 <carnil> on debian/latest at least
19:02:53 <waldi> gcc 15 looks slower
19:02:59 <carnil> I have not seen the problem on trixie and bookworm
19:03:08 <waldi> and ccache seems to be completely ineffective in the CI case
19:03:32 <bwh> Would it make sense to revert to gcc-14 temporarily and report this as a bug?
19:03:47 <carnil> for ccache there is https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/492 but I'm not sure if we are hit about the same
19:04:11 <waldi> and for some reason this thing currently uses mmdebstrap inside an existing system, re-downloading everything
19:05:00 <bwh> Is there a way to see what is currently in the cache?
19:05:04 <waldi> no
19:05:14 <waldi> apart from the normal stats
19:05:26 <waldi> everything is hashed, so you can just see that it does not match
19:06:11 <waldi> but ccache can't do mark and sweep expiration, to see if it might overflow with data
19:07:40 <waldi> no idea right now. i have to talk to salsa people again, they still have to do a shared runner migration
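For reference, the "normal stats" mentioned above can be read with ccache's own CLI; a sketch, assuming a locally installed ccache 4.x (the flags below are standard ccache options, not something taken from the CI configuration):

```shell
# Hit/miss counters, cache size, and the configured size limit
ccache --show-stats

# Reset the counters before a trial build so a single run can be
# measured in isolation
ccache --zero-stats

# Bound the cache so it cannot silently overflow between CI runs
ccache --max-size=5G
```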
19:08:45 <bwh> So gcc 15 is a problem, and I suspect there are new drivers that should be disabled in the cloud config
19:08:53 <carnil> bwh: if you think it would help then yes we can revert my proposal to go for gcc-15 on the debian/latest branch, but I'm not sure this is the only problem here.
19:09:08 <waldi> please see if it fixes something first
19:09:40 <carnil> so an action, revert gcc-15 switch first
19:10:04 <carnil> #action carnil reverts switch to gcc-15 for debian/latest branch
19:10:09 <waldi> no?
19:10:10 <bwh> Broken caching would hurt the usual case but we still have to get the worst case (no usable cache) under 3h
19:10:18 <carnil> no?
19:11:03 <bwh> Shall I take an action to actually test the difference?
19:11:33 <carnil> ok that sounds better, do you agree waldi?
19:12:19 <waldi> i'll see what i can find
19:12:56 <bwh> #action bwh will test performance difference of building with gcc 14 vs 15, and open bug if significantly worse with 15
19:13:10 <waldi> okay, 14 vs 15 seems to be no difference
19:13:20 <bwh> #action bwh will look for more drivers to disable in cloud configs
19:13:29 <waldi> i have 14 builds with 169 minutes
19:14:15 <waldi> so the difference now is: someone merged the source and build steps
19:14:26 <carnil> and might the problem be actually with the switch to the new pipelines and using sbuild+unshare otherwise?
19:14:53 <waldi> well, we can test that easily
19:15:42 <bwh> It definitely doesn't help, but the source build doesn't seem to take very long
19:16:29 <carnil> waldi: my understanding was, we cannot revert https://salsa.debian.org/kernel-team/linux/-/commit/167b300f2f9f95426b0d86670ceec15ce84405a6 to test, because once salsaci people switched the pipelines, ours were not working anymore at all
19:17:00 <waldi> carnil: no idea. i will transplant my simplified one and see what it gives us
19:17:19 <bwh> So long as the required Docker images exist I would expect we can still test with a copy of the old version
19:18:20 <bwh> waldi: Can you give yourself an appropriate action?
19:18:39 <waldi> #action waldi to test pipeline and slow runtimes
19:19:01 <carnil> so thank you very much to both, I think we have a couple of actions now that we can move on to the next item
19:19:18 <waldi> yes
19:19:20 <carnil> #topic Disabling bcachefs
19:19:25 <carnil> this is an 'easier' one.
19:19:37 <waldi> yes
19:19:43 <carnil> for debian/latest I just have to adapt what bwh has commented and add a hint about the now existing dkms module in the NEWS file
19:19:58 <carnil> the question is if we just want to leave status quo for trixie
19:20:27 <carnil> I'm under the impression it makes no sense to really keep it enabled in trixie, no tools were available when trixie was released
19:20:40 <bwh> I wish we hadn't included it in trixie, but I think we should not remove drivers during a stable release unless they are completely busted
19:20:43 <waldi> and there won't be any fixes. so let's drop it as well as unsupported
19:20:58 <waldi> bwh: well, is it still security supported?
19:21:01 <carnil> so the number of people really potentially using it should go to zero.
19:21:31 <bwh> waldi: No, but we could say the same for many other components unfortunately
19:22:51 <carnil> bwh: I agree with you that we should not have included it in trixie and should have removed it in time, it's my fault for not having it on the radar once the release approached (because I was confident when enabling it, and did not notice that the tools went away in the meantime).
19:23:28 <bwh> carnil: I don't blame you at all
19:23:29 <carnil> we have disabled broken drivers like ntfs in the past in stable releases, for instance, and in this case my impression is still that the usage will be almost nonexistent and, with a corresponding NEWS entry, "safe enough" to remove
19:23:36 <waldi> the same as we remove packages from stable, we should not be shy to remove stuff we can not support any longer
19:23:51 <carnil> but that is the point of this discussion: to see if we agree or rather would want to keep it enabled
19:24:56 <carnil> from my sec-team experience, we have already triggered some removals of packages because they were not supportable in stable releases (one of the recent ones was guix, which went through without much hassle)
19:25:46 <carnil> but maybe we should just take the action now to remove it from debian/latest and once that is done make up our minds for or not to remove it from trixie as well
19:26:27 <bwh> If we want to remove it from trixie we should definitely talk to the release team and maybe put in a very visible warning for some versions before actually removing it
19:29:19 <bwh> Next topic?
19:29:37 <carnil> ok. One other alternative is to keep it and, since we rebase onto 6.12.y, go with whatever gets applied as fixes.
19:29:42 <carnil> yes let's move to next topic
19:29:48 <carnil> bug list
19:29:57 <carnil> #topic #1106411: (i, u) linux-image-6.12.27-amd64: kernel NULL pointer dereference in bmc150_accel_core (merged with #1102522, #1112643)
19:30:21 <carnil> I'm following the upstream discussion. Still there is no fix. I just pinged the upstream thread today to see if things can be moved on.
19:30:43 <bwh> So I see
19:30:44 <carnil> for almost all affected people who trigger the issue, the easy workaround is to just mask iio-proxy, since it is not usable anyway
19:31:03 <carnil> so I would say we wait a bit longer to see what lands in mainline and make sure it gets backported to stable
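A sketch of the workaround mentioned above, assuming "iio-proxy" refers to the systemd unit shipped by the iio-sensor-proxy package (an assumption about the exact unit name):

```shell
# Mask the service so nothing, including D-Bus activation, can start it,
# and stop it immediately
sudo systemctl mask --now iio-sensor-proxy.service

# Verify the unit is masked
systemctl is-enabled iio-sensor-proxy.service

# To undo once a fixed kernel is installed:
# sudo systemctl unmask iio-sensor-proxy.service
```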
19:31:28 <bwh> I was wondering if we could just pick that fix already
19:32:02 <carnil> I would prefer not to until it is really blessed and on its way to mainline
19:32:22 <carnil> (but my opinion only here)
19:33:04 <bwh> I'll try to have a look at what the patch is doing and review it myself, and then open an MR if it seems safe
19:33:05 <carnil> I proposed to get actually affected people to test patches so they can add Tested-by and get more confidence
19:33:10 <bwh> right
19:34:14 <carnil> so okay to wait, I'm not really confident about picking the https://lore.kernel.org/linux-iio/20250613124648.14141-1-marek.vasut+bmc150@mailbox.org/ patch as long as there is so much discussion around it
19:34:19 <carnil> ?
19:34:56 <bwh> I'm OK to wait but I might open the MR after looking over all that
19:35:04 <carnil> ok!
19:36:15 <carnil> can I add an action for you for this second pair of eyes review and possibly open the MR?
19:36:22 <bwh> No :-)
19:36:29 <carnil> ok no :)
19:36:33 <carnil> then we move to the next topic
19:36:49 <carnil> time is progressing but it was good to take time for the two first items
19:36:58 <carnil> #topic #1111095: (i, ) firmware-amd-graphics: Radeon HD 8280 : gpu lock, black screen and crash.
19:37:19 <carnil> reporter is in a monologue, but has a followup: without any firmware installed things get more "stable"
19:37:33 <carnil> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1111095#62
19:37:55 <Santiago[m]> carnil: sorry, I've just seen your message (I'm on the phone right now, but can chime in if useful)
19:38:53 <bwh> The bug was opened against firmware-amd-graphics and now they are talking about removing firmware-intel-graphics??
19:39:25 <carnil> bwh: we did reassign it later to src:linux
19:39:32 <carnil> ah wait
19:39:36 <carnil> sorry misread
19:40:21 <carnil> you are right, there is no firmware-amd-graphics in the reporter's last list
19:40:46 <waldi> but the log from 6.1 clearly shows radeon loading firmware
19:41:29 <carnil> firmware-amd-graphics: /usr/lib/firmware/radeon/BONAIRE_vce.bin
19:43:47 <bwh> I think if they also removed firmware-amd-graphics then radeon would abort probing (thanks to our patch) and so they would be using a generic video driver
19:44:26 <waldi> the 6.12.43 log does not show firmware loading for some reason
19:44:44 <waldi> or did the message for it vanish?
19:45:13 <carnil> should we get confirmation on the installed state of firmware-amd-graphics, get the full kernel log there, retest with current 6.12.48-1 without and with firmware-amd-graphics, and get all the logs for comparison?
19:45:23 <waldi> yes
19:46:14 <carnil> #action carnil ask reporter of #1111095 to confirm installed state of firmware-amd-graphics and provide full kernel logs (with updated 6.12.48-1) with both firmware installed and not
19:46:22 <bwh> waldi: We used to add a log message for successful firmware loading and we no longer do
19:46:45 <carnil> let's do that and see next week where we are
19:46:52 <bwh> yes
19:46:55 <carnil> #topic #1112288: (i, u) amdgpu: GPU hang with 6.12 & 6.16sid, 6.16 liquorix works.
19:46:56 <waldi> bwh: ah, so it got missing
19:47:27 <bwh> waldi: I intentionally minimised the patch
19:48:08 <carnil> some fresh information here. liquorix people pointed out where the single patches are in their branches. But this needs someone, if motivation is available, to go through them and see which patches are not backported upstream to 6.16.y and 6.12.y, so we can identify the fixing patches
19:48:22 <carnil> I personally won't spend energy into that
19:48:23 <bwh> Right, which looks like it could be a big job
19:48:39 <bwh> At least we know where to look for individual patches now
19:49:14 <carnil> for reference of the meeting logs: it is in https://github.com/zen-kernel/zen-kernel/commits/6.16/fixes/
19:49:36 <bwh> carnil: That should be a #link I think
19:49:41 <waldi> #info liquorix patches are at https://github.com/zen-kernel/zen-kernel/commits/6.16/fixes/
19:50:00 <waldi> #link https://github.com/zen-kernel/zen-kernel/commits/6.16/fixes/
19:50:01 <waldi> okay
19:50:09 <carnil> I guess we won't take actions here right now, move to the next
19:50:15 <bwh> right
19:50:19 <carnil> #topic #1112627: (i, Mu) linux-image-6.16.3+deb14-amd64: Intel audio no longer works: DMAR: [DMA Write NO_PASID] Request device [00:1b.0] ... non-zero reserved fields in PTE
19:50:24 <carnil> waiting for bisect results, no action
19:50:31 <carnil> #topic #1114557: (i, u+) linux-image-6.12.43+deb13-amd64: Jieli touchscreen and stylus no longer supported
19:50:49 <carnil> reporter is on it: tested patches, waiting for mainline and backporting to stable
19:50:59 <carnil> #topic #1114884: (i, M) linux-image-6.1.0-39-amd64: AMD Ryzen 9 7950X, linux-image-6.1.0-39-amd64 hangs on boot while 6.1.0-37-amd64 works fine
19:51:20 <carnil> kathara was on it to assist. Waiting for confirmation on 6.1.147-1. Then bisection to narrow down the breaking commit is needed. For now nothing to do.
19:51:37 <bwh> OK
19:51:47 <carnil> #topic #1114912: (i, ) linux-image-amd64: KVM GPU passthrough causes kernel crash and system hang on Debian 13 after VM shutdown
19:52:17 <carnil> there appears to be a workaround (see last message), but no solution worked out so far. Should it be reported upstream?
19:52:40 <bwh> Yes I think so
19:53:25 <carnil> bwh would you take an action for it?
19:53:41 <bwh> #action bwh will forward #1114912 upstream
19:53:49 <carnil> thanks :)
19:54:00 <carnil> #topic #1115613: (i, +M) linux: Please enable CONFIG_SOUNDWIRE_AMD=m
19:54:28 <carnil> waiting. Reporter confirmed that it is indeed not enough to only enable that module, will do more testing/work and then report back -> no action
19:54:33 <waldi> there are quite some modules missing it seems
19:55:07 <carnil> waldi: I would expect reporter can provide us a tested list of required modules which we then can enable
19:55:14 <waldi> i hope so
19:55:51 <carnil> #topic #1116065: (i, Mu) linux: kernel oops with rsync on MSI X99A with ntfs3
19:56:23 <bwh> Only one driver can actually bind to the PCI device, so while lspci on Ubuntu gives us a list of other drivers we probably should enable, they won't be relevant to this specific device
19:56:44 <waldi> they might be helper modules
19:56:47 <carnil> this is not an easy one. The issue, at least according to the reporter, seems to narrow down to rsyncing where an NTFS filesystem is involved, but only when using the ntfs3 driver, and when on an SSD. For me there is still quite an unclear picture of the problem
19:57:29 <carnil> ah you are back to the SOUNDWIRE_AMD one, sorry was too fast
19:57:31 <bwh> Maybe there is a race that is harder to hit with an HDD
19:59:15 <waldi> yeah. use after free. the address is valid for kernel
20:00:45 <waldi> wait, instruction pointer?
20:00:47 <bwh> carnil: I agree that the earlier upstream bug report looks quite similar. Maybe a regression of that bug
20:02:01 <bwh> waldi: Use-after-free and an indirect jump. Fun for all the exploit writers
20:02:22 <waldi> and yes, it looks pretty similar
20:02:28 <carnil> it might now be worth reporting it to the ntfs3 list (from checking, it does not seem very active anymore, but at least it can be a next step)
20:02:30 <waldi> indirect jumps are the best
20:02:49 <bwh> carnil: Yes this should go upstream
20:03:07 <carnil> #action carnil forwards information from #1116065 to ntfs3 driver upstream
20:03:27 <carnil> do you still have capacity for a couple of more bugs, or should we switch to AOB?
20:03:39 <bwh> Let's do the last 2 "important"
20:03:54 <carnil> agree, and then stop
20:03:58 <carnil> #topic #1116554: (i, ) linux-image-6.12.43+deb12-amd64: hard freeze during normal operation with stacktraces in the syslog
20:04:16 <carnil> New report, traces and logs are provided by reporter.
20:04:43 <carnil> maybe not easily reproducible, which might make bisection a bit harder
20:05:02 <carnil> but it is not a direct regression
20:05:14 <bwh> I see a hang in a USB-C-related work item
20:05:18 <carnil> reporter has issues as well with earlier versions
20:06:33 <bwh> Maybe worth checking whether they're using a USB-C dock and if there's updated firmware available for that
20:06:57 <waldi> and it would be useful to see who holds this mutex
20:07:33 <bwh> Does that get logged?
20:08:12 <bwh> I suppose not as the "hung task" is not specific to mutexes
20:08:14 <waldi> no, it is not easy to see. a task dump can help
20:08:18 <waldi> no
20:08:19 <bwh> right
20:09:12 <waldi> we don't even know which ucsi backend is used
20:10:15 <waldi> so a complete log, and echo t > /proc/sysrq-trigger (was it t?)
20:10:46 <bwh> yes
20:10:56 <bwh> Can you ask for that?
20:11:19 <waldi> #action waldi to handle #1116554, full log, task info via sysrq
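The diagnostic steps above can be sketched as follows, as they would be asked of the reporter (requires root; the sysrq interface and `t` command are standard kernel facilities, the log filename is illustrative):

```shell
# Allow sysrq functions (1 enables all of them; see the kernel's
# Documentation/admin-guide/sysrq.rst for the finer-grained bitmask)
sudo sysctl kernel.sysrq=1

# 't' dumps the state and stack of every task to the kernel ring buffer
echo t | sudo tee /proc/sysrq-trigger

# Capture the full log for the bug report
sudo dmesg --ctime > task-dump.log
```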
20:11:25 <carnil> thank you waldi
20:11:36 <carnil> #topic #1116643: (i, ) UBSAN: shift-out-of-bounds in .../drivers/gpu/drm/display/drm_dp_mst_topology.c (shift exponent -1)
20:11:40 <carnil> this is as well a new report
20:12:12 <bwh> New to us, anyway
20:12:39 <carnil> but maybe we have already enough information to forward it
20:12:59 <bwh> Yes, though it's probably worth searching to avoid a duplicate report
20:13:12 <carnil> yes right
20:13:52 <carnil> I can give it a try, but will put it down on the priority list TBH
20:13:58 <bwh> #action bwh will look for upstream bug for #1116643 and ask reporter to forward if there is none
20:14:06 <carnil> ok better :)
20:14:18 <carnil> then I would say switch to
20:14:23 <carnil> #topic AOB?
20:14:47 <waldi> nothing from me
20:14:48 <carnil> who can chair next week? IIRC ukleinek offered that he can take it. But he is not around today to confirm
20:14:58 <waldi> i can do that
20:15:16 <carnil> #action waldi will chair next weeks team meeting
20:15:27 <carnil> so I would propose to close now
20:15:35 <bwh> OK
20:15:43 <carnil> thanks to all for participating today
20:15:44 <bwh> Thanks for chairing
20:15:47 <carnil> #endmeeting