#debian-kernel log

19:00:06 <ukleinek> #startmeeting
19:00:06 <MeetBot> Meeting started Wed Aug 20 19:00:06 2025 UTC.  The chair is ukleinek. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:06 <MeetBot> Useful Commands: #action #agreed #help #info #idea #link #topic.
19:00:11 <carnil> hi
19:00:15 <ukleinek> #chair bwh carnil waldi
19:00:15 <MeetBot> Current chairs: bwh carnil ukleinek waldi
19:00:19 <ukleinek> o/
19:00:56 <ukleinek> Anything before we go through the loooong buglist?
19:01:42 <ukleinek> A meta-comment: All those bugs that we get now make me fear wanting to do a kernel bump in the middle of trixie if 6.12 becomes unsupported upstream
19:02:52 <ukleinek> #topic #1111184
19:03:03 <carnil> I'm still optimistic (maybe wrongly) that we won't need to do so
19:03:27 * ukleinek is somewhat optimistic for trixie, less sure for forky
19:04:00 <ukleinek> My comment for #1111184 in the agenda became obsolete by the reporter providing some info.
19:04:23 <ukleinek> They didn't get that we need an upstream bisect next, probably needs a reply pointing that out.
19:05:44 <carnil> Right, there are a lot of amdgpu changes in between 6.13-rc6 and 6.13.5, not sure why it was not narrowed down a bit more with the kernel images between 6.13~rc6-1~exp1 and 6.13.5-1~exp1
19:05:54 <carnil> as there were some more to test which would make our range a bit smaller
19:06:21 <carnil> so two things I guess: 1. ask if the other kernels in between can be tested as well (or did I miss something?) and then ask for bisect in upstream kernel?
19:06:34 <ukleinek> but probably it's not worth binding 6.13.{0,1,2,3,4} packages?
19:06:37 <bwh> Sorry I'm late
19:07:04 <ukleinek> bwh: you didn't miss much
19:07:16 <ukleinek> s/binding/building/
19:07:54 <carnil> ukleinek: no, not necessarily, I meant to narrow down first with the kernel-image packages we uploaded to experimental which are in between
19:08:16 <carnil> there should be 6.13~rc7-1~exp1, 6.13.2-1~exp1, 6.13.3-1~exp1, 6.13.4-1~exp1
19:08:17 <ukleinek> ah, there are such packages, understood now
19:08:30 <bwh> I'm confused by the stack trace here.
19:09:12 <bwh> No amdgpu functions are in the stack trace (without a ?), only ftrace and kallsyms
19:09:21 <ukleinek> the kernel seems to load a module?!
19:10:41 <ukleinek> there was a bug where I wondered if it's really a graphics problem, but I think it was a different one. Given the bug history I'd not start wondering that, but let them bisect to an end and then start interpreting the result.
19:12:02 <ukleinek> it's currently loading the amdgpu driver, so I tend to agree it's related to that though.
19:12:31 <bwh> Maybe, maybe not
19:13:13 <ukleinek> I won't take actions today that take longer than today because my vacations start tomorrow
19:13:47 <ukleinek> carnil: do you wanna retry telling the reporter what we need next?
19:13:52 <carnil> ukleinek: yes I will do
19:14:10 <ukleinek> #action carnil to continue communication on #1111184
19:14:12 <ukleinek> thanks
19:14:20 <ukleinek> #topic #1104165
19:14:34 <carnil> another amdgpu one :-/
19:14:57 <ukleinek> and it seems to be related to suspend/resume
19:15:22 <bwh> I think we can leave this with upstream for now though?
19:15:35 * carnil agrees
19:15:49 <ukleinek> there might be a chance that it's not really amdgpu that has the problem, but that guake misbehaves somehow.
19:15:56 <ukleinek> Anyhow ack, next
19:16:03 <ukleinek> #topic #1107521
19:16:21 <ukleinek> I guess we're waiting on upstream for that one, too
19:17:19 <ukleinek> someone understanding why this is a fix and if it's sensible to backport that to 6.12 would be nice
19:17:55 <ukleinek> wait?
19:18:16 <bwh> OK
19:18:32 <ukleinek> The reporter already pinged upstream, that's why this bug reappeared on the agenda
19:18:40 <ukleinek> #topic #1109268
19:19:04 <ukleinek> that one I didn't understand, this was originally a firmware-atheros bug but then reassigned to linux
19:19:35 <bwh> The bug is that the driver tries to load a firmware file that isn't available (anywhere) and which it doesn't need
19:19:56 <bwh> but we log that and the installer prompts to provide this nonexistent file
19:20:13 <bwh> I think the driver should be patched to not request that file
19:20:49 <bwh> or at least we somehow avoid emitting the log message for it
19:20:50 <ukleinek> sending a patch upstream then implicitly raises the question of where to find that file
19:21:23 <bwh> I think it's a file that could be created in future to override the values in NVRAM
19:21:40 <bwh> so for upstream it's not a bug that the file is requested
19:21:42 <bwh> Do you see?
19:22:11 <ukleinek> so do we need a silent version of the firmware-request function, and so a Debian specific change?
19:22:24 <bwh> yes we might have to do that
19:22:53 <bwh> #action bwh will work out what to do with #1109268
19:23:04 <ukleinek> great
19:23:08 <bwh> and I will retitle this to explain what the actual bug is
19:23:08 <ukleinek> #topic #1109666
19:23:47 <ukleinek> It's unclear to me if the reporter has a problem with their storage device or if there is really a kernel bug
19:24:21 <ukleinek> Ah, "Older kernels on the same /boot partitions do boot without problems", so probably not a hw issue
19:25:29 <bwh> Yeah I'm pretty sure this is a crypto driver problem
19:25:34 <bwh> Let me handle this again
19:25:40 <carnil> thank you bwh
19:25:49 <bwh> #action bwh will continue to handle #1109666
19:26:01 <ukleinek> #topic #1110566
19:26:14 <ukleinek> carnil asked for more tests, tagged moreinfo
19:26:26 <ukleinek> #topic #1110687
19:26:59 <bwh> Reported to be fixed in 6.12.41-1
19:27:00 <carnil> I think this one can now be closed with 6.12.41-1
19:27:04 <ukleinek> #1110687 might be fixed in 6.12.41-1, but the problem isn't easily reproduced, so we have to wait a bit
19:27:23 <carnil> the answer just came in today, so I missed taking action on it
19:27:26 <bwh> They say it was fine for a week, so...
19:27:27 <ukleinek> if you're confident, closing is also fine for me
19:28:10 <carnil> ukleinek: I think so, previously it would have crashed within a couple of days (~3 days) and now running good for a week
19:28:23 <carnil> I could close with the comment to please reopen if the bug still triggers
19:28:27 <bwh> agreed
19:28:30 <ukleinek> sounds good
19:28:41 <ukleinek> #topic #1110765
19:29:03 <ukleinek> We talked about that one last week, but without an action for it
19:29:03 <carnil> #action carnil closes #1110687 (with 6.12.41-1 + comment to reopen if the bug is still triggered)
19:30:10 <ukleinek> That bug made me wonder about that other bug involving three screens, but that was an nvidia/nouveau one
19:30:30 <KGB> linux debian/latest Salvatore Bonaccorso * [update] merge request !1617: Draft: Update to 6.16.2 * https://salsa.debian.org/kernel-team/linux/-/merge_requests/1617
19:31:24 * ukleinek is irritated, that wasn't the amdgpu bug with three monitors
19:31:39 <carnil> there was one with nouveau
19:31:58 <carnil> I do not remember one with amdgpu
19:32:04 <bwh> So, we have a SATA regression, and an unrelated NVMe regression as a follow-up
19:32:33 <ukleinek> I think we'll come to that still. That there are three monitors involved wasn't explicit there, just my interpretation of the kernlog
19:33:09 <bwh> I don't know what bug you are talking about but it's not in the topic
19:33:19 <ukleinek> anyhow, for #1110765 it might be sensible to let the reporter confirm that 6.1 is still good
19:33:53 <bwh> I said last week I would take care of this one, so I will do that (eventually)
19:34:11 <ukleinek> fine!
19:34:19 <bwh> #action bwh will respond to #1110765
19:34:23 <ukleinek> #topic #1110783
19:34:32 <ukleinek> moreinfo -> wait
19:34:38 <carnil> ack
19:34:44 <ukleinek> #topic #1110839
19:34:55 <ukleinek> moreinfo -> wait, too
19:35:28 <ukleinek> #topic #1110999
19:36:06 <bwh> Missing any useful information
19:36:33 <ukleinek> (probably) the sof-audio-pci-intel-tgl makes the machine hang in intervals after a suspend/resume cycle.
19:36:36 <carnil> I'm not sure I added the found versions correctly here, I'm confused
19:36:54 <carnil> ah no right it is on 6.1.140-1
19:37:10 <ukleinek> My idea would be to ask if unloading the driver module (before or after suspend/resume) makes a relevant difference.
19:38:33 <ukleinek> That would confirm if the driver is indeed responsible for the short hangs
19:38:43 <bwh> OK, things to ask: lspci -vvnn, full kernel log, reload the driver module, is this a regression?
19:39:05 <ukleinek> #action ukleinek asks all the things that bwh suggested for #1110999
19:39:14 * ukleinek can do that directly after the meeting
19:39:24 <ukleinek> #topic #1111095
19:39:47 <ukleinek> bugs-on-amdgpu++, asked for a bisection + moreinfo -> wait
19:40:10 <carnil> ok
19:40:11 <ukleinek> #topic #1111108
19:41:10 <ukleinek> that bug made it on the agenda because several bugs were merged, no new info in all 3 bugs (at least this morning)
19:41:22 <carnil> we still have no reply from the original reporter to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104269#114
19:41:35 <bwh> On the upstream report I see "Update: Issue still present with 6.12.41+deb13-amd64."
19:41:43 <carnil> I'm not really sure what we should do here
19:41:49 <carnil> the upstream issue as well has no movement
19:42:22 <carnil> apart from that, we know from one other reporter that it is present as well in 6.12.41-1
19:42:26 <ukleinek> I think as long as the other two bugs in the set-of-three are active, we can just carry the one where the original reporter didn't reply any more forward with the others
19:43:53 <bwh> agreed
19:44:33 <ukleinek> the upstream situation with amdgpu is sad, so many bugs and people from AMD not acting :-\
19:45:13 <bwh> Seems like all their effort is going into artificial stupidity rather than graphics
19:45:31 <ukleinek> I think there isn't much that we can do here.
19:45:37 <carnil> didn't we have someone from AMD who is now a DD? maybe we can ask him directly to have a look at those bugs
19:45:43 <carnil> and connect the right persons
19:46:26 <ukleinek> carnil: I'm not aware, but if you know someone suitable ...
19:46:39 <carnil> superm1, was it
19:47:20 <ukleinek> would it help to maintain a usertag for all those amdgpu bugs to be able to pass a flexible link with all the bugs to them?
19:48:26 <bwh> https://bugs.debian.org/cgi-bin/pkgreport.cgi?include=subject%3Aamdgpu;src=linux
19:48:36 <ukleinek> ah, TIL
19:49:12 <ukleinek> someone takes the action?
19:49:18 <bwh> I think it's probably easier to make sure that amdgpu is in the title than to work with the (underdocumented) usertags
19:49:31 <carnil> I can approach Mario and ask if he can have a look
19:49:32 <ukleinek> ack
19:50:02 <ukleinek> #action carnil approaches superm1 about all the amdgpu bugs
19:50:10 <ukleinek> #topic #1111180
19:50:43 <ukleinek> asked for some tests -> moreinfo -> wait
19:50:46 <bwh> agreed
19:50:55 <ukleinek> #topic #1111362
19:51:32 <ukleinek> that one is new, two bugs about network performance drop from bookworm to trixie
19:52:26 <ukleinek> both reporters use a realtek nic
19:52:46 <bwh> Yes though there are many different chips handled by r8169 now
19:53:27 <bwh> These should probably be forwarded upstream unless it looks like they may have been fixed post-6.12
19:53:29 <ukleinek> rtl8168e-3.fw (so probably rtl8168e) vs. RTL8125
19:54:45 <ukleinek> My first guess would be: disable EEE on the phy
19:55:30 <ukleinek> -> `ethtool --set-eee eth0 eee off`
19:55:57 <bwh> Well, better check that it is enabled first
19:56:35 <ukleinek> that would be querying for the output of `sudo ethtool --show-eee eth0`
19:57:09 <carnil> there was this regression as well on the regressions list, https://lore.kernel.org/regressions/CAJmAMMyOk7AVqQRrtK4Oum2uVKreGeLJ943-kkRCTspoGApZ8w@mail.gmail.com/ but this seems to have been introduced post 6.12.y
19:57:38 <bwh> and it's wireless
19:57:39 <ukleinek> this is wireless though
19:58:19 * carnil goes again quiet
19:58:34 <ukleinek> #action ukleinek asks both reporters about eee state on phy
19:58:56 <ukleinek> #action ukleinek asks both reporters about eee state on phy (#1111362 + #1111016)
19:59:09 <carnil> ukleinek: both reporters are the same BTW
19:59:10 <ukleinek> #topic #1111455
19:59:29 <ukleinek> NFS regression between bookworm and trixie
19:59:44 <ukleinek> 6.16 is good again
19:59:48 <carnil> I forwarded this to https://lore.kernel.org/stable/aKMdIgkSWw9koCPC@eldamar.lan/ but so far no reply
20:00:38 <carnil> there were a series of issues back in 6.12.y and 6.13.y with netfs, it got refactored and some fixes were then 6.12.y and 6.13.y specific fixes (as mainline never had the bugs).
20:00:41 <ukleinek> so no further action for now
20:00:43 <ukleinek> ?
20:00:56 <carnil> I included Max Kellermann as he was as well involved in the other netfs fixes
20:01:29 <carnil> ukleinek: yes wait for upstream to comment/help
20:01:39 <ukleinek> we did our planned hour now, how is your availability and sleep depriv state?
20:03:11 <ukleinek> Maybe #1111011 is related?
20:03:20 <carnil> I can still participate a bit, not sure how much my contribution will be
20:03:59 <ukleinek> #1111011 might be related to the rtl performance drop, not the NFS bug
20:04:21 <ukleinek> #topic #1111531
20:05:45 <ukleinek> very noisy kernel log, didn't spot the problematic part when I looked into that bug earlier today
20:06:53 <bwh> I'm not seeing a kernel crash
20:08:07 <ukleinek> maybe the machine just hung and the reporter calls that crash?
20:08:09 * carnil fails to see something useful
20:08:36 <bwh> It might be a hang of the compositor. I hit that a few times in gnome-shell recently
20:09:03 <ukleinek> Then Ctrl-Alt-F4 should still work?
20:09:07 <bwh> no
20:09:44 <ukleinek> sysrq or log extraction via ssh?
20:10:02 <bwh> AFAIK once a program takes over a console, the kernel doesn't handle VT switching
20:10:32 <bwh> might work
20:10:38 <ukleinek> (or both)
20:10:58 <carnil> oh this is the same reporter as #1110256
20:11:11 <ukleinek> different kernel though
20:11:26 <ukleinek> but both 6.12.x
20:12:06 <yunseongkim[m]> bwh: Oh, I've had a similar experience. The terminal mode switch and magic key also didn’t work for me.
20:12:10 <carnil> yes but in the later one he commented that 6.12.41 resolves the issue (this is the one where we discussed to indicate to provide netconsole over ethernet interface, how to trigger sysrq+t (and enable it first))
20:12:40 <bwh> So we should ask to do the same thing again?
20:12:53 <ukleinek> so we ask if the two reports are the same issue and if "So far I've had no issues at all" also applies to #1111531?
20:13:23 <carnil> + if the issue is still triggered to do these debugging steps we proposed in the first bug
20:13:42 <carnil> ukleinek: you can assign it to me, will followup
20:13:51 <ukleinek> hmm, #1111531 was reported on 6.12.41 one day after they claimed no issues
20:14:23 <ukleinek> #action carnil cares for #1111531 (might be related to #1110256)
20:14:42 * ukleinek 's concentration drops, I suggest to end the meeting here
20:15:04 <bwh> I would like to briefly discuss when we plan to upload 6.16 to unstable
20:15:46 <ukleinek> no blocker from my side, it's great when the first wave of bugs comes in during my vacation, so please rush this :-)
20:15:51 <bwh> Heh
20:16:06 <bwh> Does anyone else see a blocker for this?
20:16:08 <carnil> bwh: I think we should go with 6.16.y to unstable, maybe with !1617 on top? but there is a ext4 regression in 6.16->6.16.1 which is unclear to me yet about the severity
20:16:40 <carnil> i.e. if we consider the ext4 problem a regression we might cherry-pick the fix (I think as of this morning not yet committed in mainline)
20:16:47 * carnil checks
20:17:09 <carnil> https://lore.kernel.org/all/3d7f77d2-b1f8-4d49-b36a-927a943efc2f@heusel.eu/#t
20:17:40 <bwh> That does seem worth cherry-picking if the fix is known
20:18:16 <bwh> I agree !1617 should be merged
20:18:27 <ukleinek> from the report the fix is not clear, reverting b9c561f3f29c2?
20:18:54 <ukleinek> reverts cleanly
20:19:07 <carnil> jupp, the fix is not clear yet, it is not a problem in 6.17-rc2 so specific to 6.16.y
20:19:20 <carnil> (might be some missing dependency)
20:19:47 <carnil> so if we want 6.16.2 to unstable we might better revert
20:20:11 <ukleinek> Might be interesting to test 5137d6c8906b55b3c7b5d1aa5a549753ec8520f5 (= upstream equivalent of the bad commit)
20:20:39 <bwh> OK, so I think we're agreed 6.16.x should go into unstable once that's fixed
20:20:44 <ukleinek> depending on if that is good or not it's fixed later or a missing dependency
20:21:10 <bwh> Is there a Debian bug report for this?
20:21:32 <carnil> bwh: not that I'm aware of
20:21:52 <bwh> OK, I'll open a bug report at severity serious
20:21:58 <carnil> ok!
20:22:01 <ukleinek> 👍
20:22:09 <bwh> just so it's harder to forget
20:22:31 <bwh> #action bwh will open a bug report for the ext4 regression in 6.16.1
20:22:42 <carnil> bwh: are you taking care as well of finishing the upload to unstable with a 6.16.y version? If so I will not interfere further; I have to finish the test build for 6.16.2 but will remove the draft status after that
20:22:48 <bwh> #agreed 6.16.x should go into unstable once the ext4 regression is resolved
20:23:05 <bwh> carnil: I will try to do that but am not committing to do so
20:23:14 <ukleinek> then we only need to determine who will chair next week
20:23:20 <carnil> bwh: ok in case we can coordinate and see
20:23:25 <bwh> I think it's my turn
20:23:36 <ukleinek> bwh: ack
20:23:43 <carnil> thank you :)
20:24:10 <ukleinek> #agreed bwh chairs next week
20:24:14 <ukleinek> #endmeeting