In the Linux kernel, the following vulnerability has been resolved:
md: fix resync softlockup when bitmap size is less than array size
Is is reported that for dm-raid10, lvextend + lvchange --syncaction will trigger following softlockup:
kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [mdXresync:6976] CPU: 7 PID: 3588 Comm: mdXresync Kdump: loaded Not tainted 6.9.0-rc4-next-20240419 #1 RIP: 0010:rawspinunlockirq+0x13/0x30 Call Trace: <TASK> mdbitmapstartsync+0x6b/0xf0 raid10syncrequest+0x25c/0x1b40 [raid10] mddosync+0x64b/0x1020 mdthread+0xa7/0x170 kthread+0xcf/0x100 retfromfork+0x30/0x50 retfromfork_asm+0x1a/0x30
And the detailed process is as follows:
mddosync j = mddev->resyncmin while (j < maxsectors) sectors = raid10syncrequest(mddev, j, &skipped) if (!mdbitmapstartsync(..., &syncblocks)) // mdbitmapstartsync set syncblocks to 0 return syncblocks + sectorsskippe; // sectors = 0; j += sectors; // j never change
Root cause is that commit 301867b1c168 ("md/raid10: check slab-out-of-bounds in mdbitmapgetcounter") return early from mdbitmapgetcounter(), without setting returned blocks.
Fix this problem by always set returned blocks from mdbitmapget_counter"(), as it used to be.
Noted that this patch just fix the softlockup problem in kernel, the case that bitmap size doesn't match array size still need to be fixed.