In the Linux kernel, the following vulnerability has been resolved:
xdp: fix invalid wait context of page_pool_destroy()
If the driver uses a page pool, it creates one with page_pool_create(). The reference count of a page pool is 1 by default, and the pool is destroyed only when the reference count reaches 0. page_pool_destroy() is used to destroy a page pool; it decrements the reference count. When a page pool is destroyed, ->disconnect() is called, which is mem_allocator_disconnect(). This function internally acquires a mutex with mutex_lock().
If the driver uses XDP, it registers a memory model with xdp_rxq_info_reg_mem_model(). xdp_rxq_info_reg_mem_model() internally increments the page pool reference count if the memory model is a page pool. The reference count is now 2.
To destroy a page pool, the driver should call both page_pool_destroy() and xdp_unreg_mem_model(). xdp_unreg_mem_model() internally calls page_pool_destroy(). Only page_pool_destroy() decrements the reference count.
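The reference counting described above can be sketched in user-space C. This is only an illustrative model, not kernel code: struct toy_pool and every toy_* function are hypothetical names that stand in for the page pool and for the calls named in the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a page pool; not the kernel struct. */
struct toy_pool {
    int user_cnt;       /* reference count, 1 right after creation */
    bool disconnected;  /* set once the ->disconnect() step has run */
};

/* Models page_pool_create(): the pool starts with one reference. */
static void toy_pool_create(struct toy_pool *p)
{
    p->user_cnt = 1;
    p->disconnected = false;
}

/* Models the extra reference taken when a page pool memory model is registered. */
static void toy_reg_mem_model(struct toy_pool *p)
{
    p->user_cnt++;
}

/* Models page_pool_destroy(): drop one reference, tear down on the last one. */
static void toy_pool_destroy(struct toy_pool *p)
{
    assert(p->user_cnt > 0);
    if (--p->user_cnt > 0)
        return;
    /* In the kernel, this is the step that takes a mutex. */
    p->disconnected = true;
}

/* Models xdp_unreg_mem_model(): it only forwards to the destroy path. */
static void toy_unreg_mem_model(struct toy_pool *p)
{
    toy_pool_destroy(p);
}
```

In this model, whichever of the two teardown calls runs second drops the last reference, so it is the one that reaches the disconnect step.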
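If a driver calls page_pool_destroy() and then xdp_unreg_mem_model(), an invalid wait context warning is triggered, because xdp_unreg_mem_model() calls page_pool_destroy() while holding rcu_read_lock(), and page_pool_destroy() internally acquires a mutex with mutex_lock(). A mutex is a sleeping lock and must not be acquired inside an RCU read-side critical section.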
[ BUG: Invalid wait context ]
ethtool/1806 is trying to lock:
ffffffff90387b90 (mem_id_lock){+.+.}-{4:4}, at: mem_allocator_disconnect+0x73/0x150
other info that might help us debug this:
context-{5:5}
3 locks held by ethtool/1806:
stack backtrace:
CPU: 0 PID: 1806 Comm: ethtool Tainted: G W 6.10.0-rc6+ #4 f916f41f172891c800f2fed
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
Call Trace:
<TASK>
dump_stack_lvl+0x7e/0xc0
__lock_acquire+0x1681/0x4de0
? _printk+0x64/0xe0
? __pfx_mark_lock.part.0+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
lock_acquire+0x1b3/0x580
? mem_allocator_disconnect+0x73/0x150
? __wake_up_klogd.part.0+0x16/0xc0
? __pfx_lock_acquire+0x10/0x10
? dump_stack_lvl+0x91/0xc0
__mutex_lock+0x15c/0x1690
? mem_allocator_disconnect+0x73/0x150
? __pfx_prb_read_valid+0x10/0x10
? mem_allocator_disconnect+0x73/0x150
? __pfx_llist_add_batch+0x10/0x10
? console_unlock+0x193/0x1b0
? lockdep_hardirqs_on+0xbe/0x140
? __pfx___mutex_lock+0x10/0x10
? tick_nohz_tick_stopped+0x16/0x90
? __irq_work_queue_local+0x1e5/0x330
? irq_work_queue+0x39/0x50
? __wake_up_klogd.part.0+0x79/0xc0
? mem_allocator_disconnect+0x73/0x150
mem_allocator_disconnect+0x73/0x150
? __pfx_mem_allocator_disconnect+0x10/0x10
? mark_held_locks+0xa5/0xf0
? rcu_is_watching+0x11/0xb0
page_pool_release+0x36e/0x6d0
page_pool_destroy+0xd7/0x440
xdp_unreg_mem_model+0x1a7/0x2a0
? __pfx_xdp_unreg_mem_model+0x10/0x10
? kfree+0x125/0x370
? bnxt_free_ring.isra.0+0x2eb/0x500
? bnxt_free_mem+0x5ac/0x2500
xdp_rxq_info_unreg+0x4a/0xd0
bnxt_free_mem+0x1356/0x2500
bnxt_close_nic+0xf0/0x3b0
? __pfx_bnxt_close_nic+0x10/0x10
? ethnl_parse_bit+0x2c6/0x6d0
? __pfx___nla_validate_parse+0x10/0x10
? __pfx_ethnl_parse_bit+0x10/0x10
bnxt_set_features+0x2a8/0x3e0
__netdev_update_features+0x4dc/0x1370
? ethnl_parse_bitset+0x4ff/0x750
? __pfx_ethnl_parse_bitset+0x10/0x10
? __pfx___netdev_update_features+0x10/0x10
? mark_held_locks+0xa5/0xf0
? _raw_spin_unlock_irqrestore+0x42/0x70
? __pm_runtime_resume+0x7d/0x110
ethnl_set_features+0x32d/0xa20
To fix this problem, xdp_unreg_mem_model() now uses rhashtable_lookup_fast() instead of rhashtable_lookup() with rcu_read_lock(). Using xa without rcu_read_lock() here is safe. xa is freed by __xdp_mem_allocator_rcu_free(), which is invoked through call_rcu() in mem_xa_remove(). mem_xa_remove() is called by page_pool_destroy() when the reference count reaches 0. In the control path, xa is therefore already protected by the reference count mechanism, so removing rcu_read_lock() around page_pool_destroy() is safe.
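The difference between the old and new lookup patterns can be illustrated with a small user-space model. It is a sketch under stated assumptions, not the kernel change itself: all toy_* names are hypothetical, the RCU read-side section is modelled as a plain nesting counter, and the mutex is modelled as a check that records whether it was taken while that counter was non-zero. The only kernel fact relied on is that rhashtable_lookup_fast() enters and leaves its own read-side section around the lookup.

```c
#include <assert.h>
#include <stdbool.h>

static int toy_rcu_depth;          /* toy read-side nesting counter */
static bool toy_invalid_wait_seen; /* set if a sleeping lock is taken under "RCU" */

static void toy_rcu_read_lock(void)
{
    toy_rcu_depth++;
}

static void toy_rcu_read_unlock(void)
{
    assert(toy_rcu_depth > 0);
    toy_rcu_depth--;
}

/* Models taking a mutex: it may sleep, so it is invalid in a read-side section. */
static void toy_mutex_lock(void)
{
    if (toy_rcu_depth > 0)
        toy_invalid_wait_seen = true;
}

/* Models the final page_pool_destroy() call, which reaches the mutex. */
static void toy_destroy(void)
{
    toy_mutex_lock();
}

/* Old pattern: lookup and destroy share one read-side section. */
static bool toy_unreg_old(void)
{
    toy_invalid_wait_seen = false;
    toy_rcu_read_lock();
    /* the rhashtable_lookup() step would run here */
    toy_destroy();
    toy_rcu_read_unlock();
    return toy_invalid_wait_seen;
}

/* Models rhashtable_lookup_fast(): the read-side section ends before it returns. */
static void toy_lookup_fast(void)
{
    toy_rcu_read_lock();
    /* the lookup itself would run here */
    toy_rcu_read_unlock();
}

/* New pattern: destroy runs after the lookup has left its read-side section. */
static bool toy_unreg_new(void)
{
    toy_invalid_wait_seen = false;
    toy_lookup_fast();
    toy_destroy();
    return toy_invalid_wait_seen;
}
```

In this model toy_unreg_old() reports the invalid context and toy_unreg_new() does not. The model does not cover object lifetime; in the real code that part rests on the reference count argument given above.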