In the Linux kernel, the following vulnerability has been resolved:
blk-mq: fix IO hang from sbitmap wakeup race
In blkmqmarktagwait(), _addwaitqueue() may be re-ordered with the following blkmqgetdriver_tag() in case of getting driver tag failure.
Then in _sbitmapqueuewakeup(), waitqueueactive() may not observe the added waiter in blkmqmarktagwait() and wake up nothing, meantime blkmqmarktag_wait() can't get driver tag successfully.
This issue can be reproduced by running the following test in loop, and fio hang can be observed in < 30min when running it on my test VM in laptop.
modprobe -r scsi_debug
modprobe scsi_debug delay=0 dev_size_mb=4096 max_queue=1 host_max_queue=1 submit_queues=4
dev=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename`
fio --filename=/dev/"$dev" --direct=1 --rw=randrw --bs=4k --iodepth=1 \
--runtime=100 --numjobs=40 --time_based --name=test \
--ioengine=libaio
Fix the issue by adding one explicit barrier in blkmqmarktagwait(), which is just fine in case of running out of tag.