In the Linux kernel, the following vulnerability has been resolved:
mlxsw: thermal: Fix out-of-bounds memory accesses
Currently, mlxsw allows cooling states to be set above the maximum cooling state supported by the driver:
 # cat /sys/class/thermal/thermal_zone2/cdev0/type
 mlxsw_fan
 # cat /sys/class/thermal/thermal_zone2/cdev0/max_state
 10
 # echo 18 > /sys/class/thermal/thermal_zone2/cdev0/cur_state
 # echo $?
 0
This results in out-of-bounds memory accesses when thermal state transition statistics are enabled (CONFIG_THERMAL_STATISTICS=y), as the transition table is accessed with an index (state) that is too large [1].
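To make the indexing problem concrete, here is a minimal sketch. It assumes the transition table is a flat, row-major array with max_state + 1 rows and columns; that layout and both helper names are assumptions for illustration, not taken from the text above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: flat index of the (from, to) cell in a
 * row-major table with max_states rows and max_states columns. */
static size_t trans_index(unsigned long from, unsigned long to,
			  unsigned long max_states)
{
	return from * max_states + to;
}

/* True only when both coordinates name a real cell. Checking the flat
 * index alone is not enough: an oversized "to" can alias a valid cell
 * in a later row instead of leaving the array. */
static bool trans_index_in_bounds(unsigned long from, unsigned long to,
				  unsigned long max_states)
{
	return from < max_states && to < max_states;
}
```

Under these assumptions, max_state 10 gives 11 states and 121 cells (indices 0 to 120). A transition from state 10 to state 18 maps to index 128, past the end of the table, while a transition from state 0 to state 18 maps to index 18, which silently lands in the wrong cell.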
According to the thermal maintainer, it is the responsibility of the driver to reject such operations [2].
Therefore, return an error when the state to be set exceeds the maximum cooling state supported by the driver.
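The check described above can be sketched as follows. This is a standalone illustration, not the driver code: the structure, the function name, the constant and the choice of -EINVAL as the error value are all assumptions.

```c
#include <errno.h>

/* Assumed maximum cooling state, matching the max_state of 10 shown in
 * the sysfs example. */
#define SKETCH_MAX_STATE 10UL

/* Hypothetical stand-in for the per-device cooling state. */
struct sketch_fan {
	unsigned long cur_state;
};

/* Reject any state above the supported maximum before it can be
 * recorded or used as a table index. */
static int sketch_set_cur_state(struct sketch_fan *fan, unsigned long state)
{
	if (state > SKETCH_MAX_STATE)
		return -EINVAL;	/* assumed error code */

	fan->cur_state = state;
	return 0;
}
```

With a check of this kind, a write of 18 would fail and leave the current state untouched, while a write of 10 would still succeed.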
To avoid dead code, as suggested by the thermal maintainer [3], partially revert commit a421ce088ac8 ("mlxsw: core: Extend cooling device with cooling levels") that tried to interpret these invalid cooling states (above the maximum) in a special way. The cooling levels array is not removed in order to prevent the fans going below 20% PWM, which would cause them to get stuck at 0% PWM.
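The role of the retained cooling levels array can be pictured as a clamp on low states. The sketch below assumes ten equal steps, so that level 2 corresponds to 20% PWM; the names, the step count and the mapping itself are hypothetical and only illustrate the idea of a minimum level.

```c
/* Assumed scale: 10 levels, each worth 10% PWM. */
#define SKETCH_LEVEL_MAX 10U
/* Assumed floor: level 2, i.e. 20% PWM. */
#define SKETCH_LEVEL_MIN 2U

/* Map a valid cooling state to a level that never drops below the
 * floor, so that states 0 and 1 still keep the fans turning. */
static unsigned int sketch_level_for_state(unsigned int state)
{
	return state < SKETCH_LEVEL_MIN ? SKETCH_LEVEL_MIN : state;
}

/* Convert a level to a PWM percentage on the assumed scale. */
static unsigned int sketch_level_to_percent(unsigned int level)
{
	return level * 100U / SKETCH_LEVEL_MAX;
}
```

In this sketch, state 0 maps to level 2 and therefore 20% PWM, and states from 2 to 10 map to themselves.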
[1]
BUG: KASAN: slab-out-of-bounds in thermal_cooling_device_stats_update+0x271/0x290
Read of size 4 at addr ffff8881052f7bf8 by task kworker/0:0/5

CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.15.0-rc3-custom-45935-gce1adf704b14 #122
Hardware name: Mellanox Technologies Ltd. "MSN2410-CB2FO"/"SA000874", BIOS 4.6.5 03/08/2016
Workqueue: events_freezable_power_ thermal_zone_device_check
Call Trace:
 dump_stack_lvl+0x8b/0xb3
 print_address_description.constprop.0+0x1f/0x140
 kasan_report.cold+0x7f/0x11b
 thermal_cooling_device_stats_update+0x271/0x290
 __thermal_cdev_update+0x15e/0x4e0
 thermal_cdev_update+0x9f/0xe0
 step_wise_throttle+0x770/0xee0
 thermal_zone_device_update+0x3f6/0xdf0
 process_one_work+0xa42/0x1770
 worker_thread+0x62f/0x13e0
 kthread+0x3ee/0x4e0
 ret_from_fork+0x1f/0x30

Allocated by task 1:
 kasan_save_stack+0x1b/0x40
 __kasan_kmalloc+0x7c/0x90
 thermal_cooling_device_setup_sysfs+0x153/0x2c0
 __thermal_cooling_device_register.part.0+0x25b/0x9c0
 thermal_cooling_device_register+0xb3/0x100
 mlxsw_thermal_init+0x5c5/0x7e0
 __mlxsw_core_bus_device_register+0xcb3/0x19c0
 mlxsw_core_bus_device_register+0x56/0xb0
 mlxsw_pci_probe+0x54f/0x710
 local_pci_probe+0xc6/0x170
 pci_device_probe+0x2b2/0x4d0
 really_probe+0x293/0xd10
 __driver_probe_device+0x2af/0x440
 driver_probe_device+0x51/0x1e0
 __driver_attach+0x21b/0x530
 bus_for_each_dev+0x14c/0x1d0
 bus_add_driver+0x3ac/0x650
 driver_register+0x241/0x3d0
 mlxsw_sp_module_init+0xa2/0x174
 do_one_initcall+0xee/0x5f0
 kernel_init_freeable+0x45a/0x4de
 kernel_init+0x1f/0x210
 ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff8881052f7800
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 1016 bytes inside of
 1024-byte region [ffff8881052f7800, ffff8881052f7c00)
The buggy address belongs to the page:
page:0000000052355272 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1052f0
head:0000000052355272 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x200000000010200(slab|head|node=0|zone=2)
raw: 0200000000010200 ffffea0005034800 0000000300000003 ffff888100041dc0
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8881052f7a80: 00 00 00 00 00 00 04 fc fc fc fc fc fc fc fc fc
 ffff8881052f7b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8881052f7b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                                ^
 ffff8881052f7c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8881052f7c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[2] https://lore.kernel.org/linux-pm/9aca37cb-1629-5c67- ---truncated---