In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix overrunning reservations in ringbuf
The BPF ring buffer internally is implemented as a power-of-2 sized circular buffer, with two logical and ever-increasing counters: consumerpos is the consumer counter to show which logical position the consumer consumed the data, and producerpos which is the producer counter denoting the amount of data reserved by all producers.
Each time a record is reserved, the producer that "owns" the record will successfully advance producer counter. In user space each time a record is read, the consumer of the data advanced the consumer counter once it finished processing. Both counters are stored in separate pages so that from user space, the producer counter is read-only and the consumer counter is read-write.
One aspect that simplifies and thus speeds up the implementation of both producers and consumers is how the data area is mapped twice contiguously back-to-back in the virtual memory, allowing to not take any special measures for samples that have to wrap around at the end of the circular buffer data area, because the next page after the last data page would be first data page again, and thus the sample will still appear completely contiguous in virtual memory.
Each record has a struct bpfringbufhdr { u32 len; u32 pgoff; } header for
book-keeping the length and offset, and is inaccessible to the BPF program.
Helpers like bpfringbuf_reserve() return (void *)hdr + BPF_RINGBUF_HDR_SZ
for the BPF program to use. Bing-Jhong and Muhammad reported that it is however
possible to make a second allocated memory chunk overlapping with the first
chunk and as a result, the BPF program is now able to edit first chunk's
header.
For example, consider the creation of a BPFMAPTYPERINGBUF map with size
of 0x4000. Next, the consumerpos is modified to 0x3000 /before/ a call to
bpfringbufreserve() is made. This will allocate a chunk A, which is in
[0x0,0x3008], and the BPF program is able to edit [0x8,0x3008]. Now, lets
allocate a chunk B with size 0x3000. This will succeed because consumerpos
was edited ahead of time to pass the new_prod_pos - cons_pos > rb->mask
check. Chunk B will be in range [0x3008,0x6010], and the BPF program is able
to edit [0x3010,0x6010]. Due to the ring buffer memory layout mentioned
earlier, the ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data
pages. This means that chunk B at [0x4000,0x4008] is chunk A's header.
bpfringbufsubmit() / bpfringbufdiscard() use the header's pgoff to then
locate the bpfringbuf itself via bpfringbufrestorefromrec(). Once chunk
B modified chunk A's header, then bpfringbuf_commit() refers to the wrong
page and could cause a crash.
Fix it by calculating the oldest pendingpos and check whether the range from the oldest outstanding record to the newest would span beyond the ring buffer size. If that is the case, then reject the request. We've tested with the ring buffer benchmark in BPF selftests (./benchs/runbench_ringbufs.sh) before/after the fix and while it seems a bit slower on some benchmarks, it is still not significantly enough to matter.