In the Linux kernel, the following vulnerability has been resolved:
net: ip_tunnel: prevent perpetual headroom growth
syzkaller triggered the following KASAN splat:
BUG: KASAN: use-after-free in __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170
Read of size 1 at addr ffff88812fb4000e by task syz-executor183/5191
[..]
 kasan_report+0xda/0x110 mm/kasan/report.c:588
 __skb_flow_dissect+0x19d1/0x7a50 net/core/flow_dissector.c:1170
 skb_flow_dissect_flow_keys include/linux/skbuff.h:1514 [inline]
 ___skb_get_hash net/core/flow_dissector.c:1791 [inline]
 __skb_get_hash+0xc7/0x540 net/core/flow_dissector.c:1856
 skb_get_hash include/linux/skbuff.h:1556 [inline]
 ip_tunnel_xmit+0x1855/0x33c0 net/ipv4/ip_tunnel.c:748
 ipip_tunnel_xmit+0x3cc/0x4e0 net/ipv4/ipip.c:308
 __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
 netdev_start_xmit include/linux/netdevice.h:4954 [inline]
 xmit_one net/core/dev.c:3548 [inline]
 dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564
 __dev_queue_xmit+0x7c1/0x3d60 net/core/dev.c:4349
 dev_queue_xmit include/linux/netdevice.h:3134 [inline]
 neigh_connected_output+0x42c/0x5d0 net/core/neighbour.c:1592
 ...
 ip_finish_output2+0x833/0x2550 net/ipv4/ip_output.c:235
 ip_finish_output+0x31/0x310 net/ipv4/ip_output.c:323
 ..
 iptunnel_xmit+0x5b4/0x9b0 net/ipv4/ip_tunnel_core.c:82
 ip_tunnel_xmit+0x1dbc/0x33c0 net/ipv4/ip_tunnel.c:831
 ipgre_xmit+0x4a1/0x980 net/ipv4/ip_gre.c:665
 __netdev_start_xmit include/linux/netdevice.h:4940 [inline]
 netdev_start_xmit include/linux/netdevice.h:4954 [inline]
 xmit_one net/core/dev.c:3548 [inline]
 dev_hard_start_xmit+0x13d/0x6d0 net/core/dev.c:3564
 ...
The splat occurs because skb->data points past the area allocated for skb->head. This is because the neigh layer does: __skb_pull(skb, skb_network_offset(skb));
... but skb_network_offset() returns a negative offset and the __skb_pull() argument is unsigned. In other words, skb->data gets "adjusted" by a huge value.
The negative value is returned because skb->head and skb->data distance is more than 64k and skb->network_header (u16) has wrapped around.
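The wraparound described above can be reproduced in user space. The following is a minimal sketch, not kernel code: struct toy_skb and the toy_* helpers are hypothetical stand-ins that keep only the fields involved, pointers are modelled as plain offsets from skb->head, and int / unsigned int are assumed to be 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff fields involved (not the kernel struct).
 * Pointers are modelled as byte offsets from skb->head. */
struct toy_skb {
	uint64_t data;           /* distance of skb->data from skb->head */
	uint16_t network_header; /* 16-bit offset from skb->head, as in the text */
};

/* Same idea as skb_network_offset(): network header position minus data. */
static int toy_network_offset(const struct toy_skb *skb)
{
	return (int)((int64_t)skb->network_header - (int64_t)skb->data);
}

/* Headroom expansion: every offset shifts by 'extra', but the
 * network header offset is truncated back to 16 bits. */
static void toy_expand_head(struct toy_skb *skb, uint64_t extra)
{
	skb->data += extra;
	skb->network_header = (uint16_t)(skb->network_header + extra);
}

/* Same idea as __skb_pull(): the length is unsigned, so a negative
 * int argument is converted to a huge positive value. */
static void toy_pull(struct toy_skb *skb, unsigned int len)
{
	skb->data += len;
}

/* Walks through the scenario and returns the final data offset. */
static uint64_t toy_wrap_demo(void)
{
	struct toy_skb skb = { .data = 100, .network_header = 120 };

	assert(toy_network_offset(&skb) == 20); /* sane before expansion */
	toy_expand_head(&skb, 70000);           /* head/data distance > 64k */
	assert(toy_network_offset(&skb) < 0);   /* u16 wrapped: offset is negative */
	toy_pull(&skb, (unsigned int)toy_network_offset(&skb));
	return skb.data;                        /* now gigabytes past the buffer */
}
```

In this toy run the offset changes from 20 to 20 - 65536 after the expansion, and the subsequent pull moves the data offset by roughly 4 GiB instead of 20 bytes.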
The bug is in the ip_tunnel infrastructure, which can cause dev->needed_headroom to increment ad infinitum.
The syzkaller reproducer consists of packets getting routed via a gre tunnel, with the route for the gre-encapsulated packets pointing at another (ipip) tunnel. The ipip encapsulation in turn finds gre0 as its next output device.
This results in the following pattern:
1). First packet is to be sent out via gre0. Route lookup found an output device, ipip0.
2). ip_tunnel_xmit for gre0 bumps gre0->needed_headroom based on the future output device, rt->dev->needed_headroom (ipip0).
3). ip output / start_xmit moves the skb on to ipip0, which runs the same code path again (xmit recursion).
4). Routing step for the post-gre0-encap packet finds gre0 as output device to use for ipip0 encapsulated packet.
tunl0->needed_headroom is then incremented based on the (already bumped) gre0 device headroom.
This repeats for every future packet:
gre0->needed_headroom gets inflated because the previous packet's ipip0 step incremented the rt->dev (gre0) headroom, and ipip0's headroom is incremented because gre0->needed_headroom was increased.
For each subsequent packet, gre/ipip0->needed_headroom grows until post-expand-head reallocations result in a skb->head/data distance of more than 64k.
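The feedback loop between the two devices can be sketched as a small user-space simulation. This is a toy model under stated assumptions, not the kernel code: struct toy_dev, toy_tunnel_xmit() and the fixed 20-byte per-hop overhead (the size of an IPv4 header without options) are illustrative, and the real headroom computation includes further terms.

```c
#include <assert.h>

/* Hypothetical stand-in for a tunnel device; only the field of interest. */
struct toy_dev {
	unsigned int needed_headroom;
};

/* Assumed per-hop overhead: one outer IPv4 header without options. */
#define TOY_ENCAP_OVERHEAD 20u

/* Uncapped update: size our own headroom from the output device's headroom. */
static void toy_tunnel_xmit(struct toy_dev *dev, const struct toy_dev *out)
{
	unsigned int max_headroom = out->needed_headroom + TOY_ENCAP_OVERHEAD;

	if (max_headroom > dev->needed_headroom)
		dev->needed_headroom = max_headroom;
}

/* Returns how many packets it takes until gre0's headroom exceeds 'limit'. */
static unsigned int toy_packets_until(unsigned int limit)
{
	struct toy_dev gre0 = { 0 }, ipip0 = { 0 };
	unsigned int packets = 0;

	while (gre0.needed_headroom <= limit) {
		toy_tunnel_xmit(&gre0, &ipip0); /* step 2: gre0 sized from ipip0 */
		toy_tunnel_xmit(&ipip0, &gre0); /* step 4: ipip0 sized from gre0 */
		/* each device now demands more than the other just had */
		assert(ipip0.needed_headroom > gre0.needed_headroom);
		packets++;
	}
	return packets;
}
```

With these toy numbers each packet adds 40 bytes to both devices, so toy_packets_until(65535) returns after 1639 packets. The point is only that nothing in the uncapped update ever lets the values settle.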
Once that happens, skb->network_header (u16) wraps around when pskb_expand_head tries to make sure that skb_network_offset() is unchanged after the headroom expansion/reallocation.
After this, skb_network_offset(skb) returns a different (and negative) result post headroom expansion.
The next trip to the neigh layer (or anything else that would __skb_pull the network header) makes skb->data point to a memory location outside the skb->head area.
v2: Cap the needed_headroom update to an arbitrarily chosen upper limit to prevent perpetual increase instead of dropping the headroom increment completely.
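The capping approach can be illustrated with the same kind of toy model. This is a sketch of the idea only, not the actual patch: struct capped_dev, toy_adj_headroom() and the limit of 512 are assumptions made for illustration, since the text above only says the upper limit is arbitrarily chosen.

```c
#include <assert.h>

/* Hypothetical stand-in for a tunnel device; only the field of interest. */
struct capped_dev {
	unsigned int needed_headroom;
};

/* Illustrative cap; the description does not state the value used by the fix. */
#define TOY_MAX_HEADROOM 512u

/* Clamp the requested headroom first, then only ever grow the device value. */
static void toy_adj_headroom(struct capped_dev *dev, unsigned int headroom)
{
	if (headroom > TOY_MAX_HEADROOM)
		headroom = TOY_MAX_HEADROOM;

	if (headroom > dev->needed_headroom)
		dev->needed_headroom = headroom;
}

/* Replays the gre0 <-> ipip0 loop with the cap; returns gre0's headroom. */
static unsigned int toy_capped_after(unsigned int packets)
{
	struct capped_dev gre0 = { 0 }, ipip0 = { 0 };
	unsigned int i;

	for (i = 0; i < packets; i++) {
		/* assumed 20-byte overhead per hop, as a toy value */
		toy_adj_headroom(&gre0, ipip0.needed_headroom + 20u);
		toy_adj_headroom(&ipip0, gre0.needed_headroom + 20u);
		assert(gre0.needed_headroom <= TOY_MAX_HEADROOM);
		assert(ipip0.needed_headroom <= TOY_MAX_HEADROOM);
	}
	return gre0.needed_headroom;
}
```

In this model the headroom still grows for the first few packets, so ordinary stacked tunnels keep getting a useful hint, but it stops at the cap no matter how many packets loop through the two devices.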