| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: conntrack: add missing netlink policy validations
Hyunwoo Kim reports out-of-bounds access in sctp and ctnetlink.
These attributes are used by the kernel without any validation.
Extend the netlink policies accordingly.
Quoting the reporter:
nlattr_to_sctp() assigns the user-supplied CTA_PROTOINFO_SCTP_STATE
value directly to ct->proto.sctp.state without checking that it is
within the valid range. [..]
and: ... with exp->dir = 100, the access at
ct->master->tuplehash[100] reads 5600 bytes past the start of a
320-byte nf_conn object, causing a slab-out-of-bounds read confirmed by
UBSAN. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/imagination: Synchronize interrupts before suspending the GPU
The runtime PM suspend callback doesn't know whether the IRQ handler is
in progress on a different CPU core and doesn't wait for it to finish.
Depending on timing, the IRQ handler could be running while the GPU is
suspended, leading to kernel crashes when trying to access GPU
registers. See example signature below.
In a power off sequence initiated by the runtime PM suspend callback,
wait for any IRQ handlers in progress on other CPU cores to finish, by
calling synchronize_irq().
At the same time, remove the runtime PM resume/put calls in the threaded
IRQ handler. On top of not being the right approach to begin with, and
being at the wrong place as they should have wrapped all GPU register
accesses, the driver would hit a deadlock between synchronize_irq()
being called from a runtime PM suspend callback, holding the device
power lock, and the resume callback requiring the same.
Example crash signature on a TI AM68 SK platform:
[ 337.241218] SError Interrupt on CPU0, code 0x00000000bf000000 -- SError
[ 337.241239] CPU: 0 UID: 0 PID: 112 Comm: irq/234-gpu Tainted: G M 6.17.7-B2C-00005-g9c7bbe4ea16c #2 PREEMPT
[ 337.241246] Tainted: [M]=MACHINE_CHECK
[ 337.241249] Hardware name: Texas Instruments AM68 SK (DT)
[ 337.241252] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 337.241256] pc : pvr_riscv_irq_pending+0xc/0x24
[ 337.241277] lr : pvr_device_irq_thread_handler+0x64/0x310
[ 337.241282] sp : ffff800085b0bd30
[ 337.241284] x29: ffff800085b0bd50 x28: ffff0008070d9eab x27: ffff800083a5ce10
[ 337.241291] x26: ffff000806e48f80 x25: ffff0008070d9eac x24: 0000000000000000
[ 337.241296] x23: ffff0008068e9bf0 x22: ffff0008068e9bd0 x21: ffff800085b0bd30
[ 337.241301] x20: ffff0008070d9e00 x19: ffff0008068e9000 x18: 0000000000000001
[ 337.241305] x17: 637365645f656c70 x16: 0000000000000000 x15: ffff000b7df9ff40
[ 337.241310] x14: 0000a585fe3c0d0e x13: 000000999704f060 x12: 000000000002771a
[ 337.241314] x11: 00000000000000c0 x10: 0000000000000af0 x9 : ffff800085b0bd00
[ 337.241318] x8 : ffff0008071175d0 x7 : 000000000000b955 x6 : 0000000000000003
[ 337.241323] x5 : 0000000000000000 x4 : 0000000000000002 x3 : 0000000000000000
[ 337.241327] x2 : ffff800080e39d20 x1 : ffff800080e3fc48 x0 : 0000000000000000
[ 337.241333] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 337.241337] CPU: 0 UID: 0 PID: 112 Comm: irq/234-gpu Tainted: G M 6.17.7-B2C-00005-g9c7bbe4ea16c #2 PREEMPT
[ 337.241342] Tainted: [M]=MACHINE_CHECK
[ 337.241343] Hardware name: Texas Instruments AM68 SK (DT)
[ 337.241345] Call trace:
[ 337.241348] show_stack+0x18/0x24 (C)
[ 337.241357] dump_stack_lvl+0x60/0x80
[ 337.241364] dump_stack+0x18/0x24
[ 337.241368] vpanic+0x124/0x2ec
[ 337.241373] abort+0x0/0x4
[ 337.241377] add_taint+0x0/0xbc
[ 337.241384] arm64_serror_panic+0x70/0x80
[ 337.241389] do_serror+0x3c/0x74
[ 337.241392] el1h_64_error_handler+0x30/0x48
[ 337.241400] el1h_64_error+0x6c/0x70
[ 337.241404] pvr_riscv_irq_pending+0xc/0x24 (P)
[ 337.241410] irq_thread_fn+0x2c/0xb0
[ 337.241416] irq_thread+0x170/0x334
[ 337.241421] kthread+0x12c/0x210
[ 337.241428] ret_from_fork+0x10/0x20
[ 337.241434] SMP: stopping secondary CPUs
[ 337.241451] Kernel Offset: disabled
[ 337.241453] CPU features: 0x040000,02002800,20002001,0400421b
[ 337.241456] Memory Limit: none
[ 337.457921] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]--- |
| In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Limit BO list entry count to prevent resource exhaustion
Userspace can pass an arbitrary number of BO list entries via the
bo_number field. Although the previous multiplication overflow check
prevents out-of-bounds allocation, a large number of entries could still
cause excessive memory allocation (up to potentially gigabytes) and
unnecessarily long list processing times.
Introduce a hard limit of 128k entries per BO list, which is more than
sufficient for any realistic use case (e.g., a single list containing all
buffers in a large scene). This prevents memory exhaustion attacks and
ensures predictable performance.
Return -EINVAL if the requested entry count exceeds the limit
(cherry picked from commit 688b87d39e0aa8135105b40dc167d74b5ada5332) |
| In the Linux kernel, the following vulnerability has been resolved:
wifi: mac80211: always free skb on ieee80211_tx_prepare_skb() failure
ieee80211_tx_prepare_skb() has three error paths, but only two of them
free the skb. The first error path (ieee80211_tx_prepare() returning
TX_DROP) does not free it, while invoke_tx_handlers() failure and the
fragmentation check both do.
Add kfree_skb() to the first error path so all three are consistent,
and remove the now-redundant frees in callers (ath9k, mt76,
mac80211_hwsim) to avoid double-free.
Document the skb ownership guarantee in the function's kdoc. |
| In the Linux kernel, the following vulnerability has been resolved:
ipv6: add NULL checks for idev in SRv6 paths
__in6_dev_get() can return NULL when the device has no IPv6 configuration
(e.g. MTU < IPV6_MIN_MTU or after NETDEV_UNREGISTER).
Add NULL checks for idev returned by __in6_dev_get() in both
seg6_hmac_validate_skb() and ipv6_srh_rcv() to prevent potential NULL
pointer dereferences. |
| In the Linux kernel, the following vulnerability has been resolved:
nf_tables: nft_dynset: fix possible stateful expression memleak in error path
If cloning the second stateful expression in the element via GFP_ATOMIC
fails, then the first stateful expression remains in place without being
released.
unreferenced object (percpu) 0x607b97e9cab8 (size 16):
comm "softirq", pid 0, jiffies 4294931867
hex dump (first 16 bytes on cpu 3):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
backtrace (crc 0):
pcpu_alloc_noprof+0x453/0xd80
nft_counter_clone+0x9c/0x190 [nf_tables]
nft_expr_clone+0x8f/0x1b0 [nf_tables]
nft_dynset_new+0x2cb/0x5f0 [nf_tables]
nft_rhash_update+0x236/0x11c0 [nf_tables]
nft_dynset_eval+0x11f/0x670 [nf_tables]
nft_do_chain+0x253/0x1700 [nf_tables]
nft_do_chain_ipv4+0x18d/0x270 [nf_tables]
nf_hook_slow+0xaa/0x1e0
ip_local_deliver+0x209/0x330 |
| In the Linux kernel, the following vulnerability has been resolved:
af_unix: Give up GC if MSG_PEEK intervened.
Igor Ushakov reported that GC purged the receive queue of
an alive socket due to a race with MSG_PEEK with a nice repro.
This is the exact same issue previously fixed by commit
cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK").
After GC was replaced with the current algorithm, the cited
commit removed the locking dance in unix_peek_fds() and
reintroduced the same issue.
The problem is that MSG_PEEK bumps a file refcount without
interacting with GC.
Consider an SCC containing sk-A and sk-B, where sk-A is
close()d but can be recv()ed via sk-B.
The bad thing happens if sk-A is recv()ed with MSG_PEEK from
sk-B and sk-B is close()d while GC is checking unix_vertex_dead()
for sk-A and sk-B.
GC thread User thread
--------- -----------
unix_vertex_dead(sk-A)
-> true <------.
\
`------ recv(sk-B, MSG_PEEK)
invalidate !! -> sk-A's file refcount : 1 -> 2
close(sk-B)
-> sk-B's file refcount : 2 -> 1
unix_vertex_dead(sk-B)
-> true
Initially, sk-A's file refcount is 1 by the inflight fd in sk-B
recvq. GC thinks sk-A is dead because the file refcount is the
same as the number of its inflight fds.
However, sk-A's file refcount is bumped silently by MSG_PEEK,
which invalidates the previous evaluation.
At this moment, sk-B's file refcount is 2; one by the open fd,
and one by the inflight fd in sk-A. The subsequent close()
releases one refcount by the former.
Finally, GC incorrectly concludes that both sk-A and sk-B are dead.
One option is to restore the locking dance in unix_peek_fds(),
but we can resolve this more elegantly thanks to the new algorithm.
The point is that the issue does not occur without the subsequent
close() and we actually do not need to synchronise MSG_PEEK with
the dead SCC detection.
When the issue occurs, close() and GC touch the same file refcount.
If GC sees the refcount being decremented by close(), it can just
give up garbage-collecting the SCC.
Therefore, we only need to signal the race during MSG_PEEK with
a proper memory barrier to make it visible to the GC.
Let's use seqcount_t to notify GC when MSG_PEEK occurs and let
it defer the SCC to the next run.
This way no locking is needed on the MSG_PEEK side, and we can
avoid imposing a penalty on every MSG_PEEK unnecessarily.
Note that we can retry within unix_scc_dead() if MSG_PEEK is
detected, but we do not do so to avoid hung task splat from
abusive MSG_PEEK calls. |
| In the Linux kernel, the following vulnerability has been resolved:
ice: Fix memory leak in ice_set_ringparam()
In ice_set_ringparam, tx_rings and xdp_rings are allocated before
rx_rings. If the allocation of rx_rings fails, the code jumps to
the done label leaking both tx_rings and xdp_rings. Furthermore, if
the setup of an individual Rx ring fails during the loop, the code jumps
to the free_tx label which releases tx_rings but leaks xdp_rings.
Fix this by introducing a free_xdp label and updating the error paths to
ensure both xdp_rings and tx_rings are properly freed if rx_rings
allocation or setup fails.
Compile tested only. Issue found using a prototype static analysis tool
and code review. |
| In the Linux kernel, the following vulnerability has been resolved:
sched/deadline: Fix missing ENQUEUE_REPLENISH during PI de-boosting
Running stress-ng --schedpolicy 0 on an RT kernel on a big machine
might lead to the following WARNINGs (edited).
sched: DL de-boosted task PID 22725: REPLENISH flag missing
WARNING: CPU: 93 PID: 0 at kernel/sched/deadline.c:239 dequeue_task_dl+0x15c/0x1f8
... (running_bw underflow)
Call trace:
dequeue_task_dl+0x15c/0x1f8 (P)
dequeue_task+0x80/0x168
deactivate_task+0x24/0x50
push_dl_task+0x264/0x2e0
dl_task_timer+0x1b0/0x228
__hrtimer_run_queues+0x188/0x378
hrtimer_interrupt+0xfc/0x260
...
The problem is that when a SCHED_DEADLINE task (lock holder) is
changed to a lower priority class via sched_setscheduler(), it may
fail to properly inherit the parameters of potential DEADLINE donors
if it didn't already inherit them in the past (shorter deadline than
donor's at that time). This might lead to bandwidth accounting
corruption, as enqueue_task_dl() won't recognize the lock holder as
boosted.
The scenario occurs when:
1. A DEADLINE task (donor) blocks on a PI mutex held by another
DEADLINE task (holder), but the holder doesn't inherit parameters
(e.g., it already has a shorter deadline)
2. sched_setscheduler() changes the holder from DEADLINE to a lower
class while still holding the mutex
3. The holder should now inherit DEADLINE parameters from the donor
and be enqueued with ENQUEUE_REPLENISH, but this doesn't happen
Fix the issue by introducing __setscheduler_dl_pi(), which detects when
a DEADLINE (proper or boosted) task gets setscheduled to a lower
priority class. In case, the function makes the task inherit DEADLINE
parameters of the donoer (pi_se) and sets ENQUEUE_REPLENISH flag to
ensure proper bandwidth accounting during the next enqueue operation. |
| In the Linux kernel, the following vulnerability has been resolved:
cxl/mbox: validate payload size before accessing contents in cxl_payload_from_user_allowed()
cxl_payload_from_user_allowed() casts and dereferences the input
payload without first verifying its size. When a raw mailbox command
is sent with an undersized payload (ie: 1 byte for CXL_MBOX_OP_CLEAR_LOG,
which expects a 16-byte UUID), uuid_equal() reads past the allocated buffer,
triggering a KASAN splat:
BUG: KASAN: slab-out-of-bounds in memcmp+0x176/0x1d0 lib/string.c:683
Read of size 8 at addr ffff88810130f5c0 by task syz.1.62/2258
CPU: 2 UID: 0 PID: 2258 Comm: syz.1.62 Not tainted 6.19.0-dirty #3 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0xab/0xe0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xce/0x650 mm/kasan/report.c:482
kasan_report+0xce/0x100 mm/kasan/report.c:595
memcmp+0x176/0x1d0 lib/string.c:683
uuid_equal include/linux/uuid.h:73 [inline]
cxl_payload_from_user_allowed drivers/cxl/core/mbox.c:345 [inline]
cxl_mbox_cmd_ctor drivers/cxl/core/mbox.c:368 [inline]
cxl_validate_cmd_from_user drivers/cxl/core/mbox.c:522 [inline]
cxl_send_cmd+0x9c0/0xb50 drivers/cxl/core/mbox.c:643
__cxl_memdev_ioctl drivers/cxl/core/memdev.c:698 [inline]
cxl_memdev_ioctl+0x14f/0x190 drivers/cxl/core/memdev.c:713
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl fs/ioctl.c:583 [inline]
__x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xa8/0x330 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fdaf331ba79
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fdaf1d77038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fdaf3585fa0 RCX: 00007fdaf331ba79
RDX: 00002000000001c0 RSI: 00000000c030ce02 RDI: 0000000000000003
RBP: 00007fdaf33749df R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fdaf3586038 R14: 00007fdaf3585fa0 R15: 00007ffced2af768
</TASK>
Add 'in_size' parameter to cxl_payload_from_user_allowed() and validate
the payload is large enough. |
| In the Linux kernel, the following vulnerability has been resolved:
net: add proper RCU protection to /proc/net/ptype
Yin Fengwei reported an RCU stall in ptype_seq_show() and provided
a patch.
Real issue is that ptype_seq_next() and ptype_seq_show() violate
RCU rules.
ptype_seq_show() runs under rcu_read_lock(), and reads pt->dev
to get device name without any barrier.
At the same time, concurrent writers can remove a packet_type structure
(which is correctly freed after an RCU grace period) and clear pt->dev
without an RCU grace period.
Define ptype_iter_state to carry a dev pointer along seq_net_private:
struct ptype_iter_state {
struct seq_net_private p;
struct net_device *dev; // added in this patch
};
We need to record the device pointer in ptype_get_idx() and
ptype_seq_next() so that ptype_seq_show() is safe against
concurrent pt->dev changes.
We also need to add full RCU protection in ptype_seq_next().
(Missing READ_ONCE() when reading list.next values)
Many thanks to Dong Chenchen for providing a repro. |
| In the Linux kernel, the following vulnerability has been resolved:
net/sched: cls_u32: use skb_header_pointer_careful()
skb_header_pointer() does not fully validate negative @offset values.
Use skb_header_pointer_careful() instead.
GangMin Kim provided a report and a repro fooling u32_classify():
BUG: KASAN: slab-out-of-bounds in u32_classify+0x1180/0x11b0
net/sched/cls_u32.c:221 |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: do not strictly require dirty metadata threshold for metadata writepages
[BUG]
There is an internal report that over 1000 processes are
waiting at the io_schedule_timeout() of balance_dirty_pages(), causing
a system hang and trigger a kernel coredump.
The kernel is v6.4 kernel based, but the root problem still applies to
any upstream kernel before v6.18.
[CAUSE]
From Jan Kara for his wisdom on the dirty page balance behavior first.
This cgroup dirty limit was what was actually playing the role here
because the cgroup had only a small amount of memory and so the dirty
limit for it was something like 16MB.
Dirty throttling is responsible for enforcing that nobody can dirty
(significantly) more dirty memory than there's dirty limit. Thus when
a task is dirtying pages it periodically enters into balance_dirty_pages()
and we let it sleep there to slow down the dirtying.
When the system is over dirty limit already (either globally or within
a cgroup of the running task), we will not let the task exit from
balance_dirty_pages() until the number of dirty pages drops below the
limit.
So in this particular case, as I already mentioned, there was a cgroup
with relatively small amount of memory and as a result with dirty limit
set at 16MB. A task from that cgroup has dirtied about 28MB worth of
pages in btrfs btree inode and these were practically the only dirty
pages in that cgroup.
So that means the only way to reduce the dirty pages of that cgroup is
to writeback the dirty pages of btrfs btree inode, and only after that
those processes can exit balance_dirty_pages().
Now back to the btrfs part, btree_writepages() is responsible for
writing back dirty btree inode pages.
The problem here is, there is a btrfs internal threshold that if the
btree inode's dirty bytes are below the 32M threshold, it will not
do any writeback.
This behavior is to batch as much metadata as possible so we won't write
back those tree blocks and then later re-COW them again for another
modification.
This internal 32MiB is higher than the existing dirty page size (28MiB),
meaning no writeback will happen, causing a deadlock between btrfs and
cgroup:
- Btrfs doesn't want to write back btree inode until more dirty pages
- Cgroup/MM doesn't want more dirty pages for btrfs btree inode
Thus any process touching that btree inode is put into sleep until
the number of dirty pages is reduced.
Thanks Jan Kara a lot for the analysis of the root cause.
[ENHANCEMENT]
Since kernel commit b55102826d7d ("btrfs: set AS_KERNEL_FILE on the
btree_inode"), btrfs btree inode pages will only be charged to the root
cgroup which should have a much larger limit than btrfs' 32MiB
threshold.
So it should not affect newer kernels.
But for all current LTS kernels, they are all affected by this problem,
and backporting the whole AS_KERNEL_FILE may not be a good idea.
Even for newer kernels I still think it's a good idea to get
rid of the internal threshold at btree_writepages(), since for most cases
cgroup/MM has a better view of full system memory usage than btrfs' fixed
threshold.
For internal callers using btrfs_btree_balance_dirty() since that
function is already doing internal threshold check, we don't need to
bother them.
But for external callers of btree_writepages(), just respect their
requests and write back whatever they want, ignoring the internal
btrfs threshold to avoid such deadlock on btree inode dirty page
balancing. |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: send: check for inline extents in range_is_hole_in_parent()
Before accessing the disk_bytenr field of a file extent item we need
to check if we are dealing with an inline extent.
This is because for inline extents their data starts at the offset of
the disk_bytenr field. So accessing the disk_bytenr
means we are accessing inline data or in case the inline data is less
than 8 bytes we can actually cause an invalid
memory access if this inline extent item is the first item in the leaf
or access metadata from other items. |
| In the Linux kernel, the following vulnerability has been resolved:
rxrpc: Fix recvmsg() unconditional requeue
If rxrpc_recvmsg() fails because MSG_DONTWAIT was specified but the call at
the front of the recvmsg queue already has its mutex locked, it requeues
the call - whether or not the call is already queued. The call may be on
the queue because MSG_PEEK was also passed and so the call was not dequeued
or because the I/O thread requeued it.
The unconditional requeue may then corrupt the recvmsg queue, leading to
things like UAFs or refcount underruns.
Fix this by only requeuing the call if it isn't already on the queue - and
moving it to the front if it is already queued. If we don't queue it, we
have to put the ref we obtained by dequeuing it.
Also, MSG_PEEK doesn't dequeue the call so shouldn't call
rxrpc_notify_socket() for the call if we didn't use up all the data on the
queue, so fix that also. |
| In the Linux kernel, the following vulnerability has been resolved:
fs/ntfs3: handle attr_set_size() errors when truncating files
If attr_set_size() fails while truncating down, the error is silently
ignored and the inode may be left in an inconsistent state. |
| In the Linux kernel, the following vulnerability has been resolved:
dmaengine: mmp_pdma: Fix race condition in mmp_pdma_residue()
Add proper locking in mmp_pdma_residue() to prevent use-after-free when
accessing descriptor list and descriptor contents.
The race occurs when multiple threads call tx_status() while the tasklet
on another CPU is freeing completed descriptors:
CPU 0 CPU 1
----- -----
mmp_pdma_tx_status()
mmp_pdma_residue()
-> NO LOCK held
list_for_each_entry(sw, ..)
DMA interrupt
dma_do_tasklet()
-> spin_lock(&desc_lock)
list_move(sw->node, ...)
spin_unlock(&desc_lock)
| dma_pool_free(sw) <- FREED!
-> access sw->desc <- UAF!
This issue can be reproduced when running dmatest on the same channel with
multiple threads (threads_per_chan > 1).
Fix by protecting the chain_running list iteration and descriptor access
with the chan->desc_lock spinlock. |
| In the Linux kernel, the following vulnerability has been resolved:
dm-verity: disable recursive forward error correction
There are two problems with the recursive correction:
1. It may cause denial-of-service. In fec_read_bufs, there is a loop that
has 253 iterations. For each iteration, we may call verity_hash_for_block
recursively. There is a limit of 4 nested recursions - that means that
there may be at most 253^4 (4 billion) iterations. Red Hat QE team
actually created an image that pushes dm-verity to this limit - and this
image just makes the udev-worker process get stuck in the 'D' state.
2. It doesn't work. In fec_read_bufs we store data into the variable
"fio->bufs", but fio bufs is shared between recursive invocations, if
"verity_hash_for_block" invoked correction recursively, it would
overwrite partially filled fio->bufs. |
| In the Linux kernel, the following vulnerability has been resolved:
ublk: fix deadlock when reading partition table
When one process(such as udev) opens ublk block device (e.g., to read
the partition table via bdev_open()), a deadlock[1] can occur:
1. bdev_open() grabs disk->open_mutex
2. The process issues read I/O to ublk backend to read partition table
3. In __ublk_complete_rq(), blk_update_request() or blk_mq_end_request()
runs bio->bi_end_io() callbacks
4. If this triggers fput() on file descriptor of ublk block device, the
work may be deferred to current task's task work (see fput() implementation)
5. This eventually calls blkdev_release() from the same context
6. blkdev_release() tries to grab disk->open_mutex again
7. Deadlock: same task waiting for a mutex it already holds
The fix is to run blk_update_request() and blk_mq_end_request() with bottom
halves disabled. This forces blkdev_release() to run in kernel work-queue
context instead of current task work context, and allows ublk server to make
forward progress, and avoids the deadlock.
[axboe: rewrite comment in ublk] |
| In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to detect potential corrupted nid in free_nid_list
As reported, on-disk footer.ino and footer.nid is the same and
out-of-range, let's add sanity check on f2fs_alloc_nid() to detect
any potential corruption in free_nid_list. |