1
Fork 0

Rollup merge of #137731 - SparrowLii:waiter, r=nnethercote

Resume one waiter at once in deadlock handler

When multiple query loop errors occur in the code, only one waiter should be resumed at a time to avoid waking up multiple waiters at the same time and causing deadlock due to thread grabbing.

This fixes the UI failures in #132051

cc `@Zoxc` `@cjgillot` `@nnethercote` `@bjorn3` `@Kobzol`

Zulip discussion [here](https://rust-lang.zulipchat.com/#narrow/channel/187679-t-compiler.2Fwg-parallel-rustc/topic/Deadlocks.20and.20Rayon)

Edit: We can't reproduce these bugs with the existing test suits, so we keep them until we merge #132051
UPDATES #129912
UPDATES #120757
UPDATES #129911
This commit is contained in:
许杰友 Jieyou Xu (Joe) 2025-03-05 21:46:45 +08:00 committed by GitHub
commit 927c11fcb7
No known key found for this signature in database
GPG key ID: B5690EEEBB952194

View file

@ -477,8 +477,8 @@ fn remove_cycle(
/// Detects query cycles by using depth first search over all active query jobs.
/// If a query cycle is found it will break the cycle by finding an edge which
/// uses a query latch and then resuming that waiter.
/// There may be multiple cycles involved in a deadlock, so this searches
/// all active queries for cycles before finally resuming all the waiters at once.
/// There may be multiple cycles involved in a deadlock, but we only search
/// one cycle at a call and resume one waiter at once. See `FIXME` below.
pub fn break_query_cycles(query_map: QueryMap, registry: &rayon_core::Registry) {
let mut wakelist = Vec::new();
let mut jobs: Vec<QueryJobId> = query_map.keys().cloned().collect();
@ -488,6 +488,19 @@ pub fn break_query_cycles(query_map: QueryMap, registry: &rayon_core::Registry)
while jobs.len() > 0 {
if remove_cycle(&query_map, &mut jobs, &mut wakelist) {
found_cycle = true;
// FIXME(#137731): Resume all the waiters at once may cause deadlocks,
// so we resume one waiter at a call for now. It's still unclear whether
// it's due to possible issues in rustc-rayon or instead in the handling
// of query cycles.
// This seem to only appear when multiple query cycles errors
// are involved, so this reduction in parallelism, while suboptimal, is not
// universal and only the deadlock handler will encounter these cases.
// The workaround shows loss of potential gains, but there still are big
// improvements in the common case, and no regressions compared to the
// single-threaded case. More investigation is still needed, and once fixed,
// we can wake up all the waiters up.
break;
}
}