It's very useful. There are some false positives involving integration
tests in `rustc_pattern_analysis` and `rustc_serialize`. There is also a
false positive involving `rustc_driver_impl`'s
`rustc_randomized_layouts` feature. And I removed a `rustc_span` mention
in a doc comment in `rustc_log` because it wasn't integral to the
comment but caused a dev-dependency.
The `hash_raw_entry` feature has finished fcp-close, so the compiler
should stop using it to allow its removal. Several `Sharded` maps were
using raw entries to avoid re-hashing between shard and map lookup, and
we can do that with `hashbrown::HashTable` instead.
Revert <https://github.com/rust-lang/rust/pull/138084> to buy time to
consider options that avoids breaking downstream usages of cargo on
distributed `rustc-src` artifacts, where such cargo invocations fail due
to inability to inherit `lints` from workspace root manifest's
`workspace.lints` (this is only valid for the source rust-lang/rust
workspace, but not really the distributed `rustc-src` artifacts).
This breakage was reported in
<https://github.com/rust-lang/rust/issues/138304>.
This reverts commit 48caf81484, reversing
changes made to c6662879b2.
Rollup of 12 pull requests
Successful merges:
- #136127 (Allow `*const W<dyn A> -> *const dyn A` ptr cast)
- #136968 (Turn order dependent trait objects future incompat warning into a hard error)
- #137319 (Stabilize `const_vec_string_slice`)
- #137885 (tidy: add triagebot checks)
- #138040 (compiler: Use `size_of` from the prelude instead of imported)
- #138084 (Use workspace lints for crates in `compiler/`)
- #138158 (Move more layouting logic to `rustc_abi`)
- #138160 (depend more on attr_data_structures and move find_attr! there)
- #138192 (crashes: couple more tests)
- #138216 (bootstrap: Fix stack printing when a step cycle is detected)
- #138232 (Reduce verbosity of GCC build log)
- #138242 (Revert "Don't test new error messages with the stage 0 compiler")
r? `@ghost`
`@rustbot` modify labels: rollup
compiler: Use `size_of` from the prelude instead of imported
Use `std::mem::{size_of, size_of_val, align_of, align_of_val}` from the prelude instead of importing or qualifying them. Apply this change across the compiler.
These functions were added to all preludes in Rust 1.80.
r? ``@compiler-errors``
Change TaskDeps to start preallocated with 128 capacity
This is a tiny change that makes `TaskDeps::read_set` start preallocated with capacity for 128 elements.
From local profiling, it looks like `TaskDeps::read_set` is one of the most-often resized hash-sets in `rustc`.
By naming them in `[workspace.lints.rust]` in the top-level
`Cargo.toml`, and then making all `compiler/` crates inherit them with
`[lints] workspace = true`. (I omitted `rustc_codegen_{cranelift,gcc}`,
because they're a bit different.)
The advantages of this over the current approach:
- It uses a standard Cargo feature, rather than special handling in
bootstrap. So, easier to understand, and less likely to get
accidentally broken in the future.
- It works for proc macro crates.
It's a shame it doesn't work for rustc-specific lints, as the comments
explain.
Use `std::mem::{size_of, size_of_val, align_of, align_of_val}` from the
prelude instead of importing or qualifying them.
These functions were added to all preludes in Rust 1.80.
Resume one waiter at once in deadlock handler
When multiple query loop errors occur in the code, only one waiter should be resumed at a time to avoid waking up multiple waiters at the same time and causing deadlock due to thread grabbing.
This fixes the UI failures in #132051
cc `@Zoxc` `@cjgillot` `@nnethercote` `@bjorn3` `@Kobzol`
Zulip discussion [here](https://rust-lang.zulipchat.com/#narrow/channel/187679-t-compiler.2Fwg-parallel-rustc/topic/Deadlocks.20and.20Rayon)
Edit: We can't reproduce these bugs with the existing test suits, so we keep them until we merge #132051
UPDATES #129912
UPDATES #120757
UPDATES #129911
Parallel-compiler-related cleanup
Parallel-compiler-related cleanup
I carefully split changes into commits. Commit messages are self-explanatory. Squashing is not recommended.
cc "Parallel Rustc Front-end" https://github.com/rust-lang/rust/issues/113349
r? SparrowLii
``@rustbot`` label: +WG-compiler-parallel
tree-wide: parallel: Fully removed all `Lrc`, replaced with `Arc`
tree-wide: parallel: Fully removed all `Lrc`, replaced with `Arc`
This is continuation of https://github.com/rust-lang/rust/pull/132282 .
I'm pretty sure I did everything right. In particular, I searched all occurrences of `Lrc` in submodules and made sure that they don't need replacement.
There are other possibilities, through.
We can define `enum Lrc<T> { Rc(Rc<T>), Arc(Arc<T>) }`. Or we can make `Lrc` a union and on every clone we can read from special thread-local variable. Or we can add a generic parameter to `Lrc` and, yes, this parameter will be everywhere across all codebase.
So, if you think we should take some alternative approach, then don't merge this PR. But if it is decided to stick with `Arc`, then, please, merge.
cc "Parallel Rustc Front-end" ( https://github.com/rust-lang/rust/issues/113349 )
r? SparrowLii
`@rustbot` label WG-compiler-parallel
`rustc_middle` and `rustc_query_system` both have a file called
`dep_node.rs` with a big comment at the top, and the comments are very
similar. The one in `rustc_query_system` looks like the original, and
the one in `rustc_middle` is a copy with some improvements.
This commit removes the comment from `rustc_middle` and updates the one
in `rustc_query_system` to include the improvements. I did it this way
because `rustc_query_system` is the crate that defines `DepNode`, and so
seems like the right place for the comment.
When the word "cache" appears in the context of the query system, it often
isn't obvious whether that is referring to the in-memory query cache or the
on-disk incremental cache.
For these types, we can assure the reader that they are for in-memory caching.
Refactored the duplicated code into a function.
`with_feed_task` currently passes the query key to `debug_assert!`.
This commit changes that, so it debug prints the `DepNode`, as in
`with_task`.
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.
This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
Improve VecCache under parallel frontend
This replaces the single Vec allocation with a series of progressively larger buckets. With the cfg for parallel enabled but with -Zthreads=1, this looks like a slight regression in i-count and cycle counts (~1%).
With the parallel frontend at -Zthreads=4, this is an improvement (-5% wall-time from 5.788 to 5.4688 on libcore) than our current Lock-based approach, likely due to reducing the bouncing of the cache line holding the lock. At -Zthreads=32 it's a huge improvement (-46%: 8.829 -> 4.7319 seconds).
try-job: i686-gnu-nopt
try-job: dist-x86_64-linux
This replaces the single Vec allocation with a series of progressively
larger buckets. With the cfg for parallel enabled but with -Zthreads=1,
this looks like a slight regression in i-count and cycle counts (<0.1%).
With the parallel frontend at -Zthreads=4, this is an improvement (-5%
wall-time from 5.788 to 5.4688 on libcore) than our current Lock-based
approach, likely due to reducing the bouncing of the cache line holding
the lock. At -Zthreads=32 it's a huge improvement (-46%: 8.829 -> 4.7319
seconds).
Delete the `cfg(not(parallel))` serial compiler
Since it's inception a long time ago, the parallel compiler and its cfgs have been a maintenance burden. This was a necessary evil the allow iteration while not degrading performance because of synchronization overhead.
But this time is over. Thanks to the amazing work by the parallel working group (and the dyn sync crimes), the parallel compiler has now been fast enough to be shipped by default in nightly for quite a while now.
Stable and beta have still been on the serial compiler, because they can't use `-Zthreads` anyways.
But this is quite suboptimal:
- the maintenance burden still sucks
- we're not testing the serial compiler in nightly
Because of these reasons, it's time to end it. The serial compiler has served us well in the years since it was split from the parallel one, but it's over now.
Let the knight slay one head of the two-headed dragon!
#113349
Note that the default is still 1 thread, as more than 1 thread is still fairly broken.
cc `@onur-ozkan` to see if i did the bootstrap field removal correctly, `@SparrowLii` on the sync parts
Since it's inception a long time ago, the parallel compiler and its cfgs
have been a maintenance burden. This was a necessary evil the allow
iteration while not degrading performance because of synchronization
overhead.
But this time is over. Thanks to the amazing work by the parallel
working group (and the dyn sync crimes), the parallel compiler has now
been fast enough to be shipped by default in nightly for quite a while
now.
Stable and beta have still been on the serial compiler, because they
can't use `-Zthreads` anyways.
But this is quite suboptimal:
- the maintenance burden still sucks
- we're not testing the serial compiler in nightly
Because of these reasons, it's time to end it. The serial compiler has
served us well in the years since it was split from the parallel one,
but it's over now.
Let the knight slay one head of the two-headed dragon!