Implement `SliceIndex` for `ByteStr`
Implement `Index` and `IndexMut` for `ByteStr` in terms of `SliceIndex`. Implement it for the same types that `&[u8]` supports (a superset of those supported for `&str`, which does not have `usize` and `ops::IndexRange`).
At the same time, move compare and index traits to a separate file in the `bstr` module, to give it more space to grow as more functionality is added (e.g., iterators and string-like ops). Order the items in `bstr/traits.rs` similarly to `str/traits.rs`.
cc `@joshtriplett`
`ByteStr`/`ByteString` tracking issue: https://github.com/rust-lang/rust/issues/134915
Apply `Recovery::Forbidden` when reparsing pasted macro fragments.
Fixes#137874.
The changes to the output of `tests/ui/associated-consts/issue-93835.rs`
partly undo the changes seen when `NtTy` was removed in #133436, which
is good.
r? ``@petrochenkov``
Rustdoc: typecheck settings.js
This makes the file fully typechecked with no instances of ``````@ts-expect-error`````` and no type casts.
r? `````@notriddle`````
replace extra_filename with strict version hash in metrics file names
Should resolve the potential issue of overwriting metrics from the same crate when compiled with different features or flags.
r? `````@estebank`````
try-job: test-various
Add integer to string formatting tests
As discussed in https://github.com/rust-lang/rust/pull/136264, there doesn't seem to have tests to ensure that int to string conversion is performed correctly, only sporadic tests here and there. Now we have some basic tests. :)
r? `````@Mark-Simulacrum`````
Autodiff batching
Enzyme supports batching, which is especially known from the ML side when training neural networks.
There we would normally have a training loop, where in each iteration we would pass in some data (e.g. an image), and a target vector. Based on how close we are with our prediction we compute our loss, and then use backpropagation to compute the gradients and update our weights.
That's quite inefficient, so what you normally do is passing in a batch of 8/16/.. images and targets, and compute the gradients for those all at once, allowing better optimizations.
Enzyme supports batching in two ways, the first one (which I implemented here) just accepts a Batch size,
and then each Dual/Duplicated argument has not one, but N shadow arguments. So instead of
```rs
for i in 0..100 {
df(x[i], y[i], 1234);
}
```
You can now do
```rs
for i in 0..100.step_by(4) {
df(x[i+0],x[i+1],x[i+2],x[i+3], y[i+0], y[i+1], y[i+2], y[i+3], 1234);
}
```
which will give the same results, but allows better compiler optimizations. See the testcase for details.
There is a second variant, where we can mark certain arguments and instead of having to pass in N shadow arguments, Enzyme assumes that the argument is N times longer. I.e. instead of accepting 4 slices with 12 floats each, we would accept one slice with 48 floats. I'll implement this over the next days.
I will also add more tests for both modes.
For any one preferring some more interactive explanation, here's a video of Tim's llvm dev talk, where he presents his work. https://www.youtube.com/watch?v=edvaLAL5RqU
I'll also add some other docs to the dev guide and user docs in another PR.
r? ghost
Tracking:
- https://github.com/rust-lang/rust/issues/124509
- https://github.com/rust-lang/rust/issues/135283
Expose algebraic floating point intrinsics
# Problem
A stable Rust implementation of a simple dot product is 8x slower than C++ on modern x86-64 CPUs. The root cause is an inability to let the compiler reorder floating point operations for better vectorization.
See https://github.com/calder/dot-bench for benchmarks. Measurements below were performed on a i7-10875H.
### C++: 10us ✅
With Clang 18.1.3 and `-O2 -march=haswell`:
<table>
<tr>
<th>C++</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="cc">
float dot(float *a, float *b, size_t len) {
#pragma clang fp reassociate(on)
float sum = 0.0;
for (size_t i = 0; i < len; ++i) {
sum += a[i] * b[i];
}
return sum;
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/739573c0-380a-4d84-9fd9-141343ce7e68" />
</td>
</tr>
</table>
### Nightly Rust: 10us ✅
With rustc 1.86.0-nightly (8239a37f9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
let mut sum = 0.0;
for i in 0..a.len() {
sum = fadd_algebraic(sum, fmul_algebraic(a[i], b[i]));
}
sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/9dcf953a-2cd7-42f3-bc34-7117de4c5fb9" />
</td>
</tr>
</table>
### Stable Rust: 84us ❌
With rustc 1.84.1 (e71f9a9a9) and `-C opt-level=3 -C target-feature=+avx2,+fma`:
<table>
<tr>
<th>Rust</th>
<th>Assembly</th>
</tr>
<tr>
<td>
<pre lang="rust">
fn dot(a: &[f32], b: &[f32]) -> f32 {
let mut sum = 0.0;
for i in 0..a.len() {
sum += a[i] * b[i];
}
sum
}
</pre>
</td>
<td>
<img src="https://github.com/user-attachments/assets/936a1f7e-33e4-4ff8-a732-c3cdfe068dca" />
</td>
</tr>
</table>
# Proposed Change
Add `core::intrinsics::f*_algebraic` wrappers to `f16`, `f32`, `f64`, and `f128` gated on a new `float_algebraic` feature.
# Alternatives Considered
https://github.com/rust-lang/rust/issues/21690 has a lot of good discussion of various options for supporting fast math in Rust, but is still open a decade later because any choice that opts in more than individual operations is ultimately contrary to Rust's design principles.
In the mean time, processors have evolved and we're leaving major performance on the table by not supporting vectorization. We shouldn't make users choose between an unstable compiler and an 8x performance hit.
# References
* https://github.com/rust-lang/rust/issues/21690
* https://github.com/rust-lang/libs-team/issues/532
* https://github.com/rust-lang/rust/issues/136469
* https://github.com/calder/dot-bench
* https://www.felixcloutier.com/x86/vfmadd132ps:vfmadd213ps:vfmadd231ps
try-job: x86_64-gnu-nopt
try-job: x86_64-gnu-aux
Use the span of the whole bound when the diagnostic talks about a bound
While it makes sense that the host predicate only points to the `~const` part, as whether the actual trait bound is satisfied is checked separately, the user facing diagnostic is talking about the entire trait bound, at which point it makes more sense to just highlight the entire bound
r? `@compiler-errors` or `@fee1-dead`
ToSocketAddrs: fix typo
It's "a function", never "an function".
I noticed the same typo somewhere in the compiler sources so figured I'd fix it there as well.
Fix `Debug` impl for `LateParamRegionKind`.
It uses `Br` prefixes which are inappropriate and appear to have been incorrectly copy/pasted from the `Debug` impl for `BoundRegionKind`.
r? `@BoxyUwU`
AsyncDestructor: replace fields with impl_did
The future and ctor fields aren't actually used, and the way they are extracted is obviously wrong – swapping the order of the items in the source code will give wrong results.
Instead, store just the LocalDefId of the impl, which is enough for the only use of this data.
Fix 2024 edition doctest panic output
Fixes#137970.
The problem was that the output was actually displayed by rustc itself because we're exiting with `Result<(), String>`, and the display is really not great. So instead, we get the output, we print it and then we return an `ExitCode`.
r? ````@aDotInTheVoid````
Remove `rustc_middle::ty::util::ExplicitSelf`.
It's an old (2017 or earlier) type that describes a `self` receiver. It's only used in `rustc_hir_analysis` for two error messages, and much of the complexity isn't used. I suspect it used to be used for more things.
This commit removes it, and moves a greatly simplified version of the `determine` method into `rustc_hir_analysis`, renamed as `get_self_string`. The big comment on the method is removed because it no longer seems relevant.
r? `@BoxyUwU`
add `TypingMode::Borrowck`
Shares the first commit with #138499, doesn't really matter which PR to land first 😊😁
Introduces `TypingMode::Borrowck` which unlike `TypingMode::Analysis`, uses the hidden type computed by HIR typeck as the initial value of opaques instead of an unconstrained infer var. This is a part of https://github.com/rust-lang/types-team/issues/129.
Using this new `TypingMode` is unfortunately a breaking change for now, see tests/ui/impl-trait/non-defining-uses/as-projection-term.rs. Using an inference variable as the initial value results in non-defining uses in the defining scope. We therefore only enable it if with `-Znext-solver=globally` or `-Ztyping-mode-borrowck`
To do that the PR contains the following changes:
- `TypeckResults::concrete_opaque_type` are already mapped to the definition of the opaque type
- writeback now checks that the non-lifetime parameters of the opaque are universal
- for this, `fn check_opaque_type_parameter_valid` is moved from `rustc_borrowck` to `rustc_trait_selection`
- we add a new `query type_of_opaque_hir_typeck` which, using the same visitors as MIR typeck, attempts to merge the hidden types from HIR typeck from all defining scopes
- done by adding a `DefiningScopeKind` flag to toggle between using borrowck and HIR typeck
- the visitors stop checking that the MIR type matches the HIR type. This is trivial as the HIR type are now used as the initial hidden types of the opaque. This check is useful as a safeguard when not using `TypingMode::Borrowck`, but adding it to the new structure is annoying and it's not soundness critical, so I intend to not add it back.
- add a `TypingMode::Borrowck` which behaves just like `TypingMode::Analysis` except when normalizing opaque types
- it uses `type_of_opaque_hir_typeck(opaque)` as the initial value after replacing its regions with new inference vars
- it uses structural lookup in the new solver
fixes#112201, fixes#132335, fixes#137751
r? `@compiler-errors` `@oli-obk`
Demote i686-pc-windows-gnu to Tier 2
In accordance with [RFC 3771](https://github.com/rust-lang/rfcs/pull/3771). FCP has been completed.
tracking issue #138422
I also added a stub doc page for the target and renamed the windows-gnullvm page for consistency.
Rollup of 8 pull requests
Successful merges:
- #138949 (Rename `is_like_osx` to `is_like_darwin`)
- #139295 (Remove creation of duplicate `AnonPipe`)
- #139313 (Deduplicate some `rustc_middle` function bodies by calling the `rustc_type_ir` equivalent)
- #139317 (compiletest: Encapsulate all of the code that touches libtest)
- #139322 (Add helper function for checking LLD usage to `run-make-support`)
- #139335 (Pass correct param-env to `error_implies`)
- #139342 (Add a mailmap entry for myself)
- #139349 (adt_destructor: sanity-check returned item)
r? `@ghost`
`@rustbot` modify labels: rollup
Pass correct param-env to `error_implies`
Duplicated comment from the test:
In the error reporting code, when reporting fulfillment errors for goals A and B, we try to see if elaborating A will result in another goal that can equate with B. That would signal that B is "implied by" A, allowing us to skip reporting it, which is beneficial for cutting down on the number of diagnostics we report.
In the new trait solver especially, but even in the old trait solver through things like defining opaque type usages, this `can_equate` call was not properly taking the param-env of the goals, resulting in nested obligations that had empty param-envs. If one of these nested obligations was a `ConstParamHasTy` goal, then we would ICE, since those goals are particularly strict about the param-env they're evaluated in.
This is morally a fix for <https://github.com/rust-lang/rust/issues/139314>, but that repro uses details about how defining usages in the `check_opaque_well_formed` code can spring out of type equality, and will likely stop failing soon coincidentally once we start using `PostBorrowck` mode in that check. Instead, we use lazy normalization to end up generating an alias-eq goal whose nested goals woul trigger the ICE instead, since this is a lot more stable.
Fixes https://github.com/rust-lang/rust/issues/139314
r? ``@oli-obk`` or reassign
compiletest: Encapsulate all of the code that touches libtest
Compiletest currently relies on unstable libtest APIs in order to actually execute tests. That's unfortunate, but removing the dependency isn't trivial.
However, we can make a small step towards removing the libtest dependency by encapsulating the libtest interactions into a single dedicated module. That makes it easier to see what parts of libtest are actually used.
---
As a side-effect of moving the `test_opts` function into that dedicated module, this PR also ends up allowing `--fail-fast` to be passed on the command line, instead of requiring an environment variable.
---
There is still (at least) one other aspect of the libtest dependency that this PR does not address, namely the fact that we rely on libtest's output capture (via unstable std APIs) to capture the output that we print during individual tests. I hope to do something about that at some point.
r? jieyouxu
Deduplicate some `rustc_middle` function bodies by calling the `rustc_type_ir` equivalent
Maybe in the future we can use method delegation, but I'd rather avoid that for now (I don't even know if it can do that already)
Remove creation of duplicate `AnonPipe`
The `File` is unwrapped to a `Handle` into an `AnonPipe`, and then that `AnonPipe` was unwrapped to a `Handle` into another `AnonPipe`. The second operation is entirely redundant.
Rename `is_like_osx` to `is_like_darwin`
Replace `is_like_osx` with `is_like_darwin`, which more closely describes reality (OS X is the pre-2016 name for macOS, and is by now quite outdated; Darwin is the overall name for the OS underlying Apple's macOS, iOS, etc.).
``@rustbot`` label O-apple
r? compiler
Folder experiment: Monomorphize region resolver
**NOTE:** This is one of a series of perf experiments that I've come up with while sick in bed. I'm assigning them to lqd b/c you're a good reviewer and you'll hopefully be awake when these experiments finish, lol.
r? lqd
This is actually two tweaks to the `RegionFolder`, monomorphizing its callback and accounting for flags to avoid folding unnecessarily.
The future and ctor fields aren't actually used, and the way they are
extracted is obviously wrong – swapping the order of the items in the
source code will give wrong results.
Instead, store just the LocalDefId of the impl, which is enough for the
only use of this data.