RangeInclusive internal iteration performance improvement.
Specialize `Iterator::try_fold` and `DoubleEndedIterator::try_rfold` to improve code generation in all internal iteration scenarios.
This changes brings the performance of internal iteration with `RangeInclusive` on par with the performance of iteration with `Range`:
- Single conditional jump in hot loop,
- Unrolling and vectorization,
- And even Closed Form substitution.
Unfortunately, it only applies to internal iteration. Despite various attempts at stream-lining the implementation of `next` and `next_back`, LLVM has stubbornly refused to optimize external iteration appropriately, leaving me with a choice between:
- The current implementation, for which Closed Form substitution is performed, but which uses 2 conditional jumps in the hot loop when optimization fail.
- An implementation using a `is_done` boolean, which uses 1 conditional jump in the hot loop when optimization fail, allowing unrolling and vectorization, but for which Closed Form substitution fails.
In the absence of any conclusive evidence as to which usecase matters most, and with no assurance that the lack of Closed Form substitution is not indicative of other optimizations being foiled, there is no way
to pick one implementation over the other, and thus I defer to the statu quo as far as `next` and `next_back` are concerned.
Add unstable Iterator::copied()
Initially suggested at https://github.com/bluss/rust-itertools/pull/289, however the maintainers of itertools suggested this may be better of in a standard library.
The intent of `copied` is to avoid accidentally cloning iterator elements after doing a code refactoring which causes a structure to be no longer `Copy`. This is a relatively common pattern, as it can be seen by calling `rg --pcre2 '[.]map[(][|](?:(\w+)[|] [*]\1|&(\w+)[|] \2)[)]'` on Rust main repository. Additionally, many uses of `cloned` actually want to simply `Copy`, and changing something to be no longer copyable may introduce unnoticeable performance penalty.
Also, this makes sense because the standard library includes `[T].copy_from_slice` to pair with `[T].clone_from_slice`.
This also adds `Option::copied`, because it makes sense to pair it with `Iterator::copied`. I don't think this feature is particularly important, but it makes sense to update `Option` along with `Iterator` for consistency.
name old ns/iter new ns/iter diff ns/iter diff % speedup
iter::bench_cycle_take_ref_sum 927,152 927,194 42 0.00% x 1.00
iter::bench_cycle_take_sum 938,129 603,492 -334,637 -35.67% x 1.55
the originally generated code was highly suboptimal
this brings it close to the same code or even exactly the same as a
manual while-loop by eliminating a branch and the
double stepping of n-1 + 1 steps
The intermediate trait lets us circumvent the specialization
type inference bugs
Add std/core::iter::repeat_with
Adds an iterator primitive `repeat_with` which is the "lazy" version of `repeat` but also more flexible since you can build up state with the `FnMut`. The design is mostly taken from `repeat`.
r? @rust-lang/libs
cc @withoutboats, @scottmcm
During the RFC, it was discussed that figuring out whether a range is empty was subtle, and thus there should be a clear and obvious way to do it. It can't just be ExactSizeIterator::is_empty (also unstable) because not all ranges are ExactSize -- not even Range<i32> or RangeInclusive<usize>.
`match`ing on an `Option<Ordering>` seems cause some confusion for LLVM; switching to just using comparison operators removes a few jumps from the simple `for` loops I was trying.
Implement TrustedLen for Take<Repeat> and Take<RangeFrom>
This will allow optimization of simple `repeat(x).take(n).collect()` iterators, which are currently not vectorized and have capacity checks.
This will only support a few aggregates on `Repeat` and `RangeFrom`, which might be enough for simple cases, but doesn't optimize more complex ones. Namely, Cycle, StepBy, Filter, FilterMap, Peekable, SkipWhile, Skip, FlatMap, Fuse and Inspect are not marked `TrustedLen` when the inner iterator is infinite.
Previous discussion can be found in #47082
r? @alexcrichton
Because the last item needs special handling, it seems that LLVM has trouble canonicalizing the loops in external iteration. With the override, it becomes obvious that the start==end case exits the loop (as opposed to the one *after* that exiting the loop in external iteration).
This is the core method in terms of which the other methods (fold, all, any, find, position, nth, ...) can be implemented, allowing Iterator implementors to get the full goodness of internal iteration by only overriding one method (per direction).