Commit graph

65 commits

Mazdak Farrokhzad
f19bec89d7
Rollup merge of #58122 - matthieu-m:range_incl_perf, r=dtolnay
RangeInclusive internal iteration performance improvement.

Specialize `Iterator::try_fold` and `DoubleEndedIterator::try_rfold` to improve code generation in all internal iteration scenarios.

This change brings the performance of internal iteration with `RangeInclusive` on par with the performance of iteration with `Range`:

 - Single conditional jump in hot loop,
 - Unrolling and vectorization,
 - And even Closed Form substitution.
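
A minimal sketch of the shape of such a specialization (standalone and illustrative, not the actual libcore code; `u64` elements and a free function are assumptions for brevity): the half-open part of the range runs with a single `<` comparison, and the final `start == end` element is handled once, outside the hot loop.

```rust
fn try_fold_incl<B, E>(
    mut start: u64,
    end: u64,
    init: B,
    mut f: impl FnMut(B, u64) -> Result<B, E>,
) -> Result<B, E> {
    let mut acc = init;
    if start > end {
        return Ok(acc); // already exhausted
    }
    while start < end {
        let n = start;
        start += 1;
        acc = f(acc, n)?; // hot loop: one comparison, as with `Range`
    }
    f(acc, end) // the last element is yielded exactly once
}
```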

Unfortunately, it only applies to internal iteration. Despite various attempts at streamlining the implementation of `next` and `next_back`, LLVM has stubbornly refused to optimize external iteration appropriately, leaving me with a choice between:

 - The current implementation, for which Closed Form substitution is performed, but which uses 2 conditional jumps in the hot loop when optimizations fail.
 - An implementation using an `is_done` boolean, which uses 1 conditional jump in the hot loop when optimizations fail, allowing unrolling and vectorization, but for which Closed Form substitution fails.

In the absence of any conclusive evidence as to which use case matters most, and with no assurance that the lack of Closed Form substitution is not indicative of other optimizations being foiled, there is no way to pick one implementation over the other; I therefore defer to the status quo as far as `next` and `next_back` are concerned.
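
For reference, a hedged sketch of the `is_done` alternative mentioned above (hypothetical names, not a proposed patch):

```rust
struct Incl {
    start: u64,
    end: u64,
    is_done: bool,
}

impl Iterator for Incl {
    type Item = u64;
    fn next(&mut self) -> Option<u64> {
        if self.is_done {
            return None; // single flag check in the hot loop
        }
        let n = self.start;
        self.is_done = n == self.end; // arm the flag on the last element
        self.start = n.wrapping_add(1);
        Some(n)
    }
}
```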
2019-02-23 09:25:12 +01:00
Ralf Jung
72be9a607b review or fix remaining miri failures in libcore 2019-02-13 18:21:13 +01:00
Ralf Jung
7f5dc49214 review or fix miri failures in iter, slice, cell, time 2019-02-13 17:56:43 +01:00
Ralf Jung
26ade1cfaa mark failures expected due to panics 2019-02-13 17:56:43 +01:00
Alexander Regueiro
99ed06eb88 libs: doc comments 2019-02-10 23:57:25 +00:00
Matthieu M
4fed67f942 Fix exhaustion of inclusive range try_fold and try_rfold 2019-02-09 18:42:34 +01:00
Ralf Jung
81613ad7cf disable tests in Miri 2019-02-07 18:24:10 +01:00
Kevin Leimkuhler
8dea0d0172 Add initial impl of is_sorted to Iterator 2019-01-17 22:34:42 -08:00
Stjepan Glavina
7c083a8fed Remove unnecessary mut 2019-01-14 12:23:50 +01:00
Stjepan Glavina
e449f3d629 Fix failing test 2019-01-14 00:45:57 +01:00
Stjepan Glavina
04c74f46f0 Add core::iter::once_with 2019-01-13 16:58:08 +01:00
bors
a7be40c65a Auto merge of #56534 - xfix:copied, r=@SimonSapin
Add unstable Iterator::copied()

Initially suggested at https://github.com/bluss/rust-itertools/pull/289; however, the maintainers of itertools suggested this may be better off in the standard library.

The intent of `copied` is to avoid accidentally cloning iterator elements after a code refactoring causes a structure to no longer be `Copy`. This is a relatively common pattern, as can be seen by running `rg --pcre2 '[.]map[(][|](?:(\w+)[|] [*]\1|&(\w+)[|] \2)[)]'` on the main Rust repository. Additionally, many uses of `cloned` actually just want a `Copy`, and changing something to be no longer copyable may introduce an unnoticed performance penalty.

Also, this makes sense because the standard library includes `[T].copy_from_slice` to pair with `[T].clone_from_slice`.

This also adds `Option::copied`, because it makes sense to pair it with `Iterator::copied`. I don't think this feature is particularly important, but it makes sense to update `Option` along with `Iterator` for consistency.
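
A small usage sketch (as the API works today): `copied()` only compiles when the items are `Copy`, so a refactoring that removes `Copy` surfaces as a compile error rather than a silent clone.

```rust
fn main() {
    let v = vec![1, 2, 3];
    // Equivalent to `.cloned()` for `Copy` items, but stops
    // compiling if the element type ever loses `Copy`.
    let sum: i32 = v.iter().copied().sum();
    assert_eq!(sum, 6);
}
```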
2018-12-26 19:39:19 +00:00
Mark Rousskov
2a663555dd Remove licenses 2018-12-25 21:08:33 -07:00
Konrad Borowski
8ac5380ea0
Merge branch 'master' into copied 2018-12-23 16:47:11 +01:00
Clar Fon
fb18ddaaaa Add DoubleEndedIterator::nth_back 2018-12-20 01:18:04 -05:00
Shotaro Yamada
f0483f76e6 Remove <Cycle as Iterator>::try_fold override
It was an incorrect optimization.
2018-12-17 15:00:22 +09:00
Shotaro Yamada
fbe5aa57ed Override Cycle::try_fold
name                            old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
iter::bench_cycle_take_ref_sum  927,152      927,194                42    0.00%   x 1.00
iter::bench_cycle_take_sum      938,129      603,492          -334,637  -35.67%   x 1.55
2018-12-09 00:01:09 +09:00
Konrad Borowski
a964307999 Add a test for cloned side effects 2018-12-05 17:53:34 +01:00
Konrad Borowski
fe45e9a886 Add tests for Iterator::copied() 2018-12-05 15:40:15 +01:00
Simon Sapin
641c4909e4 Add std::iter::successors 2018-11-20 18:22:40 +01:00
Артём Павлов [Artyom Pavlov]
6357021294
fix test 2018-11-19 01:01:06 +03:00
Артём Павлов [Artyom Pavlov]
6ad61b9c3b
tests 2018-11-18 23:14:52 +03:00
Артём Павлов [Artyom Pavlov]
126b71f690
revert 2018-11-18 21:39:23 +03:00
Scott McMurray
0a3bd9b6ab Use impl_header_lifetime_elision in libcore 2018-09-29 21:33:35 -07:00
Emerentius
000aff604e specialize StepBy<Range(Inclusive)>
The originally generated code was highly suboptimal; this brings it close to, or even exactly matches, the code of a manual while-loop, by eliminating a branch and the double stepping of n-1 + 1 steps.

The intermediate trait lets us circumvent the specialization type inference bugs.
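
A hedged sketch of the resulting shape (illustrative names, not the actual libcore implementation): one bound check per item, with a single saturating step replacing the n-1 + 1 double stepping.

```rust
struct StepByRange {
    start: u64,
    end: u64,
    step: u64, // assumed >= 1
}

impl Iterator for StepByRange {
    type Item = u64;
    fn next(&mut self) -> Option<u64> {
        if self.start < self.end {
            let n = self.start;
            // one saturating add instead of n-1 steps followed by one more
            self.start = n.saturating_add(self.step);
            Some(n)
        } else {
            None
        }
    }
}
```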
2018-06-19 19:33:54 +02:00
Thayne McCombs
87941b079a Stabilize iterator_repeat_with
Fixes #48169
2018-06-02 15:52:09 -06:00
Aleksey Kladov
591dd5d992 Add Iterator::find_map 2018-04-03 00:47:00 +03:00
Scott McMurray
11fefeb61c Add a Zip::nth test for side effects 2018-03-01 02:17:50 -08:00
Scott McMurray
70d5a4600b Specialize Zip::nth for TrustedRandomAccess
Makes the bench asked about on URLO 58x faster :)
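
A quick illustration of the behavior the companion side-effects test guards: skipping with `nth` must still advance *both* inner iterators so a later `next` stays in lockstep.

```rust
fn main() {
    let a = [1, 2, 3, 4];
    let b = [10, 20, 30, 40];
    let mut z = a.iter().zip(b.iter());
    assert_eq!(z.nth(1), Some((&2, &20))); // skips one pair from each side
    assert_eq!(z.next(), Some((&3, &30))); // both sides remain synchronized
}
```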
2018-03-01 01:57:25 -08:00
Mazdak Farrokhzad
0e394010e6 core::iter::Flatten: update FlatMap & Flatten according to discussion 2018-02-20 08:28:33 +01:00
Mazdak Farrokhzad
6af23f977c add Iterator::flatten and redefine flat_map(f) in terms of map(f).flatten() 2018-02-20 08:27:32 +01:00
kennytm
bebd2fbfc8
Rollup merge of #48156 - Centril:feature/iterator_repeat_with, r=alexcrichton
Add std/core::iter::repeat_with

Adds an iterator primitive `repeat_with` which is the "lazy" version of `repeat` but also more flexible since you can build up state with the `FnMut`. The design is mostly taken from `repeat`.
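
A brief usage sketch: unlike `repeat`, the closure can carry mutable state, and the yielded type need not be `Clone`.

```rust
use std::iter;

fn main() {
    let mut count = 0u32;
    let squares: Vec<u32> = iter::repeat_with(|| {
            count += 1;
            count * count // state built up across calls
        })
        .take(4)
        .collect();
    assert_eq!(squares, [1, 4, 9, 16]);
}
```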

r? @rust-lang/libs
cc @withoutboats, @scottmcm
2018-02-14 18:25:22 +08:00
Mazdak Farrokhzad
55c669c4d9 core::iter::repeat_with: fix tests some more 2018-02-12 09:15:13 +01:00
Mazdak Farrokhzad
f025eff21d core::iter::repeat_with: fix tests 2018-02-12 09:13:47 +01:00
Mazdak Farrokhzad
0f789aad2b add core::iter::repeat_with 2018-02-12 08:05:46 +01:00
Scott McMurray
b5cb393cf5 Use is_empty in range iteration exhaustion tests 2018-02-09 17:54:27 -08:00
Scott McMurray
4f8049a2b0 Add Range[Inclusive]::is_empty
During the RFC, it was discussed that figuring out whether a range is empty was subtle, and thus there should be a clear and obvious way to do it.  It can't just be ExactSizeIterator::is_empty (also unstable) because not all ranges are ExactSize -- not even Range<i32> or RangeInclusive<usize>.
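
A short illustration of the subtlety (using `is_empty`, unstable at the time of this commit):

```rust
fn main() {
    assert!((3..3).is_empty());    // half-open: yields nothing
    assert!(!(3..=3).is_empty());  // inclusive: still yields 3
    assert!((3..=2).is_empty());   // start past end: empty
}
```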
2018-02-09 01:47:18 -08:00
bors
932c736479 Auto merge of #48057 - scottmcm:less-match-more-compare, r=dtolnay
Simplify RangeInclusive::next[_back]

`match`ing on an `Option<Ordering>` seems to cause some confusion for LLVM; switching to just using comparison operators removes a few jumps from the simple `for` loops I was trying.

cc https://github.com/rust-lang/rust/issues/45222 https://github.com/rust-lang/rust/issues/28237#issuecomment-363706510

Example:
```rust
#[no_mangle]
pub fn coresum(x: std::ops::RangeInclusive<u64>) -> u64 {
    let mut sum = 0;
    for i in x {
        sum += i ^ (i-1);
    }
    sum
}
```
Today:
```asm
coresum:
    xor r8d, r8d
    mov r9, -1
    xor eax, eax
    jmp .LBB0_1
.LBB0_4:
    lea rcx, [rdi - 1]
    xor rcx, rdi
    add rax, rcx
    mov rsi, rdx
    mov rdi, r10
.LBB0_1:
    cmp rdi, rsi
    mov ecx, 1
    cmovb   rcx, r9
    cmove   rcx, r8
    test    rcx, rcx
    mov edx, 0
    mov r10d, 1
    je  .LBB0_4         // 1
    cmp rcx, -1
    jne .LBB0_5         // 2
    lea r10, [rdi + 1]
    mov rdx, rsi
    jmp .LBB0_4         // 3
.LBB0_5:
    ret
```
With this PR:
```asm
coresum:
	cmp	rcx, rdx
	jbe	.LBB0_2
	xor	eax, eax
	ret
.LBB0_2:
	xor	r8d, r8d
	mov	r9d, 1
	xor	eax, eax
	.p2align	4, 0x90
.LBB0_3:
	lea	r10, [rcx + 1]
	cmp	rcx, rdx
	cmovae	rdx, r8
	cmovae	r10, r9
	lea	r11, [rcx - 1]
	xor	r11, rcx
	add	rax, r11
	mov	rcx, r10
	cmp	r10, rdx
	jbe	.LBB0_3         // Just this
	ret
```

<details><summary>Though using internal iteration (`.map(|i| i ^ (i-1)).sum()`) is still shorter to type and lets the compiler unroll it</summary>

```asm
coresum_inner:
.Lcfi0:
.seh_proc coresum_inner
	sub	rsp, 168
.Lcfi1:
	.seh_stackalloc 168
	vmovdqa	xmmword ptr [rsp + 144], xmm15
.Lcfi2:
	.seh_savexmm 15, 144
	vmovdqa	xmmword ptr [rsp + 128], xmm14
.Lcfi3:
	.seh_savexmm 14, 128
	vmovdqa	xmmword ptr [rsp + 112], xmm13
.Lcfi4:
	.seh_savexmm 13, 112
	vmovdqa	xmmword ptr [rsp + 96], xmm12
.Lcfi5:
	.seh_savexmm 12, 96
	vmovdqa	xmmword ptr [rsp + 80], xmm11
.Lcfi6:
	.seh_savexmm 11, 80
	vmovdqa	xmmword ptr [rsp + 64], xmm10
.Lcfi7:
	.seh_savexmm 10, 64
	vmovdqa	xmmword ptr [rsp + 48], xmm9
.Lcfi8:
	.seh_savexmm 9, 48
	vmovdqa	xmmword ptr [rsp + 32], xmm8
.Lcfi9:
	.seh_savexmm 8, 32
	vmovdqa	xmmword ptr [rsp + 16], xmm7
.Lcfi10:
	.seh_savexmm 7, 16
	vmovdqa	xmmword ptr [rsp], xmm6
.Lcfi11:
	.seh_savexmm 6, 0
.Lcfi12:
	.seh_endprologue
	cmp	rdx, rcx
	jae	.LBB1_2
	xor	eax, eax
	jmp	.LBB1_13
.LBB1_2:
	mov	r8, rdx
	sub	r8, rcx
	jbe	.LBB1_3
	cmp	r8, 7
	jbe	.LBB1_5
	mov	rax, r8
	and	rax, -8
	mov	r9, r8
	and	r9, -8
	je	.LBB1_5
	add	rax, rcx
	vmovq	xmm0, rcx
	vpshufd	xmm0, xmm0, 68
	mov	ecx, 1
	vmovq	xmm1, rcx
	vpslldq	xmm1, xmm1, 8
	vpaddq	xmm1, xmm0, xmm1
	vpxor	xmm0, xmm0, xmm0
	vpcmpeqd	xmm11, xmm11, xmm11
	vmovdqa	xmm12, xmmword ptr [rip + __xmm@00000000000000010000000000000001]
	vmovdqa	xmm13, xmmword ptr [rip + __xmm@00000000000000030000000000000003]
	vmovdqa	xmm14, xmmword ptr [rip + __xmm@00000000000000050000000000000005]
	vmovdqa	xmm15, xmmword ptr [rip + __xmm@00000000000000080000000000000008]
	mov	rcx, r9
	vpxor	xmm4, xmm4, xmm4
	vpxor	xmm5, xmm5, xmm5
	vpxor	xmm6, xmm6, xmm6
	.p2align	4, 0x90
.LBB1_9:
	vpaddq	xmm7, xmm1, xmmword ptr [rip + __xmm@00000000000000020000000000000002]
	vpaddq	xmm9, xmm1, xmmword ptr [rip + __xmm@00000000000000040000000000000004]
	vpaddq	xmm10, xmm1, xmmword ptr [rip + __xmm@00000000000000060000000000000006]
	vpaddq	xmm8, xmm1, xmm12
	vpxor	xmm7, xmm8, xmm7
	vpaddq	xmm2, xmm1, xmm13
	vpxor	xmm8, xmm2, xmm9
	vpaddq	xmm3, xmm1, xmm14
	vpxor	xmm3, xmm3, xmm10
	vpaddq	xmm2, xmm1, xmm11
	vpxor	xmm2, xmm2, xmm1
	vpaddq	xmm0, xmm2, xmm0
	vpaddq	xmm4, xmm7, xmm4
	vpaddq	xmm5, xmm8, xmm5
	vpaddq	xmm6, xmm3, xmm6
	vpaddq	xmm1, xmm1, xmm15
	add	rcx, -8
	jne	.LBB1_9
	vpaddq	xmm0, xmm4, xmm0
	vpaddq	xmm0, xmm5, xmm0
	vpaddq	xmm0, xmm6, xmm0
	vpshufd	xmm1, xmm0, 78
	vpaddq	xmm0, xmm0, xmm1
	vmovq	r10, xmm0
	cmp	r8, r9
	jne	.LBB1_6
	jmp	.LBB1_11
.LBB1_3:
	xor	r10d, r10d
	jmp	.LBB1_12
.LBB1_5:
	xor	r10d, r10d
	mov	rax, rcx
	.p2align	4, 0x90
.LBB1_6:
	lea	rcx, [rax - 1]
	xor	rcx, rax
	inc	rax
	add	r10, rcx
	cmp	rdx, rax
	jne	.LBB1_6
.LBB1_11:
	mov	rcx, rdx
.LBB1_12:
	lea	rax, [rcx - 1]
	xor	rax, rcx
	add	rax, r10
.LBB1_13:
	vmovaps	xmm6, xmmword ptr [rsp]
	vmovaps	xmm7, xmmword ptr [rsp + 16]
	vmovaps	xmm8, xmmword ptr [rsp + 32]
	vmovaps	xmm9, xmmword ptr [rsp + 48]
	vmovaps	xmm10, xmmword ptr [rsp + 64]
	vmovaps	xmm11, xmmword ptr [rsp + 80]
	vmovaps	xmm12, xmmword ptr [rsp + 96]
	vmovaps	xmm13, xmmword ptr [rsp + 112]
	vmovaps	xmm14, xmmword ptr [rsp + 128]
	vmovaps	xmm15, xmmword ptr [rsp + 144]
	add	rsp, 168
	ret
	.seh_handlerdata
	.section	.text,"xr",one_only,coresum_inner
.Lcfi13:
	.seh_endproc
```

</details>
2018-02-08 06:38:30 +00:00
Scott McMurray
27d4d51670 Simplify RangeInclusive::next[_back]
`match`ing on an `Option<Ordering>` seems to cause some confusion for LLVM; switching to just using comparison operators removes a few jumps from the simple `for` loops I was trying.
2018-02-07 11:11:54 -08:00
Manish Goregaokar
da6dcbc21e
Rollup merge of #47944 - oberien:unboundediterator-trustedlen, r=bluss
Implement TrustedLen for Take<Repeat> and Take<RangeFrom>

This will allow optimization of simple `repeat(x).take(n).collect()` iterators, which are currently not vectorized and have capacity checks.

This will only support a few aggregates on `Repeat` and `RangeFrom`, which might be enough for simple cases, but doesn't optimize more complex ones. Namely, Cycle, StepBy, Filter, FilterMap, Peekable, SkipWhile, Skip, FlatMap, Fuse and Inspect are not marked `TrustedLen` when the inner iterator is infinite.
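
The kind of code this helps, since `TrustedLen` promises an exact length up front and lets `collect` allocate once without per-element capacity checks:

```rust
fn main() {
    // `repeat(7).take(1024)` now reports a trusted exact length,
    // so this collect can reserve all 1024 slots in one allocation.
    let v: Vec<u32> = std::iter::repeat(7).take(1024).collect();
    assert_eq!(v.len(), 1024);
}
```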

Previous discussion can be found in #47082

r? @alexcrichton
2018-02-07 08:30:53 -08:00
kennytm
4f184eb6a3
Rollup merge of #48012 - scottmcm:faster-rangeinclusive-fold, r=alexcrichton
Override try_[r]fold for RangeInclusive

Because the last item needs special handling, it seems that LLVM has trouble canonicalizing the loops in external iteration. With the override, it becomes obvious that the start==end case exits the loop (whereas in external iteration, it is the iteration *after* start==end that exits).

Demo adapted from https://github.com/rust-lang/rust/issues/45222
```rust
#[no_mangle]
pub fn foo3r(n: u64) -> u64 {
    let mut count = 0;
    (0..n).for_each(|_| {
        (0 ..= n).rev().for_each(|j| {
            count += j;
        })
    });
    count
}
```

<details>
 <summary>Current nightly ASM, 100 lines (https://play.rust-lang.org/?gist=f5674c702c6e2045c3aab5d03763e5f6&version=nightly&mode=release)</summary>

```asm
foo3r:
	pushq	%rbx
.Lcfi0:
.Lcfi1:
	testq	%rdi, %rdi
	je	.LBB0_1
	testb	$1, %dil
	jne	.LBB0_4
	xorl	%eax, %eax
	xorl	%r8d, %r8d
	cmpq	$1, %rdi
	jne	.LBB0_11
	jmp	.LBB0_23
.LBB0_1:
	xorl	%eax, %eax
	popq	%rbx
	retq
.LBB0_4:
	xorl	%r8d, %r8d
	movq	$-1, %r9
	xorl	%eax, %eax
	movq	%rdi, %r11
	xorl	%r10d, %r10d
	jmp	.LBB0_5
.LBB0_8:
	addq	%r11, %rax
	movq	%rsi, %r11
	movq	%rdx, %r10
.LBB0_5:
	cmpq	%r11, %r10
	movl	$1, %ecx
	cmovbq	%r9, %rcx
	cmoveq	%r8, %rcx
	testq	%rcx, %rcx
	movl	$0, %esi
	movl	$1, %edx
	je	.LBB0_8
	cmpq	$-1, %rcx
	jne	.LBB0_9
	leaq	-1(%r11), %rsi
	movq	%r10, %rdx
	jmp	.LBB0_8
.LBB0_9:
	movl	$1, %r8d
	cmpq	$1, %rdi
	je	.LBB0_23
.LBB0_11:
	xorl	%r9d, %r9d
	movq	$-1, %r10
.LBB0_12:
	movq	%rdi, %rsi
	xorl	%r11d, %r11d
	jmp	.LBB0_13
.LBB0_16:
	addq	%rsi, %rax
	movq	%rcx, %rsi
	movq	%rbx, %r11
.LBB0_13:
	cmpq	%rsi, %r11
	movl	$1, %edx
	cmovbq	%r10, %rdx
	cmoveq	%r9, %rdx
	testq	%rdx, %rdx
	movl	$0, %ecx
	movl	$1, %ebx
	je	.LBB0_16
	cmpq	$-1, %rdx
	jne	.LBB0_17
	leaq	-1(%rsi), %rcx
	movq	%r11, %rbx
	jmp	.LBB0_16
.LBB0_17:
	movq	%rdi, %rcx
	xorl	%r11d, %r11d
	jmp	.LBB0_18
.LBB0_21:
	addq	%rcx, %rax
	movq	%rsi, %rcx
	movq	%rbx, %r11
.LBB0_18:
	cmpq	%rcx, %r11
	movl	$1, %edx
	cmovbq	%r10, %rdx
	cmoveq	%r9, %rdx
	testq	%rdx, %rdx
	movl	$0, %esi
	movl	$1, %ebx
	je	.LBB0_21
	cmpq	$-1, %rdx
	jne	.LBB0_22
	leaq	-1(%rcx), %rsi
	movq	%r11, %rbx
	jmp	.LBB0_21
.LBB0_22:
	addq	$2, %r8
	cmpq	%rdi, %r8
	jne	.LBB0_12
.LBB0_23:
	popq	%rbx
	retq
.Lfunc_end0:
```
</details><br>

With this PR:
```asm
foo3r:
	test	rcx, rcx
	je	.LBB3_1
	lea	r8, [rcx - 1]
	lea	rdx, [rcx - 2]
	mov	rax, r8
	mul	rdx
	shld	rdx, rax, 63
	imul	r8, r8
	add	r8, rcx
	sub	r8, rdx
	imul	r8, rcx
	mov	rax, r8
	ret
.LBB3_1:
	xor	r8d, r8d
	mov	rax, r8
	ret
```
2018-02-07 03:23:25 +08:00
Scott McMurray
1b1e887f4d Override try_[r]fold for RangeInclusive
Because the last item needs special handling, it seems that LLVM has trouble canonicalizing the loops in external iteration. With the override, it becomes obvious that the start==end case exits the loop (whereas in external iteration, it is the iteration *after* start==end that exits).
2018-02-04 23:48:40 -08:00
oberien
75474ff132 TrustedLen for Repeat / RangeFrom test cases 2018-02-04 16:09:32 +01:00
oberien
f08dec114f Handle Overflow 2018-01-19 21:07:01 +01:00
oberien
d33cc12eed Unit Tests 2018-01-19 14:55:34 +01:00
varkor
919d643b79 Add min and last specialisations for Range 2018-01-09 19:37:44 +00:00
varkor
c23d4500fd Fix behaviour after iterator exhaustion 2018-01-05 18:57:10 +00:00
varkor
439beab41f Remove min from RangeFrom 2018-01-04 15:03:50 +00:00
varkor
f3baa85729 Add tests for specialised Range iter methods 2018-01-04 12:37:00 +00:00
Scott McMurray
eef4d42a3f Fundamental internal iteration with try_fold
This is the core method in terms of which the other methods (fold, all, any, find, position, nth, ...) can be implemented, allowing Iterator implementors to get the full goodness of internal iteration by only overriding one method (per direction).
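
A sketch of the pattern this enables (a hypothetical free function, not the libcore code): `all`-style short-circuiting expressed through `try_fold`, so an iterator that overrides only `try_fold` speeds up every consumer built on top of it.

```rust
fn all_via_try_fold<I, F>(iter: &mut I, mut f: F) -> bool
where
    I: Iterator,
    F: FnMut(I::Item) -> bool,
{
    // `Err(())` short-circuits the fold as soon as the predicate fails.
    iter.try_fold((), |(), x| if f(x) { Ok(()) } else { Err(()) })
        .is_ok()
}

fn main() {
    assert!(all_via_try_fold(&mut (1..5), |x| x > 0));
    assert!(!all_via_try_fold(&mut (1..5), |x| x < 3));
}
```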
2017-10-29 15:45:20 -07:00