bjoernager/rust - mandelbrot.dk

Author	SHA1	Message	Date
Mazdak Farrokhzad	f19bec89d7	Rollup merge of #58122 - matthieu-m:range_incl_perf, r=dtolnay RangeInclusive internal iteration performance improvement. Specialize `Iterator::try_fold` and `DoubleEndedIterator::try_rfold` to improve code generation in all internal iteration scenarios. This change brings the performance of internal iteration with `RangeInclusive` on par with the performance of iteration with `Range`: - Single conditional jump in hot loop, - Unrolling and vectorization, - And even Closed Form substitution. Unfortunately, it only applies to internal iteration. Despite various attempts at streamlining the implementation of `next` and `next_back`, LLVM has stubbornly refused to optimize external iteration appropriately, leaving me with a choice between: - The current implementation, for which Closed Form substitution is performed, but which uses 2 conditional jumps in the hot loop when optimization fails. - An implementation using an `is_done` boolean, which uses 1 conditional jump in the hot loop when optimization fails, allowing unrolling and vectorization, but for which Closed Form substitution fails. In the absence of any conclusive evidence as to which use case matters most, and with no assurance that the lack of Closed Form substitution is not indicative of other optimizations being foiled, there is no way to pick one implementation over the other, and thus I defer to the status quo as far as `next` and `next_back` are concerned.	2019-02-23 09:25:12 +01:00
Ralf Jung	72be9a607b	review or fix remaining miri failures in libcore	2019-02-13 18:21:13 +01:00
Ralf Jung	7f5dc49214	review or fix miri failures in iter, slice, cell, time	2019-02-13 17:56:43 +01:00
Ralf Jung	26ade1cfaa	mark failures expected due to panics	2019-02-13 17:56:43 +01:00
Alexander Regueiro	99ed06eb88	libs: doc comments	2019-02-10 23:57:25 +00:00
Matthieu M	4fed67f942	Fix exhaustion of inclusive range try_fold and try_rfold	2019-02-09 18:42:34 +01:00
Ralf Jung	81613ad7cf	disable tests in Miri	2019-02-07 18:24:10 +01:00
Kevin Leimkuhler	8dea0d0172	Add initial impl of is_sorted to Iterator	2019-01-17 22:34:42 -08:00
Stjepan Glavina	7c083a8fed	Remove unnecessary mut	2019-01-14 12:23:50 +01:00
Stjepan Glavina	e449f3d629	Fix failing test	2019-01-14 00:45:57 +01:00
Stjepan Glavina	04c74f46f0	Add core::iter::once_with	2019-01-13 16:58:08 +01:00
bors	a7be40c65a	Auto merge of #56534 - xfix:copied, r=@SimonSapin Add unstable Iterator::copied() Initially suggested at https://github.com/bluss/rust-itertools/pull/289, however the maintainers of itertools suggested this may be better off in the standard library. The intent of `copied` is to avoid accidentally cloning iterator elements after a code refactoring which causes a structure to be no longer `Copy`. This is a relatively common pattern, as can be seen by calling `rg --pcre2 '[.]map[(][|](?:(\w+)[|] [*]\1|&(\w+)[|] \2)[)]'` on the Rust main repository. Additionally, many uses of `cloned` actually want to simply `Copy`, and changing something to be no longer copyable may introduce an unnoticed performance penalty. Also, this makes sense because the standard library includes `[T].copy_from_slice` to pair with `[T].clone_from_slice`. This also adds `Option::copied`, because it makes sense to pair it with `Iterator::copied`. I don't think this feature is particularly important, but it makes sense to update `Option` along with `Iterator` for consistency.	2018-12-26 19:39:19 +00:00
Mark Rousskov	2a663555dd	Remove licenses	2018-12-25 21:08:33 -07:00
Konrad Borowski	8ac5380ea0	Merge branch 'master' into copied	2018-12-23 16:47:11 +01:00
Clar Fon	fb18ddaaaa	Add DoubleEndedIterator::nth_back	2018-12-20 01:18:04 -05:00
Shotaro Yamada	f0483f76e6	Remove `<Cycle as Iterator>::try_fold` override. It was an incorrect optimization.	2018-12-17 15:00:22 +09:00
Shotaro Yamada	fbe5aa57ed	Override Cycle::try_fold. Benchmark (name | old ns/iter | new ns/iter | diff ns/iter | diff % | speedup): iter::bench_cycle_take_ref_sum | 927,152 | 927,194 | 42 | 0.00% | x 1.00; iter::bench_cycle_take_sum | 938,129 | 603,492 | -334,637 | -35.67% | x 1.55	2018-12-09 00:01:09 +09:00
Konrad Borowski	a964307999	Add a test for cloned side effects	2018-12-05 17:53:34 +01:00
Konrad Borowski	fe45e9a886	Add tests for Iterator::copied()	2018-12-05 15:40:15 +01:00
Simon Sapin	641c4909e4	Add std::iter::successors	2018-11-20 18:22:40 +01:00
Артём Павлов [Artyom Pavlov]	6357021294	fix test	2018-11-19 01:01:06 +03:00
Артём Павлов [Artyom Pavlov]	6ad61b9c3b	tests	2018-11-18 23:14:52 +03:00
Артём Павлов [Artyom Pavlov]	126b71f690	revert	2018-11-18 21:39:23 +03:00
Scott McMurray	0a3bd9b6ab	Use impl_header_lifetime_elision in libcore	2018-09-29 21:33:35 -07:00
Emerentius	000aff604e	specialize StepBy<Range(Inclusive)>. The originally generated code was highly suboptimal; this brings it close to the same code or even exactly the same as a manual while-loop by eliminating a branch and the double stepping of n-1 + 1 steps. The intermediate trait lets us circumvent the specialization type inference bugs.	2018-06-19 19:33:54 +02:00
Thayne McCombs	87941b079a	Stabilize iterator_repeat_with Fixes #48169	2018-06-02 15:52:09 -06:00
Aleksey Kladov	591dd5d992	Add Iterator::find_map	2018-04-03 00:47:00 +03:00
Scott McMurray	11fefeb61c	Add a Zip::nth test for side effects	2018-03-01 02:17:50 -08:00
Scott McMurray	70d5a4600b	Specialize Zip::nth for TrustedRandomAccess Makes the bench asked about on URLO 58x faster :)	2018-03-01 01:57:25 -08:00
Mazdak Farrokhzad	0e394010e6	core::iter::Flatten: update FlatMap & Flatten according to discussion	2018-02-20 08:28:33 +01:00
Mazdak Farrokhzad	6af23f977c	add Iterator::flatten and redefine flat_map(f) in terms of map(f).flatten()	2018-02-20 08:27:32 +01:00
kennytm	bebd2fbfc8	Rollup merge of #48156 - Centril:feature/iterator_repeat_with, r=alexcrichton Add std/core::iter::repeat_with Adds an iterator primitive `repeat_with` which is the "lazy" version of `repeat` but also more flexible since you can build up state with the `FnMut`. The design is mostly taken from `repeat`. r? @rust-lang/libs cc @withoutboats, @scottmcm	2018-02-14 18:25:22 +08:00
Mazdak Farrokhzad	55c669c4d9	core::iter::repeat_with: fix tests some more	2018-02-12 09:15:13 +01:00
Mazdak Farrokhzad	f025eff21d	core::iter::repeat_with: fix tests	2018-02-12 09:13:47 +01:00
Mazdak Farrokhzad	0f789aad2b	add core::iter::repeat_with	2018-02-12 08:05:46 +01:00
Scott McMurray	b5cb393cf5	Use is_empty in range iteration exhaustion tests	2018-02-09 17:54:27 -08:00
Scott McMurray	4f8049a2b0	Add Range[Inclusive]::is_empty During the RFC, it was discussed that figuring out whether a range is empty was subtle, and thus there should be a clear and obvious way to do it. It can't just be ExactSizeIterator::is_empty (also unstable) because not all ranges are ExactSize -- not even Range<i32> or RangeInclusive<usize>.	2018-02-09 01:47:18 -08:00
bors	932c736479	Auto merge of #48057 - scottmcm:less-match-more-compare, r=dtolnay Simplify RangeInclusive::next[_back] `match`ing on an `Option<Ordering>` seems to cause some confusion for LLVM; switching to just using comparison operators removes a few jumps from the simple `for` loops I was trying. cc https://github.com/rust-lang/rust/issues/45222 https://github.com/rust-lang/rust/issues/28237#issuecomment-363706510 Example: ```rust #[no_mangle] pub fn coresum(x: std::ops::RangeInclusive<u64>) -> u64 { let mut sum = 0; for i in x { sum += i ^ (i-1); } sum } ``` Today: ```asm coresum: xor r8d, r8d mov r9, -1 xor eax, eax jmp .LBB0_1 .LBB0_4: lea rcx, [rdi - 1] xor rcx, rdi add rax, rcx mov rsi, rdx mov rdi, r10 .LBB0_1: cmp rdi, rsi mov ecx, 1 cmovb rcx, r9 cmove rcx, r8 test rcx, rcx mov edx, 0 mov r10d, 1 je .LBB0_4 // 1 cmp rcx, -1 jne .LBB0_5 // 2 lea r10, [rdi + 1] mov rdx, rsi jmp .LBB0_4 // 3 .LBB0_5: ret ``` With this PR: ```asm coresum: cmp rcx, rdx jbe .LBB0_2 xor eax, eax ret .LBB0_2: xor r8d, r8d mov r9d, 1 xor eax, eax .p2align 4, 0x90 .LBB0_3: lea r10, [rcx + 1] cmp rcx, rdx cmovae rdx, r8 cmovae r10, r9 lea r11, [rcx - 1] xor r11, rcx add rax, r11 mov rcx, r10 cmp r10, rdx jbe .LBB0_3 // Just this ret ``` <details><summary>Though using internal iteration (`.map(|i| i ^ (i-1)).sum()`) is still shorter to type, and lets the compiler unroll it</summary> ```asm coresum_inner: .Lcfi0: .seh_proc coresum_inner sub rsp, 168 .Lcfi1: .seh_stackalloc 168 vmovdqa xmmword ptr [rsp + 144], xmm15 .Lcfi2: .seh_savexmm 15, 144 vmovdqa xmmword ptr [rsp + 128], xmm14 .Lcfi3: .seh_savexmm 14, 128 vmovdqa xmmword ptr [rsp + 112], xmm13 .Lcfi4: .seh_savexmm 13, 112 vmovdqa xmmword ptr [rsp + 96], xmm12 .Lcfi5: .seh_savexmm 12, 96 vmovdqa xmmword ptr [rsp + 80], xmm11 .Lcfi6: .seh_savexmm 11, 80 vmovdqa xmmword ptr [rsp + 64], xmm10 .Lcfi7: .seh_savexmm 10, 64 vmovdqa xmmword ptr [rsp + 48], xmm9 .Lcfi8: .seh_savexmm 9, 48 vmovdqa xmmword ptr [rsp + 32], xmm8 .Lcfi9: 
.seh_savexmm 8, 32 vmovdqa xmmword ptr [rsp + 16], xmm7 .Lcfi10: .seh_savexmm 7, 16 vmovdqa xmmword ptr [rsp], xmm6 .Lcfi11: .seh_savexmm 6, 0 .Lcfi12: .seh_endprologue cmp rdx, rcx jae .LBB1_2 xor eax, eax jmp .LBB1_13 .LBB1_2: mov r8, rdx sub r8, rcx jbe .LBB1_3 cmp r8, 7 jbe .LBB1_5 mov rax, r8 and rax, -8 mov r9, r8 and r9, -8 je .LBB1_5 add rax, rcx vmovq xmm0, rcx vpshufd xmm0, xmm0, 68 mov ecx, 1 vmovq xmm1, rcx vpslldq xmm1, xmm1, 8 vpaddq xmm1, xmm0, xmm1 vpxor xmm0, xmm0, xmm0 vpcmpeqd xmm11, xmm11, xmm11 vmovdqa xmm12, xmmword ptr [rip + __xmm@00000000000000010000000000000001] vmovdqa xmm13, xmmword ptr [rip + __xmm@00000000000000030000000000000003] vmovdqa xmm14, xmmword ptr [rip + __xmm@00000000000000050000000000000005] vmovdqa xmm15, xmmword ptr [rip + __xmm@00000000000000080000000000000008] mov rcx, r9 vpxor xmm4, xmm4, xmm4 vpxor xmm5, xmm5, xmm5 vpxor xmm6, xmm6, xmm6 .p2align 4, 0x90 .LBB1_9: vpaddq xmm7, xmm1, xmmword ptr [rip + __xmm@00000000000000020000000000000002] vpaddq xmm9, xmm1, xmmword ptr [rip + __xmm@00000000000000040000000000000004] vpaddq xmm10, xmm1, xmmword ptr [rip + __xmm@00000000000000060000000000000006] vpaddq xmm8, xmm1, xmm12 vpxor xmm7, xmm8, xmm7 vpaddq xmm2, xmm1, xmm13 vpxor xmm8, xmm2, xmm9 vpaddq xmm3, xmm1, xmm14 vpxor xmm3, xmm3, xmm10 vpaddq xmm2, xmm1, xmm11 vpxor xmm2, xmm2, xmm1 vpaddq xmm0, xmm2, xmm0 vpaddq xmm4, xmm7, xmm4 vpaddq xmm5, xmm8, xmm5 vpaddq xmm6, xmm3, xmm6 vpaddq xmm1, xmm1, xmm15 add rcx, -8 jne .LBB1_9 vpaddq xmm0, xmm4, xmm0 vpaddq xmm0, xmm5, xmm0 vpaddq xmm0, xmm6, xmm0 vpshufd xmm1, xmm0, 78 vpaddq xmm0, xmm0, xmm1 vmovq r10, xmm0 cmp r8, r9 jne .LBB1_6 jmp .LBB1_11 .LBB1_3: xor r10d, r10d jmp .LBB1_12 .LBB1_5: xor r10d, r10d mov rax, rcx .p2align 4, 0x90 .LBB1_6: lea rcx, [rax - 1] xor rcx, rax inc rax add r10, rcx cmp rdx, rax jne .LBB1_6 .LBB1_11: mov rcx, rdx .LBB1_12: lea rax, [rcx - 1] xor rax, rcx add rax, r10 .LBB1_13: vmovaps xmm6, xmmword ptr [rsp] vmovaps xmm7, xmmword ptr [rsp + 
16] vmovaps xmm8, xmmword ptr [rsp + 32] vmovaps xmm9, xmmword ptr [rsp + 48] vmovaps xmm10, xmmword ptr [rsp + 64] vmovaps xmm11, xmmword ptr [rsp + 80] vmovaps xmm12, xmmword ptr [rsp + 96] vmovaps xmm13, xmmword ptr [rsp + 112] vmovaps xmm14, xmmword ptr [rsp + 128] vmovaps xmm15, xmmword ptr [rsp + 144] add rsp, 168 ret .seh_handlerdata .section .text,"xr",one_only,coresum_inner .Lcfi13: .seh_endproc ``` </details>	2018-02-08 06:38:30 +00:00
Scott McMurray	27d4d51670	Simplify RangeInclusive::next[_back] `match`ing on an `Option<Ordering>` seems to cause some confusion for LLVM; switching to just using comparison operators removes a few jumps from the simple `for` loops I was trying.	2018-02-07 11:11:54 -08:00
Manish Goregaokar	da6dcbc21e	Rollup merge of #47944 - oberien:unboundediterator-trustedlen, r=bluss Implement TrustedLen for Take<Repeat> and Take<RangeFrom> This will allow optimization of simple `repeat(x).take(n).collect()` iterators, which are currently not vectorized and have capacity checks. This will only support a few aggregates on `Repeat` and `RangeFrom`, which might be enough for simple cases, but doesn't optimize more complex ones. Namely, Cycle, StepBy, Filter, FilterMap, Peekable, SkipWhile, Skip, FlatMap, Fuse and Inspect are not marked `TrustedLen` when the inner iterator is infinite. Previous discussion can be found in #47082 r? @alexcrichton	2018-02-07 08:30:53 -08:00
kennytm	4f184eb6a3	Rollup merge of #48012 - scottmcm:faster-rangeinclusive-fold, r=alexcrichton Override try_[r]fold for RangeInclusive Because the last item needs special handling, it seems that LLVM has trouble canonicalizing the loops in external iteration. With the override, it becomes obvious that the start==end case exits the loop (as opposed to the one after that exiting the loop in external iteration). Demo adapted from https://github.com/rust-lang/rust/issues/45222 ```rust #[no_mangle] pub fn foo3r(n: u64) -> u64 { let mut count = 0; (0..n).for_each(|_| { (0 ..= n).rev().for_each(|j| { count += j; }) }); count } ``` <details> <summary>Current nightly ASM, 100 lines (https://play.rust-lang.org/?gist=f5674c702c6e2045c3aab5d03763e5f6&version=nightly&mode=release)</summary> ```asm foo3r: pushq %rbx .Lcfi0: .Lcfi1: testq %rdi, %rdi je .LBB0_1 testb $1, %dil jne .LBB0_4 xorl %eax, %eax xorl %r8d, %r8d cmpq $1, %rdi jne .LBB0_11 jmp .LBB0_23 .LBB0_1: xorl %eax, %eax popq %rbx retq .LBB0_4: xorl %r8d, %r8d movq $-1, %r9 xorl %eax, %eax movq %rdi, %r11 xorl %r10d, %r10d jmp .LBB0_5 .LBB0_8: addq %r11, %rax movq %rsi, %r11 movq %rdx, %r10 .LBB0_5: cmpq %r11, %r10 movl $1, %ecx cmovbq %r9, %rcx cmoveq %r8, %rcx testq %rcx, %rcx movl $0, %esi movl $1, %edx je .LBB0_8 cmpq $-1, %rcx jne .LBB0_9 leaq -1(%r11), %rsi movq %r10, %rdx jmp .LBB0_8 .LBB0_9: movl $1, %r8d cmpq $1, %rdi je .LBB0_23 .LBB0_11: xorl %r9d, %r9d movq $-1, %r10 .LBB0_12: movq %rdi, %rsi xorl %r11d, %r11d jmp .LBB0_13 .LBB0_16: addq %rsi, %rax movq %rcx, %rsi movq %rbx, %r11 .LBB0_13: cmpq %rsi, %r11 movl $1, %edx cmovbq %r10, %rdx cmoveq %r9, %rdx testq %rdx, %rdx movl $0, %ecx movl $1, %ebx je .LBB0_16 cmpq $-1, %rdx jne .LBB0_17 leaq -1(%rsi), %rcx movq %r11, %rbx jmp .LBB0_16 .LBB0_17: movq %rdi, %rcx xorl %r11d, %r11d jmp .LBB0_18 .LBB0_21: addq %rcx, %rax movq %rsi, %rcx movq %rbx, %r11 .LBB0_18: cmpq %rcx, %r11 movl $1, %edx cmovbq %r10, %rdx cmoveq %r9, %rdx testq %rdx, %rdx movl $0, %esi 
movl $1, %ebx je .LBB0_21 cmpq $-1, %rdx jne .LBB0_22 leaq -1(%rcx), %rsi movq %r11, %rbx jmp .LBB0_21 .LBB0_22: addq $2, %r8 cmpq %rdi, %r8 jne .LBB0_12 .LBB0_23: popq %rbx retq .Lfunc_end0: ``` </details><br> With this PR: ```asm foo3r: test rcx, rcx je .LBB3_1 lea r8, [rcx - 1] lea rdx, [rcx - 2] mov rax, r8 mul rdx shld rdx, rax, 63 imul r8, r8 add r8, rcx sub r8, rdx imul r8, rcx mov rax, r8 ret .LBB3_1: xor r8d, r8d mov rax, r8 ret ```	2018-02-07 03:23:25 +08:00
Scott McMurray	1b1e887f4d	Override try_[r]fold for RangeInclusive Because the last item needs special handling, it seems that LLVM has trouble canonicalizing the loops in external iteration. With the override, it becomes obvious that the start==end case exits the loop (as opposed to the one after that exiting the loop in external iteration).	2018-02-04 23:48:40 -08:00
oberien	75474ff132	TrustedLen for Repeat / RangeFrom test cases	2018-02-04 16:09:32 +01:00
oberien	f08dec114f	Handle Overflow	2018-01-19 21:07:01 +01:00
oberien	d33cc12eed	Unit Tests	2018-01-19 14:55:34 +01:00
varkor	919d643b79	Add `min` and `last` specialisations for `Range`	2018-01-09 19:37:44 +00:00
varkor	c23d4500fd	Fix behaviour after iterator exhaustion	2018-01-05 18:57:10 +00:00
varkor	439beab41f	Remove min from RangeFrom	2018-01-04 15:03:50 +00:00
varkor	f3baa85729	Add tests for specialised Range iter methods	2018-01-04 12:37:00 +00:00
Scott McMurray	eef4d42a3f	Fundamental internal iteration with try_fold This is the core method in terms of which the other methods (fold, all, any, find, position, nth, ...) can be implemented, allowing Iterator implementors to get the full goodness of internal iteration by only overriding one method (per direction).	2017-10-29 15:45:20 -07:00
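
Commit eef4d42a3f describes `try_fold` as the method in terms of which `fold`, `all`, `any`, `find` and others can be implemented. The sketch below illustrates that idea only; `all_via_try_fold` and `find_via_try_fold` are hypothetical helper names introduced here, not the standard library's actual implementations, and `Option`/`Result` are used as the short-circuiting return types.

```rust
/// Hypothetical helper: `all` expressed through `try_fold`.
/// Returning `None` from the closure stops the iteration early;
/// `Some(())` means "keep going".
fn all_via_try_fold<I, P>(iter: &mut I, mut pred: P) -> bool
where
    I: Iterator,
    P: FnMut(I::Item) -> bool,
{
    iter.try_fold((), |(), x| if pred(x) { Some(()) } else { None })
        .is_some()
}

/// Hypothetical helper: `find` expressed through `try_fold`.
/// The matching item leaves the loop through the `Err` side.
fn find_via_try_fold<I, P>(iter: &mut I, mut pred: P) -> Option<I::Item>
where
    I: Iterator,
    P: FnMut(&I::Item) -> bool,
{
    iter.try_fold((), |(), x| if pred(&x) { Err(x) } else { Ok(()) })
        .err()
}

fn main() {
    let mut a = 1..=5;
    assert!(all_via_try_fold(&mut a, |x| x > 0));

    let mut b = 1..=5;
    assert_eq!(find_via_try_fold(&mut b, |x| *x % 2 == 0), Some(2));
    // Short-circuiting leaves the rest of the iterator untouched.
    assert_eq!(b.next(), Some(3));
}
```

Because both helpers go through `try_fold`, an iterator that overrides only `try_fold` would also speed up these derived operations, which is the design point the commit message makes.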

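
Several commits above (1b1e887f4d, 4fed67f942, f19bec89d7) concern internal versus external iteration over `RangeInclusive`, and 4f8049a2b0 adds `is_empty`. The following sketch contrasts the two iteration styles on the `coresum` computation quoted in the log. The function names `sum_external` and `sum_internal` are hypothetical, and wrapping arithmetic is an assumption added here so the example cannot overflow. Whether the two compile to the same machine code depends on the compiler version and optimization level; the example only checks that they agree on the result.

```rust
use std::ops::RangeInclusive;

/// External iteration: the `for` loop calls `next()` once per item.
fn sum_external(range: RangeInclusive<u64>) -> u64 {
    let mut sum = 0u64;
    for i in range {
        sum = sum.wrapping_add(i ^ i.wrapping_sub(1));
    }
    sum
}

/// Internal iteration: `fold` hands the whole loop to the iterator.
fn sum_internal(range: RangeInclusive<u64>) -> u64 {
    range.fold(0u64, |acc, i| acc.wrapping_add(i ^ i.wrapping_sub(1)))
}

fn main() {
    assert_eq!(sum_external(1..=1000), sum_internal(1..=1000));

    // After `try_fold` runs to completion, the inclusive range is exhausted
    // from both ends and reports itself as empty.
    let mut r = 1..=3u64;
    let total = r.try_fold(0u64, |acc, i| acc.checked_add(i));
    assert_eq!(total, Some(6));
    assert!(r.is_empty());
    assert_eq!(r.next(), None);
    assert_eq!(r.next_back(), None);
}
```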

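
The log also adds several iterator constructors and adapters: `copied`, `successors`, `repeat_with`, `find_map`, `flatten` and `nth_back`. A short usage sketch follows, assuming a toolchain where these are stable (most were still unstable when the commits landed). The wrapper function names are hypothetical and exist only to keep each example separate.

```rust
use std::iter;

/// `copied` yields `T` from an iterator over `&T` when `T: Copy`.
fn doubled(values: &[u32]) -> Vec<u32> {
    values.iter().copied().map(|x| x * 2).collect()
}

/// `successors` builds each item from the previous one and stops at `None`.
fn powers_of_ten() -> Vec<u32> {
    iter::successors(Some(1u32), |&x| x.checked_mul(10)).collect()
}

/// `repeat_with` calls an `FnMut` for every item, so it can carry state.
fn first_evens(count: usize) -> Vec<i32> {
    let mut n = 0;
    iter::repeat_with(|| {
        n += 2;
        n
    })
    .take(count)
    .collect()
}

/// `find_map` returns the first `Some` produced by the closure.
fn first_parsed(items: &[&str]) -> Option<i32> {
    items.iter().find_map(|s| s.parse::<i32>().ok())
}

/// `flat_map(f)` behaves like `map(f).flatten()`.
fn flatten_two_ways(nested: &[Vec<i32>]) -> (Vec<i32>, Vec<i32>) {
    let a = nested.iter().map(|v| v.iter().copied()).flatten().collect();
    let b = nested.iter().flat_map(|v| v.iter().copied()).collect();
    (a, b)
}

fn main() {
    assert_eq!(doubled(&[1, 2, 3]), [2, 4, 6]);
    // 10^0 through 10^9 fit in a u32; 10^10 does not.
    assert_eq!(powers_of_ten().len(), 10);
    assert_eq!(first_evens(3), [2, 4, 6]);
    assert_eq!(first_parsed(&["a", "7", "9"]), Some(7));
    let (a, b) = flatten_two_ways(&[vec![1, 2], vec![3]]);
    assert_eq!(a, b);
    // `nth_back(1)` skips one item from the back and returns the next.
    assert_eq!((1..=5).nth_back(1), Some(4));
    // `Option::copied` pairs with `Iterator::copied`.
    assert_eq!([7u8].first().copied(), Some(7));
}
```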
65 commits