Commit graph

362 commits

Author SHA1 Message Date
Stuart Cook
c6bf3a01ef
Rollup merge of #137880 - EnzymeAD:autodiff-batching, r=oli-obk
Autodiff batching

Enzyme supports batching, which is especially known from the ML side when training neural networks.
There we would normally have a training loop, where in each iteration we would pass in some data (e.g. an image), and a target vector. Based on how close we are with our prediction we compute our loss, and then use backpropagation to compute the gradients and update our weights.
That's quite inefficient, so what you normally do is passing in a batch of 8/16/.. images and targets, and compute the gradients for those all at once, allowing better optimizations.

Enzyme supports batching in two ways, the first one (which I implemented here) just accepts a Batch size,
and then each Dual/Duplicated argument has not one, but N shadow arguments.  So instead of
```rs
for i in 0..100 {
   df(x[i], y[i], 1234);
}
```
You can now do
```rs
for i in 0..100.step_by(4) {
   df(x[i+0],x[i+1],x[i+2],x[i+3], y[i+0], y[i+1], y[i+2], y[i+3], 1234);
}
```
which will give the same results, but allows better compiler optimizations. See the testcase for details.

There is a second variant, where we can mark certain arguments and instead of having to pass in N shadow arguments, Enzyme assumes that the argument is N times longer. I.e. instead of accepting 4 slices with 12 floats each, we would accept one slice with 48 floats. I'll implement this over the next days.

I will also add more tests for both modes.

For any one preferring some more interactive explanation, here's a video of Tim's llvm dev talk, where he presents his work. https://www.youtube.com/watch?v=edvaLAL5RqU
I'll also add some other docs to the dev guide and user docs in another PR.

r? ghost

Tracking:

- https://github.com/rust-lang/rust/issues/124509
- https://github.com/rust-lang/rust/issues/135283
2025-04-05 13:18:13 +11:00
Manuel Drehwald
b7c63a973f add autodiff batching backend 2025-04-04 14:24:23 -04:00
Daniel Paoliello
79b9664091 Reduce visibility of most items in rustc_codegen_llvm 2025-03-25 16:36:47 +11:00
Zalathar
d07ef5b0e1 coverage: Add LLVM plumbing for expansion regions
This is currently unused, but paves the way for future work on expansion
regions without having to worry about the FFI parts.
2025-03-20 12:40:36 +11:00
Matthias Krüger
63c548d82c
Rollup merge of #137549 - oli-obk:llvm-ffi, r=davidtwco
Clean up various LLVM FFI things in codegen_llvm

cc ```@ZuseZ4``` I touched some autodiff parts

The major change of this PR is [bfd88ce](bfd88cead0) which makes `CodegenCx` generic just like `GenericBuilder`

The other commits mostly took advantage of the new feature of making extern functions safe, but also just used some wrappers that were already there and shrunk unsafe blocks.

best reviewed commit-by-commit
2025-03-07 19:15:34 +01:00
Michael Goulet
a59a8f9e75 Revert "Auto merge of #135335 - oli-obk:push-zxwssomxxtnq, r=saethlin"
This reverts commit a7a6c64a65, reversing
changes made to ebbe63891f.
2025-03-02 18:52:48 +00:00
bors
0c72c0d11a Auto merge of #133250 - DianQK:embed-bitcode-pgo, r=nikic
The embedded bitcode should always be prepared for LTO/ThinLTO

Fixes #115344. Fixes #117220.

There are currently two methods for generating bitcode that used for LTO. One method involves using `-C linker-plugin-lto` to emit object files as bitcode, which is the typical setting used by cargo. The other method is through `-C embed-bitcode=yes`.

When using with `-C embed-bitcode=yes -C lto=no`, we run a complete non-LTO LLVM pipeline to obtain bitcode, then the bitcode is used for LTO. We run the Call Graph Profile Pass twice on the same module.

This PR is doing something similar to LLVM's `buildFatLTODefaultPipeline`, obtaining the bitcode for embedding after running `buildThinLTOPreLinkDefaultPipeline`.

r? nikic
2025-03-01 08:22:18 +00:00
许杰友 Jieyou Xu (Joe)
d65f568302
Rollup merge of #137713 - vayunbiyani:fix-enzyme-build-errors, r=oli-obk
Fix enzyme build errors

After [this PR](https://github.com/rust-lang/rust/pull/136428) was merged, I switched to master and attempted building `./x.py build --stage 1 library` with the config mentioned in the enzyme rustbook but it resulted in some errors tho the config.example.toml build succeeded
The errors were re:
### 1. Use of ref in match patterns
The errors were related to match ergonomics in Rust 2024, where ref is no longer needed when matching on references. Examples:
```
error: binding modifiers may only be written when the default binding mode is `move`
   --> compiler/rustc_builtin_macros/src/autodiff.rs:136:31
    |
136 |             Annotatable::Item(ref iitem) => {
    |                               ^^^ binding modifier not allowed under `ref` default binding mode
    |
    = note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/match-ergonomics.html>
note: matching on a reference type with a non-reference pattern changes the default binding mode
   --> compiler/rustc_builtin_macros/src/autodiff.rs:136:13
    |
136 |             Annotatable::Item(ref iitem) => {
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this matches on type `&_`
help: remove the unnecessary binding modifier
    |
136 -             Annotatable::Item(ref iitem) => {
136 +             Annotatable::Item(iitem) => {
    |

error: binding modifiers may only be written when the default binding mode is `move`
   --> compiler/rustc_builtin_macros/src/autodiff.rs:146:36
    |
146 |             Annotatable::AssocItem(ref assoc_item, _) => {
    |                                    ^^^ binding modifier not allowed under `ref` default binding mode
    |
    = note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/match-ergonomics.html>
note: matching on a reference type with a non-reference pattern changes the default binding mode
   --> compiler/rustc_builtin_macros/src/autodiff.rs:146:13
    |
146 |             Annotatable::AssocItem(ref assoc_item, _) => {
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this matches on type `&_`
help: remove the unnecessary binding modifier
    |
146 -             Annotatable::AssocItem(ref assoc_item, _) => {
146 +             Annotatable::AssocItem(assoc_item, _) => {
    |

error: binding modifiers may only be written when the default binding mode is `move`
   --> compiler/rustc_builtin_macros/src/autodiff.rs:174:31
    |
174 | ...       Annotatable::Item(ref iitem) => (iitem.vis.clone(), iitem.ide...
    |                             ^^^ binding modifier not allowed under `ref` default binding mode
    |
    = note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/match-ergonomics.html>
note: matching on a reference type with a non-reference pattern changes the default binding mode
   --> compiler/rustc_builtin_macros/src/autodiff.rs:174:13
    |
174 | ...   Annotatable::Item(ref iitem) => (iitem.vis.clone(), iitem.ident.c...
    |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this matches on type `&_`
help: remove the unnecessary binding modifier
    |
174 -             Annotatable::Item(ref iitem) => (iitem.vis.clone(), iitem.ident.clone()),
174 +             Annotatable::Item(iitem) => (iitem.vis.clone(), iitem.ident.clone()),
    |

error: binding modifiers may only be written when the default binding mode is `move`
   --> compiler/rustc_builtin_macros/src/autodiff.rs:175:36
    |
175 |             Annotatable::AssocItem(ref assoc_item, _) => {
    |                                    ^^^ binding modifier not allowed under `ref` default binding mode
    |
    = note: for more information, see <https://doc.rust-lang.org/nightly/edition-guide/rust-2024/match-ergonomics.html>
note: matching on a reference type with a non-reference pattern changes the default binding mode
   --> compiler/rustc_builtin_macros/src/autodiff.rs:175:13
    |
175 |             Annotatable::AssocItem(ref assoc_item, _) => {
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this matches on type `&_`
help: remove the unnecessary binding modifier
    |
175 -             Annotatable::AssocItem(ref assoc_item, _) => {
175 +             Annotatable::AssocItem(assoc_item, _) => {
    |

error: could not compile `rustc_builtin_macros` (lib) due to 4 previous errors
warning: build failed, waiting for other jobs to finish...
Build completed unsuccessfully in 0:19:39
```
### 2. the use of external C blocks without unsafe in compiler/rustc_codegen_llvm/src/llvm/enzyme_ffi.rs (I don't have the error message handy)

The first commit fixes the errors above

---
## Additional Improvement:
`@ZuseZ4` suggested we consolidate the variants under `#[cfg(llvm_enzyme)]` and `#[cfg(not(llvm_enzyme))]` by conditionally checking for `cfg!(llvm_enzyme)` instead. This way,  the autodiff code is compiled but not executed avoiding such regressions

r? `@ZuseZ4`
cc: `@oli-obk`
2025-02-28 21:42:01 +08:00
Vayun Biyani
cb53e97870 Fix enzyme build errors 2025-02-25 17:25:50 +05:30
Oli Scherer
553828c6f4 Mark more LLVM FFI as safe 2025-02-24 15:11:29 +00:00
Oli Scherer
3565603d25 Use a safe wrapper around an LLVM FFI function 2025-02-24 15:11:29 +00:00
Oli Scherer
396baa750e Make allocator shim creation mostly use safe code 2025-02-24 15:11:29 +00:00
Oli Scherer
a54bfcf52b Use safe FFI for various functions in codegen_llvm 2025-02-24 15:05:56 +00:00
David Wood
a5615d3c62
codegen_llvm: avoid Deref impls w/ extern type
`rustc_codegen_llvm` relied on `Deref` impls where `Deref::Target` was
or contained an extern type - in my experimental implementation of
rust-lang/rfcs#3729, this isn't possible as the `Target` associated
type's `?Sized` bound cannot be relaxed backwards compatibly (unless we
come up with some way of doing this).

In later pull requests with the rust-lang/rfcs#3729 implementation,
breakage like this could only occur for nightly users relying on the
`extern_types` feature.

Upstreaming this to avoid needing to keep carrying this patch locally,
and I think it'll necessarily need to change eventually.
2025-02-24 08:08:55 +00:00
bors
e0be1a0262 Auto merge of #137271 - nikic:gep-nuw-2, r=scottmcm
Emit getelementptr inbounds nuw for pointer::add()

Lower pointer::add (via intrinsic::offset with unsigned offset) to getelementptr inbounds nuw on LLVM versions that support it. This lets LLVM make use of the pre-condition that the offset addition does not wrap in an unsigned sense. Together with inbounds, this also implies that the offset is non-negative.

Fixes https://github.com/rust-lang/rust/issues/137217.
2025-02-24 03:06:16 +00:00
DianQK
1a99ca8da9
The embedded bitcode should always be prepared for LTO/ThinLTO 2025-02-23 21:23:36 +08:00
bors
15469f8f8a Auto merge of #137420 - matthiaskrgr:rollup-rr0q37f, r=matthiaskrgr
Rollup of 9 pull requests

Successful merges:

 - #136910 (Implement feature `isolate_most_least_significant_one` for integer types)
 - #137183 (Prune dead regionck code)
 - #137333 (Use `edition = "2024"` in the compiler (redux))
 - #137356 (Ferris 🦀 Identifier naming conventions)
 - #137362 (Add build step log for `run-make-support`)
 - #137377 (Always allow reusing cratenum in CrateLoader::load)
 - #137388 (Fix(lib/fs/tests): Disable rename POSIX semantics FS tests under Windows 7)
 - #137410 (Use StableHasher + Hash64 for dep_tracking_hash)
 - #137413 (jubilee cleared out the review queue)

r? `@ghost`
`@rustbot` modify labels: rollup
2025-02-22 13:32:44 +00:00
Manuel Drehwald
e2d250c3f6 update autodiff flags 2025-02-21 21:51:20 -05:00
Michael Goulet
e1819a889a Fix overcapturing, unsafe extern blocks, and new unsafe ops 2025-02-22 00:01:48 +00:00
Oli Scherer
ce7f58bd91 Merge two operations that were always performed together 2025-02-20 11:24:00 +00:00
Oli Scherer
ea7180813b Create safe helper for LLVMSetDLLStorageClass 2025-02-20 11:15:00 +00:00
Nikita Popov
5e9d8a7d55 Switch to the LLVMBuildGEPWithNoWrapFlags API
This API allows us to set the nuw flag as well.
2025-02-19 11:32:32 +01:00
Scott McMurray
9ad6839f7a Set both nuw and nsw in slice size calculation
There's an old note in the code to do this, and now that LLVM-C has an API for it, we might as well.
2025-02-13 21:26:48 -08:00
Daniel Paoliello
e7cef26a3d cg_llvm: Reduce visibility of all functions in the llvm module 2025-02-13 12:36:25 +11:00
Zalathar
659e20fa75 Remove LLVMGetModuleContext
This was unused after the removal of `-Zprofile` in #131829.
2025-02-13 12:36:09 +11:00
Oli Scherer
dcf1e4d72b Document some safety constraints and use more safe wrappers 2025-02-11 09:47:13 +00:00
Matthias Krüger
78f5bddd57
Rollup merge of #136419 - EnzymeAD:autodiff-tests, r=onur-ozkan,jieyouxu
adding autodiff tests

I'd like to get started with upstreaming some tests, even though I'm still waiting for an answer on how to best integrate the enzyme pass. Can we therefore temporarily support the -Z llvm-plugins here without too much effort? And in that case, how would that work? I saw you can do remapping, e.g. `rust-src-base`, but I don't think that will give me the path to libEnzyme.so. Do you have another suggestion?

Other than that this test simply checks that the derivative of `x*x` is `2.0 * x`, which in this case is computed as
`%0 = fadd fast double %x.0.val, %x.0.val`
(I'll add a few more tests and move it to an autodiff folder if we can use the -Z flag)

r? ``@jieyouxu``

Locally at least `-Zllvm-plugins=${PWD}/build/x86_64-unknown-linux-gnu/enzyme/build/Enzyme/libEnzyme-19.so` seems to work if I copy the command I get from x.py test and run it manually. However, running x.py test itself fails.

Tracking:

- https://github.com/rust-lang/rust/issues/124509

Zulip discussion: https://rust-lang.zulipchat.com/#narrow/channel/326414-t-infra.2Fbootstrap/topic/Enzyme.20build.20changes
2025-02-10 16:38:23 +01:00
Manuel Drehwald
21d096184e fix non-enzyme builds 2025-02-07 22:27:46 -05:00
Daniel Paoliello
2a6b27444a Remove dead code from rustc_codegen_llvm and the LLVM wrapper 2025-02-06 16:53:52 -08:00
Zalathar
042fd8c24a Remove some unused glob re-exports
These were detected by temporarily making `mod llvm` non-public.
2025-02-06 12:10:45 +11:00
León Orell Valerian Liehr
75989e98d8
Rollup merge of #136375 - Zalathar:llvm-di-builder, r=workingjubilee
cg_llvm: Replace some DIBuilder wrappers with LLVM-C API bindings (part 1)

Part of #134001, follow-up to #136326, extracted from #134009.

This PR performs an arbitrary subset of the LLVM-C binding migrations from #134009, which should make it less tedious to review. The remaining migrations can occur in one or more subsequent PRs.
2025-02-05 05:03:03 +01:00
bors
3f33b30e19 Auto merge of #135760 - scottmcm:disjoint-bitor, r=WaffleLapkin
Add `unchecked_disjoint_bitor` per ACP373

Following the names from libs-api in https://github.com/rust-lang/libs-team/issues/373#issuecomment-2085686057

Includes a fallback implementation so this doesn't have to update cg_clif or cg_gcc, and overrides it in cg_llvm to use `or disjoint`, which [is available in LLVM 18](https://releases.llvm.org/18.1.0/docs/LangRef.html#or-instruction) so hopefully we don't need any version checks.
2025-02-04 17:46:06 +00:00
Scott McMurray
4ee1602eab Override disjoint_or in the LLVM backend 2025-01-31 22:29:08 -08:00
Zalathar
c3f2930edc Explain why (some) pointer/length strings are *const c_uchar 2025-02-01 14:14:40 +11:00
Zalathar
5413d2bd6f Add FIXME for auditing optional parameters passed to DIBuilder 2025-02-01 14:14:40 +11:00
Zalathar
8ddd9c38f6 Use LLVMDIBuilderCreateDebugLocation
The LLVM-C binding takes an explicit context, whereas our binding obtained the
context from the scope argument.
2025-02-01 14:14:40 +11:00
Zalathar
949b4673ce Use LLVMDIBuilderCreateLexicalBlockFile 2025-02-01 14:14:40 +11:00
Zalathar
70d41bc711 Use LLVMDIBuilderCreateLexicalBlock 2025-02-01 14:14:40 +11:00
Zalathar
878ab125a1 Use LLVMDIBuilderCreateNameSpace 2025-02-01 14:14:39 +11:00
Zalathar
cd2af2dd9a Use LLVMDIBuilderFinalize 2025-02-01 13:38:12 +11:00
Zalathar
832fcfb64f Introduce DIBuilderBox, an owning pointer to DIBuilder 2025-02-01 13:34:14 +11:00
Ben Kimock
ce7cb312fa Add link attribute for Enzyme's FFI 2025-01-31 21:11:23 -05:00
Jacob Pratt
c19c4b91f5
Rollup merge of #133429 - EnzymeAD:autodiff-middle, r=oli-obk
Autodiff Upstreaming - rustc_codegen_ssa, rustc_middle

This PR should not be merged until the rustc_codegen_llvm part is merged.
I will also alter it a little based on what get's shaved off from the cg_llvm PR,
and address some of the feedback I received in the other PR (including cleanups).

I am putting it already up to
1) Discuss with `@jieyouxu` if there is more work needed to add tests to this and
2) Pray that there is someone reviewing who can tell me why some of my autodiff invocations get lost.

Re 1: My test require fat-lto. I also modify the compilation pipeline. So if there are any other llvm-ir tests in the same compilation unit then I will likely break them. Luckily there are two groups who currently have the same fat-lto requirement for their GPU code which I have for my autodiff code and both groups have some plans to enable support for thin-lto. Once either that work pans out, I'll copy it over for this feature. I will also work on not changing the optimization pipeline for functions not differentiated, but that will require some thoughts and engineering, so I think it would be good to be able to run the autodiff tests isolated from the rest for now. Can you guide me here please?
For context, here are some of my tests in the samples folder: https://github.com/EnzymeAD/rustbook

Re 2: This is a pretty serious issue, since it effectively prevents publishing libraries making use of autodiff: https://github.com/EnzymeAD/rust/issues/173. For some reason my dummy code persists till the end, so the code which calls autodiff, deletes the dummy, and inserts the code to compute the derivative never gets executed. To me it looks like the rustc_autodiff attribute just get's dropped, but I don't know WHY? Any help would be super appreciated, as rustc queries look a bit voodoo to me.

Tracking:

- https://github.com/rust-lang/rust/issues/124509

r? `@jieyouxu`
2025-01-31 00:26:30 -05:00
Matthias Krüger
89f8abe8b4
Rollup merge of #135026 - Flakebi:global-addrspace, r=saethlin
Cast global variables to default address space

Pointers for variables all need to be in the same address space for correct compilation. Therefore ensure that even if a global variable is created in a different address space, it is casted to the default address space before its value is used.

This is necessary for the amdgpu target and others where the default address space for global variables is not 0.

For example `core` does not compile in debug mode when not casting the address space to the default one because it tries to emit the following (simplified) LLVM IR, containing a type mismatch:

```llvm
`@alloc_0` = addrspace(1) constant <{ [6 x i8] }> <{ [6 x i8] c"bit.rs" }>, align 1
`@alloc_1` = addrspace(1) constant <{ ptr }> <{ ptr addrspace(1) `@alloc_0` }>, align 8
; ^ here a struct containing a `ptr` is needed, but it is created using a `ptr addrspace(1)`
```

For this to compile, we need to insert a constant `addrspacecast` before we use a global variable:

```llvm
`@alloc_0` = addrspace(1) constant <{ [6 x i8] }> <{ [6 x i8] c"bit.rs" }>, align 1
`@alloc_1` = addrspace(1) constant <{ ptr }> <{ ptr addrspacecast (ptr addrspace(1) `@alloc_0` to ptr) }>, align 8
```

As vtables are global variables as well, they are also created with an `addrspacecast`. In the SSA backend, after a vtable global is created, metadata is added to it. To add metadata, we need the non-casted global variable. Therefore we strip away an addrspacecast if there is one, to get the underlying global.

Tracking issue: #135024
2025-01-30 20:47:02 +01:00
Manuel Drehwald
1f30517d40 upstream rustc_codegen_ssa/rustc_middle changes for enzyme/autodiff 2025-01-29 21:31:13 -05:00
Matthias Krüger
e0d74c0667
Rollup merge of #135156 - Zalathar:debuginfo-flags, r=cuviper
Make our `DIFlags` match `LLVMDIFlags` in the LLVM-C API

In order to be able to use a mixture of LLVM-C and C++ bindings for debuginfo, our Rust-side `DIFlags` needs to have the same layout as LLVM-C's `LLVMDIFlags`, and we also need to be able to convert it to the `DIFlags` accepted by LLVM's C++ API.

Internally, LLVM converts between the two types with a simple cast. We can't necessarily rely on that always being true, and LLVM doesn't expose a conversion function, so we have two potential options:
- Convert each bit/subvalue individually
- Statically assert that doing a cast is actually fine

As long as both types do remain the same under the hood (which seems likely), the static-assert-and-cast approach is easier and faster. If the static assertions ever start failing against some future version of LLVM, we'll have to switch over to the convert-each-subvalue approach, which is a bit more error-prone.

---

Extracted from #134009, though this PR ended up choosing the static-assert-and-cast approach over the convert-each-subvalue approach.
2025-01-22 19:29:39 +01:00
Oli Scherer
dfa4c01b2e Treat undef bytes as equal to any other byte 2025-01-21 08:27:21 +00:00
Zalathar
32f1c1d85e Make our DIFlags match LLVMDIFlags in the LLVM-C API 2025-01-21 14:41:44 +11:00
bors
0c2c096e1a Auto merge of #135047 - Flakebi:amdgpu-kernel-cc, r=workingjubilee
Add gpu-kernel calling convention

The amdgpu-kernel calling convention was reverted in commit f6b21e90d1 (#120495 and https://github.com/rust-lang/rust-analyzer/pull/16463) due to inactivity in the amdgpu target.

Introduce a `gpu-kernel` calling convention that translates to `ptx_kernel` or `amdgpu_kernel`, depending on the target that rust compiles for.

Tracking issue: #135467
amdgpu target tracking issue: #135024
2025-01-17 04:36:09 +00:00
Flakebi
e7e5202978
Add gpu-kernel calling convention
The amdgpu-kernel calling convention was reverted in commit
f6b21e90d1 due to inactivity in the amdgpu
target.

Introduce a `gpu-kernel` calling convention that translates to
`ptx_kernel` or `amdgpu_kernel`, depending on the target that rust
compiles for.
2025-01-16 00:26:55 +01:00