Always mark rust and rust-call abi's as unwind
PR #63909 identified a bug that had been injected by PR #55982. As discussed on https://github.com/rust-lang/rust/issues/64655#issuecomment-537517428 , we started marking extern items as nounwind, *even* extern items that said they were using "Rust" or "rust-call" ABI.
This is a more targeted variant of PR #63909 that fixes the above bug.
Fix#64655
----
I personally suspect we will want PR #63909 to land in the long-term
But:
* it is not certain that PR #63909 *will* land,
* more importantly, PR #63909 almost certainly will not be backported to beta/stable.
The identified bug was more severe than I think anyone realized (apart from perhaps @gnzlbg, as noted [here](https://github.com/rust-lang/rust/pull/63909#issuecomment-524818838)).
Thus, I was motivated to write this PR, which fixes *just* the issue with extern rust/rust-call functions, and deliberately avoids injecting further deviation from current behavior (you can see further notes on this in the comments of the code added here).
When thread sanitizer instrumentation is enabled during compilation of
stack probe function, the function will be miscompiled and trigger
segmentation fault at runtime. Disable stack probes when tsan is
enabled.
This flag inserts `mcount` function call to the beginning of every function
after inline processing. So tracing tools like uftrace [1] (or ftrace for
Linux kernel modules) have a chance to examine function calls.
It is similar to the `-pg` flag provided by gcc or clang, but without
generating a `__gmon_start__` function for executables. If a program
runs without being traced, no `gmon.out` will be written to disk.
Under the hood, it simply adds `"instrument-function-entry-inlined"="mcount"`
attribute to every function. The `post-inline-ee-instrument` LLVM pass does
the actual job.
[1]: https://github.com/namhyung/uftrace
Unconditionally emit the target-cpu LLVM attribute.
This PR makes `rustc` always emit the `target-cpu` LLVM attribute for functions. The goal is to allow for cross-language inlining of functions defined in `libstd`. So far `libstd` functions were the only function without a `target-cpu` attribute, so in whole-crate-graph cross-lang LTO scenarios they were not eligible for inlining into foreign code.
r? @alexcrichton
This was intended to land way back in 1.24, but it was backed out due to
breakage which has long since been fixed. An unstable `#[unwind]`
attribute can be used to tweak the behavior here, but this is currently
simply switching rustc's internal default to abort-by-default if an
`extern` function panics, making our codegen sound primarily (as
currently you can produce UB with safe code)
Closes#52652
Generalized operand.rs#nontemporal_store and fixed tidy issues
Generalized operand.rs#nontemporal_store's implem even more
With a BuilderMethod trait implemented by Builder for LLVM
Cleaned builder.rs : no more code duplication, no more ValueTrait
Full traitification of builder.rs
Prefer unwrap_or_else to unwrap_or in case of function calls/allocations
The contents of `unwrap_or` are evaluated eagerly, so it's not a good pick in case of function calls and allocations. This PR also changes a few `unwrap_or`s with `unwrap_or_default`.
An added bonus is that in some cases this change also reveals if the object it's called on is an `Option` or a `Result` (based on whether the closure takes an argument).
Support for disabling PLT for better function call performance
This PR gives `rustc` the ability to skip the PLT when generating function calls into shared libraries. This can improve performance by reducing branch indirection.
AFAIK, the only advantage of using the PLT is to allow for ELF lazy binding. However, since Rust already [enables full relro for security](https://github.com/rust-lang/rust/pull/43170), lazy binding was disabled anyway.
This is a little known feature which is supported by [GCC](https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html) and [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fplt) as `-fno-plt` (some Linux distros [enable it by default](https://git.archlinux.org/svntogit/packages.git/tree/trunk/makepkg.conf?h=packages/pacman#n40) for all builds).
Implementation inspired by [this patch](https://reviews.llvm.org/D39079#change-YvkpNDlMs_LT) which adds `-fno-plt` support to Clang.
## Performance
I didn't run a lot of benchmarks, but these are the results on my machine for a `clap` [benchmark](https://github.com/clap-rs/clap/blob/master/benches/05_ripgrep.rs):
```
name control ns/iter no-plt ns/iter diff ns/iter diff % speedup
build_app_long 11,097 10,733 -364 -3.28% x 1.03
build_app_short 11,089 10,742 -347 -3.13% x 1.03
build_help_long 186,835 182,713 -4,122 -2.21% x 1.02
build_help_short 80,949 78,455 -2,494 -3.08% x 1.03
parse_clean 12,385 12,044 -341 -2.75% x 1.03
parse_complex 19,438 19,017 -421 -2.17% x 1.02
parse_lots 431,493 421,421 -10,072 -2.33% x 1.02
```
A small performance improvement across the board, with no downsides. It's likely binaries which make a lot of function calls into dynamic libraries could see even more improvements. [This comment](https://patchwork.ozlabs.org/patch/468993/#1028255) suggests that, in some cases, `-fno-plt` could improve PIC/PIE code performance by 10%.
## Security benefits
**Bonus**: some of the speculative execution attacks rely on the PLT, by disabling it we reduce a big attack surface and reduce the need for [`retpoline`](https://reviews.llvm.org/D41723).
## Remaining PLT calls
The compiled binaries still have plenty of PLT calls, coming from C/C++ libraries. Building dependencies with `CFLAGS=-fno-plt CXXFLAGS=-fno-plt` removes them.
Disable the PLT where possible to improve performance
for indirect calls into shared libraries.
This optimization is enabled by default where possible.
- Add the `NonLazyBind` attribute to `rustllvm`:
This attribute informs LLVM to skip PLT calls in codegen.
- Disable PLT unconditionally:
Apply the `NonLazyBind` attribute on every function.
- Only enable no-plt when full relro is enabled:
Ensures we only enable it when we have linker support.
- Add `-Z plt` as a compiler option
Fix a few AMDGPU related issues
* AMDGPU ignores `noinline` and sadly doesn't clear the attribute when it slaps `alwaysinline` on everything,
* an AMDGPU related load bit range metadata assertion,
* I didn't enable the `amdgpu` component in the `librustc_llvm` build script,
* Add AMDGPU call abi info.
This fixes a regression from #53031 where specifying `-C target-cpu=native` is
printing a lot of warnings from LLVM about `native` being an unknown CPU. It
turns out that `native` is indeed an unknown CPU and we have to perform a
mapping to an actual CPU name, but this mapping is only performed in one
location rather than all locations we inform LLVM about the target CPU.
This commit centralizes the mapping of `native` to LLVM's value of the native
CPU, ensuring that all locations we inform LLVM about the `target-cpu` it's
never `native`.
Closes#53322
This shim is generated elsewhere in the compiler so this commit adds support to
ensure it goes through similar paths as the rest of the compiler to set llvm
function attributes like target features.
cc #53372
This commit stabilizes the `#[wasm_import_module]` attribute as
`#[link(wasm_import_module = "...")]`. Tracked by #52090 this new directive in
the `#[link]` attribute is used to configured the module name that the imports
are listed with. The WebAssembly specification indicates two utf-8 names are
associated with all imported items, one for the module the item comes from and
one for the item itself. The item itself is configurable in Rust via its
identifier or `#[link_name = "..."]`, but the module name was previously not
configurable and defaulted to `"env"`. This commit ensures that this is also
configurable.
Closes#52090