Inject arm32 shims into Windows metadata generation
I had been keen to eventually move to using windows-sys as a normal Cargo dependency. But for linking, compile times and other reasons that's unlikely to ever happen.
So if we're sticking with generated bindings then injecting any necessary missing type definitions (i.e. for the MS unsupported arm32) is simpler than defining whole functions ourselves just because we need to manually implement those types on a tier 3 platform. This also reduces the places we need to change when making changes to how we use `#[link]`.
r? libs
Make os/windows and pal/windows default to `#![deny(unsafe_op_in_unsafe_fn)]`
This is to prevent regressions in modules that currently pass. I did also fix up a few trivial places where the module contained only one or two simple wrappers. In more complex cases we should try to ensure the `unsafe` blocks are appropriately scoped and have any appropriate safety comments.
This does not fix the windows bits of #127747 but it should help prevent regressions until that is done and also make it more obvious specifically which modules need attention.
As discovered by Mara in #110897, our TLS implementation is a total mess. In the past months, I have simplified the actual macros and their expansions, but the majority of the complexity comes from the platform-specific support code needed to create keys and register destructors. In keeping with #117276, I have therefore moved all of the `thread_local_key`/`thread_local_dtor` modules to the `thread_local` module in `sys` and merged them into a new structure, so that future porters of `std` can simply mix-and-match the existing code instead of having to copy the same (bad) implementation everywhere. The new structure should become obvious when looking at `sys/thread_local/mod.rs`.
Unfortunately, the documentation changes associated with the refactoring have made this PR rather large. That said, this contains no functional changes except for two small ones:
* the key-based destructor fallback now, by virtue of sharing the implementation used by macOS and others, stores its list in a `#[thread_local]` static instead of in the key, eliminating one indirection layer and drastically simplifying its code.
* I've switched over ZKVM (tier 3) to use the same implementation as WebAssembly, as the implementation was just a way worse version of that
Please let me know if I can make this easier to review! I know these large PRs aren't optimal, but I couldn't think of any good intermediate steps.
@rustbot label +A-thread-locals
Win10: Use `GetSystemTimePreciseAsFileTime` directly
On Windows 10 we can use `GetSystemTimePreciseAsFileTime` directly instead of lazy loading it (with a fallback).
Implement junction_point
Implements https://github.com/rust-lang/rust/issues/121709
We already had a private implementation that we use for tests so we could just make that public. Except it was very hacky as it was only ever intended for use in testing. I've made an improved version that at least handles path conversion correctly and has less need for things like the `Align8` hack. There's still room for further improvement though.
Always use WaitOnAddress on Win10+
`WaitOnAddress` and `WakeByAddressSingle` are always available since Windows 8 so they can now be used without needing to delay load. I've also moved the Windows 7 thread parking fallbacks into a separate sub-module.