As discovered by Mara in #110897, our TLS implementation is a total mess. In the past months, I have simplified the actual macros and their expansions, but the majority of the complexity comes from the platform-specific support code needed to create keys and register destructors. In keeping with #117276, I have therefore moved all of the `thread_local_key`/`thread_local_dtor` modules to the `thread_local` module in `sys` and merged them into a new structure, so that future porters of `std` can simply mix-and-match the existing code instead of having to copy the same (bad) implementation everywhere. The new structure should become obvious when looking at `sys/thread_local/mod.rs`.
Unfortunately, the documentation changes associated with the refactoring have made this PR rather large. That said, this contains no functional changes except for two small ones:
* the key-based destructor fallback now, by virtue of sharing the implementation used by macOS and others, stores its list in a `#[thread_local]` static instead of in the key, eliminating one indirection layer and drastically simplifying its code.
* I've switched over ZKVM (tier 3) to use the same implementation as WebAssembly, as the implementation was just a way worse version of that
Please let me know if I can make this easier to review! I know these large PRs aren't optimal, but I couldn't think of any good intermediate steps.
@rustbot label +A-thread-locals
Win10: Use `GetSystemTimePreciseAsFileTime` directly
On Windows 10 we can use `GetSystemTimePreciseAsFileTime` directly instead of lazy loading it (with a fallback).
Implement junction_point
Implements https://github.com/rust-lang/rust/issues/121709
We already had a private implementation that we use for tests so we could just make that public. Except it was very hacky as it was only ever intended for use in testing. I've made an improved version that at least handles path conversion correctly and has less need for things like the `Align8` hack. There's still room for further improvement though.
Always use WaitOnAddress on Win10+
`WaitOnAddress` and `WakeByAddressSingle` are always available since Windows 8 so they can now be used without needing to delay load. I've also moved the Windows 7 thread parking fallbacks into a separate sub-module.