Greatly reduce amount of debuginfo compiled for bootstrap itself
Rather than compiling rustbuild and all its dependencies with
`debuginfo=2`, this compiles dependencies without debuginfo and
rustbuild with `debuginfo=1`. On my laptop, this brings compile times
down from ~1:20 to ~1:05.
See also 254847594.
r? ``@Mark-Simulacrum``
Rather than compiling rustbuild and all its dependencies with
`debuginfo=2`, this compiles dependencies without debuginfo and
rustbuild with `debuginfo=1`. On my laptop, this brings compile times
down from ~1:20 to ~1:05.
On NixOS systems, bootstrap will patch rustc used in bootstrapping after
checking `/etc/os-release` (to confirm the current distribution is NixOS).
However, when using Nix on a non-NixOS system, it can be desirable for
bootstrap to patch rustc. In this commit, a `patch-binaries-for-nix`
option is added to `config.toml`, which allows for user opt-in to
bootstrap's Nix patching.
Signed-off-by: David Wood <david.wood@huawei.com>
Pin bootstrap checksums and add a tool to update it automatically
⚠️⚠️ This is just a proactive hardening we're performing on the build system, and it's not prompted by any known compromise. If you're aware of security issues being exploited please [check out our responsible disclosure page](https://www.rust-lang.org/policies/security). ⚠️⚠️
---
This PR aims to improve Rust's supply chain security by pinning the checksums of the bootstrap compiler downloaded by `x.py`, preventing a compromised `static.rust-lang.org` from affecting building the compiler. The checksums are stored in `src/stage0.json`, which replaces `src/stage0.txt`. This PR also adds a tool to automatically update the bootstrap compiler.
The changes in this PR were originally discussed in [Zulip](https://zulip-archive.rust-lang.org/stream/241545-t-release/topic/pinning.20stage0.20hashes.html).
## Potential attack
Before this PR, an attacker who wanted to compromise the bootstrap compiler would "just" need to:
1. Gain write access to `static.rust-lang.org`, either by compromising DNS or the underlying storage.
2. Upload compromised binaries and corresponding `.sha256` files to `static.rust-lang.org`.
There is no signature verification in `x.py` as we don't want the build system to depend on GPG. Also, since the checksums were not pinned inside the repository, they were downloaded from `static.rust-lang.org` too: this only protected from accidental changes in `static.rust-lang.org` that didn't change the `*.sha256` files. The attack would allow the attacker to compromise past and future invocations of `x.py`.
## Mitigations introduced in this PR
This PR adds pinned checksums for all the bootstrap components in `src/stage0.json` instead of downloading the checksums from `static.rust-lang.org`. This changes the attack scenario to:
1. Gain write access to `static.rust-lang.org`, either by compromising DNS or the underlying storage.
2. Upload compromised binaries to `static.rust-lang.org`.
3. Land a (reviewed) change in the `rust-lang/rust` repository changing the pinned hashes.
Even with a successful attack, existing clones of the Rust repository won't be affected, and once the attack is detected reverting the pinned hashes changes should be enough to be protected from the attack. This also enables further mitigations to be implemented in following PRs, such as verifying signatures when pinning new checksums (removing the trust on first use aspect of this PR) and adding a check in CI making sure a PR updating the checksum has not been tampered with (see the future improvements section).
## Additional changes
There are additional changes implemented in this PR to enable the mitigation:
* The `src/stage0.txt` file has been replaced with `src/stage0.json`. The reasoning for the change is that there is existing tooling to read and manipulate JSON files compared to the custom format we were using before, and the slight challenge of manually editing JSON files (no comments, no trailing commas) are not a problem thanks to the new `bump-stage0`.
* A new tool has been added to the repository, `bump-stage0`. When invoked, the tool automatically calculates which release should be used as the bootstrap compiler given the current version and channel, gathers all the relevant checksums and updates `src/stage0.json`. The tool can be invoked by running:
```
./x.py run src/tools/bump-stage0
```
* Support for downloading releases from `https://dev-static.rust-lang.org` has been removed, as it's not possible to verify checksums there (it's customary to replace existing artifacts there if a rebuild is warranted). This will require a change to the release process to avoid bumping the bootstrap compiler on beta before the stable release.
## Future improvements
* Add signature verification as part of `bump-stage0`, which would require the attacker to also obtain the release signing keys in order to successfully compromise the bootstrap compiler. This would be fine to add now, as the burden of installing the tool to verify signatures would only be placed on whoever updates the bootstrap compiler, instead of everyone compiling Rust.
* Add a check on CI that ensures the checksums in `src/stage0.json` are the expected ones. If a PR changes the stage0 file CI should also run the `bump-stage0` tool and fail if the output in CI doesn't match the committed file. This prevents the PR author from tweaking the output of the tool manually, which would otherwise be close to impossible for a human to detect.
* Automate creating the PRs bumping the bootstrap compiler, by setting up a scheduled job in GitHub Actions that runs the tool and opens a PR.
* Investigate whether a similar mitigation can be done for "download from CI" components like the prebuilt LLVM.
r? `@Mark-Simulacrum`
The architecture auto-detect table has no entry for riscv64 (which rustc
uses riscv64gc for the first part of triplet, assuming it's a generic
Linux distro).
Add it to the table to allow riscv64 systems to bootstrap Rust.
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Only look for commits by bors that are merge commits, because
those are the only ones with CI artifacts. Also, use
`--first-parent` to avoid traversing stuff like rollup branches.
Use `git rev-list` instead of `git log` to be more robust against
UI changes in git. Also, use the full email address for bors,
because `--author` uses a substring match.
When determining which LLVM artifacts to download, bootstrap.py
calls: `git log --author=bors --format=%H -n1 -m --first-parent --
src/llvm-project src/bootstrap/download-ci-llvm-stamp src/version`.
However, the `-m` option has no effect, per the `git log` help:
> -m
> This option makes diff output for merge commits to be shown in the
> default format. -m will produce the output only if -p is given as
> well. The default format could be changed using log.diffMerges
> configuration parameter, which default value is separate.
Accordingly, this commit removes use of the -m option in favor of
`--no-patch`, to make clear that this command should never output
diff information, as the SHA-1 hash is the only desired output.
Tested using git 2.32, this does not change the
output of the command.
The motivation for this change is that some patched versions of git
change the behavior of the `-m` flag to imply `-p`, rather than to do
nothing unless `-p` is passed. These patched versions of git lead to
this script not working. Google's corp-provided git is one such example.
This only updates the submodules the first time they're needed, instead
of unconditionally the first time you run x.py.
Ideally, this would move *all* submodules and not exclude some tools and
backtrace. Unfortunately, cargo requires all `Cargo.toml` files in the
whole workspace to be present to build any crate.
On my machine, this takes the time for an initial submodule clone (for
`x.py --help`) from 55.70 to 15.87 seconds.
This uses exactly the same logic as the LLVM update used, modulo some
minor cleanups:
- Use a local variable for `src.join(relative_path)`
- Remove unnecessary arrays for `book!` macro and make the macro simpler to use
- Add more comments
- Don't print the exact command run by rustbuild unless `--verbose` is set.
This is almost always unhelpful, since it's just cargo with a lot of
arguments.
- Don't print "Build completed unsuccessfully" unless --verbose is set.
You can already tell the build failed by the errors above, and the
time isn't particularly helpful.
- Don't print the full path to bootstrap. This is useless to everyone,
even including when working on x.py itself. You can still opt-in to
this being shown with `--verbose`, since it will throw an exception.
Before:
```
error[E0432]: unresolved import `x`
--> library/std/src/lib.rs:343:5
|
343 | use x;
| ^ no external crate `x`
error: aborting due to previous error
For more information about this error, try `rustc --explain E0432`.
error: could not compile `std`
To learn more, run the command again with --verbose.
command did not execute successfully: "/home/joshua/rustc4/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "check" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "8" "--release" "--features" "panic-unwind backtrace" "--manifest-path" "/home/joshua/rustc4/library/test/Cargo.toml" "--message-format" "json-render-diagnostics"
expected success, got: exit status: 101
failed to run: /home/joshua/rustc4/build/bootstrap/debug/bootstrap check
Build completed unsuccessfully in 0:00:13
```
After:
```
error[E0432]: unresolved import `x`
--> library/std/src/lib.rs:343:5
|
343 | use x;
| ^ no external crate `x`
error: aborting due to previous error
For more information about this error, try `rustc --explain E0432`.
error: could not compile `std`
To learn more, run the command again with --verbose.
```
Use HTTPS links where possible
While looking at #86583, I wondered how many other (insecure) HTTP links were in `rustc`. This changes most other `http` links to `https`. While most of the links are in comments or documentation, there are a few other HTTP links that are used by CI that are changed to HTTPS.
Notes:
- I didn't change any to or in licences
- Some links don't support HTTPS :(
- Some `http` links were dead, in those cases I upgraded them to their new places (all of which used HTTPS)
Previously, changing the standard library with `download-rustc =
"if-unchanged"` would incorrectly reuse the cached compiler and standard
library from CI, which was confusing and led to incorrect test failures
or successes.
This enables better caching, since LLVM is only updated when needed, not
whenever x.py is run. Before, bootstrap.py had to use heuristics to
guess if LLVM would be needed, and updated the module more often than
necessary as a result.
This syncs the LLVM submodule only just before building the compiler, so
people working on the standard library never have to worry about it.
Example output:
```
Copying stage0 std from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
Updating submodule src/llvm-project
Submodule 'src/llvm-project' (https://github.com/rust-lang/llvm-project.git) registered for path 'src/llvm-project'
Submodule path 'src/llvm-project': checked out 'f9a8d70b6e0365ac2172ca6b7f1de0341297458d'
```
- Don't try to update the LLVM submodule when using system LLVM
Previously, this would try to update LLVM unconditionally. Now the
submodule is only initialized if `llvm-config` is not set.
- Don't update LLVM submodule in dry runs
This prevents the following test failures:
```
running 17 tests
fatal: invalid gitfile format: /checkout/src/llvm-project/.git
test builder::tests::defaults::build_cross_compile ... FAILED
---- builder::tests::defaults::build_default stdout ----
thread 'main' panicked at 'command did not execute successfully: "git" "rev-parse" "HEAD"
expected success, got: exit code: 128', src/build_helper/lib.rs:139:9
```
- Try running git without --progress if it fails the first time
This avoids having to do version detection to see if --progress is
supported or not.
- Don't try to update submodules when the source repository isn't managed by git
- Update LLVM submodules that have already been checked out
- Only check for whether the submodule should be updated in lib.rs; update
it unconditionally in native.rs
Previously, this caused a bug on NixOS:
1. bootstrap.py would download and patch stage0/cargo
2. bootstrap.py would download nightly cargo, but extract it to
stage0/cargo instead of ci-rustc/cargo.
3. bootstrap.py would fail to build rustbuild because stage0/cargo
wasn't patched.
The "proper" fix is to extract nightly cargo to ci-rustc instead, but it
doesn't seem to be necessary at all, so this just skips downloading it
instead.
Moving the `.nix-deps` has resulted in rpath links being broken and
therefore bootstrap on NixOS broken entirely.
This PR still produces a `.nix-deps` but only for the purposes of
producing a gc root. We rpath a symlink-resolved result instead.
For purposes of simplicity we also use joinSymlink to produce a single
merged output directory so that we don't need to update multiple
locations every time we add a library or something.
Add `download-rustc = "if-unchanged"`
This allows keeping the setting to a fixed value without having to
toggle it when you want to work on the compiler instead of on tools.
This sets `BOOTSTRAP_DOWNLOAD_RUSTC` in bootstrap.py so rustbuild doesn't have to try and replicate its logic.
Helps with https://github.com/rust-lang/rust/issues/81930.
r? `@Mark-Simulacrum` cc `@camelid`
Use the beta compiler for building bootstrap tools when `download-rustc` is set
## Motivation
This avoids having to rebuild bootstrap and tidy each time you rebase
over master. In particular, it makes rebasing and running `x.py fmt` on
each commit in a branch significantly faster. It also avoids having to
rebuild bootstrap after setting `download-rustc = true`.
## Implementation
Instead of extracting the CI artifacts directly to `stage0/`, extract
them to `ci-rustc/` instead. Continue to copy them to the proper
sysroots as necessary for all stages except stage 0.
This also requires `bootstrap.py` to download both stage0 and CI
artifacts and distinguish between the two when checking stamp files.
Note that since tools have to be built by the same compiler that built
`rustc-dev` and the standard library, the downloaded artifacts can't be
reused when building with the beta compiler. To make sure this is still
a good user experience, warn when building with the beta compiler, and
default to building with stage 2.
I tested this by rebasing this PR from edeee915b1 over 1c77a1fa3c and confirming that only the bootstrap library itself had to be rebuilt, not any dependencies and not `tidy`. I also tested that a clean build with `x.py build` builds rustdoc exactly once and does no other work, and that `touch src/librustdoc/lib.rs && x.py build` works. `x.py check` still behaves as before (checks using the beta compiler, even if there are changes to `compiler/`).
Helps with https://github.com/rust-lang/rust/issues/81930.
r? `@Mark-Simulacrum`
## Motivation
This avoids having to rebuild bootstrap and tidy each time you rebase
over master. In particular, it makes rebasing and running `x.py fmt` on
each commit in a branch significantly faster. It also avoids having to
rebuild bootstrap after setting `download-rustc = true`.
## Implementation
Instead of extracting the CI artifacts directly to `stage0/`, extract
them to `ci-rustc/` instead. Continue to copy them to the proper
sysroots as necessary for all stages except stage 0.
This also requires `bootstrap.py` to download both stage0 and CI
artifacts and distinguish between the two when checking stamp files.
Note that since tools have to be built by the same compiler that built
`rustc-dev` and the standard library, the downloaded artifacts can't be
reused when building with the beta compiler. To make sure this is still
a good user experience, warn when building with the beta compiler, and
default to building with stage 2.
When bumping the bootstrap version, the name of the generated LLVM
shared object file is changed, even though it's the same contents as
before. If bootstrap tries to use an older version, it will get linking
errors:
```
Building rustdoc for stage1 (x86_64-unknown-linux-gnu)
Compiling rustdoc-tool v0.0.0 (/home/joshua/rustc/src/tools/rustdoc)
error: linking with `cc` failed: exit code: 1
|
= note: "cc" "-Wl,--as-needed" ... lots of args ...
= note: /usr/bin/ld: cannot find -lLLVM-12-rust-1.53.0-nightly
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: could not compile `rustdoc-tool`
```
On reflection on the issue in https://github.com/rust-lang/rust/pull/79540#discussion_r572572280, I think the bug was actually using the `compiler/` filter, not using `--author=bors`. 9a1d6174c9 has no CI artifacts because it was merged as part of a rollup:
```
$ curl -I https://ci-artifacts.rust-lang.org/rustc-builds/96e843ce6ae42e0aa519ba45e148269de347fd84/rust-std-nightly-x86_64-unknown-linux-gnu.tar.xz
HTTP/2 404
```
So 9a1d6174c9 is the correct commit to download, and that's what `--author=bors` does:
$ git log --author=bors 4aec8a5da5
commit 9a1d6174c9
Ideally it would look for "the most recent bors commit not followed by a change to `compiler/`", which would exclude things like documentation changes and avoid redownloading more than necessary, but
- Redownloading isn't the end of the world,
- That metric is hard to implement, and
- Documentation-only or library-only changes are very rare anyway since they're usually rolled up with changes to the compiler.
- Use the same compiler for stage0 and stage1. This should be fixed at
some point (so bootstrap isn't constantly rebuilt).
- Make sure `x.py build` and `x.py check` work.
- Use `git merge-base` to determine the most recent commit to download.
- Copy stage0 to the various sysroots in `Sysroot`, and delegate to
Sysroot in Assemble. Leave all other code unchanged.
- Rename date -> key
This can also be a commit hash, so 'date' is no longer a good name.
- Add the commented-out option to config.toml.example
- Disable all steps by default when `download-rustc` is enabled
Most steps don't make sense when downloading a compiler, because they'll
be pre-built in the sysroot. Only enable the ones that might be useful,
in particular Rustdoc and all `check` steps.
At some point, this should probably enable other tools, but rustdoc is
enough to test out `download-rustc`.
- Don't print 'Skipping' twice in a row
Bootstrap forcibly enables a dry run if it isn't already set, so
previously it would print the message twice:
```
Skipping bootstrap::compile::Std because it is not enabled for `download-rustc`
Skipping bootstrap::compile::Std because it is not enabled for `download-rustc`
```
Now it correctly only prints once.
## Future work
- Add FIXME about supporting beta commits
- Debug logging will never work. This should be fixed.
Don't clone LLVM submodule when download-ci-llvm is set
Previously, `downloading_llvm` would check `self.build` while it was
still an empty string, and think it was always false. This fixes the
check.
This addresses the worst part of https://github.com/rust-lang/rust/issues/76653. There are still some large submodules being downloaded (in particular, `rustc-by-example` is 146 MB, and all the submodules combined are 311 MB), but this is a lot better than the whopping 1.4 GB before.
Before, it could print this error if no toolchain was configured:
```
error: no default toolchain configured
error: backtrace:
error: stack backtrace:
0: error_chain::backtrace:👿:InternalBacktrace::new
1: rustup::config::Cfg::toolchain_for_dir
2: rustup_init::run_rustup_inner
3: rustup_init::main
4: std::rt::lang_start::{{closure}}
5: main
6: __libc_start_main
7: _start
```
cc #79813
This PR adds an allow-by-default future-compatibility lint
`SEMICOLON_IN_EXPRESSIONS_FROM_MACROS`. It fires when a trailing semicolon in a
macro body is ignored due to the macro being used in expression
position:
```rust
macro_rules! foo {
() => {
true; // WARN
}
}
fn main() {
let val = match true {
true => false,
_ => foo!()
};
}
```
The lint takes its level from the macro call site, and
can be allowed for a particular macro by adding
`#[allow(semicolon_in_expressions_from_macros)]`.
The lint is set to warn for all internal rustc crates (when being built
by a stage1 compiler). After the next beta bump, we can enable
the lint for the bootstrap compiler as well.
Don't use `self.date` unconditionally for `program_out_of_date()`
This avoids unnecessary cache invalidations for programs not affected by
the stage0 version (which is everything except the stage0 compiler
itself).
The redundant invalidations weren't noticed until now because they only
showed up on stage0 bumps, at which point people are used to rebuilding
everything anyway. I noticed it in https://github.com/rust-lang/rust/pull/79540
because I wasn't adding `self.date` to the stamp file (because I didn't realize it was necessary). Rather than
adding self.date I thought it was better to remove it from the cache key.
This avoids unnecessary cache invalidations for programs not affected by
the stage0 version (which is everything except the stage0 compiler
itself).
The redundant invalidations weren't noticed until now because they only
showed up on stage0 bumps, at which point people are used to rebuilding
everything anyway. I noticed it because I wasn't adding `self.date` to
the stamp file (because I didn't realize it was necessary). Rather than
adding self.date I thought it was better to remove it from the cache
key.
Previously, `os.remove` would always give a FileNotFound error the first
time you called it, causing bootstrap to make unnecessary copies. This
now only calls `remove()` if the file exists, avoiding the unnecessary
error.
bootstrap completely ignores all errors when detecting a rustup version,
so this wasn't noticed before.
Fixes the following error:
```
rustup not detected: a bytes-like object is required, not 'str'
falling back to auto-detect
```
This also takes the opportunity to only call rustup and other external
commands only once during startup.