Reduce formatting `width` and `precision` to 16 bits
This is part of https://github.com/rust-lang/rust/issues/99012
This is reduces the `width` and `precision` fields in format strings to 16 bits. They are currently full `usize`s, but it's a bit nonsensical that we need to support the case where someone wants to pad their value to eighteen quintillion spaces and/or have eighteen quintillion digits of precision.
By reducing these fields to 16 bit, we can reduce `FormattingOptions` to 64 bits (see https://github.com/rust-lang/rust/pull/136974) and improve the in memory representation of `format_args!()`. (See additional context below.)
This also fixes a bug where the width or precision is silently truncated when cross-compiling to a target with a smaller `usize`. By reducing the width and precision fields to the minimum guaranteed size of `usize`, 16 bits, this bug is eliminated.
This is a breaking change, but affects almost no existing code.
---
Details of this change:
There are three ways to set a width or precision today:
1. Directly a formatting string, e.g. `println!("{a:1234}")`
2. Indirectly in a formatting string, e.g. `println!("{a:width$}", width=1234)`
3. Through the unstable `FormattingOptions::width` method.
This PR:
- Adds a compiler error for 1. (`println!("{a:9999999}")` no longer compiles and gives a clear error.)
- Adds a runtime check for 2. (`println!("{a:width$}, width=9999999)` will panic.)
- Changes the signatures of the (unstable) `FormattingOptions::[get_]width` methods to use a `u16` instead.
---
Additional context for improving `FormattingOptions` and `fmt::Arguments`:
All the formatting flags and options are currently:
- The `+` flag (1 bit)
- The `-` flag (1 bit)
- The `#` flag (1 bit)
- The `0` flag (1 bit)
- The `x?` flag (1 bit)
- The `X?` flag (1 bit)
- The alignment (2 bits)
- The fill character (21 bits)
- Whether a width is specified (1 bit)
- Whether a precision is specified (1 bit)
- If used, the width (a full usize)
- If used, the precision (a full usize)
Everything except the last two can simply fit in a `u32` (those add up to 31 bits in total).
If we can accept a max width and precision of u16::MAX, we can make a `FormattingOptions` that is exactly 64 bits in size; the same size as a thin reference on most platforms.
If, additionally, we also limit the number of formatting arguments, we can also reduce the size of `fmt::Arguments` (that is, of a `format_args!()` expression).
Revert <https://github.com/rust-lang/rust/pull/138084> to buy time to
consider options that avoids breaking downstream usages of cargo on
distributed `rustc-src` artifacts, where such cargo invocations fail due
to inability to inherit `lints` from workspace root manifest's
`workspace.lints` (this is only valid for the source rust-lang/rust
workspace, but not really the distributed `rustc-src` artifacts).
This breakage was reported in
<https://github.com/rust-lang/rust/issues/138304>.
This reverts commit 48caf81484, reversing
changes made to c6662879b2.
By naming them in `[workspace.lints.rust]` in the top-level
`Cargo.toml`, and then making all `compiler/` crates inherit them with
`[lints] workspace = true`. (I omitted `rustc_codegen_{cranelift,gcc}`,
because they're a bit different.)
The advantages of this over the current approach:
- It uses a standard Cargo feature, rather than special handling in
bootstrap. So, easier to understand, and less likely to get
accidentally broken in the future.
- It works for proc macro crates.
It's a shame it doesn't work for rustc-specific lints, as the comments
explain.
We already do this for a number of crates, e.g. `rustc_middle`,
`rustc_span`, `rustc_metadata`, `rustc_span`, `rustc_errors`.
For the ones we don't, in many cases the attributes are a mess.
- There is no consistency about order of attribute kinds (e.g.
`allow`/`deny`/`feature`).
- Within attribute kind groups (e.g. the `feature` attributes),
sometimes the order is alphabetical, and sometimes there is no
particular order.
- Sometimes the attributes of a particular kind aren't even grouped
all together, e.g. there might be a `feature`, then an `allow`, then
another `feature`.
This commit extends the existing sorting to all compiler crates,
increasing consistency. If any new attribute line is added there is now
only one place it can go -- no need for arbitrary decisions.
Exceptions:
- `rustc_log`, `rustc_next_trait_solver` and `rustc_type_ir_macros`,
because they have no crate attributes.
- `rustc_codegen_gcc`, because it's quasi-external to rustc (e.g. it's
ignored in `rustfmt.toml`).
That is, change `diagnostic_outside_of_impl` and
`untranslatable_diagnostic` from `allow` to `deny`, because more than
half of the compiler has be converted to use translated diagnostics.
This commit removes more `deny` attributes than it adds `allow`
attributes, which proves that this change is warranted.
`unescape_literal` becomes `unescape_unicode`, and `unescape_c_string`
becomes `unescape_mixed`. Because rfc3349 will mean that C string
literals will no longer be the only mixed utf8 literals.
- Sort dependencies and features sections.
- Add `tidy` markers to the sorted sections so they stay sorted.
- Remove empty `[lib`] sections.
- Remove "See more keys..." comments.
Excluded files:
- rustc_codegen_{cranelift,gcc}, because they're external.
- rustc_lexer, because it has external use.
- stable_mir, because it has external use.
add diagnostic for raw identifiers in format string
Format strings don't support raw identifiers (e.g. `format!("{r#type}")`), but they do support keywords in the format string directly (e.g. `format!("{type}")`). This PR improves the error output when attempting to use a raw identifier in a format string and adds a machine-applicable suggestion to remove the `r#`.
fixes https://github.com/rust-lang/rust/issues/115466
Sometimes, we want to create subspans and point at code in the literal
if possible. But this doesn't always make sense, sometimes the literal
may come from macro expanded code and isn't actually there in the
source. Then, we can't really make these suggestions.
This now makes sure that the literal is actually there as we see it so
that we will not run into ICEs on weird literal transformations.
Originally, this was kinda half-allowed. There were some primitive
checks in place that looked at the span to see whether the input was
likely a literal. These "source literal" checks are needed because the
spans created during `format_args` parsing only make sense when it is
indeed a literal that was written in the source code directly.
This is orthogonal to the restriction that the first argument must be a
"direct literal", not being exanpanded from macros. This restriction was
imposed by [RFC 2795] on the basis of being too confusing. But this was
only concerned with the argument of the invocation being a literal, not
whether it was a source literal (maybe in spirit it meant it being a
source literal, this is not clear to me).
Since the original check only really cared about source literals (which
is good enough to deny the `format_args!(concat!())` example), macros
expanding to `format_args` invocations were able to use implicit
captures if they spanned the string in a way that lead back to a source
string.
The "source literal" checks were not strict enough and caused ICEs in
certain cases (see # 106191 (the space is intended to avoid spammy
backreferences)). So I tightened it up in # 106195 to really only work
if it's a direct source literal.
This caused the `indoc` crate to break. `indoc` transformed the source
literal by removing whitespace, which made it not a "source literal"
anymore (which is required to fix the ICE). But since `indoc` spanned
the literal in ways that made the old check think that it's a literal,
it was able to use implicit captures (which is useful and nice for the
users of `indoc`).
This commit properly seperates the previously introduced concepts of
"source literal" and "direct literal" and therefore allows `indoc`
invocations, which don't create "source literals" to use implicit
captures again.
[RFC 2795]: https://rust-lang.github.io/rfcs/2795-format-args-implicit-identifiers.html#macro-hygiene
Suggest `{var:?}` when finding `{?:var}` in inline format strings
Link to issue: https://github.com/rust-lang/rust/issues/106572
This is my first PR to this project, so hopefully I can get some good pointers with me from the first PR.
Currently my idea was to test out whether or not this is the correct solution to this issue and then hopefully expand upon the idea to not only work for Debug formatting but for all of them. If this is a valid solution, I will create a new issue to give a better error message to a broader range of wrong-order formatting.