The parser pushes a `TokenType` to `Parser::expected_token_types` on
every call to the various `check`/`eat` methods, and clears it on every
call to `bump`. Some of those `TokenType` values are full tokens that
require cloning and dropping. This is a *lot* of work for something
that is only used in error messages and it accounts for a significant
fraction of parsing execution time.
This commit overhauls `TokenType` so that `Parser::expected_token_types`
can be implemented as a bitset. This requires changing `TokenType` to a
C-style parameterless enum, and adding `TokenTypeSet` which uses a
`u128` for the bits. (The new `TokenType` has 105 variants.)
The new types `ExpTokenPair` and `ExpKeywordPair` are now arguments to
the `check`/`eat` methods. This is for maximum speed. The elements in
the pairs are always statically known; e.g. a
`token::BinOp(token::Star)` is always paired with a `TokenType::Star`.
So we now compute `TokenType`s in advance and pass them in to
`check`/`eat` rather than the current approach of constructing them on
insertion into `expected_token_types`.
Values of these pair types can be produced by the new `exp!` macro,
which is used at every `check`/`eat` call site. The macro is for
convenience, allowing any pair to be generated from a single identifier.
The ident/keyword filtering in `expected_one_of_not_found` is no longer
necessary. It was there to account for some sloppiness in
`TokenKind`/`TokenType` comparisons.
The existing `TokenType` is moved to a new file `token_type.rs`, and all
its new infrastructure is added to that file. There is more boilerplate
code than I would like, but I can't see how to make it shorter.
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.
This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
Look at the expression that was parsed when trying to recover from a bad `if` condition to determine what was likely intended by the user beyond "maybe this was meant to be an `else` body".
```
error: expected `{`, found `map`
--> $DIR/missing-dot-on-if-condition-expression-fixable.rs:4:30
|
LL | for _ in [1, 2, 3].iter()map(|x| x) {}
| ^^^ expected `{`
|
help: you might have meant to write a method call
|
LL | for _ in [1, 2, 3].iter().map(|x| x) {}
| +
```
This commit does the following.
- Renames `collect_tokens_trailing_token` as `collect_tokens`, because
(a) it's annoying long, and (b) the `_trailing_token` bit is less
accurate now that its types have changed.
- In `collect_tokens`, adds a `Option<CollectPos>` argument and a
`UsePreAttrPos` in the return type of `f`. These are used in
`parse_expr_force_collect` (for vanilla expressions) and in
`parse_stmt_without_recovery` (for two different cases of expression
statements). Together these ensure are enough to fix all the problems
with token collection and assoc expressions. The changes to the
`stringify.rs` test demonstrate some of these.
- Adds a new test. The code in this test was causing an assertion
failure prior to this commit, due to an invalid `NodeRange`.
The extra complexity is annoying, but necessary to fix the existing
problems.
This pre-existing type is suitable for use with the return value of the
`f` parameter in `collect_tokens_trailing_token`. The more descriptive
name will be useful because the next commit will add another boolean
value to the return value of `f`.
Use more slice patterns inside the compiler
Nothing super noteworthy. Just replacing the common 'fragile' pattern of "length check followed by indexing or unwrap" with slice patterns for legibility and 'robustness'.
r? ghost
Previously we would try to issue a suggestion for `let x <op>= 1`, i.e.
a compound assignment within a `let` binding, to remove the `<op>`. The
suggestion code unfortunately incorrectly assumed that the `<op>` is an
exactly-1-byte ASCII character, but this assumption is incorrect because
we also recover Unicode-confusables like `➖=` as `-=`. In this example,
the suggestion code used a `+ BytePos(1)` to calculate the span of the
`<op>` codepoint that looks like `-` but the mult-byte Unicode
look-alike would cause the suggested removal span to be inside a
multi-byte codepoint boundary, triggering a codepoint boundary
assertion.
Issue: <https://github.com/rust-lang/rust/issues/128845>
`parse_expr_assoc_with` has an awkward structure -- sometimes the lhs is
already parsed. This commit splits the post-lhs part into a new method
`parse_expr_assoc_rest_with`, which makes everything shorter and
simpler.
It has a single use. This makes the `let` handling case in
`parse_stmt_without_recovery` more similar to the statement path and
statement expression cases.
There are three places where we currently check `force_collect` and call
`collect_tokens_no_attrs` for `ForceCollect::Yes` and a vanilla parsing
function for `ForceCollect::No`.
But we can instead just pass in `force_collect` and let
`collect_tokens_trailing_token` do the appropriate thing.
It's used in `Parser::collect_tokens_trailing_token` to decide whether
to capture a trailing token. But the callers actually know whether to
capture a trailing token, so it's simpler for them to just pass in a
bool.
Also, the `TrailingToken::Gt` case was weird, because it didn't result
in a trailing token being captured. It could have been subsumed by the
`TrailingToken::MaybeComma` case, and it effectively is in the new code.
Some parser improvements
I was looking closely at attribute handling in the parser while debugging some issues relating to #124141, and found a few small improvements.
``@spastorino``
Go over all structured parser suggestions and make them verbose style.
When suggesting to add or remove delimiters, turn them into multiple suggestion parts.
Combine `NotYetParsed` and `AttributesParsed` into a single variant,
because (a) that reflects the structure of the code that consumes
`LhsExpr`, and (b) because that variant will have the `Option` removed
in a later commit.
Disallow cast with trailing braced macro in let-else
This fixes an edge case I noticed while porting #118880 and #119062 to syn.
Previously, rustc incorrectly accepted code such as:
```rust
let foo = &std::ptr::null as &'static dyn std::ops::Fn() -> *const primitive! {
8
} else {
return;
};
```
even though a right curl brace `}` directly before `else` in a `let...else` statement is not supposed to be valid syntax.
The starting point for this was identical comments on two different
fields, in `ast::VariantData::Struct` and `hir::VariantData::Struct`:
```
// FIXME: investigate making this a `Option<ErrorGuaranteed>`
recovered: bool
```
I tried that, and then found that I needed to add an `ErrorGuaranteed`
to `Recovered::Yes`. Then I ended up using `Recovered` instead of
`Option<ErrorGuaranteed>` for these two places and elsewhere, which
required moving `ErrorGuaranteed` from `rustc_parse` to `rustc_ast`.
This makes things more consistent, because `Recovered` is used in more
places, and there are fewer uses of `bool` and
`Option<ErrorGuaranteed>`. And safer, because it's difficult/impossible
to set `recovered` to `Recovered::Yes` without having emitted an error.
Add asm goto support to `asm!`
Tracking issue: #119364
This PR implements asm-goto support, using the syntax described in "future possibilities" section of [RFC2873](https://rust-lang.github.io/rfcs/2873-inline-asm.html#asm-goto).
Currently I have only implemented the `label` part, not the `fallthrough` part (i.e. fallthrough is implicit). This doesn't reduce the expressive though, since you can use label-break to get arbitrary control flow or simply set a value and rely on jump threading optimisation to get the desired control flow. I can add that later if deemed necessary.
r? ``@Amanieu``
cc ``@ojeda``
Existing names for values of this type are `sess`, `parse_sess`,
`parse_session`, and `ps`. `sess` is particularly annoying because
that's also used for `Session` values, which are often co-located, and
it can be difficult to know which type a value named `sess` refers to.
(That annoyance is the main motivation for this change.) `psess` is nice
and short, which is good for a name used this much.
The commit also renames some `parse_sess_created` values as
`psess_created`.
When a `Local` is fully parsed, but not followed by a `;`, keep the `:` span
arround and mention it. If the type could continue being parsed as an
expression, suggest replacing the `:` with a `=`.
```
error: expected one of `!`, `+`, `->`, `::`, `;`, or `=`, found `.`
--> file.rs:2:32
|
2 | let _: std::env::temp_dir().join("foo");
| - ^ expected one of `!`, `+`, `->`, `::`, `;`, or `=`
| |
| while parsing the type for `_`
| help: use `=` if you meant to assign
```
Fix#119665.
This works for most of its call sites. This is nice, because `emit` very
much makes sense as a consuming operation -- indeed,
`DiagnosticBuilderState` exists to ensure no diagnostic is emitted
twice, but it uses runtime checks.
For the small number of call sites where a consuming emit doesn't work,
the commit adds `DiagnosticBuilder::emit_without_consuming`. (This will
be removed in subsequent commits.)
Likewise, `emit_unless` becomes consuming. And `delay_as_bug` becomes
consuming, while `delay_as_bug_without_consuming` is added (which will
also be removed in subsequent commits.)
All this requires significant changes to `DiagnosticBuilder`'s chaining
methods. Currently `DiagnosticBuilder` method chaining uses a
non-consuming `&mut self -> &mut Self` style, which allows chaining to
be used when the chain ends in `emit()`, like so:
```
struct_err(msg).span(span).emit();
```
But it doesn't work when producing a `DiagnosticBuilder` value,
requiring this:
```
let mut err = self.struct_err(msg);
err.span(span);
err
```
This style of chaining won't work with consuming `emit` though. For
that, we need to use to a `self -> Self` style. That also would allow
`DiagnosticBuilder` production to be chained, e.g.:
```
self.struct_err(msg).span(span)
```
However, removing the `&mut self -> &mut Self` style would require that
individual modifications of a `DiagnosticBuilder` go from this:
```
err.span(span);
```
to this:
```
err = err.span(span);
```
There are *many* such places. I have a high tolerance for tedious
refactorings, but even I gave up after a long time trying to convert
them all.
Instead, this commit has it both ways: the existing `&mut self -> Self`
chaining methods are kept, and new `self -> Self` chaining methods are
added, all of which have a `_mv` suffix (short for "move"). Changes to
the existing `forward!` macro lets this happen with very little
additional boilerplate code. I chose to add the suffix to the new
chaining methods rather than the existing ones, because the number of
changes required is much smaller that way.
This doubled chainging is a bit clumsy, but I think it is worthwhile
because it allows a *lot* of good things to subsequently happen. In this
commit, there are many `mut` qualifiers removed in places where
diagnostics are emitted without being modified. In subsequent commits:
- chaining can be used more, making the code more concise;
- more use of chaining also permits the removal of redundant diagnostic
APIs like `struct_err_with_code`, which can be replaced easily with
`struct_err` + `code_mv`;
- `emit_without_diagnostic` can be removed, which simplifies a lot of
machinery, removing the need for `DiagnosticBuilderState`.