The parser pushes a `TokenType` to `Parser::expected_token_types` on
every call to the various `check`/`eat` methods, and clears it on every
call to `bump`. Some of those `TokenType` values are full tokens that
require cloning and dropping. This is a *lot* of work for something
that is only used in error messages and it accounts for a significant
fraction of parsing execution time.
This commit overhauls `TokenType` so that `Parser::expected_token_types`
can be implemented as a bitset. This requires changing `TokenType` to a
C-style parameterless enum, and adding `TokenTypeSet` which uses a
`u128` for the bits. (The new `TokenType` has 105 variants.)
The new types `ExpTokenPair` and `ExpKeywordPair` are now arguments to
the `check`/`eat` methods. This is for maximum speed. The elements in
the pairs are always statically known; e.g. a
`token::BinOp(token::Star)` is always paired with a `TokenType::Star`.
So we now compute `TokenType`s in advance and pass them in to
`check`/`eat` rather than the current approach of constructing them on
insertion into `expected_token_types`.
Values of these pair types can be produced by the new `exp!` macro,
which is used at every `check`/`eat` call site. The macro is for
convenience, allowing any pair to be generated from a single identifier.
The ident/keyword filtering in `expected_one_of_not_found` is no longer
necessary. It was there to account for some sloppiness in
`TokenKind`/`TokenType` comparisons.
The existing `TokenType` is moved to a new file `token_type.rs`, and all
its new infrastructure is added to that file. There is more boilerplate
code than I would like, but I can't see how to make it shorter.
This is a naming convention used in a handful of spots in the parser for
delimiters. It confused me when I first saw it a long time ago, and I've
never liked it. A web search says "Bra-ket notation" exists in linear
algebra but the terminology has zero prior use in a programming context,
as far as I can tell.
This commit changes it to `open`/`close`, which is consistent with the
rest of the compiler.
`rustc_symbol` is the source of truth for keywords.
rustdoc has its own implicit definition of keywords, via the
`is_doc_keyword`. It (presumably) intends to include all keywords, but
it omits `yeet`.
rustfmt has its own explicit list of Rust keywords. It also (presumably)
intends to include all keywords, but it omits `await`, `builtin`, `gen`,
`macro_rules`, `raw`, `reuse`, `safe`, and `yeet`. Also, it does linear
searches through this list, which is inefficient.
This commit fixes all of the above problems by introducing a new
predicate `is_any_keyword` in rustc and using it in rustdoc and rustfmt.
It documents that it's not the right predicate in most cases.
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.
This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
`parse_expr_bottom` stores `this.token.span` in `lo`, but then fails to
use it in many places where it could. This commit fixes that, and
likewise (to a smaller extent) in `parse_ty_common`.
This PR detects misspelled keywords using two heuristics:
1. Lowercasing the unexpected identifier.
2. Using edit distance to find a keyword similar to the unexpected identifier.
However, it does not detect each and every misspelled keyword to
minimize false positives and ambiguities. More details about the
implementation can be found in the comments.
This commit does the following.
- Renames `collect_tokens_trailing_token` as `collect_tokens`, because
(a) it's annoying long, and (b) the `_trailing_token` bit is less
accurate now that its types have changed.
- In `collect_tokens`, adds a `Option<CollectPos>` argument and a
`UsePreAttrPos` in the return type of `f`. These are used in
`parse_expr_force_collect` (for vanilla expressions) and in
`parse_stmt_without_recovery` (for two different cases of expression
statements). Together these ensure are enough to fix all the problems
with token collection and assoc expressions. The changes to the
`stringify.rs` test demonstrate some of these.
- Adds a new test. The code in this test was causing an assertion
failure prior to this commit, due to an invalid `NodeRange`.
The extra complexity is annoying, but necessary to fix the existing
problems.
Fix ICE in suggestion caused by `⩵` being recovered as `==`
The second suggestion shown here would previously incorrectly assume that the span corresponding to `⩵` was 2 bytes wide composed by 2 1 byte wide chars, so a span pointing at `==` could point only at one of the `=` to remove it. Instead, we now replace the whole thing (as we should have the whole time):
```
error: unknown start of token: \u{2a75}
--> $DIR/unicode-double-equals-recovery.rs:1:16
|
LL | const A: usize ⩵ 2;
| ^
|
help: Unicode character '⩵' (Two Consecutive Equals Signs) looks like '==' (Double Equals Sign), but it is not
|
LL | const A: usize == 2;
| ~~
error: unexpected `==`
--> $DIR/unicode-double-equals-recovery.rs:1:16
|
LL | const A: usize ⩵ 2;
| ^
|
help: try using `=` instead
|
LL | const A: usize = 2;
| ~
```
Fix#127823.
The second suggestion shown here would previously incorrectly assume that the span corresponding to `⩵` was 2 bytes wide composed by 2 1 byte wide chars, so a span pointing at `==` could point only at one of the `=` to remove it. Instead, we now replace the whole thing (as we should have the whole time):
```
error: unknown start of token: \u{2a75}
--> $DIR/unicode-double-equals-recovery.rs:1:16
|
LL | const A: usize ⩵ 2;
| ^
|
help: Unicode character '⩵' (Two Consecutive Equals Signs) looks like '==' (Double Equals Sign), but it is not
|
LL | const A: usize == 2;
| ~~
error: unexpected `==`
--> $DIR/unicode-double-equals-recovery.rs:1:16
|
LL | const A: usize ⩵ 2;
| ^
|
help: try using `=` instead
|
LL | const A: usize = 2;
| ~
```
Use smaller span for suggesting adding `_:` ahead of a type:
```
error: expected one of `(`, `...`, `..=`, `..`, `::`, `:`, `{`, or `|`, found `)`
--> $DIR/anon-params-denied-2018.rs:12:47
|
LL | fn foo_with_qualified_path(<Bar as T>::Baz);
| ^ expected one of 8 possible tokens
|
= note: anonymous parameters are removed in the 2018 edition (see RFC 1685)
help: explicitly ignore the parameter name
|
LL | fn foo_with_qualified_path(_: <Bar as T>::Baz);
| ++
```
Go over all structured parser suggestions and make them verbose style.
When suggesting to add or remove delimiters, turn them into multiple suggestion parts.
Improve conflict marker recovery
<!--
If this PR is related to an unstable feature or an otherwise tracked effort,
please link to the relevant tracking issue here. If you don't know of a related
tracking issue or there are none, feel free to ignore this.
This PR will get automatically assigned to a reviewer. In case you would like
a specific user to review your work, you can assign it to them by using
r? <reviewer name>
-->
closes#113826
r? ```@estebank``` since you reviewed #115413
cc: ```@rben01``` since you opened up the issue in the first place
Fix duplicated attributes on nonterminal expressions
This PR fixes a long-standing bug (#86055) whereby expression attributes can be duplicated when expanded through declarative macros.
First, consider how items are parsed in declarative macros:
```
Items:
- parse_nonterminal
- parse_item(ForceCollect::Yes)
- parse_item_
- attrs = parse_outer_attributes
- parse_item_common(attrs)
- maybe_whole!
- collect_tokens_trailing_token
```
The important thing is that the parsing of outer attributes is outside token collection, so the item's tokens don't include the attributes. This is how it's supposed to be.
Now consider how expression are parsed in declarative macros:
```
Exprs:
- parse_nonterminal
- parse_expr_force_collect
- collect_tokens_no_attrs
- collect_tokens_trailing_token
- parse_expr
- parse_expr_res(None)
- parse_expr_assoc_with
- parse_expr_prefix
- parse_or_use_outer_attributes
- parse_expr_dot_or_call
```
The important thing is that the parsing of outer attributes is inside token collection, so the the expr's tokens do include the attributes, i.e. in `AttributesData::tokens`.
This PR fixes the bug by rearranging expression parsing to that outer attribute parsing happens outside of token collection. This requires a number of small refactorings because expression parsing is somewhat complicated. While doing so the PR makes the code a bit cleaner and simpler, by eliminating `parse_or_use_outer_attributes` and `Option<AttrWrapper>` arguments (in favour of the simpler `parse_outer_attributes` and `AttrWrapper` arguments), and simplifying `LhsExpr`.
r? `@petrochenkov`
This span records the declaration of the metavariable in the LHS of the macro.
It's used in a couple of error messages. Unfortunately, it gets in the way of
the long-term goal of removing `TokenKind::Interpolated`. So this commit
removes it, which degrades a couple of (obscure) error messages but makes
things simpler and enables the next commit.
Make sure we consume a generic arg when checking mistyped turbofish
When recovering un-turbofish-ed args in expr position (e.g. `let x = a<T, U>();` in `check_mistyped_turbofish_with_multiple_type_params`, we used `parse_seq_to_before_end` to parse the fake generic args; however, it used `parse_generic_arg` which *optionally* parses a generic arg. If it doesn't end up parsing an arg, it returns `Ok(None)` and consumes no tokens. If we don't find a delimiter after this (`,`), we try parsing *another* element. In this case, we just infinitely loop looking for a subsequent element.
We can fix this by making sure that we either parse a generic arg or error in `parse_seq_to_before_end`'s callback.
Fixes#124897
The starting point for this was identical comments on two different
fields, in `ast::VariantData::Struct` and `hir::VariantData::Struct`:
```
// FIXME: investigate making this a `Option<ErrorGuaranteed>`
recovered: bool
```
I tried that, and then found that I needed to add an `ErrorGuaranteed`
to `Recovered::Yes`. Then I ended up using `Recovered` instead of
`Option<ErrorGuaranteed>` for these two places and elsewhere, which
required moving `ErrorGuaranteed` from `rustc_parse` to `rustc_ast`.
This makes things more consistent, because `Recovered` is used in more
places, and there are fewer uses of `bool` and
`Option<ErrorGuaranteed>`. And safer, because it's difficult/impossible
to set `recovered` to `Recovered::Yes` without having emitted an error.
Existing names for values of this type are `sess`, `parse_sess`,
`parse_session`, and `ps`. `sess` is particularly annoying because
that's also used for `Session` values, which are often co-located, and
it can be difficult to know which type a value named `sess` refers to.
(That annoyance is the main motivation for this change.) `psess` is nice
and short, which is good for a name used this much.
The commit also renames some `parse_sess_created` values as
`psess_created`.