Speed up Parser::expected_token_types.

The parser pushes a `TokenType` to `Parser::expected_token_types` on
every call to the various `check`/`eat` methods, and clears it on every
call to `bump`. Some of those `TokenType` values are full tokens that
require cloning and dropping. This is a *lot* of work for something
that is only used in error messages and it accounts for a significant
fraction of parsing execution time.

This commit overhauls `TokenType` so that `Parser::expected_token_types`
can be implemented as a bitset. This requires changing `TokenType` to a
C-style parameterless enum, and adding `TokenTypeSet` which uses a
`u128` for the bits. (The new `TokenType` has 105 variants.)

The new types `ExpTokenPair` and `ExpKeywordPair` are now arguments to
the `check`/`eat` methods. This is for maximum speed. The elements in
the pairs are always statically known; e.g. a
`token::BinOp(token::Star)` is always paired with a `TokenType::Star`.
So we now compute `TokenType`s in advance and pass them in to
`check`/`eat` rather than the current approach of constructing them on
insertion into `expected_token_types`.

Values of these pair types can be produced by the new `exp!` macro,
which is used at every `check`/`eat` call site. The macro is for
convenience, allowing any pair to be generated from a single identifier.

The ident/keyword filtering in `expected_one_of_not_found` is no longer
necessary. It was there to account for some sloppiness in
`TokenKind`/`TokenType` comparisons.

The existing `TokenType` is moved to a new file `token_type.rs`, and all
its new infrastructure is added to that file. There is more boilerplate
code than I would like, but I can't see how to make it shorter.
This commit is contained in:
Nicholas Nethercote 2024-12-04 15:55:06 +11:00
parent d5370d981f
commit b9bf0b4b10
22 changed files with 1357 additions and 793 deletions

View file

@ -12,6 +12,7 @@ use rustc_errors::{Applicability, Diag, MultiSpan, PResult, SingleLabelManySpans
use rustc_expand::base::*;
use rustc_lint_defs::builtin::NAMED_ARGUMENTS_USED_POSITIONALLY;
use rustc_lint_defs::{BufferedEarlyLint, BuiltinLintDiag, LintId};
use rustc_parse::exp;
use rustc_parse_format as parse;
use rustc_span::{BytePos, ErrorGuaranteed, Ident, InnerSpan, Span, Symbol};
@ -93,12 +94,12 @@ fn parse_args<'a>(ecx: &ExtCtxt<'a>, sp: Span, tts: TokenStream) -> PResult<'a,
let mut first = true;
while p.token != token::Eof {
if !p.eat(&token::Comma) {
if !p.eat(exp!(Comma)) {
if first {
p.clear_expected_token_types();
}
match p.expect(&token::Comma) {
match p.expect(exp!(Comma)) {
Err(err) => {
match token::TokenKind::Comma.similar_tokens() {
Some(tks) if tks.contains(&p.token.kind) => {
@ -122,7 +123,7 @@ fn parse_args<'a>(ecx: &ExtCtxt<'a>, sp: Span, tts: TokenStream) -> PResult<'a,
match p.token.ident() {
Some((ident, _)) if p.look_ahead(1, |t| *t == token::Eq) => {
p.bump();
p.expect(&token::Eq)?;
p.expect(exp!(Eq))?;
let expr = p.parse_expr()?;
if let Some((_, prev)) = args.by_name(ident.name) {
ecx.dcx().emit_err(errors::FormatDuplicateArg {