2020-12-16 17:34:47 -08:00
|
|
|
//! This is an NFA-based parser, which calls out to the main Rust parser for named non-terminals
|
2017-07-23 11:55:52 +02:00
|
|
|
//! (which it commits to fully when it hits one in a grammar). There's a set of current NFA threads
|
|
|
|
//! and a set of next ones. Instead of NTs, we have a special case for Kleene star. The big-O, in
|
|
|
|
//! pathological cases, is worse than traditional use of NFA or Earley parsing, but it's an easier
|
|
|
|
//! fit for Macro-by-Example-style rules.
|
|
|
|
//!
|
|
|
|
//! (In order to prevent the pathological case, we'd need to lazily construct the resulting
|
|
|
|
//! `NamedMatch`es at the very end. It'd be a pain, and require more memory to keep around old
|
2022-03-25 16:20:39 +11:00
|
|
|
//! matcher positions, but it would also save overhead)
|
2017-07-23 11:55:52 +02:00
|
|
|
//!
|
2018-05-27 09:47:04 +09:00
|
|
|
//! We don't say this parser uses the Earley algorithm, because it's unnecessarily inaccurate.
|
2017-07-23 11:55:52 +02:00
|
|
|
//! The macro parser restricts itself to the features of finite state automata. Earley parsers
|
|
|
|
//! can be described as an extension of NFAs with completion rules, prediction rules, and recursion.
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
|
|
|
//! Quick intro to how the parser works:
|
|
|
|
//!
|
2022-03-25 16:20:39 +11:00
|
|
|
//! A "matcher position" (a.k.a. "position" or "mp") is a dot in the middle of a matcher, usually
|
|
|
|
//! written as a `·`. For example `· a $( a )* a b` is one, as is `a $( · a )* a b`.
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
//! The parser walks through the input a token at a time, maintaining a list
|
2022-03-25 16:20:39 +11:00
|
|
|
//! of threads consistent with the current position in the input string: `cur_mps`.
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
2022-03-25 16:20:39 +11:00
|
|
|
//! As it processes them, it fills up `eof_mps` with threads that would be valid if
|
|
|
|
//! the macro invocation is now over, `bb_mps` with threads that are waiting on
|
|
|
|
//! a Rust non-terminal like `$e:expr`, and `next_mps` with threads that are waiting
|
2015-10-07 23:11:25 +01:00
|
|
|
//! on a particular token. Most of the logic concerns moving the · through the
|
2017-07-23 11:55:52 +02:00
|
|
|
//! repetitions indicated by Kleene stars. The rules for moving the · without
|
|
|
|
//! consuming any input are called epsilon transitions. It only advances or calls
|
2022-03-25 16:20:39 +11:00
|
|
|
//! out to the real Rust parser when no `cur_mps` threads remain.
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! Example:
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! ```text, ignore
|
|
|
|
//! Start parsing a a a a b against [· a $( a )* a b].
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! Remaining input: a a a a b
|
2017-07-23 11:55:52 +02:00
|
|
|
//! next: [· a $( a )* a b]
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! - - - Advance over an a. - - -
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! Remaining input: a a a b
|
2014-06-09 13:12:30 -07:00
|
|
|
//! cur: [a · $( a )* a b]
|
2022-03-25 16:20:39 +11:00
|
|
|
//! Descend/Skip (first position).
|
2014-06-09 13:12:30 -07:00
|
|
|
//! next: [a $( · a )* a b] [a $( a )* · a b].
|
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! - - - Advance over an a. - - -
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! Remaining input: a a b
|
2017-07-23 11:55:52 +02:00
|
|
|
//! cur: [a $( a · )* a b] [a $( a )* a · b]
|
2022-03-25 16:20:39 +11:00
|
|
|
//! Follow epsilon transition: Finish/Repeat (first position)
|
2014-06-09 13:12:30 -07:00
|
|
|
//! next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
|
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! - - - Advance over an a. - - - (this looks exactly like the last step)
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! Remaining input: a b
|
2017-07-23 11:55:52 +02:00
|
|
|
//! cur: [a $( a · )* a b] [a $( a )* a · b]
|
2022-03-25 16:20:39 +11:00
|
|
|
//! Follow epsilon transition: Finish/Repeat (first position)
|
2014-06-09 13:12:30 -07:00
|
|
|
//! next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
|
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! - - - Advance over an a. - - - (this looks exactly like the last step)
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! Remaining input: b
|
2017-07-23 11:55:52 +02:00
|
|
|
//! cur: [a $( a · )* a b] [a $( a )* a · b]
|
2022-03-25 16:20:39 +11:00
|
|
|
//! Follow epsilon transition: Finish/Repeat (first position)
|
2017-07-23 11:55:52 +02:00
|
|
|
//! next: [a $( a )* · a b] [a $( · a )* a b] [a $( a )* a · b]
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! - - - Advance over a b. - - -
|
2014-06-09 13:12:30 -07:00
|
|
|
//!
|
2017-05-13 21:40:06 +02:00
|
|
|
//! Remaining input: ''
|
2014-06-09 13:12:30 -07:00
|
|
|
//! eof: [a $( a )* a b ·]
|
2017-05-13 21:40:06 +02:00
|
|
|
//! ```
|
2014-06-09 13:12:30 -07:00
|
|
|
|
2019-09-22 17:42:17 +03:00
|
|
|
crate use NamedMatch::*;
|
|
|
|
crate use ParseResult::*;
|
2019-02-07 02:33:01 +09:00
|
|
|
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
use crate::mbe::{KleeneOp, TokenTree};
|
2019-10-16 10:59:30 +02:00
|
|
|
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
use rustc_ast::token::{self, DocComment, Nonterminal, NonterminalKind, Token};
|
2022-03-25 12:39:12 +11:00
|
|
|
use rustc_parse::parser::{NtOrTt, Parser};
|
2020-12-28 16:57:13 -06:00
|
|
|
use rustc_span::symbol::MacroRulesNormalizedIdent;
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
use rustc_span::Span;
|
2019-02-07 02:33:01 +09:00
|
|
|
|
2018-08-18 13:55:43 +03:00
|
|
|
use rustc_data_structures::fx::FxHashMap;
|
2019-02-15 12:36:10 +11:00
|
|
|
use rustc_data_structures::sync::Lrc;
|
2021-06-07 20:17:48 -05:00
|
|
|
use rustc_span::symbol::Ident;
|
2020-02-05 09:44:03 +11:00
|
|
|
use std::borrow::Cow;
|
2018-08-18 13:55:43 +03:00
|
|
|
use std::collections::hash_map::Entry::{Occupied, Vacant};
|
2014-10-06 23:00:56 +01:00
|
|
|
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
/// A unit within a matcher that a `MatcherPos` can refer to. Similar to (and derived from)
|
|
|
|
/// `mbe::TokenTree`, but designed specifically for fast and easy traversal during matching.
|
|
|
|
/// Notable differences to `mbe::TokenTree`:
|
|
|
|
/// - It is non-recursive, i.e. there is no nesting.
|
|
|
|
/// - The end pieces of each sequence (the separator, if present, and the Kleene op) are
|
|
|
|
/// represented explicitly, as is the very end of the matcher.
|
|
|
|
///
|
|
|
|
/// This means a matcher can be represented by `&[MatcherLoc]`, and traversal mostly involves
|
|
|
|
/// simply incrementing the current matcher position index by one.
|
2022-04-05 16:34:46 +10:00
|
|
|
pub(super) enum MatcherLoc {
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
Token {
|
2022-04-05 12:12:15 +10:00
|
|
|
token: Token,
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
},
|
|
|
|
Delimited,
|
|
|
|
Sequence {
|
|
|
|
op: KleeneOp,
|
|
|
|
num_metavar_decls: usize,
|
|
|
|
idx_first_after: usize,
|
|
|
|
next_metavar: usize,
|
|
|
|
seq_depth: usize,
|
|
|
|
},
|
|
|
|
SequenceKleeneOpNoSep {
|
|
|
|
op: KleeneOp,
|
|
|
|
idx_first: usize,
|
|
|
|
},
|
|
|
|
SequenceSep {
|
2022-04-05 12:12:15 +10:00
|
|
|
separator: Token,
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
},
|
|
|
|
SequenceKleeneOpAfterSep {
|
|
|
|
idx_first: usize,
|
|
|
|
},
|
|
|
|
MetaVarDecl {
|
|
|
|
span: Span,
|
|
|
|
bind: Ident,
|
2022-04-05 16:31:30 +10:00
|
|
|
kind: Option<NonterminalKind>,
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
next_metavar: usize,
|
|
|
|
seq_depth: usize,
|
|
|
|
},
|
|
|
|
Eof,
|
2022-03-31 08:21:36 +11:00
|
|
|
}
|
|
|
|
|
2022-04-13 22:59:45 +02:00
|
|
|
pub(super) fn compute_locs(matcher: &[TokenTree]) -> Vec<MatcherLoc> {
|
2022-04-05 16:34:46 +10:00
|
|
|
fn inner(
|
|
|
|
tts: &[TokenTree],
|
|
|
|
locs: &mut Vec<MatcherLoc>,
|
|
|
|
next_metavar: &mut usize,
|
|
|
|
seq_depth: usize,
|
|
|
|
) {
|
|
|
|
for tt in tts {
|
|
|
|
match tt {
|
|
|
|
TokenTree::Token(token) => {
|
|
|
|
locs.push(MatcherLoc::Token { token: token.clone() });
|
|
|
|
}
|
2022-04-08 17:38:28 +10:00
|
|
|
TokenTree::Delimited(span, delimited) => {
|
2022-04-11 10:55:49 +10:00
|
|
|
let open_token = Token::new(token::OpenDelim(delimited.delim), span.open);
|
|
|
|
let close_token = Token::new(token::CloseDelim(delimited.delim), span.close);
|
|
|
|
|
2022-04-05 16:34:46 +10:00
|
|
|
locs.push(MatcherLoc::Delimited);
|
2022-04-11 10:55:49 +10:00
|
|
|
locs.push(MatcherLoc::Token { token: open_token });
|
2022-04-13 22:59:45 +02:00
|
|
|
inner(&delimited.tts, locs, next_metavar, seq_depth);
|
2022-04-11 10:55:49 +10:00
|
|
|
locs.push(MatcherLoc::Token { token: close_token });
|
2022-04-05 16:34:46 +10:00
|
|
|
}
|
|
|
|
TokenTree::Sequence(_, seq) => {
|
|
|
|
// We can't determine `idx_first_after` and construct the final
|
|
|
|
// `MatcherLoc::Sequence` until after `inner()` is called and the sequence end
|
|
|
|
// pieces are processed. So we push a dummy value (`Eof` is cheapest to
|
|
|
|
// construct) now, and overwrite it with the proper value below.
|
|
|
|
let dummy = MatcherLoc::Eof;
|
|
|
|
locs.push(dummy);
|
|
|
|
|
|
|
|
let next_metavar_orig = *next_metavar;
|
|
|
|
let op = seq.kleene.op;
|
|
|
|
let idx_first = locs.len();
|
|
|
|
let idx_seq = idx_first - 1;
|
2022-04-13 22:59:45 +02:00
|
|
|
inner(&seq.tts, locs, next_metavar, seq_depth + 1);
|
2022-04-05 16:34:46 +10:00
|
|
|
|
|
|
|
if let Some(separator) = &seq.separator {
|
|
|
|
locs.push(MatcherLoc::SequenceSep { separator: separator.clone() });
|
|
|
|
locs.push(MatcherLoc::SequenceKleeneOpAfterSep { idx_first });
|
|
|
|
} else {
|
|
|
|
locs.push(MatcherLoc::SequenceKleeneOpNoSep { op, idx_first });
|
|
|
|
}
|
|
|
|
|
|
|
|
// Overwrite the dummy value pushed above with the proper value.
|
|
|
|
locs[idx_seq] = MatcherLoc::Sequence {
|
|
|
|
op,
|
|
|
|
num_metavar_decls: seq.num_captures,
|
|
|
|
idx_first_after: locs.len(),
|
|
|
|
next_metavar: next_metavar_orig,
|
|
|
|
seq_depth,
|
|
|
|
};
|
|
|
|
}
|
|
|
|
&TokenTree::MetaVarDecl(span, bind, kind) => {
|
|
|
|
locs.push(MatcherLoc::MetaVarDecl {
|
|
|
|
span,
|
|
|
|
bind,
|
|
|
|
kind,
|
|
|
|
next_metavar: *next_metavar,
|
|
|
|
seq_depth,
|
|
|
|
});
|
|
|
|
*next_metavar += 1;
|
|
|
|
}
|
|
|
|
TokenTree::MetaVar(..) | TokenTree::MetaVarExpr(..) => unreachable!(),
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
let mut locs = vec![];
|
|
|
|
let mut next_metavar = 0;
|
2022-04-13 22:59:45 +02:00
|
|
|
inner(matcher, &mut locs, &mut next_metavar, /* seq_depth */ 0);
|
2022-04-05 16:34:46 +10:00
|
|
|
|
|
|
|
// A final entry is needed for eof.
|
|
|
|
locs.push(MatcherLoc::Eof);
|
|
|
|
|
|
|
|
locs
|
|
|
|
}
|
|
|
|
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
/// A single matcher position, representing the state of matching.
|
|
|
|
struct MatcherPos {
|
|
|
|
/// The index into `TtParser::locs`, which represents the "dot".
|
2015-01-17 23:33:05 +00:00
|
|
|
idx: usize,
|
2018-11-12 09:18:57 +11:00
|
|
|
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
/// The matches made against metavar decls so far. On a successful match, this vector ends up
|
|
|
|
/// with one element per metavar decl in the matcher. Each element records token trees matched
|
|
|
|
/// against the relevant metavar by the black box parser. An element will be a `MatchedSeq` if
|
|
|
|
/// the corresponding metavar decl is within a sequence.
|
2022-04-08 14:25:37 +10:00
|
|
|
///
|
|
|
|
/// It is critical to performance that this is an `Lrc`, because it gets cloned frequently when
|
|
|
|
/// processing sequences. Mostly for sequence-ending possibilities that must be tried but end
|
|
|
|
/// up failing.
|
2022-04-08 14:16:44 +10:00
|
|
|
matches: Lrc<Vec<NamedMatch>>,
|
2013-02-21 00:16:31 -08:00
|
|
|
}
|
2012-06-12 10:59:50 -07:00
|
|
|
|
2022-03-03 11:02:43 +11:00
|
|
|
// This type is used a lot. Make sure it doesn't unintentionally get bigger.
|
|
|
|
#[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
rustc_data_structures::static_assert_size!(MatcherPos, 16);
|
2022-03-03 12:14:27 +11:00
|
|
|
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
impl MatcherPos {
|
|
|
|
/// Adds `m` as a named match for the `metavar_idx`-th metavar. There are only two call sites,
|
|
|
|
/// and both are hot enough to be always worth inlining.
|
|
|
|
#[inline(always)]
|
|
|
|
fn push_match(&mut self, metavar_idx: usize, seq_depth: usize, m: NamedMatch) {
|
2022-03-28 17:13:56 +11:00
|
|
|
let matches = Lrc::make_mut(&mut self.matches);
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
match seq_depth {
|
2022-03-28 17:13:56 +11:00
|
|
|
0 => {
|
|
|
|
// We are not within a sequence. Just append `m`.
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
assert_eq!(metavar_idx, matches.len());
|
2022-03-28 17:13:56 +11:00
|
|
|
matches.push(m);
|
|
|
|
}
|
|
|
|
_ => {
|
|
|
|
// We are within a sequence. Find the final `MatchedSeq` at the appropriate depth
|
|
|
|
// and append `m` to its vector.
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
let mut curr = &mut matches[metavar_idx];
|
|
|
|
for _ in 0..seq_depth - 1 {
|
2022-03-28 17:13:56 +11:00
|
|
|
match curr {
|
2022-04-08 14:16:44 +10:00
|
|
|
MatchedSeq(seq) => curr = seq.last_mut().unwrap(),
|
2022-03-28 17:13:56 +11:00
|
|
|
_ => unreachable!(),
|
|
|
|
}
|
|
|
|
}
|
|
|
|
match curr {
|
2022-04-08 14:16:44 +10:00
|
|
|
MatchedSeq(seq) => seq.push(m),
|
2022-03-28 17:13:56 +11:00
|
|
|
_ => unreachable!(),
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
2017-06-08 05:51:32 -06:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
enum EofMatcherPositions {
|
2022-03-03 11:10:21 +11:00
|
|
|
None,
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
One(MatcherPos),
|
2022-03-03 11:10:21 +11:00
|
|
|
Multiple,
|
|
|
|
}
|
|
|
|
|
2018-01-24 22:03:57 -06:00
|
|
|
/// Represents the possible results of an attempted parse.
|
2019-09-22 17:42:17 +03:00
|
|
|
crate enum ParseResult<T> {
|
2018-01-24 22:03:57 -06:00
|
|
|
/// Parsed successfully.
|
|
|
|
Success(T),
|
|
|
|
/// Arm failed to match. If the second parameter is `token::Eof`, it indicates an unexpected
|
|
|
|
/// end of macro invocation. Otherwise, it indicates that no rules expected the given token.
|
2019-06-05 01:17:07 +03:00
|
|
|
Failure(Token, &'static str),
|
2018-01-24 22:03:57 -06:00
|
|
|
/// Fatal error (malformed macro?). Abort compilation.
|
2019-12-31 20:15:40 +03:00
|
|
|
Error(rustc_span::Span, String),
|
2020-03-17 14:13:32 +01:00
|
|
|
ErrorReported,
|
2018-01-24 22:03:57 -06:00
|
|
|
}
|
|
|
|
|
2020-03-11 20:05:19 +00:00
|
|
|
/// A `ParseResult` where the `Success` variant contains a mapping of
|
|
|
|
/// `MacroRulesNormalizedIdent`s to `NamedMatch`es. This represents the mapping
|
|
|
|
/// of metavars to the token trees they bind to.
|
|
|
|
crate type NamedParseResult = ParseResult<FxHashMap<MacroRulesNormalizedIdent, NamedMatch>>;
|
2016-11-07 19:40:00 -07:00
|
|
|
|
2022-03-25 16:23:26 +11:00
|
|
|
/// Count how many metavars declarations are in `matcher`.
|
|
|
|
pub(super) fn count_metavar_decls(matcher: &[TokenTree]) -> usize {
|
|
|
|
matcher
|
|
|
|
.iter()
|
2022-03-31 18:56:40 +03:00
|
|
|
.map(|tt| match tt {
|
|
|
|
TokenTree::MetaVarDecl(..) => 1,
|
|
|
|
TokenTree::Sequence(_, seq) => seq.num_captures,
|
2022-04-08 17:38:28 +10:00
|
|
|
TokenTree::Delimited(_, delim) => count_metavar_decls(&delim.tts),
|
2022-03-31 18:56:40 +03:00
|
|
|
TokenTree::Token(..) => 0,
|
|
|
|
TokenTree::MetaVar(..) | TokenTree::MetaVarExpr(..) => unreachable!(),
|
2022-03-25 16:23:26 +11:00
|
|
|
})
|
|
|
|
.sum()
|
2014-10-06 23:00:56 +01:00
|
|
|
}
|
|
|
|
|
2022-03-23 11:46:22 +11:00
|
|
|
/// `NamedMatch` is a pattern-match result for a single metavar. All
|
2022-03-25 12:39:12 +11:00
|
|
|
/// `MatchedNonterminal`s in the `NamedMatch` have the same non-terminal type
|
2022-03-23 11:46:22 +11:00
|
|
|
/// (expr, item, etc).
|
2014-06-09 13:12:30 -07:00
|
|
|
///
|
2017-05-12 20:05:39 +02:00
|
|
|
/// The in-memory structure of a particular `NamedMatch` represents the match
|
2014-10-06 23:00:56 +01:00
|
|
|
/// that occurred when a particular subset of a matcher was applied to a
|
|
|
|
/// particular token tree.
|
2014-06-09 13:12:30 -07:00
|
|
|
///
|
2017-05-12 20:05:39 +02:00
|
|
|
/// The width of each `MatchedSeq` in the `NamedMatch`, and the identity of
|
2022-03-23 11:46:22 +11:00
|
|
|
/// the `MatchedNtNonTts`s, will depend on the token tree it was applied
|
|
|
|
/// to: each `MatchedSeq` corresponds to a single repetition in the originating
|
2017-05-12 20:05:39 +02:00
|
|
|
/// token tree. The depth of the `NamedMatch` structure will therefore depend
|
2022-03-23 11:46:22 +11:00
|
|
|
/// only on the nesting depth of repetitions in the originating token tree it
|
|
|
|
/// was derived from.
|
2022-03-02 21:33:43 -06:00
|
|
|
///
|
2022-04-07 08:51:59 +01:00
|
|
|
/// In layperson's terms: `NamedMatch` will form a tree representing nested matches of a particular
|
2022-03-02 21:33:43 -06:00
|
|
|
/// meta variable. For example, if we are matching the following macro against the following
|
|
|
|
/// invocation...
|
|
|
|
///
|
|
|
|
/// ```rust
|
|
|
|
/// macro_rules! foo {
|
|
|
|
/// ($($($x:ident),+);+) => {}
|
|
|
|
/// }
|
|
|
|
///
|
|
|
|
/// foo!(a, b, c, d; a, b, c, d, e);
|
|
|
|
/// ```
|
|
|
|
///
|
|
|
|
/// Then, the tree will have the following shape:
|
|
|
|
///
|
|
|
|
/// ```rust
|
|
|
|
/// MatchedSeq([
|
|
|
|
/// MatchedSeq([
|
2022-03-25 12:39:12 +11:00
|
|
|
/// MatchedNonterminal(a),
|
|
|
|
/// MatchedNonterminal(b),
|
|
|
|
/// MatchedNonterminal(c),
|
|
|
|
/// MatchedNonterminal(d),
|
2022-03-02 21:33:43 -06:00
|
|
|
/// ]),
|
|
|
|
/// MatchedSeq([
|
2022-03-25 12:39:12 +11:00
|
|
|
/// MatchedNonterminal(a),
|
|
|
|
/// MatchedNonterminal(b),
|
|
|
|
/// MatchedNonterminal(c),
|
|
|
|
/// MatchedNonterminal(d),
|
|
|
|
/// MatchedNonterminal(e),
|
2022-03-02 21:33:43 -06:00
|
|
|
/// ])
|
|
|
|
/// ])
|
|
|
|
/// ```
|
2017-06-08 05:51:32 -06:00
|
|
|
#[derive(Debug, Clone)]
|
2019-09-22 17:42:17 +03:00
|
|
|
crate enum NamedMatch {
|
2022-04-08 14:16:44 +10:00
|
|
|
MatchedSeq(Vec<NamedMatch>),
|
2022-03-23 11:46:22 +11:00
|
|
|
|
2022-03-25 12:39:12 +11:00
|
|
|
// A metavar match of type `tt`.
|
|
|
|
MatchedTokenTree(rustc_ast::tokenstream::TokenTree),
|
2022-03-23 11:46:22 +11:00
|
|
|
|
2022-03-25 12:39:12 +11:00
|
|
|
// A metavar match of any type other than `tt`.
|
|
|
|
MatchedNonterminal(Lrc<Nonterminal>),
|
2012-07-27 19:14:46 -07:00
|
|
|
}
|
2012-06-12 10:59:50 -07:00
|
|
|
|
2019-02-08 14:53:55 +01:00
|
|
|
/// Performs a token equality check, ignoring syntax context (that is, an unhygienic comparison)
|
2019-06-08 19:45:12 +03:00
|
|
|
fn token_name_eq(t1: &Token, t2: &Token) -> bool {
|
|
|
|
if let (Some((ident1, is_raw1)), Some((ident2, is_raw2))) = (t1.ident(), t2.ident()) {
|
|
|
|
ident1.name == ident2.name && is_raw1 == is_raw2
|
|
|
|
} else if let (Some(ident1), Some(ident2)) = (t1.lifetime(), t2.lifetime()) {
|
|
|
|
ident1.name == ident2.name
|
2017-05-15 09:26:26 +00:00
|
|
|
} else {
|
2019-06-08 19:45:12 +03:00
|
|
|
t1.kind == t2.kind
|
2013-09-05 14:14:31 -07:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
// Note: the vectors could be created and dropped within `parse_tt`, but to avoid excess
|
2022-03-19 09:53:41 +11:00
|
|
|
// allocations we have a single vector fo each kind that is cleared and reused repeatedly.
|
2022-04-05 12:12:15 +10:00
|
|
|
pub struct TtParser {
|
2022-03-19 08:03:48 +11:00
|
|
|
macro_name: Ident,
|
2022-03-19 08:56:24 +11:00
|
|
|
|
2022-03-25 16:20:39 +11:00
|
|
|
/// The set of current mps to be processed. This should be empty by the end of a successful
|
2022-03-19 09:53:41 +11:00
|
|
|
/// execution of `parse_tt_inner`.
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
cur_mps: Vec<MatcherPos>,
|
2022-03-19 09:53:41 +11:00
|
|
|
|
2022-03-25 16:20:39 +11:00
|
|
|
/// The set of newly generated mps. These are used to replenish `cur_mps` in the function
|
2022-03-19 09:53:41 +11:00
|
|
|
/// `parse_tt`.
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
next_mps: Vec<MatcherPos>,
|
2022-03-19 09:53:41 +11:00
|
|
|
|
2022-03-25 16:20:39 +11:00
|
|
|
/// The set of mps that are waiting for the black-box parser.
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
bb_mps: Vec<MatcherPos>,
|
2022-03-30 10:54:57 +11:00
|
|
|
|
|
|
|
/// Pre-allocate an empty match array, so it can be cloned cheaply for macros with many rules
|
|
|
|
/// that have no metavars.
|
2022-04-08 14:16:44 +10:00
|
|
|
empty_matches: Lrc<Vec<NamedMatch>>,
|
2022-03-19 08:03:48 +11:00
|
|
|
}
|
2022-03-19 07:47:22 +11:00
|
|
|
|
2022-04-05 12:12:15 +10:00
|
|
|
impl TtParser {
|
|
|
|
pub(super) fn new(macro_name: Ident) -> TtParser {
|
2022-03-30 10:54:57 +11:00
|
|
|
TtParser {
|
|
|
|
macro_name,
|
|
|
|
cur_mps: vec![],
|
|
|
|
next_mps: vec![],
|
|
|
|
bb_mps: vec![],
|
2022-04-08 14:16:44 +10:00
|
|
|
empty_matches: Lrc::new(vec![]),
|
2022-03-30 10:54:57 +11:00
|
|
|
}
|
2022-03-19 08:03:48 +11:00
|
|
|
}
|
|
|
|
|
2022-03-25 16:20:39 +11:00
|
|
|
/// Process the matcher positions of `cur_mps` until it is empty. In the process, this will
|
|
|
|
/// produce more mps in `next_mps` and `bb_mps`.
|
2022-03-19 07:47:22 +11:00
|
|
|
///
|
|
|
|
/// # Returns
|
|
|
|
///
|
|
|
|
/// `Some(result)` if everything is finished, `None` otherwise. Note that matches are kept
|
2022-03-25 16:20:39 +11:00
|
|
|
/// track of through the mps generated.
|
2022-03-19 09:53:41 +11:00
|
|
|
fn parse_tt_inner(
|
|
|
|
&mut self,
|
2022-04-05 16:34:46 +10:00
|
|
|
matcher: &[MatcherLoc],
|
2022-03-19 07:47:22 +11:00
|
|
|
token: &Token,
|
|
|
|
) -> Option<NamedParseResult> {
|
|
|
|
// Matcher positions that would be valid if the macro invocation was over now. Only
|
|
|
|
// modified if `token == Eof`.
|
2022-03-25 16:20:39 +11:00
|
|
|
let mut eof_mps = EofMatcherPositions::None;
|
|
|
|
|
|
|
|
while let Some(mut mp) = self.cur_mps.pop() {
|
2022-04-05 16:34:46 +10:00
|
|
|
match &matcher[mp.idx] {
|
2022-04-04 15:47:53 +10:00
|
|
|
MatcherLoc::Token { token: t } => {
|
|
|
|
// If it's a doc comment, we just ignore it and move on to the next tt in the
|
|
|
|
// matcher. This is a bug, but #95267 showed that existing programs rely on
|
|
|
|
// this behaviour, and changing it would require some care and a transition
|
|
|
|
// period.
|
|
|
|
//
|
|
|
|
// If the token matches, we can just advance the parser.
|
|
|
|
//
|
|
|
|
// Otherwise, this match has failed, there is nothing to do, and hopefully
|
|
|
|
// another mp in `cur_mps` will match.
|
|
|
|
if matches!(t, Token { kind: DocComment(..), .. }) {
|
|
|
|
mp.idx += 1;
|
|
|
|
self.cur_mps.push(mp);
|
|
|
|
} else if token_name_eq(&t, token) {
|
|
|
|
mp.idx += 1;
|
|
|
|
self.next_mps.push(mp);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
MatcherLoc::Delimited => {
|
|
|
|
// Entering the delimeter is trivial.
|
|
|
|
mp.idx += 1;
|
|
|
|
self.cur_mps.push(mp);
|
|
|
|
}
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
&MatcherLoc::Sequence {
|
|
|
|
op,
|
|
|
|
num_metavar_decls,
|
|
|
|
idx_first_after,
|
|
|
|
next_metavar,
|
|
|
|
seq_depth,
|
|
|
|
} => {
|
|
|
|
// Install an empty vec for each metavar within the sequence.
|
|
|
|
for metavar_idx in next_metavar..next_metavar + num_metavar_decls {
|
2022-04-08 14:16:44 +10:00
|
|
|
mp.push_match(metavar_idx, seq_depth, MatchedSeq(vec![]));
|
2014-10-06 23:15:12 -07:00
|
|
|
}
|
2016-11-11 16:28:47 -07:00
|
|
|
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
if op == KleeneOp::ZeroOrMore || op == KleeneOp::ZeroOrOne {
|
|
|
|
// Try zero matches of this sequence, by skipping over it.
|
|
|
|
self.cur_mps.push(MatcherPos {
|
|
|
|
idx: idx_first_after,
|
|
|
|
matches: mp.matches.clone(), // a cheap clone
|
|
|
|
});
|
2022-03-19 07:47:22 +11:00
|
|
|
}
|
2018-01-24 23:10:39 -06:00
|
|
|
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
// Try one or more matches of this sequence, by entering it.
|
|
|
|
mp.idx += 1;
|
|
|
|
self.cur_mps.push(mp);
|
|
|
|
}
|
|
|
|
&MatcherLoc::SequenceKleeneOpNoSep { op, idx_first } => {
|
|
|
|
// We are past the end of a sequence with no separator. Try ending the
|
|
|
|
// sequence. If that's not possible, `ending_mp` will fail quietly when it is
|
2022-03-31 07:00:07 +11:00
|
|
|
// processed next time around the loop.
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
let ending_mp = MatcherPos {
|
|
|
|
idx: mp.idx + 1, // +1 skips the Kleene op
|
2022-03-31 08:21:36 +11:00
|
|
|
matches: mp.matches.clone(), // a cheap clone
|
|
|
|
};
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
self.cur_mps.push(ending_mp);
|
|
|
|
|
|
|
|
if op != KleeneOp::ZeroOrOne {
|
|
|
|
// Try another repetition.
|
|
|
|
mp.idx = idx_first;
|
|
|
|
self.cur_mps.push(mp);
|
|
|
|
}
|
2016-11-11 16:28:47 -07:00
|
|
|
}
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
MatcherLoc::SequenceSep { separator } => {
|
|
|
|
// We are past the end of a sequence with a separator but we haven't seen the
|
|
|
|
// separator yet. Try ending the sequence. If that's not possible, `ending_mp`
|
|
|
|
// will fail quietly when it is processed next time around the loop.
|
|
|
|
let ending_mp = MatcherPos {
|
|
|
|
idx: mp.idx + 2, // +2 skips the separator and the Kleene op
|
|
|
|
matches: mp.matches.clone(), // a cheap clone
|
|
|
|
};
|
|
|
|
self.cur_mps.push(ending_mp);
|
2018-01-24 23:10:39 -06:00
|
|
|
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
if token_name_eq(token, separator) {
|
|
|
|
// The separator matches the current token. Advance past it.
|
2022-03-25 16:20:39 +11:00
|
|
|
mp.idx += 1;
|
|
|
|
self.next_mps.push(mp);
|
2022-03-18 17:13:41 +11:00
|
|
|
}
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
}
|
|
|
|
&MatcherLoc::SequenceKleeneOpAfterSep { idx_first } => {
|
|
|
|
// We are past the sequence separator. This can't be a `?` Kleene op, because
|
|
|
|
// they don't permit separators. Try another repetition.
|
|
|
|
mp.idx = idx_first;
|
2022-03-25 16:20:39 +11:00
|
|
|
self.cur_mps.push(mp);
|
2012-06-12 10:59:50 -07:00
|
|
|
}
|
2022-04-05 16:31:30 +10:00
|
|
|
&MatcherLoc::MetaVarDecl { span, kind, .. } => {
|
2022-04-04 15:47:53 +10:00
|
|
|
// Built-in nonterminals never start with these tokens, so we can eliminate
|
|
|
|
// them from consideration. We use the span of the metavariable declaration
|
|
|
|
// to determine any edition-specific matching behavior for non-terminals.
|
2022-04-05 16:31:30 +10:00
|
|
|
if let Some(kind) = kind {
|
|
|
|
if Parser::nonterminal_may_begin_with(kind, token) {
|
|
|
|
self.bb_mps.push(mp);
|
|
|
|
}
|
|
|
|
} else {
|
2022-04-08 16:04:37 +03:00
|
|
|
// E.g. `$e` instead of `$e:expr`, reported as a hard error if actually used.
|
2022-04-05 16:31:30 +10:00
|
|
|
// Both this check and the one in `nameize` are necessary, surprisingly.
|
2022-04-08 16:04:37 +03:00
|
|
|
return Some(Error(span, "missing fragment specifier".to_string()));
|
2022-04-04 15:47:53 +10:00
|
|
|
}
|
|
|
|
}
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
MatcherLoc::Eof => {
|
|
|
|
// We are past the matcher's end, and not in a sequence. Try to end things.
|
2022-04-05 16:34:46 +10:00
|
|
|
debug_assert_eq!(mp.idx, matcher.len() - 1);
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
if *token == token::Eof {
|
|
|
|
eof_mps = match eof_mps {
|
|
|
|
EofMatcherPositions::None => EofMatcherPositions::One(mp),
|
|
|
|
EofMatcherPositions::One(_) | EofMatcherPositions::Multiple => {
|
|
|
|
EofMatcherPositions::Multiple
|
|
|
|
}
|
2022-03-25 16:20:39 +11:00
|
|
|
}
|
2022-03-19 07:47:22 +11:00
|
|
|
}
|
2022-03-18 14:16:45 +11:00
|
|
|
}
|
|
|
|
}
|
2022-03-19 07:47:22 +11:00
|
|
|
}
|
2022-03-18 14:16:45 +11:00
|
|
|
|
2022-03-19 07:47:22 +11:00
|
|
|
// If we reached the end of input, check that there is EXACTLY ONE possible matcher.
|
|
|
|
// Otherwise, either the parse is ambiguous (which is an error) or there is a syntax error.
|
|
|
|
if *token == token::Eof {
|
2022-03-25 16:20:39 +11:00
|
|
|
Some(match eof_mps {
|
|
|
|
EofMatcherPositions::One(mut eof_mp) => {
|
2022-03-28 17:13:56 +11:00
|
|
|
// Need to take ownership of the matches from within the `Lrc`.
|
|
|
|
Lrc::make_mut(&mut eof_mp.matches);
|
|
|
|
let matches = Lrc::try_unwrap(eof_mp.matches).unwrap().into_iter();
|
2022-04-08 16:04:37 +03:00
|
|
|
self.nameize(matcher, matches)
|
2022-03-18 14:16:45 +11:00
|
|
|
}
|
2022-03-25 16:20:39 +11:00
|
|
|
EofMatcherPositions::Multiple => {
|
2022-03-19 07:47:22 +11:00
|
|
|
Error(token.span, "ambiguity: multiple successful parses".to_string())
|
2022-03-18 14:16:45 +11:00
|
|
|
}
|
2022-03-25 16:20:39 +11:00
|
|
|
EofMatcherPositions::None => Failure(
|
2022-03-19 07:47:22 +11:00
|
|
|
Token::new(
|
|
|
|
token::Eof,
|
|
|
|
if token.span.is_dummy() { token.span } else { token.span.shrink_to_hi() },
|
|
|
|
),
|
|
|
|
"missing tokens in macro arguments",
|
|
|
|
),
|
|
|
|
})
|
|
|
|
} else {
|
|
|
|
None
|
2012-06-12 10:59:50 -07:00
|
|
|
}
|
2016-11-11 16:28:47 -07:00
|
|
|
}
|
|
|
|
|
2022-03-25 16:20:39 +11:00
|
|
|
/// Match the token stream from `parser` against `matcher`.
|
2022-03-19 07:47:22 +11:00
|
|
|
pub(super) fn parse_tt(
|
2022-03-19 09:53:41 +11:00
|
|
|
&mut self,
|
2022-03-19 07:47:22 +11:00
|
|
|
parser: &mut Cow<'_, Parser<'_>>,
|
2022-04-05 16:34:46 +10:00
|
|
|
matcher: &[MatcherLoc],
|
2022-03-19 07:47:22 +11:00
|
|
|
) -> NamedParseResult {
|
|
|
|
// A queue of possible matcher positions. We initialize it with the matcher position in
|
2022-03-25 16:20:39 +11:00
|
|
|
// which the "dot" is before the first token of the first token tree in `matcher`.
|
2022-03-19 07:47:22 +11:00
|
|
|
// `parse_tt_inner` then processes all of these possible matcher positions and produces
|
2022-03-25 16:20:39 +11:00
|
|
|
// possible next positions into `next_mps`. After some post-processing, the contents of
|
|
|
|
// `next_mps` replenish `cur_mps` and we start over again.
|
|
|
|
self.cur_mps.clear();
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
self.cur_mps.push(MatcherPos { idx: 0, matches: self.empty_matches.clone() });
|
2022-03-19 07:47:22 +11:00
|
|
|
|
|
|
|
loop {
|
2022-03-25 16:20:39 +11:00
|
|
|
self.next_mps.clear();
|
|
|
|
self.bb_mps.clear();
|
2022-03-19 07:47:22 +11:00
|
|
|
|
2022-03-25 16:20:39 +11:00
|
|
|
// Process `cur_mps` until either we have finished the input or we need to get some
|
2022-03-19 07:47:22 +11:00
|
|
|
// parsing from the black-box parser done.
|
2022-04-08 16:04:37 +03:00
|
|
|
if let Some(res) = self.parse_tt_inner(matcher, &parser.token) {
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
return res;
|
2022-03-09 14:34:24 +11:00
|
|
|
}
|
2016-11-11 16:28:47 -07:00
|
|
|
|
2022-03-25 16:20:39 +11:00
|
|
|
// `parse_tt_inner` handled all of `cur_mps`, so it's empty.
|
|
|
|
assert!(self.cur_mps.is_empty());
|
2022-03-19 07:47:22 +11:00
|
|
|
|
|
|
|
// Error messages here could be improved with links to original rules.
|
2022-03-25 16:20:39 +11:00
|
|
|
match (self.next_mps.len(), self.bb_mps.len()) {
|
2022-03-19 07:47:22 +11:00
|
|
|
(0, 0) => {
|
|
|
|
// There are no possible next positions AND we aren't waiting for the black-box
|
|
|
|
// parser: syntax error.
|
|
|
|
return Failure(
|
|
|
|
parser.token.clone(),
|
|
|
|
"no rules expected this token in macro call",
|
|
|
|
);
|
|
|
|
}
|
2012-06-12 10:59:50 -07:00
|
|
|
|
2022-03-19 07:47:22 +11:00
|
|
|
(_, 0) => {
|
2022-03-25 16:20:39 +11:00
|
|
|
// Dump all possible `next_mps` into `cur_mps` for the next iteration. Then
|
2022-03-19 07:47:22 +11:00
|
|
|
// process the next token.
|
2022-04-13 22:18:28 +02:00
|
|
|
self.cur_mps.append(&mut self.next_mps);
|
2022-03-19 07:47:22 +11:00
|
|
|
parser.to_mut().bump();
|
|
|
|
}
|
2022-03-03 11:00:50 +11:00
|
|
|
|
2022-03-19 07:47:22 +11:00
|
|
|
(0, 1) => {
|
|
|
|
// We need to call the black-box parser to get some nonterminal.
|
2022-03-25 16:20:39 +11:00
|
|
|
let mut mp = self.bb_mps.pop().unwrap();
|
2022-04-05 16:34:46 +10:00
|
|
|
let loc = &matcher[mp.idx];
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
if let &MatcherLoc::MetaVarDecl {
|
2022-04-05 16:31:30 +10:00
|
|
|
span,
|
|
|
|
kind: Some(kind),
|
|
|
|
next_metavar,
|
|
|
|
seq_depth,
|
|
|
|
..
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
} = loc
|
|
|
|
{
|
2022-03-19 07:47:22 +11:00
|
|
|
// We use the span of the metavariable declaration to determine any
|
|
|
|
// edition-specific matching behavior for non-terminals.
|
|
|
|
let nt = match parser.to_mut().parse_nonterminal(kind) {
|
|
|
|
Err(mut err) => {
|
|
|
|
err.span_label(
|
|
|
|
span,
|
|
|
|
format!(
|
|
|
|
"while parsing argument for this `{kind}` macro fragment"
|
|
|
|
),
|
|
|
|
)
|
|
|
|
.emit();
|
|
|
|
return ErrorReported;
|
|
|
|
}
|
|
|
|
Ok(nt) => nt,
|
|
|
|
};
|
2022-03-23 11:46:22 +11:00
|
|
|
let m = match nt {
|
2022-03-25 12:39:12 +11:00
|
|
|
NtOrTt::Nt(nt) => MatchedNonterminal(Lrc::new(nt)),
|
|
|
|
NtOrTt::Tt(tt) => MatchedTokenTree(tt),
|
2022-03-23 11:46:22 +11:00
|
|
|
};
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
mp.push_match(next_metavar, seq_depth, m);
|
2022-03-25 16:20:39 +11:00
|
|
|
mp.idx += 1;
|
2022-03-19 07:47:22 +11:00
|
|
|
} else {
|
|
|
|
unreachable!()
|
|
|
|
}
|
2022-03-25 16:20:39 +11:00
|
|
|
self.cur_mps.push(mp);
|
2022-03-19 07:47:22 +11:00
|
|
|
}
|
2022-03-03 11:00:50 +11:00
|
|
|
|
2022-03-19 07:47:22 +11:00
|
|
|
(_, _) => {
|
|
|
|
// Too many possibilities!
|
2022-04-05 16:34:46 +10:00
|
|
|
return self.ambiguity_error(matcher, parser.token.span);
|
2022-03-09 14:18:32 +11:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2022-03-25 16:20:39 +11:00
|
|
|
assert!(!self.cur_mps.is_empty());
|
2012-06-12 10:59:50 -07:00
|
|
|
}
|
|
|
|
}
|
2022-03-09 14:51:31 +11:00
|
|
|
|
2022-04-05 16:34:46 +10:00
|
|
|
fn ambiguity_error(
|
|
|
|
&self,
|
|
|
|
matcher: &[MatcherLoc],
|
|
|
|
token_span: rustc_span::Span,
|
|
|
|
) -> NamedParseResult {
|
2022-03-19 09:53:41 +11:00
|
|
|
let nts = self
|
2022-03-25 16:20:39 +11:00
|
|
|
.bb_mps
|
2022-03-19 07:47:22 +11:00
|
|
|
.iter()
|
2022-04-05 16:34:46 +10:00
|
|
|
.map(|mp| match &matcher[mp.idx] {
|
2022-04-05 16:31:30 +10:00
|
|
|
MatcherLoc::MetaVarDecl { bind, kind: Some(kind), .. } => {
|
|
|
|
format!("{} ('{}')", kind, bind)
|
|
|
|
}
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
_ => unreachable!(),
|
2022-03-19 07:47:22 +11:00
|
|
|
})
|
|
|
|
.collect::<Vec<String>>()
|
|
|
|
.join(" or ");
|
|
|
|
|
|
|
|
Error(
|
|
|
|
token_span,
|
|
|
|
format!(
|
2022-03-19 08:03:48 +11:00
|
|
|
"local ambiguity when calling macro `{}`: multiple parsing options: {}",
|
|
|
|
self.macro_name,
|
2022-03-25 16:20:39 +11:00
|
|
|
match self.next_mps.len() {
|
2022-03-19 07:47:22 +11:00
|
|
|
0 => format!("built-in NTs {}.", nts),
|
|
|
|
1 => format!("built-in NTs {} or 1 other option.", nts),
|
|
|
|
n => format!("built-in NTs {} or {} other options.", nts, n),
|
|
|
|
}
|
|
|
|
),
|
|
|
|
)
|
|
|
|
}
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
|
2022-04-05 16:31:30 +10:00
|
|
|
fn nameize<I: Iterator<Item = NamedMatch>>(
|
|
|
|
&self,
|
2022-04-05 16:34:46 +10:00
|
|
|
matcher: &[MatcherLoc],
|
2022-04-05 16:31:30 +10:00
|
|
|
mut res: I,
|
|
|
|
) -> NamedParseResult {
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
// Make that each metavar has _exactly one_ binding. If so, insert the binding into the
|
|
|
|
// `NamedParseResult`. Otherwise, it's an error.
|
|
|
|
let mut ret_val = FxHashMap::default();
|
2022-04-05 16:34:46 +10:00
|
|
|
for loc in matcher {
|
2022-04-05 16:31:30 +10:00
|
|
|
if let &MatcherLoc::MetaVarDecl { span, bind, kind, .. } = loc {
|
|
|
|
if kind.is_some() {
|
|
|
|
match ret_val.entry(MacroRulesNormalizedIdent::new(bind)) {
|
|
|
|
Vacant(spot) => spot.insert(res.next().unwrap()),
|
|
|
|
Occupied(..) => {
|
|
|
|
return Error(span, format!("duplicated bind name: {}", bind));
|
|
|
|
}
|
|
|
|
};
|
|
|
|
} else {
|
2022-04-08 16:04:37 +03:00
|
|
|
// E.g. `$e` instead of `$e:expr`, reported as a hard error if actually used.
|
2022-04-05 16:31:30 +10:00
|
|
|
// Both this check and the one in `parse_tt_inner` are necessary, surprisingly.
|
2022-04-08 16:04:37 +03:00
|
|
|
return Error(span, "missing fragment specifier".to_string());
|
2022-04-05 16:31:30 +10:00
|
|
|
}
|
A new matcher representation for use in `parse_tt`.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
and end of the matcher) that are represented by an index that exceeds
the end of the `&[TokenTree]`, which is clumsy and error-prone.
This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.
Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
`idx` against `len` and read comments to understand what the result
means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
latter was never executed. There's now a single place issuing these
errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
`async-std-1.10.0` (an extreme example of heavy macro use) drops from
11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
that it no longer needs boxing, which partly accounts for the
reduction in allocations.
- The rest of the drop in allocations is due to the removal of
`MatcherKind`, because we no longer need to record anything for the
parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-01 10:19:16 +11:00
|
|
|
}
|
|
|
|
}
|
|
|
|
Success(ret_val)
|
|
|
|
}
|
2022-03-09 14:51:31 +11:00
|
|
|
}
|