Commit graph

59 commits

Author SHA1 Message Date
Nicholas Nethercote
0bd47e8a39 Reorder match arms in parse_tt_inner.
To match the order the variants are declared in.
2022-04-04 17:03:36 +10:00
Nicholas Nethercote
88f8fbcce0 A new matcher representation for use in parse_tt.
`parse_tt` currently traverses a `&[TokenTree]` to do matching. But this
is a bad representation for the traversal.
- `TokenTree` is nested, and there's a bunch of expensive and fiddly
  state required to handle entering and exiting nested submatchers.
- There are three positions (sequence separators, sequence Kleene ops,
  and end of the matcher) that are represented by an index that exceeds
  the end of the `&[TokenTree]`, which is clumsy and error-prone.

This commit introduces a new representation called `MatcherLoc` that is
designed specifically for matching. It fixes all the above problems,
making the code much easier to read. A `&[TokenTree]` is converted to a
`&[MatcherLoc]` before matching begins. Despite the cost of the
conversion, it's still a net performance win, because various pieces of
traversal state are computed once up-front, rather than having to be
recomputed repeatedly during the macro matching.

Some improvements worth noting.
- `parse_tt_inner` is *much* easier to read. No more having to compare
  `idx` against `len` and read comments to understand what the result
  means.
- The handling of `Delimited` in `parse_tt_inner` is now trivial.
- The three end-of-sequence cases in `parse_tt_inner` are now handled in
  three separate match arms, and the control flow is much simpler.
- `nameize` is no longer recursive.
- There were two places that issued "missing fragment specifier" errors:
  one in `parse_tt_inner()`, and one in `nameize()`. Presumably the
  latter was never executed. There's now a single place issuing these
  errors, in `compute_locs()`.
- The number of heap allocations done for a `check full` build of
  `async-std-1.10.0` (an extreme example of heavy macro use) drops from
  11.8M to 2.6M, and most of these occur outside of macro matching.
- The size of `MatcherPos` drops from 64 bytes to 16 bytes. Small enough
  that it no longer needs boxing, which partly accounts for the
  reduction in allocations.
- The rest of the drop in allocations is due to the removal of
  `MatcherKind`, because we no longer need to record anything for the
  parent matcher when entering a submatcher.
- Overall it reduces code size by 45 lines.
2022-04-04 17:01:28 +10:00
bors
95f68702ff Auto merge of #95509 - nnethercote:simplify-MatcherPos-some-more, r=petrochenkov
Simplify `MatcherPos` some more

A few more improvements.

r? `@petrochenkov`
2022-04-02 04:59:16 +00:00
Vadim Petrochenkov
9ab4f732cb expand: Do not count metavar declarations on RHS of macro_rules
They are 0 by definition there.
2022-03-31 19:09:40 +03:00
Nicholas Nethercote
c6fedd4f10 Make MatcherPos not derive Clone.
It's only used in one place, and there we clone and then make a bunch of
modifications. It's clearer if we duplicate more explicitly, and there's
a symmetry now between `sequence()` and `empty_sequence()`.
2022-03-31 14:40:43 +11:00
Nicholas Nethercote
f68a0449ed Remove MatcherPos::stack.
`parse_tt` needs a way to get from within submatchers make to the
enclosing submatchers. Currently it has two distinct mechanisms for
this:
- `Delimited` submatchers use `MatcherPos::stack` to record stuff about
  the parent (and further back ancestors).
- `Sequence` submatchers use `MatcherPosSequence::parent` to point to
  the parent matcher position.

Having two mechanisms is really confusing, and it took me a long time to
understand all this.

This commit eliminates `MatcherPos::stack`, and changes `Delimited`
submatchers to use the same mechanism as sequence submatchers. That
mechanism is also changed a bit: instead of storing the entire parent
`MatcherPos`, we now only store the necessary parts from the parent
`MatcherPos`.

Overall this is a small performance win, with the positives outweighing
the negatives, but it's mostly for clarity.
2022-03-31 14:39:00 +11:00
Nicholas Nethercote
048bd67d51 Clarify idx handling in sequences.
By adding comments, and improving an assertion. I finally fully
understand this part!
2022-03-31 11:48:36 +11:00
Nicholas Nethercote
2e423c7fd0 Remove MatcherPos::match_lo.
It's redundant w.r.t. other fields.
2022-03-31 11:48:35 +11:00
Nicholas Nethercote
21699c41af Simplify exit of Delimited submatchers.
Currently, we detect an exit from a `Delimited` submatcher when `idx`
exceeds the bounds of the current submatcher *and* there is a `stack`
entry.

This commit changes it to something simpler: just look for a
`CloseDelim` token.
2022-03-31 11:48:34 +11:00
Nicholas Nethercote
6b0a16ab1a Pre-allocate an empty Lrc<NamedMatchVec>.
This avoids some allocations.
2022-03-30 10:54:57 +11:00
Nicholas Nethercote
524d21bd54 Overhaul how matches are recorded.
Currently, matches within a sequence are recorded in a new empty
`matches` vector. Then when the sequence finishes the matches are merged
into the `matches` vector of the parent.

This commit changes things so that a sequence mp inherits the matches
made so far. This means that additional matches from the sequence don't
need to be merged into the parent. `push_match` becomes more
complicated, and the current sequence depth needs to be tracked. But
it's a sizeable performance win because it avoids one or more
`push_match` calls on every iteration of a sequence.

The commit also removes `match_hi`, which is no longer necessary.
2022-03-30 10:54:37 +11:00
Nicholas Nethercote
a1b140cdb7 Improve comments and rename many things for consistency.
In particular:
- Replace use of "item" with "matcher position/"mp".
- Replace use of "repetition" with "sequence".
- Replace `ms` with `matcher`.
2022-03-30 10:50:17 +11:00
Nicholas Nethercote
ac3d8ce1c6 Clarify comments about doc comments in macros. 2022-03-30 10:42:47 +11:00
Nicholas Nethercote
2b60cc081b Simplify and rename count_names. 2022-03-30 10:42:34 +11:00
Nicholas Nethercote
df6ead557d Add a useful assertion. 2022-03-29 08:00:26 +11:00
Dylan DPC
1c8b7412d4
Rollup merge of #95390 - nnethercote:allow-doc-comments-in-macros, r=petrochenkov
Ignore doc comments in a declarative macro matcher.

Fixes #95267. Reverts to the old behaviour before #95159 introduced a
regression.

r? `@petrochenkov`
2022-03-28 16:08:11 +02:00
Nicholas Nethercote
9967594346 Ignore doc comments in a declarative macro matcher.
Fixes #95267. Reverts to the old behaviour before #95159 introduced a
regression.
2022-03-28 10:45:24 +11:00
Nicholas Nethercote
364b908d57 Remove Nonterminal::NtTT.
It's only needed for macro expansion, not as a general element in the
AST. This commit removes it, adds `NtOrTt` for the parser and macro
expansion cases, and renames the variants in `NamedMatch` to better
match the new type.
2022-03-28 10:03:02 +11:00
Nicholas Nethercote
fdec26ddad Shrink MatcherPosRepetition.
Currently it copies a `KleeneOp` and a `Token` out of a
`SequenceRepetition`. It's better to store a reference to the
`SequenceRepetition`, which is now possible due to #95159 having changed
the lifetimes.
2022-03-25 12:35:56 +11:00
Nicholas Nethercote
cad5f1e774 Shrink NamedMatchVec to one inline element.
This counters the `NamedMatchVec` size increase from the previous
commit, leaving `NamedMatchVec` smaller than before.
2022-03-25 12:35:56 +11:00
Nicholas Nethercote
6817442ec7 Split NamedMatch::MatchNonterminal in two.
The `Lrc` is only relevant within `transcribe()`. There, the `Lrc` is
helpful for the non-`NtTT` cases, because the entire nonterminal is
cloned. But for the `NtTT` cases the inner token tree is cloned (a full
clone) and so the `Lrc` is of no help.

This commit splits the `NtTT` and non-`NtTT` cases, avoiding the useless
`Lrc` in the former case, for the following effect on macro-heavy
crates.
- It reduces the total number of allocations a lot.
- It increases the size of some of the remaining allocations.
- It doesn't affect *peak* memory usage, because the larger allocations
  are short-lived.

This overall gives a speed win.
2022-03-25 12:35:52 +11:00
Nicholas Nethercote
904e70a7b0 Add a size assertion for NamedMatchVec. 2022-03-23 13:54:34 +11:00
Nicholas Nethercote
31df680789 Eliminate TokenTreeOrTokenTreeSlice.
As its name suggests, `TokenTreeOrTokenTreeSlice` is either a single
`TokenTree` or a slice of them. It has methods `len` and `get_tt` that
let it be treated much like an ordinary slice. The reason it isn't an
ordinary slice is that for `TokenTree::Delimited` the open and close
delimiters are represented implicitly, and when they are needed they are
constructed on the fly with `Delimited::{open,close}_tt`, rather than
being present in memory.

This commit changes `Delimited` so the open and close delimiters are
represented explicitly. As a result, `TokenTreeOrTokenTreeSlice` is no
longer needed and `MatcherPos` and `MatcherTtFrame` can just use an
ordinary slice. `TokenTree::{len,get_tt}` are also removed, because they
were only needed to support `TokenTreeOrTokenTreeSlice`.

The change makes the code shorter and a little bit faster on benchmarks
that use macro expansion heavily, partly because `MatcherPos` is a lot
smaller (less data to `memcpy`) and partly because ordinary slice
operations are faster than `TokenTreeOrTokenTreeSlice::{len,get_tt}`.
2022-03-23 07:13:31 +11:00
Nicholas Nethercote
754dc8e66f Move items into TtParser as Vecs.
By putting them in `TtParser`, we can reuse them for every rule in a
macro. With that done, they can be `SmallVec` instead of `Vec`, and this
is a performance win because these vectors are hot and `SmallVec`
operations are a bit slower due to always needing an "inline or heap?"
check.
2022-03-21 10:09:24 +11:00
Nicholas Nethercote
cedb787f6e Remove MatcherPosHandle.
This type was a small performance win for `html5ever`, which uses a
macro with hundreds of very simple rules that don't contain any
metavariables. But this type is complicated (extra lifetimes) and
perf-neutral for macros that do have metavariables.

This commit removes `MatcherPosHandle`, simplifying things a lot. This
increases the allocation rate for `html5ever` and similar cases a bit,
but makes things easier for follow-up changes that will improve
performance more than what we lost here.
2022-03-21 10:08:29 +11:00
Nicholas Nethercote
10644e0789 Remove an impossible code path.
Doc comments cannot appear in a matcher.
2022-03-19 09:44:47 +11:00
Nicholas Nethercote
39810a85da Add TtParser::macro_name.
Instead of passing it into `parse_tt`.
2022-03-19 09:44:44 +11:00
Nicholas Nethercote
354bd1071c Rename bb_items_ambiguity_error as ambiguity_error.
Because it involves `next_items` as well as `bb_items`.
2022-03-19 08:04:06 +11:00
Nicholas Nethercote
d21b4f30c1 Introduce TtParser.
It currently has no state, just the three methods `parse_tt`,
`parse_tt_inner`, and `bb_items_ambiguity_error`.

This commit is large but trivial, and mostly consists of changes to the
indentation of those methods. Subsequent commits will do more.
2022-03-19 07:47:22 +11:00
bors
a8adf7685a Auto merge of #95067 - nnethercote:parse_tt-more-refactoring, r=petrochenkov
Still more refactoring of `parse_tt`

r? `@petrochenkov`
2022-03-18 12:34:05 +00:00
Nicholas Nethercote
440a685575 Rename TtSeq as TtSlice.
It's a better name because (a) it holds a slice, and (b) "sequence" has
other meanings in this file.
2022-03-18 17:47:08 +11:00
Nicholas Nethercote
f43028d06f Tweak a bunch of comments.
I've been staring at these enough lately that they're annoying me, let's
make them better.
2022-03-18 17:22:34 +11:00
Nicholas Nethercote
14875a5564 Reorder cases in parse_tt_inner.
I find the new order easier to read: within a matcher; past the end of a
repetition; at end of input. It also reduces the indentation level by
one for
2022-03-18 14:21:13 +11:00
Nicholas Nethercote
83044714a1 Only modify eof_items if token == Eof.
Because that's the condition under which `eof_items` is used.
2022-03-18 14:11:01 +11:00
Nicholas Nethercote
8bd1bcad58 Factor out some code into MatcherPos::repetition.
Also move `create_matches` within `impl MatcherPos`, because it's only
used within that impl block.
2022-03-18 14:09:02 +11:00
Nicholas Nethercote
5bbbee5ba7 Add two useful assertions. 2022-03-18 13:57:11 +11:00
Caio
5d333c155e Fix remaining meta-variable expression TODOs 2022-03-14 08:29:20 -03:00
Nicholas Nethercote
95d13fa37d Move a parse_tt error case into a separate function. 2022-03-11 14:10:21 +11:00
Nicholas Nethercote
235a87fbd3 Make next_items a SmallVec.
For consistency, and to make the code slightly nicer.
2022-03-11 14:10:21 +11:00
Nicholas Nethercote
c13ca42d67 Move eof_items handling entirely within inner_parse_loop.
Also rename `inner_parse_loop` as `parse_tt_inner`, because it's no
longer just a loop.
2022-03-11 14:10:21 +11:00
Nicholas Nethercote
9f0798b2eb Add a useful assertion. 2022-03-11 14:10:21 +11:00
Nicholas Nethercote
4d4baf7c9a Disallow TokenTree::{MetaVar,MetaVarExpr} in matchers.
They should only appear in transcribers.
2022-03-11 14:10:19 +11:00
Nicholas Nethercote
09c3e82050 Refactor the second half of parse_tt.
The current structure makes it hard to tell that there are just four
distinct code paths, depending on how many items there are in `bb_items`
and `next_items`. This commit introduces a `match` that clarifies
things.
2022-03-11 13:56:54 +11:00
Caio
8073a88f35 Implement macro meta-variable expressions 2022-03-09 16:46:23 -03:00
Matthias Krüger
16c6594f34
Rollup merge of #94547 - nnethercote:parse_tt-cleanups, r=petrochenkov
`parse_tt` cleanups

I've been looking closely at this code, and saw some opportunities to improve its readability.

r? ```````@petrochenkov```````
2022-03-03 20:01:45 +01:00
Nicholas Nethercote
97eb1b4669 Change initial_matcher_pos() into MatcherPos::new(). 2022-03-03 21:47:02 +11:00
Nicholas Nethercote
e5f3fd6250 Use a better return type for inner_parse_loop.
Because `inner_parse_loop` has only one way to not succeed, not three.
2022-03-03 21:46:59 +11:00
Nicholas Nethercote
643ba50004 Introduce MatcherPosRepetition.
There are three `Option` fields in `MatcherPos` that are only used in
tandem. This commit combines them, making the code slightly easier to
read. (It also makes clear that the `sep` field arguably should have
been `Option<Option<Token>>`!)
2022-03-03 21:45:52 +11:00
Nicholas Nethercote
b9fabc3f9c Add a static size assertion for MatcherPos. 2022-03-03 21:45:45 +11:00
Nicholas Nethercote
11c565f14b Improve if/else formatting in macro_parser.rs.
To avoid the strange style where comments force `else` onto its own
line.

The commit also removes several else-after-return constructs, which can
be hard to read.
2022-03-03 21:45:42 +11:00