`TokenStream::Stream` can represent a token stream containing any number
of token trees. `TokenStream::Tree` is the special case representing a
single token tree. The latter doesn't occur all that often dynamically,
so this commit removes it, which simplifies the code quite a bit.
This change has mixed performance effects.
- The size of `TokenStream` drops from 32 bytes to 8 bytes, and there
is one less case for all the match statements.
- The conversion of a `TokenTree` to a `TokenStream` now requires two
allocations, for the creation of a single-element `Lrc<Vec<_>>`. (But a
subsequent commit in this PR will reduce the main source of such
conversions.)
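For illustration, a minimal sketch of the shape of this change, using stand-in types (`Lrc` is rustc's reference-counted pointer, approximated here with `Arc`; the real definitions differ):

```rust
use std::sync::Arc as Lrc;

#[derive(Clone)]
struct TokenTree; // stand-in for the real token tree

// Before: two variants, so every consumer had to match on both.
enum TokenStreamBefore {
    Tree(TokenTree),
    Stream(Lrc<Vec<TokenStreamBefore>>),
}

// After: a single representation, one pointer wide (8 bytes on x86-64).
struct TokenStreamAfter(Lrc<Vec<TokenTree>>);

impl From<TokenTree> for TokenStreamAfter {
    // The single-tree case now costs two allocations: one for the Vec,
    // one for the Lrc that wraps it.
    fn from(tree: TokenTree) -> Self {
        TokenStreamAfter(Lrc::new(vec![tree]))
    }
}
```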
`TokenStream` is currently recursive in *two* ways:
- the `TokenTree` variant contains a `ThinTokenStream`, which can
contain a `TokenStream`;
- the `TokenStream` variant contains a `Vec<TokenStream>`.
The latter is not necessary and causes significant complexity. This
commit replaces it with the simpler `Vec<(TokenTree, IsJoint)>`. In
particular, `StreamCursor` is eliminated, and `Cursor` becomes much
simpler, consisting now of just a `TokenStream` and an index.
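A hedged sketch of the resulting shapes, again with stand-in types; the real `Cursor` has more methods, but its core is just the stream and an index:

```rust
use std::sync::Arc as Lrc;

#[derive(Clone)]
struct TokenTree;

#[derive(Clone, Copy)]
enum IsJoint {
    Joint,
    NonJoint,
}

// The stream is now a flat vector of (tree, jointness) pairs, so no
// `StreamCursor` recursion is needed to walk it.
#[derive(Clone)]
struct TokenStream(Lrc<Vec<(TokenTree, IsJoint)>>);

struct Cursor {
    stream: TokenStream,
    index: usize,
}

impl Cursor {
    fn next(&mut self) -> Option<TokenTree> {
        let item = self.stream.0.get(self.index).map(|(tree, _)| tree.clone());
        if item.is_some() {
            self.index += 1;
        }
        item
    }
}
```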
The commit also removes the `Extend` impl for `TokenStream`, because it
is only used in tests. (The commit also removes those tests.)
Overall, the commit reduces the number of lines of code by almost 200.
Remove `TokenStream::JointTree`.
This is done by adding a new `IsJoint` field to `TokenStream::Tree`,
which simplifies a lot of `match` statements. And likewise for
`CursorKind`.
The commit also adds a new method `TokenTree::stream()` which can replace
a choice between `.into()` and `.joint()`.
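A hedged sketch of the shape, with stand-in types; the `stream()` signature here is an assumption (one plausible way to fold the joint/non-joint choice into a single call):

```rust
use std::sync::Arc as Lrc;

#[derive(Clone)]
struct TokenTree;

#[derive(Clone, Copy)]
enum IsJoint {
    Joint,
    NonJoint,
}

enum TokenStream {
    Empty,
    // The `IsJoint` flag lives in `Tree` now, so the separate `JointTree`
    // variant (and the matching `CursorKind` case) can go away.
    Tree(TokenTree, IsJoint),
    Stream(Lrc<Vec<TokenStream>>),
}

impl TokenTree {
    // Hypothetical signature: one constructor replaces choosing between
    // `.into()` (non-joint) and `.joint()` at each call site.
    fn stream(self, is_joint: IsJoint) -> TokenStream {
        TokenStream::Tree(self, is_joint)
    }
}
```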
This shrinks:
- ThinTokenStream: 16 to 8 bytes
- TokenTree: 32 to 24 bytes
- TokenStream: 40 to 32 bytes
The only downside is that in a couple of places this requires using
`to_vec()` (which allocates) instead of `sub_slice()`. But those places
are rarely executed, so it doesn't hurt perf.
Overall, this reduces instruction counts on numerous benchmarks by up to
3%.
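Sizes like these can be checked with `std::mem::size_of`; a self-contained illustration of the mechanism (stand-in types and x86-64 sizes, not the real rustc layouts):

```rust
use std::mem::size_of;
use std::sync::Arc as Lrc;

struct Inline(Vec<u8>);      // Vec stored inline: 24 bytes on x86-64
struct Shared(Lrc<Vec<u8>>); // behind one pointer: 8 bytes on x86-64

fn main() {
    assert_eq!(size_of::<Inline>(), 24);
    assert_eq!(size_of::<Shared>(), 8);
}
```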
This commit removes `Delimited` because it's an extra type layer that
doesn't really help; in a couple of places it actively gets in the way,
and overall removing it makes the code nicer. It does, however, move
`tokenstream::TokenTree` further away from the `TokenTree` in `quote.rs`.
More importantly, this change reduces the size of `TokenStream` from 48
bytes to 40 bytes on x86-64, which is enough to slightly reduce
instruction counts on numerous benchmarks, the best by 1.5%.
Note that `open_tt` and `close_tt` have gone from being methods on
`Delimited` to associated methods of `TokenTree`.
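A hedged sketch of the relocation, with stand-in types; the real signatures differ, but the delimiter-token constructors now hang off `TokenTree` rather than a `Delimited` wrapper:

```rust
#[derive(Clone, Copy)]
enum DelimToken {
    Paren,
    Bracket,
    Brace,
}

struct Span;

enum TokenTree {
    OpenDelim(Span, DelimToken),
    CloseDelim(Span, DelimToken),
    // other variants elided
}

impl TokenTree {
    // Formerly `Delimited::open_tt` / `Delimited::close_tt`.
    fn open_tt(span: Span, delim: DelimToken) -> TokenTree {
        TokenTree::OpenDelim(span, delim)
    }
    fn close_tt(span: Span, delim: DelimToken) -> TokenTree {
        TokenTree::CloseDelim(span, delim)
    }
}
```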
Ignore non-semantic tokens for `probably_eq` streams.
Improves the situation in #43081 by skipping typically non-semantic tokens when checking for `probably_eq`.
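A hedged sketch of the idea with a stand-in token type (only commas are skipped here, as an example; the real set of non-semantic tokens is larger):

```rust
#[derive(PartialEq)]
enum Token {
    Comma,
    Ident(String),
}

// Only commas are treated as non-semantic in this toy version.
fn is_semantic(token: &Token) -> bool {
    match token {
        Token::Comma => false,
        _ => true,
    }
}

// Compare streams while skipping tokens that rarely change meaning.
fn probably_eq(a: &[Token], b: &[Token]) -> bool {
    a.iter()
        .filter(|t| is_semantic(t))
        .eq(b.iter().filter(|t| is_semantic(t)))
}
```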
r? @alexcrichton
When a comma is missing in a macro call, suggest it regardless of
position. When a macro call doesn't match any of the patterns, check
whether the call's token stream could be missing a comma between two
idents. If so, create a new token stream containing the comma and try
to match it against the macro patterns. If successful, emit the
suggestion.
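A hedged sketch of the recovery with stand-in types (the real implementation works on token streams and spans, not this toy token list):

```rust
#[derive(Clone)]
enum Token {
    Ident(String),
    Comma,
}

// Returns a candidate token list with commas spliced between adjacent
// idents, or None if there was no likely missing-comma site.
fn insert_missing_commas(tokens: &[Token]) -> Option<Vec<Token>> {
    let mut out = Vec::new();
    let mut changed = false;
    for (i, tok) in tokens.iter().enumerate() {
        out.push(tok.clone());
        // Two adjacent idents are a likely missing-comma site.
        if let (Token::Ident(_), Some(Token::Ident(_))) = (tok, tokens.get(i + 1)) {
            out.push(Token::Comma);
            changed = true;
        }
    }
    if changed { Some(out) } else { None }
}
```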
Ever plagued by #43081, the compiler can return surprising spans in
situations related to procedural macros. This is exhibited by #47983,
where a procedural macro invoked in a nested item context would fail to
have correct span information.
While #43230 provided a "hack" to cache the token stream used for each item in
the compiler, it's not a full-blown solution. This commit continues to extend
this "hack" a bit more to work for nested items.
Previously in the parser the `parse_item` method would collect the tokens for an
item into a cache on the item itself. It turned out, however, that nested items
were parsed through the `parse_item_` method, so they didn't receive similar
treatment. To remedy this situation the hook for collecting tokens was moved
into `parse_item_` instead of `parse_item`.
Afterwards the token collection scheme was updated to support nested collection
of tokens. This is implemented by tracking `TokenStream` tokens instead of
`TokenTree` to allow for collecting items into streams at intermediate layers
and having them interleaved in the upper layers.
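A hedged sketch of the nested-collection idea, with stand-in types; each nested parse pushes a frame, and a finished frame's tokens are both recorded for that item and appended to its parent:

```rust
#[derive(Clone)]
struct TokenStream(Vec<String>); // stand-in: a stream of rendered tokens

struct Collector {
    frames: Vec<TokenStream>,
}

impl Collector {
    // Entering a (hypothetical) `parse_item_`-style call pushes a frame.
    fn start_item(&mut self) {
        self.frames.push(TokenStream(Vec::new()));
    }

    // Every consumed token is recorded into the innermost frame.
    fn record(&mut self, tok: &str) {
        if let Some(frame) = self.frames.last_mut() {
            frame.0.push(tok.to_string());
        }
    }

    // Leaving the item yields its stream and splices it into the parent,
    // so outer items see their nested items' tokens interleaved in order.
    fn finish_item(&mut self) -> TokenStream {
        let done = self.frames.pop().expect("unbalanced start/finish");
        if let Some(parent) = self.frames.last_mut() {
            parent.0.extend(done.0.clone());
        }
        done
    }
}
```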
All in all, this...
Closes #47983
Discovered in #50061, we're falling off the "happy path" of using a stringified
token stream more often than we should. This was due to the fact that a
user-written token like `0xf` is equality-different from the stringified token
of `15` (despite being semantically equivalent).
This patch updates the call to `eq_unspanned` with an even more awful solution,
`probably_equal_for_proc_macro`, which ignores the value of each token and
basically only compares the structure of the token stream, assuming that the AST
doesn't change just one token at a time.
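A hedged sketch of structure-only comparison with a stand-in token type: literal values are ignored (so `0xf` and `15` compare equal) while token kinds must still line up:

```rust
#[derive(PartialEq)]
enum TokenKind {
    Literal,
    Ident,
    OpenDelim,
    CloseDelim,
}

struct Token {
    kind: TokenKind,
    text: String, // compared for idents, ignored for literals
}

fn probably_equal_for_proc_macro(a: &[Token], b: &[Token]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| {
            // Same kind everywhere; same text only where the value matters.
            x.kind == y.kind && (x.kind == TokenKind::Literal || x.text == y.text)
        })
}
```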
While this is a step towards fixing #50061 there is still one regression
from #49154 which needs to be fixed.
This commit adds even more pessimization to use the cached `TokenStream` inside
of an AST node. As a reminder the `proc_macro` API requires taking an arbitrary
AST node and transforming it back into a `TokenStream` to hand off to a
procedural macro. Such functionality isn't actually implemented in rustc today,
so `proc_macro` instead stringifies an AST node and then reparses it to get a
list of tokens.
This strategy unfortunately loses all span information, so we try to avoid it
whenever possible. Implemented in #43230, some AST nodes have a `TokenStream`
cache representing the tokens they were originally parsed from. This
`TokenStream` cache, however, has turned out to not always reflect the current
state of the item when it's being tokenized. For example `#[cfg]` processing or
macro expansion could modify the state of an item. Consequently we've seen a
number of bugs (#48644 and #49846) related to using this stale cache.
This commit tweaks the usage of the cached `TokenStream` to compare it to our
lossy stringification of the token stream. If the tokens that make up the cache
and the stringified token stream are the same then we return the cached version
(which has correct span information). If they differ, however, then we will
return the stringified version as the cache has been invalidated and we just
haven't figured that out.
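A hedged sketch of the validation step, with stand-in types and a hypothetical `stringify_and_reparse` helper:

```rust
#[derive(Clone, PartialEq)]
struct TokenStream(Vec<String>); // stand-in

// `stringify_and_reparse` stands in for rustc's lossy
// pretty-print-then-relex fallback.
fn tokens_for_item(
    cached: Option<&TokenStream>,
    stringify_and_reparse: impl Fn() -> TokenStream,
) -> TokenStream {
    let reparsed = stringify_and_reparse();
    match cached {
        // Cache still agrees token-for-token: prefer it, it has real spans.
        Some(cache) if *cache == reparsed => cache.clone(),
        // Cache is stale (e.g. after `#[cfg]` stripping or macro expansion
        // changed the item): fall back to the lossy version.
        _ => reparsed,
    }
}
```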
Closes #48644
Closes #49846
When gluing two tokens, the second of which is joint, the result should also be
joint.
This fixes an issue with joining three `Dot` tokens to make a `DotDotDot` - the
intermediate `DotDot` would not be joint and therefore we would not attempt to
glue the last `Dot` token, yielding `.. .` instead of `...`.
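A hedged sketch of the corrected gluing rule, with stand-in types: the glued token inherits the *second* operand's jointness, so `Dot Dot Dot` can glue all the way to `DotDotDot`:

```rust
#[derive(Clone, Copy)]
enum Tok {
    Dot,
    DotDot,
    DotDotDot,
}

#[derive(Clone, Copy)]
enum IsJoint {
    Joint,
    NonJoint,
}

// Gluing is attempted when the first token is joint with the next one.
fn glue(a: (Tok, IsJoint), b: (Tok, IsJoint)) -> Option<(Tok, IsJoint)> {
    let glued = match (a.0, b.0) {
        (Tok::Dot, Tok::Dot) => Tok::DotDot,
        (Tok::DotDot, Tok::Dot) => Tok::DotDotDot,
        _ => return None,
    };
    // The fix: the result inherits the *second* token's jointness, so a
    // freshly glued `DotDot` can still glue with a following `Dot`.
    Some((glued, b.1))
}
```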
Parsing `a as usize > b` always works, but `a as usize < b` was a
parsing error because the parser would think the `<` started a generic
type argument for `usize`. The parser now attempts to parse as before,
and if a `DiagnosticError` is returned, tries to parse again as a type
with no generic arguments. If this fails, it returns the original
`DiagnosticError`.
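A hedged sketch of the retry, with stand-in parser pieces: snapshot, attempt the generic-argument parse, and on error rewind and reparse with generics disallowed, restoring the first error if that also fails:

```rust
struct Parser {
    pos: usize, // stand-in for the parser's token position / snapshot state
}

struct DiagnosticError(String); // stand-in for rustc's diagnostic type

impl Parser {
    // Stand-in: with `allow_generics == false`, parsing stops before `<`,
    // so `usize < b` is left for the comparison operator.
    fn parse_type(&mut self, allow_generics: bool) -> Result<(), DiagnosticError> {
        let _ = allow_generics; // real parsing elided
        Ok(())
    }

    fn parse_cast_type(&mut self) -> Result<(), DiagnosticError> {
        let snapshot = self.pos;
        match self.parse_type(true) {
            Ok(ty) => Ok(ty),
            Err(first_err) => {
                // Rewind and retry without generic arguments; if that also
                // fails, surface the original error.
                self.pos = snapshot;
                self.parse_type(false).map_err(|_| first_err)
            }
        }
    }
}
```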
This is useful when parsing from stdin or a `String`, where we don't want to try to read in a module from another file. Instead we just leave a stub in the AST.
macros: improve the `TokenStream` quoter
This PR
- renames the `TokenStream` quoter from `qquote!` to `quote!`,
- uses `$` instead of `unquote` (e.g. `let toks: TokenStream = ...; quote!([$toks])`),
- allows unquoting `Token`s as well as `TokenTree`s and `TokenStream`s (fixes #39746), and
- to preserve syntactic space, requires that `$` be followed by
- a single identifier to unquote, or
- another `$` to produce a literal `$`.
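For illustration, a hedged usage sketch of these rules (the quoter is an unstable internal API, and these lines are made-up usage, not taken from the PR's tests):

```rust
let stream: TokenStream = quote!(1 + 2);
let wrapped = quote!([$stream]); // `$stream` splices an existing stream
let dollar = quote!($$);         // `$$` yields a literal `$` token
```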
r? @nrc