`tokenstream::Spacing` appears on all `TokenTree::Token` instances,
both punct and non-punct. Its current usage:
- `Joint` means "can join with the next token *and* that token is a
punct".
- `Alone` means "cannot join with the next token *or* can join with the
next token but that token is not a punct".
The fact that `Alone` is used for two different cases is awkward.
This commit augments `tokenstream::Spacing` with a new variant
`JointHidden`, resulting in:
- `Joint` means "can join with the next token *and* that token is a
punct".
- `JointHidden` means "can join with the next token *and* that token is
not a punct".
- `Alone` means "cannot join with the next token".
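A minimal sketch of the resulting enum (simplified; the doc comments just
restate the rules above):
```rust
pub enum Spacing {
    /// Cannot join with the next token (e.g. whitespace follows).
    Alone,
    /// Can join with the next token, and that token is a punct,
    /// e.g. the `+` in `+=`.
    Joint,
    /// Can join with the next token, but that token is not a punct,
    /// e.g. the `<` in `Vec<u32>` (the next token is `u32`).
    JointHidden,
}
```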
This *drastically* improves the output of `print_tts`. For example,
this:
```
stringify!(let a: Vec<u32> = vec![];)
```
currently produces this string:
```
let a : Vec < u32 > = vec! [] ;
```
With this PR, it now produces this string:
```
let a: Vec<u32> = vec![] ;
```
(The space after the `]` is because `TokenTree::Delimited` currently
doesn't have spacing information. The subsequent commit fixes this.)
The new `print_tts` doesn't replicate the original code perfectly. E.g.
multiple space characters will be condensed into a single space
character. But it's much improved.
`print_tts` still produces the old, uglier output for code produced by
proc macros. This is because we have to translate the generated code from
`proc_macro::Spacing` to the more expressive `token::Spacing`, which
results in too much `Alone` usage and no `JointHidden` usage. So
`space_between` still exists and is used by `print_tts` in conjunction
with the `Spacing` field.
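A self-contained sketch of why the translation cannot do better (`pm::Spacing`
stands in for the two-variant `proc_macro::Spacing`; the function is
illustrative, not the real conversion code):
```rust
mod pm {
    pub enum Spacing { Alone, Joint }
}

enum Spacing { Alone, Joint, JointHidden }

fn translate(spacing: pm::Spacing) -> Spacing {
    match spacing {
        pm::Spacing::Joint => Spacing::Joint,
        // A proc macro's `Alone` may really have been "joint with a
        // non-punct", but that distinction was never recorded, so
        // `JointHidden` is never produced and `print_tts` must fall back
        // to the `space_between` heuristic.
        pm::Spacing::Alone => Spacing::Alone,
    }
}
```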
This change will also help with the removal of `Token::Interpolated`.
Currently interpolated tokens are pretty-printed nicely via AST pretty
printing. `Token::Interpolated` removal will mean they get printed with
`print_tts`. Without this change, that would result in much uglier
output for code produced by decl macro expansions. With this change, AST
pretty printing and `print_tts` produce similar results.
The commit also tweaks the comments on `proc_macro::Spacing`. In
particular, it refers to "compound tokens" rather than "multi-char
operators" because lifetimes aren't operators.
This is surprising, but the new comment explains why. It's a logical
conclusion in the drive to avoid `TokenTree` clones.
`TokenTreeCursor` is now only used within `Parser`. It's still needed
due to `replace_prev_and_rewind`.
This is required to distinguish between cooked and raw byte string
literals in an `ast::LitKind`, without referring to an adjacent
`token::Lit`. It's a prerequisite for the next commit.
`Lit::from_included_bytes` calls `Lit::from_lit_kind`, but the two call
sites only need the resulting `token::Lit`, not the full `ast::Lit`.
This commit changes those call sites to use `LitKind::to_token_lit`,
which means `from_included_bytes` can be removed.
Instead of `ast::Lit`.
Literal lowering now happens at two different times. Expression literals
are lowered when HIR is created. Attribute literals are lowered during
parsing.
This commit changes the language very slightly. Some programs that
previously did not compile will now compile. This is because some invalid literals
that are removed by `cfg` or attribute macros will no longer trigger
errors. See this comment for more details:
https://github.com/rust-lang/rust/pull/102944#issuecomment-1277476773
Remove `TokenStreamBuilder`
`TokenStreamBuilder` is used to combine multiple token streams. It can be removed, leaving the code a little simpler and a little faster.
r? `@Aaron1011`
`TokenStreamBuilder` exists to concatenate multiple `TokenStream`s
together. This commit removes it, and moves the concatenation
functionality directly into `TokenStream`, via two new methods
`push_tree` and `push_stream`. This makes things both simpler and
faster.
`push_tree` is particularly important. `TokenStreamBuilder` only had a
single `push` method, which pushed a stream. But in practice most of the
time we push a single token tree rather than a stream, and `push_tree`
avoids the need to build a token stream with a single entry (which
requires two allocations, one for the `Lrc` and one for the `Vec`).
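A simplified, self-contained sketch of the new methods (the real code uses
`Lrc` and is more careful about shared buffers):
```rust
use std::sync::Arc;

#[derive(Clone)]
struct TokenTree; // stand-in for the real type

struct TokenStream(Arc<Vec<TokenTree>>);

impl TokenStream {
    /// Append a single tree. No intermediate one-element stream (and thus
    /// no extra Lrc/Vec allocation) is needed.
    fn push_tree(&mut self, tree: TokenTree) {
        Arc::make_mut(&mut self.0).push(tree);
    }

    /// Append every tree of another stream.
    fn push_stream(&mut self, stream: TokenStream) {
        Arc::make_mut(&mut self.0).extend(stream.0.iter().cloned());
    }
}
```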
The main `push_tree` use arises from a change to one of the `ToInternal`
impls in `proc_macro_server.rs`. It now returns a `SmallVec` instead of
a `TokenStream`. This return value is then iterated over by
`concat_trees`, which does `push_tree` on each element. Furthermore, the
use of `SmallVec` avoids more allocations, because there are only ever
one or two token trees.
Note: the removed `TokenStreamBuilder::push` method had some code to
deal with a quadratic blowup case from #57735. This commit removes the
code. I tried and failed to reproduce the blowup from that PR, before
and after this change. Various other changes have happened to
`TokenStreamBuilder` in the meantime, so I suspect the original problem
is no longer relevant, though I don't have proof of this. Generally
speaking, repeatedly extending a `Vec` without pre-determining its
capacity is *not* quadratic. It's also incredibly common, within rustc
and many other Rust programs, so if there were performance problems
there you'd think it would show up in other places, too.
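For reference, a tiny illustration of the amortized-growth point (nothing
rustc-specific):
```rust
fn main() {
    // With doubling growth, each element is moved only O(1) times on
    // average, so building a large Vec by repeated extension is linear
    // overall, not quadratic, even with no capacity reserved up front.
    let mut v: Vec<usize> = Vec::new();
    for chunk in (0..1_000).map(|i| vec![i; 100]) {
        v.extend(chunk); // amortized O(chunk.len())
    }
    assert_eq!(v.len(), 100_000);
}
```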
`TokenTree::Punct` is handled outside the `match`. This commit moves it
inside the `match`, avoiding the need for the `return`s and making it
easier to read.
proc_macro/bridge: send diagnostics over the bridge as a struct
This removes some RPC when creating and emitting diagnostics, and
simplifies the bridge slightly.
After this change, there are no remaining methods which take advantage
of the support for `&mut` references to objects in the store as
arguments, meaning that support for them could technically be removed if
we wanted. The only remaining uses of immutable references into the
store are `TokenStream` and `SourceFile`.
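A hypothetical sketch of the "diagnostic as a struct" idea (names are
illustrative, not the real bridge types):
```rust
struct Span; // stand-in
enum Level { Error, Warning, Note, Help }

// The client assembles one plain value...
struct Diagnostic {
    level: Level,
    message: String,
    spans: Vec<Span>,
    children: Vec<Diagnostic>,
}

// ...and a single bridge call sends the whole thing, instead of one RPC
// per builder-method call on a server-side object.
fn emit_over_bridge(diag: Diagnostic) {
    let _ = diag; // serialization over the bridge elided in this sketch
}
```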
r? `@eddyb`
- Rename `ast::Lit::token` as `ast::Lit::token_lit`, because its type is
`token::Lit`, which is not a token. (The old name was arguably
reasonable because we have an `ast::token::Lit` inside an `ast::Lit`,
but it has been confusing me for a long time.)
- Rename `LitKind::{from,to}_lit_token` as
`LitKind::{from,to}_token_lit`, to match the above change and
`token::Lit`.
A `TokenStream` contains a `Lrc<Vec<(TokenTree, Spacing)>>`. But this is
not quite right. `Spacing` makes sense for `TokenTree::Token`, but does
not make sense for `TokenTree::Delimited`, because a
`TokenTree::Delimited` cannot be joined with another `TokenTree`.
This commit fixes this problem, by adding `Spacing` to `TokenTree::Token`,
changing `TokenStream` to contain a `Lrc<Vec<TokenTree>>`, and removing the
`TreeAndSpacing` typedef.
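Roughly, the shape of the change (stand-in types, not the exact rustc
definitions):
```rust
use std::sync::Arc;

// Stand-ins so the sketch compiles.
#[derive(Clone)] struct Token;
#[derive(Clone, Copy)] enum Spacing { Alone, Joint }
#[derive(Clone, Copy)] struct DelimSpan;
#[derive(Clone, Copy)] struct Delimiter;

// Before: TokenStream(Arc<Vec<(TokenTree, Spacing)>>), pairing *every* tree
// with a Spacing, even Delimited trees where it is meaningless.
//
// After: Spacing is stored only on the Token variant, where it means something.
#[derive(Clone)]
enum TokenTree {
    Token(Token, Spacing),
    Delimited(DelimSpan, Delimiter, TokenStream),
}

#[derive(Clone)]
struct TokenStream(Arc<Vec<TokenTree>>);
```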
The commit removes these two impls:
- `impl From<TokenTree> for TokenStream`
- `impl From<TokenTree> for TreeAndSpacing`
These were useful, but also resulted in code with many `.into()` calls
that was hard to read, particularly for anyone not highly familiar with
the relevant types. This commit makes some other changes to compensate:
- `TokenTree::token()` becomes `TokenTree::token_{alone,joint}()`.
- `TokenStream::token_{alone,joint}()` are added.
- `TokenStream::delimited` is added.
This results in things like this:
```rust
TokenTree::token(token::Semi, stmt.span).into()
```
changing to this:
```rust
TokenStream::token_alone(token::Semi, stmt.span)
```
This makes the type of the result, and its spacing, clearer.
These changes also simplify `Cursor` and `CursorRef`, because they no longer
need to distinguish between `next` and `next_with_spacing`.
This method is still only used for `Literal::subspan`. However, the
implementation only depends on the `Span` component, so it is simpler and
more efficient for now to pass down only the information that is needed.
In the future, if more information about the Literal is required in the
implementation (e.g. to validate that spans line up as expected with
source text), that extra information can be added back with extra
arguments.
This builds on the symbol infrastructure built for `Ident` to replicate
the `LitKind` and `Lit` structures in rustc within the `proc_macro`
client, allowing literals to be fully created and interacted with from
the client thread. Only parsing and subspan operations still require
sync RPC.
Doing this for all unicode identifiers would require a dependency on
`unicode-normalization` and `rustc_lexer`, which is currently not
possible for `proc_macro` due to it being built concurrently with `std`
and `core`. Instead, ASCII identifiers are validated locally, and an RPC
message is used to validate unicode identifiers when needed.
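A hypothetical sketch of the split (the helper and RPC names are invented for
illustration; keyword and raw-identifier handling is ignored):
```rust
fn validate_ident(name: &str) {
    // Cheap, purely client-side check for ASCII identifiers.
    let mut chars = name.chars();
    let is_ascii_ident = match chars.next() {
        Some(c) if c == '_' || c.is_ascii_alphabetic() => {
            chars.all(|c| c == '_' || c.is_ascii_alphanumeric())
        }
        _ => false,
    };
    if is_ascii_ident {
        return;
    }
    // Anything else needs the compiler's unicode tables, so fall back to a
    // single RPC round-trip (function name invented for this sketch).
    validate_unicode_ident_via_rpc(name);
}

fn validate_unicode_ident_via_rpc(_name: &str) {
    // stands in for the bridge message described above
}
```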
String values are interned on both the server and client when
deserializing, to avoid unnecessary copies and keep `Ident` cheap to copy
and move. This appears to be important for performance.
The client-side interner is based roughly on the one from rustc_span, and uses
an arena inspired by rustc_arena.
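A much-simplified sketch of such an interner (the real one is arena-backed and
modeled on rustc_span, but the core idea is the same):
```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Symbol(u32);

#[derive(Default)]
struct Interner {
    names: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    /// Return the existing symbol for `s`, or allocate a new one.
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.names.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_string());
        self.names.insert(s.to_string(), sym);
        sym
    }

    /// Look up the string for a previously interned symbol.
    fn as_str(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}
```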
RPC messages passing symbols always include the full value. This could
potentially be optimized in the future if it is revealed to be a
performance bottleneck.
Despite now having a relevant implementation of `Display` for `Ident`,
`ToString` is still specialized, as it is a hot path for this object.
The symbol infrastructure will also be used for literals in the next
part.
proc_macro: Fix expand_expr expansion of bool literals
Previously, the expand_expr method would expand bool literals as a
`Literal` token containing a `LitKind::Bool`, rather than as an `Ident`.
This is not a valid token, and the `LitKind::Bool` case needs to be
handled separately.
Tests were added to more deeply compare the streams in the expand-expr
test suite to catch mistakes like this in the future.
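A hypothetical sketch of the fix (stand-in types, not the real bridge types):
`true`/`false` are keywords, so a `Bool` literal kind has to become an `Ident`
token rather than a `Literal` token.
```rust
enum LitKind { Bool, Integer, Str /* ... */ }

enum TokenTree {
    Ident { name: String },
    Literal { kind: LitKind, text: String },
}

fn lit_to_token_tree(kind: LitKind, text: String) -> TokenTree {
    match kind {
        // Emitting `Literal { kind: Bool, .. }` would produce an invalid
        // token, hence the separate case.
        LitKind::Bool => TokenTree::Ident { name: text },
        _ => TokenTree::Literal { kind, text },
    }
}
```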
This greatly reduces round-trips to fetch relevant extra information about the
token in proc macro code, and avoids RPC messages to create Group tokens.
This greatly reduces round-trips to fetch relevant extra information about the
token in proc macro code, and avoids RPC messages to create Punct tokens.
It's a weird function: it lets you modify the token stream in the middle
of iteration. There is only one call site, and it is only used for the
rare `ProceduralMasquerade` legacy case.
Batch proc_macro RPC for TokenStream iteration and combination operations
This is the first part of #86822, split off as requested in https://github.com/rust-lang/rust/pull/86822#pullrequestreview-1008655452. It reduces the number of RPC calls required for common operations such as iterating over and concatenating TokenStreams.