Rewrite collect_tokens implementations to use a flattened buffer

Instead of trying to collect tokens at each depth, we 'flatten' the
stream as we go allong, pushing open/close delimiters to our buffer
just like regular tokens. One capturing is complete, we reconstruct a
nested `TokenTree::Delimited` structure, producing a normal
`TokenStream`.

The reconstructed `TokenStream` is not created immediately - instead, it is
produced on-demand by a closure (wrapped in a new `LazyTokenStream` type). This
closure stores a clone of the original `TokenCursor`, plus a record of the
number of calls to `next()/next_desugared()`. This is sufficient to reconstruct
the tokenstream seen by the callback without storing any additional state. If
the tokenstream is never used (e.g. when a captured `macro_rules!` argument is
never passed to a proc macro), we never actually create a `TokenStream`.

This implementation has a number of advantages over the previous one:

* It is significantly simpler, with no edge cases around capturing the
  start/end of a delimited group.

* It can be easily extended to allow replacing tokens an an arbitrary
  'depth' by just using `Vec::splice` at the proper position. This is
  important for PR #76130, which requires us to track information about
  attributes along with tokens.

* The lazy approach to `TokenStream` construction allows us to easily
  parse an AST struct, and then decide after the fact whether we need a
  `TokenStream`. This will be useful when we start collecting tokens for
  `Attribute` - we can discard the `LazyTokenStream` if the parsed
  attribute doesn't need tokens (e.g. is a builtin attribute).

The performance impact seems to be neglibile (see
https://github.com/rust-lang/rust/pull/77250#issuecomment-703960604). There is a
small slowdown on a few benchmarks, but it only rises above 1% for incremental
builds, where it represents a larger fraction of the much smaller instruction
count. There a ~1% speedup on a few other incremental benchmarks - my guess is
that the speedups and slowdowns will usually cancel out in practice.
This commit is contained in:
Aaron Hill 2020-09-26 21:56:29 -04:00
parent cb2462c53f
commit 593fdd3d45
No known key found for this signature in database
GPG key ID: B4087E510E98B164
7 changed files with 252 additions and 165 deletions

View file

@ -8,7 +8,7 @@
use rustc_ast as ast;
use rustc_ast::token::{self, DelimToken, Nonterminal, Token, TokenKind};
use rustc_ast::tokenstream::{self, TokenStream, TokenTree};
use rustc_ast::tokenstream::{self, LazyTokenStream, TokenStream, TokenTree};
use rustc_ast_pretty::pprust;
use rustc_data_structures::sync::Lrc;
use rustc_errors::{Diagnostic, FatalError, Level, PResult};
@ -248,29 +248,32 @@ pub fn nt_to_tokenstream(nt: &Nonterminal, sess: &ParseSess, span: Span) -> Toke
// As a result, some AST nodes are annotated with the token stream they
// came from. Here we attempt to extract these lossless token streams
// before we fall back to the stringification.
let convert_tokens = |tokens: Option<LazyTokenStream>| tokens.map(|t| t.into_token_stream());
let tokens = match *nt {
Nonterminal::NtItem(ref item) => {
prepend_attrs(sess, &item.attrs, item.tokens.as_ref(), span)
}
Nonterminal::NtBlock(ref block) => block.tokens.clone(),
Nonterminal::NtBlock(ref block) => convert_tokens(block.tokens.clone()),
Nonterminal::NtStmt(ref stmt) => {
// FIXME: We currently only collect tokens for `:stmt`
// matchers in `macro_rules!` macros. When we start collecting
// tokens for attributes on statements, we will need to prepend
// attributes here
stmt.tokens.clone()
convert_tokens(stmt.tokens.clone())
}
Nonterminal::NtPat(ref pat) => pat.tokens.clone(),
Nonterminal::NtTy(ref ty) => ty.tokens.clone(),
Nonterminal::NtPat(ref pat) => convert_tokens(pat.tokens.clone()),
Nonterminal::NtTy(ref ty) => convert_tokens(ty.tokens.clone()),
Nonterminal::NtIdent(ident, is_raw) => {
Some(tokenstream::TokenTree::token(token::Ident(ident.name, is_raw), ident.span).into())
}
Nonterminal::NtLifetime(ident) => {
Some(tokenstream::TokenTree::token(token::Lifetime(ident.name), ident.span).into())
}
Nonterminal::NtMeta(ref attr) => attr.tokens.clone(),
Nonterminal::NtPath(ref path) => path.tokens.clone(),
Nonterminal::NtVis(ref vis) => vis.tokens.clone(),
Nonterminal::NtMeta(ref attr) => convert_tokens(attr.tokens.clone()),
Nonterminal::NtPath(ref path) => convert_tokens(path.tokens.clone()),
Nonterminal::NtVis(ref vis) => convert_tokens(vis.tokens.clone()),
Nonterminal::NtTT(ref tt) => Some(tt.clone().into()),
Nonterminal::NtExpr(ref expr) | Nonterminal::NtLiteral(ref expr) => {
if expr.tokens.is_none() {
@ -602,10 +605,10 @@ fn token_probably_equal_for_proc_macro(first: &Token, other: &Token) -> bool {
fn prepend_attrs(
sess: &ParseSess,
attrs: &[ast::Attribute],
tokens: Option<&tokenstream::TokenStream>,
tokens: Option<&tokenstream::LazyTokenStream>,
span: rustc_span::Span,
) -> Option<tokenstream::TokenStream> {
let tokens = tokens?;
let tokens = tokens?.clone().into_token_stream();
if attrs.is_empty() {
return Some(tokens.clone());
}