
Rewrite collect_tokens implementations to use a flattened buffer

Instead of trying to collect tokens at each depth, we 'flatten' the
stream as we go along, pushing open/close delimiters to our buffer
just like regular tokens. Once capturing is complete, we reconstruct a
nested `TokenTree::Delimited` structure, producing a normal
`TokenStream`.
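
As a rough illustration of the approach (a standalone sketch using simplified
stand-in types - `FlatToken`, `Tree`, and `char` tokens are illustrative, not
rustc's actual `Token`/`TokenTree` definitions), delimiters are recorded inline
in one flat buffer and the nesting is only rebuilt after capturing finishes:

```rust
#[derive(Clone, Debug)]
enum FlatToken {
    Token(char),      // stand-in for a real `Token`
    OpenDelim(char),  // e.g. '(' or '{', pushed just like a regular token
    CloseDelim(char),
}

#[derive(Clone, Debug)]
enum Tree {
    Token(char),
    Delimited(char, Vec<Tree>), // stand-in for `TokenTree::Delimited`
}

// Rebuild the nested structure from the flat buffer once capturing is done.
fn rebuild(flat: &[FlatToken]) -> Vec<Tree> {
    // Stack of partially-built delimited groups: (open delimiter, children so far).
    let mut stack: Vec<(char, Vec<Tree>)> = vec![('\0', Vec::new())];
    for tok in flat {
        match tok {
            FlatToken::Token(c) => stack.last_mut().unwrap().1.push(Tree::Token(*c)),
            FlatToken::OpenDelim(d) => stack.push((*d, Vec::new())),
            FlatToken::CloseDelim(_) => {
                let (delim, children) = stack.pop().unwrap();
                stack.last_mut().unwrap().1.push(Tree::Delimited(delim, children));
            }
        }
    }
    assert_eq!(stack.len(), 1, "unbalanced delimiters in captured buffer");
    stack.pop().unwrap().1
}

fn main() {
    // Capturing `a ( b c )` pushes the delimiters to the buffer like any other token...
    let flat = vec![
        FlatToken::Token('a'),
        FlatToken::OpenDelim('('),
        FlatToken::Token('b'),
        FlatToken::Token('c'),
        FlatToken::CloseDelim(')'),
    ];
    // ...and the nested stream is only reconstructed at the end.
    println!("{:?}", rebuild(&flat));
}
```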

The reconstructed `TokenStream` is not created immediately - instead, it is
produced on-demand by a closure (wrapped in a new `LazyTokenStream` type). This
closure stores a clone of the original `TokenCursor`, plus a record of the
number of calls to `next()/next_desugared()`. This is sufficient to reconstruct
the tokenstream seen by the callback without storing any additional state. If
the tokenstream is never used (e.g. when a captured `macro_rules!` argument is
never passed to a proc macro), we never actually create a `TokenStream`.
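
The replay idea can be sketched in isolation (the `MiniCursor` type and `char`
tokens below are hypothetical stand-ins for the real `TokenCursor` and `Token`):
a snapshot of the cursor plus a count of `next()` calls is all a cloneable
closure needs to rebuild the captured tokens on demand.

```rust
// Minimal stand-in for a token cursor: cheap to clone, yields tokens in order.
#[derive(Clone)]
struct MiniCursor {
    tokens: Vec<char>,
    pos: usize,
}

impl MiniCursor {
    fn next(&mut self) -> Option<char> {
        let tok = self.tokens.get(self.pos).copied();
        self.pos += 1;
        tok
    }
}

fn main() {
    let mut cursor = MiniCursor { tokens: "fn foo(){}".chars().collect(), pos: 0 };

    // Snapshot the cursor before parsing, then parse normally, counting calls.
    let start = cursor.clone();
    let mut num_calls = 0;
    for _ in 0..6 {
        cursor.next();
        num_calls += 1;
    }

    // The "lazy token stream": a cloneable closure that replays the snapshot.
    // Nothing is reconstructed unless somebody actually calls it.
    let lazy = move || -> Vec<char> {
        let mut replay = start.clone();
        (0..num_calls).filter_map(|_| replay.next()).collect()
    };

    // Only now is the captured stream materialized.
    println!("{:?}", lazy());
}
```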

This implementation has a number of advantages over the previous one:

* It is significantly simpler, with no edge cases around capturing the
  start/end of a delimited group.

* It can be easily extended to allow replacing tokens at an arbitrary
  'depth' by just using `Vec::splice` at the proper position (see the
  sketch after this list). This is important for PR #76130, which
  requires us to track information about attributes along with tokens.

* The lazy approach to `TokenStream` construction allows us to easily
  parse an AST struct, and then decide after the fact whether we need a
  `TokenStream`. This will be useful when we start collecting tokens for
  `Attribute` - we can discard the `LazyTokenStream` if the parsed
  attribute doesn't need tokens (e.g. is a builtin attribute).
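
For the second point above, a minimal sketch (with `char` standing in for real
tokens, and an assumed, purely illustrative recorded index range) of why a flat
buffer turns arbitrary-depth replacement into a plain `Vec::splice`:

```rust
fn main() {
    // Flattened capture of `#[attr] ( inner )`, with delimiters stored inline.
    let mut flat: Vec<char> = vec!['#', '[', 'a', ']', '(', 'i', ')'];

    // Suppose positions 0..4 were recorded while capturing as "the attribute's
    // tokens". Replacing them needs no tree traversal at all, regardless of
    // how deeply nested they were:
    let removed: Vec<char> = flat.splice(0..4, vec!['X']).collect();

    assert_eq!(removed, vec!['#', '[', 'a', ']']);
    assert_eq!(flat, vec!['X', '(', 'i', ')']);
    println!("{:?}", flat);
}
```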

The performance impact seems to be negligible (see
https://github.com/rust-lang/rust/pull/77250#issuecomment-703960604). There is a
small slowdown on a few benchmarks, but it only rises above 1% for incremental
builds, where it represents a larger fraction of the much smaller instruction
count. There is a ~1% speedup on a few other incremental benchmarks - my guess is
that the speedups and slowdowns will usually cancel out in practice.
Aaron Hill 2020-09-26 21:56:29 -04:00
parent cb2462c53f
commit 593fdd3d45
7 changed files with 252 additions and 165 deletions


@@ -16,8 +16,9 @@
use crate::token::{self, DelimToken, Token, TokenKind};
use rustc_data_structures::stable_hasher::{HashStable, StableHasher};
use rustc_data_structures::sync::Lrc;
use rustc_data_structures::sync::{self, Lrc};
use rustc_macros::HashStable_Generic;
use rustc_serialize::{Decodable, Decoder, Encodable, Encoder};
use rustc_span::{Span, DUMMY_SP};
use smallvec::{smallvec, SmallVec};
@@ -119,6 +120,77 @@ where
}
}

// A cloneable callback which produces a `TokenStream`. Each clone
// of this should produce the same `TokenStream`
pub trait CreateTokenStream: sync::Send + sync::Sync + FnOnce() -> TokenStream {
    // Workaround for the fact that `Clone` is not object-safe
    fn clone_it(&self) -> Box<dyn CreateTokenStream>;
}

impl<F: 'static + Clone + sync::Send + sync::Sync + FnOnce() -> TokenStream> CreateTokenStream
    for F
{
    fn clone_it(&self) -> Box<dyn CreateTokenStream> {
        Box::new(self.clone())
    }
}

impl Clone for Box<dyn CreateTokenStream> {
    fn clone(&self) -> Self {
        let val: &(dyn CreateTokenStream) = &**self;
        val.clone_it()
    }
}

/// A lazy version of `TokenStream`, which may defer creation
/// of an actual `TokenStream` until it is needed.
pub type LazyTokenStream = Lrc<LazyTokenStreamInner>;

#[derive(Clone)]
pub enum LazyTokenStreamInner {
    Lazy(Box<dyn CreateTokenStream>),
    Ready(TokenStream),
}

impl std::fmt::Debug for LazyTokenStreamInner {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            LazyTokenStreamInner::Lazy(..) => f.debug_struct("LazyTokenStream::Lazy").finish(),
            LazyTokenStreamInner::Ready(..) => f.debug_struct("LazyTokenStream::Ready").finish(),
        }
    }
}

impl LazyTokenStreamInner {
    pub fn into_token_stream(&self) -> TokenStream {
        match self {
            // Note that we do not cache this. If this ever becomes a performance
            // problem, we should investigate wrapping `LazyTokenStreamInner`
            // in a lock
            LazyTokenStreamInner::Lazy(cb) => (cb.clone())(),
            LazyTokenStreamInner::Ready(stream) => stream.clone(),
        }
    }
}

impl<S: Encoder> Encodable<S> for LazyTokenStreamInner {
    fn encode(&self, _s: &mut S) -> Result<(), S::Error> {
        panic!("Attempted to encode LazyTokenStream");
    }
}

impl<D: Decoder> Decodable<D> for LazyTokenStreamInner {
    fn decode(_d: &mut D) -> Result<Self, D::Error> {
        panic!("Attempted to decode LazyTokenStream");
    }
}

impl<CTX> HashStable<CTX> for LazyTokenStreamInner {
    fn hash_stable(&self, _hcx: &mut CTX, _hasher: &mut StableHasher) {
        panic!("Attempted to compute stable hash for LazyTokenStream");
    }
}

/// A `TokenStream` is an abstract sequence of tokens, organized into `TokenTree`s.
///
/// The goal is for procedural macros to work with `TokenStream`s and `TokenTree`s