Move token tree related lexer state to a separate struct
Just a types-based refactoring.
We only used a bunch of fields when tokenizing into a token tree, so let's move them out of the base lexer
Identify when a stmt could have been parsed as an expr
There are some expressions that can be parsed as a statement without
a trailing semicolon depending on the context, which can lead to
confusing errors due to the same looking code being accepted in some
places and not others. Identify these cases and suggest enclosing in
parenthesis making the parse non-ambiguous without changing the
accepted grammar.
Fix#54186, cc #54482, fix#59975, fix#47287.
introduce unescape module
A WIP PR to gauge early feedback
Currently, we deal with escape sequences twice: once when we [lex](112f7e9ac5/src/libsyntax/parse/lexer/mod.rs (L928-L1065)) a string, and a second time when we [unescape](112f7e9ac5/src/libsyntax/parse/mod.rs (L313-L366)) literals. Note that we also produce different sets of diagnostics in these two cases.
This PR aims to remove this duplication, by introducing a new `unescape` module as a single source of truth for character escaping rules.
I think this would be a useful cleanup by itself, but I also need this for https://github.com/rust-lang/rust/pull/59706.
In the current state, the PR has `unescape` module which fully (modulo bugs) deals with string and char literals. I am quite happy about the state of this module
What this PR doesn't have yet are:
* [x] handling of byte and byte string literals (should be simple to add)
* [x] good diagnostics
* [x] actual removal of code from lexer (giant `scan_char_or_byte` should go away completely)
* [x] performance check
* [x] general cleanup of the new code
Diagnostics will be the most labor-consuming bit here, but they are mostly a question of just correctly adjusting spans to sub-tokens. The current setup for diagnostics is that `unescape` produces a plain old `enum` with various problems, and they are rendered into `Handler` separately. This bit is not actually required (it is possible to just pass the `Handler` in), but I like the separation between diagnostics and logic this approach imposes, and such separation should again be useful for #59706
cc @eddyb , @petrochenkov
Currently, we deal with escape sequences twice: once when we lex a
string, and a second time when we unescape literals. This PR aims to
remove this duplication, by introducing a new `unescape` mode as a
single source of truth for character escaping rules
There are some expressions that can be parsed as a statement without
a trailing semicolon depending on the context, which can lead to
confusing errors due to the same looking code being accepted in some
places and not others. Identify these cases and suggest enclosing in
parenthesis making the parse non-ambiguous without changing the
accepted grammar.
Rollup of 24 pull requests
Successful merges:
- #58080 (Add FreeBSD armv6 and armv7 targets)
- #58204 (On return type `impl Trait` for block with no expr point at last semi)
- #58269 (Add librustc and libsyntax to rust-src distribution.)
- #58369 (Make the Entry API of HashMap<K, V> Sync and Send)
- #58861 (Expand where negative supertrait specific error is shown)
- #58877 (Suggest removal of `&` when borrowing macro and appropriate)
- #58883 (Suggest appropriate code for unused field when destructuring pattern)
- #58891 (Remove stray ` in the docs for the FromIterator implementation for Option)
- #58893 (race condition in thread local storage example)
- #58906 (Monomorphize generator field types for debuginfo)
- #58911 (Regression test for #58435.)
- #58912 (Regression test for #58813)
- #58916 (Fix release note problems noticed after merging.)
- #58918 (Regression test added for an async ICE.)
- #58921 (Add an explicit test for issue #50582)
- #58926 (Make the lifetime parameters of tcx consistent.)
- #58931 (Elide invalid method receiver error when it contains TyErr)
- #58940 (Remove JSBackend from config.toml)
- #58950 (Add self to mailmap)
- #58961 (On incorrect cfg literal/identifier, point at the right span)
- #58963 (libstd: implement Error::source for io::Error)
- #58970 (delay_span_bug in wfcheck's ty.lift_to_tcx unwrap)
- #58984 (Teach `-Z treat-err-as-bug` to take a number of errors to emit)
- #59007 (Add a test for invalid const arguments)
Failed merges:
- #58959 (Add release notes for PR #56243)
r? @ghost
`-Z treat-err-as-bug=0` will cause `rustc` to panic after the first
error is reported. `-Z treat-err-as-bug=2` will cause `rustc` to
panic after 3 errors have been reported.
Rename rustc_errors dependency in rust 2018 crates
I think this is a better solution than `use rustc_errors as errors` in `lib.rs` and `use crate::errors` in modules.
Related: rust-lang/cargo#5653
cc #58099
r? @Centril
Simplify `TokenStream` some more
These commits simplify `TokenStream`, remove `ThinTokenStream`, and avoid some clones. The end result is simpler code and a slight perf win on some benchmarks.
r? @petrochenkov
Add non-panicking `maybe_new_parser_from_file` variant
Add (seemingly?) missing `maybe_new_parser_from_file` constructor variant.
Disclaimer: I'm not certain this is the correct approach - just found out we don't have this when working on a Rustfmt PR to catch/prevent more Rust parser panics: https://github.com/rust-lang/rustfmt/pull/3240 and tried to make it work somehow.
Because it's an extra type layer that doesn't really help; in a couple
of places it actively gets in the way, and overall removing it makes the
code nicer. It does, however, move `tokenstream::TokenTree` further away
from the `TokenTree` in `quote.rs`.
More importantly, this change reduces the size of `TokenStream` from 48
bytes to 40 bytes on x86-64, which is enough to slightly reduce
instruction counts on numerous benchmarks, the best by 1.5%.
Note that `open_tt` and `close_tt` have gone from being methods on
`Delimited` to associated methods of `TokenTree`.
53956 panic on include bytes of own file
fix#53956
When using `include_bytes!` on a source file in the project, compiler would panic on subsequent compilations because `expand_include_bytes` would overwrite files in the source_map with no source. This PR changes `expand_include_bytes` to check source_map and use the already existing src, if any.
rustdoc: Replaces fn main search and extern crate search with proper parsing during doctests.
Fixes#21299.
Fixes#33731.
Let me know if there's any additional changes you'd like made!