Commit graph

161 commits

Author SHA1 Message Date
Ralf Jung
20d04d8a40 Revert "Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu"
This reverts commit 08dfbf49e3, reversing
changes made to 10bcdad7df.
2025-03-18 13:28:56 +01:00
Jacob Pratt
08dfbf49e3
Rollup merge of #136355 - GuillaumeGomez:proc-macro_add_value_retrieval_methods, r=Amanieu
Add `*_value` methods to proc_macro lib

This is the implementation of https://github.com/rust-lang/libs-team/issues/459.

It allows to get the actual value (unescaped) of the different string literals.

Part of https://github.com/rust-lang/rust/issues/136652.

r? libs-api
2025-03-17 05:47:48 -04:00
Nicholas Nethercote
53167c0b7f Rename ast::TokenKind::Not as ast::TokenKind::Bang.
For consistency with `rustc_lexer::TokenKind::Bang`, and because other
`ast::TokenKind` variants generally have syntactic names instead of
semantic names (e.g. `Star` and `DotDot` instead of `Mul` and `Range`).
2025-03-03 09:26:13 +11:00
Nicholas Nethercote
2a1e2e9632 Replace ast::TokenKind::BinOp{,Eq} and remove BinOpToken.
`BinOpToken` is badly named, because it only covers the assignable
binary ops and excludes comparisons and `&&`/`||`. Its use in
`ast::TokenKind` does allow a small amount of code sharing, but it's a
clumsy factoring.

This commit removes `ast::TokenKind::BinOp{,Eq}`, replacing each one
with 10 individual variants. This makes `ast::TokenKind` more similar to
`rustc_lexer::TokenKind`, which has individual variants for all
operators.

Although the number of lines of code increases, the number of chars
decreases due to the frequent use of shorter names like `token::Plus`
instead of `token::BinOp(BinOpToken::Plus)`.
2025-03-03 09:26:11 +11:00
Guillaume Gomez
49d2d5a116 Extract unescape from rustc_lexer into its own crate 2025-02-10 10:38:22 +01:00
Nicholas Nethercote
2620eb42d7 Re-export more rustc_span::symbol things from rustc_span.
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.

This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
2024-12-18 13:38:53 +11:00
Nicholas Nethercote
2e412fef75 Remove Lexer's dependency on Parser.
Lexing precedes parsing, as you'd expect: `Lexer` creates a
`TokenStream` and `Parser` then parses that `TokenStream`.

But, in a horrendous violation of layering abstractions and common
sense, `Lexer` depends on `Parser`! The `Lexer::unclosed_delim_err`
method does some error recovery that relies on creating a `Parser` to do
some post-processing of the `TokenStream` that the `Lexer` just created.

This commit just removes `unclosed_delim_err`. This change removes
`Lexer`'s dependency on `Parser`, and also means that `lex_token_tree`'s
return value can have a more typical form.

The cost is slightly worse error messages in two obscure cases, as shown
in these tests:
- tests/ui/parser/brace-in-let-chain.rs: there is slightly less
  explanation in this case involving an extra `{`.
- tests/ui/parser/diff-markers/unclosed-delims{,-in-macro}.rs: the diff
  marker detection is no longer supported (because that detection is
  implemented in the parser).

In my opinion this cost is outweighed by the magnitude of the code
cleanup.
2024-12-13 07:10:20 +11:00
Esteban Küber
5404cbb996 Fix typo in RFC mention 3598 -> 3593
https://github.com/rust-lang/rfcs/blob/master/text/3593-unprefixed-guarded-strings.md
2024-12-09 17:16:14 +00:00
Michael Goulet
d878fd8877 Only error raw lifetime followed by \' in edition 2021+ 2024-12-01 05:23:16 +00:00
Guillaume Gomez
ca71c8fe5e
Rollup merge of #133487 - pitaj:reserve-guarded-strings, r=fee1-dead
fix confusing diagnostic for reserved `##`

Closes #131615
2024-11-28 12:06:04 +01:00
Peter Jaszkowiak
44f4f67f46 fix confusing diagnostic for reserved ## 2024-11-25 22:29:14 -07:00
Nicholas Nethercote
16a39bb7ca Streamline lex_token_trees error handling.
- Use iterators instead of `for` loops.
- Use `if`/`else` instead of `match`.
2024-11-25 16:10:55 +11:00
Nicholas Nethercote
ba1a1ddc3f Fix some formatting.
Must be one of those cases where the function is too long and rustfmt
bails out.
2024-11-25 16:10:55 +11:00
Nicholas Nethercote
98777b4c49 Merge TokenTreesReader into StringReader.
There is a not-very-useful layering in the lexer, where
`TokenTreesReader` contains a `StringReader`. This commit combines them
and names the result `Lexer`, which is a more obvious name for it.

The methods of `Lexer` are now split across `mod.rs` and `tokentrees.rs`
which isn't ideal, but it doesn't seem worth moving a bunch of code to
avoid it.
2024-11-25 16:10:55 +11:00
Nicholas Nethercote
e9a0c3c98c Remove TokenKind::InvalidPrefix.
It was added in #123752 to handle some cases involving emoji, but it
isn't necessary because it's always treated the same as
`TokenKind::InvalidIdent`. This commit removes it, which makes things a
little simpler.
2024-11-19 18:06:22 +11:00
Michael Goulet
9785c7cf94 Enforce that raw lifetime identifiers must be valid raw identifiers 2024-10-30 14:45:22 +00:00
Peter Jaszkowiak
321a5db7d4 Reserve guarded string literals (RFC 3593) 2024-10-08 18:21:16 -06:00
Michael Goulet
c682aa162b Reformat using the new identifier sorting from rustfmt 2024-09-22 19:11:29 -04:00
Michael Goulet
5de89bb011 Store raw ident span for raw lifetime 2024-09-17 16:43:18 -04:00
Michael Goulet
afa24f0180 Add some more tests 2024-09-06 10:32:48 -04:00
Michael Goulet
97910580aa Add initial support for raw lifetimes 2024-09-06 10:32:48 -04:00
Michael Goulet
3b3e43a386 Format lexer 2024-09-06 10:32:48 -04:00
Michael Goulet
9aaf873396 Reserve prefix lifetimes too 2024-09-06 10:32:48 -04:00
Nicholas Nethercote
84ac80f192 Reformat use declarations.
The previous commit updated `rustfmt.toml` appropriately. This commit is
the outcome of running `x fmt --all` with the new formatting options.
2024-07-29 08:26:52 +10:00
Oli Scherer
7ba82d61eb Use a dedicated type instead of a reference for the diagnostic context
This paves the way for tracking more state (e.g. error tainting) in the diagnostic context handle
2024-06-18 15:42:11 +00:00
Oli Scherer
c91edc3888 Prefer dcx methods over fields or fields' methods 2024-06-18 13:45:08 +00:00
Nicholas Nethercote
d1215da26e Don't use the word "parse" for lexing operations.
Lexing converts source text into a token stream. Parsing converts a
token stream into AST fragments. This commit renames several lexing
operations that have "parse" in the name. I think these names have been
subtly confusing me for years.

This is just a `s/parse/lex/` on function names, with one exception:
`parse_stream_from_source_str` becomes `source_str_to_stream`, to make
it consistent with the existing `source_file_to_stream`. The commit also
moves that function's location in the file to be just above
`source_file_to_stream`.

The commit also cleans up a few comments along the way.
2024-06-05 10:29:16 +10:00
Nicholas Nethercote
bb364fe950 Remove #[macro_use] extern crate tracing from rustc_parse. 2024-05-23 18:02:40 +10:00
Xiretza
98dd6c7e8f Rename buffer_lint_with_diagnostic to buffer_lint 2024-05-21 20:16:39 +00:00
Xiretza
c227f35a9c Generate lint diagnostic message from BuiltinLintDiag
Translation of the lint message happens when the actual diagnostic is
created, not when the lint is buffered. Generating the message from
BuiltinLintDiag ensures that all required data to construct the message
is preserved in the LintBuffer, eventually allowing the messages to be
moved to fluent.

Remove the `msg` field from BufferedEarlyLint, it is either generated
from the data in the BuiltinLintDiag or stored inside
BuiltinLintDiag::Normal.
2024-05-21 20:16:39 +00:00
Lin Yihai
f9bb5df5a0 narrow down visibilities in rustc_parse::lexer 2024-05-07 11:02:28 +08:00
Jubilee
0a0a5a956c
Rollup merge of #123752 - estebank:emoji-prefix, r=wesleywiser
Properly handle emojis as literal prefix in macros

Do not accept the following

```rust
macro_rules! lexes {($($_:tt)*) => {}}
lexes!(🐛"foo");
```

Before, invalid emoji identifiers were gated during parsing instead of lexing in all cases, but this didn't account for macro pre-expansion of literal prefixes.

Fix #123696.
2024-04-18 21:38:55 -07:00
Nicholas Nethercote
0d97669a17 Simplify static_assert_sizes.
We want to run them on all 64-bit platforms.
2024-04-18 15:36:25 +10:00
Matthias Krüger
68359e2284
Rollup merge of #123223 - estebank:issue-123079, r=pnkfelix
Fix invalid silencing of parsing error

Given

```rust
macro_rules! a {
    ( ) => {
        impl<'b> c for d {
            e::<f'g>
        }
    };
}
```

ensure an error is emitted.

Fix #123079.
2024-04-12 17:41:33 +02:00
Esteban Küber
19821ad234 Properly handle emojis as literal prefix in macros
Do not accept the following

```rust
macro_rules! lexes {($($_:tt)*) => {}}
lexes!(🐛"foo");
```

Before, invalid emoji identifiers were gated during parsing instead of lexing in all cases, but this didn't account for macro expansion of literal prefixes.

Fix #123696.
2024-04-10 23:19:27 +00:00
Yutaro Ohno
3a0d8d8afc parser: reduce visibility of unnecessary public UnmatchedDelim
`lexer::UnmatchedDelim` struct in `rustc_parse` is unnecessary public
outside of the crate. This commit reduces the visibility to
`pub(crate)`.

Beside, this removes unnecessary field `expected_delim` that causes
warnings after changing the visibility.
2024-04-08 23:55:48 +09:00
Esteban Küber
e572a194bf Fix invalid silencing of parsing error
Given

```rust
macro_rules! a {
    ( ) => {
        impl<'b> c for d {
            e::<f'g>
        }
    };
}
```

ensure an error is emitted.

Fix #123079.
2024-04-07 17:22:34 +00:00
Zalathar
2d47cd77ac Check x86_64 size assertions on aarch64, too
This makes it easier for contributors on aarch64 workstations (e.g. Macs) to
notice when these assertions have been violated.
2024-04-03 16:53:03 +11:00
Esteban Küber
ea1883d7b2 Silence redundant error on char literal that was meant to be a string in 2021 edition 2024-03-17 23:35:19 +00:00
Esteban Küber
999a0dc300 review comment: str -> string in messages 2024-03-17 23:35:18 +00:00
Esteban Küber
982918f493 Handle str literals written with ' lexed as lifetime
Given `'hello world'` and `'1 str', provide a structured suggestion for a valid string literal:

```
error[E0762]: unterminated character literal
  --> $DIR/lex-bad-str-literal-as-char-3.rs:2:26
   |
LL |     println!('hello world');
   |                          ^^^^
   |
help: if you meant to write a `str` literal, use double quotes
   |
LL |     println!("hello world");
   |              ~           ~
```
```
error[E0762]: unterminated character literal
  --> $DIR/lex-bad-str-literal-as-char-1.rs:2:20
   |
LL |     println!('1 + 1');
   |                    ^^^^
   |
help: if you meant to write a `str` literal, use double quotes
   |
LL |     println!("1 + 1");
   |              ~     ~
```

Fix #119685.
2024-03-17 23:35:18 +00:00
Nicholas Nethercote
7aa0eea19c Rename BuiltinLintDiagnostics as BuiltinLintDiag.
Not the dropping of the trailing `s` -- this type describes a single
diagnostic and its name should be singular.
2024-03-05 12:15:10 +11:00
Nicholas Nethercote
80d2bdb619 Rename all ParseSess variables/fields/lifetimes as psess.
Existing names for values of this type are `sess`, `parse_sess`,
`parse_session`, and `ps`. `sess` is particularly annoying because
that's also used for `Session` values, which are often co-located, and
it can be difficult to know which type a value named `sess` refers to.
(That annoyance is the main motivation for this change.) `psess` is nice
and short, which is good for a name used this much.

The commit also renames some `parse_sess_created` values as
`psess_created`.
2024-03-05 08:11:45 +11:00
Matthias Krüger
686a4b1c17
Rollup merge of #121724 - nnethercote:LitKind-Err-for-floats, r=fmease
Use `LitKind::Err` for malformed floats

#121120 changed `StringReader::cook_lexer_literal` to return `LitKind::Err` for malformed integer literals. This commit does the same for float literals, for consistency.

r? ``@fmease``
2024-02-29 00:17:00 +01:00
Nicholas Nethercote
840c8d3243 Use LitKind::Err for floats with unsupported bases.
This slightly changes error messages in `float-field.rs`, but nothing of
real importance.
2024-02-28 20:59:32 +11:00
Nicholas Nethercote
951f2d9ae2 Use LitKind::Err for floats with empty exponents.
This prevents a follow-up type error in a test, which seems fine.
2024-02-28 20:59:27 +11:00
Nicholas Nethercote
899cb40809 Rename DiagnosticBuilder as Diag.
Much better!

Note that this involves renaming (and updating the value of)
`DIAGNOSTIC_BUILDER` in clippy.
2024-02-28 08:55:35 +11:00
clubby789
06d6c62f80 Add newtype for raw idents 2024-02-20 13:13:29 +00:00
Nicholas Nethercote
25ed6e43b0 Add ErrorGuaranteed to ast::LitKind::Err, token::LitKind::Err.
This mostly works well, and eliminates a couple of delayed bugs.

One annoying thing is that we should really also add an
`ErrorGuaranteed` to `proc_macro::bridge::LitKind::Err`. But that's
difficult because `proc_macro` doesn't have access to `ErrorGuaranteed`,
so we have to fake it.
2024-02-15 14:46:08 +11:00
Nicholas Nethercote
332c57723a Make emit_unescape_error return Option<ErrorGuaranteed>.
And use the result in `cook_common` to decide whether to return an error
token.
2024-02-15 12:58:18 +11:00