Copyedit the macro tutorial
I hope I haven't introduced any grievous errors :-)
This commit is contained in:
parent
22efa39382
commit
0aa42bc91e
1 changed files with 73 additions and 44 deletions
|
@ -2,10 +2,11 @@
|
||||||
|
|
||||||
# Introduction
|
# Introduction
|
||||||
|
|
||||||
Functions are the programmer's primary tool of abstraction, but there are
|
Functions are the primary tool that programmers can use to build
|
||||||
cases in which they are insufficient, because the programmer wants to
|
abstractions. Sometimes, though, programmers want to abstract over
|
||||||
abstract over concepts not represented as values. Consider the following
|
compile-time, syntactic structures rather than runtime values. For example,
|
||||||
example:
|
the following two code fragments both pattern-match on their input and return
|
||||||
|
early in one case, doing nothing otherwise:
|
||||||
|
|
||||||
~~~~
|
~~~~
|
||||||
# enum t { special_a(uint), special_b(uint) };
|
# enum t { special_a(uint), special_b(uint) };
|
||||||
|
@ -24,11 +25,12 @@ match input_2 {
|
||||||
# }
|
# }
|
||||||
~~~~
|
~~~~
|
||||||
|
|
||||||
This code could become tiresome if repeated many times. However, there is
|
This code could become tiresome if repeated many times. However, there is no
|
||||||
no reasonable function that could be written to solve this problem. In such a
|
straightforward way to rewrite it without the repeated code, using functions
|
||||||
case, it's possible to define a macro to solve the problem. Macros are
|
alone. There is a solution, though: defining a macro to solve the problem. Macros are
|
||||||
lightweight custom syntax extensions, themselves defined using the
|
lightweight custom syntax extensions, themselves defined using the
|
||||||
`macro_rules!` syntax extension:
|
`macro_rules!` syntax extension. The following `early_return` macro captures
|
||||||
|
the pattern in the above code:
|
||||||
|
|
||||||
~~~~
|
~~~~
|
||||||
# enum t { special_a(uint), special_b(uint) };
|
# enum t { special_a(uint), special_b(uint) };
|
||||||
|
@ -42,7 +44,12 @@ macro_rules! early_return(
|
||||||
}
|
}
|
||||||
);
|
);
|
||||||
);
|
);
|
||||||
// ...
|
~~~~
|
||||||
|
|
||||||
|
Now, we can replace each `match` with an invocation of the `early_return`
|
||||||
|
macro:
|
||||||
|
|
||||||
|
~~~~
|
||||||
early_return!(input_1 special_a);
|
early_return!(input_1 special_a);
|
||||||
// ...
|
// ...
|
||||||
early_return!(input_2 special_b);
|
early_return!(input_2 special_b);
|
||||||
|
@ -50,48 +57,72 @@ early_return!(input_2 special_b);
|
||||||
# }
|
# }
|
||||||
~~~~
|
~~~~
|
||||||
|
|
||||||
Macros are defined in pattern-matching style:
|
Macros are defined in pattern-matching style: in the above example, the text
|
||||||
|
`($inp:expr $sp:ident)` that appears on the left-hand side of the `=>` is the
|
||||||
|
*macro invocation syntax*, a pattern denoting how to write a call to the
|
||||||
|
macro. The text on the right-hand side of the `=>`, beginning with `match
|
||||||
|
$inp`, is the *macro transcription syntax*: what the macro expands to.
|
||||||
|
|
||||||
# Invocation syntax
|
# Invocation syntax
|
||||||
|
|
||||||
On the left-hand-side of the `=>` is the macro invocation syntax. It is
|
The macro invocation syntax specifies the syntax for the arguments to the
|
||||||
free-form, excepting the following rules:
|
macro. It appears on the left-hand side of the `=>` in a macro definition. It
|
||||||
|
conforms to the following rules:
|
||||||
|
|
||||||
1. It must be surrounded in parentheses.
|
1. It must be surrounded by parentheses.
|
||||||
2. `$` has special meaning.
|
2. `$` has special meaning.
|
||||||
3. The `()`s, `[]`s, and `{}`s it contains must balance. For example, `([)` is
|
3. The `()`s, `[]`s, and `{}`s it contains must balance. For example, `([)` is
|
||||||
forbidden.
|
forbidden.
|
||||||
|
|
||||||
|
Otherwise, the invocation syntax is free-form.
|
||||||
|
|
||||||
To take as an argument a fragment of Rust code, write `$` followed by a name
|
To take as an argument a fragment of Rust code, write `$` followed by a name
|
||||||
(for use on the right-hand side), followed by a `:`, followed by the sort of
|
(for use on the right-hand side), followed by a `:`, followed by a *fragment
|
||||||
fragment to match (the most common ones are `ident`, `expr`, `ty`, `pat`, and
|
specifier*. The fragment specifier denotes the sort of fragment to match. The
|
||||||
`block`). Anything not preceded by a `$` is taken literally. The standard
|
most common fragment specifiers are:
|
||||||
|
|
||||||
|
* `ident` (an identifier, referring to a variable or item. Examples: `f`, `x`,
|
||||||
|
`foo`.)
|
||||||
|
* `expr` (an expression. Examples: `2 + 2`; `if true then { 1 } else { 2 }`;
|
||||||
|
`f(42)`.)
|
||||||
|
* `ty` (a type. Examples: `int`, `~[(char, ~str)]`, `&T`.)
|
||||||
|
* `pat` (a pattern, usually appearing in a `match` or on the left-hand side of
|
||||||
|
a declaration. Examples: `Some(t)`; `(17, 'a')`; `_`.)
|
||||||
|
* `block` (a sequence of actions. Example: `{ log(error, "hi"); return 12; }`)
|
||||||
|
|
||||||
|
The parser interprets any token that's not preceded by a `$` literally. Rust's usual
|
||||||
rules of tokenization apply,
|
rules of tokenization apply,
|
||||||
|
|
||||||
So `($x:ident => (($e:expr)))`, though excessively fancy, would create a macro
|
So `($x:ident -> (($e:expr)))`, though excessively fancy, would designate a macro
|
||||||
that could be invoked like `my_macro!(i=>(( 2+2 )))`.
|
that could be invoked like: `my_macro!(i->(( 2+2 )))`.
|
||||||
|
|
||||||
# Transcription syntax
|
# Transcription syntax
|
||||||
|
|
||||||
The right-hand side of the `=>` follows the same rules as the left-hand side,
|
The right-hand side of the `=>` follows the same rules as the left-hand side,
|
||||||
except that `$` need only be followed by the name of the syntactic fragment
|
except that a `$` need only be followed by the name of the syntactic fragment
|
||||||
to transcribe.
|
to transcribe into the macro expansion; its type need not be repeated.
|
||||||
|
|
||||||
The right-hand side must be surrounded by delimiters of some kind, and must be
|
The right-hand side must be enclosed by delimiters, and must be
|
||||||
an expression; currently, user-defined macros can only be invoked in
|
an expression. Currently, invocations of user-defined macros can only appear in a context
|
||||||
expression position (even though `macro_rules!` itself can be in item
|
where the Rust grammar requires an expression, even though `macro_rules!` itself can appear
|
||||||
position).
|
in a context where the grammar requires an item.
|
||||||
|
|
||||||
# Multiplicity
|
# Multiplicity
|
||||||
|
|
||||||
## Invocation
|
## Invocation
|
||||||
|
|
||||||
Going back to the motivating example, suppose that we wanted each invocation
|
Going back to the motivating example, recall that `early_return` expanded into
|
||||||
of `early_return` to potentially accept multiple "special" identifiers. The
|
a `match` that would `return` if the `match`'s scrutinee matched the
|
||||||
syntax `$(...)*` accepts zero or more occurrences of its contents, much like
|
"special case" identifier provided as the second argument to `early_return`,
|
||||||
the Kleene star operator in regular expressions. It also supports a separator
|
and do nothing otherwise. Now suppose that we wanted to write a
|
||||||
token (a comma-separated list could be written `$(...),*`), and `+` instead of
|
version of `early_return` that could handle a variable number of "special"
|
||||||
`*` to mean "at least one".
|
cases.
|
||||||
|
|
||||||
|
The syntax `$(...)*` on the left-hand side of the `=>` in a macro definition
|
||||||
|
accepts zero or more occurrences of its contents. It works much
|
||||||
|
like the `*` operator in regular expressions. It also supports a
|
||||||
|
separator token (a comma-separated list could be written `$(...),*`), and `+`
|
||||||
|
instead of `*` to mean "at least one".
|
||||||
|
|
||||||
~~~~
|
~~~~
|
||||||
# enum t { special_a(uint),special_b(uint),special_c(uint),special_d(uint)};
|
# enum t { special_a(uint),special_b(uint),special_c(uint),special_d(uint)};
|
||||||
|
@ -118,37 +149,35 @@ early_return!(input_2, [special_b]);
|
||||||
### Transcription
|
### Transcription
|
||||||
|
|
||||||
As the above example demonstrates, `$(...)*` is also valid on the right-hand
|
As the above example demonstrates, `$(...)*` is also valid on the right-hand
|
||||||
side of a macro definition. The behavior of Kleene star in transcription,
|
side of a macro definition. The behavior of `*` in transcription,
|
||||||
especially in cases where multiple stars are nested, and multiple different
|
especially in cases where multiple `*`s are nested, and multiple different
|
||||||
names are involved, can seem somewhat magical and intuitive at first. The
|
names are involved, can seem somewhat magical and intuitive at first. The
|
||||||
system that interprets them is called "Macro By Example". The two rules to
|
system that interprets them is called "Macro By Example". The two rules to
|
||||||
keep in mind are (1) the behavior of `$(...)*` is to walk through one "layer"
|
keep in mind are (1) the behavior of `$(...)*` is to walk through one "layer"
|
||||||
of repetitions for all of the `$name`s it contains in lockstep, and (2) each
|
of repetitions for all of the `$name`s it contains in lockstep, and (2) each
|
||||||
`$name` must be under at least as many `$(...)*`s as it was matched against.
|
`$name` must be under at least as many `$(...)*`s as it was matched against.
|
||||||
If it is under more, it'll will be repeated, as appropriate.
|
If it is under more, it'll be repeated, as appropriate.
|
||||||
|
|
||||||
## Parsing limitations
|
## Parsing limitations
|
||||||
|
|
||||||
The parser used by the macro system is reasonably powerful, but the parsing of
|
The macro parser will parse Rust syntax with two limitations:
|
||||||
Rust syntax is restricted in two ways:
|
|
||||||
|
|
||||||
1. The parser will always parse as much as possible. For example, if the comma
|
1. The parser will always parse as much as possible. For example, if the comma
|
||||||
were omitted from the syntax of `early_return!` above, `input_1 [` would've
|
were omitted from the syntax of `early_return!` above, `input_1 [` would've
|
||||||
been interpreted as the beginning of an array index. In fact, invoking the
|
been interpreted as the beginning of an array index. In fact, invoking the
|
||||||
macro would have been impossible.
|
macro would have been impossible.
|
||||||
2. The parser must have eliminated all ambiguity by the time it reaches a
|
2. The parser must have eliminated all ambiguity by the time it reaches a
|
||||||
`$name:fragment_specifier`. This most often affects them when they occur in
|
`$name:fragment_specifier` declaration. This limitation can result in parse
|
||||||
the beginning of, or immediately after, a `$(...)*`; requiring a distinctive
|
errors when declarations occur at the beginning of, or immediately after,
|
||||||
|
a `$(...)*`. Changing the invocation syntax to require a distinctive
|
||||||
token in front can solve the problem.
|
token in front can solve the problem.
|
||||||
|
|
||||||
## A final note
|
## A final note
|
||||||
|
|
||||||
Macros, as currently implemented, are not for the faint of heart. Even
|
Macros, as currently implemented, are not for the faint of heart. Even
|
||||||
ordinary syntax errors can be more difficult to debug when they occur inside
|
ordinary syntax errors can be more difficult to debug when they occur inside a
|
||||||
a macro, and errors caused by parse problems in generated code can be very
|
macro, and errors caused by parse problems in generated code can be very
|
||||||
tricky. Invoking the `log_syntax!` macro can help elucidate intermediate
|
tricky. Invoking the `log_syntax!` macro can help elucidate intermediate
|
||||||
states, using `trace_macros!(true)` will automatically print those
|
states, invoking `trace_macros!(true)` will automatically print those
|
||||||
intermediate states out, and using `--pretty expanded` as an argument to the
|
intermediate states out, and passing the flag `--pretty expanded` as a
|
||||||
compiler will show the result of expansion.
|
command-line argument to the compiler will show the result of expansion.
|
||||||
|
|
||||||
|
|
||||||
|
|
Loading…
Add table
Add a link
Reference in a new issue