borrow from @[] vectors (cc #2797)
parent
bb5e2ba60a
commit
4ee4a2ab31
2 changed files with 224 additions and 150 deletions
|
@ -1,149 +1,217 @@
|
|||
/*!
|
||||
* # Borrow check
|
||||
*
|
||||
* This pass has the job of enforcing *memory safety* and *purity*. As
|
||||
* memory safety is by far the more complex topic, I'll focus on that in
|
||||
* this description, but purity will be covered later on. In the context
|
||||
* of Rust, memory safety means three basic things:
|
||||
*
|
||||
* - no writes to immutable memory;
|
||||
* - all pointers point to non-freed memory;
|
||||
* - all pointers point to memory of the same type as the pointer.
|
||||
*
|
||||
* The last point might seem confusing: after all, for the most part,
|
||||
* this condition is guaranteed by the type check. However, there are
|
||||
* two cases where the type check effectively delegates to borrow check.
|
||||
*
|
||||
* The first case has to do with enums. If there is a pointer to the
|
||||
* interior of an enum, and the enum is in a mutable location (such as a
|
||||
* local variable or field declared to be mutable), it is possible that
|
||||
* the user will overwrite the enum with a new value of a different
|
||||
* variant, and thus effectively change the type of the memory that the
|
||||
* pointer is pointing at.
|
||||
*
|
||||
* The second case has to do with mutability. Basically, the type
|
||||
* checker has only a limited understanding of mutability. It will allow
|
||||
* (for example) the user to get an immutable pointer with the address of
|
||||
* a mutable local variable. It will also allow a `@mut T` or `~mut T`
|
||||
* pointer to be borrowed as a `&r.T` pointer. These seeming oversights
|
||||
* are in fact intentional; they allow the user to temporarily treat a
|
||||
* mutable value as immutable. It is up to the borrow check to guarantee
|
||||
* that the value in question is not in fact mutated during the lifetime
|
||||
* `r` of the reference.
|
||||
*
|
||||
* # Summary of the safety check
|
||||
*
|
||||
* In order to enforce mutability, the borrow check has three tricks up
|
||||
* its sleeve.
|
||||
*
|
||||
* First, data which is uniquely tied to the current stack frame (that'll
|
||||
* be defined shortly) is tracked very precisely. This means that, for
|
||||
* example, if an immutable pointer to a mutable local variable is
|
||||
* created, the borrowck will simply check for assignments to that
|
||||
* particular local variable: no other memory is affected.
|
||||
*
|
||||
* Second, if the data is not uniquely tied to the stack frame, it may
|
||||
* still be possible to ensure its validity by rooting garbage collected
|
||||
* pointers at runtime. For example, if there is a mutable local
|
||||
* variable `x` of type `@T`, and its contents are borrowed with an
|
||||
* expression like `&*x`, then the value of `x` will be rooted (today,
|
||||
* that means its ref count will be temporarily increased) for the lifetime
|
||||
* of the reference that is created. This means that the pointer remains
|
||||
* valid even if `x` is reassigned.
|
||||
*
|
||||
* Finally, if neither of these two solutions is applicable, then we
|
||||
* require that all operations within the scope of the reference be
|
||||
* *pure*. A pure operation is effectively one that does not write to
|
||||
* any aliasable memory. This means that it is still possible to write
|
||||
* to local variables or other data that is uniquely tied to the stack
|
||||
* frame (there's that term again; formal definition still pending) but
|
||||
* not to data reached via a `&T` or `@T` pointer. Such writes could
|
||||
* possibly have the side-effect of causing the data which must remain
|
||||
* valid to be overwritten.
|
||||
*
|
||||
* # Possible future directions
|
||||
*
|
||||
* There are numerous ways that the `borrowck` could be strengthened, but
|
||||
* these are the two most likely:
|
||||
*
|
||||
* - flow-sensitivity: we do not currently consider flow at all but only
|
||||
* block-scoping. This means that innocent code like the following is
|
||||
* rejected:
|
||||
*
|
||||
* let mut x: int;
|
||||
* ...
|
||||
* x = 5;
|
||||
* let y: &int = &x; // immutable ptr created
|
||||
* ...
|
||||
*
|
||||
* The reason is that the scope of the pointer `y` is the entire
|
||||
* enclosing block, and the assignment `x = 5` occurs within that
|
||||
* block. The analysis is not smart enough to see that `x = 5` always
|
||||
* happens before the immutable pointer is created. This is relatively
|
||||
* easy to fix and will surely be fixed at some point.
|
||||
*
|
||||
* - finer-grained purity checks: currently, our fallback for
|
||||
* guaranteeing random references into mutable, aliasable memory is to
|
||||
* require *total purity*. This is rather strong. We could use local
|
||||
* type-based alias analysis to distinguish writes that could not
|
||||
* possibly invalidate the references which must be guaranteed. This
|
||||
* would only work within the function boundaries; function calls would
|
||||
* still require total purity. This seems less likely to be
|
||||
* implemented in the short term as it would make the code
|
||||
* significantly more complex; there is currently no code to analyze
|
||||
* the types and determine the possible impacts of a write.
|
||||
*
|
||||
* # Terminology
|
||||
*
|
||||
* A **loan** is .
|
||||
*
|
||||
* # How the code works
|
||||
*
|
||||
* The borrow check code is divided into several major modules, each of
|
||||
* which is documented in its own file.
|
||||
*
|
||||
* The `gather_loans` and `check_loans` are the two major passes of the
|
||||
* analysis. The `gather_loans` pass runs over the IR once to determine
|
||||
* what memory must remain valid and for how long. Its name is a bit of
|
||||
* a misnomer; it does in fact gather up the set of loans which are
|
||||
* granted, but it also determines when @T pointers must be rooted and
|
||||
* for which scopes purity must be required.
|
||||
*
|
||||
* The `check_loans` pass walks the IR and examines the loans and purity
|
||||
* requirements computed in `gather_loans`. It checks to ensure that (a)
|
||||
* the conditions of all loans are honored; (b) no contradictory loans
|
||||
* were granted (for example, loaning out the same memory as mutable and
|
||||
* immutable simultaneously); and (c) any purity requirements are
|
||||
* honored.
|
||||
*
|
||||
* The remaining modules are helper modules used by `gather_loans` and
|
||||
* `check_loans`:
|
||||
*
|
||||
* - `categorization` has the job of analyzing an expression to determine
|
||||
* what kind of memory is used in evaluating it (for example, where
|
||||
* dereferences occur and what kind of pointer is dereferenced; whether
|
||||
* the memory is mutable; etc)
|
||||
* - `loan` determines when data uniquely tied to the stack frame can be
|
||||
* loaned out.
|
||||
* - `preserve` determines what actions (if any) must be taken to preserve
|
||||
* aliasable data. This is the code which decides when to root
|
||||
* an @T pointer or to require purity.
|
||||
*
|
||||
* # Maps that are created
|
||||
*
|
||||
* Borrowck results in two maps.
|
||||
*
|
||||
* - `root_map`: identifies those expressions or patterns whose result
|
||||
* needs to be rooted. Conceptually the root_map maps from an
|
||||
* expression or pattern node to a `node_id` identifying the scope for
|
||||
* which the expression must be rooted (this `node_id` should identify
|
||||
* a block or call). The actual key to the map is not an expression id,
|
||||
* however, but a `root_map_key`, which combines an expression id with a
|
||||
* deref count and is used to cope with auto-deref.
|
||||
*
|
||||
* - `mutbl_map`: identifies those local variables which are modified or
|
||||
* moved. This is used by trans to guarantee that such variables are
|
||||
* given a memory location and not used as immediates.
|
||||
# Borrow check
|
||||
|
||||
This pass has the job of enforcing *memory safety* and *purity*. As
|
||||
memory safety is by far the more complex topic, I'll focus on that in
|
||||
this description, but purity will be covered later on. In the context
|
||||
of Rust, memory safety means three basic things:
|
||||
|
||||
- no writes to immutable memory;
|
||||
- all pointers point to non-freed memory;
|
||||
- all pointers point to memory of the same type as the pointer.
|
||||
|
||||
The last point might seem confusing: after all, for the most part,
|
||||
this condition is guaranteed by the type check. However, there are
|
||||
two cases where the type check effectively delegates to borrow check.
|
||||
|
||||
The first case has to do with enums. If there is a pointer to the
|
||||
interior of an enum, and the enum is in a mutable location (such as a
|
||||
local variable or field declared to be mutable), it is possible that
|
||||
the user will overwrite the enum with a new value of a different
|
||||
variant, and thus effectively change the type of the memory that the
|
||||
pointer is pointing at.
|
||||
|
||||
The second case has to do with mutability. Basically, the type
|
||||
checker has only a limited understanding of mutability. It will allow
|
||||
(for example) the user to get an immutable pointer with the address of
|
||||
a mutable local variable. It will also allow a `@mut T` or `~mut T`
|
||||
pointer to be borrowed as a `&r.T` pointer. These seeming oversights
|
||||
are in fact intentional; they allow the user to temporarily treat a
|
||||
mutable value as immutable. It is up to the borrow check to guarantee
|
||||
that the value in question is not in fact mutated during the lifetime
|
||||
`r` of the reference.
|
||||
|
||||
# Definition of unstable memory
|
||||
|
||||
The primary danger to safety arises due to *unstable memory*.
|
||||
Unstable memory is memory whose validity or type may change as a
|
||||
result of an assignment, move, or a variable going out of scope.
|
||||
There are two cases in Rust where memory is unstable: the contents of
|
||||
unique boxes and enums.
|
||||
|
||||
Unique boxes are unstable because when the variable containing the
|
||||
unique box is re-assigned, moves, or goes out of scope, the unique box
|
||||
is freed or---in the case of a move---potentially given to another
|
||||
task. In either case, if there is an extant and usable pointer into
|
||||
the box, then safety guarantees would be compromised.
|
||||
|
||||
Enum values are unstable because when they are reassigned the types of
|
||||
their contents may change if they are assigned with a different
|
||||
variant than they had previously.
|
||||
|
||||
# Safety criteria that must be enforced
|
||||
|
||||
Whenever a piece of memory is borrowed for lifetime L, there are two
|
||||
things which the borrow checker must guarantee. First, it must
|
||||
guarantee that the memory address will remain allocated (and owned by
|
||||
the current task) for the entirety of the lifetime L. Second, it must
|
||||
guarantee that the type of the data will not change for the entirety
|
||||
of the lifetime L. In exchange, the region-based type system will
|
||||
guarantee that the pointer is not used outside the lifetime L. These
|
||||
guarantees are to some extent independent but are also inter-related.
|
||||
|
||||
In some cases, the type of a pointer cannot be invalidated but the
|
||||
lifetime can. For example, imagine a pointer to the interior of
|
||||
a shared box like:
|
||||
|
||||
let mut x = @mut {f: 5, g: 6};
|
||||
let y = &mut x.f;
|
||||
|
||||
Here, a pointer was created to the interior of a shared box which
|
||||
contains a record. Even if `*x` were to be mutated like so:
|
||||
|
||||
*x = {f: 6, g: 7};
|
||||
|
||||
This would cause `*y` to change from 5 to 6, but the pointer
|
||||
`y` remains valid. It still points at an integer even if that integer
|
||||
has been overwritten.
|
||||
|
||||
However, if we were to reassign `x` itself, like so:
|
||||
|
||||
x = @mut {f: 6, g: 7};
|
||||
|
||||
This could potentially invalidate `y`, because if `x` were the final
|
||||
reference to the shared box, then that memory would be released and
|
||||
now `y` points at freed memory. (We will see that to prevent this
|
||||
scenario we will *root* shared boxes that reside in mutable memory
|
||||
whose contents are borrowed; rooting means that we create a temporary
|
||||
to ensure that the box is not collected).
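The rooting idea can be approximated in present-day Rust, which has no `@T` boxes. The following is only a sketch under assumptions: `Rc` stands in for the shared box, the `Pair` struct and the function name `read_while_rooted` are hypothetical, and the extra `Rc::clone` plays the role of the temporary root that the compiler would insert.

```rust
use std::rc::Rc;

// Hypothetical record type standing in for `{f: 5, g: 6}`.
struct Pair {
    f: i32,
    g: i32,
}

// Returns the value seen through the borrowed pointer after `x` has been
// reassigned, together with the sum of the fields of the new box.
fn read_while_rooted() -> (i32, i32) {
    let mut x = Rc::new(Pair { f: 5, g: 6 });

    // The "root": an extra reference that keeps the original box alive.
    let root = Rc::clone(&x);
    let y: &i32 = &root.f;

    // Reassigning `x` drops its reference to the original box, but the
    // root still holds one, so `y` does not dangle.
    x = Rc::new(Pair { f: 6, g: 7 });

    (*y, x.f + x.g)
}
```

Because the root outlives `y`, the read through `y` still observes the old value 5 even though `x` now refers to a different box.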
|
||||
|
||||
In other cases, like an enum on the stack, the memory cannot be freed
|
||||
but its type can change:
|
||||
|
||||
let mut x = some(5);
|
||||
alt x {
|
||||
some(ref y) => { ... }
|
||||
none => { ... }
|
||||
}
|
||||
|
||||
Here as before, the pointer `y` would be invalidated if we were to
|
||||
reassign `x` to `none`. (We will see that this case is prevented
|
||||
because borrowck tracks data which resides on the stack and prevents
|
||||
variables from being reassigned if there may be pointers to their interior.)
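For comparison, here is a rough rendering of the stack-enum case in present-day Rust syntax (`match` and `Some`/`None` instead of `alt` and `some`/`none`). It is a sketch, not the code this pass analyzes; the function name `inspect_then_clear` is hypothetical.

```rust
// Returns the payload read through the interior pointer, then the
// value of `x` after it has been reassigned.
fn inspect_then_clear() -> (i32, Option<i32>) {
    let mut x = Some(5);

    let seen = match x {
        // `y` points into the interior of `x`.
        Some(ref y) => {
            // Writing `x = None;` at this point, before the read of `*y`,
            // is rejected by a current compiler: it would change the
            // variant underneath `y`.
            *y
        }
        None => 0,
    };

    // The interior pointer is no longer in use, so reassignment is allowed.
    x = None;
    (seen, x)
}
```

The reassignment is only accepted once the pointer into the enum can no longer be used, which is the guarantee described above.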
|
||||
|
||||
Finally, in some cases, both dangers can arise. For example, something
|
||||
like the following:
|
||||
|
||||
let mut x = ~some(5);
|
||||
alt x {
|
||||
~some(ref y) => { ... }
|
||||
~none => { ... }
|
||||
}
|
||||
|
||||
In this case, if `x` were to be reassigned or `*x` were to be mutated, then
|
||||
the pointer `y` would be invalidated. (This case is also prevented by
|
||||
borrowck tracking data which is owned by the current stack frame)
|
||||
|
||||
# Summary of the safety check
|
||||
|
||||
In order to enforce mutability, the borrow check has a few tricks up
|
||||
its sleeve:
|
||||
|
||||
- When data is owned by the current stack frame, we can identify every
|
||||
possible assignment to a local variable and simply prevent
|
||||
potentially dangerous assignments directly.
|
||||
|
||||
- If data is owned by a shared box, we can root the box to increase
|
||||
its lifetime.
|
||||
|
||||
- If data is found within a borrowed pointer, we can assume that the
|
||||
data will remain live for the entirety of the borrowed pointer.
|
||||
|
||||
- We can rely on the fact that pure actions (such as calling pure
|
||||
functions) do not mutate data which is not owned by the current
|
||||
stack frame.
|
||||
|
||||
# Possible future directions
|
||||
|
||||
There are numerous ways that the `borrowck` could be strengthened, but
|
||||
these are the two most likely:
|
||||
|
||||
- flow-sensitivity: we do not currently consider flow at all but only
|
||||
block-scoping. This means that innocent code like the following is
|
||||
rejected:
|
||||
|
||||
let mut x: int;
|
||||
...
|
||||
x = 5;
|
||||
let y: &int = &x; // immutable ptr created
|
||||
...
|
||||
|
||||
The reason is that the scope of the pointer `y` is the entire
|
||||
enclosing block, and the assignment `x = 5` occurs within that
|
||||
block. The analysis is not smart enough to see that `x = 5` always
|
||||
happens before the immutable pointer is created. This is relatively
|
||||
easy to fix and will surely be fixed at some point.
|
||||
|
||||
- finer-grained purity checks: currently, our fallback for
|
||||
guaranteeing random references into mutable, aliasable memory is to
|
||||
require *total purity*. This is rather strong. We could use local
|
||||
type-based alias analysis to distinguish writes that could not
|
||||
possibly invalidate the references which must be guaranteed. This
|
||||
would only work within the function boundaries; function calls would
|
||||
still require total purity. This seems less likely to be
|
||||
implemented in the short term as it would make the code
|
||||
significantly more complex; there is currently no code to analyze
|
||||
the types and determine the possible impacts of a write.
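As a point of reference for the flow-sensitivity item above, the same pattern written in present-day Rust syntax is shown below. This is a sketch under assumptions: `i32` replaces `int`, the `mut` is dropped because the variable is assigned exactly once, and the function name `assign_then_borrow` is hypothetical. Current Rust compilers accept it, since they do take control flow into account.

```rust
// Assigns to `x` first and only then creates an immutable pointer to it.
fn assign_then_borrow() -> i32 {
    let x: i32;
    x = 5;
    let y: &i32 = &x; // immutable pointer created after the assignment
    *y
}
```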
|
||||
|
||||
# How the code works
|
||||
|
||||
The borrow check code is divided into several major modules, each of
|
||||
which is documented in its own file.
|
||||
|
||||
The `gather_loans` and `check_loans` are the two major passes of the
|
||||
analysis. The `gather_loans` pass runs over the IR once to determine
|
||||
what memory must remain valid and for how long. Its name is a bit of
|
||||
a misnomer; it does in fact gather up the set of loans which are
|
||||
granted, but it also determines when @T pointers must be rooted and
|
||||
for which scopes purity must be required.
|
||||
|
||||
The `check_loans` pass walks the IR and examines the loans and purity
|
||||
requirements computed in `gather_loans`. It checks to ensure that (a)
|
||||
the conditions of all loans are honored; (b) no contradictory loans
|
||||
were granted (for example, loaning out the same memory as mutable and
|
||||
immutable simultaneously); and (c) any purity requirements are
|
||||
honored.
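Check (b) can be illustrated with a small sketch in present-day Rust. Everything here is hypothetical and is not the data structure used by `check_loans`: the `Loan` struct, its fields, the representation of a scope as a half-open range of node ids, and the function names are all assumptions made for the example.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Mutbl {
    Imm,
    Mut,
}

// Hypothetical record of one granted loan.
#[derive(Clone, Debug)]
struct Loan {
    path: String,      // name of the loaned memory
    mutbl: Mutbl,      // how it was loaned out
    scope: (u32, u32), // half-open range [start, end) of node ids
}

// Two half-open ranges overlap when each starts before the other ends.
fn scopes_overlap(a: (u32, u32), b: (u32, u32)) -> bool {
    a.0 < b.1 && b.0 < a.1
}

// Returns the index pairs of loans that lend the same path as mutable
// and immutable over overlapping scopes.
fn contradictory_loans(loans: &[Loan]) -> Vec<(usize, usize)> {
    let mut conflicts = Vec::new();
    for i in 0..loans.len() {
        for j in (i + 1)..loans.len() {
            let (a, b) = (&loans[i], &loans[j]);
            if a.path == b.path
                && a.mutbl != b.mutbl
                && scopes_overlap(a.scope, b.scope)
            {
                conflicts.push((i, j));
            }
        }
    }
    conflicts
}
```

The sketch covers only the mutable-versus-immutable example named in (b). Checks (a) and (c) would need information about the assignments and calls inside each scope.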
|
||||
|
||||
The remaining modules are helper modules used by `gather_loans` and
|
||||
`check_loans`:
|
||||
|
||||
- `categorization` has the job of analyzing an expression to determine
|
||||
what kind of memory is used in evaluating it (for example, where
|
||||
dereferences occur and what kind of pointer is dereferenced; whether
|
||||
the memory is mutable; etc)
|
||||
- `loan` determines when data uniquely tied to the stack frame can be
|
||||
loaned out.
|
||||
- `preserve` determines what actions (if any) must be taken to preserve
|
||||
aliasable data. This is the code which decides when to root
|
||||
an @T pointer or to require purity.
|
||||
|
||||
# Maps that are created
|
||||
|
||||
Borrowck results in two maps.
|
||||
|
||||
- `root_map`: identifies those expressions or patterns whose result
|
||||
needs to be rooted. Conceptually the root_map maps from an
|
||||
expression or pattern node to a `node_id` identifying the scope for
|
||||
which the expression must be rooted (this `node_id` should identify
|
||||
a block or call). The actual key to the map is not an expression id,
|
||||
however, but a `root_map_key`, which combines an expression id with a
|
||||
deref count and is used to cope with auto-deref.
|
||||
|
||||
- `mutbl_map`: identifies those local variables which are modified or
|
||||
moved. This is used by trans to guarantee that such variables are
|
||||
given a memory location and not used as immediates.
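The shape of the `root_map` described above might look roughly like this in present-day Rust. This is a sketch under assumptions: the `NodeId` alias, the field names `id` and `derefs`, and the helper `root_scope` are hypothetical; only the idea of keying on an expression id plus a deref count comes from the description.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the compiler's `node_id`.
type NodeId = u32;

// Key combining an expression id with a deref count, so that each
// auto-deref step of one expression can be rooted separately.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct RootMapKey {
    id: NodeId,
    derefs: usize,
}

// Maps a key to the scope (block or call) for which to root the value.
type RootMap = HashMap<RootMapKey, NodeId>;

// Looks up the rooting scope for a given expression and deref count.
fn root_scope(map: &RootMap, id: NodeId, derefs: usize) -> Option<NodeId> {
    map.get(&RootMapKey { id, derefs }).copied()
}
```

Keying on the pair rather than on the expression id alone means that an expression which is auto-dereferenced twice can have its first and second deref rooted for different scopes, or not rooted at all.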
|
||||
*/
|
||||
|
||||
import syntax::ast;
|
||||
|
|
|
@ -361,12 +361,18 @@ impl public_methods for borrowck_ctxt {
|
|||
|
||||
ret alt deref_kind(self.tcx, base_cmt.ty) {
|
||||
deref_ptr(ptr) {
|
||||
// make deref of vectors explicit, as explained in the comment at
|
||||
// the head of this section
|
||||
let deref_lp = base_cmt.lp.map(|lp| @lp_deref(lp, ptr) );
|
||||
// (a) the contents are loanable if the base is loanable
|
||||
// and this is a *unique* vector
|
||||
let deref_lp = alt ptr {
|
||||
uniq_ptr => {base_cmt.lp.map(|lp| @lp_deref(lp, uniq_ptr))}
|
||||
_ => {none}
|
||||
};
|
||||
|
||||
// (b) the deref is explicit in the resulting cmt
|
||||
let deref_cmt = @{id:expr.id, span:expr.span,
|
||||
cat:cat_deref(base_cmt, 0u, ptr), lp:deref_lp,
|
||||
mutbl:m_imm, ty:mt.ty};
|
||||
cat:cat_deref(base_cmt, 0u, ptr), lp:deref_lp,
|
||||
mutbl:m_imm, ty:mt.ty};
|
||||
|
||||
comp(expr, deref_cmt, base_cmt.ty, mt)
|
||||
}
|
||||
|
||||
|
|