1
Fork 0

borrow from @[] vectors (cc #2797)

This commit is contained in:
Niko Matsakis 2012-07-17 18:44:00 -07:00
parent bb5e2ba60a
commit 4ee4a2ab31
2 changed files with 224 additions and 150 deletions

View file

@ -1,149 +1,217 @@
/*!
* # Borrow check
*
* This pass is in job of enforcing *memory safety* and *purity*. As
* memory safety is by far the more complex topic, I'll focus on that in
* this description, but purity will be covered later on. In the context
* of Rust, memory safety means three basic things:
*
* - no writes to immutable memory;
* - all pointers point to non-freed memory;
* - all pointers point to memory of the same type as the pointer.
*
* The last point might seem confusing: after all, for the most part,
* this condition is guaranteed by the type check. However, there are
* two cases where the type check effectively delegates to borrow check.
*
* The first case has to do with enums. If there is a pointer to the
* interior of an enum, and the enum is in a mutable location (such as a
* local variable or field declared to be mutable), it is possible that
* the user will overwrite the enum with a new value of a different
* variant, and thus effectively change the type of the memory that the
* pointer is pointing at.
*
* The second case has to do with mutability. Basically, the type
* checker has only a limited understanding of mutability. It will allow
* (for example) the user to get an immutable pointer with the address of
* a mutable local variable. It will also allow a `@mut T` or `~mut T`
* pointer to be borrowed as a `&r.T` pointer. These seeming oversights
* are in fact intentional; they allow the user to temporarily treat a
* mutable value as immutable. It is up to the borrow check to guarantee
* that the value in question is not in fact mutated during the lifetime
* `r` of the reference.
*
* # Summary of the safety check
*
* In order to enforce mutability, the borrow check has three tricks up
* its sleeve.
*
* First, data which is uniquely tied to the current stack frame (that'll
* be defined shortly) is tracked very precisely. This means that, for
* example, if an immutable pointer to a mutable local variable is
* created, the borrowck will simply check for assignments to that
* particular local variable: no other memory is affected.
*
* Second, if the data is not uniquely tied to the stack frame, it may
* still be possible to ensure its validity by rooting garbage collected
* pointers at runtime. For example, if there is a mutable local
* variable `x` of type `@T`, and its contents are borrowed with an
* expression like `&*x`, then the value of `x` will be rooted (today,
* that means its ref count will be temporary increased) for the lifetime
* of the reference that is created. This means that the pointer remains
* valid even if `x` is reassigned.
*
* Finally, if neither of these two solutions are applicable, then we
* require that all operations within the scope of the reference be
* *pure*. A pure operation is effectively one that does not write to
* any aliasable memory. This means that it is still possible to write
* to local variables or other data that is uniquely tied to the stack
* frame (there's that term again; formal definition still pending) but
* not to data reached via a `&T` or `@T` pointer. Such writes could
* possibly have the side-effect of causing the data which must remain
* valid to be overwritten.
*
* # Possible future directions
*
* There are numerous ways that the `borrowck` could be strengthened, but
* these are the two most likely:
*
* - flow-sensitivity: we do not currently consider flow at all but only
* block-scoping. This means that innocent code like the following is
* rejected:
*
* let mut x: int;
* ...
* x = 5;
* let y: &int = &x; // immutable ptr created
* ...
*
* The reason is that the scope of the pointer `y` is the entire
* enclosing block, and the assignment `x = 5` occurs within that
* block. The analysis is not smart enough to see that `x = 5` always
* happens before the immutable pointer is created. This is relatively
* easy to fix and will surely be fixed at some point.
*
* - finer-grained purity checks: currently, our fallback for
* guaranteeing random references into mutable, aliasable memory is to
* require *total purity*. This is rather strong. We could use local
* type-based alias analysis to distinguish writes that could not
* possibly invalid the references which must be guaranteed. This
* would only work within the function boundaries; function calls would
* still require total purity. This seems less likely to be
* implemented in the short term as it would make the code
* significantly more complex; there is currently no code to analyze
* the types and determine the possible impacts of a write.
*
* # Terminology
*
* A **loan** is .
*
* # How the code works
*
* The borrow check code is divided into several major modules, each of
* which is documented in its own file.
*
* The `gather_loans` and `check_loans` are the two major passes of the
* analysis. The `gather_loans` pass runs over the IR once to determine
* what memory must remain valid and for how long. Its name is a bit of
* a misnomer; it does in fact gather up the set of loans which are
* granted, but it also determines when @T pointers must be rooted and
* for which scopes purity must be required.
*
* The `check_loans` pass walks the IR and examines the loans and purity
* requirements computed in `gather_loans`. It checks to ensure that (a)
* the conditions of all loans are honored; (b) no contradictory loans
* were granted (for example, loaning out the same memory as mutable and
* immutable simultaneously); and (c) any purity requirements are
* honored.
*
* The remaining modules are helper modules used by `gather_loans` and
* `check_loans`:
*
* - `categorization` has the job of analyzing an expression to determine
* what kind of memory is used in evaluating it (for example, where
* dereferences occur and what kind of pointer is dereferenced; whether
* the memory is mutable; etc)
* - `loan` determines when data uniquely tied to the stack frame can be
* loaned out.
* - `preserve` determines what actions (if any) must be taken to preserve
* aliasable data. This is the code which decides when to root
* an @T pointer or to require purity.
*
* # Maps that are created
*
* Borrowck results in two maps.
*
* - `root_map`: identifies those expressions or patterns whose result
* needs to be rooted. Conceptually the root_map maps from an
* expression or pattern node to a `node_id` identifying the scope for
* which the expression must be rooted (this `node_id` should identify
* a block or call). The actual key to the map is not an expression id,
* however, but a `root_map_key`, which combines an expression id with a
* deref count and is used to cope with auto-deref.
*
* - `mutbl_map`: identifies those local variables which are modified or
* moved. This is used by trans to guarantee that such variables are
* given a memory location and not used as immediates.
# Borrow check
This pass is in job of enforcing *memory safety* and *purity*. As
memory safety is by far the more complex topic, I'll focus on that in
this description, but purity will be covered later on. In the context
of Rust, memory safety means three basic things:
- no writes to immutable memory;
- all pointers point to non-freed memory;
- all pointers point to memory of the same type as the pointer.
The last point might seem confusing: after all, for the most part,
this condition is guaranteed by the type check. However, there are
two cases where the type check effectively delegates to borrow check.
The first case has to do with enums. If there is a pointer to the
interior of an enum, and the enum is in a mutable location (such as a
local variable or field declared to be mutable), it is possible that
the user will overwrite the enum with a new value of a different
variant, and thus effectively change the type of the memory that the
pointer is pointing at.
The second case has to do with mutability. Basically, the type
checker has only a limited understanding of mutability. It will allow
(for example) the user to get an immutable pointer with the address of
a mutable local variable. It will also allow a `@mut T` or `~mut T`
pointer to be borrowed as a `&r.T` pointer. These seeming oversights
are in fact intentional; they allow the user to temporarily treat a
mutable value as immutable. It is up to the borrow check to guarantee
that the value in question is not in fact mutated during the lifetime
`r` of the reference.
# Definition of unstable memory
The primary danger to safety arises due to *unstable memory*.
Unstable memory is memory whose validity or type may change as a
result of an assignment, move, or a variable going out of scope.
There are two cases in Rust where memory is unstable: the contents of
unique boxes and enums.
Unique boxes are unstable because when the variable containing the
unique box is re-assigned, moves, or goes out of scope, the unique box
is freed or---in the case of a move---potentially given to another
task. In either case, if there is an extant and usable pointer into
the box, then safety guarantees would be compromised.
Enum values are unstable because they are reassigned the types of
their contents may change if they are assigned with a different
variant than they had previously.
# Safety criteria that must be enforced
Whenever a piece of memory is borrowed for lifetime L, there are two
things which the borrow checker must guarantee. First, it must
guarantee that the memory address will remain allocated (and owned by
the current task) for the entirety of the lifetime L. Second, it must
guarantee that the type of the data will not change for the entirety
of the lifetime L. In exchange, the region-based type system will
guarantee that the pointer is not used outside the lifetime L. These
guarantees are to some extent independent but are also inter-related.
In some cases, the type of a pointer cannot be invalidated but the
lifetime can. For example, imagine a pointer to the interior of
a shared box like:
let mut x = @mut {f: 5, g: 6};
let y = &mut x.f;
Here, a pointer was created to the interior of a shared box which
contains a record. Even if `*x` were to be mutated like so:
*x = {f: 6, g: 7};
This would cause `*y` to change from 5 to 6, but the pointer pointer
`y` remains valid. It still points at an integer even if that integer
has been overwritten.
However, if we were to reassign `x` itself, like so:
x = @{f: 6, g: 7};
This could potentially invalidate `y`, because if `x` were the final
reference to the shared box, then that memory would be released and
now `y` points at freed memory. (We will see that to prevent this
scenario we will *root* shared boxes that reside in mutable memory
whose contents are borrowed; rooting means that we create a temporary
to ensure that the box is not collected).
In other cases, like an enum on the stack, the memory cannot be freed
but its type can change:
let mut x = some(5);
alt x {
some(ref y) => { ... }
none => { ... }
}
Here as before, the pointer `y` would be invalidated if we were to
reassign `x` to `none`. (We will see that this case is prevented
because borrowck tracks data which resides on the stack and prevents
variables from reassigned if there may be pointers to their interior)
Finally, in some cases, both dangers can arise. For example, something
like the following:
let mut x = ~some(5);
alt x {
~some(ref y) => { ... }
~none => { ... }
}
In this case, if `x` to be reassigned or `*x` were to be mutated, then
the pointer `y` would be invalided. (This case is also prevented by
borrowck tracking data which is owned by the current stack frame)
# Summary of the safety check
In order to enforce mutability, the borrow check has a few tricks up
its sleeve:
- When data is owned by the current stack frame, we can identify every
possible assignment to a local variable and simply prevent
potentially dangerous assignments directly.
- If data is owned by a shared box, we can root the box to increase
its lifetime.
- If data is found within a borrowed pointer, we can assume that the
data will remain live for the entirety of the borrowed pointer.
- We can rely on the fact that pure actions (such as calling pure
functions) do not mutate data which is not owned by the current
stack frame.
# Possible future directions
There are numerous ways that the `borrowck` could be strengthened, but
these are the two most likely:
- flow-sensitivity: we do not currently consider flow at all but only
block-scoping. This means that innocent code like the following is
rejected:
let mut x: int;
...
x = 5;
let y: &int = &x; // immutable ptr created
...
The reason is that the scope of the pointer `y` is the entire
enclosing block, and the assignment `x = 5` occurs within that
block. The analysis is not smart enough to see that `x = 5` always
happens before the immutable pointer is created. This is relatively
easy to fix and will surely be fixed at some point.
- finer-grained purity checks: currently, our fallback for
guaranteeing random references into mutable, aliasable memory is to
require *total purity*. This is rather strong. We could use local
type-based alias analysis to distinguish writes that could not
possibly invalid the references which must be guaranteed. This
would only work within the function boundaries; function calls would
still require total purity. This seems less likely to be
implemented in the short term as it would make the code
significantly more complex; there is currently no code to analyze
the types and determine the possible impacts of a write.
# How the code works
The borrow check code is divided into several major modules, each of
which is documented in its own file.
The `gather_loans` and `check_loans` are the two major passes of the
analysis. The `gather_loans` pass runs over the IR once to determine
what memory must remain valid and for how long. Its name is a bit of
a misnomer; it does in fact gather up the set of loans which are
granted, but it also determines when @T pointers must be rooted and
for which scopes purity must be required.
The `check_loans` pass walks the IR and examines the loans and purity
requirements computed in `gather_loans`. It checks to ensure that (a)
the conditions of all loans are honored; (b) no contradictory loans
were granted (for example, loaning out the same memory as mutable and
immutable simultaneously); and (c) any purity requirements are
honored.
The remaining modules are helper modules used by `gather_loans` and
`check_loans`:
- `categorization` has the job of analyzing an expression to determine
what kind of memory is used in evaluating it (for example, where
dereferences occur and what kind of pointer is dereferenced; whether
the memory is mutable; etc)
- `loan` determines when data uniquely tied to the stack frame can be
loaned out.
- `preserve` determines what actions (if any) must be taken to preserve
aliasable data. This is the code which decides when to root
an @T pointer or to require purity.
# Maps that are created
Borrowck results in two maps.
- `root_map`: identifies those expressions or patterns whose result
needs to be rooted. Conceptually the root_map maps from an
expression or pattern node to a `node_id` identifying the scope for
which the expression must be rooted (this `node_id` should identify
a block or call). The actual key to the map is not an expression id,
however, but a `root_map_key`, which combines an expression id with a
deref count and is used to cope with auto-deref.
- `mutbl_map`: identifies those local variables which are modified or
moved. This is used by trans to guarantee that such variables are
given a memory location and not used as immediates.
*/
import syntax::ast;

View file

@ -361,12 +361,18 @@ impl public_methods for borrowck_ctxt {
ret alt deref_kind(self.tcx, base_cmt.ty) {
deref_ptr(ptr) {
// make deref of vectors explicit, as explained in the comment at
// the head of this section
let deref_lp = base_cmt.lp.map(|lp| @lp_deref(lp, ptr) );
// (a) the contents are loanable if the base is loanable
// and this is a *unique* vector
let deref_lp = alt ptr {
uniq_ptr => {base_cmt.lp.map(|lp| @lp_deref(lp, uniq_ptr))}
_ => {none}
};
// (b) the deref is explicit in the resulting cmt
let deref_cmt = @{id:expr.id, span:expr.span,
cat:cat_deref(base_cmt, 0u, ptr), lp:deref_lp,
mutbl:m_imm, ty:mt.ty};
cat:cat_deref(base_cmt, 0u, ptr), lp:deref_lp,
mutbl:m_imm, ty:mt.ty};
comp(expr, deref_cmt, base_cmt.ty, mt)
}