This commit adds ranking and quick filtering to type-based search,
improving performance and having it order results based on their
type signatures.
Motivation
----------
If I write a query like `str -> String`, a lot of functions come up.
That's to be expected, but `String::from_str` should come up on top, and
it doesn't right now. This is because the sorting algorithm is based
on the functions name, and doesn't consider the type signature at all.
`slice::join` even comes up above it!
To fix this, the sorting should take into account the function's
signature, and the closer match should come up on top.
Guide-level description
-----------------------
When searching by type signature, types with a "closer" match will
show up above types that match less precisely.
Reference-level explanation
---------------------------
Functions signature search works in three major phases:
* A compact "fingerprint," based on the [bloom filter] technique, is used to
check for matches and to estimate the distance. It sometimes has false
positive matches, but it also operates on 128 bit contiguous memory and
requires no backtracking, so it performs a lot better than real
unification.
The fingerprint represents the set of items in the type signature, but it
does not represent nesting, and it ignores when the same item appears more
than once.
The result is rejected if any query bits are absent in the function, or
if the distance is higher than the current maximum and 200
results have already been found.
* The second step performs unification. This is where nesting and true bag
semantics are taken into account, and it has no false positives. It uses a
recursive, backtracking algorithm.
The result is rejected if any query elements are absent in the function.
[bloom filter]: https://en.wikipedia.org/wiki/Bloom_filter
Drawbacks
---------
This makes the code bigger.
More than that, this design is a subtle trade-off. It makes the cases I've
tested against measurably faster, but it's not clear how well this extends
to other crates with potentially more functions and fewer types.
The more complex things get, the more important it is to gather a good set
of data to test with (this is arguably more important than the actual
benchmarking ifrastructure right now).
Rationale and alternatives
--------------------------
Throwing a bloom filter in front makes it faster.
More than that, it tries to take a tactic where the system can not only check
for potential matches, but also gets an accurate distance function without
needing to do unification. That way it can skip unification even on items
that have the needed elems, as long as they have more items than the
currently found maximum.
If I didn't want to be able to cheaply do set operations on the fingerprint,
a [cuckoo filter] is supposed to have better performance.
But the nice bit-banging set intersection doesn't work AFAIK.
I also looked into [minhashing], but since it's actually an unbiased
estimate of the similarity coefficient, I'm not sure how it could be used
to skip unification (I wouldn't know if the estimate was too low or
too high).
This function actually uses the number of distinct items as its
"distance function."
This should give the same results that it would have gotten from a Jaccard
Distance $1-\frac{|F\cap{}Q|}{|F\cup{}Q|}$, while being cheaper to compute.
This is because:
* The function $F$ must be a superset of the query $Q$, so their union is
just $F$ and the intersection is $Q$ and it can be reduced to
$1-\frac{|Q|}{|F|}.
* There are no magic thresholds. These values are only being used to
compare against each other while sorting (and, if 200 results are found,
to compare with the maximum match). This means we only care if one value
is bigger than the other, not what it's actual value is, and since $Q$ is
the same for everything, it can be safely left out, reducing the formula
to $1-\frac{1}{|F|} = \frac{|F|}{|F|}-\frac{1}{|F|} = |F|-1$. And, since
the values are only being compared with each other, $|F|$ is fine.
Prior art
---------
This is significantly different from how Hoogle does it.
It doesn't account for order, and it has no special account for nesting,
though `Box<t>` is still two items, while `t` is only one.
This should give the same results that it would have gotten from a Jaccard
Distance $1-\frac{|A\cap{}B|}{|A\cup{}B|}$, while being cheaper to compute.
Unresolved questions
--------------------
`[]` and `()`, the slice/array and tuple/union operators, are ignored while
building the signature for the query. This is because they match more than
one thing, making them ambiguous. Unfortunately, this also makes them
a performance cliff. Is this likely to be a problem?
Right now, the system just stashes the type distance into the
same field that levenshtein distance normally goes in. This means exact
query matches show up on top (for example, if you have a function like
`fn nothing(a: Nothing, b: i32)`, then searching for `nothing` will show it
on top even if there's another function with `fn bar(x: Nothing)` that's
technically a closer match in type signature.
Future possibilities
--------------------
It should be possible to adopt more sorting criteria to act as a tie breaker,
which could be determined during unification.
[cuckoo filter]: https://en.wikipedia.org/wiki/Cuckoo_filter
[minhashing]: https://en.wikipedia.org/wiki/MinHash
This function dates back to 9a45c9d7c6 and
seems to have been made obsolete when `addIntoResult` grew the ability to
check the levenshtein distance matching with commit
ba824ec52b.
Change a typo mistake in the-doc-attribute.md
I guess that `Bar` in the section I changed should be `bar` because when I run the program it has its page under struct but bar doesn't have any page.
Coroutine variant fields can be uninitialized
Wrap coroutine variant fields in MaybeUninit to indicate that they might be uninitialized. Otherwise an uninhabited field will make the entire variant uninhabited and introduce undefined behaviour.
The analogous issue in the prefix of coroutine layout was addressed by 6fae7f8071.
Support bare unit structs in destructuring assignments
We should be allowed to use destructuring assignments on *bare* unit structs, not just unit structs that are located within other pattern constructors.
Fixes#118753
r? petrochenkov since you reviewed #95380, reassign if you're busy or don't want to review this.
Unbreak non-unix non-windows bootstrap
Fixes#118862.
#118647 added a new use of std::io::Write that is not conditional on any cfg.
028b6d152e/src/bootstrap/src/bin/main.rs (L134)
```console
error[E0599]: no method named `write_all` found for struct `File` in the current scope
--> src/bin/main.rs:134:21
|
134 | t!(file.write_all(lines.join("\n").as_bytes()));
| ^^^^^^^^^ method not found in `File`
|
= help: items from traits can only be used if the trait is in scope
help: the following trait is implemented but not in scope; perhaps add a `use` for it:
|
8 + use std::io::Write;
|
```
ParamEnv is canonicalized in *queries input* rather than query response.
In such case we don't "preserve universes" of canonical variable.
This means that `universe_map` always has the default value, which is
wasteful to store in the cache.
rustc_passes: Enforce `rustc::potential_query_instability` lint
Stop allowing `rustc::potential_query_instability` in all of `rustc_passes` and instead allow it on a case-by-case basis if it is safe. In this case, all instances of the lint are safe to allow.
Part of https://github.com/rust-lang/rust/issues/84447 which is E-help-wanted.
codegen: panic when trying to compute size/align of extern type
The alignment is also computed when accessing a field of extern type at non-zero offset, so we also panic in that case.
Previously `size_of_val` worked because the code path there assumed that "thin pointer" means "sized". But that's not true any more with extern types. The returned size and align are just blatantly wrong, so it seems better to panic than returning wrong results. We use a non-unwinding panic since code probably does not expect size_of_val to panic.
[`RFC 3086`] Attempt to try to resolve blocking concerns
Implements what is described at https://github.com/rust-lang/rust/issues/83527#issuecomment-1744822345 to hopefully make some progress.
It is unknown if such approach is or isn't desired due to the lack of further feedback, as such, it is probably best to nominate this PR to the official entities.
`@rustbot` labels +I-compiler-nominated
Actually parse async gen blocks correctly
1. I got the control flow in `parse_expr_bottom` messed up, and obviously forgot a test for `async gen`, so we weren't actually ever parsing it correctly.
2. I forgot to gate the span for `async gen {}`, so even if we did parse it, we wouldn't have correctly denied it in `cfg(FALSE)`.
r? eholk
Clean up variables in `search.js`
While reviewing https://github.com/rust-lang/rust/pull/118402, I saw a few small clean ups that were needed, mostly about variables creation.
r? ```@notriddle```
Add rustX check to codeblock attributes lint
We discovered this issue [here](https://github.com/rust-lang/rust/pull/118802#discussion_r1421815943).
I assume that the issue will be present in other places outside of the compiler so it's worth adding a check for it.
First commit is just a small cleanup about variables creation which was a bit strange (at least more than necessary).
r? ```@notriddle```
Fix alignment passed down to LLVM for simd_masked_load
Follow up to #117953
The alignment for a masked load operation should be that of the element/lane, not the vector as a whole
It can produce miscompilations after the LLVM optimizer notices the higher alignment and promotes this to an unmasked, aligned load followed up by blend/select - https://rust.godbolt.org/z/KEeGbevbb
Windows: Allow `File::create` to work on hidden files
This makes `OpenOptions::new().write(true).create(true).truncate(true).open(&path)` work if the path exists and is a hidden file. Previously it would fail with access denied.
This makes it consistent with `OpenOptions::new().write(true).truncate(true).open(&path)` (note the lack of `create`) which does not have this restriction. It's also more consistent with other platforms.
Fixes#115745 (see that issue for more details).
tests: CGU tests require build-pass, not check-pass (remove FIXME)
CGU tests require CGU code to be exercised. We can't merely do "cargo check" on these tests.
Part of #62277
Correctly gate the parsing of match arms without body
https://github.com/rust-lang/rust/pull/118527 accidentally allowed the following to parse on stable:
```rust
match Some(0) {
None => { foo(); }
#[cfg(FALSE)]
Some(_)
}
```
This fixes that oversight. The way I choose which error to emit is the best I could think of, I'm open if you know a better way.
r? `@petrochenkov` since you're the one who noticed