Commit graph

23 commits

Author SHA1 Message Date
Michael Howell
5973005d93 rustdoc-search: use correct type annotations 2024-10-30 10:35:38 -07:00
Michael Howell
3699e939e8 rustdoc-search: allow trailing Foo -> arg search 2024-09-05 17:58:05 -07:00
Sunshine
dd5103bb68 Add test for PR #126057 2024-06-07 05:49:46 +08:00
Michael Howell
226c77d0a4 rustdoc: add re-export info to RawSearchIndex type def 2024-04-08 17:07:20 -07:00
Michael Howell
1ab60f2a64 rustdoc-search: fix inaccurate type descriptions 2023-12-30 22:53:52 -07:00
Michael Howell
9a9695a052 rustdoc-search: use set ops for ranking and filtering
This commit adds ranking and quick filtering to type-based search,
improving performance and having it order results based on their
type signatures.

Motivation
----------

If I write a query like `str -> String`, a lot of functions come up.
That's to be expected, but `String::from_str` should come up on top, and
it doesn't right now. This is because the sorting algorithm is based
on the functions name, and doesn't consider the type signature at all.
`slice::join` even comes up above it!

To fix this, the sorting should take into account the function's
signature, and the closer match should come up on top.

Guide-level description
-----------------------

When searching by type signature, types with a "closer" match will
show up above types that match less precisely.

Reference-level explanation
---------------------------

Functions signature search works in three major phases:

* A compact "fingerprint," based on the [bloom filter] technique, is used to
  check for matches and to estimate the distance. It sometimes has false
  positive matches, but it also operates on 128 bit contiguous memory and
  requires no backtracking, so it performs a lot better than real
  unification.

  The fingerprint represents the set of items in the type signature, but it
  does not represent nesting, and it ignores when the same item appears more
  than once.

  The result is rejected if any query bits are absent in the function, or
  if the distance is higher than the current maximum and 200
  results have already been found.

* The second step performs unification. This is where nesting and true bag
  semantics are taken into account, and it has no false positives. It uses a
  recursive, backtracking algorithm.

  The result is rejected if any query elements are absent in the function.

[bloom filter]: https://en.wikipedia.org/wiki/Bloom_filter

Drawbacks
---------

This makes the code bigger.

More than that, this design is a subtle trade-off. It makes the cases I've
tested against measurably faster, but it's not clear how well this extends
to other crates with potentially more functions and fewer types.

The more complex things get, the more important it is to gather a good set
of data to test with (this is arguably more important than the actual
benchmarking ifrastructure right now).

Rationale and alternatives
--------------------------

Throwing a bloom filter in front makes it faster.

More than that, it tries to take a tactic where the system can not only check
for potential matches, but also gets an accurate distance function without
needing to do unification. That way it can skip unification even on items
that have the needed elems, as long as they have more items than the
currently found maximum.

If I didn't want to be able to cheaply do set operations on the fingerprint,
a [cuckoo filter] is supposed to have better performance.
But the nice bit-banging set intersection doesn't work AFAIK.

I also looked into [minhashing], but since it's actually an unbiased
estimate of the similarity coefficient, I'm not sure how it could be used
to skip unification (I wouldn't know if the estimate was too low or
too high).

This function actually uses the number of distinct items as its
"distance function."
This should give the same results that it would have gotten from a Jaccard
Distance $1-\frac{|F\cap{}Q|}{|F\cup{}Q|}$, while being cheaper to compute.
This is because:

* The function $F$ must be a superset of the query $Q$, so their union is
  just $F$ and the intersection is $Q$ and it can be reduced to
  $1-\frac{|Q|}{|F|}.

* There are no magic thresholds. These values are only being used to
  compare against each other while sorting (and, if 200 results are found,
  to compare with the maximum match). This means we only care if one value
  is bigger than the other, not what it's actual value is, and since $Q$ is
  the same for everything, it can be safely left out, reducing the formula
  to $1-\frac{1}{|F|} = \frac{|F|}{|F|}-\frac{1}{|F|} = |F|-1$. And, since
  the values are only being compared with each other, $|F|$ is fine.

Prior art
---------

This is significantly different from how Hoogle does it.
It doesn't account for order, and it has no special account for nesting,
though `Box<t>` is still two items, while `t` is only one.

This should give the same results that it would have gotten from a Jaccard
Distance $1-\frac{|A\cap{}B|}{|A\cup{}B|}$, while being cheaper to compute.

Unresolved questions
--------------------

`[]` and `()`, the slice/array and tuple/union operators, are ignored while
building the signature for the query. This is because they match more than
one thing, making them ambiguous. Unfortunately, this also makes them
a performance cliff. Is this likely to be a problem?

Right now, the system just stashes the type distance into the
same field that levenshtein distance normally goes in. This means exact
query matches show up on top (for example, if you have a function like
`fn nothing(a: Nothing, b: i32)`, then searching for `nothing` will show it
on top even if there's another function with `fn bar(x: Nothing)` that's
technically a closer match in type signature.

Future possibilities
--------------------

It should be possible to adopt more sorting criteria to act as a tie breaker,
which could be determined during unification.

[cuckoo filter]: https://en.wikipedia.org/wiki/Cuckoo_filter
[minhashing]: https://en.wikipedia.org/wiki/MinHash
2023-12-13 10:37:15 -07:00
Michael Howell
63c50712f4 rustdoc-search: add support for associated types 2023-11-19 18:54:36 -07:00
Michael Howell
89a4c7f552 rustdoc: bug fix for -> option<t> 2023-09-03 13:06:07 -07:00
Michael Howell
0b3c617ec0 rustdoc-search: add support for type parameters
When writing a type-driven search query in rustdoc, specifically one
with more than one query element, non-existent types become generic
parameters instead of auto-correcting (which is currently only done
for single-element queries) or giving no result. You can also force a
generic type parameter by writing `generic:T` (and can force it to not
use a generic type parameter with something like `struct:T` or whatever,
though if this happens it means the thing you're looking for doesn't
exist and will give you no results).

There is no syntax provided for specifying type constraints
for generic type parameters.

When you have a generic type parameter in a search query, it will only
match up with generic type parameters in the actual function, not
concrete types that match, not concrete types that implement a trait.
It also strictly matches based on when they're the same or different,
so `option<T>, option<U> -> option<U>` matches `Option::and`, but not
`Option::or`. Similarly, `option<T>, option<T> -> option<T>`` matches
`Option::or`, but not `Option::and`.
2023-09-03 13:06:06 -07:00
Michael Howell
217fe24e52 rustdoc-search: null, not -1, for missing id
This allows us to use negative numbers for others purposes.
2023-09-03 11:20:22 -07:00
Michael Howell
9946d67579 rustdoc-search: build args, return, and generics on one unifier
This enhances generics with the "unboxing" behavior where A<T>
matches T. It makes this unboxing transitive over generics.
2023-06-11 18:19:37 -07:00
Michael Howell
4c11822aeb rustdoc: restructure type search engine to pick-and-use IDs
This change makes it so, instead of mixing string distance with
type unification, function signature search works by
mapping names to IDs at the start, reporting to the user any
cases where it had to make corrections, and then matches with
IDs when going through the items.

This only changes function searches. Name searches are left alone,
and corrections are only done when there's a single item in the
search query.
2023-04-17 12:16:54 -07:00
Michael Howell
53f499d475 rustdoc-search: use ES6 Map for Result instead of Object 2023-04-13 17:40:06 -07:00
Michael Howell
33cf9ea4a2 Add comments, fixes for 0 sentinel 2022-06-27 14:15:14 -07:00
Michael Howell
b54e3e6c98
Update src/librustdoc/html/static/js/externs.js
Co-authored-by: Guillaume Gomez <guillaume1.gomez@gmail.com>
2022-06-27 12:07:13 -07:00
Michael Howell
8081096a7f Add documentation 2022-06-27 11:07:16 -07:00
Folyd
a8ede1248d Change eslint rules from configuration comments to configuration files 2022-05-07 11:47:30 +08:00
Guillaume Gomez
cb8da88c83 Migrate externs.js to ES6 2022-04-26 20:59:32 +02:00
Guillaume Gomez
699ae365df Apply suggestions:
* Forbid generics without a path (so "<p>" is forbidden).
 * Change `handleSingleArg` so that it takes `results_others`, `results_in_args` and `results_returned` as arguments instead of using the "global" variables.
 * Change `createQueryElement` so that it returns the newly created element instead of taking `elems` as argument.
 * Improve documentation
2022-04-18 20:59:09 +02:00
Guillaume Gomez
264064df36 * Greatly improve the rustdoc search parser source code
* Move all functions outside parseQuery
2022-04-18 20:59:08 +02:00
Guillaume Gomez
bbcf1762dd Improve naming of "val" field 2022-04-18 20:59:08 +02:00
Guillaume Gomez
be41750a10 Greatly improve rustdoc search 2022-04-18 20:59:08 +02:00
Jacob Hoffman-Andrews
7ba086c6db Add some JSDoc comments to rustdoc JS
This follows the Closure Compiler dialect of JSDoc, so we
can use it to do some basic type checking. We don't plan to
compile with Closure Compiler, just use it to check types. See
https://github.com/google/closure-compiler/wiki/ for details.
2021-12-22 14:20:04 -08:00