Version:

This documentation is sourced from a third-party project and is not maintained by pgEdge.

Filtering

There are a few ways to index nearest neighbor queries with a WHERE clause.

SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

A good place to start is creating an index on the filter column. This can provide fast, exact nearest neighbor search in many cases. Postgres has a number of index types for this: B-tree (default), hash, GiST, SP-GiST, GIN, and BRIN.

CREATE INDEX ON items (category_id);

For multiple columns, consider a multicolumn index.

CREATE INDEX ON items (location_id, category_id);

Exact indexes work well for conditions that match a low percentage of rows. Otherwise, approximate indexes can work better.

CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

With approximate indexes, filtering is applied after the index is scanned. If a condition matches 10% of rows, with HNSW and the default hnsw.ef_search of 40, only 4 rows will match on average. For more rows, increase hnsw.ef_search.

SET hnsw.ef_search = 200;

Starting with 0.8.0, you can enable iterative index scans, which will automatically scan more of the index when needed.

SET hnsw.iterative_scan = strict_order;

If filtering by only a few distinct values, consider partial indexing.

CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WHERE (category_id = 123);

If filtering by many different values, consider partitioning.

CREATE TABLE items (embedding vector(3), category_id int) PARTITION BY LIST(category_id);