Filtering
There are a few ways to index nearest neighbor queries with a WHERE clause.
SELECT * FROM items WHERE category_id = 123 ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
A good place to start is creating an index on the filter column. This can provide fast, exact nearest neighbor search in many cases. Postgres has a number of index types for this: B-tree (default), hash, GiST, SP-GiST, GIN, and BRIN.
CREATE INDEX ON items (category_id);
For multiple columns, consider a multicolumn index.
CREATE INDEX ON items (location_id, category_id);
Exact indexes work well for conditions that match a low percentage of rows. Otherwise, approximate indexes can work better.
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
With approximate indexes, filtering is applied after the index is scanned. If a condition matches 10% of rows, with HNSW and the default hnsw.ef_search of 40, only 4 rows will match on average. For more rows, increase hnsw.ef_search.
SET hnsw.ef_search = 200;
Starting with 0.8.0, you can enable iterative index scans, which will automatically scan more of the index when needed.
SET hnsw.iterative_scan = strict_order;
If filtering by only a few distinct values, consider partial indexing.
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops) WHERE (category_id = 123);
If filtering by many different values, consider partitioning.
CREATE TABLE items (embedding vector(3), category_id int) PARTITION BY LIST(category_id);