Troubleshooting
Why isn’t a query using an index?
The query needs to have an ORDER BY and LIMIT, and the ORDER BY must be a distance operator applied directly to the column (not wrapped in a larger expression), in ascending order.
-- index
ORDER BY embedding <=> '[3,1,2]' LIMIT 5;
-- no index
ORDER BY 1 - (embedding <=> '[3,1,2]') DESC LIMIT 5;
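If a query needs a similarity score rather than a distance, one option is to compute the score in the SELECT list and still order by the bare distance operator. A minimal sketch, assuming the items table and embedding column used elsewhere in this section, a cosine distance index on that column, and a hypothetical id column:

```sql
-- Order by the raw distance (ascending) so the index remains usable,
-- and derive the similarity in the SELECT list instead.
-- "id" is an assumed column name, used only for illustration.
SELECT id, 1 - (embedding <=> '[3,1,2]') AS cosine_similarity
FROM items
ORDER BY embedding <=> '[3,1,2]'
LIMIT 5;
```

The rows come back in the same order as the DESC variant would return them, since the smallest distance corresponds to the largest similarity.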
You can encourage the planner to use an index for a query with:
BEGIN;
SET LOCAL enable_seqscan = off;
SELECT ...
COMMIT;
Also, if the table is small, a table scan may be faster.
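To check which plan the planner actually picked, EXPLAIN can be run on the query. A minimal sketch, assuming the items table and embedding column used elsewhere in this section:

```sql
-- ANALYZE executes the query and reports real timings,
-- so run it against data where that is acceptable.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM items
ORDER BY embedding <=> '[3,1,2]'
LIMIT 5;
```

An Index Scan node that names the vector index means the index is in use. A Sort on top of a Seq Scan means the planner chose a table scan, and comparing the timings with and without enable_seqscan = off shows whether that choice was actually the faster one.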
Why isn’t a query using a parallel table scan?
The planner doesn’t consider out-of-line storage in cost estimates, which can make a serial scan look cheaper. You can reduce the cost of a parallel scan for a query with:
BEGIN;
SET LOCAL min_parallel_table_scan_size = 1;
SET LOCAL parallel_setup_cost = 1;
SELECT ...
COMMIT;
or choose to store vectors inline:
ALTER TABLE items ALTER COLUMN embedding SET STORAGE PLAIN;
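To see which storage strategy the column currently uses, the system catalog can be queried. A minimal sketch, assuming the same items table and embedding column:

```sql
-- attstorage codes: 'p' = PLAIN, 'e' = EXTERNAL, 'm' = MAIN, 'x' = EXTENDED
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'items'::regclass
  AND attname = 'embedding';
```

Note that in PostgreSQL, SET STORAGE only changes the strategy for rows written afterwards. Existing rows keep their current storage until they are updated or the table is rewritten.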
Why are there fewer results for a query after adding an HNSW index?
Results are limited by the size of the dynamic candidate list (hnsw.ef_search), which is 40 by default. There may be even fewer results due to dead tuples or filtering conditions in the query. Enabling iterative index scans can help address this.
Also, note that NULL vectors are not indexed (as well as zero vectors for cosine distance).
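Both remedies can be applied per transaction. A minimal sketch, assuming a pgvector version that provides the hnsw.iterative_scan setting and the items table used elsewhere in this section (the value 100 is illustrative, not a recommendation):

```sql
BEGIN;
-- Raise the candidate list so it is at least as large as the LIMIT.
SET LOCAL hnsw.ef_search = 100;
-- Let the scan keep going until enough rows are found,
-- while preserving exact distance ordering.
SET LOCAL hnsw.iterative_scan = strict_order;
SELECT * FROM items
ORDER BY embedding <=> '[3,1,2]'
LIMIT 50;
COMMIT;
```

A larger hnsw.ef_search generally trades query speed for recall, so it is worth measuring both before settling on a value.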
Why are there fewer results for a query after adding an IVFFlat index?
The index was likely created with too little data for the number of lists. Drop the index and recreate it once the table has more data.
DROP INDEX index_name;
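Once the table has grown, the index can be rebuilt with a list count sized to the data. A rough sketch, assuming the items table and embedding column used elsewhere in this section, cosine distance, and an illustrative lists value chosen for a table of roughly 100,000 rows:

```sql
-- Check how many rows are available before choosing the number of lists.
SELECT count(*) FROM items;

-- lists = 100 and vector_cosine_ops are assumptions for illustration;
-- pick the operator class that matches the distance operator in your queries.
CREATE INDEX ON items USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```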
Results can also be limited by the number of probes (ivfflat.probes). Enabling iterative index scans can address this.
Also, note that NULL vectors are not indexed (as well as zero vectors for cosine distance).
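The probe count and iterative scans can both be set per transaction. A minimal sketch, assuming a pgvector version that provides the ivfflat.iterative_scan setting and the items table used elsewhere in this section (the value 10 is illustrative):

```sql
BEGIN;
-- Search more lists per query; higher values favor recall over speed.
SET LOCAL ivfflat.probes = 10;
-- Keep scanning additional lists until enough rows are found.
-- With relaxed ordering, results may be slightly out of distance order.
SET LOCAL ivfflat.iterative_scan = relaxed_order;
SELECT * FROM items
ORDER BY embedding <=> '[3,1,2]'
LIMIT 50;
COMMIT;
```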