The Problem
I am developing a search feature in PostgreSQL over a table of JSONB documents that share a standardised structure. The goal is to let clients run keyword searches across all documents.
The Code
The function is expected to accept a search query and return a table with two columns: the document data in JSONB format and a relevance score. It uses the ts_rank_cd function combined with plainto_tsquery for full-text search, because I thought this combination was suitable for ranking text search results against a natural-language query. Additionally, I have set different weights (A, B, C) for various parts of the JSONB data, prioritising certain text fields over others.
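To illustrate the weighting idea in isolation, here is a minimal sketch that ranks the same sample text twice, once labelled 'A' and once labelled 'C', against a single query. The sample text and the column names rank_a and rank_c are made up for the illustration. With the default weight array, an 'A' match should score higher than a 'C' match.

```sql
-- Illustration only: identical text, different weight labels, one query.
-- The literal text and the aliases rank_a / rank_c are hypothetical.
SELECT
    ts_rank_cd(setweight(to_tsvector('english', 'full text search'), 'A'), query) AS rank_a,
    ts_rank_cd(setweight(to_tsvector('english', 'full text search'), 'C'), query) AS rank_c
FROM plainto_tsquery('english', 'search') AS query;
```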
DROP FUNCTION IF EXISTS search_documents;
CREATE FUNCTION search_documents(search_query text)
RETURNS TABLE(
    raw_data JSONB,
    relevance real
)
AS
$$
BEGIN
    RETURN QUERY
    SELECT
        mpd.payload AS raw_data,
        ts_rank_cd(
            setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(mpd.payload, 'type1', 'about_me'), '')), 'A') ||
            setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(mpd.payload, 'type2', 'content'), '')), 'A') ||
            setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(mpd.payload, 'type3', 'title'), '')), 'A') ||
            setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(mpd.payload, 'type4', 'occupation'), '')), 'B') ||
            setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(mpd.payload, 'type4', 'summary'), '')), 'B') ||
            setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(mpd.payload, 'type4', 'experiences'), '')), 'C'),
            plainto_tsquery('english', search_query)
        ) AS relevance
    FROM
        document mpd
    WHERE
        to_tsvector('english',
            coalesce(jsonb_extract_path_text(mpd.payload, 'type1', 'about_me'), '') || ' ' ||
            coalesce(jsonb_extract_path_text(mpd.payload, 'type2', 'content'), '') || ' ' ||
            coalesce(jsonb_extract_path_text(mpd.payload, 'type3', 'title'), '') || ' ' ||
            coalesce(jsonb_extract_path_text(mpd.payload, 'type4', 'occupation'), '') || ' ' ||
            coalesce(jsonb_extract_path_text(mpd.payload, 'type4', 'summary'), '') || ' ' ||
            coalesce(jsonb_extract_path_text(mpd.payload, 'type4', 'experiences'), '')
        ) @@ plainto_tsquery('english', search_query)
    ORDER BY relevance DESC;
END;
$$
LANGUAGE plpgsql;
Self Analysis
- The function uses multiple instances of coalesce and jsonb_extract_path_text, which makes the WHERE clause and SELECT clause somewhat cluttered.
- There is no explicit error handling for potential issues like malformed JSON data or SQL execution errors.
- Extraction logic is duplicated.
- The search may need more sophisticated text analysis, or a different full-text search configuration, to yield better results.
- There is no index to support the search.
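As a minimal sketch of how the duplicated extraction could be reduced, the weighted tsvector can be built once per row in a LATERAL subquery and reused for both the match and the rank. The function name search_documents_v2 and the aliases q, v and doc_vector are hypothetical, and the sketch assumes the document table with its payload column. Note that it matches against the weighted vector rather than the concatenated text; for plainto_tsquery queries, which carry no weight restrictions, this should select the same rows.

```sql
-- Sketch only: search_documents_v2, q, v and doc_vector are hypothetical names.
CREATE OR REPLACE FUNCTION search_documents_v2(search_query text)
RETURNS TABLE(raw_data jsonb, relevance real)
LANGUAGE plpgsql
STABLE
AS $$
BEGIN
    RETURN QUERY
    SELECT
        mpd.payload,
        ts_rank_cd(v.doc_vector, q.query)
    FROM document mpd
    -- Parse the search string once instead of once per clause.
    CROSS JOIN plainto_tsquery('english', search_query) AS q(query)
    -- Build the weighted vector once per row and reuse it below.
    CROSS JOIN LATERAL (
        SELECT
            setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(mpd.payload, 'type1', 'about_me'), '')), 'A') ||
            setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(mpd.payload, 'type2', 'content'), '')), 'A') ||
            setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(mpd.payload, 'type3', 'title'), '')), 'A') ||
            setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(mpd.payload, 'type4', 'occupation'), '')), 'B') ||
            setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(mpd.payload, 'type4', 'summary'), '')), 'B') ||
            setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(mpd.payload, 'type4', 'experiences'), '')), 'C')
            AS doc_vector
    ) AS v
    WHERE v.doc_vector @@ q.query
    ORDER BY 2 DESC;  -- second output column, i.e. the rank
END;
$$;
```

It would be called the same way as the original, for example SELECT * FROM search_documents_v2('software engineer'). This only tidies the query; it still computes the vector for every row on every call.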
Here is a create statement for the document table. I have complete control over this and it can change.
DROP TABLE IF EXISTS public.document;
CREATE TABLE IF NOT EXISTS public.document
(
    document_id uuid NOT NULL,
    payload jsonb NOT NULL,
    created_on timestamp without time zone NOT NULL,
    CONSTRAINT document_pkey PRIMARY KEY (document_id)
);
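Since the table can change, one possible variant is sketched below: it stores the weighted tsvector in a generated column and adds a GIN index, so the search can use the index instead of recomputing the vector per row. This assumes PostgreSQL 12 or later (for generated columns). The column name search_vector and the index name document_search_vector_idx are hypothetical. Like the script above, it drops any existing table first.

```sql
-- Sketch only: search_vector and document_search_vector_idx are hypothetical names.
-- As in the original script, this discards any existing rows.
DROP TABLE IF EXISTS public.document;
CREATE TABLE public.document
(
    document_id uuid NOT NULL,
    payload jsonb NOT NULL,
    created_on timestamp without time zone NOT NULL,
    -- Weighted vector maintained by PostgreSQL whenever payload changes.
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(payload, 'type1', 'about_me'), '')), 'A') ||
        setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(payload, 'type2', 'content'), '')), 'A') ||
        setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(payload, 'type3', 'title'), '')), 'A') ||
        setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(payload, 'type4', 'occupation'), '')), 'B') ||
        setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(payload, 'type4', 'summary'), '')), 'B') ||
        setweight(to_tsvector('english', coalesce(jsonb_extract_path_text(payload, 'type4', 'experiences'), '')), 'C')
    ) STORED,
    CONSTRAINT document_pkey PRIMARY KEY (document_id)
);

-- GIN index so the @@ match does not need a full scan.
CREATE INDEX document_search_vector_idx
    ON public.document USING gin (search_vector);

-- Example query against the stored vector (the search string is a placeholder).
SELECT d.payload AS raw_data,
       ts_rank_cd(d.search_vector, query) AS relevance
FROM public.document d,
     plainto_tsquery('english', 'software engineer') AS query
WHERE d.search_vector @@ query
ORDER BY relevance DESC;
```

The trade-off is extra storage and some write overhead in exchange for cheaper reads; whether that pays off depends on the table size and the update rate.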