to_tsquery() validation

Question

I'm developing a website that searches a PostgreSQL database. The search uses to_tsquery(), and I'm trying to find a way to validate the input before it is sent as a query.

I'm also trying to add phrase support, so that a search for HELLO | "I LIKE CATS" only finds results containing "hello" or the entire phrase "i like cats" (as opposed to I & LIKE & CATS, which finds articles that contain all three words regardless of where they appear).

Wasn't there an answer here earlier? Odd.
– Asaf, commented Jul 19, 2011 at 13:39

Related: stackoverflow.com/questions/16020164/…
– Ciro Santilli OurBigBook.com, commented Feb 13, 2025 at 16:45

Edmund · Accepted Answer · 2012-04-13 05:01:03Z

Is there some reason why it's too expensive to let the DB server validate it? It seems excessive to duplicate the tsquery parsing algorithm in the client.

If the concern is that you don't want to run the whole query (which presumably involves table access) each time you validate, you could pass the input to a much smaller query. In pseudocode (which may look a bit like Python, but that's just coincidence):

is_valid_query(input):
    try:
        execute("SELECT to_tsquery($1)", input)
        return True
    except DatabaseError:
        return False
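A runnable version of the pseudocode above might look like the following sketch. It assumes a DB-API 2.0 connection that uses the %s parameter style (as psycopg2 does). The names is_valid_tsquery, conn, and db_error are illustrative and not part of any library.

```python
def is_valid_tsquery(conn, text, db_error=Exception):
    """Return True if the server accepts `text` as to_tsquery() input.

    conn     -- an open DB-API 2.0 connection (assumed, e.g. from psycopg2)
    db_error -- exception class the driver raises on SQL errors
                (assumed: psycopg2.Error when using psycopg2)
    """
    cur = conn.cursor()
    try:
        # The input is bound as a parameter, never spliced into the SQL text.
        cur.execute("SELECT to_tsquery(%s)", (text,))
        return True
    except db_error:
        return False
    finally:
        cur.close()
        # Leave the connection clean whether or not the statement failed.
        conn.rollback()
```

With psycopg2 this would be called as is_valid_tsquery(conn, user_input, db_error=psycopg2.Error). The rollback matters because a failed statement otherwise leaves the open transaction in an aborted state.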

With regard to phrasing, it's probably easiest to run the non-phrased query first (so the indexes are used), then filter those results down to the ones that contain the phrase. The filtering could be done server side or client side. Depending on the language being parsed, it might be easiest to construct a simple regex for the phrase that tolerates repeated whitespace or other ignorable symbols.

Search for to_tsquery('HELLO|(I&LIKE&CATS)'), getting back a list of documents which loosely match.
In the client, filter that to those matching the regex "HELLO|(I\s+LIKE\s+CATS)".

The downside is you do need some additional code for translating your query into the appropriate looser query, and then for translating it into a regex.
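As a rough idea of how small that translation code can be, here is a sketch that handles only a deliberately simplified grammar: alternatives separated by |, each either a single word or a quoted phrase. That grammar and all function names are assumptions for illustration. Anything that is not a word character is discarded, so quotes and tsquery operators typed by the user never reach the generated query.

```python
import re

def split_alternatives(user_query):
    """Split 'a | "b c"' into [['a'], ['b', 'c']] (simplified grammar)."""
    alternatives = []
    for part in user_query.split("|"):
        words = re.findall(r"\w+", part)  # drops quotes and punctuation
        if words:
            alternatives.append(words)
    if not alternatives:
        raise ValueError("query contains no searchable words")
    return alternatives

def to_loose_tsquery(alternatives):
    """Build the looser to_tsquery() input: phrases become AND groups."""
    return " | ".join(
        words[0] if len(words) == 1 else "(" + " & ".join(words) + ")"
        for words in alternatives
    )

def to_phrase_regex(alternatives):
    """Build a case-insensitive regex that requires phrase words in order."""
    patterns = [r"\s+".join(re.escape(w) for w in words) for words in alternatives]
    return re.compile(r"\b(?:" + "|".join(patterns) + r")\b", re.IGNORECASE)

alts = split_alternatives('HELLO | "I LIKE CATS"')
loose = to_loose_tsquery(alts)    # pass this to to_tsquery() as a parameter
phrase_rx = to_phrase_regex(alts) # apply to each returned document
```

One caveat: to_tsquery() normalizes words (for example by stemming, depending on the text search configuration), while the regex compares literal text, so the client-side filter can reject documents that the database matched through a different word form.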

Finally, there might be a technique in PostgreSQL for proper phrase searching using the lexeme positions stored in tsvectors. I'd guess that phrase search is one of their intended uses, but I couldn't find an example in my cursory search. There is at least a section on it near the bottom of http://linuxgazette.net/164/sephton.html.
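PostgreSQL 9.6 and later do expose this directly: tsquery gained a <-> ("followed by") operator, and phraseto_tsquery() builds such a query from plain text. On those versions the phrase can be expressed in the query itself. The sketch below assumes a simplified input grammar (alternatives separated by |, each a single word or a quoted phrase), and the function name is illustrative.

```python
import re

def to_phrase_tsquery(user_query):
    """Turn 'HELLO | "I LIKE CATS"' into to_tsquery() input using <->.

    Assumes PostgreSQL 9.6+ on the server; only | and quoted phrases
    are understood, everything else that is not a word is dropped.
    """
    alternatives = []
    for part in user_query.split("|"):
        words = re.findall(r"\w+", part)
        if len(words) == 1:
            alternatives.append(words[0])
        elif words:
            # Adjacent words must appear next to each other, in this order.
            alternatives.append("(" + " <-> ".join(words) + ")")
    return " | ".join(alternatives)

query_text = to_phrase_tsquery('HELLO | "I LIKE CATS"')
```

The resulting string would still be passed to to_tsquery() as a bound parameter, so the server remains the final judge of whether it is valid.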
