I'm currently developing a website that allows searching a PostgreSQL database. The search uses to_tsquery(), and I'm trying to find a way to validate the input before it is sent as a query.

Beyond that, I'm also trying to add phrase search, so that a search for HELLO | "I LIKE CATS" only finds results containing "hello" or the entire phrase "i like cats" (as opposed to I & LIKE & CATS, which finds articles containing all three words regardless of where they appear).

1 Answer

Is there some reason why it's too expensive to let the DB server validate it? It does seem a bit excessive to duplicate the tsquery parsing algorithm in the client.

If the concern is that you don't want to run the whole query (which presumably involves table access) each time you validate, you could pass the input to a much smaller query that only parses it. In pseudocode (which may look a bit like Python, but that's just coincidence):

is_valid_query(input):
    try:
        # to_tsquery() only parses the text here; no table is touched
        execute("SELECT to_tsquery($1)", input)
        return True
    except DatabaseError:
        return False
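
For anyone who wants something closer to runnable, here is a minimal Python sketch of the same idea. It assumes a DB-API 2.0 connection to PostgreSQL that uses the %s parameter style (psycopg2 does). The function name and the db_error parameter are my own invention for illustration; in real code you would pass your driver's base exception class, for example psycopg2.Error.

```python
def is_valid_tsquery(conn, text, db_error=Exception):
    """Return True if PostgreSQL can parse `text` with to_tsquery().

    Assumptions: `conn` is a DB-API 2.0 connection to PostgreSQL and
    `db_error` is the driver's base exception class.
    """
    cur = conn.cursor()
    try:
        # Only the tsquery parser runs here; no table is accessed.
        cur.execute("SELECT to_tsquery(%s)", (text,))
        cur.fetchone()
        return True
    except db_error:
        # A failed statement aborts the open transaction in PostgreSQL,
        # so roll back before the connection is used again.
        conn.rollback()
        return False
    finally:
        cur.close()
```

Note that the rollback discards any uncommitted work on that connection, so it is probably safest to run the check on a dedicated connection or outside a transaction you care about.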

With regard to phrasing, it's probably easiest to search by the non-phrased query first (using indexes), then filter those for having the phrase. That could be done server side or client side. Depending on the language being parsed, it might be easiest to construct a simple regex of the phrase that deals with repeated whitespace or other ignorable symbols.

  1. Search for to_tsquery('HELLO|(I&LIKE&CATS)'), getting back a list of documents which loosely match.
  2. In the client, filter that to those matching the regex "HELLO|(I\s+LIKE\s+CATS)".

The downside is you do need some additional code for translating your query into the appropriate looser query, and then for translating it into a regex.
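
To give an idea of how much code that translation is, here is a rough Python sketch. It is deliberately limited: it assumes the user query is a list of terms separated by |, where each term is a bare word or a "quoted phrase", and it rejects anything else. All function names are hypothetical.

```python
import re

# A term is either a "quoted phrase" or a single bare word.
TERM = re.compile(r'"([\w\s]+)"|(\w+)')

def parse_terms(user_query):
    """Split an OR-only query such as 'HELLO | "I LIKE CATS"' into word lists."""
    terms = []
    for chunk in user_query.split("|"):
        match = TERM.fullmatch(chunk.strip())
        if match is None:
            raise ValueError(f"unsupported term: {chunk!r}")
        phrase, word = match.groups()
        words = phrase.split() if phrase else [word]
        if not words:
            raise ValueError("empty phrase")
        terms.append(words)
    return terms

def loose_tsquery(terms):
    """Build the looser to_tsquery() text: each phrase becomes an AND group."""
    return "|".join(
        words[0] if len(words) == 1 else "(" + "&".join(words) + ")"
        for words in terms
    )

def phrase_regex(terms):
    """Build a case-insensitive regex requiring phrase words to be adjacent."""
    alternatives = [r"\s+".join(re.escape(w) for w in words) for words in terms]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)

terms = parse_terms('HELLO | "I LIKE CATS"')
query_text = loose_tsquery(terms)   # "HELLO|(I&LIKE&CATS)"
pattern = phrase_regex(terms)

# Step 2: keep only the candidate documents that really contain the phrase.
candidates = ["I   like cats a lot", "Cats like what I like"]
matches = [doc for doc in candidates if pattern.search(doc)]
```

One caveat with this approach: the regex compares literal words, whereas to_tsquery() compares normalized lexemes, so a document that the index matched through stemming (say "cat" for "cats") would be dropped by the filter.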

Finally, there might be a technique in PostgreSQL to do proper phrase searching using the lexeme positions that are stored in tsvector values. I'm guessing that phrase searches are one of the intended uses, but I couldn't find an example of it in my cursory search. There's a section on it near the bottom of http://linuxgazette.net/164/sephton.html at least.
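
On that last point: PostgreSQL 9.6 and later accept a "followed by" operator, <->, inside to_tsquery() text, which uses exactly those stored positions (there is also phraseto_tsquery() for plain phrases). Assuming such a server version, the looser AND query and the client-side regex could be replaced by a phrase-aware query string. A small sketch, with a hypothetical function name and the terms given as lists of words:

```python
def phrase_tsquery(terms):
    """Join OR-ed terms; words within one term must be adjacent (<->)."""
    return "|".join(
        words[0] if len(words) == 1 else "(" + "<->".join(words) + ")"
        for words in terms
    )

# HELLO | "I LIKE CATS" expressed as a list of word lists.
query_text = phrase_tsquery([["HELLO"], ["I", "LIKE", "CATS"]])
# query_text is now "HELLO|(I<->LIKE<->CATS)", to be passed to to_tsquery()
```

How stop words such as "I" are treated depends on the text search configuration in use, so it is worth trying this against real data before relying on it.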
