Evaluated approaches - I evaluated three options: rule-based detection, headless browser rendering, and external technology detection services.
What I chose and why - I chose rule-based detection + HTTP signals (HEAD and CSP) because it is faster and provides clear evidence for each detection.
How I built the rules - I extracted most technology rules from manual inspection of website HTML, and I built CSP domain mappings by exploring linked resources and headers in Postman.
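The CSP-based part of the approach can be sketched as follows. This is a minimal illustration, not the actual implementation: `CSP_DOMAIN_MAP` is a hypothetical example table with only three entries, and the parsing deliberately ignores wildcard sources.

```python
# Hypothetical example mapping of CSP-listed domains to technologies;
# the real mapping would be built from header exploration (e.g. in Postman).
CSP_DOMAIN_MAP = {
    "www.google-analytics.com": "Google Analytics",
    "js.stripe.com": "Stripe",
    "cdn.shopify.com": "Shopify",
}

def technologies_from_csp(csp_header: str) -> set[str]:
    """Extract technology names from domains allowed by a CSP header."""
    found = set()
    for directive in csp_header.split(";"):
        tokens = directive.split()
        for token in tokens[1:]:  # skip the directive name (e.g. script-src)
            host = token.removeprefix("https://").removeprefix("http://")
            if host in CSP_DOMAIN_MAP:
                found.add(CSP_DOMAIN_MAP[host])
    return found
```

The appeal of this signal is that a site must explicitly allow a vendor's domain in its CSP, which is stronger evidence than a substring in HTML.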
What I did not choose for now - I did not use a headless browser for all domains because the current approach prioritizes simpler execution and more predictable processing for large batches.
Accepted trade-offs - I gained speed and operational simplicity, but accuracy may drop on sites that load technologies only dynamically at runtime, since their signals never appear in the static HTML or headers.
Main issues in the current implementation
False positives from simple pattern matching - The current implementation uses substring matching, which can flag a technology when its pattern appears in unrelated content (for example, a technology name mentioned in ordinary page text). I would reduce this with stricter rules and multi-signal validation.
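One way to implement multi-signal validation is to require that a minimum number of a rule's patterns match before reporting the technology. The rule format and the WordPress patterns below are illustrative assumptions, not the project's actual rule set.

```python
import re

# Hypothetical rule format: a technology is reported only when at least
# `min_signals` of its patterns match, reducing single-substring false positives.
RULES = {
    "WordPress": {
        "min_signals": 2,
        "patterns": [
            r"wp-content/",
            r"wp-includes/",
            r'<meta name="generator" content="WordPress',
        ],
    },
}

def detect(html: str, rules: dict = RULES) -> list[str]:
    """Return technologies whose required number of patterns match the HTML."""
    hits = []
    for tech, rule in rules.items():
        matched = sum(1 for p in rule["patterns"] if re.search(p, html))
        if matched >= rule["min_signals"]:
            hits.append(tech)
    return hits
```

A page that merely mentions "wp-content" in prose would match one pattern and be rejected, while a real WordPress page typically matches two or more.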
No deduplication between detection sources - The same technology can be detected from rules, CSP, and HEAD, resulting in duplicate entries. I would add a deduplication step and aggregate evidence into a single object per technology.
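The deduplication step I have in mind could look like this sketch, where each raw detection is a (technology, source, evidence) tuple and the output is one aggregated record per technology. The tuple shape is an assumption about the intermediate format.

```python
from collections import defaultdict

def deduplicate(detections):
    """Merge raw (technology, source, evidence) tuples into one record per technology."""
    merged = defaultdict(lambda: {"sources": set(), "evidence": []})
    for tech, source, evidence in detections:
        merged[tech]["sources"].add(source)
        merged[tech]["evidence"].append(evidence)
    return dict(merged)
```

Aggregating sources this way also feeds naturally into confidence scoring, since the number and kind of distinct sources per technology is preserved.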
Overly simple confidence score - Current confidence is based only on the number of signals. I would switch to weighted scoring, where strong signals (specific cookie/header) weigh more than generic signals.
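A weighted scheme could combine signals as if they were independent probabilities, so one strong signal outweighs several weak ones. The weight values below are illustrative placeholders, not tuned numbers.

```python
# Hypothetical weights: specific signals (a vendor cookie or header) count
# far more than a generic HTML substring match.
SIGNAL_WEIGHTS = {"cookie": 0.9, "header": 0.8, "csp": 0.6, "html_substring": 0.3}

def confidence(signals: list[str]) -> float:
    """Combine signal weights as independent evidence: 1 - prod(1 - w)."""
    remaining_doubt = 1.0
    for s in signals:
        remaining_doubt *= 1.0 - SIGNAL_WEIGHTS.get(s, 0.1)  # 0.1 for unknown signals
    return round(1.0 - remaining_doubt, 3)
```

Under this formula a single cookie signal (0.9) already beats three generic substring matches (1 - 0.7³ ≈ 0.657), which matches the intent that specific signals dominate.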
Limited performance on large batches - Sequential execution becomes slow at scale. I would optimize with controlled concurrency (async/thread pool), retry logic, and caching for repeated requests.
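The concurrency and retry part could be sketched as below. `fetch` stands in for the real HTTP call (for example, `requests.get` with a timeout); the retry counts and backoff values are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_with_retry(fetch, domain: str, retries: int = 3, backoff: float = 0.5):
    """Call `fetch(domain)`, retrying transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            return fetch(domain)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)

def scan_all(fetch, domains: list[str], max_workers: int = 10) -> dict:
    """Scan domains with bounded concurrency instead of sequentially."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda d: fetch_with_retry(fetch, d), domains)
        return dict(zip(domains, results))
```

Caching repeated requests could then be layered on top, for example by memoizing `fetch` keyed on domain, so re-runs over overlapping batches skip network calls.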
How I will discover new technologies in the future
Analysis of unknown domains in results - I will inspect domains with low confidence or no detections and extract new patterns from their scripts, cookies, and headers.
Automatic rule suggestion - I will build a step that proposes candidate rules from signals that recur across many domains (for example, cookie prefixes, JS endpoints, or global variables).
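For the cookie-prefix case, the rule-suggestion step could be sketched like this. The function name, thresholds, and prefix length are hypothetical choices for illustration.

```python
from collections import Counter

def suggest_cookie_rules(cookies_by_domain: dict, min_domains: int = 3,
                         prefix_len: int = 3) -> list[str]:
    """Propose cookie-name prefixes that recur across at least `min_domains` domains."""
    counts = Counter()
    for cookies in cookies_by_domain.values():
        # Count each prefix at most once per domain to avoid skew from busy sites.
        for prefix in {name[:prefix_len] for name in cookies}:
            counts[prefix] += 1
    return [p for p, n in counts.items() if n >= min_domains]
```

Candidates produced this way would still need manual review before being promoted into real rules, since common prefixes can also come from generic frameworks.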
Using rule sets from open-source projects - I will reuse rule sets from open-source projects (for example, Wappalyzer) and adapt the relevant ones into my own rules.