Inspiration
Our inspiration for this application came from a tangential issue. While ideating for a previous hackathon, we found it difficult to identify problematic gaps in existing solutions, especially when doing so requires domain knowledge. There are numerous resources for learning features and user journeys from the perspective of businesses selling a product, but for the end-user experience, the only readily available source is best-effort customer reviews. What if real onboarding signals could be turned into standardized, evidence-based reviews instead?

For B2B software specifically, there are many signal sources: email, Jira, Google Drive, Slack, and even software logs. We started by turning signals from the first three into auto-generated reviews, but then we needed to figure out the customer's incentive. The obvious benefit is potentially better quality of service if these enriched reviews are turned into something actionable for vendors. Beyond that, however, we realized that the same signals used to generate reviews can also provide insight into the progress of a vendor software integration. Often, there is information asymmetry between what businesses believe vendors can provide and what is ultimately delivered. This gave shape to our goal of (1) providing transparency into the vendor software integration process for the customer and (2) providing crisp visibility into customer pain points for the vendor.
See this slide deck, which highlights the value of what we've built along with a quick onboarding-flow demo: Slide deck + demo
What it does
Short Description: This is a vendor intelligence platform for (1) companies that rely on third-party software to understand how their vendor relationships are actually going and (2) vendors to understand customer pain points.
Problem it Solves: When your organization uses dozens of SaaS tools, it's hard to know which vendors are reliable, which are causing pain, and whether integrations are maturing or regressing. The signals are scattered across email threads, Jira tickets, and shared documents.
How we built it
Process
Ingests signals automatically from Gmail, Jira, and Google Drive. It watches for vendor-related communication, support tickets, incident reports, and shared documents. Then, it classifies each signal by valence (positive/negative/neutral), subject (vendor issue, vendor request, internal implementation, vendor communication), lifecycle stage (onboarding through optimization), and health category (reliability, performance, fitness for purpose).
Computes health scores for each registered software integration using a deterministic scoring engine across three dimensions (reliability, performance, and fitness for purpose), with an overall composite score and a confidence tier based on signal volume.
- Each category is scored independently using severity-weighted event counts, positive/negative signal ratios, and time-decay factors, then combined into a composite score with category-specific weights.
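The category scoring described above can be sketched roughly as follows. This is a minimal illustration, not the production engine: the severity weights, category weights, half-life, and function names are all assumptions chosen for demonstration.

```python
# Illustrative sketch of severity-weighted, time-decayed category scoring
# combined into a composite score. All constants here are assumptions.
SEVERITY_WEIGHTS = {"critical": 4.0, "high": 2.0, "medium": 1.0, "low": 0.5}
CATEGORY_WEIGHTS = {"reliability": 0.4, "performance": 0.3, "fitness": 0.3}
HALF_LIFE_DAYS = 30.0

def decay(age_days: float) -> float:
    """Time-decay factor: a signal loses half its weight every HALF_LIFE_DAYS."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def category_score(signals: list[dict]) -> float:
    """Score one category on a 0-100 scale from signals with valence, severity, and age."""
    pos = sum(decay(s["age_days"]) for s in signals if s["valence"] == "positive")
    neg = sum(SEVERITY_WEIGHTS[s["severity"]] * decay(s["age_days"])
              for s in signals if s["valence"] == "negative")
    total = pos + neg
    return 100.0 * pos / total if total else 50.0  # neutral prior when no signal

def composite_score(by_category: dict[str, list[dict]]) -> float:
    """Weighted combination of per-category scores."""
    return sum(CATEGORY_WEIGHTS[c] * category_score(s) for c, s in by_category.items())
```

A critical incident from yesterday outweighs several low-severity gripes from months ago, which matches the severity-weighted, time-decayed behavior described above.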
Tracks maturity trajectory across five lifecycle stages (onboarding → integration → stabilization → productive → optimization). For each stage, it measures five smoothness sub-metrics: friction, issue recurrence, issue escalation, issue resolution effectiveness, and support effort ratio.
- Friction measures net negative impact: severity-weighted negative signals minus a partial offset from positive signals. More severe issues (critical, high) contribute disproportionately.
- Recurrence looks at vendor-issue threads where the same problem resurfaced after resolution. Each thread's impact is time-weighted using exponential decay relative to its own update frequency. A recently recurring issue hurts more, while a long-resolved thread helps more. Threads that recurred many times before resolving get diminished credit.
- Escalation detects within-thread severity increases (e.g., medium → high → critical). The score penalizes based on the severity jump magnitude and how many threads exhibited escalation patterns.
- Resolution pairs ticket-created with ticket-resolved events to measure resolution rates and time-to-resolution. Faster, more complete resolution improves the score.
- Support effort ratio compares effort spent on the core product to peripheral effort such as SSO, billing, or access management.
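As a concrete illustration of the cadence-relative decay used by the recurrence metric above, here is a hedged sketch; the function name, half-life rule, and saturation formula are hypothetical stand-ins for the actual implementation.

```python
# Illustrative sketch of the recurrence sub-metric: each vendor-issue thread's
# impact is time-weighted with exponential decay scaled to its own update cadence.
def thread_recurrence_impact(days_since_last_update: float,
                             mean_update_interval_days: float,
                             recurrence_count: int) -> float:
    """Impact in [0, 1]: higher means the recurring issue still hurts the score.

    Decay is relative to the thread's own cadence, so a daily support thread
    and a monthly check-in are judged on their own timescales.
    """
    if recurrence_count == 0:
        return 0.0
    # Half-life proportional to the thread's typical update interval.
    half_life = max(mean_update_interval_days, 1.0)
    freshness = 0.5 ** (days_since_last_update / half_life)
    # Threads that recurred many times before resolving get diminished credit:
    # impact saturates rather than growing linearly with recurrence_count.
    saturation = 1.0 - 1.0 / (1.0 + recurrence_count)
    return freshness * saturation
```

Under this sketch, an issue that recurred a week ago in a weekly-cadence thread still carries a quarter of its full impact, while the same gap in a daily-cadence thread has decayed to near zero.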
Generates vendor reviews using LLM-powered analysis that synthesizes signal patterns into structured assessments.
Supports customer outreach where vendors can draft and send targeted messages to customers via email to proactively stay on top of customer pain points.
How to try it out
- Visit https://vendor-intel.onrender.com
- Login with the following credentials:
- Username: [email protected]
- Password: vendorintel123
- Go to the "Signals" tab. In the dropdown on the top left, select "Supabase" (an example registered software). We have simulated some dummy interactions to produce minimal data.
- You will see a "Latest Health Score" section with a summary and drill-downs to reliability, performance, and fitness-for-purpose sub-sections.
- You will see an "Integration Trajectory" section with a summary and stage-level drill-downs for "Onboarding" and "Integration". The other stages won't show up since we didn't simulate interactions for them. These drill-downs, in turn, lead to dimension-level drill-downs (friction, recurrence, etc.), where clicking on any of the bars reveals a timeline for that dimension.
- Go to the "Reviews" tab. That will show the auto-generated review of Supabase based on the simulated interactions.
- Go to the "Intelligence" tab. Since we submitted the Supabase review, you'll see an entry for Supabase. Clicking it shows information about the companies the reviews came from (just one review in this case), followed by a "Critical User Journey" chart. Those user journey phases are informed by the experiences of the end users who submitted reviews. Clicking on any bar displays companies that have experience with that phase, along with specific contacts. Hitting "Generate Outreach" creates an outreach message specific to that contact's experience with the software. This "Intelligence" tab is primarily meant for the vendor (in this example, Supabase) to get a crisp sense of pain points with their software and take a targeted approach to proactively reaching out.
- Please watch the video demo to see how interactions are simulated!
Challenges we ran into
Classification mistakes cascade. The entire scoring pipeline trusts signal classification (valence, subject, stage) done at ingest time. Misclassification, such as a positive email read as negative, silently compounds. Currently, this is mitigated using deterministic keyword matching as a fallback, and an auto-backfill mechanism reclassifies any signals that were stored without tags.
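A deterministic keyword fallback of the kind described above can be sketched as follows; the keyword lists here are illustrative placeholders, not the actual production lexicon.

```python
# Minimal sketch of a deterministic keyword-matching fallback for valence
# classification, used when LLM output is missing. Keyword sets are assumptions.
NEGATIVE_KEYWORDS = {"outage", "error", "broken", "degraded", "failed", "escalate"}
POSITIVE_KEYWORDS = {"resolved", "thanks", "works", "fixed", "great"}

def fallback_valence(text: str) -> str:
    """Classify a signal's valence by counting keyword hits on whitespace tokens."""
    words = set(text.lower().split())
    neg = len(words & NEGATIVE_KEYWORDS)
    pos = len(words & POSITIVE_KEYWORDS)
    if neg > pos:
        return "negative"
    if pos > neg:
        return "positive"
    return "neutral"
```

Because the fallback is deterministic, a misclassified signal can be reproduced and debugged, which is exactly what the LLM path lacked.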
Heuristic calibration without ground truth. Scoring relies on hand-tuned constants (severity weights, decay half-lives, stage-dependent weights) designed to feel reasonable but validated against nothing empirical. Currently, this is mitigated using a confidence tier system (limited/developing/moderate/solid) that flags when signal volume is too low to trust scores, and peer benchmarks let users compare a vendor's scores against those of other software with similar use cases, surfacing relative outliers rather than relying on absolute numbers.
Cold start and signal sparsity. A newly registered software integration has zero signals, and it can take days or weeks before enough emails, tickets, and documents accumulate to produce meaningful scores. Currently, this is mitigated using the aforementioned confidence tier system and a streaming approach that updates scores incrementally as new signals arrive rather than waiting for a scheduled batch.
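One way to combine the incremental streaming update with the confidence tiers is sketched below. The blending factor, neutral prior, and tier cutoffs are illustrative assumptions, not the real values.

```python
# Hedged sketch: each arriving signal nudges a running score instead of
# triggering a batch recompute; a confidence tier tracks signal volume.
class StreamingScore:
    def __init__(self, alpha: float = 0.2, initial: float = 50.0):
        self.alpha = alpha      # weight given to the newest signal (assumption)
        self.value = initial    # neutral prior for the cold-start case
        self.signal_count = 0

    def update(self, signal_score: float) -> float:
        """Blend a new per-signal score (0-100) into the running value."""
        self.value = (1 - self.alpha) * self.value + self.alpha * signal_score
        self.signal_count += 1
        return self.value

    @property
    def confidence_tier(self) -> str:
        """Map signal volume to the limited/developing/moderate/solid tiers (illustrative cutoffs)."""
        n = self.signal_count
        if n < 5:
            return "limited"
        if n < 20:
            return "developing"
        if n < 50:
            return "moderate"
        return "solid"
```

Starting from a neutral prior means a brand-new integration shows a middling score flagged as "limited" rather than a misleading extreme.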
Accomplishments that we're proud of
End-to-end signal pipeline from raw data to actionable scores. Emails, Jira tickets, and Drive documents enter as unstructured noise and come out as classified, scored, and visualized vendor intelligence. The entire process from OAuth connection to dashboard charts is fully automated, with no manual tagging required.
Nuanced scoring that adapts to context. Sub-metrics like recurrence use time-weighted exponential decay relative to each thread's own update cadence, meaning a high-frequency support thread and a monthly check-in are evaluated on their own terms. Stage-dependent weight matrices shift what matters most as an integration matures. For example, issue resolution speed dominates during onboarding while issue recurrence patterns matter more in production.
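The stage-dependent re-weighting described above can be sketched as a small weight matrix; the specific weights here are assumptions chosen to mirror the example (resolution dominates onboarding, recurrence dominates production).

```python
# Illustrative stage-dependent weight matrices: the same five smoothness
# sub-metrics are re-weighted per lifecycle stage. Weights are assumptions.
STAGE_WEIGHTS = {
    "onboarding": {"friction": 0.2, "recurrence": 0.1, "escalation": 0.1,
                   "resolution": 0.4, "effort": 0.2},
    "productive": {"friction": 0.2, "recurrence": 0.4, "escalation": 0.2,
                   "resolution": 0.1, "effort": 0.1},
}

def stage_smoothness(stage: str, sub_scores: dict[str, float]) -> float:
    """Weighted combination of the five smoothness sub-metrics for a stage."""
    weights = STAGE_WEIGHTS[stage]
    return sum(weights[m] * sub_scores[m] for m in weights)
```

Because each row sums to one, the composite stays on the same 0-100 scale as the sub-metrics while shifting which of them drives the result.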
Rich drill-down from aggregate scores to individual events. Every score in the dashboard traces back through sub-category breakdowns, per-stage smoothness metrics, and individual timeline events, each with LLM-generated contextual descriptions. Users can go from "reliability score dropped" all the way to the specific incident that caused it.
What we learned
Thread-level signals beat individual messages. Initially, we treated each turn of an email or Jira ticket conversation as a separate signal, which flooded the pipeline with noise from back-and-forth replies; aggregating at the thread level gave us one coherent signal per conversation instead.
Temporal context matters more than absolute counts. Simple counts ("3 incidents this month") turned out to be nearly meaningless. The same count means very different things depending on how frequently that thread gets updates, how long ago the last incident was, and what lifecycle stage the integration is in. This led to the relative-time decay model for recurrence and the stage-dependent weight matrices.
Separate what the LLM is good at from what it isn't. We started with a monolithic CrewAI crew that did classification, scoring, and summarization in one pass. It hallucinated scores, produced inconsistent numbers across runs, and was impossible to debug. Breaking the pipeline into LLM-for-language (classification, narrative generation) and code-for-math (scoring, thresholds, aggregation) made each half more reliable and made failures attributable.
What's next for Vendor Intel
Data-driven heuristic calibration: As signal volume grows, we can fit scoring constants (severity weights, decay half-lives, stage weight matrices) to observed distributions rather than hand-tuned values. With enough labeled outcomes like vendor churn, renewal decisions, or escalation to leadership, we can validate whether scores actually predict vendor health and adjust accordingly.
Utilization gap detection: Surface which features or capabilities of a vendor's software are under-utilized by analyzing signal patterns. If most signals cluster around a narrow slice of functionality, that's either a training gap or a sign the tool isn't earning its full license cost.
Broader signal ingestion: Extend beyond email, Jira, and Drive to ingest application logs, uptime monitors (e.g., Datadog, PagerDuty), Slack channels, and usage analytics. Logs capture reliability issues that never become tickets, usage data reveals adoption trends that emails don't surface, and Slack captures informal sentiment that formal channels miss.
Built With
- alembic
- fastapi
- google-gmail-oauth
- postgresql
- python
- react
- sqlalchemy
- sqlite
- tailwindcss
- tanstack
- typescript
- vite