Problem

The market for AI chatbots and agents is expected to double in the next three years as businesses race to automate tasks with them. However, today's conversational AI can be a liability for businesses: Air Canada, for example, was held liable after its chatbot gave a customer inaccurate information. As conversational AI spreads into sensitive industries such as financial services, where mistakes (a.k.a. "hallucinations") can cause serious compliance problems, it is crucial to ensure that hallucinations are extremely rare.

For product and developer teams, this means choosing between spending time manually testing their AI solutions to make sure they hold up in every situation, or bearing consequences that range from unhappy customers to serious legal trouble. As founders of an AI agent startup ourselves, we've experienced this pain first hand: we and our customers have spent hundreds of hours testing different scenarios and revising the data backing our AI agents. Not only is this grueling for the teams who have to do it, it can also make building conversational AI tools daunting for businesses, costing larger companies tens to hundreds of thousands of dollars.

Our Solution

To make this process easier, we built TalkLogic: a platform that helps developers and product teams test their AI-powered customer support chatbots and agents across thousands of simulated scenarios, without spending hundreds of hours manually hunting for hallucinations. Our conversational simulation engine and automated reasoning evaluator detect hallucinations dozens of times faster than humans can, saving businesses tens of thousands of dollars and freeing product teams to focus on other pressing issues.

Unlike most AI evaluation platforms, TalkLogic doesn't stop at simulation and detection; the platform also handles revision. Whenever our automated reasoning system detects an issue, it analyzes the cause and generates concrete data points that are ready to drop into the knowledge base of our users' conversational AI tool. The data is designed specifically for RAG systems, the standard architecture behind most conversational AI tools today, and users can export it in bulk, making it seamless to integrate into their pipelines.
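For illustration, a bulk export of revision data points might look like the following. The field names and schema here are hypothetical, not TalkLogic's actual export format:

```python
import json

# Hypothetical shape of one revision data point: a retrieval-ready text
# chunk plus metadata tying it back to the hallucination that triggered it.
# All field names are illustrative.
revisions = [
    {
        "chunk": "Refund requests must be submitted within 30 days of purchase.",
        "source_issue": "Agent claimed refunds are available for 90 days.",
        "tags": ["refunds", "policy"],
    },
]

def export_bulk(points: list[dict]) -> str:
    """Serialize revision data points as JSON Lines, a common
    bulk-ingest format for RAG data pipelines."""
    return "\n".join(json.dumps(p) for p in points)
```

Because each line is an independent JSON object, a downstream ingestion job can chunk, embed, and index the records without any custom parsing.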

Because of this, businesses using TalkLogic see continued savings in the long term, on top of the savings in the initial setup stage. The data TalkLogic provides can enrich smaller, cheaper language models, which are traditionally more prone to hallucinating. This lets businesses move away from flagship models that can cost up to 20x more than their counterparts.

How it works

  1. We've created a simple API for developers to seamlessly integrate their conversational AI tools with our platform over WebSockets. All it takes is about 30 lines of code and pasting your endpoint's URL into our web app.

  2. From there, use our web app to describe the rules, knowledge, and actions available to your AI tool. Our automated reasoning system uses these to accurately evaluate and analyze its responses.

  3. Set your desired sample size and watch the simulation unfold on our live interface!

  4. Click on any issue that pops up for details on the hallucination, along with generated revision data you can use right away. You can also export the revision data in bulk to integrate it seamlessly into your data pipeline.
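The integration in step 1 can be sketched with aiohttp on the developer's side. Everything specific here — the endpoint path and the message schema — is illustrative, not TalkLogic's actual protocol:

```python
import json

from aiohttp import WSMsgType, web

def make_reply(user_message: str) -> dict:
    """Build the JSON payload sent back to the simulator.
    The schema (role/content) is illustrative only."""
    return {"role": "assistant", "content": f"Echo: {user_message}"}

async def talklogic_handler(request: web.Request) -> web.WebSocketResponse:
    """Receive simulated user turns over a WebSocket and reply with the
    bot's answers. A real integration would route each turn through the
    team's own chatbot; this sketch just echoes the message back."""
    ws = web.WebSocketResponse()
    await ws.prepare(request)
    async for msg in ws:
        if msg.type == WSMsgType.TEXT:
            turn = json.loads(msg.data)
            await ws.send_json(make_reply(turn["content"]))
    return ws

def build_app() -> web.Application:
    """Expose the handler at a URL you'd paste into the web app.
    The /talklogic path is a placeholder."""
    app = web.Application()
    app.router.add_get("/talklogic", talklogic_handler)
    return app
```

Once this endpoint is deployed, pasting its public URL into the web app (step 1) is all the wiring the simulation needs.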

Tech Stack

The TalkLogic prototype uses Next.js for the web app frontend and aiohttp (Python) for the backend, chosen for its strong WebSocket support.

The automated evaluation is powered by chain-of-thought reasoning built on LangChain, using OpenAI models hosted on Microsoft Azure and FAISS vector stores.
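To show the shape of the detection loop without the full stack, here is a minimal stand-in: in the real evaluator, FAISS similarity search over embeddings and a chain-of-thought LLM judge do the matching, but this sketch substitutes plain word overlap to illustrate the core idea of flagging claims the knowledge base doesn't support:

```python
# Toy stand-in for retrieval-plus-reasoning hallucination detection.
# The knowledge base entries and threshold are made up for illustration.
KNOWLEDGE_BASE = [
    "Refunds are available within 30 days of purchase.",
    "Premium support is included in the Enterprise plan.",
]

def word_overlap(claim: str, fact: str) -> float:
    """Crude lexical similarity; FAISS embedding search replaces
    this in the actual pipeline."""
    cw, fw = set(claim.lower().split()), set(fact.lower().split())
    return len(cw & fw) / max(len(cw), 1)

def is_grounded(claim: str, kb: list[str] = KNOWLEDGE_BASE,
                threshold: float = 0.5) -> bool:
    """Flag a chatbot claim as a potential hallucination when no
    knowledge-base entry supports it above the threshold."""
    return max(word_overlap(claim, fact) for fact in kb) >= threshold
```

In the production system, each flagged claim is then passed to the chain-of-thought evaluator to analyze the cause and draft the revision data described above.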

What's next for TalkLogic

We see TalkLogic as an important pillar in the future of conversational AI development, and we have exciting plans for it going forward.

We want TalkLogic to cover a greater portion of the AI tool testing and revision process by introducing additional reasoning flows that automatically infer important rules and knowledge our users may have missed, before testing begins. This will make simulation and evaluation even more effective, since the constraints of our users' AI tools will be better defined going into the process.

We also don't want to limit TalkLogic to purely text-based interfaces. We plan to expand our scope to include other useful applications of generative AI systems, including function calling and voice-based interfaces.

Finally, we also want to give our users more visibility through analytics across the evaluation process. This will include metrics on the failure rate of their AI tools, as well as insights into how much those tools improved after integrating our generated revision data into their knowledge bases.

Whether it's as an internal tool or its own product, we are excited for the future of TalkLogic and we're confident it's only in its infancy.
