Devpost
Participate in our public hackathons
Devpost for Teams
Access your company's private hackathons
Grow your developer ecosystem and promote your platform
Drive innovation, collaboration, and retention within your organization
By use case
Blog
Insights into hackathon planning and participation
Customer stories
Inspiration from peers and other industry leaders
Planning guides
Best practices for planning online and in-person hackathons
Webinars & events
Upcoming events and on-demand recordings
Help desk
Common questions and support documentation
evaluate how well LLM builds a knowledge graph
One-click prompt alignment for LLM judges via Weave and DSPy. Focused on easy user interface.
Exploratory data analysis looking at how creative LLMs are. Evaluating generated lists of names/words/ideas against each other
Weave has good support for scalar metrics, and comparing models by metrics. However, in some cases pairwise comparison of outputs is a better measure approach to model comparison.
Used DSPy and other tools to automate prompts for attack prompts, adversarial prompts, red teaming, and generating harmful intent prompts and prompt responses that get around LLM guardrails.
Mechanism for Concurrently Tuning Example Data and Metrics for Evaluation of a Model
Evalute LLM judgements with open success criteria
Large Bagging Model Optimize the best debate agent. Creating a pipeline to generate agents and recursively improve them based on human feedback and LLM evalutors.
Are you and your buddies tired of annotating and evaluating LLMs manually? Use LLMs as judges instead!
EvalGuard simplifies evaluation with auto fact-checking and domain-specific synthetic data generation with plug and play JSON. With Weave tracing, it ensures transparency—no more black-box processes
Lazy Evals: Don't wait—dominate! Instant AI evaluations blending speed and quality. Cut latency without sacrificing modularity. Supercharge your AI development. Get lazy, get ahead with Lazy Evals!
Build on Self taught evaluators paper (https://arxiv.org/abs/2408.02666) to generate a lot of training data for using an LLM to write an evaluation prompt
Creating human-aligned LLM judges through enhanced evaluation frameworks and self-improving automated grading systems.
Leveraging AI as a judge within Monte Carlo Tree Search, we automate reasoning evaluation, enabling faster and more accurate preference learning to optimize decision-making without human bias.
1 – 14 of 14