Weights & Biases Judgement Day Hackathon

Participants: 28

Project gallery
Connect with the participants – support your favorite projects by liking, sharing, and commenting on them.
kg-evaluator

Evaluate how well an LLM builds a knowledge graph.

Winner
Kirill Igumenshchev

OptoPrompt

One-click prompt alignment for LLM judges via Weave and DSPy, with a focus on an easy user interface.

Winner
Pablo Unzueta, Bassim Eledath

Let's Get Creative!

Exploratory data analysis of how creative LLMs are, evaluating generated lists of names, words, and ideas against each other.

Winner
Alex Reibman

Pairwise Model Tests

Weave has good support for scalar metrics and for comparing models by those metrics. In some cases, however, pairwise comparison of outputs is a better approach to model comparison.

Winner
Daniel Fennelly

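The pairwise setup this project describes can be sketched as a round-robin over model outputs. This is a minimal illustration, not the project's actual code: the `judge` callable is a stand-in for whatever LLM comparison you wire up, and the model names are hypothetical.

```python
from itertools import combinations
from collections import Counter

def pairwise_win_rates(outputs, judge):
    """Compare every pair of model outputs with a judge callable that
    returns "a" or "b", and report each model's share of pairwise wins."""
    wins, total = Counter(), Counter()
    for (name_a, out_a), (name_b, out_b) in combinations(outputs.items(), 2):
        winner = name_a if judge(out_a, out_b) == "a" else name_b
        wins[winner] += 1
        total[name_a] += 1
        total[name_b] += 1
    return {name: wins[name] / total[name] for name in outputs}

# Toy judge that prefers longer answers, purely for demonstration.
rates = pairwise_win_rates(
    {"m1": "long answer here", "m2": "short", "m3": "mid len"},
    lambda a, b: "a" if len(a) > len(b) else "b",
)
```

Win rates from all-pairs comparison sidestep the need for an absolute scalar score, at the cost of O(n²) judge calls per example.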
Attack_prompt

Used DSPy and other tools to automate the generation of attack prompts, adversarial prompts, red-teaming prompts, and harmful-intent prompts and responses that get around LLM guardrails.

Jonathan Laplante

Judge Tuner

A mechanism for concurrently tuning example data and metrics for model evaluation.

Rajesh Trivedi, Swaraj R, Jeffrey Wang

CycleVal

Evaluate LLM judgements with open success criteria.

daniel k

LBM

Large Bagging Model: optimize the best debate agent by creating a pipeline that generates agents and recursively improves them based on human feedback and LLM evaluators.

Wei Chun, Ajay Kallepalli, LORN HIN, ADRIAN LAM, Benedict Neo

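The bagging idea behind an ensemble of LLM evaluators can be sketched as a simple majority vote. This is an illustrative assumption about how such an ensemble might aggregate, not the team's implementation; each evaluator here is a plain callable standing in for an LLM judge.

```python
from collections import Counter

def ensemble_verdict(response, evaluators):
    """Ask several independent evaluators to grade a response and
    return the majority verdict along with its vote share."""
    votes = Counter(evaluate(response) for evaluate in evaluators)
    verdict, count = votes.most_common(1)[0]
    return verdict, count / len(evaluators)

# Three stub evaluators; a real ensemble would call different LLM judges.
verdict, share = ensemble_verdict(
    "candidate response",
    [lambda r: "pass", lambda r: "pass", lambda r: "fail"],
)
```

Averaging several weak judges tends to reduce the variance of any single judge's bias, which is the same intuition bagging exploits for classical models.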
LLM Idol

Are you and your buddies tired of annotating and evaluating LLMs manually? Use LLMs as judges instead!

Faris Habib

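Stripped to its core, the LLM-as-judge pattern this project alludes to is just scoring each (question, answer) pair with a grading model. The sketch below assumes a `grade` callable returning a score in [0, 1] as a placeholder for a real judge-model call.

```python
def judge_dataset(examples, grade):
    """Score each (question, answer) pair with a judge callable and
    return the per-example scores plus their mean."""
    scores = [grade(question, answer) for question, answer in examples]
    return scores, sum(scores) / len(scores)

# Stub judge for demonstration; in practice `grade` wraps an LLM call.
scores, mean_score = judge_dataset(
    [("q1", "a1"), ("q2", "a2")],
    lambda q, a: 1.0 if a == "a1" else 0.0,
)
```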
EvalGuard

EvalGuard simplifies evaluation with automatic fact-checking and domain-specific synthetic data generation via plug-and-play JSON. With Weave tracing, it ensures transparency: no more black-box processes.

Livia Ellen, Jiaping Zhang

Lazy Evals

Lazy Evals: don't wait, dominate! Instant AI evaluations blending speed and quality. Cut latency without sacrificing modularity. Supercharge your AI development. Get lazy, get ahead with Lazy Evals!

Patrick Damaso, MD praveen Ma

LLMonPy

Builds on the Self-Taught Evaluators paper (https://arxiv.org/abs/2408.02666) to generate a large amount of training data for using an LLM to write an evaluation prompt.

Tom Burns

Pro-Bias: Self-Improving Human-Aligned Subject LLM Evals

Creating human-aligned LLM judges through enhanced evaluation frameworks and self-improving automated grading systems.

Aditya Advani, Jeff Risberg

Enhancing Preference Learning for Monte Carlo Tree Search

Leveraging AI as a judge within Monte Carlo Tree Search, we automate reasoning evaluation, enabling faster and more accurate preference learning to optimize decision-making without human bias.

Marcus Yeo, Yong Ning Lee, JUN SOO

1 – 14 of 14
