How to Choose the Right Jira Test Management Tool For Visibility, Speed, and Adoption

Choosing the right Jira test management tool should not come down to comparing feature checklists alone. The best tool is the one your team will actually adopt, the one that embeds testing into everyday Jira workflows, and the one that supports fast, visible feedback without slowing delivery – the characteristics of effective shift-left testing.

That matters because many Jira test management tools fail for the same reason: they reinforce QA silos instead of enabling team-owned quality. Most vendors have invested the majority of their attention in features, building heavy, complex solutions that only testers touch. The result is that developers are discouraged from engaging in early testing, and testers are cut off from the development process.

If you’re evaluating Jira test management tools, shortlist them based on three practical criteria: shift-left enablement, speed at scale, and frictionless adoption. Features still matter, but they matter in the context of whether the tool helps your team test earlier, collaborate inside Jira, and maintain visibility across development, QA, and product. 

Here is how to choose your next Jira test management app to make sure it supports team-wide visibility, quick feedback loops, and easy adoption.

Why Most Tools Fail – And the Shift-Left Solution

Many Jira test management tools focus on offering a broad range of features, making them highly versatile. However, their users still tend to encounter low visibility and late-stage bug detection. The fault lies in leaving QA at the end of the SDLC, resulting in siloed workflows and quality treated as an afterthought rather than as a core value of the software.

That trap can be avoided with shift-left testing, designed to lower the risk of missed bugs and unexpected issues that can delay the release. By including QA in the development workflow, teams can detect problems earlier, validate requirements continuously, and ensure quality is built into the product from the start.

But that only works when the tool supports shared visibility and fast participation across roles, not just deeper test administration for QA.

To choose a Jira test management tool that truly optimizes your processes and maximizes QA effectiveness, the key question is not “How many features does it have?” Instead, look for a solution that combines shift-left enablement, high performance, and frictionless adoption, and ask: “Will this tool help our team test earlier, move faster, and actually adopt testing inside daily Jira workflows?” When evaluating the options available on the Atlassian Marketplace, pay attention to these four shift-left criteria:

  • Team Visibility: High testing visibility enables the whole team to monitor quality at every stage of the development process. Test progress, results, and risks should be easy to access so everyone can stay aligned.
  • Embedded Workflows: Testing should happen alongside development, allowing bugs to be caught earlier and quality insights to be shared naturally between QA and developers.
  • Built for Speed: The tool must handle growing test volume, large Jira projects, and expanding test suites without slowing teams down, ensuring strong performance and scalability as development moves quickly.
  • Frictionless Adoption: The tool should feel like a natural extension of Jira – easy to implement, intuitive to use, and simple for teams to integrate into their existing workflows without adding unnecessary overhead. 

The qualities above should serve as the primary evaluation criteria for further research, framing how you assess specific elements. Instead of looking at features in isolation, consider how well they support shift-left practices, maintain speed at scale, and enable smooth adoption across the team. 

Evaluating Trade-Offs that Actually Matter in Jira Test Management

Choosing a Jira test management tool is a game of trade-offs. The strategic choices you make regarding your data and processes directly dictate how the tool performs and how your team adopts it.

To avoid the “Heavy Tool Risk,” you must balance these four critical areas:

Data Model and Governance

How you structure your data determines who is willing to enter it.

Flexibility vs. Standardization

  • Flexible, team-owned models allow testing to live directly inside daily Jira stories, ensuring deep workflow embedding.
  • Standardization improves global reporting, but it often creates friction (high adoption hurdles). If a developer has to navigate a rigid, complex data structure just to log a result, they simply won’t do it.

Planning and Execution Model

The complexity of your execution phase dictates who actually participates in quality.

Comprehensive vs. Lightweight Planning and Execution

  • Lightweight planning combined with In-issue execution allows developers to “test as they go” within their existing Jira tasks, ensuring near-instant adoption.
  • Detailed, manual-heavy planning provides a massive safety net but creates a “Heavy Tool Risk.” When a tool requires a specialist to configure every test run, QA becomes a bottleneck.

Reporting and Traceability Expectations 

The “weight” of your reporting requirements impacts how the app actually feels to the user.

Granularity vs. Clarity

  • Focusing on high-level outcomes and practical traceability keeps the interface fast and built for speed.
  • Striving for 100% traceability provides an audit trail but can lead to “Jira issue bloat.” This impacts scale and performance, leading to slow loading screens that frustrate fast-moving agile teams.

Admin Overhead and Team Adoption Risk

The “cost of ownership” is the biggest predictor of whether a tool will be abandoned.

Complexity vs. Simplicity

  • Simple, intuitive tools lead to fast adoption speed because they require zero specialist training.
  • Rich, complex features offer deep control but require high admin overhead. If the tool is too complex for a non-tester to understand in five minutes, it will remain a QA-only silo, cutting the rest of the team off from quality insights.

Jira Test Management Tools Trade-Off Comparison

With so many Jira test management tools available on the Atlassian Marketplace, it’s more useful to compare them not just by features – which show what a tool can do – but by the outcomes your team can realistically achieve after implementation.

To make this comparison meaningful, focus on the trade-offs that impact adoption, workflow integration, and quality delivery. These include:

  • Adoption Speed: How quickly do teams adopt the tool and make it part of their daily work?
  • Workflow Embedding: How well is testing embedded in Jira workflows, encouraging developers and QA to own quality?
  • Scale and Performance: How does the tool scale and perform in Jira instances of varying size?

Why the “Heavy Tool Risk” Kills Shift-Left

You should also consider the “Heavy Tool Risk” – the silent killer of modern QA. When a Jira test management tool is too complex:

  • Developers Opt-Out: If running a test requires a 10-step configuration process in a separate Jira tab, developers will skip it.
  • QA Becomes the Bottleneck: All testing data must go through a “specialist,” creating a silo where the team doesn’t see quality until the end of the sprint.
  • The Feedback Loop Breaks: Without fast, in-issue visibility, bugs are found later, costing more to fix.

Based on the trade-offs above, here is how the leading Marketplace apps stack up against the shift-left criteria:

Adoption Speed

  • QAlity Plus: Near Instant. UI is lightweight and self-explanatory; no specialist training required.
  • Xray / Zephyr Squad: Slow. Requires weeks of setup and tool training for non-QA members.
  • Qmetry / AIO Tests: Moderate. Feature-heavy UIs can feel overwhelming for non-testers.
  • RTM: Variable. The learning curve depends on what features you use.

Workflow Embedding

  • QAlity Plus: Deep In-Issue. Tests and executions live directly inside Jira stories.
  • Xray / Zephyr Squad: Modular. Often requires clicking into separate tabs or specialized issue types.
  • Qmetry / AIO Tests: Feature-Driven. Embedded but cluttered; requires significant UI navigation.
  • RTM: Dual-Focus. Testing is linked to requirements, but can feel like a separate app.

Scale and Performance

  • QAlity Plus: High Velocity. Built for Jira Cloud speed; avoids the heavy-loading screens common in legacy tools.
  • Xray / Zephyr Squad: At Risk. Large test volumes can lead to Jira “Issue Bloat” and slower page loads.
  • Qmetry / AIO Tests: Scalable but Heavy. Performance holds up, but UI latency increases with feature use.
  • RTM: Stable. Good for mid-sized projects but can lag in extremely large instances.

“Heavy Tool Risk”

  • QAlity Plus: Low. Simplicity ensures developers actually log results.
  • Xray / Zephyr Squad: High. If only QA understands the tool, shift-left fails.
  • Qmetry / AIO Tests: High. Feature-fatigue leads to users bypassing the tool during sprints.
  • RTM: Medium. Confusion over requirements vs. tests can hinder quick updates.

Buyer Guidance: Which Tool Fits Your Organization?

If you need fast adoption and “test-as-you-go” efficiency…

If your goal is shift-left, speed-to-adoption, practical fundamentals, and cost efficiency, you need a tool that removes the wall between writing code and logging a test. Tools like QAlity Plus are designed for this.

  • The Planning Reality: Lightweight and focused on the fundamentals.
  • The Execution Reality: This is where the shift-left happens. By enabling in-issue execution (logging steps directly inside a Jira Story), you remove the context-switch. When execution is this simple, developers actually participate, and the “Heavy Tool Risk” disappears because quality is a shared activity, not a QA chore.

If you need maximum depth and complex audit trails…

For large organizations in highly regulated sectors (MedTech, Finance), Xray or Zephyr Squad are the standard. They offer the granular versioning and “Test Execution” issue types required for strict compliance.

  • The Planning Reality: You get robust, versioned test repositories.
  • The Execution Reality: Running a test often requires creating a separate “Execution” issue. This Modular Execution is a specialist’s workflow. It creates a high barrier for developers, meaning QA remains the sole owner of quality data.

If you need a middle ground with high automation focus…

Tools like QMetry or AIO Tests provide extensive functionality for teams that want more than streamlined tools but less than the full enterprise overhead.

  • The Planning Reality: Strong folder structures and cross-project reusability.
  • The Execution Reality: These tools often have “feature-heavy” execution screens. While powerful, the UI latency and complexity can lead to “Execution Fatigue,” where teams begin bypassing the tool during fast-paced sprints to save time.

If you need to combine requirements and testing in one view…

RTM (Requirements & Test Management) is for teams that want a rigid, direct link between Jira requirements and test cases.

  • The Planning Reality: Excellent for ensuring every requirement has a test case.
  • The Execution Reality: The process is highly structured. For Agile teams, this Rigid Execution can feel like a separate layer inside Jira. It’s great for “checking the box” on a requirement, but it can struggle to keep up with the fluid nature of daily development work.

Beyond the UI: Why Architecture is Your Final Filter

Choosing the right tool is the first step, but understanding the cost of architectural trade-offs is what ensures long-term success. Once you narrow your shortlist based on shift-left enablement, speed, and adoption, the next question is how the tool’s storage model will affect Jira performance, instance growth, and long-term maintainability. 

Before you commit, you need to know how the real costs of “Tests as Jira Issues” vs. “Separate Test Storage” will impact your team’s daily speed, your instance’s health, and your ultimate ability to successfully shift-left.

Why Jira Teams Struggle with Test Visibility – And Why Visibility is a Shift-Left Requirement

Why do Jira teams struggle with late-stage bugs? Discover how test management for Jira creates visibility and empowers shift-left testing.

Teams often treat visibility as a reporting problem when, in reality, it’s a timing problem. 

A sprint can look perfect in Jira: all stories are “Done,” progress is green, and the dashboard shows no blockers. Then QA starts testing, and suddenly, critical issues pop up. By the time QA usually begins in a sprint, visibility no longer helps teams prevent risks – it only exposes them.

The solution? Teams need to spot quality risks while the work is still in progress, not after it’s supposedly “done.” This is where shift-left testing comes in: making test results, coverage, and risk visible early in the development cycle, before issues become costly and disruptive. 

But making things visible earlier is one thing. The real question is what kind of visibility teams actually need. 

What Is Real Test Visibility?

Real test visibility is about clarity across the full spectrum of quality – not only what has and hasn't been tested, but also where the risks are and whether the product is actually ready for release. For many teams, test visibility simply means knowing which Jira test cases passed and failed – but that's just the tip of the iceberg.

Why does it matter? Many teams face a persistent “quality gap”: fast delivery, late validation. Features are made quickly, but thorough checks happen too late. The result? Gaps in coverage, hidden risks, and uncertainty about whether the product is truly ready.

Visibility isn’t just a nice-to-have for QA – it’s the foundation for effective shift-left testing. The earlier you make testing visible in the development cycle (the more you shift it left), the sooner teams can spot problems before they turn into costly issues. 

Three Layers of Test Visibility for Shift-Left QA

True test visibility has three layers that cover different aspects of QA:

  1. Test Coverage Visibility: what's been tested, and what hasn't?

Coverage visibility shows which stories, features, or workflows have already been tested – and which still have no reliable validation behind them. In Jira terms, this helps teams see whether delivery progress is actually matched by test coverage, or whether work is moving forward with blind spots still in place.

  2. Visibility of risk: what could go wrong?

Risk visibility highlights where confidence is weak, even before release. That includes failed, blocked, inconsistent, or flaky outcomes, as well as areas where testing is incomplete or evidence is weak. This helps teams focus on what is most likely to delay release or create escaped defects, rather than treating all work as equally healthy.

  3. Visibility of readiness: is this product really ready to ship?

Readiness visibility answers the question leadership and delivery teams actually need to ask: Is this ready to ship? It depends on evidence, not optimism: coverage status, execution outcomes, linked defects, integration checks, and other quality signals that support a credible go/no-go decision.

Those three layers give teams clear insight into coverage, risk, and readiness, making the gap between fast delivery and late validation disappear. As a result, quality becomes more predictable, and releases happen with fewer surprises.

The Consequences of Late-Stage QA

Many teams still wait until the end of the sprint to run tests, and that choice has consequences: issues and risks that can affect the entire development process and the final quality of the product.

“Done” doesn’t mean tested

In late-stage QA, the label “done” can be dangerously misleading. In development, a feature is often marked as “done” once the code is written. But “done” here doesn't mean it has actually been properly tested. Without thorough QA, bugs may still be lurking. When they surface later, during or after the release, fixing them is not only time-consuming but also costly.

Ad hoc regression planning

When regression testing is handled in a chaotic or improvised way, gaps in coverage show up and old bugs may resurface, increasing risk across the product.

Defects lack a test context

When defects are identified, they frequently lack detailed information about the test conditions under which they were discovered. Without proper context, it becomes harder to understand the root cause, reproduce the issue, and fix it effectively. 

Leadership can’t get a credible readiness snapshot

When testing progress, defects, and regression coverage aren’t clearly visible, managers can’t confidently decide if the product is ready for release. This increases the chance of surprise in production.

By shifting testing earlier in the development cycle and making test visibility a priority, these risks can be avoided. 

Embedding Shift-Left Testing in Jira 

Shift-left works best if quality is visible from the start. Too often, testing happens at the end of the sprint, leaving teams in the dark about coverage, risks, and readiness. Implementing QA earlier into the process changes that.

In Jira, you can bring shift-left visibility to life by adding a test management layer that works directly within the development environment. It gathers test execution, defect tracking, and reporting together in one place. Instead of managing test cases in separate tools or spreadsheets, you access everything directly within Jira and gain full visibility.

User stories, requirements, and sprint work all live in one place, becoming a unified part of your testing process. This way, testing is no longer a phase that happens after development. It becomes part of the delivery process itself – visible earlier and connected to context.

How Adding a Test Management Layer in Jira Transforms Your Development Process

When testing becomes fully embedded in Jira, visibility is no longer limited to QA. The biggest shift isn’t technical – it’s organizational.

Shared visibility across roles

  • Developers see which stories already have test coverage, and which don’t. 
  • QA sees progress in real time, not at the end of the sprint. 
  • Product Owners gain clarity on what has actually been validated, not just what is marked as “Done.”
  • Delivery Leads get a reliable readiness snapshot without asking for separate reports.

As a result, everyone works from the same source of truth and the data is continuously updated within Jira.

Faster feedback loops

  • Risks surface earlier: because tests are linked to requirements and executed within the sprint workflow, risks can sometimes be noticed the same day code is delivered. 
  • Defects are no longer isolated tickets: they’re connected to context. This makes issues easier to understand, reproduce, and fix.

This way, instead of discovering problems at the end of the sprint, teams can now address them while work is still in progress.

Fewer coordination meetings

When visibility is built into the workflow, teams don’t need extra meetings to synchronize. Information is accessible, traceable, and always up to date. The result? Less time is spent on aligning status, and more time can be spent on improving quality.

Making Visibility and Shift-left Work

Teams often struggle to see the full picture when QA starts too late, but shift-left visibility in Jira can break this cycle. With the right test management for Jira solution, teams can gain full insight from the very start of the delivery process. Unifying testing, defect tracking, and reporting within Jira gives real-time clarity and makes quality an integral part of development – not just a final, too-late stage.

Why and when you should use an hourly Gantt chart

Most project plans start with a simple question: what needs to get done, and on which day? And this approach often works just fine. But once work starts happening in hours instead of days… that’s when that neat timeline can quickly lose its usefulness.

Think about tasks that take 30 minutes, a one-hour setup, or a few tightly timed handoffs. In a day-based plan, those details easily get buried. Short tasks vanish into a single bar, dependencies look like conflicts, and shared resources seem overbooked when they're actually fine. These are the kinds of misconceptions you want to avoid – and that's where an hourly Gantt chart comes in.

In this article, we’ll explore day-level and hourly planning, take a look at when an hourly Gantt chart makes sense and when it’s not necessary, and how to use hour-level scheduling without introducing unnecessary complexity to your project plan.

Why day-level planning falls short

With Gantt charts, you get a broad overview of your plan, which works perfectly fine in many cases. There are times, though, when these kinds of charts can act as a fog that obscures the critical details of your project.

First, there are short tasks – those lasting 15 minutes or a few hours. They get lumped together into a single day, turning your plan into guesswork. Another thing that becomes invisible on a daily Gantt chart is overlapping work. Two people working on interdependent tasks at different hours on the same day can appear to be in conflict, when in reality their schedules fully align. What about situations when you're dealing with limited tools or spaces? That's when precision becomes even more important, as shared resources require exact timing – you don't want to book a specialized machine or a meeting room for an entire day when you only need it for two hours.

At the end of the day, high-pressure projects need clarity, not assumptions. When timelines are tight or things change at the last minute, having hour-level visibility is often a must. In those situations, the way a day-level Gantt chart obscures that clarity is something teams simply cannot accept.

When to use an hourly Gantt chart

Whether it’s for tasks shorter than one working day or other reasons, there are times when you need to zoom in to keep everything on track. In fast-moving fields like manufacturing, creative production, or IT maintenance, work happens in quick bursts. If you’re running UX research sessions or managing a busy event, a daily view just won’t cut it – you need to see exactly how those 1-hour or 3-hour windows fit together.

A certain level of detail is also a must in case of work that depends on precise sequencing. Think of specialized installations, equipment handoffs, or inspections – the moments where one task must finish at 10:00 AM so the next can start at 10:05 AM. Finally, an hourly view is essential when client timing and expectations matter. For logistics, photography shoots, or high-level consulting, being on time doesn't mean “sometime today” – it means being there at the exact moment the client expects you.

By zooming in to the hour level when timing really matters, you can see how work actually flows through a day, coordinate people and resources more smoothly, and avoid last-minute surprises. 

What hourly Gantt charts bring to the table

An hourly Gantt chart gives you a clear roadmap for how the day actually unfolds. It helps ensure work happens in the right order and at the right time. With that level of visibility, you get:

  • Clear handoffs between people or teams – waiting time and miscommunication are reduced as hourly planning shows exactly when one task ends and the next begins. 
  • Fewer surprises during the day – the risk that something in the detailed plan was missed becomes significantly lower. Teams can spot conflicts or gaps early, before they turn into delays.
  • Better use of time and resources – seeing the day hour by hour helps avoid idle time, double-booking, or rushed work.
  • Easier adjustments when plans change – if something runs late, it’s much easier to shift hours than rethink an entire day.

When hourly Gantt charts are unnecessary

Hourly Gantt charts aren’t the right fit for every project – in some situations they can add complexity without delivering any meaningful value. When tasks span multiple days or weeks and don’t depend on precise start-and-stop times, breaking them down by the hour rarely improves execution. For example, work such as research, design, writing, or long development phases doesn’t really progress according to strict hourly blocks. 

Hourly Gantt charts are also unnecessary if your schedule rarely changes or if the project is still in a conceptual or early planning stage. At this point, rough estimates and flexible time ranges are far more useful than detailed hour-by-hour breakdowns. 

Another key consideration is how your team actually works. If team members don’t track their time in hours or aren’t expected to shift tasks throughout the day, adding that level of detail can create extra effort without real benefits. Instead of improving visibility, it may distract from the real goal: delivering progress and outcomes.

The key is to match the level of detail to the project’s needs, so teams can spend the minimum time necessary to maintain their schedules.

Best practices for hour-level project planning

Hour-level project planning comes with a level of precision that day-based plans can’t offer – but only when used thoughtfully. The best practices below show how to apply it effectively, without overcomplicating the planning process.

  • Start big, then zoom in later – begin with phases and milestones. Add hourly detail only when you reach an execution window, such as a production day, deployment week, or event week.
  • Use rolling planning windows – plan hourly only for the next 7-14 days. Keep the rest in day or week view to avoid constant rework when plans change.
  • Only add hourly detail where it matters – leave multi-day tasks at a high level. Break things down hourly only for work that truly needs precise timing.
  • Use sub-tasks for precision – keep large tasks at the day level and schedule sub-tasks hourly to maintain both detail and the big picture.
  • Re-forecast often – hourly plans become outdated quickly. Treat them as living plans and adjust them regularly as conditions change.

Common mistakes in hourly Gantt chart planning

An hourly Gantt chart that was poorly applied can quickly become counterproductive. The mistakes below are some of the most common pitfalls teams run into when planning at the hour level. 

  • Over-detailing the entire project – don’t turn six months of work into one-hour blocks. Too much detail creates noise and makes the plan harder to use.
  • Forcing hourly detail where it doesn’t fit – turning tasks like “write copy” into hourly slots can lead to micromanagement instead of better planning.
  • Forgetting the human factor – people aren’t machines and don’t switch context perfectly every hour. Don’t forget to build in buffers and breathing room to keep plans realistic.

Conclusion

Hour-level project planning can be incredibly beneficial, but only when it’s used with the right intent. More detail doesn’t automatically mean better planning; applied too broadly or too early, it can actually reduce clarity. That’s why it’s important to consider the nature of the work, its rhythm, and how your team operates. By choosing thoughtfully between hourly and daily planning, you give your team the right level of structure without unnecessary complexity – keeping the plan useful, realistic, and easy to execute.

Extending ODK Collect with custom data collection

Learn how ODK Collect was extended with custom question types for EGRA and EGMA assessments, enabling a smooth migration from SurveyCTO.


Mobile data collection tools are widely used in education, global health, and social impact projects. While many platforms offer solid features out of the box, they don’t always cover more specific needs. This is especially true for teams running assessments that require timed tasks, custom question types, or detailed tracking of how respondents progress through a test. In these situations, custom data collection solutions become necessary – particularly in offline-first environments.

This case study shows how we extended ODK Collect with custom functionality to support advanced educational assessments. The goal was to enable a smooth move away from SurveyCTO while keeping full compatibility with the official ODK ecosystem and ensuring the solution would be easy to maintain in the long run.

The client: Compassion International

The client had been using SurveyCTO to run literacy and numeracy assessments in the field. Their setup included additional question types that are not available in standard ODK Collect, such as:

  • Timed assessments used in EGRA (Early Grade Reading Assessment) and EGMA (Early Grade Mathematics Assessment)
  • Slider-based numeric questions that made data entry easier and more intuitive

As part of a shift toward a fully open-source and upstream-compatible solution, the client decided to migrate to the official ODK Collect app. The key requirement was to keep all existing assessment features so that field teams could continue working without disruption.

The main challenge: extending ODK Collect with custom question types without breaking compatibility or making future updates difficult.

Analysis: choosing the right approach

We started by looking into whether the required features could be built as external applications that communicate with ODK Collect. At first glance, this approach seemed attractive because it would limit changes to the core app. However, deeper analysis showed several important drawbacks:

  • Limited support for translations and localization
  • Weak integration with ODK Collect’s form engine and lifecycle
  • More complexity when it comes to deployment and long-term maintenance

Because of these limitations, the external-app approach could not fully meet the client’s needs.

Solution: a custom, upstream-compatible fork of ODK Collect

To ensure a seamless user experience and full feature coverage, we implemented a dedicated fork of ODK Collect. This approach is something we often apply as part of our custom software development services, especially when open-source tools need to be adapted to real-world workflows.

The solution was built around three key principles:

  • Staying compatible with the official ODK Collect project
  • Keeping code changes minimal and well isolated to simplify future updates
  • Fully supporting XLSForm, so existing forms could be reused without changes

Both custom question types were based on existing ODK implementations, which helped us extend functionality while staying consistent with ODK’s architecture and user interface.

Timed Grid question for EGRA and EGMA

The Timed Grid question was designed specifically for conducting timed literacy and numeracy assessments in offline field conditions. It supports a wide range of EGRA and EGMA subtasks, including:

  • Letter & number identification
  • Familiar word reading
  • Nonword reading
  • Oral reading fluency with comprehension
  • Addition and subtraction tasks

Key capabilities

  • Choice lists displayed as grids or text passages
  • Layouts that adapt automatically to screen size
  • Pagination for longer assessments
  • Built-in timer with automatic stop when time runs out
  • Prompts to finish early after multiple incorrect answers
  • Option to complete the task manually
  • Detailed metadata capture, including:
    • Number of attempts
    • Correct and incorrect responses
    • Remaining time
    • Last attempted item

This feature lets you run assessments in a consistent, repeatable way while still capturing detailed data for analysis and reporting later.
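
To make that concrete, here is a minimal, hypothetical sketch of what a single Timed Grid result could look like once exported for analysis. The field names, the example values, and the 60-second window are illustrative assumptions made for this article, not the actual schema used in the fork.

```python
# Illustrative sketch only: the real fork defines its own field names and storage
# format inside the ODK submission. This shows the kind of data the widget captures.
timed_grid_result = {
    "correct": 42,               # correct responses
    "incorrect": 13,             # incorrect responses
    "attempts": 55,              # total attempts
    "time_remaining_sec": 0,     # seconds left when the task ended (0 = timer ran out)
    "last_item_attempted": 55,   # position of the last item the child reached
    "completed_manually": False, # True if the assessor ended the task early
}

# A derived metric commonly used for timed EGRA/EGMA subtasks: correct items per
# minute, assuming a 60-second timed window for this example.
elapsed_sec = 60 - timed_grid_result["time_remaining_sec"]
correct_per_minute = timed_grid_result["correct"] * 60 / max(elapsed_sec, 1)
print(f"Correct items per minute: {correct_per_minute:.1f}")
```

Keeping the captured metadata in a predictable structure like this is what makes the downstream analysis and reporting straightforward.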

Slider question for numeric input in ODK

The second feature we added was a Slider question type, designed to make numeric input easier and more user-friendly. The slider is fully integrated into ODK Collect and works directly with XLSForm definitions.

Key aspects

  • Configurable minimum and maximum values
  • Smooth, touch-friendly interaction consistent with ODK UI
  • Numeric values stored cleanly in form results
  • Clear documentation to support long-term maintenance

This improves the experience for respondents while keeping the data structured and reliable.

Quality assurance, deployment, and knowledge transfer

In addition to development, the project included thorough testing and operational support:

  • Unit and integration tests to match the behavior of existing SurveyCTO plugins
  • Deployment support across Development, Staging, and Production environments
  • Detailed documentation and knowledge transfer to support ongoing maintenance

This ensured the solution was not only functional, but also stable and sustainable over time.

Results: custom data collection without added complexity

By extending ODK Collect through a carefully designed custom fork, the client was able to:

  • Move away from SurveyCTO without losing key functionality
  • Run EGRA and EGMA assessments using upstream-compatible ODK tools
  • Keep full control over their data collection setup
  • Reduce long-term reliance on proprietary extensions

Need a custom data collection solution?

Looking for a partner experienced in ODK Collect customization and offline data collection?

Let’s design a solution that fits your assessment workflows and scales long-term.

Get in touch to discuss your project →

Ganttly for monday.com: Flexible hourly planning for teams

Streamline planning with Ganttly for monday.com. Manage events, production, healthcare, and more with flexible, easy-to-use Gantt charts.

Time is an unstoppable force that affects everything we do. Harnessing its potential is the key to effectiveness. But it’s difficult to control something you cannot see. Except, with Ganttly – Hourly Gantt for monday.com, you can.

Turn your time into a clear Gantt chart, and manage every detail – down to 15-minute and 1-hour increments.

Visualize your plan

Ganttly – Hourly Gantt for monday.com is a versatile tool designed to support your time management. No matter if you’re preparing a long-term project or a super tight few-hour schedule, Ganttly will support you throughout the process, turning your monday.com board into a clear and structured plan.

Start in just a few steps

You can start benefiting from Ganttly right away. All you have to do is select the start and end date columns for your items. The app will map them neatly on a timeline, using either items only or items with subitems, depending on your needs.


Learn more about setting up Ganttly →

Transform your planning with Ganttly charts

View 

View your board in an interactive Gantt chart, including groups, items and subitems, displayed on a timeline in hourly, daily, weekly, and monthly increments.


Benefit from instant color-coding by group that helps you find the items you're looking for right away. At a glance, gain clarity into how long each group will take.

Manage what items you see on your chart using search and filtering options. Easily hide items and subitems without start and end dates for a clearer view.

Plan

Modify your plans by simply dragging and dropping items on a timeline.

Effortlessly move around the timeline, using quick options to scroll to the current day or to a selected group.


Check details of each item by clicking on its name. Change start and end dates in the item details modal.

Export

Your plans might be built in monday.com, but they live and breathe in the real world, guiding your operations. That’s why you can export them to a CSV, PNG or SVG file to share with other people. Send it to your partners or stakeholders, or print the file and take your Ganttly chart with you wherever you go.


Learn more about Ganttly features →

One app, countless possibilities

Ganttly gives you and your team the freedom to build flexible, transparent plans that fit the way you work. Whenever there are items that need to be neatly arranged on a timeline – Ganttly will handle it with ease.

Master the minute-by-minute: Ultra-detailed hourly operations

When your team runs on tight timelines, even a five-minute delay can ripple through the entire day. Ganttly helps you keep everything synchronized – people, spaces, equipment, and high-stakes deadlines – so nothing slips and no one is left waiting. These are the workflows where precision isn’t a luxury, but a necessity.

Events, Hospitality & Venue Management

Events operate on ultra-compressed schedules, where every room changeover, vendor arrival, and session block has to sync flawlessly. With Ganttly, you can map the entire event flow in minute-level detail, ensuring everything runs exactly when it should.

Use Ganttly to coordinate:

  • Room and venue bookings with clear start-end times
  • On-site vendors who rely on strict setup windows
  • Session schedules across multiple tracks or spaces
  • Guest flow and staffing, aligned to actual demand throughout the day

No more overlapping bookings, no more guesswork – just total clarity on what happens when.

Film, Media & Creative Production

In creative production, teams jump between shoots, edits, and equipment setups in rapid succession. Hourly blocks are the norm, and precision keeps schedules tight and crews efficient. Ganttly gives you a visual, real-time timeline that shows how every piece fits together.

Perfect for managing:

  • Shoot schedules with exact call times
  • Equipment and set bookings, avoiding resource conflicts
  • Editorial workflows that pass work seamlessly between teams
  • Studio session planning without overbooking or idle time

With Ganttly, production days become smoother, more predictable, and far easier to adjust on the fly – because creativity shouldn’t be slowed down by logistics.

Precision in motion: High-intensity hourly operations

Some industries operate in a rhythm where every hour shapes the day’s output. They move in a steady, predictable cadence that demands tight coordination and zero surprises. Ganttly gives these teams the clarity to plan confidently, react quickly, and keep critical operations flowing without costly downtime.

Manufacturing & Production

In production environments, machines, shifts, and work orders all run on hourly cycles. A single delay can turn into missed quotas, idle machines, or overtime costs. Ganttly helps you surface those dependencies and keep everything moving in perfect sync.

Use Ganttly to manage:

  • Machine scheduling across multiple lines or stations
  • Short production runs that need precise sequencing
  • Changeovers and maintenance windows without disrupting throughput
  • Quality inspection slots that must align with production steps

Healthcare & Clinical Operations

Healthcare runs on an intricate schedule of hours, resources, and patient flow. Every room, device, and specialist needs to be in the right place at the right time. Ganttly helps clinical teams visualize these complex patterns and make the most of limited capacity.

Use Ganttly to orchestrate:

  • Operating room schedules with clear turnaround visibility
  • Lab tests that follow specific timing and batching requirements
  • Patient flow and procedures, mapped from prep to recovery
  • Staff shifts aligned with actual demand patterns

With Ganttly, healthcare teams gain a clear, dynamic view of their operations, reducing delays, improving resource usage, and creating smoother experiences for both staff and patients.

Moving the big picture: Time-sensitive operational logistics

Not every operation requires minute-by-minute oversight, but timing still matters. In logistics and transportation, delays can spread across fleets, routes, and schedules – impacting costs, customer satisfaction, and operational efficiency. Ganttly helps teams visualize the bigger picture while keeping track of the critical timing windows that make day-to-day operations run smoothly.

Logistics & Transportation

Managing logistics means balancing multiple moving parts over daily to multi-day horizons, with some hour-level precision when needed. Ganttly helps you plan, coordinate, and adapt without losing sight of the overall flow.

Use Ganttly to optimize:

  • Delivery windows so shipments arrive on time, every time
  • Fleet scheduling to maximize vehicle usage and reduce bottlenecks
  • Loading and unloading sequences for smoother warehouse operations

With Ganttly, logistics teams gain a clear overview of resources, routes, and timelines, making it easier to anticipate conflicts, adjust plans on the fly, and keep operations flowing efficiently, without obsessing over every single hour.

Master hourly planning on monday.com with Ganttly

With Ganttly – Hourly Gantt for monday.com, planning doesn’t have to be complicated. It turns complex schedules into clear, actionable timelines, giving your team the visibility and flexibility they need to keep operations running smoothly. 

Start your free trial today →

Scrum Sprint Reviews tips that will make your work more effective

Maximize your Sprint Review impact with essential Scrum Sprint Reviews tips. Learn how to create valuable discussions with stakeholders.

If your Sprint Review feels like just another demo, you’re missing out on one of Scrum’s biggest opportunities. Done right, it can change the direction of your product, and the way your team works. It’s a key moment in the feedback loop. It’s the time when stakeholders can influence where the product goes next, how the feature roadmap evolves, and what value should be delivered in upcoming Sprints.

When do you know that your Review went well? When everyone walks away with a shared understanding of where the product stands and where it’s headed.

This article shares practical tips to help you get real value out of your Sprint Reviews. We’ll walk through three stages: how to prepare, how to run the Review, and what to do afterward. The goal is to give you simple habits that make Sprint Reviews genuinely helpful for your team and stakeholders. 

Part I: Sprint Review preparation – laying the groundwork

First things first: before you jump into demos and discussions, a little preparation can set your Sprint Review on the right track. Here’s what you can do to set things up for a smooth one.

1. Maintain an incremental approach

A Review works best when you have something real to show, not just promises or plans. Aim to build a small, working part of the product every sprint – something you can click, test, or experience. This way your team gets to see visible progress and usable features. Break it down so that each sprint produces something stakeholders can see and react to. And if you’re hoping for specific feedback – maybe you have open questions about a feature, potential risks you want to validate, or opportunities you’d love a perspective on – let stakeholders know. Be clear about what kind of input you’re looking for.

2. Invite the right stakeholders

You want your Sprint Review to be a gathering of people who actually care about what’s happening. That means inviting the folks who will be affected by the changes you’re working on – business reps, users, technical experts, and the people who make decisions.

Make it clear why their input matters. When everyone knows why they’re present, they show up more engaged and ready to share real insights.

3. Establish and communicate a clear agenda

No one likes walking into a meeting and wondering what's going to happen. A simple agenda solves this. Include the scope of features or increments that will be presented, and even what's planned for upcoming sprints. This way stakeholders immediately see why their presence matters and how the topics connect to their work. Share the plan ahead of time and let people know when their part of the meeting will happen. Some teams even invite certain groups for specific slots – like feature demos or roadmap discussions – so everyone uses their time wisely. When the schedule is clear and the meeting feels organized, people are much more likely to show up and come back next time.

Part II: Sprint Review – delivering value and creating engagement

Now that you’re prepared, it’s showtime! The Review is your chance to share progress, get feedback, and make sure everyone is on the same page. Done right, it’s not just a demo – it’s a conversation that helps the team and stakeholders move forward together. How to orchestrate it step by step?

1. Present with a clear model

Start by giving a little context. Explain the sprint goal, the root cause or problem the team tackled during the sprint, and why it mattered. Then show the intended value: what you hoped to deliver and why it’s important. Finally, share whether the goal was achieved, and if it wasn’t, be honest about why. 

2. Demonstrate the working increment

Let the product speak for itself. Show the part of the product that’s actually working. Focus on the user or business value, not just the technical details. The more tangible and “real” it feels, the more likely you are to get meaningful feedback from the stakeholders.

3. Engage the whole team

This isn’t just a Scrum Master or Product Owner moment – everyone on the team can and should contribute. Developers can explain technical choices, dependencies, or improvements made during the sprint. Encourage questions and discussion so the team understands not just how something was built, but why it matters. 

The Review is also a great moment for the whole team to hear stakeholder feedback firsthand – both the concerns and the “aha!” moments. This builds real product awareness inside the team and helps them make better decisions when selecting backlog items for the next sprint.

4. Facilitate a constructive conversation

Keep things friendly, productive, and on track. Make sure discussions are ordered, and gently bring the conversation back on track if it drifts off-topic. Stick to the agenda and time slots. This way, everyone can share their insights without chaos.

See that some topics spark strong interest or lively discussion? If a particular feature, risk, or idea generates a lot of questions, it might be the right moment to plan a separate workshop or dedicated meeting. Remember: the Review is for gathering feedback and figuring out whether anything else is needed to deliver maximum value. It's not the place to go into detail on acceptance criteria. If something requires more time, contact stakeholders for a follow-up session so the conversation can happen without slowing down the Review.

5. Strengthen the feedback loop

Not everyone will speak up just because they’re in the room. Make it easy for them to share what they think. You can actively encourage feedback by asking open-ended questions like:

  • What do you think?
  • Why do you think so?
  • Does this increment need any changes?

Give them space to reflect. When they talk, take notes on what matters: ideas, concerns, assumptions, and opportunities for improvement. The ultimate goal? To turn opinions into actionable insights, not just collect polite nods or “looks good to me” comments.

6. Respect the time frame

Finish on time – it shows respect for everyone’s schedule and builds trust. Consistent, punctual Reviews make stakeholders more likely to engage meaningfully. And they will be more likely to join the next meetings if they see that a clear agenda is being followed and the discussion is actually constructive.

Part III: after the review

Once the Sprint Review wraps up, the work isn’t over just yet. This is the moment to turn all feedback and discussion into real actions that will guide the next Sprint. Think of this phase as the bridge between what you learned and what you’ll do next.

1. Run the retrospective

The Sprint Review and the Sprint Retrospective work best when you let them learn from each other. Bring the key insights from the Review straight into your Retro:

  • Did we deliver the value we planned?
  • What feedback did we hear and why?
  • How will this affect our future work?
  • What should we adjust so we don’t face the same issues again?

These conversations help the team understand not just what happened, but why. They also strengthen collaboration and prevent the same challenges from showing up again.

2. Update and adjust the product backlog

Feedback is useful if it actually lands somewhere. Make sure stakeholder insights make their way into the backlog quickly. Whether it’s a new idea, a needed adjustment, or a “let’s not do that again” decision, the backlog should reflect the latest understanding of what brings value.

3. Share a short, friendly recap

A small follow-up can make a big difference. Shortly after the Review, send stakeholders a summary focused only on the most critical outcomes. People appreciate having a single, scannable document they can refer back to. Be sure to include:

  • Key Decisions reached
  • Assigned Action Points (and who owns them)
  • Any Next Steps they should expect from the team
  • Direct Links to documentation, designs, or updated backlog items for self-service

And remember: simplicity and conciseness are your compass here.

Summary

With just a bit of preparation, the right direction, and some thoughtful follow-up, Sprint Reviews can be really valuable moments in your team’s product journey. Involve the right people, show real progress, invite honest feedback, and turn that input into meaningful next steps. Keep it simple, keep it human, and keep improving. If your Sprint Reviews start feeling more like helpful conversations rather than meetings, you’ll know you’re on the right track.

Case Study: Validating 9 LLMs for complex document analysis


For many organizations, extracting critical information from documents is a notorious pain point. The critical data is always there, but it’s often buried in dozens of pages of dense, technical text. Missing a key compliance rule or a deadline can mean financial risk and weeks of wasted effort. Our goal was to eliminate this manual bottleneck for our client by building an AI-powered solution.

We tested nine LLMs against two non-negotiable mandates: the need for 100% accuracy and the challenge of finding a way to prove and validate that accuracy at scalable speed. Here is the journey that led us to an optimized, cost-effective model and the development of an AI Jury – a validation process over 180 times faster than human review.


The Pain: Why manual document analysis fails

The challenge for our client was not just volume, but the density and complexity of their documents. They faced critical risks and productivity bottlenecks because:

  • Information is buried: Key data is scattered across long documents, making manual extraction time-intensive.
  • High cost of error: A single factual mistake in compliance or a contract term can result in costly legal or operational issues.
  • Zero scalability: The reliance on human domain experts meant that the validation process itself was a massive bottleneck that could not scale as the organization grew.

This high-stakes environment meant any AI solution could not just be “good” – it had to be perfectly trustworthy.

The Criteria: Our mandates for an AI solution

To solve this pain point, our AI solution had to meet two non-negotiable criteria.

Criteria 1: 100% Accuracy and nuance

The AI had to perform two distinct tasks on every document with expert-level precision:

  1. Complex classification: This is about understanding complex rules and nuance. The model needs to read the entire document and answer questions like, “Is this project compliant with new regulations?” or “Does this agreement require a co-signature?” The answer isn’t just a simple “yes” or “no”; it’s often nuanced, like “Yes, but only as part of a consortium.”
  2. Extraction: This is more straightforward data retrieval. The model needs to find specific data points, such as the project deadline, the total contract value, or the list of eligible regions.

Criterion 2: Explainability and trust (the evidence field)

To build essential user trust and satisfy the requirements, the system had to go beyond a simple answer. For every output, the AI was required to provide two fields:

  • value: The AI’s final answer (e.g., a specific date, a list of countries, or a True/False value).
  • evidence: The AI’s explanation for its answer, which often includes a direct quote from the source text.

This evidence field became one of the most important parts of our evaluation. It not only allows a user to instantly verify the AI’s conclusion, but it also provides the critical context that a simple value field would miss, building essential trust in the system.
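To make the shape of that output concrete, here is a minimal illustrative example of what a single classification and a single extraction result could look like. The value and evidence field names come from the requirements above; the concrete keys, quotes, and values below are invented purely for illustration.

# Illustrative only: the "value"/"evidence" structure is described above,
# but the field names and contents here are made up for the example.
example_output = {
    "requires_co_signature": {
        "value": True,
        "evidence": "Applications submitted as part of a consortium must be co-signed by all members.",
    },
    "project_deadline": {
        "value": "2026-03-31",
        "evidence": "Proposals must be submitted no later than 31 March 2026.",
    },
}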

Our evaluation metrics

When evaluating LLMs, you can measure many things. Public benchmarks available online test for speed (tokens per second), cost (price per million tokens), complex reasoning, coding abilities, and more.

For our specific use case, however, our priorities were very clear:

  1. Accuracy was the #1 priority. A wrong answer about a key piece of information is worse than no answer at all. The system had to be trustworthy.
  2. Cost was a secondary, but important, metric. While our goal was 100% accuracy on all fields, we were interested in the cost-to-performance ratio. A model that was 10x the price would need to prove it was significantly better than a cheaper alternative that could be tuned to perfection.
  3. AI response time was irrelevant. Our tool was designed to process and classify a new document in the background, store the results, and notify the user if needed. Whether the analysis took 10 seconds or 1 minute made no practical difference to the user experience.

With this framework, we were ready to set up our first test: a direct comparison against a human expert.

The Experiment Part 1: The human-led test

Our first step was to establish a “gold standard.” A human domain expert manually analyzed three complex documents, creating a perfect set of classifications and extractions for each.

Next, we ran the same three documents through our 9 candidate LLMs. The expert then meticulously compared each model’s output (all 9 sets of answers) against the gold standard.

This review was more complex than just checking for “correct” or “incorrect.” We found that most models were factually accurate on simple extractions. The real difference was in the quality and usefulness of the evidence and the interpretation of nuanced rules.

Because of this, the expert used a 3-point “preference” scale, where a lower score is better:

  • Excellent (1): The answer is accurate, and the evidence is complete and highly useful.
  • Good (2): The answer is accurate, but the evidence is weak, partially missing, or could be explained better.
  • Low usefulness / Incorrect (3): The answer is factually wrong, or it’s technically correct but so poorly explained that it isn’t useful.

Here are the averaged scores across the test documents, which give a clear picture of the performance spread:

Model | Average score for all documents (lower is better)
GPT-5 Mini | 1.8
GPT-5 | 2.0
Gemini 2.5 Pro | 2.0
Gemini 2.5 Flash | 2.0
GPT-5 Nano | 2.1
GPT-4.1 Mini | 2.2
Claude Sonnet 4.5 | 2.3
Claude Opus 4.1 | 2.4
GLM 4.6 | 2.5

Key Discovery: High accuracy wasn’t limited to top-tier models

This manual process was slow and laborious. It took our expert over 3 hours just to manually classify a single document – but the insights were invaluable:

  • Some models performed poorly: We immediately identified and removed several consistently low-scoring models from future consideration.
  • Unexpected issues appeared: Problematic behaviors emerged (e.g., date unawareness, language switching). We determined these were often “quirks” fixable through more specific prompt instructions.
  • Cheaper models kept pace: We found that high accuracy wasn’t limited to the most expensive, top-tier models. This proved our accuracy goal was achievable with cheaper alternatives, shifting our focus to finding the best cost-to-performance ratio.
  • Pivoting to a target model (GPT-5 Nano): Based on its strong performance and lower cost, we selected GPT-5 Nano as the target. Through iterative prompt engineering, we added instructions to fix its quirks and elevate its performance to the required “excellent” standard.

This process was a success, but it was far too slow to be repeatable. We needed a way to run these evaluations automatically, so we could verify that accuracy stayed high whenever we added more classification rules or encountered a particularly challenging document. This led us to our next idea: if a human expert can grade the AI, could another AI do it for us?

The Experiment Part 2: Assembling the AI Jury for validation

The “Why”: A need for speed

Our human-led review was a success. It gave us invaluable insights and, most importantly, confirmed our key finding: high accuracy wasn’t exclusive to flagship models.

But this process had a fatal flaw: it was painfully slow (over 3 hours per single document). This was a massive bottleneck. We couldn’t possibly repeat this process to test our prompt refinements, validate new models, or run continuous quality checks. It just wasn’t scalable.

We needed a way to automate the evaluation itself. If a human expert can grade an AI’s output, could another AI do it for us?

The Methodology: An AI Jury of peers

We decided to build an “AI Jury” to act as a proxy for our human expert. We selected three powerful, flagship models to serve as the jurors: GPT-5, Gemini 2.5 Pro, and Claude Sonnet 4.5.

The methodology was simple:

  1. We gave each juror the exact same “Correct Answer Sheet” (the “gold standard”) our human expert had created.
  2. We then gave them each of the 9 candidate models’ outputs, one by one.
  3. Their task was to compare the model’s answer to the ground truth and provide a score from 1-10, based on a detailed rubric.

To ensure the jury was meticulous and impartial, we gave them a very specific system prompt. This prompt instructed them to act as expert document evaluators and defined the exact scoring rubric for their 1-10 rating.

For those interested in the exact details, here’s the prompt we provided to each juror:

system_prompt = """
You are a meticulous and impartial expert analyst. Your sole task is to analyze the performance of an AI model that was tasked with classifying, extracting, and summarizing information from a complex technical document.

You will be given two pieces of information:

1.  The Correct Answer Sheet: This was prepared by a human expert and is considered the "ground truth"
2.  The Model's Answer: This is the output from the AI model you are evaluating. Most of the answers contain the AI's explanation of the decision.

Your instructions are as follows:

1.  Compare Field by Field: For each classification and extraction field, carefully compare the value in "The Model's Answer" against the value in "The Correct Answer Sheet."
2.  Evaluate Accuracy, Nuance, and Evidence: Assess the factual correctness and nuance of the value. Additionally, evaluate the evidence provided by the model. Useful evidence is a direct quote from the source text that clearly supports the value. It should help a human user quickly verify the information.
3.  Evaluate the Document Summary: Read the document_summary. A high-quality summary must coherently include the document's main purpose, key entities involved, critical dates, financial figures, and any major compliance or eligibility requirements.
4.  Provide an Overall Score: Give a single, overall score from 1 to 10 based on the model's total performance across all tasks, using the scoring rubric below.
5.  Provide a Justification: Write a brief, clear justification for your score, highlighting the model's key successes and failures in classification, extraction, evidence quality, and summary generation.

SCORING RUBRIC
10 (Perfect): The model's answers are a perfect match with the expert's in both value and nuance. The evidence is always a relevant, helpful quote. The summary is flawless.
9 (Excellent): The model's answers are correct but might have very minor wording differences. The evidence is consistently good. The summary is complete and well-written.
7-8 (Good): The model is largely correct but has one or two minor errors in value, or the evidence is sometimes weak/irrelevant. The summary might be missing one or two non-critical details.
5-6 (Acceptable): The model is partially correct but has significant errors or omissions. The evidence is often missing or unhelpful. The summary is incomplete and misses key details.
3-4 (Poor): The model is mostly incorrect, containing major factual errors. The evidence is poor. The summary is badly written or missing most key information.
1-2 (Very Poor): The model's answer is completely wrong, irrelevant, or fails to follow instructions entirely.
"""

user_prompt = """
DATA FOR EVALUATION

1. The Correct Answer Sheet (Human Expert):
{{ correct_answer.content }}

2. The Model's Answer:
{{ model_answer.content }}
"""

The Verdict: The jury confirms the strategy

The AI Jury’s scorecard

The AI Jury processed all 9 models and returned their scores. We then averaged the scores from all three jurors to get a final, blended rating for each model.

Here are the complete results (higher is better):

Model | Juror 1 (GPT-5) | Juror 2 (Gemini 2.5 Pro) | Juror 3 (Claude Sonnet 4.5) | Average score
GPT-5 | 7 | 8 | 8 | 7.67
Gemini 2.5 Pro | 5 | 8 | 8 | 7.00
GPT-5 Nano | 6 | 7 | 7 | 6.67
Claude Sonnet 4.5 | 6 | 7 | 7 | 6.67
GPT-5 Mini | 5 | 8 | 7 | 6.67
Gemini 2.5 Flash | 5 | 7 | 7 | 6.33
GLM 4.6 | 4 | 7 | 6 | 5.67
Claude Opus 4.1 | 5 | 6 | 6 | 5.67
GPT-4.1 Mini | 4 | 6 | 6 | 5.33

The scores themselves were interesting, but the real breakthrough was that the AI Jury’s findings almost perfectly mirrored the conclusions from our human-led review.

The models clustered into the same groups. The same strong performers (GPT-5, Gemini 2.5 Pro, GPT-5 Nano) scored well, and the exact same models that our human expert flagged as problematic (like GLM 4.6 and Claude Opus 4.1) received the lowest scores from the jury.

This confirmed our approach. The jury wasn’t just spitting out random numbers; it was successfully identifying the same strengths and weaknesses our expert did. This gave us confidence that we could rely on this automated process for future testing.

The Killer Metric: Several hours vs. < 1 minute

The most game-changing discovery wasn’t just that the AI Jury worked, but how fast it worked.

  • Human expert: Several hours of painstaking, manual classification and comparison.
  • AI Jury: Less than 1 minute for all three jurors to evaluate all 9 models.

This is the force multiplier we were looking for. We now had a method to validate our work that was over 180 times faster than our original process (roughly three hours, or 180 minutes, versus under a minute). This speed unlocked the ability to do things that were previously impossible. We could now run a full evaluation every time we tweaked a prompt, test a new model the day it’s released, or run continuous quality monitoring. All at a fraction of the human cost and time.

What we learned

This two-part experiment yielded a crucial insight: our final deliverable wasn’t just an optimized model, but an entirely new, scalable process for testing and validation. Our key takeaways have less to do with any single model and more to do with a modern strategy for applying AI.

Key Insight 1: High accuracy doesn’t always mean high cost

Our initial hypothesis was that accuracy was king, and we stuck to that. We would have chosen a more expensive, flagship model if it was the only one that could deliver 100% accuracy.

The surprising discovery was that we didn’t have to. For a task like complex document classification and data extraction, even the cheaper, smaller models like GPT-5 Nano were already performing at a very high level. The difference wasn’t a huge gap in accuracy, but minor “quirks”. Through iterative prompt engineering, we were able to tweak GPT-5 Nano to eliminate those issues and achieve the perfect, expert-level results we required.

This is a critical finding: for specific, well-defined tasks, the most powerful flagship models are not always necessary. A more cost-effective alternative, when paired with careful prompt refinement, can perform at the exact same level, mitigating or even eliminating any initial performance difference.

Key Insight 2: The AI Jury as a force multiplier

The “Several hours vs. < 1 minute” metric says it all. Automating our evaluation process is a genuine force multiplier. This isn’t just a one-time cost saving; it fundamentally changes how we can work. We can now:

  • Iterate rapidly: Tweak a prompt to fix a “quirk” and get immediate feedback on its impact.
  • Test continuously: Run our AI Jury as part of an automated workflow to ensure the quality of our chosen model never degrades over time.
  • Validate new models instantly: When a new model is released, we can add it to our test suite and know how it stacks up against our champion (GPT-5 Nano) in minutes, not days.

An AI-driven evaluation process is a viable and powerful tool, allowing for a level of speed and iteration that is impossible with human-only evaluation.
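As an example of what testing continuously can look like in practice, a jury run can be turned into a simple automated quality check. The sketch below reuses the hypothetical evaluate_candidates() helper from the earlier jury-loop sketch; the 7.0 threshold and the model key are illustrative values, not figures from our project.

# Illustrative quality gate built on the earlier (hypothetical) jury-loop sketch.
def quality_gate(gold_standard: str, champion_output: str, threshold: float = 7.0) -> bool:
    """Return True if the champion model still meets the minimum average jury score."""
    scores = evaluate_candidates(gold_standard, {"gpt-5-nano": champion_output})
    return scores["gpt-5-nano"] >= threshold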

Key Insight 3: Prompt engineering still rules

Choosing a model is only the first step. The biggest performance gains, and the final push to perfect accuracy, came from carefully refining our system prompts.

Our initial human-led review was critical for finding the “quirky” failures, like models not being date-aware or switching languages. We solved these problems not by switching to a more expensive model, but by adding clearer instructions to our prompt. The model itself is the engine, but the prompt is the steering wheel.
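To give a feel for what those clearer instructions can look like, here is a short illustrative snippet. It does not reproduce our production prompt; the rules below are hypothetical examples of how the date-awareness and language-switching quirks can be addressed.

# Hypothetical example of quirk-fixing instructions appended to a system prompt.
# Both strings below are simplified stand-ins, not our actual prompt text.
base_prompt = "You are an expert document analyst. Classify and extract the fields defined below."
QUIRK_FIXES = (
    "\nAdditional rules:\n"
    "- Today's date is {current_date}. Interpret all relative dates against it.\n"
    "- Always answer in English, even if the source document is written in another language.\n"
)
# {current_date} would be filled in at runtime before the prompt is sent.
classification_prompt = base_prompt + QUIRK_FIXES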

Conclusion: Our new AI-powered workflow

Our journey started with a simple question: “Which AI model is best?” We tested 9 models, and the answer was that for our non-negotiable high-accuracy needs, several models were up to the task – including the cost-efficient GPT-5 Nano. We didn’t pivot away from accuracy; we proved that after careful prompt refinement, we could achieve our accuracy goals with a more economical model.

Our final workflow is now a hybrid, human-in-the-loop system:

  1. Human oversight: A human expert still sets the “gold standard” by creating the perfect answer sheet for any new or particularly complex task.
  2. AI-powered evaluation: The AI Jury uses this gold standard to provide scalable, near-instantaneous evaluation of all candidate models.
  3. Optimized AI execution: This allows us to continuously optimize and confidently use our chosen model, GPT-5 Nano, for the live classification and extraction tasks.

To ensure our quality remains high in production, we also integrated a simple feedback system into our tool. If a user spots a result that isn’t accurate, they can flag it. This allows us to react quickly, identify new edge cases, and run our evaluation suite again, creating a continuous loop of improvement.

The future of applied AI isn’t just about finding one single, powerful model to do a job. It’s about building systems where multiple AIs work together with human oversight. One AI to perform the task, and another to validate it. To create solutions that are efficient, reliable, and, most importantly, scalable.

Need a custom AI solution?

At SolDevelo, we specialize in building and integrating custom AI solutions just like this one. If your organization needs help with AI-powered data processing or automation, check our offer.

The post Case Study: Validating 9 LLMs for complex document analysis appeared first on SolDevelo.

]]>
https://soldevelo.com/blog/case-study-validating-9-llms-for-complex-document-analysis/feed/ 0
How continuous feedback loops drive product improvement and innovation https://soldevelo.com/blog/how-continuous-feedback-loops-drive-product-improvement-and-innovatiaon/ https://soldevelo.com/blog/how-continuous-feedback-loops-drive-product-improvement-and-innovatiaon/#respond Tue, 28 Oct 2025 14:05:43 +0000 https://soldevelo.com/?p=15713 Imagine releasing a new app feature and discovering users aren’t engaging with it the way you expected. So, what went wrong? Often, it’s a missing feedback loop. A feedback loop is basically an ongoing dialogue between your users and your product team. It consists of five key steps: collecting, organizing, implementing, following up, and communicating […]

The post How continuous feedback loops drive product improvement and innovation appeared first on SolDevelo.

]]>
Imagine releasing a new app feature and discovering users aren’t engaging with it the way you expected. So, what went wrong? Often, it’s a missing feedback loop. A feedback loop is basically an ongoing dialogue between your users and your product team. It consists of five key steps: collecting, organizing, implementing, following up, and communicating feedback. So, you put a feature out there, users try it, share what works (and what doesn’t), your team learns from that feedback, makes improvements, and lets users know about the changes. Then, the cycle restarts with fresh feedback.  Over time, this creates a system that continually improves your product and keeps it aligned with real user needs.

This process is at the heart of building products that truly put users first. Skip it – or collect feedback messily – and you risk creating features that miss the point. Without a systematic approach, product teams can drift away from the people who actually use their apps. A well-designed feedback loop changes all that: it not only improves product quality but also earns users’ trust and keeps them coming back.

Feedback loops aren’t just a nice-to-have – they can make or break a product. In this article, we’ll explore how they help build better products, and share how our team put this approach into practice while building the Advanced File Gallery app for monday.com.

Gathering user feedback: User tests combined with broad insights 

Every feedback loop begins with one simple rule: listening to users. It’s not about guessing or relying on assumptions – it’s about understanding how people actually use your product, what excites them, and where they get stuck. Behind every comment or opinion lies a real user need, which should guide your development decisions.

Collecting feedback early and often is key. That’s exactly how we approached building Advanced File Gallery. Before releasing the app, we ran user tests with early adopters from the monday.com community to validate our assumptions about the item view and overall design. The insights we gathered confirmed we were on the right track and helped shape the product from the very beginning.

Next, we reached out to a wider audience. We posted in the monday.com community to start conversations and gave quieter users a chance to share their thoughts on their own time. To make participation more inviting, we offered free subscriptions as a thank-you and used simple feedback forms to gather practical, actionable opinions. This step turned out to be incredibly valuable. It revealed a need we hadn’t even considered: users wanted a way to filter files by specific criteria. That single insight made a huge difference in usability and overall flow. Before, users had to manually dig through files just to find and add the right ones to a collection. Now, with filtering, it only takes a few clicks to add a bunch of files at once. Simple, but it saves tons of time and makes the whole experience way smoother.

The takeaway? Ask the right questions where your users already are, and show them that their feedback genuinely matters. That kind of engagement doesn’t just improve your product — it uncovers insights you might never get otherwise.

Internal feedback loops as fresh perspectives 

Feedback doesn’t just come from users — great ideas can come from your own team, too. That’s why internal feedback is a regular part of our process. During sprint reviews, team members who aren’t directly involved in the project get a chance to take a fresh look at the app. Their outside perspective often surfaces things we might have missed while being too close to the work every day. For instance, during one sprint review, someone suggested a simple “Select All” feature for managing filtered files. It was a small idea, but it made a huge difference for usability. 

By blending external user feedback with internal insights, we get a fuller picture of what works, what doesn’t, and what could make the experience even better. It’s a reminder that innovation can come from anywhere – not just from your users.

Mixing qualitative & quantitative feedback: Tools and techniques

Gathering feedback is essential, but it’s only half the story. Understanding it is where the magic happens. To get a complete picture of how users experience the app, we combine qualitative feedback (the “why”) with quantitative data (the “what”).

Qualitative feedback comes from user comments, surveys, and usability tests. It helps us uncover motivations, frustrations, and the context behind users’ actions. Quantitative data, on the other hand, includes metrics like clicks, time spent, and conversion rates. These numbers reveal how people actually interact with the product and help validate our assumptions.

Bringing the two together allows us to make evidence-based decisions, rather than guessing what users want. For example, a user might say, “It’s hard to find certain files,” and the analytics may confirm a low click rate on the filter button. It’s a clear signal for improvement, as it usually means the placement of the filter button isn’t very practical or intuitive. So even though the feature exists, users aren’t taking advantage of it simply because they can’t easily access it or don’t realize it’s there. 

Using a variety of tools and channels ensures the process is thorough and effective, helping us capture everything that matters. The goal is simple and clear: to understand both what users say and what they actually do.

Closing the loop: Communicating back to users

Feedback only matters if you act on it – and that’s exactly what closing the loop is all about. Showing users that their ideas lead to real improvements builds trust and makes them champions of your product.  

With Advanced File Gallery, we use several approaches to close the loop. We send direct messages to thank users for their feedback and highlight improvements inspired by their input. We also share updates publicly on LinkedIn and in the monday.com community. Whenever someone mentions a feature in a thread, we make sure to tag them and give them a little shout-out.

This kind of transparent communication sparks a positive cycle: users feel invested in the product, share even more ideas, and are more likely to recommend the app to others. It creates a culture where feedback isn’t just data – it becomes action.

In short, closing the loop doesn’t just improve your app: it builds lasting relationships with the people who use it.

Product improvement and innovation driven by consistent feedback loops

Every great product has one thing in common: it never stops learning from its users. Continuous feedback loops turn that learning into action. Every comment, survey response, or data point becomes part of a bigger conversation that drives the product forward — moving teams from “we think” to “we know.” It’s a rhythm of testing, listening, and improving that keeps innovation alive long after launch day.

When users see their feedback shape a new feature or inspire a smoother workflow, something powerful happens: they start to feel like co-creators, not just customers. That sense of involvement builds trust, and trust naturally grows into loyalty. Over time, these small cycles of listening and responding create more than just better apps – they build a relationship where users feel their voices truly matter. And that’s where real innovation happens: not in a meeting room, but in the conversation between teams and the users who inspire their products.

Key takeaways: Building a culture of continuous feedback

Feedback isn’t just data – it’s context. It’s not only about what users say, but why they say it.

Sometimes, a single comment from a user test can reveal a deeper UX issue than dozens of survey results, as it uncovers the reasoning behind users’ behavior. A handful of smart approaches to feedback can transform insights into meaningful product improvements.

  • Collect feedback early and often. Don’t wait for a major release to start listening – small insights can prevent big problems later.
  • Combine multiple sources. Mix qualitative feedback (comments, interviews, usability tests) with quantitative data (clicks, time spent, conversion rates) to make evidence-based decisions.
  • Close the loop with users. Show that their input leads to real changes. Even small acknowledgments build trust and boost engagement.
  • Make it part of your culture. Encourage teams to see feedback as a living process, not a one-time checkbox. When listening and iterating become habits, products improve faster, and users feel heard.

Through continuous feedback loops, you don’t just create better apps – you transform feedback into high-quality features and lasting user relationships.

The post How continuous feedback loops drive product improvement and innovation appeared first on SolDevelo.

]]>
https://soldevelo.com/blog/how-continuous-feedback-loops-drive-product-improvement-and-innovatiaon/feed/ 0
Worklogs now Runs on Atlassian: Migrating to Forge https://soldevelo.com/blog/worklogs-now-runs-on-atlassian-migrating-to-forge/ https://soldevelo.com/blog/worklogs-now-runs-on-atlassian-migrating-to-forge/#respond Fri, 17 Oct 2025 09:26:13 +0000 https://soldevelo.com/?p=15613 Worklogs now carries the Runs on Atlassian badge - a mark of trust that ensures your data is secure and the app is built on Forge.

The post Worklogs now Runs on Atlassian: Migrating to Forge appeared first on SolDevelo.

]]>
Is data security on your mind? It definitely is on ours. That’s why we made sure to earn the Runs on Atlassian badge for our Worklogs – Time Tracking, Time Reports, Timesheets for Jira. What it means is that our app runs entirely on Atlassian-hosted infrastructure, aligns with the host Atlassian app’s data residency, and gives you full control over data egress, including the ability to block it completely. That way, all your data is safe and fully protected, and you decide where it goes.

A screenshot of Worklogs - Time Tracking, Time Reports, Timesheets by SolDevelo from the Atlassian Marketplace, with the Runs on Atlassian badge highlighted under a Trust Signals column.

Forge: The (near) future of Atlassian apps

Earlier this year, Atlassian announced the end of support for Connect – a development framework used by vendors to build apps. Eventually, Connect is to be replaced by Forge – a much more standardized, security-focused platform.

Why Forge wins for customers

Aspect | Atlassian Connect | Atlassian Forge
Hosting | Apps are hosted externally by the developer (e.g., AWS, Azure). | Apps are hosted on Atlassian’s own infrastructure.
Security | Security depends on how the developer configures and maintains their hosting environment. | Security is managed by Atlassian – apps run in a sandboxed, Atlassian-controlled environment.
Data residency | Data residency is determined by the developer’s hosting location and may differ from the customer’s Jira/Confluence region. | Data residency matches the host Atlassian product, ensuring compliance and consistency.
Performance & Reliability | Relies on the developer’s servers and APIs; performance may vary. | Runs directly within Atlassian Cloud, benefiting from the same reliability and scalability as Atlassian products.
Data egress | App data can be sent anywhere the developer’s systems are hosted. | Customers control data egress (e.g., analytics, logs) and can block it entirely.
Development model | Traditional web app model – more flexible but requires managing infrastructure. | Serverless model – faster to deploy, easier to maintain, and automatically scales.
Access & Permissions | Uses OAuth and JWT for communication between the app and Atlassian. | Uses the Forge platform’s fine-grained permission model with automatic consent screens.

Atlassian’s shift towards security

Data is one of the most valuable resources in today’s digital world, and so it’s crucial to build platforms and products that prioritize keeping it safe. Atlassian has significantly intensified its focus on security in recent years, particularly since 2024. Forge migration is just one of many security-focused initiatives designed to meet modern customers’ needs. Others include:

  • Atlassian Guard: A comprehensive security suite that offers centralized identity and access management, data loss prevention, and threat detection. Guard enables organizations to enforce security policies, detect suspicious activities, and respond swiftly to potential threats, ensuring robust protection across Atlassian cloud products.
  • Atlassian Government Cloud: A dedicated cloud environment designed to meet the stringent security and compliance requirements of U.S. government agencies and their contractors. It has achieved FedRAMP Moderate authorization, providing enhanced controls, continuous monitoring, and a separate perimeter from Atlassian’s commercial cloud.

Learn more on Atlassian’s security practices →

Runs on Atlassian: An indicator of future-focused technology

We’re proud to have earned the Runs on Atlassian badge. It shows our commitment to high quality solutions centered around the customer’s best interest.

Our users’ security is of the utmost importance to us, and we will do everything we can to earn your trust. Migrating to Forge is one of the steps we’re taking to get there – it provides a stronger, safer, and more reliable foundation for our apps. Here’s why:

  • Built for security: Forge apps run entirely within Atlassian’s infrastructure, eliminating risks tied to external hosting.
  • Aligned with data residency: Your app data stays in the same region as your Jira instance, ensuring compliance and consistency.
  • Full control over data flow: You can manage or completely block data egress, giving you total control over where your information goes.
  • Future-ready platform: With Atlassian phasing out Connect, Forge is the supported, long-term framework that ensures compatibility and stability.
  • Simplified maintenance and scalability: Forge’s serverless model allows faster updates, automatic scaling, and less downtime.
  • Transparency and trust: By adopting Forge, we align with Atlassian’s highest security and privacy standards – so you can focus on your work with peace of mind.

Seamless migration

For existing users, migrating to Forge is simple – all it requires is for a Jira admin to accept the major update. They can do it by going to admin.atlassian.com → Apps → Sites → Choose your site → Connected apps → Worklogs (view app details) → Update.

Once confirmed, the app will upgrade automatically, and no data, configuration, or saved reports will be lost. You’ll still be able to find them in the Saved reports section of the app.

New customers can enjoy all the benefits of a Forge-based app right from the start. Try Worklogs now →

The post Worklogs now Runs on Atlassian: Migrating to Forge appeared first on SolDevelo.

]]>
https://soldevelo.com/blog/worklogs-now-runs-on-atlassian-migrating-to-forge/feed/ 0
Making patient reviews faster and clearer with OpenMRS https://soldevelo.com/blog/making-patient-reviews-faster-and-clearer-with-openmrs/ https://soldevelo.com/blog/making-patient-reviews-faster-and-clearer-with-openmrs/#respond Thu, 16 Oct 2025 09:59:31 +0000 https://soldevelo.com/?p=15602 Discover how patient reviews are transformed in OpenMRS, making it easier for professionals to access key patient information.

The post Making patient reviews faster and clearer with OpenMRS appeared first on SolDevelo.

]]>

As an open-source medical record system, OpenMRS is on a journey of constant improvement, always striving to provide healthcare professionals with the best user experience possible. With their comfort in mind, SolDevelo recently partnered with MSF and Madiro to deliver a more intuitive view for patient encounter reviews.

What made this initiative truly special was that it was developed in close alignment with the OpenMRS community. Every step of the process – from planning to implementation – was discussed and reviewed collaboratively to ensure the final solution met community standards and real user needs.

The challenge: Unreadable Patient Summary

Up until recently, healthcare professionals who wished to review patient encounter form results needed to painstakingly go through an unordered list of key-value pairs.

The old patient summary view

This imperfect view meant that users had to invest more time and effort to read and understand key information about their patients.

The solution: Embedding the O3 form view

The goal was to make patient review more readable. To achieve that, we’ve decided to embed the original form layout directly into the page. The information is now presented in a more clear manner that preserves field order, grouping, and labels – making the summary much easier to follow for clinicians.

Entry form (filled out by patients)
Patient summary (viewed by clinicians)

Flexible view

Users can switch between the legacy key-value list and the new embedded summary according to their preferences. 

To further improve visibility, we’ve added an option to hide unanswered fields in read-only view. This reduces clutter and makes summaries easier to read.

OpenMRS patient summary view with empty fields hidden

Increased clarity

We’ve refined the Form Engine to make data presentation clearer and more intuitive:

  • Interactive elements are now removed in read-only mode to prevent confusion.
  • Expressions that evaluate to 0 are now correctly displayed, ensuring meaningful zero values are visible while null or undefined results remain hidden.

Supporting healthcare in every detail

These improvements go beyond visual polish – they directly impact the daily work of healthcare professionals. By making patient summaries clearer, better structured, and easier to read, clinicians can focus on understanding their patients’ conditions and making informed decisions quickly. The visual consistency with the original form layout enhances trust and ensures the information feels well-organized.

The added flexibility to switch views and hide unnecessary information ensures that each user can tailor the interface to their workflow, reducing cognitive load and time spent navigating data.

Built together with the OpenMRS community

One of the most valuable aspects of this project was the way it was developed – in full collaboration with the OpenMRS community. We consulted every major step, followed official review processes, and maintained transparent communication throughout. The final solution fully adheres to OpenMRS standards and addresses community-identified needs.

This open and collaborative approach ensured the project’s long-term sustainability and visibility as a shared contribution from SolDevelo and MSF. It also delivered greater value to the client – not only through a technical improvement but through a solution that strengthens and supports the broader OpenMRS ecosystem.

The journey continues

With these updates, OpenMRS continues to evolve into a more intuitive, efficient, and clinician-friendly system – empowering healthcare providers to deliver better care with greater confidence.

Empower your healthcare delivery with OpenMRS. Contact us to learn more →

The post Making patient reviews faster and clearer with OpenMRS appeared first on SolDevelo.

]]>
https://soldevelo.com/blog/making-patient-reviews-faster-and-clearer-with-openmrs/feed/ 0