Why 100% Test Coverage Can Mislead Developers in Kotlin & CI Pipelines
Every developer knows the thrill of a green CI pipeline. Yet passing tests can mislead developers, giving a false sense that “100% coverage” guarantees correctness. Coverage only shows which lines ran—it says nothing about real-world behavior. A function may pass a thousand tests while production silently corrupts data.
The Poison of Silent Defaults
One of the most common ways “passing tests” mislead us is through the reckless use of default values. In Kotlin or Swift, it’s tempting to use the Elvis operator to “fix” a nullability issue. It keeps the compiler happy and makes the tests easy to write, but it’s a silent killer for business logic.
// The "Safe" Code that Ruins Your Data
data class Transaction(val id: String, val amount: Double?)

fun processPayment(tx: Transaction) {
    val finalAmount = tx.amount ?: 0.0
    bankApi.charge(finalAmount)
}
A unit test for this is trivial. You pass a null amount, you assert that finalAmount is 0.0, and the test turns green. You celebrate. Meanwhile, in production, a bug in the upstream service sends a null amount for a $5,000 invoice. Your system “safely” processes it as $0.0. No error is thrown. No alert triggers. The logs show a successful “Green” transaction. You’ve just lost five grand because your test validated the syntax of the default value rather than the integrity of the domain rule.
A passing test that confirms a bad default is worse than no test at all. It provides a false sense of security that prevents you from implementing real validation.
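A fail-fast version of the payment handler above might look like this. It is a sketch: BankApi here is a hypothetical stand-in for the real payment client, and the exact validation rules are a product decision.

```kotlin
data class Transaction(val id: String, val amount: Double?)

// Hypothetical stand-in for the real payment client.
object BankApi {
    fun charge(amount: Double) = println("charged $amount")
}

fun processPayment(tx: Transaction) {
    // Fail fast: a missing amount is an upstream bug, not a $0.00 invoice.
    val finalAmount = requireNotNull(tx.amount) {
        "Transaction ${tx.id} arrived without an amount"
    }
    require(finalAmount > 0.0) { "Non-positive amount on ${tx.id}: $finalAmount" }
    BankApi.charge(finalAmount)
}
```

Now the null $5,000 invoice throws an IllegalArgumentException that pages someone, instead of charging $0.0 and logging success.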
The Coverage Illusion
Coverage is a structural metric, not a logical one. You can achieve 100% line coverage by running “Happy Path” data that never hits the dark corners of your logic. Most developers test what they expect to happen, but production is defined by what you don’t expect.
Consider a simple discount calculator. You might have tests for VIP users, regular users, and guest users. All lines are covered. But what happens if the userType is an empty string? What if the price is negative due to a rounding error elsewhere? If those scenarios aren’t in your test suite, your “100% coverage” is a vanity metric. You are essentially driving a car with a speedometer that works perfectly but a steering wheel that isn’t connected to the wheels.
// Coverage: 100% | Reality: Broken
fun applyDiscount(price: Double, code: String?): Double {
    return if (code == "SUMMER24") price * 0.8 else price
}

@Test
fun testDiscount() {
    assertEquals(80.0, applyDiscount(100.0, "SUMMER24"))
    assertEquals(100.0, applyDiscount(100.0, "WINTER"))
}
The tests above cover every branch. But they miss the fact that code could be “summer24” (lowercase), or that price could be 0. The developer sees the green badge and moves on, leaving a pile of edge-case landmines for the next on-call engineer to step on. Real testing requires boundary analysis, not just line execution.
We need to stop asking “is this line covered?” and start asking “what happens if this input is total garbage?” Only then does the green build actually mean something.
The Mocking Circus: Testing in a Sandbox Reality
Mocking is the ultimate double-edged sword. Done right, it isolates logic. Done wrong—which is 90% of the time—it creates a “hallucination layer.” You aren’t testing your code; you’re testing your assumptions about how other code works. If your assumption is wrong, your test is a green lie. We spend hours configuring whenever(api.call()).thenReturn(data), building a perfect little world where the network never fails, the database is always fast, and the JSON schema never changes.
The danger here is Structural Over-specification. You end up testing the implementation instead of the behavior. If you refactor a function to use a different internal service but the output remains the same, your tests should stay green. If they break because a specific mock wasn’t “called twice with these exact arguments,” you aren’t writing safety nets—you’re writing overhead.
// Brittle, Useless Mocking
@Test
fun testUpdateUser() {
    val repo = mock<UserRepository>()
    val service = UserService(repo)
    service.updateEmail("1", "new@example.com")
    // This only checks if the method was called.
    // It doesn't check if the data was actually valid or saved correctly.
    verify(repo).save(any())
}
In production, repo.save() might throw a ConstraintViolationException because the email is already taken. Your mock doesn’t care. It happily reports “Verified!” while the real-world system crashes. Over-mocking hides the Contractual Failures between modules. You’ve tested the plumbing, but you haven’t checked if the water is toxic.
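One alternative is a behavioral fake: an in-memory repository that enforces the same uniqueness contract the real database does. Everything below is a hypothetical sketch (UserRepository and UserService mirror the snippet above, and DuplicateEmailException stands in for the real constraint violation), but it shows the point: the duplicate-email bug now fails in a test instead of in production.

```kotlin
class DuplicateEmailException(email: String) : RuntimeException("email taken: $email")

interface UserRepository {
    fun save(id: String, email: String)
    fun emailOf(id: String): String?
}

// A fake, not a mock: it enforces the uniqueness contract itself.
class InMemoryUserRepository : UserRepository {
    private val users = mutableMapOf<String, String>()

    override fun save(id: String, email: String) {
        if (users.any { (otherId, e) -> otherId != id && e == email })
            throw DuplicateEmailException(email)
        users[id] = email
    }

    override fun emailOf(id: String) = users[id]
}

class UserService(private val repo: UserRepository) {
    fun updateEmail(id: String, email: String) = repo.save(id, email)
}
```

Because the test exercises behavior (what ends up stored, what throws) rather than interactions (which method was called how many times), it survives refactoring and still catches the contractual failure.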
The Integration Void
Unit tests are cheap and fast, which is why managers love them. But they rarely catch the bugs that actually take down a system. The “Integration Void” is the space between two perfectly tested units where the logic falls apart. You have a “Service A” that returns a non-null object and a “Service B” that expects that object. Both have 100% coverage.
Then comes the real world: Service A encounters a timeout and returns null (or an empty object) because someone put a try-catch block around the network call “just in case.” Service B receives this unexpected state and explodes. Because you mocked the interaction between them in your unit tests, you never saw the explosion coming. You traded System Reliability for Development Speed.
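The void is easy to reproduce in miniature. In this sketch (all names hypothetical), each class would pass its own unit tests with flying colors; the seam between them is where production blows up.

```kotlin
// Service A swallows failures "just in case" and silently degrades to null.
class ServiceA(private val fetch: () -> String) {
    fun load(): String? = try { fetch() } catch (e: Exception) { null }
}

// Service B was only ever unit-tested with a non-null payload.
class ServiceB {
    fun render(payload: String) = payload.uppercase()
}

// The seam neither unit suite covers: on the first timeout,
// load() returns null and !! throws a NullPointerException.
fun pipeline(a: ServiceA, b: ServiceB): String = b.render(a.load()!!)
```

An integration test that wires the real (or containerized) pieces together and injects one failure would have found this in minutes.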
Concurrency Hazards: The Coroutine Trap
In modern Kotlin development, concurrency is the biggest source of “invisible” bugs. Coroutines are elegant, but they are a nightmare for standard unit testing. Most developers use runBlocking or TestScope to force asynchronous code into a synchronous flow. This makes the tests pass, but it completely ignores Race Conditions and Side Effects.
// The "Passes in Tests, Dies in Load" Pattern
fun syncData() = CoroutineScope(Dispatchers.IO).launch {
    val data = fetchData() // Long running
    saveToDb(data)
}
In a unit test, fetchData() returns instantly. In production, it takes 2 seconds. If the user navigates away or triggers the action again, you end up with multiple jobs writing to the DB simultaneously, or a JobCancellationException that bubbles up and kills the entire app process. Your green test didn’t catch this because it wasn’t running in a multi-threaded, high-latency environment.
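One common remedy, sketched below under the assumption that you own a proper lifecycle scope: keep a handle to the running job and cancel the stale sync before launching a new one, so two writers can never overlap. fetchData and saveToDb are injected here purely to keep the sketch self-contained.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.launch

class SyncManager(
    private val scope: CoroutineScope, // a lifecycle scope you own, not an ad-hoc one
    private val fetchData: suspend () -> String,
    private val saveToDb: suspend (String) -> Unit,
) {
    private var job: Job? = null

    fun syncData() {
        job?.cancel() // the stale sync dies before the new one starts: no overlapping writers
        job = scope.launch {
            val data = fetchData() // may take seconds in production
            saveToDb(data)
        }
    }
}
```

This is still only one policy (latest-wins); depending on the domain you might instead want to queue, debounce, or reject concurrent syncs, but every one of those is a decision the fire-and-forget version never makes.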
Testing concurrency requires more than just assertEquals. It requires Stress Testing and State Monitoring. If your test suite doesn’t simulate “What happens if this takes 10 seconds?” or “What happens if the user cancels halfway through?”, then you aren’t testing your code—you’re just hoping for the best.
Beyond Assertions: Testing for Reality
If you want to stop shipping bugs that “passed all tests,” you have to stop testing for what you want to happen. Most test suites are just a series of “Happy Path” stories. To actually break your code before a user does, you need to use tools that don’t care about your feelings or your clean architecture.
The Chaos of Property-Based Testing
Standard testing is predictable: you give 2 and 2, you expect 4. But production is a drunk user typing emojis into a credit card field. This is where Property-Based Testing (PBT) comes in. Instead of picking specific values, you define a rule (a property) and let the engine throw 10,000 random, degenerate, and borderline impossible inputs at your function.
// Property: Discount never makes the price negative
// (Kotest-style; forAll asserts the Boolean the lambda returns)
forAll(Arb.double(), Arb.string()) { price, code ->
    applyDiscount(price, code) >= 0.0 // catches NaN, Infinity, and negative inputs you forgot
}
If your code survives 10,000 rounds of random garbage, it might actually survive a week in production. If it fails on input -1.234E-158, you’ve just found a bug that would have stayed hidden for years in a regular unit test suite.
Mutation Testing: Testing the Tests
The ultimate “bullshit detector” for your test suite is Mutation Testing. It’s simple: a tool goes into your source code and intentionally breaks it. It changes > to <, + to -, or deletes a line entirely. Then it runs your tests.
If your tests still pass after the code was sabotaged, your tests are useless. They aren’t actually asserting anything meaningful about that logic. They are just “touching” the lines to make the coverage report look pretty. If your “100% Coverage” suite doesn’t fail when the logic is inverted, you’re flying blind with a broken radar.
Weaponizing Your CI: Mutation Testing with Pitest
If you want to stop guessing whether your tests actually work, you need to automate the “sabotage.” For Kotlin projects, Pitest is the gold standard. It hooks into your Gradle build and systematically injects faults—mutants—into your bytecode. If your test suite doesn’t scream (fail), the mutant “survives,” and you’ve just exposed a gap in your safety net.
// build.gradle.kts - The "Truth" Configuration
plugins {
    id("info.solidsoft.pitest") version "1.15.0" // the de-facto Gradle plugin for Pitest
}

pitest {
    targetClasses.set(listOf("com.krun.dev.service.*")) // Target your logic
    pitestVersion.set("1.15.0")
    threads.set(4)
    outputFormats.set(listOf("HTML", "XML"))
    timestampedReports.set(false)
    mutationThreshold.set(85) // CI fails if < 85% of mutants are killed
}
Integrating this into a GitHub Actions pipeline transforms a “Green Build” from a suggestion into a contract. Instead of just running ./gradlew test, you execute ./gradlew pitest.
# .github/workflows/ci.yml
- name: Run Mutation Tests
  # With mutationThreshold set, Gradle exits non-zero when too many
  # mutants survive; the step fails and the pipeline dies right here.
  run: ./gradlew pitest
Warning: Mutation testing is computationally expensive. Don’t run it on every small commit. Trigger it on Pull Requests to the main branch. It’s better to wait 5 extra minutes for a build than to spend 5 hours debugging a production disaster that your 100% coverage report missed. This is the difference between “playing at DevOps” and actually protecting your data.
Expert Conclusion: The “Don’t Trust, Verify” Mindset
Let’s be real: no amount of testing will make your code 100% bug-free. The goal isn’t perfection; it’s predictability. A “Green Build” is just the starting line, not the finish. To build systems that don’t crumble at 3 AM, you need to shift your mindset from “Does it pass?” to “How does it fail?”
- Kill the Defaults: Stop using ?: 0.0 to hide nulls. If a value is missing and it shouldn’t be, throw an exception. Fail fast, fail loud. A crash in the logs is better than a silent corruption in the database.
- Test Behavior, Not Structure: If you change a private method and ten tests break, your tests are too coupled to the “how” instead of the “what.” Focus on the output, not the internal wiring.
- Integration is Everything: Units are easy. Systems are hard. Invest more in integration tests that use real (or containerized) databases and real network calls. Mocks don’t have latency; real life does.
- Observability > Testing: Since tests will fail you, make sure you can see the failure happening in real-time. Logging, tracing, and metrics are your last line of defense when the “Green Build” inevitably hits a scenario you didn’t imagine.
The bottom line: High coverage is a side effect of good testing, not the goal. If you’re chasing a percentage, you’re playing a game of numbers. If you’re chasing edge cases, failures, and “impossible” states, you’re actually doing engineering. Stop trusting the green light and start questioning why it’s green in the first place.