Tuist Community - Latest posts
https://community.tuist.dev

XCTest-dependent helper targets (TestSupport, test utilities) should be cacheable

Hi @eghnacr

We tried to enable support for it, but it came with cascading effects that we didn’t have the time to look into. I’ll have a look again tomorrow and see if I can get this working.

https://community.tuist.dev/t/xctest-dependent-helper-targets-testsupport-test-utilities-should-be-cacheable/941#post_2 Mon, 16 Mar 2026 17:15:42 +0000 community.tuist.dev-post-2460
Suggestions to improve our marketing home for small viewports
marekfort:

But Get started links to the docs, which I also don’t think people would do from mobile.

Additionally, should we change Get started to be a sign up page over linking to docs?

Yeah, get started should be sign up IMO. The one thing I’d find annoying, though, is that after getting past the login page, many pages will look broken.

https://community.tuist.dev/t/suggestions-to-improve-our-marketing-home-for-small-viewports/942#post_3 Mon, 16 Mar 2026 16:44:37 +0000 community.tuist.dev-post-2459
Suggestions to improve our marketing home for small viewports
pepicrft:

It’s not obvious in our CTAs that this can be tried for free. The developer needs to navigate deep into our pricing page to figure that out. A small wording change might be sufficient, for example, “Try for free” (over “Get started”).

On board with this.

But Get started links to the docs, which I also don’t think people would do from mobile.

Additionally, should we change Get started to be a sign up page over linking to docs?

Agree, would include the logos higher up :+1:

We can move it up on desktop, though.

Agree. I see our mission crystallizing into being your virtual platform team, with everything that entails (making your builds fast, ensuring your test suite is stable, etc.).

The main mission is not mobile-specific, although we are currently mobile-focused (or at least have mobile-specific features). We can keep mobile references out of the hero section, I’m not fully opposed to that.

https://community.tuist.dev/t/suggestions-to-improve-our-marketing-home-for-small-viewports/942#post_2 Mon, 16 Mar 2026 11:09:52 +0000 community.tuist.dev-post-2458
Suggestions to improve our marketing home for small viewports

While showing our marketing home to my wife, she came up with some good questions and suggestions to make the page more effective at getting people and companies into our funnel.

Before diving into the specifics, I think it’s important to call out the two profiles we are catering to with our home page:

  • Developers: They come to the website eager to give it a shot for free, likely in a side project (avoiding internal vendoring and security processes). We need them to get excited about what we can offer.
  • Buyers: The people who make the decision to buy, usually managers, directors, or leaders. They get excited about the value it brings to the business (e.g., more productive teams), are more keen to talk, and make decisions based on social proof (i.e., brands they recognize use it too).

With that in mind, there are a few things in our mobile marketing home that would benefit from some rethinking:

  • It’s not obvious in our CTAs that this can be tried for free. The developer needs to navigate deep into our pricing page to figure that out. A small wording change might be sufficient, for example, “Try for free” (over “Get started”).
  • The CTAs for the two types of visitors, developers and buyers (“Try it for free” and “Talk to us”), are not visible in the sticky bar, so they are effectively gone as you scroll down. I suggest we make those match the ones in the hero section, then move “Log in” to the dropdown menu, since that’s not something we expect people to do on mobile.
  • The social proof is further down. It should be visible without scrolling. I’d suggest we drop the phone and use that space to include the proof there. I included a Supabase example below.
  • The same goes for the demo. Sentry includes a “See how in sandbox” button. I think that affordance should be in the hero section too, so I can go and see it. However, this would require our dashboard to be fully responsive, which is not the case today, so we can tackle this later once we fix that.
  • The “Make mobile your competitive advantage” headline doesn’t capture our key role in making your teams productive. I think we should revisit it (:slight_smile: once again).

https://community.tuist.dev/t/suggestions-to-improve-our-marketing-home-for-small-viewports/942#post_1 Mon, 16 Mar 2026 10:45:23 +0000 community.tuist.dev-post-2457
XCTest-dependent helper targets (TestSupport, test utilities) should be cacheable

staticFramework targets that depend on XCTest (e.g. TestSupport targets providing test mocks/helpers) are excluded from binary caching. These are not test bundles — they’re regular frameworks only consumed by test targets.

tuist cache --print-hashes does not list any XCTest-dependent staticFramework target. In our project this excludes ~60 targets (TestSupport modules, SnapshotTesting, test utility frameworks) — none of which are dependencies of the app target.

Reproduction:

  1. Create a staticFramework target that depends on an SPM package importing XCTest

  2. Run tuist cache --print-hashes

  3. The target does not appear in the cacheable list

Replacing a TestSupport target with a binary doesn’t change the dependency graph — it’s still only consumed by test bundles. These targets should be cacheable.

I noticed PR #8529 addressed exactly this but was closed. Is there a specific reason it wasn’t merged? Are there known issues with caching XCTest-dependent frameworks?

Environment: Tuist 4.150.0 (Cloud/EE), Xcode 26.2, macOS 26.2 arm64
Github Issue #9756

https://community.tuist.dev/t/xctest-dependent-helper-targets-testsupport-test-utilities-should-be-cacheable/941#post_1 Sat, 14 Mar 2026 09:01:21 +0000 community.tuist.dev-post-2456
Signed XCFramework from SPM

Hi,

I’m trying to add the expected signature to an XCFramework that is an SPM dependency. I can see that it’s possible to add the expected signature for a local XCFramework dependency here, but the expectedSignature parameter doesn’t exist for an external dependency.

I can see from the pbxproj that it’s stored like this:

77C479C930030662A4AF2CDF /* Foo.xcframework */ = {isa = PBXFileReference; expectedSignature = "AppleDeveloperProgram:TEAMID:NAME"; lastKnownFileType = wrapper.xcframework; path = Foo.xcframework; sourceTree = "<group>"; };

Then, if the project is built in Xcode and the code signature has changed, the build fails. Without storing this in git, the build won’t fail if the signing on the framework is removed or changed.

Is there a way to do this in Tuist?

Thanks,
Jonathan

https://community.tuist.dev/t/signed-xcframework-from-spm/940#post_1 Fri, 13 Mar 2026 11:52:31 +0000 community.tuist.dev-post-2455
Distributed Key-Value Store for Cache Nodes

Thanks for putting this together :clap:

I was going to ask why not using our server Postgres DB through the server, but I noticed you addressed it at the end.

A couple of things I’d recommend mentioning somewhere in the proposal, for people who don’t have enough context, are:

  • What prompted this? A bit of context around the per-host bandwidth limit, how we see sharding as a tool to overcome it, and how that connects with this piece of work.
  • A summary of the current state of what we store in that KV DB, and of the other kinds of artifacts those nodes deal with. Just a summary.
https://community.tuist.dev/t/distributed-key-value-store-for-cache-nodes/939#post_4 Thu, 12 Mar 2026 11:13:14 +0000 community.tuist.dev-post-2454
Distributed Key-Value Store for Cache Nodes

Us wanting to switch the server database to PlanetScale at some point anyway, and their general price/performance benefit (see PlanetScale’s “PlanetScale vs Supabase benchmarks”).

Also, I personally like their dashboard better for query insights, which might become helpful for the across-the-world work.
No hard feelings about it, though; the database provider is the one interchangeable bit in this proposal :slight_smile:

https://community.tuist.dev/t/distributed-key-value-store-for-cache-nodes/939#post_3 Thu, 12 Mar 2026 10:34:50 +0000 community.tuist.dev-post-2453
Distributed Key-Value Store for Cache Nodes

Aligned with the proposal. This is definitely a strong need as we scale horizontally: we need a mechanism to share the values across nodes, as we do for artifacts via S3.

Is there a specific reason to use PlanetScale over Supabase? I’m not opposed to this, but would like to see the reasoning that went into this.

https://community.tuist.dev/t/distributed-key-value-store-for-cache-nodes/939#post_2 Thu, 12 Mar 2026 10:19:39 +0000 community.tuist.dev-post-2452
Distributed Key-Value Store for Cache Nodes

Abstract

This RFC proposes adding an eventually-consistent distributed replication layer to the cache service’s key-value store. Today, KV entries live in per-node SQLite databases, causing inconsistent hit/miss behavior when multiple cache nodes sit behind the same load balancer within a region. The proposal introduces a shared PlanetScale Postgres instance as the global source of truth, with asynchronous background replication keeping each node’s local SQLite synchronized. The design preserves the existing local-first architecture for self-hosted deployments and keeps shared infrastructure entirely off the request hot path.

Motivation

KV entries live in per-node SQLite. As we scale horizontally within a region, an Xcode cache upload that lands on one node is a miss on its sibling nodes behind the same load balancer. That means one request can be a miss and the next a hit inside the same region, which is not acceptable. We also have globally split teams, so cross-region visibility matters too.

Requirements

  • Eventually consistent is fine; cross-node visibility can lag by minutes.
  • Last-write-wins at the key level is acceptable.
  • Request latency is critical; shared infra must stay off the hot path.
  • S3 PUTs are a major cost driver; no per-entry S3 uploads for KV metadata.
  • Self-hosted single-node deployments must keep working with local-only SQLite.
  • KV PUTs can burst to ~1000 req/s; synchronous remote writes on the request path are not viable.
  • Local SQLite on each node must stay within the current storage budget; we cannot mirror the full global dataset onto every node.

Detailed Design

Operating Modes

The system introduces two operating modes, selected via KEY_VALUE_MODE=local|distributed:

Mode        | Default | Behavior
local       | Yes     | Current behavior. Cachex + local SQLite. No shared state.
distributed | No      | Local Cachex + SQLite as edge cache. PlanetScale Postgres in us-east as global truth.

All existing behavior stays unchanged in local mode.

Production Topology

cache-eu-central ──┐
cache-eu-north ────┤
cache-us-east ─────┤
cache-us-west ─────┼──▶ PlanetScale Postgres (us-east)
cache-ap-southeast ┤
cache-sa-west ─────┤
cache-au-east ─────┘

Data Model

PlanetScale Postgres: kv_entries

Column            | Type        | Notes
key               | text        | PK. Format: keyvalue:{account}:{project}:{cas_id}
account_handle    | text        | Extracted from key for query efficiency
project_handle    | text        | Extracted from key for query efficiency
cas_id            | text        | Extracted from key
json_payload      | text        | Opaque blob, same format as today
source_node       | text        | Originating cache host
source_updated_at | timestamptz | LWW timestamp from originating node
last_accessed_at  | timestamptz | Latest globally replicated access timestamp; used to decide what stays hot locally
updated_at        | timestamptz | DB-side timestamp for sync ordering
deleted_at        | timestamptz | Soft-delete tombstone, NULL when alive

Indexes:

  • PK on key
  • (updated_at, key) for poller watermark queries
  • (last_accessed_at, key) for optional recency-based operational queries
  • (account_handle, project_handle) for project-scoped cleanup
  • (deleted_at) partial index for tombstone purging

The json_payload column is stored as text, not jsonb. The payload is treated as an opaque blob everywhere – the request path never queries into it, and Postgres never needs to index or filter by its contents. text avoids the parsing overhead of jsonb on every insert. If debugging queries against the payload become necessary, a one-time ALTER COLUMN to jsonb is non-breaking.

Local SQLite Changes (distributed mode)

key_value_entries stays as it is in local mode. In distributed mode, it becomes a materialized hot cache rather than the source of truth.

A new migration adds:

  • source_updated_at (utc_datetime_usec, nullable) – payload-version conflict resolution. Intentionally separate from last_accessed_at: reads bump last_accessed_at, but must not make an older payload look newer than a later write.
  • replication_enqueued_at (utc_datetime_usec, nullable) – marks rows whose latest write or access bump still needs to be shipped to Postgres.

In local mode these columns are unused and remain NULL. In distributed mode, KeyValueBuffer.write_batch/2 sets source_updated_at to the current timestamp for payload writes, and both writes and access bumps set replication_enqueued_at when they need shipping.

Every node converges on the global hot working set using one background sync loop: KeyValueReplicationPoller follows rows in Postgres ordered by updated_at, and updated_at advances for both payload writes and replicated access bumps. The local database is kept below the existing 25GB limit by the current time- and size-based eviction policy, which already orders by last_accessed_at.

  • Each node converges toward the global hot working set, not the full historical dataset.
  • Different nodes will not hold the exact same set, but they should broadly converge as globally hot keys keep getting touched and therefore keep moving through the sync stream.
  • Cold entries can be dropped locally and later re-materialized when another node writes or accesses them, because both actions advance the shared row’s updated_at.

key_value_entry_hashes is removed entirely (see Removal of key_value_entry_hashes). The table, schema, and all code that reads or writes hash references are deleted as part of this RFC.

Local Replication Queue

We do not add a second SQLite table for replication. The existing key_value_entries table is sufficient:

  • New writes already flow through KeyValueBuffer.
  • In distributed mode, KeyValueBuffer.write_batch/2 writes the row locally and sets replication_enqueued_at.
  • KeyValueReplicationShipper scans key_value_entries WHERE replication_enqueued_at IS NOT NULL ORDER BY replication_enqueued_at, id LIMIT ....
  • On successful shipment it clears replication_enqueued_at.
  • Repeated writes to the same key naturally coalesce because key_value_entries.key is already unique.
  • A partial index on replication_enqueued_at IS NOT NULL ensures the shipper only scans pending work.
  • Pending rows must not be evicted locally before they have been shipped successfully.
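The queue mechanics above can be sketched against a throwaway SQLite schema. This is an illustrative Python model, not the Elixir implementation: the table and column names mirror the RFC, while the `write` and `pending_batch` helpers are hypothetical stand-ins for `KeyValueBuffer.write_batch/2` and the shipper’s scan.

```python
import sqlite3
import time

# Model: the existing key_value_entries table doubles as the replication
# queue; repeated writes to one key coalesce into a single pending row.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE key_value_entries (
        key TEXT PRIMARY KEY,
        json_payload TEXT,
        source_updated_at REAL,
        replication_enqueued_at REAL
    )
""")
# Partial index so the shipper only scans pending work.
db.execute("""
    CREATE INDEX pending_idx ON key_value_entries (replication_enqueued_at)
    WHERE replication_enqueued_at IS NOT NULL
""")

def write(key: str, payload: str) -> None:
    """Local write: upsert the row and mark it pending for replication."""
    now = time.time()
    db.execute(
        """
        INSERT INTO key_value_entries
            (key, json_payload, source_updated_at, replication_enqueued_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(key) DO UPDATE SET
            json_payload = excluded.json_payload,
            source_updated_at = excluded.source_updated_at,
            replication_enqueued_at = excluded.replication_enqueued_at
        """,
        (key, payload, now, now),
    )

def pending_batch(limit: int = 1000) -> list[tuple[str, str]]:
    """What the shipper would pick up: oldest pending rows first."""
    return db.execute(
        """
        SELECT key, json_payload FROM key_value_entries
        WHERE replication_enqueued_at IS NOT NULL
        ORDER BY replication_enqueued_at LIMIT ?
        """,
        (limit,),
    ).fetchall()

# A burst of writes to the same key stays a single pending shipment.
for i in range(5):
    write("keyvalue:acme:app:abc123", f'{{"v": {i}}}')
write("keyvalue:acme:app:def456", '{"v": 0}')

batch = pending_batch()
print(len(batch))  # 2 pending rows, not 6
```

After a successful ship, clearing `replication_enqueued_at` on exactly the shipped rows empties the queue without a second bookkeeping table.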

Write Path

Client PUT
  │
  ▼
Cache node (any)
  ├─▶ Cachex.put (in-memory, immediate)
  └─▶ KeyValueBuffer.enqueue (ETS buffer → local SQLite, marks row for replication)
  │
  ▼
ACK to client
  │
  ▼ (async, background)
KeyValueReplicationShipper (GenServer, per-node)
  ├─ polls pending local rows every 200ms (configurable via DISTRIBUTED_KV_SHIP_INTERVAL_MS)
  ├─ batches pending rows (up to 500-2000 per batch)
  ├─ INSERT INTO kv_entries ... ON CONFLICT (key) DO UPDATE
  │    SET json_payload = CASE
  │          WHEN EXCLUDED.source_updated_at > kv_entries.source_updated_at
  │          THEN EXCLUDED.json_payload
  │          ELSE kv_entries.json_payload
  │        END,
  │        source_updated_at = GREATEST(kv_entries.source_updated_at, EXCLUDED.source_updated_at),
  │        last_accessed_at = GREATEST(kv_entries.last_accessed_at, EXCLUDED.last_accessed_at),
  │        ...
  └─ clears `replication_enqueued_at` on success

LWW resolution: payload fields use source_updated_at; access recency uses GREATEST(last_accessed_at) so an access bump can propagate without overwriting a newer payload. If two payload writes arrive with identical source_updated_at, the tie is broken by lexicographically comparing source_node so the rule is explicit and deterministic.

Burst absorption: duplicate-key bursts are rare, so the main protection is batching many different keys into a small number of SQL transactions. When the same key is hit repeatedly, the local row is updated in place and remains a single pending shipment.

Cross-region tuning: shipper database timeouts and batch sizes must be tuned for the farthest regions. The correct answer for high-latency regions is smaller batches and a longer shared-DB timeout, not a single huge transaction that times out from Australia.
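The conflict-resolution rule can be modeled in a few lines of Python. This is a sketch of the semantics only; in practice the resolution happens inside the shipper’s single `ON CONFLICT` SQL statement, and the `Row`/`merge` names here are hypothetical. The tiebreak direction (higher `source_node` wins) is an assumption; the RFC only requires that the lexicographic comparison be deterministic.

```python
from dataclasses import dataclass

@dataclass
class Row:
    json_payload: str
    source_updated_at: float
    last_accessed_at: float
    source_node: str

def merge(current: Row, incoming: Row) -> Row:
    # Payload bytes follow source_updated_at; identical timestamps break
    # deterministically on source_node.
    incoming_wins = (
        (incoming.source_updated_at, incoming.source_node)
        > (current.source_updated_at, current.source_node)
    )
    winner = incoming if incoming_wins else current
    return Row(
        json_payload=winner.json_payload,
        source_updated_at=max(current.source_updated_at, incoming.source_updated_at),
        # Access recency merges independently (GREATEST in SQL), so a bare
        # access bump cannot shadow a newer payload, and vice versa.
        last_accessed_at=max(current.last_accessed_at, incoming.last_accessed_at),
        source_node=winner.source_node,
    )

# An older payload carrying a newer access bump refreshes hotness only.
a = Row('{"v": 2}', source_updated_at=200.0, last_accessed_at=200.0,
        source_node="cache-us-east")
b = Row('{"v": 1}', source_updated_at=100.0, last_accessed_at=300.0,
        source_node="cache-eu-central")
merged = merge(a, b)
print(merged.json_payload, merged.last_accessed_at)  # {"v": 2} 300.0
```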

Read Path

Client GET
  │
  ▼
Cachex.get(key)
  ├─ hit ──▶ return payload
  └─ miss
      │
      ▼
  KeyValueRepo.get_by(key) [local SQLite]
  ├─ hit ──▶ populate Cachex, enqueue access update, return payload
  └─ miss ──▶ return {:error, :not_found}

The initial design keeps Postgres entirely off the read path. Xcode cache artifacts are small enough that a cross-ocean metadata lookup may not beat a miss and rebuild anyway, so remote miss fallback is intentionally not part of the core feature. If metrics later show a clear benefit, it can be added as an explicit follow-up behind a flag.

Access replication: distributed mode needs a coalesced access-bump path, not just payload-write replication. When a key is read locally, we eventually mark it for shipment so the shipper can propagate an updated last_accessed_at to Postgres. That access bump also advances the shared row’s updated_at, so it naturally flows through the same inbound poller as payload writes.

  • Payload freshness is governed by source_updated_at.
  • Hotness is governed by last_accessed_at.
  • We do not need to ship every individual read; the latest observed access time per key is enough.
  • In practice this should be throttled/coalesced, so hot Cachex hits still refresh global hotness without turning every hit into a SQLite and Postgres write.
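That throttling can be sketched as a per-key window: only the first read in each window marks the key for shipment, and later bumps within the window are dropped because the key is already known hot. The 60-second window below is an assumed tuning knob, not a value from the RFC, and the function names are hypothetical.

```python
import time

THROTTLE_SECONDS = 60.0

# key -> access time we last queued for shipment
_queued_access: dict[str, float] = {}
# coalesced per key: the latest queued access wins (one pending bump per key)
pending_bumps: dict[str, float] = {}

def record_access(key: str, now: float) -> None:
    """Called on every local read; ships at most one bump per window."""
    last = _queued_access.get(key, 0.0)
    if now - last >= THROTTLE_SECONDS:
        pending_bumps[key] = now     # worth shipping: refreshes global hotness
        _queued_access[key] = now
    # Otherwise drop the bump: this window already refreshed the key.

t0 = time.time()
for i in range(1000):                            # 1000 hits inside one window
    record_access("keyvalue:acme:app:abc123", t0 + i * 0.01)
record_access("keyvalue:acme:app:abc123", t0 + 120)  # next window

print(len(pending_bumps))  # 1
```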

Inbound Replication (Poller)

KeyValueReplicationPoller (GenServer, per-node)
  │
  ├─ stores local watermark in GenServer state (last seen `updated_at` + `key`)
  ├─ polls PlanetScale every 30-60s (configurable)
  ├─ SELECT * FROM kv_entries
  │    WHERE source_node != @current_node
  │      AND updated_at < NOW() - interval '5 seconds'
  │      AND (updated_at, key) > (@last_updated_at, @last_key)
  │    ORDER BY updated_at, key
  │    LIMIT 1000
  │
  ├─ for each row:
  │    ├─ alive (deleted_at IS NULL):
  │    │    upsert into local SQLite key_value_entries
  │    │    payload fields use LWW on `source_updated_at`
  │    │    `last_accessed_at` uses GREATEST(local, remote)
  │    └─ tombstoned (deleted_at IS NOT NULL):
  │         skip if local row has `replication_enqueued_at IS NOT NULL`
  │           (pending shipment must reach Postgres before local delete)
  │         otherwise delete from local SQLite key_value_entries
  │
  └─ advance local watermark

Key details:

  • source_node != @current_node filter: avoids re-importing the node’s own writes.
  • updated_at < NOW() - interval '5 seconds' lag buffer: prevents the classic CDC gap where an in-flight Postgres transaction with updated_at = T hasn’t committed yet, but a later transaction with updated_at = T+1 has. The poller would advance its watermark past T and never see the first transaction. A 5-second buffer gives in-flight transactions time to commit before the poller reads them.
  • Payload freshness and hotness are separate: payload bytes are governed by source_updated_at, while last_accessed_at is merged independently with GREATEST(...) so hotness can propagate without a new payload write.
  • Tombstones must not destroy pending shipments: if the local row has replication_enqueued_at IS NOT NULL, the poller skips the tombstone delete. The shipper will ship the row to Postgres, where the newer source_updated_at will win over the tombstone’s cutoff and revive the row. Once shipped, the next poll cycle will see the row is alive globally and leave it alone.
  • Global replication, local eviction: every node sees global changes, but the local SQLite store is still bounded by the existing eviction worker. We do not try to predict a per-region subset in the poller.

Convergence guarantee: all nodes eventually see globally touched rows (writes and replicated accesses), bounded by poll interval + lag buffer + query page size.

Watermark persistence: the watermark lives in GenServer state and is lost on restart. On startup, the poller initializes its watermark to NOW() - DISTRIBUTED_KV_SYNC_INTERVAL_MS * 2 so it picks up recent changes without re-scanning the entire history. Since all inbound operations are idempotent upserts, re-processing a few rows after restart is harmless.
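The poller’s pagination can be modeled as keyset pagination on `(updated_at, key)` with the lag buffer applied before the watermark check. This Python sketch runs against an in-memory list instead of Postgres, with plain floats as timestamps; `poll_page` and `SharedRow` are hypothetical names.

```python
from dataclasses import dataclass

LAG_BUFFER = 5.0   # mirrors `updated_at < NOW() - interval '5 seconds'`
PAGE_SIZE = 1000

@dataclass(frozen=True)
class SharedRow:
    key: str
    updated_at: float
    source_node: str

def poll_page(rows, watermark, now, current_node):
    """One poll cycle: next page of foreign rows plus the advanced watermark."""
    visible = sorted(
        (r for r in rows
         if r.source_node != current_node          # skip our own writes
         and r.updated_at < now - LAG_BUFFER       # let in-flight txns commit
         and (r.updated_at, r.key) > watermark),   # keyset pagination
        key=lambda r: (r.updated_at, r.key),
    )
    page = visible[:PAGE_SIZE]
    new_watermark = (page[-1].updated_at, page[-1].key) if page else watermark
    return page, new_watermark

rows = [
    SharedRow("k1", 100.0, "cache-eu-central"),
    SharedRow("k2", 100.0, "cache-us-east"),       # own write: filtered out
    SharedRow("k3", 198.0, "cache-ap-southeast"),  # inside lag buffer: not yet
]
page, wm = poll_page(rows, watermark=(0.0, ""), now=200.0,
                     current_node="cache-us-east")
print([r.key for r in page], wm)  # ['k1'] (100.0, 'k1')
```

Because upserts are idempotent, re-running a cycle from a slightly older watermark (as on restart) only re-processes a few rows.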

No Cachex invalidation on inbound replication: let TTL handle staleness. Avoids complexity and Cachex churn.

Eviction and Cleanup

Local Eviction (both modes)

  • Purpose: free local SQLite space.
  • Runs via KeyValueEvictionWorker with time-based and size-based triggers (unchanged).
  • Deletes local SQLite rows only. CASCleanupWorker is removed entirely as part of this RFC (see Removal of key_value_entry_hashes). Orphaned local disk files are handled by OrphanCleanupWorker.
  • In distributed mode: must skip rows where replication_enqueued_at IS NOT NULL so pending writes are not dropped before they reach Postgres. The entry still exists globally; other nodes and the poller can re-materialize it if it becomes active again.

Shared-store Invalidation (distributed mode only)

Tombstones are normal kv_entries rows with deleted_at set. Only explicit invalidation flows create tombstones (e.g., tuist cache clean --remote via CleanProjectWorker). Local eviction never creates tombstones.

Race-safe semantics for tuist cache clean --remote:

  1. The cleanup request carries a single cleanup_started_at timestamp generated once and shared across all node requests.
  2. Each node deletes its local SQLite rows for the target scope only where source_updated_at <= cleanup_started_at.
  3. Each node writes Postgres tombstones (deleted_at) for the same scope and cutoff. Multiple nodes executing the same tombstone write is safe: the predicate is deterministic.
  4. A write that happens after cleanup_started_at wins automatically: it is not deleted locally, not tombstoned in Postgres, and if it lands after an older tombstone it revives the row.

Concrete lifecycle:

  1. A user runs tuist cache clean --remote.
  2. The system generates one cleanup_started_at cutoff and sends it to all cache nodes.
  3. Each node removes matching local rows only if source_updated_at <= cleanup_started_at.
  4. The shared-store cleanup sets deleted_at on matching Postgres rows using that same cutoff.
  5. Tombstoned rows appear in the normal replication stream, and pollers remove any remaining local copies.
  6. After a safety window, the tombstone can optionally be hard-deleted from Postgres.

The safety window (default 7 days) ensures slow or restarting nodes don’t miss the delete. Tombstone purging is not required for the initial implementation; it is an operational cleanup once we have real tombstone volume data.
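The cutoff semantics above can be checked with a small model: one shared `cleanup_started_at` value makes local deletion and tombstoning deterministic on every node, so concurrent executions agree. The helper names are hypothetical; rows are represented as a `key -> source_updated_at` map.

```python
def clean_local(rows: dict[str, float], cutoff: float) -> dict[str, float]:
    """Local SQLite side: keep only writes strictly newer than the cutoff."""
    return {k: t for k, t in rows.items() if t > cutoff}

def tombstone_keys(rows: dict[str, float], cutoff: float) -> set[str]:
    """Postgres side: keys that get deleted_at set. Idempotent, so several
    nodes running this concurrently compute the same set."""
    return {k for k, t in rows.items() if t <= cutoff}

cutoff = 1000.0  # cleanup_started_at, generated once for the whole clean
rows = {
    "keyvalue:acme:app:old": 900.0,   # written before the clean started
    "keyvalue:acme:app:new": 1050.0,  # written after: survives automatically
}
print(sorted(clean_local(rows, cutoff)))    # ['keyvalue:acme:app:new']
print(sorted(tombstone_keys(rows, cutoff))) # ['keyvalue:acme:app:old']
```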


Configuration

Env var                                 | Default     | Description
KEY_VALUE_MODE                          | local       | local or distributed
DISTRIBUTED_KV_DATABASE_URL             | (none)      | PlanetScale Postgres connection string
DISTRIBUTED_KV_POOL_SIZE                | 5           | Connection pool size
DISTRIBUTED_KV_DATABASE_TIMEOUT_MS      | 10000       | Shared-DB query timeout for shipper/poller operations
DISTRIBUTED_KV_SYNC_INTERVAL_MS         | 30000       | Poller interval
DISTRIBUTED_KV_SHIP_INTERVAL_MS         | 200         | Shipper interval
DISTRIBUTED_KV_SHIP_BATCH_SIZE          | 1000        | Max rows per ship batch
DISTRIBUTED_KV_TOMBSTONE_RETENTION_DAYS | 7           | How long to keep tombstones
DISTRIBUTED_KV_NODE_NAME                | PUBLIC_HOST | Identifier for this node, written to source_node. Used by the poller to filter out its own writes.

Telemetry

All new components emit telemetry events consistent with the existing patterns in Cache.PromEx and the codebase’s :telemetry.execute usage.

Event                                                  | Measurements                                                 | Metadata
[:cache, :kv, :replication, :ship, :flush]             | duration_ms, batch_size                                      | status (:ok, :error)
[:cache, :kv, :replication, :ship, :pending_rows]      | count                                                        |
[:cache, :kv, :replication, :poll, :complete]          | duration_ms, rows_materialized, rows_deleted                 |
[:cache, :kv, :replication, :poll, :lag_ms]            | lag_ms (time between newest polled row’s updated_at and now) |
[:cache, :kv, :replication, :ship, :timeout]           | count                                                        | region
[:cache, :kv, :replication, :local_store, :size_bytes] | size_bytes                                                   | node, region
[:cache, :kv, :tombstone_purge, :complete]             | entries_purged, duration_ms                                  |

Alternatives Considered

Why a shared metadata store at all?

Each cache node has its own SQLite database. In a horizontally scaled region, there is no mechanism for sibling nodes behind the same load balancer to converge on the same KV state. The options were:

  1. Shared metadata store (chosen): nodes replicate KV state asynchronously to a central database. Local SQLite becomes an edge cache; the shared store is the global source of truth.
  2. S3-based replication: upload KV entries to S3 and have nodes poll/download them. Rejected because S3 PUTs are already a major cost driver for Xcode cache artifacts; adding per-entry KV uploads would multiply that cost for metadata that is small but high-frequency.
  3. Erlang clustering / distributed ETS / libcluster: rejected because the cache nodes are globally distributed across 7 regions with no shared network. BEAM distribution assumes low-latency, reliable connections. Cross-ocean netsplits would cause constant cluster instability.
  4. Do nothing; accept per-node misses: rejected because same-region miss/hit flip-flopping behind a load balancer is operationally unacceptable.

Why Postgres, not ClickHouse?

The KV metadata workload is OLTP, not analytics:

  • Point lookups by key (read path)
  • Frequent upserts with LWW conflict resolution (write path)
  • Deletes and tombstones (retention/cleanup)

ClickHouse is optimized for append-heavy event streams and columnar scans. Using it here would mean fighting its data model for correctness on mutations, deletes, and point lookups. Postgres is the boring-correct choice for this workload shape.

Why a dedicated Postgres, not the server’s Postgres?

  • The cache KV workload is bursty (up to 1000 writes/s), write-heavy, and operationally separate from the server’s transactional workload.
  • Mixing them risks cache bursts degrading server DB performance.
  • Independent scaling, maintenance windows, and failure domains.

Why not route KV reads/writes through the server application?

  • Adds an extra network hop, serialization layer, and queueing point on the request path.
  • The server is not on the cache request path today and should not be.
  • Cache nodes should talk directly to the shared metadata store.

Why us-east for the primary?

  • cache-us-east is by far the largest traffic region and will remain so for the foreseeable future.
  • The shared DB is still off the request path, so non-local regions pay latency only on background replication.
  • Lowest-risk placement because it is already where most cache traffic originates.

Why single-region Postgres is acceptable

  • Convergence can lag by minutes (stated requirement).
  • Most requests never touch the shared DB (local Cachex + SQLite serve the hot path).
  • Async replication amortizes RTT across batches.
  • Cross-ocean latency is paid by background shippers and pollers, not the request path.

Why poll-based replication, not LISTEN/NOTIFY or push

  • 7 globally distributed hosts with minutes-level convergence tolerance.
  • Polling is simpler, stateless, and tolerant of network interruptions.
  • LISTEN/NOTIFY requires persistent connections and is fragile across regions.
  • Push/fanout adds complexity for marginal latency improvement that is not needed.

Why pending replication coalesces by key

At 1000 req/s burst, if many writes hit the same key, the local SQLite row is updated in place and remains a single pending shipment. If writes hit different keys, the shipper batches them into a small number of SQL transactions. Without coalescing, 1000 req/s would mean 1000 remote upserts/s, which is fragile under cross-region latency.

Why no per-entry count cap on entries[]

  • Xcode controls the payload shape; we do not.
  • Currently one entry per key, but this may change.
  • A count cap would be brittle and require coordinated client/server changes.
  • The authoritative global row stores the opaque json_payload blob. Pathological payloads may slow background processing, not request serving.
  • Byte-size cap already exists at the request level (25MB body limit).

Why local mode must remain the default

Self-hosted Tuist users run a single cache node with block storage. They have no shared DB and should not need one. Forcing a roundtrip to a remote shared database would make their single-node setup strictly worse. All existing behavior (Cachex + SQLite, local eviction behavior) must remain unchanged and be the default.

Removal of key_value_entry_hashes

Today, KV eviction extracts CAS hashes from deleted entries, checks which are unreferenced via key_value_entry_hashes, and enqueues CASCleanupWorker jobs to delete artifacts from disk and S3. This coupling is removed entirely (both modes) for two reasons:

  1. S3 lifecycle policies own artifact expiration. Xcode cache artifacts are moving to S3 with aggressive lifecycle policies. Cache nodes should not be in the business of deleting S3 objects on eviction. The only S3 deletion that cache nodes perform is explicit user-initiated cleanup via tuist cache clean.
  2. Local disk orphans are already handled. When a KV entry is evicted, any CAS files on local disk become orphans. The existing OrphanCleanupWorker periodically walks the filesystem and deletes files that have no metadata entry. This is sufficient for local disk hygiene without the complexity of hash reference tracking.

Self-hosted deployments that use S3 must configure appropriate lifecycle policies on their buckets. This should be documented in the self-hosting guide.

The cleanup model after this change:

  • Eviction (both modes): delete KV metadata from SQLite. Nothing else.
  • OrphanCleanupWorker: finds and deletes orphaned CAS files from local disk.
  • S3 lifecycle policies: handle S3 artifact expiration.
  • tuist cache clean: explicit user action – deletes from local disk, S3, and (in distributed mode) writes Postgres tombstones.

Why local eviction must be decoupled from shared-store invalidation

In distributed mode, a local eviction only means “this node dropped its cached copy to free SQLite space.” The entry still exists globally in Postgres. Other nodes may still have it materialized.

  • Local eviction = free local space only. No tombstones.
  • Shared-store deletes are only for explicit invalidation flows such as project cleanup, and they propagate as tombstones.
https://community.tuist.dev/t/distributed-key-value-store-for-cache-nodes/939#post_1 Wed, 11 Mar 2026 20:50:26 +0000 community.tuist.dev-post-2451
Marketing video: Introducing MCP

We’ll start making videos to present new platform capabilities, and as part of that we thought: why not iterate on them in public, so others can participate and also get a peek into our creative process? Here’s a first example for a video we’d like to create to announce our new MCP server.

Tuist MCP Server - 20s Announcement Video Script


Structure Overview

Timecode  | Purpose
0s - 2s   | Hook - grab attention immediately
2s - 6s   | Announcement - MCP is live, here’s what it is
6s - 16s  | Use cases - concrete examples in rapid succession
16s - 20s | Sign-off - tagline and logo

Script + Visuals


[0s - 2s] - Hook

Line:

“Your team ships every day. But nobody sees the full picture.”

Visual: A dark terminal window. Cursor blinking. No prompt yet. Just silence. The terminal feels like something is waiting to be asked.


[2s - 6s] - Announcement

Line:

“Tuist now ships an MCP server - connecting your project’s insights directly to your coding agent.”

Visual: The words “Tuist MCP” appear at the top of the terminal. Then a tool list streams in, like an agent loading its context:

> tuist_get_builds
> tuist_get_test_runs
> tuist_get_bundle_size
> tuist_get_trends
...

The agent is now armed. Short beat.


[6s - 9s] - Use Case 1: Build Regression

Line:

“Ask why your build regressed on Tuesday.”

Visual: Terminal. User types:

Why did my build get slower on Tuesday?

Agent responds, streaming in:

Build time increased by 38s on Tuesday.
Cause: PaymentsTarget was added as a dependency
of CoreTarget, creating a parallelization
contention point.

Would you like me to restructure
your architecture to fix this?

[9s - 12s] - Use Case 2: Flaky Tests

Line:

“Find your flakiest tests - and a fix.”

Visual: Terminal. User types:

Which tests are flaky this week?

Agent responds:

Top flaky tests this week:
1. AuthTokenRefreshTests - fails 34% of runs
2. CheckoutFlowUITests  - fails 21% of runs
3. UserSessionTests     - fails 18% of runs

Working on a fix for AuthTokenRefreshTests...
[===========>      ] Analysing test history
[==================] Done

Likely cause: race condition on async teardown.
Opening a PR with suggested fix.

[12s - 15s] - Use Case 3: Bundle Bloat

Line:

“Spot what’s bloating your bundle before it ships.”

Visual: Terminal. User types:

What's grown the most in our bundle this sprint?

Agent responds:

Bundle size increased by 43% this sprint.
Found 12MB of duplicated images across
FeatureA and FeatureB targets.

Working on a fix...
[===========>      ] Scanning asset catalogs
[==================] Done

Moving shared assets to CommonAssets target.
Opening a PR with the changes.

[15s - 17s] - Use Case 4: Trends

Line:

“Or just ask how things are going - and get a real answer.”

Visual: Terminal. User types:

How is the project looking lately?

Agent responds:

- Build time up 12%
- Bundle size stable
- Number of modules increased by 3
- Flaky test rate down 8%

[17s - 20s] - Sign-off

Line:

“Tuist MCP. Less guessing, more shipping.”

Visual: Terminal fades. White Tuist logo centered on dark background. Tagline appears beneath: “Less guessing, more shipping.” Hold for one beat. Cut.


Notes for the Designer

  • Aesthetic: Everything lives inside a terminal. Dark background, monospace font, streaming text. No charts, no dashboards, no UI chrome.
  • Streaming effect: Agent responses should animate in line by line, like a real LLM streaming output. This is the key motion in the video.
  • Pacing: Each use case is a two-beat punch - user prompt appears, then agent response streams in fast. Cut before it feels slow.
  • Hook visual: The blinking cursor with no prompt yet is doing a lot of work. Give it a moment to breathe before anything appears.
  • Agent interface: Keep the terminal generic - no specific tool branding. It should read as “your agent, whichever one you use.”
  • Text on screen: The spoken lines can double as on-screen captions or supers, especially for silent autoplay on LinkedIn/X.
]]>
https://community.tuist.dev/t/marketing-video-introducing-mcp/938#post_1 Wed, 11 Mar 2026 11:45:26 +0000 community.tuist.dev-post-2450
Machine metrics Machine Metrics for Xcode and Gradle Builds

Hey everyone :wave:

Ever wondered why a build is slow? Low CPU or network usage in a slow build usually means the build isn’t parallelized enough. But that kind of data, correlated with your actual build, is something most teams don’t have access to.

Well, now you do. We just shipped Machine Metrics with CPU, memory, network, and disk I/O charts right on your build run page. And it works for both Xcode and Gradle, local and CI builds.

Here’s an example of how this looks in practice:

How it looks

Each build run page now includes a Machine Metrics card with four charts:

  • CPU usage over time
  • Memory consumption vs. total available
  • Network I/O (in/out throughput)
  • Disk I/O (read/write throughput)

Setup

For Gradle builds, all you need to do is update the Gradle plugin.

To track machine metrics for your Xcode builds, all you need is a single command:

tuist setup insights

This starts a lightweight background daemon that samples machine metrics during builds. The data is automatically picked up by tuist inspect build and uploaded alongside your build reports.

Full details in the docs:

What’s next

We plan to make two additions in the future that will make machine metrics even more useful:

  • track the data for the build process specifically, which is especially useful for local builds where your CPU or network might be busy with other tasks
  • include a build timeline with your modules and tasks, so you can better spot bottlenecks, such as a single module taking a long time to build while the network or CPU sits mostly idle

As always, let us know what you think :backhand_index_pointing_down:

]]>
https://community.tuist.dev/t/machine-metrics/937#post_1 Wed, 11 Mar 2026 11:37:57 +0000 community.tuist.dev-post-2449
Expanding Tuist's Cache Infrastructure Beyond Build Systems
marekfort:

I’m not opposed to playing with this on the side, but imho this functionality primarily becomes interesting when we have runners, so I’m not sure I’d prioritize fully tackling this (including design, billing, etc.) before we progress with that feature.

We can start with compute, and then tackle this after. The good thing is that we have figured out already some pieces that will be needed for this:

  • A granular authorization model and the workflows to generate and scope sessions to a set of permissions.
  • The authentication workflows and the session management such that we don’t need to tell users to generate a token and paste it somewhere in their system :man_facepalming:
  • Users accustomed to having the CLI on their systems.
]]>
https://community.tuist.dev/t/expanding-tuists-cache-infrastructure-beyond-build-systems/935#post_8 Wed, 11 Mar 2026 10:01:16 +0000 community.tuist.dev-post-2448
Enhancing Tuist's `graph` Command with Interactive Graph Following up on this RFC — I ended up implementing the interactive graph work:

The demo includes an upload button in the bottom-left, so you can load any graph.json generated from tuist graph --format json and try it with your own project.

Would love any feedback on the integration approach, UX, and what would make this most useful for Tuist users.

]]>
https://community.tuist.dev/t/enhancing-tuists-graph-command-with-interactive-graph/225#post_7 Tue, 10 Mar 2026 23:50:59 +0000 community.tuist.dev-post-2446
Expanding Tuist's Cache Infrastructure Beyond Build Systems
cschmatzler:

One thing I’d like to raise is: what’s the business plan for this?

One other piece I’d add here is that this directly relates to our goal of implementing runners. Instead of coming up with yet another GitHub Action, like Namespace does for their volumes, this would enable caching that isn’t tied to a specific environment, making it more easily portable between providers. We would probably still need to introduce a similar action for teams not using Mise, but the majority of teams we currently work with do use Mise, so a cache integration for Mise tasks could feel really nice.

As for pricing, we can start by offering this for free, as we did with other features like the Gradle remote cache, and then come up with pricing. Whatever pricing we land on should imho directly correlate with the infra costs (as opposed to the Xcode cache, where the value we provide doesn’t correlate with the infra costs; here, I’d argue it does).

This will also require quite some changes in the dashboard as it would be an account-level feature that teams need to be able to track, especially once this would become a paid feature.

I’m not opposed to playing with this on the side, but imho this functionality primarily becomes interesting when we have runners, so I’m not sure I’d prioritize fully tackling this (including design, billing, etc.) before we progress with that feature.

]]>
https://community.tuist.dev/t/expanding-tuists-cache-infrastructure-beyond-build-systems/935#post_7 Tue, 10 Mar 2026 13:04:55 +0000 community.tuist.dev-post-2445
Expanding Tuist's Cache Infrastructure Beyond Build Systems Thanks for the review @cschmatzler

Something like this will likely be scoped to an account instead of a project, since projects are bound to a build toolchain/system. We’d definitely need to include it in the billing plan and put some mechanisms in place to prevent an Xcode CAS situation.

I’d suggest that we go with the right limits in place so that we keep the costs under control, offer it for free and understand from our users the value that we are bringing to them, and then put a price on it that feels reasonable to the value that they are getting.

]]>
https://community.tuist.dev/t/expanding-tuists-cache-infrastructure-beyond-build-systems/935#post_6 Tue, 10 Mar 2026 12:46:33 +0000 community.tuist.dev-post-2444
Expanding Tuist's Cache Infrastructure Beyond Build Systems Hey, new stuff for the stuff I built!

I think having a generalized cache makes sense in general, since the nodes already don’t care what it is on the domain layer (to an extent – we still have separate endpoints).

One thing I’d like to raise is: what’s the business plan for this? I’m already ultra struggling with coming up with a reasonable strategy for the Xcode cache, and if we do something like this it needs

  • authentication like Marek already said
  • maybe some changes to how projects work, or allow cache artifacts that are not attached to a project but an account
  • a billing plan

I think on the implementation side all of the ideas are super simple; in the end it’s just the existing upload and download logic and event pipeline that we already use. On the business side, though, I’d like us to have a plan first before launching, so we don’t get into the same “what do we do now” situation as with Xcode CAS.

]]>
https://community.tuist.dev/t/expanding-tuists-cache-infrastructure-beyond-build-systems/935#post_5 Tue, 10 Mar 2026 11:33:53 +0000 community.tuist.dev-post-2443
Expanding Tuist's Cache Infrastructure Beyond Build Systems
pepicrft:

I think what we are indeed generalizing in this RFC is not just our ability to store binaries and serve them with low latency, but also our ability to store key-value pairs. So it’d be more appropriate to extend the title to say that we are “expanding the selection beyond test selection and expanding caching beyond build system caching”.

Just noted that Bazel calls the key-value piece “action cache”.

]]>
https://community.tuist.dev/t/expanding-tuists-cache-infrastructure-beyond-build-systems/935#post_4 Tue, 10 Mar 2026 11:30:53 +0000 community.tuist.dev-post-2442
Expanding Tuist's Cache Infrastructure Beyond Build Systems
marekfort:

Should input and output be renamed to inputs and outputs? The API should likely take in a variable list of inputs and outputs.

Yeah, we can rename it to inputs and outputs. Note that they are cumulative, so you can have multiple lines:

# Maybe not the best example for two lines, but you get the idea :)
#CACHE inputs ["Package.resolved"] hash=content
#CACHE inputs ["Package.swift"] hash=content

I placed it as an interface for developers to configure how they’d like to hash the files (e.g. mtime, size) for cases where developers prefer to trade reliability for speed, but we can start with content as the only supported/default method, and only add a different one if we are explicitly asked for it.
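The trade-off between the two hashing methods can be illustrated with a toy sketch (illustrative only, not the actual CLI implementation): a content hash is stable across a `touch`, while an mtime-based hash invalidates the moment the timestamp changes.

```python
import hashlib
import os
import tempfile

def hash_file(path, method="content"):
    # "content" reads the bytes (reliable); "mtime" only stats the file
    # (fast, but invalidates on touch and can miss in-place edits that
    # preserve the timestamp and size).
    if method == "content":
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()
    if method == "mtime":
        st = os.stat(path)
        return hashlib.sha256(f"{st.st_mtime_ns}:{st.st_size}".encode()).hexdigest()
    raise ValueError(method)

with tempfile.NamedTemporaryFile("w", delete=False, suffix=".resolved") as f:
    f.write('{"pins": []}')
    path = f.name

before_c = hash_file(path, "content")
before_m = hash_file(path, "mtime")
os.utime(path, (0, 0))  # simulate a touch: timestamp changes, bytes don't
content_stable = hash_file(path, "content") == before_c
mtime_changed = hash_file(path, "mtime") != before_m
os.unlink(path)
assert content_stable and mtime_changed
```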

That’s correct. We can default to tuist.toml as the source of truth for the configuration and give them the option to override at the script level:

#CACHE project "tuist/automation"

If we want to evolve this into a layer (e.g., like MCPs sit between LLMs/agents and the outside world), we might want to iterate on the concept of “project,” since it’s maybe too tied to Tuist. At the end of the day, the project is a piece of information to scope the data & artifacts on our side (we are the ones calling it a project), so we can come up with an interface that makes that concept more generic.

On the auth side, the good thing is that there’s OAuth2, so the authentication flow could be similar to the one that LLM clients follow (i.e. OAuth2 with dynamic client registration).

As for dynamic inputs, the annotation syntax could cover files, command output, and the execution environment:

#FABRIK input files ["Package.resolved"]
#FABRIK input files ["src/**/*.ts"]

# From command output
#FABRIK input command "swift --version"
#FABRIK input command "tuist version"

# From the execution environment
#FABRIK input platform arch
#FABRIK input platform os
#FABRIK input platform os-version

I think what we are indeed generalizing in this RFC is not just our ability to store binaries and serve them with low latency, but also our ability to store key-value pairs. So it’d be more appropriate to extend the title to say that we are “expanding the selection beyond test selection and expanding caching beyond build system caching”.

With that in mind, we have two types of data: “key-value pairs” and “key-large-binary pairs”. While we could persist the former in the latter’s storage, I think it makes sense to stick to our model of using ClickHouse for the former and volumes & object storage for the latter.

Nx does a full replacement and Turborepo merges files when globs like build/** are used. I suggest that we start with a full replacement and iterate from there.

The Cache Client Protocol (CCP) :laughing:

]]>
https://community.tuist.dev/t/expanding-tuists-cache-infrastructure-beyond-build-systems/935#post_3 Tue, 10 Mar 2026 10:04:43 +0000 community.tuist.dev-post-2441
Support treating arbitrary directories as files in Buildable Folder resolution Got it! That’s what we’re doing currently, so I guess it’s what we’ll have to continue with. I didn’t realize that Xcode only supported opaque directories with known extensions, but that totally makes sense. Thanks for the info!

]]>
https://community.tuist.dev/t/support-treating-arbitrary-directories-as-files-in-buildable-folder-resolution/934#post_3 Mon, 09 Mar 2026 16:18:46 +0000 community.tuist.dev-post-2440
Expanding Tuist's Cache Infrastructure Beyond Build Systems
pepicrft:
#!/usr/bin/env tuist exec bash
#CACHE input "Package.resolved" hash=content
#CACHE output ".build/"

tuist install

This is great and the use-case I’d start with since it’s such a common need across teams.

Should input and output be renamed to inputs and outputs? The API should likely take in a variable list of inputs and outputs.

What does the hash=content signal?

Additionally, this will need authentication against the Tuist server. I assume the tuist CLI will need to be installed (at least until this is a concept in Mise) and users would need to have a Tuist project? Since this is meant to be agnostic, integrating this into our dashboard, where you need to pick between Gradle or Xcode build systems, might be a bit awkward.

That being said, I think for the first iteration this coupling is fine. If we see traction, we can consider how to better decouple the cache (and the related analytics, etc.) from the existing project dashboards.

I’d update the examples to not only model static inputs like Package.resolved, but also dynamic ones, like tuist version or swift --version. The API might be somewhat different in those cases, but I think it’s an important piece we shouldn’t gloss over.

Do we actually benefit from the two-tiered approach of key values and artifacts? I wonder if we should have just a single tier with:

  • key being based on the inputs
  • and the value would be a zip of all the outputs

or do you think that parts of the output will be heavily reused across scripts?

If a .build directory already exists and you re-run a script that causes a new version of the .build directory to be downloaded, how do you reconcile the changes between the two?

Yes, I think it would be beneficial to have this as a Mise-first concept. In the end, our primary proposition is serving the artifacts close to you, not necessarily being the only provider that happens to offer this.

So, in general, very aligned, just a couple of quirks to iron out before we go ahead and start implementing this :slightly_smiling_face:

]]>
https://community.tuist.dev/t/expanding-tuists-cache-infrastructure-beyond-build-systems/935#post_2 Mon, 09 Mar 2026 16:13:16 +0000 community.tuist.dev-post-2439
Support treating arbitrary directories as files in Buildable Folder resolution Hi @sphanley :waving_hand:

Unfortunately this isn’t something we can support through buildableFolders. Under the hood, buildableFolders maps to Xcode’s PBXFileSystemSynchronizedRootGroup, which doesn’t have a concept of folder references. Xcode automatically syncs the contents of the directory and natively knows how to handle known opaque directories like .xcassets or .bundle (that’s what PR #9683 fixed on Tuist’s side, our internal file resolution for synthesized accessors). But for an arbitrary directory, there’s no way to tell Xcode “treat this as opaque” within a synchronized group.

For your use case with the Git submodule where you need to preserve the directory structure, resources: [.folderReference(path: "Some/Directory")] is still the right approach.

You can use both APIs in the same target, so you could use buildableFolders for your sources and most resources, and resources: with .folderReference for the specific directories where you need the structure preserved.

]]>
https://community.tuist.dev/t/support-treating-arbitrary-directories-as-files-in-buildable-folder-resolution/934#post_2 Mon, 09 Mar 2026 15:59:32 +0000 community.tuist.dev-post-2438
Expanding Tuist's Cache Infrastructure Beyond Build Systems We’ve invested in building cache infrastructure designed so teams can optimize latency by bringing it closer to their compute. The infrastructure was purpose-built for one use case: optimizing the compilation of build systems by reusing artifacts from previous builds. That said, the primitives we’ve created are broadly useful, and it’s worth asking whether we should extend the interface so developers can plug into it from other contexts.

Below are some ideas, in no particular order, that we might want to explore.


Caching Through Decoration

The idea of reusing artifacts from previous workflows is not exclusive to build systems. Other use cases, such as dependency resolution, could benefit from the same optimization, ideally with a solution that works consistently across environments.

Drawing inspiration from Mise’s pattern of adding annotations to scripts (for example, to declare a CLI and get arguments parsed and validated), we can bring caching capabilities to scripted workflows. Take the following script to resolve dependencies:

#!/usr/bin/env tuist exec bash
#CACHE input "Package.resolved" hash=content
#CACHE output ".build/"

tuist install

Using a shebang and structured comments, developers can bring caching to their existing scripts. In the example above, we hash the input and resolve the output artifacts through our cache infrastructure. This works for any scriptable runtime. For example, in Ruby:

#!/usr/bin/env tuist exec ruby
#CACHE input "Package.resolved" hash=content
#CACHE output ".build/"

sh("tuist install")

GitHub Actions has a similar concept with its cache actions, but it is tightly coupled to that platform, making it difficult to reuse artifacts in other environments. Nx also has comparable capabilities, but it requires users to declare an automation graph using its DSL. Bringing caching closer to scripts through decoration is a smoother integration: it requires only a few comments in existing scripts and perhaps some light refactoring to make those scripts more atomic.
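Parsing these structured comments out of a script header could look like the following sketch (the grammar here is hypothetical and simplified; the real annotation format is still to be designed):

```python
import re

# Matches lines like: #CACHE input "Package.resolved" hash=content
# Kind is "input" or "output"; the hash method is optional.
ANNOTATION = re.compile(r'^#CACHE\s+(input|output)\s+"([^"]+)"(?:\s+hash=(\w+))?')

def parse_cache_annotations(script_text):
    inputs, outputs = [], []
    for line in script_text.splitlines():
        m = ANNOTATION.match(line.strip())
        if not m:
            continue
        kind, path, method = m.groups()
        if kind == "input":
            inputs.append({"path": path, "hash": method or "content"})
        else:
            outputs.append(path)
    return inputs, outputs

script = '''#!/usr/bin/env tuist exec bash
#CACHE input "Package.resolved" hash=content
#CACHE output ".build/"

tuist install
'''
ins, outs = parse_cache_annotations(script)
assert ins == [{"path": "Package.resolved", "hash": "content"}]
assert outs == [".build/"]
```

The `tuist exec` shebang wrapper would run such a parser before handing the script to the underlying interpreter.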

Portability. The portability of this approach depends on how much business logic is baked into the script. A script that just runs tuist install is highly portable; one that encodes project-specific paths, environment assumptions, or conditional logic is not. Portability is a property of the script, not of the mechanism. That said, the decoration interface at least makes the caching intent explicit and environment-agnostic, which is an improvement over solutions that hard-code caching logic inside CI platform configuration.

Mise integration. Mise has tasks that activate the right tools and add features like argument parsing via usage. Our solution would complement this nicely: adding tuist as a Mise dependency and making a few script changes is all that’s needed to unlock caching superpowers for projects already using Mise. This is a strong basis for a marketing message, since many projects use Mise tasks today.


Caching Through a Shell-Based Runtime API

Decoration solves a lot of use cases, but some scripts need access to caching primitives at runtime, for example to implement control flow based on whether a cached result exists. For this, we can expose an interface through the CLI so any script can shell out to it and parse the exit code to drive logic.

# Key-value store operations
tuist cas keys list                     # list recent key-value mappings
tuist cas keys get <key>                # look up what a key resolves to
tuist cas keys set <key> <value>        # create/update a mapping
tuist cas keys delete <key>             # remove a mapping

$ tuist cas keys get a3f9c1e
  Key:     a3f9c1e8b2d4f6a8
  Value:   d8b2f1a4c6e8b0d2
  Created: 2026-03-09 13:45:00

# Artifact (blob) operations
tuist cas artifacts get <hash>              # inspect metadata
tuist cas artifacts download <hash> <path>  # download blob to a local path
tuist cas artifacts push <path>             # upload content, returns hash
tuist cas artifacts delete <hash>           # remove
tuist cas artifacts list                    # list stored artifacts

$ tuist cas artifacts get d8b2f1a
  Hash:     d8b2f1a4c6e8b0d2
  Size:     12.4 MB
  Type:     xcframework
  Name:     TuistKit
  Stored:   2026-03-09 13:45:00

Here is a concrete example: building DocC documentation for a large target can take over 10 minutes. With this API, we can turn that CPU-bound operation into a network round-trip when the inputs haven’t changed:

#!/bin/bash

# Hash all Swift sources plus the DocC catalog files; using find for both
# avoids relying on bash's globstar and skips directories.
INPUT_HASH=$({ find Sources/TuistKit -name "*.swift"; find Sources/TuistKit/TuistKit.docc -type f; } | sort | xargs cat | shasum | cut -d' ' -f1)

RESULT=$(tuist cas keys get "$INPUT_HASH" --json 2>/dev/null)

if [ $? -eq 0 ]; then
  ARTIFACT_HASH=$(echo "$RESULT" | jq -r '.value')
  tuist cas artifacts download "$ARTIFACT_HASH" docs.doccarchive.tar.gz
  tar xzf docs.doccarchive.tar.gz
  rm docs.doccarchive.tar.gz
  echo "Restored docs from cache."
else
  xcodebuild docbuild \
    -workspace Tuist.xcworkspace \
    -scheme TuistKit \
    -derivedDataPath .build/

  tar czf docs.doccarchive.tar.gz .build/Build/Products/Debug/TuistKit.doccarchive
  ARTIFACT_HASH=$(tuist cas artifacts push docs.doccarchive.tar.gz --json | jq -r '.hash')
  tuist cas keys set "$INPUT_HASH" "$ARTIFACT_HASH"
  rm docs.doccarchive.tar.gz
  echo "Docs built and cached."
fi

A less obvious but important benefit of this API is that it lets teams decouple skip logic from their CI pipelines. Today, a lot of “skip this job if these files haven’t changed” logic is implemented using platform-specific features: GitHub Actions path filters, custom hash-and-compare steps, or bespoke shell scripts embedded in YAML. That logic is hard to test, hard to reuse across pipelines, and completely lost when a team switches CI providers. With a runtime cache API, the same skip logic can live in a plain script that runs identically on any machine. The optimization is no longer a CI concern; it becomes part of the workflow itself.
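The shape of such provider-agnostic skip logic can be sketched with an in-memory stand-in for the cache backend (in practice the dictionary would be replaced by `tuist cas keys get/set` calls; names here are illustrative):

```python
import hashlib

cache = {}  # stand-in for the remote key-value store

def input_key(*file_contents):
    # Derive a deterministic key from the step's inputs.
    h = hashlib.sha256()
    for content in file_contents:
        h.update(content.encode())
    return h.hexdigest()

def run_step(name, inputs, work):
    """Skip `work` when the input key was seen before; otherwise run it
    and record the result under the key."""
    key = f"{name}:{input_key(*inputs)}"
    if key in cache:
        return "skipped"
    cache[key] = work()
    return "ran"

first = run_step("lint", ["source-v1"], lambda: "ok")
second = run_step("lint", ["source-v1"], lambda: "ok")  # same inputs -> skip
third = run_step("lint", ["source-v2"], lambda: "ok")   # changed inputs -> run
assert (first, second, third) == ("ran", "skipped", "ran")
```

Because none of this references a CI provider, the same script behaves identically on GitHub Actions, another CI, or a laptop.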


Caching Through Native Bindings

This is a longer-term direction. Once the shell-based API has matured, we can consider providing native bindings for popular runtimes so teams can build tighter integrations without needing system processes or a separately installed CLI. The DocC example above would become something like this with Node.js bindings:

import { cas } from "@tuist/cas";
import { hash } from "@tuist/cas/hash";
import { exec } from "node:child_process";

const inputHash = await hash.files([
  "Sources/TuistKit/**/*.swift",
  "Sources/TuistKit/TuistKit.docc/**/*",
]);

const entry = await cas.keys.get(inputHash);

if (entry) {
  await cas.artifacts.download(entry.value, ".build/TuistKit.doccarchive", {
    extract: true,
  });
  console.log("Restored docs from cache.");
} else {
  exec("xcodebuild docbuild -workspace Tuist.xcworkspace -scheme TuistKit -derivedDataPath .build/");

  const { hash } = await cas.artifacts.push(".build/Build/Products/Debug/TuistKit.doccarchive");
  await cas.keys.set(inputHash, hash);
  console.log("Docs built and cached.");
}

Additional Considerations

Telemetry and Observability

If we expand caching beyond build systems, we need to invest in telemetry and UI to match. Teams will need to observe how the cache is being used, purge entries, and understand how usage distributes across different scripts and workflows. The interface changes are only part of the investment; the observability layer needs to keep pace.

A Narrow Waist for Build Infrastructure

Rather than positioning this purely as a Tuist feature, we could frame it as an infrastructure-agnostic interface between projects and their cache backends. This approach has strong precedent: OpenTelemetry, Prometheus, and Kubernetes all succeeded by defining standards that allowed users to choose their own providers.

Concretely, we could influence tools like Mise to treat caching as a first-class concept in their task runner, with Tuist as one of several pluggable backends. This would make Mise a meaningful go-to-market channel for us while benefiting the broader ecosystem. It does require giving up some branding control, but the reach and credibility that comes with being a standards-aligned provider could more than compensate.
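A pluggable-backend framing could be as small as a key/artifact protocol that providers implement. The shape below is hypothetical, just to show how thin the narrow waist could be:

```python
from typing import Optional, Protocol

class CacheBackend(Protocol):
    """Minimal surface a task runner like Mise would need from any provider."""
    def get_key(self, key: str) -> Optional[str]: ...
    def set_key(self, key: str, value: str) -> None: ...
    def download_artifact(self, digest: str, dest: str) -> None: ...
    def upload_artifact(self, path: str) -> str: ...

class InMemoryBackend:
    # Reference implementation for tests; a Tuist-backed one would call the API.
    def __init__(self):
        self.keys, self.blobs = {}, {}
    def get_key(self, key): return self.keys.get(key)
    def set_key(self, key, value): self.keys[key] = value
    def download_artifact(self, digest, dest): self.blobs[dest] = self.blobs[digest]
    def upload_artifact(self, path):
        digest = f"sha256-{hash(path) & 0xFFFF:x}"  # toy digest for the sketch
        self.blobs[digest] = path
        return digest

backend: CacheBackend = InMemoryBackend()
digest = backend.upload_artifact("docs.tar.gz")
backend.set_key("docs-input-hash", digest)
assert backend.get_key("docs-input-hash") == digest
```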

]]>
https://community.tuist.dev/t/expanding-tuists-cache-infrastructure-beyond-build-systems/935#post_1 Mon, 09 Mar 2026 14:52:46 +0000 community.tuist.dev-post-2437
Support treating arbitrary directories as files in Buildable Folder resolution With this recent PR, opaque directories such as .xcassets directories are treated as files rather than directories. I’m wondering if it’s possible to support similarly treating an arbitrary directory as a file? This was possible when using resources: rather than buildableFolders:, by passing FileElement.folderReference(path: "Some/Directory") instead of "Some/Directory". But while resources: took arguments of type ResourceFileElements, buildableFolders only takes a BuildableFolder, which can only be created from a string or Path, not a FileElement.

This would be valuable to my team– without going into too much detail, we have an external Git Submodule which contains nested subdirectories of resources, and it’s ideal for us to preserve the underlying directory structure when accessing these files.

A related but slightly different request would be the ability to add an arbitrary file to the BuildableFolders array, with it being treated the same as an opaque folder. I understand that in practical terms this would likely be functionally the same as just passing single files via resources:, but it would be nice from a readability perspective to have all resources in one place.

]]>
https://community.tuist.dev/t/support-treating-arbitrary-directories-as-files-in-buildable-folder-resolution/934#post_1 Mon, 09 Mar 2026 13:53:14 +0000 community.tuist.dev-post-2436
RFC: Test Sharding Based on some extra feedback, we decided that suite-level splitting should be supported from day one for Xcode as well. Here’s how it would work technically.

Filtering mechanism

The .xctestrun plist already supports class-level filtering natively. Each TestTarget entry accepts:

  • OnlyTestIdentifiers: array of identifiers to include
  • SkipTestIdentifiers: array of identifiers to exclude

The identifier format is ClassName or ClassName/testMethodName. So the server can inject OnlyTestIdentifiers per test target to restrict each shard to its assigned classes, the same way it currently strips entire TestTarget entries for module-level sharding, but one level deeper:

<key>OnlyTestIdentifiers</key>
<array>
    <string>CalculatorTests</string>
    <string>NetworkClientTests</string>
</array>

No -only-testing flags needed. The filtered .xctestrun is self-contained.
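The injection step could be sketched with plistlib (a simplified shape — real .xctestrun files carry more keys and a layout that varies by format version):

```python
import plistlib

def filter_xctestrun(data, assigned):
    """Add OnlyTestIdentifiers to each TestTarget entry based on the
    classes assigned to this shard. Simplified: metadata keys are skipped
    and the flat pre-TestConfigurations layout is assumed."""
    plist = plistlib.loads(data)
    for name, target in plist.items():
        if name.startswith("__"):  # e.g. __xctestrun_metadata__
            continue
        target["OnlyTestIdentifiers"] = assigned.get(name, [])
    return plistlib.dumps(plist)

source = plistlib.dumps({
    "__xctestrun_metadata__": {"FormatVersion": 1},
    "AppTests": {"BlueprintName": "AppTests"},
})
out = plistlib.loads(
    filter_xctestrun(source, {"AppTests": ["CalculatorTests", "NetworkClientTests"]})
)
assert out["AppTests"]["OnlyTestIdentifiers"] == ["CalculatorTests", "NetworkClientTests"]
```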

Test suite discovery

For module-level sharding, the .xctestrun plist is the source of truth. Each TestTarget entry’s BlueprintName gives the complete module list. For suite-level sharding, the .xctestrun doesn’t list individual classes inside each target, so we need an additional enumeration step.

The plan step already runs xcodebuild build-for-testing. After building, xcodebuild test-without-building -enumerate-tests (Xcode 16+) enumerates all test targets, classes, and methods from the built products without executing them. The client sends this class list to the server the same way it sends the module list for module-level sharding. The server does the same bin-packing either way.

Performance: enumerate-tests doesn’t execute any tests, but it does load the test bundles into a simulator or test host process to reflect on XCTestCase subclasses. This can add 10-30 seconds on top of the build depending on project size and whether the simulator is already booted. Since the plan step already runs build-for-testing (which typically boots a simulator), the incremental cost should be modest. An alternative would be parsing the .xctest Mach-O binaries directly with nm to extract test* symbols — this is instant but fragile across Swift name mangling changes and wouldn’t catch dynamically generated tests. enumerate-tests is the safer default. If it turns out to be too slow for large projects, we could revisit this.

How it fits into the existing design

The change is minimal. It’s the same .xctestrun filtering mechanism, just at a finer granularity:

  • Module-level (current default): Server removes TestTarget entries not assigned to the shard.
  • Suite-level (opt-in, e.g., --granularity suite): Server keeps all TestTarget entries but adds OnlyTestIdentifiers to each, filtering to the assigned classes.
  • Timing data: Uses test_suite_runs (avg_duration per class) instead of test_module_runs.
  • Bin-packing: Same LPT algorithm, just operating on classes instead of modules.

In a very similar way, we could also do sharding at the individual test case level.
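The LPT (longest processing time) bin-packing mentioned above is simple enough to sketch; the class timings below are made up, standing in for avg_duration values from test_suite_runs:

```python
import heapq

def lpt_shards(durations, shard_count):
    """Greedy LPT: sort items by duration descending, always assign the
    next item to the currently lightest shard."""
    heap = [(0.0, i, []) for i in range(shard_count)]  # (load, shard index, items)
    heapq.heapify(heap)
    for name, seconds in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, i, items = heapq.heappop(heap)
        items.append(name)
        heapq.heappush(heap, (load + seconds, i, items))
    return sorted(heap, key=lambda shard: shard[1])

# Hypothetical per-class average durations in seconds.
timings = {"CheckoutFlowUITests": 120, "AuthTokenRefreshTests": 45,
           "UserSessionTests": 40, "CalculatorTests": 5, "NetworkClientTests": 10}
shards = lpt_shards(timings, 2)
loads = [load for load, _, _ in shards]
assert sum(loads) == sum(timings.values())  # every class assigned exactly once
assert abs(loads[0] - loads[1]) <= 30       # roughly balanced
```

The same function works unchanged at module granularity; only the `durations` source differs.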

]]>
https://community.tuist.dev/t/rfc-test-sharding/929#post_6 Fri, 06 Mar 2026 18:27:58 +0000 community.tuist.dev-post-2433
Enforce bundle size in PRs with the Tuist GitHub App Tuist can already track your app’s bundle size over time and post PR comments showing how it changed. But until now, there was no way to enforce a limit, and so a size regression could slip through if no one caught the comment.

You can now configure bundle size thresholds that automatically fail a PR’s checks when your app’s install or download size grows beyond a percentage you define. The check includes a breakdown of the baseline vs. current size and the exact deviation, so there’s no guessing about what changed.

And if the increase is intentional, say you added a new feature that justifiably adds weight, you can accept it directly from the GitHub check run with one click. No need to disable the threshold, merge with a failing check, or ping someone to approve.

How it works

  1. Connect your project to the Tuist GitHub app
  2. Go to your project’s Bundles settings and add a threshold. Pick a metric (install size or download size), a baseline branch, and a maximum deviation percentage
  3. Run tuist inspect bundle in CI as you already do
  4. Tuist creates a GitHub Check Run on the PR. If the threshold is exceeded, the check fails with an Accept button to approve the increase
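The threshold check itself boils down to a percentage-deviation comparison against the baseline. A minimal sketch in Python (function name and signature are illustrative, not Tuist's actual implementation):

```python
def check_bundle_threshold(baseline_bytes: int, current_bytes: int, max_deviation_pct: float):
    """Return (deviation_pct, passed): the size deviation relative to the
    baseline, and whether it stays within the configured threshold."""
    deviation = (current_bytes - baseline_bytes) / baseline_bytes * 100
    return deviation, deviation <= max_deviation_pct
```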

Learn more

Full documentation: Bundle size thresholds

As always, feedback is very much welcome :relieved_face:

]]>
https://community.tuist.dev/t/enforce-bundle-size-in-prs-with-the-tuist-github-app/931#post_1 Fri, 06 Mar 2026 17:44:51 +0000 community.tuist.dev-post-2432
RFC: Test Sharding Great question!

I verified this locally and code coverage works with .xctestproducts bundles without needing DerivedData. The .xctestrun file inside the bundle already includes CodeCoverageBuildableInfos with source file references and __TESTROOT__ placeholders, so coverage data is produced correctly even when running from a completely different directory.

The challenge with sharding is merging coverage across shards. Each shard produces its own .xcresult with partial coverage. Only the test targets that ran on that shard have actual coverage data; the rest show 0%. Apple’s xcrun xcresulttool merge can combine them into a single report, but it’s a macOS-only tool.

I’d propose two options:

  1. A new Tuist command (e.g., tuist test merge-results --session <id>) that downloads all shard .xcresult bundles from the server, runs xcresulttool merge, and produces a single merged .xcresult. This would run on a macOS CI runner as a post-sharding step. The advantage is that it’s simple, uses Apple’s official tooling, and the merged result is a standard .xcresult that integrates with any existing coverage workflows.
  2. Server-side merging, where the Tuist server merges the results. Since xcresulttool is macOS-only, this would require running a macOS node in our infrastructure. The advantage is that it’s fully automated, no extra CI job is needed, and the merged result would be available directly in the dashboard.

I’d probably start with 1. and see if there would still be a need for 2.

Thanks for raising this!

]]>
https://community.tuist.dev/t/rfc-test-sharding/929#post_5 Fri, 06 Mar 2026 16:58:46 +0000 community.tuist.dev-post-2431
RFC: Test Sharding Great RFC!

Thank you for adding it. It’s a great feature that mobile teams otherwise have to implement themselves.

Quick question: I remember previously having issues gathering code coverage from tests executed via .xctestproducts, and we had to use DerivedData instead.

Is this no longer an issue with this approach?

]]>
https://community.tuist.dev/t/rfc-test-sharding/929#post_4 Fri, 06 Mar 2026 16:38:14 +0000 community.tuist.dev-post-2430
RFC: Test Sharding
pepicrft:

Is the processing logic macOS-bound? Sounds like we’ll be able to reuse the infrastructure for the server-side processing. In that case, will the client poll the server state until the shard information is ready to continue? Or is the plan to do it synchronously on the server?

.xctestrun is a plist. I’d try to process it directly server-side, so we don’t need to deal with it being asynchronous. In the worst case, we could use our processing nodes, yeah. But macOS shouldn’t be necessary for processing it, no.

For tuist test, the picture is a bit more complex as we need the graph to be available to upload selective test results. But we should be able to upload the graph.json and use that. Will make sure to cover this use-case when building this out.

It does not. We will need to bundle individual binaries/resources/etc. There are existing plugins that do that, like this one, which we can take inspiration from. It’s certainly doable, but yeah, the complexity for Gradle might be a bit higher because of this.

Yeah, we can support passing the shard both as a env variable and as a CLI option.

Test suites !== modules. Modules are a coarser level of granularity aimed at modularized Xcode projects, where the module feels like the better abstraction. As mentioned in the RFC, down the road, we could support test suite granularity for Xcode projects and module granularity for Gradle projects. The naming is already aligned with our current database and API test model conventions.

Agree :+1:

Will do.

]]>
https://community.tuist.dev/t/rfc-test-sharding/929#post_3 Fri, 06 Mar 2026 15:48:59 +0000 community.tuist.dev-post-2428
RFC: Test Sharding Thanks for putting this together. I’m aligned with the direction. Some comments on things that I noted.

Is the processing logic macOS-bound? Sounds like we’ll be able to reuse the infrastructure for the server-side processing. In that case, will the client poll the server state until the shard information is ready to continue? Or is the plan to do it synchronously on the server?

Do jobs need to pull the sources for this if the file is self-contained, containing the binaries?

Does Gradle have a portable format as Xcode does? Or will we have to come up with one ourselves?

Can/should we also support passing the shard as an env. variable?

./gradlew test --shard ${{ matrix.shard }}

Not a big deal, though.

I noticed the payload of this one is very similar to the Xcode one, with a slight difference in terminology, test_suites vs modules. Do you think there’s an opportunity here to align the naming? Or do you think it’s better to go with a different payload based on the build system of the project?

I’d leave this out of the scope of this work, but something definitely to consider down the line.

I think the presence of --shard-* is explicit enough.

Since the sharding configuration is closer to the CI automation, I think it’s better to make it explicit from the command invocation. If needed, we can add this later.


I’d recommend sharing this with the users that we know are doing sharding already to see if we’ve missed anything.

]]>
https://community.tuist.dev/t/rfc-test-sharding/929#post_2 Fri, 06 Mar 2026 15:30:56 +0000 community.tuist.dev-post-2427
RFC: Test Sharding Summary

This RFC proposes adding test sharding to Tuist, allowing users to split their test suites across multiple CI runners for faster feedback loops. The system will distribute tests across shards using timing data collected by the Tuist server, with fallback strategies when no historical data is available. The splitting granularity differs by build system: module-level for Xcode (test targets) and suite-level for Gradle (test suites, i.e. Gradle test classes), reflecting the conventions and tooling capabilities of each ecosystem. While the initial focus is on GitHub Actions integration, the design is CI-agnostic.

Motivation

As projects grow, test suites become the bottleneck in CI pipelines. A monorepo with dozens of test targets can take 30+ minutes to run sequentially on a single machine. Teams work around this by manually splitting tests across CI jobs, but this approach is fragile, hard to maintain, and leads to unbalanced shards where one job takes 20 minutes while others finish in 5.

Tuist is uniquely positioned to solve this because:

  1. Tuist already knows the project graph – it understands which test targets exist and their dependencies.
  2. The Tuist server already collects test timing data – per-module and per-test-case durations from tuist test result uploads, stored in ClickHouse with recent_durations and avg_duration fields.
  3. Tuist already detects CI environments – GitHub Actions, GitLab CI, CircleCI, Buildkite, Bitrise, and Codemagic.

The missing piece is an orchestration layer that uses this data to produce balanced shard assignments and integrates with CI matrix strategies.

Prior Art

Buildkite Test Engine Client (bktec)

Buildkite uses a bin-packing algorithm with historical timing data to distribute tests so all parallel workers finish at roughly the same time. It supports file-level and example-level splitting, marks files exceeding 70% of a worker’s estimated time for finer-grained splitting, and suggests parallelism counts that keep all workers within a ~2-minute completion window. New tests default to an estimated 1000ms until real data is available.

CircleCI circleci tests split

CircleCI offers three strategies: by name (round-robin alphabetically), by timing (historical execution times from store_test_results), and by file size. The parallelism: N key spins up N containers, each aware of its index via $CIRCLE_NODE_INDEX / $CIRCLE_NODE_TOTAL. Timing data accumulates automatically from uploaded test results.

Bazel shard_count

Bazel sets TEST_TOTAL_SHARDS and TEST_SHARD_INDEX environment variables. The test runner selects tests via index % total_shards == shard_index. Purely count-based with no timing optimization.

Gradle / Develocity

Gradle’s built-in maxParallelForks uses round-robin across JVM forks (single machine). Develocity Test Distribution (commercial) uses timing-based partitioning across remote agents with real-time work-stealing.

Proposed Solution

Overview

The sharding workflow has three phases:

  1. Plan – The CLI (or Gradle plugin) queries the server for test timing data and computes a shard assignment.
  2. Execute – Each CI runner receives its shard index and runs only its assigned tests.
  3. Report – Each shard uploads its test results to the server as it does today.

Shard Configuration

Sharding is configured via CLI flags on tuist test --build-only or tuist xcodebuild build-for-testing (for Xcode) or environment variables (for Gradle). This allows users to experiment with sharding in feature branches before rolling it out — the CI workflow file is branch-specific.

See the sharding flags table in Section 1 (Xcode projects) below for the full list of options.

Running a Specific Shard

There are two execution paths depending on the build system.

1. Xcode projects

Sharding is built into two command layers:

  • tuist test (recommended for Tuist-generated projects) — the existing tuist test command gains --shard-* flags. When --build-only is combined with sharding flags, it generates the project, builds for testing, computes shards, and outputs the matrix. When --without-building is used with shard environment variables, it pulls the filtered .xctestrun and runs the assigned tests. This is the recommended path for Tuist-generated projects because it handles project generation, selective testing, and sharding in a single command.
  • tuist xcodebuild build-for-testing / test-without-building (for non-generated projects) — the same sharding behavior, but without project generation or selective testing. This is the path for projects that manage their own .xcodeproj / .xcworkspace.

Under the hood, both paths use the same .xctestproducts bundle mechanism — the only difference is that tuist test also handles project generation and selective testing.

Test module discovery via .xctestrun: The .xctestrun plist file (embedded inside the .xctestproducts bundle produced by xcodebuild build-for-testing -testProductsPath) contains an entry for every test target in the scheme. This is the authoritative source of “what test modules exist in this project right now” — it works regardless of whether the project uses Tuist manifests.

This solves three problems:

  • New modules: A newly added test target appears in the .xctestrun file immediately. The server won’t have timing data for it, so it gets a default duration estimate. It will still be included in a shard and tested.
  • Removed modules: A deleted test target disappears from the .xctestrun file. The server may still have historical timing data, but since the module isn’t in the discovered set, it’s excluded from shard computation. Stale server data is harmlessly ignored.
  • First run: No bootstrapping problem. The .xctestrun file provides the full module list even when the server has no historical data at all. All modules get default estimates, producing a round-robin-like distribution.

Build-once, test-many pattern across machines: In CI, the build step and shard test steps typically run on different machines. The shard runners cannot reference local files from the build agent. To handle this:

  • The plan step (tuist test --build-only or tuist xcodebuild build-for-testing) auto-injects the -testProductsPath flag, producing a .xctestproducts bundle — a self-contained, portable artifact that packages the .xctestrun file alongside the compiled .xctest bundles.
  • When sharding flags are present, the bundle is uploaded to the Tuist server as part of the shard session. Each shard runner downloads the bundle with a filtered .xctestrun containing only its assigned test targets. No CI-provider-specific artifact sharing is needed.

The .xctestproducts bundle format (validated experimentally):

MyApp.xctestproducts/
├── Info.plist                          # Maps test plans to xctestrun file paths
├── Tests/
│   └── 0/
│       ├── MyApp.xctestrun            # The xctestrun file
│       └── Debug -> ../../Binaries/0/Debug  # Symlink to binaries
└── Binaries/
    └── 0/
        └── Debug/
            ├── AppTests.xctest/       # Compiled test bundles
            ├── CoreTests.xctest/
            └── ...

Key properties:

  • Self-contained and portable: The bundle contains everything needed to run tests on another machine — no source code, intermediate build artifacts, or DerivedData.
  • __TESTROOT__ resolves automatically: The .xctestrun file uses __TESTROOT__ placeholders. Inside the bundle, Tests/0/Debug symlinks to ../../Binaries/0/Debug, so __TESTROOT__/Debug/*.xctest resolves correctly.
  • xcodebuild test-without-building -testProductsPath consumes this bundle directly. Filtering works by modifying the .xctestrun inside the bundle to remove test target entries — only the targets present in the .xctestrun are executed.

Step 1: Plan job:

# Tuist-generated projects (recommended):
# Generates the project, builds for testing, computes shards, outputs the matrix.
tuist test --build-only --shard-max 6

# Non-generated projects:
# Build, compute shards, push to server, and output the shard matrix — all in one command.
tuist xcodebuild build-for-testing \
  -workspace MyApp.xcworkspace \
  -scheme MyApp \
  -destination 'platform=iOS Simulator,name=iPhone 16' \
  --shard-max 6

Sharding flags (available on both tuist test and tuist xcodebuild build-for-testing):

Flag Description Default
--shard-max N Maximum number of shards Number of test modules
--shard-min N Minimum number of shards 1
--shard-total N Exact number of shards (overrides min/max) Auto-determined
--shard-max-duration N Target max shard duration (seconds) None

These are Tuist-specific flags (not passed through to xcodebuild). The presence of any --shard-* flag activates sharding.

Automatic -testProductsPath injection: When sharding is active, the CLI auto-injects -testProductsPath (e.g., .tuist/test-products/<scheme>.xctestproducts) so the bundle is produced in a known location. For tuist xcodebuild, users can override this by passing their own -testProductsPath.

When sharding flags are present, the plan step (either tuist test --build-only or tuist xcodebuild build-for-testing) extends its normal behavior with:

  1. Auto-injects -testProductsPath if not already present.
  2. Runs xcodebuild build-for-testing with the passthrough arguments.
  3. Locates the .xctestrun file inside the produced .xctestproducts bundle.
  4. Parses the .xctestrun plist to discover test targets (each entry in TestConfigurations[0].TestTargets is a test module with a BlueprintName).
  5. Sends the module list, .xctestrun file, and shard configuration (min/max/total/max-duration) to the server. The server fetches timing data, computes shard assignments via bin-packing, and stores the .xctestrun + assignments tagged with a shard session ID.
  6. Receives the shard assignments back from the server.
  7. Outputs the shard matrix to the CI provider:
    • GitHub Actions: Writes matrix={"shard":[0,1,2,...]} directly to $GITHUB_OUTPUT (detected via the GITHUB_OUTPUT environment variable).
    • Other CI providers: Writes a tuist-shard-matrix.json file. Future integrations can add native output for other providers.

Without sharding flags, the command behaves exactly as it does today.
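Step 4 above (discovering test modules from the .xctestrun plist) can be sketched in Python (illustrative only; the function name is hypothetical). TestConfigurations, TestTargets, and BlueprintName are standard xctestrun keys:

```python
import plistlib

def discover_test_modules(data: bytes) -> list:
    """Return the BlueprintName of every entry in
    TestConfigurations[0].TestTargets, i.e. every test module in the
    scheme, given the raw bytes of an .xctestrun plist."""
    plist = plistlib.loads(data)
    targets = plist["TestConfigurations"][0]["TestTargets"]
    return [t["BlueprintName"] for t in targets]
```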

The .xctestproducts bundle is uploaded to the Tuist server as part of the shard session. Shard runners download it from the server alongside the filtered .xctestrun, so no CI-provider-specific artifact sharing is needed.

Step 2: Shard jobs:

# Tuist-generated projects (recommended):
# Downloads filtered .xctestrun, runs assigned tests, uploads results.
tuist test --without-building

# Non-generated projects:
# Same behavior, but without project generation or selective testing.
tuist xcodebuild test-without-building \
  -destination 'platform=iOS Simulator,name=iPhone 16'

When the TUIST_SHARD_INDEX environment variable is set, the shard step (either tuist test --without-building or tuist xcodebuild test-without-building) extends its normal behavior with:

  1. Downloads the .xctestproducts bundle for this shard from the Tuist server (session ID auto-detected from CI environment). The bundle contains a filtered .xctestrun with only the test targets assigned to this shard.
  2. Places the bundle at the known location (.tuist/test-products/<scheme>.xctestproducts) and auto-injects -testProductsPath.
  3. Runs xcodebuild test-without-building -testProductsPath <bundle-path> with the passthrough arguments.
  4. After tests complete, uploads test results to the server (as today), with shard metadata attached.

Without TUIST_SHARD_INDEX, the command behaves exactly as it does today — all tests run.

The server stores the original .xctestproducts bundle and the shard assignments. When a shard runner requests its bundle, the server removes the TestTargets entries that don’t belong to that shard from the .xctestrun’s TestConfigurations and returns the modified bundle.
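That filtering step amounts to dropping plist entries. A Python sketch, assuming the standard TestConfigurations/TestTargets layout (function name hypothetical; the real implementation lives in the server):

```python
import plistlib

def filter_xctestrun_to_shard(data: bytes, shard_targets: set) -> bytes:
    """Remove TestTargets entries not assigned to this shard from the raw
    bytes of an .xctestrun plist, returning the modified plist."""
    plist = plistlib.loads(data)
    for config in plist.get("TestConfigurations", []):
        config["TestTargets"] = [
            t for t in config["TestTargets"]
            if t.get("BlueprintName") in shard_targets
        ]
    return plistlib.dumps(plist)
```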

Shard detection for Xcode shard runners:

Env Var Description
TUIST_SHARD_INDEX The index of this shard (0-based)

This is set in the CI workflow (e.g., from GitHub Actions matrix.shard). The total number of shards is already stored in the shard session on the server — the runner only needs to know its own index.

Coupling plan and shard jobs — shard session ID: The build and test steps need a shared identifier so shard runners can find the correct .xctestrun on the server. This is handled via a shard session ID derived from the CI environment:

CI Provider Session ID derived from
GitHub Actions github-{GITHUB_RUN_ID}-{GITHUB_RUN_ATTEMPT}
CircleCI circleci-{CIRCLE_WORKFLOW_ID}
Buildkite buildkite-{BUILDKITE_BUILD_ID}
GitLab CI gitlab-{CI_PIPELINE_ID}
Other / local Explicit --session <id> flag required

Since the Tuist CLI already detects CI environments, the session ID is auto-detected in most cases. The plan job and shard jobs within the same CI run share the same environment variables, so they produce the same session ID without any manual passing.

For retries: GITHUB_RUN_ATTEMPT is included so a retried workflow run gets a fresh session, avoiding stale shard assignments from a previous attempt.
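The derivation in the table above can be sketched as follows (illustrative only; the actual CLI logic may differ in details such as the fallback attempt number):

```python
import os

def derive_session_id(env=os.environ):
    """Derive a shard session ID from CI environment variables, per the
    table above. Returns None when no known CI provider is detected, in
    which case an explicit --session <id> flag would be required."""
    if "GITHUB_RUN_ID" in env:
        # GITHUB_RUN_ATTEMPT keeps retried runs from reusing stale shards.
        return f"github-{env['GITHUB_RUN_ID']}-{env.get('GITHUB_RUN_ATTEMPT', '1')}"
    if "CIRCLE_WORKFLOW_ID" in env:
        return f"circleci-{env['CIRCLE_WORKFLOW_ID']}"
    if "BUILDKITE_BUILD_ID" in env:
        return f"buildkite-{env['BUILDKITE_BUILD_ID']}"
    if "CI_PIPELINE_ID" in env:
        return f"gitlab-{env['CI_PIPELINE_ID']}"
    return None
```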

2. Gradle projects: Tuist Gradle plugin

For Gradle projects, sharding is integrated into the Tuist Gradle plugin (dev.tuist:tuist-gradle-plugin). The plugin already hooks into Gradle’s test lifecycle for test insights and quarantine; sharding extends this with a prepareTestShards task and a test filtering step. To align with the Xcode workflow, shard configuration is passed as flags to the prepareTestShards task rather than being declared in the Gradle DSL.

Why suite-level splitting for Gradle: Gradle projects vary widely in modularization. Many Gradle projects follow a multi-module architecture (:feature:home, :core:network, etc.), but it’s equally common to see projects with a handful of modules or even a single monolithic :app module containing all tests. Module-level sharding would be useless in the latter case. Since the Tuist Gradle plugin already collects per-suite timing data (Gradle test classes map to Tuist test suites) via its TestListener, and Gradle’s filter.includeTestsMatching() API natively supports suite-level filtering (already used for test quarantine), suite-level splitting is both more practical and more effective.

Configuration: Sharding is configured via flags on the prepareTestShards Gradle task, mirroring the --shard-* flags used by tuist test and tuist xcodebuild for Xcode:

Flag Description Default
--shard-max <n> Maximum number of shards Required
--shard-min <n> Minimum number of shards 1
--shard-max-duration <s> Target max shard duration (seconds)

The TUIST_SHARD_INDEX environment variable tells the plugin which shard this runner is. When absent and prepareTestShards is invoked, the plugin runs the plan step — it discovers test suites, sends them to the server, and outputs the shard matrix (same CI integration as Xcode: GitHub Actions $GITHUB_OUTPUT, Buildkite buildkite-agent pipeline upload, or tuist-shard-matrix.json). When TUIST_SHARD_INDEX is set, the plugin runs in shard mode — it pulls its assigned test suites from the server and filters accordingly.

How it works:

Plan step (./gradlew prepareTestShards --shard-max <n>):

  1. The plugin compiles test sources and scans the test classpath to discover all current test suites. This is the source of truth for what exists now (same principle as .xctestrun for Xcode).
  2. The plugin packages the compiled test runtime classpath (compiled classes, application classes, and dependencies).
  3. The plugin calls the Tuist server’s shard session endpoint, uploading the packaged classpath along with the discovered test suites and shard configuration from the task flags.
  4. The server fetches per-suite timing data from test_suite_runs, computes shard assignments via bin-packing, and stores the session alongside the classpath.
  5. The plugin receives the shard assignments and outputs the shard matrix to the CI provider.
  6. Tests do not run in this step — the plan step only compiles, uploads, and computes the matrix.

Shard step (TUIST_SHARD_INDEX set):

  1. The plugin downloads the compiled test classpath from the Tuist server (session ID auto-detected from CI environment, same as Xcode).
  2. The plugin pulls the assigned test suites for this shard from the server.
  3. The plugin uses Gradle’s filter.includeTestsMatching() API (the same mechanism used for test quarantine today) to include only the assigned test suites, and configures the test task to use the downloaded classpath — skipping compilation entirely.
  4. Tests run and results are uploaded as usual, with shard metadata included.
# Plan step — compiles, uploads test classpath to server, computes shards, outputs matrix
./gradlew prepareTestShards --shard-max 6

# Shard step — downloads compiled test classpath from server, runs assigned tests
TUIST_SHARD_INDEX=${{ matrix.shard }} ./gradlew test

Build-once, test-many for Gradle: Like Xcode’s .xctestproducts bundle, the plan step packages and uploads the compiled test runtime classpath (compiled classes, application classes, and dependencies) to the Tuist server. Shard runners download it and run tests without recompilation — the same pattern Develocity Test Distribution uses when transferring compiled test binaries to remote agents.

Partitioning Strategy

The initial implementation uses a single strategy: timing-based bin-packing. The algorithm is the same for both build systems; what differs is the unit of distribution:

  • Xcode: test modules (targets) — data from test_module_runs
  • Gradle: test suites — data from test_suite_runs

timing (default and only strategy)

Uses historical test durations from the Tuist server to create balanced shards via a greedy bin-packing algorithm (Longest Processing Time first, or LPT). The algorithm runs server-side.

The core idea is simple: if you’re packing items of different sizes into a fixed number of bins, you get the most even distribution by placing the largest item first into the emptiest bin, then repeating. Applied to test sharding, each “item” is a test unit (module or class) with a known duration, and each “bin” is a shard. The algorithm minimizes the longest shard’s total duration, which is what determines overall CI wall-clock time.

Steps:

  1. Fetch avg_duration for each unit (module or class) from ClickHouse (scoped to the project and default branch).
  2. Sort units by duration descending (longest first).
  3. For each unit, assign it to the shard with the lowest total estimated duration so far.

Example: Given 5 modules with durations [30s, 25s, 20s, 15s, 10s] and 3 shards:

  • Shard 0 ← 30s → total: 30s
  • Shard 1 ← 25s → total: 25s
  • Shard 2 ← 20s → total: 20s
  • Shard 2 ← 15s → total: 35s (was lowest at 20s)
  • Shard 1 ← 10s → total: 35s (was lowest at 25s)
  • Result: shards of 30s, 35s, 35s — well-balanced despite uneven module sizes.

Units with no timing data are assigned an estimated duration equal to the median of known units (or a default of 30 seconds for modules / 5 seconds for classes if no data exists at all). When no timing data is available at all (e.g., a project that hasn’t uploaded test results yet), all units are assigned equal estimated durations, which effectively produces a round-robin distribution.
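The LPT algorithm described above is only a few lines of code. A Python sketch (the real implementation runs server-side; the function name is illustrative):

```python
def assign_shards(durations: dict, shard_count: int) -> list:
    """Greedy LPT bin-packing: place the longest remaining unit into the
    shard with the lowest total duration so far. `durations` maps unit
    name (module or class) to its average duration."""
    shards = [[] for _ in range(shard_count)]
    totals = [0.0] * shard_count
    for unit, duration in sorted(durations.items(), key=lambda kv: -kv[1]):
        i = totals.index(min(totals))  # emptiest shard so far
        shards[i].append(unit)
        totals[i] += duration
    return shards
```

Running it on the worked example above (durations of 30s, 25s, 20s, 15s, 10s across 3 shards) reproduces the 30s/35s/35s split.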

Future strategies

The --strategy flag is reserved for future extensibility. If the need arises, we could add strategies such as:

  • round-robin – Distributes test modules alphabetically in round-robin order. No server communication needed. Could serve as an explicit offline fallback for projects not connected to the Tuist server.
  • uniform – Distributes test modules to produce an equal count per shard (±1), ignoring timing data. Useful when test modules have roughly similar execution times.
  • dynamic – A queue-based approach (similar to Knapsack Pro) where runners pull work from a server-side queue at runtime, enabling real-time load balancing. This would require significant server-side infrastructure but would produce optimal shard balance.

We intentionally start with a single strategy to keep the initial implementation focused and gather real-world usage data before investing in alternatives.

Auto-Determining Shard Count

When --shard-total is not specified and --shard-min/--shard-max bounds are set, the timing strategy auto-determines the optimal shard count:

  1. Compute total estimated test duration from server data.
  2. Set target shard duration to total / N for candidate N values within [min, max].
  3. Select the smallest N where the longest shard (after bin-packing) is within 20% of the target duration.

If --shard-max-duration is set, start from ceil(total_duration / max_duration) and clamp to [min, max].
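The --shard-max-duration path is directly computable. A small sketch (function name hypothetical):

```python
import math

def shard_count_from_max_duration(total_duration: float, max_duration: float,
                                  shard_min: int, shard_max: int) -> int:
    """Start from ceil(total_duration / max_duration) and clamp the
    result to the [shard_min, shard_max] bounds."""
    n = math.ceil(total_duration / max_duration)
    return max(shard_min, min(shard_max, n))
```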

Server API

Shard session creation endpoint (for Xcode)

The CLI sends the discovered module list and shard configuration. The server fetches timing data, computes shard assignments, and returns the result along with an upload URL for the test artifacts.

Step 1: Create shard session

POST /api/projects/:project_handle/tests/shards

Request:

{
  "session_id": "github-12345-1",
  "modules": ["AppTests", "CoreTests", "NetworkTests", "NewFeatureTests"],
  "shard_min": 1,
  "shard_max": 6,
  "shard_max_duration": null
}

The server:

  1. Queries the test_module_runs ClickHouse table for timing data (filtered to CI runs on the default branch).
  2. Modules with no timing data get a default estimated duration (median of known modules, or 30 seconds if no data exists at all).
  3. Modules in the server’s history but not in the request (removed modules) are ignored.
  4. Determines the optimal shard count based on the configuration (min/max/total/max-duration).
  5. Computes shard assignments via the bin-packing algorithm.
  6. Stores the shard assignments and returns an S3 upload URL for the .xctestproducts bundle.

Response:

{
  "session_id": "github-12345-1",
  "shard_count": 4,
  "shards": [
    { "index": 0, "test_targets": ["AppTests", "CoreTests"], "estimated_duration_ms": 45000 },
    { "index": 1, "test_targets": ["NetworkTests", "AuthTests"], "estimated_duration_ms": 43000 }
  ],
  "upload_url": "https://storage.tuist.dev/..."
}

Step 2: Upload test artifacts

After receiving the response, the CLI uploads the .xctestproducts bundle (compressed) directly to S3 via the presigned upload_url. The server uses a conventional path based on the project and session ID, so shard runners can retrieve it later.

Step 3: Download shard (called by each shard runner):

GET /api/projects/:project_handle/tests/shards/:session_id/:shard_index

Response: A JSON with the shard assignment and a download URL for the .xctestproducts bundle. The server returns the bundle with a pre-filtered .xctestrun — test targets not assigned to this shard are already stripped from the TestConfigurations array. The shard runner downloads the bundle and runs tests directly, with no client-side filtering needed.

{
  "test_targets": ["AppTests", "CoreTests"],
  "download_url": "https://storage.tuist.dev/..."
}

Shard sessions are ephemeral — the server can garbage-collect them after a configurable TTL (e.g., 24 hours).

Shard session creation endpoint (for Gradle plugin)

The Gradle plugin uses the same session-based approach as Xcode. The plan step creates a session; shard runners pull their assignments by index.

The Gradle plugin uses the same POST /api/projects/:handle/tests/shards endpoint as Xcode, but sends test_suites instead of modules:

Request:

{
  "session_id": "github-12345-1",
  "test_suites": [
    "com.example.auth.LoginTest",
    "com.example.auth.SignupTest",
    "com.example.core.UtilsTest",
    "com.example.core.DatabaseTest"
  ],
  "shard_max": 4
}

The response includes an upload_url for the compiled test classpath (same pattern as .xctestproducts for Xcode). The plugin uploads the packaged classpath to S3 after creating the session.

Shard runners call GET /api/projects/:handle/tests/shards/:session_id/:shard_index and receive their assigned test suites plus a download_url for the compiled classpath:

{
  "test_suites": [
    "com.example.auth.LoginTest",
    "com.example.core.DatabaseTest"
  ],
  "download_url": "https://storage.tuist.dev/..."
}

This follows the same pattern as Xcode: the plan step provides the source of truth for what exists, the server provides timing data and computes balanced assignments, and shard runners only need their index to pull their assignments and artifacts.

Shard Computation Location

The shard computation happens on the server. The CLI sends the discovered module list (or suite list for Gradle), the .xctestrun file, and the shard configuration. The server fetches timing data from ClickHouse, runs the bin-packing algorithm, stores the .xctestrun and shard assignments, and returns the result to the CLI. This keeps the algorithm centralized (one implementation shared across Xcode and Gradle paths), allows the server to evolve the algorithm without CLI updates, and ensures the computation has direct access to timing data without an extra round-trip.

Dashboard Integration

Sharded test runs should appear as a single test run in the dashboard, not as separate entries per shard. This preserves the user’s mental model: “I ran my tests” produces one result, regardless of how many shards executed in parallel.

Each shard uploads its test results independently (as it does today), but tagged with the shard session ID and shard index. The server merges results into the single parent test run.

Dashboard UI Changes

The test run detail page (test_run_live) needs adjustments to surface shard information:

  • Overview tab: Show shard metadata when the run is sharded — total shard count, per-shard durations (e.g., a bar showing how balanced the shards were), and which shard was the bottleneck.
  • Test Cases / Test Suites / Test Modules tabs: Add a “Shard” column or filter, so users can see which shard ran which tests. A shard filter dropdown lets users drill into a specific shard’s results.
  • Failures tab: Each failure should show which shard it came from, helping users reproduce failures on the right shard.
  • Shard balance visualization: A simple bar chart or breakdown showing per-shard duration and test count. This helps users understand whether sharding is well-balanced and whether they should adjust --shard-max.

GitHub Actions Integration

Xcode — Tuist-generated projects (using tuist test)

This is the recommended path for projects that use Tuist manifests. tuist test handles project generation, selective testing, and sharding in a single command.

name: Tests
on: [pull_request]

jobs:
  plan:
    runs-on: macos-15
    outputs:
      matrix: ${{ steps.build.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Tuist
        run: mise install
      # Generates the project, builds for testing, uploads the .xctestproducts
      # bundle to the Tuist server, computes shards, and writes the matrix
      # to $GITHUB_OUTPUT — all in one step.
      - name: Build and prepare shards
        id: build
        run: tuist test --build-only --shard-max 6

  test:
    runs-on: macos-15
    needs: plan
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.plan.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Tuist
        run: mise install
      # Downloads the .xctestproducts bundle and filtered .xctestrun from
      # the Tuist server, runs the assigned tests, and uploads results.
      - name: Run shard tests
        env:
          TUIST_SHARD_INDEX: ${{ matrix.shard }}
        run: tuist test --without-building

Xcode — non-generated projects (using tuist xcodebuild)

For projects that manage their own .xcodeproj / .xcworkspace without Tuist manifests.

name: Tests
on: [pull_request]

jobs:
  plan:
    runs-on: macos-15
    outputs:
      matrix: ${{ steps.build.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Tuist
        run: mise install
      - name: Build and prepare shards
        id: build
        run: |
          tuist xcodebuild build-for-testing \
            -workspace MyApp.xcworkspace \
            -scheme MyApp \
            -destination 'platform=iOS Simulator,name=iPhone 16' \
            --shard-max 6

  test:
    runs-on: macos-15
    needs: plan
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.plan.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Tuist
        run: mise install
      - name: Run shard tests
        env:
          TUIST_SHARD_INDEX: ${{ matrix.shard }}
        run: |
          tuist xcodebuild test-without-building \
            -destination 'platform=iOS Simulator,name=iPhone 16'

Gradle (plugin-driven sharding)

Sharding is configured via flags on the prepareTestShards task. Shard runners use TUIST_SHARD_INDEX to pull their assignments.

name: Tests
on: [pull_request]

jobs:
  plan:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.plan.outputs.matrix }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up JDK
        uses: actions/setup-java@v4
        with:
          java-version: '17'
          distribution: 'temurin'
      # Compiles test sources, discovers test suites, computes shards,
      # and writes matrix to $GITHUB_OUTPUT — all via the Gradle plugin.
      - name: Plan shards
        id: plan
        run: ./gradlew prepareTestShards --shard-max 6

  test:
    runs-on: ubuntu-latest
    needs: plan
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.plan.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up JDK
        uses: actions/setup-java@v4
        with:
          java-version: '17'
          distribution: 'temurin'
      - name: Run tests
        env:
          TUIST_SHARD_INDEX: ${{ matrix.shard }}
        run: ./gradlew test

Integration with Other CI Providers

The system works with any CI provider that supports parallel jobs. The key contract is:

  1. A plan job runs the plan step (tuist test --build-only, tuist xcodebuild build-for-testing, or ./gradlew prepareTestShards) and produces a shard matrix.
  2. Shard jobs set TUIST_SHARD_INDEX and run their assigned subset (tuist test --without-building, tuist xcodebuild test-without-building, or ./gradlew test).

The CLI outputs the matrix in a CI-native format when possible (GitHub Actions $GITHUB_OUTPUT, Buildkite buildkite-agent pipeline upload) and falls back to writing a tuist-shard-matrix.json file for other providers.
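A minimal sketch of that fallback logic, assuming hypothetical shard dictionaries with an index key (this is not the CLI's actual code):

```python
# Illustrative sketch: emit the shard matrix via $GITHUB_OUTPUT when
# running on GitHub Actions, otherwise fall back to writing
# tuist-shard-matrix.json for other CI providers.
import json
import os

def emit_matrix(shards: list[dict]) -> str:
    matrix = {"shard_count": len(shards), "shards": shards}
    github_output = os.environ.get("GITHUB_OUTPUT")
    if github_output:
        with open(github_output, "a") as f:
            # GitHub Actions exposes this as steps.<id>.outputs.matrix,
            # which the test job consumes via fromJson(...).
            f.write(f"matrix={json.dumps({'shard': [s['index'] for s in shards]})}\n")
        return "github"
    with open("tuist-shard-matrix.json", "w") as f:
        json.dump(matrix, f)
    return "file"
```

On GitHub Actions the emitted value maps directly onto `strategy.matrix` in the workflow examples above; elsewhere the JSON file is the stable contract.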

CI providers with native parallelism (CircleCI parallelism, GitLab CI parallel) provide their own index/total environment variables. These map directly to TUIST_SHARD_INDEX:

# CircleCI — Tuist-generated project (Xcode)
TUIST_SHARD_INDEX=$CIRCLE_NODE_INDEX tuist test --without-building

# CircleCI — non-generated project (Xcode)
TUIST_SHARD_INDEX=$CIRCLE_NODE_INDEX \
  tuist xcodebuild test-without-building -destination 'platform=iOS Simulator,name=iPhone 16'

# GitLab CI — Gradle (plugin-driven)
TUIST_SHARD_INDEX=$CI_NODE_INDEX ./gradlew test

Unsupported or custom CI providers can use tuist-shard-matrix.json directly. After the build step, the file contains the full shard assignment. Users read it to spawn parallel jobs in whatever way their CI supports:

# Plan step: build and write tuist-shard-matrix.json (use tuist test --build-only for generated projects)
tuist xcodebuild build-for-testing -scheme MyApp -destination '...' --shard-max 6

# Read the matrix
cat tuist-shard-matrix.json
# {"shard_count":4,"shards":[{"index":0,...},{"index":1,...},...]}

# Shard steps: set the shard index and run (how you spawn these depends on your CI)
TUIST_SHARD_INDEX=0 tuist test --without-building
# or: TUIST_SHARD_INDEX=0 tuist xcodebuild test-without-building -destination '...'

The tuist-shard-matrix.json format is stable and documented, so it can be consumed by any scripting or CI orchestration layer.
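For example, a custom orchestrator could consume the file like this; the command strings mirror the shell examples above and the helper name is made up:

```python
# Illustrative sketch: read tuist-shard-matrix.json and produce one
# shard-runner command per entry, for a CI system without native
# matrix support. The helper name is hypothetical.
import json

def shard_commands(matrix_path: str = "tuist-shard-matrix.json") -> list[str]:
    with open(matrix_path) as f:
        matrix = json.load(f)
    return [
        f"TUIST_SHARD_INDEX={shard['index']} tuist test --without-building"
        for shard in matrix["shards"]
    ]
```

How the resulting commands are spawned in parallel (subprocesses, separate CI jobs, a job queue) is left entirely to the orchestration layer.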

Alternatives Considered

Test-case-level splitting for Xcode

Splitting individual test methods (or suites) across shards for Xcode projects would give the most granular control and best balance. However, it requires enumerating all test cases before running (expensive for large suites), creates complex --only-testing argument lists, and breaks test fixtures that assume suite-level setup/teardown. Module-level splitting avoids these issues while still providing meaningful parallelism for Xcode, where projects managed by Tuist tend to be well-modularized. (Note: for Gradle, we do use suite-level splitting because Gradle’s filtering API handles it cleanly and Gradle projects are often not well-modularized.)

Future Direction: Tuist-Managed Runners

This RFC focuses on static shard assignment — the plan step computes a fixed partition, and each CI job runs its assigned subset independently. This requires users to configure CI matrix strategies and artifact sharing themselves.

A natural evolution is Tuist-managed test distribution, where a single tuist test or ./gradlew test invocation provisions remote runners, distributes tests across them, and streams results back in real time — similar to Develocity Test Distribution. This would eliminate the need for CI matrix configuration entirely: users would just run tuist test and Tuist would handle parallelism transparently.

This capability is explicitly out of scope for the current RFC. The static sharding design proposed here lays the groundwork (server-side timing data, bin-packing algorithm, session management) that a future dynamic distribution system would build on.

Open Questions

  1. Should we support test-suite-level splitting for Xcode? Some Xcode projects have a single monolithic test target. Module-level sharding would not help here. We could support an opt-in --granularity suite mode in a future phase (for Gradle, suite-level is already the default).

  2. Should sharding activation be explicit? Currently, the presence of any --shard-* flag implicitly activates sharding. An alternative would be an explicit --shard flag (or similar) to opt in, with --shard-max, --shard-min, etc. as configuration. The implicit approach is more concise, but an explicit flag would make intent clearer in CI workflows.

  3. Should Gradle shard configuration live in settings.gradle.kts instead of task flags? The current proposal uses flags on prepareTestShards (e.g., --shard-max 6) to align with the Xcode CLI approach. An alternative is a DSL block in settings.gradle.kts, closer to how Develocity configures test distribution:

    // settings.gradle.kts
    tuist {
        testSharding {
            maxShards = 6
            // minShards = 2
            // maxDuration = 300
            // isEnabled = System.getenv("CI") != null
        }
    }
    

The DSL approach might be more idiomatic for Gradle users and allows configuration to be checked in once rather than repeated in CI workflow files, but I’m not sure how common that would be. The task flags approach is simpler and more consistent across build systems.

]]>
https://community.tuist.dev/t/rfc-test-sharding/929#post_1 Fri, 06 Mar 2026 14:28:15 +0000 community.tuist.dev-post-2426
Server-Side xcactivitylog Processing
pepicrft:

Can we run this system in the same environment where they are deploying the app? I don’t see why we can’t do that, with the option to outsource to a pool of machines in case they need to scale (our case), but they can go a long way toward scaling the number and speed of cores in that server.

Can we design the system such that the version of the server (i.e., commit) is bound to the version of the business logic for doing the parsing, processing, and upload, such that if I’m self-hosting, I don’t need additional deployment automation?

Yes to both. We can build a library/executable dedicated to this that will be part of the Docker image instead of relying on the released CLI version, and then we should be able to execute it with Swift NIFs instead of shelling out. I agree that depending on the released CLI is not a good idea, thanks for pointing that out.

And since the processing node will be an Elixir node (another reason why sandboxes are not a good idea for this), the server instance could share the same code and instead of delegating the work to a separate node, it could run things itself.

I agree SSH is not the right fit and was definitely the piece I was most unsure about when writing the RFC originally.

I think I’m for doing a combination of these two options. The server delegates work to processing nodes through Oban, so we have direct visibility into the work from the server, including retries.

But the processing nodes would then write directly to ClickHouse (so we don’t have to pass around potentially pretty large payloads, which has been working fine, but long-term, can become a problem as we will track more build data). It does mean the code for writing builds would need to move to the processing node in this scenario, but since we’re in a monorepo, we can share the code between the two to ensure we don’t break on-prem setups.

]]>
https://community.tuist.dev/t/server-side-xcactivitylog-processing/928#post_6 Fri, 06 Mar 2026 10:30:10 +0000 community.tuist.dev-post-2425
Server-Side xcactivitylog Processing

Have you compared Hetzner’s approach with Daytona from a pricing perspective? The way we own the machine comes with better cost control, but we need better monitoring and a multi-node pool.

I think sandboxes mostly make sense when running untrusted code, which isn’t the case here, so the engineering and compute overhead of sandboxing doesn’t actually bring any value. Agree on the monitoring part of having a virtual/dedicated server, but we’ve already built quite a good pipeline for that for the cache nodes. Assuming we use the same deployment mechanism for the… processing nodes (this is now the official working name), it’s mostly a copy and paste while adjusting which metrics we collect.

This is something we’re already getting better at operationally (I adjusted a bunch of alerts this week because things broke) so we can reuse that knowledge.


  1. Remote execution mechanism: The current proposal uses SSH (reusing the existing SSHClient from QA) for the Oban worker to trigger processing on the Hetzner machine. Alternatives worth considering: a lightweight HTTP API on the processing machine, a shared Oban queue (the processing machine running its own Oban consumer against the same database), or a container job. Is SSH the right trade-off between simplicity and robustness, or would one of these alternatives scale better?

I think SSH is actually the wrong choice here. It’s stateful, breaks during network blips or deployments, and we don’t really have anything in the process that would benefit from statefulness from what I can see. Maybe progress reporting but there’s other ways to do that.

I think I see two possible architectures here:

  1. Processing node polls the S3 bucket for unprocessed uploads, writes processed data directly to Postgres/Clickhouse. No communication at all between server and processing nodes. This has the benefit of completely decoupling the two services; processed data shows up on the dashboard through LiveView without the server having active knowledge of it. This has the downside of completely decoupling the two pieces; CLI uploads small initial data to server, artifact to S3, and things need to reconcile nicely from both sides.
  2. Server tells processing node to process something, processing node pings back “I’m done!”; either through webhooks or message passing through PG2. This has the big upside of everything being ordered and not dealing with weird out-of-order database writes, and the downside of things being less parallelised and having to deal with networking and a bigger API surface.

Both are significantly more robust than SSH, from my point of view.


  • Instant bug fixes. Server-side parsing fixes deploy without CLI upgrades.

This is actually one of the most underrated points here :upside_down_face: being able to add support for new Xcode weirdness without asking people to upgrade 40 CLI versions with a bunch of other unrelated changes is huge.

]]>
https://community.tuist.dev/t/server-side-xcactivitylog-processing/928#post_5 Fri, 06 Mar 2026 10:00:48 +0000 community.tuist.dev-post-2424
Codeowners for test insights Hey @Jon889 :waving_hand:

Yes, this would be amazing and something we’ve had in our backlog for quite some time. We’d love to get to this, but we don’t have a timeline just yet. Stay tuned :crossed_fingers:

]]>
https://community.tuist.dev/t/codeowners-for-test-insights/922#post_2 Fri, 06 Mar 2026 08:42:18 +0000 community.tuist.dev-post-2423
Server-Side xcactivitylog Processing
marekfort:

We can do that, I’m not super opinionated about this one. The /upload endpoint is following our existing conventions and the build_id would have been passed through the body, so in a lot of ways, it’s similar to what you’re proposing. I’m not sure if I’m super onboard for having an endpoint for all file uploads, I can see that becoming messy. But if we do need to make multipart upload work, it would make some things easier.

Let’s stick to what we have for now.

With those numbers, the maths are clear, so let’s go with VPSs.

My point was about whether we can simplify the system so that on-prem users don’t feel excluded by its complexity. Our cache nodes are a different story. Low latency is one of the most critical traits; we need to host the node close to the compute. But here, what they need is compute, which, coincidentally, is what the machine that runs the web server has. The two sources of complexity that I’d look into simplifying for them are:

  1. Can we run this system in the same environment where they are deploying the app? I don’t see why we can’t do that, with the option to outsource to a pool of machines in case they need to scale (our case), but they can go a long way toward scaling the number and speed of cores in that server.
  2. Can we design the system such that the version of the server (i.e., commit) is bound to the version of the business logic for doing the parsing, processing, and upload, such that if I’m self-hosting, I don’t need additional deployment automation?

If we extract the inspection logic into a small executable, include it in the Docker image, and consume it from the VPS servers (or the web app instance), we have a model that works for us (we can scale it), for our on-premise users (low complexity), and we don’t need to put them in the position of “if you don’t feel like hosting this, trade pipeline speed with a --mode local flag”.

Also, they wouldn’t be able to avoid the extra node if the inspection requires macOS, for which I might understand having to decide between “local parsing” vs “hosting a macOS” node, but I’d avoid it for Linux if possible.

Note on Deno: I missed the need for the native library. Deno can pull the program from the server as an ES module graph, so we can skip building and bundling the executable when building the server image. But since we need the Swift library, this idea falls apart.

]]>
https://community.tuist.dev/t/server-side-xcactivitylog-processing/928#post_4 Fri, 06 Mar 2026 08:34:29 +0000 community.tuist.dev-post-2422
Server-Side xcactivitylog Processing
pepicrft:

How is the upload attributed to the build? I was wondering if it’s a good time to embrace Stripe’s API pattern for file uploads where you have endpoints dedicated to uploads, which take a purpose attribute. In this case, it could be xcactivity_log, and then the ID can be passed by the client to connect the build with the activity log upload.

We can do that, I’m not super opinionated about this one. The /upload endpoint is following our existing conventions and the build_id would have been passed through the body, so in a lot of ways, it’s similar to what you’re proposing. I’m not sure if I’m super onboard for having an endpoint for all file uploads, I can see that becoming messy. But if we do need to make multipart upload work, it would make some things easier.

Here’s what Claude thinks:

  • Daytona: 20k builds (taken from last 24 hours) × 1 min = ~333 hours/day × $0.08/hour = ~$27/day = ~$800/month (and growing)
  • Hetzner: A beefy auction server at ~€40-50/month can easily handle many concurrent parsing jobs. Even if we need 2-3 machines, that’s ~€150/month flat, regardless of volume

The 1 min/build is pessimistic, although in a sandbox, there’s also a setup and teardown step that wouldn’t be immediate. Regardless, I don’t think you can beat Hetzner on price and we already have the pieces we need from the cache nodes. It feels wasteful to do sandbox for something that doesn’t benefit much from that since we’d be repeatedly running a single command, predefined by us. Curious to get @cschmatzler take on this who has more experience with dealing with Hetzner and our caching nodes.

But I think I’d first start with Hetzner and if the maintenance turns out to be a pain, then we can always pivot, rather than starting with an option we know will be always more expensive.

You can’t associate the CAS data with builds without pre-processing the .xcactivitylog, mostly defeating the purpose of all of this. I think the answer will have to be to prune the CAS metadata often enough, so we don’t upload too much that’s unrelated. Also, compressing the CAS files will be quite efficient given the content in them will repeat quite a bit.

I don’t think I follow how Deno makes things much better. In the sandbox environment, you can put anything in the Docker image already prebuilt, so why make a dependency on a specific runtime? CLI releases (and node releases for that matter) are all automated anyway.

Additionally, I feel quite strongly that we don’t want to maintain XCLogParser in a new language unless we have a really good reason. But especially since the library builds on Linux, I don’t think we do.

We’re not excluding on-premise from our design. The same way you can already self-host caching nodes, you could self-host processing nodes. And I’d argue that running a single machine that you point to is a simpler approach than the sandbox one. And there’s always --mode local for companies where the tuist inspect build is fine.

]]>
https://community.tuist.dev/t/server-side-xcactivitylog-processing/928#post_3 Thu, 05 Mar 2026 21:17:59 +0000 community.tuist.dev-post-2421
Server-Side xcactivitylog Processing
marekfort:

POST /accounts/{account_handle}/projects/{project_handle}/builds/upload — returns a presigned S3 URL for the xcactivitylog upload and creates a build record in “processing” state

How is the upload attributed to the build? I was wondering if it’s a good time to embrace Stripe’s API pattern for file uploads where you have endpoints dedicated to uploads, which take a purpose attribute. In this case, it could be xcactivity_log, and then the ID can be passed by the client to connect the build with the activity log upload.

Have you compared Hetzner’s approach with Daytona from a pricing perspective? The way we own the machine comes with better cost control, but we need better monitoring and a multi-node pool. Considering we’ll eventually have environments ourselves :), it might not be a bad idea to introduce the interface of sandbox_provider, one of which is Daytona, and down the line, your own account’s pool of environments, which we’ll manage.

How do you attribute the CAS metadata to the build that you are processing the data for if we persist everything into ~/.local/state/tuist/, including state from previous builds?

I have mixed feelings about the system and the release complexity that comes with it, but I’m also fine with testing it out and iterating.

One pattern I’ve tinkered with, and that I believe makes systems like this much simpler, is the combination of ephemeral environments, which are getting commoditized and whose prices are going down, with Deno. Instead of having to deploy CLI updates on release and scale the pool as our runtime demands increase, we can have an environment that spawns a deno process, which resolves the ES module graph via the HTTP transport (the server serves the JS modules).

For on-prem customers, we can ship the Docker image with Deno in it and mount a volume for downloads and processing, all on the same machine. With this approach:

  • We don’t need to figure out scale (Daytona and in the future us will do it)
  • We don’t need to figure out how to release the CLI into environments
  • We don’t need to exclude on-premise from our design

The one caveat is that we need to move the logic to another language and introduce a runtime, but I’m not that against that.

]]>
https://community.tuist.dev/t/server-side-xcactivitylog-processing/928#post_2 Thu, 05 Mar 2026 18:05:01 +0000 community.tuist.dev-post-2420
Server-Side xcactivitylog Processing Summary

Move xcactivitylog parsing from the CLI (client-side) to a dedicated server-side processing pipeline. Instead of parsing logs locally and uploading structured data, the CLI would upload raw .xcactivitylog files to S3, and a server-side service would parse and analyze them asynchronously.

Motivation

We’ve received multiple complaints from large-scale users about tuist inspect build being too slow in CI environments. Client-side parsing of xcactivitylogs can take up to 45 seconds on large builds — a 3–5% overhead on CI jobs that run 10–15 minutes. This has led some teams to disable build metrics collection in CI entirely.

The current client-side approach has served us well, but we’re hitting its limits:

  1. Performance overhead on client machines. Parsing is CPU-intensive and blocks CI pipelines. As we add more analytics, this will only get worse.
  2. Debugging requires user cooperation. When issues arise, we need to ask users to share their .xcactivitylog files. With server-side storage, we’d have direct access for debugging.
  3. Bug fixes require CLI upgrades. Any fix to parsing logic requires users to upgrade the CLI, creating friction. Server-side fixes deploy instantly.
  4. Constant performance pressure. Every new metric or analysis we add to the CLI increases parse time, making it a constant fight to keep the command fast enough.

Current Architecture

Today, tuist inspect build:

  1. Locates the most recent .xcactivitylog in derived data
  2. Parses it locally using XCLogParser (TuistXCActivityLog/XCActivityLogController.swift)
  3. Extracts build targets, files, issues, cache operations, CAS outputs, duration, and category
  4. Collects environment metadata (git info, CI info, Xcode version, machine info, custom tags)
  5. Uploads the structured result to the server via POST /builds

Similar flows exist for tuist inspect test (parsing .xcresult bundles) and tuist inspect bundle (analyzing app bundles).

Proposed Architecture

CLI Changes

The CLI would:

  1. Locate the .xcactivitylog (same as today)
  2. Collect lightweight metadata: git info, CI info, Xcode version, machine info, custom tags/values
  3. Zip the .xcactivitylog together with the CAS metadata directories (see CAS Metadata below) into a single archive
  4. Upload the archive to S3 via a presigned URL
  5. POST metadata + S3 reference to the server
  6. Return immediately — no local parsing

A --mode local|remote flag would control whether parsing happens client-side or server-side. The default is remote only when the CLI is connected to a known Tuist-hosted server — for self-hosted servers, the default remains local so processing stays on the client and self-hosting teams don’t need to run the additional processing infrastructure.

Since .xcactivitylog files are gzip-compressed, they typically range from a few KB to single-digit MBs even for large projects. A simple presigned S3 PUT (which supports up to 5GB) is sufficient — we’ll skip multipart upload initially and revisit if we encounter files large enough to warrant it.

This reduces the CLI’s job to file upload + metadata collection, which should complete in seconds rather than tens of seconds.
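As a rough illustration of the upload step, a presigned S3 PUT is just an HTTP PUT of the file bytes to the signed URL, which the standard library can express without extra dependencies; the URL and helper name below are hypothetical:

```python
# Illustrative sketch: build a PUT request for a presigned S3 URL.
# The signature already encodes bucket, key, and expiry in the URL's
# query string, so no extra authentication headers are needed.
import urllib.request

def presigned_put(url: str, payload: bytes) -> urllib.request.Request:
    request = urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )
    # urllib.request.urlopen(request) would perform the actual upload;
    # it is omitted here so the sketch stays self-contained.
    return request
```

Because the presigned URL is the whole contract, the CLI never needs S3 credentials, only the URL the server handed back.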

Server Changes

New upload endpoint:

  • POST /accounts/{account_handle}/projects/{project_handle}/builds/upload — returns a presigned S3 URL for the xcactivitylog upload and creates a build record in “processing” state

Processing pipeline (Oban worker):

When the upload endpoint receives metadata and confirms the S3 upload, it enqueues a ProcessBuildWorker Oban job:

  1. ProcessBuildWorker.perform/1 generates a short-lived, scoped token for the specific project/build
  2. SSHs into the Hetzner processing machine (using the existing Tuist.SSHClient abstraction, following the TestWorker pattern from QA)
  3. Runs tuist inspect build --mode local with the token and server URL — since tuist runs on Linux, we reuse the exact same parsing and upload logic with zero code duplication
  4. The CLI downloads the archive from S3, unpacks it, parses it, and uploads results to the server — the same flow as when a user runs tuist inspect build locally, just happening on the processing machine with the scoped token
  5. The raw archive remains in S3 with a 7–30 day retention policy for debugging

The worker would use max_attempts: 3 for retries on transient failures, and unique constraints keyed on the build ID to prevent duplicate processing.

Dashboard changes:

  • Show a “Processing your build…” state for builds that haven’t been analyzed yet
  • Once processing completes, display the full analytics as today

Processing Machine

A dedicated Hetzner server auction machine with strong single-thread CPU performance and ample RAM — xcactivitylog parsing is CPU-bound.

Unlike the QA TestWorker pattern which creates and destroys ephemeral Namespace instances, the processing machine is long-lived — the Oban worker simply SSHs in, runs the command, and reads the output. Multiple jobs can run concurrently on the same machine since each operates on its own unpacked archive in a temporary directory.

If a single machine becomes a bottleneck, we can add more machines behind a simple round-robin or least-loaded selection in the worker.

CLI Version Management

The processing machine is configured with NixOS, following the same pattern as the cache nodes in cache/platform/. Since tuist already publishes fully static musl-based Linux binaries on GitHub Releases (for both x86_64 and aarch64), we’d write a simple Nix derivation that fetches the tarball and pins it by version + hash. Bumping the version is just updating those two values in the derivation.

The CLI release workflow would be extended to also bump the tuist version in the processing machine’s Nix configuration, similar to how it already triggers the Homebrew formula update. This ensures the processing machine always runs a known, tested version of the parsing logic and avoids drift.

Authentication

The processing machine needs to write build analytics for arbitrary projects. The Oban worker handles this by generating a short-lived, scoped token for each job and passing it to the Hetzner machine as part of the command. The flow is:

  1. The Oban worker creates a short-lived token scoped to the specific project and build
  2. The Oban worker SSHs into the Hetzner machine and runs tuist inspect build --mode local with the token and server URL
  3. The CLI parses the archive and uploads the results — the exact same flow as when a user runs tuist inspect build locally, just authenticated with the scoped token

This keeps the Oban worker lightweight — it only orchestrates the job and generates credentials — while the processing machine handles both parsing and uploading using the standard CLI flow. The short-lived, narrowly scoped tokens ensure the processing machine never has broad access: each token is only valid for a single project/build and expires shortly after issuance.

CAS Metadata

Build analytics depend heavily on CAS (Content Addressable Storage) metadata that Xcode’s caching system writes to disk during compilation. This data lives outside the .xcactivitylog — in ~/.local/state/tuist/ (or $XDG_STATE_HOME/tuist/) — and is required for cache efficiency metrics, artifact size distributions, and download/upload performance analysis.

Directory structure:

~/.local/state/tuist/
├── nodes/          # Node ID → checksum mappings (~64 bytes each)
├── cas/            # Checksum → size/duration/compressedSize JSON (~200 bytes each)
└── keyvalue/       # Cache key operation timings
    ├── read/
    └── write/

A small-to-medium build might have 30–100 CAS outputs (~8KB of metadata), but large projects can have up to 15k cacheable tasks and 40k CAS outputs. At ~264 bytes per output (node mapping + metadata JSON), that’s roughly 10MB uncompressed — still modest, and it compresses well since the files are small repetitive JSON.
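A quick sanity check of that arithmetic:

```python
# Back-of-the-envelope check of the sizing claim above: 40k CAS outputs
# at roughly 264 bytes each (node mapping plus metadata JSON).
outputs = 40_000
bytes_per_output = 264  # ~64-byte node mapping + ~200-byte metadata JSON
total_mb = outputs * bytes_per_output / 1_000_000
print(f"{total_mb:.1f} MB")  # → 10.6 MB uncompressed
```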

Upload approach: The CLI bundles the .xcactivitylog and the relevant nodes/, cas/, and keyvalue/ directories into a single zip archive, then uploads it to S3 via a presigned URL. The processing service unpacks the archive and runs the analysis with the full context available — exactly as the CLI does today locally.

This avoids any changes to how CAS metadata is structured or read, and the processing service can reuse the same CASNodeStore / CASOutputMetadataStore / KeyValueMetadataStore code paths that exist in the CLI.
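A rough sketch of that bundling step, assuming the directory layout shown above (the function name and archive path are hypothetical):

```python
# Illustrative sketch: zip the .xcactivitylog together with the nodes/,
# cas/, and keyvalue/ directories from the CLI state dir, preserving the
# directory layout so the processing service can unpack and reuse the
# same store code paths.
import os
import zipfile

def bundle_build_artifacts(log_path: str, state_dir: str, archive_path: str) -> None:
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(log_path, arcname=os.path.basename(log_path))
        for subdir in ("nodes", "cas", "keyvalue"):
            root = os.path.join(state_dir, subdir)
            # os.walk yields nothing for a missing directory, so absent
            # metadata dirs are skipped silently.
            for dirpath, _, filenames in os.walk(root):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    archive.write(path, arcname=os.path.relpath(path, state_dir))
```

ZIP_DEFLATED compression should do well here, given the note above that the small repetitive JSON files compress efficiently.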

Backward Compatibility

The existing client-side parsing mode would remain available:

  • Local development: Users who want instant feedback can continue using client-side parsing
  • On-premise deployments: Self-hosted users may prefer client-side parsing to avoid the additional infrastructure
  • A flag or configuration option (--mode local|remote) could control the behavior, with remote as the default for projects set up against the Tuist-hosted server.

Scope

In scope (Phase 1): tuist inspect build

Build log parsing is the most impactful case — it’s where users are hitting performance issues today.

Future phases: tuist inspect test and tuist inspect bundle

  • tuist inspect test parses .xcresult bundles, which depend on Xcode command-line tools (xcresulttool). This makes server-side processing harder since we’d need macOS or a compatible toolchain on the server.
  • tuist inspect bundle analyzes .app/.ipa/.xcarchive/.aab/.apk files. Some analysis (e.g., parsing .xcassets) depends on tools that may not be available on Linux.

These commands haven’t had performance complaints yet, but the same architecture could be extended to them if Linux-compatible parsing is feasible.

Trade-offs

Advantages

  • Eliminates client-side performance overhead. CI jobs are no longer blocked by log parsing.
  • Debugging access. Raw logs stored in S3 — no need to ask users for files.
  • Instant bug fixes. Server-side parsing fixes deploy without CLI upgrades.
  • Decouples analytics evolution from CLI releases. New metrics can be added server-side without client changes.
  • Scales independently. Processing capacity can be scaled without affecting CLI or main server.

Disadvantages

  • Increased infrastructure cost. A dedicated processing machine (though Hetzner auction servers are cost-effective).
  • Delayed results. Builds won’t have analytics immediately — requires a “processing” state in the dashboard.
  • More data leaves the client. Raw .xcactivitylog files must be uploaded, which may be large. Some users may have concerns about uploading raw build logs.
  • Additional infrastructure complexity. A new service to deploy, monitor, and maintain.
  • On-premise complexity. Self-hosted users would need to run the processing service or stick with client-side parsing.

Open Questions

  1. Remote execution mechanism: The current proposal uses SSH (reusing the existing SSHClient from QA) for the Oban worker to trigger processing on the Hetzner machine. Alternatives worth considering: a lightweight HTTP API on the processing machine, a shared Oban queue (the processing machine running its own Oban consumer against the same database), or a container job. Is SSH the right trade-off between simplicity and robustness, or would one of these alternatives scale better?

References

  • XCLogParser — compiles and runs on Linux
  • XCMetrics — prior art for server-side xcactivitylog processing
  • Current CLI implementation: cli/Sources/TuistXCActivityLog/XCActivityLogController.swift
  • Current upload service: cli/Sources/TuistServer/Services/CreateBuildService.swift
]]>
https://community.tuist.dev/t/server-side-xcactivitylog-processing/928#post_1 Thu, 05 Mar 2026 17:12:35 +0000 community.tuist.dev-post-2419
RFC: Comparison Primitives for MCP, CLI, and Agent Skills
marekfort:

note we will need to reconcile Gradle and Xcode as part of a single command. Alternatively, we have a new CLI scope for gradle, such as tuist build gradle .... I think a different scope might make more sense since otherwise some subcommands won’t make sense (Gradle doesn’t have a notion of CAS)

Good point. We can do that in the MCP tools too. The agent should be able to determine the right tool/CLI command to call based on the build system returned with the project.

]]>
https://community.tuist.dev/t/rfc-comparison-primitives-for-mcp-cli-and-agent-skills/925#post_8 Thu, 05 Mar 2026 11:37:07 +0000 community.tuist.dev-post-2418
RFC: Comparison Primitives for MCP, CLI, and Agent Skills
pepicrft:

tuist build cache list <build-id> List cacheable tasks with hit/miss status

note we will need to reconcile Gradle and Xcode as part of a single command. Alternatively, we have a new CLI scope for gradle, such as tuist build gradle .... I think a different scope might make more sense since otherwise some subcommands won’t make sense (Gradle doesn’t have a notion of CAS)

]]>
https://community.tuist.dev/t/rfc-comparison-primitives-for-mcp-cli-and-agent-skills/925#post_7 Thu, 05 Mar 2026 11:35:00 +0000 community.tuist.dev-post-2417
RFC: Comparison Primitives for MCP, CLI, and Agent Skills
marekfort:

I think we need to come up with a structure that will allow the agent to prompt the individual pieces of the build, such as individual module build times could be tuist build module list <build-id>. In this RFC, I’d include how we want to surface all the structured data we currently track.

That makes sense. I made the build interface more granular so the agent can drill as needed with the support of our skills and MCP prompts.

Agree, I’d do this in a follow-up once we have a better understanding of what kinds of insights people derive from the comparisons.

]]>
https://community.tuist.dev/t/rfc-comparison-primitives-for-mcp-cli-and-agent-skills/925#post_6 Thu, 05 Mar 2026 11:29:27 +0000 community.tuist.dev-post-2416
RFC: Comparison Primitives for MCP, CLI, and Agent Skills Thanks @pepicrft for the updates, this is now much better scoped and digestible, making it easier for me to provide feedback.

All of the build data in a single CLI command/endpoint might be too much, especially as we want to track more.

I think we need to come up with a structure that will allow the agent to prompt the individual pieces of the build, such as individual module build times could be tuist build module list <build-id>. In this RFC, I’d include how we want to surface all the structured data we currently track.

I agree with this, primarily because the data will get quite large, and as we track more of it, a 1:1 diff becomes hard to do (for example, once we track CPU usage, how do you compare the CPU graph properly?)

I would still push for having a diff option in the dashboard to compare data that is easily diffable (change in build time, what build settings changed, etc.) But we can keep this RFC scoped to the agentic comparison workflow, which I would prioritize tackling.

Otherwise, very aligned with this proposal :slightly_smiling_face:

]]>
https://community.tuist.dev/t/rfc-comparison-primitives-for-mcp-cli-and-agent-skills/925#post_5 Thu, 05 Mar 2026 10:36:01 +0000 community.tuist.dev-post-2415
RFC: Comparison Primitives for MCP, CLI, and Agent Skills @marekfort ready for another review

]]>
https://community.tuist.dev/t/rfc-comparison-primitives-for-mcp-cli-and-agent-skills/925#post_4 Thu, 05 Mar 2026 10:20:01 +0000 community.tuist.dev-post-2414
RFC: Comparison Primitives for MCP, CLI, and Agent Skills Thanks for the feedback. Let me address each point.

On the proxy: you’re right that .xcactivitylog covers most of the basic build data. The piece it doesn’t expose is the build graph, the dependency relationships between targets, which I think is where the most valuable insights will come from. Understanding why something is slow, not just that it is. That said, I agree that’s future scope, so I’ll reframe the proxy as an optional enrichment layer rather than a core dependency for v1.

On Bazel and other unsupported build systems: the intent wasn’t to signal we’re supporting them soon, but to understand what data other build systems expose and reason through how it’d fit our model. That kind of thinking helps us avoid designing something too Xcode-specific that we’d have to rework later. Happy to make that clearer so it doesn’t read as scope creep.

On the ingestion approach: the reasoning behind storing raw reports in object storage was to avoid having to make hard decisions upfront about what’s valuable. If you store only structured metrics in ClickHouse, you risk needing data later that you already threw away. That said, I think you’re right that we don’t fully have that problem yet. The more pragmatic path is to start with a curated subset of structured metrics that covers the most actionable insights, and design the ingestion layer so adding more data points later is cheap. We can absorb the ClickHouse cost for now and revisit when the pressure is real.

On the broader confusion: that’s on me. I’ll rewrite the framing to make clear this is an evolution of what we already have, not a big-bang rewrite.

]]>
https://community.tuist.dev/t/rfc-comparison-primitives-for-mcp-cli-and-agent-skills/925#post_3 Wed, 04 Mar 2026 18:28:16 +0000 community.tuist.dev-post-2411
RFC: Comparison Primitives for MCP, CLI, and Agent Skills
pepicrft:

Existing proxy frameworks like XCBBuildServiceProxyKit and Tuist’s own XCBLoggingBuildService demonstrate feasibility. With swift-build being open source, we can use the exact SWBProtocol types for deserialization.

The vast majority of the information you mention is retrievable from the .xcactivitylog. I’d push back against tying this proposal strongly with the proxy. We can include the proxy, especially in CI environments, to get richer data, but I do think it’s better if we don’t over-rely on it.

why are we including Bazel into this RFC? We don’t plan to include Bazel support this year. I’d strongly advise not to include unsupported build systems into RFCs like this, unless there’s something specific we’re taking inspiration from.

why can’t we stick to the current pattern of the client sending a regular GET request?

If I’m completely honest, I’m a bit lost on the point of this RFC. We already have the architecture both for Xcode and Gradle to do comparisons. We need to:

  • evolve the CLI/API/MCP to give users access to all the data we already track
  • track more data. For some of the data, we might need to evolve the way we track them (as some Xcode data might be out of reach unless we integrate with the build proxy).

I am also completely missing the connection with the current architecture and how we evolve it. I don’t currently see a need for a big-bang approach where we do something completely different.

Similarly to Bazel, let’s not pollute this proposal with build systems we currently have no plans to support. We’ve just released support for Gradle and we’ll be busy with Xcode and Gradle for a while …

]]>
https://community.tuist.dev/t/rfc-comparison-primitives-for-mcp-cli-and-agent-skills/925#post_2 Wed, 04 Mar 2026 16:26:47 +0000 community.tuist.dev-post-2408
RFC: Comparison Primitives for MCP, CLI, and Agent Skills Summary

This RFC proposes adding comparison workflows to Tuist by combining existing data-fetching tools/commands with new MCP prompts and a CLI skill that guide AI agents through structured comparisons. No new server-side comparison endpoints are needed. The approach fills gaps in the current MCP tool surface, adds URL support to MCP tools, and layers comparison intelligence on top via prompts and skills.

Motivation

Tuist already collects rich data about builds, test runs, and bundles. Users and agents can inspect individual resources, but answering comparative questions like “Did this PR make the build slower?” or “Are there new test failures compared to main?” requires multi-step orchestration with no guidance. Prompts and skills solve this by teaching agents what to fetch, what to compare, and how to present findings.

Why Client-Side Comparison

I considered server-side compare_* endpoints. Client-side is better because:

  • Payloads are small. A build show or test run show response is ~300-500 tokens. Two side by side is ~1K, which is negligible.
  • LLMs are good at diffing. Given two JSON objects and instructions, an LLM produces context-aware comparisons better than a rigid delta endpoint.
  • Less server work. No new endpoints, controllers, or tests. Reuse existing infrastructure.
  • Faster iteration. Updating a prompt or skill is a text change, not a deployment.
  • Flexibility. Prompts can guide the agent to focus on whatever dimension matters for the user’s question.

If a specific comparison involves genuinely large payloads (e.g., diffing 1000+ test case results), we can add a targeted server endpoint later. Start without it.
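
To make the "LLMs are good at diffing" point concrete: the structural part of such a comparison is trivial to do client-side, and the prompt only needs to add interpretation on top. The field names below mirror the build schema later in this RFC; the helper itself is a sketch, not proposed tooling:

```python
def diff_builds(base: dict, head: dict, keys: list) -> dict:
    """Return per-key deltas between two build payloads."""
    changes = {}
    for key in keys:
        if base.get(key) != head.get(key):
            changes[key] = {"base": base.get(key), "head": head.get(key)}
    return changes

base = {"duration": 45_000, "status": "success", "xcode_version": "16.2"}
head = {"duration": 57_000, "status": "success", "xcode_version": "16.3"}
print(diff_builds(base, head, ["duration", "status", "xcode_version"]))
# → {'duration': {'base': 45000, 'head': 57000}, 'xcode_version': {'base': '16.2', 'head': '16.3'}}
```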

Current Inventory

CLI Commands (all support --json)

Command Filters Pagination
tuist build list --git-branch, --status, --scheme, --configuration, --tags, --values --page, --page-size
tuist build show <id> - -
tuist test show <test-run-id> - -
tuist test case list --flaky, --quarantined --page, --page-size
tuist test case show <id-or-identifier> - -
tuist test case run list [id-or-identifier] --flaky, --test-run-id --page, --page-size
tuist test case run show <id> - -
tuist bundle list --git-branch -
tuist bundle show <id> - -

MCP Tools

Tool Parameters
list_projects (none)
list_test_cases account_handle, project_handle, flaky, quarantined, module_name, name, suite_name, page, page_size
get_test_case test_case_id OR account_handle + project_handle + identifier
get_test_run test_run_id
get_test_case_run test_case_run_id

MCP Prompts

Prompt Purpose
fix_flaky_test Guides agent through diagnosing and fixing a flaky test

Structured Data Currently Tracked

Understanding the full data surface is critical for designing a granular interface. Here is everything Tuist stores.

Build Data

Build Run (top-level)

Field Type Description
id UUID Unique identifier
duration Int32 (ms) Total build duration
status Enum success, failure
category Enum clean, incremental
scheme String Build scheme
configuration String Build configuration (Debug, Release, etc.)
xcode_version String Xcode version
macos_version String macOS version
model_identifier String Machine model (e.g., MacBookPro18,1)
is_ci Boolean Whether run on CI
ci_provider Enum github, gitlab, bitrise, circleci, buildkite, codemagic
ci_run_id String CI system run ID
ci_project_handle String CI project reference
ci_host String CI host URL
git_branch String Git branch
git_commit_sha String Git commit SHA
git_ref String Git reference/tag
cacheable_tasks_count Int32 Total cacheable compilation tasks
cacheable_task_local_hits_count Int32 Local cache hits
cacheable_task_remote_hits_count Int32 Remote cache hits
custom_tags Array(String) Up to 50 custom tags
custom_values Map(String, String) Up to 20 custom key-value pairs
ran_by String Account handle of who ran the build
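
For example, the cache hit rate used in the comparison examples later in this RFC can be derived from three of these counters. This RFC doesn't specify how the dashboard actually computes it, so treat the derivation as an assumption:

```python
from typing import Optional

def cache_hit_rate(build: dict) -> Optional[float]:
    """(local hits + remote hits) / cacheable tasks; None when there are no tasks.

    Assumed derivation from the build run fields above, not a confirmed formula.
    """
    total = build["cacheable_tasks_count"]
    if total == 0:
        return None
    hits = (build["cacheable_task_local_hits_count"]
            + build["cacheable_task_remote_hits_count"])
    return hits / total

build = {
    "cacheable_tasks_count": 100,
    "cacheable_task_local_hits_count": 60,
    "cacheable_task_remote_hits_count": 27,
}
print(f"{cache_hit_rate(build):.0%}")  # → 87%
```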

Build Targets (per-target metrics)

Field Type Description
name String Target name
project String Project name (within monorepo)
build_duration UInt64 (ms) Total build duration for this target
compilation_duration UInt64 (ms) Compilation-only duration
status Enum success, failure

Build Files (per-file compilation metrics)

Field Type Description
type Enum swift, c
target String Parent target
project String Parent project
path String File path relative to project root
compilation_duration UInt64 (ms) Compilation time for this file

Build Issues (errors and warnings)

Field Type Description
type Enum warning, error
target String Target where issue occurred
project String Project where issue occurred
title String Issue title
message String Full message
signature String Deduplication signature
path String File path
starting_line, ending_line UInt64 Line range
starting_column, ending_column UInt64 Column range
step_type Enum c_compilation, swift_compilation, script_execution, create_static_library, linker, copy_swift_libs, compile_assets_catalog, compile_storyboard, write_auxiliary_file, link_storyboards, copy_resource_file, merge_swift_module, xib_compilation, swift_aggregated_compilation, precompile_bridging_header, validate_embedded_binary, validate, other

Cacheable Tasks (per-task cache hit/miss)

Field Type Description
type Enum clang, swift
status Enum hit_local, hit_remote, miss
key String Cache key
read_duration Float64 (ms) Time to read from cache
write_duration Float64 (ms) Time to write to cache
description String Task description
cas_output_node_ids Array(String) Linked CAS output nodes

CAS Outputs (content-addressed storage operations)

Field Type Description
node_id String CAS node ID
checksum String Content checksum
size UInt64 (bytes) Uncompressed size
compressed_size UInt64 (bytes) Compressed size
duration UInt64 (ms) Transfer duration
operation Enum download, upload
type String Content type (swift, object, pch, dSYM, swiftmodule, etc. – 45+ types)

Test Data

Test Run (top-level)

Field Type Description
id UUID Unique identifier
duration Int32 (ms) Total test run duration
status Enum success, failure, skipped
is_ci Boolean Whether run on CI
is_flaky Boolean Whether any test was flaky
scheme String Build scheme
xcode_version String Xcode version
macos_version String macOS version
model_identifier String Machine model
git_branch String Git branch
git_commit_sha String Git commit SHA
git_ref String Git reference
build_run_id UUID Associated build run
ci_provider String CI provider
ci_run_id String CI run ID
total_test_count Int Total test cases
failed_test_count Int Failed test cases
flaky_test_count Int Flaky test cases
avg_test_duration Int Average test case duration

Test Module Run (per-module)

Field Type Description
name String Module/target name
status Enum success, failure
is_flaky Boolean Whether any test in module was flaky
duration Int32 (ms) Total module duration
test_suite_count Int32 Number of test suites
test_case_count Int32 Total test cases
avg_test_case_duration Int32 (ms) Average test case duration

Test Suite Run (per-suite)

Field Type Description
name String Suite name
status Enum success, failure, skipped
is_flaky Boolean Whether any test in suite was flaky
duration Int32 (ms) Total suite duration
test_case_count Int32 Test cases in suite
avg_test_case_duration Int32 (ms) Average duration

Test Case Run (per-test)

Field Type Description
name String Test case name
module_name String Module/target
suite_name String Suite
status Enum success, failure, skipped
is_flaky Boolean Flaky
is_new Boolean First time this test was seen
is_ci Boolean Ran on CI
duration Int32 (ms) Duration
scheme String Build scheme
git_branch String Branch
git_commit_sha String Commit

Test Case Failure

Field Type Description
message String Failure message / assertion
path String File path
line_number Int32 Source line
issue_type String error_thrown, assertion_failure, issue_recorded, unknown

Test Case Run Repetition (retry attempts)

Field Type Description
repetition_number Int32 1 = first run, 2 = retry 1, etc.
name String Human-readable (e.g., “Retry 1”)
status String success, failure
duration Int32 (ms) Duration of this attempt

Test Case Run Attachment

Field Type Description
file_name String Attachment filename (screenshots, logs, etc.)

Crash Report

Field Type Description
exception_type String e.g., “EXC_CRASH”
signal String e.g., “SIGABRT”
exception_subtype String Subtype
triggered_thread_frames String Formatted stack trace

Test Case (deduplicated definition with latest metrics)

Field Type Description
name String Test name
module_name String Module
suite_name String Suite
last_status Enum Last observed status
last_duration Int32 (ms) Duration of last run
is_flaky Boolean Currently flaky
is_quarantined Boolean Currently quarantined
recent_durations Array(Int32) Recent run durations for trends
avg_duration Int64 Average duration
reliability_rate Float Percentage of successful runs
flakiness_rate Float Percentage of flaky runs (last 30 days)
total_runs Int Lifetime run count
failed_runs Int Failed run count

Bundle Data

Bundle (top-level)

Field Type Description
id UUID Unique identifier
name String Bundle name
app_bundle_id String App bundle identifier
version String Version string
type Enum ipa, app, xcarchive, aab, apk
supported_platforms Array(String) iOS, Android, macOS, tvOS, watchOS, visionOS, simulators
install_size Int (bytes) Installed size
download_size Int (bytes) Download size (nullable)
git_branch String Git branch
git_commit_sha String Git commit SHA
git_ref String Git reference
uploaded_by_account String Account that uploaded

Bundle Artifact (recursive tree)

Field Type Description
artifact_type String Type of artifact
path String Relative path in bundle
size Int (bytes) Size
shasum String SHA checksum
children Array Recursive child artifacts

What’s Needed

1. Granular CLI Commands and MCP Tools for Builds

The current tuist build show returns all build data in a single response. As we track more data, this becomes unwieldy for agents that only need a specific slice. Rather than changing build show (which would break existing scripts and integrations), we add dedicated subcommands that let agents and users query individual pieces of a build.

tuist build show and tuist build list remain unchanged – they continue to return the same data they do today. The new subcommands are purely additive.

New CLI Subcommands

Command Description Key Filters
tuist build target list <build-id> List targets in a build with per-target durations --status
tuist build target show <build-id> <target-name> Show detailed metrics for a single target -
tuist build file list <build-id> List files with compilation durations --target, --type, --sort-by
tuist build issue list <build-id> List build issues (errors/warnings) --type, --target, --step-type
tuist build cache list <build-id> List cacheable tasks with hit/miss status --status, --type
tuist build cas list <build-id> List CAS (content-addressed storage) operations --operation, --type

These subcommands return the same data that is nested inside tuist build show, but individually and with filtering. This is valuable for agents (which can progressively drill down without fetching everything) and for scripts that only care about one dimension (e.g., “give me just the errors”).

New MCP Tools for Builds

New MCP Tool Mirrors CLI Command Key Parameters
list_builds tuist build list account_handle, project_handle, git_branch, status, scheme, configuration, tags, values, page, page_size
get_build tuist build show build_run_id (ID or URL)
list_build_targets tuist build target list build_run_id, status
get_build_target tuist build target show build_run_id, target_name
list_build_files tuist build file list build_run_id, target, type, sort_by
list_build_issues tuist build issue list build_run_id, type, target, step_type
list_build_cache_tasks tuist build cache list build_run_id, status, type
list_build_cas_outputs tuist build cas list build_run_id, operation, type

get_build returns the full build response (same as tuist build show today, including nested data). The granular tools exist so agents can fetch individual slices with filtering when they don’t need the full payload:

  • “Which targets took longest?” → list_build_targets sorted by duration
  • “What files are slow to compile?” → list_build_files --sort-by compilation_duration
  • “What errors happened?” → list_build_issues --type error
  • “How is the cache performing?” → list_build_cache_tasks
  • “How much data transferred?” → list_build_cas_outputs

2. Granular CLI Commands and MCP Tools for Tests

Similarly, test data should be navigable at each level of the hierarchy: run → module → suite → case. Like builds, tuist test show remains unchanged. The new subcommands are additive.

New CLI Subcommands

Command Description Key Filters
tuist test list List test runs --git-branch, --status, --scheme, --page, --page-size
tuist test module list <test-run-id> List module runs within a test run --status
tuist test module show <test-run-id> <module-name> Show a specific module run -
tuist test suite list <test-run-id> List suite runs within a test run --module, --status
tuist test suite show <test-run-id> <suite-name> Show a specific suite run -

Existing test case commands (tuist test case list, tuist test case show, tuist test case run list, tuist test case run show) remain as-is since they already follow the granular pattern.

New MCP Tools for Tests

New MCP Tool Mirrors CLI Command Key Parameters
list_test_runs tuist test list account_handle, project_handle, git_branch, status, scheme, page, page_size
list_test_module_runs tuist test module list test_run_id, status
get_test_module_run tuist test module show test_run_id, module_name
list_test_suite_runs tuist test suite list test_run_id, module_name, status
get_test_suite_run tuist test suite show test_run_id, suite_name
list_test_case_runs tuist test case run list account_handle, project_handle, test_case_id, test_run_id, flaky, page, page_size

3. Fill Remaining MCP Tool Gaps (Bundles)

New MCP Tool Mirrors CLI Command Key Parameters
list_bundles tuist bundle list account_handle, project_handle, git_branch, page, page_size
get_bundle tuist bundle show bundle_id (ID or URL)

4. Add URL Support to MCP Tools

MCP tools that accept a resource ID should also accept a Tuist dashboard URL. URLs follow these patterns:

https://tuist.dev/:account/:project/builds/build-runs/:id
https://tuist.dev/:account/:project/tests/test-runs/:id
https://tuist.dev/:account/:project/tests/test-cases/:id
https://tuist.dev/:account/:project/tests/test-cases/runs/:id
https://tuist.dev/:account/:project/bundles/:id

Implement a shared Tuist.MCP.URLParser module that pattern-matches on path segments and returns {:ok, %{account_handle, project_handle, resource_type, resource_id}}. Each tool checks if the ID parameter starts with https:// and parses it automatically.

The CLI does not need native URL parsing. The compare skill handles URL parsing in its instructions, extracting the ID and project handle before calling CLI commands. This avoids a cross-cutting change to every CLI command’s argument parser for a use case that’s primarily agent-driven.
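
The proposed Tuist.MCP.URLParser would be an Elixir module; the pattern matching it needs can be sketched in Python (URL patterns taken from the list above, return shape mirroring the {:ok, %{...}} result described):

```python
from typing import Optional
from urllib.parse import urlparse

# Trailing path segments -> resource type, mirroring the dashboard URL
# patterns listed above. Order matters: the test-case-run pattern must be
# checked before the shorter test-cases pattern it extends.
ROUTES = [
    (("builds", "build-runs"), "build_run"),
    (("tests", "test-runs"), "test_run"),
    (("tests", "test-cases", "runs"), "test_case_run"),
    (("tests", "test-cases"), "test_case"),
    (("bundles",), "bundle"),
]

def parse_dashboard_url(url: str) -> Optional[dict]:
    """Extract account, project, resource type, and resource ID from a dashboard URL."""
    segments = urlparse(url).path.strip("/").split("/")
    if len(segments) < 4:  # need at least account/project/<prefix>/<id>
        return None
    account, project, *rest = segments
    for prefix, resource_type in ROUTES:
        if tuple(rest[:-1]) == prefix:
            return {
                "account_handle": account,
                "project_handle": project,
                "resource_type": resource_type,
                "resource_id": rest[-1],
            }
    return None
```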

5. Add Comparison Prompts (MCP) and Skill (CLI)

MCP prompts and a CLI skill guide agents through comparisons using existing and newly added tools/commands. They follow the same pattern as fix_flaky_test (MCP) and fix-flaky-tests (CLI skill).

Implicit Baseline

When only a single resource is provided (one URL, one ID, or “this build”), the prompt/skill instructs the agent to resolve a baseline automatically by fetching the latest equivalent resource on the repository’s default branch (typically main). For example, if the user provides a single build URL, the agent fetches the latest build on main with matching scheme/configuration as the baseline. This means users can say “how does this build look?” and get a comparison without specifying both sides.
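
In pseudocode, the resolution step the prompt would describe looks roughly like this. The list_builds callable stands in for the MCP tool from section 1, and the hardcoded main default branch is the simplification mentioned above:

```python
from typing import Callable, Optional

def resolve_baseline(head_build: dict,
                     list_builds: Callable[..., list]) -> Optional[dict]:
    """Latest build on the default branch with matching scheme/configuration."""
    matches = list_builds(
        git_branch="main",                          # repository default branch
        scheme=head_build["scheme"],                # match the head build's scheme
        configuration=head_build["configuration"],  # and configuration
        page_size=1,                                # latest result only
    )
    return matches[0] if matches else None
```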

compare_builds (MCP prompt) / builds section of compare skill (CLI)

Parameters: base, head (ID, URL, or branch; head defaults to provided resource, base defaults to latest on main), account_handle, project_handle

Guidance covers:

  1. Resolving references (URLs go to get_build/tuist build show directly, branch names use list_builds/tuist build list with page_size: 1)
  2. Comparing top-level metrics: status, duration, cache hit rate, environment, category, git context, custom metadata
  3. If duration regressed: drill into list_build_targets / tuist build target list to find which targets slowed down
  4. If cache hit rate dropped: drill into list_build_cache_tasks / tuist build cache list to find new misses
  5. If errors increased: drill into list_build_issues / tuist build issue list to find new errors
  6. Only drill into per-file data (list_build_files / tuist build file list) if asked or if a specific target shows a large regression

Lead with regressions. Summarize improvements briefly.

compare_test_runs (MCP prompt) / test runs section of compare skill (CLI)

Parameters: base, head (ID, URL, or branch; base defaults to latest on main), account_handle, project_handle

Guidance covers:

  1. Resolving references
  2. Comparing top-level: status, failure count, flaky count, test count, duration, environment
  3. If failures increased: drill into list_test_module_runs / tuist test module list to find which modules regressed, then list_test_case_runs filtered by test_run_id to identify new failures, then get_test_case_run / tuist test case run show for failure details (limit top 5)
  4. If flaky count increased: identify newly flaky tests and chain into fix_flaky_test / fix-flaky-tests
  5. If duration regressed: drill into list_test_module_runs to find slow modules, then list_test_suite_runs for slow suites

compare_bundles (MCP prompt) / bundles section of compare skill (CLI)

Parameters: base, head (ID, URL, or branch; base defaults to latest on main), account_handle, project_handle

Guidance covers: resolving references, comparing total size, per-platform size, platform coverage changes, download vs install size delta.

compare_test_case (MCP prompt) / test case section of compare skill (CLI)

Parameters: test_case (ID, URL, or identifier), base_branch (defaults to main), head_branch, account_handle, project_handle

Guidance covers: fetching test case metrics, getting recent runs per branch via list_test_case_runs / tuist test case run list with git_branch filter, comparing status patterns, flakiness, duration, and failure messages. If newly flaky, chain into fix_flaky_test / fix-flaky-tests.

Example Interactions

Single resource with implicit baseline

User: How does this build look?
      https://tuist.dev/tuist/ios-app/builds/build-runs/def456

[Agent fetches the build, sees it's on feature/new-cache, scheme: Debug]
[Agent fetches latest build on main with matching scheme as baseline]

Agent: Compared against latest build on main:

       Duration improved by 5.2s (-11%), from 47.1s to 41.9s.
       Cache hit rate improved from 87% to 94%.
       Everything else unchanged. Looks good.

Drilling into target-level data

User: Why is this build slower?
      https://tuist.dev/tuist/ios-app/builds/build-runs/abc123

[Agent fetches the build, resolves baseline on main]
[Agent sees duration regressed by 12s]
[Agent calls list_build_targets for both builds]

Agent: Build duration increased by 12.0s (+26.7%).

       Slowest regressions by target:
         - NetworkKit: +8.2s (was 15.1s, now 23.3s)
         - CoreUI: +3.5s (was 9.8s, now 13.3s)

       NetworkKit had 3 cache misses that were hits on main.
       Want me to check which files in NetworkKit are slow?

User: Yes

[Agent calls list_build_files --target NetworkKit --sort-by compilation_duration]

Agent: Top slow files in NetworkKit:
         - APIClient.swift: 4.1s
         - RequestBuilder.swift: 2.8s
         - ResponseParser.swift: 1.3s

       APIClient.swift alone accounts for half the regression.

Two dashboard URLs

User: Compare these two builds
      https://tuist.dev/tuist/ios-app/builds/build-runs/abc123
      https://tuist.dev/tuist/ios-app/builds/build-runs/def456

[Agent calls get_build for each URL in parallel]

Agent: Duration increased by 12.0s (+26.7%), from 45.0s to 57.0s.
       Cache hit rate dropped from 92% to 87% (3 additional misses).
       Xcode version changed from 16.2 to 16.3.

       The duration regression correlates with the cache misses and
       the Xcode upgrade. Want me to check which targets missed?

Branch-based test run comparison

User: Did my PR make the tests worse?

[Agent resolves current branch, fetches latest test run on each branch]

Agent: Comparing latest test run on feature/auth-refactor vs main:

       2 new test failures:
         - AuthTests/testTokenRefresh: "Timeout waiting for expectation"
         - AuthTests/testSessionExpiry: "Expected true, got false"

       1 test fixed: BuildTests/testCleanBuild
       Overall duration improved by 30s (-9.4%).

       Both new failures are in AuthTests. Want me to investigate?

Drilling into test module data

User: Which test modules got slower?

[Agent calls list_test_module_runs for both test runs]

Agent: Module duration changes:
         - NetworkTests: +12.3s (was 25.1s, now 37.4s) -- 2 new failures
         - UITests: +3.1s (was 18.5s, now 21.6s)
         - CoreTests: -1.2s (was 14.3s, now 13.1s) -- improved

       NetworkTests has the biggest regression. Want me to drill
       into which suites or test cases are responsible?

Bundle comparison

User: How much bigger is the latest bundle on release/2.0 vs main?

[Agent fetches latest bundle on each branch]

Agent: Bundle size increased by 1.2MB (+2.6%), from 45.3MB to 46.5MB.
         iOS: +0.8MB, macOS: +0.4MB
       watchOS platform was added on release/2.0.

Chaining into fix-flaky-tests

Agent: Test run comparison found 2 newly flaky tests:
         - NetworkTests/testRetry
         - CacheTests/testEviction

       Want me to investigate and fix these?

User: Yes

[Agent chains into fix-flaky-tests skill/prompt for each test]

Implementation Plan

Phase 1: Granular Build Data Access

  • Add server API endpoints for build sub-resources (targets, files, issues, cacheable tasks, CAS outputs)
  • Add CLI subcommands: tuist build target list|show, tuist build file list, tuist build issue list, tuist build cache list, tuist build cas list
  • Add MCP tools: list_builds, get_build, list_build_targets, get_build_target, list_build_files, list_build_issues, list_build_cache_tasks, list_build_cas_outputs
  • Existing tuist build show and tuist build list remain unchanged (no breaking changes)

Phase 2: Granular Test Data Access

  • Add server API endpoints for test sub-resources (module runs, suite runs)
  • Add CLI subcommands: tuist test list, tuist test module list|show, tuist test suite list|show
  • Add MCP tools: list_test_runs, list_test_module_runs, get_test_module_run, list_test_suite_runs, get_test_suite_run, list_test_case_runs
  • Existing tuist test show and test case commands remain unchanged (no breaking changes)

Phase 3: Bundle MCP Tools

  • Add list_bundles, get_bundle MCP tools (reuse bundles_controller.ex)

Phase 4: URL Support in MCP

  • Implement Tuist.MCP.URLParser module
  • Update all MCP tools that accept resource IDs to also accept dashboard URLs
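The URL forms shown in the dialogue examples suggest a simple path-based parse. A sketch of what Tuist.MCP.URLParser might extract, written in Python for illustration (the set of recognized path segments is an assumption inferred from the example URLs, not the final design):

```python
from urllib.parse import urlparse

# Maps a dashboard path segment to the resource kind a tool would fetch.
# Assumed segments; the real parser may recognize more or different ones.
RESOURCE_SEGMENTS = {"build-runs": "build", "test-runs": "test_run", "bundles": "bundle"}

def parse_dashboard_url(url):
    """Turn e.g. https://tuist.dev/tuist/ios-app/builds/build-runs/abc123
    into (account, project, resource_kind, resource_id), or None if the
    URL doesn't match a known shape."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 4:
        return None
    for i, segment in enumerate(parts):
        if segment in RESOURCE_SEGMENTS and i + 1 < len(parts):
            return parts[0], parts[1], RESOURCE_SEGMENTS[segment], parts[i + 1]
    return None
```

With this, a tool that today takes a bare build ID could accept either form and normalize before hitting the API.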

Phase 5: Comparison Prompts and Skill

  • Add MCP prompts: compare_builds, compare_test_runs, compare_bundles, compare_test_case
  • Add CLI compare skill (with URL parsing in skill instructions)
  • Document chaining with fix-flaky-tests

Open Questions

  1. tuist test list on CLI. The CLI has tuist test show but no tuist test list. Should we add it? It’s needed for branch-based test run comparison. (This RFC assumes yes.)
  2. Contextual selectors. Should tools/commands support latest:main syntax natively, or should prompts/skills handle the “list with page_size 1” pattern? The latter is more flexible but adds a round-trip.
  3. Pagination for build sub-resources. Builds with thousands of files or CAS outputs may need pagination on the sub-resource endpoints. Should we add pagination from the start, or start without it and add if needed?
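On open question 2: whichever side resolves the selector, the syntax itself is trivial to parse. A sketch of the split, where the `latest:branch` grammar and the bare-ID fallback are assumptions taken from the question, not a decided design:

```python
def parse_selector(value):
    """Parse a contextual selector like 'latest:main' into (kind, branch).
    A bare resource ID passes through as ('id', value). Illustrative only."""
    if value.startswith("latest:"):
        branch = value[len("latest:"):]
        return ("latest", branch or None)
    return ("id", value)
```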
]]>
https://community.tuist.dev/t/rfc-comparison-primitives-for-mcp-cli-and-agent-skills/925#post_1 Wed, 04 Mar 2026 15:44:17 +0000 community.tuist.dev-post-2407
Tuist cache only simulator Thanks for answering.

]]>
https://community.tuist.dev/t/tuist-cache-only-simulator/916#post_3 Wed, 04 Mar 2026 11:50:10 +0000 community.tuist.dev-post-2405
RFC: Bundle Size CI Check Threshold Same here, and agreed with @cschmatzler’s idea of making it reusable with other metrics in the future.

]]>
https://community.tuist.dev/t/rfc-bundle-size-ci-check-threshold/923#post_3 Wed, 04 Mar 2026 10:02:39 +0000 community.tuist.dev-post-2404
RFC: Bundle Size CI Check Threshold Love the feature.
Also think the Server → GitHub driven architecture makes a lot of sense for this, as opposed to failing CI; would definitely build this as an abstraction so we can reuse it in the future.

In general, I think server-driven bits make a lot of sense for external-system integrations, since we can own retries, error logs, etc., rather than it all being run on the CLI where we have significantly less insight.

]]>
https://community.tuist.dev/t/rfc-bundle-size-ci-check-threshold/923#post_2 Wed, 04 Mar 2026 09:35:32 +0000 community.tuist.dev-post-2403
RFC: Bundle Size CI Check Threshold Summary

Add bundle size threshold checks to tuist inspect bundle via GitHub Check Runs. When a bundle is uploaded and a threshold is exceeded, the server posts a check run with an action_required conclusion and an “Accept” button directly in the GitHub PR UI. Clicking “Accept” flips the check to success — no CI re-run needed and no context switching to external dashboards. This lets teams catch bundle size regressions before merge while keeping the override workflow frictionless.

Motivation

Today, Tuist supports bundle size alerting via Slack notifications configured through the dashboard. While useful for monitoring trends on tracked branches, these alerts only fire after a problematic change has merged; they don’t block it.

Teams need a way to block PRs when a bundle size regression is detected. Key use cases from real customer feedback:

  • Catch regressions before merge. Teams want to detect bundle size explosions on feature branches before they hit main, not after.
  • Enforce budgets across all branches. Rather than configuring alerts per-branch, teams want a single threshold that applies to every tuist inspect bundle invocation in CI.
  • Non-blocking visibility with override escape hatch. Sometimes a significant bundle size increase is expected and intentional. The workflow should allow reviewers to acknowledge the increase rather than permanently blocking the PR.

Prior Art

  • tuist inspect dependencies --only implicit: Exits with a non-zero status when issues are found. This is the closest existing pattern in Tuist and the one users have explicitly referenced as the desired behavior for bundle size.
  • Webpack performance hints: Webpack supports performance.maxAssetSize and performance.hints (set to "error") to fail builds when bundle sizes exceed thresholds.
  • Bundlesize / size-limit: Popular JS ecosystem tools that run in CI, compare against configured budgets, and post GitHub status checks.
  • Emerge Tools status checks: Posts GitHub status checks that fail when size thresholds are exceeded. Uses an “Action Required” → “Accept” button pattern where developers acknowledge expected increases in the dashboard rather than modifying CI configuration.
  • Android App Bundles: Google Play Console warns developers about APK size increases, but this happens post-upload rather than in CI.

Proposed Solution

Dedicated Bundle Thresholds Configuration

Introduce a new Bundles tab in the project settings, separate from the existing Notifications/Alerts system. This tab is purpose-built for configuring bundle size thresholds that control whether the GitHub check run passes or fails.

The separation from alerts is intentional: unlike Slack alerts, thresholds are active PR gates enforced via GitHub check runs. Mixing the two in the same UI conflates different concerns and makes it harder for teams to reason about what will block their PRs vs. what will merely send a Slack message.

Bundle Size Thresholds

Each threshold is a standalone rule configured in the Bundles settings tab:

  • Name (required): Human-readable label (e.g., “Main app install size budget”)
  • Metric (required): install_size or download_size
  • Deviation percentage (required): Maximum allowed increase before failing (e.g., 5.0%)
  • Baseline branch (required): Branch to compare against (e.g., main)
  • Bundle name (optional): Filter to a specific app bundle identifier

Multiple thresholds can be configured per project (e.g., one for install size at 5%, another for download size at 10%, or different thresholds per app in a multi-app project).

Server-Side Evaluation and GitHub Check Run

No changes are needed to the POST /bundles API response or the tuist inspect bundle CLI command. The command behaves exactly as it does today. All threshold logic is server-side and communicated to the developer via a GitHub check run.

Threshold evaluation only happens for bundles uploaded from CI environments. Local uploads are never checked against thresholds — this prevents developers from being blocked during local iteration, where cache invalidations or experimental builds may temporarily inflate bundle sizes.

When tuist inspect bundle uploads a bundle from CI, the server:

  1. Finds the latest bundle on the configured baseline branch matching the same app bundle identifier.
  2. Compares the relevant size metric (install or download) of the uploaded bundle against that baseline.
  3. Creates a GitHub check run (named tuist/bundle-size) on the commit:
    • success conclusion if no thresholds are exceeded.
    • action_required conclusion if a threshold is exceeded, with an “Accept” action button and a link to the bundle page.

The comparison is always against a fixed baseline branch, not a rolling window. This is intentional: you want to know “how does this PR compare to main?” rather than “how does this compare to the last upload on this same branch?”

There is a single tuist/bundle-size check run per project that evaluates all configured threshold rules. If any rule is violated, the check run requires action. If multiple thresholds are exceeded, only the first violation is reported in the check run summary.
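The evaluation step reduces to a percentage-deviation check per rule. A sketch under the field names from the table above, simplified to a single metric (the rule and size shapes are hypothetical; real rules also select install vs. download size and a bundle name filter):

```python
def evaluate_thresholds(current_size, baseline_size, rules):
    """Return ('success', None) or ('action_required', first_violation).
    Sizes are bytes for one metric; rules carry deviation limits in percent.
    Illustrative sketch, not the server implementation."""
    deviation = (current_size - baseline_size) / baseline_size * 100.0
    for rule in rules:
        if deviation > rule["deviation_pct"]:
            # Mirrors the RFC: only the first violation is reported.
            return "action_required", {
                "rule": rule["name"],
                "deviation_pct": round(deviation, 2),
            }
    return "success", None

conclusion, violation = evaluate_thresholds(
    current_size=49_700_000,   # roughly the 49.7 MB from the example below
    baseline_size=45_300_000,  # roughly the 45.3 MB baseline on main
    rules=[{"name": "Install size budget", "deviation_pct": 5.0}],
)
```

The returned conclusion maps directly onto the check run's conclusion field.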

Handling Expected Increases

Sometimes a significant bundle size increase is expected and intentional (e.g., adding a new SDK, large asset migration). Since enforcement happens via GitHub check runs rather than the CLI exit code, the override flow happens directly in GitHub:

  1. When a threshold is exceeded, the server creates a check run with action_required conclusion and an “Accept” action button.
  2. The developer (or reviewer) clicks “Accept” directly in the GitHub PR checks UI — no need to leave GitHub.
  3. GitHub sends a check_run.requested_action webhook to the Tuist server.
  4. The server updates the check run conclusion to success — the PR is unblocked immediately, no CI re-run needed.

Accepting applies to all violations on that commit at once — there is no per-threshold granularity. The acceptance is scoped to a specific commit, so future commits on the same branch are still checked against the threshold.

Usage Examples

Basic setup

  1. Connect the Tuist GitHub App to your repository (required for check runs).
  2. Navigate to project settings > Bundles in the Tuist dashboard.
  3. Create a new threshold:
    • Name: “Install size budget”
    • Metric: Install size
    • Deviation: 5%
    • Baseline branch: main
  4. Add tuist inspect bundle MyApp.ipa to your CI pipeline.
  5. Add tuist/bundle-size as a required status check in your GitHub branch protection rules.

CI pipeline (GitHub Actions)

- name: Check bundle size
  run: tuist inspect bundle build/MyApp.ipa
  # Always succeeds — enforcement happens via the GitHub status check

Accepting an expected increase

If a threshold is exceeded and the increase is intentional:

  1. Developer sees the tuist/bundle-size check run with action_required status in the PR.
  2. Clicks the “Accept” button directly in the GitHub checks UI.
  3. The check run flips to success — PR is unblocked, no CI re-run needed.

Alternatives Considered

Fail tuist inspect bundle directly (non-zero exit code)

Have the CLI command itself exit non-zero when a threshold is exceeded, similar to tuist inspect dependencies --only implicit. Rejected because:

  • Once the CLI exits non-zero, the only way to unblock the PR is to re-run the entire CI job, which rebuilds everything.
  • There’s no clean “accept” workflow — the developer would need to either modify the CI pipeline (adding a skip flag, polluting the PR diff) or adjust the threshold.
  • GitHub check runs provide a better enforcement model: the server can update the check from action_required to success without re-running CI, and the developer can accept directly in the GitHub UI.

Reuse existing alert rules with a fail_command toggle

Add a boolean fail_command field to alert rules with the bundle_size category. When enabled, the server evaluates the alert rule synchronously during bundle upload and violations cause the CLI to fail.

This would avoid introducing a new settings surface, but was rejected because:

  • Alerts and CI gates serve different purposes — alerts are passive notifications, thresholds are active blockers. Bundling them together makes it harder to reason about CI behavior.
  • Alert rules have fields that don’t apply to thresholds (Slack channel, cooldown period, rolling window) and vice versa. The UI would need conditional field visibility that adds complexity.
  • A dedicated Bundles tab provides a clearer home for future bundle-related settings (e.g., absolute size budgets, per-artifact thresholds).

Client-side threshold configuration in Tuist.swift

// NOT proposed
let tuist = Tuist(inspectOptions: .init(
    bundleSizeThreshold: .init(installSize: .percentage(5), comparedTo: "main")
))

Rejected because:

  • Requires the CLI to fetch the baseline bundle size from the server anyway, duplicating server logic.
  • Splitting configuration between manifest and dashboard is confusing.
  • The server already stores bundle history; it should own the comparison logic.

Separate tuist inspect bundle --check subcommand

A dedicated subcommand that only performs the threshold check without uploading. Rejected because it would require a second server round-trip and the upload + check flow is the natural CI workflow.

Percentage-only thresholds (no absolute size budgets)

The initial version only supports percentage-based deviation thresholds. Absolute size budgets (e.g., “fail if install size exceeds 100 MB”) could be added later but are out of scope for this RFC.

Implementation References

GitHub Check Runs API

We use the Check Runs API instead of the simpler commit statuses API because check runs support the action_required conclusion with custom action buttons directly in the GitHub UI.

Creating a check run via POST /repos/{owner}/{repo}/check-runs (GitHub docs):

{
  "name": "tuist/bundle-size",
  "head_sha": "abc123...",
  "status": "completed",
  "conclusion": "action_required",
  "details_url": "https://tuist.dev/team/project/bundles/123",
  "output": {
    "title": "Bundle size threshold exceeded",
    "summary": "Install size increased by 9.68% (threshold: 5.0%)\nPrevious: 45.3 MB (main) → Current: 49.7 MB"
  },
  "actions": [
    {
      "label": "Accept",
      "description": "Accept this size increase",
      "identifier": "accept_bundle_size"
    }
  ]
}
  • conclusion: action_required: Shows as a failing check that requires developer action. When actions are provided, GitHub renders action buttons directly in the PR checks UI.
  • actions: Up to 3 buttons (max 20 char label, 40 char description, 20 char identifier). When clicked, GitHub sends a check_run.requested_action webhook event to the Tuist server with the requested_action.identifier.
  • details_url: Links to the bundle page in the Tuist dashboard for a detailed breakdown.

Updating a check run via PATCH /repos/{owner}/{repo}/check-runs/{check_run_id} — when the server receives the check_run.requested_action webhook with identifier: "accept_bundle_size", it updates the check run’s conclusion to success.

Permission: Requires the GitHub App installation token with “Checks” write permission. The Tuist GitHub App currently does not have this permission, so we’ll need to bump the app’s permissions. GitHub will prompt existing installations to approve the new permission — this is a standard flow and should be low-friction for users.

Webhook handling: The server needs to handle the check_run webhook event (action: requested_action) in the existing GitHub webhook controller (server/lib/tuist_web/controllers/webhooks/github_controller.ex). This follows the same pattern as the existing issue_comment and installation event handlers.
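The accept flow amounts to mapping that webhook payload onto a PATCH against the same check run. A sketch of the mapping as a pure data transformation, in Python for illustration (the payload field paths follow GitHub's check_run event schema; the handler itself is not the Elixir implementation):

```python
def handle_requested_action(payload):
    """Given a check_run webhook payload, return the (url_path, body) for the
    PATCH that flips the check to success, or None if the event isn't ours.
    Field paths follow GitHub's check_run event payload; illustrative only."""
    if payload.get("action") != "requested_action":
        return None
    if payload["requested_action"]["identifier"] != "accept_bundle_size":
        return None
    repo = payload["repository"]["full_name"]
    check_run_id = payload["check_run"]["id"]
    return (
        f"/repos/{repo}/check-runs/{check_run_id}",
        {"status": "completed", "conclusion": "success"},
    )

patch = handle_requested_action({
    "action": "requested_action",
    "requested_action": {"identifier": "accept_bundle_size"},
    "repository": {"full_name": "tuist/tuist"},
    "check_run": {"id": 42},
})
```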

]]>
https://community.tuist.dev/t/rfc-bundle-size-ci-check-threshold/923#post_1 Wed, 04 Mar 2026 09:25:12 +0000 community.tuist.dev-post-2402