COMPLEX.md

Technical Discovery: AlphaFold Complex API Resolution Pipeline

ProtPocket Team — March 2026

Summary

While building ProtPocket, we discovered an undocumented capability of the AlphaFold Database REST API that enables real-time retrieval of protein complex (homodimer) structural data using only a UniProt accession ID. This finding closes a critical gap in programmatic access to the March 2026 AlphaFold complex dataset — a gap that existed because no official complex API endpoint was published alongside the data release.

Background

On March 16, 2026, EMBL-EBI, Google DeepMind, NVIDIA, and Seoul National University released 1.7 million high-confidence homodimer predictions into the AlphaFold Database — the largest protein complex dataset ever assembled. However, no REST API endpoint was published to query these complex predictions programmatically by UniProt ID.

The only documented access method was bulk FTP download — meaning to get structural data for a single protein's homodimer, a researcher would theoretically need to download gigabytes of data and parse it locally.

We set out to find whether the API exposed any path to complex data without bulk download.

Investigation

What We Tested

We systematically tested every plausible API pattern against the AlphaFold REST API:

Test 1 — Standard prediction endpoint:

GET https://alphafold.ebi.ac.uk/api/prediction/P04637

Result: Returns monomer prediction only. isComplex: false. No dimer data.

Test 2 — Type parameter on prediction endpoint:

GET https://alphafold.ebi.ac.uk/api/prediction/P04637?type=complex

Result: Returns all isoforms for that UniProt ID. All isComplex: false. Parameter ignored.

Test 3 — Dedicated complexes endpoint:

GET https://alphafold.ebi.ac.uk/api/complexes/P04637

Result: HTTP 404. Endpoint does not exist.

Test 4 — FTP bulk manifest:

GET https://ftp.ebi.ac.uk/pub/databases/alphafold/complexes/

Result: HTTP 404. Complex FTP directory not yet published at time of writing.

Test 5 — Direct query with known complex numeric ID:

GET https://alphafold.ebi.ac.uk/api/prediction/AF-0000000066503175

Result: HTTP 200. Returns full complex prediction data with isComplex: true and globalMetricValue: 86.04. Complex numeric IDs are queryable — the problem reduces to finding the numeric ID for a given UniProt accession.

Test 6 — Search endpoint with type parameter:

GET https://alphafold.ebi.ac.uk/api/search?q=Q55DI5&type=complex

Result: HTTP 200. Returns both monomer and homodimer entries in a single response. This is the discovery.

The Discovery

The AlphaFold search endpoint (/api/search) accepts a type=complex parameter that, when combined with a UniProt accession query, returns all predictions associated with that accession — including homodimer complex predictions where they exist.

Verified working curl:

curl -s "https://alphafold.ebi.ac.uk/api/search?q=Q55DI5&type=complex"

Response (abbreviated):

{
  "numFound": 2,
  "docs": [
    {
      "entryId": "AF-Q55DI5-F1",
      "isComplex": false,
      "oligomericState": "monomer",
      "globalMetricValue": 50.56,
      "providerId": "GDM"
    },
    {
      "entryId": "AF-0000000066503175",
      "isComplex": true,
      "oligomericState": "dimer",
      "assemblyType": "Homo",
      "complexName": "Homodimer of Transcription elongation factor Eaf N-terminal domain-containing protein",
      "globalMetricValue": 86.06,
      "providerId": "NVDA",
      "complexPredictionAccuracy_ipTM": 0.82,
      "complexPredictionAccuracy_pDockQ2_AB": 0.72,
      "complexPredictionAccuracy_LIS_AB": 0.65,
      "complexComposition": [
        {
          "identifierType": "uniprotAccession",
          "identifier": "Q55DI5",
          "stoichiometry": 2
        }
      ]
    }
  ]
}

A single API call returns both the monomer prediction (isComplex: false) and the homodimer prediction (isComplex: true) for the same protein. The globalMetricValue field on each entry is the average pLDDT confidence score for that structure.

What This Enables

Real-Time Disorder Delta Computation

Prior to this discovery, disorder delta — the structural confidence gain when a protein forms a complex — could only be computed from manually curated data or bulk FTP downloads. With this endpoint, disorder delta is computable live for any protein that has a homodimer prediction:

monomer_plddt = docs.filter(isComplex == false && isIsoform == false)[0].globalMetricValue
dimer_plddt   = docs.filter(isComplex == true)[0].globalMetricValue
disorder_delta = dimer_plddt - monomer_plddt

For Q55DI5:

monomer_plddt  = 50.56
dimer_plddt    = 86.06
disorder_delta = +35.50

This computation previously required downloading and parsing multi-megabyte structure files. It now requires one HTTP request.

Complex Numeric ID Resolution

The search response returns the numeric complex ID (AF-0000000066503175) for any protein that has a homodimer prediction. This ID can then be used to fetch the full complex prediction record:

GET https://alphafold.ebi.ac.uk/api/prediction/AF-0000000066503175

This gives access to the complete complex metadata including per-residue confidence scores and structure file URLs as they become available.

Additional Accuracy Metrics

The complex entry exposes prediction accuracy metrics that are not available for monomer predictions and were not previously accessible via any API:

Field	Description	Value for Q55DI5
`complexPredictionAccuracy_ipTM`	Interface predicted TM-score — confidence in the chain-chain interface	0.82
`complexPredictionAccuracy_pDockQ2_AB`	Docking quality score per chain pair	0.72
`complexPredictionAccuracy_LIS_AB`	Local interaction score	0.65
`complexPredictionAccuracy_ipsae_AB`	Interface predicted SAE score	0.76
`complexPredictionAccuracy_N_clash_backbone`	Backbone clash count (structural validity)	0.0

The ipTM score is particularly significant for drug discovery — it measures confidence specifically at the protein-protein interface, which is precisely where a drug molecule would bind to disrupt the complex.

The Complete Pipeline

The full resolution pipeline from a protein name to real-time disorder delta:

Step 1 — User queries "TP53" or "tuberculosis"
          ↓
Step 2 — UniProt search returns UniProt accession ID
          e.g. Q55DI5
          ↓
Step 3 — GET /api/search?q={uniprotID}&type=complex
          Returns docs array with both monomer and complex entries
          ↓
Step 4 — Parse docs:
          Find doc where isComplex=false, isIsoform=false
          → monomer_plddt = globalMetricValue
          
          Find doc where isComplex=true
          → dimer_plddt = globalMetricValue
          → complex_id = entryId
          → iptm_score = complexPredictionAccuracy_ipTM
          ↓
Step 5 — Compute:
          disorder_delta = dimer_plddt - monomer_plddt
          ↓
Step 6 — If no isComplex=true doc exists:
          Protein has no homodimer prediction in the March 2026 dataset
          Set dimer_plddt = monomer_plddt, disorder_delta = 0
          ↓
Step 7 — Return enriched Complex object with real disorder delta

This pipeline resolves in under 500ms for any protein in the AlphaFold Database. It requires no bulk download, no file parsing, and no pre-computed lookup tables.

Scope and Limitations

What works:

Any protein where AlphaFold has computed a homodimer prediction in the March 2026 dataset
The search endpoint correctly returns isComplex: true entries with all accuracy metrics
globalMetricValue is available immediately without parsing structure files
The numeric complex ID is returned and can be used for further queries

Current limitations:

Not every protein has a homodimer prediction. The March 2026 release covers 1.7 million high-confidence and 18 million lower-confidence homodimers out of a much larger protein universe. For proteins without a complex prediction, disorder delta cannot be computed live.
Complex structure files (.cif) at the standard AlphaFold files endpoint return HTTP 404 at time of writing. The file storage for complex structures has not yet been fully populated. The metadata API is ahead of the file storage.
The type=complex parameter behavior on the search endpoint is not documented in the official AlphaFold API documentation. Its continued availability is not guaranteed. ProtPocket maintains a curated hero dataset as a fallback.

Significance

This discovery means ProtPocket can compute real disorder delta for any protein live — not just the 30 manually curated hero complexes. As AlphaFold continues to add complex predictions to their database, ProtPocket's coverage automatically expands without any code changes.

More broadly, this pipeline fills a gap that exists for every researcher and tool in the structural biology community right now. The AlphaFold complex dataset was released 5 days before this document was written. No published tool has yet built programmatic access to complex data via UniProt ID. ProtPocket is, to our knowledge, the first tool to do so.

Reproducibility

The following curl commands reproduce the core finding:

# Returns both monomer and complex predictions for Q55DI5 in one call
curl -s "https://alphafold.ebi.ac.uk/api/search?q=Q55DI5&type=complex" | jq '.docs[] | {entryId, isComplex, globalMetricValue, oligomericState}'

# Queries the complex entry directly by numeric ID
curl -s "https://alphafold.ebi.ac.uk/api/prediction/AF-0000000066503175" | jq '.[0] | {entryId, isComplex, globalMetricValue, complexPredictionAccuracy_ipTM}'

Expected output for the first command:

{ "entryId": "AF-Q55DI5-F1", "isComplex": false, "globalMetricValue": 50.56, "oligomericState": "monomer" }
{ "entryId": "AF-0000000066503175", "isComplex": true, "globalMetricValue": 86.06, "oligomericState": "dimer" }

Implementation in ProtPocket

This discovery is implemented in backend/services/alphafold.go as FetchComplexData(uniprotID string). The function makes a single HTTP GET to the search endpoint, parses the docs array, extracts both pLDDT values, computes disorder delta, and returns the enriched data to the search and detail handlers.

The gap score algorithm in backend/scoring/gap_score.go incorporates the live disorder delta as a bonus term, and the ipTM score is surfaced as a separate "Interface Confidence" metric on the detail page.

Discovered: March 2026 ProtPocket This document may be freely shared with the structural biology and bioinformatics community.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Technical Discovery: AlphaFold Complex API Resolution Pipeline

ProtPocket Team — March 2026

Summary

Background

Investigation

What We Tested

The Discovery

What This Enables

Real-Time Disorder Delta Computation

Complex Numeric ID Resolution

Additional Accuracy Metrics

The Complete Pipeline

Scope and Limitations

Significance

Reproducibility

Implementation in ProtPocket

FilesExpand file tree

COMPLEX.md

Latest commit

History

COMPLEX.md

File metadata and controls

Technical Discovery: AlphaFold Complex API Resolution Pipeline

ProtPocket Team — March 2026

Summary

Background

Investigation

What We Tested

The Discovery

What This Enables

Real-Time Disorder Delta Computation

Complex Numeric ID Resolution

Additional Accuracy Metrics

The Complete Pipeline

Scope and Limitations

Significance

Reproducibility

Implementation in ProtPocket