Boltz2 Format

Expected input format for tsp-maker parse boltz2.

Directory Structure

boltz2_outputs/
├── P12345/                       ← Folder name becomes protein ID
│   └── boltz_results_P12345/
│       ├── predictions/
│       │   └── P12345/
│       │       ├── P12345_model_0.pdb
│       │       ├── P12345_model_1.pdb
│       │       ├── confidence_P12345_model_0.json
│       │       ├── confidence_P12345_model_1.json
│       │       ├── plddt_P12345_model_0.npz
│       │       ├── plddt_P12345_model_1.npz
│       │       ├── pae_P12345_model_0.npz
│       │       └── pae_P12345_model_1.npz
│       └── processed/
│           └── manifest.json
├── Q67890/                       ← Folder name becomes protein ID
│   └── ...
└── ...

Folder Names = Protein IDs

The top-level folder name (e.g., P12345) becomes the protein ID in your dataset. Ensure folders are named with the identifiers you want. See Protein ID Rules.

Directory Detection

The parser looks for Boltz2 outputs in this order:

  1. Directory named boltz_results_*
  2. Directory containing predictions/
  3. The input directory itself, if it matches either pattern above
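
The detection order above can be sketched as follows. This is a minimal illustration, not the parser's actual code: the function name `find_boltz2_dir` is hypothetical, and it assumes the search only looks one level below the input directory.

```python
from pathlib import Path
from typing import Optional


def find_boltz2_dir(input_dir: Path) -> Optional[Path]:
    """Return the first directory that looks like a Boltz2 result, or None.

    Hypothetical helper; the one-level search depth is an assumption.
    """
    # 1. A child directory named boltz_results_*
    for child in sorted(input_dir.glob("boltz_results_*")):
        if child.is_dir():
            return child
    # 2. A child directory that contains predictions/
    for child in sorted(input_dir.iterdir()):
        if child.is_dir() and (child / "predictions").is_dir():
            return child
    # 3. The input directory itself, if it matches either pattern
    if input_dir.name.startswith("boltz_results_") or (input_dir / "predictions").is_dir():
        return input_dir
    return None
```

Calling `find_boltz2_dir(Path("boltz2_outputs/P12345"))` on the layout shown earlier would return the `boltz_results_P12345` directory.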

Required Files

Model Files

For each model:

File                                Description
{name}_model_{i}.pdb                Structure file
confidence_{name}_model_{i}.json    Confidence metrics

Optional Files

File                          Description
plddt_{name}_model_{i}.npz    Per-residue pLDDT
pae_{name}_model_{i}.npz      PAE matrix
pde_{name}_model_{i}.npz      PDE matrix

confidence JSON

{
  "confidence_score": 0.85,
  "ptm": 0.82,
  "iptm": 0.78,
  "complex_plddt": 85.2,
  "complex_pde": 1.2,
  "chains_ptm": [0.85, 0.80],
  "pair_chains_iptm": [[0.0, 0.78], [0.78, 0.0]]
}

manifest.json

Used to determine whether a prediction is a monomer or a multimer:

{
  "records": [{
    "structure": {
      "num_chains": 2
    }
  }]
}
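
As a rough sketch of how the manifest could be read, the snippet below checks `num_chains` in the first record. The function name is hypothetical, and two points are assumptions rather than documented behavior: only the first record is consulted, and any chain count above one is treated as a multimer.

```python
import json
from pathlib import Path


def is_multimer_from_manifest(manifest_path: Path) -> bool:
    """Return True if the manifest reports more than one chain.

    Hypothetical helper; using only the first record is an assumption.
    """
    manifest = json.loads(manifest_path.read_text())
    records = manifest.get("records", [])
    if not records:
        return False
    # Missing fields fall back to a single chain (monomer).
    num_chains = records[0].get("structure", {}).get("num_chains", 1)
    return num_chains > 1
```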

Ranking

Models are ranked by confidence_score:

  1. All *_model_*.pdb files are collected
  2. The corresponding confidence JSON is loaded for each model
  3. Models are sorted by confidence_score (descending)
  4. The top N models are extracted
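
The ranking steps above can be sketched like this. It is an illustration under stated assumptions, not the tool's implementation: `rank_models` is a hypothetical name, and models without a confidence JSON are simply skipped here, which may differ from what the parser does.

```python
import json
from pathlib import Path
from typing import List, Tuple


def rank_models(pred_dir: Path, top_n: int) -> List[Tuple[Path, float]]:
    """Return the top_n (pdb_path, confidence_score) pairs, best first."""
    scored = []
    for pdb in pred_dir.glob("*_model_*.pdb"):
        # P12345_model_0.pdb -> confidence_P12345_model_0.json
        conf_path = pdb.with_name(f"confidence_{pdb.stem}.json")
        if not conf_path.is_file():
            continue  # assumption: skip models with no confidence file
        score = json.loads(conf_path.read_text())["confidence_score"]
        scored.append((pdb, float(score)))
    # Highest confidence_score first
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]
```

With `top_n=5` this mirrors the `--top-n 5` option in the example command.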

Extracted Metrics

Metric                   Source
confidence_score         confidence JSON
ranking_score            Same as confidence_score
ptm                      confidence JSON
iptm                     confidence JSON (multimers)
complex_plddt            confidence JSON
complex_pde              confidence JSON
plddt_mean/min/median    plddt npz
pae_mean/min/max         pae npz
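
The npz-derived summary statistics might be computed along these lines. This sketch assumes NumPy is available and that the first array stored in the archive holds the values of interest; the array key inside the npz file is not specified here, and `summarize_npz` is a hypothetical name.

```python
from pathlib import Path
from typing import Dict

import numpy as np


def summarize_npz(npz_path: Path, prefix: str) -> Dict[str, float]:
    """Compute mean/min/max/median over the first array in an npz archive."""
    with np.load(npz_path) as archive:
        # Assumption: the first stored array is the pLDDT vector or PAE matrix.
        values = archive[archive.files[0]]
    return {
        f"{prefix}_mean": float(np.mean(values)),
        f"{prefix}_min": float(np.min(values)),
        f"{prefix}_max": float(np.max(values)),
        f"{prefix}_median": float(np.median(values)),
    }
```

The function returns all four statistics; a caller would keep only those listed in the table for each file type (for example, `summarize_npz(path, "plddt")` for the mean, min, and median).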

Multimer Metrics

For multimers (detected from the manifest or from iptm > 0), these additional metrics are extracted:

  • protein_iptm
  • ligand_iptm
  • complex_iplddt
  • complex_ipde
  • chains_ptm
  • pair_chains_iptm
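
A minimal sketch of the multimer check and metric collection follows. It assumes the listed metrics appear in the confidence JSON under the same key names (the example JSON above shows only `chains_ptm` and `pair_chains_iptm`); the function name and the `manifest_multimer` parameter are hypothetical.

```python
from typing import Any, Dict

# Metric names taken from the list above; their presence as keys in the
# confidence JSON is an assumption.
MULTIMER_KEYS = (
    "protein_iptm",
    "ligand_iptm",
    "complex_iplddt",
    "complex_ipde",
    "chains_ptm",
    "pair_chains_iptm",
)


def multimer_metrics(confidence: Dict[str, Any], manifest_multimer: bool = False) -> Dict[str, Any]:
    """Return the multimer-only metrics, or an empty dict for monomers."""
    # Multimer if the manifest says so, or if iptm is positive.
    is_multimer = manifest_multimer or confidence.get("iptm", 0.0) > 0
    if not is_multimer:
        return {}
    # Copy only the keys that are actually present.
    return {key: confidence[key] for key in MULTIMER_KEYS if key in confidence}
```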

Example Command

tsp-maker parse boltz2 /data/boltz2_outputs /intermediate --top-n 5

Output Naming

Output files carry a _BZ2 tag after the protein ID:

  • P12345_BZ2_1.pdb
  • P12345_BZ2.json
  • P12345_BZ2_1.npy
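
The naming pattern in these examples can be expressed as a small helper. This is only a sketch inferred from the three file names above: the function name, the dictionary labels, and the reading of the trailing number as a 1-based model rank are all assumptions.

```python
from typing import Dict


def output_names(protein_id: str, rank: int) -> Dict[str, str]:
    """Build output file names following the _BZ2 pattern shown above."""
    tag = f"{protein_id}_BZ2"
    return {
        "structure": f"{tag}_{rank}.pdb",  # per-model structure file
        "metrics": f"{tag}.json",          # one JSON per protein
        "array": f"{tag}_{rank}.npy",      # per-model array file
    }
```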

Ligand Support

Boltz2 supports protein-ligand predictions. Ligand information is preserved in the confidence metrics, while the structure files are written in standard PDB format.