Boltz2 Format¶

This page describes the input format expected by tsp-maker parse boltz2.

Directory Structure¶

boltz2_outputs/
├── P12345/                       ← Folder name becomes protein ID
│   └── boltz_results_P12345/
│       ├── predictions/
│       │   └── P12345/
│       │       ├── P12345_model_0.pdb
│       │       ├── P12345_model_1.pdb
│       │       ├── confidence_P12345_model_0.json
│       │       ├── confidence_P12345_model_1.json
│       │       ├── plddt_P12345_model_0.npz
│       │       ├── plddt_P12345_model_1.npz
│       │       ├── pae_P12345_model_0.npz
│       │       └── pae_P12345_model_1.npz
│       └── processed/
│           └── manifest.json
├── Q67890/                       ← Folder name becomes protein ID
│   └── ...
└── ...

Folder Names = Protein IDs

The top-level folder name (e.g., P12345) becomes the protein ID in your dataset. Ensure folders are named with the identifiers you want. See Protein ID Rules.

Directory Detection¶

The parser looks for Boltz2 outputs in this order:

1. A directory named boltz_results_*
2. A directory containing predictions/
3. The input directory itself, if it matches either rule above
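
The detection order above can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: the function name find_boltz2_dir is hypothetical, and the sketch assumes that only the immediate children of the input directory are searched.

```python
from pathlib import Path
from typing import Optional


def find_boltz2_dir(input_dir: Path) -> Optional[Path]:
    """Return the first directory matching the detection order, or None.

    Hypothetical helper: assumes only immediate children are inspected.
    """
    def has_predictions(p: Path) -> bool:
        return p.is_dir() and (p / "predictions").is_dir()

    children = sorted(p for p in input_dir.iterdir() if p.is_dir())

    # 1. A child directory named boltz_results_*
    for child in children:
        if child.name.startswith("boltz_results_"):
            return child

    # 2. A child directory containing predictions/
    for child in children:
        if has_predictions(child):
            return child

    # 3. The input directory itself, if it matches either rule above
    if input_dir.name.startswith("boltz_results_") or has_predictions(input_dir):
        return input_dir
    return None
```

With the layout shown under Directory Structure, calling find_boltz2_dir(Path("boltz2_outputs/P12345")) would return the boltz_results_P12345 folder.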

Required Files¶

Model Files¶

For each model:

File	Description
`{name}_model_{i}.pdb`	Structure file
`confidence_{name}_model_{i}.json`	Confidence metrics

Optional Files¶

File	Description
`plddt_{name}_model_{i}.npz`	Per-residue pLDDT
`pae_{name}_model_{i}.npz`	PAE matrix
`pde_{name}_model_{i}.npz`	PDE matrix

confidence JSON¶

{
  "confidence_score": 0.85,
  "ptm": 0.82,
  "iptm": 0.78,
  "complex_plddt": 85.2,
  "complex_pde": 1.2,
  "chains_ptm": [0.85, 0.80],
  "pair_chains_iptm": [[0.0, 0.78], [0.78, 0.0]]
}

manifest.json¶

Used to determine monomer vs multimer:

{
  "records": [{
    "structure": {
      "num_chains": 2
    }
  }]
}
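
As an illustration, the monomer/multimer check could look something like the sketch below. The function name is_multimer is hypothetical, and two assumptions are made that the format description does not confirm: only the first entry in records is consulted, and any num_chains greater than 1 counts as a multimer.

```python
import json
from pathlib import Path


def is_multimer(manifest_path: Path) -> bool:
    """Guess monomer vs multimer from processed/manifest.json.

    Hypothetical helper: reads only the first record and treats
    num_chains > 1 as a multimer.
    """
    manifest = json.loads(manifest_path.read_text())
    records = manifest.get("records", [])
    if not records:
        # Assumption: no records means nothing suggests a multimer
        return False
    num_chains = records[0].get("structure", {}).get("num_chains", 1)
    return num_chains > 1
```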

Ranking¶

Models are ranked by confidence_score:

1. All *_model_*.pdb files are found.
2. The corresponding confidence JSON is loaded for each model.
3. Models are sorted by confidence_score (descending).
4. The top N models are extracted.
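
The ranking steps can be approximated with the short sketch below. It is not the tool's implementation: rank_models is a hypothetical name, and skipping models that lack a confidence JSON is an assumption about how missing files are handled.

```python
import json
from pathlib import Path
from typing import List, Tuple


def rank_models(pred_dir: Path, top_n: int) -> List[Tuple[Path, float]]:
    """Return the top_n (pdb_path, confidence_score) pairs, best first."""
    ranked = []
    for pdb_path in pred_dir.glob("*_model_*.pdb"):
        # P12345_model_0.pdb -> confidence_P12345_model_0.json
        conf_path = pdb_path.with_name(f"confidence_{pdb_path.stem}.json")
        if not conf_path.is_file():
            continue  # assumption: models without a confidence JSON are skipped
        score = json.loads(conf_path.read_text())["confidence_score"]
        ranked.append((pdb_path, score))

    # Highest confidence_score first
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

Here top_n plays the role of the --top-n option shown in the example command.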

Extracted Metrics¶

Metric	Source
`confidence_score`	confidence JSON
`ranking_score`	Same as confidence_score
`ptm`	confidence JSON
`iptm`	confidence JSON (multimers)
`complex_plddt`	confidence JSON
`complex_pde`	confidence JSON
`plddt_mean/min/median`	From plddt npz
`pae_mean/min/max`	From pae npz

Multimer Metrics¶

For multimers (detected from the manifest or from iptm > 0), these additional metrics are extracted:

protein_iptm
ligand_iptm
complex_iplddt
complex_ipde
chains_ptm
pair_chains_iptm

Example Command¶

tsp-maker parse boltz2 /data/boltz2_outputs /intermediate --top-n 5

Output Naming¶

Output files are named with a _BZ2 tag after the protein ID:

P12345_BZ2_1.pdb
P12345_BZ2.json
P12345_BZ2_1.npy

Ligand Support¶

Boltz2 supports protein-ligand predictions. Ligand information is preserved in the confidence metrics, but the structure files are written in standard PDB format.