Quick Start¶
This guide walks through creating a TSP package from AlphaFold3 predictions and uploading it to Zenodo.
Prerequisites¶
- tsp-maker installed (Installation)
- Structure prediction outputs (AF2, AF3, or Boltz2)
- Zenodo API token (for upload step)
Important: Folder Names = Protein IDs¶
Critical Design Decision
tsp-maker uses folder names as protein identifiers. The name of each prediction folder becomes the protein ID in your dataset.
This is a deliberate design choice—we do not attempt to extract IDs from file contents or metadata. If your folders are named job_001, job_002, etc., those will be your protein IDs, not the actual protein names.
Before running tsp-maker, ensure your folder names are the protein identifiers you want in your final dataset.
Folder Name Requirements¶
| Rule | Details |
|---|---|
| Max length | 16 characters |
| Allowed characters | Letters (A-Z, a-z), digits (0-9), underscore (_), hyphen (-) |
| Not allowed | Spaces, brackets, special characters, path separators |
Examples¶
| Folder Name | Result |
|---|---|
| P12345 | ✓ Valid: ID is P12345 |
| AT1G01010 | ✓ Valid: ID is AT1G01010 |
| gene-001 | ✓ Valid: ID is gene-001 |
| job_001 | ✓ Valid but likely wrong: ID will be job_001, not the protein name |
| alphafold_run_P12345 | ✗ Skipped: exceeds 16 characters |
| my protein | ✗ Skipped: contains a space |
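The rules above can be checked before running tsp-maker. The following is a minimal sketch that mirrors the documented rules (at most 16 characters; letters, digits, underscore, hyphen). The function names `is_valid_id` and `check_folders` are hypothetical and not part of tsp-maker, whose own validation may differ in detail.

```bash
#!/bin/bash
# Hypothetical helpers, not part of tsp-maker: they mirror the documented
# folder-name rules so problems can be spotted before parsing.

# Succeeds if the name is 1-16 characters of letters, digits, "_" or "-".
is_valid_id() {
  local name=$1
  [[ ${#name} -le 16 && $name =~ ^[A-Za-z0-9_-]+$ ]]
}

# Prints every immediate subfolder of a directory whose name breaks the rules.
check_folders() {
  local dir=$1 path name
  for path in "$dir"/*/; do
    [[ -d $path ]] || continue        # no subfolders: the glob stays literal
    name=$(basename "$path")
    is_valid_id "$name" || echo "invalid: $name"
  done
}

# Usage: check_folders predictions
```

Folders reported as invalid would be skipped by tsp-maker according to the table above. A name such as job_001 passes the check even though it is probably not the ID you want.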
If Your Folders Need Renaming¶
Rename folders before running tsp-maker:
# Example: rename from job IDs to protein IDs
mv predictions/job_001 predictions/P12345
mv predictions/job_002 predictions/Q67890
Or use --id-pattern to extract IDs from complex folder names (see parse command).
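When many folders need renaming, the `mv` commands above can be driven from a mapping file. This is a minimal sketch, assuming a hypothetical tab-separated file (here called `id_map.tsv`) with the current folder name in the first column and the desired protein ID in the second. Neither the file nor the `rename_from_map` function is part of tsp-maker.

```bash
#!/bin/bash
# Hypothetical helper, not part of tsp-maker: rename prediction folders using
# a two-column TSV (old folder name <TAB> new protein ID).
rename_from_map() {
  local dir=$1 map=$2 old new
  # The "|| [[ -n $old ]]" part also handles a last line with no trailing newline.
  while IFS=$'\t' read -r old new || [[ -n $old ]]; do
    [[ -n $old && -n $new ]] || continue            # skip blank or incomplete lines
    if [[ ! -d $dir/$old ]]; then
      echo "skip: $dir/$old not found" >&2
    elif [[ -e $dir/$new ]]; then
      echo "skip: $dir/$new already exists" >&2     # never overwrite a folder
    else
      mv -- "$dir/$old" "$dir/$new"
    fi
  done < "$map"
}

# Usage: rename_from_map predictions id_map.tsv
```

Skipped entries are reported on standard error rather than aborting the loop, so one missing folder does not block the remaining renames.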
Step 1: Organize Your Data¶
Your prediction outputs should be organized with one folder per protein. The folder name becomes the protein ID:
predictions/
├── P12345/                  ← Folder name "P12345" becomes protein ID
│   ├── seed-1_sample-0/
│   │   ├── model.cif
│   │   ├── confidences.json
│   │   └── summary_confidences.json
│   └── ...
├── Q67890/                  ← Folder name "Q67890" becomes protein ID
│   └── ...
└── ...
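Before parsing, it can help to confirm that every protein folder actually contains a structure file. This is a minimal sketch, assuming the AlphaFold3-style layout shown above, where each sample subfolder holds a `model.cif`. The function name `find_empty_proteins` is hypothetical, and outputs from other predictors will likely use different file names.

```bash
#!/bin/bash
# Hypothetical helper, not part of tsp-maker: list protein folders that have
# no <sample>/model.cif underneath them (layout as in the tree above).
find_empty_proteins() {
  local dir=$1 protein
  for protein in "$dir"/*/; do
    [[ -d $protein ]] || continue
    # Without nullglob, an unmatched pattern stays literal, so -e fails on it.
    local models=( "$protein"*/model.cif )
    [[ -e ${models[0]} ]] || echo "no model.cif: $(basename "$protein")"
  done
}

# Usage: find_empty_proteins predictions
```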
Step 2: Parse Predictions¶
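Convert predictor outputs to the intermediate format. For AlphaFold3 outputs, the command takes the same form as the multi-predictor examples below:
tsp-maker parse af3 /data/af3 /tmp/intermediate --top-n 5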
This creates:
/tmp/intermediate/
├── structures/ # Structure files (renamed)
├── scores/ # JSON score files
└── pae/ # PAE matrices (.npy)
For multiple predictors, run parse commands sequentially—they merge into the same output:
tsp-maker parse af2 /data/af2 /tmp/intermediate --top-n 5
tsp-maker parse af3 /data/af3 /tmp/intermediate --top-n 5
tsp-maker parse boltz2 /data/boltz2 /tmp/intermediate --top-n 5
Step 3: Build TSP Package¶
Assemble the intermediate format into a TSP package:
tsp-maker build /tmp/intermediate /tmp/my-dataset \
  --name my-structures \
  --title "My Structure Dataset" \
  --description "AlphaFold3 predictions for interesting proteins" \
  --author "Jane Doe" \
  --affiliation "My Institute"
Output structure:
/tmp/my-dataset/
├── datapackage.json
├── metadata.parquet
├── structures/
│   └── batch_001.tar.gz
└── predictions/
    ├── scores.parquet
    └── pae/
        └── batch_001.tar.gz
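A quick sanity check of the build output can be scripted against the layout above. This is a minimal sketch that only tests for the presence of the files and folders shown in the tree. It is not a substitute for `tsp-maker validate`, and the function name `check_package_layout` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper, not part of tsp-maker: report entries from the output
# tree above that are absent. Returns non-zero if anything is missing.
check_package_layout() {
  local pkg=$1 missing=0 entry
  for entry in datapackage.json metadata.parquet structures \
               predictions/scores.parquet predictions/pae; do
    if [[ ! -e $pkg/$entry ]]; then
      echo "missing: $entry"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_package_layout /tmp/my-dataset
```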
Step 4: Validate¶
Check that the package is valid:
tsp-maker validate /tmp/my-dataset
For full validation, use the R package.
Step 5: Upload to Zenodo¶
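First, test on the Zenodo sandbox:
tsp-maker upload /tmp/my-dataset --token $ZENODO_SANDBOX_TOKEN
This creates a draft deposit. Review it in the Zenodo web interface, then publish it. To publish as part of the upload instead, pass --publish, as the complete script below does.
Production Upload
For production (real DOIs), add --production to the upload command.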
Complete Script¶
#!/bin/bash
# Stop on errors, on unset variables, and on failures inside pipelines
set -euo pipefail
INPUT=/data/predictions
INTERMEDIATE=/tmp/tsp-intermediate
OUTPUT=/tmp/my-dataset
# Parse all predictors (results merge into the same intermediate folder)
tsp-maker parse af3 "$INPUT/af3" "$INTERMEDIATE" --top-n 5
tsp-maker parse boltz2 "$INPUT/boltz2" "$INTERMEDIATE" --top-n 3
# Build package
tsp-maker build "$INTERMEDIATE" "$OUTPUT" \
  --name my-structures \
  --title "My Structure Dataset" \
  --author "Jane Doe" \
  --affiliation "My Institute"
# Validate
tsp-maker validate "$OUTPUT"
# Upload (sandbox first!)
tsp-maker upload "$OUTPUT" --token "$ZENODO_SANDBOX_TOKEN" --publish
echo "Done! Check Zenodo for your deposit."
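The script above reads the token from `$ZENODO_SANDBOX_TOKEN`. A guard like the following, placed before the upload step, fails early with a clear message if the variable is unset or empty. It is a generic Bash sketch rather than a tsp-maker feature, and `require_env` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper, not part of tsp-maker: fail if the named environment
# variable is unset or empty.
require_env() {
  local var=$1
  # ${!var} is Bash indirect expansion: the value of the variable named by $var.
  # The ":-" default keeps this safe under "set -u".
  if [[ -z ${!var:-} ]]; then
    echo "error: $var is not set" >&2
    return 1
  fi
}

# Usage, before the upload line:
# require_env ZENODO_SANDBOX_TOKEN || exit 1
```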
Next Steps¶
- Read the Command Reference for all options
- Learn about supported input formats
- Explore the Python API for programmatic use