Quick Start

This guide walks through creating a TSP package from AlphaFold3 predictions and uploading it to Zenodo.

Prerequisites

tsp-maker installed (see Installation)
Structure prediction outputs (AF2, AF3, or Boltz2)
A Zenodo API token (for the upload step)

Important: Folder Names = Protein IDs

Critical Design Decision

tsp-maker uses folder names as protein identifiers. The name of each prediction folder becomes the protein ID in your dataset.

This is a deliberate design choice: tsp-maker does not attempt to extract IDs from file contents or metadata. If your folders are named job_001, job_002, etc., those will be your protein IDs, not the actual protein names.

Before running tsp-maker, ensure your folder names are the protein identifiers you want in your final dataset.

Folder Name Requirements

Rule	Details
Max length	16 characters
Allowed characters	Letters (`A-Z`, `a-z`), digits (`0-9`), underscore (`_`), hyphen (`-`)
Not allowed	Spaces, brackets, special characters, path separators

Examples

Folder Name	Result
`P12345`	✓ Valid — ID is `P12345`
`AT1G01010`	✓ Valid — ID is `AT1G01010`
`gene-001`	✓ Valid — ID is `gene-001`
`job_001`	✓ Valid but likely wrong — ID will be `job_001`, not the protein name
`alphafold_run_P12345`	✗ Skipped — exceeds 16 characters
`my protein`	✗ Skipped — contains space
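
Because invalid folders are skipped, it can help to check folder names before parsing. The following is a minimal sketch of the rules in the tables above, not tsp-maker's own validation code. The function names `is_valid_id` and `check_folders` are hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-flight check mirroring the documented folder-name rules.
# This is NOT tsp-maker's own validation logic.

# Succeeds if NAME is 1-16 characters drawn from A-Z, a-z, 0-9, "_" or "-".
is_valid_id() {
    local LC_ALL=C          # keep A-Z / a-z ranges strictly ASCII
    local name=$1
    [ "${#name}" -ge 1 ] && [ "${#name}" -le 16 ] || return 1
    case $name in
        *[!A-Za-z0-9_-]*) return 1 ;;   # contains a disallowed character
    esac
    return 0
}

# Prints every subfolder of ROOT whose name breaks the rules.
# Returns non-zero if at least one such folder was found.
check_folders() {
    local root=$1 dir name status=0
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue        # the glob matched nothing
        name=$(basename "$dir")
        if ! is_valid_id "$name"; then
            echo "would be skipped: $name"
            status=1
        fi
    done
    return "$status"
}
```

Running `check_folders predictions` before Step 2 lists the folders that break the rules. A name such as `job_001` passes this check even though it is probably not the ID you want, so the output still deserves a manual look.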

If Your Folders Need Renaming

Rename folders before running tsp-maker:

# Example: rename from job IDs to protein IDs
mv predictions/job_001 predictions/P12345
mv predictions/job_002 predictions/Q67890

Alternatively, use --id-pattern to extract IDs from complex folder names (see the parse command).
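
When many folders need renaming, the manual `mv` commands above can be driven from a mapping file. The sketch below assumes a hypothetical two-column, tab-separated file (current folder name, then protein ID). Neither that file nor the `rename_from_map` function is part of tsp-maker.

```shell
#!/bin/bash
# Hypothetical batch rename driven by a two-column, tab-separated map:
#   <current folder name><TAB><protein ID>
# The map file and its format are assumptions made for this sketch.

rename_from_map() {
    local root=$1 map=$2 old new
    # The "|| [ -n ... ]" part also handles a last line without a newline
    while IFS=$'\t' read -r old new || [ -n "$old" ]; do
        [ -n "$old" ] && [ -n "$new" ] || continue   # skip blank or incomplete lines
        if [ ! -d "$root/$old" ]; then
            echo "missing, skipped: $old" >&2
            continue
        fi
        if [ -e "$root/$new" ]; then
            echo "target exists, skipped: $old -> $new" >&2
            continue
        fi
        mv -- "$root/$old" "$root/$new"
    done < "$map"
}
```

For example, `rename_from_map predictions id_map.tsv` renames each listed folder in place. Existing targets are never overwritten, so the function can be re-run safely after fixing the map.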

Step 1: Organize Your Data

Organize your prediction outputs with one folder per protein. The folder name becomes the protein ID:

predictions/
├── P12345/          ← Folder name "P12345" becomes protein ID
│   ├── seed-1_sample-0/
│   │   ├── model.cif
│   │   ├── confidences.json
│   │   └── summary_confidences.json
│   └── ...
├── Q67890/          ← Folder name "Q67890" becomes protein ID
│   └── ...
└── ...
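
A quick way to confirm the layout is to count the model files under each protein folder. The sketch below assumes the AF3-style layout shown above, with `model.cif` exactly one level below the protein folder. Other predictors may use different file names, and `count_models` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical sanity check for the AF3-style layout shown above:
#   <root>/<protein ID>/<sample folder>/model.cif

# Prints "<protein ID><TAB><number of model.cif files>" per protein folder.
count_models() {
    local root=$1 dir name n
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue        # the glob matched nothing
        name=$(basename "$dir")
        # Count model.cif files exactly one level below the protein folder
        n=$(find "$dir" -mindepth 2 -maxdepth 2 -type f -name model.cif \
            | wc -l | tr -d '[:space:]')
        printf '%s\t%s\n' "$name" "$n"
    done
}
```

A protein folder reported with a count of 0 usually points to an extra nesting level or an unfinished prediction run.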

Step 2: Parse Predictions

Convert the predictor outputs to the intermediate format:

tsp-maker parse af3 /path/to/predictions /tmp/intermediate --top-n 5

This creates:

/tmp/intermediate/
├── structures/       # Structure files (renamed)
├── scores/           # JSON score files
└── pae/              # PAE matrices (.npy)

For multiple predictors, run the parse commands sequentially. Each run merges into the same output directory:

tsp-maker parse af2 /data/af2 /tmp/intermediate --top-n 5
tsp-maker parse af3 /data/af3 /tmp/intermediate --top-n 5
tsp-maker parse boltz2 /data/boltz2 /tmp/intermediate --top-n 5
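
The three commands above can also be written as a loop. This sketch assumes each predictor's output sits in a folder named after the predictor, as in the paths above, and skips predictors with no input folder. `parse_all` and the `TSP_MAKER` override are hypothetical names, the latter included only to allow a dry run with `TSP_MAKER=echo`.

```shell
#!/bin/bash
# Sketch: run `tsp-maker parse` for every predictor that has an input folder.
# Assumes inputs live at <data_root>/<predictor>, as in the commands above.
# TSP_MAKER is a hypothetical override used only for dry runs.

parse_all() {
    local data_root=$1 intermediate=$2 top_n=${3:-5} predictor
    for predictor in af2 af3 boltz2; do
        if [ ! -d "$data_root/$predictor" ]; then
            echo "no input for $predictor, skipping" >&2
            continue
        fi
        # Stop at the first failing parse so a partial merge is noticed
        "${TSP_MAKER:-tsp-maker}" parse "$predictor" "$data_root/$predictor" \
            "$intermediate" --top-n "$top_n" || return 1
    done
}
```

Calling `parse_all /data /tmp/intermediate 5` then matches the three commands shown above when all three input folders exist.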

Step 3: Build TSP Package

Assemble the intermediate format into a TSP package:

tsp-maker build /tmp/intermediate /tmp/my-dataset \
    --name my-structures \
    --title "My Structure Dataset" \
    --description "AlphaFold3 predictions for interesting proteins" \
    --author "Jane Doe" \
    --affiliation "My Institute"

Output structure:

/tmp/my-dataset/
├── datapackage.json
├── metadata.parquet
├── structures/
│   └── batch_001.tar.gz
└── predictions/
    ├── scores.parquet
    └── pae/
        └── batch_001.tar.gz
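
Before running the validator, a quick look at the output folder can catch an incomplete build. The sketch below checks only the fixed paths from the layout above. Batch archive names are not checked, and the `pae/` folder is left out on the assumption (not confirmed by this guide) that it may be absent for some inputs. `check_package_layout` is a hypothetical helper and does not replace `tsp-maker validate`.

```shell
#!/bin/bash
# Sketch: confirm a package directory contains the fixed entries shown in
# the layout above. This is a quick pre-check, not a real validation.

check_package_layout() {
    local pkg=$1 entry status=0
    for entry in datapackage.json metadata.parquet structures \
                 predictions/scores.parquet; do
        if [ ! -e "$pkg/$entry" ]; then
            echo "missing: $entry"
            status=1
        fi
    done
    return "$status"
}
```

For the example in this step, the call would be `check_package_layout /tmp/my-dataset`.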

Step 4: Validate

Check that the package is valid:

tsp-maker validate /tmp/my-dataset

For full validation, use the tslstructures R package:

library(tslstructures)
validate_tsp("/tmp/my-dataset")

Step 5: Upload to Zenodo

First, test the upload on the Zenodo sandbox:

tsp-maker upload /tmp/my-dataset \
    --token $ZENODO_SANDBOX_TOKEN

This creates a draft deposit. Review it in the Zenodo web interface, then publish:

tsp-maker upload /tmp/my-dataset \
    --token $ZENODO_SANDBOX_TOKEN \
    --publish

Production Upload

For production (real DOIs), add --production:

tsp-maker upload /tmp/my-dataset \
    --token $ZENODO_TOKEN \
    --production \
    --publish

This creates a permanent, citable record.
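
Since a published production record is permanent, it may be worth wrapping the upload so the target has to be named explicitly. The sketch below uses only the flags shown in this guide (`--token`, `--production`, `--publish`). The function name, its `sandbox`/`production` and `publish` arguments, and the `TSP_MAKER` dry-run override are hypothetical.

```shell
#!/bin/bash
# Sketch of a guarded upload wrapper around `tsp-maker upload`.
# Usage: upload_package PKG [sandbox|production] [publish]
# Defaults to a sandbox draft; tokens come from the environment variables
# used in this guide (ZENODO_SANDBOX_TOKEN, ZENODO_TOKEN).

upload_package() {
    local pkg=$1 target=${2:-sandbox} publish=${3:-} token
    case $target in
        sandbox)    token=${ZENODO_SANDBOX_TOKEN:-} ;;
        production) token=${ZENODO_TOKEN:-} ;;
        *) echo "unknown target: $target" >&2; return 2 ;;
    esac
    if [ -z "$token" ]; then
        echo "no API token set for $target" >&2
        return 1
    fi
    # Build the argument list step by step
    set -- upload "$pkg" --token "$token"
    if [ "$target" = production ]; then
        set -- "$@" --production
    fi
    if [ "$publish" = publish ]; then
        set -- "$@" --publish
    fi
    "${TSP_MAKER:-tsp-maker}" "$@"
}
```

With this wrapper, `upload_package /tmp/my-dataset` creates a sandbox draft, and only `upload_package /tmp/my-dataset production publish` produces a permanent record.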

Complete Script

#!/bin/bash
# Stop on the first error, on unset variables, and on failures inside pipelines
set -euo pipefail

INPUT=/data/predictions
INTERMEDIATE=/tmp/tsp-intermediate
OUTPUT=/tmp/my-dataset

# Parse each predictor's output into the same intermediate directory
tsp-maker parse af3 "$INPUT/af3" "$INTERMEDIATE" --top-n 5
tsp-maker parse boltz2 "$INPUT/boltz2" "$INTERMEDIATE" --top-n 3

# Build the package
tsp-maker build "$INTERMEDIATE" "$OUTPUT" \
    --name my-structures \
    --title "My Structure Dataset" \
    --author "Jane Doe" \
    --affiliation "My Institute"

# Validate
tsp-maker validate "$OUTPUT"

# Upload to the Zenodo sandbox (no --production flag) and publish there
tsp-maker upload "$OUTPUT" --token "$ZENODO_SANDBOX_TOKEN" --publish

echo "Done! Check Zenodo for your deposit."

Next Steps

Read the Command Reference for all options
Learn about supported input formats
Explore the Python API for programmatic use