Quick Start

This guide walks through creating a TSP package from AlphaFold3 predictions and uploading it to Zenodo.

Prerequisites

  • tsp-maker installed (Installation)
  • Structure prediction outputs (AF2, AF3, or Boltz2)
  • Zenodo API token (for upload step)

Important: Folder Names = Protein IDs

Critical Design Decision

tsp-maker uses folder names as protein identifiers. The name of each prediction folder becomes the protein ID in your dataset.

This is a deliberate design choice: tsp-maker does not attempt to extract IDs from file contents or metadata. If your folders are named job_001, job_002, and so on, those names become your protein IDs, not the actual protein names.

Before running tsp-maker, ensure your folder names are the protein identifiers you want in your final dataset.

Folder Name Requirements

Rule                 Details
Max length           16 characters
Allowed characters   Letters (A-Z, a-z), digits (0-9), underscore (_), hyphen (-)
Not allowed          Spaces, brackets, special characters, path separators

Examples

Folder Name            Result
P12345                 ✓ Valid: ID is P12345
AT1G01010              ✓ Valid: ID is AT1G01010
gene-001               ✓ Valid: ID is gene-001
job_001                ✓ Valid but likely wrong: ID will be job_001, not the protein name
alphafold_run_P12345   ✗ Skipped: exceeds 16 characters
my protein             ✗ Skipped: contains space
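
The rules above can be checked before running tsp-maker. The following is a minimal sketch, assuming the table lists every rule; the function names is_valid_id and check_folder_names are hypothetical helpers, not part of tsp-maker, and the tool's own check may differ in detail.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the name satisfies the documented rules
# (1-16 characters drawn from A-Z, a-z, 0-9, underscore, hyphen).
is_valid_id() {
    local LC_ALL=C                      # keep A-Z / a-z ranges ASCII-only
    local name=$1
    local re='^[A-Za-z0-9_-]{1,16}$'
    [[ $name =~ $re ]]
}

# Hypothetical helper: print every top-level folder under a directory whose
# name breaks the rules. Returns 1 if any such folder was found.
check_folder_names() {
    local root=$1 dir name status=0
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue       # glob matched nothing
        name=$(basename "$dir")
        if ! is_valid_id "$name"; then
            echo "invalid: $name"
            status=1
        fi
    done
    return "$status"
}
```

Running check_folder_names predictions lists the folders that would be skipped, so they can be renamed first.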

If Your Folders Need Renaming

Rename folders before running tsp-maker:

# Example: rename from job IDs to protein IDs
mv predictions/job_001 predictions/P12345
mv predictions/job_002 predictions/Q67890

Or use --id-pattern to extract IDs from complex folder names (see parse command).
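
For more than a handful of folders, the mv commands above can be driven from a mapping file. This is a sketch under stated assumptions: the file id_map.tsv and the function rename_from_map are hypothetical, and the file is assumed to hold one old_name<TAB>new_name pair per line.

```shell
#!/usr/bin/env bash

# Hypothetical helper: rename folders under $1 using a two-column,
# tab-separated mapping file ($2): old_name<TAB>new_name.
rename_from_map() {
    local root=$1 map=$2 old new
    # "|| [ -n "$old" ]" also handles a final line without a trailing newline.
    while IFS=$'\t' read -r old new || [ -n "$old" ]; do
        [ -n "$old" ] && [ -n "$new" ] || continue   # skip blank or partial lines
        if [ ! -d "$root/$old" ]; then
            echo "skip: $old not found" >&2
        elif [ -e "$root/$new" ]; then
            echo "skip: $new already exists" >&2     # never overwrite a folder
        else
            mv -- "$root/$old" "$root/$new"
        fi
    done < "$map"
}

# Example call (paths are placeholders):
#   rename_from_map predictions id_map.tsv
```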


Step 1: Organize Your Data

Your prediction outputs should be organized with one folder per protein. The folder name becomes the protein ID:

predictions/
├── P12345/          ← Folder name "P12345" becomes protein ID
│   ├── seed-1_sample-0/
│   │   ├── model.cif
│   │   ├── confidences.json
│   │   └── summary_confidences.json
│   └── ...
├── Q67890/          ← Folder name "Q67890" becomes protein ID
│   └── ...
└── ...
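
A quick way to spot incomplete folders before parsing is sketched below. It assumes the AF3-style layout shown above, where each sample directory contains a file named model.cif; other predictors may use different file names, and folders_without_models is a hypothetical helper, not a tsp-maker command.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the name of every protein folder under $1
# that contains no model.cif anywhere beneath it.
folders_without_models() {
    local root=$1 dir
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue       # glob matched nothing
        if ! find "$dir" -type f -name model.cif | grep -q .; then
            basename "$dir"
        fi
    done
}

# Example call (path is a placeholder):
#   folders_without_models predictions
```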

Step 2: Parse Predictions

Convert predictor outputs to intermediate format:

tsp-maker parse af3 /path/to/predictions /tmp/intermediate --top-n 5

This creates:

/tmp/intermediate/
├── structures/       # Structure files (renamed)
├── scores/           # JSON score files
└── pae/              # PAE matrices (.npy)

For multiple predictors, run the parse commands sequentially; each run merges into the same output directory:

tsp-maker parse af2 /data/af2 /tmp/intermediate --top-n 5
tsp-maker parse af3 /data/af3 /tmp/intermediate --top-n 5
tsp-maker parse boltz2 /data/boltz2 /tmp/intermediate --top-n 5

Step 3: Build TSP Package

Assemble the intermediate format into a TSP package:

tsp-maker build /tmp/intermediate /tmp/my-dataset \
    --name my-structures \
    --title "My Structure Dataset" \
    --description "AlphaFold3 predictions for interesting proteins" \
    --author "Jane Doe" \
    --affiliation "My Institute"

Output structure:

/tmp/my-dataset/
├── datapackage.json
├── metadata.parquet
├── structures/
│   └── batch_001.tar.gz
└── predictions/
    ├── scores.parquet
    └── pae/
        └── batch_001.tar.gz
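
To sanity-check the build, the batch archives can be inspected with standard tar. The sketch below counts file entries across all .tar.gz archives in a directory; the internal layout of the archives is not described in this guide, so the count is only a rough check, and count_archived_files is a hypothetical helper.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count non-directory entries across every
# *.tar.gz archive in directory $1.
count_archived_files() {
    local dir=$1 archive n total=0
    for archive in "$dir"/*.tar.gz; do
        [ -f "$archive" ] || continue   # glob matched nothing
        # tar -tzf lists entries; directory entries end with "/".
        # "|| true" keeps going when grep counts zero lines.
        n=$(tar -tzf "$archive" | grep -vc '/$' || true)
        total=$((total + n))
    done
    echo "$total"
}

# Example call (path is a placeholder):
#   count_archived_files /tmp/my-dataset/structures
```

Comparing this number with the number of structures you expected from the parse step can reveal folders that were skipped.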

Step 4: Validate

Check that the package is valid:

tsp-maker validate /tmp/my-dataset

For full validation, use the R package:

library(tslstructures)
validate_tsp("/tmp/my-dataset")

Step 5: Upload to Zenodo

First, test against the Zenodo sandbox:

tsp-maker upload /tmp/my-dataset \
    --token $ZENODO_SANDBOX_TOKEN

This creates a draft deposit. Review it in the Zenodo web interface, then publish:

tsp-maker upload /tmp/my-dataset \
    --token $ZENODO_SANDBOX_TOKEN \
    --publish

Production Upload

For production (real DOIs), add --production:

tsp-maker upload /tmp/my-dataset \
    --token $ZENODO_TOKEN \
    --production \
    --publish

This creates a permanent, citable record.
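
Because the upload commands read the token from an environment variable, it can help to fail early when that variable is missing. The guard below is a sketch; require_env is a hypothetical helper, and the variable names are the ones used in the commands above.

```shell
#!/usr/bin/env bash

# Hypothetical guard: fail with a clear message if the named
# environment variable is unset or empty.
require_env() {
    local var=$1
    if [ -z "${!var:-}" ]; then         # ${!var} is bash indirect expansion
        echo "error: $var is not set" >&2
        return 1
    fi
}

# Example use before a sandbox upload:
#   require_env ZENODO_SANDBOX_TOKEN &&
#       tsp-maker upload /tmp/my-dataset --token "$ZENODO_SANDBOX_TOKEN"
```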

Complete Script

#!/bin/bash
set -euo pipefail   # stop on errors, unset variables, and failed pipes

INPUT=/data/predictions
INTERMEDIATE=/tmp/tsp-intermediate
OUTPUT=/tmp/my-dataset

# Parse all predictors (each run merges into the same intermediate directory)
tsp-maker parse af3 "$INPUT/af3" "$INTERMEDIATE" --top-n 5
tsp-maker parse boltz2 "$INPUT/boltz2" "$INTERMEDIATE" --top-n 3

# Build package
tsp-maker build "$INTERMEDIATE" "$OUTPUT" \
    --name my-structures \
    --title "My Structure Dataset" \
    --author "Jane Doe" \
    --affiliation "My Institute"

# Validate
tsp-maker validate "$OUTPUT"

# Upload (sandbox first!)
tsp-maker upload "$OUTPUT" --token "$ZENODO_SANDBOX_TOKEN" --publish

echo "Done! Check Zenodo for your deposit."

Next Steps