Comparative genomics of effector proteins across multiple genome assemblies.
Why paneffectR?
When studying plant pathogens, a key question is: which effector proteins are shared across isolates, and which are unique? Understanding effector repertoires helps identify core virulence factors, track pathogen evolution, and discover candidate avirulence genes.
paneffectR solves this by:
- Clustering proteins into orthogroups - Finding equivalent proteins across assemblies using sequence similarity
- Building presence/absence matrices - Creating structured data showing which proteins exist in which assemblies
- Filtering by effector scores - Focusing on high-confidence effector predictions (when using omnieff output)
- Generating publication-ready visualizations - Heatmaps, UpSet plots, and dendrograms
While designed for effector analysis, paneffectR works with any protein sets for general pan-genome comparisons.
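To make the presence/absence idea concrete, here is a minimal base-R sketch that does not call paneffectR at all. The toy table, its column names (orthogroup, assembly), and the isolate labels are assumptions made purely for illustration; the package's own objects may be structured differently.

```r
# Toy orthogroup membership, one row per protein (hypothetical data)
members <- data.frame(
  orthogroup = c("OG1", "OG1", "OG1", "OG2", "OG2", "OG3"),
  assembly   = c("isoA", "isoB", "isoC", "isoA", "isoB", "isoC"),
  stringsAsFactors = FALSE
)

# Count proteins per orthogroup x assembly, then reduce the counts
# to binary presence (1) / absence (0)
counts <- table(members$orthogroup, members$assembly)
pa_toy <- (unclass(counts) > 0) * 1L

# Orthogroups found in every assembly are "core";
# those found in exactly one assembly are "unique"
n_present  <- rowSums(pa_toy)
core_ogs   <- names(n_present)[n_present == ncol(pa_toy)]
unique_ogs <- names(n_present)[n_present == 1]
```

In this toy example OG1 is present in all three isolates, OG2 in two of them, and OG3 in only one, which is the kind of pattern the heatmaps and UpSet plots are meant to expose.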
Installation
R Dependencies
paneffectR requires Bioconductor packages. Install them first:
install.packages("BiocManager")
BiocManager::install(c("ComplexHeatmap", "Biostrings"))
Then install paneffectR from GitHub:
# install.packages("devtools")
devtools::install_github("TeamMacLean/paneffectR")
External Dependencies
paneffectR uses DIAMOND for fast protein sequence alignment. Install it via mamba/conda:
How paneffectR finds DIAMOND:
The package searches for DIAMOND in this order:
- Explicit path - Pass tool_path = "/path/to/diamond" to cluster_proteins()
- Conda/mamba prefix - Pass conda_prefix = "./my_env" for project-local environments
- System PATH - Falls back to Sys.which("diamond")
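The search order above can be sketched in a few lines of base R. This is an illustration only, not paneffectR's actual code: resolve_diamond() is a hypothetical helper, and the assumption that a conda/mamba prefix keeps its executables under <prefix>/bin holds for typical Linux and macOS environments but is not confirmed by the package documentation.

```r
# Hypothetical helper mirroring the documented search order
resolve_diamond <- function(tool_path = NULL, conda_prefix = NULL) {
  # 1. An explicit path takes priority over everything else
  if (!is.null(tool_path)) {
    return(tool_path)
  }
  # 2. Assumption: the environment keeps executables under <prefix>/bin
  if (!is.null(conda_prefix)) {
    return(file.path(conda_prefix, "bin", "diamond"))
  }
  # 3. Fall back to the system PATH; Sys.which() returns "" when nothing is found
  found <- unname(Sys.which("diamond"))
  if (!nzchar(found)) {
    stop("DIAMOND not found: supply tool_path or conda_prefix, or add it to PATH")
  }
  found
}
```

Checking the explicit path first means a single call can override both a project-local environment and whatever is on the PATH, which is convenient when comparing DIAMOND versions.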
This flexibility supports various installation scenarios:
# System-wide installation (found via PATH)
clusters <- cluster_proteins(proteins)
# Project-local mamba environment
clusters <- cluster_proteins(proteins, conda_prefix = "./this_project_env")
# Explicit path
clusters <- cluster_proteins(proteins, tool_path = "/opt/diamond/bin/diamond")
To create a project-local environment:
Quick Start
library(paneffectR)
# Load proteins from multiple assemblies
proteins <- load_proteins(
  fasta_dir = "path/to/fastas/",
  score_dir = "path/to/scores/"
)
# Cluster into orthogroups
clusters <- cluster_proteins(proteins, method = "diamond_rbh")
# Build presence/absence matrix (filter to high-scoring effectors)
pa <- build_pa_matrix(clusters, score_threshold = 5.0)
# Visualize
ht <- plot_heatmap(pa)
ComplexHeatmap::draw(ht)
plot_upset(pa, min_size = 2)
Use Cases
Effector Comparative Genomics
Take output from the omnieff pipeline, find orthologous effectors across assemblies, and filter by prediction confidence:
# Load omnieff output (FASTAs + scores)
proteins <- load_proteins(
  fasta_dir = "omnieff_output/reformatted/",
  score_dir = "omnieff_output/scored/"
)
# Cluster and build matrix with score threshold
clusters <- cluster_proteins(proteins)
pa <- build_pa_matrix(clusters, score_threshold = 5.0)
# Visualize effector repertoires
plot_heatmap(pa) |> ComplexHeatmap::draw()
General Pan-Genome Analysis
Compare any protein sets without effector scores:
# Load raw FASTAs
proteins <- load_proteins(fasta_dir = "my_assemblies/")
# Binary presence/absence analysis
clusters <- cluster_proteins(proteins)
pa <- build_pa_matrix(clusters, type = "binary")
# Identify core vs accessory proteins
plot_upset(pa, min_size = 2)
plot_dendro(pa, distance_method = "jaccard")
Documentation
- Getting Started - Core workflow tutorial
- Effector Analysis - Working with omnieff output
- Pan-Genome Analysis - General protein comparisons
- Algorithm Deep Dive - Technical details for bioinformaticians
- Function Reference - Complete API documentation
Citation
If you use paneffectR in your research, please cite:
MacLean, D. (2026). paneffectR: Comparative Genomics of Effector Proteins. R package version 0.1.0. https://github.com/TeamMacLean/paneffectR