paneffectR

Comparative genomics of effector proteins across multiple genome assemblies.

Why paneffectR?

When studying plant pathogens, a key question is: which effector proteins are shared across isolates, and which are unique? Understanding effector repertoires helps identify core virulence factors, track pathogen evolution, and discover candidate avirulence genes.

paneffectR answers this question by:

Clustering proteins into orthogroups - Finding equivalent proteins across assemblies using sequence similarity
Building presence/absence matrices - Creating structured data showing which proteins exist in which assemblies
Filtering by effector scores - Focusing on high-confidence effector predictions (when using omnieff output)
Generating publication-ready visualizations - Heatmaps, UpSet plots, and dendrograms

Although designed for effector analysis, paneffectR works with any set of proteins, so it can also be used for general pan-genome comparisons.
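
To make the idea of a presence/absence matrix concrete, here is a minimal base R sketch on invented data. It does not use paneffectR and makes no claim about how the package stores its results. The orthogroup and assembly names are made up for illustration.

```r
# Toy data: each row assigns one protein to an orthogroup (all names invented)
assignments <- data.frame(
  orthogroup = c("OG1", "OG1", "OG1", "OG2", "OG2", "OG3"),
  assembly   = c("isolateA", "isolateB", "isolateC",
                 "isolateA", "isolateC", "isolateB")
)

# Count proteins per orthogroup and assembly, then reduce the counts to 0/1
counts <- table(assignments$orthogroup, assignments$assembly)
pa_toy <- (unclass(counts) > 0) * 1L

# Core orthogroups are present in every assembly
core <- rownames(pa_toy)[rowSums(pa_toy) == ncol(pa_toy)]

# Unique orthogroups are present in exactly one assembly
unique_ogs <- rownames(pa_toy)[rowSums(pa_toy) == 1]
```

In this toy matrix OG1 is core, OG3 is unique to one isolate, and OG2 is shared by two of the three isolates.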

Installation

R Dependencies

paneffectR requires Bioconductor packages. Install them first:

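# Install BiocManager only if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install(c("ComplexHeatmap", "Biostrings"))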

Then install paneffectR from GitHub:

# install.packages("devtools")
devtools::install_github("TeamMacLean/paneffectR")

External Dependencies

paneffectR uses DIAMOND for fast protein sequence alignment. Install it via mamba/conda:

mamba install -c bioconda diamond

How paneffectR finds DIAMOND:

The package searches for DIAMOND in this order:

1. Explicit path - Pass tool_path = "/path/to/diamond" to cluster_proteins()
2. Conda/mamba prefix - Pass conda_prefix = "./my_env" for project-local environments
3. System PATH - Falls back to Sys.which("diamond")

This flexibility supports various installation scenarios:

# System-wide installation (found via PATH)
clusters <- cluster_proteins(proteins)

# Project-local mamba environment
clusters <- cluster_proteins(proteins, conda_prefix = "./this_project_env")

# Explicit path
clusters <- cluster_proteins(proteins, tool_path = "/opt/diamond/bin/diamond")

To create a project-local environment:

mamba create -p ./this_project_env -c bioconda diamond
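
The search order described above can be pictured as a small resolver function. The sketch below is an illustration only: resolve_diamond() is a hypothetical helper, not part of paneffectR, and it assumes a Unix-like conda layout in which executables live under <prefix>/bin. It can be handy for checking which DIAMOND binary would be picked up before starting a long clustering run.

```r
# Hypothetical helper illustrating the lookup order; not paneffectR's own code.
resolve_diamond <- function(tool_path = NULL, conda_prefix = NULL) {
  # 1. An explicit path wins if the file exists
  if (!is.null(tool_path) && file.exists(tool_path)) {
    return(normalizePath(tool_path))
  }

  # 2. Assumption: a conda/mamba prefix keeps executables under <prefix>/bin
  if (!is.null(conda_prefix)) {
    candidate <- file.path(conda_prefix, "bin", "diamond")
    if (file.exists(candidate)) {
      return(normalizePath(candidate))
    }
  }

  # 3. Fall back to the system PATH; Sys.which() returns "" when nothing is found
  on_path <- unname(Sys.which("diamond"))
  if (nzchar(on_path)) {
    return(on_path)
  }

  stop("DIAMOND not found: pass tool_path or conda_prefix, or add diamond to PATH")
}
```

Calling resolve_diamond(conda_prefix = "./this_project_env") would return the path inside the project-local environment if it exists, and otherwise fall through to the PATH lookup.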

Quick Start

library(paneffectR)

# Load proteins from multiple assemblies
proteins <- load_proteins(
  fasta_dir = "path/to/fastas/",
  score_dir = "path/to/scores/"
)

# Cluster into orthogroups
clusters <- cluster_proteins(proteins, method = "diamond_rbh")

# Build presence/absence matrix (filter to high-scoring effectors)
pa <- build_pa_matrix(clusters, score_threshold = 5.0)

# Visualize
ht <- plot_heatmap(pa)
ComplexHeatmap::draw(ht)

plot_upset(pa, min_size = 2)
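
The method name "diamond_rbh" suggests reciprocal best hits (RBH): two proteins are paired when each is the other's top-scoring alignment. The base R sketch below shows that idea on an invented hits table for two assemblies. It is a conceptual illustration under that assumption, not paneffectR's implementation, and the column names and scores are made up.

```r
# Toy DIAMOND-style hits between assemblies A and B (all values invented)
hits <- data.frame(
  query    = c("A_p1", "A_p1", "A_p2", "B_p1", "B_p2", "B_p2"),
  subject  = c("B_p1", "B_p2", "B_p2", "A_p1", "A_p2", "A_p1"),
  bitscore = c(250, 90, 180, 245, 60, 95)
)

# Keep the highest-scoring subject for each query
best_hit <- function(h) {
  h <- h[order(h$query, -h$bitscore), ]
  h[!duplicated(h$query), c("query", "subject")]
}
best <- best_hit(hits)

# Lookup table: protein -> its best hit
reverse <- setNames(best$subject, best$query)

# A pair is reciprocal when the subject's best hit points back at the query
is_rbh <- reverse[best$subject] == best$query

# Report each reciprocal pair once; which() also drops proteins with no return hit
rbh <- best[which(is_rbh & best$query < best$subject), ]
```

Here A_p1 and B_p1 form the only reciprocal pair: A_p2 prefers B_p2, but B_p2's best hit is A_p1, so that pair is not reciprocal.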

Use Cases

Effector Comparative Genomics

Take output from the omnieff pipeline, find orthologous effectors across assemblies, and filter by prediction confidence:

# Load omnieff output (FASTAs + scores)
proteins <- load_proteins(
  fasta_dir = "omnieff_output/reformatted/",
  score_dir = "omnieff_output/scored/"
)

# Cluster and build matrix with score threshold
clusters <- cluster_proteins(proteins)
pa <- build_pa_matrix(clusters, score_threshold = 5.0)

# Visualize effector repertoires
plot_heatmap(pa) |> ComplexHeatmap::draw()
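
One plausible reading of score_threshold is that a protein only counts toward presence when its effector score reaches the threshold. The base R sketch below illustrates that reading on invented scores. Whether paneffectR treats the threshold as inclusive, or applies it per protein in this way, is an assumption here; check the function reference for the actual behaviour.

```r
# Toy scored proteins (orthogroups, assemblies, and scores are all invented)
scored <- data.frame(
  orthogroup = c("OG1", "OG1", "OG2", "OG2", "OG3"),
  assembly   = c("isolateA", "isolateB", "isolateA", "isolateB", "isolateA"),
  score      = c(7.2, 3.1, 5.0, 6.4, 1.8)
)

# Assumption: proteins scoring at or above the threshold are kept
threshold <- 5.0
kept <- scored[scored$score >= threshold, ]

# Keep every assembly as a column, even if none of its proteins pass the filter
kept$assembly <- factor(kept$assembly, levels = unique(scored$assembly))

# Presence/absence after filtering: 1 if any retained protein is in the cell
pa_filtered <- (unclass(table(kept$orthogroup, kept$assembly)) > 0) * 1L
```

After filtering, OG1 is present only in isolateA, OG2 is present in both isolates, and OG3 disappears because its single member scores below the threshold.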

General Pan-Genome Analysis

Compare any protein sets without effector scores:

# Load raw FASTAs
proteins <- load_proteins(fasta_dir = "my_assemblies/")

# Binary presence/absence analysis
clusters <- cluster_proteins(proteins)
pa <- build_pa_matrix(clusters, type = "binary")

# Identify core vs accessory proteins
plot_upset(pa, min_size = 2)
plot_dendro(pa, distance_method = "jaccard")
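
For readers unfamiliar with the Jaccard distance, the base R sketch below computes it between assemblies on an invented binary matrix and clusters the result. It uses only stats::dist() and stats::hclust(), and is meant to illustrate the distance measure, not to reproduce what plot_dendro() does internally. The average linkage is an arbitrary choice for the example.

```r
# Toy presence/absence matrix: rows are orthogroups, columns are assemblies
pa_toy <- matrix(
  c(1, 1, 1,
    1, 0, 1,
    0, 1, 0,
    1, 1, 0),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("OG", 1:4), c("isolateA", "isolateB", "isolateC"))
)

# dist() compares rows, so transpose to compare assemblies;
# method = "binary" gives the Jaccard distance for 0/1 data
d <- dist(t(pa_toy), method = "binary")

# Hierarchical clustering of assemblies on the Jaccard distances
tree <- hclust(d, method = "average")
plot(tree, main = "Assemblies clustered by Jaccard distance")
```

isolateA and isolateB share two of the four orthogroups found in either, so their Jaccard distance is 1 - 2/4 = 0.5. isolateA and isolateC share two of three, giving 1/3, which makes them the closest pair.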

Documentation

Getting Started - Core workflow tutorial
Effector Analysis - Working with omnieff output
Pan-Genome Analysis - General protein comparisons
Algorithm Deep Dive - Technical details for bioinformaticians
Function Reference - Complete API documentation

Citation

If you use paneffectR in your research, please cite:

MacLean, D. (2026). paneffectR: Comparative Genomics of Effector Proteins. R package version 0.1.0. https://github.com/TeamMacLean/paneffectR

License

MIT