MOSAiC orthologs

reciping
e5
coral
Author: Steven Roberts

Published: August 31, 2025

Orthologous Protein Identification Across Three Coral Species

Introduction

Understanding the evolutionary relationships between genes across different species is fundamental to comparative genomics. In this analysis, we identified orthologous proteins across three distinct coral species using a reciprocal best hits (RBH) approach. This methodology provides a robust foundation for cross-species comparisons and evolutionary studies.

Study Species

Our analysis focused on three coral species representing different growth forms and ecological strategies:

  • Acropora pulchra (D-Apul): A fast-growing, branching coral species
  • Porites evermanni (E-Peve): A slow-growing, massive coral species
  • Pocillopora tuahiniensis (F-Ptua): An intermediate growth, branching coral species

These species represent different evolutionary strategies within the coral phylogeny, making them ideal candidates for comparative genomic analysis.

Methodology

Orthology Identification Pipeline

We employed a comprehensive orthology identification pipeline using the following parameters:

  • BLAST algorithm: All-vs-all protein BLAST comparisons
  • Orthology criterion: Reciprocal best hits (RBH)
  • E-value threshold: 1e-5 (a commonly used significance cutoff)
  • Minimum identity: 30% (allowing for evolutionary divergence)
  • Minimum coverage: 50% (ensuring substantial sequence overlap)
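As a rough illustration, the three thresholds above can be applied to individual BLAST hits as sketched below. This is a minimal sketch, not the pipeline's actual code. The `BlastHit` fields, the inclusive comparisons, and the definition of coverage (aligned query span divided by query length) are all assumptions. The post does not say how coverage was computed.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One BLAST hit (hypothetical field selection, not the post's format)."""
    query: str
    subject: str
    pident: float    # percent identity over the aligned region
    qstart: int      # 1-based alignment start on the query
    qend: int        # 1-based alignment end on the query
    qlen: int        # full query length (not part of default tabular output)
    evalue: float
    bitscore: float


def query_coverage(hit: BlastHit) -> float:
    """Percent of the query spanned by the alignment (assumed definition)."""
    return 100.0 * (hit.qend - hit.qstart + 1) / hit.qlen


def passes_thresholds(hit: BlastHit,
                      max_evalue: float = 1e-5,
                      min_identity: float = 30.0,
                      min_coverage: float = 50.0) -> bool:
    """Return True if the hit meets all three cutoffs listed above."""
    return (hit.evalue <= max_evalue
            and hit.pident >= min_identity
            and query_coverage(hit) >= min_coverage)


# Example: a hit spanning 180 of 300 query residues at 42.5% identity.
example = BlastHit("apul_1", "peve_9", 42.5, 11, 190, 300, 1e-30, 150.0)
print(passes_thresholds(example))
```

Coverage could equally be measured against the subject, or against the shorter of the two sequences. The choice affects which hits survive the 50% cutoff.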

Computational Workflow

  1. Database Preparation: Created BLAST databases for each species’ protein sequences
  2. All-vs-All Comparisons: Performed bidirectional BLAST searches between all species pairs
  3. Reciprocal Best Hits: Identified protein pairs that are mutual best matches
  4. Ortholog Grouping: Organized orthologs into groups based on presence across species
  5. Quality Filtering: Applied identity and coverage thresholds to ensure orthology confidence
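The reciprocal best hits step in the workflow above can be sketched as follows. This is an illustrative sketch only. It assumes hits have already been reduced to `(query, subject, bitscore)` tuples and that "best" means highest bit score, with ties resolved in favor of the first hit seen. The post does not state the ranking or tie-breaking rule actually used.

```python
def best_hits(hits):
    """Map each query to its highest-bitscore subject.

    `hits` is an iterable of (query, subject, bitscore) tuples.
    On ties, the first hit encountered is kept (an assumption).
    """
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy example with made-up identifiers.
a_vs_b = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 90), ("a3", "b1", 120)]
b_vs_a = [("b1", "a1", 198), ("b2", "a1", 140), ("b2", "a2", 160)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the toy data, `a3` hits `b1`, but `b1` prefers `a1`, so `a3` is left without an ortholog. This is the behavior that makes RBH conservative in the presence of paralogs.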

Results

Protein Sequence Statistics

The analysis began with comprehensive protein datasets from each species:

  • Acropora pulchra: 15,664 total proteins from genome annotation
  • Porites evermanni: 16,693 total proteins from genome assembly
  • Pocillopora tuahiniensis: 16,060 total proteins from genome annotation

These datasets represent the complete protein complement for each species, providing a comprehensive foundation for orthology analysis.

Orthology Classification

Our analysis identified several categories of orthologous relationships:

  1. Three-way orthologs: 10,346 ortholog groups with a member in all three species (highest confidence)
  2. Two-way orthologs: 7,980 ortholog groups shared by only two of the three species
  3. Total ortholog groups: 18,326 distinct ortholog groups (10,346 + 7,980)
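One plausible way to derive three-way and two-way categories from pairwise RBH results is sketched below. The grouping rule is an assumption, since the post does not describe how groups were assembled. Here a three-way group requires all three pairwise RBH links to agree, and every remaining RBH pair is counted as its own two-way group. The function and variable names are hypothetical.

```python
def group_orthologs(ab, ac, bc):
    """Classify RBH pairs into three-way and two-way groups.

    ab maps species-A IDs to species-B IDs, ac maps A to C, bc maps B to C.
    Returns (three_way, two_way):
      three_way: list of (a, b, c) tuples where all three RBH links agree
      two_way:   list of (pair_label, x, y) for RBH pairs outside any triple
    """
    three_way = []
    used = set()
    for a, b in ab.items():
        c = ac.get(a)
        if c is not None and bc.get(b) == c:
            three_way.append((a, b, c))
            used.update({("ab", a, b), ("ac", a, c), ("bc", b, c)})

    two_way = []
    for label, pairs in (("ab", ab), ("ac", ac), ("bc", bc)):
        for x, y in pairs.items():
            if (label, x, y) not in used:
                two_way.append((label, x, y))
    return three_way, two_way


# Toy example with made-up identifiers.
ab = {"a1": "b1", "a2": "b2"}
ac = {"a1": "c1", "a3": "c3"}
bc = {"b1": "c1", "b2": "c2"}
three_way, two_way = group_orthologs(ab, ac, bc)
print(len(three_way), len(two_way), len(three_way) + len(two_way))
```

Under this kind of bookkeeping the total is the sum of the two categories, which matches the reported counts: 10,346 + 7,980 = 18,326. A different rule, such as merging two-way pairs that share a protein, would change the two-way count.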

Pairwise Orthology Results

The reciprocal best hits analysis revealed strong orthology relationships between species pairs:

  • Acropora pulchra vs Porites evermanni: 13,782 orthologous protein pairs
  • Acropora pulchra vs Pocillopora tuahiniensis: 13,320 orthologous protein pairs
  • Porites evermanni vs Pocillopora tuahiniensis: 14,303 orthologous protein pairs

Key Findings

The orthology analysis revealed substantial conservation across the three coral species:

  • High conservation: ~66% of Acropora pulchra proteins (10,346 of 15,664) have orthologs in both other species
  • Strong pairwise relationships: RBH pairs cover roughly 83-89% of each species' proteins, depending on the pair and on which proteome is used as the denominator
  • Core gene set: The 10,346 three-way orthologs are a candidate conserved core gene set
  • Lineage-restricted orthologs: 7,980 ortholog groups are found in only two of the three species, a pattern that may reflect lineage-specific gene gain or loss
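The percentages in this section can be recomputed from the counts reported in the Results. The short sketch below does only that arithmetic. The variable names are illustrative, and the result for a given pair depends on which species' protein total is used as the denominator.

```python
# Counts copied from the Results section of this post.
proteins = {"Apul": 15_664, "Peve": 16_693, "Ptua": 16_060}
rbh_pairs = {
    ("Apul", "Peve"): 13_782,
    ("Apul", "Ptua"): 13_320,
    ("Peve", "Ptua"): 14_303,
}
three_way = 10_346


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of each proteome that falls in a three-way ortholog group.
core_fraction = {sp: percent(three_way, n) for sp, n in proteins.items()}

# Share of each proteome covered by RBH pairs, one value per species in the pair.
pairwise_fraction = {
    (a, b): (percent(n, proteins[a]), percent(n, proteins[b]))
    for (a, b), n in rbh_pairs.items()
}

for species, value in core_fraction.items():
    print(f"{species}: {value}% of proteins in three-way groups")
for pair, values in pairwise_fraction.items():
    print(pair, values)
```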

Technical Implementation

Computational Resources

The analysis utilized:

  • BLAST databases: Custom-built for each species
  • Parallel processing: Multi-threaded BLAST searches for efficiency
  • Quality control: Multiple filtering steps to ensure orthology confidence
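For orientation, the database builds and multi-threaded searches could be expressed as BLAST+ command lines, assembled here in Python without executing them. This is a sketch under assumptions. The FASTA file names (`Apul.faa` and so on), the thread count, and the tabular output format (`-outfmt 6`) are hypothetical. The database prefixes and output file names follow the naming listed under Output Files.

```python
import itertools
import shlex

SPECIES = ["Apul", "Peve", "Ptua"]


def makeblastdb_cmd(fasta, db_prefix):
    """Argument list for building a protein BLAST database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", db_prefix]


def blastp_cmd(query_fasta, db_prefix, out_path, evalue=1e-5, threads=8):
    """Argument list for one directed blastp search (threads is an assumed value)."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db_prefix,
        "-out", out_path,
        "-evalue", str(evalue),
        "-outfmt", "6",              # tabular output (assumed format)
        "-num_threads", str(threads),
    ]


# One database per species; FASTA names are hypothetical.
db_cmds = [makeblastdb_cmd(f"{sp}.faa", f"{sp}_proteins") for sp in SPECIES]

# Both directions for every species pair: six searches in total.
search_cmds = [
    blastp_cmd(f"{q}.faa", f"{s}_proteins", f"{q}_vs_{s}.blastp")
    for q, s in itertools.permutations(SPECIES, 2)
]

for cmd in db_cmds + search_cmds:
    print(shlex.join(cmd))
```

Building argument lists rather than shell strings keeps the commands safe to pass to `subprocess.run` later, should one want to execute them.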

Data Management

Results were organized into:

  • Ortholog groups: Hierarchical classification of orthologous relationships
  • Pairwise comparisons: Detailed RBH results for each species pair
  • Summary statistics: Comprehensive overview of orthology patterns

Output Files

All analysis results are available in the orthology analysis output directory:

Core Results:

  • ortholog_groups.csv - Complete ortholog group assignments (18,326 groups)
  • orthology_summary.csv - Summary statistics and counts

Pairwise Comparisons:

  • apul_peve_rbh.csv - Acropora pulchra vs Porites evermanni reciprocal best hits
  • apul_ptua_rbh.csv - Acropora pulchra vs Pocillopora tuahiniensis reciprocal best hits
  • peve_ptua_rbh.csv - Porites evermanni vs Pocillopora tuahiniensis reciprocal best hits

Raw BLAST Results:

  • BLAST output files for all pairwise comparisons (Apul_vs_Peve.blastp, Peve_vs_Apul.blastp, etc.)
  • BLAST databases for each species (Apul_proteins.*, Peve_proteins.*, Ptua_proteins.*)
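A quick sanity check on the output files is to count the data rows in `ortholog_groups.csv` and compare the result with the reported 18,326 groups. The sketch below assumes the file has a single header row and one ortholog group per subsequent row. The post does not describe the column layout, so no column names are used.

```python
import csv


def count_groups(path):
    """Count non-empty data rows in a CSV file, skipping one header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header (assumed to be present)
        return sum(1 for row in reader if row)


def check_group_count(path, expected=18_326):
    """Return (observed, matches) for the expected number of ortholog groups."""
    observed = count_groups(path)
    return observed, observed == expected
```

Calling `check_group_count("ortholog_groups.csv")` from the output directory would return the observed row count and whether it matches the reported total.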

Conclusions

The identification of orthologous proteins across three coral species establishes a resource for comparative coral genomics. The analysis separates a conserved core of 10,346 three-way ortholog groups from 7,980 groups found in only two species, giving a starting point for studying which genes are shared across these corals and which are restricted to particular lineages.

The orthology assignments generated here will serve as a reference for future studies investigating coral biology, evolution, and responses to environmental change. By understanding the genetic relationships between these species, we can better predict how different coral lineages may respond to changing environmental conditions.


This analysis represents a key step in our multi-species comparative genomics pipeline, providing the foundation for cross-species comparisons and evolutionary studies in coral biology.