Stats on e5 genomes

welcoming new pulcra
e5
Author
Affiliation

Steven Roberts

Published

October 5, 2024

This assessment compares the genomic assemblies and annotations of three coral species: Acropora pulchra (Apul), Porites evermanni (Peve), and Pocillopora meandrina (Ptuh). The analysis focuses on genome assembly metrics, gene annotation statistics, and potential biological implications.


1. Genome Assembly Quality

Number of Sequences and Contiguity

  • Apul (A. pulchra)
    • Number of Sequences: 174
    • N50: 17,861,421 bp
    • L50: 10
    • Longest Sequence: 45,111,900 bp
  • Peve (P. evermanni)
    • Number of Sequences: 8,186
    • N50: 171,385 bp
    • L50: 935
    • Longest Sequence: 1,802,771 bp
  • Ptuh (P. meandrina)
    • Number of Sequences: 212
    • N50: 10,024,633 bp
    • L50: 13
    • Longest Sequence: 21,651,136 bp

Assessment:

  • Assembly Contiguity: Apul exhibits the highest assembly contiguity, indicated by the lowest number of sequences and highest N50 value. Ptuh ranks second, while Peve has a more fragmented assembly with a significantly higher number of sequences and lower N50.
  • Sequence Length Distribution: Apul’s longest sequence is substantially longer than those of Peve and Ptuh, suggesting a more complete assembly.

Genome Size and GC Content

  • Total Genome Length:
    • Peve: 603,805,388 bp
    • Apul: 518,313,916 bp
    • Ptuh: 376,579,914 bp
  • GC Content:
    • Apul: 39.05%
    • Ptuh: 38.03%
    • Peve: 36.38%

Assessment:

  • Genome Size Variability: Peve has the largest genome size but the most fragmented assembly, which may indicate repetitive elements or assembly challenges.
  • GC Content Consistency: All species have similar GC content, ranging from 36.38% to 39.05%, which is typical for coral genomes.

2. Gene Annotation and Features

Gene Counts and Features

  • Apul:
    • Total Genes: 36,447
    • Unique Features: 5 (exon, CDS, gene, mRNA, tRNA)
    • Total GFF Entries: 499,892
  • Peve:
    • Total Genes: 40,389
    • Unique Features: 3 (CDS, mRNA, UTR)
    • Total GFF Entries: 286,807
  • Ptuh:
    • Total Genes: 31,840
    • Unique Features: 3 (exon, CDS, transcript)
    • Total GFF Entries: 448,910

Assessment:

  • Gene Count: Peve has the highest number of predicted genes, followed by Apul and Ptuh.
  • Annotation Features: Apul includes tRNA annotations, which are absent in Peve and Ptuh. This suggests a more comprehensive annotation in Apul.
  • Unique Sources: Different gene prediction tools were used—Apul (funannotate), Peve (Gmove), Ptuh (AUGUSTUS)—which may affect gene count and features.

Gene Length and Total Bases

  • Average Gene Length:
    • Apul: 5,887.31 bp
    • Ptuh: 1,376.95 bp
    • Peve: 1,337.5 bp
  • Total Bases in Genes:
    • Apul: 214,577,849 bp
    • Peve: 54,020,937 bp
    • Ptuh: 43,842,885 bp

Assessment:

  • Gene Length: Apul’s average gene length is significantly longer than that of Peve and Ptuh, potentially indicating differences in gene structure, annotation methods, or biological variation.
  • Total Coding Sequence: Apul has a higher total base count in genes, which correlates with longer genes and may reflect a greater number of exons or longer intronic regions.

3. Biological Implications and Considerations

  • Assembly Quality Impact: The higher contiguity in Apul and Ptuh assemblies may lead to more accurate gene predictions and annotations compared to Peve.
  • Gene Prediction Tools: The use of different annotation tools can result in varying gene counts and features. Funannotate (Apul) provides tRNA annotations, suggesting a more thorough annotation pipeline.
  • Gene Length Variation: The discrepancy in average gene lengths might be due to:
    • Biological Differences: Genuine differences in gene structure among species.
    • Annotation Strategies: Differences in handling of introns, UTRs, or alternative splicing events.
    • Prediction Algorithms: Varying sensitivity and specificity of gene prediction tools.

4. Conclusion

  • Best Assembly: Acropora pulchra (Apul) exhibits the most contiguous and comprehensive genome assembly and annotation among the three species.
  • Gene Annotation Depth: Apul’s inclusion of tRNA and longer gene lengths suggests a more detailed annotation, which could facilitate better functional genomics studies.
  • Areas for Improvement: Peve’s fragmented assembly and shorter gene lengths indicate potential areas for reassembly and reannotation to improve genomic resources.

Recommendation: For comparative genomics and evolutionary studies, researchers should consider the differences in assembly quality and annotation methods. Standardizing gene prediction tools and reannotating genomes where necessary could enhance the reliability of cross-species comparisons.


data: