Lichen Biodiversity Informatics & ML Classification

Nov 2024 – Present · Arizona State University Research Affiliate

Collaborating with Dr. Frank Bungartz, Collections Manager of Lichens at Arizona State University and domain authority on lichen taxonomy, on a multi-component research program spanning database engineering, phylogenetics, and deep learning image classification.

ML Image Classification Pipeline

Developing a BioCLIP ViT-L/14 + ArcFace metric learning classifier for species-level lichen identification from images. Two-stage training design: large-scale noisy pretraining on iNaturalist aggregated imagery, followed by fine-tuning on expert-labeled Lichen Consortium data (~15,000 images across ~10,000 species).

  • UMAP + HDBSCAN clustering pipeline built over LIAS DELTA morphological data (10,709 species, 880 chemical compounds)
  • Interactive three-panel browser-based cluster viewer for exploratory analysis
  • Clustering analysis revealed clean photobiont-type separation — validating signal in morphological feature space
  • Embedding ensemble strategy: BioCLIP / DINOv2 / CLIP for robust representation

Phylogenetic Analysis Pipeline

End-to-end pipeline from raw GenBank HTML-wrapped flat files through aligned multi-locus phylogenetic inference:

  • Parsed ~51K GenBank DNA records (8 loci: ITS, 18S, 28S, RPB2, and others) producing 49,610 valid records
  • MAFFT alignment, trimAl trimming, IQ-TREE2 maximum likelihood inference
  • Primary focus on ITS, 18S, and 28S loci for Lecanoromycetes phylogeny
  • Mapping LIAS morphological and chemical trait data onto inferred phylogenetic trees

Database Engineering & Taxonomy Cleanup

PostgreSQL database (fungix schema) integrating LIAS DELTA, Mycobank, and Index Fungorum data sources:

  • Removed 35,000 non-lichen taxa from the Consortium database
  • Added 60,000 new validated taxa from Mycobank and Index Fungorum
  • Updated all 200,000 taxa records with external source references and protolog data
  • Contributor to Symbiota open-source platform — sourceIdentifiers across all taxonomic tree explorers

Parallel Investigation: PCG Classification

Active investigation of heart sound (phonocardiogram) classification using mel-spectrogram + Vision Transformer architectures and pretrained audio foundation models (HuBERT, Wav2Vec 2.0) — directly extending ECG signal processing expertise to acoustic cardiac signals.

Affiliations

  • ASU Research Affiliate — Arizona State University
  • Member — American Bryological and Lichenological Society (ABLS)
  • Contributor — Consortium of North American Lichen Herbaria (CNALH) / Symbiota
  • Registered — ODSC AI East 2026, Boston MA