Daniel Egbo

Machine Learning & AI Engineer

Building production-grade LLM applications, multi-agent RAG infrastructure, and high-throughput ML pipelines. PhD researcher applying advanced scientific computing to large astronomy datasets at UCT/SAAO.

⚡ Specializing in applied ML, vector databases, and scalable AI infrastructure.

Cape Town, South Africa (Open to Remote / Relocation)

Google Scholar | LinkedIn | GitHub | Twitter


6+ Publications · 15+ Projects · 4 Awards & Grants · 5+ Certifications

About Me

I am a Machine Learning Engineer and Astronomy PhD Candidate at the University of Cape Town and the South African Astronomical Observatory. My research focuses on active radio-emitting stars, a domain that requires building pipelines to process and cross-match massive multi-wavelength datasets from MeerKAT, Gaia, and eROSITA.

Alongside my academic research, I engineer production-grade machine learning systems, LLM applications, and robust data pipelines. My work includes architecting multi-agent RAG systems, fine-tuning domain-specific models, and building automated data workflows. I treat engineering with the same analytical rigor required by big-data astronomy, ensuring that systems are scalable, mathematically sound, and optimized for performance.

I actively collaborate on both scientific computing initiatives and applied AI projects, bringing a unique blend of deep analytical research and practical software engineering to teams building next-generation intelligent systems.

Featured Projects

⭐ Featured Project (GenAI / Agents)

Multi-Agent RAG Applications (GenAI Agents)

Designed and orchestrated a multi-agent retrieval-augmented generation (RAG) infrastructure using LangGraph to handle complex, document-based scientific QA. The architecture manages stateful routing, automated query reformulation, self-correction, and citation verification across isolated, specialized LLM agents. This modular agentic framework reduced hallucination rates by 30% and improved response accuracy by 25% on dense scientific literature.

Tech Stack: LangGraph, LangChain, LiteLLM, Qdrant, Milvus, Airflow, Python
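The control flow described above (retrieve, reformulate on a miss, answer, verify citations) can be illustrated with a minimal plain-Python sketch. This is not the project's code and does not use LangGraph: the toy corpus, the keyword retriever, the spelling-fix reformulator, and every function name are hypothetical stand-ins for LLM agents and a vector store.

```python
from dataclasses import dataclass, field

# Toy corpus standing in for a vector store; ids and text are made up.
CORPUS = {
    "doc1": "MeerKAT is a radio telescope in South Africa.",
    "doc2": "Gaia DR3 is an optical astrometric catalog.",
}


@dataclass
class State:
    """Shared state passed between the toy 'agents'."""
    query: str
    retrieved: list = field(default_factory=list)
    answer: str = ""
    citations: list = field(default_factory=list)
    attempts: int = 0


def retrieve(state):
    # Keyword substring match instead of embedding search (assumption).
    terms = [t for t in state.query.lower().split() if len(t) > 3]
    state.retrieved = [
        doc_id for doc_id, text in CORPUS.items()
        if any(t in text.lower() for t in terms)
    ]
    return state


def reformulate(state):
    # Stand-in for an LLM rewrite: apply a fixed spelling correction.
    fixes = {"meercat": "meerkat"}
    state.query = " ".join(fixes.get(w, w) for w in state.query.lower().split())
    state.attempts += 1
    return state


def answer(state):
    # Stand-in for generation: quote the retrieved text and cite it.
    state.citations = list(state.retrieved)
    state.answer = " ".join(CORPUS[d] for d in state.retrieved)
    return state


def citations_ok(state):
    # Every citation must point at a document that was actually retrieved.
    return bool(state.citations) and all(c in state.retrieved for c in state.citations)


def run(query, max_attempts=2):
    """Route between the steps until the answer verifies or attempts run out."""
    state = State(query=query)
    while True:
        state = retrieve(state)
        if not state.retrieved and state.attempts < max_attempts:
            state = reformulate(state)
            continue
        state = answer(state)
        if citations_ok(state) or state.attempts >= max_attempts:
            return state
        state = reformulate(state)


result = run("tell me about meercat")
print(result.query, result.citations, result.attempts)
```

In a graph framework each function would typically become a node and the `if` branches would become conditional edges; the attempt cap keeps the self-correction loop from running indefinitely.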

Medical Speech Recognition Pipeline

Built an end-to-end automated fine-tuning and inference pipeline for OpenAI's Whisper ASR model on clinical dictation datasets. The project focused on optimizing the model to process specialized medical nomenclature, complex radiology reports, and highly accented audio across more than 200 hours of recordings. The custom training pipeline reduced the Word Error Rate (WER) by 15% over out-of-the-box baselines, providing a highly reliable transcription layer for domain-specific medical terminology.

Tech Stack: PyTorch, Hugging Face, Whisper ASR, Python
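For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal standard-library illustration, not the project's evaluation code; the function name and the lowercase, whitespace-split normalization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))
```

Production evaluations usually add text normalization (punctuation, numerals, casing rules) before scoring, since those choices can move WER noticeably.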

Big Data Spatial Cross-Matching: SARAO MeerKAT & Gaia DR3

Engineered a massive spatial data processing pipeline to cross-match more than 443,000 radio sources from the MeerKAT Galactic Plane Survey against Gaia DR3’s 1.8 billion-object catalog. To isolate genuine astrophysical matches from background noise, I developed a statistical validation pipeline using Monte Carlo simulations to quantify alignment reliability and rule out chance alignments in highly crowded fields. The project identified 629 high-confidence candidate stellar counterparts, the largest radio-optical cross-match sample of its kind to date.

Tech Stack: Astropy, TAP, TOPCAT, Monte Carlo simulation, Python
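The idea behind a Monte Carlo chance-alignment test can be shown with a small sketch. It is not the pipeline's code: it assumes a flat patch of sky with (x, y) positions, brute-force matching, and a random-offset scheme (shifting each source by a fixed distance in a random direction), none of which are specified above. All names and numbers are illustrative.

```python
import math
import random


def count_matches(sources, catalog, radius):
    """Count sources with at least one catalog object within `radius`.

    Positions are (x, y) tuples in the same units on a flat patch of sky.
    """
    r2 = radius * radius
    return sum(
        any((sx - cx) ** 2 + (sy - cy) ** 2 <= r2 for cx, cy in catalog)
        for sx, sy in sources
    )


def chance_alignment_test(sources, catalog, radius, shift, n_trials=100, seed=0):
    """Compare the real match count with counts after random offsets.

    Each trial moves every source by `shift` in a random direction, which
    breaks physical associations while keeping the local catalog density.
    Returns (real_matches, mean_random_matches).
    """
    rng = random.Random(seed)
    real = count_matches(sources, catalog, radius)
    total_random = 0
    for _ in range(n_trials):
        shifted = []
        for sx, sy in sources:
            angle = rng.uniform(0.0, 2.0 * math.pi)
            shifted.append((sx + shift * math.cos(angle),
                            sy + shift * math.sin(angle)))
        total_random += count_matches(shifted, catalog, radius)
    return real, total_random / n_trials


# Synthetic demo: 20 sources sit on catalog objects, 20 are unrelated.
demo_rng = random.Random(42)
demo_catalog = [(demo_rng.uniform(0, 1000), demo_rng.uniform(0, 1000))
                for _ in range(500)]
demo_sources = demo_catalog[:20] + [
    (demo_rng.uniform(0, 1000), demo_rng.uniform(0, 1000)) for _ in range(20)
]
real, mean_random = chance_alignment_test(
    demo_sources, demo_catalog, radius=2.0, shift=30.0, n_trials=50
)
print(f"real matches: {real}, mean chance matches: {mean_random:.2f}")
print(f"estimated reliability: {1.0 - mean_random / real:.3f}")
```

On real catalogs the matching would use spherical geometry and an indexed search, for example Astropy's `SkyCoord.match_to_catalog_sky`, rather than the brute-force loop used here.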


Technical Skills

ML & AI Engineering

Frameworks & Core ML: PyTorch · Scikit-learn · XGBoost · LightGBM · Hugging Face
Generative AI & Multi-Agent Systems: LangGraph · LangChain · LLM Orchestration · Model Evaluation · LiteLLM · Agentic Workflows
Vector Databases & Retrieval: Qdrant · Milvus · Pinecone · OpenSearch · Semantic Search · Hybrid Search
Fine-Tuning & Inference: Unsloth (PEFT/LoRA) · Inference Optimization · OpenAI APIs

Data Engineering & Cloud Infrastructure

Data Orchestration & Pipelines: Apache Airflow · Prefect · Kestra · dbt · ETL/ELT Pipelines
Databases & Big Data: SQL · BigQuery · DuckDB · PostgreSQL · Entity Resolution · Scalable Architecture
Cloud & Infrastructure: AWS · GCP · S3 · GCS · MinIO · Docker (Containerization) · Docker Compose · Kubernetes

Scientific Computing & Analytics

Statistical Modeling: Monte Carlo Simulations · Time-Series Analysis · Bayesian Inference · Hypothesis Testing · Predictive Analytics
High-Performance Compute: Parallel Processing · Slurm
Astrophysics Stack: Astropy · Specutils · Astroquery · LSDB · TOPCAT · TAP · ADQL · DS9 · CARTA

Selected Publications

Egbo, O. D., Groot, P. J., Buckley, D. A. H., Robrade, J., Schwope, A. D., Freund, S., Schneider, P. C. & Stelzer, B. (2026). X-ray counterparts to stellar MeerKAT Galactic-plane compact radio sources. Astronomy & Astrophysics, 707, A193.
Egbo, O. D., Buckley, D. A. H., Groot, P. J., Cavallaro, F., Woudt, P. A., Thompson, M. A., Mutale, M. & Bietenholz, M. (2025). The stellar population in the SARAO MeerKAT Galactic Plane Survey. Monthly Notices of the Royal Astronomical Society, 540(3), 2685-2702.
Potter, S. B., Buckley, D. A., Scaringi, S., Monageng, I. M., Egbo, O. D., Charles, P. A., ... & Hlakola, M. (2024). Optical spectroscopic and photometric classification of the X-ray transient EP240309a as an intermediate polar. Monthly Notices of the Royal Astronomical Society: Letters, 532(1), L21-L26.
Garnavich, P., Potter, S. B., Buckley, D. A., van Dyk, A., Egbo, O. D., Littlefield, C., & Greiveldinger, A. (2023). Rapid Evolution of the White Dwarf Pulsar AR Scorpii. The Astrophysical Journal Letters, 958(2), L22.


Honors and Awards

SAAO PhD Prize Scholarship Award for PhD study in Astronomy at the University of Cape Town, 2021-2024
Breakthrough Listen Travel Grant to attend the 2023 ThunderKAT meeting at the University of Oxford, September 2023
COSPAR Grant to attend the X-VISION School, a Joint IAU I-HOW and COSPAR Capacity Building workshop at North-West University, South Africa, February 2023
AGNES Intra-Africa Mobility Grant to visit SAAO for research during MSc, 2018

Professional Training & Certifications

NVIDIA Deep Learning Institute

Building Conversational AI Applications (2025)
Accelerating End-to-End Data Science Workflows (2024)
Getting Started with Deep Learning (2024)
Generative AI with Diffusion Models (2024)
Building RAG Agents with LLMs (2024)
Disaster Risk Monitoring Using Satellite Imagery (2024)

Data Science & ML

MLOps Zoomcamp - DataTalks.Club (2025)
Machine Learning Zoomcamp - DataTalks.Club (2023)
Applied Data Science II: ML & Statistical Analysis - WorldQuant University (2023)
Applied Data Science I: Scientific Computing & Python - WorldQuant University (2023)

Summer Programs

Oxford Machine Learning Summer School (2023)
COSPAR X-Vision School: X-ray Astronomy (2023)
ESCAPE Summer School: Data Science for Astronomy (2021)
ZTF Summer School: Time-domain Astronomy (2021)
GROWTH Astronomy School: Time-domain Astronomy (2020)