Daniel Egbo

Machine Learning & AI Engineer

Building LLM applications, RAG systems, and ML pipelines. PhD researcher applying the same scientific rigor to large-scale radio-astronomy data at UCT/SAAO.

🎯 Target: ML Engineer / AI Engineer roles in applied ML or LLM infrastructure

Cape Town, South Africa (Open to Remote / Relocation)

Google Scholar | LinkedIn | GitHub | Twitter

6+ Publications | 15+ Projects | 4 Awards & Grants | 5+ Certifications

About Me

I am an Astronomy PhD Candidate at the University of Cape Town and the South African Astronomical Observatory, researching active radio-emitting stars using multi-wavelength data from MeerKAT, Gaia, and eROSITA. My work involves cross-matching millions of astronomical sources, optical spectroscopy, and statistical analysis to understand stellar magnetic activity.

Alongside my research, I build machine learning systems, LLM applications, and data engineering pipelines. I have developed RAG-based chatbots, computer vision classifiers, and fraud detection models, and I have fine-tuned large language models. I combine scientific rigor with practical engineering to solve real-world problems with AI.

I am actively seeking industry opportunities where I can apply my expertise in machine learning, data science, AI engineering, and scientific computing.

Featured Projects

⭐ Featured Project (GenAI / Agents)

Multi-agent RAG Applications

Developed a multi-agent retrieval-augmented generation (RAG) system in which specialized LangGraph agents orchestrate document-based question answering.

  • Impact: Reduced hallucination rates by 30% and improved response accuracy by 25% on complex scientific papers.
  • Features: Automated routing, query reformulation, self-correction, and citation verification mechanisms.

Tech Stack: LangGraph, LangChain, LiteLLM, Qdrant, Milvus, Python

Medical Speech Recognition using Whisper

Fine-tuned OpenAI's Whisper model on clinical dictation datasets to transcribe specialized medical nomenclature, radiology reports, and accented audio.

  • Impact: Reduced word error rate (WER) on specialized terminology by 15% compared with out-of-the-box baselines.
  • Scale: Processed and optimized workflows for 200+ hours of medical recordings.

Tech Stack: PyTorch, Hugging Face, Whisper ASR, Python

Stellar Counterpart Identification in the MeerKAT Galactic Plane Survey

Cross-matched 443,000+ radio sources from the SARAO MeerKAT survey against Gaia DR3's 1.8 billion-object catalog, using Monte Carlo simulation to statistically quantify match reliability and rule out chance alignments. Identified 629 candidate stellar counterparts.

  • Impact: The largest radio-optical cross-match sample of its kind.
  • Scale: 443,000+ radio sources matched against Gaia DR3's 1.8 billion-object catalog.

Tech Stack: Astropy, TAP, TOPCAT, Monte Carlo simulation

View All Projects | Data Science Portfolio

Technical Skills

ML & AI Engineering

PyTorch · Scikit-learn · XGBoost/LightGBM · Hugging Face · LangChain/LangGraph · RAG systems (Qdrant, Pinecone, Milvus) · Fine-tuning (Unsloth) · Agentic systems · OpenAI/LiteLLM APIs

Data Engineering & Cloud

Python · SQL · Pandas/NumPy · Airflow/Prefect/Kestra · dbt · BigQuery/DuckDB/Postgres · AWS/GCP · S3/GCS/MinIO

Scientific Computing (PhD research)

Astropy · TOPCAT · Large-scale catalog cross-matching · Statistical/Monte Carlo methods · NVIDIA RAPIDS (cuDF/cuML)

Selected Publications

View all publications & presentations

Honors and Awards

Professional Training & Certifications

NVIDIA Deep Learning Institute

  • Building Conversational AI Applications (2025)
  • Accelerating End-to-End Data Science Workflows (2024)
  • Getting Started with Deep Learning (2024)
  • Generative AI with Diffusion Models (2024)
  • Building RAG Agents with LLMs (2024)
  • Disaster Risk Monitoring Using Satellite Imagery (2024)

Data Science & ML

  • MLOps Zoomcamp - DataTalks.Club (2025)
  • Machine Learning Zoomcamp - DataTalks.Club (2023)
  • Applied Data Science II: ML & Statistical Analysis - WorldQuant University (2023)
  • Applied Data Science I: Scientific Computing & Python - WorldQuant University (2023)

Summer Programs

  • Oxford Machine Learning Summer School (2023)
  • COSPAR X-Vision School: X-ray Astronomy (2023)
  • ESCAPE Summer School: Data Science for Astronomy (2021)
  • ZTF Summer School: Time-domain Astronomy (2021)
  • GROWTH Astronomy School: Time-domain Astronomy (2020)