Experience
- Applying state-of-the-art machine learning techniques to improve genetic variant calling from high-error genome sequencing datasets
- Developing Nextflow pipelines on Google Cloud to measure and report variant calling accuracy across a range of genomic contexts, and leading cross-team experiments to evaluate the impact of sequencing technology changes on bioinformatics findings
- Developed a graph-based algorithm to cluster and phase assembled genomic sequences into haplotypes
- Designed and implemented state-of-the-art bioinformatics pipelines using Nextflow, Python, and R for large-scale analysis on AWS Batch
- Analyzed whole-genome and whole-exome sequencing from diverse human populations, including variant calling, imputation, genotyping, quality control, and visualization
- Integrated pipeline results with MySQL databases and helped develop database schemas
Research areas:
- Designed algorithms and data structures for population-scale genomic structural variant data, and released software around them
- Contributed to the first complete human genome assembly and assessed its impact on variant calling
- Designed data structures to accelerate genomic read alignment
- Implemented the Louvain clustering algorithm and integrated it into the Cell Ranger single-cell RNA pipeline
- Implemented a Rust API for the STAR RNA read aligner and used it to improve Cell Ranger’s efficiency
- Implemented an algorithm for landmark recognition in Bing Images
- Automated the quantification and categorization of bandwidth usage in the Google Search Android application
- Designed and implemented an object tracking computer vision algorithm to analyze cell state trajectories in longitudinal microscopy data
- Implemented an iOS lab device remote control proxy client/server for running automated application tests
Publications
Nature Methods 2023
Bioinformatics 2021
Cell Systems 2021
Science 2022
Genome Biology 2022
Cell Genomics 2022
Cell 2020
Genome Biology 2019
Genome Research 2020
JCI Insight 2021
Nature Communications 2020
Skills
- Java
- Python
- R
- Bash
- MySQL
- C++
- C
- Rust
- Variant discovery (Samtools, freebayes, GATK, Sniffles, Jasmine, etc.)
- Read alignment (BWA, Bowtie, NGMLR, minimap2, winnowmap, etc.)
- Quality control (FastQC, mosdepth, Picard, VerifyBamID, MultiQC, etc.)
- Genome assembly (Canu, Flye, Hifiasm, GenomeScope, etc.)
- Functional genomics (STAR, Cell Ranger, bedtools, etc.)
- Random forests
- Gradient boosting
- CNNs
- TensorFlow, scikit-learn
- AWS (Batch, EC2, SageMaker, ECR)
- Docker
- Slurm
- Mathematics and statistics
- Algorithms and data structures
- Scientific writing and presentation (Google Docs, Inkscape, etc.)
- Agile project management (Jira, Confluence, Google Sheets, Asana, etc.)
- Git/GitHub
- Linux command line
- Pipelining (Nextflow, Snakemake)
- Google Project Management Certificate (Coursera, April 2023)