Experience

Principal Bioinformatics Scientist

June 2023 - Present
Roche Diagnostics, Remote
  • Applying state-of-the-art machine learning techniques to improve genetic variant calling from high-error genome sequencing datasets
  • Developing Nextflow pipelines on Google Cloud to measure and report variant calling accuracy across a range of genomic contexts
  • Leading cross-team experiments to evaluate the impact of sequencing technology changes on bioinformatics findings
  • Developed a graph-based algorithm to cluster and phase assembled genomic sequences into haplotypes

Bioinformatics Scientist

June 2022 - June 2023
Variant Bio, Seattle, WA
  • Designed and implemented state-of-the-art bioinformatics pipelines using Nextflow, Python, and R for large-scale analysis on AWS Batch
  • Analyzed whole-genome and whole-exome sequencing from diverse human populations, including variant calling, imputation, genotyping, quality control, and visualization
  • Integrated pipeline results with MySQL databases and helped develop database schemas

Research Assistant (Ph.D. Student)

2017 - 2022
Schatz Lab, Johns Hopkins University, Baltimore, MD

    Research areas:

Computational Genomics Intern

Summer 2019
10x Genomics, Pleasanton, CA
  • Implemented the Louvain clustering algorithm and integrated it into the CellRanger single-cell RNA pipeline
  • Implemented a Rust API for the STAR RNA read aligner and used it to improve CellRanger’s efficiency

Software Engineering Intern

Summer 2016
Microsoft, Bellevue, WA
  • Implemented an algorithm for landmark recognition in Bing Images

Software Engineering Intern

Summer 2015
Google, London, UK
  • Automated the quantification and categorization of bandwidth usage in the Google Search Android application

Software Engineering Intern

Summer 2014
Google, Mountain View, CA
  • Designed and implemented an object tracking computer vision algorithm to analyze cell state trajectories in longitudinal microscopy data

Software Engineering Intern

Summer 2013
Google, Cambridge, MA
  • Implemented an iOS lab device remote control proxy client/server for running automated application tests

Publications

  • Jasmine and Iris: population-scale structural variant comparison and analysis
    Kirsche, M., Prabhu, G., Sherman, RM., Ni, B., Aganezov, S., Schatz, MC.
    Nature Methods 2023
  • A complete reference genome improves analysis of human genetic variation
    Aganezov, S.*, Yan, S.*, Soto, D.*, Kirsche, M.*, Zarate, S.* et al. (*co-first authors)
    Science 2022
  • Sapling: Accelerating Suffix Array Queries with Learned Data Models
    Kirsche, M., Das, A., Schatz, MC.
    Bioinformatics 2021
  • Democratizing long-read genome assembly
    Kirsche, M., Schatz, MC.
    Cell Systems 2021
  • The complete sequence of a human genome
    Nurk, S. et al.
    Science 2022
  • Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing
    Alonge, M., Lebeigle, L., Kirsche, M., Aganezov, S., Wang, X., Lippman, Z., Schatz, MC., Soyk, S.
    Genome Biology 2022
  • Paragraph: a graph-based structural variant genotyper for short-read sequence data
    Chen, S., Krusche, P., Dolzhenko, E., Sherman, RM., Petrovski, R., Schlesinger, F., Kirsche, M., Bentley, D., Schatz, MC., Sedlazeck, F., Eberle, M.
    Genome Biology 2019
  • Comprehensive analysis of structural variants in breast cancer genomes using single molecule sequencing
    Aganezov, S., Goodwin, S., Sherman, RM., Sedlazeck, F., Arun, G., Bhatia, S., Lee, I., Kirsche, M., Wappel, R., Kramer, M., Kostroff, K., Spector, D., Timp, W., McCombie, WR., Schatz, MC.
    Genome Research 2020

    Skills

    Programming Languages

    • Java
    • Python
    • R
    • Bash
    • MySQL
    • C++
    • C
    • Rust

    Bioinformatics

    • Variant discovery (Samtools, freebayes, GATK, Sniffles, Jasmine, etc.)
    • Read alignment (BWA, Bowtie, NGMLR, minimap2, winnowmap, etc.)
    • Quality control (FastQC, mosdepth, Picard, VerifyBamID, MultiQC, etc.)
    • Genome assembly (Canu, Flye, Hifiasm, GenomeScope, etc.)
    • Functional genomics (STAR, Cell Ranger, bedtools, etc.)

    Machine Learning

    • Random forests
    • Gradient boosting
    • CNNs
    • TensorFlow
    • scikit-learn

    Other

    • AWS (Batch, EC2, SageMaker, ECR)
    • Docker
    • Slurm
    • Mathematics and statistics
    • Algorithms and data structures
    • Scientific writing and presentation (Google Docs, Inkscape, etc.)
    • Agile project management (Jira, Confluence, Google Sheets, Asana, etc.)
    • Git/GitHub
    • Linux command line
    • Pipelining (Nextflow, Snakemake)

    Certificates