23andMe Engineering

Patrick Yee and Allie Sanzi, Software Engineers at 23andMe


This article illustrates how to architect an AWS Lambda function, written in Python, to stream input data from an S3 object, pipe the data stream through an external program, and then pipe the output stream to an object in S3.


AWS Lambda is a handy serverless computing service for handling unit tasks. However, it has limitations that make it impossible to fit large input and/or output files into its memory or temporary file storage:

  • Maximum temporary file storage is 512MB
  • And it is unencrypted, which does not meet our data security…
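The teaser stops before the implementation, so what follows is only a minimal sketch of the pattern it describes, not the authors' code. It assumes boto3, which is preinstalled in the AWS Lambda Python runtimes. The helper `pipe_through_program`, the `handler` event keys and the `gzip -c` command are hypothetical stand-ins. A background thread feeds S3 read chunks into the external program's stdin while `upload_fileobj` reads the program's stdout, so only bounded buffers (read chunks, pipe buffers, upload parts) sit in memory at once and nothing is written to /tmp.

```python
import subprocess
import threading
from typing import IO, Callable, Iterable


def pipe_through_program(
    chunks: Iterable[bytes],
    cmd: list[str],
    sink: Callable[[IO[bytes]], object],
) -> None:
    """Stream `chunks` into `cmd`'s stdin and hand the child's stdout to `sink`.

    Data flows through OS pipes, so neither the input nor the output has to
    fit in memory or in /tmp.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    feed_errors: list[BaseException] = []

    def feed() -> None:
        # Writer thread: fills the child's stdin while the main thread drains
        # its stdout, so the two pipe buffers cannot deadlock each other.
        try:
            for chunk in chunks:
                proc.stdin.write(chunk)
        except BrokenPipeError:
            pass  # child stopped reading; its exit status is checked below
        except BaseException as exc:
            feed_errors.append(exc)  # e.g. a failed S3 read; re-raised below
        finally:
            try:
                proc.stdin.close()  # EOF tells the child the input is done
            except BrokenPipeError:
                pass

    writer = threading.Thread(target=feed)
    writer.start()
    try:
        sink(proc.stdout)  # e.g. an S3 upload that reads from the pipe
    except BaseException:
        proc.kill()  # unblock the writer thread before propagating
        raise
    finally:
        writer.join()
        proc.stdout.close()
    returncode = proc.wait()
    if feed_errors:
        raise feed_errors[0]
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, cmd)


def handler(event, context):
    """Hypothetical Lambda entry point; the event keys are placeholders."""
    import boto3  # preinstalled in the AWS Lambda Python runtimes

    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=event["in_bucket"], Key=event["in_key"])["Body"]
    pipe_through_program(
        body.iter_chunks(chunk_size=1024 * 1024),
        ["gzip", "-c"],  # stand-in for the real external program
        lambda out: s3.upload_fileobj(out, event["out_bucket"], event["out_key"]),
    )
```

Because `pipe_through_program` accepts any iterable of bytes and any consumer of a file-like object, it can be exercised locally without AWS. One caveat of this sketch: if reading the input fails midway, the error is raised only after the child has exited, by which point `upload_fileobj` may already have stored a truncated object, so a real handler would write to a temporary key or delete the output on failure.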

Prad Pagadala, Software Engineer at 23andMe

AWS Elastic Map Reduce (EMR), a popular big data platform, powers a lot of computation at 23andMe. One particular use case is finding associations between genes and traits using Hail, which is built on top of Apache Spark. While figuring out how to optimize processing for this application, we discovered ways to reduce costs significantly. EMR and Spark are a popular combination, but configuring them optimally can be overwhelming. For those of you looking to accomplish this, we hope this saves you the days and dollars we had to spend!

Let’s assume that you’ve already tackled the…

Tulasi Paradarami, Sr. Engineering Manager at 23andMe


In bioinformatics, Variant Call Format (VCF) is a popular text file format for storing genetic variation data, and is the standard output for popular imputation methodologies like minimac. It’s unambiguous, it’s flexible, and it supports arbitrary metadata. A drawback of the VCF format, however, is that as a text format it requires a lot of storage space, even with compression. In this post, we’ll discuss the benefits of using columnar storage formats such as Apache Parquet for storing genetic data and share the results from experiments comparing performance between these two file formats.

For reference, here are definitions of…
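The excerpt ends before the comparison itself. As an illustrative sketch only (not the authors' pipeline), the Python below shows the row-to-column transposition that a columnar format performs, using the eight fixed tab-delimited fields that begin every VCF data line. The function name `vcf_to_columns` is hypothetical, and the FORMAT and per-sample genotype columns are deliberately skipped to keep it short.

```python
from typing import Iterable

# The eight fixed, tab-delimited columns every VCF data line starts with.
# FORMAT and sample columns, if present, are ignored in this sketch.
FIXED_COLUMNS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def vcf_to_columns(lines: Iterable[str]) -> dict[str, list]:
    """Transpose VCF data lines (row-oriented text) into one list per column.

    This mirrors the layout change a columnar format such as Parquet makes on
    disk: all values of one field are stored together, so a query that needs
    only POS can skip the other columns, and similar values compress well.
    """
    columns: dict[str, list] = {name: [] for name in FIXED_COLUMNS}
    for line in lines:
        # Skip blank lines, "##" meta-information lines and the "#CHROM" header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < len(FIXED_COLUMNS):
            raise ValueError(f"expected at least 8 fields, got {len(fields)}")
        for name, value in zip(FIXED_COLUMNS, fields):
            # POS is an integer per the VCF spec; other fields stay as text here.
            columns[name].append(int(value) if name == "POS" else value)
    return columns
```

If pyarrow is available, a dict of equal-length lists like this one can be passed to `pyarrow.table()` and written out with `pyarrow.parquet.write_table()`; any storage or speed difference would have to be measured, as the post itself does.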

Joe Banks, Tech Lead at 23andMe

I have a disability. Two, actually.

I was diagnosed with Usher Syndrome Type 2A, a rare genetic disease characterized by hearing loss and progressive vision loss.

When I was 3, my parents discovered that I was born with moderate-to-severe hearing loss in both of my ears. I was fitted with a pair of hearing aids, devices that would finally allow me to hear and understand my family.

The vision loss has had a more profound impact on my life. While I knew that I was going blind, it didn’t feel real to me until I was in my 20s…
