Quick summary:

23andMe raw data contains insertions and deletions with proprietary identifiers, most of which have never been analyzed.

Our software can now interpret over 1,000 of these “indels”, and most of them affect a human disease or trait!


 

Background:

There are only a few thousand insertions and deletions (“indels”) in the 23andMe raw data.  That’s not many compared to the hundreds of thousands of SNPs.  But indels can be some of the most impactful types of genome alterations.  Many diseases and traits are caused by an insertion or deletion in a critical gene.

 

Analysis of the indels in 23andMe’s raw data is difficult, because many of the indels use 23andMe’s proprietary identifier (e.g. i5037354).  In addition, 23andMe does not provide enough information to determine the exact insertion or deletion that each probe was designed to test.  We asked 23andMe if they would share this information, but they declined to do so.

 

In the latest 23andMe genotyping chip (v4) there are:

4,093 total indels

3,413 of these indels (83.4%) use a 23andMe proprietary identifier
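
The share of proprietary identifiers can be recomputed from any list of indel identifiers. The sketch below assumes, as the examples in this post suggest, that a proprietary identifier is an “i” followed by digits and that dbSNP identifiers start with “rs”. The function name is made up for illustration.

```python
import re

# Assumption: proprietary identifiers look like "i5037354" (an "i" plus digits);
# anything else (for example an "rs" identifier) is treated as non-proprietary.
PROPRIETARY_ID = re.compile(r"i\d+")


def proprietary_share(identifiers):
    """Return (proprietary_count, total_count, percent) for the identifiers."""
    ids = list(identifiers)
    n_proprietary = sum(1 for ident in ids if PROPRIETARY_ID.fullmatch(ident))
    percent = 100.0 * n_proprietary / len(ids) if ids else 0.0
    return n_proprietary, len(ids), percent


# The same arithmetic applied directly to the counts quoted above.
share_of_v4_indels = 100.0 * 3413 / 4093
```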

 

Even when a dbSNP (rs) identifier is used, the reported position of the indel can be shifted, which makes it difficult to compare against next-generation sequencing data.
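
One common reason the same indel ends up at different coordinates is that, inside a repeated sequence, deleting any one copy of the repeat gives the same result, so different sources may report different positions. Whether that explains the shifts in the 23andMe data is an assumption. Here is a minimal sketch of the usual remedy, sliding a deletion as far left as it will go, on a made-up sequence:

```python
def apply_deletion(ref, start, length):
    """Return ref with `length` bases removed, beginning at 0-based `start`."""
    return ref[:start] + ref[start + length:]


def left_align_deletion(ref, start, length):
    """Shift a deletion left while the resulting sequence stays identical.

    Moving the deleted window one base to the left leaves the result unchanged
    exactly when the base entering the window on the left equals the base
    leaving it on the right.
    """
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


ref = "GCACACAT"      # made-up reference sequence containing a CA repeat
reported_start = 5    # hypothetical position reported by one source
aligned_start = left_align_deletion(ref, reported_start, 2)
```

Both positions describe the same deletion: apply_deletion(ref, reported_start, 2) and apply_deletion(ref, aligned_start, 2) return the same sequence.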

We knew there were likely to be many important indels among those in the 23andMe data, so we set out to reverse engineer as many as we could, and identify those that affect human disease and traits.

 

The Indel Analysis:

We started with over 1,500 23andMe raw data files from the Opensnp.org database.  We compiled a list of every indel and the frequency with which we found a DD, DI, or II genotype.  Then, we cross-referenced this list against known indels at nearby positions in our own database, especially those with a disease or trait phenotype.  We expected that many of the indels in the 23andMe raw data were designed to test known clinically relevant genome variants.
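
The tallying step can be sketched roughly as follows. This is not the Enlis implementation. It assumes the usual 23andMe raw data layout of tab-separated identifier, chromosome, position, and genotype, with comment lines starting with “#”. The function names and the second sample record are made up for illustration.

```python
import io
from collections import Counter, defaultdict

# Genotype calls that mark a probe as an indel (D = deletion, I = insertion).
# Single-letter calls are included on the assumption that they appear for
# hemizygous positions such as X in males.
INDEL_GENOTYPES = {"DD", "DI", "II", "D", "I"}


def iter_records(handle):
    """Yield (identifier, chromosome, position, genotype) from one raw data file."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        fields = line.split("\t")
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # skip malformed rows and any column-name row
        ident, chrom, pos, genotype = fields
        yield ident, chrom, int(pos), genotype


def tally_indel_genotypes(handles):
    """Count each indel genotype per (identifier, chromosome, position)."""
    counts = defaultdict(Counter)
    for handle in handles:
        for ident, chrom, pos, genotype in iter_records(handle):
            genotype = "".join(sorted(genotype))  # defensive: treat "ID" as "DI"
            if genotype in INDEL_GENOTYPES:
                counts[(ident, chrom, pos)][genotype] += 1
    return counts


# Two tiny in-memory "files"; the rs record is invented and is ignored
# by the tally because its genotype is not an indel call.
file_a = io.StringIO("# example header\ni5012559\t8\t87656009\tDI\nrs123\t1\t1000\tAG\n")
file_b = io.StringIO("i5012559\t8\t87656009\tII\n")
indel_counts = tally_indel_genotypes([file_a, file_b])
```

In practice the handles would be opened raw data files rather than in-memory strings. A probe can only be recognized as an indel this way if at least one file has a call for it.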

Finally, we went through a very labor-intensive process to analyze each indel, the surrounding sequence, the nearby clinical variants, and the expected allele frequencies.  In the end, we were able to confidently identify over 1,000 indels, most of which have a known effect on a disease or trait.

 

An Example:

Let’s take a look at one:

i5012559    8    87656009    DI

We have identified this as an autosomal recessive deletion that can lead to Achromatopsia – a condition where the individual cannot see any color – complete color blindness!  There are a few carriers of this deletion in the Opensnp database, but no homozygous individuals (2 copies and therefore affected).  The frequency of this deletion among the 1,500 23andMe users is consistent with the frequency of this deletion in next-generation sequencing data.
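
As a rough sanity check on why no homozygous individuals turn up, the sketch below estimates the deletion allele frequency from the carrier count and then the expected number of homozygotes, assuming Hardy-Weinberg proportions. The carrier count of 3 is a placeholder for “a few”, not a figure from the analysis, and the function names are made up for illustration.

```python
def allele_frequency(n_carriers, n_homozygous, n_people):
    """Frequency of the deletion allele: one copy per carrier, two per homozygote."""
    return (n_carriers + 2 * n_homozygous) / (2 * n_people)


def expected_homozygotes(freq, n_people):
    """Expected number of homozygous individuals under Hardy-Weinberg proportions."""
    return n_people * freq ** 2


n_people = 1500   # approximate number of raw data files analyzed
n_carriers = 3    # placeholder for "a few carriers"; not a measured value
q = allele_frequency(n_carriers, 0, n_people)
expected = expected_homozygotes(q, n_people)
```

With a carrier count this small, the expected number of homozygotes among 1,500 people is far below one, so finding none is unsurprising.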

 

23andMe doesn’t tell you anything about this deletion (even if you have access to the health information).  The old 23andMe health reports covered only 20 insertions and deletions in total.  Because the newly announced health reports contain less information overall, I expect that number to be even smaller in them.

As of this publication, this deletion is not reported by other interpretation services, like SNPedia/Promethease.  To examine further, I randomly selected 50 of the indels that we identified and looked for them in SNPedia.  SNPedia only had information on 2 out of the 50 indels tested.

 

Summary:

For the first time anywhere, we have been able to analyze over 1,000 of 23andMe’s proprietary indels.  To my knowledge, the Enlis software is the only solution for identifying and getting more information on the majority of these health-impacting variants.

I will have a more complete analysis of the totality of health information in the 23andMe raw data in another blog post, but here is one interesting thing to leave you with: the 23andMe raw data contains information on hundreds of indels that are related to hereditary cancer.  How many hereditary cancer variants does 23andMe report in their new system?  Zero.

 

Want to get your own 23andMe indels analyzed?  Click here to start our import process.

 

 

Note:  23andMe recently revamped their online service, but the genotyping chip has not changed.  The v4 chip, launched in December 2013, is still being used.

 

Appendix:

The indels that we analyze affect these diseases:

Achondrogenesis, type IB
Achromatopsia 3
Alpha Thalassemia
Alpha-2-macroglobulin polymorphism
Alzheimer disease, susceptibility to
Amyotrophic lateral sclerosis type 2
Andermann syndrome
Aspartylglycosaminuria
Ataxia with vitamin E deficiency
Ataxia, Friedreich-like, with isolated vitamin E deficiency
Ataxia-telangiectasia syndrome
Atypical Rett syndrome
BRCA1 and BRCA2 Hereditary Breast and Ovarian Cancer
Becker muscular dystrophy
Benign scapuloperoneal muscular dystrophy with cardiomyopathy
Beta Thalassemia
Beta-plus-thalassemia
Beta-thalassemia dominant
Bloom syndrome
Breast cancer, susceptibility to
Breast-ovarian cancer, familial 1
Breast-ovarian cancer, familial 2
Bronchiectasis with or without elevated sweat chloride 1, modifier of
Brugada syndrome 1
Cardiomyopathy
Carnitine palmitoyltransferase ii deficiency, late-onset
Ceroid lipofuscinosis neuronal 5
Ceroid lipofuscinosis, neuronal, 11
Choroideremia
Colorectal cancer, hereditary, nonpolyposis, type 1
Cone-rod dystrophy 3
Congenital myopathy with fiber type disproportion
Congestive heart failure and beta-blocker response, modifier of
Cystic fibrosis
Deafness, autosomal recessive 1A
Deafness, digenic, GJB2/GJB3
Deafness, digenic, GJB2/GJB6
Debrisoquine, poor metabolism of
Delta-zero-thalassemia, knossos type
Dermatitis, atopic, 2, susceptibility to
Diastrophic dysplasia
Dilated cardiomyopathy 1A
Dilated cardiomyopathy 3B
Duchenne muscular dystrophy
Dystonia 1
Dystonia 12
Early infantile epileptic encephalopathy 2
Encephalopathy, neonatal severe, due to MECP2 mutations
Enlarged vestibular aqueduct syndrome
Familial Mediterranean fever
Familial cancer of breast
Familial hypercholesterolemia
Familial hypertrophic cardiomyopathy 2
Familial hypertrophic cardiomyopathy 4
Familial hypertrophic cardiomyopathy 7
Fanconi anemia, complementation group C
Fanconi anemia, complementation group D1
Frontotemporal dementia, ubiquitin-positive
Fumarase deficiency
Gaucher’s disease, type 1
Glucose-6-phosphate transport defect
Glycogen storage disease IIIa
Glycogen storage disease IIIb
Glycogen storage disease type 1A
Glycogen storage disease type III
Hearing impairment
Heinz body hemolytic anemia
Hemoglobinopathy
Hereditary cancer-predisposing syndrome
Hereditary factor VIII deficiency disease
Hereditary fructosuria
Hereditary leiomyomatosis and renal cell cancer
Hereditary nonpolyposis colorectal cancer type 5
Hereditary pancreatitis
Hypertrophic cardiomyopathy
I cell disease
Ichthyosis vulgaris
Immunodeficiency due to ficolin 3 deficiency
Infantile hypophosphatasia
Infantile-onset ascending hereditary spastic paralysis
Infertility associated with multi-tailed spermatozoa and excessive DNA
Inflammatory bowel disease 1, susceptibility to
Leber congenital amaurosis 4
Left ventricular noncompaction 6
Li-Fraumeni syndrome 1
Limb-girdle muscular dystrophy, type 2A
Limb-girdle muscular dystrophy, type 2G
Long QT syndrome 3
Lynch syndrome
Lynch syndrome I
Lynch syndrome II
Macular dystrophy, vitelliform, adult-onset
Malignant tumor of prostate
Marfan’s syndrome
Maturity-onset diabetes of the young, type 2
Meckel-Gruber syndrome
Mental retardation, X-linked, syndromic 13
Microcephaly, normal intelligence and immunodeficiency
Multiple epiphyseal dysplasia 4
Myopathy, distal, 1
Neurofibromatosis, familial spinal
Neurofibromatosis, type 1
Neurofibromatosis, type 2
Neurofibromatosis-Noonan syndrome
Niemann-Pick disease, type A
Osteogenesis imperfecta
Osteogenesis imperfecta type I
Osteogenesis imperfecta type III
Pachydermoperiostosis syndrome
Pachyonychia congenita type 2
Pancreatic cancer 2
Pancreatic cancer 4
Pancreatic cancer, susceptibility to
Parkinson disease 6, autosomal recessive early-onset
Parkinson disease, late-onset
Pendred’s syndrome
Persistent hyperinsulinemic hypoglycemia of infancy
Phenylketonuria
Phosphate transport defect
Polycystic kidney disease, infantile type
Primary familial hypertrophic cardiomyopathy
Primary hyperoxaluria, type II
Primary progressive aphasia
Pseudo-Hurler polydystrophy
Pseudoxanthoma elasticum
Retinitis pigmentosa 19
Retinitis pigmentosa 7
Retinoblastoma
Rett’s disorder
Schwannomatosis
Spastic ataxia Charlevoix-Saguenay type
Stargardt disease 1
Supranuclear palsy, progressive, 1, atypical
Symmetrical dyschromatosis of extremities
Tay-Sachs disease
Turcot syndrome
Tyrosinase-negative oculocutaneous albinism
Werdnig-Hoffmann disease
Wilson’s disease