Human Mutation
Official Journal of the Human Genome Variation Society

Back to the Future: From Genome to Metabolome
Joseph V. Thakuria,1,2* Alexander W. Zaranek,1 George M. Church,1 and Gerard T. Berry3
1Department of Genetics, Harvard Medical School, Boston, Massachusetts; 2Division of Genetics, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts; 3Division of Genetics, Department of Medicine, Children's Hospital Boston, Boston, Massachusetts
For the Deep Phenotyping Special Issue
Received 20 February 2012; accepted revised manuscript 28 February 2012.
Published online 18 March 2012 in Wiley Online Library (www.wiley.com/humanmutation). DOI: 10.1002/humu.22073
ABSTRACT: In the traditional medical genetics setting, metabolic disorders, identified either clinically or through biochemical screening, undergo subsequent single gene testing to molecularly confirm diagnosis, provide further insight on natural disease history, and inform on disease management, treatment, familial testing, and reproductive options. For decades now, this process has been responsible for saving many lives worldwide. Only recently, though, has it become possible to move in the opposite direction by starting with an individual's whole genome or exome, and, guided by this data, study more minor perturbations in the absolute values and substrate ratios of clinically important biochemical analytes. Genomic individuality can also be used to guide more detailed phenotyping aimed at uncovering milder manifestations of known metabolic diseases. Metabolomic phenotyping in the Personal Genome Project for our first 200+ participants (all of whom are scheduled to have full genome sequence at more than 40x coverage available by May 2012) is aimed at uncovering potential subclinical and preclinical disease states in carriers of known pathogenic mutations and in lesser known rare variants that are protein predicted to be pathogenic. Our initial focus targets 88 genes involved in 68 metabolic disturbances with established evidence-based nutritional and/or pharmacological therapy as part of standard medical care.
Hum Mutat 33:809-812, 2012. © 2012 Wiley Periodicals, Inc.

KEY WORDS: genomics; metabolomics; nutritional therapy; pharmacological therapy

Additional Supporting Information may be found in the online version of this article.
*Correspondence to: Joseph V. Thakuria, Division of Genetics, Massachusetts General Hospital, Boston, MA 02114. E-mail: jthakuria@genetics.med.harvard.edu

Background

In the 1985 American film, "Back to the Future," Marty McFly is accidentally sent back in time to the 1950s by a plutonium powered "flux capacitor" in a modified DeLorean upon reaching 88 mph. Throughout the film, the impact the future has on the past is explored. For decades now, mass spectrometric analysis typically utilizing a cylindrical capacitor ionization source to generate singly charged ions has been the backbone of diagnosis, management, and/or treatment for hundreds of inherited metabolic disorders. Because of proven clinical benefit, a subset of these disorders has made its way into formal newborn screening recommendations [ACMG, 2006]. Used for second-tier biochemical confirmation in conjunction with newborn screening programs, this technology has saved the lives of many newborns, children, and adults the world over. Starting with phenylketonuria in 1953, nutritional therapeutics guided by metabolic screening and serial testing have been conclusively shown to have medical benefit in a wide variety of enzyme deficiencies and other biochemical disorders.

As we enter the genomics era, our most diagnostically challenging cases in a medical genetics clinic are rapidly moving from a state of having no causal molecular candidates to having many candidates that need further evaluation and vetting. Nongenomic axes supporting causality from imaging, biochemical assay, functional cellular work, and other lines of evidence are increasingly important to help verify pathogenicity. Of these, biochemical assays have historically been the axis most frequently correlated with genetic data in a medical genetics practice.

Additionally, although much progress has been made in the screening, prevention, and treatment of inherited and primarily autosomal-recessive biochemical disorders, limited resources have been devoted to studying potential subclinical and preclinical disease states in carriers of known pathogenic mutations as well as in those harboring one or more less well-defined variants in known disease-causing genes. In large part, this is due to the reliance of newborn screening and other testing modalities on biochemical analytes for screening and diagnosis. In clinical practice, the higher sensitivity, specificity, and cost-effectiveness of screening biochemically are well justified.

Large-scale genomic research studies utilizing next-generation sequencing, however, provide an opportunity for researchers to start with comprehensive genomic sequence data and, secondarily, study the resulting phenotype and biochemical profile. If consistent abnormal trends (even trends within the normal range) are found associated with carrier states and/or lesser known mutations in genes causing metabolic disorders, it is intriguing to think of what effect a modified diet specific to the defect will have on the health and well-being of such individuals. In order to explore this possibility, an important first step is identifying whether such trends exist and identifying in which disorders subclinical or preclinical biochemical phenotypes are prevalent. In some disorders, such as galactosemia, the biochemical and phenotypic effect of carrier status, and rarer Duarte allele 1 (GALT N314D + L218L) gain of function mutations have been studied and characterized [Scriver et al., 2012]. In many other metabolic disorders, however, phenotypically, little may be known beyond the scope of classically affected patients on the extreme end of a disease severity spectrum.

In 1908, Archibald Garrod introduced the idea of biochemical individuality and described four of the first known autosomal-recessive disorders: alkaptonuria, cystinuria, albinism,
and pentosuria. Since then, over 300 metabolic disorders with known diagnostic metabolic and genetic alteration have been discovered. And although the Norwegian physician Ivar Asbjørn Følling discovered phenylketonuria in 1934, it was not until approximately 20 years later that dramatically effective, evidence-based nutritional therapy was recognized through the collective work of Lionel Penrose, George Jervis, and Horst Bickel [Berry, 2010]. Although the number of severe metabolic disorders with effective dietary and/or drug therapy continues to increase, identification of more subtle subclinical and preclinical disease states utilizing whole genome or exome data has not yet been explored.

Research findings will eventually move into clinical practice as insight from next-generation sequencing technology is applied to metabolic lessons from the past, and greater correlation between genomic individuality and biochemical individuality is delineated in an expanded number of individuals. Subsequently, identification of subclinical and preclinical phenotypes should lead to effective dietary and drug therapy in individuals exhibiting milder or nonclassic phenotypes of known metabolic diseases. As this will have the effect of broadening both genetic and biochemical screening, a resulting cycle of medical discovery, screening, and treatment recommendations in this area can be expected to accelerate in the coming years.

The Personal Genome Project (PGP) is a Harvard Medical School study with institutional review board approval for the enrollment of 100,000 individuals for complete genomic and phenotypic study (http://www.personalgenomes.org/). Study participants must be at least 21 years of age. Enrollment is entirely online and requires passing an exam testing comprehension of human subject research, PGP protocols, and basic genetics. Study guides and consent forms are available online at http://www.personalgenomes.org/consent/ and http://www.pgpstudy.org/ [Church, 2005; Lunshof et al., 2010].

Integrated datasets of linked genomic and phenotypic data on each individual are made available publicly as a free resource for the research community and to the study participants themselves. To allow for sequence confirmation and functional studies, participant cell lines are also made available and distributed through the Coriell Institute (http://ccr.coriell.org/). These include fibroblast and Epstein-Barr virus-transformed lymphoblastoid cell lines. Private quarterly questionnaires are used to track safety and prospective clinical outcomes.

More than 1,000 participants have provided phenotype data via personal health records and standardized questionnaires. The project is also actively pursuing the development and administration of new phenotyping tools with help from both the research community and commercial organizations. Immediate phenotyping plans include providing microbiome measurements from several body sites, telomere lengths, and methylation profiles. Participants may then elect to participate in these additional activities as they become available. More than 97% of participants have expressed interest in doing so. More than 85% of participants have also expressed interest in providing discarded surgical samples for analyses and more than 90% of participants have volunteered to provide samples postmortem.

To date, over 1,500 individuals have fully completed enrollment with twice as many at some stage of the enrollment process. From these, 200+ are being selected to have whole-genome sequence at more than 40x coverage from blood- and saliva-derived DNA. Clinical prioritization of participants is aided by a questionnaire designed to enrich for strong genetic etiology (Table 1).

Table 1. PGP Screening Questions Enriching for Genetic Etiologies
Question type(s) | Purpose
1. Age | [illegible] both early-onset disease and advanced age controls with retrospective data.
2. Presence of severe or rare disease phenotype (self-reported). | Prioritize by condition or suspected genetic etiology (free text permitted for detailed responses).
3. If yes to Q2, disease onset, rarity, severity, and presence of family history are assessed. | Prioritize further within the disease category of interest.
4. Is objective disease evidence from physician diagnosis and/or medical testing available? | Prioritize diseases with evidence beyond self-reporting and/or with supporting laboratory, imaging, or genetic data.
5. Will data from [illegible] be uploaded into participant PGP profiles? | Prioritize by accessible medical phenotype data.
6. Demographics: geographic (from local to continent level), as well as ethnic (i.e., "ethnicity" will not always be concordant with "geography") and gender. Geographic and ethnic data (both voluntary to answer) can be provided for all four grandparents. | Provide flexibility in rapid hypothesis-driven prioritization of already enrolled cohorts. Enable ancestry, epigenetic, environmental studies. Apply appropriate population frequency thresholds when interpreting "-omic" variants and other datasets.
7. Co-enrollment with affected or unaffected family members? State disease(s), affected status, and familial relationship. | Prioritize on feasibility of familial-based genomic or other analyses.
8. What type of biological samples will be provided (e.g., blood, saliva, "normal" flora (for microbiomes), skin, or other tissues)? | Prioritize based on available tissue/cell types or feasibility of somatic versus germline comparative studies.

In this communication, we describe initial plans for metabolic phenotyping in our first 200+ individuals with phenotypically integrated whole-genome sequence datasets. Initial analysis is focused on 88 genes involved in 68 well-established biochemical genetic disorders with known dietary and/or pharmacologic treatment. The vast majority of primary and secondary newborn screening targets recommended by the American College of Medical Genetics (ACMG) are included (Supp. Table S1).

Methods

Purified DNA from saliva or blood of over 200 PGP participants is slated for library preparation and sequencing by Complete Genomics, Inc. Data are annotated using their 2.X pipeline matching against the National Center for Biotechnology Information (NCBI) build 37 reference genome. A preliminary interpretation derived from this data is provided privately to participants and becomes public after they are allotted 30 days for review. Individual datasets are linked to the participant ID and are published in the public domain under the Creative Commons CC0 waiver.

We have developed the GET-Evidence system to produce reports and make datasets available to the study participants and to the public. The purpose of GET-Evidence is to build up a public database of variant annotations that will ultimately be used to assist in clinical analysis. GET-Evidence prioritizes variants for review based on allele frequency, protein-predicted pathogenicity, and presence in clinical gene and variant databases. As more variants are reviewed, the participants' reports are updated to reflect the newer annotations.

Figure 1. Genome Parsing System (GPS) screenshot: Whole-genome data from 16 Personal Genome Project (PGP) participants parsed against 88 metabolic disease genes show an average of four to nine variants per genome that are less than 5% in frequency and appear in OMIM and/or are protein predicted to be damaging. N.B.: the predominance of the MAF of 0.0078 in these rarest variants occurs because each variant occurs only once in a limited frequency database of 64 public genomes used for this analysis.

For user-specified analyses, Clinical Future (founded by J.V.T. and A.W.Z. with support from G.M.C.) has developed the Genome Parsing System (GPS), a secure, private Web service for genomic and phenotypic data management and filtration. A sample GPS analysis of the PGP pilot genomes is found in Figure 1. The system has been used to effectively filter variants for high clinical importance, parsing
genomic data against clinical gene and variant databases, filtering by low allele frequency and protein-predicted pathogenicity [Adzhubei et al., 2010]. By analyzing aggregate data from 5,400 individual exomes, available from the NHLBI Exome Variant Server, we find four to nine variants with frequency less than 10% specifically from the 88 genes associated with the targeted disorders from Supp. Table S1. In the PGP pilot data, each participant has four to nine variants with frequency less than 5% and zero to one variants in OMIM (www.omim.org) specifically from the 88 genes associated with the targeted disorders from Supp. Table S1. When analysis is extended to the NHLBI Exome Variant dataset, we find slightly fewer variants, three to seven on average per exome, with a frequency less than 5% [Exome Variant Server, 2012].

Consensus from several publications also indicates that an average of 10-30 variants per genome are present heterozygously for autosomal-recessive disorders. One or more of these typically involve established metabolic disorders. Furthermore, we avoided the summation due to the wide population-specific variability for each disorder, but adding up estimated carrier rates for all 88 disorders should also support the hypothesis of finding at least one biochemical disorder of interest, simply on the basis of carrier status for at least one treatable metabolic disorder listed in Supp. Table S1 [Lupski et al., 2010].

All 200+ participants will have the following laboratory studies, relevant to the treatable disorders listed in Supp. Table S1, performed in a CLIA certified clinical laboratory for biochemical phenotyping: plasma amino acids, urine organic acids, plasma acylcarnitines, urine acylglycines, basic chem7, NH4 level, carnitine profile (free and total), folate level, zinc level, B12 level, urine-reducing substances, lipid profile, hemoglobin electrophoresis, pyridoxine level, biotin level, urine galactitol, galactose-1-phosphate, copper level, ceruloplasmin, magnesium level, carbohydrate-deficient transferrin, urine and plasma porphobilinogen, urine and plasma delta-aminolevulinic acid, RBC plasmalogens, pipecolic acid, and plasma very-long-chain fatty acids. The majority of these biochemical tests will be performed in-house at Children's Hospital Boston and Massachusetts General Hospital with some highly specialized tests being performed by outside clinical collaborators (Table 2).

After identification of both known and potentially pathogenic mutations within the targeted 88 biochemical genes with the GPS platform (Supp. Table S1), we will analyze participant metabolite values and ratios in which mutation status suggests possible deviation from normal values using Mann-Whitney and Kolmogorov-Smirnov tests. Analyses for statistically significant and pathophysiologically consistent differences observed against matched controls will be aided by performing the same biochemical testing on all participants and allowing each participant to also serve as control for the biochemical disorders and pathways in which they are not found to have potentially pathogenic mutations.

Discussion

The concept of biochemical individuality first introduced by Garrod has had enormous impact on modern medicine and human
genetics. In contrast, due to direct observation of familial similarities, especially physical similarities in the case of monozygotic twins, "genomic individuality" has not only been assumed since before the term "genome" was coined but also could correctly be considered a redundant term. Yet, only recently, with the deep sequencing of multiple whole genomes, exomes, and targeted sequencing of genes in the tens to thousands becoming more practical in clinical research, are we able to systematically study and correlate three critical axes of medical research: genomic, metabolomic, and phenomic. Additional axes, such as functional data on an individual's cell line, will also aid in supporting hypotheses of causality. Four decades worth of observational data on the natural history of treated patients for some of these disorders that were the first to be biochemically screened for in the 1960s is also extremely informative.

We expect to see correlations between rarer variants and larger deviations from normal (in the expected direction for the specific disorder and biochemical metabolites). The frequency and degree to which analyte deviations are in the expected direction for the particular disorder will also be biostatistically analyzed. Since all 200+ participants will have the full range of biochemical studies relevant to 88 genes involved in 68 treatable biochemical disorders, those without suspected pathogenic variants in a specific gene(s) or disorder will serve as controls for those who are biochemically studied based on sequence data for the same specific disorder.

Achieving statistical significance correlating relevant biochemical analytes with genomic data in individuals found to have one or more potentially pathogenic mutations across these 68 biochemical disorders in over 200 individuals will be challenging because of multiple hypothesis testing. We still expect to see interesting data trends supporting known biochemical pathophysiology even in this cohort size when targeting the rarest protein altering variants. In some instances, statistically significant differences should eventually be observed once a critical mass of individuals with matching genotype, metabolic profile, and phenotype is reached.

Neither the metabolic diseases we have chosen to study in our initial metabolic analysis nor the laboratory tests we will perform on all 200+ individuals are comprehensive of treatable metabolic disorders or available clinical biochemical testing, respectively, but this initial analysis should generate helpful pilot data and lay the foundation for future trials studying an expanded number of genes, metabolic disorders, and individuals.

Our finding of four to nine rare variants predicted to be pathogenic per genome on average within 88 genes causing metabolic disease with established dietary and/or pharmacologic therapy is highly dependent on the filtering algorithm. This low figure is also bounded by the limited number of genes studied and our current understanding of metabolic diseases. Regardless, at 10 or fewer variants per person with our current algorithm, the prospect of systematic development of individualized dietary and/or medical data informed by genomic and metabolomic data finally comes into practical view.

We anticipate that the biochemical interrogation of 200+ whole genomes, guided by genomic individuality and linked to a process of individual phenotype data gathering guided by the known natural history of a subset of clinically well-characterized metabolic disorders, will prove valuable.

Identifying the genomic and metabolomic circumstances under which subclinical or preclinical states exist for these same disorders may eventually lead to the first evidence-based efficacy studies for nutrigenomics in these patients who would now otherwise go untreated and undetected by current methods and standard practices.

Table 2. Planned Biochemical Phenotyping for 200+ PGP Participants with Whole-Genome Data
Plasma amino acids
Urine organic acids
Plasma acylcarnitines
Urine acylglycines
Sodium
Potassium
Chloride
Bicarbonate
Blood urea nitrogen
Creatinine
Glucose
NH4 level
Carnitine profile (free and total)
Folate level
Zinc level
B12 level
Urine-reducing substances
Lipid profile
Hemoglobin electrophoresis
Pyridoxine level
Biotin level
Urine galactitol
Galactose-1-phosphate
Copper level
Ceruloplasmin
Magnesium level
Carbohydrate deficient transferrin
Urine and plasma porphobilinogen
Urine and plasma delta-aminolevulinic acid
RBC plasmalogens
Pipecolic acid
Plasma very-long-chain fatty acids

Acknowledgments

Disclosure Statement: J.V.T. and A.W.Z. declare potential conflict of interest as cofounders of Clinical Future, Inc., Somerville, MA.

References

Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. 2010. A method and server for predicting damaging missense mutations. Nat Methods 7:248-249.
American College of Medical Genetics. 2006. Health Resources and Services Administration Commissioned Report. Newborn screening: toward a uniform screening panel and system. Genet Med 8:1S-252S.
Berry GT. 2008. Metabolic profiling. Nestle Nutr Workshop Ser Pediatr Program 62:55-75.
Church GM. 2005. Personal genome project. Mol Syst Biol 1-3.
Exome Variant Server. NHLBI Exome Sequencing Project (ESP), Seattle, WA. Available at: http://evs.gs.washington.edu/EVS/. (Accessed January, 2012).
Lunshof JE, Bobe J, Aach J, Angrist M, Thakuria JV, Vorhaus DB, Hoehe MR, Church GM. 2010. Personal genomes in progress: from the human genome project to the personal genome project. Dialogues Clin Neurosci 12:47-60.
Lupski JR, Reid JG, Gonzaga-Jauregui C, Rio Deiros D, Chen DC, Nazareth L, Bainbridge M, Dinh H, Jing C, Wheeler DA, McGuire AL, Zhang F, and others. 2010. Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N Engl J Med 362:1181-1191.
Scriver CR, Beaudet AL, Sly WS, Valle D, Childs B, Kinzler KW, Vogelstein B. 2012. Metabolic and molecular bases of inherited disease. New York: McGraw-Hill.
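
Supplementary sketch (illustrative only). The Methods and the Figure 1 caption describe keeping variants that fall in the targeted metabolic genes, have a frequency below 5%, and appear in OMIM and/or are protein predicted to be damaging; the caption also notes that a variant seen once among 64 public genomes has a minor allele frequency of 0.0078. The Python sketch below shows one way such a filter and that arithmetic could look. It is not the GPS implementation, which the text does not describe: the `Variant` record, its field names, the `passes_filter` helper, the placeholder variant labels, and the two-gene stand-in for the 88-gene list are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative, not GPS's schema."""
    gene: str
    change: str
    allele_count: int         # copies of the variant allele seen in the reference panel
    panel_genomes: int        # number of diploid genomes in the reference panel
    in_omim: bool
    predicted_damaging: bool

    @property
    def frequency(self) -> float:
        # Each diploid genome contributes two alleles to the denominator.
        return self.allele_count / (2 * self.panel_genomes)


def passes_filter(v: Variant, target_genes: set, max_freq: float = 0.05) -> bool:
    """Keep rare variants in target genes that are in OMIM and/or predicted damaging."""
    return (
        v.gene in target_genes
        and v.frequency < max_freq
        and (v.in_omim or v.predicted_damaging)
    )


# Stand-in for the 88-gene list of Supp. Table S1 (assumption: two genes only).
target_genes = {"ACADM", "GALT"}

variants = [
    Variant("ACADM", "example-1", 1, 64, False, True),    # singleton: 1/128
    Variant("GALT", "example-2", 20, 64, True, False),    # too common: 20/128
    Variant("GENE_X", "example-3", 1, 64, False, True),   # not a targeted gene
]

kept = [v for v in variants if passes_filter(v, target_genes)]
print([v.change for v in kept])        # labels of variants that pass the filter
print(round(kept[0].frequency, 4))     # 1 / (2 * 64), the MAF quoted in Figure 1
```

The frequency threshold is a parameter because the text itself reports counts at both the 5% and 10% cutoffs, and, as the Discussion notes, the number of variants retained per genome depends heavily on the filtering choices.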