Epstein Files

EFTA01140307.pdf

dataset_9 pdf 565.3 KB Feb 3, 2026 4 pages
Human Mutation
Official Journal of the Human Genome Variation Society (HGVS), www.hgvs.org
Human Mutation, Vol. 33, No. 5, 809-812, 2012

Back to the Future: From Genome to Metabolome

Joseph V. Thakuria,1,2* Alexander W. Zaranek,1 George M. Church,1 and Gerard T. Berry3

1 Department of Genetics, Harvard Medical School, Boston, Massachusetts
2 Division of Genetics, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts
3 Division of Genetics, Department of Medicine, Children's Hospital Boston, Boston, Massachusetts

For the Deep Phenotyping Special Issue

Received 20 February 2012; accepted revised manuscript 28 February 2012.
Published online 18 March 2012 in Wiley Online Library (www.wiley.com/humanmutation). DOI: 10.1002/humu.22073

* Correspondence to: Joseph V. Thakuria, Division of Genetics, Massachusetts General Hospital, Boston, MA 02114. E-mail: jthakuria@genetics.med.harvard.edu

Additional Supporting Information may be found in the online version of this article.

ABSTRACT: In the traditional medical genetics setting, metabolic disorders, identified either clinically or through biochemical screening, undergo subsequent single gene testing to molecularly confirm diagnosis, provide further insight on natural disease history, and inform on disease management, treatment, familial testing, and reproductive options. For decades now, this process has been responsible for saving many lives worldwide. Only recently, though, has it become possible to move in the opposite direction by starting with an individual's whole genome or exome, and, guided by this data, study more minor perturbations in the absolute values and substrate ratios of clinically important biochemical analytes. Genomic individuality can also be used to guide more detailed phenotyping aimed at uncovering milder manifestations of known metabolic diseases. Metabolomic phenotyping in the Personal Genome Project for our first 200+ participants (all of whom are scheduled to have full genome sequence at more than 40x coverage available by May 2012) is aimed at uncovering potential subclinical and preclinical disease states in carriers of known pathogenic mutations and in lesser known rare variants that are protein predicted to be pathogenic. Our initial focus targets 88 genes involved in 68 metabolic disturbances with established evidence-based nutritional and/or pharmacological therapy as part of standard medical care.
Hum Mutat 33:809-812, 2012. © 2012 Wiley Periodicals, Inc.

KEY WORDS: genomics; metabolomics; nutritional therapy; pharmacological therapy

Background

In the 1985 American film, "Back to the Future," Marty McFly is accidentally sent back in time to the 1950s by a plutonium powered "flux capacitor" in a modified DeLorean upon reaching 88 mph. Throughout the film, the impact the future has on the past is explored. For decades now, mass spectrometric analysis typically utilizing a cylindrical capacitor ionization source to generate singly charged ions has been the backbone of diagnosis, management, and/or treatment for hundreds of inherited metabolic disorders. Because of proven clinical benefit, a subset of these disorders has made their way into formal newborn screening recommendations [ACMG, 2006]. Used for second-tier biochemical confirmation in conjunction with newborn screening programs, this technology has saved the lives of many newborns, children, and adults the world over. Starting with phenylketonuria in 1953, nutritional therapeutics guided by metabolic screening and serial testing has been conclusively shown to have medical benefit in a wide variety of enzyme deficiencies and other biochemical disorders.

As we enter the genomics era, our most diagnostically challenging cases in a medical genetics clinic are rapidly moving from a state of having no causal molecular candidates to having many candidates that need further evaluation and vetting. Nongenomic axes supporting causality from imaging, biochemical assay, functional cellular work, and other lines of evidence are increasingly important to help verify pathogenicity. Of these, biochemical assays have historically been the axis most frequently correlated with genetic data in a medical genetics practice.

Additionally, although much progress has been made in the screening, prevention, and treatment of inherited and primarily autosomal-recessive biochemical disorders, limited resources have been devoted to studying potential subclinical and preclinical disease states in carriers of known pathogenic mutations as well as in those harboring one or more less well-defined variants in known disease-causing genes. In large part, this is due to newborn screening and other testing modalities' reliance on biochemical analytes for screening and diagnosis. In clinical practice, the higher sensitivity, specificity, and cost-effectiveness of screening biochemically are well justified.

Large-scale genomic research studies utilizing next-generation sequencing, however, provides opportunity for researchers to start with comprehensive genomic sequence data and, secondarily, study the resulting phenotype and biochemical profile. If consistent abnormal trends (even trends within the normal range) are found associated with carrier states and/or lesser known mutations in genes causing metabolic disorders, it is intriguing to think of what effect a modified diet specific to the defect will have on the health and well-being of such individuals. In order to explore this possibility, an important first step is identifying whether such trends exist and identifying in which disorders subclinical or preclinical biochemical phenotypes are prevalent. In some disorders, such as galactosemia, the biochemical and phenotypic effect of carrier status, and rarer Duarte allele 1 (GALT N314D + L218L) gain of function mutations have been studied and characterized [Scriver et al., 2012]. In many other metabolic disorders, however, phenotypically, little may be known beyond the scope of classically affected patients on the extreme end of a disease severity spectrum.

In 1908, Archibald Garrod introduced the idea of biochemical individuality and described four of the first known autosomal-recessive disorders: alkaptonuria, cystinuria, albinism, and pentosuria. Since then, over 300 metabolic disorders with known diagnostic metabolic and genetic alteration have been discovered. And although Norwegian physician, Ivar Asbjørn Følling discovered phenylketonuria in 1934, it was not until approximately 20 years later that dramatically effective, evidence-based nutritional therapy was recognized through the collective work of Lionel Penrose, George Jervis, and Horst Bickel [Berry, 2010]. Although the number of severe metabolic disorders with effective dietary and/or drug therapy continues to increase, identification of more subtle subclinical and preclinical disease states utilizing whole genome or exome data has not yet been explored.

Research findings will eventually move into clinical practice as insight from next-generation sequencing technology is applied to metabolic lessons from the past, and greater correlation between genomic individuality and biochemical individuality is delineated in an expanded number of individuals. Subsequently, identification of subclinical and preclinical phenotypes should lead to effective dietary and drug therapy in individuals exhibiting milder or non-classic phenotypes of known metabolic diseases. As this will have the effect of broadening both genetic and biochemical screening, a resulting cycle of medical discovery, screening, and treatment recommendations in this area can be expected to accelerate in the coming years.

The Personal Genome Project (PGP) is a Harvard Medical School study with institutional review board approval for the enrollment of 100,000 individuals for complete genomic and phenotypic study (http://www.personalgenomes.org/). Study participants must be at least 21 years of age. Enrollment is entirely online and requires passing an exam testing comprehension of human subject research, PGP protocols, and basic genetics. Study guides and consent forms are available online at http://www.personalgenomes.org/consent/ and http://www.pgpstudy.org/ (Church, 2005; Lunshof et al., 2010).

Integrated datasets of linked genomic and phenotypic data on each individual are made available publicly as a free resource for the research community and to the study participants themselves. To allow for sequence confirmation and functional studies, participant cell lines are also made available and distributed through the Coriell Institute (http://ccr.coriell.org/). These include fibroblast and Epstein-Barr virus-transformed lymphoblastoid cell lines. Private quarterly questionnaires are used to track safety and prospective clinical outcomes.

More than 1,000 participants have provided phenotype data via personal health records and standardized questionnaires. The project is also actively pursuing the development and administration of new phenotyping tools with help from both the research community and commercial organizations. Immediate phenotyping plans include providing microbiome measurements from several body sites, telomere lengths, and methylation profiles. Participants may then elect to participate in these additional activities as they become available. More than 97% of participants have expressed interest in doing so. More than 85% of participants have also expressed interest in providing discarded surgical samples for analyses and more than 90% of participants have volunteered to provide samples postmortem.

To date, over 1,500 individuals have fully completed enrollment with twice as many at some stage of the enrollment process. From these, 200+ are being selected to have whole-genome sequence at more than 40x coverage from blood- and saliva-derived DNA. Clinical prioritization of participants is aided by a questionnaire designed to enhance for strong genetic etiology (Table 1).

Table 1. PGP Screening Questions Enriching for Genetic Etiologies

Question type(s) | Purpose
1. Age | Find both early-onset disease and advanced age controls with retrospective data.
2. Presence of severe or rare disease phenotype (self-reported). | Prioritize by condition or suspected genetic etiology (free text permitted for detailed responses).
3. If yes to Q2, disease onset, rarity, severity, and presence of family history are assessed. | Prioritize further within the disease category of interest.
4. Is objective disease evidence from physician diagnosis and/or medical testing available? | Prioritize diseases with evidence beyond self-reporting and/or with supporting laboratory, imaging, or genetic data.
5. Will data from [illegible] be uploaded into participant PGP profiles? | Prioritize by accessible medical phenotype data.
6. Demographics: geographic (from local to continent level), as well as ethnic (i.e., "ethnicity" will not always be concordant with "geography") and gender. Geographic and ethnic data (both voluntary to answer) can be provided for all four grandparents. | Provide flexibility in rapid hypothesis-driven prioritization of already enrolled cohorts. Enable ancestry, epigenetic, environmental studies. Apply appropriate population frequency thresholds when interpreting "-omic" variants and other datasets.
7. Co-enrollment with affected or unaffected family members? State disease(s), affected status, and familial relationship. | Prioritize on feasibility of familial-based genomic or other analyses.
8. What type of biological samples will be provided (e.g., blood, saliva, "normal" flora (for microbiomes), skin, or other tissues)? | Prioritize based on available tissue/cell types or feasibility of somatic versus germline comparative studies.

In this communication, we describe initial plans for metabolic phenotyping in our first 200+ individuals with phenotypically integrated whole-genome sequence datasets. Initial analysis is focused on 88 genes involved in 68 well-established biochemical genetic disorders with known dietary and/or pharmacologic treatment. The vast majority of primary and secondary newborn screening targets recommended by the American College of Medical Genetics (ACMG) are included (Supp. Table S1).

Methods

Purified DNA from saliva or blood on over 200 PGP participants are slated for library preparation and sequencing by Complete Genomics, Inc. Data are annotated using their 2.X pipeline matching against the National Center for Biotechnology Information (NCBI) build 37 reference genome. A preliminary interpretation derived from this data is provided privately to participants and becomes public after they are allotted 30 days for review. Individual datasets are linked to the participant ID and are published in the public domain under the Creative Commons CC0 waiver.

We have developed the GET-Evidence system to produce reports and make datasets available to the study participants and to the public. The purpose of GET-Evidence is to build up a public database of variant annotations that will ultimately be used to assist in clinical analysis. GET-Evidence prioritizes variants for review based on allele frequency, protein-predicted pathogenicity, and presence in clinical gene and variant databases. As more variants are reviewed, the participants' reports are updated to reflect the newer annotations.

For user-specified analyses, Clinical Future (founded by J.V.T. and A.W.Z. with support from G.M.C.) has developed the Genome Parsing System ("GPS"), a secure, private Web service for genomic and phenotypic data management and filtration. A sample GPS analysis of the PGP pilot genomes is found in Figure 1. The system has been used to effectively filter variants for high-clinical importance parsing genomic data against clinical gene and variant databases, filtering by low allele frequency and protein-predicted pathogenicity [Adzhubei et al., 2010]. By analyzing aggregate data from 5,400 individual exomes, available from the NHLBI Exome Variant Server, we find four to nine variants with frequency less than 10% specifically from the 88 genes associated with the targeted disorders from Supp. Table S1.

[Figure 1: screenshot of the GPS variant report; image not reproduced.]
Figure 1. Genome Parsing System (GPS) screenshot: Whole-genome data from 16 Personal Genome Project (PGP) participants parsed against 88 metabolic disease genes show an average of four to nine variants per genome, are less than 5% in frequency, and appear in OMIM and/or are protein predicted to be damaging. N.B.: the predominance of the MAF of 0.0078 in these rarest variants occurs because each variant occurs only once in a limited frequency database of 64 public genomes used for this analysis.

In the PGP pilot data, each participant has four to nine variants with frequency less than 5% and zero to one variants in OMIM (www.omim.org) specifically from the 88 genes associated with the targeted disorders from Supp. Table S1. When analysis is extended to the NHLBI Exome Variant dataset, we find slightly fewer variants, three to seven on average per exome, with a frequency less than 5% (Exome Variant Server, 2012).

Consensus from several publications also indicates that an average of 10-30 variants per genome are present heterozygously for autosomal-recessive disorders. One or more of these typically involve established metabolic disorders. Furthermore, we avoided the summation due to the wide population-specific variability for each disorder, but adding up estimated carrier rates for all 88 disorders should also support the hypothesis of finding at least one biochemical disorder of interest, simply on the basis of carrier status for at least one treatable metabolic disorder listed in Supp. Table S1 (Lupski et al., 2010).

All 200+ participants will have the following laboratory studies performed in a CLIA certified clinical laboratory for biochemical phenotyping that are relevant to the treatable disorders listed in Supp. Table S1: plasma amino acids, urine organic acids, plasma acylcarnitines, urine acylglycines, basic chem7, NH4 level, carnitine profile (free and total), folate level, zinc level, B12 level, urine-reducing substances, lipid profile, hemoglobin electrophoresis, pyridoxine level, biotin level, urine galactitol, galactose-1-phosphate, copper level, ceruloplasmin, magnesium level, carbohydrate-deficient transferrin, urine and plasma porphobilinogen, urine and plasma delta-aminolevulinic acid, RBC plasmalogens, pipecolic acid, and plasma very-long-chain fatty acids. The majority of these biochemical tests will be performed in-house at Children's Hospital Boston and Massachusetts General Hospital with some highly specialized tests being performed by outside clinical collaborators (Table 2).

Table 2. Planned Biochemical Phenotyping for 200+ PGP Participants with Whole-Genome Data

Plasma amino acids
Urine organic acids
Plasma acylcarnitines
Urine acylglycines
Sodium
Potassium
Chloride
Bicarbonate
Blood urea nitrogen
Creatinine
Glucose
NH4 level
Carnitine profile (free and total)
Folate level
Zinc level
B12 level
Urine-reducing substances
Lipid profile
Hemoglobin electrophoresis
Pyridoxine level
Biotin level
Urine galactitol
Galactose-1-phosphate
Copper level
Ceruloplasmin
Magnesium level
Carbohydrate deficient transferrin
Urine and plasma porphobilinogen
Urine and plasma delta-aminolevulinic acid
RBC plasmalogens
Pipecolic acid
Plasma very-long-chain fatty acids

After identification of both known and potentially pathogenic mutations within the targeted 88 biochemical genes with the GPS platform (Supp. Table S1), we will analyze participant metabolite values and ratios in which mutation status suggests possible deviation from normal values using Mann-Whitney and Kolmogorov-Smirnov tests. Analyses for statistically significant and pathophysiologically consistent differences observed against matched controls will be aided by performing the same biochemical testing on all participants and allowing each participant to also serve as control for the biochemical disorders and pathways in which they are not found to have potentially pathogenic mutations.

Discussion

The concept of biochemical individuality first introduced by Garrod has had enormous impact on modern medicine and human genetics. In contrast, due to direct observation of familial similarities, especially physical similarities in the case of monozygotic twins, "genomic individuality" has not only been assumed since before the term "genome" was coined but also could correctly be considered a redundant term. Yet, only recently, with the deep sequencing of multiple whole genomes, exomes, and targeted sequencing of genes in the tens to thousands becoming more practical in clinical research, are we able to systematically study and correlate three critical axes of medical research: genomic, metabolomic, and phenomic. Additional axes, such as functional data on an individual's cell line, will also aid in supporting hypothesis of causality. Four decades worth of observational data on the natural history of treated patients for some of these disorders that were the first to be biochemically screened for in the 1960s is also extremely informative.

We expect to see correlations between rarer variants and larger deviations from normal (in the expected direction for the specific disorder and biochemical metabolites). The frequency and degree to which analyte deviations are in the expected direction for the particular disorder will also be biostatistically analyzed. Since all 200+ participants will have the full range of biochemical studies relevant to 88 genes involved in 68 treatable biochemical disorders, those without suspected pathogenic variants in a specific gene(s) or disorder will serve as controls for those who are biochemically studied based on sequence data for the same specific disorder.

Achieving statistical significance correlating relevant biochemical analytes with genomic data in individuals found to have one or more potentially pathogenic mutations across these 68 biochemical disorders in over 200 individuals will be challenging because of multiple hypothesis testing. We still expect to see interesting data trends supporting known biochemical pathophysiology even in this cohort size when targeting the rarest protein altering variants. In some instances, statistically significant differences should eventually be observed once a critical mass of individuals with matching genotype, metabolic profile, and phenotype is reached.

Neither the metabolic diseases we have chosen to study in our initial metabolic analysis nor the laboratory tests we will perform on all 200+ individuals are comprehensive of treatable metabolic disorders or available clinical biochemical testing, respectively, but it should generate helpful pilot data and lay the foundation for future trials studying an expanded number of genes, metabolic disorders, and individuals.

Our finding of four to nine rare variants predicted to be pathogenic variants per genome on average within 88 genes causing metabolic disease with established dietary and/or pharmacologic therapy is highly dependent on the filtering algorithm. This low figure is also bounded by the limited number of genes studied and our current understanding of metabolic diseases. Regardless, at 10 or less variants per person with our current algorithm, the prospect of systematic development of individualized dietary and/or medical data informed by genomic and metabolomic data finally comes into practical view.

We anticipate the biochemical interrogation of 200+ whole genomes guided by genomic individuality, and linked to a process of individual phenotype data gathering guided by the known natural history of a subset of clinically well-characterized metabolic disorders will prove valuable. Identifying the genomic and metabolomic circumstances under which subclinical or preclinical states exist for these same disorders may eventually lead to the first evidence-based efficacy studies for nutrigenomics in these patients who would now otherwise go untreated and undetected by current methods and standard practices.

Acknowledgments

Disclosure Statement: J.V.T. and A.W.Z. declare potential conflict of interest as cofounders of Clinical Future, Inc., Somerville, MA.

References

Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. 2010. A method and server for predicting damaging missense mutations. Nat Methods 7:248-249.
American College of Medical Genetics. 2006. Health Resources and Services Administration Commissioned Report. Newborn screening: toward a uniform screening panel and system. Genet Med 8:1S-252S.
Berry GT. 2008. Metabolic profiling. Nestle Nutr Workshop Ser Pediatr Program 62:55-75.
Church GM. 2005. Personal genome project. Mol Syst Biol 1:3.
Exome Variant Server. NHLBI Exome Sequencing Project (ESP), Seattle, WA. Available at: http://evs.gs.washington.edu/EVS/. (Accessed January 2012).
Lunshof JE, Bobe J, Aach J, Angrist M, Thakuria JV, Vorhaus DB, Hoehe MR, Church GM. 2010. Personal genomes in progress: from the human genome project to the personal genome project. Dialogues Clin Neurosci 12:47-60.
Lupski JR, Reid JG, Gonzaga-Jauregui C, Rio Deiros D, Chen DC, Nazareth L, Bainbridge M, Dinh H, Jing C, Wheeler DA, McGuire AL, Zhang F, and others. 2010. Whole-genome sequencing in a patient with Charcot-Marie-Tooth neuropathy. N Engl J Med 362:1181-1191.
Scriver CR, Beaudet AL, Sly WS, Valle D, Childs B, Kinzler KW, Vogelstein B. 2012. Metabolic and molecular bases of inherited disease. New York: McGraw-Hill.
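
Illustrative sketch (not part of the published article). The Figure 1 caption describes a filter that keeps variants in the targeted genes that are below 5% in frequency and that appear in OMIM and/or are predicted to be damaging, and the Methods describe comparing analyte values between participants with such variants and the remaining participants, who serve as controls, using a Mann-Whitney test. A minimal Python sketch of those two steps follows. It is not the GPS or GET-Evidence implementation: the Variant record, its field names, the participant IDs, and the analyte values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are assumptions, not a GPS schema."""
    participant: str          # made-up participant ID
    gene: str
    allele_freq: float        # frequency in the reference database, 0..1
    in_omim: bool
    predicted_damaging: bool


def filter_variants(variants, target_genes, max_freq=0.05):
    """Keep variants in a target gene, rarer than max_freq,
    and either listed in OMIM or predicted to be damaging."""
    return [
        v for v in variants
        if v.gene in target_genes
        and v.allele_freq < max_freq
        and (v.in_omim or v.predicted_damaging)
    ]


def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic for xs: the number of (x, y) pairs
    with x > y, counting each tie as 0.5. No p-value is computed."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in xs for y in ys
    )


# Made-up example data.
variants = [
    Variant("P1", "ACADM", 0.0078, True, True),   # rare, target gene: kept
    Variant("P2", "ACADM", 0.20, True, False),    # too common: dropped
    Variant("P3", "TTN", 0.001, False, True),     # not a target gene: dropped
    Variant("P4", "GALT", 0.03, False, False),    # no supporting evidence: dropped
]
target_genes = {"ACADM", "GALT"}
kept = filter_variants(variants, target_genes)

# Hypothetical analyte values, one per participant (arbitrary units).
analyte = {"P1": 0.9, "P2": 0.4, "P3": 0.3, "P4": 0.5}

# Participants without a flagged ACADM variant serve as controls for ACADM.
carriers = {v.participant for v in kept if v.gene == "ACADM"}
carrier_vals = [val for p, val in analyte.items() if p in carriers]
control_vals = [val for p, val in analyte.items() if p not in carriers]

u_carriers = mann_whitney_u(carrier_vals, control_vals)
print([v.participant for v in kept], u_carriers)  # ['P1'] 3.0
```

In practice a statistics library would be used to obtain p-values, and the multiple hypothesis testing noted in the Discussion would need to be accounted for across the 68 disorders; the sketch only shows the shape of the comparison.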

Document Metadata

Document ID
1a27ffbb-7743-4634-a749-159318667042
Storage Key
dataset_9/EFTA01140307.pdf
Content Hash
cc70a8effa7f027744c99e859e84c612
Created
Feb 3, 2026