Epstein Files

EFTA00611788.pdf

dataset_9 pdf 809.5 KB Feb 3, 2026 7 pages
•• OPEN 8 ACCESS Freely available online • -PLOS COMPUTATIONAL (t) BIOLOGY The Time Scale of Evolutionary Innovation Krishnendu Chatterjee", Andreas Paviogiannisl , Ben Adlam2, Martin A. Nowak2 Cr2sfivlark 1IST Austria, KlostemeubLeg, Austria, 2 Program for Evolutionary Dynamics, Department of Organismic and Evolutionary Biology, Department of Mathematics, Harvard University, Cambridge, Massachusetts, United States of America Abstract A fundamental question in biology is the following: what is the time scale that is needed for evolutionary innovations? There are many results that characterize single steps in terms of the fixation time of new mutants arising in populations of certain size and structure. But here we ask a different question, which is concerned with the much longer time scale of evolutionary trajectories: how long does it take for a population exploring a fitness landscape to find target sequences that encode new biological functions? Our key variable is the length, L. of the genetic sequence that undergoes adaptation. In computer science there is a crucial distinction between problems that require algorithms which take polynomial or exponential time. The latter are considered to be intractable. Here we develop a theoretical approach that allows us to estimate the time of evolution as function of L. We show that adaptation on many fitness landscapes takes time that is exponential in L. even if there are broad selection gradients and many targets uniformly distributed in sequence space. These negative results lead us to search for specific mechanisms that allow evolution to work on polynomial time scales. We study a regeneration process and show that it enables evolution to work in polynomial time. Citation: Chatterjee K. Pavbgiannis A. Adlam 8, Nowak MA (2014) The Tine Scale of Evolutionary Innovation. PLoS Comput Blol 10(9): e1003818. do1:10.1371/ journalpcb11003818 Editor. Niko Beerenwinket ETH Zurich, Switzerland Received December 13, 2013; Accepted July 21, 2014; Published September 11, 2014 Copyright T 2014 Chanerjee et al This Is an open-access ankle distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: Austrian Science Fund (FWD Grant No P23499-N23, FIVE NEN Grant No S11407-N23 (RISE). ERC Start grant (279307: Graph Games), and Microsoft Faculty Fellows award. Support from the John Templeton foundation is gratefully acknowledged. The funden had no role In study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist. • Email: KtishnenduChatterjeeiNstac.at Introduction selection. This problem leads to the study of adaptive walks or fitness landscapes [15,20,21,28,29]. In this paper we ask a different Our planet came into existence 4.6 billion years ago. There is question: how long does it take for evolution to discover a nos clear chemical evidence for life on earth 3.5 billion years ago 11,21. function? More specifically, our aim is to estimate the expected The evolutionary process generated procaria, eucaria and discovery. time of new biological functions: how long does it take complex multi-cellular organisms. 'Throughout the history of life, for a population of reproducing organisms to discover a biological evolution had to discover sequences of biological polymers that function that is not present at the beginning of the search. We will perform specific, complicated functions. The average length of discuss two approximations for rugged fitness landscapes. We also bacterial genes is about 1000 nucleotides, that of human genes discuss the significance of clustered peaks. about 3000 nucleotides. The longest known bacterial gene We consider an alphabet of size four, as is the case for DNA and contains more than 105 nucleotides, the longest human gene RNA, and a nucleotide sequence of length L. We consider a more than 106. A basic question is what is the time scale required population of size N, which reproduces asexually. The mutation by evolution to discover the sequences that perform desired rate, el, is small: individual mutations arc introduced and evaluated functions. While many results exist for the fixation time of by natural selection and random drift one at a time. The individual mutants [3 15], here we ask how the time scale of probability that the evolutionary process moves from a sequence i evolution depends on the length L of the sequence that needs to be to a sequence j, which is at Hamming distance one from i, is given adapted. We consider the crucial distinction of polynomial versus by Ptr , Wit/(3L)Ipip where pry is the fixation probability of exponential time [16 18]. A time scale that grows exponentially in sequence / in a population consisting of sequence L In the special L is infeasible for long sequences. case of a flat fitness landscape, we have p,4 a I /N, and Evolutionary dynamics operates in sequence space, which can lit/(3L)]. Thus we have an evolutionary random walk, be imagined as a discrete multi-dimensional lattice that arises where each step is a jump to a neighboring sequence of Hamming when all sequences of a given length are arranged such that distance one. nearest neighbors differ by one point mutation [19]. For constant selection, each point in sequence space is associated with a non- Results negative fitness value (reproductive rate). 'flie resulting fitness landscape is a high dimensional mountain range. Populations Consider a high-dimensional sequence space. A particular explore fitness landscapes searching for elevated regions, ridges, biological function can be instantiated by some of the sequences. and peaks [20 27]. Each sequence i has a fitness value fi, which measures the ability of A question that has been extensively studied is how long does it the sequence i to encode the desired function. Biological fitness take for existing biological functions to improve under natural landscapes arc typically expected to have many peaks [29-31]. PLOS Computational Biology I www.ploscompbiol.org September 2014 I Volume 10 I Issue 9 I e1003818 EFTA00611788 The Time Scale of Evolutionary innovation We first study a broad peak of target sequences described as Author Summary follows: consider a specific sequence; any sequence within a certain Evolutionary adaptation can be described as a biased, Hamming distance of that sequence belongs to the target set. stochastic walk of a population of sequences in a high Specifically, we consider that the evolutionary process has dimensional sequence space. The population explores a succeeded, if the population discovers a sequence that differs fitness landscape. The mutation-selection process biases from the specific sequence in no more than a fraction r of the population towards regions of higher fitness. In this positions. We refer to the specific sequence as the target center and paper we estimate the time scale that is needed for c as the width (or radius) of the peak. For example, if L=100 and evolutionary innovation. Our key parameter is the length c -0.1, then the target center is surrounded by a cloud of of the genetic sequence that needs to be adapted. We show that a variety of evolutionary processes take approximately le sequences. For a single broad peak with width exponential time in sequence length. We propose a c, the target set contains at least 2d-/(3L) sequences, which is an specific process, which we call 'regeneration processes', exponential function of fitness landscape outside the broad and show that it allows evolution to work on polynomial peak is flat. We refer this binary fitness landscape as a broad peak time scales. In this view, evolution can solve a problem landscape. The population needs to discover any one of the target efficiently if it has solved a similar problem already. sequences in the broad peak, starting from some sequence that is not in the broad peak. We establish the following result. Theorem I. Consider a single search exploring a broad peak They can be highly nigger] clue to epistatic effects of mutations landscape with width c and mutation rate u. The following [32-34]. They can also contain large regions or networks of assertions hold: neutrality [20,21]. Empirical studies of short RNA sequences have revealed that the underlying fitness landscape has low peak density • if c <3/4, then there exists LooN such that for all sequence 1351: around 15 peaks in 42. sequences. spaces of sequence length L L-0, the expected discovery time is For the purpose of estimating the expected discovery time we L can approximate the fitness landscape with a binary step function at least expR3 4c) log 16 4c-1- 3 over the sequence space. We discuss two different approximations • if c 3/4, then for all sequence spaces of sequence length L, the (Figure 1). For the first approximation, we consider the scenario expected discovery time is at mast 0(L310. where fitness values below some threshold, Au, have negligible contribution; those sequences do not instantiate the desired Our result can be interpreted as follows (see Theorem S2 and function (either not at all or only below the minimum level that Corollary S2 in Text SI): (i) If a <3/4, then the expected discovery could be detected by natural selection). We approximate the time is exponential in L; and (ii) if c 3/4, then the expected rugged fitness landscape as follows: if fi <Ain then f -0; if discovery time is polynomial in L. Thus, we have derived a strong fi ≥ f,„;,, then I. The set of sequences with f i 2fmio constitutes dichotomy result which shows a sharp transition from polynomial the target set, and the remaining fitness landscape is neutral. to exponential time depending on whether a specific condition on The second approximation works as follows. Consider the c dors or does not hold. evolutionary process exploring a rugged fitness landscape where For the four letter alphabet most random sequences have the goal is to attain a fitness level7. Local maxima below!' slow Hamming distance 3L/4 from the target center. If the population down the evolutional), process to attain f*, because the is further away than this Hamming distance, then random drift evolutionary walk might get stuck in those local maxima. In order will bring it closer. If the population is closer than this Hamming to derive lower bounds for the expected discovery time, the rugged distance, then random drift will push it further away. This fitness landscape can be approximated as follows. Let / be the argument constitutes the intuitive reason that 3/4 is the critical fitness value of the highest local maximum below f . Then for threshold. If the peak has a width of less than 314, then we every sequence in a mountain range with a local maximum below prove that the expected discovery time by random drift is 7 we assign the fitness value ./). The mountain ranges with local exponential in the sequence length L (see Figure 2). This result maxima above/• are the target sequences. Note that the target set holds for any population size, N, as long as 4L > > N, which is includes sequences that start at the upslope of mountain ranges certainly the raw for realistic values of L and N. In the Text SI we with peaks above 7. .nalS, again we obtain a fitness landscape also present a more general result, where along with a single broad with clustered targets and neutral region, where the neutral region peak, instead of a flat landscape outside the peak we consider a consists of all sequences whose fitness values have been assigned to multiplicative fitness landscape and establish a sharp dichotomy 1 . The two approximations are illustrated in Figure 1. For result that generalizes Theorem 1 (see Corollary S2 in Text SI). 1 the second approximation generates larger target areas Remark 1. We highlight two important aspects of our results. than the first approximation and is therefore more lenient. Our key results for estimating the discovery time can now be I. First, when we establish exponential lower bounds for the formulated for binary fitness landscapes, but they apply to any expected discovery time, then these lower bounds hold even f the type of rugged landscape using one of the two approximations. We starting sequence is only a few steps away from the target set. note that our methods can also he applied for certain non-binary 2. Second, we present strong dichotomy results, and derive fitness landscapes, and an example of a fitness landscape with a mathematically the most precise and strongest form of the large gradient arising from multiplicative fitness effects is discussed boundary condition. in Sections 6 and 7 of Text SI. We now present our main results in the following order. We lint Let us now give a numerical example to demonstrate that estimate the discovery time of a single search aiming to find a exponential time is intractable. Bacterial life on earth has been single broad peak. Then we study multiple simultaneous searches around for at least 3.5 billion years, which correspond to 3 x 10t 3 for a single broad peak Finally, we consider multiple broad peaks hours. Assuming fast bacterial cell division of 20-30 minutes on that arc uniformly randomly distributed in sequence space. average we have at most 10" generations. The expected discovery PLOS Computational Biology I www.ploscompbiol.org 2 September 2014 I Volume 10 I Issue 9 I e1003818 EFTA00611789 The Time Scale of Evolutionary Innovation Projection of sequence space Projection of sequence space B (,) 4 C tL Projection of sequence space Projection of sequence space Figure 1. Approximations of a highly rugged fitness landscape by broad peaks and neutral regions. The figures depict examples of highly rugged fitness landscapes where the sequence space has been projected in one dimension. (A) Sequences with fitness below some level /„„ are functionally very different to the desired function, and selection cannot act upon them. All other sequences are considered as targets. The fitness landscape is approximated by a step function: if f, then b = 0, otherwise J = I. (B) Local maxima below the desired fitness threshold 1' are known to slow down the evolutionary random walk towards sequences that attain fitness at least 1'. We approximate the fitness landscape by broad peaks and neutral regions by increasing the fitness of every sequence that belongs in a mountain range with fitness below 1' to the maximal local maxima). below 1'. Note that the target set starts from the upslope of a mountain range whose peak exceeds fi. doi:10.1371/joumal.pcbi.1003818.9001 time for a sequence of length L. 1000 with a very large broad Theorem 2. In all cases where the lower bound on the expected peak of L. in 1/2 is approximately 10" generations; see Table 1. di:worm time if exponential, for polynomials pt(.), pie and If individual evolutionary processes cannot find targets in pit), for any starting sequence with Hamming distance at least polynomial time, then perhaps the success of evolution is based on 3414from the target center, the probabilityfor any one out of p 3(L) the fact that many populations arc searching independently and in independent multiple searches to reach the target set within pt(L) parallel for a particular adaptation. We prove that multiple, steps it at most Ilpi(L). independent parallel searches arc not the solution of the problem, If an evolutionary process takes exponential time, then if the starting sequence is far away from the target center. Formally polynomially many independent searches do not find the target we show the following result. in polynomial time with reasonable probability (for details see A exp(L) poly(L) 0 et L 0 ci. L BO 100 150 200 250 300 360 4 4 Sequence length, 1, Hamming Distance Hamming Distance Figure 2. Broad peak with different fitness landscapes. For the broad peak there is a specific sequence, and all sequences that are within Hamming distance el. are part of the target set. The fitness landscape is flat outside the broad peak (All( the width of the broad peak is c<3/4, then the expected discovery time is exponential in sequence length, (B) If the width of the broad peak is e 3/4, then the expected discovery time is polynomial in sequence length, L. (C) Numerical calculations for broad peak fitness landscapes. We observe exponential expected discovery time for r= 1/3 and e = 1/2, whereas polynomial expected discovery time for c= 3/4. doi:10.1371/joumal.pcbi.100381131002 PLOS Computational Biology I www.ploscompbiol.org 3 September 2014 I Volume 10 I Issue 9 I el 0038113 EFTA00611790 The Time Scale of Evolutionary Innovation Table 1. Numerical data for discovery time in flat fitness landscapes. 1 3 r.i r= - 3 2 4 L=103 1.021011 136.107 183 L=101 3.8910I” 1.28.104' 2666 Numerical data for the discovery time of broad peaks with width r=1/3.1/2. and 3/4 embedded In flat fitness landscapes. First the discovery time Is computed for small values of Las shovm In Figure 21Q. Then the mexaential growth Is extrapolated to L =100 and L=1000, respectively. We show the discovery times for r=1/2• and 1/3. For r= 3/4 the values are polynomial In L. dottimmoumai.pebi.toosstatoot Theorem 55 in the Text Sp. We also show an informal and What are then adaptive problems that can be solved by approximate calculation of the success probability for M evolution in polynomial time? We propose a "regeneration independent searches, as follows: if the expected discovery time process". The basic idea is that evolution can solve a new is exponential (say, ei), then the probability that all M independent problem efficiently, if it is has solved a similar problem already. searches fail upto h steps is at least exp(—(MM/d) (i.e., the Suppose gene duplication or genome rearrangement can give rise success probability within b steps of any of the searches is at most to starting sequences that arc at most k point mutations away from — exp(—(Mb),40), when the starting sequence is far away from the target set, where k is a number that is independent of L. It is the target center. In such a case, one could quickly exhaust the important that starting sequences can be regenerated again and physical resources of all entire planet. The estimated number of again. We prove that Lk+1 many searches are sufficient in order to bacterial cells [36] on earth is about 1030. To give a specific find the target in polynomial time with high probability (see example let us assume that there are 1024 independent searches, Figure 4 and Section 10 in Text SI). The upper bound, Lk I, each with population size N 106. The probability that at least holds even for neutral drift (without selection). Note that in this one of those independent searches succeeds within 1014 genera- case, the expected discovery time for any single search is still tions for sequence length L- 1000 and broad peak of c= 1/2 is exponential. Therefore, most of the Lk+ I searches do not succeed less than 10-26. ill polynomial time; however, with high probability one of the In our basic model, individual mutants arc evaluated one at a searches succeeds in polynomial time. 'Cliere are two key aspects to time. The situation of many mutant lineages evolving in parallel is the "regeneration process": (a) the starting sequence is only a small similar to the multiple searches described above. As we show that number of steps away from the target; and (b) the starting whenever a single search takes exponential time, multiple sequence can be generated repeatedly. This process enables independent searches do not lead to polynomial time solutions, evolution to overcome the exponential barrier. The upper bound, Lk+ i may possibly be further reduced, if selection and/or our results imply intractability for this case as well. We now explore the case of multiple broad peaks that are recombination are included. uniformly and randomly distributed. Consider that there arc m target centers. Around each target center there is a selection Discussion gradient extending up to a distance cL. Formally we can consider The regeneration process formalizes the role of several existing any fitness function f that assigns zero fitness to a sequence whose ideas. First, it tics in with the proposal that gene duplications and Hamming distance exceeds cL from all the target centers, which in germane rearrangements are major events leading to the emer- particular is subsumed by considering the multiple broad peaks gence of new genes [43]. Second, evolution can be seen as a where around each center we consider a broad peak of target set tinkerer playing around with small modifications of existing with peak width c. We establish the following result: sequences rather than creating entirely new ones [44]. Third, the Theorem 3. Consider a single search under the multiple broad process is related to Gillespie's suggestion [29] that the starting peak fitness landscape of m« ,0 target centers chosen uniformly sequence for an evolutionary search must have high fitness. In our at random, with peak width at most c for each center and c <3/4. theory, proximity in fitness value is replaced by proximity in Then with high probability, the expected discovery time of the target sequence space. However, our results show that proximity alone is set is at least (1/m)exp[2L(3/4 — insufficient to break the exponential barrier, and only when Whether or not the function (1/m)expE2L(3/4—O21 is combined with the process of regeneration it yields polynomial exponential in L depends on how m changes with L. But even if discovery time with high probability. Our process can also explain we assume exponentially many broad peak centers, m, with peak the emergence of orphan genes arising from non-coding regions width cL where c< 3/4, we need not obtain polynomial time [45). Section 12 of the Text SI discusses the connection of our (Figure 3 and Theorem S6 in Text SI). approach to existing results. It is known that recombination may accelerate evolution on There is one other scenario that must be mentioned. It is certain fitness landscapes [28,37-39], and recombination may also passible that certain biological functions are hyper-abundant in slow down evolution on other fitness landscapes [40). Recombi- sequence space [21] and that a process generating a large number nation, however, reduces the discovery time only by at most a of random sequences will find the function with high probability. linear factor in sequence length [28,37,38,41,44 A linear or even For example, Bartel & Szostak [46] isolated a new ribozyme from polynomial factor improvement over all exponential function does a pool of about 1015 random sequences of length L.'220. While not convert the exponential function into a polynomial one. such a process is conceivable for small effective sequence length, it Hence, recombination can make a significant difference only if the cannot represent a general solution for large L. underlying evolutionary process without recombination already Our theory has clear empirical implications. 'The regeneration operates in polynomial time. process can be tested ill systems of ill vitro evolution [47]. A PLOS Computational Biology I www.ploscompbiol.org 4 September 2014 I Volume 10 I Issue 9 I e1003818 EFTA00611791 The Time Scale of Evolutionary Innovation A Start of search B los 10 •"6 I . a' O co lit .77. e 10' I 10? o 10° 10 55 20 25 30 35 40 45 50 Sequence length. L C 10 D 10 E 104 .2 le oa 08 „, n le O 06 08 O C o s. to 1d g l 04 E 8 U1 02 to 02 10' 00 00 o KO 1.5 2.0 1 13 20 25 30 35 40 45 so 12 15 18 21 24 27 lime limit -7 Sequence length. L Sequence length, Figure 3. The search for randomly, uniformly distributed targets in sequence space. (A) The target set consists of m random sequences; each one of them is surrounded by a broad peak of width up to eL. The figure shows a pictorial illustration where the L-dimensional sequence space is projected onto two dimensions. From a randomly chosen starting sequence outside the target set, the expected discovery time is at least (1/m)expl2L(3/4—c)2I, which can be exponential in L. (B) Computer simulations showing the average discovery time of m=100, 150, and 200 targets, with r= 1/3. We observe exponential dependency on L. The discovery time is averaged over 200 runs. (C) Success probability estimated as the fraction of the 200 searches that succeed in finding one of the target sequences within 104 generations. The success probability drops exponentially with L. (D) Success probability as a function of time for L=42, 45. and 4S. (E) Discovery time for a large number of randomly generated target sequences. Ether m= 2'434 = or m=4" sequences were generated. For h =0 and b=3 the target set consists of balls of Hamming distance 0 and 3 (respectively) around each sequence. The figure shows the average discovery time of 100 runs. As expected we observe that the discovery time grows exponentially with sequence length, L. doi:10.1371/joumal.pcbi.10038181003 starting sequence can be generated by introducing k point mutations in a known protein encoding sequence of length L. If these point mutations destroy the finwtion of the protein, then the Process regenera ing starting sequence expected discovery time of any one attempt to find the original at Hamming distance k from target sequence should be exponential in L. But only polynomially many searches in L are required to find the target with high probability 11111 in polynomially many steps. The same setup can be used to explore whether the biological function can be found elsewhere in sequence space: the evolutionary trajectory beginning with the Target starting sequence could discover new solutions. Our theory also ti highlights how important it is to explore the distribution of biological functions in sequence space both for RNA [20,21,35,46] and in the protein llllivetse [48]. In summary, we have developed a the/try that allows us to Hamming distance estimate time scales of evolutionary trajectories. We have shown that various natural processes of evolution take exponential time as Figure 4. Regeneration process. Gene duplication (or possibly some function of the sequence length, L. In some cases we have other process) generates a steady stream of starting sequences that are established strong dichotomy results for precise boundary condi- a constant number k of mutations away from the target Many searches tions. We have proposed a mechanism that allows evolution in drift away from the target, but some will succeed in polynomially many steps. We prove that L" I searches ensure that with high probability polynomial time scales. Some interesting directions of future work some search succeed in polynomially many steps. are as follows: (I) Consider various forms of rugged fitness doi:10.1371/joumal.pcbi.10038181004 landscapes and study more refined approximations as compared to PLOS Computational Biology I www.ploscompbiol.org September 2014 I Volume 10 I Issue 9 I e1003818 EFTA00611792 The Time Scale of Evolutionary Innovation the ones we consider; and then estimate the expected discovery the result for Theorem I is obtained as follows: If c < —' 3 then for all time for the refined approximations. (2) While in this paper we 4 characterize the difference between exponential and ()drumlinl 3+4c , PI IL -fr i-I • t• some a> II, and cL<n < we have ' ' for the expected discovery time, more refined analysis (such as 8 efficiency for polynomial time, like cubic vs quadratic time) for hence the sequence ion grows geometrically for a linear length in L specific fitness landscapes using mechanisms like recombination is Then, H(cL,i) ).4kL for all states i> et (i.e., for all sequences another interesting problem. outside of the target set). This corresponds to case I of Theorem I. • PL—n.1—retl On the other hand, if 3 then it is 51, and case 2 of Materials and Methods 4 a Theorem I is derived (for details see Corollary 2 in Text SI). Our te,ult, are hard on a mathematical analysis of the underlying stochastic processes. For hlarkov chains on the one- dimensional grid, we describe recurrence relations for the Intuition behind Theorem 2 expected hitting time and present lower and upper bounds on The basic intuition for the result is as follows: consider a single the expected hitting time using combinatorial analysis (see Text search for which the expected hitting time is exponential. 'Chen for SI for details). We now present the basic intuitive arguments of the single search the probability to succeed in polynomially many the main results. steps is negligible (as otherwise the expectation would not have been exponential). In case of independent searches, the indepen- Markov chain on the one-dimensional grid dence ensures that the probability that all searches fail is the For a single broad peak, due to symmetry we can interpret the product of the probabilities that every single search fails. Using the evolutionary random walk as a Markov chain on the one- above arguments we establish Theorem 2 (for details see Section 8 dimensional grid. A sequence of type i is i steps away from the in Text SI). target, where i is the Hamming distance between this sequence and the target. The probability that a type i sequence imitates to a type Intuition behind Theorem 3 i— I sequence is given by itin3L). The stochastic process of the For this result, it is first convenient to view the evolutionary walk evolutionary random walk is a Markov chain on the one-dimensional taking place in the sequence space ofall sequences oflength L, under grid 0,1 L no selection. Each sequence has 3L neighbors, and considering that a point mutation happens, the transition probability to each of them is The basic recurrence relation I Consider a Markov chain on the one-dimensional grid, and let 3L. The imderhing Markov chain due to symmetry has fast mixing !pi/,o denote the expected hitting time from i to j. The general time, i.e., the number of steps to converge to the stationary recurrence relation for the expected hitting time is as follows: distribution (the mixing time) is O(L logL). Again by symmetry 3 the stationary distribution is the uniform distribution. If c< 4' - then H(i,i)m 1 + Put tH(1,i 1)1- ,i— 1)+ &HUM; (1) from Theorem I we obtain that the expected time to reach a single for/ <i< L, with boundary condition H(j,()= 0. The interpreta- broad peak is exponential. By union bound, if m< <41', the tion is as follows. Given the current state i, if ipsj, at least one probability to reach any of them broad peaks within O(LlogL) steps transition will be made to a neighboring state 1, with probability is negligible. Since after the first O(L log L) steps the Markov chain Prd, from which the hitting time is M(tt). converges to the stationary distribution, then each step of die process can be interpreted as selection of sequences uniformly at random Intuition behind Theorem 1 among all sequences. Using Hoefiding's inequality, we show that with Theorem I is derived by obtaining precise bounds for the exp(243/4—O2-n high probability, in expectation such steps are recurrence relation of the hitting time (Equation I). Consider that m Pu_i >0 for all j<k5i pogo ss towards state j is alums required before a sequence is found that belongs to the target set. possible), as otherwise/ is never reached from L We show (.see Lemma Thus we obtain the result of Theorem 3 (for details see Section 9 in 2 in the Text SI) that we can write Mt° as a sum. Text SI). If (O. E,, 1" b,,, where bit is the sequence defined as: Remark about techniques An important aspect of our work is that we establish our results (I) bom, using elementary techniques for analysis of Markov chains. The P (2) use of more advanced mathematical machinery, such as (ii) b„ 1+ PL_„.L_„tib„_i for n>0. martingales [4g] or drift analysis [50,51], can possibly be used to derive more refined results. While in this work our goal is to distinguish between exponential and polynomial time, whether the techniques from [49 51] can lead to a more refined The basic intuition obtained from Equation 2 is as follows: (i) If characterization within polynomial time is an interesting direc- PL-A.L-"+1 tion for future work. A, for some constant A> I, then the sequence bit PL-A.L-re- grows at least as fast as a geometric series with factor 2. cul On the Supporting Information PL-n.1.-at other hand, if n 51 and P for some L-n.1-n - I Test SI Detailed proofs for "The Time Scale of Evolutionary constant a >0, then the sequence b,, grows at most as fast as an Innovation." arithmetic series with difference litt. From the above case analysis (PDF) I21O5 Computational Biology I wvnv.ploscompbiol.org 6 September 2014 I Volume 10 I Issue 9 I e1003818 EFTA00611793 The Time Scale of Evolutionary Innovation Acknowledgments Author Contributions We thank Nick Barton and Daniel Weissman for helpful discussions and Conceived and designed the experiments: KC AP BA MAN..laslyzed the pointing us to relevant literature. data: KC AP BA MAN. Wrote the paper: KC AP BA MAN. References I. Alhrood AC. (irnixinger.IP, Knoll AFL Burch 1W. Anderson MS. et al. (2009) 19. Gillespie JH (1984) Molecular evolution over the mutational landscape. Controls on development and diversity or early anthean ommainints. Pow Nail Evolution 38: 1116-1129. And Sri USA 106: 9548-9555.

Entities

0 total entities mentioned

No entities found in this document

Document Metadata

Document ID
2efecf29-f544-490a-9630-a1b9239d5ccf
Storage Key
dataset_9/EFTA00611788.pdf
Content Hash
6b8505d81d74901dd5471df085582fcb
Created
Feb 3, 2026