Statistical Methods in Phylogenetics and Macroevolution
The DFG Emmy Noether group of Sebastian Höhna develops statistical methods for fields ranging from evolutionary biology, population genetics and phylogenetics to paleo-phylogenetics. The group is primarily funded by a DFG Emmy Noether Project and a second DFG research grant. Our main goal is to develop statistical methods to study the biological processes that produced current-day biodiversity. To this end, we take a phylogenetic approach to describe the relationships among species (both extant and extinct), with a specific focus on the divergence times between species. Finally, we want to study which processes drive historical biodiversity and are responsible for its fluctuations, e.g., major increases and decreases, over geological timescales. Each project and the roles of the group members are described below.
Inference of Microevolutionary Processes
In the DFG-funded project within the SPP 1991 Taxon-OMICS, our aim is to develop and test different demographic and species delimitation models. Currently, we focus this line of research on two widespread European firefly species (Lampyris noctiluca and Lamprohiza splendidula) as well as one North American species (Photinus pyralis). In addition to their fascinating ability to produce light, these three species of fireflies are excellent model organisms to study recent local adaptation. Each species has a widespread geographical range which it must have acquired recently (after the last ice age). Furthermore, the females are neotenic in the two European species, which means that only males can fly. Nevertheless, Lampyris noctiluca managed to establish populations spanning from the Iberian Peninsula to the United Kingdom and Finland. We will try to learn when these populations were established and whether they are still connected to other populations.
Using the fireflies as a model system, we want to study whether the currently known species are indeed only one species each, or whether the sub-populations are in fact distinct species. Furthermore, we will explore how much gene flow exists between the populations. This will be particularly interesting because males and females differ in their ability to move, and it provides a reference for how much gene flow is realistic in our deep-time study. Thus, this microevolution study serves as a focused inspection of the underlying population genetic process, i.e., under a magnifying glass, which we wish to extrapolate to macroevolutionary processes in deep time.
Figure 1: Schematic overview of demographic hypotheses for samples from three locations. The top left plot shows the genealogy (i.e., coalescent tree) of the samples from three different hypothetical locations, four samples from each location. The gray boxes represent the species/population history with changing population sizes. The second plot in the top row represents a hypothesis with a single species, whereas the two right plots in the top row are hypotheses with two species. In the bottom row we have one hypothesis with two species (left) and three hypotheses with three species. Some of these hypotheses, for example, the bottom right plot, show changes in population sizes.
This project is led by Dr. Ana Catalán together with doctoral candidate Ronja Billenstein.
Robust Estimation of Gene Trees
As mentioned above, gene trees depict the genealogical relationship among samples for a single gene/locus. Our work relies heavily on gene trees to infer demographic histories, species trees and the causes of discordance between gene trees and species trees (see below). Thus, our other projects depend on reliable and robust gene tree estimates. In this work, we focus on how to estimate gene trees reliably by developing more robust statistical methods.
So far, we have found that single gene tree estimates cannot even recover well-known and established clades, and show discordances that could only be explained either by (a) extremely high rates of gene duplication and loss, or (b) gene flow among species that have been evolutionarily separated for more than 100 million years. Both explanations are clearly unrealistic. Instead, these results show that we cannot yet robustly estimate gene trees in deep time. Our approach is to develop more biologically realistic, and thus more complex, models of how DNA changes over millions of years.
This project is led by doctoral candidates Luiza Fabreti and Killian Smith.
Understanding Discordance Between Species Trees and Gene Trees
It is widely acknowledged that gene trees can differ from species trees. The following biological scenarios can cause incongruent gene trees: (a) simple population genetic processes (i.e., the coalescent), (b) migration and thus gene flow between species/populations, and (c) gene duplication and gene loss. However, it is not understood how much each of these processes actually contributes to the observed variation in gene trees. For example, we know very little about long-term rates of gene duplication and loss, and we do not know how diverged species can be while still exchanging genes.
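To make scenario (a) concrete, the following is a minimal sketch in Python and not the group's own method. It uses the standard three-species result under the multispecies coalescent: for a species tree ((A,B),C) with an internal branch of length T in coalescent units, a gene tree disagrees with the species tree with probability (2/3)·exp(-T). The function names, the seed and the number of replicates are hypothetical choices for illustration, and gene flow and gene duplication/loss are deliberately ignored.

```python
import math
import random


def discordance_probability(internal_branch):
    """Analytic probability that a three-taxon gene tree disagrees with
    the species tree ((A,B),C) under the coalescent alone.

    internal_branch: internal branch length in coalescent units.
    """
    return (2.0 / 3.0) * math.exp(-internal_branch)


def simulate_discordance(internal_branch, n_reps=100_000, seed=1):
    """Monte Carlo estimate of the same probability (illustrative only)."""
    rng = random.Random(seed)
    discordant = 0
    for _ in range(n_reps):
        # Waiting time until the A and B lineages coalesce (rate 1 per pair).
        if rng.expovariate(1.0) < internal_branch:
            continue  # coalesced in the AB ancestor: gene tree matches
        # Otherwise all three lineages reach the root population, where
        # the first pair to coalesce is uniform over the 3 possible pairs.
        # Index 0 stands for the pair (A,B), the only concordant outcome.
        if rng.randrange(3) != 0:
            discordant += 1
    return discordant / n_reps


if __name__ == "__main__":
    for t in (0.1, 1.0, 3.0):
        print(t, discordance_probability(t), simulate_discordance(t))
```

Short internal branches give discordance close to 2/3, whereas long branches make coalescent-driven discordance rare. This is why discordance among anciently diverged lineages is hard to attribute to the coalescent alone.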
Figure 2: Schematic overview of the main biological processes resulting in discordance between gene trees and species trees. On the left, we have four example species of fireflies for which either genome sequence data are publicly available (Photinus pyralis) or for which we are producing new reference genomes. From the genomes, we extract thousands of orthologous loci (e.g., genes and non-coding regions) from which we estimate so-called gene trees. Using these gene tree estimates, we can (i) estimate the species phylogeny and also (ii) identify if the gene tree and species tree match. If not, we can test if (a) incomplete lineage sorting, (b) gene flow, or (c) gene duplication and loss are responsible for the discordance.
We are currently working on new statistical methods that can, for the first time, test these three competing biological causes as explanations of gene tree variation.
This project was led by Allison Hsiang.
Inference of Macroevolutionary Processes
Our ultimate goal is to infer the process that is responsible for the observed patterns of historical biodiversity. Specifically, we are interested in learning whether environmental variables, such as historical temperature or CO2 levels, and species-specific variables, such as habitat, diet and body size, influence biodiversity. Additionally, we are interested in how major events, such as mass extinctions, have impacted the biodiversity of different groups and whether there are species-specific factors that influenced survival probabilities.
We use stochastic processes of speciation and extinction to model how biodiversity has changed over geological time. We develop new statistical models where the speciation and extinction rates change over time and among lineages and identify correlations to genetic, phenotypic and/or environmental factors that impact speciation and extinction rates. For example, we developed a statistical method to identify correlations between rates of diversification and environmental variables, such as changes in atmospheric CO2. In our future work, we want to incorporate fossil occurrence information in our models of lineage diversification.
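As a rough illustration of environment-dependent diversification, the sketch below assumes an exponential link between an environmental variable and the speciation and extinction rates, with rates held constant within each time interval. This link function, the parameter names and the numbers in the usage example are assumptions for illustration and not necessarily the model used in our published method. The code computes the expected number of lineages of a birth-death process, which grows as exp of the integrated net diversification rate (speciation minus extinction).

```python
import math


def rate_from_environment(base_rate, beta, env_value):
    """Assumed exponential link: rate = base_rate * exp(beta * env_value)."""
    return base_rate * math.exp(beta * env_value)


def expected_lineages(times, env, lambda0, mu0, beta_lambda, beta_mu, n0=1.0):
    """Expected lineage counts under piecewise-constant birth-death rates.

    times: increasing time points, forward in time.
    env:   env[i] is the environmental value on [times[i], times[i+1]).
    Returns the expected number of lineages at every time point.
    """
    log_n = math.log(n0)
    trajectory = [n0]
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        lam = rate_from_environment(lambda0, beta_lambda, env[i])
        mu = rate_from_environment(mu0, beta_mu, env[i])
        # Expected diversity changes by exp((lambda - mu) * dt) per interval.
        log_n += (lam - mu) * dt
        trajectory.append(math.exp(log_n))
    return trajectory


if __name__ == "__main__":
    # Hypothetical values: a rising environmental variable combined with a
    # negative beta_lambda slows speciation over time.
    times = [0.0, 10.0, 20.0, 30.0]
    env = [0.0, 1.0, 2.0]
    print(expected_lineages(times, env, 0.2, 0.1, -0.5, 0.0))
```

A negative coefficient means the rate declines as the environmental variable increases. Setting both coefficients to zero recovers a constant-rate birth-death process.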
Figure 3: Example of diversification rates (i.e., speciation and extinction rates) in daisies and grasses, which are negatively correlated with environmental CO2 levels.
This project is led by doctoral candidate Bjørn Kopperud.