HOLLAND Barbara's profile

HOLLAND Barbara

  • School of Natural Sciences (Mathematics), University of Tasmania, Hobart, Australia
  • Evolutionary Biology, Genetics and population Genetics, Probability and statistics
  • recommender, manager

Recommendations:  0

Review:  1

Educational and work
I apply mathematics and statistics to problems in evolutionary biology or population genetics. In particular, my research has focused on problems in phylogenetics. I am interested in developing tools that can assess if sequence data is well explained by a simple tree model or if more complex processes such as hybridisation, recombination or convergent selection are at work. The areas of mathematics/statistics that I use frequently include stochastic models, continuous time Markov chains, combinatorial optimization, maximum likelihood, and simulation.

Review:  1

2020-12-24

A linear time solution to the Labeled Robinson-Foulds Distance problem

Recommended by based on reviews by Gabriel Cardona, Jean-Baka Domelevo Entfellner, Barbara Holland and 1 anonymous reviewer

Comparing reconciled gene trees in linear time

Unlike a species tree, a gene tree results not only from speciation events, but also from events acting at the gene level, such as duplications and losses of gene copies, and gene transfer events [1]. The reconciliation of phylogenetic trees consists in embedding a given gene tree into a known species tree and, in doing so, determining the location of these gene-level events on the gene tree [2]. Reconciled gene trees can be seen as phylogenetic trees in which internal node labels discriminate between different gene-level events. Comparing them is essential for assessing the performance of various reconciliation methods (e.g. [3]).
A paper describing an extension of the widely used Robinson-Foulds (RF) distance [4] to trees with labeled internal nodes was presented earlier this year [5]. This distance, called ELRF, is based on edge edits and coincides with the RF distance when all internal labels are identical; unfortunately, the ELRF distance is very costly to compute. In the present paper [6], the authors introduce a distance called LRF, which is inspired by the TED (Tree Edit Distance [7]) and is based on node edits. Like the ELRF, the new distance coincides with the RF distance for identically-labeled internal nodes, but it has the additional desirable feature of being computable in linear time. Also, in the ELRF distance, an edge can be deleted only if it connects nodes with the same label. The new formulation does not have this restriction, and this is, in my opinion, an improvement, since the restriction makes little sense in the comparison of reconciled gene trees.
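To make the baseline concrete, here is a minimal sketch of the classical RF distance that both ELRF and LRF reduce to when all internal labels are identical. It is not the LRF algorithm of [6] and it ignores internal labels entirely. It assumes rooted trees, compares clades (the leaf sets below internal nodes) rather than bipartitions, and makes no attempt to run in linear time. The nested-tuple tree encoding and the names clades and rf_distance are hypothetical, chosen only for this illustration.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a string, an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the set of leaves found below each internal node."""
    found: Set[FrozenSet[str]] = set()

    def leaves_below(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves_below(child) for child in node))
        found.add(below)
        return below

    leaves_below(tree)
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Count the clades that appear in exactly one of the two trees."""
    return len(clades(t1) ^ clades(t2))


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(rf_distance(t1, t2))
```

In this example the two trees share the clades {a, b, c} and {a, b, c, d} and differ only in {a, b} versus {a, c}, so the distance counts those two unmatched clades.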
The authors show the pertinence of this new distance by studying the impact of taxon sampling on reconciled gene trees when internal labels are computed via a method based on species overlap. The linear algorithm to compute the LRF distance presented in the paper has been implemented, and the software, written in Python, is freely available for the community to use. I bet that the LRF distance will be widely used in the coming years!

References

[1] Maddison, W. P. (1997). Gene trees in species trees. Systematic biology, 46(3), 523-536. doi: https://doi.org/10.1093/sysbio/46.3.523
[2] Boussau, B., and Scornavacca, C. (2020). Reconciling gene trees with species trees. Phylogenetics in the Genomic Era, p. 3.2:1–3.2:23.
[3] Doyon, J. P., Chauve, C., and Hamel, S. (2009). Space of gene/species trees reconciliations and parsimonious models. Journal of Computational Biology, 16(10), 1399-1418. doi: https://doi.org/10.1089/cmb.2009.0095
[4] Robinson, D. F., and Foulds, L. R. (1981). Comparison of phylogenetic trees. Mathematical biosciences, 53(1-2), 131-147. doi: https://doi.org/10.1016/0025-5564(81)90043-2
[5] Briand, S., Dessimoz, C., El-Mabrouk, N., Lafond, M. and Lobinska, G. (2020). A generalized Robinson-Foulds distance for labeled trees. BMC Genomics 21, 779. doi: https://doi.org/10.1186/s12864-020-07011-0
[6] Briand, S., Dessimoz, C., El-Mabrouk, N. and Nevers, Y. (2020). A linear time solution to the labeled Robinson-Foulds distance problem. bioRxiv, 2020.09.14.293522, ver. 4 peer-reviewed and recommended by PCI Mathematical and Computational Biology. doi: https://doi.org/10.1101/2020.09.14.293522
[7] Zhang, K., and Shasha, D. (1989). Simple fast algorithms for the editing distance between trees and related problems. SIAM journal on computing, 18(6), 1245-1262. doi: https://doi.org/10.1137/0218082
