Recommendation

Bounding the reticulation number for three phylogenetic trees

Simone Linz based on reviews by Guillaume Scholz and Stefan Grünewald

A recommendation of:

When Three Trees Go to War

Leo van Iersel and Mark Jones and Mathias Weller (2023), HAL, ver.3, peer-reviewed and recommended by PCI Mathematical and Computational Biology https://hal.science/hal-04013152v3

Read preprint in preprint server Now published in Peer Community Journal

Abstract

When Three Trees Go to War

How many reticulations are needed for a phylogenetic network to display a given set of k phylogenetic trees on n leaves? For k = 2, Baroni, Semple, and Steel [Ann. Comb. 8, 391-408 (2005)] showed that the answer is n − 2. Here, we show that, for k ≥ 3, the answer is at least (3/2 − ε)n. Concretely, we prove that, for each ε > 0, there is some n ∈ N such that three n-leaf caterpillar trees can be constructed in such a way that any network displaying these caterpillars contains at least (3/2 − ε)n reticulations. The case of three trees is interesting since it is the easiest case that cannot be equivalently formulated in terms of agreement forests. Instead, we base the result on a surprising lower bound for multilabelled trees (MUL-trees) displaying the caterpillars. Indeed, we show that one cannot do (more than an ε) better than the trivial MUL-tree resulting from a simple concatenation of the given caterpillars. The results are relevant for the development of methods for the Hybridization Number problem on more than two trees. This fundamental problem asks to construct a binary phylogenetic network with a minimum number of reticulations displaying a given set of phylogenetic trees.

Phylogenetic Networks, Tree Containment, Caterpillar Tree, Hybridization Number

Submission: posted 07 March 2023, validated 13 March 2023
Recommendation: posted 05 October 2023, validated 12 October 2023

Cite this recommendation as:
Linz, S. (2023) Bounding the reticulation number for three phylogenetic trees. Peer Community in Mathematical and Computational Biology, 100187. https://doi.org/10.24072/pci.mcb.100187

Recommendation

Reconstructing a phylogenetic network for a set of conflicting phylogenetic trees on the same set of leaves has remained an active strand of research in mathematical and computational phylogenetics since 2005, when Baroni et al. [1] showed that the minimum number of reticulations h(T,T') needed to simultaneously embed two rooted binary phylogenetic trees T and T' into a rooted binary phylogenetic network is one less than the size of a maximum acyclic agreement forest for T and T'. In the same paper, the authors showed that h(T,T') is bounded from above by n-2, where n is the number of leaves of T and T', and that this bound is sharp. That is, for a fixed n, there exist two rooted binary phylogenetic trees T and T' such that h(T,T')=n-2.

Since 2005, many papers have been published that develop exact algorithms and heuristics to solve the above NP-hard minimisation problem in practice, which is often referred to as Minimum Hybridisation in the literature, and that further investigate the mathematical underpinnings of Minimum Hybridisation and related problems. However, many such studies are restricted to two trees, and much less is known about Minimum Hybridisation when the input consists of more than two phylogenetic trees, which is the more relevant case from a biological point of view.

In [2], van Iersel, Jones, and Weller establish the first lower bound for the minimum reticulation number for more than two rooted binary phylogenetic trees, with a focus on exactly three trees. The above-mentioned connection between the minimum number of reticulations and maximum acyclic agreement forests does not extend to three (or more) trees. Instead, to establish their result, the authors use multi-labelled trees as an intermediate structure between phylogenetic trees and phylogenetic networks to show that, for each ε>0, there exist three caterpillar trees on n leaves such that any phylogenetic network that simultaneously embeds these three trees has at least (3/2 - ε)n reticulations. Perhaps unsurprisingly, caterpillar trees were also used by Baroni et al. [1] to establish that their upper bound on h(T,T') is sharp. Structurally, these trees have the property that each internal vertex is adjacent to a leaf. Each caterpillar tree can therefore be viewed as a sequence of characters, and it is exactly this viewpoint that is heavily used in [2]. More specifically, sequences with short common subsequences correspond to caterpillar trees that need many reticulations when embedded in a phylogenetic network. It would consequently be interesting to further investigate connections between caterpillar trees and certain types of sequences. Can they be used to shed more light on bounds for the minimum reticulation number?
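The sequence viewpoint can be made concrete with a small sketch. It assumes that a caterpillar is encoded as the order of its leaf labels along the backbone (ignoring that the two leaves of the cherry are interchangeable) and computes the length of a longest common subsequence of two such sequences with the standard dynamic programme. The function name and the example sequences are hypothetical and are not the construction used in [2]; the sketch only measures how short the common subsequences are and does not compute a reticulation number.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of the sequences a and b."""
    # prev[j] is the LCS length of the already processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend a common subsequence by the matching symbol.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last symbol of either prefix.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical leaf orders of three caterpillars on six leaves (assumed encoding).
cat_a = (1, 2, 3, 4, 5, 6)
cat_b = (6, 5, 4, 3, 2, 1)
cat_c = (1, 3, 2, 4, 6, 5)

print(lcs_length(cat_a, cat_b))  # reversed order: very short common subsequence
print(lcs_length(cat_a, cat_c))  # similar order: long common subsequence
```

Under this encoding, pairs such as `cat_a` and `cat_b` are the kind of "very different" sequences that the recommendation associates with caterpillars requiring many reticulations.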

References

[1] Baroni, M., Grünewald, S., Moulton, V., and Semple, C. (2005) "Bounding the number of hybridisation events for a consistent evolutionary history". J. Math. Biol. 51, 171–182. https://doi.org/10.1007/s00285-005-0315-9

[2] van Iersel, L., Jones, M., and Weller, M. (2023) "When three trees go to war". HAL, ver. 3, peer-reviewed and recommended by Peer Community in Mathematical and Computational Biology. https://hal.science/hal-04013152/

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Funding:
Netherlands Organization for Scientific Research (NWO) KLEIN grant OCENW.KLEIN.125

Reviews

Evaluation round #2

DOI or URL of the preprint: https://hal.science/hal-04013152v2

Version of the preprint: 2

Author's Reply, 25 Sep 2023

Dear Reviewers and Recommender,

Thank you for the additional feedback. Concerning the only remaining issue, we added a phrase to Case 2a (page 9) to make absolutely clear that the case in which all other caterpillars have no leaves mapped to leaves of L_Q is indeed included in the formulation of this case.

Just to reiterate, we are talking about the property
P(T) = caterpillar T has a leaf embedded in a leaf of L_Q
and the associated set of caterpillars that satisfy the property:
A = { T | P(T) }
Now, Case 2a applies if all caterpillars in A have the same parity as Q. In particular, if A is empty, then the condition of Case 2a is satisfied, since universally quantified statements over the empty set are always true. That's what I wanted to point out in my previous reply to the reviewer. Admittedly, I could have been more verbose and I apologize for my previous brevity.
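The vacuous-truth argument can be illustrated with a minimal Python sketch. The function name `case_2a_applies` and the encoding of parities as the integers 0 and 1 are assumptions made purely for illustration; they do not appear in the manuscript.

```python
def case_2a_applies(parity_of_q, parities_in_a):
    """Return True if every caterpillar in A has the same parity as Q.

    parities_in_a lists the parities of the caterpillars in the set A,
    that is, those with a leaf embedded in a leaf of L_Q (hypothetical encoding).
    """
    # all() over an empty iterable is True, mirroring a universally
    # quantified statement over the empty set.
    return all(p == parity_of_q for p in parities_in_a)


print(case_2a_applies(0, []))      # A is empty: the condition holds vacuously
print(case_2a_applies(0, [0, 0]))  # all caterpillars in A agree with Q
print(case_2a_applies(0, [0, 1]))  # one caterpillar in A disagrees with Q
```

The first call corresponds to the situation raised by the reviewer, in which no other caterpillar has a leaf embedded in a leaf of L_Q.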

https://doi.org/10.24072/pci.mcb.100187.ar2

Decision by Simone Linz, posted 17 Sep 2023, validated 18 Sep 2023

Dear Leo, Mark, and Mathias,

Thank you for revising your preprint "When three trees go to war" and addressing all comments. There is one minor outstanding issue regarding the proof of Claim 1 (Case 2). Could you please have a look at the claim and address the referee's comment? If no change is necessary because the situation, as described, is already covered by Case 2a, please provide a detailed response in your reply to the comment.

Best wishes,
Simone

https://doi.org/10.24072/pci.mcb.100187.d2

Reviewed by Guillaume Scholz, 14 Sep 2023

I am happy with the changes made to the manuscript regarding my comments.

Yet, I am not convinced by the reply to my comment 2 (about case 2a in the proof of Claim 1). My concern was the possibility of the other two caterpillars being embedded in (exactly one) leaf of N_r that is NOT a leaf of L_Q. Then the constraint “all caterpillars with a leaf embedded into a leaf of L_Q have the same parity as Q” does not apply to these caterpillars. They are not, under my assumption above, embedded in a leaf of L_Q. So unfortunately, I still don't see why this situation is covered in case 2a.

https://doi.org/10.24072/pci.mcb.100187.rev21

Evaluation round #1

DOI or URL of the preprint: https://hal.science/hal-04013152v1

Version of the preprint: 1

Author's Reply, 22 Aug 2023

Download author's reply https://doi.org/10.24072/pci.mcb.100187.ar1

Decision by Simone Linz, posted 17 Jul 2023, validated 18 Jul 2023

Dear Leo, Mark, and Mathias,

Thank you for submitting your preprint "When three trees go to war" to PCI Math Comp Biol. I have received two expert reports that you can read below. Both reports comment positively on the relevance of your work, the clarity of the writing, and the impact it may have on future work in the theoretical as well as the more applied space of research on inferring phylogenetic networks from phylogenetic trees.

Before final recommendation, I ask you to please prepare a revised version of your work that addresses the referees' comments. Below, I also include a short list of additional things that I noticed while reading your paper.

I look forward to receiving your revised preprint.

Best wishes,
Simone

line 5: is $n-2$ --> is at most $n-2$
abstract: Please mention that the problem you are considering is on rooted trees and networks.
line 53: Mention briefly what a universal network is (before saying how it can be constructed).
line 75: What does `its' refer to?
line 93: unique minimum --> unique minimum node
line 118: Has your construction to find triples of caterpillars that are `as different as possible' any connections to work on strings and sequences? You already comment that the caterpillar construction problem is essentially a problem on finding strings that have short common subsequences. Is anything known about how short these subsequences can get for a fixed number of sequences that consist of the same (multi)set of letters?
page 11: Add a full stop after the first and third (centred) inequalities.
line 364: Emphasise that such a network with 2(n-2) reticulations is not necessarily most parsimonious.

https://doi.org/10.24072/pci.mcb.100187.d1

Reviewed by Guillaume Scholz, 08 May 2023

Review of the manuscript: "When three trees go to war", by Leo van Iersel, Mark Jones and Mathias Weller.

In this manuscript, the authors investigate the question of the number of reticulation vertices in a network displaying a given set of $k$ phylogenetic trees on $n$ leaves. For the case $k=2$, which was solved in 2005, the answer is $n-2$, meaning that for two phylogenetic trees, there always exists a network with at most $n-2$ reticulation vertices displaying both trees. In this contribution, the authors show that for $k \geq 3$, the answer is at least $(3/2-\epsilon)n$. More specifically, they show that for all $\epsilon>0$, there exists $n>0$ and three trees on $n$ leaves such that any network displaying these three trees has at least $(3/2-\epsilon)n$ reticulation vertices (Theorem 1). To the best of my knowledge, this is the first identified lower bound for this problem. Along the way, they also provide a bound on the number of leaves required in a multi-labelled tree (a kind of leaf-labelled tree in which two or more leaves may get the same label) displaying a given set of three phylogenetic trees (Corollary 1).

The paper is technical in essence, very well written, and I found no major flaw in the reasoning. There are however a few minor technicalities, which the authors may wish to address:

1 - Looking at the induction in the proofs of Lemma 1 and Lemma 2, I think you should state in the Preliminaries that a single isolated network is considered a MUL-network (or a single isolated edge, whichever you prefer). Your current definition implies that a MUL-tree necessarily has two or more leaves, which conflicts with the base case of your induction in both lemmas.

2 - In the proof of Claim 1, case 2: I am wondering about the leaves of $N_r$ that are not in $L_Q$. Should we not have a case 2a', where none of the other two caterpillars are embedded in a leaf of $L_Q$, but are both embedded in a leaf of $N_r - L_Q$? If this situation cannot happen, then the reason is not obvious to me.

3 - At the end of Rule 1, you claim that you never need to remove more tokens than the quantity present in the token reservoir. However, Claim 2 "only" states that the number of tokens at the end of the process is non-negative. Since the proof of Claim 2 actually shows the former (stronger) statement, maybe you could rephrase Claim 2.

4 - At the very end of the proof of Claim 2. You showed that the number of bad leaves is at most $6n^{\log_3 2}$, and the token reservoir contains at least $n$ tokens by the time the first withdrawal is made. But each withdrawal removes $2q$ tokens, not only $q$. So unless I am missing something, $n>6qn^{\log_3 2}$ is not enough to ensure that we never remove "too many" tokens. Because with $6n^{\log_3 2}$ bad leaves, we will remove $12qn^{\log_3 2}$ tokens in total, not $6qn^{\log_3 2}$.

5 - Some typos:
-First paragraph of section 2: "occurance" -> "occurrence" (4 times).
-In Construction 1: "recusively" -> "recursively".
-In the proof of Lemma 1: "let let" -> "let".
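
The arithmetic behind comment 4 above can be spelled out as follows. This is a sketch under the reviewer's own accounting (at most $6n^{\log_3 2}$ bad leaves, one withdrawal of $2q$ tokens per bad leaf, and a reservoir of at least $n$ tokens before the first withdrawal). The constants are taken from the comment, not from the manuscript, so it illustrates which condition on $n$ would suffice rather than the authors' corrected statement.

```latex
\begin{align*}
\text{total withdrawn} &\le 2q \cdot 6n^{\log_3 2} = 12\,q\,n^{\log_3 2},\\
n \ge 12\,q\,n^{\log_3 2}
  &\iff n^{1-\log_3 2} \ge 12q
  \iff n \ge (12q)^{1/(1-\log_3 2)}.
\end{align*}
```

Since $\log_3 2 \approx 0.63 < 1$, the exponent $1-\log_3 2$ is positive, so such an $n$ exists for every fixed $q$; only the threshold changes when $6q$ is replaced by $12q$.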

https://doi.org/10.24072/pci.mcb.100187.rev11

Reviewed by Stefan Grünewald, 13 Jul 2023

Van Iersel, Jones, and Weller construct a family of triples of permutations that give rise to triples of rooted phylogenetic trees which are difficult to embed into a single phylogenetic network. While the problem of embedding two trees has been studied extensively, considering more trees has been regarded as much harder. This manuscript yields the first lower bound on the maximum number of reticulations that can be required to embed exactly 3 binary trees.

The construction of the sequences is not very surprising, but it is hard to estimate the number of reticulations. The authors use multi-labeled trees as an intermediate structure. They first show that many leaves are needed to embed the input trees into a multi-labeled tree, and then they establish that a network with fewer reticulations than claimed would allow a multi-labeled tree with fewer leaves than possible.

The result is relevant for applications, because there is no biological reason why the embedding of k trees into a phylogenetic network should be restricted to k=2. The proofs are sophisticated and require complicated notation as well as distinguishing many cases. Some time is needed to understand the details of the proofs, but the paper is well written and organised, making it as easy as possible for the reader to follow. In the last section various questions for further research are asked which are all interesting. A natural conjecture would be that the maximum number of reticulations needed for k trees with n taxa would asymptotically be c(k)n where c(k) is the sum of 1/i for i ranging from 1 to k-1.
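The conjectured coefficient c(k) is a harmonic sum, which a short Python sketch can evaluate exactly. The function name `conjectured_coefficient` is hypothetical, and the values below only illustrate the conjecture stated above; they are not results from the manuscript.

```python
from fractions import Fraction


def conjectured_coefficient(k):
    """Return c(k) = 1/1 + 1/2 + ... + 1/(k-1) as an exact fraction."""
    if k < 2:
        raise ValueError("the conjecture is stated for k >= 2 trees")
    # Exact rational arithmetic avoids floating-point rounding.
    return sum(Fraction(1, i) for i in range(1, k))


for k in (2, 3, 4):
    print(k, conjectured_coefficient(k))
```

For k = 2 the coefficient is 1, in line with the known sharp bound n - 2, and for k = 3 it is 3/2, in line with the lower bound (3/2 - ε)n established in the manuscript.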

In summary, this is an interesting paper that is enjoyable to read and likely to have some impact on future work on the hybridization number problem.

I found four typos/minor errors that the authors might want to correct, but another round of review is not necessary.

Figure 2: Y_3 should end with 312645, not 645312.

l. 146 "let let"

ll. 236 and 243: Use 'monotonically' instead of 'monotonously'.

l.278: Add 'and' between 'that' and 'leaves'.

https://doi.org/10.24072/pci.mcb.100187.rev12
