Allowing gene transfers doesn't make life easier for inferring orthology and paralogy
Consistency of orthology and paralogy constraints in the presence of gene transfers
Determining whether genes are orthologous (i.e. homologous genes whose most recent common ancestor represents a speciation) or paralogous (homologous genes whose most recent common ancestor represents a duplication) is a foundational problem in bioinformatics. For instance, the input to almost all phylogenetic methods is a sequence alignment of genes assumed to be orthologous. Knowing whether genes are paralogs or orthologs can also be important for assigning function: genes that have diverged following a duplication may be more likely to have neofunctionalised or subfunctionalised, whereas genes that have diverged following a speciation may be more likely to have continued in a similar role.
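To make the two definitions concrete, here is a minimal sketch. It assumes a hypothetical encoding of a gene tree whose internal nodes are already labelled "spec" (speciation) or "dup" (duplication), and it ignores gene transfers, which are the subject of the paper. Each pair of genes is classified by the event at its lowest common ancestor.

```python
from itertools import combinations

# Hypothetical encoding (not from the paper): a leaf is a gene name (str);
# an internal node is a tuple (event, children) with event "spec" or "dup".

def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _, children = node
    return set().union(*(leaves(c) for c in children))

def classify_pairs(root):
    """Map each unordered gene pair to 'ortholog' or 'paralog' using the event at its LCA."""
    relations = {}

    def visit(node):
        if isinstance(node, str):
            return
        event, children = node
        label = "ortholog" if event == "spec" else "paralog"
        child_leaves = [leaves(c) for c in children]
        # Genes sitting under two different children have this node as their LCA.
        for i, j in combinations(range(len(children)), 2):
            for a in child_leaves[i]:
                for b in child_leaves[j]:
                    relations[frozenset((a, b))] = label
        for c in children:
            visit(c)

    visit(root)
    return relations

# A duplication followed by speciation in each copy.
tree = ("dup", [("spec", ["a1", "b1"]), ("spec", ["a2", "b2"])])
relations = classify_pairs(tree)
print(relations[frozenset(("a1", "b1"))])  # ortholog
```

Here a1 and b1 are orthologs (their LCA is a speciation), while a1 and a2 are paralogs (their LCA is the duplication).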
This paper by Jones et al. (2022) contributes to a large body of literature on inferring orthology/paralogy relations, but takes a different approach to explaining inconsistencies between an assumed species phylogeny and a relation graph (a graph whose nodes represent genes and whose edges indicate that the two genes they join are orthologs). Rather than assuming that inconsistencies result from incorrect assessments of orthology (i.e. incorrect edges in the relation graph), the authors ask whether the relation graph could be consistent with a species tree combined with some amount of lateral (horizontal) gene transfer.
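For readers less familiar with relation graphs, the sketch below stores one as an adjacency map and lists the edges that would have to be added or deleted to match some expected set of ortholog pairs. This edge-editing view is the one the paper moves away from; the function names and the `expected_orthologs` input are hypothetical and only illustrate the data structure.

```python
def relation_graph(genes, ortholog_pairs):
    """Build an undirected relation graph: nodes are genes, edges join genes declared orthologous."""
    graph = {g: set() for g in genes}
    for a, b in ortholog_pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def edge_discrepancies(graph, expected_orthologs):
    """Compare the graph's edges with an expected set of ortholog pairs.

    Returns (missing, extra): expected pairs absent from the graph, and graph
    edges not in the expected set. Pairs are frozensets, so edges are unordered.
    """
    expected = {frozenset(p) for p in expected_orthologs}
    present = {frozenset((a, b)) for a in graph for b in graph[a]}
    return expected - present, present - expected

graph = relation_graph(["a1", "a2", "b1", "b2"], [("a1", "b1"), ("a1", "b2")])
missing, extra = edge_discrepancies(graph, [("a1", "b1"), ("a2", "b2")])
# missing holds the pair {a2, b2}; extra holds the pair {a1, b2}
```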
The two main questions addressed in this paper are (1) whether a network N and a relation graph R are consistent, and (2) whether, given a species tree S and a relation graph R, transfer arcs can be added to S so that it becomes consistent with R.
The first question hinges on the concept of a reconciliation between a gene tree and a network (Section 2.1) and amounts to asking whether there is a gene tree that can both be reconciled with the network and be consistent with the relation graph. The authors show that this problem is NP-hard. Furthermore, the related problem of finding a solution that uses k or fewer transfers is NP-hard and also W-hard, implying that it is unlikely to admit a fixed-parameter tractable algorithm. The NP-hardness proof is by reduction from the k-multi-coloured clique problem via an intermediate problem dubbed “antichain on trees” (Section 3). The “antichain on trees” construction may be of interest to others working on algorithmic complexity with phylogenetic networks.
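The paper's reconciliations map gene trees into species networks with transfers (Section 2.1). As a much simpler point of reference only, the sketch below implements the classic LCA mapping of a binary gene tree into a species tree without transfers: each gene-tree node maps to the lowest common ancestor of its children's images and is labelled a duplication when it maps to the same species node as one of its children. The encodings (`parent` map, `species_of` map) are assumptions for illustration.

```python
# Assumed encodings: `parent` maps each species-tree node to its parent (root -> None);
# a binary gene tree is a nested 2-tuple whose leaves are gene names;
# `species_of` maps each gene name to the species leaf it was sampled from.

def ancestors(node, parent):
    """Path from a species node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def species_lca(x, y, parent):
    """Lowest common ancestor of two species-tree nodes."""
    above_x = set(ancestors(x, parent))
    for node in ancestors(y, parent):
        if node in above_x:
            return node
    raise ValueError("nodes are not in the same tree")

def lca_reconcile(gene_tree, species_of, parent):
    """Return (species node of the gene-tree root, list of (gene clade, event))."""
    events = []

    def visit(node):
        if isinstance(node, str):
            return species_of[node]
        left, right = node
        m_left, m_right = visit(left), visit(right)
        m = species_lca(m_left, m_right, parent)
        # Standard LCA rule: duplication if the node maps where one of its children maps.
        events.append((node, "dup" if m in (m_left, m_right) else "spec"))
        return m

    return visit(gene_tree), events

parent = {"A": "AB", "B": "AB", "AB": "R", "C": "R", "R": None}
species_of = {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}
root_map, events = lca_reconcile((("a1", "b1"), ("a2", "c1")), species_of, parent)
# root_map is "R"; both cherries are speciations and the gene-tree root is a duplication
```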
In the second question, the possible locations of transfers are not specified (put differently, any time-consistent transfer arc is considered possible), and it is shown that it will generally be possible to add transfer arcs to S so that it becomes consistent with R. However, the natural extension of this question, asking whether this can be done with k or fewer added arcs, is also NP-hard.
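Time consistency is what restricts where a transfer arc may be placed. The paper defines it for networks; as a rough illustration only, the sketch below assumes a dated species tree (node dates measured forward from the root) and treats a transfer between two branches as time-consistent when the branches coexist for some open time interval. The encoding and function names are hypothetical.

```python
# Assumed encoding: `parent` maps each species-tree node to its parent (root -> None);
# `time` gives each node a date measured forward from the root (root = 0).

def branch_interval(node, parent, time):
    """Time interval spanned by the branch entering `node`."""
    if parent[node] is None:
        raise ValueError("the root has no incoming branch")
    return time[parent[node]], time[node]

def transfer_window(donor, recipient, parent, time):
    """Open time window in which a transfer from the branch above `donor` to the branch
    above `recipient` would be time-consistent, or None if the branches never coexist."""
    if donor == recipient:
        return None
    d_start, d_end = branch_interval(donor, parent, time)
    r_start, r_end = branch_interval(recipient, parent, time)
    start, end = max(d_start, r_start), min(d_end, r_end)
    return (start, end) if start < end else None

parent = {"A": "AB", "B": "AB", "AB": "R", "C": "R", "R": None}
time = {"R": 0, "AB": 1, "A": 3, "B": 3, "C": 3}
print(transfer_window("A", "C", parent, time))   # (1, 3)
print(transfer_window("AB", "A", parent, time))  # None
```

An ancestral branch and its descendant branch only touch at a single date, so no transfer between them is allowed under this simplified rule.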
Many of the proofs in the paper are quite technical, but the authors have relegated much of this detail to the appendix, ensuring that the main ideas and results are easy to follow in the main text. I am grateful to both reviewers for their detailed reviews and thorough checking of the proofs.
Jones M, Lafond M, Scornavacca C (2022) Consistency of orthology and paralogy constraints in the presence of gene transfers. arXiv:1705.01240 [cs], ver. 6 peer-reviewed and recommended by Peer Community in Mathematical and Computational Biology. https://arxiv.org/abs/1705.01240
Barbara Holland (2022) Allowing gene transfers doesn't make life easier for inferring orthology and paralogy. Peer Community in Mathematical and Computational Biology, 100009. https://doi.org/10.24072/pci.mcb.100009
Evaluation round #1
DOI or URL of the preprint: https://arxiv.org/abs/1705.01240
Version of the preprint: 4
Author's Reply, None
Decision by Barbara Holland, 10 Jan 2022
I am pleased to have (finally) managed to get two expert reviews for this paper. Apologies for how long this took!
You'll see that they both have constructive suggestions that you may like to consider in a revised version.
I haven't delved into the technical detail but noticed a few small things.
Orthology and paralogy relations are often inferred by methods based on gene sequence similarity, which yield a graph depicting the relationships between gene pairs.
Orthology and paralogy relations are often inferred by methods based on gene sequence similarity that yield a graph depicting the relationships between gene pairs.
I always get confused about when to use "which" and "that", but here I think "that" is better (i.e. it introduces an essential clause).
Vertical descent with modification (speciation) constitutes only part of the events shaping a gene history;
I'd say that vertical descent with modification is different from speciation, i.e. it's evolution along an edge whereas speciation is a splitting event.
The authors ask, given a reconciled gene tree G that displays a given of relations, whether there is a species network N that be reconciled with G.
The authors ask, given a reconciled gene tree G that displays a given set of relations, whether there is a species network N that can be reconciled with G.
page 7 (last sentence)
missing a reference
It is worth mentioning the question studied in ??