Latest recommendations
Id | Title | Authors | Abstract | Picture | Thematic fields | Recommender | Reviewers | Submission date | |
---|---|---|---|---|---|---|---|---|---|
27 Jan 2025
Biology-Informed inverse problems for insect pests detection using pheromone sensors
Thibault Malou, Nicolas Parisey, Katarzyna Adamczyk-Chauvat, Elisabeta Vergu, Béatrice Laroche, Paul-Andre Calatayud, Philippe Lucas, Simon Labarthe
https://hal.inrae.fr/hal-04572831v2
Towards accurate inference of insect presence landscapes from pheromone sensor networks
Recommended by Eric Tannier
Insecticides are used to control crop pests and prevent severe crop losses. They are also a major cause of the current decline in biodiversity, contribute to climate change, and pollute soil and water, with consequences for human and environmental health [1]. The rationale behind the work of Malou et al. [2] is that some pesticide application protocols can be improved by a better knowledge of the insects, their biology, their ecology and their real-time infestation dynamics in the fields. Thanks to a network of pheromone sensors and a mathematical method to derive the spatio-temporal distribution of pest populations from the signals, it is theoretically possible to adjust the time, dose and area of treatment and to use less pesticide with greater efficiency than an uninformed protocol. Malou et al. [2] focus on the mathematical problem, recognising that its real role in pest control would require work on its implementation and on a benefit-harm analysis. The problem is an "inverse problem" [3] in that it consists of inferring the presence of insects from the trail left by the pheromones, given a model of pheromone diffusion by insects. The main contribution of this work is the formulation and comparison of different regularisation terms in the optimisation inference scheme, in order to guide the optimisation by biological knowledge of specific pests, such as some parameters of population dynamics.
The accuracy and precision of the results are first assessed on a simple toy example, which tests the ability of the model and algorithm to detect the source of the pheromones and the efficiency of the data assimilation principle. A further simulation is then carried out on a real plot with realistic parameters and rules based on knowledge of a maize pest. A repositioning of the sensors (informed by the results from the initial positions) is carried out during the test phase to allow better detection. The work of Malou et al. [2] is extensive, deep and complete. It includes a detailed study of the numerical solutions of different data assimilation methods, as well as a theoretical reflection on how this work could contribute to agricultural and environmental issues.
References
[1] IPBES (2024). Thematic Assessment Report on the Underlying Causes of Biodiversity Loss and the Determinants of Transformative Change and Options for Achieving the 2050 Vision for Biodiversity of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services. O’Brien, K., Garibaldi, L., and Agrawal, A. (eds.). IPBES secretariat, Bonn, Germany. https://doi.org/10.5281/zenodo.11382215
[2] Thibault Malou, Nicolas Parisey, Katarzyna Adamczyk-Chauvat, Elisabeta Vergu, Béatrice Laroche, Paul-Andre Calatayud, Philippe Lucas, Simon Labarthe (2025) Biology-Informed inverse problems for insect pests detection using pheromone sensors. HAL, ver.2 peer-reviewed and recommended by PCI Math Comp Biol https://hal.inrae.fr/hal-04572831v2
[3] Isakov V (2017). Inverse Problems for Partial Differential Equations. Vol. 127. Applied Mathematical Sciences. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-51658-5
| Biology-Informed inverse problems for insect pests detection using pheromone sensors | Thibault Malou, Nicolas Parisey, Katarzyna Adamczyk-Chauvat, Elisabeta Vergu, Béatrice Laroche, Paul-Andre Calatayud, Philippe Lucas, Simon Labarthe | Most insects have the ability to modify the odor landscape in order to communicate with their conspecifics during key phases of their life cycle such as reproduction. They release pheromones in their nearby environment, volatile compounds that ar... | | Agricultural Science, Dynamical systems, Epidemiology, Systems biology | Eric Tannier | 2024-05-12 19:14:34 | View | |
18 Apr 2023
Cancer phylogenetic tree inference at scale from 1000s of single cell genomes
Sohrab Salehi, Fatemeh Dorri, Kevin Chern, Farhia Kabeer, Nicole Rusk, Tyler Funnell, Marc J Williams, Daniel Lai, Mirela Andronescu, Kieran R. Campbell, Andrew McPherson, Samuel Aparicio, Andrew Roth, Sohrab Shah, and Alexandre Bouchard-Côté
https://doi.org/10.1101/2020.05.06.058180
Phylogenetic reconstruction from copy number aberration in large scale, low-depth genome-wide single-cell data.
Recommended by Amaury Lambert based on reviews by 3 anonymous reviewers
The paper [1] presents and applies a new Bayesian inference method of phylogenetic reconstruction for multiple sequence alignments in the case of low sequencing coverage but diverse copy number aberrations (CNA), with applications to single cell sequencing of tumors. The idea is to take advantage of CNA to reconstruct the topology of the phylogenetic tree of sequenced cells in a first step (the 'sitka' method), and in a second step to assign single nucleotide variants (SNV) to tree edges and then calibrate their lengths (the 'sitka-snv' method). The data are summarized into a binary-valued C × L matrix Y, where C is the number of cells and L is the number of loci (here, loci are segments of prescribed length called 'bins'). The entry of Y at row i and column j is 1 if and only if, in the ancestral lineage of cell i, at least one genomic rearrangement has occurred, more specifically the gain or loss of a segment with at least one endpoint in locus j or in locus j+1; otherwise the entry is 0. The authors expect the infinite-allele assumption to approximately hold (i.e., that at most one mutation occurs at any given marker and that 0 is the ancestral state). They refer to this assumption as the 'perfect phylogeny assumption'.
By only recording from CNA events the endpoints at which they occur, the authors lose the information on copy number, but they gain the assumption of independence of the mutational processes occurring at different sites, which approximately holds for CNA endpoints. The goal of sitka is to produce a posterior distribution on phylogenetic trees conditional on the matrix Y, where here a phylogenetic tree is understood as containing the information on 1) the topology of the tree but not its edge lengths, and 2) for each edge, the identity of markers having undergone a mutation, in the sense of the previous paragraph. The results of the method are tested against synthetic datasets simulated under various assumptions, including conditions violating the perfect phylogeny assumption, and compared to results obtained under other baseline methods. The method is extended to assign SNV to edges of the tree inferred by sitka. It is also applied to real datasets of single cell genomes of tumors. The manuscript is very well-written, with a high degree of detail. The method is novel, scalable, fast and appears to perform favorably compared to other approaches. It has been applied in independent publications, for example to multi-year time-series single-cell whole-genome sequencing of tumors, in order to infer the fitness landscape and its dynamics through time, see [2]. The reviewing process has taken too long, mainly because of other commitments I had during the period and the difficulty of finding reviewers. Let me apologize to the authors and thank them for their patience as well as for the scientific rigor they brought to their revisions and answers to reviewers, whom I also warmly thank for their quality work.
References
[1] Sohrab Salehi, Fatemeh Dorri, Kevin Chern, Farhia Kabeer, Nicole Rusk, Tyler Funnell, Marc J Williams, Daniel Lai, Mirela Andronescu, Kieran R. Campbell, Andrew McPherson, Samuel Aparicio, Andrew Roth, Sohrab Shah, and Alexandre Bouchard-Côté. Cancer phylogenetic tree inference at scale from 1000s of single cell genomes (2023). bioRxiv, 2020.05.06.058180, ver. 4 peer-reviewed and recommended by Peer Community in Mathematical and Computational Biology.
[2] Sohrab Salehi, Farhia Kabeer, Nicholas Ceglia, Mirela Andronescu, Marc J. Williams, Kieran R. Campbell, Tehmina Masud, Beixi Wang, Justina Biele, Jazmine Brimhall, David Gee, Hakwoo Lee, Jerome Ting, Allen W. Zhang, Hoa Tran, Ciara O’Flanagan, Fatemeh Dorri, Nicole Rusk, Teresa Ruiz de Algara, So Ra Lee, Brian Yu Chieh Cheng, Peter Eirew, Takako Kono, Jenifer Pham, Diljot Grewal, Daniel Lai, Richard Moore, Andrew J. Mungall, Marco A. Marra, IMAXT Consortium, Andrew McPherson, Alexandre Bouchard-Côté, Samuel Aparicio & Sohrab P. Shah. Clonal fitness inferred from time-series modelling of single-cell cancer genomes (2021). Nature 595, 585–590. https://doi.org/10.1038/s41586-021-03648-3
| Cancer phylogenetic tree inference at scale from 1000s of single cell genomes | Sohrab Salehi, Fatemeh Dorri, Kevin Chern, Farhia Kabeer, Nicole Rusk, Tyler Funnell, Marc J Williams, Daniel Lai, Mirela Andronescu, Kieran R. Campbell, Andrew McPherson, Samuel Aparicio, Andrew Roth, Sohrab Shah, and Alexandre Bouchard-Côté | A new generation of scalable single cell whole genome sequencing (scWGS) methods allows unprecedented high resolution measurement of the evolutionary dynamics of cancer cell populations. Phylogenetic reconstruction ... | | Evolutionary Biology, Genetics and population Genetics, Genomics and Transcriptomics, Machine learning, Probability and statistics | Amaury Lambert | 2021-12-10 17:08:04 | View | |
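The perfect phylogeny assumption described in this recommendation can be illustrated with a short sketch. For a binary matrix whose ancestral state is 0, a classical criterion is that a perfect phylogeny exists exactly when, for every pair of loci, the sets of cells carrying a 1 are either disjoint or nested. The code below is only this pairwise check, not the sitka method itself (which infers a posterior distribution over trees); the function name and the toy matrices are hypothetical.

```python
def satisfies_perfect_phylogeny(Y):
    """Check the pairwise nested-or-disjoint criterion on a binary matrix.

    Y is a list of rows (one per cell), each a list of 0/1 entries
    (one per locus), with 0 assumed to be the ancestral state.
    """
    n_loci = len(Y[0]) if Y else 0
    # For each locus, collect the set of cells in which the marker is present.
    carriers = [
        frozenset(i for i, row in enumerate(Y) if row[j])
        for j in range(n_loci)
    ]
    for a in range(n_loci):
        for b in range(a + 1, n_loci):
            A, B = carriers[a], carriers[b]
            # Overlapping carrier sets must be nested; otherwise the two
            # markers cannot both arise exactly once on a single tree.
            if A & B and not (A <= B or B <= A):
                return False
    return True


# Toy matrices (hypothetical data): 3 cells, a few loci.
compatible = [[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]]
conflicting = [[1, 1],
               [1, 0],
               [0, 1]]
print(satisfies_perfect_phylogeny(compatible))
print(satisfies_perfect_phylogeny(conflicting))
```

In the second matrix the two loci share cell 0 but each also has a carrier the other lacks, which is the kind of conflict a method tolerant of violations of the assumption has to absorb as noise.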
21 Feb 2022
Consistency of orthology and paralogy constraints in the presence of gene transfers
Mark Jones, Manuel Lafond, Celine Scornavacca
https://doi.org/10.48550/arXiv.1705.01240
Allowing gene transfers doesn't make life easier for inferring orthology and paralogy
Recommended by Barbara Holland based on reviews by 2 anonymous reviewers
Determining if genes are orthologous (i.e. homologous genes whose most recent common ancestor represents a speciation) or paralogous (homologous genes whose most recent common ancestor represents a duplication) is a foundational problem in bioinformatics. For instance, the input to almost all phylogenetic methods is a sequence alignment of genes assumed to be orthologous. Understanding if genes are paralogs or orthologs can also be important for assigning function: for example, genes that have diverged following duplication may be more likely to have neofunctionalised or subfunctionalised compared to genes that have diverged following speciation, which may be more likely to have continued in a similar role. This paper by Jones et al. (2022) contributes to a broad literature addressing the inference of orthology/paralogy relations but takes a different approach to explaining inconsistency between an assumed species phylogeny and a relation graph (a graph where nodes represent genes and edges represent that the two genes are orthologs). Rather than assuming that inconsistencies are the result of incorrect assessment of orthology (i.e. incorrect edges in the relation graph), they ask if the relation graph could be consistent with a species tree combined with some amount of lateral (horizontal) gene transfer. The two main questions addressed in this paper are (1) whether a network N and a relation graph R are consistent, and (2) whether, given a species tree S and a relation graph R, transfer arcs can be added to S in such a way that it becomes consistent with R.
The first question hinges on the concept of a reconciliation between a gene tree and a network (section 2.1) and amounts to asking if a gene tree can be found that is both reconcilable with the network and consistent with the relation graph. The authors show that the problem is NP-hard. Furthermore, the related problem of attempting to find a solution using k or fewer transfers is NP-hard, and also W[1]-hard, implying that it is in a class of problems for which fixed-parameter tractable solutions have not been found. The proof of NP-hardness is by reduction from the k-multi-coloured clique problem via an intermediate problem dubbed “antichain on trees” (Section 3). The “antichain on trees” construction may be of interest to others working on algorithmic complexity with phylogenetic networks. In the second question the possible locations of transfers are not specified (or, to put it differently, any time-consistent transfer arc is considered possible) and it is shown that it generally will be possible to add transfer edges to S in such a way that it can be consistent with R. However, the natural extension to this question of asking if it can be done with k or fewer added arcs is also NP-hard. Many of the proofs in the paper are quite technical, but the authors have relegated a lot of this detail to the appendix, thus ensuring that the main ideas and results are clear to follow in the main text. I am grateful to both reviewers for their detailed reviews and thorough checking of the proofs.
References
Jones M, Lafond M, Scornavacca C (2022) Consistency of orthology and paralogy constraints in the presence of gene transfers. arXiv:1705.01240 [cs], ver. 6 peer-reviewed and recommended by Peer Community in Mathematical and Computational Biology. https://arxiv.org/abs/1705.01240
| Consistency of orthology and paralogy constraints in the presence of gene transfers | Mark Jones, Manuel Lafond, Celine Scornavacca | Orthology and paralogy relations are often inferred by methods based on gene sequence similarity that yield a graph depicting the relationships between gene pairs. Such relation graphs frequently contain errors, as ... | | Computational complexity, Design and analysis of algorithms, Evolutionary Biology, Graph theory | Barbara Holland | 2021-06-30 15:01:44 | View | |
28 Jun 2024
Emergence of Supercoiling-Mediated Regulatory Networks through the Evolution of Bacterial Chromosome Organization
Théotime Grohens, Sam Meyer, Guillaume Beslon
https://doi.org/10.1101/2022.09.23.509185
Understanding the impact of the transcription-supercoiling coupling on bacterial genome evolution
Recommended by Nelle Varoquaux
DNA supercoiling, the under- or overwinding of DNA, is known to strongly impact gene expression, as changes in levels of supercoiling directly influence transcription rates. In turn, gene transcription generates DNA supercoiling on each side of an advancing RNA polymerase. This coupling between DNA supercoiling and transcription may result in different outcomes, depending on neighboring gene orientations: divergent genes tend to increase transcription levels, convergent genes tend to inhibit each other, while tandem genes may exhibit more intricate relationships. While several works have investigated the relationship between transcription and supercoiling, Grohens et al. [1] address a different question: how does transcription-supercoiling coupling drive genome evolution? To this end, they consider a simple model of gene expression regulation where transcription level only depends on the local DNA supercoiling and where the transcription of one gene generates a linear profile of positive and negative DNA supercoiling on each side of it. They then make genomes evolve through genomic inversions only, considering a fitness that reflects the ability of a genome to cope with two distinct environments for which different genes have to be activated or repressed. Using this simple model, the authors illustrate how evolutionary adaptation via genomic inversions can adjust expression levels for enhanced fitness within specific environments, particularly with the emergence of relaxation-activated genes.
Investigating the genomic organization of individual genomes revealed that genes are locally organized to leverage the transcription-supercoiling coupling for activation or inhibition, but larger-scale networks of genes are required to strongly inhibit genes (sometimes up to networks of 20 genes). Thus, supercoiling-mediated interactions between genes can involve more than just local genes. Finally, they construct an "effective interaction graph" between genes by successively simulating gene knock-outs for all of the genes of an individual and observing the effect on the expression level of other genes. They observe a densely connected interaction network, implying that supercoiling-based regulation could evolve concurrently with genome organization in bacterial genomes.
References
[1] Théotime Grohens, Sam Meyer, Guillaume Beslon (2024) Emergence of Supercoiling-Mediated Regulatory Networks through the Evolution of Bacterial Chromosome Organization. bioRxiv, ver. 4 peer-reviewed and recommended by Peer Community in Mathematical and Computational Biology https://doi.org/10.1101/2022.09.23.509185
| Emergence of Supercoiling-Mediated Regulatory Networks through the Evolution of Bacterial Chromosome Organization | Théotime Grohens, Sam Meyer, Guillaume Beslon | DNA supercoiling -- the level of twisting and writhing of the DNA molecule around itself -- plays a major role in the regulation of gene expression in bacteria by modulating promoter activity. The level of DNA supercoiling is a dynamic property... | | Biophysics, Evolutionary Biology, Systems biology | Nelle Varoquaux | 2023-06-30 10:34:28 | View | |
27 Jul 2021
Estimating dates of origin and end of COVID-19 epidemics
Thomas Bénéteau, Baptiste Elie, Mircea T. Sofonea, Samuel Alizon
https://doi.org/10.1101/2021.01.19.21250080
The importance of model assumptions in estimating the dynamics of the COVID-19 epidemic
Recommended by Valery Forbes based on reviews by Bastien Boussau and 1 anonymous reviewer
In “Estimating dates of origin and end of COVID-19 epidemics”, Bénéteau et al. develop and apply a mathematical modeling approach to estimate the date of the origin of the SARS-CoV-2 epidemic in France. They also assess how long strict control measures need to last to ensure that the prevalence of the virus remains below key public health thresholds. This problem is challenging because the numbers of infected individuals in both tails of the epidemic are low, which can lead to errors when deterministic models are used. To achieve their goals, the authors developed a discrete stochastic model. The model is non-Markovian, meaning that individual infection histories influence the dynamics. The model also accounts for heterogeneity in the timing between infection and transmission and includes stochasticity as well as consideration of superspreader events. By comparing the outputs of their model with several alternative models, Bénéteau et al. were able to assess the importance of stochasticity, individual heterogeneity, and non-Markovian effects on the estimates of the dates of origin and end of the epidemic, using France as a test case. Some limitations of the study, which the authors acknowledge, are that the time from infection to death remains largely unknown, that data on the heterogeneity of transmission among individuals are lacking, and that a single infected individual is assumed to have caused the epidemic. Despite the acknowledged limitations of the work, the results suggest that cases may be detected long before the detection of an epidemic wave.
Also, the approach may be helpful for informing public health decisions such as the necessary duration of strict lockdowns and for assessing the risks of epidemic rebound as restrictions are lifted. In particular, the authors found that estimates of the end of the epidemic following lockdowns are more sensitive to the assumptions of the models used than estimates of its beginning. In summary, this model adds to a valuable suite of tools to support decision-making in response to disease epidemics.
References
Bénéteau T, Elie B, Sofonea MT, Alizon S (2021) Estimating dates of origin and end of COVID-19 epidemics. medRxiv, 2021.01.19.21250080, ver. 3 peer-reviewed and recommended by Peer Community in Mathematical and Computational Biology. https://doi.org/10.1101/2021.01.19.21250080
| Estimating dates of origin and end of COVID-19 epidemics | Thomas Bénéteau, Baptiste Elie, Mircea T. Sofonea, Samuel Alizon | Estimating the date at which an epidemic started in a country and the date at which it can end depending on interventions intensity are important to guide public health responses. Both are potentially shaped by simi... | | Epidemiology, Probability and statistics, Stochastic dynamics | Valery Forbes | 2021-02-23 16:37:32 | View | |
18 Sep 2023
Minimal encodings of canonical k-mers for general alphabets and even k-mer sizes
Recommended by Paul Medvedev based on reviews by 2 anonymous reviewers
As part of many bioinformatics tools, one encodes a k-mer, which is a string, into an integer. The natural encoding uses a bijective function to map the k-mers onto the interval [0, s^k - 1], where s is the alphabet size. This encoding is minimal, in the sense that the encoded integer ranges from 0 to the number of represented k-mers minus 1. However, often one is only interested in encoding canonical k-mers. One common definition is that a k-mer is canonical if it is lexicographically not larger than its reverse complement. In this case, only about half the k-mers from the universe of k-mers are canonical, and the natural encoding is no longer minimal. For the special case of a DNA alphabet and odd k, there exists a "parity-based" encoding for canonical k-mers which is minimal. In [1], the author presents a minimal encoding for canonical k-mers that works for general alphabets and both odd and even k. They also give an efficient bit-based representation for the DNA alphabet. This paper fills a theoretically interesting and often overlooked gap in how to encode k-mers as integers. It is not yet clear what practical applications this encoding will have, as the author readily acknowledges in the manuscript. Neither the author nor the reviewers are aware of any practical situations where the lack of a minimal encoding "leads to serious limitations." However, even in an applied field like bioinformatics, it would be short-sighted to only value theoretical work that has an immediate application; often, the application is several hops away and not apparent at the time of the original work. In fact, I would speculate that there may be significant benefits reaped if there were more theoretical attention paid to the fact that k-mers are often restricted to be canonical.
Many papers in the field sweep under the rug the fact that k-mers are made canonical, leaving it as an implementation detail. This may indicate that the theory to describe and analyze this situation is underdeveloped. This paper makes a step forward to develop this theory, and I am hopeful that it may lead to substantial practical impact in the future.
References
[1] Roland Wittler (2023) General encoding of canonical k-mers. bioRxiv, ver.2, peer-reviewed and recommended by Peer Community in Mathematical and Computational Biology https://doi.org/10.1101/2023.03.09.531845
| General encoding of canonical *k*-mers | Roland Wittler | To index or compare sequences efficiently, often *k*-mers, i.e., substrings of fixed length *k*, are used. For efficient indexing or storage, *k*-mers are encoded as integers, e.g., applying som... | | Combinatorics, Computational complexity, Genomics and Transcriptomics | Paul Medvedev | Anonymous | 2023-03-13 17:01:37 | View |
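The gap between the natural encoding and the set of canonical k-mers, as described in this recommendation, can be made concrete with a small sketch for the DNA alphabet. It shows the natural base-s encoding and the lexicographic definition of canonical k-mers, and counts canonical k-mers by brute force. It does not implement the minimal encoding of [1]; the function names and the rank order A, C, G, T are assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed rank order A < C < G < T, so s = 4
RANK = {c: i for i, c in enumerate(ALPHABET)}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def natural_encode(kmer):
    """Read the k-mer as a base-s number; the result lies in [0, s^k - 1]."""
    value = 0
    for c in kmer:
        value = value * len(ALPHABET) + RANK[c]
    return value


def reverse_complement(kmer):
    return "".join(COMPLEMENT[c] for c in reversed(kmer))


def is_canonical(kmer):
    """Canonical = lexicographically not larger than the reverse complement."""
    return kmer <= reverse_complement(kmer)


def count_canonical(k):
    """Brute-force count of canonical k-mers (only sensible for small k)."""
    return sum(is_canonical("".join(p)) for p in product(ALPHABET, repeat=k))


print(natural_encode("ACGT"))   # 0*64 + 1*16 + 2*4 + 3 = 27
print(count_canonical(3))       # odd k: exactly half of 4^3 = 64, i.e. 32
print(count_canonical(2))       # even k: 10, more than half of 16
```

For k = 2 the four k-mers AT, TA, CG and GC equal their own reverse complement, so the canonical count is (16 - 4) / 2 + 4 = 10 rather than 8. Such self-complementary k-mers only exist for even k, which is consistent with the parity-based encoding mentioned above being limited to odd k.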
22 Jul 2024
Genetic Evidence for Geographic Structure within the Neanderthal Population
Alan R. Rogers
https://doi.org/10.1101/2023.07.28.551046
Decline in Neanderthal effective population size due to geographic structure and gene flow
Recommended by Raquel Assis based on reviews by David Bryant and Guillaume Achaz
Published PSMC estimates of Neanderthal effective population size (𝑁e) show an approximately five-fold decline over the 20,000 years preceding the death of each fossil [1]. This observation may be attributed to a true decline in Neanderthal 𝑁e, statistical error that is notorious with PSMC estimation, or geographic subdivision and gene flow that has been hypothesized to occur within the Neanderthal population. Determining which of these factors contributes to the observed decline in Neanderthal 𝑁e is an important question that can provide insight into human evolutionary history. Though it is widely believed that the decline in Neanderthal 𝑁e is due to geographic subdivision and gene flow, no prior studies have theoretically examined whether these evolutionary processes can yield the observed pattern. In this paper [2], Rogers tackles this problem by employing two mathematical models to explore the roles of geographic subdivision and gene flow in the Neanderthal population. Results from both models show that geographic subdivision and gene flow can indeed result in a decline in 𝑁e that mirrors the observed decline estimated from empirical data. In contrast, Rogers argues that neither statistical error in PSMC estimates nor a true decline in 𝑁e is expected to produce the consistent decline in estimated 𝑁e observed across three distinct Neanderthal fossils. Statistical error would likely result in variation among these curves, whereas a true decline in 𝑁e would produce shifted curves due to the different ages of the three Neanderthal fossils.
In summary, Rogers provides convincing evidence that the most reasonable explanation for the observed decline in Neanderthal 𝑁e is geographic subdivision and gene flow. Rogers also provides a basis for understanding this observation, suggesting that 𝑁e declines over time because coalescence times are shorter between more recent ancestors, as they are more likely to be geographic neighbors. Hence, Rogers’ theoretical findings shed light on an interesting aspect of human evolutionary history.
References
[1] Fabrizio Mafessoni, Steffi Grote, Cesare de Filippo, Svante Pääbo (2020) “A high-coverage Neandertal genome from Chagyrskaya Cave”. Proceedings of the National Academy of Sciences USA 117: 15132-15136. https://doi.org/10.1073/pnas.2004944117
[2] Alan Rogers (2024) “Genetic evidence for geographic structure within the Neanderthal population”. bioRxiv, version 4 peer-reviewed and recommended by Peer Community in Mathematical and Computational Biology. https://doi.org/10.1101/2023.07.28.551046
| Genetic Evidence for Geographic Structure within the Neanderthal Population | Alan R. Rogers | PSMC estimates of Neanderthal effective population size (N<sub>e</sub>) exhibit a roughly 5-fold decline across the most recent 20 ky before the death of each fossil. To explain this pattern, this article develops new theory relating... | | Evolutionary Biology, Genetics and population Genetics | Raquel Assis | 2023-10-17 18:06:38 | View | |
02 Oct 2024
HairSplitter: haplotype assembly from long, noisy reads
Roland Faure, Dominique Lavenier, Jean-François Flot
https://doi.org/10.1101/2024.02.13.580067
Accurate Haplotype Reconstruction from Long, Error-Prone, Reads with HairSplitter
Recommended by Giulio Ermanno Pibiri
A prominent challenge in computational biology is to distinguish microbial haplotypes, that is, closely related organisms with highly similar genomes whose small genomic differences can cause significant phenotypic variations. Current genome assembly tools struggle with distinguishing these haplotypes, especially for long-read sequencing data with high error rates, such as PacBio or Oxford Nanopore Technologies (ONT) reads. While existing methods work well for either viral or bacterial haplotypes, they often fail with low-abundance haplotypes and are computationally intensive. This work by Faure, Lavenier, and Flot [1] introduces a new tool, HairSplitter, that offers a solution for both viral and bacterial haplotype separation, even with error-prone long reads. It does this by efficiently calling variants, clustering reads into haplotypes, creating new separated contigs, and resolving the assembly graph. A key advantage of HairSplitter is that it is entirely parameter-free and does not require prior knowledge of the organism's ploidy. HairSplitter is designed to handle both metaviromes and bacterial metagenomes, offering a more versatile and efficient solution than existing tools, like stRainy [2], Strainberry [3], and hifiasm-meta [4].
References
[1] Roland Faure, Dominique Lavenier, Jean-François Flot (2024) HairSplitter: haplotype assembly from long, noisy reads. bioRxiv, ver.3 peer-reviewed and recommended by PCI Math Comp Biol https://doi.org/10.1101/2024.02.13.580067
[2] Kazantseva E, A Donmez, M Pop, and M Kolmogorov (2023). stRainy: assembly-based metagenomic strain phasing using long reads. Bioinformatics. https://doi.org/10.1101/2023.01.31.526521
[3] Vicedomini R, C Quince, AE Darling, and R Chikhi (2021). Strainberry: automated strain separation in low complexity metagenomes using long reads. Nature Communications, 12, 4485. ISSN: 2041-1723. https://doi.org/10.1038/s41467-021-24515-9
[4] Feng X, H Cheng, D Portik, and H Li (2022). Metagenome assembly of high-fidelity long reads with hifiasm-meta. Nature Methods, 19, 1–4. https://doi.org/10.1038/s41592-022-01478-3
| HairSplitter: haplotype assembly from long, noisy reads | Roland Faure, Dominique Lavenier, Jean-François Flot | Long-read assemblers face challenges in discerning closely related viral or bacterial strains, often collapsing similar strains in a single sequence. This limitation has been hampering metagenome analysis, where diverse strains may harbor... | | Design and analysis of algorithms, Development, Genomics and Transcriptomics, Probability and statistics | Giulio Ermanno Pibiri | 2024-02-15 10:17:04 | View | |
19 Sep 2022
HMMploidy: inference of ploidy levels from short-read sequencing data
Samuele Soraggi, Johanna Rhodes, Isin Altinkaya, Oliver Tarrant, Francois Balloux, Matthew C Fisher, Matteo Fumagalli
https://doi.org/10.1101/2021.06.29.450340
Detecting variation in ploidy within and between genomes
Recommended by Alan Rogers
Soraggi et al. [2] describe HMMploidy, a statistical method that takes DNA sequencing data as input and uses a hidden Markov model to estimate ploidy. The method allows ploidy to vary not only between individuals, but also between and even within chromosomes. This allows the method to detect aneuploidy and also chromosomal regions in which multiple paralogous loci have been mistakenly assembled on top of one another. HMMploidy estimates genotypes and ploidy simultaneously, with a separate estimate for each genome. The genome is divided into a series of non-overlapping windows (typically 100), and HMMploidy provides a separate estimate of ploidy within each window of each genome. The method is thus estimating a large number of parameters, and one might assume that this would reduce its accuracy. However, it benefits from large samples of genomes. Large samples increase the accuracy of internal allele frequency estimates, and this improves the accuracy of genotype and ploidy estimates. In large samples of low-coverage genomes, HMMploidy outperforms all other estimators. It does not require a reference genome of known ploidy. The power of the method increases with coverage and sample size but decreases with ploidy. Consequently, high coverage or large samples may be needed if ploidy is high. The method is slower than some alternative methods, but run time is not excessive. Run time increases with number of windows but isn't otherwise affected by genome size. It should be feasible even with large genomes, provided that the number of windows is not too large.
The authors apply their method and several alternatives to isolates of a pathogenic yeast, Cryptococcus neoformans, obtained from HIV-infected patients. With these data, HMMploidy replicated previous findings of polyploidy and aneuploidy. There were several surprises. For example, HMMploidy estimates the same ploidy in two isolates taken on different days from a single patient, even though sequencing coverage was three times as high on the later day as on the earlier one. These findings were replicated in data that were down-sampled to mimic low coverage. Three alternative methods (ploidyNGS [1], nQuire, and nQuire.Den [3]) estimated the highest ploidy considered in all samples from each patient. The present authors suggest that these results are artifactual and reflect the wide variation in allele frequencies. Because of this variation, these methods seem to have preferred the model with the largest number of parameters. HMMploidy represents a new and potentially useful tool for studying variation in ploidy. It will be of most use in studying the genetics of asexual organisms and cancers, where aneuploidy imposes little or no penalty on reproduction. It should also be useful for detecting assembly errors in de novo genome sequences from non-model organisms.
References
[1] Augusto Corrêa dos Santos R, Goldman GH, Riaño-Pachón DM (2017) ploidyNGS: visually exploring ploidy with Next Generation Sequencing data. Bioinformatics, 33, 2575–2576. https://doi.org/10.1093/bioinformatics/btx204
[2] Soraggi S, Rhodes J, Altinkaya I, Tarrant O, Balloux F, Fisher MC, Fumagalli M (2022) HMMploidy: inference of ploidy levels from short-read sequencing data. bioRxiv, 2021.06.29.450340, ver. 6 peer-reviewed and recommended by Peer Community in Mathematical and Computational Biology. https://doi.org/10.1101/2021.06.29.450340
[3] Weiß CL, Pais M, Cano LM, Kamoun S, Burbano HA (2018) nQuire: a statistical framework for ploidy estimation using next generation sequencing. BMC Bioinformatics, 19, 122. https://doi.org/10.1186/s12859-018-2128-z
| HMMploidy: inference of ploidy levels from short-read sequencing data | Samuele Soraggi, Johanna Rhodes, Isin Altinkaya, Oliver Tarrant, Francois Balloux, Matthew C Fisher, Matteo Fumagalli | The inference of ploidy levels from genomic data is important to understand molecular mechanisms underpinning genome evolution. However, current methods based on allele frequency and sequencing depth variation do not have power to infer ploidy ... | | Design and analysis of algorithms, Evolutionary Biology, Genetics and population Genetics, Probability and statistics | Alan Rogers | 2021-07-01 05:26:31 | View | |
27 Aug 2024
Impact of a block structure on the Lotka-Volterra model
Maxime Clenet, François Massol, Jamal Najim
https://doi.org/10.48550/arXiv.2311.09470
Equilibrium of communities in the Lotka-Volterra model
Recommended by Loïc Paulevé
This article by Clenet et al. [1] tackles a fundamental mathematical model in ecology to understand the impact of the architecture of interactions on the equilibrium of the system. The authors consider the classical Lotka-Volterra model, depicting the effect of interactions between species on their abundances. They focus on the case where there are many species and their interactions are compartmentalized in a block structure. Each block has a strength coefficient, applied to a random Gaussian matrix. This model aims at capturing the structure of interacting communities, with some blocks describing the interactions within a community, and other blocks the interactions between communities. In this general mathematical framework, the authors demonstrate sufficient conditions for the existence and uniqueness of a stable equilibrium, and conditions under which the equilibrium is feasible. Moreover, they derive statistical heuristics for the proportion, mean, and distribution of abundance of surviving species. Overall, the article constitutes an original and solid contribution to the study of mathematical models in ecology. It combines mathematical analysis, dynamical systems theory, and numerical simulations, grounded in relevant hypotheses for the modeling of ecological systems.
References
[1] Maxime Clenet, François Massol, Jamal Najim (2023) Impact of a block structure on the Lotka-Volterra model. arXiv, ver.3 peer-reviewed and recommended by Peer Community in Mathematical and Computational Biology. https://doi.org/10.48550/arXiv.2311.09470 | Impact of a block structure on the Lotka-Volterra model | Maxime Clenet, François Massol, Jamal Najim | The Lotka-Volterra (LV) model is a simple, robust, and versatile model used to describe large interacting systems such as food webs or microbiomes. The model consists of $n$ coupled differential equations linking the abundances of $n$ differen... | ![]() | Dynamical systems, Ecology, Probability and statistics | Loïc Paulevé | 2023-11-17 21:44:38 | View |
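
As a rough illustration of the block-structured Lotka-Volterra setting described in the recommendation above, the sketch below builds a Gaussian interaction matrix whose blocks are scaled by per-block strength coefficients and integrates the dynamics numerically. It is a minimal sketch under stated assumptions, not the authors' code: the function names, the unit growth rate, the self-regulation term, the 1/sqrt(n) normalization, the example strengths, and the explicit Euler scheme are all choices made here for illustration.

```python
import numpy as np


def block_interaction_matrix(sizes, strengths, rng):
    """Return an n x n Gaussian matrix whose (a, b) block is scaled by
    strengths[a][b] / sqrt(n). The normalization is an assumption."""
    n = sum(sizes)
    gaussian = rng.standard_normal((n, n))
    # labels[i] is the index of the community that species i belongs to.
    labels = np.repeat(np.arange(len(sizes)), sizes)
    # Expand the small strength table to one coefficient per species pair.
    scale = np.asarray(strengths, dtype=float)[np.ix_(labels, labels)]
    return scale * gaussian / np.sqrt(n)


def simulate_lv(B, x0, dt=0.01, steps=20000):
    """Integrate dx_i/dt = x_i * (1 - x_i + (B x)_i) with explicit Euler steps.
    Unit growth rates and unit self-regulation are assumptions."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * x * (1.0 - x + B @ x)
        # Guard against small negative values caused by discretization.
        x = np.maximum(x, 0.0)
    return x


rng = np.random.default_rng(0)
sizes = [60, 40]                      # two hypothetical communities
strengths = [[0.3, 0.1], [0.1, 0.3]]  # within-community vs between-community
B = block_interaction_matrix(sizes, strengths, rng)
x_final = simulate_lv(B, np.ones(sum(sizes)))

# Species with non-negligible abundance at the end of the run.
surviving = x_final > 1e-6
print("proportion of surviving species:", surviving.mean())
print("mean abundance of survivors:", x_final[surviving].mean())
```

The strength table is kept small (one entry per pair of communities) and expanded to species pairs by indexing, so changing the number or size of communities only requires editing `sizes` and `strengths`. Larger strengths may lead to extinctions or to the loss of a unique stable equilibrium, which is the regime the article's conditions address.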
MANAGING BOARD
Wolfram Liebermeister
Celine Scornavacca (Representative)
Donate Weghorn