PCI Math Comp Biol

185

Title *

Revisiting pangenome openness with k-mers

Authors *

Luca Parmigiani, Roland Wittler, Jens Stoye

Year *

2024

Abstract *

<p style="text-align: justify;">Pangenomics is the collective study of related genomes, usually from the same species or closely related taxa. Originally, pangenomes were defined for bacterial species. After the concept was extended to eukaryotic genomes, two definitions of the pangenome evolved in parallel: the gene-based approach, which defines the pangenome as the union of all genes, and the sequence-based approach, which defines the pangenome as the set of all nonredundant genomic sequences. Estimating the total size of the pangenome for a given species has been a subject of study since the very first mention of pangenomes. Traditionally, this is done by predicting the rate at which new genes are discovered, referred to as the openness of the species. Here, we abstract each genome as a set of items, an abstraction that is entirely agnostic of the two approaches (gene-based, sequence-based). Genes are a viable choice of items, but other choices are feasible too, e.g., genome sequence substrings of fixed length k (k-mers). In the present study, we investigate the use of k-mers as an alternative to genes for estimating openness, and compare the results. An efficient implementation is also provided.</p>
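The set-of-items view described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kmers` and `pangenome_growth` are hypothetical, the sequences are toy data, and only forward-strand k-mers are counted (reverse complements are ignored for simplicity). The sketch averages the cumulative number of distinct items over random genome orderings, which gives a growth curve; fitting that curve to estimate openness is left out.

```python
import random


def kmers(seq, k):
    """Return the set of all substrings of length k in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pangenome_growth(item_sets, n_orders=100, seed=0):
    """Average number of distinct items seen after adding 1, 2, ..., m genomes.

    item_sets: list of sets, one per genome (items may be genes or k-mers).
    The average is taken over n_orders random genome orderings.
    """
    rng = random.Random(seed)
    m = len(item_sets)
    totals = [0] * m
    for _ in range(n_orders):
        order = list(item_sets)
        rng.shuffle(order)
        seen = set()
        for i, items in enumerate(order):
            seen |= items          # union with the items of the next genome
            totals[i] += len(seen)
    return [t / n_orders for t in totals]


if __name__ == "__main__":
    # Toy sequences, assumed purely for illustration.
    genomes = ["ACGTAC", "ACGTTT", "GGGACG"]
    sets = [kmers(g, 3) for g in genomes]
    print(pangenome_growth(sets, n_orders=50))
```

Because each genome is reduced to a plain set, the same `pangenome_growth` function works unchanged if the sets contain gene identifiers instead of k-mers.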

Indicate the full web address (DOI or URL) giving public access to these data (if you have any problems with the deposit of your data, please contact contact@mcb.peercommunityin.org). In case all raw data are included in the preprint, indicate the DOI or URL of the preprint. *

https://doi.org/10.5281/zenodo.8256094

Indicate the full web address (DOI or URL) giving public access to these scripts (if you have any problems with the deposit of your scripts, please contact contact@mcb.peercommunityin.org). In case all raw scripts are included in the preprint, indicate the DOI or URL of the preprint. *

https://doi.org/10.5281/zenodo.8256094

Indicate the full web address (DOI, SWHID or URL) giving public access to these codes (if you have any problems with the deposit of your codes, please contact contact@mcb.peercommunityin.org). In case all raw codes are included in the preprint, indicate the DOI or URL of the preprint. *

https://gitlab.ub.uni-bielefeld.de/gi/pangrowth

Keywords (optional)

Bioinformatics, Pangenomics

Methods that require specific expertise (optional)

None

Thematic fields *

Combinatorics, Genomics and Transcriptomics

Submission date

2022-11-22 14:48:18

Recommender

Leo van Iersel

Reviewers

Guillaume Marçais, Yadong Zhang
