268

Alignment-free detection and seed-based identification of multi-loci V(D)J recombinations in Vidjil-algo
Cyprien Borée, Mathieu Giraud, Mikaël Salson
2024
<p>The diversity of the immune repertoire is grounded in V(D)J recombinations in several loci. Many algorithms and software tools detect and designate these recombinations in high-throughput sequencing data. To improve their efficiency, we propose a multi-loci seed identification through an Aho-Corasick-like automaton as well as a seed-based gene filtration. These algorithms were implemented in Vidjil-algo, which is used routinely by several labs for the analysis of hematologic malignancies. We benchmark the results of Vidjil-algo and of MiXCR on five datasets, evaluating the specificity and sensitivity of the detection, as well as the agreement of the designation with manually curated sequences. Compared to the previous algorithms, the new algorithms implemented in Vidjil-algo bring speedups between 3× and 30×, with a smaller memory footprint and without loss of quality in the results. They make it possible to precisely annotate, in a few minutes, millions of sequences coming from V(D)J recombinations, including incomplete V(D)J-like recombinations, improving our knowledge of immune repertoires.</p>
https://www.vidjil.org/data
https://www.vidjil.org/data
https://www.vidjil.org/data
Spaced seeds; Aho-Corasick automaton; Alignment-free algorithms; Immune repertoire; V(D)J recombinations; Adaptive Immune Receptor Repertoire (AIRR); Repertoire Sequencing (RepSeq)
None
Combinatorics, Computational complexity, Design and analysis of algorithms, Genomics and Transcriptomics, Immunology
Véronique Giudicelli <veronique.giudicelli@igh.cnrs.fr>, Burkhard Morgenstern <bmorgen@gwdg.de>, Solon Pissis <solon.pissis@cwi.nl>, Sven Rahmann <Sven.Rahmann@uni-due.de>, Susana Vinga <susanavinga@tecnico.ulisboa.pt>
Suggested by Shunsuke Kanda <shnsk.knd@gmail.com>: Takuya Mieno <tmieno@uec.ac.jp>, Tomohiro I <tomohiro@ai.kyutech.ac.jp>, Dominik Köppl <dominik.koeppl@uni-muenster.de>, Takuya Takagi <takagi.takuya@fujitsu.com>
2023-12-28 18:03:42
Giulio Ermanno Pibiri