Recommendation

Estimates of Effective Population Size in Subdivided Populations

Alan Rogers, based on reviews by 2 anonymous reviewers

A recommendation of:

Population genetics: coalescence rate and demographic parameters inference

Olivier Mazet, Camille Noûs (2023), arXiv, ver. 2, peer-reviewed and recommended by PCI Mathematical and Computational Biology. https://doi.org/10.48550/arXiv.2207.02111

Now published in Peer Community Journal.

Abstract

Population genetics: coalescence rate and demographic parameters inference

We propose in this article a brief description of the work, carried out over almost a decade, resulting from a collaboration between mathematicians and biologists from four different research laboratories. These researchers are identifiable as the co-authors of the articles whose results are described here, and are implicitly co-authors of this article under the signature of Camille Noûs. This modeling work is part of population genetics, and therefore lies essentially at the interface between mathematical tools, more particularly probabilistic ones, and biological data, more specifically genetic data.

In the first part, we briefly present the theory of coalescence, which is the basis of our models, and the problems that this modeling tries to address. In the second part, we describe our first results and the development of the IICR (Inverse of Instantaneous Coalescence Rate), the nodal point from which the different research paths we have been following branch out. The first results along these paths are summarized in the two following parts, one about the inference of demographic parameters of a structured population, the other about taking selection into account.

Coalescent Theory, Stochastic Modeling, Population Genetics

Submission: posted 11 July 2022
Recommendation: posted 25 April 2023, validated 02 May 2023

Cite this recommendation as:
Rogers, A. (2023) Estimates of Effective Population Size in Subdivided Populations. Peer Community in Mathematical and Computational Biology, 100150. https://doi.org/10.24072/pci.mcb.100150

Recommendation

We often use genetic data from a single site, or even a single individual, to estimate the history of effective population size, Ne, over time scales in excess of a million years. Mazet and Noûs [2] emphasize that such estimates may not mean what they seem to mean. The ups and downs of Ne may reflect changes in gene flow or selection, rather than changes in census population size. In fact, gene flow may cause Ne to decline even if the rate of gene flow has remained constant.

Consider for example the estimates of archaic population size in Fig. 1, which show an apparent decline in population size between roughly 700 kya and 300 kya. It is tempting to interpret this as evidence of a declining number of individuals, but that is not the only plausible interpretation.

Each of these estimates is based on the genome of a single diploid individual. As we trace the ancestry of that individual backwards into the past, the ancestors are likely to remain in the same locale for at least a generation or two. Because they are neighbors, there is a chance that they will mate. This implies that in the recent past, the ancestors of a sampled individual lived in a population of small effective size.

As we continue backwards into the past, there is more and more time for the ancestors to move around on the landscape. The farther back we go, the less likely they are to be neighbors, and the less likely they are to mate. In this more remote past, the ancestors of our sample lived in a population of larger effective size, even if neither the number of individuals nor the rate of gene flow has changed.

For a while, then, Ne should increase as we move backwards into the past. This process does not continue forever, because eventually the ancestors will be randomly distributed across the population as a whole. We therefore expect Ne to increase towards an asymptote, which represents the effective size of the entire population.
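
The argument above can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the authors' code: it uses a symmetric n-island model in which two lineages in the same deme coalesce at rate 1 (time measured in units of the coalescent time scale of one deme), the pair moves to different demes at scaled rate M, and two lineages in different demes return to the same deme at rate M/(n-1). This parameterization and the function names are hypothetical choices for the example. The IICR is computed as P(T > t) divided by the density of the coalescence time T, using Sylvester's formula for the exponential of the 2x2 transient generator.

```python
import math


def _transient_generator(n, M):
    """Entries of the 2x2 transient sub-generator over the states
    (same deme, different demes) of a symmetric n-island model.
    Hypothetical parameterization: deme coalescence rate 1, scaled
    migration rate M, time in units of one deme's coalescent time scale."""
    if n < 2 or M <= 0:
        raise ValueError("need n >= 2 islands and M > 0")
    a = M / (n - 1)
    return -(1.0 + M), M, a, -a  # A00, A01, A10, A11


def _eigenvalues(A00, A01, A10, A11):
    trace = A00 + A11
    det = A00 * A11 - A01 * A10
    disc = math.sqrt(trace * trace - 4.0 * det)  # > 0 here: distinct real roots
    return (trace + disc) / 2.0, (trace - disc) / 2.0  # lam1 is closer to 0


def island_iicr(t, n, M):
    """IICR at time t for two lineages sampled in the same deme."""
    A00, A01, A10, A11 = _transient_generator(n, M)
    lam1, lam2 = _eigenvalues(A00, A01, A10, A11)
    # Sylvester's formula: exp(A t) = c1 * I + c2 * A for distinct eigenvalues.
    c1 = (lam1 * math.exp(lam2 * t) - lam2 * math.exp(lam1 * t)) / (lam1 - lam2)
    c2 = (math.exp(lam1 * t) - math.exp(lam2 * t)) / (lam1 - lam2)
    # First row of exp(A t): both lineages start in the same deme.
    p_same = c1 + c2 * A00
    p_diff = c2 * A01
    # P(T > t) = p_same + p_diff; density of T at t = p_same * (rate 1).
    return (p_same + p_diff) / p_same


def island_iicr_limit(n, M):
    """Plateau of the IICR as t grows: -1 / (eigenvalue closest to zero)."""
    lam1, _ = _eigenvalues(*_transient_generator(n, M))
    return -1.0 / lam1


if __name__ == "__main__":
    for t in (0.0, 1.0, 5.0, 20.0, 100.0):
        print(t, round(island_iicr(t, n=10, M=1.0), 3))
    print("plateau:", round(island_iicr_limit(n=10, M=1.0), 3))
```

In this toy model the IICR starts at 1 (the size of a single deme) and rises monotonically towards the plateau returned by island_iicr_limit, which approaches n (the whole population, in deme units) when M is large and grows without bound as M shrinks, even though neither deme size nor migration changes. For very large t the probabilities underflow, so the plateau function is the safer choice there.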

This simple story gets more complex if either the census size or the rate of gene flow changes. Mazet and Noûs [2] have shown that one can mimic real estimates of population history using models in which the rate of gene flow varies, but census size does not. This implies that the curves in Fig. 1 are ambiguous. The observed changes in Ne could reflect changes in census size, gene flow, or both.
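
The claim that changes in gene flow alone can reshape the curve can be illustrated with a toy symmetric island model. This sketch is an assumption-laden illustration, not the method used by Mazet and Noûs: the number of demes n and every deme's size stay fixed, two lineages in the same deme coalesce at rate 1 (time in units of one deme's coalescent time scale, counted backwards from the present), and the scaled migration rate is an arbitrary function of time, here a hypothetical step from M = 1 to M = 10 at t = 20. The forward equations for the probabilities that the two lineages are in the same deme or in different demes are integrated with a fixed-step fourth-order Runge-Kutta scheme; all names are hypothetical.

```python
def iicr_variable_migration(t_max, n, migration, h=0.01):
    """Return (times, iicr) for two lineages sampled in the same deme of a
    symmetric n-island model with constant deme size and a time-varying
    scaled migration rate migration(t). Hypothetical parameterization:
    deme coalescence rate 1, time in deme coalescent units."""

    def deriv(t, ps, pd):
        M = migration(t)
        a = M / (n - 1)
        # ps: lineages in the same deme; pd: lineages in different demes.
        # Coalescence (rate 1) removes probability mass from ps.
        return -(1.0 + M) * ps + a * pd, M * ps - a * pd

    ps, pd = 1.0, 0.0  # both lineages sampled in the same deme
    times, iicr = [0.0], [1.0]
    for i in range(int(round(t_max / h))):
        t = i * h
        k1 = deriv(t, ps, pd)
        k2 = deriv(t + h / 2, ps + h / 2 * k1[0], pd + h / 2 * k1[1])
        k3 = deriv(t + h / 2, ps + h / 2 * k2[0], pd + h / 2 * k2[1])
        k4 = deriv(t + h, ps + h * k3[0], pd + h * k3[1])
        ps += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        pd += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        times.append((i + 1) * h)
        # IICR = P(T > t) / density of T at t = (ps + pd) / ps.
        iicr.append((ps + pd) / ps)
    return times, iicr


def step_migration(t):
    """Hypothetical history: M = 1 for t < 20, M = 10 further back in time."""
    return 1.0 if t < 20.0 else 10.0


if __name__ == "__main__":
    times, iicr = iicr_variable_migration(60.0, n=10, migration=step_migration)
    for target in (0, 5, 19, 25, 60):
        i = int(round(target / 0.01))
        print(times[i], round(iicr[i], 2))
```

Read backwards in time, this curve rises towards about 18.5 (the plateau for M = 1 with n = 10) and then falls towards about 10.8 (the plateau for M = 10), although no deme ever changes size. A fixed-step integrator is used here because it handles any migration history without special cases; an exact matrix exponential per constant interval would be more precise.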

For this reason, Mazet and Noûs [2] would like to replace the term “effective population size” with an alternative, the “inverse instantaneous coalescence rate,” or IICR. I don’t share this preference, because the same critique could be made of all definitions of Ne. For example, Wright [3, p. 108] showed in 1931 that Ne varies in response to the sex ratio, and this implies that changes in Ne need not involve any change in census size. This is also true when populations are geographically structured, as Mazet and Noûs [2] have emphasized, but this does not seem to require a new vocabulary.
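
Wright's sex-ratio result is usually written as Ne = 4 Nm Nf / (Nm + Nf), where Nm and Nf are the numbers of breeding males and females. The short sketch below (the function name is hypothetical) shows Ne changing while the census size Nm + Nf stays fixed, which is the kind of ambiguity discussed above.

```python
def ne_sex_ratio(n_males, n_females):
    """Effective size under unequal numbers of breeding males and females,
    using the standard form of Wright's result: Ne = 4 Nm Nf / (Nm + Nf)."""
    if n_males <= 0 or n_females <= 0:
        raise ValueError("both sexes must be represented")
    return 4.0 * n_males * n_females / (n_males + n_females)


# Same census size (100 breeders) in both cases, different sex ratios.
print(ne_sex_ratio(50, 50))  # 100.0
print(ne_sex_ratio(10, 90))  # 36.0
```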

Figure 1: PSMC estimates of the history of population size based on three archaic genomes: two Neanderthals and a Denisovan [1].

Mazet and Noûs [2] also show that estimates of Ne can vary in response to selection. It is not hard to see why such an effect might exist. In genomic regions affected by directional or purifying selection, heterozygosity is low, and common ancestors tend to be recent. Such regions may contribute to small estimates of recent Ne. In regions under balancing selection, heterozygosity is high, and common ancestors tend to be ancient. Such regions may contribute to large estimates of ancient Ne. The magnitude of this effect presumably depends on the fraction of the genome under selection and the rate of recombination.
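
A minimal way to see the direction of this effect is a toy mixture, which is an assumption made here and not the model of Mazet and Noûs: suppose a fraction w_i of the genome coalesces at a constant rate 1/N_i, with small N_i standing in for regions under directional or purifying selection and large N_i for regions under balancing selection. If the coalescence times of all regions are pooled, the IICR is the pooled survival function divided by the pooled density. This ignores the time-varying nature of real selective sweeps; the weights, sizes, and function name are hypothetical.

```python
import math


def mixture_iicr(t, weights, local_sizes):
    """IICR at time t when a fraction weights[i] of the genome coalesces at
    constant rate 1 / local_sizes[i] (toy model of heterogeneous selection)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    survival = sum(w * math.exp(-t / N) for w, N in zip(weights, local_sizes))
    density = sum(w / N * math.exp(-t / N) for w, N in zip(weights, local_sizes))
    return survival / density


# Hypothetical genome: 20% under directional/purifying selection (local size
# 0.5), 75% neutral (size 1), 5% under balancing selection (size 5).
w, sizes = [0.20, 0.75, 0.05], [0.5, 1.0, 5.0]
for t in (0.0, 1.0, 5.0, 20.0, 50.0):
    print(t, round(mixture_iicr(t, w, sizes), 3))
```

In this toy genome the curve starts at the weighted harmonic mean of the local sizes (about 0.86, below the neutral value of 1) and climbs towards the largest local size, 5, although no local size ever changes: small recent estimates and large ancient ones, as described above.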

In summary, this article describes several processes that can affect estimates of the history of effective population size. This makes existing estimates ambiguous. For example, should we interpret Fig. 1 as evidence of a declining number of archaic individuals, or in terms of gene flow among archaic subpopulations? But these questions also present research opportunities. If the observed decline reflects gene flow, what does this imply about the geographic structure of archaic populations? Can we resolve the ambiguity by integrating samples from different locales, or using archaeological estimates of population density or interregional trade?

REFERENCES

[1] Fabrizio Mafessoni et al. “A high-coverage Neandertal genome from Chagyrskaya Cave”. Proceedings of the National Academy of Sciences, USA 117.26 (2020), pp. 15132–15136. https://doi.org/10.1073/pnas.2004944117

[2] Olivier Mazet and Camille Noûs. “Population genetics: coalescence rate and demographic parameters inference”. arXiv, ver. 2, peer-reviewed and recommended by Peer Community In Mathematical and Computational Biology (2023). https://doi.org/10.48550/ARXIV.2207.02111.

[3] Sewall Wright. “Evolution in mendelian populations”. Genetics 16 (1931), pp. 97–159. https://doi.org/10.1093/genetics/16.2.97.

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article. The authors declared that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Funding:
No funding

Reviews

Reviewed by anonymous reviewer 2, 11 Apr 2023

The authors have adopted most of the minor and stylistic suggestions made by the reviewers, which I think has improved the manuscript overall. Regarding the more general comments that were made:
a- the bibliographic survey is more comprehensive and credits a number of studies and ideas from various research groups, which is also an improvement;
b- my suggestion of roughly quantifying how problematic it is to ignore population structure (by showing that most populations do depart from the panmixia assumption to an extent expected to seriously bias inferences) was not really followed, but admittedly this was a difficult comment to address, and perhaps a subject for another article;
c- the section on natural selection was substantially improved and now gives a pertinent description of the underlying rationale.

Regarding item c above, I have an additional, minor comment: the text says that "portions of the genome undergo a recurrent higher impact of selection due to their local gene content or recombination rate"; actually, gene content determines the strength of selection, and recombination rate determines the width of the genomic region expected to be affected. I find the sentence a little misleading in suggesting that the two factors are more or less interchangeable. I would suggest either dropping the mention of recombination or writing "and recombination rate" instead of "or recombination rate".

Overall I think the authors invested a great deal of effort in improving the manuscript based on the reviewers' comments, making it a valuable addition to the literature.

https://doi.org/10.24072/pci.mcb.100150.rev21

Evaluation round #1

DOI or URL of the preprint: https://doi.org/10.48550/arXiv.2207.02111

Version of the preprint: 1

Author's Reply, 13 Mar 2023

Author's reply: https://doi.org/10.24072/pci.mcb.100150.ar1

Decision by Alan Rogers, posted 16 Sep 2022

Attached (recommender's annotations).

https://doi.org/10.24072/pci.mcb.100150.d1

Reviewed by anonymous reviewer 2, 17 Aug 2022

This preprint reviews general aspects of coalescent theory in population genetics, then focuses on recent work by the authors on modeling and interpreting the variation in coalescence rate across loci measured from genome-wide data.

This focus is a deliberate choice by the authors, and perhaps a questionable one, since they are obviously not the only ones to have contributed to this field. Yet the authors have been among the first to clearly identify and demonstrate the naivety of the typical, single-population interpretation of PSMC patterns, which has dominated the field for years. They have a well-articulated message on this problem, and propose new developments for an informed interpretation of this kind of analysis, all of which revolve around the IICR formalism they introduced. For this reason I think this piece is a valuable addition to the literature. Still, I would encourage the authors to mention similar or related developments made by other groups, if any. It should be noted that the authors use a generic author name (with the exception of a junior scientist), which indicates that this contribution is not intended to inflate their bibliometric statistics.

My other general comment is about the importance of the problem in practice. Ignoring population structure is only a problem if populations are structured, and sufficiently so that the IICR is substantially affected. So a naive question would be: are typical natural populations sufficiently structured for the problem to be serious? I am pretty sure the answer is yes, and indeed some of the authors' papers address this, but I feel this review could be an opportunity to make a general argument here. Could the authors give an idea, perhaps based on reviews of the empirical literature, of what proportion of analyses are expected to be seriously flawed vs. more or less robust to this problem? Even a broad picture would be a plus in my opinion.

Finally I have a number of point-by-point comments/suggestions which I hope might help improve the clarity of this important, well-done manuscript.

- [p1; section 1; paragraph 2] "until successive common ancestors are found"
-> sounds like an awkward process-ending condition to me; maybe "until a common ancestor is found" ?

- [p1; section 1; paragraph 3] "The mathematical objects of interest..."
-> I think I would use singular instead of plural ("The mathematical object of interest is the joint distribution...")
-> "express [...] as a function of [...]" : maybe just "predict"?
-> this sentence more or less implies that the whole coalescent process is entirely described by coalescence times, whereas tree topology arguably also matters (e.g., one does not expect the same SFS depending on whether the tree is symmetrical or pectinated, as soon as n>3); "coalescence times" and "tree" are often used as synonyms in the manuscript; maybe clarify by adding a section or sentence about tree topology, its distribution, and its independence with respect to demography and selection?
-> this sentence mentions "a family tree", suggesting that we are considering a single locus here, whereas the previous paragraph mentions "loci", and the next sentence mentions recombination as a relevant parameter, implying several loci; could the text be more consistent in this respect?

- [fig1 legend]
-> "in the past" -> "ago"? (twice)

- the first three equations of p3 are not numbered

- [p3, second equation]
-> I am not sure every reader will know the "floor" symbol so would suggest defining it (or dropping it)

- [p3, third equation]
-> Tk and t in the third equation are not expressed in the same unit as Tk and t in the second equation (2N generations vs. generations), but the same symbols were kept

- overall I feel that the treatment of time-scale normalization by Ne could be improved; it is introduced very lightly in the first equations, then recalled in several sections, sometimes at length

- [p3, last section] "The absence of the panmictic assumption" -> "To relax the assumption of panmixy" ?

- [p4, first section] "assuming that the sizes of each population are sufficiently large" -> "assuming that populations are sufficiently large" ?

- equation 3: I do not understand why the left-hand term is not Q(alpha,beta) instead of Q(n_alpha,n_beta); I do not understand what these n_alpha and n_beta terms represent; I am apparently missing a level of complexity here

- [p4, "Mutation and genetic diversity" section] Reference to Tajima's D probably misplaced here. Tajima's D is a statistic that combines two estimators of 4Ne.mu (topic of the section) to learn about deviations from the standard coalescent. Maybe the intended reference is Tajima 1983 Genetics 105:437 ?

- [p6, first section] the key sentence starting "Considering that..." seems to lack a verb

- [p6, middle, first consequence ("The sole data...")] I would suggest being explicit and replacing "demographic model" with "population structure". At first reading I mentally interpreted "demographic model" as "model of Ne change in a single pop", thus missing the point. I know this is because I'm biased in a priori considering a single pop when thinking coalescent. Still I might not be alone, and given the importance of the sentence I would suggest avoiding any ambiguity.

- [p8] "so the matrices can be time dependent as piecewise constant functions" -> "so the matrices can be piecewise constant functions of time" ?

- [p10] "it is natural to want to increase" -> I would suggest rephrasing as "A natural way to increase the precision of the estimation of demographic parameters is to increase the sample size."

- [p13, middle]: "this rate being linked to the reproductive capacity"; this seems confusing, and the nature of the link is unclear; there is no such thing as distinct portions of a genome differing in their "reproductive capacities"; instead, a genomic portion hosts genes at which distinct alleles conferring distinct reproductive capacities on their carriers used to appear and segregate, affecting the IICR; the sentence seems to entail a simple link between coalescence rate and the strength of selection, whereas in reality things are more complicated; for instance, both selective sweeps (positive s) and background selection (negative s) tend to reduce coalescence times. Maybe instead "this rate being linked to the selective regime at work"?

- Section 4 could discuss the appropriateness of modeling linked selection via a variable Ne across the genome. Selection is intrinsically variable in time, whereas the model presented here assumes a coalescence rate that is constant in time in any portion of the genome. For instance, selective sweeps are expected to induce brief periods of very high coalescence rate (e.g., see papers by Barton, Hermisson, Petrov, Jensen and many others), i.e., to affect the shape of genealogies and the IICR in a way not easily captured by the model used here. This model, however, is probably well suited to capturing across-genome variation in a selective regime that is constant in time, e.g., regions under balancing selection vs. regions under recurrent background selection. Overall, I feel that connecting section 4 more firmly to the (large) body of literature on selection detection in population genomics would be a great addition. Note that these considerations on tree shape essentially disappear when sample size = 2, which is still often the case in the PSMC-related literature, and this could also be mentioned as a justification for the approach presented here.

- [last section] in addition to Charlesworth 2009 the authors might like to cite the recent Waples 2022 J Hered 113:371
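
For readers less familiar with the statistic mentioned in the comment on Tajima's D: the two estimators of θ = 4Neμ it combines are Watterson's estimator (the number of segregating sites S divided by a1, the sum of 1/i for i = 1 to n-1) and the mean number of pairwise differences π. The sketch below uses the standard normalizing constants for D and assumes aligned haplotypes given as equal-length strings, a hypothetical input format; the function name is also hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(haplotypes):
    """Return (pi, theta_w, D) for aligned haplotypes of equal length."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences")  # variance terms vanish for n < 4
    length = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for j in range(length) if len({h[j] for h in haplotypes}) > 1)
    # pi: mean number of pairwise differences over all n(n-1)/2 pairs.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(x != y for x, y in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1  # Watterson's estimator of 4 Ne mu
    if s == 0:
        return pi, theta_w, float("nan")  # D is undefined without polymorphism
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d


print(tajimas_d(["100", "010", "001", "000"]))  # three singletons: D < 0
print(tajimas_d(["11", "11", "00", "00"]))      # two sites at frequency 1/2: D > 0
```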

https://doi.org/10.24072/pci.mcb.100150.rev11

Reviewed by anonymous reviewer 1, 15 Sep 2022

In this manuscript, Olivier Mazet and “Camille Noûs” summarize important recent work on how population structure and selection bias PSMC estimates. In these studies, it has been shown that the PSMC actually estimates the inverse of the instantaneous coalescence rate (IICR), which captures the distribution of coalescence times along the genome. Changes in the IICR only directly correspond to changes in the effective population size in a panmictic population, but not if the population is structured. This has important implications because PSMC is often applied to genomes of humans and other species to infer demographic histories, and PSMC plots are usually interpreted as changes in the effective population size. Therefore it is critical to know which processes shape PSMC estimates and in which ways. Because of this, a review of recent developments in this direction will be useful to the broader scientific community. The equations described throughout this manuscript appear to be correct.

However, as presently written, this manuscript largely focuses on a decade of work from a single research group. It would benefit greatly from being reworked as either a primer on how to interpret PSMC results, or as a review paper that covers the work of additional research groups. An additional issue is that this manuscript can be improved for clarity. It is not always clear why certain concepts are introduced or discussed and where the text is leading to. An additional improvement would be to reduce the number of figures (as some are a little redundant).

A central point of this manuscript is that changes in the IICR inferred by PSMC should be interpreted as actual changes in effective population size (Ne) only cautiously, and only if one is confident that the panmictic model holds. However, Ne != N. This distinction between Ne and N should be made explicit throughout the manuscript, as the term “population size” is sometimes used in a blurry way. The utility of this paper would be significantly increased if the author(s) were to further discuss how different evolutionary and demographic processes can shape Ne/IICR and PSMC plots.

Please, ensure to cite previous relevant work at all appropriate positions throughout the text. Sometimes it is not clear which study is discussed (e.g., discussion of Figure 7 on Page 9). Also other relevant computational tools for inferring population structure and changes in IICR should be cited (e.g., SMC++ and diCAL).

Additional suggestions and queries are listed below:

Page 1: I appreciate that the authors are up-front about being an anonymous collective (“Camille Noûs”). However, I think this can be shortened or even mentioned in a footnote. If the goal is to give credit to other members of a research team, why not name them explicitly in an acknowledgements section at the end of the manuscript?

Page 1: The first paragraph can be more informative about the contents of this manuscript. Furthermore, the second paragraph of the paper should be more specific about what the problem is (i.e., that population structure affects PSMC estimates) and how recent work has tried to address the problem. Doing so will help guide the reader to know what is coming later in this manuscript. Following the previous comment, the review of coalescence theory is a bit lengthy and it is not always clear why these concepts are introduced. It may be good to give a brief introduction first, elaborating on the problem addressed by the reviewed work and setting up the following sections. This will make it easier for the reader to follow and see how each subsection relates to the bigger picture.

Page 4: The interpretations of the conditions in Eq. 3 should be provided.

Page 5: Recent years have seen advances in the application of ARGs. It would be a good idea to add some citations in this regard, and perhaps go into additional detail about the strengths and weaknesses of ARGs.

Page 7: I understand that Figure 3 was taken from another publication but simulation parameters in the figure legend would be helpful. Also please add a citation to the original paper in the Figure 3 legend so that one knows it is not an original result. Similar suggestions hold for subsequent figures.

Pages 7 and 9 (related to Figure 5): What exact models were simulated? I get that they are described in detail in Chikhi et al. 2018 but the reader shouldn’t need to go to the original publication to understand what is simulated.

Page 9 (related to Figure 7): This is an interesting point. It might be worth going into more detail, and providing the relevant citation.

Page 10: Computation of the IICR_k: This is interesting theory, but why is it introduced? Has it been applied to data yet? If so, what were the results?

Page 11: In Figure 8, please define what a component is in the figure legend. What are the units of t and M?

Page 13: The discussion of the SFS comes out of the blue and it is not entirely clear how it fits into the broader context. As such, it could be deleted without harming the paper.

https://doi.org/10.24072/pci.mcb.100150.rev12
