Convergent Adaptation in Mitochondria of Phylogenetically Distant Birds: Does it Exist?

Abstract In a wide range of taxa, proteins encoded by mitochondrial genomes are involved in adaptation to lifestyle that requires oxygen starvation or elevation of metabolism rate. It remains poorly understood to what extent adaptation to similar conditions is associated with parallel changes in these proteins. We search for a genetic signal of parallel or convergent evolution in recurrent molecular adaptation to high altitude, migration, diving, wintering, unusual flight abilities, or loss of flight in mitochondrial genomes of birds. Developing on previous work, we design an approach for the detection of recurrent coincident changes in genotype and phenotype, indicative of an association between the two. We describe a number of candidate sites involved in recurrent adaptation in ND genes. However, we find that the majority of convergence events can be explained by random coincidences without invoking adaptation.


Abstract Introduction
Phenotypes 87 We analyzed 415 species of birds. All species were divided into 7 groups according 88 to their phenotypic characteristics. Phenotypes were classified in accordance with the Bird 89 of the World research database (https://birdsoftheworld.org, accessed August 20, 2020). 90 As high altitude we classified those species for which the lower boundary of the range was 91 over 2000 meters, and the upper boundary of the range, over 4000 meters above sea level. 92 As divers we classified those species which can spend at least several minutes underwater. 93 As species with ability for long-distance migration we considered those species with non-94 overlapping or weakly overlapping breeding and wintering ranges. As wintering we 95 classified those species which are typically exposed to sub-zero temperatures and snow 96 cover for many months each year. We also formed two samples of species with specific 97 flight-related phenotypes: flightless birds, a phenotype which has originated repeatedly in 98 different groups of island birds; and birds with outstanding flight abilities. The latter group 99 united swifts, hummingbirds, swallows, terns and gulls, scuas, gannets, tropicbirds, falcons 100 and accipiters. Although the similarity of these adaptations may be controversial, we 101 hypothesize that the lifestyles of all these groups involve high energy demand and thus 102 could affect the mitochondrial genes similarly. 103 To study phenotypic associations, we also need a reference group of species which 104 do not carry the specific adaptations considered in this work. The choice of such a reference 105 is a complicated task, because of the complexity of natural ecological adaptations. As the 6 reference group, we decided to use tropical and subtropical birds with ranges not extending 107 above 2500 meters and which have none of the specific aforementioned adaptations. The 108 number of species in each phenotypic group is provided in Table 1, and the list of species 109 is provided in Figure S5. 110 111 reconstruction. We applied constraints from the most recent revision of bird phylogeny 124 (Kimball, 2019), which combines nuclear and mitochondrial data to construct a consensus 125 supertree for 707 bird species. Amino acid ancestor states reconstruction was also 126 performed in IQ-Tree with usage of genewise partitioned mtVer evolutionary model. 127

Search for convergent evolution and phenotype to genotype associations 128
To count simultaneous changes of phenotype and genotype at the site, we use the 129 simultaneous score of the TreeWAS package (Collins and Didelot, 2018). The 130 simultaneous score is designed for the so-called phylogenetic GWAS analysis. It splits 131 each alignment column into binary (two-state) SNPs and counts simultaneous changes of 132 phenotype and genotype at tree branches. To estimate the probability that the observed 133 association is non-random, TreeWAS simulates a "null" genetic dataset under the empirical 134 phylogenetic tree and terminal phenotypes. It also takes from empirical data the distribution 135 of numbers of substitutions per site. In each simulation, it counts the number of 136 phylogenetic branches with simultaneous changes of phenotype and genotype, and 137 combines these counts to obtain the null distribution. At the upper tail of the null 138 distribution, a threshold of significance is drawn at the quantile corresponding to [1 -139 (alpha-level corrected for multiple testing)]. If a locus in the real dataset has more 140 simultaneous changes than the threshold, it is considered to be significantly associated with 141 the corresponding phenotype. 142 TreeWAS was originally developed for analysis of whole-genome datasets of 157 closely related species, in which the assumption that no more than two variants may occur 158 in any particular site generally holds. By contrast, in distantly related bird species 159 considered here, the amino acid sequence of mitochondria has frequently undergone 9 multiple substitutions per site. Therefore, we had to adapt TreeWAS for dealing with non-161 binary SNPs. This is non-trivial, as we have no a priori knowledge which of the amino acid 162 changes may be associated with changes in the phenotype. 163 We used three approaches to identify the phenotype and genotype changes. 164 Generally, each approach resulted in its own set of branches with phenotype and genotype 165 changes, and therefore in different estimates of the simultaneous score. 166 In the first and in the second approach, we designated one amino acid as foreground, 167 and did not distinguish between the remaining amino acids. We also designated one 168 phenotype (of the two possible phenotypes) as foreground. In the first approach, as 169 genotype changes, we only considered the gains of the foreground amino acid; and as 170 phenotype changes, we only considered the acquisition of the foreground phenotype. As 171 coincident changes, we considered the phylogenetic branches where these events coincided. 172 This approach is referred to as "Convergence" (Fig. 1A). 173 In the second approach, as genotype changes, we considered both the gains and the 174 losses of the foreground amino acid, and as phenotype changes, both the gains and the 175 losses of the foreground phenotype. As coincident changes, we considered the phylogenetic 176 branches where both the foreground amino acid and the foreground phenotype were gained, 177 or both were lost. This approach is referred to as "GWAS" (Fig. 1B). 178 Finally, in the third approach, we assumed that any amino acid substitution 179 constitutes a genotype change event, and any change in the phenotype counts 180 independently of its direction. This approach is referred to as "All changes" (Fig. 1C).
These approaches correspond to different assumptions regarding the genotype-182 phenotype association. The "Convergence" approach assumes that the gains of the trait are 183 associated with a gain of a specific amino acid variant, while its losses can proceed through 184 multiple means. The "GWAS" approach assumes that both the gains and the losses of the 185 trait are associated respectively with gains and losses of a specific amino acid variant. 186 Finally, the "All changes" approach assumes that both the gains and the losses of the trait 187 are associated with any changes in the encoded amino acid. 188

Change of site-specific amino acid propensities 189
To get an alternative view of amino acid changes associated with convergent

Simultaneous change 208
All three types of simultaneous score metrics revealed that the number of 209 significant associations was low. The highest number of sites (8) was detected in high 210 altitude birds, all of them of marginal significance. The Convergence test detected two sites 211 in ND1 and ND5 genes (Figure 2). The GWAS test detected another two sites in ND2 gene 212 ( Figure S1). The All changes test detected 6 sites in ND1, ND2, ND4, ND5 and ND6 genes 213 ( Figure S3). Findings of different tests partially overlap (Table 2). Additionally, the All 214 changes test detected the site associated with adaptation to long-distance migration in the 215 ND5 gene. When the stronger Bonferroni correction was applied, only two sites detected 216 by the Convergence test in high altitude birds and one site detected by the All changes test 217 in long-distance migrants remained significant (Table 2).

Profile change 230
As an additional test for convergence, we use the amino acid Profile change metric. 231 We expect that recurrent mutations emerging simultaneously with convergent phenotype 232 change could also lead to a change in the amino acid profile between the branches carrying 233 the foreground and the background phenotypes. We compared the results of the 234 Simultaneous tests under all three approaches with the Profile change test (Figure 3, Figure First, we were interested to estimate profile change levels at sites that have a 237 significant Simultaneous score. Among them, only one (57th position of ND5, detected by 238 the Convergence test) has profile change score above 0.5 (0.82) and thus can be considered 239 as potentially convergent. Others either result from convergent changes to the same amino 240 acid without a profile change (17th position of ND1), or are a consequence of divergent 241 evolution (all other positions). 242 Second, we could expect that sites with higher simultaneous score could have 243 higher profile change metric if a substantial fraction of these sites is involved in convergent 244 adaptations. To test this assumption, we arbitrarily divided the plots (Figure 3, Figure S2, 245 S4) into four parts, and tested if sites with higher simultaneous score have higher profile 246 change score. We found no dependency in any of the tested phenotype groups (Fisher test, 247 significance level 0.01). works, we here explore convergence between distantly related species. All the phenotypes assumption. This suggests that adaptive convergent evolution is rare or hardly detectable 310 in bird OXPHOS system at the considered phylogenetic distances. 311 There remains a possibility that convergent adaptations in the OXPHOS system 312 could be found in groups of close relatives when the tree of life will be sequenced with 313 higher density. As it was shown by Natarajan et al (2016), similar mutations in hemoglobin 314 subunits lead to high latitude adaptations only in a similar genetic context: there are typical 315 "hummingbird high altitude mutations" and typical "duck high altitude mutations". If so, 316 the near-lack of signal in our study has to do with the fact that it was based on too distant 317 organisms which have highly divergent evolutionary landscapes in the genes of interest. 318 However, other work indicates that similar substitutions are rarely involved in independent 319 adaptation to high altitudes even inside a group of closely related species like 320 hummingbirds (Lim et al 2019). 321 Further work may improve the search for simultaneous changes by using better 322 statistical models. Specifically, these models could be improved by incorporating the 323 heterogeneity of the substitution rates between sites or phylogenetic branches.