Pratap Singh, Trevor D Price, Evolution of species recognition when ecology and sexual selection favor signal stasis, Evolution, Volume 78, Issue 10, 1 October 2024, Pages 1647–1660, https://doi.org/10.1093/evolut/qpae099
Abstract
The process of reproductive character displacement involves divergence and/or the narrowing of variance in traits involved in species recognition, driven by interactions between taxa. However, stabilizing sexual selection may favor stasis and species similarity in these same traits if signals are optimized for transmission through the prevailing environment. Further, sexual selection may promote increased variability within species to facilitate individual recognition. Here we ask how the conflicting selection pressures of species recognition and sexual selection are resolved in a genus of Himalayan birds that sing exceptionally similar songs. We experimentally show that small differences in two traits (note shape and peak frequency) are both necessary and sufficient for species recognition. Song frequency shows remarkable clinal variation along the Himalayan elevational gradient, being most divergent where species co-occur, the classic signature of reproductive character displacement. Note shape shows no such clinal variation but varies more between individuals of an allopatric species than it does among individuals within species that co-occur. We argue that the different note shapes experience similar transmission constraints, and differences produced through species interactions spread back through the entire species range. Our results imply that reproductive character displacement is likely to be common.
The achievement of sympatry among sister species generally involves three steps (Mayr, 1942; Mayr & Diamond, 2001; Price, 2008; Tobias et al., 2020). First, populations diverge in allopatry. Second, differentiated populations come into parapatric contact, sometimes associated with the formation of hybrid zones. Third, populations establish large overlap in range (i.e., sympatry), whereby heterospecifics are largely ignored and hybridization is absent. The presence of allopatric and parapatric forms has enabled many studies of the first two stages (Anderson et al., 2023; Freeman & Montgomery, 2017; Price, 2008; Uy et al., 2018; Weir & Price, 2019). The third and final step of co-existence, that is, the development of heterospecific recognition in sympatry, has been more difficult to study, primarily because sympatric species differ in many ways. For example, large differences among bird species in extreme sexually selected clades have been extensively studied (e.g., the birds of paradise, Ligon et al., 2018), and sexual selection has been invoked as both a driver of divergence and a contributor to species recognition (Cooney et al., 2019; Lande, 1981; Snow, 1976; West-Eberhard, 1983), but the extent to which divergence and species recognition are completed in allopatry or result from interactions between taxa in sympatry is unclear (Coyne & Orr, 2004; Hudson & Price, 2014).
While emphasis has been on the study of striking sexually selected differences between related bird species, many sister species are very similar to each other in both color (Fišer et al., 2018; Marcondes & Brumfield, 2019; Mayr, 1942; Weir et al., 2015) and vocalizations (Anderson & Weir, 2022; McEntee et al., 2021; O’Loghlen et al., 2011). Given that this similarity is often retained over millions of years of separation, stasis is attributed to fitness costs for individuals with deviant phenotypes, as has been regularly invoked for stasis in morphological traits (Anderson & Weir, 2021, 2022; Charlesworth et al., 1982; Davis et al., 2014; Estes & Arnold, 2007; Lynch, 1990). Stasis may be imposed if social selection against deviants is frequency-dependent, whereby the common type is favored because it is more readily recognized. For example, in many territorial bird species, male competition for high quality territories is likely to be an important component of sexual selection (Scordato, 2018). In such cases, sexual selection maintaining song stasis across species arises if deviants are less easily recognized during competition for territories, leading to escalated aggressive interactions (Derryberry, 2007; Rohwer, 1973). Stabilizing selection may arise in non-frequency dependent ways as well. For example, long-distance signals are expected to be adapted for effective transmission through the prevailing environment (Endler, 1992; Fuller & Endler, 2018), resulting in the penalization of deviant songs that travel less well.
Fuller and Endler (2018) introduced the concept of a permissive environment, across which a diversity of different signals can be effectively transmitted and sexual selection can rapidly drive diversification. In this case, different signals may be effectively interchangeable, producing similar responses in distant receivers. At the other extreme, in restrictive environments, trait exaggeration is limited and only a few variants can be transmitted effectively. The classic example of a restrictive environment is the narrow band of light wavelengths at lower depths in the ocean, which makes diversification in color signals irrelevant. Note that environments may be permissive for one sensory modality (e.g., sound) while being restrictive for another (e.g., color). In this paper, we examine the consequences of a restrictive terrestrial environment for the evolution of vocalizations used in species recognition.
During the allopatric phase of speciation, if different populations remain in the same restrictive environment, stabilizing selection should result in signals remaining at the single optimum that maximizes transmission and minimizes degradation (Endler, 1992). Such stabilizing selection promotes not only uniformity in mean values but also low variation around the optimum, yet other forces may favor an increase in variability. For example, signals often transmit individual identity to conspecifics, and large within-species variation facilitates rapid recognition of familiar individuals (Dale et al., 2001; Whitfield, 1987). Hence, at least in territorial vertebrate systems, we expect a balance between forces acting to promote and reduce variability (Figure 1A, top panel). Together, low divergence and high variability result in overlap between allopatric taxa; the traits will not be diagnostically different when populations come into contact, so members of one taxon are expected to respond to the signals of the other.
Figure 1.
Conceptual view of divergence of signals in a restrictive environment in allopatry (above), followed by sympatry (below). (A) Frequency distributions of signal (solid lines) and response to signal (dashed lines) for individuals in two populations signified by different colors. Responses are expected to cover a wider range of signal trait values than signal itself to account for degradation in noise (Luther & Wiley, 2009). In allopatry, populations show little divergence, and variance within each population is a balance between forces favoring variability, such as individual recognition, and those favoring a single optimum, notably transmission through the environment. In sympatry, an additional selective force of species recognition is achieved by trait divergence (compromising transmission), and a reduction in trait variance (compromising individual recognition), accompanied by a shift in the mean and narrowing of the response curve, respectively. (B) Contour plot of trait and response curves for two traits. If divergence in a single trait is limited, species recognition can be completed through assessment of more than one trait.
When two diverged allopatric taxa come into contact, various maladaptive consequences associated with reproduction are often observed. They include wasteful aggressive responses over competition for territories or mates, signal jamming, and hybridization associated with the production of unfit hybrids. Collectively, these negative interactions are termed reproductive interference (Gröning & Hochkirch, 2008). They generate selection pressures that favor (1) divergence in mean trait value and a reduction in within-taxon variability (Brown & Wilson, 1956) and (2) receivers that narrow their “window of recognition” so that they respond only to members of their own taxon (Hudson & Price, 2014; Irwin & Price, 1999; Seddon & Tobias, 2010) (Figure 1A, lower panel). Together, these processes are termed reproductive character displacement, here defined as evolution of traits or trait recognition driven by reproductive interference. Finally, species recognition may be completed through the employment of multiple traits in assessment (Figure 1B), which is especially to be expected in restrictive environments, if divergence in any particular trait is limited by transmission constraints. The predictions that arise out of this framework, and form the basis for our study, are that entry into sympatry in restrictive environments leads to (1) limited divergence in mean, (2) reduced variability in signal and signal response, and (3) potentially the use of multiple traits (Figure 1).
We study the four species of Himalayan Cyanoderma babblers (Figures 2 and 3). In their field guide to the birds of the Indian subcontinent, Rasmussen and Anderton (2005, p. 438) stated: “All four [Indian] species have extraordinarily similar vocalizations that can hardly be distinguished, if at all (even on sonagrams!).” While species do differ in color (Figure 2), song is the only means of long-distance communication in the thick undergrowth the species occupy. In Figure 4, we show that individuals of two species which do not co-occur, Cyanoderma ruficeps and Cyanoderma ambiguum, have extremely similar songs, and in Figure 2C we show they respond to each other essentially as if they were conspecific, despite more than 2 million years of separation.
Figure 2.
Along the elevational gradient in the east Himalaya the High and Low species do not encounter each other and treat their songs as essentially conspecific, whereas the overlapping pairs of species readily distinguish each other’s songs. (A) Schematic of the four Himalayan Cyanoderma species distributions, based on Birdlife range maps (birdlife.org), eBird records (eBird.org), and personal observations. Observations from lower elevations in east Nepal are few and distributions unclear (for range maps, see Supplementary Figure S1). (B) Phylogenetic relationships (from Moyle et al., 2012), paintings by Ren Hathway (Lynx Edicions). (C) Responses to playbacks in the east Himalayan forms and conspecific controls (means ± SE). Color along the x-axis gives the taxon of the tape played, and color of the vertical bar gives the taxon of the receiving male, with standard errors. Sample sizes are, from left to right: 11, 15, 12, 25, 12, 27 and for conspecific playbacks 31, 37, 35, 14, 25. In pairwise comparisons, responses of species that overlap in elevational distribution (Low-Middle, Middle-High) are not significantly different from each other but differ significantly from the Low-High and conspecific controls (all p < 0.001, based on mixed models with Poisson errors and the specific individual tape played included as a random effect, see Supplementary Material: “Mixed model assessment of responses”). Low-High and conspecific controls do not differ significantly from each other (p > 0.05).
Figure 3.
Each male sings just one song type, consisting of repeated notes over a narrow bandwidth and at a frequency optimal for transmission in near ground forest with dense undergrowth. (A) Representative spectrograms from each species. Three different males are shown in the top three rows and four in the bottom row. (B) Mean maximum and mean minimum frequencies averaged within individual songs, and then across individuals for all species combined, placed under the sound transmission window estimated by Morton (1975) for dense undergrowth. (C) Scatter plot of peak frequency against body mass for 1,750 passerine bird species with masses between 5 and 15 g (from Mikula et al., 2021). The filled points mark the four Himalayan Cyanoderma species.
![Scatter plot of peak frequency in a song against frequency contour for the three species along the elevational gradient in the east Himalaya. Song features are similar in the Low and High species, but differ from the Middle species, especially when two traits are considered together. Peak frequency is the average of all songs measured for an individual (N songs [individuals]) = 290 (27), 233 (21), 150 (19), for the Low, Middle, and High species, respectively. Frequency contour is PC1 of note quartiles, averaged within a song (Supplementary Figure S4, one song per individual recording, N = 397, 320, 338 notes per species). Illustrations are of notes with PC1 values of −1, 1, and 3, respectively; arrows point to the corresponding position on the x-axis.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/evolut/78/10/10.1093_evolut_qpae099/1/m_qpae099_fig4.jpeg?Expires=1748898245&Signature=pb9U6w2z6cNv~PYa0k7~z~XdjmYAU003e2cdEB3WFbxbr1zS-iWl1yo7xCQHbxNPwZVCfhYlI-61F83gE8a~1y6ToXJH6J-orw4FVlD1xlRnBD0I49AKEeeN~~KNyMRWp6mKdPfh5vqdrq3RxJArKSlrXEsmSh0HQQjLX1bKadjfd~UvZUD9NPxpRD-MBBedUKBIqi~6DVBBYLIYYgfVj-Kt17BluOmxtZQCxRPQc9xHQGsBsHRqgDt8lmYoV6-6sdk-GcX8xk16YP4yNwFMckcrV9XvenPSSRj5WoUzp2n~-d75JN5IF0MNd1bEx~RgVZ-sBtk~L6EwmRu6bfRmHQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Figure 4.
Scatter plot of peak frequency in a song against frequency contour for the three species along the elevational gradient in the east Himalaya. Song features are similar in the Low and High species, but differ from the Middle species, especially when two traits are considered together. Peak frequency is the average of all songs measured for an individual (N songs [individuals]) = 290 (27), 233 (21), 150 (19), for the Low, Middle, and High species, respectively. Frequency contour is PC1 of note quartiles, averaged within a song (Supplementary Figure S4, one song per individual recording, N = 397, 320, 338 notes per species). Illustrations are of notes with PC1 values of −1, 1, and 3, respectively; arrows point to the corresponding position on the x-axis.
Using a combination of ancestral reconstruction, song analyses, and playback experiments, we assess the framework outlined in Figure 1. Our goals are to:
(1) Infer the factors that have led to extraordinary song stasis.
(2) Experimentally assess recognition of different components of the songs given the song similarity.
(3) Evaluate the role of reproductive character displacement as the driver of species differences in song.
Character displacement has previously been inferred using two approaches: first, through comparisons between species (allopatric species should differ less from each other than sympatric species; e.g., Yukilevich, 2021), and second, through studies of geographical variation within a pair of species, with the expectation that species should show displacement in regions of overlap (Goldberg & Lande, 2006; Grant, 1972; Kirschel et al., 2009). A major issue with the first approach is controlling for time (sympatric species are often older; Hudson & Price, 2014). A major issue with both approaches is that other factors that affect evolution may differ between zones of allopatry and sympatry (Grant, 1972; Kirschel et al., 2009). For example, if two species occupy different heights in the tree, and height affects song frequency, frequency may vary into zones of sympatry associated with available tree height variation, and this may sometimes result in larger differences in sympatry than allopatry.
Here we adopt both approaches, comparing vocalizations between species in sympatry and allopatry, and studying within-species variation along the elevational gradient. In particular, along the elevational gradient, co-occurring individuals from the two species are exposed to a similar acoustic environment, but populations outside the zone of overlap occupy quite different acoustic environments (Singh & Price, 2015, their Supplementary Figure S5). Hence, greater differences between populations where the two species are found together are not expected to result from independent adaptation to different aspects of the acoustic environment. Instead, independent adaptation predicts convergence in zones of overlap. This implies that the assessment of character displacement through comparisons of differences between populations outside and within the zone of overlap is a relatively robust test of reproductive character displacement.
Methods
Study system
The four Cyanoderma babblers found on the Indian subcontinent are small (7.5–10.5 g), skulking, undergrowth species. Three of the four species form elevational replacements in the east Himalaya (Figure 2A, Supplementary Figure S1). The low elevation (approximately 150–1,100 m) and high elevation (1,500–2,900 m) species are not found together, but each extensively overlaps the mid-elevation species (800–2,300 m; Figure 2A, Supplementary Figure S1). An allopatric species is found in the west Himalaya, and an allopatric population of the low-elevation east Himalayan species (C. ambiguum) in Peninsular India in the Eastern Ghats (Supplementary Figure S1). We studied the Peninsular population of C. ambiguum in 2018 and confirmed its close relationship to the Himalayan population by sequencing two mitochondrial genes and two nuclear genes from two birds (see Supplementary Material “Relatedness of Peninsular and Himalayan Cyanoderma ambiguum”). Henceforth, for clarity we call these taxa the Low, Middle, High, West, and Peninsular taxa, rather than by species names (however, in several figures both species names and the nicknames are given for easy cross-reference).
Only males sing, and they do so prominently during the breeding season, which is from mid-February to July. Through the years 2011–2021, we visited Himalayan localities as indicated in Supplementary Figure S1 between 1 and 10 times during these months.
History
We analyzed the history of the four species using the phylogeny constructed by Moyle et al. (2012), which includes Asian Cyanoderma and close outgroups. We first combined the allospecies C. ambiguum and C. rufifrons (depicted in the figures as Low). We used the simplest Dispersal-Extinction-Cladogenesis model (Ree & Smith, 2008) implemented in BioGeoBEARS (Matzke, 2013) to infer regions of origination and timing of invasion of the Himalaya. This method estimates probabilities based on global maximization of the likelihood of transition rates. We initially assigned four regions (Himalaya, Peninsular India, Indonesia/Philippines, Southeast Asia) but subsequently adopted a two-region model (Himalaya, Indonesia/Philippines, defined by the ellipses in Figure 5), which gave more interpretable results. Pie diagrams in Figure 5 give probabilities of ancestral states (Himalaya, Indonesia/Philippines, or both).
![The buildup of the three east Himalayan Cyanoderma species with most support resulted from two ancestral species emerging in Indonesia and separately colonizing the Himalaya (filled arrows). One lineage led to the Middle species and the other to speciation within the Himalaya leading to the Low and High species. Present-day locations are assigned based on a species presence in the two ellipses indicated in the right panel. Large circles in the reconstruction highlight critical ancestral state assessments. Ancestral reconstructions (A) and (B) assign the highest probability to a speciation event in Indonesia that led to the two ancestral lineages. Note, however, that a Himalayan (green in [A])/Indonesian split (blue in [B]) has reasonable probability.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/evolut/78/10/10.1093_evolut_qpae099/1/m_qpae099_fig5.jpeg?Expires=1748898245&Signature=ZSSgovYr~J3mexxkXSPicrCQ8fIkzwWKSCZxfDTDu5lA6jqEvCq9cZXxGK~4qu3Xj73Z6ccqpGunZHfQP3D8ysXnZb8y2mR6ENv8F60ExDPgtrWJwApBv~dtznqA7ysmYZrQUlY~H55o5K9JqxPEIriPBRaVMuMX~GXkvAx6LSgRWlIQNNaFkn3IYB5a~hLgP4R0HXAnvoYIINJn4HWbltOP6Vk49udTVhXCFI9-WyKGuhq~O14eW93lmKAyc1UZMEq5hLR5aiUpQa9ILyIU6-fnL9mJmS4riDi44bSI-vzNwzmdLV2EBRH-nikAzLpnqyDWAIynIEuDjAb7ceSYyA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Figure 5.
The buildup of the three east Himalayan Cyanoderma species with most support resulted from two ancestral species emerging in Indonesia and separately colonizing the Himalaya (filled arrows). One lineage led to the Middle species and the other to speciation within the Himalaya leading to the Low and High species. Present-day locations are assigned based on a species presence in the two ellipses indicated in the right panel. Large circles in the reconstruction highlight critical ancestral state assessments. Ancestral reconstructions (A) and (B) assign the highest probability to a speciation event in Indonesia that led to the two ancestral lineages. Note, however, that a Himalayan (green in [A])/Indonesian split (blue in [B]) has reasonable probability.
We also attempted to estimate ancestral song traits using the ACE function in APE (Paradis et al., 2004), which assumes a Brownian motion model of evolution, and BayesTraits (Venditti et al., 2011), which modifies branch lengths according to estimated rates of evolution along the branch before fitting the Brownian model, and hence gives better inference when rates vary greatly. Rates of evolution are inferred to be so high in some outgroups that under both methods, ancestral trait reconstructions become very uncertain, as described further in Supplementary Figure S2. Much evolution in the outgroups serves to emphasize the strong stasis within the Cyanoderma.
Song recordings
P.S. conducted all fieldwork. We recorded successive songs in a bout of singing using a Telinga Twin Science microphone with a Sony PCM-D50 solid state recorder, or later with a Sound Devices Mixpre3 digital audio tape recorder. Recordings were made in .wav file format with 16-bit resolution and 44.1 kHz sampling rate, or later with 24-bit resolution and 48 kHz sampling rate. For each recording, we noted altitude, date, and locality. Whenever possible, we made recordings up to 5 min long and avoided individuals where there was a great deal of background noise. Most recordings were made between 0500 and 0900 hr. For song analysis, recordings with little background noise were selected.
Playback experiments
Tapes for playback were made from clean recordings (dates and locations of recordings, including xeno-canto (xeno-canto.org/) downloads that we used for some playbacks, are listed in the table “Recordings Source” in Supplementary Material). A few tapes contained noisy portions (sound louder in the background than the focal song), which were removed. Recordings used for playback contained between three and more than 30 songs from a single individual (note that each male repeats a very similar song each time it sings; for example, Supplementary Figure S4). Songs were standardized in Audacity using a method that sets the largest amplitude in a sound equal to the maximal value and scales other amplitudes proportionately. Speakers used for playback were the SME-AFS Portable Field Speaker (Saul Mineroff Electronics) in earlier years and the Ahuja NBA-15 Portable Rechargeable Speaker (https://www.ahujaradios.com/portable-pa/neckband-pa-systems/nba-15.html) in later years. In the field, we adjusted the volume each time based on assessment by ear. For a 5-min playback, gaps between songs were approximately the same as natural gaps. Methods follow those commonly used in such experiments (e.g., Freeman et al., 2023). We located a singing male, placed the speaker 10–25 m distant from the bird, and ran the playback on repeat for 5 min.
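The amplitude standardization described above amounts to what is usually called peak normalization. The following is a minimal sketch of that idea, not the Audacity implementation; the function name, the target peak of 1.0, and the sample values are assumptions chosen for illustration.

```python
def peak_normalize(samples, peak=1.0):
    """Scale samples so the largest absolute amplitude equals `peak`.

    All other amplitudes are scaled by the same gain, so their
    proportions to one another are preserved.
    """
    largest = max(abs(s) for s in samples)
    if largest == 0:
        return list(samples)  # silent clip: nothing to scale
    gain = peak / largest
    return [s * gain for s in samples]


# Hypothetical waveform amplitudes, not data from the study.
song = [0.05, -0.20, 0.10, -0.02]
normalized = peak_normalize(song)
```

Because a single gain is applied to the whole clip, quiet and loud recordings end up with the same maximum amplitude while the internal amplitude structure of each song is unchanged.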
Responses were scored on a 4-point scale based on approach to the speaker. Specifically, a value of 0 was assigned if the male showed no directional movement or interest in the speaker. Interest was scored as follows: (1) weak response: approach not closer than 5 m to the speaker; (2) approach between 3 and 5 m (typically this occurred at least two times with retreats in between); and (3) approach <3 m (typically the male stayed close to the speaker for more than 10 s and/or repeatedly visited). In total, we conducted 550 playback experiments on 306 males of the four species. In all, 195 experiments were assigned a value of 0, 43 a value of 1, 80 a value of 2, and 232 a value of 3 (159 [68%] of the “3” responses were controls, played to conspecific males). Some males were tested sequentially with different tapes, usually once by the playing of conspecific song after a trial with no response to the first tape, to confirm a bird’s interest. On a few occasions, three or four tapes were played sequentially (also following no earlier responses), with a maximum of four sequential trials to 10 males. Further details are available in the deposited data files.
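The scoring scale can be written out as a small function. This is a sketch under stated assumptions: the function and argument names are hypothetical, and the treatment of approaches of exactly 3 m or 5 m is our reading of the wording above rather than something the text specifies. The tally at the end uses the counts reported in the text.

```python
def score_response(showed_interest, closest_approach_m=None):
    """Return the 0-3 playback score from the closest approach to the speaker."""
    if not showed_interest:
        return 0  # no directional movement or interest in the speaker
    if closest_approach_m >= 5:
        return 1  # weak response: never closer than 5 m
    if closest_approach_m >= 3:
        return 2  # approach between 3 and 5 m (boundary handling assumed)
    return 3      # approach closer than 3 m


# Number of experiments assigned each score, as reported in the text.
reported_counts = {0: 195, 1: 43, 2: 80, 3: 232}
total_experiments = sum(reported_counts.values())
```

Summing the reported counts recovers the 550 playback experiments stated above.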
For the experimental analysis of frequency, we manipulated songs using the “Change Pitch” function in Audacity (audacityteam.org/), which does not affect temporal parameters (see Supplementary Figure S3). We increased or decreased frequency by 10% or 20%. In 45 experiments, we played the altered tapes to the same species as the tape was derived from (Nelson, 1988). We conducted these experiments for three species (High, Middle, West): 37 involved frequency changes of 10% and 8, all of which were played to the West species, involved frequency changes of 20%. We also examined responses between different taxa to 10% manipulations in the direction of the taxon receiving the playback (32 experiments in total to five species or population pairs [altered High to Middle, altered Middle to High, and altered West to High, Middle, and Peninsular]).
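To make the size of these manipulations concrete, the sketch below converts a percent change in frequency into the resulting frequency and into the equivalent number of semitones. The 2,000 Hz starting frequency is a hypothetical round value within the range the songs occupy, and the function names are ours.

```python
import math


def shifted_frequency(freq_hz, percent_change):
    """Frequency after raising (positive) or lowering (negative) pitch by a percentage."""
    return freq_hz * (1 + percent_change / 100)


def percent_to_semitones(percent_change):
    """Equivalent shift in semitones (12 semitones correspond to a doubling of frequency)."""
    return 12 * math.log2(1 + percent_change / 100)


# Hypothetical 2,000 Hz peak frequency, shifted by the percentages used in the experiments.
shifts = {pct: shifted_frequency(2000, pct) for pct in (-20, -10, 10, 20)}
semitones_for_10_percent = percent_to_semitones(10)
```

For a song near 2 kHz, a 10% change corresponds to roughly 200 Hz, or about 1.65 semitones upward. A 10% decrease is slightly larger on the semitone scale (about 1.82) because the scale is logarithmic.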
Song measurements
We analyzed spectrograms from 112 recordings (27 Low, 11 Peninsular, 21 Middle, 19 High, and 34 West). For further evaluation of song similarity, we also analyzed an allopatric population of the High species (C. ruficeps) from Taiwan taken from xeno-canto. Complete information on source and location for all recordings is available in the online data sets.
From these recordings, we compiled four sets of measurements. First, we measured characteristics of entire songs (an average of 9.4 consecutive songs per individual recording, range 1–33). For each song, we measured minimum and maximum frequency, center frequency (the frequency that divides the selected sound spectrum into two frequency intervals of equal energy), bandwidth 90% (the difference in frequencies separating the spectrogram into 5% and 95% energy quantiles), and peak frequency (the frequency with highest energy), with a Hann Window size of 1,024 samples. We measured song length with a window size of 256 samples. We visually assessed and counted the number of distinct notes in each song and the total number of notes. We scored songs for whether they contained an introductory note (or on occasion two introductory notes), which we define as a note (or pair of notes) separated from the rest of the song by a gap at least 50% longer than the gaps between the other notes. Introductory notes may be identical to the following notes in the song, or different.
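The three energy-based measures defined above can be illustrated on a toy energy spectrum. This is a sketch of the definitions as worded in the text, not the software actually used for the measurements. The spectrum values are hypothetical, the function names are ours, and the bin-width helper assumes the transform length equals the window length.

```python
def bin_width_hz(sample_rate_hz, window_samples):
    """Spacing between spectrum bins, assuming FFT length equals window length."""
    return sample_rate_hz / window_samples


def quantile_frequency(freqs_hz, energy, fraction):
    """Lowest bin frequency at which cumulative energy reaches `fraction` of the total."""
    threshold = fraction * sum(energy)
    running = 0.0
    for freq, e in zip(freqs_hz, energy):
        running += e
        if running >= threshold:
            return freq
    return freqs_hz[-1]


def spectral_measures(freqs_hz, energy):
    """Peak frequency, center frequency, and 90% bandwidth of an energy spectrum."""
    peak_index = max(range(len(energy)), key=lambda i: energy[i])
    low = quantile_frequency(freqs_hz, energy, 0.05)
    high = quantile_frequency(freqs_hz, energy, 0.95)
    return {
        "peak_hz": freqs_hz[peak_index],
        "center_hz": quantile_frequency(freqs_hz, energy, 0.50),
        "bandwidth90_hz": high - low,
    }


# Hypothetical energy spectrum of one song (arbitrary units), not measured data.
freqs_hz = [1800, 1900, 2000, 2100, 2200, 2300, 2400]
energy = [1, 2, 10, 40, 30, 10, 7]
measures = spectral_measures(freqs_hz, energy)
```

Under the stated assumption, a 1,024-sample window gives bins about 43 Hz apart at a 44.1 kHz sampling rate and about 47 Hz apart at 48 kHz, which bounds how finely peak frequency can be resolved.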
Second, taking a single clear song from each bout, we measured all notes in the song for frequency parameters, note length, and qualitatively recorded note shape, as upsweep, downsweep, dome, saucer, or flat, according to whether frequency monotonically increased, monotonically decreased, showed an intermediate peak, an intermediate minimum, or did not change (reported for each song in the data depository). Third, we devised a quantitative measure to capture variation in frequency contour (Supplementary Figure S5). We split each note into four equal time intervals and recorded the center frequency of each interval. We then took the deviation of each center frequency from the average of the four frequencies (for example, for a flat note, the deviations would be 0, 0, 0, 0). We then extracted principal components from the correlation matrix derived from 1,921 rows (each row corresponds to a note; all notes are included) × the four columns of deviations (Supplementary Figure S5). Correlations of the first principal component with the four center frequency residuals are: 0.94, 0.41, −0.84, −0.9, implying that notes with large negative scores are upsweeps and those with large positive scores are downsweeps (Figure 4). Finally, for our fourth measurement, we computed a measure of amplitude variation across a single note. Using the waveform, we divided the note into 0.0002 s intervals and computed the root mean square of amplitudes within each interval. Although average amplitude is not meaningful, given the varying distances and directions of recordings, we asked if amplitude variation across a note differs consistently between individuals and species. Further details are presented in Supplementary Figure S6.
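The deviation step that feeds the principal component analysis, and the per-interval root mean square used for amplitude variation, can both be sketched in a few lines. The note frequencies and waveform values below are hypothetical, the function names are ours, and the principal component analysis itself is not reproduced here.

```python
import math


def contour_deviations(center_freqs_hz):
    """Deviation of each interval's center frequency from the note's mean."""
    mean = sum(center_freqs_hz) / len(center_freqs_hz)
    return [f - mean for f in center_freqs_hz]


def interval_rms(samples, samples_per_interval):
    """Root mean square amplitude within consecutive fixed-length intervals."""
    values = []
    for start in range(0, len(samples), samples_per_interval):
        chunk = samples[start:start + samples_per_interval]
        values.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return values


# Hypothetical notes, each described by four interval center frequencies (Hz).
flat_note = contour_deviations([2100, 2100, 2100, 2100])
falling_note = contour_deviations([2300, 2200, 2100, 2000])

# Hypothetical waveform samples split into two-sample intervals.
rms_profile = interval_rms([1, -1, 1, -1], 2)
```

The falling note yields positive deviations early and negative deviations late, the same sign pattern as the reported correlations with the first principal component (positive for the first two intervals, negative for the last two), which is why downsweeps receive positive scores.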
Song analysis
To identify variables potentially important to species recognition, we constructed a discriminant function of the song and note measurements for the three species in the east Himalaya, after averaging songs within an individual (Supplementary Figure S6 gives the loadings). Amplitude modulation was not included in this function because it was measured on fewer songs. For playbacks, because we used the same tape on average two times in each of the comparisons, which may generate pseudoreplicates (Kroodsma et al., 2001), we used generalized linear mixed models with tape as a random effect, and modeled the response (values 0, 1, 2, 3) with Poisson errors (Bates et al., 2015), having confirmed that the data were not overdispersed. However, in all significant cases, variance due to tape was estimated as 0, implying pseudoreplication is not a problem, and a generalized linear model without tape as a random effect gave the same p value as one that included tape. More details, the code, and an example are given in Supplementary Material: “Mixed model assessment of responses.”
Results
History
The initial speciation event separating the four species occurred as long as 5–6 million years ago (Moyle et al., 2012; Price et al., 2014), with one lineage leading to the mid-elevation species in the east (C. chrysaeum), and the second to the other three Himalayan species, which diversified simultaneously about 2.5 million years ago (Figures 2 and 5). Biogeographical reconstructions imply that the four species’ common ancestor lived in Indonesia, with the most likely ancestral speciation event occurring in Indonesia itself, or possibly between Indonesia and the Himalaya (Figure 5). As the two species representing the basal lineages evolved away from their Indonesian ancestor they may have evolved to smaller size and higher song frequency in parallel, although ancestral reconstructions are uncertain (Supplementary Figure S2).
Song similarity
When species come into sympatry, signal similarity or even convergence can be favored under sexual selection, notably when males of different species defend territories against each other (Drury et al., 2020; Grether et al., 2009; Tobias et al., 2014). However, in our case, song similarity is not a result of interspecific territoriality. Where two species are present together (Middle elevations in the east) they generally do not respond to each other’s songs (Figure 2C). Further, we observed that sequential playbacks of the two species from the same speaker often bring in each species to the conspecific song, while the other departs with no interactions (~10 times noted), implying overlapping territories (see Supplementary Figure S4 for an example of simultaneous singing of two species). Finally, species that occur alone, in which species interactions are precluded (West, Peninsular, and the Taiwan population), also sing very similar songs (Supplementary Figure S8).
Stasis in the songs of these babblers is particularly surprising because they are oscines (songbirds), which learn their songs from conspecifics. Songbirds have a high cultural mutation rate (Lynch, 1996), many species have song dialects (Mundinger, 1982), and contemporary change in song traits is common (Derryberry et al., 2020; Grant & Grant, 2010; Otter et al., 2020), all suggesting possibilities for rapid cultural evolution. Four lines of evidence imply that in this group, stasis is maintained because stabilizing selection pressures are intensified by requirements for efficient transmission in the undergrowth the species occupy. First, males sing a single song type (cf. many other species in which males have a repertoire), and the song itself consists of a single repeated note (Figure 3A). Repetition of both notes and songs facilitates rapid identification of the signaler (Wiley, 2006), implying selection pressures for efficient transmission of message override advantages of song diversity. Second, the frequency bandwidth of all species lies between 2 and 2.5 kHz, matching the optimal transmission window in forest undergrowth (Morton, 1975) (Figure 3B). Third, the bandwidth is narrow (Figure 3A and B, Supplementary Figure S7) and expected to reduce distortion due to differences in degradation across frequencies (Wiley & Richards, 1978). Fourth, the species sing at exceptionally low frequencies for their body size (Figure 3C). Given the energetic costs of singing at low frequencies for a given body size, the extremely low frequencies in Cyanoderma imply selection pressures beyond energy have driven song evolution (Francis & Wilkins, 2021). Maximization of transmission is the plausible pressure (Figure 3B).
Species recognition
Despite song similarity, the co-occurring eastern species (Middle/Low and Middle/High) essentially ignore each other’s songs (Figure 2). To identify the traits that form the basis for this recognition, we constructed a discriminant function of several song measurements (loadings, plus a separate analysis of amplitude modulation, are given in Supplementary Figure S6). Song duration, note curvature, within-song variety, and amplitude modulation are similar across species, but peak frequency and frequency contour (measured as the extent to which frequency declines or increases across the note) separate the mid-elevation species from the other two species (3% of individuals misclassified in the discriminant function of these two traits, N = 66), with weaker separation when a single trait is considered (peak frequency [12%] and note angle [9%]). By contrast, the High and Low elevation species have nearly identical songs (Figure 4, misclassifications of 23%, 33%, and 41%, for the discriminant function, peak frequency, and frequency contour, respectively). The similarity of the songs of the High and Low species is especially striking because the High elevation species is 20% heavier than the Middle and Low elevation species, which are of similar size (Supplementary Figure S7).
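As an illustration of why two traits in combination can separate species better than either trait alone, the following is a minimal sketch of a two-class linear discriminant. It is not the analysis used in this study: the song values are invented, and the function names (`fit_discriminant`, `misclassified_fraction`) are hypothetical.

```python
from statistics import mean

def fit_discriminant(class_a, class_b):
    """Two-class Fisher discriminant for 2-D points (trait1, trait2).

    Returns (weights, threshold); a point is assigned to class A when
    weights[0]*x + weights[1]*y exceeds the threshold.
    """
    def centroid(points):
        return (mean(p[0] for p in points), mean(p[1] for p in points))

    def scatter(points, mu):
        # Entries of the 2x2 within-class scatter matrix.
        sxx = sum((p[0] - mu[0]) ** 2 for p in points)
        sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points)
        syy = sum((p[1] - mu[1]) ** 2 for p in points)
        return sxx, sxy, syy

    mu_a, mu_b = centroid(class_a), centroid(class_b)
    sxx, sxy, syy = (a + b for a, b in zip(scatter(class_a, mu_a),
                                           scatter(class_b, mu_b)))
    det = sxx * syy - sxy * sxy
    dx, dy = mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]
    # weights = inverse(pooled scatter) applied to (mu_a - mu_b)
    w = ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)
    score = lambda p: w[0] * p[0] + w[1] * p[1]
    # Cut halfway between the two projected centroids.
    threshold = (score(mu_a) + score(mu_b)) / 2
    return w, threshold

def misclassified_fraction(class_a, class_b, score, threshold):
    """Fraction of points falling on the wrong side of the threshold."""
    errors = sum(1 for p in class_a if score(p) <= threshold)
    errors += sum(1 for p in class_b if score(p) > threshold)
    return errors / (len(class_a) + len(class_b))

# Invented values: (peak frequency in kHz, contour index; negative = downsweep).
middle = [(2.35, -0.30), (2.40, -0.25), (2.30, -0.35), (2.45, -0.20), (2.25, -0.28)]
high = [(2.15, 0.25), (2.20, 0.30), (2.10, 0.20), (2.25, 0.35), (2.30, 0.22)]

w, cut = fit_discriminant(middle, high)
both_traits = misclassified_fraction(
    middle, high, lambda p: w[0] * p[0] + w[1] * p[1], cut)

# Single-trait comparison: peak frequency only, cut at the midpoint of the means.
freq_cut = (mean(p[0] for p in middle) + mean(p[0] for p in high)) / 2
freq_only = misclassified_fraction(middle, high, lambda p: p[0], freq_cut)

print(both_traits, freq_only)
```

In this invented data set the frequency ranges of the two groups overlap, so the frequency-only rule misassigns some songs, whereas the combined discriminant does not. The sketch evaluates the rule on the same songs used to fit it, which understates error in real data.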
Song frequency
We assessed the independent contributions of frequency and frequency contour to species recognition among the three sympatric east Himalayan species. First, we manipulated sound files to alter the frequency of songs by 10% (~200 Hz), thereby bringing them into rough correspondence with heterospecifics (illustrated in Supplementary Figure S3). Responses to playbacks of manipulated songs between the pair of co-occurring species (Middle/High) in the east Himalaya are shown in Figure 6A (right two panels). Three allopatric comparisons of responses to manipulated tapes are shown in Supplementary Figure S3. In all five cases, male heterospecifics responded to altered songs, which in one comparison reached 50% of the response shown by conspecifics. Combined, the five responses were all in the expected direction, with an average change in response = 0.99 ± 0.2 SE (one sample t test for the five comparisons against μ = 0, t4 = 3.1, p = 0.004).

Figure 6. Responses are affected by both frequency and frequency contour. The responder is indicated by the color of the bar. Means ± SE are for guidance; p values (in gray) are based on mixed models with Poisson errors and tape included as a random effect (see Supplementary Material, “Mixed model assessment of responses”). (A) Effects of experimental manipulations of frequency. Sample sizes are left 3, 6, 37, 10, 5; center 31, 12, 25, 11; right 35, 9, 12, 8. (B) Playback of songs of the allopatric West species to the eastern species, separated according to whether the notes in the song were downsweeps or upsweeps. Sample sizes are 9, 5, 13, 7, 26, 10.
The results in the preceding paragraph imply that song frequency is an important trait used in species recognition, but also that frequency differences between co-occurring species are insufficient to eliminate all responses. Two lines of evidence indicate a difference of 400 Hz would be sufficient. First, experimental alterations of song by about 400 Hz eliminated conspecific responses (Figure 6A, left panel). Second, playbacks of unmanipulated tapes between allopatric forms show that songs that differ by ~200 Hz result in some responses, whereas those that differ by ~400 Hz result in no responses (Figure 7). We conclude that species recognition could be achieved by divergence of 400 Hz, which approximately spans the range of songs within a species (and songs just outside the frequency range have been shown to be sufficient to eliminate responses in other species, Nelson, 1988), yet the average difference between species that overlap in elevational range is 200 Hz and some individuals of one species sing songs at the same frequency as some individuals of the other species (Figure 4).

Figure 7. (A) Experiments comparing responses between allopatric forms to unmanipulated tapes (mean ± SE). Sample sizes are given below the figure. (B) Peak frequencies (in kHz). Note the absence of response when frequency differs by ~400 Hz.
Frequency contour
To evaluate the role of frequency contour, we took advantage of the fact that some males of the West species sing with upsweep notes in which the frequency monotonically increases, and others with downsweep notes in which the frequency monotonically decreases (Figure 3A). After conducting playback experiments where the experimenter was blind to note shape (we were unaware of its significance at the time), we split responses to West songs according to whether the eastern males were played upsweep or downsweep songs. The mid-elevation species responded more strongly to downsweeps than upsweeps, corresponding to its use of downsweeps in its songs. The High elevation species responded more to upsweeps than downsweeps, corresponding to use of upsweeps in its songs (Figure 6B).
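The post hoc split described above can be sketched as follows. This is only an illustration under stated assumptions: the frequency traces and response scores are invented, and `classify_note` is a hypothetical helper that applies the monotonic definition of upsweeps and downsweeps given in the text.

```python
from collections import defaultdict
from statistics import mean

def classify_note(freqs_hz):
    """Label a note from its frequency trace (Hz, in time order).

    "upsweep" if frequency rises at every step, "downsweep" if it
    falls at every step, otherwise "other".
    """
    steps = [later - earlier for earlier, later in zip(freqs_hz, freqs_hz[1:])]
    if steps and all(s > 0 for s in steps):
        return "upsweep"
    if steps and all(s < 0 for s in steps):
        return "downsweep"
    return "other"

# Invented playback trials: (frequency trace of the played note, response score).
trials = [
    ([2100, 2150, 2220, 2300], 0),
    ([2300, 2240, 2170, 2100], 4),
    ([2120, 2180, 2260, 2310], 1),
    ([2290, 2230, 2180, 2090], 2),
]

# Group responses by note shape after the fact, then average within each group.
by_shape = defaultdict(list)
for trace, response in trials:
    by_shape[classify_note(trace)].append(response)

mean_response = {shape: mean(scores) for shape, scores in by_shape.items()}
print(mean_response)
```

Because note shape is assigned only after the trials have been run, the experimenter's behavior during playback cannot have been influenced by it, which is the point of the blind design.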
Reproductive character displacement
As noted in the introduction, patterns supporting the process of reproductive character displacement include both between and within species comparisons. First, in comparisons of a solitary species with two or more other species that are found in the same place, the solitary species is expected to be more variable and/or with a mean value intermediate to that of the co-occurring species. Second, the phenotype of a species is expected to vary geographically to produce a pattern consistent with displacement when it is found together with the postulated interacting species. As also noted in the introduction, a number of processes may lead to these patterns even in the absence of competition, notably independent adaptation to an environment that varies geographically (Goldberg & Lande, 2006; Grant, 1972; Kirschel et al., 2009). Conversely, the history of reproductive character displacement may be obscured if species are found together throughout their mutual ranges, or gene flow from zones of co-occurrence causes any differences generated between species to spread to zones where only one species is found (Liou & Price, 1994; Walker, 1974). We consider these issues while assessing song frequency, frequency contour, and song discrimination, and conclude that all three traits show some evidence of reproductive character displacement.
First, with respect to peak frequency, the mean value for the West form lies between that of the Low and Middle species (Figure 8A), as expected if the Low and Middle elevation species are experiencing character displacement. Frequency also shows remarkable clinal variation within species over short distances along the east Himalayan elevational gradient (Figure 8A). Notably, mean peak frequency difference between the Low and Middle species where they co-occur is displaced by ~400 Hz, whereas populations outside the zone of overlap differ by ~100 Hz (Figure 8A). Signal jamming by species in other genera may contribute to setting an upper bound on frequency (Supplementary Figure S5), and the High elevation species in particular coexists with a smaller diversity of other bird species than the Middle and Low species (Schumm et al., 2020), but in locations where the two species co-occur they must experience the same soundscape, making it probable that direct interactions between the species drive frequency differences between them. Hence, we conclude that the pattern of frequency variation within the Low and Middle species provides strong evidence in support of reproductive character displacement.

Figure 8. Evidence for reproductive character displacement along the elevational gradient. (A) Song peak frequency against elevation. Regression slopes: Low β = −0.23 Hz/m, p = 0.0003, N = 27; Middle β = −0.16 Hz/m, p = 0.02, N = 21; High β = 0.06 Hz/m, p = 0.16, N = 19; Allopatric β = −0.11 Hz/m, p = 0.0001, N = 34; Peninsular β = −0.23 Hz/m, p = 0.11, N = 13. (B) Frequency contour against elevation. In this case, after accounting for elevation in a linear model, a posteriori pairwise comparisons of the least-squares means revealed highly significant differences between the Middle species and the others (all p < 0.001).
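To make the units of these slopes concrete, here is a minimal sketch of an ordinary least-squares fit of peak frequency on elevation. The elevations and frequencies are invented for illustration and `ols_slope_intercept` is a hypothetical helper; only the slope of −0.23 Hz/m is taken from the caption, and the 1,000 m span is an illustrative choice.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Invented recordings: elevation (m) and song peak frequency (Hz).
elevation_m = [500, 800, 1100, 1400, 1700]
peak_hz = [2310, 2230, 2180, 2110, 2070]

slope, intercept = ols_slope_intercept(elevation_m, peak_hz)
print(f"slope = {slope:.2f} Hz/m, intercept = {intercept:.0f} Hz")

# Frequency change implied by a slope of -0.23 Hz/m over an
# illustrative 1,000 m of elevation: roughly -230 Hz.
implied_change_hz = -0.23 * 1000
print(f"implied change = {implied_change_hz:.0f} Hz")
```

A slope of this size therefore corresponds to a shift of a few hundred hertz over an elevational span of about a kilometer, the same order as the frequency differences discussed in the playback experiments.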
Patterns for frequency contour differ from peak frequency (Figure 8B). First, the mean value for the West form is not intermediate between the Low and Middle species, but instead is identical to the High and the Low species, with which it shares a common ancestor. Second, unlike peak frequency, frequency contour shows no cline across the elevational gradient. These patterns are not predicted under a naive hypothesis of character displacement. Nevertheless, variation in frequency contour is 2–4× greater in the West species than any of the coexisting Himalayan species (Figure 8B, Supplementary Figure S8; pairwise comparisons of variance: p < 0.001 for West to High, p < 0.001 for West to Low, and p = 0.1 for West to Middle). The higher variability in the West species is direct evidence for reproductive character displacement in this trait. In the discussion, we consider further the possibility that the mean differences in frequency contour did indeed diverge because of species interactions, but the process has been obscured by subsequent spread out of the zone of co-occurrence.
Finally, song recognition shows a pattern consistent with reproductive character displacement. Individuals of the West species respond to the entire variety of songs sung by this species (Figure 2). Males of all three eastern species have narrower windows of recognition, responding only to the restricted range of variation each species sings (Figure 2).
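The comparison of variability can be illustrated with a simple variance ratio. This is a sketch only: the contour values are invented, `variance_ratio` is a hypothetical helper, and no significance test is attempted because the test behind the reported p values is described in the Supplementary Material rather than here.

```python
from statistics import variance

def variance_ratio(sample_a, sample_b):
    """Ratio of the sample variances of two groups (a over b)."""
    return variance(sample_a) / variance(sample_b)

# Invented frequency-contour values (arbitrary units; sign gives sweep direction).
allopatric = [-0.4, -0.2, 0.0, 0.2, 0.4]   # a solitary species singing both shapes
coexisting = [-0.2, -0.1, 0.0, 0.1, 0.2]   # a species with a narrower range of shapes

ratio = variance_ratio(allopatric, coexisting)
print(f"variance ratio = {ratio:.1f}")
```

A ratio well above 1 indicates that individuals of the solitary species differ more from one another in note shape than do individuals of a species living alongside a congener.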
Discussion
In contrast to studies of diversification in strongly sexually selected groups, we have focused on related species that show extreme similarity. We attribute stasis in songs to stabilizing sexual selection for efficient transmission in a restrictive environment. Song similarity across species has enabled us to evaluate consequences for the evolution of species recognition, as only a few traits differ between the species and hence are eligible to be targets of recognition. We demonstrate limited divergence in mean values, reduction in variance, and the requirement that co-occurring species differ in two traits. Taken in combination, these three factors result in diagnostic differences between species that are found together, and a lack of response to heterospecifics, thereby matching the conceptual framework in the lower right panel of Figure 1. Our results have implications for how species recognition evolves in the face of opposing sexual selection pressures, such as competition between males to maintain a territory, and for the role of interactions in sympatry in driving divergence (reproductive character displacement). They also identify how differences in transmission constraints of different traits affect their role in diversification. We consider each in turn.
Species recognition
A restrictive environment for sexual selection need not be a restrictive environment for species recognition, because in principle any species-diagnostic trait should be sufficient. For example, in birds many species have a distinctive introductory note, while the rest of the song is similar (Singh & Price, 2015) and in white-crowned sparrows, Zonotrichia leucophrys, the introductory note has been experimentally shown to be both necessary and sufficient for species recognition (Soha & Marler, 2000). Cyanoderma babblers have not taken this route but have instead diverged in two continuously varying traits, apparently using standing (cultural) variation to produce diagnostic differences in combination, and only in the locations where two species co-occur (Figure 4). One of the traits, note shape, shows reduced variation in comparison with an allopatric species (Figure 8, Supplementary Figure S8), and this reduction in variation is accompanied by a narrowing of the range of songs to which males respond (Figure 2). It is straightforward to understand how signal jamming could favor trait divergence and narrowing of variance in places where the two species co-occur. Establishment of the species-specific characteristics is likely to reflect learning from conspecifics by generalizing from diagnostic cues, perhaps including color and call notes (Gill & Murray, 1972; Irwin & Price, 1999).
Reproductive character displacement
Along the elevational gradient in the east Himalaya, peak frequency is most displaced where species co-occur, and converges on a mean value of about 2.2 kHz in locations where only a single species is found (Figure 8). Ascribing the cline to interactions between co-occurring species does require that other causes of variation can be ruled out (Grant, 1972; Kirschel et al., 2009). For example, different habitats affect transmission (Derryberry, 2009; Morton, 1975), and other vocalizing species affect the soundscape (Grant & Grant, 2010; Weir et al., 2012). Such differences can be rejected for Cyanoderma babblers, because where they co-occur individuals of each species are surely experiencing similar transmission constraints, whereas isolated populations of each species are separated along the elevational gradient by >1,000 m in elevation, experiencing different climates and habitats (Price et al., 2011), communities of other species (Schumm et al., 2020), and soundscapes (Singh & Price, 2015). Patterns consistent with reproductive character displacement have been regularly documented based on clinal variation within species (Higgie & Blows, 2007; Howard, 1993), including over short distances (Higgie & Blows, 2007; Matute, 2010), and in birds (Dingle et al., 2010; Irwin & Price, 1999).
The evidence for character displacement based on the within-species pattern for frequency relies on a cline, where, in the absence of another species, frequency becomes tuned to an optimal transmission window. Frequency contour shows no such cline. In this case, reproductive character displacement may still have occurred, but the differences in frequency contour have spread back through the entire range of each species (Liou & Price, 1994). We suggest that this is indeed the case. If the different shapes of syllables (downsweeps and upsweeps) are equally effective as a signal, no opposing force prevents their spread out of zones of co-occurrence. The possibility of spreading back to zones outside of the locations of overlap was long ago proposed as one reason why the process of reproductive character displacement has been difficult to detect through pattern analysis (Walker, 1974), but has not been discussed more recently.
While character displacement is plausible for frequency contour, in the absence of clines, alternative explanations for the evolution of syllable shape are plausible. Given the >2 million years of separation of the pair of lineages leading to the Middle species and the (High, Low, Allopatric west) clade, it is possible that song in one or other lineage evolved prior to contact, thereby facilitating co-existence, rather than evolving subsequent to contact as a result of reproductive interference. While this hypothesis is difficult to reject, two lines of evidence are consistent with at least some role for species interactions. First, the High, Low, and Allopatric west species have been long diverged (>2 million years) from each other yet retain identical mean values for frequency contour. The value for this trait appears to have been carried through from the common ancestor, and we argue that all three species are subject to frequency-dependent stabilizing selection favoring the common form. If this is typical for the group, stasis is expected whenever species are allopatric. Second, and more directly, solitary populations of two species (West and, less so, Taiwan; Supplementary Figure S8) show relatively large differences among individuals in syllable shape. The lower variation among the coexisting east Himalayan species implies species interactions have favored a reduction in variance; by extension they should affect the mean as well.
Speciation
Our study has been based on male responses. Any inference about the extent to which females choose conspecifics (i.e., the extent to which assortative mating is 100% and speciation complete) requires that male responses correlate strongly with female discrimination (Weir & Price, 2019). Several studies have shown that females may be more discriminating than males, all in cases where males from different taxa defend territories against each other (Danner et al., 2011; Seddon & Tobias, 2010; Wheatcroft & Qvarnström, 2017), which is not the case in the present study. Further, we know of no example where females have been shown to be less discriminating than males. Hence, we consider evolution of male responses to be a reasonable guide to the evolution of assortative mating.
Our results are agnostic as to whether reproductive character displacement has occurred between species that are already fully reproductively isolated through postmating mechanisms, or hybrids have some low fitness, in which case complete reproductive isolation has required evolution of premating isolation (Butlin, 1987). While a separation of 5 million years between the co-occurring forms implies strong postmating isolation (Price & Bouvier, 2002), it is possible lineages first came in contact after about 1 million years of divergence or even earlier (Figure 5). After 1 million years, some intrinsic postmating isolation is expected, but it is likely to be incomplete (Price & Bouvier, 2002). Hence, it is possible that strong premating reproductive isolation evolved in response to strong, but possibly incomplete, postmating isolation, the subset of reproductive character displacement known as reinforcement (Butlin, 1987; Liou & Price, 1994). In summary, we cannot reject the possibility that postmating isolation, and hence speciation, was complete when the taxa first came into contact, but reinforcement is a possibility.
Cascading reinforcement
Reproductive character displacement can facilitate further speciation events. One prominent idea is that geographical variation within each species becomes established through interactions between species where they co-occur. The differences between allopatric and sympatric populations can then form the basis for further speciation within each species, a process commonly termed cascading reinforcement (Comeault & Matute, 2016; Hoskin & Higgie, 2010). Whether such variation is sufficient to prevent interbreeding requires further assessment, but simulations suggest that in the absence of postmating isolation, collapse will be the common outcome (Irwin, 2020). Here, we highlight a second contribution to speciation, which is that differences that have evolved through reproductive character displacement can be carried through subsequent speciation events. The related Allopatric west, Low, and High species have similar songs. A plausible model is that reproductive character displacement established between a pair of ancestral species (i.e., representing the lineage leading to the Middle, and the lineage leading to Low/High/Allopatric west) facilitated a second invasion (e.g., to High if the ancestral Low/High/Allopatric west species occurred at low elevations), because the second invader would already show strong premating reproductive isolation from the Middle species.
Conclusion
Our study of divergence among species singing “extraordinarily similar” songs (Rasmussen & Anderton, 2005) implicates a role for reproductive character displacement in sympatry. Two traits are involved in species recognition and the way they vary across space offers insights into why reproductive character displacement is likely more widespread than presently detected. The species occupy what appears to be a restrictive environment (Fuller & Endler, 2018), yet we argue that small differences in syllable structure (frequency contour) are equally efficiently transmitted, even as differences in frequency are not. In more permissive environments, a greater diversity of traits is, by definition, effectively transmitted (Fuller & Endler, 2018). Consequently, in such environments, reproductive character displacement between species can operate through divergence of alternative traits, and such traits should easily spread back into zones of allopatry. Under this reasoning, the large diversity and geographical uniformity of traits will often obscure a history of reproductive character displacement, which we predict to be common.
Supplementary material
Supplementary material is available online at Evolution.
Data availability
All data, including masses, biogeographical distributions, songs, and results from song playbacks, are in 11 files on Dryad (https://doi.org/10.5061/dryad.wm37pvmqg).
Funding
This work was supported in part by the National Geographic Society and the National Science Foundation (NSF DEB-2031105).
Conflict of interest: The authors declare no conflicts of interest.
Acknowledgments
This project was conducted under the auspices of a Memorandum of Understanding between the Wildlife Institute of India and the University of Chicago. We thank Sandeep Gupta, R. Suresh Kumar, Vinod Mathur, Dhananjai Mohan, G.S. Rawat, and Ruchi Badola for facilitating the MOU, obtaining permits, and help with logistics. We also thank Sahas Barve and Umesh Srinivasan for providing data on body masses, Ashutosh Singh for sequencing Peninsular C. ambiguum, Bettina Harr for collating and analyzing these sequences, and Laura Céspedes Arias, Kristina Fialko, Darren Irwin, Joe Tobias, Jason Weir, and Amarjeet Kaur for comments. We acknowledge contributors to Xeno-canto and permissions from the Chief Wildlife Wardens from the respective Indian states.