Abstract

A total of 752 odorant receptor (Or) genes, including pseudogenes, were identified in 11 Drosophila species and named after their orthologs in Drosophila melanogaster. The 813 Or genes, including 61 from D. melanogaster, were classified into 59 orthologous groups that are well supported by gene phylogeny. By reconciling the species phylogeny with the gene family phylogeny, we estimated the number of gene duplication/loss events and intron gain/loss events in the species phylogeny. We found that these events are particularly frequent in Drosophila grimshawi, Drosophila willistoni, and the obscura group. More than half of the duplicated genes remain in tandem arrays, whose sizes range from 2 to 8. These genes vary in sequence and some likely underwent positive selection, indicating that gene duplication was important for flies to acquire new olfactory functions. We hypothesize that Or genes conferred the basic olfactory repertoire to ancestral flies before the speciation of the Drosophila and Sophophora subgenera about 40 Mya. This repertoire has been largely maintained in the current species, whereas lineage-specific gene duplication seems to have led to additional specialization in some species in response to specific ecological conditions.

Introduction

Animals rely on sensory neurons to detect environmental odors. The external chemical stimuli are transformed into electrical pulses and then relayed and processed by the nervous system. Having evolved in enormously different environments, the olfactory systems of extant species have diverged greatly, with distinct specificity and sensitivity (Buck 1996; Hildebrand and Shepherd 1997). Because of its anatomical simplicity, the availability of genetic information, and the well-established physiological and behavioral analysis techniques, Drosophila has become an ideal model organism for studying olfaction (de Bruyne et al. 2001). Drosophila has 2 pairs of peripheral olfactory organs. The antennae house about 1,200 olfactory receptor neurons (ORNs), and the maxillary palps host about 120 ORNs (Stocker 1994; Shanbhag 1999, 2000). These ORNs are compartmentalized into sensilla, protruding hair-like structures filled with proteinaceous fluid that bathes the dendrites of ORNs (Xu et al. 2003). Odors are initially received by ORNs, and the signal propagates through ORN axons and reaches glomeruli in the antennal lobe for extensive processing before passing to higher brain regions. Experiments have shown that ORNs possess different response profiles and dynamics (de Bruyne et al. 2001; Elmore et al. 2003) and that such diversity is determined by the odorant receptors residing in the ORNs (Hallem et al. 2004; Kreher et al. 2005).

The odorant receptors in Drosophila melanogaster were identified by Clyne et al. (1999) using a bioinformatics-driven approach (Kim et al. 2000) and independently by others using experimental approaches (Gao and Chess 1999; Vosshall et al. 1999). These receptors have about 400 amino acid residues and belong to the superfamily of G-protein–coupled receptors with 7 transmembrane domains. Recently, it has been suggested that they may adopt an unconventional topology with intracellular N-termini (Benton et al. 2006). Surprisingly, D. melanogaster has a rather small Or gene repertoire with only 61 members discovered so far, of which Or98P is a pseudogene (Robertson et al. 2003), whereas the estimated number of Or genes in vertebrates ranges from about 100 in fish (Ngai et al. 1993) to about 1,000 in mouse and rat (Mombaerts 1999). However, it should be noted that recent evidence suggests the existence of more divergent members that have not been cloned (Carlson J, personal communication). The odorant receptors in D. melanogaster are extremely divergent at the sequence level, with the most divergent pair being about 10% identical in their protein sequences. Nevertheless, the functional role of annotated odorant receptors has been validated using genetic and physiological approaches (Hallem et al. 2004; Kreher et al. 2005). Evidence shows that the odorant receptor determines many properties of ORNs, including spontaneous firing rate, signaling mode, and onset and termination dynamics. Furthermore, each receptor has a characteristic response spectrum to different chemical stimuli.

The original identification of Or genes was enabled by the D. melanogaster genome sequencing project allowing the use of a computational approach to solve a long-standing empirical problem (Kim and Carlson 2002). Subsequent combination of sequence and functional analysis has given us many insights into the fruit fly's olfaction system. The expanded genome projects in the Drosophila genus yielded 12 whole-genome sequences distributed across the genus. In this work, we identified Or genes in 11 Drosophila species and investigated the origin and diversification of Or genes in this genus.

Materials and Methods

Identification of Odorant Receptors

The gene, transcript, and protein sequences of 61 odorant receptors in D. melanogaster were downloaded from FlyBase in August 2005. The pseudogene Or98P has only the gene sequence. The comparative analysis freeze 1 (CAF1) of 12 Drosophila genome assemblies was downloaded from the Lawrence Berkeley National Lab Web site for “Assembly/Alignment/Annotation of 12 related Drosophila species” (hereafter referred to as AAA) in June 2006. Files mapping CAF1 scaffold and chromosome names to GenBank accession versions were downloaded from AAAWiki (http://rana.lbl.gov/drosophila/wiki) in January 2007. All annotations in our data set use GenBank accession numbers. CAF1 includes assembly release 4.3 for D. melanogaster and assemblies for 11 other Drosophila species: Drosophila simulans, Drosophila sechellia, Drosophila yakuba, Drosophila erecta, Drosophila ananassae, Drosophila pseudoobscura, Drosophila persimilis, Drosophila willistoni, Drosophila mojavensis, Drosophila virilis, and Drosophila grimshawi. There were 2 very similar assembly versions for D. pseudoobscura and D. yakuba; we used the nonreconciled versions. Protein sequences of known Or genes were used as TBlastN queries against the genome assemblies to find new Or genes. The exon–intron boundaries of newly identified Or genes were identified based on their alignment to the known ones using the CHAOS + DIALIGN Web server (Brudno et al. 2003). Whenever the alignment was insufficient to recognize splice sites, we used the splice site prediction Web server of the Berkeley Drosophila Genome Project. Newly identified receptors served as seeds in the next TBlastN search. An E value cutoff of 0.01 was used in all searches. The iteration was continued until no further Or gene was found. All sequences that were subsequences of longer ones were excluded.
We were unable to get the full-length sequences of some Or genes because either they resided on the scaffold boundaries or parts of them were not sequenced. Such Or genes were tentatively designated as incomplete unless stop codons exist inside their sequences. Or83b in D. simulans initially appeared to be a pseudogene. This is probably caused by a sequencing error because Or83b is known to be indispensable in olfaction and is unlikely to cease function in D. simulans. This gene, though having a stop codon in the 2nd exon, can be aligned perfectly with its orthologs in other species and shows no sign of excessive sequence divergence that is normal for pseudogenes under no selective pressure. Therefore, the sequence was corrected and this gene was treated as a functional gene.
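The iterative seeding procedure described above can be sketched as a fixed-point loop. This is only an illustration: the `search` callable is a hypothetical stand-in for one TBlastN run filtered at the E value cutoff, and the toy hit table is invented for the example.

```python
from typing import Callable, Iterable, Set


def iterative_search(seeds: Iterable[str],
                     search: Callable[[str], Set[str]]) -> Set[str]:
    """Repeat the search, seeding each round with newly found genes,
    until a round yields no gene that was not already known.

    `search` is a hypothetical placeholder for a single TBlastN run that
    returns identifiers of hits passing the E value cutoff (0.01 in the text).
    """
    found = set(seeds)
    frontier = set(found)
    while frontier:
        hits: Set[str] = set()
        for query in frontier:
            hits |= search(query)
        frontier = hits - found  # only unseen genes seed the next round
        found |= frontier
    return found


# Toy usage with a made-up hit table instead of real TBlastN output.
toy_hits = {"A": {"B"}, "B": {"A", "C"}, "C": set()}
result = iterative_search({"A"}, lambda q: toy_hits.get(q, set()))
```

The loop stops when a round adds nothing new, which mirrors the stopping rule stated in the text; removal of subsequences and manual curation of gene models are not covered by this sketch.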

Nomenclature of Odorant Receptors

The 61 Or genes in D. melanogaster were classified into 57 orthologous groups. Four groups each include 2 genes. They are Or19a and Or19b, Or22a and Or22b, Or33a and Or33b, and Or98a and Or98P. Or genes in other genomes were assigned to one of the groups based on sequence similarity and genic structure. The relative genomic/scaffold location was used to assist assignment if ambiguity arose. Two new groups were added because they do not have obvious orthologs in D. melanogaster.

Originally, Or genes in D. melanogaster were named by their positions in the cytological map (DORN Committee 2000). For example, Or22a and Or22b are both in band 22. This scheme was expanded to accommodate the new situation as follows: 1) the original 61 Or genes in D. melanogaster were given the prefix “Dmel”; 2) genes in other species were named after their orthologous gene in D. melanogaster with a 4-letter species prefix. If a D. melanogaster gene has multiple orthologous genes in another species, these copies were distinguished by a hyphen and a number suffix. For example, DsimOr2a is the ortholog to DmelOr2a; DyakOr67a-1 and DyakOr67a-2 are the 2 orthologs to DmelOr67a. 3) The new orthologous groups were named OrN1 and OrN2, preceded by the appropriate species prefix. The annotated sequences can be downloaded at http://kim.bio.upenn.edu/wiki/html/Public/Downloads.htm.
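As a small illustration of the naming rules above, the following sketch builds gene names from a species prefix, an orthologous group name, and a copy number. The function name `ortholog_names` is ours and is not part of the published annotation pipeline.

```python
from typing import List


def ortholog_names(species_prefix: str, group: str, copies: int) -> List[str]:
    """Build names following the scheme in the text: a 4-letter species
    prefix plus the D. melanogaster gene name, with a hyphen and a number
    suffix only when a species has several copies."""
    if copies < 0:
        raise ValueError("copies must be non-negative")
    base = f"{species_prefix}{group}"
    if copies == 1:
        return [base]
    return [f"{base}-{i}" for i in range(1, copies + 1)]


single = ortholog_names("Dsim", "Or2a", 1)
double = ortholog_names("Dyak", "Or67a", 2)
```

Here `single` reproduces the DsimOr2a example and `double` the DyakOr67a-1 and DyakOr67a-2 example given in the text.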

Phylogenetic and Evolutionary Analysis of Odorant Receptors

The protein sequences of 730 Or proteins were aligned by MUSCLE (Edgar 2004). Pseudogenes and short incomplete genes were not used. The alignment was filtered so that only columns with less than 10% gaps were retained. Similarly, a sequence was kept only if it had amino acids in more than 90% of the retained columns. As a result, 727 sequences were kept with 338 positions after filtration. The filtered alignment was used to compute the pairwise genetic distance by PROTDIST in PHYLIP3.66 (Felsenstein 2006). The program used JTT matrix with a gamma distribution of rates among positions. The parameter alpha was set to 1. A Neighbor-Joining (NJ) tree was constructed based on the distance matrix. The phylogeny quality was assessed by 1,000 bootstrap replicates.
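The two-step filtration can be sketched as follows. This is a minimal illustration that assumes the alignment is held as a dictionary of equal-length strings with "-" as the gap character; the function name and data layout are ours, not those of the original analysis.

```python
from typing import Dict


def filter_alignment(aln: Dict[str, str],
                     max_gap_frac: float = 0.10,
                     min_coverage: float = 0.90,
                     gap: str = "-") -> Dict[str, str]:
    """Keep columns with less than `max_gap_frac` gaps, then keep sequences
    with residues in more than `min_coverage` of the retained columns."""
    names = list(aln)
    if not names:
        return {}
    length = len(aln[names[0]])
    if any(len(seq) != length for seq in aln.values()):
        raise ValueError("all aligned sequences must have the same length")
    n = len(names)
    keep_cols = [j for j in range(length)
                 if sum(aln[name][j] == gap for name in names) / n < max_gap_frac]
    filtered = {}
    for name in names:
        residues = [aln[name][j] for j in keep_cols]
        if keep_cols and sum(c != gap for c in residues) / len(keep_cols) > min_coverage:
            filtered[name] = "".join(residues)
    return filtered


toy = {"s1": "MKT-A", "s2": "MKTLA", "s3": "M--LA"}
strict = filter_alignment(toy)                     # default thresholds from the text
loose = filter_alignment(toy, max_gap_frac=0.5)    # looser column filter for contrast
```

With the default thresholds only the 2 gap-free columns of the toy alignment survive, so every sequence passes the coverage filter; with the looser column filter all columns are kept and the 2 gapped sequences are dropped.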

The nonsynonymous to synonymous substitution ratios (dN/dS) for orthologous groups were computed by “codeml” in PAML (Yang 1997), assuming a homogeneous ratio among lineages. Two pairs of site models in PAML were chosen to test positive selection using the likelihood ratio test (LRT) and to identify positively selected sites in an orthologous group using both the naive empirical Bayes (NEB) and the Bayes empirical Bayes (BEB) estimation methods. The 1st pair of models was M1a (NearlyNeutral) and M2a (PositiveSelection); the 2nd pair was M7 (beta) and M8 (beta & ω). Positively selected sites were mapped to the topologies of corresponding receptors, which were predicted by the PolyPhobius server using aligned orthologous sequences (Kall et al. 2005).
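For readers unfamiliar with the LRT, the following sketch shows how a P value can be obtained from the log-likelihoods of a nested model pair. It assumes the commonly used chi-square approximation with 2 degrees of freedom for both M1a versus M2a and M7 versus M8; the degrees of freedom are not stated in the text above, and the log-likelihood values in the example are invented.

```python
import math
from typing import Tuple


def lrt_df2(lnl_null: float, lnl_alt: float) -> Tuple[float, float]:
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its P value
    under a chi-square distribution with 2 degrees of freedom (assumed).

    For 2 degrees of freedom the chi-square survival function has the
    closed form exp(-x / 2), so no statistics library is needed.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, math.exp(-stat / 2.0)


# Invented log-likelihoods for a null model (e.g., M1a) and an alternative (e.g., M2a).
stat, p_value = lrt_df2(lnl_null=-5230.4, lnl_alt=-5224.1)
```

A small P value would favor the model that allows a class of sites with dN/dS greater than 1; the NEB and BEB site identification performed by PAML is not reproduced here.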

Within each orthologous group, gene duplication and loss events were identified using GeneTree (Page and Charleston 1997), assisted by the known species tree and the relative genomic/scaffold positions of the duplicated genes. The alignment of orthologous genes was used to infer intron gains and losses. We simulated Or gene evolution on the species tree using a birth–death model by assuming that 1) the ancestral species had 60 Or genes, 2) these genes evolved independently, and 3) the gene duplication (birth) and loss (death) rates were homogeneous within the species tree. The duplication/loss rates were estimated using the counts of events described above and the estimated molecular tree calibrated by the estimated divergence time since the separation of the 2 subgenera (fig. 1). The simulation was run 1,000,000 times to generate a distribution of the variance of gene numbers in 12 species to assess whether the observed variation in gene numbers in each species is greater or less than expected under a null model of random duplication/loss.
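The null model can be illustrated with a minimal birth–death simulation. The sketch below is not the original implementation: the rates, the 3-species toy tree, and the number of replicates are invented placeholders, whereas the study used the 12-species tree in figure 1, rates estimated from the inferred events, and 1,000,000 replicates.

```python
import random
from statistics import pvariance


def evolve_branch(n, birth, death, t, rng):
    """Gillespie simulation of a linear birth-death process over time t,
    with per-gene duplication rate `birth` and loss rate `death`."""
    elapsed = 0.0
    while n > 0:
        total_rate = n * (birth + death)
        if total_rate <= 0:
            break
        elapsed += rng.expovariate(total_rate)
        if elapsed >= t:
            break
        if rng.random() < birth / (birth + death):
            n += 1  # duplication
        else:
            n -= 1  # loss
    return n


def simulate_tree(node, n, birth, death, rng, out):
    """Propagate gene counts down a tree of (name, branch_length, children)."""
    name, length, children = node
    n = evolve_branch(n, birth, death, length, rng)
    if not children:
        out[name] = n
    for child in children:
        simulate_tree(child, n, birth, death, rng, out)
    return out


# Hypothetical toy tree (branch lengths in My) and hypothetical rates.
toy_tree = ("root", 0.0, [("A", 40.0, []),
                          ("BC", 20.0, [("B", 20.0, []), ("C", 20.0, [])])])
rng = random.Random(1)
variances = [pvariance(simulate_tree(toy_tree, 60, 0.002, 0.002, rng, {}).values())
             for _ in range(200)]
```

The observed variance of gene numbers across species can then be compared with the simulated distribution in `variances` to judge whether it is larger or smaller than expected under random duplication and loss.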

FIG. 1.—Species tree of 12 Drosophila species used in this study. Divergence time is given in Mya and was estimated from a linearized Adh molecular clock (Russo et al. 1995; Powell 1997). Four types of evolutionary events were studied: gene loss (L), gene duplication (D), intron loss (IL), and intron gain (IG). Numbers in brackets count the estimated occurrence.

Results

Identification of Or Genes in Drosophila Genomes and Patterns of Divergence

Based on our search method, we identified 752 Or genes in 11 Drosophila genome assemblies (fig. 1). (Currently, there are 68 D. pseudoobscura Or genes annotated in FlyBase [www.flybase.org]. Our annotations cover all 68 and add 3 pseudogenes: 1 DpseOr98a and 2 DpseOr65b.) Together with the 61 known Or genes in D. melanogaster, we have the sequence information of 813 Or genes in the Drosophila genus. Of these genes, 73 are pseudogenes that lost their protein-coding ability because of frameshifts or premature stop codons, as indicated by homologous alignments. Another 19 were tentatively annotated as incomplete because parts of their coding regions were not sequenced and no evidence of a coding problem was found in the sequenced portions. AAA recently released gene models for the 12 Drosophila genomes with 523 Or genes, of which 333 genes have annotated protein-coding sequences. These were all included in our data set.

The 813 Or genes were clustered into 59 orthologous groups based on their sequence similarity and their physical location on chromosomes or scaffolds (table 1). An orthologous group may contain multiple inparalogous genes in a species (Sonnhammer and Koonin 2002). For example, group Or22a consists of Or22a and Or22b from D. melanogaster, 2 genes from D. simulans, 2 from D. sechellia, 6 from D. ananassae, and 14 from the other 8 species. Members in the same orthologous group had identical or similar genic structure. The genes usually have the same number of exons, and the lengths of corresponding exons are close to each other. Extra and missing introns were easily recognized from the alignment.

Table 1

Or Genes in 12 Drosophila Species

Group   Dmel      Dsim      Dsec      Dyak      Dere      Dana      Dpse      Dper      Dwil      Dmoj      Dvir      Dgri      MPPI (%)
Or1a    1         1         2(1)      1         1         1         1         1         1         1         1         0         61
Or2a    1         1(1)      1         1         1         1         1         1         1         1         1         0         31
Or7a    1         1         1         1         1         1         1[1]      1[1]      1         0         0         0         67
Or9a    1         1         1         1         1         1         1         1         1         1         1         1         42
Or10a   1         1         1         1         1         1         1         1         1         1         1         1         52
Or13a   1         1[1]      1[1]      1         1         1         1         1(1)      1         1         1[1]      1         54
Or19a   2         1         1         1         1         1         1         1         1         1         1         1         31
Or22a   2         2         2[1]      1         1         6[2]      3[1]      3[1]      1         1         1         2         43
Or22c   1         1         1         1         1         1         1         1         1         1         1         2         45
Or23a   1         1         1         1         1         1         1         1         1         2         1         3[2]      37
Or24a   1         1         1         1         1         1         1         1         1         1         1         1         75
Or30a   1         1         1         1         1         1         1         1         1         1         1         1         76
Or33a   2         2         2         2         2         5         3         3         1         0         0         0         47
Or33c   1         1[1]      1[1]      1         1         1         1         2[1]      1         1         1         1         44
Or35a   1         1(1)      1(1)      1         1         1         1         1         1         1         1         1         54
Or42a   1         1         1         1         1         1         2         2         1         2         3[1]      1         61
Or42b   1         1         1         1         1         1         1         1         1         1         1         11[3]     68
Or43a   1         1         1         1         1         1         1         1[1]      1         1         1         1         56
Or43b   1         2(1)      1         1         1         1         1         1         1         0         0         0         51
Or45a   1         1         1         1         1         1         1         1         1         0         0         0         58
Or45b   1         1         1         1         1         1         1         1         0         1         1         1         65
Or46a   1         1         1         1         1         1         1         1         1         1         1         4         46
Or47a   1         2[1]      1         1         1         1         1         1         1         0         0         0         41
Or47b   1         1         1         1         1         1         1         1         1         2         1         1         47
Or49a   1         1         1         1         1         1         2         2         1         1         1         2         39
Or49b   1         1         1         1         1         1         1         1         1         1         1         1         66
Or56a   1         1(1)      1[1]      1         1         1         2         2[2]      1         1         1         2         48
Or59a   1         1         1         1         1         1         1         1         3[1]      1         3         4(1)[1]   48
Or59b   1         2(1)      1         1         1         1         1         1         1         1         3[1]      4(1)[1]   22
Or59c   1         1         1         1         1         1         1         1         1         0         0         0         56
Or63a   1         1         1         1         1         1         1         1         1         1         1         1         54
Or65a   1         1         1[1]      1         1         3         0         0         2         1         1[1]      0         31
Or65b   1         1         1         1         1         0         7[2]      6[4]      0         0         0         0         79
Or65c   1         1         1[1]      2         1         0         0         0         0         0         0         0         48
Or67a   1         3[1]      2[1]      2         2[1]      1         1         0         3         2         2         2         32
Or67b   1         1         1         1         1         1         1         1         1         1         1         1         53
Or67c   1         1         1         1         1         1         1         1         1         1         1         1         78
Or67d   1         1         1         1         1         2         1         1         1         1         1         5         24
Or69a   1         1         1         1         1         1         1         1         2[1]      1         1         2(1)      31
Or71a   1         1         1         1         1         1         1         1[1]      1         1         1         1         42
Or74a   1         1         1         1         1         1         1         1         1         1         1         2         49
Or82a   1         1         1         1         1         1         1         1         4         1         1         1         54
Or83a   1         1(1)      1         1         1         1         1         1         1         1         1         1         67
Or83b   1         1         1         1         1         3(1)[1]   1         1         1         1         1         1         85
Or83c   1         1         1         1         1         1         1         1         2         2         2         1         41
Or85a   1         1         1         1         1         1[1]      0         0         1         2         0         0         24
Or85b   1         1         1         1         1         1         1         1[1]      0         0         0         1[1]      53
Or85c   1         1         1         1         1         1         1         1[1]      1         2         1         1         53
Or85d   1         1         1         1         1         1         1         1         1         1         1         1         61
Or85e   1         1         1         1         1         1         1         1         1         1         1[1]      1         53
Or85f   1         2(2)      3(2)      1         1         1         1         1         8[2]      1[1]      1         1         39
Or88a   1         1         1         1         1         1[1]      1         1         1         1         1         1         42
Or92a   1         1         1         2(1)      1         1         1         1         1         1         1         1         36
Or94a   1         1(1)      1         1         1         1         1         1         1         1         1[1]      1         51
Or94b   1         1         1         1         1         1         1         1[1]      1         1         1         1         59
Or98a   2[1]      2[1]      1         2         2         3         4[1]      4[1]      8[4]      3         3         4(1)[1]   23
Or98b   1         1[1]      1[1]      1         1         1         1         1         1         0         1[1]      1[1]      54
OrN1    0         0         0         0         0         0         1         1[1]      3         1         1         1         49
OrN2    0         0         0         0         0         0         0         0         0         5         6[4]      2[1]      48
Total   61[1]     66(9)[6]  63(4)[8]  62(1)     60[1]     71(1)[5]  71[5]     70(1)[16] 80[8]     62[1]     64[11]    83(4)[11]

Note.—Species: Dmel (Drosophila melanogaster), Dsim (Drosophila simulans), Dsec (Drosophila sechellia), Dyak (Drosophila yakuba), Dere (Drosophila erecta), Dana (Drosophila ananassae), Dpse (Drosophila pseudoobscura), Dper (Drosophila persimilis), Dwil (Drosophila willistoni), Dmoj (Drosophila mojavensis), Dvir (Drosophila virilis), and Dgri (Drosophila grimshawi). MPPI: minimal pairwise peptide identity within an orthologous group. Total: the number of pseudogenes is given in the brackets. The number of incompletely sequenced genes is given in the parentheses. Both are included in the total number of genes.
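The cell notation explained in the note to table 1 (gene count, incompletely sequenced genes in parentheses, pseudogenes in brackets) can be read programmatically. The sketch below is an illustration only; the parser and its names are ours, and the example string is the Total row of table 1 with its 12 species values separated by spaces.

```python
import re
from typing import NamedTuple


class Cell(NamedTuple):
    genes: int
    incomplete: int
    pseudogenes: int


# count, optional "(incomplete)", optional "[pseudogenes]"
_CELL = re.compile(r"^(\d+)(?:\((\d+)\))?(?:\[(\d+)\])?$")


def parse_cell(text: str) -> Cell:
    """Parse one table cell such as '4(1)[1]' into its 3 counts."""
    m = _CELL.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized cell: {text!r}")
    genes, incomplete, pseudo = (int(g) if g else 0 for g in m.groups())
    return Cell(genes, incomplete, pseudo)


total_row = ("61[1] 66(9)[6] 63(4)[8] 62(1) 60[1] 71(1)[5] "
             "71[5] 70(1)[16] 80[8] 62[1] 64[11] 83(4)[11]")
cells = [parse_cell(token) for token in total_row.split()]
n_genes = sum(c.genes for c in cells)
n_pseudo = sum(c.pseudogenes for c in cells)
```

Summing the parsed Total row gives 813 genes and 73 pseudogenes, in agreement with the counts reported in the text.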

The protein sequences of receptors within a group are also highly similar to each other. The average pairwise peptide identity in an orthologous group ranges from 46 to 91%. The most conserved gene is Or83b. Even DsecOr83b and DgriOr83b, the 2 most divergent members of this group, are 85% identical in protein sequence. This reflects the vital role of Or83b in olfaction. This receptor forms heterodimers with other receptors and is essential for Drosophila olfaction (Benton et al. 2006). In contrast to the similarity within orthologous groups, the average peptide identity for all odorant receptors is only 15%.
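Pairwise peptide identity, and the minimal pairwise identity reported as MPPI in table 1, can be computed from aligned sequences along the following lines. The exact definition used in the study is not given; this sketch assumes identity is counted over alignment columns in which neither sequence has a gap, and the toy sequences are invented.

```python
from itertools import combinations
from typing import Sequence


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap
    (an assumed definition; other conventions exist)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += (x == y)
    return 100.0 * matches / compared if compared else 0.0


def min_pairwise_identity(seqs: Sequence[str]) -> float:
    """Smallest percent identity among all pairs in one orthologous group."""
    return min(percent_identity(a, b) for a, b in combinations(seqs, 2))


pair = percent_identity("MKT-A", "MKSLA")
group_min = min_pairwise_identity(["MKTA", "MKSA", "MRSA"])
```

Averaging `percent_identity` over all pairs instead of taking the minimum would give the average pairwise identity quoted in the paragraph above.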

Previous work reconstructed the phylogeny of Or genes in D. melanogaster, with evidence showing that these genes were as ancient as the origin of the arthropods (Robertson et al. 2003). We reconstructed a phylogeny for 727 receptors using the NJ method (Saitou and Nei 1987). These receptors include isoforms of Or22a, Or46a, and Or69a created by alternative splicing. The phylogeny is shown in figure 2. The backbone of this phylogeny, that is, the subtree of the most recent common ancestors of the orthologous groups, largely resembles the phylogeny of Or genes in D. melanogaster, as expected. Local variations exist, but almost all occur at branches where bootstrap support is low. They are most likely caused by 2 factors: our method for reconstructing the phylogeny was slightly different from that of the previous work, and our data set is much larger.

FIG. 2.—The phylogeny of the Or gene family in 12 Drosophila species. Clades with the same color are the orthologous genes with nearby labels. Internal nodes with high bootstrap support (>90%) are marked by solid squares and those with moderate support (>70%) by open circles. Branches within orthologous genes are not marked because they nearly always have high bootstrap support. The phylogeny is rooted using Or83b. The branch length is not in proportion to the actual genetic distance.

The phylogeny supports our classification of Or genes into orthologous groups. For 54 orthologous groups, their members are exclusively clustered into monophyletic groups. Or65a, Or65b, and Or65c are mostly tandem duplicated in nearly all species; however, they share higher identity within than among species. Consequently, they tend to be placed together in the phylogeny. Mixture clades of Or98a and Or85a are also found as these 2 groups share high sequence similarity. When the phylogeny is examined in detail (supplementary document 1, Supplementary Material online), it can be seen that these orthologous groups not only vary in size but that there are many inparalogs, indicating a complicated evolutionary process that includes gene duplication, gene loss, and pseudogenization.

Evolution of Genomic Location for Or Genes

All Or genes in D. melanogaster were anchored and oriented onto chromosomes in Robertson et al. (2003). These genes are broadly dispersed over 5 chromosome arms (also called Muller elements), X, 2L, 2R, 3L, and 3R, reflecting the old age of this gene family. In our study, we mapped Or genes onto 3 additional genomes: D. simulans, D. yakuba, and D. pseudoobscura. The scaffolds in the other 8 assemblies had not been placed onto chromosomes, and we were not able to place them. The location and orientation of Or genes in these 3 genomes were compared with their orthologs in D. melanogaster (fig. 3). We then inferred 3 types of genomic rearrangements: paracentric inversion (inversion confined to one arm of a chromosome), pericentric inversion (inversion spanning the centromere), and translocation (Powell 1997).

FIG. 3.—Comparison of genomic location and orientation of Or genes in Drosophila melanogaster and 3 other species. (A) D. melanogaster versus Drosophila simulans. (B) D. melanogaster versus Drosophila yakuba. (C) D. melanogaster versus Drosophila pseudoobscura. Dotted boxes are the 5 Muller elements. Each Muller element is normalized to have unit length. The x-y coordinates of an Or gene are its relative positions in the corresponding Muller element. Orthologous Or genes with different orientation are represented by solid squares, whereas those with the same orientation are represented by open circles. Labeled Or genes outside the boxes are translocations between Muller elements. Translocations and inversions within a Muller element can be identified by the off-diagonal squares and circles. For example, a paracentric inversion is obvious in chromosome 3R in D. simulans. In D. pseudoobscura, there are several inparalogs of Or67b and Or98a. Only one for each is represented in the graph.

As expected, we saw more rearrangement events in pairs of more distantly related species. Translocation across the 5 Muller elements was observed in D. yakuba and D. pseudoobscura, using D. melanogaster as reference. We were able to infer the rearrangement events in D. simulans and D. yakuba by assuming the most parsimonious edit path between the compared genomes (table 2). D. pseudoobscura showed a much larger extent of genomic rearrangement, which made it difficult to count and categorize individual events; it was therefore not used for this analysis.
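The per-gene comparison underlying figure 3 (relative position within a Muller element normalized to unit length, plus relative orientation) can be sketched as follows. The data structure, element lengths, and coordinates are hypothetical and serve only to show the computation; inferring the most parsimonious series of inversions and translocations is a separate step that is not attempted here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class GeneLocus:
    element: str  # Muller element label, e.g., "3R" in D. melanogaster naming
    start: int    # start coordinate on the element (bp)
    strand: str   # "+" or "-"


def relative_position(locus: GeneLocus, lengths: Dict[str, int]) -> float:
    """Position of the gene after scaling its element to unit length."""
    return locus.start / lengths[locus.element]


def compare_ortholog(ref: GeneLocus, other: GeneLocus,
                     ref_lengths: Dict[str, int],
                     other_lengths: Dict[str, int]) -> dict:
    """Return plot coordinates and flags for one orthologous gene pair."""
    return {
        "x": relative_position(ref, ref_lengths),
        "y": relative_position(other, other_lengths),
        "moved_between_elements": ref.element != other.element,
        "orientation_differs": ref.strand != other.strand,
    }


# Hypothetical coordinates and element lengths, for illustration only.
ref = GeneLocus("3R", 2_000_000, "+")
other = GeneLocus("3R", 6_000_000, "-")
point = compare_ortholog(ref, other, {"3R": 20_000_000}, {"3R": 24_000_000})
```

Points with `orientation_differs` set and lying off the diagonal are the candidates for inversions, and points with `moved_between_elements` set correspond to the labeled genes drawn outside the boxes in figure 3.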

Table 2

Genome Shuffling Changed Or Gene Location and Orientation Using D. melanogaster as Reference

Species       Event                                   Chromosome Arm  Or Genes
D. simulans   Paracentric inversion                   3R              Or92a, Or88a, Or85b, Or85c, Or85d, Or85e
D. yakuba     Paracentric translocation               2L              Or30a
D. yakuba     Paracentric inversion                   2L              Or35a
D. yakuba     Paracentric inversion                   2R              Or49a, Or49b
D. yakuba     Paracentric inversion                   2R              Or56a
D. yakuba     Paracentric translocation + inversion   3L              Or63a
D. yakuba     Paracentric inversion                   3L              Or74a
D. yakuba     Paracentric translocation + inversion   3L              Or67a, Or67b
D. yakuba     Paracentric translocation               3R              Or92a
D. yakuba     Paracentric inversion                   3R              Or88a
D. yakuba     Paracentric translocation               3R              Or83c
D. yakuba     Paracentric inversion                   3R              Or98a, Or98b
D. yakuba     Paracentric inversion + translocation   X               Or2a
D. yakuba     Paracentric inversion + translocation   X               Or7a
D. yakuba     Paracentric translocation               X               Or13a
D. yakuba     Paracentric inversion + translocation   X               Or19a
D. yakuba     Paracentric inversion + translocation   X               Or10a
D. yakuba     Pericentric translocation               2R → 2L         Or43a, Or43b, Or45a, Or45b, Or46a
D. yakuba     Pericentric translocation + inversion   2R → 2L         Or42b

We observed far more chromosomal reshuffling in D. yakuba than in D. simulans. In D. simulans, the only visible rearrangement is the large inversion of chromosome 3R, which occurred at the breakpoints 84F1 and 93F6–7 (Ashburner and Lemeunier 1975). In D. yakuba, we observed all 3 types of rearrangement events. Most of these events were simple inversions or translocations. Some are more complicated, as illustrated by the location and orientation of 5 Or genes on chromosome X of D. melanogaster and D. yakuba. These 5 genes are Or7a, Or9a, Or10a, Or13a, and Or19a (supplementary fig. 5, Supplementary Material online). The most parsimonious rearrangement that yields the observed pattern of these 5 genes involves 2 inversions and 2 translocations. In another significant translocation in D. yakuba, a segment of at least 3 Mb moved from chromosome 2R to 2L, relocating 6 Or genes (Or42b, Or43a, Or43b, Or45a, Or45b, and Or46a). Further reshuffling occurred after the relocation, as we can see from the location of Or35a (supplementary fig. 1, Supplementary Material online).

Gene Duplication, Gene Loss, and Evolution of Genic Structure

Gene duplication is the major way for species to acquire new functions (Ohno 1970) and was very common in the Drosophila Or gene family. By reconciling the species phylogeny with the gene phylogeny and by using the genomic/scaffold locations of Or genes, we estimated the gene duplication/loss events and intron gain/loss events that are summarized in figure 1. We note that although duplication events are likely to be estimated reasonably well, the estimates of gene loss are subject to the confounding effect of sequence divergence, which can cause orthologs to go undetected. The prevalence of pseudogenes and the ambiguity of classifying some Or genes added to the difficulty of reconstructing the gene phylogeny and inferring duplication events.
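The reconciliation step can be illustrated with the standard least-common-ancestor (LCA) mapping: each node of the gene tree is mapped to the smallest species-tree clade that contains all of its species, and a node is counted as a duplication when it maps to the same clade as one of its children. The following Python sketch is an illustration under stated assumptions, not the procedure used to produce figure 1. The toy trees, the gene labels, and the function names are hypothetical, and gene losses and intron events are not counted.

```python
def clades(tree):
    """List the leaf set of every node in a nested-tuple species tree.

    The clade of `tree` itself is always the last element of the list.
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    found, leaves = [], frozenset()
    for child in tree:
        sub = clades(child)
        found.extend(sub)
        leaves |= sub[-1]
    found.append(leaves)
    return found


def lca(species_clades, species):
    """Smallest species-tree clade that contains every species in `species`."""
    return min((c for c in species_clades if species <= c), key=len)


def count_duplications(gene_tree, species_clades, species_of):
    """Return (mapped clade, number of duplication nodes) for a binary gene tree."""
    if isinstance(gene_tree, str):
        return lca(species_clades, frozenset([species_of(gene_tree)])), 0
    left, right = gene_tree
    left_map, left_dups = count_duplications(left, species_clades, species_of)
    right_map, right_dups = count_duplications(right, species_clades, species_of)
    here = lca(species_clades, left_map | right_map)
    # A node that maps to the same clade as one of its children implies a duplication.
    is_duplication = here in (left_map, right_map)
    return here, left_dups + right_dups + int(is_duplication)


# Hypothetical toy data: three species and one gene family with two copies
# in Dmel and Dsim but a single copy in Dyak.
species_tree = (("Dmel", "Dsim"), "Dyak")
gene_tree = ((("Dmel_a", "Dsim_a"), ("Dmel_b", "Dsim_b")), "Dyak_a")
species_clades = clades(species_tree)
root_clade, n_duplications = count_duplications(
    gene_tree, species_clades, lambda gene: gene.split("_")[0]
)
print(n_duplications)  # prints 1
```

In this toy case the single duplication maps to the common ancestor of Dmel and Dsim. A full reconciliation would also place the implied losses on the species tree, which is the part of the inference that is most sensitive to undetected orthologs.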

We observed that the distribution of receptor gain/loss events varies greatly in the species tree. The Hawaiian fly D. grimshawi seems to have undergone the most dramatic gene duplication and loss events. D. willistoni, D. ananassae, and the 2 flies in the obscura group also had high numbers of gene duplication and loss (fig. 1). In particular, D. persimilis, the sister species of D. pseudoobscura, suffered 13 gene loss events. This is surprising given the relatively recent speciation of these 2 species. In terms of gene copy numbers, the Or gene family is most stable in D. erecta and D. yakuba.

The genomic organization and sequence information of duplicated genes can give us insights into their evolutionary dynamics. Of the 240 duplicated genes, about 60% exist as tandem arrays, some of which are rather large. The longest tandem array is in D. willistoni, with a battery of 8 Or85f genes, of which the 3rd and the 8th members are pseudogenes. In D. grimshawi, 11 copies of Or42b are separated into 3 tandem arrays of size 7, 2, and 2, respectively. Each of these 3 tandem clusters contains 1 pseudogene. The 8 putatively functional Or42b genes encode receptors that share 87% protein sequence identity. These receptors are very likely to be functionally divergent (see Discussion). Other evolutionary events, including pseudogenization and alternative splicing, occurred within some duplicated genes. An example is shown in figure 4 for DanaOr22a in D. ananassae. In this tandem array of 6 genes, the 1st and the 5th genes are pseudogenes, the 2nd shows a large change in the length of the 3rd intron, and there is evidence of alternative splicing in the 4th gene.

FIG. 4. The tandem array of 6 Or22a genes in Drosophila ananassae. The rectangles represent exons; those in pseudogenes are shown as open rectangles. Arrows indicate the gene orientation. A functional DanaOr22a gene has 4 exons. Alternative splicing was found for DanaOr22a-4, which has 2 forms of the 1st exon, labeled as “A” and “B.” The 2 first-exon isoforms both encode 42-amino acid segments that differ in 16 positions.

Or genes also differ in intron numbers, and it has been speculated that the ancestral Or gene contained 3 introns. The subsequent addition and removal of introns have created Or genes with different genic structures in which the intron number varies from 1 to 9 (Robertson et al. 2003). We observed that the genic structure is preserved in most orthologous Or genes, and we found only 8 intron gain events and 11 intron loss events affecting 13 orthologous groups. Most orthologous groups incurred either no intron gain/loss event or a single one, and only 3 groups were affected by multiple events. Or13a lost 4 introns in the obscura group; Or94a gained 1 intron in D. grimshawi and lost 1 in the melanogaster subgroup. In Or49b, all 3 species in the Drosophila subgenus have 5 introns, whereas the 2nd and 4th introns are absent in the 9 species of the Sophophora subgenus. In addition, DwilOr49b lost the 3rd intron. Therefore, either 2 intron gain events occurred in the Drosophila lineage or 2 intron loss events occurred in the Sophophora lineage. A 3rd intron loss event subsequently occurred in the D. willistoni lineage. The maximal number of introns in an Or gene is still 9, whereas the minimum is 0: DwilOr2a is an intronless gene. In contrast to the stability of intron numbers, the sizes of corresponding introns in orthologous genes can differ severalfold, and intron size does not seem to correlate with genome size.

Selective Forces on Or Genes

Ors possess unique functional profiles, including different spontaneous firing rates, signal models, and response spectra to odorants (Hallem et al. 2004). We expect these molecules to be under stabilizing selection for maintaining their signaling properties while simultaneously the molecules will be under diversifying selection to develop response profiles specific to different ecological contexts of each species. Here, we used sequence-based analysis to investigate purifying and positive selection on Or genes. The results are shown in table 3, where the dN/dS ratio and the P value from the log-LRT of M7/M8 are shown, along with false discovery rate (FDR) correction for multiple tests (Benjamini and Hochberg 1995). The tests contrasting the simpler homogeneous models M1a against M2a resulted in a corrected P value of 1 for nearly all receptors suggesting a lack of power and we do not report the results here.
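For readers unfamiliar with the test, the likelihood-ratio statistic is twice the difference between the maximized log-likelihoods of the nested models. The Python sketch below assumes the common convention of comparing the M7/M8 statistic with a chi-square distribution with 2 degrees of freedom, for which the tail probability has the closed form exp(-x/2). The degrees of freedom, the function name, and the log-likelihood values are assumptions for illustration and are not taken from this study.

```python
import math


def lrt_p_value_df2(lnl_null, lnl_alt):
    """P value of a likelihood-ratio test against chi-square with 2 df.

    lnl_null and lnl_alt are the maximized log-likelihoods of the null
    (e.g., M7) and alternative (e.g., M8) models.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic <= 0.0:
        # The alternative did not improve the fit: no evidence against the null.
        return 1.0
    # The survival function of chi-square with 2 df is exp(-x / 2).
    return math.exp(-statistic / 2.0)


# Hypothetical log-likelihoods, not values from the paper.
p = lrt_p_value_df2(lnl_null=-5230.4, lnl_alt=-5221.9)
print(f"{p:.5f}")
```

The raw P values obtained this way for each orthologous group are the inputs to the multiple-testing correction.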

Table 3

Nonsynonymous to synonymous substitution ratios of Or genes

Gene    dN/dS     P Value    Corrected P Value
Or1a    0.1636    0.02946    0.1580
Or2a    0.1263    0.48554    1.0000
Or7a    0.1198    0.00722    0.0532
Or9a    0.1122    0.00129    0.0178*
Or10a   0.2243    0.00294    0.0248*
Or13a   0.1380    1.00000    1.0000
Or19a   0.3455    0.00114    0.0178*
Or22a   0.2137    1.00000    1.0000
Or22c   0.1946    0.00982    0.0644
Or23a   0.1871    1.00000    1.0000
Or24a   0.1013    0.12435    0.3914
Or30a   0.0912    0.11275    0.3914
Or33a   0.1749    0.90791    1.0000
Or33c   0.2441    0.63464    1.0000
Or35a   0.1154    1.00000    1.0000
Or42a   0.1262    0.12955    0.3914
Or42b   0.1177    0.33204    0.7836
Or43a   0.1942    0.00181    0.0178*
Or43b   0.1232    1.00000    1.0000
Or45a   0.1537    0.46300    1.0000
Or45b   0.1232    0.93833    1.0000
Or46a   0.1688    0.12573    0.3914
Or47a   0.0615    1.00000    1.0000
Or47b   0.1945    0.20904    0.5606
Or49a   0.2324    0.20512    0.5606
Or49b   0.1171    0.94990    1.0000
Or56a   0.1628    0.00020    0.0059*
Or59a   0.1412    1.00000    1.0000
Or59b   0.1041    1.00000    1.0000
Or59c   0.2031    0.05914    0.2492
Or63a   0.1120    0.03505    0.1591
Or65a   0.2306    0.74630    1.0000
Or65b   0.2192    0.52647    1.0000
Or65c   0.2774    0.40529    0.9197
Or67a   0.2444    1.00000    1.0000
Or67b   0.0878    1.00000    1.0000
Or67c   0.0672    1.00000    1.0000
Or67d   0.1290    0.99991    1.0000
Or69a   0.2283    0.01290    0.0761
Or71a   0.1862    1.00000    1.0000
Or74a   0.1516    0.03506    0.1591
Or82a   0.1481    1.00000    1.0000
Or83a   0.1054    1.00000    1.0000
Or83b   0.0349    1.00000    1.0000
Or83c   0.1832    1.00000    1.0000
Or85a   0.1083    1.00000    1.0000
Or85b   0.1487    1.00000    1.0000
Or85c   0.1552    1.00000    1.0000
Or85d   0.1455    0.22201    0.5695
Or85e   0.1624    0.58634    1.0000
Or85f   0.1959    0.24938    0.6131
Or88a   0.1682    0.93885    1.0000
Or92a   0.0582    0.13267    0.3914
Or94a   0.1330    0.07040    0.2769
Or94b   0.1190    0.70799    1.0000
Or98a   0.2456    1.00000    1.0000
Or98b   0.2269    0.97730    1.0000
OrN1    2.3295    0.00003    0.0018*
OrN2    0.3492    0.00165    0.0178*

* Corrected P value below 0.05.

Or83b has the smallest dN/dS ratio of 0.0349, suggesting the effect of very strong purifying selection. Or83b is also the most conserved receptor with an average sequence identity of 92%. This receptor forms heterodimers with other receptors (Benton et al. 2006), and it seems that the unique and indispensable role of this receptor in olfaction exerted the most stringent pressure on its evolution. With the exception of OrN1, a putative Or gene newly identified in this study, all Or genes have low dN/dS values with the highest being 0.3492 for OrN2 and 0.3455 for Or19a.

However, a dN/dS ratio averaged over the whole gene could be misleading if only a small proportion of sites are under positive selection. To address this, we used LRTs to detect positive selection by employing 2 pairs of site models in PAML: M1a versus M2a and M7 versus M8. As mentioned, the comparison of M1a to M2a did not reveal any orthologous group under positive selection. The test using M7 and M8, which allows for a beta-distributed site-specific dN/dS ratio, detected 7 groups under possible positive selection at the 0.05 significance level after correction for multiple tests using FDR. They are Or9a, Or10a, Or19a, Or43a, Or56a, OrN1, and OrN2. We also used the NEB and BEB estimation methods in model M8 (Zhang et al. 2005) to identify sites under positive selection. We found only 10 such sites at the 0.05 significance level. These findings are consistent with the observation that broad purifying selection acts on olfactory processes and that the orthologous genes probably have the same or similar functional properties.
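The FDR correction can be checked directly against table 3. The following Python sketch applies the Benjamini-Hochberg step-up adjustment to the 59 raw P values listed in the table. The function and variable names are hypothetical, and the sketch assumes the standard adjustment (P × m / rank, made monotone and capped at 1), which is consistent with the rounded corrected values in the table but is not spelled out in the text.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values keyed like the input dict."""
    m = len(p_values)
    ranked = sorted(p_values, key=p_values.get)  # keys ordered by ascending P
    adjusted, running_min = {}, 1.0
    # Walk from the largest P value to the smallest to enforce monotonicity.
    for rank in range(m, 0, -1):
        key = ranked[rank - 1]
        running_min = min(running_min, p_values[key] * m / rank)
        adjusted[key] = running_min
    return adjusted


# Raw M7/M8 P values transcribed from table 3.
raw_p = {
    "Or1a": 0.02946, "Or2a": 0.48554, "Or7a": 0.00722, "Or9a": 0.00129,
    "Or10a": 0.00294, "Or13a": 1.0, "Or19a": 0.00114, "Or22a": 1.0,
    "Or22c": 0.00982, "Or23a": 1.0, "Or24a": 0.12435, "Or30a": 0.11275,
    "Or33a": 0.90791, "Or33c": 0.63464, "Or35a": 1.0, "Or42a": 0.12955,
    "Or42b": 0.33204, "Or43a": 0.00181, "Or43b": 1.0, "Or45a": 0.46300,
    "Or45b": 0.93833, "Or46a": 0.12573, "Or47a": 1.0, "Or47b": 0.20904,
    "Or49a": 0.20512, "Or49b": 0.94990, "Or56a": 0.00020, "Or59a": 1.0,
    "Or59b": 1.0, "Or59c": 0.05914, "Or63a": 0.03505, "Or65a": 0.74630,
    "Or65b": 0.52647, "Or65c": 0.40529, "Or67a": 1.0, "Or67b": 1.0,
    "Or67c": 1.0, "Or67d": 0.99991, "Or69a": 0.01290, "Or71a": 1.0,
    "Or74a": 0.03506, "Or82a": 1.0, "Or83a": 1.0, "Or83b": 1.0,
    "Or83c": 1.0, "Or85a": 1.0, "Or85b": 1.0, "Or85c": 1.0,
    "Or85d": 0.22201, "Or85e": 0.58634, "Or85f": 0.24938, "Or88a": 0.93885,
    "Or92a": 0.13267, "Or94a": 0.07040, "Or94b": 0.70799, "Or98a": 1.0,
    "Or98b": 0.97730, "OrN1": 0.00003, "OrN2": 0.00165,
}

corrected = benjamini_hochberg(raw_p)
significant = sorted(gene for gene, q in corrected.items() if q < 0.05)
print(significant)
```

Under these assumptions the genes that pass the 0.05 threshold are the same 7 groups named above, and Or7a falls just short with a corrected value of about 0.053.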

In the above analysis, we had to assume a time-homogeneous dN/dS ratio across the orthologous gene phylogeny, so lineage-specific evolution may be obscured. To account for this, we examined the pairwise dN/dS ratios within orthologous groups. We found that 94% of pairs have a small dN/dS ratio (<0.2), whereas 34 pairs have dN/dS greater than 0.5, of which 30 are inparalog pairs. Thus, we found support for the hypothesis that duplicated receptors within a given genome experienced directional selection for novel function. Three pairs show a dN/dS ratio greater than 1: 3.24 for DgriOr42b-1 versus DgriOr42b-7, 2.05 for DgriOr42b-5 versus DgriOr42b-6, and 1.32 for DgriOr42b-7 versus DgriOr42b-8. It is interesting to note that the Or42b and Or67d genes in D. grimshawi generally have relatively high dN/dS ratios. All 28 pairs among the 8 putatively functional DgriOr42b inparalogs have dN/dS greater than 0.3, and 20 pairs have a dN/dS ratio greater than 0.5. In DgriOr67d, the 5 inparalogous genes form 2 tandem arrays. We found that 4 pairs of DgriOr67d genes have dN/dS greater than 0.7. Sporadic high dN/dS ratios were also spotted in Or67a, Or22a, Or33a, Or47b, Or49a, Or59a, and Or59c.

Discussion

Odorant receptors in flies long eluded investigation until their identification in D. melanogaster. In this study, we expanded the effort to 11 additional Drosophila species. We found that the number of Or genes differs moderately among the genomes. As shown in table 1, the number of putatively functional genes in each genome varies from 53 to 72, with a median of 60. We tested whether this variability is more or less than expected under a model of random gene loss and gain over the species phylogeny by estimating the gain/loss rates and simulating the process over the tree. The gain rate was estimated to be 0.370/Myr, and the loss rate was estimated to be 0.284/Myr. Using the variance of gene numbers as our statistic and 1,000,000 repeated simulations for the reference distribution, we found that the variation in the Or gene numbers was significantly high at P = 0.04. Thus, the number of Or genes in each genome varies more than expected by chance, and, in fact, much of the variation can be attributed to inparalogs (genome-specific duplications). This result is consistent with the idea that receptor gain/loss might be functionally related to each species' chemical niche.
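The logic of this simulation test can be sketched as follows. The Python example is a minimal illustration under strong assumptions and is not the code used in this study: it treats the estimated gain and loss rates as constant whole-family rates per Myr, draws Poisson numbers of gains and losses on each branch, and uses the population variance of the leaf counts as the statistic. The toy tree, its branch lengths, the ancestral count of 60, the observed variance, and all function names are hypothetical placeholders.

```python
import random
from statistics import pvariance


def poisson(rng, mean):
    """Draw a Poisson variate by counting unit-rate arrivals before `mean`."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_leaf_counts(node, count, gain, loss, rng):
    """Evolve a gene count down a tree of (branch_length_myr, name_or_children)."""
    length, below = node
    gains = poisson(rng, gain * length)
    losses = poisson(rng, loss * length)
    count = max(0, count + gains - losses)
    if isinstance(below, str):  # leaf: `below` is the species name
        return {below: count}
    leaves = {}
    for child in below:
        leaves.update(simulate_leaf_counts(child, count, gain, loss, rng))
    return leaves


def variance_p_value(tree, root_count, observed_var, gain, loss, n_sims, seed=0):
    """Fraction of simulations whose leaf-count variance is >= observed_var."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        counts = simulate_leaf_counts(tree, root_count, gain, loss, rng)
        if pvariance(list(counts.values())) >= observed_var:
            hits += 1
    return hits / n_sims


# Hypothetical three-species tree; branch lengths in Myr are placeholders.
toy_tree = (0.0, [(10.0, "A"), (5.0, [(5.0, "B"), (5.0, "C")])])
p = variance_p_value(toy_tree, root_count=60, observed_var=25.0,
                     gain=0.370, loss=0.284, n_sims=2000)
print(p)
```

A one-sided comparison is used because the question is whether the observed variance is unusually high. A per-gene birth-death process would be an equally plausible reading of the rates and would require a different simulator.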

Ors are stimulated by odors, which are small volatile organic compounds. Although odors vary greatly in their chemical structure and physical properties, receptors seem to identify them mainly by a few characteristics, including functional groups and backbone chain size (Hallem et al. 2004). Such decomposition of odors can be mirrored in the receptors. A systematic study in D. melanogaster found that although individual receptors possess unique response spectra, these spectra frequently overlap each other (Hallem and Carlson 2006). Through such combinations, a pool of approximately 60 receptors enables the flies to distinguish a much higher number of odors. That is, the odorant receptors act as a “basis” set in the multidimensional world of odorants in the same manner as the 3 light receptors in humans establish color vision. The odors could include those that are indispensable to all Drosophila species and those that are idiosyncratic to particular species. In fact, in the larval stage, the pool of active receptors may be much smaller (Kreher et al. 2005). Thus, one hypothesis about the distribution of odorant receptors in the Drosophila species is that a core (but variable) set of receptors forms a stable functional base, whereas species-specific duplicated genes elaborate the odor space for niche-specific chemicals.

We have 2 additional lines of evidence to support this functional stability hypothesis. First, in all 12 species, nearly all receptor genes have orthologs in some other species. At the coarse level, only 8 Or genes in the Drosophila subgenus do not have detectable orthologs in the Sophophora subgenus. More than 80% of the orthologous groups have members in both subgenera. Furthermore, these orthologs are highly conserved in sequence. In the phylogeny, members of an orthologous group form monophyletic clades and share a most recent common ancestor. Second, in the analysis of positive selection, we found very weak evidence of functional divergence within the orthologous groups, which suggests that in most cases orthologous receptors in different species have response spectra that largely resemble each other. Most orthologous groups have more than 10 reasonably divergent sequences that can be well aligned, and the branch lengths in the phylogeny of these sequences are generally not trivially small, providing favorable conditions for our selection tests. The statistical power to detect positive selection was also boosted by allowing the dN/dS ratio to vary among sites. In our test of the model pair M7 and M8, model M8 assumes a beta distribution of dN/dS ratios plus one additional class for positive selection. After correction for multiple testing, only 7 receptors showed significant evidence of positive selection at the 0.05 significance level. With the exception of OrN1, the dN/dS ratio is smaller than 0.35 for all groups. Besides these sequence-based analyses and inferences, there is also experimental evidence. In an in vivo electrophysiological study of olfaction in 9 species of the melanogaster subgroup, the response profiles of corresponding ORNs were found to be very similar (Stensmyr et al. 2003).

Site-specific estimates of positive selection revealed 9 sites in D. melanogaster with reliable estimates (well-aligned sequences) and significant posterior probabilities. These are amino acid position 51 in DmelOr7a, position 115 in DmelOr9a, positions 15, 128, 183, and 246 in DmelOr10a, position 228 in DmelOr19a, position 258 in DmelOr22c, and position 49 in DmelOr59c. In addition, we found evidence for positive selection at position 54 in the newly annotated OrN1. Of these 10 positions, 7 were mapped to the intracellular domains, 2 to the TM domains, and 1 to the extracellular domain using the structure model of Benton et al. (2006). If we suppose that odorant receptors have the standard structural conformation rather than the inverted conformation suggested by Benton et al. (2006), all of our positively selected intracellular sites would be considered extracellular, and the 2 sites in the TM domains would be in their lumenal halves. If the positive selection is for novel ligand interaction, it would seem more likely to involve extracellular domains than intracellular domains. A study of odorant receptors in rodents also supports this hypothesis by finding that 75% of positively selected sites are either in the extracellular parts or in the lumenal halves of TM domains (Emes et al. 2004). Thus, these results call for caution regarding the idea that odorant receptors have atypical structures.

Stability of the odorant repertoire does not mean that the Or genes are evolutionarily static. Or genes evolved in response to different ecological contexts. For example, D. sechellia is endemic to islands away from the African continent and oviposits only on morinda fruit. This fly was found to be particularly sensitive to methyl hexanoate; some Or gene has likely evolved in adaptation to this environment (Dekker et al. 2006). However, its number of detectable functional genes is relatively low at 55, whereas the genome seems to have a rather high proportion of pseudogenes (12% compared with a 9% average for all others).

At the genome level, we examined the changes in chromosomal location and orientation of Or genes in 4 species. A recent study suggested that Drosophila genomes have undergone some of the most dramatic chromosomal evolution among all eukaryotes (Ranz et al. 2001). Species in this genus may have different karyotypes. Besides a dot chromosome, D. pseudoobscura has 1 metacentric chromosome and 3 telocentric chromosomes, whereas the other 3 species have 1 telocentric and 2 metacentric chromosomes. Nevertheless, these 5 chromosome arms, the so-called Muller elements, can be mapped to each other, and it is hypothesized that genes largely move within the same Muller element (Powell 1997). Our study of Or genes is consistent with the general preservation of the Muller elements, with only a few Or genes showing evidence of movement across different Muller elements.

Gene duplications and losses were observed in about half of the orthologous groups, in most species, and in almost all branches of the species tree. More than half of the duplicated genes remain in tandem arrays. In one extreme case, we inferred 27 duplication events and 14 loss events in D. grimshawi. Some of these inferences must be interpreted with caution because loss events are affected by our ability to detect the homologs. However, we note that homologs within a given family are less divergent than Or genes as a whole, and the entire gene set is detectable with the recursive TBlastN search employed in this study. It is also the case that although duplication events are likely to be true positives, assessment of species-specific duplication (i.e., inparalogs) is affected by taxon sampling. If we had more species closely related to D. grimshawi, we might find other orthologs. Regardless, with 72 functional genes and 11 pseudogenes, this species has the largest number of Or genes among the 12 species. D. grimshawi belongs to the Hawaiian lineage, which encompasses about 1,000 species and is dispersed across the Hawaiian Islands. Thus, the unique ecological conditions likely contributed to the diversification of Or genes in this species, as illustrated by DgriOr42b, whose 11 genes are split into 3 tandem arrays with very unbalanced numbers of copies.

The Or genes show multimodal evolution patterns with changes in their genic structure (gain and loss of introns), chromosomal positions, duplication and gain of new functions, as well as loss of function. Overall, the Or genes in Drosophila seem to have undergone dynamic evolution through gene duplication and loss while maintaining a core set of functions that help establish a fundamental odorant space for the group.

We thank John Carlson, Fangjun Tang, and Elissa Hallem for discussions and access to preliminary results. We thank 2 anonymous reviewers for suggestions that improved this paper. This work has been supported in part by National Science Foundation grants EF-0334866 and EF-0331654 and National Institutes of Health grant P20-GM-6912-1 to J.K.

Funding to pay the Open Access publication charges for this article was provided by Penn funds.

References

Ashburner M, Lemeunier F. Relationships within the melanogaster species subgroup of the genus Drosophila (Sophophora) i. inversion polymorphisms in Drosophila melanogaster and Drosophila simulans. Proc R Soc Lond B Biol Sci, 1975, vol. 193 (pg. 137-157)
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B, 1995, vol. 57 (pg. 289-300)
Benton R, Sachse S, Michnick SW, Vosshall LB. Atypical membrane topology and heteromeric function of Drosophila odorant receptors in vivo. PLoS Biol, 2006, vol. 4 pg. e20
Brudno M, Chapman M, Gottgens B, Batzoglou S, Morgenstern B. Fast and sensitive multiple alignment of large genomic sequences. BMC Bioinformatics, 2003, vol. 4 pg. 66
Buck LB. Information coding in the vertebrate olfactory system. Annu Rev Neurosci, 1996, vol. 19 (pg. 517-544)
Clyne PJ, Warr CG, Freeman MR, Lessing D, Kim J, Carlson JR. A novel family of divergent seven-transmembrane proteins: candidate odorant receptors in Drosophila. Neuron, 1999, vol. 22 (pg. 327-338)
de Bruyne M, Foster K, Carlson JR. Odor coding in the Drosophila antenna. Neuron, 2001, vol. 30 (pg. 537-552)
Dekker T, Ibba I, Siju KP, Stensmyr MC, Hansson BS. Olfactory shifts parallel superspecialism for toxic fruit in Drosophila melanogaster sibling, D. sechellia. Curr Biol, 2006, vol. 16 (pg. 101-109)
DORN Committee. A unified nomenclature system for the Drosophila odorant receptors. Drosophila Odorant Receptor Nomenclature Committee. Cell, 2000, vol. 102 (pg. 145-146)
Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics, 2004, vol. 5 pg. 113
Elmore T, Ignell R, Carlson JR, Smith DP. Targeted mutation of a Drosophila odor receptor defines receptor requirement in a novel class of sensillum. J Neurosci, 2003, vol. 23 (pg. 9906-9912)
Emes RD, Beatson SA, Ponting CP, Goodstadt L. Evolution and comparative genomics of odorant- and pheromone-associated genes in rodents. Genome Res, 2004, vol. 14 (pg. 591-602)
Felsenstein J. PHYLIP (phylogeny inference package). Version 3.66. 2006. Seattle (WA): Department of Genome Sciences, University of Washington. Distributed by the author
Gao Q, Chess A. Identification of candidate Drosophila olfactory receptors from genomic DNA sequence. Genomics, 1999, vol. 60 (pg. 31-39)
Hallem EA, Carlson JR. Coding of odors by a receptor repertoire. Cell, 2006, vol. 125 (pg. 143-160)
Hallem EA, Ho MG, Carlson JR. The molecular basis of odor coding in the Drosophila antenna. Cell, 2004, vol. 117 (pg. 965-979)
Hildebrand JG, Shepherd GM. Mechanisms of olfactory discrimination: converging evidence for common principles across phyla. Annu Rev Neurosci, 1997, vol. 20 (pg. 595-631)
Kall L, Krogh A, Sonnhammer EL. An HMM posterior decoder for sequence feature prediction that includes homology information. Bioinformatics, 2005, vol. 21 Suppl 1 (pg. i251-i257)
Kim J, Carlson JR. Gene discovery by e-genetics: Drosophila odor and taste receptors. J Cell Sci, 2002, vol. 115 (pg. 1107-1112)
Kim J, Moriyama EN, Warr CG, Clyne PJ, Carlson JR. Identification of novel multi-transmembrane proteins from genomic databases using quasi-periodic structural properties. Bioinformatics, 2000, vol. 16 (pg. 767-775)
Kreher SA, Kwon JY, Carlson JR. The molecular basis of odor coding in the Drosophila larva. Neuron, 2005, vol. 46 (pg. 445-456)
Mombaerts P. Molecular biology of odorant receptors in vertebrates. Annu Rev Neurosci, 1999, vol. 22 (pg. 487-509)
Ngai J, Dowling MM, Buck L, Axel R, Chess A. The family of genes encoding odorant receptors in the channel catfish. Cell, 1993, vol. 72 (pg. 657-666)
Ohno S. Evolution by gene duplication. 1970. Springer-Verlag
Page RD, Charleston MA. From gene to organismal phylogeny: reconciled trees and the gene tree/species tree problem. Mol Phylogenet Evol, 1997, vol. 7 (pg. 231-240)
Powell JR. Progress and prospects in evolutionary biology: the Drosophila model. 1997. Oxford University Press
Ranz JM, Casals F, Ruiz A. How malleable is the eukaryotic genome? Extreme rate of chromosomal rearrangement in the genus Drosophila. Genome Res, 2001, vol. 11 (pg. 230-239)
Robertson HM, Warr CG, Carlson JR. Molecular evolution of the insect chemoreceptor gene superfamily in Drosophila melanogaster. Proc Natl Acad Sci USA, 2003, vol. 100 Suppl 2 (pg. 14537-14542)
Russo CA, Takezaki N, Nei M. Molecular phylogeny and divergence times of drosophilid species. Mol Biol Evol, 1995, vol. 12 (pg. 391-404)
Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol, 1987, vol. 4 (pg. 406-425)
Shanbhag S, Muller B, Steinbrecht RA. Atlas of olfactory organs of Drosophila melanogaster. 1. Types, external organization, innervation and distribution of olfactory sensilla. Int J Insect Morphol Embryol, 1999, vol. 28 (pg. 377-397)
Shanbhag S, Muller B, Steinbrecht RA. Atlas of olfactory organs of Drosophila melanogaster. 2. Types, external organization, innervation and distribution of olfactory sensilla. Int J Insect Morphol Embryol, 2000, vol. 29 (pg. 211-229)
Sonnhammer EL, Koonin EV. Orthology, paralogy and proposed classification for paralog subtypes. Trends Genet, 2002, vol. 18 (pg. 619-620)
Stensmyr MC, Dekker T, Hansson BS. Evolution of the olfactory code in the Drosophila melanogaster subgroup. Proc Biol Sci, 2003, vol. 270 (pg. 2333-2340)
Stocker RF. The organization of the chemosensory system in Drosophila melanogaster: a review. Cell Tissue Res, 1994, vol. 275 (pg. 3-26)
Vosshall LB, Amrein H, Morozov PS, Rzhetsky A, Axel R. A spatial map of olfactory receptor expression in the Drosophila antenna. Cell, 1999, vol. 96 (pg. 725-736)
Xu PX, Zwiebel LJ, Smith DP. Identification of a distinct family of genes encoding atypical odorant-binding proteins in the malaria vector mosquito, Anopheles gambiae. Insect Mol Biol, 2003, vol. 12 (pg. 549-560)
Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci, 1997, vol. 13 (pg. 555-556)
Zhang J, Nielsen R, Yang Z. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol, 2005, vol. 22 (pg. 2472-2479)

Author notes

Adriana Briscoe, Associate Editor

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data