Abstract

We report a complete genomic sequence representing rare isolates (a minor genotype) of SARS-CoV from SARS patients in Guangdong, China, where the first cases emerged. The most striking finding is an extra 29-nucleotide sequence located between nucleotide positions 27,863 and 27,864 (numbered according to the complete sequence of BJ01), within the region where BGI-PUP5 (BGI-postulated uncharacterized protein 5) and BGI-PUP6 overlap, upstream of the N (nucleocapsid) protein gene. The discovery of this minor genotype, GD-Ins29, suggests a significant genetic event and distinguishes it from the previously reported genotype, the dominant form among all sequenced SARS-CoV isolates. A 17-nt segment of this extra sequence is identical to a segment of the same size in two human mRNA sequences, which may interfere with viral genome replication and transcription in the cytosol of infected cells. This finding provides a new avenue for exploring the virus-host interaction in viral evolution, host pathogenesis, and vaccine development.

Introduction

Severe Acute Respiratory Syndrome (SARS) has infected thousands of people and caused hundreds of deaths globally since it first emerged in Guangdong, China, in November 2002 (ref. 1; http://www.who.int/csr/sars/en/). Sixteen complete or partial genome sequences of SARS-CoV isolates have become available since March 14 this year (ref. 2–5; http://www.ncbi.nlm.nih.gov/; Table 1). In the process of surveying all available genomic sequences of SARS-CoV, we obtained the complete genomic sequence of an isolate originally named GZ01. The sequence contains an extra 29-nucleotide segment in the vicinity of the genes for the viral structural proteins M (membrane protein) and N (nucleocapsid protein). We have renamed the isolate GD-Ins29 to indicate its origin, the length of the extra segment, and the type of mutation. We believe that this novel genotype represents one of the early variants of SARS-CoV. The absence of the 29-nt fragment may help the virus escape likely interference from locally homologous RNA molecules of host origin and thereby become more prevalent in human hosts. This sequence also carries an unusually high number of variations relative to all other sequenced SARS-CoV isolates, suggesting a more distant relationship to the SARS-CoV genomes published thus far.

Table 1

The Complete Genome Sequences of 17 Isolates of SARS-CoV

Isolate      Genome size (nt)  Accession number  Modification date
SIN2500      29,711            AY283794.1        9-May-03
SIN2677      29,705            AY283795.1        9-May-03
SIN2679      29,711            AY283796.1        9-May-03
SIN2748      29,706            AY283797.1        9-May-03
SIN2774      29,711            AY283798.1        9-May-03
TOR2         29,751            NC_004718.3       22-May-03
Urbani       29,727            AY278741.1        21-Apr-03
CUHK-W1      29,736            AY278554.2        14-May-03
CUHK-Su10    29,736            AY282752.1        7-May-03
HKU-39849    29,742            AY278491.2        18-Apr-03
TW1          29,729            AY291451.1        14-May-03
ZJ01         29,715            AY297028.1        19-May-03
BJ01         29,725            AY278488.2        1-May-03
BJ02         29,740            AY278487          29-May-03
BJ03         29,738            AY278490          29-May-03
BJ04         29,732            AY279354          29-May-03
GD01         29,757            AY278489          29-May-03

Results and Discussion

A 29-nt sequence segment was identified from a viral isolate, SARS-CoV GD01

The genome landscape of this isolate is not greatly different from those reported by our group and other laboratories. Twelve ORFs (open reading frames), including 6 CDSs (coding sequences) and 6 PUPs (postulated uncharacterized proteins), were predicted in the viral genome (Table 2; Figure 1). We annotated two new PUPs, BGI-PUP5 (so named to avoid possible ambiguity in nomenclature) and BGI-PUP6, located between the previously reported BGI-PUP4 and the N protein. These PUPs overlap in a 35-nt region between positions 27,843 and 27,879 (numbered according to the complete sequence of BJ01). Both PUPs share the same consensus leader sequence (5′-AGUCUAAACGAAC-3′), which is immediately followed by the predicted start codon of BGI-PUP5 and lies approximately 95 nt upstream of the start codon of BGI-PUP6. The expression, the actual existence of protein products, and the possible functions of the PUPs, such as BGI-PUP6, remain to be elucidated experimentally.
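The coordinate arithmetic behind the overlap and the TRS spacing can be checked directly. The Python sketch below is illustrative only: it assumes 1-based, inclusive BJ01 coordinates taken from Table 2, and the helper names are hypothetical. With the Table 2 boundaries, the interval shared by BGI-PUP5 and BGI-PUP6 is 27,845 to 27,879, i.e. 35 nt, and the TRS core ends one nucleotide before the BGI-PUP5 start codon.

```python
# Minimal sketch: coordinates are 1-based, inclusive, BJ01 numbering (Table 2).
TRS_CORE = "CUAAACGAAC"      # consensus leader core sequence of the TRS
TRS_SEQ = "AGUCUAAACGAAC"    # TRS listed for BGI-PUP5 and BGI-PUP6
TRS_POS = 27_750             # position of the first nucleotide of the core
PUP5 = (27_760, 27_879)      # start codon .. stop codon
PUP6 = (27_845, 28_099)


def shared_interval(a, b):
    """Return the inclusive interval covered by both ORFs, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def span(interval):
    """Length in nt of an inclusive interval."""
    return interval[1] - interval[0] + 1


overlap = shared_interval(PUP5, PUP6)
core_end = TRS_POS + len(TRS_CORE) - 1

print("core found at offset", TRS_SEQ.find(TRS_CORE), "of the listed TRS")  # 3
print("overlap:", overlap, span(overlap), "nt")  # (27845, 27879) 35 nt
print("core ends at", core_end, "; BGI-PUP5 starts at", PUP5[0])  # 27759 and 27760
print("TRS core to BGI-PUP6 start:", PUP6[0] - TRS_POS, "nt")  # 95 nt
```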

Table 2

The Predicted ORFs in the GD01 SARS-CoV Genome

ORF                    Position[1]                            Size (a.a.)  TRS position[2]  TRS sequence
R                      246-13,379 and 13,379-21,466           7,073        107              AGUAUAAAC-AAUAAUAAAUUUUA
S                      21,473-25,240                          1,255        21,463           CAACUAAACGAAC
BGI-PUP1               25,249-26,073                          274          25,237           CACAUAAACGAACUU
BGI-PUP2               25,670-26,134                          154          25,600           UGCAUCAACGCA-UGUAGAAUUAU
E                      26,098-26,328                          76           26,086           AGUGAGUACGAACUU
M                      26,379-27,044                          221          26,325           GGUCUAAACGAACUAACUAUUAUU
BGI-PUP3               27,055-27,246                          63           26,974           ACCGUAUUGGAAACUAUAAAUUAA
BGI-PUP4               27,254-27,622                          122          27,184           CCUCUAA-CUAA-GAAGAAUUA
BGI-PUP5[3]            27,760-27,879                          39           27,750           AGUCUAAACGAAC
BGI-PUP6[3]            27,845-28,099                          84           27,750           AGUCUAAACGAACAUGAAA-CUUC
BGI-PUP (GD-Ins29)[4]  27,760-27,863 + 29 nt + 27,864-28,099  122          27,750           AGUCUAAACGAAC
N                      28,101-29,369                          422          28,083           UAAAUAAACGAACAAAUUAAA
BGI-PUP7               28,111-28,407                          98           28,083           UAAAUAAACGAACAAAUUAAAAUG
[1] Nucleotide positions are numbered according to the complete sequence of BJ01. Each ORF includes both its start and stop codons. The corresponding position in GD01 is obtained by adding 3 nucleotides.

[2] Positions refer to the first nucleotide of the consensus leader core sequence (CUAAACGAAC) of the TRS (transcription-regulating sequence).

[3] These PUPs are equivalent to ORF 10 and ORF 11 in Tor2 (NC_004718) (3).

[4] BGI-PUP (GD-Ins29) is present only in the minor genotype, GD-Ins29.
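As a consistency check on Table 2, each protein length follows from its ORF coordinates: the coding length in nucleotides, divided by three, minus the stop codon. The Python sketch below assumes 1-based inclusive positions that include the stop codon (footnote 1). Reading the R position shared by its two segments (13,379) as a -1 ribosomal frameshift, in which one nucleotide is decoded twice, is our interpretation rather than a statement from the table, and the function name is hypothetical.

```python
# Minimal sketch: checks the "Size (a.a.)" column of Table 2 from the listed positions.
# Assumes 1-based inclusive segments and that the last codon is the stop codon.

def protein_length(segments, inserted_nt=0):
    """Amino-acid length of an ORF built from one or more coding segments."""
    nt = sum(end - start + 1 for start, end in segments) + inserted_nt
    if nt % 3:
        raise ValueError(f"{nt} nt is not a whole number of codons")
    return nt // 3 - 1  # the stop codon encodes no amino acid


ORFS = {
    # R: two segments sharing position 13,379 (read here as a -1 ribosomal frameshift)
    "R": ([(246, 13_379), (13_379, 21_466)], 0),
    "S": ([(21_473, 25_240)], 0),
    "M": ([(26_379, 27_044)], 0),
    "BGI-PUP5": ([(27_760, 27_879)], 0),
    "BGI-PUP6": ([(27_845, 28_099)], 0),
    "BGI-PUP (GD-Ins29)": ([(27_760, 27_863), (27_864, 28_099)], 29),
    "N": ([(28_101, 29_369)], 0),
}

for name, (segments, inserted) in ORFS.items():
    print(f"{name:20s}{protein_length(segments, inserted):6d} aa")
```

Removing the 29-nt insertion from the GD-Ins29 entry leaves 340 nt, which is not a whole number of codons, so the function raises ValueError for that case.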


Fig. 1

GD-Ins29 in the genome of Isolate GD01. Solid arrows indicate ORFs and gray arrows denote the BGI-PUPs. Open arrows highlight the region of interest. The position of the insertion and related annotations are given in parentheses.

An extra 29-nt segment (GD-Ins29) was unambiguously identified within the region where BGI-PUP5 and BGI-PUP6 overlap. The sequence was confirmed by 60 high-quality sequencing reads from 35 clones of a site-specific amplicon library constructed from RT-PCR products of this genomic region; 25 of these clones were sequenced from both ends. No other sequence variations were found in this amplicon library, although minor variants are occasionally seen in amplicon libraries from other genomic regions, owing to minor variation within the viral population and, to a much lesser extent, to aberrant RT-PCR products. We have also identified another SARS-CoV isolate carrying the same segment from a SARS patient in Guangdong Province. These results strongly suggest that we have found a novel, albeit minor, genotype of SARS-CoV rather than a sequencing anomaly.

In the GD-Ins29 sequence, the 29-nt segment is located after the second nucleotide of codon 35 of BGI-PUP5 (position 27,863), which is also the first nucleotide of codon 7 of BGI-PUP6. The resulting hypothetical protein is predicted to have 122 amino acids. The boundaries of the segment cannot be assigned uniquely: because it is uncertain whether a uridine belongs at its beginning or at its end, the 29-nt sequence could in theory be excised in two different ways, 5′-UCCUACUGGUUACCAACCUGAAUGGAAUA-3′ or 5′-CCUACUGGUUACCAACCUGAAUGGAAUAU-3′, neither of which affects the coding frame. No obvious sequence signatures were found within or around the sequence, apart from several short stretches of simple repeats with units 6 to 8 nucleotides long. The mechanism by which this sequence was deleted remains to be revealed.
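Two coordinate facts in this paragraph can be made explicit. Because 29 is not a multiple of three, the insertion shifts the downstream reading frame by two nucleotides; with the start positions in Table 2, that shift carries the BGI-PUP5 frame into the BGI-PUP6 frame, which is consistent with a single fused ORF. Separately, the two readings of the segment are circular shifts of each other, so they give the same genome whenever the reference base next to the breakpoint is a U. The Python sketch below checks both under those assumptions; the flanking sequences are hypothetical placeholders, not the real GD01 context.

```python
# Minimal sketch; coordinates are 1-based BJ01 positions from Table 2.
PUP5_START, PUP6_START = 27_760, 27_845
INSERT_AFTER, INSERT_LEN = 27_863, 29


def codon_and_base(pos, orf_start):
    """Codon number and base within that codon (both 1-based) of `pos` in an ORF."""
    offset = pos - orf_start
    return offset // 3 + 1, offset % 3 + 1


print(codon_and_base(INSERT_AFTER, PUP5_START))  # (35, 2): 2nd base of codon 35
print(codon_and_base(INSERT_AFTER, PUP6_START))  # (7, 1): 1st base of codon 7


def frames_coincide(x):
    """After the insertion, reference position x moves to x + 29; compare frames."""
    return (x + INSERT_LEN - PUP5_START) % 3 == (x - PUP6_START) % 3


print(all(frames_coincide(x) for x in range(INSERT_AFTER + 1, 28_100)))  # True

# The two readings of the segment are circular shifts of each other.
READING_A = "UCCUACUGGUUACCAACCUGAAUGGAAUA"
READING_B = "CCUACUGGUUACCAACCUGAAUGGAAUAU"
print(READING_B == READING_A[1:] + READING_A[0])  # True

# Hypothetical flanks: a U is placed immediately downstream of the breakpoint.
left, right = "GCA", "UGA"
print(left + READING_A + right == left + right[0] + READING_B + right[1:])  # True
```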

Because the deletion is large, we reasoned that host factors might have been involved. Searching the sequence against public human sequence databases, we found two potentially interesting 17-nt matches, from the center of the segment, to three human chromosomal locations. One (5′-GGUUACCAACCUGAAUG-3′) aligned to protein-coding sequences mapped to two chromosomal locations, 15q23 and 9q22; the other (5′-UGGUUACCAACCUGAAU-3′) matched a segment on chromosome 11 that overlaps a mammalian interspersed repeat (MIR) and is most likely non-protein-coding. Both protein-coding sequences belong to the acidic (leucine-rich) nuclear phosphoprotein 32 family (6, 7). Exhaustive database searches did not yield any other significant match in the human genomic sequence databases.
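The sketch below simply locates the two 17-nt host matches within the 29-nt segment (first reading, as given above); it does not reproduce the database search, and the labels are shorthand for the description in the text. Both matches sit near the middle of the segment and overlap each other by 16 nt.

```python
# Minimal sketch: locate the two 17-nt host matches inside the 29-nt segment.
SEGMENT = "UCCUACUGGUUACCAACCUGAAUGGAAUA"  # first reading given in the text

HOST_MATCHES = {
    "15q23 / 9q22 (coding)": "GGUUACCAACCUGAAUG",
    "chromosome 11 (MIR)": "UGGUUACCAACCUGAAU",
}

for label, kmer in HOST_MATCHES.items():
    start = SEGMENT.find(kmer)  # 0-based index, -1 if absent
    print(f"{label}: {len(kmer)} nt, segment positions {start + 1}-{start + len(kmer)}")
# 15q23 / 9q22 (coding): 17 nt, segment positions 8-24
# chromosome 11 (MIR): 17 nt, segment positions 7-23
```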

Because the virus propagates in the cytosol, only host sequences present there, such as processed transcripts, mRNAs and other functional RNAs, could interact directly with the host-dependent steps of the viral life cycle: replication of the viral genome via negative-sense RNA intermediates, transcription of the genes encoding the viral replicase and structural proteins, and, to a limited extent, translation, provided that the interfering sequences are able to anneal to viral RNA products. The 17-nt sequence that we found in human mRNAs is thought to be capable of annealing specifically to the negative-sense strand of the viral RNA while the virus replicates its genome and transcribes its replicase and structural-protein transcripts from negative-sense intermediates, thereby interfering with the viral life cycle by reducing the efficiency of both replication and transcription. A prerequisite for such interference is that the 17-nt sequence comes from a protein- or RNA-coding sequence that is present in infected human cells at reasonable abundance. It is tempting to speculate that the viral negative-sense RNA intermediates and host RNA species interacted during the propagation of SARS-CoV within a host, through intermolecular RNA-RNA recombination (8).
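The proposed interference depends on strand orientation: a host mRNA carrying the same 17 nt as the viral genomic (+) strand is complementary to the corresponding stretch of the (-) strand intermediate, not to the (+) strand itself. The Python sketch below checks this using exact Watson-Crick complementarity only. It ignores G-U wobble pairs, secondary structure and hybridization energetics, so it cannot say whether annealing actually happens in infected cells; the function names are hypothetical.

```python
# Minimal sketch: perfect Watson-Crick pairing only (no G-U wobble, no mismatches).
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A<->U, C<->G), read 5' to 3'."""
    return rna.translate(_COMPLEMENT)[::-1]


def anneals(probe: str, target: str) -> bool:
    """True if `probe` is perfectly complementary to a contiguous stretch of `target`."""
    return reverse_complement(probe) in target


HOST_17MER = "GGUUACCAACCUGAAUG"               # sense orientation, as in the host mRNA
VIRAL_PLUS = "UCCUACUGGUUACCAACCUGAAUGGAAUA"   # 29-nt segment, genomic (+) sense
VIRAL_MINUS = reverse_complement(VIRAL_PLUS)   # same region on the (-) strand intermediate

print(anneals(HOST_17MER, VIRAL_MINUS))  # True: host mRNA can pair with the (-) strand
print(anneals(HOST_17MER, VIRAL_PLUS))   # False: no pairing with the (+) strand itself
```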

A high number of substitutions were also found in Isolate GD01 in comparison with other SARS-CoV genomes

After careful sequence alignment, 137 substitutions were identified relative to the other 16 published SARS-CoV genome sequences, yielding a genome-wide substitution frequency of approximately 0.46% (137/29,725 nt; Table 3). Forty-five substitutions separate GD01 from BJ01 alone (Supplementary Table 1), 38 of which are unique to GD01. Special attention was paid to the mutation sites: multiple clones from amplicon libraries constructed from multiple independent RT-PCR products were sequenced. A substantial portion (70.4%, 100/142) of the substitutions was predicted to be non-synonymous mutations in the ORFs, including 14 in the PUPs. Of the 92 substitutions detected in the R (replicase) ORF, 70.7% (65/92) could lead to amino acid changes. Although the S (spike) protein has a lower ratio (13/22), the ratio is higher in the other structural proteins, M (4/4) and N (3/4), as well as in BGI-PUP2 (4/5), BGI-PUP3 (2/2), and BGI-PUP (GD-Ins29) (2/2). Even though a large fraction of the substitutions lie in the ORF for the R protein, its substitution frequency (0.43%) is among the lowest of all the defined ORFs given its large size (21,222 nt); only N (0.32%) is lower, and E is comparable (0.43%). The mutation rate of SARS-CoV is quite high considering the short time since the virus was identified in human hosts. A high mutation rate is consistent with the high error rate of RNA replication (9), and the high fraction of non-synonymous substitutions also suggests that selective pressure from the host may have acted on the virus, although the current data set provides very little statistical power to support this conjecture.
An important observation from our data (including tens of thousands of sequencing traces) is the evident absence of indels (insertions and deletions), suggesting that the viral polymerase is prone to replication errors that escape proofreading, yet remains relatively accurate in the template movement carried out by its RdRp (RNA-dependent RNA polymerase) activity.

Table 3

Summarized Substitutions in 17 Isolates of SARS-CoV

ORF                 Size (nt)  No. of S[1]   Substituted sites (%)  No. of N-Syn[1]  N-Syn (%)
R                   21,222     92 (26)[2]    0.43                   65 (16)          71
S                   3,768      22 (7)        0.58                   13 (5)           59
BGI-PUP1            825        9 (3)         1.09                   6 (2)            67
BGI-PUP2            465        5 (3)         1.08                   4 (2)            80
E                   231        1 (1)         0.43                   1 (1)            100
M                   666        4             0.60                   4                100
BGI-PUP3            192        2             1.04                   2                100
N                   1,269      4             0.32                   3                75
BGI-PUP (GD-Ins29)  369        2 (1)         0.54                   2 (1)            100
Non-ORF             -          1             -                      -                -
Total               29,725     142 (41)[3]   0.46                   100
[1] S: substitution; N-Syn: non-synonymous substitution.

[2] Numbers in parentheses indicate substitutions contributed solely by Isolate GD01.

[3] A single substitution at a position where two ORFs overlap is counted twice. If each such event were counted once, the total would be 137, and the number contributed by Isolate GD01 would be 38.
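The derived columns of Table 3 follow from its raw counts. The Python sketch below recomputes them; following footnote 3 and the text, it uses the 137 distinct substitution events for the genome-wide frequency and the 142 counted substitutions for the non-synonymous fraction. The variable names are ours.

```python
# Minimal sketch: recompute the derived columns of Table 3 from its raw counts.
# (ORF, size in nt, substitutions, non-synonymous substitutions)
ROWS = [
    ("R", 21_222, 92, 65),
    ("S", 3_768, 22, 13),
    ("BGI-PUP1", 825, 9, 6),
    ("BGI-PUP2", 465, 5, 4),
    ("E", 231, 1, 1),
    ("M", 666, 4, 4),
    ("BGI-PUP3", 192, 2, 2),
    ("N", 1_269, 4, 3),
    ("BGI-PUP (GD-Ins29)", 369, 2, 2),
]
NON_ORF_SUBS = 1        # "Non-ORF" row
GENOME_NT = 29_725      # "Total" row size (BJ01 length)
DISTINCT_EVENTS = 137   # footnote 3: overlap substitutions counted once

for name, size, subs, nsyn in ROWS:
    print(f"{name:20s}{100 * subs / size:6.2f}%{100 * nsyn / subs:6.0f}%")

total_counted = sum(r[2] for r in ROWS) + NON_ORF_SUBS
total_nsyn = sum(r[3] for r in ROWS)
lowest = min(ROWS, key=lambda r: r[2] / r[1])

print("substitutions counted:", total_counted)                     # 142
print(f"genome-wide: {100 * DISTINCT_EVENTS / GENOME_NT:.2f}%")    # 0.46%
print(f"non-synonymous: {100 * total_nsyn / total_counted:.1f}%")  # 70.4%
print("lowest substitution density:", lowest[0])                   # N
```

Sorting the same rows by substitution density places N (0.32%) below R and E (both 0.43%).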


Based on comparative analyses of the complete genome sequences of the 17 SARS-CoV isolates from patients identified in Canada, the USA, Singapore, and China (Beijing, Zhejiang, Guangdong, Hong Kong, and Taiwan), we constructed an unrooted phylogenetic tree of SARS-CoV (Figure 2). The tree places Isolate GD01 in a group composed of BJ01, 02, 03 and 04, two of the Hong Kong isolates (CUHK), and the US isolate (Urbani). This grouping suggests a possible transmission path among Guangdong, Hong Kong, Beijing and the USA. The clustering of two isolates (TOR2 and HKU) in another group is consistent with the report that the Toronto patient from whom TOR2 was obtained had traveled from Hong Kong (3). In addition, the five isolates from Singapore appear to form another group. The two more recently reported isolates, from Taiwan and Zhejiang, China, are more distant from the rest. Additional molecular epidemiological data, based on genomic sequences of viral isolates from different regions, are essential to elucidate the possible routes of global spread and mutation during transmission among humans.

Fig. 2

Phylogenetic analysis of the 17 SARS-CoV isolates based on complete genomes. The rectangular cladogram was generated with ClustalW 1.81, and bootstrap values were derived from 1,000 replicates. Sources and abbreviations of the sequences are given in Table 1.
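Bootstrap values such as those in Fig. 2 come from resampling alignment columns with replacement and asking how often a grouping is recovered. ClustalW builds its trees by neighbor-joining, which the Python sketch below does not reimplement; as a simplified stand-in for clade support it counts how often two sequences are mutual nearest neighbors by p-distance. The function names and the toy alignment are hypothetical and are not SARS-CoV data.

```python
import random


def p_distance(a: str, b: str, columns) -> float:
    """Fraction of the sampled alignment columns at which sequences a and b differ."""
    return sum(a[c] != b[c] for c in columns) / len(columns)


def nearest(name, seqs, columns):
    """Closest other sequence to `name`; ties go to the earliest in insertion order."""
    others = [n for n in seqs if n != name]
    return min(others, key=lambda n: p_distance(seqs[name], seqs[n], columns))


def pair_support(seqs, x, y, replicates=1000, seed=1):
    """Share of bootstrap replicates in which x and y are mutual nearest neighbors."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    hits = 0
    for _ in range(replicates):
        # Nonparametric bootstrap: resample alignment columns with replacement.
        columns = [rng.randrange(length) for _ in range(length)]
        if nearest(x, seqs, columns) == y and nearest(y, seqs, columns) == x:
            hits += 1
    return hits / replicates


# Toy alignment (hypothetical sequences, not SARS-CoV data).
toy = {
    "A": "ACGUACGUACGU",
    "B": "ACGUACGUACGU",
    "C": "ACGAACGUUCGA",
    "D": "UCGAAGGUUCGA",
}
print(pair_support(toy, "A", "B"))  # 1.0: A and B are identical
print(pair_support(toy, "C", "D", replicates=200))
```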

The origin of SARS-CoV and possible multiple introductions into humans

SARS-CoV has a genome organization, especially a gene order, nearly identical to that of other members of the Coronaviridae found in humans and other animals. The genome sequences of the various SARS-CoV isolates are also almost identical to one another, differing by only a few dozen nucleotide substitutions per genome. We therefore strongly favor the hypothesis that SARS-CoV originated in non-human animals in unknown wild reservoirs, whether virulent or latent in those hosts, and recently crossed into humans through an infrequently established route of contact. It is of great importance that such non-human hosts be promptly identified to prevent further invasion of human and other animal populations. The discovery of the GD-Ins29 genotype may be very useful in tracing the origin of SARS-CoV. The fact that we have identified only a couple of such isolates, in Guangdong Province, supports the scenario that GD-Ins29 is the original form transmitted from non-human hosts to humans, which later gave rise to a new genotype through deletion of an RNA segment that may have hindered its propagation. We therefore predict two likely outcomes when surveying animal reservoirs. First, the non-human reservoir of SARS-CoV may harbor genome sequences close to GD-Ins29 rather than the deleted form, the major genotype most widespread among human hosts. Second, variants of both the major and the minor genotypes could be found in the same animal reservoir owing to multiple transmissions of the virus back and forth between its human and non-human hosts.

Alternatively, at least two genotypes of SARS-CoV might have been circulating in non-human host populations and infected human hosts in separate events. It is unlikely, but possible, that the virus infected humans multiple times in the recent past and that the two genotypes co-exist in the non-human hosts but are not traceable in current human populations. We cannot rule out another possibility, that viruses of the major genotype acquired a 29-nt sequence and became the GD-Ins29 genotype during propagation in human hosts; however, such an event is expected to be extremely rare, because insertions or deletions are often lethal to viruses with compact genomes.

Genomic and genetic knowledge can provide an extraordinary trove of information about the pathogenesis and virulence of SARS-CoV. More sequence data would significantly expand our knowledge of the etiology and evolution of the virus and are essential for developing new approaches to diagnostics, vaccine development, and effective therapy for SARS. High-quality sequence data and comprehensive analysis, together with more experimental evidence, are still urgently needed to understand such a potent infectious agent.

Materials and Methods

We isolated GD01 from a 54-year-old female patient. She was suspected to have been infected during her hospitalization through indirect contact with one of the “superspreaders”, who stayed in the same hospital. She was one of the SARS cases with a known transmission path connecting to the “Index Cases” identified in Guangdong Province. The virus was isolated from autopsied lung tissue of the patient and propagated in Vero-E6 cell culture. Isolates of the same genotype were also obtained from other SARS patients in Guangdong Province; details of their genomic sequences will be reported elsewhere.

Virions were prepared from Vero-E6 cell culture. Aliquots of the RT-PCR products from the viral RNA were sequenced directly, and the remainder was cloned into a plasmid vector (amplicon library). Multiple clones from each amplicon library were then sequenced in both directions to confirm the results obtained directly from the PCR products. High-quality sequences were assembled with the Phred-Phrap-Consed sequence assembly package (10). Consensus sequences from different amplicon-derived clones were used, and minor variations among clones were ignored in the consensus assembly. A total of 75 amplicons and 2,718 sequencing reads were generated (1,538,993 bp of raw data), equivalent to approximately 52-fold coverage of the viral genome. A complete sequence of 29,757 nt with an overall error rate of 0.0016% was obtained. The sequence of Isolate GD01 is available in GenBank (Accession No. AY278489).
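The coverage figure follows from the raw data volume divided by the genome length. The Python sketch below also expresses the reported consensus error rate on the Phred scale used by phred (Q = -10 log10 P) and as an expected number of errors per genome; those two conversions are our illustration, not values reported here.

```python
import math

RAW_BP = 1_538_993          # total raw sequence data (bp)
GENOME_NT = 29_757          # assembled GD01 genome length
ERROR_RATE = 0.0016 / 100   # 0.0016% as a per-base probability

coverage = RAW_BP / GENOME_NT
phred_q = -10 * math.log10(ERROR_RATE)  # Phred scale: Q = -10 * log10(P_error)
expected_errors = ERROR_RATE * GENOME_NT

print(f"coverage ~ {coverage:.1f}-fold")                      # coverage ~ 51.7-fold
print(f"Phred-equivalent quality ~ Q{phred_q:.0f}")           # Q48
print(f"expected errors per genome ~ {expected_errors:.2f}")  # 0.48
```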

Acknowledgements

We thank the Ministry of Science and Technology of China, the Chinese Academy of Sciences, the National Natural Science Foundation of China, and the Chinese Academy of Military Medical Sciences for financial support. We thank the Biopharmaceutical Center of Sun Yat-sen University for technical support. We are indebted to our collaborators, the clinicians and nurses from Peking Union Medical College Hospital, the National Center of Disease Control of China, the Provincial Government of Zhejiang, and the Municipal Governments of Beijing and Hangzhou, as well as the patients and their families.

References

1. Peiris J.S. et al. Coronavirus as a possible cause of severe acute respiratory syndrome. Lancet 2003; 361: 1319–1325.

2. Rota P.A. et al. Characterization of a novel coronavirus associated with severe acute respiratory syndrome. Science 2003; http://www.sciencemag.org/cgi/rapidpdf/1085952v1.pdf.

3. Marra M.A. et al. The genome sequence of SARS-associated coronavirus. Science 2003; http://www.sciencemag.org/cgi/rapidpdf/1085953v1.pdf.

4. Ruan Y.J. et al. Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection. Lancet 2003; 361: 1779–1785.

5. Qin E.D. et al. A complete sequence and comparative analysis of a SARS-associated virus (Isolate BJ01). Chin. Sci. Bull. 2003; 48: 941–948.

6. Zhu L. et al. Cloning and characterization of a new silver-stainable protein SSP29, a member of the LRR family. Biochem. Mol. Biol. Int. 1997; 42: 927–935.

7. Mencinger M. et al. Expression analysis and chromosomal mapping of a novel human gene, APRIL, encoding an acidic protein rich in leucines. Biochim. Biophys. Acta 1998; 1395: 176–180.

8. Domingo E., Holland J.J. RNA virus mutations and fitness for survival. Annu. Rev. Microbiol. 1997; 51: 151–178.

9. Uchil P.D., Satchidanandam V. Characterization of RNA synthesis, replication mechanism, and in vitro RNA-dependent RNA polymerase activity of Japanese encephalitis virus. Virology 2003; 307: 358–371.

10. Ewing B., Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998; 8: 186–194.

Author notes

These authors contributed equally to this work.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 license (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial use of the work as published, without adaptation or alteration provided the work is fully attributed. For commercial re-use, please contact journals.permissions@oup.com