waviCGH: a web application for the analysis and visualization of genomic copy number alterations

Abstract

waviCGH is a versatile web server for the analysis and comparison of genomic copy number alterations in multiple samples from any species. waviCGH processes data generated by high density SNP-arrays, array-CGH or copy-number calls generated by any technique. waviCGH includes methods for pre-processing of the data, segmentation, calling of gains and losses, and minimal common regions determination over a set of experiments. The server is a user-friendly interface to the analytical methods, with emphasis on results visualization in a genomic context. Analysis tools are introduced to the user as the different steps to follow in an experimental protocol. All the analysis steps generate high quality images and tables ready to be imported into spreadsheet programs. Additionally, for human, mouse and rat, altered regions are represented in a biological context by mapping them into chromosomes in an integrated cytogenetic browser. waviCGH is available at http://wavi.bioinfo.cnio.es.

INTRODUCTION

Classic comparative genomic hybridization (CGH) techniques were developed to compare the copy number of differentially labeled test and normal reference DNAs using fluorescence in situ hybridization (FISH) in metaphase chromosomes (1). Later on, the technique was improved (2,3) using microarrays (aCGH). Nowadays, multiple microarray platforms exist to directly measure genomic DNA copy number [see (4) for a review], including the recent adaptation of SNP arrays for the detection of copy number variants (CNVs) (5). aCGH and SNP arrays have become the standard techniques for the detection of chromosomal copy number alterations at high resolution in many laboratories. However, the statistical analysis of genomic copy number data is not straightforward for researchers without bioinformatics expertise.

waviCGH is a web server application that helps researchers perform all the steps in the analysis of genomic copy number data obtained with microarrays:

Normalization.
Pre-processing.
Segmentation.
Calling of gains and losses.
Minimal common regions (MCR) determination over a set of experiments.

A number of related public systems for the analysis of this type of data are available (6–9), including several web servers (10–16). Most of the existing applications cover only some of the analytical steps and require the user to input data at one specific stage of processing, so the analysis cannot be picked up at an earlier or later step. The major improvements offered by waviCGH are: a flexible and user-friendly interface, state-of-the-art methods, and organization of the methods and their results into multiple-step protocols that can be accessed at different stages of the analysis process. In addition, waviCGH accepts not only log-ratios, which are the usual form of starting input, but also data already called as gained/lost/no-change. Copy number calls are the natural form to represent final results, as calls have a clear biological interpretation and can be compared between different experiments and platforms. Moreover, for human, mouse and rat, all individual altered segments and MCRs can be easily explored in a chromosomal context by using the integrated cytogenetic browser. Finally, waviCGH can produce summary karyotypes with the results, which is the most common way of presenting genomic copy number results in publications.

DESCRIPTION OF THE TOOL

waviCGH has been built as a logical multi-step but simple protocol to facilitate the usability of the most common copy number analytical procedures. A protocol, or workflow, is simply a series of analysis steps. Users can choose between two protocol types: log-ratios and copy-numbers. When a protocol is selected and the input data is uploaded into waviCGH, a new project is created and the URL of the project is provided. The results obtained in a project can be accessed for 5 days.

A flow diagram of all analysis steps in waviCGH showing the commonalities and differences of the protocols is presented in Figure 1. Each protocol automatically directs the user to the adequate set of steps, according to the nature and format of the initial data. In both protocols users can analyze dataset files as big as 400 Mb. Users with datasets exceeding this limit are invited to contact us.

The log-ratios protocol includes all the possible analysis steps. The log-ratios workflow begins with a simple table of log-ratios that reflect the difference in intensity between two samples hybridized in a two-color array (aCGH, like Agilent or Nimblegen) or hybridized in independent arrays (SNP-arrays, like Affymetrix or Illumina). The log-ratios of intensities need to be normalized, preprocessed and segmented before calling of chromosomal gains and losses; all these steps are performed by waviCGH in the log-ratios protocol.
The copy-numbers protocol takes called probes that were already segmented and translated into copy number alteration states (–1 for losses, 1 for gains and 0 for normal). waviCGH searches for MCRs altered across the different samples and displays altered regions and MCRs on chromosomes, providing links to visualize the corresponding genomic regions in Ensembl (17). This type of protocol can be especially useful for researchers who have already analyzed their data and need to visualize their results and compare copy numbers among multiple samples.

Figure 1.

aCGH analysis protocols and methods as implemented in waviCGH.


The interface of waviCGH is divided into four major sections: the Control Panel, the Results Window, the Protocol Sidebar and the Help frame. The Control Panel is a button bar where users can control the flow of the analysis protocol. The Results Window displays the results of each analysis step in a different tab; all numerical results can be downloaded as text files for easy perusal or for import into other applications. The Protocol Sidebar contains option boxes that correspond to the analysis steps of that protocol, and each one shows the parameters specific to that step. From each analysis step box, users can directly access the help section that explains both the methods and the results. In addition, a complete manual with tutorials is available at http://wavi.bioinfo.cnio.es/waviCGH_guide.pdf.

ANALYSIS METHODS

The methods available in waviCGH protocols are schematically summarized in Figure 1. As mentioned above, the protocol depends on the type of input data, which in turn defines the available methods. We have optimized many of the methods to increase their speed and performance (see ‘Implementation details’ section).

Genomic coordinates

After data upload, waviCGH can update the genomic positions of microarray probe annotations from old NCBI assemblies of the human and mouse genomes to the current reference genome versions (GRCh37 and NCBIM37, respectively).

Normalization and preprocessing

Log-ratios can be median-normalized, thus setting the log-ratio median at zero (18). Averaging duplicated probes and missing value imputation are done with aCGH (Fridlyand and Dimitrov, www.bioconductor.org) and snapCGH (Smith, Marioni, McKinney, Hardcastle and Thorne; http://www.bioconductor.org). Genomic waves can be adjusted using the method of Diskin et al. (19).
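As a minimal sketch of the median normalization described above (not the waviCGH implementation, which runs in R), the following Python function shifts the log-ratios of one sample so that their median is zero. The function name `median_normalize` and the use of `None` for missing values are assumptions made for illustration only.

```python
from statistics import median


def median_normalize(log_ratios):
    """Shift one sample's log-ratios so that their median is zero.

    Missing values (None, an assumed convention) are ignored when the
    median is computed and are passed through unchanged.
    """
    observed = [x for x in log_ratios if x is not None]
    if not observed:
        return list(log_ratios)
    center = median(observed)
    return [None if x is None else x - center for x in log_ratios]


# Hypothetical log-ratios for one sample, with one missing value.
sample = [0.30, 0.10, None, 0.20, 0.90]
normalized = median_normalize(sample)
```

Applying the function to each sample separately puts all samples on a common centre, which is what the boxplots before and after normalization are meant to show.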

Segmentation

We define ‘segmentation’ as the process of smoothing the observed/normalized log-ratios, so probes predicted to be in the same chromosomal segment have the same value. Translating these smoothed ratios into gain/no-change/loss states is a different step that is generally named ‘calling’ (of gains and losses). Segmentation methods available are: HaarSeg (20), DNAcopy (21), GLAD (22), wavelets (23), HMM (24), BioHMM (25) and CGHseg (26).
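To make the definition of segmentation concrete, the sketch below smooths a vector of log-ratios so that all probes in the same segment share one value. It assumes the breakpoints are already known; finding them is the actual task of methods such as DNAcopy or HaarSeg, and none of those algorithms is reproduced here. The function name and the choice of the segment mean as the smoothed value are illustrative assumptions.

```python
from statistics import mean


def smooth_by_segments(log_ratios, breakpoints):
    """Replace each probe's log-ratio by the mean of its segment.

    `breakpoints` lists the probe indices at which a new segment starts
    (index 0 is implied). Detecting breakpoints is NOT attempted here.
    """
    starts = sorted(set([0] + list(breakpoints)))
    ends = starts[1:] + [len(log_ratios)]
    smoothed = []
    for start, end in zip(starts, ends):
        segment = log_ratios[start:end]
        if not segment:
            continue  # breakpoint beyond the last probe
        level = mean(segment)
        smoothed.extend([level] * len(segment))
    return smoothed


# Hypothetical chromosome with a gained segment at probes 3 to 5.
ratios = [0.1, -0.1, 0.0, 0.9, 1.1, 1.0, 0.05, -0.05]
smoothed = smooth_by_segments(ratios, breakpoints=[3, 6])
```

The output still holds continuous values; turning them into gain/no-change/loss states is the separate calling step.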

Calling

After segmentation, users can select between two alternative calling methods: segmentation-based or probability-based. Segmentation-based calling is done after DNAcopy, HMM and BioHMM using the mergeLevels algorithm (27). In the case of GLAD, we use its own region assignment algorithm (22). For wavelets, CGHseg and HaarSeg, users can follow the recommendation of the HaarSeg authors and apply a median absolute deviation (MAD) cut-off of their choice (20). Probability-based calling is done using the CGHcall package (28). Both strategies give calls in a numerical format of 0/–1/1 (and optionally 2 for amplifications, in the case of probability-based calling), with CGHcall additionally providing probabilities.
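The MAD cut-off idea can be sketched as follows. This is an illustration under stated assumptions, not the procedure used by waviCGH or HaarSeg: here the noise scale is taken as the MAD of the residuals (observed minus segmented log-ratios), and a segment is called gained or lost when its level exceeds a user-chosen multiple of that scale. The names `mad`, `call_states` and `cutoff` are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation around the median."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


def call_states(segmented, noise_scale, cutoff=3.0):
    """Translate segment levels into -1 (loss), 0 (no change) or 1 (gain).

    A probe is called when its segmented value lies more than
    `cutoff * noise_scale` away from zero (assumed rule).
    """
    threshold = cutoff * noise_scale
    calls = []
    for level in segmented:
        if level > threshold:
            calls.append(1)
        elif level < -threshold:
            calls.append(-1)
        else:
            calls.append(0)
    return calls


# Hypothetical observed and segmented log-ratios for ten probes.
observed = [0.05, -0.04, 0.02, 0.61, 0.58, 0.63, -0.55, -0.60, 0.01, -0.03]
segmented = [0.01, 0.01, 0.01, 0.61, 0.61, 0.61, -0.575, -0.575, -0.01, -0.01]
noise = mad(o - s for o, s in zip(observed, segmented))
calls = call_states(segmented, noise, cutoff=3.0)
```

A larger `cutoff` makes calling more conservative; the appropriate value depends on the platform and on sample quality.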

MCRs

Two different strategies can be used to find MCRs: SuperSORI and Permutations. SuperSORI is a fast method that detects all MCRs shared by at least two individuals, also called smallest regions of imbalance (SORI) (29,30). SuperSORI performs a curation of the calls to generate consistent segments before searching for MCRs. First, it filters out segments of a given number of probes and joins segments separated by gaps of less than a given size (in base pairs or number of probes). Then, MCRs are found by obtaining the intersection of the curated segments. The Permutations method is a user-friendly implementation similar to previous approaches (31,32), but with a different permutation scheme designed to deal with high density array data (see Supplementary Data for details). Briefly, it computes for each probe a P-value that tests the significance of the alteration of that probe across the set of samples. This P-value is based on a permutation test that assumes that the alterations found are randomly located in the genome. Then, consecutive probes with P-values lower than a cut-off are merged into a common region.
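The general logic of a per-probe permutation test followed by merging of significant probes can be sketched as below. This is a deliberately naive illustration: each sample's calls are shuffled independently across probes, which is an assumed null model and not the permutation scheme of waviCGH (described in its Supplementary Data). No multiple-testing adjustment is applied, and all function and parameter names are hypothetical.

```python
import random


def permutation_pvalues(calls, n_perm=200, seed=0):
    """Per-probe P-values for recurrent alteration across samples.

    `calls` is a list of samples, each a list of 0/1 flags per probe
    (1 = altered). Under the assumed null, alterations are placed at
    random positions by shuffling each sample independently.
    """
    rng = random.Random(seed)
    n_probes = len(calls[0])
    observed = [sum(sample[i] for sample in calls) for i in range(n_probes)]
    exceed = [0] * n_probes
    for _ in range(n_perm):
        shuffled = []
        for sample in calls:
            row = list(sample)  # copy so the input is not modified
            rng.shuffle(row)
            shuffled.append(row)
        for i in range(n_probes):
            if sum(row[i] for row in shuffled) >= observed[i]:
                exceed[i] += 1
    # Add-one correction keeps P-values strictly positive.
    return [(count + 1) / (n_perm + 1) for count in exceed]


def merge_significant(pvalues, cutoff=0.05):
    """Merge runs of consecutive probes with P < cutoff.

    Returns (first_index, last_index) pairs, both inclusive.
    """
    regions, start = [], None
    for i, p in enumerate(pvalues):
        if p < cutoff and start is None:
            start = i
        elif p >= cutoff and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(pvalues) - 1))
    return regions


# Hypothetical data: 8 samples, 60 probes, probes 20-24 altered in all
# samples, plus one private alteration at probe 40 in the first sample.
samples = [[1 if 20 <= i <= 24 else 0 for i in range(60)] for _ in range(8)]
samples[0][40] = 1
pvalues = permutation_pvalues(samples)
regions = merge_significant(pvalues, cutoff=0.05)
```

In this toy data set only the block shared by all samples is reported as a common region; the private alteration at probe 40 is compatible with random placement and is not.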

Cytogenetic browser

MCRs and individually gained and lost regions in human, mouse or rat genomes are finally sent to the Cytogenetic Browser. The aim of this tool is to show all cytogenetic alterations obtained from the aCGH analysis in a simple and manageable way, so users can easily compare results among samples. If the user wants to explore any region in more detail, a simple click will link directly to the Ensembl genome browser (17), which will display a genomic window corresponding to that region. Finally, a summary karyotype/ideogram image with results for all chromosomes can be automatically generated (Figure 2).

Figure 2.

Example results from a typical analysis. Log-ratios corresponding to the eight samples from Kidd et al. (33) were analyzed. (A) Boxplots of log-ratios before (left) and after (right) median normalization. (B) Probability-based calling results for chromosome 21. Black points: normalized log-ratios. Blue lines: DNAcopy-segmented log-ratios. Red bars: loss probability. Green bars: gain probability (inverted scale). (C) Karyotype including panels with gains (green, above chromosomes) and losses (red, below chromosomes).


EXAMPLE PROJECTS

We have included pre-run example projects, which present all results for selected datasets. waviCGH users can also download these datasets in waviCGH format and follow the tutorials included in the guide to learn how we selected appropriate parameters.

We use the aCGH data from Kidd et al. (33) to illustrate waviCGH functionality. Kidd and co-workers used Agilent aCGH 244K custom arrays for the validation of 512 previously discovered copy number variant (CNV) regions in the genomes of eight healthy individuals of diverse geographic ancestry. The complete results of our analysis are available on the waviCGH web site as an example project at http://wavi.bioinfo.cnio.es/?ProjectNumber=EXAMPLE/ProjectLogRatiosKIDD.

We downloaded the raw data from GEO (accession GSE10008) and selected samples with the reference in Cy3 (green). Log-ratios were calculated using processed red and green signals, and the log-ratios were uploaded to waviCGH (Figure 2). As boxplots showed differences between the log-ratio distributions of the samples, we median-normalized them (Figure 2A). We then ran the Preprocessing step: 2815 duplicated probes were averaged, probes outside autosomes were removed (1276) and missing values were imputed. The remaining 208 428 probe log-ratios were segmented using DNAcopy. We called gains and losses with the probability-based method (CGHcall), obtaining an average of 470 gains and 210 losses per sample. Figure 2B shows segmentation and calling results for the eight samples in chromosome 21. We then looked for significant MCRs using the Permutations method. We found 284 MCRs with adjusted P-value lower than 0.05, with sizes ranging between 5 bp and 1.66 Mb (median 2.18 kb), and frequencies ranging between 0.375 and 1 (median 0.5). These results are in agreement with those reported by Kidd et al. (33), who observed that 50% of the discovered CNVs were present in more than one individual. Finally, the results were displayed in the Cytogenetic Browser, where we generated a karyotype showing all gains and losses detected in the eight individuals (Figure 2C).

CONCLUDING REMARKS

The goal of waviCGH is to facilitate the aCGH analytical process for researchers by meeting their copy number variation analysis needs in an easy-to-use web server application. waviCGH provides a fundamentally different approach to aCGH analysis. We have implemented a workflow that can be accessed at different steps depending on the type of input data. The input type determines the analysis protocol, and both protocols converge at the last two analysis steps: MCR finding and results visualization in the Cytogenetic Browser. All results can be easily explored using the integrated Cytogenetic Browser, which facilitates their interpretation by mapping all regions onto human, mouse or rat chromosomes.

Usage of waviCGH is free and open to all, and there is no login requirement.

IMPLEMENTATION DETAILS

waviCGH has been optimized for use with the multi-platform browser Mozilla Firefox v3.x, and it works with Mac Safari v4, Google Chrome v4 and Opera v10. Microsoft Internet Explorer is not supported. The waviCGH client side is an Ajax rich user interface implemented with HTML, CSS and JavaScript, using mainly YUI (Yahoo! User Interface) v2.6. The server side is implemented in Perl 5.8.8 and Python 2.4. The back-end analysis logic runs on a cluster of 30 nodes, each with two dual-core AMD Opteron CPUs. We use R 2.10, and segmentation methods are parallelized via Rmpi (Yu, cran.r-project.org), snow (Tierney, Rossini, Li and Sevcikova, cran.r-project.org) and snowfall (Knaus, cran.r-project.org). For a detailed description of the implementation of the segmentation methods and the default parameters we used, please see the Appendix of the guide (http://wavi.bioinfo.cnio.es/waviCGH_guide.pdf). To allow the handling of very large data sets in R, and to minimize sending large objects via MPI, we use the ff R library (Adler, Glaeser, Nenadic, Oehlschlaegel and Zucchini, cran.r-project.org) for memory-efficient storage of large data on disk and fast access functions. CGHcall (28) is run by chromosome in serialized fashion using the Sun Grid Engine batch-queuing system. SuperSORI was written in Perl. Genomic waves adjustment is done using Perl scripts kindly provided by the authors (19). Cytoband-region mapping in the Cytogenetic Browser is done using the Ensembl API (34).
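The idea of splitting the work by chromosome can be illustrated with a small Python sketch. This is only an analogy under stated assumptions: waviCGH itself parallelizes R code with Rmpi, snow and snowfall and uses Sun Grid Engine, none of which is shown here. The sketch uses a thread pool from the Python standard library, and the per-chromosome task (`analyse_chromosome`, which merely returns a median) is a hypothetical placeholder for a real segmentation or calling step.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def group_by_chromosome(probes):
    """Group (chromosome, position, log_ratio) tuples by chromosome.

    Probes within each chromosome are sorted by position.
    """
    groups = defaultdict(list)
    for chrom, pos, value in probes:
        groups[chrom].append((pos, value))
    return {chrom: sorted(items) for chrom, items in groups.items()}


def analyse_chromosome(items):
    """Placeholder task: return the median log-ratio of one chromosome."""
    return median(value for _, value in items)


def run_per_chromosome(probes, max_workers=4):
    """Run the placeholder task independently on every chromosome."""
    groups = group_by_chromosome(probes)
    chroms = sorted(groups)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(analyse_chromosome,
                                (groups[c] for c in chroms)))
    return dict(zip(chroms, results))


# Hypothetical probes on two chromosomes.
probes = [("chr1", 100, 0.1), ("chr2", 50, -0.4), ("chr1", 300, 0.3),
          ("chr2", 10, -0.6), ("chr1", 200, 0.2)]
summary = run_per_chromosome(probes)
```

Because chromosomes are analyzed independently, only the data of one chromosome needs to be sent to each worker, which is in the same spirit as minimizing the size of objects passed between processes.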

FUNDING

National Institute for Bioinformatics (www.inab.org), a platform of ‘Genoma España’; Fundacion de Investigacion Medica Mutua Madrileña (partial); Spanish Ministry of Science and Innovation (MICINN) (Project BIO2009-12458 and PTA2009-2853-I partial funding to A.C.). Funding for open access charge: Project BIO2009-12458 of MICINN.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank all CNIO researchers from the Human Cancer Genetics and Molecular Pathology Programs who tested the application; we appreciate their suggestions to improve the user interface. We thank Eduardo Andres for his help deploying the web server. We are grateful to Kay Wang for all his help with the genomic waves adjustment method. We also thank Anaïs Baudot, Gonzalo Gomez and Alfonso Valencia for their helpful suggestions after reading the manuscript.

REFERENCES

1. Kallioniemi, Sudar, Rutovitz, Gray, Waldman, Pinkel. Comparative genomic hybridization: a rapid new method for detecting and mapping DNA amplification in tumors. Semin Cancer Biol. 1993.

2. Pinkel, Segraves, Sudar, Clark, Poole, Kowbel, Collins, Kuo, Chen, Zhai, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat. Genet. 1998 (pg. 207-211).

3. Solinas-Toldo, Lampel, Stilgenbauer, Nickolenko, Benner, Dohner, Cremer, Lichter. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer 1997 (pg. 399-407).

4. Carter. Methods and strategies for analyzing copy number variation using DNA microarrays. Nat. Genet. 2007 (pg. S16-S21).

5. Zhao, Paez, Chin, Jänne, Chen, Girard, Minna, Christiani, Leo, et al. An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res. 2004 (pg. 3060-3071).

6. Chen, Liu, Chao. CNVDetector: locating copy number variations using array CGH data. Bioinformatics 2008 (pg. 2773-2775).

7. Margolin, Greshock, Naylor, Mosse, Maris, Bignell, Saeed, Quackenbush, Weber. CGHAnalyzer: a stand-alone software package for cancer genome analysis using array-based DNA copy number data. Bioinformatics 2005 (pg. 3308-3311).

8. Chen, Erdogan, Ropers, Lenzner, Ullmann. CGHPRO – a comprehensive data analysis tool for array CGH. BMC Bioinformatics 2005.

9. Myers, Chen, Troyanskaya. Visualization-based discovery and analysis of genomic aberrations in microarray data. BMC Bioinformatics 2005, pg. 146.

10. Conde, Montaner, Burguet-Castell, Tárraga, Medina, Al-Shahrour, Dopazo. ISACGH: a web-based environment for the analysis of array CGH and gene expression which includes functional profiling. Nucleic Acids Res. 2007 (pg. W81-W85).

11. Díaz-Uriarte, Rueda. ADaCGH: a parallelized web-based application and R package for the analysis of aCGH data. PLoS ONE 2007, pg. e737.

12. Frankenberger, Harmon, Church, Gangi, Munroe, Urzúa. WebaCGH: an interactive online tool for the analysis and display of array comparative genomic hybridisation data. Appl. Bioinformatics 2006 (pg. 125-130).

13. Kim, Nam, Lee, Park, Yoo, Lee, Chung. ArrayCyGHt: a web application for analysis and visualization of array-CGH data. Bioinformatics 2005 (pg. 2554-2555).

14. Lai, Choudhary, Park. CGHweb: a tool for comparing DNA copy number segmentations from multiple algorithms. Bioinformatics 2008 (pg. 1014-1015).

15. Liva, Hupé, Neuvial, Brito, Viara, La Rosa, Barillot. CAPweb: a bioinformatics CGH array Analysis Platform. Nucleic Acids Res. 2006 (pg. W477-W481).

16. La Rosa, Viara, Hupé, Pierron, Liva, Neuvial, Brito, Lair, Servant, Robine, et al. VAMP: visualization and analysis of array-CGH, transcriptome and other molecular profiles. Bioinformatics 2006 (pg. 2066-2073).

17. Flicek, Aken, Beal, Ballester, Caccamo, Chen, Clarke, Coates, Cunningham, Cutts, et al. Ensembl 2008. Nucleic Acids Res. 2008 (pg. D707-D714).

18. Smyth. Limma: linear models for microarray data. In: Gentleman, Carey, Dudoit, Irizarry, Huber (eds), Bioinformatics and Computational Biology Solutions using R and Bioconductor. New York: Springer, 2005 (pg. 397-420).

19. Diskin, Hou, Yang, Glessner, Hakonarson, Bucan, Maris, Wang. Adjustment of genomic waves in signal intensities from whole-genome SNP genotyping platforms. Nucleic Acids Res. 2008, pg. e126.

20. Ben-Yaacov, Eldar. A fast and flexible method for the segmentation of aCGH data. Bioinformatics 2008 (pg. i139-i145).

21. Olshen, Venkatraman, Lucito, Wigler. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 2004 (pg. 557-572).

22. Hupé, Stransky, Thiery, Radvanyi, Barillot. Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 2004 (pg. 3413-3422).

23. Hsu, Self, Grove, Randolph, Wang, Delrow, Loo, Porter. Denoising array-based comparative genomic hybridization data using wavelets. Biostatistics 2005 (pg. 211-226).

24. Fridlyand, Snijders, Pinkel, Albertson. Hidden markov models approach to the analysis of array cgh data. J. Multivariate Anal. 2004 (pg. 132-153).

25. Marioni, Thorne, Tavaré. Biohmm: a heterogeneous hidden markov model for segmenting array cgh data. Bioinformatics 2006 (pg. 1144-1146).

26. Picard, Robin, Lavielle, Vaisse, Daudin. A statistical approach for array cgh data analysis. BMC Bioinformatics 2005.

27. Willenbrock, Fridlyand. A comparison study: applying segmentation to array CGH data for downstream analyses. Bioinformatics 2005 (pg. 4084-4091).

28. van de Wiel, Kim, Vosse, van Wieringen, Wilting, Ylstra. CGHcall: calling aberrations for array CGH tumor profiles. Bioinformatics 2007 (pg. 892-894).

29. Ferreira, García, Suela, Mollejo, Camacho, Carro, Montes, Piris, Cigudosa. Comparative genome profiling across subtypes of low-grade B-cell lymphoma identifies type-specific and common aberrations that target genes with a role in B-cell neoplasia. Haematologica 2008 (pg. 670-679).

30. Ferreira, Alonso, Carrillo, Acquadro, Largo, Suela, Teixeira, Cerveira, Molares, Gómez-López, et al. Array CGH and gene-expression profiling reveals distinct genomic instability patterns associated with DNA repair and cell-cycle checkpoint pathways in Ewing's sarcoma. Oncogene 2008 (pg. 2084-2090).

31. Diskin, Eck, Greshock, Mosse, Naylor, Stoeckert, Weber, Maris, Grant. STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Res. 2006 (pg. 1149-1158).

32. Kim, Jung, Rhyu, Jung, Chung. GEAR: genomic enrichment analysis of regional DNA copy number changes. Bioinformatics 2008 (pg. 420-421).

33. Kidd, Cooper, Donahue, Hayden, Sampas, Graves, Hansen, Teague, Alkan, Antonacci, et al. Mapping and sequencing of structural variation from eight human genomes. Nature 2008, vol. 453.

34. Stabenau, McVicker, Melsopp, Proctor, Clamp, Birney. The Ensembl core software libraries. Genome Res. 2004 (pg. 929-933).

Author notes

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors

© The Author(s) 2010. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
