oposSOM: R-package for high-dimensional portraying of genome-wide expression landscapes on bioconductor


Abstract

Motivation: Comprehensive analysis of genome-wide molecular data challenges bioinformatics methodology in terms of intuitive visualization with single-sample resolution, biomarker selection, functional information mining and highly granular stratification of sample classes. oposSOM combines these functionalities using a comprehensive analysis and visualization strategy based on self-organizing map (SOM) machine learning, which we call ‘high-dimensional data portraying’. The method has been successfully applied in a series of studies, mostly on transcriptome data but also on data from other omics realms.

Availability and implementation: oposSOM is now publicly available as a Bioconductor R package.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction

Bioinformatics tools are needed that statistically, functionally and visually summarise high-dimensional data, such as those from transcriptome studies, at different levels of resolution, ranging from individual samples and genes to sample classes and expression modules of co-regulated genes. For this purpose, we developed a bioinformatics analysis pipeline based on self-organizing map (SOM) machine learning, which provides a holistic view of these data (Wirth et al., 2011, 2012b). We termed this technique ‘high-dimensional data portraying’. It comprises the visualization of the data landscape of each individual sample, a series of downstream bioinformatics and statistical analysis options, and detailed, comprehensive reporting of the results. We chose SOM machine learning as the backbone because it combines strong clustering, dimension reduction, multidimensional scaling and visualization capabilities, which have been shown to be advantageous compared with alternative methods such as clustering heatmaps and non-negative matrix factorization when applied to molecular high-throughput data (see Wirth et al., 2011 and references cited therein). We complemented the basic SOM algorithm with a data analysis workflow that includes visualization of the individual feature landscapes, statistical testing for differential features and biomarker selection, mining of biological function, and sample diversity analysis to assess classes of samples. oposSOM continues and substantially extends the scope of a previous SOM-based expression analysis tool, the ‘gene expression dynamics inspector’ (GEDI) (Eichler et al., 2003): oposSOM is under steady development, provides a multitude of sample diversity analyses and, most importantly, provides comprehensive functional annotations.
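To make the SOM backbone more concrete, the following Python/NumPy sketch trains a generic online Kohonen SOM on a genes × samples matrix: each grid unit holds a ‘meta-gene’ profile across all samples, and each gene is assigned to its best-matching unit. This is not oposSOM's implementation; the grid size, the learning-rate and neighbourhood schedules, and the helper names `train_som` and `population_map` are illustrative assumptions. `population_map` counts the genes mapped to each meta-gene, the quantity described in one of the supporting maps listed in Section 2.2.

```python
import numpy as np

def train_som(data, grid=(10, 10), n_iter=2000, lr0=0.5, sigma0=None, seed=0):
    """Online Kohonen SOM on the rows (genes) of a genes x samples matrix.

    Hypothetical helper. Returns a codebook of shape (grid[0] * grid[1], n_samples);
    row k is the meta-gene profile of grid unit k, units ordered row-major on the grid.
    """
    rng = np.random.default_rng(seed)
    gx, gy = grid
    coords = np.array([(i, j) for i in range(gx) for j in range(gy)], dtype=float)
    # Initialise meta-genes with randomly drawn gene profiles
    codebook = data[rng.integers(0, data.shape[0], gx * gy)].astype(float)
    sigma0 = max(gx, gy) / 2.0 if sigma0 is None else sigma0
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking neighbourhood radius
        x = data[rng.integers(0, data.shape[0])]             # one random gene profile
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))   # best-matching unit
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)       # squared grid distance
        h = np.exp(-d2 / (2.0 * sigma ** 2))                 # Gaussian neighbourhood
        codebook += lr * h[:, None] * (x - codebook)         # pull units towards x
    return codebook

def population_map(data, codebook):
    """Hypothetical helper: number of genes whose best-matching unit is each meta-gene."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return np.bincount(d2.argmin(axis=1), minlength=codebook.shape[0])

rng = np.random.default_rng(1)
expr = rng.normal(size=(500, 8))   # synthetic data: 500 genes x 8 samples
cb = train_som(expr, grid=(6, 6), n_iter=3000)
pop = population_map(expr, cb)
print(cb.shape, pop.sum())  # → (36, 8) 500
```

In this sketch, the portrait of sample `s` is simply column `s` of the codebook laid out on the grid, e.g. `cb[:, s].reshape(6, 6)`, because the units are ordered row-major.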

Our portraying method was developed primarily for gene expression data comprising tens to thousands of samples (e.g. tumour specimens in patient cohorts, experimental conditions in cell line experiments). The portraying functionality is unique and especially suited to scientists who attach importance to visual control and intuitive perception of complex data. The software has been applied in a series of previous studies characterizing the gene expression landscapes of healthy human tissues (Wirth et al., 2011), of cancer subtypes (Hopp et al., 2013a, b; Reifenberger et al., 2014) and of stem cell development (Charbord et al., 2014). Further applications addressed the integrative analysis of mRNA and miRNA expression data (Cakir et al., 2014), the proteome of algae (Wirth et al., 2012a), genome-wide histone modification patterns (Steiner et al., 2012) and the genomic diversity of human ethnicities (Binder and Wirth, 2015).

2 Functionality

2.1 Package usability

The only required input of the oposSOM package is gene-centered expression data, e.g. pre-processed microarray intensities or RNA-seq read counts on a log scale. All other program parameters are optional (see package vignette). Upon completion of an oposSOM run, an image of the analysis environment is stored.
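Because the input is expected on a log scale, raw RNA-seq read counts need to be transformed beforehand. A minimal Python sketch, assuming a genes × samples count matrix and a pseudocount of 1 (both assumptions; library-size normalization is deliberately left out, and `log_expression` is a hypothetical helper):

```python
import numpy as np

def log_expression(counts, pseudocount=1.0):
    """Return log2(counts + pseudocount) for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    if (counts < 0).any():
        raise ValueError("read counts must be non-negative")
    return np.log2(counts + pseudocount)

counts = np.array([[0, 3, 7],      # gene 1 across 3 samples
                   [15, 31, 63]])  # gene 2 across 3 samples
print(log_expression(counts))      # values: [[0, 2, 3], [4, 5, 6]]
```

The pseudocount keeps zero counts finite after the transformation; the resulting matrix, with gene identifiers attached, is what would then be handed to the package.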

2.2 Workflow

oposSOM comprises a multitude of analysis modules whose functionality has been described in detail in our previous publications. An illustration of the workflow and a complete list of the methods implemented in the package can be found in the Supplementary Material. In brief, the package fulfils the following tasks:

The SOM space obtained from the training process is characterized by several supporting maps and profiles providing, e.g. the number of genes mapped to each meta-gene.
Samples are individually portrayed in PDF report sheets, which allow detailed examination of their expression landscapes and, in particular, identification of modules of co-expressed genes.
Feature maps, reports and lists allow feature selection and evaluation of their statistical significance.
Gene set enrichment analysis of the expression modules provides their functional context based on a large collection of predefined gene sets.
Sample diversity analysis and class discovery are performed using multiple algorithms (e.g. hierarchical clustering, correlation spanning tree) and different metrics (e.g. Euclidean distance, Pearson’s correlation coefficient).
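The correlation spanning tree listed above can be sketched as a minimum spanning tree over samples. The Python sketch below assumes 1 - Pearson's r as the edge weight and uses Prim's algorithm; the helper name `correlation_spanning_tree` is hypothetical, and oposSOM's own implementation and plotting are not reproduced.

```python
import numpy as np

def correlation_spanning_tree(expr):
    """Minimum spanning tree over samples, weighting edges by 1 - Pearson r.

    expr: features x samples matrix (e.g. meta-genes x samples).
    Returns n_samples - 1 edges as (i, j, weight) tuples.
    """
    dist = 1.0 - np.corrcoef(expr, rowvar=False)  # samples x samples
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                 # Prim's algorithm, starting from sample 0
    best = dist[0].copy()             # cheapest known link of each sample to the tree
    parent = np.zeros(n, dtype=int)   # tree node providing that cheapest link
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j, float(dist[parent[j], j])))
        in_tree[j] = True
        closer = ~in_tree & (dist[j] < best)  # samples now closer via j
        best[closer] = dist[j][closer]
        parent[closer] = j
    return edges

# Synthetic example: two groups of three samples, 100 features each
rng = np.random.default_rng(0)
base_a, base_b = rng.normal(size=(2, 100))
samples = [base_a + 0.1 * rng.normal(size=100) for _ in range(3)]
samples += [base_b + 0.1 * rng.normal(size=100) for _ in range(3)]
expr = np.column_stack(samples)  # 100 features x 6 samples
for i, j, w in correlation_spanning_tree(expr):
    print(i, j, round(w, 3))
```

With two well-separated synthetic groups, the tree contains exactly one long edge joining them, which is how distinct sample classes show up in such a plot.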

2.3 Results

oposSOM stores the results in a defined folder structure. These results comprise a variety of PDF documents that provide extensive information about the systems studied (for example, plots and images of the input data, supplementary descriptions of the generated SOM and associated metadata, the sample diversity landscape and functional annotations). The PDF reports are complemented by CSV spreadsheets, which make the complete information accessible. Detailed descriptions of the algorithms and visualizations have been given in our previous publications (Hopp et al., 2013a, b; Wirth, 2012; Wirth et al., 2011, 2012b). HTML files are generated to provide easy access to the analysis results via an intuitive and descriptive interface. A Summary.html file can be found in the results folder created by oposSOM, and we recommend that new users browse the results using this interface.

3 Use case: portraying of cancer subtypes

We applied oposSOM to patient expression data of mature aggressive B-cell lymphomas to characterize their genome-wide expression landscapes in terms of four distinct molecular subtypes, which are associated with different clinical phenotypes and survival prognoses (Hopp et al., 2013a).

Figure 1 provides an overview of the analysis steps. The expression portraits visualize the expression landscape of each individual sample (Fig. 1a) and of each subtype (Fig. 1b). Red and blue ‘spots’ in the portraits correspond to modules of co-expressed genes that are up- and down-regulated, respectively, in the given sample or subtype. The subtype portraits in Figure 1b immediately reveal distinct, subtype-specific over-expressed modules, which emerge as red spots located near the corners of the respective portrait.
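The detection of red ‘spots’ can be illustrated as thresholding a portrait and grouping adjacent over-expressed meta-genes into connected regions. The sketch assumes a 2D grid of meta-gene values for one sample, a quantile threshold and 4-neighbour connectivity; oposSOM's actual spot-detection criteria may differ, and `find_spots` is a hypothetical helper.

```python
import numpy as np
from collections import deque

def find_spots(portrait, quantile=0.9):
    """Label 4-connected regions of meta-genes above a quantile threshold.

    portrait: 2D array (grid rows x grid columns) of meta-gene values for one sample.
    Returns an int array: 0 = background, 1..k = spot labels.
    """
    mask = portrait > np.quantile(portrait, quantile)
    labels = np.zeros(portrait.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already part of a labelled spot
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:  # breadth-first flood fill of one spot
            x, y = queue.popleft()
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if (0 <= nx < mask.shape[0] and 0 <= ny < mask.shape[1]
                        and mask[nx, ny] and not labels[nx, ny]):
                    labels[nx, ny] = current
                    queue.append((nx, ny))
    return labels

portrait = np.zeros((6, 6))
portrait[0:2, 0:2] = 3.0   # over-expressed block near one corner
portrait[4:6, 4:6] = 2.5   # over-expressed block near the opposite corner
labels = find_spots(portrait, quantile=0.7)
print(labels.max())  # → 2
```

Down-regulated (blue) spots can be found the same way by passing `-portrait`.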

Fig. 1. oposSOM analysis of a cohort of 220 mature B-cell lymphoma cases (see text)


All expression modules detected are summarized in the spot-overview map (Fig. 1c). Each module is characterized by the list of genes it contains, their mean expression profile across all samples studied and a list of enriched gene sets enabling functional interpretation (Fig. 1d). Sample diversity plots, e.g. based on correlation network and correlation spanning tree algorithms, visualize multivariate similarity relations between the samples (Fig. 1e, f). The samples form well-separated clusters in these plots, which supports our definition of the molecular subtypes.
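Functional interpretation of a module relies on testing whether predefined gene sets are over-represented among its genes. As an assumed example of such a test (the exact statistic oposSOM applies is not stated here), the following computes a one-sided hypergeometric p-value using only the Python standard library; `hypergeom_enrichment_p` is a hypothetical name.

```python
from math import comb

def hypergeom_enrichment_p(module_genes, gene_set, universe):
    """One-sided p-value P(X >= k) that `gene_set` is over-represented in a module.

    X is hypergeometric: len(module) genes are drawn without replacement from
    `universe`, of which len(gene_set) count as 'successes'; k is the observed overlap.
    """
    universe = set(universe)
    module = set(module_genes) & universe
    gset = set(gene_set) & universe
    N, K, n = len(universe), len(gset), len(module)
    k = len(module & gset)  # observed overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

universe = [f"g{i}" for i in range(20)]
p = hypergeom_enrichment_p(universe[:4], universe[:5], universe)
print(round(p, 5))  # → 0.00103
```

Here a 4-gene module lies entirely within a 5-gene set drawn from 20 genes, giving p = 5/4845; with many gene sets tested, the p-values would additionally need multiple-testing correction.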

A second use case addressing the expression landscapes of human tissues can be found in the Supplementary Material. It illustrates the advantages of oposSOM data portraying compared with a ‘traditional’ two-way clustering heatmap.

4 Conclusion

oposSOM bundles a series of sophisticated analysis methods with intuitive visualization options to study high-dimensional data, with a special focus on gene-centered expression data. It is designed for a broad user community, ranging from bioinformaticians who require comprehensive analyses in a sophisticated workflow to application-oriented experimentalists who need intuitive visualization of their data.

Acknowledgements

This publication was supported by the Federal Ministry of Education and Research (BMBF), project grants No. FKZ 031 6166 (MMML-MYC-SYS) and FKZ 031 6065A (HNPCC-SYS).

Conflict of Interest: none declared.

References

Binder and Wirth (2015) Analysis of Large-Scale OMIC Data Using Self Organizing Maps. In: Khosrow-Pour (ed.) Encyclopedia of Information Science and Technology, 3rd edn. IGI Global, Hershey, PA, USA, pp. 1642–1654.

Cakir M.V. et al. (2014) MicroRNA expression landscapes in stem cells, tissues, and cancer. Methods Mol. Biol., 1107, 279–302.

Charbord et al. (2014) A systems biology approach for defining the molecular framework of the hematopoietic stem cell niche. Cell Stem Cell, 376–391.

Eichler G.S. et al. (2003) Gene expression dynamics inspector (GEDI): for integrative analysis of expression profiles. Bioinformatics, 2321–2322.

Hopp et al. (2013a) Portraying the expression landscapes of B-cell lymphoma—intuitive detection of outlier samples and of molecular subtypes. Biology (Basel), 1411–1437.


Hopp et al. (2013b) Portraying the expression landscapes of cancer subtypes: a glioblastoma multiforme and prostate cancer case study. Syst. Biomed.


Reifenberger et al. (2014) Molecular characterization of long-term survivors of glioblastoma using genome- and transcriptome-wide profiling. Int. J. Cancer, 135, 1822–1831.

Steiner et al. (2012) A global genome segmentation method for exploration of epigenetic patterns. PLoS One, e46811.

Wirth et al. (2011) Expression cartography of human tissues using self organizing maps. BMC Bioinformatics, 306–352.

Wirth (2012) Analysis of large-scale molecular biological data using self-organizing maps. Dissertation thesis, University of Leipzig.


Wirth et al. (2012a) MALDI-typing of infectious algae of the genus Prototheca using SOM portraits. J. Microbiol. Methods.

Wirth et al. (2012b) Mining SOM expression portraits: feature selection and integrating concepts of molecular function. BioData Min.

Author notes

Associate Editor: Ziv Bar-Joseph
