Abstract

Motivation

Imaging mass spectrometry (IMS) has become an important tool for molecular characterization of biological tissue. However, IMS experiments tend to yield large datasets, routinely recording over 200,000 ion intensity values per mass spectrum and more than 100,000 pixels, i.e., spectra, per dataset. Traditionally, IMS data size challenges have been addressed by feature selection or extraction, such as peak picking and peak integration. Selective data reduction techniques such as peak picking retain only certain parts of a mass spectrum, and often these describe only medium-to-high-abundance species. Since lower-intensity peaks and, for example, near-isobar species are sometimes missed, selective methods can potentially bias downstream analysis towards a subset of species in the data rather than considering all species measured.

Results

We present an alternative to selective data reduction of IMS data that achieves similar data size reduction while better conserving the ion intensity profiles across all recorded m/z-bins, thereby preserving full spectrum information. Our method utilizes a low-rank matrix completion model combined with a randomized sparse-format-aware algorithm to approximate IMS datasets. This representation offers reduced dimensionality and a data footprint comparable to peak picking, but also captures complete spectral profiles, enabling comprehensive analysis and compression. We demonstrate improved preservation of lower signal-to-noise-ratio signals and near-isobars, mitigation of selection bias, and reduced information loss compared to current state-of-the-art data reduction methods in IMS.
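
As a rough illustration of the low-rank idea described above, the following Python sketch factorizes a sparse pixels-by-m/z-bins matrix with a generic randomized range finder followed by a small SVD. It is not the authors' implementation (see the repository under Availability). In particular, it is a plain randomized SVD that treats unstored entries as zeros, whereas a matrix completion model treats them as missing. The function name `randomized_low_rank`, its parameters, and the synthetic test matrix are all assumptions made for this example.

```python
import numpy as np
import scipy.sparse as sp


def randomized_low_rank(X, rank, oversample=10, n_iter=2, seed=0):
    """Return (U, s, Vt) such that X is approximately (U * s) @ Vt.

    X stays in sparse format throughout: it is only ever used in
    sparse-times-dense products, so it is never densified.
    """
    rng = np.random.default_rng(seed)
    k = rank + oversample
    # Random test matrix: sample the column space of X.
    omega = rng.standard_normal((X.shape[1], k))
    Q, _ = np.linalg.qr(X @ omega)
    # A few power iterations sharpen the captured subspace.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Z)
    # Project X onto the subspace and take the SVD of the small k x n matrix.
    B = (X.T @ Q).T
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank]


# Synthetic stand-in for a pixels x m/z-bins matrix (not real IMS data):
# the product of two sparse factors has rank at most 3.
A = sp.random(200, 3, density=0.2, random_state=0, format="csr")
B = sp.random(3, 500, density=0.2, random_state=1, format="csr")
X = (A @ B).tocsr()

U, s, Vt = randomized_low_rank(X, rank=3)
dense = X.toarray()  # only feasible here because the example is tiny
rel_err = np.linalg.norm(dense - (U * s) @ Vt) / np.linalg.norm(dense)
stored = U.size + s.size + Vt.size
print(f"relative error: {rel_err:.2e}")
print(f"values stored: {stored} (factors) vs {dense.size} (full matrix)")
```

The factors hold one row of `U` per pixel and one column of `Vt` per m/z-bin, so every bin keeps an intensity profile, while the number of stored values scales with the rank rather than with the product of the two dimensions.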

Availability

The source code is available at https://github.com/vandeplaslab/full_profile and the data are available at https://doi.org/10.4121/a6efd47a-b4ec-493e-a742-70e8a369f788.

Supplementary information

Supplementary materials are available at Bioinformatics online.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Associate Editor: Jianlin Cheng