ABSTRACT

Using sonification in scientific data analysis provides additional dimensions to visualization, potentially increasing researchers’ analytical capabilities and fostering inclusion and accessibility. This research explores the potential of multimodal integral field spectroscopy applied to galaxy analysis through the development and evaluation of a tool that complements the visualization of data cubes with sound. The proposed application, ViewCube, provides interactive visualizations and sonifications of spectral information across a 2D field of view, and its architecture is designed to incorporate future sonification approaches. The first sonification implementation described in this article uses a deep learning module to generate binaural unsupervised auditory representations. The work includes a qualitative and quantitative user study based on an online questionnaire, aimed at both specialized and non-specialized participants, focusing on the case study of data cubes of galaxies from the Calar Alto Integral Field Spectroscopy Area survey. Out of 67 participants who completed the questionnaire, 42 had the opportunity to test the application in person prior to filling out the online survey. Among these 42 participants, 81 per cent rated the interactive response of the tool as good or very good; 79.1 per cent of the complete sample found the application ‘Useful’, and 58.2 per cent rated its aesthetics as ‘Good’. The quantitative results suggest that all participants were able to retrieve information from the sonifications, pointing to previous experience in the analysis of sound events as more helpful than previous knowledge of the data for the proposed tasks, and highlighting the importance of training and attention to detail for the understanding of complex auditory information.

1 INTRODUCTION

The combination of visual and auditory displays can offer a better understanding of a phenomenon (Enge et al. 2024), making the use of sound for the representation of physical quantities an established area of research (Dubus & Bresin 2013). It can allow holistic interpretations of the data, facilitating the discovery of previously unseen relationships (Cooke et al. 2017), as well as single-datum analytic tasks involving point estimation and comparison, trend identification, and data structure analysis (Walker & Nees 2011).

Sonification has proven to be a very useful tool to assist in the analysis of hyperspectral data, generating sonic time-series related to the spatial and spectral content near user-selected mouse positions (Bernhardt, Cowell & Oxford 2007). It can also improve the perception of density in visualizations of complex data in parallel coordinates and scatter plots (Rönnberg & Jimmy 2016), and ensure simple access to information for blind and non-blind people, enhancing the accessibility of astronomical data and supporting the work of scientists who adopt complementary exploration methodologies (Casado, Diaz-Merced & García 2024).

Furthermore, the enhancement of visual information with sonification allows sighted and blind or low vision (BLV) communities to have astronomy experiences at similar levels (Arcand et al. 2024). Including blind users from the beginning of the design process, Casado & García (2024) proposed the case study of the galaxies from the Sloan Digital Sky Survey (SDSS) to show the possibilities of sonoUno. This multimodal application for displaying sound and images from any data set allowed for the discovery of the variable star UCAC4 459–09273 by blind students using sonification. Writing from the perspective of a blind researcher, Foran, Cooke & Hannam (2022) reported the use of StarSound in 1D high-redshift galaxy work for the verification and initial analysis of the rest-frame ultraviolet (UV) spectra of distant galaxies, and developed the touch-based sonification tool VoxMagellan to analyse 2D images and multidimensional data sets.

In the photometric and spectroscopic analysis field, Trayford et al. (2023) proposed the audification of spectral data cubes (direct conversion of data into audible frequencies) to demonstrate that physical information can be extracted directly from sound with STRAUSS (Trayford & Harrison 2023). Using Star Sounder, Huppenkothen et al. (2023) provided an interactive sonification of the Hertzsprung–Russell diagram based on the cross-match between the Kepler Stellar Table and Gaia Data Release 2 (DR2). The introduction of a sonic perspective has also been valued in the spectroscopic analysis of quasars by Hansen, Burchett & Forbes (2020), finding that sonification can enable more rapid discovery and identification of intergalactic/circumgalactic medium (IGM/CGM) system candidates than visually scanning through spectra.

Including multimodal interactivity, Starks (2018) explored the data cubes of the Antennae Galaxies radio image from the Atacama Large Millimeter/submillimeter Array (ALMA), using Galaxy player within the Soniverse project. Additionally, spatialized sonifications have been used in immersive representations of Antarctic astronomy data (West et al. 2018), and highlighted by Quinton, McGregor & Benyon (2020, 2021) as an effective parameter mapping strategy with the potential to detect sudden changes between multiple sources within the field of extrasolar planet searches.

Framed in this context of inclusive and immersive scientific representations, this article presents ViewCube, a multimodal interactive binaural tool for the analysis of data cubes with headphones. The application includes an unsupervised sonification approach, based on autoencoders. Furthermore, the work explores the potential of multimodal integral field spectroscopy (IFS) using the case study of the Calar Alto Legacy Integral Field Spectroscopy Area (CALIFA) survey (Sánchez et al. 2012a; Sánchez et al. 2016).

Providing quantitative and qualitative feedback from specialized and non-specialized users in Astronomy and Music, this paper aims to demonstrate the usefulness of sound in multimodal displays for IFS analysis, namely to: (a) estimate the position of the user-selected spaxel within the galaxy, identifying whether it is located to the left/right or front/rear in the virtual soundscape; (b) estimate the spaxel’s relative distance from the centre of the galaxy, indicating whether it is near or far from this reference point; and (c) identify the type/age of a spectrum, determining whether the spaxel corresponds to a star-forming region (spectrum with multiple and relatively strong emission lines), to an intermediate age galaxy, or to a retired galaxy. The work is expected to make IFS more accessible for BLV researchers while also enhancing the overall capabilities for datacube analysis.

2 MULTIMODAL IFS

This section describes the design strategy and implementation of ViewCube. The application combines graphical and auditory displays for the interactive multimodal analysis of data cubes. Using the case study of the CALIFA survey data cubes, the work aims at enhancing 3D spectroscopy representation with immersive sonification.

2.1 Case study

The CALIFA survey (Sánchez et al. 2012b, 2016; Walcher et al. 2014) is a public legacy survey of over 600 galaxies with an r-band isophotal major axis between 45 and 79.2 arcsec and a redshift 0.005 < z < 0.03, selected from the Sloan Digital Sky Survey (SDSS) DR7 photometric catalogue. Aimed at helping in the study of galaxy evolution through cosmic time in the Local Universe, it uses integral field spectroscopy (IFS; Allington-Smith 2006) to provide a wide-field IFU survey of galaxies that includes all morphological types, covering masses between 10^8.5 and 10^11.5 M⊙ (Sánchez et al. 2016).

The observations were obtained with the integral-field spectrograph PMAS/PPak mounted on the 3.5 m telescope at the Calar Alto observatory. The wavelength range between 3700 and 7500 Å is sampled using two different spectral setups, a low-resolution V500 mode (3745–7500 Å) with a spectral resolution of 6.0 Å (full width at half maximum, FWHM), and a medium-resolution V1200 mode (3650–4840 Å) with a spectral resolution of 2.3 Å (FWHM; García-Benito et al. 2015). CALIFA’s third Data Release (DR3; Sánchez et al. 2016) provides the public with 646 objects in the V500 setup, 484 in the V1200, and the combination of the cubes from both setups (COMBO).

The morphology references provided for each galaxy used in this work are extracted from Walcher et al. (2014).

2.2 Standalone application

ViewCube1 is a lightweight, standalone application written entirely in Python, designed for the efficient browsing of data cubes. Originally developed for the quick assessment of the quality and physical characteristics of data cubes from the CALIFA survey, and for the rapid exploration of high-level data products generated by the PyCASSO pipeline (de Amorim et al. 2017), ViewCube’s functionality has since expanded. The application now supports data cubes from any provenance, thanks to its general and flexible FITS reader. This reader is agnostic to the source of the data cubes, enabling it to handle a wide variety of data cubes from different instruments (e.g. MUSE) and surveys (e.g. MaNGA, CALIFA) across various wavelength ranges (optical, radio).

Despite its name, ViewCube is also capable of rendering Raw Stacked Spectra (RSS) formats, provided that a file mapping the positions of the fibres is available. Visualizations within ViewCube are rendered using the Matplotlib module (Hunter 2007). The primary objective of ViewCube is to facilitate a fast and effective inspection of data cubes, either for a quick quality assessment or for a focused examination of their characteristics.

As shown in Fig. 1, the user interface features two main windows: an image window, which presents a 2D map of the datacube convolved through a chosen passband, and a spectral window, which displays the spectrum corresponding to the location of the mouse pointer. Users can select different spaxels or fibres for comparison, generate an integrated spectrum, and save both individual and integrated spectra. Additionally, users can modify the filter used to convolve the data cube to produce the image in the image window, and adjust the central wavelength of the filter by dragging and dropping the filter passband shown in the spectral window.

Figure 1. ViewCube UI displaying the data cube of the spiral (Sbc) galaxy NGC 5732. 2D image window (left) and multimodal representation – spectral window and sonification – of the spaxel (35,35) (right).

In keeping with its initial exploratory purpose for quality assessment, ViewCube allows for the interactive comparison of spectra from two data cubes at the same spaxel, provided the data cubes share the same dimensions. To enhance the exploration of individual spectra and perform more advanced operations, ViewCube offers the possibility of integrating with other packages such as PySpeckit (Ginsburg et al. 2022) and PyRAF (Science Software Branch at STScI 2012). Future versions of ViewCube are planned to include a faster rendering visualization engine, as well as additional menu options for improved functionality.

2.3 Sonification module

This section describes the sound module implemented within ViewCube to allow the sonification of the spectra associated with each spatial element of a datacube. The module, named SoniCube, provides an open, comprehensive, and general-purpose multimodal tool for IFS analysis, supporting the development of future sonification techniques focused on specific potential features within data cubes. The aim of the SoniCube interface is to offer a diverse ‘palette of sonifications’, akin to the range of colour palettes available for visual 2D maps in ViewCube. This variety of sonification options allows users to extract or enhance different data characteristics by selecting specific sonification methods, much like how a colour palette reveals visual details. In this paper, we present the first sonification method implemented within SoniCube’s palette, providing an autonomous fast representation of the spectra. This first sonification ‘palette’ implementation is designed to replicate the purpose of the visual ViewCube counterpart, providing a quick qualitative overview of the data cube.

Also implemented in Python, SoniCube controls a sound synthesizer developed in Csound via Open Sound Control (OSC; Wright 2002), using the python-osc native module and the ctcsound interface (Ctcsound 2022). The module provides a real-time interactive sonification of the spectrum associated with each user-selected spaxel. In this first sonification implementation, each spectrum is converted into sound using the deep learning approach described in Section 2.4. This process provides a unique, unsupervised auditory footprint conveying the information of each spectrum in a single sound event. Each sound event is generated with an additive synthesizer using six independent oscillators, fed by a 6D latent vector which is generated by an autoencoder (Baldi 2012).

The module generates ‘on the fly’ a 6D latent vector from each user-selected spectrum. The components of this vector are interpreted as fundamental frequencies for the six oscillators that synthesize the sound. The six components are multiplied by a factor of 10 000 for scaling the latent values to audible frequencies, generating comprehensible accurate sonifications. For a formalized description of the synthesizer see Appendix  A.
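As a minimal sketch of this mapping (assuming a trained encoder like the one outlined in Section 2.4; function and variable names are illustrative), the conversion from a selected spectrum to oscillator frequencies could look like:

```python
import numpy as np

FREQ_SCALE = 10_000  # scaling factor quoted in the text to bring latent values into the audible range (Hz)

def spectrum_to_frequencies(spectrum, encoder):
    """Encode one spectrum (1901 flux values) into six oscillator frequencies."""
    z = encoder.predict(spectrum[np.newaxis, :], verbose=0)[0]  # 6D latent vector
    return z * FREQ_SCALE  # fundamental frequencies for the additive synthesizer
```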

Additionally, the module calculates the azimuth and radial distance from user-selected spaxels to the reference spaxel, which corresponds to the centre of the galaxy on each data cube. The azimuth is used to locate the auditory footprint of the spectrum within the binaural soundscape (Møller 1992) generated for each data cube, providing an immersive representation of its spectra with the listener located at the centre of the galaxy. For more information about binaural encoding see Appendix  B.
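The geometry itself is straightforward; a sketch assuming image coordinates with the reference spaxel at the galaxy centre and the listener facing the top of the map (names and angle conventions are illustrative):

```python
import numpy as np

def spaxel_to_soundscape(x, y, x_ref, y_ref):
    """Azimuth (degrees) and radial distance (spaxels) of (x, y) relative to the galaxy centre."""
    dx, dy = x - x_ref, y - y_ref
    azimuth = np.degrees(np.arctan2(dx, dy))  # 0 deg = front, +90 deg = right, -90 deg = left
    distance = np.hypot(dx, dy)
    return azimuth, distance
```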

The distance is used to feed the direct-to-reverberant energy ratio of a reverberation emulator (Gardner 1998), which provides the cognitive sensation associated with the sound field that can be found in large indoor environments. This effect is used to generate a virtual auditory cue for distance perception based on direct-to-reverberant energy ratio (Lu & Cooke 2010), related to the proximity from the user-selected spaxel to the reference spaxel in the centre of the galaxy.

The amplitude of each sonification is calculated from the absolute fluxes of the represented spectrum. To handle the wide and variable dynamic range of flux density commonly found in real sky observations, SoniCube includes two operating modes for representing relative sound amplitudes. In the flux-sensitive mode (default), the amplitude of each sonification is calculated from the logarithmic median of the absolute flux, normalized using feature scaling. This mode preserves the apparent relation of fluxes within the data cube. On the other hand, if the sensitive mode is deactivated, all spectra in the data cube are represented with the same amplitude, allowing the appreciation of regions with relatively low absolute fluxes. An additional two-stage broad-band dynamic range limiter/compressor (Kates 2005) is implemented in both modes to keep extremely salient values under control, preventing hearing damage when analysing unexplored data cubes.
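A hedged sketch of the flux-sensitive mode, assuming the logarithmic median of the absolute flux is min-max scaled across the cube (constants and names are illustrative):

```python
import numpy as np

def flux_sensitive_amplitudes(cube_fluxes):
    """cube_fluxes: array [n_spaxels, n_wavelengths]; returns one relative amplitude per spaxel."""
    log_median = np.log10(np.median(np.abs(cube_fluxes), axis=1) + 1e-12)  # small offset avoids log(0)
    lo, hi = log_median.min(), log_median.max()
    return (log_median - lo) / (hi - lo)  # feature scaling to [0, 1]
```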

To compensate for the non-linear response of human hearing in frequency and amplitude domains, an A-weighted curve is applied to the array of amplitudes using the librosa module (McFee et al. 2015). This function calculates the normalized amplification factors that provide an equal loudness response on each footprint, presenting good results for the implementation of this first sonification module evaluated with the CALIFA survey. Nevertheless, alternative loudness contours are being explored for critical future implementations (Charbonneau et al. 2012).
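For reference, librosa exposes the A-weighting curve directly; the sketch below shows how per-frequency factors could be derived (whether the weighting or its inverse is applied to the oscillator amplitudes is an implementation choice not detailed here):

```python
import numpy as np
import librosa

def a_weighted_factors(frequencies_hz):
    """A-weighting of each oscillator frequency, converted from dB to normalized linear factors."""
    weights_db = librosa.A_weighting(np.asarray(frequencies_hz, dtype=float))
    factors = 10.0 ** (weights_db / 20.0)  # dB to linear amplitude
    return factors / factors.max()         # normalized so the strongest partial is 1
```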

Finally, using the Open Sound Control (OSC) protocol, the module sends to CSound all the variables needed to synthesize the auditory footprint in a binaural soundscape. The block diagram of Fig. 2 summarizes the sound generation process. Notice the use of azimuth and distance of the selected spaxel in the binaural and reverberation blocks, as well as the normalized median flux, and the frequencies with A-weighted factors of the corresponding spectrum in the additive synthesizer.
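A minimal sketch of this OSC link with the python-osc client mentioned above; the port, address, and message layout are assumptions:

```python
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)  # Csound OSC listener (port assumed)

def send_footprint(frequencies, factors, amplitude, azimuth, distance):
    """Send all variables needed by the Csound synthesizer for one auditory footprint."""
    message = [float(v) for v in frequencies] + [float(v) for v in factors] \
              + [float(amplitude), float(azimuth), float(distance)]
    client.send_message("/sonicube/footprint", message)  # hypothetical OSC address
```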

Figure 2. SoniCube block diagram. Pre-processing and real time calculations including data, OSC, and audio signal flows.

Addressing some additional aesthetic aspects of the sonification, background echo and flanger effects were added to the workflow, facilitating the binaural localization of the footprint and smoothing fast transitions between spectra.2

2.4 Autoencoding CALIFA

As mentioned in the previous section, this first sonification implementation of SoniCube uses an autoencoder architecture to provide accurate sonifications of the spectral information of a data cube. Trained with the gradient descent algorithm, these networks can reduce data dimensionality more effectively than other approaches such as principal components analysis (PCA; Hinton & Salakhutdinov 2006). Autoencoders are neural network models with the potential to learn an approximation to the identity function, providing an output that is similar to their input (Ng et al. 2011). By reducing the number of hidden units of the intermediate layer of the network, a model can learn relevant structures of the data, which can also be reconstructed from this intermediate lower dimensional representation, named latent space (Goodfellow 2016).

The dimension of the latent space depends on the data set and the architecture used. Aimed at obtaining stellar parameters using convolutional neural networks, Mas-Buitrago et al. (2024) proposed a 32D latent space autoencoder, displayed as an 8×4 matrix, for the reduction of observed CARMENES spectra and synthetic ACES model spectra. On the other hand, the reconstruction from 4D latent vectors was enough for Xiang, Gu & Cao (2022) to analyse the stellar magnetic activity using variational autoencoders on the Large Sky Multi-Object Fiber Spectroscopic Telescope (LAMOST) K2 spectra. Demonstrating the potential of variational autoencoders, Portillo et al. (2020) summarized galaxy spectral information with only six latent variables on the Sloan Digital Sky Survey (SDSS). This dimension also worked effectively for the Calcium II Triplet library (CaT) reduced with sparse autoencoders (García Riber & Serradilla 2024), which also agrees with our preliminary tests on the CALIFA survey galaxies.

Intensive testing was done with different configurations of both architectures using the COMBO (V500+V1200) data cubes of the DR3 CALIFA survey. Fig. 3 provides a comparative example for the data cube of the spiral (Scd) galaxy NGC 5406 with a six-layer 6D autoencoder, and a four-layer 6D VAE, both implemented using TensorFlow (Abadi et al. 2016).

Figure 3. Autoencoder comparative for the spiral (Scd) galaxy NGC 5406. Six-layer 6D autoencoder (solid line) versus four-layer 6D VAE (dotted line). Reconstructed spectra and residual error from the original spectrum for spaxel (34,34). Sparse autoencoder: R2 = 0.99 (spectrum), R2 = 0.98 (data cube), 39.12 per cent of the spectra with R2 > 0.9, 100 epochs, one hour per cube. VAE: R2 = 0.97 (spectrum), R2 = 0.98 (data cube), 4.92 per cent of the spectra with R2 > 0.9, 291 epochs, 5 h 30 min per cube. Normalized flux (ADU) versus wavelength (Å).

On each encoded data cube, we calculated the coefficient of determination (R2) between the original and the reconstructed sets of spectra, providing a measure of the accuracy of the reduction. As for the duration of their training processes, the VAE required 5.5 times more computation time per cube than the sparse autoencoder while providing lower results: respectively, R2 = 0.96 (VAE) versus R2 = 0.98 (sparse) for the complete data cube, with 4.92 per cent (VAE) versus 39.12 per cent (sparse) of the spectra with R2 > 0.9, and R2 = 0.96 versus R2 = 0.99 for the represented spaxel (34,34).
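As an illustration of this accuracy metric, a sketch with scikit-learn, assuming the original and reconstructed spectra are stored as two arrays of the same shape:

```python
import numpy as np
from sklearn.metrics import r2_score

def cube_r2(original, reconstructed):
    """original, reconstructed: arrays [n_spaxels, n_wavelengths]."""
    per_spectrum = np.array([r2_score(o, r) for o, r in zip(original, reconstructed)])
    global_r2 = r2_score(original.ravel(), reconstructed.ravel())
    frac_above = np.mean(per_spectrum > 0.9)  # fraction of spectra with R2 > 0.9
    return global_r2, per_spectrum, frac_above
```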

Based on these tests, a six-layer 6D sparse autoencoder module was included in SoniCube to represent ‘in real time’ the spectral information of the data cubes with low-dimensional vectors. This architecture allowed the reduction of each input spectrum Xi ∈ R^1901 (each data cube contains around 5540 spectra with 1901 flux values per spectrum) to a 6D representation Zi ∈ R^6, and the reconstruction of Xi from the latent vector Zi as X̂i ∈ R^1901.
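A hedged TensorFlow/Keras sketch of such a six-layer 6D sparse autoencoder follows; only the input dimension (1901) and latent dimension (6) come from the text, while hidden-layer sizes, activations, and the sparsity penalty are illustrative assumptions:

```python
from tensorflow.keras import layers, models, regularizers

N_WAVE, LATENT = 1901, 6  # flux values per spectrum and latent dimension (from the text)

def build_sparse_autoencoder():
    inp = layers.Input(shape=(N_WAVE,))
    # Encoder: two hidden layers narrowing down to the 6D latent space
    x = layers.Dense(512, activation="relu")(inp)
    x = layers.Dense(64, activation="relu")(x)
    z = layers.Dense(LATENT, activation="sigmoid", name="latent",
                     activity_regularizer=regularizers.l1(1e-5))(x)  # sparsity penalty
    # Decoder: mirror of the encoder reconstructing the 1901-point spectrum
    x = layers.Dense(64, activation="relu")(z)
    x = layers.Dense(512, activation="relu")(x)
    out = layers.Dense(N_WAVE, activation="linear")(x)
    autoencoder = models.Model(inp, out)
    encoder = models.Model(inp, z)
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder

# Usage on the ~5540 spectra of one data cube (spectra: array [n_spaxels, 1901]):
# autoencoder, encoder = build_sparse_autoencoder()
# autoencoder.fit(spectra, spectra, epochs=100, validation_split=0.1)
# latent_vectors = encoder.predict(spectra)  # shape (n_spaxels, 6)
```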

The model was trained on each data cube independently with around 5540 spectra. Fig. 4 provides two examples of the original and reconstructed spectra from the data cubes of NGC 5784 (Sbc) and NGC 5682 (E4).3

Figure 4. Six-layer 6D autoencoder results. Reconstructed (dashed line) and original (solid line) spectrum with residual error (dotted line) of the spaxel (35,35) from the spiral (Sbc) galaxy NGC 5784 (left), and the elliptical (E4) galaxy NGC 5682 (right). Two examples of an old galaxy and a star-forming region from the CALIFA survey. Respectively, R2 = 0.98 and R2 = 0.95. Normalized flux (ADU) versus wavelength (Å).

Fig. 5 shows three examples of the learning curves obtained during the training process with the spectra from the data cubes of NGC7047 (Sab), UGC10331 (E1), and UGC03960 (E5). These galaxies illustrate the performance of the autoencoder, corresponding respectively to the best, intermediate, and worst encoding results. The coefficients obtained ranged from 0.998 to 0.882 across the complete data set, with 49 per cent of the data cubes presenting an R2 higher than 0.96, and 4.78 per cent presenting an R2 under 0.92.

Figure 5. Learning curves showing mean square error versus epoch during the training and validation processes for the spiral (SAb) galaxy NGC7047, the elliptical (E1) galaxy UGC10331, and the elliptical (E5) galaxy UGC03960. These data cubes correspond respectively to the best, medium, and worst encoding results provided by the autoencoder. Notice the difference of scale in the y axis.

3 EVALUATION

To evaluate the potential utility of the previously described approach for the auditory analysis of galaxy data cubes, specifically using CALIFA data cubes as representative data, we conducted an anonymous online survey provided here for reference.4 The questionnaire was complemented by training videos and, to a lesser extent, in-person interactive demonstrations, targeting both specialized and non-specialized participants. All participants received the same online form, where they indicated whether or not they had experienced the application in person.

3.1 Survey design

The online survey was administered to volunteer participants from 2024 April 15 to July 31. The questionnaire featured five training videos, which could be replayed as needed, including one providing a general overview of the application and one for each specific section. The survey comprised four sections with video-supported questions that analysed various aspects of the proposal. Additionally, 12 questions were included to gather demographic information, participants’ self-reported levels of expertise in Astronomy and Music, and three qualitative assessments concerning the application’s interactivity, usefulness, and the aesthetics of the sounds employed in the sonification.

All participants were advised to use headphones and to check their correct placement in left and right ears. There was no time limit to complete the survey. The following describes the sections and questions included in the survey. Each question included several sonifications with no graphics, generated from the spectra of the galaxies NGC 5784 (Sbc), NGC 5732 (Sbc), NGC 5682 (E4), NGC 6060 (S0a), NGC 7562 (Sbc), NGC 7671 (S0), NGC7800 (Ir), NGC 2638 (Sb), and UGC 00148 (Sb). The participants could compare the questions with the examples presented in the training videos as many times as needed.

Section 1. Sound Location. This first section consisted of four questions designed to analyse the possibilities of the application to estimate sound location. Within the virtual binaural soundscape provided, the listener is virtually placed at the centre of the galaxy, facing the top of the map. The training videos of this section provided examples around the spiral galaxy (Sa) NGC 7549.

Section 2. Distance to the centre of the galaxy. This section also included four questions aimed at studying the possibilities of the application to provide auditory information about the distance from the moving cursor to the centre of the galaxy within the virtual spectral soundscape. The training videos of this section provided examples from the spiral galaxy (Sbc) NGC 5732.

Section 3. Age/Galaxy type. This section explored how multimodal representation could aid in differentiating between various Age/Galaxy types. The training videos presented three examples, illustrated in Fig. 6. These examples include: the spectrum (37,34) from a star-forming region of the spiral galaxy (S0) NGC 3395; the spectrum (36,34) near the centre of the intermediate-age spiral galaxy (Sd) NGC 2347; and the spectrum (35,32) from a region near the centre of the retired spiral galaxy (Sb) NGC 6125.

Figure 6. Age/Galaxy type examples presented in the training videos. Spectrum of a star-forming region in the spiral galaxy (S0) NGC 3395 (left), spectrum close to the centre of the intermediate-age spiral galaxy (Sd) NGC 2347 (centre), and spectrum of a region close to the centre of the retired spiral galaxy (Sb) NGC 6125 (right). The upper panels display the narrowband image, produced using the narrowband filter indicated by the filled curves in the lower spectral panels. The spectra corresponding to specific spaxels, highlighted by squares, are indicated in the upper continuum maps. The axes of the upper panels represent offsets in arcseconds relative to the centre of the galaxy. The spectra in the lower panels are plotted with wavelengths in Angstroms. The colourbar represents the flux of the data cube convolved with the filter, in units of 10^-16 erg cm^-2 s^-1.

Section 4. Combined questions. Finally, two multiple choice questions analysed the potential of the application to allow the identification of the position of the represented spectrum (left/right), its distance to the centre (close/far), and if it corresponded to a star-forming region or to a retired galaxy.

Qualitative question 1. If you have tried the application in person, please rate the multimodal experience. If not (you only saw the training videos of this questionnaire), please skip this question. Options: Very bad, Bad, Acceptable, Good, or Very good.

Qualitative question 2. Rate the potential usefulness of the multimodal display for the exploration of the CALIFA Survey. Options: Useless, Doubtfully useful, Useful, or Very useful.

Qualitative question 3. Rate the aesthetics of the sonifications. Options: Intolerable, Bad, Acceptable, Good, or Nice Sounding.

3.2 General results

The survey was completed by 67 participants,5 including 31 professional astronomers, two of them identified as blind or low vision (BLV), and 36 non-astronomers. Their ages ranged from less than 21 (1) to more than 60 yr old (10), with most of the participants ranging between 21 and 30 (18), and between 41 and 50 (21). They were mainly from Spain (50 participants) but also from Mexico, USA, Japan, Germany, China, Malta, Australia, and UK. Participants were asked about their music preferences and native language to explore whether language influences the ability to recognize sound features. Although the study included speakers of eleven different languages, the sample size was too small and diverse to draw any definitive conclusions. The same limitation applied to the analysis of music preferences.

As shown in the first graph of Fig. 7, the mean global success rate obtained by the 67 participants was 0.516, with a 68.3 per cent Jeffreys confidence interval of (0.498, 0.534) and a standard deviation of 0.169. The subgroup formed by professional astronomers (31 participants) obtained a mean success rate of 0.554, with a Jeffreys confidence interval of (0.528, 0.579) and a standard deviation of 0.188, and the subgroup of non-astronomers (randomly downsampled from 36 to 31 participants to allow direct comparison) obtained a mean success rate of 0.481, with a Jeffreys confidence interval of (0.455, 0.507) and a standard deviation of 0.171, at the same confidence level.
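For reference, the Jeffreys intervals quoted here can be obtained from the Beta posterior with scipy; a sketch, with the 68.3 per cent level taken from the text:

```python
from scipy.stats import beta

def jeffreys_interval(successes, trials, conf=0.683):
    """Equal-tailed Jeffreys interval: quantiles of the Beta(k + 1/2, n - k + 1/2) posterior."""
    alpha = 1.0 - conf
    lower = beta.ppf(alpha / 2.0, successes + 0.5, trials - successes + 0.5)
    upper = beta.ppf(1.0 - alpha / 2.0, successes + 0.5, trials - successes + 0.5)
    return lower, upper
```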

Figure 7. Evaluation results. Up-left: Average success rates for 67 participants on simple questions (left), for 31 professional astronomers (centre), and for 31 non-astronomers (right). Up-right: Average success rates on simple questions by field of expertise (balanced subgroups). From left to right: astronomers musicians, astronomers no musicians, musicians no astronomers, and non-experienced in any of the fields. Down-left: Average success rates on combined questions, global and subgroup results. Down-right: Average success rates on simple and combined questions by age groups.

Although the sample was too small to establish statistical significance, these indicative results appear to confirm that all participants were able to understand the information from the sonifications thanks to the training videos, even without previous experience in Astronomy. Agreeing with the results obtained using auditory graphs by Smith & Walker (2005), the training and context provided in the survey enhanced the performance of the participants. This is particularly notable in the analysis of the combined questions, in which non-astronomers performed 1.33 times better than professional astronomers, as can be seen in the down-left graph of Fig. 7. This result could be related to the decision to allow participants to repeat the training videos as many times as they wanted, which could benefit participants with low-to-intermediate prior knowledge (van Riesen et al. 2022).

As for the performance of BLV professional astronomers (only two participants), their results on simple questions were similar to, although slightly higher (by 8 per cent) than, those obtained by an equivalent randomly sampled group of non-BLV professional astronomers. This suggests that the application could help in bringing IFS analysis closer to BLV astronomers. In the combined questions, neither of the two BLV astronomers succeeded in the Type/Age section. In the following, their results are included in the group of professional astronomers.

3.3 Subgroup quantitative analysis

To provide further analysis of the recorded feedback, the participants were divided into four subgroups according to their self-declared level of expertise in Astronomy and Music. The Astronomers musicians subgroup included professional astronomers also identified as professional or amateur musicians. A second group of astronomers no musicians was used to analyse the influence of sound analysis experience on potential experts in the proposed analysis tasks. The Musicians no astronomers group was formed by participants identified as professional or amateur musicians who were not astronomers. Finally, the non-experienced group included the rest of the participants, declaring no experience in any of the fields. Table 1 summarizes the results obtained on simple and combined question sections.

Table 1. Quantitative evaluation. Results for simple and combined question sections shown by group of expertise and age. BLV astronomers were included respectively in the AstroMus and AstNoMus groups.

Group      Answers   Success   std      Comb. succ   Comb. std
Global     67        0.516     0.169    0.157        0.011
Astro      31        0.554     0.188    0.145        0.114
NoAstro    31        0.481     0.1709   0.193        0.091
AstroMus   11        0.545     0.229    0.136        0.064
AstNoMus   13        0.577     0.187    0.115        0.054
MusNoAst   11        0.485     0.220    0.364        0.128
Nothing    12        0.555     0.186    0.208        0.059
<21        1         0.250     0.452    0.0          0.0
21–30      18        0.555     0.195    0.083        0.039
31–40      9         0.509     0.229    0.166        0.078
41–50      21        0.555     0.193    0.214        0.033
51–60      6         0.444     0.217    0.333        0.0
>60        10        0.450     0.247    0.05         0.070
BLV        2         0.542     0.396    0            0

As shown in the up-right graph of Fig. 7, Astronomers no musicians were the best performers in the simple question sections, obtaining an average success rate of 0.577, with a 68.3 per cent Jeffreys confidence interval of (0.573, 0.616) and a standard deviation of 0.187. It is worth mentioning that the non-experienced group performed as well as the Astronomers musicians, achieving an average success rate of approximately 0.55 with a standard deviation of 0.186, at the same confidence level. This performance was 1.14 times better than that of the Musicians no astronomers, which may suggest that the additional focus required to learn about unfamiliar fields helped the non-experienced participants with the proposed tasks.

The number of correct responses by group of expertise is provided in Fig. 8. As for the combined questions, the success rates obtained were notably lower than those obtained in simple questions for all groups, with the exception of Musicians no astronomers. This suggests that the experience in the analysis of sound events was more helpful than previous knowledge of the data for the proposed task.

Figure 8. Success rate by question and group of expertise (notice the difference in the number of participants). Results for Astronomers versus Non-astronomers (up), and subgroup rates for Astronomers musicians, Astronomers non-musicians, Musicians non-astronomers, and Non-experienced participants (down). Sound location questions referenced as ‘Loc’, distance to the centre of the galaxy questions referenced as ‘Dist’, type of galaxy questions referenced as ‘Type’, and combined questions (success = all multiple choice options correct) referenced as ‘Comb’. Dotted lines represent the random-choice reference rate for each question. Notice random choice results for non-experienced participants in questions Loc-4, Dist-7, and Comb-13.

Further analysis of the combined responses revealed that participants rarely answered all three aspects of the combined questions correctly at the same time, although they performed well on each aspect separately. From the complete sample, 74.63 per cent of the participants located the sonification correctly, 46.27 per cent marked the correct distance to the centre of the galaxy, and 36.57 per cent successfully interpreted the age of the galaxy. The respective averaged success rates from the simple questions per section were 53.2 per cent for the location questions, 63.30 per cent for the distance analysis questions, and 40.7 per cent for the type of galaxy questions.

As illustrated by Fig. 9, the Type/Age questions had the lowest relative success rates across all groups, possibly due to the level of abstraction involved in these tasks. This result suggests that interpreting a galaxy’s type or age through sound in this specific sonification implementation requires more training than the other tasks, which were more intuitive and aligned with the participants’ prior experience. It is worth mentioning the exception of the astronomers no musicians, who obtained the worst results in the interpretation of the distance to the centre of the galaxy. This is consistent with the lower accuracy of auditory distance perception compared to horizontal localization, as discussed by Middlebrooks (2015).

Figure 9. Average success rate of the two combined questions by blocks (location, distance, and type/age). From left to right, success rate for Astronomers versus Non-astronomers, and subgroup rates for Astronomers musicians, Astronomers non-musicians, Musicians non-astronomers, and Non-experienced participants. The dotted line indicates the success rate expected by random choice (0.33) considering three possible responses by block (correct, incorrect, and not answered), since 19 participants did not enter any response in some blocks.

Analysing the results of the combined questions by age (down-right graph of Fig. 7), the group ranging from 51 to 60 yr old (6 participants) performed 1.56 times better than the 41–50 subgroup (21 participants), and 1.73 times better than the 31–40 subgroup (9 participants), suggesting that experience and attention to detail played an important role in the understanding of complex auditory information. Note that the sample sizes do not provide statistical significance.

Fig. 10 provides an additional comparison between the results of the 42 participants who could try the application in person and the 25 participants who only used the training videos. Live testers presented a global success rate of 0.51 versus 0.43 for participants trained only with the videos. Nevertheless, the best results were obtained by the astronomers non-musicians subgroup (4 participants) using only the training videos with no repetition restrictions (0.65), followed by the non-experienced subgroup (13 participants) testing the application live (0.60). These results suggest that both training methods were useful for the proposed tasks.

Figure 10. Success rate on simple questions for participants trained only with videos (V) versus participants testing the application live (L). From left to right, global results, astronomers versus non-astronomers, and expertise subgroups. The dotted line shows the averaged random choice rate (0.34) from questions with 2, 3, and 4 possible responses.

3.4 Qualitative feedback

To evaluate the interactivity, usefulness, and aesthetics of the proposal, the three qualitative questions described in Section 3.1 were included in the survey. Of the 67 participants that completed the survey, 42 tested the application in person. As shown in Fig. 11, 81 per cent of these participants rated the interactive response of the application as ‘Good’ or ‘Very good’, 79.1 per cent of the complete sample of participants (67) found the application ‘Useful’ or ‘Very useful’, 19.4 per cent ‘Doubtfully useful’, and one participant (1.49 per cent) considered it ‘Useless’. Regarding the aesthetics, 58.2 per cent rated it as ‘Good’ or ‘Nice sounding’, 34.33 per cent as ‘Acceptable’, and 5.97 per cent as ‘Bad’. The subgroup results are also available in Table 2. Additionally, seven participants explicitly expressed that they found it difficult to differentiate the sounds, and eight participants explicitly expressed their enthusiasm about the project.

Figure 11. Qualitative evaluation. Interactivity: feedback from 42 participants who tested the application ‘in person’. Usefulness and aesthetics: full sample, 67 participants. 81 per cent declared that the application had a ‘good interactivity’, 79.1 per cent found it ‘useful’, and 58.2 per cent ‘good sounding’.

Table 2. Qualitative evaluation. Percentages by group of expertise. Asterisk values correspond to the sample of participants that tested the application in person. BLV astronomers were included respectively in the AstroMus and AstNoMus groups.

Group      Answers   Good interactivity*   Useful   Good sound
Global     42*/67    80.95                 79.10    58.21
Astro      21*/31    76.19                 74.19    51.61
NoAstro    17*/31    82.35                 80.64    64.52
AstroMus   8*/11     75.0                  63.64    45.45
AstNoMus   9*/13     66.66                 79.92    53.85
MusNoAst   6*/11     83.33                 81.82    63.64
Nothing    9*/12     55.55                 83.33    58.33
BLV        1*/2      100.0                 100.0    100.0

4 CONCLUSIONS

The design and evaluation of fast and efficient multimodal interactive tools for the exploration of IFS data cubes can help in the analysis of current massive spectroscopic surveys, complementing the possibilities of visual representations, and fostering inclusion and accessibility.

This article provides a summary of the motivations and design strategies used in the development of the interactive multimodal binaural application ViewCube, and a user study with specialized and non-specialized participants analysing selected galaxies from the CALIFA survey. The complete work was aimed at exploring the potential of multimodal IFS for the analysis of data cubes, with the motivation of making them more accessible for blind and low vision (BLV) researchers, and more immersive for complementary exploration through sound.

The tool allows the exploration of a wide variety of data cubes from different instruments and surveys across various wavelength ranges and includes a deep learning sonification approach to provide accurate comprehensible sonifications of the spectral information of the data cubes. The approach serves as an initial, general-purpose sonification tool designed to provide an overview of spectral properties, particularly in terms of stellar age and emission lines. In this context, the autoencoder effectively captures the general characteristics of both the strong emission lines (if present) and the continuum.

Regarding the qualitative feedback obtained from the 42 participants (including 21 professional astronomers) who evaluated the application in person, it can be concluded that the interactivity of the application is good or very good (80.95 per cent), with 79.1 per cent of the complete sample of participants (including 31 professional astronomers) finding it ‘Useful’ or ‘Very useful’, and 58.21 per cent rating its sound as ‘Good’ or ‘Nice sounding’.

The quantitative evaluation of the application was conducted through an online survey featuring five training videos. The questionnaire was structured in four blocks aimed at analysing the potential of the application for the estimation by sound of: (1) the position of a user-selected spaxel in the virtual soundscape generated by a data cube (left, right, front, or rear); (2) the distance of the user-selected spaxel to the centre of the represented galaxy (close to the centre, intermediate distance, or far from the centre); (3) the type/age of the spectrum of the user-selected spaxel (star-forming region, intermediate age, or retired galaxy); and (4) all three characteristics combined.

Although the sample was too small to provide statistical significance, the results suggest that all participants (experienced and non-experienced) were able to retrieve information from the sonifications, presenting an average success rate of 0.516, with professional astronomers performing 1.15 times better than non-astronomers.

The sample included two professional astronomers self-declared as BLV. This group presented an average success rate of 0.541, 8 per cent higher than a randomly sampled subgroup of two non-BLV professional astronomers, and 5 per cent higher than the mean of the complete sample, suggesting that the application can improve the access of BLV astronomers to IFS analysis. In the subgroup analysis, BLV participants were included in the professional astronomer subgroups.

The subgroup analysis revealed that the astronomers no musicians obtained the highest success rates, followed by astronomers musicians and non-experienced participants, who performed 1.14 times better than musicians. These results suggest that the additional attention required for learning aspects of unfamiliar fields could have helped the non-experienced participants with the proposed tasks.

Concerning the combined questions, the non-astronomers performed even slightly better than the astronomers (1.33 times). In these questions, musicians were the top performers, suggesting that, for the proposed task, experience in analysing sound events was more beneficial than prior knowledge of the data. Analysing the results by age, participants between 51 and 60 yr old were the top performers, suggesting that experience and attention to detail played a significant role in understanding complex auditory information.

In conclusion, although further research with additional sonification approaches and alternative spectroscopic surveys is planned, the promising trends outlined in this article suggest that the use of multimodal IFS displays can enhance the datacube analysis process and make 3D spectroscopy more accessible to BLV researchers. ViewCube has demonstrated its ability to convey information about CALIFA’s galaxies, which was understood by both experienced and non-experienced users.

ACKNOWLEDGEMENTS

We extend our gratitude to the anonymous referee for their valuable and insightful feedback, which has contributed to enhancing the quality and clarity of our manuscript.

We want to thank all the anonymous volunteers who made this research possible by testing the application and completing the survey.

This study uses data provided by the Calar Alto Legacy Integral Field Area (CALIFA) survey (https://califa.caha.es/). Based on observations collected at the Centro Astronómico Hispano Alemán (CAHA) at Calar Alto, operated jointly by the Max-Planck-Institut für Astronomie and the Instituto de Astrofísica de Andalucía (CSIC).

RGB acknowledges financial support from the Severo Ochoa grant CEX2021-001131-S funded by MCIN/AEI/10.13039/501100011033 and PID2022-141755NB-I00.

DATA AVAILABILITY

The data cubes from the CALIFA Survey are available at https://califa.caha.es/FTP-PUB/reduced/COMB/reduced_v2.2/.

The encoded cubes generated for the sonification can be found at https://zenodo.org/records/10570065.

The feedback recorded from the survey and its analysis notebooks are available at https://github.com/rgbIAA/ViewCube-Evaluation/tree/main/2024.

Footnotes

1. The application can be downloaded from: https://github.com/rgbIAA/viewcube.

2. The following video shows a demonstration of the application.

3. The corresponding sonicubes, the encoded files for 446 galaxies from the DR3 COMBO data cubes of the CALIFA survey, can be downloaded from: https://zenodo.org/records/10570065.

4. The survey form is available at:

5. The survey results and analysis notebooks are available at:

REFERENCES

Abadi M. et al., 2016, in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). USENIX, Savannah, p. 265. Available at:
Allington-Smith J., 2006, New Astron. Rev., 50, 244
de Amorim A. L. et al., 2017, MNRAS, 471, 3727
Arcand K. K., Schonhut-Stasik J. S., Kane S. G., Sturdevant G., Russo M., Watzke M., Hsu B., Smith L. F., 2024, Front. Commun., 9, 1288896
Baldi P., 2012, in Proc. ICML Workshop on Unsupervised and Transfer Learning. PMLR, Bellevue, Washington, p. 37
Bernhardt M., Cowell C., Oxford W., 2007, in Shen S. S., Lewis P. E., eds, Proc. SPIE Conf. Ser. Vol. 6565, Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XIII. SPIE, Bellingham, p. 65650D
Brown C. P., Duda R. O., 1997, in Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics. IEEE, New York, p. 4
Carty B., 2008, hrtfmove2 opcode. Available at:
Casado J., García B., 2024, RASTI, 3, 625
Casado J., Diaz-Merced W., García B., 2024, preprint ()
Charbonneau J., Novak C., Gaspar R., Ule H., 2012, J. Acoust. Soc. Am., 131, 3502
Cooke J., Díaz-Merced W., Foran G., Hannam J., Garcia B., 2017, in Bruni G., Trigo M. D., Laha S., Fukumura K., eds, Proc. IAU Symp. 378 (Vol. 14), Black Hole Winds at All Scales. Cambridge Univ. Press, Cambridge, p. 251
Ctcsound, 2022, Ctcsound library. Available at:
Dubus G., Bresin R., 2013, PLoS One, 8, e82491
Enge K. et al., 2024, Comput. Graph. Forum, 43, e15114
Foran G., Cooke J., Hannam J., 2022, Rev. Mex. Astron. Astrof. Ser. Conf., 54, 1
García-Benito R. et al., 2015, A&A, 576, A135
García Riber A., Serradilla F., 2024, J. Audio Eng. Soc., 191
Gardner W. G., 1998, in Kahrs M., Brandenburg K., eds, Applications of Digital Signal Processing to Audio and Acoustics. Springer, New York, p. 85
Ginsburg A., Sokolov V., de Val-Borro M., Rosolowsky E., Pineda J. E., Sipőcz B. M., Henshaw J. D., 2022, AJ, 163, 291
Goodfellow I., 2016, Deep Learning. MIT Press, Cambridge
Hansen B., Burchett J. N., Forbes A. G., 2020, J. Audio Eng. Soc., 68, 865
Hinton G. E., Salakhutdinov R. R., 2006, Science, 313, 504
Hunter J. D., 2007, Comput. Sci. Eng., 9, 90
Huppenkothen D., Pampin J., Davenport J. R., Wenlock J., 2023, in The 28th International Conference on Auditory Display. ICAD 2023, Norrköping, Sweden, p. 272
IECI, 2013, International Electrotechnical Commission, Geneva, Switzerland
Kahrs M., Brandenburg K., 1998, Applications of Digital Signal Processing to Audio and Acoustics. Springer Science and Business Media, Berlin
Kates J. M., 2005, Trends Amplif., 9, 45
Lazzarini V., Carty B., 2008, in Proc. 6th Linux Audio Conference. CiteSeer, Köln, Germany, p. 28
Lu Y.-C., Cooke M., 2010, IEEE Trans. Audio Speech, 18, 1793
Mas-Buitrago P. et al., 2024, A&A, 687, A205
McFee B., Raffel C., Liang D., Ellis D. P., McVicar M., Battenberg E., Nieto O., 2015, in SciPy. Austin, Texas, p. 18
Middlebrooks J. C., 2015, Handbook Clinical Neurol., 129, 99
Møller H., 1992, Appl. Acoust., 36, 171
Ng A. et al., 2011, CS294A Lecture Notes, 72, 1
Portillo S. K., Parejko J. K., Vergara J. R., Connolly A. J., 2020, AJ, 160, 45
Quinton M., McGregor I., Benyon D., 2020, in Proc. 15th International Audio Mostly Conference. Association for Computing Machinery, New York, p. 191
Quinton M., McGregor I., Benyon D., 2021, in Proc. 16th International Audio Mostly Conference. Association for Computing Machinery, New York, p. 72
van Riesen S. A., Gijlers H., Anjewierden A. A., de Jong T., 2022, Interact. Learn. Envir., 30, 17
Rönnberg N., Jimmy J., 2016, in ISon 2016, 5th Interactive Sonification Workshop. CITEC, Bielefeld University, Germany, p. 63
Sánchez S. F. et al., 2012a, A&A, 538, A8
Sánchez S. et al., 2012b, A&A, 538, A8
Sánchez S. F. et al., 2016, A&A, 594, A36
Science Software Branch at STScI, 2012, Astrophysics Source Code Library, record ascl:1207.011
Smith D. R., Walker B. N., 2005, Appl. Cogn. Psychol., 19, 1065
Trayford J. W., Harrison C. M., 2023, in 28th International Conference on Auditory Display. ICAD 2023, Norrköping, Sweden, p. 249
Trayford J. W., Harrison C., Hinz R., Kavanagh Blatt M., Dougherty S., Girdhar A., 2023, RAS Techn. Instrum., 2, 387
Walcher C. et al., 2014, A&A, 569, A1
Walker B. N., Nees M. A., 2011, in Hermann T., Hunt A., Neuhoff J. G., eds, The Sonification Handbook, Vol. 1. COST, Berlin, p. 9
West R., Johnson V., Yeh I. C., Thomas Z., Tarlton M., 2018, Electronic Imaging, 30, 1
Xiang Y., Gu S., Cao D., 2022, MNRAS, 514, 4781
APPENDIX A: SYNTHESIZER DESCRIPTION

The equation of the six-oscillator additive synthesizer implemented in SoniCube can be expressed for each sonification as:

(A1)

where Ai is the A-weighting coefficient for each frequency, F is the median of the absolute flux of the represented galaxy spectrum, r is the ratio or slope of the dynamic range limiter/compressor used to control salient flux values, fi are the fundamental frequencies obtained through the autoencoder dimension reduction process (six dimensions), and φi are the relative phases of the oscillators (in our case, all set to zero).
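A plausible form of equation (A1), assuming a standard additive-synthesis expression consistent with the definitions above (the placement of the compressed flux term F^r is an assumption), is:

```latex
% Hedged sketch of equation (A1): six A-weighted partials scaled by the compressed median flux F^r.
S(t) = F^{\,r} \sum_{i=1}^{6} A_i \sin\!\left(2\pi f_i t + \phi_i\right)
```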

Regarding the loudness of each sonification, A-weighting coefficients are calculated for each of the six frequencies or formants following IECI (2013):

(A2)

with RA(fi) calculated from the expression:

(A3)
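For reference, the standard A-weighting expressions defined by the IEC standard, which match the definitions used here, read:

```latex
% A-weighting in dB for frequency f_i (standard IEC form), corresponding to equation (A2)
A_i = 20\,\log_{10}\!\big(R_A(f_i)\big) + 2.00~\mathrm{dB}

% Frequency response term, corresponding to equation (A3)
R_A(f) = \frac{12194^2\, f^4}
              {\left(f^2 + 20.6^2\right)\,
               \sqrt{\left(f^2 + 107.7^2\right)\left(f^2 + 737.9^2\right)}\,
               \left(f^2 + 12194^2\right)}
```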

Fig. A1 shows the transfer level curve of the two-stage dynamic range limiter/compressor included in SoniCube for preventing ear damage. The slope of each stage is represented by r in equation (A1).

Figure A1. Transfer level curve used for the control of salient flux values that may cause ear damage when representing unexplored data cubes.

A stereo reverberation processor based on eight delay lines (Kahrs & Brandenburg 1998) is added to the signal flow to provide information about the distance from the selected spaxel to the centre of the galaxy. The final signal sent to the binaural encoder can be expressed as:

(A4)

where S(t) is the output of the additive synthesizer, d is the distance of the user-selected spaxel to the reference point (centre of the galaxy), g is the gain of the feedback loop (in our case set to 0.9 to provide a long reverberation effect), and τ is the fixed delay time applied to each line.
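A hedged sketch of equation (A4), assuming a single feedback delay line per channel (the actual processor uses eight) with the direct-to-reverberant balance driven by the normalized distance d:

```latex
% Hedged sketch of equation (A4): dry signal mixed with a feedback delay,
% with the wet fraction controlled by the normalized distance d.
Y(t) = (1 - d)\, S(t) + d \left[ S(t - \tau) + g\, Y(t - \tau) \right]
```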

APPENDIX B: BINAURAL ENCODING

Human hearing can locate sound sources in a 3D space through the analysis of interaural level differences (ILD), and interaural time delays (ITD). The alterations produced in a sound when travelling from the source to the listener can be characterized using head related transfer functions (HRTF) (Brown & Duda 1997). Binaural systems are based on the convolution of the HRTF of both ears with the sound sources, allowing the spatialization of static and dynamic locations (Lazzarini & Carty 2008).

The relationship between the sound source and the signal reaching the listener’s ears can be expressed in terms of the azimuth (AZ) and elevation (EL) angles, the distance to the source (d), and the angular frequency (ω), as shown in the following equation:

(B1)

where YL,R are the audio spectra of the acoustic signals at the listener’s ears, HL,R are the HRTFs, and XL,R are the audio spectra of the sound source.
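Written out, this relation takes the standard HRTF filtering form (assumed here for equation B1):

```latex
% Assumed form of equation (B1): HRTF filtering of the source spectra.
Y_{L,R}(AZ, EL, d, \omega) = H_{L,R}(AZ, EL, d, \omega)\; X_{L,R}(\omega)
```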

SoniCube uses the binaural hrtfmove Csound opcode (Carty 2008) to represent the spectral information of the data cubes in an immersive 2D soundscape correlated to ViewCube’s UI. The azimuth is calculated from the (x,y) coordinates and the elevation is set to zero to discard the third dimension. Distance is not used, to keep the levels of the sonification flux-dependent, reducing the general expression to:

(B2)
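A hedged sketch of this reduced expression (equation B2):

```latex
% Assumed reduced form of equation (B2): elevation fixed to zero, distance unused.
Y_{L,R}(AZ, \omega) = H_{L,R}(AZ, \omega)\; X_{L,R}(\omega)
```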