ABSTRACT

Using sonification in scientific data analysis provides additional dimensions to visualization, potentially increasing researchers’ analytical capabilities and fostering inclusion and accessibility. This research explores the potential of multimodal integral field spectroscopy applied to galaxy analysis through the development and evaluation of a tool that complements the visualization of data cubes with sound. The proposed application, ViewCube, provides interactive visualizations and sonifications of spectral information across a 2D field of view, and its architecture is designed to incorporate future sonification approaches. The first sonification implementation described in this article uses a deep learning module to generate binaural unsupervised auditory representations. The work includes a qualitative and quantitative user study based on an online questionnaire, aimed at both specialized and non-specialized participants, focusing on the case study of data cubes of galaxies from the Calar Alto Integral Field Spectroscopy Area survey. Out of 67 participants who completed the questionnaire, 42 had the opportunity to test the application in person prior to filling out the online survey. Among these 42 participants, 81 per cent rated the interactive response of the tool as good or very good; 79.1 per cent of the complete sample found the application ‘Useful’, and 58.2 per cent rated its aesthetics as ‘Good’. The quantitative results suggest that all participants were able to retrieve information from the sonifications, pointing to previous experience in the analysis of sound events as more helpful than previous knowledge of the data for the proposed tasks, and highlighting the importance of training and attention to detail for the understanding of complex auditory information.

1 INTRODUCTION

The combination of visual and auditory displays can offer a better understanding of a phenomenon (Enge et al. 2024), making the use of sound for the representation of physical quantities an established area of research (Dubus & Bresin 2013). It can allow holistic interpretations of the data, facilitating the discovery of previously unseen relationships (Cooke et al. 2017), as well as single-datum analytic tasks involving point estimation and comparison, trend identification, and data structure analysis (Walker & Nees 2011).

Sonification has proven to be a very useful tool to assist in the analysis of hyperspectral data, generating sonic time-series related to the spatial and spectral content near user-selected mouse positions (Bernhardt, Cowell & Oxford 2007). It can also improve the perception of density in visualizations of complex data in parallel coordinates and scatter plots (Rönnberg & Jimmy 2016), and ensure simple access to information for blind and non-blind people, enhancing the accessibility of astronomical data and supporting the work of scientists who adopt complementary exploration methodologies (Casado, Diaz-Merced & García 2024).

Furthermore, the enhancement of visual information with sonification allows sighted and blind or low vision (BLV) communities to have astronomy experiences at similar levels (Arcand et al. 2024). Including blind users from the beginning of the design process, Casado & García (2024) proposed the case study of the galaxies from the Sloan Digital Sky Survey (SDSS) to show the possibilities of sonoUno. This multimodal application for displaying sound and images from any data set allowed for the discovery of the variable star UCAC4 459–09273 by blind students using sonification. Writing from the perspective of a blind researcher, Foran, Cooke & Hannam (2022) reported the use of StarSound in 1D high-redshift galaxy work for the verification and initial analysis of the rest-frame ultraviolet (UV) spectra of distant galaxies, and developed the touch-based sonification tool VoxMagellan to analyse 2D images and multidimensional data sets.

In the photometric and spectroscopic analysis field, Trayford et al. (2023) proposed the audification of spectral data cubes (direct conversion of data into audible frequencies) to demonstrate that physical information can be extracted directly from sound with STRAUSS (Trayford & Harrison 2023). Using Star Sounder, Huppenkothen et al. (2023) provided an interactive sonification of the Hertzsprung–Russell diagram based on the cross-match between the Kepler Stellar Table and Gaia Data Release 2 (DR2). The introduction of a sonic perspective has also been valued in the spectroscopic analysis of quasars by Hansen, Burchett & Forbes (2020), finding that sonification can enable more rapid discovery and identification of intergalactic/circumgalactic medium (IGM/CGM) system candidates than visually scanning through spectra.

Including multimodal interactivity, Starks (2018) explored the data cubes of the Antennae Galaxies radio image from the Atacama Large Millimeter/submillimeter Array (ALMA), using Galaxy player within the Soniverse project. Additionally, spatialized sonifications have been used in immersive representations of Antarctic astronomy data (West et al. 2018), and highlighted by Quinton, McGregor & Benyon (2020, 2021) as an effective parameter mapping strategy with the potential to detect sudden changes between multiple sources within the field of extrasolar planet searches.

Framed in this context of inclusive and immersive scientific representations, this article presents ViewCube, a multimodal interactive binaural tool for the analysis of data cubes with headphones. The application includes an unsupervised sonification approach, based on autoencoders. Furthermore, the work explores the potential of multimodal integral field spectroscopy (IFS) using the case study of the Calar Alto Legacy Integral Field Spectroscopy Area (CALIFA) survey (Sánchez et al. 2012a; Sánchez et al. 2016).

Providing quantitative and qualitative feedback from specialized and non-specialized users in Astronomy and Music, this paper aims to demonstrate the usefulness of sound in multimodal displays for IFS analysis, namely to: (a) estimate the position of the user-selected spaxel within the galaxy, identifying whether it is located to the left/right or front/rear in the virtual soundscape; (b) estimate the spaxel’s relative distance from the centre of the galaxy, indicating whether it is near or far from this reference point; and (c) identify the type/age of a spectrum, determining whether the spaxel corresponds to a star-forming region (spectrum with multiple and relatively strong emission lines), to an intermediate age galaxy, or to a retired galaxy. The work is expected to make IFS more accessible for BLV researchers while also enhancing the overall capabilities for datacube analysis.

2 MULTIMODAL IFS

This section describes the design strategy and implementation of ViewCube. The application combines graphical and auditory displays for the interactive multimodal analysis of data cubes. Using the case study of the CALIFA survey data cubes, the work aims at enhancing 3D spectroscopy representation with immersive sonification.

2.1 Case study

The CALIFA survey (Sánchez et al. 2012b, 2016; Walcher et al. 2014) is a public legacy survey of over 600 galaxies with an r-band isophotal major axis between 45 and 79.2 arcsec and a redshift 0.005 < z < 0.03, selected from the Sloan Digital Sky Survey (SDSS) DR7 photometric catalogue. Aimed at helping in the study of galaxy evolution through cosmic time in the Local Universe, it uses integral field spectroscopy (IFS; Allington-Smith 2006) to provide a wide-field IFU survey of galaxies that includes all morphological types, covering masses between 10^8.5 and 10^11.5 M⊙ (Sánchez et al. 2016).

The observations were obtained with the integral-field spectrograph PMAS/PPak mounted on the 3.5 m telescope at the Calar Alto observatory. The wavelength range between 3700 and 7500 Å is sampled using two different spectral setups, a low-resolution V500 mode (3745–7500 Å) with a spectral resolution of 6.0 Å (full width at half maximum, FWHM), and a medium-resolution V1200 mode (3650–4840 Å) with a spectral resolution of 2.3 Å (FWHM; García-Benito et al. 2015). CALIFA’s third Data Release (DR3; Sánchez et al. 2016) provides the public with 646 objects in the V500 setup, 484 in the V1200, and the combination of the cubes from both setups (COMBO).

The morphology references provided for each galaxy used in this work are extracted from Walcher et al. (2014).

2.2 Standalone application

ViewCube1 is a lightweight, standalone application written entirely in Python, designed for the efficient browsing of data cubes. Originally developed for the quick assessment of the quality and physical characteristics of data cubes from the CALIFA survey, and for the rapid exploration of high-level data products generated by the PyCASSO pipeline (de Amorim et al. 2017), ViewCube’s functionality has since expanded. The application now supports data cubes from any provenance, thanks to its general and flexible FITS reader. This reader is agnostic to the source of the data cubes, enabling it to handle a wide variety of data cubes from different instruments (e.g. MUSE) and surveys (e.g. MaNGA, CALIFA) across various wavelength ranges (optical, radio).

Despite its name, ViewCube is also capable of rendering Raw Stacked Spectra (RSS) formats, provided that a file mapping the positions of the fibres is available. Visualizations within ViewCube are rendered using the Matplotlib module (Hunter 2007). The primary objective of ViewCube is to facilitate a fast and effective inspection of data cubes, either for a quick quality assessment or for a focused examination of their characteristics.

As shown in Fig. 1, the user interface features two main windows: an image window, which presents a 2D map of the datacube convolved through a chosen passband, and a spectral window, which displays the spectrum corresponding to the location of the mouse pointer. Users can select different spaxels or fibres for comparison, generate an integrated spectrum, and save both individual and integrated spectra. Additionally, users can modify the filter used to convolve the data cube to produce the image in the image window, and adjust the central wavelength of the filter by dragging and dropping the filter passband shown in the spectral window.

Figure 1. ViewCube UI displaying the data cube of the spiral (Sbc) galaxy NGC 5732. 2D image window (left) and multimodal representation – spectral window and sonification – of the spaxel (35,35) (right).

In keeping with its initial exploratory purpose for quality assessment, ViewCube allows for the interactive comparison of spectra from two data cubes at the same spaxel, provided the data cubes share the same dimensions. To enhance the exploration of individual spectra and perform more advanced operations, ViewCube offers the possibility of integrating with other packages such as PySpeckit (Ginsburg et al. 2022) and PyRAF (Science Software Branch at STScI 2012). Future versions of ViewCube are planned to include a faster rendering visualization engine, as well as additional menu options for improved functionality.

2.3 Sonification module

This section describes the sound module implemented within ViewCube to allow the sonification of the spectra associated with each spatial element of a datacube. The module, named SoniCube, provides an open, comprehensive, and general-purpose multimodal tool for IFS analysis, supporting the development of future sonification techniques focused on specific potential features within data cubes. The aim of the SoniCube interface is to offer a diverse ‘palette of sonifications’, akin to the range of colour palettes available for visual 2D maps in ViewCube. This variety of sonification options allows users to extract or enhance different data characteristics by selecting specific sonification methods, much like how a colour palette reveals visual details. In this paper, we present the first sonification method implemented within SoniCube’s palette, providing an autonomous fast representation of the spectra. This first sonification ‘palette’ implementation is designed to replicate the purpose of the visual ViewCube counterpart, providing a quick qualitative overview of the data cube.

Also implemented in Python, SoniCube controls a sound synthesizer developed in Csound via Open Sound Control (OSC; Wright 2002), using the python-osc native module and the ctcsound interface (Ctcsound 2022). The module provides a real-time interactive sonification of the spectrum associated with each user-selected spaxel. In this first sonification implementation, each spectrum is converted into sound using the deep learning approach described in Section 2.4. This process provides a unique, unsupervised auditory footprint conveying the information of each spectrum in a single sound event. Each sound event is generated with an additive synthesizer using six independent oscillators, fed by a 6D latent vector which is generated by an autoencoder (Baldi 2012).

The module generates ‘on the fly’ a 6D latent vector from each user-selected spectrum. The components of this vector are interpreted as fundamental frequencies for the six oscillators that synthesize the sound. The six components are multiplied by a factor of 10 000 for scaling the latent values to audible frequencies, generating comprehensible accurate sonifications. For a formalized description of the synthesizer see Appendix  A.
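As a minimal sketch of this mapping (assuming a trained encoder like the one outlined in Section 2.4; function and variable names are illustrative), the conversion from a selected spectrum to oscillator frequencies could look like:

```python
import numpy as np

FREQ_SCALE = 10_000  # scaling factor quoted in the text to bring latent values into the audible range (Hz)

def spectrum_to_frequencies(spectrum, encoder):
    """Encode one spectrum (1901 flux values) into six oscillator frequencies."""
    z = encoder.predict(spectrum[np.newaxis, :], verbose=0)[0]  # 6D latent vector
    return z * FREQ_SCALE  # fundamental frequencies for the additive synthesizer
```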

Additionally, the module calculates the azimuth and radial distance from user-selected spaxels to the reference spaxel, which corresponds to the centre of the galaxy on each data cube. The azimuth is used to locate the auditory footprint of the spectrum within the binaural soundscape (Møller 1992) generated for each data cube, providing an immersive representation of its spectra with the listener located at the centre of the galaxy. For more information about binaural encoding see Appendix  B.
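The geometry itself is straightforward; a sketch assuming image coordinates with the reference spaxel at the galaxy centre and the listener facing the top of the map (names and angle conventions are illustrative):

```python
import numpy as np

def spaxel_to_soundscape(x, y, x_ref, y_ref):
    """Azimuth (degrees) and radial distance (spaxels) of (x, y) relative to the galaxy centre."""
    dx, dy = x - x_ref, y - y_ref
    azimuth = np.degrees(np.arctan2(dx, dy))  # 0 deg = front, +90 deg = right, -90 deg = left
    distance = np.hypot(dx, dy)
    return azimuth, distance
```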

The distance is used to feed the direct-to-reverberant energy ratio of a reverberation emulator (Gardner 1998), which provides the cognitive sensation associated with the sound field that can be found in large indoor environments. This effect is used to generate a virtual auditory cue for distance perception based on direct-to-reverberant energy ratio (Lu & Cooke 2010), related to the proximity from the user-selected spaxel to the reference spaxel in the centre of the galaxy.

The amplitude of each sonification is calculated from the absolute fluxes of the represented spectrum. To handle the wide and variable dynamic range of flux density commonly found in real sky observations, SoniCube includes two operating modes for representing relative sound amplitudes. In the flux-sensitive mode (default), the amplitude of each sonification is calculated from the logarithmic median of the absolute flux, normalized using feature scaling. This mode preserves the apparent relation of fluxes within the data cube. On the other hand, if the sensitive mode is deactivated, all spectra in the data cube are represented with the same amplitude, allowing the appreciation of regions with relatively low absolute fluxes. An additional two-stage broad-band dynamic range limiter/compressor (Kates 2005) is implemented in both modes to keep extremely salient values under control, preventing hearing damage when analysing unexplored data cubes.
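A hedged sketch of the flux-sensitive mode, assuming the logarithmic median of the absolute flux is min-max scaled across the cube (constants and names are illustrative):

```python
import numpy as np

def flux_sensitive_amplitudes(cube_fluxes):
    """cube_fluxes: array [n_spaxels, n_wavelengths]; returns one relative amplitude per spaxel."""
    log_median = np.log10(np.median(np.abs(cube_fluxes), axis=1) + 1e-12)  # small offset avoids log(0)
    lo, hi = log_median.min(), log_median.max()
    return (log_median - lo) / (hi - lo)  # feature scaling to [0, 1]
```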

To compensate for the non-linear response of human hearing in frequency and amplitude domains, an A-weighted curve is applied to the array of amplitudes using the librosa module (McFee et al. 2015). This function calculates the normalized amplification factors that provide an equal loudness response on each footprint, presenting good results for the implementation of this first sonification module evaluated with the CALIFA survey. Nevertheless, alternative loudness contours are being explored for critical future implementations (Charbonneau et al. 2012).
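For reference, librosa exposes the A-weighting curve directly; the sketch below shows how per-frequency factors could be derived (whether the weighting or its inverse is applied to the oscillator amplitudes is an implementation choice not detailed here):

```python
import numpy as np
import librosa

def a_weighted_factors(frequencies_hz):
    """A-weighting of each oscillator frequency, converted from dB to normalized linear factors."""
    weights_db = librosa.A_weighting(np.asarray(frequencies_hz, dtype=float))
    factors = 10.0 ** (weights_db / 20.0)  # dB to linear amplitude
    return factors / factors.max()         # normalized so the strongest partial is 1
```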

Finally, using the Open Sound Control (OSC) protocol, the module sends to CSound all the variables needed to synthesize the auditory footprint in a binaural soundscape. The block diagram of Fig. 2 summarizes the sound generation process. Notice the use of azimuth and distance of the selected spaxel in the binaural and reverberation blocks, as well as the normalized median flux, and the frequencies with A-weighted factors of the corresponding spectrum in the additive synthesizer.
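A minimal sketch of this OSC link with the python-osc client mentioned above; the port, address, and message layout are assumptions:

```python
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 9000)  # Csound OSC listener (port assumed)

def send_footprint(frequencies, factors, amplitude, azimuth, distance):
    """Send all variables needed by the Csound synthesizer for one auditory footprint."""
    message = [float(v) for v in frequencies] + [float(v) for v in factors] \
              + [float(amplitude), float(azimuth), float(distance)]
    client.send_message("/sonicube/footprint", message)  # hypothetical OSC address
```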

Figure 2. SoniCube block diagram. Pre-processing and real time calculations including data, OSC, and audio signal flows.

Addressing some additional aesthetic aspects of the sonification, background echo and flanger effects were added to the workflow, facilitating the binaural localization of the footprint and smoothing fast transitions between spectra.2

2.4 Autoencoding CALIFA

As mentioned in the previous section, this first sonification implementation of SoniCube uses an autoencoder architecture to provide accurate sonifications of the spectral information of a data cube. Trained with the gradient descent algorithm, these networks can reduce data dimensionality more effectively than other approaches such as principal components analysis (PCA; Hinton & Salakhutdinov 2006). Autoencoders are neural network models with the potential to learn an approximation to the identity function, providing an output that is similar to their input (Ng et al. 2011). By reducing the number of hidden units of the intermediate layer of the network, a model can learn relevant structures of the data, which can also be reconstructed from this intermediate lower dimensional representation, named latent space (Goodfellow 2016).

The dimension of the latent space depends on the data set and the architecture used. Aimed at obtaining stellar parameters using convolutional neural networks, Mas-Buitrago et al. (2024) proposed a 32D latent space autoencoder, displayed as an 8×4 matrix, for the reduction of observed CARMENES spectra and synthetic ACES model spectra. On the other hand, the reconstruction from 4D latent vectors was enough for Xiang, Gu & Cao (2022) to analyse the stellar magnetic activity using variational autoencoders on the Large Sky Multi-Object Fiber Spectroscopic Telescope (LAMOST) K2 spectra. Demonstrating the potential of variational autoencoders, Portillo et al. (2020) summarized galaxy spectral information with only six latent variables on the Sloan Digital Sky Survey (SDSS). This dimension also worked effectively for the Calcium II Triplet library (CaT) reduced with sparse autoencoders (García Riber & Serradilla 2024), which also agrees with our preliminary tests on the CALIFA survey galaxies.

Intensive testing was done with different configurations of both architectures using the COMBO (V500+V1200) data cubes of the DR3 CALIFA survey. Fig. 3 provides a comparative example for the data cube of the spiral (Scd) galaxy NGC 5406 with a six-layer 6D autoencoder, and a four-layer 6D VAE, both implemented using TensorFlow (Abadi et al. 2016).

Figure 3. Autoencoder comparative for the spiral (Scd) galaxy NGC 5406. Six-layer 6D autoencoder (solid line) versus four-layer 6D VAE (dotted line). Reconstructed spectra and residual error from the original spectrum for spaxel (34,34). Sparse autoencoder: R2 = 0.99 (spectrum), R2 = 0.98 (data cube), 39.12 per cent of the spectra with R2 > 0.9, 100 epochs, one hour per cube. VAE: R2 = 0.97 (spectrum), R2 = 0.98 (data cube), 4.92 per cent of the spectra with R2 > 0.9, 291 epochs, 5 h 30 min per cube. Normalized flux (ADU) versus wavelength (Å).

On each encoded data cube, we calculated the coefficient of determination (R2) between the original and the reconstructed sets of spectra, providing a measure of the accuracy of the reduction. As for the duration of their training processes, the VAE required 5.5 times more computation time per cube than the sparse autoencoder while providing lower results: respectively, R2 = 0.96 (VAE) versus R2 = 0.98 (sparse) for the complete data cube, with 4.92 per cent (VAE) versus 39.12 per cent (sparse) of the spectra with R2 > 0.9, and R2 = 0.96 versus R2 = 0.99 for the represented spaxel (34,34).
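As an illustration of this accuracy metric, a sketch with scikit-learn, assuming the original and reconstructed spectra are stored as two arrays of the same shape:

```python
import numpy as np
from sklearn.metrics import r2_score

def cube_r2(original, reconstructed):
    """original, reconstructed: arrays [n_spaxels, n_wavelengths]."""
    per_spectrum = np.array([r2_score(o, r) for o, r in zip(original, reconstructed)])
    global_r2 = r2_score(original.ravel(), reconstructed.ravel())
    frac_above = np.mean(per_spectrum > 0.9)  # fraction of spectra with R2 > 0.9
    return global_r2, per_spectrum, frac_above
```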

Based on these tests, a six-layer 6D sparse autoencoder module was included in SoniCube to represent ‘in real time’ the spectral information of the data cubes with low-dimensional vectors. This architecture allowed the reduction of each input spectrum Xi ∈ R^1901 (each data cube contains around 5540 spectra with 1901 flux values per spectrum) to a 6D representation Zi ∈ R^6, and the reconstruction of Xi from the latent vector Zi as X̂i ∈ R^1901.
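A hedged TensorFlow/Keras sketch of such a six-layer 6D sparse autoencoder follows; only the input dimension (1901) and latent dimension (6) come from the text, while hidden-layer sizes, activations, and the sparsity penalty are illustrative assumptions:

```python
from tensorflow.keras import layers, models, regularizers

N_WAVE, LATENT = 1901, 6  # flux values per spectrum and latent dimension (from the text)

def build_sparse_autoencoder():
    inp = layers.Input(shape=(N_WAVE,))
    # Encoder: two hidden layers narrowing down to the 6D latent space
    x = layers.Dense(512, activation="relu")(inp)
    x = layers.Dense(64, activation="relu")(x)
    z = layers.Dense(LATENT, activation="sigmoid", name="latent",
                     activity_regularizer=regularizers.l1(1e-5))(x)  # sparsity penalty
    # Decoder: mirror of the encoder reconstructing the 1901-point spectrum
    x = layers.Dense(64, activation="relu")(z)
    x = layers.Dense(512, activation="relu")(x)
    out = layers.Dense(N_WAVE, activation="linear")(x)
    autoencoder = models.Model(inp, out)
    encoder = models.Model(inp, z)
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder

# Usage on the ~5540 spectra of one data cube (spectra: array [n_spaxels, 1901]):
# autoencoder, encoder = build_sparse_autoencoder()
# autoencoder.fit(spectra, spectra, epochs=100, validation_split=0.1)
# latent_vectors = encoder.predict(spectra)  # shape (n_spaxels, 6)
```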

The model was trained on each data cube independently with around 5540 spectra. Fig. 4 provides two examples of the original and reconstructed spectra from the data cubes of NGC 5784 (Sbc) and NGC 5682 (E4).3

Figure 4. Six-layer 6D autoencoder results. Reconstructed (dashed line) and original (solid line) spectrum with residual error (dotted line) of the spaxel (35,35) from the spiral (Sbc) galaxy NGC 5784 (left), and the elliptical (E4) galaxy NGC 5682 (right). Two examples of an old galaxy and a star-forming region from the CALIFA survey. Respectively, R2 = 0.98 and R2 = 0.95. Normalized flux (ADU) versus wavelength (Å).

Fig. 5 shows three examples of the learning curves obtained during the training process with the spectra from the data cubes of NGC7047 (Sab), UGC10331 (E1), and UGC03960 (E5). These galaxies illustrate the performance of the autoencoder, corresponding respectively to the best, intermediate, and worst encoding results. The coefficients obtained ranged from 0.998 to 0.882 across the complete data set, with 49 per cent of the data cubes presenting an R2 higher than 0.96, and 4.78 per cent presenting an R2 under 0.92.

Figure 5. Learning curves showing mean square error versus epoch during the training and validation processes for the spiral (SAb) galaxy NGC7047, the elliptical (E1) galaxy UGC10331, and the elliptical (E5) galaxy UGC03960. These data cubes correspond respectively to the best, medium, and worst encoding results provided by the autoencoder. Notice the difference of scale in the y axis.

3 EVALUATION

To evaluate the potential utility of the previously described approach for the auditory analysis of galaxy data cubes, specifically using CALIFA data cubes as representative data, we conducted an anonymous online survey provided here for reference.4 The questionnaire was complemented by training videos and, to a lesser extent, in-person interactive demonstrations, targeting both specialized and non-specialized participants. All participants received the same online form, where they indicated whether or not they had experienced the application in person.

3.1 Survey design

The online survey was administered to volunteer participants from 2024 April 15 to July 31. The questionnaire featured five training videos, which could be replayed as needed, including one providing a general overview of the application and one for each specific section. The survey comprised four sections with video-supported questions that analysed various aspects of the proposal. Additionally, 12 questions were included to gather demographic information, participants’ self-reported levels of expertise in Astronomy and Music, and three qualitative assessments concerning the application’s interactivity, usefulness, and the aesthetics of the sounds employed in the sonification.

All participants were advised to use headphones and to check their correct placement in left and right ears. There was no time limit to complete the survey. The following describes the sections and questions included in the survey. Each question included several sonifications with no graphics, generated from the spectra of the galaxies NGC 5784 (Sbc), NGC 5732 (Sbc), NGC 5682 (E4), NGC 6060 (S0a), NGC 7562 (Sbc), NGC 7671 (S0), NGC7800 (Ir), NGC 2638 (Sb), and UGC 00148 (Sb). The participants could compare the questions with the examples presented in the training videos as many times as needed.

Section 1. Sound Location. This first section consisted of four questions designed to analyse the possibilities of the application to estimate sound location. Within the virtual binaural soundscape provided, the listener is virtually placed at the centre of the galaxy, facing the top of the map. The training videos of this section provided examples around the spiral galaxy (Sa) NGC 7549.

Section 2. Distance to the centre of the galaxy. This section also included four questions aimed at studying the possibilities of the application to provide auditory information about the distance from the moving cursor to the centre of the galaxy within the virtual spectral soundscape. The training videos of this section provided examples from the spiral galaxy (Sbc) NGC 5732.

Section 3. Age/Galaxy type. This section explored how multimodal representation could aid in differentiating between various Age/Galaxy types. The training videos presented three examples, illustrated in Fig. 6. These examples include: the spectrum (37,34) from a star-forming region of the spiral galaxy (S0) NGC 3395; the spectrum (36,34) near the centre of the intermediate-age spiral galaxy (Sd) NGC 2347; and the spectrum (35,32) from a region near the centre of the retired spiral galaxy (Sb) NGC 6125.

Figure 6. Age/Galaxy type examples presented in the training videos. Spectrum of a star-forming region in the spiral galaxy (S0) NGC 3395 (left), spectrum close to the centre of the intermediate-age spiral galaxy (Sd) NGC 2347 (centre), and spectrum of a region close to the centre of the retired spiral galaxy (Sb) NGC 6125 (right). The upper panels display the narrowband image, produced using the narrowband filter indicated by the filled curves in the lower spectral panels. The spectra corresponding to specific spaxels, highlighted by squares, are indicated in the upper continuum maps. The axes of the upper panels represent offsets in arcseconds relative to the centre of the galaxy. The spectra in the lower panels are plotted with wavelengths in Angstroms. The colourbar represents the flux of the data cube convolved with the filter, in units of 10^-16 erg cm^-2 s^-1.

Section 4. Combined questions. Finally, two multiple choice questions analysed the potential of the application to allow the identification of the position of the represented spectrum (left/right), its distance to the centre (close/far), and if it corresponded to a star-forming region or to a retired galaxy.

Qualitative question 1. If you have tried the application in person, please rate the multimodal experience. If not (you only saw the training videos of this questionnaire), please skip this question. Options: Very bad, Bad, Acceptable, Good, or Very good.

Qualitative question 2. Rate the potential usefulness of the multimodal display for the exploration of the CALIFA Survey. Options: Useless, Doubtfully useful, Useful, or Very useful.

Qualitative question 3. Rate the aesthetics of the sonifications. Options: Intolerable, Bad, Acceptable, Good, or Nice Sounding.

3.2 General results

The survey was completed by 67 participants,5 including 31 professional astronomers, two of them identified as blind or low vision (BLV), and 36 non-astronomers. Their ages ranged from less than 21 (1) to more than 60 yr old (10), with most of the participants ranging between 21 and 30 (18), and between 41 and 50 (21). They were mainly from Spain (50 participants) but also from Mexico, USA, Japan, Germany, China, Malta, Australia, and UK. Participants were asked about their music preferences and native language to explore whether language influences the ability to recognize sound features. Although the study included speakers of eleven different languages, the sample size was too small and diverse to draw any definitive conclusions. The same limitation applied to the analysis of music preferences.

As shown in the first graph of Fig. 7, the mean global success rate obtained by the 67 participants was 0.516, with a 68.3 per cent Jeffreys confidence interval of (0.498, 0.534) and a standard deviation of 0.169. The subgroup formed by professional astronomers (31 participants) obtained a mean success rate of 0.554, with a Jeffreys confidence interval of (0.528, 0.579) and a standard deviation of 0.188, and the subgroup of non-astronomers (randomly downsampled from 36 to 31 participants to allow direct comparison) obtained a mean success rate of 0.481, with a Jeffreys confidence interval of (0.455, 0.507) and a standard deviation of 0.171, at the same confidence level.
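For reference, the Jeffreys intervals quoted here can be obtained from the Beta posterior with scipy; a sketch, with the 68.3 per cent level taken from the text:

```python
from scipy.stats import beta

def jeffreys_interval(successes, trials, conf=0.683):
    """Equal-tailed Jeffreys interval: quantiles of the Beta(k + 1/2, n - k + 1/2) posterior."""
    alpha = 1.0 - conf
    lower = beta.ppf(alpha / 2.0, successes + 0.5, trials - successes + 0.5)
    upper = beta.ppf(1.0 - alpha / 2.0, successes + 0.5, trials - successes + 0.5)
    return lower, upper
```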

Figure 7. Evaluation results. Up-left: Average success rates for 67 participants on simple questions (left), for 31 professional astronomers (centre), and for 31 non-astronomers (right). Up-right: Average success rates on simple questions by field of expertise (balanced subgroups). From left to right: astronomers musicians, astronomers no musicians, musicians no astronomers, and non-experienced in any of the fields. Down-left: Average success rates on combined questions, global and subgroup results. Down-right: Average success rates on simple and combined questions by age groups.

Although the sample was too small to establish statistical significance, these indicative results appear to confirm that all participants were able to understand the information from the sonifications thanks to the training videos, even without previous experience in Astronomy. Agreeing with the results obtained using auditory graphs by Smith & Walker (2005), the training and context provided in the survey enhanced the performance of the participants. This is particularly notable in the analysis of the combined questions, in which non-astronomers performed 1.33 times better than professional astronomers, as can be seen in the down-left graph of Fig. 7. This result could be related to the decision to allow participants to repeat the training videos as many times as they wanted, which could benefit participants with low-to-intermediate prior knowledge (van Riesen et al. 2022).

As for the performance of BLV professional astronomers (only two participants), their results on simple questions were similar to, although slightly higher (by 8 per cent) than, those obtained by an equivalent randomly sampled group of non-BLV professional astronomers. This suggests that the application could help in bringing IFS analysis closer to BLV astronomers. In the combined questions, neither of the two BLV astronomers succeeded in the Type/Age section. In the following, their results are included in the group of professional astronomers.

3.3 Subgroup quantitative analysis

To provide further analysis of the recorded feedback, the participants were divided into four subgroups according to their self-declared level of expertise in Astronomy and Music. The Astronomers musicians subgroup included professional astronomers also identified as professional or amateur musicians. A second group of astronomers no musicians was used to analyse the influence of sound analysis experience on potential experts in the proposed analysis tasks. The Musicians no astronomers group was formed by participants identified as professional or amateur musicians who were not astronomers. Finally, the non-experienced group included the rest of the participants, declaring no experience in any of the fields. Table 1 summarizes the results obtained on simple and combined question sections.

Table 1. Quantitative evaluation. Results for simple and combined question sections shown by group of expertise and age. BLV astronomers were included respectively in the AstroMus and AstNoMus groups.

Group      Answers   Success   std      Comb. succ   Comb. std
Global     67        0.516     0.169    0.157        0.011
Astro      31        0.554     0.188    0.145        0.114
NoAstro    31        0.481     0.1709   0.193        0.091
AstroMus   11        0.545     0.229    0.136        0.064
AstNoMus   13        0.577     0.187    0.115        0.054
MusNoAst   11        0.485     0.220    0.364        0.128
Nothing    12        0.555     0.186    0.208        0.059
<21        1         0.250     0.452    0.0          0.0
21–30      18        0.555     0.195    0.083        0.039
31–40      9         0.509     0.229    0.166        0.078
41–50      21        0.555     0.193    0.214        0.033
51–60      6         0.444     0.217    0.333        0.0
>60        10        0.450     0.247    0.05         0.070
BLV        2         0.542     0.396    0            0

As shown in the up-right graph of Fig. 7, Astronomers no musicians were the best performers in the simple question sections, obtaining an average success rate of 0.577, with a 68.3 per cent Jeffreys confidence interval of (0.573, 0.616) and a standard deviation of 0.187. It is worth mentioning that the non-experienced group performed as well as the Astronomers musicians, achieving an average success rate of approximately 0.55 with a standard deviation of 0.186, at the same confidence level. This performance was 1.14 times better than that of the Musicians no astronomers, which may suggest that the additional focus required to learn about unfamiliar fields helped the non-experienced participants with the proposed tasks.

The number of correct responses by group of expertise is provided in Fig. 8. As for the combined questions, the success rates obtained were notably lower than those obtained in simple questions for all groups, with the exception of Musicians no astronomers. This suggests that the experience in the analysis of sound events was more helpful than previous knowledge of the data for the proposed task.

Figure 8. Success rate by question and group of expertise (notice the difference in the number of participants). Results for Astronomers versus Non-astronomers (up), and subgroup rates for Astronomers musicians, Astronomers non-musicians, Musicians non-astronomers, and Non-experienced participants (down). Sound location questions referenced as ‘Loc’, distance to the centre of the galaxy questions referenced as ‘Dist’, type of galaxy questions referenced as ‘Type’, and combined questions (success = all multiple choice options correct) referenced as ‘Comb’. Dotted lines represent the random-choice reference rate for each question. Notice random choice results for non-experienced participants in questions Loc-4, Dist-7, and Comb-13.

Further analysis of the combined responses revealed that participants rarely answered all three aspects of the combined questions correctly at the same time, although they performed well on each aspect separately. From the complete sample, 74.63 per cent of the participants located the sonification correctly, 46.27 per cent marked the correct distance to the centre of the galaxy, and 36.57 per cent successfully interpreted the age of the galaxy. The respective averaged success rates from the simple questions per section were 53.2 per cent for the location questions, 63.30 per cent for the distance analysis questions, and 40.7 per cent for the type of galaxy questions.

As illustrated by Fig. 9, the Type/Age questions had the lowest relative success rates across all groups, possibly due to the level of abstraction involved in these tasks. This result suggests that interpreting a galaxy’s type or age through sound in this specific sonification implementation requires more training than the other tasks, which were more intuitive and aligned with the participants’ prior experience. It is worth mentioning the exception of the astronomers no musicians, who obtained the worst results in the interpretation of the distance to the centre of the galaxy. This is consistent with the lower accuracy of auditory distance perception compared to horizontal localization, as discussed by Middlebrooks (2015).

Figure 9. Average success rate of the two combined questions by blocks (location, distance, and type/age). From left to right, success rate for Astronomers versus Non-astronomers, and subgroup rates for Astronomers musicians, Astronomers non-musicians, Musicians non-astronomers, and Non-experienced participants. The dotted line indicates the success rate expected by random choice (0.33) considering three possible responses by block (correct, incorrect, and not answered), since 19 participants did not enter any response in some blocks.

Analysing the results of the combined questions by age (down-right graph of Fig. 7), the group ranging from 51 to 60 yr old (6 participants) performed 1.56 times better than the 41–50 subgroup (21 participants), and 1.73 times better than the 31–40 subgroup (9 participants), suggesting that experience and attention to detail played an important role in the understanding of complex auditory information. Note that the sample sizes do not provide statistical significance.

Fig. 10 provides an additional comparison between the results of the 42 participants who could try the application in person and the 25 participants who only used the training videos. Live testers presented a global success rate of 0.51 versus 0.43 for participants trained only with the videos. Nevertheless, the best results were obtained by the astronomers non-musicians subgroup (4 participants) using only the training videos with no repetition restrictions (0.65), followed by the non-experienced subgroup (13 participants) testing the application live (0.60). These results suggest that both training methods were useful for the proposed tasks.

Figure 10. Success rate on simple questions for participants trained only with videos (V) versus participants testing the application live (L). From left to right, global results, astronomers versus non-astronomers, and expertise subgroups. The dotted line shows the averaged random choice rate (0.34) from questions with 2, 3, and 4 possible responses.

3.4 Qualitative feedback

To evaluate the interactivity, usefulness, and aesthetics of the proposal, the three qualitative questions described in Section 3.1 were included in the survey. Of the 67 participants that completed the survey, 42 tested the application in person. As shown in Fig. 11, 81 per cent of these participants rated the interactive response of the application as ‘Good’ or ‘Very good’, 79.1 per cent of the complete sample of participants (67) found the application ‘Useful’ or ‘Very useful’, 19.4 per cent ‘Doubtfully useful’, and one participant (1.49 per cent) considered it ‘Useless’. Regarding the aesthetics, 58.2 per cent rated it as ‘Good’ or ‘Nice sounding’, 34.33 per cent as ‘Acceptable’, and 5.97 per cent as ‘Bad’. The subgroup results are also available in Table 2. Additionally, seven participants explicitly expressed that they found it difficult to differentiate the sounds, and eight participants explicitly expressed their enthusiasm about the project.

Figure 11. Qualitative evaluation. Interactivity: feedback from 42 participants who tested the application ‘in person’. Usefulness and aesthetics: full sample, 67 participants. 81 per cent declared that the application had a ‘good interactivity’, 79.1 per cent found it ‘useful’, and 58.2 per cent ‘good sounding’.

Table 2. Qualitative evaluation. Percentages by group of expertise. Asterisk values correspond to the sample of participants that tested the application in person. BLV astronomers were included respectively in the AstroMus and AstNoMus groups.

Group      Answers   Good interactivity*   Useful   Good sound
Global     42*/67    80.95                 79.10    58.21
Astro      21*/31    76.19                 74.19    51.61
NoAstro    17*/31    82.35                 80.64    64.52
AstroMus   8*/11     75.0                  63.64    45.45
AstNoMus   9*/13     66.66                 79.92    53.85
MusNoAst   6*/11     83.33                 81.82    63.64
Nothing    9*/12     55.55                 83.33    58.33
BLV        1*/2      100.0                 100.0    100.0

4 CONCLUSIONS

The design and evaluation of fast and efficient multimodal interactive tools for the exploration of IFS data cubes can help in the analysis of current massive spectroscopic surveys, complementing the possibilities of visual representations, and fostering inclusion and accessibility.

This article provides a summary of the motivations and design strategies used in the development of the interactive multimodal binaural application ViewCube, and a user study with specialized and non-specialized participants analysing selected galaxies from the CALIFA survey. The complete work was aimed at exploring the potential of multimodal IFS for the analysis of data cubes, with the motivation of making them more accessible for blind and low vision (BLV) researchers, and more immersive for complementary exploration through sound.

The tool allows the exploration of a wide variety of data cubes from different instruments and surveys across various wavelength ranges and includes a deep learning sonification approach to provide accurate comprehensible sonifications of the spectral information of the data cubes. The approach serves as an initial, general-purpose sonification tool designed to provide an overview of spectral properties, particularly in terms of stellar age and emission lines. In this context, the autoencoder effectively captures the general characteristics of both the strong emission lines (if present) and the continuum.

Regarding the qualitative feedback obtained from the 42 participants (including 21 professional astronomers) who evaluated the application in person, it can be concluded that the interactivity of the application is good or very good (80.95 per cent), with 79.1 per cent of the complete sample of participants (including 31 professional astronomers) finding it ‘Useful’ or ‘Very useful’, and 58.21 per cent rating its sound as ‘Good’ or ‘Nice sounding’.

The quantitative evaluation of the application was conducted through an online survey featuring five training videos. The questionnaire was structured in four blocks aimed at analysing the potential of the application for the estimation by sound of: (1) the position of a user-selected spaxel in the virtual soundscape generated by a data cube (left, right, front, or rear); (2) the distance of the user-selected spaxel to the centre of the represented galaxy (close to the centre, intermediate distance, or far from the centre); (3) the type/age of the spectrum of the user-selected spaxel (star-forming region, intermediate age, or retired galaxy); and (4) all three characteristics combined.

Although the sample was too small to provide statistical significance, the results suggest that all participants (experienced and non-experienced) were able to retrieve information from the sonifications, presenting an average success rate of 0.516, with professional astronomers performing 1.15 times better than non-astronomers.

The sample included two professional astronomers self-declared as BLV. This group presented an average success rate of 0.541, 8 per cent higher than a randomly sampled subgroup of two non-BLV professional astronomers, and 5 per cent higher than the mean of the complete sample, suggesting that the application can improve the access of BLV astronomers to IFS analysis. In the subgroup analysis, BLV participants were included in the professional astronomer subgroups.

The subgroup analysis revealed that the astronomers no musicians obtained the highest success rates, followed by astronomers musicians and non-experienced participants, who performed 1.14 times better than musicians. These results suggest that the additional attention required for learning aspects of unfamiliar fields could have helped the non-experienced participants with the proposed tasks.

Concerning the combined questions, the non-astronomers performed even slightly better than the astronomers (1.33 times). In these questions, musicians were the top performers, suggesting that, for the proposed task, experience in analysing sound events was more beneficial than prior knowledge of the data. Analysing the results by age, participants between 51 and 60 yr old were the top performers, suggesting that experience and attention to detail played a significant role in understanding complex auditory information.

In conclusion, although further research with additional sonification approaches and alternative spectroscopic surveys is planned, the promising trends outlined in this article suggest that the use of multimodal IFS displays can enhance the datacube analysis process and make 3D spectroscopy more accessible to BLV researchers. ViewCube has demonstrated its ability to convey information about CALIFA’s galaxies, which was understood by both experienced and non-experienced users.

ACKNOWLEDGEMENTS

We extend our gratitude to the anonymous referee for their valuable and insightful feedback, which has contributed to enhancing the quality and clarity of our manuscript.

We want to thank all the anonymous volunteers who made this research possible by testing the application and completing the survey.

This study uses data provided by the Calar Alto Legacy Integral Field Area (CALIFA) survey (https://califa.caha.es/). Based on observations collected at the Centro Astronómico Hispano Alemán (CAHA) at Calar Alto, operated jointly by the Max-Planck-Institut für Astronomie and the Instituto de Astrofísica de Andalucía (CSIC).

RGB acknowledges financial support from the Severo Ochoa grant CEX2021-001131-S funded by MCIN/AEI/10.13039/501100011033 and PID2022-141755NB-I00.

DATA AVAILABILITY

The data cubes from the CALIFA Survey are available at https://califa.caha.es/FTP-PUB/reduced/COMB/reduced_v2.2/.

The encoded cubes generated for the sonification can be found at https://zenodo.org/records/10570065.

The feedback recorded from the survey and its analysis notebooks are available at https://github.com/rgbIAA/ViewCube-Evaluation/tree/main/2024.

Footnotes

1. The application can be downloaded from: https://github.com/rgbIAA/viewcube.

2. The following video shows a demonstration of the application.

3. The corresponding sonicubes, the encoded files for 446 galaxies from the DR3 COMBO data cubes of the CALIFA survey, can be downloaded from: https://zenodo.org/records/10570065.

4. The survey form is available at:

5. The survey results and analysis notebooks are available at:

REFERENCES

Abadi M. et al., 2016, in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). USENIX, Savannah, p. 265. Available at:
Allington-Smith J., 2006, New Astron. Rev., 50, 244
de Amorim A. L. et al., 2017, MNRAS, 471, 3727
Arcand K. K., Schonhut-Stasik J. S., Kane S. G., Sturdevant G., Russo M., Watzke M., Hsu B., Smith L. F., 2024, Front. Commun., 9, 1288896
Baldi P., 2012, in Proc. ICML Workshop on Unsupervised and Transfer Learning. PMLR, Bellevue, Washington, p. 37
Bernhardt M., Cowell C., Oxford W., 2007, in Shen S. S., Lewis P. E., eds, Proc. SPIE Conf. Ser. Vol. 6565, Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XIII. SPIE, Bellingham, p. 65650D
Brown C. P., Duda R. O., 1997, in Proceedings of 1997 Workshop on Applications of Signal Processing to Audio and Acoustics. IEEE, New York, p. 4
Carty B., 2008, hrtfmove2 opcode. Available at:
Casado J., García B., 2024, RASTI, 3, 625
Casado J., Diaz-Merced W., García B., 2024, preprint ()
Charbonneau J., Novak C., Gaspar R., Ule H., 2012, J. Acoust. Soc. Am., 131, 3502
Cooke J., Díaz-Merced W., Foran G., Hannam J., Garcia B., 2017, in Bruni G., Trigo M. D., Laha S., Fukumura K., eds, Proc. IAU Symp. 378 (Vol. 14), Black Hole Winds at All Scales. Cambridge Univ. Press, Cambridge, p. 251
Ctcsound, 2022, Ctcsound library. Available at:
Dubus G., Bresin R., 2013, PLoS One, 8, e82491
Enge K. et al., 2024, Comput. Graph. Forum, 43, e15114
Foran G., Cooke J., Hannam J., 2022, Rev. Mex. Astron. Astrof. Ser. Conf., 54, 1
García-Benito R. et al., 2015, A&A, 576, A135
García Riber A., Serradilla F., 2024, J. Audio Eng. Soc., 191
Gardner W. G., 1998, in Kahrs M., Brandenburg K., eds, Applications of Digital Signal Processing to Audio and Acoustics. Springer, New York, p. 85
Ginsburg A., Sokolov V., de Val-Borro M., Rosolowsky E., Pineda J. E., Sipőcz B. M., Henshaw J. D., 2022, AJ, 163, 291
Goodfellow I., 2016, Deep Learning. MIT Press, Cambridge
Hansen B., Burchett J. N., Forbes A. G., 2020, J. Audio Eng. Soc., 68, 865
Hinton G. E., Salakhutdinov R. R., 2006, Science, 313, 504
Hunter J. D., 2007, Comput. Sci. Eng., 9, 90
Huppenkothen D., Pampin J., Davenport J. R., Wenlock J., 2023, in The 28th International Conference on Auditory Display. ICAD 2023, Norrköping, Sweden, p. 272
IECI, 2013, International Electrotechnical Commission, Geneva, Switzerland
Kahrs M., Brandenburg K., 1998, Applications of Digital Signal Processing to Audio and Acoustics. Springer Science and Business Media, Berlin
Kates J. M., 2005, Trends Amplif., 9, 45
Lazzarini V., Carty B., 2008, in Proc. 6th Linux Audio Conference. CiteSeer, Köln, Germany, p. 28
Lu Y.-C., Cooke M., 2010, IEEE Trans. Audio Speech, 18, 1793
Mas-Buitrago P. et al., 2024, A&A, 687, A205
McFee B., Raffel C., Liang D., Ellis D. P., McVicar M., Battenberg E., Nieto O., 2015, in SciPy. Austin, Texas, p. 18
Middlebrooks J. C., 2015, Handbook Clinical Neurol., 129, 99
Møller H., 1992, Appl. Acoust., 36, 171
Ng A. et al., 2011, CS294A Lecture Notes, 72, 1
Portillo S. K., Parejko J. K., Vergara J. R., Connolly A. J., 2020, AJ, 160, 45
Quinton M., McGregor I., Benyon D., 2020, in Proc. 15th International Audio Mostly Conference. Association for Computing Machinery, New York, p. 191
Quinton M., McGregor I., Benyon D., 2021, in Proc. 16th International Audio Mostly Conference. Association for Computing Machinery, New York, p. 72
van Riesen S. A., Gijlers H., Anjewierden A. A., de Jong T., 2022, Interact. Learn. Envir., 30, 17
Rönnberg N., Jimmy J., 2016, in ISon 2016, 5th Interactive Sonification Workshop. CITEC, Bielefeld University, Germany, p. 63
Sánchez S. F. et al., 2012a, A&A, 538, A8
Sánchez S. et al., 2012b, A&A, 538, A8
Sánchez S. F. et al., 2016, A&A, 594, A36
Science Software Branch at STScI, 2012, Astrophysics Source Code Library, record ascl:1207.011
Smith D. R., Walker B. N., 2005, Appl. Cogn. Psychol., 19, 1065
Trayford J. W., Harrison C. M., 2023, in 28th International Conference on Auditory Display. ICAD 2023, Norrköping, Sweden, p. 249
Trayford J. W., Harrison C., Hinz R., Kavanagh Blatt M., Dougherty S., Girdhar A., 2023, RAS Techn. Instrum., 2, 387
Walcher C. et al., 2014, A&A, 569, A1
Walker B. N., Nees M. A., 2011, in Hermann T., Hunt A., Neuhoff J. G., eds, The Sonification Handbook, Vol. 1. COST, Berlin, p. 9
West R., Johnson V., Yeh I. C., Thomas Z., Tarlton M., 2018, Electronic Imaging, 30, 1
Xiang Y., Gu S., Cao D., 2022, MNRAS, 514, 4781
APPENDIX A: SYNTHESIZER DESCRIPTION

The equation of the six-oscillator additive synthesizer implemented in SoniCube can be expressed for each sonification as:

(A1)

where Ai is the A-weighting coefficient for each frequency, F is the median of the absolute flux of the represented galaxy spectrum, r is the ratio or slope of the dynamic range limiter/compressor used to control salient flux values, fi are the fundamental frequencies obtained through the autoencoder dimension reduction process (six dimensions), and φi are the relative phases of the oscillators (in our case, all set to zero).
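A plausible form of equation (A1), assuming a standard additive-synthesis expression consistent with the definitions above (the placement of the compressed flux term F^r is an assumption), is:

```latex
% Hedged sketch of equation (A1): six A-weighted partials scaled by the compressed median flux F^r.
S(t) = F^{\,r} \sum_{i=1}^{6} A_i \sin\!\left(2\pi f_i t + \phi_i\right)
```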

Regarding the loudness of each sonification, A-weighting coefficients are calculated for each of the six frequencies or formants following IECI (2013):

(A2)

with RA(fi) calculated from the expression:

(A3)
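For reference, the standard A-weighting expressions defined by the IEC standard, which match the definitions used here, read:

```latex
% A-weighting in dB for frequency f_i (standard IEC form), corresponding to equation (A2)
A_i = 20\,\log_{10}\!\big(R_A(f_i)\big) + 2.00~\mathrm{dB}

% Frequency response term, corresponding to equation (A3)
R_A(f) = \frac{12194^2\, f^4}
              {\left(f^2 + 20.6^2\right)\,
               \sqrt{\left(f^2 + 107.7^2\right)\left(f^2 + 737.9^2\right)}\,
               \left(f^2 + 12194^2\right)}
```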

Fig. A1 shows the transfer level curve of the two-stage dynamic range limiter/compressor included in SoniCube for preventing ear damage. The slope of each stage is represented by r in equation (A1).

Figure A1. Transfer level curve used for the control of salient flux values that may cause ear damage when representing unexplored data cubes.

A stereo reverberation processor based on eight delay lines (Kahrs & Brandenburg 1998) is added to the signal flow to provide information about the distance from the selected spaxel to the centre of the galaxy. The final signal sent to the binaural encoder can be expressed as:

(A4)

where S(t) is the output of the additive synthesizer, d is the distance of the user-selected spaxel to the reference point (centre of the galaxy), g is the gain of the feedback loop (in our case set to 0.9 to provide a long reverberation effect), and τ is the fixed delay time applied to each line.
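A hedged sketch of equation (A4), assuming a single feedback delay line per channel (the actual processor uses eight) with the direct-to-reverberant balance driven by the normalized distance d:

```latex
% Hedged sketch of equation (A4): dry signal mixed with a feedback delay,
% with the wet fraction controlled by the normalized distance d.
Y(t) = (1 - d)\, S(t) + d \left[ S(t - \tau) + g\, Y(t - \tau) \right]
```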

APPENDIX B: BINAURAL ENCODING

Human hearing can locate sound sources in a 3D space through the analysis of interaural level differences (ILD), and interaural time delays (ITD). The alterations produced in a sound when travelling from the source to the listener can be characterized using head related transfer functions (HRTF) (Brown & Duda 1997). Binaural systems are based on the convolution of the HRTF of both ears with the sound sources, allowing the spatialization of static and dynamic locations (Lazzarini & Carty 2008).

The relationship between the sound source and the signal reaching the listener’s ears can be expressed in terms of the azimuth (AZ) and elevation (EL) angles, the distance to the source (d), and the angular frequency (ω), as shown in the following equation:

(B1)

where YL,R are the audio spectra of the acoustic signals at the listener’s ears, HL,R are the HRTFs, and XL,R are the audio spectra of the sound source.
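Written out, this relation takes the standard HRTF filtering form (assumed here for equation B1):

```latex
% Assumed form of equation (B1): HRTF filtering of the source spectra.
Y_{L,R}(AZ, EL, d, \omega) = H_{L,R}(AZ, EL, d, \omega)\; X_{L,R}(\omega)
```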

SoniCube uses the binaural hrtfmove Csound opcode (Carty 2008) to represent the spectral information of the data cubes in an immersive 2D soundscape correlated to ViewCube’s UI. The azimuth is calculated from the (x,y) coordinates and the elevation is set to zero to discard the third dimension. Distance is not used, to keep the levels of the sonification flux-dependent, reducing the general expression to:

(B2)
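A hedged sketch of this reduced expression (equation B2):

```latex
% Assumed reduced form of equation (B2): elevation fixed to zero, distance unused.
Y_{L,R}(AZ, \omega) = H_{L,R}(AZ, \omega)\; X_{L,R}(\omega)
```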