-
PDF
- Split View
-
Views
-
Cite
Cite
A. Ewall-Wice, Joshua S. Dillon, J. N. Hewitt, A. Loeb, A. Mesinger, A. R. Neben, A. R. Offringa, M. Tegmark, N. Barry, A. P. Beardsley, G. Bernardi, Judd D. Bowman, F. Briggs, R. J. Cappallo, P. Carroll, B. E. Corey, A. de Oliveira-Costa, D. Emrich, L. Feng, B. M. Gaensler, R. Goeke, L. J. Greenhill, B. J. Hazelton, N. Hurley-Walker, M. Johnston-Hollitt, Daniel C. Jacobs, D. L. Kaplan, J. C. Kasper, HS Kim, E. Kratzenberg, E. Lenc, J. Line, C. J. Lonsdale, M. J. Lynch, B. McKinley, S. R. McWhirter, D. A. Mitchell, M. F. Morales, E. Morgan, Nithyanandan Thyagarajan, D. Oberoi, S. M. Ord, S. Paul, B. Pindor, J. C. Pober, T. Prabu, P. Procopio, J. Riding, A. E. E. Rogers, A. Roshi, N. Udaya Shankar, Shiv K. Sethi, K. S. Srivani, R. Subrahmanyan, I. S. Sullivan, S. J. Tingay, C. M. Trott, M. Waterson, R. B. Wayth, R. L. Webster, A. R. Whitney, A. Williams, C. L. Williams, C. Wu, J. S. B. Wyithe, First limits on the 21 cm power spectrum during the Epoch of X-ray heating, Monthly Notices of the Royal Astronomical Society, Volume 460, Issue 4, 21 August 2016, Pages 4320–4347, https://doi.org/10.1093/mnras/stw1022
- Share Icon Share
Abstract
We present first results from radio observations with the Murchison Widefield Array seeking to constrain the power spectrum of 21 cm brightness temperature fluctuations between the redshifts of 11.6 and 17.9 (113 and 75 MHz). 3 h of observations were conducted over two nights with significantly different levels of ionospheric activity. We use these data to assess the impact of systematic errors at low frequency, including the ionosphere and radio-frequency interference, on a power spectrum measurement. We find that after the 1–3 h of integration presented here, our measurements at the Murchison Radio Observatory are not limited by RFI, even within the FM band, and that the ionosphere does not appear to affect the level of power in the modes that we expect to be sensitive to cosmology. Power spectrum detections, inconsistent with noise, due to fine spectral structure imprinted on the foregrounds by reflections in the signal-chain, occupy the spatial Fourier modes where we would otherwise be most sensitive to the cosmological signal. We are able to reduce this contamination using calibration solutions derived from autocorrelations so that we achieve an sensitivity of 104 mK on comoving scales k ≲ 0.5 h Mpc−1. This represents the first upper limits on the 21 cm power spectrum fluctuations at redshifts 12 ≲ z ≲ 18 but is still limited by calibration systematics. While calibration improvements may allow us to further remove this contamination, our results emphasize that future experiments should consider carefully the existence of and their ability to calibrate out any spectral structure within the EoR window.
1 INTRODUCTION
Mapping the 21 cm transition of neutral hydrogen at high redshift promises to revolutionize our knowledge on the first generations of stars and galaxies and to provide a unique probe of the ‘Dark Ages’ preceding this first generation of luminous objects (see Barkana & Loeb 2001; Furlanetto, Oh & Briggs 2006; Morales & Wyithe 2010 for reviews). Planned instruments such as the Square Kilometre Array (SKA; Koopmans et al. 2015) and the Hydrogen Epoch of Reionization Array (HERA; Pober et al. 2014) are expected to elucidate the formation of the first luminous structures and to place strict constraints on the properties of the sources that reionized the intergalactic medium (Pober et al. 2014; Greig & Mesinger 2015). A number of experiments including the Giant Metrewave Telescope (GMRT’ Paciga et al. 2013), the Low-Frequency Array (LOFAR; van Haarlem et al. 2013), the Murchison Widefield Array (MWA Bowman et al. 2013; Tingay et al. 2013a), and the Precision Array for Probing the Epoch of Reionization (PAPER Parsons et al. 2010) are already underway to explore the challenges of separating the faint cosmological signal from bright foregrounds and to attempt a first detection of the power spectrum of the cosmological 21 cm emission line.
Thus far, these experiments have targeted redshifts between 6 and 12. During this Epoch of Reionization (EoR), ultraviolet photons from the first generations of luminous sources transformed the intergalactic medium (IGM) from predominantly neutral to ionized. Over the past several years, deep integrations have placed significant upper limits on the power spectrum during reionization (Paciga et al. 2013; Dillon et al. 2014; Parsons et al. 2014; Ali et al. 2015; Dillon et al. 2015b; Trott et al. 2016). The best upper limits of 502 mK2 at z ∼ 8.4 (Ali et al. 2015) have begun to rule out scenarios where the neutral IGM experiences little or no heating (Pober et al. 2015). Integrations at comparatively high redshifts have also been carried out. Dillon et al. (2014) put an upper limit on the power spectrum at z = 11.7 using the 32-tile MWA pathfinder. A much deeper integration at redshift 10.3 was performed with PAPER's 32-element configuration (Jacobs et al. 2015), though it was limited by residual foregrounds at the edge of the instrumental bandpass.
While observations of the power spectrum during the EoR alone will shed light on the sources and astrophysics that drove reionization, it is only the final milestone in the evolution of the neutral IGM. Before reionization, the gas was heated, most likely by the first generations of high mass X-ray binaries (HMXB; Mirabel et al. 2011) and/or hot interstellar medium (ISM; Pacucci et al. 2014). Brightness temperature fluctuations from inhomogeneous heating at these early times can yield power spectrum amplitudes that are over an order of magnitude larger than those expected during reionization (Pritchard & Furlanetto 2007; Mesinger, Ferrara & Spiegel 2013). Even before the X-ray heating, fluctuations in the brightness temperature were likely sourced by fluctuations in the Lyman α flux field from the first stars (Barkana & Loeb 2005; Pritchard & Furlanetto 2006).
The ultimate goal of 21 cm cosmology is a three-dimensional map of the entire IGM between z ≈ 200 and reionization since, at least in principle, the 21 cm line is a cosmological observable accessible all the way back through the dark ages to the decoupling of the spin temperature from the cosmic microwave background (CMB; Furlanetto et al. 2006). Even though the ionosphere obscures extraterrestrial radio signals below about 30 MHz (Jester & Falcke 2009), it is expedient to use ground-based experiments to cover as great a redshift span as possible. Because foreground amplitudes and ionospheric effects grow progressively at lower frequency, the most reasonable next step after reionization in our march into the dark ages is the Epoch of X-ray heating (EoX). The exact redshift range for the EoX depends on the astrophysical model (see for example Mesinger, Ferrara & Spiegel 2013; Mesinger, Ewall-Wice & Hewitt 2014; Pacucci et al. 2014), but a reasonable range, targeted in this work, is z = 11.6 (113 MHz) to z = 17.9 (75 MHz).
However, assuming that the number of X-rays per baryon involved in star formation is the same as what is observed in nearby star-forming galaxies (Mineo, Gilfanov & Sunyaev 2012), the H i spin temperature is expected to be heated well above the CMB by the time reionization begins, saturating the effect of heating on equation (1) (Furlanetto 2006b). Hence, direct measurements of the 21 cm line during the EoX will be necessary if we want to learn about the detailed properties of the thermal history and the astrophysical phenomena that influence it. Recent work has shown that if X-ray heating proceeds inefficiently, the current generation of interferometers will be sensitive enough to detect the power spectrum sourced by spin temperature fluctuations at z ≈ 12 (Christian & Loeb 2013). Next generation of 21 cm observatories will detect the heating power spectrum for a wide range of heating scenarios out to redshifts as high as 20 (Mesinger et al. 2014) and place percent level constraints on the properties of the earliest X-ray sources (Ewall-Wice et al. 2016a). While pre-reionization measurements are expected to shed light on the first stellar mass black holes or the hot ISM, they may also offer us insights into other astrophysical processes. It is possible for dark matter annihilation (Valdés et al. 2013) and the existence of warm dark matter (Sitwell et al. 2014) to create observable impacts on the IGM thermal history. Finally, the IGM is especially cool and optically thick during the beginning of the heating process, making it ideal for 21 cm forest (Carilli, Gnedin & Owen 2002; Furlanetto & Loeb 2002; Furlanetto 2006a; Mack & Wyithe 2012; Ciardi et al. 2013) studies should any radio loud sources exist at those redshifts. It is also possible to constrain the source population itself by detecting its signature in the 21 cm power spectrum (Ewall-Wice et al. 2014).
Complementary observations of the sky-averaged (the ‘global’) 21 cm signal with a single dipole can also explore the reionization and pre-reionization epochs and experiments such as EDGES (Bowman & Rogers 2010), LEDA (Greenhill & Bernardi 2012), DARE (Burns et al. 2012), SARAS (Patra et al. 2013), SciHI (Voytek et al. 2014), and BIGHORNS (Sokolowski et al. 2015) are beginning to take data. While demanding much greater sensitivity than global signal experiments, power spectrum measurements with an interferometer probe fine frequency scales while foregrounds occupy a limited region of Fourier-space, known as the ‘wedge’ (Datta, Bowman & Carilli 2010; Morales et al. 2012; Parsons et al. 2012; Trott, Wayth & Tingay 2012; Vedantham, Udaya Shankar & Subrahmanyan 2012; Hazelton, Morales & Sullivan 2013; Thyagarajan et al. 2013; Liu, Parsons & Trott 2014a,b; Thyagarajan et al. 2015a,b). The region of Fourier space outside of the wedge, in principle free of foregrounds and therefore having greater sensitivity to brightness temperature fluctuations, is known as the ‘EoR window’ (henceforth ‘window’). foreground modelling and calibration errors that are smooth in frequency should have limited impact within the window.
In this paper we assess the levels of systematic errors that are especially severe at the lower EoX frequencies (relative to those typical of EoR studies) including the ionosphere, radio-frequency interference (RFI), and the enhanced noise and foregrounds from a sky that is both intrinsically brighter at lower frequencies and observed with a larger primary beam. In Section 2 we describe the MWA, our observations, and data reduction. In Section 3 we address the systematic errors that are especially challenging below EoR frequencies and our efforts to mitigate them. The limiting systematic error that we encounter is fine frequency structure in the instrumental bandpass due to standing wave reflections in the cables between the MWA's beamformers and receivers. After making a reasonable assumption about the relationship between our autocorrelations and the gain amplitudes, we achieve notable improvement in calibration but we are still left with significant foreground contamination.
Power spectrum upper limits are derived (Section 4) which are broadly consistent with thermal noise except in several regions of Fourier space corresponding to the light travel time delays of the reflections. We expect that refined calibration techniques employing better foreground models (Caroll et al., in preparation) and redundant baselines (Wieringa 1992; Liu et al. 2010; Zheng et al. 2014) can improve the removal of this contamination. In order to avoid signal loss and the introduction of spurious spectral structure, we have been conservative in the number of free parameters allowed in our gain solutions; increasing these may also resolve this problem. Reduced cable lengths expected in upcoming experiments such as HERA and the SKA will ameliorate the problem of reflections.
2 OBSERVING AND INITIAL DATA REDUCTION
We begin our discussion with an overview of our observations and our data reduction procedure. Our analysis yields two different image products with 112 s cadence: high resolution continuum images created from bandwidth multifrequency synthesis (MFS) (with ≈6 arcmin resolution) where baselines across all fine frequencies are combined into a single image, and naturally weighted multifrequency data cubes, where each fine frequency channel is imaged separately and integrated over 3 h. We use the MFS images to evaluate ionospheric conditions, and we use the multifrequency data cubes in our power spectrum analysis. We note that with 112 s averaging, we are performing significant averaging over fine time-scale ionospheric effects which for the MWA baselines have a typical coherence time of ≈10–44 s (Vedantham & Koopmans 2015, hereafter V15). After outlining the instrument and our observing strategy (Section 2.1), we discuss our initial calibration procedure (Section 2.2) and finish with the production MFS images (Section 2.3) and data cubes (Section 2.4) which serve as the input to our power spectrum pipeline.
2.1 Observations with the MWA
The MWA (Lonsdale et al. 2009; Tingay et al. 2013a) is a 128 antenna interferometer located at the Murchison Radio Observatory (MRO) in Western Australia (26.70°S, 116.67° E) with an analog bandpass of 80–300 MHz. Each correlated antenna tile consists of 16 dual polarization dipole elements arranged in a four-by-four grid. The phased output of the dipoles on each tile is summed together in an analog beam-former and delivered to one of sixteen different receiver units where 30.72 MHz of bandwidth is digitized before correlation in an on-site building. We refer the reader to Prabu et al. (2015) and Ord et al. (2015) for a detailed discussion of the MWA's receivers and correlator, respectively. The instrument is designed to achieve a diverse set of science goals (Bowman et al. 2013) including a first detection of the 21 cm power spectrum during the EoR, detecting and monitoring transients, pulsars (Tremblay et al. 2015), solar and heliospheric science (Tingay et al. 2013b), and a low-frequency survey of the sky below DEC = +25° (Wayth et al. 2015).
Observations of a field centred at R.A.(J2000) = 4h0m0s and decl.(J2000)=−30°0′0′ were carried out for 4.13 h over two nights on 2013 September 5 and 6, with primary beams of the 128 MWA antenna elements (‘tiles’) formed at five different altitude/azimuth pointings each night to track the field. After flagging for RFI and anomalous behaviour that we will discuss in detail in Section 3.1, our total observation time for our power spectrum upper limit is 3.08 h. In Fig. 1 we show the relative integration time on the sky weighted by the primary beam over all observations. We observed with 40 kHz spectral resolution simultaneously over two contiguous bands; a 16.64 MHz interval between 75.52 MHz and 90.88 MHz (Band 1) and a 14.08 MHz band between 98.84 MHz and 112.64 MHz (Band 2). Both of these sub-bands overlap with the FM band (88–108 MHz). In Fig. 2 we show our observed bands superimposed on the autocorrelation spectrum of a single MWA tile. Observations in Band 1 took place right on the edge of the analogue cutoff of the MWA, making its shape relatively complicated to model in calibration (our calibration is direction independent and described in Section 2.2 and Section 3.4). Band 2, while in a flatter region of the bandpass, has a larger overlap with the FM band. Observations were divided into 112 s snapshot with data averaged into 0.5 s integration intervals by the correlator.

A radio map at 408 MHz (Haslam et al. 1982) sin-projected over the region of the sky observed in this paper. Cyan through magenta contours indicate the total fraction of observation time weighted by our primary beam gain for our 3 h of observation at 83 MHz. Red contours indicate R.A.-decl. lines. Observation tracked the position (R.A.(J2000) = 4h0m0s, decl.(J2000) = −30°0′0′) on a region of the sky with relatively little galactic contamination and dominated by the resolved sources Fornax A and Pictor A. The galactic anticentre and bright diffuse sources, such as the Gum Nebula, are below 1 per cent bore-site gain.

The autocorrelation spectrum of a single tile, showing the MWA bandpass, is plotted here (solid black line) along with the frequency ranges over which our data was taken (grey striped rectangles). Observations were performed simultaneously on two non-contiguous bands located on either side of the 88–108 MHz FM band (red shaded region) to both assess conditions within the FM band and to preserve some usable bandwidth should it have proven overly contaminated by RFI.
2.2 RFI flagging and initial calibration
The data were first flagged for RFI contamination. An optimized version of aoflagger (Offringa, van de Gronde & Roerdink 2012), called cotter (Offringa et al. 2015), was run on each snapshot with automatic RFI identification performed only on the visibility cross-correlations. Additional flags were applied to the centre and 40 kHz edges of each 1.28 MHz resolution coarse spectral channel to remove digital artefacts arising from the two stage channelization scheme used in the MWA. After flagging, visibilities were averaged in time from 0.5 to 2 s and in frequency from 40 to 80 kHz. The averaged visibilities were than converted to Common Astronomy Software Application (CASA) measurement sets (McMullin et al. 2007) which served as the inputs to all ensuing steps in our pipeline. The percentage of all data flagged by cotter was approximately 0.75 per cent for Band 1 and 2 per cent for Band 2. As described in Section 3.1, we also implement additional (and highly conservative) flagging of observations by inspecting autocorrelations which leads us to discard 25 per cent of our data. Our initial calibration was divided into three steps: A preliminary complex antenna gain solution using an approximate sky model, one iteration of self calibration, and polynomial fitting to reduce noise and limit fine frequency scale systematics that might arise from our incomplete sky model.
For the first step, our sky model combined a list of point sources with images of the two bright resolved sources in our field: Fornax A and Pictor A. For the point source model we included the 200 brightest sources at our frequencies based on extrapolated power-law fits to data from the Coolgoora survey (Slee 1995), the Molongolo Reference Catalogue (Large et al. 1981) and more recent measurements by PAPER at 145 MHz (Jacobs et al. 2011). Fornax A is the brightest extended source in our field and is highly resolved, so we modelled it with a VLA image taken by Fomalont et al. (1989) at 1.4 GHz, scaled to match the flux density and spectral index measured by Bernardi et al. (2013) at 180 MHz, and extrapolated to our band with a spectral index of −0.88. For Pictor A, we used a VLA image at 333 MHz by Perley, Roser & Meisenheimer (1997) and extrapolated to our band with a spectral index of −0.71. The model components for our initial calibration extended down to an apparent flux density of ≈5 Jy which is comparable to the flux uncertainties in the brightest sources in the initial catalog. Due to the high uncertainty in the sky at our frequencies, this model was updated by a round of self calibration which we describe shortly.
The CASAbandpass function was used to obtain a first set of best-fitting calibration gains, averaging over 32 fine channels for each solution. Since our starting model was uncertain at the 10 per cent level, an iteration of self calibration was performed by MFS imaging and deconvolving 104 components with WSClean (Offringa et al. 2014). The CLEAN components were used as a model for a second run of bandpass where we relaxed channel averaging so that the complex gain for each 80 kHz fine channel was found independently. In Fig. 3 we show the fractional change in calibration amplitude for one antenna tile over time intervals in which beamformer settings were held constant (pointings). Over our two nights we find that variations are on the order of a few per cent and dominated by uniform amplitude jumps across the entire frequency range that are strongly correlated between all antennas. Observations of the autocorrelations do not show such behaviour within each pointing so these gain jumps must arise from the calibration routine itself. Possible sources of these jumps could be variations in the overall amplitudes in self-calibration which occurs if the cleaning step does not recover all of the flux density on large scales, unmodelled sources moving through the sidelobes of the primary beam, or the result of a varying sky in the presence of beam modelling errors. Since these features do not introduce fine frequency structure, we do not think they are an impediment to the power spectrum analysis in this work.

A false-colour plot of the fractional change in our calibration amplitudes over each pointing (in which beamformer delay settings are fixed). Pointing changes are marked by the solid white horizontal lines. We see that the calibration amplitudes vary within a pointing on the order of several percent with little systematic variation. There are several coarse channel scale jumps on September 5 that correspond to observations in which the number of sources identified within a snapshot image were reduced (see Fig. 9). We found that these jumps corresponded to excess flagging from cotter, indicating high levels of RFI or other bad data, and dropped them from our analysis. Vertical lines from RFI are visible at 98 MHz and 107 MHz, especially on September 6.
When we formed power spectra from visibilities calibrated by this initial method, we found that our beamformer to receiver cables introduced spectral structure at the ≲1 per cent level into our instrumental bandpass which were not removed by this smooth model (we return to this issue in Section 3.3). The effect of this spectral structure on the MFS images, used to measure ionospheric refraction in Section 3.2, was negligible and we therefore employed this calibration for their production. A more refined calibration procedure, which we describe in Section 3.4, was used for the data cubes and our power spectra.
2.3 MFS imaging and flux scale corrections
Sources were identified using the Aegean source finder (Hancock et al. 2012). An overall flux scale for each stokes I snapshot was set following the technique, described in Jacobs et al. (2013), in which the flux scale for all sources is simultaneously fit to catalog flux densities using a Markov chain Monte Carlo method. We used the ten highest signal-to-noise point sources in each field and catalog flux densities interpolated between 74 MHz measurements from the Very Large Array Low-Frequency Sky Survey Redux (Lane et al. 2014) and 80 and 160 MHz measurements from the Culgoora catalog (Slee 1995). Since a flux scale error has no frequency dependence and the errors themselves evolve slowly in time near the centre of the primary beam, we do not think that such mismodelling will result in frequency-dependent errors. The dominant uncertainty in our flux scale is the systematic uncertainty in the model source fluxes themselves which are on the order of ≈20 per cent (Jacobs et al. 2013) while uncertainties in the beam model contribute at the several percent level (Neben et al. 2015). On September 6, we observed systematically smaller source counts than on September 5 (Fig. 9). We attribute this difference to greater ionospheric turbulence on September 6; and discuss this result further in Section 3.2.
In Fig. 4 we show a deep, primary beam corrected, integration of a portion of our field formed from an MFS image of Band 1. Known sources are well reproduced. The diffuse structures of the Vela and Puppis supernova remnants are clearly visible along with the fine scale structure of Fornax A.

A deep image of the MWA ‘EoR-01’ field centred at (R.A.(J2000) = 4h0m0s and decl.(J2000) = −30°0′0′), derived by stacking restored multifrequency synthesis Stokes XX and YY images produced by WSClean on Band 1. The dominant source in our field is Fornax A (detailed in the inset) whose structure is well recovered in imaging. Pictor A is also present at the centre of the image (at ∼30 per cent beam) along with the diffuse Puppis and Vela supernova remnants on the left.
Similar to previous upper limits on the power spectrum (Dillon et al. 2014; Parsons et al. 2014; Ali et al. 2015; Dillon et al. 2015b; Trott et al. 2016), our analysis does not consider the cross polarization products from the interferometer, which would require an additional calibration step to solve for the arbitrary phase difference between the X and Y polarized arrays (Cotton 2012; Moore et al. 2015).
2.4 Data cubes for power spectrum analysis
The inputs for our power spectrum analysis are multifrequency data cubes further calibrated with autocorrelation data from each tile. We describe the autocorrelation calibration technique in Section 3.4 where we address systematic errors revealed by our first look at data cubes and power spectra. In this section we describe our procedure for building the data cubes. While the MWA's long baselines, extending out to 2864 m, are useful for gauging ionospheric conditions and other systematics, the uv plane is completely filled out to ≲160 m after ≈3 h of rotation synthesis. Since forming image power spectra using data with incomplete uv coverage leads to unwanted mode-mixing and spectral artefacts (Hazelton et al. 2013), we threw away the sparse regions of the uv plane and reprocessed the data for our power spectrum analysis at much lower angular resolution than our MFS images. For each 112 s snapshot, we divided the data into even and odd time step visibility sets formed from every other 2 s integration step, which we later cross multiply to form power spectra without noise bias (Dillon et al. 2014). A naturally weighted, dirty snapshot cube was produced for each set with 80 kHz spectral resolution, 80 pixels on a side and 1| $_{.}^{\circ}$|0 (0| $_{.}^{\circ}$|75) resolution for Band 1 (2). To form a power spectrum we will have to uniformly weight the sum of these cubes, hence the array point spread function (PSF), sj, even/odd, which is the 2 d Fourier transform of the number of samples within each uv cell; and the analytic primary beam matrix, |$\bf{B}$|j, were also saved for each snapshot, polarization and interleaved time step. The power spectrum formalism of (Dillon, Liu & Tegmark 2013) assumes the flat sky approximation and requires a sufficiently small field of view to be computationally tractable. Hence, before creating a uniformly weighted data cube by stacking the naturally weighted images in the uv plane and dividing by their cumulative sampling function, we cropped each snapshot from 80 × 80 to 24 × 24 pixels.
By dividing by the point spread function in the uv plane, equation (5) is essentially the application of uniform weighting to our data with some additional factors of the primary beam which warrant explanation. An additional factor of the primary beam was included in the sum to upweight regions of the field to which the MWA has the greatest gain, and is equivalent to optimal mapmaking (Tegmark 1997b; Dillon et al. 2015a). Other 21 cm pipelines, notably Fast Holographic Deconvolution (Sullivan et al. 2012) perform a similar upweighting by gridding visibilities with the primary beam while the fringe rate filtering procedure (Parsons et al. 2016) weights without gridding or imaging at all.
The impact of multiplying by the two factors of the beam in equation (5) has the effect of convolving the true visibilities the uv space beam convolved with its complex conjugate. Since our uv cells are quiet large, this is well approximated by multiplication of the visibilities by the convolution of the uv beam with its complex conjugate. We therefore also include the factor B2 in the denominator to correctly normalize the data in the uv plane.
Our |${\bf \widehat{x}}$| estimate of I in equation (5) is similar to the Stokes-visibility I approximated in previous power spectrum analyses (Dillon et al. 2014, 2015b; Parsons et al. 2014; Ali et al. 2015). Visibility stokes I is susceptible to leakage from visibility stokes |$Q \equiv \frac{1}{2} (XX - YY)$| due to beam ellipticity which has the potential to introduce fine frequency structure, caused by Faraday rotation, into the EoR window (Jelić et al. 2010; Geil, Gaensler & Wyithe 2011; Moore et al. 2013; Jelić et al. 2014; Asad et al. 2015; Moore et al. 2015). We can obtain an upper limit on polarization leakage in our power spectrum estimate by considering the best upper limits to date of the polarized Q, U, and V visibility power spectra that might leak into our I visibilities, measured by Moore et al. (2015) over the large field of view of PAPER. In this analysis the authors place limits of ≈5 × 104 mK2 on Q, U, and V at ≈120 MHz. Scaling by the frequency dependence of the sky, the polarization power spectrum levels at our frequencies should be below ≈5 × 104 mK2(80 MHz/120 MHz)5 ≈ 3.7 × 105 mK2. The leakage from Q/U to I is given by equations 15 and 16 in Moore et al. (2015), and is equal to the product of the polarized power spectrum and the ratio between the integral of the differences of the X and Y polarized beams squared and the integral of the sum of the polarized beams squared. Using a short dipole model of our primary beam, we find that this ratio for the MWA beam is ≈5 × 10−3. We therefore estimate an upper limit on the stokes Q, U to I leakage in our power spectra to be ≈5 × 10−3 × 3.7 × 105 mK2 ≈ 1.5 × 103 mK2. This is slightly larger than the EoX power spectrum which is anticipated to be several hundred mK2 (Pritchard & Furlanetto 2007; Santos et al. 2008; Baek et al. 2010; Mesinger et al. 2013) so it is still possible that polarized leakage may limit a detection unless direction-dependent polarization corrections are applied. However, this number is an upper limit and the actual leakage is probably lower. As of now, the most sensitive limits on the EoR power spectrum formed from XX and YY visibilities (Ali et al. 2015) limit (Q, U) → I leakage from the similarly elliptical PAPER beam to below the level of ≈500 mK2 between k∥ ≈ 0.2–0.5 h Mpc−1 at 150 MHz while Asad et al. (2015) predict stokes polarized power spectrum from observations of the 3C196 field at ≈142 MHz to be at the level of only 102–103 mK2 at k⊥ ≲ 0.1 h Mpc−1 (see their fig. 12, panel a). Scaling this up by (140 MHz/80 MHz)5 ≈ 16 to account for the increasing sky temperature and applying our ellipticity factor of 5 × 10−3 gives a polarization leakage of ≈8–80 mK2 which is still several times smaller than the predicted amplitude of the EoX power spectrum.
In order to reduce artefacts from aliasing at the edges of the coarse channels due to the two-stage polyphase filter bank, we flagged 240 kHz at each side. Finally, uv cells with poor sampling were flagged since sampling and noise in these cells can change rapidly with frequency, leading to fine frequency artefacts (Hazelton et al. 2013).
In Fig. 5 we show the standard deviation across all uv cells at each frequency in our Band 1 difference cube along with the square root of the mean of our model variances at each frequency assuming Tsys∝f −2.6 and normalized to the centre channel. We see that they are in good agreement. We also find that the standard deviation across frequency in each uv cell is consistent with the square root of the mean across frequency of our model variances.

Left: the standard deviation over uv cells as a function of frequency for an even/odd time step difference cube after 3 h of integration. The blue line is derived from data, while the green line is a model with a system temperature of 470 K at 150 MHz and a spectral index of −2.6. Spikes in the standard deviation are present at the centre of each coarse channel since the centre channel has one half of the data due to flagging the centre channel which is contaminated by a digital artefact before averaging from 40 to 80 kHz. Right: the ratio of variance taken over frequency in each uv cell and our variance model using the same system temperature as on the left. The ratio between our model and observed variance is close to unity across the uv plane. White cells indicate uv voxels that were flagged at all frequencies due to poor sampling. All data in this figure is from Band 1.
An interesting question is whether or not our determination of Tsys is contaminated by ionospheric scintillation noise. For baselines within the Fresnel radius, |$r_F=\sqrt{\lambda h/(2\pi )}$|, of an ionospheric plasma screen of height h (which is the case for the MWA core), V15 determine the coherence time for scintillation noise to be set by the length of time it takes for overhead plasma, travelling at velocity v, to cross the Fresnel radius, 2rF/v. For typical plasma velocities of ≈100–500 km s−1 and h ≈ 600 km, consistent with measurements (Loi et al. 2015a), we obtain coherence times between 10 and 44 s at 83 MHz, which is significantly greater than the 2 s interleaving of our data cubes. Hence, ionospheric fluctuations are likely subtracted away in our differencing on 2 s intervals. For a much lower altitude of 100 km, the correlation time is still ≈4 s at 83 MHz. Even if there was still some variation between the time slices, we can put an upper limit on the variation relative to thermal noise by comparing the amplitude of scintillation noise we might expect given our primary beam and the parameters of the phase power spectrum of the ionospheric fluctuations that we determine in Section 3.2. We find that the level of scintillation noise on a single visibility in a 2 s snapshot (Appendix C) is only ≲2 per cent the thermal noise level. Further suppression of the scintillations comes from the fact that we approximate Tsys at each frequency by taking the standard deviation across the uv plane within the Fresnel zone, in which the noise is expected to be coherent and would not contribute to such a standard deviation. The coherence in frequency of the inosopheric fluctuations (V15,V16) would also suppress their contribution to the standard deviation across frequency in each uv cell. For these reasons, we expect scintillation noise to have a very small contribution to our determination of Tsys at or below the 1 per cent level.
3 ADDRESSING THE CHALLENGES OF LOW FREQUENCY OBSERVING
A number of systematics associated with observing the EoR become dramatically more challenging as one moves to higher redshift. Because the Epoch of X-ray heating (EoX) spans the FM band, we expect enhanced RFI contamination. The ionosphere's influence on electromagnetic wave propagation increases with wavelength, though its smooth evolution in frequency should cause its impact on source mis-subtraction and calibration to be contained within the wedge (Trott & Tingay 2015; Vedantham & Koopmans 2016, hereafter V16). Moving down in frequency, the larger primary beam extends foreground emission to higher delays and hence larger |$k_\Vert$| while the foregrounds increase rapidly in brightness temperature as ≈f −2.6 (Rogers & Bowman 2008; Fixsen et al. 2011). Finally, spectral structure in our gains at fixed delays move down in |$k_\Vert$| at higher redshift and, due to the increased primary beam width, occupy a greater extent in k-space as well (Thyagarajan et al. 2015a). In this section we determine the impact (if any) of each these obstacles on our power spectrum analysis and describe our strategies for mitigating them. We deal with RFI, ionosphere, and spectral structure in Sections 3.1, 3.2, and 3.3/3.4, respectively.
3.1 Radio-frequency interference
As explained above, automated RFI detection and flagging was performed using cotter on the 0.5 s, 40 kHz resolution cross-correlations before time and frequency averaging. To illustrate the time-frequency structure of RFI contamination, we plot the fraction of visibilities flagged at each fine frequency channel and 112 s snapshot interval in Fig. 6. One can see that the majority of Band 1 is clear of RFI. Even within the region overlapping with the FM, events are sparse in time and frequency. In Band 2 we see significantly greater interference, especially in the two lowest coarse channels. There are clearly a greater number of events contained within the FM band however they only appear intermittently with the exception of a handful of 40 kHz fine-channels. The existence of intermittent FM signals, even in a radio quite site such as the MRO is possibly due to signals from over the horizon, reflected off of the bottom of the ionosphere

The percentage of visibilities flagged for RFI by cotter as a function of time and frequency. White regions indicate missing data including the coarse band edges and blue-dashed vertical lines indicate the edge of the FM band. While Band 1 remains predominantly clear with a few sparse events within the lower end of the FM band, contamination is significantly greater over Band 2. Even in the FM band, RFI events are either isolated in frequency or time, allowing us to flag them. A handful of observations in Band 1 on September 5 are missing entire coarse channels which we also discard. Bar plots on the bottom and right show the averages of the RFI flagging fraction over time and frequency, respectively.
It is possible for interference that is present for extended periods of time but weak enough to remain below the 112 s noise floor over which RFI is flagged to contaminate our data. RFI can also make it past flagging through calibration solutions which are derived from the autocorrelations on which cotter does not perform flagging (Section 3.4). When we first created integrated data cubes there were clear signs in Band 2 that some low level RFI contamination remained in the form of ripples in the power spectrum and spikes in the frequency domain of our gridded visibility cubes. We identified and discarded observations that appeared to contain increased flagging for a wide range of frequencies and completely flagged any channels that contained spikes in our final data cubes.
The lower two coarse channels in Band 2 were contaminated for a wide range of times (Fig. 6) and were thus excluded from our power spectrum cubes entirely. In addition, we flagged a total of 9, 80 kHz channels (7 per cent of our data) which appeared to be contaminated by RFI at contiguous intervals over a significant number of observations; 104.08, 104.48, 106.08, 106.24, 106.32, 107.44, 107.60, and 107.84 MHz. The rest of the FM band appears clean after routine cotter flagging is applied. After these channels are discarded, we see no evidence that our 3 h power spectrum results are limited by RFI (see the end of Section 4.1 for further discussion).
3.2 Ionospheric contamination
The refraction induced by gradients in the total electron content (TEC) of the ionosphere scales as λ2 and is therefore expected to be more severe at our frequencies than those associated with the EoR. In this section we quantify the severity of ionospheric conditions over our observations by measuring the differential refraction of point sources relative to known catalog positions. We find that the ionospheric gradients change considerably over the duration of our observations, despite nearly constant and mild solar weather indicators (Table 1). However, when we form power spectra from data with nearly a factor of 2 difference in the observed gradients, we see no effect on the power spectrum within the EoR window (Fig. 22). The analysis presented here is meant to assess the level of refraction and, in Section 4.3, its impact on the power spectrum. Readers who are interested in a more detailed analysis of TEC gradients over the MRO and an interpretation of their physical origin or their impact on time domain astrophysics should consult Loi et al. (2015a,b).
Bulk ionospheric and solar weather properties on the two nights of observing presented here. 〈TEC〉 indicates the mean total electron content over the entire night. We also show the diffractive scale calculated from all source separations on both nights which differ by a factor of 2.
Night . | Kp . | 〈TEC〉 (TECU) . | F10.7 (sfu) . | 〈rdiff〉 (km) . |
---|---|---|---|---|
September 5 | 2 | 10.6 | 10.9 | |$10.4\pm _{0.2}^{0.4}$| |
September 6 | 2 | 11.4 | 11.0 | |$5.20\pm _{.05}^{.04}$| |
Night . | Kp . | 〈TEC〉 (TECU) . | F10.7 (sfu) . | 〈rdiff〉 (km) . |
---|---|---|---|---|
September 5 | 2 | 10.6 | 10.9 | |$10.4\pm _{0.2}^{0.4}$| |
September 6 | 2 | 11.4 | 11.0 | |$5.20\pm _{.05}^{.04}$| |
Bulk ionospheric and solar weather properties on the two nights of observing presented here. 〈TEC〉 indicates the mean total electron content over the entire night. We also show the diffractive scale calculated from all source separations on both nights which differ by a factor of 2.
Night . | Kp . | 〈TEC〉 (TECU) . | F10.7 (sfu) . | 〈rdiff〉 (km) . |
---|---|---|---|---|
September 5 | 2 | 10.6 | 10.9 | |$10.4\pm _{0.2}^{0.4}$| |
September 6 | 2 | 11.4 | 11.0 | |$5.20\pm _{.05}^{.04}$| |
Night . | Kp . | 〈TEC〉 (TECU) . | F10.7 (sfu) . | 〈rdiff〉 (km) . |
---|---|---|---|---|
September 5 | 2 | 10.6 | 10.9 | |$10.4\pm _{0.2}^{0.4}$| |
September 6 | 2 | 11.4 | 11.0 | |$5.20\pm _{.05}^{.04}$| |
Cohen & Röttgering (2009) have observed ionospheric TEC gradients with the VLA using the differential refraction statistic and Helmboldt & Intema (2012) have measured the 2 d power spectrum of spatial and temporal fluctuations in TEC over the VLA. A similar power spectrum analysis exploiting the MWA's much larger field of view was recently carried out by Loi et al. (2015a,b). Because the ionosphere is curved and the derivation of equation (11) relies on the Fresnel approximation this model is only strictly valid for small fields of view so we only measure source shifts within 15 degrees of the phase centre. To obtain a global picture of ionospheric conditions, we turn to the differential refraction statistic employed in Cohen & Röttgering (2009) which we now briefly describe.
If |$\boldsymbol{\Delta } \boldsymbol{\theta } = {\bf \Delta } \boldsymbol{\theta }_1 - {\bf \Delta } \boldsymbol{\theta }_2 = (\Delta \alpha , \Delta \delta )$| is a two-dimensional vector with each component distributed with standard deviation σ, than the probability density function of the amplitude square is given by an exponential distribution. We compute an estimate of D(θ), |$\widehat{D}(\theta )$|, within each angular separation bin by fitting a histogram of the lower 80 per cent of source separations to an exponential distribution in order to eliminate potentially spurious outliers. D(θ), as will be explained, is directly related to the power spectrum of phase fluctuations whose properties we will determine below.
In Fig. 7 we show the differential refraction computed from all differential source pair separations over 30 min on 2013 September 5 and 6 for both of our observing bands. We see that the level of fluctuations recorded in both bands scales as λ2 indicating that it indeed originates from ionospheric effects. Kassim et al. (1993) use a similar comparison to confirm ionospheric refraction as a source of variation in visibility phases. On September 6, the levels of refraction are approximately a factor of 2 greater than those observed on September 5, peaking at the end of the night.

Differential refraction derived from source pairs within 30 min bins on 2013 September 5 (top row) and 2013 September 6 (bottom row). Band 1 (black points) scaled by the ratio of the band centre frequencies square (solid black line) very nicely predicts the differential refraction in Band 2 (red points), indicating that the refraction we are measuring here is indeed due to ionospheric fluctuations. The magnitude of ionospheric activity differs significantly between September 5 and 6 and peaks over the last observations taken on the 6. We also show fits to an isotropic power spectrum model of differential refraction at 83 MHz (dashed black line) and print the inferred diffractive scale.

The dimensionless integral, Fn(x) normalized to unity at Fn(10), given in equation (20). For small values of x, Fn(X) is well approximated by a power law, but flattens out towards x = 1. Hence, the structure function of observed source offsets levels out at the outer energy injection scale of the turbulence.
Given our fitted values, we can also compute the diffractive scale of the ionospheric fluctuations rdiff which gives the scale at which the structure function of the phase field, ϕ reaches unity. Phase fluctuations that have large amplitudes and deocorrelate rapidly with separation give a smaller diffractive scale, allowing it to serve as a single number indicator of the severity of the fluctuations. We determine rdiff for each 30-min interval by computing the structure function from each fitted power spectrum and numerically solving for rdiff. We obtain error bars by computing rdiff for 1000 instances of (r0, n, ϕ0) drawn from a multivariate Gaussian distribution whose covariance is the estimate of the fitted parameter covariances given by the Levenberg–Marquardt method as implemented in scipy.1 We use the 16 and 84 per cent percentile values of the resulting distributions to obtain 1σ upper and lower bounds. On September 5, the median rdiff, at 83 MHz, ranged between ≈11 and 13 km while on September 6, it ranged between ≈4 and 6 km. We report these values and their errors in Fig. 7. We also show the values of rdiff derived from all source pairs on each night in table 1. The frequency scaling of the diffractive scale depends on the spectral slope of P(k), n. The n-values in our analyses typically fall between 2.3 and 2.7 which yields a rdiff ∝ f scaling.2 Hence, our measurements imply diffractive scales of ≈20 km and ≈10 km at 150 MHz on September 5 and 6, respectively. These values are within the range of typical scales (5–50 km) described in V15.
Changes in the refractive index over a source on time-scales shorter than the snapshot integration can cause source smearing in image space, resulting in a reduction in the observed peak brightness and reducing the number of source detections (Kassim et al. 2007). In Fig. 9 we compare the number of sources identified in each snapshot over September 5 and 6. On September 5, when the level of refraction is significantly lower, we observe that the number of sources increases as the field approaches transit, corresponding to the pointing at which the beam has maximal gain and then turns over. On September 6, the night in which significantly greater refraction was observed, the number of sources identified stays relatively constant and significantly lower than any of the source counts observed on September 5. There are also noticeable drops in the source counts on a handful of observations on September 5 which we found to correspond to flagging events in which an entire coarse band was eliminated with significant RFI detections on the edges (Fig. 6). We drop these snapshots from the rest of our analysis.

The number of sources identified in 112 s multifrequency synthesis images of Band 1 as a function of time over both nights of observing. On September 5, the source counts increase with primary beam gain, until transit (vertical grey line) before decreasing. On September 6, when more severe ionospheric refraction was observed, the source counts remain significantly lower. Fewer sources were identified in a handful of September 5 snapshots corresponding to observations in which calibration and flagging anomalies were observed(Figs 3 and 6). We exclude these snapshots from our analysis.
While refraction varied significantly over both nights and within each night, the bulk solar weather and geomagnetic conditions are nearly identical. In Table 1 we also list several bulk statistics such as Kp index, 10.7 cm flux (F10.7),3 and mean TEC at the MRO.4 The solar flux at 10.7 cm is an often used index of solar activity and is known to correlate with the emission of the UV photons responsible for generating the free electrons that impact ionospheric radio propagation (Yeh & Flaherty 1966; da Rosa et al. 1973; Titheridge 1973) with values ranging between 50 and 300 SFU (1 SFU = 10−22 W m−2). The Kp index (Bartels, Heck & Johnston 1939), quantifies the severity of geomagnetic activity and combines many local measurements of the maximal horizontal displacement of the Earth's magnetic field. Values for Kp range from 0 to 9 with any values above 5 indicating a geomagnetic storm. We find that the bulk values are very similar between nights, indicating that ionospheric indicators derived from the observations themselves are a much better way of assessing data quality. Indeed, Helmboldt, Lane & Cotton (2012) also found that the levels of ionospheric turbulence and the incidence of travelling ionospheric disturbances did not appear to correlate strongly with bulk ionosphere and solar weather statistics. A systematic study of ionospheric conditions at the MRO site and correlations with bulk ionosphere statistics at higher frequencies is currently underway (Loi et al., in preparation). Studies incorporating information from ancillary probes such as GPS stations are also being carried out (Arora et al. 2015).
Having established that one of our observing nights had twice the level of refraction than the other, we can compare power spectra from each night to guage the impact of ionospheric fluctuations on the 21 cm power spectrum. In Section 4.3, we find that the two power spectra appear indistinguishable.
3.3 Instrumental spectral structure
At low frequencies, the combination of wider primary beams and intrinsically brighter foregrounds causes leakage into the EoR window to corrupt a wider range of delays which are related to cosmological Fourier modes. In addition, the cosmological modes occupied by a fixed delay depends on redshift, causing features that contaminate low-sensitivity regions of k-space at one redshift to contaminate scientifically important regions at another.
All terms in equation (22) involve the cross multiplications of |$\widetilde{V}_{ij}(\tau + \Delta \tau ) \widetilde{V}_{ij}^{\ast }(\tau + \Delta \tau ^{\prime })$| and a coefficient on the order of |$\widetilde{r}^n$|. The cable delays on the MWA are all ≳90 m, corresponding to round trip delays significantly beyond the wedge for the short baseline lengths considered in our power spectrum analysis. Hence, if |$|\widetilde{r}|$| is greater than the ratio between the signal and foreground amplitudes, Rfg, terms with Δτ = Δτ′ dominate equation (22). As a consequence, the first and last lines in equation (22) dominate all others. The first (|$\mathcal {O}(0)$|) is the foregrounds in the absence of reflections and exceeds the signal by a factor of 1010 but is also contained within the wedge. The last line contaminates τ = τi, −τj at the level of |$\widetilde{r}^2\times 10^{10}$| the signal level. All other lines in equation (22) exceed the signal level by |$\widetilde{r}^2 10^5$| but also contaminate a greater range of delays: τ = τi, −τj, 2τi, −2τj, τi − τj.
As we will discuss below, we find that |$\widetilde{r} \lesssim 10^{-2}$| (Fig. 9), hence our analysis is only sensitive to the first and last lines arising from first-order reflections. The fourth, fifth and sixth terms in equation (22), which couple second-order reflections and beats will be above the level of the cosmological signal twice the round trip delay times and their differences. However, if the ≲1 per cent reflections can be corrected to be below the level of ≲10−3, these cross terms will only appear at the 10−3 level of the signal and not impede a detection (though they may introduce some bias).
Since the |$\mathcal {O}(\widetilde{r}^2)$| terms in the last line of equation (22) dominate the others outside of the wedge, the lowest order effect of a reflection is to multiply our foregrounds by the reflection coefficient squared and translate them outside of the low-delay region in which they are usually confined (the wedge) to the round-trip delay of our cable which for the MWA is outside of the EoR window. Unless |$|\widetilde{r}|$| can be brought below the ratio between the foregrounds and the signal itself (|$|\widetilde{r}_i| \lesssim 10^{-5}$|), the modes within a wedge translated to τi will remain unusable. Uncorrected reflections at the 10−2 level will also contaminate higher order harmonics and the differences between the delays.
For |$\widetilde{r}$| significantly larger than Rfg (as is the case here), delays corresponding to higher order reflections will also be contaminated. These higher order terms are well below the sensitivity of this analysis but they can still potentially pose an obstacle to the detection of the signal. Hence, it is worth commenting on the level that |$\widetilde{r}^3$| and |$\widetilde{r}^4$| terms contaminate our data. First we address |$\widetilde{r}^3$|. Since every contribution of order |$\mathcal {O}(\widetilde{r}^3)$| in equation (22) involves the product of an |$\mathcal {O}(\widetilde{r}^3)$| coefficient with delay transformed visibilities translated to two different delays (Δτ ≠ Δτ′), these terms will contribute at the level of |$\widetilde{r}^3 \times 10^5$| the level of the foregrounds. Uncorrected third-order terms with |$\widetilde{r} \lesssim 10^{-2}$| will contaminate our data at 10−1 the level of the signal and we do not consider them a serious issue, especially if the reflections are corrected to the ≈10−3 level.
We conclude that the sub-percent reflections observed in our data will introduce |$\mathcal {O}(\widetilde{r}^4)$| terms to the power spectrum at |$\widetilde{r}^4 \times 10^{10}$| times the amplitude of the signal at twice the fundamental delays and their differences. Second-order reflections at the ≲1 per cent level will therefore dominate the signal by two orders of magnitude, impeding a detection. Fortunately, the |$\widetilde{r}^4$| dependence of these second-order terms greatly amplifies even modest improvements in correcting the reflections. For example, if the reflections are brought to below the 0.1 per cent level, the |$\mathcal {O}(\widetilde{r}^4)$| terms will be brought to below 10−2 the level of the 21 cm signal. We are able to bring the reflections down to ≈0.002 so they are not a problem in our data.
In Table 2 we list the lengths of cable between the MWA receivers and beamformers along with their round-trip delays, and their corresponding |$k_\Vert$| at the centre redshift of our two bands along with a redshift typical of an EoR measurement. Because P(k) decreases rapidly with increasing k, interferometers are expected to have the highest signal to noise at the smallest delay that is uncontaminated by foregrounds (280 ns, corresponding to |$k_\Vert \approx 0.1{\rm -}0.2$| h Mpc−1 at EoX to EoR frequencies (Pober et al. 2013)). Assuming that reflections can be corrected to be below the 10−3 level so that only the last line in equation (22) is above the signal, they should be benign as long as they remain at sufficiently large or small |$k_\Vert$| that they do not leak into the region of maximal sensitivity. Because standing waves translate the entire wedge up to their delay, reflections located at the edge of the wedge will result in excess supra-horizon emission while cable reflections outside of the wedge will contaminate a finite chunk of |$k_\Vert$|, not just the delay of the reflection itself.
There are N cables of each length (L) between the MWA receivers and beamformers with associated round-trip delay times (τr). In the f30λ column, we list the percentage of baselines within 30λ at 83 MHz (where the majority of the MWA's power spectrum sensitivity lies) that are formed from at least one tile with the given cable length. We also list the |$k_\Vert$| of each delay given by equation (24) for three different redshifts. Cable reflections that are significantly above the |$k_\Vert$| values where we expect to obtain maximum sensitivity to the power spectrum at EoR redshifts (z ≈ 7) move into the maximum sensitivity region at EoX redshifts (z ≈ 16). Higher order reflections will also contaminate multiples of and differences between the delays and k∥ values listed in this table (thought at a lower level).
L . | N . | τr . | f30λ . | |$k_\Vert (z=7)$| . | |$k_\Vert (z= 12)$| . | |$k_\Vert (z=16)$| . |
---|---|---|---|---|---|---|
(m) . | . | (μs) . | (per cent) . | (h Mpc−1) . | (h Mpc−1) . | (h Mpc−1) . |
90 | 19 | 0.74 | 40 | 0.42 | 0.31 | 0.27 |
150 | 31 | 1.2 | 80 | 0.70 | 0.53 | 0.50 |
230 | 23 | 1.9 | 29 | 1.1 | 0.83 | 0.70 |
320 | 8 | 2.6 | 0.4 | 1.5 | 1.2 | 1.0 |
400 | 17 | 3.3 | 1.7 | 1.9 | 1.4 | 1.3 |
524 | 30 | 4.3 | 1.9 | 2.5 | 1.9 | 1.7 |
L . | N . | τr . | f30λ . | |$k_\Vert (z=7)$| . | |$k_\Vert (z= 12)$| . | |$k_\Vert (z=16)$| . |
---|---|---|---|---|---|---|
(m) . | . | (μs) . | (per cent) . | (h Mpc−1) . | (h Mpc−1) . | (h Mpc−1) . |
90 | 19 | 0.74 | 40 | 0.42 | 0.31 | 0.27 |
150 | 31 | 1.2 | 80 | 0.70 | 0.53 | 0.50 |
230 | 23 | 1.9 | 29 | 1.1 | 0.83 | 0.70 |
320 | 8 | 2.6 | 0.4 | 1.5 | 1.2 | 1.0 |
400 | 17 | 3.3 | 1.7 | 1.9 | 1.4 | 1.3 |
524 | 30 | 4.3 | 1.9 | 2.5 | 1.9 | 1.7 |
There are N cables of each length (L) between the MWA receivers and beamformers with associated round-trip delay times (τr). In the f30λ column, we list the percentage of baselines within 30λ at 83 MHz (where the majority of the MWA's power spectrum sensitivity lies) that are formed from at least one tile with the given cable length. We also list the |$k_\Vert$| of each delay given by equation (24) for three different redshifts. Cable reflections that are significantly above the |$k_\Vert$| values where we expect to obtain maximum sensitivity to the power spectrum at EoR redshifts (z ≈ 7) move into the maximum sensitivity region at EoX redshifts (z ≈ 16). Higher order reflections will also contaminate multiples of and differences between the delays and k∥ values listed in this table (thought at a lower level).
L . | N . | τr . | f30λ . | |$k_\Vert (z=7)$| . | |$k_\Vert (z= 12)$| . | |$k_\Vert (z=16)$| . |
---|---|---|---|---|---|---|
(m) . | . | (μs) . | (per cent) . | (h Mpc−1) . | (h Mpc−1) . | (h Mpc−1) . |
90 | 19 | 0.74 | 40 | 0.42 | 0.31 | 0.27 |
150 | 31 | 1.2 | 80 | 0.70 | 0.53 | 0.50 |
230 | 23 | 1.9 | 29 | 1.1 | 0.83 | 0.70 |
320 | 8 | 2.6 | 0.4 | 1.5 | 1.2 | 1.0 |
400 | 17 | 3.3 | 1.7 | 1.9 | 1.4 | 1.3 |
524 | 30 | 4.3 | 1.9 | 2.5 | 1.9 | 1.7 |
L . | N . | τr . | f30λ . | |$k_\Vert (z=7)$| . | |$k_\Vert (z= 12)$| . | |$k_\Vert (z=16)$| . |
---|---|---|---|---|---|---|
(m) . | . | (μs) . | (per cent) . | (h Mpc−1) . | (h Mpc−1) . | (h Mpc−1) . |
90 | 19 | 0.74 | 40 | 0.42 | 0.31 | 0.27 |
150 | 31 | 1.2 | 80 | 0.70 | 0.53 | 0.50 |
230 | 23 | 1.9 | 29 | 1.1 | 0.83 | 0.70 |
320 | 8 | 2.6 | 0.4 | 1.5 | 1.2 | 1.0 |
400 | 17 | 3.3 | 1.7 | 1.9 | 1.4 | 1.3 |
524 | 30 | 4.3 | 1.9 | 2.5 | 1.9 | 1.7 |
We see in Table 2 that the minimum |$k_\Vert$| associated with a cable ripple on the MWA at z = 7 is 0.42 h Mpc−1 which, even if we allow for a delay width of 280 ns (the approximate width of the wedge with a supra-horizon emission buffer at k⊥ = 0), is above the region of maximal sensitivity. However, at z = 16, the approximate centre redshift of our Band 1, this cable ripple lies at 0.27 h Mpc−1, which can leak a significant amount of power into the sensitivity sweet spot due to its finite |$k_\Vert$| width. This effect is illustrated in Fig. 10. At z = 7, the smallest reflection delay on the MWA introduces foreground contamination down to |$k_\Vert \approx$| 0.3 h Mpc−1, leaving a small foreground-free window in which cosmological measurements may be performed. On the other hand, at z = 16, this window becomes smaller with the other reflections filling the EoR window up to |$k_\Vert \sim 0.8$| h Mpc−1. If the corrected reflections are several times larger than 10−3, higher order terms below the 90 m delay may be comparable to the level of the 21 cm signal (shown as light grey regions and centred on light-grey dashed lines). However, only the peak of this foreground power will be near the signal level and the broad wings caused by beam chromaticity and will be well below the signal.

The regions of the EoR window contaminated by foregrounds due to uncalibrated cable reflections for several different redshifts. Dark grey regions denote contamination from first-order cable reflections assuming a wedge out to the first null of the primary beam plus the 0.15 h Mpc−1 at z = 8.5 buffer observed in Pober et al. (2013). Since the buffer is associated with the intrinsic spectral structure of foregrounds, it lives in delay space. Dark grey regions denote foreground contamination within the wedge which exists even without instrumental spectral structure. At z = 7, a representative EoR redshift, the contaminated regions remain at relatively high |$k_\Vert$| and have smaller widths due to the smaller primary beam and the scaling of k⊥ and |$k_\Vert$| with z. While regions exist between the first-order reflections that are somewhat wider at lower redshift, second-order reflections can still potentially contaminate nearly all of the EoR window in which interferometers are supposed to be sensitive (light grey shaded regions). Second-order reflections are below the sensitivity level of this study but will also pose an obstacle to longer integration unless calibrated out.
This effect is purely geometrical in that while the mapping from instrumental delays to k-space varies, the number of measurements in (u, v, η) cells which are uncontaminated stays constant. However, as we go to higher redshift the increasing width of the primary beam increases foreground power at supra-horizon delays, effectively reducing the number of usable (u, v, η) cells. As mentioned above, the cosmological power spectrum decreases significantly with increasing k so the fact that smaller k are contaminated by cable reflections at higher redshifts hurts our sensitivity disproportionately. We also note that any partial reflections from kinks and bends within the cable itself can lead to contamination of additional delays below the round-trip travel time on the cable.
We make a first attempt to remove these reflections by fitting reflection functions (equation 3) to our self calibration solutions divided by the smooth fit in equation (2). Since our per snapshot calibration solutions are too noisy to obtain good fits for the beamformer-receiver reflection, we fit these on calibration solutions that are averaged over each night of observing. We find that this method is of limited efficacy in removing the receiver to beamformer ripples (Fig. 20, top right).
Since, to avoid cosmological signal removal and spurious frequency structure due to mis-modelled foregrounds, we are attempting to model our bandpass with a small number of parameters we are unable to capture the full spectral structure of the cable reflections. As we see in Fig. 13 the reflection parameters are frequency-dependent and we do not have a clear picture of their precise evolution. Calibration exploiting redundant baselines might be able to make headway on the problem since it is not sensitive to unmodelled signal except in for deriving an overall phase and amplitude scaling that averages over all baselines (Wieringa 1992; Liu et al. 2010; Zheng et al. 2014). The MWA's baselines are designed for very low redundancy, making the technique unusable here. Future upgrades to the MWA are expected to include a significant number of redundant baselines (Tingay, private communication).
3.4 Calibration with autocorrelations
Confronted with the problem of reflections contaminating the power spectrum, we apply an alternative approach that uses tile autocorrelations to obtain calibration amplitude gains with sub-percent level accuracy.
To remove this multiplicative factor, we need a model for the ratio of the square root of each autocorrelation to the amplitude of the calibration solution. We use the product of a third-order polynomial and a 7 m cable reflection (modelling the LNA-beamformer cables) and fit it to the ratio, averaged over 112 s intervals, using our noisy initial calibration solutions.
We then use the square root of the autocorrelation divided by this polynomial as our calibration amplitude. In Fig. 11 we demonstrate the validity of this technique by comparing the smoothly corrected autocorrelations for a single snapshot with a calibration amplitude that has been averaged over a single pointing and see that they are in very good agreement.

We show the amplitude of a calibration gain averaged over a 15 min pointing (black circles) along with the square root of our autocorrelations which have been scaled by a third-order polynomial and a single 7 m reflection to match the calibration solution (red line). After multiplying the autocorrelations by a smooth function, they are brought into good agreement with the calibration gains.

Left: in order to obtain reflection parameters, we divide our scaled autocorrelation (magenta circles) by a smooth function consisting of a third-order polynomial and large-scale reflections (green line). Right: we fit this ratio (magenta circles) to a reflection function (green line) and are left with ∼10 per cent residuals (grey points).
Since tiles with 320, 400, and 524 m cables only contribute to ∼4 per cent of our sensitive baselines, we discard them entirely. In Fig. 13 we show the distributions of the reflection amplitudes fitted from autocorrelations (averaged over the night of September 5) inferred for our 90, 150, and 230 m cables for both the high and low bands. One can see that the reflection coefficients are on the order of fractions of a percent and vary significantly from cable length to cable length. This is reasonable since the cable impedance, which determines the reflection amplitude, is a function of both its geometry and dielectric properties (with equal length cables likely formed from cable batches of similar dielectric properties). In addition, frequency evolution of the reflection amplitude is apparent by comparing the fit in Bands 1 and 2 implying that a single delay standing wave is not quite the correct model to use in our phases.

Histograms of fitted cable reflection amplitudes for Band 1 (blue) and Band 2 (green) obtained from fits to autocorrelations for three different cable lengths between MWA receivers and tiles. The reflection amplitudes range from 0.2 to 1 per cent making them difficult to fit using the noisy self calibration solutions. Reflection amplitudes in Band 2 are systematically larger than Band 1 for all cable lengths, indicative of non-trivial frequency evolution in the reflection parameters.
Autocorrelations are particularly susceptible to RFI and potential contamination due to cross talk and other artefacts. In Fig. 11, we saw that after flagging the channel edges, the spectral structure in the autocorrelations was consistent with our calibration solutions up to a smooth polynomial factor. In Fig. 14, we inspect for artefacts and RFI in a typical tile autocorrelation as a function of time. We see that RFI is present at similar times that were flagged in autocal (Fig. 6). We also see that the time evolution of each autocorrelation is consistent between the two nights with rapid 10 per cent transitions occurring at ≈30 min intervals when the analogue beamformer settings are changed to track the sky. Ripples in frequency are also visible, corresponding to the structure in the standing wave reflections. It is difficult to pick out small artefacts in this dynamic spectrum view unless more large-scale smooth structure is fitted out, as is done in our calibration procedure. The residuals after this fitting give a better picture of what fine spectral features exist in the dynamic spectra of the autocorrelations which we discuss in the next section.

Dynamic spectra of the square root of a representative tile autocorrelation. Note the different colour bars for the two frequency bands since Band 1 evolves more steeply in frequency than Band 2. The autocorrelations exhibit repetitive structure in time from night to night with smooth time variations occurring as the sky rotates overhead and steep transitions occurring every ≈30 min due to changes in the analogue beamformer settings as the antennas track the sky. Limited RFI is plainly visible within the FM band, especially in Band 2, and the events are consistent with the flagging events identified by cotter shown in Fig. 6.
3.5 The time dependence of residual structure
We noted in Section 3.4 that our fits to reflections tended to have ∼10 per cent residuals. Since we rely on these fits to predict the reflections in our gain phases, we expect residuals of these fits that are also present in the phases to contribute reflection power at a similar level. Our residuals could arise from thermal noise in the autocorrelations and calibration solutions. If this were the case we might expect them to average down with time. On the other hand, these residuals might also arise from mismodelling of the reflections themselves and would not average down with time. The result would be a systematic floor which can only be overcome by finding the correct model of the reflections or removing them from the signal path.
Plotting the fit residuals for two representative 90 m and 150 m tiles over the low band (Fig. 15), we find them to be at the ∼10−3 level. While there is some scatter in these residuals due to fitting noise, their frequency-dependent shape is relatively constant. As a consequence, the residuals average to a spectrum with frequency structure. These residuals are likely due to mismodelling of the frequency-dependent amplitude, phase, and period of the reflections but at a lower level may have some contributions from digital artefacts and cross-talk present in the autocorrelations. Because the component in these residuals that is sourced by reflections is also present in the phases which we are trying to model, there remains an uncorrected component to the gains that we are not calibrating out and does not average down with time.

Left: the residuals to fitting reflection functions in our autocorrelations for all 2 min time-steps in our analysis for a representative tile with a 90 m beamformer to receiver connection (light grey points). While some scatter exists in the residuals due to fitting noise, they average to non-zero values on the order of ∼10−3 (black dots). These residuals are due to mismodelling the reflections and at a lower level potentially arise from digital artefacts. Right: the same as the left but for a 150 m cable whose reflection coefficient is ∼× 2 as large as the 90 m cable, leading to larger residuals due to mismodelling.
While calibration with autocorrelations still appears to be limited by fine frequency artefacts arising from reflections, the high SNR of the reflections in the autocorrelations does offer significant improvement over fitting the reflections in the calibration solutions themselves. We provide a more quantitative look at the improvement achieve using calibration with auto-correlations in Section 4.2.
4 POWER SPECTRUM RESULTS
We can now present our power spectrum results and the first upper limits on the Epoch of X-ray heating power spectrum. We form cross power spectra of the even and odd timestep data cubes through the empirical covariance modelling technique developed in Dillon et al. (2015b, hereafter D15). In this procedure, the foreground residual model used in the inverse-covariance weighted quadratic power spectrum estimates and in the associated error statistics is derived from the data. It assumes that foreground residuals are correlated in frequency but uncorrelated in the uv plane and depend only on frequency and |u|. We refer the reader to D15 and its predecessors (Tegmark 1997a; Liu & Tegmark 2011; Dillon et al. 2013, 2014) for a thorough discussion of how this technique works. Along with estimates of the power spectrum amplitude, our pipeline outputs error bars and window functions which describe the mixing of the true power spectrum values into each estimate. We form 1 d power spectra by binning our 2 d power spectra using the optimal estimator formalism of Dillon et al. (2014) with the weights of all modes lying outside of the EoR window or with |$k_\Vert$| values showing consistent cable reflection contamination set to zero (D15).
First we will examine our two dimensional power spectra for Bands 1 and 2, derived from ≈15 MHz of bandwidth each, and comment on systematics (Section 4.1) and how well our calibration techniques mitigate them (Section 4.2). We finish by presenting our spherically binned 1 d power spectra, our most sensitive data product. We use our 1 d power spectra to compare foreground contamination from ionospheric systematics on both nights (Section 4.3) and determine our best upper limits (Section 4.4).
4.1 Systematics in the 2 d power spectrum
The absolute values of our two dimensional power spectrum estimates using data calibrated with auto-correlations are shown in Fig. 16 for both bands. The distinctive ‘wedge’ confines the majority of our foreground power with some supra-horizon emission clearly present out to ∼0.1 h Mpc−1 as was found in observations of foreground contamination with a similarly large PAPER primary beam (Pober et al. 2013). Smooth frequency calibration errors, arising from foreground mismodelling, may also contribute to the supra-horizon emission along with intrinsic chromaticity in the primary beam itself. As we expected, the level of foregrounds and thermal noise is noticeably higher in our measurement of Band 1. We note that at the edge of our k⊥ range, there is a significant increase in power which is due to a rapid increase in thermal noise from the drop-off in the uv-coverage of our instrument. Though somewhat hard to see by eye, there are signs of coherent non-noise-like structures in both bands below |$k_\Vert \approx 0.5$| h Mpc−1.

The absolute value of our cylindrical power spectrum estimate from our three nights of observing on Band 2 (left) and Band 1 (right). We overplot the locations of the primary beam (dash-dotted), horizon (dashed), and horizon plus a 0.1 h Mpc−1 buffer (solid black) wedges. We see that the foregrounds are primarily contained within the wedge and that the EoR window is, for the most part, noise-like. There is some low SNR structure below |$k_\Vert \approx 0.5$| h Mpc−1, corresponding to |$k_\Vert$| modes contaminated by cable reflections. The amplitude in power rises very quickly due to an increase in thermal noise which rises very quickly at large |$k_\Vert$| due to a rapid falloff in uv coverage beyond k⊥ ∼ 0.2 hMpc −1.
We confirm these faint |$k_\Vert \lesssim 0.5$| h Mpc−1 structures as systematic contamination by inspecting the sign of our power spectrum estimate over the k⊥–|$k_\Vert$| plane. While the expected value of the even/odd cross power spectrum is always positive, k-bins that are dominated by noise have an equal probability of being positive or negative. Regions in which band powers are predominately positive are detections of foregrounds or systematics. In Fig. 17 we show P(k) from data calibrated with auto-correlations for both bands with an inverse hyperbolic sine colour scale to highlight regions of k-space that have positive or negative values. It is clear that the region of with |$k_\Vert \lesssim 0.5$| h Mpc−1 is not well described by thermal noise.

P(k) over Band 2 (left) and Band 1 (right) with a colour scale that highlights cells with positive or negative values. We expect regions that are thermal noise dominated to contain an equal number of positive and negative estimates and regions that are dominated by foreground leakage to be entirely positive. We observe significant foreground contamination outside of the wedge up to |$k_\Vert \approx 0.5$| h Mpc−1 in both bands.
Detections of foregrounds and systematics are especially visible in the ratio between the power spectrum and error bars predicted by our empirical covariance method (Fig. 18). In Fig. 19 we observe excess power at the ∼2σ level. While this is not a significant excess on a per cell basis, we detect this same power at high significance when we average in bins of constant |$k\equiv \sqrt{k_\perp + k_\Vert }$|.

The errors on |$\boldsymbol{\widehat{p}}$| arising from residual foregrounds and thermal noise are determined by looking at even/odd difference cubes and foreground-subtracted residual cubes using the method of D15. We show the error bars on our cylindrical power spectrum here, seeing that errors arising from foregrounds are contained within the wedge. These foreground errors are maximized at the smallest and largest k⊥ arising from large power in diffuse emission and increasing thermal noise from a dropoff in baseline density, respectively.

The foreground contamination within the wedge along with residual detections due to miscalibrated fine frequency features in the bandpass are especially clear in plots of the ratio between power and the error bars estimated by the empirical covariance method of D15. We overplot the wedge with a 0.1 h Mpc−1 buffer along with the wedge translated to cable reflection delays of our 90 and 150 m receiver to beamformer cables to highlight the effect of this systematic.
Because this excess power is present at similar levels over both of our observing subbands, one of which has a significantly greater overlap with the FM, we cannot attribute this excess to RFI. In our 1 d power spectra we also find that excess power is detected in our highest redshift bin which is outside of the FM entirely. The best explanation we have for this leakage is the residual structure in the MWA's bandpass caused by standing wave reflections on the beamformer to receiver cables. To demonstrate the plausibility of this explanation, we overlay the wedge translated to the |$k_\Vert$| modes corresponding to the delays of our 90 and 150 metre cables. For clarity, we do not show the 230 metre cable reflections in this overlay since their amplitudes and the number of tiles affected is comparatively small. We also observe this reflection in the 1 d power spectrum which has higher signal to noise. We find that the region where one might expect contamination from a cable reflection is in good agreement with the observed excess power.
4.2 Comparing calibration techniques
Having formed 2 d power spectra and estimates of the vertical error bars, we are in a position to asses the performance of our calibration solution in removing systematics. By inspecting the signal to error ratio in the EoR window, we compare our different calibration techniques. In Fig. 20 we show the ratio of P(k), binned over annuli, to the error bars in Band 1 for the calibration techniques discussed in this work. For all calibration methods, the majority of foreground detections are contained within the wedge with a ∼0.1 h Mpc−1 buffer, indicating that all perform at a similar level in removing smooth gain structure within the wedge.

Here we show the ratio between our 2D power spectrum and the error bars estimated by the empirical covariance method of D15. On the top left, we show our data calibrated using our initial calibration (see Section 2.2) with no attempt made to correct for standing wave structure in the MWA bandpass. Bright, band-like structures are clearly visible at the delays associated with reflections. On the top right, we show a first attempt to correct for cable reflections by fitting a sinusoidal model to rather noisy calibration solutions that had been integrated over a night of observing (1.5 h each night). While the bands appear weaker, they are still quite visible above the noise. In the bottom right panel, we show the same plot with calibration solutions using scaled autocorrelations described in Section 3.4. In the lower left panel we show a power spectrum with calibration solutions using autocorrelations for the amplitudes but without any attempt to correct reflections in the phase solutions. Pronounced reflection features are visible in this power spectrum, indicating that any mismodelled reflection structure in the phases will contaminate our measurement.
We first inspect a power spectrum derived from data calibrated using the initial method described in Section 2.2 in which coarse band structure is removed by averaging over tiles, the per tile amplitudes and phases of each antennas are fit to smooth polynomials, and no attempt is made to model the beamformer-receiver reflections (top left corner). Significant foreground power is visible beyond the wedge to k ∼ 0.5 h Mpc−1 and is especially bright at the delays corresponding to the |$k_\Vert$| values of the cable reflections in Table 2. The fact that the 150 m delay dominates the others stems from the fact that most of our short baselines are formed from 150 m cables and that the amplitude of the reflections in the 150 m cables is larger compared to the 90 and 230 m cable lengths (Fig. 13). We next show a first attempt to fit out the reflections by averaging all calibration amplitudes in a night, dividing out a polynomial, and fitting equation (26). While the power in the bands is reduced significantly, residuals remain at the 2σ–10σ level, especially in the reflection bands. Since our initial calibration solutions are so noisy, it makes sense that they are difficult to fit.
We finally inspect results from calibrations derived from the autocorrelations described in Section 3.4 (lower right). While there is significant reduction compared to the amplitude on the top right corner, there still exist residuals outside of the window at the ∼1σ–2σ level. We think that these residuals arise from imperfect modelling of the reflection coefficients in the autocorrelation amplitudes, which will leave some reflection structure in the visibility phases. To demonstrate the impact of unmodelled reflection structure in the phases, we leave the phases of our auto-calibration solutions uncorrected for any fitted reflection coefficients (lower left) and find that significant power is reintroduced into the window.
We can get a more quantitative view of how much autocorrelations can improve calibration by taking a slice through the cylindrical power spectrum at the |$k_\Vert$| of our 150 m cable reflection (Fig. 21) where we see that fitting the calibration solutions was able to remove roughly an order of magnitude of the power in the reflection while AutoCal removes a factor of ∼20. Since the power spectrum is proportional to the square of the visibilities which are primarily contaminated by first-order reflection contributions, this corresponds to an accuracy of ≈ 20 per cent in removing the reflections in the visibilities and is consistent with the residuals observed in Fig. 15. Such inaccuracy likely arises from our inability to model the precise frequency dependence of the reflection parameters in the phases and is on a similar order to the residuals observed in Fig. 12. Since the reflections are removed to this accuracy in the visibility, we can briefly comment on how the relative contribution of second-order reflections (which are below our noise floor even without any calibration). Since the second-order reflections appear in the data at the |$\widetilde{r}^{4}$| level and we have reduce their amplitude in the data from ≲0.01 to ≲0.003, they will enter the power spectrum at the level of ≲10 × 10−1 the level of the 21 cm signal.

The level of power at a fixed |$k_\Vert$| corresponding to the delay of reflections from our 150 m cable (left), and comparing it to a value of |$k_\Vert$| unaffected by cable reflections (right). The blue line shows the power spectrum level for calibration in which the bandpass is modelled as a polynomial with no attempt to correct fine frequency scale reflections. We see that power is on the order of ∼50 times the thermal noise level (green-dashed line). Attempting to fit the reflections to calibration solutions integrated over each night gives us an improvement in the power level by roughly an order of magnitude (orange solid line). Using calibration solutions derived from autocorrelations brings down the reflection power by another factor of a few (purple solid line) but is still unable to bring the majority of measurements below the ∼1σ level. While we think that the autocorrelations accurately capture the fine frequency structure of the gains, we are still forced to model this fine frequency structure and predict it in the phases. Residual power is likely due to inaccuracies in this modelling. The right-hand panel shows all data below the estimated noise level. This is due to the fact that in Dillon et al. (2014) it is shown that the method for calculating error bars layed-out in Liu & Tegmark (2011); Dillon et al. (2013) slightly overestimates the noise.
We attempted to better model the reflections by allowing for frequency evolution of the amplitudes but found little improvement in the power spectrum. We also found that we are able to obtain better fits of the autocorrelations by adding additional smooth reflections terms to equation (2) which could be important if unmodelled large scale structures bias our fits of small scale ones. However, using more complicated fits of the large scale structure, we did not observe significant improvement in power spectrum contamination. The solutions that we ultimately settled on in this analysis allow for a power-law evolution of the reflection amplitude and add an additional small delay reflection term to equation (2) which lies well within the wedge. In most of our autocorrelation fits, residuals remained at the ∼10 per cent level, some of which may arise from secondary reflections in bent or kinked cables. While these residuals were clearly present at high SNR in the autocorrelations, we have not found a way to sufficiently model the contribution of these low level structures to our phases.
While using the autocorrelations has allowed us to characterize and subtract the fine spectral structure in the instrument better, it may not be a viable long term solution, even in the regions of the EoR window that currently appear foreground free. RFI contamination and digital artefacts are known to contaminate autocorrelations and likely exist below our current noise level.
4.3 Power spectra comparison between nights of varying ionospheric activity
An open question is whether or not the ionosphere will significantly hamper measurements of the power spectrum. The fact that the severity of ionospheric effects increase with λ2 makes the question especially pertinent at low frequency. Changes in foreground emission induced by ionospheric effects can enter the power spectrum in two ways: through calibration and through the foreground residuals themselves. We check whether either of these potential error sources have an observable effect on our 1 d power spectrum in Fig. 22 by comparing power spectra derived from 1.4 h of Band 1 data on September 5, over which ionospheric activity was comparatively mild to the same number of hours of Band 1 data on September 6 where differential refraction was approximately twice as severe.

The Band 1, 1 d power spectra from our two nights of observing: 2013 September 5 (black) and 2013 September 6 (red). We saw in Fig. 7, that the magnitude of refractions on September 6 were on average twice as severe. The two power spectra nearly indistinguishable (within error bars) despite the significant differences in conditions, indicating that ionospheric systematics do not have a significant effect after three hours of integration, even at these low frequencies.
We find that the power spectra, which are estimated from data outside of the wedge, are consistent with each other. This result confirms the intuitive idea that since ionospheric errors in the foreground model are spectrally smooth (evolving as ∼λ2), they should be contained within the wedge. We also extended our 1 d power spectrum estimation into the wedge to see whether the foreground detections appeared to be significantly different and find that they are not. This suggests that the random errors induced by the ionosphere average down with time. It is important to keep in mind that the spatial scales being probed in our analysis are relatively large, on the order of ≳2.5°, while ionospheric refraction at these frequencies effects sub-arcminute scales. Hence the contamination that we might expect from ionospheric refraction should be small. Amplitude scintillation effects are prominent on short baselines (V15) and likely dominate any contamination, however their spectral coherence still constrains them to be predominantly within the wedge (V16).
4.4 First upper limits on the 21 cm power spectrum during the pre-reionization epoch
We limit our 1 d power spectra to redshift widths of Δz ∼ 1.5 to minimize effects from cosmic evolution. A redshift interval of Δz ∼ 0.5 is the range most cited in the literature over which the statistics of the brightness temperature field are expected to be stationary (Mao et al. 2008). However, at higher redshift, the frequency range corresponding to Δz ≈ 0.5 decreases as (1 + z)−2 with Δz = 0.5 corresponding to a bandwidth of only 2.45 MHz by z = 16. Reducing our bandwidth to such a small interval leads to poor k resolution which we prefer to maintain for assessing systematics. Since we are far from a detection, we opt for a larger redshift interval than we would otherwise use if we were actually observing the cosmological signal. In Fig. 23 we show 1 d power spectra derived from our three hours of observing. Vertical error bars give 2σ uncertainties and the horizontal error bars give the width of our window functions. The amplitudes of our power spectrum values are consistent with thermal noise except for the regions of k-space below |$k_\Vert \lesssim 0.5$| h Mpc−1. At k ≈ 1 h Mpc−1, where our measurements are well described by thermal noise, our upper limits are on the order of 100 times higher than the results presented in D15 in which a similar three hour upper limit was established at ≈180 MHz. This factor of ≈100 is consistent with what we expect from equation (8). The sky temperature increases with decreasing frequency as f −2.6 leading to a factor of ≈30 from |$T_{{\rm sys}}^2$| while |$\lambda ^4/A_e^2$| introduces an additional factor of 4–10.

Dimensionless 1 d power spectra derived by Integrating spherical shells excluding the foreground contaminated wedge region with a 0.1 h Mpc−1 buffer. Black dots indicate the mean estimated from the weighted average in each bin. Vertical error bars denote the 2σ uncertainties while horizontal error bars indicate the width of window functions. We also shade regions of k-space that we expect to have some level of foreground contamination due to uncalibrated cable reflection structure. Grey shaded regions clearly correspond to regions in which our power spectrum measurements are not consistent with thermal noise. We note that where our upper limits do agree with thermal noise, the power spectrum is on the order of ∼100 times larger than the upper limits set with the MWA at ≈180 MHz(D15). This factor is reasonable given that the sky noise (noise power spectrum) scales with ∼f −2.6 (f −5.2) and the primary beam solid angle increases as ∼f 2.
The detections at small k are many orders of magnitude larger than the expected cosmological signal from a 21cmFAST simulation (Mesinger, Furlanetto & Cen 2011) (blue solid lines) so they cannot possibly originate from the redshifted H i emission. Instead, these detections are most likely the miscalibrated reflection structure observed in our 2D power spectra. We shade out regions of the k axis in which we expect contamination given the reflections discussed above and find that they correspond to the same modes where detections are observed. These systematic detections occupy the regions of Fourier space where our interferometer has the greatest sensitivity to the cosmic signal. Since we do not expect the systematics dominated regions to integrate down, a detection with the MWA in its current state using the techniques presented in this work would have to take place at k ≳ 0.5 h Mpc−1, requiring over 105 h of integration – a rather infeasible time-scale. Thus, in order to probe the pre-reionization epoch, improvements in calibration and/or changes in the hardware of the MWA will have to be implemented. We note that at lower redshifts, the primary beam is smaller and the k-modes occupied by reflections are farther away from the sensitivity sweet spot, so it is less likely that this problem will prevent the MWA from detecting the EoR power spectrum.
Our best upper limits fall within the region of Fourier space with systematic errors and, while we do not expect them to integrate down with more observing time, we can infer that Δ2(k) is less than 2.5 × 107 mK2 at k = 0.18 h Mpc−1 and z = 12.2, 8.3 × 107 mK2 at k = 0.21 h Mpc−1 and z = 15.35, and 2.7 × 108 mK2 at k = 0.22 h Mpc−1 and z = 17.05, all at 95 per cent confidence.
4.5 The outlook for EoR measurements on the MWA
A pertinent question arising from our analysis is how much the observed reflections impact or limit observations with the MWA of the Epoch of Reionization power spectrum at higher frequencies. The answer depends significantly on the calibration and reduction approach and as of now, several different efforts using alternative calibration and reduction schemes are being undertaken (Jacobs et al. 2016). The analyses in D15 and Beardsley et al. (in preparation) are calibrated in a way similar to this work, employing limited calibration parameters to avoid detrimental modelling errors. After an integration time of ≈3 hours, D15 also observe the cable reflections above the thermal noise level, however the smallest k mode occupied by the shortest 90 m cable lies at k ≈ 0.4 h Mpc−1 while the delay width is narrower due to the smaller primary beam, causing the wedge to occupy fewer k∥ modes. As a result, there are regions of k-space below the shortest reflection that are still consistent with noise. Such miscalibrated structure should be highly detectable after ≈10–30 h of integration but the results of such an analysis are still forthcoming (Beardsley et al., in preparation). If the reflections can be corrected to the ∼10−3 level as was done in this analysis, the region below the first reflection should remain free of contamination from the beam-former to receiver reflections.
Additional calibration pipelines, which include far greater degrees of freedom, such as the Real Time System (RTS) (Mitchell et al. 2008; Ord et al. 2010) and the reduction pipeline discussed in Offringa et al. (2016) include direction-dependent calibration, ionospheric phase fitting, and greater frequency resolution, are also being applied to MWA data sets. A recent upper limit at 180 MHz derived from RTS calibrated data and the CHIPS power spectrum estimator did not show evidence of the cable reflections being present (Trott et al. 2016). It is likely that the enhanced degrees of freedom allowed by RTS calibration did a better job at removing the structure from reflections but it is difficult to tell given that the error bars due to thermal noise at the comoving scales relevant to these reflections are an order of magnitude larger than those in D15 because of the shorter integration time. Ultimately, the increase in the number of fitting parameters may enhance the removal of instrumental chromaticity in the EoR window, however simulations by Barry et al. (2016) show that small errors in ones calibration model will introduce power into the window in excess of the 21 cm signal unless the intrinsic bandpass is smooth enough to be modelled by a small number of parameters or a source model exists with an accuracy significantly beyond what is currently available. Ultimately, the increase in fitting parameters may enhance the removal of instrumental chromaticity in the EoR window. Whether they can be introduced without adding power into the EoR window in excess of the signal, due to small errors in source modelling is still an open question that is currently being investigated.
5 CONCLUSIONS AND FUTURE EXPERIMENTAL CONSIDERATIONS
In this paper, we have presented low frequency radio observations with the MWA at unprecedentedly high redshifts between 11.6 and 17.9. Our goals in conducting these observations were to place upper limits on the 21 cm power spectrum during the Epoch of X-ray heating and to assess the levels of systematics which are expected to be generally worse than at EoR frequencies. These systematics include ionospheric effects, RFI (due to the FM band) and increased thermal noise. We need to control these systematics if we are to learn the detailed properties of the sources that heated the IGM; be they the first generation of stellar mass black holes, the hot interstellar medium left over from the first supernovae explosions in the Universe, or dark matter annihilation.
With regards to RFI, we have found after three hours of integration that existing algorithms are sufficient to flag RFI below the FM band. Within the FM band, we have found that only a handful of channels are contaminated continuously and that after discarding them our power spectra do not show any evidence of RFI contamination. This bodes well for future planned 21 cm experiments at the MRO such as the SKA-low which is expected to make high signal to noise detections of the power spectrum (Koopmans et al. 2015). However, we are still many orders of magnitude above the level of a detection and reducing the thermal noise through longer integrations may reveal lower level RFI.
Over two nights of observing, we encounter different ionospheric conditions, observed quantitatively using the differential refraction metric described in Cohen & Röttgering (2009). We establish that ionospheric fluctuations are the source of observed position shifts by comparing the level of refraction in our two observing bands and find that they exhibit the expected λ2 evolution. Diffractive scales on the second night of ≈5 km are a factor of 2 shorter than the first night, indicative of more severe ionospheric activity. When we compare the 1 d power spectra derived from an equal amount of data on each night, we find that they are very similar to each other, lending support to the idea that since ionospheric effects on calibration and foreground residuals are spectrally smooth, they should not contaminate the EoR window.
While the majority of foreground power is contained within the wedge, we find high-significance foreground detections within the EoR window out to a |$k_\Vert \lesssim 0.5$| h Mpc−1. These contaminated regions are consistent with miscalibrated cable reflections. We are able to obtain an order of magnitude improvement on removing the worst of these features using fits to autocorrelations, however they still limit our sensitivity at the 2σ–5σ level. In addition, since auto-correlations are generally contaminated by RFI and digital artefacts, it is likely that in reducing the dominant obstacle in our data, we have introduced additional features that are below the noise level of this analysis. Since the reflections occupy the regions of k-space where we would otherwise expect the greatest cosmological sensitivity, our best upper limits are a factor of a few larger than the limits we would obtain if we were thermal noise limited. Cable reflections are especially pernicious at higher redshifts because the increasing primary beam width adds foreground power to delays ever closer to the horizon. While supra horizon emission off of the wedge moves up in |$k_\Vert$|, the modes occupied by cable reflections move down, increasing in width. The EoR window is crushed between the shortest reflection mode and the top of the wedge.
While our observations on the MWA will not integrate down below ≈108 mK2 at k ≲ 0.5 h Mpc−1 and is limited by the intrinsic spectral structure of the instrument, the systematics encountered in this analysis do not prevent 21 cm observations at high redshift in general. A robust source catalog, that includes emission all the way down to the horizon along with precise models of the primary beam will lead to less foreground power bleeding from the edge of the wedge, (Thyagarajan et al. 2015a,b; Pober et al. 2016) and potentially open up a foreground free region under the first cable reflection. Resolving the question of cosmological signal loss and mixing of foreground spectral structure from large to short baselines may enable us to calibrate with more free parameters, better capturing the spectral structure of the bandpass. More robust calibration of these features may also be obtainable with a redundant array (Wieringa 1992; Liu et al. 2010; Zheng et al. 2014). The 128-tile MWA has very little redundancy by design, however an additional 128 tile expansion is expected to introduce two highly redundant, hex-packed, subarrays (Tingay, private commmunication). The final plan for HERA, which is currently under construction, is dominated by 331 hexagonally packed dishes. Its layout is designed to take advantage of redundant calibration as well (Pober et al. 2014). Finally, calibration using injected signals (Patra et al. 2015) can also be employed to make high precision measurements of the bandpass.
The most sure way of eliminating reflection features is to remove them in hardware either by ensuring better impedance matching on the cable connections, changing the cable lengths to move reflections out of the window, or early digitization. The current HERA design employs cables no longer than 35 m in length, translating to |$k_\Vert = 0.09$| h Mpc−1 at z = 16 and ensures that reflections within the dish are below an acceptable level (Ewall-Wice et al. 2016b; Patra et al., in preparation; Thyagarajan et al. 2016), while the planned MWA phase III upgrade and the SKA are considering digitization at the beamformers (Tingay, private communication), eliminating reflections altogether.
While measurements of the 21 cm line at EoR frequencies can teach us about the nature of UV photon sources and constrain cool thermal histories, a significant number of scenarios predict saturation of heating's contribution to brightness temperature fluctuations during reionization. In order to learn of the detailed properties of the sources that heated the IGM and to exploit the full potential of the 21 cm line as a cosmological and astrophysical probe, we will invariably want to extend our search to as low a frequency as possible. In this work we have obtained a first look at the systematics facing us in this high redshift realm and have found that most of them are navigable. As of now, our primary limitation lies in the design of our instrument and calibration, both of which can be dramatically improved on relatively short time-scales. Ultimately, we expect a combination of improvements in instrumental design including shorter/no cables to keep reflections inside of the wedge and redundant baseline layouts allowing for more robust calibration to allow for much deeper integrations in the near future.
We would like to thank the referees for their helpful comments. We extend our gratitude to Jeff Zhang and Adrian Liu for useful discussions. This work was supported by NSF Grants AST-0457585, AST-0821321, AST-1105835, AST-1410719, AST-1410484, AST-1411622, and AST-1440343, by the MIT School of Science, by the Marble Astrophysics Fund, and by generous donations from Jonathan Rothberg and an anonymous donor. AEW acknowledges support from the National Science Foundation Graduate Research Fellowship under Grant No. 1122374. AM acknowledges support from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No 638809 – AIDA).
This scientific work makes use of the Murchison Radio-astronomy Observatory, operated by CSIRO. We acknowledge the Wajarri Yamatji people as the traditional owners of the Observatory site. Support for the MWA comes from the U.S. National Science Foundation (grants AST-0457585, PHY-0835713, CAREER-0847753, and AST-0908884), the Australian Research Council (LIEF grants LE0775621 and LE0882938), the U.S. Air Force Office of Scientific Research (grant FA9550-0510247), and the Centre for All-sky Astrophysics (an Australian Research Council Centre of Excellence funded by grant CE110001020). Support is also provided by the Smithsonian Astrophysical Observatory, the Raman Research Institute, the Australian National University, and the Victoria University of Wellington (via grant MED-E1799 from the New Zealand Ministry of Economic Development and an IBM Shared University Research Grant). The Australian Federal government provides additional support via the Commonwealth Scientific and Industrial Research Organisation (CSIRO), National Collaborative Research Infrastructure Strategy, Education Investment Fund, and the Australia India Strategic Research Fund, and Astronomy Australia Limited, under contract to Curtin University. We acknowledge the iVEC Petabyte Data Store, the Initiative in Innovative Computing and the CUDA Center for Excellence sponsored by NVIDIA at Harvard University, and the International Centre for Radio Astronomy Research (ICRAR), a Joint Venture of Curtin University and The University of Western Australia, funded by the Western Australian State government.
Kolmogorov turbulence, with a spectral index of n = 11/6 in the V15 parametrization scales in a very similar way; rdiff∝6/5 ≈ 1.
Kp and 10.7 cm flux values were obtained from http://spaceweather.com/archive.php.
Values obtained by averaging global TEC maps downloaded from the MIT Haystack Madrigal data base (Rideout & Coster 2006).
REFERENCES
APPENDIX A: THE EFFECT OF CABLE REFLECTIONS ON TILE GAINS
Throughout this work, we reference several expressions dealing with standing waves on cables that arise from mismatched impedances at their connections. In this section we derive these expressions for the reader's convenience. Discussions of this problem can be found in most elementary electricity and magnetism texts.
APPENDIX B: The Power Spectrum of ionospheric phase fluctuations from measurements of Differential Refraction
APPENDIX C: THE AMPLITUDE OF SCINTILLATION NOISE IN MWA VISIBILITIES
We substitute β = δ(γ − 1) = 1.2, |$C=k f_0^{-\delta (1-\gamma )} = 1.6\times 10^6 {\rm Jy}^{-1} {\rm sr}^{-1}\, {\rm MHz}^{-2.5}$|, α = γ = 2.5, and an effective beam area of C1. Using the short dipole model of the MWA beam, we compute a Beff(83 MHz) of 0.33 sr. From these numbers, we obtain Seff = 212.3 Jy.
The final ingredient is |$P(\bf k)$| which we compute from our fits of our differential refraction measurements described in appendix B and using the functional form in equation (19). Applying equation (C2), we obtain values for σscint[V(b)] between ≈4 and 6 Jy on the 30 min intervals on September 5 and ≈1–2 Jy on the 30 min intervals on September 6 at 83 MHz.