A publishing partnership

Articles

IGM CONSTRAINTS FROM THE SDSS-III/BOSS DR9 Lyα FOREST TRANSMISSION PROBABILITY DISTRIBUTION FUNCTION

, , , , , , , , , , , , , , , and

Published 2015 January 29 © 2015. The American Astronomical Society. All rights reserved.
, , Citation Khee-Gan Lee et al 2015 ApJ 799 196 DOI 10.1088/0004-637X/799/2/196

0004-637X/799/2/196

ABSTRACT

The Lyα forest transmission probability distribution function (PDF) is an established probe of the intergalactic medium (IGM) astrophysics, especially the temperature–density relationship of the IGM. We measure the transmission PDF from 3393 Baryon Oscillations Spectroscopic Survey (BOSS) quasars from Sloan Digital Sky Survey Data Release 9, and compare with mock spectra that include careful modeling of the noise, continuum, and astrophysical uncertainties. The BOSS transmission PDFs, measured at 〈z〉 = [2.3, 2.6, 3.0], are compared with PDFs created from mock spectra drawn from a suite of hydrodynamical simulations that sample the IGM temperature–density relationship, γ, and temperature at mean density, T0, where T(Δ) = T0Δγ − 1. We find that a significant population of partial Lyman-limit systems (LLSs) with a column-density distribution slope of βpLLS  ∼  − 2 are required to explain the data at the low-transmission end of transmission PDF, while uncertainties in the mean Lyα forest transmission affect the high-transmission end. After modeling the LLSs and marginalizing over mean transmission uncertainties, we find that γ = 1.6 best describes the data over our entire redshift range, although constraints on T0 are affected by systematic uncertainties. Within our model framework, isothermal or inverted temperature–density relationships (γ  ⩽  1) are disfavored at a significance of over 4σ, although this could be somewhat weakened by cosmological and astrophysical uncertainties that we did not model.

Export citation and abstract BibTeX RIS

1. INTRODUCTION

Remarkably soon after the discovery of the first high-redshift (zqso ≳ 2) quasars (Schmidt 1965), Gunn & Peterson (1965) realized that the amount of resonant Lyman-α (Lyα) scattering off neutral hydrogen structures observed in the spectra of these quasars could be used to constrain the state of the intergalactic medium (IGM) at high-redshifts: they deduced that the hydrogen in the IGM had to be highly photo-ionized (neutral fractions of $n_{{\rm H\,\scriptsize{I}}}/n_{\rm H}\,{<}\,10^{-4}$) and hot (temperatures, T  >  104 K).

Lynds (1971) then discovered that this Lyα absorption could be separated into discrete absorption lines, i.e., the Lyα "forest." Over the next two decades, it was recognized that the individual Lyα forest lines have Voigt absorption profiles corresponding to Doppler-broadened systems with T  ∼  1 − 3 × 104 K (see, e.g., Rauch et al. 1992; Ricotti et al. 2000; Schaye et al. 2000; McDonald et al. 2001; Tytler et al. 2004; Lidz et al. 2010; Becker et al. 2011) and neutral column densities of N  ∼  1013–1017 cm−2 (Petitjean et al. 1993; Penton et al. 2000; Janknecht et al. 2006; Rudie et al. 2013), and increasingly precise measurements of mean Lyα forest transmission have been carried out (Theuns et al. 2002; Bernardi et al. 2003; Faucher-Giguère et al. 2008; Becker et al. 2013). However, the exact physical nature of these absorbers was unclear for many years (see Rauch 1998, for a historical review of the field).

Beginning in the 1990s, detailed hydrodynamical simulations of the IGM led to the current physical picture of the Lyα forest arising from baryons in the IGM, which trace fluctuations in the dark matter (DM) field induced by gravitational collapse, in ionization balance with a uniform ultraviolet ionizing background (see, e.g., Cen et al. 1994; Miralda-Escudé et al. 1996; Croft et al. 1998; Davé et al. 1999; Theuns et al. 1998). A physically motivated analytic description of this picture is the fluctuating Gunn–Peterson approximation (Croft et al. 1998; Hui et al. 1997), in which the Lyα optical depth, τ, scales with underlying matter density, ρ, through a polynomial relationship:

Equation (1)

where Γ is the background photoionization rate, and Δ ≡ ρ/〈ρ〉 is the matter density relative to the mean density of the universe at the given epoch. In the second proportionality above, we have made the assumption that the local temperature of the gas has a polynomial relationship with the local density,

Equation (2)

where T0 is the gas temperature at mean density and γ parameterizes the temperature–density relation, which encodes the thermal history of the IGM (e.g., Hui & Gnedin 1997; Schaye et al. 1999; Ricotti et al. 2000; McDonald et al. 2001; Hui & Haiman 2003; see Meiksin 2009 for a detailed overview on the relevant physics).

Over the past decade and a half, the 2000–2008 Sloan Digital Sky Survey (SDSS-I and -II; York et al. 2000; Stoughton et al. 2002, http://www.sdss.org) spectroscopic data has represented a dramatic improvement in the statistical power available to Lyα forest studies: McDonald et al. (2006) measured the one-dimensional (1D) Lyα forest transmission power spectrum from ≈3000 SDSS quasar sightlines. This measurement was used to place significant constraints on cosmological parameters and large-scale structure (see, e.g., McDonald et al. 2005b; Seljak et al. 2005; Viel & Haehnelt 2006).

The McDonald et al. (2006) quasar sample, which in its time represented a ∼100 increase in sample size over previous data sets, is superseded by the Baryon Oscillations Sky Survey (BOSS; part of SDSS-III; Eisenstein et al. 2011; Dawson et al. 2013) quasar survey. This spectroscopic survey, which operated between fall 2009 and spring 2014, is aimed at taking spectra of ∼150, 000 zqso ≳ 2.2 quasars (Dawson et al. 2013) with the goal of constraining dark energy at z  >  2 using transverse correlations of Lyα forest absorption (see, e.g., Slosar et al. 2011) to measure the baryon acoustic oscillation (BAO) scale.15 At time of writing, the full BOSS survey is complete, with ∼170, 000 high-redshift quasars observed, although this paper is based on the earlier sample of ∼50, 000 BOSS quasars from SDSS Data Release 9 (DR9; Ahn et al. 2012; Pâris et al. 2012; Lee et al. 2013).

The quality of the individual BOSS Lyα forest spectra might appear at first glance inadequate for studying the astrophysics of the IGM, that have to date been carried out largely with high-resolution, high-signal-to-noise ratio (S/N) spectra: the typical BOSS spectrum has S/N  ∼  2 pixel−1,16 since the BAO analysis is optimized with large numbers of low S/N sightlines, densely sampled on the sky (McDonald & Eisenstein 2007; McQuinn & White 2011). It is therefore interesting to ask whether it is possible to model the various instrumental and astrophysical effects seen in the BOSS Lyα forest spectra, to sufficient accuracy level to exploit the unprecedented statistical power.

In this paper, we will measure the probability distribution function (PDF) of the Lyα forest transmission, F ≡ exp (− τ), from BOSS. This one-point statistic, which was first studied by Jenkins & Ostriker (1991), is sensitive to astrophysical parameters such as the amplitude of matter fluctuations and the thermal history of the IGM. However, the transmission17 PDF is also highly sensitive to effects such as pixel noise level, resolution of the spectra, and systematic uncertainties in the placement of the quasar continuum level, especially in moderate resolution spectra such as SDSS or BOSS. Desjacques et al. (2007) studied the transmission PDF from a sample of ∼3500 Lyα forest spectra from SDSS Data Release 3 (Abazajian et al. 2005). Using mock spectra generated from a log-normal model of the Lyα forest with parameters tuned to reproduce high-resolution, high-S/N spectra, they fitted for the estimated pipeline noise level and continuum-fitting errors in the SDSS spectra. They concluded that the noise levels reported by the SDSS pipeline were underestimated by ∼10%, consistent with the findings of McDonald et al. (2006). They also found that the quasar continuum-level was systematically lower by ∼10% in comparison with a power law extrapolated from redward of the quasar Lyα line, with a rms variance of ∼20%, although certain aspects of their study, e.g., the noise modeling and quasar continuum model, were rather crude.

We intend to take an approach distinct from that of Desjacques et al. (2007): instead of treating the noise and continuum as free parameters, we will attempt to measure the BOSS Lyα forest transmission PDF using a rigorous treatment of the noise and continuum-fitting, and then adopt a "forward-modeling" approach of trying to model the various instrumental effects as accurately as possible in mock spectra generated from detailed hydrodynamical simulations. Using the raw individual exposures and calibration data from BOSS, we will first implement a novel probabilistic method for co-adding the exposures, which will yield more accurate noise estimates and also enable self-consistent noise modeling in mock spectra. Similarly, we will use a new method for continuum estimation called mean flux regulated/principal component analysis (MF-PCA; Lee et al. 2012). This technique provides unprecedented continuum accuracy for noisy Lyα forest spectra: <10% rms errors for S/N  ∼  2 and <5% rms errors for S/N ≳ 5 spectra.

On the modeling side, we will use the detailed hydrodynamical IGM simulations of Viel et al. (2013a) as a basis. The mock spectra are then smoothed to BOSS resolution, and have Lyman-limit systems (LLSs) and metal contamination added, followed by the introduction of pixel noise based on our improved noise estimates. We will then self-consistently introduce continuum errors by applying our continuum-estimation procedure to the mock spectra.

With the increase in statistical power from the sheer number of BOSS spectra, and our improved modeling of the noise and continuum, we expect to significantly reduce the errors on the measured transmission PDF in comparison with Desjacques et al. (2007). This should enable us to place independent constraints on the shape of the underlying transmission PDF, and the thermal history of the IGM as parameterized by the power-law temperature–density relation, γ and T0.

The IGM temperature–density relationship is a topic of recent interest, as Bolton et al. (2008) and Viel et al. (2009) have found evidence of an inverted temperature–density relation, γ  <  1, at z  ∼  2–3 from the transmission PDF from high-resolution, high-S/N Lyα forest spectra (Kim et al. 2007), implying that IGM voids are hotter than overdensities. This result is in contrast with theoretical expectations of γ ≈ 1.6 (Miralda-Escudé & Rees 1994; Hui & Gnedin 1997; Theuns et al. 1998; Hui & Haiman 2003), which arises from the balance between adiabatic cooling in the lower-density IGM and photoheating in the higher-density regions. Even inhomogeneous He ii reionization, which is expected to flatten the IGM temperature–density relation (see, e.g., Furlanetto & Oh 2008; Bolton et al. 2009; McQuinn et al. 2009), is insufficient to account for the extremely low values of γ  ∼  0.5 estimated by the aforementioned authors (although inversions could occur at higher densities, see, e.g., Meiksin & Tittley 2012).

Indeed, earlier papers studying the temperature–density relationship using either the transmission PDF (McDonald et al. 2001) or by measuring the Doppler parameters and hydrogen column densities of individual forest absorbers (the so-called $b-N_{{\rm H\,\scriptsize{I}}}$ relation, e.g., Schaye et al. 1999; Ricotti et al. 2000; Rudie et al. 2012) have found no evidence of an inverted γ. In recent years, the decay of blazar gamma rays via plasma instabilities (Broderick et al. 2012; Chang et al. 2012; although see Sironi & Giannios 2014) has been invoked as a possible mechanism for supplying the heat necessary to flatten γ to the observed levels (Puchwein et al. 2012).

It would be desirable to perform an independent re-analysis of high-resolution data taking into account continuum-fitting bias (Lee 2012), to place these claims on a firmer footing. However, Lee & Spergel (2011) have argued that the complete SDSS DR7 (Abazajian et al. 2009) Lyα forest data set could have sufficient statistical power to place interesting constraints on γ, even assuming continuum-fitting errors at the ∼10% rms level. Therefore, with the current BOSS data, we hope to model noise and resolution, as well as astrophysical systematics, at a sufficient precision to place interesting constraints on the IGM thermal history.

This paper is organized as follows: we first give a broad overview of the BOSS Lyα forest data set, followed by our measurement of the BOSS transmission PDF with detailed descriptions of our method of combining multiple raw exposures and continuum estimation. We then discuss how we include various instrumental and astrophysical effects into our modeling of the transmission PDF, starting with hydrodynamical simulations. The model transmission PDF is then compared with the observed PDF to obtain constraints on the thermal parameters governing the IGM.

2. DATA

2.1. Summary of BOSS

BOSS (Dawson et al. 2013) is part of SDSS-III (Eisenstein et al. 2011; the other surveys are SEGUE-2, MARVELS, and APOGEE). The primary goal of the survey is to carry out precision BAO measurements at z  ∼  0.5 and z  ∼  2.5, from the luminous red galaxy distribution and Lyα forest absorption field, respectively (see, e.g., Anderson et al. 2014; Busca et al. 2013; Slosar et al. 2013). Its eventual goal is to obtain spectra of ∼1.5 million luminous red galaxies and ∼170, 000 z  >  2.15 quasars over 4.5 yr of operation.

BOSS is conducted on upgraded versions of the twin SDSS spectrographs (Smee et al. 2013) mounted on the 2.5 m Sloan telescope (Gunn et al. 2006) at the Apache Point Observatory, NM. One thousand optical fibers mounted on a plug-plate at the focal plane (spanning a 3° field of view) feed the incoming flux to the two identical spectrographs, of which 160–200 fibers per plate are allocated to quasar targets (see Ross et al. 2012; Bovy et al. 2011 for a detailed description of the quasar target selection). Both spectrographs split the light into blue and red cameras that cover 3610–10140 Å, with the dichroic overlap region occurring at around 6000 Å. The resolving power R ≡ λ/Δλ ranges from 1300 at the blue end to 2600 at the red end.

Each plate is observed for sufficiently long to achieve the S/N requirements set by the survey goals; typically, 5 individual exposures of 15 minutes are taken. The data are processed, calibrated, and combined into co-added spectra by the "idlspec2d" pipeline, followed by a pipeline that operates on the 1D spectra to classify objects and assign redshifts (Bolton et al. 2012). However, as described later in this paper, we will generate our own co-added spectra from the individual exposures and other intermediate data products.

2.2. Data Cuts

In this paper we use data from the publicly available SDSS Data Release 9 (DR9; Ahn et al. 2012). This includes 87,822 quasars at all redshifts that have been confirmed by visual inspection as described in Pâris et al. (2012). In Lee et al. (2013), we have defined a further subset of 54,468 quasars with zqso ⩾ 2.15 that are suitable for Lyα forest analysis, and have provided in individual FITS files for each quasar various products such as sky masks, masks for damped Lyα absorbers (DLAs), noise corrections, and continua; these are designed to ameliorate systematics in the BOSS spectra and aid in Lyα forest analysis (see Table 1 in Lee et al. 2013 for a full listing). While we use this Lee et al. (2013) catalog as a starting point, in this paper we will generate our own custom co-added spectra and noise estimates.

The typical S/N of the BOSS Lyα forest quasars is low: 〈S/N〉 ≈ 2 pixel−1 within the Lyα forest; this criterion is driven by a strategy to ensure a large number of sightlines over a large area in order to optimize the 3D Lyα forest BAO analysis (McDonald & Eisenstein 2007; McQuinn & White 2011), rather than increasing the S/N in individual spectra. However, for our analysis we wish to select a subset of BOSS Lyα forest sightlines with reasonably high S/N in order to reduce the sensitivity of our PDF measurement to inaccuracies in our modeling of the noise and continuum of the BOSS spectra. We therefore make a cut on the S/N, including only sightlines that have a median of 〈S/N〉 ⩾ 6 pixel−1 within the Lyα forest,18 defined with respect to the pipeline noise estimate (see Lee et al. 2013)—this selects only ∼10% of the spectra with the highest S/N. The 1041–1185 Å Lyα forest region of each quasar must also include at least 30 pixels (Δv = 2071 km s−1) within one of our absorption redshift bins of 〈z〉 = 2.3, 〈z〉 = 2.6, and 〈z〉 = 3.0, with bin widths of Δz = 0.3 (see Section 3).

We discard spectra with identified DLAs in the sightline, as listed in the "DLA Concordance Catalog" used in the Lee et al. (2013) sample. This DLA catalog (W. Carithers 2014, in preparation) includes objects with column densities $N_{{\rm H\,\scriptsize{I}}}\,{>}\,10^{20}\,\mathrm{cm^{-2}}$; however, the completeness of this catalog is uncertain below $N_{{\rm H\,\scriptsize{I}}} = 10^{20.3}\,\mathrm{cm^{-2}}$. We therefore discard only sightlines containing DLAs with $N_{{\rm H\,\scriptsize{I}}} \ge 10^{20.3}\,\mathrm{cm^{-2}}$, and take into account lower column-density absorbers in our subsequent modeling of mock spectra. At the relatively high S/N that we will work with (see below), the detection efficiency of DLAs is essentially 100% (see, e.g., Prochaska et al. 2005; Noterdaeme et al. 2012) and thus we expect our rejection of $N_{{\rm H\,\scriptsize{I}}}\ge 10^{20.3}\,\mathrm{cm^{-2}}$ DLAs to be quite thorough.

Measurements of the Lyα forest transmission PDF are known to be sensitive to the continuum estimate (Lee 2012), but in this paper we use an automated continuum-fitter, MF-PCA (Lee 2012), that is less susceptible to biases introduced by manual continuum estimation. Moreover, unlike the laborious process of manually fitting continua on high-resolution spectra, the automated continuum estimation can be used to explore various biases in continuum estimation. For this purpose, we will use the same MF-PCA continuum estimation used in Lee et al. (2013), albeit with minor modifications as described in Section 3.2. We select only quasars that appear to be well described by the continuum basis templates, based on the goodness-of-fit to the quasar spectrum redward of Lyα. This is flagged by the variable CONT_FLAG =1 as listed in the Lee et al. (2013) catalog (see Table 3 in that paper). Broad absorption line (BAL) quasars, the continua of which are difficult to estimate due to broad intrinsic absorption troughs, have already been discarded from the Lee et al. (2013) sample.

Another consideration is that the shape of the transmission PDF is affected by the resolution of the spectrum, especially since the BOSS spectrographs do not resolve the Lyα forest. The exact spectral resolution of a BOSS spectrum at a given wavelength varies as a function of both observing conditions and row position on the BOSS CCDs. The BOSS pipeline reports the wavelength dispersion at each pixel, σdisp, in units of the co-added wavelength pixel size (binned such that ln (10) Δ(λ)/λ = 10−4). This is related to the resolving power by R ≈ (2.35 × 1 × 10−4ln 10 σdisp)−1. Palanque-Delabrouille et al. (2013) have recently found, using their own analysis of the width of the arc-lamp lines and bright sky emission lines, that the spectral dispersion reported by the pipeline had a bias that depended on the CCD row and increased with wavelength, up to 10% at λ ≈ 6000 Å. We will correct for this bias when creating mock spectra to compare with the data, as described in Section 4. Figure 1 shows the (uncorrected) pixel dispersions from 236 BOSS quasars from the 〈z〉 = 2.3, S/N = 6–8 bin, as a function of wavelength at the blue end (λ = 3700–4200 Å) of the spectrograph. At a fixed wavelength, there are outliers that contribute to the large spread in σdisp, e.g., ranging from σdisp ≈ 0.9–1.8 at 3700 Å. We therefore discard spectra with outlying values of σdisp based on the following criterion: we first rank-order the spectra based on their σdisp value evaluated at the central wavelength of each PDF bin (i.e., λ = [4012, 4377, 4863] Å at 〈z〉 = [2.3, 2.6, 3.0]), and then discard spectra below the 5th percentile and above the 90th percentile. This is illustrated by the dashed red lines in Figure 1.

Figure 1.

Figure 1. Wavelength dispersions, σdisp, for 236 BOSS quasar spectra randomly selected from the 〈z〉 = 2.3, 6  <  S/N  <  8 PDF bin. The ordinate axis on the right shows the equivalent spectral resolution, R ≡ λ/Δλ. The dashed red lines are objects that have been discarded from the analysis because they are outliers in spectral dispersion.

Standard image High-resolution image

Finally, since our noise estimation procedure uses the individual BOSS exposures, we discard objects that have less than three individual exposures available.

Our final data set comprises 3373 unique quasars with redshifts ranging from zqso = 2.255 to zqso = 3.811, and a median S/N of S/N = 8.08 pixel−1. This data set represents only a small subsample of the BOSS DR9 quasar spectra, but is over two orders of magnitude larger than high-resolution quasar samples previously used for transmission PDF analysis. Table 1 summarizes our data sample, and the statistics of the redshifts and S/N bins for which we measure the transmission PDF. Figure 2 shows histograms of the pixels used in our analysis, as a function of absorption redshift.

Figure 2.

Figure 2. Pixel distribution of Lyα absorber redshifts in the BOSS Lyα forest sample used in this paper, shown in bin sizes of Δz = 0.05. The different colors and line-styles denote the three redshift bins used in this paper. We have chosen these redshift bins—with the gap at 2.75  <  z  <  2.85—to match the simulation redshifts (Section 4.1).

Standard image High-resolution image

Table 1. Binning of BOSS Lyα Forest Transmission PDFs

Lyα Forest S/Na Nspecb Npixc Δvd Δze ΔXf
Redshift (pixel−1) ( km s−1)
2.15 < z < 2.45 6–8 1109 288442 1.99 × 107 219 704
  8–10 501 129141 8.90 × 106 97.9 315
  >10 561 146478 1.01 × 107 111 357
2.45 < z < 2.75 6–8 1004 229898 1.59 × 107 191 646
  8–10 490 107001 7.38 × 106 88.6 300
  >10 604 140843 9.71 × 106 117 396
2.85 < z < 3.15 6–8 511 108443 7.48 × 106 99.7 358
  8–10 326 72448 5.00 × 106 66.7 239
  >10 341 74284 5.12 × 106 68.3 245

Notes. aMedian S/N within Lyα forest. bNumber of contributing spectra. cNumber of Δv = 69 km s−1 pixels. dVelocity path length. eRedshift path length. fAbsorption distance, where $dX/dz = (1+z)^2 (\Omega _M (1+z)^3 + \Omega _\Lambda)^{-1/2}$. For this conversion, we assume ΩM = 0.3 and $\Omega _\Lambda = 0.7$.

Download table as:  ASCIITypeset image

3. MEASURING THE TRANSMISSION PDF FROM BOSS

In this section, we will measure the Lyα forest transmission PDF from BOSS. In principle, the transmission PDF is simply the histogram of the transmitted flux in the Lyα forest after dividing by the quasar continuum. However, with the comparatively noisy BOSS data we need to ensure an accurate estimate of the pixel noise. We will therefore first describe a new probabilistic method for co-adding the individual BOSS exposures that will enable us to have an accurate noise estimate. We will also describe the continuum-estimation method with which we normalize the forest transmission.

3.1. Co-addition of Multiple Exposures and Noise Estimation

Since we intend to model BOSS spectra with modest S/N, we need an accurate estimate of the pixel noise that also allows us to separate out the contributions from Poisson noise due to the background and sky as well as read noise from the detector. In this subsection, we will construct an accurate probabilistic model of the flux and noise of the BOSS spectrograph, based on the individual exposure data that BOSS delivers.

The basic BOSS spectral data consists of a spectrum of each raw exposure, fλi (inclusive of noise), an estimate of the sky sλi, and a calibration vector Sλi, where i indicates the exposure of the nexp exposures taken.19 The quantity sλi is the actual sky model that was subtracted from the fiber spectra in the extraction. The calibration vector is defined as $S_{\lambda i} \equiv f_{\lambda i}/f_{N i}$, with fNi being the flux of exposure i in units of photoelectrons. The idlspec2d pipeline then estimates the co-added spectrum of the true object flux, $\mathcal {F}_\lambda$, from the raw individual exposures, sky estimates, and calibration vectors.

The BOSS data reduction pipeline also delivers noise estimates in the form of variance vectors, which are, however, known to be inaccurate (McDonald et al. 2006; Desjacques et al. 2007; Lee et al. 2013; Palanque-Delabrouille et al. 2013).

To quantify the fidelity of the BOSS noise estimate, we used the so-called "side-band" method described in Lee et al. (2014b) and Palanque-Delabrouille et al. (2013), which uses the variance in flat, absorption-free regions of the quasar spectra to quantify the fidelity of the noise estimate. First, we randomly selected 10,000 BOSS quasars (omitting BAL quasars) from the Pâris et al. (2012) catalog in the redshift range 1.4  ⩽  zqso  <  3.4, evenly distributed into 20 redshift bins of width Δzqso = 0.1 (i.e., 500 objects per bin). We then consider the flat 1460 Å  <  λrest  <  1510 Å spectral region in the quasar rest-frame, which is dominated by the smooth power-law continuum and relatively unaffected by broad emission lines (e.g., Vanden Berk et al. 2001; Suzuki 2006) or absorption lines. The pixel variance in this flat portion of the spectrum should therefore be dominated by spectral noise, allowing us to examine whether the noise estimate provided by the pipeline is accurate. We then evaluate the ratio of σside, the pixel flux rms in the rest-frame 1460 Å  <  λrest  <  1510 Å region divided by the average pipeline noise estimate, σλ:

Equation (3)

where the summations and average flux is evaluated in the quasar rest-frame 1460 Å  <  λrest  <  1510 Å.

In Figure 3, this quantity is averaged over the 500 individual quasars per redshift bin and plotted as a function of the observed wavelength corresponding to λ = (1 + 〈zqso〉)1485 Å. With a perfect noise estimate, 〈σsideλ〉 should be unity at all wavelengths, but we see that the BOSS pipeline underestimates the true noise in the spectra at λ  ≲  5000 Å, by up to ∼15% at the blue end of the spectra, with an overall tilt that changes over to an overestimate at λ ≳ 4500 Å. Lee et al. (2013) and Palanque-Delabrouille et al. (2013) provide a set of correction vectors that can be applied to the pipeline noise estimates to bring the latter to within several percent of the true noise level across the wavelength coverage of the blue spectrograph.

Figure 3.

Figure 3. Quantitative test of the noise estimation fidelity in the spectra. Each point shows the ratio of the pixel variance divided by the estimated noise variance, averaged over the rest-frame 1460 Å  <  λrest  <  1510 Å flat spectral region of 500 BOSS quasars within redshift bins of Δzqso = 0.1 and plotted as a function of the corresponding observed wavelength of the flat spectral region. If there is no bias in the noise estimation, this ratio should be unity. The black asterisks show this quantity estimated using the BOSS pipeline co-added spectra and noise estimates, while the red triangles show the results from the Markov Chain Monte Carlo (MCMC) co-addition and noise estimation procedure described in Section 3.1. The MCMC method clearly provides a better noise estimation than the BOSS pipeline.

Standard image High-resolution image

Unfortunately, these noise corrections are inadequate for our purposes, since we want to generate realistic mock spectra that have different realizations of the Lyα forest transmission field from the actual spectra, i.e., a different $\mathcal {F}_{\lambda }$. We therefore require a method that not only accurately estimates the noise in a given BOSS spectrum, but also separates out the photon-counting and CCD terms in the variance that results from applying the Horne (1986) optimal spectral extraction algorithm:

Equation (4)

where σRN is the CCD read-noise.

To resolve this issue, we apply our own novel statistical method to the individual BOSS exposures to generate co-added spectra while simultaneously estimating the corresponding noise parameters for each individual spectrum. This procedure, which uses a Gibbs-sampled Markov-Chain Monte Carlo (MCMC) algorithm, is described in detail in the Appendix. Initially, we attempted to model the noise with just a single constant noise parameter that rescales the read-noise term of Equation (4), but this was found to be inadequate. This is likely because an optimal extraction algorithm weights by the product of the S/N and object profile, causing the corresponding variance to have a non-linear dependence on the flux and sky level. Furthermore, systematic errors in the reduction, sky-subtraction and calibration will result in additional noise contributions, which could depend on sky level, object flux, or wavelength, hence deviating from this simple model.

After considerable trial-and-error to find a model that best minimizes the bias illustrated in Figure 3, we settled on the form:

Equation (5)

where

Equation (6)

where the Aj are free parameters in our noise model, while the σdisp(λ) factor in the second term (the pixel dispersion) provides a rough approximation for the wavelength dependence of the spot size (i.e., the size of the raw CCD image in the spatial direction). Meanwhile, σdisp = 12 is the average CCD read-noise per wavelength bin in the BOSS spectra (D. J. Schlegel et al., in preparation). The quantities sλ, i, Sλ, i, and σdisp(λ) (sky flux, calibration vector, and dispersion, respectively) are taken directly from the BOSS pipeline.

In addition, we assume that the pixel noise can be modeled as a Gaussian distribution with a variance given by Equation (5). The first, photon counting, term in the equation should formally be modeled as a Poisson distribution, but since the BOSS spectrograph always receives ≳ 30–40 counts even at the blue end of the spectrograph where the counts are the lowest, it is reasonable to use the Gaussian approximation because even in the limit of low S/N (i.e., when the spectrum is dominated by the sky flux), the moderate resolution ensures that there are at least several dozen sky photons per pixel in each exposure.

For each BOSS spectrum, we use the MCMC procedure described in the Appendix to combine the multiple exposures while simultaneously estimating the noise parameters Aj and true observed spectrum, $\mathcal {F}_{\lambda }$. With the optimal estimates of Aj and $\mathcal {F}_{\lambda }$ for a given spectrum, the estimated noise variance is then simply Equation (5).

An important advantage of the form in Equation (5) is that the object photon noise $\propto \mathcal {F}_{\lambda }$ is explicitly separated out. This facilitates the construction of a mock spectrum with the same noise characteristics as a true spectrum, but with a different spectral flux. For example, a mock spectrum of the Lyα forest will have a very different transmission field than the original data, and so the variance due to object photon counting noise can be added appropriately, in addition to contributions from the known sky, and the read noise term (Equation (5)). Our empirical determination of the parameters governing this noise model for each individual spectrum form a crucial ingredient in our forward model, which we will describe in Section 4.

Our MCMC procedure works for spectra from a single camera, either red or blue; we have not yet generalized it to combine blue and red spectra of each object. However, the spectral range of the blue camera alone (≈3600–6400 Å) covers the Lyα forest up to z  ∼  5, i.e., most practical redshifts for Lyα forest analysis. For the purposes of this paper, we restrict ourselves to spectra from the blue camera alone.

In Figures 4 and 5, we show examples of co-added BOSS quasar spectra, using both the MCMC procedure and the standard BOSS pipeline. In the upper panels, the MCMC co-adds are not noticeably different from the BOSS pipeline, although the numerical values are different. In the lower panels, we show the estimated noise from both methods—the differences are larger than in the fluxes but still difficult to distinguish by eye.

Figure 4.

Figure 4. Examples of co-added BOSS spectra from the MCMC procedure described in Section 3.1 (red) and from the BOSS pipeline (black) are shown in the upper panels, in the rest-frame interval 1035–1260 Å. The corresponding pixel noise estimates are shown in the upper panels. The blue line shows the MF-PCA continuum used to extract the Lyα forest transmitted flux, while the vertical dotted lines delineate the 1041–1185 Å rest-frame interval, which we define as the Lyα forest. The continuum discontinuity at λrest = 1185 Å is where we have applied the "mean flux regulation" correction to the Lyα forest. In the top figure, masked pixels have had their flux and noise set to zero. The signal-to-noise ratios for the two spectra are S/N ≈ 11 (top) and S/N ≈ 6 (bottom) within the Lyα forest.

Standard image High-resolution image
Figure 5.

Figure 5. Same as Figure 4, but the 1050 Å  <  λrest  <  1090 Å rest-frame region is expanded to better illustrate the differences between the MCMC and pipeline co-added spectra.

Standard image High-resolution image

We therefore return to the statistical analysis by calculating 〈σsideλ〉, the ratio of the pixel variance against the estimated noise from the flat 1460 Å  <  λrest  <  1510 Å region of BOSS quasars; this ratio, computed for our MCMC coadds, is plotted in Figure 3. With these new co-adds, we see that this ratio is within roughly ±3% of unity across the entire λ  ∼  3800–5000 Å wavelength range relevant to our subsequent analysis, with an overall bias of 1% (i.e., the noise is still underestimated by this level). Crucially, we have removed the strong wavelength dependence of 〈σsideλ〉 that was present in the standard pipeline, and we suspect most of the scatter about unity is caused by the limited number of quasars (500 per bin) available for this estimate, which will be mitigated by the larger number of quasars spectra available in the subsequent BOSS data releases. In principle, we could correct the remaining 1% noise bias, but since our selected spectra have S/N > 6, this remaining 1% noise bias would smooth the forest transmission PDF by an amount roughly 1/25 of the average PDF bin width (ΔF = 0.05). As we shall see, there are other systematic uncertainties in our modeling that have much larger effects than this, therefore we regard our noise estimates as adequate for the subsequent transmission PDF analysis, without requiring any further correction.

3.2. Mean-Flux Regulated Continuum Estimation

In order to obtain the transmitted flux F of the Lyα forest20 we first need to divide the observed flux, $\mathcal {F}_{\lambda }$, by an estimate for the quasar continuum, c. We use the version of MF-PCA continuum fitting (Lee et al. 2012) described in Lee et al. (2013). Initially, PCA fitting with eight eigenvectors is performed on each quasar spectrum redward of the Lyα line (λrest = 1216–1600 Å) in order to obtain a prediction for the continuum shape in the λrest  <  1216 Å Lyα forest region (e.g., Suzuki et al. 2005). The slope and amplitude of this initial continuum estimate is then corrected to agree with the Lyα forest mean transmission, 〈Fcont(z), at the corresponding absorber redshifts, using a linear correction function.

The only difference in our continuum-fitting with that in Lee et al. (2013) is that here we use the latest mean flux measurements of Becker et al. (2013) to constrain our continua. Their final result yielded the power-law redshift evolution of the effective optical depth in the unshielded Lyα forest, defined in their paper as $N_{{\rm H\,\scriptsize{I}}}\le 10^{17.2}\ \mathrm{cm^{-2}}$ (although they only removed contributions from $N_{{\rm H\,\scriptsize{I}}}\ge 10^{19}\ \mathrm{cm^{-2}}$ absorbers). This is given by

Equation (7)

with best-fit values of [τ0, β, C] = [0.751, 2.90, −0.132] at z0 = 3.5.

However, the actual raw measurement made by Becker et al. (2013) is the effective total absorption within the Lyα forest region of their quasars, which also contain contributions from metals and optically thick systems:

Equation (8)

where τmetals and τLLS(z) denote the IGM optical depth contributions from metals and LLSs, respectively. For the purposes of our continuum-fitting, the quantity we require is τeff(z), since the τmetals and τLLS(z) contributions are also present in our BOSS spectra. Becker et al. (2013) did not publish their raw τeff(z), therefore we must now "uncorrect" the metal and LLS contributions from the published τLyα, B13(z). The discussion below therefore attempts to retrace their footsteps and does not necessarily reflect our own beliefs regarding the actual level of these contributions.

We find τmetals = 0.02525 by simply averaging over the Schaye et al. (2003) metal correction tabulated by Faucher-Giguère et al. (2008) (i.e., the 2.2 ⩽ z  ⩽  2.5 values in Δz = 0.1 bins from their Table 4), that were used by Becker et al. (2013) to normalize their relative mean flux measurements. Note that there is no redshift dependence on τmetals in this context, because Becker et al. (2013) argued that the metal contribution does not vary significantly over their redshift range. Whether or not this is really true is unimportant to us at the moment, since we are merely "uncorrecting" their measurement.

The LLS contribution to the optical depth is re-introduced by integrating over $f(N_{{\rm H\,\scriptsize{I}}}, b, z)$, the column-density distribution of neutral hydrogen absorbers:

Equation (9)

where b is the Doppler parameter and $W_0(N_{{\rm H\,\scriptsize{I}}},b)$ is the rest-frame equivalent width (EW; we use the analytic approximation given by Draine 2011, valid in the saturated regime).

Following Becker et al. (2013), we adopted a fixed value of b = 20 km s−1 and assumed that $f(N_{{\rm H\,\scriptsize{I}}},z) = f(N_{{\rm H\,\scriptsize{I}}}) {d}n/{d}z$, where $f(N_{{\rm H\,\scriptsize{I}}})$ is given by the z = 3.7 broken power-law column density distribution of Prochaska et al. (2010) and dn/dz∝(1 + z)2. Becker et al. (2013) had corrected for super-LLSs and DLAs in the column-density range [Nmin, Nmax] = [1019 cm−2, 1022 cm−2], but as discussed above we have discarded all sightlines that include $N_{{\rm H\,\scriptsize{I}}}\ge 10^{20.3} \,\mathrm{cm^{-2}}$ DLAs; therefore, we reintroduce the optical depth contribution for super-LLSs, i.e., [Nmin, Nmax] = [1019 cm−2, 1020.3 cm−2]. We find τLLS(z) = 0.0022 × [(1 + z)/3]3. This is a small correction, giving rise to only a 0.5% change in 〈F〉 at z = 3.0.

This estimate of the raw absorption, 〈Feff(z) = exp [ − τeff(z)], is now the constraint used to fit the continua of the BOSS quasars, i.e., we set 〈Fcont = 〈Feff(z). Note that in our subsequent modeling of the data, we will use the same 〈Fcont(z) to fit the mock spectra to ensure an equal treatment between data and mocks. Since 〈Fcont(z) includes a contribution from $N_{{\rm H\,\scriptsize{I}}}\,{<}\,10^{20.3}\ \mathrm{cm^{-2}}$ optically thick systems, our mock spectra will need to account for these systems as we shall describe in Section 4.2.

The MF-PCA technique requires spectral coverage in the quasar rest-frame interval 1000–1600 Å. However, as noted in the previous section, we work with co-added BOSS spectra from only the blue cameras covering λ  ≲  6400 Å; this covers the full 1000–1600 Å interval required for the PCA fitting only for z  ≲  3 quasars. However, the differences in the fluxes between our MCMC co-adds and the BOSS pipeline co-adds are relatively small, and we do not expect the relative shape of the quasar spectrum to vary significantly. We can thus carry out PCA fitting on the BOSS pipeline co-adds, which cover the full observed range (3700–10000 Å), to predict the overall quasar continuum shape. This initial prediction is then used to perform mean flux regulation using the MCMC co-adds and noise estimates, to fine-tune the amplitude of the continuum fits.

The observed flux, fλ, is divided by the continuum estimate, c, to derive the Lyα forest transmission, F = fλ/c. For each quasar, we define the Lyα forest as the rest wavelength interval 1041–1185 Å. This wavelength range conservatively avoids the quasar's Lyβ/O vi emission line blend by Δv  ∼  3000 km s−1 on the blue end, as well as the proximity zone close to the quasar redshift by staying Δv  ∼  10, 000 km s−1 from the nominal quasar systemic redshift. We are now in a position to measure the transmission PDF, which is simply the histogram of pixel transmissions F ≡ exp (− τ).

3.3. Observed Transmission PDF from BOSS

Since the Lyα forest evolves as a function of redshift, we measure the BOSS Lyα forest transmission PDF in three bins with mean redshifts of 〈z〉 = 2.3, 〈z〉 = 2.6, and 〈z〉 = 3.0, and bin sizes of Δz = 0.3. These redshift bins were chosen to match the simulations outputs (Section 4.1) that we will later use to make mock spectra to compare with the observed PDF; this choice of binning leads to the gap at 2.75  <  z  <  2.85 as seen in Figure 2. In this paper, we restrict ourselves to z  ≲  3 since the primary purpose is to develop the machinery to model the BOSS spectra. In subsequent papers, we will apply these techniques to analyze the transmission PDF in the full 2  ≲  z  ≲  4 range using the larger samples of subsequent BOSS data releases (DR10; Ahn et al. 2014).

Another consideration is that the transmission PDF is strongly affected by the noise in the data. While we will model this effect in detail (Section 4), there is a large distribution of S/N within our subsample ranging from S/N = 6 pixel−1 to S/N  ∼  20 pixel−1. We therefore further divide the sample into three bins depending on the median S/N per pixel within the Lyα forest: 6  <  S/N  <  8, 8  <  S/N  <  10, S/N > 10. The consistency of our results across the S/N bins will act as an important check for the robustness of our noise model (Section 3.1).

We now have nine redshift and S/N bins in which we evaluate the transmission PDF from BOSS; the sample sizes are summarized in Table 1. For each bin, we have selected quasars that have at least 30 Lyα forest pixels within the required redshift range, and which occupy the quasar rest-frame interval 1041–1185 Å. The co-added spectrum is divided with its MF-PCA continuum estimate (described in the previous section) to obtain the transmitted flux, F, in the desired pixels. We then compute the transmission PDF from these pixels.

Physically, the possible values of the Lyα forest transmission range from F = 0 (full absorption) to F = 1 (no absorption). However, the noise in the BOSS Lyα forest pixels, as well as continuum fitting errors, leads to pixels with F  <  0 and F > 1. We therefore measure the transmission PDF in the range −0.2  <  F  <  1.5, in 35 bins with width Δ(F) = 0.05, and normalized such that the area under the curve is unity. The statistical errors on the transmission PDF are estimated by the following method: we concatenate all the individual Lyα forest segments that contribute to each PDF, and then carry out bootstrap resampling over Δv = 2 × 104 km s−1 segments with 200 iterations. This choice of Δv corresponds to ∼250–300 Å in the observed frame at z  ∼  2–3—according to Rollinde et al. (2013), this choice of Δv and number of iterations should be sufficient for the errors to converge (see also Appendix B in McDonald et al. 2000).

In Figure 6, we show the Lyα forest transmission PDF measured from the various redshift and S/N subsamples in our BOSS sample. At fixed redshift, the PDFs from the lower S/N data have a broader shape as expected from increased noise variance. With increasing redshift, there are more absorbed pixels, causing the transmission PDFs to shift toward lower F values. As discussed previously, there is a significant portion of F  >  1 pixels due to a combination of pixel noise and continuum errors, with a greater proportion of F  >  1 pixels in the lower-S/N subsamples as expected. Unlike the high-resolution transmission PDF, at 〈z〉  ≲  3 there are few pixels that reach F = 0. This effect is due to the resolution of the BOSS spectrograph, which smooths over the observed Lyα forest such that even saturated Lyα forest absorbers with $N_{{\rm H\,\scriptsize{I}}}\,{\gtrsim}\,10^{14}\hbox{--}10^{16}\, \mathrm{cm^{-2}}$ rarely reach transmission values of F  ≲  0.3. The pixels with F  ≲  0.3 are usually contributed either by blends of absorbers or optically thick LLSs (see also Pieri et al. 2014).

Figure 6.

Figure 6. Lyα forest transmission PDFs, p(F), measured from different subsamples of our BOSS sample, at various redshift (with Δz = 0.3) and S/N. Both the upper and lower panels show the PDF, but with linear and logarithmic ordinate axes, respectively. The different colors and line styles denote our different S/N subsamples at each redshift. The error bars are estimated from bootstrap resampling over Δv = 2 × 104 km s−1 segments from the contributing spectra. Table 1 summarizes the number of spectra and pixels that contribute to each bin.

Standard image High-resolution image

An advantage of our large sample size is that it is also able to directly estimate the error covariances, Cboot, via bootstrap resampling—an example is shown in Figure 7. In contrast to the Lyα forest transmission PDF from high-resolution data which have significant off-diagonal covariances (Bolton et al. 2008), the error covariance from the BOSS transmission PDF is nearly diagonal with just some small correlations between neighboring bins, although we also see some anti-correlation between transmission bins at F  ∼  0.8 and F  ∼  1.

Figure 7.

Figure 7. Top: 2D density plot of the error covariance matrix for the Lyα forest transmission PDF from the 〈z〉 = 2.6, S/N = 8–10 BOSS subsample as a function of transmission bins, along with (bottom) the corresponding correlation function. The covariance matrix was estimated through bootstrap resampling, and the values have been multiplied by 104 for clarity. The covariances are largely diagonal, except for some cross-correlations between neighboring bins.

Standard image High-resolution image

It is interesting to compare the transmission PDF from our data with that measured by Desjacques et al. (2007) from SDSS DR3. This comparison is shown in Figure 8, in which the transmission PDFs calculated from SDSS DR3 Lyα forest spectra with S/N > 4 (kindly provided by Dr. V. Desjacques) are shown for two redshift bins, juxtaposed with the BOSS transmission PDFs calculated from spectra with the same redshift and S/N cuts.

Figure 8.

Figure 8. Comparison between the Lyα forest transmission PDFs measured from our BOSS DR9 sample (solid black lines), and the SDSS DR3 sample from Desjacques et al. (2007; dashed red lines). Only sightlines with S/N > 4 were used in evaluating these PDFs. The lower average transmission of the DR3 PDFs is because Desjacques et al. (2007) had directly extrapolated a power law from λrest > 1216 Å for continuum estimates, which does not take into account a flattening of the quasar continuum that occurs at λrest  ∼  1200 Å; our BOSS spectra, in contrast, have been normalized to mean transmission values in agreement with the latest measurements and takes this effect into account.

Standard image High-resolution image

While there is some resemblance between the two PDFs, the most immediate difference is that the Desjacques et al. (2007) PDFs are shifted to lower transmission values, i.e., the mean transmission, 〈F〉, is considerably smaller than that from our BOSS data: 〈F〉(z = 2.4) = 0.73 and 〈F〉(z = 3.0) = 0.64 from their measurement, whereas the BOSS PDFs have 〈F〉(z = 2.4) = 0.80 and 〈F〉(z = 3.0) = 0.70. This difference arises because the Desjacques et al. (2007) used a power-law continuum (albeit with corrections for the weak emission lines in the quasar continuum) extrapolated from λrest > 1216 Å in the quasar rest-frame; this does not take into account the power-law break that appears to occur in low-redshift quasar spectra at λrest ≈ 1200 Å (Telfer et al. 2002; Suzuki 2006). Later in their paper, Desjacques et al. (2007) indeed conclude that this must be the case in order to be consistent with other 〈F〉(z) measurements. Our continua, in contrast, have been constrained to match existing measurements of 〈F〉(z), for which there is good agreement between different authors at z  ≲  3 (e.g., Faucher-Giguère et al. 2008; Becker et al. 2013).

Another point of interest in Figure 8 is that the error bars of the BOSS sample are considerably smaller than those of the earlier measurement. This difference is largely due to the significantly larger sample size of BOSS. The proportion of pixels with F  ≲  0 appears to be smaller in the BOSS PDFs compared with the older data set, but this is because Desjacques et al. (2007) did not remove DLAs from their data.

We next describe the creation of mock Lyα absorption spectra designed to match the properties of the BOSS data.

4. MODELING OF THE BOSS TRANSMISSION PDF

In this section, we will describe simulated Lyα forest mock spectra designed, through a "forward-modeling" process, to have the same characteristics as the BOSS spectra, for comparison with the observed transmission PDFs described in the previous section. For each BOSS spectrum that contributed to our transmission PDFs in the previous section, we will take the Lyα absorption from randomly selected simulation sightlines, then introduce the characteristics of the observed spectrum using auxiliary information returned by our pipeline.

Starting with simulated spectra from a set of detailed hydrodynamical IGM simulations, we carry out the following steps, which we will describe in turn in the subsequent subsections:

  • 1.  
    Introduce LLS absorbers.
  • 2.  
    Smooth the spectrum to BOSS resolution.
  • 3.  
    Add metal absorption via an empirical method using lower-redshift SDSS/BOSS quasars.
  • 4.  
    Add pixel noise, based on the noise properties of the real BOSS spectrum using parameters estimated by our MCMC noise estimation technique.
  • 5.  
    Simulate continuum errors by refitting the noisy mock spectrum.

In the subsequent subsections, we will describe each step in detail. The effect of each step on the observed transmission PDF is illustrated in Figure 9.

Figure 9.

Figure 9. Cumulative effect of various aspects of our forward model that attempts to reproduce the Lyα forest transmission PDF from BOSS. Starting with the "raw" transmission PDF from the simulations (top), the black curve in each panel shows the PDF from the prior panel, while the red curve shows the effect from (a) the addition of LLS, (b) smoothing from the finite spectrograph resolution, (c) contamination from lower-redshift metals, (d) pixel noise, and (e) continuum fitting errors. The transmission PDF modeled in this figure is from the 〈z〉 = 2.3, 8  <  S/N  <  10 bin.

Standard image High-resolution image

4.1. Hydrodynamical Simulations

As the basis for our mock spectra, we use hydrodynamic simulations run with a modification of the publicly available GADGET-2 code. This code implements a simplified star formation criterion (Springel et al. 2005) that converts all gas particles that have an overdensity above 1000 and a temperature below 105 K into star particles (see Viel et al. 2004). The simulations used are described in detail in Becker et al. (2011) and in Viel et al. (2013a).

The reference model that we use is a box of length 20 h−1 comoving Mpc with 2 × 5123 gas and cold DM particles (with a gravitational softening length of 1.3 h−1 kpc) in a flat ΛCDM universe with cosmological parameters Ωm = 0.274, Ωb = 0.0457, ns = 0.968, H0 = 70.2 km s−1 Mpc−1 and σ8 = 0.816, in agreement both with nine-year WMAP (Komatsu et al. 2011) and Planck data (Planck Collaboration et al. 2014). The initial condition power spectra are generated with CAMB (Lewis et al. 2000). For the boxes considered in this work, we have verified that the transmission PDF has converged in terms of box size and resolution.

We explore the impact of different thermal histories on the Lyα forest by modifying the ultraviolet (UV) background photo-heating rates in the simulations as done in, e.g., Bolton et al. (2008). A power-law temperature–density relation, T = T0Δγ − 1, arises in the low-density IGM (Δ < 10) as a natural consequence of the interplay between photo-heating and adiabatic cooling (Hui et al. 1997; Gnedin & Hui 1998). The value of γ within a simulation can be modified by varying a density-dependent heating term (see, e.g., Bolton et al. 2008). We consider a range of values for the temperature at mean density, T0, and the power-law index of the temperature–density relation, γ, based on the observational measurements presented recently by Becker et al. (2011). These consist of a set of three different indices for the temperature–density relation, γ(z = 2.5)  ∼  1.0, 1.3, 1.6, that are kept roughly constant over the redshift range z = [2–6] and three different temperatures at mean density, T0(z = 2.5)  ∼  [11000, 16000, 21500] K, which evolve with redshift, yielding a total of nine different thermal histories. Between z = 2 and z = 3 there is some temperature evolution and the IGM becomes hotter at low redshift; at z = 2.3, the models have T0  ∼  [13000, 18000, 23000] K. We refer to the intermediate temperature model as our "reference" model, or T_REF, while the hot and cold models are referred to as T_HOT and T_COLD, respectively. The values of T0 of our simulations at the various redshifts are summarized in Table 2.

Table 2. Evolution of T0 in Hydrodynamical Simulations

z T_COLD T_REF T_HOT
2.3 13000 K 18000 K 23000 K
2.6 11000 K 16000 K 21500 K
3.0  9000 K 14000 K 19000 K

Download table as:  ASCIITypeset image

Approximately 4000 core hours were required for each simulation run to reach z = 2. The physical properties of the Lyα forest obtained from the Tree PM/SPH code GADGET-2 are in agreement at the percent level with those inferred from the moving-mesh code AREPO (Bird et al. 2013) and with the Eulerian code ENZO (O'Shea et al. 2004).

For this study, the simulation outputs were saved at z = [2.3, 2.6, 3.0], from which we extract 5000 optical depth sightlines binned to 2048 pixels each. To convert these to transmission spectra, the optical depths were rescaled such that the skewers collectively yielded a desired mean transmission, 〈FLyα ≡ exp (− τLyα). For our fiducial models, we would like to use the mean transmission values estimated by Becker et al. (2013), which we denote as 〈FLyα, B13 ≡ exp (− τLyα, B13). However, their estimates assume certain corrections from optically thick systems and metal absorption. We therefore add back in the corrections they made (see discussion in Section 3.2) to get their "raw" measurement for 〈F〉 that now includes all optically thick systems and metals, and then remove these contributions assuming our own LLS and metal absorption models (see below).

Later in the paper, we will argue that our PDF analysis in fact places independent constraints on 〈FLyα.

4.2. Lyman-limit Systems

In principle, all optically thick Lyα absorbers such as LLSs and DLAs should be discarded from Lyα forest analyses, since they do not trace the underlying matter density field in the same way as the optically thin forest (Equation (1)), and require radiative transfer simulations to accurately capture their properties (e.g., McQuinn et al. 2011; Rahmati et al. 2013).

While DLAs are straightforward to identify through their saturated absorption and broad damping wings even in noisy BOSS data (see, e.g., Noterdaeme et al. 2012), the detection completeness of optically thick systems through their Lyα absorption drops rapidly at $N_{{\rm H\,\scriptsize{I}}}\,{\lesssim}\,10^{20} \ \mathrm{cm^{-2}}$. Even in high-S/N, high-resolution spectra, optically thick systems can only be reliably detected through their Lyα absorption at $N_{{\rm H\,\scriptsize{I}}}\gtrsim 10^{19}\ \mathrm{cm^{-2}}$ ("super-LLS"). Below these column densities, optically thick systems can be identified either through their rest-frame 912 Å Lyman-limit (albeit only one per spectrum) or using higher-order Lyman-series lines (e.g., Rudie et al. 2013). Neither of these approaches have been applied in previous Lyα forest transmission PDF analyses (McDonald et al. 2000; Kim et al. 2007; Calura et al. 2012; Rollinde et al. 2013), so arguably all these analyses are contaminated by LLSs.

Instead of attempting to remove LLSs from our observed spectra, we incorporate them into our mock spectra through the following procedure. For each PDF bin, we evaluate the total redshift pathlength of the contributing BOSS spectra (and corresponding mocks)—this quantity is summarized in Table 1. This is multiplied by lLLS(z), the number of LLS per unit redshift, to give the total number of LLS expected within our sample. We used the published estimates of this quantity by Ribaudo et al. (2011)21 which is valid over 0.24  <  z  <  4.9:

Equation (10)

where lz0 = 0.1157 and γLLS = 1.83.

After estimating the total number of LLSs in our mock spectra, lLLS(zz, we add them at random points within our set of simulated optical depth skewers. We also experimented with adding LLSs such that they are correlated with regions that already have high column density (e.g., Font-Ribera & Miralda-Escudé 2012), but we found little significant changes to the transmission PDF and therefore stick to the less computationally intensive random LLSs.

For each model LLS, we then draw a column density using the published LLS column density distribution, $f(N_{{\rm H\,\scriptsize{I}}})$, from Prochaska et al. (2010). This distribution is measured at z ≈ 3.7, so we make the assumption that $f(N_{{\rm H\,\scriptsize{I}}})$ does not evolve with redshift between 2  ≲  z  ≲  3.7. For our column densities of interest, this distribution is represented by the broken power laws:

Equation (11)

For the normalizations k1 and k2, we demand that

Equation (12)

and require both power laws to be continuous at $N_{{\rm H\,\scriptsize{I}}}= 10^{19.0}\ \mathrm{cm^{-2}}$. These constraints produce k1 = 10−4.505 and k2 = 103.095. After drawing a random value for the column density of each LLS, we add the corresponding Voigt profile to the optical depth in the simulated skewer.

In addition to the LLS with column densities of $10^{17.5}\ \mathrm{cm^{-2}}\,{<}\,N_{{\rm H\,\scriptsize{I}}}\,{<}\,10^{20.3}\ \mathrm{cm^{-2}}$ that are defined to have $\tau _{{\rm H\,\scriptsize{I}}} \ge 2$, there is also a population of partial Lyman-limit systems (pLLSs) that are not well-captured in our hydrodynamical simulations since they have column densities ($10^{16.5}\ \mathrm{cm^{-2}}\,{\lesssim}\,N_{{\rm H\,\scriptsize{I}}}< 10^{17.5}\ \mathrm{cm^{-2}}$) at which radiative transfer effects become significant ($\tau _{{\rm H\,\scriptsize{I}}} \gtrsim 0.1$). However, the incidence rates and column-density distribution of pLLSs are ill-constrained since they are difficult to detect in normal LLS searches. We therefore account for the pLLS by extrapolating the low end of the power-law distribution in Equation (11) down to $N_{{\rm H\,\scriptsize{I}}}= 10^{16.5}\ \mathrm{cm^{-2}}$, i.e.,

Equation (13)

This simple extrapolation does not take into account constraints from the mean free path of ionizing photons (e.g., Prochaska et al. 2010), which predicts a steeper slope for the pLLS distribution, but we will explore this later in Section 5.2.

Comparing the integral of this extrapolated pLLS distribution with Equation (12) leads us to conclude that

Equation (14)

and we proceed to randomly add pLLSs to our mock spectra in the same way as LLSs.

The other free parameter in our LLS model is their effective b-parameter distribution. However, due to the observational difficulty in identifying $N_{{\rm H\,\scriptsize{I}}}\,{\lesssim}\,18.5\ \mathrm{cm^{-2}}$ LLSs, the b-parameter distribution of this distribution has, to our knowledge, never been quantified. Due to this lack of knowledge, it is common to simply adopt a single b-value when attempting to model LLSs (e.g., Font-Ribera & Miralda-Escudé 2012; Becker et al. 2013). We therefore assume that all our pLLSs and LLSs have a b-parameter of b = 70 km s−1 similar to DLAs (Prochaska & Wolfe 1997), an "effective" value meant to capture the blending of multiple Lyα components. However, the b-parameter for this population of absorbers is a highly uncertain quantity and as we shall see, it will need to be modified to provide a satisfactory fit to the data although it will turn out to not strongly affect our conclusions regarding the IGM temperature–density relationship. Figure 10 illustrates the effect of adding a LLS to a mock spectrum, and its subsequent smoothing with the spectrograph resolution kernel (next section).

Figure 10.

Figure 10. Simulated 〈z〉 = 2.3 Lyα forest skewer from our hydrodynamical simulations, without smoothing (top panel) and smoothed to BOSS resolution (bottom panel). The black curve is the simulated transmission directly extracted from the simulations, while the red curve is the same transmission field but with an LLS added at λ = 4057 Å or z = 2.337. The blue curve in the bottom panel shows the effect of the metal absorbers added using our empirical method. For illustrative purposes, we have specifically chosen to this simulated sightline to have significant LLS and metal absorption; it is possible for a sightline to have neither. The dashed horizontal line denotes F = 0.3, below which our fiducial transmission PDF model disagrees with BOSS (see Section 5).

Standard image High-resolution image

4.3. Spectral Resolution

The spectral resolution of SDSS/BOSS spectra is R ≡ λ/Δλ ≈ 1500–2500 (Smee et al. 2013). The exact value varies significantly both as a function of wavelength, and across different fibers and plates depending on observing conditions (Figure 1).

For each spectrum, the BOSS pipeline provides an estimate of the 1σ wavelength dispersion at each pixel, σdisp, in units of the co-added wavelength grid size (Δlog10λ = 10−4). The spectral resolution at that pixel can then be obtained from the dispersion, through the following conversion: R ≈ (2.35 × 1 × 10−4ln 10 σdisp)−1. Figure 1 shows the pixel dispersions from 236 randomly selected BOSS quasar as a function of wavelength at the blue end of the spectrograph. Even at fixed wavelength, there is a considerable spread in the dispersion, e.g., ranging from σdisp ≈ 0.9–1.8 at 3700 Å. The value of σdisp typically decreases with wavelength (i.e., the resolution increases).

In their analysis of the Lyα forest 1D transmission power spectrum, Palanque-Delabrouille et al. (2013) made their own study of the BOSS spectral resolution by directly analyzing the line profiles of the mercury and cadmium arc lamps used in the wavelength calibration. They found that the pipeline underestimates the spectral resolution as a function of fiber position (i.e., CCD row) and wavelength: the discrepancy is <1% at blue wavelengths and near the CCD edges, but increases to as much as 10% at λ  ∼  6000 Å near the center of the blue CCD (compare with Figure 4 in Palanque-Delabrouille et al. 2013). Our analysis is limited to λ  ⩽  5045 Å, i.e., z  ⩽  3.15, where the discrepancy is under 4%. Nevertheless, we implement these corrections to the BOSS resolution estimate to ensure that we model the spectral resolution to an accuracy of <1%.

For each BOSS Lyα forest segment that contributes to the observed transmission PDFs discussed in Section 3, we concatenate randomly selected transmission skewers from the simulations described in the previous section. This is because the simulation box size of L = 20 h−1 Mpc (Δv  ∼  2000 km s−1) is significantly shorter than the path length of our redshift bins (Δz = 0.3, or Δv ≈ 27, 000 km s−1). This ensures that each BOSS spectrum in our sample has a mock spectrum that is exactly matched in pathlength.

We then directly convolve the simulated skewers by a Gaussian kernel with a standard deviation that varies with wavelength, using the estimated resolution from the real spectrum, multiplied by the Palanque-Delabrouille et al. (2013) resolution corrections. The effect of smoothing on the transmission PDF is illustrated by the dashed red curve in Figure 9(b). Smoothing decreases the proportion of pixels with high transmission (F ≈ 1) and with high absorption (F ≈ 0), and increases the number of pixels with intermediate transmission values.

4.4. Metal Contamination

Metal absorption along our observed Lyα forest sightlines acts as a contaminant since their presence alters the observed statistics of the Lyα forest. In high-resolution data, this contamination is usually treated by directly identifying and masking the metal absorbers, although in the presence of line blending it is unclear how thorough this approach can be.

With the lower S/N and moderate resolution of the BOSS data, direct metal identification and masking is not a viable approach. Furthermore, most of the weak metal absorbers seen in high-resolution spectra are not resolved in the BOSS data.

Rather than removing metals from the BOSS Lyα forest spectra, we instead add metals as observed in lower-redshift quasar spectra. In other words, we add absorbers observed in the rest-frame λrest ≈ 1260–1390 Å region of lower-redshift quasars with 1 + zqso ≈ (1216 Å/1300 Å)(1 + 〈z〉), such that the observed wavelengths are matched to the Lyα forest segment with average redshift 〈z〉. Figure 11 is an illustration that illustrates this concept. This method makes no assumption about the nature of the metal absorption in the Lyα forest, and includes all resolved metal absorption spanning the whole range of redshifts down to z  ∼  0. The disadvantage of this method is that it does not include metals with intrinsic wavelengths λ  ≲  1300 Å, but the relative contribution of such metal species toward the transmission PDF should be small22 since most of the metal contamination comes from low-redshift (z  ≲  2) C iv and Mg ii.

Figure 11.

Figure 11. Illustration of our empirical "sideband" model of metal contamination in our mock Lyα forest spectra. The lower panel shows the zqso = 2.7 quasar along with its Lyα forest region (red) that we wish to model. To its corresponding mock spectrum, we add metals observed in the λrest ≈ 1260–1390 Å region of a lower-redshift (zqso = 2.0) quasar (blue region in top panel).

Standard image High-resolution image

We use a metal catalog generated by B. Lundgren et al. (in preparation; see also Lundgren et al. 2009), which lists absorbers in SDSS (Schneider et al. 2010) and BOSS quasar spectra (Pâris et al. 2012)—the SDSS spectra were included in order to increase the number of zqso ≈ 1.9–2.0 quasars needed to introduce metals into the 〈z〉 = 2.3 Lyα forest mock spectra, which are not well sampled by the BOSS target selection (Ross et al. 2012). We emphasize that we work with the "raw" absorber catalog, i.e., the individual absorption lines have not been identified in terms of metal species or redshift. For each quasar, the catalog provides a line list with the observed wavelength, EW (Wr), FWHM, and detection S/N, $W_r/\sigma _{W_r}$. To ensure a clean catalog, we use only $W_r/\sigma _{W_r} \ge 3.5$ absorbers in the catalog that were identified from quasar spectra with S/N > 15 per angstrom redward of Lyα. The latter criterion ensures that even relatively weak lines (with EW ≳ 0.5 Å) are accounted for in our catalog. Figure 12 shows an example of the lower-redshift quasar spectra that we use for the metal modeling.

Figure 12.

Figure 12. Continuum-normalized spectrum of a BOSS quasar showing the metal absorbers in the 1300 Å  <  λrest  <  1390 Å "sideband" region, which would be used to add metals to 〈z〉 = 2.6 mock Lyα forest spectra. The red curve shows our metal model for this spectrum, generated from the observed wavelengths and equivalent widths in the absorber catalog generated by the automatic algorithm of Lundgren et al. (2009). We also assume that the absorbers all lie on the saturated portion of the curve-of-growth and have τ0 = 3, with the equivalent width (labeled above each absorption line) proportional to the b-parameter. The model absorption profiles represented by the red curve would be added to our mock Lyα forest spectra. We have chosen to plot this particular "sideband" because it has more absorbers than average—the typical spectrum has less metal absorption than this.

Standard image High-resolution image

However, we want to add a smooth model of the metal-line absorption to add to our mock spectra, rather than adding in a noisy spectrum. We therefore use a simple model as follows: For each Lyα forest segment we wish to model at redshift 〈z〉, we select an absorber line-list from a random quasar with 1 + zqso ≈ (1216 Å/1300 Å)(1 + 〈z〉). We next assume that all resolved metals in the SDSS/BOSS spectra are saturated and thus in the flat regime of the curve-of-growth. The EW is then given by

Equation (15)

where τ0 is the optical depth at line center, b is the velocity width and c is the speed of light. In the saturated regime, Wr is mostly sensitive to changes in b while being highly insensitive to changes in τ0. We can thus adopt τ0 as a global constant and solve for b, given the Wr of each listed absorber in the selected "sideband" quasar. We have found that τ0 = 3 provides a good fit for most of the absorbers.

We then add the Gaussian profile into our simulated optical depth skewers:

Equation (16)

centered at the same observed wavelength, λ, as the real absorber. The red curve in Figure 12 shows our model for the observed absorbers, using just the observed wavelength, λ, and EW, Wr, from the absorber catalog.

Our method for incorporating metals is somewhat crude since one should, in principle, first deconvolve the spectrograph resolution from the input absorbers, and then add the metal absorbers into the mock spectra prior to convolving with the BOSS spectral resolution. In contrast, we fit b-parameters to the absorber catalog without spectral deconvolution, and therefore these b-parameters can be thought of as combinations of the true absorber width, babs, and the spectral dispersion, σdisp, i.e., $b^2\,{\sim}\,b^2_{\rm abs} + \sigma ^2_{\rm disp}$. While technically incorrect, this seems reasonable since the template quasar spectra and forest spectra that we are attempting to model both have approximately the same resolution, and in practical terms this ad hoc approach does seem to be able to reproduce the observed metals in the lower-redshift quasar spectra (Figure 12). The other possible criticism of our approach is that it does not incorporate weak metal absorbers, although we attempted to mitigate this by setting a very high S/N threshold on the template quasars for the metals. However, we have checked that such weak metals do not significantly change the forest PDF (and indeed metals in general do not seriously affect the PDF, compare with Figure 9(c)).

We also tried adding metals with similar redshifts to—and correlated with—forest absorbers (e.g., absorption by Si ii and Si iii) measured in Pieri et al. (2010, 2014) using a method described in the Appendix of Slosar et al. (2011). We found a negligible impact on the transmission PDF owing mainly to the fact that these correlated metals contribute only ∼0.3% to the overall flux decrement, so we neglect this contribution in our subsequent analysis.

4.5. Pixel Noise

It is non-trivial to introduce the correct noise to a simulated Lyα forest spectrum: given a noise estimate from the observed spectrum, one needs to first ensure that the mock spectrum has approximately the same flux normalization as the data. This is challenging, as the Lyα forest transmission at any given pixel, which ranges from 0 to 1, will vary considerably between the simulated spectrum and the real data.

The simplest method of adding noise to a mock spectrum is simply to introduce Gaussian deviates using the pipeline noise estimate for each spectrum—this was essentially the method used by Desjacques et al. (2007) and the BOSS mocks described in Font-Ribera et al. (2012). However, with the MCMC co-addition procedure described in Section 3.1, we are in a position to model the noise in a more robust and self-consistent fashion.

Recall that the MCMC procedure returns posterior probabilities for two quantities: the true underlying spectral flux density, $\mathcal {F}_{\lambda }$, and the four free parameters Aj, which parameterize the noise in each spectrum. This estimate of the Aj from each quasar spectrum allows us to accurately model the pixel noise using Equation (5).

The MF-PCA method (Section 3.2) produces an estimate of the quasar continuum, c, providing approximately the correct flux level at each point in the spectrum. We can now multiply c with the simulated Lyα forest transmission spectra, F, which had already been smoothed to the same dispersion as its real counterpart (the estimated quasar continuum is already at approximately the correct smoothing, since it was fitted to the observed spectrum).

This procedure produces a noiseless mock spectrum with the correct flux normalization and smoothing. We can now generate noisy spectra corresponding to a given BOSS quasar, using our MCMC noise estimation described in Section 3.1. First, we substitute our mock spectrum as $\mathcal {F}_{\lambda }$ into Equation (5), and then combine the Aj noise parameters (estimated through our MCMC procedure) as well as the calibration vectors Sλ, i and sky estimates sλ, i. This lets us generate self-consistent noise vectors corresponding to each individual exposure that make up the mock quasar spectrum, σλi. The noise vectors are then used to draw random Gaussian deviates that are added to the mock spectrum, on a per-pixel basis, to create the mock spectral flux density, fλi. Finally, we combine these individual mock exposures into the optimal spectral flux density for the mock spectrum, through the expression (see the Appendix):

Equation (17)

where

Equation (18)

Figure 9(c) illustrates the effect of adding pixel noise to the smoothed Lyα forest transmission PDF. As expected, this scatters a significant fraction of pixels to F  >  1, and also to F  <  0 to a smaller extent.

4.6. Continuum Errors

With the noisy mock spectrum in hand (see, e.g., bottom panel of Figure 13), we can self-consistently include the effect of continuum errors into our model transmission PDFs by simply carrying out our MF-PCA continuum-fitting procedure on the individual noisy mock spectra. Dividing out the mock spectra with the new continuum fits then incorporates an estimate of the continuum errors (estimated by Lee et al. 2012 to be at the ∼4%–5% rms level) into the evaluated model transmission PDF. This estimated error includes uncertainties stemming from the estimation of the quasar continuum shape due to pixel noise, as well as the random variance in the mean Lyα forest absorption in individual lines-of-sight.

Figure 13.

Figure 13. Simulating the noise properties and continuum errors of a BOSS quasar. The top panel shows the observed spectrum of a BOSS quasar, and its associated continuum fit, c, in blue. The middle panel shows the simulated transmission spectra (after adding LLS, smoothing, and adding metals) multiplied by the quasar continuum fitted to the true spectrum. In the lower panel, we have added noise to the mock spectrum using the noise parameters estimated from the true spectrum (see Section 3.1). A new continuum, c', (red) is re-fitted to the noisy mock spectrum. The difference between new continuum c' and "true" continuum, c, of the mock (blue) introduces continuum errors to our model. The vertical dotted lines indicates the range of pixels that contribute to the 〈z〉 = 3.0 subsample in our transmission PDF; a small segment between (1 + zqso)1040 Å = 4461 Å and (1 + 2.75)1216 Å = 4560 Å also contributes to the 〈z〉 = 2.6 bin.

Standard image High-resolution image

Note that regardless of the overall mean absorption in the mock spectra (i.e., inclusive of our models for metals, LLSs, and mean forest absorption—see Section 5.4), we always use 〈Fcont(z), the same input mean transmission derived from Becker et al. (2013; described in Section 3.2) to fit the continua in both the data and mock spectra. While the overall absorption in our fiducial model is consistent with that from Becker et al. (2013), as we shall see later, the shape of the transmission PDF retains information on the true underlying mean transmission even if fitted with a mean flux regulated continuum with a wrong input 〈F〉(z).

The effect of continuum errors on the transmission PDF is shown in Figure 9(e): like pixel noise, it degrades the peak of the PDF, but only near F  ∼  1.

5. MODEL REFINEMENT

In an ideal world, one would like to do a blind analysis by generating the transmission PDF model (Section 4) in isolation from the data, before "unblinding" to compare with data—this would then in principle yield results free from psychological bias in the model building. However, as we shall see in Section 5.1, this does not give acceptable fits to the data so we have to instead modify our model to yield a better agreement, in particular our LLS model (Section 5.2) and assumed mean transmission (Section 5.4).

5.1. Initial Comparison with T_REF Models

For each of our nine hydrodynamical simulations (sampling three points each in T0 and γ), we determine the transmission PDF from the Lyα forest mock spectra that include the effects described in the previous section, for the various redshift and S/N subsamples in which we had measured the PDF in BOSS (Section 3.3). In Figure 14, we show the transmission PDFs for all our redshift and S/N subsamples in BOSS, compared with the corresponding simulated transmission PDFs from the T_REF simulation with γ = [1.0, 1.3, 1.6]. Note that the error bars shown are the diagonal elements of the covariance matrix estimated through bootstrap resampling on the data.

Figure 14.

Figure 14. Initial comparison between the transmission PDFs observed from BOSS Lyα forest data (error bars) and simulated PDFs generated from the T_REF hydrodynamical simulations (curves) with the method described in Section 4; each row is at the same redshift, while the different columns display the different S/N cut. The points with the error bars are the PDFs measured from the BOSS data (estimated from bootstrap resampling, while the black, dotted red, and dashed blue curves denote simulated PDFs with γ = [1.5, 1.3, 1.0], respectively. The top and middle panels show the transmission PDFs with linear and logarithmic axes, while the lower panels show the pull, i.e., residuals between the simulated PDF and the data PDF, divided by the error. The χ2 values indicated in these plots are for 24 dof, and clearly indicate unacceptable fits to the data—modifications to the model are required.

Standard image High-resolution image

At first glance, the model transmission PDFs seem to be a reasonable match for the data, especially considering we have carried out purely forward modeling without fitting for any parameters. However, when comparing the "pull," (pdata, ipmodel, i)/σp, i, between the data and model (bottom panels of Figure 14), we see significant discrepancies in part due to the extremely small bootstrap error bars. Nevertheless, it is gratifying to see that the shape of the residuals is relatively consistent across the different S/N subsamples at fixed redshift and γ, since this indicates that our spectral noise model is robust.

We proceed to quantify the differences between the simulated transmission PDFs, pmodel, and observed transmission PDFs, pdata, with the χ2 statistic:

Equation (19)

where we use the bootstrap error covariance matrix, Cboot. Note that we also include a bootstrap error term that accounts for the sample variance in the model transmission PDFs, since our pipeline for generating mock spectra is too computationally expensive to include sufficiently large amounts of skewers to fully beat down the sample variance in the models.23

We limit our model comparison to the range −0.1  ⩽  F  ⩽  1.2, i.e., 27 transmission bins with bin width Δ(F) = 0.05. This range covers pixels that have been scattered to "unphysical" values of F < 0 or F > 1 due to pixel noise, as is expected from the low-S/N of our BOSS data, and also captures >99.8% of the pixels within each of our data subsets. In particular, it is important to retain the bins with F  >  1 because the F  ∼  1 transmission bins are highly sensitive to γ (Lee 2012) and therefore we want to fully sample that region of the PDF even if it will require careful modeling of pixel noise and continuum errors.

There are two constraints on all our transmission PDFs: the normalization convention

Equation (20)

and the imposition of the same mean transmission due to the mean flux regulated continuum-fitting

Equation (21)

such that all the mock spectra have the same absorption, 〈Fcont(z). This is because the mock spectra have been continuum-fitted (Section 4.6) in exactly the same way as the BOSS spectra, which assumes the same mean Lyα transmission inferred from the Becker et al. (2013) measurements (Section 3.2). The "true" optically thin mean transmission, 〈FLyα, imposed on the simulation skewers is in principle a different quantity from 〈Fcont, since the latter includes contribution from metal contamination and optically thick LLSs.

This leaves us with ν = 27 − 1 − 2 = 24 degrees of freedom (dof) in our χ2 comparison. The χ2 for all the models shown in Figure 14 are shown in the corresponding figure legends.

In this initial comparison, the χ2 values for the models in Figure 14 are clearly unacceptable: we find χ2 ≳ 200 for 24 dof in all cases. However, it is interesting to note that the γ = 1.6 or γ = 1.3 models are preferred at all redshifts and S/N cuts. Note that the S/N = 8–10 subsamples (middle column in Figure 14) tends to have a slightly better agreement between model and data compared to the other S/N cuts at the same redshift: this simply reflects the smaller quantity of data of the subsample (compare with Table 1) and hence larger bootstrap errors.

A closer inspection of the residuals in Figure 14 indicate that there are two major sources of discrepancy between the models and data: first, at the low-transmission end, we underproduce pixels at 0.1  ≲  F  ≲  0.4 while simultaneously overproducing F  ≲  0.1 pixels, especially at 〈z〉 = 2.3 and 〈z〉 = 2.6. This seems to affect all γ models equally. Pieri et al. (2014) found that at BOSS resolution, pixels with F  ≲  0.3 come predominantly from saturated Lyα absorption from LLS. We therefore investigate possible modifications to our LLS model in Section 5.2.

The other discrepancy in the model transmission PDFs manifests at the higher-transmission end in the 〈z〉 = 2.6 and 〈z〉 = 3 subsamples, where we see a sinusoidal shape in the residuals at F  >  0.6 that appears consistent across different S/N. This portion of the transmission PDF depends on both γ and, as we shall see, on the assumed mean transmission 〈F〉(z), which we shall discuss in more detail in Section 5.4.

Finally, our transmission PDF model includes various uncertainties in the modeling of metals, LLSs, and continuum-fitting which have not yet been taken into account. In Section 5.3, we will estimate the contribution of these uncertainties, by means of a Monte Carlo method, in our error covariances.

5.2. Modifying the LLS Column Density Distribution

With the moderate spectral resolution of BOSS, there are few individual pixels in the optically thin Lyα forest that reach transmission values of F  ≲  0.4. Such low-transmission pixels are typically due to either the blending of multiple absorbers (see, e.g., Figure 2 in Pieri et al. 2014), or optically thick systems (see Figure 10 in this paper).

As we have seen in Figure 14, at low-transmission values the discrepancy between data and model has a distinct shape, which is particularly clear at 〈z〉 = 2.3: the models underproduce pixels at 0.1  ≲  F  ≲  0.4 while at the same time overproducing saturated pixels with F ≈ 0.

To resolve this particular discrepancy would therefore require either drastically increasing the amount of clustering in the Lyα forest, or modifying our assumptions on the LLSs in our mock spectra. The first possibility seems rather unlikely since the Lyα forest power on relevant scales are well-constrained (Palanque-Delabrouille et al. 2013), and would in any case require new simulation suites to address—beyond the scope of this paper.

On the other hand, it is not altogether surprising that our fiducial column density distribution (Section 4.2)—which was measured at z ≈ 3.7 (Prochaska et al. 2010)—do not reproduce the BOSS data at 〈z〉 = 2.3–2.6. We therefore search for an LLS model that better describes the low-transmission end of the BOSS Lyα forest. Looking at the 〈z〉 = 2.3 PDFs in Figure 14, we see that our fiducial model overproduces pixels at F = 0, yet is deficient at slightly higher F. This suggests that our model is overproducing super-LLS ($N_{{\rm H\,\scriptsize{I}}}> 10^{19}\ \mathrm{cm^{-2}}$) that contribute large absorption troughs with F = 0, while not providing sufficient lower-column density absorbers that can individually reach minima of 0.1  ≲  F  ≲  0.4 when smoothed to BOSS resolution. In other words, our fiducial model appears to have an excessively "top-heavy" LLS column density distribution.

For a change, we will try an LLS column density distribution with a more ample bottom end, using the steepest power laws within the 1σ limits estimated by Prochaska et al. (2010):

Equation (22)

We use the same lLLS(z) as before, and obey the integral constraints from Prochaska et al. (2010) that demand that the ratios of $\int f(N_{{\rm H\,\scriptsize{I}}})\; {d}N_{{\rm H\,\scriptsize{I}}}$ between the two column-density regimes be fixed. This gives us k1 = 102.819 and k2 = 107.039, although the new distribution is no longer continuous at $N_{{\rm H\,\scriptsize{I}}}= 10^{19} \ \mathrm{cm^{-2}}$. This new distribution is illustrated by the red power laws in Figure 15.

Figure 15.

Figure 15. LLS and pLLS column-density power-law distributions used in our initial model (black; Section 4.2) and steeper modification (red; Section 5.2). The distributions are normalized assuming the overall LLS incidence rate at z = 2.25 (compare with Equation 10). The vertical dashed lines denotes the $N_{{\rm H\,\scriptsize{I}}}= 10^{17.5}\ \mathrm{cm^{-2}}$ boundary between pLLS and LLS, and $N_{{\rm H\,\scriptsize{I}}}= 10^{19}\ \mathrm{cm^{-2}}$ boundary between LLS and super-LLS. The shaded regions show the range of possible distributions as determined by Prochaska et al. (2010), but there are few robust constraints in the $10^{16.5}\ \mathrm{cm^{-2}}\le N_{{\rm H\,\scriptsize{I}}}\le 10^{17.5}\ \mathrm{cm^{-2}}$ pLLS regime. The "initial" distribution was used in the preliminary data comparisons in Section 5.1, but all subsequent analysis (after Section 5.2) assumes the "steep" distribution.

Standard image High-resolution image

Another change we have made is to the partial LLS model, which was possibly too conservative in the fiducial model. Instead of extrapolating from the LLS distribution, we now adopt the pLLS power-law slope of βpLLS = −2.0 inferred from the total mean free path to ionizing photons by Prochaska et al. (2010). This dramatically increases the incidence of pLLS in our spectra relative to LLS: we now have lpLLS = 1.8 lLLS, where lLLS is the same value we used previously (Equation (10)). This increase, while large, is not unreasonable in light of the large uncertainties in direct measurements on the H i column-density distribution from direct Lyα line-profile fitting (e.g., Janknecht et al. 2006; Rudie et al. 2013). Note also that even this increased pLLS incidence only amounts to, on average, less than one pLLS per quasar (Δ(z)  ∼  0.3–0.4 per quasar at our redshifts).

We found that while increasing the number of pLLS relieves the tension between data and model at 0.1  ≲  F  ≲  0.4, it does not resolve the excess at the fully absorbed F ≈ 0 pixels in the models. However, changing the b-parameter of the LLS and pLLS from our original fiducial value of b = 70 km s−1 modifies the PDF in a way that improves the agreement. This is a reasonable step, since the effective b-parameter is otherwise observationally ill-constrained for the LLS and pLLS populations. This is because LLSs are typically complexes of multiple systems separated in velocity space, and while there have been analyses of the b-parameter in these individual components, the "effective" b-parameter for complete LLS systems has never been quantified to our knowledge.

We therefore search for the best-fit b-parameter with respect to the T_REF, γ = 1.3 model at 〈z〉 = 2.3, focusing primarily on the agreement in the 0  ⩽  F  ⩽  0.4 bins (Figure 16). Our choice of model for this purpose should not significantly affect our subsequent conclusions regarding the IGM temperature–density slope, since there is little sensitivity toward the latter in the relevant low-transmission bins (compare with Figure 14). However, there will be some degeneracy between the LLS b-parameter and T0 (Figure 17) since changing the latter does somewhat change the low-transmission portion of the PDF—we will come back to this point in Section 7.

Figure 16.

Figure 16. Variation of the transmission PDF as a function of LLS b-parameter. All model transmission PDFs here are computed from the T_REF, γ = 1.6 model assuming the revised pLLS/LLS distribution described in Section 5.2 (curves), compared with the S/N = 6–8 BOSS transmission PDF at 〈z〉 = 2.3 (error bars). The quoted χ2 values are for 24 dof, and evaluated using only bootstrap error covariances. We find that b = 45 km s−1 gives the best fit to the data.

Standard image High-resolution image
Figure 17.

Figure 17. Variation of the transmission PDF as a function of the IGM temperature at mean density, T0. All model transmission PDFs here have the same temperature–density relationship, γ = 1.6, and are compared with the S/N = 6–8 BOSS transmission PDF at 〈z〉 = 2.3 (error bars). The quoted χ2 values are for 23 dof. Note that in these models we have already implemented the improved LLS/pLLS model described in Section 5.2, hence the much improved χ2 values compared to those quoted in Figure 14.

Standard image High-resolution image

As shown in Figure 16, a value of b = 45 km s−1 gives the best agreement with the data at 0  ⩽  F  ⩽  0.4. This yields χ2 = 116 for 24 dof, which is dramatically improved over that quoted in Figure 14, but still not quite a good fit. In the subsequent results, we will adopt this steeper pLLS/LLS model and b-parameter as the fiducial model in our analysis, and will correspondingly decrease the dof in our χ2 analysis to account for the fitting of b.

Note that while significantly improving the PDF fit, this new b-parameter still does not give a perfect fit to the low-transmission (F < 0.4) end. This is probably due to the simplified nature of our LLS model, which neglects the finite distribution of b-parameters and internal velocity dispersion of individual components. These properties are currently not well known, and it seems likely that an improved model would allow a better fit to the low-transmission end of the PDF.

5.3. Estimation of Systematic Uncertainties

While we have estimated the sample variance of our BOSS transmission PDFs by bootstrap resampling on the spectra, there are significant uncertainties associated with each component of our transmission PDF model as described above, e.g., the LLS incidence rate and level of continuum error. These uncertainties can be incorporated into a systematics covariance matrix, Csys that can then be added to the bootstrap covariance, Cboot, when computing the model likelihoods. This requires assuming that Csys and Cboot are uncorrelated, and that the errors are Gaussian distributed.

We adopt a Monte Carlo approach to estimate Csys by generating 200 model transmission PDFs that randomly vary the systematics. We then evaluate the covariance of the transmission PDFs, pi, relative to the fiducial model, pref, i at each transmission bin i. This allows us to construct a covariance matrix with the elements

Equation (23)

that encompasses the errors from the uncertainties in the LLS model, metal absorption, and continuum scatter. Note that estimation of systematic uncertainties is typically a subjective process, and for most of these contributions we can only make educated guesses as to their uncertainty.

Our Monte Carlo iterations sample the various components of our model as follows:

  • 1.  
    LLS incidence. We sample the uncertainty in the power-law exponent γLLS of the redshift evolution in LLS incidence rate (Equation (10)), which is $\sigma _{\gamma _{\rm LLS}}\pm 0.21$ as reported by Ribaudo et al. (2011). We assume this uncertainty is Gaussian and draw lLLS(z) accordingly. This primarily affects the low-flux regions −0.1  ≲  F  ≲  0.3 of the PDF.
  • 2.  
    Partial-LLS slope. Our choice of slope for the distribution of partial LLS ($N_{{\rm H\,\scriptsize{I}}}<10^{17.5}\ \mathrm{cm^{-2}}$ absorbers is from an indirect constraint with significant uncertainty (Prochaska et al. 2010). We therefore vary the pLLS slope around the fiducial βpLLS = −2.0 by ±0.5 assuming a flat prior in this range, which primarily alters the 0  ≲  F  ≲  0.4 portion of the PDF since pLLS typically do not saturate at BOSS resolution.
  • 3.  
    LLS b-parameters. Also in the previous section, we found that a global b-parameter of b = 45 km s−1 gives the best agreement with the data, but this is an ad hoc approach with significant uncertainties. In our Monte Carlo sampling we therefore adopt a conservative b = 45 km s−1 ± 20 km s−1 with a uniform prior. This primarily affects the PDF at −0.1  ⩽  F  ⩽  0.4 as can be seen in Figure 16.
  • 4.  
    Intervening metals. Although we used an empirical method to model intervening metals (Section 4.4), we may have missed metals with rest wavelengths λ  ≲  1300 Å. Furthermore, we have a relatively small set (∼300–400) of "template" quasars from which our metal model is derived, which may contribute some sampling variance. We therefore guess at a Gaussian error of ±30% for the metal incidence rate. This modulates the extent to which metals pulls the overall PDF toward lower F-values (compare with Figure 9(c)).
  • 5.  
    Continuum errors. The overall rms scatter in our continuum estimation also affect the flux PDF (Figure 9(e)). This can be varied in our model by rescaling the quantity c'(λ)/c(λ) − 1, where c is the "true" continuum used to generate the mock spectrum, while c' is the model continuum that we subsequently fit (Figure 13). For each iteration in our Monte Carlo systematics estimation, we dilate or reduce c'(λ)/c(λ) − 1 by a Gaussian deviate assuming ±20% scatter. This primarily affects the high-transmission (F  >  0.8) end of the PDF.

For these Monte Carlo iterations, we used the identical thermal model (γ = 1.6, T_REF) as well as fixed the same random number seeds used for the selection of simulation skewers and generation of noise vectors in our spectra, in order to ensure that the only variation between the different iterations are from the randomly sampled systematics. Figure 18 shows 50 of these Monte Carlo iterations on the transmission PDF for the 〈z〉 = 2.3, S/N = 8–10 subsample.

Figure 18.

Figure 18. Gray curves show 50 model transmission PDFs with a random sampling of different LLS incidence rates, metal absorption, and continuum scatter, evaluated for the 〈z〉 = 2.3, S/N = 8–10 BOSS subsample and using the T_REF simulation with γ = 1.6. The red curve shows the transmission PDF at our fiducial level of LLS incidence, metal absorption, and continuum scatter. The top panel has a linear abscissa, while the lower panel has a logarithmic abscissa.

Standard image High-resolution image

Figure 19 shows an example of the systematic contribution to the covariance matrix. The overall amplitude of the systematic contribution is considerably higher than that estimated from the bootstrap resampling (compare with Figure 7), indicating that we are in the systematics-limited regime. We also see significant anti-correlations at almost the same level as the positive correlations, which are due mostly to correlations between transmission bins on either side of "pivot points" as the transmission PDF varies from the systematics—these anti-correlations will somewhat counteract the increased size of the diagonal components. In the subsequent analysis, we will use an error covariance matrix, C = Cboot + Csys, in which the systematics covariance matrix estimated in this subsection is added to the bootstrap covariance matrix (described in Section 3) estimated from the BOSS transmission PDFs.

Figure 19.

Figure 19. Top: 2D density plot of the error covariance matrix representing our systematic uncertainties in the LLS incidence rate, pLLS column-density distribution, LLS b-parameter, metal absorption, and continuum scatter, as estimated through the Monte Carlo method described in Section 5.3. The bottom plot shows the corresponding correlation function. This particular covariance matrix was estimated for the 〈z〉 = 2.6, S/N = 8–10 subsample, and the values in the covariance have been multiplied by 104 for clarity.

Standard image High-resolution image

We have at this point yet to address one more parameter that can significantly change the shape of our model transmission PDFs, namely the Lyα forest mean transmission assumed in the mock spectra, 〈FLyα. However, this is an important astrophysical parameter which we did not want to treat as a "systematic," so the next subsection will describe our treatment of 〈FLyα.

5.4. Modifying the Mean-transmission

In the initial comparison of the model transmission PDFs shown in Figure 14, the models show a discrepancy with the data at higher transmission bins F ≳ 0.6. Such differences can be alleviated by varying the mean transmission of the pure Lyα forest, 〈FLyα ≡ exp (− τLyα), i.e., ignoring the contribution from metals and LLS. This quantity can be varied directly in the simulation skewers (Section 4.1). When we vary 〈FLyα in the simulations, the quantity 〈Fcont, which is used to normalize the continuum level of the mock quasar spectrum, is always kept fixed to 〈Feff(z) = exp [ − (τLyα + τmetals + τLLS)] as derived from Becker et al. (2013; see Section 3.2). However, since we are applying the same 〈Fcont to both the real and mock spectra, 〈Fcont can be best thought of as a normalization that does not actually need to match 〈Feff. Once both the real and mock spectra have been normalized by 〈Fcont, the transmission PDF retains information on the respective contributions from the Lyα forest, metals, and LLSs regardless of the assumed 〈Fcont, because these contributions affect the shape of the PDF in different ways. In principle, it is possible to vary these all components to infer their relative contributions, but due to the crudeness of our metal and LLS models, we choose to have only 〈FLyα as a free parameter while keeping 〈Fmetals = exp (− τmetals) and 〈FLLS = exp (− τLLS) fixed. The possible variation of these latter two components are instead incorporated into the systematic uncertainties determined in Section 5.3. The effect of varying 〈FLyα is illustrated in Figure 20, where we plot the same IGM model with different underlying values of 〈FLyα in the simulation skewers while keeping fixed the contribution from metals, LLSs, etc.

Figure 20.

Figure 20. Variation of the model transmission PDFs (curves) with respect to changing the mean transmission, 〈FLyα, of the Lyα forest simulations. The model PDFs were generated from the γ = 1.6, T_REF model, while the error bars show the corresponding transmission PDFs from BOSS data. In the bottom panel, the dashed horizontal lines indicate ±1σ discrepancies between models and data, although we caution against "chi-by-eye" due to the significantly non-diagonal covariances in the errors. The central 〈FLyα value shown here corresponds to that estimated by Becker et al. (2013), while the other two are evaluated at ±1σ of their reported errors. The mean transmission value, 〈Fcont, assumed in the mean flux regulated continuum fitting is constant in all cases. Note that the χ2 values, which are for 23 dof, are much improved over the previous data comparisons, since they now include the improved LLS/pLLS model as well as the full covariance matrix including systematic uncertainties.

Standard image High-resolution image

We therefore explore a range of 〈FLyα around the vicinity of that estimated by Becker et al. (2013), 〈FLyα, B13, and at each value of 〈FLyα evaluate the χ2 summed over all the S/N subsamples for each 〈z〉 and γ combination. In addition, we now adopt the updated LLS/pLLS model described in Section 5.2, while the χ2 evaluation now uses the full covariance matrix including both the bootstrap and systematics (Section 5.3) uncertainties to compare with the transmission PDFs measured from the BOSS data.

The models are compared with the BOSS data as we vary 〈FLyα, and for each 〈FLyα we compute the total chi-squared summed over all three S/N subsamples, where each subsample contributes 27 − 1 − 2 = 24 dof (compare with Equations (20) and (21)) along with a further reduction of one dof since we have effectively fitted for the LLS b-parameters in Section 5.2, for a total of ν = 71 dof. The result of this exercise is shown in Figure 21 which shows the χ2 values for the T_REF models with different γ—we only vary γ and not T0 because the F ≳ 0.6 portions of the transmission PDF that change the most with 〈FLyα do not vary as much with respect to changes in T0 (compare with Figure 17). Examples of the corresponding best-fit model PDFs in one S/N subsample are shown in Figure 22, where we see that varying 〈FLyα can indeed change the shape of the F ≳ 0.6 portion of the transmission PDF sufficiently, improving the fits in those transmission ranges compared to the fiducial models (Figure 14).

Figure 21.

Figure 21. χ2 values for the T_REF models (with different γ) plotted as a function of Lyα forest mean transmission values, 〈FLyα, used to normalize the simulation skewers. The quoted χ2 values (with ν = 71 dof) were obtained by summing over the χ2 for the different S/N subsamples at each redshift. The fiducial transmission values inferred from Becker et al. (2013) are shown as the solid vertical lines, while the dot-dashed vertical lines denote their 1σ errors. The dashed lines in the 〈z〉 = 3 panel denote the inflated error bars we use to account for the quasar selection bias shown in Figure 23. In Section 6 we will marginalize over the uncertainties in 〈FLyα to obtain our final results.

Standard image High-resolution image
Figure 22.

Figure 22. Model transmission PDFs (curves) with the best-fit Lyα forest mean transmission 〈FLyα for different γ values from the T_REF family of models (using the improved LLS/pLLS model). These are for the S/N = 8–10 subsample and compare with the corresponding BOSS data transmission PDFs (error bars). The upper two panels in each plot show the transmission PDFs in linear and logarithmic ordinate axes, respectively, while the bottom panels show residuals divided by the errors, with dashed horizontal lines indicating the ±1σ region relative to the data. The best-fitting 〈FLyα values correspond to the minima in Figure 21, but here we have labeled them relative to 〈FLyα, B13, the fiducial Becker et al. (2013) values and errors. The χ2 values quoted are for 23 dof (taking into account the fitting of the LLS b-parameter), and were computed using the full error covariances including both bootstrap and systematic terms.

Standard image High-resolution image

In all our redshift bins, the best-fitting models seen in Figure 21 are γ = 1.6 with χ2 = [69, 67, 54] for 70 dof24 at 〈z〉 = [2.3, 2.6, 3.0] (for the combined data using all S/N bins), respectively, implying probabilities of P = [52%, 59%, and 92%] of obtaining larger values.25 At the higher redshifts best-fitting mean transmission for the γ = 1.6 case is pushed to significantly discrepant values with respect to the fiducial Becker et al. (2013) values (Figure 21).

The γ = 1.3 model also provides acceptable fits to the models, with χ2 = [71, 73, 58] for 70 dof (P = [43%, 40%, 84%]) at 〈z〉 = [2.3, 2.6, 3.0], but at the two higher redshift bins this requires 〈FLyα values that are increasingly discrepant compared to Becker et al. (2013) (+2.3σ and +5σ respectively at 〈z〉 = [2.6, 3.0]). The isothermal γ = 1.0 models are disfavored at the two lower redshift bins, with best-fit values of χ2 = [98, 97] for 70 dof (P = [2%, 2%]) at 〈z〉 = [2.3, 2.6], whereas at 〈z〉 = 3, the error bars on the PDF are sufficiently large that acceptable fits are obtainable using γ = 1.0, with χ2 = 68 for 70 dof (P = 54%). However, this requires a +5σ discrepancy in 〈FLyα with respect to Becker et al. (2013). In Figure 22, one sees that fitting for 〈FLyα allows the γ = 1.0 models to be in good agreement with the data in the F  >  0.7 portion of the PDF, but gives rise to discrepancies in the 0.4  ≲  F  ≲  0.7 range, which limits the goodness-of-fit, and cannot easily be compensated by modifying the metals or LLS model.

From Figure 21, it is clear that as we move to higher redshifts, we require increasingly higher 〈FLyα relative to the fiducial Becker et al. (2013) values in order to agree with the data: at 〈z〉 = 2.3, our best-fit mean transmission for the γ = 1.6 model agrees with Becker et al. (2013), but at 〈z〉 = 3 there is a significant deviation of +2σ with respect to the Becker et al. (2013) measurement. The same trend is true for the best-fit γ = 1.3 and γ = 1.0 models, but these require even greater discrepancies with respect to the fiducial 〈FLyα.

One possible explanation for this discrepancy is the effect on the Becker et al. (2013) measurement of u-band selection bias in the SDSS quasars. This was first noted by Worseck & Prochaska (2011), who found that the color–color criteria used to select SDSS quasars preferentially selected quasars, specifically in the redshift range 3  ≲  zqso  ≲  3.5, that have intervening Lyman-breaks at λrest  <  912 Å. The 3  ≲  zqso  ≲  3.5 SDSS quasars are thus more likely to have intervening LLSs in their sightlines, yielding an additional contribution to the Lyα absorption and hence causing Becker et al. (2013) to possibly underestimate 〈FLyα when stacking the impacted quasars. Becker et al. (2013) mentioned this effect in their paper but argued that it was much smaller than their estimated errors by referencing theoretical IGM transmission curves estimated by Worseck & Prochaska (2011; Figure 17 in the latter paper).

Dr. G. Worseck has kindly provided us with these transmission curves, TIGM(λ), which were generated for both the average IGM absorption and that extracted from SDSS quasars affected by the color–color selection bias. In Figure 23 we plot the relative difference between the biased Lyα transmission deduced from zqso = 3.2 and zqso = 3.4 quasars and the true mean IGM transmission, using the Worseck & Prochaska (2011) transmission curves. It is clear that at Lyα absorption redshifts of zabs ≈ 3, the excess LLSs picked up from such quasars contribute an additional ∼1% compared to the mean IGM decrement, a discrepancy that is of the same magnitude as the error bars in the Becker et al. (2013) measurement, indicated by the dashed line.

Figure 23.

Figure 23. Red and black curves show the excess Lyα absorption expected from sightlines of zqso = 3.2 and zqso = 3.4 quasars, respectively, relative to the mean IGM transmission. This is caused by the SDSS selection bias described in Worseck & Prochaska (2011), which yields above-average numbers of intervening LLSs. These are derived from the same curves shown in Figure 17 of Worseck & Prochaska (2011), but replotted as ratios smoothed by a boxcar function over 12 pixels for clarity. The top axis labels the Lyα absorption redshift corresponding to each wavelength, while the shaded region indicates the wavelength range of our 〈z〉 = 3.0 bin. The dashed line shows, for comparison, the relative errors on the Lyα forest mean transmission estimated by Becker et al. (2013). The discrepancy due to the SDSS bias is significant compared to the Becker et al. (2013) errors.

Standard image High-resolution image

This could partially explain the higher 〈FLyα required to make our 〈z〉 = 3 models fit the data in Figure 21. Note that we expect this UV color selection bias to be much less significant in our BOSS data, since we have selected bright quasars in the top 5th percentile of the S/N distribution. Given that such quasars have high-S/N photometry, their colors separate much more cleanly from stellar contaminants. Furthermore, such bright quasars are much more likely to have been selected with multi-wavelength data (e.g., including near-IR and radio in addition to optical photometry see Ross et al. 2012). For both of these reasons, we expect our quasars to be much less susceptible to biases in color-selection related to the presence of an LLS. A careful accounting of this bias is beyond the scope of this paper, but from now on we will inflate by a factor of two the corresponding errors on 〈FLyα at 〈z〉 = 3 to account for this possible bias in the mean transmission measurements (dashed vertical lines in bottom panel of Figure 21).

Another possibility that could explain a bias in the 〈FLyα measured by Becker et al. (2013) is their assumption that the metal contamination of the Lyα forest does not evolve with redshift. While there are few clear constraints on the aggregate metal contamination within the forest, assuming that the metals actually decrease with increasing redshift (e.g., in the case of C iv, Cooksey et al. 2013), the assumption of an unevolving metal contribution calibrated at z ≈ 2.3 would lead to an underestimate of 〈FLyα at higher redshifts, which could explain the trend we seem to be seeing.

It is clear from the previous discussion that there is some degeneracy between γ and 〈FLyα in our transmission PDFs. However, we are primarily interested in γ, while the 〈FLyα has been extensively measured over the years, which allows the placement of strong priors. In the next section, we will therefore marginalize over 〈FLyα in order to obtain our final results.

6. RESULTS

Due to the uncertainties in 〈FLyα described in the previous subsection, for a better comparison between transmission PDFs, p, from models with different [γ, T0] we will marginalize the model likelihoods, $\mathcal {L} = \exp (-\chi ^2/2)$, over the Lyα forest mean transmission, 〈F〉:

Equation (24)

where A(〈F〉) is the prior on 〈F〉 (for clarity in these equations, 〈F〉 is used as a shorthand for 〈FLyα). We assume a Gaussian prior:

Equation (25)

where 〈FB13 and σF are the optically thin Lyα forest mean transmission and associated errors, respectively, estimated from Becker et al. (2013). Note that for 〈z〉 = 3, we have decided to dilate the error bars by a factor of two to account for the suspected quasar selection bias discussed in the previous section.

For each model, we generate transmission PDFs with different 〈FLyα (similar to Figure 21) and evaluate the combined χ2 summed over different S/N. We interpolate the χ2 over 〈FLyα to obtain a finer grid, which then allows us to numerically integrate Equation (24) using five-point Newton–Coates quadrature.

At this stage, we also analyze models with different IGM temperatures at mean density, T0. Hitherto, we have been working only with the central T_REF model (T0(z = 2.5)  ∼  16, 000 K), but we now also compare models from the T_HOT and T_COLD simulations, which have T0(z = 2.5)  ∼  11, 000 K and T0(z = 2.5)  ∼  21, 500 K, respectively. Each of these temperature models also sample temperature–density relationships of γ = [1.0, 1.3, 1.6] for a model grid of 3 × 3 parameters at each redshift.

The marginalized χ2 values for all the models are tabulated in Table 3, and plotted as a function of γ in Figure 24. In general, the T_REF models with γ = 1.6 provide the best agreements with the data at all redshifts with χ2 ≈ 60–70 for 69 dof. The T_HOT models (with higher IGM temperatures at mean density) provide fits of comparable quality, and indeed at 〈z〉 = 2.3 the T_HOT model with γ = 1.3 gives essentially the same goodness-of-fit as the γ = 1.6T_REF model. The cooler T_COLD models are less favored by the data, and at 〈z〉 = 2.6 give unreasonable fits to the data with χ2 = 89 for 69 dof (P = 5%), but at other redshifts they are acceptable fits to the data. In other words, the transmission PDF does not show a strong sensitivity for T0, which we shall show later is due to degeneracy with our LLS model in the low-transmission end of the transmission PDF.

Figure 24.

Figure 24. χ2 values (for 71 dof) from models with different γ and T0 at different redshifts, after marginalizing over uncertainties in the mean transmission 〈F〉 of the Lyα forest. Models with γ = 1.6 are generally favored, although γ = 1.3 with the T_HOT model is also acceptable at 〈z〉 = 2.3. The same quantities are also tabulated in Table 3.

Standard image High-resolution image

Table 3. Marginalized χ2 for ν = 71 dof

z〉 = 2.3
γ T_COLD T_REF T_HOT
  (T0 = 13000 K) (T0 = 18000 K) (T0 = 23000 K)
1.6 87.7 72.9 79.5
1.3 103.4 76.0 71.8
1.0 174.2 105.5 88.4
z〉 = 2.6
γ T_COLD T_REF T_HOT
  (T0 = 11000 K) (T0 = 16000 K) (T0 = 21500 K)
1.6 88.8 72.0 71.4
1.3 118.0 82.6 91.8
1.0 203.3 127.3 111.1
z〉 = 3.0
γ T_COLD T_REF T_HOT
  (T0 = 9000 K) (T0 = 14000 K) (T0 = 19000 K)
1.6 61.7 65.1 62.5
1.3 77.6 72.7 63.8
1.0 119.5 77.7 85.8

Download table as:  ASCIITypeset image

The more important question to address is the possibility of isothermal or inverted temperature–density relationships (γ  ⩽  1) as suggested by some studies on the transmission PDF of high-resolution, high-S/N echelle quasar spectra (e.g., Bolton et al. 2008; Viel et al. 2009; Calura et al. 2012). It is clear from Table 3 and Figure 24 that for all T0 models the isothermal, γ = 1.0 models disagree strongly with the BOSS data. The closest match for an isothermal IGM is the T_REF model at 〈z〉 = 3.0, which yields χ2 = 78 for 69 dof, or a probability of 21% of obtaining the data from this model. However, relative to the γ = 1.6 model at 〈z〉 = 3.0, which gives the minimum χ2 at that redshift, we find Δχ2 ≈ 16 for the isothermal model, i.e., a $\sqrt{\Delta \chi ^2} = 4\sigma$ discrepancy from the best-fit model. The isothermal model is also strongly disfavored at the other redshifts, where we find Δχ2 ≈ [15, 40] at 〈z〉 = [2.3, 2.6] or $\sqrt{(}\Delta \chi ^2) \approx [3.9\sigma, 6.3\sigma ]$. Since the shape of the transmission PDF varies continuously as a function of γ (see, e.g., Bolton et al. 2008; Lee 2012), these results imply that inverted (γ < 1) IGM temperature–density slopes are even more strongly ruled out.

7. DISCUSSION

In this paper, we have studied the 〈z〉 = 2.3–3 Lyα forest transmission PDF from 3373 BOSS DR9 quasar spectra. Although this is a relatively small subsample selected to be in the top 95th percentile in terms of S/N, it provides a two-orders-of-magnitude- larger Lyα forest path length than high-resolution, high-S/N data sets previously used for this purpose, providing unprecedented statistical power for transmission PDF analysis.

In order to ensure accurate characterization and allow subsequent modeling of the spectral noise, we have introduced a novel, probabilistic method of combining the multiple exposures that comprise each BOSS observation, using the raw sky and calibration data. This method significantly improves the accuracy of the noise estimation, and additionally allows us to generate mock spectra with noise properties tailored to each individual BOSS spectrum, but self-consistently for different Lyα forest realizations. We believe that our noise modeling—which yields noise estimates accurate to ∼3% across the relevant wavelength range—is the most careful treatment of spectral noise in multi-object fiber spectra to date, and we invite readers with similarly stringent requirements in understanding the BOSS spectral noise to contact the authors. In the future, the spectral extraction algorithm described by Bolton & Schlegel (2010) may solve some of the issues that affected us, but this has yet to be implemented.

For the continuum estimation, we used the MF-PCA method introduced in Lee (2012). This method, which reduces the uncertainty in the continuum estimation to σcont  ≲  5%, fits for a continuum such that the resulting Lyα forest has a mean transmission 〈F〉 matched to external constraints, for which we use the precise measurements by Becker et al. (2013). While MF-PCA does require external constraints for 〈F〉, we argue that so long as both the real quasars and mock spectra are continuum-fitted in exactly the same way, the shape of the transmission PDF retains independent information on the Lyα forest mean transmission.

To compare with the data, we used the detailed hydrodynamical simulations of Viel et al. (2013a), which explore a range of IGM temperature–density slopes (γ ≈ 1.0–1.6) and temperatures at mean density (T0(z = 2.5) ≈ [11000, 16000, 21500] K). We processed the simulated spectra to take into account the characteristics of the individual BOSS spectra in our sample, such as spectral resolution, pixel noise, and continuum fitting errors. We also incorporate the effects of astrophysical "nuisance" parameters such as LLSs and metal contamination. The LLSs are modeled by adding $10^{16.5} \ \mathrm{cm^{-2}}\,{\lesssim}\,N_{{\rm H\,\scriptsize{I}}}\,{\lesssim}\,10^{20.3}\ \mathrm{cm^{-2}}$ absorbers into our mock spectra, based on published measurements of the observed incidence lLLS(z) (Ribaudo et al. 2011) and H i column density distribution $f(N_{{\rm H\,\scriptsize{I}}})$ (Prochaska et al. 2010). Meanwhile, contamination from lower-redshift metals are modeled in an empirical fashion by inserting λrest > 1216 Å absorbers observed in lower-redshift SDSS/BOSS quasars into the same observed wavelengths of our mock spectra.

Our initial models did not provide satisfactory agreement with the transmission PDF measured from the BOSS spectra, with discrepancies at both the high-transmission and low-transmission bins. However, the differences between data and models were consistent across the different S/N subsamples, indicating that our noise modeling is robust. To resolve the discrepancies at the low-transmission end of the PDF, we explored various modifications to our LLS model. First, we steepened the column-density distribution slope of partial LLSs ($16.5\,{<}\,\log _{10}(N_{{\rm H\,\scriptsize{I}}})\,{<}\,17.5$ systems) to βLLS = −2, a value suggested from the mean free path of ionizing photons (Prochaska et al. 2010). This change relieved the tension between model and data in the F ≈ 0.1–0.4 bins, but implies increasing the number of pLLS by nearly an order of magnitude, but this is not unreasonable given the current uncertainties for this population (Janknecht et al. 2006; Prochaska et al. 2010). We believe that the necessity of a pLLS distribution with βLLS ≈ −2 to fit the BOSS Lyα transmission PDF supports the claims of Prochaska et al. (2010) regarding the column-density distribution of this population.

However, after adding pLLSs a major discrepancy remained in the saturated F ≈ 0 bins, which we addressed by adjusting the effective b-parameter assumed in all the optically thick systems in our model. We found that an effective value of b = 45 km s−1 provided the best fit to our model.26

At the high-transmission (F ≳ 0.6) end of the model transmission PDFs, we found that modifying the Lyα forest mean transmission in the simulations, 〈FLyα, allowed much better agreement with the BOSS data. At 〈z〉 = [2.3, 2.6], the 〈FLyα that gave the best-fitting model PDFs were within 1σ of the Becker et al. (2013) measurements, but at 〈z〉 = 3 we required a value that was ∼2σ larger. We argue that this discrepancy could be due to a color–color selection bias in the 3  ≲  zqso  ≲  3.5 SDSS quasars used by Becker et al. (2013), which preferentially selected sightlines with intervening LLSs, giving rise to additional Lyα absorption (and thus lower 〈FLyα) at a level comparable to the errors estimated by Becker et al. (2013). Our BOSS spectra, on the other hand, should be comparatively unaffected on account of being the brightest quasars in the survey, hence they separate more cleanly from the stellar locus in color-space, and were more likely to have been selected with additional criteria (radio, near-IR, variability, etc.) beyond color–color information (Ross et al. 2012).

To deal with these uncertainties, we decided to marginalize over the mean transmission in our χ2 analysis. At 〈z〉 = 2.3, the preferred model is for a hot IGM with (T0 = 23000 K) along with γ = 1.3 (P ≈ 45%), although the intermediate-temperature model (T0 = 18000 K) with γ = 1.6 is nearly as good a fit with P ≈ 82%. The preferred models at 〈z〉 = [2.6, 3.0] are for γ = 1.6 at temperatures at mean density of T0 = [21500, 9000] K (P = [46%, 78%], respectively. We find that the isothermal (γ = 1) temperature–density relationship is strongly disfavored at all redshifts regardless of T0, with discrepancies of $\sqrt{\Delta \chi ^2}\,{\sim}\,4\hbox{--}6\sigma$ compared to the best-fit models.

One might be skeptical of the results given the various assumptions we had to make in modeling astrophysical nuisance parameters. To test the robustness of our results to systematics, we generated 20 iterations of model transmission PDFs sampling all nine of our [T0, γ] models (i.e., 180 PDFs in total) in the 〈z〉 = 2.6, S/N = 8–10 bin, where each iteration has a random realization of the systematics (LLSs, metals, continuum errors, etc.) drawn in the same way as our Monte Carlo estimate of systematic uncertainty (Section 5.3). We then asked how many times each T0 or γ model gave the lowest χ2 when compared with the data. For this test we only evaluated the χ2 at the fiducial 〈FLyα without marginalization.

The results of this test are shown in Figure 25. In the top panel, the T_REF and T_HOT models are favored ∼40% of the time but the T_COLD has ∼15% of being favored depending on the (random) choice of systematics. In other words, there is significant degeneracy between our systematics model and T0. We suspect this is driven largely by the choice of the LLS b-parameter, which changes the shape of the transmission PDF in a similar way to T0 (compare Figure 16 with Figure 17). In contrast, the bottom panel of Figure 25 shows that whatever systematics we choose, γ = 1.6 is always favored, indicating a robust constraint.

Figure 25.

Figure 25. Histogram indicating the fraction of times a given T0 (top) or γ (bottom) model is favored for the 〈z〉 = 2.3, S/N = 8–10 transmission PDF when the systematics levels in the model are randomly sampled 20 times. While different systematics could lead to different best-fitting models for T0, the models with γ = 1.6 are always preferred. This indicates some degeneracy in our systematics model with T0, but our conclusions on γ are robust.

Standard image High-resolution image

There is, however, some degeneracy between γ and the Lyα forest mean transmission, 〈FLyα. While we marginalize over the latter quantity, the choice of prior can, in principle, affect the results. However, at 〈z〉 = [2.3, 2.6], the chi-squared minimum of the γ = 1.0 PDF model as a function of 〈FLyα is χ2 ≈ 100 for 71 dof (Figure 21), which has a probability of P ≈ 1%. In other words, even if we fine-tuned 〈FLyα in an attempt to force the isothermal model as the best-fit model at these redshifts, it would still be an unacceptable fit, and the γ = 1.3 model would still be preferred over it. This is less clear cut at 〈z〉 = 3, where the error bars are large enough to permit a reasonable minimum chi-squared of χ2 ≈ 70 for 71 dof using the γ = 1 model, but this requires a value of 〈FLyα = 0.71, which is 5σ discrepant from the value reported by Becker et al. (2013). While this 〈FLyα measurement is dependent on corrections for metals and LLS absorption (and indeed we argue that they have neglected a subtle bias related to SDSS quasar selection), they have attempted to incorporate these uncertainties into their errors and we have no particular reason to believe that they have underestimated this by a factor of >5. A quick survey of the available measurements on the forest mean transmission from the past decade yields 〈FLyα(z = 3) ≈ 0.65–0.69 (Kim et al. 2007; Faucher-Giguère et al. 2008; Dall'Aglio et al. 2008), albeit with larger errors. The use of any of these measurements as priors for our analysis would therefore disfavor an IGM with γ  ⩽  1, (which requires 〈FLyα(z = 3) ⩾ 0.71), unless all the available literature in the field has significantly underestimated the mean transmission.

There are several cosmological and astrophysical effects that we did not model, which could in principle affect our conclusions on γ. Since the Lyα forest transmission PDF essentially measures the contrast between high-absorption and low-absorption regions of the IGM, this can be degenerate with the underlying amplitude of matter fluctuations, which is specified by a combination of σ8 and ns, the matter fluctuation variance on 8 h−1 Mpc scales and the slope of the amplitude power spectrum, respectively. While these parameters are increasingly well-constrained (e.g., Planck Collaboration et al. 2014), there is still some uncertainty regarding the level of the fluctuations on the sub-Mpc scales relevant to the Lyα forest, which could be degenerate with our γ measurement. Bolton et al. (2008) explored this degeneracy between σ8 and γ in the context of transmission PDF measurements from high-resolution spectra, and found that the PDF is less sensitive to plausible changes in σ8 compared to γ, e.g., modifying σ8 by Δσ8 ± 0.1, affected the shape of the PDF less than a modification of Δγ ± 0.25 (Figure 2 in their paper). This degeneracy is in fact further weakened when an MCMC analysis of the full parameter space is considered, as shown by the likelihood contours in Viel et al. (2009).

The astrophysical effects that could be degenerate with γ include galactic winds and inhomogeneities in the background UV ionizing field. The injection of gas into the IGM by strong galaxy outflows could in principle modify Lyα forest statistics at fixed γ; this was studied using hydrodynamical simulations by Viel et al. (2013b), who concluded that the effect on the PDF is small compared to the uncertainties in high-resolution PDF measurements. Our BOSS measurement has roughly the same errors as those from high-resolution spectra once systematic uncertainties are taken into account, and it therefore seems unlikely that galactic winds could significantly bias our conclusions on γ. Meanwhile, fluctuations in the UV ionizing background, Γ, that are correlated with the overall density field could also be degenerate with the temperature–density relationship (compare with Equation (1)). This effect was studied by McDonald et al. (2005a) in simulations using an extreme model that considered only UV background contributions from highly biased active galactic nuclei, which maximizes the inhomogeneities. They concluded that while these UV fluctuations affected forest transmission statistics at z  ∼  4, the effect was small at z  ≲  3, the redshift range of our measurements.

Various observational and systematic effects could also, in principle, affect our constraints on γ. For example, our modeling of the BOSS spectral resolution assumes a Gaussian smoothing kernel which might affect our constraints if this were untrue. However, in their analysis of the 1D forest transmission power spectrum, Palanque-Delabrouille et al. (2013) examined the BOSS smoothing kernel and did not find significant deviations from Gaussianity. There are also possible systematics caused by our simplified modeling of LLS and metal contamination in the data, for example in our assumption of a single b-parameter for all LLSs and our neglect of very weak metal absorbers. However, we believe that the test performed in Figure 25 samples larger differences in the transmission PDF than those caused by our model simplifications, e.g., it seems unlikely that going from a single LLS b-parameter to a finite b-distribution could cause greater differences in the flux PDF than varying the single b-parameter by ±50% as was done in Figure 25. As for continuum-estimation, we carry out the exact same continuum-fitting procedure on the mock spectra as on the real quasar spectra, which leads no overall bias since in both cases the resulting forest transmission field is forced to have the same overall transmission, 〈Fcont. The only uncertainty then relates to the distribution of c'/c − 1, i.e., the per-pixel error of the estimated continuum, c', relative to the true continuum, c. In reality the shape of this distribution could be different between the data and the mock versions, whereas within our mock framework we could only explore overall rescalings of the distribution width. Again, we find it unlikely that differences in the transmission PDF caused by the true shape of the c'/c − 1 distribution could be so large as to be comparable to the effect caused by varying the width of the continuum error distribution, which we have examined.

While we do not think that the effects described in the previous few paragraphs qualitatively affect our conclusion that the BOSS data is inconsistent with isothermal or inverted IGM temperature–density relationships (γ  ⩽  1), when taken in aggregate these systematic uncertainties do weaken our formal 4σ–6σ limits against γ  ⩽  1 and need to be explicitly considered in future analyses.

7.1. Astrophysical Implications

How does this compare with other results on the thermal state of the IGM? McDonald et al. (2001) analyzed the transmission PDF from eight high-resolution, high S/N spectra and compared with them now-obsolete hydrodynamical simulations. They found the data to be consistent with a temperature–density relationship with the expected values of γ ≈ 1.5 (Hui & Gnedin 1997). More recently, Bolton et al. (2008) and Viel et al. (2009) carried out analyses of the transmission PDF measured from a larger sample (18 spectra) of Lyα forest sightlines measured by Kim et al. (2007) and found evidence for an inverted temperature–density relationship (γ  <  1). Viel et al. (2009) found that at z ≈ 3.0, the temperature–density relation was highly inverted (γ ≈ 0.5), and remained so as low as z ≈ 2.0 although at the lower redshifts the data was marginally consistent with an isothermal IGM. They suggested that the difference between their results and those of McDonald et al. (2001) was due to the now-obsolescent cosmological parameters and less-detailed treatment of intervening metals in the earlier study. However, Lee (2012) then pointed out that there is a sensitivity of the measured values of γ from the transmission PDF on continuum-fitting. Since continuum-fitting of high-resolution data generally involves manually placing the continuum at Lyα forest transmission peaks that do not necessarily reach the true continuum, it is conceivable that continuum biases combined with underestimated jacknife errors bars (e.g., Rollinde et al. 2013) could have led Bolton et al. (2008) and Viel et al. (2009) to erroneously deduce an inverted temperature–density relationship (see Bolton et al. 2014 for a detailed discussion on this point). In our analysis we have fitted our continua using an automated process that is free from the same continuum-fitting bias, although it does require an assumption on the underlying Lyα forest transmission, which we have marginalized over in our analysis.

Most recent measurements of the transmission PDF from high-resolution data have continued to favor an isothermal or inverted γ—Calura et al. (2012) analyzed the transmission PDF from a sample of z ≈ 3.3–3.8 quasars and also found an isothermal temperature–density relationship at z = 3, although combining with the Kim et al. (2007) data drove the estimated γ to inverted values at z  <  3. However, Rollinde et al. (2013) carried out a re-analysis of the transmission PDF from various high-resolution echelle data sets, which included significant overlap with the Kim et al. (2007) data. They argue that previous analyses have underestimated the error on the transmission PDF, and found the observed transmission PDF to be consistent with simulations that have γ ≈ 1.4 over 2  <  z  <  3—this discrepancy is probably also driven by a different continuum-estimation from the Kim et al. (2007) measurement.

The use of other statistics on high-resolution spectra have, however, tended to disfavor an isothermal or inverted temperature–density relationship. Rudie et al. (2012) analyzed the lower-end of the b$N_{{\rm H\,\scriptsize{I}}}$ cutoff from individual Lyα forest absorbers measured in a set of 15 very high-S/N quasar echelle spectra, and estimated γ ≈ 1.5 at z = 2.4. Bolton et al. (2014) compared the Rudie et al. (2012) measurements to hydrodynamical simulations and corroborated their determination of the temperature–density relationship slope.

Garzilli et al. (2012) analyzed the Kim et al. (2007) sample and found that while the transmission PDF supports an isothermal or inverted temperature–density relationship, a wavelet analysis favors γ  >  1. Note, however, that the b$N_{{\rm H\,\scriptsize{I}}}$ cutoff and the transmission PDF are sensitive to different density ranges, with the PDF probing gas densities predominantly below the mean (e.g., Bolton et al. 2014).

Our result of γ ≈ 1.6 at 〈z〉 = [2.3, 2.6, 3.0] is thus in rough agreement with measurements that do not involve the transmission PDF from high-resolution Lyα forest spectra (with the exception of Rollinde et al. 2013). Our value of γ at 〈z〉 = 3 is somewhat unexpected because one expects a flattening of the temperature–density relationship close to the He ii reionization epoch at z  ∼  3 (Furlanetto & Oh 2008; McQuinn et al. 2009; but see Gleser et al. 2005; Meiksin & Tittley 2012), although γ = 1.3 is not strongly disfavored ($\sqrt{(\Delta \chi ^2)}\,{\sim}\,2.6$).

Taken at face value, the temperature–density relationship during He ii reionization can be made steeper by a density-independent reionization and/or a lower heating rate in the IGM (Furlanetto & Oh 2008), which could be reconciled with an extended He ii event (Shull et al. 2010; Worseck et al. 2011).

Our constraints on γ appear to be in conflict with the prediction of the theories of Broderick et al. (2012) and Chang et al. (2012), who elucidated a relativistic pair-beam channel for plasma-instability heating of the IGM from TeV gamma-rays produced by a population of luminous blazars. This mechanism provides a uniform volumetric heating rate, which would cause an inverted temperature–density relationship in the IGM (Puchwein et al. 2012) since voids would experience a higher specific heating rate compared with heating by He ii reionization alone. This picture has been challenged by the recent study of Sironi & Giannios (2014), who dispute the amount of heating this mechanism could provide, since they found that the momentum dispersion of such relativistic pair beams allows ≪10% of the beam energies to be deposited into the IGM.

However, in this paper we have assumed relatively simple temperature–density relationships in which the bulk of the IGM in the density range 0.1  ≲  Δ  ≲  5 follows a relatively tight power law. We have therefore not studied more complicated T–Δ relationships, e.g., with a spread of temperatures at fixed density (e.g., Meiksin & Tittley 2012; Compostella et al. 2013) that might be caused by He ii reionization or other phenomena. It is therefore possible that such complicated temperature–density relationships could result in Lyα forest transmission PDFs that mimic the γ ≈ 1.6 power law; this is something that needs to be examined in more detail in future work.

7.2. Future Prospects

Looking forward, the subsequent BOSS data releases will significantly enlarge our sample size, e.g., DR10 (Ahn et al. 2014) is nearly double the size of the DR9 sample used in this paper, while the final BOSS sample (DR12) should be three times as large as DR9. In particular, the newer data sets should be sufficiently large for us to analyze the transmission PDF and constrain γ during the epoch of He ii reionization at z  >  3. This would be a valuable measurement, since high-resolution spectra are particularly affected by continuum-fitting biases at these redshifts (Faucher-Giguère et al. 2008; Lee 2012).

The analysis of the optically thin Lyα forest transmission PDF from these expanded data sets will have vanishingly small sample errors, and the errors will be dominated by systematic and astrophysical uncertainties. At the high-transmission end, our uncertainties are dominated by the scatter of the continuum-fitting, which is dominated by the question of whether our quasar PCA templates, derived from low-luminosity low-redshift quasars (Suzuki et al. 2005), or high-luminosity SDSS quasars (Pâris et al. 2011), respectively, are an accurate representation of the BOSS quasars. This uncertainty should be eliminated in the near future by PCA templates derived self-consistently from the BOSS data (N. Suzuki et al. 2014, in preparation). The modeling of metal contamination could also be improved in the near future by advances in our understanding of how metals are distributed in the IGM (e.g., Zhu et al. 2014), although metals are a comparatively minor contribution to the uncertainty in our transmission PDF.

We also aim to improve on the rather ad hoc data analysis in this paper, in which we accounted for some uncertainties in our modeling by incorporating them into our error covariances (e.g., LLSs, metals, continuum errors), while 〈FLyα was marginalized over a fixed grid. In future analyses, it would make sense to carry out a full MCMC treatment of all these parameters which would rigorously account for all the uncertainties and allow straightforward marginalization over nuisance parameters.

Since this paper was initially focused on modeling the BOSS spectra, for the model comparison we used only simulations sampling a very coarse 3 × 3 grid in T0 and γ parameter space, and were unable to take account for uncertainties in other cosmological (σ8, ns etc) and astrophysical (e.g., Jeans scale, Rorai et al. 2013; or galactic winds, Viel et al. 2013b) parameters in our analysis. However, methods already exist to interpolate Lyα forest statistics from hydrodynamical simulations given a set of IGM and cosmological parameters (e.g., Viel & Haehnelt 2006; Borde et al. 2014; Rorai et al. 2013). In the near future we expect to perform joint analyses using other Lyα forest statistics in conjunction with the transmission PDF, such as new measurements of the small-scale (k ≳ 0.2 s km−1) 1D transmission power spectrum (M. Walther et al. 2014, in preparation), moderate-scale (0.002 s km−1  ≲  k  ≲  0.2 s km−1) transmission power spectrum in both 1D (e.g., Palanque-Delabrouille et al. 2013) and 3D (from ultra-dense Lyα forest surveys using high-redshift star-forming galaxies, Lee et al. 2014a, 2014b), the phase angle PDF determined from close quasar pair sightlines (Rorai et al. 2013), and others. Such efforts would require a fine grid sampling the full set of cosmological and IGM thermal parameters in order to ensure that the interpolation errors are small compared to the uncertainties in the data (see e.g., Rorai et al. 2013). Efforts are underway to utilize massively parallel adaptive-mesh refinement codes (Almgren et al. 2013) to generate such parameter grids to study the IGM (Lukić et al. 2014) However, one of the findings of this paper is the importance of correct modeling of LLSs, in particular partial LLSs ($10^{16.5}\ \mathrm{cm^{-2}}\,{\lesssim}\,N_{{\rm H\,\scriptsize{I}}}\,{\lesssim}\,10^{17.5}\ \mathrm{cm^{-2}}$), in accounting for the shape of the observed Lyα transmission PDF. Since our hydrodynamical simulations did not include radiative transfer and cannot accurately capture optically thick systems, we had to add these in an ad hoc manner based on observational constraints that are currently rather imprecise. In the near future, we would want to use hydrodynamical simulations with radiative transfer (even if only in post-processing, e.g., Altay et al. 2011; McQuinn et al. 2011; Altay et al. 2013; Rahmati et al. 2013) to self-consistently model the optically thick absorbers in the IGM. With the unprecedented statistical power of the full BOSS Lyα forest sample, this could provide the opportunity to place unique constraints on the column-density distribution function of partial LLSs.

8. SUMMARY/CONCLUSIONS

In this paper, we analyzed the PDF of the Lyα forest transmitted flux using 3393 BOSS quasar spectra (with 〈S/N〉 ⩾ 6) from Data Release 9 of the SDSS-III survey.

To rectify the inaccurate noise estimates in the standard pipeline, we first carried a custom co-addition of the individual exposures of each spectrum, using a probabilistic procedure that also separates out the signal and CCD contributions, allowing us to later create mock spectra with realistic noise properties. We then estimated the intrinsic quasar continuum using a mean flux regulated technique that reduces the scatter in the estimated continua by forcing the resultant Lyα forest mean transmission to match the precise estimates of Becker et al. (2013), although we had to make minor corrections on the latter to account for our different assumptions on optically thick systems in the data. This now allows us to measure the transmission PDF in the data, which we do so at 〈z〉 = [2.3, 2.6, 3.0] (with bin widths of Δz = 0.3), and split it into S/N subsamples of S/N = [6–8, 8–10, 10–25] at each redshift bin.

The second part of the paper describes finding a transmission PDF model that describes the data, based on detailed hydrodynamical simulations of the optically thin Lyα forest that sample different IGM temperature–density relationship slopes, γ, and temperatures at mean density, T0 (where T(Δ) = T0Δγ − 1). Using these simulations we generate mock spectra based on the real spectra. These take into account the following instrumental and astrophysical effects:

  • 1.  
    Lyman-limit systems. These are randomly added into our mock spectra based on published incidence rates (Ribaudo et al. 2011) and column-density distributions (Prochaska et al. 2010), including a large population of partial LLSs ($10^{16.5} \, \mathrm{cm^{-2}}\,{\le}\, N_{{\rm H\,\scriptsize{I}}}\,{\le}\, 10^{17.5}\, \mathrm{cm^{-2}}$) with a power-law distribution of roughly $f(N_{{\rm H\,\scriptsize{I}}}) \,{\propto}\, N_{{\rm H\,\scriptsize{I}}}^{-2}$. We assumed an effective b = 45 km s−1 for the velocity width of these absorbers.
  • 2.  
    Metal contamination. We measure metal absorption from the 1260 Å  ≲  λ  ≲  1390 Å rest-frame region of lower-redshift quasars at the same observed wavelength, and then add these directly into our mock spectra.
  • 3.  
    Spectral resolution and noise. Each mock spectrum is smoothed by the dispersion vector of the corresponding real spectrum (determined by the BOSS pipeline), and we apply corrections that bring the spectral resolution modeling to within ∼1% accuracy. We then introduce pixel noise based on the noise parameters estimated by our probabilistic co-addition procedure on the real data, which also achieves percent-level accuracy on modeling the noise.
  • 4.  
    Continuum errors. Since we generate a full mock Lyα forest spectrum including the simulated quasar continuum (based on the continua fitted to the actual data), we can apply our continuum-estimation procedure on each mock to fit a new continuum. The difference between the new continuum and the underlying simulated quasar continuum yields an estimate of the continuum error.

We then compare the model transmission PDFs with the data, using an error covariance that includes both bootstrap errors and systematic uncertainties in the model components described above. At 〈z〉 = 3.0 we find a discrepancy in the assumed Lyα forest mean transmission, 〈FLyα, between our data and that derived from Becker et al. (2013), which we argue is likely caused by a selection bias in the SDSS quasars used by the latter. We therefore marginalize out these uncertainties in 〈FLyα to obtain our final results.

The models with an IGM temperature–density slope of γ = 1.6 give the best-fit to the data at all our redshift bins (〈z〉 = [2.3, 2.6, 3.0]). Models with an isothermal or inverted temperature–density relationship (γ  ⩽  1) are disfavored at the $\sqrt{(\Delta \chi ^2)} = [3.9, 6.3, 4.0]\sigma$ at 〈z〉 = [2.3, 2.6, 3.0], respectively. Due to a degeneracy with our LLS model, we are unable to put robust constraints on T0, but we have checked that our conclusions on γ are robust to such systematics as can be considered within our model framework. There are other possible systematics we did not consider that could in principle affect our measurement, such as cosmological parameters (σ8, ns) and astrophysical effects (galactic winds, inhomogeneous UV ionizing background), but we argue that these are unlikely to qualitatively affect our conclusions.

We thank Michael Strauss, J. Xavier Prochaska, Gabor Worseck, and Joop Schaye for useful comments and discussion. We also thank the members of the ENIGMA group (http://www.mpia-hd.mpg.de/ENIGMA/) at the Max Planck Institute for Astronomy (MPIA) for helpful discussions. J.F.H. acknowledges generous support from the Alexander von Humboldt foundation in the context of the Sofja Kovalevskaja Award. The Humboldt foundation is funded by the German Federal Ministry for Education and Research. The hydrodynamic simulations in this work were performed using the COSMOS Supercomputer in Cambridge (UK), which is sponsored by SGI, Intel, HEFCE and the Darwin Supercomputer of the University of Cambridge High Performance Computing Service (http://www.hpc.cam.ac.uk/), provided by Dell Inc. using Strategic Research Infrastructure Funding from the Higher Education Funding Council for England. COSMOS and DARWIN are part of the DIRAC high performance computing facility funded by STFC. M.V. is supported by the FP7 ERC grant "cosmoIGM" GA-257670, PRIN-MIUR and INFN/PD51 grants. J.S.B. acknowledges the support of a Royal Society University Research Fellowship. B.L. acknowledges support from the NSF Astronomy and Astrophsics Fellowship grant AST-1202963.

Funding for SDSS-III has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, and the U.S. Department of Energy Office of Science. The SDSS-III Web site is http://www.sdss3.org/.

SDSS-III is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS-III Collaboration including the University of Arizona, the Brazilian Participation Group, Brookhaven National Laboratory, University of Cambridge, Carnegie Mellon University, University of Florida, the French Participation Group, the German Participation Group, Harvard University, the Instituto de Astrofisica de Canarias, the Michigan State/Notre Dame/JINA Participation Group, Johns Hopkins University, Lawrence Berkeley National Laboratory, Max Planck Institute for Astrophysics, Max Planck Institute for Extraterrestrial Physics, New Mexico State University, New York University, Ohio State University, Pennsylvania State University, University of Portsmouth, Princeton University, the Spanish Participation Group, University of Tokyo, University of Utah, Vanderbilt University, University of Virginia, University of Washington, and Yale University.

APPENDIX:

In this Appendix, we describe our probabilistic procedure for combining the multiple BOSS exposures of each spectrum27 while simultaneously estimating the noise variance in terms of a parameterized model. We assume that the noise in each pixel can be described by

Equation (A1)

where

Equation (A2)

The true object flux $\mathcal {F}_\lambda$ and Aj = 1 − 4 are noise parameters that we will determine given the individual exposure spectra fλ, i, sky flux estimates sλ, i, and calibrations vectors Sλ, i (which convert between detector counts and photons). σRN, eff is the effective read noise that we fixed to σRN, eff = 12; this can be thought of as an effective number of pixels times the true read noise of the CCD squared, which we multiplied by the spectrograph dispersion σdisp(λ) to approximately account for the change in spot size as a function of wavelength. Equation (A2) parameterizes wavelength-dependent biases in the calibration vector.

We search for the model that best describes the multiple exposure spectra fλi, where our model parameters are Aj from Equation (A1) and $\mathcal {F}_{\lambda }$ is the true flux of the object. In what follows, we will outline a method for determining the posterior distribution $P(A_j, \mathcal {F}_{\lambda } |f_{\lambda i})$ using an MCMC method. From this distribution, we can obtain both an accurate model for the noise via Equation (A1), and our final combined spectrum. The estimates for Aj can also be used to self-consistently generate pixel noise in mock Lyα forest spectra.

The probability of the data given the model, or the likelihood, can be written as

Equation (A3)

Note that individual exposure data fλi are on the native wavelength grid of each CCD exposure, whereas the BOSS pipeline interpolates and then combines these individual spectra into a final co-added spectrum, defined on a wavelength grid with uniform spacing. Furthermore, flexure and other variations in the spectrograph wavelength solution will result in small (typically sub-pixel) shifts between the individual exposure wavelength grids. In Equation (A3) our model $\mathcal {F}_{\lambda }$ must be computable at every wavelength fλi of the individual exposures. We are free to choose the wavelengths at which $\mathcal {F}_{\lambda }$ is represented, but this choice is a subtle issue for several reasons. First, note that we want to avoid interpolating the data, fλi, onto the model wavelength grid, as this would correlate the data pixels, and require that we track covariances in the likelihood in Equation (A3), making it significantly more complicated and challenging to evaluate. Similarly, it is undesirable to interpolate our model $\mathcal {F}_{\lambda }$, as this would introduce correlations in the model parameters, making it much more difficult to sample them with our MCMC. Finally, note that $\mathcal {F}_{\lambda }$ also represents our final co-added spectrum, so we might consider opting for a uniform wavelength grid, similar to what is done by the BOSS pipeline. Our approach is to simply determine the model flux $\mathcal {F}_{\lambda }$ at each wavelength of the individual exposures fλi. Shifts among the individual exposure wavelength grids result in a more finely sampled model grid. For the reasons explained above, we use nearest grid point (NGP) interpolation, so that the fλi are evaluated on the $\mathcal {F}_{\lambda }$ grid (and vice versa) by assigning the value from the single nearest pixel.

In our MCMC iterations, we use the standard Metropolis– Hastings criterion to sample the parameters Aj, with trials drawn from a uniform prior. For the $\mathcal {F}_{\lambda }$, we exploit an analogy with Gibbs sampling, which dramatically simplifies the MCMC for likelihood functions with a multivariate Gaussian form. Gibbs sampling exploits the fact that given a multivariate distribution, it is much simpler to sample from conditional distributions than to integrate over a joint distribution. To be more specific, the likelihood in Equation (A3) is proportional to the joint probability distribution of the noise parameters Aj and $\mathcal {F}_{\lambda }$, but it is also proportional to the conditional probability distribution of the $\mathcal {F}_{\lambda }$ at fixed Aj. With Aj fixed the probability of $\mathcal {F}_{\lambda }$ is then

Equation (A4)

which is very nearly a multivariate Gaussian distribution for $\mathcal {F}_{\lambda }$ with a diagonal covariance matrix. The equation above slightly deviates from a Gaussian because the σλi depends on $\mathcal {F}_{\lambda }$ via Equation (A1). In what follows, we ignore this small deviation, and assume that the conditional PDF of the $\mathcal {F}_{\lambda }$ (at fixed Aj) is Gaussian.

Given that Equation (A4) is a multivariate Gaussian with diagonal covariance, the Gibbs sampling of the $\mathcal {F}_{\lambda }$ becomes trivial. Since Equation (A4) can be factored into a product of individual Gaussians, we need not follow the standard Gibbs sampling algorithm, whereby each parameter is updated sequentially holding the others fixed. Instead we need only hold Aj fixed (since the likelihood is not Gaussian in these parameters), and we can sample all of the $\mathcal {F}_{\lambda }$ simultaneously. This simplification, which dramatically speeds up the algorithm, is possible because the conditional distribution for $\mathcal {F}_{\lambda }$ can be factored into a product of Gaussians for each pixel $\mathcal {F}_{\lambda }$, thus the conditional distribution at any wavelength is completely independent of all the others.

Completing the square in Equation (A4) we can then write

Equation (A5)

where

Equation (A6)

The expressions above for fopt, λ and $\sigma ^2_{\rm opt,\lambda }$ simply represent the optimally combined flux estimator and the resulting variance. Thus one can think of our MCMC algorithm as performing an optimal combination of the individual exposure spectra fλi, whereby the noise is simultaneously determined via an iterative procedure.

Thus the basic steps of our algorithm can be summarized as follows.

  • 1.  
    Initialize, by creating a model λ grid from all unique wavelengths in the individual exposures, and use NGP interpolation to assign a fλi to this grid for each exposure.
  • 2.  
    Choose a starting guess for noise parameters Aj. For the starting $\mathcal {F}_{\lambda }$ use $\mathcal {F}_{\lambda } = f_{\rm opt,\lambda }$ from Equation (A6), but with the model σλi replaced by the noise delivered by the pipeline.
  • 3.  
    Begin the MCMC loop:
  • 4.  
    Use only the second half of the chain for the posterior distributions, as the first half is the burn in phase.

Our MCMC algorithm directly determines the posterior distribution $P(A_j, \mathcal {F}_{\lambda }| f_{\lambda i})$, which provides all the information we need to construct mock spectra using Equation (A1) as described in Section 4.5.

The distribution of $P(\mathcal {F}_{\lambda }|f_{\lambda i})$, on the other hand, contains everything we need to know about the combined spectrum. Namely, we can define

Equation (A7)

as the combined spectrum, and

Equation (A8)

as its variance. If the formal noise returned by BOSS pipeline were actually the true noise in the data, then our ${\bar{\mathcal {F}}_{\lambda }}$ in Equation (A8) would be equivalent to the optimally combined noise and our variance the optimal variance, i.e., according to Equation (A6). In practice, the BOSS pipeline does not return the true noise and so our ${\bar{\mathcal {F}}_{\lambda }}$ is optimal whereas the pipeline flux is sub-optimal, and our $\sigma ^2_{\lambda }$ is an empirical estimate of the actual noise in the data.

Footnotes

  • 15 

    There is also a simultaneous effort to observe ∼1.5 million luminous red galaxies, to measure the BAO at z  ∼  0.5. See, e.g., Anderson et al. (2014).

  • 16 

    All spectral signal-to-noise ratios quoted in this paper are per 69 km s−1 SDSS/BOSS pixel unless noted otherwise.

  • 17 

    The Lyα forest transmitted flux fraction is sometimes also referred to as "flux" in the literature; but we do however use the variable F to refer to this quantity.

  • 18 

    Defined as the 1041–1185 Å region in the quasar rest-frame.

  • 19 

    Typically there are nexp = 5 exposures of 15 minutes each, although this can vary due to the requirements to achieve a given (S/N)2 over each individual plug-plate, as determined by the overall BOSS survey strategy (see Dawson et al. 2013).

  • 20 

    Note that the ideal/model observed flux described in the noise modeling section, $\mathcal {F}_{\lambda }$, and the Lyα forest transmission F, are completely different quantities.

  • 21 

    Note that the value lz0 = 0.30 given in Table 6 of Ribaudo et al. (2011) is actually erroneous, and the correct normalization is in fact lz0 = 0.1157, consistent with the data in their paper, which is used in Equation (10). Dr. J. Ribaudo 2014, private communication, has concurred with this conclusion.

  • 22 

    Si iii an obvious exception, although we will later account for this omission in our error bars (Section 5.3).

  • 23 

    We aim for three to four times more mock spectra than in the corresponding data sample, but later when we have to compute large model grids we are limited to models with the same size as the data.

  • 24 

    In this particular section, when we quote the χ2 for the best-fitting 〈FLyα the dof is further reduced by 1 compared to the other χ2 summed over the S/N subsamples.

  • 25 

    These χ2 values are very small for the dof, suggesting that we may have overestimated the size of our systematic errors, but as we shall see this does not affect our ability to place constraints on γ and merely makes our conclusions rather conservative.

  • 26 

    Note that we have quoted an effective b-parameter, which must not be confused with the b from individual kinematical components, which is often quoted by workers carrying out Voigt profile analysis of high-resolution spectra.

  • 27 

    Defined as unique combinations of plate number, fiber number, and MJD of observation.

Please wait… references are loading.
10.1088/0004-637X/799/2/196