TERRESTRIAL PLANET OCCURRENCE RATES FOR THE KEPLER GK DWARF SAMPLE

, , , , , , , , , , , , , , , , , , , , , , , , and

Published 2015 August 4 © 2015. The American Astronomical Society. All rights reserved.
, , Citation Christopher J. Burke et al 2015 ApJ 809 8 DOI 10.1088/0004-637X/809/1/8

0004-637X/809/1/8

ABSTRACT

We measure planet occurrence rates using the planet candidates discovered by the Q1-Q16 Kepler pipeline search. This study examines planet occurrence rates for the Kepler GK dwarf target sample for planet radii, $0.75\;\leqslant \;$ ${R}_{{\rm{p}}}$ $\;\leqslant $ 2.5 ${R}_{\oplus }$, and orbital periods, $50\;\leqslant $ ${P}_{\mathrm{orb}}$ $\leqslant $ 300 days, with an emphasis on a thorough exploration and identification of the most important sources of systematic uncertainties. Integrating over this parameter space, we measure an occurrence rate of F0 = 0.77 planets per star, with an allowed range of $0.3\leqslant {F}_{0}\;\;\leqslant $ 1.9. The allowed range takes into account both statistical and systematic uncertainties, and values of F0 beyond the allowed range are significantly in disagreement with our analysis. We generally find higher planet occurrence rates and a steeper increase in planet occurrence rates toward small planets than previous studies of the Kepler GK dwarf sample. Through extrapolation, we find that the one year orbital period terrestrial planet occurrence rate ${\zeta }_{1.0}$ = 0.1, with an allowed range of $0.01\leqslant {\zeta }_{1.0}\;\;\leqslant $ 2, where ${\zeta }_{1.0}$ is defined as the number of planets per star within 20% of the ${R}_{{\rm{p}}}$ and ${P}_{\mathrm{orb}}$ of Earth. For G dwarf hosts, the ${\zeta }_{1.0}$ parameter space is a subset of the larger ${\eta }_{\oplus }$ parameter space, thus ${\zeta }_{1.0}$ places a lower limit on ${\eta }_{\oplus }$ for G dwarf hosts. From our analysis, we identify the leading sources of systematics impacting Kepler occurrence rate determinations as reliability of the planet candidate sample, planet radii, pipeline completeness, and stellar parameters.

Export citation and abstract BibTeX RIS

1. INTRODUCTION

The Kepler data set is the only currently available experiment capable of detecting and characterizing the planetary content of the Milky Way down to the regime of terrestrial planets orbiting within 1 AU of solar-type stars (Borucki et al. 2010; Koch et al. 2010). The ongoing detailed analysis of Kepler data can provide accurate and precise measurement of the occurrence rate of Earth analogs beyond the Solar System, which is a critical parameter for understanding the potential for life outside the Solar System (Drake 2013; Prantzos 2013) and a quantity of immense interest across the disciplines of science, engineering, philosophy, and sociology.

Kepler builds upon the enormous efforts of the astronomical community in filling out the parameter space of planet properties in numerous stellar environments (Fischer & Valenti 2005; Marcy et al. 2005; Naef et al. 2005; Gould et al. 2006; Sahu et al. 2006; Cumming et al. 2008; Bowler et al. 2010; Howard et al. 2010; Bayliss & Sackett 2011; Mayor et al. 2011; van Saders & Gaudi 2011; Wright et al. 2012; Bonfils et al. 2013; Meibom et al. 2013; Clanton & Gaudi 2014). Kepler expands the planetary discovery space to terrestrial planets within 1 AU of solar-type stars. Kepler data enables the study and simulations of planetary formation to finally be confronted with their predictive outcomes in the regime of rocky planets (Ida & Lin 2004; Benz et al. 2014; Mordasini et al. 2015). In addition, Kepler constraints on the prevalence of rocky planets in the habitable zone (HZ) of nearby stars provides a key input in defining the scope of future missions that will probe the atmospheres of extrasolar planets (Dressing & Charbonneau 2013; Batalha 2014; Kouveliotou et al. 2014; Leger et al. 2015; Stark et al. 2015).

A substantial shortcoming for planet occurrence rate determinations using the Kepler pipeline planet candidate samples (Borucki et al. 2011a, 2011b; Batalha et al. 2013; Burke et al. 2014; Mullally et al. 2015; Rowe et al. 2015a) has been the unavailability of an accurate model for the completeness of the Kepler pipeline (Batalha 2014). Previous planet occurrence determinations from Kepler data have dealt with this shortcoming by employing simplifying assumptions as to the pipeline completeness: assume the theoretical performance of the Transiting Planet Search algorithm (TPS Jenkins 2002) with a signal-to-noise ratio (S/N) threshold of 7.1 (Borucki et al. 2011b), designate a higher S/N level where the planet sample is close to 100% complete (Catanzarite & Shao 2011; Youdin 2011; Howard et al. 2012; Traub 2012; Dong & Zhu 2013; Dressing & Charbonneau 2013; Silburt et al. 2015), or simultaneously solve for a parameterized completeness model in addition to planet occurrence (Fressin et al. 2013; Farr et al. 2014; Mulders et al. 2015). Others avoid this shortcoming altogether through an independent planet search pipeline and pipeline completeness measurement (Petigura et al. 2013a, 2013b; Dressing & Charbonneau 2015).

Christiansen et al. (2015) rectify this shortcoming by directly measuring the Kepler pipeline completeness of the Q1-Q16 Kepler pipeline run (Tenenbaum et al. 2014) through Monte-Carlo transit injection and recovery tests. In this study, we make use of the Christiansen et al. (2015) Kepler pipeline completeness parameterization in order to derive the planet occurrence rates from the resulting Q1-Q16 Kepler planet candidate sample of Mullally et al. (2015). Another highlight of this study is a comprehensive analysis of the systematic errors present in deriving planet occurrence rates with Kepler data. As exemplified in Youdin (2011) and Dong & Zhu (2013), we undertake a sensitivity analysis where we iteratively change an input assumption and recalculate the occurrence rates. We investigate the following input assumptions: pipeline completeness systematics, orbital eccentricity, stellar parameter systematics, planet parameter systematics, and planet sample classification systematics.

This paper is organized as follows. Section 2 describes the pipeline completeness model that quantifies the survey completeness for any target observed by Kepler. Sections 3 and 4 summarize the stellar properties and planet sample from the Q1-Q16 Kepler pipeline run adopted for derivation of the planet occurrence rates. We extend the analysis techniques of Youdin (2011) by increasing the complexity of the parameterized model for the planet occurrence rate and employ Markov Chain Monte-Carlo (MCMC) methods for solving the parameter estimation problem in Section 5. Section 6.1 presents results for the planet occurrence rate using a baseline set of inputs, and we thoroughly explore the systematic errors in this result through a sensitivity analysis in Section 6.2. We compare the occurrence rate analysis with previous efforts in Section 7. We apply the resulting occurrence rates to determine the occurrence rate for terrestrial planets with an orbital period equivalent to Venus in Section 9 as well as extrapolating these results toward longer periods (Section 8) in order to measure a one year terrestrial planet occurrence rate in Section 10. Finally, Section 11 summarizes the future work necessary to improve the accuracy for the resulting planet occurrence rates.

2. KEPLER PIPELINE COMPLETENESS MODEL

This section details an analytic star-by-star model for the Kepler pipeline completeness. A critical component for modeling the completeness of Kepler observations is simulating the performance of the TPS pipeline module which is responsible for characterizing the noise present in a light curve and detection of the transit signals (Jenkins 2002; Tenenbaum et al. 2012, 2013, 2014). The performance of a transit survey can be fully specified with intensive, end-to-end Monte Carlo signal injection and recovery tests (Weldrake & Sackett 2005; Burke et al. 2006; Hartman et al. 2009; Christiansen et al. 2013; Seader et al. 2014). Unfortunately, due to their numerically intensive nature, Monte-Carlo injection tests are not amenable to a systematic sensitivity analysis, and the tests are limited to the subset of targets that one performs the analysis upon. Therefore, we present a simplified analytic model for the Kepler pipeline that can be readily applied to any observed Kepler target using a minimum of input data. Fortunately, the joint noise characterization, filtering, and detection properties of TPS were designed to facilitate a well defined and tested detector response for transit signals even in the presence of astrophysical broadband or red noise (Jenkins 2002). Given the well defined properties of the TPS detector, our analytic completeness model can achieve high fidelity after it is calibrated with Monte-Carlo injection tests. For a single target, we parameterize the pipeline completeness over a two-dimensional (2D) grid of orbital period, ${P}_{\mathrm{orb}}$, and planet radius, ${R}_{{\rm{p}}}$.

2.1. Multiple Event Statistic (MES) Estimation

Modeling pipeline completeness requires modeling the statistical behavior of TPS and its response to noise in the presence of a signal (Jenkins 2002; Seader et al. 2013). In the presence of broadband red noise, TPS considers the so-called MES to measure the strength of a potential transit signal. In the null hypothesis case of no signal present, the MES distribution is Gaussian with an average of zero and unit variance. In the alternative hypothesis case for the presence of a signal, the MES distribution is Gaussian but the average MES is shifted proportional to the S/N of the transit signal. The first step for modeling pipeline completeness is to estimate the expected MES of a transit signal for a specified ${P}_{\mathrm{orb}}$ and ${R}_{{\rm{p}}}$. This requires an estimate of the expected transit duration,

Equation (1)

where e is the orbital eccentricity, and the stellar radius, ${R}_{\star }$, and orbital semimajor axis, a, are in a consistent set of units. In Equation (1), we assume ${R}_{{\rm{p}}}$ $\ll $ ${R}_{\star }$, shorten the transit duration from the central crossing time by a factor of $\pi /4$ for its expectation assuming a uniform distribution of $\mathrm{cos}i$ for the orbital inclination (Gilliland et al. 2000; Seager & Mallén-Ornelas 2003), and include the expected dependence on the transit duration with e (Burke 2008). We explore the sensitivity of our results to $e\gt 0$ in Section 6.2.2.

Figure 1.

Figure 1. Fractional completeness model for the host to Kepler-22b (KIC: 10593626) in the Q1-Q16 pipeline run using the analytic model described in Section 2.

Standard image High-resolution image
Figure 2.

Figure 2. Absolute difference for pipeline completeness between the analytic model described in Section 2 and a more accurate numerical pipeline completeness model that employs the full CDPP time series and numerical window function for the KIC target 10593626.

Standard image High-resolution image

Next, we determine the noise present in the light curve data averaged over the transit duration of interest. TPS estimates the time varying noise present in a light curve, the so-called Combined Differential Photometric Precision (CDPP, Koch et al. 2010; Christiansen et al. 2012). CDPP varies with time and is calculated over the same 14 transit durations, ${\tau }_{\mathrm{dur},\mathrm{srch}}=[1.5,2.0,2.5,3.0,3.5,4.5,5.0,6.0,7.5,9.0,10.5,$ $12.0,12.5,15.0]\;\mathrm{hr}$, that are used in the transiting planet search (see Figure 3 of Christiansen et al. 2012). For the analytic completeness model, we employ a summary statistic, a robust rms of the CDPP (robCDPP), for the light curve noise. In testing it was found that the non-robust rms CDPP (rmsCDPP) statistic typically reported by the Q1-Q16 pipeline data products (SOC build 9.2 and earlier) can be biased. The bias in the rmsCDPP arises when the distribution formed from the CDPP time series values is asymmetric with the outlying tail of the distribution inordinately affecting the results.

Figure 3.

Figure 3. Stellar ${T}_{\mathrm{eff}}$ as a function of $\mathrm{log}\;{\text{}}g$ for the observed Kepler targets from the Q1-Q16 stellar catalog of Huber et al. (2014; red points), and the GK dwarf targets selected for this study (gray points). For efficient plotting only a randomly selected subsample of the full catalog is shown.

Standard image High-resolution image

We calculate robCDPP by replacing the typical components of the rms that includes the mean/dc component

Equation (2)

where $\bar{x}$ is the mean and σ is the standard deviation, with their robust equivalents

Equation (3)

where MAD is the median absolute deviation. The robCDPP is calculated after removing both invalid and deweighted data identified in the same manner as during the transit search. In order to remove the influence of strong signals in estimating the noise present in a light curve, the robCDPP value adopted for the completeness model is calculated on the light curve after all the potential transit signatures have been removed in the Data Validation (DV) multiple planet search (Wu et al. 2010). On average, the robCDPP to rmsCDPP ratio is 1.03 with a sample standard deviation of 0.06. A higher fidelity model of pipeline completeness would employ the full CDPP time series. However, in tests we find that when the number of expected transits contributing to a detection, ${N}_{\mathrm{trn}}\gtrsim 5$, the robCDPP summary is sufficient to model pipeline completeness rather than a more time-consuming calculation involving the full CDPP time series (see Section 2.4).

For a given ${\tau }_{\mathrm{dur}}$, we interpolate within the grid of 14 robCDPP values to estimate the noise for that duration, ${\sigma }_{\mathrm{cdpp}}$. For values of ${\tau }_{\mathrm{dur}}$ outside the ${\tau }_{\mathrm{dur},\ \mathrm{srch}}$ grid, we adopt the end point robCDPP for ${\sigma }_{\mathrm{cdpp}}$. Alternative extrapolation methods such as assuming a ${\sigma }_{\mathrm{cdpp}}\propto 1/\sqrt{{\tau }_{\mathrm{dur}}}$ dependence or linear extrapolation were not stable for the sometimes complicated behavior of robCDPP as a function of ${\tau }_{\mathrm{dur}}$.

The next step is to estimate the expected transit signal depth, Δ. Since the TPS search algorithm employs a square box-car signal template match to the signal for detection, the appropriate Δ is the average signal depth over ${\tau }_{\mathrm{dur}}$ rather than the purely geometric depth $\delta ={k}^{2}={({R}_{{\rm{p}}}/{R}_{\star })}^{2}$, where k is the physical radius ratio. From the results of the transit injection study of Christiansen et al. (2015), we determine that for limb darkened transit signals, on average ${\rm{\Delta }}=0.84{{\rm{\Delta }}}_{\mathrm{max}}$, where ${{\rm{\Delta }}}_{\mathrm{max}}$ is the depth at closest approach or maximum transit depth. When averaged over a uniform distribution of impact parameter, b, we find using the limb darkened transit model of Giménez (2006) that ${{\rm{\Delta }}}_{\mathrm{max}}/{k}^{2}=c+s\;k$ is well fit by a linear relationship with parameters c and s that vary with the limb darkening profile of the stellar intensity. When using a linear limb darkening law, $I=1-u(1-\mathrm{cos}\theta )$, where I is the stellar intensity relative to the stellar center, u is the linear coefficient, and θ is the line-of-sight stellar-surface-normal angle, with a coefficient appropriate for G dwarfs, u = 0.6, we determine best-fit values of (c = 1.0874, s = 1.0187). We note that c and s are weakly dependent upon limb darkening, taking on values of (c = 1.0696, s = 1.001) for u = 0.5 and (c = 1.1068, s = 1.0379) for u = 0.7. The needed value of Δ is expressible in terms of k using the previous equations to provide the single transit event S/N, ${\rm{\Delta }}/{\sigma }_{\mathrm{cdpp}}$.

The MES is calculated by averaging the transit signal strength over multiple transit events. The resulting MES =$\sqrt{{N}_{\mathrm{trn}}}{\rm{\Delta }}/{\sigma }_{\mathrm{cdpp}}$, where ${N}_{\mathrm{trn}}=({T}_{\mathrm{obs}}/{P}_{\mathrm{orb}})\times {f}_{\mathrm{duty}}$ is the expected number of transit events, ${T}_{\mathrm{obs}}$ is the time baseline of observational coverage for a target and ${f}_{\mathrm{duty}}$ is the observing duty cycle. The observing duty cycle, ${f}_{\mathrm{duty}}$, is defined as the fraction of ${T}_{\mathrm{obs}}$ with valid observations. The Kepler spacecraft experiences planned data gaps for data download and other spacecraft operations. These data gaps result in an overall duty cycle of ∼95% for targets observed for all quarters. In addition, the duty cycle is suppressed further in the transit search to ∼88% by a set of data weights applied in TPS. Data near gaps suffers from spacecraft systematics, thus TPS deweights data near gaps using a smooth exponential decay functional form that goes from fully deweighted data at the gap edge to full data weighting over a span of 2 days. We calculate ${f}_{\mathrm{duty}}$ by dividing the number of cadences with an overall deweighting factor $\gt 0.5$ by the total number of cadences within ${T}_{\mathrm{obs}}$. In addition, we force a floor of ${N}_{\mathrm{trn}}\geqslant 3$ in the MES estimate since TPS requires at least three transit events for detection. ${T}_{\mathrm{obs}}$ and ${f}_{\mathrm{duty}}$ are calculated in the initial call to TPS before transit signals have been identified and removed by DV.

2.2. Pipeline Completeness Modeling

The TPS search algorithm design results in a well-defined pipeline completeness that is, in the limit of broadband noise, a function of MES alone (Jenkins 2002). Since the TPS detection statistic, MES, is distributed as a Gaussian with unit variance, the pipeline completeness (fraction of transit signals present in the data that are recovered by the pipeline) has a theoretical expected form of

Equation (4)

where ${\mathrm{MES}}_{\mathrm{thresh}}=7.1$ is the adopted detection threshold (Jenkins et al. 2002; Jenkins 2002). However, the presence of stochastic, impulsive systematics of instrumental or astrophysical origin in the time series that are not due to a transit signal results in an overwhelming number of false alarm detections when MES is the only criteria employed for detection. In the Q1-Q16 pipeline run, 57% of targets result in a detection when based upon MES alone (Tenenbaum et al. 2014). To mitigate the false positive detections, additional criteria (or vetoes) are employed that quantify how consistent the depths, shape, and duration of individual events are with each other (Seader et al. 2013; Tenenbaum et al. 2013, 2014). The additional vetoes cause the pipeline completeness to be suppressed relative to the theoretical expectation given in Equation (4).

Christiansen et al. (2015) quantify the resulting suppression of the pipeline completeness through Monte-Carlo transit injection and recovery tests. They find that the gamma cumulative distribution function (CDF) provides a good fit to the suppressed pipeline completeness,

Equation (5)

where ${\rm{\Gamma }}(a)$ is the gamma function and the argument to the gamma CDF, $x=\mathrm{MES}-4.1-({\mathrm{MES}}_{\mathrm{thresh}}-7.1)$, is related to MES by an offset of 4.1 in order to achieve a good fit. The parameters for the gamma CDF adopted for this study are a = 4.65 and b = 0.98. In rare cases, due to the timeout limits of the TPS planet search, MES${}_{\mathrm{thresh}}$ is higher than the normal ${\mathrm{MES}}_{\mathrm{thresh}}$ = 7.1. Section 2.1 can be used to provide a mapping for the 2D grid of ${R}_{{\rm{p}}}$ and ${P}_{\mathrm{orb}}$ onto MES. The pipeline detection efficiency provides a mapping from the MES to the pipeline completeness.

2.3. Window Function

The final component of the pipeline completeness model accounts for the limits of the data coverage for meeting the planet search detection requirement of having at least ${N}_{\mathrm{tr}}\geqslant 3$. The transit survey window function, ${P}_{\mathrm{win}}$, quantifies the probability that a requisite number of transits required for detection occurs in the observational data (Gaudi 2000; von Braun et al. 2009; Burke & McCullough 2014). Since Kepler operates in the high duty cycle regime, we adopt the binomial analytic window function as discussed in Burke & McCullough (2014) and originally introduced by Deeg et al. (2004). The analytic window function matches the average behavior of the fully numerical window function (see Figure 9 of Burke & McCullough 2014), and requires as input ${T}_{\mathrm{obs}}$, ${P}_{\mathrm{orb}}$, and ${f}_{\mathrm{duty}}$. Following Appendix A.4 of Burke & McCullough (2014), the window function probability of detecting at least three transits can be explicitly written out in the binomial approximation as

Equation (6)

where $M={T}_{\mathrm{obs}}/$ ${P}_{\mathrm{orb}}$. The final completeness model results from ${P}_{\mathrm{comp}}={P}_{\mathrm{win}}\times {P}_{\mathrm{det}}$.

For targets with data in all Q1-Q16 quarters, the analytic window function predicts ${P}_{\mathrm{win},\geqslant 3}\geqslant 0.98$ for ${P}_{\mathrm{orb}}$ $\;\leqslant $ 300 days. We compare the impact on ${P}_{\mathrm{comp}}$ between using the analytic window function and a full numerical window function in Section 2.4.

2.4. Worked Example and Limitations

As a worked example, we demonstrate the calculation of ${P}_{\mathrm{comp}}$ for the host to Kepler-22b (Borucki et al. 2012). This target with Kepler Input Catalog (KIC) identifier 10593626 has stellar parameters ${R}_{\star }$ = 0.98 ${R}_{\odot }$, ${T}_{\mathrm{eff}}$ = 5640 K, and $\mathrm{log}\;{\text{}}g$ = 4.44 as compiled by the Q1-Q16 stellar catalog of Huber et al. (2014). In order to generate ${P}_{\mathrm{comp}}$ over the 2D space of ${R}_{{\rm{p}}}$ and ${P}_{\mathrm{orb}}$ shown in Figure 1, we employ the following input values: ${f}_{\mathrm{duty}}=0.879$, ${T}_{\mathrm{obs}}=1426.7$ days, e = 0, ${\mathrm{MES}}_{\mathrm{thresh}}=7.1$, and the 14 robCDPP noise estimates, ${\sigma }_{\mathrm{cdpp},14}\;=[36.2,33.2,31.0,29.4,28.0,26.1,25.4,24.2,23.1,22.4,21.9,21.8,21.7,21.5]$ ppm. The quantities necessary to generate ${P}_{\mathrm{comp}}$ for all Kepler targets searched for planets in the Q1-Q16 pipeline run are available as part of the Q1-Q16 Kepler stellar table hosted by the NASA Exoplanet Archive.9

In order to investigate potential biases in the analytic pipeline completeness model, we show the absolute difference between the analytic model presented in this study and a higher fidelity completeness model that is available for future pipeline runs in Figure 2. The higher fidelity completeness model replaces two components of the completeness model described in Section 2. First, the analytic window function approximation is replaced by a full numerical window function calculated during TPS to take into account data gaps and data deweighting consistent with the planet search. Second, the simplified MES estimate of Section 2.1 that employs the robCDPP values is replaced by the "1σ depth function" (1SDF). For the 1SDF, TPS quantifies the transit signal depth that yields a MES = 1 as a function of ${P}_{\mathrm{orb}}$ for all 14 transit durations searched taking into account the full details of the full CDPP time series and deweighted data.

Differences between the completeness estimates are largest (∼0.06) toward longer periods (${P}_{\mathrm{orb}}$ $\;\geqslant $ 300 days). The occurrence rate is $\propto {P}_{\mathrm{comp}}^{-1}$ (Youdin 2011), thus errors in the pipeline completeness can propagate directly to occurrence rates. Future study is needed in order to characterize the net impact of the higher fidelity completeness model on occurrence rates for a full sample of Kepler targets.

Although this initial test affirms the efficacy of the analytic completeness model for this well behaved Kepler target, this comparison does not fully test all its simplifying assumptions. The above test does not check the accuracy of our assumption of a simple dependence of the pipeline completeness on MES alone and the adopted Δ to k conversion. Due to a single injection per target, Christiansen et al. (2015) cannot rule out the possibility that the pipeline completeness may depend on additional parameters beyond MES and contain a strong star-by-star dependence. Results for small samples or individual targets may systematically differ from the average pipeline completeness results of Christiansen et al. (2015). However, Christiansen et al. (2015) have characterized the average pipeline completeness as a function of MES when averaged over a large sample of targets as is the case in this study. Future studies will focus on characterizing the star-by-star dependence of pipeline completeness by comparing the simplified completeness model outlined in this study to the higher fidelity completeness model using the full numerical window function and 1σ depth function for larger samples of targets. In addition, we are implementing support for $\sim {10}^{4}$ transit injections on a single target employing the NASA Ames Pleiades super computing facility (Seader et al. 2014).

An additional shortcoming of any pipeline completeness model is the inescapable dependence on the assumed stellar parameters, eccentricity, and stellar binarity. Stellar parameters and eccentricity are employed in defining ${\tau }_{\mathrm{dur}}$ which sets the timescale relevant for the integrated noise level, and stellar binarity can result in third light contamination that impacts the assumed planet radius. In this study, we treat the pipeline completeness as having no uncertainty due to the incomplete understanding of the stellar parameters, eccentricity, and stellar binarity. However, we do explore sensitivity in the derived planet occurrence rates to alternative assumptions for the stellar parameters and non-zero eccentricities in Section 6.2.

Although it is a distinct process separate from the pipeline generation of TCEs, the vetting classification process of TCEs into planet candidates and false positives also shapes the overall completeness of the planet candidate sample, and the vetting relied upon on a manual classification (i.e., human inspection) procedure (Mullally et al. 2015). The TCE vetting process has an unquantified false negative rate of incorrectly classifying valid planet candidates as false positives and unconscious human biases and/or errors. For this study, we assume the vetting process is 100% complete, unbiased, and correct. However, we do investigate the sensitivity of our results on this assumption in Section 6.2.4 by varying the planet candidate sample.

The analytic pipeline completeness model and input data presented in this study are only relevant to the TCE population generated by the Q1-Q16 pipeline run (Tenenbaum et al. 2014) using the SOC 9.1 software release. The next TCE release (Seader et al. 2015) used the SOC 9.2 software, which introduced changes to the data analysis and planet search algorithm that influences the pipeline sensitivity. In SOC 9.2, TPS implemented a bootstrap noise characterization algorithm during the search in order to recalibrate the detection threshold (Seader et al. 2015). The bootstrap noise characterization test allows the effective MES threshold to be a function of ${P}_{\mathrm{orb}}$ as opposed to being independent of ${P}_{\mathrm{orb}}$ in the SOC 9.1 software release. In addition, the box-car signal template matched to the data in TPS was replaced by an average limb-darkened signal template to yield a better signal match. However, changing the match template influences the noise statistics (such as robCDPP) and the Δ to k conversion. Finally, the relation between pipeline completeness and MES is being analyzed through Monte-Carlo transit injection using the updated software release.

3. STELLAR PROPERTIES

Precise and accurate planet occurrence rates depend on precise and accurate stellar properties. Stellar properties influence three areas relevant to planet occurrence rates: the measured planet radii, the estimated transit duration, and geometric transit probability. In this study, we adopt stellar parameters from the Q1-Q16 KIC revision of Huber et al. (2014). Huber et al. (2014) update Kepler target stellar parameters by adopting literature values and additional observations (asteroseismology, spectroscopy, and photometry) that have become available since the original KIC observations (Brown et al. 2011). With an improved observational database, Huber et al. (2014) derive stellar parameters by fitting the observations to isochrones from the Dartmouth Stellar Evolution Database (Dotter et al. 2008) using a ${\chi }^{2}$ minimization.

For this study we focus on planet occurrence for the G and K dwarf sample observed by Kepler. Previous occurrence rate calculations indicate significant variations in the planet population as a function of stellar ${T}_{\mathrm{eff}}$ for the Kepler sample (Howard et al. 2012; Burke et al. 2015; Mulders et al. 2015). In order to simplify the planet occurrence model by avoiding the dependence on stellar ${T}_{\mathrm{eff}}$, we focus on the GK dwarfs rather than the full FGKM dwarf sample. Burke et al. (2015) find that the planet occurrence rates agree when the G and K dwarf Kepler targets are analyzed separately in an ${R}_{{\rm{p}}}$ and ${P}_{\mathrm{orb}}$ parameter space similar to this study. We select G and K dwarfs by making the following cuts on the stellar parameter catalog of Huber et al. (2014): 4200 $\;\leqslant \;$ ${T}_{\mathrm{eff}}$ $\;\leqslant \;6100$ K and ${R}_{\star }$ $\;\leqslant $ 1.15 ${R}_{\odot }$. In addition, we focus on the Kepler targets with nearly continuous coverage over the entire Q1-Q16 data span of the mission. We select targets with an observation data spanning ${T}_{\mathrm{obs}}\;\gt $ 2 years, duty cycle ${f}_{\mathrm{duty}}\gt $ 0.6, and a $\mathrm{robCDPP}\;\;\leqslant $ 1000 ppm at the 7.5 hr transit duration. The lower limit on ${f}_{\mathrm{duty}}$ ensures the inclusion of the targets that are impacted by the CCD electronics loss in Q4 of the Kepler mission (Batalha et al. 2013). The above cuts result in 91,567 targets in our sample. Figure 3 shows ${T}_{\mathrm{eff}}$ and $\mathrm{log}\;{\text{}}g$ for the full catalog of Huber et al. (2014; red points) along with the Kepler targets selected using the above criteria (gray points). Table 1 provides the Kepler identification number and a binary flag to indicate that the target belongs to the baseline stellar sample selected by the above criteria. Adopting ${T}_{\mathrm{eff}}$ = 5200 K as the dividing line separating the G and K dwarfs, 80% of our stellar sample belong to the G dwarf category.

Table 1.  Kepler GK Dwarf Target Samples

KIC ID Baseline Sample Original KIC Sample
757450 1 1
891901 0 1
891916 1 1
892718 1 1
892772 1 1
892832 1 1
892834 1 1
892882 1 1
892911 1 1
892946 0 1

Only a portion of this table is shown here to demonstrate its form and content. A machine-readable version of the full table is available.

Download table as:  DataTypeset image

In Figure 4 we present the noise distribution of the selected GK dwarfs as a function of ${K}_{{\rm{p}}}$. Gilliland et al. (2011) and Christiansen et al. (2012) discuss the instrumental and astrophysical sources of noise in the Kepler data. We also provide mean properties of the GK dwarfs as a function of ${K}_{{\rm{p}}}$ in Table 2. In magnitude wide bins, Table 2 provides the bin centers, number of targets, mean stellar properties (${R}_{\star }$ $\mathrm{log}\;{\text{}}g$, and ${T}_{\mathrm{eff}}$), and the 7.5 hr robCDPP for the 10th, 50th, and 90th percentiles in each bin. Using the pipeline completeness model of Section 2, we show in Figure 5 the average pipeline completeness for the GK dwarf sample in terms of detection probability contour levels over the ${P}_{\mathrm{orb}}$ and ${R}_{{\rm{p}}}$ parameter space examined in this study. The figure does not include the effects of the geometric probability to transit. We discuss our choice of ${P}_{\mathrm{orb}}$ and ${R}_{{\rm{p}}}$ space examined for this study in Section 4.

Figure 4.

Figure 4. Distribution of the robCDPP at the 7.5 hr time scale as a function of ${K}_{{\rm{p}}}$ for the G and K dwarfs analyzed in this study. Mean properties of the target stars as a function of ${K}_{{\rm{p}}}$ are provided in Table 2.

Standard image High-resolution image
Figure 5.

Figure 5. Pipeline completeness model, ${P}_{\mathrm{comp}}$, averaged over the GK dwarf sample is shown by the contour levels over the ${P}_{\mathrm{orb}}$ and ${R}_{{\rm{p}}}$ parameter space. The Q1-Q16 Kepler planet candidate sample of Mullally et al. (2015) found around the GK dwarf sample is also shown (orange points).

Standard image High-resolution image

Table 2.  Kepler Target Summary

${K}_{{\rm{p}}}$ N $\langle {R}_{\star }\rangle $ $\langle \mathrm{log}\;{\text{}}g\rangle $ $\langle {T}_{\mathrm{eff}}\rangle $ ${\mathrm{robCDPP}}_{7.5\mathrm{hr}}$ ${\mathrm{robCDPP}}_{7.5\mathrm{hr}}$ ${\mathrm{robCDPP}}_{7.5\mathrm{hr}}$
(mag)   (${R}_{\odot }$) (cgs) (K) 10th% (ppm) 50th% (ppm) 90th% (ppm)
8 9 0.93 4.48 5658 9.3 13.1 27.4
9 36 0.90 4.48 5543 12.5 22.6 169.9
10 117 0.93 4.46 5640 13.1 23.9 169.8
11 364 0.91 4.47 5608 17.5 26.8 142.5
12 1312 0.89 4.48 5613 22.6 32.1 67.6
13 4964 0.90 4.48 5643 32.8 44.9 78.2
14 16011 0.88 4.50 5625 51.3 70.9 106.8
15 41649 0.86 4.52 5563 88.2 123.9 180.8
16 27027 0.81 4.56 5413 147.8 193.4 271.4
17 58 0.82 4.54 5201 328.0 456.0 788.9

Download table as:  ASCIITypeset image

4. PLANET PROPERTIES

We measure the planet occurrence rate using the Q1-Q16 pipeline run (Tenenbaum et al. 2014) and the resulting Kepler planet candidate sample from Mullally et al. (2015). We choose to limit our analysis to 50 $\;\leqslant $ ${P}_{\mathrm{orb}}$ $\leqslant \;$ 300 days and rocky to mini-Neptune planets with 0.75 $\leqslant \;$ ${R}_{{\rm{p}}}$ $\;\leqslant $ 2.5 ${R}_{\oplus }$. The ${P}_{\mathrm{orb}}$ range under investigation has several advantages. Mullally et al. (2015) classified the new and pre-existing KOIs into planet candidate and false positives for the ${P}_{\mathrm{orb}}$ > 50 days regime. Thus, the selected period range represents a uniform classification of planet candidates following the procedures of Mullally et al. (2015). The cumulative KOI catalog for ${P}_{\mathrm{orb}}$ > 50 days currently consists of classifications from Batalha et al. (2013), Burke et al. (2014), Mullally et al. (2015), and Rowe et al. (2015a). Toward shorter and longer ${P}_{\mathrm{orb}}$, the population of instrumental false alarm detections increases rapidly (see bottom panel of Figure 4 in Tenenbaum et al. 2014). The vetting process employed by Mullally et al. (2015) effectively removes much of the instrumental false alarm contamination, but the remaining contamination is potentially higher outside this ${P}_{\mathrm{orb}}$ range.

For a majority of Kepler targets, ${P}_{\mathrm{orb}}$ ∼ 300 days is roughly the transition between planet candidates having at least 4–5 transit events contributing to the detection and planet candidates having the minimum three transit events contributing to the detection (see Figure 9 of Burke & McCullough 2014. The three transit event (${P}_{\mathrm{orb}}$ ≳ 300 days) low MES planet candidates are the most challenging candidates to vet properly (Mullally et al. 2015) and further work is needed to understand the false alarm rate of this population. Thus, we exclude them from planet occurrence rate calculations for the time being. Third, approximating the behavior of the full CDPP time series by the summary robCDPP statistics for the pipeline completeness model can be inaccurate at long periods when there is an increased chance that the few transit events available can occur during outliers of the CDPP noise distribution.

The astrophysical false positive contamination rates have observationally (Santerne et al. 2012; Désert et al. 2015) and theoretically (Morton & Johnson 2011; Fressin et al. 2013) been shown to increase toward shorter ${P}_{\mathrm{orb}}$. Also for shorter periods, ${P}_{\mathrm{orb}}$ > 3 days, the harmonic removal filter in the pipeline increasingly removes transiting planet signals (Christiansen et al. 2013, 2015). The detection efficiency reported in Christiansen et al. (2015) is calculated for ${P}_{\mathrm{orb}}$ > 10 days. Thus, the detection efficiency does not take into account the impact of the harmonic removal filter.

In order to select the Kepler planet candidate sample for analysis, we must limit the KOI planet candidates to ones recovered in the Q1-Q16 pipeline run. The analysis of Mullally et al. (2015) uniformly vetted the pre-existing KOIs and new KOIs that made a Threshold Crossing Event (TCE) corresponding to the KOI ephemeris for ${P}_{\mathrm{orb}}$ $\;\geqslant \;50$ days. To supplement the list of planet candidates, we make special exceptions for systems with strong transit timing variations (TTVs). The pipeline only searches for transits with a uniform ephemeris and targets with high S/N and strong TTVs can result in multiple TCEs at the incorrect period, but corresponding to one to a few of the high S/N transit events. If it is clear that the TCE corresponds to one or a few of the single events of a TTV system, but formally the TCE ephemeris does not match the KOI ephemeris, then that TTV KOI planet candidate is accepted as recovered by the pipeline and included in our analysis. We provide the KOI planet candidates deemed as recovered by the Q1-Q16 pipeline run in Table 3. The recovered KOIs within the $50\;\leqslant $ ${P}_{\mathrm{orb}}$ $\leqslant $ 300 days limits for the baseline occurrence rate calculation are indicated by a binary flag in Table 3.

Table 3.  Q1-Q16 Planet Candidate Samples

KOI Number Baseline Original KIC High Reliability Full Long Period Trimmed Long Period
12.01 0 0 0 0 0
41.01 0 0 0 0 0
41.03 0 0 0 0 0
42.01 0 0 0 0 0
51.01 0 0 0 0 0
70.01 0 0 0 0 0
70.03 0 1 0 0 0
70.05 0 0 0 0 0
72.02 0 0 0 0 0
75.01 0 0 0 0 0

Only a portion of this table is shown here to demonstrate its form and content. A machine-readable version of the full table is available.

Download table as:  DataTypeset image

In addition, we provide KOI planet candidates recovered in the Q1-Q16 pipeline run in the $10\;\leqslant $ ${P}_{\mathrm{orb}}$ $\leqslant $ 50 days range in order to support analysis of the Kepler planet candidate sample outside the parameter space of this study. The $10\;\leqslant $ ${P}_{\mathrm{orb}}$ $\leqslant $ 50 day range has not been uniformly vetted, so for KOI recovery designations in this parameter space we start with the cumulative KOI table combining results from all the Kepler planet catalogs (Borucki et al. 2011a, 2011b; Batalha et al. 2013; Burke et al. 2014; Mullally et al. 2015; Rowe et al. 2015a). We employ an ephemeris matching routine (Mullally et al. 2015) to judge whether a TCE detection from the pipeline run (Tenenbaum et al. 2014) matches the ephemeris for the KOIs. KOIs with a match statistic satisfying that the KOI ephemeris overlaps with $\geqslant 90$% of the TCE's transit events are automatically accepted as recovered. The KOI ephemeris from a previous catalog may not be accurate enough to guarantee 100% of the transit overlapping, and the 90% matching level is sufficient to automatically adopt the TCE as corresponding to the KOI without manual inspection. Also, special exception was made for systems with strong TTVs as discussed above. Table 3 lists the KOIs from the cumulative KOI list that are designated as being recovered in the Q1-Q16 pipeline run for ${P}_{\mathrm{orb}}$ ≥ 10 days.

For the planet radii, we adopt estimates from the uniform KOI analysis in Mullally et al. (2015) and Rowe et al. (2015a). The technique is described fully in Rowe et al. (2014). Briefly, the flux time series data are detrended with a moving cubic polynomial fit. Data occurring during transit or near gaps are excluded from the moving polynomial fit. Assuming a circular orbit, fixed limb darkening parameters from Claret & Bloemen (2011), and stellar parameters from Huber et al. (2014), Rowe et al. (2014) use a MCMC methodology to estimate the best fitting parameters of a limb darkened transit model.

Overall, we find 156 planet candidates orbit stars in the GK dwarf sample within the ${P}_{\mathrm{orb}}$, ${R}_{{\rm{p}}}$ parameter space under investigation. We illustrate the planet candidate sample (orange points) in Figure 5. In our analysis described in Section 5, we do not take into account the uncertainties on ${R}_{{\rm{p}}}$. However, we do explore the influence that systematic changes to the planet candidate sample, stellar sample, and independent model fits have on the resulting planet occurrence rates in Section 6.2.

In this study, we do not model or include the impact of astrophysical false positive contamination in our sample. Following the process outlined in Morton (2012), a preliminary astrophysical false positive analysis was completed for 108 (70%) of the baseline planet candidate sample. We find that the average and median false positive probabilities for the calculated sample are 4% and 0.6%, respectively. Twelve planet candidates in the sample have an astrophysical false positive probability ${p}_{\mathrm{fpp}}\geqslant \ 10\%$ and the highest is 60%. The astrophysical false positive contamination for the parameter space under investigation is within the statistical and systematic uncertainties and can be safely ignored for this study. However, for shorter and longer ${P}_{\mathrm{orb}}$, the astrophysical false positive contamination becomes increasingly important.

5. PLANET OCCURRENCE RATE METHOD

In order to infer the underlying planet occurrence rate from the observed distribution of Kepler planet candidates, we extend the methodology of Youdin (2011). Youdin (2011) present a parametric model for the planet distribution function (PLDF) and use likelihood maximization techniques to estimate the parameters that best describe the observed planet candidate distribution and the parameter uncertainties. We extend the method of Youdin (2011) by employing Bayesian parameter estimation theory using MCMC methods to numerically evaluate the posterior distribution of the PLDF parameters (Gregory 2005). We were motivated to replace the intuitive and analytic minimization method of Youdin (2011) with a Bayesian MCMC parameter estimation method in order to analyze a more complicated PLDF model and enable future efforts to explore higher dimensional models including, for example, dependence on stellar parameters.

Following Youdin (2011), we employ the Poisson distribution for the likelihood. A helpful description motivating the Poisson likelihood is given in Section 5.3.2 of Loredo (1992), and the Poisson likelihood is commonly used in order to interpret astronomical detections with a varying survey sensitivity (Tabachnik & Tremaine 2002; Allen 2007; Cumming et al. 2008; Kraus & Hillenbrand 2012; Foreman-Mackey et al. 2014; Nielsen et al. 2014). Also, the point process statistics literature (Daley & Vere-Jones 1988; Baddeley 2007) rigorously shows that the Poisson likelihood is appropriate for analyses of spatial point data. For this application, the observed planet candidate distribution is treated as an inhomogeneous Poisson process where the PLDF describes the dependence of the Poisson process intensity on ${P}_{\mathrm{orb}}$ and ${R}_{{\rm{p}}}$.

Independent of the choice of likelihood, one is free to choose any parametric form for the PLDF model. Previous work has indicated that a power law form of the PLDF describes the Kepler observations (Youdin 2011; Howard et al. 2012; Dong & Zhu 2013) over portions of the ${R}_{{\rm{p}}}$ and ${P}_{\mathrm{orb}}$ parameter space. For this study, we adopt a PLDF dependent upon ${P}_{\mathrm{orb}}$ and ${R}_{{\rm{p}}}$ parameterized as a power law in ${P}_{\mathrm{orb}}$ and a broken power law in ${R}_{{\rm{p}}}$ over a specified domain ${P}_{\mathrm{min}}\leqslant {P}_{\mathrm{orb}}\leqslant {P}_{\mathrm{max}}$ and ${R}_{\mathrm{min}}\leqslant {R}_{{\rm{p}}}\leqslant {R}_{\mathrm{max}}$:

Equation (7)

where F0 is the integrated planet occurrence rate, ${C}_{{\rm{n}}}$ is a normalization factor, $g({\boldsymbol{x}})$ is the shape function, ${P}_{{\rm{o}}}=({P}_{\mathrm{min}}+{P}_{\mathrm{max}})/2$ and ${R}_{{\rm{o}}}=({R}_{\mathrm{min}}+{R}_{\mathrm{max}})/2$ are domain scaling factors, ${R}_{\mathrm{brk}}$ is the break radius transition between the two ${R}_{{\rm{p}}}$ power-law exponents (${\alpha }_{1}$ and ${\alpha }_{2}$), and β is the ${P}_{\mathrm{orb}}$ power-law exponent. The ${C}_{{\rm{n}}}$ is determined from the normalization requirement,

Equation (8)

Overall, the PLDF has five free parameters: F0, β, ${\alpha }_{1}$, ${\alpha }_{2}$, and ${R}_{\mathrm{brk}}$. Following Equation (18) of Youdin (2011), the Poisson likelihood of the data for a survey that detects ${N}_{\mathrm{pl}}$ planets around ${N}_{\star }$ survey targets is

Equation (9)

where the PLDF model predicted number of detections from the survey is

Equation (10)

and the likelihood ignores constant multiplicative factors. In Equation (10), the underlying PLDF model is modified by the per-star transit survey effectiveness, ${\eta }_{j}({\boldsymbol{x}})$, summed over ${N}_{\star }$ targets in the sample, where ${\eta }_{j}({\boldsymbol{x}})={P}_{j,\mathrm{comp}}\times {P}_{j,\mathrm{tr}}$ is the per-star pipeline completeness model of Section 2 and ${P}_{j,\mathrm{tr}}$ is the geometric probability to transit. The transit probability factor, ${P}_{j,\mathrm{tr}}=({R}_{\star }/a)/(1-{e}^{2})$, depends on the stellar parameters and orbital eccentricity (Burke 2008).

The separable form between ${R}_{{\rm{p}}}$ and ${P}_{\mathrm{orb}}$ of the PLDF adopted in this study, is overly restrictive if applied to a larger range of ${R}_{{\rm{p}}}$. Previous studies have identified a dependence of the ${P}_{\mathrm{orb}}$ exponent, β, on planet radius (Dong & Zhu 2013; Foreman-Mackey et al. 2014), with an apparent transition in the ${P}_{\mathrm{orb}}$ dependence around ${R}_{{\rm{p}}}$ ∼ 4${R}_{\oplus }$. For the $0.75\;\leqslant \;$ ${R}_{{\rm{p}}}$ $\;\leqslant $ 2.5 ${R}_{\oplus }$ analysis region of this study we do not find evidence for a more complicated dependence between ${R}_{{\rm{p}}}$ and ${P}_{\mathrm{orb}}$ being necessary based upon residuals between the observed and model fitted planet counts. Also of note in Equation (10), is that the summation of ${\eta }_{j}$ over the stellar sample is independent of the PLDF parameters. Thus, the summation can be computed once for the analysis and the planet occurrence depends upon the integrated/average transit survey effectiveness alone rather than explicitly depending upon the per-star survey effectiveness.

We complete the Bayesian posterior calculation by specifying uniform priors for all parameters except for F0 which has a prior that is uniform in the logarithm. The adopted MCMC implementation for this analysis is based upon the description in Gregory (2005) that employs a Metropolis–Hastings algorithm with an automated iterative proposal step-size determination and has been applied to transit model light curve analysis (Burke et al. 2007), transit timing analysis (Burke et al. 2010), and radial velocity analysis (Charbonneau et al. 2009; Ballard et al. 2011). For this study, we do not incorporate uncertainty in ${R}_{{\rm{p}}}$ and ignore contributions to the planet candidate sample due to astrophysical and instrumental false positives (see Youdin 2011; Mullally et al. 2015, for a more in-depth discussion). In the case of multiple planet systems, adopting the Poisson likelihood treats multiple planets in a system as independent, and thus this method can not capture any structure and correlations between planet's in a single system, but captures the average behavior over a large sample of stars. However, we do constrain the sensitivity of our results to these potential complications in Section 6.2.

6. RESULTS

In this section, we provide planet occurrence rate determinations based upon the Q1-Q16 Kepler planet candidate sample of Mullally et al. (2015; see Section 4 and Table 3). We focus on the GK dwarf targets observed by Kepler using stellar parameters from the catalog of Huber et al. (2014; see Section 3 and Table 1). We describe our analytic pipeline completeness model in Section 2 that employs the pipeline detection efficiency as calibrated by the Monte-Carlo transit injection and recovery provided by Christiansen et al. (2015). The planet occurrence rate is derived through a parameterized model for the PLDF, where the parameters and their uncertainties are determined within a Bayesian parameter estimation problem with the posterior estimated through MCMC techniques (see Section 5). The above set of inputs represents our current best/baseline model for planet occurrence rates, and we describe the results in Section 6.1. We then perform a sensitivity analysis in Section 6.2 in order to explore the systematic uncertainty in the planet occurrence rates due to imprecise knowledge of the baseline inputs.

6.1. Baseline Results

For the baseline result, we fit the PLDF over the parameter space of 0.75 < ${R}_{{\rm{p}}}$ < 2.5 ${R}_{\oplus }$ and 50 < ${P}_{\mathrm{orb}}$ $\lt 300$ days. We tabulate 10,000 subsamples from the full MCMC posterior samples for all the parameters along with the resulting likelihood and prior values in Table 4. The overall occurrence rate for this parameter space ${F}_{0}=0.77\pm 0.12$ planets per star. Relying on the statistical uncertainty alone, the 3σ upper limit ${F}_{\mathrm{0,3}\sigma \;{\rm{U}}.{\rm{L}}.}=1.3$ implies that we cannot currently rule-out a scenario that when averaged over large samples of GK dwarfs there exists more planets in the analyzed parameter space than stellar hosts. The 3σ lower limit ${F}_{\mathrm{0,3}\sigma \;{\rm{L}}.{\rm{L}}}=0.49$ implies that for large samples of GK dwarfs there exists on average at least one planet in the analyzed parameter space for every two stellar hosts.

Table 4.  PLDF Model Parameter Posterior Samples

${\alpha }_{1}$ ${\alpha }_{2}$ ${R}_{\mathrm{brk}}$ β F0 Ln(Likelihood) Ln(Prior)
−1.80587 9.60189 2.42398 −0.53218 1.04356 −1154.1220 −11.3374
−0.91336 −7.31895 2.20004 −0.68832 0.74751 −1152.1463 −11.3374
−2.55130 −1.50156 1.71089 −0.78946 1.04314 −1156.0940 −11.3374
16.62721 −1.43560 0.91343 −0.68223 0.69378 −1151.8822 −11.3374
6.77569 −1.25637 0.87505 −0.61621 0.67331 −1153.4823 −11.3374
−1.34402 −7.41157 2.37924 −0.45503 0.93427 −1155.0970 −11.3374
−1.90010 −3.45914 2.01354 −0.91872 0.97944 −1155.2124 −11.3374
6.59736 −1.64782 1.00841 −0.55609 0.71677 −1152.5281 −11.3374
−1.96796 −3.36621 2.29527 −0.34320 1.02223 −1155.7010 −11.3374
16.05061 −2.14134 0.94454 −0.76077 0.83308 −1152.5983 −11.3374

Only a portion of this table is shown here to demonstrate its form and content. A machine-readable version of the full table is available.

Download table as:  DataTypeset image

Figure 6 shows how well the parametric PLDF model predicts the observed, uncorrected Kepler planet candidate counts summed over 50 < ${P}_{\mathrm{orb}}$ < 300 days in d${R}_{{\rm{p}}}$ = 0.25 ${R}_{\oplus }$ sized bins (points with uncertainties). Evaluating the PLDF at the parameters that maximize the likelihood fit to the data, we show the model predicted counts (${N}_{\mathrm{exp}}$ of Equation (10) where the limits of integration are 50 < ${P}_{\mathrm{orb}}$ < 300 days and d${R}_{{\rm{p}}}$ = 0.25 ${R}_{\oplus }$) as the white dashed line. In addition, we show the median (solid blue line), 1σ (orange region), and 3σ (blue region) model predicted counts by evaluating ${N}_{\mathrm{exp}}$ using 10,000 random samples from the posterior PLDF parameter estimates from the MCMC chain. Figure 7 shows the equivalent information, but along the ${P}_{\mathrm{orb}}$ dimension after marginalizing over 0.75 ≤ ${R}_{{\rm{p}}}$ ≤ 2.5 ${R}_{\oplus }$ and d${P}_{\mathrm{orb}}$ = 31.25 days. The bin sizes for the abcissae in Figures 6 and 7 are chosen in order to balance segmenting the parameter space range into a high number of evenly sized bins and maintaining at least three detections in each bin.

Figure 6.

Figure 6. Comparison between the predicted planet sample from the planet occurrence rate model and the observed Kepler planet candidate sample. The observed, marginalized over 50 < ${P}_{\mathrm{orb}}$ < 300 days, histogram of Kepler planet candidate counts as a function of ${R}_{{\rm{p}}}$ (points) can be compared to the maximum likelihood model for the predicted counts (white dash line). Also shown is the posterior distributions of the model predicted counts for the median (blue solid line), 1σ region (orange region), and 3σ region (blue region) marginalized over ${P}_{\mathrm{orb}}$ and in bins of d${R}_{{\rm{p}}}$ = 0.25 ${R}_{\oplus }$.

Standard image High-resolution image
Figure 7.

Figure 7. Same as Figure 6, but marginalized over 0.75 < ${R}_{{\rm{p}}}$ < 2.5 ${R}_{\oplus }$ and bins of d${P}_{\mathrm{orb}}$ = 31.25 days.

Standard image High-resolution image

Figure 8 quantifies the underlying PLDF free of the deleterious effects of the Kepler pipeline completeness and geometric transit probability. The white dashed line, representing the PLDF for parameters that maximize the likelihood of the data, rises toward small planets with ${\alpha }_{2}=-1.8$ and has a break near the edge of the parameter space. Given the low numbers of observed planet candidates in the smallest planet bins, the full posterior allowed behavior (1σ orange region ; 3σ blue region) cannot distinguish between a rising or falling PLDF for ${R}_{{\rm{p}}}$ $\;\lesssim \;1.5$ ${R}_{\oplus }$. Figure 9 shows the equivalent information, but along the ${P}_{\mathrm{orb}}$ dimension after marginalizing over 0.75 ≤ ${R}_{{\rm{p}}}$ ≤ 2.5 ${R}_{\oplus }$ and d${P}_{\mathrm{orb}}$ = 31.25 days.

Figure 8.

Figure 8. Shows the underlying planet occurrence rate model. Marginalized over 50 < ${P}_{\mathrm{orb}}$ < 300 days and bins of d${R}_{{\rm{p}}}$ = 0.25 ${R}_{\oplus }$ planet occurrence rates for the model parameters that maximize the likelihood (white dash line). Posterior distribution for the underlying planet occurrence rate for the median (blue solid line), 1σ region (orange region), and 3σ region (blue region). An approximate PLDF based upon results from Petigura et al. (2013a) for comparison (dash dot line).

Standard image High-resolution image

Formally, in our baseline analysis of the GK dwarf sample, the double power law in the ${R}_{{\rm{p}}}$ model is unwarranted relative to a single power law according to the Bayesian information criterion (BIC) methodology for model comparison. However, we choose to provide the final results in terms of the double power law model for the following reasons: (a) the additional flexibility of the double power law model provides a better fit to the smallest ${R}_{{\rm{p}}}$ parameter space of most interest, whereas the single power law model systematically overestimates (by ∼0.5 σ in a comparable data/model comparison to that shown in Figure 6) the occurrence rates in the smallest ${R}_{{\rm{p}}}$ bins. (b) The more complicated model ensures the ability to adapt to variations in the PLDF in the sensitivity analysis of Section 6.2. (c) Previous work on Kepler planet occurrence rates indicated a break in the planet population for $2.0\;\lesssim \;$ ${R}_{{\rm{p}}}$ ≲ 2.8 ${R}_{\oplus }$(Fressin et al. 2013; Petigura et al. 2013a, 2013b; Silburt et al. 2015). (d) Finally, extending this work to a larger parameter space and for alternative target selection samples, such as the Kepler M dwarf sample where a sharp break at ${R}_{{\rm{p}}}$ ∼ 2.5 ${R}_{\oplus }$ is observed (Dressing & Charbonneau 2013; Burke et al. 2015), the double power law in ${R}_{{\rm{p}}}$ is strongly (BIC $\gt 10$) warranted.

Symptomatic of the weak evidence for a broken power law model over the $0.75\;\leqslant \;$ ${R}_{{\rm{p}}}$ $\;\leqslant $ 2.5 ${R}_{\oplus }$ range, ${R}_{\mathrm{brk}}$ is not constrained within the prior ${R}_{{\rm{p}}}$ limits of the parameter space. When ${R}_{\mathrm{brk}}$ is near the lower and upper ${R}_{{\rm{p}}}$ limits, ${\alpha }_{1}$ and ${\alpha }_{2}$ also become poorly constrained, respectively. To provide a more meaningful constraint on the average power law behavior for ${R}_{{\rm{p}}}$ in the double power law PLDF model, we introduce ${\alpha }_{\mathrm{avg}}$, which we set to ${\alpha }_{\mathrm{avg}}={\alpha }_{1}$ if ${R}_{\mathrm{brk}}\geqslant {R}_{\mathrm{mid}}$ and ${\alpha }_{\mathrm{avg}}={\alpha }_{2}$ otherwise, where ${R}_{\mathrm{mid}}$ is the midpoint between the upper and lower limits of ${R}_{{\rm{p}}}$. We find ${\alpha }_{\mathrm{avg}}=-1.54\pm 0.5$ and $\beta =-0.68\pm 0.17$ for our baseline result. We use ${\alpha }_{\mathrm{avg}}$ as a summary statistic for the model parameters only to enable a simpler comparison of our results to independent analyses of planet occurrence rates and to approximate the behavior for the power law ${R}_{{\rm{p}}}$ dependence if we had used the simpler single power law model. The results for a single power law model in both ${R}_{{\rm{p}}}$ and ${P}_{\mathrm{orb}}$ are equivalent to the results for the double power law model (${F}_{0}=0.83\pm 0.13$, $\alpha =-1.56\pm 0.3$, and $\beta =-0.68\pm 0.17$).

In Table 5, we provide the parameters of the PLDF that maximizes the likelihood for the data in our baseline analysis as well as the median and percentile posterior values for F0, β, and ${\alpha }_{\mathrm{avg}}$. Additional statistics for the full five parameter PLDF can be estimated from the 10,000 posterior MCMC samples in Table 4.

Table 5.  PLDF Model Parameter Summary

Parameter Estimate F0 ${\alpha }_{1}$ ${\alpha }_{2}$ ${R}_{\mathrm{brk}}$ ${\alpha }_{\mathrm{avg}}$ β
Likelihood Max 0.73 19.68 −1.78 0.94 ... −0.65
0.13% 0.48 ... ... ... −3.09 −1.20
15.9% 0.66 ... ... ... −1.97 −0.85
50.0% 0.77 ... ... ... −1.54 −0.68
84.1% 0.92 ... ... ... −1.04 −0.35
99.9% 1.32 ... ... ... 0.53 −0.19
Lower Limit 0.28 ... ... ... −3.25 −1.4
Upper Limit 1.92 ... ... ... 0.53 -0.10

Download table as:  ASCIITypeset image

6.2. Sensitivity Analysis

Planet occurrence rate calculations are only as accurate as the inputs. The baseline results of Section 6 represent our current best set of data that are uniformly applicable to the Kepler observations and planet search results. The resulting posterior distribution for the PLDF parameters in the above analysis only represent their statistical precision and do not capture potential sources of systematic uncertainties present in the inputs. To explore the level of systematic errors present in the current results, we repeat the baseline analysis, but for several scenarios in which we change a single input. The following sections describe results of these sensitivity tests.

6.2.1. Pipeline Completeness Systematics

The pipeline detection efficiency we employ for the baseline analysis is calibrated with transit injection and recovery tests (Christiansen et al. 2015), but it represents the pipeline response averaged over a wider range of Kepler targets than the limited GK dwarf sample of this study. In addition, Christiansen et al. (2015) analyzed a shorter (four Kepler quarter) subset of the entire Q1-Q16 data. It is expected that the pipeline completeness primarily depends upon the MES and number of transits, thus the results from the shorter four quarter analysis are applicable to the sixteen quarter run. However, star-by-star deviations are expected, and until we perform larger injection studies it is prudent to investigate the sensitivity of the occurrence rates to this potential source of uncertainty. We consider optimistic and pessimistic detection efficiencies relative to the baseline result. For the optimistic detection efficiency, we assume the ideal theoretical expected performance of TPS given by Equation (4). For the pessimistic detection efficiency, we assume the result from Fressin et al. (2013), where they find a linear detection efficiency over the range 6 < MES < 16 provides the best match to the S/N distribution of the Q1-Q6 Kepler planet candidate sample (Batalha et al. 2013). The detection efficiency of the Q1-Q6 Kepler pipeline was never measured directly using Monte-Carlo transit recovery tests. Thus, we cannot determine the accuracy of the Fressin et al. (2013) detection efficiency relative to the Q1-Q6 Kepler pipeline run. However, having measured the detection efficiency for the Q1-Q16 pipeline run (Christiansen et al. 2015), the Fressin et al. (2013) detection efficiency is overly pessimistic for the pipeline completeness of the Q1-Q16 pipeline run.

Figure 9.

Figure 9. Same as Figure 8, but marginalized over 0.75 < ${R}_{{\rm{p}}}$ < 2.5 ${R}_{\oplus }$ and bins of d${P}_{\mathrm{orb}}$ = 31.25 days.

Standard image High-resolution image

Overall, an overly optimistic detection efficiency reduces the planet occurrence rate and a pessimistic detection efficiency increases the planet occurrence rates. We show in Figure 10 the posterior integrated planet occurrence rate for the baseline result (orange histogram) compared to the case of an optimistic (black line) and the pessimistic (black with circles line) detection efficiency alternatives. For this comparison we narrow the parameter space of integration (1.0 ≤ ${R}_{{\rm{p}}}$ ≤ 2.0 ${R}_{\oplus }$ and 50 ≤ ${P}_{\mathrm{orb}}$ ≤ 200 days) in order to limit the comparison to a region of parameter space with better statistics and to facilitate comparison with Kepler occurrence rates from previous studies. We symbolize this narrower parameter space planet occurrence rate as F1. For clarity of display in Figure 10, the optimistic and pessimistic occurrence rate posteriors are shown by a log-normal fit to the posterior rather than the full posterior detail in a histogram format. The pessimistic detection efficiency has a ≥3σ larger occurrence rate than the baseline result and the optimistic detection efficiency is 2.5σ lower than the baseline result. This initial test demonstrates that systematic effects can be larger than the random uncertainties.

Figure 10.

Figure 10. Posterior distribution for the integrated planet occurrence rate over the 1.0 < ${R}_{{\rm{p}}}$ < 2.0 ${R}_{\oplus }$ and 50 < ${P}_{\mathrm{orb}}$ < 200 days parameter space, F1. Changes from the baseline inputs (filled orange histogram) systematically impact the derived occurrence rate beyond the statistical uncertainty. We discuss in Section 6.2 alternative inputs: optimistic detection efficiency (black line), pessimistic detection efficiency (black with circles line), assuming e = 0.4 for all orbits (black with triangles line), original KIC stellar parameters (yellow line), alternative DV ${R}_{{\rm{p}}}$ (yellow with circles line), low reliability planet candidate sample (blue line), high reliability planet candidate sample (blue with circles line), and assuming a single planet search (red line).

Standard image High-resolution image

The alternative inputs also influence the other "shape" parameters of the PLDF. We show samples from the posterior distribution of ${\alpha }_{\mathrm{avg}}$ (Figure 11) and β (Figure 12) as a function of F0 for the baseline (orange points) occurrence rate parameter estimates. As an approximation to the joint 2σ posterior distribution we model the posterior as a multi-normal distribution with major and minor axes along the eigenvectors determined from the posterior samples (orange ellipse). For comparison, we show the optimistic (black ellipse) and pessimistic (black with circles ellipse) detection efficiency solutions by the 2σ ellipse model for the shape parameters. The systematic variations of ${\alpha }_{\mathrm{avg}}$ and β are correlated with F0.

6.2.2. Orbital Eccentricity

In the baseline result we have assumed circular orbits when constructing the model for pipeline completeness. However, radial velocity studies have revealed that eccentric orbits are common for ${P}_{\mathrm{orb}}$ ≥ 10 days (Butler et al. 2006). A nonzero eccentricity results in higher probability to transit, but a shorter transit duration degrades the transit S/N (Burke 2008). Burke (2008) shows that yields from a transit survey could be up to 25% higher using the observed distribution of radial velocity planets. We investigate a limiting case of assuming all planets have e = 0.4. The e = 0.4 case results in an 11% (1σ) lower planet occurrence rate (black with triangles line in Figure 10). Thus, for this parameter space the systematic effect due to orbital eccentricity is comparable to the statistical errors. The impact of eccentricity on ${\alpha }_{\mathrm{avg}}$ and β is also modest.

6.2.3. Stellar Parameter Systematics

Stellar parameter estimates of Kepler targets are subject to systematic uncertainties (Brown et al. 2011; Mann et al. 2012; Muirhead et al. 2012; Pinsonneault et al. 2012; Dressing & Charbonneau 2013; Everett et al. 2013; Gaidos & Mann 2013) and multiplicity/blend effects (Cartier et al. 2014; Lillo-Box et al. 2014; Ciardi et al. 2015). Also, the stellar parameter catalog of Huber et al. (2014) relies upon a heterogeneous compilation of input sources and still has some limitations (see their Section 8 for a discussion). As a proxy for constraining the impact on occurrence rate studies due to stellar parameter systematics, we repeat the analysis but adopt stellar parameters from the original KIC (Brown et al. 2011). Using the original KIC is also germane since it was employed for previous work on planet occurrence rates with Kepler (Howard et al. 2012; Fressin et al. 2013).

We redo the GK dwarf target selection resulting in 102,186 targets that meet the selection criteria. There are 83,724 targets (91.4%) in common with the baseline GK dwarf sample discussed in Section 3. Table 1 provides a binary flag to indicate that the Kepler target was included in the stellar sample based upon the original KIC stellar parameters. In the original KIC GK dwarf sample there are 177 planet candidates that have 122 planet candidates (78.2%) in common with the baseline planet candidate sample discussed in Section 4. Table 3 has a binary flag indicating the planet candidates selected for this original KIC stellar sample. We adjust the derived ${R}_{{\rm{p}}}$ of the planet candidate sample by linearly scaling ${R}_{{\rm{p}}}$ by the ratio in ${R}_{\star }$ between the baseline and the original KIC values.

The net impact of the alternative stellar parameters results in $\sim 2\sigma $ higher occurrence rates (yellow line in Figure 10). The change in ${\alpha }_{\mathrm{avg}}$ is larger than for β (yellow ellipse in Figures 11 and 12).

6.2.4. Planet Candidate Parameters

Planet radii are not a direct observable, and they must be derived through parameter fits to light curves with various assumptions as to the stellar parameters, limb darkening coefficients, flux time series detrending, treatment of instrumental effects, orbital eccentricity, and third light contamination to name a few (Mandel & Agol 2002; Seager & Mallén-Ornelas 2003; Giménez 2006; Holman et al. 2006; Burke et al. 2007; Torres et al. 2008; Southworth 2011; Kipping 2014; Ciardi et al. 2015; Mullally et al. 2015; Rowe et al. 2015a). In our current analysis, we treat ${R}_{{\rm{p}}}$ as perfectly known without uncertainty. Recent work has pointed out the non-negligible bias in deriving planet occurrence rates without taking into account the full error distribution of ${R}_{{\rm{p}}}$ (Foreman-Mackey et al. 2014; Morton & Swift 2014; Dressing & Charbonneau 2015; Silburt et al. 2015). Detailed planet parameter posterior estimates have only recently become available for a majority of the Kepler planet candidate sample (Rowe et al. 2015b), thus we defer occurrence rate analysis using a full posterior distribution of planet radii for future work.

We repeat the occurrence rate calculation using the alternative ${R}_{{\rm{p}}}$ estimates provided by the DV module of the Kepler pipeline (Wu et al. 2010). The most important differences between the DV analysis and the baseline planet parameters from Mullally et al. (2015) and Rowe et al. (2014) are the independent methods of detrending the flux time series data and DV use of ${\chi }^{2}$ minimization instead of a MCMC analysis. Both analyses assume the same stellar parameters, fixed limb darkening coefficients, zero eccentricity, and begin with the pre-search data conditioning time series (Smith et al. 2012; Stumpe et al. 2012). Mullally et al. (2015) find that the radii ratios, ${R}_{{\rm{p}}}$/${R}_{\star }$ from the MCMC analysis are ∼7% smaller than from the analysis in DV. The typically larger ${R}_{{\rm{p}}}$ from DV results in $\sim 2.2\sigma $ lower occurrence rates (yellow with circles line in Figure 10). The change in ${\alpha }_{\mathrm{avg}}$ is larger than for β (yellow with circles ellipse in Figures 11 and 12).

6.2.5. Planet Candidate Sample

Characterizing a detection by the Kepler pipeline as a bona fide member of the Kepler planet candidate sample has increasingly relied upon an automated classification procedure (McCauliff et al. 2014; Coughlin et al. 2015; Mullally et al. 2015; Rowe et al. 2015a). However, the accuracy, efficacy, and impact on deriving planet occurrence rates due to the automated classification and remaining manual vetting decision steps have not been fully quantified. The vetting process has its own false negative alarm rate outside of the pipeline completeness, that currently we do not account for. In addition, the decision process for both the automated and manual decision methods becomes increasingly less definitive toward low S/N (see the discussion of the current planet sample limitations in Sections 7 and 9.1 of Mullally et al. 2015). The planet candidate catalog of Mullally et al. (2015) takes an "innocent until proven guilty" approach to deal with the indeterminant diagnostics in the low S/N regime. The instrumental aperture contamination and crosstalk also become increasingly difficult to identify at low S/N (Coughlin et al. 2014). We constrain the potential systematics in deriving planet occurrence rates due to uncertainty in the planet candidate sample classification process by considering two alternative planet samples.

First, we include a population of twelve "lower reliability" KOIs with a false positive disposition in the $50\;\leqslant $ ${P}_{\mathrm{orb}}$ $\leqslant \;300$ days and ${R}_{{\rm{p}}}$ $\;\leqslant \;2.5$ ${R}_{\oplus }$ parameter space under investigation (see Table 6). This sample of "lower reliability" KOIs were characterized as planet candidates for all the vetting procedures described in Mullally et al. (2015) except one. These KOIs are false positives because they failed to maintain an ${\rm{S}}/{\rm{N}}\geqslant 7.1$ in the independent detrending employed for the MCMC planet parameter estimates (Rowe et al. 2014). Prior to the MCMC evaluation, a trial ${\chi }^{2}$ minimization provides a parameter initialization. These lower reliability KOIs failed to yield ${\rm{S}}/{\rm{N}}\geqslant 7.1$ in this trial fit and were therefore demoted from a PLANET CANDIDATE to a FALSE POSITIVE disposition. Requiring an independent recovery of a potential detection is a valuable criteria for a planet candidate, but it largely impacts our lowest S/N detections and we have not fully quantified the false negative rate of this independent recovery test. In lieu of a more detailed investigation, it provides a useful limiting test case sample to constrain the potential breakdown of the vetting metrics at the lowest S/N of the planet candidate sample. Including a lower reliability KOI sample in the planet candidate list, results in $\sim 1\sigma $ higher occurrence rates (blue line in Figure 10) and modest changes in ${\alpha }_{\mathrm{avg}}$ and β (blue ellipse in Figures 11 and 12).

Table 6.  Low Reliability KOI False Positive Sample

KOI Number
4954.01
5043.01
5081.01
5102.01
5123.01
5177.01
5198.01
5210.01
5257.01
5309.01
5325.01
5405.01

Download table as:  ASCIITypeset image

Second, we cull the baseline KOI planet candidate sample to the most reliable detections by requiring KOIs to have been detected in at least one other pipeline run. Each run of the Kepler pipeline is independent and has different amounts and versions of the data. To remain in the "high reliability" planet candidate sample, we require a KOI to be represented as a TCE in either the Q1-Q12 pipeline run (Tenenbaum et al. 2013; Rowe et al. 2015a), the Q1-Q17 pipeline run (Seader et al. 2015), or a testing/development run using Q1-Q17 data with a near-final Kepler pipeline code version. This requirement removed 26 KOI planet candidates in the $50\;\leqslant $ ${P}_{\mathrm{orb}}$ $\leqslant \;300$ days and ${R}_{{\rm{p}}}$ $\;\leqslant \;$ 2.5 ${R}_{\oplus }$ parameter space under investigation. Table 3 contains a binary flag indicating the planet candidates belonging to this high reliability planet sample. Adopting a higher reliability KOI sample results in $\sim 2.2\sigma $ lower occurrence rates (blue with circles line in Figure 10). The change in ${\alpha }_{\mathrm{avg}}$ and β (blue with circles ellipse in Figures 11 and 12) are consistent with preferentially removing the lower S/N KOIs which typically reside at smaller radii and longer orbital periods.

The final systematic we investigate is the impact of limiting the search to a single planet per target, effectively ignoring the multiple planet search in the Kepler pipeline. We provide this result in order to more directly compare independent analyses of the Kepler data that do not search for multiple planets (Petigura et al. 2013a). Petigura et al. (2013a) estimate that their occurrence rates would be ∼25% higher by including multiple planet systems in their study. We concur with their estimate by finding a 25% ($\sim 2.2\sigma $) lower occurrence rate by only including the lowest numbered KOI (typically the highest S/N) of a system (red line in Figure 10). The change in ${\alpha }_{\mathrm{avg}}$ is negligible and β prefers a more gradual decrease in planet occurrence with ${P}_{\mathrm{orb}}$ despite the lower overall occurrence rate normalization (red ellipse in Figures 11 and 12).

6.2.6. Systematic Error Summary

The previous sections show that individual systematic effects can reach 2σ biases in the occurrence rates, where σ is determined from statistical errors alone. Unfortunately, multiple systematic effects can add coherently rather than quadratically (see Section 7). To provide a more realistic uncertainty in the context of all these systematic uncertainties, we express the uncertainties on the occurrence rate parameters as an acceptable range. We adopt the lower and upper limit of the acceptable range as the 2σ lower and upper limit for the largest systematic effect calculated in the previous sections. Based upon the results in this section, we find that the planet occurrence rate for the 1.0 ≤ ${R}_{{\rm{p}}}$ ≤ 2.0 ${R}_{\oplus }$ and 50 ≤ ${P}_{\mathrm{orb}}$ ≤ 200 days parameter space to have a best value from the baseline calculation of ${F}_{1}=0.34$ with an acceptable range of $0.19\leqslant {F}_{1}\;\leqslant $ 0.7. We provide acceptable ranges for the PLDF model parameters in Table 5. In addition, Tables 7 and 8, provides the parameters of the PLDF that maximizes the likelihood for the data as well as the percentile posterior values for F0, β, and ${\alpha }_{\mathrm{avg}}$ for all of the alternative analyses considered in the sensitivity analysis, respectively.

Table 7.  PLDF Model Parameters Maximizing Likelihood For Alternative Analyses

Analysis Identifier F0 ${\alpha }_{1}$ ${\alpha }_{2}$ ${R}_{\mathrm{brk}}$ β
Baseline 0.731 19.684 −1.779 0.941 −0.655
Optimistic Efficiency 0.502 27.317 −1.334 0.940 −0.773
Pessimistic Efficiency 1.121 29.886 −2.183 0.940 −0.576
Eccentric 0.638 28.817 −1.856 0.940 −0.669
Original KIC 1.212 −2.269 −26.911 2.383 −0.881
DV Rp 1.212 −2.269 −26.911 2.383 −0.881
Low Reliability 0.799 29.650 −1.867 0.941 −0.593
High Reliability 0.475 28.923 −1.393 0.980 −0.905
First Planet Only 0.536 29.846 −1.573 0.940 −0.554
Long Period 3.045 29.215 −4.594 1.042 0.867
Trimmed Long Period 1.199 29.546 −3.716 1.231 0.416

Download table as:  ASCIITypeset image

Table 8.  Model Parameter Posterior Percentiles For Alternative Analyses

Analysis Identifier Percentile F0 ${\alpha }_{\mathrm{avg}}$ β
Baseline 0.135 0.484 −3.086 −1.199
Baseline 2.275 0.560 −2.439 −1.024
Baseline 15.865 0.656 −1.973 −0.851
Baseline 50.000 0.774 −1.544 −0.682
Baseline 84.135 0.918 −1.038 −0.517
Baseline 97.725 1.098 −0.324 −0.353
Baseline 99.865 1.325 0.532 −0.191
Optimistic Efficiency 0.135 0.359 −2.637 −1.277
Optimistic Efficiency 2.275 0.412 −2.051 −1.100
Optimistic Efficiency 15.865 0.474 −1.616 −0.928

Only a portion of this table is shown here to demonstrate its form and content. A machine-readable version of the full table is available.

Download table as:  DataTypeset image

7. DISCUSSION

In this section, we compare our PLDF to previous work on the Kepler target sample that included analysis of the G dwarf targets using at least twelve quarters of Kepler data. We generally find higher occurrence rates, no evidence for a break at ${R}_{{\rm{p}}}$ $\;\lesssim \;2.5$ ${R}_{\oplus }$, increasing planet occurrence rates toward ${R}_{{\rm{p}}}$ = 1.0 ${R}_{\oplus }$, slightly shallower drop-off of occurrence rates toward longer ${P}_{\mathrm{orb}}$, and larger uncertainty on occurrence rates driven by systematic effects.

Figure 11.

Figure 11. Samples from the posterior distribution of F0 as a function ${\alpha }_{\mathrm{avg}}$ for the baseline results (orange points) along with an approximate 2σ error ellipse for the baseline results (orange ellipse). Also shown are 2σ error ellipses for the alternative inputs with the same line types as in Figure 10.

Standard image High-resolution image
Figure 12.

Figure 12. Same as Figure 11, but showing the posterior distribution of F0 as a function β.

Standard image High-resolution image

As a primary source for comparison, we refer to the independent pipeline analysis on planet occurrence rates by Petigura et al. (2013a). We compare the integrated planet occurrence rate over the 1.0 ≤ ${R}_{{\rm{p}}}$ ≤ 2.0 ${R}_{\oplus }$ and 50 ≤ ${P}_{\mathrm{orb}}$ ≤ 200 days range, F1, in Figure 13. We approximate the result from Petigura et al. (2013a; F1 = 9% ± 3% occurrence rate) as a Gaussian (black line) with value and uncertainty as published from their Figure 2. The posterior distribution of our baseline result (orange histogram) demonstrates a significant difference from the occurrence rate of Petigura et al. (2013a). For consistency with the TERRA pipeline (which does not search for multiple planets), we show our alternative occurrence rate after keeping only the highest S/N planet candidate for a target (red line) in Figure 13. The "first planet only" occurrence rate does not fully remove the difference. In our analysis we explored numerous alternative inputs (see Section 6.2). Even when assuming a wide variety of systematics, we have a difficult time reconciling our results with Petigura et al. (2013a).

Figure 13.

Figure 13. Comparison of the integrated posterior distribution from our baseline PLDF over the 1.0 < ${R}_{{\rm{p}}}$ < 2.0 ${R}_{\oplus }$ and 50 < ${P}_{\mathrm{orb}}$ < 200 days parameter space (orange filled histogram) to previous results over the same parameter space from Petigura et al. (2013a; black curve) and Mulders et al. (2015; blue curve). The posterior distribution from the other works are approximated as Gaussians with their central and standard deviation parameters as published. We also show our single planet search results (red line) and an extreme scenario where the four leading systematics resulting in lower occurrence rates are combined (red with circles line, see Section 7).

Standard image High-resolution image

It is possible that several sources of systematics add coherently to reconcile the results. For instance, we can reproduce the occurrence rate of Petigura et al. (2013a; black line in Figure 13) by combining together four of the systematic effects resulting in lower occurrence rates: single planet search only, alternative DV ${R}_{{\rm{p}}}$, highest reliability KOIs, and optimistic detection efficiency. The resulting occurrence rate (red with circles line in Figure 13) ${F}_{1}=0.09$ is less than our lower limit of an acceptable range ${F}_{1}\geqslant 0.19$. Further work is needed to understand the differences between our results and the results of Petigura et al. (2013a). Some possibilities including increasing the number of injection and recovery trials, characterizing the impact of the flux time series detrending on planet recovery, investigating systematic differences between the stellar parameter estimates of the planet candidate hosts and non-planet candidate hosts, and better characterization of the biases that may be present when using the binned occurrence rate methodology (Morton & Swift 2014). For instance, Foreman-Mackey et al. (2014) note a bias in the binned occurrence rate methodology is present if the completeness function is evaluated at exactly the location of the planet parameters rather than being averaged over the entire bin of analysis.

One characteristic result of Petigura et al. (2013a) is a plateau to declining occurrence rates in the mini-Neptune to terrestrial planet regime. To enable comparison of the results from our parametric PLDF model, we derive an approximate PLDF consistent with the occurrence rate from Figure 3 of Petigura et al. (2013a), marginalized over 5 ≤ ${P}_{\mathrm{orb}}$ ≤ 100 days. We determine a PLDF with two free parameters, ${g}_{{\rm{p}}}={F}_{{\rm{p}}}{R}_{{\rm{p}}}^{{\alpha }_{{\rm{p}}}}{P}_{\mathrm{orb}}^{-1}$, using the $1\;\leqslant \;$ ${R}_{{\rm{p}}}$ $\;\leqslant $ 1.4 ${R}_{\oplus }$ and $2\;\leqslant \;$ ${R}_{{\rm{p}}}$ $\;\leqslant $ 2.8 ${R}_{\oplus }$ bins from Figure 3 of Petigura et al. (2013a), yielding ${\alpha }_{{\rm{p}}}=-0.3677$ and ${F}_{{\rm{p}}}=0.103$. The approximating PLDF yields an occurrence rate of 14.9% for the $1.4\;\leqslant \;$ ${R}_{{\rm{p}}}$ $\;\leqslant $ 2.0 ${R}_{\oplus }$ bin compared to the published value of 14.2% ± 1.0% from Petigura et al. (2013a). As a further check, the approximating PLDF yields a F1 = 12.5% occurrence rate compared to F1 = 9% ± 3% for the published 1.0 ≤ ${R}_{{\rm{p}}}$ ≤ 2.0${R}_{\oplus }$ and 50 ≤ ${P}_{\mathrm{orb}}$ ≤ 200 days rate of Petigura et al. (2013a). The derived ${\alpha }_{{\rm{p}}}$ is ∼2σ (statistical uncertainty alone) shallower than our ${\alpha }_{\mathrm{avg}}=-1.5$ power law dependence of the occurrence rates on ${R}_{{\rm{p}}}$. However, ${\alpha }_{{\rm{p}}}$ is consistent with our results for ${\alpha }_{\mathrm{avg}}$ if one considers the full systematic range −3.25 $\leqslant {\alpha }_{\mathrm{avg}}\leqslant 0.53$ allowed from the sensitivity analysis of Section 6.2. A similar comparison applies to ${P}_{\mathrm{orb}}$ dependence of the PLDF for the β parameter. Our $\beta =-0.68$ is ∼2σ (statistical uncertainty alone) shallower than the ${\beta }_{{\rm{p}}}=-1$ dependence qualitatively stated in Petigura et al. (2013a). However within the full range allowed, −1.4 $\;\leqslant \;\beta \;\leqslant \;$−0.1, the two values agree. We show the approximating PLDF (dash dot line) for comparison to our result in Figure 8.10

If we use our baseline inputs to fit the broken power law in ${R}_{{\rm{p}}}$ PLDF over a larger, 0.75 ≤ ${R}_{{\rm{p}}}$ ≤ 5.0 ${R}_{\oplus }$, parameter space, we do find decisive evidence, BIC ≥ 10, for the broken power law model over a single power law in ${R}_{{\rm{p}}}$ PLDF. The derived ${R}_{\mathrm{brk}}={3.3}_{-0.4}^{+0.2}$ ${R}_{\oplus }$, with a ${\alpha }_{1}=-1.72\pm 0.3$ power law dependence for ${R}_{{\rm{p}}}$ $\;\lt \;{R}_{\mathrm{brk}}$ and ${\alpha }_{2}=-6.6\pm 1.7$. The planet occurrence rate derived from this study is consistent with a power law break, but we find that it qualitatively occurs at a larger radius than the study of Petigura et al. (2013b; ${R}_{\mathrm{brk}}\sim 2.5$), but is consistent with the qualitatively stated break at ${R}_{\mathrm{brk}}\sim 3$ of Dong & Zhu (2013).

We also compare to the integrated occurrence rate from the Kepler G dwarf sample of Mulders et al. (2015; blue line) in Figure 13. To estimate a value from Table 7 of Mulders et al. (2015), the 150 ≤ ${P}_{\mathrm{orb}}$ ≤ 250 bin was weighted by 0.56 assuming a ${P}_{\mathrm{orb}}$−1 PLDF dependence across the bin. This occurrence rate is in between the results of Petigura et al. (2013a) and this study, and has uncertainty overlap with both studies especially when considering the systematic sources of error. We find a very similar result between Dong & Zhu (2013) and Mulders et al. (2015) for the occurrence rate in this parameter space. Silburt et al. (2015) find results comparable to Petigura et al. (2013a).

8. EXTRAPOLATION TO LONGER PERIODS

In this section, we compare the observed Q1-Q16 planet candidate sample at longer periods (300 ≤ ${P}_{\mathrm{orb}}$ ≤ 700 days) to the predicted planet candidate yield deduced by extrapolating the PLDF model with parameters determined from the shorter period (50 ≤ ${P}_{\mathrm{orb}}$ ≤ 300 days) parameter space. In our baseline study, we purposely avoided the longer period regime because the planet candidate sample with three transit events and low MES has the potential for a substantially higher false alarm rate (see the discussion in Section 9.1 of Mullally et al. 2015). In previous planet candidate samples, the false alarm rate was minimal since a KOI detection from an earlier pipeline run could be compared to a later pipeline run with substantially more data available. With the ending of the Kepler primary mission, further data beyond Q1-Q17 is not available to verify our lowest MES detections.

Figure 14 shows the average pipeline completeness contours toward longer ${P}_{\mathrm{orb}}$ for the GK dwarf sample of this study along with the Kepler planet candidate sample in this regime. Using this long period pipeline completeness model and the shorter period PLDF model, we predict the expected planet candidate yield for Kepler. The top panel of Figure 15 shows the difference between observed and predicted planet candidate counts marginalized over 300 ≤ ${P}_{\mathrm{orb}}$ ≤ 700 days. There is a statistically significant overabundance of planet candidates toward longer periods than predicted from the baseline PLDF derived at the shorter orbital periods. The largest discrepancy is for the smallest ${R}_{{\rm{p}}}$ bin under consideration in Figure 15. The bottom panel of Figure 15 shows the observed minus predicted planet candidate counts as a function of ${P}_{\mathrm{orb}}$ after marginalizing over 0.75 ≤ ${R}_{{\rm{p}}}$ ≤ 2.5 ${R}_{\oplus }$. The most significant overabundance is in the middle ${P}_{\mathrm{orb}}$ bin. The largest contributor to the overabundance are the cluster of five planet candidates around ${R}_{{\rm{p}}}$ ∼ 1.1 ${R}_{\oplus }$ and ${P}_{\mathrm{orb}}$ ∼ 500 days that fall along the (0.01) average pipeline completeness contour level.

Figure 14.

Figure 14. Average pipeline completeness contours for the GK dwarf sample toward longer, 300 < ${P}_{\mathrm{orb}}$ < 700 days, along with the Q1-Q16 Kepler planet candidate sample (orange points).

Standard image High-resolution image
Figure 15.

Figure 15. Top: marginalized over periods of 300 < ${P}_{\mathrm{orb}}$ < 700 days observed Kepler planet candidate counts minus the predicted planet candidate counts obtained by extrapolating our planet occurrence rate results from the shorter 50 < ${P}_{\mathrm{orb}}$ < 300 days analysis of Section 6. Bottom: same as top, but marginalized over planet radius 0.75 < ${R}_{{\rm{p}}}$ < 2.5 ${R}_{\oplus }$.

Standard image High-resolution image

The significant overabundance of planet candidates implies that extrapolations of our PLDF from the 50 ≤ ${P}_{\mathrm{orb}}$ ≤ 300 day range may underestimate the planet occurrence rates toward longer periods. However, at this time we cannot distinguish between a higher occurrence of planets toward long periods in the Kepler GK dwarf planets, a larger false alarm contribution among the lowest MES planet candidates, or systematic bias in our simplified pipeline completeness model. We are investigating flux time series inversion and permutation tests along with a bootstrap noise characterization test (Seader et al. 2015) in order to calibrate the false alarm rate in the Kepler planet candidate sample.

9. TERRESTRIAL PLANET OCCURRENCE RATE FOR VENUS ORBITAL PERIODS

Earth's sister planet, Venus, has an orbital period within the ${P}_{\mathrm{orb}}$ range of the baseline analysis. Thus, in this section we present results for the occurrence rate of terrestrial planets corresponding to the ${P}_{\mathrm{orb}}$ ∼ 0.6 years of Venus. We define ${\zeta }_{0.6}$ as the 0.6 years terrestrial planet occurrence rate, which we take to be within 20% of ${R}_{{\rm{p}}}$ = 1 ${R}_{\oplus }$ and 20% of ${P}_{\mathrm{orb}}$ ${}_{\unicode{x2640}}$. The integral range of 20% is within the expectations for the regime of rocky terrestrial planets (Wolfgang & Lopez 2014; Rogers 2015). Since ${P}_{\mathrm{orb}}$ is a direct observable, providing occurrence rates in terms of ${P}_{\mathrm{orb}}$ such as ζ, has advantages over providing occurrence rates in terms of stellar insolation flux, such as the Venus zone (${\eta }_{\unicode{x2640} }$) concept of Kane et al. (2014) or the HZ (${\eta }_{\oplus }$) concept (Kasting et al. 1993; Selsis et al. 2007; Kopparapu et al. 2013; Zsom et al. 2013). Stellar insolation flux is an indirectly measured quantity and ${\eta }_{\unicode{x2640} }$ and ${\eta }_{\oplus }$ depend upon uncertain theoretical models for terrestrial planet atmospheric evolution. Providing occurrence rates in terms of ${P}_{\mathrm{orb}}$ facilitates comparison with future Kepler occurrence rate studies and is readily compared to theoretical terrestrial planet formation models.

We defer the additional complications in calculating ${\eta }_{\unicode{x2640} }$ and ${\eta }_{\oplus }$ to future work. Despite the complications, for G dwarfs, ${\zeta }_{0.6}$ is a subset of the full ${\eta }_{\unicode{x2640} }$ parameter space, thus ${\zeta }_{0.6}$ places a valuable lower limit on ${\eta }_{\unicode{x2640} }$ for G dwarfs. For the K dwarfs, ${P}_{\mathrm{orb}}$ ${}_{\unicode{x2640} }$ corresponds to the Sun–Earth insolation flux. Thus, ${\zeta }_{0.6}$ is a lower limit on the K dwarf ${\eta }_{\oplus }$. We find ${\zeta }_{0.6}=0.075$ with an acceptable range of 0.013$\;\leqslant \;{\zeta }_{0.6}\;\leqslant \;$0.30, and show the baseline and systematic posterior distributions for ${\zeta }_{0.6}$ in Figure 16.

Figure 16.

Figure 16. Distribution for the 0.6 years terrestrial planet occurrence rate, ${\zeta }_{0.6}$, integrated within 20% of ${R}_{{\rm{p}}}$ = 1 ${R}_{\oplus }$ and ${P}_{\mathrm{orb}}$ ${}_{\unicode{x2640} }$, using the baseline analysis (filled orange histogram). Solid lines represent results using alternative inputs with the same line types as in Figure 10.

Standard image High-resolution image

10. TERRESTRIAL PLANET OCCURRENCE RATE FOR ONE YEAR ORBITAL PERIODS

10.1. Extrapolating to One Year Orbital Period

The longer, 300 ≤ ${P}_{\mathrm{orb}}$ ≤ 700 days parameter space roughly coincides with the theoretical HZ for the G dwarf targets, which is a preferred location in a planetary system for a stable water bearing planet (Kasting et al. 1993). In Section 8, we demonstrated that determining the planet occurrence rate in the 300 ≤ ${P}_{\mathrm{orb}}$ ≤ 700 days range directly from Kepler data is at a premature stage due to significant false alarm contamination. In this section, we extrapolate our PLDF parametric model in order to calculate two occurrence rate parameters that can be used as a baseline for comparison to future work on refining HZ occurrence rates.

First, we measure the PLDF evaluated at 1.0 ${R}_{\oplus }$ and ${P}_{\mathrm{orb}}$ = 365.25 days, ${{\rm{\Gamma }}}_{\oplus }={dN}/d\mathrm{ln}$ ${P}_{\mathrm{orb}}$ $d\mathrm{ln}$ ${R}_{{\rm{p}}}$(Youdin 2011; Foreman-Mackey et al. 2014). In the top panel of Figure 17, we show the baseline (filled orange histogram) ${{\rm{\Gamma }}}_{\oplus }$ determined by extrapolating the PLDF models from the 50 ≤ ${P}_{\mathrm{orb}}$ ≤ 300 days results. We also show the alternative systematic effects discussed in Section 6.2 (solid curves). Note that the logarithmic scaling for the abscissa indicates substantial systematic uncertainty in the results due to the extrapolation. We also show results for an extrapolated one year terrestrial planet occurrence rate, ${\zeta }_{1.0}$, defined as the occurrence rate of a planet within 20% of the Earth'(s) radius and ${P}_{\mathrm{orb}}$ in the bottom panel of Figure 17, for the baseline (filled orange histogram) and alternative systematic effects discussed in Section 6.2 (solid curves). For clarity the effects of eccentricity and for "first planet only" are not displayed in Figure 17 as the results are within the statistical uncertainty of the extrapolated baseline result.

Figure 17.

Figure 17. Top: distribution for the PLDF evaluated at the ${R}_{{\rm{p}}}$ and ${P}_{\mathrm{orb}}$ of Earth, ${{\rm{\Gamma }}}_{\oplus }$, using the extrapolated baseline analysis (filled orange histogram). Solid curves represent results using alternative inputs with the same line types as in Figure 10. We also show two alternative analyses that directly measure, without extrapolation over ${P}_{\mathrm{orb}}$, ${{\rm{\Gamma }}}_{\oplus }$ from the Q1-Q16 Kepler planet candidate sample. The direct measurement of ${{\rm{\Gamma }}}_{\oplus }$ using the full long period planet candidate sample (orange dash curve) and the trimmed long period planet candidate sample (black dash curve) result in higher ${{\rm{\Gamma }}}_{\oplus }$ than the extrapolated PLDF results, respectively. Previous ${{\rm{\Gamma }}}_{\oplus }$ determinations from Youdin (2011), Dong & Zhu (2013), Petigura et al. (2013a), and Foreman-Mackey et al. (2014) are shown with vertical lines as labeled. Bottom: same as top, but for the one year terrestrial planet occurrence rate, ${\zeta }_{1.0}$.

Standard image High-resolution image

10.2. Directly Measured At One Year Orbital Period

Though the level of systematics present in our analysis are substantial toward longer periods, we repeat the PLDF parameter estimation in the 0.75 ≤ ${R}_{{\rm{p}}}$ ≤ 2.5 ${R}_{\oplus }$ and 300 ≤ ${P}_{\mathrm{orb}}$ ≤ 700 days parameter space. We show the average pipeline completeness for the long period parameter space in Figure 14. The planet candidates from the Q1-Q16 catalog of Mullally et al. (2015) are shown as orange points and are indicated by a binary flag in Table 3. The analysis yields a high ${F}_{0}=4.7{\pm }_{1.77}^{3.1}$ planets per star, significantly steeper ${\alpha }_{\mathrm{avg}}=-4.02\pm 0.8$ and shallower $\beta =0.92\pm 0.8$, where the errors are the statistical uncertainty alone.

We defer a more detailed analysis of the systematics to future work, but as a first step we consider culling the planet candidate sample of the five planet candidates along the 0.01 pipeline completeness contour shown in Figure 14. As discussed in Section 8, this cluster of five planet candidates represents a significant overabundance of planet candidates relative to our shorter period analysis. The overabundance relative to the shorter period extrapolation prediction is nearly erased (1.5σ overabundance), if the cluster of five planet candidates is removed from the sample. The KOIs belonging to the trimmed long period planet candidate sample are indicated by a binary flag in Table 3. The PLDF parameter estimation after removing these five planet candidates yields ${F}_{0}=1.7{\pm }_{0.6}^{1.2}$, ${\alpha }_{\mathrm{avg}}=-2.7\pm 1.1$, and $\beta =0.4\pm 0.8$. From this direct analysis we show the one year terrestrial planet occurrence rate in the bottom panel of Figure 17 for ${\zeta }_{1.0}=0.76{\pm }_{0.33}^{0.55}$ (orange dash line) and ${\zeta }_{1.0}=0.21{\pm }_{0.15}^{0.28}$ (black dash line), evaluated using the full and clipped Kepler planet candidate sample in the 300 ≤ ${P}_{\mathrm{orb}}$ ≤ 700 days parameter space, respectively. Tables 7 and 8 provide the parameters that maximize the likelihood and posterior percentiles of the parameters, respectively, for these two longer period analyses.

10.3. One Year Terrestrial Planet Occurrence Rate Summary

The wide range of occurrence rates obtained from this study is a consequence of the difficulties associated with extrapolating, small number statistics, and systematics (including false alarm reliabilities). This will impact refining ${\zeta }_{1.0}$ and HZ statistics in future studies of the Kepler data set. Compiling our results of the extrapolated and direct analyses, we find ${\zeta }_{1.0}=0.1$ with an allowed range of 0.01$\;\leqslant \;{\zeta }_{1.0}\;\leqslant \;$2. Dynamical simulations cannot rule out an upper limit of ${\zeta }_{1.0}\leqslant 2$ (Smith & Lissauer 2009). The mutual hill radii separation for a system of three ${M}_{{\rm{p}}}$ = 1 ${M}_{\oplus }$ planets within the ${\zeta }_{1.0}$ occurrence region of a G dwarf is ≳9 corresponding to $\sim {10}^{10}$ years stability (Smith & Lissauer 2009). However, for a lower mass K dwarf host and larger (${R}_{{\rm{p}}}$ = 1.2${R}_{\oplus }$) planets the mutual hill radii separation ∼7 for a triple planet system in the ${\zeta }_{1.0}$ zone would likely be unstable on a 109 years timescale.

For the PLDF value at the ${R}_{{\rm{p}}}$ and ${P}_{\mathrm{orb}}$ of Earth, we find ${{\rm{\Gamma }}}_{\oplus }=0.6$ with an acceptable range from 0.04$\;\leqslant \;{{\rm{\Gamma }}}_{\oplus }\;\leqslant \;$11.5. For comparison with previous studies, we show in the bottom panel of Figure 17 as vertical lines estimates of ${{\rm{\Gamma }}}_{\oplus }$ from Youdin (2011), Dong & Zhu (2013), Petigura et al. (2013a), and Foreman-Mackey et al. (2014) from left to right, respectively. In order to calculate results for ${{\rm{\Gamma }}}_{\oplus }$ from the Dong & Zhu (2013) study, we extrapolate their parametric power law model as given for the $1\;\leqslant \;$ ${R}_{{\rm{p}}}$ $\;\leqslant $ 2 ${R}_{\oplus }$ analysis from their Table 2. The central value for ${{\rm{\Gamma }}}_{\oplus }$ from Foreman-Mackey et al. (2014) is in tension with our analysis, but there is overlap in the upper tail of their posterior with our lower limits. The analysis of Foreman-Mackey et al. (2014) used the same inputs from Petigura et al. (2013a). However, Foreman-Mackey et al. (2014) determine that finding a steeper fall off of occurrence rates toward longer ${P}_{\mathrm{orb}}$ than Petigura et al. (2013a) and taking into account uncertainty on planet radii lead to a systematically lower value for ${{\rm{\Gamma }}}_{\oplus }$ than Petigura et al. (2013a) when starting from the same inputs. Further work is needed in order to isolate whether the differences between Foreman-Mackey et al. (2014) and our study results predominately from differing inputs or methodology. The other results for ${{\rm{\Gamma }}}_{\oplus }$ from the literature are consistent with our allowed range of ${{\rm{\Gamma }}}_{\oplus }$.

11. CONCLUSION

In this study we make use of the first Kepler pipeline run using nearly all (Q1-Q16) the extant Kepler data in order to measure the planet occurrence rate for $0.75\;\leqslant \;$ ${R}_{{\rm{p}}}$ $\;\leqslant $ 2.5 ${R}_{\oplus }$in the 50 $\;\leqslant $ ${P}_{\mathrm{orb}}$ $\leqslant $ 300 days range orbiting the GK dwarf Kepler sample. We employ the first characterization of the Kepler pipeline detection efficiency calibrated with transit injection and recovery tests (Christiansen et al. 2015), the Q1-Q16 Kepler planet candidate catalog (Mullally et al. 2015), and the KIC stellar parameter catalog revision of Huber et al. (2014).

We fit the observed planet candidate sample using a parametric PLDF model following the work of Youdin (2011) and explore alternative inputs into the calculation in order to study the systematic errors present. In general, we find higher occurrence rates for the mini-Neptune to terrestrial planet regime orbiting GK dwarfs and also larger uncertainties driven by the systematics than indicated by previous studies (Dong & Zhu 2013; Petigura et al. 2013a, 2013b; Mulders et al. 2015; Silburt et al. 2015). We determine that ${F}_{0}=0.77$ planets per GK dwarf in the Kepler sample have a planet within the 0.75 $\;\leqslant \;$ ${R}_{{\rm{p}}}$ $\;\leqslant $ 2.5 ${R}_{\oplus }$ and $50\;\leqslant $ ${P}_{\mathrm{orb}}$ $\leqslant $ 300 days regime with a systematic dominated allowable range of $0.28\leqslant {F}_{0}\;\leqslant $ 1.9. The power law exponent for the ${R}_{{\rm{p}}}$ dependence in the PLDF model has a best value ${\alpha }_{\mathrm{avg}}$ = −1.5 indicating an increasing planet occurrence toward small planets, but the allowed range, −3.25$\;\leqslant \;{\alpha }_{\mathrm{avg}}\;\leqslant \;$0.53, implies that we cannot definitively determine whether the occurrence increases or decreases toward the smallest planets. However, fitting a double power-law model over a wider range of $0.75\;\leqslant \;$ ${R}_{{\rm{p}}}$ $\;\leqslant $ 5.0 ${R}_{\oplus }$ does find conclusive evidence for a break in the occurrence rate at ${R}_{\mathrm{brk}}=3.3{\pm }_{-0.4}^{+0.2}$.

We estimate a one year terrestrial planet occurrence rate, ${\zeta }_{1.0}=0.1$, with an acceptable range 0.01$\;\leqslant \;{\zeta }_{1.0}\;\leqslant \;$2, by integrating within 20% of the ${R}_{{\rm{p}}}$ and ${P}_{\mathrm{orb}}$ of Earth. The narrower ${\zeta }_{1.0}$ parameter space is a subset of the G dwarf HZ, ${\eta }_{\oplus }$ (Kasting et al. 1993; Selsis et al. 2007; Kopparapu et al. 2013; Zsom et al. 2013). Thus, ${\zeta }_{1.0}$ places a lower limit on ${\eta }_{\oplus }$ for G dwarfs. ${\zeta }_{1.0}$, which depends upon the direct observable ${P}_{\mathrm{orb}}$, facilitates comparison with future Kepler occurrence rate studies and is readily compared to theoretical terrestrial planet formation models. We defer estimates of ${\eta }_{\oplus }$, which depends upon the indirect observable of stellar insolation and uncertain atmospheric evolution theory for terrestrial planets outside the Solar System, to future studies.

We also determine a 0.6 years terrestrial planet occurrence rate, ${\zeta }_{0.6}$ = 0.075, with an acceptable range 0.013$\;\leqslant \;{\zeta }_{0.6}\;\leqslant \;$0.30, by integrating within 20% of the ${R}_{{\rm{p}}}$=1 ${R}_{\oplus }$ and ${P}_{\mathrm{orb}}$ ${}_{\unicode{x2640} }$ corresponding to Venus-type planets for G dwarf hosts. For the K dwarfs of our sample (${T}_{\mathrm{eff}}$ ≤ 5200 K), ${P}_{\mathrm{orb}}$ = 0.6 years roughly corresponds to the Solar-Earth insolation flux level. Thus, ${\zeta }_{0.6}$ places a lower limit on ${\eta }_{\oplus }$ for K dwarfs.

The current results are dominated by systematic uncertainties. Unlike statistical uncertainties that are limited by the quantity and quality of data, these systematic uncertainties can be minimized with additional study. From our analysis, we identify the leading sources of systematics: instrumental false alarm contamination of the planet candidate sample, determining planet radii (independent of the degeneracy with ${R}_{\star }$), pipeline completeness, and stellar parameters. Additional work on third light contamination, orbital eccentricity, astrophysical false positives, and false negative rate of the planet vetting process is needed. All of these should be examined carefully before accepting a definitive value for ${\eta }_{\oplus }$.

We thank the referee for insightful suggestions which improved the manuscript. Funding for this Discovery mission is provided by NASA's Science Mission Directorate. This research has made use of the NASA Exoplanet Archive, which is operated by the California Institute of Technology, under contract with the National Aeronautics and Space Administration under the Exoplanet Exploration Program. D.H. acknowledges support by the Australian Research Council's Discovery Projects funding scheme (project number DE140101364) and support by the National Aeronautics and Space Administration under grant NNX14AB92G issued through the Kepler Participating Scientist Program.

Footnotes

  • 10 

    The rising slope toward smaller ${R}_{{\rm{p}}}$ of the approximating PLDF model from the Petigura et al. (2013a) results is visually inconsistent with the decreasing occurrence rate shown in Figure 3 of Petigura et al. (2013a), but the visual inconsistency arises due to our adoption of linear bin widths for this study and the adoption of logarithmic bin widths of Petigura et al. (2013a). Thus, $\alpha =0.0$ corresponds to a flat occurrence rate in the linear bin widths of this study and $\alpha =-1.0$ would correspond to a flat occurrence if we were to adopt logarithmic bin widths.

Please wait… references are loading.
10.1088/0004-637X/809/1/8