Bayesian model selection for electromagnetic kaon production on the nucleon

We present the results of a Bayesian analysis of a Regge model to describe the background contribution for K+ Lambda and K+ Sigma0 photoproduction. The model is based on the exchange of K+(494) and K*+(892) trajectories in the t-channel. We utilise the Bayesian evidence Z to determine the best model variant for each channel. The Bayesian evidence integrals were calculated using the Nested Sampling algorithm. For different prior widths, we find decisive Bayesian evidence (\Delta ln Z ~ 24) for a K+ Lambda photoproduction Regge model with a positive vector coupling and a negative tensor coupling constant for the K*+(892) trajectory, and a rotating phase factor for both trajectories. Using the chi^2 minimisation method, one could not draw this conclusion from the same dataset. For the K+ Sigma0 photoproduction Regge model, on the other hand, the difference between the evidence integrals is insufficient to pinpoint one model variant.


Introduction
To resolve the structure of the nucleon, many models based on effective degrees of freedom have been developed. One of the more popular ones is the constituent quark model (CQM), which presents the nucleon as a system of three constituent quarks [1]. The CQM, however, predicts far more resonances than confirmed by experiment. This may lead one to turn to other models that predict fewer resonances [2].
Alternatively, this discrepancy may arise because missing resonances do not couple to the channels commonly used for nucleon spectroscopy, such as the pion-nucleon (πN ) channel. This issue can be addressed by using an electromagnetic probe instead of a pion, and by examining decay channels other than πN . Hence, electromagnetic open strangeness or kaon-hyperon (KY ) production is suggested as a key process to seek for unobserved resonances. The analysis of this process faces other difficulties, such as a small cross section and a high threshold [3,4]. Unlike threshold one-pion photoproduction, which is dominated by the ∆(1232) resonance, the KY reaction channel opens in a resonance-rich energy region. The identification of those resonances constitutes a major challenge in modelling KY production. A second characteristic of this channel is the great importance of the non-resonant or background contributions. Hence, a correct determination of the background is crucial for a correct assessment of the resonance contributions.
The earliest studies of the KY production focused on the estimation of coupling constants within a single model variant [5,6]. Only when the emphasis came to lie on identifying missing resonances did the focus shift from parameter estimation to model comparison. The statistical tools, however, have not been adapted to this new objective. The least-squares method in particular has often been stretched beyond its limits, being used not only as an optimisation tool, but also as a model selection criterion. In this Letter, we present the Bayesian evidence computation, a method based on the principles of Bayesian inference, as a more robust and well-founded tool for model comparison.
The outline of this Letter is as follows. The next section introduces the Bayesian evidence and the Nested Sampling (NS) algorithm for evidence computation. The effectiveness of this algorithm is subsequently demonstrated for a Regge model in Section 3 and the results are discussed in Section 4. The conclusions and outlook are given in Section 5.

Bayesian analysis
Bayesian analysis is an established tool for model selection in astronomy and cosmology, and is gaining momentum in other fields [7,8]. The potential of this method in hadronic physics was recently demonstrated in a Bayesian analysis of pentaquark data by Ireland et al. [9], and for parameter estimation in effective field theories by Schindler et al. [10]. The quantity of interest for model comparison is the Bayesian evidence, which will be derived below.
One can straightforwardly derive the posterior probability of a model M , given a set of experimental data {d k }. Indeed, using Bayes' theorem, one can write this probability as The quantity P ({d k } |M ) is referred to as the marginal likelihood or the Bayesian evidence (Z). If the model M can be parametrised with a set α M , this probability can be written as an integral over all possible values of these parameters. This procedure, which is referred to as marginalisation, yields the following |∆ ln Z| < 1 Not worth more than a bare mention 1 < |∆ ln Z| < 2.5 Significant 2.5 < |∆ ln Z| < 5 Strong to very strong 5 < |∆ ln Z| Decisive expression for the Bayesian evidence: Eq. (4) states that the Bayesian evidence is the integral of the product of two distributions: (i) the probability of the dataset {d k }, given the set of parameters α M and the model M , and (ii) the probability of the set of parameters α M , given the model M . The first factor, P ({d k } |α M , M ), can be identified as the likelihood function, L(α M ). Any prior knowledge of the parameters' probability distribution before considering the data {d k } is contained in the second factor P (α M |M ). This distribution, which is indispensable in Bayesian statistics, is referred to as the prior distribution π(α M ). These two substitutions allow us to write the evidence in a more familiar form, in which the explicit dependence on {d k } and M is omitted for brevity. It is clear that the actual quantity of interest for model comparison is the relative probability of a model M A versus a model M B , given the available experimental data {d k }. Writing down the probability ratios and subsequently applying Bayes' theorem, one can see how the evidence emerges from this expression: Any prior preference for one model over the other can be incorporated by the factor P (M A ) P (M B ) . As we have no prior preference for one of the models, we can take this value to be one, hence reducing the comparison of two models to the calculation of the evidence ratio, which is often referred to as the Bayes factor. The direct relation between Z and a model's probability elucidates the term "evidence": if a model has a higher value of Z, there is more evidence in favour of this model. In accordance with our intuitive notions, evidence is not only based on experimental data, but also on theoretical restrictions that are incorporated through the prior distribution. The natural logarithm of the evidence ratio can be interpreted qualitatively with the aid of Jeffreys' scale, listed in Table 1.
The analytical form of the likelihood function L(α M ) is rarely known and a normal distribution is often used to approximate it. Indeed, data points are independent and are usually reported to have normally distributed errors. This gives rise to a χ 2 -distribution for the quantity defined as where σ i is the error bar of data point d i , and f i (α M ) is the corresponding model prediction. The χ 2 -distribution can be approximated by a normal distribution if the number of degrees of freedom k is sufficiently large. This value is defined as k = N − dim(α M ), which approximates the number of data points for a sufficiently large dataset and low number of free parameters. The expression for this limiting distribution is [13]: In an analysis based on χ 2 minimisation -which is an approximation to a maximum likelihood fit -only the maximum value of L(α M ) is considered for model selection. A Bayesian approach is more comprehensive, as it evaluates the model over its entire parameter space, and takes the prior distribution into account. This distinction is illustrated in Fig. 1.
Determining the evidence of a model is not a straightforward task, because it requires the calculation of multidimensional integrals of the type (5). Most often, analytical simplifications are not possible and it is key to adopt numerical integration techniques that are optimised for the problem at hand.
Nested Sampling (NS) is a novel integration technique for computing Bayesian evidence, developed by Skilling [14,15]. This technique significantly reduces the computational cost of the integral over the model's parameter space by transforming it into a one-dimensional integral over the prior mass dX = π(α M )dα M . This is accomplished by regarding the prior mass as a monotonically decreasing function of the likelihood, λ: Assuming a normalised prior, we can hence write the evidence as the following integral over L(X), the inverse of X(λ),  Figure 2: The Bayesian evidence Z as an integral of the likelihood L over the prior mass X.
The lower plots illustrate that for a uniform prior, the increasing prior masses X(L i ) are equal to the area or mass { α | L(α) > L i } inside nested iso-likelihood contours. The thick arrows indicate the order of integration.
as illustrated in Fig. 2. This integral can be approximated by a sum of likelihoods, weighted with their respective prior mass contribution ∆X. In NS, this sum is computed using Markov chain Monte Carlo methods to sample from the parameter space with the constraint L i+1 > L i . The prior mass of a sample of N points with likelihood greater than L i can be estimated by a factor exp − i N . This can be derived from the probability distribution of the largest value t N of a set of N uniformly distributed points in the interval [0, 1] (representing the total prior weight): Indeed, the estimation value for log t N is − 1 N , which after i iterations amounts to a sum of − i N . Besides efficiently calculating the evidence, NS can also be used for determining the posterior distribution and, more importantly, for parameter estimation. A more detailed account of the technique as well as implementation examples can be found in Skilling's work [15].
In the following section, we will present the results of an application of the NS technique to a Regge model for K + Λ and K + Σ 0 production.

Bayesian analysis of a Regge Model
Regge phenomenology is a powerful tool to economically describe reactions at high energies [16]. A Regge model based on the exchange of charged meson Regge trajectories was applied to KY photoproduction by Vanderhaeghen, Guidal and Laget [17]. They found the exchange of two t-channel trajectories sufficient to successfully describe the cross sections as well as polarisation observables in photoproduction of kaons above the resonance region [18].
In the Regge-plus-resonance (RPR) description of electromagnetic KY production, developed by Corthals et al., the Regge background is complemented with s-channel nucleon (N * ) resonances. This hybrid approach ensures a correct high-energy behaviour as well as an improved description of the resonance region [4,19,20].
In the case of K + Λ and K + Σ 0 photoproduction, the Regge background can be modelled with the exchange of the K + (494) and K * + (892) trajectories. This amplitude is derived from the t-channel Feynman amplitude by replacing the Feynman propagator by the respective Regge propagator [4]: The kaon trajectories are given by [4]: and the scale factor s 0 = 1 GeV −2 . The so-called sign factor in the Regge propagator is reduced to a phase factor of either 1 (constant phase) or e −iπα(t) (rotating phase) due to the strong degeneracy of the trajectories. This assumption is inspired by the structureless high-energy differential cross-section. These phases cannot be determined on theoretical grounds. The possibility of the K + and K * + trajectories having a constant phase is excluded, as this combination gives rise to a photon asymmetry Σ = 0, which disagrees with the data. The remaining three possibilities, namely rotating K + /rotating K * + , rotating K + /constant K * + , and constant K + /rotating K * + , will be abbreviated as rot./rot., rot./cst. and cst./rot., respectively. Apart from the three choices with regard to the phases, the model has three continuous parameters. These are the strong coupling constant g K + Y p of the K + trajectory and the tensor and vector couplings of the K * + trajectory, Here, κ K + K * + is the transition magnetic moment for K * + → γK + decay. In the following analysis, the Regge background will be constrained by the high-energy data, in accordance with the RPR-approach. In this energy region, there is a set of 72 data points for the K + Λ channel, comprising 56 differential cross section data points ( dσ dt ) [21], 9 photon asymmetries (Σ) [22] and 7 recoil asymmetries (P ) [23]. The database for K + Σ 0 photoproduction at high energies is even smaller, with only 48 differential cross section data points [21] and 9 photon asymmetries (Σ) [22]. Optimisation of the above-mentioned parameters against these data reveals that there are several model variants with comparable χ 2 values [4]. For example, in both K + Λ and K + Σ 0 photoproduction, the signs of G v K * + and G t K * + cannot be established conclusively using the χ 2 -method [4]. The sign and phase ambiguities may not seem important for the Regge model itself. For the RPR model, however, an exact determination of the background parameters is of major importance, because it affects the extraction of the resonance information.
Ref. [20] shows that by comparing the RPR model variants for electromagnetic K + Λ production to photo-and electroproduction data from the resonance region, all but one Regge background model can be eliminated. We will show that the Bayesian evidence can be used to distinguish among the twelve different models that result from the possible sign and phase combinations, using only the high-energy dataset.

Results
The prior π(α) is chosen to be a uniform distribution. Note that under conditions of highly concentrated likelihood, for which the prior distribution varies mildly, the likelihood dominates the shape of the posterior distribution [14]. Accordingly, the evidence calculations will not be largely affected by the choice with regard to the prior distribution. This means that a uniform distribution will lead to results that are similar to those obtained with a Gaussian or any other well-behaved distribution. We show that the bulk of the likelihood is indeed concentrated at parameter values below 100 by demonstrating that evidence calculations for prior widths equal to 100 and much greater than 100 yield the same results.
Prior information exists for the coupling constants of the K + Y 0 p vertices. Indeed, the following relations follow from SU(3) symmetry [24]: where α = F/(F + D) quantifies the ratio of F-type to D-type coupling and g πN N is the pion-nucleon coupling constant. It is commonly assumed that SU(3) symmetry can be broken at the 20% level [25]. Inserting the experimentally determined values α = 0.644 and g 2 πN N / √ 4π = 14.3 yields the following prior ranges for the coupling constants of the K + Y 0 p vertices [24,25]: There are no reliable theoretical constraints for the K * + Y 0 p vertices [26]. We therefore choose a uniform distribution between zero and a value much larger than the natural value of one. To justify this choice for the prior interval, a sensitivity analysis of the evidence ratios is performed by repeating the calculations for different prior ranges.
The results of these calculations for the K + Λ and K + Σ 0 production models are displayed in Table 2 and 3 respectively. These tables list the computed values of ∆ ln Z = ln Z − ln Z max , using a prior width of respectively 100, 1000 and 10000. Changing the prior width from 100 to 1000 results in a difference of less than 5% in the computed values. For a prior width of 10000, the error increases significantly due to a reduced sampling efficiency. More importantly, however, the ranking of the models is not significantly affected. Clearly, the effect of the prior width on the relative probabilities of the models is negligible, provided that it is large enough to contain the area where most of the likelihood is concentrated. Table 2: Logarithms of the evidence ratios (∆ ln Z ≡ ln (Z/Zmax)) for the twelve model variants resulting from phase and sign ambiguities in the two-trajectory Regge model for K + Λ photoproduction. The results are listed in order of decreasing probability for a prior width of 100.  Table 3: Logarithms of the evidence ratios (∆ ln Z ≡ ln (Z/Zmax)) for the twelve model variants resulting from phase and sign ambiguities in the two-trajectory Regge model for K + Σ 0 photoproduction. The results are listed in order of decreasing probability for a prior width of 100. Furthermore, the comparison with Jeffreys' scale (Table 1) indicates that the p(γ, K + )Λ data exhibit decisive evidence for the model variant with a positive vector and a negative tensor coupling constant, and a rotating phase for both trajectories. Indeed, the difference in ln Z with the second-best model is around 24, amply exceeding the value of 5 required for a decisive statement. This result resolves the sign and phase ambiguity for K + Λ photoproduction, which previously could not be achieved using high-energy data alone [4]. Moreover, the result from this Bayesian analysis is consistent with the previous analysis of this particular model, but it did not require an additional analysis with data from the resonance region (for which E γ lab 3 GeV). In other words, the Nested Sampling method requires less experimental data to reach the same conclusion as the χ 2 -analysis which used a much larger set of K + Λ photoproduction data.  Apart from the relative probability of the different models, the Nested Sampling technique also provides us with an estimation value of the different parameters. In our best model, these values are: G v K * + = 12.47 ± 0.14 G t K * + = −32.19 ± 0.50.
Additional calculations demonstrate that the results do not change significantly when the g K + Y 0 p coupling constant is allowed to deviate up to 40% from SU(3) predictions. For example, the values of ∆ ln Z for the second and third K + Λ production models are −24.21 ± 0.73 and −77.09 ± 0.70 respectively, agreeing with the values found for 20% SU(3) symmetry breaking. The estimation values for the coupling constants are not affected. We conclude that the high-energy p(γ, K + )Λ data support a coupling constant g K + Λp compatible with a level of SU(3) symmetry breaking of at most 20%. The estimation value for g K + Λp deviates 15% from the SU(3) prediction. Fig. 3 shows the log-likelihood of the parameter G v K * + , integrated over the remaining two parameters, using a uniform prior between -100 and 100. The likelihood is determined by the high-energy K + Λ photoproduction data only.  The greater width and height of the peak of the rot./rot. model variant indicate that it should have a greater evidence than the alternative model.
The results for K + Σ 0 production, listed in Table 3, are not as clear-cut as those for p(γ, K + )Λ. The difference in ln Z of the model variants is next to negligible. This was to be expected, as a smaller dataset provides fewer restraints on the models' parameters.
Even when the extensive set of resonance-region data is taken into account, the experimental data from the proton target alone does not allow us to single out one background model for the K + Σ 0 channel [20]. However, a recent analysis of the corresponding reaction on the neutron, n(γ, K + )Σ − , was able to resolve the remaining ambiguity [27]. The models' parameters were converted from the proton to the neutron channel using isospin considerations.
The log-likelihood of the parameter G v K * + for K + Σ 0 production is shown in Fig. 4. Note that despite boasting the maximal likelihood value, the rot./rot model has a lower evidence than the rot./cst. model.

Conclusions and outlook
Bayesian inference provides us with a promising tool for model comparison. We have demonstrated this by using the Nested Sampling algorithm to compute the Bayesian evidence for different model variants of a Regge model for KY photoproduction. The results of this calculation indicate that there is decisive evidence for a K + Λ production model with G v K * + > 0, G t K * + < 0, and a rotating phase factor for both trajectories. This conclusion could not be drawn from the high-energy data set by means of the method of χ 2 minimisation.
For K + Σ 0 production, the differences in evidence are too small to draw a decisive conclusion, and supplementary data is required to fully determine the background model for this channel.
The Nested Sampling method has many applications, both for the RPR model and for other research. One of these applications is the accurate estimation of model parameters as well as the elimination of nuisance parameters. More importantly, however, this method may provide us with a means to address the missing-resonance problem by calculating the probability of individual resonance contributions in a Bayesian framework. This is an approach we intend to explore in the near future.