OPTIMIZING HIGGS BOSON CP MEASUREMENT IN H → ττ DECAY WITH ML TECHNIQUES ∗



Introduction
As the Large Hadron Collider (LHC) has started to probe higher energy collisions, a phase of precision Higgs boson measurements begins. The measurements of Higgs boson properties performed over the last few years (and in the years to come) will be crucial for laying the foundations of this previously unreached sector. Measurements of couplings and CP properties of the Higgs boson will enable more constraints to be placed on (or perhaps reveal indications of) new physics. One such measurement is the determination of the fundamental CP state of the Higgs boson. Whilst the Higgs boson is a scalar in the Standard Model (SM), more complex states are predicted by several other theories.
Several beyond-SM (BSM) models, such as supersymmetric models and, more generally, two-Higgs-doublet models (2HDM), predict or require a spectrum of Higgs bosons [1][2][3]. In particular, these models have a CP-odd Higgs boson in the particle spectrum. If this Higgs boson is otherwise degenerate with the SM Higgs boson, the effect would not necessarily be apparent in a simple measurement of the couplings. Thus, more focused studies are required in order to derive the full picture.
The measurement of the Higgs CP has been performed with the use of several bosonic decay modes [4,5]; results indicate that the scalar hypothesis is strongly favoured over a pseudoscalar hypothesis. Whilst in principle this is a fundamental result, it must be noted that these decay modes are not sensitive to a pseudoscalar (CP-odd) Higgs boson component at tree level (at least in the SM). Conversely, fermionic modes, which are currently less established than the bosonic modes, can couple directly to a pseudoscalar Higgs boson via the Yukawa interaction. Of these fermionic decay modes, the τ-decay mode offers the most viable measurement of the coupling [6] and the CP state. The focus of this paper is to explore the possibility of utilising a neural network approach to optimise the measurement of the CP state of the Higgs boson via decays to τ pairs.

Baseline approach
Consider a mixing of CP-even and CP-odd Higgs bosons, where φ_τ is the mixing angle between the CP-even and CP-odd states and g_τ is the total coupling strength. The information relating to the mixing angle is subsequently encoded into the transverse spin components of the τ leptons, where R is a rotation in the x-y (transverse) plane that depends on φ_τ, and s_∥^{τ±} and s_⊥^{τ±} are the longitudinal and transverse spin components respectively [7].
Clearly, only the transverse spin component of the τ is relevant to the measurement (as there is no dependence on φ_τ for the z component). Correlations in the transverse spin components of the two τ leptons manifest in the distributions of the τ-decay products.
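For orientation, the coupling and the spin correlation referred to in this section are commonly written in the literature in the form below. This is a restatement under the assumption that the conventions of [7] apply (overall sign and normalisation conventions vary between references), so it should be checked against [7] before use.

```latex
% Assumed standard form of the mixed scalar-pseudoscalar Yukawa coupling
\mathcal{L}_Y = g_\tau \, \bar{\tau} \left( \cos\phi_\tau + i \gamma_5 \sin\phi_\tau \right) \tau \, h

% Assumed form of the spin-correlation dependence of the decay width
\Gamma(h_{\mathrm{mix}} \to \tau^+ \tau^-) \sim
  1 - s_\parallel^{\tau^+} s_\parallel^{\tau^-}
    + s_\perp^{\tau^+} \, R(2\phi_\tau) \, s_\perp^{\tau^-}

% Components of the rotation in the transverse plane
R_{xx} = R_{yy} = \cos 2\phi_\tau , \qquad
R_{xy} = -R_{yx} = \sin 2\phi_\tau
```

In this form the longitudinal term carries no dependence on φ_τ, and the mixing angle enters only through the rotation acting on the transverse components.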

Base observable
The development of a CP-sensitive observable for H → ττ decays is well established in [7][8][9]. Ultimately, the CP-sensitive variable can be defined as the acoplanarity angle between the planes spanned by the visible decay products of the τ+ and the τ−. This angle is denoted φ*_CP [9] or φ* [7,8] in the literature (the two are used interchangeably here). This observable must be separated according to a discriminating variable (denoted y in the literature), which is described later. The variable y is a manifestation of the direction of the τ-decay products in the τ rest frame, which is necessary in studying properties of τ-decay matrix elements. When events are separated based on the sign of the product of the two y values (one calculated for each τ), the resultant distribution of the observable has a sinusoidal shape. As one varies the mixing angle, the sinusoid shifts while retaining the same amplitude. Thus, the core idea of this measurement is to take the acoplanarity distribution, fit a sinusoidal function and extract the CP mixing angle from the measured shift.
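As an illustration of the procedure just described, the following Python sketch computes an acoplanarity angle from the three-momenta of the visible decay products and estimates the shift of a sinusoidal distribution. It rests on several assumptions that are not taken from [7][8][9]: the momenta are supposed to be given already in a common rest frame of the visible decay products, the sign convention used to extend the angle to [0, 2π) is hypothetical, and the fit is replaced by a simpler first-harmonic moment estimate. All function names are illustrative.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Scalar product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def acoplanarity(charged_plus, neutral_plus, charged_minus, neutral_minus):
    """Angle in [0, 2*pi) between the two decay planes.

    Each plane is spanned by the charged and neutral pion momenta of one tau.
    Assumption: inputs are already boosted to a common rest frame.
    """
    n_plus = cross(charged_plus, neutral_plus)
    n_minus = cross(charged_minus, neutral_minus)
    norm = math.sqrt(dot(n_plus, n_plus) * dot(n_minus, n_minus))
    if norm == 0.0:
        raise ValueError("degenerate decay plane")
    cos_phi = max(-1.0, min(1.0, dot(n_plus, n_minus) / norm))
    phi = math.acos(cos_phi)
    # Hypothetical orientation convention to cover the full [0, 2*pi) range.
    if dot(charged_minus, n_plus) > 0.0:
        phi = 2.0 * math.pi - phi
    return phi

def phase_shift(angles, weights=None):
    """Shift delta of a distribution proportional to 1 + A*cos(phi - delta), A > 0.

    Uses first-harmonic moments instead of an explicit sinusoidal fit.
    """
    if weights is None:
        weights = [1.0] * len(angles)
    c = sum(w * math.cos(p) for p, w in zip(angles, weights))
    s = sum(w * math.sin(p) for p, w in zip(angles, weights))
    return math.atan2(s, c) % (2.0 * math.pi)
```

The moment estimate is chosen only because it needs no fitting library; a binned fit of a sinusoid, as described in the text, would be applied to the same angles after splitting the events by the sign of the product of the two y values.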

Hadronic τ-decay modes
What has been discussed in this section thus far has not accounted for any experimental concerns. The hadronic final states of the τ are very varied but can be classified mostly in terms of the number of charged and neutral pions. The most common decays (and the corresponding branching ratios) are to:
- One charged and no neutral pions (direct decays τ → π± ν, BR ∼ 10.8%).
- One charged and one neutral pion (decays via the ρ resonance, BR ∼ 25.5%).
- Three charged and no neutral pions (decays via the a_1 resonance, BR ∼ 9.3%).
- One charged and two neutral pions (decays via the a_1 resonance, BR ∼ 9.3%).
Despite the direct decays being fairly simple, the reconstruction of a CP-sensitive observable must involve the use of the impact parameter [9] or a proper reconstruction of the neutrino momenta, a much harder task. The use of the impact parameter is often hampered by poor detector resolution and will not be considered in this study. Note that decays to leptonic final states (∼ 35% of the τ branching ratio) are also possible and are treated similarly to these direct decays.
The decays via ρ resonances are the most common and are fairly well reconstructed at detectors using a particle flow approach [10]. The limiting factor is the resolution of the reconstructed neutral pion, which is the most difficult part of the τ reconstruction. For these decays, y is defined as [7]
y = (E_π± − E_π0) / (E_π± + E_π0),
where E_π is the energy of the respective pion. The variable represents cos(θ) of the angle between the direction of the π± in the rest frame of the intermediate ρ resonance and the direction of the boost from this frame to the Higgs rest frame. Its use is necessary because of the dynamics of the τ → π± π0 ν decay. With its help, the CP sensitivity of the acoplanarity angle distribution can be achieved [7].
Decays to the next most common modes are via the a_1 resonance. These decays result in three-pion final states. Due to the reconstruction efficiency of the neutral pions, the decays with two neutral pions have a largely reduced sample; this study will only consider the a_1 decays to three charged pions. Due to the larger mass, this mode decays via a cascade (first a_1 → π± ρ0 and then ρ0 → π± π∓). As a result, these decays are difficult to handle due to the various interference effects between the pions. If one considers all possibilities, 16 acoplanarity angles can be reconstructed between two a_1 decays [11]. These decays suffer from larger QCD jet backgrounds.
For τ decays to a_1 resonances, the definition of y is modified due to the sizeable ρ mass and depends additionally on m_π, m_a1 and m_ρ, the masses of the individual pion, a_1 and ρ resonances. This similarly represents cos(θ) for the decay products of the intermediate resonance (a_1) in its rest frame. Examples of the acoplanarity angles in various decay modes are presented in Fig. 1.
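The separating variables discussed above can be sketched in a few lines of Python. The ρ form is the energy asymmetry between the charged and neutral pion; the a_1 form, with its mass-dependent correction, is an assumption restated from the literature rather than from the text here and should be checked against [7] and [11]. Function names and the numerical values in the usage note are illustrative only.

```python
def y_rho(e_charged, e_neutral):
    """Energy asymmetry between the charged and neutral pion of a rho decay."""
    return (e_charged - e_neutral) / (e_charged + e_neutral)

def y_a1(e_rho0, e_pi, m_a1, m_rho0, m_pi):
    """Assumed form of y for a_1 -> pi rho0: energy asymmetry minus a mass term.

    The mass term accounts for the sizeable rho0 mass (assumption, see lead-in).
    """
    energy_term = (e_rho0 - e_pi) / (e_rho0 + e_pi)
    mass_term = (m_a1 ** 2 - m_pi ** 2 + m_rho0 ** 2) / (2.0 * m_a1 ** 2)
    return energy_term - mass_term

def same_sign(y_plus, y_minus):
    """True if the product of the two y values (one per tau) is positive."""
    return y_plus * y_minus > 0.0
```

For example, `y_rho(3.0, 1.0)` gives 0.5, and events would be assigned to one of two acoplanarity histograms according to `same_sign(y_plus, y_minus)`.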
Evidently, modes containing a_1 decays have a reduced sensitivity compared to ρ-decay modes. What must be kept in mind is that only one reconstructed angle is presented here. In the case of a_1 decays, information is clearly lost when taking this simplistic "1D approach" (in which we only consider the φ*_CP angles with events separated using y). The purpose of this study is largely to improve the sensitivity of the measurement of the Higgs CP state in τ decays by extending the usable fraction of H → ττ events from 6.5% (currently accessible by the method described in the previous section, with decays to intermediate ρ± resonances) to 11.9% (including decays to three charged pions) [11]. To do this, more sophisticated tools are required in order to retain as much sensitivity as possible.

Fig. 1. Acoplanarity angles in various decay modes [11]. Decays involving a_1 lose sensitivity (in the separation between scalar and pseudoscalar) as there are more possible acoplanarity angles available.

Neural network approach
It is evident that the difficulty lies in the dimensionality of the problem. A total of 16 acoplanarity angles and 8 separating variables, with all the interference effects implied, makes it difficult to take the same 1D approach as previously outlined. The complexity of the three-prong decays calls for a more comprehensive approach to the search.
A neural network approach to the measurement was suggested in [11]. This approach is an attempt to combine features, which are expected to be sensitive to the CP state of the Higgs boson, into a 1D classifier which separates the scalar and pseudoscalar hypotheses. This section will summarise results presented in [11] and detail further results.

Base neural network
Details of the neural network setup will not be presented here. A description is available in [11] and the key results are summarised in Table 3 of [11].
The key figure of merit is the area under the Receiver Operating Characteristic (ROC) curve. This is a measure of the separation which can be achieved with the classifier output [12]. Note that there is a hypothetical upper limit on the area under the ROC curve for this approach, which is the same across all decay modes and equal to 0.782.
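The figure of merit can be made concrete with a short Python sketch. It uses the fact that the area under the ROC curve equals the probability that a randomly chosen event of one class receives a higher classifier score than a randomly chosen event of the other class, with ties counted as one half. The function name and the assumption that higher scores indicate the scalar hypothesis are illustrative and not taken from [11].

```python
def roc_auc(scores_scalar, scores_pseudoscalar):
    """Area under the ROC curve from two lists of classifier outputs.

    Computed as the fraction of (scalar, pseudoscalar) pairs in which the
    scalar event scores higher; ties count one half. Quadratic in the sample
    size, which is adequate for a sketch but slow for large samples.
    """
    if not scores_scalar or not scores_pseudoscalar:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for s in scores_scalar:
        for p in scores_pseudoscalar:
            if s > p:
                wins += 1.0
            elif s == p:
                wins += 0.5
    return wins / (len(scores_scalar) * len(scores_pseudoscalar))
```

A value of 0.5 corresponds to no separation and 1.0 to perfect separation, which is the scale on which the quoted limit of 0.782 should be read.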
Results derived in [11] demonstrate a number of important aspects of the sensitivity of features and decays. The variation of the improvement across decay modes is indicative of the issue at heart: more complex decays have a reduced sensitivity. The key result one confirms by examining the combinations of input features is that the neural network is apparently able to utilise the four-vectors in such a way as to encompass the sensitivity achieved using high-level features such as the y separating variables and the masses. Also of note is that the masses seem relevant only in the more complex cases, possibly indicating that the intermediate resonance mass is crucial to the sensitivity of a_1 decays due to the resonance cascades of these modes.

Improvements and robustness
The results summarised in the previous section are built upon in this section. Improvements to the network are possible through the previously unused neutrino information. Questions of robustness of the NN approach are addressed through the application of harder selections and detector smearing.

Extension with use of E_T^miss
The lack of separation between scalar and pseudoscalar hypotheses in a_1-a_1 decays is of some concern in relation to the purpose of this study. Whilst the neural network may outperform the standard base approach, the aim should be to use all the information available. If one studies the polarimetric vector for a τ (see [8]), a term proportional to the neutrino four-momentum is present yet not utilised. This is due to the difficulty in properly reconstructing the neutrino four-momenta from an under-constrained system. It is possible to apply approximations in order to constrain the system. However, it is not clear whether these kinematic equations would necessarily yield real solutions that could serve as inputs for the neural network. Instead, as a first attempt, the missing transverse energy (E_T^miss, defined here simply as the sum of the neutrino transverse momenta) was used as an input feature. The results are detailed in Table I (relevant columns are denoted with a B).
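The definition of the missing transverse energy used here can be sketched as follows. The sketch assumes that "sum of neutrino transverse momenta" means the vector sum in the transverse plane, and that the truth-level neutrino momenta are available as (px, py) pairs in GeV; the function name is illustrative.

```python
import math

def missing_et(neutrinos):
    """Truth-level missing transverse energy from neutrino momenta.

    neutrinos: iterable of (px, py) pairs in GeV, one per neutrino.
    Returns the x and y components and the magnitude of the vector sum
    (assumption: the sum is taken vectorially in the transverse plane).
    """
    momenta = list(neutrinos)
    mex = sum(px for px, _ in momenta)
    mey = sum(py for _, py in momenta)
    return mex, mey, math.hypot(mex, mey)
```

For instance, `missing_et([(3.0, 0.0), (0.0, 4.0)])` returns components 3.0 and 4.0 GeV with a magnitude of 5.0 GeV.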

TABLE I
Area under the ROC curve for combinations of input features, decay modes and selections. A comparison is made using NNs trained with events where p_T(τ) > 20 GeV. The area under the ROC curve is taken from the application of these NNs to events with the same selections (denoted B = "basic selections") and also to events with the selections p_T(Higgs) > 100 GeV, p_T(τ) > 40 GeV and |η(τ)| < 2.5 (denoted H = "hard selections"). For definitions of the features, see [11]. Note that the results of "true classification" are based on [11].

Clearly, the introduction of E_T^miss is a boon for the prospects of improving the separation. Across all decay modes, there is a substantial improvement in the separation power for every set of input features. For the ρ-ρ mode, the improvement approaches the upper limit discussed in the previous section.

Application of harder selections
The selections used to generate the results mentioned in the previous section are quite loose (see [11] for details) and will be denoted B ("basic selections"). The tests are repeated with a tighter set of selections which are kinematically closer to those used in previous searches for H → ττ decays [13]. The following are applied as the tighter (or "hard") selection criteria:
- p_T(τ) > 40 GeV and |η(τ)| < 2.5,
- p_T(Higgs) > 100 GeV.
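The two sets of selections can be expressed as simple event filters. The sketch below assumes a hypothetical event layout (a dictionary with a list of τ candidates and a Higgs transverse momentum, all in GeV) and assumes that the τ requirements apply to both τ leptons; the basic selection uses only the p_T(τ) > 20 GeV requirement quoted for the training sample.

```python
def passes_basic(event):
    """Basic (B) selection: every tau has pT above 20 GeV."""
    return all(tau["pt"] > 20.0 for tau in event["taus"])

def passes_hard(event):
    """Hard (H) selection: tighter tau kinematics and a boosted Higgs."""
    taus_ok = all(tau["pt"] > 40.0 and abs(tau["eta"]) < 2.5
                  for tau in event["taus"])
    return taus_ok and event["higgs_pt"] > 100.0

# Hypothetical example event, used only to show the expected layout.
example_event = {
    "higgs_pt": 120.0,
    "taus": [{"pt": 55.0, "eta": 1.1}, {"pt": 42.0, "eta": -2.0}],
}
```

With this layout, `passes_hard(example_event)` is true, while lowering either τ p_T below 40 GeV would leave the event in the basic sample only.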

The results are summarised in Table I (relevant columns denoted with an H).
There is a modest loss in sensitivity; however, the results are largely compatible with the previous ones. This indicates some degree of dependence of the sensitivity on the selection cuts, which constrain the phase space of the τ decays, but this is largely expected if one refers to [9].

Detector smearing
The demonstrated improvement is very promising; however, it needs to be tested with respect to experimental conditions (namely the resolution of the detector). For this study, the generated truth four-vectors are smeared with a simple Gaussian according to the resolutions for charged particles, neutral pions and E_T^miss as reconstructed by the ATLAS experiment at the LHC. Obviously, these do not accurately reflect the true state of the reconstructed particles but should be a reasonable proxy for the true reconstruction. These resolutions are detailed in [10,14] and [15] respectively.
The four-momenta of charged pions are smeared with Gaussians of resolution:
- θ: 0.88 mrad,
- φ: 0.147 mrad.
The four-momenta of neutral pions are smeared with Gaussians of resolution:
- η: 0.0056,
- φ: 0.012 rad,
- p_T: 16% of the true p_T.
The E_T^miss (which is simply taken as the sum of the neutrino p_T) was smeared in both the x and y (transverse) directions by 2 GeV. As the E_T^miss resolution varies depending on the sum of all jet p_T, which is difficult to quantify concisely, a simple fixed value was chosen. The value of 2 GeV was chosen as it corresponds to the resolution for the minimum sum of jet p_T that would be used in the selections [15] (as a boost of 100 GeV is required of the Higgs candidate, following [13], the transverse momenta of the recoiling jets must sum to at least 100 GeV).
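The smearing procedure can be sketched with the Python standard library. The sketch covers only the neutral pions and the E_T^miss, using the resolutions quoted in the text as default arguments; it assumes independent Gaussian smearing of each quantity, and the function names are illustrative. A seeded generator is passed in explicitly so that results are reproducible.

```python
import math
import random

def smear_neutral_pion(pt, eta, phi, rng,
                       sigma_eta=0.0056, sigma_phi=0.012, rel_sigma_pt=0.16):
    """Gaussian smearing of a neutral pion in (pT, eta, phi).

    pT is smeared by a fraction of its true value; eta and phi by fixed widths.
    """
    return (rng.gauss(pt, rel_sigma_pt * pt),
            rng.gauss(eta, sigma_eta),
            rng.gauss(phi, sigma_phi))

def smear_met(mex, mey, rng, sigma=2.0):
    """Gaussian smearing of the missing-ET components (GeV) in x and y."""
    return rng.gauss(mex, sigma), rng.gauss(mey, sigma)

def met_magnitude(mex, mey):
    """Magnitude of the missing transverse energy from its components."""
    return math.hypot(mex, mey)

# Usage: one seeded generator shared by all smearing calls.
rng = random.Random(12345)
smeared_pion = smear_neutral_pion(25.0, 0.4, 1.2, rng)
smeared_met = smear_met(30.0, -10.0, rng)
```

Passing the widths as arguments makes it straightforward to repeat the test with other resolutions, for example to study how quickly the gain from E_T^miss degrades.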
Neural networks were trained on the generated Monte Carlo (MC) simulation and then evaluated on smeared MC. Results are detailed in Table II.

TABLE II
Area under the ROC curve for various combinations of input features, decay modes and type of MC. A comparison is made with NNs trained with unsmeared MC events where p_T(τ) > 20 GeV. The area under the ROC curve is taken from the application of the NN to unsmeared (denoted U) and smeared (denoted S) MC. For definitions of the features, see [11]. Note that the results of "true classification" are based on [11].

Columns: Features, ρρ (U), ρρ (S), a_1ρ (U), a_1ρ (S), a_1a_1 (U), a_1a_1 (S)

It is quite evident that the large gains provided by the addition of E_T^miss are lost when the detector resolution is taken into account. Further studies are therefore needed in order to recover the lost sensitivity.

Systematic considerations for τ-decay modelling
The modelling of τ decays to hadronic final states is not typically done purely analytically. Difficulties lie in the nature of medium-energy QCD interactions. Form factors are used to parameterise the decays, and measurements made at low-energy machines (e.g. CLEO, BaBar) are then used to determine the parameters experimentally. As this study utilises MC simulations with a particular parameterisation of the τ decay, it is prudent to investigate how variations in the parameterisation of the decay systematically affect the effectiveness of the neural network. In particular, the variations affect the modelling of the complex a_1 resonance decays through the vector currents.
Variations considered:
- Standard CLEO (Std) parameterisation as used by default in the Tauola package [16].
The mass distributions of two or three pions, with various parameterisations, are presented in Figs. 2 (a) and 2 (b) respectively. Tests of how these variations affect the NN approach to CP-sensitive observables will be detailed in future works.

Future developments
Accounting for detector effects in the neural network approach demonstrates the level of precision this measurement will require. Clearly, improvements in the resolution of the reconstruction, or the development of more robust inputs, are needed in order to reap the most from this NN approach. In principle, the impact parameter can be useful in providing a constraint on the E_T^miss, but may itself suffer from the detector resolution.
For refined use of τ-decay dynamics, effects due to τ-decay modelling have to be taken into account. This is an important topic even though, from Figs. 2 (a) and 2 (b), one could expect it not to be of great importance. It may become important if τ-decay dynamics are used to partly correct for the loss due to smearing. The response of CP-sensitive observables to these modelling variations also requires MC studies.
Further studies of the contamination between channels and of the contamination from background (Z → ττ) are also needed.

Conclusion
The measurement of a potential mixing of CP-even and CP-odd Higgs bosons is amongst the most interesting measurements which LHC experiments are exploring in 13 TeV collisions and beyond. The complexity of τ decays, in particular to hadronic decay products via resonances, represents both a challenge and an opportunity. The use of deep learning techniques has been demonstrated to be not only viable but potentially vital.
Currently, experimental effects (detector resolution) restrict the usefulness of this approach; however, much is yet to be explored. In-depth studies are in progress to evaluate further experimental concerns such as contamination from backgrounds and mis-reconstructed signal. The numerical results are expected to change as a result of future work.

Fig. 2. Plots of masses constructed from a τ → a_1 ν decay. The ratio plots represent the ratio between the alternative currents (RχL, Alt, BBr) and the standard (Std) current. The two-pion mass is formed from two oppositely charged pions.