Illustrating different approaches to computing the significance (local p-value) using SkyLLH

This tutorial shows how to calculate the significance (local p-value) of a source using SkyLLH.

[ ]:
import numpy as np
from matplotlib import pyplot as plt

For more details on the create_dataset_collection, PointLikeSource, create_analysis, initialize_trial, and unblind functions, see ‘here the url for the 14 year tutorial’.

First we have to create a configuration instance.

[3]:
from skyllh.core.config import Config

cfg = Config()

We import the dataset definition of the public 14-year point-source data set. A list of data sets can be obtained via the access operator [dataset1, dataset2, ...] of the dsc instance. For more details on the create_dataset_collection function see ‘url’.

[ ]:
from skyllh.datasets.i3.PublicData_14y_ps import create_dataset_collection

dsc = create_dataset_collection(cfg=cfg)
print(dsc.dataset_names)
datasets = dsc['IC40', 'IC59', 'IC79', 'IC86_I-XI']

Defining the source: here we use TXS 0506+056.

Once the source is defined, the create_analysis function creates the analysis instance.

[5]:
from skyllh.analyses.i3.publicdata_ps.time_integrated_ps import create_analysis
from skyllh.core.source_model import PointLikeSource

src_ra = 77.35  # degrees
src_dec = 5.7  # degrees

source = PointLikeSource(ra=np.radians(src_ra), dec=np.radians(src_dec))

ana = create_analysis(cfg=cfg, datasets=datasets, source=source)
100%|██████████| 33/33 [00:25<00:00,  1.27it/s]
100%|██████████| 33/33 [00:35<00:00,  1.09s/it]
100%|██████████| 33/33 [00:25<00:00,  1.29it/s]
100%|██████████| 33/33 [00:08<00:00,  3.92it/s]
100%|██████████| 4/4 [08:22<00:00, 125.69s/it]
100%|██████████| 136/136 [00:00<00:00, 1876.73it/s]

Initializing the analysis instance and unblinding the data.

[6]:
from skyllh.core.random import RandomStateService

rss = RandomStateService(3090)
TS, bestfit_params, minimizer_status = ana.unblind(rss)
ns_bf = bestfit_params['ns']
gamma_bf = bestfit_params['gamma']
print(f'Unblinded results: TS={TS:.2f}, ns={ns_bf:.2f}, gamma={gamma_bf:.2f}')
print(f'Minimizer status: {minimizer_status}')
Unblinded results: TS=9.77, ns=8.80, gamma=1.97
Minimizer status: {'grad': array([-3.67243376e-06,  5.62921408e-05]), 'task': 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH', 'funcalls': 16, 'nit': 11, 'warnflag': 0, 'skyllh_minimizer_n_reps': 0, 'n_llhratio_func_calls': 16}

Calculating the significance (local p-value) by generating the test-statistic distribution of background-only data trials

The significance of the source, i.e. the local p-value, can be calculated by generating the test-statistic distribution of background-only data trials, i.e. for zero injected signal events. SkyLLH provides the helper function create_trial_data_file to do that:
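The underlying idea can be sketched with plain NumPy (a toy illustration with synthetic TS values, not SkyLLH code): the local p-value is simply the fraction of background-only trials whose TS exceeds the observed value.

```python
import numpy as np

def pvalue_from_bkg_trials(bkg_ts, ts_obs):
    """Local p-value: fraction of background-only trials with TS >= ts_obs."""
    bkg_ts = np.asarray(bkg_ts)
    return np.count_nonzero(bkg_ts >= ts_obs) / bkg_ts.size

# Toy background TS distribution: a delta peak at 0 plus a chi2-like tail
# (synthetic numbers, not SkyLLH output).
rng = np.random.default_rng(0)
toy_ts = np.concatenate([np.zeros(500), rng.chisquare(df=2, size=500)])
p = pvalue_from_bkg_trials(toy_ts, ts_obs=5.0)
```

The create_trial_data_file helper below does the heavy lifting of generating such background trials for a full analysis.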

[7]:
from skyllh.core.utils.analysis import create_trial_data_file
[51]:
help(create_trial_data_file)
Help on function create_trial_data_file in module skyllh.core.utils.analysis:

create_trial_data_file(ana, rss, n_trials, mean_n_sig=0, mean_n_sig_null=0, mean_n_bkg_list=None, minimizer_rss=None, bkg_kwargs=None, sig_kwargs=None, pathfilename=None, ncpu=None, ppbar=None, tl=None)
    Creates and fills a trial data file with `n_trials` generated trials for
    each mean number of injected signal events specified by `mean_n_sig` for a
    given analysis.

    Parameters
    ----------
    ana : instance of Analysis
        The Analysis instance to use for the trial generation.
    rss : instance of RandomStateService
        The RandomStateService instance to use for generating random
        numbers.
    n_trials : int
        The number of trials to perform for each hypothesis test.
    mean_n_sig : ndarray of float | float | 2- or 3-element sequence of float
        The array of mean number of injected signal events (MNOISEs) for which
        to generate trials. If this argument is not a ndarray, an array of
        MNOISEs is generated based on this argument.
        If a single float is given, only this given MNOISEs are injected.
        If a 2-element sequence of floats is given, it specifies the range of
        MNOISEs with a step size of one.
        If a 3-element sequence of floats is given, it specifies the range plus
        the step size of the MNOISEs.
    mean_n_sig_null : ndarray of float | float | 2- or 3-element sequence of float
        The array of the fixed mean number of signal events (FMNOSEs) for the
        null-hypothesis for which to generate trials. If this argument is not a
        ndarray, an array of FMNOSEs is generated based on this argument.
        If a single float is given, only this given FMNOSEs are used.
        If a 2-element sequence of floats is given, it specifies the range of
        FMNOSEs with a step size of one.
        If a 3-element sequence of floats is given, it specifies the range plus
        the step size of the FMNOSEs.
    mean_n_bkg_list : list of float | None
        The mean number of background events that should be generated for
        each dataset. This parameter is passed to the ``do_trials`` method of
        the ``Analysis`` class. If set to None (the default), the background
        generation method needs to obtain this number itself.
    minimizer_rss : instance of RandomStateService | None
        The instance of RandomStateService to use for generating random
        numbers for the minimizer, e.g. new initial fit parameter values.
        If set to ``None``, a rss with the same seed as ``rss`` will be
        initialized.
    bkg_kwargs : dict | None
        Additional keyword arguments for the `generate_events` method of the
        background generation method class. An usual keyword argument is
        `poisson`.
    sig_kwargs : dict | None
        Additional keyword arguments for the `generate_signal_events` method
        of the `SignalGenerator` class. An usual keyword argument is
        `poisson`.
    pathfilename : string | None
        Trial data file path including the filename.
        If set to None generated trials won't be saved.
    ncpu : int | None
        The number of CPUs to use.
    ppbar : instance of ProgressBar | None
        The optional instance of the parent progress bar.
    tl: instance of TimeLord | None
        The instance of TimeLord that should be used to measure individual
        tasks.

    Returns
    -------
    seed : int
        The seed used to generate the trials.
    mean_n_sig : 1d ndarray
        The array holding the mean number of signal events used to generate the
        trials.
    mean_n_sig_null : 1d ndarray
        The array holding the fixed mean number of signal events for the
        null-hypothesis used to generate the trials.
    trial_data : structured numpy ndarray
        The generated trial data.

[8]:
from skyllh.core.timing import TimeLord

tl = TimeLord()
[ ]:
rss = RandomStateService(seed=1)
(_, _, _, trials) = create_trial_data_file(
    ana=ana, rss=rss, n_trials=1e3, mean_n_sig=0, pathfilename='txs_bkg_trails.npy', ncpu=8, tl=tl
)
print(tl)
100%|██████████| 1001/1001 [29:07<00:00,  1.75s/it]
TimeLord: Executed tasks:
[Generating background events for dataset "IC40".     ]   0.003 sec/iter (1000)
[Generating background events for dataset "IC59".     ]   0.008 sec/iter (1000)
[Generating background events for dataset "IC79".     ]   0.013 sec/iter (1000)
[Generating background events for dataset "IC86_I-XI".]   0.121 sec/iter (1000)
[Generating pseudo data.                              ]   0.137 sec/iter (1000)
[Initializing trial.                                  ]   0.528 sec/iter (1000)
[Get sig probability densities and grads.             ] 2.0e-05 sec/iter (90416)
[Get bkg probability densities and grads.             ] 2.0e-05 sec/iter (90416)
[Calculate PDF ratios.                                ] 3.1e-04 sec/iter (90416)
[Calc pdfratio value Ri                               ]   0.003 sec/iter (45208)
[Calc logLambda and grads                             ] 8.0e-04 sec/iter (45208)
[Evaluate llh-ratio function.                         ]   0.019 sec/iter (11302)
[Minimize -llhratio function.                         ]   0.219 sec/iter (1000)
[Maximizing LLH ratio function.                       ]   0.219 sec/iter (1000)
[Calculating test statistic.                          ] 9.8e-05 sec/iter (1000)

After generating the background trials, we can histogram the test-statistic values and plot the TS distribution.

[47]:
(h, be) = np.histogram(trials['ts'], bins=np.arange(0, np.max(trials['ts']) + 0.1, 0.1))
plt.plot(0.5 * (be[:-1] + be[1:]), h, drawstyle='steps-mid', label='background')
plt.vlines(TS, 1, np.max(h), label=f'TS(TXS 0506+056)={TS:.3f}')
plt.yscale('log')
plt.xlabel('TS')
plt.ylabel('#trials per bin')
plt.legend()
pass
../_images/tutorials_p_value_method_comparison_19_0.png

If the generated trials are not sufficient to estimate the p-value reliably, the helper function extend_trial_data_file can be used to generate additional trials.

It is also possible to write these trials to a file by passing a file path to the function.
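Conceptually, extending trial data amounts to appending new trials to the existing structured array and optionally persisting the result (a simplified sketch under assumed field names, not the SkyLLH implementation):

```python
import numpy as np

# Hypothetical trial record layout for illustration only.
dtype = [('ts', np.float64), ('ns', np.float64), ('gamma', np.float64)]
old_trials = np.zeros(1000, dtype=dtype)
new_trials = np.zeros(4000, dtype=dtype)

# Append the newly generated trials to the existing ones.
extended = np.concatenate([old_trials, new_trials])
# np.save('bkg_trials.npy', extended)  # optional: persist to disk
```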

[49]:
from skyllh.core.utils.analysis import extend_trial_data_file
[50]:
help(extend_trial_data_file)
Help on function extend_trial_data_file in module skyllh.core.utils.analysis:

extend_trial_data_file(ana, rss, n_trials, trial_data, mean_n_sig=0, mean_n_sig_null=0, mean_n_bkg_list=None, bkg_kwargs=None, sig_kwargs=None, pathfilename=None, **kwargs)
    Appends to the trial data file `n_trials` generated trials for each
    mean number of injected signal events specified by `mean_n_sig` for a
    given analysis.

    Parameters
    ----------
    ana : instance of Analysis
        The Analysis instance to use for sensitivity estimation.
    rss : instance of RandomStateService
        The RandomStateService instance to use for generating random
        numbers.
    n_trials : int
        The number of trials the trial data file needs to be extended by.
    trial_data : structured numpy ndarray
        The structured numpy ndarray holding the trials.
    mean_n_sig : ndarray of float | float | 2- or 3-element sequence of float
        The array of mean number of injected signal events (MNOISEs) for which
        to generate trials. If this argument is not a ndarray, an array of
        MNOISEs is generated based on this argument.
        If a single float is given, only this given MNOISEs are injected.
        If a 2-element sequence of floats is given, it specifies the range of
        MNOISEs with a step size of one.
        If a 3-element sequence of floats is given, it specifies the range plus
        the step size of the MNOISEs.
    mean_n_sig_null : ndarray of float | float | 2- or 3-element sequence of float
        The array of the fixed mean number of signal events (FMNOSEs) for the
        null-hypothesis for which to generate trials. If this argument is not a
        ndarray, an array of FMNOSEs is generated based on this argument.
        If a single float is given, only this given FMNOSEs are used.
        If a 2-element sequence of floats is given, it specifies the range of
        FMNOSEs with a step size of one.
        If a 3-element sequence of floats is given, it specifies the range plus
        the step size of the FMNOSEs.
    bkg_kwargs : dict | None
        Additional keyword arguments for the `generate_events` method of the
        background generation method class. An usual keyword argument is
        `poisson`.
    sig_kwargs : dict | None
        Additional keyword arguments for the `generate_signal_events` method
        of the `SignalGenerator` class. An usual keyword argument is
        `poisson`.
    pathfilename : string | None
        Trial data file path including the filename.

    Additional keyword arguments
    ----------------------------
    Additional keyword arguments are passed-on to the ``create_trial_data_file``
    function.

    Returns
    -------
    trial_data :
        Trial data file extended by the required number of trials for each
        mean number of injected signal events..

[ ]:
tl = TimeLord()
rss = RandomStateService(seed=2)
trials = extend_trial_data_file(
    ana=ana, rss=rss, n_trials=4e3, trial_data=trials, pathfilename='txs_bkg_trails.npy', ncpu=8, tl=tl
)
  0%|          | 0/1 [00:00<?, ?it/s]100%|██████████| 4001/4001 [19:38:59<00:00, 17.68s/it]
[17]:
print(tl)
TimeLord: Executed tasks:
[Generating background events for dataset "IC40".     ]   0.002 sec/iter (4000)
[Generating background events for dataset "IC59".     ]   0.007 sec/iter (4000)
[Generating background events for dataset "IC79".     ]   0.010 sec/iter (4000)
[Generating background events for dataset "IC86_I-XI".]   0.103 sec/iter (4000)
[Generating pseudo data.                              ]   0.116 sec/iter (4000)
[Initializing trial.                                  ]   0.377 sec/iter (4000)
[Get sig probability densities and grads.             ] 1.4e-05 sec/iter (346512)
[Get bkg probability densities and grads.             ] 1.3e-05 sec/iter (346512)
[Calculate PDF ratios.                                ] 2.4e-04 sec/iter (346512)
[Calc pdfratio value Ri                               ]   0.002 sec/iter (173256)
[Calc logLambda and grads                             ] 5.9e-04 sec/iter (173256)
[Evaluate llh-ratio function.                         ]   0.014 sec/iter (43314)
[Minimize -llhratio function.                         ]   0.159 sec/iter (4000)
[Maximizing LLH ratio function.                       ]   0.159 sec/iter (4000)
[Calculating test statistic.                          ] 6.5e-05 sec/iter (4000)

skyllh.core.utils.analysis provides the function calculate_pval_from_trials to calculate the p-value from trials. It takes the trial TS values and the critical TS value, and returns the computed p-value and its error.
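A minimal sketch of what such a function computes, assuming the 'greater' comparison operator and binomial statistics for the uncertainty (the helper name here is illustrative, not the SkyLLH source):

```python
import numpy as np

def pval_with_binomial_error(ts_vals, ts_threshold):
    """Exceedance fraction and its binomial standard deviation."""
    ts_vals = np.asarray(ts_vals)
    n = ts_vals.size
    p = np.count_nonzero(ts_vals > ts_threshold) / n
    # Standard deviation of a binomial proportion estimate.
    p_sigma = np.sqrt(p * (1.0 - p) / n)
    return p, p_sigma

p, p_sigma = pval_with_binomial_error(np.arange(100), 89.5)
```

Note how the error shrinks only as 1/sqrt(n), which is why small p-values require large numbers of trials.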

[52]:
from skyllh.core.utils.analysis import calculate_pval_from_trials
[53]:
help(calculate_pval_from_trials)
Help on function calculate_pval_from_trials in module skyllh.core.utils.analysis:

calculate_pval_from_trials(ts_vals, ts_threshold, comp_operator='greater')
    Calculates the percentage (p-value) of test-statistic trials that are
    above the given test-statistic critical value.
    In addition it calculates the standard deviation of the p-value assuming
    binomial statistics.

    Parameters
    ----------
    ts_vals : (n_trials,)-shaped 1D ndarray of float
        The ndarray holding the test-statistic values of the trials.
    ts_threshold : float
        The critical test-statistic value.
    comp_operator: string, optional
        The comparison operator for p-value calculation. It can be set to one of
        the following options: 'greater' or 'greater_equal'.

    Returns
    -------
    p, p_sigma: tuple(float, float)

[ ]:
trials_ts_dis = np.load('txs_bkg_trails.npy')
[84]:
p_val_from_trials, p_val_from_trials_sigma = calculate_pval_from_trials(trials_ts_dis['ts'], TS)
print(f'p_value calculated from trials: {p_val_from_trials:.3e}')
p_value calculated from trials: 4.400e-03
[83]:
(h, be) = np.histogram(trials_ts_dis['ts'], bins=np.arange(0, np.max(trials_ts_dis['ts']) + 0.1, 0.1))
plt.plot(0.5 * (be[:-1] + be[1:]), h, drawstyle='steps-mid', label='background')
plt.vlines(TS, 1, np.max(h), label=f'TS(TXS 0506+056)={TS:.3f}')
plt.yscale('log')
plt.xlabel('TS')
plt.ylabel('#trials per bin')
plt.legend()
pass
../_images/tutorials_p_value_method_comparison_30_0.png

Calculating the significance (local p-value) by fitting a truncated gamma function

To estimate the significance of the result, one can use the do_trials method of the Analysis instance. This method generates pseudo-experiments under the background-only hypothesis, i.e. with no injected signal events.

[ ]:
help(ana.do_trials)
Help on method do_trials in module skyllh.core.analysis:

do_trials(rss, n, ncpu=None, tl=None, ppbar=None, **kwargs) method of skyllh.core.analysis.SingleSourceMultiDatasetLLHRatioAnalysis instance
    Executes the :meth:`do_trial` method ``n`` times with possible
    multi-processing.

    Parameters
    ----------
    rss : instance of RandomStateService
        The RandomStateService instance to use for generating random
        numbers.
    n : int
        Number of trials to generate using the `do_trial` method.
    ncpu : int | None
        The number of CPUs to use, i.e. the number of subprocesses to
        spawn. If set to None, the global setting will be used.
    tl : instance of TimeLord | None
        The optional instance of TimeLord that should be used to time
        individual tasks.
    ppbar : instance of ProgressBar | None
        The possible parent ProgressBar instance.
    **kwargs
        Additional keyword arguments are passed to the :meth:`do_trial`
        method. See the documentation of that method for allowed keyword
        arguments.

    Returns
    -------
    recarray : numpy record ndarray
        The numpy record ndarray holding the result of all trials.
        See the documentation of the
        :py:meth:`~skyllh.core.analysis.Analysis.do_trial` method for the
        list of data fields.

[23]:
trials = ana.do_trials(
    rss=rss,
    n=1000,
)
100%|██████████| 1000/1000 [29:03<00:00,  1.74s/it]

extend_trial_data_file is a helper function that adds more trial data to improve the p-value estimate. This helper function is separate from the do_trials method of the analysis instance.

[24]:
from skyllh.core.utils.analysis import extend_trial_data_file

trials = extend_trial_data_file(ana=ana, rss=rss, n_trials=1000, trial_data=trials)
  0%|          | 0/1 [00:00<?, ?it/s]100%|██████████| 1001/1001 [15:38<00:00,  1.07it/s]

We can have a look at the distributions of the trial results and compare it to the unblinded result.

[26]:
fig, ax = plt.subplots(1, 3, figsize=(12, 4))

ax[0].hist(trials['ns'], bins=30, color='C0', alpha=0.7)
ax[0].axvline(ns_bf, color='C1', linestyle='--', label='Unblinded ns')
ax[0].legend()
ax[0].set_yscale('log')
ax[0].set_ylabel(r'Counts')
ax[0].set_xlabel(r'Signal Events')

ax[1].hist(trials['gamma'], bins=30, color='C0', alpha=0.7)
ax[1].axvline(gamma_bf, color='C1', linestyle='--', label='Unblinded gamma')
ax[1].legend()
ax[1].set_yscale('log')
ax[1].set_xlabel(r'Spectral Index $\gamma$')

ax[2].hist(trials['ts'], bins=30, color='C0', alpha=0.7)
ax[2].axvline(TS, color='C1', linestyle='--', label='Unblinded TS')
ax[2].legend()
ax[2].set_yscale('log')
ax[2].set_xlabel('Test-Statistic')
[26]:
Text(0.5, 0, 'Test-Statistic')
../_images/tutorials_p_value_method_comparison_38_2.png

Estimating the p-value for the result by fitting the test-statistic (TS) distribution with a truncated gamma function

The fit_truncated_gamma function can be used to fit the truncated gamma function to the TS distribution.

The calculate_pval_from_trials_mixed function allows us to estimate the p-value from the fit. It is especially useful for very high TS values, where too few background trials exceed the threshold to count them directly.

[27]:
from scipy.stats import gamma

from skyllh.core.utils.analysis import (
    calculate_critical_ts_from_gamma,
    calculate_pval_from_trials_mixed,
    fit_truncated_gamma,
)
[28]:
help(calculate_pval_from_trials_mixed)
Help on function calculate_pval_from_trials_mixed in module skyllh.core.utils.analysis:

calculate_pval_from_trials_mixed(ts_vals, ts_threshold, switch_at_ts=3.0, eta=None, n_max=500000, comp_operator='greater_equal')
    Calculates the probability (p-value) of test-statistic exceeding
    the given test-statistic threshold. This calculation relies on fitting
    a gamma distribution to a list of ts values if ts_threshold is larger than
    switch_at_ts. If ts_threshold is smaller then the pvalue will be taken
    from the trials directly.

    Parameters
    ----------
    ts_vals : (n_trials,)-shaped 1D ndarray of float
        The ndarray holding the test-statistic values of the trials.
    ts_threshold : float
        The critical test-statistic value.
    switch_at_ts : float, optional
        Test-statistic value below which p-value is computed from trials
        directly. For thresholds greater than switch_at_ts the pvalue is
        calculated using a gamma fit.
    eta : float, optional
        Test-statistic value at which the gamma function is truncated
        from below. Default is None.
    n_max : int, optional
        The maximum number of trials that should be used during
        fitting. Default = 500,000
    comp_operator: string, optional
        The comparison operator for p-value calculation. It can be set to one of
        the following options: 'greater' or 'greater_equal'.

    Returns
    -------
    p, p_sigma: tuple(float, float)

You need to truncate the distribution at some \(\eta > 0\) because the TS distribution is the sum of a delta peak at 0 and a continuous distribution for TS > 0.
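The truncation can be sketched as follows with SciPy (an illustrative re-implementation on synthetic data, not the SkyLLH code): the gamma PDF is renormalized over ts > eta and fit only to the trials above the truncation point.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma

# Toy TS distribution: delta peak at 0 plus a gamma-like tail (synthetic).
rng = np.random.default_rng(1)
ts = np.concatenate([np.zeros(600), rng.gamma(shape=1.2, scale=1.5, size=400)])
eta = 1.0
ts_fit = ts[ts > eta]

def neg_logl(pars):
    a, scale = pars
    # Gamma PDF renormalized over (eta, inf): divide by the survival function.
    tail = gamma.sf(eta, a=a, scale=scale)
    return -np.sum(gamma.logpdf(ts_fit, a=a, scale=scale) - np.log(tail))

res = minimize(neg_logl, x0=[1.0, 1.0], bounds=[(0.1, 10.0), (0.1, 10.0)])
a_fit, scale_fit = res.x
# Normalization: fraction of all trials above the truncation point.
norm_frac = ts_fit.size / ts.size
```

This mirrors the role of the pars and norm values returned by fit_truncated_gamma in the cell below.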

[ ]:
eta = 1.0
pars, norm = fit_truncated_gamma(trials['ts'], eta=eta)

# Plot
fig, ax = plt.subplots(1, 1, figsize=(6, 4))
counts, bins = np.histogram(trials['ts'], bins=30, density=True)
ax.bar(
    0.5 * (bins[1:] + bins[:-1]),
    height=counts,
    width=bins[1:] - bins[:-1],
    edgecolor='C0',
    color='C0',
    alpha=0.5,
    label='Bkg TS distribution',
)
ax.errorbar(
    0.5 * (bins[1:] + bins[:-1]),
    counts,
    yerr=np.sqrt(counts / len(trials['ts']) / (bins[1:] - bins[:-1])),
    fmt='',
    ls='',
    color='C0',
)
bins = bins[bins > eta]
bincenters = 0.5 * (bins[1:] + bins[:-1])
bw = bins[1:] - bins[:-1]
ax.set_yscale('log')

vals = np.linspace(eta, TS + 2, 100)
ax.plot(
    vals, norm * gamma.pdf(vals, a=pars[0], scale=pars[1]), lw=1.5, color='k', zorder=5, label='Fitted truncated gamma'
)

ax.axvline(TS, color='C1', linestyle='--', label='unblinded TS')

ax.set_xlabel('Test statistic')
ax.set_ylabel('Density')  # histogram uses density=True

ax.legend()
<matplotlib.legend.Legend at 0x12a5818b0>
../_images/tutorials_p_value_method_comparison_43_1.png

The calculate_critical_ts_from_gamma function calculates the TS value your analysis would need to reach a given significance.
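The h0_ts_quantile values used below correspond to one-sided Gaussian tail probabilities for 3 and 5 sigma; they can be verified with scipy.stats.norm, independently of SkyLLH:

```python
from scipy.stats import norm

# One-sided tail probability of a standard normal at 3 and 5 sigma.
p_3sigma = float(norm.sf(3))  # ~1.35e-3
p_5sigma = float(norm.sf(5))  # ~2.87e-7
```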

[32]:
critical_ts = calculate_critical_ts_from_gamma(
    ts=trials['ts'],
    h0_ts_quantile=1.3e-3,  # this is ~ 3 sigma
    eta=1.0,
)
print(f'TS value for a 3 sigma detection: {critical_ts:.2f}')
critical_ts = calculate_critical_ts_from_gamma(
    ts=trials['ts'],
    h0_ts_quantile=2.9e-7,  # this is ~ 5 sigma
    eta=1.0,
)
print(f'TS value for a 5 sigma detection: {critical_ts:.2f}')
TS value for a 3 sigma detection: 12.88
TS value for a 5 sigma detection: 30.21

Finally, the significance of the unblinded result can be obtained.

[42]:
pval, _ = calculate_pval_from_trials_mixed(ts_vals=trials['ts'], ts_threshold=TS, eta=eta)
print(f'Unblinded result p-value by fitting the truncated gamma function: {pval:.2e}')
Unblinded result p-value by fitting the truncated gamma function: 5.81e-03
[82]:
print(f'p-value from TS distribution: {p_val_from_trials:.2e}')
p-value from TS distribution: 4.40e-03

The p-values calculated using both methods are compatible.

Note

The p-value from trials was computed using only 1000 trials, which introduces a non-negligible error in the calculation. The function calculate_pval_from_trials also returns this error.

[85]:
print(f'The error in calculating p-value from trials: {p_val_from_trials_sigma:.2e}')
The error in calculating p-value from trials: 9.36e-04