Working with the public 10-year IceCube point-source data
This tutorial shows how to use the IceCube public 10-year point-source data with SkyLLH.
Disclaimer
The released 10-year IceCube point-source data can reproduce the published results only within a certain amount of uncertainty due to the limited instrument response function binning provided in the data release. The IceCube collaboration is able to reproduce the published results using detailed direct simulation data, as done for the publication.
[2]:
import numpy as np
from matplotlib import pyplot as plt
import scipy.stats
Creating a configuration
First we have to create a configuration instance.
[4]:
from skyllh.core.config import Config
cfg = Config()
Getting the datasets
Now we import the dataset definition of the public 10-year point-source data set:
[3]:
from skyllh.datasets.i3.PublicData_10y_ps import create_dataset_collection
The collection of datasets can be created using the create_dataset_collection function. This function requires the base path to the data repository, i.e. the path where the public point-source data is stored. The public point-source data can be downloaded from the IceCube website.
[5]:
dsc = create_dataset_collection(
cfg=cfg,
base_path='/home/mwolf/projects/publicdata_ps/')
The dataset_names property provides a list of all the data sets defined in the data set collection of the public point-source data.
[6]:
dsc.dataset_names
[6]:
['IC40',
'IC59',
'IC79',
'IC86_I',
'IC86_II',
'IC86_II-VII',
'IC86_III',
'IC86_IV',
'IC86_V',
'IC86_VI',
'IC86_VII']
The individual data sets IC86_II, IC86_III, IC86_IV, IC86_V, IC86_VI, and IC86_VII are also available as the single combined data set IC86_II-VII, because these data sets share the same detector simulation and event selection. Hence, we can get a list of data sets via the access operator [dataset1, dataset2, ...] of the dsc instance:
[7]:
datasets = dsc['IC40', 'IC59', 'IC79', 'IC86_I', 'IC86_II-VII']
Getting the analysis
The analysis used for the published PRL results is referred to in SkyLLH as the “traditional point-source analysis” and is pre-defined:
[8]:
from skyllh.analyses.i3.publicdata_ps.time_integrated_ps import create_analysis
[9]:
help(create_analysis)
Help on function create_analysis in module skyllh.analyses.i3.publicdata_ps.time_integrated_ps:
create_analysis(cfg, datasets, source, refplflux_Phi0=1, refplflux_E0=1000.0, refplflux_gamma=2.0, ns_seed=100.0, ns_min=0.0, ns_max=1000.0, gamma_seed=3.0, gamma_min=1.0, gamma_max=5.0, kde_smoothing=False, minimizer_impl='LBFGS', cut_sindec=None, spl_smooth=None, cap_ratio=False, compress_data=False, keep_data_fields=None, evt_sel_delta_angle_deg=10, construct_sig_generator=True, tl=None, ppbar=None, logger_name=None)
Creates the Analysis instance for this particular analysis.
Parameters
----------
cfg : instance of Config
The instance of Config holding the local configuration.
datasets : list of Dataset instances
The list of Dataset instances, which should be used in the
analysis.
source : PointLikeSource instance
The PointLikeSource instance defining the point source position.
refplflux_Phi0 : float
The flux normalization to use for the reference power law flux model.
refplflux_E0 : float
The reference energy to use for the reference power law flux model.
refplflux_gamma : float
The spectral index to use for the reference power law flux model.
ns_seed : float
Value to seed the minimizer with for the ns fit.
ns_min : float
Lower bound for ns fit.
ns_max : float
Upper bound for ns fit.
gamma_seed : float | None
Value to seed the minimizer with for the gamma fit. If set to None,
the refplflux_gamma value will be set as gamma_seed.
gamma_min : float
Lower bound for gamma fit.
gamma_max : float
Upper bound for gamma fit.
kde_smoothing : bool
Apply a KDE-based smoothing to the data-driven background pdf.
Default: False.
minimizer_impl : str
Minimizer implementation to be used. Supported options are ``"LBFGS"``
(L-BFG-S minimizer used from the :mod:`scipy.optimize` module), or
``"minuit"`` (Minuit minimizer used by the :mod:`iminuit` module).
Default: "LBFGS".
cut_sindec : list of float | None
sin(dec) values at which the energy cut in the southern sky should
start. If None, np.sin(np.radians([-2, 0, -3, 0, 0])) is used.
spl_smooth : list of float
Smoothing parameters for the 1D spline for the energy cut. If None,
[0., 0.005, 0.05, 0.2, 0.3] is used.
cap_ratio : bool
If set to True, the energy PDF ratio will be capped to a finite value
where no background energy PDF information is available. This will
ensure that an energy PDF ratio is available for high energies where
no background is available from the experimental data.
If kde_smoothing is set to True, cap_ratio should be set to False!
Default is False.
compress_data : bool
Flag if the data should get converted from float64 into float32.
keep_data_fields : list of str | None
List of additional data field names that should get kept when loading
the data.
evt_sel_delta_angle_deg : float
The delta angle in degrees for the event selection optimization methods.
construct_sig_generator : bool
Flag if the signal generator should be constructed (``True``) or not
(``False``).
tl : TimeLord instance | None
The TimeLord instance to use to time the creation of the analysis.
ppbar : ProgressBar instance | None
The instance of ProgressBar for the optional parent progress bar.
logger_name : str | None
The name of the logger to be used. If set to ``None``, ``__name__`` will
be used.
Returns
-------
ana : instance of SingleSourceMultiDatasetLLHRatioAnalysis
The Analysis instance for this analysis.
As the source we use TXS 0506+056.
[10]:
from skyllh.core.source_model import PointLikeSource
[11]:
source = PointLikeSource(ra=np.deg2rad(77.35), dec=np.deg2rad(5.7))
[12]:
ana = create_analysis(cfg=cfg, datasets=datasets, source=source)
100%|██████████| 43/43 [00:04<00:00, 9.11it/s]
100%|██████████| 43/43 [00:04<00:00, 9.90it/s]
100%|██████████| 43/43 [00:04<00:00, 9.70it/s]
100%|██████████| 43/43 [00:04<00:00, 9.85it/s]
100%|██████████| 43/43 [00:04<00:00, 8.89it/s]
100%|██████████| 5/5 [01:17<00:00, 15.48s/it]
100%|██████████| 220/220 [00:00<00:00, 2409.32it/s]
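For reference, the parameters documented above can be used to customize the analysis. The following sketch is purely illustrative and is not executed (nor needed) in the rest of this tutorial; it only uses argument names from the create_analysis signature shown above, and the 'minuit' option requires the iminuit package to be installed.
ana_alt = create_analysis(
    cfg=cfg,
    datasets=datasets,
    source=source,
    gamma_seed=2.5,           # seed value for the spectral index fit
    minimizer_impl='minuit',  # use the Minuit minimizer instead of LBFGS
    cap_ratio=True)           # cap the energy PDF ratio at high energies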
Initializing a trial
After the Analysis instance has been created, trials can be run. To do so, the analysis needs to be initialized with some trial data. For instance, we could initialize the analysis with the experimental data to “unblind” the analysis afterwards. Technically, the TrialDataManager of each log-likelihood ratio function, i.e. of each dataset, is initialized with data.
The Analysis class provides the method initialize_trial to initialize a trial with data. It takes a list of DataFieldRecordArray instances holding the events. If we want to initialize a trial with the experimental data, we can get that list from the Analysis instance itself:
[13]:
events_list = [ data.exp for data in ana.data_list ]
ana.initialize_trial(events_list)
Maximizing the log-likelihood ratio function
After initializing a trial, we can maximize the LLH ratio function. The Analysis class provides the maximize_llhratio method for this purpose; below we call the maximize method of the analysis' llhratio property directly. A RandomStateService instance is required in case the minimizer does not succeed and a new set of initial values for the fit parameters needs to be generated. The call returns a 3-element tuple: the value of the LLH ratio function at its maximum, the array of the fit parameter values at the maximum, and the status dictionary of the minimizer.
[14]:
from skyllh.core.random import RandomStateService
rss = RandomStateService(seed=1)
[15]:
(log_lambda_max, fitparam_values, status) = ana.llhratio.maximize(rss)
[16]:
print(f'log_lambda_max = {log_lambda_max}')
print(f'fitparam_values = {fitparam_values}')
print(f'status = {status}')
log_lambda_max = 6.5725295599619535
fitparam_values = [14.58017285 2.16856498]
status = {'grad': array([ 2.22650525e-06, -7.55260353e-05]), 'task': 'CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH', 'funcalls': 27, 'nit': 22, 'warnflag': 0, 'skyllh_minimizer_n_reps': 0, 'n_llhratio_func_calls': 27}
Calculating the test-statistic
Using the maximum of the LLH ratio function and the fit parameter values at the maximum, we can calculate the test-statistic using the calculate_test_statistic method of the Analysis class:
[17]:
TS = ana.calculate_test_statistic(log_lambda_max, fitparam_values)
print(f'TS = {TS:.3f}')
TS = 13.145
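For this analysis the test statistic is, up to the sign of the best-fit \(n_{\text{s}}\), twice the maximal log-likelihood ratio (a Wilks-type definition), so we can quickly cross-check the value printed above. This is just a sanity check, not a SkyLLH API call:
# Cross-check: for a Wilks-type test statistic TS = 2*sign(ns)*log_lambda,
# so twice the maximal log-lambda should reproduce the TS value above (~13.145).
print(f'2 * log_lambda_max = {2*log_lambda_max:.3f}')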
Unblinding the data
After creating the analysis instance we can unblind the data for the chosen source. Hence, we initialize the analysis with a trial of the experimental data, maximize the log-likelihood ratio function for all given experimental data events, and calculate the test-statistic value. The analysis instance provides the method unblind for this. It requires a RandomStateService instance in case the minimizer does not succeed and a new set of initial values for the fit parameters needs to be generated.
[18]:
from skyllh.core.random import RandomStateService
rss = RandomStateService(seed=1)
[19]:
help(ana.unblind)
Help on method unblind in module skyllh.core.analysis:
unblind(rss, tl=None) method of skyllh.core.analysis.SingleSourceMultiDatasetLLHRatioAnalysis instance
Evaluates the unscrambled data, i.e. unblinds the data.
Parameters
----------
rss : instance of RandomStateService
The instance of RandomStateService that should be used draw random
numbers from.
tl : instance of TimeLord | None
The optional instance of TimeLord that should be used to time the
maximization of the LLH ratio function.
Returns
-------
TS : float
The test-statistic value.
global_params_dict : dict
The dictionary holding the global parameter names and their
best fit values. It includes fixed and floating parameters.
status : dict
The status dictionary with information about the performed
minimization process of the negative of the log-likelihood ratio
function.
The unblind method returns the test-statistic value, a dictionary with the best-fit parameter values, and the status dictionary of the minimizer.
[20]:
(ts, x, status) = ana.unblind(rss=rss)
[21]:
print(f'TS = {ts:.3f}')
print(f'ns = {x["ns"]:.2f}')
print(f'gamma = {x["gamma"]:.2f}')
TS = 13.145
ns = 14.58
gamma = 2.17
Calculating the corresponding flux normalization
By default the analysis is created with a flux normalization of 1 GeV\(^{-1}\)s\(^{-1}\)cm\(^{-2}\)sr\(^{-1}\) (see the refplflux_Phi0 argument of the create_analysis function). The analysis instance provides the method calculate_fluxmodel_scaling_factor, which calculates the factor the reference flux normalization has to be multiplied with in order to represent a given analysis result, i.e. given \(n_{\text{s}}\) and \(\gamma\) values. It takes the detected mean \(n_{\text{s}}\) value as its first argument and the list of source parameter values as its second argument:
[22]:
scaling_factor = ana.calculate_fluxmodel_scaling_factor(x['ns'], [x['ns'], x['gamma']])
print(f'Flux scaling factor = {scaling_factor:.3e}')
Flux scaling factor = 1.422e-15
Hence, our result corresponds to a power-law flux of:
[23]:
print(f'{scaling_factor:.3e} (E/1000 GeV)^{{-{x["gamma"]:.2f}}} 1/(GeV s cm^2 sr)')
1.422e-15 (E/1000 GeV)^{-2.17} 1/(GeV s cm^2 sr)
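To make the resulting flux explicit, we can wrap it in a small helper function and evaluate it at a chosen energy. This is an illustrative sketch; dPhi_dE and its argument names are not part of SkyLLH:
def dPhi_dE(E, phi0=scaling_factor, E0=1e3, gamma=x['gamma']):
    # Best-fit power-law flux in 1/(GeV s cm^2 sr); E and E0 are given in GeV.
    return phi0 * (E / E0)**(-gamma)

# At the reference energy E0 = 1 TeV the flux equals the scaling factor itself.
print(f'dPhi/dE(1 TeV) = {dPhi_dE(1e3):.3e} 1/(GeV s cm^2 sr)')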
Evaluating the log-likelihood ratio function
Sometimes it is useful to evaluate the log-likelihood ratio function itself, e.g. for creating a likelihood contour plot. Because SkyLLH’s structure is based on the mathematical structure of the likelihood function, the Analysis instance has the property llhratio, which is the class instance of the used log-likelihood ratio function. This instance provides the method evaluate. It takes an array of fit parameter values at which the LLH ratio function should be evaluated, and returns the value of the LLH ratio function at that point together with its gradients w.r.t. the fit parameters.
In our case the fit parameters are the number of signal events, \(n_{\mathrm{s}}\), and the spectral index, \(\gamma\). If we evaluate the LLH ratio function at the maximum, the gradients should be close to zero.
[24]:
help(ana.llhratio.evaluate)
Help on method evaluate in module skyllh.core.llhratio:
evaluate(fitparam_values, src_params_recarray=None, tl=None) method of skyllh.core.llhratio.MultiDatasetTCLLHRatio instance
Evaluates the composite log-likelihood-ratio function and returns its
value and global fit parameter gradients.
Parameters
----------
fitparam_values : instance of numpy ndarray
The (N_fitparams,)-shaped numpy 1D ndarray holding the current
values of the global fit parameters.
src_params_recarray : instance of numpy record ndarray | None
The numpy record ndarray of length N_sources holding the parameter
names and values of all sources.
See the documentation of the
:meth:`skyllh.core.parameters.ParameterModelMapper.create_src_params_recarray`
method for more information about this array.
It case it is ``None``, it will be created automatically from the
``fitparam_values`` argument using the
:class:`~skyllh.core.parameters.ParameterModelMapper` instance.
tl : instance of TimeLord | None
The optional instance of TimeLord that should be used for timing
measurements.
Returns
-------
log_lambda : float
The calculated log-lambda value of the composite
log-likelihood-ratio function.
grads : instance of numpy ndarray
The (N_fitparams,)-shaped 1D ndarray holding the gradient value of
the composite log-likelihood-ratio function for each global fit
parameter.
[25]:
(llhratio_value, (grad_ns, grad_gamma)) = ana.llhratio.evaluate([14.58, 2.17])
print(f'llhratio_value = {llhratio_value:.3f}')
print(f'grad_ns = {grad_ns:.3f}')
print(f'grad_gamma = {grad_gamma:.3f}')
llhratio_value = 6.573
grad_ns = 0.001
grad_gamma = -0.027
Using the evaluate method of the LLHRatio class we can scan the log-likelihood ratio space and create a contour plot showing the best fit and the 68%, 90%, and 95% quantiles, assuming Wilks’ theorem.
[26]:
(ns_min, ns_max, ns_step) = (0, 80, 0.5)
(gamma_min, gamma_max, gamma_step) = (1.5, 4.0, 0.1)
ns_edges = np.linspace(ns_min, ns_max, int((ns_max-ns_min)/ns_step)+1)
ns_vals = 0.5*(ns_edges[1:] + ns_edges[:-1])
gamma_edges = np.linspace(gamma_min, gamma_max, int(round((gamma_max-gamma_min)/gamma_step))+1)
gamma_vals = 0.5*(gamma_edges[1:] + gamma_edges[:-1])
delta_ts = np.empty((len(ns_vals), len(gamma_vals)), dtype=np.double)
# The best-fit test-statistic value is constant over the scan, so compute it only once.
ts_best = ana.calculate_test_statistic(llhratio_value, [14.58, 2.17])
for (ns_i, ns) in enumerate(ns_vals):
    for (gamma_i, gamma) in enumerate(gamma_vals):
        delta_ts[ns_i, gamma_i] = (
            ts_best -
            ana.calculate_test_statistic(ana.llhratio.evaluate([ns, gamma])[0], [ns, gamma])
        )
# Determine the best fit ns and gamma values from the scan.
index_max = np.argmin(delta_ts)
ns_i_max = int(index_max / len(gamma_vals))
gamma_i_max = index_max % len(gamma_vals)
ns_best = ns_vals[ns_i_max]
gamma_best = gamma_vals[gamma_i_max]
[27]:
# Determine the delta TS values for the 68%, 90%, and 95% quantiles assuming a
# chi-square distribution with 2 degrees of freedom (i.e. assuming Wilks' theorem).
chi2_68_quantile = scipy.stats.chi2.ppf(0.68, df=2)
chi2_90_quantile = scipy.stats.chi2.ppf(0.90, df=2)
chi2_95_quantile = scipy.stats.chi2.ppf(0.95, df=2)
[28]:
from matplotlib.colors import LogNorm
plt.figure(figsize=(8,6))
plt.pcolormesh(gamma_edges, ns_edges, delta_ts, cmap='nipy_spectral')
cbar = plt.colorbar()
cbar.set_label(r'$\Delta$TS')
plt.contour(gamma_vals, ns_vals, delta_ts, [chi2_68_quantile], colors='#FFFFFF')
plt.contour(gamma_vals, ns_vals, delta_ts, [chi2_90_quantile], colors='#AAAAAA')
plt.contour(gamma_vals, ns_vals, delta_ts, [chi2_95_quantile], colors='#444444')
plt.plot(gamma_best, ns_best, marker='x', color='white', ms=10)
plt.xlabel(r'$\gamma$')
plt.ylabel(r'$n_{\mathrm{s}}$')
plt.ylim(ns_min, ns_max)
plt.xlim(gamma_min, gamma_max)
[28]:
(1.5, 4.0)
Calculating the significance (local p-value)
The significance of the source, i.e. the local p-value, can be calculated by generating the test-statistic distribution of background-only data trials, i.e. trials with zero injected signal events. SkyLLH provides the helper function create_trial_data_file to do that:
[27]:
from skyllh.core.utils.analysis import create_trial_data_file
[28]:
help(create_trial_data_file)
Help on function create_trial_data_file in module skyllh.core.utils.analysis:
create_trial_data_file(ana, rss, n_trials, mean_n_sig=0, mean_n_sig_null=0, mean_n_bkg_list=None, bkg_kwargs=None, sig_kwargs=None, pathfilename=None, ncpu=None, ppbar=None, tl=None)
Creates and fills a trial data file with `n_trials` generated trials for
each mean number of injected signal events from `ns_min` up to `ns_max` for
a given analysis.
Parameters
----------
ana : instance of Analysis
The Analysis instance to use for the trial generation.
rss : instance of RandomStateService
The RandomStateService instance to use for generating random
numbers.
n_trials : int
The number of trials to perform for each hypothesis test.
mean_n_sig : ndarray of float | float | 2- or 3-element sequence of float
The array of mean number of injected signal events (MNOISEs) for which
to generate trials. If this argument is not a ndarray, an array of
MNOISEs is generated based on this argument.
If a single float is given, only this given MNOISEs are injected.
If a 2-element sequence of floats is given, it specifies the range of
MNOISEs with a step size of one.
If a 3-element sequence of floats is given, it specifies the range plus
the step size of the MNOISEs.
mean_n_sig_null : ndarray of float | float | 2- or 3-element sequence of float
The array of the fixed mean number of signal events (FMNOSEs) for the
null-hypothesis for which to generate trials. If this argument is not a
ndarray, an array of FMNOSEs is generated based on this argument.
If a single float is given, only this given FMNOSEs are used.
If a 2-element sequence of floats is given, it specifies the range of
FMNOSEs with a step size of one.
If a 3-element sequence of floats is given, it specifies the range plus
the step size of the FMNOSEs.
mean_n_bkg_list : list of float | None
The mean number of background events that should be generated for
each dataset. This parameter is passed to the ``do_trials`` method of
the ``Analysis`` class. If set to None (the default), the background
generation method needs to obtain this number itself.
bkg_kwargs : dict | None
Additional keyword arguments for the `generate_events` method of the
background generation method class. An usual keyword argument is
`poisson`.
sig_kwargs : dict | None
Additional keyword arguments for the `generate_signal_events` method
of the `SignalGenerator` class. An usual keyword argument is
`poisson`.
pathfilename : string | None
Trial data file path including the filename.
If set to None generated trials won't be saved.
ncpu : int | None
The number of CPUs to use.
ppbar : instance of ProgressBar | None
The optional instance of the parent progress bar.
tl: instance of TimeLord | None
The instance of TimeLord that should be used to measure individual
tasks.
Returns
-------
seed : int
The seed used to generate the trials.
mean_n_sig : 1d ndarray
The array holding the mean number of signal events used to generate the
trials.
mean_n_sig_null : 1d ndarray
The array holding the fixed mean number of signal events for the
null-hypothesis used to generate the trials.
trial_data : structured numpy ndarray
The generated trial data.
First we generate 10k trials and look at the test-statistic distribution. We time the trial generation using the TimeLord class.
[29]:
from skyllh.core.timing import TimeLord
tl = TimeLord()
[31]:
rss = RandomStateService(seed=1)
(_, _, _, trials) = create_trial_data_file(
ana=ana,
rss=rss,
n_trials=1e4,
mean_n_sig=0,
pathfilename='/home/mwolf/projects/publicdata_ps/txs_bkg_trails.npy',
ncpu=8,
tl=tl)
print(tl)
100%|██████████| 10001/10001 [08:52<00:00, 18.78it/s]
TimeLord: Executed tasks:
[Generating background events for data set 0.] 0.002 sec/iter (10000)
[Generating background events for data set 1.] 0.003 sec/iter (10000)
[Generating background events for data set 2.] 0.003 sec/iter (10000)
[Generating background events for data set 3.] 0.006 sec/iter (10000)
[Generating background events for data set 4.] 0.019 sec/iter (10000)
[Generating pseudo data. ] 0.027 sec/iter (10000)
[Initializing trial. ] 0.030 sec/iter (10000)
[Get sig probability densities and grads. ] 4.4e-06 sec/iter (1950580)
[Get bkg probability densities and grads. ] 3.3e-06 sec/iter (1950580)
[Calculate PDF ratios. ] 9.9e-05 sec/iter (1950580)
[Calc pdfratio value Ri ] 5.4e-04 sec/iter (975290)
[Calc logLamds and grads ] 2.7e-04 sec/iter (975290)
[Evaluate llh-ratio function. ] 0.003 sec/iter (195058)
[Minimize -llhratio function. ] 0.058 sec/iter (10000)
[Maximizing LLH ratio function. ] 0.058 sec/iter (10000)
[Calculating test statistic. ] 5.1e-05 sec/iter (10000)
After generating the background trials, we can histogram the test-statistic values and plot the TS distribution.
[32]:
(h, be) = np.histogram(trials['ts'], bins=np.arange(0, np.max(trials['ts'])+0.1, 0.1))
plt.plot(0.5*(be[:-1]+be[1:]), h, drawstyle='steps-mid', label='background')
plt.vlines(ts, 1, np.max(h), label=f'TS(TXS 0506+056)={ts:.3f}')
plt.yscale('log')
plt.xlabel('TS')
plt.ylabel('#trials per bin')
plt.legend()
pass
We can see that the TS value of the unblinded data for TXS 0506+056 is rather large, and 10k trials are not enough to calculate a reliable estimate for the p-value. Hence, we will generate a few more trials. SkyLLH also provides a helper function to extend the trial data file we just created, called extend_trial_data_file:
[32]:
from skyllh.core.utils.analysis import extend_trial_data_file
[33]:
help(extend_trial_data_file)
Help on function extend_trial_data_file in module skyllh.core.utils.analysis:
extend_trial_data_file(ana, rss, n_trials, trial_data, mean_n_sig=0, mean_n_sig_null=0, mean_n_bkg_list=None, bkg_kwargs=None, sig_kwargs=None, pathfilename=None, **kwargs)
Appends to the trial data file `n_trials` generated trials for each
mean number of injected signal events up to `ns_max` for a given analysis.
Parameters
----------
ana : Analysis
The Analysis instance to use for sensitivity estimation.
rss : RandomStateService
The RandomStateService instance to use for generating random
numbers.
n_trials : int
The number of trials the trial data file needs to be extended by.
trial_data : structured numpy ndarray
The structured numpy ndarray holding the trials.
mean_n_sig : ndarray of float | float | 2- or 3-element sequence of float
The array of mean number of injected signal events (MNOISEs) for which
to generate trials. If this argument is not a ndarray, an array of
MNOISEs is generated based on this argument.
If a single float is given, only this given MNOISEs are injected.
If a 2-element sequence of floats is given, it specifies the range of
MNOISEs with a step size of one.
If a 3-element sequence of floats is given, it specifies the range plus
the step size of the MNOISEs.
mean_n_sig_null : ndarray of float | float | 2- or 3-element sequence of
float
The array of the fixed mean number of signal events (FMNOSEs) for the
null-hypothesis for which to generate trials. If this argument is not a
ndarray, an array of FMNOSEs is generated based on this argument.
If a single float is given, only this given FMNOSEs are used.
If a 2-element sequence of floats is given, it specifies the range of
FMNOSEs with a step size of one.
If a 3-element sequence of floats is given, it specifies the range plus
the step size of the FMNOSEs.
bkg_kwargs : dict | None
Additional keyword arguments for the `generate_events` method of the
background generation method class. An usual keyword argument is
`poisson`.
sig_kwargs : dict | None
Additional keyword arguments for the `generate_signal_events` method
of the `SignalGenerator` class. An usual keyword argument is
`poisson`.
pathfilename : string | None
Trial data file path including the filename.
Additional keyword arguments
----------------------------
Additional keyword arguments are passed-on to the ``create_trial_data_file``
function.
Returns
-------
trial_data :
Trial data file extended by the required number of trials for each
mean number of injected signal events..
[34]:
tl = TimeLord()
rss = RandomStateService(seed=2)
trials = extend_trial_data_file(
ana=ana,
rss=rss,
n_trials=4e4,
trial_data=trials,
pathfilename='/home/mwolf/projects/publicdata_ps/txs_bkg_trails.npy',
ncpu=8,
tl=tl)
100%|██████████| 40001/40001 [1:33:15<00:00, 7.15it/s]
[35]:
print(tl)
TimeLord: Executed tasks:
[Generating background events for data set 0.] 0.003 sec/iter (40000)
[Generating background events for data set 1.] 0.005 sec/iter (40000)
[Generating background events for data set 2.] 0.004 sec/iter (40000)
[Generating background events for data set 3.] 0.008 sec/iter (40000)
[Generating background events for data set 4.] 0.029 sec/iter (40000)
[Generating pseudo data. ] 0.045 sec/iter (40000)
[Initializing trial. ] 0.126 sec/iter (40000)
[Get sig probability densities and grads. ] 2.6e-04 sec/iter (7959160)
[Evaluating bkg log-spline. ] 3.3e-04 sec/iter (7959160)
[Get bkg probability densities and grads. ] 4.0e-04 sec/iter (7959160)
[Calculate PDF ratios. ] 1.3e-04 sec/iter (7959160)
[Calc pdfratio value Ri ] 0.002 sec/iter (3979580)
[Calc logLamds and grads ] 4.4e-04 sec/iter (3979580)
[Evaluate llh-ratio function. ] 0.008 sec/iter (795916)
[Minimize -llhratio function. ] 0.166 sec/iter (40000)
[Maximizing LLH ratio function. ] 0.166 sec/iter (40000)
[Calculating test statistic. ] 6.1e-05 sec/iter (40000)
The local p-value is defined as the fraction of background trials with a TS value greater than the unblinded TS value of the source.
[36]:
minus_log10_pval = -np.log10(len(trials[trials['ts'] > ts]) / len(trials))
print(f'-log10(p_local) = {minus_log10_pval:.2f}')
-log10(p_local) = 2.89
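The local p-value can also be expressed as a one-sided Gaussian significance, and the finite number of trials implies a statistical (binomial) uncertainty on its estimate. The following conversion is standard statistics and not a SkyLLH function:
n_trials_total = len(trials)
n_above = np.count_nonzero(trials['ts'] > ts)
p_local = n_above / n_trials_total

# Binomial (1-sigma) uncertainty of the p-value estimate due to the finite
# number of background trials.
p_local_err = np.sqrt(p_local * (1 - p_local) / n_trials_total)

# One-sided Gaussian significance corresponding to the local p-value
# (roughly 3 sigma for the p-value found above).
significance = scipy.stats.norm.isf(p_local)

print(f'p_local = {p_local:.2e} +- {p_local_err:.2e}')
print(f'significance = {significance:.2f} sigma')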
[37]:
(h, be) = np.histogram(trials['ts'], bins=np.arange(0, np.max(trials['ts'])+0.1, 0.1))
plt.plot(0.5*(be[:-1]+be[1:]), h, drawstyle='steps-mid', label='background')
plt.vlines(ts, 1, np.max(h), label=f'TS(TXS 0506+056)={ts:.3f}')
plt.yscale('log')
plt.xlabel('TS')
plt.ylabel('#trials per bin')
plt.legend()
pass