Setting an energy range for the signal injection

This tutorial shows how to set an energy range for the signal injection and how to visualize the distribution of the generated pseudo-data events.

For analysis setup details check the Fitting a steady point-source with the public 14-year IceCube track data tutorial.

[1]:
import matplotlib.pyplot as plt
import numpy as np

import skyllh
from skyllh.core.config import Config
from skyllh.core.logging import setup_logging
from skyllh.core.source_model import PointLikeSource

cfg = Config()
logger = setup_logging(cfg=cfg, name='setting_an_energy_range')
datasets = skyllh.create_datasets('IceTracks-DR2', cfg=cfg)

# Location of NGC 1068
src_ra = 40.67  # degrees
src_dec = -0.01  # degrees

source = PointLikeSource(ra=np.radians(src_ra), dec=np.radians(src_dec))
2026-05-21 17:30:07,363 MainProcess skyllh.datasets.datasets INFO: Loaded 4 dataset(s) from sample "IceTracks-DR2": IC40, IC59, IC79, IC86_I-XI

The create_analysis instance can be created with the energy_range set to a specific energy range. This internally sets the energy range for the signal generator. Here in this example, the analysis object is created with the energy range \(10^2\, \text{GeV}\) to \(10^3\, \text{GeV}\).

[2]:
from skyllh.analyses.i3.publicdata_ps.time_integrated_ps import create_analysis

ana = create_analysis(cfg=cfg, datasets=datasets, source=source, energy_range=(1e2, 1e3))
2026-05-21 17:30:08,046 MainProcess skyllh.analyses.i3.publicdata_ps.time_integrated_ps INFO: SourceHypoGroupManager
    Source Hypothesis Groups:
        0: SourceHypoGroup:
            sources (1):
                0: PointLikeSource: "4549661024": { ra=40.670 deg, dec=-0.010 deg }
            fluxmodel:
                1.000e+00 * (E / (1000 GeV))^-2 * 1 (GeV cm^2 s)^-1
            detector signal yield builders (1):
                PDSingleParamFluxPointLikeSourceI3DetSigYieldBuilder
            signal generation method:
                NoneType
2026-05-21 17:30:08,047 MainProcess skyllh.analyses.i3.publicdata_ps.time_integrated_ps INFO: ParameterModelMapper: 2 global parameters, 2 models (1 source)
    Parameters:
        ns [floating (0 <= 10 <= 1000)]
            in models:
            - IceCube: ns

        gamma [floating (1 <= 3 <= 4)]
            in models:
            - 4549661024: gamma

100%|██████████| 33/33 [00:00<00:00, 604.09it/s]
2026-05-21 17:30:10,743 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator INFO: Energy range for signal generation set to (100.0, 1000.0) GeV (effective bin-aligned range: (np.float64(2.0), np.float64(3.0)) in log10(E/GeV)).
100%|██████████| 33/33 [00:00<00:00, 604.03it/s]
2026-05-21 17:30:13,361 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator INFO: Energy range for signal generation set to (100.0, 1000.0) GeV (effective bin-aligned range: (np.float64(2.0), np.float64(3.0)) in log10(E/GeV)).
100%|██████████| 33/33 [00:00<00:00, 600.33it/s]
2026-05-21 17:30:15,868 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator INFO: Energy range for signal generation set to (100.0, 1000.0) GeV (effective bin-aligned range: (np.float64(2.0), np.float64(3.0)) in log10(E/GeV)).
100%|██████████| 33/33 [00:00<00:00, 597.62it/s]
2026-05-21 17:30:19,461 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator INFO: Energy range for signal generation set to (100.0, 1000.0) GeV (effective bin-aligned range: (np.float64(2.0), np.float64(3.0)) in log10(E/GeV)).
100%|██████████| 4/4 [00:11<00:00,  2.85s/it]
100%|██████████| 136/136 [00:00<00:00, 10083.89it/s]

The ana instance has a generate_signal_events method, which can be used to generate signal-like pseudo-data. The function takes the RandomStateService instance and the mean number of events to be generated as its arguments along with a few optional arguments. The actual number of generated events is drawn from a Poisson distribution with mean value mean_n_sig.

It returns:

  1. The actual number of generated events;

  2. The list of the number of signal events that have been generated for each data set;

  3. The list of instance of DataFieldRecordArray containing the signal data events for each data set. An entry is None, if no signal events were generated for this particular data set.

Here below we generate 100 signal-like events:

[3]:
help(ana.generate_signal_events)
Help on method generate_signal_events in module skyllh.core.analysis:

generate_signal_events(
    rss,
    mean_n_sig,
    sig_kwargs=None,
    n_events_list=None,
    events_list=None,
    tl=None
) method of skyllh.core.analysis.SingleSourceMultiDatasetLLHRatioAnalysis instance
    Generates signal events utilizing the signal generator.

    Parameters
    ----------
    rss : instance of RandomStateService
        The instance of RandomStateService to use for generating random
        numbers.
    mean_n_sig : float
        The mean number of signal events that should be generated for the
        trial. The actual number of generated events will be drawn from a
        Poisson distribution with this given signal mean as mean.
    sig_kwargs : dict | None
        Additional keyword arguments for the ``generate_signal_events``
        method of the ``sig_generator_cls`` class. An usual keyword argument
        is ``poisson``.
    n_events_list : list of int | None
        If given, it specifies the number of events of each data set already
        present and the number of signal events will be added.
    events_list : list of instance of DataFieldRecordArray | None
        If given, it specifies the events of each data set already present
        and the signal events will be added.
    tl : instance of TimeLord | None
        The instance of TimeLord that should be used to time individual
        tasks of this method.

    Returns
    -------
    n_sig : int
        The actual number of injected signal events.
    n_events_list : list of int
        The list of the number of signal events that have been generated for
        each data set.
    events_list : list of instance of DataFieldRecordArray
        The list of instance of DataFieldRecordArray containing the
        signal data events for each data set. An entry is None, if no signal
        events were generated for this particular data set.

[4]:
from skyllh.core.random import RandomStateService

mean_n_sig = 100

rss = RandomStateService(seed=1)
n_sig_erange, n_events_list_erange, events_list_erange = ana.generate_signal_events(rss=rss, mean_n_sig=mean_n_sig)
2026-05-21 17:30:19,503 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator WARNING: No `cut_sindec` has been specified. The energy cut will be applied in [-90, 90] deg.
2026-05-21 17:30:19,504 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator WARNING: No `cut_sindec` has been specified. The energy cut will be applied in [-90, 90] deg.
2026-05-21 17:30:19,505 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator WARNING: No `cut_sindec` has been specified. The energy cut will be applied in [-90, 90] deg.
2026-05-21 17:30:19,507 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator WARNING: No `cut_sindec` has been specified. The energy cut will be applied in [-90, 90] deg.
2026-05-21 17:30:19,508 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator WARNING: No `cut_sindec` has been specified. The energy cut will be applied in [-90, 90] deg.

It is also possible to change the energy range to generate signal events without having to reinitialise the analysis instance ana. By re-defining ana.energy_range, one can change what ana.generate_signal_events() uses as the energy range.

By setting ana.energy_range to None, we fall back to the full energy range of the analysis, which is the maximum energy range covered in the smearing matrix, that is (1e2 - 1e9) GeV.

[5]:
rss = RandomStateService(seed=1)
ana.energy_range = None
n_sig, n_events_list, events_list = ana.generate_signal_events(rss=rss, mean_n_sig=mean_n_sig)
2026-05-21 17:30:19,519 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator WARNING: No `cut_sindec` has been specified. The energy cut will be applied in [-90, 90] deg.
2026-05-21 17:30:19,520 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator WARNING: No `cut_sindec` has been specified. The energy cut will be applied in [-90, 90] deg.
2026-05-21 17:30:19,521 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator WARNING: No `cut_sindec` has been specified. The energy cut will be applied in [-90, 90] deg.
2026-05-21 17:30:19,525 MainProcess skyllh.analyses.i3.publicdata_ps.signal_generator.PDDatasetSignalGenerator WARNING: No `cut_sindec` has been specified. The energy cut will be applied in [-90, 90] deg.

Now we can look at the true neutrino energy of the generate pseudo-data events and check that they belong to their respective energy ranges.

[6]:
print(events_list[0])
DataFieldRecordArray: 11 fields, 3 entries, 1 Kbytes
    fields = {
        isvalid        : {dtype: bool, vmin: 1.000e+00, vmax: True}
        log_true_energy: {dtype: float64, vmin: 2.208e+00, vmax: 4.61901412729572}
        log_energy     : {dtype: float64, vmin: 3.319e+00, vmax: 4.004816440590243}
        dec            : {dtype: float64, vmin: -1.067e-02, vmax: 0.018836588819249878}
        ra             : {dtype: float64, vmin: 6.821e-01, vmax: 0.6993641866096665}
        sin_dec        : {dtype: float64, vmin: -1.067e-02, vmax: 0.018835474915109864}
        ang_err        : {dtype: float64, vmin: 9.260e-03, vmax: 0.0205442374500656}
        time           : {dtype: float64, vmin: nan, vmax: nan}
        azi            : {dtype: float64, vmin: nan, vmax: nan}
        zen            : {dtype: float64, vmin: nan, vmax: nan}
        run            : {dtype: int64, vmin: -1.000e+00, vmax: -1}
    }
[7]:
log_true_energy = np.concatenate([events['log_true_energy'] for events in events_list])
injected_log_true_energy = np.concatenate([events['log_true_energy'] for events in events_list_erange])
[8]:
bins = np.linspace(2, 9, 56)

fig, ax = plt.subplots()
ax.hist(log_true_energy, bins=bins, histtype='step', label='Without energy range cut')
ax.hist(
    injected_log_true_energy,
    bins=bins,
    histtype='step',
    label='With (1e2, 1e3) GeV energy range cut',
)
ax.set_title('Generated signal events with gamma=2.0')
ax.set_xlabel('Log True Energy [GeV]')
ax.set_ylabel('Number of generated events')
ax.grid(True)
ax.legend()
plt.show()
../_images/tutorials_setting_an_energy_range_13_0.png

It is to be noted that the only thing that is being limited is the injected signal: We inject signals only in that energy range but the signal hypothesis in the likelihood still assumes the full energy range.