PISA stage modes

Every PISA stage of a pipeline has a calc_mode and an apply_mode. Both instance attributes specify the “representation” in which generic data (e.g., neutrino MC events) is processed through the pipeline. Often calculations can be faster when performed on grids, but we need to be careful in order to ensure that we are not introducing large errors.

More specifically, calc_mode by default defines the representation during the setup()and compute() steps, and apply_mode that during the apply step, see the stages readme. The latter two steps are executed successively whenever a given stage instance is run(), during the pipeline output calculation. Like this, complex event-by-event calculations (e.g., oscillation probabilities) can for example be executed during the compute() step, which also has a basic caching mechanism to avoid redundant calculations. The apply() step typically performs simple transformations (using results of a preceding compute() step or not) of the data in the representation determined by apply_mode. Take a look at different stage implementations (“services”) and example pipeline configuration files to get a better feel for the concept.

Note that you can change the modes on runtime, but after doing so need to setup() the stage or pipeline again (exercise: can you find a service which only defines its setup() step, but neither compute() nor apply()?).

If the output representation of a stage is different than what, for example, the next stage needs to have as input, the output is automatically translated by PISA (translation between data representations). So you can mix and match, but be aware that translations will introduce computational cost and hence may slow things down.

Service implementation reference

This auto-generated table documents which methods each service implements, which representations it supports, as well as any deviant behavior or properties to be aware of. It currently only contains services that can be instantiated without importing optional PISA dependencies.

Legend:

  • has_setup/compute/apply: whether the service overrides setup/compute/apply_function (✓ = yes, ✗ = no)

  • calc_mode support: allowed representations during setup/compute steps

  • apply_mode support: allowed representations during apply step

  • Notes: special behaviors, restrictions, problems, etc.

Service

has_setup

has_compute

has_apply

calc_mode support

apply_mode support

Notes

absorption.earth_absorption

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

aeff.aeff

None

“events”, “log_events”, MultiDimBinning

aeff.param

None

“events”, “log_events”, MultiDimBinning

aeff.weight

None

“events”, “log_events”, MultiDimBinning

aeff.weight_hnl

None

“events”, “log_events”, MultiDimBinning

background.atm_muons

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

cont_sys.snowstorm_hist

“events”

None, MultiDimBinning

data.csv_data_hist

“events”, “log_events”, MultiDimBinning

None

⚠️ Implicit caching (#821), setup only

data.csv_icc_hist

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

data.csv_loader

“events”

“events”

data.grid

“events”

“events”, “log_events”, MultiDimBinning

data.meows_loader

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

data.simple_data_loader

None

“events”

data.sqlite_loader

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

data.toy_event_generator

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

discr_sys.csv_hypersurfaces

MultiDimBinning

MultiDimBinning, “events”

discr_sys.hypersurfaces

MultiDimBinning

“events”, “log_events”, MultiDimBinning

discr_sys.ultrasurfaces

“events”

“events”, “log_events”, MultiDimBinning

flux.astrophysical

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

flux.barr_simple

“events”, “log_events”, MultiDimBinning

None

⚠️ Implicit caching (#821)

flux.daemon_flux

“events”, “log_events”, MultiDimBinning

None

⚠️ Implicit caching (#821)

flux.hillasg

“events”, “log_events”, MultiDimBinning

None

⚠️ Implicit caching (#821)

flux.honda_ip

“events”, “log_events”, MultiDimBinning

None

⚠️ Implicit caching (#821)

flux.mceq_barr

“events”, “log_events”, MultiDimBinning

None

⚠️ Implicit caching (#821)

flux.mceq_barr_red

“events”, “log_events”, MultiDimBinning

None

⚠️ Implicit caching (#821)

likelihood.generalized_llh_params

“events”, “log_events”, MultiDimBinning

MultiDimBinning

osc.decoherence

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

osc.globes

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

osc.prob3

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

osc.two_nu_osc

None

“events”, “log_events”, MultiDimBinning

reco.resolutions

“events”

None

⚠️ Implicit caching (#821), setup only

reco.simple_param

“events”, “log_events”, MultiDimBinning

None

⚠️ Implicit caching (#821), setup only

utils.add_indices

“events”

MultiDimBinning

⚠️ Implicit caching (#821), setup only

utils.adhoc_sys

“events”

“events”

utils.bootstrap

“events”

“events”, “log_events”, MultiDimBinning

utils.fix_error

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

utils.hist

MultiDimBinning, “events”

None, MultiDimBinning

utils.kde

“events”

MultiDimBinning

utils.kfold

“events”

“events”, “log_events”, MultiDimBinning

utils.set_variance

MultiDimBinning

MultiDimBinning

xsec.correct_charm_y

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

xsec.dis_sys

“events”

“events”, “log_events”, MultiDimBinning

xsec.genie_sys

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

xsec.nutau_xsec

“events”, “log_events”, MultiDimBinning

“events”, “log_events”, MultiDimBinning

import numpy as np
import matplotlib.pyplot as plt
from pisa.core.pipeline import Pipeline

We will configure our neutrino pipeline in 3 different ways:

  • The standard form with some calculation on grids

  • All calculations on an event-by-event basis (the most correct, but by far slowest way)

  • All calculations on grids (usually faster for large event samples)

mixed_modes_model = Pipeline("settings/pipeline/IceCube_3y_neutrinos.cfg", profile=True)
events_modes_model = Pipeline("settings/pipeline/IceCube_3y_neutrinos.cfg", profile=True)

events_modes_model.stages[1].calc_mode = "events"
events_modes_model.stages[2].calc_mode = "events"
events_modes_model.stages[3].calc_mode = "events"

events_modes_model.setup()
grid_modes_model = Pipeline("settings/pipeline/IceCube_3y_neutrinos.cfg", profile=True)

true_binning = grid_modes_model.stages[1].calc_mode

for s in grid_modes_model.stages[:-2]:
    try:
        s.calc_mode = true_binning
    except:
        pass
    try:
        s.apply_mode = true_binning
    except:
        pass
grid_modes_model.stages[5].calc_mode = true_binning
grid_modes_model.setup()
mixed_modes_model
stage number name calc_mode apply_mode has setup has compute has apply # fixed params # free params
0 csv_loader events events True False True 0 0
1 honda_ip "true_allsky_fine": 200 (true_energy) x 200 (true_coszen) True True False 1 0
2 barr_simple "true_allsky_fine": 200 (true_energy) x 200 (true_coszen) True True False 1 4
3 prob3 "true_allsky_fine": 200 (true_energy) x 200 (true_coszen) events True True True 9 3
4 aeff events False False True 2 3
5 hist events "dragon_datarelease": 8 (reco_energy) x 8 (reco_coszen) x 2 (pid) True False True 0 0
6hypersurfaces "dragon_datarelease": 8 (reco_energy) x 8 (reco_coszen) x 2 (pid) "dragon_datarelease": 8 (reco_energy) x 8 (reco_coszen) x 2 (pid) True True True 0 5
events_modes_model
stage number name calc_mode apply_mode has setup has compute has apply # fixed params # free params
0 csv_loader events events True False True 0 0
1 honda_ip events True True False 1 0
2 barr_simple events True True False 1 4
3 prob3 events events True True True 9 3
4 aeff events False False True 2 3
5 hist events "dragon_datarelease": 8 (reco_energy) x 8 (reco_coszen) x 2 (pid) True False True 0 0
6hypersurfaces "dragon_datarelease": 8 (reco_energy) x 8 (reco_coszen) x 2 (pid) "dragon_datarelease": 8 (reco_energy) x 8 (reco_coszen) x 2 (pid) True True True 0 5
grid_modes_model
stage number name calc_mode apply_mode has setup has compute has apply # fixed params # free params
0 csv_loader events events True False True 0 0
1 honda_ip "true_allsky_fine": 200 (true_energy) x 200 (true_coszen) True True False 1 0
2 barr_simple "true_allsky_fine": 200 (true_energy) x 200 (true_coszen) True True False 1 4
3 prob3 "true_allsky_fine": 200 (true_energy) x 200 (true_coszen) "true_allsky_fine": 200 (true_energy) x 200 (true_coszen) True True True 9 3
4 aeff "true_allsky_fine": 200 (true_energy) x 200 (true_coszen) False False True 2 3
5 hist "true_allsky_fine": 200 (true_energy) x 200 (true_coszen) "dragon_datarelease": 8 (reco_energy) x 8 (reco_coszen) x 2 (pid) True False True 0 0
6hypersurfaces "dragon_datarelease": 8 (reco_energy) x 8 (reco_coszen) x 2 (pid) "dragon_datarelease": 8 (reco_energy) x 8 (reco_coszen) x 2 (pid) True True True 0 5

We can compare timings. Event-by-event it takes around 8 minutes! The two other modes around 25 seconds.

Note: To speed up the following get_outputs() calls, consider running this notebook after having set the environment variables PISA_TARGET="parallel" and PISA_NUM_THREADS = <some number of threads > 1>.

%%time
events = events_modes_model.get_outputs()
CPU times: user 6min 42s, sys: 36.8 ms, total: 6min 42s
Wall time: 6min 42s
%%time
mixed = mixed_modes_model.get_outputs()
CPU times: user 25.2 s, sys: 15.8 ms, total: 25.2 s
Wall time: 25.4 s
%%time
grid = grid_modes_model.get_outputs()
CPU times: user 21.8 s, sys: 8.19 ms, total: 21.8 s
Wall time: 21.5 s

We can see that in this configuration we probably have fine enough grids, such that differences are at the sub-percent level. This may or may not be acceptable for the specific analysis you want to do.

fig, ax = plt.subplots(2, 3, figsize=(20, 7))
plt.subplots_adjust(hspace=0.5)

e = events.combine_wildcard('*')
m = mixed.combine_wildcard('*') 
g = grid.combine_wildcard('*') 

e.plot(ax=ax[0,0], title="Events")
m.plot(ax=ax[0,1], title="Mixed")
((e - m)/e).plot(ax=ax[0,2], symm=True)

e.plot(ax=ax[1,0], title="Events")
g.plot(ax=ax[1,1], title="Grid")
((e - g)/e).plot(ax=ax[1,2], symm=True)
(<Figure size 2000x700 with 18 Axes>,
 <Axes: title={'center': '${\\nu_{\\rm all}} + {\\bar\\nu_{\\rm all}},{\\;}{\\rm PID}{\\;}bin{\\;}0$'}, xlabel='$E_{\\rm reco} \\; \\left( \\mathrm{GeV} \\right)$', ylabel='$\\cos{\\theta}_{\\rm reco}$'>,
 <matplotlib.collections.QuadMesh at 0x7f760edadd10>,
 None)
../_images/5fe623a088a76a2589ff4e42f6d1e8d062cf8d075c0a9f7395057786da4d68b7.png

In depth comparison of single maps:

fig, axes = plt.subplots(3,4, figsize=(24,10))
plt.subplots_adjust(hspace=0.5)
axes = axes.flatten()

diff = (events - mixed) / (events + 1e-7)

for m, ax in zip(diff, axes):
    m.plot(ax=ax, symm=True)
../_images/ab1c5349a6bcfed64cffe45ef510c401f3e1c4c1cc07ab2052abbbafb9cb26b5.png
fig, axes = plt.subplots(3,4, figsize=(24,10))
plt.subplots_adjust(hspace=0.5)
axes = axes.flatten()

diff = (events - grid) / (events + 1e-7)

for m, ax in zip(diff, axes):
    m.plot(ax=ax, symm=True)
../_images/25c066f1623ec7669f594372c1f08b43aa8996ac4a3296f3de3b6fad26ee9b8e.png