pisa.stages.utils package
Submodules
pisa.stages.utils.add_indices module
PISA module to prep incoming data into formats that are compatible with the mc_uncertainty likelihood formulation
This module takes in events containers from the pipeline and adds an array giving the index of the bin into which each event falls.
module structure imported from bootcamp example
- class pisa.stages.utils.add_indices.add_indices(**std_kwargs)[source]
Bases:
Stage
PISA Pi stage that maps each event to the index of the analysis bin into which it falls.
- Parameters:
params – foo : Quantity bar : Quantity with time dimension
Notes
- Input and calc specs are predetermined in the module (inputs from the config files will be disregarded).
- The stage appends an array quantity called bin_indices.
- The stage also appends an array mask to access events by bin index later in the pipeline.
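The idea of a per-event bin index and per-bin masks can be sketched with plain NumPy. This is a minimal illustration, not the stage's implementation: the binning, the event values, and the use of -1 for out-of-range events are all assumptions made for the example.

```python
import numpy as np

# Hypothetical 1D analysis binning: three energy bins (edges in GeV).
bin_edges = np.array([1.0, 10.0, 100.0, 1000.0])

# Hypothetical per-event values of the binned variable.
reco_energy = np.array([0.5, 3.0, 42.0, 250.0, 5000.0])

# np.digitize returns 0 below the first edge and len(edges) above the last
# edge; subtracting 1 gives zero-based bin numbers.
raw = np.digitize(reco_energy, bin_edges) - 1

# Flag events outside the binning with -1 (an assumed convention).
n_bins = len(bin_edges) - 1
bin_indices = np.where((raw >= 0) & (raw < n_bins), raw, -1)

# One boolean mask per bin, so later steps can select events by bin index.
bin_masks = [bin_indices == i for i in range(n_bins)]
```

With the index array and masks stored next to the events, a later step can pick out all events of a bin without repeating the bin lookup.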
pisa.stages.utils.adhoc_sys module
Stage to implement an ad-hoc systematic that corrects the discrepancy between data and MC in one particular variable. This can be used to check how large the impact of such a hypothetical systematic would be on the physics parameters of an analysis.
- class pisa.stages.utils.adhoc_sys.adhoc_sys(data=None, params=None, variable_name=None, scale_file=None, **std_kwargs)[source]
Bases:
Stage
Stage to re-weight events according to factors derived from post-fit data/MC comparisons. The comparisons are produced somewhere externally and stored as a JSON which encodes the binning that was used to make the comparison and the resulting scaling factors.
- Parameters:
variable_name (str) – Name of the variable to correct data/MC agreement for. The variable must be loaded in the data loading stage and it must be present in the loaded JSON file.
scale_file (str) – Path to the file which contains the binning and the scale factors. The JSON file must contain a dictionary that stores, for each variable, a 1D binning and an array of factors. This file is produced externally from PISA.
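The re-weighting described above can be sketched as follows. The JSON layout shown here, including the key names `bin_edges` and `factors`, is an assumption for illustration only; the actual file layout expected by the stage is not specified in this section. Leaving events outside the binning unscaled is likewise an assumed choice.

```python
import json
import numpy as np

# Hypothetical file content: per variable, a 1D binning and one factor per bin.
# The key names "bin_edges" and "factors" are assumptions.
scale_json = """
{
  "reco_energy": {
    "bin_edges": [0.0, 10.0, 20.0, 30.0],
    "factors": [0.9, 1.0, 1.2]
  }
}
"""
scales = json.loads(scale_json)

variable_name = "reco_energy"
edges = np.array(scales[variable_name]["bin_edges"])
factors = np.array(scales[variable_name]["factors"])

# Hypothetical events: values of the corrected variable and event weights.
values = np.array([5.0, 15.0, 25.0, 35.0])
weights = np.array([1.0, 2.0, 3.0, 4.0])

# Look up the bin of each event; events outside the binning keep factor 1.
idx = np.digitize(values, edges) - 1
inside = (idx >= 0) & (idx < len(factors))
event_factors = np.ones_like(weights)
event_factors[inside] = factors[idx[inside]]

new_weights = weights * event_factors
```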
pisa.stages.utils.bootstrap module
Make bootstrap samples of data.
This stage allows one to resample datasets to estimate MC uncertainties without having to decrease statistics. Bootstrap samples are produced by random selection with replacement, which is implemented in this stage by an equivalent re-weighting of events.
- class pisa.stages.utils.bootstrap.bootstrap(seed=None, **std_kwargs)[source]
Bases:
Stage
Stage to make bootstrap samples from input data.
- Parameters:
seed (int, optional) – Seed for the random number generator.
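The equivalence between drawing events with replacement and re-weighting them can be sketched in a few lines. This is a minimal illustration under assumed names (`bootstrap_weights` is hypothetical), not the stage's code: an event drawn k times in the resample is represented by multiplying its weight by k.

```python
import numpy as np

def bootstrap_weights(weights, seed=None):
    """Return weights equivalent to one bootstrap resample of the events."""
    rng = np.random.default_rng(seed)
    n = len(weights)
    # Draw n event indices with replacement ...
    drawn = rng.integers(0, n, size=n)
    # ... and count how often each event was drawn.
    counts = np.bincount(drawn, minlength=n)
    # An event drawn k times contributes k times its original weight.
    return weights * counts

# Hypothetical event weights.
weights = np.array([0.5, 1.0, 1.5, 2.0])
resampled = bootstrap_weights(weights, seed=0)
```

Because only the weights change, the event arrays themselves keep their length and order, and a fixed seed reproduces the same sample.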
- pisa.stages.utils.bootstrap.insert_bootstrap_after_data_loader(cfg_dict, seed=None)[source]
Given a pipeline configuration parsed with parse_pipeline_config, insert the bootstrap stage directly after the simple_data_loader stage and return the modified config dict.
- Parameters:
cfg_dict (collections.OrderedDict) – Pipeline configuration in the form of an ordered dictionary.
seed (int, optional) – Seed to be placed into the pipeline configuration.
- Returns:
A deepcopy of the original input cfg_dict with the configuration of the bootstrap stage inserted after the data loader.
- Return type:
collections.OrderedDict
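Inserting an entry directly after a given key of an ordered dictionary, on a deep copy, can be sketched as below. The helper `insert_after`, the tuple-shaped keys, and the stage settings are hypothetical stand-ins; the real structure produced by parse_pipeline_config may differ.

```python
from collections import OrderedDict
from copy import deepcopy

def insert_after(cfg, after_key, new_key, new_value):
    """Return a deep copy of cfg with (new_key, new_value) placed after after_key."""
    if after_key not in cfg:
        raise KeyError(after_key)
    out = OrderedDict()
    for key, value in cfg.items():
        out[key] = deepcopy(value)
        if key == after_key:
            out[new_key] = deepcopy(new_value)
    return out

# Hypothetical stand-in for a parsed pipeline configuration; the key layout
# and the settings are assumptions.
cfg = OrderedDict([
    (("data", "simple_data_loader"), {"events_file": "events.hdf5"}),
    (("utils", "hist"), {"unweighted": False}),
])
new_cfg = insert_after(
    cfg, ("data", "simple_data_loader"), ("utils", "bootstrap"), {"seed": 0}
)
```

Building a new dictionary rather than mutating the input keeps the original configuration usable for a pipeline without the extra stage.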
pisa.stages.utils.fix_error module
Stage to take the initial errors of the MC and keep them for the entire minimization.
Needed for the DRAGON nutau appearance analysis.
pisa.stages.utils.hist module
Stage to transform arrays with weights into actual histograms that represent event counts
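Turning weighted event arrays into histogram counts can be sketched with `numpy.histogram`. The binning and event values are hypothetical, and the sum of squared weights is shown only as one common variance estimate for weighted counts; it is not claimed to be what the stage stores.

```python
import numpy as np

# Hypothetical binning, event values, and event weights.
bin_edges = np.array([0.0, 1.0, 2.0, 3.0])
values = np.array([0.2, 0.7, 1.5, 2.1, 2.9])
weights = np.array([0.5, 1.5, 2.0, 1.0, 3.0])

# Weighted event counts: sum of weights of the events in each bin.
weighted_counts, _ = np.histogram(values, bins=bin_edges, weights=weights)

# Un-weighted event counts: number of events in each bin.
unweighted_counts, _ = np.histogram(values, bins=bin_edges)

# A common variance estimate for a weighted count: sum of squared weights.
variances, _ = np.histogram(values, bins=bin_edges, weights=weights**2)
```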
- class pisa.stages.utils.hist.hist(apply_unc_weights=False, unweighted=False, **std_kwargs)[source]
Bases:
Stage
Stage to histogram events
- Parameters:
unweighted (bool, default False) – Return un-weighted event counts in each bin
apply_unc_weights (bool, default False) –
Expected container keys are "weights" and "unc_weights" (the latter only if `apply_unc_weights` is set).
pisa.stages.utils.kde module
Stage to transform arrays with weights into KDE maps that represent event counts
- class pisa.stages.utils.kde.kde(bw_method='silverman', coszen_name='reco_coszen', oversample=10, coszen_reflection=0.25, alpha=0.1, stack_pid=True, stash_hists=False, bootstrap=False, bootstrap_niter=10, bootstrap_seed=None, linearize_log_dims=True, **std_kargs)[source]
Bases:
Stage
Stage to KDE-map events
- Parameters:
bw_method (string) – ‘scott’ or ‘silverman’ (see kde module)
coszen_name (string) – Binning name to identify the coszen bin that needs to undergo special treatment for reflection
oversample (int) – Evaluate the KDE at more points per bin; this takes longer but is more accurate
stash_hists (bool) – Evaluate KDE only once and stash the result. This effectively ignores all changes from earlier stages, but greatly increases speed. Useful for muons where only over-all weight and detector systematic variations matter, which can both be applied on the histograms after this stage.
bootstrap (bool) – Use the bootstrapping technique to estimate errors on the KDE histograms.
linearize_log_dims (bool) – If True (default), calculate the KDE for a dimension that is binned logarithmically on the logarithm of the sample values. This generally results in better agreement of the total normalization of the KDE’d histograms to the sum of weights.
Notes
Make sure enough events are present with reco energy below and above the binning range; otherwise events will only “bleed out” of the range.
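The effect of evaluating a KDE on the logarithm of the sample values, with several evaluation points per bin, can be sketched in plain NumPy. This is a stand-in and not the kde module used by the stage: the helper `weighted_gaussian_kde`, the simulated energies, and the binning are assumptions, and no reflection at boundaries is performed.

```python
import numpy as np

def weighted_gaussian_kde(samples, weights, points):
    """Evaluate a weighted 1D Gaussian KDE at points.

    The result integrates to sum(weights) over the whole real line.
    """
    # Effective sample size for weighted events.
    n_eff = weights.sum() ** 2 / (weights ** 2).sum()
    mean = np.average(samples, weights=weights)
    std = np.sqrt(np.average((samples - mean) ** 2, weights=weights))
    # Silverman's rule of thumb in one dimension.
    bandwidth = std * (0.75 * n_eff) ** (-0.2)
    z = (points[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels @ weights

# Hypothetical events: energies spread over orders of magnitude.
rng = np.random.default_rng(1)
energy = 10.0 ** rng.normal(1.5, 0.3, size=500)
weights = np.full(energy.size, 0.1)

# Logarithmic binning, wide enough that very little of the KDE falls outside.
edges = np.logspace(0.0, 3.0, 13)
n_bins = len(edges) - 1
oversample = 10

# Work in log10(energy): the log-spaced bins are uniform in this variable.
log_edges = np.log10(edges)
fine_edges = np.linspace(log_edges[0], log_edges[-1], n_bins * oversample + 1)
fine_centers = 0.5 * (fine_edges[:-1] + fine_edges[1:])
fine_width = fine_edges[1] - fine_edges[0]

density = weighted_gaussian_kde(np.log10(energy), weights, fine_centers)

# Midpoint-rule integral of the density over each analysis bin.
kde_hist = (density * fine_width).reshape(n_bins, oversample).sum(axis=1)
```

In this sketch the binning extends well beyond the samples, so the summed KDE histogram stays close to the sum of weights; with a binning that cuts into the sample range, part of the density would fall outside the bins.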
pisa.stages.utils.kfold module
Make K-folds of data.
This stage can be used to split MC into chunks of equal size and to select only one chunk to make histograms from. It uses the KFold class from scikit-learn to make “test” and “train” indices for the dataset and sets all weights in the “train” indices to zero. Optionally, weights can be re-scaled by the number of splits to renormalize the total rates.
- class pisa.stages.utils.kfold.kfold(n_splits, select_split=0, seed=None, renormalize=False, shuffle=False, save_mask=False, **std_kwargs)[source]
Bases:
Stage
Stage to make splits of the MC set and select one split to make histograms. The weights of all indices not belonging to the selected split are set to zero.
- Parameters:
n_splits (int) – Number of splits
select_split (int, optional) – Index of the split to select
seed (int, optional)
renormalize (bool, optional) – Re-scale weights by the number of splits
shuffle (bool, optional) – Shuffle indices before splitting
save_mask (bool, optional)
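The selection of one split by zeroing the weights of all other events can be sketched as follows. The stage itself uses scikit-learn's KFold; this NumPy-only stand-in uses `numpy.array_split` instead and does not reproduce KFold's exact shuffling. The function name `kfold_weights` and its return values are hypothetical.

```python
import numpy as np

def kfold_weights(weights, n_splits, select_split=0, seed=None,
                  shuffle=False, renormalize=False):
    """Zero the weights of all events outside the selected split."""
    indices = np.arange(len(weights))
    if shuffle:
        indices = np.random.default_rng(seed).permutation(indices)
    # Split the indices into n_splits chunks of (nearly) equal size.
    folds = np.array_split(indices, n_splits)
    keep = np.zeros(len(weights), dtype=bool)
    keep[folds[select_split]] = True
    out = np.where(keep, weights, 0.0)
    if renormalize:
        # Scale up the selected split so the total rate is roughly preserved.
        out = out * n_splits
    return out, keep

# Hypothetical event weights.
weights = np.ones(10)
new_weights, mask = kfold_weights(
    weights, n_splits=5, select_split=1, renormalize=True
)
```

Here two of the ten events survive with a weight of 5 each, so the total rate is unchanged after renormalization.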
pisa.stages.utils.resample module
Stage to transform binned data from one binning to another while also dealing with uncertainty estimates in a reasonable way. In particular, this allows up-sampling from a more coarse binning to a finer binning.
The implementation is similar to that of the hist stage, hence the over-writing of the apply method.
- class pisa.stages.utils.resample.ResampleMode(value)[source]
Bases:
Enum
Enumerates sampling methods of the resample stage.
- ARB = 3
- DOWN = 2
- UP = 1
- class pisa.stages.utils.resample.resample(scale_errors=True, **std_kwargs)[source]
Bases:
Stage
Stage to resample weighted MC histograms from one binning to another.
The origin binning is given in calc_mode and the output binning in apply_mode.
- Parameters:
scale_errors (bool, optional) – If True (default), apply scaling to errors.
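The down-sampling direction can be sketched for a 1D histogram whose fine bins merge in fixed groups. This is an illustration under assumptions, not the stage's code: the helper `downsample` is hypothetical, the bin errors are treated as independent so they add in quadrature, and the way the stage scales errors when up-sampling is not covered here.

```python
import numpy as np

def downsample(counts, errors, factor):
    """Merge every `factor` adjacent bins of a 1D histogram."""
    if len(counts) % factor != 0:
        raise ValueError("number of bins must be divisible by factor")
    merged_counts = counts.reshape(-1, factor).sum(axis=1)
    # Independent errors add in quadrature.
    merged_errors = np.sqrt((errors ** 2).reshape(-1, factor).sum(axis=1))
    return merged_counts, merged_errors

# Hypothetical fine-binned histogram with per-bin errors.
counts = np.array([4.0, 6.0, 9.0, 16.0])
errors = np.array([3.0, 4.0, 6.0, 8.0])
coarse_counts, coarse_errors = downsample(counts, errors, 2)
```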
pisa.stages.utils.set_variance module
Override errors and replace with manually chosen error fraction.