pisa.utils.hypersurface package

Submodules

pisa.utils.hypersurface.hyper_interpolator module

Classes and methods needed to do hypersurface interpolation over arbitrary parameters.

class pisa.utils.hypersurface.hyper_interpolator.HypersurfaceInterpolator(interpolation_param_spec, hs_fits, ignore_nan=True)[source]

Bases: object

Factory for interpolated hypersurfaces.

After being initialized with a set of hypersurface fits produced at different parameters, it uses interpolation to produce a Hypersurface object at a given point in parameter space using scipy’s RegularGridInterpolator.

The interpolation is piecewise-linear between points. All points must lie on a rectilinear ND grid.

Parameters:
  • interpolation_param_spec (dict) –

    Specification of interpolation parameter grid of the form (see the usage sketch after this parameter list):

        interpolation_param_spec = {
            'param1': {"values": [val1_1, val1_2, ...], "scales_log": True/False},
            'param2': {"values": [val2_1, val2_2, ...], "scales_log": True/False},
            ...
            'paramN': {"values": [valN_1, valN_2, ...], "scales_log": True/False},
        }

    where values are given as Quantity.

  • hs_fits (list of dict) – List of dicts with hypersurfaces that were fit at the points of the parameter mesh defined by interpolation_param_spec

  • ignore_nan (bool) – Ignore empty bins in hypersurfaces. The intercept in those bins is set to 1 and all slopes are set to 0.
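
For illustration, a minimal sketch of how such a grid specification might be built. The parameter name, values and units are purely hypothetical, and the availability of PISA's Pint unit registry as "from pisa import ureg" is an assumption.

    from pisa import ureg  # assumed import path for PISA's unit registry

    # Hypothetical grid over a single oscillation parameter; add further
    # entries for additional interpolation dimensions.
    interpolation_param_spec = {
        "deltam31": {
            "values": [2.3e-3 * ureg.eV**2, 2.5e-3 * ureg.eV**2, 2.7e-3 * ureg.eV**2],
            "scales_log": False,  # interpolate linearly in this parameter
        },
    }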

Notes

Be sure to provide a support that covers the entire relevant parameter range, and a good distance beyond! To prevent minimization failures caused by NaNs, extrapolation is used when hypersurfaces outside the support are requested, but needless to say these extrapolated numbers are unreliable.

See also

scipy.interpolate.RegularGridInterpolator

class used for interpolation

property binning
get_hypersurface(**param_kw)[source]

Get a Hypersurface object with interpolated coefficients.

Parameters:

**param_kw – Parameters are given as keyword arguments, where the names of the arguments must match the names of the parameters over which the hypersurfaces are interpolated. The values are given as Quantity objects with units.
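
A hedged usage sketch, assuming an existing HypersurfaceInterpolator instance named "interpolator" and that PISA's unit registry is importable as "from pisa import ureg"; the parameter name and value are illustrative.

    from pisa import ureg  # assumed import path

    # "deltam31" stands in for whichever parameter(s) the hypersurfaces were
    # interpolated over; the value must carry compatible units.
    hs = interpolator.get_hypersurface(deltam31=2.4e-3 * ureg.eV**2)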

property interpolation_param_names
property num_interp_params
property param_names
plot_fits_in_bin(bin_idx, ax=None, n_steps=20, **param_kw)[source]

Plot the coefficients as well as covariance matrix elements as a function of the interpolation parameters.

Parameters:
  • bin_idx (tuple) – index of the bin for which to plot the fits

  • ax (2D array of axes, optional) – axes into which to place the plots. If None (default), appropriate axes will be generated. Must have at least size (n_coeff, n_coeff + 1).

  • n_steps (int, optional) – number of steps to plot between minimum and maximum

  • **param_kw – Parameters to be fixed when producing slices. If the interpolation is in N-D, then (N-2) parameters need to be fixed to produce 2D plots of the remaining 2 parameters and (N-1) need to be fixed to produce a 1D slice.
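
A minimal sketch, assuming an interpolation over a single parameter (so no parameters need to be fixed) and a 2D binning so that bin indices are 2-tuples; the bin index and step count are illustrative.

    # Plot how the fitted coefficients in bin (0, 0) vary across the
    # interpolation parameter grid; axes are generated automatically.
    interpolator.plot_fits_in_bin(bin_idx=(0, 0), n_steps=50)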

pisa.utils.hypersurface.hyper_interpolator.assemble_interpolated_fits(fit_directory, output_file, drop_fit_maps=False, leftout_param=None, leftout_surface=None)[source]

After all of the fits on the cluster are done, assemble the results into one JSON file.

The JSON produced by this function is what load_interpolated_hypersurfaces expects.

pisa.utils.hypersurface.hyper_interpolator.get_incomplete_job_idx(fit_directory)[source]

Get job indices of fits that are not flagged as successful.

pisa.utils.hypersurface.hyper_interpolator.load_interpolated_hypersurfaces(input_file, expected_binning=None)[source]

Load a set of interpolated hypersurfaces from a file.

Analogously to “load_hypersurfaces”, this function returns a collection with a HypersurfaceInterpolator object for each Map.

Parameters:

input_file (str) –

A JSON input file as produced by fit_hypersurfaces if interpolation params were given. It has the form:

{
    'interpolation_param_spec': {
        'param1': {"values": [val1_1, val1_2, ...], "scales_log": True/False}
        'param2': {"values": [val2_1, val2_2, ...], "scales_log": True/False}
        ...
        'paramN': {"values": [valN_1, valN_2, ...], "scales_log": True/False}
    },
    'hs_fits': [
        <list of dicts where keys are map names such as 'nue_cc' and values
        are hypersurface states>
    ]
}

Returns:

dictionary with a HypersurfaceInterpolator for each map

Return type:

collections.OrderedDict
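
A short usage sketch; the file path and map key are illustrative placeholders, and the "from pisa import ureg" import path is an assumption.

    from pisa import ureg  # assumed import path
    from pisa.utils.hypersurface.hyper_interpolator import load_interpolated_hypersurfaces

    # Load one HypersurfaceInterpolator per map, then request a Hypersurface
    # at a point in the interpolation parameter space.
    interpolators = load_interpolated_hypersurfaces("hypersurfaces_interpolated.json")
    hs = interpolators["nue_cc"].get_hypersurface(deltam31=2.4e-3 * ureg.eV**2)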

pisa.utils.hypersurface.hyper_interpolator.pipeline_cfg_from_states(state_dict)[source]

Recover a pipeline cfg containing PISA objects from a raw state.

When a pipeline configuration is stored to JSON, the PISA objects turn into their serialized states. This function looks through the dictionary returned by from_json and recovers the PISA objects such as ParamSet and MultiDimBinning.

It should really become part of PISA file I/O functionality to read and write PISA objects inside dictionaries/lists into a JSON and be able to recover them…

pisa.utils.hypersurface.hyper_interpolator.prepare_interpolated_fit(nominal_dataset, sys_datasets, params, fit_directory, interpolation_param_spec, combine_regex=None, log=False, minimum_mc=0, **hypersurface_fit_kw)[source]

Writes steering files for fitting hypersurfaces on a grid of arbitrary parameters. The fits can then be run on a cluster with run_interpolated_fit.

Parameters:
  • nominal_dataset (dict) –

    Definition of the nominal dataset. Specifies the pipeline with which the maps can be created, and the values of all systematic parameters used to produce the dataset. Format must be:

        nominal_dataset = {
            "pipeline_cfg": <pipeline cfg file (either cfg file path or dict)>,
            "sys_params": { param_0_name: param_0_value_in_dataset, ..., param_N_name: param_N_value_in_dataset }
        }

    Sys params must correspond to the HypersurfaceParam instances provided in the params arg.

  • sys_datasets (list of dicts) – List of dicts, where each dict defines one of the systematics datasets to be fitted. The format of each dict is the same as explained for nominal_dataset

  • params (list of HypersurfaceParams) – List of HypersurfaceParam instances that define the hypersurface. Note that this defines ALL hypersurfaces fitted in this function, i.e. only a single parameterisation is supported for all maps (this is almost always what you want).

  • fit_directory (str) – Directory in which the fits will be run. Steering files for the fits to be run will be stored here.

  • combine_regex (list of str, or None) – List of string regex expressions that will be used for merging maps. Used to combine similar species. Must be something that can be passed to the MapSet.combine_re function (see that function's docs for more details). Choose None if you do not want to perform this merging.

  • interpolation_param_spec (collections.OrderedDict) –

    Specification of parameter grid that hypersurfaces should be interpolated over. The dict should have the following form:

    interpolation_param_spec = {
        'param1': {"values": [val1_1, val1_2, ...], "scales_log": True/False}
        'param2': {"values": [val2_1, val2_2, ...], "scales_log": True/False}
        ...
        'paramN': {"values": [valN_1, valN_2, ...], "scales_log": True/False}
    }
    

    The hypersurfaces will be fit on an N-dimensional rectilinear grid over parameters 1 to N. The flag scales_log indicates that the interpolation over that parameter should happen in log-space.

  • minimum_mc (int, optional) – Minimum number of un-weighted MC events required in each bin.

  • hypersurface_fit_kw (kwargs) – kwargs will be passed on to the calls to Hypersurface.fit
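
A hedged end-to-end sketch of preparing an interpolated fit. All file paths, parameter names, values and the "linear" functional form are illustrative placeholders, and the "from pisa import ureg" import path is an assumption. The grid points would then be fitted with run_interpolated_fit and combined with assemble_interpolated_fits (both documented in this module).

    from collections import OrderedDict

    from pisa import ureg  # assumed import path
    from pisa.utils.hypersurface.hyper_interpolator import prepare_interpolated_fit
    from pisa.utils.hypersurface.hypersurface import HypersurfaceParam

    # Placeholder dataset definitions (see the formats described above).
    nominal_dataset = {
        "pipeline_cfg": "settings/pipeline/nominal.cfg",
        "sys_params": {"dom_eff": 1.0},
    }
    sys_datasets = [
        {"pipeline_cfg": "settings/pipeline/dom_eff_0.9.cfg", "sys_params": {"dom_eff": 0.9}},
        {"pipeline_cfg": "settings/pipeline/dom_eff_1.1.cfg", "sys_params": {"dom_eff": 1.1}},
    ]
    params = [HypersurfaceParam(name="dom_eff", func_name="linear")]  # "linear" is a placeholder form

    # Interpolation grid over one hypothetical oscillation parameter.
    interpolation_param_spec = OrderedDict(
        deltam31={"values": [2.3e-3 * ureg.eV**2, 2.5e-3 * ureg.eV**2], "scales_log": False},
    )

    prepare_interpolated_fit(
        nominal_dataset=nominal_dataset,
        sys_datasets=sys_datasets,
        params=params,
        fit_directory="./hs_interp_fits",
        interpolation_param_spec=interpolation_param_spec,
        combine_regex=["nue.*_cc"],
    )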

pisa.utils.hypersurface.hyper_interpolator.run_interpolated_fit(fit_directory, job_idx, skip_successful=False)[source]

Run the hypersurface fit for a grid point.

If skip_successful is true, do not run if the fit_successful flag is already True.

pisa.utils.hypersurface.hyper_interpolator.serialize_pipeline_cfg(pipeline_cfg)[source]

Turn a pipeline configuration into something we can store to JSON.

It doesn't work by default because tuples are not allowed as keys when storing to JSON. All we do is turn the tuples into strings joined by a double underscore.
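
A minimal sketch of the key conversion described above (not the actual implementation; the helper names are hypothetical).

    # Hypothetical helpers: make tuple keys JSON-safe by joining their parts
    # with a double underscore, leaving all other keys untouched.
    def _stringify_key(key):
        return "__".join(key) if isinstance(key, tuple) else key

    def serialize_keys_sketch(pipeline_cfg):
        return {_stringify_key(k): v for k, v in pipeline_cfg.items()}

    # e.g. {("data", "simple_data_loader"): {...}} becomes
    #      {"data__simple_data_loader": {...}}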

pisa.utils.hypersurface.hypersurface module

Tools for working with hypersurfaces, which are continuous functions in N-D with arbitrary functional forms.

Hypersurfaces can be used to model systematic uncertainties derived from discrete simulation datasets, for example for detector uncertainties.

class pisa.utils.hypersurface.hypersurface.Hypersurface(params, initial_intercept=None, log=False)[source]

Bases: object

A class defining the hypersurface

Contains:
  • A single common intercept

  • N systematic parameters, inside which the functional form is defined

This class can be configured to hold both the functional form of the hypersurface and values (likely fitted from simulation datasets) for the free parameters of this functional form.

Fitting functionality is provided to fit these free parameters.

This class can simultaneously hold hypersurfaces for every bin in a histogram (Map).

The functional form of the systematic parameters can be arbitrarily complex.

The class has a fit method for fitting the hypersurface to some data (e.g. discrete systematics sets).

Serialization functionality is included to allow fitted hypersurfaces to be stored to a file and re-loaded later (e.g. to be used in analysis).

The main use cases are:
  1. Fit hypersurfaces
    • Define the desired HypersurfaceParams (functional form, initial coefficient guesses).

    • Instantiate the Hypersurface class, providing the hypersurface params and initial intercept guess.

    • Use Hypersurface.fit function (or more likely the fit_hypersurfaces helper function provided below), to fit the hypersurface coefficients to some provided datasets.

    • Store to file

  2. Evaluate an existing hypersurface
    • Load existing fitted Hypersurface from a file (load_hypersurfaces helper function)

    • Get the resulting hypersurface value for each bin for a given set of systematic param values using the Hypersurface.evaluate method.

    • Use the hypersurface value for each bin to re-weight events

The class stores information about the datasets used to fit the hypersurfaces, including the Maps used and nominal and systematic parameter values.

Parameters:
  • params (list) – A list of HypersurfaceParam instances defining the hypersurface. The initial_fit_coeffts values in these instances will be used as the starting point for any fits.

  • initial_intercept (float) – Starting point for the hypersurface intercept in any fits

  • log (bool, optional) – Set hypersurface to log mode. The surface is fit to the log of the bin counts. The fitted surface is exponentiated during evaluation. Default: False
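
A hedged construction sketch; the parameter names and the "linear" functional form are illustrative placeholders (see HypersurfaceParam below for the available options).

    from pisa.utils.hypersurface.hypersurface import Hypersurface, HypersurfaceParam

    # Two hypothetical systematic parameters with a placeholder functional
    # form, and an intercept starting at 1.
    params = [
        HypersurfaceParam(name="dom_eff", func_name="linear"),
        HypersurfaceParam(name="hole_ice", func_name="linear"),
    ]
    hypersurface = Hypersurface(params=params, initial_intercept=1.0)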

evaluate(param_values, bin_idx=None, return_uncertainty=False)[source]

Evaluate the hypersurface, using the systematic parameter values provided. Uses the current internal values for all functional form coefficients.

Parameters:
  • param_values (dict) –

    A dict specifying the values of the systematic parameters to use in the evaluation. Format is:

    { sys_param_name_0 : sys_param_0_val, …, sys_param_name_N : sys_param_N_val }. The keys must be strings and correspond to the HypersurfaceParam instances. The values must be scalars.

  • bin_idx (tuple or None) – Optionally specify a particular bin (using numpy indexing). Otherwise all bins will be evaluated.

  • return_uncertainty (bool, optional) – return the uncertainty on the output (default: False)
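
A usage sketch, assuming the hypersurface constructed in the sketch above has already been fitted; the parameter values are illustrative.

    # Evaluate every bin at the given systematic parameter values; pass
    # return_uncertainty=True to also obtain the per-bin uncertainty.
    scale_factors = hypersurface.evaluate({"dom_eff": 1.05, "hole_ice": 0.25})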

fit(nominal_map, nominal_param_values, sys_maps, sys_param_values, norm=True, method='L-BFGS-B', fix_intercept=False, intercept_bounds=None, intercept_sigma=None, include_empty=False, keep_maps=True, ref_bin_idx=None, smooth_method=None, smooth_kw=None)[source]

Fit the hypersurface coefficients (in every bin) to best match the provided nominal and systematic datasets.

Writes the results directly into this data structure.

Parameters:
  • nominal_map (Map) – Map from the nominal dataset

  • nominal_param_values (dict) – Value of each systematic param used to generate the nominal dataset Format: { param_0_name : param_0_nom_val, …, param_N_name : param_N_nom_val }

  • sys_maps (list of Maps) – List containing the Map from each systematic dataset

  • sys_param_values (list of dicts) – List where each element is a dict containing the values of each systematic param used to generate that dataset. Each list element specifies the parameters for the corresponding element in sys_maps

  • norm (bool) – Normalise the maps to the nominal map. This is what you want to do when using the hypersurface to re-weight simulation (which is the main use case). In principle the hypersurfaces are more general though and could be used for other tasks too, hence this option.

  • method (str) – method arg to pass to scipy.optimize.minimize

  • fix_intercept (bool) – Fix intercept to the initial intercept.

  • intercept_bounds (2-tuple, optional) – Bounds on the intercept. Default is None (no bounds)

  • include_empty (bool) – Include empty bins in the fit. If True, empty bins are included with value 0 and sigma 1. Default: False

  • keep_maps (bool) – Keep maps used to make the fit. If False, maps will be set to None after the fit is complete. This helps to reduce the size of JSONS if the Hypersurface is to be stored on disk.

  • ref_bin_idx (tuple) – An index specifying a reference bin that will be used for logging
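
For most users the fit_hypersurfaces helper below is the entry point, but a direct call might look like the following sketch. Here nominal_map and sys_maps stand for pisa Map instances obtained from the nominal and systematic pipelines, and all parameter names and values are illustrative.

    # Fit the per-bin coefficients to the provided nominal and systematic maps.
    hypersurface.fit(
        nominal_map=nominal_map,
        nominal_param_values={"dom_eff": 1.0, "hole_ice": 0.25},
        sys_maps=sys_maps,
        sys_param_values=[
            {"dom_eff": 0.9, "hole_ice": 0.25},
            {"dom_eff": 1.1, "hole_ice": 0.25},
            {"dom_eff": 1.0, "hole_ice": 0.15},
        ],
        norm=True,
    )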

property fit_coefft_labels

Return labels for each fit coefficient

property fit_coeffts

Return all coefficients, in all bins, as a single array. This is the overall intercept, plus the coefficients for each individual param. Dimensions are: [binning …, fit coeffts]

property fit_maps

Return the Map instances used for fitting. These will be normalised if the fit was performed to normalised maps.

property fit_param_values

Return the stored systematic parameters from the datasets used for fitting. Returns: { param_0_name : [ param_0_sys_val_0, …, param_0_sys_val_M ], …, param_N_name : [ param_N_sys_val_0, …, param_N_sys_val_M ] }

fluctuate(random_state=None)[source]

Return a new hypersurface object whose coefficients have been randomly fluctuated according to the fit covariance matrix.

Used for testing the impact of statistical uncertainty in the hypersurface fits on downstream analyses.

classmethod from_state(state)[source]

Instantiate a new object from the contents of a serialized state dict

Parameters:

state (dict) – A dict containing the serialized state

See also

to_json

get_nominal_mask()[source]

Return a mask indicating which datasets have nominal values for all parameters

get_on_axis_mask(param_name)[source]

Return a mask indicating which datasets are “on-axis” for a given parameter.

“On-axis” means “generated using the nominal value for this parameter”. Parameters other than the one specified can have non-nominal values.

Parameters:

param_name (str) – The name of the systematic parameter for which we want on-axis datasets

property initialized

Return flag indicating if the hypersurface has been initialized. Not giving users direct write-access to the variable as they should not be setting it themselves.

property nominal_values

Return the stored nominal parameter value for each dataset. Returns: { param_0_name : param_0_nom_val, …, param_N_name : param_N_nom_val }

property num_fit_coeffts

Return the total number of coefficients in the hypersurface fit. This is the overall intercept, plus the coefficients for each individual param.

property num_fit_sets

Return number of datasets used for fitting

property param_names

Return the (ordered) names of the systematic parameters

report(bin_idx=None)[source]

Return a string version of the hypersurface contents

Parameters:

bin_idx (tuple or None) – Specify a particular bin (using numpy indexing). In this case only report on that bin.

property serializable_state

OrderedDict containing savable state attributes

class pisa.utils.hypersurface.hypersurface.HypersurfaceParam(name, func_name, initial_fit_coeffts=None, bounds=None, coeff_prior_sigma=None)[source]

Bases: object

A class representing one of the parameters (and corresponding functional forms) in the hypersurface.

A user creates the initial instances of these params, before passing them to the Hypersurface instance. Once this has happened, the user typically does not need to directly interact with these HypersurfaceParam instances.

Parameters:
  • name (str) – Name of the parameter

  • func_name (str) – Name of the hypersurface function to use. See “Hypersurface functional forms” section for more details, including available functions.

  • initial_fit_coeffts (array) – Initial values for the coefficients of the functional form. The number and meaning of the coefficients depend on the functional form

  • bounds (2-tuple of array_like, optional) – Lower and upper bounds on independent variables. Defaults to no bounds. Each element of the tuple must be either an array with the length equal to the number of parameters, or a scalar (in which case the bound is taken to be the same for all parameters.) Use np.inf with an appropriate sign to disable bounds on all or some parameters.

  • coeff_prior_sigma (array, optional) – Prior sigma values for the coefficients. If None (default), no regularization will be applied during the fit.
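
A hedged construction sketch; the parameter name and the "linear" functional form are placeholders, and a single coefficient (a slope) is assumed even though the coefficient count depends on the chosen form.

    import numpy as np

    from pisa.utils.hypersurface.hypersurface import HypersurfaceParam

    param = HypersurfaceParam(
        name="dom_eff",               # illustrative parameter name
        func_name="linear",           # placeholder functional form
        initial_fit_coeffts=[0.0],    # start the (assumed single) slope at zero
        bounds=(-np.inf, np.inf),     # effectively unbounded
        coeff_prior_sigma=[1.0],      # optional Gaussian regularisation
    )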

evaluate(param, out, bin_idx=None)[source]

Evaluate the functional form for the given param values. Uses the current values of the fit coefficients.

By default evaluates all bins, but optionally can specify a particular bin (used when fitting).

classmethod from_state(state)[source]
get_fit_coefft(*args, **kwargs)[source]

Get fit coefficient values from the matrix. Basically just wraps the indexing function

get_fit_coefft_idx(bin_idx=None, coefft_idx=None)[source]

Indexing the fit_coefft matrix is a bit of a pain. This helper function eases things

gradient(param, out, bin_idx=None)[source]

Evaluate gradient of the functional form for the given param values. Uses the current values of the fit coefficients.

By default evaluates all bins, but optionally can specify a particular bin (used when fitting).

property serializable_state

OrderedDict containing savable state attributes

pisa.utils.hypersurface.hypersurface.fit_hypersurfaces(nominal_dataset, sys_datasets, params, output_dir, tag, combine_regex=None, log=True, minimum_mc=0, minimum_weight=0, **hypersurface_fit_kw)[source]

A helper function that a user can use to fit hypersurfaces to a bunch of simulation datasets, and save the results to a file. Basically a wrapper of Hypersurface.fit, handling common pre-fitting tasks like producing mapsets from pipelines, merging maps from similar species, etc.

Note that this supports fitting multiple hypersurfaces to the datasets, e.g. one per simulated species. Returns a dict with format: { map_0_key : map_0_hypersurface, …, map_N_key : map_N_hypersurface, }

Parameters:
  • nominal_dataset (dict) –

    Definition of the nominal dataset. Specifies the pipeline with which the maps can be created, and the values of all systematic parameters used to produce the dataset. Format must be:

        nominal_dataset = {
            "pipeline_cfg": <pipeline cfg file (either cfg file path or dict)>,
            "sys_params": { param_0_name: param_0_value_in_dataset, ..., param_N_name: param_N_value_in_dataset }
        }

    Sys params must correspond to the HypersurfaceParam instances provided in the params arg.

  • sys_datasets (list of dicts) – List of dicts, where each dict defines one of the systematics datasets to be fitted. The format of each dict is the same as explained for nominal_dataset

  • params (list of HypersurfaceParams) – List of HypersurfaceParam instances that define the hypersurface. Note that this defines ALL hypersurfaces fitted in this function, i.e. only a single parameterisation is supported for all maps (this is almost always what you want).

  • output_dir (str) – Path to directory to write the results file in

  • tag (str) – A string identifier that will be included in the file name to help you make sense of the file in the future. Note that additional information on the contents will be added to the file name by this function.

  • combine_regex (list of str, or None) – List of string regex expressions that will be used for merging maps. Used to combine similar species. Must be something that can be passed to the MapSet.combine_re function (see that function's docs for more details). Choose None if you do not want to perform this merging.

  • minimum_mc (int, optional) – Minimum number of unweighted MC events required in each bin. If the number of unweighted MC events in a bin in any MC set is less than this number, the value is set to exactly zero and will be excluded from the fit.

  • minimum_weight (float, optional) – Minimum weight per bin. Bins with a total summed weight of less than this number are excluded from the fit. Intended use is to exclude extremely small values from KDE histograms that would pull the fit to zero.

  • hypersurface_fit_kw (kwargs) – kwargs will be passed on to the calls to Hypersurface.fit
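
A hedged usage sketch; all paths, parameter names, values and the "linear" functional form are illustrative placeholders.

    from pisa.utils.hypersurface.hypersurface import HypersurfaceParam, fit_hypersurfaces

    # Placeholder dataset definitions (see the formats described above).
    nominal_dataset = {
        "pipeline_cfg": "settings/pipeline/nominal.cfg",
        "sys_params": {"dom_eff": 1.0, "hole_ice": 0.25},
    }
    sys_datasets = [
        {"pipeline_cfg": "settings/pipeline/dom_eff_0.9.cfg",
         "sys_params": {"dom_eff": 0.9, "hole_ice": 0.25}},
        {"pipeline_cfg": "settings/pipeline/hole_ice_0.15.cfg",
         "sys_params": {"dom_eff": 1.0, "hole_ice": 0.15}},
    ]
    params = [
        HypersurfaceParam(name="dom_eff", func_name="linear"),
        HypersurfaceParam(name="hole_ice", func_name="linear"),
    ]

    fit_hypersurfaces(
        nominal_dataset=nominal_dataset,
        sys_datasets=sys_datasets,
        params=params,
        output_dir="./hypersurface_fits",
        tag="example",
        combine_regex=["nue.*_cc", "numu.*_cc"],
    )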

pisa.utils.hypersurface.hypersurface.load_hypersurfaces(input_file, expected_binning=None)[source]

User function to load a file containing hypersurface fits, as written using fit_hypersurfaces. There can be multiple hypersurfaces associated with different maps.

Returns a dict with the format: { map_0_key : map_0_hypersurface, …, map_N_key : map_N_hypersurface, }

Handles the following input file cases:
  1. Load files produced using this code (recommended)

  2. Load files produced using older versions of PISA

  3. Load CSV-formatted files from public data releases

Parameters:
  • input_file (str) – Path to the file containing the hypersurface fits. For the special case of the data releases, this needs to be the path to all relevant CSV files, e.g. “<path/to/datarelease>/hyperplanes_*.csv”.

  • expected_binning (One/MultiDimBinning) – (Optional) Expected binning for the hypersurface. It will be checked/enforced that this matches the binning found in the parsed hypersurfaces. For certain legacy cases where binning info is not stored, this will be assumed to be the actual binning.
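
A short usage sketch; the file path, map key and parameter values are illustrative.

    from pisa.utils.hypersurface.hypersurface import load_hypersurfaces

    # Load fitted hypersurfaces and evaluate the per-bin scale factors for
    # one map at a given set of systematic parameter values.
    hypersurfaces = load_hypersurfaces("hypersurface_fits/example.json")
    scale_factors = hypersurfaces["nue_cc"].evaluate({"dom_eff": 1.05, "hole_ice": 0.25})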

pisa.utils.hypersurface.hypersurface_plotting module

Hypersurface Plotting functions

pisa.utils.hypersurface.hypersurface_plotting.plot_bin_fits(ax, hypersurface, bin_idx, param_name, color=None, label=None, hs_label=None, show_nominal=False, show_offaxis=True, show_onaxis=True, show_zero=False, show_uncertainty=True, xlim=None)[source]

Plot the hypersurface for a given bin, in 1D w.r.t. a single specified parameter. Plots the following:

  • on-axis data points used in the fit

  • hypersurface w.r.t. the specified parameter (1D)

  • nominal value of the specified parameter

Parameters:
  • ax (matplotlib.Axes) – matplotlib ax to draw the plot on

  • hypersurface (Hypersurface) – Hypersurface to make the plots from

  • bin_idx (tuple) – Index (numpy array indexing format) of the bin to plot

  • param_name (str) – Name of the parameter of interest

  • color (str) – color to use for hypersurface curve

  • label (str) – label to use for hypersurface curve

  • show_nominal (bool) – Indicate the nominal value of the param on the plot

  • show_uncertainty (bool) – Indicate the hypersurface uncertainty on the plot

  • show_onaxis (bool) – Plot the “on-axis” input datasets (meaning those whose only off-nominal parameter is the one being plotted).

  • show_offaxis (bool) – Plot the “off-axis” input datasets (meaning those with multiple off-nominal parameter values).

  • xlim (tuple or None) – Optionally, specify the xlim to span when plotting the hypersurface If not specified, will span all input datasets
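
A hedged plotting sketch, reusing the hypersurfaces dict loaded in the previous example; the map key, bin index and parameter name are illustrative.

    import matplotlib.pyplot as plt

    from pisa.utils.hypersurface.hypersurface_plotting import plot_bin_fits

    # Draw the 1D fit for bin (0, 0) with respect to a hypothetical parameter.
    fig, ax = plt.subplots()
    plot_bin_fits(
        ax=ax,
        hypersurface=hypersurfaces["nue_cc"],
        bin_idx=(0, 0),
        param_name="dom_eff",
        show_nominal=True,
    )
    fig.savefig("bin_fit_dom_eff.png")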

pisa.utils.hypersurface.hypersurface_plotting.plot_bin_fits_2d(ax, hypersurface, bin_idx, param_names)[source]

Plot the hypersurface for a given bin, in 2D w.r.t. a pair of params. Plots the following:

  • All data points used in the fit

  • hypersurface w.r.t. the specified parameters (2D)

  • nominal value of the specified parameters

Parameters:
  • ax (matplotlib.Axes) – matplotlib ax to draw the plot on

  • hypersurface (Hypersurface) – Hypersurface to make the plots from

  • bin_idx (tuple) – Index (numpy array indexing format) of the bin to plot

  • param_names (list of str) – List containing the names of the two parameters of interest

Module contents