pisa.core package

Submodules

pisa.core.bin_indexing module

Functions to retrieve the bin index for a 1- to 3-dimensional sample.

Functions were adapted from translation.py

Notes

The binning convention in PISA (from numpy.histogramdd) is that the lower edge is inclusive and upper edge is exclusive for a given bin, except for the upper-most bin whose upper edge is also inclusive. Visually, for 1D:

[ bin 0 ) [ bin 1 ) … [ bin num_bins - 1 ]

First bin is index = 0 and last bin is index = (num_bins - 1)

  • Values below the lowermost-edge of any dimension’s binning return index = -1

  • NaN values return index = -1

  • Values above the uppermost-edge of any dimension’s binning return index = num_bins
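
The convention above can be sketched with plain numpy (a minimal illustration built on numpy.digitize; bin_index is a hypothetical helper, not PISA's actual implementation):

```python
import numpy as np

def bin_index(values, edges):
    """Bin indices per the PISA/numpy.histogramdd convention: lower edge
    inclusive, upper edge exclusive, except the upper-most bin whose upper
    edge is also inclusive.  Below-range or NaN -> -1, above-range -> num_bins."""
    values = np.asarray(values, dtype=float)
    edges = np.asarray(edges, dtype=float)
    num_bins = len(edges) - 1
    # np.digitize returns 0 below edges[0] and len(edges) at/above edges[-1]
    idx = np.digitize(values, edges) - 1
    idx[values == edges[-1]] = num_bins - 1  # upper-most edge is inclusive
    idx[values < edges[0]] = -1
    idx[values > edges[-1]] = num_bins
    idx[np.isnan(values)] = -1
    return idx

edges = [0.0, 1.0, 2.0, 3.0]
print(bin_index([-0.5, 0.0, 1.0, 2.9, 3.0, 3.5, np.nan], edges))
# bins: -1, 0, 1, 2, 2, 3, -1
```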

pisa.core.bin_indexing.lookup_indices(sample, binning)[source]

Lookup (flattened) bin index for sample points.

Parameters:
  • sample (length-M_dimensions sequence of length-N_events arrays) – All smart arrays must have the same lengths; corresponding elements of the arrays are the coordinates of an event in the dimensions each array represents.

  • binning (pisa.core.binning.MultiDimBinning or convertible thereto) – binning is passed to instantiate MultiDimBinning, so e.g., a pisa.core.binning.OneDimBinning is valid to pass as binning

Returns:

indices – For each event, the (flattened) index of the bin into which it falls

Return type:

length-N_events array

Notes

This method works only for 1D, 2D, and 3D histograms.
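
As a rough sketch of what a flattened lookup does, per-dimension bin indices can be combined into a single row-major index with numpy.ravel_multi_index (lookup_flat_indices is a hypothetical helper, not PISA's actual implementation):

```python
import numpy as np

def lookup_flat_indices(sample, edges_per_dim):
    """Flat (row-major) bin index per event for a 1- to 3-dimensional
    binning; events out of range in any dimension get -1.  NaN coordinates
    also end up as -1, since digitize places them past the last edge."""
    per_dim = []
    for coords, edges in zip(sample, edges_per_dim):
        coords = np.asarray(coords, dtype=float)
        edges = np.asarray(edges, dtype=float)
        idx = np.digitize(coords, edges) - 1
        idx[coords == edges[-1]] = len(edges) - 2  # upper-most edge inclusive
        per_dim.append(idx)
    shape = tuple(len(e) - 1 for e in edges_per_dim)
    valid = np.all([(i >= 0) & (i < n) for i, n in zip(per_dim, shape)], axis=0)
    flat = np.full(per_dim[0].shape, -1, dtype=int)
    clipped = [np.clip(i, 0, n - 1) for i, n in zip(per_dim, shape)]
    flat[valid] = np.ravel_multi_index([c[valid] for c in clipped], shape)
    return flat

print(lookup_flat_indices([[0.5, 1.5], [0.25, -2.0]],
                          [[0, 1, 2], [0, 0.5, 1]]))
# first event -> flat bin 0; second is below range in the second dim -> -1
```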

pisa.core.bin_indexing.test_lookup_indices()[source]

Unit tests for lookup_indices function

pisa.core.binning module

Class to define binning in one dimension (OneDimBinning) and a container class (MultiDimBinning) for arbitrarily many dimensions (one or more). These classes have many useful methods for working with binning.

class pisa.core.binning.MultiDimBinning(dimensions, name=None, mask=None)[source]

Bases: object

Multi-dimensional binning object. This can contain one or more OneDimBinning objects, and all subsequent operations (e.g. slicing) will act on these in the order they are supplied.

Note that it is convenient to construct MultiDimBinning objects via the * operator (which implements the outer product) from multiple OneDimBinning objects. See Examples below for details.

Parameters:

dimensions (OneDimBinning or sequence convertible thereto) – Dimensions for the binning object. Indexing into the MultiDimBinning object follows the order in which dimensions are provided.

See also

OneDimBinning

Each item that is not a OneDimBinning object is passed to this class to be instantiated as such.

Examples

>>> from pisa import ureg
>>> from pisa.core.binning import MultiDimBinning, OneDimBinning
>>> ebins = OneDimBinning(name='energy', is_log=True,
...                       num_bins=40, domain=[1, 100]*ureg.GeV)
>>> czbins = OneDimBinning(name='coszen',
...                        is_lin=True, num_bins=4, domain=[-1, 0])
>>> mdb = ebins * czbins
>>> print(mdb)
MultiDimBinning(
    OneDimBinning('energy', 40 logarithmically-regular bins spanning [1.0, 100.0] GeV (behavior is logarithmic)),
    OneDimBinning('coszen', 4 linearly-regular bins spanning [-1.0, 0.0] (behavior is linear))
)
>>> print(mdb.energy)
OneDimBinning('energy', 40 logarithmically-regular bins spanning [1.0, 100.0] GeV (behavior is logarithmic))
>>> print(mdb[0, 0])
MultiDimBinning(
    OneDimBinning('energy', 1 logarithmically-regular bin with edges at [1.0, 1.1220184543019633]GeV (behavior is logarithmic)),
    OneDimBinning('coszen', 1 linearly-regular bin with edges at [-1.0, -0.75] (behavior is linear))
)
>>> print(mdb.slice(energy=2))
MultiDimBinning(
    OneDimBinning('energy', 1 logarithmically-regular bin with edges at [1.2589254117941673, 1.4125375446227544]GeV (behavior is logarithmic)),
    OneDimBinning('coszen', 4 linearly-regular bins spanning [-1.0, 0.0] (behavior is linear))
)
>>> smaller_binning = mdb[0:2, 0:3]
>>> map = smaller_binning.ones(name='my_map')
>>> print(map)
Map(name='my_map',
    tex='{\\rm my\\_map}',
    full_comparison=False,
    hash=None,
    parent_indexer=None,
    binning=MultiDimBinning(
    OneDimBinning('energy', 2 logarithmically-regular bins spanning [1.0, 1.2589254117941673] GeV (behavior is logarithmic)),
    OneDimBinning('coszen', 3 linearly-regular bins spanning [-1.0, -0.25] (behavior is linear))
),
    hist=array([[1., 1., 1.],
       [1., 1., 1.]]))

assert_array_fits(array)[source]

Check if a 2D array of values fits into the defined bins (i.e., has the exact shape defined by this binning).

Parameters:

array (2D array (or sequence-of-sequences))

Returns:

fits

Return type:

bool, True if array fits or False otherwise

Raises:

ValueError if array shape does not match the binning shape

assert_compat(other)[source]

Check if a (possibly different) binning can map onto the defined binning. Allows for simple re-binning schemes (but no interpolation).

Parameters:

other (Binning or container with attribute "binning")

Returns:

compat

Return type:

bool

property basename_binning

Identical binning but with dimensions named by their basenames. Note that the tex properties for the dimensions are not carried over into the new binning.

property basenames

List of binning names with prefixes and/or suffixes along with any number of possible separator characters removed. See function basename for detailed specifications.

property bin_edges

Return a list of the contained dimensions’ bin_edges that is compatible with the numpy.histogramdd hist argument.

bin_volumes(attach_units=True)[source]

Bin “volumes” defined in num_dims-dimensions

Parameters:

attach_units (bool) – Whether to attach pint units to the resulting array

Returns:

volumes – Bin volumes

Return type:

array
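
The idea can be illustrated with plain numpy: each bin's "volume" is the product of its per-dimension widths, i.e. the outer product of the width arrays (sketch without Pint units; the edge arrays here are made up for illustration):

```python
import numpy as np

# Bin "volumes" as the outer product of per-dimension bin widths
energy_edges = np.logspace(0, 2, 5)   # 4 log-spaced energy bins over [1, 100]
coszen_edges = np.linspace(-1, 0, 3)  # 2 linear coszen bins
widths = [np.diff(energy_edges), np.diff(coszen_edges)]
volumes = np.multiply.outer(widths[0], widths[1])
print(volumes.shape)  # one volume per 2D bin: (4, 2)
```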

broadcast(a, from_dim, to_dims)[source]

Take a one-dimensional array representing one input dimension and broadcast it across some number of output dimensions.

Parameters:
  • a (1D array) – Data from the from_dim dimension. a must have same length as the dimension it comes from (or Numpy must be able to automatically cast it into this dimension).

  • from_dim (string) – Name of dimension that the data in a comes from.

  • to_dims (string or iterable of strings) – Dimension(s) to cast a into.

Returns:

a_broadcast – Broadcast version of a

Return type:

array

See also

broadcaster

The method used internally to derive the tuple used to broadcast the array. This can be used directly to return the broadcaster for use on other Maps or Numpy arrays.

broadcaster(from_dim, to_dims)[source]

Generate an indexer that, if applied to a one-dimensional array representing data from one dimension, broadcasts that array into some number of other dimensions.

Parameters:
  • from_dim (string) – Name of the dimension the data comes from.

  • to_dims (string or iterable of strings) – Dimension(s) to cast into.

Returns:

bcast – Tuple that can be applied to a Numpy array for purposes of broadcasting it. E.g. use as np.array([0,1,2])[bcast].

Return type:

tuple
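
For illustration, such a tuple can be built by hand with numpy.newaxis (make_broadcaster is a hypothetical helper, assuming dim_order lists the output dimensions in binning order; not PISA's actual implementation):

```python
import numpy as np

def make_broadcaster(dim_order, from_dim):
    """Indexing tuple that, applied to a 1D array holding data for
    `from_dim`, inserts new axes for every other dimension in dim_order."""
    return tuple(slice(None) if d == from_dim else np.newaxis
                 for d in dim_order)

a = np.array([0, 1, 2])  # data along 'energy' (3 bins)
bcast = make_broadcaster(['energy', 'coszen'], 'energy')
print(a[bcast].shape)    # (3, 1): now broadcastable across coszen
print((a[bcast] * np.ones((3, 4))).shape)  # (3, 4)
```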

property coord

coordinate for indexing into binning by dim names

Type:

namedtuple

property dimensions

each dimension’s binning in a list

Type:

tuple of OneDimBinning

property dims

shortcut for dimensions

Type:

tuple of OneDimBinning

property domains

Return a list of the contained dimensions’ domains

downsample(*args, **kwargs)[source]

Return a Binning object downsampled relative to this binning.

Parameters:
  • *args (each factor an int) – Factors by which to downsample the binnings. There must either be one factor (one arg)–which will be broadcast to all dimensions–or there must be as many factors (args) as there are dimensions. If positional args are specified (i.e., non-kwargs), then kwargs are forbidden.

  • **kwargs (name=factor pairs)

Returns:

new_binning – New binning, downsampled from the current binning.

Return type:

MultiDimBinning

Notes

Can either specify downsampling by passing in args (ordered values, no keywords) or kwargs (order doesn’t matter, but uses keywords), but not both.

See also

oversample

Oversample (upsample) the MultiDimBinning

OneDimBinning.downsample

The method actually called to perform the downsampling for each OneDimBinning within this MultiDimBinning object.

OneDimBinning.oversample

Same, but oversample (upsample) a OneDimBinning object

property edges_hash

hash on the list of hashes for each dimension’s edge values

Type:

int

empty(name, map_kw=None, **kwargs)[source]

Return a Map whose hist is an “empty” numpy ndarray with same dimensions as this binning.

The contents are not _actually_ empty, just undefined. Therefore be careful to populate the array prior to using its contents.

Parameters:
  • name (string) – Name of the Map

  • map_kw (None or dict) – keyword arguments sent to instantiate the new Map (except name which is specified above)

  • **kwargs – keyword arguments passed on to numpy.empty() (except shape which must be omitted)

Returns:

map

Return type:

Map

property finite_binning

Identical binning but with infinities in bin edges replaced by largest/smallest floating-point numbers representable with the current pisa.FTYPE.

classmethod from_json(resource)[source]

Instantiate a new MultiDimBinning object from a JSON file.

The format of the JSON is generated by the MultiDimBinning.to_json method, which converts the MultiDimBinning object to basic types and numpy arrays before serializing via pisa.utils.jsons.to_json.

Parameters:

resource (str) – A PISA resource specification (see pisa.utils.resources)

full(fill_value, name, map_kw=None, **kwargs)[source]

Return a Map whose hist is filled with fill_value and has the same dimensions as this binning.

Parameters:
  • fill_value – Value with which to fill the map

  • name (string) – Name of the map

  • map_kw (None or dict) – keyword arguments sent to instantiate the new Map (except name which is specified above)

  • **kwargs – keyword arguments passed on to numpy.full() (except shape and fill_value, which must be omitted)

Returns:

map

Return type:

Map

property hash

Unique hash value for this object

property hashable_state

Everything necessary to fully describe this object’s state. Note that objects may be returned by reference, so to prevent external modification, the user must call deepcopy() separately on the returned OrderedDict.

Returns:

state – OrderedDict that can be passed to instantiate a new MultiDimBinning via MultiDimBinning(**state)

Return type:

OrderedDict

property inbounds_criteria

Return string boolean criteria indicating e.g. an event falls within the limits of the defined binning.

This can be used for e.g. applying cuts to events.

See also

pisa.core.events.keepEventsInBins

index(dim, use_basenames=False)[source]

Find dimension implied by dim and return its integer index.

Parameters:
  • dim (int, string, OneDimBinning) – An integer index, dimension name, or identical OneDimBinning object to locate within the contained dimensions

  • use_basenames (bool) – Dimension names are only compared after pre/suffixes are stripped, allowing for e.g. `dim`=’true_energy’ to find ‘reco_energy’.

Returns:

idx – index of the dimension corresponding to dim

Return type:

integer

Raises:

ValueError if dim cannot be found

index2coord(index)[source]

Convert a flat index into an N-dimensional bin coordinate.

Useful in conjunction with enumerate(iterbins)

Parameters:

index (integer) – The flat index

Returns:

coord – Coordinates are in the same order as the binning is here defined and each coordinate is named by its corresponding dimension. Therefore integer indexing into coord as well as named indexing are possible.

Return type:

self.coord namedtuple
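
A plain-numpy sketch of the flat-index-to-named-coordinate conversion (the Coord namedtuple and the shape here are made up for illustration; PISA builds the namedtuple from the binning's dimension names):

```python
from collections import namedtuple
import numpy as np

shape = (40, 4)  # e.g. (energy, coszen) bin counts
Coord = namedtuple('Coord', ['energy', 'coszen'])

def index2coord(index, shape=shape):
    """Convert a flat (row-major) index to a named N-dim bin coordinate."""
    return Coord(*np.unravel_index(index, shape))

c = index2coord(5)
print(c.energy, c.coszen)  # named indexing
print(c[0], c[1])          # integer indexing works too
```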

indexer(**kwargs)[source]

Any dimension index/slice not specified by name in kwargs will default to “:” (all elements).

Parameters:

**kwargs – kwargs are names of dimension(s) and assigned to these are either an integer index into that dimension or a Python slice object for that dimension. See examples below for details.

Returns:

indexer

Return type:

tuple

See also

broadcast

Assignment of a one-dimensional array to a higher-dimensional array is simplified greatly by using broadcast in conjunction with indexer or pisa.core.map.Map.slice. See examples in docs for broadcast.

broadcaster

Similar to broadcast, but returns a tuple that can be applied to broadcast any one-dimensional array.

slice

Apply the indexer returned by this method to this MultiDimBinning object, returning a new MultiDimBinning object.

pisa.core.map.Map.slice

Same operation, but slices a Map object by dimension-name (internally, calls indexer).

Examples

>>> from pisa import ureg
>>> from pisa.core.binning import MultiDimBinning, OneDimBinning
>>> ebins = OneDimBinning(name='energy', is_log=True,
...                       num_bins=40, domain=[1, 80]*ureg.GeV)
>>> czbins = OneDimBinning(name='coszen',
...                        is_lin=True, num_bins=4, domain=[-1, 0])
>>> mdb = ebins * czbins
>>> print(mdb.indexer(energy=0))
(0, slice(None, None, None))

Omitting a dimension (coszen in the above) is equivalent to slicing with a colon (i.e., (0, slice(None))):

>>> print(mdb.indexer(energy=0, coszen=slice(None)))
(0, slice(None, None, None))
>>> print(mdb.indexer(energy=slice(None), coszen=1))
(slice(None, None, None), 1)

Now create an indexer to use on a Numpy array:

>>> x = np.random.RandomState(0).uniform(size=mdb.shape)
>>> indexer = mdb.indexer(energy=slice(0, 5), coszen=1)
>>> print(x[indexer])
[0.71518937 0.64589411 0.38344152 0.92559664 0.83261985]

is_compat(other)[source]

Check if another binning is compatible with this binning.

Note that for now, only downsampling is allowed from other to this, and not vice versa.

Parameters:

other (MultiDimBinning)

Returns:

is_compat

Return type:

bool

property is_irregular

Returns True if any of the 1D binnings is irregular.

property is_lin

Returns True iff all dimensions are linear.

property is_log

Returns True iff all dimensions are log.

iterbins()[source]

Return an iterator over each N-dimensional bin. The elements returned by the iterator are each a MultiDimBinning, just containing a single bin.

Return type:

bin_iterator

See also

index2coord

convert the (flat) index to multi-dimensional coordinate, which is useful when using e.g. enumerate(iterbins)

itercoords()[source]

Return an iterator over each N-dimensional coordinate into the binning. The elements returned by the iterator are each a namedtuple, which can be used to directly index into the binning.

Return type:

coord_iterator

See also

iterbins

Iterator over each bin

index2coord

convert the (flat) index to multi-dimensional coordinate, which is useful when using e.g. enumerate(iterbins)

iterdims()[source]

Iterator over contained dimensions, each a OneDimBinning

iteredgetuples()[source]

Return an iterator over each bin’s edges. The elements returned by the iterator are a tuple of tuples, where the innermost tuples correspond to each dimension (in the order they’re defined here).

Units are stripped prior to iteration for purposes of speed.

Note that this method is, according to one simple test, about 5000x faster than iterbins.

Return type:

edges_iterator

See also

iterbins

Similar, but returns a OneDimBinning object for each bin. This is slower than iteredgetuples but easier to work with.

ito(*args, **kwargs)[source]

Convert units in-place. Cf. Pint’s ito method.

property mask

return the bin mask

Type:

array

property mask_hash

Hash value based solely upon the mask.

meshgrid(entity, attach_units=True)[source]

Apply NumPy’s meshgrid method on various entities of interest.

Parameters:
  • entity (string) – Can be any attribute of OneDimBinning that returns a 1D array with units. E.g., one of ‘midpoints’, ‘weighted_centers’, ‘bin_edges’, ‘bin_widths’, or ‘weighted_bin_widths’

  • attach_units (bool) – Whether to attach units to the result (can save computation time by not doing so).

Returns:

[X1, X2,…, XN] – One ndarray or quantity is returned per dimension; see docs for numpy.meshgrid for details

Return type:

list of numpy ndarray or Pint quantities of the same

See also

numpy.meshgrid

property midpoints

Return a list of the contained dimensions’ midpoints

property name

Name of the dimension

property names

names of each dimension contained

Type:

list of strings

property normalize_values

Normalize quantities’ units prior to hashing

Type:

bool

property normalized_state

OrderedDict containing normalized (base units, and rounded to appropriate precision) state attributes used for testing equality between two objects.

Use hashable_state for faster equality checks and normalized_state for inspecting the contents of each state attribute pre-hashing

property num_bins

Return a list of the contained dimensions’ num_bins. Note that this does not account for any bin mask (since it is computed per dimension)

property num_dims

number of dimensions

Type:

int

ones(name, map_kw=None, **kwargs)[source]

Return a Map whose hist is a numpy ndarray filled with ones, with the same dimensions as this binning.

Parameters:
  • name (string) – Name of the map

  • map_kw (None or dict) – keyword arguments sent to instantiate the new Map (except name which is specified above)

  • **kwargs – keyword arguments passed on to numpy.ones() (except shape which must be omitted)

Returns:

map

Return type:

Map

oversample(*args, **kwargs)[source]

Return a MultiDimBinning object oversampled relative to this one.

Parameters:
  • *args (each factor an int) – Factors by which to oversample the binnings. There must either be one factor (one arg)–which will be broadcast to all dimensions–or there must be as many factors (args) as there are dimensions. If positional args are specified (i.e., non-kwargs), then kwargs are forbidden. For more detailed control, use keyword arguments to specify the dimension(s) to be oversampled and their factors.

  • **kwargs (name=factor pairs) – Dimensions not specified default to oversample factor of 1 (i.e., no oversampling)

Returns:

new_binning – New binning, oversampled from the current binning.

Return type:

MultiDimBinning

Notes

You can either specify oversampling by passing in args (ordered values, no keywords) or kwargs (order doesn’t matter, but uses keywords), but not both.

Specifying simple args (no keywords) requires either a single scalar (in which case all dimensions will be oversampled by the same factor) or one scalar per dimension (which oversamples the dimensions in the order specified).

Specifying keyword args is far more explicit (and general), where each dimension’s oversampling can be specified by name=factor pairs, but not every dimension must be specified (where no oversampling is applied to unspecified dimensions).

See also

downsample

Similar to this, but downsample the MultiDimBinning

OneDimBinning.oversample

Oversample a OneDimBinning object; this method is called to actually perform the oversampling for each dimension within this MultiDimBinning object

OneDimBinning.downsample

Same but downsample for OneDimBinning

Examples

>>> x = OneDimBinning('x', bin_edges=[0, 1, 2])
>>> y = OneDimBinning('y', bin_edges=[0, 20])
>>> mdb = x * y

The following are all equivalent:

>>> print(mdb.oversample(2))
MultiDimBinning(
    OneDimBinning('x', 4 linearly-regular bins spanning [0.0, 2.0] (behavior is linear)),
    OneDimBinning('y', 2 linearly-regular bins spanning [0.0, 20.0] (behavior is linear))
)
>>> print(mdb.oversample(2, 2))
MultiDimBinning(
    OneDimBinning('x', 4 linearly-regular bins spanning [0.0, 2.0] (behavior is linear)),
    OneDimBinning('y', 2 linearly-regular bins spanning [0.0, 20.0] (behavior is linear))
)
>>> print(mdb.oversample(x=2, y=2))
MultiDimBinning(
    OneDimBinning('x', 4 linearly-regular bins spanning [0.0, 2.0] (behavior is linear)),
    OneDimBinning('y', 2 linearly-regular bins spanning [0.0, 20.0] (behavior is linear))
)

But with kwargs, you can specify only the dimensions you want to oversample, and the other dimension(s) remain unchanged:

>>> print(mdb.oversample(y=5))
MultiDimBinning(
    OneDimBinning('x', 2 linearly-regular bins spanning [0.0, 2.0] (behavior is linear)),
    OneDimBinning('y', 5 linearly-regular bins spanning [0.0, 20.0] (behavior is linear))
)

remove(dims)[source]

Remove dimensions.

Parameters:

dims (str, int, or sequence thereof) – Dimensions to be removed

Returns:

binning – Identical binning as this but with dims removed.

Return type:

MultiDimBinning

reorder_dimensions(order, use_deepcopy=False, use_basenames=False)[source]

Return a new MultiDimBinning object with dimensions ordered according to order.

Parameters:

order (MultiDimBinning or sequence of string, int, or OneDimBinning) – Order of dimensions to use. Strings are interpreted as dimension basenames, integers are interpreted as dimension indices, and OneDimBinning objects are interpreted by their basename attributes (so e.g. the exact binnings in order do not have to match this object’s exact binnings; only their basenames). Note that a MultiDimBinning object is a valid sequence type to use for order.

Notes

Dimensions specified in order that are not in this object are ignored, but dimensions in this object that are missing in order result in an error.

Return type:

MultiDimBinning object with reordered dimensions.

Raises:
  • ValueError if dimensions present in this object are missing from order

Examples

>>> b0 = MultiDimBinning(...)
>>> b1 = MultiDimBinning(...)
>>> b2 = b0.reorder_dimensions(b1)
>>> print(b2.names)

property serializable_state

Attributes of the object that are stored to disk. Note that attributes may be returned as references to other objects, so to prevent external modification of those objects, the user must call deepcopy() separately on the returned OrderedDict.

Returns:

state dict – can be passed to instantiate a new MultiDimBinning via MultiDimBinning(**state)

Return type:

OrderedDict

property shape

shape of binning, akin to numpy.ndarray.shape

Type:

tuple

property size

total number of bins

Type:

int

slice(**kwargs)[source]

Slice the binning by dimension name. Any dimension/index not specified by name in kwargs will default to “:” (all bins).

Uses indexer internally to define the indexing tuple.

Returns:

sliced_binning

Return type:

MultiDimBinning

squeeze()[source]

Remove any singleton dimensions (i.e. that have only a single bin). Analogous to numpy.squeeze.

Return type:

MultiDimBinning with only non-singleton dimensions

to(*args, **kwargs)[source]

Convert the contained dimensions to the passed units. Dimensions whose units are not specified are left unchanged.

to_json(filename, **kwargs)[source]

Serialize the state to a JSON file that can be instantiated as a new object later.

Parameters:
  • filename (str) – Filename; must be either a relative or absolute path (not interpreted as a PISA resource specification)

  • **kwargs – Further keyword args are sent to pisa.utils.jsons.to_json()

See also

from_json

Instantiate new object from the file written by this method

pisa.utils.jsons.to_json

property tot_num_bins

Return total number of bins. If a bin mask is used, this will only count bins that are not masked off

property units

Return a list of the contained dimensions’ units

Type:

list

weighted_bin_volumes(attach_units=True)[source]

Bin “volumes” defined in num_dims-dimensions, but unlike bin_volumes, the volume is evaluated in the space of the binning. E.g., logarithmic bins have weighted_bin_volumes of equal size in log-space.

Parameters:

attach_units (bool) – Whether to attach pint units to the resulting array

Returns:

volumes – Bin volumes

Return type:

array
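
The distinction from bin_volumes can be seen with plain numpy: logarithmically-regular bins have unequal widths in linear space but equal widths in log space (illustrative sketch, no units):

```python
import numpy as np

energy_edges = np.logspace(0, 2, 5)          # 4 log-regular bins over [1, 100]
linear_widths = np.diff(energy_edges)        # unequal in linear space
log_widths = np.diff(np.log(energy_edges))   # equal in log space
print(np.allclose(log_widths, log_widths[0]))  # True: equal weighted widths
```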

property weighted_centers

Return a list of the contained dimensions’ weighted_centers (e.g. equidistant from bin edges on logarithmic scale, if the binning is logarithmic; otherwise linear). Access midpoints attribute for always-linear alternative.

zeros(name, map_kw=None, **kwargs)[source]

Return a Map whose hist is a numpy ndarray filled with zeros, with the same dimensions as this binning.

Parameters:
  • name (string) – Name of the map

  • map_kw (None or dict) – keyword arguments sent to instantiate the new Map (except name which is specified above)

  • **kwargs – keyword arguments passed on to numpy.zeros() (except shape which must be omitted)

Returns:

map

Return type:

Map

class pisa.core.binning.OneDimBinning(name, tex=None, bin_edges=None, units=None, domain=None, num_bins=None, is_lin=None, is_log=None, bin_names=None)[source]

Bases: object

Histogram-oriented binning specialized to a single dimension.

If neither is_lin nor is_log is specified, linear behavior is assumed (i.e., is_lin is set to True).

Parameters:
  • name (str, of length > 0) – Name for this dimension. Must be valid Python name (since it will be accessed with the dot operator). If not, name will be converted to a valid Python name.

  • tex (str or None) – TeX label for this dimension.

  • bin_edges (sequence of scalars, or None) – Numerical values (optionally including Pint units) that represent the edges of the bins. bin_edges needn’t be specified if domain, num_bins, and optionally is_log is specified. Pint units can be attached to bin_edges, but will be converted to units if this argument is specified.

  • units (Pint unit or object convertible to Pint unit, or None) – If None, units will be read from either bin_edges or domain, and if none of these have units, the binning has unit ‘dimensionless’ attached.

  • is_lin (bool or None) – Binning behavior is linear for purposes of resampling, plotting, etc. Mutually exclusive with is_log. If neither is_lin nor is_log is True (i.e., both are None), default behavior is linear (is_lin is set to True internally).

  • is_log (bool or None) – Binning behavior is logarithmic for purposes of resampling, plotting, etc. Mutually exclusive with is_lin. If neither is_lin nor is_log is True (i.e., both are None), default behavior is linear (is_lin is set to True internally).

  • domain (length-2 sequence of scalars, or None) – Units may be specified. Required along with num_bins if bin_edges is not specified (optionally specify is_log=True to define the bin_edges to be log-uniform).

  • num_bins (int or None) – Number of bins. Required along with domain if bin_edges is not specified (optionally specify is_log=True to define the bin_edges to be log-uniform).

  • bin_names (sequence of nonzero-length strings, or None) – Strings by which each bin can be identified. This is expected to be useful when one needs to easily identify bins by name where the actual numerical values can be non-obvious e.g. the PID dimension. None is also acceptable if there is no reason to name the bins.

Notes

Consistency is enforced for all redundant parameters passed to the constructor.

You can avoid passing bin_edges if num_bins and domain are specified. Specify is_lin=True or is_log=True to define the binning to be linear or logarithmic (but note that if neither is specified as True, linear behavior is the default).

Be careful, though, since bin edges will be defined slightly differently depending on the pisa.FTYPE defined (PISA_FTYPE environment variable).

Examples

>>> from pisa import ureg
>>> from pisa.core.binning import OneDimBinning
>>> ebins = OneDimBinning(name='energy', is_log=True,
...                       num_bins=40, domain=[1, 100]*ureg.GeV)
>>> print(ebins)
OneDimBinning('energy', 40 logarithmically-regular bins spanning [1.0, 100.0] GeV (behavior is logarithmic))
>>> ebins2 = ebins.to('joule')
>>> print(ebins2)
OneDimBinning('energy', 40 logarithmically-regular bins spanning [1.6021766339999998e-10, 1.602176634e-08] J (behavior is logarithmic))
>>> czbins = OneDimBinning(name='coszen',
...                        is_lin=True, num_bins=4, domain=[-1, 0])
>>> print(czbins)
OneDimBinning('coszen', 4 linearly-regular bins spanning [-1.0, 0.0] (behavior is linear))
>>> czbins2 = OneDimBinning(name='coszen',
...                         bin_edges=[-1, -0.75, -0.5, -0.25, 0])
>>> czbins == czbins2
True

assert_compat(other)[source]

Assert that this binning is compatible with other.

property basename

Basename of the dimension, stripping “true”, “reco”, underscores, whitespace, etc. from the name attribute.

property basename_binning

Identical binning but named as the basename of this binning. Note that the tex property is not carried over into the new binning.

property bin_edges

Edges of the bins.

Type:

array

property bin_names

Bin names

Type:

list of strings or None

property bin_widths

Absolute widths of bins.

property domain

domain of the binning, (min, max) bin edges

Type:

array

downsample(factor)[source]

Downsample the binning by an integer factor that evenly divides the current number of bins.

Parameters:

factor (int >= 1) – Downsampling factor that evenly divides the current number of bins. E.g., if the current number of bins is 4, factor can be one of 1, 2, or 4. Note that floats are converted into integers if float(factor) == int(factor).

Returns:

new_binning – New binning, downsampled from the current binning.

Return type:

OneDimBinning

Raises:

ValueError if illegal value is specified for factor

Notes

Bin names are _not_ preserved for any factor except 1 since it is ambiguous how names should be propagated. If you wish to have bin names after downsampling, assign them afterwards.
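
A sketch of the effect on the bin edges, assuming the factor evenly divides the number of bins (downsample_edges is a hypothetical helper, not the actual implementation):

```python
import numpy as np

def downsample_edges(edges, factor):
    """Keep every `factor`-th edge; factor must evenly divide the
    current number of bins."""
    edges = np.asarray(edges)
    num_bins = len(edges) - 1
    if num_bins % factor != 0:
        raise ValueError(f'factor {factor} does not evenly divide {num_bins} bins')
    return edges[::factor]

print(downsample_edges([0, 1, 2, 3, 4], 2))  # 4 bins -> 2 bins
```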

property edge_magnitudes

Bin edges’ magnitudes

property edges_hash

Hash value based solely upon bin edges’ values.

The hash value is obtained on the edges after “normalizing” their values if self.normalize_values is True; see pisa.utils.comparisons.normQuant for details of the normalization process.

property finite_binning

Identical binning but with infinities in bin edges replaced by largest/smallest floating-point numbers representable with the current pisa.FTYPE.

classmethod from_json(resource)[source]

Instantiate a new object from the contents of a JSON file as formatted by the to_json method.

Parameters:

resource (str) – A PISA resource specification (see pisa.utils.resources)

See also

to_json

property hash

Hash value based upon less-than-double-precision-rounded numerical values and any other state (includes name, tex, is_log, and is_lin attributes). Rounding is done to HASH_SIGFIGS significant figures.

Set this class attribute to None to keep full numerical precision in the values hashed (but be aware that this can cause equal things defined using different unit orders-of-magnitude to hash differently).

Type:

int

property hashable_state

OrderedDict containing simplified state attributes (i.e. some state attributes are represented by their hashes) used for testing equality between two objects.

Use hashable_state for faster equality checks and normalized_state for inspecting the contents of each state attribute pre-hashing

property inbounds_criteria

Return string boolean criteria indicating e.g. an event falls within the limits of the defined binning.

This can be used for e.g. applying cuts to events.

See also

pisa.core.events.keepInbounds

index(x)[source]

Return integer index of bin identified by x.

Parameters:

x (int, string) – If int, ensure it is a valid index and return; if string, look for bin with corresponding name.

Returns:

idx – index of bin corresponding to x

Return type:

int

Raises:

ValueError if x cannot identify a valid bin

static is_bin_spacing_lin_uniform(bin_edges)[source]

Check if bin_edges define a linearly-uniform bin spacing.

Parameters:

bin_edges (sequence) –

Fewer than 2 bin_edges: raises ValueError

Two bin_edges: returns True as a reasonable guess

More than two bin_edges: whether the spacing is linearly-uniform is computed

Return type:

bool

Raises:

ValueError if fewer than 2 bin_edges are specified.
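
A minimal sketch of such a check with numpy (is_lin_uniform is a hypothetical helper mirroring the behavior described above):

```python
import numpy as np

def is_lin_uniform(bin_edges, rtol=1e-9):
    """True if successive edge differences are (approximately) equal."""
    edges = np.asarray(bin_edges, dtype=float)
    if len(edges) < 2:
        raise ValueError('at least 2 bin_edges required')
    if len(edges) == 2:
        return True  # a single bin: linear is a reasonable guess
    diffs = np.diff(edges)
    return bool(np.allclose(diffs, diffs[0], rtol=rtol))

print(is_lin_uniform([0, 1, 2, 3]))   # True
print(is_lin_uniform([1, 10, 100]))   # False
```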

static is_bin_spacing_log_uniform(bin_edges)[source]

Check if bin_edges define a logarithmically-uniform bin spacing.

Parameters:

bin_edges (sequence) –

Fewer than 2 bin_edges: raises ValueError
Two bin_edges: returns False as a reasonable guess (spacing is assumed to be linear)
More than two bin_edges: whether the spacing is logarithmically uniform is computed

Return type:

bool
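Both spacing checks above can be sketched with numpy. These are illustrative helpers following the stated conventions, not PISA's implementation:

```python
import numpy as np

def is_lin_uniform(bin_edges, rtol=1e-9):
    """Linearly-uniform spacing: all successive edge differences are equal."""
    edges = np.asarray(bin_edges, dtype=float)
    if len(edges) < 2:
        raise ValueError("need at least 2 bin edges")
    if len(edges) == 2:
        return True  # single bin: reasonable guess
    diffs = np.diff(edges)
    return bool(np.allclose(diffs, diffs[0], rtol=rtol))

def is_log_uniform(bin_edges, rtol=1e-9):
    """Logarithmically-uniform spacing: all successive edge ratios are equal."""
    edges = np.asarray(bin_edges, dtype=float)
    if len(edges) < 2:
        raise ValueError("need at least 2 bin edges")
    if len(edges) == 2:
        return False  # single bin: assume linear
    if np.any(edges <= 0):
        return False  # log spacing requires strictly positive edges
    ratios = edges[1:] / edges[:-1]
    return bool(np.allclose(ratios, ratios[0], rtol=rtol))
```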

static is_binning_ok(bin_edges)[source]

Check that there are 2 or more bin edges, and that they are monotonically increasing.

Parameters:

bin_edges (sequence) – Bin edges to check the validity of

Return type:

bool, True if binning is OK, False if not
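The validity check amounts to requiring at least two edges and strict monotonic increase; a minimal sketch (illustrative, not the actual implementation):

```python
import numpy as np

def binning_ok(bin_edges):
    """Valid binning: 2 or more edges, strictly increasing."""
    edges = np.asarray(bin_edges)
    return len(edges) >= 2 and bool(np.all(np.diff(edges) > 0))
```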

is_compat(other)[source]

Compatibility – for now – is defined by all of self’s bin edges forming a subset of other’s bin edges (i.e. you can downsample to get from the other binning to this binning); the units must also be compatible.

Note that this might bear revisiting, or redefining just for special circumstances.

Parameters:

other (OneDimBinning)

Return type:

bool

property is_irregular

True if bin spacing is not uniform in the space defined (i.e., NOT linearly-uniform if is_lin or NOT logarithmically-uniform if is_log).

Type:

bool

property is_lin

Whether binning is to be treated in a linear space

Type:

bool

property is_log

Whether binning is to be treated in a log space

Type:

bool

iterbins()[source]

Return an iterator over each bin. The elements returned by the iterator are each a OneDimBinning object containing just a single bin.

Note that for one test, iterbins is about 500x slower than iteredgetuples.

Return type:

bin_iterator

See also

iteredgetuples

Faster but only returns edges of bins, not OneDimBinning objects.

iteredgetuples()[source]

Return an iterator over each bin’s edges. The elements returned by the iterator are each a tuple containing the edges of the bin. Units are stripped prior to iteration for purposes of speed.

Return type:

edges_iterator

See also

iterbins

Similar, but returns a OneDimBinning object for each bin; slower than this method (by as much as 500x in one test) but easier to work with.

ito(units)[source]

Convert units in-place. Cf. Pint’s ito method.

property label

TeX-formatted axis label, including units (if not dimensionless)

property midpoints

Midpoints of the bins: the linear average of each bin’s edges.

Type:

array

property name

Name of the dimension

property normalize_values

Whether to normalize quantities’ units prior to hashing

Type:

bool

property normalized_state

OrderedDict containing normalized (base units, and rounded to appropriate precision) state attributes used for testing equality between two objects.

Use hashable_state for faster equality checks and normalized_state for inspecting the contents of each state attribute pre-hashing

property num_bins

Number of bins

Type:

int

oversample(factor)[source]

Return a OneDimBinning object oversampled relative to this object’s binning.

Parameters:

factor (integer) – Factor by which to oversample the binning, with factor-times as many bins (not bin edges) as this object has.

Returns:

new_binning – New binning, oversampled from the current binning.

Return type:

OneDimBinning

Raises:

ValueError if illegal value is specified for factor

Notes

Bin names are _not_ preserved for any factor except 1 since it is ambiguous how names should be propagated. If you wish to have bin names after oversampling, assign them afterwards.
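The edge computation behind oversampling can be sketched as follows. This is an illustrative helper working on bare edge arrays; the actual method operates on OneDimBinning objects and preserves units:

```python
import numpy as np

def oversample_edges(bin_edges, factor, is_log=False):
    """Subdivide each bin into `factor` equal parts, in log space if the
    binning is logarithmic, yielding factor-times as many bins."""
    if not (isinstance(factor, (int, np.integer)) and factor >= 1):
        raise ValueError("factor must be an integer >= 1")
    edges = np.asarray(bin_edges, dtype=float)
    if is_log:
        edges = np.log(edges)
    # subdivide each [lo, hi) bin, then re-append the final upper edge
    new_edges = np.concatenate(
        [np.linspace(lo, hi, factor + 1)[:-1]
         for lo, hi in zip(edges[:-1], edges[1:])]
        + [[edges[-1]]]
    )
    return np.exp(new_edges) if is_log else new_edges
```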

property range

range of the binning: max bin edge minus min bin edge

Type:

float

rehash()[source]

Force hash and edges_hash attributes to be recomputed

property serializable_state

OrderedDict containing savable state attributes

property shape

shape of binning, akin to numpy.ndarray.shape

Type:

tuple

property size

total number of bins

Type:

int

property tex

TeX label

Type:

string

to(units)[source]

Convert bin edges’ units to units.

Parameters:

units (None, string, or pint.Unit)

Returns:

new_binning – New binning object whose edges have units units

Return type:

OneDimBinning

to_json(filename, **kwargs)[source]

Serialize the state to a JSON file that can be instantiated as a new object later.

Parameters:
  • filename (str) – Filename; must be either a relative or absolute path (not interpreted as a PISA resource specification)

  • **kwargs – Further keyword args are sent to pisa.utils.jsons.to_json()

See also

from_json

Instantiate new OneDimBinning object from the file written by this method

property units

units of the bins’ edges

Type:

pint.Unit

property weighted_bin_widths

Absolute widths of bins.

property weighted_centers

Centers of the bins taking e.g. logarithmic behavior into account. I.e., if the binning is logarithmic, this is not the same as midpoints, whereas in all other cases it is identical.

Type:

array
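For log binning, the weighted center of a bin is the geometric mean of its edges rather than the arithmetic mean; a sketch (illustrative helper, not the actual implementation):

```python
import numpy as np

def weighted_centers(bin_edges, is_log=False):
    """Geometric mean of edges for log binning, arithmetic mean otherwise."""
    edges = np.asarray(bin_edges, dtype=float)
    if is_log:
        return np.sqrt(edges[:-1] * edges[1:])  # geometric mean per bin
    return 0.5 * (edges[:-1] + edges[1:])       # identical to `midpoints`
```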

pisa.core.binning.basename(n)[source]

Remove “true” or “reco” prefix(es) and/or suffix(es) from binning name n along with any number of possible separator characters.

  • Valid (pre/suf)fix(es): “true”, “reco”

  • Valid separator characters: “<whitespace>”, “_”, “-” (any number)

Parameters:

n (string or OneDimBinning) – Name from which to have pre/suffixes stripped.

Returns:

basename

Return type:

string

Examples

>>> print(basename('true_energy'))
energy
>>> print(basename('Reconstructed coszen'))
coszen
>>> print(basename('coszen  reco'))
coszen
>>> print(basename('energy___truth'))
energy
>>> print(basename('trueenergy'))
energy
>>> print(basename('energytruth'))
energy
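A regex-based sketch of this stripping logic, consistent with the examples above. The names `_PREFIXES` and `_SEP` are illustrative, the handling of "truth"/"Reconstructed" is inferred from the examples, and the actual PISA implementation may differ:

```python
import re

# longest variants first so e.g. 'truth' is stripped before 'true'
_PREFIXES = ('reconstructed', 'truth', 'true', 'reco')
_SEP = r'[\s_-]*'  # any number of whitespace/underscore/hyphen separators

def basename(n):
    """Strip true/reco pre/suffixes (plus separators) from a binning name."""
    name = n.lower() if isinstance(n, str) else n.name
    for word in _PREFIXES:
        name = re.sub(r'^' + word + _SEP, '', name)
        name = re.sub(_SEP + word + r'$', '', name)
    return name
```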
pisa.core.binning.is_binning(something)[source]

Return True if argument is a PISA binning (of any dimension), False otherwise

pisa.core.binning.test_MultiDimBinning()[source]

Unit tests for MultiDimBinning class

pisa.core.binning.test_OneDimBinning()[source]

Unit tests for OneDimBinning class

pisa.core.container module

Class to hold generic data in a container. The data can be unbinned, binned, or scalar; translation methods between these different representations are provided.

class pisa.core.container.Container(name, representation='events')[source]

Bases: object

Container to hold data in multiple representations

Parameters:
  • name (str) – name of container

  • representation (hashable object, e.g. str or MultiDimBinning) – Representation in which to initialize the container

property all_keys

return all available keys, regardless of representation

property all_keys_incl_aux_data

same as all_keys, but including auxiliary data

array_representations = ('events', 'log_events')
array_to_binned(key, src_representation, dest_representation, averaged=True)[source]

Histogram data array into binned data.

Parameters:
  • key (str)

  • src_representation (str)

  • dest_representation (MultiDimBinning)

  • averaged (bool) – If True, the histogram entries are averages of the numbers that end up in a given bin. This must be used, for example, when oscillation probabilities are translated; otherwise we end up with probability*count per bin.

Notes

right now, CPU-only
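The averaged behavior can be sketched in 1D with plain numpy: divide the weighted histogram by the per-bin counts so each bin holds a mean rather than a sum. This is an illustrative helper, not the actual implementation:

```python
import numpy as np

def average_into_bins(sample, values, bin_edges):
    """Each bin holds the mean of the values that fall into it, rather than
    their sum -- needed e.g. for oscillation probabilities, where summing
    would yield probability*count per bin."""
    sums, _ = np.histogram(sample, bins=bin_edges, weights=values)
    counts, _ = np.histogram(sample, bins=bin_edges)
    with np.errstate(invalid='ignore', divide='ignore'):
        return np.where(counts > 0, sums / counts, 0.0)
```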

auto_translate(key)[source]
binned_to_array(key, src_representation, dest_representation)[source]

Augmented binned data to array data

default_translation_mode = 'average'
find_valid_representation(key)[source]

Find a valid (and the best) representation for key

get_hist(key)[source]

Return reshaped data as normal n-dimensional histogram

get_map(key, error=None)[source]

Return binned data in the form of a PISA map

property is_map

Is current representation a map/grid

property keys
property keys_incl_aux_data

same as keys, but including auxiliary data

mark_changed(key)[source]

Mark a key as changed; only the data in the current representation remains valid

mark_valid(key)[source]

Mark the data as-is in the current representation as valid, regardless of changes

property num_dims
property representation
property representation_keys
property representations
resample(key, src_representation, dest_representation)[source]

Resample a binned key into a different binning.

Parameters:
  • key (str)

  • src_representation (MultiDimBinning)

  • dest_representation (MultiDimBinning)

set_aux_data(key, val)[source]

Add any auxiliary data, which will not be translated or tied to a specific representation

property shape
property size
translate(key, src_representation)[source]

Translate variable from source representation.

Parameters:
  • key (str)

  • src_representation (representation present in container)

translation_modes = ('average', 'sum', None)
static unroll_binning(key, binning)[source]

Get an Array containing the unrolled binning

class pisa.core.container.ContainerSet(name, containers=None, representation=None)[source]

Bases: object

Class to hold a set of container objects

Parameters:
  • name (str)

  • containers (list or None)

  • representation (MultiDimBinning, "events" or None)

add_container(container)[source]

Append a container; its name must not already exist in the set

get_mapset(key, error=None)[source]

For a given key, get a MapSet

Parameters:
  • key (str)

  • error (None or str) – specify a key that errors are read from

Returns:

map_set

Return type:

MapSet

get_shared_keys(rep_indep=True)[source]

Get a tuple of all keys shared among contained containers.

Parameters:

rep_indep (bool) – Whether all keys should be considered, not just those in the current representation (default: True)

property is_map

Is current representation a map/grid

link_containers(key, names)[source]

Link containers together. When containers are linked, they are treated as a single (virtual) container for binned data

Parameters:
  • key (str) – name of linked object

  • names (list) – name of containers to be linked under the given key

property names
property representation

unlink_containers()[source]

Unlink all containers

class pisa.core.container.VirtualContainer(name, containers)[source]

Bases: object

Class providing a virtual container for linked individual containers

It should just behave like a normal container

For reading, it just uses one container as a representative (no checking at the moment if the others actually contain the same data)

For writing, it creates one object that is added to all containers

Parameters:
  • name (str)

  • containers (list)

mark_changed(key)[source]

Copy data under this key from representative container into all others and then mark all as changed (see Container.mark_changed)

mark_valid(key)[source]

See Container.mark_valid

property representation
set_aux_data(key, val)[source]

See Container.set_aux_data

property shape
property size

unlink()[source]

Reset the link flag and copy all accessed keys

pisa.core.container.test_container()[source]

Unit tests for Container class.

pisa.core.container.test_container_set()[source]

pisa.core.detectors module

Detector class definition and a simple script to generate, save, and plot distributions for different detectors from pipeline config file(s). A detector is represented by a DistributionMaker.

DistributionMaker: A single detector
Detectors: A sequence of detectors

class pisa.core.detectors.Detectors(pipelines, label=None, set_livetime_from_data=True, profile=False, shared_params=None)[source]

Bases: object

Container for one or more distribution makers that represent different detectors.

Parameters:
  • pipelines (Pipeline or convertible thereto, or iterable thereof) – A new pipeline is instantiated with each object passed. Legal objects are already-instantiated Pipelines and anything interpret-able by the Pipeline init method.

  • shared_params – Parameters to be treated the same way in all the distribution_makers that contain them.

property distribution_makers
property empty_bin_indices

Find indices where there are no events present

get_outputs(**kwargs)[source]

Compute and return the outputs.

Parameters:

**kwargs – Passed on to each distribution_maker’s get_outputs method.

Return type:

List of MapSets if return_sum=True or list of lists of MapSets if return_sum=False

property hash
init_params()[source]

Returns a ParamSet including all params of all detectors. First the shared params (if there are some), then all the “single detector” params. If two detectors use a parameter with the same name (but not shared), the name of the detector is added to the parameter name (except for the first detector).

property num_events_per_bin

Returns a list of arrays giving the number of MC events in each bin for each contained distribution maker

property param_selections
property params
property profile
randomize_free_params(random_state=None)[source]
report_profile(detailed=False, format_num_kwargs=None)[source]

Report timing information on contained distribution makers. See Pipeline.report_profile for details.

reset_all()[source]

Reset both free and fixed parameters to their nominal values.

reset_free()[source]

Reset only free parameters to their nominal values.

run()[source]
select_params(selections, error_on_missing=True)[source]
set_free_params(values)[source]

Set free parameters’ values.

Parameters:

values (a list of quantities)

set_nominal_by_current_values()[source]

Define the nominal values as the parameters’ current values.

setup()[source]

Setup (reset) all distribution makers

property shared_param_ind_list

A list of lists (one for each detector) containing the position of the shared params in the free params of the DistributionMaker (that belongs to the detector) together with their position in the shared parameter list.

property source_code_hash

Hash for the source code of this object’s class.

Not meant to be perfect, but should suffice for tracking provenance of objects stored to disk that were produced by a Stage.

tabulate(tablefmt='plain')[source]
update_params(params)[source]
pisa.core.detectors.main(return_outputs=False)[source]

Main; call as script with return_outputs=False or interactively with return_outputs=True

pisa.core.detectors.parse_args()[source]

Get command line arguments

pisa.core.detectors.test_Detectors(verbosity=Levels.WARN)[source]

pisa.core.distribution_maker module

DistributionMaker class definition and a simple script to generate, save, and plot a distribution from pipeline config file(s).

class pisa.core.distribution_maker.DistributionMaker(pipelines, label=None, set_livetime_from_data=True, profile=False)[source]

Bases: object

Container for one or more pipelines; the outputs from all contained pipelines are added together to create the distribution.

Parameters:
  • pipelines (Pipeline or convertible thereto, or iterable thereof) – A new pipeline is instantiated with each object passed. Legal objects are already-instantiated Pipelines and anything interpret-able by the Pipeline init method.

  • label (str or None, optional) – A label for the DistributionMaker.

  • set_livetime_from_data (bool, optional) – If a (data) pipeline is found with the attr metadata and which has the contained key “livetime”, this livetime is used to set the livetime on all pipelines which have param params.livetime. If multiple such data pipelines are found and set_livetime_from_data is True, all are checked for consistency (you should use multiple `Detector`s if you have incompatible data sets).

  • profile (bool) – Whether to time individual pipelines / stages

Notes

Free params with the same name in two pipelines are updated at the same time so long as you use the update_params, set_free_params, or _set_rescaled_free_params methods. Also use select_params to select params across all pipelines (if a pipeline does not have one or more of the param selectors specified, those param selectors have no effect in that pipeline).

_*_rescaled_* properties and methods are for interfacing with a minimizer, where values are linearly mapped onto the interval [0, 1] according to the parameter’s allowed range. Avoid interfacing with these except if using a minimizer, since, e.g., units are stripped and values and intervals are non-physical.
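The linear mapping between a parameter's allowed range and the minimizer's [0, 1] interval can be sketched with a pair of hypothetical standalone helpers (not the actual _*_rescaled_* implementation, which also strips units):

```python
def rescale(value, low, high):
    """Map a value from its allowed physical range [low, high] onto [0, 1]."""
    return (value - low) / (high - low)

def unrescale(x, low, high):
    """Inverse map from minimizer space [0, 1] back to [low, high]."""
    return low + x * (high - low)
```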

add_covariance(covmat)[source]

Incorporates covariance between parameters by replacing the relevant correlated parameters with “DerivedParams” that depend on new parameters in an uncorrelated basis.

The parameters are all updated, but this doesn’t add the new parameters in, so we go to the first stage we find that has one of the original parameters and manually add them there.

See the docstring in pisa.core.param.ParamSet for more.

property empty_bin_indices

Find indices where there are no events present

get_outputs(return_sum=False, sum_map_name='total', sum_map_tex_name='Total', **kwargs)[source]

Compute and return the outputs.

Parameters:

return_sum (bool) – If True, add up all Maps in all MapSets returned by all pipelines. The result will be a single Map contained in a MapSet. If False, return a list where each element is the full MapSet returned by each pipeline in the DistributionMaker.

**kwargs

Passed on to each pipeline’s get_outputs method.

Return type:

MapSet if return_sum=True or list of MapSets if return_sum=False

property hash
property num_events_per_bin

Returns an array with the number of MC events in each bin.

Assumes that all pipelines have the same binning output specs.

The number of events is taken from the last stage of the pipeline.

property param_selections
property params
property pipelines: list[Pipeline]
property profile
randomize_free_params(random_state=None)[source]
report_profile(detailed=False, format_num_kwargs=None)[source]

Report timing information on contained pipelines. See Pipeline.report_profile for details.

reset_all()[source]

Reset both free and fixed parameters to their nominal values.

reset_free()[source]

Reset only free parameters to their nominal values.

run()[source]

Run all pipelines

select_params(selections, error_on_missing=True)[source]
set_free_params(values)[source]

Set free parameters’ values.

Parameters:

values (list of quantities)

set_nominal_by_current_values()[source]

Define the nominal values as the parameters’ current values.

setup()[source]

Setup (reset) all pipelines

property source_code_hash

Hash for the source code of this object’s class.

Not meant to be perfect, but should suffice for tracking provenance of objects stored to disk that were produced by a Stage.

tabulate(tablefmt='plain')[source]
update_params(params)[source]
pisa.core.distribution_maker.main(return_outputs=False)[source]

Main; call as script with return_outputs=False or interactively with return_outputs=True

pisa.core.distribution_maker.parse_args()[source]

Get command line arguments

pisa.core.distribution_maker.test_DistributionMaker()[source]

Unit tests for DistributionMaker

pisa.core.events module

Events class for working with PISA events files and Data class for working with arbitrary Monte Carlo and datasets

class pisa.core.events.Data(val=None, flavint_groups=None, metadata=None)[source]

Bases: FlavIntDataGroup

Container for storing events, including metadata about the events.

Examples

[('cuts', ['analysis']),
 ('detector', 'pingu'),
 ('flavints_joined',
    ['nue_cc+nuebar_cc',
     'numu_cc+numubar_cc',
     'nutau_cc+nutaubar_cc',
     'nuall_nc+nuallbar_nc']),
 ('geom', 'v39'),
 ('proc_ver', '5.1'),
 ('runs', [620, 621, 622])]

applyCut(keep_criteria)[source]

Apply a cut by specifying criteria for keeping events. The cut must be successfully applied to all flav/ints in the events object before the changes are kept, otherwise the cuts are reverted.

Parameters:

keep_criteria (string) – Any string interpretable as numpy boolean expression.

Returns:

remaining_events – An Events object with the remaining events (deepcopied) and with updated cut metadata including keep_criteria.

Return type:

Events

Examples

Keep events with true energies in [1, 80] GeV (note that units are not recognized, so have to be handled outside this method)

>>> remaining = applyCut("(true_energy >= 1) & (true_energy <= 80)")

Do the opposite with “~” inverting the criteria

>>> remaining = applyCut("~((true_energy >= 1) & (true_energy <= 80))")

Numpy namespace is available for use via np prefix

>>> remaining = applyCut("np.log10(true_energy) >= 0")
data_eq(other)[source]

Test whether the data for this object matches that of other

digitize(kinds, binning, binning_cols=None)[source]

Wrapper for numpy’s digitize function.
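The bin-index convention described in pisa.core.bin_indexing (lower edge inclusive, upper edge exclusive, upper-most edge inclusive; below-range and NaN give -1, above-range gives num_bins) can be sketched in 1D on top of numpy.digitize. This standalone helper is illustrative, not PISA's implementation:

```python
import numpy as np

def lookup_indices_1d(sample, bin_edges):
    """Return per-event bin indices following the stated convention."""
    edges = np.asarray(bin_edges, dtype=float)
    x = np.asarray(sample, dtype=float)
    num_bins = len(edges) - 1
    idx = np.digitize(x, edges) - 1          # lower-edge-inclusive bins
    idx[x == edges[-1]] = num_bins - 1       # upper-most edge is inclusive
    idx[x < edges[0]] = -1                   # below lowermost edge
    idx[x > edges[-1]] = num_bins            # above uppermost edge
    idx[np.isnan(x)] = -1                    # NaN values
    return idx
```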

property hash

Probabilistically unique identifier

histogram(kinds, binning, binning_cols=None, weights_col=None, errors=False, name=None, tex=None, **kwargs)[source]

Histogram the events of all kinds specified, with binning and optionally applying weights.

Parameters:
  • kinds (string, sequence of NuFlavInt, or NuFlavIntGroup)

  • binning (OneDimBinning, MultiDimBinning or sequence of arrays) – (one array per binning dimension)

  • binning_cols (string or sequence of strings) – Bin only these dimensions, ignoring other dimensions in binning

  • weights_col (None or string) – Column to use for weighting the events

  • errors (bool) – Whether to attach errors to the resulting Map

  • name (None or string) – Name to give to resulting Map. If None, a default is derived from kinds and weights_col.

  • tex (None or string) – TeX label to give to the resulting Map. If None, default is derived from the name specified or the derived default.

  • **kwargs (Keyword args passed to Map object)

Returns:

Map – a numpy ndarray with as many dimensions as specified by the binning argument

Return type:

Map
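Multi-dimensional histogramming in PISA follows numpy.histogramdd's edge convention (lower edge inclusive, upper edge exclusive, upper-most bin's upper edge inclusive). A minimal weighted 2D sketch with hypothetical sample values, illustrating the kind of histogram this method builds:

```python
import numpy as np

# hypothetical event columns and weights
energy = np.array([2.0, 5.0, 20.0, 50.0])
coszen = np.array([-0.9, -0.3, -0.7, -0.1])
weights = np.array([1.0, 2.0, 0.5, 1.5])

hist, edges = np.histogramdd(
    np.stack([energy, coszen], axis=1),                  # (N_events, N_dims)
    bins=[np.logspace(0, np.log10(80), 5),               # 4 log-spaced energy bins
          np.linspace(-1, 0, 6)],                        # 5 linear coszen bins
    weights=weights,
)
# hist.shape == (4, 5): one axis per binning dimension
```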

histogram_set(binning, nu_weights_col, mu_weights_col, noise_weights_col, mapset_name, errors=False)[source]

Uses the above histogram function but returns the set of all of them for everything in the Data object.

Parameters:
  • binning (OneDimBinning, MultiDimBinning) – The definition of the binning for the histograms.

  • nu_weights_col (None or string) – The column in the Data object by which to weight the neutrino histograms. Specify None for unweighted histograms.

  • mu_weights_col (None or string) – The column in the Data object by which to weight the muon histograms. Specify None for unweighted histograms.

  • noise_weights_col (None or string) – The column in the Data object by which to weight the noise histograms. Specify None for unweighted histograms.

  • mapset_name (string) – The name by which the resulting MapSet will be identified.

  • errors (boolean) – A flag for whether to calculate errors on the histograms or not. This defaults to False.

Returns:

map_set – A MapSet containing all of the Maps for everything in this Data object.

Return type:

MapSet

keepInbounds(binning)[source]

Cut out any events that fall outside binning. Note that events that fall exactly on the outer edge are kept.

Parameters:

binning (OneDimBinning or MultiDimBinning)

meta_eq(other)[source]

Test whether the metadata for this object matches that of other

property muons

muon data

property names

Names of flavints joined

property neutrinos

neutrino data

property noise
transform_groups(flavint_groups)[source]

Transform Data into a structure given by the input flavint_groups. Calls the corresponding inherited function.

Parameters:

flavint_groups (string, or sequence of strings, or sequence of NuFlavIntGroups)

Returns:

t_data

Return type:

Data

update_hash()[source]

Update the cached hash value

class pisa.core.events.Events(val=None)[source]

Bases: FlavIntData

Container for storing events, including metadata about the events.

Examples

>>> from pisa.core.binning import OneDimBinning, MultiDimBinning
>>> # Load events from a PISA HDF5 file
>>> events = Events('events/events__vlvnt__toy_1_to_80GeV_spidx1.0_cz-1_to_1_1e2evts_set0__unjoined__with_fluxes_honda-2015-spl-solmin-aa.hdf5')
>>> # Apply a simple cut
>>> events = events.applyCut('(true_coszen <= 0.5) & (true_energy <= 70)')
>>> np.max(events[fi]['true_coszen']) <= 0.5
True
>>> # Apply an "inbounds" cut via a OneDimBinning
>>> true_e_binning = OneDimBinning(
...    name='true_energy', num_bins=80, is_log=True,
...    domain=[10, 60]*ureg.GeV
... )
>>> events = events.keepInbounds(true_e_binning)
>>> np.min(events[fi]['true_energy']) >= 10
True
>>> print([(k, events.metadata[k]) for k in sorted(events.metadata.keys())])
[('cuts', ['analysis']),
  ('detector', 'pingu'),
  ('flavints_joined',
     ['nue_cc+nuebar_cc',
         'numu_cc+numubar_cc',
         'nutau_cc+nutaubar_cc',
         'nuall_nc+nuallbar_nc']),
  ('geom', 'v39'),
  ('proc_ver', '5.1'),
  ('runs', [620, 621, 622])]
applyCut(keep_criteria)[source]

Apply a cut by specifying criteria for keeping events. The cut must be successfully applied to all flav/ints in the events object before the changes are kept, otherwise the cuts are reverted.

Parameters:

keep_criteria (string) – Any string interpretable as numpy boolean expression.

Examples

Keep events with true energies in [1, 80] GeV (note that units are not recognized, so have to be handled outside this method)

>>> events = events.applyCut("(true_energy >= 1) & (true_energy <= 80)")

Do the opposite with “~” inverting the criteria

>>> events = events.applyCut("~((true_energy >= 1) & (true_energy <= 80))")

Numpy namespace is available for use via np prefix

>>> events = events.applyCut("np.log10(true_energy) >= 0")
data_eq(other)[source]

Test whether the data for this object matches that of other

property flavint_groups

All flavor/interaction type groups (even singletons) present

property flavints_present

Returns a tuple of the flavints that are present in the events

property hash

Hash value

histogram(kinds, binning, binning_cols=None, weights_col=None, errors=False, name=None, tex=None)[source]

Histogram the events of all kinds specified, with binning and optionally applying weights.

Parameters:
  • kinds (string, sequence of NuFlavInt, or NuFlavIntGroup)

  • binning (OneDimBinning, MultiDimBinning or sequence of arrays (one array per binning dimension))

  • binning_cols (string or sequence of strings) – Bin only these dimensions, ignoring other dimensions in binning

  • weights_col (None or string) – Column to use for weighting the events

  • errors (bool) – Whether to attach errors to the resulting Map

  • name (None or string) – Name to give to resulting Map. If None, a default is derived from kinds and weights_col.

  • tex (None or string) – TeX label to give to the resulting Map. If None, default is derived from the name specified (or its value derived from kinds and weights_col).

Returns:

Map – a numpy ndarray with as many dimensions as specified by the binning argument

Return type:

Map

property joined_string

Concise string identifying _only_ joined flavints

keepInbounds(binning)[source]

Cut out any events that fall outside binning. Note that events that fall exactly on an outer edge are kept.

Parameters:

binning (OneDimBinning or MultiDimBinning)

Returns:

remaining_events

Return type:

Events

meta_eq(other)[source]

Test whether the metadata for this object matches that of other

save(fname, **kwargs)[source]

Save data structure to a file; see fileio.to_file for details

update_hash()[source]

Update the cached hash value

pisa.core.events.test_Events()[source]

Unit tests for Events class

pisa.core.events_pi module

PISA data container

class pisa.core.events_pi.EventsPi(*args, name=None, neutrinos=True, fraction_events_to_keep=None, events_subsample_index=0, **kwargs)[source]

Bases: OrderedDict

Container for events for use with PISA pi

Parameters:
  • name (string, optional) – Name to identify events

  • neutrinos (bool, optional) – Flag indicating if events represent neutrinos; toggles special behavior such as splitting into nu/nubar and CC/NC. Default is True.

  • fraction_events_to_keep (float) – Fraction of loaded events to use (use to downsample). Must be in range [0., 1.], or disable by setting to None. Default is None.

  • *args – Passed on to __init__ method of OrderedDict

  • **kwargs – Passed on to __init__ method of OrderedDict

apply_cut(keep_criteria)[source]

Apply a cut by specifying criteria for keeping events. The cut must be successfully applied to all flav/ints in the events object before the changes are kept, otherwise the cuts are reverted.

Parameters:

keep_criteria (string) – Any string interpretable as numpy boolean expression.

Examples

Keep events with true energies in [1, 80] GeV (note that units are not recognized, so have to be handled outside this method)

>>> events = events.apply_cut("(true_energy >= 1) & (true_energy <= 80)")

Do the opposite with “~” inverting the criteria

>>> events = events.apply_cut("~((true_energy >= 1) & (true_energy <= 80))")

Numpy namespace is available for use via np prefix

>>> events = events.apply_cut("np.log10(true_energy) >= 0")
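A minimal sketch of how such string criteria can be evaluated, assuming a plain dict of column arrays. This hypothetical helper only illustrates the evaluation step; the real method additionally handles flav/int grouping, metadata updates, and reversion on failure:

```python
import numpy as np

def apply_cut(events, keep_criteria):
    """Evaluate `keep_criteria` with the event columns (and numpy as `np`)
    in scope, then mask every column with the resulting boolean array.
    NOTE: eval() must only ever see trusted criteria strings."""
    keep = eval(keep_criteria, {'np': np}, events)
    return {k: np.asarray(v)[keep] for k, v in events.items()}
```

Example: `apply_cut({'true_energy': np.array([0.5, 10.0, 90.0])}, "(true_energy >= 1) & (true_energy <= 80)")` keeps only the middle event.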
keep_inbounds(binning)[source]

Cut out any events that fall outside binning. Note that events that fall exactly on an outer edge are kept.

Parameters:

binning (OneDimBinning or MultiDimBinning)

Returns:

cut_data

Return type:

EventsPi

load_events_file(events_file, variable_mapping=None, required_metadata=None, seed=123456)[source]

Fill this events container from an input HDF5 file filled with event data. Optionally, provide a variable mapping to select a subset of variables, rename them, etc.

Parameters:
  • events_file (string or mapping) – If string, interpret as a path and load file at that path; the loaded object should be a mapping. If already a mapping, take and interpret events from that.

  • variable_mapping (mapping, optional) – If specified, should be a mapping where the keys are the destination variable names and the items are either the source variable names or an iterable of source variables names. In the latter case, each of the specified source variables will become a column vector in the destination array.

  • required_metadata (None, or list of str) – Can optionally specify metadata keys to parse from the input file metadata. ONLY metadata specified here will be parsed. Anything specified here MUST exist in the files.

pisa.core.events_pi.fix_oppo_flux(input_data)[source]

Fix the “oppo” flux insanity: someone added to the nominal flux calculation the convention that the oppo flux is the nue flux if the flavour is nuebar, and vice versa. Here we revert that, in case these oppo keys are present.

pisa.core.events_pi.main()[source]

Load an events file and print the contents

pisa.core.events_pi.split_nu_events_by_flavor_and_interaction(input_data)[source]

Split neutrino events by nu vs nubar, and CC vs NC.

Should be compatible with DRAGON and GRECO samples, but this depends on the contents of the original I3 files and whatever conversion script was used to produce the HDF5 files from these.

Parameters:

input_data (mapping)

Returns:

output_data

Return type:

OrderedDict

pisa.core.map module

Map class to contain 2D histogram, error, and metadata about the contents. MapSet class to contain a set of maps.

Also provide basic mathematical operations that user applies directly to the containers but that get passed down to operate on the contained data.

class pisa.core.map.Map(name, hist, binning, error_hist=None, hash=None, tex=None, full_comparison=False)[source]

Bases: object

Class to contain a multi-dimensional histogram, error, and metadata about the histogram. Also provides basic mathematical operations for the contained data. See Examples below for how to use a Map object.

Parameters:
  • name (string) – Name for the map. Used to identify the map.

  • hist (numpy.ndarray (incl. obj array from uncertainties.unumpy.uarray)) – The “data” (counts, etc.) in the map. The shape of hist must be compatible with the binning specified.

  • binning (MultiDimBinning) – Describes the binning of the Map.

  • error_hist (numpy ndarray) – Must be same shape as hist. If specified, sets the error standard deviations for the contained hist, replacing any stddev information that might be contained in the passed hist arg.

  • hash (None, or immutable object (typically an integer)) – Hash value to attach to the map.

  • tex (None or string) – TeX string that can be used for e.g. plotting.

  • full_comparison (bool) – Whether to perform full (recursive) comparisons when testing the equality of this map with another. See __eq__ method.

Examples

>>> from pisa.core.binning import MultiDimBinning
>>> binning = MultiDimBinning([dict(name='energy', is_log=True, num_bins=4,
...                                 domain=[1, 80], units='GeV'),
...                            dict(name='coszen', is_lin=True, num_bins=5,
...                                 domain=[-1, 0])])
>>> m0 = Map(name='x', binning=binning, hist=np.zeros(binning.shape))
>>> m0
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])
>>> m0.binning
energy: 4 logarithmically-uniform bins spanning [1.0, 80.0] GeV
coszen: 5 equally-sized bins spanning [-1.0, 0.0]
>>> m0.hist[0:4, 0] = 1
>>> m0
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.]])
>>> m1 = m0[0:3, 0:2]
>>> m1.binning
energy: 3 logarithmically-uniform bins spanning [1.0, 26.7496121991]
coszen: 2 equally-sized bins spanning [-1.0, -0.6]
>>> m1
array([[ 1.,  0.],
       [ 1.,  0.],
       [ 1.,  0.]])
>>> for bin in m1.iterbins():
...     print('({0:~.2f}, {1:~.2f}): {2:0.1f}'.format(
...             bin.binning.energy.midpoints[0],
...             bin.binning.coszen.midpoints[0],
...             bin.hist[0, 0]))
(2.00 GeV, -0.90 ): 1.0
(2.00 GeV, -0.70 ): 0.0
(5.97 GeV, -0.90 ): 1.0
(5.97 GeV, -0.70 ): 0.0
(17.85 GeV, -0.90 ): 1.0
(17.85 GeV, -0.70 ): 0.0
allclose(other)[source]

Check if this map and another have the same (within machine precision) bin counts

assert_compat(other)[source]
barlow_llh(expected_values, binned=False)[source]

Calculate the total Barlow log-likelihood value between this map and the map described by expected_values; self is taken to be the “actual values” (or (pseudo)data), and expected_values are the expectation values for each bin. At the moment this assumes some things that are not true, namely that the weights are uniform.

Parameters:
  • expected_values (numpy.ndarray or Map of same dimension as this)

  • binned (bool)

Returns:

total_barlow_llh

Return type:

float or binned_barlow_llh if binned=True

property binning

Map’s binning

Type:

pisa.core.binning.MultiDimBinning

chi2(expected_values, binned=False)[source]

Calculate the total chi-squared value between this map and the map described by expected_values; self is taken to be the “actual values” (or (pseudo)data), and expected_values are the expectation values for each bin.

Parameters:
  • expected_values (numpy.ndarray or Map of same dimension as this)

  • binned (bool)

Returns:

total_chi2

Return type:

float or binned_chi2 if binned=True
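Conceptually this is the Pearson chi-squared summed over bins. A minimal NumPy sketch of that idea (not necessarily PISA's exact implementation, which may also propagate uncertainties and guard against empty bins):

```python
import numpy as np

def chi2_bins(actual, expected):
    """Per-bin Pearson chi-squared terms: (n - mu)**2 / mu."""
    actual = np.asarray(actual, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return (actual - expected) ** 2 / expected

actual = np.array([4.0, 9.0, 1.0])
expected = np.array([5.0, 8.0, 2.0])
binned = chi2_bins(actual, expected)  # analogue of binned=True
total = binned.sum()                  # analogue of binned=False
```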

compare(ref)[source]

Compare this map with another, where the other map is taken to be the “reference” against which this is compared.

Parameters:

ref (Map) – Map against which to compare this one. ref is taken as reference. Each dimension in ref.binning must have the same name and bin edges as this map, but the order of the dimensions does not matter.

Returns:

comparisons

  • ‘diff’ : Map, self - ref

  • ’fract’ : Map, self / ref

  • ’fractdiff’ : Map, (self - ref) / ref

  • ’max_abs_diff’ : float, max(abs(diff))

  • ’max_abs_fractdiff’ : float, max(abs(fractdiff))

  • ’nanmatch’ : bool, whether nan elements match

  • ’infmatch’ : bool, whether +inf (and separately -inf) entries match

Return type:

OrderedDict containing the following key/value pairs:

conv_llh(expected_values, binned=False)[source]

Calculate the total convoluted log-likelihood value between this map and the map described by expected_values; self is taken to be the “actual values” (or (pseudo)data), and expected_values are the expectation values for each bin.

Parameters:
  • expected_values (numpy.ndarray or Map of same dimension as this)

  • binned (bool)

Returns:

total_conv_llh

Return type:

float or binned_conv_llh if binned=True

correct_chi2(expected_values, binned=False)[source]

Calculate the total correct chi2 value between this map and the map described by expected_values; self is taken to be the “actual values” (or (pseudo)data), and expected_values are the expectation values for each bin.

Parameters:
  • expected_values (numpy.ndarray or Map of same dimension as this.)

  • binned (bool)

Returns:

total_correct_chi2

Return type:

float or binned_correct_chi2 if binned=True

downsample(*args, **kwargs)[source]

Downsample by integer factor(s), summing together merged bins’ values.

See pisa.utils.binning.MultiDimBinning.downsample for args/kwargs details.

fluctuate(method, random_state=None, jumpahead=None)[source]

Apply fluctuations to the map’s values.

Parameters:
  • method (None or string) – Valid strings are ‘’, ‘none’, ‘poisson’, ‘scaled_poisson’, ‘gauss’, or ‘gauss+poisson’. Strings are case-insensitive and whitespace is removed. The ‘scaled_poisson’ method implements a Scaled Poisson Process, which is a better approximation than a normal distribution to the true distribution of bin counts that are the result of a Poisson process with weighted events[1]. The fluctuated maps are guaranteed to have the same mean and standard deviation as the original map.

  • random_state (None or type accepted by utils.random_numbers.get_random_state)

Returns:

fluctuated_map – New map with entries fluctuated as compared to this map

Return type:

Map

References
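The ‘poisson’ method amounts to redrawing each bin from a Poisson distribution whose mean is the current bin value. A minimal NumPy sketch of that idea (illustration only; PISA's implementation also supports jumpahead and the other methods listed above):

```python
import numpy as np

def fluctuate_poisson(hist, random_state=None):
    """Draw each bin from a Poisson distribution with the bin value as mean."""
    rng = np.random.default_rng(random_state)
    return rng.poisson(lam=hist).astype(float)

hist = np.array([[10.0, 100.0], [0.0, 5.0]])
fluct = fluctuate_poisson(hist, random_state=42)
```

Note that empty bins stay empty under this scheme, since a Poisson draw with zero mean is always zero.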

classmethod from_json(resource)[source]

Instantiate a new Map object from a JSON file.

The format of the JSON is generated by the Map.to_json method, which converts a Map object to basic types and then numpy arrays are converted in a call to pisa.utils.jsons.to_json.

Parameters:

resource (str) – A PISA resource specification (see pisa.utils.resources)

property full_comparison

Compare element-by-element instead of just comparing hashes.

generalized_poisson_llh(expected_values=None, empty_bins=None, binned=False)[source]

Compute the likelihood that this map’s counts originate from the given expectation.

Note that unlike the other likelihood functions, expected_values is expected to be a distribution maker.

Parameters:
  • expected_values (OrderedDict of MapSets)

  • empty_bins (None, list, or np.ndarray) – The bin indices that are empty

  • binned (bool) – Return the bin-by-bin llh rather than the sum over all bins

property hash

Hash value

Type:

int or None

property hashable_state
property hist

Histogram array underlying the Map

Type:

numpy.ndarray

item(*args)[source]

Call item(*args) method on the contained hist, returning a single Python scalar corresponding to *args. See help for numpy.ndarray.item for more info.

Note that this method is called by numpy.asscalar.

Parameters:

*args – Passed to numpy.ndarray.item

Returns:

z

Return type:

Standard Python scalar object

iterbins()[source]

Returns a bin iterator which yields a map containing a single bin each time. Note that modifications to that single-bin map will be reflected in this (the parent) map.

Note that the returned map has the attribute parent_indexer for indexing directly into the parent map (or a similar map).

Yields:

Map object containing one of each bin of this Map

itercoords()[source]

Iterator that yields the coordinate of each bin in the map.

llh(expected_values, binned=False)[source]

Calculate the total log-likelihood value between this map and the map described by expected_values; self is taken to be the “actual values” (or (pseudo)data), and expected_values are the expectation values for each bin.

Parameters:
  • expected_values (numpy.ndarray or Map of same dimension as this)

  • binned (bool)

Returns:

total_llh

Return type:

float or binned_llh if binned=True
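As an illustration only, a binned Poisson log-likelihood can be sketched as below; the exact form PISA uses, including constant terms and zero-bin handling, may differ:

```python
import numpy as np
from math import lgamma

def poisson_llh_bins(actual, expected):
    """Per-bin log of the Poisson pmf: n*log(mu) - mu - log(n!)."""
    actual = np.asarray(actual, dtype=float)
    expected = np.asarray(expected, dtype=float)
    log_fact = np.vectorize(lambda n: lgamma(n + 1.0))(actual)
    return actual * np.log(expected) - expected - log_fact

actual = np.array([3.0, 7.0])
expected = np.array([4.0, 6.0])
total_llh = poisson_llh_bins(actual, expected).sum()
```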

log()[source]

Take natural logarithm of map’s values, returning a new map.

Returns:

log_map

Return type:

Map

log10()[source]

Take base-10 logarithm of map’s values, returning a new map.

Returns:

log10_map

Return type:

Map

mcllh_eff(expected_values, binned=False)[source]

Calculate the total LEff log-likelihood value between this map and the map described by expected_values; self is taken to be the “actual values” (or (pseudo)data), and expected_values are the expectation values for each bin.

Parameters:
  • expected_values (numpy.ndarray or Map of same dimension as this)

  • binned (bool)

Returns:

total_llh

Return type:

float or binned_llh if binned=True

mcllh_mean(expected_values, binned=False)[source]

Calculate the total LMean log-likelihood value between this map and the map described by expected_values; self is taken to be the “actual values” (or (pseudo)data), and expected_values are the expectation values for each bin.

Parameters:
  • expected_values (numpy.ndarray or Map of same dimension as this)

  • binned (bool)

Returns:

total_llh

Return type:

float or binned_llh if binned=True

metric_total(expected_values, metric, metric_kwargs=None)[source]

Compute the optimization metric on the bins of a Map

Parameters:
  • expected_values (Map) – The data/pseudo-data binned counts

  • metric (str) – Name of the optimization metric

  • metric_kwargs (None or dict) – Special arguments to pass to a special metric; right now just useful for generalized_poisson_llh

Returns:

float (sum of the metric over all bins of expected_values)

mod_chi2(expected_values, binned=False)[source]

Calculate the total modified chi2 value between this map and the map described by expected_values; self is taken to be the “actual values” (or (pseudo)data), and expected_values are the expectation values for each bin.

Parameters:
  • expected_values (numpy.ndarray or Map of same dimension as this.)

  • binned (bool)

Returns:

total_mod_chi2

Return type:

float or binned_mod_chi2 if binned=True

property name

Map’s name

Type:

string

property nominal_values

Bin values stripped of uncertainties

Type:

numpy.ndarray

property normalize_values
property num_entries

total number of weighted entries in all bins

Type:

int

plot(symm=False, logz=False, vmin=None, vmax=None, backend=None, ax=None, title=None, cmap=None, clabel=None, clabelsize=None, xlabelsize=None, ylabelsize=None, titlesize=None, fig_kw=None, pcolormesh_kw=None, colorbar_kw=None, colorbar_label_kw=None, outdir=None, fname=None, fmt=None, binlabel_format=None, binlabel_colors=['white', 'black'], binlabel_color_thresh=None, binlabel_stripzeros=True, dpi=300, bad_color=None, pure_bin_names=False, bin_id=None, full_ax=None)[source]

Plot a 2D map.

Parameters:
  • symm (bool, optional) – Plot with symmetric (about 0) value-range limits.

  • logz (bool, optional) – Plot logarithmic value-range

  • vmin (float, optional) – Minimum value for the value-range of the plot. If None, this is set according to symm and/or the values of the hist in this Map.

  • vmax (float, optional) – Maximum value for the value-range of the plot. If None, this is set according to symm and/or the values of the hist in this Map.

  • backend (string, optional) – Matplotlib backend to use (only takes effect if matplotlib is first imported by this function).

  • ax (matplotlib.axis.Axis, optional) – Provide an axis onto which the plot is drawn; if None is specified, a new figure and axis are created.

  • title (string, optional) – Set the title to this value; if None is specified, the title is taken from the name of this Map.

  • cmap (string or matplotlib.colors.Colormap, optional)

  • clabel (string, optional) – Label to place on the colorbar

  • clabelsize (float, optional) – Size of the colorbar label text

  • xlabelsize (float, optional) – Size of the x-axis label text

  • ylabelsize (float, optional) – Size of the y-axis label text

  • titlesize (float, optional) – Size of the title text

  • fig_kw (mapping, optional) – Keyword arguments passed to call to matplotlib.pyplot.subplots; this is only done, however, if ax is None and so a new figure needs to be created.

  • pcolormesh_kw (mapping, optional) – Keyword arguments to pass to call to matplotlib.pyplot.pcolormesh (if Map is two or more dimensions).

  • colorbar_kw (mapping, optional) – Keyword arguments to pass to call to matplotlib.colorbar.

  • colorbar_label_kw (mapping, optional) – Keyword arguments to pass to call to matplotlib.colorbar.set_label.

  • fmt (string in ('pdf', 'png') or iterable thereof, optional) – File format(s) in which to save the file. If None, then the plot will not be saved.

  • outdir (string, optional) – Directory into which to save the plot. If None is provided, the default is the current directory. Note that if fmt is None, then this argument is irrelevant.

  • fname (string, optional) – Custom filename to set for saved figure. If not provided, a name is derived from the name attribute of the Map. Note that if fmt is None, then this argument is irrelevant.

  • binlabel_format (str, optional) – Format string to label the content in each bin. If None (default), the bins will not be labeled. Bin labels are generated by calling .format(zi) on the given string, where zi is the z-value of bin i.

  • binlabel_stripzeros (bool, optional) – Strip zeros from bin labels. Default: True

  • binlabel_colors (str or list of str, optional) – Colors to be used below (index 0) and above (index 1) the binlabel_color_thresh value. Default: “white” below and “black” above threshold. If only one str is given, all labels will have that color.

  • binlabel_color_thresh (float or str, optional) – Threshold at which to switch color of the bin labels for better contrast. If None (default), all labels will use the last color given in binlabel_colors. If a float is given, bins with a value below the given number use the first color in binlabel_colors and bins with a value above the given number use the second color in binlabel_colors. If “auto”, set threshold automatically (basically half way).

  • dpi (int, optional) – Dots per inch for saved figure. Default: 300

  • bad_color (string, optional) – Can choose the color used for “bad” bins (e.g. NaN)

  • pure_bin_names (bool, optional) – If True, use the (third dimension) bin names as they are defined in binning config, without any formatting. Default: False

  • bin_id (int, optional) – If the map is a slice of a multi-dimensional map, this is the index of the slice. Used for the internal recursive function call, to keep track of current third dimension slice. Default setting produces the correct behaviour for 3-dimensional histograms. Default: None

  • full_ax (matplotlib.axes.Axes, optional) – If the map is a slice of a multi-dimensional map, this is the full axis object of the overall map. Used for the internal recursive function call, to keep track of the main axis. Default setting produces the correct behaviour for 3-dimensional histograms. Default: None

Returns:

  • fig (matplotlib.figure.Figure object)

  • ax (matplotlib.axes.Axes object)

  • pcmesh (matplotlib.collections.QuadMesh)

  • colorbar (matplotlib.colorbar.Colorbar)
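The binlabel_colors / binlabel_color_thresh rule can be sketched with a small helper (a hypothetical function illustrating the documented behavior, not PISA code):

```python
def binlabel_color(value, colors=("white", "black"), thresh=None):
    """Pick a bin-label color: below thresh -> colors[0], else colors[1].

    With no threshold, always use the last color given.
    """
    if thresh is None:
        return colors[-1]
    return colors[0] if value < thresh else colors[1]

# Dark (low-value) bins get white labels, light (high-value) bins get black:
labels = [binlabel_color(v, thresh=50.0) for v in (10.0, 90.0)]
```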

project(axis, keepdims=False)[source]

Project all dimensions onto a single axis.

Parameters:
  • axis (string or int) – Dimensions to be projected onto.

  • keepdims (bool) – If True, marginalizes out (removes) the unspecified dimensions. If False, the binning in the summed dimension(s) includes the full range of the binning for each dimension in the original Map. Note that if you want to remove all singleton dimensions (which could include the axis specified here), call the squeeze method on the result of project.

Returns:

projection

Return type:

Map
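Numerically, projecting a histogram onto one axis is just a sum over every other axis, which can be sketched with NumPy:

```python
import numpy as np

def project_hist(hist, axis):
    """Sum a histogram over every axis except the one projected onto."""
    other_axes = tuple(i for i in range(hist.ndim) if i != axis)
    return hist.sum(axis=other_axes)

hist = np.arange(12.0).reshape(3, 4)      # e.g. (energy, coszen)
energy_proj = project_hist(hist, axis=0)  # shape (3,)
coszen_proj = project_hist(hist, axis=1)  # shape (4,)
```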

rebin(new_binning)[source]

Rebin the map with bin edge locations and names according to those specified in new_binning.

Calls the rebin function in the pisa.core.map module to do the actual work.

Parameters:

new_binning (MultiDimBinning) – Dimensions specified in new_binning must match (modulo pre/suffixes) the current dimensions.

Return type:

Map binned according to new_binning.

See also

pisa.core.map.rebin

function called to do the work

reorder_dimensions(order)[source]

Rearrange the dimensions in the map. This affects both the binning and the contained histogram.

Parameters:

order (MultiDimBinning or sequence of str, int, or OneDimBinning) – Ordering desired for the dimensions of this map. See binning.reorder_dimensions for details on how to specify order.

Returns:

Map

Return type:

copy of this map but with dimensions reordered

See also

rebin

Modify Map (and its binning) by splitting or combining adjacent bins

downsample

Modify Map (and its binning) by combining adjacent bins

round2int()[source]
property serializable_state
set_errors(error_hist)[source]

Manually define the error with an array the same shape as the contained histogram. Can also remove errors by passing None.

Parameters:

error_hist (None or ndarray (same shape as hist)) – Standard deviations to apply to self.hist. If None is passed, any errors present are removed, making self.hist a bare numpy array.

set_poisson_errors()[source]

Approximate poisson errors using sqrt(n).

property shape

shape of the map, akin to numpy.ndarray.shape

Type:

tuple

signed_sqrt_mod_chi2(expected_values)[source]

Calculate the binwise (signed) square-root of the modified chi2 value between this map and the map described by expected_values; self is taken to be the “actual values” (or (pseudo)data), and expected_values are the expectation values for each bin.

Parameters:

expected_values (numpy.ndarray or Map of same dimension as this.)

Returns:

m_pulls

Return type:

signed_sqrt_mod_chi2

property size

total number of elements

Type:

int

slice(**kwargs)[source]

Slice the map, where each argument is the name of a dimension. Dimensions not named are included in full (i.e., via slice(None)).

Note that the resulting map maintains the same number of dimensions as its parent, including the ordering of the dimensions. The size of each dimension, however, is reduced by slicing.

Note also that modifications to the returned object’s hist will modify the parent’s hist.

Examples

Indexing can be done as in the following examples:

>>> mdb = MultiDimBinning([
...     dict(name='x', domain=[0,1], is_lin=True, num_bins=5),
...     dict(name='y', domain=[1,2], is_lin=True, num_bins=10)
... ])
>>> ones = mdb.ones(name='ones')
>>> print(ones.slice(x=0,))
Map(name='ones',
        tex='{\rm ones}',
        full_comparison=False,
        hash=None,
        parent_indexer=(0, slice(None, None, None)),
        binning=MultiDimBinning([
                    OneDimBinning(name=OneDimBinning('x', 1 bin with edges at [0.0, 0.2] (behavior is linear))),
                    OneDimBinning(name=OneDimBinning('y', 10 equally-sized bins spanning [1.0, 2.0]))]),
        hist=array([[ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]]))
>>> print(ones.slice(x=0, y=slice(None)).hist)
[[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]
>>> print(ones.slice(x=0, y=0).hist)
[[ 1.]]

Modifications to the slice modify the original:

>>> mdb = MultiDimBinning([
...     dict(name='x', domain=[0,1], is_lin=True, num_bins=5),
...     dict(name='y', domain=[1,2], is_lin=True, num_bins=10)
... ])
>>> ones = mdb.ones(name='ones')
>>> sl = ones.slice(x=2)
>>> sl.hist[...] = 0
>>> print(sl.hist)
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]
>>> print(ones.hist)
[[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]]

See also

pisa.core.binning.MultiDimBinning.indexer

Method used to generate a raw indexer (that can be used to index into a map or a Numpy array of same dimensionality). This method is accessible from a Map map_x object via its binning attribute: map_x.binning.indexer(…)

pisa.core.binning.MultiDimBinning.broadcast

Broadcast a 1D Numpy array to dimensionality with reference to this object’s dimensionality.

split(dim, bin=None, use_basenames=False, pure_bin_names=False)[source]

Split this map into one or more maps by selecting the dim dimension and optionally the specific bin(s) within that dimension specified by bin.

If both dim and bin are specified and this identifies a single bin, a single Map is returned, while if this locates multiple bins, a MapSet is returned where each map corresponds to a bin (in the order dictated by the bin specification).

If only dim is specified, _regardless_ if multiple bins meet the (dim, bin) criteria, the maps corresponding to each bin are collected into a MapSet and returned.

Resulting maps are ordered according to the binning and are renamed as:

new_map[j].name = orig_map.name__dim.binning.bin_names[i]

if the current map has a name, or

new_map[j].name = dim.binning.bin_names[i]

if the current map has a zero-length name.

In the above, j is the index into the new MapSet and i is the index to the bin in the original binning spec. map.name is the current (pre-split) map’s name, and if the bins do not have names, then the stringified integer index to the bin, str(i), is used instead.

Parameters:
  • dim (string, int) – Name or index of a dimension in the map

  • bin (None or bin indexing object (str, int, slice, ellipsis)) – Optionally specify specific bin(s) to split out from the chosen dimension.

Returns:

split_maps – If only dim is passed, returns MapSet regardless of how many maps are found. If both dim and bin are specified and this results in selecting more than one bin, also returns a MapSet. However if both dim and bin are specified and this selects a single bin, just the indexed Map is returned. Naming of the maps and MapSet is updated to reflect what the map represents, while the hash value is copied into the new map(s).

Return type:

Map or MapSet

sqrt()[source]

Take square root of map’s values, returning a new map.

Returns:

sqrt_map

Return type:

Map

squeeze()[source]

Remove any singleton dimensions (i.e. that have only a single bin). Analogous to numpy.squeeze.

Return type:

Map with equivalent values but singleton dimensions removed

property std_devs

Uncertainties (standard deviations) per bin

Type:

numpy.ndarray

sum(axis=None, keepdims=False)[source]

Sum over dimensions corresponding to axis specification. Similar in behavior to numpy.sum method.

Parameters:
  • axis (None; or str, int, or sequence thereof) – Dimension(s) to be summed over. If None, sum over _all_ dimensions.

  • keepdims (bool) – If True, marginalizes out (removes) the specified dimensions. If False, the binning in the summed dimension(s) is expanded to the full range of the binning for each dimension over which the sum is performed.

Returns:

s – If all contained dimensions are summed over and keepdims is False, a scalar is returned. Otherwise, a Map is returned, with the summed dimensions removed if keepdims is False.

Return type:

Map or scalar

property tex

TeX label

Type:

string

to_json(filename, **kwargs)[source]

Serialize the state to a JSON file that can be instantiated as a new object later.

Parameters:
  • filename (str) – Filename; must be either a relative or absolute path (not interpreted as a PISA resource specification)

  • **kwargs – Further keyword args are sent to pisa.utils.jsons.to_json()

See also

from_json

Instantiate new object from the file written by this method

pisa.utils.jsons.to_json

class pisa.core.map.MapSet(maps, name=None, tex=None, hash=None, collate_by_name=True)[source]

Bases: object

Ordered set of event rate maps (aka histograms) defined over an arbitrary regular hyper-rectangular binning.

Parameters:
  • maps (Map or sequence of Map)

  • name (string)

  • tex (string)

  • hash (immutable)

  • collate_by_name (bool) –

    If True, when this MapSet is passed alongside another MapSet to a function that operates on the maps, contained maps in each will be accessed by name. Hence, only maps with the same names will be operated on simultaneously.

    If False, the contained maps in each MapSet will be accessed by their order in each MapSet. This behavior is useful if maps are renamed through some operation but their order is maintained, and comparisons are then sought with their progenitors under the original (different) names.

allclose(other)[source]

Check if this mapset and another have the same (within machine precision) bin counts

apply_to_maps(attr, *args, **kwargs)[source]
chi2_per_map(expected_values)[source]
chi2_total(expected_values)[source]
collate_with_names(vals)[source]
combine_re(regexes)[source]

For each regex passed, add together contained maps whose names match.

If a string or regex is passed, the corresponding maps are combined and returned as a Map object. If an iterable of one or more regexes is passed, each grouping found is combined into a Map separately and the resulting Maps are populated into a new MapSet to be returned.

Parameters:

regexes (compiled regex, str representing a regex, or iterable thereof) – See Python module re for formatting.

Returns:

Map if regexes is a string or regex; MapSet if regexes is an iterable of one or more strings or regexes

Return type:

combined

Raises:

ValueError – If any regexes fail to match any map names.

Notes

If special characters are used in the regex, like a backslash, be sure to use a Python raw string (which does not interpret such special characters) by prefixing the string with an “r”. E.g., the regex to match a period requires passing

regex=r’\.’

Examples

Get total of trck and cscd maps, which are named with suffixes “trck” and “cscd”, respectively.

>>> total_trck_map = outputs.combine_re('.*trck')
>>> total_cscd_map = outputs.combine_re('.*cscd')

Get a MapSet with both of the above maps in it (and a single command)

>>> total_pid_maps = outputs.combine_re(['.*trck', '.*cscd'])

Strict name-checking, combine nue_cc + nuebar_cc, including both cascades and tracks.

>>> nue_cc_nuebar_cc_map = outputs.combine_re(
...     '^nue(bar){0,1}_cc_(cscd|trck)$')

Lenient nue_cc + nuebar_cc including both cascades and tracks.

>>> nue_cc_nuebar_cc_map = outputs.combine_re('nue.*_cc_.*')

Total of all maps

>>> total = outputs.combine_re('.*')

See also

combine_wildcard

Similar method but using wildcards (i.e., globbing, like filename matching in the Unix shell)

References

re : Python module used for parsing regular expressions

https://docs.python.org/2/library/re.html

combine_wildcard(expressions)[source]

For each expression passed, add together contained maps whose names match.

Expressions can contain wildcards like those used in the Unix shell.

Valid wildcards (from fnmatch docs, link below):

“*” : matches everything

“?” : matches any single character

“[seq]” : matches any character in seq

“[!seq]” : matches any character not in seq

Note that if a string is passed, the matching maps are combined and returned as a Map object. If an iterable of strings is passed, each grouping found is combined into a Map separately and the resulting Maps are populated into a new MapSet to be returned.

Parameters:

expressions (string or sequence thereof) – See Python module fnmatch for more info.

Returns:

Map if expressions is a string; MapSet if expressions is an iterable of one or more strings

Return type:

combined

Raises:

ValueError – If any expressions fail to match any map names.

Examples

>>> total_trck_map = outputs.combine_wildcard('*trck')
>>> total_cscd_map = outputs.combine_wildcard('*cscd')
>>> total_pid_maps = outputs.combine_wildcard(['*trck', '*cscd'])
>>> nue_cc_nuebar_cc_map = outputs.combine_wildcard('nue*_cc_*')
>>> total = outputs.combine_wildcard('*')

See also

combine_re

similar method but using regular expressions

References

fnmatch : Python module used for parsing the expression with wildcards

https://docs.python.org/2/library/fnmatch.html

compare(ref)[source]

Compare maps in this MapSet against a reference MapSet.

Parameters:

ref (MapSet) – Maps taken as the reference against which to compare maps contained within this MapSet.

Returns:

stats – Each key is the name of a map, and each value is itself an OrderedDict as returned by the Map.compare method

Return type:

OrderedDict

Examples

>>> stats = map_set_test.compare(map_set_ref)
downsample(*args, **kwargs)[source]
find_map(value)[source]
fluctuate(method, random_state=None, jumpahead=None)[source]

Add fluctuations to the maps in the set and return as a new MapSet.

Parameters:
  • method (None or string)

  • random_state (None, numpy.random.RandomState, or seed spec)

classmethod from_json(resource)[source]

Instantiate a new MapSet object from a JSON file.

The format of the JSON is generated by the MapSet.to_json method, which converts a MapSet object to basic types and then numpy arrays are converted in a call to pisa.utils.jsons.to_json.

Parameters:

resource (str) – A PISA resource specification (see pisa.utils.resources)

property hash

Hash value of the map set is based upon the contained maps.

  • If all maps have the same hash value, this value is returned as the map set’s hash

  • If one or more maps have different hash values, a list of the contained maps’ hash values is hashed

  • If any contained map has None for hash value, the hash value of the map set is also None (i.e., invalid)
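This rule can be sketched in plain Python (a hypothetical helper for illustration; PISA's actual hashing is not shown here):

```python
def mapset_hash(map_hashes):
    """Combine per-map hash values following the documented rule."""
    if any(h is None for h in map_hashes):
        return None                 # any invalid map hash invalidates the set
    if len(set(map_hashes)) == 1:
        return map_hashes[0]        # all identical: reuse that value
    return hash(tuple(map_hashes))  # otherwise hash the list of hashes
```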

hash_maps(map_names=None)[source]

Generate a hash on the contained maps (i.e. exclude state pertaining only to the MapSet itself, but include all state pertaining to the contained Maps).

Parameters:

map_names (None or sequence of strings) – If sequence of strings, use these as the map names instead of any names contained.

Returns:

hash – If any contained map hashes to None, the resulting hash will also be None.

Return type:

None or int

property hashes

hash of each map

Type:

list of int

index(x)[source]

Find map corresponding to x and return its index. Accepts either an integer index or a map name to make interface consistent.

Parameters:

x (int, string, or Map) – Map, map name, or integer index of map in this MapSet. If a Map is passed, only its name is matched to the maps in this set.

Return type:

integer index to the map

llh_per_map(expected_values)[source]
llh_total(expected_values)[source]
log()[source]
log10()[source]
metric_per_map(expected_values, metric)[source]
metric_total(expected_values, metric, metric_kwargs=None)[source]

Compute the binned optimization metric on all maps of a mapset, then sum it up.

metric_kwargs allows passing extra arguments to the metric, like the number of empty bins for the generalized Poisson llh (not yet implemented for MapSet).

property name

name of the map (legal Python name)

Type:

string

property names

name of each map

Type:

list of strings

plot(*args, **kwargs)[source]
pop(*args)[source]

Remove a map and return it. If a value is passed, the map corresponding to index(value) is removed and returned instead.

Parameters:

x (int, string, or Map; optional) – Map, map name, or integer index of map in this MapSet. If a Map is passed, only its name is matched to the maps in this set.

Return type:

Map removed from this MapSet

See also

list.pop

project(axis, keepdims=False)[source]

Per-map projections onto single axis. See Map.project for more detailed help.

Parameters:
  • axis (string or int)

  • keepdims (bool)

Returns:

projection – Each map in this MapSet projected onto axis.

Return type:

MapSet

See also

sum

Sum over specified dimension(s)

Map.project

Method called for each map in this MapSet to perform the actual projection.

rebin(*args, **kwargs)[source]
reorder_dimensions(order)[source]

Return a new MapSet object with dimensions ordered according to order.

Parameters:

order (MultiDimBinning or sequence of string, int, or OneDimBinning) – Order of dimensions to use. Strings are interpreted as dimension basenames, integers are interpreted as dimension indices, and OneDimBinning objects are interpreted by their basename attributes (so e.g. the exact binnings in order do not have to match this object’s exact binnings; only their basenames). Note that a MultiDimBinning object is a valid sequence type to use for order.

Notes

Dimensions specified in order that are not in this object are ignored, but dimensions in this object that are missing in order result in an error.

Return type:

MapSet object with reordered dimensions.

Raises:
ValueError if dimensions present in this object are missing from order

property serializable_state

all state needed to reconstruct object

Type:

OrderedDict

set_poisson_errors()[source]
sqrt()[source]
squeeze()[source]

Remove any singleton dimensions (i.e., those that have only a single bin) from all contained maps. Analogous to numpy.squeeze.

Return type:

MapSet with equivalent values but singleton Map dimensions removed

sum(*args, **kwargs)[source]
to_json(filename, **kwargs)[source]

Serialize the state to a JSON file that can be instantiated as a new object later.

Parameters:
  • filename (str) – Filename; must be either a relative or absolute path (not interpreted as a PISA resource specification)

  • **kwargs – Further keyword args are sent to pisa.utils.jsons.to_json()

See also

from_json

Instantiate new object from the file written by this method

pisa.utils.jsons.to_json

pisa.core.map.rebin(hist, orig_binning, new_binning, normalize_values=True)[source]

Rebin a histogram.

Note that the new binning’s edges must be a subset of the original binning’s edges (i.e. no sub-division or extrapolation of bins is implemented).

Parameters:
  • hist (numpy.ndarray) – Array containing the (original) histogram’s entries

  • orig_binning (MultiDimBinning) – Original binning

  • new_binning (MultiDimBinning) – Desired binning, where new_binning.bin_edges must be a subset of orig_binning.bin_edges.

  • normalize_values (bool) – Whether to apply pisa.utils.comparisons.normQuant to the bin edges prior to comparing new_binning to orig_binning. This is computationally expensive but ensures similar binnings and equivalent units do not cause erroneous results. It is recommended to set normalize_values=True unless you know the two binning specs are consistently defined.

Returns:

new_hist – New histogram rebinned from hist

Return type:

numpy.ndarray
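
The core idea (merging adjacent bins when the new edges are a subset of the old edges) can be sketched in plain numpy for the 1D case; the function below is an illustration, not the PISA implementation:

```python
import numpy as np

def rebin_1d(hist, orig_edges, new_edges):
    """Rebin a 1D histogram onto coarser edges that are a subset of the
    original edges (no sub-division or extrapolation of bins)."""
    orig_edges = np.asarray(orig_edges)
    new_edges = np.asarray(new_edges)
    if not np.all(np.isin(new_edges, orig_edges)):
        raise ValueError("new edges must be a subset of original edges")
    # Position of each new lower edge within the original edges
    start = np.searchsorted(orig_edges, new_edges[:-1])
    # Sum original bin contents falling into each new bin
    return np.add.reduceat(hist, start)

hist = np.array([1., 2., 3., 4.])
orig = [0., 1., 2., 3., 4.]
new = [0., 2., 4.]
rebin_1d(hist, orig, new)  # -> array([3., 7.])
```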

pisa.core.map.reduceToHist(obj)[source]

Recursively sum to reduce an object to a single histogram.

Parameters:

obj (numpy.ndarray, Map, MapSet, or iterable of MapSets)

Returns:

hist – Single histogram version of obj

Return type:

numpy.ndarray

Raises:

TypeError if obj is an unhandled type

pisa.core.map.test_Map()[source]

Unit tests for Map class

pisa.core.map.test_MapSet()[source]

Unit tests for MapSet class

pisa.core.map.type_error(value)[source]

Generic formulation of a TypeError that can be called throughout the code

pisa.core.map.valid_nominal_values(data_array)[source]

Get the nominal values that are valid for an array

pisa.core.param module

Define Param, ParamSet, and ParamSelector classes for handling parameters, sets of parameters, and being able to discretely switch between sets of parameter values.

class pisa.core.param.DerivedParam(name, value, unique_id=None, is_discrete=False, scales_as_log=False, nominal_value=None, tex=None, range=None, depends_names='', function_file='', help='')[source]

Bases: Param

This is a meta-parameter that implements a param depending on other Params.

property callable: Funct
property depends_names
property dependson: dict[Param]
classmethod from_state(state) → DerivedParam[source]
prior_penalty(metric)[source]

We don’t want to double-count the penalty from derived params

property range
property serializable_state
property state: dict
to_json(filename, **kwargs)[source]

Serialize the state to a JSON file that can be instantiated as a new object later.

validate_value(value)[source]
class pisa.core.param.Param(name, value, prior, range, is_fixed, unique_id=None, is_discrete=False, scales_as_log=False, nominal_value=None, tex=None, help='')[source]

Bases: object

Parameter class to store any kind of parameters

Parameters:
  • name (string)

  • value (scalar, bool, pint Quantity (value with units), string, or None)

  • prior (pisa.prior.Prior or instantiable thereto)

  • range (sequence of two scalars or Pint quantities, or None)

  • is_fixed (bool)

  • unique_id (string, optional) – If None is provided (default), unique_id is set to name

  • is_discrete (bool, optional) – Default is False

  • scales_as_log (bool, optional) – Rescale the log of the parameter’s value between 0 and 1 for minimization, rather than the value itself. This can help optimizing parameters spanning several orders of magnitude.

  • nominal_value (same type as value, optional) – If None (default), set to same as value

  • tex (None or string)

  • help (string)

Notes

In the case of a free (`is_fixed`=False) parameter, a valid range for the parameter should be specified and a prior must be assigned to compute llh and chi2 values.

Examples

>>> from pisa import ureg
>>> from pisa.core.prior import Prior
>>> gaussian = Prior(kind='gaussian', mean=10*ureg.meter,
...                  stddev=1*ureg.meter)
>>> x = Param(name='x', value=1.5*ureg.foot, prior=gaussian,
...           range=[-10, 60]*ureg.foot, is_fixed=False, is_discrete=False)
>>> x.value
<Quantity(1.5, 'foot')>
>>> print(x.prior_penalty(metric='llh'))
-45.532515919999994
>>> print(x.to('m'))
>>> x.value = 10*ureg.m
>>> print(x.value)
<Quantity(32.8083989501, 'foot')>
>>> x.ito('m')
>>> print(x.value)
>>> x.prior_penalty(metric='llh')
-1.5777218104420236e-30
>>> x.nominal_value
>>> x.reset()
>>> print(x.value)
property dimensionality
classmethod from_json(filename)[source]

Instantiate a new Param from a JSON file

property hash

hash of full state

Type:

int

ito(units)[source]

Convert this param (in place) to have units of units.

Parameters:

units (string or pint.Unit)

Return type:

None

See also

to, Pint.to, Pint.ito

property m
m_as(u)[source]
property magnitude
property nominal_value
property prior
property prior_chi2
property prior_llh
prior_penalty(metric)[source]

Return the prior penalty according to metric.

Parameters:

metric (str) – Metric to use for evaluating the prior penalty.

Returns:

penalty

Return type:

float prior penalty value

randomize(random_state=None)[source]

Randomize the parameter’s value according to a uniform random distribution within the parameter’s defined limits.

Parameters:

random_state (None, int, or RandomState) – Object to use for random state. None defaults to the global random state (this is discouraged, as results are not reproducible). An integer acts as a seed to numpy.random.seed(), and RandomState is a numpy.random.RandomState object.
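
The reproducibility behavior can be sketched in plain numpy (the limits are assumed values, not tied to an actual Param object):

```python
import numpy as np

# Hypothetical parameter limits (not tied to an actual Param object)
low, high = -1.0, 1.0

# An int seeds a dedicated RandomState, making the draw reproducible
rng = np.random.RandomState(42)
value = low + (high - low) * rng.uniform()
assert low <= value <= high

# The same seed reproduces the same value
rng2 = np.random.RandomState(42)
assert value == low + (high - low) * rng2.uniform()
```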

property range
reset()[source]

Reset the parameter’s value to its nominal value.

property serializable_state
set_nominal_to_current_value()[source]

Define the nominal value to the parameter’s current value.

property state
property tex
to(units)[source]

Return an equivalent copy of param but in units of units.

Parameters:

units (string or pint.Unit)

Returns:

Param

Return type:

copy of this param, but in specified units.

See also

ito, Pint.to, Pint.ito

to_json(filename, **kwargs)[source]

Serialize the state to a JSON file that can be instantiated as a new object later.

property u
property units
validate_value(value)[source]
property value
class pisa.core.param.ParamSelector(regular_params=None, selector_param_sets=None, selections=None)[source]

Bases: object

Parameters:
  • regular_params (ParamSet or instantiable thereto)

  • selector_param_sets (None, dict, or sequence of dict) –

    Dict(s) format:
    {
        '<name1>': <ParamSet or instantiable thereto>,
        '<name2>': <ParamSet or instantiable thereto>,
        ...
    }

    The names are what must be specified in selections to select the corresponding ParamSets. Params specified in any of the ParamSets within selector_param_sets cannot be in regular_params.

  • selections (None, string, or sequence of strings) – One string is required per

Notes

Params specified in regular_params are enforced to be mutually exclusive with params in the param sets specified by selector_param_sets.

get(name, selector=None)[source]
property param_selections
property params
select_params(selections=None, error_on_missing=False)[source]
update(p, selector=None, existing_must_match=False, extend=True)[source]
class pisa.core.param.ParamSet(*args)[source]

Bases: MutableSequence, Set

Container class for a set of parameters. Most methods are passed through to contained params.

Interface is a superset of both MutableSequence (i.e., behaves like a Python list, so ordering, appending, extending, etc. all work) and Set (i.e., behaves like a Python set, so no duplicates, tested by name, are allowed; you can test set membership like issuperset, issubset, etc.). See https://docs.python.org/3/library/collections.abc.html for the definitions of the MutableSequence and Set interfaces.

Parameters:

*args (one or more Param objects or sequences thereof)

Examples

>>> from pisa import ureg
>>> from pisa.core.prior import Prior
>>> e_prior = Prior(kind='gaussian', mean=10*ureg.GeV, stddev=1*ureg.GeV)
>>> cz_prior = Prior(kind='uniform', llh_offset=-5)
>>> reco_energy = Param(name='reco_energy', value=12*ureg.GeV,
...                     prior=e_prior, range=[1, 80]*ureg.GeV,
...                     is_fixed=False, is_discrete=False,
...                     tex=r'E^{\rm reco}')
>>> reco_coszen = Param(name='reco_coszen', value=-0.2, prior=cz_prior,
...                     range=[-1, 1], is_fixed=True, is_discrete=False,
...                     tex=r'\cos\,\theta_Z^{\rm reco}')
>>> param_set = ParamSet(reco_energy, reco_coszen)
>>> print(param_set)
reco_coszen=-2.0000e-01 reco_energy=+1.2000e+01 GeV
>>> print(param_set.free)
reco_energy=+1.2000e+01 GeV
>>> print(param_set.reco_energy.value)
12 gigaelectron_volt
>>> print([p.prior_penalty(metric='llh') for p in param_set])
[-5.0, -2]
>>> print(param_set.priors_penalty(metric='llh'))
-7.0
>>> print(param_set.values_hash)
3917547535160330856
>>> print(param_set.free.values_hash)
-7121742559130813936
add_covariance(covmat: dict) → None[source]

Correlates several Params.

It works by taking the existing params and rotating them into a new, uncorrelated basis. New parameters are added in the new basis, and the old params are replaced with derived params. The fits are therefore done in the uncorrelated basis.

Parameters:
  • covmat (dict) – A two-deep nested dictionary of covariances between Params. Note: this is deliberately not a 2D array, so as to be explicit about which params are used. Example:

    {
        'Param1': {'Param1': 0.9, 'Param2': 0.1},
        'Param2': {'Param1': 0.1, 'Param2': 0.8}
    }

Raises:
  • KeyError if given dict has keys not shared with Parameter names

  • TypeError if a given entry in covmat is not the proper type

  • NotImplementedError if the means of calculating the mean for a given parameter's prior has not been implemented yet
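
The rotation into an uncorrelated basis can be illustrated with a plain-numpy eigendecomposition (a sketch of the idea, not PISA's implementation; the covariance values are invented):

```python
import numpy as np

# Invented 2x2 covariance between two hypothetical params
cov = np.array([[0.9, 0.1],
                [0.1, 0.8]])

# Eigendecomposition gives the rotation into an uncorrelated basis
evals, evecs = np.linalg.eigh(cov)

# Independent draws in the rotated basis ...
rng = np.random.default_rng(0)
z = rng.standard_normal((100_000, 2)) * np.sqrt(evals)

# ... acquire the target covariance once rotated back
x = z @ evecs.T
sample_cov = np.cov(x, rowvar=False)  # approximately equal to cov
```

Fitting in the rotated (z) basis means the minimizer sees uncorrelated parameters, while derived params reproduce the correlated originals.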

property are_discrete
property are_fixed
property continuous
property discrete
extend(values)[source]

Append param(s) in values to this param set, but ensure params in values that are already in this param set match. Params with same name attribute are not duplicated.

(Convenience method for calling the update method with existing_must_match=True and extend=True.)

fix(x)[source]

Set param(s) to be fixed in value (and hence not modifiable by e.g. a minimizer).

Note that the operation is atomic: If x is a sequence of indexing objects, if _any_ index in x cannot be found, _no_ other params specified in x will be set to be fixed.

Any params specified in x that are already fixed simply remain so.

Parameters:

x (int, str, Param, or iterable thereof) – Object or sequence to index into params to define which to affix. See index method for valid objects to use for indexing into the ParamSet.

Raises:

ValueError if any index cannot be found

property fixed
property free
classmethod from_json(filename)[source]

Instantiate a new ParamSet from a JSON file

property has_derived: bool

Returns whether or not this set contains a derived parameter

property hash

full state hash

Type:

int

index(value)[source]

Return an integer index to the Param in this ParamSet indexed by value. This does not look up a param’s value property but looks for param by name, integer index, or matching object.

Parameters:

value (int, str or Param) – The object to return an index for. If int, the integer is returned (so long as it’s in the valid range). If str, return index of param with matching name attribute. If Param object, return index of an equivalent Param in this set.

Returns:

idx

Return type:

int index to a corresponding param in this ParamSet

Raises:

ValueError if value does not correspond to a param in this ParamSet

insert(index, value)[source]

Insert value before index

property is_nominal
issubset(other)[source]
issuperset(other)[source]
property name_val_dict
property names
property nominal_values
property nominal_values_hash

hash only on the nominal param values

Type:

int

property priors
property priors_chi2
property priors_llh
priors_penalties(metric)[source]

Return the prior penalties for each param at their current values.

Parameters:

metric (str) – Metric to use for evaluating the prior.

Returns:

penalty

Return type:

list of float prior values, one for each param

priors_penalty(metric)[source]

Return the aggregate prior penalty for all params at their current values.

Parameters:

metric (str) – Metric to use for evaluating the prior.

Returns:

penalty

Return type:

float sum of all parameters’ prior values

randomize_free(random_state=None)[source]

Randomize any free parameters according to a uniform random distribution within the parameters' defined limits.

Parameters:

random_state (None, int, or RandomState) – Object to use for random state. None defaults to the global random state (this is discouraged, as results are not reproducible). An integer acts as a seed to numpy.random.seed(), and RandomState is a numpy.random.RandomState object.

property ranges
replace(new)[source]

Replace an existing param with new param, where the existing param must have the same name attribute as new.

Parameters:

new (Param) – New param to use instead of current param.

Raises:

ValueError if new.name does not match an existing param's name

reset_all()[source]

Reset both free and fixed parameters to their nominal values.

reset_free()[source]

Reset only free parameters to their nominal values.

property serializable_state
set_nominal_by_current_values()[source]

Define the nominal values as the parameters’ current values.

set_values(new_params)[source]

Set values in current ParamSet to those defined in new ParamSet

Parameters:

new_params (ParamSet) – ParamSet containing set of values to change current ParamSet to.

property state
tabulate(tablefmt='plain')[source]
property tex
to_json(filename, **kwargs)[source]

Serialize the state to a JSON file that can be instantiated as a new object later.

unfix(x)[source]

Set param(s) to be free (and hence modifiable by e.g. a minimizer).

Note that the operation is atomic: If x is a sequence of indexing objects, if _any_ index in x cannot be found, _no_ other params specified in x will be set to be free.

Any params specified in x that are already free simply remain so.

Parameters:

x (int, str, Param, or iterable thereof) – Object or sequence to index into params to define which to affix. See index method for valid objects to use for indexing into the ParamSet.

Raises:

ValueError if any index cannot be found

update(obj, existing_must_match=False, extend=True)[source]

Update this param set using obj.

Default behavior is similar to Python’s dict.update, but this can be modified via existing_must_match and extend.

Parameters:
  • obj (Param, ParamSet, or sequence thereof) – Param or container with params to update and/or extend this param set

  • existing_must_match (bool) – If True, raises ValueError if param values passed in that already exist in this param set have differing values.

  • extend (bool) – If True, params not in this param set are appended.

update_existing(obj)[source]

Only existing params in this set are updated by that(those) param(s) in obj.

Convenience method for calling update with existing_must_match=False and extend=False.

property values
property values_hash

hash only on the current param values (not full state)

Type:

int

pisa.core.param.test_Param()[source]

Unit tests for Param class

pisa.core.param.test_ParamSelector()[source]

Unit tests for ParamSelector class

pisa.core.param.test_ParamSet()[source]

Unit tests for ParamSet class

pisa.core.pipeline module

Implementation of the Pipeline object, and a simple script to instantiate and run a pipeline (the outputs of which can be plotted and stored to disk).

class pisa.core.pipeline.Pipeline(config, profile=False)[source]

Bases: object

Instantiate stages according to a parsed config object; execute stages.

Parameters:
  • config (string, OrderedDict, or PISAConfigParser) – If string, interpret as resource location; send to the config_parser.parse_pipeline_config() method to get a config OrderedDict. If OrderedDict, use directly as pipeline configuration.

  • profile (bool) – Perform timings

add_covariance(covmat)[source]

Incorporates covariance between parameters. This is done by replacing the relevant correlated parameters with "DerivedParams" that depend on new parameters in an uncorrelated basis.

The parameters are all updated, but this doesn't add the new parameters in, so we go to the first stage we find that has one of the original parameters and manually add them there.

property config

Deepcopy of the OrderedDict used to instantiate the pipeline

get_outputs(**get_outputs_kwargs)[source]

Wrapper around _get_outputs. The latter may have considerable overhead compared to run alone.

property hash

Hash of the state of the pipeline. This hashes together a hash of the Pipeline class’s source code and a hash of the state of each contained stage.

Type:

int

index(stage_id)[source]

Return the index in the pipeline of stage_id.

Parameters:

stage_id (string or int) – Name of the stage, or stage number (0-indexed)

Returns:

idx

Return type:

integer stage number (0-indexed)

Raises:

ValueError if stage_id is not in the pipeline

property param_selections

param selections collected from all stages

Type:

list of strings

property params

pipeline’s parameters

Type:

pisa.core.param.ParamSet

property profile
report_profile(detailed=False, format_num_kwargs=None)[source]

Report timing information on pipeline and contained services

Parameters:
  • detailed (bool, default False) – Whether to increase level of detail

  • format_num_kwargs (dict, optional) –

    Dictionary containing arguments passed to utils.format.format_num.

    Will display each number with three decimal digits by default.

run()[source]

Wrapper around _run_function

select_params(selections, error_on_missing=False)[source]

Select a set of alternate param values/specifications.

Parameters:
  • selections (string or iterable of strings)

  • error_on_missing (bool)

Raises:

KeyError if error_on_missing is True and any of selections does not exist in any stage in the pipeline.

setup()[source]

Wrapper around _setup_function

property source_code_hash

Hash for the source code of this object’s class.

Not meant to be perfect, but should suffice for tracking provenance of objects stored to disk that were produced by a Stage.

property stage_names

names of stages in the pipeline

Type:

list of strings

property stages: list[Stage]

stages in the pipeline

Type:

list of Stage

tabulate(tablefmt='plain')[source]
update_params(params, existing_must_match=False, extend=False)[source]

Update params for the pipeline.

Note that any param in params in excess of those that already exist in the pipeline’s stages will have no effect.

Parameters:
  • params (ParamSet) – Parameters to be updated

  • existing_must_match (bool)

  • extend (bool)

pisa.core.pipeline.main(return_outputs=False)[source]

Main; call as script with return_outputs=False or interactively with return_outputs=True

FIXME: This is broken in various ways (easiest fix: pipeline.get_outputs() has no idx parameter anymore)

pisa.core.pipeline.parse_args()[source]

Parse command line arguments if pipeline.py is called as a script.

pisa.core.pipeline.test_Pipeline()[source]

Unit tests for Pipeline class

pisa.core.prior module

Prior class for use in pisa.core.Param objects

class pisa.core.prior.Prior(kind, **kwargs)[source]

Bases: object

Prior information for a parameter. Defines the penalty (in log-likelihood (llh)) for a parameter being at a given value (within the prior’s valid parameter range). Chi-squared penalties can also be returned (but the definition of a prior here is always in terms of llh).

Note that since this is a penalty, the more negative the prior’s log likelihood, the greater the penalty and the less likely the parameter’s value is.

Valid parameters and properties of the object differ based upon what kind of prior is specified.

Parameters:
  • kind='uniform', llh_offset=... – Uniform prior, no preference for any position relative to the valid range, which is taken to be [-inf, +inf] [x-units].

  • kind='gaussian', mean=..., stddev=... – Gaussian prior, defining the log likelihood penalty for the parameter being at any particular position. Valid range is [-inf, +inf] [x-units].

  • kind='linterp', param_vals=..., llh_vals=... – Linearly-interpolated prior. Note that "corners" in linear interpolation may cause difficulties for some minimizers.

  • kind='spline', knots=..., coeffs=..., deg=... – Smooth spline interpolation.

Properties

Additional properties are defined based on kind:

  • all kinds – kind, max_at, max_at_str, state, valid_range

  • kind='uniform' – llh_offset

  • kind='gaussian' – mean, stddev

  • kind='linterp' – param_vals, llh_vals

  • kind='spline' – knots, coeffs, deg

chi2()
llh()

Notes

If the parameter the prior is being applied to has units, the prior’s “x”-values specification must have compatible units.

If you implement a new prior, it *must* raise an exception if methods llh or chi2 are called with a parameter value outside the prior’s valid range, so subtle bugs aren’t introduced that appear as an issue in e.g. the minimizer.

Examples

For spline prior: knots, coeffs, and deg can be found by, e.g., scipy.interpolate.splrep; evaluation of spline priors is carried out internally by scipy.interpolate.splev, so an exact match to the output of the spline prior can be produced as follows:

>>> import numpy as np
>>> from scipy.interpolate import splrep, splev
>>> # Generate sample points
>>> param_vals = np.linspace(-10, 10, 100)
>>> llh_vals = param_vals**2
>>> # Define spline interpolant
>>> knots, coeffs, deg = splrep(param_vals, llh_vals)
>>> # Instantiate spline prior
>>> prior = Prior(kind='spline', knots=knots, coeffs=coeffs, deg=deg)
>>> # Generate sample points for interpolation
>>> param_upsamp = np.linspace(-10, 10, 1000)
>>> # Evaluation of spline using splev
>>> llh_upsamp = splev(param_upsamp, tck=(knots, coeffs, deg), ext=2)
>>> # Check that evaluation of spline matches call to prior.llh()
>>> all(prior.llh(param_upsamp) == llh_upsamp)
True
property serializable_state
property state
property units_str
pisa.core.prior.get_prior_bounds(obj, param=None, stddev=1.0)[source]

Obtain confidence intervals for given number of standard deviations from parameter prior.

Parameters:
  • obj (Prior, string, or Mapping) –

    if str, interpret as path from which to load a dict; if dict, can be:

    • template settings dict – must supply parameter name via param

    • params dict – must supply parameter name via param

    • prior dict

  • param (Param) – Name of parameter for which to get bounds; necessary if obj is either template settings or params

  • stddev (float or Iterable of floats) – number of stddevs

Returns:

bounds – A dictionary mapping the passed stddev values to the corresponding bounds

Return type:

OrderedDict

pisa.core.prior.test_Prior()[source]

Unit tests for Prior class

pisa.core.stage module

Stage class designed to be inherited by PISA services, such that all basic functionality is built-in.

class pisa.core.stage.Stage(data=None, params=None, expected_params=None, expected_container_keys=None, debug_mode=None, error_method=None, supported_reps=None, calc_mode=None, apply_mode=None, profile=False, in_standalone_mode=False)[source]

Bases: object

PISA stage base class.

Specialization should be done via subclasses.

Parameters:
  • data (ContainerSet or None) – object to be passed along

  • params (ParamSelector, dict of ParamSelector kwargs, ParamSet, or object instantiable to ParamSet)

  • expected_params (list of strings) – List containing required params names.

  • expected_container_keys (list of strings) – List containing required container keys.

  • debug_mode (None, bool, or string) –

    If None, False, or empty string, the stage runs normally.

    Otherwise, the stage runs in debug mode. This disables caching (TODO: where or how?). Services that subclass from the Stage class can then implement further custom behavior when this mode is set by reading the value of the self.debug_mode attribute.

  • error_method (None or string (not enforced)) – An option to define one or more dedicated error calculation methods for the stage transforms or outputs

  • supported_reps (dict) – Dictionary containing the representations allowed for calc_mode and apply_mode. If nothing is specified, Container.array_representations plus MultiDimBinning is assumed. Should have keys calc_mode and/or apply_mode, they will be created if not there.

  • calc_mode (pisa.core.binning.MultiDimBinning, str, or None) – Specify the default data representation for setup() and compute()

  • apply_mode (pisa.core.binning.MultiDimBinning, str, or None) – Specify the default data representation for apply()

  • profile (bool) – If True, perform timings for the setup, compute, and apply functions.

  • in_standalone_mode (bool) – If True, assume stage is not part of a pipeline. Affects whether setup() can be automatically rerun whenever calc_mode is changed.

apply()[source]
apply_function()[source]

Implement in services (subclasses of Stage)

property apply_mode
property calc_mode
compute()[source]
compute_function()[source]

Implement in services (subclasses of Stage)

property data

Data based on which stage may make computations and which it may modify

property debug_mode

Read-only attribute indicating whether or not the stage is being run in debug mode. None indicates non-debug mode, while a non-None value indicates a debug mode.

property error_method

Read-only attribute indicating whether or not the stage will compute errors for its transforms and outputs (whichever is applicable). Errors on inputs are propagated regardless of this setting.

expected_container_keys

The full set of keys that is expected to be present in each container within data

expected_params

The full set of parameters (by name) that must be present in params

full_hash

If True, do full hashing; otherwise, do fast hashing

property hash

Combines source_code_hash and params.hash for checking/tagging provenance of persisted (on-disk) objects.

in_standalone_mode

Whether stage is standalone or part of a pipeline

include_attrs_for_hashes(attrs)[source]

Include a class attribute or attributes in the hash computation.

This is a convenience that allows some customization of hashing (and hence caching) behavior without having to override the hash-computation method.

Parameters:

attrs (string or sequence thereof) – Name of the attribute(s) to include for hashes. Each must be an existing attribute of the object at the time this method is invoked.

property is_map

See ContainerSet.is_map for documentation

param_hash

Hash of stage params. Also serves as an indicator of whether setup() has already been called.

property param_selections

Param selections

property params

Params

profile

Whether to perform timings

report_profile(detailed=False, **format_num_kwargs)[source]

Report timing information on calls to setup, compute, and apply

run()[source]
select_params(selections, error_on_missing=False)[source]

Apply the selections to contained ParamSet.

Parameters:
  • selections (string or iterable)

  • error_on_missing (bool)

service_name

Name of the specific service implementing the stage.

setup()[source]
setup_function()[source]

Implement in services (subclasses of Stage)

property source_code_hash

Hash for the source code of this object’s class.

Not meant to be perfect, but should suffice for tracking provenance of objects stored to disk that were produced by a Stage.

stage_name

Name of the stage (flux, osc, aeff, reco, pid, etc.)

validate_params(params)[source]

Override this method to test if params are valid; e.g., check range and dimensionality. Invalid params should be indicated by raising an exception; no value should be returned.

pisa.core.translation module

Module for data representation translation methods

pisa.core.translation.find_index(val, bin_edges)

Find index in binning for val. If val is below the binning range or is NaN, return -1; if val is above the binning range, return num_bins. Edge inclusivity/exclusivity is defined as follows:

[ bin 0 ) [ bin 1 ) ... [ bin num_bins-1 ]

Using these indices to produce histograms should yield results identical to those produced by numpy.histogramdd (ignoring underflow and overflow, which find_index handles but numpy.histogramdd does not).

Parameters:
  • val (scalar) – Value for which to find bin index

  • bin_edges (1d numpy ndarray of 2 or more scalars) – Must be monotonically increasing, and all bins are assumed to be adjacent

Returns:

bin_idx – -1 is returned for underflow or if val is nan. num_bins is returned for overflow. Otherwise, for bin_edges[0] <= val <= bin_edges[-1], 0 <= bin_idx <= num_bins - 1

Return type:

int in [-1, num_bins]
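
A plain-numpy sketch of the documented convention (an illustration based on numpy.searchsorted, not the compiled PISA implementation):

```python
import numpy as np

def find_index(val, bin_edges):
    """Bin index per the convention above: lower edge inclusive, upper
    edge exclusive, except the uppermost bin whose upper edge is also
    inclusive. Underflow and NaN return -1; overflow returns num_bins."""
    num_bins = len(bin_edges) - 1
    if np.isnan(val) or val < bin_edges[0]:
        return -1
    if val > bin_edges[-1]:
        return num_bins
    # side='right' makes the lower edge inclusive; min() maps the
    # uppermost edge into the last bin
    return min(np.searchsorted(bin_edges, val, side='right') - 1,
               num_bins - 1)

edges = np.array([0., 1., 2., 3.])
assert find_index(0.0, edges) == 0    # lower edge inclusive
assert find_index(1.0, edges) == 1    # bin 0's upper edge belongs to bin 1
assert find_index(3.0, edges) == 2    # uppermost edge inclusive
assert find_index(-0.5, edges) == -1  # underflow
assert find_index(3.5, edges) == 3    # overflow -> num_bins
assert find_index(np.nan, edges) == -1
```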

pisa.core.translation.find_index_unsafe(val, bin_edges)

Find bin index of val within binning defined by bin_edges.

Validity of val and bin_edges is not checked.

Parameters:
  • val (scalar) – Assumed to be within range of bin_edges (including lower and upper bin edges)

  • bin_edges (array)

Return type:

index

See also

find_index

includes bounds checking and handling of special cases

pisa.core.translation.histogram(sample, weights, binning, averaged, apply_weights=True)[source]

Histogram sample points, weighting by weights, according to binning.

Parameters:
  • sample (list of np.ndarray)

  • weights (np.ndarray)

  • binning (PISA MultiDimBinning)

  • averaged (bool) – If True, the histogram entries are averages of the numbers that end up in a given bin. This must be used, for example, when oscillation probabilities are translated; otherwise we end up with probability*count per bin.

  • apply_weights (bool) – Whether to use the weights
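
The difference between summed and averaged entries can be sketched with numpy.histogram in 1D (the sample values are invented; this is not the PISA function itself):

```python
import numpy as np

# Sample points, per-event weights, and a simple 1D binning
sample = np.array([0.5, 0.7, 1.5, 2.5, 2.6])
weights = np.array([10., 20., 30., 40., 50.])
edges = np.array([0., 1., 2., 3.])

# Plain weighted histogram: sum of weights per bin
summed, _ = np.histogram(sample, bins=edges, weights=weights)

# 'averaged' behavior: divide by the entry count per bin, which is what
# you want when the weights are e.g. oscillation probabilities
counts, _ = np.histogram(sample, bins=edges)
averaged = np.divide(summed, counts, out=np.zeros_like(summed),
                     where=counts > 0)
# summed   -> [30., 30., 90.]
# averaged -> [15., 30., 45.]
```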

pisa.core.translation.lookup(sample, flat_hist, binning)[source]

The inverse of histogramming: extract the histogram values at sample points.

Parameters:
  • sample (num_dims list of length-num_samples np.ndarray) – Points at which to find histogram’s values

  • flat_hist (np.ndarray) – Histogram values

  • binning (num_dims MultiDimBinning) – Histogram’s binning

Returns:

hist_vals

Return type:

len-num_samples np.ndarray

Notes

Handles up to 3D.
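
The lookup idea (bin index per dimension, then flattened row-major indexing into the histogram values) can be sketched in plain numpy for 2D (illustrative values only, samples assumed to lie inside the binning):

```python
import numpy as np

# Values of a hypothetical 2x3 histogram, flattened row-major
flat_hist = np.array([0., 1., 2., 3., 4., 5.])
x_edges = np.array([0., 1., 2.])      # 2 bins in x
y_edges = np.array([0., 1., 2., 3.])  # 3 bins in y
n_y = len(y_edges) - 1

# Sample points at which to extract the histogram's values
x = np.array([0.5, 1.5])
y = np.array([2.5, 0.5])

# Bin index per dimension (side='right' makes lower edges inclusive)
ix = np.searchsorted(x_edges, x, side='right') - 1
iy = np.searchsorted(y_edges, y, side='right') - 1

# Flattened (row-major) index, then direct lookup
hist_vals = flat_hist[ix * n_y + iy]
# hist_vals -> [2., 3.]
```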

pisa.core.translation.resample(weights, old_sample, old_binning, new_sample, new_binning)[source]

Resample binned data with a given binning into any arbitrary new_binning

Parameters:
  • weights (np.ndarray)

  • old_sample (list of np.ndarray)

  • old_binning (PISA MultiDimBinning)

  • new_sample (list of np.ndarray)

  • new_binning (PISA MultiDimBinning)

Return type:

new_hist_vals

pisa.core.translation.test_find_index()[source]

Unit tests for find_index function.

Correctness is defined as producing the same histogram as numpy.histogramdd by using the output of find_index (ignoring underflow and overflow values). Additionally, -1 should be returned if a value is below the range (underflow) or is nan, and num_bins should be returned for a value above the range (overflow).

pisa.core.translation.test_histogram()[source]

Unit tests for histogram function.

Correctness is defined as matching the histogram produced by numpy.histogramdd.

Module contents