Likelihoods class for computing Poisson and Barlow likelihoods.
The Poisson likelihood assumes the template being compared against is the
perfect expectation, while the Barlow likelihood accounts for the template
being imperfect due to being generated from finite Monte Carlo statistics.
A class to handle the likelihood calculations in OscFit. It can
be used in other Poisson binned-likelihood calculators as well.
The class includes two likelihood spaces: the Poisson and the
Barlow LLH.
The Poisson likelihood is the ideal case and assumes infinite
statistical accuracy on the Monte Carlo distributions. This is
the simple case and has been used in the past for analyses, but
should not be used if there’s a significant statistical error on
one of the samples.
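For a binned measurement with observed counts d_i and template expectation lambda_i, the Poisson log-likelihood takes the standard form (a sketch; whether the constant term is kept is implementation-dependent):

$$ \ln L = \sum_i \left( d_i \ln \lambda_i - \lambda_i - \ln\Gamma(d_i + 1) \right) $$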
The Barlow likelihood is an implementation of the likelihood model
given in doi:10.1016/0010-4655(93)90005-W (“Fitting using finite
Monte Carlo samples” by Barlow and Beeston). This requires the
unweighted distributions of the MC and involves assuming that each
MC sample is drawn from an underlying “true” expectation distribution
which should be fit. This gives (number of bins)x(number of samples)
new parameters to fit and allows the shape of the distributions to
fluctuate in order to fit both the observed MC and the observed
data. This gives a more informed result and allows one to estimate
the effect of the finite MC samples.
To use this, you need to run set_data, set_mc, and the set_unweighted
functions.
Important
The set_mc function takes a histogram of the average
weight per event for each bin, not the total weighted histogram!
Simply calling the get_llh function after these will return the
best fit LLH for your chosen likelihood function. The function takes
the name of the likelihood space (“Poisson” or “Barlow”). You can
retrieve the best fit weighted plots using the get_plot (total best-fit
histogram including all samples) and the get_single_plots (the list
of best-fit weighted histograms for each sample).
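A minimal usage sketch based on the description above (the class name comes from the first line of this page; histogram shapes and exact signatures are assumptions):

    import numpy as np

    # Hedged sketch; see the class documentation above for the real API
    data_hist = np.array([12., 7., 3.])              # observed counts per bin
    avg_weight_hists = [np.array([0.9, 1.1, 1.0])]   # avg weight per event, per sample
    unweighted_hists = [np.array([100., 80., 40.])]  # raw MC counts, per sample

    llh_calc = Likelihoods()
    llh_calc.set_data(data_hist)
    llh_calc.set_mc(avg_weight_hists)            # average weight per event in each bin!
    llh_calc.set_unweighted(unweighted_hists)
    llh_poisson = llh_calc.get_llh('Poisson')
    llh_barlow = llh_calc.get_llh('Barlow')
    best_fit_total = llh_calc.get_plot()         # total best-fit histogram
    best_fit_each = llh_calc.get_single_plots()  # per-sample best-fit histograms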
Calculate the likelihood given the data, average weights, and the
unweighted histograms. You can choose between “Poisson” and “Barlow”
likelihoods at the moment.
If using the “Barlow” LLH, note that the minimization method is Powell
with a maximum of 25 iterations per step. This is not optimized at all and
was chosen simply because it “worked”. Powell does not accept bounds
set in the normal way, so the positive-definiteness of
the rates is enforced in the get_llh_barlow_bin method.
The Barlow LLH finds the best-fit “expected” MC distribution using
both the data and observed MC as constraints. Each bin is independent
in this calculation, since the assumption is that there are no
correlations between bins. This likely leads to a somewhat worse limit
than you might get otherwise, but at least it's conservative.
If you have a good idea for how to extend this to include bin-to-bin,
let me know and I can help implement it.
Set up the MC histogram. Each histogram is flattened in order to
make the looping later on a bit simpler to handle. The values in each
bin should correspond to the weight-per-event in that bin, NOT the
total weight in that bin!
This is used to define a serializable object used for functions assigned to the DerivedParams.
These can be constructed and evaluated symbolically and procedurally.
In principle, we may even be able to include a filepath to a serialized Funct object such that the pipeline configs can include definitions for these therein
Contains
OPS - an Enum listing recognized operations
TrigOps - some definitions for handling some of the trig functions, shared by Vars and Functs
Var - a class for representing variables
Funct - a series of operations done representing the functions
Usage - quite simple!
Create some Vars and do math on them:
x = Var('x')
y = Var('y')
function = sin(x**2) + 3*cos(y+1)**2
The object function is now callable with keyword arguments passed to the instantiated Vars
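For instance, a hedged sketch of evaluating it (the keyword names follow the Var names above):

    result = function(x=0.5, y=1.2)  # evaluates sin(0.5**2) + 3*cos(1.2 + 1)**2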
recursiveEquality and recursiveAllclose traverse into potentially nested
data structures and compare all elements for equality.
normQuant performs the same kind of traversal, but returns a normalized
version of the input object whereby “essentially-equal” things are returned as
“actually-equal” objects.
These functions are at the heart of hashing behaving as one expects, and this
in turn is essential for caching to work correctly.
Important
Read carefully how each function in this module defines
“equality” for various datatypes so you understand what two things being
“equal” based upon these functions actually means.
E.g., (nan == nan) == True, uncorrelated uncertainties are equal if both
their nominal values and standard deviations are equal regardless of
whether they come from independent random variables, etc.
Interpret a value as a pint Quantity via pisa.ureg
Parameters:
value (scalar, Quantity, or sequence interpretable as Quantity)
expect_sequence (bool) – Specify True if you expect a sequence of quantities (or a
pint-Quantity containing a numpy array). This allows interpreting each
element of a passed sequence as a quantity. Otherwise, specify False
if you expect a scalar. This allows interpreting a pint.Quantity tuple
as a scalar (the first element of the tuple is the magnitude and the
second element contains the units).
Check if input is a numerical datatype (including arrays of numbers) but
without units. Note that for these purposes, booleans are not considered
numerical datatypes.
Normalize quantities such that two things that should be equal are
returned as identical objects.
Handles floating point numbers, pint quantities, uncertainties, and
combinations thereof as standalone objects or in sequences, dicts, or numpy
ndarrays. Numerical precision issues and equal quantities represented in
differently-scaled or different systems of units come out identically.
Outputs from this function (not the inputs) deemed to be equal by the
above logic will compare to be equal (via the == operator and via
pisa.utils.comparisons.recursiveEquality) and will also hash to equal
values (via pisa.utils.hash.hash_obj).
Parameters:
obj – Object to be normalized.
sigfigs (None or int > 0) – Number of digits to which to round numbers’ significands; if None, do
not round numbers.
full_norm (bool) –
If True, does full translation and normalization which is good across
independent invocations and is careful about normalizing units, etc.
If False, certain assumptions are made that modify the behavior
described below in the Notes section which help speed things up in the
case of e.g. a minimizer run, where we know certain things won't
change:
* Units are not normalized. They are assumed to stay the same from
run to run.
* sigfigs are not respected; full significant figures are returned
(since it takes time to round all values appropriately).
Returns:
normed_obj – Simple types are returned as the same type as at the input, Numpy
ndarrays are returned in the same shape and representation as the
input, Mappings (dicts) are returned as OrderedDict, and all other
sequences or iterables are returned as (possibly nested) lists.
Return type:
object roughly of same type as input obj
Notes
Conversion logic by obj type or types found within obj:
Sequences and OrderedDicts (but not numpy arrays) are iterated
through recursively.
Mappings without ordering (e.g. dicts) are iterated through
recursively after sorting their keys, and are returned as
OrderedDicts (such that the output is always consistent when
serialized).
Numpy ndarrays are treated as the below data types (according to the
array’s dtype).
Simple objects (non-floating point / non-sequence / non-numpy / etc.)
are returned unaltered (e.g. strings).
Pint quantities (numbers with units): Convert to their base units.
Floating-point numbers (including the converted pint quantities):
Round values to sigfigs significant figures.
Numbers with uncertainties (via the uncertainties module) have
their nominal values rounded as above but their standard deviations are
rounded to the same order of magnitude (not number of significant
figures) as the nominal.
Therefore passing obj=10.23+/-0.25 and sigfigs=2 returns 10+/-0.0.
Note that correlations are lost in the outputs of this function, so
equality of the output requires merely having equal nominal values and
equal standard deviations.
The calculations leading to these equal numbers might have used
independent random variables to arrive at them, however, and so the
uncertainties module would have evaluated them to be unequal. [1]
To achieve rounding that masks floating point precision issues, set
sigfigs to a value less than the number of decimal digits of precision
carried by the significand of the floating point type used in the calculation.
For reference, the IEEE 754 floating point standard [2] uses the following:
* binary32 (single precision): 24-bit significand, about 7.2 decimal digits
* binary64 (double precision): 53-bit significand, about 15.9 decimal digits
Pint quantities hash to unequal values if specified in different scales or
different systems of units (even if the underlying physical quantity is
identical).
Even if both quantities are floating point numbers, finite precision
effects in the to_base_units conversion can still cause two things which
we “know” are equal to evaluate to be unequal.
pisa.utils.comparisons.recursiveAllclose(x, y, *args, **kwargs)
Recursively verify close-equality between two objects x and y. If
structure is different between the two objects, returns False.
args and kwargs are passed into the numpy.allclose() function.
pisa.utils.comparisons.recursiveEquality(x, y, allclose_kw={'atol': 2.220446049250313e-16, 'equal_nan': True, 'rtol': 1e-12})
Recursively verify equality between two objects x and y.
Parameters:
x – Objects to be compared
y – Objects to be compared
Notes
Possibly unintuitive behaviors:
Sequences of any type evaluate equal if their contents are the same.
E.g., a list can equal a tuple.
Mappings of any type evaluate equal if their contents are the same.
E.g. a dict can equal an OrderedDict.
nan SHOULD equal nan, +inf SHOULD equal +inf, and -inf SHOULD equal -inf
… but this *only* holds true (as of now) if those values are in
numpy arrays! (TODO!)
Two pint units with same __repr__ but that were derived from different
unit registries evaluate to be equal. (This is contrary to pint’s
implementation of equality comparisons, which is careful in case a
unit is defined differently in different registries. We’ll assume this
isn’t done here in PISA, until a use case arises where this is no
longer a good assumption.)
Two pint units that are compatible but different (even just in
magnitude prefix) evaluate to be unequal.
This behavior is chosen for the case where numbers are given
independently of their units, and hence the automatic conversion
facility available for comparing pint quantities is not available.
The only reliable way to test equality for these “less intelligent”
objects is to ensure that both the numerical values are exactly equal
and that the units are exactly equal; the latter includes order of
magnitude prefixes (micro, milli, …, giga, etc.).
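A brief illustration of these behaviors (a sketch; only the module path shown in the signatures above is assumed):

    import numpy as np
    from pisa.utils.comparisons import recursiveEquality

    # Sequences of different types are equal if their contents are equal
    assert recursiveEquality([1, 2, 3], (1, 2, 3))
    # nan == nan currently only holds for values inside numpy arrays
    assert recursiveEquality(np.array([np.nan]), np.array([np.nan]))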
Parse a config file into a dict containing an item for every analysis
stage, that itself contains all necessary instantiation arguments/objects for
that stage. For an example config file, please consider
$PISA/pisa_examples/resources/settings/pipeline/example.cfg
A pipeline config file is expected to contain something like the following,
with the sections [pipeline] and corresponding [stage.service]
required, in addition to a [binning] section:
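A schematic sketch of that layout (all section, option, and value names here are illustrative, not copied from example.cfg):

    [pipeline]
    order = stageA.serviceA, stageB.serviceB
    output_binning = my_binning
    output_key = weights, errors

    [binning]
    # binning definitions, referred to by the stage.service sections below

    [stageA.serviceA]
    param.p1 = 12.5 +/- 2.3 * units.GeV

    [stageB.serviceB]
    param.p2 = 0.5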
#include statements can be used to include other config files. The
#include statement must be the first non-whitespace on a line, and these
statements can be used anywhere within a config file.
#include resource as xyz statements behave similarly, but prepend the
included file's text with a section header containing xyz in this case.
pipeline is the top-most section that defines the hierarchy of stages
and services to be instantiated. An output_binning is required to be able
to get a pisa.core.map.MapSet (set of histograms) output for
the pipeline; output_key then specifies the keys of the data passed
through the pipeline which contain histogram weights and (if desired) errors
(note: the presence of these keys is in general not obvious from a given
pipeline config file itself)
binning can contain different binning definitions, that are then later
referred to from within the stage.service sections.
stage.service: one such section per stage.service is necessary. It may
contain the options debug_mode, error_method, calc_mode,
apply_mode, which are common to all stages, and must contain all the
necessary arguments and parameters for a given stage.service.
Duplicate section headers and duplicate keys within a section are illegal.
Every key in a stage section that starts with param.<name> is interpreted and
parsed into a pisa.core.param.Param object. These can be strings
(e.g. a filename, but don't use any quotation marks) or quantities (numbers
with units).
Quantities expect an expression that can be converted by the
parse_quantity() function. The expression for a quantity can optionally
include a simple Gaussian prior and units. The simplest definition of a
quantity with neither Gaussian prior nor units would look something like this:
param.p1=12.5
Gaussian priors can be included for a quantity using +/- notation, where
the number that follows +/- is the standard deviation. E.g.:
param.p1=12.5 +/- 2.3
If no units are explicitly set for a quantity, it is taken to be a quantity
with special units dimensionless. Units can be set by multiplying (using
*) by units.<unit> where <unit> is the short or long name
(optionally including metric prefix) of a unit. For example, the following
lines set equivalent values for params p1 and p2:
param.p1=12.5*units.m
param.p2=12.5*units.meter
and this can be combined with the Gaussian-prior +/- notation:
param.p1=12.5 +/- 2.3 * units.GeV
Additional arguments to a parameter are passed in with the . notation, for
example param.p1.fixed=False, which makes p1 a free parameter in the
fit (by default a parameter is fixed unless specified like this).
A uniform, spline, or Jeffreys pisa.core.prior.Prior can also be set
using the .prior attribute:
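For instance (a schematic sketch; the prior names are assumed to match those recognized by pisa.core.prior.Prior):
param.p1.prior=uniform
param.p2.prior=spline
param.p3.prior=jeffreys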
In the second case, a .prior.data option will be expected, pointing to the
spline data file.
If no prior is specified, it is taken to have no prior (or, equivalently, a
uniform prior with no penalty).
A range must be given for a free parameter, either as an absolute range [x,y]
or in conjunction with the keywords nominal (= nominal parameter value) and
sigma if the param was specified with the +/- notation.
Params that have the same name in multiple stages of the pipeline are
instantiated as references to a single param in memory, so updating one updates
all of them.
A special mechanism allows the user to specify different values for
the same param via the pisa.core.param.ParamSelector mechanism.
This can be used for example for hypothesis testing, where for hypothesis A a
param takes one value and for hypothesis B another.
A given param, say foo, then needs two definitions like the following,
assuming we name our selections A and B:
param.A.foo=1
param.B.foo=2
The default param selector needs to be specified in the pipeline section, e.g.:
param_selections=A
which will default to the value of 1 for param foo. An instantiated
pipeline can dynamically switch to another selection after instantiation.
Multiple different param selections are allowed in a single config. In the
default selection they must be separated by commas.
Iterate through the lines of an already-open file (fp) but then can pause
at any point and open and iterate through another file via the
switch_to_file method (and this file can be paused to iterate through
another, etc.).
This has the effect of in-lining files within files for e.g. parsing
multiple files as if they're a single file. Which line comes from which file
is also tracked for generating maximally-useful error messages, via the
location method.
Note that circular references are not allowed.
Parameters:
fp (file-like object) – The (opened) main config to be read. E.g. can be an opened file,
io.StringIO object, etc.
fpname (string) – Identifier for the initial fp object
Parses a PISA config file, extending configparser.RawConfigParser
(the backport of RawConfigParser from Python 3.x) by adding the ability to
include external files inline via, for example:
#include /path/to/file.cfg
#include path/to/resource.cfg
#include path/to/resource2.cfg as section2
[section1]
key11 = value1
key12 = ${section2:key21}
key13 = value3
where the files or resources located at “/path/to/file.cfg”,
“path/to/resource.cfg”, and “path/to/resource2.cfg” are effectively inlined
wherever the #include statements occur.
The #include path/to/resource2.cfg as section2 syntax
prefixes the contents of resource2.cfg with a section header named
“section2”, expanding resource2.cfg as:
[section2]
line1 of resource2.cfg
line2 of resource2.cfg
... etc.
Special parsing rules we have added to make #include behavior sensible:
Using an #include file that contains a section header
([section_name]) or using #include file as section_name
requires that the next non-blank / non-comment / non-#include line
be a new section header ([section_name2]).
Empty sections after fully parsing a config will raise a ValueError.
This is likely never a desired behavior, and should alert the user to
inadvertent use of #include.
Also note that, unlike the default ConfigParser
behavior, ExtendedInterpolation is used, whitespace
surrounding text in a section header is ignored, empty lines are not
allowed between multi-line values, and section names, keys, and values are
all case-sensitive.
All other options are taken as the defaults / default behaviors of
ConfigParser.
See help for configparser.ConfigParser for further help on valid
config file syntax and parsing behavior.
Override the read method to interpret filenames as PISA resource
locations, then call the overridden read method. Also, an IOError causes
failure here, whereas it is ignored in RawConfigParser.
Parse a param specification from a PISA config file.
Note that if the param specification does not include fixed,
prior, and/or range, the defaults for these are:
fixed=True, prior=None, and range=None.
If a prior is specified explicitly via .prior, this takes precedence,
but if no .prior is specified and the param's value is parsed to be an
uncertainties.AffineScalarFunc (i.e., to have a std_dev attribute),
a Gaussian prior is constructed from that and then the AffineScalarFunc is
stripped out of the param’s value (such that it is just a
Quantity).
stage_dicts – Keys are (stage_name, service_name) tuples and values are OrderedDicts
with arguments’ names as keys and values as values. Some known arg
values are parsed out fully into Python objects, while the rest remain
as strings that must be used or parsed elsewhere.
Evaluate a string with certain special values, or return the string. Any
further parsing must be done outside this module, as this is as specialized
as we’re willing to be in assuming/interpreting what a string is supposed
to mean.
Energy-spectrum-weighted integral of (possibly a ratio of)
(possibly-combined) flavor/interaction type cross sections.
Parameters:
flavintgroup0 (NuFlavIntGroup or convertible thereto) – Flavor(s)/interaction type(s) for which to combine cross sections
for numerator of ratio
flavintgroup1 (None, NuFlavIntGroup or convertible thereto) – Flavor(s)/interaction type(s) for which to combine cross sections
for denominator of ratio. If None is passed, the denominator of
the “ratio” is effectively 1.
e_range – Range of energy over which to integrate (GeV)
gamma (float >= 0) – Power law spectral index used for weighting the integral,
E^{-gamma}. Note that gamma should be >= 0.
average (bool) – If True, return the average of the cross section (ratio)
If False, return the integral of the cross section (ratio)
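A sketch of the quantity computed, under a plain reading of the parameters above ($\sigma_0$ and $\sigma_1$ denote the combined cross sections for flavintgroup0 and flavintgroup1):

$$ I = \frac{\int_{E_{\min}}^{E_{\max}} E^{-\gamma}\,\sigma_0(E)\,\mathrm{d}E}{\int_{E_{\min}}^{E_{\max}} E^{-\gamma}\,\sigma_1(E)\,\mathrm{d}E} $$

with the denominator effectively 1 when flavintgroup1 is None; for average=True, the integral is divided by the weight normalization $\int_{E_{\min}}^{E_{\max}} E^{-\gamma}\,\mathrm{d}E$ to yield a spectrum-weighted average.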
Load cross sections from a file locatable and readable by the PISA
from_file command. If ver is provided, it is used to index into the
top level of the loaded dictionary
Load cross sections from a ROOT file, where graphs are at the first level of
the hierarchy. This is as yet crude and not very flexible, but at least it's
recorded here for posterity.
Requires ROOT and ROOT python module be installed
Parameters:
fpath (string) – Path to ROOT file
ver (string) – Necessary to differentiate among different file formats that Ken
has sent out
tot_sfx (string (default = '_tot')) – Suffix for finding total cross sections in ROOT file (if these
fields are found, the oxygen/hydrogen fields are skipped)
o_sfx (string (default = '_o16')) – Suffix for finding oxygen-16 cross sections in ROOT file
h_sfx (string (default = '_h1')) – Suffix for finding hydrogen-1 cross sections in ROOT file
plt_sfx (string (default = '_plt')) – Suffix for plots containing cross sections per GeV in ROOT file
Returns:
xsec – Object containing the loaded cross sections
Plot cross sections per GeV; optionally, save plot to file. Requires
matplotlib and optionally uses seaborn to set “pretty” defaults.
save (None, str, or dict) –
If None, figure is not saved (default)
If a string, saves figure as that name
If a dict, calls pyplot.savefig(**save) (i.e., save contains
keyword args to savefig)
Class for importing, working with, and storing data processing
parameters.
Implements cutting and particle identification (PID) functionality that can
be applied to MC/data that have the specified version of processing applied
to it.
Parameters:
data_proc_params (string or dict) –
If string: looks for the corresponding JSON resource (file) and loads
the contents as a data_proc_params dict
If dict: taken to be data_proc_params dict
The data_proc_params dict must follow the format described below.
detector (string) – Converted to lower-case string which must be a detector key in
data_proc_params dict
proc_ver – Converted to lower-case string which must be a proc_ver key in
data_proc_params dict
Notes
All information describing the processing version is loaded from a JSON
file with the following defined format:
Note that the following common cuts are defined in this class and so
needn’t be defined in the JSON file:
‘true_upgoing_zen’ : Select true-upgoing events by zenith angle
‘true_upgoing_coszen’ : Select true-upgoing events by cos(zenith) angle
data_proc_params dictionary format (and same for corresponding JSON file):
{
  # Specify the detector name, lower case
  "<lower-case detector name>": {
    # Specify the processing version, lower case
    # Examples in PINGU include "4", "5", and "5.1"
    "<lower-case processing version>": {
      # Mapping from standard names to path where these can be found in
      # source HDF5 file; separate HDF5 path nodes with a forward-slash.
      #
      # Fields that cuts or particles in the "cuts"/"pid" sections below
      # require (e.g., "cuts_step_1" for PINGU v5 processing), must be
      # added here so the code knows how to extract the info from the
      # source HDF5 file.
      #
      # Outputting a PID field to the destination PISA HDF5 file will *not*
      # be done if the "pid" field is omitted below.
      #
      # In addition to the below-named fields, "true_coszen" and
      # "reco_coszen" are generated from the data from the "true_zenith"
      # and "reco_zenith" fields, respectively. So any of those fields can
      # be used via the aforementioned names.
      "field_map": {
        "run": "<HDF5 path to corresponding node>",
        "nu_code": "<HDF5 path to corresponding node>",
        "true_energy": "<HDF5 path to corresponding node>",
        "true_zenith": "<HDF5 path to corresponding node>",
        "reco_energy": "<HDF5 path to corresponding node>",
        "reco_zenith": "<HDF5 path to corresponding node>",
        "one_weight": "<HDF5 path to corresponding node>",
        "generator_volume": "<HDF5 path to corresponding node>",
        "generator_radius": "<HDF5 path to corresponding node>",
        "detection_length": "<HDF5 path to corresponding node>",
        "interaction_type": "<HDF5 path to corresponding node>",
        "azimuth_min": "<HDF5 path to corresponding node>",
        "azimuth_max": "<HDF5 path to corresponding node>",
        "zenith_min": "<HDF5 path to corresponding node>",
        "zenith_max": "<HDF5 path to corresponding node>",
        "energy_log_min": "<HDF5 path to corresponding node>",
        "energy_log_max": "<HDF5 path to corresponding node>",
        "num_events_per_file": "<HDF5 path to corresponding node>",
        "sim_spectral_index": "<HDF5 path to corresponding node>",
        "pid": "<HDF5 path to corresponding node>",
      },
      # Mapping from file's nu code to PDG nu codes (only necessary if
      # nu_code values are not the PDG codes listed below)
      "nu_code_to_pdg_map": {
        "<source nue code>": 12,
        "<source nue_bar code>": -12,
        "<source numu code>": 14,
        "<source numu_bar code>": -14,
        "<source nutau code>": 16,
        "<source nutau_bar code>": -16
      },
      # Specify standard cuts
      "cuts": {
        # Cut name; "bgrej" and "analysis" listed here are just example
        # names for cuts one might specify
        "bgrej": {
          # Which of the fields in the field_map (and the derived fields
          # such as true_coszen and reco_coszen) are required for this cut?
          "fields": ["<field1>", "<field2>", ...],
          # Expression for an event to pass the cut (not get thrown away);
          # see below for details on specifying an expression
          "pass_if": "<expression>"
        },
        "analysis": {
          "fields": ["<field1>", "<field2>", ...],
          "pass_if": "<expression>"
        },
        "<cut name>": {
          "fields": ["<field1>", "<field2>", ...],
          "pass_if": "<expression>"
        }
      },
      # Particle identification section
      "pid": {
        # Name of the particle (case-insensitive); e.g., in PINGU this
        # would be "trck" or "cscd"
        "<particle name 1>": {
          # Which of the fields in the field_map (and the derived fields
          # such as true_coszen and reco_coszen) are required for this cut?
          "field": [<field1>, <field2>, ...],
          # Expression for an event to be classified as this type of
          # particle; see below for details on specifying an expression
          "criteria": "<expression>"
        },
        "<particle name 2>": {
          "field": [<field1>, <field2>, ...],
          "criteria": "<expression>"
        }
      }
    }
  }
}
Note that cuts “pass_if” and pid “criteria” expressions can make use of the
numpy namespace and have access to any columns extracted from the source
HDF5 file, by the standardized names given in the “field_map”. For example,
if the following “fields” are specified for a cut in the data_proc_params
dict:
[“cuts_step_1”, “cuts_step_2”]
then the following is a valid “pass_if” expression:
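One plausible form (hypothetical; the actual expression depends on the processing version):
numpy.logical_and(cuts_step_1 == 1, cuts_step_2 == 1)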
Get data attached to an HDF5 node, returned as a dictionary.
The returned dictionary’s keys match those in the field_map and the
dict’s values are the data from the HDF5’s nodes found at the addresses
specified as values in the field_map
Perform mappings from non-standard to standard values (such as
translating non-PDG neutrino flavor codes to PDG codes) and add
fields expected to be useful (such as coszen, derived from zen fields).
Attach / reattach the translated/new fields to the data object passed
into this method.
Populate the Python global namespace with variables named as the
keys in field_map and values loaded from the h5group at addresses
specified by the corresponding values in field_map.
Retrieve data from an HDF5 group h5group according to
expression. This can apply expressions with simple mathematical
operators and numpy functions to multiple fields within the HDF5 file
to derive the output. Python keywords are _not_ allowed, since they
may alias with a name.
Refer to any numpy functions by prefixing with either “np.<func>” or
“numpy.<func>”. In order to specify division, spaces must surround the
forward slash, such that it isn’t interpreted as a path.
Nodes in the HDF5 hierarchy are separated by forward slashes (“/”) in a
path spec. We restrict valid HDF5 node names to contain the characters
a-z, A-Z, 0-9, periods (“.”), and underscores (“_”), with the
additional restriction that the node name must not start with a period
or a number, and a path cannot start with a slash.
exp_vars (bool) – Expand the string using environment variables. E.g.
“$HOME/${vardir}/xyz” will have “$HOME” and “${vardir}” replaced by
the values stored in “HOME” and “vardir”.
exp_user (bool) – Expand special home dir spec character, tilde: “~”.
absolute (bool) – Make a relative path (e.g. “../xyz”) absolute, referenced from the system
root directory, e.g. “/dir/subdir/xyz”.
resolve_symlinks (bool) – Resolve symlinks to the paths they refer to
Dispatch correct file reader based on fmt (if specified) or guess
based on file name’s extension.
Parameters:
fname (string) – File path / name from which to load data.
fmt (None or string) – If string, interpret the file according to this format. If
None, the file format is deduced from the extension found in fname.
**kwargs – All other arguments are passed to the function dispatched to read the
file.
Returns:
Object instantiated from the file (string, dictionary, …). The return type
depends on the format that was read.
Sort a sequence of strings, each with one or more floating point number
fields, using the floating point value(s) (intervening strings are compared
as usual). Note that + and - preceding a number are included in the
floating point value unless signed=False.
Code adapted from nedbatchelder.com/blog/200712/human_sorting.html#comments
Parameters:
l (sequence of strings) – Sequence of strings to be sorted.
signed (bool, optional) – Whether to include a “+” or “-” preceding a number in its value to be
sorted. One might specify False if “-” is used exclusively as a
separator in the string.
reverse (bool, optional) – Whether to reverse the sort order (True => descending order)
Sort a sequence of strings containing integer number fields by the
value of those numbers, rather than by simple alpha order. Useful
for sorting e.g. version strings, etc.
Code adapted from nedbatchelder.com/blog/200712/human_sorting.html#comments
Parameters:
l (sequence of strings) – Sequence of strings to be sorted.
reverse (bool, optional) – Whether to reverse the sort order (True => descending order)
Check whether number of parameters matches dimension of matrix;
matrix is symmetrical; parameter names are unique; and number of
best_fits, labels, and priors all match number of parameters.
Classes for working with neutrino flavors (NuFlav), interactions types
(IntType), “flavints” (a flavor and an interaction type) (NuFlavInt), and
flavint groups (NuFlavIntGroup) in a consistent and convenient manner.
FlavIntData class for working with data stored by flavint (flavor &
interaction type). This should replace the PISA convention of using raw
doubly-nested dictionaries indexed as [<flavor>][<interaction type>]. For
now, FlavIntData objects can be drop-in replacements for such dictionaries
(they can be accessed and written to in the same way since FlavIntData
subclasses dict) but this should be deprecated; eventually, all direct access
of the data structure should be eliminated and disallowed by the FlavIntData
object.
Define convenience tuples ALL_{x} for easy iteration
Context manager to make global __BAR_SSEP__ modification slightly less
error-prone.
__BAR_SSEP__ defines the separator between a flavor and the string “bar”
(to define the flavor as the antiparticle version; e.g. nuebar). Some
datastructures in PISA 2 used ‘_’ between the two (“nue_bar”) while others
did not use this. To make dealing with this (slightly) less painful, this
context manager was introduced to switch between the PISA 3 default (no
‘_’) and the sometimes-used ‘_’ separator. See Examples for usage.
Parameters:
val (string) – Separator to use between flavor (“nue”, “numu”, “nutau”) and “bar”.
Container class for storing data for each NuFlavInt.
Parameters:
val (string, dict, or None) –
Data with which to populate the hierarchy.
If string, interpret as PISA resource and load data from it
If dict, populate data from the dictionary
If None, instantiate with None for all data
The interpreted version of val must be a valid data structure: A
dict with keys ‘nue’, ‘numu’, ‘nutau’, ‘nue_bar’, ‘numu_bar’, and
‘nutau_bar’; and each item corresponding to these keys must itself be a
dict with keys ‘cc’ and ‘nc’.
Notes
Accessing data (both for getting and setting) is fairly flexible. It uses
dict-like square-brackets syntax, but can accept any object (or two
objects) that are convertible to a NuFlav or NuFlavInt object. In the
former case, the entire flavor dictionary (which includes both ‘cc’ and
‘nc’) is returned, while in the latter case whatever lives at the node is
returned.
Initializing, setting and getting data in various ways:
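A hedged sketch of such access (the module path and index forms are inferred from the description above; exact behavior may differ):

    from pisa.utils.flavInt import FlavIntData

    fi_data = FlavIntData()                  # all data initialized to None
    fi_data['numu', 'cc'] = 1.0              # set a single flavint's datum
    fi_data['nue'] = {'cc': 2.0, 'nc': 3.0}  # set both interaction types at once
    print(fi_data['nue', 'nc'])              # -> 3.0
    print(fi_data['numubar'])                # whole flavor dict, incl. 'cc' and 'nc'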
Returns True if all data structures are equal and all numerical
values contained are within relative (rtol) and/or absolute (atol)
tolerance of one another.
Identify flavints with duplicated data (exactly or within a
specified tolerance), convert these NuFlavInts into NuFlavIntGroups,
and return these along with the data associated with each.
Parameters:
rtol – Set to positive value to use as rtol argument for numpy allclose
atol – Set to positive value to use as atol argument for numpy allclose
If either rtol or atol is 0, exact equality is enforced.
Returns:
dupe_flavintgroups (list of NuFlavIntGroup) – A NuFlavIntGroup object is returned for each group of NuFlavInt’s
found with duplicate data
dupe_flavintgroups_data (list of objects) – Data associated with each NuFlavIntGroup in dupe_flavintgroups.
Each object in dupe_flavintgroups_data corresponds to, and in the
same order as, the objects in dupe_flavintgroups.
Container class for storing data for some set(s) of NuFlavIntGroups
(cf. FlavIntData, which stores one datum for each NuFlavInt separately)
Parameters:
val (None, str, or dict) – Data with which to populate the hierarchy
flavint_groups (None, str, or iterable) –
User-defined groupings of NuFlavIntGroups. These can be specified
in several ways.
None
If val == None, flavint_groups must be specified
If val != None, flavint_groups are deduced from the data
string
If val is a string, it is expected to be a comma-separated
list, each field of which describes a NuFlavIntGroup. The
returned list of groups encompasses all possible flavor/int
types, but the groups are mutually exclusive.
iterable of strings or NuFlavIntGroup
If val is an iterable, each member of the iterable is
interpreted as a NuFlavIntGroup.
Returns True if all data structures are equal and all numerical
values contained are within relative (rtol) and/or absolute (atol)
tolerance of one another.
Return tuple of unique charged-current-interaction flavors that
make up this group. Note that only the flavors, and not NuFlavInts, are
returned (cf. method cc_flavints).
Return tuple of unique neutral-current-interaction flavors that
make up this group. Note that only the flavors, and not NuFlavInts, are
returned (cf. method nc_flavints).
Interpret groups to break into neutrino flavor/interaction type(s)
that are to be grouped together; also form singleton groups as specified
explicitly in groups or for any unspecified flavor/interaction type(s).
The returned list of groups encompasses all possible flavor/int types, but
the groups are mutually exclusive.
Get the separator that is set to be used between “base” flavor (“nue”,
“numu”, or “nutau”) and “bar” when converting antineutrino `NuFlav`s or
`NuFlavInt`s to strings.
Translate a “,”-separated string into separate `NuFlavIntGroup`s.
val
“,”-delimited list of valid NuFlavIntGroup strings, e.g.:
“nuall_nc,nue,numu_cc+numubar_cc”
Note that specifying NO interaction type results in both interaction
types being selected, e.g. “nue” implies “nue_cc+nue_nc”. For other
details of how the substrings are interpreted, see docs for
NuFlavIntGroup.
Returns:
grouped, ungrouped
lists of NuFlavIntGroups; the first will have more than one flavint
in each NuFlavIntGroup whereas the second will have just one
flavint in each NuFlavIntGroup. Either list can be of 0-length.
This function does not enforce mutual-exclusion on flavints in the
various flavint groupings, but does list any flavints not grouped
together in the ungrouped return arg. Mutual exclusion can be
enforced through set operations upon return.
A set of functions for calculating flux weights given an array of energy and
cos(zenith) values based on the Honda atmospheric flux tables. A lot of this
functionality will be copied from honda.py, but since I don't want to initialise
this as a stage, it makes sense to copy it in here so somebody can't
accidentally do the wrong thing with that script.
Calculate flux weights for given array of energy and cos(zenith).
Arrays of true energy and zenith are expected to be for MC events, so
they are tested to be of the same length.
en_splines should be the spline for the primary of interest. The entire
dictionary is calculated in the previous function.
Parameters:
true_energies (list or numpy array) – A list of the true energies of your MC events. Pass this in GeV!
true_coszens (list or numpy array) – A list of the true coszens of your MC events
en_splines (list of splines) – A list of the initialised energy splines from the previous function
for your desired primary.
enpow (integer) – The power to which the energy was raised in the construction of the
splines. If you don’t know what this means, leave it as 1.
out (np.array) – optional array to store results
Example
Use the previous function to calculate the spline dict for the South Pole.
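A hedged usage sketch (the loader-function name and the flux-table filename are assumptions for illustration):

    import numpy as np
    from pisa.utils.flux_weights import load_2d_table, calculate_2d_flux_weights

    true_energies = np.array([5.0, 20.0, 50.0])  # GeV
    true_coszens = np.array([-0.9, -0.5, 0.2])
    # Build the spline dict from an azimuth-averaged Honda table (path hypothetical)
    spline_dict = load_2d_table('flux/honda-2015-spl-solmax-aa.d')
    # Flux weights for numu MC events
    weights = calculate_2d_flux_weights(true_energies, true_coszens,
                                        spline_dict['numu'])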
Calculate flux weights for given array of energy, cos(zenith) and
azimuth.
Arrays of true energy, zenith and azimuth are expected to be for MC events,
so they are tested to be of the same length.
en_splines should be the spline for the primary of interest. The entire
dictionary is calculated in the previous function.
Parameters:
true_energies (list or numpy array) – A list of the true energies of your MC events. Pass this in GeV!
true_coszens (list or numpy array) – A list of the true coszens of your MC events
true_azimuths (list or numpy array) – A list of the true azimuths of your MC events. Pass this in radians!
en_splines (list of splines) – A list of the initialised energy splines from the previous function
for your desired primary.
enpow (integer) – The power to which the energy was raised in the construction of the
splines. If you don’t know what this means, leave it as 1.
az_linear (boolean) – Whether or not to linearly interpolate in the azimuthal direction. If
you don’t know why this is an option, leave it as true.
Example
Use the previous function to calculate the spline dict for the South Pole.
2D is expected to mean energy and cosZenith, where azimuth is averaged
over (before being stored in the table) and the zenith range should
include both hemispheres.
Parameters:
flux_file (string) – The location of the flux file you want to spline. Should be a honda
azimuth-averaged file.
enpow (integer) – The power to which the energy will be raised in the construction of the
splines. If you don’t know what this means, leave it as 1.
return_table (boolean) – Flag to true if you want the function to also return a dictionary
of the underlying values from the tables. Useful for comparisons.
3D is expected to mean energy, cosZenith and azimuth. The angular range
should be fully sky.
Parameters:
flux_file (string) – The location of the flux file you want to spline. Should be a Honda
file with azimuth dependence (i.e., not azimuth-averaged).
enpow (integer) – The power to which the energy will be raised in the construction of the
splines. If you don’t know what this means, leave it as 1.
return_table (boolean) – Flag to true if you want the function to also return a dictionary
of the underlying values from the tables. Useful for comparisons.
Convert arg to a tuple: None becomes an empty tuple, an isolated
string becomes a tuple containing that string, and any iterable or sequence
is simply converted into a tuple.
Parameters:
arg (str, sequence of str, iterable of str, or None)
Format number as string in engineering format (10^(multiples-of-three)),
including the most common metric prefixes (from atto to Exa).
Parameters:
n (scalar) – Number to be formatted
sigfigs (int >= 0) – Number of significant figures to limit the result to; default=3.
decimals (int or None) – Number of decimals to display (zeros filled out as necessary). If None,
decimals is automatically determined by the magnitude of the
significand and the specified sigfigs.
sign_always (bool) – Prefix the number with “+” sign if number is positive; otherwise,
only negative numbers are prefixed with a sign (“-“)
Fine-grained control over formatting a number as a string.
Parameters:
value (numeric) – The number to be formatted.
sigfigs (int > 0, optional) – Use up to this many significant figures for displaying a number. You
can use either sigfigs or precision, but not both. If neither are
specified, default is to set sigfigs to 8. See also trailing_zeros.
precision (float, optional) – Round value to a precision the same as the order of magnitude of
precision. You can use either precision or sigfigs, but not both.
If neither is specified, default is to set sigfigs to 8. See also
trailing_zeros.
fmt (None or one of {'sci', 'eng', 'sipre', 'binpre', 'full'}, optional) –
Force a particular format to be used::
None allows the value and what is passed for sci_thresh and
exponent to decide whether or not to use scientific notation
'sci' forces scientific notation
'eng' uses the engineering convention of powers divisible by 3
(10e6, 100e-9, etc.)
'sipre' uses powers divisible by 3 but uses SI prefixes (e.g. k,
M, G, etc.) instead of displaying the exponent
'binpre' uses powers of 1024 and uses IEC prefixes (e.g. Ki, Mi,
Gi, etc.) instead of displaying the exponent
'full' forces printing all digits left and/or right of the
decimal to display the number (no exponent notation or SI/binary
prefix will be used)
Note that the display of NaN and +/-inf are unaffected by
fmt.
exponent (None, integer, or string, optional) – Force the number to be scaled with respect to this exponent. If a
string prefix is passed and fmt is None, then the SI prefix
or binary prefix will be used for the number. E.g., exponent=3
would cause the number 1 to be expressed as '0.001e3', while exponent='k' would cause it to be displayed as '1m'. Both 'μ'
and ‘u’ are accepted to mean “micro”. A non-None value for
exponent forces some form of scientific/engineering notation, so
fmt cannot be 'full' in this case. Finally, if
fmt is 'binpre' then exponent is applied to 1024.
I.e., 1 maps to kibi (Ki), 2 maps to mebi (Mi), etc.
sci_thresh (sequence of 2 integers) – When to switch to scientific notation. The first integer is the order
of magnitude of value at or above which scientific notation will be
used. The second integer indicates the order of magnitude at or below
which the most significant digit falls for scientific notation to be
used. E.g., sci_thresh=(3,-3) means that numbers in the
ones-of-thousands or greater OR numbers in the ones-of-thousandths or
less will be displayed using scientific notation. Note that
fmt, if not None, overrides this behavior. Default is
(10,-5).
inf_thresh (numeric, optional) – Numbers whose magnitude is equal to or greater than this threshold are
considered infinite and therefore displayed using infstr (possibly
including a sign, as appropriate). Default is np.inf.
trailing_zeros (bool, optional) – Whether to display all significant figures specified by sigfigs, even
if this results in trailing zeros. Default is False.
always_show_sign (bool, optional) – Always show a sign, whether number is positive or negative, and whether
exponent (if present) is positive or negative. Default is False.
decstr (string, optional) – Separator to use for the decimal point. E.g. decstr='.' or
decstr=',' for the most common cases, but this can also be used in
TeX tables for alignment on decimal points via decstr='&.&'.
Default is ‘.’.
thousands_sep (None or string, optional) – Separator to use between thousands, e.g. thousands_sep=',' to give
results like '1,000,000', or `thousands_sep=r'\,' for TeX
formatting with small spaces between thousands. Default is None.
thousandths_sep (None or string, optional) – Separator to use between thousandths. Default is None.
left_delimiter (None or string, optional) – Strings to delimit the left and right sides of the resulting string.
E.g. left_delimiter='${' and right_delimiter='}$' could be used
to delimit TeX-formatted strings, such that a number is displayed,
e.g., as r'${1\times10^3}$'. Defaults are None for both.
right_delimiter (None or string, optional) – Strings to delimit the left and right sides of the resulting string.
E.g. left_delimiter='${' and right_delimiter='}$' could be used
to delimit TeX-formatted strings, such that a number is displayed,
e.g., as r'${1\times10^3}$'. Defaults are None for both.
expprefix (None or string, optional) – E.g. use expprefix='e' for simple “e” scientific notation (“1e3”),
or use expprefix=r'\times10^{' and exppostfix=r'}' for
TeX-formatted scientific notation. Use a space (or TeX equivalent) for
binary and SI prefixes. If scientific notation is to be used,
expprefix defaults to 'e'. If either SI or binary prefixes are to be
used, expprefix defaults to ' ' (space). In any case, exppostfix
defaults to None.
exppostfix (None or string, optional) – E.g. use expprefix='e' for simple “e” scientific notation (“1e3”),
or use expprefix=r'\times10^{' and exppostfix=r'}' for
TeX-formatted scientific notation. Use a space (or TeX equivalent) for
binary and SI prefixes. If scientific notation is to be used,
expprefix defaults to 'e'. If either SI or binary prefixes are to be
used, expprefix defaults to ' ' (space). In any case, exppostfix
defaults to None.
nanstr (string, optional) – Not-a-number (NaN) values will be displayed using this string. Default
is ‘nan’ (following the Numpy convention)
infstr (string, optional) – Infinite values will be displayed using this string (note that the sign
is prepended, as appropriate). Default is ‘inf’ (following the Numpy
convention).
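A hedged usage sketch (the module path is assumed; the commented outputs are indicative rather than verified):

    from pisa.utils.format import format_num

    format_num(1234.5, sigfigs=4, fmt='sci')  # scientific notation, 4 sig figs
    format_num(2.5e-7, fmt='sipre')           # SI-prefix form, e.g. "250 n"
    format_num(1048576, fmt='binpre')         # IEC-prefix form, e.g. "1 Mi"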
Convert a human-readable string specifying a list-of-lists of numbers to
a Python list-of-lists of numbers.
Parameters:
hrlol (string) – Human-readable list-of-lists-of-numbers string. Each list specification
is separated by a semicolon, and whitespace is ignored. Refer to
hrlist2list for list specification.
Returns:
lol
Return type:
list-of-lists of numbers
Examples
A single number evaluates to a list with a list with a single number.
>>> hrlol2lol("1")[[1]]
A sequence of numbers or ranges can be specified separated by commas.
>>> hrlol2lol("1, 3.2, 19.8")[[1, 3.2, 19.8]]
A range can be specified with a dash; default is a step size of 1 (or -1 if
the end of the range is less than the start of the range); note that the
endpoint is included, unlike slicing in Python.
>>> hrlol2lol("1-3")[[1, 2, 3]]
The range can go from or to a negative number, and can go in a negative
direction.
>>> hrlol2lol("-1 - -5")[[-1, -3, -5]]
Multiple lists are separated by semicolons, and parentheses and brackets
can be used to make it easier to understand the string.
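A hypothetical illustration (assuming parentheses and brackets are purely cosmetic):
>>> hrlol2lol("(1-3); [5, 6.8]")
[[1, 2, 3], [5, 6.8]]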
Convert a signed or unsigned integer that is bits long to a hexadecimal
representation. As many hex characters are returned as needed to fully specify
any number bits in length, regardless of the value of i.
Parameters:
i (int) – The integer to be converted. Signed integers have a range of
-2**(bits-1) to 2**(bits-1)-1, while unsigned integers have a range of
0 to 2**bits-1.
bits (int) – Number of bits long the representation is
signed (bool) – Whether the number is a signed integer; this is dependent upon the
representation used for numbers, and _not_ whether the value i is
positive or negative.
Returns:
h – string of length ceil(bits/4.0), since it takes this many hex characters
to fully specify the number
Before splitting the list, the string has extraneous whitespace removed
from either end.
The strings that result after the split can have their case forced or be
left alone.
Whitespace surrounding each resulting string (but not whitespace falling
between non-whitespace characters) is removed.
After all of the above, the value can be parsed further by a
user-supplied parse_func.
Note that repeating a separator without intervening values yields
empty-string values.
Parameters:
string (string) – The string to be split
sep (string) – Separator to look for
force_case (None, 'lower', or 'upper') – Whether to force the case of the resulting items: None does not change
the case, while ‘lower’ or ‘upper’ change the case.
parse_func (None or callable) – If a callable is supplied, each item in the list, after the basic
parsing, is processed by parse_func.
Returns:
lst – The types of the items in the list depend upon parse_func if it is
supplied; otherwise, all items are strings.
Return type:
list of objects
Examples
>>> print(split(' One, TWO, three ', sep=',', force_case='lower'))
['one', 'two', 'three']
Smart string formatting for a time difference (in seconds)
Parameters:
dt_sec (numeric) – Time difference, in seconds
hms_always (bool) –
If True, always display hours, minutes, and seconds regardless of the
order-of-magnitude of dt_sec.
If False, display a minimal-length string that is meaningful, by omitting
units that are more significant than those necessary to display dt_sec; if…
* dt_sec < 1 s: use engineering formatting for the number.
* dt_sec is an integer in the range 0-59 (inclusive): sec_decimals is
ignored and the number is formatted as an integer.
See Notes below for handling of units.
(Default: False)
sec_decimals (int) – Round seconds to this number of digits
Notes
If colon notation (e.g. HH:MM:SS.xxx, MM:SS.xxx, etc.) is not used, the
number is only seconds, and is appended by a space ‘ ‘ followed by units
of ‘s’ (possibly with a metric prefix).
Sum of multiple Gaussian curves, normalized to have area of 1.
Parameters:
x (array) – Points at which to evaluate the sum of Gaussians
mu (array) – Means of the Gaussians to accumulate
sigma (array) – Standard deviations of the Gaussians to accumulate
weights (array or None) – Weights given to each Gaussian
implementation (None or string) – One of ‘singlethreaded’, ‘multithreaded’, or ‘cuda’. Passing None, the
function will try to determine which of the implementations is best to
call.
kwargs – Passed on to the underlying implementation
Returns:
outbuf – Resulting sum of Gaussians
Return type:
array
Notes
This function dynamically calls an appropriate implementation depending
upon the problem size, the hardware available (multi-core CPU or a GPU),
and whether the user specifies implementation.
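A sketch of the quantity evaluated, reading “normalized to have area of 1” as dividing by the summed weights (with $w_i = 1$ for all $i$ when weights is None):

$$ f(x) = \frac{1}{\sum_i w_i} \sum_i \frac{w_i}{\sqrt{2\pi}\,\sigma_i} \exp\!\left( -\frac{(x - \mu_i)^2}{2\sigma_i^2} \right) $$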
Return hash for an object. Object can be a numpy ndarray or matrix
(which is serialized to a string), an open file (which has its contents
read), or any pickle-able Python object.
Note that only the first most-significant 8 bytes (64 bits) from the MD5
sum are used in the hash.
Parameters:
obj (object) – Object to hash. Note that the larger the object, the longer it takes to
hash.
hash_to (string) –
‘i’, ‘int’, or ‘integer’: First 8 bytes of the MD5 sum are interpreted
as an integer.
’b’, ‘bin’, or ‘binary’: MD5 sum digest; returns an 8-character string
‘h’, ‘x’, or ‘hex’: MD5 sum hexdigest (string of 16 characters)
‘b64’, ‘base64’: first 8 bytes of MD5 sum are base64 encoded (with ‘+’
and ‘-’ as final two characters of encoding). Returns string of 11
characters.
full_hash (bool) – If True, hash on the full object’s contents (which can be slow) or if
False, hash on a partial object. For example, only a file’s first kB is
read, and only 1000 elements (chosen at random) of a numpy ndarray are
hashed on. This mode of operation should suffice for e.g. a
minimization run, but should _not_ be used for storing to/loading from
disk.
Return the contents of an HDF5 file or node as a nested dict; optionally
return a second dict containing any HDF5 attributes attached to the
entry-level HDF5 entity.
Parameters:
val (string or h5py.Group) –
Specifies entry-level entity
* If val is a string, it is interpreted as a filename; the file is opened
as an h5py.File
* Otherwise, val must be an h5py.Group in an instantiated object
return_node (None or string) – Not yet implemented
choose (None or list) – Optionally can provide a list of variables names to parse (items not in
this list will be skipped, saving time & memory)
Returns:
data – Nested dictionary; keys are HDF5 node names and values contain the
contents of that node. If the entry-level entity of val has “attrs”,
these are extracted and attached as an OrderedDict at data.attrs;
otherwise, this entity is an empty OrderedDict.
Return type:
OrderedDict with additional attr of type OrderedDict named attrs
Store a (possibly nested) dictionary to an HDF5 file or branch node
within an HDF5 file (an h5py Group).
This creates hardlinks for duplicate non-trivial leaf nodes (h5py Datasets)
to minimize storage space required for redundant datasets. Duplication is
detected via object hashing.
NOTE: Branch nodes are sorted before storing (by name) for consistency in
the generated file despite Python dictionaries having no defined ordering
among keys.
Parameters:
data_dict (Mapping) – Dictionary, OrderedDict, or other Mapping to be stored
tgt (str or h5py.Group) – Target for storing data. If tgt is a str, it is interpreted as a
filename; a file is created with that name (overwriting an existing
file, if present). After writing, the file is closed. If tgt is an
h5py.Group, the data is simply written to that Group and it is left
open at function return.
overwrite (bool) – Set to True (default) to allow overwriting existing file. Raise
exception and quit otherwise.
warn (bool) – Issue a warning message if a file is being overwritten. Suppress
warning by setting to False (e.g. when overwriting is the desired
behaviour).
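A hedged round-trip sketch (assuming this module exposes to_hdf and from_hdf as described; the filename is illustrative):

    from pisa.utils.hdf import from_hdf, to_hdf

    data = {'branch': {'leaf': [1, 2, 3]}}
    to_hdf(data, 'data.hdf5', overwrite=True, warn=False)
    loaded = from_hdf('data.hdf5')  # OrderedDict mirroring the HDF5 hierarchy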
Interpret arrays (lists by default) as numpy arrays where this does
not yield a string or object array; also handle conversion of
particularly-formatted input to pint Quantities.
Uses a custom parser that automatically converts numpy arrays to lists.
If filename has a “.bz2” extension, the contents will be compressed
(using bz2 and highest-level of compression, i.e., -9).
If filename has a “.xor” extension, the contents will be xor-scrambled to
make them human-unreadable (this is useful for, e.g., blind fits).
Parameters:
content (obj) – Object to be written to file. Tries making use of the object’s own
to_json method if it exists.
filename (str) – Name of the file to be written to. Extension has to be ‘json’ or ‘bz2’.
indent (int) – Pretty-printing. Cf. documentation of json.dump() or json.dumps()
overwrite (bool) – Set to True (default) to allow overwriting existing file. Raise
exception and quit otherwise.
warn (bool) – Issue a warning message if a file is being overwritten (True,
default). Suppress warning by setting to False (e.g. when overwriting
is the desired behaviour).
sort_keys (bool) – Output of dictionaries will be sorted by key if set to True.
Default is False. Cf. json.dump() or json.dumps().
Handle zenith like coszen? Or better: define a set of variables to perform
reflection on, plus reflection parameters (e.g. reflect_fract or somesuch
to stand in for coszen_reflection, and reflect_dims as a stand-in for
coszen_name; also need some way to specify whether to reflect about the lower
and/or upper edge); each such parameter can either be a single value, or a
sequence with one value per variable.
Any good reason for 0.25 and ‘scott’ defaults? If not, don’t define a
default and force the user to explicitly set this when function is called.
Run kernel density estimation (KDE) for an array of data points, and
then evaluate them on a histogram-like grid to effectively produce a
histogram-like output.
Handles reflection at the coszen edges, and expects coszen to be in the binning.
bw_method (string) – ‘scott’ or ‘silverman’ (see kde module)
adaptive (bool) – (see kde module)
alpha (float) – A parameter for the KDEs (see kde module)
use_cuda (bool) – Run on GPU (only works with <= 2d)
coszen_reflection (float) – Part (number between 0 and 1) of binning that is reflected at the
coszen -1 and 1 edges
coszen_name (string) – Binning name to identify the coszen bin that needs to undergo special
treatment for reflection
oversample (int) – Evaluate KDE at more points per bin, takes longer, but is more accurate
stack_pid (bool) – Treat each pid bin separately, not as another dimension of the KDEs
Only supported for two additional dimensions, pid binning must be named pid
bootstrap (bool) – Use the bootstrap_kde class to produce error estimates on the KDE histograms.
Slow, not recommended during fits.
bootstrap_niter (int) – Number of bootstrap iterations.
Returns:
histogram (numpy.ndarray)
ToDo
* Maybe return Map with binnings attached instead of nd-array?
* Generalize to handle any dimensions with any reflection criteria
Barlow-Beeston log-likelihood (constant terms not omitted)
Link to paper: https://doi.org/10.1016/0010-4655(93)90005-W
– Input variables –
data = data histogram
mc = weighted MC histogram
unweighted_mc = unweighted MC histogram
weights = weight of each bin
– Output –
llh = LLH values in each bin
– Notes –
Shape of data, mc, unweighted_mc, weights and llh must be identical
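A sketch of the per-bin form for a single MC sample, following Barlow & Beeston (notation here is mine: d = data count, a = unweighted MC count, w = average weight, A = fitted “true” MC expectation, so the prediction is f = wA):

$$ \ln L = d \ln f - f - \ln\Gamma(d+1) + a \ln A - A - \ln\Gamma(a+1) $$

The $\ln\Gamma$ terms are the constant factorial terms that, per the note above, are not omitted.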
Log-likelihood based on the poisson-gamma mixture. This is a Poisson likelihood using a Gamma prior.
This implementation is based on the implementation of Austin Schneider (aschneider@icecube.wisc.edu)
– Input variables –
data = data histogram
sum_w = MC histogram
sum_w2 = Uncertainty map (sum of weights squared in each bin)
a, b = hyperparameters of the Gamma prior for MC counts; the default values a = 1 and b = 0 correspond to LEff (eq. 3.16) in https://doi.org/10.1007/JHEP06(2019)030
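A sketch of the marginal log-likelihood consistent with the description above (the function name is an assumption, and bins with sum_w2 == 0 would need special handling):

    import numpy as np
    from scipy.special import gammaln

    def poisson_gamma_llh_sketch(data, sum_w, sum_w2, a=1.0, b=0.0):
        # Match the Gamma prior to the MC mean and variance in each bin
        alpha = sum_w**2 / sum_w2 + a  # shape
        beta = sum_w / sum_w2 + b      # rate
        k = data
        # Log of the negative-binomial marginal of a Gamma-mixed Poisson
        return (alpha * np.log(beta) + gammaln(k + alpha)
                - gammaln(k + 1.0) - (k + alpha) * np.log(1.0 + beta)
                - gammaln(alpha))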
Server(s) for handling llh requests from a client: client passes free param
values, server sets these on its DistributionMaker, generates outputs, and
compares the resulting distributions against a reference template, returning
the llh value.
Send a Python object over a socket. Object is pickle-encoded as the
payload and sent preceded by a 4-byte header which indicates the number of
bytes of the payload.
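A minimal sketch of the framing described above; the byte order and signedness of the 4-byte header are assumptions:

    import pickle
    import struct

    def send_obj(sock, obj):
        # Pickle the object, then prepend a 4-byte length header
        payload = pickle.dumps(obj)
        sock.sendall(struct.pack('!i', len(payload)) + payload)

    def receive_obj(sock):
        # Read the 4-byte header, then exactly that many payload bytes
        (length,) = struct.unpack('!i', _recv_exact(sock, 4))
        return pickle.loads(_recv_exact(sock, length))

    def _recv_exact(sock, n):
        buf = b''
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError('socket closed mid-message')
            buf += chunk
        return buf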
This module sets up the logging system by looking for a “logging.json”
configuration file. It will search (in this order) the local directory, $PISA
and finally the package resources. The loggers found in there will be lifted
to the module namespace.
Currently, we have three loggers:
* logging: generic, for what is going on (typically “opening file x” or
“doing this now” messages)
* physics: for any physics output that might be interesting (“have x many
events”, “the flux is …”)
* tprofile: for how much time it takes to run some step (in the format
“time : start bla”, “time : stop bla”)
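A usage sketch; the import path is an assumption:

    from pisa.utils.log import logging, physics, tprofile  # assumed path

    n_events = 12345
    logging.info('opening file x')
    physics.info('have %d many events', n_events)
    tprofile.debug('time : start bla')
    # ... run the step being timed ...
    tprofile.debug('time : stop bla')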
Find the positive semi-definite matrix closest to A.
The closeness to A is measured by the Frobenius norm. The matrix closest to A
by that measure is uniquely defined in [3].
Parameters:
A (numpy.ndarray) – Symmetric matrix
return_distance (bool, optional) – Return distance of the input matrix to the approximation as given in
theorem 2.1 in [3].
This can be compared to the actual Frobenius norm between the
input and output to verify the calculation.
Returns:
X – Positive semi-definite matrix approximating A.
Return type:
numpy.ndarray
Notes
This function is a modification of [1], which is a Python adaptation of [2], which
credits [3].
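A sketch of the standard eigenvalue-clipping construction (not necessarily this code’s exact implementation), which for a symmetric input yields the Frobenius-nearest positive semi-definite matrix, together with the distance from theorem 2.1 in [3]:

    import numpy as np

    def nearest_psd_sketch(A):
        # Symmetrize, then clip negative eigenvalues to zero
        B = (A + A.T) / 2.0
        eigval, eigvec = np.linalg.eigh(B)
        X = (eigvec * np.clip(eigval, 0.0, None)) @ eigvec.T
        # Distance per the theorem: sqrt of the sum of squared
        # negative eigenvalues
        distance = np.sqrt(np.sum(np.clip(eigval, None, 0.0) ** 2))
        return X, distance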
run_settings dictionary format (and same for corresponding JSON file, e.g.
resources/events/mc_sim_run_settings.json); example is for PINGU but should
generalize to DeepCore, etc. Note that a JSON file will use null and not
None.
(Also note that expressions for numerical values utilizing the Python/numpy
namespace are valid, e.g. “2*pi” will be evaluated within the code to its
decimal value.):
{
    # Specify the detector name, lower case
    "pingu": {
        # Monte Carlo run number for this detector
        "388": {
            # Version of geometry, lower-case. E.g., ic86 for IceCube/DeepCore
            "geom": "v36",
            # A straightforward way of computing aeff/veff/meff is to keep all
            # simulated events and compare the #analysis/#simulated. If all
            # simulated events are kept, then the filename containing these is
            # recorded here.
            "all_gen_events_file": None,
            # Max and min azimuth angle simulated (rad)
            "azimuth_max": "2*pi",
            "azimuth_min": 0,
            # Max and min energy simulated (GeV)
            "energy_max": 80,
            "energy_min": 1,
            # GENIE simulates some un-physical events (interactions that will
            # not occur in nature). The number below was arrived at by Ken
            # Clark, so ask him for more info.
            "physical_events_fract": 0.8095,
            # GENIE has a prescale factor (TODO: generalize or eliminate for
            # other xsec?)
            "genie_prescale_factor": 1.2,
            # Neutrino flavors simulated
            "flavints": "nutau,nutaubar",
            # #nu / (#nu + #nubar) simulated
            "nu_to_total_fract": 0.5,
            # Number of events simulated per I3 file
            "num_events_per_file": 250000,
            # Number of I3 files used
            "num_i3_files": 195,
            # Simulated spectral index gamma; value of 1 => E^{-1}
            "sim_spectral_index": 1,
            # Version of neutrino/ice cross sections used for the simulation
            "xsec_version": "genie_2.6.4",
            # Max and min zenith angle simulated (rad)
            "zenith_max": "pi",
            "zenith_min": 0
        }
    }
}
Fraction of events generated (either particles or antiparticles).
Specifying whether you want the fraction for particles or antiparticles is
done by specifying one (and only one) of the three parameters: barnobar,
is_particle, or flav_or_flavint.
Parameters:
barnobar (None or int) – -1 for antiparticle, +1 for particle
is_particle (None or bool) – True for particle, False for antiparticle
flav_or_flavint (None or convertible to NuFlav or NuFlavInt) – Particle or antiparticle is determined from the flavor or flavint
passed
barnobar (None or int) – -1 for antiparticle or +1 for particle
is_particle (None or bool)
flav_or_flavint (None or convertible to NuFlav or NuFlavInt) – If one of barnobar, is_particle, or flav_or_flavint is
specified, returns only the number of particles or antiparticles
generated. Otherwise (if none of those is specified), return the
total number of generated events.
include_physical_fract (bool) – Whether to include the “GENIE physical fraction”, which accounts
for events that are generated but are un-physical and therefore
will never be detectable. These are removed to not penalize
detection efficiency.
Decorator to assign the right jit for different targets
In case of non-cuda targets, all instances of cuda.local.array
are replaced by np.empty. This is a dirty fix; hopefully numba will
support numpy array allocation in the near future, and this will no
longer be necessary.
Parameters:
func (callable)
Returns:
new_nb_func – Refactored version of func but with cuda.local.array replaced by
np.empty if TARGET == “cpu”. For either TARGET, the returned
function will be callable within numba code for that target.
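A minimal sketch of such a decorator; the name myjit and the source-rewriting approach shown here are assumptions about one plausible implementation (it only handles module-level functions):

    import inspect

    import numpy as np
    from numba import cuda, jit

    TARGET = "cpu"  # or "cuda"

    def myjit(func):
        """Compile func for TARGET, rewriting cuda.local.array into
        np.empty when compiling for the CPU."""
        if TARGET == "cuda":
            return cuda.jit(func, device=True)
        source = inspect.getsource(func)
        source = source.replace("@myjit", "")  # drop the decorator line
        source = source.replace("cuda.local.array", "np.empty")
        namespace = {"np": np}
        exec(source, namespace)  # re-define the rewritten function
        return jit(namespace[func.__name__], nopython=True)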
Derive a numpy.random.RandomState object (usable to generate random
numbers and distributions) from a flexible specification.
Parameters:
random_state (None, RandomState, string, int, state vector, or seq of int) –
If instantiated RandomState object is passed, it is used directly
If string: must be either ‘rand’ or ‘random’; random state is
instantiated at random from either /dev/urandom or (if that is not
present) the clock. This creates an irreproducible random state.
If int or sequence of length one: This is used as the seed value;
must be in [0, 2**32).
If sequence of two integers: first must be in [0, 32768): 15
most-significant bits. Second must be in [0, 131072): 17
least-significant bits.
If sequence of three integers: first must be in [0, 4): 2
most-significant bits. Second must be in [0, 8192): next 13
(less-significant) bits. Third must be in [0, 131072): 17
least-significant bits.
If a “state vector” (sequence of length five usable by
numpy.random.RandomState.set_state), set the random state using
this method.
Returns:
random_state – Object callable like numpy.random (e.g. random_state.rand(10, 10)),
but with exclusively local state (whereas numpy.random has global
state).
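A usage sketch covering the documented spec forms, assuming the function described here is named get_random_state:

    rs = get_random_state('random')        # irreproducible state
    rs = get_random_state(42)              # single seed in [0, 2**32)
    rs = get_random_state([12345, 67890])  # 15 msb + 17 lsb parts
    x = rs.rand(10, 10)                    # use like numpy.random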
Find a file or directory in the filesystem (i.e., something that the
operating system can locate, as opposed to Python package resources, which
can be located within a package and therefore hidden from the filesystem).
Parameters:
pathspec (string)
fail (bool)
Return type:
None (if not found) or string (absolute path to file or dir if found)
First check if resource is an absolute path, then check relative to
the paths specified by the PISA_RESOURCES environment variable (if it is
defined). Otherwise, look in the resources directory of the PISA
installation.
Note that the PISA_RESOURCES environment variable can contain multiple
paths, each separated by a colon. Due to using colons as separators,
however, the paths themselves cannot contain colons.
Note also that if PISA is packaged as an archive distribution (e.g. zipped egg),
this method extracts the resource (if directory, the entire contents are
extracted) to a temporary cache directory. Therefore, it is preferable to
use the open_resource method directly, and avoid this method if possible.
Parameters:
resource (str) – Resource path; can be path relative to CWD, path relative to
PISA_RESOURCES environment variable (if defined), or a package resource
location relative to the pisa_examples/resources sub-directory. Within
each path specified in PISA_RESOURCES and within the
pisa_examples/resources dir, the sub-directories ‘data’, ‘scripts’,
and ‘settings’ are checked for resource _before_ the base directories
are checked. Note that the first result found is returned.
fail (bool) – If True, raise IOError if resource not found
If False, return None if resource not found
Returns:
String if resource is found (relative path to the file or directory); if
not found and fail is False, returns None.
Raises:
IOError if resource is not found and fail is True. –
Find the resource file (see find_resource), open it, and return a file
handle.
Parameters:
resource (str) – Resource path; can be path relative to CWD, path relative to
PISA_RESOURCES environment variable (if defined), or a package resource
location relative to PISA’s pisa_examples/resources sub-directory.
Within each path specified in PISA_RESOURCES and within the
pisa_examples/resources dir, the sub-directories ‘data’, ‘scripts’,
and ‘settings’ are checked for resource _before_ the base directories
are checked. Note that the first result found is returned.
mode (str) – ‘r’, ‘w’, or ‘rw’; only ‘r’ is valid for package resources (as these
cannot be written)
Return type:
binary stream object (which behaves identically to a file object)
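A usage sketch, assuming the function is named open_resource; the resource path below is hypothetical:

    # Read a packaged settings file; only mode='r' is valid for
    # package resources
    with open_resource('settings/osc/example.cfg', mode='r') as f:
        contents = f.read()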
Contained class for operating on Spline objects for various neutrino
flavours.
Inherits from FlavIntData object. Provides methods to allow
evaluation of the splines for all neutrino flavours.
Parameters:
inSpline (Spline or tuple of Spline) – Spline objects with name entry corresponding to a neutrino flavour
nue, numu, nuebar, numubar and also corresponding to an
interaction type cc and nc if the flag interactions is True.
interactions (bool) – Default = True
Flag to specify whether to store flavours or flavour+interaction
signatures.
Compute the Barlow LLH taking into account finite statistics.
The likelihood is described in this paper: https://doi.org/10.1016/0010-4655(93)90005-W
Parameters:
actual_values (numpy.ndarrays of same shape)
expected_values (numpy.ndarrays of same shape)
s (float) – sigma for smearing term (= the uncertainty to be accounted for)
nsigma (int) – The range in sigmas over which to do the convolution; 3 sigmas is > 99%,
so should be enough
steps (int) – Number of steps to do the integration in (actual steps are 2*steps + 1,
so this is the number of steps to each side of the gaussian smearing term)
Compute the log-likelihood (llh) based on eq. 3.16 - https://doi.org/10.1007/JHEP06(2019)030
accounting for finite MC statistics.
This is the most recommended likelihood in the paper.
Parameters:
actual_values (numpy.ndarrays of same shape)
expected_values (numpy.ndarrays of same shape)
Returns:
llh – llh corresponding to each pair of elements in actual_values and
expected_values.
Compute the log-likelihood (llh) based on LMean in table 2 - https://doi.org/10.1007/JHEP06(2019)030
accounting for finite MC statistics.
This is the second most recommended likelihood in the paper.
Parameters:
actual_values (numpy.ndarrays of same shape)
expected_values (numpy.ndarrays of same shape)
Returns:
llh – llh corresponding to each pair of elements in actual_values and
expected_values.
Convolved Poisson likelihood, normalized so that the value at k = l
(Asimov) does not change
Parameters:
k (float)
l (float)
s (float) – sigma for smearing term (= the uncertainty to be accounted for)
nsigma (int) – The range in sigmas over which to do the convolution, 3 sigmas is >
99%, so should be enough
steps (int) – Number of steps to do the integration in (actual steps are 2*steps + 1,
so this is the number of steps to each side of the gaussian smearing term)
Returns:
Convolved Poisson likelihood, normalized so that the value at k = l
(Asimov) does not change
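One plausible reading of this computation, as a sketch; the function name, the grid construction, the continuous-k extension of the Poisson pmf, and the exact normalization are all assumptions:

    import numpy as np
    from scipy.special import gammaln

    def conv_poisson_sketch(k, l, s, nsigma=3, steps=50):
        def poisson_pdf(k, mu):
            # Continuous-k extension of the Poisson pmf via gammaln
            return np.exp(k * np.log(mu) - mu - gammaln(k + 1.0))

        # 2*steps + 1 points spanning +/- nsigma sigmas around l
        mu = np.clip(np.linspace(l - nsigma * s, l + nsigma * s,
                                 2 * steps + 1), 1e-10, None)
        gauss = np.exp(-0.5 * ((mu - l) / s) ** 2) / (s * np.sqrt(2 * np.pi))
        conv = np.trapz(poisson_pdf(k, mu) * gauss, mu)
        # Normalize so that the Asimov point k == l is unchanged
        conv_at_asimov = np.trapz(poisson_pdf(l, mu) * gauss, mu)
        return conv * poisson_pdf(l, l) / conv_at_asimov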
Fixed-bandwidth (standard) Gaussian KDE using the Improved
Sheather-Jones bandwidth.
Code adapted for Python from the implementation in Matlab by Zdravko Botev.
Ref: Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density
estimation via diffusion. The Annals of Statistics, 38(5):2916-2957, 2010.
Parameters:
data (array)
weights (array or None)
n_dct (None or int) – Number of points with which to form a regular grid, from min to
max; histogram values at these points are sent through a discrete
Cosine Transform (DCT), so n_dct should be an integer power of 2 for
speed purposes. If None, uses next-highest-power-of-2 above
len(data)*10.
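A usage sketch; the name fbwkde is taken from the next entry below, and the returned triple is an assumption:

    import numpy as np

    data = np.random.normal(size=2000)
    # assumed to return (bandwidth, evaluation grid, density estimates)
    bw, grid, dens = fbwkde(data, weights=None, n_dct=1024)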
Variable-bandwidth (standard) Gaussian KDE that uses the function
fbwkde for a pilot density estimate.
Parameters:
data (array) – The data points for which the density estimate is sought
weights (array or None) – Weight for each point in data
n_dct (None or int) – Number of points with which to form a regular grid, from min to
max; histogram values at these points are sent through a discrete
Cosine Transform (DCT), so n_dct should be an integer power of 2 for
speed purposes. If None, uses next-highest-power-of-2 above
len(data)*10.
min (None or float) – Minimum of range over which to compute density.
If None, defaults to min(data) - range(data)/2
max (None or float) – Maximum of range over which to compute density.
If None, defaults to max(data) + range(data)/2
n_addl_iter (int >= 0) – Number of additional iterations on the Abramson VBW density after
the initial VBW estimate. This can help smooth the tails of the
distribution, at the expense of additional computational cost.
evaluate_dens (bool) – Whether to evaluate and return the density estimate on the points
defined by evaluate_at
evaluate_at (None, float, or array of float) –
Point(s) at which to evaluate the density estimate. If None,
evaluate_at = np.linspace(min + dx/2, max - dx/2, n_dct)
where
dx = (max - min) / n_dct
Returns:
kernel_bandwidths (array)
evaluate_at (array) – Locations at which the density estimates would be evaluated
density (array) – Density estimates
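A usage sketch of the documented signature and returns, assuming the function described here is named vbwkde:

    import numpy as np

    data = np.random.normal(loc=0.0, scale=1.0, size=5000)
    kernel_bandwidths, eval_pts, density = vbwkde(
        data, weights=None, n_dct=1024, min=-5.0, max=5.0,
        n_addl_iter=0, evaluate_dens=True, evaluate_at=None,
    )
    # the density should integrate to ~1 over [min, max]
    print(np.trapz(density, eval_pts))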
Notes
Specifying the range:
The specification of min and max are critical for obtaining a
reasonable density estimate. If the true underlying density slowly
decays to zero on one side or the other, like a gaussian, specifying
too-small a range will distort the edge the VBW-KDE finds. On the
other hand, an abrupt cut-off in the distribution should be
accompanied by a similar cutoff in the computational range (min and/or
max). The algorithm here will approximate such a sharp cut-off with
roughly the same performance as the reflection method for standard
KDEs (as the fixed-BW portion uses a DCT of the data), but note that
this will not perform as well as polynomial-edges or other
modifications that have been proposed in the literature.