Likelihoods class for computing Poisson and Barlow likelihoods.
The Poisson likelihood assumes the template being compared against is the
perfect expectation, while the Barlow likelihood accounts for the template
being imperfect due to being generated from finite Monte Carlo statistics.
A class to handle the likelihood calculations in OscFit. It can
be used in other Poisson binned-likelihood calculators as well.
The class includes two likelihood spaces: the Poisson and the
Barlow LLH.
The Poisson likelihood is the ideal case and assumes infinite
statistical accuracy on the Monte Carlo distributions. This is
the simple case and has been used in the past for analyses, but
should not be used if there’s a significant statistical error on
one of the samples.
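For a binned measurement with observed counts d_i and template expectation lambda_i, the Poisson log-likelihood takes the standard form (a sketch; whether the constant term is kept is implementation-dependent):

$$ \ln L = \sum_i \left( d_i \ln \lambda_i - \lambda_i - \ln\Gamma(d_i + 1) \right) $$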
The Barlow likelihood is an implementation of the likelihood model
given in doi:10.1016/0010-4655(93)90005-W (“Fitting using finite
Monte Carlo samples” by Barlow and Beeston). This requires the
unweighted distributions of the MC and involves assuming that each
MC sample is drawn from an underlying “true” expectation distribution
which should be fit. This gives (number of bins)x(number of samples)
new parameters to fit and allows the shape of the distributions to
fluctuate in order to fit both the observed MC and the observed
data. This gives a more informed result and allows one to estimate
the effect of the finite MC samples.
To use this, you need to run set_data, set_mc, and the set_unweighted
functions.
Important
The set_mc function takes a histogram of the average
weight per event for each bin, not the total weighted histogram!
Simply calling the get_llh function after these will return the
best fit LLH for your chosen likelihood function. The function takes
the name of the likelihood space (“Poisson” or “Barlow”). You can
retrieve the best fit weighted plots using the get_plot (total best-fit
histogram including all samples) and the get_single_plots (the list
of best-fit weighted histograms for each sample).
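A minimal usage sketch based on the description above (the class name comes from the first line of this page; histogram shapes and exact signatures are assumptions):

    import numpy as np

    # Hedged sketch; see the class documentation above for the real API
    data_hist = np.array([12., 7., 3.])              # observed counts per bin
    avg_weight_hists = [np.array([0.9, 1.1, 1.0])]   # avg weight per event, per sample
    unweighted_hists = [np.array([100., 80., 40.])]  # raw MC counts, per sample

    llh_calc = Likelihoods()
    llh_calc.set_data(data_hist)
    llh_calc.set_mc(avg_weight_hists)            # average weight per event in each bin!
    llh_calc.set_unweighted(unweighted_hists)
    llh_poisson = llh_calc.get_llh('Poisson')
    llh_barlow = llh_calc.get_llh('Barlow')
    best_fit_total = llh_calc.get_plot()         # total best-fit histogram
    best_fit_each = llh_calc.get_single_plots()  # per-sample best-fit histograms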
Calculate the likelihood given the data, average weights, and the
unweighted histograms. You can choose between “Poisson” and “Barlow”
likelihoods at the moment.
If using the “Barlow” LLH, note that the minimization method is Powell
with a maximum of 25 iterations per step. This is not optimized at all and
was chosen simply because it “worked”. Powell does not accept bounds
set in the normal way, so the positive-definiteness of
the rates is enforced in the get_llh_barlow_bin method.
The Barlow LLH finds the best-fit “expected” MC distribution using
both the data and observed MC as constraints. Each bin is independent
in this calculation, since the assumption is that there are no
correlations between bins. This likely leads to a somewhat worse limit
than you might get otherwise, but at least it's conservative.
If you have a good idea for how to extend this to include bin-to-bin,
let me know and I can help implement it.
Set up the MC histogram. Each histogram is flattened in order to
make the looping later on a bit simpler to handle. The values in each
bin should correspond to the weight-per-event in that bin, NOT the
total weight in that bin!
This is used to define a serializable object used for functions assigned to the DerivedParams.
These can be constructed and evaluated symbolically and procedurally.
In principle, we may even be able to include a filepath to a serialized Funct object such that the pipeline configs can include definitions for these therein
Contains
OPS - an Enum listing recognized operations
TrigOps - some definitions for handling some of the trig functions, shared by Vars and Functs
Var - a class for representing variables
Funct - a series of operations done representing the functions
Usage - quite simple!
Create some Vars and do math on them:
x = Var('x')
y = Var('y')
function = sin(x**2) + 3*cos(y+1)**2
The object function is now callable with keyword arguments passed to the instantiated Vars
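For instance, a hedged sketch of evaluating it (the keyword names follow the Var names above):

    result = function(x=0.5, y=1.2)  # evaluates sin(0.5**2) + 3*cos(1.2 + 1)**2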
recursiveEquality and recursiveAllclose traverse into potentially nested
data structures and compare all elements for equality.
normQuant performs the same kind of traversal, but returns a normalized
version of the input object whereby “essentially-equal” things are returned as
“actually-equal” objects.
These functions are at the heart of hashing behaving as one expects, and this
in turn is essential for caching to work correctly.
Important
Read carefully how each function in this module defines
“equality” for various datatypes so you understand what two things being
“equal” based upon these functions actually means.
E.g., (nan == nan) == True, uncorrelated uncertainties are equal if both
their nominal values and standard deviations are equal regardless of
whether they come from independent random variables, etc.
Interpret a value as a pint Quantity via pisa.ureg
Parameters:
value (scalar, Quantity, or sequence interpretable as Quantity)
expect_sequence (bool) – Specify True if you expect a sequence of quantities (or a
pint-Quantity containing a numpy array). This allows interpreting each
element of a passed sequence as a quantity. Otherwise, specify False
if you expect a scalar. This allows interpreting a pint.Quantity tuple
as a scalar (the first element of the tuple is the magnitude and the
second element contains the units).
Check if input is a numerical datatype (including arrays of numbers) but
without units. Note that for these purposes, booleans are not considered
numerical datatypes.
Normalize quantities such that two things that should be equal are
returned as identical objects.
Handles floating point numbers, pint quantities, uncertainties, and
combinations thereof as standalone objects or in sequences, dicts, or numpy
ndarrays. Numerical precision issues and equal quantities represented in
differently-scaled or different systems of units come out identically.
Outputs from this function (not the inputs) deemed to be equal by the
above logic will compare to be equal (via the == operator and via
pisa.utils.comparisons.recursiveEquality) and will also hash to equal
values (via pisa.utils.hash.hash_obj).
Parameters:
obj – Object to be normalized.
sigfigs (None or int > 0) – Number of digits to which to round numbers’ significands; if None, do
not round numbers.
full_norm (bool) –
If True, does full translation and normalization which is good across
independent invocations and is careful about normalizing units, etc.
If False, certain assumptions are made that modify the behavior
described below in the Notes section which help speed things up in the
case of e.g. a minimizer run, where we know certain things won't
change:
* Units are not normalized. They are assumed to stay the same from
run to run.
* sigfigs are not respected; full significant figures are returned
(since it takes time to round all values appropriately).
Returns:
normed_obj – Simple types are returned as the same type as at the input, Numpy
ndarrays are returned in the same shape and representation as the
input, Mappings (dicts) are returned as OrderedDict, and all other
sequences or iterables are returned as (possibly nested) lists.
Return type:
object roughly of same type as input obj
Notes
Conversion logic by obj type or types found within obj:
Sequences and OrderedDicts (but not numpy arrays) are iterated
through recursively.
Mappings without ordering (e.g. dicts) are iterated through
recursively after sorting their keys, and are returned as
OrderedDicts (such that the output is always consistent when
serialized).
Numpy ndarrays are treated as the below data types (according to the
array’s dtype).
Simple objects (non-floating point / non-sequence / non-numpy / etc.)
are returned unaltered (e.g. strings).
Pint quantities (numbers with units): Convert to their base units.
Floating-point numbers (including the converted pint quantities):
Round values to sigfigs significant figures.
Numbers with uncertainties (via the uncertainties module) have
their nominal values rounded as above but their standard deviations are
rounded to the same order of magnitude (not number of significant
figures) as the nominal.
Therefore passing obj=10.23+/-0.25 and sigfigs=2 returns 10+/-0.0.
Note that correlations are lost in the outputs of this function, so
equality of the output requires merely having equal nominal values and
equal standard deviations.
The calculations leading to these equal numbers might have used
independent random variables to arrive at them, however, and so the
uncertainties module would have evaluated them to be unequal. [1]
To achieve rounding that masks floating point precision issues, set
sigfigs to a value less than the number of decimal digits of precision
carried by the significand of the floating point type used in the calculation.
For reference, the IEEE 754 floating point standard [2] uses the following:
* binary32 (single precision): 24-bit significand, about 7.2 decimal digits
* binary64 (double precision): 53-bit significand, about 15.9 decimal digits
Pint quantities hash to unequal values if specified in different scales or
different systems of units (even if the underlying physical quantity is
identical).
Even if both quantities are floating point numbers, finite precision
effects in the to_base_units conversion can still cause two things which
we “know” are equal to evaluate to be unequal.
pisa.utils.comparisons.recursiveAllclose(x, y, *args, **kwargs)
Recursively verify close-equality between two objects x and y. If
structure is different between the two objects, returns False.
args and kwargs are passed into the numpy.allclose() function.
pisa.utils.comparisons.recursiveEquality(x, y, allclose_kw={'atol': 2.220446049250313e-16, 'equal_nan': True, 'rtol': 1e-12})
Recursively verify equality between two objects x and y.
Parameters:
x – Objects to be compared
y – Objects to be compared
Notes
Possibly unintuitive behaviors:
Sequences of any type evaluate equal if their contents are the same.
E.g., a list can equal a tuple.
Mappings of any type evaluate equal if their contents are the same.
E.g. a dict can equal an OrderedDict.
nan SHOULD equal nan, +inf SHOULD equal +inf, and -inf SHOULD equal -inf
… but this *only* holds true (as of now) if those values are in
numpy arrays! (TODO!)
Two pint units with same __repr__ but that were derived from different
unit registries evaluate to be equal. (This is contrary to pint’s
implementation of equality comparisons, which is careful in case a
unit is defined differently in different registries. We’ll assume this
isn’t done here in PISA, until a use case arises where this is no
longer a good assumption.)
Two pint units that are compatible but different (even just in
magnitude prefix) evaluate to be unequal.
This behavior is chosen for the case where numbers are given
independently of their units, and hence the automatic conversion
facility available for comparing pint quantities is not available.
The only reliable way to test equality for these “less intelligent”
objects is to ensure that both the numerical values are exactly equal
and that the units are exactly equal; the latter includes order of
magnitude prefixes (micro, milli, …, giga, etc.).
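A brief illustration of these behaviors (a sketch; only the module path shown in the signatures above is assumed):

    import numpy as np
    from pisa.utils.comparisons import recursiveEquality

    # Sequences of different types are equal if their contents are equal
    assert recursiveEquality([1, 2, 3], (1, 2, 3))
    # nan == nan currently only holds for values inside numpy arrays
    assert recursiveEquality(np.array([np.nan]), np.array([np.nan]))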
Parse a config file into a dict containing an item for every analysis
stage, that itself contains all necessary instantiation arguments/objects for
that stage. For an example config file, please consider
$PISA/pisa_examples/resources/settings/pipeline/example.cfg
A pipeline config file is expected to contain something like the following,
with the sections [pipeline] and corresponding [stage.service]
required, in addition to a [binning] section:
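A schematic sketch of that layout (all section, option, and value names here are illustrative, not copied from example.cfg):

    [pipeline]
    order = stageA.serviceA, stageB.serviceB
    output_binning = my_binning
    output_key = weights, errors

    [binning]
    # binning definitions, referred to by the stage.service sections below

    [stageA.serviceA]
    param.p1 = 12.5 +/- 2.3 * units.GeV

    [stageB.serviceB]
    param.p2 = 0.5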
#include statements can be used to include other config files. The
#include statement must be the first non-whitespace on a line, and these
statements can be used anywhere within a config file.
#include resource as xyz statements behave similarly, but prepend the
included file's text with a section header containing xyz in this case.
pipeline is the top-most section that defines the hierarchy of stages
and services to be instantiated. An output_binning is required to be able
to get a pisa.core.map.MapSet (set of histograms) output for
the pipeline; output_key then specifies the keys of the data passed
through the pipeline which contain histogram weights and (if desired) errors
(note: the presence of these keys is in general not obvious from a given
pipeline config file itself)
binning can contain different binning definitions, that are then later
referred to from within the stage.service sections.
stage.service: one such section per stage.service is necessary. It may
contain the options debug_mode, error_method, calc_mode,
apply_mode, which are common to all stages, and must contain all the
necessary arguments and parameters for a given stage.service.
Duplicate section headers and duplicate keys within a section are illegal.
Every key in a stage section that starts with param.<name> is interpreted and
parsed into a pisa.core.param.Param object. These can be strings
(e.g. a filename, but don't use any quotation marks) or quantities (numbers
with units).
Quantities expect an expression that can be converted by the
parse_quantity() function. The expression for a quantity can optionally
include a simple Gaussian prior and units. The simplest definition of a
quantity with neither Gaussian prior nor units would look something like this:
param.p1=12.5
Gaussian priors can be included for a quantity using +/- notation, where
the number that follows +/- is the standard deviation. E.g.:
param.p1=12.5 +/- 2.3
If no units are explicitly set for a quantity, it is taken to be a quantity
with special units dimensionless. Units can be set by multiplying (using
*) by units.<unit> where <unit> is the short or long name
(optionally including metric prefix) of a unit. For example, the following
lines set equivalent values for params p1 and p2:
param.p1=12.5*units.m
param.p2=12.5*units.meter
and this can be combined with the Gaussian-prior +/- notation:
param.p1=12.5 +/- 2.3 * units.GeV
Additional arguments to a parameter are passed in with the . notation, for
example param.p1.fixed=False, which makes p1 a free parameter in the
fit (by default a parameter is fixed unless specified like this).
A uniform, spline, or Jeffreys pisa.core.prior.Prior can also be set
using the .prior attribute:
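For instance (a schematic sketch; the prior names are assumed to match those recognized by pisa.core.prior.Prior):
param.p1.prior=uniform
param.p2.prior=spline
param.p3.prior=jeffreys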
In the second case, a .prior.data option will be expected, pointing to the
spline data file.
If no prior is specified, it is taken to have no prior (or, equivalently, a
uniform prior with no penalty).
A range must be given for a free parameter, either as an absolute range [x,y]
or in conjunction with the keywords nominal (= nominal parameter value) and
sigma if the param was specified with the +/- notation.
Params that have the same name in multiple stages of the pipeline are
instantiated as references to a single param in memory, so updating one updates
all of them.
A special mechanism allows the user to specify different values for
the same param via the pisa.core.param.ParamSelector mechanism.
This can be used for example for hypothesis testing, where for hypothesis A a
param takes one value and for hypothesis B another.
A given param, say foo, then needs two definitions like the following,
assuming we name our selections A and B:
param.A.foo=1
param.B.foo=2
The default param selector needs to be specified in the pipeline section, e.g.:
param_selections=A
which will default to the value of 1 for param foo. An instantiated
pipeline can dynamically switch to another selection after instantiation.
Multiple different param selections are allowed in a single config. In the
default selection they must be separated by commas.
Iterate through the lines of an already-open file (fp) but then can pause
at any point and open and iterate through another file via the
switch_to_file method (and this file can be paused to iterate through
another, etc.).
This has the effect of in-lining files within files for e.g. parsing
multiple files as if they're a single file. Which line comes from which file
is also tracked for generating maximally-useful error messages, via the
location method.
Note that circular references are not allowed.
Parameters:
fp (file-like object) – The (opened) main config to be read. E.g. can be an opened file,
io.StringIO object, etc.
fpname (string) – Identifier for the initial fp object
Parses a PISA config file, extending configparser.RawConfigParser
(the backport of RawConfigParser from Python 3.x) by adding the ability to
include external files inline via, for example:
#include /path/to/file.cfg
#include path/to/resource.cfg
#include path/to/resource2.cfg as section2
[section1]
key11 = value1
key12 = ${section2:key21}
key13 = value3
where the files or resources located at “/path/to/file.cfg”,
“path/to/resource.cfg”, and “path/to/resource2.cfg” are effectively inlined
wherever the #include statements occur.
The #include path/to/resource2.cfg as section2 syntax
prefixes the contents of resource2.cfg with a section header named
“section2”, expanding resource2.cfg as:
[section2]
line1 of resource2.cfg
line2 of resource2.cfg
... etc.
Special parsing rules we have added to make #include behavior sensible:
Using an #include file that contains a section header
([section_name]) or using #include file as section_name
requires that the next non-blank / non-comment / non-#include line
be a new section header ([section_name2]).
Empty sections after fully parsing a config will raise a ValueError.
This is likely never a desired behavior, and should alert the user to
inadvertent use of #include.
Also note that, unlike the default ConfigParser
behavior, ExtendedInterpolation is used, whitespace
surrounding text in a section header is ignored, empty lines are not
allowed between multi-line values, and section names, keys, and values are
all case-sensitive.
All other options are taken as the defaults / default behaviors of
ConfigParser.
See help for configparser.ConfigParser for further help on valid
config file syntax and parsing behavior.
Override the read method to interpret filenames as PISA resource
locations, then call the overridden read method. Also, an IOError causes
failure here, whereas it is ignored in RawConfigParser.
Parse a param specification from a PISA config file.
Note that if the param specification does not include fixed,
prior, and/or range, the defaults for these are:
fixed=True, prior=None, and range=None.
If a prior is specified explicitly via .prior, this takes precedence,
but if no .prior is specified and the param's value is parsed to be an
uncertainties.AffineScalarFunc (i.e., to have a std_dev attribute),
a Gaussian prior is constructed from that and then the AffineScalarFunc is
stripped out of the param’s value (such that it is just a
Quantity).
stage_dicts – Keys are (stage_name, service_name) tuples and values are OrderedDicts
with arguments’ names as keys and values as values. Some known arg
values are parsed out fully into Python objects, while the rest remain
as strings that must be used or parsed elsewhere.
Evaluate a string with certain special values, or return the string. Any
further parsing must be done outside this module, as this is as specialized
as we’re willing to be in assuming/interpreting what a string is supposed
to mean.
Energy-spectrum-weighted integral of (possibly a ratio of)
(possibly-combined) flavor/interaction type cross sections.
Parameters:
flavintgroup0 (NuFlavIntGroup or convertible thereto) – Flavor(s)/interaction type(s) for which to combine cross sections
for numerator of ratio
flavintgroup1 (None, NuFlavIntGroup or convertible thereto) – Flavor(s)/interaction type(s) for which to combine cross sections
for denominator of ratio. If None is passed, the denominator of
the “ratio” is effectively 1.
e_range – Range of energy over which to integrate (GeV)
gamma (float >= 0) – Power law spectral index used for weighting the integral,
E^{-gamma}. Note that gamma should be >= 0.
average (bool) – If True, return the average of the cross section (ratio)
If False, return the integral of the cross section (ratio)
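A sketch of the quantity computed, under a plain reading of the parameters above ($\sigma_0$ and $\sigma_1$ denote the combined cross sections for flavintgroup0 and flavintgroup1):

$$ I = \frac{\int_{E_{\min}}^{E_{\max}} E^{-\gamma}\,\sigma_0(E)\,\mathrm{d}E}{\int_{E_{\min}}^{E_{\max}} E^{-\gamma}\,\sigma_1(E)\,\mathrm{d}E} $$

with the denominator effectively 1 when flavintgroup1 is None; for average=True, the integral is divided by the weight normalization $\int_{E_{\min}}^{E_{\max}} E^{-\gamma}\,\mathrm{d}E$ to yield a spectrum-weighted average.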
Load cross sections from a file locatable and readable by the PISA
from_file command. If ver is provided, it is used to index into the
top level of the loaded dictionary
Load cross sections from a ROOT file, where graphs are at the first level of
the hierarchy. This is as yet crude and not very flexible, but at least it's
recorded here for posterity.
Requires ROOT and ROOT python module be installed
Parameters:
fpath (string) – Path to ROOT file
ver (string) – Necessary to differentiate among different file formats that Ken
has sent out
tot_sfx (string (default = '_tot')) – Suffix for finding total cross sections in ROOT file (if these
fields are found, the oxygen/hydrogen fields are skipped)
o_sfx (string (default = '_o16')) – Suffix for finding oxygen-16 cross sections in ROOT file
h_sfx (string (default = '_h1')) – Suffix for finding hydrogen-1 cross sections in ROOT file
plt_sfx (string (default = '_plt')) – Suffix for plots containing cross sections per GeV in ROOT file
Returns:
xsec – Object containing the loaded cross sections
Plot cross sections per GeV; optionally, save plot to file. Requires
matplotlib and optionally uses seaborn to set “pretty” defaults.
save (None, str, or dict) –
If None, figure is not saved (default)
If a string, saves figure as that name
If a dict, calls pyplot.savefig(**save) (i.e., save contains
keyword args to savefig)
Class for importing, working with, and storing data processing
parameters.
Implements cutting and particle identification (PID) functionality that can
be applied to MC/data that have the specified version of processing applied
to it.
Parameters:
data_proc_params (string or dict) –
If string: looks for the corresponding JSON resource (file) and loads
the contents as a data_proc_params dict
If dict: taken to be data_proc_params dict
The data_proc_params dict must follow the format described below.
detector (string) – Converted to lower-case string which must be a detector key in
data_proc_params dict
proc_ver – Converted to lower-case string which must be a proc_ver key in
data_proc_params dict
Notes
All information describing the processing version is loaded from a JSON
file with the following defined format:
Note that the following common cuts are defined in this class and so
needn’t be defined in the JSON file:
‘true_upgoing_zen’ : Select true-upgoing events by zenith angle
‘true_upgoing_coszen’ : Select true-upgoing events by cos(zenith) angle
data_proc_params dictionary format (and same for corresponding JSON file):
{
  # Specify the detector name, lower case
  "<lower-case detector name>": {
    # Specify the processing version, lower case
    # Examples in PINGU include "4", "5", and "5.1"
    "<lower-case processing version>": {
      # Mapping from standard names to path where these can be found in
      # source HDF5 file; separate HDF5 path nodes with a forward-slash.
      #
      # Fields that cuts or particles in the "cuts"/"pid" sections below
      # require (e.g., "cuts_step_1" for PINGU v5 processing), must be
      # added here so the code knows how to extract the info from the
      # source HDF5 file.
      #
      # Outputting a PID field to the destination PISA HDF5 file will *not*
      # be done if the "pid" field is omitted below.
      #
      # In addition to the below-named fields, "true_coszen" and
      # "reco_coszen" are generated from the data from the "true_zenith"
      # and "reco_zenith" fields, respectively. So any of those fields can
      # be used via the aforementioned names.
      "field_map": {
        "run": "<HDF5 path to corresponding node>",
        "nu_code": "<HDF5 path to corresponding node>",
        "true_energy": "<HDF5 path to corresponding node>",
        "true_zenith": "<HDF5 path to corresponding node>",
        "reco_energy": "<HDF5 path to corresponding node>",
        "reco_zenith": "<HDF5 path to corresponding node>",
        "one_weight": "<HDF5 path to corresponding node>",
        "generator_volume": "<HDF5 path to corresponding node>",
        "generator_radius": "<HDF5 path to corresponding node>",
        "detection_length": "<HDF5 path to corresponding node>",
        "interaction_type": "<HDF5 path to corresponding node>",
        "azimuth_min": "<HDF5 path to corresponding node>",
        "azimuth_max": "<HDF5 path to corresponding node>",
        "zenith_min": "<HDF5 path to corresponding node>",
        "zenith_max": "<HDF5 path to corresponding node>",
        "energy_log_min": "<HDF5 path to corresponding node>",
        "energy_log_max": "<HDF5 path to corresponding node>",
        "num_events_per_file": "<HDF5 path to corresponding node>",
        "sim_spectral_index": "<HDF5 path to corresponding node>",
        "pid": "<HDF5 path to corresponding node>",
      },
      # Mapping from file's nu code to PDG nu codes (only necessary if
      # nu_code values are not the PDG codes listed below)
      "nu_code_to_pdg_map": {
        "<source nue code>": 12,
        "<source nue_bar code>": -12,
        "<source numu code>": 14,
        "<source numu_bar code>": -14,
        "<source nutau code>": 16,
        "<source nutau_bar code>": -16
      },
      # Specify standard cuts
      "cuts": {
        # Cut name; "bgrej" and "analysis" listed here are just example
        # names for cuts one might specify
        "bgrej": {
          # Which of the fields in the field_map (and the derived fields
          # such as true_coszen and reco_coszen) are required for this cut?
          "fields": ["<field1>", "<field2>", ...],
          # Expression for an event to pass the cut (not get thrown away);
          # see below for details on specifying an expression
          "pass_if": "<expression>"
        },
        "analysis": {
          "fields": ["<field1>", "<field2>", ...],
          "pass_if": "<expression>"
        },
        "<cut name>": {
          "fields": ["<field1>", "<field2>", ...],
          "pass_if": "<expression>"
        }
      },
      # Particle identification section
      "pid": {
        # Name of the particle (case-insensitive); e.g., in PINGU this
        # would be "trck" or "cscd"
        "<particle name 1>": {
          # Which of the fields in the field_map (and the derived fields
          # such as true_coszen and reco_coszen) are required for this cut?
          "field": [<field1>, <field2>, ...],
          # Expression for an event to be classified as this type of
          # particle; see below for details on specifying an expression
          "criteria": "<expression>"
        },
        "<particle name 2>": {
          "field": [<field1>, <field2>, ...],
          "criteria": "<expression>"
        }
      }
    }
  }
}
Note that cuts “pass_if” and pid “criteria” expressions can make use of the
numpy namespace and have access to any columns extracted from the source
HDF5 file, by the standardized names given in the “field_map”. For example,
if the following “fields” are specified for a cut in the data_proc_params
dict:
[“cuts_step_1”, “cuts_step_2”]
then the following is a valid “pass_if” expression:
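One plausible form (hypothetical; the actual expression depends on the processing version):
numpy.logical_and(cuts_step_1 == 1, cuts_step_2 == 1)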
Get data attached to an HDF5 node, returned as a dictionary.
The returned dictionary’s keys match those in the field_map and the
dict’s values are the data from the HDF5’s nodes found at the addresses
specified as values in the field_map
Perform mappings from non-standard to standard values (such as
translating non-PDG neutrino flavor codes to PDG codes) and add
fields expected to be useful (such as coszen, derived from zen fields).
Attach / reattach the translated/new fields to the data object passed
into this method.
Populate the Python global namespace with variables named as the
keys in field_map and values loaded from the h5group at addresses
specified by the corresponding values in field_map.
Retrieve data from an HDF5 group h5group according to
expression. This can apply expressions with simple mathematical
operators and numpy functions to multiple fields within the HDF5 file
to derive the output. Python keywords are _not_ allowed, since they
may alias with a name.
Refer to any numpy functions by prefixing with either “np.<func>” or
“numpy.<func>”. In order to specify division, spaces must surround the
forward slash, such that it isn’t interpreted as a path.
Nodes in the HDF5 hierarchy are separated by forward slashes (“/”) in a
path spec. We restrict valid HDF5 node names to contain the characters
a-z, A-Z, 0-9, periods (“.”), and underscores (“_”), with the
additional restriction that the node name must not start with a period
or a number, and a path cannot start with a slash.
exp_vars (bool) – Expand the string using environment variables. E.g.
“$HOME/${vardir}/xyz” will have “$HOME” and “${vardir}” replaced by
the values stored in “HOME” and “vardir”.
exp_user (bool) – Expand special home dir spec character, tilde: “~”.
absolute (bool) – Make a relative path (e.g. “../xyz”) absolute, referenced from the system
root directory, e.g. “/dir/subdir/xyz”.
resolve_symlinks (bool) – Resolve symlinks to the paths they refer to
Dispatch correct file reader based on fmt (if specified) or guess
based on file name’s extension.
Parameters:
fname (string) – File path / name from which to load data.
fmt (None or string) – If string, interpret the file according to this format. If
None, the file format is deduced from the extension found in fname.
**kwargs – All other arguments are passed to the function dispatched to read the
file.
Returns:
Object instantiated from the file (string, dictionary, …). The return type
depends on the format that was read.
Sort a sequence of strings, each with one or more floating point number
fields, using the floating point value(s) (intervening strings are compared
as usual). Note that + and - preceding a number are included in the
floating point value unless signed=False.
Code adapted from nedbatchelder.com/blog/200712/human_sorting.html#comments
Parameters:
l (sequence of strings) – Sequence of strings to be sorted.
signed (bool, optional) – Whether to include a “+” or “-” preceding a number in its value to be
sorted. One might specify False if “-” is used exclusively as a
separator in the string.
reverse (bool, optional) – Whether to reverse the sort order (True => descending order)
Sort a sequence of strings containing integer number fields by the
value of those numbers, rather than by simple alpha order. Useful
for sorting e.g. version strings, etc.
Code adapted from nedbatchelder.com/blog/200712/human_sorting.html#comments
Parameters:
l (sequence of strings) – Sequence of strings to be sorted.
reverse (bool, optional) – Whether to reverse the sort order (True => descending order)
Check whether number of parameters matches dimension of matrix;
matrix is symmetrical; parameter names are unique; and number of
best_fits, labels, and priors all match number of parameters.
Classes for working with neutrino flavors (NuFlav), interactions types
(IntType), “flavints” (a flavor and an interaction type) (NuFlavInt), and
flavint groups (NuFlavIntGroup) in a consistent and convenient manner.
FlavIntData class for working with data stored by flavint (flavor &
interaction type). This should replace the PISA convention of using raw
doubly-nested dictionaries indexed as [<flavor>][<interaction type>]. For
now, FlavIntData objects can be drop-in replacements for such dictionaries
(they can be accessed and written to in the same way since FlavIntData
subclasses dict) but this should be deprecated; eventually, all direct access
of the data structure should be eliminated and disallowed by the FlavIntData
object.
Define convenience tuples ALL_{x} for easy iteration
Context manager to make global __BAR_SSEP__ modification slightly less
error-prone.
__BAR_SSEP__ defines the separator between a flavor and the string “bar”
(to define the flavor as the antiparticle version; e.g. nuebar). Some
datastructures in PISA 2 used ‘_’ between the two (“nue_bar”) while others
did not use this. To make dealing with this (slightly) less painful, this
context manager was introduced to switch between the PISA 3 default (no
‘_’) and the sometimes-used ‘_’ separator. See Examples for usage.
Parameters:
val (string) – Separator to use between flavor (“nue”, “numu”, “nutau”) and “bar”.
Container class for storing data for each NuFlavInt.
Parameters:
val (string, dict, or None) –
Data with which to populate the hierarchy.
If string, interpret as PISA resource and load data from it
If dict, populate data from the dictionary
If None, instantiate with None for all data
The interpreted version of val must be a valid data structure: A
dict with keys ‘nue’, ‘numu’, ‘nutau’, ‘nue_bar’, ‘numu_bar’, and
‘nutau_bar’; and each item corresponding to these keys must itself be a
dict with keys ‘cc’ and ‘nc’.
Notes
Accessing data (both for getting and setting) is fairly flexible. It uses
dict-like square-brackets syntax, but can accept any object (or two
objects) that are convertible to a NuFlav or NuFlavInt object. In the
former case, the entire flavor dictionary (which includes both ‘cc’ and
‘nc’) is returned, while in the latter case whatever lives at the node is
returned.
Initializing, setting and getting data in various ways:
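A hedged sketch of such access (the module path and index forms are inferred from the description above; exact behavior may differ):

    from pisa.utils.flavInt import FlavIntData

    fi_data = FlavIntData()                  # all data initialized to None
    fi_data['numu', 'cc'] = 1.0              # set a single flavint's datum
    fi_data['nue'] = {'cc': 2.0, 'nc': 3.0}  # set both interaction types at once
    print(fi_data['nue', 'nc'])              # -> 3.0
    print(fi_data['numubar'])                # whole flavor dict, incl. 'cc' and 'nc'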
Returns True if all data structures are equal and all numerical
values contained are within relative (rtol) and/or absolute (atol)
tolerance of one another.
Identify flavints with duplicated data (exactly or within a
specified tolerance), convert these NuFlavInts into NuFlavIntGroups,
and return these along with the data associated with each.
Parameters:
rtol – Set to positive value to use as rtol argument for numpy allclose
atol – Set to positive value to use as atol argument for numpy allclose
If either rtol or atol is 0, exact equality is enforced.
Returns:
dupe_flavintgroups (list of NuFlavIntGroup) – A NuFlavIntGroup object is returned for each group of NuFlavInt’s
found with duplicate data
dupe_flavintgroups_data (list of objects) – Data associated with each NuFlavIntGroup in dupe_flavintgroups.
Each object in dupe_flavintgroups_data corresponds to, and in the
same order as, the objects in dupe_flavintgroups.
Container class for storing data for some set(s) of NuFlavIntGroups
(cf. FlavIntData, which stores one datum for each NuFlavInt separately)
Parameters:
val (None, str, or dict) – Data with which to populate the hierarchy
flavint_groups (None, str, or iterable) –
User-defined groupings of NuFlavIntGroups. These can be specified
in several ways.
None
If val == None, flavint_groups must be specified
If val != None, flavint_groups are deduced from the data
string
If val is a string, it is expected to be a comma-separated
list, each field of which describes a NuFlavIntGroup. The
returned list of groups encompasses all possible flavor/int
types, but the groups are mutually exclusive.
iterable of strings or NuFlavIntGroup
If val is an iterable, each member of the iterable is
interpreted as a NuFlavIntGroup.
Returns True if all data structures are equal and all numerical
values contained are within relative (rtol) and/or absolute (atol)
tolerance of one another.
Return tuple of unique charged-current-interaction flavors that
make up this group. Note that only the flavors, and not NuFlavInts, are
returned (cf. method cc_flavints).
Return tuple of unique neutral-current-interaction flavors that
make up this group. Note that only the flavors, and not NuFlavInts, are
returned (cf. method nc_flavints).
Interpret groups to break into neutrino flavor/interaction type(s)
that are to be grouped together; also form singleton groups as specified
explicitly in groups or for any unspecified flavor/interaction type(s).
The returned list of groups encompasses all possible flavor/int types, but
the groups are mutually exclusive.
Get the separator that is set to be used between “base” flavor (“nue”,
“numu”, or “nutau”) and “bar” when converting antineutrino `NuFlav`s or
`NuFlavInt`s to strings.
Translate a “,”-separated string into separate `NuFlavIntGroup`s.
val
“,”-delimited list of valid NuFlavIntGroup strings, e.g.:
“nuall_nc,nue,numu_cc+numubar_cc”
Note that specifying NO interaction type results in both interaction
types being selected, e.g. “nue” implies “nue_cc+nue_nc”. For other
details of how the substrings are interpreted, see docs for
NuFlavIntGroup.
Returns:
grouped, ungrouped
lists of NuFlavIntGroups; the first will have more than one flavint
in each NuFlavIntGroup whereas the second will have just one
flavint in each NuFlavIntGroup. Either list can be of 0-length.
This function does not enforce mutual-exclusion on flavints in the
various flavint groupings, but does list any flavints not grouped
together in the ungrouped return arg. Mutual exclusion can be
enforced through set operations upon return.
A set of functions for calculating flux weights given an array of energy and
cos(zenith) values based on the Honda atmospheric flux tables. A lot of this
functionality will be copied from honda.py, but since I don't want to initialise
this as a stage, it makes sense to copy it in here so somebody can't
accidentally do the wrong thing with that script.
Calculate flux weights for given array of energy and cos(zenith).
Arrays of true energy and zenith are expected to be for MC events, so
they are tested to be of the same length.
en_splines should be the spline for the primary of interest. The entire
dictionary is calculated in the previous function.
Parameters:
true_energies (list or numpy array) – A list of the true energies of your MC events. Pass this in GeV!
true_coszens (list or numpy array) – A list of the true coszens of your MC events
en_splines (list of splines) – A list of the initialised energy splines from the previous function
for your desired primary.
enpow (integer) – The power to which the energy was raised in the construction of the
splines. If you don’t know what this means, leave it as 1.
out (np.array) – optional array to store results
Example
Use the previous function to calculate the spline dict for the South Pole.
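A hedged usage sketch (the loader-function name and the flux-table filename are assumptions for illustration):

    import numpy as np
    from pisa.utils.flux_weights import load_2d_table, calculate_2d_flux_weights

    true_energies = np.array([5.0, 20.0, 50.0])  # GeV
    true_coszens = np.array([-0.9, -0.5, 0.2])
    # Build the spline dict from an azimuth-averaged Honda table (path hypothetical)
    spline_dict = load_2d_table('flux/honda-2015-spl-solmax-aa.d')
    # Flux weights for numu MC events
    weights = calculate_2d_flux_weights(true_energies, true_coszens,
                                        spline_dict['numu'])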
Calculate flux weights for given array of energy, cos(zenith) and
azimuth.
Arrays of true energy, zenith and azimuth are expected to be for MC events,
so they are tested to be of the same length.
en_splines should be the spline for the primary of interest. The entire
dictionary is calculated in the previous function.
Parameters:
true_energies (list or numpy array) – A list of the true energies of your MC events. Pass this in GeV!
true_coszens (list or numpy array) – A list of the true coszens of your MC events
true_azimuths (list or numpy array) – A list of the true azimuths of your MC events. Pass this in radians!
en_splines (list of splines) – A list of the initialised energy splines from the previous function
for your desired primary.
enpow (integer) – The power to which the energy was raised in the construction of the
splines. If you don’t know what this means, leave it as 1.
az_linear (boolean) – Whether or not to linearly interpolate in the azimuthal direction. If
you don’t know why this is an option, leave it as true.
Example
Use the previous function to calculate the spline dict for the South Pole.
2D is expected to mean energy and cosZenith, where azimuth is averaged
over (before being stored in the table) and the zenith range should
include both hemispheres.
Parameters:
flux_file (string) – The location of the flux file you want to spline. Should be a honda
azimuth-averaged file.
enpow (integer) – The power to which the energy will be raised in the construction of the
splines. If you don’t know what this means, leave it as 1.
return_table (boolean) – Flag to true if you want the function to also return a dictionary
of the underlying values from the tables. Useful for comparisons.
3D is expected to mean energy, cosZenith and azimuth. The angular range
should be fully sky.
Parameters:
flux_file (string) – The location of the flux file you want to spline. Should be a Honda
file with azimuth dependence (i.e., not azimuth-averaged).
enpow (integer) – The power to which the energy will be raised in the construction of the
splines. If you don’t know what this means, leave it as 1.
return_table (boolean) – Flag to true if you want the function to also return a dictionary
of the underlying values from the tables. Useful for comparisons.
Convert arg to a tuple: None becomes an empty tuple, an isolated
string becomes a tuple containing that string, and any iterable or sequence
is simply converted into a tuple.
Parameters:
arg (str, sequence of str, iterable of str, or None)
Format number as string in engineering format (10^(multiples-of-three)),
including the most common metric prefixes (from atto to Exa).
Parameters:
n (scalar) – Number to be formatted
sigfigs (int >= 0) – Number of significant figures to limit the result to; default=3.
decimals (int or None) – Number of decimals to display (zeros filled out as necessary). If None,
decimals is automatically determined by the magnitude of the
significand and the specified sigfigs.
sign_always (bool) – Prefix the number with “+” sign if number is positive; otherwise,
only negative numbers are prefixed with a sign (“-“)
Fine-grained control over formatting a number as a string.
Parameters:
value (numeric) – The number to be formatted.
sigfigs (int > 0, optional) – Use up to this many significant figures for displaying a number. You
can use either sigfigs or precision, but not both. If neither are
specified, default is to set sigfigs to 8. See also trailing_zeros.
precision (float, optional) – Round value to a precision the same as the order of magnitude of
precision. You can use either precision or sigfigs, but not both.
If neither is specified, default is to set sigfigs to 8. See also
trailing_zeros.
fmt (None or one of {'sci', 'eng', 'sipre', 'binpre', 'full'}, optional) –
Force a particular format to be used::
None allows the value and what is passed for sci_thresh and
exponent to decide whether or not to use scientific notation
'sci' forces scientific notation
'eng' uses the engineering convention of powers divisible by 3
(10e6, 100e-9, etc.)
'sipre' uses powers divisible by 3 but uses SI prefixes (e.g. k,
M, G, etc.) instead of displaying the exponent
'binpre' uses powers of 1024 and uses IEC prefixes (e.g. Ki, Mi,
Gi, etc.) instead of displaying the exponent
'full' forces printing all digits left and/or right of the
decimal to display the number (no exponent notation or SI/binary
prefix will be used)
Note that the display of NaN and +/-inf are unaffected by
fmt.
exponent (None, integer, or string, optional) – Force the number to be scaled with respect to this exponent. If a
string prefix is passed and fmt is None, then the SI prefix
or binary prefix will be used for the number. E.g., exponent=3
would cause the number 1 to be expressed as '0.001e3', while exponent='k' would cause it to be displayed as '1m'. Both 'μ'
and ‘u’ are accepted to mean “micro”. A non-None value for
exponent forces some form of scientific/engineering notation, so
fmt cannot be 'full' in this case. Finally, if
fmt is 'binpre' then exponent is applied to 1024.
I.e., 1 maps to kibi (Ki), 2 maps to mebi (Mi), etc.
sci_thresh (sequence of 2 integers) – When to switch to scientific notation. The first integer is the order
of magnitude of value at or above which scientific notation will be
used. The second integer indicates the order of magnitude at or below
which the most significant digit falls for scientific notation to be
used. E.g., sci_thresh=(3,-3) means that numbers in the
ones-of-thousands or greater OR numbers in the ones-of-thousandths or
less will be displayed using scientific notation. Note that
fmt, if not None, overrides this behavior. Default is
(10,-5).
inf_thresh (numeric, optional) – Numbers whose magnitude is equal to or greater than this threshold are
considered infinite and therefore displayed using infstr (possibly
including a sign, as appropriate). Default is np.inf.
trailing_zeros (bool, optional) – Whether to display all significant figures specified by sigfigs, even
if this results in trailing zeros. Default is False.
always_show_sign (bool, optional) – Always show a sign, whether number is positive or negative, and whether
exponent (if present) is positive or negative. Default is False.
decstr (string, optional) – Separator to use for the decimal point. E.g. decstr='.' or
decstr=',' for the most common cases, but this can also be used in
TeX tables for alignment on decimal points via decstr='&.&'.
Default is ‘.’.
thousands_sep (None or string, optional) – Separator to use between thousands, e.g. thousands_sep=',' to give
results like '1,000,000', or `thousands_sep=r'\,' for TeX
formatting with small spaces between thousands. Default is None.
thousandths_sep (None or string, optional) – Separator to use between thousandths. Default is None.
left_delimiter (None or string, optional) – Strings to delimit the left and right sides of the resulting string.
E.g. left_delimiter='${' and right_delimiter='}$' could be used
to delimit TeX-formatted strings, such that a number is displayed,
e.g., as r'${1\times10^3}$'. Defaults are None for both.
right_delimiter (None or string, optional) – Strings to delimit the left and right sides of the resulting string.
E.g. left_delimiter='${' and right_delimiter='}$' could be used
to delimit TeX-formatted strings, such that a number is displayed,
e.g., as r'${1\times10^3}$'. Defaults are None for both.
expprefix (None or string, optional) – E.g. use expprefix='e' for simple “e” scientific notation (“1e3”),
or use expprefix=r'\times10^{' and exppostfix=r'}' for
TeX-formatted scientific notation. Use a space (or TeX equivalent) for
binary and SI prefixes. If scientific notation is to be used,
expprefix defaults to 'e'. If either SI or binary prefixes are to be
used, expprefix defaults to ' ' (space). In any case, exppostfix
defaults to None.
exppostfix (None or string, optional) – E.g. use expprefix='e' for simple “e” scientific notation (“1e3”),
or use expprefix=r'\times10^{' and exppostfix=r'}' for
TeX-formatted scientific notation. Use a space (or TeX equivalent) for
binary and SI prefixes. If scientific notation is to be used,
expprefix defaults to 'e'. If either SI or binary prefixes are to be
used, expprefix defaults to ' ' (space). In any case, exppostfix
defaults to None.
nanstr (string, optional) – Not-a-number (NaN) values will be displayed using this string. Default
is ‘nan’ (following the Numpy convention)
infstr (string, optional) – Infinite values will be displayed using this string (note that the sign
is prepended, as appropriate). Default is ‘inf’ (following the Numpy
convention).
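A hedged usage sketch (the module path is assumed; the commented outputs are indicative rather than verified):

    from pisa.utils.format import format_num

    format_num(1234.5, sigfigs=4, fmt='sci')  # scientific notation, 4 sig figs
    format_num(2.5e-7, fmt='sipre')           # SI-prefix form, e.g. "250 n"
    format_num(1048576, fmt='binpre')         # IEC-prefix form, e.g. "1 Mi"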
Convert a human-readable string specifying a list-of-lists of numbers to
a Python list-of-lists of numbers.
Parameters:
hrlol (string) – Human-readable list-of-lists-of-numbers string. Each list specification
is separated by a semicolon, and whitespace is ignored. Refer to
hrlist2list for list specification.
Returns:
lol
Return type:
list-of-lists of numbers
Examples
A single number evaluates to a list with a list with a single number.
>>> hrlol2lol("1")[[1]]
A sequence of numbers or ranges can be specified separated by commas.
>>> hrlol2lol("1, 3.2, 19.8")[[1, 3.2, 19.8]]
A range can be specified with a dash; default is a step size of 1 (or -1 if
the end of the range is less than the start of the range); note that the
endpoint is included, unlike slicing in Python.
>>> hrlol2lol("1-3")[[1, 2, 3]]
The range can go from or to a negative number, and can go in a negative
direction.
>>> hrlol2lol("-1 - -5")[[-1, -3, -5]]
Multiple lists are separated by semicolons, and parentheses and brackets
can be used to make it easier to understand the string.
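A hypothetical illustration (assuming parentheses and brackets are purely cosmetic):
>>> hrlol2lol("(1-3); [5, 6.8]")
[[1, 2, 3], [5, 6.8]]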
Convert a signed or unsigned integer that is bits long to a hexadecimal
representation. As many hex characters are returned as needed to fully specify
any number bits in length, regardless of the value of i.
Parameters:
i (int) – The integer to be converted. Signed integers have a range of
-2**(bits-1) to 2**(bits-1)-1, while unsigned integers have a range of
0 to 2**bits-1.
bits (int) – Number of bits long the representation is
signed (bool) – Whether the number is a signed integer; this is dependent upon the
representation used for numbers, and _not_ whether the value i is
positive or negative.
Returns:
h – string of length ceil(bits/4.0), since it takes this many hex characters
to fully specify the number
Before splitting the list, the string has extraneous whitespace removed
from either end.
The strings that result after the split can have their case forced or be
left alone.
Whitespace surrounding each resulting string (but not whitespace falling
between non-whitespace characters) is removed.
After all of the above, the value can be parsed further by a
user-supplied parse_func.
Note that repeating a separator without intervening values yields
empty-string values.
Parameters:
string (string) – The string to be split
sep (string) – Separator to look for
force_case (None, 'lower', or 'upper') – Whether to force the case of the resulting items: None does not change
the case, while ‘lower’ or ‘upper’ change the case.
parse_func (None or callable) – If a callable is supplied, each item in the list, after the basic
parsing, is processed by parse_func.
Returns:
lst – The types of the items in the list depend upon parse_func if it is
supplied; otherwise, all items are strings.
Return type:
list of objects
Examples
>>> print(split(' One, TWO, three ', sep=',', force_case='lower'))
['one', 'two', 'three']
Smart string formatting for a time difference (in seconds)
Parameters:
dt_sec (numeric) – Time difference, in seconds
hms_always (bool) –
If True, always display hours, minutes, and seconds regardless of the
order-of-magnitude of dt_sec.
If False, display a minimal-length string that is meaningful, by omitting
units that are more significant than those necessary to display dt_sec; if…
* dt_sec < 1 s: use engineering formatting for the number.
* dt_sec is an integer in the range 0-59 (inclusive): sec_decimals is
ignored and the number is formatted as an integer.
See Notes below for handling of units.
(Default: False)
sec_decimals (int) – Round seconds to this number of digits
Notes
If colon notation (e.g. HH:MM:SS.xxx, MM:SS.xxx, etc.) is not used, the
number is only seconds, and is appended by a space ‘ ‘ followed by units
of ‘s’ (possibly with a metric prefix).
Sum of multiple Gaussian curves, normalized to have area of 1.
Parameters:
x (array) – Points at which to evaluate the sum of Gaussians
mu (array) – Means of the Gaussians to accumulate
sigma (array) – Standard deviations of the Gaussians to accumulate
weights (array or None) – Weights given to each Gaussian
implementation (None or string) – One of ‘singlethreaded’, ‘multithreaded’, or ‘cuda’. Passing None, the
function will try to determine which of the implementations is best to
call.
kwargs – Passed on to the underlying implementation
Returns:
outbuf – Resulting sum of Gaussians
Return type:
array
Notes
This function dynamically calls an appropriate implementation depending
upon the problem size, the hardware available (multi-core CPU or a GPU),
and whether the user specifies implementation.
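A sketch of the quantity evaluated, reading “normalized to have area of 1” as dividing by the summed weights (with $w_i = 1$ for all $i$ when weights is None):

$$ f(x) = \frac{1}{\sum_i w_i} \sum_i \frac{w_i}{\sqrt{2\pi}\,\sigma_i} \exp\!\left( -\frac{(x - \mu_i)^2}{2\sigma_i^2} \right) $$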
Return hash for an object. Object can be a numpy ndarray or matrix
(which is serialized to a string), an open file (which has its contents
read), or any pickle-able Python object.
Note that only the first most-significant 8 bytes (64 bits) from the MD5
sum are used in the hash.
Parameters:
obj (object) – Object to hash. Note that the larger the object, the longer it takes to
hash.
hash_to (string) –
‘i’, ‘int’, or ‘integer’: First 8 bytes of the MD5 sum are interpreted
as an integer.
’b’, ‘bin’, or ‘binary’: MD5 sum digest; returns an 8-character string
‘h’, ‘x’, or ‘hex’: MD5 sum hexdigest (string of 16 characters)
‘b64’, ‘base64’: first 8 bytes of MD5 sum are base64 encoded (with ‘+’
and ‘-’ as final two characters of encoding). Returns string of 11
characters.
full_hash (bool) – If True, hash on the full object’s contents (which can be slow) or if
False, hash on a partial object. For example, only a file’s first kB is
read, and only 1000 elements (chosen at random) of a numpy ndarray are
hashed on. This mode of operation should suffice for e.g. a
minimization run, but should _not_ be used for storing to/loading from
disk.
Return the contents of an HDF5 file or node as a nested dict; optionally
return a second dict containing any HDF5 attributes attached to the
entry-level HDF5 entity.
Parameters:
val (string or h5py.Group) –
Specifies entry-level entity
* If val is a string, it is interpreted as a filename; the file is opened
as an h5py.File
* Otherwise, val must be an h5py.Group in an instantiated object
return_node (None or string) – Not yet implemented
choose (None or list) – Optionally can provide a list of variables names to parse (items not in
this list will be skipped, saving time & memory)
Returns:
data – Nested dictionary; keys are HDF5 node names and values contain the
contents of that node. If the entry-level entity of val has “attrs”,
these are extracted and attached as an OrderedDict at data.attrs;
otherwise, this entity is an empty OrderedDict.
Return type:
OrderedDict with additional attr of type OrderedDict named attrs
Store a (possibly nested) dictionary to an HDF5 file or branch node
within an HDF5 file (an h5py Group).
This creates hardlinks for duplicate non-trivial leaf nodes (h5py Datasets)
to minimize storage space required for redundant datasets. Duplication is
detected via object hashing.
NOTE: Branch nodes are sorted before storing (by name) for consistency in
the generated file despite Python dictionaries having no defined ordering
among keys.
Parameters:
data_dict (Mapping) – Dictionary, OrderedDict, or other Mapping to be stored
tgt (str or h5py.Group) – Target for storing data. If tgt is a str, it is interpreted as a
filename; a file is created with that name (overwriting an existing
file, if present). After writing, the file is closed. If tgt is an
h5py.Group, the data is simply written to that Group and it is left
open at function return.
overwrite (bool) – Set to True (default) to allow overwriting existing file. Raise
exception and quit otherwise.
warn (bool) – Issue a warning message if a file is being overwritten. Suppress
warning by setting to False (e.g. when overwriting is the desired
behaviour).
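A hedged round-trip sketch (assuming this module exposes to_hdf and from_hdf as described; the filename is illustrative):

    from pisa.utils.hdf import from_hdf, to_hdf

    data = {'branch': {'leaf': [1, 2, 3]}}
    to_hdf(data, 'data.hdf5', overwrite=True, warn=False)
    loaded = from_hdf('data.hdf5')  # OrderedDict mirroring the HDF5 hierarchy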
Interpret arrays (lists by default) as numpy arrays where this does
not yield a string or object array; also handle conversion of
particularly-formatted input to pint Quantities.
Uses a custom parser that automatically converts numpy arrays to lists.
If filename has a “.bz2” extension, the contents will be compressed
(using bz2 and highest-level of compression, i.e., -9).
If filename has a “.xor” extension, the contents will be xor-scrambled to
make them human-unreadable (this is useful for, e.g., blind fits).
Parameters:
content (obj) – Object to be written to file. Tries making use of the object’s own
to_json method if it exists.
filename (str) – Name of the file to be written to. Extension has to be ‘json’ or ‘bz2’.
indent (int) – Pretty-printing. Cf. documentation of json.dump() or json.dumps()
overwrite (bool) – Set to True (default) to allow overwriting existing file. Raise
exception and quit otherwise.
warn (bool) – Issue a warning message if a file is being overwritten (True,
default). Suppress warning by setting to False (e.g. when overwriting
is the desired behaviour).
sort_keys (bool) – Output of dictionaries will be sorted by key if set to True.
Default is False. Cf. json.dump() or json.dumps().
Handle zenith like coszen? Or better: define a set of variables to perform
reflection on, plus reflection parameters (e.g. reflect_fract or somesuch
to stand in for coszen_reflection, and reflect_dims as a stand-in for
coszen_name; also need some way to specify whether to reflect about the lower
and/or upper edge); each such parameter can either be a single value, or a
sequence with one value per variable.
Any good reason for 0.25 and ‘scott’ defaults? If not, don’t define a
default and force the user to explicitly set this when function is called.
Run kernel density estimation (KDE) for an array of data points, and
then evaluate them on a histogram-like grid to effectively produce a
histogram-like output.
Handles reflection at the coszen edges, and expects coszen to be in the binning.
bw_method (string) – ‘scott’ or ‘silverman’ (see kde module)
adaptive (bool) – (see kde module)
alpha (float) – A parameter for the KDEs (see kde module)
use_cuda (bool) – Run on GPU (only works with <= 2d)
coszen_reflection (float) – Part (number between 0 and 1) of binning that is reflected at the
coszen -1 and 1 edges
coszen_name (string) – Binning name to identify the coszen bin that needs to undergo special
treatment for reflection
oversample (int) – Evaluate KDE at more points per bin, takes longer, but is more accurate
stack_pid (bool) – Treat each pid bin separately, not as another dimension of the KDEs
Only supported for two additional dimensions, pid binning must be named pid
bootstrap (bool) – Use the bootstrap_kde class to produce error estimates on the KDE histograms.
Slow, not recommended during fits.
bootstrap_niter (int) – Number of bootstrap iterations.
Returns:
histogram (numpy.ndarray)
ToDo
* Maybe return Map with binnings attached instead of nd-array?
* Generalize to handle any dimensions with any reflection criteria
Barlow-Beeston log-likelihood (constant terms not omitted)
Link to paper: https://doi.org/10.1016/0010-4655(93)90005-W
– Input variables –
data = data histogram
mc = weighted MC histogram
unweighted_mc = unweighted MC histogram
weights = weight of each bin
– Output –
llh = LLH values in each bin
– Notes –
Shape of data, mc, unweighted_mc, weights and llh must be identical
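A sketch of the per-bin form for a single MC sample, following Barlow & Beeston (notation here is mine: d = data count, a = unweighted MC count, w = average weight, A = fitted “true” MC expectation, so the prediction is f = wA):

$$ \ln L = d \ln f - f - \ln\Gamma(d+1) + a \ln A - A - \ln\Gamma(a+1) $$

The $\ln\Gamma$ terms are the constant factorial terms that, per the note above, are not omitted.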
Log-likelihood based on the poisson-gamma mixture. This is a Poisson likelihood using a Gamma prior.
This implementation is based on the implementation of Austin Schneider (aschneider@icecube.wisc.edu)
– Input variables –
data = data histogram
sum_w = MC histogram
sum_w2 = Uncertainty map (sum of weights squared in each bin)
a, b = hyperparameters of the Gamma prior for MC counts; the default values a = 1 and b = 0 correspond to LEff (eq. 3.16) in https://doi.org/10.1007/JHEP06(2019)030
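A sketch of the marginal log-likelihood consistent with the description above (the function name is an assumption, and bins with sum_w2 == 0 would need special handling):

    import numpy as np
    from scipy.special import gammaln

    def poisson_gamma_llh_sketch(data, sum_w, sum_w2, a=1.0, b=0.0):
        # Match the Gamma prior to the MC mean and variance in each bin
        alpha = sum_w**2 / sum_w2 + a  # shape
        beta = sum_w / sum_w2 + b      # rate
        k = data
        # Log of the negative-binomial marginal of a Gamma-mixed Poisson
        return (alpha * np.log(beta) + gammaln(k + alpha)
                - gammaln(k + 1.0) - (k + alpha) * np.log(1.0 + beta)
                - gammaln(alpha))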
Server(s) for handling llh requests from a client: client passes free param
values, server sets these on its DistributionMaker, generates outputs, and
compares the resulting distributions against a reference template, returning
the llh value.
Send a Python object over a socket. Object is pickle-encoded as the
payload and sent preceded by a 4-byte header which indicates the number of
bytes of the payload.
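A minimal sketch of the framing described above; the byte order and signedness of the 4-byte header are assumptions:

    import pickle
    import struct

    def send_obj(sock, obj):
        # Pickle the object, then prepend a 4-byte length header
        payload = pickle.dumps(obj)
        sock.sendall(struct.pack('!i', len(payload)) + payload)

    def receive_obj(sock):
        # Read the 4-byte header, then exactly that many payload bytes
        (length,) = struct.unpack('!i', _recv_exact(sock, 4))
        return pickle.loads(_recv_exact(sock, length))

    def _recv_exact(sock, n):
        buf = b''
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError('socket closed mid-message')
            buf += chunk
        return buf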
This module sets up the logging system by looking for a “logging.json”
configuration file. It will search (in this order) the local directory, $PISA
and finally the package resources. The loggers found in there will be lifted
to the module namespace.
Currently, we have three loggers:
* logging: generic, for what is going on (typically “opening file x” or
“doing this now” messages)
* physics: for any physics output that might be interesting (“have x many
events”, “the flux is …”)
* tprofile: for how much time it takes to run some step (in the format
“time : start bla”, “time : stop bla”)
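A usage sketch; the import path is an assumption:

    from pisa.utils.log import logging, physics, tprofile  # assumed path

    n_events = 12345
    logging.info('opening file x')
    physics.info('have %d many events', n_events)
    tprofile.debug('time : start bla')
    # ... run the step being timed ...
    tprofile.debug('time : stop bla')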
Find the positive semi-definite matrix closest to A.
The closeness to A is measured by the Frobenius norm. The matrix closest to A
by that measure is uniquely defined in [3].
Parameters:
A (numpy.ndarray) – Symmetric matrix
return_distance (bool, optional) – Return distance of the input matrix to the approximation as given in
theorem 2.1 in [3].
This can be compared to the actual Frobenius norm between the
input and output to verify the calculation.
Returns:
X – Positive semi-definite matrix approximating A.
Return type:
numpy.ndarray
Notes
This function is a modification of [1], which is a Python adaptation of [2], which
credits [3].
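A sketch of the standard eigenvalue-clipping construction (not necessarily this code’s exact implementation), which for a symmetric input yields the Frobenius-nearest positive semi-definite matrix, together with the distance from theorem 2.1 in [3]:

    import numpy as np

    def nearest_psd_sketch(A):
        # Symmetrize, then clip negative eigenvalues to zero
        B = (A + A.T) / 2.0
        eigval, eigvec = np.linalg.eigh(B)
        X = (eigvec * np.clip(eigval, 0.0, None)) @ eigvec.T
        # Distance per the theorem: sqrt of the sum of squared
        # negative eigenvalues
        distance = np.sqrt(np.sum(np.clip(eigval, None, 0.0) ** 2))
        return X, distance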
run_settings dictionary format (and same for corresponding JSON file, e.g.
resources/events/mc_sim_run_settings.json); example is for PINGU but should
generalize to DeepCore, etc. Note that a JSON file will use null and not
None.
(Also note that expressions for numerical values utilizing the Python/numpy
namespace are valid, e.g. “2*pi” will be evaluated within the code to its
decimal value.):
{
    # Specify the detector name, lower case
    "pingu": {
        # Monte Carlo run number for this detector
        "388": {
            # Version of geometry, lower-case. E.g., ic86 for IceCube/DeepCore
            "geom": "v36",
            # A straightforward way of computing aeff/veff/meff is to keep all
            # simulated events and compare the #analysis/#simulated. If all
            # simulated events are kept, then the filename containing these is
            # recorded here.
            "all_gen_events_file": None,
            # Max and min azimuth angle simulated (rad)
            "azimuth_max": "2*pi",
            "azimuth_min": 0,
            # Max and min energy simulated (GeV)
            "energy_max": 80,
            "energy_min": 1,
            # GENIE simulates some un-physical events (interactions that will
            # not occur in nature). The number below was arrived at by Ken
            # Clark, so ask him for more info.
            "physical_events_fract": 0.8095,
            # GENIE has a prescale factor (TODO: generalize or eliminate for
            # other xsec?)
            "genie_prescale_factor": 1.2,
            # Neutrino flavors simulated
            "flavints": "nutau,nutaubar",
            # #nu / (#nu + #nubar) simulated
            "nu_to_total_fract": 0.5,
            # Number of events simulated per I3 file
            "num_events_per_file": 250000,
            # Number of I3 files used
            "num_i3_files": 195,
            # Simulated spectral index gamma; value of 1 => E^{-1}
            "sim_spectral_index": 1,
            # Version of neutrino/ice cross sections used for the simulation
            "xsec_version": "genie_2.6.4",
            # Max and min zenith angle simulated (rad)
            "zenith_max": "pi",
            "zenith_min": 0
        }
    }
}
Fraction of events generated (either particles or antiparticles).
Specifying whether you want the fraction for particles or antiparticles is
done by specifying one (and only one) of the three parameters: barnobar,
is_particle, or flav_or_flavint.
Parameters:
barnobar (None or int) – -1 for antiparticle, +1 for particle
is_particle (None or bool) – True for particle, False for antiparticle
flav_or_flavint (None or convertible to NuFlav or NuFlavInt) – Particle or antiparticle is determined from the flavor or flavint
passed
barnobar (None or int) – -1 for antiparticle or +1 for particle
is_particle (None or bool)
flav_or_flavint (None or convertible to NuFlav or NuFlavInt) – If one of barnobar, is_particle, or flav_or_flavint is
specified, returns only the number of particles or antiparticles
generated. Otherwise (if none of those is specified), return the
total number of generated events.
include_physical_fract (bool) – Whether to include the “GENIE physical fraction”, which accounts
for events that are generated but are un-physical and therefore
will never be detectable. These are removed to not penalize
detection efficiency.
Decorator to assign the right jit for different targets
In case of non-cuda targets, all instances of cuda.local.array
are replaced by np.empty. This is a dirty fix; hopefully numba will
support numpy array allocation in the near future, and this will no
longer be necessary.
Parameters:
func (callable)
Returns:
new_nb_func – Refactored version of func but with cuda.local.array replaced by
np.empty if TARGET == “cpu”. For either TARGET, the returned
function will be callable within numba code for that target.
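A minimal sketch of such a decorator; the name myjit and the source-rewriting approach shown here are assumptions about one plausible implementation (it only handles module-level functions):

    import inspect

    import numpy as np
    from numba import cuda, jit

    TARGET = "cpu"  # or "cuda"

    def myjit(func):
        """Compile func for TARGET, rewriting cuda.local.array into
        np.empty when compiling for the CPU."""
        if TARGET == "cuda":
            return cuda.jit(func, device=True)
        source = inspect.getsource(func)
        source = source.replace("@myjit", "")  # drop the decorator line
        source = source.replace("cuda.local.array", "np.empty")
        namespace = {"np": np}
        exec(source, namespace)  # re-define the rewritten function
        return jit(namespace[func.__name__], nopython=True)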
Derive a numpy.random.RandomState object (usable to generate random
numbers and distributions) from a flexible specification.
Parameters:
random_state (None, RandomState, string, int, state vector, or seq of int) –
If instantiated RandomState object is passed, it is used directly
If string: must be either ‘rand’ or ‘random’; random state is
instantiated at random from either /dev/urandom or (if that is not
present) the clock. This creates an irreproducible random state.
If int or sequence of length one: This is used as the seed value;
must be in [0, 2**32).
If sequence of two integers: first must be in [0, 32768): 15
most-significant bits. Second must be in [0, 131072): 17
least-significant bits.
If sequence of three integers: first must be in [0, 4): 2
most-significant bits. Second must be in [0, 8192): next 13
(less-significant) bits. Third must be in [0, 131072): 17
least-significant bits.
If a “state vector” (sequence of length five usable by
numpy.random.RandomState.set_state), set the random state using
this method.
Returns:
random_state – Object callable like numpy.random (e.g. random_state.rand(10, 10)),
but with exclusively local state (whereas numpy.random has global
state).
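A usage sketch covering the documented spec forms, assuming the function described here is named get_random_state:

    rs = get_random_state('random')        # irreproducible state
    rs = get_random_state(42)              # single seed in [0, 2**32)
    rs = get_random_state([12345, 67890])  # 15 msb + 17 lsb parts
    x = rs.rand(10, 10)                    # use like numpy.random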
Find a file or directory in the filesystem (i.e., something that the
operating system can locate, as opposed to Python package resources, which
can be located within a package and therefore hidden from the filesystem).
Parameters:
pathspec (string)
fail (bool)
Return type:
None (if not found) or string (absolute path to file or dir if found)
First check if resource is an absolute path, then check relative to
the paths specified by the PISA_RESOURCES environment variable (if it is
defined). Otherwise, look in the resources directory of the PISA
installation.
Note that the PISA_RESOURCES environment variable can contain multiple
paths, each separated by a colon. Due to using colons as separators,
however, the paths themselves cannot contain colons.
Note also that if PISA is packaged as an archive distribution (e.g. zipped egg),
this method extracts the resource (if directory, the entire contents are
extracted) to a temporary cache directory. Therefore, it is preferable to
use the open_resource method directly, and avoid this method if possible.
Parameters:
resource (str) – Resource path; can be path relative to CWD, path relative to
PISA_RESOURCES environment variable (if defined), or a package resource
location relative to the pisa_examples/resources sub-directory. Within
each path specified in PISA_RESOURCES and within the
pisa_examples/resources dir, the sub-directories ‘data’, ‘scripts’,
and ‘settings’ are checked for resource _before_ the base directories
are checked. Note that the first result found is returned.
fail (bool) – If True, raise IOError if resource not found
If False, return None if resource not found
Returns:
String if resource is found (relative path to the file or directory); if
not found and fail is False, returns None.
Raises:
IOError if resource is not found and fail is True. –
Find the resource file (see find_resource), open it, and return a file
handle.
Parameters:
resource (str) – Resource path; can be path relative to CWD, path relative to
PISA_RESOURCES environment variable (if defined), or a package resource
location relative to PISA’s pisa_examples/resources sub-directory.
Within each path specified in PISA_RESOURCES and within the
pisa_examples/resources dir, the sub-directories ‘data’, ‘scripts’,
and ‘settings’ are checked for resource _before_ the base directories
are checked. Note that the first result found is returned.
mode (str) – ‘r’, ‘w’, or ‘rw’; only ‘r’ is valid for package resources (as these
cannot be written)
Return type:
binary stream object (which behaves identically to a file object)
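A usage sketch, assuming the function is named open_resource; the resource path below is hypothetical:

    # Read a packaged settings file; only mode='r' is valid for
    # package resources
    with open_resource('settings/osc/example.cfg', mode='r') as f:
        contents = f.read()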
Contained class for operating on Spline objects for various neutrino
flavours.
Inherits from FlavIntData object. Provides methods to allow
evaluation of the splines for all neutrino flavours.
Parameters:
inSpline (Spline or tuple of Spline) – Spline objects with name entry corresponding to a neutrino flavour
nue, numu, nuebar, numubar and also corresponding to an
interaction type cc and nc if the flag interactions is True.
interactions (bool) – Default = True
Flag to specify whether to store flavours or flavour+interaction
signatures.
Compute the Barlow LLH taking into account finite statistics.
The likelihood is described in this paper: https://doi.org/10.1016/0010-4655(93)90005-W
Parameters:
actual_values (numpy.ndarrays of same shape)
expected_values (numpy.ndarrays of same shape)
s (float) – sigma for smearing term (= the uncertainty to be accounted for)
nsigma (int) – The range in sigmas over which to do the convolution; 3 sigmas is > 99%,
so should be enough
steps (int) – Number of steps to do the integration in (actual steps are 2*steps + 1,
so this is the number of steps to each side of the gaussian smearing term)
Compute the log-likelihood (llh) based on eq. 3.16 - https://doi.org/10.1007/JHEP06(2019)030
accounting for finite MC statistics.
This is the most recommended likelihood in the paper.
Parameters:
actual_values (numpy.ndarrays of same shape)
expected_values (numpy.ndarrays of same shape)
Returns:
llh – llh corresponding to each pair of elements in actual_values and
expected_values.
Compute the log-likelihood (llh) based on LMean in table 2 - https://doi.org/10.1007/JHEP06(2019)030
accounting for finite MC statistics.
This is the second most recommended likelihood in the paper.
Parameters:
actual_values (numpy.ndarrays of same shape)
expected_values (numpy.ndarrays of same shape)
Returns:
llh – llh corresponding to each pair of elements in actual_values and
expected_values.
Convolved Poisson likelihood, normalized so that the value at k = l
(Asimov) does not change
Parameters:
k (float)
l (float)
s (float) – sigma for smearing term (= the uncertainty to be accounted for)
nsigma (int) – The range in sigmas over which to do the convolution, 3 sigmas is >
99%, so should be enough
steps (int) – Number of steps to do the integration in (actual steps are 2*steps + 1,
so this is the number of steps to each side of the gaussian smearing term)
Returns:
Convolved Poisson likelihood, normalized so that the value at k = l
(Asimov) does not change
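One plausible reading of this computation, as a sketch; the function name, the grid construction, the continuous-k extension of the Poisson pmf, and the exact normalization are all assumptions:

    import numpy as np
    from scipy.special import gammaln

    def conv_poisson_sketch(k, l, s, nsigma=3, steps=50):
        def poisson_pdf(k, mu):
            # Continuous-k extension of the Poisson pmf via gammaln
            return np.exp(k * np.log(mu) - mu - gammaln(k + 1.0))

        # 2*steps + 1 points spanning +/- nsigma sigmas around l
        mu = np.clip(np.linspace(l - nsigma * s, l + nsigma * s,
                                 2 * steps + 1), 1e-10, None)
        gauss = np.exp(-0.5 * ((mu - l) / s) ** 2) / (s * np.sqrt(2 * np.pi))
        conv = np.trapz(poisson_pdf(k, mu) * gauss, mu)
        # Normalize so that the Asimov point k == l is unchanged
        conv_at_asimov = np.trapz(poisson_pdf(l, mu) * gauss, mu)
        return conv * poisson_pdf(l, l) / conv_at_asimov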
Fixed-bandwidth (standard) Gaussian KDE using the Improved
Sheather-Jones bandwidth.
Code adapted for Python from the implementation in Matlab by Zdravko Botev.
Ref: Z. I. Botev, J. F. Grotowski, and D. P. Kroese. Kernel density
estimation via diffusion. The Annals of Statistics, 38(5):2916-2957, 2010.
Parameters:
data (array)
weights (array or None)
n_dct (None or int) – Number of points with which to form a regular grid, from min to
max; histogram values at these points are sent through a discrete
Cosine Transform (DCT), so n_dct should be an integer power of 2 for
speed purposes. If None, uses next-highest-power-of-2 above
len(data)*10.
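A usage sketch; the name fbwkde is taken from the next entry below, and the returned triple is an assumption:

    import numpy as np

    data = np.random.normal(size=2000)
    # assumed to return (bandwidth, evaluation grid, density estimates)
    bw, grid, dens = fbwkde(data, weights=None, n_dct=1024)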
Variable-bandwidth (standard) Gaussian KDE that uses the function
fbwkde for a pilot density estimate.
Parameters:
data (array) – The data points for which the density estimate is sought
weights (array or None) – Weight for each point in data
n_dct (None or int) – Number of points with which to form a regular grid, from min to
max; histogram values at these points are sent through a discrete
Cosine Transform (DCT), so n_dct should be an integer power of 2 for
speed purposes. If None, uses next-highest-power-of-2 above
len(data)*10.
min (None or float) – Minimum of range over which to compute density.
If None, defaults to min(data) - range(data)/2
max (None or float) – Maximum of range over which to compute density.
If None, defaults to max(data) + range(data)/2
n_addl_iter (int >= 0) – Number of additional iterations on the Abramson VBW density after
the initial VBW estimate. This can help smooth the tails of the
distribution, at the expense of additional computational cost.
evaluate_dens (bool) – Whether to evaluate and return the density estimate on the points
defined by evaluate_at
evaluate_at (None, float, or array of float) –
Point(s) at which to evaluate the density estimate. If None,
evaluate_at = np.linspace(min + dx/2, max - dx/2, n_dct)
where
dx = (max - min) / n_dct
Returns:
kernel_bandwidths (array)
evaluate_at (array) – Locations at which the density estimates would be evaluated
density (array) – Density estimates
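A usage sketch of the documented signature and returns, assuming the function described here is named vbwkde:

    import numpy as np

    data = np.random.normal(loc=0.0, scale=1.0, size=5000)
    kernel_bandwidths, eval_pts, density = vbwkde(
        data, weights=None, n_dct=1024, min=-5.0, max=5.0,
        n_addl_iter=0, evaluate_dens=True, evaluate_at=None,
    )
    # the density should integrate to ~1 over [min, max]
    print(np.trapz(density, eval_pts))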
Notes
Specifying the range:
The specification of min and max are critical for obtaining a
reasonable density estimate. If the true underlying density slowly
decays to zero on one side or the other, like a gaussian, specifying
too-small a range will distort the edge the VBW-KDE finds. On the
other hand, an abrupt cut-off in the distribution should be
accompanied by a similar cutoff in the computational range (min and/or
max). The algorithm here will approximate such a sharp cut-off with
roughly the same performance as the reflection method for standard
KDEs (as the fixed-BW portion uses a DCT of the data), but note that
this will not perform as well as polynomial-edges or other
modifications that have been proposed in the literature.