pybilt.common package¶

Submodules¶

pybilt.common.distance_cutoff_clustering module¶

Function to compute hierarchical distance cutoff clusters.

pybilt.common.distance_cutoff_clustering.distance_cutoff_clustering(vectors, cutoff, dist_func, min_size=1, *df_args, **df_kwargs)[source]¶

Hierarchical distance cutoff clustering.

This function takes a set of vector points and clusters them using a hierarchical distance-based clustering algorithm. A point joins a cluster whenever it is within the cutoff distance of any point already in that cluster.

Parameters

vectors (np.array, list like) – The array of vector points.
cutoff (float) – The cutoff distance.
dist_func (function) – The function to use when computing the distance between points.
min_size (Optional[int]) – The minimum size of a cluster. Defaults to 1.
*df_args – Any additional arguments to be passed to the distance function (dist_func).
**df_kwargs – Any additional keyword arguments to be passed to the distance function (dist_func).

Returns

A list of clusters, where each cluster is a list of the indices of the points in that cluster.

Return type

list
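
The clustering rule described above can be sketched in pure Python. This is a minimal illustration and not pybilt's implementation: the names cutoff_clusters and euclidean are hypothetical, and the choice of <= (rather than <) for the cutoff test is an assumption.

```python
import math


def euclidean(v_a, v_b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_a, v_b)))


def cutoff_clusters(vectors, cutoff, dist_func, min_size=1):
    """Group point indices so that every point in a cluster is within
    `cutoff` of at least one other point in that cluster."""
    unassigned = set(range(len(vectors)))
    clusters = []
    while unassigned:
        # Start a new cluster from the lowest unassigned index.
        seed = min(unassigned)
        unassigned.remove(seed)
        cluster = [seed]
        frontier = [seed]
        while frontier:
            current = frontier.pop()
            # Collect every unassigned point within the cutoff of `current`.
            near = [j for j in unassigned
                    if dist_func(vectors[current], vectors[j]) <= cutoff]
            for j in near:
                unassigned.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        # Clusters smaller than min_size are discarded.
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return clusters


points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.2, 5.0)]
clusters = cutoff_clusters(points, 0.6, euclidean)
# clusters == [[0, 1, 2], [3, 4]]
```

Points 0 and 2 are 1.0 apart, farther than the cutoff, yet they share a cluster because both are within 0.6 of point 1.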

pybilt.common.distance_cutoff_clustering.distance_euclidean(v_a, v_b)[source]¶

Compute the Euclidean distance between two vectors.

Parameters

v_a (numpy.array, array like) – The first input vector.
v_b (numpy.array, array like) – The second input vector.

Returns

The Euclidean distance between the two vectors.

Return type

float

Notes

The two vectors should have the same size and dimension.

pybilt.common.distance_cutoff_clustering.distance_euclidean_pbc(v_a, v_b, box_lengths, center='zero')[source]¶

Compute the Euclidean distance between two vectors under periodic boundaries.

Parameters

v_a (numpy.array, array like) – The first input vector.
v_b (numpy.array, array like) – The second input vector.
box_lengths (numpy.array, array like) – The periodic boundary box lengths for each dimension.
center (Optional[str, array like]) – Set the coordinate center of the periodic box dimensions. Defaults to ‘zero’, which sets the center to numpy.zeros(len(box_lengths)). Also accepts the string value ‘box_half’, which sets the center to 0.5*box_lengths.

Returns

The Euclidean distance between the two vectors.

Return type

float

Notes

The two vectors should have the same size and dimension, while box_lengths should have the length of the vector dimension.
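
A distance under periodic boundaries is commonly computed with the minimum-image convention. The sketch below illustrates that idea only; minimum_image_distance is a hypothetical name, and it ignores the center option, so it should not be read as pybilt's implementation.

```python
import math


def minimum_image_distance(v_a, v_b, box_lengths):
    """Euclidean distance using the nearest periodic image per dimension."""
    total = 0.0
    for a, b, length in zip(v_a, v_b, box_lengths):
        delta = a - b
        # Shift the separation by whole box lengths into [-length/2, length/2].
        delta -= length * round(delta / length)
        total += delta * delta
    return math.sqrt(total)


# Two points near opposite faces of a 10 x 10 box are close neighbors
# once periodic images are taken into account.
d = minimum_image_distance((0.5, 0.5), (9.5, 0.5), (10.0, 10.0))
# d == 1.0
```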

pybilt.common.distance_cutoff_clustering.vector_difference_pbc(v_a, v_b, box_lengths, center='zero')[source]¶

Compute the Euclidean distance between two vectors under periodic boundaries.

Parameters

v_a (numpy.array, array like) – The first input vector.
v_b (numpy.array, array like) – The second input vector.
box_lengths (numpy.array, array like) – The periodic boundary box lengths for each dimension.
center (Optional[str, array like]) – Set the coordinate center of the periodic box dimensions. Defaults to ‘zero’, which sets the center to numpy.zeros(len(box_lengths)). Also accepts the string value ‘box_half’, which sets the center to 0.5*box_lengths.

Returns

The Euclidean distance between the two vectors.

Return type

float

Notes

The two vectors should have the same size and dimension, while box_lengths should have the length of the vector dimension.

pybilt.common.gaussian module¶

Define Gaussian function objects.

This module defines the Gaussian class and the GaussianRange class.

class pybilt.common.gaussian.Gaussian(mean, std)[source]¶

Bases: object

A Gaussian function object.

mean¶

The mean of the Gaussian.

Type: float

std¶

The standard deviation of the Gaussian.

Type: float

Initialize a Gaussian function object.

Parameters

mean (float) – Set the mean of the Gaussian.
std (float) – Set the standard deviation of the Gaussian.

eval(x_in)[source]¶

Return the Gaussian function evaluated at the input x value.

Parameters: x_in (float) – The x value to evaluate the function at.
Returns: The function evaluation for the Gaussian.
Return type: float

reset_mean(new_mean)[source]¶

Change the mean of the Gaussian function.

Parameters: new_mean (float) – The new mean of the Gaussian function.
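
For orientation, a Gaussian with a given mean and standard deviation can be evaluated as follows. This sketch assumes the normalized probability-density form; whether Gaussian.eval applies the same normalization is not stated here, and the function name gaussian is hypothetical.

```python
import math


def gaussian(x, mean, std):
    """Normalized Gaussian (probability density) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Value at the mean of a unit Gaussian: 1 / sqrt(2 * pi).
peak = gaussian(0.0, 0.0, 1.0)
```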

class pybilt.common.gaussian.GaussianRange(in_range, mean, std, npoints=200)[source]¶

Bases: object

Define a Gaussian function over a range.

This object is used to define a Gaussian function over a finite range and store its values as evaluated at points evenly spaced over the range. The points can then be used, for example, to integrate the Gaussian function over the range by numerical quadrature.

mean¶

The mean of the Gaussian.

Type: float

std¶

The standard deviation of the Gaussian.

Type: float

upper¶

The upper boundary of the range.

Type: float

lower¶

The lower boundary of the range.

Type: float

npoints¶

The number of points to evaluate in the range.

Type: int

Initialize the GaussianRange object.

The GaussianRange stores the values of the Gaussian function with the input mean and standard deviation evaluated at evenly spaced points in the specified x-value range.

Parameters

in_range (tuple, list) – The endpoints of the range, e.g. (x_start, x_end).
mean (float) – The mean of the Gaussian function.
std (float) – The standard deviation of the Gaussian function.
npoints (Optional[int]) – The number of x-value points at which to evaluate the Gaussian function in the specified range (i.e. in_range). Defaults to 200.

eval(x_in)[source]¶

Return the Gaussian function evaluated at the input x value.

Parameters: x_in (float) – The x value to evaluate the function at.
Returns: The function evaluation for the Gaussian.
Return type: float

get_values()[source]¶

Return the x and y values for the Gaussian range function.

Returns: The x and y values for the function, returned as (x_values, y_values).
Return type: tuple

integrate_range(lower, upper)[source]¶

Returns the numerical integration of the Gaussian range.

This function does a simple quadrature for the Gaussian function as evaluated on the range (or subset of the range) specified at initialization.

Parameters

lower (float) – The lower boundary for the integration.
upper (float) – The upper boundary for the integration.

Returns

The numerical value of the Gaussian range integrated from lower to upper.

Return type

float

Notes

This function does not thoroughly check the bounds, so if upper is less than lower the function will break.
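
A simple quadrature over evenly spaced points can be sketched with the trapezoid rule. This is an illustration under stated assumptions: the names gaussian and trapezoid_on_grid are hypothetical, the Gaussian is taken to be normalized, and the actual quadrature rule used by integrate_range may differ.

```python
import math


def gaussian(x, mean, std):
    """Normalized Gaussian evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def trapezoid_on_grid(in_range, mean, std, lower, upper, npoints=200):
    """Trapezoid-rule integral over the grid points that fall inside
    [lower, upper]."""
    x_start, x_end = in_range
    step = (x_end - x_start) / (npoints - 1)
    xs = [x_start + i * step for i in range(npoints)]
    ys = [gaussian(x, mean, std) for x in xs]
    # Keep only the pre-evaluated points inside the requested bounds.
    inside = [(x, y) for x, y in zip(xs, ys) if lower <= x <= upper]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(inside, inside[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


# Integrate a unit Gaussian stored on [-5, 5] over the sub-range [-1, 1].
area = trapezoid_on_grid((-5.0, 5.0), 0.0, 1.0, -1.0, 1.0)
```

Because only stored grid points are used, the bounds are effectively snapped inward to the nearest grid points, so the result slightly underestimates the exact one-sigma probability of about 0.683.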

normalize()[source]¶

Normalizes (by area) the Gaussian function values over the range.

reset_mean(new_mean)[source]¶

Change the mean of the Gaussian function.

Parameters: new_mean (float) – The new mean of the Gaussian function.

Notes

This function does not re-evaluate the Gaussian range and therefore only affects the output of the eval function.

sum_range(lower, upper)[source]¶

Returns the sum over the Gaussian range.

This function sums the Gaussian function at the points that were evaluated on the range (or subset of the range) specified at initialization.

Parameters

lower (float) – The lower boundary for the sum.
upper (float) – The upper boundary for the sum.

Returns

The numerical value of the Gaussian range as summed from lower to upper.

Return type

float

Notes

This function does not thoroughly check the bounds, so if upper is less than lower the function will break.

pybilt.common.knn_entropy module¶

Functions to evaluate information theoretic measures using knn approaches.

This module defines a set of functions to compute information theoretic measures (i.e. Shannon Entropy, Mutual Information, etc.) using the k-nearest neighbors (knn) approach.

pybilt.common.knn_entropy.conditional_mutual_information(var_tuple, cond_tuple, k=2)[source]¶

Returns an estimate of the conditional mutual information.

This function computes an estimate of the mutual information between a set of random variables or random vectors conditioned on other random variables or vectors using knn estimators for the entropy calculations.

Parameters

var_tuple (tuple) – A tuple of random variables or random vectors (i.e. numpy.array) to estimate the mutual information between; e.g. var_tuple = (X, Y) where X has the form X = {x_1, x_2, x_3,…,x_N} and Y has the form Y = {y_1, y_2, y_3,…,y_N}, or where X has the form X = {(x_X1, y_X1), (x_X2, y_X2),…,(x_XN, y_XN)} and Y has the form Y = {(x_Y1, y_Y1), (x_Y2, y_Y2),…,(x_YN, y_YN)}.
cond_tuple (tuple) – A tuple of random variables or random vectors (i.e. numpy.array) that the mutual information is to be conditioned on; e.g. cond_tuple = (X) where X has the form X = {x_1, x_2, x_3,…,x_N}.
k (Optional[int]) – The number of nearest neighbors to store for each point. Defaults to 2.

Returns

The conditional mutual information estimate.

Return type

float

Notes

The information entropies used to estimate the mutual information are computed using the shannon_entropy function. All input random variable/vector arrays must have the same shape.
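
For two variables X and Y conditioned on Z, the conditional mutual information can be written in terms of joint Shannon entropies using the standard identity below. That the function combines its shannon_entropy estimates in exactly this way is an assumption, not something stated in this documentation.

```latex
I(X; Y \mid Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)
```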

pybilt.common.knn_entropy.k_nearest_neighbors(X, k=1)[source]¶

Get the k-nearest neighbors between points in a random variable/vector.

Determines the k nearest neighbors for each point in the random variable/vector using Euclidean style distances.

Parameters

X (np.array) – A random variable of form X = {x_1, x_2, x_3,…,x_N} or a random vector of form X = {(x_1, y_1), (x_2, y_2),…,(x_N, y_N)}.
k (Optional[int]) – The number of nearest neighbors to store for each point. Defaults to 1.

Returns

A dictionary keyed by the indices of X, containing a list of the k nearest neighbors of each point along with the distance between the point and each neighbor.

Return type

dict
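
A brute-force version of the neighbor search can be sketched as follows. The function name brute_force_knn and the (neighbor_index, distance) layout of the returned lists are assumptions made for illustration; the exact structure returned by k_nearest_neighbors may differ.

```python
import math


def brute_force_knn(points, k=1):
    """Map each point index to its k nearest (neighbor_index, distance)
    pairs, using Euclidean distances between equal-length tuples."""
    neighbors = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point, closest first.
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        neighbors[i] = [(j, d) for d, j in dists[:k]]
    return neighbors


knn = brute_force_knn([(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)], k=1)
# knn == {0: [(1, 1.0)], 1: [(0, 1.0)], 2: [(1, 2.0)]}
```

The pairwise search costs O(N^2) distance evaluations, which is adequate for small samples; a spatial index would be the usual choice for large ones.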

pybilt.common.knn_entropy.kth_nearest_neighbor_distances(X, k=1)[source]¶

Returns the distance for the kth nearest neighbor of each point.

Parameters

X (np.array) – A random variable of form X = {x_1, x_2, x_3,…,x_N} or a random vector of form X = {(x_1, y_1), (x_2, y_2),…,(x_N, y_N)}.
k (Optional[int]) – The number of nearest neighbors to check for each point. Defaults to 1.

Returns

A list in the same order as X with the distance value to the kth nearest neighbor of each point in X.

Return type

list

pybilt.common.knn_entropy.mutual_information(var_tuple, k=2)[source]¶

Returns an estimate of the mutual information.

This function computes an estimate of the mutual information between a set of random variables or random vectors using knn estimators for the entropy calculations.

Parameters

var_tuple (tuple) – A tuple of random variables or random vectors (i.e. numpy.array); e.g. var_tuple = (X, Y) where X has the form X = {x_1, x_2, x_3,…,x_N} and Y has the form Y = {y_1, y_2, y_3,…,y_N}, or where X has the form X = {(x_X1, y_X1), (x_X2, y_X2),…,(x_XN, y_XN)} and Y has the form Y = {(x_Y1, y_Y1), (x_Y2, y_Y2),…,(x_YN, y_YN)}.
k (Optional[int]) – The number of nearest neighbors to store for each point. Defaults to 2.

Returns

The mutual information estimate.

Return type

float

Notes

The information entropies used to estimate the mutual information are computed using the shannon_entropy function. All input random variable/vector arrays must have the same shape.
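
For the two-variable case, mutual information relates to Shannon entropy through the standard identity below. How the function generalizes this to tuples of more than two variables is not specified here, so the identity is given only for the pair (X, Y).

```latex
I(X; Y) = H(X) + H(Y) - H(X, Y)
```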

pybilt.common.knn_entropy.shannon_entropy(X, k=1, kth_dists=None)[source]¶

Return the Shannon Entropy of the random variable/vector.

This function computes the Shannon information entropy of the random variable/vector as estimated using the Kozachenko-Leonenko (KL) knn estimator.

Parameters

X (np.array) – A random variable of form X = {x_1, x_2, x_3,…,x_N} or a random vector of form X = {(x_1, y_1), (x_2, y_2),…,(x_N, y_N)}.
k (Optional[int]) – The number of nearest neighbors to store for each point. Defaults to 1.
kth_dists (Optional[list]) – A list in the same order as the points in X that holds the pre-computed distances between the points in X and their kth nearest neighbors. Defaults to None.

References

Damiano Lombardi and Sanjay Pant, A non-parametric k-nearest neighbour entropy estimator, arXiv preprint, [cs.IT] 2015, arXiv:1506.06501v1. https://arxiv.org/pdf/1506.06501v1.pdf
https://www.cs.tut.fi/~timhome/tim/tim/core/differential_entropy_kl_details.htm
Kozachenko, L. F. & Leonenko, N. N. 1987 Sample estimate of entropy of a random vector. Probl. Inf. Transm. 23, 95-101.

Returns: The estimate of the Shannon Information entropy of X.
Return type: float
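
A common form of the Kozachenko-Leonenko estimator is H ≈ ψ(N) − ψ(k) + log(c_d) + (d/N) Σ log(r_i), where r_i is the distance from point i to its kth nearest neighbor, d is the dimension, c_d is the volume of the d-dimensional unit ball, and ψ is the digamma function. The sketch below implements that form in pure Python. The names kl_entropy and digamma_int are hypothetical, and the constants used inside shannon_entropy may differ from this version.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def kl_entropy(points, k=1):
    """Kozachenko-Leonenko entropy estimate (in nats) for a list of
    equal-length tuples, using Euclidean distances."""
    n = len(points)
    d = len(points[0])
    # Volume of the d-dimensional unit ball.
    unit_ball = math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)
    log_dist_sum = 0.0
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        # Distance to the kth nearest neighbor (assumes no duplicate points).
        log_dist_sum += math.log(dists[k - 1])
    return (digamma_int(n) - digamma_int(k)
            + math.log(unit_ball) + d * log_dist_sum / n)


random.seed(0)
sample = [(random.random(),) for _ in range(400)]
h_unit = kl_entropy(sample)
h_scaled = kl_entropy([(3.0 * x,) for (x,) in sample])
```

Uniform samples on [0, 1) have a true differential entropy of 0 nats, so h_unit should be close to zero, and stretching the data by a factor of 3 shifts the estimate by log(3).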

pybilt.common.knn_entropy.shannon_entropy_pc(X, k=1, kth_dists=None)[source]¶

Return the Shannon Entropy of the random variable/vector.

This function computes the Shannon information entropy of the random variable/vector as estimated using the Perez-Cruz knn estimator described in Reference 1.

Parameters

X (np.array) – A random variable of form X = {x_1, x_2, x_3,…,x_N} or a random vector of form X = {(x_1, y_1), (x_2, y_2),…,(x_N, y_N)}.
k (Optional[int]) – The number of nearest neighbors to store for each point. Defaults to 1.
kth_dists (Optional[list]) – A list in the same order as the points in X that holds the pre-computed distances between the points in X and their kth nearest neighbors. Defaults to None.

References

Perez-Cruz (2008). Estimation of Information Theoretic Measures for Continuous Random Variables. Advances in Neural Information Processing Systems 21 (NIPS). Vancouver (Canada), December. https://papers.nips.cc/paper/3417-estimation-of-information-theoretic-measures-for-continuous-random-variables.pdf

Returns: The estimate of the Shannon Information entropy of X.
Return type: float

pybilt.common.running_stats module¶

Running stats module.

This module defines the RunningStats and BlockAverager classes, as well as the gen_running_average function.

class pybilt.common.running_stats.BlockAverager(points_per_block=1000, min_points_in_block=500, store_data=False)[source]¶

Bases: object

An object that keeps track of points for block averaging.

n_blocks¶

The current number of active blocks.

Type: int

Initialize the BlockAverager.

Parameters

points_per_block (int, Optional) – The number of points to assign to a block before initiating a new block. Default: 1000
min_points_in_block (int, Optional) – The minimum number of points that a block (typically the last block) can have and still be included in computing the final block average and standard error estimates. This value should be <= points_per_block. Default: 500

averages_of_blocks()[source]¶

Return the block average and standard error.

Returns: Returns a length two tuple with the block average and standard error estimates.
Return type: tuple

get()[source]¶

Return the block average and standard error.

Returns: Returns a length two tuple with the block average and standard error estimates.
Return type: tuple

n_block()[source]¶

number_of_blocks()[source]¶

Return the current number of blocks.

Returns: The number of blocks.
Return type: int

points_per_block()[source]¶

Return information about the points per block.

Returns

A three element tuple containing the setting for points per block, the setting for minimum points per block, and the number of points in the last block.

Return type

tuple

push_container(data)[source]¶

Push a container (array or array like) of data points to the block averaging.

Parameters: data (array like) – The container (list, tuple, np.array, etc.) of data points to add to the block averaging.

push_single(datum)[source]¶

Push a single data point (datum) into the block averager.

Parameters: datum (float) – The value to add to the block averaging.

standards_of_blocks()[source]¶

Return the block average and standard error.

Returns: Returns a length two tuple with the block average and standard error estimates.
Return type: tuple
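
The idea behind block averaging can be sketched in a few lines. This is not the BlockAverager implementation: the function name block_average is hypothetical, and the use of the sample standard deviation (n − 1 denominator) of the block means for the standard error is an assumption.

```python
import math


def block_average(data, points_per_block=1000, min_points_in_block=500):
    """Return (mean of block means, standard error of block means)
    for a non-empty sequence of values."""
    blocks = [data[i:i + points_per_block]
              for i in range(0, len(data), points_per_block)]
    # Drop a trailing partial block that has too few points.
    blocks = [b for b in blocks if len(b) >= min_points_in_block]
    means = [sum(b) / len(b) for b in blocks]
    n = len(means)
    avg = sum(means) / n
    if n < 2:
        # A standard error needs at least two blocks.
        return avg, float("nan")
    var = sum((m - avg) ** 2 for m in means) / (n - 1)
    return avg, math.sqrt(var / n)


data = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0, 9.0, 9.0]
avg, err = block_average(data, points_per_block=4, min_points_in_block=3)
# The last block has only two points and is excluded: avg == 2.0, err == 1.0
```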

class pybilt.common.running_stats.RunningStats[source]¶

Bases: object

A RunningStats object.

The RunningStats object keeps running statistics for a single value/quantity.

n¶

The number of points that have been pushed to the running average.

Type: int

Initialize the RunningStats object.

deviation()[source]¶

Return the current standard deviation.

mean()[source]¶

Return the current mean.

push(val)[source]¶

Push a new value to the running average.

Parameters: val (float) – The value to be added to the running average.

reset()[source]¶

Reset the running average.

variance()[source]¶

Return the current variance.
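
Running statistics of this kind are often kept with Welford's online algorithm, which updates the mean and variance one value at a time without storing the data. The sketch below shows that technique; the class name RunningStatsSketch is hypothetical, and it is an assumption that RunningStats works the same way or reports the sample (n − 1) variance.

```python
import math


class RunningStatsSketch:
    """Running mean and variance via Welford's online update."""

    def __init__(self):
        self.n = 0
        self._mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, val):
        self.n += 1
        delta = val - self._mean
        self._mean += delta / self.n
        # Uses the deviation from both the old and the updated mean.
        self._m2 += delta * (val - self._mean)

    def mean(self):
        return self._mean

    def variance(self):
        # Sample variance; returns 0.0 when fewer than two points are stored.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def deviation(self):
        return math.sqrt(self.variance())


stats = RunningStatsSketch()
for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    stats.push(value)
# stats.mean() is 5.0 and stats.variance() is 32 / 7
```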

pybilt.common.running_stats.binned_average(data, positions, n_bins=25, position_range=None, min_count=0)[source]¶

Compute averages over a quantized range of histogram like bins.

Parameters

data (np.array) – A 1d numpy array of values.
positions (np.array) – A 1d numpy array of positions corresponding to the values in data. These are used to assign the values to the histogram like bins for averaging.
n_bins (Optional[int]) – The target number of bins to divide the position_range into. Defaults to 25.
position_range (Optional[tuple]) – A two element tuple containing the lower and upper range to bin the positions over; i.e. (position_lower, position_upper). Defaults to None, which uses positions.min() and positions.max().

Returns

A tuple of two numpy arrays of the form (bins, averages).

Return type

tuple

Notes

The function automatically filters out bins that have a zero count, so the final value of the number of bins and values will be len(bins) <= n_bins.
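
The binning and filtering behavior described above can be sketched in pure Python. The name binned_average_sketch is hypothetical, plain lists stand in for numpy arrays, and reporting each bin by its center is an assumption; the min_count argument is not modeled.

```python
def binned_average_sketch(data, positions, n_bins=25, position_range=None):
    """Average `data` within equal-width bins of `positions`; returns
    (bin_centers, averages) for the non-empty bins only."""
    if position_range is None:
        position_range = (min(positions), max(positions))
    lower, upper = position_range
    width = (upper - lower) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for value, pos in zip(data, positions):
        if lower <= pos <= upper:
            # A position on the upper edge is assigned to the last bin.
            index = min(int((pos - lower) / width), n_bins - 1)
            sums[index] += value
            counts[index] += 1
    # Keep only the bins that received at least one value.
    kept = [i for i in range(n_bins) if counts[i] > 0]
    bins = [lower + (i + 0.5) * width for i in kept]
    averages = [sums[i] / counts[i] for i in kept]
    return bins, averages


bins, averages = binned_average_sketch(
    data=[1.0, 3.0, 10.0], positions=[0.1, 0.2, 0.9],
    n_bins=2, position_range=(0.0, 1.0))
# bins == [0.25, 0.75], averages == [2.0, 10.0]
```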

pybilt.common.running_stats.block_average_bse_v_size(data)[source]¶

pybilt.common.running_stats.block_avg_hist(nparray_1d, block_size, in_range='auto', scale=False, *args, **kwargs)[source]¶

Creates a histogram for each block and averages them to generate a single block-averaged histogram.

pybilt.common.running_stats.gen_running_average(onednparray)[source]¶

Generates a running average.

Parameters: onednparray (numpy.array) – A 1d numpy array of measurements (e.g. over time).
Returns: A 2d array of shape len(onednparray) x 2, where 2dnparray[i][0] is the running average at i and 2dnparray[i][1] is the running standard deviation at i, for i in range(0, len(onednparray)).
Return type: numpy.array

pybilt.common.running_stats.pair_ttest(mean_1, std_err_1, n_1, mean_2, std_err_2, n_2)[source]¶
