pybilt.common package

Submodules

pybilt.common.distance_cutoff_clustering module

Function to compute hierarchical distance cutoff clusters.

pybilt.common.distance_cutoff_clustering.distance_cutoff_clustering(vectors, cutoff, dist_func, min_size=1, *df_args, **df_kwargs)[source]

Hierarchical distance cutoff clustering.

This function takes a set of vector points and clusters them using a hierarchical distance-based clustering algorithm. A point is added to a cluster whenever it is within the cutoff distance of any point already in that cluster.

Parameters
  • vectors (np.array, list like) – The array of vector points.

  • cutoff (float) – The cutoff distance.

  • dist_func (function) – The function to use when computing the distance between points.

  • min_size (Optional[int]) – The minimum size of a cluster. Defaults to 1.

  • *df_args – Any additional arguments to be passed to the distance function (dist_func).

  • **df_kwargs – Any additional keyword arguments to be passed to the distance function (dist_func).

Returns

A list of clusters, where each cluster is a list of the indices of the points in that cluster.

Return type

list
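The clustering rule described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pybilt implementation: the names euclidean and cutoff_clusters are hypothetical, and treating a distance exactly equal to the cutoff as "within" is an assumption.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cutoff_clusters(points, cutoff, dist_func, min_size=1):
    """Hypothetical sketch: grow each cluster by repeatedly adding any
    point within `cutoff` of a point that is already in the cluster."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)  # deterministic starting point
        unvisited.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # Assumption: a distance equal to the cutoff counts as "within".
            near = [j for j in unvisited
                    if dist_func(points[current], points[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        # Clusters smaller than min_size are discarded.
        if len(cluster) >= min_size:
            clusters.append(sorted(cluster))
    return clusters


points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.2, 5.0)]
clusters = cutoff_clusters(points, cutoff=0.6, dist_func=euclidean)
print(clusters)  # [[0, 1, 2], [3, 4]]
```

Points 0 and 2 are 1.0 apart, which is beyond the cutoff, yet they share a cluster because both are within 0.6 of point 1. That chaining is the behavior described above.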

pybilt.common.distance_cutoff_clustering.distance_euclidean(v_a, v_b)[source]

Compute the Euclidean distance between two vectors.

Parameters
  • v_a (numpy.array, array like) – The first input vector.

  • v_b (numpy.array, array like) – The second input vector.

Returns

The Euclidean distance between the two vectors.

Return type

float

Notes

The two vectors should have the same size and dimension.

pybilt.common.distance_cutoff_clustering.distance_euclidean_pbc(v_a, v_b, box_lengths, center='zero')[source]

Compute the Euclidean distance between two vectors under periodic boundaries.

Parameters
  • v_a (numpy.array, array like) – The first input vector.

  • v_b (numpy.array, array like) – The second input vector.

  • box_lengths (numpy.array, array like) – The periodic boundary box lengths for each dimension.

  • center (Optional[str, array like]) – Set the coordinate center of the periodic box dimensions. Defaults to ‘zero’, which sets the center to numpy.zeros(len(box_lengths)). Also accepts the string value ‘box_half’, which sets the center to 0.5*box_lengths.

Returns

The Euclidean distance between the two vectors.

Return type

float

Notes

The two vectors should have the same size and dimension, while box_lengths should have the length of the vector dimension.
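A common way to compute a distance under periodic boundaries is the minimum image convention. The sketch below assumes that convention and ignores the center option entirely; min_image_distance is a hypothetical name and is not part of pybilt.

```python
import math


def min_image_distance(v_a, v_b, box_lengths):
    """Hypothetical sketch: Euclidean distance with the minimum image
    convention applied independently along each dimension."""
    total = 0.0
    for a, b, length in zip(v_a, v_b, box_lengths):
        delta = a - b
        # Shift by whole box lengths so that |delta| <= length / 2.
        delta -= length * round(delta / length)
        total += delta * delta
    return math.sqrt(total)


# The points sit 9.0 apart directly, but only 1.0 apart across the boundary.
d = min_image_distance((0.5, 0.5), (9.5, 0.5), (10.0, 10.0))
print(d)  # 1.0
```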

pybilt.common.distance_cutoff_clustering.vector_difference_pbc(v_a, v_b, box_lengths, center='zero')[source]

Compute the Euclidean distance between two vectors under periodic boundaries.

Parameters
  • v_a (numpy.array, array like) – The first input vector.

  • v_b (numpy.array, array like) – The second input vector.

  • box_lengths (numpy.array, array like) – The periodic boundary box lengths for each dimension.

  • center (Optional[str, array like]) – Set the coordinate center of the periodic box dimensions. Defaults to ‘zero’, which sets the center to numpy.zeros(len(box_lengths)). Also accepts the string value ‘box_half’, which sets the center to 0.5*box_lengths.

Returns

The Euclidean distance between the two vectors.

Return type

float

Notes

The two vectors should have the same size and dimension, while box_lengths should have the length of the vector dimension.

pybilt.common.gaussian module

Define Gaussian function objects.

This module defines the Gaussian class and the GaussianRange class.

class pybilt.common.gaussian.Gaussian(mean, std)[source]

Bases: object

A Gaussian function object.

mean

The mean of the Gaussian.

Type

float

std

The standard deviation of the Gaussian.

Type

float

Initialize a Gaussian function object.

Parameters
  • mean (float) – Set the mean of the Gaussian.

  • std (float) – Set the standard deviation of the Gaussian.

eval(x_in)[source]

Return the Gaussian function evaluated at the input x value.

Parameters

x_in (float) – The x value to evaluate the function at.

Returns

The function evaluation for the Gaussian.

Return type

float
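For orientation, a Gaussian with a given mean and standard deviation is often evaluated in its normalized (probability density) form. The sketch below assumes that form; whether Gaussian.eval applies the same prefactor is not stated here, and the function name gaussian is hypothetical.

```python
import math


def gaussian(x, mean, std):
    """Normalized Gaussian evaluated at x (assumed form, see lead-in)."""
    scale = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return scale * math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))


# Value at the mean of a unit Gaussian, i.e. 1 / sqrt(2 * pi).
peak = gaussian(0.0, mean=0.0, std=1.0)
```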

reset_mean(new_mean)[source]

Change the mean of the Gaussian function.

Parameters

new_mean (float) – The new mean of the Gaussian function.

class pybilt.common.gaussian.GaussianRange(in_range, mean, std, npoints=200)[source]

Bases: object

Define a Gaussian function over a range.

This object is used to define a Gaussian function over a finite range and store its values as evaluated at points evenly spaced over that range. The points can then be used, for example, to integrate the Gaussian function over the range by numerical quadrature.

mean

The mean of the Gaussian.

Type

float

std

The standard deviation of the Gaussian.

Type

float

upper

The upper boundary of the range.

Type

float

lower

The lower boundary of the range.

Type

float

npoints

The number of points to evaluate in the range.

Type

int

Initialize the GaussianRange object.

The GaussianRange stores the values of the Gaussian function with the input mean and standard deviation, evaluated at evenly spaced points in the specified x-value range.

Parameters
  • in_range (tuple, list) – Specify the endpoints for the range, e.g. (x_start, x_end).

  • mean (float) – The mean of the Gaussian function.

  • std (float) – The standard deviation of the Gaussian function.

  • npoints (Optional[int]) – The number of x-value points at which to evaluate the Gaussian function in the specified range (i.e. in_range). Defaults to 200.

eval(x_in)[source]

Return the Gaussian function evaluated at the input x value.

Parameters

x_in (float) – The x value to evaluate the function at.

Returns

The function evaluation for the Gaussian.

Return type

float

get_values()[source]

Return the x and y values for the Gaussian range function.

Returns

The x and y values for the function, returned as ( x_values, y_values).

Return type

tuple

integrate_range(lower, upper)[source]

Returns the numerical integration of the Gaussian range.

This function does a simple quadrature for the Gaussian function as evaluated on the range (or subset of the range) specified at initialization.

Parameters
  • lower (float) – The lower boundary for the integration.

  • upper (float) – The upper boundary for the integration.

Returns

The numerical value of the Gaussian range integrated from lower to upper.

Return type

float

Notes

This function does not thoroughly check the bounds, so if upper is less than lower the function will break.
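The idea of tabulating a Gaussian on evenly spaced points and then integrating it by simple quadrature can be sketched as follows. This sketch assumes the trapezoid rule and a normalized Gaussian; the quadrature rule integrate_range actually uses is not specified above, and the names tabulate and integrate are hypothetical.

```python
import math


def gaussian(x, mean, std):
    """Normalized Gaussian evaluated at x (assumed form)."""
    scale = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return scale * math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))


def tabulate(in_range, mean, std, npoints=200):
    """Evaluate the Gaussian at npoints evenly spaced x values."""
    start, end = in_range
    step = (end - start) / (npoints - 1)
    xs = [start + i * step for i in range(npoints)]
    ys = [gaussian(x, mean, std) for x in xs]
    return xs, ys


def integrate(xs, ys, lower, upper):
    """Trapezoid rule over the stored points that fall in [lower, upper]."""
    pairs = [(x, y) for x, y in zip(xs, ys) if lower <= x <= upper]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


xs, ys = tabulate((-5.0, 5.0), mean=0.0, std=1.0, npoints=201)
full = integrate(xs, ys, -5.0, 5.0)        # close to 1 for a normalized Gaussian
one_sigma = integrate(xs, ys, -1.0, 1.0)   # roughly the familiar 68% region
```

Because only the stored points are used, the result depends on npoints and on how the integration bounds line up with the grid.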

normalize()[source]

Normalizes (by area) the Gaussian function values over the range.

reset_mean(new_mean)[source]

Change the mean of the Gaussian function.

Parameters

new_mean (float) – The new mean of the Gaussian function.

Notes

This function does not re-evaluate the Gaussian range and therefore only affects the output of the eval function.

sum_range(lower, upper)[source]

Returns the sum over the Gaussian range.

This function sums the Gaussian function at the points that were evaluated on the range (or subset of the range) specified at initialization.

Parameters
  • lower (float) – The lower boundary for the sum.

  • upper (float) – The upper boundary for the sum.

Returns

The numerical value of the Gaussian range as summed from lower to upper.

Return type

float

Notes

This function does not thoroughly check the bounds, so if upper is less than lower the function will break.

pybilt.common.knn_entropy module

Functions to evaluate information theoretic measures using knn approaches.

This module defines a set of functions to compute information theoretic measures (i.e. Shannon Entropy, Mutual Information, etc.) using the k-nearest neighbors (knn) approach.

pybilt.common.knn_entropy.conditional_mutual_information(var_tuple, cond_tuple, k=2)[source]

Returns an estimate of the conditional mutual information.

This function computes an estimate of the mutual information between a set of random variables or random vectors conditioned on other random variables or vectors using knn estimators for the entropy calculations.

Parameters
  • var_tuple (tuple) – A tuple of random variables or random vectors (i.e. numpy.array) to estimate the mutual information between; e.g. var_tuple = (X, Y) where X has form X = {x_1, x_2, x_3,…,x_N} and Y has form Y = {x_1, x_2, x_3,…,x_N}, or where X has form X = {(x_X1, y_X1), (x_X2, y_X2),…,(x_XN, y_XN)} and Y has form Y = {(x_Y1, y_Y1), (x_Y2, y_Y2),…,(x_YN, y_YN)}.

  • cond_tuple (tuple) – A tuple of random variables or random vectors (i.e. numpy.array) that the mutual information is to be conditioned on; e.g. cond_tuple = (X) where X has the form X = {x_1, x_2, x_3,…,x_N}.

  • k (Optional[int]) – The number of nearest neighbors to store for each point. Defaults to 2.

Returns

The conditional mutual information estimate.

Return type

float

Notes

The information entropies used to estimate the mutual information are computed using the shannon_entropy function. All input random variable/vector arrays must have the same shape.
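For two variables X and Y conditioned on a single variable Z, the standard identity relating conditional mutual information to joint entropies is shown below. It is given for orientation only; the exact combination of entropy terms evaluated by this function, particularly for tuples with more than two entries, is an assumption.

```latex
I(X;Y \mid Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
```

Each H term is a joint entropy, so in a knn-based estimate it would be computed on the corresponding variables stacked into a single random vector.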

pybilt.common.knn_entropy.k_nearest_neighbors(X, k=1)[source]

Get the k-nearest neighbors between points in a random variable/vector.

Determines the k nearest neighbors for each point in the random variable/vector using Euclidean style distances.

Parameters
  • X (np.array) – A random variable of form X = {x_1, x_2, x_3,…,x_N} or a random vector of form X = {(x_1, y_1), (x_2, y_2),…,(x_N, y_n)}.

  • k (Optional[int]) – The number of nearest neighbors to store for each point. Defaults to 1.

Returns

A dictionary keyed by the indices of X, containing a list of the k nearest neighbors for each point along with the distance between the point and each neighbor.

Return type

dict
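A brute-force version of this kind of lookup can be sketched as below. The layout of the returned dictionary (a list of (neighbor_index, distance) pairs per point) is an assumption made for illustration, knn_brute_force is a hypothetical name, and math.dist requires Python 3.8 or later.

```python
import math


def knn_brute_force(points, k=1):
    """Hypothetical sketch: map each point index to its k nearest
    (neighbor_index, distance) pairs using Euclidean distance."""
    neighbors = {}
    for i, p in enumerate(points):
        # Distance from point i to every other point, nearest first.
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        neighbors[i] = [(j, d) for d, j in dists[:k]]
    return neighbors


# A scalar random variable is written here as a list of 1-tuples.
points = [(0.0,), (1.0,), (3.0,)]
nearest = knn_brute_force(points, k=1)
print(nearest)  # {0: [(1, 1.0)], 1: [(0, 1.0)], 2: [(1, 2.0)]}
```

This compares every pair of points, so the cost grows quadratically with the number of points; tree-based searches are the usual choice for large samples.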

pybilt.common.knn_entropy.kth_nearest_neighbor_distances(X, k=1)[source]

Returns the distance for the kth nearest neighbor of each point.

Parameters
  • X (np.array) – A random variable of form X = {x_1, x_2, x_3,…,x_N} or a random vector of form X = {(x_1, y_1), (x_2, y_2),…,(x_N, y_n)}.

  • k (Optional[int]) – The number of nearest neighbors to check for each point. Defaults to 1.

Returns

A list in the same order as X with the distance value to the kth nearest neighbor of each point in X.

Return type

list

pybilt.common.knn_entropy.mutual_information(var_tuple, k=2)[source]

Returns an estimate of the mutual information.

This function computes an estimate of the mutual information between a set of random variables or random vectors using knn estimators for the entropy calculations.

Parameters

  • var_tuple (tuple) – A tuple of random variables or random vectors (i.e. numpy.array); e.g. var_tuple = (X, Y) where X has form X = {x_1, x_2, x_3,…,x_N} and Y has form Y = {x_1, x_2, x_3,…,x_N}, or where X has form X = {(x_X1, y_X1), (x_X2, y_X2),…,(x_XN, y_XN)} and Y has form Y = {(x_Y1, y_Y1), (x_Y2, y_Y2),…,(x_YN, y_YN)}.

  • k (Optional[int]) – The number of nearest neighbors to store for each point. Defaults to 2.

Returns

The mutual information estimate.

Return type

float

Notes

The information entropies used to estimate the mutual information are computed using the shannon_entropy function. All input random variable/vector arrays must have the same shape.
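For the two-variable case, the standard identity that expresses mutual information through Shannon entropies is shown below. It is included for orientation; how the function generalizes to tuples with more than two entries is not stated here.

```latex
I(X;Y) = H(X) + H(Y) - H(X,Y)
```

Here H(X,Y) is the entropy of the joint variable formed by stacking X and Y, which is one reason the input arrays need matching shapes.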

pybilt.common.knn_entropy.shannon_entropy(X, k=1, kth_dists=None)[source]

Return the Shannon Entropy of the random variable/vector.

This function computes the Shannon information entropy of the random variable/vector as estimated using the Kozachenko-Leonenko (KL) knn estimator.

Parameters
  • X (np.array) – A random variable of form X = {x_1, x_2, x_3,…,x_N} or a random vector of form X = {(x_1, y_1), (x_2, y_2),…,(x_N, y_n)}.

  • k (Optional[int]) – The number of nearest neighbors to store for each point. Defaults to 1.

  • kth_dists (Optional[list]) – A list, in the same order as the points in X, of the pre-computed distances between the points in X and their kth nearest neighbors. Defaults to None.

References

  1. Damiano Lombardi and Sanjay Pant, A non-parametric k-nearest neighbour entropy estimator, arXiv preprint [cs.IT] 2015, arXiv:1506.06501v1. https://arxiv.org/pdf/1506.06501v1.pdf

  2. https://www.cs.tut.fi/~timhome/tim/tim/core/differential_entropy_kl_details.htm

  3. Kozachenko, L. F. & Leonenko, N. N. 1987 Sample estimate of entropy of a random vector. Probl. Inf. Transm. 23, 95-101.

Returns

The estimate of the Shannon Information entropy of X.

Return type

float
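A Kozachenko-Leonenko style estimate can be sketched with the standard library alone. The sketch assumes the common textbook form of the estimator (Euclidean distances, natural logarithms, unit-ball volume term); the exact constants and conventions used by shannon_entropy may differ, and every function name below is hypothetical.

```python
import math
import random


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))


def kth_neighbor_distances(points, k=1):
    """Brute-force distance from each point to its kth nearest neighbor."""
    out = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(dists[k - 1])
    return out


def kl_entropy(points, k=1):
    """Assumed textbook KL estimate of differential entropy, in nats."""
    n = len(points)
    dim = len(points[0])
    # Volume of the unit ball in `dim` dimensions (Euclidean norm).
    unit_ball = math.pi ** (dim / 2.0) / math.gamma(dim / 2.0 + 1.0)
    log_dist_sum = sum(math.log(d) for d in kth_neighbor_distances(points, k))
    return (digamma_int(n) - digamma_int(k)
            + math.log(unit_ball) + dim * log_dist_sum / n)


# Uniform samples on [0, 1) have differential entropy 0, so the
# estimate should land near zero.
random.seed(0)
sample = [(random.random(),) for _ in range(400)]
estimate = kl_entropy(sample, k=3)
```

Duplicate points would give a zero neighbor distance and break the logarithm, so real data with repeated values needs extra care.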

pybilt.common.knn_entropy.shannon_entropy_pc(X, k=1, kth_dists=None)[source]

Return the Shannon Entropy of the random variable/vector.

This function computes the Shannon information entropy of the random variable/vector as estimated using the Perez-Cruz knn estimator described in Reference 1.

Parameters
  • X (np.array) – A random variable of form X = {x_1, x_2, x_3,…,x_N} or a random vector of form X = {(x_1, y_1), (x_2, y_2),…,(x_N, y_n)}.

  • k (Optional[int]) – The number of nearest neighbors to store for each point. Defaults to 1.

  • kth_dists (Optional[list]) – A list, in the same order as the points in X, of the pre-computed distances between the points in X and their kth nearest neighbors. Defaults to None.

References

  1. Perez-Cruz (2008). Estimation of Information Theoretic Measures for Continuous Random Variables. Advances in Neural Information Processing Systems 21 (NIPS). Vancouver (Canada), December. https://papers.nips.cc/paper/3417-estimation-of-information-theoretic-measures-for-continuous-random-variables.pdf

Returns

The estimate of the Shannon Information entropy of X.

Return type

float

pybilt.common.running_stats module

Running stats module.

This module defines the RunningStats and BlockAverager classes, as well as the gen_running_average function.

class pybilt.common.running_stats.BlockAverager(points_per_block=1000, min_points_in_block=500, store_data=False)[source]

Bases: object

An object that keeps track of points for block averaging.

n_blocks

The current number of active blocks.

Type

int

Initialize the BlockAverager.

Parameters
  • points_per_block (int, Optional) – The number of points to assign to a block before initiating a new block. Default: 1000

  • min_points_in_block (int, Optional) – The minimum number of points that a block (typically the last block) can have and still be included in computing the final block average and standard error estimates. This value should be <= points_per_block. Default: 500
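How the two settings interact can be sketched with a small stand-alone function. This is an illustration under stated assumptions, not the BlockAverager code: the standard error is taken as the sample standard deviation of the block means divided by the square root of the number of blocks, and block_average is a hypothetical name.

```python
import math


def block_average(data, points_per_block=1000, min_points_in_block=500):
    """Hypothetical sketch: return (mean of block means, standard error)."""
    blocks = [data[i:i + points_per_block]
              for i in range(0, len(data), points_per_block)]
    # Drop a short trailing block that has too few points.
    blocks = [b for b in blocks if len(b) >= min_points_in_block]
    means = [sum(b) / len(b) for b in blocks]
    n = len(means)
    grand_mean = sum(means) / n
    if n < 2:
        return grand_mean, float("nan")  # no error estimate from one block
    # Assumption: sample variance (n - 1 denominator) of the block means.
    variance = sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    return grand_mean, math.sqrt(variance / n)


# Two full blocks of four points; the trailing single point is discarded.
data = [1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0, 9.0]
avg, err = block_average(data, points_per_block=4, min_points_in_block=2)
print(avg, err)  # 2.0 1.0
```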

averages_of_blocks()[source]

Return the block average and standard error.

Returns

Returns a length two tuple with the block average and standard error estimates.

Return type

tuple

get()[source]

Return the block average and standard error.

Returns

Returns a length two tuple with the block average and standard error estimates.

Return type

tuple

n_block()[source]
number_of_blocks()[source]

Return the current number of blocks.

Returns

The number of blocks.

Return type

int

points_per_block()[source]

Return information about the points per block.

Returns

A three element tuple containing the setting for points per block, the setting for minimum points per block, and the number of points in the last block.

Return type

tuple

push_container(data)[source]

Push a container (array or array like) of data points to the block averaging.

Parameters

data (array like) – The container (list, tuple, np.array, etc.) of data points to add to the block averaging.

push_single(datum)[source]

Push a single data point (datum) into the block averager.

Parameters

datum (float) – The value to add to the block averaging.

standards_of_blocks()[source]

Return the block average and standard error.

Returns

Returns a length two tuple with the block average and standard error estimates.

Return type

tuple

class pybilt.common.running_stats.RunningStats[source]

Bases: object

A RunningStats object.

The RunningStats object keeps running statistics for a single value/quantity.

n

The number of points that have been pushed to the running average.

Type

int

Initialize the RunningStats object.

deviation()[source]

Return the current standard deviation.

mean()[source]

Return the current mean.

push(val)[source]

Push a new value to the running average.

Parameters

val (float) – The value to be added to the running average.


reset()[source]

Reset the running average.

variance()[source]

Return the current variance.
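Running statistics of this kind are commonly kept with Welford's online update, which avoids storing the pushed values. The class below is a sketch that assumes that algorithm and a sample (n - 1) variance; it mirrors the method names listed above but is not the pybilt implementation, and RunningStatsSketch is a hypothetical name.

```python
import math


class RunningStatsSketch:
    """Hypothetical running mean and variance via Welford's update."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.n = 0
        self._mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, val):
        self.n += 1
        delta = val - self._mean
        self._mean += delta / self.n
        self._m2 += delta * (val - self._mean)

    def mean(self):
        return self._mean

    def variance(self):
        # Assumption: sample variance; zero until two points exist.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def deviation(self):
        return math.sqrt(self.variance())


stats = RunningStatsSketch()
for value in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
    stats.push(value)
```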

pybilt.common.running_stats.binned_average(data, positions, n_bins=25, position_range=None, min_count=0)[source]

Compute averages over a quantized range of histogram like bins.

Parameters
  • data (np.array) – A 1d numpy array of values.

  • positions (np.array) – A 1d numpy array of positions corresponding to the values in data. These are used to assign the values to the histogram like bins for averaging.

  • n_bins (Optional[int]) – Set the target number of bins to divide the position_range into. Defaults to 25.

  • position_range (Optional[tuple]) – A two element tuple containing the lower and upper range to bin the positions over; i.e. (position_lower, position_upper). Defaults to None, which uses positions.min() and positions.max().

Returns

A tuple with two numpy arrays of form (bins, averages).

Return type

tuple

Notes

The function automatically filters out bins that have a zero count, so the final value of the number of bins and values will be len(bins) <= n_bins.
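The binning and empty-bin filtering described above can be sketched as follows. The sketch assumes equal-width bins reported by their centers and uses plain lists rather than numpy arrays; whether the library reports centers or edges is not stated, and binned_average_sketch is a hypothetical name.

```python
def binned_average_sketch(data, positions, n_bins=25, position_range=None):
    """Hypothetical sketch: average `data` within equal-width position bins.

    Returns (bin_centers, averages), skipping bins that received no values.
    """
    lower, upper = position_range or (min(positions), max(positions))
    width = (upper - lower) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for value, pos in zip(data, positions):
        if not lower <= pos <= upper:
            continue  # ignore values outside the requested range
        # The upper edge belongs to the last bin.
        index = min(int((pos - lower) / width), n_bins - 1)
        sums[index] += value
        counts[index] += 1
    centers, averages = [], []
    for i in range(n_bins):
        if counts[i] > 0:
            centers.append(lower + (i + 0.5) * width)
            averages.append(sums[i] / counts[i])
    return centers, averages


positions = [0.0, 0.2, 0.4, 3.6, 3.8, 4.0]
data = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
centers, averages = binned_average_sketch(data, positions, n_bins=4)
print(centers, averages)  # [0.5, 3.5] [2.0, 20.0]
```

Four bins were requested but only two are returned, because the two middle bins received no values.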

pybilt.common.running_stats.block_average_bse_v_size(data)[source]
pybilt.common.running_stats.block_avg_hist(nparray_1d, block_size, in_range='auto', scale=False, *args, **kwargs)[source]

Creates a histogram for each block and averages them to generate a single block-averaged histogram.

pybilt.common.running_stats.gen_running_average(onednparray)[source]

Generate a running average.

Parameters

onednparray (numpy.array) – A 1d numpy array of measurements (e.g. over time).

Returns

A 2d array of dimension len(onednparray) x 2, where row i holds the running average at i in column 0 and the running standard deviation at i in column 1, for i in range(0, len(onednparray)).

Return type

numpy.array
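The shape of the output can be illustrated with cumulative sums. This sketch assumes a population (divide by count) standard deviation and returns a list of rows instead of a numpy array; the convention used by gen_running_average is not stated above, and running_average is a hypothetical name.

```python
def running_average(values):
    """Hypothetical sketch: one [running_mean, running_std] row per value."""
    rows = []
    total, total_sq = 0.0, 0.0
    for count, value in enumerate(values, start=1):
        total += value
        total_sq += value * value
        mean = total / count
        # Assumption: population variance; clamp tiny negative rounding error.
        variance = max(total_sq / count - mean * mean, 0.0)
        rows.append([mean, variance ** 0.5])
    return rows


rows = running_average([1.0, 3.0, 5.0])
```

Row i summarizes the first i + 1 measurements, so the first row always has a standard deviation of zero.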

pybilt.common.running_stats.pair_ttest(mean_1, std_err_1, n_1, mean_2, std_err_2, n_2)[source]

Module contents