Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Central moments #23

Merged
merged 47 commits into from
Apr 6, 2019
Merged
Show file tree
Hide file tree
Changes from 44 commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
62e8b02
Added definition of central order moment
LukeMathWalker Jan 15, 2019
89d6b8c
Added stubbed implementation
LukeMathWalker Jan 15, 2019
9a5b565
Add behaviour to return None if the array is empty
LukeMathWalker Jan 15, 2019
d9826e5
Handle edge cases
LukeMathWalker Jan 15, 2019
316a0e5
Added test
LukeMathWalker Jan 15, 2019
00782d8
First implementation completed
LukeMathWalker Jan 15, 2019
1d634ac
Added test for second order moment (variance)
LukeMathWalker Jan 15, 2019
05bf683
Implemented test and renamed to central_moment
LukeMathWalker Jan 15, 2019
5921dd2
Using Horner's method for evaluating polynomials
LukeMathWalker Jan 15, 2019
c04e295
Add algorithm notes to the docs
LukeMathWalker Jan 15, 2019
9cddde5
Added algorithmic complexity to the docs
LukeMathWalker Jan 15, 2019
336a51a
Added two optimizations
LukeMathWalker Jan 15, 2019
7f012af
Added signature for bulk central moment computation
LukeMathWalker Jan 18, 2019
cfd4dbe
Fixed trait signature
LukeMathWalker Jan 18, 2019
2517b4d
Indent
LukeMathWalker Jan 18, 2019
b5a520c
Factored out logic from central_moment
LukeMathWalker Jan 18, 2019
7eb268f
Implemented central_moments
LukeMathWalker Jan 18, 2019
2e61f2e
Test implementation correctness
LukeMathWalker Jan 18, 2019
9d37f15
Added skewness and kurtosis
LukeMathWalker Jan 18, 2019
4829ca6
Added docs
LukeMathWalker Jan 18, 2019
c748c8e
Added tests for kurtosis and skewness
LukeMathWalker Jan 18, 2019
4bf0035
Fixed assertions
LukeMathWalker Jan 22, 2019
da49a28
No need to get to order 4 for skewness
LukeMathWalker Jan 22, 2019
4a193bb
Fixed kurtosis test
LukeMathWalker Jan 23, 2019
5fe1fe8
Enriched docs
LukeMathWalker Jan 23, 2019
36428c7
Fmt
LukeMathWalker Mar 26, 2019
cff93fe
Merge master
LukeMathWalker Mar 26, 2019
25f3f0f
Avoid computing mean for p=0,1 central moments
jturner314 Mar 31, 2019
d9ba4bb
Replace map + clone with mapv
jturner314 Mar 31, 2019
4a05288
Check for moment order overflowing `i32`
jturner314 Mar 31, 2019
3aaab13
Take advantage of IterBinomial from num-integer
jturner314 Mar 31, 2019
7efba82
Remove unnecessary explicit clones
jturner314 Mar 31, 2019
0b0b0ba
Replace .map() with ?
jturner314 Mar 31, 2019
f992453
Test more cases in central moment tests
jturner314 Mar 31, 2019
a8af880
Rename central moment tests
jturner314 Mar 31, 2019
a701f33
Push diff to the lowest possible exponent
LukeMathWalker Apr 1, 2019
94f0950
Merge pull request #3 from jturner314/central-moments
LukeMathWalker Apr 1, 2019
595af7b
Merge branch 'master' into central-moments
LukeMathWalker Apr 2, 2019
1796a48
Return a Result instead of Option, for consistency
LukeMathWalker Apr 2, 2019
e21ca2b
Match impl with trait definition
LukeMathWalker Apr 2, 2019
8762490
Match test to trait+impl
LukeMathWalker Apr 2, 2019
76b0077
Fmt
LukeMathWalker Apr 2, 2019
74cc782
Converted order to u16
LukeMathWalker Apr 5, 2019
31a3909
Fmt
LukeMathWalker Apr 5, 2019
1f9bcf1
Fix typo. Change casting to use as.
LukeMathWalker Apr 6, 2019
fb65893
Fix typo. Change casting to use as.
LukeMathWalker Apr 6, 2019
c3a166e
Fix typo. Change casting to use as.
LukeMathWalker Apr 6, 2019
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ categories = ["data-structures", "science"]
[dependencies]
ndarray = "0.12.1"
noisy_float = "0.1.8"
num-integer = "0.1"
num-traits = "0.2"
rand = "0.6"
itertools = { version = "0.7.0", default-features = false }
Expand Down
1 change: 1 addition & 0 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@
extern crate itertools;
extern crate ndarray;
extern crate noisy_float;
extern crate num_integer;
extern crate num_traits;
extern crate rand;

Expand Down
266 changes: 258 additions & 8 deletions src/summary_statistics/means.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
use super::SummaryStatisticsExt;
use crate::errors::EmptyInput;
use ndarray::{ArrayBase, Data, Dimension};
use num_traits::{Float, FromPrimitive, Zero};
use num_integer::IterBinomial;
use num_traits::{Float, FromPrimitive, ToPrimitive, Zero};
use std::ops::{Add, Div};

impl<A, S, D> SummaryStatisticsExt<A, S, D> for ArrayBase<S, D>
Expand Down Expand Up @@ -36,15 +37,163 @@ where
{
self.map(|x| x.ln()).mean().map(|x| x.exp())
}

fn kurtosis(&self) -> Result<A, EmptyInput>
where
A: Float + FromPrimitive,
{
let central_moments = self.central_moments(4)?;
Ok(central_moments[4] / central_moments[2].powi(2))
}

fn skewness(&self) -> Result<A, EmptyInput>
where
A: Float + FromPrimitive,
{
let central_moments = self.central_moments(3)?;
Ok(central_moments[3] / central_moments[2].sqrt().powi(3))
}

fn central_moment(&self, order: u16) -> Result<A, EmptyInput>
where
A: Float + FromPrimitive,
{
if self.is_empty() {
return Err(EmptyInput);
}
match order {
0 => Ok(A::one()),
1 => Ok(A::zero()),
n => {
let mean = self.mean().unwrap();
let shifted_array = self.mapv(|x| x - mean);
let shifted_moments = moments(shifted_array, n);
let correction_term = -shifted_moments[1];

let coefficients = central_moment_coefficients(&shifted_moments);
Ok(horner_method(coefficients, correction_term))
}
}
}

fn central_moments(&self, order: u16) -> Result<Vec<A>, EmptyInput>
where
A: Float + FromPrimitive,
{
if self.is_empty() {
return Err(EmptyInput);
}
match order {
0 => Ok(vec![A::one()]),
1 => Ok(vec![A::one(), A::zero()]),
n => {
// We only perform these operations once, and then reuse their
// result to compute all the required moments
let mean = self.mean().unwrap();
let shifted_array = self.mapv(|x| x - mean);
let shifted_moments = moments(shifted_array, n);
let correction_term = -shifted_moments[1];

let mut central_moments = vec![A::one(), A::zero()];
for k in 2..=n {
let coefficients =
central_moment_coefficients(&shifted_moments[..=(k as usize)]);
let central_moment = horner_method(coefficients, correction_term);
central_moments.push(central_moment)
}
Ok(central_moments)
}
}
}
}

/// Returns a vector containing all moments of the array elements up to
/// *order*, where the *p*-th moment is defined as:
///
/// ```text
/// 1 n
/// ― ∑ xᵢᵖ
/// n i=1
/// ```
///
/// The returned moments are ordered by power magnitude: 0th moment, 1st moment, etc.
///
/// **Panics** if `A::from_usize()` fails to convert the number of elements in the array.
fn moments<A, S, D>(a: ArrayBase<S, D>, order: u16) -> Vec<A>
where
A: Float + FromPrimitive,
S: Data<Elem = A>,
D: Dimension,
{
let n_elements =
A::from_usize(a.len()).expect("Converting number of elements to `A` must not fail");
let order = order
Copy link
Member

@jturner314 jturner314 Apr 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that order is u16, we can change this to let order = order as i32.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, it can't fail anymore. I'll merge once the CI has finished 👍

.to_i32()
.expect("Moment order must not overflow `i32`.");

// When k=0, we are raising each element to the 0th power
// No need to waste CPU cycles going through the array
let mut moments = vec![A::one()];

if order >= 1 {
// When k=1, we don't need to raise elements to the 1th power (identity)
moments.push(a.sum() / n_elements)
}

for k in 2..=order {
moments.push(a.map(|x| x.powi(k)).sum() / n_elements)
}
moments
}

/// Returns the coefficients in the polynomial expression to compute the *p*th
/// central moment as a function of the sample mean.
///
/// It takes as input all moments up to order *p*, ordered by power magnitude - *p* is
/// inferred to be the length of the *moments* array.
fn central_moment_coefficients<A>(moments: &[A]) -> Vec<A>
where
A: Float + FromPrimitive,
{
let order = moments.len();
IterBinomial::new(order)
.zip(moments.iter().rev())
.map(|(binom, &moment)| A::from_usize(binom).unwrap() * moment)
.collect()
}

/// Uses [Horner's method] to evaluate a polynomial with a single indeterminate.
///
/// Coefficients are expected to be sorted by ascending order
/// with respect to the indeterminate's exponent.
///
/// If the array is empty, `A::zero()` is returned.
///
/// Horner's method can evaluate a polynomial of order *n* with a single indeterminate
/// using only *n-1* multiplications and *n-1* sums - in terms of number of operations,
/// this is an optimal algorithm for polynomial evaluation.
///
/// [Horner's method]: https://en.wikipedia.org/wiki/Horner%27s_method
fn horner_method<A>(coefficients: Vec<A>, indeterminate: A) -> A
where
A: Float,
{
let mut result = A::zero();
for coefficient in coefficients.into_iter().rev() {
result = coefficient + indeterminate * result
}
result
}

#[cfg(test)]
mod tests {
use super::SummaryStatisticsExt;
use crate::errors::EmptyInput;
use approx::abs_diff_eq;
use ndarray::{array, Array1};
use approx::assert_abs_diff_eq;
use ndarray::{array, Array, Array1};
use ndarray_rand::RandomExt;
use noisy_float::types::N64;
use rand::distributions::Uniform;
use std::f64;

#[test]
Expand Down Expand Up @@ -90,16 +239,117 @@ mod tests {
// Computed using SciPy
let expected_geometric_mean = 0.4345897639796527;

abs_diff_eq!(a.mean().unwrap(), expected_mean, epsilon = f64::EPSILON);
abs_diff_eq!(
assert_abs_diff_eq!(a.mean().unwrap(), expected_mean, epsilon = 1e-9);
assert_abs_diff_eq!(
a.harmonic_mean().unwrap(),
expected_harmonic_mean,
epsilon = f64::EPSILON
epsilon = 1e-7
);
abs_diff_eq!(
assert_abs_diff_eq!(
a.geometric_mean().unwrap(),
expected_geometric_mean,
epsilon = f64::EPSILON
epsilon = 1e-12
);
}

#[test]
fn test_central_moment_with_empty_array_of_floats() {
let a: Array1<f64> = array![];
for order in 0..=3 {
assert_eq!(a.central_moment(order), Err(EmptyInput));
assert_eq!(a.central_moments(order), Err(EmptyInput));
}
}

#[test]
fn test_zeroth_central_moment_is_one() {
let n = 50;
let bound: f64 = 200.;
let a = Array::random(n, Uniform::new(-bound.abs(), bound.abs()));
assert_eq!(a.central_moment(0).unwrap(), 1.);
}

#[test]
fn test_first_central_moment_is_zero() {
let n = 50;
let bound: f64 = 200.;
let a = Array::random(n, Uniform::new(-bound.abs(), bound.abs()));
assert_eq!(a.central_moment(1).unwrap(), 0.);
}

#[test]
fn test_central_moments() {
let a: Array1<f64> = array![
0.07820559, 0.5026185, 0.80935324, 0.39384033, 0.9483038, 0.62516215, 0.90772261,
0.87329831, 0.60267392, 0.2960298, 0.02810356, 0.31911966, 0.86705506, 0.96884832,
0.2222465, 0.42162446, 0.99909868, 0.47619762, 0.91696979, 0.9972741, 0.09891734,
0.76934818, 0.77566862, 0.7692585, 0.2235759, 0.44821286, 0.79732186, 0.04804275,
0.87863238, 0.1111003, 0.6653943, 0.44386445, 0.2133176, 0.39397086, 0.4374617,
0.95896624, 0.57850146, 0.29301706, 0.02329879, 0.2123203, 0.62005503, 0.996492,
0.5342986, 0.97822099, 0.5028445, 0.6693834, 0.14256682, 0.52724704, 0.73482372,
0.1809703,
];
// Computed using scipy.stats.moment
let expected_moments = vec![
1.,
0.,
0.09339920262960291,
-0.0026849636727735186,
0.015403769257729755,
-0.001204176487006564,
0.002976822584939186,
];
for (order, expected_moment) in expected_moments.iter().enumerate() {
assert_abs_diff_eq!(
a.central_moment(order as u16).unwrap(),
expected_moment,
epsilon = 1e-8
);
}
}

#[test]
fn test_bulk_central_moments() {
// Test that the bulk method is coherent with the non-bulk method
let n = 50;
let bound: f64 = 200.;
let a = Array::random(n, Uniform::new(-bound.abs(), bound.abs()));
let order = 10;
let central_moments = a.central_moments(order).unwrap();
for i in 0..=order {
assert_eq!(a.central_moment(i).unwrap(), central_moments[i as usiz
e]);
}
}

#[test]
fn test_kurtosis_and_skewness_is_none_with_empty_array_of_floats() {
let a: Array1<f64> = array![];
assert_eq!(a.skewness(), Err(EmptyInput));
assert_eq!(a.kurtosis(), Err(EmptyInput));
}

#[test]
fn test_kurtosis_and_skewness() {
let a: Array1<f64> = array![
0.33310096, 0.98757449, 0.9789796, 0.96738114, 0.43545674, 0.06746873, 0.23706562,
0.04241815, 0.38961714, 0.52421271, 0.93430327, 0.33911604, 0.05112372, 0.5013455,
0.05291507, 0.62511183, 0.20749633, 0.22132433, 0.14734804, 0.51960608, 0.00449208,
0.4093339, 0.2237519, 0.28070469, 0.7887231, 0.92224523, 0.43454188, 0.18335111,
0.08646856, 0.87979847, 0.25483457, 0.99975627, 0.52712442, 0.41163279, 0.85162594,
0.52618733, 0.75815023, 0.30640695, 0.14205781, 0.59695813, 0.851331, 0.39524328,
0.73965373, 0.4007615, 0.02133069, 0.92899207, 0.79878191, 0.38947334, 0.22042183,
0.77768353,
];
// Computed using scipy.stats.kurtosis(a, fisher=False)
let expected_kurtosis = 1.821933711687523;
// Computed using scipy.stats.skew
let expected_skewness = 0.2604785422878771;

let kurtosis = a.kurtosis().unwrap();
let skewness = a.skewness().unwrap();

assert_abs_diff_eq!(kurtosis, expected_kurtosis, epsilon = 1e-12);
assert_abs_diff_eq!(skewness, expected_skewness, epsilon = 1e-8);
}
}
Loading