Central moments #23

Merged: 47 commits, Apr 6, 2019

Conversation

LukeMathWalker (Member)

Computation of central moments of arbitrary order.

Would it be worth adding another method that takes the mean as a parameter, to avoid re-computing it if the user already has its value, @jturner314?
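For illustration only, such a method might look roughly like the sketch below. The name central_moment_with_mean, the Array1 restriction, and the trait bounds are placeholders rather than part of this PR, and the body uses the naive formula instead of the numerically stable one implemented here.

use ndarray::Array1;
use num_traits::{Float, FromPrimitive};

/// Hypothetical sketch (not part of this PR): a central moment of the given
/// order that reuses a mean the caller has already computed, so the mean is
/// not recomputed internally.
fn central_moment_with_mean<A>(a: &Array1<A>, mean: A, order: u16) -> A
where
    A: Float + FromPrimitive,
{
    let n = A::from_usize(a.len()).expect("Converting number of elements to `A` must not fail");
    a.mapv(|x| (x - mean).powi(i32::from(order))).sum() / n
}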

LukeMathWalker (Member, Author)

Once pairwise summation gets merged into ndarray, this will actually be a corrected pairwise two-pass algorithm for central moments.
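For readers who haven't seen the term, pairwise summation cuts the growth of floating-point rounding error from O(n) to O(log n) by summing halves recursively instead of accumulating left to right. A minimal sketch of the idea (not ndarray's implementation, which, like most real ones, also falls back to a simple loop below a block-size threshold):

/// Minimal pairwise-summation sketch: split the slice in half, sum each half
/// recursively, then add the two partial sums.
fn pairwise_sum(x: &[f64]) -> f64 {
    match x.len() {
        0 => 0.0,
        1 => x[0],
        n => {
            let (left, right) = x.split_at(n / 2);
            pairwise_sum(left) + pairwise_sum(right)
        }
    }
}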

LukeMathWalker mentioned this pull request on Jan 15, 2019.

LukeMathWalker (Member, Author)

Added another method, central_moments, to compute multiple central moments up to a given order without repeating the same intermediate steps several times.

This is useful for efficient kurtosis and skewness computation, given that their formulas involve central moments of different orders.
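As a rough illustration of the payoff (assuming, for the sketch only, that the central moments of order 0 through 4 are available in a slice; the actual return type of central_moments in this PR was later adjusted to match #25):

/// Sketch only: sample skewness and (non-excess) kurtosis expressed in terms
/// of the central moments m2, m3 and m4, which can all be obtained from a
/// single call instead of three separate moment computations.
fn skewness_and_kurtosis(moments: &[f64]) -> (f64, f64) {
    // moments[k] is assumed to hold the k-th central moment, for k = 0..=4.
    let (m2, m3, m4) = (moments[2], moments[3], moments[4]);
    let skewness = m3 / m2.powf(1.5);
    let kurtosis = m4 / (m2 * m2);
    (skewness, kurtosis)
}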

LukeMathWalker (Member, Author)

Added kurtosis and skewness computation.

LukeMathWalker (Member, Author) commented on Jan 23, 2019

Fixed an issue in the kurtosis test (I was comparing against SciPy's kurtosis with fisher=True, which returns the excess kurtosis, instead of fisher=False).

LukeMathWalker and others added 10 commits on March 26, 2019. Commit message excerpts:

- In reality, this probably isn't necessary because I'd expect someone to terminate their program when trying to calculate a moment of order greater than `i32::MAX` (because it would take so long), but it's nice to have an explicit check anyway.
- `A: Float` requires `A: Copy`.
- I think this is a bit easier to understand at first glance.

jturner314 (Member) left a comment

Thanks for working on this! It's especially nice that you found a numerically stable formula instead of using the naive calculation. I've created LukeMathWalker#3 with some minor changes and added a couple of comments. Everything else looks good.

src/summary_statistics/mod.rs (outdated review thread)

@@ -88,16 +256,114 @@ mod tests {
     // Computed using SciPy
     let expected_geometric_mean = 0.4345897639796527;

-    abs_diff_eq!(a.mean().unwrap(), expected_mean, epsilon = f64::EPSILON);
-    abs_diff_eq!(
+    assert_abs_diff_eq!(a.mean().unwrap(), expected_mean, epsilon = 1e-6);
jturner314 (Member)

This epsilon is fairly large. Is it due to ndarray not having pairwise summation yet?

LukeMathWalker (Member, Author)

I have tuned all epsilons to be as close as possible to 1e-12, stopping when the assertion fails.
It seems our mean estimate agrees with NumPy's to within 1e-9; I'd say that the lack of pairwise summation is the prime suspect. It would be nice to have something like Python's decimal in Rust to get a sense of the real error.

jturner314 (Member)

> I have tuned all epsilons to be as close as possible to 1e-12, stopping when the assertion fails.

Okay, thanks. That's helpful to get a better understanding of the precision.

> It would be nice to have something like Python's decimal in Rust to get a sense of the real error.

For this particular case, decimal gives the following for the standard and harmonic mean:

>>> from decimal import *
>>> getcontext().prec = 100
>>> data = [
...     Decimal('0.99889651'), Decimal('0.0150731'), Decimal('0.28492482'), Decimal('0.83819218'), Decimal('0.48413156'), Decimal('0.80710412'), Decimal('0.41762936'),
...     Decimal('0.22879429'), Decimal('0.43997224'), Decimal('0.23831807'), Decimal('0.02416466'), Decimal('0.6269962'), Decimal('0.47420614'), Decimal('0.56275487'),
...     Decimal('0.78995021'), Decimal('0.16060581'), Decimal('0.64635041'), Decimal('0.34876609'), Decimal('0.78543249'), Decimal('0.19938356'), Decimal('0.34429457'),
...     Decimal('0.88072369'), Decimal('0.17638164'), Decimal('0.60819363'), Decimal('0.250392'), Decimal('0.69912532'), Decimal('0.78855523'), Decimal('0.79140914'),
...     Decimal('0.85084218'), Decimal('0.31839879'), Decimal('0.63381769'), Decimal('0.22421048'), Decimal('0.70760302'), Decimal('0.99216018'), Decimal('0.80199153'),
...     Decimal('0.19239188'), Decimal('0.61356023'), Decimal('0.31505352'), Decimal('0.06120481'), Decimal('0.66417377'), Decimal('0.63608897'), Decimal('0.84959691'),
...     Decimal('0.43599069'), Decimal('0.77867775'), Decimal('0.88267754'), Decimal('0.83003623'), Decimal('0.67016118'), Decimal('0.67547638'), Decimal('0.65220036'),
...     Decimal('0.68043427')
... ]
>>> sum(data) / len(data)
Decimal('0.5475494054')
>>> 1 / (sum(1/x for x in data) / len(data))
Decimal('0.2179009364951495638060724591285330338229747648318613758290505954023583698639515519076072129091188061')

In Rust, we could add a dev-dependency on rug to compute the correct result to the necessary precision, but it's not necessary for this PR.
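A minimal sketch of what such a dev-dependency could look like; reference_mean is a made-up helper, the precision is arbitrary, and the parse-then-assign usage of rug below follows the pattern from rug's documentation rather than any code in this repository:

use rug::Float;

/// Compute a high-precision reference mean from decimal string literals,
/// so the f64 result can be compared against it.
fn reference_mean(data: &[&str], prec: u32) -> Float {
    let mut sum = Float::with_val(prec, 0);
    for s in data {
        // Parse the decimal literal directly instead of going through f64,
        // so the reference is not contaminated by binary rounding.
        sum += Float::with_val(prec, Float::parse(s).expect("valid decimal literal"));
    }
    sum / data.len() as u32
}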

LukeMathWalker (Member, Author)

That's reassuring: using the results from decimal, it turns out our computation is correct to within f64::EPSILON in this specific case.

LukeMathWalker (Member, Author)

We should definitely look into using something like rug to keep our numerical error in check. As a side benefit, comparing with SciPy/NumPy reassures me that the algorithm implementation is roughly correct, even if, due to our respective numerical errors, we do not agree on the smaller digits.

jturner314 added the Enhancement (new feature or request) label on Mar 31, 2019
LukeMathWalker (Member, Author)

I have changed the return type to match what we decided in #25; it should be ready to be merged now, @jturner314.

jturner314 (Member) left a comment

Looks good to me. The only remaining potential change is making `order` have type `u32` or even `u16` (since I can't imagine anyone wanting to calculate the 65537th order central moment). `usize` is fine, though.

LukeMathWalker (Member, Author)

I agree; I changed it to `u16`, @jturner314.

{
    let n_elements =
        A::from_usize(a.len()).expect("Converting number of elements to `A` must not fail");
    let order = order
jturner314 (Member) commented on Apr 6, 2019

Now that `order` is `u16`, we can change this to `let order = order as i32`.

LukeMathWalker (Member, Author)

True, it can't fail anymore. I'll merge once the CI has finished 👍
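For completeness, the reason the conversion is now infallible: every u16 value fits in i32, which is the exponent type that f64::powi (and num_traits' Float::powi) expects. A trivial standalone illustration, not code from this PR:

fn main() {
    // Every u16 fits in i32, so this conversion can never fail or overflow;
    // `i32::from` makes that explicit, and `order as i32` is equivalent here.
    let order: u16 = 4;
    let exponent = i32::from(order);
    let centred_power = (0.7_f64 - 0.5_f64).powi(exponent);
    println!("{}", centred_power);
}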

LukeMathWalker merged commit 9e04bc0 into rust-ndarray:master on Apr 6, 2019
LukeMathWalker deleted the central-moments branch on April 6, 2019 at 17:30