Central moments #23

Merged: 47 commits, Apr 6, 2019

Conversation

LukeMathWalker (Member)

Computation of central moments of arbitrary order.

Would it be worth adding another method that takes the mean as a parameter, to avoid re-computing it if the user already has its value, @jturner314?
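For illustration only, such a method might look roughly like the sketch below. The name central_moment_with_mean, the Array1 restriction, and the trait bounds are placeholders rather than part of this PR, and the body uses the naive formula instead of the numerically stable one implemented here.

use ndarray::Array1;
use num_traits::{Float, FromPrimitive};

/// Hypothetical sketch (not part of this PR): a central moment of the given
/// order that reuses a mean the caller has already computed, so the mean is
/// not recomputed internally.
fn central_moment_with_mean<A>(a: &Array1<A>, mean: A, order: u16) -> A
where
    A: Float + FromPrimitive,
{
    let n = A::from_usize(a.len()).expect("Converting number of elements to `A` must not fail");
    a.mapv(|x| (x - mean).powi(i32::from(order))).sum() / n
}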

LukeMathWalker (Member, Author)

Once pairwise summation gets merged into ndarray, this will actually be a corrected pairwise two-pass algorithm for central moments.
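For readers who haven't seen the term, pairwise summation cuts the growth of floating-point rounding error from O(n) to O(log n) by summing halves recursively instead of accumulating left to right. A minimal sketch of the idea (not ndarray's implementation, which, like most real ones, also falls back to a simple loop below a block-size threshold):

/// Minimal pairwise-summation sketch: split the slice in half, sum each half
/// recursively, then add the two partial sums.
fn pairwise_sum(x: &[f64]) -> f64 {
    match x.len() {
        0 => 0.0,
        1 => x[0],
        n => {
            let (left, right) = x.split_at(n / 2);
            pairwise_sum(left) + pairwise_sum(right)
        }
    }
}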

LukeMathWalker mentioned this pull request on Jan 15, 2019.

LukeMathWalker (Member, Author)

Added another method, central_moments, to compute multiple central moments up to a given order without repeating the same intermediate steps several times.

This is useful for efficient kurtosis and skewness computation, given that their formulas involve central moments of different orders.
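As a rough illustration of the payoff (assuming, for the sketch only, that the central moments of order 0 through 4 are available in a slice; the actual return type of central_moments in this PR was later adjusted to match #25):

/// Sketch only: sample skewness and (non-excess) kurtosis expressed in terms
/// of the central moments m2, m3 and m4, which can all be obtained from a
/// single call instead of three separate moment computations.
fn skewness_and_kurtosis(moments: &[f64]) -> (f64, f64) {
    // moments[k] is assumed to hold the k-th central moment, for k = 0..=4.
    let (m2, m3, m4) = (moments[2], moments[3], moments[4]);
    let skewness = m3 / m2.powf(1.5);
    let kurtosis = m4 / (m2 * m2);
    (skewness, kurtosis)
}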

LukeMathWalker (Member, Author)

Added kurtosis and skewness computation.

LukeMathWalker (Member, Author) commented on Jan 23, 2019

Fixed an issue in the kurtosis test (I was comparing against SciPy's kurtosis with fisher=True, which returns the excess kurtosis, instead of fisher=False).

LukeMathWalker and others added 10 commits on March 26, 2019. Commit message excerpts:

- In reality, this probably isn't necessary because I'd expect someone to terminate their program when trying to calculate a moment of order greater than `i32::MAX` (because it would take so long), but it's nice to have an explicit check anyway.
- `A: Float` requires `A: Copy`.
- I think this is a bit easier to understand at first glance.

jturner314 (Member) left a comment

Thanks for working on this! It's especially nice that you found a numerically stable formula instead of using the naive calculation. I've created LukeMathWalker#3 with some minor changes and added a couple of comments. Everything else looks good.

src/summary_statistics/mod.rs (outdated review thread)

@@ -88,16 +256,114 @@ mod tests {
     // Computed using SciPy
     let expected_geometric_mean = 0.4345897639796527;

-    abs_diff_eq!(a.mean().unwrap(), expected_mean, epsilon = f64::EPSILON);
-    abs_diff_eq!(
+    assert_abs_diff_eq!(a.mean().unwrap(), expected_mean, epsilon = 1e-6);
jturner314 (Member)

This epsilon is fairly large. Is it due to ndarray not having pairwise summation yet?

LukeMathWalker (Member, Author)

I have tuned all epsilons to be as close as possible to 1e-12, stopping when the assertion fails.
It seems our mean estimate agrees with NumPy's to within 1e-9; I'd say that the lack of pairwise summation is the prime suspect. It would be nice to have something like Python's decimal in Rust to get a sense of the real error.

jturner314 (Member)

> I have tuned all epsilons to be as close as possible to 1e-12, stopping when the assertion fails.

Okay, thanks. That's helpful to get a better understanding of the precision.

> It would be nice to have something like Python's decimal in Rust to get a sense of the real error.

For this particular case, decimal gives the following for the standard and harmonic mean:

>>> from decimal import *
>>> getcontext().prec = 100
>>> data = [
...     Decimal('0.99889651'), Decimal('0.0150731'), Decimal('0.28492482'), Decimal('0.83819218'), Decimal('0.48413156'), Decimal('0.80710412'), Decimal('0.41762936'),
...     Decimal('0.22879429'), Decimal('0.43997224'), Decimal('0.23831807'), Decimal('0.02416466'), Decimal('0.6269962'), Decimal('0.47420614'), Decimal('0.56275487'),
...     Decimal('0.78995021'), Decimal('0.16060581'), Decimal('0.64635041'), Decimal('0.34876609'), Decimal('0.78543249'), Decimal('0.19938356'), Decimal('0.34429457'),
...     Decimal('0.88072369'), Decimal('0.17638164'), Decimal('0.60819363'), Decimal('0.250392'), Decimal('0.69912532'), Decimal('0.78855523'), Decimal('0.79140914'),
...     Decimal('0.85084218'), Decimal('0.31839879'), Decimal('0.63381769'), Decimal('0.22421048'), Decimal('0.70760302'), Decimal('0.99216018'), Decimal('0.80199153'),
...     Decimal('0.19239188'), Decimal('0.61356023'), Decimal('0.31505352'), Decimal('0.06120481'), Decimal('0.66417377'), Decimal('0.63608897'), Decimal('0.84959691'),
...     Decimal('0.43599069'), Decimal('0.77867775'), Decimal('0.88267754'), Decimal('0.83003623'), Decimal('0.67016118'), Decimal('0.67547638'), Decimal('0.65220036'),
...     Decimal('0.68043427')
... ]
>>> sum(data) / len(data)
Decimal('0.5475494054')
>>> 1 / (sum(1/x for x in data) / len(data))
Decimal('0.2179009364951495638060724591285330338229747648318613758290505954023583698639515519076072129091188061')

In Rust, we could add a dev-dependency on rug to compute the correct result to the necessary precision, but it's not necessary for this PR.
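A minimal sketch of what such a dev-dependency could look like; reference_mean is a made-up helper, the precision is arbitrary, and the parse-then-assign usage of rug below follows the pattern from rug's documentation rather than any code in this repository:

use rug::Float;

/// Compute a high-precision reference mean from decimal string literals,
/// so the f64 result can be compared against it.
fn reference_mean(data: &[&str], prec: u32) -> Float {
    let mut sum = Float::with_val(prec, 0);
    for s in data {
        // Parse the decimal literal directly instead of going through f64,
        // so the reference is not contaminated by binary rounding.
        sum += Float::with_val(prec, Float::parse(s).expect("valid decimal literal"));
    }
    sum / data.len() as u32
}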

LukeMathWalker (Member, Author)

That's reassuring: using the results from decimal, it turns out our computation is correct to within f64::EPSILON in this specific case.

LukeMathWalker (Member, Author)

We should definitely look into using something like rug to keep our numerical error in check. As a side benefit, comparing with SciPy/NumPy reassures me that the algorithm implementation is roughly correct, even if, due to our respective numerical errors, we do not agree on the smaller digits.

jturner314 added the Enhancement (new feature or request) label on Mar 31, 2019
LukeMathWalker (Member, Author)

I have changed the return type to match what we decided in #25; it should be ready to be merged now, @jturner314.

jturner314 (Member) left a comment

Looks good to me. The only remaining potential change is making `order` have type `u32` or even `u16` (since I can't imagine anyone wanting to calculate the 65537th order central moment). `usize` is fine, though.

LukeMathWalker (Member, Author)

I agree; I changed it to `u16`, @jturner314.

{
    let n_elements =
        A::from_usize(a.len()).expect("Converting number of elements to `A` must not fail");
    let order = order
jturner314 (Member) commented on Apr 6, 2019

Now that `order` is `u16`, we can change this to `let order = order as i32`.

LukeMathWalker (Member, Author)

True, it can't fail anymore. I'll merge once the CI has finished 👍
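For completeness, the reason the conversion is now infallible: every u16 value fits in i32, which is the exponent type that f64::powi (and num_traits' Float::powi) expects. A trivial standalone illustration, not code from this PR:

fn main() {
    // Every u16 fits in i32, so this conversion can never fail or overflow;
    // `i32::from` makes that explicit, and `order as i32` is equivalent here.
    let order: u16 = 4;
    let exponent = i32::from(order);
    let centred_power = (0.7_f64 - 0.5_f64).powi(exponent);
    println!("{}", centred_power);
}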

LukeMathWalker merged commit 9e04bc0 into rust-ndarray:master on Apr 6, 2019
LukeMathWalker deleted the central-moments branch on April 6, 2019 at 17:30