Skip to content

Non-parametric data collocation functionality for smoothing timeseries data and estimating derivatives.

License

Notifications You must be signed in to change notification settings

SciML/DataCollocations.jl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

DataCollocations.jl

CI Documentation

DataCollocations.jl provides non-parametric data collocation functionality for smoothing timeseries data and estimating derivatives.

Two Approaches for Data Collocation

DataCollocations.jl offers two distinct methodologies for data collocation, each optimized for different data characteristics:

1. Kernel Smoothing Methods (For Noisy Data)

Robust regression-based approach for handling noisy measurements:

  • Multiple kernel functions: Epanechnikov, Triangular, Gaussian, Quartic, Triweight, Tricube, Cosine, Logistic, Sigmoid, Silverman
  • Automatic bandwidth selection: Optimally balances bias-variance tradeoff
  • Noise robustness: Designed to handle measurement noise and outliers
  • Regression splines: Smoothed fit that doesn't necessarily pass through data points
  • Best for: Experimental data with significant measurement noise

2. DataInterpolations.jl Integration (For Clean Data)

Exact interpolation approach for high-quality data:

  • Standard interpolation methods: CubicSpline, QuadraticInterpolation, BSpline, Akima, etc.
  • Exact fitting: Interpolation curves pass exactly through data points
  • Minimal noise assumption: Assumes data points are accurate measurements
  • High efficiency: Fast computation for clean, well-sampled data
  • Best for: Simulation data or high-precision measurements with minimal noise

When to Use Each Approach

Data Characteristics Recommended Method Reason
Experimental measurements with noise Kernel smoothing Robust to noise, provides smoothed estimates
Simulation results DataInterpolations Exact, efficient, preserves accuracy
Sparse, clean data DataInterpolations (CubicSpline) Exact interpolation between points
Dense, noisy data Kernel smoothing (Epanechnikov) Optimal noise handling
Very noisy data NoiseRobustDifferentiation.jl Specialized for heavy noise

Features

  • Multiple kernel functions for data smoothing
  • Automatic bandwidth selection
  • Support for DataInterpolations.jl integration
  • Derivative estimation from noisy data
  • Efficient implementation with pre-allocated arrays

Installation

Since this package is not yet registered, install it directly from GitHub:

using Pkg
Pkg.add(url="https://github.com/SciML/DataCollocations.jl")

Once registered in the General registry:

using Pkg
Pkg.add("DataCollocations")

Quick Start

using DataCollocations
using OrdinaryDiffEq

# Generate some sample data
f(u, p, t) = p .* u
prob = ODEProblem(f, [1.0], (0.0, 10.0), [-0.1])
t = collect(0.0:0.1:10.0)
data = Array(solve(prob, Tsit5(); saveat=t))

# Perform collocation to estimate derivatives and smooth data
u′, u = collocate_data(data, t, TriangularKernel(), 0.1)

# u′ contains the estimated derivatives
# u contains the smoothed data

Available Kernels

DataCollocations.jl supports multiple kernel functions for noisy data:

Bounded Support Kernels (support on [-1, 1]):

  • EpanechnikovKernel()
  • UniformKernel()
  • TriangularKernel() (default)
  • QuarticKernel()
  • TriweightKernel()
  • TricubeKernel()
  • CosineKernel()

Unbounded Support Kernels:

  • GaussianKernel()
  • LogisticKernel()
  • SigmoidKernel()
  • SilvermanKernel()

DataInterpolations.jl Integration

With DataInterpolations.jl loaded, you can use interpolation methods:

using DataInterpolations

# Use interpolation to generate data at intermediate timepoints
tpoints_sample = 0.05:0.1:9.95
u′, u = collocate_data(data, t, tpoints_sample, LinearInterpolation)

API Reference

collocate_data

u′, u = collocate_data(data, tpoints, kernel=TriangularKernel(), bandwidth=nothing)
u′, u = collocate_data(data, tpoints, tpoints_sample, interp, args...)

Arguments:

  • data: Matrix where each column is a snapshot of the timeseries
  • tpoints: Time points corresponding to data columns
  • kernel: Kernel function for smoothing (default: TriangularKernel())
  • bandwidth: Smoothing bandwidth (auto-selected if nothing)
  • tpoints_sample: Sample points for interpolation method
  • interp: Interpolation method from DataInterpolations.jl

Returns:

  • u′: Estimated derivatives
  • u: Smoothed data

Contributing

Contributions are welcome! Please see the contributing guidelines for more information.

Related Packages

Citation

If you use DataCollocations.jl in your research, please cite the collocation methodology paper:

@article{roesch2021collocation,
  title={Collocation based training of neural ordinary differential equations},
  author={Roesch, Elisabeth and Rackauckas, Christopher and Stumpf, Michael P. H.},
  journal={Statistical Applications in Genetics and Molecular Biology},
  volume={20},
  number={2},
  pages={37--49},
  year={2021},
  publisher={De Gruyter},
  doi={10.1515/sagmb-2020-0025},
  url={https://doi.org/10.1515/sagmb-2020-0025}
}

About

Non-parametric data collocation functionality for smoothing timeseries data and estimating derivatives.

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Sponsor this project

 

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages