overvac/mlcore

mlcore

🚀 Modern C++17 Machine Learning Library

A lightweight, high-performance machine learning library built in pure C++17.
Inspired by scikit-learn, TensorFlow, and pandas.


📖 Overview

mlcore is a modern, high-performance C++ machine learning framework that brings Python-like ergonomics (think fit, predict, transform) to native C++17. It's engineered for low-latency, production-grade systems where you want full control, zero interpreter overhead, and clean, readable implementations of classic and modern ML algorithms.

  • 🔥 Blazing fast native code for quant finance, embedded/edge, and HPC workflows.
  • 🧩 Modular & composable: mix preprocessing, filters, supervised/unsupervised models into pipelines.
  • 🐍 Python-inspired API: scikit-learn style methods you already know.
  • 💾 Portable model I/O: Save/Load everything with RapidJSON.
  • 🌍 Cross-platform: Windows, Linux, macOS. Developed with Visual Studio 2022 on Windows 11.

Perfect for: algorithmic trading systems, signal processing stacks, model prototyping-to-production without a language switch, and educational deep-dives into ML internals.


🛠 Requirements

  • C++17 or newer
  • RapidJSON (header-only) for model serialization

Supported Compilers

  • MSVC 2022+
  • GCC 11+
  • Clang 14+

🚀 Installation

Clone the repository:

git clone https://github.com/yourusername/mlcore.git
cd mlcore

Include in your project:

// Example
#include "mlcore/preprocessing/scaler.h"
#include "mlcore/preprocessing/split.h"
#include "mlcore/preprocessing/transform.h"
#include "mlcore/filter/kalman.h"
#include "mlcore/supervised/linreg.h"
#include "mlcore/supervised/ransac.h"
#include "mlcore/supervised/logreg.h"
#include "mlcore/supervised/nb.h"
#include "mlcore/supervised/knn.h"
#include "mlcore/supervised/svm.h"
#include "mlcore/supervised/gpr.h"
#include "mlcore/supervised/rf.h"
#include "mlcore/supervised/nn.h"
#include "mlcore/unsupervised/kmeans.h"
#include "mlcore/unsupervised/pca.h"

Build your app:

  • Windows (MSVC)
    cl /std:c++17 /O2 /EHsc main.cpp
  • Linux / macOS
    g++ -std=c++17 -O2 main.cpp -o app

📂 Project Structure

mlcore/
├── preprocessing/     # Scaling, splitting, transpose
├── filter/            # Kalman filter
├── supervised/        # LinReg, RANSAC, LogReg, NB, KNN, SVM, GPR, RF, NN
├── unsupervised/      # KMeans, PCA
└── utils/             # Math/Time helpers (e.g., RapidJSON wrappers)

🧠 Algorithms Implemented

🔧 Preprocessing

| Algorithm | Type | Description |
| --- | --- | --- |
| StandardScaler | Preprocessing | Zero-mean, unit-variance feature scaling |
| MinMaxScaler | Preprocessing | Scale features to [feature_min, feature_max] |
| Transform (Transpose) | Preprocessing | Transpose vectors/matrices (m×n ↔ n×m) |
| Train/Test Split | Preprocessing | Deterministic split by ratio (e.g., 80/20) |
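
The two scalers in the table above reduce to simple per-column formulas. The sketch below is a standalone illustration, not mlcore's implementation: the helper names `standardize_column` and `min_max_column` are hypothetical, and it assumes the population standard deviation (divide by n), which this README does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: z = (x - mean) / stddev, using the population
// standard deviation (divide by n). A constant column maps to all zeros.
std::vector<double> standardize_column(const std::vector<double>& col) {
    assert(!col.empty());
    const double n = static_cast<double>(col.size());
    double mean = 0.0;
    for (double v : col) mean += v;
    mean /= n;
    double var = 0.0;
    for (double v : col) var += (v - mean) * (v - mean);
    var /= n;
    const double sd = std::sqrt(var);
    std::vector<double> out;
    out.reserve(col.size());
    for (double v : col) out.push_back(sd > 0.0 ? (v - mean) / sd : 0.0);
    return out;
}

// Hypothetical helper: linearly map [min(col), max(col)] onto [lo, hi].
// A constant column maps to lo.
std::vector<double> min_max_column(const std::vector<double>& col,
                                   double lo, double hi) {
    assert(!col.empty() && lo < hi);
    const auto [mn_it, mx_it] = std::minmax_element(col.begin(), col.end());
    const double mn = *mn_it;
    const double range = *mx_it - mn;
    std::vector<double> out;
    out.reserve(col.size());
    for (double v : col)
        out.push_back(range > 0.0 ? lo + (v - mn) / range * (hi - lo) : lo);
    return out;
}
```

Applied to the column {10, 20, 30, 40, 50}, `min_max_column(col, 0.0, 1.0)` maps the endpoints to 0 and 1 and the midpoint 30 to 0.5.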

🪄 Filters

| Algorithm | Type | Description |
| --- | --- | --- |
| Kalman Filter | Filter | 1D recursive filter for noisy signals & state estimation |
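
For readers new to the filter, here is a minimal sketch of a scalar Kalman recursion under a random-walk state model. It is an assumption about the general technique, not a copy of mlcore's `c_kalman`: the function name `kalman_1d` and the initial-state parameters `x0` and `p0` are hypothetical, while `q` and `r` play the roles of process and measurement noise.

```cpp
#include <cassert>
#include <vector>

// Hypothetical scalar Kalman filter for a random-walk state model:
//   predict: p <- p + q
//   update:  k = p / (p + r);  x <- x + k * (z - x);  p <- (1 - k) * p
// Returns the filtered estimate after each measurement.
std::vector<double> kalman_1d(const std::vector<double>& z,
                              double q, double r,
                              double x0 = 0.0, double p0 = 1.0) {
    assert(q >= 0.0 && r > 0.0 && p0 >= 0.0);
    double x = x0;  // state estimate
    double p = p0;  // estimate variance
    std::vector<double> out;
    out.reserve(z.size());
    for (double zi : z) {
        p += q;                        // predict: uncertainty grows
        const double k = p / (p + r);  // Kalman gain in (0, 1)
        x += k * (zi - x);             // pull estimate toward measurement
        p *= (1.0 - k);                // uncertainty shrinks after update
        out.push_back(x);
    }
    return out;
}
```

A small `q` relative to `r` makes the gain settle at a low value, so the output changes slowly and smooths more aggressively; a large `q` makes the filter track the raw measurements more closely.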

🤖 Supervised Learning

| Algorithm | Type | Description |
| --- | --- | --- |
| Linear Regression | Regression | Closed-form slope/intercept (mean/variance/covariance) |
| RANSAC (Robust LinReg) | Regression | Outlier-robust line fit with MAD-based residual threshold |
| Logistic Regression | Classification | Binary classification via gradient descent |
| Naive Bayes (Gaussian) | Classification | Class-conditional Gaussians with priors |
| K-Nearest Neighbors | Classification | Majority vote in Euclidean space |
| SVM (Linear) | Classification | Hinge-loss, L2, SGD-style updates (0/1 mapped to ±1) |
| Multi-class SVM (OvR) | Classification | One-vs-rest wrapper over linear SVM |
| Gaussian Process Regressor | Regression | RBF/Matern(3/2,5/2)/Rational-Quadratic/Periodic kernels |
| Random Forest | Classification | CART-style trees, bagging, OOB score, feature importances |
| Neural Network (Dense/Dropout) | Classification/Regression | Adam optimizer, multiple activations & loss functions |
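
The "closed-form slope/intercept" entry for Linear Regression refers to the textbook simple-regression formulas: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below shows that computation in isolation; `fit_line` is a hypothetical name and is not part of mlcore.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: ordinary least squares for y = slope * x + intercept.
// Returns {slope, intercept}.
std::pair<double, double> fit_line(const std::vector<double>& x,
                                   const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { mx += x[i]; my += y[i]; }
    mx /= n;
    my /= n;
    double sxy = 0.0;  // sum of (x - mx)(y - my), proportional to covariance
    double sxx = 0.0;  // sum of (x - mx)^2, proportional to variance of x
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    assert(sxx > 0.0);  // x must not be constant
    const double slope = sxy / sxx;
    return {slope, my - slope * mx};
}
```

For x = {1,2,3,4,5} and y = {1.2,2.1,2.9,4.2,5.0}, these formulas give a slope of 0.97 and an intercept of 0.17 (up to floating-point rounding).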

🌀 Unsupervised Learning

| Algorithm | Type | Description |
| --- | --- | --- |
| K-Means | Clustering | K-Means++ init, L2 distance, tolerance/convergence |
| PCA | Dim. Reduction | Covariance eigendecomposition, explained variance, whiten |
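
To make the K-Means entry concrete, the sketch below shows one Lloyd iteration: assign each point to its nearest centroid by L2 distance, then move each centroid to the mean of its points. It deliberately omits K-Means++ initialization and the tolerance check, and the names `sq_dist`, `assign_labels`, and `update_centroids` are hypothetical rather than mlcore functions.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean (L2) distance between two points of equal dimension.
double sq_dist(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) d += (a[j] - b[j]) * (a[j] - b[j]);
    return d;
}

// Assignment step: index of the nearest centroid for every point.
std::vector<std::size_t> assign_labels(const std::vector<Point>& X,
                                       const std::vector<Point>& C) {
    assert(!C.empty());
    std::vector<std::size_t> labels(X.size(), 0);
    for (std::size_t i = 0; i < X.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < C.size(); ++c) {
            const double d = sq_dist(X[i], C[c]);
            if (d < best) { best = d; labels[i] = c; }
        }
    }
    return labels;
}

// Update step: each centroid becomes the mean of its assigned points.
// A centroid with no assigned points keeps its previous position.
std::vector<Point> update_centroids(const std::vector<Point>& X,
                                    const std::vector<std::size_t>& labels,
                                    const std::vector<Point>& C) {
    assert(X.size() == labels.size() && !C.empty());
    std::vector<Point> sums(C.size(), Point(C[0].size(), 0.0));
    std::vector<std::size_t> counts(C.size(), 0);
    for (std::size_t i = 0; i < X.size(); ++i) {
        for (std::size_t j = 0; j < X[i].size(); ++j) sums[labels[i]][j] += X[i][j];
        ++counts[labels[i]];
    }
    std::vector<Point> out = C;
    for (std::size_t c = 0; c < C.size(); ++c)
        if (counts[c] > 0)
            for (std::size_t j = 0; j < out[c].size(); ++j)
                out[c][j] = sums[c][j] / static_cast<double>(counts[c]);
    return out;
}
```

A full fit would repeat these two steps until the centroids move by less than the tolerance or the iteration limit is reached.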

📊 Usage Examples

Preprocessing: StandardScaler, MinMaxScaler, Split, Transpose

#include "mlcore/preprocessing/scaler.h"
#include "mlcore/preprocessing/split.h"
#include "mlcore/preprocessing/transform.h"
#include <vector>
using namespace n_mlcore::n_preprocessing;

int main() {
    std::vector<std::vector<double>> X = {{1,10}, {2,20}, {3,30}, {4,40}, {5,50}};

    // StandardScaler
    c_standard_scaler stds;
    auto Xz = stds.fit_transform(X);

    // MinMaxScaler to [0,1]
    c_min_max_scaler mms(0.0, 1.0);
    auto X01 = mms.fit_transform(X);

    // Train/Test split (80/20)
    auto [X_train, X_test] = c_split::split(X, 0.8);

    // Transpose
    auto Xt = c_transform::transpose(X); // shape: (5x2) -> (2x5)

    (void)Xz; (void)X01; (void)X_train; (void)X_test; (void)Xt;
}

Filter: Kalman (1D)

#include "mlcore/filter/kalman.h"
#include <iostream>
#include <vector>
using namespace n_mlcore::n_filter;

int main() {
    c_kalman kf(1e-5, 1e-2); // process noise q, measurement noise r
    std::vector<double> noisy = {0.3, 0.9, 1.1, 0.2, 1.5, 1.6};
    auto smooth = kf.filter(noisy);

    for (double s : smooth) std::cout << s << "\n";
}

Linear Regression

#include "mlcore/supervised/linreg.h"
#include <iostream>
#include <vector>
using namespace n_mlcore::n_supervised;

int main() {
    std::vector<double> x = {1,2,3,4,5};
    std::vector<double> y = {1.2,2.1,2.9,4.2,5.0};

    c_linreg lr;
    lr.fit(x, y);

    std::cout << "slope=" << lr.get_slope()
              << " intercept=" << lr.get_intercept() << "\n";
    std::cout << "predict(3.5)=" << lr.predict(3.5) << "\n";

    lr.save("linreg.json");
    c_linreg lr2; lr2.load("linreg.json");
}

RANSAC (Robust Line Fitting)

#include "mlcore/supervised/ransac.h"
#include <iostream>
#include <vector>
using namespace n_mlcore::n_supervised;

int main() {
    std::vector<double> x = {1,2,3,4,5,6,7,8};
    std::vector<double> y = {1,2,3,4,5,100,7,8}; // big outlier at x=6

    c_ransac ransac(/*min_samples*/2, /*max_trials*/200);
    ransac.fit(x, y);

    std::cout << "RANSAC slope=" << ransac.get_slope()
              << " intercept=" << ransac.get_intercept() << "\n";

    auto mask = ransac.get_inlier_mask();
    std::cout << "Inliers: ";
    for (size_t i=0;i<mask.size();++i) if (mask[i]) std::cout << i << " ";
    std::cout << "\n";
}

Logistic Regression (Binary)

#include "mlcore/supervised/logreg.h"
#include <iostream>
#include <vector>
using namespace n_mlcore::n_supervised;

int main() {
    std::vector<double> x = {1,2,3,4,5};
    std::vector<double> y = {0,0,0,1,1};

    c_logreg model;
    model.fit(x, y, /*lr*/0.1, /*epochs*/1000);

    std::cout << "proba(3.5)=" << model.predict_proba(3.5) << "\n";
    std::cout << "class(3.5)=" << int(model.predict(3.5)) << "\n";
}
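
The `fit(x, y, lr, epochs)` call above trains by gradient descent. As a rough picture of what such a loop can look like for a single feature, here is a standalone batch gradient descent sketch on the logistic loss. The names `LogitParams`, `sigmoid`, and `fit_logistic_1d` are hypothetical, and mlcore's actual update rule (batch versus stochastic, initialization, regularization) may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical parameter pair for p(y=1 | x) = sigmoid(w * x + b).
struct LogitParams {
    double w = 0.0;
    double b = 0.0;
};

double sigmoid(double t) { return 1.0 / (1.0 + std::exp(-t)); }

// Batch gradient descent on the mean logistic (cross-entropy) loss.
// The gradient per sample is (p - y) * x for w and (p - y) for b.
LogitParams fit_logistic_1d(const std::vector<double>& x,
                            const std::vector<double>& y,
                            double lr, int epochs) {
    assert(x.size() == y.size() && !x.empty() && lr > 0.0);
    const double n = static_cast<double>(x.size());
    LogitParams p;
    for (int e = 0; e < epochs; ++e) {
        double gw = 0.0, gb = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double err = sigmoid(p.w * x[i] + p.b) - y[i];
            gw += err * x[i];
            gb += err;
        }
        p.w -= lr * gw / n;
        p.b -= lr * gb / n;
    }
    return p;
}
```

Prediction then thresholds the probability: class 1 when `sigmoid(w * x + b)` is at least 0.5, which is equivalent to `w * x + b >= 0`.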

Naive Bayes (Gaussian)

#include "mlcore/supervised/nb.h"
#include <iostream>
#include <vector>
using namespace n_mlcore::n_supervised;

int main() {
    // Two features, two classes {0,1}
    std::vector<std::vector<double>> X = {{1,2},{1,3},{2,2},{8,9},{9,10},{8,10}};
    std::vector<int> y = {0,0,0,1,1,1};

    c_nb nb;
    nb.fit(X, y);

    std::vector<double> sample = {2.0, 2.5};
    int pred = nb.predict(sample);
    auto probs = nb.predict_proba(sample);

    std::cout << "NB pred=" << pred << "  P(0)=" << probs[0] << "  P(1)=" << probs[1] << "\n";
}
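
Behind `fit` and `predict`, Gaussian Naive Bayes stores a prior plus a per-feature mean and variance for each class, then picks the class with the highest log-posterior. The following sketch illustrates that idea on its own. `ClassStats`, `fit_gaussian_nb`, `predict_gaussian_nb`, and the `var_floor` smoothing term are hypothetical and are not taken from mlcore.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical per-class summary: log prior plus per-feature mean/variance.
struct ClassStats {
    double log_prior = 0.0;
    std::vector<double> mean;
    std::vector<double> var;
};

std::map<int, ClassStats> fit_gaussian_nb(
        const std::vector<std::vector<double>>& X,
        const std::vector<int>& y,
        double var_floor = 1e-9) {  // assumed smoothing to avoid zero variance
    assert(!X.empty() && X.size() == y.size());
    const std::size_t d = X[0].size();
    std::map<int, std::vector<std::size_t>> rows;  // label -> row indices
    for (std::size_t i = 0; i < y.size(); ++i) rows[y[i]].push_back(i);

    std::map<int, ClassStats> model;
    for (const auto& [label, idx] : rows) {
        ClassStats s;
        const double n = static_cast<double>(idx.size());
        s.log_prior = std::log(n / static_cast<double>(X.size()));
        s.mean.assign(d, 0.0);
        s.var.assign(d, 0.0);
        for (std::size_t i : idx)
            for (std::size_t j = 0; j < d; ++j) s.mean[j] += X[i][j] / n;
        for (std::size_t i : idx)
            for (std::size_t j = 0; j < d; ++j) {
                const double diff = X[i][j] - s.mean[j];
                s.var[j] += diff * diff / n;
            }
        for (double& v : s.var) v += var_floor;
        model[label] = s;
    }
    return model;
}

// Returns the label maximizing log prior + sum of Gaussian log-densities.
int predict_gaussian_nb(const std::map<int, ClassStats>& model,
                        const std::vector<double>& x) {
    assert(!model.empty());
    const double two_pi = 6.283185307179586;
    bool first = true;
    int best_label = 0;
    double best_score = 0.0;
    for (const auto& [label, s] : model) {
        assert(s.mean.size() == x.size());
        double score = s.log_prior;
        for (std::size_t j = 0; j < x.size(); ++j) {
            const double diff = x[j] - s.mean[j];
            score += -0.5 * std::log(two_pi * s.var[j])
                     - 0.5 * diff * diff / s.var[j];
        }
        if (first || score > best_score) {
            best_score = score;
            best_label = label;
            first = false;
        }
    }
    return best_label;
}
```

Working in log space avoids underflow when many small per-feature densities are multiplied together.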

K-Nearest Neighbors

#include "mlcore/supervised/knn.h"
#include <iostream>
#include <vector>
using namespace n_mlcore::n_supervised;

int main() {
    std::vector<std::vector<double>> X = {{1,2},{2,3},{3,4},{8,9},{9,10}};
    std::vector<int> y = {0,0,0,1,1};

    c_knn knn(3);
    knn.fit(X, y);

    std::cout << "KNN pred=" << knn.predict({2.5,3.0}) << "\n";
}
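
The prediction above is a majority vote among the k closest training points. A minimal standalone version of that vote might look like the sketch below. `knn_predict` is a hypothetical name, and breaking ties in favour of the smallest label is an assumption of this sketch, not a documented mlcore behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: majority vote over the k nearest neighbours.
// Squared distances are compared directly because sqrt preserves ordering.
int knn_predict(const std::vector<std::vector<double>>& X,
                const std::vector<int>& y,
                const std::vector<double>& query,
                std::size_t k) {
    assert(X.size() == y.size() && k >= 1 && k <= X.size());
    std::vector<std::pair<double, int>> dist;  // (squared distance, label)
    dist.reserve(X.size());
    for (std::size_t i = 0; i < X.size(); ++i) {
        assert(X[i].size() == query.size());
        double d = 0.0;
        for (std::size_t j = 0; j < query.size(); ++j)
            d += (X[i][j] - query[j]) * (X[i][j] - query[j]);
        dist.emplace_back(d, y[i]);
    }
    // Only the k smallest distances need to be ordered.
    std::partial_sort(dist.begin(),
                      dist.begin() + static_cast<std::ptrdiff_t>(k),
                      dist.end());
    std::map<int, int> votes;
    for (std::size_t i = 0; i < k; ++i) ++votes[dist[i].second];
    int best_label = votes.begin()->first;
    int best_count = 0;
    for (const auto& entry : votes)  // ascending labels; strict > keeps smallest on ties
        if (entry.second > best_count) {
            best_count = entry.second;
            best_label = entry.first;
        }
    return best_label;
}
```

Because KNN compares raw Euclidean distances, features on very different scales are usually standardized first so that no single feature dominates the vote.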

SVM (Binary) & Multi-class SVM (OvR)

#include "mlcore/supervised/svm.h"
#include <iostream>
#include <vector>
using namespace n_mlcore::n_supervised;

int main() {
    // Binary SVM (labels 0/1)
    std::vector<std::vector<double>> Xb = {{1,2},{2,2},{2,3},{8,8},{9,9},{9,10}};
    std::vector<int> yb = {0,0,0,1,1,1};

    c_svm svm(/*lr*/0.01, /*C*/1.0, /*epochs*/1000);
    svm.fit(Xb, yb);
    std::cout << "SVM pred=" << svm.predict({2.5,2.5}) << "\n";

    // Multi-class (0,1,2) with One-vs-Rest
    std::vector<std::vector<double>> Xm = {{1,1},{2,1},{1,2}, {8,8},{9,8},{8,9}, {4,8},{5,9},{6,8}};
    std::vector<int> ym = {0,0,0, 1,1,1, 2,2,2};

    c_multi_svm msvm(0.01, 1.0, 1000);
    msvm.fit(Xm, ym);
    std::cout << "MultiSVM pred=" << msvm.predict({5,8.2}) << "\n";
}
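
The constructor arguments above (learning rate, C, epochs) suggest a stochastic sub-gradient scheme on the L2-regularized hinge loss. The sketch below shows a single such update for the objective ½‖w‖² + C · max(0, 1 − y(w·x + b)), with 0/1 labels mapped to ±1. This is one common formulation and an assumption here; `LinearSvm`, `decision`, and `sgd_step` are hypothetical names, and mlcore's exact update may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical linear model: f(x) = w . x + b.
struct LinearSvm {
    std::vector<double> w;
    double b = 0.0;
};

double decision(const LinearSvm& m, const std::vector<double>& x) {
    assert(m.w.size() == x.size());
    double s = m.b;
    for (std::size_t j = 0; j < x.size(); ++j) s += m.w[j] * x[j];
    return s;
}

// One sub-gradient step on 0.5 * ||w||^2 + C * max(0, 1 - y * f(x)).
// label01 is 0 or 1 and is mapped to y = -1 or +1.
void sgd_step(LinearSvm& m, const std::vector<double>& x,
              int label01, double lr, double C) {
    assert(label01 == 0 || label01 == 1);
    const double y = (label01 == 1) ? 1.0 : -1.0;
    const bool violated = y * decision(m, x) < 1.0;  // inside margin or misclassified
    for (std::size_t j = 0; j < m.w.size(); ++j) {
        double grad = m.w[j];                 // gradient of the L2 term
        if (violated) grad -= C * y * x[j];   // hinge term contributes only on violation
        m.w[j] -= lr * grad;
    }
    if (violated) m.b += lr * C * y;          // bias is not regularized
}
```

Training repeats `sgd_step` over the samples for the chosen number of epochs, and the predicted class is 1 when `decision(m, x)` is non-negative, 0 otherwise. A one-vs-rest wrapper trains one such model per class and picks the class with the largest decision value.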

Gaussian Process Regressor

#include "mlcore/supervised/gpr.h"
#include <iostream>
#include <vector>
using namespace n_mlcore::n_supervised;

int main() {
    std::vector<std::vector<double>> X = {{1},{2},{3},{4},{5}};
    std::vector<double> y = {1.0, 2.1, 2.9, 4.2, 5.0};

    c_gpr gpr; // defaults: RBF kernel, normalize_y=true
    gpr.fit(X, y);

    auto [mean, stddev] = gpr.predict({{2.5},{3.5}}, /*return_std*/true);
    for (size_t i=0;i<mean.size();++i)
        std::cout << "x*[" << i << "]: mean=" << mean[i] << " std=" << stddev[i] << "\n";
}
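
The default RBF kernel mentioned in the comment above measures similarity as k(a, b) = σ² · exp(−‖a − b‖² / (2ℓ²)). The sketch below computes that kernel and the resulting Gram matrix. The names `rbf_kernel` and `gram_matrix` and the parameter names `length_scale` and `signal_var` are hypothetical, and mlcore's parameterization may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical RBF (squared-exponential) kernel:
//   k(a, b) = signal_var * exp(-||a - b||^2 / (2 * length_scale^2))
double rbf_kernel(const std::vector<double>& a, const std::vector<double>& b,
                  double length_scale = 1.0, double signal_var = 1.0) {
    assert(a.size() == b.size() && length_scale > 0.0);
    double sq = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) sq += (a[j] - b[j]) * (a[j] - b[j]);
    return signal_var * std::exp(-0.5 * sq / (length_scale * length_scale));
}

// Gram matrix K with K[i][j] = k(X[i], X[j]); symmetric by construction.
std::vector<std::vector<double>> gram_matrix(
        const std::vector<std::vector<double>>& X,
        double length_scale = 1.0, double signal_var = 1.0) {
    const std::size_t n = X.size();
    std::vector<std::vector<double>> K(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i; j < n; ++j) {
            K[i][j] = rbf_kernel(X[i], X[j], length_scale, signal_var);
            K[j][i] = K[i][j];
        }
    return K;
}
```

A GP regressor typically adds a small noise term to the diagonal of this matrix before solving for the predictive mean and standard deviation.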

Random Forest (Classification)

#include "mlcore/supervised/rf.h"
#include <iostream>
#include <vector>
using namespace n_mlcore::n_supervised;

int main() {
    std::vector<std::vector<double>> X = {{1,2},{2,3},{3,4},{8,9},{9,10}};
    std::vector<int> y = {0,0,0,1,1};

    c_rf forest(/*n_estimators*/50, /*max_depth*/-1, /*min_samples_split*/2,
                /*max_features*/-1, /*bootstrap*/true, /*oob*/true, /*seed*/42);

    forest.fit(X, y);

    auto preds = forest.predict(X);
    std::cout << "OOB score ~ " << forest.oob_score() << "\n";
    std::cout << "preds: "; for (int p : preds) std::cout << p << " "; std::cout << "\n";

    auto importances = forest.feature_importances();
    std::cout << "feature importances: "; for (auto w: importances) std::cout << w << " "; std::cout << "\n";

    forest.save("rf.json");
    c_rf rf2; rf2.load("rf.json");
}
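
CART-style trees choose splits that reduce node impurity. Assuming Gini impurity is the criterion (this README does not say which one mlcore uses), the quantity being minimized can be sketched as follows; `gini_impurity` and `split_impurity` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Gini impurity of a set of class labels: 1 - sum_k p_k^2.
// It is 0 for a pure node and grows as classes mix.
double gini_impurity(const std::vector<int>& labels) {
    assert(!labels.empty());
    std::map<int, std::size_t> counts;
    for (int label : labels) ++counts[label];
    const double n = static_cast<double>(labels.size());
    double sum_sq = 0.0;
    for (const auto& entry : counts) {
        const double p = static_cast<double>(entry.second) / n;
        sum_sq += p * p;
    }
    return 1.0 - sum_sq;
}

// Size-weighted impurity of a candidate split into left/right children.
// A tree builder would pick the split with the lowest value.
double split_impurity(const std::vector<int>& left,
                      const std::vector<int>& right) {
    const double nl = static_cast<double>(left.size());
    const double nr = static_cast<double>(right.size());
    return (nl * gini_impurity(left) + nr * gini_impurity(right)) / (nl + nr);
}
```

For the labels {0,0,0,1,1} used above, the root impurity is 1 − (0.6² + 0.4²) = 0.48, and a split that separates the two classes perfectly has impurity 0.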

Neural Network (Dense + Dropout)

#include "mlcore/supervised/nn.h"
#include <iostream>
#include <vector>
using namespace n_mlcore::n_supervised;

int main() {
    c_nn nn(/*lr*/0.001, /*l2*/0.0001, e_NN_LOSS_FUNCTION::MSE);

    nn.add_dense(2, 8, e_NN_ACTIVATION_TYPE::RELU);
    nn.add_dropout(0.2);
    nn.add_dense(8, 1, e_NN_ACTIVATION_TYPE::SIGMOID);

    std::vector<std::vector<double>> X = {{0,0},{0,1},{1,0},{1,1}};
    std::vector<std::vector<double>> Y = {{0},{1},{1},{0}};

    nn.fit(X, Y, /*epochs*/500);
    auto out = nn.predict({0,1});
    std::cout << "nn([0,1])=" << out[0] << "\n";

    nn.save("nn.json");
    c_nn nn2; nn2.load("nn.json");
}
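
The network is trained with the Adam optimizer. For reference, a single Adam update for one scalar parameter is sketched below, using the commonly cited default hyperparameters (β1 = 0.9, β2 = 0.999, ε = 1e-8). Those defaults, and the names `AdamState` and `adam_step`, are assumptions of this sketch rather than values confirmed for mlcore.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-parameter optimizer state.
struct AdamState {
    double m = 0.0;  // running mean of gradients (first moment)
    double v = 0.0;  // running mean of squared gradients (second moment)
    long t = 0;      // number of updates applied so far
};

// One Adam update; returns the new parameter value.
double adam_step(double theta, double grad, AdamState& s,
                 double lr = 0.001, double beta1 = 0.9,
                 double beta2 = 0.999, double eps = 1e-8) {
    assert(lr > 0.0 && beta1 >= 0.0 && beta1 < 1.0 && beta2 >= 0.0 && beta2 < 1.0);
    ++s.t;
    s.m = beta1 * s.m + (1.0 - beta1) * grad;
    s.v = beta2 * s.v + (1.0 - beta2) * grad * grad;
    // Bias correction compensates for the zero initialization of m and v.
    const double m_hat = s.m / (1.0 - std::pow(beta1, static_cast<double>(s.t)));
    const double v_hat = s.v / (1.0 - std::pow(beta2, static_cast<double>(s.t)));
    return theta - lr * m_hat / (std::sqrt(v_hat) + eps);
}
```

On the very first step the bias-corrected ratio is close to ±1, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.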

K-Means (Clustering)

#include "mlcore/unsupervised/kmeans.h"
#include <iostream>
#include <vector>
using namespace n_mlcore::n_unsupervised;

int main() {
    std::vector<std::vector<double>> X = {{1,1},{1,2},{2,1},{8,8},{9,9},{8,9}};
    c_kmeans kmeans(/*k*/2, /*max_iter*/100, /*tol*/1e-4);
    kmeans.fit(X);

    auto labels = kmeans.predict(X);
    auto C = kmeans.get_centroids();

    std::cout << "labels: "; for (auto l: labels) std::cout << l << " "; std::cout << "\n";
    std::cout << "centroids:\n";
    for (auto& c: C) std::cout << c[0] << "," << c[1] << "\n";

    kmeans.save("kmeans.json");
    c_kmeans km2; km2.load("kmeans.json");
}

PCA (Dimensionality Reduction)

#include "mlcore/unsupervised/pca.h"
#include <iostream>
#include <vector>
using namespace n_mlcore::n_unsupervised;

int main() {
    std::vector<std::vector<double>> X = {
        {1,2,3}, {4,5,6}, {7,8,9}, {10,11,12}
    };

    c_pca pca(/*n_components*/2);          // or c_pca(/*target_var_ratio*/0.95, /*whiten*/true)
    auto Z = pca.fit_transform(X);

    std::cout << "Z (2D):\n";
    for (auto& row : Z) { for (double v: row) std::cout << v << " "; std::cout << "\n"; }

    auto Xrec = pca.inverse_transform(Z);
    std::cout << "Reconstruction:\n";
    for (auto& row : Xrec) { for (double v: row) std::cout << v << " "; std::cout << "\n"; }
}

💾 Model Saving & Loading (RapidJSON)

All algorithms expose save(filename) and load(filename):

c_linreg lr; lr.fit(x, y); lr.save("linreg.json");
c_linreg lr2; lr2.load("linreg.json");

// Works similarly for: c_ransac, c_logreg, c_nb, c_knn, c_svm, c_multi_svm,
// c_gpr, c_rf, c_nn, c_kmeans, c_pca (and scalers where implemented)

🧪 Design Notes & Good-to-Know

  • API parity: Methods mirror Pythonic names (fit, predict, transform, fit_transform).
  • NN losses: MSE, MAE, CROSS_ENTROPY, BINARY_CROSS_ENTROPY, HUBER, HINGE, KL.
  • NN activations: SIGMOID, TANH, RELU, LEAKY_RELU, LINEAR, SOFTMAX.
  • GPR kernels: RBF, Matern(3/2,5/2), RationalQuadratic(α), Periodic(period).
  • GPR optimizer: Optional random search restarts.
  • Random Forest: OOB scoring + normalized feature importances.
  • RANSAC: Robust residual threshold via MAD if not provided.
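
The MAD mentioned in the RANSAC note is the median absolute deviation: the median of |v − median(v)|. It is a robust spread estimate, which is why it suits an inlier threshold. The sketch below computes it; `median` and `mad` are hypothetical helper names, and this README does not state which values mlcore feeds into it (targets or residuals) or whether a scale factor is applied.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty sample. The vector is taken by value so the
// caller's data is left unsorted.
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Median absolute deviation: median(|v_i - median(v)|).
double mad(const std::vector<double>& v) {
    const double med = median(v);
    std::vector<double> dev;
    dev.reserve(v.size());
    for (double x : v) dev.push_back(std::fabs(x - med));
    return median(dev);
}
```

For the RANSAC example data y = {1,2,3,4,5,100,7,8}, the median is 4.5 and the MAD is 2.5, so the single outlier at 100 barely affects the estimate.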

📝 License

Licensed under the MIT License.
