A lightweight, high-performance machine learning library built in pure C++17.
Inspired by scikit-learn, TensorFlow, and pandas.
mlcore is a modern, high-performance C++ machine learning framework that brings Python-like ergonomics (think fit, predict, transform) to native C++17. It's engineered for low-latency, production-grade systems where you want full control, zero interpreter overhead, and clean, readable implementations of classic and modern ML algorithms.
- Blazing fast native code for quant finance, embedded/edge, and HPC workflows.
- Modular & composable: mix preprocessing, filters, and supervised/unsupervised models into pipelines.
- Python-inspired API: scikit-learn style methods you already know.
- Portable model I/O: Save/Load everything with RapidJSON.
- Cross-platform: Windows, Linux, macOS. Developed with Visual Studio 2022 on Windows 11.
Perfect for: algorithmic trading systems, signal processing stacks, model prototyping-to-production without a language switch, and educational deep-dives into ML internals.
- C++17 or newer
- RapidJSON (header-only) for model serialization
Supported Compilers
- MSVC 2022+
- GCC 11+
- Clang 14+
Clone the repository:
git clone https://github.com/yourusername/mlcore.git
cd mlcore

Include in your project:
// Example
#include "mlcore/preprocessing/scaler.h"
#include "mlcore/preprocessing/split.h"
#include "mlcore/preprocessing/transform.h"
#include "mlcore/filter/kalman.h"
#include "mlcore/supervised/linreg.h"
#include "mlcore/supervised/ransac.h"
#include "mlcore/supervised/logreg.h"
#include "mlcore/supervised/nb.h"
#include "mlcore/supervised/knn.h"
#include "mlcore/supervised/svm.h"
#include "mlcore/supervised/gpr.h"
#include "mlcore/supervised/rf.h"
#include "mlcore/supervised/nn.h"
#include "mlcore/unsupervised/kmeans.h"
#include "mlcore/unsupervised/pca.h"

Build your app:
- Windows (MSVC)
cl /std:c++17 /O2 /EHsc main.cpp
- Linux / macOS
g++ -std=c++17 -O2 main.cpp -o app
mlcore/
├── preprocessing/   # Scaling, splitting, transpose
├── filter/          # Kalman filter
├── supervised/      # LinReg, RANSAC, LogReg, NB, KNN, SVM, GPR, RF, NN
├── unsupervised/    # KMeans, PCA
└── utils/           # Math/Time helpers (e.g., RapidJSON wrappers)
Preprocessing
| Algorithm | Type | Description |
|---|---|---|
| StandardScaler | Preprocessing | Zero-mean, unit-variance feature scaling |
| MinMaxScaler | Preprocessing | Scale features to [feature_min, feature_max] |
| Transform (Transpose) | Preprocessing | Transpose vectors/matrices (m×n → n×m) |
| Train/Test Split | Preprocessing | Deterministic split by ratio (e.g., 80/20) |
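
To make the two scaler rows in the table above concrete, here is a minimal standalone sketch of the arithmetic they describe. It does not use mlcore: the names `standardize_columns` and `min_max_columns` are hypothetical, and the README does not say whether mlcore divides the variance by N or N-1, so the sketch assumes the population variance (N).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper (not mlcore's API): rescale each column to zero mean
// and unit variance. Assumes population variance (divide by N). Constant
// columns are centred but not scaled, to avoid dividing by zero.
Matrix standardize_columns(const Matrix& X) {
    assert(!X.empty());
    const std::size_t rows = X.size(), cols = X[0].size();
    Matrix out(rows, std::vector<double>(cols, 0.0));
    for (std::size_t j = 0; j < cols; ++j) {
        double mean = 0.0;
        for (const auto& row : X) mean += row[j];
        mean /= static_cast<double>(rows);
        double var = 0.0;
        for (const auto& row : X) var += (row[j] - mean) * (row[j] - mean);
        var /= static_cast<double>(rows);
        const double sd = var > 0.0 ? std::sqrt(var) : 1.0;
        for (std::size_t i = 0; i < rows; ++i) out[i][j] = (X[i][j] - mean) / sd;
    }
    return out;
}

// Hypothetical helper (not mlcore's API): map each column linearly so that
// its minimum lands on lo and its maximum on hi.
Matrix min_max_columns(const Matrix& X, double lo, double hi) {
    assert(!X.empty() && lo < hi);
    const std::size_t rows = X.size(), cols = X[0].size();
    Matrix out(rows, std::vector<double>(cols, lo));
    for (std::size_t j = 0; j < cols; ++j) {
        double mn = X[0][j], mx = X[0][j];
        for (const auto& row : X) {
            mn = std::min(mn, row[j]);
            mx = std::max(mx, row[j]);
        }
        const double range = mx - mn;
        if (range == 0.0) continue;  // constant column stays at lo
        for (std::size_t i = 0; i < rows; ++i)
            out[i][j] = lo + (X[i][j] - mn) / range * (hi - lo);
    }
    return out;
}
```

Both functions work column by column, which matches the usual convention that rows are samples and columns are features.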
Filters
| Algorithm | Type | Description |
|---|---|---|
| Kalman Filter | Filter | 1D recursive filter for noisy signals & state estimation |
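
For readers who want to see what a 1D recursive filter of this kind does, the following is a minimal standalone sketch. It assumes a constant-state model (the state only drifts by process noise q, and each measurement adds noise r). The function name `kalman_filter_1d`, the initial estimate of 0, and the initial error covariance of 1 are assumptions made for illustration, not details taken from mlcore.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch (not mlcore's implementation): scalar Kalman filter
// for the model x_k = x_{k-1} + w (variance q), z_k = x_k + v (variance r).
std::vector<double> kalman_filter_1d(const std::vector<double>& z,
                                     double q, double r) {
    assert(q >= 0.0 && r > 0.0);
    double x = 0.0;  // assumed initial state estimate
    double p = 1.0;  // assumed initial error covariance
    std::vector<double> out;
    out.reserve(z.size());
    for (double measurement : z) {
        p += q;                            // predict: uncertainty grows by q
        const double k = p / (p + r);      // Kalman gain in [0, 1)
        x += k * (measurement - x);        // correct towards the measurement
        p *= (1.0 - k);                    // uncertainty shrinks after update
        out.push_back(x);
    }
    return out;
}
```

A small q relative to r makes the gain settle at a low value, so the output reacts slowly and smooths heavily; a larger q makes it track the measurements more closely.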
Supervised learning
| Algorithm | Type | Description |
|---|---|---|
| Linear Regression | Regression | Closed-form slope/intercept (mean/variance/covariance) |
| RANSAC (Robust LinReg) | Regression | Outlier-robust line fit with MAD-based residual threshold |
| Logistic Regression | Classification | Binary classification via gradient descent |
| Naive Bayes (Gaussian) | Classification | Class-conditional Gaussians with priors |
| K-Nearest Neighbors | Classification | Majority vote in Euclidean space |
| SVM (Linear) | Classification | Hinge-loss, L2, SGD-style updates (0/1 mapped to ±1) |
| Multi-class SVM (OvR) | Classification | One-vs-rest wrapper over linear SVM |
| Gaussian Process Regressor | Regression | RBF/Matern(3/2,5/2)/Rational-Quadratic/Periodic kernels |
| Random Forest | Classification | CART-style trees, bagging, OOB score, feature importances |
| Neural Network (Dense/Dropout) | Cls/Reg | Adam optimizer, multiple activations & loss functions |
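
The Linear Regression row mentions a closed-form fit from means, variance, and covariance. A minimal standalone sketch of that formula is shown here: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The names `LineFit` and `fit_line` are hypothetical and are not part of mlcore.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type for this sketch (not mlcore's API).
struct LineFit {
    double slope;
    double intercept;
};

// Ordinary least squares for a single feature, computed in closed form.
LineFit fit_line(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    const double mean_x = std::accumulate(x.begin(), x.end(), 0.0) / n;
    const double mean_y = std::accumulate(y.begin(), y.end(), 0.0) / n;
    double sxx = 0.0;  // sum of squared deviations of x (n * variance)
    double sxy = 0.0;  // sum of cross deviations (n * covariance)
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mean_x;
        sxx += dx * dx;
        sxy += dx * (y[i] - mean_y);
    }
    assert(sxx > 0.0);  // x must not be constant
    const double slope = sxy / sxx;  // the 1/n factors cancel
    return {slope, mean_y - slope * mean_x};
}
```

For the sample data used in the linear regression example in this README (x = 1..5, y = 1.2, 2.1, 2.9, 4.2, 5.0), this formula gives a slope of 0.97 and an intercept of 0.17.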
Unsupervised learning
| Algorithm | Type | Description |
|---|---|---|
| K-Means | Clustering | K-Means++ init, L2 distance, tolerance/convergence |
| PCA | Dim. Reduction | Covariance eigendecomposition, explained variance, whiten |
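
The K-Means row lists K-Means++ initialization. As an illustration of that seeding step only, here is a minimal standalone sketch: the first seed is drawn uniformly, and each further seed is drawn with probability proportional to its squared L2 distance from the nearest seed chosen so far. The function name `kmeanspp_seed_indices` and the seed parameter are hypothetical, and the sketch assumes the data contains at least k distinct points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean (L2) distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Hypothetical sketch (not mlcore's API): returns the indices of k seed
// points chosen by K-Means++ sampling. Assumes at least k distinct points.
std::vector<std::size_t> kmeanspp_seed_indices(const std::vector<Point>& X,
                                               std::size_t k, unsigned seed) {
    assert(k >= 1 && k <= X.size());
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> first(0, X.size() - 1);
    std::vector<std::size_t> chosen{first(rng)};
    // d2[i] holds the squared distance from X[i] to its nearest seed so far.
    std::vector<double> d2(X.size(), std::numeric_limits<double>::max());
    while (chosen.size() < k) {
        const Point& last = X[chosen.back()];
        for (std::size_t i = 0; i < X.size(); ++i)
            d2[i] = std::min(d2[i], squared_distance(X[i], last));
        // Already-chosen points have weight 0 and cannot be drawn again.
        std::discrete_distribution<std::size_t> pick(d2.begin(), d2.end());
        chosen.push_back(pick(rng));
    }
    return chosen;
}
```

Because far-away points get larger weights, the seeds tend to land in different clusters, which usually reduces the number of iterations needed afterwards.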
#include "mlcore/preprocessing/scaler.h"
#include "mlcore/preprocessing/split.h"
#include "mlcore/preprocessing/transform.h"
#include <vector>

using namespace n_mlcore::n_preprocessing;

int main() {
    std::vector<std::vector<double>> X = {{1,10}, {2,20}, {3,30}, {4,40}, {5,50}};

    // StandardScaler
    c_standard_scaler stds;
    auto Xz = stds.fit_transform(X);

    // MinMaxScaler to [0,1]
    c_min_max_scaler mms(0.0, 1.0);
    auto X01 = mms.fit_transform(X);

    // Train/Test split (80/20)
    auto [X_train, X_test] = c_split::split(X, 0.8);

    // Transpose
    auto Xt = c_transform::transpose(X); // shape: (5x2) -> (2x5)

    // Silence unused-variable warnings
    (void)Xz; (void)X01; (void)X_train; (void)X_test; (void)Xt;
}

#include "mlcore/filter/kalman.h"
#include <iostream>
#include <vector>

using namespace n_mlcore::n_filter;

int main() {
    c_kalman kf(1e-5, 1e-2); // process noise q, measurement noise r
    std::vector<double> noisy = {0.3, 0.9, 1.1, 0.2, 1.5, 1.6};
    auto smooth = kf.filter(noisy);
    for (double s : smooth) std::cout << s << "\n";
}

#include "mlcore/supervised/linreg.h"
#include <iostream>
#include <vector>

using namespace n_mlcore::n_supervised;

int main() {
    std::vector<double> x = {1,2,3,4,5};
    std::vector<double> y = {1.2,2.1,2.9,4.2,5.0};
    c_linreg lr;
    lr.fit(x, y);
    std::cout << "slope=" << lr.get_slope()
              << " intercept=" << lr.get_intercept() << "\n";
    std::cout << "predict(3.5)=" << lr.predict(3.5) << "\n";

    // Round-trip the fitted model through JSON
    lr.save("linreg.json");
    c_linreg lr2;
    lr2.load("linreg.json");
}

#include "mlcore/supervised/ransac.h"
#include <iostream>
#include <vector>

using namespace n_mlcore::n_supervised;

int main() {
    std::vector<double> x = {1,2,3,4,5,6,7,8};
    std::vector<double> y = {1,2,3,4,5,100,7,8}; // big outlier at x=6
    c_ransac ransac(/*min_samples*/2, /*max_trials*/200);
    ransac.fit(x, y);
    std::cout << "RANSAC slope=" << ransac.get_slope()
              << " intercept=" << ransac.get_intercept() << "\n";

    // Print the indices of the samples kept as inliers
    auto mask = ransac.get_inlier_mask();
    std::cout << "Inliers: ";
    for (size_t i = 0; i < mask.size(); ++i)
        if (mask[i]) std::cout << i << " ";
    std::cout << "\n";
}

#include "mlcore/supervised/logreg.h"
#include <iostream>
#include <vector>

using namespace n_mlcore::n_supervised;

int main() {
    std::vector<double> x = {1,2,3,4,5};
    std::vector<double> y = {0,0,0,1,1};
    c_logreg model;
    model.fit(x, y, /*lr*/0.1, /*epochs*/1000);
    std::cout << "proba(3.5)=" << model.predict_proba(3.5) << "\n";
    std::cout << "class(3.5)=" << int(model.predict(3.5)) << "\n";
}

#include "mlcore/supervised/nb.h"
#include <iostream>
#include <vector>

using namespace n_mlcore::n_supervised;

int main() {
    // Two features, two classes {0,1}
    std::vector<std::vector<double>> X = {{1,2},{1,3},{2,2},{8,9},{9,10},{8,10}};
    std::vector<int> y = {0,0,0,1,1,1};
    c_nb nb;
    nb.fit(X, y);

    std::vector<double> sample = {2.0, 2.5};
    int pred = nb.predict(sample);
    auto probs = nb.predict_proba(sample);
    std::cout << "NB pred=" << pred << " P(0)=" << probs[0] << " P(1)=" << probs[1] << "\n";
}

#include "mlcore/supervised/knn.h"
#include <iostream>
#include <vector>

using namespace n_mlcore::n_supervised;

int main() {
    std::vector<std::vector<double>> X = {{1,2},{2,3},{3,4},{8,9},{9,10}};
    std::vector<int> y = {0,0,0,1,1};
    c_knn knn(3); // k = 3 neighbours
    knn.fit(X, y);
    std::cout << "KNN pred=" << knn.predict({2.5,3.0}) << "\n";
}

#include "mlcore/supervised/svm.h"
#include <iostream>
#include <vector>

using namespace n_mlcore::n_supervised;

int main() {
    // Binary SVM (labels 0/1)
    std::vector<std::vector<double>> Xb = {{1,2},{2,2},{2,3},{8,8},{9,9},{9,10}};
    std::vector<int> yb = {0,0,0,1,1,1};
    c_svm svm(/*lr*/0.01, /*C*/1.0, /*epochs*/1000);
    svm.fit(Xb, yb);
    std::cout << "SVM pred=" << svm.predict({2.5,2.5}) << "\n";

    // Multi-class (0,1,2) with One-vs-Rest
    std::vector<std::vector<double>> Xm = {{1,1},{2,1},{1,2}, {8,8},{9,8},{8,9}, {4,8},{5,9},{6,8}};
    std::vector<int> ym = {0,0,0, 1,1,1, 2,2,2};
    c_multi_svm msvm(/*lr*/0.01, /*C*/1.0, /*epochs*/1000);
    msvm.fit(Xm, ym);
    std::cout << "MultiSVM pred=" << msvm.predict({5,8.2}) << "\n";
}

#include "mlcore/supervised/gpr.h"
#include <iostream>
#include <vector>

using namespace n_mlcore::n_supervised;

int main() {
    std::vector<std::vector<double>> X = {{1},{2},{3},{4},{5}};
    std::vector<double> y = {1.0, 2.1, 2.9, 4.2, 5.0};
    c_gpr gpr; // defaults: RBF kernel, normalize_y=true
    gpr.fit(X, y);

    // Predictive mean and standard deviation at two query points
    auto [mean, stddev] = gpr.predict({{2.5},{3.5}}, /*return_std*/true);
    for (size_t i = 0; i < mean.size(); ++i)
        std::cout << "x*[" << i << "]: mean=" << mean[i] << " std=" << stddev[i] << "\n";
}

#include "mlcore/supervised/rf.h"
#include <iostream>
#include <vector>

using namespace n_mlcore::n_supervised;

int main() {
    std::vector<std::vector<double>> X = {{1,2},{2,3},{3,4},{8,9},{9,10}};
    std::vector<int> y = {0,0,0,1,1};
    c_rf forest(/*n_estimators*/50, /*max_depth*/-1, /*min_samples_split*/2,
                /*max_features*/-1, /*bootstrap*/true, /*oob*/true, /*seed*/42);
    forest.fit(X, y);

    auto preds = forest.predict(X);
    std::cout << "OOB score ~ " << forest.oob_score() << "\n";
    std::cout << "preds: ";
    for (int p : preds) std::cout << p << " ";
    std::cout << "\n";

    auto importances = forest.feature_importances();
    std::cout << "feature importances: ";
    for (auto w : importances) std::cout << w << " ";
    std::cout << "\n";

    forest.save("rf.json");
    c_rf rf2;
    rf2.load("rf.json");
}

#include "mlcore/supervised/nn.h"
#include <iostream>
#include <vector>

using namespace n_mlcore::n_supervised;

int main() {
    c_nn nn(/*lr*/0.001, /*l2*/0.0001, e_NN_LOSS_FUNCTION::MSE);
    nn.add_dense(2, 8, e_NN_ACTIVATION_TYPE::RELU);
    nn.add_dropout(0.2);
    nn.add_dense(8, 1, e_NN_ACTIVATION_TYPE::SIGMOID);

    // XOR truth table as training data
    std::vector<std::vector<double>> X = {{0,0},{0,1},{1,0},{1,1}};
    std::vector<std::vector<double>> Y = {{0},{1},{1},{0}};
    nn.fit(X, Y, /*epochs*/500);

    auto out = nn.predict({0,1});
    std::cout << "nn([0,1])=" << out[0] << "\n";

    nn.save("nn.json");
    c_nn nn2;
    nn2.load("nn.json");
}

#include "mlcore/unsupervised/kmeans.h"
#include <iostream>
#include <vector>

using namespace n_mlcore::n_unsupervised;

int main() {
    std::vector<std::vector<double>> X = {{1,1},{1,2},{2,1},{8,8},{9,9},{8,9}};
    c_kmeans kmeans(/*k*/2, /*max_iter*/100, /*tol*/1e-4);
    kmeans.fit(X);

    auto labels = kmeans.predict(X);
    auto C = kmeans.get_centroids();
    std::cout << "labels: ";
    for (auto l : labels) std::cout << l << " ";
    std::cout << "\n";
    std::cout << "centroids:\n";
    for (auto& c : C) std::cout << c[0] << "," << c[1] << "\n";

    kmeans.save("kmeans.json");
    c_kmeans km2;
    km2.load("kmeans.json");
}

#include "mlcore/unsupervised/pca.h"
#include <iostream>
#include <vector>

using namespace n_mlcore::n_unsupervised;

int main() {
    std::vector<std::vector<double>> X = {
        {1,2,3}, {4,5,6}, {7,8,9}, {10,11,12}
    };
    c_pca pca(/*n_components*/2); // or c_pca(/*target_var_ratio*/0.95, /*whiten*/true)

    auto Z = pca.fit_transform(X);
    std::cout << "Z (2D):\n";
    for (auto& row : Z) { for (double v : row) std::cout << v << " "; std::cout << "\n"; }

    // Map the projection back to the original 3D space
    auto Xrec = pca.inverse_transform(Z);
    std::cout << "Reconstruction:\n";
    for (auto& row : Xrec) { for (double v : row) std::cout << v << " "; std::cout << "\n"; }
}

All algorithms expose save(filename) and load(filename):
c_linreg lr; lr.fit(x, y); lr.save("linreg.json");
c_linreg lr2; lr2.load("linreg.json");
// Works similarly for: c_ransac, c_logreg, c_nb, c_knn, c_svm, c_multi_svm,
// c_gpr, c_rf, c_nn, c_kmeans, c_pca (and scalers where implemented)

- API parity: Methods mirror Pythonic names (fit, predict, transform, fit_transform).
- NN losses: MSE, MAE, CROSS_ENTROPY, BINARY_CROSS_ENTROPY, HUBER, HINGE, KL.
- NN activations: SIGMOID, TANH, RELU, LEAKY_RELU, LINEAR, SOFTMAX.
- GPR kernels: RBF, Matern(3/2, 5/2), RationalQuadratic(α), Periodic(period).
- GPR optimizer: Optional random search restarts.
- Random Forest: OOB scoring + normalized feature importances.
- RANSAC: Robust residual threshold via MAD if not provided.
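
The RANSAC note above refers to a threshold derived from the median absolute deviation (MAD). The README does not say which values the MAD is taken over. scikit-learn's RANSACRegressor, which this library cites as an inspiration, defaults to the MAD of the target values y, so the sketch below assumes the same. The helper names `median` and `median_absolute_deviation` are hypothetical, not mlcore's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a non-empty sample (takes a copy so the caller's data is
// left unsorted). Even-sized samples average the two middle values.
double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return (n % 2 == 1) ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// Median absolute deviation: median of |v_i - median(v)|.
// Assumption for this sketch: applied to the targets y, the result serves
// as the residual threshold that separates inliers from outliers.
double median_absolute_deviation(const std::vector<double>& v) {
    const double m = median(v);
    std::vector<double> deviations;
    deviations.reserve(v.size());
    for (double value : v) deviations.push_back(std::fabs(value - m));
    return median(deviations);
}
```

For the targets in the RANSAC example (1, 2, 3, 4, 5, 100, 7, 8) the median is 4.5 and the MAD is 2.5, so the single outlier at 100 barely moves the threshold, which is the point of using a median-based statistic.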
Licensed under the MIT License.