Random Forest is a supervised machine learning algorithm that is widely used for both classification and regression tasks. It belongs to the family of ensemble learning methods, which means it builds multiple models (in this case, decision trees) and combines their results to improve overall performance and robustness.
The core idea behind Random Forest is:
- Build many decision trees.
- Each tree is trained on a random subset of the data.
- During prediction, all trees vote (classification) or average their predictions (regression).
This strategy helps to reduce overfitting and improve generalization compared to single decision trees.
- Single decision trees are prone to overfitting and high variance.
- Random Forest introduces randomness in:
  - Data (bootstrap sampling)
  - Features (random subsets of features at each split)
- The aggregation of results leads to more stable and accurate predictions.
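The bagging-and-voting recipe above can be sketched with a toy example. To keep it self-contained, each "tree" here is a single-threshold stump rather than a real decision tree; the data and all names are illustrative:

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    # Pick the threshold on x that best separates the two classes.
    best = None
    for t in xs:
        preds = [1 if x >= t else 0 for x in xs]
        acc = sum(p == y for p, y in zip(preds, ys)) / len(ys)
        if best is None or acc > best[1]:
            best = (t, acc)
    return best[0]

def fit_forest(xs, ys, n_trees=25, seed=0):
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        # Bootstrap: sample n points with replacement, fit one stump per sample.
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict(stumps, x):
    # Classification: every stump votes, majority wins.
    votes = Counter(1 if x >= t else 0 for t in stumps)
    return votes.most_common(1)[0][0]

xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
stumps = fit_forest(xs, ys)
print(predict(stumps, 0.15), predict(stumps, 0.85))
```

Individual stumps fit on different bootstrap samples disagree on the exact cut point, but the majority vote recovers the underlying split, which is the variance-reduction effect described above.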
In our ELVIS trading platform, the RandomForestModel is a production-grade implementation leveraging TensorFlow Decision Forests (TFDF). It includes modern ML practices:
- Integrated with Optuna for automated hyperparameter tuning
- Supports cross-validation and metric tracking
- Offers explainability through SHAP values and feature importance
- Designed for streaming updates with a `partial_fit` interface
- Integrated with Prometheus and Grafana for MLOps

## Model Architecture

- Based on TFDF's `RandomForestModel`
- Uses `num_trees`, `max_depth`, and `min_examples` as primary hyperparameters
- Saved and loaded via the `.ydf` format
## Training

- Accepts pandas DataFrames (`X_train`, `y_train`)
- Optionally optimized using an Optuna trial
- Automatically saved after training
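A minimal sketch of this training path, assuming TFDF's public Keras API (`pd_dataframe_to_tf_dataset`, `RandomForestModel`). The DataFrame, column names, hyperparameter values, and save path are illustrative, not the actual ELVIS code:

```python
import pandas as pd
import tensorflow_decision_forests as tfdf

# Tiny illustrative frame standing in for the real X_train / y_train.
train_df = pd.DataFrame({
    "hl_ratio": [1.01, 1.05, 1.02, 1.09, 1.03, 1.08, 1.01, 1.07],
    "volume":   [100, 250, 120, 300, 110, 280, 105, 260],
    "label":    [0, 1, 0, 1, 0, 1, 0, 1],
})
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="label")

# The three primary hyperparameters named in this doc.
model = tfdf.keras.RandomForestModel(num_trees=300, max_depth=16, min_examples=2)
model.fit(train_ds)

preds = model.predict(train_ds).flatten()  # batch prediction, flattened array
model.save("models/random_forest_demo")    # save path is illustrative
```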
## Cross-Validation

- Standard `KFold`, `StratifiedKFold`, and `GroupKFold` strategies
- Saves fold-wise metric plots
- Pushes average results to Prometheus for Grafana monitoring
- Returns a dictionary with:
- Accuracy
- Precision
- Recall
- F1 score
- Loss
- ROC AUC
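As a rough illustration of the metric dictionary above, here is a hedged sketch using scikit-learn's `KFold` and `RandomForestClassifier` as stand-ins for the platform's `cross_validate()`; the dataset and model are synthetic placeholders, not the ELVIS implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, log_loss, roc_auc_score)
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

fold_metrics = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    proba = clf.predict_proba(X[test_idx])[:, 1]
    fold_metrics.append({
        "accuracy": accuracy_score(y[test_idx], pred),
        "precision": precision_score(y[test_idx], pred),
        "recall": recall_score(y[test_idx], pred),
        "f1": f1_score(y[test_idx], pred),
        "loss": log_loss(y[test_idx], proba),
        "roc_auc": roc_auc_score(y[test_idx], proba),
    })

# Average across folds, as pushed to Prometheus later in this doc.
avg = {k: float(np.mean([m[k] for m in fold_metrics])) for k in fold_metrics[0]}
print(avg)
```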
## Prediction

- Uses TFDF's batch prediction API
- Returns a flattened NumPy array
## Explainability

- SHAP value summary
- TFDF's built-in feature importance extractors: `MEAN_DECREASE_IN_ACCURACY`, `NUM_AS_ROOT`, `SUM_SCORE`
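The built-in importances can be read off a trained TFDF model via its inspector; a hedged sketch (note that `MEAN_DECREASE_IN_ACCURACY` is only populated when the model was trained with `compute_oob_variable_importances=True`):

```python
def builtin_importances(model,
                        keys=("MEAN_DECREASE_IN_ACCURACY",
                              "NUM_AS_ROOT",
                              "SUM_SCORE")):
    """Collect TFDF's built-in variable importances for a trained
    tfdf.keras.RandomForestModel. Each entry maps an extractor name to a
    list of (feature, score) pairs; missing extractors are skipped."""
    importances = model.make_inspector().variable_importances()
    return {k: importances[k] for k in keys if k in importances}
```

The SHAP summary mentioned above is produced by a separate export step; this helper only covers the TFDF-native extractors.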
## Hyperparameter Tuning

- Automatically suggests `num_trees`, `max_depth`, and `min_examples`
- Integrated with cross-validation
- Can be extended to optimize learning rate, class weights, etc.
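A sketch of what the Optuna objective for these three hyperparameters could look like; `train_and_score` is a placeholder for the real training/cross-validation routine, and the search ranges are illustrative:

```python
def objective(trial, train_and_score):
    """Suggest the hyperparameters named in this doc, then score a model.
    `trial` is an optuna.Trial; `train_and_score` stands in for the real
    ELVIS training + cross-validation call and returns a metric to maximize."""
    params = {
        "num_trees": trial.suggest_int("num_trees", 50, 500),
        "max_depth": trial.suggest_int("max_depth", 4, 32),
        "min_examples": trial.suggest_int("min_examples", 2, 20),
    }
    return train_and_score(**params)

# With Optuna installed, this would be driven by something like:
#   study = optuna.create_study(direction="maximize")
#   study.optimize(lambda t: objective(t, train_and_score), n_trials=50)
```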
## Monitoring

- After each `cross_validate()`, pushes average metrics to the Pushgateway
- Metrics are exposed as `rf_accuracy`, `rf_loss`, etc.
- CSV version saved for external ingestion
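A hedged sketch of that push using `prometheus_client`; the gateway address and job name are illustrative assumptions, not the ELVIS configuration:

```python
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

def push_cv_metrics(avg_metrics, gateway="localhost:9091"):
    """Push averaged cross-validation metrics to a Prometheus Pushgateway.
    Metric names follow the rf_* convention described above."""
    registry = CollectorRegistry()
    for name, value in avg_metrics.items():
        Gauge(f"rf_{name}", f"Average CV {name}", registry=registry).set(value)
    push_to_gateway(gateway, job="elvis_random_forest", registry=registry)

# Example call (requires a running Pushgateway):
# push_cv_metrics({"accuracy": 0.93, "loss": 0.21})
```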
## Visualization

- Saves plots from `cross_validate()` as `.png` and `.svg`
- SHAP summary plot saved to `docs/plots/`
- Mermaid `.mmd` model architecture diagram exported and rendered
## Streaming Updates

TensorFlow Decision Forests does not natively support online learning. As a workaround:

- `partial_fit()` accumulates data batches
- Retrains on all previously seen data
- Simulates streaming adaptability

This ensures the model can continue learning from new market data.
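The accumulate-and-retrain pattern can be sketched as follows; `fit_fn` is a placeholder for the real TFDF training routine, not the actual ELVIS class:

```python
class AccumulatingModel:
    """Sketch of the workaround above: partial_fit() buffers every batch
    and retrains from scratch on the full history seen so far."""

    def __init__(self, fit_fn):
        self.fit_fn = fit_fn   # placeholder for the real training routine
        self.X, self.y = [], []
        self.model = None

    def partial_fit(self, X_batch, y_batch):
        self.X.extend(X_batch)
        self.y.extend(y_batch)
        # Full retrain on everything seen so far -- O(total data) per batch,
        # which is the price of simulating online learning this way.
        self.model = self.fit_fn(self.X, self.y)
        return self.model
```

Because each call retrains on the whole history, this trades compute for adaptability; it is a simulation of streaming, not true incremental learning.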
## Feature Pipeline

The model is powered by a modular feature pipeline supporting:
- OHLCV-based features (e.g., high-low ratio, rolling means)
- Custom financial indicators
- Rolling statistics
- Future integrations for sentiment and blockchain metrics
Each version of the feature set is hashed and tracked for reproducibility.
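A minimal sketch of such a pipeline step, assuming OHLCV column names and illustrative window sizes, with a hash over the feature names as one possible way to track feature-set versions:

```python
import hashlib

import pandas as pd

def build_features(ohlcv: pd.DataFrame) -> pd.DataFrame:
    """Derive a few OHLCV-based features like those listed above.
    Column names (high, low, close, volume) and windows are illustrative."""
    feats = pd.DataFrame(index=ohlcv.index)
    feats["hl_ratio"] = ohlcv["high"] / ohlcv["low"]          # high-low ratio
    feats["close_mean_5"] = ohlcv["close"].rolling(5).mean()  # rolling mean
    feats["volume_std_5"] = ohlcv["volume"].rolling(5).std()  # rolling stat
    return feats

def feature_set_hash(feats: pd.DataFrame) -> str:
    """Hash the sorted feature names so any change to the set is detectable
    and each feature-set version can be tracked for reproducibility."""
    names = ",".join(sorted(feats.columns))
    return hashlib.sha256(names.encode()).hexdigest()[:12]
```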
## Project Structure

core/
├── models/
│   └── random_forest_model.py   # Main model implementation
├── features/
│   └── feature_pipeline.py      # Modular feature engineering
└── viz/
    └── export_utils.py          # SHAP, Prometheus, CSV exports
metrics/
├── model_metrics.csv
└── shap_summary.csv
prometheus/
└── pushgateway_config.yml
docs/
├── plots/
│   ├── shap_summary.png
│   └── cv_metrics.png
└── architecture_links.mmd
## Future Work

- Full support for streaming ML frameworks like `river`
- Real-time alerts from Prometheus thresholds (e.g., an F1 score dip)
- Grafana dashboards for CV history, SHAP trends, and live predictions
- Automated model drift detection and retraining triggers
## References

- IBM Random Forest Explanation
- TensorFlow Decision Forests Documentation
- Optuna: Hyperparameter Optimization Framework
- Prometheus Client for Python
- SHAP: Explainable AI
This document will be updated iteratively as new capabilities are deployed.
