In this project, we apply a random forest model to analyze the composition of portfolios. We only release
To give each stock a balanced and meaningful set of features, we compute derived features from the raw data. If you start from your own raw data, we recommend doing the same so that the training results are reasonable.
We calculate a beta for each sector: $E(R_i) = \beta_1 E(R_{\text{Tech}}) + \beta_2 E(R_{\text{Utility}}) + \dots + \beta_n E(R_{\text{Financial}}) + \varepsilon$
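One way to estimate the sector betas in the relation above is an ordinary least-squares regression of a stock's returns on the sector returns. The sketch below is only an illustration under assumptions: the function name `sector_betas`, the three synthetic sector series, and the "true" betas are all hypothetical and are not taken from the project's code or data. It fits without an intercept, to match the form of the equation.

```python
import numpy as np

def sector_betas(stock_returns, sector_returns):
    """Least-squares betas of one stock against each sector (no intercept).

    stock_returns:  shape (n_periods,)
    sector_returns: shape (n_periods, n_sectors)
    """
    betas, *_ = np.linalg.lstsq(sector_returns, stock_returns, rcond=None)
    return betas

# Synthetic data only: three made-up sector return series and one stock
# built from assumed betas plus a small amount of noise.
rng = np.random.default_rng(0)
sector_returns = rng.normal(0.0, 0.01, size=(250, 3))
assumed_betas = np.array([1.2, 0.3, -0.5])
stock_returns = sector_returns @ assumed_betas + rng.normal(0.0, 0.001, size=250)

betas = sector_betas(stock_returns, sector_returns)
print(betas)  # one estimated beta per sector, close to assumed_betas
```

With real data, each column of `sector_returns` would hold one sector's return series, and the resulting betas would become per-stock features.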
Market Cap, Sector Betas, Volatility, EV/Asset Ratio, P/E Ratio, etc.
Use GridSearchCV from sklearn to search for the optimal hyperparameters (we use 5-fold cross-validation with ROC AUC as the evaluation score).
Use RandomForestClassifier as the classifier.
To get a balanced training set, we use an upsampling technique.
Save the trained models with pickle.
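The training steps above can be sketched roughly as follows. This is a minimal example under assumptions, not the project's actual training script: the data is synthetic and binary (generated with `make_classification`), the parameter grid is a small hypothetical one, and upsampling is done with `sklearn.utils.resample` on the training split only. The project's real features, labels, and grid may differ.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.utils import resample

# Synthetic, imbalanced binary data standing in for the real feature table.
X, y = make_classification(n_samples=400, n_features=8,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Upsample the minority class (label 1 here) until both classes are equal in size.
minority = y_train == 1
n_majority = int((~minority).sum())
X_min_up, y_min_up = resample(X_train[minority], y_train[minority],
                              replace=True, n_samples=n_majority,
                              random_state=0)
X_bal = np.vstack([X_train[~minority], X_min_up])
y_bal = np.concatenate([y_train[~minority], y_min_up])

# Grid search with 5-fold cross-validation, scored by ROC AUC.
# The grid values are illustrative assumptions.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="roc_auc")
search.fit(X_bal, y_bal)

# Serialize the best model with pickle; pickle.dump(model, file) would
# write it to disk instead of to a bytes object.
blob = pickle.dumps(search.best_estimator_)
restored = pickle.loads(blob)
print(search.best_params_)
```

Because upsampled rows are duplicates, copies of the same row can land in both the training and validation folds during cross-validation, so the cross-validated score may be optimistic. Checking the restored model on the untouched `X_test` split gives a less biased estimate.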
We get over 95% accuracy on sector prediction and an average of 84% accuracy on predictions of the other metrics.
Wuzhe Xu