This project applies reinforcement learning to portfolio optimization. It is mentored by MathWorks' Valerio Sperandeo and Alejandra Pena-Ordieres. The accompanying MathWorks blog post can be found HERE.
Simply put, the goal is to maximize the probability of reaching a wealth target: after each periodic investment into a given portfolio, the system adjusts the portfolio weights based on the remaining time and the wealth in hand.
This is a sequential decision-making problem, the kind at which reinforcement learning excels. In this project, we developed a reinforcement learning system that gives investors periodic investment recommendations based on their goals, and we experimented with several different reward designs (see the environment variants below). What investors need to do is straightforward:
Tell our system which stocks you want to invest in, the rate of return you are aiming for, and over how many years; it will then quickly and dynamically recommend the best feasible allocation for reaching your goal.
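Stated formally (the notation here is ours, not the repo's): with wealth $W_t$, periodic contribution $c_t$, goal $G$, and horizon $T$ years, the agent seeks a policy $\pi$ choosing portfolio weights $a_t = \pi(t, W_t)$ that solves

```latex
\max_{\pi}\ \Pr\left[\, W_T \ge G \,\right],
\qquad
W_{t+1} = \left(W_t + c_t\right)\bigl(1 + r_t(a_t)\bigr),
```

where $r_t(a_t)$ is the random one-period return of the portfolio with weights $a_t$.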
Set up and run GBWM-RLToolbox/main.mlx (a sketch of these inputs follows this list) by
- Specifying the symbols of the stocks you want to invest in
- Specifying the investment period (in years)
- Specifying the total expected return
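For illustration, the inputs might look like the following; the variable names here are our assumptions, not necessarily the ones used in main.mlx:

```matlab
% Hypothetical main.mlx inputs (names are illustrative, not the repo's)
symbols       = ["URTH"; "HYG"; "LQD"; "DBC"];  % stock symbols to invest in
n_years       = 10;                             % investment period in years
target_return = 1.0;                            % total expected return (100%)
```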
Goal-Based Wealth Management with Reinforcement Learning
```matlab
% Download one year of daily prices for the four assets in the portfolio.
initDate = datetime('2017-01-01', 'InputFormat', 'yyyy-MM-dd');
endDate  = datetime('2018-01-01', 'InputFormat', 'yyyy-MM-dd');
portfolio = ["URTH"; "HYG"; "LQD"; "DBC"];

% Fetch each symbol's daily data and keep the date and close-price columns.
c = containers.Map;
for k = 1:length(portfolio)
    symbol = portfolio(k);
    data = Yahooscraper(convertStringsToChars(symbol), initDate, endDate, '1d');
    TT = table2timetable(data(:, [1, 4]));
    TT.Properties.VariableNames = {convertStringsToChars(symbol)};
    c(symbol) = TT;
end

% Align the four price series on their common trading dates and plot them.
f = @(i) c(portfolio(i));
T = synchronize(f(1), f(2), f(3), f(4), 'Intersection');
prices = [T.URTH, T.HYG, T.LQD, T.DBC];
ts = timeseries(prices, datestr(T.Date));
ts.Name = "Portfolio Prices over Time";
plot(ts);
legend({'URTH'; 'HYG'; 'LQD'; 'DBC'});
```
...
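The elided steps presumably produce the efficient-frontier inputs (`pwgt`, `pret`, `prsk`) that the environment constructors below expect. A minimal sketch using the Financial Toolbox, assuming the daily prices in `prices` from above (the actual code in main.mlx may differ):

```matlab
% Sketch: estimate an efficient frontier from the downloaded prices.
% Assumes the Financial Toolbox; main.mlx's actual steps may differ.
returns = tick2ret(prices);                    % per-period simple returns
p = Portfolio('AssetList', {'URTH', 'HYG', 'LQD', 'DBC'});
p = estimateAssetMoments(p, returns);          % sample mean and covariance
p = setDefaultConstraints(p);                  % fully invested, long-only
pwgt = estimateFrontier(p, 15);                % weights of 15 frontier portfolios
[prsk, pret] = estimatePortMoments(p, pwgt);   % risk and return of each one
```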
```matlab
env = MultiFactorGBWMEnvironment(G, T, grid, cash, w0_idx, pwgt, pret, prsk, line, simulate_n_periods, simulate_dt, simulate_n_trials);
```
Optimal policy (indicated by color; the colder the color, the riskier the allocation):
100 investment trials under the optimal policy (red lines mark successful investments):
Training episodes:
```matlab
% Alternative environments implementing the other reward designs:
env = LineGoalGBWMEnvironment(G, T, grid, cash, w0_idx, pwgt, pret, prsk, line, simulate_n_periods, simulate_dt, simulate_n_trials);
env = ScaleGBWMEnvironment(G, T, grid, cash, w0_idx, pwgt, pret, prsk, gamma, simulate_n_periods, simulate_dt, simulate_n_trials);
env = SparseGoalGBWMEnvironment(G, T, grid, cash, w0_idx, pwgt, pret, prsk, simulate_n_periods, simulate_dt, simulate_n_trials);
```
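Whichever environment variant is chosen, training can follow the standard Reinforcement Learning Toolbox pattern. A minimal sketch, assuming the custom environment exposes finite observation and action specs (the agent and options below are illustrative, not necessarily the repo's configuration):

```matlab
% Sketch: train a tabular Q-learning agent on the chosen environment.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
qTable  = rlTable(obsInfo, actInfo);                  % tabular Q-function
critic  = rlQValueFunction(qTable, obsInfo, actInfo);
agent   = rlQAgent(critic);
trainOpts = rlTrainingOptions('MaxEpisodes', 5000, ...
    'StopTrainingCriteria', 'EpisodeCount', 'StopTrainingValue', 5000);
trainStats = train(agent, env, trainOpts);
```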
Portfolio Optimization by Reinforcement Learning (Q-learning) and Dynamic Programming with MATLAB.
- Reinforcement learning: open and run agent/main_qln.m OR rl_demo.mlx (a toy sketch of the Q-learning core follows this list).
- Dynamic programming: open and run agent/main_dp.m OR dp_demo.mlx.
- Alternatively, open rl_demo.pdf OR dp_demo.pdf.
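For reference, the core of tabular Q-learning on this problem's 2-D (time, wealth) state space looks like the following. This is an illustrative toy sketch, not the code in agent/main_qln.m; the grid sizes, return parameters, and simplified wealth dynamics are all our assumptions:

```matlab
% Toy tabular Q-learning on a (time, wealth-bucket) grid. Everything
% here (grid sizes, drift/noise parameters, dynamics) is illustrative.
rng(0);
n_periods = 10; n_wealth = 21; n_actions = 5;     % assumed grid sizes
Q = zeros(n_periods * n_wealth, n_actions);       % Q-table over flat states
alpha = 0.1; discount = 1.0; epsilon = 0.1;       % learning hyperparameters
mu    = linspace(0.5, 1.5, n_actions);            % mean bucket drift per action
sigma = linspace(0.5, 3.0, n_actions);            % riskier action = larger spread
goal_bucket = 16;                                 % reaching this bucket = success
sIdx = @(t, w) (t - 1) * n_wealth + w;            % flatten (time, wealth) state
for ep = 1:20000
    w = 11;                                       % start in the middle bucket
    for t = 1:n_periods
        s = sIdx(t, w);
        if rand < epsilon
            a = randi(n_actions);                 % explore
        else
            [~, a] = max(Q(s, :));                % exploit
        end
        % Wealth bucket moves with the chosen portfolio's drift and noise.
        w2 = min(max(w + round(mu(a) + sigma(a) * randn), 1), n_wealth);
        done = (t == n_periods);
        r = double(done && w2 >= goal_bucket);    % sparse terminal reward
        if done
            target = r;
        else
            target = r + discount * max(Q(sIdx(t + 1, w2), :));
        end
        Q(s, a) = Q(s, a) + alpha * (target - Q(s, a));
        w = w2;
    end
end
[~, policy] = max(Q, [], 2);                      % greedy action per state
```

After training, the greedy action per state is the learned policy, which can be reshaped to the (time, wealth) grid and plotted as in the figures below.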
Sample investment sequence (2-D state space: investment period and wealth in hand):
Optimal Q-learning policy (indicated by color; the colder the color, the riskier the allocation):
100 investment trials under the optimal policy (red lines mark successful investments):
A higher goal-hitting rate is achieved with the learned policy!
Contributors:
- Botao Zhang
- Bowen Fang
- Chongyi Chie
- Yichen Yao