Integration of Continual Learning tasks and algorithm (WIP) #45

Draft: wants to merge 171 commits into base: master

Commits (171)
967e4ac
Added targets files
Caselles Mar 14, 2019
cbffb3b
new tasks for continual learning: random target, circular and square …
kalifou Mar 14, 2019
8949baa
Merge branch 'circular_movement_omnibot' of https://github.com/Gaspar…
Caselles Mar 15, 2019
8f23671
adding args for learning the CL tasks
kalifou Mar 15, 2019
6750919
Merge branch 'circular_movement_omnibot' of https://github.com/Gaspar…
Caselles Mar 15, 2019
daff917
collect CL args for replay
kalifou Mar 15, 2019
ad352c9
Merge branch 'circular_movement_omnibot' of https://github.com/Gaspar…
Caselles Mar 15, 2019
bc21db4
WIP on continual tasks
Caselles Mar 24, 2019
c68baeb
Continual tasks: added vizu and solved a few bugs
Caselles Mar 26, 2019
a1d4da9
Solved bug on history not getting emptied between episodes
Caselles Mar 26, 2019
bf9c6c9
add penality for bumping
kalifou Mar 26, 2019
7de393d
coeff for circular task
kalifou Mar 26, 2019
edd82fa
fix reward shaping with the product operator
kalifou Mar 27, 2019
ed6923c
adding new task - eight shape (draft)
kalifou Apr 1, 2019
1920de4
On-Policy dataset-generator
kalifou Apr 11, 2019
e6bdde1
add small fix
kalifou Apr 11, 2019
93ea793
Merge branch 'master' of https://github.com/GaspardQin/robotics-rl-sr…
TLESORT Apr 12, 2019
a5b6623
Generative Replay for Dataset generation
kalifou Apr 12, 2019
b976aa8
Merge branch 'circular_movement_omnibot' of https://github.com/Gaspar…
TLESORT Apr 12, 2019
71f29fa
fix to on-policy generation for srl based policies
kalifou Apr 12, 2019
413c447
fix to loading args for replay
kalifou Apr 12, 2019
e99f6f2
Merge branch 'circular_movement_omnibot' of https://github.com/Gaspar…
TLESORT Apr 12, 2019
6082a68
first steps towards policy distillation
TLESORT Apr 12, 2019
a6dbfe8
small fix (init OmniRobotManagerBase)
kalifou Apr 15, 2019
bc224b8
cleaning pkgs imports
kalifou Apr 15, 2019
bcb12db
clean up & loading srl model in distillation script
kalifou Apr 15, 2019
4fb1e7b
cross-evaluation
sun-te Apr 15, 2019
deaf450
cross-evaluation
sun-te Apr 15, 2019
ca5df77
read-me update
sun-te Apr 15, 2019
c2d0405
plot results
sun-te Apr 16, 2019
89aa184
cross evaluation and comparison plot
sun-te Apr 16, 2019
7e07c29
On-policy generation: Fix to save action proba
kalifou Apr 16, 2019
c05bf7d
draft: Policy distillation
kalifou Apr 16, 2019
a612571
loss update
kalifou Apr 16, 2019
94aba32
bug-fix for pipeline cross
sun-te Apr 17, 2019
c69e558
format
kalifou Apr 17, 2019
a352a77
format and update data loader in submodule (srl_zoo)
kalifou Apr 17, 2019
3a7de3a
Merge branch 'dev_distillation' of https://github.com/GaspardQin/robo…
kalifou Apr 17, 2019
3d133ab
fix for off-policy generation
kalifou Apr 17, 2019
b797657
distillation: handling ts size & saving model
kalifou Apr 17, 2019
e446489
fix for replay
kalifou Apr 17, 2019
9de7d58
additional fix for replay
kalifou Apr 17, 2019
9e8bfb8
change loss for distillation for mse, it seems to work better
TLESORT Apr 17, 2019
b19da9f
update Distillation loss: swith to MSE
kalifou Apr 17, 2019
1bb20a5
format, fix args safety & load
kalifou Apr 18, 2019
9f64156
remove useless script, fix distillation with raw pixels & dataset fus…
kalifou Apr 18, 2019
39c5d22
fix for distillation using RL from raw_pixels
kalifou Apr 18, 2019
dd1fdbe
Merge branch 'dev_distillation' into circular_movement_omnibot
kalifou Apr 23, 2019
1edda89
cross evaluation during training
sun-te Apr 23, 2019
9cace21
fix for distillation from raw_pixels
kalifou Apr 23, 2019
dd8cba7
fix data fusioner
kalifou Apr 24, 2019
6e865aa
Option for generating shorter episodes (SC) and fix
kalifou Apr 24, 2019
8797884
command for distillation readme updated
TLESORT Apr 24, 2019
911a7e0
fix merged conflict, i have put MSE for distillation loss
TLESORT Apr 24, 2019
159947f
command for distillation readme updated
TLESORT Apr 24, 2019
8ddd317
loss MSE for distillation
Caselles Apr 24, 2019
570251d
Update Readme for distillation
kalifou Apr 24, 2019
db434a5
Added KL loss for distilaltion
Caselles Apr 24, 2019
be399f7
change cpu_number to 6
TLESORT Apr 24, 2019
1be664a
change cpu_number to 6
TLESORT Apr 24, 2019
85b588f
file added to be able to run all experiments at once
TLESORT Apr 24, 2019
cb160b7
Update readme
sun-te Apr 24, 2019
2574888
add args for replay when loading specific env task (Omnirobot)
kalifou Apr 24, 2019
ca757a8
Merge branch 'circular_movement_omnibot' of https://github.com/Gaspar…
kalifou Apr 24, 2019
487df1b
fix command for random dataset generator
TLESORT Apr 25, 2019
3687a75
Added sample flag for getAction and fixed KL loss for distillation.
Caselles Apr 25, 2019
44730bc
fix command for train SRL and added a force flag for dataset generator
TLESORT Apr 25, 2019
2876f53
Merge branch 'circular_movement_omnibot' of https://github.com/Gaspar…
TLESORT Apr 25, 2019
a3a0784
update datafusionner to log task labels to each obs
kalifou Apr 25, 2019
059a190
update option for shorter eps (CC)
kalifou Apr 25, 2019
1a42fdd
update: add option in distillation loss for temperature depending on …
kalifou Apr 25, 2019
62bb031
fix adaptive loss
kalifou Apr 25, 2019
2a6c0db
update safety
kalifou Apr 25, 2019
f3b21a8
distillation readme fully tested, normally everything is written in i…
TLESORT Apr 25, 2019
74fe983
default temperature changed to 0.1
TLESORT Apr 25, 2019
621cbaf
Merge branch 'circular_movement_omnibot' into cross-task
kalifou Apr 25, 2019
cfa068f
merge: circular_... into cross_task
kalifou Apr 25, 2019
f6a5d72
short episode flag added into dataset_generator for cc
TLESORT Apr 26, 2019
3cbd093
cross evaluation for model trained by srl
sun-te Apr 26, 2019
7e39645
Update Distilation_Readme.md
sun-te Apr 26, 2019
619cd6b
Update Distilation_Readme.md
sun-te Apr 26, 2019
eee27f4
cross eval
sun-te Apr 26, 2019
d3f799e
Merge branch 'cross-task' of https://github.com/GaspardQin/robotics-r…
sun-te Apr 26, 2019
34fff5d
fix MLP Policy for distillation
kalifou Apr 26, 2019
cc6c9d4
Merge branch 'circular_movement_omnibot' into cross-task
kalifou Apr 26, 2019
ef81461
add automatic creation of save_path if the folder does not exist
TLESORT Apr 26, 2019
a90681b
add latest past possible (to use carefully)
TLESORT Apr 26, 2019
4ab3c49
add latest flag
TLESORT Apr 26, 2019
6fd19d1
normalisation of reward
sun-te Apr 26, 2019
6b37395
update and fix readme with comments
TLESORT Apr 26, 2019
a776b9a
Merge branch 'cross-task' of https://github.com/GaspardQin/robotics-r…
sun-te Apr 26, 2019
bca7eaf
Merge branch 'circular_movement_omnibot' of https://github.com/Gaspar…
TLESORT Apr 26, 2019
919fc9b
first version of script to run all in once
TLESORT Apr 26, 2019
66b6bdf
small fix, NB : starting from a clean repos is recommended to run the…
TLESORT Apr 26, 2019
be7b779
tested run in once script
TLESORT Apr 26, 2019
162d96a
dry run file for end to end testing
TLESORT Apr 26, 2019
95e02be
name's folder have been parametrize to easely change path
TLESORT Apr 26, 2019
77b1cb0
evaluation for student policy(TODO)
sun-te Apr 26, 2019
f586d0c
Added scripts for raw pixels
Caselles Apr 26, 2019
6fdba9e
grid walking policy (draft)
kalifou Apr 27, 2019
adf48fb
grid walker for exploration in on-policy data generation
kalifou Apr 27, 2019
f53dbc5
Option for finetuning of SRL while distilling
kalifou Apr 29, 2019
4cdb4f4
cross_eval and student eval
sun-te Apr 29, 2019
c8af536
update: student policy distillation and eval while learning a teacher…
kalifou Apr 29, 2019
e561b93
update of distillation eval
kalifou Apr 29, 2019
35286ef
update eval distillation
kalifou Apr 29, 2019
4c6bdfd
Adjust episode window for checkpoints when saving a teacher
kalifou Apr 29, 2019
324aa20
fix default value
kalifou Apr 29, 2019
db3fb30
Merge branch 'circular_movement_omnibot' into cross-task
kalifou Apr 30, 2019
a09cb48
update distillation eval
kalifou Apr 30, 2019
8afa433
add fix (copy merge to proper loc)
kalifou Apr 30, 2019
03091b1
Plot for cross evaluation
sun-te May 2, 2019
eacc954
eval of student from single source
kalifou May 6, 2019
1e2ec91
corss eval after training
sun-te May 6, 2019
bca6200
comment and evaluation bugs fixed
sun-te May 7, 2019
dcea7b9
cross plot
sun-te May 7, 2019
0264c31
merge updates from master & cross-task into current branch
kalifou May 7, 2019
3d0f79d
clean and test for distillation(draft)
kalifou May 8, 2019
fd08dd7
corss eval and dataset generator, test_eval
sun-te May 10, 2019
bac57ce
dataset generator update
sun-te May 10, 2019
7fe104c
merge cross-task branch into current
kalifou May 13, 2019
80c01ae
update tests for distillation
kalifou May 21, 2019
9eec0f5
reduce distillation config files
kalifou May 21, 2019
a85942d
fix generator for cross env compatibility
kalifou May 22, 2019
bbe9c44
fix for on-policy data-gen: normalizing obs
kalifou May 22, 2019
b959e43
small fixes & cleaning: data-gen, distillation logs
kalifou May 23, 2019
3874111
adapt generative replay for on-policy data generation
kalifou May 24, 2019
51694e0
args.log_dir fix when latestPath is used
TLESORT Jun 3, 2019
9daf762
Merge branch 'circular_movement_omnibot' of https://github.com/Gaspar…
TLESORT Jun 3, 2019
e1b72bd
more informative print
TLESORT Jun 3, 2019
2b9dcef
fix merge
TLESORT Jun 3, 2019
8982d7c
Merge branch 'circular_movement_omnibot' of https://github.com/Gaspar…
TLESORT Jun 3, 2019
299a374
escape task
sun-te Jun 3, 2019
86c7a73
update for escape evaluation
sun-te Jun 5, 2019
6ec2e0a
ground_truth moification
sun-te Jun 5, 2019
ea564a2
modify readme.md in policy distillation
Jun 11, 2019
8ae7d9b
note in dataset_generator
Jun 11, 2019
4646f16
update supervised_rl/reade.md
Jun 11, 2019
9490fa8
fix distillation at checkpoints for CC (TC)
kalifou Jun 12, 2019
cc4e30c
Merge pull request #4 from saybunthet/circular_movement_omnibot
kalifou Jun 12, 2019
208d472
cleaning
kalifou Jun 12, 2019
af944ee
Merge branch 'circular_movement_omnibot' into escape_dev
kalifou Jun 13, 2019
486ea7e
fixed orientation for the chasing agent
sun-te Jun 13, 2019
12cd0c0
bug fixed
sun-te Jun 13, 2019
77f7267
target position update
sun-te Jun 14, 2019
7f2ce56
fix merger in case of distillation
kalifou Jun 17, 2019
fe09e49
reward update
sun-te Jun 24, 2019
b8899f9
Merge branch 'escape_dev' of https://github.com/GaspardQin/robotics-r…
sun-te Jun 24, 2019
ed26f2e
reward can be float for circular task and escaping task
sun-te Jun 25, 2019
83bed1e
a new dataset merger for the balanced timesteps settings during the m…
sun-te Jun 25, 2019
d8fdb05
Merge pull request #8 from GaspardQin/circular_movement_omnibot_data_…
sun-te Jun 26, 2019
8701bb0
Revert "reward can be float for circular task and escaping task"
sun-te Jun 26, 2019
fbde89d
dataset manager
sun-te Jun 26, 2019
52e4730
separator
sun-te Jun 26, 2019
99d3039
data separator
sun-te Jun 26, 2019
ce2d7e4
separator
sun-te Jun 27, 2019
f6c0f03
sparser dataset
sun-te Jun 27, 2019
d45a369
resampling of data
sun-te Jun 28, 2019
e622e1f
float reward data merger
sun-te Jun 28, 2019
e823cbb
Merge pull request #9 from GaspardQin/revert-8-circular_movement_omni…
sun-te Jun 28, 2019
6c29da3
separator
sun-te Jun 28, 2019
a1d2639
dataset_merger can preserve the original dataset for further use
sun-te Jul 3, 2019
bf50fc5
cleaning
sun-te Jul 15, 2019
24b16be
learning
sun-te Jul 15, 2019
6e2272c
preserve original data after merge
sun-te Jul 15, 2019
d10e9b6
resampling for the distillation
sun-te Jul 15, 2019
11586d1
cleaning
sun-te Jul 15, 2019
604b1ac
Merge pull request #11 from GaspardQin/escape_dev
sun-te Jul 15, 2019
d4c2bd9
test4esc&clearning
sun-te Jul 18, 2019
9527b00
Delete delete_val.ipynb
sun-te Jul 18, 2019
e2f4c76
Update environment.yml
sun-te Jul 18, 2019
12 changes: 12 additions & 0 deletions README.md
@@ -43,6 +43,13 @@ python -m rl_baselines.train --algo rl_algo --env env1 --log-dir logs/ --srl-mod

To use the robot's position as input instead of pixels, just pass `--srl-model ground_truth` instead of `--srl-model raw_pixels`

To perform a cross evaluation of the different SRL models, run the following in a terminal:

```
python -m rl_baselines.pipeline_cross --algo ppo2 --log-dir logs/ --srl-model srl_combination ground_truth --num-iteration 5 --num-timesteps 1000000 --task cc sqc sc --srl-config-file config/srl_models1.yaml config/srl_models2.yaml config/srl_models3.yaml
```
This writes the learning results to the `logs/` directory given by `--log-dir`.


## Installation

@@ -191,6 +198,11 @@ If you have troubles installing mpi4py, make sure you the following installed:
sudo apt-get install libopenmpi-dev openmpi-bin openmpi-doc
```

If you have trouble building the wheel for `atari`, you can fix it by running:
```
sudo apt-get install cmake libz-dev
```

Review comment (collaborator, PR author): If I recall, we would only need libz-dev as the Atari requirement, since cmake is already part of the stable-baselines install guidelines.

## Known issues

The inverse kinematics function has trouble finding a solution when the arm is fully straight and the arm must bend to reach the requested point.
8 changes: 8 additions & 0 deletions config/srl_models_circular.yaml
@@ -0,0 +1,8 @@

OmnirobotEnv-v0:
  # Base path to SRL log folder
  # log_folder: srl_zoo/logs/Omnibot_random_simple/
  log_folder: srl_zoo/logs/Omnibot_circular/
  autoencoder: 19-02-04_23h27_22_custom_cnn_ST_DIM200_autoencoder_reward_inverse_forward/srl_model.pth


9 changes: 9 additions & 0 deletions config/srl_models_escape.yaml
@@ -0,0 +1,9 @@

OmnirobotEnv-v0:
  # Base path to SRL log folder
  # log_folder: srl_zoo/logs/escape_agent/
  log_folder: srl_zoo/logs/escape_agent/
  autoencoder: 19-02-04_23h27_22_custom_cnn_ST_DIM200_autoencoder_reward_inverse_forward/srl_model.pth
  srl_combination: 19-06-03_18h38_59_custom_cnn_ST_DIM200_autoencoder_inverse/srl_model.pth
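
The config files in this diff share one layout: an environment key, a `log_folder` base path, and one entry per SRL model whose value is a path relative to that folder. As a rough sketch of how such an entry could be turned into a full model path, assuming the YAML has already been parsed into a plain dict (the helper name `resolve_srl_model_path` is hypothetical and the toolbox's actual loader is not shown in this diff):

```python
import os


def resolve_srl_model_path(config, env_name, srl_model):
    """Join the environment's log_folder with the model's relative path.

    Hypothetical helper: `config` is the parsed YAML as a plain dict.
    """
    env_cfg = config[env_name]
    if srl_model not in env_cfg:
        raise KeyError("no '{}' model configured for '{}'".format(srl_model, env_name))
    return os.path.join(env_cfg["log_folder"], env_cfg[srl_model])


# Mirrors config/srl_models_escape.yaml from this diff.
escape_config = {
    "OmnirobotEnv-v0": {
        "log_folder": "srl_zoo/logs/escape_agent/",
        "autoencoder": "19-02-04_23h27_22_custom_cnn_ST_DIM200_autoencoder_reward_inverse_forward/srl_model.pth",
        "srl_combination": "19-06-03_18h38_59_custom_cnn_ST_DIM200_autoencoder_inverse/srl_model.pth",
    }
}

print(resolve_srl_model_path(escape_config, "OmnirobotEnv-v0", "srl_combination"))
```

Raising a `KeyError` for a model that has no entry (for example `srl_combination` in a file that only lists `autoencoder`) makes a missing configuration visible before any training starts.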


8 changes: 8 additions & 0 deletions config/srl_models_merged.yaml
@@ -0,0 +1,8 @@

OmnirobotEnv-v0:
  # Base path to SRL log folder
  # log_folder: srl_zoo/logs/Omnibot_random_simple/
  log_folder: srl_zoo/logs/merge_CC_SC/
  autoencoder: 19-02-04_23h27_22_custom_cnn_ST_DIM200_autoencoder_reward_inverse_forward/srl_model.pth


8 changes: 8 additions & 0 deletions config/srl_models_simple.yaml
@@ -0,0 +1,8 @@

OmnirobotEnv-v0:
  # Base path to SRL log folder
  # log_folder: srl_zoo/logs/Omnibot_random_simple/
  log_folder: srl_zoo/logs/Omnibot_random_simple/
  autoencoder: 19-02-04_23h27_22_custom_cnn_ST_DIM200_autoencoder_reward_inverse_forward/srl_model.pth


260 changes: 238 additions & 22 deletions environments/dataset_generator.py

Large diffs are not rendered by default.

78 changes: 63 additions & 15 deletions environments/dataset_fusioner.py → environments/dataset_merger.py
@@ -8,11 +8,23 @@
import numpy as np
from tqdm import tqdm

# List of all possible labels identifying a task,
# for experiments in Continual Learning scenarios.
CONTINUAL_LEARNING_LABELS = ['CC', 'SC', 'EC', 'SQC', 'ESC']
CL_LABEL_KEY = "continual_learning_label"


def main():
parser = argparse.ArgumentParser(description='Dataset Manipulator: useful to merge two datasets by concatenating '
+ 'episodes. PS: Deleting sources after merging into the destination '
+ 'folder.')
parser.add_argument('--continual-learning-labels', type=str, nargs=2, metavar=('label_1', 'label_2'),
default=argparse.SUPPRESS, help='Labels for the continual learning RL distillation task.')
parser.add_argument('-f', '--force', action='store_true', default=False,
help='Force the merge, even if it overrides something else,'
' including the destination if it exist')
parser.add_argument('-rm', '--remove', action='store_true', default=False,
help='Remove the original data set.')
group = parser.add_mutually_exclusive_group()
group.add_argument('--merge', type=str, nargs=3, metavar=('source_1', 'source_2', 'destination'),
default=argparse.SUPPRESS,
@@ -23,28 +35,44 @@ def main():
if 'merge' in args:
# let make sure everything is in order
assert os.path.exists(args.merge[0]), "Error: dataset '{}' could not be found".format(args.merge[0])
assert (not os.path.exists(args.merge[2])), \
"Error: dataset '{}' already exists, cannot rename '{}' to '{}'".format(args.merge[2], args.merge[0],
args.merge[2])
assert os.path.exists(args.merge[1]), "Error: dataset '{}' could not be found".format(args.merge[1])

# If the merge file exists already, delete it for the convenience of updating student's policy
if os.path.exists(args.merge[2]) or os.path.exists(args.merge[2] + '/'):
assert args.force, "Error: destination directory '{}' already exists".format(args.merge[2])
shutil.rmtree(args.merge[2])

if 'continual_learning_labels' in args:
assert args.continual_learning_labels[0] in CONTINUAL_LEARNING_LABELS \
and args.continual_learning_labels[1] in CONTINUAL_LEARNING_LABELS, \
"Please specify a valid Continual learning label to each dataset to be used for RL distillation !"

# create the output
os.mkdir(args.merge[2])

# copy files from first source
os.rename(args.merge[0] + "/dataset_config.json", args.merge[2] + "/dataset_config.json")
os.rename(args.merge[0] + "/env_globals.json", args.merge[2] + "/env_globals.json")

shutil.copy2(args.merge[0] + "/dataset_config.json", args.merge[2] + "/dataset_config.json")
shutil.copy2(args.merge[0] + "/env_globals.json", args.merge[2] + "/env_globals.json")
record = ''
for record in sorted(glob.glob(args.merge[0] + "/record_[0-9]*/*")):
s = args.merge[2] + "/" + record.split("/")[-2] + '/' + record.split("/")[-1]
os.renames(record, s)

try:
shutil.copy2(record, s)
except FileNotFoundError: # no folders named so, we should create it first
os.mkdir(os.path.dirname(s))
shutil.copy2(record, s)
num_episode_dataset_1 = int(record.split("/")[-2][7:]) + 1

# copy files from second source
for record in sorted(glob.glob(args.merge[1] + "/record_[0-9]*/*")):
episode = str(num_episode_dataset_1 + int(record.split("/")[-2][7:]))
new_episode = record.split("/")[-2][:-len(episode)] + episode
s = args.merge[2] + "/" + new_episode + '/' + record.split("/")[-1]
os.renames(record, s)
try:
shutil.copy2(record, s)
except FileNotFoundError: # no folders named so, we should create it first
os.mkdir(os.path.dirname(s))
shutil.copy2(record, s)
num_episode_dataset_2 = int(record.split("/")[-2][7:]) + 1

# load and correct ground_truth
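
The renumbering applied to the second source in the hunk above can be isolated as a small function. The sketch below reuses the same string slicing as the diff; the function names are hypothetical, and the zero-padded `record_NNN` folder naming is an assumption inferred from the `record_[0-9]*` glob and the `[7:]` slice.

```python
def shift_record_name(record_dir, offset):
    """Shift the episode index of a record folder name by `offset`.

    Same slicing as the diff: the index starts after "record_" (7 characters)
    and overwrites the tail of the original name, keeping its padding.
    """
    new_index = str(offset + int(record_dir[7:]))
    return record_dir[:-len(new_index)] + new_index


def shift_record_name_padded(record_dir, offset):
    """Variant (not in the diff) that re-pads explicitly with zfill."""
    prefix = "record_"
    width = len(record_dir) - len(prefix)
    return prefix + str(offset + int(record_dir[len(prefix):])).zfill(width)


print(shift_record_name("record_005", 10))         # record_015
print(shift_record_name("record_999", 1))          # record1000
print(shift_record_name_padded("record_999", 1))   # record_1000
```

The second call shows a limit of the slicing approach: when the shifted index needs more digits than the original padding, the slice removes part of the `record_` prefix. The `zfill` variant avoids this, at the cost of folder names whose width can grow.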
@@ -101,20 +129,40 @@ def main():
preprocessed_load = np.load(args.merge[0] + "/preprocessed_data.npz")
preprocessed_load_2 = np.load(args.merge[1] + "/preprocessed_data.npz")

for prepro_load in [preprocessed_load, preprocessed_load_2]:
dataset_1_size = preprocessed_load["actions"].shape[0]
dataset_2_size = preprocessed_load_2["actions"].shape[0]

# Concatenating additional information: indices of episode start, action probabilities, CL labels...
for idx, prepro_load in enumerate([preprocessed_load, preprocessed_load_2]):
for arr in prepro_load.files:
pr_arr = prepro_load[arr]
preprocessed[arr] = np.concatenate((preprocessed.get(arr, []), pr_arr), axis=0)

if arr == "episode_starts":
preprocessed[arr] = preprocessed[arr].astype(bool)
to_class = bool
elif arr == "actions_proba" or arr == "rewards":
to_class = float
else:
to_class = int
if preprocessed.get(arr, None) is None:
preprocessed[arr] = pr_arr.astype(to_class)
else:
preprocessed[arr] = np.concatenate((preprocessed[arr].astype(to_class),
pr_arr.astype(to_class)), axis=0)
if 'continual_learning_labels' in args:
if preprocessed.get(CL_LABEL_KEY, None) is None:
preprocessed[CL_LABEL_KEY] = \
np.array([args.continual_learning_labels[idx] for _ in range(dataset_1_size)])
else:
preprocessed[arr] = preprocessed[arr].astype(int)
preprocessed[CL_LABEL_KEY] = \
np.concatenate((preprocessed[CL_LABEL_KEY], np.array([args.continual_learning_labels[idx]
for _ in range(dataset_2_size)])), axis=0)

np.savez(args.merge[2] + "/preprocessed_data.npz", ** preprocessed)
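
The concatenation logic in this hunk can be sketched on in-memory arrays as follows. This is a simplified illustration rather than the script itself: the helper names `dtype_for` and `merge_preprocessed` are hypothetical, plain dicts stand in for the loaded `.npz` files, and each label block is sized from that dataset's own `actions` array.

```python
import numpy as np

CL_LABEL_KEY = "continual_learning_label"


def dtype_for(key):
    """Target dtype per key, following the rule used in the diff."""
    if key == "episode_starts":
        return bool
    if key in ("actions_proba", "rewards"):
        return float
    return int


def merge_preprocessed(datasets, labels=None):
    """Concatenate the per-key arrays of several datasets along axis 0.

    `datasets` is a list of dicts of arrays. `labels`, if given, holds one
    continual-learning label per dataset and is expanded to one per sample.
    """
    merged = {}
    for idx, data in enumerate(datasets):
        for key, values in data.items():
            arr = np.asarray(values).astype(dtype_for(key))
            if key in merged:
                arr = np.concatenate((merged[key], arr), axis=0)
            merged[key] = arr
        if labels is not None:
            task = np.full(len(data["actions"]), labels[idx])
            if CL_LABEL_KEY in merged:
                task = np.concatenate((merged[CL_LABEL_KEY], task), axis=0)
            merged[CL_LABEL_KEY] = task
    return merged


first = {"actions": [0, 1, 2], "rewards": [0, 0, 1], "episode_starts": [1, 0, 0]}
second = {"actions": [3, 1], "rewards": [1, 0], "episode_starts": [1, 0]}
merged = merge_preprocessed([first, second], labels=["CC", "SC"])
print(merged[CL_LABEL_KEY])
```

Building the label array once per dataset, outside the per-key loop, keeps its length equal to the total number of samples regardless of how many keys each file contains.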

# remove the old folders
shutil.rmtree(args.merge[0])
shutil.rmtree(args.merge[1])
if args.remove:
shutil.rmtree(args.merge[0])
shutil.rmtree(args.merge[1])
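
The presence checks used in `main()` (`'merge' in args`, `'continual_learning_labels' in args`) rely on those options being declared with `default=argparse.SUPPRESS`, which leaves the attribute off the namespace when the option is not given. The `store_true` flags (`--force`, `--remove`) are always present and default to `False`. A small standalone sketch of that behaviour with a reduced parser (`build_parser` is a hypothetical name, the paths are placeholders, and the mutually exclusive group from the script is omitted):

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Reduced merge parser (illustration only)")
    parser.add_argument("-f", "--force", action="store_true", default=False)
    parser.add_argument("-rm", "--remove", action="store_true", default=False)
    parser.add_argument("--continual-learning-labels", type=str, nargs=2,
                        metavar=("label_1", "label_2"), default=argparse.SUPPRESS)
    parser.add_argument("--merge", type=str, nargs=3,
                        metavar=("source_1", "source_2", "destination"),
                        default=argparse.SUPPRESS)
    return parser


args = build_parser().parse_args(["--merge", "data/a", "data/b", "data/out"])
print("merge" in args)                      # True
print("continual_learning_labels" in args)  # False
print(args.remove)                          # False: sources are kept unless -rm is passed
```

With this setup the sources are only deleted when `-rm` is given explicitly, which matches the change at the end of the hunk.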


if __name__ == '__main__':