-
Hi @famora2

Typically, curriculum learning is accomplished by progressively increasing the difficulty of the task, for example:

```python
# NOTE: using API skrl-v0.8.0
from skrl.trainers.torch import ManualTrainer

env = ...
agents = ...

# create a manual trainer
cfg = {"timesteps": 50000, "headless": False}
trainer = ManualTrainer(env=env, agents=agents, cfg=cfg)

# train the agent(s) one timestep at a time
for timestep in range(cfg["timesteps"]):
    trainer.train(timestep=timestep)

    # adjust environment difficulty
    if SOME_METRIC is REACHED:
        env.increase_difficulty()
```
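To make the `SOME_METRIC is REACHED` placeholder above a bit more concrete, here is a minimal sketch (these helpers are not part of skrl; the reward threshold, the window size and `env.increase_difficulty()` are task-specific placeholders you would implement yourself) that uses the mean of the most recent episode returns as the metric:

```python
# NOTE: hypothetical sketch -- not part of skrl.
# The threshold, window and `env.increase_difficulty()` are task-specific placeholders.
from collections import deque

import numpy as np

class DifficultyScheduler:
    """Raise the task difficulty once the recent mean episode return is high enough."""
    def __init__(self, reward_threshold=200.0, max_level=5, window=100):
        self.reward_threshold = reward_threshold
        self.max_level = max_level
        self.returns = deque(maxlen=window)   # most recent episode returns
        self.level = 0

    def update(self, env, episode_return):
        self.returns.append(episode_return)
        if (len(self.returns) == self.returns.maxlen
                and np.mean(self.returns) >= self.reward_threshold
                and self.level < self.max_level):
            self.level += 1
            self.returns.clear()              # measure the new difficulty level from scratch
            env.increase_difficulty()         # placeholder method from the snippet above
```

Inside the manual-training loop you would call `scheduler.update(env, episode_return)` whenever an episode finishes; this plays the role of the `if SOME_METRIC is REACHED` check.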
In case you want to overwrite the agent's actions, you can override the policy's `.act(...)` method to return whatever actions you want:

```python
# NOTE: using API skrl-v0.8.0
import torch
import torch.nn as nn

from skrl.models.torch import Model, GaussianMixin

# define the model
class Policy(GaussianMixin, Model):
    def __init__(self, observation_space, action_space, device,
                 clip_actions=False, clip_log_std=True, min_log_std=-20, max_log_std=2, reduction="sum"):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std, reduction)

        self.net = nn.Sequential(...)
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))

    def act(self, states, taken_actions, role):
        # use custom recorded actions...
        if SOME_METRIC is USED:
            return CUSTOM_action, CUSTOM_log_prob, None
        # use the policy
        else:
            return super().act(states, taken_actions, role)

    def compute(self, states, taken_actions, role):
        return self.net(states), self.log_std_parameter
```

How about these ideas?
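Regarding your concern that simply overwriting the actions messes up the training: one possible direction, sketched here under assumptions that are not part of skrl (a `recorded_actions` NumPy array indexed by timestep, a linear decay of the mixing probability, and the `Policy` class from the snippet above with its `...` filled in), is to return the recorded action together with its log-probability under the current policy distribution, and to anneal the probability of using the recorded data towards zero:

```python
# NOTE: hypothetical sketch (skrl-v0.8.0 style), not an official skrl feature.
# `recorded_actions`, `total_timesteps` and the linear decay are placeholders.
import torch

class GuidedPolicy(Policy):  # the Policy sketched above (with its `...` filled in)
    def __init__(self, *args, recorded_actions, total_timesteps=50000, **kwargs):
        super().__init__(*args, **kwargs)
        # pre-recorded joint trajectory, shape: (timesteps, num_actions)
        self.recorded_actions = torch.as_tensor(recorded_actions, dtype=torch.float32,
                                                device=self.device)
        self.total_timesteps = total_timesteps
        self.current_timestep = 0  # updated externally from the training loop

    def act(self, states, taken_actions, role):
        # probability of using the recorded action, linearly annealed from 1 to 0
        p = max(0.0, 1.0 - self.current_timestep / self.total_timesteps)
        if torch.rand(1).item() < p and self.current_timestep < len(self.recorded_actions):
            # recorded action for this timestep, broadcast to the batch
            recorded = self.recorded_actions[self.current_timestep].expand(states.shape[0], -1)
            # evaluate the recorded action under the CURRENT policy distribution,
            # so the stored log-probability matches the action that is executed
            mean_actions, log_std = self.compute(states, taken_actions, role)
            dist = torch.distributions.Normal(mean_actions, log_std.exp())
            log_prob = dist.log_prob(recorded).sum(dim=-1, keepdim=True)
            return recorded, log_prob, mean_actions
        # otherwise, act with the policy as usual
        return super().act(states, taken_actions, role)
```

In the manual-training loop you would set `policy.current_timestep = timestep` before each `trainer.train(timestep=timestep)` call. Whether this actually stabilizes learning is algorithm-dependent; for on-policy algorithms such as PPO the stored log-probabilities should at least correspond to the actions that are really executed, which is what this sketch tries to ensure.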
-
Hi,
I would like to implement so-called curriculum learning using skrl, where I initialize the training with pre-recorded data and gradually decrease the usage of this data.
The part that I do not understand is the way the code is structured. Taking the "FrankaCabinet" as an example:
The above code is used to initialize the agent and start the training. Assuming I have a pre-recorded joint trajectory of the Franka arm as a NumPy array, I would like to overwrite the action (i.e. the output of the agent) with this array to guide the robot arm towards the desired behavior. However, this way the whole training would be messed up, since the provided actions are effectively useless for learning. So, by simply overwriting the action values, the pre-recorded NumPy array cannot be used appropriately.
Do you have advice/tips for this case?