- experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment(builtins.object)
    - SacExperiment
class SacExperiment(experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment)

    SacExperiment(root_dir='', num_eval_episodes=10, summary_interval=1000, env_name='BipedalWalker-v3', env_action_repeat_times=4, actor_fc_layers=(256, 256), critic_obs_fc_layers=None, critic_action_fc_layers=None, critic_joint_fc_layers=(256, 256), actor_learning_rate=0.0003, critic_learning_rate=0.0003, alpha_learning_rate=0.0003, target_update_tau=0.005, target_update_period=1, td_errors_loss_fn=<function squared_difference>, gamma=0.99, reward_scale_factor=0.1, gradient_clipping=None, debug_summaries=False, summarize_grads_and_vars=False, replay_buffer_capacity=1000000, initial_collect_steps=10000, collect_steps_per_iteration=1, use_tf_functions=True, sample_batch_size=256, num_iterations=3000000, train_steps_per_iteration=1, log_interval=1000, eval_interval=10000, train_checkpoint_interval=50000, policy_checkpoint_interval=50000, replay_buffer_checkpoint_interval=50000, name='sac_default_experiment')

    A simple train and eval class for a SAC agent.
- Method resolution order:
    - SacExperiment
    - experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment
    - builtins.object
Methods defined here:
- __init__(self, root_dir='', num_eval_episodes=10, summary_interval=1000, env_name='BipedalWalker-v3', env_action_repeat_times=4, actor_fc_layers=(256, 256), critic_obs_fc_layers=None, critic_action_fc_layers=None, critic_joint_fc_layers=(256, 256), actor_learning_rate=0.0003, critic_learning_rate=0.0003, alpha_learning_rate=0.0003, target_update_tau=0.005, target_update_period=1, td_errors_loss_fn=<function squared_difference>, gamma=0.99, reward_scale_factor=0.1, gradient_clipping=None, debug_summaries=False, summarize_grads_and_vars=False, replay_buffer_capacity=1000000, initial_collect_steps=10000, collect_steps_per_iteration=1, use_tf_functions=True, sample_batch_size=256, num_iterations=3000000, train_steps_per_iteration=1, log_interval=1000, eval_interval=10000, train_checkpoint_interval=50000, policy_checkpoint_interval=50000, replay_buffer_checkpoint_interval=50000, name='sac_default_experiment')
    - Initialize self. See help(type(self)) for accurate signature.
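A minimal construction sketch, assuming SacExperiment can be imported from an experiments.sac_experiment module (the module path is not shown in this help text; adjust it to wherever the class actually lives). Any keyword not passed keeps the default listed in the signature above:

    # Hypothetical import path for the class documented here.
    from experiments.sac_experiment import SacExperiment

    # Override a few defaults from the signature; learning rates, target
    # updates, replay buffer capacity, etc. keep the values listed above.
    experiment = SacExperiment(
        root_dir='./runs/sac_bipedal',
        env_name='BipedalWalker-v3',
        num_iterations=1_000_000,
        sample_batch_size=256,
        name='sac_bipedal_run1',
    )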
Methods inherited from experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment:
- copy(self)
- launch(self)
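The inherited methods are what drive a run: launch() appears to be the entry point that performs collection, training, and evaluation, and copy() appears to return a duplicate experiment object for spawning variants. Neither method is documented here, so the continuation of the sketch above is an assumption:

    # Run the experiment configured above (assumed to be a blocking call).
    experiment.launch()

    # Assumed usage: duplicate the configuration and launch a second run.
    variant = experiment.copy()
    variant.launch()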
Data descriptors inherited from experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment:
- __dict__
    - dictionary for instance variables (if defined)
- __weakref__
    - list of weak references to the object (if defined)