Product API - Master's Thesis (Treball Final de Màster)
by Rafael Jesús Castaño Ribes, January 2021

Derived methods, when not overridden, inherit the documentation of their superclasses. In those cases, the authorship of the documentation belongs to the developer of the superclass.
experiments.sac_experiment
..\src\experiments\sac_experiment.py

Train and Eval SAC.
All hyperparameters come from the SAC paper
https://arxiv.org/pdf/1812.05905.pdf

 
Modules
       
tf_agents.networks.actor_distribution_network
tf_agents.agents.ddpg.critic_network
tf_agents.agents.sac.sac_agent
tf_agents.agents.sac.tanh_normal_projection_network
tensorflow

 
Classes
       
experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment(builtins.object)
    SacExperiment

 
class SacExperiment(experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment)
SacExperiment(root_dir='', num_eval_episodes=10, summary_interval=1000, env_name='BipedalWalker-v3', env_action_repeat_times=4, actor_fc_layers=(256, 256), critic_obs_fc_layers=None, critic_action_fc_layers=None, critic_joint_fc_layers=(256, 256), actor_learning_rate=0.0003, critic_learning_rate=0.0003, alpha_learning_rate=0.0003, target_update_tau=0.005, target_update_period=1, td_errors_loss_fn=tf.math.squared_difference, gamma=0.99, reward_scale_factor=0.1, gradient_clipping=None, debug_summaries=False, summarize_grads_and_vars=False, replay_buffer_capacity=1000000, initial_collect_steps=10000, collect_steps_per_iteration=1, use_tf_functions=True, sample_batch_size=256, num_iterations=3000000, train_steps_per_iteration=1, log_interval=1000, eval_interval=10000, train_checkpoint_interval=50000, policy_checkpoint_interval=50000, replay_buffer_checkpoint_interval=50000, name='sac_default_experiment')
 
A simple train and eval class for a SAC agent.
 
 
Method resolution order:
    SacExperiment
    experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment
    builtins.object

Methods defined here:
__init__(self, root_dir='', num_eval_episodes=10, summary_interval=1000, env_name='BipedalWalker-v3', env_action_repeat_times=4, actor_fc_layers=(256, 256), critic_obs_fc_layers=None, critic_action_fc_layers=None, critic_joint_fc_layers=(256, 256), actor_learning_rate=0.0003, critic_learning_rate=0.0003, alpha_learning_rate=0.0003, target_update_tau=0.005, target_update_period=1, td_errors_loss_fn=tf.math.squared_difference, gamma=0.99, reward_scale_factor=0.1, gradient_clipping=None, debug_summaries=False, summarize_grads_and_vars=False, replay_buffer_capacity=1000000, initial_collect_steps=10000, collect_steps_per_iteration=1, use_tf_functions=True, sample_batch_size=256, num_iterations=3000000, train_steps_per_iteration=1, log_interval=1000, eval_interval=10000, train_checkpoint_interval=50000, policy_checkpoint_interval=50000, replay_buffer_checkpoint_interval=50000, name='sac_default_experiment')
    Initialize self.  See help(type(self)) for accurate signature.

Methods inherited from experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment:
copy(self)
launch(self)

Data descriptors inherited from experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment:
__dict__
    dictionary for instance variables (if defined)
__weakref__
    list of weak references to the object (if defined)
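
A minimal usage sketch, assuming the module path shown above: the experiment is built with any hyperparameters to override (all names come from the signature above) and is then run through the inherited launch() method. The root_dir value below is only an illustrative placeholder.

    from experiments.sac_experiment import SacExperiment

    # Build the experiment, overriding a few defaults; every parameter not
    # passed here keeps the default value shown in the class signature.
    experiment = SacExperiment(
        root_dir='./results/sac_bipedal',  # illustrative output directory
        env_name='BipedalWalker-v3',
        num_iterations=3000000,
        name='sac_default_experiment')

    # launch() is inherited from OffPolicyTimeStepBasedExperiment and runs
    # the train-and-eval loop configured above.
    experiment.launch()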

 
Data
absolute_import = _Feature((2, 5, 0, 'alpha', 1), (3, 0, 0, 'alpha', 0), 16384)
division = _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192)
print_function = _Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 65536)