experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment(builtins.object)
    NafExperiment

class NafExperiment(experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment)
    NafExperiment(root_dir='', num_eval_episodes=10, summary_interval=1000, env_name='BipedalWalker-v3', env_action_repeat_times=4, preprocessing_conv_layer_params=None, preprocessing_conv_type='1d', preprocessing_fc_layer_params=(256,), preprocessing_dropout_layer_params=None, v_network_fc_layer_params=(256,), v_network_dropout_layer_params=None, l_network_fc_layer_params=(256,), l_network_dropout_layer_params=None, policy_network_fc_layer_params=(256,), policy_network_dropout_layer_params=None, policy_network_uses_shared_preprocessing_network=True, learning_rate=0.0003, target_update_tau=0.005, target_update_period=1, td_errors_loss_fn=tf.math.squared_difference, gamma=0.99, noise_factor=0.1, debug_summaries=False, summarize_grads_and_vars=False, replay_buffer_capacity=1000000, initial_collect_steps=10000, collect_steps_per_iteration=1, use_tf_functions=True, sample_batch_size=256, num_iterations=3000000, train_steps_per_iteration=1, log_interval=1000, eval_interval=10000, train_checkpoint_interval=50000, policy_checkpoint_interval=50000, replay_buffer_checkpoint_interval=50000, name='default_naf_experiment')

    A simple train-and-eval class for a NAF (Normalized Advantage Functions) agent.

    Method resolution order:
        NafExperiment
        experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment
        builtins.object

    Methods defined here:

    __init__(self, root_dir='', num_eval_episodes=10, summary_interval=1000, env_name='BipedalWalker-v3', env_action_repeat_times=4, preprocessing_conv_layer_params=None, preprocessing_conv_type='1d', preprocessing_fc_layer_params=(256,), preprocessing_dropout_layer_params=None, v_network_fc_layer_params=(256,), v_network_dropout_layer_params=None, l_network_fc_layer_params=(256,), l_network_dropout_layer_params=None, policy_network_fc_layer_params=(256,), policy_network_dropout_layer_params=None, policy_network_uses_shared_preprocessing_network=True, learning_rate=0.0003, target_update_tau=0.005, target_update_period=1, td_errors_loss_fn=tf.math.squared_difference, gamma=0.99, noise_factor=0.1, debug_summaries=False, summarize_grads_and_vars=False, replay_buffer_capacity=1000000, initial_collect_steps=10000, collect_steps_per_iteration=1, use_tf_functions=True, sample_batch_size=256, num_iterations=3000000, train_steps_per_iteration=1, log_interval=1000, eval_interval=10000, train_checkpoint_interval=50000, policy_checkpoint_interval=50000, replay_buffer_checkpoint_interval=50000, name='default_naf_experiment')
        Initialize self. See help(type(self)) for accurate signature.
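
    Example: a minimal construction sketch. The module path
    "experiments.naf_experiment" is an assumption (this help page does not
    show where NafExperiment is importable from); every keyword argument
    not listed here keeps the default shown in the signature above.

        import tensorflow as tf

        from experiments.naf_experiment import NafExperiment  # assumed module path

        # Override a few hyperparameters; all other arguments keep the
        # defaults listed in the signature above.
        experiment = NafExperiment(
            root_dir='/tmp/naf_bipedal_walker',  # checkpoints and summaries are written here
            env_name='BipedalWalker-v3',
            learning_rate=3e-4,
            sample_batch_size=256,
            td_errors_loss_fn=tf.math.squared_difference,
            name='bipedal_walker_naf',
        )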

    Methods inherited from experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment:

    copy(self)

    launch(self)
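
    Example: a hedged usage sketch. It assumes copy() returns an independent
    experiment with the same configuration and that launch() runs the full
    collect/train/eval loop until num_iterations is reached; neither
    behaviour is spelled out on this page.

        baseline = NafExperiment(root_dir='/tmp/naf_baseline')
        variant = baseline.copy()  # assumed: configuration-preserving duplicate
        baseline.launch()          # assumed: blocks until training completes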

    Data descriptors inherited from experiments.off_p_tsb_experiment.OffPolicyTimeStepBasedExperiment:

    __dict__
        dictionary for instance variables (if defined)

    __weakref__
        list of weak references to the object (if defined)