tf_agents.agents.tf_agent.TFAgent(tensorflow.python.module.module.Module)
    NafAgent

class NafAgent(tf_agents.agents.tf_agent.TFAgent)
NafAgent(time_step_spec, action_spec, v_network, l_matrix_network, policy_network, optimizer, target_v_network=None, target_update_tau=1.0, target_update_period=1, td_errors_loss_fn=tf.math.squared_difference, gamma=1.0, train_step_counter=None, noise_factor=0.1, debug_summaries=False, summarize_grads_and_vars=False, name=None)
A NAF (Normalized Advantage Functions) agent.
Method resolution order:
    NafAgent
    tf_agents.agents.tf_agent.TFAgent
    tensorflow.python.module.module.Module
    tensorflow.python.training.tracking.tracking.AutoTrackable
    tensorflow.python.training.tracking.base.Trackable
    builtins.object
Methods defined here:
__init__(self,
    time_step_spec: tf_agents.trajectories.time_step.TimeStep,
    action_spec: Union[tensorflow.python.framework.type_spec.TypeSpec, Iterable[tensorflow.python.framework.type_spec.TypeSpec], Mapping[str, tensorflow.python.framework.type_spec.TypeSpec]],
    v_network: tf_agents.networks.network.Network,
    l_matrix_network: tf_agents.networks.network.Network,
    policy_network: tf_agents.networks.network.Network,
    optimizer: Union[tensorflow.python.keras.optimizer_v2.optimizer_v2.OptimizerV2, tensorflow.python.training.optimizer.Optimizer],
    target_v_network: Union[tf_agents.networks.network.Network, NoneType] = None,
    target_update_tau: Union[float, numpy.float16, numpy.float32, numpy.float64, tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, numpy.ndarray] = 1.0,
    target_update_period: Union[int, numpy.int16, numpy.int32, numpy.int64, tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, numpy.ndarray] = 1,
    td_errors_loss_fn: Callable[[Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor], Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor]], Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor]] = tf.math.squared_difference,
    gamma: Union[float, numpy.float16, numpy.float32, numpy.float64, tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, numpy.ndarray] = 1.0,
    train_step_counter: Union[tensorflow.python.ops.variables.Variable, NoneType] = None,
    noise_factor: Union[float, numpy.float16, numpy.float32, numpy.float64, tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, numpy.ndarray] = 0.1,
    debug_summaries: Union[bool, tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, numpy.ndarray] = False,
    summarize_grads_and_vars: Union[bool, tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, numpy.ndarray] = False,
    name: Union[str, NoneType] = None)
Creates a NAF Agent.
Args:
time_step_spec: A `TimeStep` spec of the expected time_steps.
action_spec: A nest of BoundedTensorSpec representing the actions.
v_network: A tf_agents.network.Network to be used by the agent to
predict the v-value of an observation. It will be called with
call(observation, step_type, network_state) and should return
the estimated v_value.
l_matrix_network: A tf_agents.network.Network to be used by the
agent to obtain the terms of the L matrix used to compute the
advantage value on an action. It will be called with
call(observation, step_type, network_state) and should return
the elements to build the matrix.
policy_network: A tf_agents.network.Network to be used by the
agent to obtain the action to execute given an observation of
the environment. It will be called with call(observation,
step_type [,policy_state]) and should return the action to execute.
optimizer: The optimizer to use for fitting the networks.
target_v_network: (Optional.) A `tf_agents.network.Network` to
be used during Q learning to predict the target V values.
Every `target_update_period` train steps, the weights from
`v_network` are copied (possibly with smoothing via
`target_update_tau`) to `target_v_network`. If `target_v_network`
is not provided, it is created by making a copy of the `v_network`,
which initializes a new network with the same structure and its
own layers and weights.
Performing a `Network.copy` does not work when the network instance
already has trainable parameters (e.g., it has already been built, or
it shares layers with another network). In these cases, it is up to you
to build a copy whose weights are not shared with the original
`v_network`, so that it can be used as a target network. If you provide
a `target_v_network` that shares any weights with `v_network`, a
warning will be logged but no exception is thrown.
target_update_tau: Factor for soft update of the target networks.
target_update_period: Period for soft update of the target networks.
td_errors_loss_fn: A function for computing the TD errors loss. If None,
a default value of elementwise huber_loss is used.
gamma: A discount factor for future rewards.
train_step_counter: An optional counter to increment every time the train
op is run. Defaults to the global_step.
noise_factor: Standard deviation for the Gaussian noise added
in the default collect policy.
debug_summaries: A bool to gather debug summaries.
summarize_grads_and_vars: If True, gradient and network variable summaries
will be written during training.
name: The name of this agent. All variables in this module will fall
under that name.
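
Example: a minimal construction sketch. The spec, optimizer and variable calls
below are standard TensorFlow / TF-Agents APIs; MyVNetwork, MyLMatrixNetwork and
MyPolicyNetwork are hypothetical placeholders for whatever
`tf_agents.networks.network.Network` subclasses your project defines for this
agent, and the hyperparameter values are arbitrary.

    import tensorflow as tf
    from tf_agents.specs import tensor_spec
    from tf_agents.trajectories import time_step as ts

    # Specs for a toy continuous-control task: 4-D observation, 2-D action in [-1, 1].
    observation_spec = tf.TensorSpec([4], tf.float32, name='observation')
    action_spec = tensor_spec.BoundedTensorSpec(
        [2], tf.float32, minimum=-1.0, maximum=1.0, name='action')
    time_step_spec = ts.time_step_spec(observation_spec)

    # Hypothetical Network subclasses following the
    # call(observation, step_type, network_state) contract described above;
    # replace with your own implementations.
    v_net = MyVNetwork(observation_spec)
    l_matrix_net = MyLMatrixNetwork(observation_spec, action_spec)
    policy_net = MyPolicyNetwork(observation_spec, action_spec)

    # Import NafAgent from wherever this package exposes it.
    agent = NafAgent(
        time_step_spec=time_step_spec,
        action_spec=action_spec,
        v_network=v_net,
        l_matrix_network=l_matrix_net,
        policy_network=policy_net,
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        target_update_tau=0.05,
        target_update_period=5,
        gamma=0.99,
        noise_factor=0.1,
        train_step_counter=tf.Variable(0, dtype=tf.int64))
    agent.initialize()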
Data and other attributes defined here:
- __abstractmethods__ = frozenset()
Methods inherited from tf_agents.agents.tf_agent.TFAgent:
- initialize(self) -> Union[tensorflow.python.framework.ops.Operation, NoneType]
- Initializes the agent.
Returns:
An operation that can be used to initialize the agent.
Raises:
RuntimeError: If the class was not initialized properly (`super().__init__`
was not called).
- preprocess_sequence(self, experience: Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, Iterable[Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor]], Mapping[str, Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor]]]) -> Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, Iterable[Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor]], Mapping[str, Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor]]]
- Defines preprocess_sequence function to be fed into replay buffers.
This defines how we preprocess the collected data before training.
Defaults to pass through for most agents.
Structure of `experience` must match that of `self.collect_data_spec`.
Args:
experience: a `Trajectory` shaped [batch, time, ...] or [time, ...] which
represents the collected experience data.
Returns:
A post processed `Trajectory` with the same shape as the input.
Raises:
TypeError: If experience does not match `self.collect_data_spec` structure
types.
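
Illustrative usage sketch (assumes `agent` is a constructed NafAgent and
`dataset` yields `(Trajectory, sample_info)` pairs, e.g. from a replay buffer's
`as_dataset`):

    # Run the agent's preprocessing over each sampled trajectory before it is
    # fed to train(). For agents that keep the default, this is a pass-through.
    dataset = dataset.map(
        lambda trajectory, sample_info: (agent.preprocess_sequence(trajectory),
                                         sample_info))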
- train(self, experience: Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, Iterable[Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor]], Mapping[str, Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor]]], weights: Union[tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, NoneType] = None, **kwargs) -> tf_agents.agents.tf_agent.LossInfo
- Trains the agent.
Args:
experience: A batch of experience data in the form of a `Trajectory`. The
structure of `experience` must match that of `self.training_data_spec`.
All tensors in `experience` must be shaped `[batch, time, ...]` where
`time` must be equal to `self.train_sequence_length` if that
property is not `None`.
weights: (optional). A `Tensor`, either `0-D` or shaped `[batch]`,
containing weights to be used when calculating the total train loss.
Weights are typically multiplied elementwise against the per-batch loss,
but the implementation is up to the Agent.
**kwargs: Any additional data as declared by `self.train_argspec`.
Returns:
A `LossInfo` loss tuple containing loss and info tensors.
- In eager mode, the loss values are first calculated, then a train step
is performed before they are returned.
- In graph mode, executing any or all of the loss tensors
will first calculate the loss value(s), then perform a train step,
and return the pre-train-step `LossInfo`.
Raises:
TypeError: If `validate_args is True` and: Experience is not type
`Trajectory`; or if `experience` does not match
`self.training_data_spec` structure types.
ValueError: If `validate_args is True` and: Experience tensors' time axes
are not compatible with `self.train_sequence_length`; or if experience
does not match `self.training_data_spec` structure.
ValueError: If `validate_args is True` and the user does not pass
`**kwargs` matching `self.train_argspec`.
RuntimeError: If the class was not initialized properly (`super().__init__`
was not called).
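
Illustrative training-loop sketch. It assumes `agent` was constructed as in the
example above, that this agent (like DQN) trains on single transitions so
`num_steps=2` is appropriate, and that the buffer has already been filled (for
example by a driver running `agent.collect_policy`); buffer sizes and the
iteration count are arbitrary.

    from tf_agents.replay_buffers import tf_uniform_replay_buffer

    replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
        data_spec=agent.collect_data_spec,   # spec of the collected trajectories
        batch_size=1,                        # number of parallel environments
        max_length=100000)

    # Sample mini-batches of length-2 trajectories (single transitions).
    dataset = replay_buffer.as_dataset(
        sample_batch_size=64, num_steps=2).prefetch(3)
    iterator = iter(dataset)

    for _ in range(1000):
        experience, _ = next(iterator)
        loss_info = agent.train(experience)
        print('step:', agent.train_step_counter.numpy(),
              'loss:', loss_info.loss.numpy())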
Data descriptors inherited from tf_agents.agents.tf_agent.TFAgent:
- action_spec
- TensorSpec describing the action produced by the agent.
Returns:
A single `BoundedTensorSpec`, or a nested dict, list or tuple of
`BoundedTensorSpec` objects, which describe the shape and
dtype of each action Tensor.
- collect_data_spec
- Returns a `Trajectory` spec, as expected by the `collect_policy`.
Returns:
A `Trajectory` spec.
- collect_policy
- Return a policy that can be used to collect data from the environment.
Returns:
A `tf_policy.TFPolicy` object.
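
Illustrative collection sketch (assumes `tf_env` is a TF-Agents TFEnvironment
wrapping your task, and that `agent` and `replay_buffer` exist as in the
examples above):

    from tf_agents.drivers import dynamic_step_driver

    # Step the environment with the exploratory collect policy and record the
    # resulting trajectories into the replay buffer.
    collect_driver = dynamic_step_driver.DynamicStepDriver(
        tf_env,
        agent.collect_policy,
        observers=[replay_buffer.add_batch],
        num_steps=1)
    collect_driver.run()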
- data_context
- debug_summaries
- policy
- Return the current policy held by the agent.
Returns:
A `tf_policy.TFPolicy` object.
- summaries_enabled
- summarize_grads_and_vars
- time_step_spec
- Describes the `TimeStep` tensors expected by the agent.
Returns:
A `TimeStep` namedtuple with `TensorSpec` objects instead of Tensors,
which describe the shape, dtype and name of each tensor.
- train_argspec
- TensorSpec describing extra supported `kwargs` to `train()`.
Returns:
A `dict` mapping kwarg strings to nests of `tf.TypeSpec` objects (or
`None` if there is no `train_argspec`).
- train_sequence_length
- The number of time steps needed in experience tensors passed to `train`.
Train requires experience to be a `Trajectory` containing tensors shaped
`[B, T, ...]`. This argument describes the value of `T` required.
For example, for non-RNN DQN training, `T=2` because DQN requires single
transitions.
If this value is `None`, then `train` can handle an unknown `T` (it can be
determined at runtime from the data). Most RNN-based agents fall into
this category.
Returns:
The number of time steps needed in experience tensors passed to `train`.
May be `None` to mean no constraint.
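
Illustrative sketch: this property can be used to choose the `num_steps`
argument of a replay buffer's `as_dataset` (assumes `agent` and `replay_buffer`
as in the examples above; the fallback of 2 is an arbitrary choice for agents
that accept any length):

    num_steps = agent.train_sequence_length or 2
    dataset = replay_buffer.as_dataset(
        sample_batch_size=64, num_steps=num_steps)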
- train_step_counter
- training_data_spec
- Returns a trajectory spec, as expected by the train() function.
- validate_args
- Whether `train` & `preprocess_sequence` validate input & output args.
Class methods inherited from tensorflow.python.module.module.Module:
- with_name_scope(method) from abc.ABCMeta
- Decorator to automatically enter the module name scope.
>>> class MyModule(tf.Module):
... @tf.Module.with_name_scope
... def __call__(self, x):
... if not hasattr(self, 'w'):
... self.w = tf.Variable(tf.random.normal([x.shape[1], 3]))
... return tf.matmul(x, self.w)
Using the above module would produce `tf.Variable`s and `tf.Tensor`s whose
names included the module name:
>>> mod = MyModule()
>>> mod(tf.ones([1, 2]))
<tf.Tensor: shape=(1, 3), dtype=float32, numpy=..., dtype=float32)>
>>> mod.w
<tf.Variable 'my_module/Variable:0' shape=(2, 3) dtype=float32,
numpy=..., dtype=float32)>
Args:
method: The method to wrap.
Returns:
The original method wrapped such that it enters the module's name scope.
Data descriptors inherited from tensorflow.python.module.module.Module:
- name
- Returns the name of this module as passed or determined in the ctor.
NOTE: This is not the same as the `self.name_scope.name` which includes
parent module names.
- name_scope
- Returns a `tf.name_scope` instance for this class.
- submodules
- Sequence of all sub-modules.
Submodules are modules which are properties of this module, or found as
properties of modules which are properties of this module (and so on).
>>> a = tf.Module()
>>> b = tf.Module()
>>> c = tf.Module()
>>> a.b = b
>>> b.c = c
>>> list(a.submodules) == [b, c]
True
>>> list(b.submodules) == [c]
True
>>> list(c.submodules) == []
True
Returns:
A sequence of all submodules.
- trainable_variables
- Sequence of trainable variables owned by this module and its submodules.
Note: this method uses reflection to find variables on the current instance
and submodules. For performance reasons you may wish to cache the result
of calling this method if you don't expect the return value to change.
Returns:
A sequence of variables for the current module (sorted by attribute
name) followed by variables from all submodules recursively (breadth
first).
- variables
- Sequence of variables owned by this module and its submodules.
Note: this method uses reflection to find variables on the current instance
and submodules. For performance reasons you may wish to cache the result
of calling this method if you don't expect the return value to change.
Returns:
A sequence of variables for the current module (sorted by attribute
name) followed by variables from all submodules recursively (breadth
first).
Methods inherited from tensorflow.python.training.tracking.tracking.AutoTrackable:
- __delattr__(self, name)
- Implement delattr(self, name).
- __setattr__(self, name, value)
- Support self.foo = trackable syntax.
Data descriptors inherited from tensorflow.python.training.tracking.base.Trackable:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)