- tf_agents.environments.wrappers.PyEnvironmentBaseWrapper(tf_agents.environments.py_environment.PyEnvironment)
  - ActionRepeatHistoryWrapper

class ActionRepeatHistoryWrapper(tf_agents.environments.wrappers.PyEnvironmentBaseWrapper)
ActionRepeatHistoryWrapper(env: tf_agents.environments.py_environment.PyEnvironment, times: Union[int, numpy.int16, numpy.int32, numpy.int64, tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, numpy.ndarray])
Repeats the given action over n steps while batching the resulting observations and accumulating the received rewards.
Attribution:
This class is a fusion of the ActionRepeat and HistoryWrapper classes from the TF-Agents library:
ActionRepeat -> https://github.com/tensorflow/agents/blob/v0.6.0/tf_agents/environments/wrappers.py#L190-L221
HistoryWrapper -> https://github.com/tensorflow/agents/blob/v0.6.0/tf_agents/environments/wrappers.py#L797-L877
Both classes are Copyright 2020 The TF-Agents Authors and licensed under the Apache License, Version 2.0.
- Method resolution order:
- ActionRepeatHistoryWrapper
- tf_agents.environments.wrappers.PyEnvironmentBaseWrapper
- tf_agents.environments.py_environment.PyEnvironment
- builtins.object
Methods defined here:
- __init__(self, env: tf_agents.environments.py_environment.PyEnvironment, times: Union[int, numpy.int16, numpy.int32, numpy.int64, tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, numpy.ndarray])
- Creates an action repeat wrapper.
Args:
env: Environment to wrap.
times: Number of times the action should be executed. The observations from all repeated steps are batched together and the returned reward is the sum of the individual rewards.
Raises:
ValueError: If the times parameter is not greater than 1.
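Example: constructing the wrapper (a minimal sketch; the import path for ActionRepeatHistoryWrapper is a placeholder, and the repeat count of 4 is arbitrary):
```python
from tf_agents.environments import suite_gym

# Placeholder import path; use the module where ActionRepeatHistoryWrapper is defined.
from my_project.wrappers import ActionRepeatHistoryWrapper

base_env = suite_gym.load('CartPole-v0')              # any PyEnvironment works
env = ActionRepeatHistoryWrapper(base_env, times=4)   # times must be greater than 1
```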
- observation_spec(self) -> Union[tf_agents.specs.array_spec.ArraySpec, Iterable[tf_agents.specs.array_spec.ArraySpec], Mapping[str, tf_agents.specs.array_spec.ArraySpec]]
- Defines the observations provided by the environment.
May use a subclass of `ArraySpec` that specifies additional properties such
as min and max bounds on the values.
Returns:
An `ArraySpec`, or a nested dict, list or tuple of `ArraySpec`s.
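Example: inspecting the wrapped observation spec (a sketch, continuing from the construction example above; that the wrapped spec gains a leading `times` dimension is an assumption based on the class description, not a documented guarantee):
```python
print('base    :', base_env.observation_spec())
print('wrapped :', env.observation_spec())
# Assumption: the wrapped spec batches `times` consecutive observations,
# e.g. a base shape of (4,) becoming (times, 4).
```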
Data and other attributes defined here:
- __abstractmethods__ = frozenset()
Methods inherited from tf_agents.environments.wrappers.PyEnvironmentBaseWrapper:
- __getattr__(self, name: str)
- Forward all other calls to the base environment.
- action_spec(self) -> Union[numpy.ndarray, Iterable[numpy.ndarray], Mapping[str, numpy.ndarray]]
- Defines the actions that should be provided to `step()`.
May use a subclass of `ArraySpec` that specifies additional properties such
as min and max bounds on the values.
Returns:
An `ArraySpec`, or a nested dict, list or tuple of `ArraySpec`s.
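Example: sampling a valid action from the spec (a sketch, continuing from the construction example; the action spec is expected to be forwarded unchanged from the base environment):
```python
import numpy as np
from tf_agents.specs import array_spec

rng = np.random.RandomState(0)
# Draw a random action that conforms to the (possibly nested) action spec.
action = array_spec.sample_spec_nest(env.action_spec(), rng)
```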
- close(self) -> None
- Frees any resources used by the environment.
Implement this method for an environment backed by an external process.
This method can be used directly
```python
env = Env(...)
# Use env.
env.close()
```
or via a context manager
```python
with Env(...) as env:
# Use env.
```
- get_info(self) -> Any
- Returns the environment info returned on the last step.
Returns:
Info returned by last call to step(). None by default.
Raises:
NotImplementedError: If the environment does not use info.
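Example (a sketch, continuing from the examples above; many environments do not populate info and raise NotImplementedError):
```python
env.reset()
env.step(action)
try:
    print(env.get_info())   # info dict from the most recent step, if any
except NotImplementedError:
    print('This environment does not expose step info.')
```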
- get_state(self) -> Any
- Returns the `state` of the environment.
The `state` contains everything required to restore the environment to the
current configuration. This can contain e.g.
- The current time_step.
- The number of steps taken in the environment (for finite horizon MDPs).
- Hidden state (for POMDPs).
Callers should not assume anything about the contents or format of the
returned `state`. It should be treated as a token that can be passed back to
`set_state()` later.
Note that the returned `state` handle should not be modified by the
environment later on, and ensuring this (e.g. using copy.deepcopy) is the
responsibility of the environment.
Returns:
state: The current state of the environment.
- render(self, mode: str = 'rgb_array') -> Union[numpy.ndarray, Iterable[numpy.ndarray], Mapping[str, numpy.ndarray]]
- Renders the environment.
Args:
mode: One of ['rgb_array', 'human']. Renders to a numpy array, or brings
up a window where the environment can be visualized.
Returns:
An ndarray of shape [width, height, 3] denoting an RGB image if mode is
`rgb_array`. Otherwise return nothing and render directly to a display
window.
Raises:
NotImplementedError: If the environment does not support rendering.
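Example (a sketch, continuing from the construction example; rendering support depends on the underlying environment):
```python
env.reset()
try:
    frame = env.render(mode='rgb_array')
    print(frame.shape)   # e.g. (width, height, 3) RGB image
except NotImplementedError:
    print('Rendering is not supported by this environment.')
```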
- seed(self, seed: Union[int, Sequence[int], tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, numpy.ndarray]) -> Union[int, Sequence[int], tensorflow.python.framework.ops.Tensor, tensorflow.python.framework.sparse_tensor.SparseTensor, tensorflow.python.ops.ragged.ragged_tensor.RaggedTensor, numpy.ndarray]
- Seeds the environment.
Args:
seed: Value to use as seed for the environment.
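Example (a sketch; whether seeding actually makes episodes deterministic depends on the wrapped environment, and some environments do not implement seeding):
```python
env.seed(42)          # seed before reset for reproducible episodes
first = env.reset()
```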
- set_state(self, state: Any) -> None
- Restores the environment to a given `state`.
See definition of `state` in the documentation for get_state().
Args:
state: A state to restore the environment to.
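Example: saving and restoring the environment state (a sketch, continuing from the construction example; not every environment implements get_state/set_state):
```python
env.reset()
try:
    snapshot = env.get_state()   # opaque token describing the current configuration
    env.step(action)             # advance the environment
    env.set_state(snapshot)      # roll back to the saved configuration
except NotImplementedError:
    print('State save/restore is not supported by this environment.')
```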
- wrapped_env(self) -> Any
Data descriptors inherited from tf_agents.environments.wrappers.PyEnvironmentBaseWrapper:
- batch_size
- batched
Methods inherited from tf_agents.environments.py_environment.PyEnvironment:
- __enter__(self)
- Allows the environment to be used in a with-statement context.
- __exit__(self, unused_exception_type, unused_exc_value, unused_traceback)
- Allows the environment to be used in a with-statement context.
- current_time_step(self) -> tf_agents.trajectories.time_step.TimeStep
- Returns the current timestep.
- discount_spec(self) -> Union[tf_agents.specs.array_spec.ArraySpec, Iterable[tf_agents.specs.array_spec.ArraySpec], Mapping[str, tf_agents.specs.array_spec.ArraySpec]]
- Defines the discounts that are returned by `step()`.
Override this method to define an environment that uses non-standard
discount values, for example an environment with array-valued discounts.
Returns:
An `ArraySpec`, or a nested dict, list or tuple of `ArraySpec`s.
- reset(self) -> tf_agents.trajectories.time_step.TimeStep
- Starts a new sequence and returns the first `TimeStep` of this sequence.
Note: Subclasses cannot override this directly. Subclasses implement
_reset() which will be called by this method. The output of _reset() will
be cached and made available through current_time_step().
Returns:
A `TimeStep` namedtuple containing:
step_type: A `StepType` of `FIRST`.
reward: 0.0, indicating the reward.
discount: 1.0, indicating the discount.
observation: A NumPy array, or a nested dict, list or tuple of arrays
corresponding to `observation_spec()`.
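Example: inspecting the first TimeStep (a sketch, continuing from the construction example above):
```python
first = env.reset()
print(first.is_first())              # True: step_type is StepType.FIRST
print(first.reward, first.discount)  # 0.0 reward and 1.0 discount on the first step
print(env.current_time_step())       # the result of reset() is cached here
```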
- reward_spec(self) -> Union[tf_agents.specs.array_spec.ArraySpec, Iterable[tf_agents.specs.array_spec.ArraySpec], Mapping[str, tf_agents.specs.array_spec.ArraySpec]]
- Defines the rewards that are returned by `step()`.
Override this method to define an environment that uses non-standard reward
values, for example an environment with array-valued rewards.
Returns:
An `ArraySpec`, or a nested dict, list or tuple of `ArraySpec`s.
- step(self, action: Union[numpy.ndarray, Iterable[numpy.ndarray], Mapping[str, numpy.ndarray]]) -> tf_agents.trajectories.time_step.TimeStep
- Updates the environment according to the action and returns a `TimeStep`.
If the environment returned a `TimeStep` with `StepType.LAST` at the
previous step the implementation of `_step` in the environment should call
`reset` to start a new sequence and ignore `action`.
This method will start a new sequence if called after the environment
has been constructed and `reset` has not been called. In this case
`action` will be ignored.
Note: Subclasses cannot override this directly. Subclasses implement
_step() which will be called by this method. The output of _step() will be
cached and made available through current_time_step().
Args:
action: A NumPy array, or a nested dict, list or tuple of arrays
corresponding to `action_spec()`.
Returns:
A `TimeStep` namedtuple containing:
step_type: A `StepType` value.
reward: A NumPy array, reward value for this timestep.
discount: A NumPy array, discount in the range [0, 1].
observation: A NumPy array, or a nested dict, list or tuple of arrays
corresponding to `observation_spec()`.
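Example: a random-action episode loop (a sketch, continuing from the construction example above):
```python
import numpy as np
from tf_agents.specs import array_spec

rng = np.random.RandomState(0)
time_step = env.reset()
episode_return = 0.0
while not time_step.is_last():
    action = array_spec.sample_spec_nest(env.action_spec(), rng)
    time_step = env.step(action)        # one call repeats the action `times` times
    episode_return += time_step.reward  # already the sum over the repeated steps
print('return:', episode_return)
```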
- time_step_spec(self) -> tf_agents.trajectories.time_step.TimeStep
- Describes the `TimeStep` fields returned by `step()`.
Override this method to define an environment that uses non-standard values
for any of the items returned by `step()`. For example, an environment with
array-valued rewards.
Returns:
A `TimeStep` namedtuple containing (possibly nested) `ArraySpec`s defining
the step_type, reward, discount, and observation structure.
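Example: inspecting the full time step spec (a sketch, continuing from the construction example):
```python
spec = env.time_step_spec()
print(spec.step_type)     # spec of the step type
print(spec.reward)        # spec of the (accumulated) reward
print(spec.discount)      # spec of the discount
print(spec.observation)   # spec of the (batched) observation
```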
Data descriptors inherited from tf_agents.environments.py_environment.PyEnvironment:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)