yogadl
-
class yogadl.Stream(iterator_fn: Callable, length: int, output_types: Any = None, output_shapes: Any = None)
Stream contains a generator of data and other required information to feed into framework-specific data APIs.
-
__iter__() → Any
Iterate through the records in the stream.
-
__len__() → int
Return the length of the stream, which may differ from the length of the dataset.
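To illustrate the shape described above, here is a minimal stand-in written without yogadl installed. `SketchStream` is a hypothetical class that only mirrors the documented constructor arguments; it is not yogadl's implementation, and it assumes `iterator_fn` takes no arguments and returns a fresh iterator on each call.

```python
from typing import Any, Callable, Iterator


class SketchStream:
    """Hypothetical stand-in mirroring the documented Stream signature."""

    def __init__(
        self,
        iterator_fn: Callable[[], Iterator[Any]],
        length: int,
        output_types: Any = None,
        output_shapes: Any = None,
    ) -> None:
        # Assumption: iterator_fn takes no arguments and returns a new iterator.
        self.iterator_fn = iterator_fn
        self.length = length
        self.output_types = output_types
        self.output_shapes = output_shapes

    def __iter__(self) -> Iterator[Any]:
        return self.iterator_fn()

    def __len__(self) -> int:
        return self.length


records = ["a", "b", "c", "d"]
# Stream over the last three records only, so len(stream) != len(records).
stream = SketchStream(lambda: iter(records[1:]), length=3)
```

In this sketch the stream starts one record into a four-record dataset, which is one way the stream length can differ from the dataset length.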
-
-
class yogadl.DataRef
The base interface for a reference to a dataset in the yogadl framework.
The DataRef may refer to a dataset in a remote storage location; it need not refer to locally available data. The only mechanism for accessing the records inside the dataset is to create a Stream and iterate through it.
Because all of the random-access options are specified up front, the backend that provides the DataRef can offer performance-optimized streaming: yogadl guarantees that lower layers will operate without random access.
-
abstract __len__() → int
Return the length of the dataset that the DataRef refers to.
-
abstract stream(start_offset: int = 0, shuffle: bool = False, skip_shuffle_at_epoch_end: bool = False, shuffle_seed: Optional[int] = None, shard_rank: int = 0, num_shards: int = 1, drop_shard_remainder: bool = False) → yogadl._core.Stream
Create a sequentially accessible set of records from the dataset, according to the random-access arguments given as parameters.
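One way to picture how up-front random-access options can be turned into a purely sequential read order is sketched below. `plan_indices` is a hypothetical helper, not part of yogadl, and every semantic choice in it is an assumption: strided sharding, a seeded shuffle within the shard, and `start_offset` counted in records of the shard. It plans a single epoch, so `skip_shuffle_at_epoch_end` is left out.

```python
import random
from typing import List, Optional


def plan_indices(
    dataset_len: int,
    start_offset: int = 0,
    shuffle: bool = False,
    shuffle_seed: Optional[int] = None,
    shard_rank: int = 0,
    num_shards: int = 1,
    drop_shard_remainder: bool = False,
) -> List[int]:
    """Resolve the options into one sequential read order (single epoch)."""
    indices = list(range(dataset_len))
    if drop_shard_remainder:
        # Assumption: trim so every shard receives the same number of records.
        usable = dataset_len - dataset_len % num_shards
        indices = indices[:usable]
    # Assumption: shard by striding; a real backend may shard differently.
    indices = indices[shard_rank::num_shards]
    if shuffle:
        # Assumption: a seeded shuffle of this shard's indices.
        random.Random(shuffle_seed).shuffle(indices)
    # Assumption: start_offset counts records already consumed from this shard.
    return indices[start_offset:]


# Rank 0 of 3 shards over a 10-record dataset, with and without the remainder.
with_remainder = plan_indices(10, shard_rank=0, num_shards=3)
without_remainder = plan_indices(10, shard_rank=0, num_shards=3, drop_shard_remainder=True)
```

Once the order is fixed, a consumer only has to walk the list front to back. Under these assumptions the resulting length also differs from the dataset length whenever sharding or a nonzero offset is in effect.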
-
-
class yogadl.Storage
Storage is a cache for datasets.
Storage accepts datasets in various forms via submit(), and returns DataRef objects via fetch().
Conceptually, Storage acts as a DataRef factory. It stores datasets in an unspecified format and returns objects that implement the DataRef interface.
Note that submit() and fetch() are not multiprocessing-safe by default. The @cacheable decorator should be safe to call simultaneously from many threads, processes, or machines.
-
abstract cacheable(dataset_id: str, dataset_version: str) → Callable
A decorator that calls submit() and fetch() and is responsible for coordinating among instances of Storage in different processes.
-
abstract fetch(dataset_id: str, dataset_version: str) → yogadl._core.DataRef
Fetch a dataset from storage and provide a DataRef for streaming it.
-
abstract submit(data: tensorflow.python.data.ops.dataset_ops.DatasetV2, dataset_id: str, dataset_version: str) → None
Store a dataset in the cache.
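The relationship between submit(), fetch(), and the cacheable decorator can be sketched with a toy in-memory cache. `InMemoryStorage` is a hypothetical class written for illustration only: it accepts any iterable in place of a TensorFlow dataset, returns a plain list where a real Storage would return a DataRef, and does no coordination across processes.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


class InMemoryStorage:
    """Hypothetical toy cache keyed by (dataset_id, dataset_version)."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, str], List[Any]] = {}

    def submit(self, data: Iterable[Any], dataset_id: str, dataset_version: str) -> None:
        # Materialize the records so they can be fetched repeatedly.
        self._cache[(dataset_id, dataset_version)] = list(data)

    def fetch(self, dataset_id: str, dataset_version: str) -> List[Any]:
        # Assumption: a plain list stands in for the DataRef a real Storage returns.
        return self._cache[(dataset_id, dataset_version)]

    def cacheable(self, dataset_id: str, dataset_version: str) -> Callable:
        def decorator(make_dataset: Callable[..., Iterable[Any]]) -> Callable:
            def wrapper(*args: Any, **kwargs: Any) -> List[Any]:
                key = (dataset_id, dataset_version)
                if key not in self._cache:
                    # Build and submit the dataset only on a cache miss.
                    self.submit(make_dataset(*args, **kwargs), dataset_id, dataset_version)
                return self.fetch(dataset_id, dataset_version)
            return wrapper
        return decorator


storage = InMemoryStorage()
calls = []


@storage.cacheable("squares", "v1")
def make_squares() -> List[int]:
    calls.append(1)  # Record each time the dataset is actually built.
    return [i * i for i in range(4)]


first = make_squares()
second = make_squares()
```

The second call is served from the cache, so the decorated function body runs only once. A real implementation would additionally need the cross-process coordination that the decorator is documented to provide.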
-