class yogadl.Stream(iterator_fn: Callable, length: int, output_types: Any = None, output_shapes: Any = None)

A Stream contains a data generator and the other information required to feed it into framework-specific data APIs.

__iter__() → Any

Iterate through the records in the stream.

__len__() → int

Return the length of the stream, which may differ from the length of the dataset.
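
As a rough illustration of the interface above, the following sketch builds a stand-in class with the same constructor arguments and special methods. SketchStream and make_records are hypothetical names used only for this example; this is not the yogadl implementation, and it assumes that iterator_fn returns a fresh iterator on every call.

```python
from typing import Any, Callable, Iterator


class SketchStream:
    """Hypothetical stand-in that mirrors the documented Stream signature."""

    def __init__(
        self,
        iterator_fn: Callable[[], Iterator[Any]],
        length: int,
        output_types: Any = None,
        output_shapes: Any = None,
    ) -> None:
        self.iterator_fn = iterator_fn
        self.length = length
        self.output_types = output_types
        self.output_shapes = output_shapes

    def __iter__(self) -> Iterator[Any]:
        # Assumption: each call to iterator_fn starts a new pass over the records.
        return self.iterator_fn()

    def __len__(self) -> int:
        # The length is supplied by the caller because a generator has no len().
        return self.length


def make_records() -> Iterator[int]:
    """Hypothetical record source: five integer records."""
    yield from range(3, 8)


stream = SketchStream(make_records, length=5)
stream_records = list(stream)
```

Passing the length explicitly is what lets a stream report a size that differs from the underlying dataset, for example when it covers only one shard.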

class yogadl.DataRef

The base interface for a reference to a dataset in the yogadl framework.

The DataRef may refer to a dataset in a remote storage location; it need not refer to locally available data. The only mechanism for accessing the records inside the dataset is to create a Stream and iterate through it.

Because all of the random-access options are specified up front, the backend that provides the DataRef can provide performance-optimized streaming: yogadl guarantees that lower layers will operate without random access.

abstract __len__() → int

Return the length of the dataset that the DataRef refers to.

abstract stream(start_offset: int = 0, shuffle: bool = False, skip_shuffle_at_epoch_end: bool = False, shuffle_seed: Optional[int] = None, shard_rank: int = 0, num_shards: int = 1, drop_shard_remainder: bool = False) → yogadl._core.Stream

Create a sequentially accessible set of records from the dataset, according to the random-access arguments given as parameters.
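
To make the idea of resolving random access up front more concrete, here is a minimal in-memory sketch. The function sketch_stream is hypothetical, and the semantics it gives each argument (shuffle first, then shard by striding, then skip start_offset records) are assumptions made for illustration; the source does not specify them. skip_shuffle_at_epoch_end is left out because its behavior is not described here.

```python
import random
from typing import Iterator, List, Optional, Tuple


def sketch_stream(
    records: List[int],
    start_offset: int = 0,
    shuffle: bool = False,
    shuffle_seed: Optional[int] = None,
    shard_rank: int = 0,
    num_shards: int = 1,
    drop_shard_remainder: bool = False,
) -> Tuple[Iterator[int], int]:
    """Resolve every random-access choice once, then return a sequential iterator."""
    order = list(range(len(records)))
    if shuffle:
        # A fixed seed gives every shard the same permutation (assumed behavior).
        random.Random(shuffle_seed).shuffle(order)
    if drop_shard_remainder:
        # Trim so that every shard receives the same number of records.
        order = order[: len(order) - len(order) % num_shards]
    # Assumed sharding scheme: shard k takes every num_shards-th index from k.
    order = order[shard_rank::num_shards]
    # Assumed offset semantics: skip records within this shard's sequence.
    order = order[start_offset:]
    return (records[i] for i in order), len(order)


records = list(range(10))
shard_iter, shard_len = sketch_stream(records, shard_rank=1, num_shards=3)
shard_records = list(shard_iter)
```

Once the index order has been computed, the consumer only ever walks it front to back, which is the property that lets a backend optimize for streaming.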

class yogadl.Storage

Storage is a cache for datasets.

Storage accepts datasets in various forms via submit(), and returns DataRef objects via fetch().

Conceptually, Storage acts as a DataRef factory. It stores datasets in an unspecified format, and returns objects which implement the DataRef interface.

Note that submit() and fetch() are not multiprocessing-safe by default. The @cacheable decorator should be safe to call simultaneously from many threads, processes, or machines.

abstract cacheable(dataset_id: str, dataset_version: str) → Callable

A decorator that calls submit and fetch and is responsible for coordinating amongst instances of Storage in different processes.

abstract fetch(dataset_id: str, dataset_version: str) → yogadl._core.DataRef

Fetch a dataset from storage and provide a DataRef for streaming it.

abstract submit(data:, dataset_id: str, dataset_version: str) → None

Store a dataset in the cache.
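
The relationship between submit(), fetch(), and cacheable() might be sketched as follows. SketchStorage and build_dataset are hypothetical, a plain list stands in for the DataRef that a real backend would return, and the cache-miss logic is an assumption. The sketch also performs no coordination between processes, which the description of cacheable() says a real implementation is responsible for.

```python
import functools
from typing import Any, Callable, Dict, Iterable, List, Tuple


class SketchStorage:
    """Hypothetical in-memory cache that mimics the Storage method names."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, str], List[Any]] = {}

    def submit(self, data: Iterable[Any], dataset_id: str, dataset_version: str) -> None:
        # Materialize the records under the (id, version) key.
        self._cache[(dataset_id, dataset_version)] = list(data)

    def fetch(self, dataset_id: str, dataset_version: str) -> List[Any]:
        # A real backend would return a DataRef; a list stands in for it here.
        return self._cache[(dataset_id, dataset_version)]

    def cacheable(self, dataset_id: str, dataset_version: str) -> Callable:
        def decorator(make_dataset: Callable[..., Iterable[Any]]) -> Callable:
            @functools.wraps(make_dataset)
            def wrapper(*args: Any, **kwargs: Any) -> List[Any]:
                # Assumed behavior: build and submit only on a cache miss.
                if (dataset_id, dataset_version) not in self._cache:
                    self.submit(make_dataset(*args, **kwargs), dataset_id, dataset_version)
                return self.fetch(dataset_id, dataset_version)

            return wrapper

        return decorator


storage = SketchStorage()
build_calls = 0


@storage.cacheable("squares", "v1")
def build_dataset() -> List[int]:
    """Hypothetical dataset builder that counts how often it runs."""
    global build_calls
    build_calls += 1
    return [n * n for n in range(3)]


first = build_dataset()
second = build_dataset()
```

In this sketch the second call is served from the cache, so the builder runs only once.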
