Working with datasets

$ om datasets -h
Usage:
  om datasets list [<pattern>] [--raw] [-E|--regexp] [options]
  om datasets put <path> <name> [--replace] [--csv=<param=value>]... [--format csv|image|binary] [options]
  om datasets get <name> <path> [--csv <param>=<value>]... [options]
  om datasets drop <name> [--force] [options]
  om datasets metadata <name> [options]

Storing a dataset

The CLI supports storing the following types of datasets:

  • CSV files

  • image files

  • binary files

For CSV files, the <path> can be a local path, an S3 URI, an Azure Blob Storage URI, an HTTP(S) source, a WebHDFS URI, or an SCP path. See the smart_open library for details.

To store a CSV file:

# -- python equivalent to om.datasets.read_csv('sample.csv', 'sample')
$ om datasets put sample.csv sample
Metadata(name=sample,bucket=omegaml,prefix=data/,kind=pandas.dfrows,created=2021-02-12 15:34:38.633000)
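
For reference, the following is a minimal Python sketch of the same operation, assuming a configured omega-ml runtime and pandas installed; om.datasets.read_csv, as noted in the comment above, is the direct equivalent that lets omega-ml read the file itself.

import omegaml as om
import pandas as pd

# read the csv into a dataframe and store it as the 'sample' dataset
df = pd.read_csv('sample.csv')
om.datasets.put(df, 'sample')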

To store images or binary files:

$ om datasets put lenna.png lenna
Metadata(name=lenna,bucket=omegaml,prefix=data/,kind=ndarray.bin,created=2021-02-20 09:19:58.788435)
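
A rough Python equivalent, assuming Pillow and numpy are installed and that storing a numpy array produces the ndarray.bin kind shown above:

import numpy as np
import omegaml as om
from PIL import Image

# load the image into a numpy array and store it as a binary ndarray dataset
img = np.asarray(Image.open('lenna.png'))
om.datasets.put(img, 'lenna')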

To store any other type of file as a plain file:

$ om datasets put lenna.zip lenna.zip
Metadata(name=lenna.zip,bucket=omegaml,prefix=data/,kind=python.file,created=2021-02-20 09:21:30.882209)
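
From Python, a comparable sketch is to pass a file-like object, on the assumption that omega-ml stores it as the python.file kind shown above:

import omegaml as om

# store the file contents as-is
with open('lenna.zip', 'rb') as fin:
    om.datasets.put(fin, 'lenna.zip')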

To work with remote files:

$ om datasets put --format csv https://www.openml.org/data/get_csv/61/dataset_61_iris.arff iris
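
An approximate Python equivalent reads the remote CSV with pandas before storing it (same URL as in the CLI example):

import omegaml as om
import pandas as pd

# pandas reads the csv directly from the remote URL
url = 'https://www.openml.org/data/get_csv/61/dataset_61_iris.arff'
df = pd.read_csv(url)
om.datasets.put(df, 'iris')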

Retrieving a dataset

The CLI supports retrieving the same types of datasets and storing the contents to a local or remote path:

  • CSV files

  • image files

  • binary files

As when storing, for CSV files the <path> can be a local path, an S3 URI, an Azure Blob Storage URI, an HTTP(S) source, a WebHDFS URI, or an SCP path. See the smart_open library for details.

To retrieve a dataset:

$ om datasets get iris iris.csv
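
An approximate Python equivalent retrieves the dataframe and writes it to a local file:

import omegaml as om

# get the dataset as a dataframe and save it as a local csv file
df = om.datasets.get('iris')
df.to_csv('iris.csv', index=False)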

To transfer a dataset to a remote location:

$ om datasets get iris s3://mybucket/iris.csv
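
A minimal Python sketch of the same transfer, assuming smart_open is installed and the cloud credentials are configured (the bucket name is a placeholder):

import omegaml as om
from smart_open import open as smart_open

# write the dataframe as csv directly to the remote location
df = om.datasets.get('iris')
with smart_open('s3://mybucket/iris.csv', 'w') as fout:
    df.to_csv(fout, index=False)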

Working with remote files

Remote files are supported by providing a valid URL to the remote location. The CLI uses the smart_open library to read from and write to remote locations.
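
To illustrate what the CLI does under the hood, the sketch below uses smart_open directly; the URIs are placeholders and credentials are resolved by the respective storage provider's SDK:

from smart_open import open as smart_open

# read from a remote http source (placeholder URL)
with smart_open('https://example.com/data/iris.csv', 'rb') as fin:
    data = fin.read()

# write the contents to a remote s3 location (placeholder bucket)
with smart_open('s3://mybucket/iris.csv', 'wb') as fout:
    fout.write(data)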