orca_sdk.datasource
Datasource
A handle to a datasource in the OrcaCloud.
A Datasource is a collection of data saved to the OrcaCloud that can be used to create a Memoryset.
It can be created from a Hugging Face Dataset, a PyTorch DataLoader or Dataset, a list of dictionaries, a dictionary of columns, a pandas DataFrame, a pyarrow Table, or a local file.
Attributes:
- id (str) – Unique identifier of the datasource
- name (str) – Unique name of the datasource
- description (str | None) – Optional description of the datasource
- length (int) – Number of rows in the datasource
- created_at (datetime) – When the datasource was created
- columns (dict[str, str]) – Dictionary mapping column names to their types
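The attributes above can be read directly from a handle. A minimal sketch: `summarize` is a hypothetical helper (not part of orca_sdk) that reads only `name`, `length`, and `columns`, so it is demonstrated on a stand-in object rather than a live OrcaCloud handle. The column type strings in the stand-in are made up for illustration.

```python
from types import SimpleNamespace


def summarize(datasource) -> str:
    """Hypothetical helper: one-line summary of a Datasource handle.

    Reads only the documented attributes `name`, `length`, and `columns`.
    """
    # `columns` maps each column name to its type
    cols = ", ".join(f"{col}: {typ}" for col, typ in datasource.columns.items())
    return f"{datasource.name} ({datasource.length} rows) [{cols}]"


# Stand-in exposing the same attribute names; the type strings are illustrative only
fake = SimpleNamespace(name="reviews", length=2, columns={"text": "string", "label": "int"})
print(summarize(fake))  # reviews (2 rows) [text: string, label: int]
```

With a real handle this would be called as `summarize(Datasource.open("reviews"))`, assuming a datasource named `reviews` exists.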
from_hf_dataset
classmethod
Create a new datasource from a Hugging Face Dataset.
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- dataset (Dataset) – The Hugging Face Dataset to create the datasource from
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
- description (str | None, default: None) – Optional description for the datasource
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
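A sketch of `from_hf_dataset`, assuming the Hugging Face `datasets` package is installed and OrcaCloud credentials are already configured. `upload_hf_dataset` is a hypothetical wrapper; its imports are deferred into the function body so that defining it does not require either package.

```python
def upload_hf_dataset(name: str, columns: dict):
    """Hypothetical wrapper: build an in-memory Hugging Face Dataset and upload it.

    Imports are deferred so that defining this function does not require
    `datasets` or `orca_sdk` to be installed.
    """
    from datasets import Dataset  # Hugging Face `datasets` package
    from orca_sdk.datasource import Datasource

    # Dataset.from_dict builds a Dataset from a dict of equal-length columns
    dataset = Dataset.from_dict(columns)
    # if_exists="open" returns the existing datasource instead of raising ValueError
    return Datasource.from_hf_dataset(name=name, dataset=dataset, if_exists="open")
```

For example, `upload_hf_dataset("sentiment_demo", {"text": ["great", "awful"], "label": [1, 0]})`.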
from_hf_dataset_dict
classmethod
Create datasources from a Hugging Face DatasetDict.
Parameters:
- name (str) – Name prefix for the new datasources; each datasource name is suffixed with its dataset name
- dataset_dict (DatasetDict) – The Hugging Face DatasetDict to create the datasources from
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
- description (dict[str, str | None] | str | None, default: None) – Optional description for the datasources, either a single string or a dictionary mapping dataset names to descriptions
Returns:
- dict[str, Datasource] – A dictionary of datasource handles, keyed by dataset name
Raises:
- ValueError – If a datasource already exists and if_exists is "error"
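`from_hf_dataset_dict` accepts either one description string or a dict keyed by dataset name. The sketch below builds the dict form. `split_descriptions` and `upload_splits` are hypothetical helpers, and the upload assumes `orca_sdk` is installed and configured.

```python
def split_descriptions(splits, base: str) -> dict:
    """Hypothetical helper: build the dict form of `description`,
    mapping each split (dataset) name to its own description."""
    return {split: f"{base} ({split} split)" for split in splits}


def upload_splits(prefix: str, dataset_dict, base_description: str):
    """Hypothetical wrapper: upload every split of a Hugging Face DatasetDict."""
    from orca_sdk.datasource import Datasource  # deferred import

    # DatasetDict is a dict subclass, so its keys are the split names
    descriptions = split_descriptions(dataset_dict.keys(), base_description)
    # Returns a dict of Datasource handles keyed by split name
    return Datasource.from_hf_dataset_dict(
        name=prefix, dataset_dict=dataset_dict, description=descriptions
    )
```

For example, `upload_splits("imdb", datasets.load_dataset("imdb"), "IMDB movie reviews")` would return a dict of handles keyed by split name, such as `"train"` and `"test"`.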
from_pytorch
classmethod
Create a new datasource from a PyTorch DataLoader or Dataset.
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- torch_data (DataLoader | Dataset) – The PyTorch DataLoader or Dataset to create the datasource from
- column_names (list[str] | None, default: None) – Names of the columns. Must be provided if the dataset or data loader returns unnamed tuples.
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
- description (str | None, default: None) – Optional description for the datasource
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
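When a PyTorch dataset yields unnamed tuples, `column_names` supplies the names. A minimal sketch: `name_tuples` is a hypothetical helper showing the mapping this presumably implies (the i-th name labels the i-th tuple element), and `upload_tuple_dataset` wraps `(text, label)` tuples in a `torch.utils.data.Dataset`. It assumes `torch` and `orca_sdk` are installed.

```python
def name_tuples(rows, column_names):
    """Hypothetical helper illustrating the assumed mapping: the i-th
    column name labels the i-th element of each tuple."""
    names = list(column_names)
    named = []
    for row in rows:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(row)}")
        named.append(dict(zip(names, row)))
    return named


def upload_tuple_dataset(name: str, samples):
    """Hypothetical wrapper: wrap (text, label) tuples in a torch Dataset and upload it."""
    from torch.utils.data import Dataset  # deferred imports
    from orca_sdk.datasource import Datasource

    class TupleDataset(Dataset):
        def __init__(self, items):
            self.items = list(items)

        def __len__(self):
            return len(self.items)

        def __getitem__(self, idx):
            return self.items[idx]  # an unnamed (text, label) tuple

    # column_names is required because each item is an unnamed tuple
    return Datasource.from_pytorch(
        name=name, torch_data=TupleDataset(samples), column_names=["text", "label"]
    )
```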
from_list
classmethod
Create a new datasource from a list of dictionaries.
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- data (list[dict]) – The list of dictionaries to create the datasource from
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
- description (str | None, default: None) – Optional description for the datasource
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
Examples:
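A minimal sketch, assuming `orca_sdk` is installed and configured. Each dictionary is one row and its keys are the column names. `check_rows` is a hypothetical pre-upload check; that every row should share the same keys is an assumption, not a documented requirement.

```python
rows = [
    {"text": "I love this product", "label": 1},
    {"text": "Terrible experience", "label": 0},
]


def check_rows(data: list) -> list:
    """Hypothetical pre-upload check: every dict should have the same keys.
    Returns the shared column names in first-row order."""
    if not data:
        raise ValueError("data must not be empty")
    columns = list(data[0])
    for i, row in enumerate(data):
        if set(row) != set(columns):
            raise ValueError(f"row {i} has keys {sorted(row)}, expected {sorted(columns)}")
    return columns


def upload_rows(name: str, data: list):
    """Hypothetical wrapper around Datasource.from_list."""
    from orca_sdk.datasource import Datasource  # deferred import

    check_rows(data)
    return Datasource.from_list(name=name, data=data, description="Toy sentiment data")
```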
from_dict
classmethod
Create a new datasource from a dictionary of columns.
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- data (dict) – The dictionary of columns to create the datasource from
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
- description (str | None, default: None) – Optional description for the datasource
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
Examples:
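A minimal sketch, assuming `orca_sdk` is installed and configured. Here each key is a column name and each value is that column's list of values, the transpose of the `from_list` layout. `columns_to_rows` is a hypothetical helper that makes the correspondence explicit and checks that all columns have the same length.

```python
columns = {
    "text": ["I love this product", "Terrible experience"],
    "label": [1, 0],
}


def columns_to_rows(data: dict) -> list:
    """Hypothetical helper: convert the column layout used by from_dict
    into the row layout used by from_list, checking column lengths."""
    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    names = list(data)
    # zip(*columns) walks the columns in lockstep, yielding one row at a time
    return [dict(zip(names, values)) for values in zip(*data.values())]


def upload_columns(name: str, data: dict):
    """Hypothetical wrapper around Datasource.from_dict."""
    from orca_sdk.datasource import Datasource  # deferred import

    return Datasource.from_dict(name=name, data=data, if_exists="open")
```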
from_pandas
classmethod
Create a new datasource from a pandas DataFrame.
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- dataframe (DataFrame) – The pandas DataFrame to create the datasource from
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
- description (str | None, default: None) – Optional description for the datasource
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
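A sketch of `from_pandas`, assuming `pandas` and `orca_sdk` are installed; `upload_dataframe` is a hypothetical wrapper with deferred imports. How the DataFrame index is treated is not documented here, so the example simply keeps the default index.

```python
def upload_dataframe(name: str, records: list, description=None):
    """Hypothetical wrapper: build a DataFrame from row dicts and upload it."""
    import pandas as pd  # deferred imports
    from orca_sdk.datasource import Datasource

    df = pd.DataFrame(records)  # one column per dict key, default RangeIndex
    return Datasource.from_pandas(name=name, dataframe=df, description=description)
```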
from_arrow
classmethod
Create a new datasource from a pyarrow Table.
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- pyarrow_table (Table) – The pyarrow Table to create the datasource from
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
- description (str | None, default: None) – Optional description for the datasource
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
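A sketch of `from_arrow`, assuming `pyarrow` and `orca_sdk` are installed; `upload_arrow` is a hypothetical wrapper with deferred imports. `pyarrow.table` builds a Table from a dict of column lists.

```python
def upload_arrow(name: str, columns: dict):
    """Hypothetical wrapper: build a pyarrow Table from column lists and upload it."""
    import pyarrow as pa  # deferred imports
    from orca_sdk.datasource import Datasource

    table = pa.table(columns)  # dict of column name -> list of values
    # if_exists="error" (the default) raises ValueError if the name is taken
    return Datasource.from_arrow(name=name, pyarrow_table=table, if_exists="error")
```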
from_disk
classmethod
Create a new datasource from a local file.
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- file_path (str | PathLike) – Path to the file on disk to create the datasource from. The file type is inferred from the file extension. The following file types are supported:
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
- description (str | None, default: None) – Optional description for the datasource
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
open
classmethod
Get a handle to an existing datasource in the OrcaCloud by name or id.
Parameters:
- name_or_id (str) – The name or unique identifier of the datasource
Returns:
- Datasource – A handle to the existing datasource in the OrcaCloud
Raises:
- LookupError – If the datasource does not exist
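Because `open` raises `LookupError` for a missing datasource, a lookup that tolerates absence can catch it. `open_or_none` is a hypothetical helper that takes the opener as an argument, so it can be exercised with a stand-in; in real use it would be passed `Datasource.open`. For get-or-create, the `if_exists="open"` option on the creation methods already covers this case.

```python
def open_or_none(opener, name_or_id: str):
    """Hypothetical helper: return opener(name_or_id), or None if it raises LookupError.

    Intended to be called as open_or_none(Datasource.open, "reviews").
    """
    try:
        return opener(name_or_id)
    except LookupError:
        return None
```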
exists
classmethod
all
classmethod
List handles to all datasources in the OrcaCloud.
Returns:
- list[Datasource] – A list of handles to all datasources in the OrcaCloud
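`all` returns every handle, so narrowing the list happens client-side. `with_prefix` is a hypothetical helper, shown on stand-in objects that have a `name` attribute; in real use it would be given `Datasource.all()`, for example to collect the datasources created by `from_hf_dataset_dict` under one name prefix.

```python
from types import SimpleNamespace


def with_prefix(handles, prefix: str) -> list:
    """Hypothetical helper: handles whose name starts with `prefix`, sorted by name."""
    return sorted((h for h in handles if h.name.startswith(prefix)), key=lambda h: h.name)


# Stand-ins with a `name` attribute; a real call would pass Datasource.all()
demo = [SimpleNamespace(name=n) for n in ["imdb_train", "news", "imdb_test"]]
print([h.name for h in with_prefix(demo, "imdb")])  # ['imdb_test', 'imdb_train']
```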
drop
classmethod
Delete a datasource from the OrcaCloud.
Parameters:
- name_or_id (str) – The name or id of the datasource to delete
- if_not_exists (DropMode, default: 'error') – What to do if the datasource does not exist. Defaults to "error"; the other option is "ignore", which does nothing.
Raises:
- LookupError – If the datasource does not exist and if_not_exists is "error"
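A sketch of cleanup with `if_not_exists="ignore"`, so names that were never created do not raise `LookupError`. `drop_all` is hypothetical; its optional `drop` argument exists only so it can be tested with a stand-in, and it defaults to `Datasource.drop` (which assumes `orca_sdk` is installed).

```python
def drop_all(names, drop=None) -> None:
    """Hypothetical helper: delete each named datasource, skipping missing ones."""
    if drop is None:
        # Deferred import: only needed when no stand-in is supplied
        from orca_sdk.datasource import Datasource

        drop = Datasource.drop
    for name in names:
        # "ignore" turns a missing datasource into a no-op instead of a LookupError
        drop(name, if_not_exists="ignore")
```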
download
Download the datasource to the specified directory in the specified format.
Parameters:
- output_dir (str | PathLike) – The local directory where the downloaded file will be saved
- file_type (Literal['hf_dataset', 'json', 'csv'], default: 'hf_dataset') – The format of the downloaded file
Returns:
- None
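A sketch of calling `download`. `export` is a hypothetical helper that checks `file_type` against the three documented values before calling the method, and creates the output directory first as a precaution, since whether `download` creates it is not stated. It works with any object that has a `download(output_dir, file_type=...)` method, such as a handle returned by `Datasource.open`.

```python
from pathlib import Path

# The file_type values documented for Datasource.download
FILE_TYPES = ("hf_dataset", "json", "csv")


def export(datasource, output_dir, file_type: str = "hf_dataset") -> Path:
    """Hypothetical helper: validate file_type, ensure the directory exists,
    then call datasource.download()."""
    if file_type not in FILE_TYPES:
        raise ValueError(f"file_type must be one of {FILE_TYPES}, got {file_type!r}")
    out = Path(output_dir)
    # Precaution: whether download() creates the directory is not documented
    out.mkdir(parents=True, exist_ok=True)
    datasource.download(out, file_type=file_type)  # documented to return None
    return out
```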