orca_sdk.datasource
Datasource
A handle to a datasource in the OrcaCloud.
A Datasource is a collection of data saved to the OrcaCloud that can be used to create a Memoryset.
It can be created from a Hugging Face Dataset, a PyTorch DataLoader or Dataset, a list of dictionaries, a dictionary of columns, a pandas DataFrame, a pyarrow Table, or a local file.
Attributes:
- id (str) – Unique identifier for the datasource
- name (str) – Unique name of the datasource
- length (int) – Number of rows in the datasource
- created_at (datetime) – When the datasource was created
- columns (dict[str, str]) – Dictionary of column names and types
from_hf_dataset classmethod
Create a new datasource from a Hugging Face Dataset
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- dataset (Dataset) – The Hugging Face Dataset to create the datasource from
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
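As a rough sketch of how this might be called, the snippet below builds a tiny Hugging Face Dataset and uploads it. It assumes the `datasets` package is installed, that OrcaCloud credentials are already configured, and that `Datasource` is importable from the `orca_sdk.datasource` module this page documents. The `text`/`label` columns and the helper name `upload_hf_dataset` are invented for illustration.

```python
# Illustrative data only; these column names are not prescribed by the SDK.
COLUMNS = {
    "text": ["I love this", "I hate this"],
    "label": [1, 0],
}


def upload_hf_dataset(name):
    """Build a small Hugging Face Dataset and upload it as a datasource.

    Requires the `datasets` and `orca_sdk` packages plus OrcaCloud access,
    so the imports are kept local to this function.
    """
    from datasets import Dataset as HFDataset
    from orca_sdk.datasource import Datasource

    hf_dataset = HFDataset.from_dict(COLUMNS)
    # if_exists="open" opens the existing datasource instead of raising ValueError.
    return Datasource.from_hf_dataset(name=name, dataset=hf_dataset, if_exists="open")
```

With `if_exists="open"`, calling the helper a second time with the same name should return a handle to the existing datasource rather than raise.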
from_pytorch classmethod
Create a new datasource from a PyTorch DataLoader or Dataset
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- torch_data (DataLoader | Dataset) – The PyTorch DataLoader or Dataset to create the datasource from
- column_names (list[str] | None, default: None) – Names of the columns. Required if the provided dataset or data loader returns unnamed tuples.
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
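To illustrate when `column_names` is needed: a dataset whose items are plain tuples carries no column names, so they have to be supplied separately. This is a minimal sketch, assuming PyTorch is installed and OrcaCloud credentials are configured. `PairDataset`, the sample pairs, the column names, and both helper functions are hypothetical.

```python
# Unnamed (text, label) tuples: nothing here says what each position means.
PAIRS = [("I love this", 1), ("I hate this", 0)]
# One name per tuple position, in order.
COLUMN_NAMES = ["text", "label"]


def build_pair_dataset(pairs):
    """Wrap the tuples in a map-style torch Dataset (requires PyTorch)."""
    from torch.utils.data import Dataset

    class PairDataset(Dataset):
        def __init__(self, items):
            self.items = items

        def __len__(self):
            return len(self.items)

        def __getitem__(self, index):
            return self.items[index]

    return PairDataset(pairs)


def upload_pairs(name):
    """Upload the tuple dataset, naming its columns explicitly."""
    from orca_sdk.datasource import Datasource

    return Datasource.from_pytorch(
        name=name,
        torch_data=build_pair_dataset(PAIRS),
        column_names=COLUMN_NAMES,
    )
```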
from_list classmethod
Create a new datasource from a list of dictionaries
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- data (list[dict]) – The list of dictionaries to create the datasource from
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
Examples:
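A minimal sketch of a call, assuming OrcaCloud credentials are configured and that each dictionary represents one row keyed by column name. The rows and the helper name `create_from_rows` are made up for illustration.

```python
# Each dictionary is one row; the keys are the column names.
ROWS = [
    {"text": "I love this", "label": 1},
    {"text": "I hate this", "label": 0},
]


def create_from_rows(name, rows):
    """Upload the rows as a new datasource (needs orca_sdk and OrcaCloud access)."""
    from orca_sdk.datasource import Datasource

    # The default if_exists="error" raises ValueError if the name is already taken.
    return Datasource.from_list(name=name, data=rows)
```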
from_dict classmethod
Create a new datasource from a dictionary of columns
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- data (dict) – The dictionary of columns to create the datasource from
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
Examples:
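A minimal sketch, assuming that a "dictionary of columns" maps each column name to a list of values of equal length (the page does not spell this out). The data and both helper names are hypothetical. `columns_to_rows` is included only to show how the column layout relates to the list-of-dictionaries layout.

```python
# Column-oriented layout: one list of values per column name.
COLUMNS = {
    "text": ["I love this", "I hate this"],
    "label": [1, 0],
}


def columns_to_rows(columns):
    """Convert a dictionary of columns into a list of row dictionaries."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def create_from_columns(name, columns):
    """Upload the columns as a new datasource (needs orca_sdk and OrcaCloud access)."""
    from orca_sdk.datasource import Datasource

    return Datasource.from_dict(name=name, data=columns)
```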
from_pandas classmethod
Create a new datasource from a pandas DataFrame
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- dataframe (DataFrame) – The pandas DataFrame to create the datasource from
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
from_arrow classmethod
Create a new datasource from a pyarrow Table
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- pyarrow_table (Table) – The pyarrow Table to create the datasource from
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
from_disk classmethod
Create a new datasource from a local file
Parameters:
- name (str) – Required name for the new datasource (must be unique)
- file_path (str | PathLike) – Path to the file on disk to create the datasource from. The file type will be inferred from the file extension. The following file types are supported:
- if_exists (CreateMode, default: 'error') – What to do if a datasource with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing datasource.
Returns:
- Datasource – A handle to the new datasource in the OrcaCloud
Raises:
- ValueError – If the datasource already exists and if_exists is "error"
open classmethod
Get a handle to a datasource by name or id in the OrcaCloud
Parameters:
- name (str) – The name or unique identifier of the datasource to get
Returns:
- Datasource – A handle to the existing datasource in the OrcaCloud
Raises:
- LookupError – If the datasource does not exist
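Since `open` raises `LookupError` for a missing datasource, callers that treat absence as a normal case may want to wrap it. The helper below is a hypothetical sketch, not part of the SDK. It takes the class as an argument so it can be exercised without OrcaCloud access; in real use you would presumably pass `orca_sdk.datasource.Datasource`.

```python
def open_or_none(datasource_cls, name):
    """Return a handle to the named datasource, or None if it does not exist.

    `datasource_cls` is expected to expose the `open` classmethod described above.
    """
    try:
        return datasource_cls.open(name)
    except LookupError:
        # The datasource does not exist; report that as None instead of raising.
        return None
```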
exists classmethod
all classmethod
List all datasource handles in the OrcaCloud
Returns:
- list[Datasource] – A list of all datasource handles in the OrcaCloud
drop classmethod
Delete a datasource from the OrcaCloud
Parameters:
- name_or_id (str) – The name or id of the datasource to delete
- if_not_exists (DropMode, default: 'error') – What to do if the datasource does not exist. Defaults to "error"; the other option is "ignore", which does nothing.
Raises:
- LookupError – If the datasource does not exist and if_not_exists is "error"
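One use for `if_not_exists="ignore"` is a setup step that can be rerun safely: drop any earlier copy, then create the datasource again. The `recreate` helper below is a hypothetical sketch. It takes the class as an argument so it can be tried against a stand-in; in real use you would presumably pass `orca_sdk.datasource.Datasource`.

```python
def recreate(datasource_cls, name, rows):
    """Drop the named datasource if it exists, then create it again from rows.

    `datasource_cls` is expected to expose the `drop` and `from_list`
    classmethods described on this page.
    """
    # "ignore" makes the drop a no-op when there is nothing to delete,
    # so the first run does not raise LookupError.
    datasource_cls.drop(name, if_not_exists="ignore")
    return datasource_cls.from_list(name=name, data=rows)
```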