orca_sdk.labeled_memoryset#
FilterOperation
module-attribute
#
Operations that can be used in a filter expression.
FilterValue
module-attribute
#
FilterValue = (
str
| int
| float
| bool
| datetime
| None
| list[str]
| list[int]
| list[float]
| list[bool]
)
Values that can be used in a filter expression.
FilterItemTuple
module-attribute
#
Filter expression consisting of a field, an operator, and a value:
field: The field to filter on.
operation: The operation to apply to the field and value.
value: The value to compare the field against.
Examples:
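A minimal sketch of how a (field, operation, value) tuple could be evaluated against a row. The operator strings used here ("==", "!=", ">", "<", "in") and the AND combination of multiple filters are assumptions made for illustration; this page does not list the actual FilterOperation values, and the helper names are hypothetical.

```python
from typing import Any, Callable

# Hypothetical operator table: these operator strings are assumptions,
# not the SDK's documented FilterOperation values.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "==": lambda field_value, value: field_value == value,
    "!=": lambda field_value, value: field_value != value,
    ">": lambda field_value, value: field_value > value,
    "<": lambda field_value, value: field_value < value,
    "in": lambda field_value, value: field_value in value,
}


def matches(row: dict[str, Any], filter_item: tuple[str, str, Any]) -> bool:
    """Return True if the row satisfies one (field, operation, value) filter."""
    field, operation, value = filter_item
    return OPERATORS[operation](row[field], value)


def apply_filters(rows: list[dict[str, Any]], filters: list[tuple[str, str, Any]]):
    """Keep the rows that satisfy every filter (assumed AND semantics)."""
    return [row for row in rows if all(matches(row, f) for f in filters)]


rows = [
    {"value": "great movie", "label": 1, "source_id": "a"},
    {"value": "terrible plot", "label": 0, "source_id": "b"},
    {"value": "loved it", "label": 1, "source_id": "c"},
]
positive = apply_filters(rows, [("label", "==", 1)])
selected = apply_filters(rows, [("label", "==", 1), ("source_id", "in", ["b", "c"])])
```

The list value in the second filter corresponds to the list[str] variant of FilterValue.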
LabeledMemory
#
A row of the LabeledMemoryset
Attributes:
-
value
(str
) –Value represented by the row
-
embedding
(list[float]
) –Embedding of the value of the memory for semantic search, automatically generated with the LabeledMemoryset.embedding_model
-
label
(int
) –Class label of the memory
-
label_name
(str | None
) –Human-readable name of the label, automatically populated from the LabeledMemoryset.label_names
-
source_id
(str | None
) –Optional unique identifier of the memory in a system of reference
-
metrics
(LabeledMemoryMetrics | None
) –Metrics about the memory, generated when running an analysis on the LabeledMemoryset
-
metadata
(dict[str, str | float | int | bool | None]
) –Metadata associated with the memory that is not used in the model. Metadata properties are also accessible as individual attributes on the instance.
-
memory_id
(str
) –Unique identifier for the memory, automatically generated on insert
-
memory_version
(int
) –Version of the memory, automatically updated when the label or value changes
-
created_at
(datetime
) –When the memory was created, automatically generated on insert
-
updated_at
(datetime
) –When the memory was last updated, automatically updated on update
Other Attributes:#
update
#
Update the memory with new values
Note
If a field is not provided, it will default to UNSET and not be updated.
Parameters:
-
value
(str
, default:UNSET
) –New value of the memory
-
label
(int
, default:UNSET
) –New label of the memory
-
source_id
(str | None
, default:UNSET
) –New source ID of the memory
-
**metadata
(None | bool | float | int | str
, default:{}
) –New values for metadata properties
Returns:
-
LabeledMemory
–The updated memory
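The UNSET default described in the note can be illustrated with a sentinel object: because None is a legitimate new value for source_id, a separate marker is needed to mean "leave this field alone". The sketch below is a local stand-in and not the SDK's implementation; the Memory dataclass, the UNSET object, and the choice to merge new metadata into existing metadata are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Any, Optional


class _Unset:
    """Sentinel type marking a parameter that was not provided."""

    def __repr__(self) -> str:
        return "UNSET"


UNSET = _Unset()  # hypothetical stand-in for the SDK's UNSET default


@dataclass(frozen=True)
class Memory:
    value: str
    label: int
    source_id: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)


def update(memory: Memory, value=UNSET, label=UNSET, source_id=UNSET, **metadata) -> Memory:
    """Return a copy of the memory with only the provided fields changed."""
    changes: dict[str, Any] = {}
    if value is not UNSET:
        changes["value"] = value
    if label is not UNSET:
        changes["label"] = label
    if source_id is not UNSET:
        # None is a real value here: it clears the source ID.
        changes["source_id"] = source_id
    if metadata:
        # Assumption: new metadata keys are merged into the existing ones.
        changes["metadata"] = {**memory.metadata, **metadata}
    return replace(memory, **changes)


original = Memory("hello", 0, source_id="x", metadata={"lang": "en"})
relabeled = update(original, label=1)          # value and source_id stay as they were
cleared = update(original, source_id=None)     # explicitly clears the source ID
annotated = update(original, reviewed=True)    # adds a metadata property
```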
LabeledMemoryLookup
#
Bases: LabeledMemory
Lookup result for a memory in a memoryset
Attributes:
-
lookup_score
(float
) –Similarity between the memory embedding and search query embedding
-
attention_weight
(float | None
) –Weight the model assigned to the memory during prediction if this lookup happened as part of a prediction
-
value
(str
) –Value represented by the row
-
embedding
(list[float]
) –Embedding of the value of the memory for semantic search, automatically generated with the LabeledMemoryset.embedding_model
-
label
(int
) –Class label of the memory
-
label_name
(str | None
) –Human-readable name of the label, automatically populated from the LabeledMemoryset.label_names
-
source_id
(str | None
) –Optional unique identifier of the memory in a system of reference
-
metrics
(LabeledMemoryMetrics | None
) –Metrics about the memory, generated when running an analysis on the LabeledMemoryset
-
metadata
(dict[str, str | float | int | bool | None]
) –Metadata associated with the memory that is not used in the model. Metadata properties are also accessible as individual attributes on the instance.
-
memory_id
(str
) –Unique identifier for the memory, automatically generated on insert
-
memory_version
(int
) –Version of the memory, automatically updated when the label or value changes
-
created_at
(datetime
) –When the memory was created, automatically generated on insert
-
updated_at
(datetime
) –When the memory was last updated, automatically updated on update
Other Attributes:#
update
#
Update the memory with new values
Note
If a field is not provided, it will default to UNSET and not be updated.
Parameters:
-
value
(str
, default:UNSET
) –New value of the memory
-
label
(int
, default:UNSET
) –New label of the memory
-
source_id
(str | None
, default:UNSET
) –New source ID of the memory
-
**metadata
(None | bool | float | int | str
, default:{}
) –New values for metadata properties
Returns:
-
LabeledMemory
–The updated memory
LabeledMemoryset
#
A handle to a collection of labeled memories in the OrcaCloud
Attributes:
-
id
(str
) –Unique identifier for the memoryset
-
name
(str
) –Unique name of the memoryset
-
label_names
(list[str]
) –Names for the class labels in the memoryset
-
length
(int
) –Number of memories in the memoryset
-
embedding_model
(_EmbeddingModel
) –Embedding model used to embed the memory values for semantic search
-
created_at
(datetime
) –When the memoryset was created, automatically generated on create
-
updated_at
(datetime
) –When the memoryset was last updated, automatically updated on updates
create
classmethod
#
create(
name,
datasource,
*,
embedding_model=None,
value_column="value",
label_column="label",
source_id_column=None,
label_names=None,
max_seq_length_override=None,
if_exists="error"
)
Create a new memoryset in the OrcaCloud
All columns from the datasource that are not specified in the value_column, label_column, or source_id_column will be stored as metadata in the memoryset.
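The column mapping described above can be pictured as splitting each datasource row into the reserved fields and a metadata remainder. This is a local sketch under that reading, not the SDK's implementation; split_row and the sample column names are hypothetical.

```python
from typing import Any, Optional


def split_row(
    row: dict[str, Any],
    value_column: str = "value",
    label_column: str = "label",
    source_id_column: Optional[str] = None,
) -> dict[str, Any]:
    """Split a datasource row into value, label, source_id, and metadata."""
    reserved = {value_column, label_column}
    if source_id_column is not None:
        reserved.add(source_id_column)
    return {
        "value": row[value_column],
        "label": row[label_column],
        "source_id": row[source_id_column] if source_id_column is not None else None,
        # Every column that is not reserved ends up as metadata.
        "metadata": {key: val for key, val in row.items() if key not in reserved},
    }


row = {"text": "great movie", "sentiment": 1, "review_id": "r-17", "language": "en"}
memory = split_row(
    row,
    value_column="text",
    label_column="sentiment",
    source_id_column="review_id",
)
```

Here only the language column is left over, so it is the only metadata property on the resulting memory.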
Parameters:
-
name
(str
) –Name for the new memoryset (must be unique)
-
datasource
(Datasource
) –Source data to populate the memories in the memoryset
-
embedding_model
(FinetunedEmbeddingModel | PretrainedEmbeddingModel | None
, default:None
) –Embedding model to use for embedding memory values for semantic search. If not provided, a default embedding model for the memoryset will be used.
-
value_column
(str
, default:'value'
) –Name of the column in the datasource that contains the memory values
-
label_column
(str
, default:'label'
) –Name of the column in the datasource that contains the memory labels. These must be contiguous integers starting from 0
-
source_id_column
(str | None
, default:None
) –Optional name of the column in the datasource that contains the ids in the system of reference
-
label_names
(list[str] | None
, default:None
) –List of human-readable names for the labels in the memoryset. Its length must match the number of labels in the label_column. It will be inferred automatically if a Dataset with a ClassLabel feature for labels is used as the datasource.
-
max_seq_length_override
(int | None
, default:None
) –Maximum sequence length of values in the memoryset. Values longer than this are truncated. Defaults to the model’s max sequence length if not provided
-
if_exists
(CreateMode
, default:'error'
) –What to do if a memoryset with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing memoryset.
Returns:
-
LabeledMemoryset
–Handle to the new memoryset in the OrcaCloud
Raises:
-
ValueError
–If the memoryset already exists and if_exists is "error", or if it is "open" and the parameters do not match those of the existing memoryset.
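The constraints on label_column and label_names can be checked locally before uploading data. The sketch below is a hypothetical pre-flight check, not part of the SDK, and it assumes that every class appears at least once in the data.

```python
from typing import Optional, Sequence


def validate_labels(labels: Sequence[int], label_names: Optional[Sequence[str]] = None) -> int:
    """Check that labels are contiguous integers starting from 0.

    Returns the number of distinct labels. Raises ValueError if the labels
    have gaps or do not start at 0, or if label_names has the wrong length.
    """
    distinct = sorted(set(labels))
    expected = list(range(len(distinct)))
    if distinct != expected:
        raise ValueError(f"labels must be contiguous from 0, got {distinct}")
    if label_names is not None and len(label_names) != len(distinct):
        raise ValueError(
            f"expected {len(distinct)} label names, got {len(label_names)}"
        )
    return len(distinct)


num_classes = validate_labels([0, 1, 1, 2], ["negative", "neutral", "positive"])
```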
from_hf_dataset
classmethod
#
Create a new memoryset from a Hugging Face Dataset in the OrcaCloud
This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.
All features that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.
Parameters:
-
name
(str
) –Name for the new memoryset (must be unique)
-
hf_dataset
(Dataset
) –Hugging Face dataset to create the memoryset from
-
**kwargs
–Additional parameters for creating the memoryset. See the create parameters for details.
Returns:
-
LabeledMemoryset
–Handle to the new memoryset in the OrcaCloud
from_pytorch
classmethod
#
Create a new memoryset from a PyTorch DataLoader or Dataset in the OrcaCloud
This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.
All properties that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.
Parameters:
-
name
(str
) –Name for the new memoryset (must be unique)
-
torch_data
(DataLoader | Dataset
) –PyTorch data loader or dataset to create the memoryset from
-
column_names
(list[str] | None
, default:None
) –If the provided dataset or data loader returns unnamed tuples, this argument must be provided to specify the names of the columns.
-
**kwargs
–Additional parameters for creating the memoryset. See the create parameters for details.
Returns:
-
LabeledMemoryset
–Handle to the new memoryset in the OrcaCloud
from_list
classmethod
#
Create a new memoryset from a list of dictionaries in the OrcaCloud
This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.
All properties that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.
Parameters:
-
name
(str
) –Name for the new memoryset (must be unique)
-
data
(list[dict]
) –List of dictionaries to create the memoryset from
-
**kwargs
–Additional parameters for creating the memoryset. See the create parameters for details.
Returns:
-
LabeledMemoryset
–Handle to the new memoryset in the OrcaCloud
Examples:
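A hedged sketch of a from_list call, using only the parameters documented on this page. The memoryset name and the sample rows are made up, and the call is wrapped in a function that is not invoked here, since running it assumes the SDK is installed and configured with access to the OrcaCloud.

```python
# Row-oriented input: one dictionary per memory.
data = [
    {"value": "I love this phone", "label": 1, "source_id": "review-1", "tag": "electronics"},
    {"value": "Battery died in a day", "label": 0, "source_id": "review-2", "tag": "electronics"},
]


def create_reviews_memoryset():
    """Create a memoryset from the rows above (requires SDK access to the OrcaCloud)."""
    # Imported lazily so the sketch can be loaded without the SDK installed.
    from orca_sdk.labeled_memoryset import LabeledMemoryset

    return LabeledMemoryset.from_list(
        "reviews_memoryset",  # hypothetical name, must be unique
        data,
        source_id_column="source_id",          # forwarded to create via **kwargs
        label_names=["negative", "positive"],  # one name per label, in label order
    )
```

With these arguments the tag key is neither the value, the label, nor the source ID, so it would be stored as metadata.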
from_dict
classmethod
#
Create a new memoryset from a dictionary of columns in the OrcaCloud
This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.
All columns from the datasource that are not specified in the value_column, label_column, or source_id_column will be stored as metadata in the memoryset.
Parameters:
-
name
(str
) –Name for the new memoryset (must be unique)
-
data
(dict
) –Dictionary of columns to create the memoryset from
-
**kwargs
–Additional parameters for creating the memoryset. See the create parameters for details.
Returns:
-
LabeledMemoryset
–Handle to the new memoryset in the OrcaCloud
Examples:
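To show how a dictionary of columns relates to the list of dictionaries accepted by from_list, the sketch below converts one shape into the other. It assumes that "dictionary of columns" means a mapping from column name to equal-length lists; columns_to_rows is a hypothetical helper, not an SDK function.

```python
from typing import Any


def columns_to_rows(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Convert column-oriented data into a list of row dictionaries."""
    lengths = {len(column) for column in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    names = list(columns)
    # zip(*columns.values()) walks the columns in parallel, one row at a time.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


columns = {
    "value": ["I love this phone", "Battery died in a day"],
    "label": [1, 0],
    "tag": ["electronics", "electronics"],
}
rows = columns_to_rows(columns)
```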
from_pandas
classmethod
#
Create a new memoryset from a pandas DataFrame in the OrcaCloud
This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.
All columns that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.
Parameters:
-
name
(str
) –Name for the new memoryset (must be unique)
-
dataframe
(DataFrame
) –Dataframe to create the memoryset from
-
**kwargs
–Additional parameters for creating the memoryset. See the create parameters for details.
Returns:
-
LabeledMemoryset
–Handle to the new memoryset in the OrcaCloud
from_arrow
classmethod
#
Create a new memoryset from a PyArrow Table in the OrcaCloud
This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.
All columns that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.
Parameters:
-
name
(str
) –Name for the new memoryset (must be unique)
-
pyarrow_table
(Table
) –PyArrow table to create the memoryset from
-
**kwargs
–Additional parameters for creating the memoryset. See the create parameters for details.
Returns:
-
LabeledMemoryset
–Handle to the new memoryset in the OrcaCloud
from_disk
classmethod
#
Create a new memoryset from a file on disk in the OrcaCloud
This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.
All columns from the datasource that are not specified in the value_column, label_column, or source_id_column will be stored as metadata in the memoryset.
Parameters:
-
name
(str
) –Name for the new memoryset (must be unique)
-
file_path
(str | PathLike
) –Path to the file on disk to create the memoryset from. The file type will be inferred from the file extension. The following file types are supported:
-
**kwargs
–Additional parameters for creating the memoryset. See the create parameters for details.
Returns:
-
LabeledMemoryset
–Handle to the new memoryset in the OrcaCloud
open
classmethod
#
Get a handle to a memoryset in the OrcaCloud
Parameters:
-
name
(str
) –Name or unique identifier of the memoryset
Returns:
-
LabeledMemoryset
–Handle to the existing memoryset in the OrcaCloud
Raises:
-
LookupError
–If the memoryset does not exist
exists
classmethod
#
all
classmethod
#
Get a list of handles to all memorysets in the OrcaCloud
Returns:
-
list[LabeledMemoryset]
–List of handles to all memorysets in the OrcaCloud
drop
classmethod
#
Delete a memoryset from the OrcaCloud
Parameters:
-
name_or_id
(str
) –Name or id of the memoryset
-
if_not_exists
(DropMode
, default:'error'
) –What to do if the memoryset does not exist. Defaults to "error"; the other option is "ignore", which does nothing if the memoryset does not exist.
Raises:
-
LookupError
–If the memoryset does not exist and if_not_exists is "error"
clone
#
Create a clone of the memoryset with a new name
Parameters:
-
name
(str
) –Name for the new memoryset (must be unique)
-
embedding_model
(PretrainedEmbeddingModel | FinetunedEmbeddingModel | None
, default:None
) –Optional new embedding model to use for re-embedding the memory values
-
max_seq_length_override
(int | None
, default:None
) –Maximum sequence length of values in the memoryset. Values longer than this are truncated. Defaults to the model’s max sequence length if not provided
-
if_exists
(CreateMode
, default:'error'
) –What to do if a memoryset with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing memoryset.
Returns:
-
LabeledMemoryset
–Handle to the cloned memoryset in the OrcaCloud
Examples:
refresh
#
Refresh the information about the memoryset from the OrcaCloud
Parameters:
-
throttle
(float
, default:0
) –Minimum time in seconds between refreshes
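The throttle parameter can be read as "skip the refresh if the last one happened less than this many seconds ago". A minimal sketch of that pattern follows; ThrottledRefresher is a hypothetical class and says nothing about how the SDK implements refresh internally.

```python
import time
from typing import Any, Callable, Optional


class ThrottledRefresher:
    """Calls fetch() on refresh, but at most once per throttle interval."""

    def __init__(self, fetch: Callable[[], Any], clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._last_refresh: Optional[float] = None
        self.state: Any = None

    def refresh(self, throttle: float = 0) -> bool:
        """Refresh the state; return False if the call was skipped."""
        now = self._clock()
        if self._last_refresh is not None and now - self._last_refresh < throttle:
            return False  # refreshed too recently, keep the cached state
        self.state = self._fetch()
        self._last_refresh = now
        return True


# Demonstration with a fake clock so that no real waiting is needed.
fake_time = [0.0]
fetch_count = [0]


def fake_fetch() -> int:
    fetch_count[0] += 1
    return fetch_count[0]


refresher = ThrottledRefresher(fake_fetch, clock=lambda: fake_time[0])
first = refresher.refresh(throttle=5)   # no earlier refresh, so it runs
fake_time[0] = 2.0
second = refresher.refresh(throttle=5)  # only 2 seconds later, skipped
fake_time[0] = 6.0
third = refresher.refresh(throttle=5)   # 6 seconds after the first, runs
```

With the default throttle of 0, every call refreshes.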
__getitem__
#
Get memories from the memoryset by index or memory id
Parameters:
-
index
(int | slice | str
) –Index of the memory to retrieve, slice of memories to retrieve, or unique identifier of the memory to retrieve
Returns:
-
LabeledMemory | list[LabeledMemory]
–Memory or memories from the memoryset
Raises:
-
LookupError
–If the id is not found or the index is out of bounds
Examples:
Retrieve the first memory in the memoryset:
Retrieve the last memory in the memoryset:
Retrieve a slice of memories in the memoryset:
Retrieve a memory by id:
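The four access patterns listed above depend on the type of the index: an int or slice addresses memories by position, while a str is treated as a memory id. The sketch below mimics that dispatch with an in-memory list; MemoryList and the sample ids are hypothetical and only illustrate the documented behavior, including the LookupError cases.

```python
from typing import Any, Union


class MemoryList:
    """Local stand-in that supports lookup by index, slice, or memory id."""

    def __init__(self, memories: list[dict[str, Any]]):
        self._memories = list(memories)
        self._by_id = {memory["memory_id"]: memory for memory in self._memories}

    def __getitem__(self, index: Union[int, slice, str]):
        if isinstance(index, str):
            if index not in self._by_id:
                raise LookupError(f"no memory with id {index!r}")
            return self._by_id[index]
        if isinstance(index, slice):
            return self._memories[index]  # slices never go out of bounds
        try:
            return self._memories[index]
        except IndexError:
            raise LookupError(f"index {index} is out of bounds") from None


memories = MemoryList(
    [
        {"memory_id": "id-a", "value": "first", "label": 0},
        {"memory_id": "id-b", "value": "second", "label": 1},
        {"memory_id": "id-c", "value": "third", "label": 0},
    ]
)
first_memory = memories[0]
last_memory = memories[-1]
first_two = memories[0:2]
by_id = memories["id-b"]
```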
search
#
Search for memories that are semantically similar to the query
Parameters:
-
query
(str | list[str]
) –Query to look up memories in the memoryset, can be a single query or a list
-
count
(int
, default:1
) –Number of memories to return for each query
Returns:
-
list[LabeledMemoryLookup] | list[list[LabeledMemoryLookup]]
–Memories from the memoryset that match the query. If a single query is provided, the return value is a list of memories. If a list of queries is provided, the return value is a list of lists of memories, one list per query.
Examples:
Search for similar memories:
Search for similar memories for multiple queries:
query
#
Query the memoryset for memories that match the filters
Parameters:
-
offset
(int
, default:0
) –The offset of the first memory to return
-
limit
(int
, default:100
) –The maximum number of memories to return
-
filters
(list[FilterItemTuple]
, default:[]
) –List of filters to apply to the query.
Returns:
-
list[LabeledMemory]
–List of memories from the memoryset that match the filters
Examples:
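Since limit caps the number of memories returned per call, reading a large result set means paging with offset. The sketch below shows one way to do that against any callable that accepts offset and limit keywords as documented above; iter_all and fake_query are hypothetical, and the fake query stands in for a real memoryset so the example runs without the OrcaCloud.

```python
from typing import Any, Callable, Iterator


def iter_all(query: Callable[..., list[Any]], page_size: int = 100) -> Iterator[Any]:
    """Yield every result by requesting consecutive pages until one comes back short."""
    offset = 0
    while True:
        page = query(offset=offset, limit=page_size)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size


# Stand-in for a memoryset's query method, backed by a plain list.
store = list(range(250))


def fake_query(offset: int = 0, limit: int = 100, filters=()) -> list[int]:
    return store[offset : offset + limit]


everything = list(iter_all(fake_query))
```

This assumes the result order is stable between calls; if memories are inserted or deleted while paging, items may be skipped or repeated.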
insert
#
Insert memories into the memoryset
Parameters:
-
memories
–List of memories to insert into the memoryset. This should be a list of dictionaries with the following keys:
value: Value of the memory
label: Label of the memory
source_id: Optional unique ID of the memory in a system of reference
...: Any other metadata to store for the memory
Examples:
get
#
Fetch a memory or memories from the memoryset
Parameters:
Returns:
-
LabeledMemory | list[LabeledMemory]
–Memory or list of memories from the memoryset
Raises:
-
LookupError
–If no memory with the given id is found
Examples:
Fetch a single memory:
Fetch multiple memories:
update
#
Update one or multiple memories in the memoryset
Parameters:
-
updates
(dict[str, Any] | Iterable[dict[str, Any]]
) –Update or list of updates to apply to the memories. Each update should be a dictionary with the following keys:
memory_id: Unique identifier of the memory to update (required)
value: Optional new value of the memory
label: Optional new label of the memory
source_id: Optional new source ID of the memory
...: Optional new values for metadata properties
Returns:
-
LabeledMemory | list[LabeledMemory]
–Updated memory or list of updated memories
Examples:
Update a single memory:
Update multiple memories:
delete
#
Delete memories from the memoryset
Parameters:
Examples:
Delete a single memory:
Delete multiple memories:
find_duplicates
#
Run an analysis to find duplicate memories in the memoryset
The results of the analysis will be stored in the LabeledMemory.metrics attribute of each memory in the memoryset.
Returns:
-
dict
–Summary of analysis with number of duplicate memories found
Examples:
analyze_labels
#
Run an analysis to assess whether the labels in the memoryset are consistent, in order to detect possibly mislabeled memories.
The results of the analysis will be stored in the LabeledMemory.metrics attribute of each memory in the memoryset.
Returns:
-
dict
–Summary of analysis with aggregate metrics for each label class
Examples:
display_label_analysis
#
Display a UI to review and act upon the label analysis results