orca_sdk.memoryset#

TelemetrySortItem `module-attribute` #

TelemetrySortItem = tuple[str, Literal['asc', 'desc']]

Sort expression for telemetry data consisting of a field and a direction.

field: The field to sort on.
direction: The direction to sort in.

Examples:

>>> ("feedback_metrics.accuracy.avg", "asc")
>>> ("lookup.count", "desc")

FilterOperation `module-attribute` #

FilterOperation = Literal[
    "==", "!=", ">", ">=", "<", "<=", "in", "not in", "like"
]

Operations that can be used in a filter expression.

FilterValue `module-attribute` #

FilterValue = (
    str
    | int
    | float
    | bool
    | datetime
    | None
    | list[str | None]
    | list[int]
    | list[float]
    | list[bool]
)

Values that can be used in a filter expression.

FilterItemTuple `module-attribute` #

FilterItemTuple = tuple[str, FilterOperation, FilterValue]

Filter expression consisting of a field, an operator, and a value:

field: The field to filter on.
operation: The operation to apply to the field and value.
value: The value to compare the field against.

Examples:

>>> ("label", "==", 0)
>>> ("metadata.author", "like", "John")
>>> ("source_id", "in", ["123", "456"])
>>> ("feedback_metrics.accuracy.avg", ">", 0.95)

MemoryBase #

Bases: ABC

feedback #

feedback()

Get feedback metrics computed from predictions that used this memory.

Returns a dictionary where: - Keys are feedback category names - Values are lists of feedback values (you may want to look at mean on the raw data)

to_dict #

to_dict()

Convert the memory to a dictionary

LabeledMemory #

Bases: MemoryBase

A row of the LabeledMemoryset

Attributes:

value –

Value represented by the row
embedding –

Embedding of the value of the memory for semantic search, automatically generated with the LabeledMemoryset.embedding_model
label (int | None) –

Class label of the memory
label_name (str | None) –

Human-readable name of the label, automatically populated from the LabeledMemoryset.label_names
source_id (str | None) –

Optional unique identifier of the memory in a system of reference
partition_id (str | None) –

Optional identifier of the partition the memory belongs to
metrics (str | None) –

Metrics about the memory, generated when running an analysis on the LabeledMemoryset
metadata (str | None) –

Metadata associated with the memory that is not used in the model. Metadata properties are also accessible as individual attributes on the instance.
memory_id (str | None) –

Unique identifier for the memory, automatically generated on insert
memory_version (str | None) –

Version of the memory, automatically updated when the label or value changes
created_at (str | None) –

When the memory was created, automatically generated on insert
updated_at (str | None) –

When the memory was last updated, automatically updated on update

Other Attributes:#

... (str | float | int | bool | None): All metadata properties can be accessed as attributes

feedback #

feedback()

Get feedback metrics computed from predictions that used this memory.

Returns a dictionary where: - Keys are feedback category names - Values are lists of feedback values (you may want to look at mean on the raw data)

update #

update(
    *,
    value=UNSET,
    label=UNSET,
    source_id=UNSET,
    partition_id=UNSET,
    **metadata
)

Update the memory with new values

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

value (str, default: UNSET ) –

New value of the memory
label (int | None, default: UNSET ) –

New label of the memory
source_id (str | None, default: UNSET ) –

New source ID of the memory
partition_id (str | None, default: UNSET ) –

New partition ID of the memory
**metadata (None | bool | float | int | str, default: {} ) –

New values for metadata properties

Returns:

LabeledMemory –

The updated memory

predictions #

predictions(
    limit=100,
    offset=0,
    tag=None,
    sort=[],
    expected_label_match=None,
)

Get classification predictions that used this memory.

Parameters:

limit (int, default: 100 ) –

Maximum number of predictions to return (default: 100)
offset (int, default: 0 ) –

Number of predictions to skip for pagination (default: 0)
tag (str | None, default: None ) –

Optional tag filter to only include predictions with this tag
sort (list[tuple[Literal['anomaly_score', 'confidence', 'timestamp'], Literal['asc', 'desc']]], default: [] ) –

List of (field, direction) tuples for sorting results. Valid fields: “anomaly_score”, “confidence”, “timestamp”. Valid directions: “asc”, “desc”
expected_label_match (bool | None, default: None ) –

Filter by prediction correctness: - True: only return correct predictions (label == expected_label) - False: only return incorrect predictions (label != expected_label) - None: return all predictions (default)

Returns:

list[ClassificationPrediction] –

List of ClassificationPrediction objects that used this memory

to_dict #

to_dict()

Convert the memory to a dictionary

LabeledMemoryLookup #

Bases: LabeledMemory

Lookup result for a memory in a memoryset

Attributes:

lookup_score (float) –

Similarity between the memory embedding and search query embedding
attention_weight (float | None) –

Weight the model assigned to the memory during prediction if this lookup happened as part of a prediction
value (float | None) –

Value represented by the row
embedding (float | None) –

Embedding of the value of the memory for semantic search, automatically generated with the LabeledMemoryset.embedding_model
label (float | None) –

Class label of the memory
label_name (float | None) –

Human-readable name of the label, automatically populated from the LabeledMemoryset.label_names
source_id (float | None) –

Optional unique identifier of the memory in a system of reference
partition_id (float | None) –

Optional identifier of the partition the memory belongs to
metrics (float | None) –

Metrics about the memory, generated when running an analysis on the LabeledMemoryset
metadata (float | None) –

Metadata associated with the memory that is not used in the model. Metadata properties are also accessible as individual attributes on the instance.
memory_id (float | None) –

The unique identifier for the memory, automatically generated on insert
memory_version (float | None) –

The version of the memory, automatically updated when the label or value changes
created_at (float | None) –

When the memory was created, automatically generated on insert
updated_at (float | None) –

When the memory was last updated, automatically updated on update

Other Attributes:#

... (str | float | int | bool | None): All metadata properties can be accessed as attributes

feedback #

feedback()

Get feedback metrics computed from predictions that used this memory.

Returns a dictionary where: - Keys are feedback category names - Values are lists of feedback values (you may want to look at mean on the raw data)

to_dict #

to_dict()

Convert the memory to a dictionary

update #

update(
    *,
    value=UNSET,
    label=UNSET,
    source_id=UNSET,
    partition_id=UNSET,
    **metadata
)

Update the memory with new values

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

value (str, default: UNSET ) –

New value of the memory
label (int | None, default: UNSET ) –

New label of the memory
source_id (str | None, default: UNSET ) –

New source ID of the memory
partition_id (str | None, default: UNSET ) –

New partition ID of the memory
**metadata (None | bool | float | int | str, default: {} ) –

New values for metadata properties

Returns:

LabeledMemory –

The updated memory

predictions #

predictions(
    limit=100,
    offset=0,
    tag=None,
    sort=[],
    expected_label_match=None,
)

Get classification predictions that used this memory.

Parameters:

limit (int, default: 100 ) –

Maximum number of predictions to return (default: 100)
offset (int, default: 0 ) –

Number of predictions to skip for pagination (default: 0)
tag (str | None, default: None ) –

Optional tag filter to only include predictions with this tag
sort (list[tuple[Literal['anomaly_score', 'confidence', 'timestamp'], Literal['asc', 'desc']]], default: [] ) –

List of (field, direction) tuples for sorting results. Valid fields: “anomaly_score”, “confidence”, “timestamp”. Valid directions: “asc”, “desc”
expected_label_match (bool | None, default: None ) –

Filter by prediction correctness: - True: only return correct predictions (label == expected_label) - False: only return incorrect predictions (label != expected_label) - None: return all predictions (default)

Returns:

list[ClassificationPrediction] –

List of ClassificationPrediction objects that used this memory

ScoredMemory #

Bases: MemoryBase

A row of the ScoredMemoryset

Attributes:

value –

Value represented by the row
embedding –

Embedding of the value of the memory for semantic search, automatically generated with the ScoredMemoryset.embedding_model
score (float | None) –

Score of the memory
source_id (float | None) –

Optional unique identifier of the memory in a system of reference
partition_id (float | None) –

Optional identifier of the partition the memory belongs to
metrics (float | None) –

Metrics about the memory, generated when running an analysis on the ScoredMemoryset
metadata (float | None) –

Metadata associated with the memory that is not used in the model. Metadata properties are also accessible as individual attributes on the instance.
memory_id (float | None) –

Unique identifier for the memory, automatically generated on insert
memory_version (float | None) –

Version of the memory, automatically updated when the score or value changes
created_at (float | None) –

When the memory was created, automatically generated on insert
updated_at (float | None) –

When the memory was last updated, automatically updated on update

Other Attributes:#

... (str | float | int | bool | None): All metadata properties can be accessed as attributes

feedback #

feedback()

Get feedback metrics computed from predictions that used this memory.

Returns a dictionary where: - Keys are feedback category names - Values are lists of feedback values (you may want to look at mean on the raw data)

update #

update(
    *,
    value=UNSET,
    score=UNSET,
    source_id=UNSET,
    partition_id=UNSET,
    **metadata
)

Update the memory with new values

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

value (str, default: UNSET ) –

New value of the memory
score (float | None, default: UNSET ) –

New score of the memory
source_id (str | None, default: UNSET ) –

New source ID of the memory
**metadata (None | bool | float | int | str, default: {} ) –

New values for metadata properties

Returns:

ScoredMemory –

The updated memory

predictions #

predictions(
    limit=100,
    offset=0,
    tag=None,
    sort=[],
    expected_label_match=None,
)

Get regression predictions that used this memory.

Parameters:

limit (int, default: 100 ) –

Maximum number of predictions to return (default: 100)
offset (int, default: 0 ) –

Number of predictions to skip for pagination (default: 0)
tag (str | None, default: None ) –

Optional tag filter to only include predictions with this tag
sort (list[tuple[Literal['anomaly_score', 'confidence', 'timestamp'], Literal['asc', 'desc']]], default: [] ) –

List of (field, direction) tuples for sorting results. Valid fields: “anomaly_score”, “confidence”, “timestamp”. Valid directions: “asc”, “desc”
expected_label_match (bool | None, default: None ) –

Filter by prediction correctness: - True: only return correct predictions (score close to expected_score) - False: only return incorrect predictions (score differs from expected_score) - None: return all predictions (default) Note: For regression, “correctness” is based on score proximity to expected_score.

Returns:

list[RegressionPrediction] –

List of RegressionPrediction objects that used this memory

to_dict #

to_dict()

Convert the memory to a dictionary

ScoredMemoryLookup #

Bases: ScoredMemory

Lookup result for a memory in a memoryset

Attributes:

lookup_score (float) –

Similarity between the memory embedding and search query embedding
attention_weight (float | None) –

Weight the model assigned to the memory during prediction if this lookup happened as part of a prediction
value (float | None) –

Value represented by the row
embedding (float | None) –

Embedding of the value of the memory for semantic search, automatically generated with the ScoredMemoryset.embedding_model
score (float | None) –

Score of the memory
source_id (float | None) –

Optional unique identifier of the memory in a system of reference
partition_id (float | None) –

Optional identifier of the partition the memory belongs to
metrics (float | None) –

Metrics about the memory, generated when running an analysis on the ScoredMemoryset
memory_id (float | None) –

The unique identifier for the memory, automatically generated on insert
memory_version (float | None) –

The version of the memory, automatically updated when the score or value changes
created_at (float | None) –

When the memory was created, automatically generated on insert
updated_at (float | None) –

When the memory was last updated, automatically updated on update

Other Attributes:#

... (str | float | int | bool | None): All metadata properties can be accessed as attributes

feedback #

feedback()

Get feedback metrics computed from predictions that used this memory.

Returns a dictionary where: - Keys are feedback category names - Values are lists of feedback values (you may want to look at mean on the raw data)

to_dict #

to_dict()

Convert the memory to a dictionary

update #

update(
    *,
    value=UNSET,
    score=UNSET,
    source_id=UNSET,
    partition_id=UNSET,
    **metadata
)

Update the memory with new values

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

value (str, default: UNSET ) –

New value of the memory
score (float | None, default: UNSET ) –

New score of the memory
source_id (str | None, default: UNSET ) –

New source ID of the memory
**metadata (None | bool | float | int | str, default: {} ) –

New values for metadata properties

Returns:

ScoredMemory –

The updated memory

predictions #

predictions(
    limit=100,
    offset=0,
    tag=None,
    sort=[],
    expected_label_match=None,
)

Get regression predictions that used this memory.

Parameters:

limit (int, default: 100 ) –

Maximum number of predictions to return (default: 100)
offset (int, default: 0 ) –

Number of predictions to skip for pagination (default: 0)
tag (str | None, default: None ) –

Optional tag filter to only include predictions with this tag
sort (list[tuple[Literal['anomaly_score', 'confidence', 'timestamp'], Literal['asc', 'desc']]], default: [] ) –

List of (field, direction) tuples for sorting results. Valid fields: “anomaly_score”, “confidence”, “timestamp”. Valid directions: “asc”, “desc”
expected_label_match (bool | None, default: None ) –

Filter by prediction correctness: - True: only return correct predictions (score close to expected_score) - False: only return incorrect predictions (score differs from expected_score) - None: return all predictions (default) Note: For regression, “correctness” is based on score proximity to expected_score.

Returns:

list[RegressionPrediction] –

List of RegressionPrediction objects that used this memory

MemorysetBase #

Bases: Generic[MemoryT, MemoryLookupT], ABC

A Handle to a collection of memories with labels in the OrcaCloud

Attributes:

id (str) –

Unique identifier for the memoryset
name (str) –

Unique name of the memoryset
description (str | None) –

Description of the memoryset
length (int) –

Number of memories in the memoryset
embedding_model (EmbeddingModelBase) –

Embedding model used to embed the memory values for semantic search
created_at (datetime) –

When the memoryset was created, automatically generated on create
updated_at (datetime) –

When the memoryset was last updated, automatically updated on updates

create `classmethod` #

create(
    name: str,
    *,
    datasource: None = None,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    description: str | None = None,
    label_names: list[str] | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    hidden: bool = False,
    memory_type: MemoryType | None = None
) -> Self

create(
    name: str,
    *,
    datasource: Datasource,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    value_column: str = "value",
    label_column: str | None = None,
    score_column: str | None = None,
    source_id_column: str | None = None,
    partition_id_column: str | None = None,
    description: str | None = None,
    label_names: list[str] | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    remove_duplicates: bool = True,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    background: Literal[True],
    hidden: bool = False,
    subsample: int | float | None = None,
    memory_type: MemoryType | None = None
) -> Job[Self]

create(
    name: str,
    *,
    datasource: Datasource,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    value_column: str = "value",
    label_column: str | None = None,
    score_column: str | None = None,
    source_id_column: str | None = None,
    partition_id_column: str | None = None,
    description: str | None = None,
    label_names: list[str] | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    remove_duplicates: bool = True,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    background: Literal[False] = False,
    hidden: bool = False,
    subsample: int | float | None = None,
    memory_type: MemoryType | None = None
) -> Self

create(
    name,
    *,
    datasource=None,
    embedding_model=None,
    value_column="value",
    label_column=None,
    score_column=None,
    source_id_column=None,
    partition_id_column=None,
    description=None,
    label_names=None,
    max_seq_length_override=None,
    prompt=None,
    remove_duplicates=True,
    index_type="FLAT",
    index_params={},
    if_exists="error",
    background=False,
    hidden=False,
    subsample=None,
    memory_type=None
)

Create a new memoryset in the OrcaCloud

If datasource is provided, all columns from the datasource that are not specified in the value_column, label_column, source_id_column, or partition_id_column will be stored as metadata in the memoryset.

If datasource is omitted (None), an empty memoryset will be created with no initial memories. You can add memories later using the insert method.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
datasource (Datasource | None, default: None ) –

Optional source data to populate the memories in the memoryset. If omitted, an empty memoryset will be created.
embedding_model (FinetunedEmbeddingModel | PretrainedEmbeddingModel | None, default: None ) –

Embedding model to use for embedding memory values for semantic search. If not provided, a default embedding model for the memoryset will be used.
value_column (str, default: 'value' ) –

Name of the column in the datasource that contains the memory values
label_column (str | None, default: None ) –

Name of the column in the datasource that contains the memory labels. Must contain categorical values as integers or strings. String labels will be converted to integers with the unique strings extracted as label_names
score_column (str | None, default: None ) –

Name of the column in the datasource that contains the memory scores
source_id_column (str | None, default: None ) –

Optional name of the column in the datasource that contains the ids in the system of reference
partition_id_column (str | None, default: None ) –

Optional name of the column in the datasource that contains the partition ids
description (str | None, default: None ) –

Optional description for the memoryset, this will be used in agentic flows, so make sure it is concise and describes the contents of your memoryset not the datasource or the embedding model.
label_names (list[str] | None, default: None ) –

List of human-readable names for the labels in the memoryset, must match the number of labels in the label_column. Will be automatically inferred if string labels are provided or if a Dataset with a ClassLabel feature for labels is used as the datasource
max_seq_length_override (int | None, default: None ) –

Maximum sequence length of values in the memoryset, if the value is longer than this it will be truncated, will default to the model’s max sequence length if not provided
prompt (str | None, default: None ) –

Optional prompt to use when embedding documents/memories for storage
remove_duplicates (bool, default: True ) –

Whether to remove duplicates from the datasource before inserting into the memoryset
index_type (IndexType, default: 'FLAT' ) –

Type of vector index to use for the memoryset, defaults to "FLAT". Valid values are "FLAT", "IVF_FLAT", "IVF_SQ8", "IVF_PQ", "HNSW", and "DISKANN".
index_params (dict[str, Any], default: {} ) –

Parameters for the vector index, defaults to {}
if_exists (CreateMode, default: 'error' ) –

What to do if a memoryset with the same name already exists, defaults to "error". Other option is "open" to open the existing memoryset.
background (bool, default: False ) –

Whether to run the operation none blocking and return a job handle. Note: This parameter is ignored when creating an empty memoryset (when datasource is None).
hidden (bool, default: False ) –

Whether the memoryset should be hidden
subsample (int | float | None, default: None ) –

Optional number (int) of rows to insert or fraction (float in (0, 1]) of the datasource to insert. Use to limit the size of the initial memoryset.
memory_type (MemoryType | None, default: None ) –

Type of memoryset to create, defaults to "LABELED" if label_column is provided, and "SCORED" if score_column is provided, must be specified for other cases.

Returns: Handle to the new memoryset in the OrcaCloud

Raises:

ValueError –

If the memoryset already exists and if_exists is "error" or if it is "open" and the params do not match those of the existing memoryset.

from_datasource `classmethod` #

from_datasource(
    name: str,
    *,
    datasource: Datasource,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    value_column: str = "value",
    label_column: str | None = None,
    score_column: str | None = None,
    source_id_column: str | None = None,
    partition_id_column: str | None = None,
    description: str | None = None,
    label_names: list[str] | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    remove_duplicates: bool = True,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    background: Literal[True],
    hidden: bool = False,
    subsample: int | float | None = None,
    memory_type: MemoryType | None = None
) -> Job[Self]

from_datasource(
    name: str,
    *,
    datasource: Datasource,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    value_column: str = "value",
    label_column: str | None = None,
    score_column: str | None = None,
    source_id_column: str | None = None,
    partition_id_column: str | None = None,
    description: str | None = None,
    label_names: list[str] | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    remove_duplicates: bool = True,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    background: Literal[False] = False,
    hidden: bool = False,
    subsample: int | float | None = None,
    memory_type: MemoryType | None = None
) -> Self

from_datasource(
    name,
    *,
    datasource,
    embedding_model=None,
    value_column="value",
    label_column=None,
    score_column=None,
    source_id_column=None,
    partition_id_column=None,
    description=None,
    label_names=None,
    max_seq_length_override=None,
    prompt=None,
    remove_duplicates=True,
    index_type="FLAT",
    index_params={},
    if_exists="error",
    background=False,
    hidden=False,
    subsample=None,
    memory_type=None
)

Create a new memoryset in the OrcaCloud from a datasource.

This is a convenience method that is equivalent to calling create with a datasource. All columns from the datasource that are not specified in the value_column, label_column, source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
datasource (Datasource) –

Source data to populate the memories in the memoryset.
embedding_model (FinetunedEmbeddingModel | PretrainedEmbeddingModel | None, default: None ) –

Embedding model to use for embedding memory values for semantic search. If not provided, a default embedding model for the memoryset will be used.
value_column (str, default: 'value' ) –

Name of the column in the datasource that contains the memory values
label_column (str | None, default: None ) –

Name of the column in the datasource that contains the memory labels. Must contain categorical values as integers or strings. String labels will be converted to integers with the unique strings extracted as label_names
score_column (str | None, default: None ) –

Name of the column in the datasource that contains the memory scores
source_id_column (str | None, default: None ) –

Optional name of the column in the datasource that contains the ids in the system of reference
partition_id_column (str | None, default: None ) –

Optional name of the column in the datasource that contains the partition ids
description (str | None, default: None ) –

Optional description for the memoryset, this will be used in agentic flows, so make sure it is concise and describes the contents of your memoryset not the datasource or the embedding model.
label_names (list[str] | None, default: None ) –

List of human-readable names for the labels in the memoryset, must match the number of labels in the label_column. Will be automatically inferred if string labels are provided or if a Dataset with a ClassLabel feature for labels is used as the datasource
max_seq_length_override (int | None, default: None ) –

Maximum sequence length of values in the memoryset, if the value is longer than this it will be truncated, will default to the model’s max sequence length if not provided
prompt (str | None, default: None ) –

Optional prompt to use when embedding documents/memories for storage
remove_duplicates (bool, default: True ) –

Whether to remove duplicates from the datasource before inserting into the memoryset
index_type (IndexType, default: 'FLAT' ) –

Type of vector index to use for the memoryset, defaults to "FLAT". Valid values are "FLAT", "IVF_FLAT", "IVF_SQ8", "IVF_PQ", "HNSW", and "DISKANN".
index_params (dict[str, Any], default: {} ) –

Parameters for the vector index, defaults to {}
if_exists (CreateMode, default: 'error' ) –

What to do if a memoryset with the same name already exists, defaults to "error". Other option is "open" to open the existing memoryset.
background (bool, default: False ) –

Whether to run the operation none blocking and return a job handle.
hidden (bool, default: False ) –

Whether the memoryset should be hidden
subsample (int | float | None, default: None ) –

Optional number (int) of rows to insert or fraction (float in (0, 1]) of the datasource to insert. Use to limit the size of the initial memoryset.
memory_type (MemoryType | None, default: None ) –

Type of memoryset to create, defaults to "LABELED" if label_column is provided, and "SCORED" if score_column is provided, must be specified for other cases.

Returns: Handle to the new memoryset in the OrcaCloud

Raises:

ValueError –

If the memoryset already exists and if_exists is "error" or if it is "open" and the params do not match those of the existing memoryset.

from_hf_dataset `classmethod` #

from_hf_dataset(
    name: str,
    hf_dataset: Dataset,
    background: Literal[True],
    **kwargs: Any
) -> Self

from_hf_dataset(
    name: str,
    hf_dataset: Dataset,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_hf_dataset(
    name, hf_dataset, background=False, **kwargs
)

Create a new memoryset from a Hugging Face Dataset in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All features that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
hf_dataset (Dataset) –

Hugging Face dataset to create the memoryset from
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

from_pytorch `classmethod` #

from_pytorch(
    name: str,
    torch_data: DataLoader | Dataset,
    *,
    column_names: list[str] | None = None,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_pytorch(
    name: str,
    torch_data: DataLoader | Dataset,
    *,
    column_names: list[str] | None = None,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_pytorch(
    name,
    torch_data,
    *,
    column_names=None,
    background=False,
    **kwargs
)

Create a new memoryset from a PyTorch DataLoader or Dataset in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All properties that are not specified to be used as value_column, label_column, or source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
torch_data (DataLoader | Dataset) –

PyTorch data loader or dataset to create the memoryset from
column_names (list[str] | None, default: None ) –

If the provided dataset or data loader returns unnamed tuples, this argument must be provided to specify the names of the columns.
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

from_list `classmethod` #

from_list(
    name: str,
    data: list[dict],
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_list(
    name: str,
    data: list[dict],
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_list(name, data, *, background=False, **kwargs)

Create a new memoryset from a list of dictionaries in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All properties that are not specified to be used as value_column, label_column, or source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
data (list[dict]) –

List of dictionaries to create the memoryset from
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

Examples:

>>> LabeledMemoryset.from_list("my_memoryset", [
...     {"value": "hello", "label": 0, "tag": "tag1"},
...     {"value": "world", "label": 1, "tag": "tag2"},
... ])

from_dict `classmethod` #

from_dict(
    name: str,
    data: dict,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_dict(
    name: str,
    data: dict,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_dict(name, data, *, background=False, **kwargs)

Create a new memoryset from a dictionary of columns in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns from the datasource that are not specified in the value_column, label_column, source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
data (dict) –

Dictionary of columns to create the memoryset from
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

Examples:

>>> LabeledMemoryset.from_dict("my_memoryset", {
...     "value": ["hello", "world"],
...     "label": [0, 1],
...     "tag": ["tag1", "tag2"],
... })

from_pandas `classmethod` #

from_pandas(
    name: str,
    dataframe: DataFrame,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_pandas(
    name: str,
    dataframe: DataFrame,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_pandas(name, dataframe, *, background=False, **kwargs)

Create a new memoryset from a pandas DataFrame in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns that are not specified to be used as value_column, label_column, or source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
dataframe (DataFrame) –

Dataframe to create the memoryset from
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

from_arrow `classmethod` #

from_arrow(
    name: str,
    pyarrow_table: Table,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_arrow(
    name: str,
    pyarrow_table: Table,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_arrow(
    name, pyarrow_table, *, background=False, **kwargs
)

Create a new memoryset from a PyArrow Table in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns that are not specified to be used as value_column, label_column, or source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
pyarrow_table (Table) –

PyArrow table to create the memoryset from
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

from_disk `classmethod` #

from_disk(
    name: str,
    file_path: str | PathLike,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_disk(
    name: str,
    file_path: str | PathLike,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_disk(name, file_path, *, background=False, **kwargs)

Create a new memoryset from a file on disk in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns from the datasource that are not specified in the value_column, label_column, source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
file_path (str | PathLike) –
Path to the file on disk to create the memoryset from. The file type will be inferred from the file extension. The following file types are supported:
- .pkl: Pickle files containing lists of dictionaries or dictionaries of columns
- .json/.jsonl: JSON and [JSON] Lines files
- .csv: CSV files
- .parquet: Parquet files
- dataset directory: Directory containing a saved HuggingFace Dataset
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

open `classmethod` #

open(name)

Get a handle to a memoryset in the OrcaCloud

Parameters:

name (str) –

Name or unique identifier of the memoryset

Returns:

Self –

Handle to the existing memoryset in the OrcaCloud

Raises:

LookupError –

If the memoryset does not exist

aopen `async` `classmethod` #

aopen(name)

Asynchronously get a handle to a memoryset in the OrcaCloud

Parameters:

name (str) –

Name or unique identifier of the memoryset

Returns:

Self –

Handle to the existing memoryset in the OrcaCloud

Raises:

LookupError –

If the memoryset does not exist

exists `classmethod` #

exists(name_or_id)

Check if a memoryset exists in the OrcaCloud

Parameters:

name_or_id (str) –

Name or id of the memoryset

Returns:

bool –

True if the memoryset exists, False otherwise

all `classmethod` #

all(show_hidden=False)

Get a list of handles to all memorysets in the OrcaCloud

Parameters:

show_hidden (bool, default: False ) –

Whether to include hidden memorysets in results, defaults to False

Returns:

list[Self] –

List of handles to all memorysets in the OrcaCloud

drop `classmethod` #

drop(name_or_id, if_not_exists='error')

Delete a memoryset from the OrcaCloud

Parameters:

name_or_id (str) –

Name or id of the memoryset
if_not_exists (DropMode, default: 'error' ) –

What to do if the memoryset does not exist, defaults to "error". Other options are "ignore" to do nothing if the memoryset does not exist.

Raises:

LookupError –

If the memoryset does not exist and if_not_exists is "error"

set #

set(
    *,
    name=UNSET,
    description=UNSET,
    label_names=UNSET,
    hidden=UNSET
)

Update editable attributes of the memoryset

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

description (str | None, default: UNSET ) –

Value to set for the description
name (str, default: UNSET ) –

Value to set for the name
label_names (list[str], default: UNSET ) –

Value to replace existing label names with

clone #

clone(
    name: str,
    *,
    embedding_model: (
        PretrainedEmbeddingModel
        | FinetunedEmbeddingModel
        | None
    ) = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    if_exists: CreateMode = "error",
    background: Literal[True]
) -> Job[Self]

clone(
    name: str,
    *,
    embedding_model: (
        PretrainedEmbeddingModel
        | FinetunedEmbeddingModel
        | None
    ) = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    if_exists: CreateMode = "error",
    background: Literal[False] = False
) -> Self

clone(
    name,
    *,
    embedding_model=None,
    max_seq_length_override=UNSET,
    prompt=None,
    if_exists="error",
    background=False
)

Create a clone of the memoryset with a new name

Parameters:

name (str) –

Name for the new memoryset (must be unique)
embedding_model (PretrainedEmbeddingModel | FinetunedEmbeddingModel | None, default: None ) –

Optional new embedding model to use for re-embedding the memory values value is longer than this it will be truncated, will default to the model’s max sequence length if not provided
max_seq_length_override (int | None, default: UNSET ) –

Optional custom max sequence length to use for the cloned memoryset. If not provided, will use the source memoryset’s max sequence length.
prompt (str | None, default: None ) –

Optional custom prompt to use for the cloned memoryset. If not provided, will use the source memoryset’s prompt.
if_exists (CreateMode, default: 'error' ) –

What to do if a memoryset with the same name already exists, defaults to "error". Other option is "open" to open the existing memoryset.

Returns:

Self | Job[Self] –

Handle to the cloned memoryset in the OrcaCloud

Examples:

>>> memoryset = LabeledMemoryset.open("my_memoryset")
>>> finetuned_embedding_model = PretrainedEmbeddingModel.GTE_BASE.finetune(
...     "gte_base_finetuned", my_memoryset
... )
>>> new_memoryset = memoryset.clone(
...     "my_memoryset_finetuned", embedding_model=finetuned_embedding_model,
... )

>>> # Clone with custom prompts
>>> new_memoryset = memoryset.clone(
...     "my_memoryset_with_prompts",
...     document_prompt_override="Represent this document for retrieval:",
...     query_prompt_override="Represent this query for retrieval:",
... )

refresh #

refresh(throttle=0)

Refresh the information about the memoryset from the OrcaCloud

Parameters:

throttle (float, default: 0 ) –

Minimum time in seconds between refreshes

len #

__len__()

Get the number of memories in the memoryset

getitem #

__getitem__(index: int | str) -> MemoryT

__getitem__(index: slice) -> list[MemoryT]

__getitem__(index)

Get memories from the memoryset by index or memory id

Parameters:

index (int | slice | str) –

Index or memory to retrieve or slice of memories to retrieve or unique identifier of the memory to retrieve

Returns:

MemoryT | list[MemoryT] –

Memory or memories from the memoryset

Raises:

LookupError –

If the id is not found or the index is out of bounds

Examples:

Retrieve the first memory in the memoryset:

>>> memoryset[0]
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

Retrieve the last memory in the memoryset:

>>> memoryset[-1]
LabeledMemory({ label: <negative: 0>, value: 'I am sad' })

Retrieve a slice of memories in the memoryset:

>>> memoryset[1:3]
[
    LabeledMemory({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemory({ label: <negative: 0>, value: 'I am sad' }),
]

Retrieve a memory by id:

>>> memoryset["0195019a-5bc7-7afb-b902-5945ee1fb766"]
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

search #

search(
    query: str,
    *,
    count: int = 1,
    prompt: str | None = None,
    partition_id: str | None = None,
    partition_filter_mode: Literal[
        "ignore_partitions",
        "include_global",
        "exclude_global",
        "only_global",
    ] = "include_global"
) -> list[MemoryLookupT]

search(
    query: list[str],
    *,
    count: int = 1,
    prompt: str | None = None,
    partition_id: str | None = None,
    partition_filter_mode: Literal[
        "ignore_partitions",
        "include_global",
        "exclude_global",
        "only_global",
    ] = "include_global"
) -> list[list[MemoryLookupT]]

search(
    query,
    *,
    count=1,
    prompt=None,
    partition_id=None,
    partition_filter_mode="include_global"
)

Search for memories that are semantically similar to the query

Parameters:

query (str | list[str]) –

Query to lookup memories in the memoryset, can be a single query or a list
count (int, default: 1 ) –

Number of memories to return for each query
prompt (str | None, default: None ) –

Optional prompt for query embedding during search. If not provided, the memoryset’s default query prompt will be used if available.
partition_id (str | None, default: None ) –

Optional partition ID to filter memories by
partition_filter_mode (Literal['ignore_partitions', 'include_global', 'exclude_global', 'only_global'], default: 'include_global' ) –

How to filter partitions when searching for memories - “ignore_partitions”: Ignore partitions - “include_global”: Include global memories - “exclude_global”: Exclude global memories - “only_global”: Only include global memories

Returns: List of memories from the memoryset that match the query. If a single query is provided, the return value is a list containing a single list of memories. If a list of queries is provided, the return value is a list of lists of memories.

Examples:

Search for similar memories:

>>> memoryset.search("I am happy", count=2)
[
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am content' }),
]

Search with custom query prompt for instruction-following models:

>>> memoryset.search("I am happy", count=2, query_prompt="Represent this query for sentiment retrieval:")
[
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am content' }),
]

Search for similar memories for multiple queries:

>>> memoryset.search(["I am happy", "I am sad"], count=1)
[
    [
        LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    ],
    [
        LabeledMemoryLookup({ label: <negative: 0>, value: 'I am sad' }),
    ],
]

query #

query(
    offset=0,
    limit=100,
    filters=[],
    with_feedback_metrics=False,
    sort=None,
    partition_id=None,
    partition_filter_mode="include_global",
)

Query the memoryset for memories that match the filters

Parameters:

offset (int, default: 0 ) –

The offset of the first memory to return
limit (int, default: 100 ) –

The maximum number of memories to return
filters (list[FilterItemTuple], default: [] ) –

List of filters to apply to the query.
with_feedback_metrics (bool, default: False ) –

Whether to include feedback metrics in the response

Returns:

list[MemoryT] –

List of memories from the memoryset that match the filters

Examples:

>>> memoryset.query(filters=[("label", "==", 0)], limit=2)
[
    LabeledMemory({ label: <positive: 1>, value: "I am happy" }),
    LabeledMemory({ label: <negative: 0>, value: "I am sad" }),
]

to_pandas #

to_pandas(
    offset=0,
    limit=100,
    filters=[],
    with_feedback_metrics=False,
    sort=None,
)

Convert the memoryset to a pandas DataFrame

insert #

insert(items)

Insert memories into the memoryset

Parameters:

items (Iterable[dict[str, Any]] | dict[str, Any]) –
List of memories to insert into the memoryset. This should be a list of dictionaries with the following keys:
- value: Value of the memory
- label: Label of the memory
- score: Score of the memory
- source_id: Optional unique ID of the memory in a system of reference
- ...: Any other metadata to store for the memory

Examples:

>>> memoryset.insert([
...     {"value": "I am happy", "label": 1, "source_id": "data_123", "partition_id": "user_1", "tag": "happy"},
...     {"value": "I am sad", "label": 0, "source_id": "data_124", "partition_id": "user_1", "tag": "sad"},
... ])

ainsert `async` #

ainsert(items)

Asynchronously insert memories into the memoryset

Parameters:

items (Iterable[dict[str, Any]] | dict[str, Any]) –
List of memories to insert into the memoryset. This should be a list of dictionaries with the following keys:
- value: Value of the memory
- label: Label of the memory
- score: Score of the memory
- source_id: Optional unique ID of the memory in a system of reference
- partition_id: Optional partition ID of the memory
- ...: Any other metadata to store for the memory

Examples:

>>> await memoryset.ainsert([
...     {"value": "I am happy", "label": 1, "source_id": "data_123", "partition_id": "user_1", "tag": "happy"},
...     {"value": "I am sad", "label": 0, "source_id": "data_124", "partition_id": "user_1", "tag": "sad"},
... ])

arefresh `async` #

arefresh(throttle=0)

Asynchronously refresh the information about the memoryset from the OrcaCloud

Parameters:

throttle (float, default: 0 ) –

Minimum time in seconds between refreshes

get #

get(memory_id: str) -> MemoryT

get(memory_id: Iterable[str]) -> list[MemoryT]

get(memory_id)

Fetch a memory or memories from the memoryset

Parameters:

memory_id (str | Iterable[str]) –

Unique identifier of the memory or memories to fetch

Returns:

MemoryT | list[MemoryT] –

Memory or list of memories from the memoryset

Raises:

LookupError –

If no memory with the given id is found

Examples:

Fetch a single memory:

>>> memoryset.get("0195019a-5bc7-7afb-b902-5945ee1fb766")
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

Fetch multiple memories:

>>> memoryset.get([
...     "0195019a-5bc7-7afb-b902-5945ee1fb766",
...     "019501a1-ea08-76b2-9f62-95e4800b4841",
... ])
[
    LabeledMemory({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemory({ label: <negative: 0>, value: 'I am sad' }),
]

update #

update(updates: dict[str, Any]) -> MemoryT

update(updates: Iterable[dict[str, Any]]) -> list[MemoryT]

update(updates)

Update one or multiple memories in the memoryset

Parameters:

updates (dict[str, Any] | Iterable[dict[str, Any]]) –
List of updates to apply to the memories. Each update should be a dictionary with the following keys:
- memory_id: Unique identifier of the memory to update (required)
- value: Optional new value of the memory
- label: Optional new label of the memory
- source_id: Optional new source ID of the memory
- partition_id: Optional new partition ID of the memory
- ...: Optional new values for metadata properties

Returns:

MemoryT | list[MemoryT] –

Updated memory or list of updated memories

Examples:

Update a single memory:

>>> memoryset.update(
...     {
...         "memory_id": "019501a1-ea08-76b2-9f62-95e4800b4841",
...         "tag": "happy",
...     },
... )

Update multiple memories:

>>> memoryset.update(
...     {"memory_id": m.memory_id, "label": 2}
...     for m in memoryset.query(filters=[("tag", "==", "happy")])
... )

get_cascading_edits_suggestions #

get_cascading_edits_suggestions(
    memory,
    *,
    old_label,
    new_label,
    max_neighbors=50,
    max_validation_neighbors=10,
    similarity_threshold=None,
    only_if_has_old_label=True,
    exclude_if_new_label=True,
    suggestion_cooldown_time=3600.0 * 24.0,
    label_confirmation_cooldown_time=3600.0 * 24.0 * 7
)

Suggests cascading edits for a given memory based on nearby points with similar labels.

This function is triggered after a user changes a memory’s label. It looks for nearby candidates in embedding space that may be subject to similar relabeling and returns them as suggestions. The system uses scoring heuristics, label filters, and cooldown tracking to reduce noise and improve usability.

Parameters:

memory (MemoryT) –

The memory whose label was just changed.
old_label (int) –

The label this memory used to have.
new_label (int) –

The label it was changed to.
max_neighbors (int, default: 50 ) –

Maximum number of neighbors to consider.
max_validation_neighbors (int, default: 10 ) –

Maximum number of neighbors to use for label suggestion.
similarity_threshold (float | None, default: None ) –

If set, only include neighbors with a lookup score above this threshold.
only_if_has_old_label (bool, default: True ) –

If True, only consider neighbors that have the old label.
exclude_if_new_label (bool, default: True ) –

If True, exclude neighbors that already have the new label.
suggestion_cooldown_time (float, default: 3600.0 * 24.0 ) –

Minimum time (in seconds) since the last suggestion for a neighbor to be considered again.
label_confirmation_cooldown_time (float, default: 3600.0 * 24.0 * 7 ) –

Minimum time (in seconds) since a neighbor’s label was confirmed to be considered for suggestions.

Returns:

list[CascadingEditSuggestion] –

A list of CascadingEditSuggestion objects, each containing a neighbor and the suggested new label.

delete #

delete(memory_id)

Delete memories from the memoryset

Parameters:

memory_id (str | Iterable[str]) –

unique identifiers of the memories to delete

Examples:

Delete a single memory:

>>> memoryset.delete("0195019a-5bc7-7afb-b902-5945ee1fb766")

Delete multiple memories:

>>> memoryset.delete([
...     "0195019a-5bc7-7afb-b902-5945ee1fb766",
...     "019501a1-ea08-76b2-9f62-95e4800b4841",
... )

analyze #

analyze(
    *analyses: dict[str, Any] | str,
    lookup_count: int = 15,
    clear_metrics: bool = False,
    background: Literal[True],
    partition_filter_mode: Literal[
        "ignore_partitions",
        "include_global",
        "exclude_global",
        "only_global",
    ] = "include_global"
) -> Job[MemorysetMetrics]

analyze(
    *analyses: dict[str, Any] | str,
    lookup_count: int = 15,
    clear_metrics: bool = False,
    background: Literal[False] = False,
    partition_filter_mode: Literal[
        "ignore_partitions",
        "include_global",
        "exclude_global",
        "only_global",
    ] = "include_global"
) -> MemorysetMetrics

analyze(
    *analyses,
    lookup_count=15,
    clear_metrics=False,
    background=False,
    partition_filter_mode="include_global"
)

Run analyses on the memoryset to find duplicates, clusters, mislabelings, and more

The results of the analysis will be stored in the LabeledMemory.metrics attribute of each memory in the memoryset. Overall memoryset metrics will be returned as a dictionary.

Parameters:

analyses (dict[str, Any] | str, default: () ) –
List of analysis to run on the memoryset, can either be just the name of an analysis or a dictionary with a name property and additional config. The available analyses are:
- "duplicate": Find potentially duplicate memories in the memoryset
- "cluster": Cluster the memories in the memoryset
- "distribution": Analyze the embedding distribution
- "projection": Create a 2D projection of the embeddings for visualization
- "label": Analyze the labels to find potential mislabelings (labeled memorysets only)
- "class_patterns": Analyze class patterns and find representative memories (labeled memorysets only)
- "concepts": Discover and name conceptual clusters in the memoryset (labeled memorysets only)
lookup_count (int, default: 15 ) –

Number of memories to lookup for each memory in the memoryset
clear_metrics (bool, default: False ) –

Whether to clear any existing metrics from the memories before running the analysis
partition_filter_mode (Literal['ignore_partitions', 'include_global', 'exclude_global', 'only_global'], default: 'include_global' ) –

How to filter partitions when running the analysis - “ignore_partitions”: Ignore partitions - “include_global”: Include global memories - “exclude_global”: Exclude global memories - “only_global”: Only include global memories

Returns:

Job[MemorysetMetrics] | MemorysetMetrics –

dictionary with aggregate metrics for each analysis that was run

Raises:

ValueError –

If an invalid analysis name is provided

Examples:

Run label and duplicate analysis:

>>> memoryset.analyze("label", {"name": "duplicate", "possible_duplicate_threshold": 0.99})
{ "duplicate": { "num_duplicates": 10 },
  "label": {
    "label_metrics": [{
        "label": 0,
        "label_name": "negative",
        "average_lookup_score": 0.95,
        "memory_count": 100,
    }, {
        "label": 1,
        "label_name": "positive",
        "average_lookup_score": 0.90,
        "memory_count": 100,
    }]
    "neighbor_prediction_accuracy": 0.95,
    "mean_neighbor_label_confidence": 0.95,
    "mean_neighbor_label_entropy": 0.95,
    "mean_neighbor_predicted_label_ambiguity": 0.95,
  }
}

Remove all exact duplicates:

>>> memoryset.delete(
...     m.memory_id
...     for m in memoryset.query(
...         filters=[("metrics.is_duplicate", "==", True)]
...     )
... )

Display label analysis to review potential mislabelings:

>>> memoryset.display_label_analysis()

get_potential_duplicate_groups #

get_potential_duplicate_groups()

Group potential duplicates in the memoryset

LabeledMemoryset #

Bases: MemorysetBase[LabeledMemory, LabeledMemoryLookup]

A Handle to a collection of memories with labels in the OrcaCloud

Attributes:

id –

Unique identifier for the memoryset
name –

Unique name of the memoryset
description –

Description of the memoryset
label_names (list[str]) –

Names for the class labels in the memoryset
length (list[str]) –

Number of memories in the memoryset
embedding_model (list[str]) –

Embedding model used to embed the memory values for semantic search
created_at (list[str]) –

When the memoryset was created, automatically generated on create
updated_at (list[str]) –

When the memoryset was last updated, automatically updated on updates

from_hf_dataset `classmethod` #

from_hf_dataset(
    name: str,
    hf_dataset: Dataset,
    background: Literal[True],
    **kwargs: Any
) -> Self

from_hf_dataset(
    name: str,
    hf_dataset: Dataset,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_hf_dataset(
    name, hf_dataset, background=False, **kwargs
)

Create a new memoryset from a Hugging Face Dataset in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All features that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
hf_dataset (Dataset) –

Hugging Face dataset to create the memoryset from
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

from_pytorch `classmethod` #

from_pytorch(
    name: str,
    torch_data: DataLoader | Dataset,
    *,
    column_names: list[str] | None = None,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_pytorch(
    name: str,
    torch_data: DataLoader | Dataset,
    *,
    column_names: list[str] | None = None,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_pytorch(
    name,
    torch_data,
    *,
    column_names=None,
    background=False,
    **kwargs
)

Create a new memoryset from a PyTorch DataLoader or Dataset in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All properties that are not specified to be used as value_column, label_column, or source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
torch_data (DataLoader | Dataset) –

PyTorch data loader or dataset to create the memoryset from
column_names (list[str] | None, default: None ) –

If the provided dataset or data loader returns unnamed tuples, this argument must be provided to specify the names of the columns.
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

from_list `classmethod` #

from_list(
    name: str,
    data: list[dict],
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_list(
    name: str,
    data: list[dict],
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_list(name, data, *, background=False, **kwargs)

Create a new memoryset from a list of dictionaries in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All properties that are not specified to be used as value_column, label_column, or source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
data (list[dict]) –

List of dictionaries to create the memoryset from
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

Examples:

>>> LabeledMemoryset.from_list("my_memoryset", [
...     {"value": "hello", "label": 0, "tag": "tag1"},
...     {"value": "world", "label": 1, "tag": "tag2"},
... ])

from_dict `classmethod` #

from_dict(
    name: str,
    data: dict,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_dict(
    name: str,
    data: dict,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_dict(name, data, *, background=False, **kwargs)

Create a new memoryset from a dictionary of columns in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns from the datasource that are not specified in the value_column, label_column, source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
data (dict) –

Dictionary of columns to create the memoryset from
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

Examples:

>>> LabeledMemoryset.from_dict("my_memoryset", {
...     "value": ["hello", "world"],
...     "label": [0, 1],
...     "tag": ["tag1", "tag2"],
... })

from_pandas `classmethod` #

from_pandas(
    name: str,
    dataframe: DataFrame,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_pandas(
    name: str,
    dataframe: DataFrame,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_pandas(name, dataframe, *, background=False, **kwargs)

Create a new memoryset from a pandas DataFrame in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns that are not specified to be used as value_column, label_column, or source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
dataframe (DataFrame) –

Dataframe to create the memoryset from
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

from_arrow `classmethod` #

from_arrow(
    name: str,
    pyarrow_table: Table,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_arrow(
    name: str,
    pyarrow_table: Table,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_arrow(
    name, pyarrow_table, *, background=False, **kwargs
)

Create a new memoryset from a PyArrow Table in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns that are not specified to be used as value_column, label_column, or source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
pyarrow_table (Table) –

PyArrow table to create the memoryset from
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

from_disk `classmethod` #

from_disk(
    name: str,
    file_path: str | PathLike,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_disk(
    name: str,
    file_path: str | PathLike,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_disk(name, file_path, *, background=False, **kwargs)

Create a new memoryset from a file on disk in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns from the datasource that are not specified in the value_column, label_column, source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
file_path (str | PathLike) –
Path to the file on disk to create the memoryset from. The file type will be inferred from the file extension. The following file types are supported:
- .pkl: Pickle files containing lists of dictionaries or dictionaries of columns
- .json/.jsonl: JSON and [JSON] Lines files
- .csv: CSV files
- .parquet: Parquet files
- dataset directory: Directory containing a saved HuggingFace Dataset
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

open `classmethod` #

open(name)

Get a handle to a memoryset in the OrcaCloud

Parameters:

name (str) –

Name or unique identifier of the memoryset

Returns:

Self –

Handle to the existing memoryset in the OrcaCloud

Raises:

LookupError –

If the memoryset does not exist

aopen `async` `classmethod` #

aopen(name)

Asynchronously get a handle to a memoryset in the OrcaCloud

Parameters:

name (str) –

Name or unique identifier of the memoryset

Returns:

Self –

Handle to the existing memoryset in the OrcaCloud

Raises:

LookupError –

If the memoryset does not exist

exists `classmethod` #

exists(name_or_id)

Check if a memoryset exists in the OrcaCloud

Parameters:

name_or_id (str) –

Name or id of the memoryset

Returns:

bool –

True if the memoryset exists, False otherwise

all `classmethod` #

all(show_hidden=False)

Get a list of handles to all memorysets in the OrcaCloud

Parameters:

show_hidden (bool, default: False ) –

Whether to include hidden memorysets in results, defaults to False

Returns:

list[Self] –

List of handles to all memorysets in the OrcaCloud

drop `classmethod` #

drop(name_or_id, if_not_exists='error')

Delete a memoryset from the OrcaCloud

Parameters:

name_or_id (str) –

Name or id of the memoryset
if_not_exists (DropMode, default: 'error' ) –

What to do if the memoryset does not exist, defaults to "error". Other options are "ignore" to do nothing if the memoryset does not exist.

Raises:

LookupError –

If the memoryset does not exist and if_not_exists is "error"

set #

set(
    *,
    name=UNSET,
    description=UNSET,
    label_names=UNSET,
    hidden=UNSET
)

Update editable attributes of the memoryset

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

description (str | None, default: UNSET ) –

Value to set for the description
name (str, default: UNSET ) –

Value to set for the name
label_names (list[str], default: UNSET ) –

Value to replace existing label names with

clone #

clone(
    name: str,
    *,
    embedding_model: (
        PretrainedEmbeddingModel
        | FinetunedEmbeddingModel
        | None
    ) = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    if_exists: CreateMode = "error",
    background: Literal[True]
) -> Job[Self]

clone(
    name: str,
    *,
    embedding_model: (
        PretrainedEmbeddingModel
        | FinetunedEmbeddingModel
        | None
    ) = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    if_exists: CreateMode = "error",
    background: Literal[False] = False
) -> Self

clone(
    name,
    *,
    embedding_model=None,
    max_seq_length_override=UNSET,
    prompt=None,
    if_exists="error",
    background=False
)

Create a clone of the memoryset with a new name

Parameters:

name (str) –

Name for the new memoryset (must be unique)
embedding_model (PretrainedEmbeddingModel | FinetunedEmbeddingModel | None, default: None ) –

Optional new embedding model to use for re-embedding the memory values value is longer than this it will be truncated, will default to the model’s max sequence length if not provided
max_seq_length_override (int | None, default: UNSET ) –

Optional custom max sequence length to use for the cloned memoryset. If not provided, will use the source memoryset’s max sequence length.
prompt (str | None, default: None ) –

Optional custom prompt to use for the cloned memoryset. If not provided, will use the source memoryset’s prompt.
if_exists (CreateMode, default: 'error' ) –

What to do if a memoryset with the same name already exists, defaults to "error". Other option is "open" to open the existing memoryset.

Returns:

Self | Job[Self] –

Handle to the cloned memoryset in the OrcaCloud

Examples:

>>> memoryset = LabeledMemoryset.open("my_memoryset")
>>> finetuned_embedding_model = PretrainedEmbeddingModel.GTE_BASE.finetune(
...     "gte_base_finetuned", my_memoryset
... )
>>> new_memoryset = memoryset.clone(
...     "my_memoryset_finetuned", embedding_model=finetuned_embedding_model,
... )

>>> # Clone with custom prompts
>>> new_memoryset = memoryset.clone(
...     "my_memoryset_with_prompts",
...     document_prompt_override="Represent this document for retrieval:",
...     query_prompt_override="Represent this query for retrieval:",
... )

refresh #

refresh(throttle=0)

Refresh the information about the memoryset from the OrcaCloud

Parameters:

throttle (float, default: 0 ) –

Minimum time in seconds between refreshes

len #

__len__()

Get the number of memories in the memoryset

getitem #

__getitem__(index: int | str) -> MemoryT

__getitem__(index: slice) -> list[MemoryT]

__getitem__(index)

Get memories from the memoryset by index or memory id

Parameters:

index (int | slice | str) –

Index or memory to retrieve or slice of memories to retrieve or unique identifier of the memory to retrieve

Returns:

MemoryT | list[MemoryT] –

Memory or memories from the memoryset

Raises:

LookupError –

If the id is not found or the index is out of bounds

Examples:

Retrieve the first memory in the memoryset:

>>> memoryset[0]
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

Retrieve the last memory in the memoryset:

>>> memoryset[-1]
LabeledMemory({ label: <negative: 0>, value: 'I am sad' })

Retrieve a slice of memories in the memoryset:

>>> memoryset[1:3]
[
    LabeledMemory({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemory({ label: <negative: 0>, value: 'I am sad' }),
]

Retrieve a memory by id:

>>> memoryset["0195019a-5bc7-7afb-b902-5945ee1fb766"]
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

search #

search(
    query: str,
    *,
    count: int = 1,
    prompt: str | None = None,
    partition_id: str | None = None,
    partition_filter_mode: Literal[
        "ignore_partitions",
        "include_global",
        "exclude_global",
        "only_global",
    ] = "include_global"
) -> list[MemoryLookupT]

search(
    query: list[str],
    *,
    count: int = 1,
    prompt: str | None = None,
    partition_id: str | None = None,
    partition_filter_mode: Literal[
        "ignore_partitions",
        "include_global",
        "exclude_global",
        "only_global",
    ] = "include_global"
) -> list[list[MemoryLookupT]]

search(
    query,
    *,
    count=1,
    prompt=None,
    partition_id=None,
    partition_filter_mode="include_global"
)

Search for memories that are semantically similar to the query

Parameters:

query (str | list[str]) –

Query to lookup memories in the memoryset, can be a single query or a list
count (int, default: 1 ) –

Number of memories to return for each query
prompt (str | None, default: None ) –

Optional prompt for query embedding during search. If not provided, the memoryset’s default query prompt will be used if available.
partition_id (str | None, default: None ) –

Optional partition ID to filter memories by
partition_filter_mode (Literal['ignore_partitions', 'include_global', 'exclude_global', 'only_global'], default: 'include_global' ) –

How to filter partitions when searching for memories - “ignore_partitions”: Ignore partitions - “include_global”: Include global memories - “exclude_global”: Exclude global memories - “only_global”: Only include global memories

Returns: List of memories from the memoryset that match the query. If a single query is provided, the return value is a list containing a single list of memories. If a list of queries is provided, the return value is a list of lists of memories.

Examples:

Search for similar memories:

>>> memoryset.search("I am happy", count=2)
[
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am content' }),
]

Search with custom query prompt for instruction-following models:

>>> memoryset.search("I am happy", count=2, query_prompt="Represent this query for sentiment retrieval:")
[
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am content' }),
]

Search for similar memories for multiple queries:

>>> memoryset.search(["I am happy", "I am sad"], count=1)
[
    [
        LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    ],
    [
        LabeledMemoryLookup({ label: <negative: 0>, value: 'I am sad' }),
    ],
]

query #

query(
    offset=0,
    limit=100,
    filters=[],
    with_feedback_metrics=False,
    sort=None,
    partition_id=None,
    partition_filter_mode="include_global",
)

Query the memoryset for memories that match the filters

Parameters:

offset (int, default: 0 ) –

The offset of the first memory to return
limit (int, default: 100 ) –

The maximum number of memories to return
filters (list[FilterItemTuple], default: [] ) –

List of filters to apply to the query.
with_feedback_metrics (bool, default: False ) –

Whether to include feedback metrics in the response

Returns:

list[MemoryT] –

List of memories from the memoryset that match the filters

Examples:

>>> memoryset.query(filters=[("label", "==", 0)], limit=2)
[
    LabeledMemory({ label: <positive: 1>, value: "I am happy" }),
    LabeledMemory({ label: <negative: 0>, value: "I am sad" }),
]

to_pandas #

to_pandas(
    offset=0,
    limit=100,
    filters=[],
    with_feedback_metrics=False,
    sort=None,
)

Convert the memoryset to a pandas DataFrame

insert #

insert(items)

Insert memories into the memoryset

Parameters:

items (Iterable[dict[str, Any]] | dict[str, Any]) –
List of memories to insert into the memoryset. This should be a list of dictionaries with the following keys:
- value: Value of the memory
- label: Label of the memory
- score: Score of the memory
- source_id: Optional unique ID of the memory in a system of reference
- ...: Any other metadata to store for the memory

Examples:

>>> memoryset.insert([
...     {"value": "I am happy", "label": 1, "source_id": "data_123", "partition_id": "user_1", "tag": "happy"},
...     {"value": "I am sad", "label": 0, "source_id": "data_124", "partition_id": "user_1", "tag": "sad"},
... ])

ainsert `async` #

ainsert(items)

Asynchronously insert memories into the memoryset

Parameters:

items (Iterable[dict[str, Any]] | dict[str, Any]) –
List of memories to insert into the memoryset. This should be a list of dictionaries with the following keys:
- value: Value of the memory
- label: Label of the memory
- score: Score of the memory
- source_id: Optional unique ID of the memory in a system of reference
- partition_id: Optional partition ID of the memory
- ...: Any other metadata to store for the memory

Examples:

>>> await memoryset.ainsert([
...     {"value": "I am happy", "label": 1, "source_id": "data_123", "partition_id": "user_1", "tag": "happy"},
...     {"value": "I am sad", "label": 0, "source_id": "data_124", "partition_id": "user_1", "tag": "sad"},
... ])

arefresh `async` #

arefresh(throttle=0)

Asynchronously refresh the information about the memoryset from the OrcaCloud

Parameters:

throttle (float, default: 0 ) –

Minimum time in seconds between refreshes

get #

get(memory_id: str) -> MemoryT

get(memory_id: Iterable[str]) -> list[MemoryT]

get(memory_id)

Fetch a memory or memories from the memoryset

Parameters:

memory_id (str | Iterable[str]) –

Unique identifier of the memory or memories to fetch

Returns:

MemoryT | list[MemoryT] –

Memory or list of memories from the memoryset

Raises:

LookupError –

If no memory with the given id is found

Examples:

Fetch a single memory:

>>> memoryset.get("0195019a-5bc7-7afb-b902-5945ee1fb766")
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

Fetch multiple memories:

>>> memoryset.get([
...     "0195019a-5bc7-7afb-b902-5945ee1fb766",
...     "019501a1-ea08-76b2-9f62-95e4800b4841",
... ])
[
    LabeledMemory({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemory({ label: <negative: 0>, value: 'I am sad' }),
]

update #

update(updates: dict[str, Any]) -> MemoryT

update(updates: Iterable[dict[str, Any]]) -> list[MemoryT]

update(updates)

Update one or multiple memories in the memoryset

Parameters:

updates (dict[str, Any] | Iterable[dict[str, Any]]) –
List of updates to apply to the memories. Each update should be a dictionary with the following keys:
- memory_id: Unique identifier of the memory to update (required)
- value: Optional new value of the memory
- label: Optional new label of the memory
- source_id: Optional new source ID of the memory
- partition_id: Optional new partition ID of the memory
- ...: Optional new values for metadata properties

Returns:

MemoryT | list[MemoryT] –

Updated memory or list of updated memories

Examples:

Update a single memory:

>>> memoryset.update(
...     {
...         "memory_id": "019501a1-ea08-76b2-9f62-95e4800b4841",
...         "tag": "happy",
...     },
... )

Update multiple memories:

>>> memoryset.update(
...     {"memory_id": m.memory_id, "label": 2}
...     for m in memoryset.query(filters=[("tag", "==", "happy")])
... )

get_cascading_edits_suggestions #

get_cascading_edits_suggestions(
    memory,
    *,
    old_label,
    new_label,
    max_neighbors=50,
    max_validation_neighbors=10,
    similarity_threshold=None,
    only_if_has_old_label=True,
    exclude_if_new_label=True,
    suggestion_cooldown_time=3600.0 * 24.0,
    label_confirmation_cooldown_time=3600.0 * 24.0 * 7
)

Suggests cascading edits for a given memory based on nearby points with similar labels.

This function is triggered after a user changes a memory’s label. It looks for nearby candidates in embedding space that may be subject to similar relabeling and returns them as suggestions. The system uses scoring heuristics, label filters, and cooldown tracking to reduce noise and improve usability.

Parameters:

memory (MemoryT) –

The memory whose label was just changed.
old_label (int) –

The label this memory used to have.
new_label (int) –

The label it was changed to.
max_neighbors (int, default: 50 ) –

Maximum number of neighbors to consider.
max_validation_neighbors (int, default: 10 ) –

Maximum number of neighbors to use for label suggestion.
similarity_threshold (float | None, default: None ) –

If set, only include neighbors with a lookup score above this threshold.
only_if_has_old_label (bool, default: True ) –

If True, only consider neighbors that have the old label.
exclude_if_new_label (bool, default: True ) –

If True, exclude neighbors that already have the new label.
suggestion_cooldown_time (float, default: 3600.0 * 24.0 ) –

Minimum time (in seconds) since the last suggestion for a neighbor to be considered again.
label_confirmation_cooldown_time (float, default: 3600.0 * 24.0 * 7 ) –

Minimum time (in seconds) since a neighbor’s label was confirmed to be considered for suggestions.

Returns:

list[CascadingEditSuggestion] –

A list of CascadingEditSuggestion objects, each containing a neighbor and the suggested new label.

delete #

delete(memory_id)

Delete memories from the memoryset

Parameters:

memory_id (str | Iterable[str]) –

unique identifiers of the memories to delete

Examples:

Delete a single memory:

>>> memoryset.delete("0195019a-5bc7-7afb-b902-5945ee1fb766")

Delete multiple memories:

>>> memoryset.delete([
...     "0195019a-5bc7-7afb-b902-5945ee1fb766",
...     "019501a1-ea08-76b2-9f62-95e4800b4841",
... )

analyze #

analyze(
    *analyses: dict[str, Any] | str,
    lookup_count: int = 15,
    clear_metrics: bool = False,
    background: Literal[True],
    partition_filter_mode: Literal[
        "ignore_partitions",
        "include_global",
        "exclude_global",
        "only_global",
    ] = "include_global"
) -> Job[MemorysetMetrics]

analyze(
    *analyses: dict[str, Any] | str,
    lookup_count: int = 15,
    clear_metrics: bool = False,
    background: Literal[False] = False,
    partition_filter_mode: Literal[
        "ignore_partitions",
        "include_global",
        "exclude_global",
        "only_global",
    ] = "include_global"
) -> MemorysetMetrics

analyze(
    *analyses,
    lookup_count=15,
    clear_metrics=False,
    background=False,
    partition_filter_mode="include_global"
)

Run analyses on the memoryset to find duplicates, clusters, mislabelings, and more

The results of the analysis will be stored in the LabeledMemory.metrics attribute of each memory in the memoryset. Overall memoryset metrics will be returned as a dictionary.

Parameters:

analyses (dict[str, Any] | str, default: () ) –
List of analysis to run on the memoryset, can either be just the name of an analysis or a dictionary with a name property and additional config. The available analyses are:
- "duplicate": Find potentially duplicate memories in the memoryset
- "cluster": Cluster the memories in the memoryset
- "distribution": Analyze the embedding distribution
- "projection": Create a 2D projection of the embeddings for visualization
- "label": Analyze the labels to find potential mislabelings (labeled memorysets only)
- "class_patterns": Analyze class patterns and find representative memories (labeled memorysets only)
- "concepts": Discover and name conceptual clusters in the memoryset (labeled memorysets only)
lookup_count (int, default: 15 ) –

Number of memories to lookup for each memory in the memoryset
clear_metrics (bool, default: False ) –

Whether to clear any existing metrics from the memories before running the analysis
partition_filter_mode (Literal['ignore_partitions', 'include_global', 'exclude_global', 'only_global'], default: 'include_global' ) –

How to filter partitions when running the analysis - “ignore_partitions”: Ignore partitions - “include_global”: Include global memories - “exclude_global”: Exclude global memories - “only_global”: Only include global memories

Returns:

Job[MemorysetMetrics] | MemorysetMetrics –

dictionary with aggregate metrics for each analysis that was run

Raises:

ValueError –

If an invalid analysis name is provided

Examples:

Run label and duplicate analysis:

>>> memoryset.analyze("label", {"name": "duplicate", "possible_duplicate_threshold": 0.99})
{ "duplicate": { "num_duplicates": 10 },
  "label": {
    "label_metrics": [{
        "label": 0,
        "label_name": "negative",
        "average_lookup_score": 0.95,
        "memory_count": 100,
    }, {
        "label": 1,
        "label_name": "positive",
        "average_lookup_score": 0.90,
        "memory_count": 100,
    }]
    "neighbor_prediction_accuracy": 0.95,
    "mean_neighbor_label_confidence": 0.95,
    "mean_neighbor_label_entropy": 0.95,
    "mean_neighbor_predicted_label_ambiguity": 0.95,
  }
}

Remove all exact duplicates:

>>> memoryset.delete(
...     m.memory_id
...     for m in memoryset.query(
...         filters=[("metrics.is_duplicate", "==", True)]
...     )
... )

Display label analysis to review potential mislabelings:

>>> memoryset.display_label_analysis()

get_potential_duplicate_groups #

get_potential_duplicate_groups()

Group potential duplicates in the memoryset

create `classmethod` #

create(
    name: str,
    *,
    datasource: None = None,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    description: str | None = None,
    label_names: list[str],
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    hidden: bool = False
) -> Self

create(
    name: str,
    *,
    datasource: Datasource,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    value_column: str = "value",
    label_column: str | None = "label",
    source_id_column: str | None = None,
    partition_id_column: str | None = None,
    description: str | None = None,
    label_names: list[str] | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    remove_duplicates: bool = True,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    background: Literal[True],
    hidden: bool = False,
    subsample: int | float | None = None
) -> Job[Self]

create(
    name: str,
    *,
    datasource: Datasource,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    value_column: str = "value",
    label_column: str | None = "label",
    source_id_column: str | None = None,
    partition_id_column: str | None = None,
    description: str | None = None,
    label_names: list[str] | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    remove_duplicates: bool = True,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    background: Literal[False] = False,
    hidden: bool = False,
    subsample: int | float | None = None
) -> Self

create(
    name,
    *,
    datasource=None,
    embedding_model=None,
    value_column="value",
    label_column="label",
    source_id_column=None,
    partition_id_column=None,
    description=None,
    label_names=None,
    max_seq_length_override=None,
    prompt=None,
    remove_duplicates=True,
    index_type="FLAT",
    index_params={},
    if_exists="error",
    background=False,
    hidden=False,
    subsample=None
)

Create a new labeled memoryset in the OrcaCloud

If datasource is provided, all columns from the datasource that are not specified in the value_column, label_column, source_id_column, or partition_id_column will be stored as metadata in the memoryset.

If datasource is omitted (None), an empty memoryset will be created with no initial memories. You can add memories later using the insert method.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
datasource (Datasource | None, default: None ) –

Optional source data to populate the memories in the memoryset. If omitted, an empty memoryset will be created.
embedding_model (FinetunedEmbeddingModel | PretrainedEmbeddingModel | None, default: None ) –

Embedding model to use for embedding memory values for semantic search. If not provided, a default embedding model for the memoryset will be used.
value_column (str, default: 'value' ) –

Name of the column in the datasource that contains the memory values
label_column (str | None, default: 'label' ) –

Name of the column in the datasource that contains the memory labels. Must contain categorical values as integers or strings. String labels will be converted to integers with the unique strings extracted as label_names. To create a memoryset with all none labels, set to None.
source_id_column (str | None, default: None ) –

Optional name of the column in the datasource that contains the ids in the system of reference
partition_id_column (str | None, default: None ) –

Optional name of the column in the datasource that contains the partition ids
description (str | None, default: None ) –

Optional description for the memoryset, this will be used in agentic flows, so make sure it is concise and describes the contents of your memoryset not the datasource or the embedding model.
label_names (list[str] | None, default: None ) –

List of human-readable names for the labels in the memoryset, must match the number of labels in the label_column. Will be automatically inferred if string labels are provided or if a Dataset with a ClassLabel feature for labels is used as the datasource
max_seq_length_override (int | None, default: None ) –

Maximum sequence length of values in the memoryset, if the value is longer than this it will be truncated, will default to the model’s max sequence length if not provided
prompt (str | None, default: None ) –

Optional prompt to use when embedding documents/memories for storage
remove_duplicates (bool, default: True ) –

Whether to remove duplicates from the datasource before inserting into the memoryset
index_type (IndexType, default: 'FLAT' ) –

Type of vector index to use for the memoryset, defaults to "FLAT". Valid values are "FLAT", "IVF_FLAT", "IVF_SQ8", "IVF_PQ", "HNSW", and "DISKANN".
index_params (dict[str, Any], default: {} ) –

Parameters for the vector index, defaults to {}
if_exists (CreateMode, default: 'error' ) –

What to do if a memoryset with the same name already exists, defaults to "error". Other option is "open" to open the existing memoryset.
background (bool, default: False ) –

Whether to run the operation none blocking and return a job handle
hidden (bool, default: False ) –

Whether the memoryset should be hidden

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

Raises:

ValueError –

If the memoryset already exists and if_exists is "error" or if it is "open" and the params do not match those of the existing memoryset.

from_datasource `classmethod` #

from_datasource(
    name: str,
    *,
    datasource: Datasource,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    value_column: str = "value",
    label_column: str | None = "label",
    source_id_column: str | None = None,
    partition_id_column: str | None = None,
    description: str | None = None,
    label_names: list[str] | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    remove_duplicates: bool = True,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    background: Literal[True],
    hidden: bool = False,
    subsample: int | float | None = None
) -> Job[Self]

from_datasource(
    name: str,
    *,
    datasource: Datasource,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    value_column: str = "value",
    label_column: str | None = "label",
    source_id_column: str | None = None,
    partition_id_column: str | None = None,
    description: str | None = None,
    label_names: list[str] | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    remove_duplicates: bool = True,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    background: Literal[False] = False,
    hidden: bool = False,
    subsample: int | float | None = None
) -> Self

from_datasource(
    name,
    *,
    datasource,
    embedding_model=None,
    value_column="value",
    label_column="label",
    source_id_column=None,
    partition_id_column=None,
    description=None,
    label_names=None,
    max_seq_length_override=None,
    prompt=None,
    remove_duplicates=True,
    index_type="FLAT",
    index_params={},
    if_exists="error",
    background=False,
    hidden=False,
    subsample=None
)

Create a new labeled memoryset in the OrcaCloud from a datasource.

This is a convenience method that is equivalent to calling create with a datasource. All columns from the datasource that are not specified in the value_column, label_column, source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
datasource (Datasource) –

Source data to populate the memories in the memoryset.
embedding_model (FinetunedEmbeddingModel | PretrainedEmbeddingModel | None, default: None ) –

Embedding model to use for embedding memory values for semantic search. If not provided, a default embedding model for the memoryset will be used.
value_column (str, default: 'value' ) –

Name of the column in the datasource that contains the memory values
label_column (str | None, default: 'label' ) –

Name of the column in the datasource that contains the memory labels. Must contain categorical values as integers or strings. String labels will be converted to integers with the unique strings extracted as label_names. To create a memoryset with all none labels, set to None.
source_id_column (str | None, default: None ) –

Optional name of the column in the datasource that contains the ids in the system of reference
partition_id_column (str | None, default: None ) –

Optional name of the column in the datasource that contains the partition ids
description (str | None, default: None ) –

Optional description for the memoryset, this will be used in agentic flows, so make sure it is concise and describes the contents of your memoryset not the datasource or the embedding model.
label_names (list[str] | None, default: None ) –

List of human-readable names for the labels in the memoryset, must match the number of labels in the label_column. Will be automatically inferred if string labels are provided or if a Dataset with a ClassLabel feature for labels is used as the datasource
max_seq_length_override (int | None, default: None ) –

Maximum sequence length of values in the memoryset, if the value is longer than this it will be truncated, will default to the model’s max sequence length if not provided
prompt (str | None, default: None ) –

Optional prompt to use when embedding documents/memories for storage
remove_duplicates (bool, default: True ) –

Whether to remove duplicates from the datasource before inserting into the memoryset
index_type (IndexType, default: 'FLAT' ) –

Type of vector index to use for the memoryset, defaults to "FLAT". Valid values are "FLAT", "IVF_FLAT", "IVF_SQ8", "IVF_PQ", "HNSW", and "DISKANN".
index_params (dict[str, Any], default: {} ) –

Parameters for the vector index, defaults to {}
if_exists (CreateMode, default: 'error' ) –

What to do if a memoryset with the same name already exists, defaults to "error". Other option is "open" to open the existing memoryset.
background (bool, default: False ) –

Whether to run the operation none blocking and return a job handle.
hidden (bool, default: False ) –

Whether the memoryset should be hidden
subsample (int | float | None, default: None ) –

Optional number (int) of rows to insert or fraction (float in (0, 1]) of the datasource to insert. Use to limit the size of the initial memoryset.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

Raises:

ValueError –

If the memoryset already exists and if_exists is "error" or if it is "open" and the params do not match those of the existing memoryset.

display_label_analysis #

display_label_analysis()

Display an interactive UI to review and act upon the label analysis results

Note

This method is only available in Jupyter notebooks.

ScoredMemoryset #

Bases: MemorysetBase[ScoredMemory, ScoredMemoryLookup]

A Handle to a collection of memories with scores in the OrcaCloud

Attributes:

id –

Unique identifier for the memoryset
name –

Unique name of the memoryset
description –

Description of the memoryset
length –

Number of memories in the memoryset
embedding_model –

Embedding model used to embed the memory values for semantic search
created_at –

When the memoryset was created, automatically generated on create
updated_at –

When the memoryset was last updated, automatically updated on updates

from_hf_dataset `classmethod` #

from_hf_dataset(
    name: str,
    hf_dataset: Dataset,
    background: Literal[True],
    **kwargs: Any
) -> Self

from_hf_dataset(
    name: str,
    hf_dataset: Dataset,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_hf_dataset(
    name, hf_dataset, background=False, **kwargs
)

Create a new memoryset from a Hugging Face Dataset in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All features that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
hf_dataset (Dataset) –

Hugging Face dataset to create the memoryset from
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

from_pytorch `classmethod` #

from_pytorch(
    name: str,
    torch_data: DataLoader | Dataset,
    *,
    column_names: list[str] | None = None,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_pytorch(
    name: str,
    torch_data: DataLoader | Dataset,
    *,
    column_names: list[str] | None = None,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_pytorch(
    name,
    torch_data,
    *,
    column_names=None,
    background=False,
    **kwargs
)

Create a new memoryset from a PyTorch DataLoader or Dataset in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All properties that are not specified to be used as value_column, label_column, or source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
torch_data (DataLoader | Dataset) –

PyTorch data loader or dataset to create the memoryset from
column_names (list[str] | None, default: None ) –

If the provided dataset or data loader returns unnamed tuples, this argument must be provided to specify the names of the columns.
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

from_list `classmethod` #

from_list(
    name: str,
    data: list[dict],
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_list(
    name: str,
    data: list[dict],
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_list(name, data, *, background=False, **kwargs)

Create a new memoryset from a list of dictionaries in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All properties that are not specified to be used as value_column, label_column, or source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
data (list[dict]) –

List of dictionaries to create the memoryset from
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

Examples:

>>> LabeledMemoryset.from_list("my_memoryset", [
...     {"value": "hello", "label": 0, "tag": "tag1"},
...     {"value": "world", "label": 1, "tag": "tag2"},
... ])

from_dict `classmethod` #

from_dict(
    name: str,
    data: dict,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_dict(
    name: str,
    data: dict,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_dict(name, data, *, background=False, **kwargs)

Create a new memoryset from a dictionary of columns in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns from the datasource that are not specified in the value_column, label_column, source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
data (dict) –

Dictionary of columns to create the memoryset from
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

Examples:

>>> LabeledMemoryset.from_dict("my_memoryset", {
...     "value": ["hello", "world"],
...     "label": [0, 1],
...     "tag": ["tag1", "tag2"],
... })

from_pandas `classmethod` #

from_pandas(
    name: str,
    dataframe: DataFrame,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_pandas(
    name: str,
    dataframe: DataFrame,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_pandas(name, dataframe, *, background=False, **kwargs)

Create a new memoryset from a pandas DataFrame in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns that are not specified to be used as value_column, label_column, or source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
dataframe (DataFrame) –

Dataframe to create the memoryset from
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

from_arrow `classmethod` #

from_arrow(
    name: str,
    pyarrow_table: Table,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_arrow(
    name: str,
    pyarrow_table: Table,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_arrow(
    name, pyarrow_table, *, background=False, **kwargs
)

Create a new memoryset from a PyArrow Table in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns that are not specified to be used as value_column, label_column, or source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
pyarrow_table (Table) –

PyArrow table to create the memoryset from
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

from_disk `classmethod` #

from_disk(
    name: str,
    file_path: str | PathLike,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]

from_disk(
    name: str,
    file_path: str | PathLike,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self

from_disk(name, file_path, *, background=False, **kwargs)

Create a new memoryset from a file on disk in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns from the datasource that are not specified in the value_column, label_column, source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
file_path (str | PathLike) –
Path to the file on disk to create the memoryset from. The file type will be inferred from the file extension. The following file types are supported:
- .pkl: Pickle files containing lists of dictionaries or dictionaries of columns
- .json/.jsonl: JSON and [JSON] Lines files
- .csv: CSV files
- .parquet: Parquet files
- dataset directory: Directory containing a saved HuggingFace Dataset
background (bool, default: False ) –

Whether to run the operation in the background
kwargs (Any, default: {} ) –

Additional parameters for creating the memoryset. See create attributes for details.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

open `classmethod` #

open(name)

Get a handle to a memoryset in the OrcaCloud

Parameters:

name (str) –

Name or unique identifier of the memoryset

Returns:

Self –

Handle to the existing memoryset in the OrcaCloud

Raises:

LookupError –

If the memoryset does not exist

aopen `async` `classmethod` #

aopen(name)

Asynchronously get a handle to a memoryset in the OrcaCloud

Parameters:

name (str) –

Name or unique identifier of the memoryset

Returns:

Self –

Handle to the existing memoryset in the OrcaCloud

Raises:

LookupError –

If the memoryset does not exist

exists `classmethod` #

exists(name_or_id)

Check if a memoryset exists in the OrcaCloud

Parameters:

name_or_id (str) –

Name or id of the memoryset

Returns:

bool –

True if the memoryset exists, False otherwise

all `classmethod` #

all(show_hidden=False)

Get a list of handles to all memorysets in the OrcaCloud

Parameters:

show_hidden (bool, default: False ) –

Whether to include hidden memorysets in results, defaults to False

Returns:

list[Self] –

List of handles to all memorysets in the OrcaCloud

drop `classmethod` #

drop(name_or_id, if_not_exists='error')

Delete a memoryset from the OrcaCloud

Parameters:

name_or_id (str) –

Name or id of the memoryset
if_not_exists (DropMode, default: 'error' ) –

What to do if the memoryset does not exist, defaults to "error". Other options are "ignore" to do nothing if the memoryset does not exist.

Raises:

LookupError –

If the memoryset does not exist and if_not_exists is "error"

set #

set(
    *,
    name=UNSET,
    description=UNSET,
    label_names=UNSET,
    hidden=UNSET
)

Update editable attributes of the memoryset

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

description (str | None, default: UNSET ) –

Value to set for the description
name (str, default: UNSET ) –

Value to set for the name
label_names (list[str], default: UNSET ) –

Value to replace existing label names with

clone #

clone(
    name: str,
    *,
    embedding_model: (
        PretrainedEmbeddingModel
        | FinetunedEmbeddingModel
        | None
    ) = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    if_exists: CreateMode = "error",
    background: Literal[True]
) -> Job[Self]

clone(
    name: str,
    *,
    embedding_model: (
        PretrainedEmbeddingModel
        | FinetunedEmbeddingModel
        | None
    ) = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    if_exists: CreateMode = "error",
    background: Literal[False] = False
) -> Self

clone(
    name,
    *,
    embedding_model=None,
    max_seq_length_override=UNSET,
    prompt=None,
    if_exists="error",
    background=False
)

Create a clone of the memoryset with a new name

Parameters:

name (str) –

Name for the new memoryset (must be unique)
embedding_model (PretrainedEmbeddingModel | FinetunedEmbeddingModel | None, default: None ) –

Optional new embedding model to use for re-embedding the memory values value is longer than this it will be truncated, will default to the model’s max sequence length if not provided
max_seq_length_override (int | None, default: UNSET ) –

Optional custom max sequence length to use for the cloned memoryset. If not provided, will use the source memoryset’s max sequence length.
prompt (str | None, default: None ) –

Optional custom prompt to use for the cloned memoryset. If not provided, will use the source memoryset’s prompt.
if_exists (CreateMode, default: 'error' ) –

What to do if a memoryset with the same name already exists, defaults to "error". Other option is "open" to open the existing memoryset.

Returns:

Self | Job[Self] –

Handle to the cloned memoryset in the OrcaCloud

Examples:

>>> memoryset = LabeledMemoryset.open("my_memoryset")
>>> finetuned_embedding_model = PretrainedEmbeddingModel.GTE_BASE.finetune(
...     "gte_base_finetuned", my_memoryset
... )
>>> new_memoryset = memoryset.clone(
...     "my_memoryset_finetuned", embedding_model=finetuned_embedding_model,
... )

>>> # Clone with custom prompts
>>> new_memoryset = memoryset.clone(
...     "my_memoryset_with_prompts",
...     document_prompt_override="Represent this document for retrieval:",
...     query_prompt_override="Represent this query for retrieval:",
... )

refresh #

refresh(throttle=0)

Refresh the information about the memoryset from the OrcaCloud

Parameters:

throttle (float, default: 0 ) –

Minimum time in seconds between refreshes

len #

__len__()

Get the number of memories in the memoryset

getitem #

__getitem__(index: int | str) -> MemoryT

__getitem__(index: slice) -> list[MemoryT]

__getitem__(index)

Get memories from the memoryset by index or memory id

Parameters:

index (int | slice | str) –

Index or memory to retrieve or slice of memories to retrieve or unique identifier of the memory to retrieve

Returns:

MemoryT | list[MemoryT] –

Memory or memories from the memoryset

Raises:

LookupError –

If the id is not found or the index is out of bounds

Examples:

Retrieve the first memory in the memoryset:

>>> memoryset[0]
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

Retrieve the last memory in the memoryset:

>>> memoryset[-1]
LabeledMemory({ label: <negative: 0>, value: 'I am sad' })

Retrieve a slice of memories in the memoryset:

>>> memoryset[1:3]
[
    LabeledMemory({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemory({ label: <negative: 0>, value: 'I am sad' }),
]

Retrieve a memory by id:

>>> memoryset["0195019a-5bc7-7afb-b902-5945ee1fb766"]
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

search #

search(
    query: str,
    *,
    count: int = 1,
    prompt: str | None = None,
    partition_id: str | None = None,
    partition_filter_mode: Literal[
        "ignore_partitions",
        "include_global",
        "exclude_global",
        "only_global",
    ] = "include_global"
) -> list[MemoryLookupT]

search(
    query: list[str],
    *,
    count: int = 1,
    prompt: str | None = None,
    partition_id: str | None = None,
    partition_filter_mode: Literal[
        "ignore_partitions",
        "include_global",
        "exclude_global",
        "only_global",
    ] = "include_global"
) -> list[list[MemoryLookupT]]

search(
    query,
    *,
    count=1,
    prompt=None,
    partition_id=None,
    partition_filter_mode="include_global"
)

Search for memories that are semantically similar to the query

Parameters:

query (str | list[str]) –

Query to lookup memories in the memoryset, can be a single query or a list
count (int, default: 1 ) –

Number of memories to return for each query
prompt (str | None, default: None ) –

Optional prompt for query embedding during search. If not provided, the memoryset’s default query prompt will be used if available.
partition_id (str | None, default: None ) –

Optional partition ID to filter memories by
partition_filter_mode (Literal['ignore_partitions', 'include_global', 'exclude_global', 'only_global'], default: 'include_global' ) –

How to filter partitions when searching for memories - “ignore_partitions”: Ignore partitions - “include_global”: Include global memories - “exclude_global”: Exclude global memories - “only_global”: Only include global memories

Returns: List of memories from the memoryset that match the query. If a single query is provided, the return value is a list containing a single list of memories. If a list of queries is provided, the return value is a list of lists of memories.

Examples:

Search for similar memories:

>>> memoryset.search("I am happy", count=2)
[
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am content' }),
]

Search with custom query prompt for instruction-following models:

>>> memoryset.search("I am happy", count=2, query_prompt="Represent this query for sentiment retrieval:")
[
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am content' }),
]

Search for similar memories for multiple queries:

>>> memoryset.search(["I am happy", "I am sad"], count=1)
[
    [
        LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    ],
    [
        LabeledMemoryLookup({ label: <negative: 0>, value: 'I am sad' }),
    ],
]

query #

query(
    offset=0,
    limit=100,
    filters=[],
    with_feedback_metrics=False,
    sort=None,
    partition_id=None,
    partition_filter_mode="include_global",
)

Query the memoryset for memories that match the filters

Parameters:

offset (int, default: 0 ) –

The offset of the first memory to return
limit (int, default: 100 ) –

The maximum number of memories to return
filters (list[FilterItemTuple], default: [] ) –

List of filters to apply to the query.
with_feedback_metrics (bool, default: False ) –

Whether to include feedback metrics in the response

Returns:

list[MemoryT] –

List of memories from the memoryset that match the filters

Examples:

>>> memoryset.query(filters=[("label", "==", 0)], limit=2)
[
    LabeledMemory({ label: <positive: 1>, value: "I am happy" }),
    LabeledMemory({ label: <negative: 0>, value: "I am sad" }),
]

to_pandas #

to_pandas(
    offset=0,
    limit=100,
    filters=[],
    with_feedback_metrics=False,
    sort=None,
)

Convert the memoryset to a pandas DataFrame

insert #

insert(items)

Insert memories into the memoryset

Parameters:

items (Iterable[dict[str, Any]] | dict[str, Any]) –
List of memories to insert into the memoryset. This should be a list of dictionaries with the following keys:
- value: Value of the memory
- label: Label of the memory
- score: Score of the memory
- source_id: Optional unique ID of the memory in a system of reference
- ...: Any other metadata to store for the memory

Examples:

>>> memoryset.insert([
...     {"value": "I am happy", "label": 1, "source_id": "data_123", "partition_id": "user_1", "tag": "happy"},
...     {"value": "I am sad", "label": 0, "source_id": "data_124", "partition_id": "user_1", "tag": "sad"},
... ])

ainsert `async` #

ainsert(items)

Asynchronously insert memories into the memoryset

Parameters:

items (Iterable[dict[str, Any]] | dict[str, Any]) –
List of memories to insert into the memoryset. This should be a list of dictionaries with the following keys:
- value: Value of the memory
- label: Label of the memory
- score: Score of the memory
- source_id: Optional unique ID of the memory in a system of reference
- partition_id: Optional partition ID of the memory
- ...: Any other metadata to store for the memory

Examples:

>>> await memoryset.ainsert([
...     {"value": "I am happy", "label": 1, "source_id": "data_123", "partition_id": "user_1", "tag": "happy"},
...     {"value": "I am sad", "label": 0, "source_id": "data_124", "partition_id": "user_1", "tag": "sad"},
... ])

arefresh `async` #

arefresh(throttle=0)

Asynchronously refresh the information about the memoryset from the OrcaCloud

Parameters:

throttle (float, default: 0 ) –

Minimum time in seconds between refreshes

get #

get(memory_id: str) -> MemoryT

get(memory_id: Iterable[str]) -> list[MemoryT]

get(memory_id)

Fetch a memory or memories from the memoryset

Parameters:

memory_id (str | Iterable[str]) –

Unique identifier of the memory or memories to fetch

Returns:

MemoryT | list[MemoryT] –

Memory or list of memories from the memoryset

Raises:

LookupError –

If no memory with the given id is found

Examples:

Fetch a single memory:

>>> memoryset.get("0195019a-5bc7-7afb-b902-5945ee1fb766")
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

Fetch multiple memories:

>>> memoryset.get([
...     "0195019a-5bc7-7afb-b902-5945ee1fb766",
...     "019501a1-ea08-76b2-9f62-95e4800b4841",
... ])
[
    LabeledMemory({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemory({ label: <negative: 0>, value: 'I am sad' }),
]

update #

update(updates: dict[str, Any]) -> MemoryT

update(updates: Iterable[dict[str, Any]]) -> list[MemoryT]

update(updates)

Update one or multiple memories in the memoryset

Parameters:

updates (dict[str, Any] | Iterable[dict[str, Any]]) –
List of updates to apply to the memories. Each update should be a dictionary with the following keys:
- memory_id: Unique identifier of the memory to update (required)
- value: Optional new value of the memory
- label: Optional new label of the memory
- source_id: Optional new source ID of the memory
- partition_id: Optional new partition ID of the memory
- ...: Optional new values for metadata properties

Returns:

MemoryT | list[MemoryT] –

Updated memory or list of updated memories

Examples:

Update a single memory:

>>> memoryset.update(
...     {
...         "memory_id": "019501a1-ea08-76b2-9f62-95e4800b4841",
...         "tag": "happy",
...     },
... )

Update multiple memories:

>>> memoryset.update(
...     {"memory_id": m.memory_id, "label": 2}
...     for m in memoryset.query(filters=[("tag", "==", "happy")])
... )

get_cascading_edits_suggestions #

get_cascading_edits_suggestions(
    memory,
    *,
    old_label,
    new_label,
    max_neighbors=50,
    max_validation_neighbors=10,
    similarity_threshold=None,
    only_if_has_old_label=True,
    exclude_if_new_label=True,
    suggestion_cooldown_time=3600.0 * 24.0,
    label_confirmation_cooldown_time=3600.0 * 24.0 * 7
)

Suggests cascading edits for a given memory based on nearby points with similar labels.

This function is triggered after a user changes a memory’s label. It looks for nearby candidates in embedding space that may be subject to similar relabeling and returns them as suggestions. The system uses scoring heuristics, label filters, and cooldown tracking to reduce noise and improve usability.

Parameters:

memory (MemoryT) –

The memory whose label was just changed.
old_label (int) –

The label this memory used to have.
new_label (int) –

The label it was changed to.
max_neighbors (int, default: 50 ) –

Maximum number of neighbors to consider.
max_validation_neighbors (int, default: 10 ) –

Maximum number of neighbors to use for label suggestion.
similarity_threshold (float | None, default: None ) –

If set, only include neighbors with a lookup score above this threshold.
only_if_has_old_label (bool, default: True ) –

If True, only consider neighbors that have the old label.
exclude_if_new_label (bool, default: True ) –

If True, exclude neighbors that already have the new label.
suggestion_cooldown_time (float, default: 3600.0 * 24.0 ) –

Minimum time (in seconds) since the last suggestion for a neighbor to be considered again.
label_confirmation_cooldown_time (float, default: 3600.0 * 24.0 * 7 ) –

Minimum time (in seconds) since a neighbor’s label was confirmed to be considered for suggestions.

Returns:

list[CascadingEditSuggestion] –

A list of CascadingEditSuggestion objects, each containing a neighbor and the suggested new label.

delete #

delete(memory_id)

Delete memories from the memoryset

Parameters:

memory_id (str | Iterable[str]) –

unique identifiers of the memories to delete

Examples:

Delete a single memory:

>>> memoryset.delete("0195019a-5bc7-7afb-b902-5945ee1fb766")

Delete multiple memories:

>>> memoryset.delete([
...     "0195019a-5bc7-7afb-b902-5945ee1fb766",
...     "019501a1-ea08-76b2-9f62-95e4800b4841",
... )

analyze #

analyze(
    *analyses: dict[str, Any] | str,
    lookup_count: int = 15,
    clear_metrics: bool = False,
    background: Literal[True],
    partition_filter_mode: Literal[
        "ignore_partitions",
        "include_global",
        "exclude_global",
        "only_global",
    ] = "include_global"
) -> Job[MemorysetMetrics]

analyze(
    *analyses: dict[str, Any] | str,
    lookup_count: int = 15,
    clear_metrics: bool = False,
    background: Literal[False] = False,
    partition_filter_mode: Literal[
        "ignore_partitions",
        "include_global",
        "exclude_global",
        "only_global",
    ] = "include_global"
) -> MemorysetMetrics

analyze(
    *analyses,
    lookup_count=15,
    clear_metrics=False,
    background=False,
    partition_filter_mode="include_global"
)

Run analyses on the memoryset to find duplicates, clusters, mislabelings, and more

The results of the analysis will be stored in the LabeledMemory.metrics attribute of each memory in the memoryset. Overall memoryset metrics will be returned as a dictionary.

Parameters:

analyses (dict[str, Any] | str, default: () ) –
List of analysis to run on the memoryset, can either be just the name of an analysis or a dictionary with a name property and additional config. The available analyses are:
- "duplicate": Find potentially duplicate memories in the memoryset
- "cluster": Cluster the memories in the memoryset
- "distribution": Analyze the embedding distribution
- "projection": Create a 2D projection of the embeddings for visualization
- "label": Analyze the labels to find potential mislabelings (labeled memorysets only)
- "class_patterns": Analyze class patterns and find representative memories (labeled memorysets only)
- "concepts": Discover and name conceptual clusters in the memoryset (labeled memorysets only)
lookup_count (int, default: 15 ) –

Number of memories to lookup for each memory in the memoryset
clear_metrics (bool, default: False ) –

Whether to clear any existing metrics from the memories before running the analysis
partition_filter_mode (Literal['ignore_partitions', 'include_global', 'exclude_global', 'only_global'], default: 'include_global' ) –

How to filter partitions when running the analysis - “ignore_partitions”: Ignore partitions - “include_global”: Include global memories - “exclude_global”: Exclude global memories - “only_global”: Only include global memories

Returns:

Job[MemorysetMetrics] | MemorysetMetrics –

dictionary with aggregate metrics for each analysis that was run

Raises:

ValueError –

If an invalid analysis name is provided

Examples:

Run label and duplicate analysis:

>>> memoryset.analyze("label", {"name": "duplicate", "possible_duplicate_threshold": 0.99})
{ "duplicate": { "num_duplicates": 10 },
  "label": {
    "label_metrics": [{
        "label": 0,
        "label_name": "negative",
        "average_lookup_score": 0.95,
        "memory_count": 100,
    }, {
        "label": 1,
        "label_name": "positive",
        "average_lookup_score": 0.90,
        "memory_count": 100,
    }]
    "neighbor_prediction_accuracy": 0.95,
    "mean_neighbor_label_confidence": 0.95,
    "mean_neighbor_label_entropy": 0.95,
    "mean_neighbor_predicted_label_ambiguity": 0.95,
  }
}

Remove all exact duplicates:

>>> memoryset.delete(
...     m.memory_id
...     for m in memoryset.query(
...         filters=[("metrics.is_duplicate", "==", True)]
...     )
... )

Display label analysis to review potential mislabelings:

>>> memoryset.display_label_analysis()

get_potential_duplicate_groups #

get_potential_duplicate_groups()

Group potential duplicates in the memoryset

create `classmethod` #

create(
    name: str,
    *,
    datasource: None = None,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    description: str | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    hidden: bool = False
) -> Self

create(
    name: str,
    *,
    datasource: Datasource,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    value_column: str = "value",
    score_column: str | None = "score",
    source_id_column: str | None = None,
    partition_id_column: str | None = None,
    description: str | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    remove_duplicates: bool = True,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    background: Literal[True],
    hidden: bool = False,
    subsample: int | float | None = None
) -> Job[Self]

create(
    name: str,
    *,
    datasource: Datasource,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    score_column: str | None = "score",
    value_column: str = "value",
    source_id_column: str | None = None,
    partition_id_column: str | None = None,
    description: str | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    remove_duplicates: bool = True,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    background: Literal[False] = False,
    hidden: bool = False,
    subsample: int | float | None = None
) -> Self

create(
    name,
    *,
    datasource=None,
    embedding_model=None,
    value_column="value",
    score_column="score",
    source_id_column=None,
    partition_id_column=None,
    description=None,
    max_seq_length_override=None,
    prompt=None,
    remove_duplicates=True,
    index_type="FLAT",
    index_params={},
    if_exists="error",
    background=False,
    hidden=False,
    subsample=None
)

Create a new scored memoryset in the OrcaCloud

If datasource is provided, all columns from the datasource that are not specified in the value_column, score_column, source_id_column, or partition_id_column will be stored as metadata in the memoryset.

If datasource is omitted (None), an empty memoryset will be created with no initial memories. You can add memories later using the insert method.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
datasource (Datasource | None, default: None ) –

Optional source data to populate the memories in the memoryset. If omitted, an empty memoryset will be created.
embedding_model (FinetunedEmbeddingModel | PretrainedEmbeddingModel | None, default: None ) –

Embedding model to use for embedding memory values for semantic search. If not provided, a default embedding model for the memoryset will be used.
value_column (str, default: 'value' ) –

Name of the column in the datasource that contains the memory values
score_column (str | None, default: 'score' ) –

Name of the column in the datasource that contains the memory scores. Must contain numerical values. To create a memoryset with all none scores, set to None.
source_id_column (str | None, default: None ) –

Optional name of the column in the datasource that contains the ids in the system of reference
partition_id_column (str | None, default: None ) –

Optional name of the column in the datasource that contains the partition ids
description (str | None, default: None ) –

Optional description for the memoryset, this will be used in agentic flows, so make sure it is concise and describes the contents of your memoryset not the datasource or the embedding model.
max_seq_length_override (int | None, default: None ) –

Maximum sequence length of values in the memoryset, if the value is longer than this it will be truncated, will default to the model’s max sequence length if not provided
prompt (str | None, default: None ) –

Optional prompt to use when embedding documents/memories for storage
remove_duplicates (bool, default: True ) –

Whether to remove duplicates from the datasource before inserting into the memoryset
index_type (IndexType, default: 'FLAT' ) –

Type of vector index to use for the memoryset, defaults to "FLAT". Valid values are "FLAT", "IVF_FLAT", "IVF_SQ8", "IVF_PQ", "HNSW", and "DISKANN".
index_params (dict[str, Any], default: {} ) –

Parameters for the vector index, defaults to {}
if_exists (CreateMode, default: 'error' ) –

What to do if a memoryset with the same name already exists, defaults to "error". Other option is "open" to open the existing memoryset.
background (bool, default: False ) –

Whether to run the operation none blocking and return a job handle
hidden (bool, default: False ) –

Whether the memoryset should be hidden

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

Raises:

ValueError –

If the memoryset already exists and if_exists is "error" or if it is "open" and the params do not match those of the existing memoryset.

from_datasource `classmethod` #

from_datasource(
    name: str,
    *,
    datasource: Datasource,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    value_column: str = "value",
    score_column: str | None = "score",
    source_id_column: str | None = None,
    partition_id_column: str | None = None,
    description: str | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    remove_duplicates: bool = True,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    background: Literal[True],
    hidden: bool = False,
    subsample: int | float | None = None
) -> Job[Self]

from_datasource(
    name: str,
    *,
    datasource: Datasource,
    embedding_model: (
        FinetunedEmbeddingModel
        | PretrainedEmbeddingModel
        | None
    ) = None,
    score_column: str | None = "score",
    value_column: str = "value",
    source_id_column: str | None = None,
    partition_id_column: str | None = None,
    description: str | None = None,
    max_seq_length_override: int | None = None,
    prompt: str | None = None,
    remove_duplicates: bool = True,
    index_type: IndexType = "FLAT",
    index_params: dict[str, Any] = {},
    if_exists: CreateMode = "error",
    background: Literal[False] = False,
    hidden: bool = False,
    subsample: int | float | None = None
) -> Self

from_datasource(
    name,
    *,
    datasource,
    embedding_model=None,
    value_column="value",
    score_column="score",
    source_id_column=None,
    partition_id_column=None,
    description=None,
    max_seq_length_override=None,
    prompt=None,
    remove_duplicates=True,
    index_type="FLAT",
    index_params={},
    if_exists="error",
    background=False,
    hidden=False,
    subsample=None
)

Create a new scored memoryset in the OrcaCloud from a datasource.

This is a convenience method that is equivalent to calling create with a datasource. All columns from the datasource that are not specified in the value_column, score_column, source_id_column, or partition_id_column will be stored as metadata in the memoryset.

Parameters:

name (str) –

Name for the new memoryset (must be unique)
datasource (Datasource) –

Source data to populate the memories in the memoryset.
embedding_model (FinetunedEmbeddingModel | PretrainedEmbeddingModel | None, default: None ) –

Embedding model to use for embedding memory values for semantic search. If not provided, a default embedding model for the memoryset will be used.
value_column (str, default: 'value' ) –

Name of the column in the datasource that contains the memory values
score_column (str | None, default: 'score' ) –

Name of the column in the datasource that contains the memory scores. Must contain numerical values. To create a memoryset with all none scores, set to None.
source_id_column (str | None, default: None ) –

Optional name of the column in the datasource that contains the ids in the system of reference
partition_id_column (str | None, default: None ) –

Optional name of the column in the datasource that contains the partition ids
description (str | None, default: None ) –

Optional description for the memoryset, this will be used in agentic flows, so make sure it is concise and describes the contents of your memoryset not the datasource or the embedding model.
max_seq_length_override (int | None, default: None ) –

Maximum sequence length of values in the memoryset, if the value is longer than this it will be truncated, will default to the model’s max sequence length if not provided
prompt (str | None, default: None ) –

Optional prompt to use when embedding documents/memories for storage
remove_duplicates (bool, default: True ) –

Whether to remove duplicates from the datasource before inserting into the memoryset
index_type (IndexType, default: 'FLAT' ) –

Type of vector index to use for the memoryset, defaults to "FLAT". Valid values are "FLAT", "IVF_FLAT", "IVF_SQ8", "IVF_PQ", "HNSW", and "DISKANN".
index_params (dict[str, Any], default: {} ) –

Parameters for the vector index, defaults to {}
if_exists (CreateMode, default: 'error' ) –

What to do if a memoryset with the same name already exists, defaults to "error". Other option is "open" to open the existing memoryset.
background (bool, default: False ) –

Whether to run the operation none blocking and return a job handle.
hidden (bool, default: False ) –

Whether the memoryset should be hidden
subsample (int | float | None, default: None ) –

Optional number (int) of rows to insert or fraction (float in (0, 1]) of the datasource to insert. Use to limit the size of the initial memoryset.

Returns:

Self | Job[Self] –

Handle to the new memoryset in the OrcaCloud

Raises:

ValueError –

If the memoryset already exists and if_exists is "error" or if it is "open" and the params do not match those of the existing memoryset.

orca_sdk.memoryset#

TelemetrySortItem module-attribute #

FilterOperation module-attribute #

FilterValue module-attribute #

FilterItemTuple module-attribute #

MemoryBase #

feedback #

to_dict #

LabeledMemory #

Other Attributes:#

feedback #

update #

predictions #

to_dict #

LabeledMemoryLookup #

Other Attributes:#

feedback #

to_dict #

update #

predictions #

ScoredMemory #

Other Attributes:#

feedback #

update #

predictions #

to_dict #

ScoredMemoryLookup #

Other Attributes:#

feedback #

to_dict #

update #

predictions #

MemorysetBase #

create classmethod #

from_datasource classmethod #

from_hf_dataset classmethod #

from_pytorch classmethod #

from_list classmethod #

from_dict classmethod #

from_pandas classmethod #

from_arrow classmethod #

from_disk classmethod #

open classmethod #

aopen async classmethod #

exists classmethod #

all classmethod #

drop classmethod #

set #

clone #

refresh #

__len__ #

__getitem__ #

search #

query #

to_pandas #

insert #

ainsert async #

arefresh async #

get #

update #

get_cascading_edits_suggestions #

delete #

analyze #

get_potential_duplicate_groups #

LabeledMemoryset #

from_hf_dataset classmethod #

from_pytorch classmethod #

from_list classmethod #

from_dict classmethod #

from_pandas classmethod #

from_arrow classmethod #

from_disk classmethod #

open classmethod #

aopen async classmethod #

exists classmethod #

all classmethod #

drop classmethod #

set #

clone #

refresh #

TelemetrySortItem `module-attribute` #

FilterOperation `module-attribute` #

FilterValue `module-attribute` #

FilterItemTuple `module-attribute` #

create `classmethod` #

from_datasource `classmethod` #

from_hf_dataset `classmethod` #

from_pytorch `classmethod` #

from_list `classmethod` #

from_dict `classmethod` #

from_pandas `classmethod` #

from_arrow `classmethod` #

from_disk `classmethod` #

open `classmethod` #

aopen `async` `classmethod` #

exists `classmethod` #

all `classmethod` #

drop `classmethod` #

len #

getitem #

ainsert `async` #

arefresh `async` #

from_hf_dataset `classmethod` #

from_pytorch `classmethod` #

from_list `classmethod` #

from_dict `classmethod` #

from_pandas `classmethod` #

from_arrow `classmethod` #

from_disk `classmethod` #

open `classmethod` #

aopen `async` `classmethod` #

exists `classmethod` #

all `classmethod` #

drop `classmethod` #

len #

getitem #

ainsert `async` #

arefresh `async` #

create `classmethod` #

from_datasource `classmethod` #

from_hf_dataset `classmethod` #

from_pytorch `classmethod` #

from_list `classmethod` #

from_dict `classmethod` #

from_pandas `classmethod` #

from_arrow `classmethod` #

from_disk `classmethod` #

open `classmethod` #

aopen `async` `classmethod` #

exists `classmethod` #

all `classmethod` #

drop `classmethod` #

len #

getitem #

ainsert `async` #

arefresh `async` #

create `classmethod` #

from_datasource `classmethod` #