
orca_sdk.memoryset#

TelemetrySortItem module-attribute #

TelemetrySortItem = tuple[str, Literal['asc', 'desc']]

Sort expression for telemetry data consisting of a field and a direction.

  • field: The field to sort on.
  • direction: The direction to sort in.

Examples:

>>> ("feedback_metrics.accuracy.avg", "asc")
>>> ("lookup.count", "desc")

FilterOperation module-attribute #

FilterOperation = Literal[
    "==", "!=", ">", ">=", "<", "<=", "in", "not in", "like"
]

Operations that can be used in a filter expression.

FilterValue module-attribute #

FilterValue = (
    str
    | int
    | float
    | bool
    | datetime
    | None
    | list[str]
    | list[int]
    | list[float]
    | list[bool]
)

Values that can be used in a filter expression.

FilterItemTuple module-attribute #

FilterItemTuple = tuple[str, FilterOperation, FilterValue]

Filter expression consisting of a field, an operator, and a value:

  • field: The field to filter on.
  • operation: The operation to apply to the field and value.
  • value: The value to compare the field against.

Examples:

>>> ("label", "==", 0)
>>> ("metadata.author", "like", "John")
>>> ("source_id", "in", ["123", "456"])
>>> ("feedback_metrics.accuracy.avg", ">", 0.95)

LabeledMemory #

Bases: _Memory

A row of the LabeledMemoryset

Attributes:

  • value

    Value represented by the row

  • embedding

    Embedding of the value of the memory for semantic search, automatically generated with the LabeledMemoryset.embedding_model

  • label (int) –

    Class label of the memory

  • label_name (str | None) –

    Human-readable name of the label, automatically populated from the LabeledMemoryset.label_names

  • source_id (str | None) –

    Optional unique identifier of the memory in a system of reference

  • metrics (str | None) –

    Metrics about the memory, generated when running an analysis on the LabeledMemoryset

  • metadata (str | None) –

    Metadata associated with the memory that is not used in the model. Metadata properties are also accessible as individual attributes on the instance.

  • memory_id (str | None) –

    Unique identifier for the memory, automatically generated on insert

  • memory_version (str | None) –

    Version of the memory, automatically updated when the label or value changes

  • created_at (str | None) –

    When the memory was created, automatically generated on insert

  • updated_at (str | None) –

    When the memory was last updated, automatically updated on update

Other Attributes:#

  • ... (str | float | int | bool | None): All metadata properties can be accessed as attributes

update #

update(
    *, value=UNSET, label=UNSET, source_id=UNSET, **metadata
)

Update the memory with new values

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

  • value (str, default: UNSET ) –

    New value of the memory

  • label (int, default: UNSET ) –

    New label of the memory

  • source_id (str | None, default: UNSET ) –

    New source ID of the memory

  • **metadata (None | bool | float | int | str, default: {} ) –

    New values for metadata properties

Returns:

  • LabeledMemory –

    The updated memory
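
Examples:

A minimal sketch, assuming a memory handle from an existing memoryset and an illustrative tag metadata property:

>>> memory = memoryset[0]
>>> memory.update(label=1, tag="happy")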

LabeledMemoryLookup #

Bases: LabeledMemory

Lookup result for a memory in a memoryset

Attributes:

  • lookup_score (float) –

    Similarity between the memory embedding and search query embedding

  • attention_weight (float | None) –

    Weight the model assigned to the memory during prediction if this lookup happened as part of a prediction

  • value

    Value represented by the row

  • embedding

    Embedding of the value of the memory for semantic search, automatically generated with the LabeledMemoryset.embedding_model

  • label (int) –

    Class label of the memory

  • label_name (str | None) –

    Human-readable name of the label, automatically populated from the LabeledMemoryset.label_names

  • source_id (str | None) –

    Optional unique identifier of the memory in a system of reference

  • metrics (str | None) –

    Metrics about the memory, generated when running an analysis on the LabeledMemoryset

  • metadata (str | None) –

    Metadata associated with the memory that is not used in the model. Metadata properties are also accessible as individual attributes on the instance.

  • memory_id (str | None) –

    Unique identifier for the memory, automatically generated on insert

  • memory_version (str | None) –

    Version of the memory, automatically updated when the label or value changes

  • created_at (str | None) –

    When the memory was created, automatically generated on insert

  • updated_at (str | None) –

    When the memory was last updated, automatically updated on update

Other Attributes:#

  • ... (str | float | int | bool | None): All metadata properties can be accessed as attributes

update #

update(
    *, value=UNSET, label=UNSET, source_id=UNSET, **metadata
)

Update the memory with new values

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

  • value (str, default: UNSET ) –

    New value of the memory

  • label (int, default: UNSET ) –

    New label of the memory

  • source_id (str | None, default: UNSET ) –

    New source ID of the memory

  • **metadata (None | bool | float | int | str, default: {} ) –

    New values for metadata properties

Returns:

  • LabeledMemory –

    The updated memory

ScoredMemory #

Bases: _Memory

A row of the ScoredMemoryset

Attributes:

  • value

    Value represented by the row

  • embedding

    Embedding of the value of the memory for semantic search, automatically generated with the ScoredMemoryset.embedding_model

  • score (float) –

    Score of the memory

  • source_id (str | None) –

    Optional unique identifier of the memory in a system of reference

  • metrics (str | None) –

    Metrics about the memory, generated when running an analysis on the ScoredMemoryset

  • metadata (str | None) –

    Metadata associated with the memory that is not used in the model. Metadata properties are also accessible as individual attributes on the instance.

  • memory_id (str | None) –

    Unique identifier for the memory, automatically generated on insert

  • memory_version (str | None) –

    Version of the memory, automatically updated when the score or value changes

  • created_at (str | None) –

    When the memory was created, automatically generated on insert

  • updated_at (str | None) –

    When the memory was last updated, automatically updated on update

Other Attributes:#

  • ... (str | float | int | bool | None): All metadata properties can be accessed as attributes

update #

update(
    *, value=UNSET, score=UNSET, source_id=UNSET, **metadata
)

Update the memory with new values

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

  • value (str, default: UNSET ) –

    New value of the memory

  • score (float, default: UNSET ) –

    New score of the memory

  • source_id (str | None, default: UNSET ) –

    New source ID of the memory

  • **metadata (None | bool | float | int | str, default: {} ) –

    New values for metadata properties

Returns:

  • ScoredMemory –

    The updated memory
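
Examples:

A minimal sketch, assuming a memory handle from an existing memoryset and an illustrative tag metadata property:

>>> memory = memoryset[0]
>>> memory.update(score=0.8, tag="happy")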

ScoredMemoryLookup #

Bases: ScoredMemory

Lookup result for a memory in a memoryset

Attributes:

  • lookup_score (float) –

    Similarity between the memory embedding and search query embedding

  • attention_weight (float | None) –

    Weight the model assigned to the memory during prediction if this lookup happened as part of a prediction

  • value

    Value represented by the row

  • embedding

    Embedding of the value of the memory for semantic search, automatically generated with the ScoredMemoryset.embedding_model

  • score (float) –

    Score of the memory

  • source_id (str | None) –

    Optional unique identifier of the memory in a system of reference

  • metrics (str | None) –

    Metrics about the memory, generated when running an analysis on the ScoredMemoryset

  • memory_id (str | None) –

    Unique identifier for the memory, automatically generated on insert

  • memory_version (str | None) –

    Version of the memory, automatically updated when the score or value changes

  • created_at (str | None) –

    When the memory was created, automatically generated on insert

  • updated_at (str | None) –

    When the memory was last updated, automatically updated on update

Other Attributes:#

  • ... (str | float | int | bool | None): All metadata properties can be accessed as attributes

update #

update(
    *, value=UNSET, score=UNSET, source_id=UNSET, **metadata
)

Update the memory with new values

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

  • value (str, default: UNSET ) –

    New value of the memory

  • score (float, default: UNSET ) –

    New score of the memory

  • source_id (str | None, default: UNSET ) –

    New source ID of the memory

  • **metadata (None | bool | float | int | str, default: {} ) –

    New values for metadata properties

Returns:

  • ScoredMemory –

    The updated memory

LabeledMemoryset #

Bases: _Memoryset[LabeledMemory, LabeledMemoryLookup]

A Handle to a collection of memories with labels in the OrcaCloud

Attributes:

  • id

    Unique identifier for the memoryset

  • name

    Unique name of the memoryset

  • description

    Description of the memoryset

  • label_names (list[str]) –

    Names for the class labels in the memoryset

  • length

    Number of memories in the memoryset

  • embedding_model

    Embedding model used to embed the memory values for semantic search

  • created_at

    When the memoryset was created, automatically generated on create

  • updated_at

    When the memoryset was last updated, automatically updated on updates

from_hf_dataset classmethod #

from_hf_dataset(
    name: str,
    hf_dataset: Dataset,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_hf_dataset(
    name: str,
    hf_dataset: Dataset,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_hf_dataset(
    name, hf_dataset, background=False, **kwargs
)

Create a new memoryset from a Hugging Face Dataset in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All features that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • hf_dataset (Dataset) –

    Hugging Face dataset to create the memoryset from

  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud
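
Examples:

A minimal sketch, assuming a small in-memory dataset with value and label columns:

>>> from datasets import Dataset
>>> hf_dataset = Dataset.from_dict({"value": ["hello", "world"], "label": [0, 1]})
>>> LabeledMemoryset.from_hf_dataset("my_memoryset", hf_dataset)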

from_pytorch classmethod #

from_pytorch(
    name: str,
    torch_data: DataLoader | Dataset,
    *,
    column_names: list[str] | None = None,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_pytorch(
    name: str,
    torch_data: DataLoader | Dataset,
    *,
    column_names: list[str] | None = None,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_pytorch(
    name,
    torch_data,
    *,
    column_names=None,
    background=False,
    **kwargs
)

Create a new memoryset from a PyTorch DataLoader or Dataset in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All properties that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • torch_data (DataLoader | Dataset) –

    PyTorch data loader or dataset to create the memoryset from

  • column_names (list[str] | None, default: None ) –

    If the provided dataset or data loader returns unnamed tuples, this argument must be provided to specify the names of the columns.

  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud
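
Examples:

A minimal sketch, assuming the dataset yields unnamed (value, label) tuples, which is why column_names is passed:

>>> from torch.utils.data import DataLoader
>>> loader = DataLoader([("hello", 0), ("world", 1)], batch_size=1)
>>> LabeledMemoryset.from_pytorch(
...     "my_memoryset", loader, column_names=["value", "label"]
... )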

from_list classmethod #

from_list(
    name: str,
    data: list[dict],
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_list(
    name: str,
    data: list[dict],
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_list(name, data, *, background=False, **kwargs)

Create a new memoryset from a list of dictionaries in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All properties that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • data (list[dict]) –

    List of dictionaries to create the memoryset from

  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud

Examples:

>>> LabeledMemoryset.from_list("my_memoryset", [
...     {"value": "hello", "label": 0, "tag": "tag1"},
...     {"value": "world", "label": 1, "tag": "tag2"},
... ])

from_dict classmethod #

from_dict(
    name: str,
    data: dict,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_dict(
    name: str,
    data: dict,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_dict(name, data, *, background=False, **kwargs)

Create a new memoryset from a dictionary of columns in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns from the datasource that are not specified in the value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • data (dict) –

    Dictionary of columns to create the memoryset from

  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud

Examples:

>>> LabeledMemoryset.from_dict("my_memoryset", {
...     "value": ["hello", "world"],
...     "label": [0, 1],
...     "tag": ["tag1", "tag2"],
... })

from_pandas classmethod #

from_pandas(
    name: str,
    dataframe: DataFrame,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_pandas(
    name: str,
    dataframe: DataFrame,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_pandas(name, dataframe, *, background=False, **kwargs)

Create a new memoryset from a pandas DataFrame in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • dataframe (DataFrame) –

    Dataframe to create the memoryset from

  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud
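
Examples:

A minimal sketch, assuming value and label columns:

>>> import pandas as pd
>>> df = pd.DataFrame({"value": ["hello", "world"], "label": [0, 1]})
>>> LabeledMemoryset.from_pandas("my_memoryset", df)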

from_arrow classmethod #

from_arrow(
    name: str,
    pyarrow_table: Table,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_arrow(
    name: str,
    pyarrow_table: Table,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_arrow(
    name, pyarrow_table, *, background=False, **kwargs
)

Create a new memoryset from a PyArrow Table in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • pyarrow_table (Table) –

    PyArrow table to create the memoryset from

  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud
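
Examples:

A minimal sketch, assuming value and label columns:

>>> import pyarrow as pa
>>> table = pa.table({"value": ["hello", "world"], "label": [0, 1]})
>>> LabeledMemoryset.from_arrow("my_memoryset", table)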

from_disk classmethod #

from_disk(
    name: str,
    file_path: str | PathLike,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_disk(
    name: str,
    file_path: str | PathLike,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_disk(name, file_path, *, background=False, **kwargs)

Create a new memoryset from a file on disk in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns from the datasource that are not specified in the value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • file_path (str | PathLike) –

    Path to the file on disk to create the memoryset from. The file type will be inferred from the file extension. The following file types are supported:

    • .pkl: Pickle files containing lists of dictionaries or dictionaries of columns
    • .json/.jsonl: JSON and JSON Lines files
    • .csv: CSV files
    • .parquet: Parquet files
    • dataset directory: Directory containing a saved HuggingFace Dataset
  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud
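
Examples:

A minimal sketch; the file path is illustrative:

>>> LabeledMemoryset.from_disk("my_memoryset", "./my_data.csv")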

open classmethod #

open(name)

Get a handle to a memoryset in the OrcaCloud

Parameters:

  • name (str) –

    Name or unique identifier of the memoryset

Returns:

  • Self

    Handle to the existing memoryset in the OrcaCloud

Raises:

  • LookupError

    If no memoryset with the given name or id exists
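
Examples:

>>> memoryset = LabeledMemoryset.open("my_memoryset")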

exists classmethod #

exists(name_or_id)

Check if a memoryset exists in the OrcaCloud

Parameters:

  • name_or_id (str) –

    Name or id of the memoryset

Returns:

  • bool

    True if the memoryset exists, False otherwise
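
Examples:

>>> LabeledMemoryset.exists("my_memoryset")
True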

all classmethod #

all()

Get a list of handles to all memorysets in the OrcaCloud

Returns:

  • list[Self]

    List of handles to all memorysets in the OrcaCloud

drop classmethod #

drop(name_or_id, if_not_exists='error')

Delete a memoryset from the OrcaCloud

Parameters:

  • name_or_id (str) –

    Name or id of the memoryset

  • if_not_exists (DropMode, default: 'error' ) –

    What to do if the memoryset does not exist, defaults to "error". The other option is "ignore", which does nothing if the memoryset does not exist.

Raises:

  • LookupError

    If the memoryset does not exist and if_not_exists is "error"
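
Examples:

>>> LabeledMemoryset.drop("my_memoryset", if_not_exists="ignore")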

set #

set(*, name=UNSET, description=UNSET, label_names=UNSET)

Update editable attributes of the memoryset

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

  • description (str | None, default: UNSET ) –

    Value to set for the description

  • name (str, default: UNSET ) –

    Value to set for the name

  • label_names (list[str], default: UNSET ) –

    Value to replace existing label names with
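
Examples:

A minimal sketch; the values are illustrative:

>>> memoryset.set(description="Memories for sentiment analysis")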

clone #

clone(
    name: str,
    *,
    embedding_model: (
        PretrainedEmbeddingModel
        | FinetunedEmbeddingModel
        | None
    ) = None,
    max_seq_length_override: int | None = None,
    if_exists: CreateMode = "error",
    background: Literal[True]
) -> Job[Self]
clone(
    name: str,
    *,
    embedding_model: (
        PretrainedEmbeddingModel
        | FinetunedEmbeddingModel
        | None
    ) = None,
    max_seq_length_override: int | None = None,
    if_exists: CreateMode = "error",
    background: Literal[False] = False
) -> Self
clone(
    name,
    *,
    embedding_model=None,
    max_seq_length_override=None,
    if_exists="error",
    background=False
)

Create a clone of the memoryset with a new name

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • embedding_model (PretrainedEmbeddingModel | FinetunedEmbeddingModel | None, default: None ) –

    Optional new embedding model to use for re-embedding the memory values

  • max_seq_length_override (int | None, default: None ) –

    Optional maximum sequence length to use when re-embedding; if a value is longer than this it will be truncated, defaults to the model's max sequence length if not provided

  • if_exists (CreateMode, default: 'error' ) –

    What to do if a memoryset with the same name already exists, defaults to "error". The other option is "open", which opens the existing memoryset.

  • background (bool, default: False ) –

    Whether to run the operation in the background

Returns:

  • Self | Job[Self]

    Handle to the cloned memoryset in the OrcaCloud

Examples:

>>> memoryset = LabeledMemoryset.open("my_memoryset")
>>> finetuned_embedding_model = PretrainedEmbeddingModel.GTE_BASE.finetune(
...     "gte_base_finetuned", my_memoryset
... )
>>> new_memoryset = memoryset.clone(
...     "my_memoryset_finetuned", embedding_model=finetuned_embedding_model,
... )

refresh #

refresh(throttle=0)

Refresh the information about the memoryset from the OrcaCloud

Parameters:

  • throttle (float, default: 0 ) –

    Minimum time in seconds between refreshes

__len__ #

__len__()

Get the number of memories in the memoryset

__getitem__ #

__getitem__(index: int | str) -> MemoryT
__getitem__(index: slice) -> list[MemoryT]
__getitem__(index)

Get memories from the memoryset by index or memory id

Parameters:

  • index (int | slice | str) –

    Index of the memory to retrieve, slice of memories to retrieve, or unique identifier of the memory to retrieve

Returns:

  • MemoryT | list[MemoryT]

    Memory or memories from the memoryset

Raises:

  • LookupError

    If the id is not found or the index is out of bounds

Examples:

Retrieve the first memory in the memoryset:

>>> memoryset[0]
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

Retrieve the last memory in the memoryset:

>>> memoryset[-1]
LabeledMemory({ label: <negative: 0>, value: 'I am sad' })

Retrieve a slice of memories in the memoryset:

>>> memoryset[1:3]
[
    LabeledMemory({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemory({ label: <negative: 0>, value: 'I am sad' }),
]

Retrieve a memory by id:

>>> memoryset["0195019a-5bc7-7afb-b902-5945ee1fb766"]
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

search #

search(
    query: str, *, count: int = 1
) -> list[MemoryLookupT]
search(
    query: list[str], *, count: int = 1
) -> list[list[MemoryLookupT]]
search(query, *, count=1)

Search for memories that are semantically similar to the query

Parameters:

  • query (str | list[str]) –

    Query to look up memories in the memoryset; can be a single query or a list of queries

  • count (int, default: 1 ) –

    Number of memories to return for each query

Returns:

  • list[MemoryLookupT] | list[list[MemoryLookupT]]

    List of memories from the memoryset that match the query. If a single query is provided, the return value is a single list of memories. If a list of queries is provided, the return value is a list of lists of memories, one per query.

Examples:

Search for similar memories:

>>> memoryset.search("I am happy", count=2)
[
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am content' }),
]

Search for similar memories for multiple queries:

>>> memoryset.search(["I am happy", "I am sad"], count=1)
[
    [
        LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    ],
    [
        LabeledMemoryLookup({ label: <negative: 0>, value: 'I am sad' }),
    ],
]

query #

query(
    offset=0,
    limit=100,
    filters=[],
    with_feedback_metrics=False,
    sort=None,
)

Query the memoryset for memories that match the filters

Parameters:

  • offset (int, default: 0 ) –

    The offset of the first memory to return

  • limit (int, default: 100 ) –

    The maximum number of memories to return

  • filters (list[FilterItemTuple], default: [] ) –

    List of filters to apply to the query.

  • with_feedback_metrics (bool, default: False ) –

    Whether to include feedback metrics in the response

  • sort (list[TelemetrySortItem] | None, default: None ) –

    Optional list of (field, direction) tuples to sort the results by

Returns:

  • list[MemoryT]

    List of memories from the memoryset that match the filters

Examples:

>>> memoryset.query(filters=[("label", "==", 0)], limit=2)
[
    LabeledMemory({ label: <negative: 0>, value: "I am sad" }),
    LabeledMemory({ label: <negative: 0>, value: "I am upset" }),
]

insert #

insert(items)

Insert memories into the memoryset

Parameters:

  • items (Iterable[dict[str, Any]] | dict[str, Any]) –

    List of memories to insert into the memoryset. This should be a list of dictionaries with the following keys:

    • value: Value of the memory
    • label: Label of the memory (for labeled memorysets)
    • score: Score of the memory (for scored memorysets)
    • source_id: Optional unique ID of the memory in a system of reference
    • ...: Any other metadata to store for the memory

Examples:

>>> memoryset.insert([
...     {"value": "I am happy", "label": 1, "source_id": "user_123", "tag": "happy"},
...     {"value": "I am sad", "label": 0, "source_id": "user_124", "tag": "sad"},
... ])

get #

get(memory_id: str) -> MemoryT
get(memory_id: Iterable[str]) -> list[MemoryT]
get(memory_id)

Fetch a memory or memories from the memoryset

Parameters:

  • memory_id (str | Iterable[str]) –

    Unique identifier of the memory or memories to fetch

Returns:

  • MemoryT | list[MemoryT]

    Memory or list of memories from the memoryset

Raises:

  • LookupError

    If no memory with the given id is found

Examples:

Fetch a single memory:

>>> memoryset.get("0195019a-5bc7-7afb-b902-5945ee1fb766")
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

Fetch multiple memories:

>>> memoryset.get([
...     "0195019a-5bc7-7afb-b902-5945ee1fb766",
...     "019501a1-ea08-76b2-9f62-95e4800b4841",
... ])
[
    LabeledMemory({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemory({ label: <negative: 0>, value: 'I am sad' }),
]

update #

update(updates: dict[str, Any]) -> MemoryT
update(updates: Iterable[dict[str, Any]]) -> list[MemoryT]
update(updates)

Update one or multiple memories in the memoryset

Parameters:

  • updates (dict[str, Any] | Iterable[dict[str, Any]]) –

    List of updates to apply to the memories. Each update should be a dictionary with the following keys:

    • memory_id: Unique identifier of the memory to update (required)
    • value: Optional new value of the memory
    • label: Optional new label of the memory
    • source_id: Optional new source ID of the memory
    • ...: Optional new values for metadata properties

Returns:

  • MemoryT | list[MemoryT]

    Updated memory or list of updated memories

Examples:

Update a single memory:

>>> memoryset.update(
...     {
...         "memory_id": "019501a1-ea08-76b2-9f62-95e4800b4841",
...         "tag": "happy",
...     },
... )

Update multiple memories:

>>> memoryset.update(
...     {"memory_id": m.memory_id, "label": 2}
...     for m in memoryset.query(filters=[("tag", "==", "happy")])
... )

get_cascading_edits_suggestions #

get_cascading_edits_suggestions(
    memory,
    *,
    old_label,
    new_label,
    max_neighbors=50,
    max_validation_neighbors=10,
    similarity_threshold=None,
    only_if_has_old_label=True,
    exclude_if_new_label=True,
    suggestion_cooldown_time=3600.0 * 24.0,
    label_confirmation_cooldown_time=3600.0 * 24.0 * 7
)

Suggests cascading edits for a given memory based on nearby points with similar labels.

This function is triggered after a user changes a memory’s label. It looks for nearby candidates in embedding space that may be subject to similar relabeling and returns them as suggestions. The system uses scoring heuristics, label filters, and cooldown tracking to reduce noise and improve usability.

Parameters:

  • memory (MemoryT) –

    The memory whose label was just changed.

  • old_label (int) –

    The label this memory used to have.

  • new_label (int) –

    The label it was changed to.

  • max_neighbors (int, default: 50 ) –

    Maximum number of neighbors to consider.

  • max_validation_neighbors (int, default: 10 ) –

    Maximum number of neighbors to use for label suggestion.

  • similarity_threshold (float | None, default: None ) –

    If set, only include neighbors with a lookup score above this threshold.

  • only_if_has_old_label (bool, default: True ) –

    If True, only consider neighbors that have the old label.

  • exclude_if_new_label (bool, default: True ) –

    If True, exclude neighbors that already have the new label.

  • suggestion_cooldown_time (float, default: 3600.0 * 24.0 ) –

    Minimum time (in seconds) since the last suggestion for a neighbor to be considered again.

  • label_confirmation_cooldown_time (float, default: 3600.0 * 24.0 * 7 ) –

    Minimum time (in seconds) since a neighbor’s label was confirmed to be considered for suggestions.

  • _current_time

    Optional override for the current timestamp (useful for testing).

Returns:

  • list[CascadingEditSuggestion]

    A list of CascadingEditSuggestion objects, each containing a neighbor and the suggested new label.
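
Examples:

A minimal sketch, assuming a memory handle whose label was just changed from 0 to 1:

>>> suggestions = memoryset.get_cascading_edits_suggestions(
...     memory, old_label=0, new_label=1, similarity_threshold=0.9
... )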

delete #

delete(memory_id)

Delete memories from the memoryset

Parameters:

  • memory_id (str | Iterable[str]) –

    Unique identifier or identifiers of the memories to delete

Examples:

Delete a single memory:

>>> memoryset.delete("0195019a-5bc7-7afb-b902-5945ee1fb766")

Delete multiple memories:

>>> memoryset.delete([
...     "0195019a-5bc7-7afb-b902-5945ee1fb766",
...     "019501a1-ea08-76b2-9f62-95e4800b4841",
... ])

analyze #

analyze(
    *analyses: Iterable[dict[str, Any] | str],
    lookup_count: int = 15,
    clear_metrics: bool = False,
    background: Literal[True]
) -> Job[dict]
analyze(
    *analyses: Iterable[dict[str, Any] | str],
    lookup_count: int = 15,
    clear_metrics: bool = False,
    background: Literal[False] = False
) -> dict
analyze(
    *analyses,
    lookup_count=15,
    clear_metrics=False,
    background=False
)

Run analyses on the memoryset to find duplicates, clusters, mislabelings, and more

The results of the analysis will be stored in the LabeledMemory.metrics attribute of each memory in the memoryset. Overall memoryset metrics will be returned as a dictionary.

Parameters:

  • analyses (Iterable[dict[str, Any] | str], default: () ) –

    List of analyses to run on the memoryset; each can be just the name of an analysis or a dictionary with a name property and additional config. The available analyses are:

    • "duplicate": Find potentially duplicate memories in the memoryset
    • "cluster": Cluster the memories in the memoryset
    • "label": Analyze the labels to find potential mislabelings
    • "neighbor": Analyze the neighbors to populate anomaly scores
    • "projection": Create a 2D projection of the embeddings for visualization
  • lookup_count (int, default: 15 ) –

    Number of memories to look up for each memory in the memoryset

  • clear_metrics (bool, default: False ) –

    Whether to clear any existing metrics from the memories before running the analysis

  • background (bool, default: False ) –

    Whether to run the operation in the background

Returns:

  • Job[dict] | dict

    Dictionary with aggregate metrics for each analysis that was run

Raises:

  • ValueError

    If an invalid analysis name is provided

Examples:

Run label and duplicate analysis:

>>> memoryset.analyze("label", {"name": "duplicate", "possible_duplicate_threshold": 0.99})
{ "duplicate": { "num_duplicates": 10 },
  "label": {
    "label_metrics": [{
        "label": 0,
        "label_name": "negative",
        "average_lookup_score": 0.95,
        "memory_count": 100,
    }, {
        "label": 1,
        "label_name": "positive",
        "average_lookup_score": 0.90,
        "memory_count": 100,
    }],
    "neighbor_prediction_accuracy": 0.95,
    "mean_neighbor_label_confidence": 0.95,
    "mean_neighbor_label_entropy": 0.95,
    "mean_neighbor_predicted_label_ambiguity": 0.95,
  }
}

Remove all exact duplicates:

>>> memoryset.delete(
...     m.memory_id
...     for m in memoryset.query(
...         filters=[("metrics.is_duplicate", "==", True)]
...     )
... )

Display label analysis to review potential mislabelings:

>>> memoryset.display_label_analysis()

get_potential_duplicate_groups #

get_potential_duplicate_groups()

Group potential duplicates in the memoryset
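
Examples:

A minimal sketch; typically run after a "duplicate" analysis:

>>> memoryset.get_potential_duplicate_groups()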

run_embedding_evaluation staticmethod #

run_embedding_evaluation(
    datasource: Datasource,
    *,
    value_column: str = "value",
    label_column: str = "label",
    source_id_column: str | None = None,
    neighbor_count: int = 5,
    embedding_models: list[str] | None = None,
    background: Literal[True]
) -> Job[dict]
run_embedding_evaluation(
    datasource: Datasource,
    *,
    value_column: str = "value",
    label_column: str = "label",
    source_id_column: str | None = None,
    neighbor_count: int = 5,
    embedding_models: list[str] | None = None,
    background: Literal[False] = False
) -> dict
run_embedding_evaluation(
    datasource,
    *,
    value_column="value",
    label_column="label",
    source_id_column=None,
    neighbor_count=5,
    embedding_models=None,
    background=False
)

Test the quality of embeddings for the datasource by computing metrics such as prediction accuracy.

Parameters:

  • datasource (Datasource) –

    The datasource to run the embedding evaluation on

  • value_column (str, default: 'value' ) –

    Name of the column in the datasource that contains the memory values

  • label_column (str, default: 'label' ) –

    Name of the column in the datasource that contains the memory labels; these must be contiguous integers starting from 0

  • source_id_column (str | None, default: None ) –

    Optional name of the column in the datasource that contains the ids in the system of reference

  • neighbor_count (int, default: 5 ) –

    The number of neighbors to select for prediction

  • embedding_models (list[str] | None, default: None ) –

    Optional list of embedding model keys to evaluate; if not provided, all available embedding models will be used

  • background (bool, default: False ) –

    Whether to run the operation in the background

Returns:

  • Job[dict] | dict

    A dictionary containing the results of the embedding evaluation
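
Examples:

A minimal sketch, assuming an existing Datasource handle with value and label columns:

>>> results = LabeledMemoryset.run_embedding_evaluation(
...     datasource, neighbor_count=5
... )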

display_label_analysis #

display_label_analysis()

Display a UI to review and act upon the label analysis results

ScoredMemoryset #

Bases: _Memoryset[ScoredMemory, ScoredMemoryLookup]

A Handle to a collection of memories with scores in the OrcaCloud

Attributes:

  • id

    Unique identifier for the memoryset

  • name

    Unique name of the memoryset

  • description

    Description of the memoryset

  • length

    Number of memories in the memoryset

  • embedding_model

    Embedding model used to embed the memory values for semantic search

  • created_at

    When the memoryset was created, automatically generated on create

  • updated_at

    When the memoryset was last updated, automatically updated on updates

from_hf_dataset classmethod #

from_hf_dataset(
    name: str,
    hf_dataset: Dataset,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_hf_dataset(
    name: str,
    hf_dataset: Dataset,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_hf_dataset(
    name, hf_dataset, background=False, **kwargs
)

Create a new memoryset from a Hugging Face Dataset in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All features that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • hf_dataset (Dataset) –

    Hugging Face dataset to create the memoryset from

  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud
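
Examples:

A minimal sketch, assuming a small in-memory dataset with value and score columns (which columns are used is governed by the create parameters):

>>> from datasets import Dataset
>>> hf_dataset = Dataset.from_dict({"value": ["hello", "world"], "score": [0.2, 0.9]})
>>> ScoredMemoryset.from_hf_dataset("my_memoryset", hf_dataset)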

from_pytorch classmethod #

from_pytorch(
    name: str,
    torch_data: DataLoader | Dataset,
    *,
    column_names: list[str] | None = None,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_pytorch(
    name: str,
    torch_data: DataLoader | Dataset,
    *,
    column_names: list[str] | None = None,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_pytorch(
    name,
    torch_data,
    *,
    column_names=None,
    background=False,
    **kwargs
)

Create a new memoryset from a PyTorch DataLoader or Dataset in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All properties that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • torch_data (DataLoader | Dataset) –

    PyTorch data loader or dataset to create the memoryset from

  • column_names (list[str] | None, default: None ) –

    If the provided dataset or data loader returns unnamed tuples, this argument must be provided to specify the names of the columns.

  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud

from_list classmethod #

from_list(
    name: str,
    data: list[dict],
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_list(
    name: str,
    data: list[dict],
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_list(name, data, *, background=False, **kwargs)

Create a new memoryset from a list of dictionaries in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All properties that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • data (list[dict]) –

    List of dictionaries to create the memoryset from

  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud

Examples:

>>> ScoredMemoryset.from_list("my_memoryset", [
...     {"value": "hello", "score": 0.2, "tag": "tag1"},
...     {"value": "world", "score": 0.9, "tag": "tag2"},
... ])

from_dict classmethod #

from_dict(
    name: str,
    data: dict,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_dict(
    name: str,
    data: dict,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_dict(name, data, *, background=False, **kwargs)

Create a new memoryset from a dictionary of columns in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns from the datasource that are not specified in the value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • data (dict) –

    Dictionary of columns to create the memoryset from

  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud

Examples:

>>> ScoredMemoryset.from_dict("my_memoryset", {
...     "value": ["hello", "world"],
...     "score": [0.2, 0.9],
...     "tag": ["tag1", "tag2"],
... })

from_pandas classmethod #

from_pandas(
    name: str,
    dataframe: DataFrame,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_pandas(
    name: str,
    dataframe: DataFrame,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_pandas(name, dataframe, *, background=False, **kwargs)

Create a new memoryset from a pandas DataFrame in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • dataframe (DataFrame) –

    Dataframe to create the memoryset from

  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud

from_arrow classmethod #

from_arrow(
    name: str,
    pyarrow_table: Table,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_arrow(
    name: str,
    pyarrow_table: Table,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_arrow(
    name, pyarrow_table, *, background=False, **kwargs
)

Create a new memoryset from a PyArrow Table in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns that are not specified to be used as value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • pyarrow_table (Table) –

    PyArrow table to create the memoryset from

  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud

from_disk classmethod #

from_disk(
    name: str,
    file_path: str | PathLike,
    *,
    background: Literal[True],
    **kwargs: Any
) -> Job[Self]
from_disk(
    name: str,
    file_path: str | PathLike,
    *,
    background: Literal[False] = False,
    **kwargs: Any
) -> Self
from_disk(name, file_path, *, background=False, **kwargs)

Create a new memoryset from a file on disk in the OrcaCloud

This will automatically create a Datasource with the same name appended with _datasource and use that as the datasource for the memoryset.

All columns from the datasource that are not specified in the value_column, label_column, or source_id_column will be stored as metadata in the memoryset.

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • file_path (str | PathLike) –

    Path to the file on disk to create the memoryset from. The file type will be inferred from the file extension. The following file types are supported:

    • .pkl: Pickle files containing lists of dictionaries or dictionaries of columns
    • .json/.jsonl: JSON and JSON Lines files
    • .csv: CSV files
    • .parquet: Parquet files
    • dataset directory: Directory containing a saved HuggingFace Dataset
  • background (bool, default: False ) –

    Whether to run the operation in the background

  • kwargs (Any, default: {} ) –

    Additional parameters for creating the memoryset. See [create][orca_sdk.LabeledMemoryset.create] attributes for details.

Returns:

  • Self | Job[Self]

    Handle to the new memoryset in the OrcaCloud

open classmethod #

open(name)

Get a handle to a memoryset in the OrcaCloud

Parameters:

  • name (str) –

    Name or unique identifier of the memoryset

Returns:

  • Self

    Handle to the existing memoryset in the OrcaCloud

Raises:

  • LookupError

    If no memoryset with the given name or id exists

exists classmethod #

exists(name_or_id)

Check if a memoryset exists in the OrcaCloud

Parameters:

  • name_or_id (str) –

    Name or id of the memoryset

Returns:

  • bool

    True if the memoryset exists, False otherwise

all classmethod #

all()

Get a list of handles to all memorysets in the OrcaCloud

Returns:

  • list[Self]

    List of handles to all memorysets in the OrcaCloud

drop classmethod #

drop(name_or_id, if_not_exists='error')

Delete a memoryset from the OrcaCloud

Parameters:

  • name_or_id (str) –

    Name or id of the memoryset

  • if_not_exists (DropMode, default: 'error' ) –

    What to do if the memoryset does not exist, defaults to "error". The other option is "ignore", which does nothing if the memoryset does not exist.

Raises:

  • LookupError

    If the memoryset does not exist and if_not_exists is "error"

set #

set(*, name=UNSET, description=UNSET)

Update editable attributes of the memoryset

Note

If a field is not provided, it will default to UNSET and not be updated.

Parameters:

  • description (str | None, default: UNSET ) –

    Value to set for the description

  • name (str, default: UNSET ) –

    Value to set for the name


clone #

clone(
    name: str,
    *,
    embedding_model: (
        PretrainedEmbeddingModel
        | FinetunedEmbeddingModel
        | None
    ) = None,
    max_seq_length_override: int | None = None,
    if_exists: CreateMode = "error",
    background: Literal[True]
) -> Job[Self]
clone(
    name: str,
    *,
    embedding_model: (
        PretrainedEmbeddingModel
        | FinetunedEmbeddingModel
        | None
    ) = None,
    max_seq_length_override: int | None = None,
    if_exists: CreateMode = "error",
    background: Literal[False] = False
) -> Self
clone(
    name,
    *,
    embedding_model=None,
    max_seq_length_override=None,
    if_exists="error",
    background=False
)

Create a clone of the memoryset with a new name

Parameters:

  • name (str) –

    Name for the new memoryset (must be unique)

  • embedding_model (PretrainedEmbeddingModel | FinetunedEmbeddingModel | None, default: None ) –

    Optional new embedding model to use for re-embedding the memory values

  • max_seq_length_override (int | None, default: None ) –

    Optional maximum sequence length to use when re-embedding; if a value is longer than this it will be truncated, defaults to the model's max sequence length if not provided

  • if_exists (CreateMode, default: 'error' ) –

    What to do if a memoryset with the same name already exists, defaults to "error". The other option is "open", which opens the existing memoryset.

  • background (bool, default: False ) –

    Whether to run the operation in the background

Returns:

  • Self | Job[Self]

    Handle to the cloned memoryset in the OrcaCloud

Examples:

>>> memoryset = LabeledMemoryset.open("my_memoryset")
>>> finetuned_embedding_model = PretrainedEmbeddingModel.GTE_BASE.finetune(
...     "gte_base_finetuned", my_memoryset
... )
>>> new_memoryset = memoryset.clone(
...     "my_memoryset_finetuned", embedding_model=finetuned_embedding_model,
... )

refresh #

refresh(throttle=0)

Refresh the information about the memoryset from the OrcaCloud

Parameters:

  • throttle (float, default: 0 ) –

    Minimum time in seconds between refreshes

__len__ #

__len__()

Get the number of memories in the memoryset

__getitem__ #

__getitem__(index: int | str) -> MemoryT
__getitem__(index: slice) -> list[MemoryT]
__getitem__(index)

Get memories from the memoryset by index or memory id

Parameters:

  • index (int | slice | str) –

    Index of the memory to retrieve, slice of memories to retrieve, or unique identifier of the memory to retrieve

Returns:

  • MemoryT | list[MemoryT]

    Memory or memories from the memoryset

Raises:

  • LookupError

    If the id is not found or the index is out of bounds

Examples:

Retrieve the first memory in the memoryset:

>>> memoryset[0]
ScoredMemory({ score: 0.95, value: 'I am happy' })

Retrieve the last memory in the memoryset:

>>> memoryset[-1]
ScoredMemory({ score: 0.10, value: 'I am sad' })

Retrieve a slice of memories in the memoryset:

>>> memoryset[1:3]
[
    ScoredMemory({ score: 0.95, value: 'I am happy' }),
    ScoredMemory({ score: 0.10, value: 'I am sad' }),
]

Retrieve a memory by id:

>>> memoryset["0195019a-5bc7-7afb-b902-5945ee1fb766"]
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

search #

search(
    query: str, *, count: int = 1
) -> list[MemoryLookupT]
search(
    query: list[str], *, count: int = 1
) -> list[list[MemoryLookupT]]
search(query, *, count=1)

Search for memories that are semantically similar to the query

Parameters:

  • query (str | list[str]) –

    Query to look up memories in the memoryset; can be a single query or a list of queries

  • count (int, default: 1 ) –

    Number of memories to return for each query

Returns:

  • list[MemoryLookupT] | list[list[MemoryLookupT]]

    List of memories from the memoryset that match the query. If a single query is provided, the return value is a single list of memories. If a list of queries is provided, the return value is a list of lists of memories, one per query.

Examples:

Search for similar memories:

>>> memoryset.search("I am happy", count=2)
[
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    LabeledMemoryLookup({ label: <positive: 1>, value: 'I am content' }),
]

Search for similar memories for multiple queries:

>>> memoryset.search(["I am happy", "I am sad"], count=1)
[
    [
        LabeledMemoryLookup({ label: <positive: 1>, value: 'I am happy' }),
    ],
    [
        LabeledMemoryLookup({ label: <negative: 0>, value: 'I am sad' }),
    ],
]

query #

query(
    offset=0,
    limit=100,
    filters=[],
    with_feedback_metrics=False,
    sort=None,
)

Query the memoryset for memories that match the filters

Parameters:

  • offset (int, default: 0 ) –

    The offset of the first memory to return

  • limit (int, default: 100 ) –

    The maximum number of memories to return

  • filters (list[FilterItemTuple], default: [] ) –

    List of filters to apply to the query.

  • with_feedback_metrics (bool, default: False ) –

    Whether to include feedback metrics in the response

  • sort (list[TelemetrySortItem] | None, default: None ) –

    Optional list of (field, direction) tuples to sort the results by

Returns:

  • list[MemoryT]

    List of memories from the memoryset that match the filters

Examples:

>>> memoryset.query(filters=[("score", ">", 0.5)], limit=2)
[
    ScoredMemory({ score: 0.95, value: "I am happy" }),
    ScoredMemory({ score: 0.80, value: "I am content" }),
]

insert #

insert(items)

Insert memories into the memoryset

Parameters:

  • items (Iterable[dict[str, Any]] | dict[str, Any]) –

    List of memories to insert into the memoryset. This should be a list of dictionaries with the following keys:

    • value: Value of the memory
    • label: Label of the memory (for labeled memorysets)
    • score: Score of the memory (for scored memorysets)
    • source_id: Optional unique ID of the memory in a system of reference
    • ...: Any other metadata to store for the memory

Examples:

>>> memoryset.insert([
...     {"value": "I am happy", "score": 0.95, "source_id": "user_123", "tag": "happy"},
...     {"value": "I am sad", "score": 0.10, "source_id": "user_124", "tag": "sad"},
... ])

get #

get(memory_id: str) -> MemoryT
get(memory_id: Iterable[str]) -> list[MemoryT]
get(memory_id)

Fetch a memory or memories from the memoryset

Parameters:

  • memory_id (str | Iterable[str]) –

    Unique identifier of the memory or memories to fetch

Returns:

  • MemoryT | list[MemoryT]

    Memory or list of memories from the memoryset

Raises:

  • LookupError

    If no memory with the given id is found

Examples:

Fetch a single memory:

>>> memoryset.get("0195019a-5bc7-7afb-b902-5945ee1fb766")
LabeledMemory({ label: <positive: 1>, value: 'I am happy' })

Fetch multiple memories:

>>> memoryset.get([
...     "0195019a-5bc7-7afb-b902-5945ee1fb766",
...     "019501a1-ea08-76b2-9f62-95e4800b4841",
... ])
[
    ScoredMemory({ score: 0.95, value: 'I am happy' }),
    ScoredMemory({ score: 0.10, value: 'I am sad' }),
]

update #

update(updates: dict[str, Any]) -> MemoryT
update(updates: Iterable[dict[str, Any]]) -> list[MemoryT]
update(updates)

Update one or multiple memories in the memoryset

Parameters:

  • updates (dict[str, Any] | Iterable[dict[str, Any]]) –

    List of updates to apply to the memories. Each update should be a dictionary with the following keys:

    • memory_id: Unique identifier of the memory to update (required)
    • value: Optional new value of the memory
    • score: Optional new score of the memory
    • source_id: Optional new source ID of the memory
    • ...: Optional new values for metadata properties

Returns:

  • MemoryT | list[MemoryT]

    Updated memory or list of updated memories

Examples:

Update a single memory:

>>> memoryset.update(
...     {
...         "memory_id": "019501a1-ea08-76b2-9f62-95e4800b4841",
...         "tag": "happy",
...     },
... )

Update multiple memories:

>>> memoryset.update(
...     {"memory_id": m.memory_id, "label": 2}
...     for m in memoryset.query(filters=[("tag", "==", "happy")])
... )

get_cascading_edits_suggestions #

get_cascading_edits_suggestions(
    memory,
    *,
    old_label,
    new_label,
    max_neighbors=50,
    max_validation_neighbors=10,
    similarity_threshold=None,
    only_if_has_old_label=True,
    exclude_if_new_label=True,
    suggestion_cooldown_time=3600.0 * 24.0,
    label_confirmation_cooldown_time=3600.0 * 24.0 * 7
)

Suggests cascading edits for a given memory based on nearby points with similar labels.

This function is triggered after a user changes a memory’s label. It looks for nearby candidates in embedding space that may be subject to similar relabeling and returns them as suggestions. The system uses scoring heuristics, label filters, and cooldown tracking to reduce noise and improve usability.

Parameters:

  • memory (MemoryT) –

    The memory whose label was just changed.

  • old_label (int) –

    The label this memory used to have.

  • new_label (int) –

    The label it was changed to.

  • max_neighbors (int, default: 50 ) –

    Maximum number of neighbors to consider.

  • max_validation_neighbors (int, default: 10 ) –

    Maximum number of neighbors to use for label suggestion.

  • similarity_threshold (float | None, default: None ) –

    If set, only include neighbors with a lookup score above this threshold.

  • only_if_has_old_label (bool, default: True ) –

    If True, only consider neighbors that have the old label.

  • exclude_if_new_label (bool, default: True ) –

    If True, exclude neighbors that already have the new label.

  • suggestion_cooldown_time (float, default: 3600.0 * 24.0 ) –

    Minimum time (in seconds) since the last suggestion for a neighbor to be considered again.

  • label_confirmation_cooldown_time (float, default: 3600.0 * 24.0 * 7 ) –

    Minimum time (in seconds) since a neighbor’s label was confirmed to be considered for suggestions.

  • _current_time

    Optional override for the current timestamp (useful for testing).

Returns:

  • list[CascadingEditSuggestion]

    A list of CascadingEditSuggestion objects, each containing a neighbor and the suggested new label.

delete #

delete(memory_id)

Delete memories from the memoryset

Parameters:

  • memory_id (str | Iterable[str]) –

    Unique identifier or identifiers of the memories to delete

Examples:

Delete a single memory:

>>> memoryset.delete("0195019a-5bc7-7afb-b902-5945ee1fb766")

Delete multiple memories:

>>> memoryset.delete([
...     "0195019a-5bc7-7afb-b902-5945ee1fb766",
...     "019501a1-ea08-76b2-9f62-95e4800b4841",
... ])

analyze #

analyze(
    *analyses: Iterable[dict[str, Any] | str],
    lookup_count: int = 15,
    clear_metrics: bool = False,
    background: Literal[True]
) -> Job[dict]
analyze(
    *analyses: Iterable[dict[str, Any] | str],
    lookup_count: int = 15,
    clear_metrics: bool = False,
    background: Literal[False] = False
) -> dict
analyze(
    *analyses,
    lookup_count=15,
    clear_metrics=False,
    background=False
)

Run analyses on the memoryset to find duplicates, clusters, mislabelings, and more

The results of the analysis will be stored in the ScoredMemory.metrics attribute of each memory in the memoryset. Overall memoryset metrics will be returned as a dictionary.

Parameters:

  • analyses (Iterable[dict[str, Any] | str], default: () ) –

    List of analyses to run on the memoryset; each can be just the name of an analysis or a dictionary with a name property and additional config. The available analyses are:

    • "duplicate": Find potentially duplicate memories in the memoryset
    • "cluster": Cluster the memories in the memoryset
    • "label": Analyze the labels to find potential mislabelings
    • "neighbor": Analyze the neighbors to populate anomaly scores
    • "projection": Create a 2D projection of the embeddings for visualization
  • lookup_count (int, default: 15 ) –

    Number of memories to look up for each memory in the memoryset

  • clear_metrics (bool, default: False ) –

    Whether to clear any existing metrics from the memories before running the analysis

  • background (bool, default: False ) –

    Whether to run the operation in the background

Returns:

  • Job[dict] | dict

    Dictionary with aggregate metrics for each analysis that was run

Raises:

  • ValueError

    If an invalid analysis name is provided

Examples:

Run label and duplicate analysis:

>>> memoryset.analyze("label", {"name": "duplicate", "possible_duplicate_threshold": 0.99})
{ "duplicate": { "num_duplicates": 10 },
  "label": {
    "label_metrics": [{
        "label": 0,
        "label_name": "negative",
        "average_lookup_score": 0.95,
        "memory_count": 100,
    }, {
        "label": 1,
        "label_name": "positive",
        "average_lookup_score": 0.90,
        "memory_count": 100,
    }],
    "neighbor_prediction_accuracy": 0.95,
    "mean_neighbor_label_confidence": 0.95,
    "mean_neighbor_label_entropy": 0.95,
    "mean_neighbor_predicted_label_ambiguity": 0.95,
  }
}

Remove all exact duplicates:

>>> memoryset.delete(
...     m.memory_id
...     for m in memoryset.query(
...         filters=[("metrics.is_duplicate", "==", True)]
...     )
... )

Display label analysis to review potential mislabelings:

>>> memoryset.display_label_analysis()

get_potential_duplicate_groups #

get_potential_duplicate_groups()

Group potential duplicates in the memoryset

run_embedding_evaluation staticmethod #

run_embedding_evaluation(
    datasource: Datasource,
    *,
    value_column: str = "value",
    label_column: str = "label",
    source_id_column: str | None = None,
    neighbor_count: int = 5,
    embedding_models: list[str] | None = None,
    background: Literal[True]
) -> Job[dict]
run_embedding_evaluation(
    datasource: Datasource,
    *,
    value_column: str = "value",
    label_column: str = "label",
    source_id_column: str | None = None,
    neighbor_count: int = 5,
    embedding_models: list[str] | None = None,
    background: Literal[False] = False
) -> dict
run_embedding_evaluation(
    datasource,
    *,
    value_column="value",
    label_column="label",
    source_id_column=None,
    neighbor_count=5,
    embedding_models=None,
    background=False
)

Test the quality of embeddings for the datasource by computing metrics such as prediction accuracy.

Parameters:

  • datasource (Datasource) –

    The datasource to run the embedding evaluation on

  • value_column (str, default: 'value' ) –

    Name of the column in the datasource that contains the memory values

  • label_column (str, default: 'label' ) –

    Name of the column in the datasource that contains the memory labels; these must be contiguous integers starting from 0

  • source_id_column (str | None, default: None ) –

    Optional name of the column in the datasource that contains the ids in the system of reference

  • neighbor_count (int, default: 5 ) –

    The number of neighbors to select for prediction

  • embedding_models (list[str] | None, default: None ) –

    Optional list of embedding model keys to evaluate; if not provided, all available embedding models will be used

  • background (bool, default: False ) –

    Whether to run the operation in the background

Returns:

  • Job[dict] | dict

    A dictionary containing the results of the embedding evaluation