Memories and Memorysets
This guide dives into the details of how to work with memories and memorysets in OrcaCloud. You will learn what memories are, how to create a memoryset, how to look up memories that are similar to a given query, and how to update or delete memories.
What are Memories?
In the context of Orca, memories are additional data that your model uses to guide its predictions. Your model looks up relevant memories based on the input it receives and uses them to inform its output. Because memories are stored in OrcaCloud, they can be updated at any time, which lets you change the model's behavior without retraining or redeploying it. For more information about memories, check out our memories concept guide.
To interact with memories in Orca, you use memorysets, which provide a high-level interface for storing, looking up, updating, and deleting memories. You can think of a memoryset as a table in a vector database where each row is a memory.
Memorysets store memories with the following properties:
- value: value of the memory. (1)
- embedding: embedding of the memory's value for semantic search, automatically generated by the memoryset's embedding model.
- source_id: optional unique identifier of the memory in your system of reference (must be a string).
- metrics: metrics about the memory, generated when running an analysis on the memoryset.
- memory_id: unique identifier for the memory, automatically generated on insert.
- memory_version: version of the memory, automatically updated when the label or value changes.
- ...: the memoryset can also contain additional properties, which are stored in a metadata dictionary but can also be accessed as individual attributes on the instance.

(1) The SDK currently only supports working with text memories. We have experimental support for images in OrcaCloud as well. Please contact us if you have a use case for this.
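To make the property list concrete, here is a minimal pure-Python sketch of a memory record. It is not the OrcaSDK class: `ToyMemory`, the UUID-based `memory_id`, and the starting `memory_version` of 1 are illustrative assumptions. It only mirrors two behaviors described above: the ID is generated automatically, and metadata keys can be read as attributes.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToyMemory:
    """Hypothetical stand-in for a stored memory; not the OrcaSDK class."""

    value: str
    embedding: list[float]  # produced by the memoryset's embedding model
    source_id: Optional[str] = None  # your own ID, always a string
    metrics: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)
    # Generated automatically when the memory is created in this sketch.
    memory_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    memory_version: int = 1  # assumed starting version

    def __getattr__(self, name: str) -> Any:
        # Python only calls __getattr__ when normal lookup fails, so declared
        # fields take precedence and any other name falls back to metadata.
        metadata = self.__dict__.get("metadata", {})
        if name in metadata:
            return metadata[name]
        raise AttributeError(name)


memory = ToyMemory(
    value="This movie was so frustrating.",
    embedding=[0.12, -0.48, 0.91],
    source_id="review-42",
    metadata={"title": "Frustrating"},
)
print(memory.title)      # → Frustrating
print(memory.memory_id)  # a random UUID string
```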
Different types of memorysets store additional properties. For example, the LabeledMemoryset used in this guide also stores:
- label: label of the memory.
- label_name: human-readable name of the label, automatically populated from the label names of the memoryset.
Label Names
label_names is a list of human-readable names for the labels in the memoryset. Its length must match the number of labels in the datasource, and the index of each label name must match the label value it describes. If the datasource is created from a Hugging Face Dataset with a ClassLabel feature for labels, the label names are inferred from that feature. Otherwise, the label names must be provided manually.
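The index rule can be shown in a few lines of plain Python. This sketch assumes integer labels numbered 0 to n-1 (so the number of labels is the largest label plus one) and uses hypothetical `neg`/`pos` names; the helper functions are illustrative, not part of the SDK.

```python
# Hypothetical label names for a binary sentiment dataset: the index of each
# name is the integer label value it describes (0 -> "neg", 1 -> "pos").
label_names = ["neg", "pos"]


def label_name_for(label: int, names: list[str]) -> str:
    """Resolve a label value to its human-readable name by list index."""
    if not 0 <= label < len(names):
        raise ValueError(f"label {label} has no name in {names}")
    return names[label]


def check_label_names(labels: list[int], names: list[str]) -> None:
    """Sketch of the consistency rule: exactly one name per label value."""
    num_labels = max(labels) + 1  # assumes labels are 0..n-1
    if len(names) != num_labels:
        raise ValueError(f"expected {num_labels} label names, got {len(names)}")


labels = [0, 1, 1, 0]
check_label_names(labels, label_names)
print([label_name_for(label, label_names) for label in labels])
# → ['neg', 'pos', 'pos', 'neg']
```

A Hugging Face `ClassLabel` feature carries exactly this kind of index-ordered list in its `names` attribute, which is what makes inferring the label names possible.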
Create a Memoryset
In this guide we use the LabeledMemoryset, a memoryset that stores labels for classification tasks, as an example. The memoryset automatically generates embeddings for your memories using the embedding model you specify.
Creating a LabeledMemoryset takes the following arguments:
- Name of the memoryset in OrcaCloud that will store the memories.
- Datasource that contains the memories to store in the memoryset.
- Embedding model that will be used to embed the memories for semantic search.
- Name of the column in the datasource that contains the memory values (e.g. text). Defaults to "value" if not specified.
- Name of the column in the datasource that contains the associated labels. Defaults to "label" if not specified.
- Optional name of the column in the datasource that contains the external source IDs.
- List of human-readable names for the labels in the memoryset. Its length must match the number of labels in the datasource, and the index of each label name must match the label value it describes. If the datasource contains a ClassLabel feature for labels, the label names are inferred from it.
- Maximum sequence length for the embedding model.
- What to do if a memoryset with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing memoryset.
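The column arguments above decide which datasource column becomes which memory field. The sketch below illustrates that mapping on plain dictionaries. `rows_to_memories` and its parameter names are hypothetical (the actual SDK parameter names may differ), and keeping unmapped columns as metadata is an assumption based on how metadata is described elsewhere in this guide.

```python
from typing import Any, Optional


def rows_to_memories(
    rows: list[dict[str, Any]],
    value_column: str = "value",
    label_column: str = "label",
    source_id_column: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Hypothetical helper: map datasource rows onto memory fields."""
    mapped = {value_column, label_column, source_id_column}
    memories = []
    for row in rows:
        memories.append({
            "value": row[value_column],
            "label": row[label_column],
            # source_id must be a string, so convert whatever the column holds.
            "source_id": str(row[source_id_column]) if source_id_column else None,
            # Assumption: unmapped columns are kept as metadata.
            "metadata": {k: v for k, v in row.items() if k not in mapped},
        })
    return memories


rows = [{"text": "Loved every minute", "sentiment": 1, "review_id": 17, "year": 2020}]
print(rows_to_memories(rows, value_column="text", label_column="sentiment",
                       source_id_column="review_id"))
# → [{'value': 'Loved every minute', 'label': 1, 'source_id': '17', 'metadata': {'year': 2020}}]
```

When the datasource already uses "value" and "label" as column names, the defaults apply and no column arguments are needed.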
Above, we create a LabeledMemoryset from a Datasource. The OrcaSDK also provides a number of convenience methods, so you can create memorysets directly from a Hugging Face or PyTorch Dataset, a list, a column dictionary, a pandas DataFrame, a pyarrow Table, or a local file. All of these methods create the Datasource under the hood and then create the LabeledMemoryset from it.
Since this dataset contains a ClassLabel feature for labels, the label names will be inferred from the dataset.
- This also supports DataLoader objects.
- If the provided dataset or data loader returns unnamed tuples, this argument must be provided to specify the names of the columns.
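Why unnamed tuples need explicit column names becomes clear when you try to turn dataset items into rows. In this pure-Python sketch, `to_named_rows` and its `column_names` parameter are hypothetical names, not the SDK's: dictionaries and namedtuples already carry their field names, while a plain tuple only has positions.

```python
from collections import namedtuple
from typing import Any, Iterable, Optional, Sequence


def to_named_rows(
    items: Iterable[Any], column_names: Optional[Sequence[str]] = None
) -> list[dict[str, Any]]:
    """Hypothetical helper: turn dataset items into rows keyed by column name."""
    rows = []
    for item in items:
        if isinstance(item, dict):
            rows.append(dict(item))  # already named
        elif hasattr(item, "_asdict"):
            rows.append(dict(item._asdict()))  # namedtuple: names are built in
        elif isinstance(item, tuple):
            # A plain tuple carries no names, so they must be supplied.
            if column_names is None:
                raise ValueError("unnamed tuples require column_names")
            if len(item) != len(column_names):
                raise ValueError(f"expected {len(column_names)} fields, got {len(item)}")
            rows.append(dict(zip(column_names, item)))
        else:
            raise TypeError(f"unsupported item type: {type(item).__name__}")
    return rows


Review = namedtuple("Review", ["value", "label"])
print(to_named_rows([Review("Loved it", 1)]))
# → [{'value': 'Loved it', 'label': 1}]
print(to_named_rows([("Hated it", 0)], column_names=["value", "label"]))
# → [{'value': 'Hated it', 'label': 0}]
```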
Open an Existing Memoryset
If you already have a memoryset in OrcaCloud, you can open it by using the LabeledMemoryset.open method:
This returns a handle to the existing memoryset, which you can use to interact with the memoryset and its memories.
List all Memorysets
You can list all memorysets in your OrcaCloud by using the LabeledMemoryset.all method:
This will return a list of handles to all memorysets in your OrcaCloud.
Delete a Memoryset
You can delete a memoryset by using the LabeledMemoryset.drop method:
- The name or ID of the memoryset to drop.
- What to do if the memoryset does not exist. Defaults to "error"; the other option is "ignore", which does nothing if the memoryset does not exist.
This deletes the memoryset from OrcaCloud. If the memoryset does not exist, it raises a LookupError, unless you set the if_not_exists parameter to "ignore".
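The two if_not_exists modes can be sketched with an in-memory registry. `ToyRegistry` and the memoryset name are hypothetical; only the behavior (LookupError by default, silence with "ignore") comes from the description above.

```python
class ToyRegistry:
    """Hypothetical in-memory stand-in for the memorysets in OrcaCloud."""

    def __init__(self) -> None:
        self.memorysets: dict[str, list] = {}

    def drop(self, name: str, if_not_exists: str = "error") -> None:
        if if_not_exists not in ("error", "ignore"):
            raise ValueError(f"invalid if_not_exists value: {if_not_exists!r}")
        if name not in self.memorysets:
            if if_not_exists == "error":
                raise LookupError(f"memoryset {name!r} does not exist")
            return  # "ignore": nothing to delete, nothing raised
        del self.memorysets[name]


registry = ToyRegistry()
registry.memorysets["imdb_memoryset"] = []
registry.drop("imdb_memoryset")                          # deletes it
registry.drop("imdb_memoryset", if_not_exists="ignore")  # silently does nothing
try:
    registry.drop("imdb_memoryset")                      # default is "error"
except LookupError as err:
    print(err)  # → memoryset 'imdb_memoryset' does not exist
```

Using "ignore" is convenient in cleanup scripts that may run more than once.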
Clone a Memoryset
You can clone a memoryset by using the clone method, optionally changing the embedding model used to embed the memories:
This creates a new memoryset with the same memories as the original one, embedded with the new embedding model if you specified one, and returns a handle to the new memoryset.
Filter Memories
You can filter memories by label, value, source_id, or metadata columns using the query method:
[LabeledMemory({ label: <neg: 0>, value: 'This movie was so frustrating. Everything seemed energetic and I was totally prepared to have a good...' }),
LabeledMemory({ label: <neg: 0>, value: 'This movie spends most of its time preaching that it is the script that makes the movie, but apparen...' })]
The filters argument takes a list of tuples. Each tuple contains a column name, a comparison operator, and a value. We support filtering on the value, label, source_id, and custom metadata columns. The comparison operator can be one of the following: ==, !=, >, >=, <, <=, in, not in, or like. Please see the FilterItemTuple documentation for more details.
Some examples of valid filters:
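To illustrate the (column, operator, value) tuple format and what each operator means, here is a small pure-Python evaluator over hypothetical rows. It is a local sketch, not how OrcaCloud evaluates filters. Two behaviors are assumptions: all tuples in a list must hold (logical AND), and like follows SQL-style wildcards (% for any run of characters, _ for exactly one) with case-sensitive matching.

```python
import re
from typing import Any

# Hypothetical rows; "genre" plays the role of a custom metadata column.
memories = [
    {"value": "I loved this movie", "label": 1, "source_id": "a1", "genre": "drama"},
    {"value": "This movie was so frustrating", "label": 0, "source_id": "a2", "genre": "comedy"},
    {"value": "Fine, but too long", "label": 0, "source_id": "a3", "genre": "drama"},
]


def like_to_regex(pattern: str) -> re.Pattern:
    # Assumption: SQL-style LIKE, where % matches any run and _ one character.
    parts = [".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts), re.DOTALL)


OPERATORS = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
    "like": lambda a, b: like_to_regex(b).fullmatch(str(a)) is not None,
}


def matches(row: dict[str, Any], filters: list[tuple[str, str, Any]]) -> bool:
    # Assumption: every filter tuple must hold (logical AND).
    return all(OPERATORS[op](row.get(column), value) for column, op, value in filters)


filters = [("label", "==", 0), ("genre", "in", ["drama", "horror"])]
print([m["source_id"] for m in memories if matches(m, filters)])  # → ['a3']
print([m["source_id"] for m in memories if matches(m, [("value", "like", "%movie%")])])
# → ['a1', 'a2']
```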
Look up Relevant Memories
The main purpose of a memoryset is to enable efficient lookup of memories that are similar to a given query (typically an input to a model). You can use the search method for this:
The search method takes a single query or a list of queries and automatically batches them for efficiency.
The result is a list of LabeledMemoryLookup objects that contain the memory properties and an additional lookup_score property, a score between 0 and 1 that indicates the similarity between the query and the memory.
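Conceptually, a lookup embeds the query, scores it against every stored embedding, and returns the best matches. The sketch below works directly on tiny hypothetical vectors instead of text, because it has no embedding model. Using cosine similarity rescaled from [-1, 1] to [0, 1] as the score is an assumption for illustration; this guide does not specify how lookup_score is computed. Returning a list of lists for a batch is also just this sketch's choice.

```python
import math
from typing import Union

Vector = list[float]

# Hypothetical memories with tiny 3-dimensional embeddings; real embedding
# models produce much larger vectors.
MEMORIES: list[tuple[str, Vector]] = [
    ("I loved this movie", [0.9, 0.1, 0.0]),
    ("What a waste of time", [0.1, 0.9, 0.1]),
    ("Brilliant acting throughout", [0.8, 0.2, 0.1]),
]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def lookup(query: Vector, count: int) -> list[tuple[str, float]]:
    # Assumption: map cosine similarity from [-1, 1] onto a [0, 1] score.
    scored = [(value, (cosine(query, emb) + 1) / 2) for value, emb in MEMORIES]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:count]


def search(queries: Union[Vector, list[Vector]], count: int = 2):
    # A flat vector is one query; a list of vectors is a batch.
    if queries and isinstance(queries[0], (int, float)):
        return lookup(queries, count)
    return [lookup(q, count) for q in queries]


for value, score in search([1.0, 0.0, 0.0]):
    print(f"{score:.3f}  {value}")
```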
Get Memories
If you already have the memory_ids of the memories you want to retrieve, you can use the get method:
[LabeledMemory({ label: <neg: 0>, value: 'This movie was so frustrating. Everything seemed energetic and I was totally prepared to have a good...' }),
LabeledMemory({ label: <neg: 0>, value: 'This movie spends most of its time preaching that it is the script that makes the movie, but apparen...' })]
This returns a single LabeledMemory or a list of LabeledMemory objects matching the provided memory_id(s), depending on the input type.
You can also get a memory by index or slice:
[LabeledMemory({ label: <neg: 0>, value: 'This movie was so frustrating. Everything seemed energetic and I was totally prepared to have a good...' }),
LabeledMemory({ label: <neg: 0>, value: 'This movie spends most of its time preaching that it is the script that makes the movie, but apparen...' })]
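The two access patterns above (by ID, and by index or slice) can be modeled with a small container class. `ToyMemoryset` is hypothetical and stores plain dictionaries; raising KeyError for an unknown ID is this sketch's behavior, not documented SDK behavior.

```python
from typing import Any, Iterable, Union

Memory = dict[str, Any]


class ToyMemoryset:
    """Hypothetical stand-in showing get() and indexing; not the SDK class."""

    def __init__(self, memories: list[Memory]) -> None:
        self._memories = list(memories)
        self._by_id = {m["memory_id"]: m for m in self._memories}

    def get(self, memory_id: Union[str, Iterable[str]]) -> Union[Memory, list[Memory]]:
        # One ID in, one memory out; a collection of IDs returns a list.
        # Unknown IDs raise KeyError in this sketch.
        if isinstance(memory_id, str):
            return self._by_id[memory_id]
        return [self._by_id[mid] for mid in memory_id]

    def __getitem__(self, index: Union[int, slice]) -> Union[Memory, list[Memory]]:
        # Integer indexes and slices follow normal Python list semantics.
        return self._memories[index]

    def __len__(self) -> int:
        return len(self._memories)


memoryset = ToyMemoryset([
    {"memory_id": "m1", "value": "This movie was so frustrating.", "label": 0},
    {"memory_id": "m2", "value": "This movie spends most of its time preaching.", "label": 0},
    {"memory_id": "m3", "value": "A joy from start to finish.", "label": 1},
])
print(memoryset.get("m3")["label"])                          # → 1
print([m["memory_id"] for m in memoryset.get(["m1", "m2"])])  # → ['m1', 'm2']
print([m["memory_id"] for m in memoryset[:2]])                # → ['m1', 'm2']
```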
Insert Memories
You can insert additional memories into an existing memoryset by using the insert method:
- This method takes a list of dictionaries with value, label, and optionally source_id keys. Any other key/value pairs in the dictionaries will be stored as metadata.
This will insert the memories into the memoryset and refresh the memoryset handle.
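The rule for insert dictionaries (value and label required, source_id optional, everything else becomes metadata) is easy to express locally. `prepare_inserts` is a hypothetical helper, and generating the memory_id with uuid4 is only an illustration of "automatically generated on insert".

```python
import uuid
from typing import Any

RESERVED_KEYS = {"value", "label", "source_id"}


def prepare_inserts(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical helper: split insert dictionaries into fields and metadata."""
    prepared = []
    for item in items:
        if "value" not in item or "label" not in item:
            raise ValueError("each memory needs 'value' and 'label' keys")
        prepared.append({
            "memory_id": str(uuid.uuid4()),  # illustrative auto-generated ID
            "value": item["value"],
            "label": item["label"],
            "source_id": item.get("source_id"),  # optional
            # Any key that is not a reserved field is stored as metadata.
            "metadata": {k: v for k, v in item.items() if k not in RESERVED_KEYS},
        })
    return prepared


new = prepare_inserts([
    {"value": "A masterpiece", "label": 1, "source_id": "r-101", "title": "Review 101"},
    {"value": "Dull and slow", "label": 0},
])
print(new[0]["metadata"])   # → {'title': 'Review 101'}
print(new[1]["source_id"])  # → None
```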
Update Memories
You can update a memory in the memoryset by using the update method. You must provide the memory_id of the memory you want to update, along with any keys you want to change.
- Update the title of the memory.
This updates the memory in the memoryset and returns the updated memory. You can also update multiple memories at once by providing a list of dictionaries, each with the memory_id and the keys you want to update.
- Update the label of the first memory.
- Update the value and label of the second memory.
This will update the memory or memories in the memoryset and return the updated memory or memories.
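The update semantics described here can be sketched over a dictionary keyed by memory_id. Everything in this block is hypothetical: the store layout, the helper names, and the assumptions that non-field keys such as "title" land in metadata and that memory_version increases by one per update whenever value or label actually changes.

```python
from typing import Any, Union

# Hypothetical store keyed by memory_id; a stand-in for the memoryset.
memories: dict[str, dict[str, Any]] = {
    "m1": {"memory_id": "m1", "value": "Great film", "label": 1,
           "source_id": None, "memory_version": 1, "metadata": {"title": "Review 1"}},
    "m2": {"memory_id": "m2", "value": "Not for me", "label": 1,
           "source_id": None, "memory_version": 1, "metadata": {}},
}

FIELD_KEYS = {"value", "label", "source_id"}
VERSIONED_KEYS = {"value", "label"}  # changes here bump memory_version


def _apply(changes: dict[str, Any]) -> dict[str, Any]:
    memory = memories[changes["memory_id"]]
    bump = False
    for key, new_value in changes.items():
        if key == "memory_id":
            continue
        if key in FIELD_KEYS:
            if key in VERSIONED_KEYS and memory[key] != new_value:
                bump = True
            memory[key] = new_value
        else:
            memory["metadata"][key] = new_value  # e.g. "title"
    if bump:
        memory["memory_version"] += 1  # at most one bump per update
    return memory


def update(
    changes: Union[dict[str, Any], list[dict[str, Any]]]
) -> Union[dict[str, Any], list[dict[str, Any]]]:
    # A single dict returns one memory; a list of dicts returns a list.
    if isinstance(changes, dict):
        return _apply(changes)
    return [_apply(c) for c in changes]


update({"memory_id": "m1", "title": "Review 1 (edited)"})  # metadata only
updated = update([
    {"memory_id": "m1", "label": 0},
    {"memory_id": "m2", "value": "Not for me at all", "label": 0},
])
print([m["memory_version"] for m in updated])  # → [2, 2]
```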
If you have an instance of a LabeledMemory, you can also update it by using its update method:
Delete Memories
You can delete a memory in the memoryset by using the delete method:
This will delete the memory or memories in the memoryset and refresh the memoryset handle. You can also delete multiple memories at once by providing any iterable of memory_ids:
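Accepting "one ID or any iterable of IDs" has one classic pitfall: a Python string is itself an iterable of characters. This hypothetical `delete` helper over a dictionary shows the usual guard. Raising LookupError for unknown IDs, and validating before deleting anything, are choices made for this sketch rather than documented SDK behavior.

```python
from typing import Any, Iterable, Union


def delete(store: dict[str, Any], memory_ids: Union[str, Iterable[str]]) -> None:
    """Hypothetical delete over a dict keyed by memory_id."""
    # A str is iterable (over characters), so treat it as a single ID first.
    ids = [memory_ids] if isinstance(memory_ids, str) else list(memory_ids)
    ids = list(dict.fromkeys(ids))  # drop duplicate IDs, keep order
    missing = [mid for mid in ids if mid not in store]
    if missing:
        # Validate first so a bad ID leaves the store untouched.
        raise LookupError(f"unknown memory IDs: {missing}")
    for mid in ids:
        del store[mid]


store = {mid: {"value": f"memory {mid}"} for mid in ["m1", "m2", "m3", "m4"]}
delete(store, "m1")                           # single ID
delete(store, (mid for mid in ["m2", "m3"]))  # any iterable works, even a generator
print(list(store))  # → ['m4']
```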