Memories and Memorysets#

This guide explains how to work with memories and memorysets in OrcaLib. You will learn what memories are, how to create a memoryset, insert memories into it, look up memories that are similar to a given query, and update or delete memories.

What are Memories?#

In the context of Orca, memories are additional data that your model uses to guide its predictions. Your model will look up relevant memories based on the input it receives and use them to inform its output. Memories are stored in OrcaDB tables and can thus be updated at any time, which allows changing the model’s behavior without retraining or redeploying it. For more information about memories, check out our memories concept guide.

The easiest way to work with memories in Orca is by using memorysets that provide a high-level interface for storing and looking up memories.

Create a Memoryset#

In this guide we will use the LabeledMemoryset, which is a memoryset that stores labels for classification tasks, as an example. The memoryset will take care of creating a table with the right schema in the database if it doesn’t exist yet (or use the existing one), and will automatically generate embeddings for your memories using the embedding model you specify.

We recommend starting out with a memoryset saved to a local DB for quick testing and prototyping.

from orcalib import LabeledMemoryset, EmbeddingModel

memoryset = LabeledMemoryset(
    "file:orca.db#faq_items", # (1)!
    embedding_model=EmbeddingModel.CLIP_BASE, # (2)!
)
  1. This is the location of your database (here a local file), followed by # and the name of the table that will store the memories.
  2. This is the embedding model that will be used to embed the memories for semantic search.

The memoryset stores LabeledMemory objects with the following properties:

  • value: the str or Image value of the memory
  • label: the label of the memory
  • metadata: a dict with additional information about the memory
  • embedding: the embedding of the value that is generated by the embedding model attached to the memoryset
  • memory_id: the ID of the memory that is generated when it is inserted into the table
  • memory_version: the version of the memory that is incremented each time the memory is updated
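To make the shape of a memory concrete, here is a minimal sketch of a record with the properties listed above. MemoryRecord is a hypothetical stand-in written for illustration, not the LabeledMemory class from OrcaLib; the defaults are assumptions, and it only covers text values.

```python
from dataclasses import dataclass, field
from typing import Any, Optional
from uuid import uuid4


@dataclass
class MemoryRecord:
    """Hypothetical stand-in that mirrors the listed memory properties."""

    value: str  # text value (images are not covered in this sketch)
    label: int  # class label of the memory
    metadata: dict[str, Any] = field(default_factory=dict)
    embedding: Optional[list[float]] = None  # filled in by the embedding model
    memory_id: Optional[str] = None  # assigned when the memory is inserted
    memory_version: int = 1  # incremented on each update


record = MemoryRecord(
    value="OrcaLib is a Python library that allows you to interact with OrcaDB.",
    label=1,
    metadata={"tag": "sdk"},
)
record.memory_id = str(uuid4())  # stand-in for the ID generated at insert time
```

Only value, label, and optionally metadata come from you; the remaining fields are filled in by the memoryset.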

Insert and Inspect Memories#

To insert memories into the memoryset, you use the insert method.

memoryset.insert([
    {
        "value": "OrcaDB is a memory-augmented database that allows you to build memory-augmented PyTorch models.",
        "label": 0,
        "metadata": {"tag": "db"},
    },
    {
        "value": "OrcaLib is a Python library that allows you to interact with OrcaDB.",
        "label": 1,
        "metadata": {"tag": "sdk"},
    },
])

This will insert two memories into the faq_items table and generate an embedding for each of them. The insert method accepts a wide range of data types (e.g. list[dict], Dataset, and DataFrame) that will be automatically converted into the correct format and saved.

All formats except a list of tuples require the input to contain a value key (text and image keys are also supported for convenience) as well as keys for all features of the specific memoryset (e.g. label and label_name for our LabeledMemoryset).
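If your data comes from several sources, it can help to check rows against these key requirements before calling insert. The sketch below is one way to do that in plain Python. normalize_row is a hypothetical helper, not part of OrcaLib, and treating label as the only required feature key is an assumption made for this example.

```python
def normalize_row(row: dict) -> dict:
    """Return a copy of the row with its content stored under the 'value' key."""
    # Exactly one of the supported content keys must be present.
    present = [key for key in ("value", "text", "image") if key in row]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of value/text/image, got {present}")

    content_key = present[0]
    normalized = {k: v for k, v in row.items() if k != content_key}
    normalized["value"] = row[content_key]

    # Assumption: a labeled memory needs at least a 'label' key.
    if "label" not in normalized:
        raise ValueError("a labeled memory needs a 'label' key")

    normalized.setdefault("metadata", {})
    return normalized


rows = [
    {"text": "OrcaLib is a Python library.", "label": 1, "metadata": {"tag": "sdk"}},
    {"value": "OrcaDB stores memories.", "label": 0},
]
normalized_rows = [normalize_row(row) for row in rows]
```

The resulting list of dicts has the same shape as the argument passed to insert in the example above.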

To quickly inspect the contents of the memoryset, you can call the to_pandas method:

memoryset.to_pandas()
value                                                label  embedding                                          metadata        memory_id                               memory_version
'OrcaDB is a memory-augmented database that all...'  0      [0.005379246082156897, 0.0002812617167364806, ...  {'tag': 'db'}   '5fb9521a-d3c2-430f-b43a-f51ff92643de'  1
'OrcaLib is a Python library that allows you to...'  1      [0.011897868476808071, -0.011060018092393875, ...  {'tag': 'sdk'}  '5fb9521a-d3c2-430f-b43a-f51ff92643de'  1

Look up Relevant Memories#

The main purpose of a memoryset is to enable efficiently looking up memories that are similar to a given query (typically an input to a model). You use the lookup method for this:

memoryset.lookup("What is OrcaLib?", k=1)
[
    LabeledMemoryLookup(
        value='OrcaLib is a Python library that allows you to interact with OrcaDB.',
        label=1,
        metadata={'tag': 'sdk'},
        embedding=<array.float64(768,)>,
        memory_id='5fb9521a-d3c2-430f-b43a-f51ff92643de',
        memory_version=1,
        lookup_score=0.7605084329836961,
    )
]

The lookup method takes a single query, a list of queries, or an embedding and is automatically batched for efficiency.

The result is a list of LabeledMemoryLookups that contain the memory properties and an additional lookup_score property with a score between 0 and 1 that indicates the similarity between the query and the memory (1).

  1. If you have a reranker attached to the memoryset, the lookups will also contain a reranker_score property. See the Reranking Guide to learn more about how to use rerankers with your memorysets.
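To build intuition for what a similarity lookup does, the following self-contained sketch ranks memories by cosine similarity and returns the top k with a score attached. It is an illustration only: the word-count embed function is a toy stand-in for a real embedding model, and this guide does not state how OrcaLib computes lookup_score, so cosine similarity is an assumption here.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real model returns dense vectors)."""
    return Counter(text.lower().replace("?", "").replace(".", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity; it lies in [0, 1] here because counts are non-negative."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lookup(query: str, memories: list[dict], k: int = 1) -> list[dict]:
    """Return the k memories most similar to the query, best match first."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(m["value"])), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [{**memory, "lookup_score": score} for score, memory in scored[:k]]


memories = [
    {
        "value": "OrcaDB is a memory-augmented database that allows you to "
        "build memory-augmented PyTorch models.",
        "label": 0,
    },
    {
        "value": "OrcaLib is a Python library that allows you to interact with OrcaDB.",
        "label": 1,
    },
]
results = lookup("What is OrcaLib?", memories, k=1)
```

As in the real lookup above, the OrcaLib memory ranks first for this query, although the toy score differs from the one produced by a real embedding model.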

Mapping and Filtering Memories#

Memorysets are not designed for updating individual memory values, but they do let you generate new memorysets from existing ones with the map and filter methods.

Let’s say you want to create a new memoryset that contains the memories of the original memoryset but with the labels flipped. You can achieve this by using the map method:

swapped_memoryset = memoryset.map(
    lambda m: dict(label=1 if m.label == 0 else 0), # (1)!
    "my_swapped_memoryset", # (2)!
)
  1. The lambda function takes in a memory and returns a dictionary containing the values to update in the memory (or an entirely new memory).
  2. This is the name of the table in which the new memoryset will be stored. If you leave this as None, the new memoryset will replace the original memoryset.

Or say you want to create a new memoryset that contains only the memories of the original memoryset that have a specific metadata value. You can achieve this by using the filter method:

filtered_memoryset = memoryset.filter(
    lambda m: m.metadata["tag"] == "db", # (1)!
    "my_filtered_memoryset", # (2)!
)
  1. The lambda function takes in a memory and returns a boolean indicating whether the memory should be included in the new memoryset.
  2. This is the name of the table in which the new memoryset will be stored. If you leave this as None, the new memoryset will replace the original memoryset.
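The semantics of map and filter described above can be sketched on plain Python dicts. The functions below are hypothetical illustrations, not OrcaLib code: they leave out table storage and re-embedding entirely, and only show that map merges the returned fields into each memory while filter keeps the memories for which the function returns True.

```python
def map_memories(memories: list[dict], fn) -> list[dict]:
    """Merge the fields returned by fn into a copy of each memory."""
    return [{**memory, **fn(memory)} for memory in memories]


def filter_memories(memories: list[dict], predicate) -> list[dict]:
    """Keep copies of the memories for which predicate returns True."""
    return [dict(memory) for memory in memories if predicate(memory)]


memories = [
    {"value": "OrcaDB is a memory-augmented database.", "label": 0, "metadata": {"tag": "db"}},
    {"value": "OrcaLib is a Python library.", "label": 1, "metadata": {"tag": "sdk"}},
]

# Flip the labels; fields that fn does not return are carried over unchanged.
swapped = map_memories(memories, lambda m: {"label": 1 if m["label"] == 0 else 0})

# Keep only the memories tagged "db".
only_db = filter_memories(memories, lambda m: m["metadata"]["tag"] == "db")
```

Both helpers return new lists and leave the input untouched, which mirrors writing the result to a new table instead of replacing the original memoryset.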

Sometimes you may want to delete all memories from the memoryset. You can achieve this by using the reset method:

memoryset.reset()

Deploy Memoryset to Hosted OrcaDB#

Once you are ready to deploy your model to production, you can deploy your memoryset to a hosted OrcaDB with the clone method:

import os

hosted_memoryset = memoryset.clone(
    os.environ["ORCADB_URL"] + "#faq_items", # (1)!
)
  1. Ensure you followed the Installation & Setup Tutorial to set up the environment variable.

This will create a new table in your hosted OrcaDB and copy all the memories from the original memoryset into it. You can now adapt the behavior of the model that has this memoryset attached by updating memories in the hosted memoryset, either through the Orca App or by running SQL queries directly against the table that backs it. See the DB Querying Guide to learn more about how to interact directly with tables in your hosted OrcaDB.
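Because the clone call reads ORCADB_URL directly from the environment, a missing variable surfaces as a bare KeyError. A small guard like the one below can produce a clearer error before you clone. memoryset_uri is a hypothetical helper written for this guide, not part of OrcaLib; it only builds the string that combines the database location, #, and the table name.

```python
import os
from typing import Mapping


def memoryset_uri(table_name: str, env: Mapping[str, str] = os.environ) -> str:
    """Build '<database location>#<table name>' or fail with a clear message."""
    base_url = env.get("ORCADB_URL")
    if not base_url:
        raise RuntimeError(
            "ORCADB_URL is not set; see the Installation & Setup Tutorial"
        )
    return f"{base_url}#{table_name}"


# Passing env explicitly makes the helper easy to try without touching os.environ.
local_uri = memoryset_uri("faq_items", env={"ORCADB_URL": "file:orca.db"})
print(local_uri)  # file:orca.db#faq_items
```

With the environment variable set, you could pass memoryset_uri("faq_items") to clone in place of the string concatenation shown above.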