orcalib.memoryset.memoryset
LabeledMemoryset
Collection of memories with labels that are stored in an OrcaDB table and can be queried using embedding similarity search.
Note
This creates the database, and the memories table in it, if they do not exist yet.
Parameters:
- uri (str | None, default: None) – URL of the database that should store the memories table, or the name of the table for the memories. Either a file URL or the URL of a hosted OrcaDB instance is accepted. If empty, the ORCADB_URL environment variable is used instead. If a string that is not a URL is provided, it is interpreted as the name of the table to create in the database specified by the ORCADB_URL environment variable.
- api_key (str | None, default: None) – API key for the OrcaDB instance. If not provided, the ORCADB_API_KEY environment variable or the credentials encoded in the uri are used.
- secret_key (str | None, default: None) – Secret key for the OrcaDB instance. If not provided, the ORCADB_SECRET_KEY environment variable or the credentials encoded in the uri are used.
- database (str | None, default: None) – Name of the database. Do not provide this if it is already encoded in the uri.
- table (str | None, default: None) – Name of the table. Do not provide this if it is already encoded in the uri.
- embedding_model (EmbeddingModel, default: GTE_BASE) – Embedding model to use for semantic similarity search.
- reranker (Reranker | None, default: None) – Optional reranking model to use during lookup.
Examples:
Infer connection details from the ORCADB_URL, ORCADB_API_KEY, and ORCADB_SECRET_KEY environment variables:
>>> import os
>>> os.environ["ORCADB_URL"] = "https://<my-api-key>:<my-secret-key>@instance.orcadb.cloud/my-db"
>>> LabeledMemoryset()
LabeledMemoryset(table="memories", database="my-db")
>>> LabeledMemoryset("my_memories_table")
LabeledMemoryset(table="my_memories_table", database="my-db")
All connection details can be fully encoded in the uri:
>>> LabeledMemoryset("https://<my-api-key>:<my-secret-key>@instance.orcadb.cloud/my-db/my-memories-table")
LabeledMemoryset(table="my-memories-table", database="my-db")
Or they can be provided explicitly:
>>> LabeledMemoryset(
...     "https://instance.orcadb.cloud",
...     api_key="my-api-key",
...     secret_key="my-secret-key",
...     database="my-db",
...     table="my-memories-table"
... )
LabeledMemoryset(table="my-memories-table", database="my-db")
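The way the constructor arguments, the uri, and the environment variables combine can be sketched in plain Python. This is only an illustration under stated assumptions, not orcalib's implementation: `resolve_connection` is a hypothetical helper, the precedence (explicit argument, then uri, then environment variable) is assumed, and only hosted `https://` URLs with a `/database/table` path are handled (file URLs are left out).

```python
import os
from urllib.parse import urlsplit


def resolve_connection(uri=None, api_key=None, secret_key=None,
                       database=None, table=None, env=None):
    """Hypothetical helper: merge explicit arguments, uri parts, and env vars."""
    env = os.environ if env is None else env
    # Assumption: a string without a URL scheme is a table name.
    if uri is not None and "://" not in uri:
        table = table or uri
        uri = None
    uri = uri or env.get("ORCADB_URL")
    if not uri:
        raise ValueError("no uri given and ORCADB_URL is not set")
    parts = urlsplit(uri)
    # Path segments are read as /<database>/<table>; both are optional.
    segments = [s for s in parts.path.split("/") if s]
    return {
        "host": parts.hostname,
        "api_key": api_key or parts.username or env.get("ORCADB_API_KEY"),
        "secret_key": secret_key or parts.password or env.get("ORCADB_SECRET_KEY"),
        "database": database or (segments[0] if segments else None),
        # "memories" mirrors the default table name shown in the examples above.
        "table": table or (segments[1] if len(segments) > 1 else "memories"),
    }


# Usage with an explicit environment mapping instead of the real os.environ.
env = {"ORCADB_URL": "https://key:secret@instance.orcadb.cloud/my-db"}
from_env = resolve_connection(env=env)
named_table = resolve_connection("my_memories_table", env=env)
```

Passing `env` explicitly keeps the sketch testable without touching the process environment.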
insert
Inserts a dataset into the LabeledMemoryset database.
For dict-like or list of dict-like datasets, there must be a label key and one of the following keys: text, image, or value.
If there are only two keys and one is label, the other will be inferred to be value.
For list-like datasets, the first element of each tuple must be the value and the second must be the label.
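The key-inference rules above can be sketched as a small normalization step. `normalize_row` is a hypothetical name used for illustration only; how orcalib resolves the keys internally (including the priority among text, image, and value) is an assumption here.

```python
def normalize_row(row):
    """Hypothetical helper: turn one dataset row into a (value, label) pair."""
    if isinstance(row, dict):
        if "label" not in row:
            raise ValueError("dict-like rows need a 'label' key")
        # One of the documented value keys is present.
        for key in ("value", "text", "image"):
            if key in row:
                return row[key], row["label"]
        # Exactly two keys and one is 'label': the other is taken as the value.
        others = [k for k in row if k != "label"]
        if len(others) == 1:
            return row[others[0]], row["label"]
        raise ValueError("could not infer which key holds the value")
    # List-like rows: first element is the value, second is the label.
    value, label = row
    return value, label


rows = [{"text": "text 1", "label": 0}, ("text 2", 1), {"sentence": "text 3", "label": 1}]
pairs = [normalize_row(r) for r in rows]
```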
Parameters:
- dataset (DatasetLike) – Data to insert into the memoryset.
- log (bool, default: True) – Whether to show a progress bar and log messages.
Examples:
Example 1: Inserting a dictionary-like dataset
>>> dataset = [{
...     "text": "text 1",
...     "label": 0
... }]
>>> memoryset = LabeledMemoryset("file:///path/to/memoryset")
>>> memoryset.insert(dataset)
Example 2: Inserting a list-like dataset
>>> dataset = [
...     ("text 1", 0),
...     ("text 2", 1)
... ]
>>> memoryset = LabeledMemoryset("file:///path/to/memoryset")
>>> memoryset.insert(dataset)
Example 3: Inserting a Hugging Face Dataset
from datasets import Dataset
lookup
Retrieves the most similar memories to the query from the memoryset.
Parameters:
- query (InputType | list[InputType] | ndarray) – The query to retrieve memories for. Can be a single value, a list of values, or a numpy array with value embeddings.
- k (int, default: 1) – The number of memories to retrieve.
- batch_size (int, default: 32) – The number of queries to process at a time.
- run_ids (list[int] | None, default: None) – A list of run IDs to track with the lookup.
- rerank (bool | None, default: None) – Whether to rerank the results. If None (default), results will be reranked if a reranker is attached to the Memoryset.
- log (bool, default: False) – Whether to log the lookup process and show progress bars.
Returns:
- list[list[LabeledMemoryLookup]] | list[MemoryLookupResults] – A list of lists of LabeledMemoryLookups, where each inner list contains the k most similar memories to the corresponding query.
Examples:
Example 1: Retrieving the most similar memory to a single example
>>> memoryset = LabeledMemoryset("file:///path/to/memoryset")
>>> query = "Apple"
>>> memoryset.lookup(query, k=1)
[
    [
        LabeledMemoryLookup(
            value='Orange',
            memory_id=12,
            memory_version=1,
            label=0,
            label_name='fruit',
            embedding=array([...], dtype=float32),
            metadata=None,
            lookup_score=0.98,
            reranker_score=None,
            reranker_embedding=None
        )
    ]
]
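To make the shape of the return value concrete, here is a minimal sketch of a k-nearest lookup over precomputed embeddings. It assumes cosine similarity and tiny hand-written vectors; the similarity measure and index that OrcaDB actually uses are not stated in this reference, and `toy_lookup` is a hypothetical name.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_lookup(query_embeddings, memories, k=1):
    """Return one list per query holding its k most similar memories."""
    results = []
    for query in query_embeddings:
        ranked = sorted(
            memories,
            key=lambda m: cosine(query, m["embedding"]),
            reverse=True,
        )
        results.append(ranked[:k])
    return results


# Hand-written 2-d "embeddings" stand in for a real embedding model.
memories = [
    {"value": "Orange", "label": 0, "embedding": [0.9, 0.1]},
    {"value": "Hammer", "label": 1, "embedding": [0.1, 0.9]},
]
nearest = toy_lookup([[1.0, 0.0]], memories, k=1)
```

As in the documented return type, the outer list has one entry per query and each inner list has at most k entries.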
to_list
Get a list of all the memories in the memoryset.
Returns:
- list[LabeledMemory] – List containing the memories.
to_pandas
update_embedding_model
Updates the embedding model for the memoryset and re-embeds all memories in the current memoryset or a new destination memoryset if it is provided.
Note
This will reset the destination memoryset if it already exists.
Parameters:
- embedding_model (EmbeddingModel) – New embedding model to use.
- destination (LabeledMemoryset | str | None, default: None) – Destination memoryset to store the results in. This can be a memoryset instance, the URL of a new memoryset, or the name of a table in the same database. A table for the destination will be created if it does not already exist. If this is None, the current memoryset will be updated.
Returns:
- LabeledMemoryset – The destination memoryset with the updated embeddings.
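The in-place versus destination behavior can be illustrated with a toy model in which a plain dict of lists stands in for the database tables. Everything here is an assumption made for illustration: `reembed`, `toy_embed`, and the dict layout are hypothetical and do not reflect how orcalib stores or re-embeds memories.

```python
def toy_embed(text):
    """Toy stand-in for an embedding model; real models return dense float vectors."""
    return [float(len(text)), float(text.count(" "))]


def reembed(tables, source, embed, destination=None):
    """Re-embed every memory in tables[source] and write the result to the target table."""
    target = source if destination is None else destination
    reembedded = [{**m, "embedding": embed(m["value"])} for m in tables[source]]
    # Assignment replaces whatever the target held before,
    # mirroring the note that an existing destination is reset.
    tables[target] = reembedded
    return target


tables = {
    "my_memoryset": [{"value": "text 1", "label": 0, "embedding": [0.0]}],
    "copy": [{"value": "stale", "label": 1, "embedding": [0.0]}],
}
# With a destination, the source table is left untouched.
reembed(tables, "my_memoryset", toy_embed, destination="copy")
```

Calling `reembed` without a destination overwrites the embeddings of the source table itself.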
Examples:
Replace the embedding model for the current memoryset:
>>> memoryset = LabeledMemoryset("file:./orca.db#my_memoryset")
>>> memoryset.update_embedding_model(EmbeddingModel.CLIP_BASE)
Create a new memoryset with a new embedding model:
clone
Clone the current memoryset into a new memoryset.
Note
This will reset the destination memoryset if it already exists.
Parameters:
- destination (LabeledMemoryset | str) – The destination memoryset to clone this memoryset into. This can be a memoryset instance, the URL of a new memoryset, or the name of a table in the same database. A table for the destination will be created if it does not already exist.
Returns:
- LabeledMemoryset – The destination memoryset that the memories were cloned into.
Examples:
Clone a local memoryset into a hosted database:
>>> memoryset = LabeledMemoryset("file:./orca.db#my_memoryset")
>>> memoryset.clone("https://<my-api-key>:<my-secret-key>@instance.orcadb.cloud/my-database#my_memoryset")
Clone a local memoryset into a new table in the same database:
map
Apply a function to all the memories in the memoryset and store them in the current memoryset or a new destination memoryset if it is provided.
Note
If your function returns a column that already exists, then it overwrites it.
Parameters:
- fn (Callable[[LabeledMemory], dict[str, Any] | LabeledMemory]) – Function that takes in the memory and returns a new memory or a dictionary containing the values to update in the memory.
- destination (LabeledMemoryset | str | None, default: None) – The destination memoryset to store the results in. This can be a memoryset instance, the URL of a new memoryset, or the name of a table in the same database. A table for the destination will be created if it does not already exist.
Returns:
- LabeledMemoryset – The destination memoryset with the updated memories.
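The two return shapes accepted from `fn` can be sketched with a small dataclass. `Memory` and `apply_map` are hypothetical stand-ins for `LabeledMemory` and for the method itself; the merge rule shown (dictionary values overwrite existing fields) follows the note above, while the rest is assumed.

```python
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class Memory:
    """Toy stand-in for LabeledMemory with only three fields."""
    value: str
    label: int
    metadata: dict[str, Any] = field(default_factory=dict)


def apply_map(memories, fn):
    """Apply fn to each memory; accept either a new Memory or a dict of updates."""
    results = []
    for memory in memories:
        out = fn(memory)
        if isinstance(out, Memory):
            # fn built a complete replacement memory.
            results.append(out)
        else:
            # fn returned field updates; existing fields with the same name are overwritten.
            results.append(replace(memory, **out))
    return results


memories = [Memory("text 1", 0), Memory("text 2", 1)]
tagged = apply_map(memories, lambda m: {"metadata": {**m.metadata, "new_key": "new_value"}})
swapped = apply_map(memories, lambda m: Memory(m.value, 1 - m.label, m.metadata))
```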
Examples:
Add new metadata to all memories in the memoryset:
>>> memoryset = LabeledMemoryset("file:./orca.db#my_memoryset")
>>> memoryset.map(lambda m: dict(metadata=dict(**(m.metadata or {}), new_key="new_value")))
Create a new memoryset with swapped labels in a new table in the same database:
filter
Filters the current memoryset using the given function and stores the result in the current memoryset or a new destination memoryset if it is provided.
Note
This will reset the destination memoryset if it already exists.
Parameters:
- fn (Callable[[LabeledMemory], bool]) – Function that takes in the memory and returns a boolean indicating whether the memory should be included or not.
- destination (LabeledMemoryset | str | None, default: None) – The destination memoryset to store the results in. This can be a memoryset instance, the URL of a new memoryset, or the name of a table in the same database. A table for the destination will be created if it does not already exist.
Returns:
- LabeledMemoryset – The destination memoryset with the filtered memories.
Examples:
Filter out memories with a label of 0:
>>> memoryset = LabeledMemoryset("file:./orca.db#my_memoryset")
>>> memoryset.filter(lambda m: m.label != 0)
Create a new memoryset with some metadata in a new table in the same database: