orcalib.index_handle#

IndexHandle #

IndexHandle(
    name,
    db_name,
    table_name,
    column_name,
    column_type,
    embedding_type,
    index_type,
    artifact_columns,
    embedding_model=None,
)

A handle to an index in an Orca database.

This constructor is usually not called directly. Instead, obtain an index handle through methods on a database handle, such as db.get_index or db.create_vector_index.

Parameters:

name (str) –

Name of this index
db_name (str) –

Database that this index belongs to
table_name (str) –

Table that this index belongs to
column_name (ColumnName) –

Name of the column that this index is built on
column_type (OrcaTypeHandle) –

Type of the column that this index is built on
embedding_type (OrcaTypeHandle) –

Type of the vector embedding used by this index (if any)
index_type (str) –

Type of this index
artifact_columns (dict[ColumnName, str | OrcaTypeHandle]) –

Artifact columns that are available from the index

db `property` #

db

The database handle for the database that this index belongs to

table `property` #

table

The table handle for the table that this index belongs to

embedding_dim `property` #

embedding_dim

Get the embedding dimension of this index (if any).

Returns:

int | None –

Embedding dimension if this index has an embedding, None otherwise

Raises:

NotImplementedError –

If the embedding type is not a numeric type

Examples:

>>> index.embedding_dim
768
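
Because embedding_dim reports the vector size the index works with, it can be used to check query vectors before they are sent to the index. The following is a minimal sketch, not part of orcalib: check_vectors is a hypothetical helper, and 768 stands in for the value of index.embedding_dim shown above.

```python
from __future__ import annotations


def check_vectors(vectors: list[list[float]], embedding_dim: int) -> None:
    """Raise ValueError if any query vector does not have embedding_dim entries."""
    for i, vec in enumerate(vectors):
        if len(vec) != embedding_dim:
            raise ValueError(
                f"query {i} has {len(vec)} dimensions, expected {embedding_dim}"
            )


# 768 is an assumed value, taken from the embedding_dim example above.
check_vectors([[0.0] * 768, [1.0] * 768], embedding_dim=768)
```

With a real handle, the second argument would be index.embedding_dim.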

scan #

scan(
    query,
    drop_exact_match=False,
    exact_match_threshold=EXACT_MATCH_THRESHOLD,
)

Entry point for a search query on the index

Parameters:

query (Any) –

Query value for the index. It must match the type of the column this index is defined on; for example, it would be a string if this is a text index.
drop_exact_match (bool, default: False ) –

If True, drop exact matches from the results
exact_match_threshold (float, default: EXACT_MATCH_THRESHOLD ) –

Similarity score threshold for exact matches: a result whose similarity score is above this value is treated as an exact match

Returns:

DefaultIndexQuery –

Chainable query builder object (see the example below)

Examples:

>>> index.scan("Are Orcas really whales?").select("id", "text").fetch(1)
[
    {
        'id': 1,
        'text': "Despite being commonly known as killer whales, orcas are actually the largest member of the dolphin family."
    }
]
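
The effect of drop_exact_match and exact_match_threshold described above can be pictured as a filter over scored results. This is only an illustrative sketch under stated assumptions: filter_exact_matches and ASSUMED_THRESHOLD are hypothetical names, the real value of EXACT_MATCH_THRESHOLD is not given here, and the library may apply the filter differently.

```python
# Hypothetical stand-in value; the actual EXACT_MATCH_THRESHOLD is not documented here.
ASSUMED_THRESHOLD = 0.9999


def filter_exact_matches(hits, drop_exact_match=False,
                         exact_match_threshold=ASSUMED_THRESHOLD):
    """Filter a list of (row, similarity) pairs.

    A hit counts as an exact match when its similarity is above the threshold.
    Exact matches are removed only if drop_exact_match is True.
    """
    if not drop_exact_match:
        return list(hits)
    return [(row, score) for row, score in hits if score <= exact_match_threshold]


# Made-up rows and scores, for illustration only.
hits = [({"id": 7}, 1.0), ({"id": 1}, 0.83), ({"id": 4}, 0.61)]
kept = filter_exact_matches(hits, drop_exact_match=True)
print([row["id"] for row, _ in kept])  # [1, 4]
```

Dropping exact matches is useful when the query itself is stored in the indexed table and should not be returned as its own nearest neighbor.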

vector_scan #

vector_scan(
    query,
    drop_exact_match=False,
    exact_match_threshold=EXACT_MATCH_THRESHOLD,
)

Entry point for a vector search query on the index that returns a results batch

Parameters:

query (Any) –

A batch of queries to scan the index with. This can be a list of vectors, each represented as a list of floats, or a list of values matching the column type (for example, strings if this is a text index). A single value is also accepted and is treated as a batch of one.
drop_exact_match (bool, default: False ) –

If True, drop exact matches from the results
exact_match_threshold (float, default: EXACT_MATCH_THRESHOLD ) –

Similarity score threshold for exact matches: a result whose similarity score is above this value is treated as an exact match

Returns:

VectorIndexQuery –

Chainable query handle object (see the examples below)

Examples:

>>> res = (
...     index.vector_scan(torch.rand(2, index.embedding_dim).tolist())
...     .select("$embedding", "label")
...     .fetch(10)
... )
>>> res.to_tensor("$embedding").shape, res.to_tensor("$embedding").dtype
torch.Size([2, 10, 768]), torch.float64
>>> res.to_tensor("label").shape, res.to_tensor("label").dtype
torch.Size([2, 10]), torch.int64

>>> res = index.vector_scan("I love Orcas").select("text", "label").fetch(2)
>>> res.to_records_list()
[
    [
        {
            'text': "Orcas use sophisticated hunting techniques.",
            'label': 1
        },
        {
            'text': "Orcas can swim at speeds up to 34 miles per hour.",
            'label': 1
        }
    ]
]
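
The rule that a single query value is treated as a batch of one can be sketched as follows. This is an assumption about the behavior, not the library's code: as_batch is a hypothetical helper, and the real implementation may distinguish single values from batches differently.

```python
def as_batch(query):
    """Return the query as a list of queries, wrapping a single value if needed."""
    if isinstance(query, str):
        return [query]  # one text query
    if isinstance(query, (list, tuple)):
        if len(query) > 0 and all(isinstance(x, (int, float)) for x in query):
            return [list(query)]  # one vector given as a flat list of numbers
        return list(query)  # already a batch of strings or vectors
    return [query]  # any other single value


print(as_batch("I love Orcas"))          # ['I love Orcas']
print(as_batch([0.1, 0.2, 0.3]))         # [[0.1, 0.2, 0.3]]
print(as_batch([[0.1, 0.2], [0.3, 0.4]]))  # [[0.1, 0.2], [0.3, 0.4]]
```

The batch length is what appears as the first dimension of the results: in the examples above, two query vectors with fetch(10) give a label tensor of shape [2, 10], and the single string query gives an outer list with one entry.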

get_status #

get_status()

Get the status of this index.

Returns:

CatchupStatus –

The processing status of the index

Examples:

>>> index.get_status()
'COMPLETED'
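
A common use of get_status is to wait until the index has finished processing before querying it. The sketch below assumes that the returned status compares equal to the string 'COMPLETED', as the example above suggests. wait_until_completed and FakeIndex are hypothetical names, and "IN_PROGRESS" is a made-up placeholder status used only to run the demo.

```python
import time


def wait_until_completed(index, timeout_s=60.0, poll_interval_s=1.0):
    """Poll index.get_status() until it reports 'COMPLETED' or the timeout expires.

    Returns True if the index completed in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if index.get_status() == "COMPLETED":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


class FakeIndex:
    """Hypothetical stand-in for an IndexHandle, used only to run this sketch."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def get_status(self):
        return next(self._statuses)


# "IN_PROGRESS" is a placeholder, not a documented status value.
fake = FakeIndex(["IN_PROGRESS", "COMPLETED"])
print(wait_until_completed(fake, poll_interval_s=0.01))  # True
```

With a real handle, pass the index returned by the database handle instead of FakeIndex.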

embed #

embed(text, result_format='pt')

Encode text into vectors using the index’s embedding model.

Parameters:

text (str | list[str]) –

Text to encode. Can be a single string or a list of strings.
result_format (Literal['pt', 'list'], default: 'pt' ) –

Format of the result: "pt" for a PyTorch tensor or "list" for a list of lists of floats.

Returns:

list[list[float]] | Tensor –

The embeddings of the text in the format specified by result_format.

Examples:

>>> index.embed("I love Orcas", result_format="list")
[[0.1, 0.2, 0.3, ...]]
>>> x = index.embed(["I love Orcas", "Orcas are cool"], result_format="pt")
>>> x.shape, x.dtype
(torch.Size([2, 768]), torch.float32)
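
With result_format="list", the embeddings are plain lists of floats and can be compared without PyTorch. The following sketch computes the cosine similarity between two such vectors. Cosine similarity is chosen here for illustration only; this page does not state which similarity measure the index uses, and the short vectors below are made-up stand-ins for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Stand-ins for the output of index.embed([...], result_format="list").
embeddings = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
print(round(cosine_similarity(embeddings[0], embeddings[1]), 6))  # 0.6
```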
