orcalib.index_handle

IndexHandle

IndexHandle(
    name,
    db_name,
    table_name,
    column_name,
    column_type,
    embedding_type,
    index_type,
    artifact_columns,
    embedding_model=None,
)

A handle to an index in an Orca database.

This class is usually not instantiated directly. Instead, obtain a handle through methods on a database handle, such as db.get_index or db.create_vector_index.

Parameters:

  • name (str) –

    Name of this index

  • db_name (str) –

    Database that this index belongs to

  • table_name (str) –

    Table that this index belongs to

  • column_name (ColumnName) –

    Name of the column that this index is built on

  • column_type (OrcaTypeHandle) –

    Type of the column that this index is built on

  • embedding_type (OrcaTypeHandle) –

    Type of the vector embedding used by this index (if any)

  • index_type (str) –

    Type of this index

  • artifact_columns (dict[ColumnName, str | OrcaTypeHandle]) –

    Artifact columns that are available from the index

  • embedding_model (default: None ) –

    Embedding model used by this index (if any)

db property

db

The database handle for the database that this index belongs to

table property

table

The table handle for the table that this index belongs to

embedding_dim property

embedding_dim

Get the embedding dimension of this index (if any).

Returns:

  • int | None

    Embedding dimension if this index has an embedding, None otherwise

Raises:

Examples:

>>> index.embedding_dim
768
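Because embedding_dim can be None, code that builds raw query vectors for vector_scan may want to check their dimensions first. The following is a minimal sketch; check_query_dims is a hypothetical helper (not part of orcalib), and it assumes that an index without an embedding cannot serve raw-vector queries.

```python
from typing import Optional, Sequence


def check_query_dims(
    embedding_dim: Optional[int], vectors: Sequence[Sequence[float]]
) -> None:
    """Raise ValueError if any query vector does not match the index's dimension.

    embedding_dim is the value of index.embedding_dim. None is taken to mean
    the index has no embedding (assumption: raw-vector queries are then unusable).
    """
    if embedding_dim is None:
        raise ValueError("index has no embedding; vector queries are not supported")
    for i, vec in enumerate(vectors):
        if len(vec) != embedding_dim:
            raise ValueError(
                f"query vector {i} has length {len(vec)}, expected {embedding_dim}"
            )
```

For example, check_query_dims(index.embedding_dim, queries) could run before index.vector_scan(queries), turning a shape mismatch into an early, readable error.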

scan

scan(
    query,
    drop_exact_match=False,
    exact_match_threshold=EXACT_MATCH_THRESHOLD,
)

Entry point for a search query on the index.

Parameters:

  • query (Any) –

    Query value for the index. It must match the type of the column this index is built on; for example, a string for a text index.

  • drop_exact_match (bool, default: False ) –

    If True, drop exact matches from the results

  • exact_match_threshold (float, default: EXACT_MATCH_THRESHOLD ) –

    Similarity-score threshold for exact matches: a result whose similarity score is above this value counts as an exact match

Returns:

Examples:

>>> index.scan("Are Orcas really whales?").select("id", "text").fetch(1)
[
    {
        'id': 1,
        'text': "Despite being commonly known as killer whales, orcas are actually the largest member of the dolphin family."
    }
]
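One plausible reading of drop_exact_match and exact_match_threshold is a post-filter on similarity scores: any hit scoring above the threshold is treated as a copy of the query and removed, which helps when the query value is itself stored in the index. The sketch below illustrates that reading only and is not orcalib's implementation. Results are assumed to be (score, record) pairs, and ASSUMED_EXACT_MATCH_THRESHOLD is a hypothetical stand-in because this reference does not give the value of EXACT_MATCH_THRESHOLD.

```python
from typing import Any, List, Tuple

# Hypothetical stand-in for EXACT_MATCH_THRESHOLD (real value not documented here).
ASSUMED_EXACT_MATCH_THRESHOLD = 0.9999


def drop_exact_matches(
    results: List[Tuple[float, Any]],
    threshold: float = ASSUMED_EXACT_MATCH_THRESHOLD,
) -> List[Tuple[float, Any]]:
    """Keep only (score, record) pairs whose score is not above the threshold."""
    return [(score, record) for score, record in results if score <= threshold]


hits = [(1.0, "query itself"), (0.82, "near neighbor"), (0.75, "other")]
print(drop_exact_matches(hits))  # [(0.82, 'near neighbor'), (0.75, 'other')]
```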

vector_scan

vector_scan(
    query,
    drop_exact_match=False,
    exact_match_threshold=EXACT_MATCH_THRESHOLD,
)

Entry point for a batched vector search query on the index. Returns a batch of results with one result list per query.

Parameters:

  • query (Any) –

    A batch of queries to scan the index with. This can be a list of vectors (each a list of floats) or a list of values matching the indexed column type, for example strings for a text index. A single value is treated as a batch of one.

  • drop_exact_match (bool, default: False ) –

    If True, drop exact matches from the results

  • exact_match_threshold (float, default: EXACT_MATCH_THRESHOLD ) –

    Similarity-score threshold for exact matches: a result whose similarity score is above this value counts as an exact match

Returns:

Examples:

>>> res = (
...     index.vector_scan(torch.rand(2, index.embedding_dim).tolist())
...     .select("$embedding", "label")
...     .fetch(10)
... )
>>> res.to_tensor("$embedding").shape, res.to_tensor("$embedding").dtype
(torch.Size([2, 10, 768]), torch.float64)
>>> res.to_tensor("label").shape, res.to_tensor("label").dtype
(torch.Size([2, 10]), torch.int64)
>>> res = index.vector_scan("I love Orcas").select("text", "label").fetch(2)
>>> res.to_records_list()
[
    [
        {
            'text': "Orcas use sophisticated hunting techniques.",
            'label': 1
        },
        {
            'text': "Orcas can swim at speeds up to 34 miles per hour.",
            'label': 1
        }
    ]
]
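The batching rule for query (a single value is treated as a list of one) explains why the single-string example above still returns a nested list containing one inner list of records. A minimal sketch of such a normalization step follows; normalize_batch is hypothetical and not part of orcalib, and it assumes vectors are plain Python lists of numbers.

```python
from typing import Any, List


def normalize_batch(query: Any) -> List[Any]:
    """Turn a single query value or a batch of queries into a list of queries.

    - A single vector (a list of numbers) becomes a batch of one vector.
    - A list of vectors or a list of strings is already a batch.
    - Any other single value (e.g. one string) becomes a batch of one.
    """
    if isinstance(query, list):
        if query and all(isinstance(x, (int, float)) for x in query):
            return [query]  # one vector, e.g. [0.1, 0.2, 0.3]
        return query  # already a batch
    return [query]  # single value, e.g. "I love Orcas"


print(normalize_batch("I love Orcas"))  # ['I love Orcas']
print(normalize_batch([0.1, 0.2, 0.3]))  # [[0.1, 0.2, 0.3]]
```

Checking element types is what separates one vector from a batch of values here; a fuller version would also need to accept tensors and other sequence types.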

get_status

get_status()

Get the status of this index.

Returns:

Examples:

>>> index.get_status()
'COMPLETED'
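Since get_status returns a string such as 'COMPLETED', a caller that has just created an index may want to block until it is ready. Below is a minimal polling sketch, assuming 'COMPLETED' marks a finished index. Other status values are not documented in this reference, so the helper (a hypothetical wait_for_index, not an orcalib function) simply waits for the target string and raises TimeoutError otherwise.

```python
import time
from typing import Callable


def wait_for_index(
    get_status: Callable[[], str],
    target: str = "COMPLETED",
    timeout_s: float = 60.0,
    poll_interval_s: float = 1.0,
) -> str:
    """Poll get_status() until it returns target or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index status is still {status!r} after {timeout_s}s")
        time.sleep(poll_interval_s)
```

Usage would look like wait_for_index(index.get_status, timeout_s=300). With this sketch a failed build only surfaces as a timeout, since failure statuses are not known here.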

embed

embed(text, result_format='pt')

Encode text into vectors using the index’s embedding model.

Parameters:

  • text (str | list[str]) –

    Text to encode. Can be a single string or a list of strings.

  • result_format (Literal['pt', 'list'], default: 'pt' ) –

    Format of the result: 'pt' for a PyTorch tensor or 'list' for a list of lists of floats.

Returns:

  • list[list[float]] | Tensor

    The embeddings of the text in the format specified by result_format.

Examples:

>>> index.embed("I love Orcas", result_format="list")
[[0.1, 0.2, 0.3, ...]]
>>> x = index.embed(["I love Orcas", "Orcas are cool"], result_format="pt")
>>> x.shape, x.dtype
(torch.Size([2, 768]), torch.float32)
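With result_format='list' the embeddings are plain Python lists, so they can be compared without PyTorch. The sketch below computes cosine similarity between two such vectors; cosine_similarity here is a self-written helper, and this reference does not say which similarity metric the index itself uses for its scan scores.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

For example, a, b = index.embed(["I love Orcas", "Orcas are cool"], result_format="list") followed by cosine_similarity(a, b). With the default 'pt' format, torch.nn.functional.cosine_similarity(x[0], x[1], dim=0) computes the same quantity on tensors.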