orcalib.memoryset.embedding_models#

EmbeddingModelMeta #

Bases: type

CLIP_BASE property #

CLIP_BASE

CLIP-L14 embedding model

GTE_BASE property #

GTE_BASE

Alibaba GTE-Base v1.5 embedding model
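
For illustration, the predefined models can be accessed directly as class properties; a minimal sketch using only the module path and property names documented on this page:

```python
from orcalib.memoryset.embedding_models import EmbeddingModel

# Predefined models are exposed as class properties on EmbeddingModel
clip = EmbeddingModel.CLIP_BASE  # CLIP-L14 embedding model
gte = EmbeddingModel.GTE_BASE    # Alibaba GTE-Base v1.5 embedding model
```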

EmbeddingModel #

EmbeddingModel(
    name, version=0, embedding_dim=None, tokenizer=None
)

Embedding models for use with memorysets

Warning

Only the models that are available as class properties, like EmbeddingModel.CLIP_BASE, and fine-tuned versions of them are guaranteed to work.

Parameters:

  • name (str) –

    the name of the model to use; can be a HuggingFace model name or a path to a locally saved model. Only models that are available as class properties, like EmbeddingModel.CLIP_BASE, and fine-tuned versions of them are guaranteed to work

  • version (int, default: 0 ) –

    optional version number of the model to use; this is only used for default models

  • embedding_dim (int | None, default: None ) –

    optional override for the embedding dimension, in case it is not correctly specified in the model config

  • tokenizer (str | None, default: None ) –

    optional name of a tokenizer model to use; defaults to name if not given
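
As a sketch of how the constructor parameters fit together; the local path, version, and tokenizer values below are hypothetical and only illustrate the documented signature:

```python
from orcalib.memoryset.embedding_models import EmbeddingModel

# Load a fine-tuned model from a local path (path is hypothetical).
# embedding_dim only needs to be passed if the saved config misreports it,
# and tokenizer only if it differs from the model name.
model = EmbeddingModel(
    "./models/my-finetuned-gte",
    version=1,
    embedding_dim=768,
    tokenizer="Alibaba-NLP/gte-base-en-v1.5",
)
```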

embed #

embed(data, show_progress_bar=False, batch_size=32)

Generate embeddings for the given input

Parameters:

  • data (InputType | list[InputType]) –

    the data to encode; a scalar input will be converted to a list

  • show_progress_bar (bool, default: False ) –

    whether to show a progress bar

  • batch_size (int, default: 32 ) –

    the size of the batches to use

Returns:

  • ndarray

    matrix of embeddings with shape len(data) x embedding_dim
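
A minimal usage sketch of embed, assuming the GTE_BASE property shown above and text inputs; the example texts and the printed dimension are illustrative:

```python
from orcalib.memoryset.embedding_models import EmbeddingModel

# Embed a small batch of texts; a single string would also be accepted
# and converted to a list internally.
texts = ["first example", "second example", "third example"]
embeddings = EmbeddingModel.GTE_BASE.embed(
    texts, show_progress_bar=False, batch_size=32
)
print(embeddings.shape)  # (len(texts), embedding_dim), e.g. (3, 768)
```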