orcalib.memoryset.embedding_models
EmbeddingModelMeta
EmbeddingModel
Embedding models for use with memorysets
Warning
Only the models that are available as class properties like EmbeddingModel.CLIP_BASE, as well as fine-tuned versions of them, are guaranteed to work.
Parameters:
- name (str) – the name of the model to use; can be a HuggingFace model name or a path to a locally saved model. Only models that are available as class properties like EmbeddingModel.CLIP_BASE, as well as fine-tuned versions of them, are guaranteed to work.
- version (int, default: 0) – optional version number of the model to use; this is only used for default models
- embedding_dim (int | None, default: None) – optional override for the embedding dimension, in case it is not correctly specified in the config
- tokenizer (str | None, default: None) – optional name of a tokenizer model to use; if not given, it will be the same as name
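The defaults described above can be sketched with a small stand-in class. This is not orcalib's implementation: `ModelSpecSketch`, `resolved_tokenizer`, `resolved_dim`, and the model name `my-org/my-model` are hypothetical names used only to illustrate how the documented fallbacks might behave.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelSpecSketch:
    """Hypothetical stand-in that mirrors the documented parameters."""

    name: str
    version: int = 0
    embedding_dim: Optional[int] = None
    tokenizer: Optional[str] = None

    def resolved_tokenizer(self) -> str:
        # When no tokenizer is given, the model name is used instead.
        return self.tokenizer if self.tokenizer is not None else self.name

    def resolved_dim(self, config_dim: int) -> int:
        # An explicit embedding_dim takes precedence over the config value.
        return self.embedding_dim if self.embedding_dim is not None else config_dim


# "my-org/my-model" is a placeholder, not a real model name.
spec = ModelSpecSketch(name="my-org/my-model")
print(spec.resolved_tokenizer())
print(spec.resolved_dim(config_dim=768))
```

Passing `embedding_dim` explicitly is only needed when the model config reports the wrong dimension; otherwise the config value is used.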
embed
Generate embeddings for the given input
Parameters:
- data (InputType | list[InputType]) – the data to encode, will be converted to a list if a scalar is given
- show_progress_bar (bool, default: False) – whether to show a progress bar
- batch_size (int, default: 32) – the size of the batches to use
Returns:
- ndarray – matrix with embeddings of shape len_data x embedding_dim
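The input and output contract of embed can be illustrated with a toy stand-in written in plain Python. It is a sketch under stated assumptions, not the library's code: `embed_sketch` and `toy_encode` are hypothetical, inputs are assumed to be strings, and nested lists stand in for the ndarray that the real method returns.

```python
from typing import List, Sequence, Union

Vector = List[float]


def toy_encode(batch: Sequence[str], embedding_dim: int) -> List[Vector]:
    """Hypothetical encoder: maps each string to a fixed-length vector."""
    return [[float(len(text))] * embedding_dim for text in batch]


def embed_sketch(
    data: Union[str, List[str]],
    batch_size: int = 32,
    embedding_dim: int = 4,
) -> List[Vector]:
    # A scalar input is wrapped in a list, as the parameter description states.
    items = [data] if isinstance(data, str) else list(data)
    rows: List[Vector] = []
    # Encode in chunks of at most batch_size items.
    for start in range(0, len(items), batch_size):
        rows.extend(toy_encode(items[start:start + batch_size], embedding_dim))
    # One row per input item, each with embedding_dim entries.
    return rows


matrix = embed_sketch(["a", "bb", "ccc"], batch_size=2, embedding_dim=3)
print(len(matrix), len(matrix[0]))
```

Whatever batch_size is chosen, the result has one row per input item; batching only affects how many items are encoded at once.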