Embedding Models#
This guide explains what embedding models are and how they work in OrcaCloud. You will learn about the available pretrained embedding models, how to generate embeddings manually, and how to finetune embedding models to improve performance for your specific use case.
What are Embedding Models?#
Embedding models are at the heart of retrieval-augmented systems. They convert text (or other data types) into dense vector representations called embeddings. These embeddings capture semantic meaning in a way that allows for efficient similarity comparisons. In OrcaCloud, embedding models serve two critical functions:
- Memory Indexing: When you add memories to a memoryset, the embedding model converts each memory’s value into a vector that is stored in OrcaCloud. This enables fast semantic search for similar memories.
- Query Embedding: During inference, the same embedding model converts the input query into a vector, which is then used to find the most similar memories in the memoryset.
The quality of these embeddings directly impacts the performance of your retrieval-augmented models. Better embeddings lead to more relevant memory lookups, which in turn lead to more accurate predictions.
Pretrained Embedding Models#
OrcaCloud provides several pretrained embedding models that perform well on the Hugging Face MTEB Leaderboard. These models can be accessed as class attributes of the PretrainedEmbeddingModel class:
- GTE_BASE: Alibaba’s GTE model is a general-purpose embedding model that performs well across a wide range of tasks. It has a 768-dimensional embedding space and can handle sequences up to 8192 tokens in length.
- CDE_SMALL: The context-aware CDE small model is designed to generate embeddings that take into account both the document and its neighboring context, rather than just encoding documents in isolation. This contextual awareness helps it better capture relationships between documents and achieve stronger performance, especially on out-of-domain tasks. It has a 768-dimensional embedding space and can handle sequences up to 512 tokens in length. The model achieves state-of-the-art results on the MTEB benchmark.
Using a Pretrained Model#
To use a pretrained model, you can simply access it as a class attribute:
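For example, here is a minimal sketch; the orcalib import path and the max_seq_length attribute access are assumptions and may differ in your SDK version:

```python
from orcalib import PretrainedEmbeddingModel  # import path is an assumption

# Pretrained models are exposed as class attributes
embedding_model = PretrainedEmbeddingModel.GTE_BASE

# The model's maximum sequence length in tokens (attribute access is assumed)
print(embedding_model.max_seq_length)  # 8192 for GTE_BASE
```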
List All Pretrained Models#
You can list all pretrained embedding models that are currently available in OrcaCloud using the PretrainedEmbeddingModel.all class method:
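For example, a minimal sketch using the same assumed import as above:

```python
from orcalib import PretrainedEmbeddingModel  # import path is an assumption

# Iterate over every pretrained model currently available in OrcaCloud
for model in PretrainedEmbeddingModel.all():
    print(model)
```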
We are always adding new models to OrcaCloud, so make sure to check back regularly to see the latest additions and contact us if there is a specific model you’d like to try with Orca.
Generate Embeddings#
While memorysets and models handle embedding generation automatically, you can also generate embeddings manually using the embed method of an embedding model. This can be useful for debugging, visualization, or custom similarity calculations.
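For example, a minimal sketch using the embedding_model handle from above; the return type and the max_seq_length keyword on embed are assumptions based on the note below:

```python
# Embed a single string; the return type (list vs. array) depends on the SDK
embedding = embedding_model.embed("How do I reset my password?")
print(len(embedding))  # 768 for GTE_BASE

# Optionally cap the sequence length in tokens (keyword name is assumed)
short_embedding = embedding_model.embed("Short query", max_seq_length=128)
```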
- You can optionally specify a maximum sequence length to improve performance if you know your inputs will be shorter than the model’s default maximum. This value needs to be less than or equal to the model’s max_seq_length and is specified in tokens, not characters.
To embed multiple texts at once, pass a list of strings to the embed method:
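For example, a sketch that assumes one embedding is returned per input text:

```python
embeddings = embedding_model.embed([
    "How do I reset my password?",
    "Where can I update my billing details?",
])
print(len(embeddings))  # 2, one embedding per input text
```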
Finetuned Embedding Models#
While pretrained embedding models work well for many applications, you can often achieve better performance by finetuning an embedding model on your specific data. OrcaCloud makes it easy to finetune embedding models for your memorysets.
When to Finetune#
Consider finetuning an embedding model when:
- Your data has domain-specific terminology or concepts that general-purpose models might not capture well
- You need to distinguish between subtle differences in your data that general models might consider similar
- You want to optimize for a specific task like classification or clustering
Finetune an Embedding Model#
To create a finetuned embedding model for your memoryset, you can use the finetune method:
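For example, a minimal sketch of a finetuning call; the argument order follows the parameter descriptions below, while LabeledMemoryset.open and the specific training_args values are illustrative assumptions:

```python
from orcalib import LabeledMemoryset, PretrainedEmbeddingModel  # import path is an assumption

# Labeled data to finetune on (opening an existing memoryset by name is assumed)
memoryset = LabeledMemoryset.open("my-memoryset")

finetuned_model = PretrainedEmbeddingModel.GTE_BASE.finetune(
    "my-finetuned-gte-base",                # unique name for the finetuned model
    memoryset,                              # a LabeledMemoryset or a Datasource
    training_method="classification",       # default method, trains a classification head
    training_args={"num_train_epochs": 1},  # optional Hugging Face Trainer hyperparameters (illustrative)
    if_exists="open",                       # reuse an existing model with the same name
)
```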
- You can finetune any pretrained model, but we recommend starting with the GTE_BASE model as it is a general-purpose model that performs well on a wide range of tasks.
- You must specify a unique name for the finetuned model.
- The second argument is the data to train your model on. This can either be a Datasource or a LabeledMemoryset. If you pass a custom Datasource, you may have to specify the value_column and label_column parameters to indicate which columns contain the text and label data respectively.
- The training_method parameter specifies the finetuning method to use. For example, "classification" will finetune the model by adding a logistic regression head on top of the embedding model and then training on the labeled examples. "batch_triplet_loss" will finetune the model with a batched triplet loss function. We suggest starting with the default "classification" method as it is easier to train.
- You can optionally pass a dictionary of training arguments to the training_args parameter. This generally accepts all simple value hyperparameters of the Hugging Face Trainer class. We automatically set reasonable defaults for the training arguments for the specified training method. Tread carefully with more complex training arguments and consider getting in touch with our team of experts if you need help.
- The if_exists parameter specifies what to do if a finetuned model with the same name already exists. If you pass "open", a handle to the existing model will simply be returned. This is useful in Jupyter notebooks when re-executing cells.
To use the finetuned model in a memoryset, create or make a copy of a memoryset and pass the finetuned model to the embedding_model parameter:
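For example, a minimal sketch; the clone method name and open-by-name call are assumptions, so substitute whichever call you use to create or copy a memoryset:

```python
from orcalib import LabeledMemoryset  # import path is an assumption

original_memoryset = LabeledMemoryset.open("my-memoryset")  # open-by-name is assumed
finetuned_memoryset = original_memoryset.clone(             # clone is an assumed method name
    "my-memoryset-finetuned",         # unique name for the new memoryset
    embedding_model=finetuned_model,  # re-embed the memories with the finetuned model
)
```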
- You must specify a unique name for the new memoryset with the finetuned model.
Managing Finetuned Models#
To open an existing finetuned model, use the FinetunedEmbeddingModel.open method:
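For example, a minimal sketch assuming the same import path as above:

```python
from orcalib import FinetunedEmbeddingModel  # import path is an assumption

finetuned_model = FinetunedEmbeddingModel.open("my-finetuned-gte-base")
```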
Similar to PretrainedEmbeddingModel, you can list all your finetuned models with the FinetunedEmbeddingModel.all method:
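For example, a minimal sketch using the all class method:

```python
# Iterate over all finetuned embedding models in your OrcaCloud account
for model in FinetunedEmbeddingModel.all():
    print(model)
```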
To delete a finetuned model when you no longer need it, use the FinetunedEmbeddingModel.drop method:
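For example, a sketch; dropping by name as a class method is an assumption, and the if_not_exists parameter is described in the notes below:

```python
# Delete the finetuned model by name; do nothing if it does not exist
FinetunedEmbeddingModel.drop("my-finetuned-gte-base", if_not_exists="ignore")
```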
- Note that you cannot delete embedding models that are currently in use in a memoryset.
- The if_not_exists parameter specifies what to do if the model does not exist. If you pass "ignore", the method will do nothing if the model does not exist. If you pass "raise", the method will raise an error.
Choosing an Embedding Model#
Selecting the right embedding model for your use case is crucial for optimal performance. Here are some considerations:
- Domain: If your data is from a specialized domain (e.g., medical, legal), consider using a context-aware model like CDE_SMALL or finetuning a model on domain-specific data.
- Sequence Length: Choose a model with a maximum sequence length that accommodates your data. For example, GTE_BASE can handle up to 8192 tokens, while CDE_SMALL is limited to 512 tokens.
- Performance vs. Speed: Larger models generally provide better embeddings but may be slower. Consider your latency requirements.
The best approach is often to experiment with different models and evaluate their performance on your specific task. Contact our team of ML experts if you need help choosing the right model for your use case.