orca_sdk.embedding_model

PretrainedEmbeddingModel

Bases: _EmbeddingModel

A pretrained embedding model

Models:

OrcaCloud supports a select number of small- to medium-sized embedding models that perform well on the Hugging Face MTEB Leaderboard. These can be accessed as class attributes. We currently support:

Examples:

>>> PretrainedEmbeddingModel.CDE_SMALL
PretrainedEmbeddingModel({name: CDE_SMALL, embedding_dim: 768, max_seq_length: 512})

Attributes:

  • name

    Name of the pretrained embedding model

  • embedding_dim

    Dimension of the embeddings that are generated by the model

  • max_seq_length

    Maximum input length (in tokens, not characters) that this model can process. Longer inputs are truncated during the embedding process

  • uses_context

    Whether the pretrained embedding model uses the memoryset to contextualize embeddings
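
The truncation described for max_seq_length can be pictured with a minimal sketch. This is not the SDK's tokenizer: whitespace splitting stands in for the model's real tokenization, and `truncate_to_max_seq_length` is a hypothetical helper introduced only for illustration.

```python
def truncate_to_max_seq_length(text: str, max_seq_length: int) -> list[str]:
    """Illustrative only: keep the first `max_seq_length` tokens of `text`."""
    # Assumption: whitespace splitting stands in for the model's real tokenizer.
    tokens = text.split()
    # Tokens beyond the limit are dropped, so they cannot influence the embedding.
    return tokens[:max_seq_length]


tokens = truncate_to_max_seq_length("a b c d e", max_seq_length=3)
short = truncate_to_max_seq_length("a b", max_seq_length=3)
```

The limit counts tokens, so a text that looks short in characters can still be cut off if it tokenizes into many pieces.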

all classmethod

all()

List all pretrained embedding models in the OrcaCloud

Returns:

exists classmethod

exists(name)

Check if a pretrained embedding model exists by name

Parameters:

  • name (str) –

    The name of the pretrained embedding model

Returns:

  • bool

    True if the pretrained embedding model exists, False otherwise

finetune

finetune(
    name,
    train_datasource,
    *,
    eval_datasource=None,
    label_column="label",
    value_column="value",
    training_method=EmbeddingFinetuningMethod.CLASSIFICATION,
    training_args=None,
    if_exists="error"
)

Finetune an embedding model

Parameters:

  • name (str) –

    Name of the finetuned embedding model

  • train_datasource (Datasource | LabeledMemoryset) –

    Data to train on

  • eval_datasource (Datasource | None, default: None ) –

    Optionally provide data to evaluate on

  • label_column (str, default: 'label' ) –

    Column name of the label

  • value_column (str, default: 'value' ) –

    Column name of the value

  • training_method (EmbeddingFinetuningMethod | str, default: CLASSIFICATION ) –

    Training method to use

  • training_args (dict | None, default: None ) –

    Optional override for Hugging Face TrainingArguments. If not provided, reasonable training arguments will be used for the specified training method

  • if_exists (CreateMode, default: 'error' ) –

    What to do if a finetuned embedding model with the same name already exists. Defaults to "error"; the other option, "open", opens the existing finetuned embedding model.

Returns:

Raises:

  • ValueError

    If the finetuned embedding model already exists and if_exists is "error", or if if_exists is "open" but the base model does not match that of the existing model

Examples:

>>> datasource = Datasource.open("my_datasource")
>>> model = PretrainedEmbeddingModel.CLIP_BASE
>>> model.finetune("my_finetuned_model", datasource)
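
The if_exists="open" option makes a finetuning call safe to repeat, for example when a script is rerun. A minimal sketch follows, assuming only the finetune signature documented above; `finetune_or_open` is a hypothetical helper and `_StubBaseModel` is an offline stand-in that mimics the documented behavior, not the SDK's implementation.

```python
def finetune_or_open(base_model, name, train_datasource, **finetune_kwargs):
    """Finetune `base_model`, or reuse an existing finetuned model with the same name."""
    return base_model.finetune(
        name, train_datasource, if_exists="open", **finetune_kwargs
    )


class _StubBaseModel:
    """Offline stand-in for a PretrainedEmbeddingModel; not part of orca_sdk."""

    def __init__(self):
        self._finetuned = {}

    def finetune(self, name, train_datasource, *, if_exists="error", **kwargs):
        if name in self._finetuned:
            if if_exists == "error":
                raise ValueError(f"{name!r} already exists")
            return self._finetuned[name]  # "open": hand back the existing model
        handle = {"name": name, "train_datasource": train_datasource, **kwargs}
        self._finetuned[name] = handle
        return handle


base = _StubBaseModel()
first = finetune_or_open(base, "my_finetuned_model", "my_datasource", label_column="label")
second = finetune_or_open(base, "my_finetuned_model", "my_datasource", label_column="label")
```

In this sketch the second call returns the existing model instead of raising. Per the Raises section, a real call can still raise ValueError if the existing model was finetuned from a different base model.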

FinetunedEmbeddingModel

Bases: _EmbeddingModel

A finetuned embedding model in the OrcaCloud

Attributes:

  • name

    Name of the finetuned embedding model

  • embedding_dim

    Dimension of the embeddings that are generated by the model

  • max_seq_length

    Maximum input length (in tokens, not characters) that this model can process. Longer inputs are truncated during the embedding process

  • uses_context

    Whether the model uses the memoryset to contextualize embeddings (this acts akin to inverse document frequency in TF-IDF features)

  • id (str) –

    Unique identifier of the finetuned embedding model

  • base_model (PretrainedEmbeddingModel) –

    Base model the finetuned embedding model was trained on

  • created_at (datetime) –

    When the model was finetuned

base_model property

base_model

Pretrained model the finetuned embedding model was based on

all classmethod

all()

List all finetuned embedding model handles in the OrcaCloud

Returns:

open classmethod

open(name)

Get a handle to a finetuned embedding model in the OrcaCloud

Parameters:

  • name (str) –

    The name or unique identifier of a finetuned embedding model

Returns:

Raises:

  • LookupError

    If the finetuned embedding model does not exist

exists classmethod

exists(name_or_id)

Check if a finetuned embedding model with the given name or id exists.

Parameters:

  • name_or_id (str) –

    The name or id of the finetuned embedding model

Returns:

  • bool

    True if the finetuned embedding model exists, False otherwise

drop classmethod

drop(name_or_id, *, if_not_exists='error')

Delete the finetuned embedding model from the OrcaCloud

Parameters:

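  • name_or_id (str) –

    The name or id of the finetuned embedding model

  • if_not_exists (default: 'error' ) –

    What to do if the finetuned embedding model does not exist. With "error", a LookupError is raised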

Raises:

  • LookupError

    If the finetuned embedding model does not exist and if_not_exists is "error"