orca_sdk.embedding_model
PretrainedEmbeddingModel
Bases: _EmbeddingModel
A pretrained embedding model
Models:
OrcaCloud supports a select number of small- to medium-sized embedding models that perform well on the Hugging Face MTEB Leaderboard. These can be accessed as class attributes. We currently support:
- CDE_SMALL: Context-aware CDE small model from Hugging Face (jxm/cde-small-v1)
- CLIP_BASE: Multi-modal CLIP model from Hugging Face (sentence-transformers/clip-ViT-L-14)
- GTE_BASE: Alibaba’s GTE model from Hugging Face (Alibaba-NLP/gte-base-en-v1.5)
- DISTILBERT: DistilBERT embedding model from Hugging Face (distilbert-base-uncased)
- GTE_SMALL: GTE-Small embedding model from Hugging Face (Supabase/gte-small)
- E5_LARGE: E5-Large instruction-tuned embedding model from Hugging Face (intfloat/multilingual-e5-large-instruct)
- GIST_LARGE: GIST-Large embedding model from Hugging Face (avsolatorio/GIST-large-Embedding-v0)
- MXBAI_LARGE: Mixedbread’s large embedding model from Hugging Face (mixedbread-ai/mxbai-embed-large-v1)
- QWEN2_1_5B: Alibaba’s Qwen2-1.5B instruction-tuned embedding model from Hugging Face (Alibaba-NLP/gte-Qwen2-1.5B-instruct)
Examples:
Attributes:
- name – Name of the pretrained embedding model
- embedding_dim – Dimension of the embeddings generated by the model
- max_seq_length – Maximum input length (in tokens, not characters) that the model can process. Longer inputs are truncated during embedding.
- uses_context – Whether the pretrained embedding model uses context (for example, the memoryset) to contextualize embeddings
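To illustrate why max_seq_length counts tokens rather than characters, the sketch below truncates input with a toy whitespace tokenizer. This is an assumption for illustration only: real embedding models use subword tokenizers, so a single word can become several tokens, and the helper `truncate_to_max_tokens` is hypothetical, not part of orca_sdk.

```python
def truncate_to_max_tokens(text: str, max_seq_length: int) -> list:
    """Split text into toy tokens and keep at most max_seq_length of them.

    Hypothetical helper: whitespace splitting stands in for a real
    subword tokenizer, which usually yields more tokens per word.
    """
    tokens = text.split()
    # Tokens past the limit are dropped, mirroring truncation during embedding.
    return tokens[:max_seq_length]


sentence = "embedding models truncate long inputs by token count"
kept = truncate_to_max_tokens(sentence, max_seq_length=4)
print(kept)  # ['embedding', 'models', 'truncate', 'long']
```

The limit applies to the token list, so a short string of rare words may hit it sooner than a long string of common ones.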
embed
all (classmethod)
List all pretrained embedding models in the OrcaCloud
Returns:
- list[PretrainedEmbeddingModel] – A list of all pretrained embedding models available in the OrcaCloud
open (classmethod)
Open an embedding model by name.
This is an alternative method to access models for environments where IDE autocomplete for model names is not available.
Parameters:
- name (str) – Name of the model to open (e.g., “GTE_BASE”, “CLIP_BASE”)
Returns:
- PretrainedEmbeddingModel – The embedding model instance
Examples:
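The open-by-name approach described above can be sketched with `getattr` over class attributes. `ToyModelRegistry` and `ToyModel` are hypothetical stand-ins, not orca_sdk's implementation, and raising ValueError for unknown names is a choice made for this sketch; it only shows how a string such as "GTE_BASE" can resolve to the same object that is exposed as a class attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyModel:
    name: str
    hf_id: str


class ToyModelRegistry:
    # Models exposed as class attributes, in the style of PretrainedEmbeddingModel.GTE_BASE
    GTE_BASE = ToyModel("GTE_BASE", "Alibaba-NLP/gte-base-en-v1.5")
    CLIP_BASE = ToyModel("CLIP_BASE", "sentence-transformers/clip-ViT-L-14")

    @classmethod
    def open(cls, name: str) -> ToyModel:
        """Resolve a model from its attribute name; reject anything else."""
        model = getattr(cls, name, None)
        # Guards against names of non-model attributes such as "open".
        if not isinstance(model, ToyModel):
            raise ValueError(f"Unknown model name: {name!r}")
        return model


# Both access paths return the same object.
assert ToyModelRegistry.open("GTE_BASE") is ToyModelRegistry.GTE_BASE
```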
exists (classmethod)
finetune
finetune(
    name: str,
    train_datasource: Datasource | LabeledMemoryset,
    *,
    eval_datasource: Datasource | None = None,
    label_column: str = "label",
    value_column: str = "value",
    training_method: (
        EmbeddingFinetuningMethod | str
    ) = EmbeddingFinetuningMethod.CLASSIFICATION,
    training_args: dict | None = None,
    if_exists: CreateMode = "error",
    background: Literal[True]
) -> Job[FinetunedEmbeddingModel]

finetune(
    name: str,
    train_datasource: Datasource | LabeledMemoryset,
    *,
    eval_datasource: Datasource | None = None,
    label_column: str = "label",
    value_column: str = "value",
    training_method: (
        EmbeddingFinetuningMethod | str
    ) = EmbeddingFinetuningMethod.CLASSIFICATION,
    training_args: dict | None = None,
    if_exists: CreateMode = "error",
    background: Literal[False] = False
) -> FinetunedEmbeddingModel

finetune(
    name,
    train_datasource,
    *,
    eval_datasource=None,
    label_column="label",
    value_column="value",
    training_method=EmbeddingFinetuningMethod.CLASSIFICATION,
    training_args=None,
    if_exists="error",
    background=False
)
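The signatures above follow the `typing.overload` pattern: the `Literal` type of `background` lets a type checker infer whether `finetune` returns a job handle or the finished model. The sketch below reproduces that typing technique with hypothetical `ToyJob` and `ToyModel` classes; it does not call orca_sdk, and `ToyJob.result()` is an invented accessor.

```python
from __future__ import annotations

from typing import Generic, Literal, TypeVar, overload

T = TypeVar("T")


class ToyJob(Generic[T]):
    """Hypothetical job handle wrapping a result that is produced later."""

    def __init__(self, result: T) -> None:
        self._result = result

    def result(self) -> T:
        return self._result


class ToyModel:
    def __init__(self, name: str) -> None:
        self.name = name


@overload
def finetune(name: str, *, background: Literal[True]) -> ToyJob[ToyModel]: ...


@overload
def finetune(name: str, *, background: Literal[False] = False) -> ToyModel: ...


def finetune(name: str, *, background: bool = False) -> ToyModel | ToyJob[ToyModel]:
    model = ToyModel(name)  # stands in for the actual training work
    # In background mode the caller receives a handle instead of the model.
    return ToyJob(model) if background else model


job = finetune("my-model", background=True)
model = finetune("my-model")
print(type(job).__name__, job.result().name)  # ToyJob my-model
print(type(model).__name__)  # ToyModel
```

Only the last definition runs; the two `@overload` stubs exist purely so static type checkers can narrow the return type from the literal value passed for `background`.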
Finetune an embedding model
Parameters:
- name (str) – Name of the finetuned embedding model
- train_datasource (Datasource | LabeledMemoryset) – Data to train on
- eval_datasource (Datasource | None, default: None) – Optional data to evaluate on
- label_column (str, default: 'label') – Name of the column that contains the labels
- value_column (str, default: 'value') – Name of the column that contains the values
- training_method (EmbeddingFinetuningMethod | str, default: CLASSIFICATION) – Training method to use
- training_args (dict | None, default: None) – Optional override for the Hugging Face TrainingArguments. If not provided, reasonable training arguments are chosen for the specified training method.
- if_exists (CreateMode, default: 'error') – What to do if a finetuned embedding model with the same name already exists. Defaults to "error"; the other option is "open", which opens the existing finetuned embedding model.
- background (bool, default: False) – Whether to run the operation in the background and return a job handle
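As an illustration of label_column and value_column, the sketch below pulls (value, label) pairs out of plain dict rows using configurable column names. The rows and the `extract_pairs` helper are hypothetical; they only show what the two parameters select, not how orca_sdk reads a Datasource.

```python
def extract_pairs(rows, value_column="value", label_column="label"):
    """Return (value, label) tuples, reading each field from the named column."""
    return [(row[value_column], row[label_column]) for row in rows]


rows = [
    {"text": "great product", "sentiment": 1},
    {"text": "arrived broken", "sentiment": 0},
]
# Columns named differently from the defaults must be passed explicitly.
pairs = extract_pairs(rows, value_column="text", label_column="sentiment")
print(pairs)  # [('great product', 1), ('arrived broken', 0)]
```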
Returns:
- FinetunedEmbeddingModel | Job[FinetunedEmbeddingModel] – The finetuned embedding model, or a job handle for it when background is True
Raises:
- ValueError – If the finetuned embedding model already exists and if_exists is "error", or if if_exists is "open" but the existing model was trained on a different base model
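The if_exists rules above can be modeled with an in-memory store. `ToyFinetuneStore` and `ToyFinetunedModel` are hypothetical; the sketch mirrors only the documented behavior: "error" raises ValueError when the name is taken, and "open" returns the existing model unless it was trained from a different base model. Rejecting other if_exists values is an extra assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyFinetunedModel:
    name: str
    base_model: str


class ToyFinetuneStore:
    def __init__(self) -> None:
        self._models: dict[str, ToyFinetunedModel] = {}

    def finetune(self, base_model: str, name: str, if_exists: str = "error") -> ToyFinetunedModel:
        existing = self._models.get(name)
        if existing is not None:
            if if_exists == "error":
                raise ValueError(f"Finetuned model {name!r} already exists")
            if if_exists == "open":
                # Reuse the existing model only if it came from the same base.
                if existing.base_model != base_model:
                    raise ValueError(
                        f"{name!r} was trained on {existing.base_model!r}, not {base_model!r}"
                    )
                return existing
            raise ValueError(f"Unsupported if_exists value: {if_exists!r}")
        model = ToyFinetunedModel(name, base_model)  # stands in for training
        self._models[name] = model
        return model


store = ToyFinetuneStore()
first = store.finetune("GTE_BASE", "my-model")
assert store.finetune("GTE_BASE", "my-model", if_exists="open") is first
```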
Examples:
FinetunedEmbeddingModel
Bases: _EmbeddingModel
A finetuned embedding model in the OrcaCloud
Attributes:
- name – Name of the finetuned embedding model
- embedding_dim – Dimension of the embeddings generated by the model
- max_seq_length – Maximum input length (in tokens, not characters) that the model can process. Longer inputs are truncated during embedding.
- uses_context – Whether the model uses the memoryset to contextualize embeddings (similar in spirit to inverse document frequency in TF-IDF features)
- id (str) – Unique identifier of the finetuned embedding model
- base_model (PretrainedEmbeddingModel) – Base model the finetuned embedding model was trained on
- created_at (datetime) – When the model was finetuned
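The uses_context attribute is compared to inverse document frequency (IDF). For readers less familiar with that analogy, the sketch below computes classic IDF weights, idf(t) = log(N / df(t)), over a toy collection of memories: terms that appear in many memories get low weight and rare ones get high weight. This illustrates the TF-IDF idea only, not how a context-aware model actually computes its embeddings; `inverse_document_frequency` is a hypothetical helper.

```python
from __future__ import annotations

import math
from collections import Counter


def inverse_document_frequency(documents: list[str]) -> dict[str, float]:
    """Compute idf(t) = log(N / df(t)) over whitespace-tokenized documents."""
    n_docs = len(documents)
    # set() ensures each term is counted at most once per document.
    doc_freq = Counter(term for doc in documents for term in set(doc.split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


memories = [
    "the order arrived late",
    "the order was damaged",
    "the refund was processed",
]
idf = inverse_document_frequency(memories)
print(round(idf["the"], 3))     # 0.0 (appears in every memory)
print(round(idf["refund"], 3))  # 1.099 (appears in one memory)
```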
embed
all (classmethod)
List all finetuned embedding model handles in the OrcaCloud
Returns:
- list[FinetunedEmbeddingModel] – A list of all finetuned embedding model handles in the OrcaCloud
open (classmethod)
Get a handle to a finetuned embedding model in the OrcaCloud
Parameters:
- name (str) – The name or unique identifier of a finetuned embedding model
Returns:
- FinetunedEmbeddingModel – A handle to the finetuned embedding model in the OrcaCloud
Raises:
- LookupError – If the finetuned embedding model does not exist
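Because open accepts either a name or a unique identifier and raises LookupError when neither matches, its lookup can be sketched over an in-memory list. The `ToyFinetunedModel` records, the `open_model` helper, and the example names and ids are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ToyFinetunedModel:
    id: str
    name: str


def open_model(models: list[ToyFinetunedModel], name_or_id: str) -> ToyFinetunedModel:
    """Return the model whose name or id matches, else raise LookupError."""
    for model in models:
        if name_or_id in (model.name, model.id):
            return model
    raise LookupError(f"Finetuned embedding model {name_or_id!r} not found")


models = [ToyFinetunedModel(id="ft_123", name="support-tickets")]
# Looking up by name or by id yields the same handle.
assert open_model(models, "support-tickets") is open_model(models, "ft_123")
```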
exists (classmethod)
drop (classmethod)
Delete the finetuned embedding model from the OrcaCloud
Parameters:
- name_or_id (str) – The name or id of the finetuned embedding model
Raises:
- LookupError – If the finetuned embedding model does not exist and if_not_exists is "error"
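The drop behavior can be sketched the same way. Only the "error" value of if_not_exists is documented here; treating any other value (such as the hypothetical "ignore") as a silent no-op, using "error" as the default, and the `drop_model` helper with its {id: name} dict store are all assumptions made for this illustration.

```python
from __future__ import annotations


def drop_model(models: dict[str, str], name_or_id: str, if_not_exists: str = "error") -> None:
    """Delete a model from a hypothetical {id: name} store by name or id."""
    for model_id, name in list(models.items()):
        if name_or_id in (model_id, name):
            del models[model_id]
            return
    if if_not_exists == "error":
        raise LookupError(f"Finetuned embedding model {name_or_id!r} does not exist")
    # Assumed behavior for any other value, e.g. "ignore": do nothing.


models = {"ft_123": "support-tickets"}
drop_model(models, "support-tickets")
assert models == {}
drop_model(models, "support-tickets", if_not_exists="ignore")  # no error raised
```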