orca_sdk.embedding_model
PretrainedEmbeddingModel
Bases: _EmbeddingModel
A pretrained embedding model
Models:
OrcaCloud supports a small selection of small- to medium-sized embedding models that perform well on the Hugging Face MTEB Leaderboard. They are available as class attributes. We currently support:
- CDE_SMALL: Context-aware CDE small model from Hugging Face (jxm/cde-small-v1)
- CLIP_BASE: Multi-modal CLIP model from Hugging Face (sentence-transformers/clip-ViT-L-14)
- GTE_BASE: Alibaba’s GTE model from Hugging Face (Alibaba-NLP/gte-base-en-v1.5)
Examples:
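A minimal sketch that reads only the four attributes documented below (`name`, `embedding_dim`, `max_seq_length`, `uses_context`). `describe_model` is a hypothetical helper, not part of orca_sdk; because it relies on attribute access alone, it works on any object that exposes those attributes, such as one of the class attributes listed above.

```python
from typing import Any


def describe_model(model: Any) -> dict:
    """Collect the documented attributes of an embedding model into a plain dict.

    Hypothetical helper: it only reads `name`, `embedding_dim`,
    `max_seq_length`, and `uses_context`.
    """
    return {
        "name": model.name,
        "embedding_dim": model.embedding_dim,
        "max_seq_length": model.max_seq_length,
        "uses_context": model.uses_context,
    }
```

With orca_sdk installed and OrcaCloud access configured, it would be called as `describe_model(PretrainedEmbeddingModel.GTE_BASE)`.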
Attributes:
- name – Name of the pretrained embedding model
- embedding_dim – Dimension of the embeddings generated by the model
- max_seq_length – Maximum input length (in tokens, not characters) that the model can process. Longer inputs are truncated during embedding.
- uses_context – Whether the model uses the memoryset to contextualize embeddings
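Because `max_seq_length` counts tokens rather than characters, checking whether an input will be truncated requires tokenizing it first. The sketch below assumes a caller-supplied `tokenize` function; the default whitespace split is only a stand-in, since real embedding models use subword tokenizers that typically produce more tokens than there are words. `truncation_report` is a hypothetical helper, and it ignores any special tokens the model may add.

```python
from typing import Callable, List, Tuple


def truncation_report(
    text: str,
    max_seq_length: int,
    tokenize: Callable[[str], List[str]] = str.split,
) -> Tuple[bool, int]:
    """Return (will_truncate, dropped_tokens) for `text` under a token limit.

    `tokenize` defaults to a whitespace split, a stand-in for the model's
    real subword tokenizer.
    """
    tokens = tokenize(text)
    # Everything past the first max_seq_length tokens would be cut off
    dropped = max(0, len(tokens) - max_seq_length)
    return dropped > 0, dropped
```

The limit would come from the model itself, for example `truncation_report(text, PretrainedEmbeddingModel.GTE_BASE.max_seq_length, tokenize=my_tokenizer)`, where `my_tokenizer` is whatever tokenizer you assume matches the model.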
all (classmethod)
List all pretrained embedding models in the OrcaCloud
Returns:
- list[PretrainedEmbeddingModel] – A list of all pretrained embedding models available in the OrcaCloud
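Since `all` returns ordinary handles, the documented attributes can be used to filter them. The sketch below keeps the models whose token limit is at least a required length, longest limit first. `models_fitting` is a hypothetical helper that assumes only the `name` and `max_seq_length` attributes documented above.

```python
from typing import Any, Iterable, List


def models_fitting(models: Iterable[Any], min_seq_length: int) -> List[str]:
    """Names of models whose max_seq_length is at least `min_seq_length`,
    ordered from the longest limit to the shortest."""
    eligible = [m for m in models if m.max_seq_length >= min_seq_length]
    eligible.sort(key=lambda m: m.max_seq_length, reverse=True)
    return [m.name for m in eligible]
```

Against OrcaCloud this would be called as `models_fitting(PretrainedEmbeddingModel.all(), 1024)`.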
exists (classmethod)
finetune
finetune(
name,
train_datasource,
*,
eval_datasource=None,
label_column="label",
value_column="value",
training_method=EmbeddingFinetuningMethod.CLASSIFICATION,
training_args=None,
if_exists="error"
)
Finetune an embedding model
Parameters:
- name (str) – Name of the finetuned embedding model
- train_datasource (Datasource | LabeledMemoryset) – Data to train on
- eval_datasource (Datasource | None, default: None) – Optional data to evaluate on
- label_column (str, default: 'label') – Name of the label column
- value_column (str, default: 'value') – Name of the value column
- training_method (EmbeddingFinetuningMethod | str, default: CLASSIFICATION) – Training method to use
- training_args (dict | None, default: None) – Optional overrides for the Hugging Face TrainingArguments. If not provided, reasonable training arguments are used for the specified training method.
- if_exists (CreateMode, default: 'error') – What to do if a finetuned embedding model with the same name already exists: "error" (the default) raises an error, while "open" opens the existing finetuned embedding model.
Returns:
- FinetunedEmbeddingModel – The finetuned embedding model
Raises:
- ValueError – If a finetuned embedding model with the same name already exists and if_exists is "error", or if if_exists is "open" but the existing model was finetuned from a different base model
Examples:
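A minimal sketch of calling `finetune` with the keyword arguments documented above, passing `if_exists="open"` so that rerunning a script reopens the existing model instead of failing. `finetune_classifier` is a hypothetical wrapper; `training_method` is left at its default (CLASSIFICATION). The `num_train_epochs` key is a real Hugging Face TrainingArguments field, but whether OrcaCloud honors that particular override is an assumption.

```python
from typing import Any, Optional


def finetune_classifier(
    base_model: Any,
    name: str,
    train_datasource: Any,
    eval_datasource: Optional[Any] = None,
    num_train_epochs: Optional[int] = None,
) -> Any:
    """Finetune `base_model`, reopening an existing model of the same name.

    Hypothetical wrapper around the documented `finetune` method.
    """
    # Only override TrainingArguments when asked; None keeps the default arguments.
    # Assumption: num_train_epochs is honored when passed through training_args.
    training_args = (
        {"num_train_epochs": num_train_epochs} if num_train_epochs is not None else None
    )
    return base_model.finetune(
        name,
        train_datasource,
        eval_datasource=eval_datasource,
        label_column="label",
        value_column="value",
        training_args=training_args,
        # "open" returns the existing model instead of raising on a rerun
        if_exists="open",
    )
```

A call might look like `finetune_classifier(PretrainedEmbeddingModel.GTE_BASE, "my_finetuned_model", datasource, num_train_epochs=3)`, where `datasource` is a Datasource or LabeledMemoryset you already have. Per the Raises entry above, "open" still raises ValueError if the existing model was finetuned from a different base model.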
FinetunedEmbeddingModel
Bases: _EmbeddingModel
A finetuned embedding model in the OrcaCloud
Attributes:
- name – Name of the finetuned embedding model
- embedding_dim – Dimension of the embeddings generated by the model
- max_seq_length – Maximum input length (in tokens, not characters) that the model can process. Longer inputs are truncated during embedding.
- uses_context – Whether the model uses the memoryset to contextualize embeddings (similar in spirit to inverse document frequency in TF-IDF features)
- id (str) – Unique identifier of the finetuned embedding model
- base_model (PretrainedEmbeddingModel) – Base model the finetuned embedding model was trained from
- created_at (datetime) – When the model was finetuned
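The `uses_context` description compares context-aware embeddings to inverse document frequency. As an analogy only (this is not how OrcaCloud's models compute embeddings), the sketch below computes smoothed IDF over a toy set of memory texts: a term that appears in every memory gets the lowest weight, and rare terms get the highest. `inverse_document_frequency` is a hypothetical helper using a plain whitespace tokenizer.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_document_frequency(documents: List[str]) -> Dict[str, float]:
    """Smoothed IDF for each term across `documents`.

    Uses idf(t) = ln((1 + N) / (1 + df(t))) + 1, the smoothed variant that
    scikit-learn's TfidfTransformer applies by default.
    """
    n = len(documents)
    doc_freq: Counter = Counter()
    for doc in documents:
        # Count each term at most once per document
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
```

For the memories "the cat sat", "the dog sat", and "the bird flew", the term "the" gets weight 1.0, "sat" gets more, and "cat" gets the most, which is the kind of corpus-dependent reweighting the analogy points to.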
all (classmethod)
List all finetuned embedding model handles in the OrcaCloud
Returns:
- list[FinetunedEmbeddingModel] – A list of all finetuned embedding model handles in the OrcaCloud
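The `base_model` and `created_at` attributes make it straightforward to pick one handle out of the list returned by `all`. The sketch below returns the most recently finetuned model built on a given base model, or None if there is none. `latest_finetune_of` is a hypothetical helper that assumes only the documented attributes.

```python
from typing import Any, Iterable, Optional


def latest_finetune_of(models: Iterable[Any], base_name: str) -> Optional[Any]:
    """Most recently created finetuned model whose base model is named `base_name`."""
    candidates = [m for m in models if m.base_model.name == base_name]
    # created_at is a datetime, so max() selects the newest model
    return max(candidates, key=lambda m: m.created_at, default=None)
```

Against OrcaCloud this would be called as `latest_finetune_of(FinetunedEmbeddingModel.all(), PretrainedEmbeddingModel.GTE_BASE.name)`.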
open (classmethod)
Get a handle to a finetuned embedding model in the OrcaCloud
Parameters:
- name (str) – The name or unique identifier of a finetuned embedding model
Returns:
- FinetunedEmbeddingModel – A handle to the finetuned embedding model in the OrcaCloud
Raises:
- LookupError – If the finetuned embedding model does not exist
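Because `open` signals a missing model with LookupError, a caller that treats absence as a normal case can catch that exception. `open_or_none` is a hypothetical helper; it takes the class as a parameter so that it depends only on the documented `open` behavior. Catching the exception, rather than checking first, also avoids a gap between the check and the open; the `exists` classmethod may be an alternative, but its signature is not documented here.

```python
from typing import Any, Optional


def open_or_none(model_cls: Any, name: str) -> Optional[Any]:
    """Open a finetuned embedding model by name or id, or return None if it does not exist."""
    try:
        return model_cls.open(name)
    except LookupError:
        # open() raises LookupError for unknown models; treat that as "absent"
        return None
```

Against OrcaCloud this would be called as `open_or_none(FinetunedEmbeddingModel, "my_finetuned_model")`.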
exists (classmethod)
drop (classmethod)
Delete the finetuned embedding model from the OrcaCloud
Parameters:
- name_or_id (str) – The name or id of the finetuned embedding model
Raises:
- LookupError – If the finetuned embedding model does not exist and if_not_exists is "error"
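The `if_not_exists` parameter appears only in the Raises entry above, and its other accepted values are not documented here. Cleanup code that should not fail on a missing model can therefore catch LookupError instead. `drop_if_exists` is a hypothetical helper, and it assumes that `drop` raises LookupError for unknown models when called with only `name_or_id`, that is, that "error" is the default behavior.

```python
from typing import Any


def drop_if_exists(model_cls: Any, name_or_id: str) -> bool:
    """Delete a finetuned embedding model; return False instead of raising if it is absent."""
    try:
        # Assumption: with default arguments, drop() raises LookupError for unknown models
        model_cls.drop(name_or_id)
    except LookupError:
        return False
    return True
```

Against OrcaCloud this would be called as `drop_if_exists(FinetunedEmbeddingModel, "my_finetuned_model")`, which makes teardown scripts safe to rerun.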