orca_sdk.async_client#

ColumnType module-attribute #

ColumnType = Literal[
    "STRING",
    "FLOAT",
    "INT",
    "BOOL",
    "ENUM",
    "IMAGE",
    "OTHER",
]

The type of a column in a datasource

JobStatus module-attribute #

JobStatus = Literal[
    "INITIALIZED",
    "DISPATCHED",
    "WAITING",
    "PROCESSING",
    "COMPLETED",
    "FAILED",
    "ABORTING",
    "ABORTED",
]

Status of a job in the job queue

WorkerStatus module-attribute #

WorkerStatus = Literal[
    "IDLE", "BUSY", "DRAINING", "SHUTDOWN", "CRASHED"
]

Status of a worker in the worker pool
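These Literal aliases constrain values only at type-check time; they carry no runtime check of their own. A minimal sketch of validating raw status strings against them with typing.get_args (the is_terminal helper and its choice of terminal states are illustrative assumptions, not part of the SDK):

```python
from typing import Literal, get_args

# Local mirrors of the aliases documented above.
ColumnType = Literal["STRING", "FLOAT", "INT", "BOOL", "ENUM", "IMAGE", "OTHER"]
JobStatus = Literal[
    "INITIALIZED", "DISPATCHED", "WAITING", "PROCESSING",
    "COMPLETED", "FAILED", "ABORTING", "ABORTED",
]

def is_terminal(status: str) -> bool:
    # Assumption for illustration: a job is finished once it reaches one of these.
    return status in ("COMPLETED", "FAILED", "ABORTED")

# get_args recovers the allowed values, e.g. to validate untyped JSON input:
assert "ENUM" in get_args(ColumnType)
assert "PROCESSING" in get_args(JobStatus)
```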

ActionRecommendation #

Bases: TypedDict

action instance-attribute #

action

The recommended action to take

rationale instance-attribute #

rationale

Explanation for why this action was recommended

EmbeddingFinetuneConfig #

Bases: TypedDict

loss instance-attribute #

loss

Which loss family to train with. ‘prediction’ adds a linear head on top of embeddings. ‘contrastive’ and ‘triplet’ train embeddings directly for similarity. ‘proxy’ uses class-proxy similarity. Each loss has different defaults for batch size, evaluation, and learning rate.

task_type instance-attribute #

task_type

Whether the model predicts discrete labels or continuous scores. The default is classification; set regression when training against a score column.

learning_rate instance-attribute #

learning_rate

Peak learning rate after warmup. Higher values train faster but risk instability. Tuple searches log-uniformly between (min, max).

batch_size instance-attribute #

batch_size

Total samples per training step. Larger batches give more stable gradients and better contrastive negatives but use more memory. Automatically split across GPUs and gradient accumulation steps. List to try multiple sizes in a sweep.

epochs instance-attribute #

epochs

Number of full passes over the training data.

max_steps instance-attribute #

max_steps

Maximum number of training steps. Overrides epochs when set. Useful for quick validation runs or capping long-running jobs.

warmup instance-attribute #

warmup

Learning rate warmup. int = number of steps, float = fraction of total steps (0–1). Warmup helps stabilize early training. Tuples/lists follow the same int/float convention.

weight_decay instance-attribute #

weight_decay

L2 regularization strength. Helps prevent overfitting. Typical range: 0.0 to 0.1.

learning_rate_scheduler instance-attribute #

learning_rate_scheduler

How the learning rate changes after warmup. ‘linear’ decays to zero, ‘cosine’ follows a cosine curve, ‘constant’ stays flat. List to compare schedulers in a sweep.

loss_scale instance-attribute #

loss_scale

Inverse temperature (1/τ) for contrastive and proxy losses — controls how sharply the model distinguishes similar from dissimilar pairs. Higher = more discriminative. Only used by contrastive and proxy losses.

contrastive_sigma instance-attribute #

contrastive_sigma

Gaussian kernel width for contrastive regression. None auto-tunes from the training score standard deviation. Only used by contrastive regression.

normalize_embeddings instance-attribute #

normalize_embeddings

L2-normalize embeddings before the prediction head. Can improve stability when embedding magnitudes vary. Only used by prediction loss.

max_seq_length instance-attribute #

max_seq_length

Maximum token length for input text. Longer sequences are truncated. ‘max’ fits the longest sample, ‘p95’ / ‘p99’ covers 95% / 99% of samples (saves memory), or set an int for an explicit limit.

truncation_side instance-attribute #

truncation_side

Which end to cut when text exceeds max_seq_length. ‘right’ keeps the beginning.

instruction instance-attribute #

instruction

Task instruction for instruction-tuned models (e.g. ‘Classify this text’). Formatted into ‘Instruct: {instruction}\nQuery: ’ by the embedding model. None uses the model’s built-in default prompt (if any).

bf16 instance-attribute #

bf16

Use bfloat16 mixed precision to halve memory usage and speed up training. None auto-enables on supported GPUs (A100, H100, etc.). Set False to force full precision.

device_batch_size_limit instance-attribute #

device_batch_size_limit

Maximum samples that fit in one GPU’s memory per forward pass. None auto-estimates from your GPU’s memory and model size. Override if you hit out-of-memory errors or want tighter control.

gradient_checkpointing instance-attribute #

gradient_checkpointing

Trade compute for memory by recomputing activations during backward. Roughly halves memory at ~30% slower training. None auto-enables when it would avoid quality-degrading workarounds (mini-batching for contrastive, or fitting an otherwise impossible triplet batch).

gather_across_devices instance-attribute #

gather_across_devices

Share contrastive negatives across all GPUs. Improves quality when the batch is too large for a single GPU. None auto-enables when needed. Only relevant for contrastive loss with multiple GPUs.

eval_method instance-attribute #

eval_method

How to measure model quality during training. ‘head’ reuses the prediction head (fast, prediction only). ‘neighbor’ runs nearest-neighbor search (works for all losses, slower). ‘loss’ uses the training loss on held-out data (cheapest). None picks ‘head’ for prediction, ‘neighbor’ otherwise.

eval_steps instance-attribute #

eval_steps

How often to evaluate. int = every N training steps, ‘epoch’ = once per epoch, ‘end’ = only after training finishes, ‘off’ = skip evaluation entirely. None picks a sensible default based on eval_method.

eval_batch_size instance-attribute #

eval_batch_size

Batch size for evaluation inference. None auto-detects from GPU memory.

max_eval_batch_size instance-attribute #

max_eval_batch_size

Cap on auto-detected eval_batch_size. Lower this if evaluation hits OOM.

neighbor_eval_count instance-attribute #

neighbor_eval_count

Number of nearest neighbors to consider for neighbor evaluation.

neighbor_eval_pool_subsample instance-attribute #

neighbor_eval_pool_subsample

Reduce the neighbor search pool for faster evaluation. int = use this many train samples, float = use this fraction. None uses the full train set.

early_stopping instance-attribute #

early_stopping

Stop training when the eval metric stops improving. True = stop after 2 evaluations without improvement, int = custom patience count, False = always train for all epochs. Requires eval_steps to run during training (not ‘end’ or ‘off’).

early_stopping_threshold instance-attribute #

early_stopping_threshold

Minimum improvement to count as progress for early stopping. 0.0 means any improvement resets the patience counter.

trial_count instance-attribute #

trial_count

Number of hyperparameter configurations to try. 1 = single training run. Values > 1 activate sweep mode, which uses Optuna to search over any parameter specified as a range or list.

startup_trial_count instance-attribute #

startup_trial_count

Number of random trials before the optimizer starts making informed suggestions. 0 = start optimizing immediately (good for small trial budgets).

seed instance-attribute #

seed

Random seed for reproducibility. Controls data shuffling, dropout, weight initialization, and sweep trial sampling.

logging_steps instance-attribute #

logging_steps

Print training metrics (loss, learning rate, etc.) every N steps.

accelerator_config instance-attribute #

accelerator_config

Advanced HuggingFace Accelerator settings. Most users can leave this as None.

extra_training_args instance-attribute #

extra_training_args

Additional HuggingFace TrainingArguments not exposed above. Merged directly into the training configuration.
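Since EmbeddingFinetuneConfig is a TypedDict, configs are plain dicts. A hedged sketch of a single run and of a sweep over ranges and lists, built only from the field descriptions above (the exact accepted value types are inferred from those descriptions; verify against your SDK version):

```python
# Single training run: contrastive loss with explicit hyperparameters.
single_run = {
    "loss": "contrastive",
    "task_type": "classification",
    "learning_rate": 2e-5,
    "batch_size": 64,
    "epochs": 3,
    "warmup": 0.1,          # float = fraction of total steps
    "eval_steps": "epoch",  # evaluate once per epoch
}

# Sweep: trial_count > 1 activates Optuna search over any parameter
# given as a (min, max) tuple or a list of candidates.
sweep = {
    "loss": "contrastive",
    "learning_rate": (1e-5, 1e-4),  # tuple: searched log-uniformly
    "batch_size": [32, 64, 128],    # list: discrete candidates
    "trial_count": 12,
    "startup_trial_count": 4,       # random trials before informed suggestions
}
```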

MemorysetClassPatternsAnalysisConfig #

Bases: TypedDict

min_uniformity_threshold instance-attribute #

min_uniformity_threshold

Minimum uniformity score (0-1) required for a memory to be considered a representative. A uniformity of 1.0 means all neighbors are the same class. Lower values allow more flexibility but may select less prototypical examples. Default is 1.0.

PRCurve #

Bases: TypedDict

thresholds instance-attribute #

thresholds

Threshold values for the curve

precisions instance-attribute #

precisions

Precision values at each threshold

recalls instance-attribute #

recalls

Recall values at each threshold
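The three lists are parallel, describing one (precision, recall) point per threshold. One common use is picking the operating threshold that maximizes F1; a sketch over made-up curve values:

```python
def best_f1_threshold(pr_curve: dict) -> tuple[float, float]:
    """Return (threshold, f1) for the curve point with the highest F1."""
    best = max(
        zip(pr_curve["thresholds"], pr_curve["precisions"], pr_curve["recalls"]),
        key=lambda tpr: 2 * tpr[1] * tpr[2] / (tpr[1] + tpr[2]) if tpr[1] + tpr[2] else 0.0,
    )
    t, p, r = best
    return t, 2 * p * r / (p + r)

curve = {  # illustrative values only
    "thresholds": [0.25, 0.50, 0.75],
    "precisions": [0.60, 0.80, 0.95],
    "recalls":    [0.90, 0.75, 0.40],
}
threshold, f1 = best_f1_threshold(curve)  # middle point wins here
```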

PredictionFeedbackRequest #

Bases: TypedDict

value instance-attribute #

value

The feedback value. For updates, UNSET means keep existing value. None means delete the feedback.

comment instance-attribute #

comment

Optional comment. For updates, UNSET means keep existing comment. None means remove the comment.
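The three-way update semantics (UNSET keeps, None deletes, any other value replaces) can be sketched with a sentinel. The _Unset class and apply_feedback_update helper below are stand-ins for illustration, not the SDK's own implementation:

```python
class _Unset:
    """Stand-in sentinel; the SDK exports its own UNSET."""

UNSET = _Unset()

def apply_feedback_update(existing: dict, request: dict) -> dict:
    """Merge a PredictionFeedbackRequest-style update into existing feedback."""
    updated = dict(existing)
    for key in ("value", "comment"):
        incoming = request.get(key, UNSET)
        if incoming is UNSET:
            continue                 # UNSET (or absent): keep existing
        elif incoming is None:
            updated.pop(key, None)   # None: delete the field
        else:
            updated[key] = incoming  # anything else: replace
    return updated

state = {"value": 1.0, "comment": "looks wrong"}
state = apply_feedback_update(state, {"value": 0.0, "comment": None})
# value replaced, comment removed
```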

ROCCurve #

Bases: TypedDict

thresholds instance-attribute #

thresholds

Threshold values for the curve

false_positive_rates instance-attribute #

false_positive_rates

False positive rate values at each threshold

true_positive_rates instance-attribute #

true_positive_rates

True positive rate values at each threshold

RegressionMetrics #

Bases: TypedDict

coverage instance-attribute #

coverage

Percentage of predictions that are not None

mse instance-attribute #

mse

Mean squared error of the predictions

rmse instance-attribute #

rmse

Root mean squared error of the predictions

mae instance-attribute #

mae

Mean absolute error of the predictions

r2 instance-attribute #

r2

R-squared score (coefficient of determination) of the predictions

explained_variance instance-attribute #

explained_variance

Explained variance score of the predictions

loss instance-attribute #

loss

Mean squared error loss of the predictions

anomaly_score_mean instance-attribute #

anomaly_score_mean

Mean of anomaly scores across the dataset

anomaly_score_median instance-attribute #

anomaly_score_median

Median of anomaly scores across the dataset

anomaly_score_variance instance-attribute #

anomaly_score_variance

Variance of anomaly scores across the dataset

warnings instance-attribute #

warnings

Human-readable warnings about skipped or adjusted metrics
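The error metrics above are related in fixed ways: rmse is the square root of mse, and loss duplicates mse for regression. A small worked example computing them from scratch on made-up values:

```python
import math

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

mse = sum(e * e for e in errors) / n          # mean squared error (== loss)
rmse = math.sqrt(mse)                          # root mean squared error
mae = sum(abs(e) for e in errors) / n          # mean absolute error

# R-squared: 1 - residual sum of squares / total sum of squares.
mean_true = sum(y_true) / n
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - sum(e * e for e in errors) / ss_tot
```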

GetMemorysetByNameOrIdMemoryByMemoryIdParams #

Bases: TypedDict

memory_id instance-attribute #

memory_id

ID of the memory

DeleteMemorysetByNameOrIdMemoryByMemoryIdParams #

Bases: TypedDict

memory_id instance-attribute #

memory_id

ID of the memory

PostEmbeddingModelUploadRequest #

Bases: TypedDict

name instance-attribute #

name

Name for the embedding model

description instance-attribute #

description

Optional description for the embedding model

PostDatasourceUploadRequest #

Bases: TypedDict

name instance-attribute #

name

Name for the datasource

description instance-attribute #

description

Optional description for the datasource

GetDatasourceByNameOrIdDownloadParams #

Bases: TypedDict

file_type instance-attribute #

file_type

File type to download:

  • hf_dataset: Zipped HuggingFace dataset (default)

  • json: Row-oriented JSON array

  • csv: CSV file

GetClassificationModelParams #

Bases: TypedDict

memoryset_name_or_id instance-attribute #

memoryset_name_or_id

Filter by memoryset name or ID

GetRegressionModelParams #

Bases: TypedDict

memoryset_name_or_id instance-attribute #

memoryset_name_or_id

Filter by memoryset name or ID

GetPredictiveModelParams #

Bases: TypedDict

memoryset_name_or_id instance-attribute #

memoryset_name_or_id

Filter by memoryset name or ID

GetTelemetryPredictionByPredictionIdParams #

Bases: TypedDict

calc_neighborhood_density instance-attribute #

calc_neighborhood_density

Calculate neighborhood density

GetTelemetryPredictionByPredictionIdMemorySuggestionsParams #

Bases: TypedDict

prediction_id instance-attribute #

prediction_id

ID of the prediction to generate suggestions for

num_memories instance-attribute #

num_memories

Number of memory suggestions to generate

refresh instance-attribute #

refresh

Force the explanation agent to re-run even if a cached explanation exists

ClassificationMetrics #

Bases: TypedDict

coverage instance-attribute #

coverage

Percentage of predictions that are not None

f1_score instance-attribute #

f1_score

F1 score of the predictions

accuracy instance-attribute #

accuracy

Accuracy of the predictions

loss instance-attribute #

loss

Cross-entropy loss of the logits

anomaly_score_mean instance-attribute #

anomaly_score_mean

Mean of anomaly scores across the dataset

anomaly_score_median instance-attribute #

anomaly_score_median

Median of anomaly scores across the dataset

anomaly_score_variance instance-attribute #

anomaly_score_variance

Variance of anomaly scores across the dataset

roc_auc instance-attribute #

roc_auc

Area under the receiver operating characteristic (ROC) curve

pr_auc instance-attribute #

pr_auc

Average precision (area under the precision-recall curve)

pr_curve instance-attribute #

pr_curve

Precision-recall curve

roc_curve instance-attribute #

roc_curve

Receiver operating characteristic curve

confusion_matrix instance-attribute #

confusion_matrix

Confusion matrix where the entry at row i, column j is the count of samples with true label i predicted as label j

warnings instance-attribute #

warnings

Human-readable warnings about skipped or adjusted metrics
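With the row-i / column-j convention above, accuracy is the diagonal sum over the total count, and dividing each diagonal entry by its row sum gives per-class recall. A small worked example with made-up counts:

```python
# Rows = true label, columns = predicted label (convention from above).
cm = [
    [50,  3,  2],   # true class 0
    [ 5, 40,  5],   # true class 1
    [ 2,  8, 35],   # true class 2
]

total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(len(cm)))
accuracy = correct / total

# Per-class recall: diagonal entry over its row sum.
recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
```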

OrcaAsyncClient #

Bases: AsyncClient

__init__ #

__init__(
    *,
    api_key=None,
    base_url="",
    headers=None,
    transport=None,
    timeout=None,
    limits=None,
    max_redirects=20,
    event_hooks=None,
    http1=True,
    http2=False,
    proxy=None,
    log_level=logging.WARNING
)

Initialize an OrcaAPI async httpx client

Parameters:

  • api_key (str | None, default: None ) –

    API key to use for authentication; defaults to the ORCA_API_KEY environment variable if not set.

  • base_url (URL | str, default: '' ) –

    URL of the OrcaAPI; defaults to the ORCA_API_URL environment variable, or the cloud API URL, if not set.

use #

use()

Context manager to inject this async client into OrcaSDK async methods
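The injection mechanism behind a method like use() can be sketched with contextvars: entering the context sets a context-local "current client" that other functions read, and exiting restores the previous one. This is an illustration of the pattern with stand-in names (FakeClient, sdk_method), not the SDK's actual implementation:

```python
import contextvars
from contextlib import contextmanager

# Context variable holding the "current" client for the enclosing with-block.
_current_client = contextvars.ContextVar("current_client", default=None)

class FakeClient:
    def __init__(self, name: str):
        self.name = name

    @contextmanager
    def use(self):
        """Make this client the one picked up inside the with-block."""
        token = _current_client.set(self)
        try:
            yield self
        finally:
            _current_client.reset(token)  # restore the previous client

def sdk_method() -> str:
    """Stand-in for an SDK call that reads the injected client."""
    client = _current_client.get()
    return client.name if client else "default"

with FakeClient("staging").use():
    inside = sdk_method()   # sees the injected client
outside = sdk_method()      # falls back to the default
```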