Model Telemetry#

This guide explains how to work with prediction telemetry in OrcaCloud. You will learn what telemetry data is collected for each prediction, how to retrieve predictions and record feedback, and how to use this data to analyze and improve model performance.

Predictions#

When you make predictions with a model in OrcaCloud, detailed telemetry data is automatically collected and stored. This includes information about the prediction itself, the memories that were used to make the prediction, and any feedback that was recorded for the prediction.

from orca_sdk import ClassificationModel

model = ClassificationModel.open("my_model")
model.predict("I love this movie").prediction_id # (1)!

  1. Every time you make a prediction, Orca automatically stores a LabelPrediction object containing the prediction telemetry, identified by a unique prediction_id.
"0195019a-5bc7-7afb-b902-5945ee1fb766"

Each prediction is represented by a LabelPrediction object with the following properties:

  • prediction_id: Unique identifier for the prediction
  • label: Predicted label (integer)
  • label_name: Human-readable name of the predicted label
  • confidence: Confidence score of the prediction (between 0 and 1)
  • input_value: Input value that was used to make the prediction
  • memory_lookups: List of memories that were used to ground the prediction
  • expected_label: Optional expected label that was set for the prediction
  • tags: Set of tags that were set for the prediction
  • feedback: Dictionary mapping from feedback category names to values
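For example, you can read these fields directly off the object returned by predict (a minimal sketch; the exact values will depend on your model):

prediction = model.predict("I love this movie")
prediction.label_name   # e.g. "positive"
prediction.confidence   # e.g. 0.95
prediction.tags         # tags recorded for this prediction, empty if none were set
prediction.feedback     # feedback recorded so far, empty until feedback is added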

Retrieve Predictions#

You can retrieve predictions by their prediction_id using the get method:

from orca_sdk import LabelPrediction

LabelPrediction.get("0195019a-5bc7-7afb-b902-5945ee1fb766")
LabelPrediction({label: <positive: 1>, confidence: 0.95, input_value: 'I love this movie'})

You can also retrieve multiple predictions at once by passing a list of prediction_ids:

LabelPrediction.get([
    "0195019a-5bc7-7afb-b902-5945ee1fb766",
    "019501a1-ea08-76b2-9f62-95e4800b4841",
])

To retrieve predictions made by a specific model, use the model’s predictions method:

model.predictions(
    limit=10, # (1)!
    offset=0, # (2)!
    sort=[("timestamp", "desc")], # (3)!
)

  1. Maximum number of predictions to return
  2. Skip this many predictions
  3. Sort by timestamp in descending order. Predictions can be sorted by timestamp or confidence in either ascending ("asc") or descending ("desc") order.
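If you need to walk through a longer prediction history, you can page through it with limit and offset. A minimal sketch, assuming predictions returns an empty list once the offset moves past the last stored prediction:

page_size = 100
offset = 0
while True:
    page = model.predictions(limit=page_size, offset=offset, sort=[("timestamp", "desc")])
    if not page:
        break
    for prediction in page:
        print(prediction.prediction_id, prediction.confidence)
    offset += page_size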

Inspect Memory Lookups#

Each prediction stores information about which memories were used to make the prediction. You can access these through the memory_lookups property:

prediction.memory_lookups
[LabeledMemoryLookup({label: <positive: 1>, lookup_score: 0.95, attention_weight: 0.8, value: 'I love this movie'}),
LabeledMemoryLookup({label: <positive: 1>, lookup_score: 0.5, attention_weight: 0.2, value: 'This movie was great'})]

Apart from the standard memory properties, each lookup contains the following properties:

  • lookup_score: Semantic similarity score between the model input and the memory (between 0 and 1)
  • attention_weight: Weight the model put on the memory to guide its prediction (between 0 and 1)
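For example, you could flag memories that were retrieved with high semantic similarity but received little attention from the model (a minimal sketch; the thresholds are arbitrary):

for lookup in prediction.memory_lookups:
    if lookup.lookup_score > 0.8 and lookup.attention_weight < 0.1:
        print(f"Retrieved but largely ignored: {lookup.value!r}")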

For a more interactive way to inspect the memories used by a prediction, you can open a UI by calling the inspect method:

prediction.inspect()

This is particularly useful when debugging why a model made a certain prediction or when trying to identify memories that need to be updated.

Tags#

Tags are used to organize and filter predictions. They are useful for marking predictions made in specific contexts of your product, in different environments, or for collecting all telemetry from a test run.

Add Tags to Predictions#

To add a tag when making a prediction, simply pass a set of tags to the predict method:

model.predict("I love this movie", tags={"production", "user_123"})

To add a tag to a prediction after it has been made, use the add_tag method:

prediction.add_tag("test")

Remove Tags from Predictions#

To remove a specific tag from a prediction, use the remove_tag method:

prediction.remove_tag("test")

Query Predictions by Tag#

To query predictions by tag, use the model’s predictions method with the tag parameter:

model.predictions(tag="test")

Expected Labels#

To track model accuracy, you can record the expected output label for a prediction. This also enables generating new memories from samples your model saw in production to improve its performance over time.

Record Expected Labels#

To set an expected label for a prediction, use the update method on the prediction object:

prediction.update(expected_label=1)

To remove the expected label, set it to None:

prediction.update(expected_label=None)

Telemetry from Evaluations#

You can also record expected labels when running a model evaluation. By passing record_predictions=True to the evaluate method, you can store the evaluation's predictions for later analysis: (1)

  1. See the quickstart guide for more details on how to evaluate a model.

metrics = model.evaluate(
    validation_datasource, # (1)!
    value_column="text", # (2)!
    label_column="label", # (3)!
    record_predictions=True,
    tags={"my_eval"}, # (4)!
)

  1. Datasource containing the validation data
  2. Column containing the input values in the datasource
  3. Column containing the expected labels that will be recorded for each prediction
  4. Tag the recorded predictions for easy retrieval

This stores a prediction for each sample in the datasource, along with the sample’s label as the prediction’s expected label.

Filter Predictions by Expected Label Match#

To filter predictions by whether they match the expected label, use the expected_label_match parameter:

model.predictions(expected_label_match=False) # (1)!

  1. This will only return predictions where the expected label is present and does not match the predicted label. To filter by predictions where the expected label matches the predicted label, set expected_label_match=True.
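For example, you could pull up the mismatches and review them alongside the labels they should have received (a minimal sketch, assuming the expected_label_match filter can be combined with sorting and that predictions returns a list):

mismatches = model.predictions(expected_label_match=False, sort=[("confidence", "desc")])
for prediction in mismatches:
    print(f"{prediction.input_value!r}: predicted {prediction.label_name}, expected {prediction.expected_label}")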

Feedback#

To track how a model is performing in production, Orca allows you to record feedback in as many categories as you’d like for each prediction. This feedback could come from your users or from your own internal metrics.

Feedback Categories#

Feedback is organized into categories, where each category can accept either binary or continuous values. A FeedbackCategory is created automatically the first time feedback with a new name is recorded. The value type of the category is inferred from the first recorded value. Subsequent feedback for the same category must be of the same type.
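For example, the first boolean value recorded under a new name creates a binary category with that name (the "thumbs_up" name below is purely illustrative; record_feedback is covered in the Record Feedback section):

prediction.record_feedback("thumbs_up", True)   # first value is a bool, so "thumbs_up" becomes a binary category
prediction.record_feedback("thumbs_up", False)  # later feedback in this category must also be boolean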

You can see what feedback categories already exist in your organization and what their value types are by calling FeedbackCategory.all:

FeedbackCategory.all()
[FeedbackCategory({name: 'accepted', value_type: <bool>}),
FeedbackCategory({name: 'rating', value_type: <float>})]

To reset a feedback category and delete all associated feedback, use the FeedbackCategory.drop method:

FeedbackCategory.drop("rating")

Record Feedback#

To enable automatic analysis of feedback, it must be recorded in a way that makes it clear whether the feedback is positive or negative. We support two types of feedback: binary and continuous.

Binary feedback is typically used for categorical feedback like “correct” or “incorrect”. To create a binary feedback category, you must record boolean values. Any positive feedback must have the value True and any negative feedback must have the value False. For example, to record a prediction as “accepted”, use the prediction’s record_feedback method with a boolean value:

prediction.record_feedback("accepted", True) # (1)!

  1. Store positive feedback in the "accepted" category

Continuous feedback is typically used for numerical feedback. To create a continuous feedback category, you must record floating point values between -1.0 and +1.0. Negative values indicate negative feedback and positive values indicate positive feedback. (1)

  1. For example, to record a 5 star rating we would encode the values as follows:

    • 1 star: -1.0
    • 2 stars: -0.5
    • 3 stars: 0.0
    • 4 stars: 0.5
    • 5 stars: 1.0

prediction.record_feedback(
    category="rating",
    value=-0.5, # 2 stars
    comment="The actors were good, but the plot was weak.", # (1)!
)

  1. We can store an optional comment to provide more context about the feedback.
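If your product collects star ratings, you can map them onto this range before recording them. A minimal sketch using a hypothetical stars_to_feedback helper:

def stars_to_feedback(stars: int) -> float:
    # Map a 1-5 star rating onto the [-1.0, 1.0] feedback range:
    # 1 -> -1.0, 2 -> -0.5, 3 -> 0.0, 4 -> 0.5, 5 -> 1.0
    return (stars - 3) / 2.0

prediction.record_feedback("rating", stars_to_feedback(2))  # equivalent to -0.5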

We can also use the model’s record_feedback method to record feedback for multiple predictions at once:

model.record_feedback([
    {
        "prediction_id": p.prediction_id,
        "category": "accepted",
        "value": False,
    },
    {
        "prediction_id": p.prediction_id,
        "category": "rating",
        "value": -0.5,
        "comment": "The actors were good, but the plot was weak.",
    },
])

Delete Feedback#

To delete feedback for a specific prediction, use the delete_feedback method:

prediction.delete_feedback("rating")

You can also delete feedback for several predictions at a time by calling the model’s record_feedback method with a feedback entry for each prediction and setting each value to None:

model.record_feedback(
    {
        "prediction_id": p.prediction_id,
        "category": "rating",
        "value": None,
    }
    for p in predictions
)