Model Telemetry#

This guide explains how to work with prediction telemetry in OrcaCloud. You will learn what telemetry data is collected for predictions, how to retrieve predictions, record feedback, and use this data to analyze and improve model performance.

Predictions#

When you make predictions with a model in OrcaCloud, detailed telemetry data is automatically collected and stored. This includes information about the prediction itself, the memories that were used to make the prediction, and any feedback that was recorded for the prediction.

from orca_sdk import ClassificationModel

model = ClassificationModel.open("my_model")
model.predict("I love this movie").prediction_id # (1)!

  1. Every time you make a prediction, Orca automatically stores a LabelPrediction object containing the prediction telemetry, identified by a unique prediction_id.
"0195019a-5bc7-7afb-b902-5945ee1fb766"

Each prediction is represented by a LabelPrediction object with the following properties:

  • prediction_id: Unique identifier for the prediction
  • label: Predicted label (integer)
  • label_name: Human-readable name of the predicted label
  • confidence: Confidence score of the prediction (between 0 and 1)
  • input_value: Input value that was used to make the prediction
  • memory_lookups: List of memories that were used to ground the prediction
  • expected_label: Optional expected label that was set for the prediction
  • tags: Set of tags that were set for the prediction
  • feedback: Dictionary mapping from feedback category names to values
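For example, these properties can be read straight off the object returned by predict. A minimal sketch, reusing the model from the example above (the printed values are illustrative):

prediction = model.predict("I love this movie")

print(prediction.prediction_id)  # unique identifier, e.g. "0195019a-..."
print(prediction.label_name)     # human-readable label, e.g. "positive"
print(prediction.confidence)     # confidence score between 0 and 1
print(prediction.tags)           # set of tags, empty unless tags were passed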

Retrieve Predictions#

You can retrieve predictions by their prediction_id using the get method:

from orca_sdk import LabelPrediction

LabelPrediction.get("0195019a-5bc7-7afb-b902-5945ee1fb766")
LabelPrediction({label: <positive: 1>, confidence: 0.95, input_value: 'I love this movie'})

You can also retrieve multiple predictions at once by passing a list of prediction_ids:

LabelPrediction.get([
    "0195019a-5bc7-7afb-b902-5945ee1fb766",
    "019501a1-ea08-76b2-9f62-95e4800b4841",
])

To retrieve predictions made by a specific model, use the model’s predictions method:

model.predictions(
    limit=10, # (1)!
    offset=0, # (2)!
    sort=[("timestamp", "desc")], # (3)!
)

  1. Maximum number of predictions to return
  2. Skip this many predictions
  3. Sort by timestamp in descending order. Predictions can be sorted by timestamp or confidence in either ascending ("asc") or descending ("desc") order.
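If you need more than one page of results, limit and offset can be combined to walk through the prediction history. A sketch, assuming predictions returns an empty list once the offset is past the last prediction:

page_size = 100
offset = 0
while True:
    page = model.predictions(limit=page_size, offset=offset, sort=[("timestamp", "desc")])
    if not page:
        break
    for prediction in page:
        ...  # process each prediction
    offset += page_size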

Inspect Memory Lookups#

Each prediction stores information about which memories were used to make the prediction. You can access these through the memory_lookups property:

prediction.memory_lookups
[LabeledMemoryLookup({label: <pos: 1>, lookup_score: 0.95, attention_weight: 0.8, value: 'I love this movie'}),
 LabeledMemoryLookup({label: <pos: 1>, lookup_score: 0.5, attention_weight: 0.2, value: 'This movie was great'})]

Apart from the standard memory properties, each lookup contains the following properties:

  • lookup_score: Semantic similarity score between the model input and the memory (between 0 and 1)
  • attention_weight: Weight the model put on the memory to guide its prediction (between 0 and 1)
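For example, you could sort the lookups by how much the model relied on each memory. A short sketch, assuming each lookup exposes the value property shown in the output above:

for lookup in sorted(prediction.memory_lookups, key=lambda m: m.attention_weight, reverse=True):
    print(f"{lookup.attention_weight:.2f}  {lookup.lookup_score:.2f}  {lookup.value}")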

For a more interactive way to inspect the memories used by a prediction, you can open a UI by calling the inspect method:

prediction.inspect()

This is particularly useful when debugging why a model made a certain prediction or when trying to identify memories that need to be updated.

Tags#

Tags are used to organize and filter predictions. They’re useful for marking predictions made in specific contexts of your product, in different environments, or for marking all telemetry from a test run.

Add Tags to Predictions#

To add a tag when making a prediction, simply pass a set of tags to the predict method:

model.predict("I love this movie", tags={"production", "user_123"})

To add a tag to a prediction after it has been made, use the add_tag method:

prediction.add_tag("test")

Remove Tags from Predictions#

To remove a specific tag from a prediction, use the remove_tag method:

prediction.remove_tag("test")

Query Predictions by Tag#

To query predictions by tag, use the model’s predictions method with the tag parameter:

model.predictions(tag="test")
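The tag filter composes with the other parameters shown above; for example, to fetch the ten most recent predictions from a test run (a sketch, assuming the parameters can be combined in a single call):

model.predictions(
    tag="test",
    limit=10,
    sort=[("timestamp", "desc")],
)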

Expected Labels#

To track model accuracy, you can record the expected output label for a prediction. This also enables generating new memories from samples your model saw in production to improve its performance over time.

Record Expected Labels#

To set an expected label for a prediction, use the update method on the prediction object:

prediction.update(expected_label=1)

To remove the expected label, set it to None:

prediction.update(expected_label=None)
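Once expected labels are recorded, you can estimate accuracy over live traffic. A sketch that assumes predictions were tagged "production" as in the earlier tagging example and only counts those with an expected label set:

labeled = [p for p in model.predictions(tag="production") if p.expected_label is not None]
if labeled:
    accuracy = sum(p.label == p.expected_label for p in labeled) / len(labeled)
    print(f"Accuracy on {len(labeled)} labeled predictions: {accuracy:.2%}")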

Record Telemetry during Evaluation#

You can also record the expected labels when running a model evaluation. By passing the record_predictions parameter to the evaluate method, you can store predictions for later analysis:

metrics = model.evaluate(
    validation_datasource, # (1)!
    value_column="text", # (2)!
    label_column="label", # (3)!
    record_predictions=True,
    tags={"my_eval"}, # (4)!
)

  1. Datasource containing the validation data
  2. Column containing the input values in the datasource
  3. Column containing the expected labels that will be recorded for each prediction
  4. Tag the recorded predictions for easy retrieval

This will store predictions for each sample in the datasource and record the expected labels. You can then retrieve the predictions by tag:

incorrect_predictions = [
    p for p in model.predictions(tag="my_eval")
    if p.expected_label != p.label
]
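From there, you can look at which memories guided each misclassified prediction to spot memories that may need updating. A brief sketch using the properties described earlier:

for prediction in incorrect_predictions:
    print(prediction.input_value, "predicted:", prediction.label_name, "expected:", prediction.expected_label)
    for lookup in prediction.memory_lookups:
        print("   ", lookup.value, lookup.attention_weight)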

Feedback#

To track how a model is performing in production, Orca allows you to record feedback in several categories for each prediction. This feedback could come from your users or from your own internal metrics.

Feedback Categories#

Feedback is organized into categories, where each category can accept either boolean or continuous values. A FeedbackCategory is created automatically the first time feedback with a new name is recorded. The value type of the category is inferred from the first recorded value. Subsequent feedback for the same category must be of the same type.

You can see what feedback categories already exist in your organization and what their value types are by calling FeedbackCategory.all:

FeedbackCategory.all()
[FeedbackCategory({name: 'accepted', value_type: <bool>}),
FeedbackCategory({name: 'rating', value_type: <float>})]
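Because a category’s value type is fixed by the first recorded value, it can be worth checking what already exists before recording. A sketch (the import is assumed to work like the other orca_sdk imports in this guide):

from orca_sdk import FeedbackCategory  # assumption: exported from orca_sdk like LabelPrediction

existing = {category.name for category in FeedbackCategory.all()}
if "rating" not in existing:
    print("The 'rating' category will be created on first use")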

To reset a feedback category and delete all associated feedback, use the FeedbackCategory.drop method:

FeedbackCategory.drop("rating")

Record Feedback#

To enable automatic analysis of feedback, it must be recorded in a way that makes it clear whether the feedback is positive or negative. We support two types of feedback: binary and continuous.

Binary feedback is typically used for categorical feedback like “correct” or “incorrect”. To create a binary feedback category, you must record boolean values. Any positive feedback must have the value True and any negative feedback must have the value False. For example, to record a prediction as “accepted” or “inappropriate”:

prediction.record_feedback("accepted", True) # (1)!

  1. Store positive feedback in the "accepted" category

Continuous feedback is typically used for numerical feedback. To create a continuous feedback category, you must record floating point values between -1.0 and +1.0. Negative values indicate negative feedback and positive values indicate positive feedback. For example, to record a 5 star rating we would encode the values as follows:

  • 1 star: -1.0
  • 2 stars: -0.5
  • 3 stars: 0.0
  • 4 stars: 0.5
  • 5 stars: 1.0

prediction.record_feedback(
    category="rating",
    value=-0.5,
    comment="The actors were good, but the plot was weak.", # (1)!
)

  1. We can store an optional comment to provide more context about the feedback.
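If your product collects ratings on a 1–5 star scale, a small hypothetical helper can map them onto the -1.0 to +1.0 range before recording:

def stars_to_feedback_value(stars: int) -> float:
    # map 1..5 stars linearly onto -1.0..1.0
    return (stars - 3) / 2

prediction.record_feedback("rating", stars_to_feedback_value(4))  # records 0.5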

To record feedback for multiple predictions at once, use the model’s record_feedback method:

model.record_feedback([
    {
        "prediction_id": p.prediction_id,
        "category": "accepted",
        "value": False,
    },
    {
        "prediction_id": p.prediction_id,
        "category": "rating",
        "value": -0.5,
        "comment": "The actors were good, but the plot was weak.",
    },
])
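The same batch call can be driven by your own data. For example, given a hypothetical user_responses mapping of prediction IDs to accepted/rejected booleans collected by your application:

user_responses = {"0195019a-5bc7-7afb-b902-5945ee1fb766": True}  # hypothetical: gathered from your app

model.record_feedback([
    {"prediction_id": prediction_id, "category": "accepted", "value": accepted}
    for prediction_id, accepted in user_responses.items()
])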

Delete Feedback#

To delete feedback for a specific prediction, use the delete_feedback method:

prediction.delete_feedback("rating")

You can also delete feedback for several predictions at a time by calling the model’s record_feedback method with feedback entries whose value is set to None:

model.record_feedback(
    {
        "prediction_id": p.prediction_id,
        "category": "rating",
        "value": None,
    }
    for p in predictions
)