# Welcome to Orca
Orca enables you to build and maintain retrieval-augmented models that can adapt to changing circumstances. This documentation will help you get started building models with Orca, understand the core concepts of retrieval-augmentation, and learn how to leverage Orca to maintain model performance through memory tuning.
## Retrieval-Augmentation
You might have heard of retrieval-augmented generation (RAG) in the context of LLMs. Orca takes a similar approach but applies it to arbitrary neural networks built with PyTorch.
Retrieval-augmentation is a technique that enables machine learning models to adapt to new circumstances without retraining by accessing external data, which we call "memories," stored separately from the model's logic. During training, Orca injects relevant memory data based on model inputs, teaching the model to efficiently use this supplementary information alongside its inherent knowledge. At inference, the model reliably looks up and uses memories, allowing its behavior to change without retraining or redeployment.
This enables you to adapt model behavior in real time by updating the memories the model uses to make predictions. You can address data drift, customize behavior for users at scale, identify and combat bias, and more, all without retraining or redeployment. Check out our guide on retrieval-augmentation to learn more.
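To make the idea concrete, here is a minimal, framework-agnostic sketch in plain PyTorch (this is illustrative only, not the Orca API): retrieved memory embeddings are aggregated and concatenated with the input representation before the model's head, so swapping the memories changes predictions without touching the weights.

```python
import torch
import torch.nn as nn


class RetrievalAugmentedClassifier(nn.Module):
    """Toy retrieval-augmented model (illustrative sketch, not the Orca API).

    The classification head sees both the input embedding and a summary of
    the k memories retrieved for that input.
    """

    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        # Input embedding plus aggregated memory embedding are concatenated.
        self.head = nn.Linear(embed_dim * 2, num_classes)

    def forward(self, x: torch.Tensor, memories: torch.Tensor) -> torch.Tensor:
        # x: (batch, embed_dim); memories: (batch, k, embed_dim)
        memory_summary = memories.mean(dim=1)  # aggregate the k retrieved memories
        return self.head(torch.cat([x, memory_summary], dim=-1))


model = RetrievalAugmentedClassifier(embed_dim=8, num_classes=3)
x = torch.randn(4, 8)            # a batch of 4 input embeddings
memories = torch.randn(4, 5, 8)  # 5 retrieved memory embeddings per input
logits = model(x, memories)
print(logits.shape)  # torch.Size([4, 3])
```

Because the memories arrive as a forward-pass argument rather than being baked into the weights, updating the memory store is enough to change the model's behavior at inference time.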
Orca provides a custom database optimized for efficient memory storage, embedding generation, and retrieval. It also offers capabilities for tracking memory lookups, recording feedback, and continuously tuning memories based on telemetry data to improve performance and ensure compliance.
## Orca Components
Orca consists of three main components that work together to enable building, instrumenting, and maintaining retrieval-augmented models:
OrcaDB is a database optimized for memory storage and lookups that also tracks telemetry data about memory usage and model runs.
- We provide hosted OrcaDB instances for different cloud providers and a local file-based database with a subset of the functionality for quick model prototyping.
### Why Use a Special Database?
OrcaDB is designed to store and retrieve memory data efficiently for retrieval-augmented models. It provides the following features:
- Vector Indices: OrcaDB creates vector indices on your memory data which enables efficient similarity searches, making it easy to retrieve relevant memories for a given input.
- Embedding Generation: OrcaDB can compute embeddings for your memory data, which allows you to easily update memories in the database without having to worry about computing embeddings in your application code.
- Batch Lookups: OrcaDB is optimized for batch lookups, which allows you to retrieve multiple memories in a single request, speeding up the retrieval process significantly.
- Telemetry Storage: OrcaDB stores telemetry data about memory lookups and model runs, which allows you to analyze how your model is using memories and tune your memories based on the insights you gain.
- Reranking: OrcaDB has built-in support for reranking memory lookups with models that can be fine-tuned to your specific use case. This yields more accurate and relevant retrieval without requiring reranking logic in your application code, while memories can still be updated in real time.
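As an illustration of what a vector index and batch lookup compute under the hood, here is a brute-force cosine-similarity search in NumPy. This is a sketch of the operation a vector index accelerates, not the OrcaDB implementation or API:

```python
import numpy as np


def batch_lookup(queries: np.ndarray, memory_bank: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k most similar memories for each query.

    Brute-force cosine similarity; a vector index computes the same kind of
    result far more efficiently. Illustrative only, not OrcaDB internals.
    """
    # Normalize so the dot product equals cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    m = memory_bank / np.linalg.norm(memory_bank, axis=1, keepdims=True)
    scores = q @ m.T  # (num_queries, num_memories) similarity matrix
    # Sort descending per query and keep the top-k memory indices.
    return np.argsort(-scores, axis=1)[:, :k]


rng = np.random.default_rng(0)
memory_bank = rng.normal(size=(100, 16))  # 100 stored memory embeddings
# Two queries that are slightly perturbed copies of memories 3 and 42.
queries = memory_bank[[3, 42]] + 0.01 * rng.normal(size=(2, 16))
top = batch_lookup(queries, memory_bank, k=3)
print(top[0][0], top[1][0])  # nearest memories are the ones we perturbed: 3 42
```

Handling the whole batch in one matrix multiply is what makes batch lookups much faster than issuing one similarity search per input.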
OrcaLib is our Python library that allows you to easily store memories in OrcaDB, quickly build retrieval-augmented models, and enables instrumenting and analyzing memory usage.
Orca App is a web application for managing hosted OrcaDB instances, browsing memory data, monitoring memory usage, and tuning memories as model usage evolves over time.
## Benefits of Orca
Once you deploy a retrieval-augmented model and instrument it with Orca, you can leverage Orca to optimize model performance. Orca actively records memory lookups, analyzes memory relevance, and lets you record feedback for all model runs. You can use this data to improve model performance over time by:
- Assessing which memories typically contribute to accurate or inaccurate results and making surgical updates to problematic memories.
- Identifying underperforming clusters of inputs that would benefit from additional memories and easily generating new (real or synthetic) memories.
- Understanding which memories contribute to specific outputs to ensure compliance and guarantee your model is making fair and unbiased decisions.
## Structure of these Docs
This documentation is structured to take you from zero to production and beyond with Orca. It consists of the following sections:
Tutorials teach you how to build your first retrieval-augmented models, understand how they can adapt to data drift, and use Orca to optimize your memories.
How-to Guides help you leverage all the bells and whistles of Orca to build your custom retrieval-augmented models, instrument them with Orca, and deploy them to production.
Concepts provides explanations of the core concepts of retrieval-augmentation and how to reason about memory usage of these models.
Reference contains the detailed specification for the interfaces of all public OrcaLib modules.
## Where to Start
To get started, follow our quick start to get up and running with Orca in a few minutes:
- **Quick Start**: Set up OrcaDB, install OrcaLib, and build your first retrieval-augmented classifier in less than 10 minutes.
If you prefer more in-depth explanations of the core concepts of retrieval-augmentation, you might want to dive directly into one of our model-building tutorials.
Do you still have questions, or are you unsure how Orca fits into your specific use case? We'd love to chat with you!