# Welcome to Orca
Orca enables you to build and maintain retrieval-augmented models that can adapt to changing circumstances. This documentation will help you get started building models with Orca, understand the core concepts of retrieval-augmentation, and learn how to leverage Orca to maintain model performance through memory tuning.
## Retrieval-Augmentation
You might have heard of retrieval-augmented generation (RAG) in the context of LLMs. Orca takes a similar approach but applies it to arbitrary neural networks built with PyTorch.
Retrieval-augmentation is a technique that enables machine learning models to adapt to new circumstances without retraining by accessing external data, which we call "memories," stored separately from the model's logic. During training, Orca injects relevant memory data based on model inputs, teaching the model to efficiently use this supplementary information alongside its inherent knowledge. At inference, the model reliably looks up and uses memories, allowing its behavior to change without retraining or redeployment.
This enables you to adapt model behavior in real time by updating the memories the model uses to make predictions. You can address data drift, customize behavior for users at scale, identify and combat bias, and more, all without retraining or redeployment. Check out our guide on retrieval-augmentation to learn more.
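To make the idea concrete, here is a minimal, framework-agnostic sketch in plain PyTorch (this is illustrative only, not the Orca API): retrieved memory embeddings are aggregated and concatenated with the input representation before the model's head, so swapping the memories changes predictions without touching the weights.

```python
import torch
import torch.nn as nn


class RetrievalAugmentedClassifier(nn.Module):
    """Toy retrieval-augmented model (illustrative sketch, not the Orca API).

    The classification head sees both the input embedding and a summary of
    the k memories retrieved for that input.
    """

    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        # Input embedding plus aggregated memory embedding are concatenated.
        self.head = nn.Linear(embed_dim * 2, num_classes)

    def forward(self, x: torch.Tensor, memories: torch.Tensor) -> torch.Tensor:
        # x: (batch, embed_dim); memories: (batch, k, embed_dim)
        memory_summary = memories.mean(dim=1)  # aggregate the k retrieved memories
        return self.head(torch.cat([x, memory_summary], dim=-1))


model = RetrievalAugmentedClassifier(embed_dim=8, num_classes=3)
x = torch.randn(4, 8)            # a batch of 4 input embeddings
memories = torch.randn(4, 5, 8)  # 5 retrieved memory embeddings per input
logits = model(x, memories)
print(logits.shape)  # torch.Size([4, 3])
```

Because the memories arrive as a forward-pass argument rather than being baked into the weights, updating the memory store is enough to change the model's behavior at inference time.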
Orca provides a custom database optimized for efficient memory storage, embedding generation, and retrieval. It also offers capabilities for tracking memory lookups, recording feedback, and continuously tuning memories based on telemetry data to improve performance and ensure compliance.
## Orca Components
Orca consists of three main components that work together to enable building, instrumenting, and maintaining retrieval-augmented models:
OrcaDB is a database optimized for memory storage and lookups that also tracks telemetry data about memory usage and model runs.
- We provide hosted OrcaDB instances for different cloud providers and a local file-based database with a subset of the functionality for quick model prototyping.
### Why Use a Special Database?
OrcaDB is designed to store and retrieve memory data efficiently for retrieval-augmented models. It provides the following features:
- Vector Indices: OrcaDB creates vector indices on your memory data which enables efficient similarity searches, making it easy to retrieve relevant memories for a given input.
- Embedding Generation: OrcaDB can compute embeddings for your memory data, which allows you to easily update memories in the database without having to worry about computing embeddings in your application code.
- Batch Lookups: OrcaDB is optimized for batch lookups, which allows you to retrieve multiple memories in a single request, speeding up the retrieval process significantly.
- Telemetry Storage: OrcaDB stores telemetry data about memory lookups and model runs, which allows you to analyze how your model is using memories and tune your memories based on the insights you gain.
- Reranking: OrcaDB has built-in support for reranking memory lookups with models that can be fine-tuned to your specific use case. This yields more accurate and relevant retrieval without requiring reranking logic in your application code, while memories can still be updated in real time.
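As an illustration of what a vector index and batch lookup compute under the hood, here is a brute-force cosine-similarity search in NumPy. This is a sketch of the operation a vector index accelerates, not the OrcaDB implementation or API:

```python
import numpy as np


def batch_lookup(queries: np.ndarray, memory_bank: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k most similar memories for each query.

    Brute-force cosine similarity; a vector index computes the same kind of
    result far more efficiently. Illustrative only, not OrcaDB internals.
    """
    # Normalize so the dot product equals cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    m = memory_bank / np.linalg.norm(memory_bank, axis=1, keepdims=True)
    scores = q @ m.T  # (num_queries, num_memories) similarity matrix
    # Sort descending per query and keep the top-k memory indices.
    return np.argsort(-scores, axis=1)[:, :k]


rng = np.random.default_rng(0)
memory_bank = rng.normal(size=(100, 16))  # 100 stored memory embeddings
# Two queries that are slightly perturbed copies of memories 3 and 42.
queries = memory_bank[[3, 42]] + 0.01 * rng.normal(size=(2, 16))
top = batch_lookup(queries, memory_bank, k=3)
print(top[0][0], top[1][0])  # nearest memories are the ones we perturbed: 3 42
```

Handling the whole batch in one matrix multiply is what makes batch lookups much faster than issuing one similarity search per input.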
OrcaLib is our Python library that allows you to easily store memories in OrcaDB, quickly build retrieval-augmented models, and enables instrumenting and analyzing memory usage.
Orca App is a web application for managing hosted OrcaDB instances, browsing memory data, monitoring memory usage, and tuning memories as model usage evolves over time.
## Benefits of Orca
Once you deploy a retrieval-augmented model and instrument it with Orca, you can leverage Orca to optimize model performance. Orca actively records memory lookups, analyzes memory relevance, and lets you record feedback for all model runs. You can use this data to improve model performance over time by:
- Assessing which memories typically contribute to accurate or inaccurate results and making surgical updates to problematic memories.
- Identifying underperforming clusters of inputs that would benefit from additional memories and easily generating new (real or synthetic) memories.
- Understanding which memories contribute to specific outputs to ensure compliance and guarantee your model is making fair and unbiased decisions.
## Structure of these Docs
This documentation is structured to take you from zero to production and beyond with Orca. It consists of the following sections:
Tutorials teach you how to build your first retrieval-augmented models, understand how they can adapt to data drift, and use Orca to optimize your memories.
How-to Guides help you leverage all the bells and whistles of Orca to build your custom retrieval-augmented models, instrument them with Orca, and deploy them to production.
Concepts provides explanations of the core concepts of retrieval-augmentation and how to reason about memory usage of these models.
Reference contains the detailed specification for the interfaces of all public OrcaLib modules.
## Where to Start
To get started, follow our quick start to get up and running with Orca in a few minutes:
- **Quick Start**: Set up OrcaDB, install OrcaLib, and build your first retrieval-augmented classifier in less than 10 minutes.
If you prefer more in-depth explanations of the core concepts of retrieval-augmentation, you might want to dive directly into one of our model-building tutorials.
Do you still have questions, or are you unsure how Orca fits into your specific use case? We'd love to chat with you!