orcalib.orca_torch_mixins#

ProjectionMode #

Bases: str, Enum

Determines how the values from the memory should be “projected” into the model’s embedding space (i.e., what the V is in the attention mechanism’s QKV).

Attributes:

  • LABEL

    Project the memory’s label into the model embedding space.

  • POSITIONAL

    Project the memory’s position (0…num_memories-1) into the model embedding space.

ClassificationMode #

Bases: str, Enum

Determines how the final classification is performed.

Attributes:

  • DIRECT

    Predicts directly into num_classes like a conventional classification model.

  • MEMORY_BOUND

    Uses memory binding to make the prediction (i.e., picks from the classes in the memories).

DropExactMatchOption #

Bases: str, Enum

Determines when to drop exact matches from the results.

Attributes:

  • ALWAYS

    Always drop exact matches from the results.

  • NEVER

    Never drop exact matches from the results.

  • TRAINING_ONLY

    Drop exact matches from the results only during training.

  • INFERENCE_ONLY

    Drop exact matches from the results only during inference.
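The four options above can be read as a small decision table over the module's training state. A minimal sketch of that reading follows; the enum here is a local stand-in whose string values are assumed, and `should_drop_exact_match` is a hypothetical helper, not part of orcalib.

```python
from enum import Enum


class DropExactMatchOption(str, Enum):
    """Local stand-in mirroring the documented members (string values assumed)."""

    ALWAYS = "ALWAYS"
    NEVER = "NEVER"
    TRAINING_ONLY = "TRAINING_ONLY"
    INFERENCE_ONLY = "INFERENCE_ONLY"


def should_drop_exact_match(option: DropExactMatchOption, training: bool) -> bool:
    """Hypothetical helper: decide whether exact matches are dropped."""
    if option is DropExactMatchOption.ALWAYS:
        return True
    if option is DropExactMatchOption.NEVER:
        return False
    if option is DropExactMatchOption.TRAINING_ONLY:
        return training
    # Only INFERENCE_ONLY remains: drop when the module is not training.
    return not training
```

Because the enum also derives from `str`, each member compares equal to its string value, which is convenient when settings are read from configuration files.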

PostInitMixin #

Bases: ABC

Mixin class that adds an (abstract) post_init() and wraps each descendant’s init() to call it.

Note:

If PostInitMixin appears more than once in the inheritance chain, only the outermost class will run post_init(). In other words, the post_init method will only be called once, after all other init methods have been called, even if there are multiple PostInitMixin classes in the inheritance chain.

post_init abstractmethod #

post_init()

Override this function to execute code after the init method.
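One way the "only the outermost class runs post_init()" behavior could be achieved is with a nesting counter around each wrapped constructor. The sketch below is an illustration under that assumption, not the library's implementation; `PostInitSketch`, `Base`, `Child`, and the `_init_depth` attribute are hypothetical names.

```python
import functools
from abc import ABC, abstractmethod


class PostInitSketch(ABC):
    """Stand-in for PostInitMixin; the wrapping strategy is an assumption."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        original_init = cls.__init__  # may be inherited and already wrapped

        @functools.wraps(original_init)
        def wrapped_init(self, *args, **kw):
            # Track how deeply nested we are in wrapped constructors.
            depth = getattr(self, "_init_depth", 0)
            self._init_depth = depth + 1
            try:
                original_init(self, *args, **kw)
            finally:
                self._init_depth = depth
            if depth == 0:
                # Only the outermost constructor call triggers post_init().
                self.post_init()

        cls.__init__ = wrapped_init

    @abstractmethod
    def post_init(self):
        """Runs once, after every __init__ in the chain has finished."""


class Base(PostInitSketch):
    def __init__(self):
        self.events = ["Base.__init__"]

    def post_init(self):
        self.events.append("post_init")


class Child(Base):
    def __init__(self):
        super().__init__()
        self.events.append("Child.__init__")


child = Child()
```

In this sketch `child.events` records `post_init` exactly once, after both constructors have run.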

PreForwardMixin #

Bases: ABC

Mixin class that adds an (abstract) pre_forward() and wraps each descendant’s forward() to call it before the original forward method.

NOTE: This uses functools.wraps to wrap the forward method, so the original forward method’s signature is preserved.

pre_forward abstractmethod #

pre_forward(*args, **kwargs)

Override this function to execute code before the forward method is called.

post_forward abstractmethod #

post_forward(output)

Override this function to execute code right after the forward method returned.
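The wrapping described above can be sketched with `functools.wraps`, which keeps the original method's name, docstring, and introspectable signature. This is an assumption-laden illustration rather than the library's code: `PreForwardSketch` and `Doubler` are hypothetical, and the sketch assumes the hooks' return values are ignored.

```python
import functools
import inspect
from abc import ABC, abstractmethod


class PreForwardSketch(ABC):
    """Stand-in for PreForwardMixin; hook return values are ignored here."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if "forward" not in cls.__dict__:
            return  # nothing new to wrap on this class
        original_forward = cls.__dict__["forward"]

        @functools.wraps(original_forward)
        def wrapped_forward(self, *args, **kw):
            self.pre_forward(*args, **kw)
            output = original_forward(self, *args, **kw)
            self.post_forward(output)
            return output

        cls.forward = wrapped_forward

    @abstractmethod
    def pre_forward(self, *args, **kwargs): ...

    @abstractmethod
    def post_forward(self, output): ...


class Doubler(PreForwardSketch):
    def __init__(self):
        self.calls = []

    def pre_forward(self, *args, **kwargs):
        self.calls.append(("pre", args))

    def post_forward(self, output):
        self.calls.append(("post", output))

    def forward(self, x):
        """Return twice the input."""
        return 2 * x


module = Doubler()
result = module.forward(3)
signature = inspect.signature(Doubler.forward)
```

`inspect.signature` follows the `__wrapped__` attribute set by `functools.wraps`, so tools that inspect `forward` still see the `(self, x)` parameters.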

CurateSettingsMixin #

CurateSettingsMixin(
    curate_database=None,
    model_id=None,
    model_version=None,
    metadata=None,
    curate_enabled=False,
    tags=None,
)

Mixin that adds curate settings to a class as self.curate_settings, then provides properties to access the individual settings.

Note

This class is intended to be used with OrcaModule classes, and should not be used directly.

Parameters:

  • curate_database (OrcaDatabase | str | None, default: None ) –

    The database to use for saving curate tracking data.

  • model_id (str | None, default: None ) –

    The model id to associate with curated model runs.

  • model_version (str | None, default: None ) –

    The model version to associate with curated model runs.

  • metadata (OrcaMetadataDict | None, default: None ) –

    The metadata to attach to curated model runs.

  • curate_enabled (bool, default: False ) –

    Whether the model should collect curate tracking data during eval runs.

  • tags (Iterable[str] | None, default: None ) –

    The tags to attach to the curated model runs.

curate_database property writable #

curate_database

The name of the database to use for saving curate tracking data.

curate_next_run_settings property writable #

curate_next_run_settings

The settings for the next curate model run.

curate_model_id property writable #

curate_model_id

The model id to associate with curated model runs.

curate_model_version property writable #

curate_model_version

The model version to associate with curated model runs.

curate_metadata property writable #

curate_metadata

The metadata to attach to curated model runs.

curate_tags property writable #

curate_tags

The tags to attach to the curated model runs.

curate_seq_id property writable #

curate_seq_id

The sequence id to associate with curated model runs.

curate_batch_size property writable #

curate_batch_size

The batch size of the model run to track curate data for, usually inferred automatically.

last_curate_run_ids property writable #

last_curate_run_ids

The run ids of the last model run for which curate tracking data was collected.

last_curate_run_settings property writable #

last_curate_run_settings

The settings of the last model run for which curate tracking data was collected.

DatabaseIndexName dataclass #

DatabaseIndexName(database_name, index_name)

Holds the name of an index and its associated database.

Attributes:

  • database_name (str) –

    The name of the database for the index.

  • index_name (str) –

    The name of the index

LookupSettingsSummary dataclass #

LookupSettingsSummary(
    lookup_database_name,
    memory_index_name,
    lookup_column_names=list(),
    num_memories_range=None,
    drop_exact_match=list(),
    exact_match_thresholds=list(),
    shuffle_memories=list(),
)

A summary of lookup settings for a collection of OrcaLookupModule instances that share the same database and index.

Note

The summary doesn’t actually summarize over all possible settings, but instead chooses to ignore the “override” settings (e.g., lookup_result_override, lookup_query_override).

Attributes:

  • lookup_database_name (str) –

    The name of the database used for looking up memories. This is half of the key for the summary; the other half is the memory index name.

  • memory_index_name (str) –

    The name of the index used for looking up memories. This is half of the key for the summary; the other half is the lookup database name.

  • lookup_column_names (list[str]) –

    A list of lookup columns that were requested by any of the [LookupSettings] in this summary.

  • num_memories_range (tuple[int, int] | None) –

    The range of the number of memories to look up across all [LookupSettings] in this summary. This will be None if num_memories was not set in any of the [LookupSettings].

  • drop_exact_match (list[DropExactMatchOption]) –

    The options for dropping exact matches from the results for all [LookupSettings] in this summary. This will be [] if no drop-exact-match options were set.

  • exact_match_thresholds (list[float]) –

    The exact-match thresholds for all [LookupSettings] in this summary. This will be [] if no exact-match thresholds were set.

  • shuffle_memories (list[bool]) –

    The shuffle-memories options for all [LookupSettings] in this summary. This will be [] if no shuffle-memories options were set.

Example

settings1 = LookupSettings(
    lookup_database_name="test_db",
    memory_index_name="test_index1",
    lookup_column_names=["col1", "col2", "$score"],
    num_memories=10,
    drop_exact_match=DropExactMatchOption.NEVER,
)

settings2 = LookupSettings(
    lookup_database_name="test_db",
    memory_index_name="test_index1",
    lookup_column_names=["col2", "col3"],
    num_memories=20,
    drop_exact_match=DropExactMatchOption.ALWAYS,
)

summary = LookupSettingsSummary.from_lookup_settings([settings1, settings2])
Result:
{
    DatabaseIndexName("test_db", "test_index1"): LookupSettingsSummary(
        lookup_database_name="test_db",
        memory_index_name="test_index1",
        lookup_column_names=["col1", "col2", "col3", "$score"],
        num_memories_range=(10, 20),
        drop_exact_match=[DropExactMatchOption.NEVER, DropExactMatchOption.ALWAYS],
        exact_match_thresholds=[],
        shuffle_memories=[],
    )
}
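The grouping and merging shown in the example can be approximated with plain dataclasses. The sketch below covers only column names and the memory-count range; `IndexKey`, `Settings`, `Summary`, and `summarize` are hypothetical stand-ins for `DatabaseIndexName`, `LookupSettings`, `LookupSettingsSummary`, and `from_lookup_settings`. Columns are kept in first-seen order, which may differ from the ordering the library produces.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass(frozen=True)
class IndexKey:
    """Hashable (database, index) pair used as the dictionary key."""

    database_name: str
    index_name: str


@dataclass
class Settings:
    """Stand-in for LookupSettings with only the fields used here."""

    lookup_database_name: str
    memory_index_name: str
    lookup_column_names: list = field(default_factory=list)
    num_memories: Optional[int] = None


@dataclass
class Summary:
    lookup_database_name: str
    memory_index_name: str
    lookup_column_names: list = field(default_factory=list)
    num_memories_range: Optional[tuple] = None

    def __or__(self, settings: Settings) -> "Summary":
        # Union of column names, preserving first-seen order.
        columns = list(self.lookup_column_names)
        for name in settings.lookup_column_names:
            if name not in columns:
                columns.append(name)
        # Widen the (min, max) range only when num_memories is set.
        rng = self.num_memories_range
        n = settings.num_memories
        if n is not None:
            rng = (n, n) if rng is None else (min(rng[0], n), max(rng[1], n))
        return Summary(
            self.lookup_database_name, self.memory_index_name, columns, rng
        )


def summarize(all_settings: Iterable[Settings]) -> dict:
    """Build one Summary per unique (database, index) pair."""
    summaries = {}
    for s in all_settings:
        key = IndexKey(s.lookup_database_name, s.memory_index_name)
        if key not in summaries:
            summaries[key] = Summary(key.database_name, key.index_name)
        summaries[key] = summaries[key] | s
    return summaries


summaries = summarize([
    Settings("test_db", "test_index1", ["col1", "col2", "$score"], 10),
    Settings("test_db", "test_index1", ["col2", "col3"], 20),
])
summary = summaries[IndexKey("test_db", "test_index1")]
```

Making the key dataclass frozen is what makes it hashable, so it can serve as a dictionary key in the same way the example above uses `DatabaseIndexName`.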

__or__ #

__or__(settings)

Merges a LookupSettings object into the LookupSettingsSummary object.

Parameters:

  • settings (LookupSettings) –

    The LookupSettings object to merge.

Returns:

from_lookup_settings classmethod #

from_lookup_settings(lookup_settings)

Create a dictionary of LookupSettingsSummary objects from a collection of LookupSettings objects. This is useful for summarizing the lookup settings for a collection of OrcaLookupModule instances. The keys of the dictionary are DatabaseIndexName objects, so we have a separate summary object for each unique database–index combination.

Parameters:

  • lookup_settings (Iterable[LookupSettings]) –

    An iterable collection of LookupSettings objects to summarize.

Returns:

LookupSettingsMixin #

LookupSettingsMixin(
    lookup_database=None,
    memory_index_name=None,
    lookup_column_names=None,
    num_memories=None,
    drop_exact_match=None,
    exact_match_threshold=None,
    shuffle_memories=False,
    lookup_result_override=None,
    lookup_result_transforms=None,
    lookup_query_override=None,
    extra_lookup_column_names=None,
    freeze_num_memories=False,
    propagate_lookup_settings=True,
)

Mixin that adds lookup settings to a class as self.lookup_settings, then provides properties to access the individual settings.

Note

This class is intended to be used with OrcaModule classes, and should not be used directly.

Parameters:

  • lookup_database (OrcaDatabase | str | None, default: None ) –

    The database to use for looking up memories.

  • memory_index_name (str | None, default: None ) –

    The name of the index to use for looking up memories.

  • lookup_column_names (list[str] | None, default: None ) –

    The names of the columns to retrieve for each memory.

  • num_memories (int | None, default: None ) –

    The number of memories to look up.

  • drop_exact_match (DropExactMatchOption | None, default: None ) –

    When to drop exact matches from the results.

  • exact_match_threshold (float | None, default: None ) –

    The similarity threshold for exact matches.

  • shuffle_memories (bool, default: False ) –

    Whether to shuffle the looked up memories.

  • freeze_num_memories (bool, default: False ) –

    Whether to freeze the number of memories once set.

  • propagate_lookup_settings (bool, default: True ) –

    Whether to propagate lookup settings to child modules.

lookup_result_transforms property writable #

lookup_result_transforms

A list of transforms to apply to the lookup result. NOTE: This will be applied even when lookup_result_override is set.

extra_lookup_column_names property writable #

extra_lookup_column_names

While set, all lookups will include these additional columns. They may include columns on the indexed table as well as index-specific columns, e.g., $score, $embedding.

lookup_query_override property writable #

lookup_query_override

The query to use instead of performing a lookup. NOTE: This will be ignored if lookup_result_override is also set.

lookup_result_override property writable #

lookup_result_override

The lookup result to use instead of performing a lookup.

lookup_database property writable #

lookup_database

The name of the database to use for looking up memories.

memory_index_name property writable #

memory_index_name

The name of the index to use for looking up memories.

lookup_column_names property writable #

lookup_column_names

The names of the columns to retrieve for each memory.

num_memories property writable #

num_memories

The number of memories to look up.

drop_exact_match property writable #

drop_exact_match

When to drop exact matches from the results.

exact_match_threshold property writable #

exact_match_threshold

The similarity threshold for exact matches.

shuffle_memories property writable #

shuffle_memories

Whether to shuffle the looked up memories.

get_effective_lookup_settings #

get_effective_lookup_settings()

Returns the effective lookup settings for this module, with any inherited settings applied.

Returns:

  • LookupSettings

    The effective lookup settings for this module. In practice, these are the lookup settings set on this module; for any setting that is not set on this module, the inherited setting is used instead.
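The fallback rule described above can be sketched as a field-by-field merge. The example assumes that `None` means "not set on this module"; `SettingsSketch` and `effective_settings` are hypothetical names and carry only three of the documented settings.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass
class SettingsSketch:
    """Stand-in for LookupSettings; None is assumed to mean 'not set'."""

    memory_index_name: Optional[str] = None
    num_memories: Optional[int] = None
    shuffle_memories: Optional[bool] = None


def effective_settings(own: SettingsSketch, inherited: SettingsSketch) -> SettingsSketch:
    """Keep the module's own values and fill unset fields from inherited ones."""
    fallbacks = {
        f.name: getattr(inherited, f.name)
        for f in fields(own)
        if getattr(own, f.name) is None
    }
    # replace() returns a new object, so neither input is mutated.
    return replace(own, **fallbacks)


own = SettingsSketch(num_memories=5)
inherited = SettingsSketch("test_index1", 10, False)
effective = effective_settings(own, inherited)
```

Here the module's own `num_memories` wins, while the index name and the shuffle flag come from the inherited settings.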

get_lookup_database_instance #

get_lookup_database_instance()

Returns the OrcaDatabase instance to use for looking up memories.

LabelColumnNameMixin #

LabelColumnNameMixin()

Mixin that lets the user set a label column for lookup instead of requiring them to set the lookup column names directly. It can be mixed with OrcaModel or OrcaLookupModule classes.

This is useful when the user wants the lookup columns to be ["$embedding", label_column_name]. The label_column_name property handles updates to lookup_column_names automatically.

Note

Make sure to set self.label_column_name AFTER calling super().__init__(...) in derived modules/models.

label_column_name property writable #

label_column_name

The name of the label column to use for lookup.
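A property setter is one natural way to keep the lookup columns in sync with the label column. The sketch below assumes the `["$embedding", label_column_name]` layout mentioned above; `LabelColumnSketch` is a hypothetical stand-in, and resetting the columns to `None` when the label is cleared is an assumption.

```python
from typing import Optional


class LabelColumnSketch:
    """Stand-in for LabelColumnNameMixin combined with a lookup module."""

    def __init__(self):
        self.lookup_column_names: Optional[list] = None
        self._label_column_name: Optional[str] = None

    @property
    def label_column_name(self) -> Optional[str]:
        return self._label_column_name

    @label_column_name.setter
    def label_column_name(self, value: Optional[str]) -> None:
        self._label_column_name = value
        # Keep the lookup columns consistent with the chosen label column.
        self.lookup_column_names = None if value is None else ["$embedding", value]


class Classifier(LabelColumnSketch):
    def __init__(self, label_column_name: str):
        super().__init__()
        # Set the label column only after the base initializer has run.
        self.label_column_name = label_column_name


model = Classifier("label")
```

Setting the property before `super().__init__()` would be undone in this sketch, because the base initializer resets both attributes, which mirrors the ordering advice in the note above.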