orcalib.rac.drift_simulator

drift_classes

drift_classes(dataset, drift_ratios)

Subsample a balanced Hugging Face dataset so that its classes follow an unequal distribution.

Parameters:

  • dataset (DatasetDict | Dataset) –

    The Hugging Face dataset (assumed balanced).

  • drift_ratios (dict[int, float]) –

    A dictionary where keys are class ints and values are the desired proportion of samples to retain for that class (e.g., {0: 0.5, 1: 0.2, 2: 1.0}). A class whose key is missing is left unchanged.
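
The effect of drift_ratios can be illustrated with a minimal, standard-library sketch. This is not orcalib's implementation: the helper name drift_indices_sketch, the seed argument, and the choice to truncate fractional sample counts are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def drift_indices_sketch(labels, drift_ratios, seed=0):
    """Return sorted row indices kept after per-class subsampling.

    Hypothetical helper for illustration only, not part of orcalib.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for label, indices in by_class.items():
        # A class missing from drift_ratios is kept in full.
        ratio = drift_ratios.get(label, 1.0)
        # Assumption: fractional counts are truncated toward zero.
        n_keep = int(len(indices) * ratio)
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


# Balanced toy labels: 10 samples for each of 3 classes.
labels = [0] * 10 + [1] * 10 + [2] * 10
kept = drift_indices_sketch(labels, {0: 0.5, 1: 0.2})
counts = {c: sum(1 for i in kept if labels[i] == c) for c in (0, 1, 2)}
print(counts)  # {0: 5, 1: 2, 2: 10}
```

With a Hugging Face Dataset, the retained indices could then be applied with dataset.select(kept).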

Returns:

test_dataset_drift

test_dataset_drift(
    dataset,
    drift_ratios,
    model_a_config,
    model_b_config,
    memoryset_config=default_memoryset_config,
    training_config_a=training_config,
    training_config_b=training_config,
    graph_config={
        "title": "Drift performance comparison",
        "xaxis": "Metrics",
        "yaxis": "Score",
    },
    dataset_name=None,
)

Evaluate the performance of two models on a dataset that has been artificially drifted.

Parameters:

  • dataset (Dataset | IterableDataset | IterableDatasetDict | DatasetDict) –

    The Hugging Face dataset to drift and evaluate on.

  • drift_ratios (dict[int, float]) –

    A dictionary mapping class ints to drift ratios, e.g., {0: 0.1, 1: 0.2}.

  • model_a_config (dict) –

    Configuration for model A

  • model_b_config (dict) –

    Configuration for model B

  • memoryset_config (dict[str, str], default: default_memoryset_config ) –

    Configuration for where memorysets are stored. Defaults to default_memoryset_config.

  • training_config_a (TrainingConfig, default: training_config ) –

    Training configuration for the first model.

  • training_config_b (TrainingConfig, default: training_config ) –

    Training configuration for the second model.

  • graph_config (dict[str, str], default: {'title': 'Drift performance comparison', 'xaxis': 'Metrics', 'yaxis': 'Score'} ) –

    Configuration for the graph title and axis labels. Defaults to {"title": "Drift performance comparison", "xaxis": "Metrics", "yaxis": "Score"}.

Info

This prints a graph that compares the performance (f1, roc_auc, accuracy) of the two models before and after the drift.
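
To see why the before and after scores can differ, here is a small sketch in plain Python that scores one fixed set of predictions on a balanced toy set and on a drifted subset. The data, the predictions, and the helpers accuracy and f1_for_class are hypothetical and are not orcalib functions; roc_auc is omitted because it needs prediction scores rather than hard labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Balanced toy data: 10 samples per class.
y_true = [0] * 10 + [1] * 10
# Hypothetical predictions: perfect on class 0, 6 of 10 correct on class 1.
y_pred = [0] * 10 + [1] * 6 + [0] * 4

# Drift equivalent to {0: 0.2}: keep 2 of the class 0 rows, all class 1 rows.
keep = list(range(2)) + list(range(10, 20))
y_true_drift = [y_true[i] for i in keep]
y_pred_drift = [y_pred[i] for i in keep]

before = {
    "accuracy": accuracy(y_true, y_pred),
    "f1": f1_for_class(y_true, y_pred, positive=1),
}
after = {
    "accuracy": accuracy(y_true_drift, y_pred_drift),
    "f1": f1_for_class(y_true_drift, y_pred_drift, positive=1),
}
print("before:", before)
print("after: ", after)
```

In this toy case accuracy falls once the well-predicted class is under-represented, while the F1 score for class 1 is unchanged because none of the removed rows were misclassified.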