orcalib.rac.drift_simulator
drift_classes
Modify a balanced Huggingface dataset into one with an unequal class distribution.
Parameters:
- dataset (DatasetDict | Dataset) – The Huggingface dataset (assumed to be balanced).
- drift_ratios (dict[int, float]) – A dictionary mapping class labels to the proportion of samples to retain (e.g., {0: 0.5, 1: 0.2, 2: 1.0}). Classes not listed are left unchanged.
Returns:
- DatasetDict | Dataset – A new dataset with drifted class distributions.
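For illustration, a minimal usage sketch. The toy dataset and its "label" column are assumptions; only the function and parameter names come from the signature above.

```python
# Hypothetical usage sketch for drift_classes (the toy data and the
# "label" column name are assumptions, not part of the documented API).
from datasets import Dataset

from orcalib.rac.drift_simulator import drift_classes

# A small balanced dataset: four samples for each of the classes 0, 1, and 2.
balanced = Dataset.from_dict(
    {
        "text": [f"sample {i}" for i in range(12)],
        "label": [0, 1, 2] * 4,
    }
)

# Keep 50% of class 0, 20% of class 1, and all of class 2.
drifted = drift_classes(balanced, drift_ratios={0: 0.5, 1: 0.2, 2: 1.0})
print(drifted)
```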
test_dataset_drift
Evaluate the performance of two models on a dataset that has been artificially drifted.
Parameters:
- dataset (Dataset | IterableDataset | IterableDatasetDict | DatasetDict) – The Huggingface dataset to drift and evaluate on.
- drift_ratios (dict[int, float]) – A dictionary mapping class labels to drift ratios, e.g. {0: 0.1, 1: 0.2}.
- model_a_config (dict) – Configuration for model A.
- model_b_config (dict) – Configuration for model B.
- memoryset_config (dict[str, str], default: default_memoryset_config) – Configuration for where memorysets are stored. Defaults to default_memoryset_config.
- training_config_a (TrainingConfig, default: training_config) – Training configuration for the first model.
- training_config_b (TrainingConfig, default: training_config) – Training configuration for the second model.
- graph_config (dict[str, str], default: {'title': 'Drift performance comparison', 'xaxis': 'Metrics', 'yaxis': 'Score'}) – Configuration for the graph title and axis labels. Defaults to {"title": "Drift performance comparison", "xaxis": "Metrics", "yaxis": "Score"}.
Info
This prints a graph comparing the performance (f1, roc_auc, accuracy) of the two models before and after the drift.
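A hedged example of invoking the comparison. The dataset choice and the keys inside model_a_config / model_b_config are placeholders, since their expected contents are not specified here; the graph_config values mirror the documented defaults.

```python
# Hypothetical comparison run. The dataset choice and the keys inside the
# model config dicts are placeholders; only the parameter names and the
# graph_config values come from the documentation above.
from datasets import load_dataset

from orcalib.rac.drift_simulator import test_dataset_drift

dataset = load_dataset("ag_news", split="train[:2000]")

test_dataset_drift(
    dataset=dataset,
    drift_ratios={0: 0.1, 1: 0.2},  # shrink classes 0 and 1
    model_a_config={"description": "baseline model"},   # placeholder keys
    model_b_config={"description": "candidate model"},  # placeholder keys
    # memoryset_config and the training configs fall back to their defaults.
    graph_config={
        "title": "Drift performance comparison",
        "xaxis": "Metrics",
        "yaxis": "Score",
    },
)
```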