orcalib.hf_utils
OrcaGroundingProcessor
Bases: LogitsProcessor
Logits processor that adjusts the logits based on the memories in the database.
Parameters:
- memories (list[list[str]]) – List of memories.
- tokenizer (Any) – Tokenizer.
- bag_weight (float, default: 0.05) – Bag-of-words weight.
- sim_weight (float, default: 0.5) – Similarity weight.
- S_min (int, default: 3) – Minimum suffix length.
- S_max (int, default: 10) – Maximum suffix length.
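The parameter list does not say how S_min and S_max are used. One plausible reading, offered only as a hypothetical sketch and not as orcalib's implementation, is suffix matching. For each suffix length s from S_max down to S_min, the last s generated tokens are searched for inside each memory, and the token that follows a match is counted as a candidate continuation. Several things here are assumptions: the function name `suffix_match_distribution`, the longest-match-wins rule, the requirement that memories are already converted to token IDs with the tokenizer, and the per-memory weight exp(-decay * rank). That weight is loosely modeled on the note under `__call__` that semantic scores currently use exponential decay.

```python
import math


def suffix_match_distribution(
    generated: list[int],
    memories: list[list[int]],
    vocab_size: int,
    s_min: int = 3,
    s_max: int = 10,
    decay: float = 1.0,
) -> list[float]:
    """Hypothetical sketch: next-token distribution from suffix matches in memories.

    `memories` are assumed to be already tokenized and ordered from most to
    least relevant; memory at position `rank` gets weight exp(-decay * rank).
    """
    dist = [0.0] * vocab_size
    lo = max(s_min, 1)  # a zero-length suffix would match everywhere
    for rank, memory in enumerate(memories):
        weight = math.exp(-decay * rank)
        # Try the longest suffix first; stop at the first length that matches.
        for s in range(min(s_max, len(generated)), lo - 1, -1):
            suffix = generated[-s:]
            found = False
            # Only positions where the suffix is followed by at least one token.
            for start in range(len(memory) - s):
                if memory[start:start + s] == suffix:
                    dist[memory[start + s]] += weight
                    found = True
            if found:
                break
    total = sum(dist)
    # Normalize to a probability distribution; stays all zeros if nothing matched.
    return [d / total for d in dist] if total > 0 else dist
```

For example, with `generated = [5, 6, 7, 8]` and memories `[[1, 6, 7, 8, 2, 9], [6, 7, 8, 3]]`, both memories match the 3-token suffix `[6, 7, 8]`. Token 2 (from the higher-ranked memory) receives more mass than token 3.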
__call__
Adjusts the given scores based on memory and similarity weights.
This method processes the input IDs of each batch entry together with the corresponding row of scores, adjusting the scores with memory-based and similarity-based terms. The adjusted probabilities are then normalized and converted to log probabilities.
- The similarity and bag-of-words weights (self.sim_weight and self.bag_weight) are used to scale the adjustments.
- The semantic scores are currently based on exponential decay but can be modified to use scores from an approximate nearest neighbor (ANN) search in the future.
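The normalization step described above can be sketched for a single row of `scores` as follows. This is a minimal sketch under stated assumptions, not the method's actual code. The raw scores are assumed to be turned into probabilities with a softmax. The memory-derived `bag` and `sim` distributions (one value per vocabulary entry) are assumed to be added after scaling by `bag_weight` and `sim_weight`. The sum is then renormalized and converted to log probabilities. The additive mixing rule and the helper name `adjust_row` are hypothetical.

```python
import math


def adjust_row(
    scores: list[float],
    bag: list[float],
    sim: list[float],
    bag_weight: float = 0.05,
    sim_weight: float = 0.5,
) -> list[float]:
    """Hypothetical sketch of the per-row adjustment (pure Python, one batch row)."""
    # Numerically stable softmax of the model's raw scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Assumed additive mixing: add the weighted bag-of-words and similarity terms.
    adjusted = [p + bag_weight * b + sim_weight * s for p, b, s in zip(probs, bag, sim)]
    # Renormalize so the row sums to 1, then convert to log probabilities.
    total = sum(adjusted)
    return [math.log(a / total) if a > 0 else -math.inf for a in adjusted]
```

With `scores = [2.0, 1.0, 0.0]` the model alone prefers token 0. A similarity distribution concentrated on token 1 (`sim = [0.0, 1.0, 0.0]`) at the default `sim_weight` is enough to make token 1 the most likely. Mixing in probability space keeps every term on the same scale before renormalizing.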
Parameters:
- input_ids (Tensor) – A tensor of shape (batch_size, sequence_length) containing input token IDs.
- scores (Tensor) – A tensor of shape (batch_size, vocab_size) containing the initial scores for each token.
Returns:
- Tensor – A tensor of shape (batch_size, vocab_size) containing the log probabilities after adjustment.
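The Returns contract can be checked without knowing the processor's internals. Each row must have vocab_size entries, and its exponentiated values must sum to 1. The helper below, `is_valid_log_prob_batch`, is a hypothetical sketch that works on nested Python lists. A PyTorch tensor can be converted first with `Tensor.tolist()`. Entries equal to -inf (tokens with zero probability) are allowed.

```python
import math


def is_valid_log_prob_batch(rows: list[list[float]], vocab_size: int, tol: float = 1e-6) -> bool:
    """Hypothetical check: shape (batch_size, vocab_size) and each row is a log-distribution."""
    for row in rows:
        if len(row) != vocab_size:
            return False
        finite = [x for x in row if x != -math.inf]
        if not finite:
            return False
        # The log-sum-exp of a log-probability row is 0, i.e. the probabilities sum to 1.
        m = max(finite)
        lse = m + math.log(sum(math.exp(x - m) for x in finite))
        if abs(lse) > tol:
            return False
    return True
```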