orcalib.hf_utils

OrcaGroundingProcessor

OrcaGroundingProcessor(
    memories,
    tokenizer,
    bag_weight=0.05,
    sim_weight=0.5,
    S_min=3,
    S_max=10,
)

Bases: LogitsProcessor

Logits processor that adjusts the logits based on the memories in the database.

Parameters:

  • memories (list[list[str]]) –

    List of memories

  • tokenizer (Any) –

    Tokenizer

  • bag_weight (float, default: 0.05 ) –

    Bag of words weight

  • sim_weight (float, default: 0.5 ) –

    Similarity weight

  • S_min (int, default: 3 ) –

    Minimum suffix length

  • S_max (int, default: 10 ) –

    Maximum suffix length
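
The reference does not say how S_min and S_max are used. One plausible reading, shown here purely as an assumption, is that the trailing tokens of the generated text are matched against each memory, trying suffix lengths from S_max down to S_min and collecting the tokens that follow a match. The helper name `suffix_continuations` and its matching strategy are hypothetical and not part of orcalib.

```python
from collections import Counter


def suffix_continuations(generated, memory, s_min=3, s_max=10):
    """Count tokens that follow a suffix of `generated` inside `memory`.

    Hypothetical helper (not orcalib's implementation): tries suffix lengths
    from s_max down to s_min and returns the counts for the longest suffix
    that occurs in the memory, together with that suffix length.
    """
    longest = min(s_max, len(generated))
    for length in range(longest, s_min - 1, -1):
        suffix = generated[-length:]
        counts = Counter()
        # Slide over the memory, stopping early enough that a next token exists.
        for start in range(len(memory) - length):
            if memory[start:start + length] == suffix:
                counts[memory[start + length]] += 1
        if counts:
            return length, counts
    # No suffix of an allowed length matched.
    return 0, Counter()


memory = ["the", "cat", "sat", "on", "the", "mat", "and", "the", "cat", "sat", "down"]
generated = ["then", "the", "cat", "sat"]
length, counts = suffix_continuations(generated, memory, s_min=3, s_max=10)
```

In this sketch a larger S_min demands more context before a memory can influence the next token, while S_max caps how much of the generated text is compared.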

__call__

__call__(input_ids, scores)

Adjusts the given scores based on memory and similarity weights.

For each sequence in the batch, this method applies memory-based and similarity-based adjustments to the corresponding row of scores. The adjusted probabilities are then normalized and converted to log probabilities.

  • The similarity and bag-of-words weights (self.sim_weight and self.bag_weight) are used to scale the adjustments.
  • The semantic scores are currently based on exponential decay but can be modified to use scores from an approximate nearest neighbor (ANN) search in the future.
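
The adjust, normalize, and log steps described above can be sketched for a single row of scores in plain Python. The additive combination rule, the `decay_score` helper, its `rate` parameter, and the choice to decay over a match rank are all assumptions made for illustration. The source only states that the two weights scale the adjustments, that semantic scores use exponential decay, and that the result is normalized and returned as log probabilities.

```python
import math


def decay_score(rank, rate=0.5):
    """Hypothetical semantic score: exponential decay over a match rank."""
    return math.exp(-rate * rank)


def adjust_scores(scores, bag_boost, sim_boost, bag_weight=0.05, sim_weight=0.5):
    """Return log probabilities for one row of scores after a weighted boost.

    Boost values are assumed to be non-negative, so every adjusted
    probability stays strictly positive and the logarithm is defined.
    """
    # Softmax over the raw scores (shifted by the max for numerical stability).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Assumed combination rule: add each boost scaled by its weight.
    adjusted = [
        p + bag_weight * b + sim_weight * s
        for p, b, s in zip(probs, bag_boost, sim_boost)
    ]

    # Renormalize so the row sums to 1, then convert to log probabilities.
    norm = sum(adjusted)
    return [math.log(a / norm) for a in adjusted]


scores = [2.0, 1.0, 0.0, 0.0]                            # one row of (batch_size, vocab_size)
bag_boost = [0.0, 1.0, 0.0, 0.0]                         # token 1 appears in a memory
sim_boost = [0.0, decay_score(0), decay_score(1), 0.0]   # ranked matches
log_probs = adjust_scores(scores, bag_boost, sim_boost)
```

With these inputs, token 1 starts as the second most likely token and becomes the most likely one after adjustment, because both boosts favor it. A batched version would apply the same steps to every row of the score tensor.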

Parameters:

  • input_ids (Tensor) –

    A tensor of shape (batch_size, sequence_length) containing input token IDs.

  • scores (Tensor) –

    A tensor of shape (batch_size, vocab_size) containing the initial scores for each token.

Returns:

  • Tensor

    A tensor of shape (batch_size, vocab_size) containing the log probabilities after adjustment.