[BUG] Do not use generation of hf model in inferencers #920

@Kamichanw

Describe the bug
As I mentioned in this issue, the default values of `top_p` and `temperature` are not guaranteed to be 1. Therefore, the code below returns modified logits, i.e., a distribution that has already been processed according to the `generation_config` on the HF side.

```python
if self.use_accelerator:
    outputs = self.backend_model.generate(
        input_ids=inputs,
        pad_token_id=self.tokenizer.pad_token_id,
        *args,
        **kwargs
    )
else:
    if self.device == "gpu":
        outputs = self.ds_engine.module.generate(
            input_ids=inputs,
            synced_gpus=True,
            pad_token_id=self.tokenizer.pad_token_id,
            *args,
            **kwargs
        )
    elif self.device == "cpu":
        outputs = self.backend_model.generate(
            input_ids=inputs,
            synced_gpus=True,
            pad_token_id=self.tokenizer.pad_token_id,
            *args,
            **kwargs
        )
```
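
For reference, here is a minimal sketch (using `gpt2` purely as an illustrative checkpoint, not the model from this repo) of how the scores returned by `generate()` compare to raw logits from a plain forward pass. Whenever the checkpoint's `generation_config` carries `temperature`/`top_p` values other than 1, the two differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# These are the defaults generate() silently falls back to when temperature /
# top_p are not passed explicitly; some checkpoints ship values != 1 here.
print(model.generation_config.temperature, model.generation_config.top_p)

inputs = tokenizer("Hello", return_tensors="pt").input_ids

# Scores returned by generate() are post-processed by the logits warpers that
# generate() builds from generation_config (temperature, top_p, ...).
out = model.generate(
    inputs,
    max_new_tokens=1,
    do_sample=True,
    return_dict_in_generate=True,
    output_scores=True,
)
warped_scores = out.scores[0]

# A plain forward pass returns the raw, unmodified logits of the last position.
with torch.no_grad():
    raw_logits = model(inputs).logits[:, -1, :]

# For gpt2 the defaults happen to be 1.0, so these match; for a checkpoint
# whose generation_config sets temperature or top_p != 1, they do not.
print(torch.allclose(warped_scores, raw_logits))
```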

Even worse, `top_p` and `temperature` are applied again in `score_to_prob`, resulting in an unexpected distribution:

```python
for _ in range(num_new_tokens):
    pred = self.predict_next_token(model=model, input_ids=sequence, num_new_tokens=1)  # predict next one token
    prob = self.score_to_prob(pred.scores[0], temperature=temperature)
    sampled = self.sample(prob=prob, num_samples=1)
    new_tokens.append(sampled)
    sequence = torch.cat([sequence, sampled['sampled_token']], dim=1)
```
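
Assuming `score_to_prob` is essentially a softmax over temperature-scaled scores (a simplification for illustration, not the exact implementation), the net effect of scaling twice is a single pass with `temperature ** 2`:

```python
import torch
import torch.nn.functional as F

def score_to_prob_sketch(scores, temperature=0.7):
    # Simplified stand-in for score_to_prob: softmax over temperature-scaled scores.
    return F.softmax(scores / temperature, dim=-1)

logits = torch.randn(1, 32000)  # raw model logits
t = 0.7

# Intended distribution: temperature applied exactly once.
once = F.softmax(logits / t, dim=-1)

# If generate() already divided the scores by t (its temperature warper),
# score_to_prob scales by t a second time on top of that:
twice = score_to_prob_sketch(logits / t, temperature=t)

# The doubly-scaled result equals a single pass with temperature t**2.
print(torch.allclose(twice, F.softmax(logits / t**2, dim=-1), atol=1e-6))
```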
