[BUG] Do not use generation of hf model in inferencers #920

@Kamichanw

Describe the bug
As I mentioned in this issue, the default values of `top_p` and `temperature` are not guaranteed to be 1. Therefore, the code below returns modified logits, i.e., a distribution that has already been processed according to the `generation_config` on the HF side.

```python
if self.use_accelerator:
    outputs = self.backend_model.generate(
        input_ids=inputs,
        pad_token_id=self.tokenizer.pad_token_id,
        *args,
        **kwargs
    )
else:
    if self.device == "gpu":
        outputs = self.ds_engine.module.generate(
            input_ids=inputs,
            synced_gpus=True,
            pad_token_id=self.tokenizer.pad_token_id,
            *args,
            **kwargs
        )
    elif self.device == "cpu":
        outputs = self.backend_model.generate(
            input_ids=inputs,
            synced_gpus=True,
            pad_token_id=self.tokenizer.pad_token_id,
            *args,
            **kwargs
        )
```
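
For reference, here is a minimal sketch (using `gpt2` purely as an illustrative checkpoint, not the model from this repo) of how the scores returned by `generate()` compare to raw logits from a plain forward pass. Whenever the checkpoint's `generation_config` carries `temperature`/`top_p` values other than 1, the two differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# These are the defaults generate() silently falls back to when temperature /
# top_p are not passed explicitly; some checkpoints ship values != 1 here.
print(model.generation_config.temperature, model.generation_config.top_p)

inputs = tokenizer("Hello", return_tensors="pt").input_ids

# Scores returned by generate() are post-processed by the logits warpers that
# generate() builds from generation_config (temperature, top_p, ...).
out = model.generate(
    inputs,
    max_new_tokens=1,
    do_sample=True,
    return_dict_in_generate=True,
    output_scores=True,
)
warped_scores = out.scores[0]

# A plain forward pass returns the raw, unmodified logits of the last position.
with torch.no_grad():
    raw_logits = model(inputs).logits[:, -1, :]

# For gpt2 the defaults happen to be 1.0, so these match; for a checkpoint
# whose generation_config sets temperature or top_p != 1, they do not.
print(torch.allclose(warped_scores, raw_logits))
```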

Even worse, `top_p` and `temperature` are applied again in `score_to_prob`, resulting in an unexpected distribution:

```python
for _ in range(num_new_tokens):
    pred = self.predict_next_token(model=model, input_ids=sequence, num_new_tokens=1)  # predict next one token
    prob = self.score_to_prob(pred.scores[0], temperature=temperature)
    sampled = self.sample(prob=prob, num_samples=1)
    new_tokens.append(sampled)
    sequence = torch.cat([sequence, sampled['sampled_token']], dim=1)
```
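
Assuming `score_to_prob` is essentially a softmax over temperature-scaled scores (a simplification for illustration, not the exact implementation), the net effect of scaling twice is a single pass with `temperature ** 2`:

```python
import torch
import torch.nn.functional as F

def score_to_prob_sketch(scores, temperature=0.7):
    # Simplified stand-in for score_to_prob: softmax over temperature-scaled scores.
    return F.softmax(scores / temperature, dim=-1)

logits = torch.randn(1, 32000)  # raw model logits
t = 0.7

# Intended distribution: temperature applied exactly once.
once = F.softmax(logits / t, dim=-1)

# If generate() already divided the scores by t (its temperature warper),
# score_to_prob scales by t a second time on top of that:
twice = score_to_prob_sketch(logits / t, temperature=t)

# The doubly-scaled result equals a single pass with temperature t**2.
print(torch.allclose(twice, F.softmax(logits / t**2, dim=-1), atol=1e-6))
```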
