feat: making generate_from_raw public #219
Conversation
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
…erative-computing/mellea into feat-208-public-generate-raw
The vLLM backend was just merged. Could you please make sure that it conforms to the changes outlined here?
Also, tests are failing; it looks like we may be hitting some time limit.
def generate_from_raw(
    self,
    actions: list[Component | CBlock],
    ctx: Context,
I don't understand why we are passing in a context if we are not using it. I think we should just leave it out of the function definition if we aren't going to use it for generation or append to it after generation. I see that the docstring documents that it's not used, but I think that just adds extra confusion.
It was also my impression from the team meeting yesterday that we don't have a clear idea going forward of what we want context to eventually do here.
I think the one reason for keeping it would be to eventually utilize it. However, starting to use it would effectively be an API change, since we would be changing the behavior of context. As a result, I think it would make sense to just break the function definition at that point instead.
I agree we should remove both tools and ctx; I only kept them because, in the original issue, the motivation was to keep the signature consistent with generate_from_context.
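For reference, a minimal sketch of the two options under discussion; the parameter names and types come from the diff above, while the list[ModelOutputThunk] return annotation is an assumption added here for illustration.

```python
from __future__ import annotations  # so the type names below need not be imported in this sketch

# Option A (this PR): keep ctx so the signature mirrors generate_from_context,
# even though ctx is not used for generation and nothing is appended to it.
def generate_from_raw(
    self,
    actions: list[Component | CBlock],
    ctx: Context,
) -> list[ModelOutputThunk]:  # return annotation assumed for illustration
    ...

# Option B (suggested above): drop ctx (and tools) now, and accept a signature
# change later if context ever becomes meaningful for raw completions.
def generate_from_raw(
    self,
    actions: list[Component | CBlock],
) -> list[ModelOutputThunk]:
    ...
```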
    mot._generate_log = generate_log

-def _generate_from_raw(
+def generate_from_raw(
This implementation of generate_from_raw keeps the current decision to be synchronous. Are we okay with not allowing async batching operations (and it might be the case that these batching interfaces don't support async; I haven't looked)?
I think support for async is inconsistent here. Given that generate_from_raw is based on the /completions endpoint, which is deprecated, I think it's going to be hard to maintain that for all backends here.
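As a side note, if async callers ever need the synchronous generate_from_raw, one generic workaround (not something this PR adds) is to offload the call to a worker thread:

```python
import asyncio

# Generic pattern, not part of this PR: run the blocking generate_from_raw in a
# worker thread so an async caller isn't blocked. `backend`, `actions`, and `ctx`
# are placeholders for objects the caller already has.
async def generate_from_raw_async(backend, actions, ctx):
    return await asyncio.to_thread(backend.generate_from_raw, actions, ctx)
```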
| """Generate using the completions api. Gives the input provided to the model without templating.""" | ||
| raise NotImplementedError("This method is not implemented yet.") |
I think if we are promoting this to be a supported API, we should add support for LiteLLM.
Fair, but I might make a separate PR for this just to unblock this issue
output._meta = {
    "oai_completion_response": response.model_dump(),
    "usage": completion_response.usage.model_dump()
    if completion_response.usage
    else None,
}
It looks like LiteLLM has standardized on:
"usage": {
  "prompt_tokens": 18,
  "completion_tokens": 25,
  "total_tokens": 43
}
Even if OpenAI doesn't give us back correct values, it might still be worthwhile to standardize these fields in ModelOutputThunks and just not set them for OpenAI. Otherwise, anyone who has to use these values needs to be aware of every backend (instead of just checking whether the values are None).
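For illustration, a hedged sketch of what standardized, optional usage fields could look like; the class and field names below are hypothetical, not the repo's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical shape only: backend-agnostic usage fields that default to None,
# so callers can check for None instead of knowing each backend's quirks.
@dataclass
class StandardizedUsage:
    prompt_tokens: int | None = None
    completion_tokens: int | None = None
    total_tokens: int | None = None

@dataclass
class ModelOutputThunkSketch:  # illustrative stand-in, not the real ModelOutputThunk
    value: str | None = None
    usage: StandardizedUsage | None = None                # standardized across backends
    _meta: dict[str, Any] = field(default_factory=dict)   # raw backend-specific payloads
```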
This is similar to what openai returns. I think it should be part of the .usage object.
Approving; I think LiteLLM needs to eventually be supported, and I think the "usage" values that are being stored in meta ought to be moved to standardized fields in the ModelOutputThunk.
* feat: making generate_from_raw public in openai
* feat: making generate_from_raw public in ollama
* feat: making generate_from_raw public in ollama
* feat: making generate_from_raw public in hf, watsonx
* feat: making generate_from_raw public in vllm
* adding warning to hf backend
* attempt at standardizing usage metrics
Make generate_from_raw public in all backends. Fixes #208

TODOs:
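For context, a rough sketch of how a caller might use the now-public method. Here `backend`, `ctx`, and `CBlock` are assumed to come from an existing mellea setup (import paths and constructors are intentionally omitted), and the one-thunk-per-action return shape is inferred from the list-typed actions parameter rather than stated in this PR.

```python
# Illustrative only; not taken from the repo's tests or docs.
def demo(backend, ctx, CBlock):
    # Raw prompts go to the completions-style endpoint without templating.
    prompts = [CBlock("Complete this sentence: The quick brown fox")]
    results = backend.generate_from_raw(actions=prompts, ctx=ctx)  # ctx currently unused
    for thunk in results:  # assumed: one ModelOutputThunk per action
        print(thunk)
        print(thunk._meta.get("usage"))  # backend-specific usage metadata, if set
```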