
Ollama client (from adalflow.components.model_client.ollama_client) does not work with stream=True #299

@debasisdwivedy

Description

Bug description

The application does not work with stream set to True. The OllamaClient in adalflow.components.model_client.ollama_client handles streaming input with the methods below:

def parse_stream_response(completion: GeneratorType) -> Any:
    """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
    for chunk in completion:
        log.debug(f"Raw chunk: {chunk}")
        raw_response = chunk["response"] if "response" in chunk else None
        yield GeneratorOutput(data=None, raw_response=raw_response)


def parse_chat_completion(
        self, completion: Union[GenerateResponse, GeneratorType]
    ) -> GeneratorOutput:
        """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
        log.debug(f"completion: {completion}, {isinstance(completion, GeneratorType)}")
        if isinstance(completion, GeneratorType):  # streaming
            return parse_stream_response(completion)
        else:
            return parse_generate_response(completion)

Because parse_stream_response uses yield, the caller receives a generator and would need a loop to collect all the tokens. Is there a reason to use yield instead of return?
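
A minimal stand-alone sketch (not AdalFlow code) of why yield breaks the caller: any function body containing yield makes the call return a generator object immediately, and downstream code that expects a GeneratorOutput with a raw_response attribute then fails, which matches the error below.

def parse_stream_response_like(chunks):
    # The presence of `yield` means calling this returns a generator object.
    for chunk in chunks:
        yield chunk.get("response")

result = parse_stream_response_like([{"response": "Paris"}])
print(type(result))  # <class 'generator'>
# result.raw_response  -> AttributeError: 'generator' object has no attribute 'raw_response'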

There are two ways to go about solving this:

SOLUTION 1

Change yield to return and accumulate the tokens inside parse_stream_response.

def parse_stream_response(completion: GeneratorType) -> Any:
    """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
    gen_output = GeneratorOutput(data=None, raw_response='')
    for chunk in completion:
        log.debug(f"Raw chunk: {chunk}")
        raw_response = chunk["response"] if "response" in chunk else None
        if raw_response:
            gen_output.raw_response += raw_response
    return gen_output

def parse_chat_completion(
        self, completion: Union[GenerateResponse, GeneratorType]
    ) -> GeneratorOutput:
        """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
        log.debug(f"completion: {completion}, {isinstance(completion, GeneratorType)}")
        if isinstance(completion, GeneratorType):  # streaming
            return parse_stream_response(completion)
        else:
            return parse_generate_response(completion)
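
With Solution 1 the caller gets a single GeneratorOutput back, so .raw_response is a plain string. A hedged usage sketch with a fake chunk stream, assuming GeneratorOutput is importable from adalflow.core.types; fold_stream_response is just a trimmed stand-in (logging omitted) for the function above:

from adalflow.core.types import GeneratorOutput

def fold_stream_response(completion):
    # Fold the streamed chunks into one GeneratorOutput.
    gen_output = GeneratorOutput(data=None, raw_response="")
    for chunk in completion:
        raw_response = chunk["response"] if "response" in chunk else None
        if raw_response:
            gen_output.raw_response += raw_response
    return gen_output

fake_stream = iter([{"response": "The capital of France "}, {"response": "is Paris."}])
print(fold_stream_response(fake_stream).raw_response)
# The capital of France is Paris.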

SOLUTION 2

Change parse_chat_completion to collect all the tokens and then return the GeneratorOutput.

def parse_stream_response(completion: GeneratorType) -> Any:
    """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
    for chunk in completion:
        log.debug(f"Raw chunk: {chunk}")
        raw_response = chunk["response"] if "response" in chunk else None
        yield raw_response

def parse_chat_completion(
        self, completion: Union[GenerateResponse, GeneratorType]
    ) -> GeneratorOutput:
        """Parse the completion to a str. We use the generate with prompt instead of chat with messages."""
        log.debug(f"completion: {completion}, {isinstance(completion, GeneratorType)}")
        if isinstance(completion, GeneratorType):  # streaming
            gen_output = GeneratorOutput(data=None, raw_response='')
            tokens = parse_stream_response(completion)

            for token in tokens:
                if token:
                    gen_output.raw_response += token
            return gen_output
        else:
            return parse_generate_response(completion)

One thing to remember: for the async implementation we would have to create an async_parse_chat_completion, since parse_chat_completion would not work for asynchronous calls.
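
A hedged sketch of what the async counterpart could look like, assuming the async Ollama client yields the same chunk dicts; async_parse_stream_response is a proposed helper name, not an existing AdalFlow method:

import asyncio
from typing import Any, AsyncGenerator

async def async_parse_stream_response(completion: AsyncGenerator[dict, Any]) -> str:
    # Accumulate streamed tokens from an async generator into one string.
    raw = ""
    async for chunk in completion:
        token = chunk.get("response")
        if token:
            raw += token
    return raw

async def _fake_stream():
    for part in ("Hello ", "world"):
        yield {"response": part}

print(asyncio.run(async_parse_stream_response(_fake_stream())))  # Hello world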

@liyin2015 Once this is reviewed and verified as an issue, I will go ahead and raise a PR for the implementation.

Regards,

What version are you seeing the problem on?

pip installed.

To get the version:

pip show adalflow

Name: adalflow
Version: 0.2.6
Summary: The Library to Build and Auto-optimize LLM Applications
Home-page: https://github.com/SylphAI-Inc/AdalFlow
Author: Li Yin
Author-email: li@sylphai.com
License: MIT
Location: <>
Requires: backoff, boto3, botocore, colorama, diskcache, jinja2, jsonlines, nest-asyncio, numpy, python-dotenv, pyyaml, tiktoken, tqdm
Required-by:

How to reproduce the bug

from adalflow.components.model_client.ollama_client import OllamaClient
from adalflow.core.generator import Generator

host = "127.0.0.1:11434"


ollama_ai = {
    "model_client": OllamaClient(host=host),
    "model_kwargs": {
        "model": "phi3:latest",
        "stream": True,
    },
}

generator = Generator(**ollama_ai)
output = generator({"input_str": "What is the capital of France?"})
print(output)
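
For reference, the shape of the streamed chunks can be inspected by calling the Ollama Python client directly; a hedged sketch, assuming the ollama package is installed and phi3:latest has been pulled locally:

import ollama

client = ollama.Client(host="127.0.0.1:11434")
# With stream=True, generate() returns an iterator of chunks; each chunk carries
# a partial "response" token, which is what parse_stream_response consumes.
for chunk in client.generate(model="phi3:latest", prompt="What is the capital of France?", stream=True):
    print(chunk["response"], end="", flush=True)
print()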


Error messages and logs

Error processing the output: 'generator' object has no attribute 'raw_response'
GeneratorOutput(id=None, data=None, error="'generator' object has no attribute 'raw_response'", usage=None, raw_response='<generator object Client._request.<locals>.inner at 0x12aa131b0>', metadata=None)



Environment

- OS: macOS 15.1 (Apple M1 Pro)

More info

_No response_
