qwen3_workarena_changes #312
Conversation
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
| Category | Issue | Status |
|---|---|---|
| | Missing AGENT_QWEN3 configuration | |
| | Configuration Duplication Instead of Inheritance | |
| | Missing response parameter structure documentation | |
| | Inefficient nested iteration in content processing | |
| | Complex Function with Multiple Responsibilities | |
| | Missing Purpose in UI Pattern Removal Docstring | |
| | In-place mutation of observation dictionary | |
| | Hardcoded UI Pattern List | |
Files scanned
| File Path | Reviewed | 
|---|---|
| src/agentlab/agents/generic_agent/agent_configs.py | ✅ | 
| src/agentlab/llm/chat_api.py | ✅ | 
| src/agentlab/agents/dynamic_prompting.py | ✅ | 
```python
FLAGS_QWEN3 = GenericPromptFlags(
    obs=dp.ObsFlags(
        use_html=False,
        use_ax_tree=True,
        use_focused_element=True,
        use_error_logs=False,
        use_history=True,
        use_past_error_logs=False,
        use_action_history=True,
        use_think_history=False,  # Qwen3 doesn't include thinking history
        use_diff=False,
        html_type="pruned_html",
        use_screenshot=False,
        use_som=False,
        extract_visible_tag=True,
        extract_clickable_tag=False,
        extract_coords="False",
        filter_visible_elements_only=False,
    ),
    action=dp.ActionFlags(
        action_set=HighLevelActionSetArgs(
            subsets=["bid"],
            multiaction=False,
        ),
        long_description=False,
        individual_examples=True,
    ),
    use_plan=False,
    use_criticise=False,
    use_thinking=True,
    use_memory=False,
    use_concrete_example=True,
    use_abstract_example=True,
    use_hints=True,
    enable_chat=False,
    max_prompt_tokens=40_000,
    be_cautious=True,
    extra_instructions=None,
    add_missparsed_messages=True,
)
```
Missing AGENT_QWEN3 configuration 
What is the issue?
The FLAGS_QWEN3 configuration is defined but no corresponding AGENT_QWEN3 is created to use it.
Why this matters
Without a corresponding agent configuration, the FLAGS_QWEN3 cannot be used by the system, making the configuration effectively dead code that serves no functional purpose.
Suggested change
Add a corresponding agent configuration after the FLAGS_QWEN3 definition:
```python
AGENT_QWEN3 = GenericAgentArgs(
    chat_model_args=CHAT_MODEL_ARGS_DICT["qwen3_model_key"],  # Replace with actual Qwen3 model key
    flags=FLAGS_QWEN3,
)
```
```python
# qwen3 default config - same as llama3-70b but with use_think_history=False
FLAGS_QWEN3 = GenericPromptFlags(
    obs=dp.ObsFlags(
        use_html=False,
        use_ax_tree=True,
        use_focused_element=True,
        use_error_logs=False,
        use_history=True,
        use_past_error_logs=False,
        use_action_history=True,
        use_think_history=False,  # Qwen3 doesn't include thinking history
        use_diff=False,
        html_type="pruned_html",
        use_screenshot=False,
        use_som=False,
        extract_visible_tag=True,
        extract_clickable_tag=False,
        extract_coords="False",
        filter_visible_elements_only=False,
    ),
    action=dp.ActionFlags(
        action_set=HighLevelActionSetArgs(
            subsets=["bid"],
            multiaction=False,
        ),
        long_description=False,
        individual_examples=True,
    ),
    use_plan=False,
    use_criticise=False,
    use_thinking=True,
    use_memory=False,
    use_concrete_example=True,
    use_abstract_example=True,
    use_hints=True,
    enable_chat=False,
    max_prompt_tokens=40_000,
    be_cautious=True,
    extra_instructions=None,
    add_missparsed_messages=True,
)
```
Configuration Duplication Instead of Inheritance 
What is the issue?
The code duplicates the entire configuration structure from FLAGS_LLAMA3_70B with only one parameter difference (use_think_history=False).
Why this matters
Code duplication increases maintenance burden and risk of inconsistencies when updates are needed. If the base configuration changes, both copies need to be updated.
Suggested change
```python
# Create Qwen3 flags by inheriting from LLAMA3_70B flags
FLAGS_QWEN3 = FLAGS_LLAMA3_70B.copy()
FLAGS_QWEN3.obs.use_think_history = False  # Override only the different parameter
```
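If the flags object does not expose a `copy()` helper, a minimal sketch using `copy.deepcopy` gives the same result while avoiding a shared nested `ObsFlags` instance (a shallow copy would alias it, so the override would also flip the Llama 3 config):
```python
import copy

# Deep-copy the base flags so the nested ObsFlags object is not shared
# with FLAGS_LLAMA3_70B; a shallow copy would alias it and the override
# below would silently change the Llama 3 configuration too.
FLAGS_QWEN3 = copy.deepcopy(FLAGS_LLAMA3_70B)
FLAGS_QWEN3.obs.use_think_history = False  # Qwen3 doesn't include thinking history
```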
| def _extract_thinking_content_from_response(response, wrap_tag: str = "think"): | ||
| """ | ||
| Extracts the content from the message, including reasoning if available. | ||
| It wraps the reasoning around <think>...</think> for easy identification of reasoning content, | ||
| when LLM produces 'text' and 'reasoning' in the same message. | 
Missing response parameter structure documentation 
What is the issue?
The docstring lacks information about the expected format/structure of the 'response' parameter.
Why this matters
Without clear documentation of the expected response structure, developers may pass incorrectly formatted responses leading to runtime errors.
Suggested change
```python
def _extract_thinking_content_from_response(response, wrap_tag: str = "think"):
    """
    Extracts the content from the message, including reasoning if available.
    It wraps the reasoning around <think>...</think> for easy identification of reasoning content,
    when the LLM produces 'text' and 'reasoning' in the same message.

    Args:
        response: OpenAI-style completion response object with choices[0].message containing
            either a dict or object with 'content', 'text', 'reasoning', or 'reasoning_content' fields.
        wrap_tag: The tag name to wrap reasoning content (default: "think").

    Returns:
        str: The extracted content with reasoning wrapped in the specified tags.
    """
```
```python
if isinstance(raw_content, list):
    # Concatenate text-like parts if provider returns content blocks
    parts = []
    for part in raw_content:
        if isinstance(part, str):
            parts.append(part)
        elif isinstance(part, dict):
            # Common shapes: {"type":"text","text":"..."} or {"type":"text","text":{"value":"..."}}
            if part.get("type") == "text":
                txt = part.get("text")
                if isinstance(txt, dict) and isinstance(txt.get("value"), str):
                    parts.append(txt["value"])
                elif isinstance(txt, str):
                    parts.append(txt)
            else:
                # Fallback: try a few likely keys
                for k in ("content", "text", "value"):
                    v = part.get(k)
                    if isinstance(v, str):
                        parts.append(v)
                        break
        else:
            parts.append(str(part))
    msg_content = "\n".join(p for p in parts if p)
```
Inefficient nested iteration in content processing 
What is the issue?
Inefficient nested loops and repeated dictionary lookups when processing list-style content.
Why this matters
The nested loops with multiple dictionary get() calls and string type checks create O(n*m) complexity where n is the number of parts and m is the number of fallback keys. This will cause performance degradation when processing large content lists.
Suggested change
Extract the text extraction logic into a separate function and use early returns to avoid nested loops:
```python
def _extract_text_from_part(part):
    if isinstance(part, str):
        return part
    if isinstance(part, dict):
        if part.get("type") == "text":
            txt = part.get("text")
            if isinstance(txt, dict):
                return txt.get("value", "")
            return txt or ""
        # Single loop for fallback keys
        for key in ("content", "text", "value"):
            value = part.get(key)
            if isinstance(value, str):
                return value
    return str(part)

# Then use: parts = [_extract_text_from_part(part) for part in raw_content]
```
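Wired back into the branch shown in the hunk above, the list case then collapses to a comprehension (a sketch; `raw_content` and `msg_content` are the names used in that hunk):
```python
if isinstance(raw_content, list):
    # Concatenate text-like parts if the provider returns content blocks
    parts = [_extract_text_from_part(part) for part in raw_content]
    msg_content = "\n".join(p for p in parts if p)
```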
```python
import logging


def _extract_thinking_content_from_response(response, wrap_tag: str = "think"):
```
Complex Function with Multiple Responsibilities 
What is the issue?
The function is too complex and handles multiple responsibilities: message normalization, reasoning extraction, content parsing with multiple formats, and content wrapping.
Why this matters
The high complexity makes the code harder to maintain, test, and modify. Each responsibility could change for different reasons, violating the Single Responsibility Principle.
Suggested change
Split the function into smaller, focused functions:
```python
def _normalize_message(response):
    message = response.choices[0].message
    return message.to_dict() if not isinstance(message, dict) else message

def _extract_reasoning(message):
    return message.get('reasoning') or message.get('reasoning_content')

def _parse_content(raw_content):
    # Handle string vs list content parsing logic
    ...

def _format_response(reasoning_text, msg_text, msg_content, wrap_tag):
    # Handle wrapping and combining the parts
    ...
```
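One way these helpers could compose back into the existing entry point, kept as a sketch (the three-argument `_format_response` here simplifies the stub above, and the `<wrap_tag>` prefix format is an assumption rather than the PR's actual output format):
```python
def _format_response(reasoning_text, msg_content, wrap_tag):
    """Prefix the content with reasoning wrapped in <wrap_tag>...</wrap_tag>, if any."""
    if reasoning_text:
        return f"<{wrap_tag}>\n{reasoning_text}\n</{wrap_tag}>\n{msg_content}"
    return msg_content

def _extract_thinking_content_from_response(response, wrap_tag: str = "think"):
    message = _normalize_message(response)             # dict view of choices[0].message
    reasoning = _extract_reasoning(message)            # 'reasoning' or 'reasoning_content', if present
    content = _parse_content(message.get("content"))   # string vs. content-block list handling
    return _format_response(reasoning, content, wrap_tag)
```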
```python
def remove_ui_patterns(text):
    """
    Remove lines containing specific UI patterns for ServiceNow accessibility tree text.

    Args:
        text (str): The input string containing the accessibility tree

    Returns:
        str: The cleaned string with lines containing UI patterns removed
    """
```
Missing Purpose in UI Pattern Removal Docstring 
What is the issue?
The docstring lacks explanation of why these UI patterns need to be removed and what impact it has.
Why this matters
Without understanding the purpose, maintainers may inadvertently modify or remove this filtering which could impact the accessibility tree's usability.
Suggested change
```python
def remove_ui_patterns(text):
    """
    Remove ServiceNow UI management patterns from accessibility tree text to focus on core page content.

    Filters out widget management controls that are not relevant for task completion
    and would add noise to the accessibility tree representation.

    Args:
        text (str): The input string containing the accessibility tree

    Returns:
        str: The cleaned string with UI management patterns removed
    """
```
```python
            visible=lambda: flags.use_html,
            prefix="## ",
        )
        obs["axtree_txt"] = remove_ui_patterns(obs["axtree_txt"])
```
In-place mutation of observation dictionary 
What is the issue?
The code modifies the observation dictionary in-place during object initialization, which could cause side effects if the same observation dictionary is used elsewhere or reused.
Why this matters
This mutation could lead to unexpected behavior in other parts of the system that expect the original unmodified observation data, potentially causing debugging difficulties and inconsistent state.
Suggested change
Create a copy of the observation or modify the axtree_txt after extraction:
```python
obs = copy(obs)  # Add this line before modification
obs["axtree_txt"] = remove_ui_patterns(obs["axtree_txt"])
```
Or modify it in the AXTree constructor:
```python
self.ax_tree = AXTree(
    remove_ui_patterns(obs["axtree_txt"]),
    visible_elements_only=flags.filter_visible_elements_only,
    visible=lambda: flags.use_ax_tree,
    coord_type=flags.extract_coords,
    visible_tag=flags.extract_visible_tag,
    prefix="## ",
)
```
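If the copy route is taken, a shallow copy of the observation dict is enough here because only the top-level `axtree_txt` key is replaced; a minimal sketch, assuming `obs` is a plain dict:
```python
obs = dict(obs)  # shallow copy; the caller's observation dict keeps its original axtree_txt
obs["axtree_txt"] = remove_ui_patterns(obs["axtree_txt"])
```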
| """ | ||
|  | ||
| # Words to look for | ||
| words_to_remove = ["Edit Widget", "Edit Widget Preferences", "Close Widget", "Add content"] | 
Hardcoded UI Pattern List 
What is the issue?
UI patterns to remove are hardcoded as a list within the function, making it difficult to modify or reuse these patterns elsewhere.
Why this matters
If UI patterns need to be updated or reused in other parts of the code, developers would need to modify them in multiple places, increasing maintenance burden and risk of inconsistency.
Suggested change
Move the UI patterns to a module-level constant or configuration:
```python
UI_PATTERNS_TO_REMOVE = [
    "Edit Widget",
    "Edit Widget Preferences",
    "Close Widget",
    "Add content",
]

def remove_ui_patterns(text, patterns=UI_PATTERNS_TO_REMOVE):
    ...
```
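Filled out with a line-filtering body inferred from the docstring in the hunk above ("lines containing UI patterns removed"); this body is an assumption, not the PR's actual implementation:
```python
UI_PATTERNS_TO_REMOVE = [
    "Edit Widget",
    "Edit Widget Preferences",
    "Close Widget",
    "Add content",
]

def remove_ui_patterns(text, patterns=UI_PATTERNS_TO_REMOVE):
    """Drop accessibility-tree lines that contain any of the given UI patterns."""
    return "\n".join(
        line for line in text.splitlines()
        if not any(pattern in line for pattern in patterns)
    )
```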
Description by Korbit AI
What change is being made?
Refactor to clean up UI text in dynamic prompts, add a default Qwen3 config, and improve LLM response handling to extract and wrap reasoning content from replies.
Why are these changes being made?
To remove noisy UI patterns from the AXTree representation, provide a Qwen3-specific default configuration, and ensure reasoning content from model responses is preserved and clearly identified for downstream processing. This improves prompt clarity, config parity, and traceability of model thinking.