qwen3_workarena_changes #312
Conversation
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
| Category | Issue | Status |
|---|---|---|
| | Missing AGENT_QWEN3 configuration | |
| | Configuration Duplication Instead of Inheritance | |
| | Missing response parameter structure documentation | |
| | Inefficient nested iteration in content processing | |
| | Complex Function with Multiple Responsibilities | |
| | Missing Purpose in UI Pattern Removal Docstring | |
| | In-place mutation of observation dictionary | |
| | Hardcoded UI Pattern List | |
Files scanned
| File Path | Reviewed | 
|---|---|
| src/agentlab/agents/generic_agent/agent_configs.py | ✅ | 
| src/agentlab/llm/chat_api.py | ✅ | 
| src/agentlab/agents/dynamic_prompting.py | ✅ | 
```python
FLAGS_QWEN3 = GenericPromptFlags(
    obs=dp.ObsFlags(
        use_html=False,
        use_ax_tree=True,
        use_focused_element=True,
        use_error_logs=False,
        use_history=True,
        use_past_error_logs=False,
        use_action_history=True,
        use_think_history=False,  # Qwen3 doesn't include thinking history
        use_diff=False,
        html_type="pruned_html",
        use_screenshot=False,
        use_som=False,
        extract_visible_tag=True,
        extract_clickable_tag=False,
        extract_coords="False",
        filter_visible_elements_only=False,
    ),
    action=dp.ActionFlags(
        action_set=HighLevelActionSetArgs(
            subsets=["bid"],
            multiaction=False,
        ),
        long_description=False,
        individual_examples=True,
    ),
    use_plan=False,
    use_criticise=False,
    use_thinking=True,
    use_memory=False,
    use_concrete_example=True,
    use_abstract_example=True,
    use_hints=True,
    enable_chat=False,
    max_prompt_tokens=40_000,
    be_cautious=True,
    extra_instructions=None,
    add_missparsed_messages=True,
)
```
Missing AGENT_QWEN3 configuration 
What is the issue?
The FLAGS_QWEN3 configuration is defined but no corresponding AGENT_QWEN3 is created to use it.
Why this matters
Without a corresponding agent configuration, the FLAGS_QWEN3 cannot be used by the system, making the configuration effectively dead code that serves no functional purpose.
Suggested change
Add a corresponding agent configuration after the FLAGS_QWEN3 definition:
```python
AGENT_QWEN3 = GenericAgentArgs(
    chat_model_args=CHAT_MODEL_ARGS_DICT["qwen3_model_key"],  # Replace with actual Qwen3 model key
    flags=FLAGS_QWEN3,
)
```
```python
# qwen3 default config - same as llama3-70b but with use_think_history=False
FLAGS_QWEN3 = GenericPromptFlags(
    obs=dp.ObsFlags(
        use_html=False,
        use_ax_tree=True,
        use_focused_element=True,
        use_error_logs=False,
        use_history=True,
        use_past_error_logs=False,
        use_action_history=True,
        use_think_history=False,  # Qwen3 doesn't include thinking history
        use_diff=False,
        html_type="pruned_html",
        use_screenshot=False,
        use_som=False,
        extract_visible_tag=True,
        extract_clickable_tag=False,
        extract_coords="False",
        filter_visible_elements_only=False,
    ),
    action=dp.ActionFlags(
        action_set=HighLevelActionSetArgs(
            subsets=["bid"],
            multiaction=False,
        ),
        long_description=False,
        individual_examples=True,
    ),
    use_plan=False,
    use_criticise=False,
    use_thinking=True,
    use_memory=False,
    use_concrete_example=True,
    use_abstract_example=True,
    use_hints=True,
    enable_chat=False,
    max_prompt_tokens=40_000,
    be_cautious=True,
    extra_instructions=None,
    add_missparsed_messages=True,
)
```
Configuration Duplication Instead of Inheritance 
What is the issue?
The code duplicates the entire configuration structure from FLAGS_LLAMA3_70B with only one parameter difference (use_think_history=False).
Why this matters
Code duplication increases maintenance burden and risk of inconsistencies when updates are needed. If the base configuration changes, both copies need to be updated.
Suggested change
```python
# Create Qwen3 flags by inheriting from LLAMA3_70B flags
FLAGS_QWEN3 = FLAGS_LLAMA3_70B.copy()
FLAGS_QWEN3.obs.use_think_history = False  # Override only the different parameter
```
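If the flags object does not expose a `copy()` helper, a minimal sketch using `copy.deepcopy` gives the same result while avoiding a shared nested `ObsFlags` instance (a shallow copy would alias it, so the override would also flip the Llama 3 config):
```python
import copy

# Deep-copy the base flags so the nested ObsFlags object is not shared
# with FLAGS_LLAMA3_70B; a shallow copy would alias it and the override
# below would silently change the Llama 3 configuration too.
FLAGS_QWEN3 = copy.deepcopy(FLAGS_LLAMA3_70B)
FLAGS_QWEN3.obs.use_think_history = False  # Qwen3 doesn't include thinking history
```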
| def _extract_thinking_content_from_response(response, wrap_tag: str = "think"): | ||
| """ | ||
| Extracts the content from the message, including reasoning if available. | ||
| It wraps the reasoning around <think>...</think> for easy identification of reasoning content, | ||
| when LLM produces 'text' and 'reasoning' in the same message. | 
Missing response parameter structure documentation 
What is the issue?
The docstring lacks information about the expected format/structure of the 'response' parameter.
Why this matters
Without clear documentation of the expected response structure, developers may pass incorrectly formatted responses leading to runtime errors.
Suggested change
```python
def _extract_thinking_content_from_response(response, wrap_tag: str = "think"):
    """
    Extracts the content from the message, including reasoning if available.
    It wraps the reasoning around <think>...</think> for easy identification of reasoning content,
    when the LLM produces 'text' and 'reasoning' in the same message.

    Args:
        response: OpenAI-style completion response object with choices[0].message containing
            either a dict or object with 'content', 'text', 'reasoning', or 'reasoning_content' fields.
        wrap_tag: The tag name to wrap reasoning content (default: "think").

    Returns:
        str: The extracted content with reasoning wrapped in the specified tags.
    """
```
```python
if isinstance(raw_content, list):
    # Concatenate text-like parts if provider returns content blocks
    parts = []
    for part in raw_content:
        if isinstance(part, str):
            parts.append(part)
        elif isinstance(part, dict):
            # Common shapes: {"type":"text","text":"..."} or {"type":"text","text":{"value":"..."}}
            if part.get("type") == "text":
                txt = part.get("text")
                if isinstance(txt, dict) and isinstance(txt.get("value"), str):
                    parts.append(txt["value"])
                elif isinstance(txt, str):
                    parts.append(txt)
            else:
                # Fallback: try a few likely keys
                for k in ("content", "text", "value"):
                    v = part.get(k)
                    if isinstance(v, str):
                        parts.append(v)
                        break
        else:
            parts.append(str(part))
    msg_content = "\n".join(p for p in parts if p)
```
Inefficient nested iteration in content processing 
What is the issue?
Inefficient nested loops and repeated dictionary lookups when processing list-style content.
Why this matters
The nested loops with multiple dictionary get() calls and string type checks create O(n*m) complexity where n is the number of parts and m is the number of fallback keys. This will cause performance degradation when processing large content lists.
Suggested change
Extract the text extraction logic into a separate function and use early returns to avoid nested loops:
```python
def _extract_text_from_part(part):
    if isinstance(part, str):
        return part
    if isinstance(part, dict):
        if part.get("type") == "text":
            txt = part.get("text")
            if isinstance(txt, dict):
                return txt.get("value", "")
            return txt or ""
        # Single loop for fallback keys
        for key in ("content", "text", "value"):
            value = part.get(key)
            if isinstance(value, str):
                return value
    return str(part)

# Then use: parts = [_extract_text_from_part(part) for part in raw_content]
```
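Wired back into the branch shown in the hunk above, the list case then collapses to a comprehension (a sketch; `raw_content` and `msg_content` are the names used in that hunk):
```python
if isinstance(raw_content, list):
    # Concatenate text-like parts if the provider returns content blocks
    parts = [_extract_text_from_part(part) for part in raw_content]
    msg_content = "\n".join(p for p in parts if p)
```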
```python
import logging


def _extract_thinking_content_from_response(response, wrap_tag: str = "think"):
```
Complex Function with Multiple Responsibilities 
What is the issue?
The function is too complex and handles multiple responsibilities: message normalization, reasoning extraction, content parsing with multiple formats, and content wrapping.
Why this matters
The high complexity makes the code harder to maintain, test, and modify. Each responsibility could change for different reasons, violating the Single Responsibility Principle.
Suggested change
Split the function into smaller, focused functions:
```python
def _normalize_message(response):
    message = response.choices[0].message
    return message.to_dict() if not isinstance(message, dict) else message

def _extract_reasoning(message):
    return message.get('reasoning') or message.get('reasoning_content')

def _parse_content(raw_content):
    # Handle string vs list content parsing logic
    ...

def _format_response(reasoning_text, msg_text, msg_content, wrap_tag):
    # Handle wrapping and combining the parts
    ...
```
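One way these helpers could compose back into the existing entry point, kept as a sketch (the three-argument `_format_response` here simplifies the stub above, and the `<wrap_tag>` prefix format is an assumption rather than the PR's actual output format):
```python
def _format_response(reasoning_text, msg_content, wrap_tag):
    """Prefix the content with reasoning wrapped in <wrap_tag>...</wrap_tag>, if any."""
    if reasoning_text:
        return f"<{wrap_tag}>\n{reasoning_text}\n</{wrap_tag}>\n{msg_content}"
    return msg_content

def _extract_thinking_content_from_response(response, wrap_tag: str = "think"):
    message = _normalize_message(response)             # dict view of choices[0].message
    reasoning = _extract_reasoning(message)            # 'reasoning' or 'reasoning_content', if present
    content = _parse_content(message.get("content"))   # string vs. content-block list handling
    return _format_response(reasoning, content, wrap_tag)
```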
```python
def remove_ui_patterns(text):
    """
    Remove lines containing specific UI patterns for ServiceNow accessibility tree text.

    Args:
        text (str): The input string containing the accessibility tree

    Returns:
        str: The cleaned string with lines containing UI patterns removed
    """
```
Missing Purpose in UI Pattern Removal Docstring 
What is the issue?
The docstring lacks explanation of why these UI patterns need to be removed and what impact it has.
Why this matters
Without understanding the purpose, maintainers may inadvertently modify or remove this filtering which could impact the accessibility tree's usability.
Suggested change
```python
def remove_ui_patterns(text):
    """
    Remove ServiceNow UI management patterns from accessibility tree text to focus on core page content.

    Filters out widget management controls that are not relevant for task completion
    and would add noise to the accessibility tree representation.

    Args:
        text (str): The input string containing the accessibility tree

    Returns:
        str: The cleaned string with UI management patterns removed
    """
```
```python
            visible=lambda: flags.use_html,
            prefix="## ",
        )
        obs["axtree_txt"] = remove_ui_patterns(obs["axtree_txt"])
```
In-place mutation of observation dictionary 
What is the issue?
The code modifies the observation dictionary in-place during object initialization, which could cause side effects if the same observation dictionary is used elsewhere or reused.
Why this matters
This mutation could lead to unexpected behavior in other parts of the system that expect the original unmodified observation data, potentially causing debugging difficulties and inconsistent state.
Suggested change
Create a copy of the observation or modify the axtree_txt after extraction:
```python
obs = copy(obs)  # Add this line before modification
obs["axtree_txt"] = remove_ui_patterns(obs["axtree_txt"])
```
Or modify it in the AXTree constructor:
```python
self.ax_tree = AXTree(
    remove_ui_patterns(obs["axtree_txt"]),
    visible_elements_only=flags.filter_visible_elements_only,
    visible=lambda: flags.use_ax_tree,
    coord_type=flags.extract_coords,
    visible_tag=flags.extract_visible_tag,
    prefix="## ",
)
```
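If the copy route is taken, a shallow copy of the observation dict is enough here because only the top-level `axtree_txt` key is replaced; a minimal sketch, assuming `obs` is a plain dict:
```python
obs = dict(obs)  # shallow copy; the caller's observation dict keeps its original axtree_txt
obs["axtree_txt"] = remove_ui_patterns(obs["axtree_txt"])
```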
| """ | ||
|  | ||
| # Words to look for | ||
| words_to_remove = ["Edit Widget", "Edit Widget Preferences", "Close Widget", "Add content"] | 
Hardcoded UI Pattern List 
What is the issue?
UI patterns to remove are hardcoded as a list within the function, making it difficult to modify or reuse these patterns elsewhere.
Why this matters
If UI patterns need to be updated or reused in other parts of the code, developers would need to modify them in multiple places, increasing maintenance burden and risk of inconsistency.
Suggested change
Move the UI patterns to a module-level constant or configuration:
```python
UI_PATTERNS_TO_REMOVE = [
    "Edit Widget",
    "Edit Widget Preferences",
    "Close Widget",
    "Add content",
]

def remove_ui_patterns(text, patterns=UI_PATTERNS_TO_REMOVE):
    ...
```
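Filled out with a line-filtering body inferred from the docstring in the hunk above ("lines containing UI patterns removed"); this body is an assumption, not the PR's actual implementation:
```python
UI_PATTERNS_TO_REMOVE = [
    "Edit Widget",
    "Edit Widget Preferences",
    "Close Widget",
    "Add content",
]

def remove_ui_patterns(text, patterns=UI_PATTERNS_TO_REMOVE):
    """Drop accessibility-tree lines that contain any of the given UI patterns."""
    return "\n".join(
        line for line in text.splitlines()
        if not any(pattern in line for pattern in patterns)
    )
```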
Description by Korbit AI
What change is being made?
Refactor to clean up UI text in dynamic prompts, add a default Qwen3 config, and improve LLM response handling to extract and wrap reasoning content from replies.
Why are these changes being made?
To remove noisy UI patterns from the AXTree representation, provide a Qwen3-specific default configuration, and ensure reasoning content from model responses is preserved and clearly identified for downstream processing. This improves prompt clarity, config parity, and traceability of model thinking.