Conversation

yuyutaotao (Collaborator)

No description provided.

@chatgpt-codex-connector bot left a comment


💡 Codex Review

let imagePayload = screenshotBase64;
let imageWidth = size.width;
let imageHeight = size.height;
const rightLimit = imageWidth;
const bottomLimit = imageHeight;
if (vlMode === 'qwen-vl') {
  const paddedResult = await paddingToMatchBlockByBase64(imagePayload);

P1: Planning requests omit element overlays for non-VL models

The planning pipeline now forwards the original screenshot for non-VL models, handling only the qwen-vl padding case. The prompts used for non-VL planning still tell the model that element IDs are “marked with a rectangle in the screenshot” (prompt/llm-planning.ts, lines ~272–274). With the overlay removed, GPT-4o receives an unannotated screenshot and cannot reliably return element IDs for actions, leading to incorrect or null locate parameters. Reintroduce the overlay, or revise the prompts and their consumers so the model’s assumptions match the data it sees.
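
One way to close the gap would be an explicit non-VL branch next to the qwen-vl padding case, roughly like the sketch below. This is a minimal sketch only: markupImageForLLM’s exact signature and the elementTree argument are assumptions based on this review’s text, not the repo’s verified API.

// Sketch only: restore the rectangle overlay for non-VL planning models.
// Assumes markupImageForLLM(base64, tree, size) returns an annotated base64
// image; the real signature in the repo may differ.
let imagePayload = screenshotBase64;
let imageWidth = size.width;
let imageHeight = size.height;

if (vlMode === 'qwen-vl') {
  // qwen-vl needs only its block-size padding
  const paddedResult = await paddingToMatchBlockByBase64(imagePayload);
  imageWidth = paddedResult.width;
  imageHeight = paddedResult.height;
  imagePayload = paddedResult.imageBase64;
} else if (!vlMode) {
  // non-VL models (e.g. GPT-4o) return element IDs that the prompt says are
  // drawn on the screenshot, so the overlay must be re-applied to match
  imagePayload = await markupImageForLLM(screenshotBase64, elementTree, size);
}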

Comment on lines 176 to 178
  imageWidth = paddedResult.width;
  imageHeight = paddedResult.height;
  imagePayload = paddedResult.imageBase64;


P1: Non-VL element lookup no longer overlays element IDs

The AiLocateElement flow now skips markupImageForLLM whenever vlMode is undefined, so GPT-4o receives the raw screenshot instead of the annotated version. The locator prompt for non-VL models still asserts that non-text elements “have been highlighted” and expects IDs marked in the image (prompt/llm-locator.ts, around lines 83–86). Without those overlays the model cannot map the textual element list to the screenshot, so element searches for default models will degrade or fail outright. Consider restoring the markup step or updating the prompt and downstream logic to align with the unannotated image.
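
If the markup step is restored on the locator side instead, the guard could look roughly like the sketch below. The field names on context are hypothetical stand-ins for whatever AiLocateElement already has in scope, and markupImageForLLM’s signature is assumed as above.

// Sketch only: keep markupImageForLLM for default (non-VL) models in
// AiLocateElement. Field names on `context` are hypothetical.
const needsOverlay = vlMode === undefined;
const imagePayload = needsOverlay
  ? await markupImageForLLM(screenshotBase64, context.tree, context.size)
  : screenshotBase64;
// Alternatively, rewrite prompt/llm-locator.ts so it stops claiming that
// non-text elements “have been highlighted” when no overlay is applied.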

@quanru force-pushed the feat/remove-gpt-4 branch from 30c347e to ff22375 on October 20, 2025, 03:01.
@quanru merged commit 2a98471 into 1.0 on October 20, 2025 (3 checks passed).
@quanru deleted the feat/remove-gpt-4 branch on October 20, 2025, 03:03.