[Bug]: aiScroll in YAML: the locate prompt has no effect #1234

@Zhangxiancheng63

Description

Version

System:
    OS: Windows 11 10.0.26100
    CPU: (16) x64 13th Gen Intel(R) Core(TM) i5-1340P
    Memory: 4.60 GB / 31.64 GB
Browsers:
    Edge: Chromium (140.0.3485.54), ChromiumDev (140.0.3421.0)
    Internet Explorer: 11.0.26100.1882
npmPackage:
    @midscene/cli@0.28.10
    npm@11.6.0

Details

When running from a YAML configuration file, the model in use is Alibaba's Qwen model, configured as follows:
OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
OPENAI_API_KEY="xxx"
MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
MIDSCENE_USE_QWEN_VL=1

A simple YAML configuration file looks like this:
web:
  url: "xxx"
  viewportWidth: 1861
  viewportHeight: 911

tasks:
  - name: "View Data Dashboard"
    flow:
    - aiScroll:
      direction: down
      scrollType: untilBottom
      locate: "侧边栏导航菜单"  # "sidebar navigation menu"
    - aiAssert: "The data dashboard is visible with transaction amounts and charts"

Run it directly from the command line: midscene .\testScroll.yml
Both Puppeteer's headless browser mode and the browser bridge mode report the following error:
error:
AI model failed to locate:
The user's description is undefined, so no element can be identified.
(Failed to parse bbox: invalid bbox data for qwen-vl mode: [] )
Error: AI model failed to locate:
The user's description is undefined, so no element can be identified.
(Failed to parse bbox: invalid bbox data for qwen-vl mode: [] )
at Insight.locate (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\insight\webpack:@midscene\core\src\insight\index.ts:213:13)
at processTicksAndRejections (node:internal/process/task_queues:105:5)
at Object.executor (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\agent\webpack:@midscene\core\src\agent\tasks.ts:302:19)
at Executor.flush (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\ai-model\webpack:@midscene\core\src\ai-model\action-executor.ts:127:25)
at TaskExecutor.runPlans (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\agent\webpack:@midscene\core\src\agent\tasks.ts:884:20)
at AgentOverChromeBridge.callActionInActionSpace (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\agent\webpack:@midscene\core\src\agent\agent.ts:339:34)
at ScriptPlayer.playTask (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\yaml\webpack:@midscene\core\src\yaml\player.ts:439:9)
at ScriptPlayer.run (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\yaml\webpack:@midscene\core\src\yaml\player.ts:581:9)
at executeFile (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\dist\lib\webpack:@midscene\cli\src\batch-runner.ts:231:9)
at C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\dist\lib\webpack:@midscene\cli\src\batch-runner.ts:319:35

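The error message suggests that the model never received the prompt written in the YAML configuration file. With the browser extension, the same page element and the same instruction do not show this problem: the element is located correctly and scrolled to the bottom. I therefore infer that the problem lies in the step where YAML mode parses the aiScroll prompt and passes it through to the model.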
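To make the suspected failure mode concrete, here is a minimal TypeScript sketch of one way a `locate` value could turn into `undefined` on its way to the model. This is only an illustration of the hypothesis above, not Midscene's actual code: the types `LocateField` and `ScrollStep` and the functions `extractPromptNaive` and `extractPrompt` are hypothetical names invented for this example, and the assumption that `locate` may be either a string or an object with a `prompt` field has not been verified against the source.

```typescript
// Hypothetical shapes for illustration only; these are NOT Midscene's real types.
type LocateField = string | { prompt: string };

interface ScrollStep {
  direction: "up" | "down" | "left" | "right";
  scrollType?: string;
  locate?: LocateField;
}

// Naive variant: assumes `locate` is always an object with a `prompt` field.
// Reading `.prompt` on a plain string yields undefined, which would match the
// "description is undefined" symptom in the error above.
function extractPromptNaive(step: ScrollStep): string | undefined {
  return (step.locate as { prompt?: string } | undefined)?.prompt;
}

// Defensive variant: accepts both the string form and the object form.
function extractPrompt(step: ScrollStep): string | undefined {
  const { locate } = step;
  if (typeof locate === "string") return locate;
  return locate?.prompt;
}

// The step from the YAML file above, as it might look after parsing.
const step: ScrollStep = {
  direction: "down",
  scrollType: "untilBottom",
  locate: "侧边栏导航菜单",
};

console.log(extractPromptNaive(step)); // undefined
console.log(extractPrompt(step)); // 侧边栏导航菜单
```

If something like the naive variant exists in the YAML code path but not in the extension's code path, that would explain why the same instruction works in the browser extension and fails from the CLI. Confirming this requires checking how the YAML player maps `aiScroll` parameters.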

Reproduce link

https://github.com/web-infra-dev/midscene-example/tree/main/yaml-scripts-demo

Reproduce Steps

midscene .\testScroll.yml
