Description
Version
System:
OS: Windows 11 10.0.26100
CPU: (16) x64 13th Gen Intel(R) Core(TM) i5-1340P
Memory: 4.60 GB / 31.64 GB
Browsers:
Edge: Chromium (140.0.3485.54), ChromiumDev (140.0.3421.0)
Internet Explorer: 11.0.26100.1882
npmPackage:
@midscene/cli@0.28.10
npm@11.6.0
Details
When running from a YAML configuration file, the model in use is Alibaba's Qwen model, configured as follows:
OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
OPENAI_API_KEY="xxx"
MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
MIDSCENE_USE_QWEN_VL=1
A minimal YAML configuration file looks like this:
web:
  url: "xxx"
  viewportWidth: 1861
  viewportHeight: 911
tasks:
  - name: "View Data Dashboard"
    flow:
      - aiScroll:
        direction: down
        scrollType: untilBottom
        locate: "侧边栏导航菜单" # "sidebar navigation menu"
      - aiAssert: "The data dashboard is visible with transaction amounts and charts"
      - aiScroll:

Run it directly from the command line: midscene .\testScroll.yml
Both Puppeteer's headless browser mode and the browser bridge mode report the following error:
error:
AI model failed to locate:
The user's description is undefined, so no element can be identified.
(Failed to parse bbox: invalid bbox data for qwen-vl mode: [] )
Error: AI model failed to locate:
The user's description is undefined, so no element can be identified.
(Failed to parse bbox: invalid bbox data for qwen-vl mode: [] )
at Insight.locate (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\insight\webpack:@midscene\core\src\insight\index.ts:213:13)
at processTicksAndRejections (node:internal/process/task_queues:105:5)
at Object.executor (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\agent\webpack:@midscene\core\src\agent\tasks.ts:302:19)
at Executor.flush (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\ai-model\webpack:@midscene\core\src\ai-model\action-executor.ts:127:25)
at TaskExecutor.runPlans (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\agent\webpack:@midscene\core\src\agent\tasks.ts:884:20)
at AgentOverChromeBridge.callActionInActionSpace (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\agent\webpack:@midscene\core\src\agent\agent.ts:339:34)
at ScriptPlayer.playTask (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\yaml\webpack:@midscene\core\src\yaml\player.ts:439:9)
at ScriptPlayer.run (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\node_modules@midscene\core\dist\lib\yaml\webpack:@midscene\core\src\yaml\player.ts:581:9)
at executeFile (C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\dist\lib\webpack:@midscene\cli\src\batch-runner.ts:231:9)
at C:\Users\53647\AppData\Roaming\npm\node_modules@midscene\cli\dist\lib\webpack:@midscene\cli\src\batch-runner.ts:319:35
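Judging from the error message, the model apparently never received the prompt written in the YAML configuration file. When tested with the browser extension, the same page element and the same instruction do not show this problem: the element is located correctly and the page scrolls to the bottom. I therefore infer that the fault lies in the step where, in YAML mode, the prompt of the aiScroll instruction is parsed and passed through to the model.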
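To help narrow down where the prompt is lost, here is a minimal diagnostic sketch. It is not Midscene code: `FlowStep`, `ScrollCheck`, and `checkScrollSteps` are hypothetical names, and the function assumes the YAML has already been parsed into plain objects by some YAML parser. It only reports, for each `aiScroll` step, whether a `locate` string is present after parsing, checking both a nested `locate` and one written as a sibling key of `aiScroll`.

```typescript
// Hypothetical, simplified shape of one parsed YAML flow step.
// This is an assumption for illustration, not Midscene's internal type.
type FlowStep = Record<string, unknown>;

interface ScrollCheck {
  index: number; // position of the step within the flow
  hasLocate: boolean; // true when a non-empty locate string is present
  locate: string | undefined;
}

// Report, for each aiScroll step, whether a locate prompt survived parsing.
function checkScrollSteps(flow: FlowStep[]): ScrollCheck[] {
  const results: ScrollCheck[] = [];
  flow.forEach((step, index) => {
    if (!("aiScroll" in step)) return;
    const params = step["aiScroll"];
    // A bare `- aiScroll:` parses to null, so guard before reading fields.
    const nested =
      params !== null && typeof params === "object"
        ? (params as Record<string, unknown>)["locate"]
        : undefined;
    // Fall back to a `locate` key written as a sibling of `aiScroll`.
    const raw = nested ?? step["locate"];
    const locate =
      typeof raw === "string" && raw.trim() !== "" ? raw : undefined;
    results.push({ index, hasLocate: locate !== undefined, locate });
  });
  return results;
}

// Sample flow mirroring the steps in this report, as plain objects.
const flow: FlowStep[] = [
  {
    aiScroll: null,
    direction: "down",
    scrollType: "untilBottom",
    locate: "侧边栏导航菜单",
  },
  { aiAssert: "The data dashboard is visible with transaction amounts and charts" },
  { aiScroll: null },
];

for (const r of checkScrollSteps(flow)) {
  console.log(`step ${r.index}: locate = ${r.locate ?? "undefined"}`);
}
```

If a step already shows an undefined `locate` at this point, the prompt was missing before anything was sent to the model. If every step shows the expected string, the loss would have to happen later, between the YAML player and the locate call.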
Reproduce link
https://github.com/web-infra-dev/midscene-example/tree/main/yaml-scripts-demo
Reproduce Steps
midscene .\testScroll.yml