Implement VLM Run Invoice Parser node with file upload and parsing logic #1442
@whereisfarukk is attempting to deploy a commit to the Sim Team on Vercel. A member of the Team first needs to authorize it.
Greptile Overview
Summary
This PR adds a new VLM Run Invoice Parser node to sim.ai that enables users to upload invoice files (PDF or images) and extract structured JSON data using the VLM Run API. The implementation includes a two-step process: file upload and document parsing with hardcoded settings (domain: "document.invoice", model: "vlm-1").
Key Changes:
- New Tool Implementation: `parseInvoiceTool` with support for both local file paths and URLs
- Block Configuration: UI block for invoice parser with API key and file path inputs
- Type Definitions: Proper TypeScript interfaces for VLM Run parameters and responses
- Integration: Added to tools and blocks registries with a simple house icon
Issues Found:
- Unused request configuration in tool definition that conflicts with direct API usage
- Debug console.log statements should be replaced with proper logging
- Redundant data/output fields in response object
- Generic icon doesn't represent invoice parsing functionality
The core functionality works as intended but could benefit from some code cleanup and better error handling patterns.
Confidence Score: 4/5
- This PR is safe to merge with minor code quality improvements needed
- Score reflects functional implementation that follows established patterns, but contains some code quality issues like unused configuration, debug logging, and redundant fields. No critical security or logic errors were found.
- The main implementation file `apps/sim/tools/vlmrun/parse_invoice.ts` needs attention for code cleanup
Important Files Changed
File Analysis
Filename | Score | Overview |
---|---|---|
apps/sim/tools/vlmrun/parse_invoice.ts | 3/5 | Implements VLM Run invoice parsing with file upload and URL support; has some code quality issues |
apps/sim/tools/vlmrun/types.ts | 4/5 | Defines TypeScript types for VLM Run tool configuration; well-structured |
apps/sim/blocks/blocks/vlmrun.ts | 4/5 | Block configuration for VLM Run Invoice Parser UI; follows established patterns |
apps/sim/blocks/registry.ts | 5/5 | Added VLM Run block to registry; clean integration with existing codebase |
apps/sim/tools/index.ts | 5/5 | Added VLM Run tool to tools registry; proper integration |
Sequence Diagram
sequenceDiagram
participant User
participant UI as VLM Run Block UI
participant Tool as parseInvoiceTool
participant Client as VlmRun Client
participant API as VLM Run API
User->>UI: Enter API Key & File Path
UI->>Tool: execute(params: {apiKey, filePath})
Tool->>Tool: Check if filePath is URL
alt File Path is URL
Tool->>Client: document.generate({url, model: 'vlm-1', domain: 'document.invoice'})
Client->>API: Generate document from URL
API-->>Client: Parse result with structured data
Client-->>Tool: Response with invoice JSON
else File Path is Local File
Tool->>Client: files.upload({filePath})
Client->>API: Upload file
API-->>Client: File ID
Client-->>Tool: Upload response with fileId
Tool->>Client: document.generate({fileId, model: 'vlm-1', domain: 'document.invoice'})
Client->>API: Generate document from file
API-->>Client: Parse result with structured data
Client-->>Tool: Response with invoice JSON
end
Tool->>Tool: Check response status
alt Status is 'completed'
Tool-->>UI: {success: true, data: parsed_invoice_data}
UI-->>User: Display structured invoice data
else Status is not 'completed'
Tool-->>UI: {success: false, error: "Parsing failed"}
UI-->>User: Display error message
end
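The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `InvoiceClient`, `InvoiceResult`, `isHttpUrl`, and `parseInvoice` are hypothetical names, the client shape is inferred only from the calls shown in the diagram, and the `id` field on the upload response is an assumption.

```typescript
// Hypothetical client shape inferred from the diagram; the real SDK types may differ.
interface InvoiceClient {
  files: { upload(args: { filePath: string }): Promise<{ id: string }> }
  document: {
    generate(args: {
      url?: string
      fileId?: string
      model: string
      domain: string
    }): Promise<{ status: string; response?: unknown }>
  }
}

interface InvoiceResult {
  success: boolean
  output?: unknown
  error?: string
}

const isHttpUrl = (value: string): boolean =>
  value.startsWith('http://') || value.startsWith('https://')

async function parseInvoice(client: InvoiceClient, filePath: string): Promise<InvoiceResult> {
  // URLs are parsed directly; local files are uploaded first, so the file ID
  // (assumed here to be returned as `id`) only exists in the branch that needs it.
  const source = isHttpUrl(filePath)
    ? { url: filePath }
    : { fileId: (await client.files.upload({ filePath })).id }

  const parsed = await client.document.generate({
    ...source,
    model: 'vlm-1',
    domain: 'document.invoice',
  })

  // Any status other than 'completed' is reported as a failure.
  if (parsed.status !== 'completed') {
    return { success: false, error: `Parsing failed: status ${parsed.status}` }
  }
  return { success: true, output: parsed.response }
}
```

Passing the client in as a parameter keeps the sketch testable with a stub and avoids depending on the SDK's exact constructor.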
11 files reviewed, 5 comments
const client = new VlmRun({ apiKey: params.apiKey })
console.log(`[DEBUG] Starting execution with filePath/URL: ${params.filePath}`)

let fileId: string | undefined
let parseResponse: any
style: Variable `fileId` is declared with type `string | undefined` but only used in the file upload path. Consider declaring it closer to where it's used to improve code clarity.
console.log(`[DEBUG] Starting execution with filePath/URL: ${params.filePath}`)

let fileId: string | undefined
let parseResponse: any

const isUrl = params.filePath.startsWith('http://') || params.filePath.startsWith('https://')
console.log(`[DEBUG] Detected as URL: ${isUrl}`)
style: Multiple console.log statements used for debugging. Consider using a proper logging library or removing debug statements for production.
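One way to act on this is to route messages through a small leveled logger so that debug output can be switched off without touching call sites. The sketch below is only an illustration: `createScopedLogger` and its output format are hypothetical and are not the project's own logger, which the PR should use if one exists.

```typescript
type Level = 'debug' | 'info' | 'warn' | 'error'

const ORDER: Record<Level, number> = { debug: 0, info: 1, warn: 2, error: 3 }

type LogFn = (message: string, meta?: unknown) => void

// Hypothetical helper: returns one logging function per level, each of which
// drops messages below `minLevel` and prefixes the rest with level and scope.
function createScopedLogger(
  scope: string,
  minLevel: Level,
  sink: (line: string) => void = console.log
): Record<Level, LogFn> {
  const emit = (level: Level): LogFn => (message, meta) => {
    if (ORDER[level] < ORDER[minLevel]) return
    const suffix = meta === undefined ? '' : ` ${JSON.stringify(meta)}`
    sink(`[${level.toUpperCase()}] [${scope}] ${message}${suffix}`)
  }
  return { debug: emit('debug'), info: emit('info'), warn: emit('warn'), error: emit('error') }
}

// Usage: with minLevel 'info', the debug call is silently skipped.
const logger = createScopedLogger('VlmRunParseInvoice', 'info')
logger.debug('Detected as URL', { isUrl: true })
logger.error('Parsing failed', { status: 'failed' })
```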
request: {
  url: () => 'https://api.vlmrun.com/files/upload',
  method: 'POST' as const,
  headers: (params: VlmRunParams) => ({
    'Authorization': `Bearer ${params.apiKey}`,
    'Content-Type': 'multipart/form-data',
  }),
  body: (params: VlmRunParams) => ({
    filePath: params.filePath,
  }),
},
logic: The `request` field defines an upload URL and headers but isn't actually used by the tool. The actual API calls are made directly in the `execute` function. Either remove this unused configuration or refactor to use the framework's request handling.
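If the upload were ever refactored to go through plain HTTP rather than the SDK, one detail in the excerpt would matter: a hand-written `Content-Type: multipart/form-data` header carries no boundary parameter, so the usual approach with the standard Fetch API is to pass a `FormData` body and let the runtime set the header. The sketch below only builds a request and does not send it. `buildUploadRequest` is a hypothetical helper, the form field name `file` is an assumption about the upload endpoint, and the framework's own request handling may work differently.

```typescript
// Hypothetical helper: builds (but does not send) a multipart upload request.
function buildUploadRequest(apiKey: string, file: Blob, fileName: string): Request {
  const form = new FormData()
  // Assumption: the endpoint expects the file under a field named `file`.
  form.append('file', file, fileName)

  return new Request('https://api.vlmrun.com/files/upload', {
    method: 'POST',
    // No Content-Type here: the runtime derives it, including the boundary,
    // from the FormData body.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  })
}

const request = buildUploadRequest('YOUR_API_KEY', new Blob(['%PDF-1.4']), 'invoice.pdf')
console.log(request.headers.get('content-type'))
```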
return {
  success: true,
  data: parseResponse.response,
  output: parseResponse.response,
  error: undefined,
}
style: The response object has both `data` and `output` fields with the same value. This redundancy might be confusing - consider using only one field consistently.
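A single-field result could be modelled as a discriminated union, which also makes the explicit `error: undefined` unnecessary. This is a sketch under the assumption that the framework needs only one of the two fields; `ToolResult`, `ParseResponse`, `toToolResult`, and `summarize` are hypothetical names rather than types from the codebase.

```typescript
// Exactly one of `output` or `error` exists, selected by `success`.
type ToolResult<T> =
  | { success: true; output: T }
  | { success: false; error: string }

// Hypothetical response shape, limited to the two fields the excerpt reads.
interface ParseResponse {
  status: string
  response?: unknown
}

function toToolResult(parsed: ParseResponse): ToolResult<unknown> {
  if (parsed.status !== 'completed') {
    return { success: false, error: `Parsing failed: status ${parsed.status}` }
  }
  return { success: true, output: parsed.response }
}

// Callers narrow on `success`, so the compiler rejects reading `output` on a failure.
function summarize(result: ToolResult<unknown>): string {
  return result.success ? `ok: ${JSON.stringify(result.output)}` : `error: ${result.error}`
}
```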
export const VlmRunIcon = () => (
  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" className="w-6 h-6">
    <path d="M12 2L2 12h3v8h14v-8h3L12 2z" />
  </svg>
)
style: The VlmRunIcon uses a generic house icon. Consider creating a more specific icon that represents invoice parsing or the VLM Run brand.
import { WebhookBlock } from "@/blocks/blocks/webhook";
import { WhatsAppBlock } from "@/blocks/blocks/whatsapp";
import { WikipediaBlock } from "@/blocks/blocks/wikipedia";
import { WorkflowBlock } from "@/blocks/blocks/workflow";
We may need to remove these formatting related changes
};

export const getBlock = (type: string): BlockConfig | undefined => registry[type]
export const getBlock = (type: string): BlockConfig | undefined =>
Formatting changes. remove?
Yeah no formatting changes needed
apps/sim/.env.example
Outdated
Why remove this?
do not remove env.example
console.log(`[DEBUG] API response: ${JSON.stringify(parseResponse)}`)

if (parseResponse.status !== 'completed') {
  console.error(`[ERROR] Parsing failed: status ${parseResponse.status}`)
can't have console.error --> use our logger instead
apps/sim/tools/vlmrun/types.ts
Outdated
version: string
provider?: string
params: Record<string, ToolParamConfig>
request: { // 👈 MUST EXIST
remove comment
},
"scripts": {
  "dev": "next dev --turbo --port 3000",
  "dev": "next dev --port 3000",
please don't disable turbo for dev builds
Feat: Add VLM Run Invoice Parser Node
Description:
Implemented a new VLM Run Invoice Parser node for sim.ai, following the assignment requirements. The node performs a two-step process:
1. Uploads an invoice file (PDF or image) to the VLM Run File Upload API.
2. Extracts structured JSON data using the Document Parsing API with hardcoded settings (domain: "document.invoice", model: "vlm-1").
Screenshots/Videos