whereisfarukk

Feat: Add VLM Run Invoice Parser Node

Description:
Implemented a new VLM Run Invoice Parser node for sim.ai, following the assignment requirements. The node performs a two-step process:

Uploads an invoice file (PDF or image) to the VLM Run File Upload API.
Extracts structured JSON data using the Document Parsing API with hardcoded settings (domain: "document.invoice", model: "vlm-1").
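The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `InvoiceClient` interface is a hypothetical stand-in shaped after the calls named in this conversation (`files.upload`, `document.generate`), and the `id` field on the upload result is an assumption. Check the real SDK before relying on any of these names.

```typescript
// Hypothetical client shape for illustration only; NOT the real VLM Run SDK surface.
interface InvoiceClient {
  files: { upload(args: { filePath: string }): Promise<{ id: string }> }
  document: {
    generate(args: { fileId: string; model: string; domain: string }): Promise<{
      status: string
      response: unknown
    }>
  }
}

// Hardcoded settings, as stated in the description.
const MODEL = 'vlm-1'
const DOMAIN = 'document.invoice'

// Step 1 uploads the file; step 2 parses it using the returned file ID.
async function parseInvoice(client: InvoiceClient, filePath: string): Promise<unknown> {
  const uploaded = await client.files.upload({ filePath })
  const parsed = await client.document.generate({
    fileId: uploaded.id,
    model: MODEL,
    domain: DOMAIN,
  })
  if (parsed.status !== 'completed') {
    throw new Error(`Parsing failed: status ${parsed.status}`)
  }
  return parsed.response
}
```

Passing the client in as a parameter keeps the flow testable with a fake, without network access or an API key.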

Screenshots/Videos

vercel bot commented Sep 24, 2025

@whereisfarukk is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Summary

This PR adds a new VLM Run Invoice Parser node to sim.ai that enables users to upload invoice files (PDF or images) and extract structured JSON data using the VLM Run API. The implementation includes a two-step process: file upload and document parsing with hardcoded settings (domain: "document.invoice", model: "vlm-1").

Key Changes:

  • New Tool Implementation: parseInvoiceTool with support for both local file paths and URLs
  • Block Configuration: UI block for invoice parser with API key and file path inputs
  • Type Definitions: Proper TypeScript interfaces for VLM Run parameters and responses
  • Integration: Added to tools and blocks registries with a simple house icon

Issues Found:

  • Unused request configuration in tool definition that conflicts with direct API usage
  • Debug console.log statements should be replaced with proper logging
  • Redundant data/output fields in response object
  • Generic icon doesn't represent invoice parsing functionality

The core functionality works as intended but could benefit from some code cleanup and better error handling patterns.

Confidence Score: 4/5

  • This PR is safe to merge with minor code quality improvements needed
  • Score reflects functional implementation that follows established patterns, but contains some code quality issues like unused configuration, debug logging, and redundant fields. No critical security or logic errors were found.
  • The main implementation file apps/sim/tools/vlmrun/parse_invoice.ts needs attention for code cleanup

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| apps/sim/tools/vlmrun/parse_invoice.ts | 3/5 | Implements VLM Run invoice parsing with file upload and URL support; has some code quality issues |
| apps/sim/tools/vlmrun/types.ts | 4/5 | Defines TypeScript types for VLM Run tool configuration; well-structured |
| apps/sim/blocks/blocks/vlmrun.ts | 4/5 | Block configuration for VLM Run Invoice Parser UI; follows established patterns |
| apps/sim/blocks/registry.ts | 5/5 | Added VLM Run block to registry; clean integration with existing codebase |
| apps/sim/tools/index.ts | 5/5 | Added VLM Run tool to tools registry; proper integration |

Sequence Diagram

sequenceDiagram
    participant User
    participant UI as VLM Run Block UI
    participant Tool as parseInvoiceTool
    participant Client as VlmRun Client
    participant API as VLM Run API

    User->>UI: Enter API Key & File Path
    UI->>Tool: execute(params: {apiKey, filePath})
    Tool->>Tool: Check if filePath is URL
    
    alt File Path is URL
        Tool->>Client: document.generate({url, model: 'vlm-1', domain: 'document.invoice'})
        Client->>API: Generate document from URL
        API-->>Client: Parse result with structured data
        Client-->>Tool: Response with invoice JSON
    else File Path is Local File
        Tool->>Client: files.upload({filePath})
        Client->>API: Upload file
        API-->>Client: File ID
        Client-->>Tool: Upload response with fileId
        Tool->>Client: document.generate({fileId, model: 'vlm-1', domain: 'document.invoice'})
        Client->>API: Generate document from file
        API-->>Client: Parse result with structured data  
        Client-->>Tool: Response with invoice JSON
    end
    
    Tool->>Tool: Check response status
    alt Status is 'completed'
        Tool-->>UI: {success: true, data: parsed_invoice_data}
        UI-->>User: Display structured invoice data
    else Status is not 'completed'
        Tool-->>UI: {success: false, error: "Parsing failed"}
        UI-->>User: Display error message
    end

11 files reviewed, 5 comments

Comment on lines 39 to 44
const client = new VlmRun({ apiKey: params.apiKey })
console.log(`[DEBUG] Starting execution with filePath/URL: ${params.filePath}`)

let fileId: string | undefined
let parseResponse: any

Contributor

style: Variable fileId is declared with type string | undefined but only used in the file upload path. Consider declaring it closer to where it's used to improve code clarity.
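One way to apply this suggestion is to branch first and declare `fileId` as a `const` only on the upload path. The sketch below is illustrative: `ParseDeps`, `ParseResult`, and `parseFromPath` are hypothetical names that stand in for the client calls, not identifiers from this PR.

```typescript
// Hypothetical result shape; stands in for the `any`-typed parseResponse.
interface ParseResult {
  status: string
  response: unknown
}

// Hypothetical callbacks standing in for the client's upload and generate calls.
interface ParseDeps {
  upload(filePath: string): Promise<string> // resolves to a file ID
  generateFromUrl(url: string): Promise<ParseResult>
  generateFromFile(fileId: string): Promise<ParseResult>
}

async function parseFromPath(deps: ParseDeps, filePath: string): Promise<ParseResult> {
  const isUrl = filePath.startsWith('http://') || filePath.startsWith('https://')
  if (isUrl) {
    // URL path: no upload, so no fileId exists in this branch.
    return deps.generateFromUrl(filePath)
  }
  // fileId is only needed here, so it is a const scoped to the upload path.
  const fileId = await deps.upload(filePath)
  return deps.generateFromFile(fileId)
}
```

With this shape there is no `string | undefined` variable to reason about, and the return type replaces `any`.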

Comment on lines 40 to 46
console.log(`[DEBUG] Starting execution with filePath/URL: ${params.filePath}`)

let fileId: string | undefined
let parseResponse: any

const isUrl = params.filePath.startsWith('http://') || params.filePath.startsWith('https://')
console.log(`[DEBUG] Detected as URL: ${isUrl}`)
Contributor

style: Multiple console.log statements used for debugging. Consider using a proper logging library or removing debug statements for production.
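As a rough illustration of what a leveled logger buys over raw `console.log`, here is a minimal sketch. `createLogger` and its signature are hypothetical and written only for this example; in this codebase the project's own logger should be used instead.

```typescript
type LogLevel = 'debug' | 'info' | 'warn' | 'error'

// Ordered from least to most severe.
const LEVELS: LogLevel[] = ['debug', 'info', 'warn', 'error']

interface Logger {
  debug(msg: string): void
  info(msg: string): void
  warn(msg: string): void
  error(msg: string): void
}

// Hypothetical factory: messages below minLevel are dropped, the rest go to the sink.
function createLogger(
  scope: string,
  minLevel: LogLevel,
  sink: (line: string) => void = console.log
): Logger {
  const threshold = LEVELS.indexOf(minLevel)
  const emit = (level: LogLevel, msg: string): void => {
    if (LEVELS.indexOf(level) >= threshold) {
      sink(`[${level.toUpperCase()}] [${scope}] ${msg}`)
    }
  }
  return {
    debug: (msg) => emit('debug', msg),
    info: (msg) => emit('info', msg),
    warn: (msg) => emit('warn', msg),
    error: (msg) => emit('error', msg),
  }
}
```

With `minLevel` set to `'info'`, a call such as `logger.debug('Detected as URL: true')` produces no output in production, while errors are still reported.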

Comment on lines 12 to 22
request: {
  url: () => 'https://api.vlmrun.com/files/upload',
  method: 'POST' as const,
  headers: (params: VlmRunParams) => ({
    'Authorization': `Bearer ${params.apiKey}`,
    'Content-Type': 'multipart/form-data',
  }),
  body: (params: VlmRunParams) => ({
    filePath: params.filePath,
  }),
},
Contributor

logic: The request field defines an upload URL and headers but isn't actually used by the tool. The actual API calls are made directly in the execute function. Either remove this unused configuration or refactor to use the framework's request handling.
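If the second option is chosen, the idea is that the declared `request` becomes the single source of truth and the call site derives its URL and headers from it. The sketch below only illustrates that idea: `RequestConfig` and `buildRequest` are hypothetical and are not the framework's actual types or request handling, which this conversation does not show.

```typescript
// Fields taken from the quoted snippet.
interface VlmRunParams {
  apiKey: string
  filePath: string
}

// Hypothetical, simplified request config; not the framework's real type.
interface RequestConfig<P> {
  url: (params: P) => string
  method: 'GET' | 'POST'
  headers: (params: P) => Record<string, string>
}

const uploadRequest: RequestConfig<VlmRunParams> = {
  url: () => 'https://api.vlmrun.com/files/upload',
  method: 'POST',
  // Content-Type is left out: when fetch is given a FormData body,
  // the runtime sets the multipart boundary itself.
  headers: (params) => ({ Authorization: `Bearer ${params.apiKey}` }),
}

// Resolves a config against concrete params, so callers never duplicate the URL or headers.
function buildRequest<P>(
  config: RequestConfig<P>,
  params: P
): { url: string; init: { method: string; headers: Record<string, string> } } {
  return {
    url: config.url(params),
    init: { method: config.method, headers: config.headers(params) },
  }
}
```

The first option, deleting the unused `request` block, is simpler whenever the tool framework allows a tool to omit it.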

Comment on lines 78 to 83
return {
  success: true,
  data: parseResponse.response,
  output: parseResponse.response,
  error: undefined,
}
Contributor

style: The response object has both data and output fields with the same value. This redundancy might be confusing - consider using only one field consistently.
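A sketch of a result shape with a single payload field follows. It assumes `output` is the field to keep, which this conversation does not confirm; `ToolResult` and `toToolResult` are hypothetical names used only for illustration.

```typescript
// Discriminated union: a success carries only `output`, a failure carries only `error`.
type ToolResult<T> =
  | { success: true; output: T }
  | { success: false; error: string }

// Shape of the parse response as it is used in the quoted code.
interface ParseResponse {
  status: string
  response: unknown
}

function toToolResult(parseResponse: ParseResponse): ToolResult<unknown> {
  if (parseResponse.status !== 'completed') {
    return { success: false, error: `Parsing failed: status ${parseResponse.status}` }
  }
  return { success: true, output: parseResponse.response }
}
```

The union also removes the need for `error: undefined` on the success path, since the compiler knows which fields exist once `success` has been checked.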

Comment on lines +24 to +28
export const VlmRunIcon = () => (
  <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="currentColor" className="w-6 h-6">
    <path d="M12 2L2 12h3v8h14v-8h3L12 2z" />
  </svg>
)
Contributor

style: The VlmRunIcon uses a generic house icon. Consider creating a more specific icon that represents invoice parsing or the VLM Run brand.

import { WebhookBlock } from "@/blocks/blocks/webhook";
import { WhatsAppBlock } from "@/blocks/blocks/whatsapp";
import { WikipediaBlock } from "@/blocks/blocks/wikipedia";
import { WorkflowBlock } from "@/blocks/blocks/workflow";

We may need to remove these formatting related changes

};

- export const getBlock = (type: string): BlockConfig | undefined => registry[type]
+ export const getBlock = (type: string): BlockConfig | undefined =>

Formatting changes. remove?

Collaborator

Yeah no formatting changes needed

Why remove this?

Collaborator

do not remove env.example

console.log(`[DEBUG] API response: ${JSON.stringify(parseResponse)}`)

if (parseResponse.status !== 'completed') {
  console.error(`[ERROR] Parsing failed: status ${parseResponse.status}`)
Collaborator

@icecrasher321 icecrasher321 Sep 26, 2025

can't have console.error --> use our logger instead

version: string
provider?: string
params: Record<string, ToolParamConfig>
request: { // 👈 MUST EXIST
Collaborator

remove comment

  },
  "scripts": {
-   "dev": "next dev --turbo --port 3000",
+   "dev": "next dev --port 3000",
Collaborator

please don't disable turbo for dev builds
