8 changes: 8 additions & 0 deletions .gitignore
@@ -1 +1,9 @@
node_modules
**/dist/
**/examples/
**/*.traineddata
.env
.env.local
tmp/
!tmp/.gitkeep
*.tsbuildinfo
254 changes: 254 additions & 0 deletions cli/README.md
@@ -0,0 +1,254 @@
# Documind CLI

Command-line interface for intelligent document processing and structured data extraction using Documind.

## Installation

```bash
# Install dependencies
npm install

# Build the CLI
npm run build
```

## Usage

The CLI provides three commands for working with documents: `extract`, `convert`, and `templates`.

### Extract Command

Extract structured data from documents using schemas, templates, or auto-generation.

```bash
documind extract [options]
```

#### Options

- `-f, --file <path>` - Path to the document file (required)
- `-m, --model <model>` - LLM model to use (default: gpt-4o-mini)
- `-s, --schema <path>` - Path to JSON schema file
- `-t, --template <name>` - Name of a predefined template
- `-a, --auto-schema` - Auto-generate schema from document
- `-i, --instructions <text>` - Instructions for auto-schema generation
- `-o, --output <path>` - Output file path for results (JSON)
- `--base-url <url>` - Base URL for local LLM (for Ollama)

#### Examples

**Extract with auto-generated schema:**
```bash
documind extract -f invoice.pdf --auto-schema
```

**Extract with custom schema:**
```bash
documind extract -f document.pdf -s schema.json -o output.json
```

**Extract using a predefined template:**
```bash
documind extract -f invoice.pdf -t invoice -o result.json
```

**Extract with local LLM (Ollama):**
```bash
documind extract -f doc.pdf -m llama3.2-vision --base-url http://localhost:11434/v1 --auto-schema
```

**Extract with specific instructions:**
```bash
documind extract -f contract.pdf --auto-schema -i "Extract party names, dates, and monetary amounts"
```

### Convert Command

Convert documents to markdown or plain text.

```bash
documind convert [options]
```

#### Options

- `-f, --file <path>` - Path to the document file (required)
- `-m, --model <model>` - LLM model to use (default: gpt-4o-mini)
- `-t, --format <format>` - Output format: markdown or plaintext (default: markdown)
- `-o, --output <path>` - Output file path
- `--base-url <url>` - Base URL for local LLM (for Ollama)

#### Examples

**Convert to markdown:**
```bash
documind convert -f document.pdf -t markdown -o output.md
```

**Convert to plain text:**
```bash
documind convert -f document.pdf -t plaintext -o output.txt
```

### Templates Command

List all available predefined templates.

```bash
documind templates
```

## Supported Models

### OpenAI Models
- `gpt-4o`
- `gpt-4o-mini` (default)
- `gpt-4.1`
- `gpt-4.1-mini`

### Google Models
- `gemini-2.0-flash-001`
- `gemini-2.0-flash-lite-preview-02-05`
- `gemini-1.5-flash`
- `gemini-1.5-flash-8b`
- `gemini-1.5-pro`

### Local Models (Ollama)
- `llama3.2-vision`

## Environment Variables

Set these environment variables for API access:

```bash
# OpenAI API key for GPT models
export OPENAI_API_KEY="your-openai-api-key"

# Google Gemini API key
export GEMINI_API_KEY="your-gemini-api-key"

# Base URL for local LLM (Ollama)
export BASE_URL="http://localhost:11434/v1"
```

Or create a `.env` file:

```env
OPENAI_API_KEY=your-openai-api-key
GEMINI_API_KEY=your-gemini-api-key
BASE_URL=http://localhost:11434/v1
```
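Command-line flags take precedence over environment variables: for example, the `convert` command writes a `--base-url` value into `process.env.BASE_URL` before calling the formatter. A minimal sketch of that precedence rule, using a hypothetical `resolveBaseUrl` helper that is not part of the CLI:

```typescript
// Sketch of base-URL resolution: an explicit CLI flag overrides
// the BASE_URL environment variable. Illustrative only.
function resolveBaseUrl(
  flag: string | undefined,
  env: Record<string, string | undefined>
): string | undefined {
  // The flag, when present, wins over the environment.
  return flag ?? env.BASE_URL;
}

console.log(resolveBaseUrl('http://localhost:11434/v1', {})); // flag wins
console.log(resolveBaseUrl(undefined, { BASE_URL: 'http://localhost:11434/v1' })); // env fallback
```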

## Schema Format

Schemas are JSON files containing an array of field definitions:

```json
[
  {
    "name": "field_name",
    "type": "string",
    "description": "Field description"
  },
  {
    "name": "nested_object",
    "type": "object",
    "description": "An object field",
    "children": [
      {
        "name": "child_field",
        "type": "number",
        "description": "Child field description"
      }
    ]
  },
  {
    "name": "items_list",
    "type": "array",
    "description": "A list of items",
    "children": [
      {
        "name": "item_name",
        "type": "string",
        "description": "Item name"
      }
    ]
  }
]
```

### Supported Field Types

- `string` - Text values
- `number` - Numeric values
- `boolean` - True/false values
- `enum` - Predefined set of values (requires `values` array)
- `object` - Nested objects (requires `children` array)
- `array` - Lists (requires `children` array)
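The rules above can be expressed as a small shape check. The interface and validator below are illustrative only, not part of the CLI:

```typescript
// Field shape implied by the schema format: `enum` needs a `values`
// array, while `object` and `array` need a `children` array.
interface SchemaField {
  name: string;
  type: 'string' | 'number' | 'boolean' | 'enum' | 'object' | 'array';
  description: string;
  values?: string[];        // required when type is 'enum'
  children?: SchemaField[]; // required when type is 'object' or 'array'
}

// Recursively check that each field carries the extra data its type requires.
function isValidField(field: SchemaField): boolean {
  if (field.type === 'enum') {
    return Array.isArray(field.values) && field.values.length > 0;
  }
  if (field.type === 'object' || field.type === 'array') {
    return Array.isArray(field.children) && field.children.every(isValidField);
  }
  return true;
}
```

For example, `{ name: 'status', type: 'enum', description: 'Invoice status' }` fails the check until a `values` array such as `['paid', 'unpaid']` is added.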

## Using with Local LLMs

To use Ollama or other local LLMs:

1. Install and start Ollama:
```bash
# Install Ollama
curl https://ollama.ai/install.sh | sh

# Pull the vision model
ollama pull llama3.2-vision

# Start Ollama (if not already running)
ollama serve
```

2. Use with the CLI:
```bash
documind extract \
  -f document.pdf \
  -m llama3.2-vision \
  --base-url http://localhost:11434/v1 \
  --auto-schema \
  -o result.json
```

## Development

```bash
# Watch mode
npm run dev

# Build
npm run build

# Run
npm start -- extract -f example.pdf --auto-schema
```

## Troubleshooting

### "Cannot find module" errors

Make sure you've built the project:
```bash
npm run build
```

### API Key errors

Ensure your API keys are set:
```bash
echo $OPENAI_API_KEY
echo $GEMINI_API_KEY
```

### Local LLM connection issues

Check if Ollama is running:
```bash
curl http://localhost:11434/v1/models
```

## License

AGPL-3.0
37 changes: 37 additions & 0 deletions cli/package.json
@@ -0,0 +1,37 @@
{
  "name": "cli",
  "version": "1.0.0",
  "description": "Command-line interface for Documind extractor",
  "type": "module",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "bin": {
    "documind": "./dist/index.js"
  },
  "files": [
    "dist"
  ],
  "scripts": {
    "build": "tsc --build && tsc-alias -p tsconfig.json",
    "start": "node dist/index.js",
    "dev": "tsx src/index.ts"
  },
  "keywords": [
    "cli",
    "document",
    "extraction",
    "llm"
  ],
  "dependencies": {
    "commander": "^12.0.0",
    "chalk": "^5.3.0",
    "ora": "^8.0.1",
    "extractor": "*",
    "dotenv": "^16.4.5"
  },
  "devDependencies": {
    "@types/node": "^20.14.11",
    "tsc-alias": "^1.8.8",
    "typescript": "^5.6.3"
  }
}
73 changes: 73 additions & 0 deletions cli/src/commands/convert.ts
@@ -0,0 +1,73 @@
import { Command } from 'commander';
import { formatter } from 'extractor';
import ora from 'ora';
import { logger } from '../utils/logger.js';
import { fileExists, resolveFilePath } from '../utils/file-helper.js';
import fs from 'fs';

export function createConvertCommand(): Command {
  const cmd = new Command('convert');

  cmd
    .description('Convert a document to markdown or plain text')
    .requiredOption('-f, --file <path>', 'Path to the document file')
    .option('-m, --model <model>', 'LLM model to use', 'gpt-4o-mini')
    .option('-t, --format <format>', 'Output format (markdown or plaintext)', 'markdown')
    .option('-o, --output <path>', 'Output file path')
    .option('--base-url <url>', 'Base URL for local LLM (for Ollama)')
    .action(async (options: ConvertOptions) => {
      await handleConvert(options);
    });

  return cmd;
}

interface ConvertOptions {
  file: string;
  model: string;
  format: 'markdown' | 'plaintext';
  output?: string;
  baseUrl?: string;
}

async function handleConvert(options: ConvertOptions): Promise<void> {
  const spinner = ora('Starting conversion...').start();

  try {
    const filePath = resolveFilePath(options.file);

    if (!fileExists(filePath)) {
      spinner.fail(`File not found: ${filePath}`);
      process.exit(1);
    }

    // A --base-url flag overrides any BASE_URL from the environment.
    if (options.baseUrl) {
      process.env.BASE_URL = options.baseUrl;
    }

    spinner.text = `Converting document to ${options.format}...`;

    let result: string;

    if (options.format === 'markdown') {
      result = await formatter.markdown({ file: filePath, model: options.model });
    } else {
      result = await formatter.plaintext({ file: filePath, model: options.model });
    }

    spinner.succeed('Conversion completed!');

    if (options.output) {
      const outputPath = resolveFilePath(options.output);
      fs.writeFileSync(outputPath, result, 'utf-8');
      logger.success(`Output saved to: ${outputPath}`);
    } else {
      console.log('\n' + result);
    }
  } catch (error) {
    spinner.fail('Conversion failed');
    logger.error((error as Error).message);
    process.exit(1);
  }
}