SIGSEGV Error in go-fitz When Processing Multiple PDFs Concurrently #142

@lazaromedina

Hi everyone!
I'm wondering if someone here has experienced something similar and might be able to help.
I'd really appreciate any suggestions or insights you can share.
Thanks in advance!

Issue Summary

When I run a Go application that uses the go-fitz package (v1.24.14) to process 28 single-page PDF files, it crashes with a SIGSEGV (segmentation violation). The crash happens while PDF pages are being rendered concurrently in separate goroutines, inside the ImageDPI method of the Document object. Each goroutine opens its own Document with fitz.NewFromMemory, so no Document is shared between goroutines.
The issue appears to be tied to concurrent use of the library: with maxWorkers=1 the crash never occurs, while with maxWorkers=2 or more the segmentation violation occurs at a random point during execution.

Environment:

- `go-fitz` version: v1.24.14
- Go version: 1.23.6
- OS: macOS (arm64)
- Files: 28 PDF files, 1 page each, typical dimensions `(0,0)-(2550,3300)`
- Concurrent workers: 2 or more cause the crash.
- DPI: 300

code:

package main

import (
	"context"
	"fmt"
	"github.com/gen2brain/go-fitz"
	"image"
	"os"
	"path/filepath"
	"sync"
)

type PdfServiceImpl struct {
}

func (service *PdfServiceImpl) renderRegularPage(ctx context.Context, pdfBytes []byte, fileName string, dpi int) (image.Image, error) {
	fmt.Printf("IN - Rendering regular PDF page. File: %s\n", fileName)

	// Load the PDF from memory
	fileDoc, err := fitz.NewFromMemory(pdfBytes)
	if err != nil {
		return nil, fmt.Errorf("Error initializing the PDF file %s: %w", fileName, err)
	}
	defer fileDoc.Close()
	fmt.Printf("The file %s was successfully loaded.\n", fileName)

	// Validate that the PDF has at least one page
	if fileDoc.NumPage() <= 0 {
		return nil, fmt.Errorf("The PDF %s contains no pages", fileName)
	}

	// Render the first page
	pageIndex := 0
	img, err := fileDoc.ImageDPI(pageIndex, float64(dpi))
	if err != nil {
		return nil, fmt.Errorf("Error rendering page %d of %s at %d DPI: %w", pageIndex, fileName, dpi, err)
	}

	fmt.Printf("OUT - PDF page successfully rendered. File: %s\n", fileName)
	return img, nil
}

func main() {
	ctx := context.Background()
	service := &PdfServiceImpl{}

	const maxWorkers = 2 // 1 works; 2 or more triggers the crash
	workerPool := make(chan struct{}, maxWorkers)

	pdfDirPath := "input/"
	files, err := os.ReadDir(pdfDirPath)
	if err != nil {
		fmt.Printf("Error reading the directory: %v\n", err)
		return
	}

	dpi := 300
	wg := &sync.WaitGroup{}

	for _, file := range files {
		if file.IsDir() {
			continue
		}

		workerPool <- struct{}{}
		wg.Add(1)

		go func(fileName string) {
			defer func() {
				<-workerPool
				wg.Done()
			}()

			pdfPath := filepath.Join(pdfDirPath, fileName)

			pdfBytes, err := os.ReadFile(pdfPath)
			if err != nil {
				fmt.Printf("Error reading PDF file %s: %v\n", fileName, err)
				return
			}

			_, err = service.renderRegularPage(ctx, pdfBytes, fileName, dpi)
			if err != nil {
				fmt.Printf("Error rendering file %s: %v\n", fileName, err)
				return
			}

			fmt.Printf("File successfully processed: %s\n", fileName)
		}(file.Name())
	}

	wg.Wait()
	fmt.Println("All files processed successfully.")
}

ERROR:

Starting to process file: page-02.pdf
Starting to process file: page-01.pdf
IN - Rendering regular PDF page. File: page-02.pdf
IN - Rendering regular PDF page. File: page-01.pdf
The file page-01.pdf was successfully loaded.
The file page-02.pdf was successfully loaded.
The PDF page-01.pdf has 1 pages
The PDF page-02.pdf has 1 pages
OUT - PDF page successfully rendered. File: page-02.pdf
File successfully processed: page-02.pdf, Dimensions: (0,0)-(2550,3300)
Starting to process file: page-03.pdf
IN - Rendering regular PDF page. File: page-03.pdf
The file page-03.pdf was successfully loaded.
The PDF page-03.pdf has 1 pages
OUT - PDF page successfully rendered. File: page-01.pdf
File successfully processed: page-01.pdf, Dimensions: (0,0)-(2550,3300)
Starting to process file: page-04.pdf
IN - Rendering regular PDF page. File: page-04.pdf
The file page-04.pdf was successfully loaded.
The PDF page-04.pdf has 1 pages
SIGSEGV: segmentation violation
PC=0x1029ad6e4 m=0 sigcode=2 addr=0x28
signal arrived during cgo execution

goroutine 4 gp=0x14000104540 m=0 mp=0x103039f60 [syscall]:
runtime.cgocall(0x10292cce4, 0x140000a4a28)
	go/go1.23.6/src/runtime/cgocall.go:167 +0x44 fp=0x140000a49f0 sp=0x140000a49b0 pc=0x1028e2b84
github.com/gen2brain/go-fitz._Cfunc_run_page_contents(0x300060000, 0x600000fd0000, 0x11b013e00, {0x3f800000, 0x0, 0x0, 0x3f800000, 0x0, 0x0}, 0x0)
	_cgo_gotypes.go:1679 +0x34 fp=0x140000a4a20 sp=0x140000a49f0 pc=0x102929194
github.com/gen2brain/go-fitz.(*Document).ImageDPI.func10(0x0?, 0x600000fd0000, 0x11b013e00, {0x3f800000, 0x0, 0x0, 0x3f800000, 0x0, 0x0})
	go/pkg/mod/github.com/gen2brain/go-fitz@v1.24.14/fitz_cgo.go:231 +0xc8 fp=0x140000a4a90 sp=0x140000a4a20 pc=0x10292a038
github.com/gen2brain/go-fitz.(*Document).ImageDPI(0x1400009c0c0, 0x0, 0x4072c00000000000)
	go/pkg/mod/github.com/gen2brain/go-fitz@v1.24.14/fitz_cgo.go:231 +0x2ec fp=0x140000a4c90 sp=0x140000a4a90 pc=0x102929b1c
main.(*PdfServiceImpl).renderRegularPage(0x140000220a0?, {0x14000116018?, 0x102ca6fc3?}, {0x1400016c000, 0xc2e2, 0xc2e3}, {0x140001120d0, 0xb}, 0x12c)

...

r0      0x0
r1      0x11b04a600
r2      0x0
r3      0x30027ec10
r4      0x7f
r5      0x11b04a624
r6      0x42
r7      0x11b04a628
r8      0x103064000
r9      0x11b0479c0
r10     0x11b04c610
r11     0x1fdc
r12     0x0
r13     0x11b04c520
r14     0x0
r15     0x0
r16     0x19069c250
r17     0x0
r18     0x0
r19     0x11b04a600
r20     0x3ec
r21     0x100
r22     0x0
r23     0x11b0479c0
r24     0x100
r25     0x30027eff0
r26     0x11b04c610
r27     0xc
r28     0x30027eff0
r29     0x16d57b8a0
lr      0x102c59880
sp      0x16d57b890
pc      0x1029ad6e4
fault   0x28

Process finished with the exit code 2

What I Tried

  1. Mutex synchronization: I added a sync.Mutex (either as a field of the PdfServiceImpl struct or as a global mutex for the whole application) to serialize access to renderRegularPage. With it, the program runs successfully and processes all files without issues, but only one worker can be inside renderRegularPage at a time.
  2. Serial processing: When maxWorkers=1, the program runs successfully and processes all files without issues.
  3. Testing problematic files: I considered the possibility of a specific PDF file causing the crash, but the crash does not always occur with the same file, reinforcing the idea that the issue is concurrency-related.
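
For reference, the mutex workaround from item 1 can be sketched roughly as follows. This is only an illustration of the pattern, not the exact code I ran: processAll, renderFunc, and peakRenderConcurrency are hypothetical names, and the load and render steps are stubs so that the sketch runs without go-fitz or cgo. In the real program, render would be the call to renderRegularPage. File loading stays concurrent; only the render call is serialized.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// renderFunc stands in for the cgo-backed render step (hypothetical; in the
// program above it would wrap service.renderRegularPage).
type renderFunc func(name string, data []byte) error

// processAll runs at most maxWorkers goroutines. Loading happens concurrently,
// but every call to render is serialized by renderMu.
func processAll(names []string, maxWorkers int,
	load func(string) ([]byte, error), render renderFunc) []error {

	var renderMu sync.Mutex
	var wg sync.WaitGroup
	sem := make(chan struct{}, maxWorkers)
	errs := make([]error, len(names)) // one slot per file, no sharing

	for i, name := range names {
		sem <- struct{}{}
		wg.Add(1)
		go func(i int, name string) {
			defer func() {
				<-sem
				wg.Done()
			}()

			data, err := load(name) // not under the lock
			if err != nil {
				errs[i] = fmt.Errorf("load %s: %w", name, err)
				return
			}

			// Only one goroutine at a time may enter the render step.
			err = func() error {
				renderMu.Lock()
				defer renderMu.Unlock()
				return render(name, data)
			}()
			if err != nil {
				errs[i] = fmt.Errorf("render %s: %w", name, err)
			}
		}(i, name)
	}

	wg.Wait()
	return errs
}

// peakRenderConcurrency runs processAll with stub functions and reports the
// highest number of render calls that were ever active at the same time.
func peakRenderConcurrency(n, maxWorkers int) int32 {
	var active, peak int32
	names := make([]string, n)
	for i := range names {
		names[i] = fmt.Sprintf("page-%02d.pdf", i+1)
	}
	load := func(string) ([]byte, error) { return []byte("stub"), nil }
	render := func(string, []byte) error {
		cur := atomic.AddInt32(&active, 1)
		for {
			old := atomic.LoadInt32(&peak)
			if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
				break
			}
		}
		time.Sleep(time.Millisecond) // simulate rendering work
		atomic.AddInt32(&active, -1)
		return nil
	}
	processAll(names, maxWorkers, load, render)
	return atomic.LoadInt32(&peak)
}

func main() {
	// prints "peak concurrent renders: 1"
	fmt.Println("peak concurrent renders:", peakRenderConcurrency(28, 4))
}
```

The drawback is the one described in item 1: rendering is the expensive step, so serializing it removes most of the benefit of having more than one worker.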

Possible Cause

The crash may be caused by non-thread-safe behavior in the go-fitz library or in the underlying MuPDF C library that go-fitz wraps. Because it occurs only during concurrent execution and disappears when rendering is serialized, shared internal state or global resources in the library are probably not properly isolated between threads.

Thank you for your time and support!
