SIGSEGV Error in go-fitz When Processing Multiple PDFs Concurrently
Hi everyone!
I'm wondering if someone here has experienced something similar and might be able to help.
I'd really appreciate any suggestions or insights you can share.
Thanks in advance!
Issue Summary
When running a Go application using the `go-fitz` package (v1.24.14) to process 28 PDF files with one page each, I encounter a SIGSEGV segmentation violation. The error happens when PDF pages are rendered concurrently in separate goroutines, specifically in the `ImageDPI` method of the `Document` object.
The issue appears to be related to concurrent use of the library: the crash occurs whenever multiple workers process PDFs in parallel. Setting `maxWorkers=1` eliminates the issue, while `maxWorkers=2` or more causes segmentation violations at random points during execution.
Environment:
- `go-fitz` version: v1.24.14
- Go version: 1.23.6
- OS: macOS (arm64)
- Files: 28 PDF files, 1 page each, typical dimensions `(0,0)-(2550,3300)`
- Concurrent workers: 2 or more cause the crash.
- DPI: 300 (`dpi := 300`)
code:
```go
package main

import (
	"context"
	"fmt"
	"image"
	"os"
	"path/filepath"
	"sync"

	"github.com/gen2brain/go-fitz"
)

type PdfServiceImpl struct {
}

func (service *PdfServiceImpl) renderRegularPage(ctx context.Context, pdfBytes []byte, fileName string, dpi int) (image.Image, error) {
	fmt.Printf("IN - Rendering regular PDF page. File: %s\n", fileName)

	// Load the PDF from memory
	fileDoc, err := fitz.NewFromMemory(pdfBytes)
	if err != nil {
		return nil, fmt.Errorf("Error initializing the PDF file %s: %w", fileName, err)
	}
	defer fileDoc.Close()
	fmt.Printf("The file %s was successfully loaded.\n", fileName)

	// Validate that the PDF has at least one page
	if fileDoc.NumPage() <= 0 {
		return nil, fmt.Errorf("The PDF %s contains no pages", fileName)
	}

	// Render the first page
	pageIndex := 0
	img, err := fileDoc.ImageDPI(pageIndex, float64(dpi))
	if err != nil {
		return nil, fmt.Errorf("Error getting the DPI of the page image in %s: %w", fileName, err)
	}
	fmt.Printf("OUT - PDF page successfully rendered. File: %s\n", fileName)
	return img, nil
}

func main() {
	ctx := context.Background()
	service := &PdfServiceImpl{}

	// Buffered channel used as a semaphore to cap concurrent workers
	const maxWorkers = 2
	workerPool := make(chan struct{}, maxWorkers)

	pdfDirPath := "input/"
	files, err := os.ReadDir(pdfDirPath)
	if err != nil {
		fmt.Printf("Error reading the directory: %v\n", err)
		return
	}

	dpi := 300
	wg := &sync.WaitGroup{}
	for _, file := range files {
		if file.IsDir() {
			continue
		}
		workerPool <- struct{}{}
		wg.Add(1)
		go func(fileName string) {
			defer func() {
				<-workerPool
				wg.Done()
			}()
			pdfPath := filepath.Join(pdfDirPath, fileName)
			pdfBytes, err := os.ReadFile(pdfPath)
			if err != nil {
				fmt.Printf("Error reading PDF file %s: %v\n", fileName, err)
				return
			}
			_, err = service.renderRegularPage(ctx, pdfBytes, fileName, dpi)
			if err != nil {
				fmt.Printf("Error rendering file %s: %v\n", fileName, err)
				return
			}
			fmt.Printf("File successfully processed: %s\n", fileName)
		}(file.Name())
	}
	wg.Wait()
	fmt.Println("All files processed successfully.")
}
```
ERROR:
```
Starting to process file: page-02.pdf
Starting to process file: page-01.pdf
IN - Rendering regular PDF page. File: page-02.pdf
IN - Rendering regular PDF page. File: page-01.pdf
The file page-01.pdf was successfully loaded.
The file page-02.pdf was successfully loaded.
The PDF page-01.pdf has 1 pages
The PDF page-02.pdf has 1 pages
OUT - PDF page successfully rendered. File: page-02.pdf
File successfully processed: page-02.pdf, Dimensions: (0,0)-(2550,3300)
Starting to process file: page-03.pdf
IN - Rendering regular PDF page. File: page-03.pdf
The file page-03.pdf was successfully loaded.
The PDF page-03.pdf has 1 pages
OUT - PDF page successfully rendered. File: page-01.pdf
File successfully processed: page-01.pdf, Dimensions: (0,0)-(2550,3300)
Starting to process file: page-04.pdf
IN - Rendering regular PDF page. File: page-04.pdf
The file page-04.pdf was successfully loaded.
The PDF page-04.pdf has 1 pages
SIGSEGV: segmentation violation
PC=0x1029ad6e4 m=0 sigcode=2 addr=0x28
signal arrived during cgo execution

goroutine 4 gp=0x14000104540 m=0 mp=0x103039f60 [syscall]:
runtime.cgocall(0x10292cce4, 0x140000a4a28)
	go/go1.23.6/src/runtime/cgocall.go:167 +0x44 fp=0x140000a49f0 sp=0x140000a49b0 pc=0x1028e2b84
github.com/gen2brain/go-fitz._Cfunc_run_page_contents(0x300060000, 0x600000fd0000, 0x11b013e00, {0x3f800000, 0x0, 0x0, 0x3f800000, 0x0, 0x0}, 0x0)
	_cgo_gotypes.go:1679 +0x34 fp=0x140000a4a20 sp=0x140000a49f0 pc=0x102929194
github.com/gen2brain/go-fitz.(*Document).ImageDPI.func10(0x0?, 0x600000fd0000, 0x11b013e00, {0x3f800000, 0x0, 0x0, 0x3f800000, 0x0, 0x0})
	go/pkg/mod/github.com/gen2brain/go-fitz@v1.24.14/fitz_cgo.go:231 +0xc8 fp=0x140000a4a90 sp=0x140000a4a20 pc=0x10292a038
github.com/gen2brain/go-fitz.(*Document).ImageDPI(0x1400009c0c0, 0x0, 0x4072c00000000000)
	go/pkg/mod/github.com/gen2brain/go-fitz@v1.24.14/fitz_cgo.go:231 +0x2ec fp=0x140000a4c90 sp=0x140000a4a90 pc=0x102929b1c
main.(*PdfServiceImpl).renderRegularPage(0x140000220a0?, {0x14000116018?, 0x102ca6fc3?}, {0x1400016c000, 0xc2e2, 0xc2e3}, {0x140001120d0, 0xb}, 0x12c)
...
r0 0x0
r1 0x11b04a600
r2 0x0
r3 0x30027ec10
r4 0x7f
r5 0x11b04a624
r6 0x42
r7 0x11b04a628
r8 0x103064000
r9 0x11b0479c0
r10 0x11b04c610
r11 0x1fdc
r12 0x0
r13 0x11b04c520
r14 0x0
r15 0x0
r16 0x19069c250
r17 0x0
r18 0x0
r19 0x11b04a600
r20 0x3ec
r21 0x100
r22 0x0
r23 0x11b0479c0
r24 0x100
r25 0x30027eff0
r26 0x11b04c610
r27 0xc
r28 0x30027eff0
r29 0x16d57b8a0
lr 0x102c59880
sp 0x16d57b890
pc 0x1029ad6e4
fault 0x28

Process finished with the exit code 2
```
What I Tried
- Mutex synchronization: I added a `sync.Mutex`, either within the `PdfServiceImpl` struct or as a global mutex for the entire application, to serialize access to `renderRegularPage`. The program runs successfully and processes all files without issues, but only one worker can be inside `renderRegularPage` at a time.
- Serial processing: When `maxWorkers=1`, the program runs successfully and processes all files without issues.
- Testing problematic files: I considered the possibility of a specific PDF file causing the crash, but the crash does not always occur with the same file, which reinforces the idea that the issue is concurrency-related.
Possible Cause
The issue could be due to non-thread-safe behavior in the `go-fitz` library or in the underlying MuPDF C library used by `go-fitz`. Given that the crash occurs only during concurrent execution and disappears when running sequentially, it suggests that shared internal state or global resources in the library may not be properly isolated for thread safety.
Thank you for your time and support!