Contributor

@brf153 brf153 commented Sep 21, 2025

Problem

The parser was failing to handle Mutual Fund Folios sections due to:

  • Regex mismatch: DEMAT_MF_HEADER_RE did not match the actual header format in CAS PDFs.
  • Logic flow issues: When current_demat was None, relevant lines were skipped, causing incomplete MF data extraction.

Solution

  • Updated the regex to correctly detect Mutual Fund Folios headers.
  • Adjusted the parsing logic to ensure lines are not skipped when current_demat is None.
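The logic-flow point can be illustrated with a minimal sketch. This is not the parser's actual code: the patterns, the `split_sections` helper, and the dictionary layout are hypothetical and only show the idea of checking for a Mutual Fund Folios header before any guard that skips lines while no demat account is active.

```python
import re

# Hypothetical, simplified patterns; the real ones in the parser differ.
DEMAT_HEADER_RE = re.compile(r"^DP ID:\s*(\w+)", re.IGNORECASE)
MF_HEADER_RE = re.compile(r"^Mutual Fund Folios\s+(\d+)\s+Folios", re.IGNORECASE)


def split_sections(lines):
    """Group lines under the most recent section header."""
    sections = []
    current = None  # plays the role of current_demat in the description above
    for line in lines:
        demat = DEMAT_HEADER_RE.match(line)
        mf = MF_HEADER_RE.match(line)
        if demat:
            current = {"type": "demat", "id": demat.group(1), "lines": []}
            sections.append(current)
        elif mf:
            # Header checks run before the "current is None" guard, so an MF
            # section opens even when no demat account precedes it.
            current = {"type": "mf", "folios": int(mf.group(1)), "lines": []}
            sections.append(current)
        elif current is not None:
            current["lines"].append(line)
        # Lines that appear before any header are ignored.
    return sections
```

With this ordering, a statement that starts directly with a Mutual Fund Folios header still produces a section, instead of having its lines dropped because no demat account has been seen yet.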

Impact

This fix enables complete parsing of NSDL CAS statements containing Mutual Fund Folios alongside traditional demat accounts.

Signed-off-by: brf153 <153hsb@gmail.com>
@brf153
Contributor Author

brf153 commented Sep 21, 2025

Before fix:
Screenshot 2025-09-21 232509

After fix:
image

@codereverser
Owner

I'm not able to reproduce this issue. The current code works fine with all my test NSDL statement files.
For example:
image

image

Can you send me the list of packages in your env? I suspect something might've changed in the underlying parser modules.
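One way to produce such a list from inside the same interpreter is sketched below, using only the standard library. The `installed_packages` helper name is made up for illustration; running `pip freeze` in the environment gives equivalent information.

```python
from importlib import metadata


def installed_packages():
    """Return a {name: version} mapping for distributions visible to this interpreter."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            packages[name.lower()] = dist.version
    return packages


if __name__ == "__main__":
    for name, version in sorted(installed_packages().items()):
        print(f"{name}=={version}")
```

Running it with the interpreter that runs the parser avoids accidentally listing packages from a different environment.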

@brf153
Contributor Author

brf153 commented Sep 22, 2025

@codereverser I’m using pip install ., which should install the packages defined in pyproject.toml. I’m using the same .toml file that’s present in the main branch of the codebase. Could this be a parser issue? I got an error saying that I need to install the pymupdf package.

@brf153
Contributor Author

brf153 commented Sep 22, 2025

Screen.Recording.2025-09-22.100534.mp4

These are the packages

@codereverser
Owner

codereverser commented Sep 22, 2025 via email

@brf153
Contributor Author

brf153 commented Sep 22, 2025

I tried logging values as well. These are my observations.

This is the code used to get the MF data:
image

When I use the original regex
DEMAT_MF_HEADER_RE = r"Mutual Fund Folios\s+(\d+)\s+folios\s+(\d+)\s+([\d,.]+)"

image image

When I use the updated regex
DEMAT_MF_HEADER_RE = (
    r"(Mutual Fund Folios)\s+(\d+)\s+Folios"
    r"[\s\S]*?Total\s+\d+\s+[\d,.]+\s+[\d,.]+\s+([\d,.]+)"
)

image image
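The difference between the two patterns can be reproduced without a PDF. The sketch below runs both regexes from this comment against two made-up text layouts: one where the header and totals sit on a single line, and one where the header is followed by rows and a separate Total line. Both sample strings, the `header_groups` helper, and the use of `re.IGNORECASE` are assumptions for illustration; the text that PyMuPDF actually extracts may look different.

```python
import re

ORIGINAL_RE = r"Mutual Fund Folios\s+(\d+)\s+folios\s+(\d+)\s+([\d,.]+)"
UPDATED_RE = (
    r"(Mutual Fund Folios)\s+(\d+)\s+Folios"
    r"[\s\S]*?Total\s+\d+\s+[\d,.]+\s+[\d,.]+\s+([\d,.]+)"
)

# Made-up sample text, not taken from a real statement.
SINGLE_LINE = "Mutual Fund Folios 3 folios 5 1,23,456.78"
MULTI_LINE = (
    "Mutual Fund Folios\n"
    "3 Folios\n"
    "ISIN Scheme Units Cost Value\n"
    "Total 5 1,00,000.00 1,20,000.00 1,23,456.78\n"
)


def header_groups(pattern, text):
    """Return the captured groups if the pattern matches anywhere in text, else None."""
    match = re.search(pattern, text, re.IGNORECASE)
    return match.groups() if match else None


if __name__ == "__main__":
    for label, text in (("single-line", SINGLE_LINE), ("multi-line", MULTI_LINE)):
        print(label, "original:", header_groups(ORIGINAL_RE, text))
        print(label, "updated: ", header_groups(UPDATED_RE, text))
```

On these samples the original pattern matches only the single-line layout and the updated pattern matches only the multi-line layout, which would be consistent with the extracted text layout differing between environments.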

@brf153
Contributor Author

brf153 commented Sep 22, 2025

Hi, could you try it with an older version of pymupdf, from before 1.25 (for example, 1.24.14)? I think something is broken in the newer versions, and I'm trying to figure it out. Also, try installing casparser with the fast extra: pip install -U casparser[fast].


Sure, I will try it and let you know.

@brf153
Contributor Author

brf153 commented Sep 22, 2025

It installs the older version of pymupdf when I use the casparser[fast] command

image

It’s working fine with the casparser[fast] package. Thank you!

@codereverser
Owner

Thanks for checking!

Yeah, there's something broken with pymupdf 1.25+. I'm working on a new version with fixes and will release it soon.
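For anyone hitting the same symptom, a quick check of the installed pymupdf version against the 1.25 boundary mentioned here could look like the sketch below. The helper names are hypothetical, and the (1, 25) threshold is taken only from this thread, not from any official compatibility statement.

```python
import re
from importlib import metadata


def release_tuple(version):
    """Turn a version string like '1.24.14' into (1, 24, 14), ignoring suffixes."""
    parts = (re.match(r"\d+", piece) for piece in version.split("."))
    return tuple(int(m.group()) for m in parts if m)


def is_suspect(version, first_bad=(1, 25)):
    """True if the version is at or above the boundary discussed in this thread."""
    return release_tuple(version)[:2] >= first_bad


def installed_pymupdf_version():
    """Return the installed pymupdf version string, or None if it is not installed."""
    try:
        return metadata.version("pymupdf")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pymupdf_version()
    if found is None:
        print("pymupdf is not installed")
    elif is_suspect(found):
        print(f"pymupdf {found} is 1.25 or newer; consider testing with an older release")
    else:
        print(f"pymupdf {found} is older than 1.25")
```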

@brf153
Contributor Author

brf153 commented Sep 22, 2025

@codereverser Sure. You can check the updated regex in this PR; it might help with the changes in the latest version of the PyMuPDF package. Thanks for your help!
