Text Segmentation Task ( with custom dataset ) #248

sau-arv-gul · 2025-08-04T05:50:44Z

sau-arv-gul
Aug 4, 2025

I want to train BiRefNet for a text segmentation task.
I am using the TotalText dataset. It contains two folders, im and gt: im holds the input images, and gt holds the corresponding ground-truth masks. In the masks, text regions are white and all non-text areas are black.

ZhengPeng7 · 2025-08-04T05:54:57Z

ZhengPeng7
Aug 4, 2025
Maintainer

Yes, your process looks right. You can either train from scratch or fine-tune from pre-trained weights. I'm curious to see what performance you can achieve.

5 replies

sau-arv-gul Aug 4, 2025
Author

Thank you so much for replying!
I have set up a recent GPU (NVIDIA RTX 5090, 32 GB) in my Windows system. Since my machine has only one GPU, I do not need distributed training. With some changes to train.py (running the .sh file on Windows gave errors), I trained the model for 88 epochs, following your YouTube tutorial.

During training I noticed that the SSIM loss value is greater than 1:
2025-08-01 19:06:17,613 INFO Epoch[8/120] Iter[100/313]. Training Losses: bce: 16.546 | ssim: 1.8636 | mae: 10.14 | loss_pix: 114.2 |

However, the model saved at epoch 1 and the model saved at epoch 88 produce visually identical segmentation results on test images.

Left: input image. Right: its segmentation.

sau-arv-gul Aug 4, 2025
Author

There must be some issue with my training code.
The ground truth image is:

ZhengPeng7 Aug 4, 2025
Maintainer

Sorry, I cannot think of a likely cause with the given info. Did you train from scratch or fine-tune?
Can you give me the link to the dataset? Maybe someday I can try it.

sau-arv-gul Aug 4, 2025
Author

I tried both: once fine-tuning (from BiRefNet-general-epoch_244.pth) and once training from scratch. In both cases the result is the same (no segmentation).
Link to the dataset: https://drive.google.com/file/d/1EmyZMhB5yPMYMMsTu9flE6liihO-u1QI/view?usp=sharing

For the image below:

I got the following segmentation:

but the ground truth is:

ZhengPeng7 Aug 4, 2025
Maintainer

You should check the modifications you made (especially the dataloader-related code) and the custom data. I don't think my code would produce such results. To validate the original code, you can train on DIS5K or another smaller dataset for several epochs and see if the predictions are normal.
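One quick way to check the dataloader side is to inspect the image/label path pairs before training. The following is only a minimal sketch, not part of the BiRefNet code base: `find_suspicious_pairs` is a hypothetical helper, and it assumes you can obtain the two path lists that the dataset class builds.

```python
import os


def find_suspicious_pairs(image_paths, label_paths):
    """Return (image_path, reason) tuples for pairs that look wrong.

    Hypothetical helper for illustration; it is not a BiRefNet function.
    """
    problems = []
    for img, lbl in zip(image_paths, label_paths):
        # A label that points at the image itself means the mask is never loaded.
        if os.path.normpath(img) == os.path.normpath(lbl):
            problems.append((img, 'label path is identical to image path'))
        # A label path that does not exist on disk cannot be a valid mask.
        elif not os.path.isfile(lbl):
            problems.append((img, 'label file not found'))
    return problems


if __name__ == '__main__':
    # Example paths are made up; replace them with the lists from your dataset.
    for path, reason in find_suspicious_pairs(['a/im/1.jpg'], ['a/im/1.jpg']):
        print(f'{path}: {reason}')
```

An empty result does not prove the data is correct, but a non-empty one points directly at a pairing problem.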

sau-arv-gul · 2025-08-06T13:41:04Z

sau-arv-gul
Aug 6, 2025
Author

I am trying to fine-tune BiRefNet on my own custom dataset on my personal Windows system. The dataset folder is structured like this:

Workspace/
  └── datasets/
      └── dis/
          └── Custom/
              ├── TR-Custom/
              │   ├── gt/
              │   └── im/
              └── TE-Custom/
                  ├── gt/
                  └── im/

Changes I made in config.py:

self.sys_home_dir = r"C:\Users\user\Desktop\workspace"
self.task = 'Custom'
self.testsets = {'Custom': 'TE-Custom'}
self.training_set = {'Custom': 'TR-Custom'}
self.size = (512, 512)
self.compile = False
self.save_last = 50
self.save_step = 5

Changes I made in train.py:

os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:False'
pin_memory = False

I could not run the .sh scripts on Windows, so I directly executed train.py using a terminal and passed the arguments through argparse.

python train.py --ckpt_dir ckpt/SAVED --epochs 256 --dist False --resume C:\Users\user\Desktop\workspace\weights\cv\BiRefNet-general-resolution_512x512-fp16-epoch_216.pth

These were the ONLY modifications made to the code base.
The model trains for 50 epochs, and checkpoints are saved correctly. However, during inference, the outputs are just black-and-white versions of the input images; I don't see any visible segmentation or masks in the saved outputs.

Could you please help me understand what might be going wrong here? Do I need to set the task to Matting?

3 replies

ZhengPeng7 Aug 6, 2025
Maintainer

Thanks for providing the details. I'll try to take some time to set up a training with your data to see the results. If I can obtain normal predictions, the problem should be in your environment.

sau-arv-gul Aug 7, 2025
Author

If you get normal predictions, please share them with me; it would be very helpful. Please also suggest where to change the code so that it is clearer to me. Also, please keep in mind that I could not run the .sh scripts on Windows, so I executed train.py directly from a terminal and passed the arguments through argparse. Could that affect the results? Please let me know after trying. Thank you.

ZhengPeng7 Aug 7, 2025
Maintainer

I ran some training last night, both fine-tuning from BiRefNet-general-epoch_244.pth and training from scratch.
The results seem to be very good, as shown below. Training from scratch took 300 epochs and showed better performance. I've also uploaded the weights, results, and performance to my Google Drive (everything should be transferred within 1 hour). The links are as above and might be deleted in one week (download them quickly if you need them).

In summary, BiRefNet can easily achieve good performance on such text segmentation tasks. The problem is probably in your environment or in some modification to config.py. BTW, to keep modifications minimal, I just used the General task for text segmentation. Here are all the lines I changed in my project:

# config.py
# ln 13
self.task = ['DIS5K', 'COD', 'HRSOD', 'General', 'General-2K', 'Matting'][3]
# ln 20
'General': ','.join(['TE-Custom']),
# ln 35
self.size = (1024//2, 1024//2) if self.task not in ['General-2K'] else (2560, 1440)   # wid, hei. Can be overwritten by dynamic_size in training.

# train.sh
# ln 10
'General') epochs=300 && val_last=50 && step=5 ;;
# ln 22
resume_weights_path='BiRefNet-general-epoch_244.pth'

And the dataset folder structure:

I know you may be new to this, so I've taken some time to write out all the details. Hope this helps.

ZhengPeng7 · 2025-08-11T02:07:41Z

ZhengPeng7
Aug 11, 2025
Maintainer

Hi, @sau-arv-gul, how are your results now?

2 replies

sau-arv-gul Aug 11, 2025
Author

Hi @ZhengPeng7 !
Thanks a lot for training the model and sharing the checkpoints — they worked great!

I figured out why I wasn't getting segmented images from my own training on Windows. In dataset.py (line 64), the label path is built with this replacement:

 p_gt = p.replace('/im/', '/gt/')[:-(len(p.split('.')[-1])+1)] + ext

p.replace('/im/', '/gt/') works on Linux but not on Windows, where paths use backslashes, so the '/im/' → '/gt/' replacement never happens and the label path stays the same as the original image path.

So my labels were actually just grayscale versions of the input images themselves, not the real GT masks.

        image = path_to_image(self.image_paths[index], size=self.data_size, color_type='rgb')
        label = path_to_image(self.label_paths[index], size=self.data_size, color_type='gray')

Since training ran without any errors, I assumed it was working fine on my Windows OS, but the issue was on my side.

That’s why my outputs were only grayscale copies of the inputs. Now, the results are fantastic, and BiRefNet works incredibly well on my data.

I truly appreciate the time and effort you took to train the model. Thanks for helping me, and I’m really sorry for the error from my side.
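To make the failure concrete, here is a small sketch of the same substring replacement applied to a forward-slash path and a backslash path. The file name `0001.jpg` and the helper `swap_im_to_gt_naive` are made up for illustration, and the extension handling from dataset.py is left out.

```python
def swap_im_to_gt_naive(p: str) -> str:
    # Same substring replacement as in the quoted line; it only matches
    # when the directory is delimited by forward slashes.
    return p.replace('/im/', '/gt/')


# Hypothetical example paths (not taken from the thread).
linux_path = 'datasets/dis/Custom/TR-Custom/im/0001.jpg'
windows_path = r'datasets\dis\Custom\TR-Custom\im\0001.jpg'

linux_result = swap_im_to_gt_naive(linux_path)
windows_result = swap_im_to_gt_naive(windows_path)

print(linux_result)                    # datasets/dis/Custom/TR-Custom/gt/0001.jpg
print(windows_result == windows_path)  # True: nothing was replaced
```

Because `str.replace` silently returns the input unchanged when nothing matches, no error is raised and the label path ends up pointing at the image.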

Answer selected by ZhengPeng7

ZhengPeng7 Aug 11, 2025
Maintainer

Good to hear about that!
That's also a problem on my side. To be honest, I knew about this problem (you can use os.sep to automatically pick / or \\), but handling it would make the code look too complex.
And the grayscale images really are a coincidence. I'll recognize this next time someone asks me about it. Thanks!
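For readers who hit the same issue, one separator-independent way to derive the label path is to work on path components instead of substrings. This is only a sketch under assumptions, not the repository's implementation: `image_to_label_path` is a hypothetical name, `ext` is assumed to include the leading dot, and the `path_cls` parameter exists only so that both path styles can be demonstrated on any OS.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def image_to_label_path(p, ext='.png', path_cls=PurePath):
    """Swap the last 'im' directory for 'gt' and replace the file extension.

    Hypothetical helper; path_cls defaults to the current OS's path flavour.
    """
    path = path_cls(p)
    parts = list(path.parts)
    # Search the directory components (everything except the file name),
    # starting from the one closest to the file.
    for i in range(len(parts) - 2, -1, -1):
        if parts[i] == 'im':
            parts[i] = 'gt'
            break
    else:
        # Fail loudly instead of silently returning the image path.
        raise ValueError(f"no 'im' directory found in {p!r}")
    return str(path_cls(*parts).with_suffix(ext))


print(image_to_label_path(r'C:\data\TR-Custom\im\0001.jpg', path_cls=PureWindowsPath))
print(image_to_label_path('data/TR-Custom/im/0001.jpg', path_cls=PurePosixPath))
```

A shorter variant in the spirit of the os.sep suggestion would be `p.replace(os.sep + 'im' + os.sep, os.sep + 'gt' + os.sep)`, though that still does nothing, without warning, if a path mixes separator styles.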

sau-arv-gul · 2025-09-03T09:35:07Z

sau-arv-gul
Sep 3, 2025
Author

Hi ZhengPeng7, hope you're doing well.
I have a new task where I'm currently using the generic model BiRefNet-massive-epoch_240.pth for background removal. Could you please let me know if you have any better or updated models for background removal that you would suggest?

5 replies

ZhengPeng7 Sep 3, 2025
Maintainer

Hi, I don't have enough time, and I don't see demand from the community for new task-specific models that I could accomplish with limited effort.
But several people have indeed asked me about the OCR segmentation task. If you could give me a comprehensive list of existing OCR segmentation datasets under the MIT License, I may be able to spare some time to run a large training for it.

sau-arv-gul Sep 3, 2025
Author

Hey, thanks for your reply. We already have our own dataset of 1,800 samples. Is it large enough for the model to train effectively? I'm wondering whether we need to prepare more data or whether this will suffice.

ZhengPeng7 Sep 3, 2025
Maintainer

Yes, 1800 images should already be good enough, since your cases are unlikely to be a 100% general scenario.
But that's not enough for me to train a general model, and the dataset's license is not MIT.

sau-arv-gul Sep 3, 2025
Author

Hey @ZhengPeng7, I'm planning to create more data under the MIT License to expand our dataset. Could you advise how much data would be enough for us to start training a more general OCR segmentation model?

ZhengPeng7 Sep 3, 2025
Maintainer

It's hard to say. I don't think you need to do the best for all cases, since it is not really possible to include all of them. We can include as many data sources as possible; different people tend to have data from different sources.
