Thai fonts render incorrectly #153

@jontelang

Summary of the issue

When generating PDFs with Thai fonts, some characters do not render properly. As I understand it, these fonts rely on the GSUB table for glyph substitution.

The Thai unit test uses the Sarabun font, which has this capability; see the examples here:

[Screenshot: CleanShot 2024-10-04 at 13 58 00@2x]

I would be happy to investigate this further, but I do not have much experience with PDFs, fonts, and so on.

Environment

iTextSharp.LGPLv2.Core version: Latest, 3.2.1
.NET Core SDK version: 8.0.400
IDE: Visual Studio 2022

Example code/Steps to reproduce:

Run the test for Thai PDF creation, or generate any PDF containing this sequence of Thai characters: ญู (ญ + ู)

Output:

Here is a screenshot of a PDF generated with some strings that require glyph substitution. I used two different fonts (Sarabun and ChatThaiUI); both support the substitution.

[Screenshot: CleanShot 2024-10-04 at 13 53 59@2x]

Testing done

I debugged and, as far as I could tell, the loaded font contains a GSUB table. I am not sure how to proceed from here, nor whether the substitution functionality is missing or I simply cannot find it.
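One way to double-check the GSUB observation independently of the library is to read the table directory at the start of the font file. The following is a minimal sketch, assuming a single-font .ttf/.otf file (not a .ttc collection); `FontTableLister` and `ListTables` are hypothetical names and are not part of iTextSharp.LGPLv2.Core.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: lists the table tags in an sfnt (TrueType/OpenType) font.
public static class FontTableLister
{
    public static List<string> ListTables(Stream font)
    {
        // sfnt header: 4-byte version, 2-byte numTables, then three 2-byte search fields.
        var header = new byte[12];
        ReadFully(font, header);
        int numTables = (header[4] << 8) | header[5]; // big-endian uint16

        // Each table record is 16 bytes: tag, checksum, offset, length.
        var tags = new List<string>(numTables);
        var record = new byte[16];
        for (int i = 0; i < numTables; i++)
        {
            ReadFully(font, record);
            tags.Add(Encoding.ASCII.GetString(record, 0, 4));
        }
        return tags;
    }

    // Stream.Read may return fewer bytes than requested, so loop until the buffer is full.
    private static void ReadFully(Stream stream, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = stream.Read(buffer, offset, buffer.Length - offset);
            if (read == 0) throw new EndOfStreamException("Unexpected end of font data.");
            offset += read;
        }
    }
}
```

Calling `ListTables` on a stream opened over the font file and checking whether the result contains "GSUB" only confirms that the table exists in the font; it says nothing about whether the PDF library applies it.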

If I perform the "glyph substitution" manually, by swapping in a font-specific code point for the base character, both fonts render the correct glyph. I am not sure whether this tells us anything, though.

var originalText = "ทญ ญุ ญู อำ อ้ อ้ำ ฐ ฐุ ฐู ได้";

// Sarabun: replace ญ (U+0E0D) before ู (U+0E39) with the font's alternate glyph at U+F70F.
var sarabunFont = TestUtils.GetUnicodeFont("THSarabunNew", ...
var sarabunText = originalText.Replace("\u0E0D\u0E39", "\uF70F\u0E39");
pdfDoc.Add(new Paragraph(sarabunText, sarabunFont));

// ChatThaiUI: same replacement, but this font keeps its alternate glyph at U+EE0D.
var chatThaiFont = TestUtils.GetUnicodeFont("CSChatThaiUI", ...
var chatThaiText = originalText.Replace("\u0E0D\u0E39", "\uEE0D\u0E39");
pdfDoc.Add(new Paragraph(chatThaiText, chatThaiFont));

This then renders those specific words correctly:

[Screenshot: CleanShot 2024-10-04 at 14 07 11@2x]
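For illustration, the manual workaround could be wrapped in a small lookup helper so the per-font replacements live in one place. This is only a sketch under stated assumptions: `ThaiGlyphWorkaround` and `Apply` are hypothetical names, not part of iTextSharp.LGPLv2.Core, and the table holds only the two replacements shown in this issue. The private-use code points are font specific, so other fonts and other character sequences would need their own entries.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: applies hand-collected, font-specific replacements before rendering.
public static class ThaiGlyphWorkaround
{
    // Font name -> (original sequence -> replacement sequence).
    // Only the two mappings reported in this issue are included.
    private static readonly Dictionary<string, Dictionary<string, string>> Substitutions =
        new Dictionary<string, Dictionary<string, string>>
        {
            ["THSarabunNew"] = new Dictionary<string, string> { ["\u0E0D\u0E39"] = "\uF70F\u0E39" },
            ["CSChatThaiUI"] = new Dictionary<string, string> { ["\u0E0D\u0E39"] = "\uEE0D\u0E39" },
        };

    public static string Apply(string text, string fontName)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // Unknown fonts are left untouched rather than guessed at.
        if (fontName == null || !Substitutions.TryGetValue(fontName, out var map))
        {
            return text;
        }

        foreach (var pair in map)
        {
            text = text.Replace(pair.Key, pair.Value);
        }
        return text;
    }
}
```

The result of `ThaiGlyphWorkaround.Apply(originalText, "THSarabunNew")` could then be passed to `new Paragraph(...)` in place of the inline `Replace` call. This works around the symptom only; it does not make the library read the font's GSUB table.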

Bonus issue, potentially related

There also appears to be a positioning issue: the diacritics are placed too high. For example, in the last word the diacritic sits too far up compared to, e.g., the browser rendering: ได้. This may indicate a similar rendering issue.
