How Can I Retrieve Text Formatting (Font, Size, Style) from a PDF Using iTextSharp?

Jan 11, 2025 am 10:56 AM

How to extract text formatting using iTextSharp

iTextSharp extracts text efficiently, but its built-in extraction strategies return plain text and drop formatting details such as font, size, and color. To keep that information, you can supply a custom extraction strategy.

Customized text extraction strategy

The custom TextWithFontExtractionStategy class implements the ITextExtractionStrategy interface to capture formatting information. In the RenderText method:

  • It tracks the font name, faux-bold usage, the baseline position, and the font size of each chunk of text.
  • If any of these attributes changes, it closes the current HTML span tag and opens a new one with the corresponding styles.
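
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the original class: it assumes iTextSharp 5.x, treats text render mode 2 (fill, then stroke) as faux bold, approximates the font size as the distance from the baseline to the ascent line, and assumes the page uses default user-space units (points). The private field names are illustrative.

```csharp
using System.Globalization;
using System.Text;
using iTextSharp.text.pdf.parser;

// Sketch: emits a new HTML span whenever the font, size, or baseline changes.
public class TextWithFontExtractionStategy : ITextExtractionStrategy
{
    // PDF text render mode 2 = fill, then stroke (assumed here to mean faux bold).
    private const int FillThenStrokeText = 2;

    private readonly StringBuilder result = new StringBuilder();
    private bool spanOpen;          // true while a <span> is open
    private float lastBaselineY;
    private float lastFontSize;
    private string lastFont;

    public void RenderText(TextRenderInfo renderInfo)
    {
        string curFont = renderInfo.GetFont().PostscriptFontName;
        if (renderInfo.GetTextRenderMode() == FillThenStrokeText)
            curFont += "-Bold";

        // Approximate font size: vertical distance from baseline to ascent line.
        Vector baselineStart = renderInfo.GetBaseline().GetStartPoint();
        Vector ascentEnd = renderInfo.GetAscentLine().GetEndPoint();
        float curBaselineY = baselineStart[Vector.I2];
        float curFontSize = ascentEnd[Vector.I2] - curBaselineY;

        // Assumption: a changed baseline means the text moved to a new line.
        bool newLine = spanOpen && curBaselineY != lastBaselineY;
        if (!spanOpen || newLine || curFontSize != lastFontSize || curFont != lastFont)
        {
            if (spanOpen) result.Append("</span>");
            if (newLine) result.Append("<br />");
            result.AppendFormat(CultureInfo.InvariantCulture,
                "<span style=\"font-family:{0};font-size:{1}pt\">", curFont, curFontSize);
            spanOpen = true;
        }

        result.Append(renderInfo.GetText());
        lastBaselineY = curBaselineY;
        lastFontSize = curFontSize;
        lastFont = curFont;
    }

    public string GetResultantText()
    {
        // Close the trailing span once, even if this method is called repeatedly.
        if (spanOpen)
        {
            result.Append("</span>");
            spanOpen = false;
        }
        return result.ToString();
    }

    // Required by the interface but not needed for this sketch.
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderImage(ImageRenderInfo renderInfo) { }
}
```

Note that the sketch appends the extracted text without HTML-escaping it, so characters such as < or & in the PDF would need encoding before the output is used in a real page.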

Example usage

The following C# code demonstrates how to extract text and font-related formatting from a PDF:

// Requires: using System; using System.IO; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
string path = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Document.pdf");
PdfReader reader = new PdfReader(path);
TextWithFontExtractionStategy strategy = new TextWithFontExtractionStategy();
// Extract page 1; the custom strategy returns the text wrapped in styled HTML spans
string html = PdfTextExtractor.GetTextFromPage(reader, 1, strategy);
reader.Close();
Console.WriteLine(html);

The generated HTML wraps each run of text in a span tag whose inline styles describe the font family, font size, and font style.

Other considerations

  • PostscriptFontName may carry a prefix of extra characters (for example, ABCDEF+Helvetica), which typically comes from font subsetting.
  • The example code assumes that a change in the baseline marks a new line, which it represents as a line break in the HTML.
  • The extraction process does not currently capture color information, although there are indications that it can be retrieved manually.
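
If the extra characters in PostscriptFontName are the usual subset tag (six uppercase letters followed by a plus sign, as the PDF specification describes for subset fonts), a small helper could strip them before the name is written into the HTML. The following is a sketch under that assumption; FontNameHelper and StripSubsetPrefix are hypothetical names, not part of iTextSharp.

```csharp
using System;

public static class FontNameHelper
{
    // Returns the font name without a leading subset tag such as "ABCDEF+".
    // Names that do not match the six-uppercase-letters-plus-'+' pattern
    // are returned unchanged.
    public static string StripSubsetPrefix(string postscriptFontName)
    {
        if (string.IsNullOrEmpty(postscriptFontName)
            || postscriptFontName.Length < 8
            || postscriptFontName[6] != '+')
        {
            return postscriptFontName;
        }

        for (int i = 0; i < 6; i++)
        {
            char c = postscriptFontName[i];
            if (c < 'A' || c > 'Z')
            {
                return postscriptFontName;
            }
        }

        return postscriptFontName.Substring(7);
    }
}
```

For example, FontNameHelper.StripSubsetPrefix("ABCDEF+Helvetica") returns "Helvetica", while "Helvetica-Bold" is returned unchanged.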
