


How Can I Retrieve Text Formatting (Font, Size, Style) from a PDF Using iTextSharp?
Jan 11, 2025, 10:56 AM
How to extract text formatting using iTextSharp
iTextSharp's built-in text extraction (PdfTextExtractor with its default strategies) returns plain text only, so formatting details such as font, color, and size are lost. One way around this limitation is to supply a custom extraction strategy.
Custom text extraction strategy
The custom TextWithFontExtractionStategy class implements the ITextExtractionStrategy interface so that it can capture formatting information. Its RenderText method, which iTextSharp calls for each piece of text it renders, works as follows:
- It tracks the font name, faux-bold rendering (bold simulated by stroking the glyph outlines), the baseline position, and the font size.
- Whenever any of these attributes changes, it closes the current HTML <span> and opens a new one with the corresponding styles.
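The change-detection logic described above can be sketched without iTextSharp itself, which makes it easy to run and test in isolation. This is a minimal sketch under stated assumptions, not the original class: ChunksToHtml and its tuple fields are hypothetical names, and each tuple stands in for the data a single RenderText call would see. In an actual ITextExtractionStrategy, the font name would typically come from renderInfo.GetFont().PostscriptFontName, the render mode from renderInfo.GetTextRenderMode(), and the baseline y-coordinate from renderInfo.GetBaseline().GetStartPoint()[Vector.I2]; the font size is commonly approximated from the height between the baseline and renderInfo.GetAscentLine(). Render mode 2 is the PDF "fill then stroke" text mode, treated here as faux bold.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Net;
using System.Text;

// Hypothetical helper: each tuple stands in for the data one RenderText call would see.
// RenderMode 2 ("fill then stroke") is treated as faux bold by appending "-Bold" to the font name.
static string ChunksToHtml(
    IEnumerable<(string Font, float Size, float BaselineY, int RenderMode, string Text)> chunks)
{
    var html = new StringBuilder();
    string lastFont = "";
    float lastSize = 0f;
    float? lastBaseline = null; // null until the first chunk has been written

    foreach (var chunk in chunks)
    {
        string font = chunk.RenderMode == 2 ? chunk.Font + "-Bold" : chunk.Font;
        bool isFirst = lastBaseline == null;
        bool newLine = !isFirst && chunk.BaselineY != lastBaseline;

        if (isFirst || newLine || font != lastFont || chunk.Size != lastSize)
        {
            if (!isFirst) html.Append("</span>"); // close the previous run
            if (newLine) html.Append("<br />");   // assumption: a new baseline means a new line
            html.AppendFormat(CultureInfo.InvariantCulture,
                "<span style=\"font-family:{0};font-size:{1}pt\">", font, chunk.Size);
        }

        html.Append(WebUtility.HtmlEncode(chunk.Text)); // escape <, >, & in the page text
        lastFont = font;
        lastSize = chunk.Size;
        lastBaseline = chunk.BaselineY;
    }

    if (lastBaseline != null) html.Append("</span>"); // close the final run
    return html.ToString();
}

Console.WriteLine(ChunksToHtml(new[]
{
    ("Arial", 12f, 700f, 0, "Hello "),
    ("Arial", 12f, 700f, 0, "world"),
    ("Arial", 12f, 680f, 2, "Bold line"),
}));
```

Comparing baselines with exact float equality follows the approach described; in practice a small tolerance may be more robust, because glyphs on the same visual line can report slightly different y-values.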
Example usage
The following C# code runs the custom strategy over the first page of a PDF and prints the resulting HTML:
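    using System;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    // System.IO.Path is fully qualified to avoid a clash with iTextSharp's parser Path type
    string path = System.IO.Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Document.pdf");
    PdfReader reader = new PdfReader(path);
    try
    {
        // TextWithFontExtractionStategy is the custom strategy class described above
        var strategy = new TextWithFontExtractionStategy();
        string html = PdfTextExtractor.GetTextFromPage(reader, 1, strategy); // pages are 1-based
        Console.WriteLine(html);
    }
    finally
    {
        reader.Close(); // release the underlying file
    }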
The generated HTML wraps each run of text in a <span> whose inline style gives the font family and font size; style information such as bold is conveyed through the font name.
Other considerations
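- PostscriptFontName may contain extra characters; these are typically the subset tag that PDF producers prepend to the names of embedded font subsets.
- The code assumes that a change in baseline marks a new line and emits a <br /> in the HTML, so superscripts, subscripts, or multi-column layouts may produce spurious line breaks.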
- Text color is not captured; it may be possible to add it manually, but that is not covered here.
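On the subset-prefix point: the PDF specification (ISO 32000-1) tags an embedded font subset by prefixing its PostScript name with exactly six uppercase letters and a plus sign, as in "EOODIA+Poetica". A minimal sketch for stripping that tag before writing the font-family value, assuming the name follows the specification; StripSubsetTag is a hypothetical helper name, not part of iTextSharp:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes a leading subset tag such as "ABCDEF+" from a PostScript font name.
// Per the PDF specification the tag is exactly six uppercase letters followed by '+'.
static string StripSubsetTag(string postscriptName) =>
    Regex.Replace(postscriptName, @"^[A-Z]{6}\+", "");

Console.WriteLine(StripSubsetTag("EOODIA+Poetica")); // Poetica
Console.WriteLine(StripSubsetTag("Arial-BoldMT"));   // Arial-BoldMT (no subset tag)
```

Inside the strategy, the helper would be applied to the PostscriptFontName value before it is compared with the previous font or written into the style attribute, so that two subsets of the same font are not treated as different fonts.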