Class LegacyPDFStreamEngine

java.lang.Object
org.apache.pdfbox.contentstream.PDFStreamEngine
org.apache.pdfbox.text.LegacyPDFStreamEngine
Direct Known Subclasses:
PDFMarkedContentExtractor, PDFTextStripper

class LegacyPDFStreamEngine extends PDFStreamEngine
LEGACY text calculations that are known to be incorrect but that PDFTextStripper depends on. This class exists only to avoid breaking the code of users who have their own subclasses of PDFTextStripper. It replaces the mostly empty implementation of showGlyph() in PDFStreamEngine with a heuristic implementation that is backwards compatible. DO NOT USE THIS CODE UNLESS YOU ARE WORKING WITH PDFTextStripper. THIS CODE IS DELIBERATELY INCORRECT; USE PDFStreamEngine INSTEAD.
  • Field Details

    • LOG

      private static final org.apache.commons.logging.Log LOG
    • pageRotation

      private int pageRotation
    • pageSize

      private PDRectangle pageSize
    • translateMatrix

      private Matrix translateMatrix
    • GLYPHLIST

      private static final GlyphList GLYPHLIST
    • fontHeightMap

      private final Map<COSDictionary,Float> fontHeightMap
  • Constructor Details

  • Method Details

    • processPage

      public void processPage(PDPage page) throws IOException
      Initializes the engine for the given page and processes the page's content stream.
      Overrides:
      processPage in class PDFStreamEngine
      Parameters:
      page - the page to process
      Throws:
      IOException - if there is an error accessing the stream.
    • showGlyph

      protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement) throws IOException
      Called when a glyph is to be processed. The heuristic calculations here were originally written by Ben Litchfield for PDFStreamEngine.
      Overrides:
      showGlyph in class PDFStreamEngine
      Parameters:
      textRenderingMatrix - the current text rendering matrix, Trm
      font - the current font
      code - internal PDF character code for the glyph
      unicode - the Unicode text for this glyph, or null if the PDF does not provide it
      displacement - the displacement (i.e. advance) of the glyph in text space
      Throws:
      IOException - if the glyph cannot be processed
    • computeFontHeight

      protected float computeFontHeight(PDFont font) throws IOException
      Computes the font height. Override this method if you want to use your own calculation.
      Parameters:
      font - the font.
      Returns:
      the font height.
      Throws:
      IOException - if there is an error while getting the font bounding box.
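      The override point described above can be sketched without PDFBox on the classpath. The example below is only a model under stated assumptions: FakeFont and Engine are hypothetical stand-ins (not PDFBox classes), the per-font cache is inferred from the fontHeightMap field, and the default formula is an illustrative placeholder rather than the library's actual heuristic.

```java
import java.util.HashMap;
import java.util.Map;

public class FontHeightSketch {
    /** Hypothetical stand-in for a font; NOT a PDFBox class. */
    static final class FakeFont {
        final String name;
        final float bboxHeight; // bounding box height in glyph units (1/1000 em)

        FakeFont(String name, float bboxHeight) {
            this.name = name;
            this.bboxHeight = bboxHeight;
        }
    }

    /** Hypothetical stand-in for the engine: computes a height once per font. */
    static class Engine {
        // Assumption: heights are cached per font, similar in spirit to fontHeightMap.
        private final Map<FakeFont, Float> fontHeightCache = new HashMap<>();
        int computeCalls = 0;

        final float heightFor(FakeFont font) {
            Float cached = fontHeightCache.get(font);
            if (cached == null) {
                computeCalls++;
                cached = computeFontHeight(font);
                fontHeightCache.put(font, cached);
            }
            return cached;
        }

        /** Illustrative default only: half the bounding box height, scaled to em units. */
        protected float computeFontHeight(FakeFont font) {
            return font.bboxHeight / 2f / 1000f;
        }
    }

    /** A subclass that swaps in its own calculation: the full bounding box height. */
    static class FullBoxEngine extends Engine {
        @Override
        protected float computeFontHeight(FakeFont font) {
            return font.bboxHeight / 1000f;
        }
    }

    public static void main(String[] args) {
        FakeFont font = new FakeFont("Example", 1000f);
        Engine base = new Engine();
        Engine custom = new FullBoxEngine();
        System.out.println("default height: " + base.heightFor(font));
        System.out.println("custom height:  " + custom.heightFor(font));
        base.heightFor(font); // second lookup is served from the cache
        System.out.println("compute calls:  " + base.computeCalls);
    }
}
```

      In real code the override would live in a subclass of PDFTextStripper and take a PDFont; the sketch only shows the shape of the extension point.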
    • processTextPosition

      protected void processTextPosition(TextPosition text)
      An event hook that subclasses can override to perform their own handling each time a piece of text is processed.
      Parameters:
      text - The text to be processed.
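      The event-hook pattern behind processTextPosition can be modelled in plain Java. This is a minimal sketch, not PDFBox code: FakeTextPosition and Engine are hypothetical stand-ins, and the sketch assumes the base hook does nothing and that the glyph handler calls the hook once per glyph.

```java
import java.util.ArrayList;
import java.util.List;

public class TextHookSketch {
    /** Hypothetical stand-in for a positioned piece of text; NOT a PDFBox class. */
    static final class FakeTextPosition {
        final String unicode;
        final float x;
        final float y;

        FakeTextPosition(String unicode, float x, float y) {
            this.unicode = unicode;
            this.x = x;
            this.y = y;
        }
    }

    /** Hypothetical stand-in for the engine that fires the hook. */
    static class Engine {
        // Assumption: each processed glyph results in one hook call.
        void showGlyph(String unicode, float x, float y) {
            processTextPosition(new FakeTextPosition(unicode, x, y));
        }

        /** Hook for subclasses; assumed to be a no-op in the base class. */
        protected void processTextPosition(FakeTextPosition text) {
            // intentionally empty
        }
    }

    /** A subclass that records every text event it receives. */
    static class CollectingEngine extends Engine {
        final List<FakeTextPosition> seen = new ArrayList<>();

        @Override
        protected void processTextPosition(FakeTextPosition text) {
            seen.add(text);
        }

        /** Concatenates the recorded text in arrival order. */
        String text() {
            StringBuilder sb = new StringBuilder();
            for (FakeTextPosition tp : seen) {
                sb.append(tp.unicode);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        CollectingEngine engine = new CollectingEngine();
        engine.showGlyph("H", 10f, 700f);
        engine.showGlyph("i", 16f, 700f);
        System.out.println(engine.text());
    }
}
```

      With PDFBox itself, the same idea would be an override of processTextPosition(TextPosition) in a PDFTextStripper subclass; a subclass that still wants the stripper's normal text extraction would typically also call super.processTextPosition(text).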