Package org.apache.pdfbox.pdfparser
Class PDFParser
java.lang.Object
org.apache.pdfbox.pdfparser.BaseParser
org.apache.pdfbox.pdfparser.COSParser
org.apache.pdfbox.pdfparser.PDFParser
- Direct Known Subclasses:
PreflightParser
-
Field Summary
Fields inherited from class org.apache.pdfbox.pdfparser.COSParser
EOF_MARKER, fileLen, initialParseDone, OBJ_MARKER, securityHandler, source, SYSPROP_EOFLOOKUPRANGE, SYSPROP_PARSEMINIMAL, TMP_FILE_PREFIX, xrefTrailerResolver
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, MAX_LENGTH_LONG, N, O, R, S, seqSource, STREAM_STRING, T
-
Constructor Summary
Constructors
- PDFParser(RandomAccessRead source): Constructor.
- PDFParser(RandomAccessRead source, String decryptionPassword): Constructor.
- PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore, String alias): Constructor.
- PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore, String alias, ScratchFile scratchFile): Constructor.
- PDFParser(RandomAccessRead source, String decryptionPassword, ScratchFile scratchFile): Constructor.
- PDFParser(RandomAccessRead source, ScratchFile scratchFile): Constructor.
-
Method Summary
Methods
- PDDocument getPDDocument(): This will get the PD document that was parsed.
- private void init(ScratchFile scratchFile)
- protected void initialParse(): The initial parse will first parse only the trailer, the startxref entry and all xref tables to have a pointer (offset) to all the PDF's objects.
- void parse(): This will parse the stream and populate the COSDocument object.
Methods inherited from class org.apache.pdfbox.pdfparser.COSParser
checkPages, getAccessPermission, getDocument, getEncryption, getStartxrefOffset, isCatalog, isLenient, lastIndexOf, parseCOSStream, parseDictObjects, parseFDFHeader, parseObjectDynamically, parseObjectDynamically, parsePDFHeader, parseTrailerValuesDynamically, parseXref, parseXrefTable, rebuildTrailer, retrieveTrailer, setEOFLookupRange, setLenient
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSString, parseDirObject, readExpectedChar, readExpectedString, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipSpaces, skipWhiteSpaces
-
Field Details
-
LOG
private static final org.apache.commons.logging.Log LOG
-
-
Constructor Details
-
PDFParser
public PDFParser(RandomAccessRead source) throws IOException
Constructor. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
source - source representing the PDF.
- Throws:
IOException - If something went wrong.
-
PDFParser
public PDFParser(RandomAccessRead source, ScratchFile scratchFile) throws IOException
Constructor.
- Parameters:
source - input representing the PDF.
scratchFile - use a ScratchFile for temporary storage.
- Throws:
IOException - If something went wrong.
-
PDFParser
public PDFParser(RandomAccessRead source, String decryptionPassword) throws IOException
Constructor. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
source - input representing the PDF.
decryptionPassword - password to be used for decryption.
- Throws:
IOException - If something went wrong.
-
PDFParser
public PDFParser(RandomAccessRead source, String decryptionPassword, ScratchFile scratchFile) throws IOException
Constructor.
- Parameters:
source - input representing the PDF.
decryptionPassword - password to be used for decryption.
scratchFile - use a ScratchFile for temporary storage.
- Throws:
IOException - If something went wrong.
-
PDFParser
public PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore, String alias) throws IOException
Constructor. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
source - input representing the PDF.
decryptionPassword - password to be used for decryption.
keyStore - key store to be used for decryption when using public key security.
alias - alias to be used for decryption when using public key security.
- Throws:
IOException - If something went wrong.
-
PDFParser
public PDFParser(RandomAccessRead source, String decryptionPassword, InputStream keyStore, String alias, ScratchFile scratchFile) throws IOException
Constructor.
- Parameters:
source - input representing the PDF.
decryptionPassword - password to be used for decryption.
keyStore - key store to be used for decryption when using public key security.
alias - alias to be used for decryption when using public key security.
scratchFile - buffer handler for temporary storage; it will be closed on COSDocument.close().
- Throws:
IOException - If something went wrong.
-
-
Method Details
-
init
private void init(ScratchFile scratchFile)
-
getPDDocument
public PDDocument getPDDocument() throws IOException
This will get the PD document that was parsed. When you are done with this document, you must call close() on it to release resources.
- Returns:
The document at the PD layer.
- Throws:
IOException - If there is an error getting the document.
-
initialParse
protected void initialParse() throws InvalidPasswordException, IOException
The initial parse will first parse only the trailer, the startxref entry and all xref tables to have a pointer (offset) to all the PDF's objects. It can handle linearized PDFs, which have an xref at the end pointing to an xref at the beginning of the file. Finally, the root object is parsed.
- Throws:
InvalidPasswordException - If the password is incorrect.
IOException - If something went wrong.
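
The offset lookup described for initialParse can be illustrated without PDFBox. The following is a minimal, stdlib-only sketch and not PDFBox's implementation: the class XrefSketch and its methods findStartXref, readXrefTable and sample are hypothetical names introduced here. It assumes a classic (non-stream) cross-reference table with a single subsection and ignores linearized files, xref streams and incremental updates.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only; not PDFBox code. All names are hypothetical. */
class XrefSketch {

    /** Returns the number that follows the last "startxref" keyword, or -1 if absent. */
    static long findStartXref(byte[] pdf) {
        // ISO-8859-1 maps bytes to chars one-to-one, so string indexes equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int marker = text.lastIndexOf("startxref");
        if (marker < 0) {
            return -1;
        }
        int pos = marker + "startxref".length();
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        int end = pos;
        while (end < text.length() && Character.isDigit(text.charAt(end))) {
            end++;
        }
        return end > pos ? Long.parseLong(text.substring(pos, end)) : -1;
    }

    /** Reads one xref subsection at xrefOffset: object number -> byte offset (in-use entries only). */
    static Map<Integer, Long> readXrefTable(byte[] pdf, long xrefOffset) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        String[] lines = text.substring((int) xrefOffset).split("\\r?\\n|\\r");
        Map<Integer, Long> offsets = new LinkedHashMap<>();
        if (lines.length < 2 || !lines[0].trim().equals("xref")) {
            return offsets;
        }
        String[] header = lines[1].trim().split("\\s+"); // "<first object number> <entry count>"
        int first = Integer.parseInt(header[0]);
        int count = Integer.parseInt(header[1]);
        for (int i = 0; i < count && i + 2 < lines.length; i++) {
            String[] entry = lines[i + 2].trim().split("\\s+"); // "<offset> <generation> n|f"
            if (entry.length == 3 && entry[2].equals("n")) {
                offsets.put(first + i, Long.parseLong(entry[0]));
            }
        }
        return offsets;
    }

    /** Builds a tiny, hand-made PDF skeleton used only to exercise the sketch. */
    static byte[] sample() {
        String body = "%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n";
        String xref = "xref\n0 2\n0000000000 65535 f \n"
                + String.format("%010d 00000 n \n", 9) // object 1 starts right after the 9-byte header line
                + "trailer\n<< /Size 2 /Root 1 0 R >>\n";
        String tail = "startxref\n" + body.length() + "\n%%EOF\n";
        return (body + xref + tail).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] pdf = sample();
        long start = findStartXref(pdf);
        System.out.println("startxref offset: " + start);
        System.out.println("object offsets:   " + readXrefTable(pdf, start));
    }
}
```

Reading the table from the end of the file first is what lets a parser jump straight to individual objects instead of scanning the whole file; a real parser additionally has to follow /Prev links and handle damaged or missing tables.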
-
parse
public void parse() throws InvalidPasswordException, IOException
This will parse the stream and populate the COSDocument object. This will close the keystore stream when it is done parsing.
- Throws:
InvalidPasswordException - If the password is incorrect.
IOException - If there is an error reading from the stream or corrupt data is found.
-