Package org.jsoup.parser
Class HtmlTreeBuilder
java.lang.Object
org.jsoup.parser.TreeBuilder
org.jsoup.parser.HtmlTreeBuilder
HTML Tree Builder; creates a DOM from Tokens.
-
Field Summary
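Fields
Modifier and Type, Field
private boolean baseUriSetFromDoc
private Element contextElement
private Token.EndTag emptyEnd
private FormElement formElement
private boolean fosterInserts
private boolean fragmentParsing
private boolean framesetOk
private Element headElement
private static final int maxQueueDepth
static final int MaxScopeSearchDepth
private static final int maxUsedFormattingElements
private HtmlTreeBuilderState originalState
private List<Token.Character> pendingTableCharacters
private final String[] specificScopeTarget
private HtmlTreeBuilderState state
(package private) static final String[] TagMathMlTextIntegration
(package private) static final String[] TagSearchButton
(package private) static final String[] TagSearchEndTags
(package private) static final String[] TagSearchList
(package private) static final String[] TagSearchSelectScope
(package private) static final String[] TagSearchSpecial
(package private) static final String[] TagSearchTableScope
(package private) static final String[] TagsSearchInScope
(package private) static final String[] TagSvgHtmlIntegration
(package private) static final String[] TagThoroughSearchEndTags
private ArrayList<HtmlTreeBuilderState> tmplInsertMode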
Fields inherited from class org.jsoup.parser.TreeBuilder
baseUri, currentToken, doc, parser, reader, seenTags, settings, stack, tokeniser, trackSourceRange
-
Constructor Summary
Constructors
HtmlTreeBuilder()
-
Method Summary
Modifier and Type, Method, Description
(package private) Element aboveOnStack(Element el)
(package private) void addPendingTableCharacters(…)
(package private) void checkActiveFormattingElements(…)
(package private) void clearFormattingElementsToLastMarker()
private void clearStackToContext(String... nodeNames)
    Removes elements from the stack until one of the supplied HTML elements is removed.
(package private) void clearStackToTableBodyContext()
(package private) void clearStackToTableContext()
(package private) void clearStackToTableRowContext()
(package private) void closeElement(String name)
(package private) Element createElementFor(Token.StartTag startTag, String namespace, boolean forcePreserveCase)
(package private) HtmlTreeBuilderState currentTemplateMode()
(package private) ParseSettings defaultSettings()
private void doInsertElement(Element el, Token token)
    Inserts the Element onto the stack.
(package private) void error(HtmlTreeBuilderState state)
(package private) boolean framesetOk()
(package private) void framesetOk(boolean framesetOk)
(package private) void generateImpliedEndTags()
(package private) void generateImpliedEndTags(boolean thorough)
    Pops HTML elements off the stack according to the implied end tag rules.
(package private) void generateImpliedEndTags(String excludeTag)
    13.2.6.3 Closing elements that have implied end tags. When the steps below require the UA to generate implied end tags, then, while the current node is a dd element, a dt element, an li element, an optgroup element, an option element, a p element, an rb element, an rp element, an rt element, or an rtc element, the UA must pop the current node off the stack of open elements.
(package private) Element getActiveFormattingElement(String nodeName)
(package private) String getBaseUri()
(package private) Document getDocument()
(package private) FormElement getFormElement()
(package private) Element getFromStack(String elName)
    Gets the nearest (lowest) HTML element with the given name from the stack.
(package private) Element getHeadElement()
(package private) List<Token.Character> getPendingTableCharacters()
getStack()
(package private) boolean inButtonScope(String targetName)
protected void initialiseParse(Reader input, String baseUri, Parser parser)
(package private) boolean inListItemScope(String targetName)
(package private) boolean inScope(…)
(package private) boolean inScope(…)
(package private) boolean inScope(…)
(package private) boolean inSelectScope(String targetName)
(package private) void insertCharacterNode(Token.Character characterToken)
    Inserts the provided character token into the current element.
(package private) void insertCharacterToElement(Token.Character characterToken, Element el)
    Inserts the provided character token into the provided element.
(package private) void insertCommentNode(Token.Comment token)
(package private) Element insertElementFor(Token.StartTag startTag)
    Inserts an HTML element for the given tag.
(package private) Element insertEmptyElementFor(Token.StartTag startTag)
(package private) Element insertForeignElementFor(Token.StartTag startTag, String namespace)
    Inserts a foreign element.
(package private) FormElement insertFormElement(Token.StartTag startTag, boolean onStack, boolean checkTemplateStack)
(package private) void insertInFosterParent(…)
(package private) void insertMarkerToFormattingElements()
(package private) void insertOnStackAfter(Element after, Element in)
private boolean inSpecificScope(String[] targetNames, String[] baseTypes, String[] extraTypes)
private boolean inSpecificScope(String targetName, String[] baseTypes, String[] extraTypes)
(package private) boolean inTableScope(String targetName)
protected boolean isContentForTagData(String normalName)
    (An internal method, visible for Element. For HTML parse, signals that script and style text should be treated as Data Nodes).
(package private) boolean isFosterInserts()
(package private) boolean isFragmentParsing()
(package private) static boolean isHtmlIntegration(…)
(package private) boolean isInActiveFormattingElements(…)
(package private) static boolean isMathmlTextIntegration(…)
private static boolean isSameFormattingElement(…)
(package private) static boolean isSpecial(…)
(package private) Element lastFormattingElement()
(package private) void markInsertionMode()
(package private) void maybeSetBaseUri(Element base)
(package private) HtmlTreeBuilder newInstance()
    Create a new copy of this TreeBuilder.
(package private) boolean onStack(…)
    Checks if there is an HTML element with the given name on the stack.
private static boolean onStack(…)
(package private) boolean onStack(…)
(package private) boolean onStackNot(String[] allowedTags)
    Tests if there is some element on the stack that is not in the provided set.
(package private) HtmlTreeBuilderState originalState()
parseFragment(String inputFragment, Element context, String baseUri, Parser parser)
(package private) Element popStackToClose(String elName)
    Pops the stack until the given HTML element is removed.
(package private) void popStackToClose(String... elNames)
    Pops the stack until one of the given HTML elements is removed.
(package private) Element popStackToCloseAnyNamespace(String elName)
    Pops the stack until an element with the supplied name is removed, irrespective of namespace.
(package private) HtmlTreeBuilderState popTemplateMode()
(package private) int positionOfElement(…)
protected boolean process(…)
(package private) boolean process(Token token, HtmlTreeBuilderState state)
(package private) void pushActiveFormattingElements(…)
(package private) void pushTemplateMode(…)
(package private) void pushWithBookmark(Element in, int bookmark)
(package private) void reconstructFormattingElements()
(package private) void removeFromActiveFormattingElements(…)
(package private) boolean removeFromStack(…)
(package private) Element removeLastFormattingElement()
(package private) void replaceActiveFormattingElement(Element out, Element in)
private static void replaceInQueue(ArrayList<Element> queue, Element out, Element in)
(package private) void replaceOnStack(Element out, Element in)
(package private) void resetBody()
    Places the body back onto the stack and moves to InBody, for cases in AfterBody / AfterAfterBody when more content comes.
(package private) boolean resetInsertionMode()
    Reset the insertion mode, by searching up the stack for an appropriate insertion mode.
(package private) void resetPendingTableCharacters()
(package private) void setFormElement(FormElement formElement)
(package private) void setFosterInserts(boolean fosterInserts)
(package private) void setHeadElement(Element headElement)
(package private) HtmlTreeBuilderState state()
(package private) int templateModeSize()
String toString()
(package private) void transition(HtmlTreeBuilderState state)
(package private) boolean useCurrentOrForeignInsert(Token token)
Methods inherited from class org.jsoup.parser.TreeBuilder
currentElement, currentElementIs, currentElementIs, defaultNamespace, error, error, onNodeClosed, onNodeInserted, parse, pop, processEndTag, processStartTag, processStartTag, push, runParser, tagFor, tagFor
-
Field Details
-
TagsSearchInScope
-
TagSearchList
-
TagSearchButton
-
TagSearchTableScope
-
TagSearchSelectScope
-
TagSearchEndTags
-
TagThoroughSearchEndTags
-
TagSearchSpecial
-
TagMathMlTextIntegration
-
TagSvgHtmlIntegration
-
MaxScopeSearchDepth
public static final int MaxScopeSearchDepth
-
state
-
originalState
-
baseUriSetFromDoc
private boolean baseUriSetFromDoc
-
headElement
-
formElement
-
contextElement
-
formattingElements
-
tmplInsertMode
-
pendingTableCharacters
-
emptyEnd
-
framesetOk
private boolean framesetOk
-
fosterInserts
private boolean fosterInserts
-
fragmentParsing
private boolean fragmentParsing
-
maxQueueDepth
private static final int maxQueueDepth
-
specificScopeTarget
-
maxUsedFormattingElements
private static final int maxUsedFormattingElements
-
-
Constructor Details
-
HtmlTreeBuilder
public HtmlTreeBuilder()
-
-
Method Details
-
defaultSettings
ParseSettings defaultSettings()
Specified by: defaultSettings in class TreeBuilder
-
newInstance
HtmlTreeBuilder newInstance()
Description copied from class: TreeBuilder
Create a new copy of this TreeBuilder.
Specified by: newInstance in class TreeBuilder
Returns: copy, ready for a new parse
-
initialiseParse
Overrides: initialiseParse in class TreeBuilder
-
parseFragment
Specified by: parseFragment in class TreeBuilder
-
process
Specified by: process in class TreeBuilder
-
useCurrentOrForeignInsert
-
isMathmlTextIntegration
-
isHtmlIntegration
-
process
-
transition
-
state
HtmlTreeBuilderState state()
-
markInsertionMode
void markInsertionMode()
-
originalState
HtmlTreeBuilderState originalState()
-
framesetOk
void framesetOk(boolean framesetOk)
-
framesetOk
boolean framesetOk()
-
getDocument
Document getDocument()
-
getBaseUri
String getBaseUri()
-
maybeSetBaseUri
-
isFragmentParsing
boolean isFragmentParsing()
-
error
-
createElementFor
-
insertElementFor
Inserts an HTML element for the given tag.
-
insertForeignElementFor
Inserts a foreign element. Preserves the case of the tag name and of the attributes.
-
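To illustrate the case-preservation note on insertForeignElementFor, here is a minimal, self-contained sketch. It assumes that names on the HTML path are normalised to lower case while foreign (for example SVG or MathML) names are kept as written. TagCaseSketch and tagName are hypothetical names used only for illustration; they are not part of jsoup.

```java
import java.util.Locale;

class TagCaseSketch {
    /** Returns the name to store: as written when preserveCase is true, lower-cased otherwise. */
    static String tagName(String rawName, boolean preserveCase) {
        String trimmed = rawName.trim();
        // Assumption: the non-preserving path lower-cases with a locale-independent rule.
        return preserveCase ? trimmed : trimmed.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(tagName("DIV", false));          // prints "div"
        System.out.println(tagName("foreignObject", true)); // prints "foreignObject"
    }
}
```

Keeping the original case matters for foreign content because names such as foreignObject are case-sensitive in their own vocabularies.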
insertEmptyElementFor
-
insertFormElement
-
doInsertElement
Inserts the Element onto the stack. All element inserts must run through this method. Performs any general tests on the Element before insertion.
Parameters:
el - the Element to insert and make the current element
token - the token this element was parsed from. If null, uses a zero-width current token as intrinsic insert
-
insertCommentNode
-
insertCharacterNode
Inserts the provided character token into the current element.
-
insertCharacterToElement
Inserts the provided character token into the provided element.
-
getStack
-
onStack
-
onStack
Checks if there is an HTML element with the given name on the stack.
-
onStack
-
getFromStack
Gets the nearest (lowest) HTML element with the given name from the stack.
-
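The lookup described for getFromStack can be sketched as a backwards scan. This is a simplified illustration, not the library's code: it assumes the stack is a list whose last entry is the current (most recently opened) element, and it uses plain element names in place of Element objects. GetFromStackSketch and nearestIndex are hypothetical names.

```java
import java.util.List;

class GetFromStackSketch {
    /** Returns the index of the nearest (most recently opened) entry named elName, or -1 if absent. */
    static int nearestIndex(List<String> stack, String elName) {
        // Scan from the current node back toward the root, so the lowest match wins.
        for (int pos = stack.size() - 1; pos >= 0; pos--) {
            if (stack.get(pos).equals(elName)) return pos;
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> stack = List.of("html", "body", "div", "p", "div", "span");
        System.out.println(nearestIndex(stack, "div"));   // prints 4, the inner div
        System.out.println(nearestIndex(stack, "table")); // prints -1
    }
}
```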
removeFromStack
-
popStackToClose
Pops the stack until the given HTML element is removed.
-
popStackToCloseAnyNamespace
Pops the stack until an element with the supplied name is removed, irrespective of namespace.
-
popStackToClose
Pops the stack until one of the given HTML elements is removed.
-
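The pop-until-removed behaviour described for the popStackToClose variants can be sketched as follows. This is an illustration under stated assumptions: element names stand in for Element objects, namespaces are ignored, and the case where no match exists (the stack is emptied and null is returned) is an assumption of this sketch. PopStackSketch, popToClose and popToCloseAnyOf are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PopStackSketch {
    /** Pops from the end until an entry named elName has been removed; returns it, or null if none matched. */
    static String popToClose(List<String> stack, String elName) {
        while (!stack.isEmpty()) {
            String el = stack.remove(stack.size() - 1);
            if (el.equals(elName)) return el; // the match itself is removed too
        }
        return null;
    }

    /** Pops from the end until an entry matching any of elNames has been removed. */
    static void popToCloseAnyOf(List<String> stack, String... elNames) {
        List<String> names = Arrays.asList(elNames);
        while (!stack.isEmpty()) {
            String el = stack.remove(stack.size() - 1);
            if (names.contains(el)) break;
        }
    }

    public static void main(String[] args) {
        List<String> stack = new ArrayList<>(List.of("html", "body", "ul", "li", "b"));
        System.out.println(popToClose(stack, "ul")); // prints "ul"
        System.out.println(stack);                   // prints [html, body]
    }
}
```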
clearStackToTableContext
void clearStackToTableContext()
-
clearStackToTableBodyContext
void clearStackToTableBodyContext()
-
clearStackToTableRowContext
void clearStackToTableRowContext()
-
clearStackToContext
Removes elements from the stack until one of the supplied HTML elements is removed.
-
aboveOnStack
-
insertOnStackAfter
-
replaceOnStack
-
replaceInQueue
-
resetInsertionMode
boolean resetInsertionMode()
Reset the insertion mode, by searching up the stack for an appropriate insertion mode. The stack search depth is limited to maxQueueDepth.
Returns: true if the insertion mode was actually changed.
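The bounded upward search and the changed-or-not return value can be sketched like this. It is a reduced illustration, not the actual implementation: the name-to-mode mapping covers only a handful of elements, the fallback to InBody is an assumption, and MAX_DEPTH is a hypothetical stand-in for maxQueueDepth, whose value is not shown on this page. ResetInsertionModeSketch, Mode, reset and modeFor are hypothetical names.

```java
import java.util.List;

class ResetInsertionModeSketch {
    // Hypothetical cap standing in for maxQueueDepth.
    static final int MAX_DEPTH = 256;

    enum Mode { InSelect, InCell, InRow, InTableBody, InCaption, InTable, InHead, InBody }

    Mode state = Mode.InBody;

    /** Walks from the current node toward the root; returns true only if the mode changed. */
    boolean reset(List<String> stack) {
        Mode original = state;
        int top = stack.size() - 1;
        int floor = Math.max(0, top - MAX_DEPTH); // never look further than MAX_DEPTH entries
        Mode found = Mode.InBody;                 // assumed fallback when nothing decides the mode
        for (int pos = top; pos >= floor; pos--) {
            Mode m = modeFor(stack.get(pos));
            if (m != null) { found = m; break; }
        }
        state = found;
        return state != original;
    }

    /** Reduced mapping from element name to insertion mode; null means "keep searching". */
    static Mode modeFor(String name) {
        switch (name) {
            case "select": return Mode.InSelect;
            case "td": case "th": return Mode.InCell;
            case "tr": return Mode.InRow;
            case "tbody": case "thead": case "tfoot": return Mode.InTableBody;
            case "caption": return Mode.InCaption;
            case "table": return Mode.InTable;
            case "head": return Mode.InHead;
            case "body": return Mode.InBody;
            default: return null;
        }
    }

    public static void main(String[] args) {
        ResetInsertionModeSketch tb = new ResetInsertionModeSketch();
        System.out.println(tb.reset(List.of("html", "body", "table", "tbody", "tr"))); // prints true
        System.out.println(tb.state);                                                  // prints InRow
    }
}
```

Returning a boolean lets a caller tell a real mode switch apart from a reset that left the mode where it was.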
-
resetBody
void resetBody()
Places the body back onto the stack and moves to InBody, for cases in AfterBody / AfterAfterBody when more content comes.
-
inSpecificScope
-
inSpecificScope
-
inScope
-
inScope
-
inScope
-
inListItemScope
-
inButtonScope
-
inTableScope
-
inSelectScope
-
onStackNot
Tests if there is some element on the stack that is not in the provided set.
-
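The check described for onStackNot amounts to "does any open element fall outside an allowed set". A minimal sketch, assuming element names stand in for Element objects and that the whole stack is scanned; OnStackNotSketch is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.List;

class OnStackNotSketch {
    /** Returns true if at least one entry on the stack is not listed in allowedTags. */
    static boolean onStackNot(List<String> stack, String[] allowedTags) {
        List<String> allowed = Arrays.asList(allowedTags);
        for (int pos = stack.size() - 1; pos >= 0; pos--) {
            if (!allowed.contains(stack.get(pos))) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String[] allowed = {"html", "body", "p", "li"};
        System.out.println(onStackNot(List.of("html", "body", "p"), allowed));   // prints false
        System.out.println(onStackNot(List.of("html", "body", "div"), allowed)); // prints true
    }
}
```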
setHeadElement
-
getHeadElement
Element getHeadElement()
-
isFosterInserts
boolean isFosterInserts()
-
setFosterInserts
void setFosterInserts(boolean fosterInserts)
-
getFormElement
FormElement getFormElement()
-
setFormElement
-
resetPendingTableCharacters
void resetPendingTableCharacters()
-
getPendingTableCharacters
List<Token.Character> getPendingTableCharacters()
-
addPendingTableCharacters
-
generateImpliedEndTags
13.2.6.3 Closing elements that have implied end tags
When the steps below require the UA to generate implied end tags, then, while the current node is a dd element, a dt element, an li element, an optgroup element, an option element, a p element, an rb element, an rp element, an rt element, or an rtc element, the UA must pop the current node off the stack of open elements.
If a step requires the UA to generate implied end tags but lists an element to exclude from the process, then the UA must perform the above steps as if that element was not in the above list.
When the steps below require the UA to generate all implied end tags thoroughly, then, while the current node is a caption element, a colgroup element, a dd element, a dt element, an li element, an optgroup element, an option element, a p element, an rb element, an rp element, an rt element, an rtc element, a tbody element, a td element, a tfoot element, a th element, a thead element, or a tr element, the UA must pop the current node off the stack of open elements.
Parameters:
excludeTag - If a step requires the UA to generate implied end tags but lists an element to exclude from the process, then the UA must perform the above steps as if that element was not in the above list.
-
generateImpliedEndTags
void generateImpliedEndTags()
-
generateImpliedEndTags
void generateImpliedEndTags(boolean thorough)
Pops HTML elements off the stack according to the implied end tag rules.
Parameters:
thorough - if we are thorough (includes table elements etc) or not
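The implied end tag rules quoted above can be sketched over a stack of element names. The two tag lists are copied from the quoted text; everything else is an assumption of this sketch: names stand in for Element objects, the last list entry is the current node, and ImpliedEndTagsSketch, generateExcluding and generate are hypothetical names rather than the library's methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ImpliedEndTagsSketch {
    static final Set<String> BASIC = new HashSet<>(Arrays.asList(
            "dd", "dt", "li", "optgroup", "option", "p", "rb", "rp", "rt", "rtc"));
    static final Set<String> THOROUGH = new HashSet<>(BASIC);
    static {
        // The thorough variant additionally closes table-related elements.
        THOROUGH.addAll(Arrays.asList(
                "caption", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr"));
    }

    /** Pops while the current node is in the basic list, treating excludeTag as if it were not listed. */
    static void generateExcluding(List<String> stack, String excludeTag) {
        while (!stack.isEmpty()) {
            String current = stack.get(stack.size() - 1);
            if (current.equals(excludeTag) || !BASIC.contains(current)) break;
            stack.remove(stack.size() - 1);
        }
    }

    /** Pops while the current node is in the basic list, or in the extended list when thorough is true. */
    static void generate(List<String> stack, boolean thorough) {
        Set<String> tags = thorough ? THOROUGH : BASIC;
        while (!stack.isEmpty() && tags.contains(stack.get(stack.size() - 1))) {
            stack.remove(stack.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(List.of("html", "body", "ul", "li", "p"));
        generateExcluding(a, "li");
        System.out.println(a); // prints [html, body, ul, li]

        List<String> b = new ArrayList<>(List.of("html", "body", "table", "tbody", "tr", "td", "p"));
        generate(b, true);
        System.out.println(b); // prints [html, body, table]
    }
}
```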
-
closeElement
-
isSpecial
-
lastFormattingElement
Element lastFormattingElement()
-
positionOfElement
-
removeLastFormattingElement
Element removeLastFormattingElement()
-
pushActiveFormattingElements
-
pushWithBookmark
-
checkActiveFormattingElements
-
isSameFormattingElement
-
reconstructFormattingElements
void reconstructFormattingElements()
-
clearFormattingElementsToLastMarker
void clearFormattingElementsToLastMarker()
-
removeFromActiveFormattingElements
-
isInActiveFormattingElements
-
getActiveFormattingElement
-
replaceActiveFormattingElement
-
insertMarkerToFormattingElements
void insertMarkerToFormattingElements()
-
insertInFosterParent
-
pushTemplateMode
-
popTemplateMode
HtmlTreeBuilderState popTemplateMode()
-
templateModeSize
int templateModeSize()
-
currentTemplateMode
HtmlTreeBuilderState currentTemplateMode()
-
toString
-
isContentForTagData
Description copied from class: TreeBuilder
(An internal method, visible for Element. For HTML parse, signals that script and style text should be treated as Data Nodes).
Overrides: isContentForTagData in class TreeBuilder
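The rule stated for isContentForTagData (script and style text is data, other text is ordinary text) can be sketched as a simple predicate on the normalised tag name. TagDataSketch and nodeKindFor are hypothetical names, and the returned strings are only labels for illustration.

```java
class TagDataSketch {
    /** True when text inside an element with this normalised name should be kept as raw data. */
    static boolean isContentForTagData(String normalName) {
        return normalName.equals("script") || normalName.equals("style");
    }

    /** Labels which kind of node the text content would become, per the rule above. */
    static String nodeKindFor(String parentNormalName) {
        return isContentForTagData(parentNormalName) ? "DataNode" : "TextNode";
    }

    public static void main(String[] args) {
        System.out.println(nodeKindFor("script")); // prints "DataNode"
        System.out.println(nodeKindFor("p"));      // prints "TextNode"
    }
}
```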
-