DOM Usage
Navigating the tree
The most important part of working with a tree is navigating between its nodes and interacting with them, especially after parsing a document.
Each node may contain text and other nodes. MarkupEver provides numerous properties, methods, and iterators to help you work with and navigate between them.
Consider this document as a starting point:
import markupever
dom: markupever.dom.TreeDom = markupever.parse(
"""
<note><to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body></note>
""",
markupever.XmlOptions()
)
root = dom.root()
# Document
# └── Element(name=QualName(local="note"), attrs=[], template=false, integration_point=false)
#     ├── Element(name=QualName(local="to"), attrs=[], template=false, integration_point=false)
#     │   └── Text(content="Tove")
#     ├── Text(content="\n ")
#     ├── Element(name=QualName(local="from"), attrs=[], template=false, integration_point=false)
#     │   └── Text(content="Jani")
#     ├── Text(content="\n ")
#     ├── Element(name=QualName(local="heading"), attrs=[], template=false, integration_point=false)
#     │   └── Text(content="Reminder")
#     ├── Text(content="\n ")
#     └── Element(name=QualName(local="body"), attrs=[], template=false, integration_point=false)
#         └── Text(content="Don't forget me this weekend!")
Let's look at the first_child, last_child, parent, next_sibling, and prev_sibling properties:
- first_child: retrieves the first child of the given node. If the node has no children, it returns None.
- last_child: retrieves the last child of the given node. If the node has no children, it returns None.
- parent: retrieves the parent of the given node. If the node has no parent (for example, it is the root), it returns None.
- next_sibling: retrieves the next sibling of the given node. If there is no next sibling, it returns None.
- prev_sibling: retrieves the previous sibling of the given node. If there is no previous sibling, it returns None.
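To make the semantics of these five properties concrete, here is a small pure-Python model. The Node class and its append helper are hypothetical and are not part of MarkupEver; they only reproduce the shape of the note tree above (four elements separated by whitespace text nodes) and the None results at the edges of the tree.

```python
from typing import List, Optional


class Node:
    """Hypothetical toy node, NOT MarkupEver's class: it only models the link properties."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.parent: Optional["Node"] = None  # stays None for the root
        self._children: List["Node"] = []

    def append(self, child: "Node") -> "Node":
        # Hypothetical helper used only to build the example tree.
        child.parent = self
        self._children.append(child)
        return child

    @property
    def first_child(self) -> Optional["Node"]:
        return self._children[0] if self._children else None

    @property
    def last_child(self) -> Optional["Node"]:
        return self._children[-1] if self._children else None

    def _sibling(self, offset: int) -> Optional["Node"]:
        # Siblings are the other children of the same parent; the root has none.
        if self.parent is None:
            return None
        siblings = self.parent._children
        index = siblings.index(self) + offset
        return siblings[index] if 0 <= index < len(siblings) else None

    @property
    def next_sibling(self) -> Optional["Node"]:
        return self._sibling(1)

    @property
    def prev_sibling(self) -> Optional["Node"]:
        return self._sibling(-1)


# Same shape as the parsed <note> tree: elements separated by whitespace text nodes.
document = Node("Document")
note = document.append(Node("note"))
for name in ("to", "from", "heading", "body"):
    if note.last_child is not None:
        note.append(Node("#text"))  # models Text(content="\n ")
    note.append(Node(name))

print(note.first_child.label)               # to
print(note.last_child.label)                # body
print(note.first_child.next_sibling.label)  # #text
print(note.first_child.prev_sibling)        # None
print(document.parent)                      # None
```

Note that whitespace text nodes count as siblings too, which is why the next sibling of the first element is a text node rather than the from element.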
These properties are useful, but they might not always meet your needs. In such cases, the methods .children(), .ancestors(), .prev_siblings(), .next_siblings(), .first_children(), .last_children(), .traverse(), and .descendants() provide additional functionality:
- children() - Returns an iterator over the node's children.
- ancestors() - Returns an iterator over the node's ancestors (its parent, that parent's parent, and so on).
- prev_siblings() - Returns an iterator over the node's previous siblings.
- next_siblings() - Returns an iterator over the node's next siblings.
- first_children() - Returns an iterator over first children (the node's first child, that child's first child, and so on).
- last_children() - Returns an iterator over last children (the node's last child, that child's last child, and so on).
- traverse() - Returns an iterator that traverses the node's subtree.
- descendants() - Returns an iterator over the node itself and all of its descendants.
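The behavior of each iterator is easiest to see on a small tree. The sketch below is a hypothetical pure-Python model, not MarkupEver's implementation. Its method names match those of the Rust ego_tree crate, and it follows that crate's conventions: ancestors() and the sibling iterators exclude the node itself and start from the nearest node, descendants() includes the node itself in document order, and traverse() reports each node once when it opens and once when it closes. Whether MarkupEver yields exactly these nodes in this order, and the (node, closing) tuple shape used for traverse(), are assumptions; check the API reference before relying on them.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


class Node:
    """Hypothetical toy node, NOT MarkupEver's class: models the iterator methods."""

    def __init__(self, label: str, *children: "Node") -> None:
        self.label = label
        self.parent: Optional["Node"] = None
        self._children: List["Node"] = list(children)
        for child in children:
            child.parent = self

    def children(self) -> Iterator["Node"]:
        yield from self._children

    def ancestors(self) -> Iterator["Node"]:
        # Parent, grandparent, ... up to the root; the node itself is excluded.
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def _position(self) -> Tuple[List["Node"], int]:
        # A parentless node is treated as its own only sibling.
        if self.parent is None:
            return [self], 0
        siblings = self.parent._children
        return siblings, siblings.index(self)

    def prev_siblings(self) -> Iterator["Node"]:
        # Nearest previous sibling first.
        siblings, index = self._position()
        yield from reversed(siblings[:index])

    def next_siblings(self) -> Iterator["Node"]:
        siblings, index = self._position()
        yield from siblings[index + 1:]

    def first_children(self) -> Iterator["Node"]:
        # First child, then its first child, and so on down to a leaf.
        node = self
        while node._children:
            node = node._children[0]
            yield node

    def last_children(self) -> Iterator["Node"]:
        # Last child, then its last child, and so on down to a leaf.
        node = self
        while node._children:
            node = node._children[-1]
            yield node

    def traverse(self) -> Iterator[Tuple["Node", bool]]:
        # Assumed shape: (node, False) when a node opens, (node, True) when it closes.
        yield self, False
        for child in self._children:
            yield from child.traverse()
        yield self, True

    def descendants(self) -> Iterator["Node"]:
        # The node itself first, then every node below it, in document order.
        for node, closing in self.traverse():
            if not closing:
                yield node


def labels(nodes: Iterable[Node]) -> List[str]:
    return [node.label for node in nodes]


html = Node("html", Node("head", Node("title")), Node("body", Node("p"), Node("div")))
head, body = html.children()
p, div = body.children()

print(labels(html.children()))        # ['head', 'body']
print(labels(div.ancestors()))        # ['body', 'html']
print(labels(div.prev_siblings()))    # ['p']
print(labels(p.next_siblings()))      # ['div']
print(labels(html.first_children()))  # ['head', 'title']
print(labels(html.last_children()))   # ['body', 'div']
print(labels(html.descendants()))     # ['html', 'head', 'title', 'body', 'p', 'div']
print([(n.label, closing) for n, closing in body.traverse()])
```

In this model, descendants() is simply traverse() with the closing events filtered out, which is one reason a traversal that reports both edges is a convenient building block.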
Build a document
In MarkupEver, we use a class named TreeDom (1) as the tree structure. This class lets you work with the document: move, create, remove, select, and serialize nodes, and more. In this tutorial, we'll create a document without using the Parser class and focus on TreeDom properties and methods.
- A tree structure specially designed for HTML and XML documents. It uses Rust's Vec type as its backend. The memory consumed by the TreeDom is dynamic and depends on the number of tokens stored in the tree. The allocated memory is never reduced and is only released when the TreeDom is dropped.
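The point about memory can be illustrated with a toy arena: all nodes live in one growable list and refer to each other by index, much like a tree backed by a Rust Vec. This is a hypothetical pure-Python model of the general technique, not MarkupEver's Rust code; the Slot and ArenaTree classes and the detach method are invented for illustration. Detaching a node only unlinks it, so the list never shrinks, which mirrors the statement that allocated memory is released only when the whole tree is dropped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    value: str
    parent: Optional[int] = None  # index of the parent slot, if any
    children: List[int] = field(default_factory=list)  # indexes of child slots


class ArenaTree:
    """Hypothetical arena-backed tree, NOT MarkupEver's Rust implementation."""

    def __init__(self) -> None:
        self.slots: List[Slot] = [Slot("Document")]  # slot 0 is the root

    def append(self, parent: int, value: str) -> int:
        node_id = len(self.slots)  # a node id is just its position in the list
        self.slots.append(Slot(value, parent))
        self.slots[parent].children.append(node_id)
        return node_id

    def detach(self, node_id: int) -> None:
        # Unlink the node from its parent. Its slot is NOT freed, so the list
        # keeps its length; the slot is reclaimed only when the whole tree is discarded.
        parent = self.slots[node_id].parent
        if parent is not None:
            self.slots[parent].children.remove(node_id)
            self.slots[node_id].parent = None


tree = ArenaTree()
html = tree.append(0, "html")
body = tree.append(html, "body")
tree.detach(body)

print(len(tree.slots))            # 3: the detached body slot is still allocated
print(tree.slots[html].children)  # []
```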
Start
To start creating a document, we first need to create a TreeDom.
Each TreeDom always has a root node of the dom.Document type. We can access it using the .root() method.
>>> from markupever import dom
>>> tree = dom.TreeDom()
>>>
>>> root = tree.root() # type is dom.Document
>>> root
Document
Be Careful
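Avoid comparing node objects from markupever.dom (such as Document, Element, Text, etc.) with is. These objects are not the nodes themselves; they serve only as a bridge for you to communicate with the core written in Rust, so two objects that refer to the same node are not necessarily the same Python object.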
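The following hypothetical sketch shows why an identity check can fail for such bridge objects. Every call to root() here builds a fresh wrapper around the same underlying node id, so is compares two different Python objects, while an __eq__ that compares the underlying node reports them as equal. The classes are illustrative only; whether MarkupEver creates a new wrapper on every access, and how its node types implement ==, are assumptions rather than documented behavior.

```python
class _Core:
    """Stands in for the Rust-side tree; nodes are identified by integer ids."""


class NodeWrapper:
    """Hypothetical bridge object: holds only a reference to the core and an id."""

    def __init__(self, core: _Core, node_id: int) -> None:
        self._core = core
        self._id = node_id

    def __eq__(self, other: object) -> bool:
        # Two wrappers are equal when they point at the same node in the same core.
        if not isinstance(other, NodeWrapper):
            return NotImplemented
        return self._core is other._core and self._id == other._id

    def __hash__(self) -> int:
        # Consistent with __eq__: equal wrappers hash the same.
        return hash((id(self._core), self._id))


class ToyTree:
    def __init__(self) -> None:
        self._core = _Core()

    def root(self) -> NodeWrapper:
        # A new wrapper object is created on every call.
        return NodeWrapper(self._core, 0)


tree = ToyTree()
a = tree.root()
b = tree.root()
print(a is b)  # False: two distinct wrapper objects
print(a == b)  # True: both refer to the same underlying node
```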
Adding nodes
The dom.Document and dom.Element types have methods whose names start with create_. These methods create new nodes and add them to the document. Let's add a DOCTYPE to our document:
>>> from markupever import dom
>>> tree = dom.TreeDom()
>>>
>>> tree.root().create_doctype("html")
Doctype(name="html", public_id="", system_id="")
Let's check what we did by printing or serializing the tree:
>>> print(tree)
Document
└── Doctype(name="html", public_id="", system_id="")
>>> tree.serialize(is_html=True)
'<!DOCTYPE html>'
OK. Let's add some elements and check again:
>>> html = tree.root().create_element("html", {"lang": "en"}) # type is dom.Element
>>> html.create_element("body")
Element(name=QualName(local="body", ns="", prefix=None), attrs=[], template=false, integration_point=false)
>>> print(tree)
Document
└── Element(name=QualName(local="html", ns="", prefix=None), attrs=[(QualName(local="lang", ns="", prefix=None), "en")], template=false, integration_point=false)
    └── Element(name=QualName(local="body", ns="", prefix=None), attrs=[], template=false, integration_point=false)
>>> tree.serialize(is_html=True)
'<!DOCTYPE html><html lang="en"><body></body></html>'
As you can see, building a document this way is straightforward.
Ordering
The create_* methods can perform append, prepend, insert after, and insert before operations within the document.
- append means adding a child as the last child of a node (default).
- prepend means adding a child as the first child of a node.
- insert after means adding a node as the next sibling of another node.
- insert before means adding a node as the previous sibling of another node.
You choose the operation by passing a dom.Ordering value as the ordering parameter of a create_* method.
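The four operations can be modeled in a few lines of Python. This is a hypothetical sketch: the Node class and its create method stand in for the create_* methods, and the Ordering member names (APPEND, PREPEND, AFTER, BEFORE) are assumptions, so check dom.Ordering for the actual names before passing one as the ordering argument. APPEND and PREPEND place the new node among the children of the node you call the method on, while AFTER and BEFORE place it among that node's siblings, which is why, in this model, calling it with a sibling ordering on a parentless node raises ValueError.

```python
from enum import Enum
from typing import List, Optional


class Ordering(Enum):
    # Hypothetical stand-in for dom.Ordering; member names are assumptions.
    APPEND = "append"
    PREPEND = "prepend"
    AFTER = "after"
    BEFORE = "before"


class Node:
    """Hypothetical toy node, NOT MarkupEver's class."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def create(self, label: str, ordering: Ordering = Ordering.APPEND) -> "Node":
        # Models a create_* call: build a node and place it according to `ordering`.
        node = Node(label)
        if ordering in (Ordering.APPEND, Ordering.PREPEND):
            # The new node becomes a child of self.
            target = self
            index = len(self.children) if ordering is Ordering.APPEND else 0
        else:
            # The new node becomes a sibling of self, so self needs a parent.
            if self.parent is None:
                raise ValueError("a node without a parent has no siblings")
            target = self.parent
            index = target.children.index(self)
            if ordering is Ordering.AFTER:
                index += 1
        node.parent = target
        target.children.insert(index, node)
        return node


html = Node("html")
body = html.create("body")                      # APPEND (default): last child of html
head = html.create("head", Ordering.PREPEND)    # first child of html
footer = body.create("footer", Ordering.AFTER)  # next sibling of body
nav = body.create("nav", Ordering.BEFORE)       # previous sibling of body

print([child.label for child in html.children])  # ['head', 'nav', 'body', 'footer']
```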