19. Structured Markup Processing Tools
**************************************

Python provides a variety of modules for working with different forms of structured data markup. These include modules for the Standard Generalized Markup Language (SGML) and the Hypertext Markup Language (HTML), and several interfaces for working with the Extensible Markup Language (XML).

Note that the modules in the "xml" package require at least one SAX-compliant XML parser to be available. Starting with Python 2.3, the Expat parser is included with Python, so the "xml.parsers.expat" module will always be available. You may still want to be aware of the PyXML add-on package, which provides an extended set of XML libraries for Python.

The documentation for the "xml.dom" and "xml.sax" packages is the definition of the Python bindings for the DOM and SAX interfaces.

* 19.1. "HTMLParser" — Simple HTML and XHTML parser
  * 19.1.1. Example HTML Parser Application
  * 19.1.2. "HTMLParser" Methods
  * 19.1.3. Examples
* 19.2. "sgmllib" — Simple SGML parser
* 19.3. "htmllib" — A parser for HTML documents
  * 19.3.1. HTMLParser Objects
* 19.4. "htmlentitydefs" — Definitions of HTML general entities
* 19.5. XML Processing Modules
* 19.6. XML vulnerabilities
  * 19.6.1. defused packages
* 19.7. "xml.etree.ElementTree" — The ElementTree XML API
  * 19.7.1. Tutorial
    * 19.7.1.1. XML tree and elements
    * 19.7.1.2. Parsing XML
    * 19.7.1.3. Finding interesting elements
    * 19.7.1.4. Modifying an XML File
    * 19.7.1.5. Building XML documents
    * 19.7.1.6. Parsing XML with Namespaces
    * 19.7.1.7. Additional resources
  * 19.7.2. XPath support
    * 19.7.2.1. Example
    * 19.7.2.2. Supported XPath syntax
  * 19.7.3. Reference
    * 19.7.3.1. Functions
    * 19.7.3.2. Element Objects
    * 19.7.3.3. ElementTree Objects
    * 19.7.3.4. QName Objects
    * 19.7.3.5. TreeBuilder Objects
    * 19.7.3.6. XMLParser Objects
* 19.8. "xml.dom" — The Document Object Model API
  * 19.8.1. Module Contents
  * 19.8.2. Objects in the DOM
    * 19.8.2.1. DOMImplementation Objects
    * 19.8.2.2. Node Objects
    * 19.8.2.3. NodeList Objects
    * 19.8.2.4. DocumentType Objects
    * 19.8.2.5. Document Objects
    * 19.8.2.6. Element Objects
    * 19.8.2.7. Attr Objects
    * 19.8.2.8. NamedNodeMap Objects
    * 19.8.2.9. Comment Objects
    * 19.8.2.10. Text and CDATASection Objects
    * 19.8.2.11. ProcessingInstruction Objects
    * 19.8.2.12. Exceptions
  * 19.8.3. Conformance
    * 19.8.3.1. Type Mapping
    * 19.8.3.2. Accessor Methods
* 19.9. "xml.dom.minidom" — Minimal DOM implementation
  * 19.9.1. DOM Objects
  * 19.9.2. DOM Example
  * 19.9.3. minidom and the DOM standard
* 19.10. "xml.dom.pulldom" — Support for building partial DOM trees
  * 19.10.1. DOMEventStream Objects
* 19.11. "xml.sax" — Support for SAX2 parsers
  * 19.11.1. SAXException Objects
* 19.12. "xml.sax.handler" — Base classes for SAX handlers
  * 19.12.1. ContentHandler Objects
  * 19.12.2. DTDHandler Objects
  * 19.12.3. EntityResolver Objects
  * 19.12.4. ErrorHandler Objects
* 19.13. "xml.sax.saxutils" — SAX Utilities
* 19.14. "xml.sax.xmlreader" — Interface for XML parsers
  * 19.14.1. XMLReader Objects
  * 19.14.2. IncrementalParser Objects
  * 19.14.3. Locator Objects
  * 19.14.4. InputSource Objects
  * 19.14.5. The "Attributes" Interface
  * 19.14.6. The "AttributesNS" Interface
* 19.15. "xml.parsers.expat" — Fast XML parsing using Expat
  * 19.15.1. XMLParser Objects
  * 19.15.2. ExpatError Exceptions
  * 19.15.3. Example
  * 19.15.4. Content Model Descriptions
  * 19.15.5. Expat error constants
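
The introduction to this chapter notes that the bundled Expat parser means a SAX-compliant parser is available in a standard installation. The following is a minimal sketch of what that looks like in practice, not a definitive recipe: it parses the same small document once through the "xml.sax" interface and once through "xml.parsers.expat" directly. The names "TagCounter", "count_tags", "expat_tag_names" and "SAMPLE" are hypothetical and chosen only for this illustration.

```python
import xml.sax
import xml.parsers.expat


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical SAX handler that counts start tags by element name."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the SAX parser once for every opening tag.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(data):
    """Parse a byte string with the default SAX parser; return tag counts."""
    handler = TagCounter()
    xml.sax.parseString(data, handler)
    return handler.counts


def expat_tag_names(data):
    """Parse the same data with Expat directly; return start tags in order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Expat reports events through handler attributes set on the parser.
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return names


# Illustrative input only; any well-formed document would do.
SAMPLE = b"<doc><item/><item/><note>hi</note></doc>"
counts = count_tags(SAMPLE)
names = expat_tag_names(SAMPLE)
```

Both routes see the same elements. The SAX interface is the more portable choice if another parser may be substituted later, while the direct Expat interface exposes the underlying parser's own handler attributes.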