How to Parse XML in Python

Published Date May 7, 2025
Read 5 min

TL;DR

  • Walks through two Python options for XML: ElementTree first, then Minidom.

  • ElementTree: quick to load/parse XML and extract tag/text, good default for most tasks.

  • Minidom: full DOM-style navigation and node control when you need detailed structure handling.

  • Side-by-side code examples help you choose the right parser for your workflow.

Parsing XML in Python is essential for developers dealing with structured data, web services, or configuration files. XML, or Extensible Markup Language, remains a widely used format for storing and exchanging data across different systems. Python offers powerful and user-friendly libraries that make navigating, extracting, and modifying XML data straightforward and efficient.

In this article, we’ll explore practical methods to parse XML using Python, highlighting easy-to-follow examples and helpful tips to streamline your data processing tasks.

Python XML Parsing Modules

There are two main modules for parsing XML with Python.

  • xml.etree.ElementTree represents XML data as a tree, which is the most natural representation of hierarchical data. Its Element type stores each node of that hierarchy in memory.

  • xml.dom.minidom suits developers who are already familiar with the DOM (Document Object Model). It works by converting the XML into a DOM tree.

Let’s discuss each of them in detail.

ElementTree

ElementTree is a class that wraps an element structure and converts it to and from XML. Each Element in that structure has the following properties:

  • A tag, a string that identifies the type of data the element holds.

  • A set of attributes, stored in a Python dictionary.

  • A text string holding the element's content.

  • An optional tail string, holding any text that follows the element's closing tag.

  • Child elements, which are themselves Element objects.
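The properties listed above can be inspected directly on an Element. The following is a minimal sketch; the XML snippet and its priority attribute are made up for illustration and are not part of the sample file used in this article.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet, chosen only to exercise every Element property.
xml_data = '<note priority="high"><to>User</to> trailing text<from>Admin</from></note>'

root = ET.fromstring(xml_data)
first_child = root[0]

print(root.tag)                       # tag of the root element
print(root.attrib)                    # attributes as a dictionary
print(first_child.text)               # text inside <to>
print(repr(first_child.tail))         # text between </to> and the next tag
print([child.tag for child in root])  # child elements, in document order
```

The tail is rarely needed for data-style XML, but it matters for mixed content where text and tags are interleaved.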

Now, we will learn how this module can be used for parsing an XML document.

Parsing with the ElementTree module

There are two ways to parse XML with this module:

  1. Using the parse() function

  2. Using the fromstring() function

Parsing with the parse() function

Consider this sample XML data, saved in a file named sample.xml.

<note>
  <to>User</to>
  <from>Admin</from>
  <message>Hello from XML!</message>
</note>

Now, let’s write some Python code to parse the data from this XML file using the parse() function. This script is saved as pyxml.py.

import xml.etree.ElementTree as ET

# Parse the XML file
tree = ET.parse('sample.xml')
root = tree.getroot()

# Access elements
print("To:", root.find('to').text)
print("From:", root.find('from').text)
print("Message:", root.find('message').text)

Here is what the code does, step by step:

  1. First, we import Python’s built-in XML parsing library, ElementTree.

  2. ET.parse() loads the file sample.xml. Here, tree becomes an ElementTree object representing the full XML structure.

  3. tree.getroot() retrieves the top-level (root) element of the XML, which is <note> in this case.

  4. Finally, we search for the <to> tag inside the root and get its text content. The same applies to <from> and <message>.

Once you run this code, it prints the parsed values:

To: User
From: Admin
Message: Hello from XML!
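One caveat about the lookups above: find() returns None when a tag is missing, so reading .text on the result would raise an AttributeError. Below is a minimal sketch of a more defensive lookup using findtext() with a default value. The <subject> tag is hypothetical and deliberately absent from the sample, and the XML is parsed from an inline string only so the snippet runs without a file on disk.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<note><to>User</to><from>Admin</from><message>Hello from XML!</message></note>"
)

# find() returns None for a missing tag, so guard before reading .text.
subject = root.find("subject")  # hypothetical tag, not in the sample
print(subject is None)

# findtext() returns the element's text, or the given default if the tag is absent.
to_text = root.findtext("to", default="(unknown)")
subject_text = root.findtext("subject", default="(no subject)")
print(to_text, subject_text)
```

Using findtext() keeps the happy path short while avoiding a crash on documents that omit optional tags.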

Parsing with fromstring() function

import xml.etree.ElementTree as ET

xml_data = '''
<note>
    <to>User</to>
    <from>Admin</from>
    <message>Hello from XML!</message>
</note>
'''

# Parse the XML string
root = ET.fromstring(xml_data)

# Access elements
print("To:", root.find('to').text)
print("From:", root.find('from').text)
print("Message:", root.find('message').text)

  1. Import the ElementTree module as ET.

  2. Define an XML string and store it in the xml_data variable.

  3. Use ET.fromstring(xml_data) to parse the XML string into an element tree.

  4. root now represents the <note> element (the root of the XML structure).

  5. Use root.find('tag').text to extract text from the <to>, <from>, and <message> tags.

  6. Print the extracted values.
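The two entry points differ in what they return: parse() gives back an ElementTree wrapper, while fromstring() returns the root Element directly. The sketch below illustrates this, using io.StringIO as a stand-in for a real file (an assumption made only to keep the example self-contained), and then loops over the root's children instead of calling find() once per tag.

```python
import io
import xml.etree.ElementTree as ET

xml_data = "<note><to>User</to><from>Admin</from><message>Hello from XML!</message></note>"

# parse() accepts a filename or file object and returns an ElementTree.
tree = ET.parse(io.StringIO(xml_data))
root_from_parse = tree.getroot()

# fromstring() accepts a string and returns the root Element directly.
root_from_string = ET.fromstring(xml_data)

print(type(tree).__name__, type(root_from_string).__name__)

# Iterating an Element yields its children in document order.
for child in root_from_string:
    print(child.tag, "=", child.text)
```

Iterating is handy when the set of child tags is not known in advance.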

Minidom

minidom (short for Minimal DOM implementation) is a lightweight XML parser in Python that provides a Document Object Model (DOM) interface to XML documents. It is part of Python's standard library, in the xml.dom package.

  • Allows navigation and modification of XML elements, attributes, and text nodes.

  • Suitable for small to moderately sized XML documents.

  • Access elements by tag name using getElementsByTagName().
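As a rough illustration of the navigation and modification features listed above, the sketch below parses a short string, changes a text node, adds an attribute, and serializes the result. The priority attribute and the replacement value "Guest" are hypothetical and chosen only for demonstration.

```python
from xml.dom import minidom

doc = minidom.parseString("<note><to>User</to><from>Admin</from></note>")

# Navigate: documentElement is the root element of the document.
root = doc.documentElement
to_node = doc.getElementsByTagName("to")[0]

# Modify: set an attribute and replace the text node's value.
root.setAttribute("priority", "high")  # hypothetical attribute
to_node.firstChild.nodeValue = "Guest"

# Serialize the modified root element back to an XML string.
print(root.toxml())
```

Note that in the DOM model the text "User" is a separate child node of <to>, which is why the edit goes through firstChild.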

Parsing with the Minidom module

Just like the ElementTree module, this module offers two functions for parsing.

  • Using the parse() function.

  • Using the parseString() function.

Parsing with the parse() function

Consider the same sample.xml file.

<note>
  <to>User</to>
  <from>Admin</from>
  <message>Hello from XML!</message>
</note>

Now, let’s write some Python code to parse this data.

from xml.dom import minidom

# Parse the XML file
doc = minidom.parse('sample.xml')

# Access elements
to = doc.getElementsByTagName('to')[0].firstChild.nodeValue
from_ = doc.getElementsByTagName('from')[0].firstChild.nodeValue
message = doc.getElementsByTagName('message')[0].firstChild.nodeValue

# Print values
print("To:", to)
print("From:", from_)
print("Message:", message)

1. Import the minidom module from xml.dom.

2. Use minidom.parse('sample.xml') to read and parse the XML file.

3. doc now holds the parsed XML document object.

4. Use getElementsByTagName('tag')[0] to access the first matching element.

5. Access the text inside the tag using .firstChild.nodeValue.

6. Print the extracted values for <to>, <from>, and <message>.

Once you run this code, it prints the following:

To: User
From: Admin
Message: Hello from XML!
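The .firstChild.nodeValue pattern assumes every tag exists and contains text. For an empty element, firstChild is None, and for a missing tag the [0] index fails. A minimal sketch of a more tolerant lookup follows; get_text is a hypothetical helper written for this example, not part of minidom.

```python
from xml.dom import minidom

def get_text(doc, tag, default=""):
    """Return the text of the first <tag> element, or default if missing or empty."""
    nodes = doc.getElementsByTagName(tag)
    if not nodes:
        return default
    # Join every text node directly under the element; empty elements have none.
    parts = [child.data for child in nodes[0].childNodes
             if child.nodeType == child.TEXT_NODE]
    return "".join(parts) or default

doc = minidom.parseString("<note><to>User</to><message/></note>")
print(get_text(doc, "to"))
print(get_text(doc, "message", default="(empty)"))
print(get_text(doc, "from", default="(missing)"))
```

Joining all text children also covers elements whose text has been split across several adjacent text nodes.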

Parsing with the parseString() function

from xml.dom import minidom

xml_data = '''
<note>
    <to>User</to>
    <from>Admin</from>
    <message>Hello from XML!</message>
</note>
'''

# Parse the XML string
doc = minidom.parseString(xml_data)

# Access elements
to = doc.getElementsByTagName('to')[0].firstChild.nodeValue
from_ = doc.getElementsByTagName('from')[0].firstChild.nodeValue
message = doc.getElementsByTagName('message')[0].firstChild.nodeValue

# Print values
print("To:", to)
print("From:", from_)
print("Message:", message)

1. Import minidom from the xml.dom module.

2. Define an XML string and store it in xml_data.

3. Use minidom.parseString(xml_data) to parse the XML string into a document object.

4. Access the first <to>, <from>, and <message> elements using getElementsByTagName('tag')[0].

5. Extract the text inside each tag using .firstChild.nodeValue.

6. Print the extracted values.

Running this code prints the same output as the parse() example.
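To compare the two modules side by side, the sketch below extracts the same three values from one XML string with each parser and checks that the results match. The tags list and the dictionary layout are choices made for this illustration only.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

xml_data = "<note><to>User</to><from>Admin</from><message>Hello from XML!</message></note>"
tags = ["to", "from", "message"]

# ElementTree: find() returns an Element whose text is a plain attribute.
et_root = ET.fromstring(xml_data)
et_values = {tag: et_root.find(tag).text for tag in tags}

# minidom: the text lives in a separate child node of each element.
dom = minidom.parseString(xml_data)
dom_values = {
    tag: dom.getElementsByTagName(tag)[0].firstChild.nodeValue for tag in tags
}

print(et_values == dom_values)
```

The ElementTree version is shorter because text is stored on the element itself, while the minidom version mirrors the DOM's node-per-text model.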

Key Takeaways:

  • Shows how to parse and navigate XML data using Python.

  • Demonstrates two standard-library modules for XML parsing: ElementTree and minidom.

  • Provides example code to extract specific elements and their text from XML.

  • Covers parsing from both XML files and XML strings.

  • Useful for working with APIs, RSS feeds, and structured data formats.

Conclusion

Parsing XML in Python is straightforward thanks to built-in libraries like ElementTree and minidom. Whether you're working with XML files or raw XML strings, both modules offer simple functions to access and manipulate XML data: parse() for files, and fromstring() (ElementTree) or parseString() (minidom) for strings. While ElementTree is more Pythonic and suited to most use cases, minidom provides a DOM-style interface for those needing more control over individual nodes. By understanding both, you can choose the right approach depending on your project's needs.
