In BigQuery, loading and unloading of XML Data is not supported. The only way to load/unload XML Data into/from BigQuery is to use external tools like Python or other third-party tools.
In my use case, we were required to unload/export customer data from BigQuery into an XML file. We created a Python program to read the BigQuery data and load it into an XML formatted file.
Briefly, let's understand what XML is and why it is still around.
XML (eXtensible Markup Language) is a versatile markup language for storing and transporting data. It's commonly used to structure, store, and share data across different systems in a readable format. XML uses tags to define elements within a document, creating a hierarchical structure that's both human-readable and machine-readable.
XML remains relevant and is still used for several reasons despite the emergence of other data formats like JSON, YAML, and others:
However, it's essential to note that XML's usage has somewhat diminished in certain areas due to the emergence of more lightweight and easy-to-parse formats like JSON, especially in web-based applications and APIs. JSON is often preferred for its simplicity, compactness, and ease of use in JavaScript environments.
The choice between XML and other data formats often depends on factors like existing infrastructure, industry standards, data complexity, interoperability needs, and specific use case requirements. For newer applications, JSON or other formats might be preferred, while for certain legacy systems or industries, XML remains a prevalent choice due to established practices and standards.
Unload customer data from BigQuery into Structured XML format and push that XML file into an SFTP location.
We followed two steps to accomplish this.
To unload data from BigQuery into XML using Python, you can use the BigQuery Python client library to fetch the data and an XML library, such as `xml.etree.ElementTree`, to create the XML file. Here's a step-by-step guide:
`pip install google-cloud-storage`
`pip install google-cloud-bigquery`
DAG Code link: https://github.com/pythian/blog-files/tree/offload_xml_from_BQ
Make sure to replace the placeholder values in the script with your actual project, dataset, and table information. This script fetches data from BigQuery, creates an XML structure, and saves it to the specified XML file.
https://realpython.com/python-xml-parser/
https://docs.python.org/3/library/xml.etree.elementtree.html
Ready to make smarter, data-driven decisions?