Rebranding Word Documents to Use a New Branded Template (A First Attempt at python-docx and docx2python)
I made this because I had 160 articles to rebrand for work, and figured I'd make a personal project of it. But gosh, this pushed way beyond what I've worked with before.
The goal was to reformat old .docx articles into a clean new branded style using a Word template, all as a batch.
I learned about the python-docx library, low-level XML manipulation, and building structured content transformation pipelines... this was all super new for me and probably more advanced than I SHOULD have gone at my current level. But hey, pretty in character for me to throw myself into the deep end and try not to drown. ;)
The project has four scripts: parser.py detects the elements of the old docs, writer.py creates the new docs, convert.py runs single-file tests, and convert_batch.py handles all 160 (there's no file limit that I'm aware of).
Here's the GitHub folder
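For a rough idea of what the batch step involves, here is a minimal sketch. It is not the actual convert_batch.py: the function name, the folder arguments, and the convert_one callback (standing in for whatever converts a single file) are all hypothetical.

```python
from pathlib import Path


def convert_batch(src_dir, out_dir, convert_one):
    """Run convert_one(src_path, dst_path) on every .docx in src_dir.

    convert_one is a hypothetical stand-in for the single-file converter.
    Returns (converted_names, failures) so one bad file doesn't stop the batch.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted, failed = [], []
    for src in sorted(src_dir.glob("*.docx")):
        # Word leaves "~$name.docx" lock files next to open documents; skip them.
        if src.name.startswith("~$"):
            continue
        try:
            convert_one(src, out_dir / src.name)
            converted.append(src.name)
        except Exception as exc:  # keep going and report at the end
            failed.append((src.name, str(exc)))
    return converted, failed
```

Collecting failures instead of raising means a single odd article gets flagged for a manual look while the rest of the batch still converts.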
Libraries and Tools I Used
python-docx
Documentation - Many thanks to Scanny!
I used this library to read and write Word .docx files. Most of the document structure work happened with this.
Key things I worked with:
- Document(...): to create or load a .docx
- .add_paragraph() and .add_run(): to insert styled text
- .add_table(): to generate and format tables
- .paragraphs[0]._element.getparent().remove(...): to delete placeholder content from my Word template
There were a few things I wanted to do that python-docx can’t handle out of the box—like clickable hyperlinks, table cell borders, and padding. So I dropped down into the WordprocessingML XML layer, which I still don't think I quite understand yet, to be honest.
Key functions and objects:
- OxmlElement(...): to create custom XML tags like <w:hyperlink>, <w:r>, and <w:tcBorders>
- parse_xml(...): to convert a raw XML string into an element
- qn(...): to expand a prefixed name like w:tcBorders into the namespace-qualified name the XML layer expects
docx2python
Documentation - Many thanks to ShayHill!
Used only for extracting the title text from textboxes, since I couldn't get at textbox content through python-docx's paragraph API.
This was a good learning stretch for me and helped me understand how Word documents are really structured under the hood.
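To show where textbox text actually lives, here is a standard-library-only sketch. It is not how docx2python works internally and not what my parser calls; textbox_texts is a hypothetical helper. It relies on two facts about the format: a .docx is a zip archive, and textbox content sits inside <w:txbxContent> elements in word/document.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def textbox_texts(docx_path):
    """Return the text of each textbox in the main document part."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    texts = []
    for box in root.iter(f"{{{W_NS}}}txbxContent"):
        text = "".join(t.text or "" for t in box.iter(f"{{{W_NS}}}t")).strip()
        # Word often stores a textbox twice (a modern version plus a legacy
        # fallback), so identical text is only kept once.
        if text and text not in texts:
            texts.append(text)
    return texts
```

Seeing the raw XML this way made it clearer to me why a title inside a textbox never showed up when I looped over the normal paragraphs.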
Design Elements My Article Rebrander Can (and Can't) Handle
Paragraph Types
The parser detects these block types:
- heading2, heading3
- paragraph
- list_item (used for both bulleted and numbered lists for now)
- table
Each type is output with the correct style name from my template (like “Body Text”, “List Paragraph”, etc). This gave me full control over the final formatting.
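The idea of "detect a block type, then look up the template style" can be sketched in a few lines. This is a simplified illustration, not the real parser.py: the Block class, the classify rules, and the "Heading 2" / "Heading 3" style names are assumptions ("Body Text" and "List Paragraph" are the names from my template). Tables are left out because they are handled as whole objects rather than single paragraphs.

```python
from dataclasses import dataclass

# Block type -> style name in the new template (heading names are assumed).
STYLE_FOR_BLOCK = {
    "heading2": "Heading 2",
    "heading3": "Heading 3",
    "paragraph": "Body Text",
    "list_item": "List Paragraph",
}


@dataclass
class Block:
    kind: str
    text: str


def classify(old_style_name, text):
    """Guess the block type from the style name used in the old document."""
    name = old_style_name.lower()
    if name == "heading 2":
        kind = "heading2"
    elif name == "heading 3":
        kind = "heading3"
    elif name.startswith("list"):
        kind = "list_item"
    else:
        kind = "paragraph"
    return Block(kind, text)


def style_for(block):
    """Look up which template style the writer should apply."""
    return STYLE_FOR_BLOCK[block.kind]
```

Keeping the mapping in one dictionary means a renamed template style only needs changing in one place.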
Lists
Unordered and ordered list detection was trickier than expected. I spent like 2 hours trying things that kept not working for the ordered lists.
For now, I settled on treating all list items as list_item, and styled them uniformly as unordered lists. Hyperlinked text inside list items is also supported.
Partial win here.
Tables
Tables are generated row by row, and for each cell:
- Borders are added (light grey)
- Internal margins/padding are applied
- Text formatting (bold, italic, underline) is preserved
This only works for simple tables. Merged cells, column widths, and other table-specific formatting, for example, aren't preserved.
Another partial win!
Things I'm Learning as a Programming Noob
This project helped me:
- Understand how to use python-docx for both reading and writing Word content
- Dive into XML-based formatting using OxmlElement
- Improve my ability to modularize code and handle complex document structures
- Debug layout issues that didn’t come from syntax errors but from subtle Word styling quirks
I encountered some small pieces for the first time as well:
- Wrote my first README.md file
- Learned what a .gitignore is and used one
- I didn't realize you can't comment in a JSON file, but now I know!
- I tested out GitHub Copilot and, to be honest, it wasn't a win for me. It misunderstood my instructions and gave incorrect responses for almost everything I asked it to do, so I fell back on my good, faithful, non-optimal ChatGPT. That sent me in circles on the ordered list issue for a long time, but it helped me find the libraries, figure out how to implement them, and get to where I got with this!
This was a stretch project for me and a rewarding one—I built something genuinely useful, and got much better at handling real-world Python code in the process.
I don't know if I SAVED myself time on converting those articles, ha, but I learned a heck of a lot more than I would have by monotonously doing them all manually one by one.