Fix XXE vulnerability in DOCX preprocessing (#1565) #1582

Open
KevinChen1994 wants to merge 1 commit into microsoft:main from KevinChen1994:fix/xxe-vulnerability-docx-preprocessing

@KevinChen1994

Summary

  • Replace unsafe xml.etree.ElementTree with defusedxml.ElementTree in
    converter_utils/docx/pre_process.py to prevent XXE (XML External Entity)
    injection attacks
  • The ET.fromstring() call parses XML content extracted from user-supplied
    DOCX files, making it vulnerable to XXE injection and Billion Laughs
    (exponential entity expansion) attacks
  • defusedxml is already a core dependency of the project and is used
    consistently in other converters (omml.py, _rss_converter.py,
    _epub_converter.py)
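
The difference between the two parsers can be illustrated with a small, harmless payload. This is a minimal sketch, not code from `pre_process.py`: the `PAYLOAD` string and both helper functions are hypothetical names introduced here, and the `defusedxml` check is skipped if the package is not installed.

```python
import xml.etree.ElementTree as StdET

# Harmless stand-in for an entity-expansion payload (hypothetical example data):
# one internal entity, referenced twice in the document body.
PAYLOAD = (
    '<!DOCTYPE doc [<!ENTITY a "boom">]>'
    "<doc>&a;&a;</doc>"
)


def stdlib_expands_entities(xml_text):
    """Parse with the standard library, which expands internal entities."""
    return StdET.fromstring(xml_text).text


def defused_rejects_entities(xml_text):
    """Return True if defusedxml refuses the payload, False if it parses,
    or None if defusedxml is not installed."""
    try:
        import defusedxml.ElementTree as SafeET
        from defusedxml import EntitiesForbidden
    except ImportError:
        return None
    try:
        SafeET.fromstring(xml_text)
    except EntitiesForbidden:
        return True
    return False


if __name__ == "__main__":
    print(stdlib_expands_entities(PAYLOAD))   # boomboom
    print(defused_rejects_entities(PAYLOAD))
```

Because `defusedxml.ElementTree` exposes the same `fromstring()` entry point, the fix amounts to changing the import rather than the call site.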

Fixes #1565

Test plan

  • Verified import works correctly
  • All 109 module vector tests pass
  • All 84 remaining tests pass (CLI, PDF tables, misc)
  • DOCX conversion with math equations (OMML → LaTeX) works correctly
  • Maintainers may want to add a test with a crafted XXE payload to verify the fix blocks the attack
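
One possible shape for such a test is sketched below. It is an assumption-laden illustration: it exercises `defusedxml.ElementTree.fromstring()` directly rather than the project's DOCX preprocessing code, and `XXE_PAYLOAD`, `XxePayloadTest`, and `run_suite` are hypothetical names. A real regression test would embed the payload in a DOCX part and run it through the preprocessor.

```python
import importlib.util
import unittest

# Skip gracefully when defusedxml is unavailable in the environment.
HAVE_DEFUSEDXML = importlib.util.find_spec("defusedxml") is not None

# Hypothetical crafted payload: an external entity pointing at a local file.
XXE_PAYLOAD = (
    '<!DOCTYPE doc [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<doc>&xxe;</doc>"
)


@unittest.skipUnless(HAVE_DEFUSEDXML, "defusedxml is not installed")
class XxePayloadTest(unittest.TestCase):
    def test_external_entity_is_rejected(self):
        import defusedxml.ElementTree as ET
        from defusedxml import DefusedXmlException

        # DefusedXmlException is the base class of the errors defusedxml
        # raises for forbidden DTDs, entities, and external references.
        with self.assertRaises(DefusedXmlException):
            ET.fromstring(XXE_PAYLOAD)


def run_suite():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(XxePayloadTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```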

Replace unsafe `xml.etree.ElementTree` with `defusedxml.ElementTree` in
`converter_utils/docx/pre_process.py`. The `ET.fromstring()` call parses
XML content extracted from user-supplied DOCX files, which leaves it open
to XML External Entity (XXE) injection and Billion Laughs (exponential
entity expansion) attacks.

`defusedxml` is already a core dependency of the project and is used in
other converters (omml.py, _rss_converter.py, _epub_converter.py). This
change makes the DOCX preprocessor consistent with the rest of the
codebase.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Security: XXE vulnerability in DOCX pre-processor (ET.fromstring on untrusted input)