
UnicodeDecodeError when parsing email with "\u" in its body #97

@dcfreire

Calling parse_from_file or parse_from_bytes on an email whose body contains "\u" raises UnicodeDecodeError.

Steps to reproduce the behavior:

  1. import mailparser
  2. mail = mailparser.parse_from_file(f), where f is the path to a file containing the raw mail below
  3. See error

Expected behavior
The mail is parsed without raising an exception.

Raw mail

Received: from localhost ([127.0.0.1]) by home with MailEnable ESMTPA; Mon, 2 Aug 2021 06:23:45 -0300
Subject: <example.com> Test
From: Example <[email protected]>
To: Example <[email protected]>
Reply-To: Example <[email protected]>
Return-Path: [email protected]
Date: Mon, 02 Aug 2021 06:23:45 -0300
X-Mailer: PHP/7.1.14
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Content-Disposition: inline
MIME-Version: 1.0
Message-ID: <9F4E043004D549FEAEBF0A374D252EE8.MAI@home>


1. Website "\upload site\public_html"
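For comparison, here is a minimal sketch of how the standard library's email package handles a message like this one. It is not mail-parser's code path. The message is rebuilt from only a few of the headers above, and the redacted address headers are left out, so treat it as an approximation of the real mail.

```python
from email import message_from_bytes, policy

# Cut-down copy of the raw mail: only the headers that affect body decoding.
RAW = (
    b"Subject: <example.com> Test\n"
    b"Content-Type: text/plain; charset=UTF-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"MIME-Version: 1.0\n"
    b"\n"
    b"\n"
    b'1. Website "\\upload site\\public_html"\n'
)

# policy.default yields an EmailMessage, whose get_content() decodes a
# text/plain body with the charset declared in Content-Type.
msg = message_from_bytes(RAW, policy=policy.default)
body = msg.get_content()

print(body)
```

The backslashes in the body are kept as literal characters here because the body is decoded as UTF-8 and no escape processing is applied.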

Environment:

  • OS: Debian Buster
  • Docker: yes
  • mail-parser version 3.15.0

Additional context
Traceback:

  File "/app/parser/parsers/mail.py", line 22, in _set_parser_obj
    self.parser_obj = parse_from_file(self.filepath)
  File "/usr/local/lib/python3.9/site-packages/mailparser/mailparser.py", line 79, in parse_from_file
    return MailParser.from_file(fp)
  File "/usr/local/lib/python3.9/site-packages/mailparser/mailparser.py", line 191, in from_file
    return cls(message)
  File "/usr/local/lib/python3.9/site-packages/mailparser/mailparser.py", line 138, in __init__
    self.parse()
  File "/usr/local/lib/python3.9/site-packages/mailparser/mailparser.py", line 446, in parse
    payload = payload.decode('raw-unicode-escape')
UnicodeDecodeError: 'rawunicodeescape' codec can't decode bytes in position 13-14: truncated \uXXXX escape
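
The failing call in the last frame can be reproduced without mail-parser, using only the standard library. The sketch below assumes the payload bytes reaching that line are the plain body text shown under "Raw mail". PAYLOAD and describe_failure are illustrative names, not part of mail-parser.

```python
# Body of the sample mail as bytes. Assumption: this is roughly what the
# library passes to .decode() at the line shown in the traceback.
PAYLOAD = b'\n1. Website "\\upload site\\public_html"\n'


def describe_failure(payload: bytes):
    """Return the codec's error reason, or None if decoding succeeds."""
    try:
        payload.decode("raw-unicode-escape")
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# "\u" followed by "ploa" is not four hex digits, so the codec rejects it.
reason = describe_failure(PAYLOAD)
print(reason)

# Decoding with the charset declared in the mail (UTF-8) does not interpret
# backslash escapes, so the same bytes decode without error.
text = PAYLOAD.decode("utf-8")
print(text)
```

The raw-unicode-escape codec treats every "\u" as the start of a \uXXXX escape, so any body containing a literal backslash followed by "u" (such as a Windows-style path) can trigger this error.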
