UUencoded attachment parsing #80

When dealing with attachments encoded via uuencoding (`Content-transfer-encoding` is `uuencode` or `x-uuencode`), mail-parser treats them as text, as can be seen in `parse()` (`mailparser.py:378`):
```python
if transfer_encoding == "base64" or (
        transfer_encoding == "quoted-printable"
        and "application" in mail_content_type):
    ...
else:
    # Every other part, including uuencoded ones, is decoded to bytes
    # and then converted to a text string.
    payload = ported_string(p.get_payload(decode=True), encoding=charset)
    log.debug("Filename {!r} part {!r} is not binary".format(filename, i))
```
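One way the dispatch above could be changed is to route the uuencode variants to the binary branch, so the bytes returned by `get_payload(decode=True)` are kept as-is. The sketch below is a hypothetical helper, `is_binary_part()`, which is not part of mail-parser; its name and the case-insensitive comparison are assumptions.

```python
# Hypothetical helper (not mail-parser's API): decide whether a part's decoded
# payload should be kept as raw bytes rather than converted to text.
UUENCODE_ENCODINGS = {"uuencode", "x-uuencode"}


def is_binary_part(transfer_encoding, content_type):
    # Assumption: compare case-insensitively and tolerate missing headers.
    transfer_encoding = (transfer_encoding or "").lower()
    content_type = (content_type or "").lower()
    if transfer_encoding == "base64":
        return True
    if transfer_encoding == "quoted-printable" and "application" in content_type:
        return True
    # Assumption: uuencoded parts are handled like base64 ones, because
    # get_payload(decode=True) already returns their original bytes.
    return transfer_encoding in UUENCODE_ENCODINGS
```

With a check like this, a uuencoded part would take the binary branch and never reach `ported_string()`.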
Within the `else` block, the payload is correctly decoded with `p.get_payload(decode=True)`, but the resulting bytes are then passed to `ported_string()` (`utils.py:85`), which decodes them into a text string, first with the part's charset and, if that fails, as UTF-8:
```python
import six

def ported_string(raw_data, encoding='utf-8', errors='ignore'):
    ...
    try:
        return six.text_type(raw_data, encoding).strip()
    except (LookupError, UnicodeDecodeError):
        # Fallback: decode as UTF-8, silently dropping undecodable bytes
        return six.text_type(raw_data, "utf-8", errors).strip()
```
Since `errors='ignore'` is passed, the fallback never fails; instead it returns the attachment stripped of every byte that is not valid UTF-8, with any leading and trailing whitespace also removed by `.strip()`. This is easy to verify by writing the attachment to disk with `write_attachments`.
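The loss can be reproduced with the standard library alone. This sketch assumes Python 3 and uses arbitrary sample bytes (all 256 byte values) rather than the real SpamScope attachment: it builds a minimal `x-uuencode` part by hand with `binascii.b2a_uu`, decodes it with `get_payload(decode=True)`, and then applies the same UTF-8 `errors='ignore'` fallback and `.strip()` that `ported_string()` uses (on Python 3, `six.text_type` is `str`).

```python
import binascii
import email

# Stand-in binary content: every byte value 0-255 (0x80-0xFF is not valid UTF-8 here).
original = bytes(range(256))

# Build a uuencoded body by hand: a "begin" line, 45-byte chunks, a zero-length line, "end".
lines = ["begin 644 sample.bin"]
for i in range(0, len(original), 45):
    chunk = binascii.b2a_uu(original[i:i + 45], backtick=True)
    lines.append(chunk.decode("ascii").rstrip("\n"))
lines += ["`", "end"]

raw_message = (
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: x-uuencode\n"
    "\n" + "\n".join(lines) + "\n"
)

msg = email.message_from_string(raw_message)
decoded = msg.get_payload(decode=True)  # the uudecoded bytes

# What the text branch effectively does once the charset decode has failed.
text = str(decoded, "utf-8", "ignore").strip()
lossy = text.encode("utf-8")

print(decoded == original)        # True
print(len(original), len(lossy))  # 256 128
```

Only the 128 ASCII bytes survive the round trip, so anything written from the text version can no longer match the original file.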

I ran into this while porting SpamScope to Python 3. SpamScope has a test, `test_store_samples_unicode_error`, that parses and saves a uuencoded attachment and expects the resulting file to have the MD5 checksum `2ea90c996ca28f751d4841e6c67892b8`. The test passes on Python 2 because the incorrectly parsed payload does have that hash, but on Python 3 the hash changes because of differences in Unicode handling. The correct checksum, however, is `4f2cf891e7cfb349fca812091f184ecc`.
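A check in the spirit of `test_store_samples_unicode_error` is to compare the MD5 of the file written to disk with the MD5 of the decoded payload itself. The sketch below uses hypothetical helper names (`md5_of_file`, `saved_matches_payload`) and stand-in bytes instead of the real sample, so it does not reproduce the checksums quoted above.

```python
import hashlib
import os
import tempfile


def md5_of_file(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def saved_matches_payload(path, payload):
    """True if the file on disk has the same MD5 as the decoded payload bytes."""
    return md5_of_file(path) == hashlib.md5(payload).hexdigest()


payload = bytes(range(256))  # stand-in for a decoded attachment
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.bin")
    bad = os.path.join(tmp, "bad.bin")
    with open(good, "wb") as fh:
        fh.write(payload)  # raw bytes, as a binary branch would keep them
    with open(bad, "wb") as fh:
        # Simulates the lossy text round trip through the UTF-8 fallback.
        fh.write(str(payload, "utf-8", "ignore").strip().encode("utf-8"))
    good_ok = saved_matches_payload(good, payload)
    bad_ok = saved_matches_payload(bad, payload)

print(good_ok, bad_ok)  # True False
```

Comparing against the payload's own digest, rather than a hash recorded from a previous run, avoids baking a corrupted result into the expected value.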
