Commit bccc4c5

Author: martin.v.loewis

Issue #6097: Escape UTF-8 surrogates resulting from mbstocs conversion
of the command line.

git-svn-id: http://svn.python.org/projects/python/branches/py3k@73020 6015fed2-1504-0410-9fe1-9d1591cc4771

1 parent f72477a commit bccc4c5

2 files changed

Lines changed: 21 additions & 2 deletions

Misc/NEWS

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,9 @@ What's New in Python 3.1 release candidate 1?
 Core and Builtins
 -----------------
 
+- Issue #6097: Escape UTF-8 surrogates resulting from mbstocs conversion
+  of the command line.
+
 - Issue #6012: Add cleanup support to O& argument parsing.
 
 - Issue #6089: Fixed str.format with certain invalid field specifiers

Modules/python.c

Lines changed: 18 additions & 2 deletions
@@ -38,8 +38,16 @@ char2wchar(char* arg)
         if (!res)
             goto oom;
         count = mbstowcs(res, arg, argsize+1);
-        if (count != (size_t)-1)
-            return res;
+        if (count != (size_t)-1) {
+            wchar_t *tmp;
+            /* Only use the result if it contains no
+               surrogate characters. */
+            for (tmp = res; *tmp != 0 &&
+                     (*tmp < 0xd800 || *tmp > 0xdfff); tmp++)
+                ;
+            if (*tmp == 0)
+                return res;
+        }
         PyMem_Free(res);
     }
     /* Conversion failed. Fall back to escaping with surrogateescape. */
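
The check added in this hunk walks the converted string and accepts it only if the walk reaches the terminating NUL without meeting a code unit in 0xd800..0xdfff. A minimal standalone sketch of that test follows; the helper name `contains_surrogate` is hypothetical and does not appear in Modules/python.c, where the loop is written inline.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of Modules/python.c): returns 1 if the
   NUL-terminated wide string holds a code unit in the surrogate range
   0xd800..0xdfff, and 0 otherwise. */
static int
contains_surrogate(const wchar_t *s)
{
    assert(s != NULL);
    for (; *s != 0; s++) {
        if (*s >= 0xd800 && *s <= 0xdfff)
            return 1;
    }
    return 0;
}
```

With such a helper, the condition in the patch would read roughly as "return res only if contains_surrogate(res) is 0"; otherwise the buffer is freed and the byte-by-byte fallback runs.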
@@ -75,6 +83,14 @@ char2wchar(char* arg)
             memset(&mbs, 0, sizeof mbs);
             continue;
         }
+        if (*out >= 0xd800 && *out <= 0xdfff) {
+            /* Surrogate character.  Escape the original
+               byte sequence with surrogateescape. */
+            argsize -= converted;
+            while (converted--)
+                *out++ = 0xdc00 + *in++;
+            continue;
+        }
         /* successfully converted some bytes */
         in += converted;
         argsize -= converted;
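
When the decoded character turns out to be a surrogate, the hunk above discards it and instead emits one wide character per original input byte, each computed as 0xdc00 plus the byte value. The sketch below isolates that arithmetic; `escape_bytes` is a hypothetical name used only for illustration, and the caller is assumed to supply an output buffer with room for `n` elements.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of Modules/python.c): writes one wide
   character per input byte, mapping byte b to 0xdc00 + b, the same
   arithmetic the patch uses.  Returns the number of elements written.
   The caller must provide room for n elements in out. */
static size_t
escape_bytes(const unsigned char *in, size_t n, wchar_t *out)
{
    size_t i;
    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++)
        out[i] = (wchar_t)(0xdc00 + in[i]);
    return n;
}
```

For example, the three bytes 0xed 0xb2 0x80 (a UTF-8-style encoding of the surrogate U+DC80) would become the three code units 0xdced, 0xdcb2, 0xdc80, so the original bytes can be recovered later by subtracting 0xdc00 from each.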
