Commit 47164fc

URL query percent encoding is incorrect
https://bugs.webkit.org/show_bug.cgi?id=306742
rdar://169566553

Reviewed by Anne van Kesteren.

We have used ucnv_setFallback(converter, true) for many, many years for our ICU-based text decoding. However, when using ICU to encode non-UTF8 text through a URL, we need to not use the fallback. This matches the behavior of Chrome and Firefox, and Chromium has a comment saying this matches Netscape behavior. This behavior is odd but already implemented and specified in https://url.spec.whatwg.org/#string-percent-encode-after-encoding

Chromium implemented this by using one class for URL encoding, ICUCharsetConverter, which does not call ucnv_setFallback, and a separate class for text encoding, TextCodecIcu, which does call ucnv_setFallback. I've already used an abstract interface for this rare case, URLTextEncoding, in order to keep it down to one implementation, so I keep that design and use ucnv_setFallback to turn off fallback when using UnencodableHandling::URLEncodedEntities for URL query encoding, then I use ucnv_setFallback to reset the state of the encoder when I'm done with the operation.

Test: imported/w3c/web-platform-tests/url/resources/percent-encoding.window.html

* LayoutTests/imported/w3c/web-platform-tests/url/percent-encoding.window-expected.txt:
* LayoutTests/imported/w3c/web-platform-tests/url/resources/percent-encoding.json:
* Source/WebCore/PAL/pal/text/TextCodecICU.cpp:
(PAL::TextCodecICU::encode const):

Canonical link: https://commits.webkit.org/306768@main
1 parent 717c6c0 commit 47164fc

3 files changed: 14 additions, 0 deletions

LayoutTests/imported/w3c/web-platform-tests/url/percent-encoding.window-expected.txt

Lines changed: 2 additions & 0 deletions
@@ -13,5 +13,7 @@ PASS Input  with encoding gb18030
PASS Input  with encoding utf-8
PASS Input − with encoding shift_jis
PASS Input − with encoding utf-8
+PASS Input ¢ with encoding iso-8859-2
+PASS Input ¢ with encoding utf-8
PASS Input á| with encoding utf-8

LayoutTests/imported/w3c/web-platform-tests/url/resources/percent-encoding.json

Lines changed: 7 additions & 0 deletions
@@ -39,6 +39,13 @@
       "utf-8": "%E2%88%92"
     }
   },
+  {
+    "input": "\u00A2",
+    "output": {
+      "iso-8859-2": "%26%23162%3B",
+      "utf-8": "%C2%A2"
+    }
+  },
   {
     "input": "á|",
     "output": {
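
The expected iso-8859-2 output above follows from the URL specification's percent-encode-after-encoding step: a code point the target encoding cannot represent is written as a decimal numeric character reference whose `&`, `#`, and `;` are themselves percent-encoded. The following is a minimal standalone sketch of that arithmetic only. The helper names `percentEncodeByte` and `percentEncodedEntity` are hypothetical and are not WebKit or ICU functions, and the sketch does not decide whether a code point is encodable.

```cpp
#include <string>

// Hypothetical helper: percent-encodes one byte as "%XX" with uppercase hex digits.
inline std::string percentEncodeByte(unsigned char byte)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out = "%";
    out += hex[byte >> 4];
    out += hex[byte & 0x0F];
    return out;
}

// Hypothetical helper: builds the percent-encoded form of "&#<decimal>;"
// for a code point that the target encoding cannot represent.
inline std::string percentEncodedEntity(char32_t codePoint)
{
    return percentEncodeByte('&') + percentEncodeByte('#')
        + std::to_string(static_cast<unsigned long>(codePoint))
        + percentEncodeByte(';');
}
```

With these helpers, `percentEncodedEntity(0xA2)` yields `%26%23162%3B`, the value the new test case expects for iso-8859-2, while the utf-8 expectation `%C2%A2` is simply the two UTF-8 bytes of U+00A2 percent-encoded.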

Source/WebCore/PAL/pal/text/TextCodecICU.cpp

Lines changed: 5 additions & 0 deletions
@@ -305,6 +305,7 @@ Vector<uint8_t> TextCodecICU::encode(StringView string, UnencodableHandling hand
         ucnv_setFromUCallBack(m_converter.get(), urlEscapedEntityCallback, 0, 0, 0, &error);
         if (U_FAILURE(error))
             return { };
+        ucnv_setFallback(m_converter.get(), false);
         break;
     }

@@ -321,6 +322,10 @@ Vector<uint8_t> TextCodecICU::encode(StringView string, UnencodableHandling hand
         ucnv_fromUnicode(m_converter.get(), &target, targetLimit, &source, sourceLimit, 0, true, &error);
         result.append(byteCast<uint8_t>(std::span(buffer)).first(target - buffer.data()));
     } while (needsToGrowToProduceBuffer(error));
+
+    if (handling == UnencodableHandling::URLEncodedEntities)
+        ucnv_setFallback(m_converter.get(), true);
+
     return result;
 }
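
The change above disables fallback before the conversion loop and re-enables it afterwards with two explicit calls. As an illustration of the same set-then-restore idea, here is a minimal sketch of a scope guard that restores the flag on every exit path, which would matter only if an early return were ever added between the two calls. Everything in it is an assumption made for illustration: `FakeConverter`, `ScopeExit`, and `encodeWithFallbackDisabled` are hypothetical names, not ICU or WebKit types, and the commit itself does not use a guard.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for a converter that carries a fallback flag; not an ICU type.
struct FakeConverter {
    bool usesFallback { true };
};

// Hypothetical guard: runs the stored callable when it goes out of scope.
class ScopeExit {
public:
    explicit ScopeExit(std::function<void()> onExit)
        : m_onExit(std::move(onExit))
    {
    }
    ~ScopeExit() { m_onExit(); }
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

private:
    std::function<void()> m_onExit;
};

// Turns fallback off for the duration of the call and restores it on return.
// Returns false on the simulated early exit, otherwise whether fallback was
// off while the (omitted) conversion would have been running.
inline bool encodeWithFallbackDisabled(FakeConverter& converter, bool failEarly)
{
    converter.usesFallback = false;
    ScopeExit restore([&converter] { converter.usesFallback = true; });
    if (failEarly)
        return false; // Early exit: the guard still restores the flag.
    // The conversion itself would run here with fallback disabled.
    return !converter.usesFallback;
}
```

In this sketch the converter's flag is back to `true` after both the early-exit and the normal path, which is the invariant the two explicit ucnv_setFallback calls in the commit maintain by hand.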
