Request #69086 - enhancement for mb_convert_encoding #1098
masakielastic wants to merge 3 commits into php:master from masakielastic:mb_converter
…current_filter_illegal_substchar)
Fix bug #69086 enhancement for mb_convert_encoding
Comment on behalf of yohgaki at php.net: Merged. Thank you for the PR.
Reviewing more code related to #1094 (comment)... there are some problems here as well. Again, the check is made against the source encoding of the string, not the target encoding, which is where the substitution character actually has to be mapped. For example:

<?php
mb_internal_encoding("UTF-8");
mb_substitute_character(0xfffd);
var_dump(bin2hex(mb_convert_encoding("\x80", "UTF-8", "EUC-JP-2004")));

This results in U+3F ("?"), even though UTF-8 clearly supports U+FFFD. However, even if the target encoding were checked instead of the source encoding, the check would still be too strict when the target encoding is a non-Unicode encoding that does not match the internal encoding. Many encodings support large ranges of non-ASCII Unicode code points, but with the current logic they would always fall back to U+3F.
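To make the source-versus-target distinction concrete, here is a minimal C sketch under stated assumptions: the three toy encodings, `toy_can_encode`, and both `substchar_checked_against_*` helpers are hypothetical names that model only the check being discussed; they are not the mbfl API. Validating the requested substitute against the source encoding rejects U+FFFD for an ASCII-only source even though a UTF-8 target could emit it, whereas validating against the target gets it right.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy encodings; NOT the real mbfl encoding table. */
typedef enum { ENC_ASCII, ENC_LATIN1, ENC_UTF8 } toy_encoding;

#define FALLBACK_SUBSTCHAR 0x3F /* '?' */

/* Can encoding `enc` represent Unicode code point `cp`? */
static bool toy_can_encode(toy_encoding enc, uint32_t cp)
{
    switch (enc) {
    case ENC_ASCII:  return cp < 0x80;
    case ENC_LATIN1: return cp < 0x100;
    case ENC_UTF8:   return cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF);
    }
    return false;
}

/* Problematic variant: validates the substitute against the SOURCE encoding. */
static uint32_t substchar_checked_against_source(toy_encoding from, toy_encoding to,
                                                 uint32_t requested)
{
    assert(requested <= 0x10FFFF); /* precondition: a Unicode code point */
    (void)to; /* the target, where the substitute is emitted, is never consulted */
    return toy_can_encode(from, requested) ? requested : FALLBACK_SUBSTCHAR;
}

/* Validates against the TARGET encoding, where the substitute is written. */
static uint32_t substchar_checked_against_target(toy_encoding from, toy_encoding to,
                                                 uint32_t requested)
{
    assert(requested <= 0x10FFFF); /* precondition: a Unicode code point */
    (void)from; /* the source encoding is irrelevant to this decision */
    return toy_can_encode(to, requested) ? requested : FALLBACK_SUBSTCHAR;
}
```

In this model, `substchar_checked_against_source(ENC_ASCII, ENC_UTF8, 0xFFFD)` yields 0x3F while the target-based check yields 0xFFFD. A Latin-1 target, which is not a Unicode encoding, still accepts a non-ASCII substitute such as U+00E9 under the target check, the kind of case the comment says the stricter logic would needlessly reject.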
This is now fixed by fb9bf5b. No upfront check is performed anymore; instead, mbfl_convert simply tries to use the substitute character and, if that fails, falls back to …
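The try-then-fall-back approach can be sketched as below. This is a hypothetical model, not the actual mbfl_convert code: `toy_encode` and `emit_substitute` are invented names, the encoders cover only ASCII, Latin-1 and UTF-8, and because the description above is cut off, falling back to '?' (U+3F) is an assumption taken from the earlier discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy encodings; NOT the real mbfl encoding table. */
typedef enum { ENC_ASCII, ENC_LATIN1, ENC_UTF8 } toy_encoding;

/* Encode one code point into `out` (room for at least 4 bytes).
   Returns the number of bytes written, or 0 if `enc` cannot represent `cp`. */
static size_t toy_encode(toy_encoding enc, uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    switch (enc) {
    case ENC_ASCII:
        if (cp >= 0x80) return 0;
        out[0] = (unsigned char)cp;
        return 1;
    case ENC_LATIN1:
        if (cp >= 0x100) return 0;
        out[0] = (unsigned char)cp;
        return 1;
    case ENC_UTF8:
        if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
        if (cp < 0x800) {
            out[0] = (unsigned char)(0xC0 | (cp >> 6));
            out[1] = (unsigned char)(0x80 | (cp & 0x3F));
            return 2;
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not encodable */
        if (cp < 0x10000) {
            out[0] = (unsigned char)(0xE0 | (cp >> 12));
            out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[2] = (unsigned char)(0x80 | (cp & 0x3F));
            return 3;
        }
        if (cp <= 0x10FFFF) {
            out[0] = (unsigned char)(0xF0 | (cp >> 18));
            out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[3] = (unsigned char)(0x80 | (cp & 0x3F));
            return 4;
        }
        return 0;
    }
    return 0;
}

/* Called for an illegal input sequence: no upfront validation of the
   substitute; just try to encode it in the target, and only on failure
   use the (assumed) '?' fallback, which every toy encoding can represent. */
static size_t emit_substitute(toy_encoding to, uint32_t substchar, unsigned char *out)
{
    size_t n = toy_encode(to, substchar, out);
    if (n == 0)
        n = toy_encode(to, 0x3F, out);
    return n;
}
```

In this model, `emit_substitute(ENC_UTF8, 0xFFFD, buf)` writes the three bytes EF BF BD, an ASCII target falls back to the single byte 0x3F, and a Latin-1 target emits a non-ASCII substitute such as U+00E9 directly. Because the decision is made per emitted character against the encoder that actually writes it, there is no separate "is this encoding Unicode?" rule to get wrong.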
This pull request fixes the substitute character used when the third argument of mb_convert_encoding (the source encoding, from_encoding) differs from mb_internal_encoding.