Fix #70962: xml_parse_into_struct strips embedded whitespace with XML…#7493
Fix #70962: xml_parse_into_struct strips embedded whitespace with XML…#7493Flashwade1990 wants to merge 3 commits intophp:PHP-7.4from
Conversation
…_OPTION_SKIP_WHITE.
cmb69
left a comment
There was a problem hiding this comment.
Thank you for working on this! The patch looks generally good to me. I left some comments on details to improve.
One thing that makes me wonder is the description of XML_OPTION_SKIP_WHITE in the manual:
Whether to skip values consisting of whitespace characters.
That makes me think that not only the value element of the respective $values would be skipped, but the complete $values element (i.e. for the given test script count($values) === 3). However, changing this would be a BC break, and given the long-standing behavior, it might be better to clarify the documentation.
ext/xml/tests/bug70962.phpt
Outdated
| --EXPECT-- | ||
| <d> | ||
| <e> | ||
| bool(false) No newline at end of file |
ext/xml/xml.c
Outdated
| } else { | ||
| zend_string_release_ex(decoded_value, 0); | ||
| } | ||
| if (parser->lastwasopen) { |
There was a problem hiding this comment.
Please use tabs (as four spaces) for indentation.
ext/xml/xml.c
Outdated
|
|
||
| /* check if the current tag already has a value - if yes append to that! */ | ||
| if ((myval = zend_hash_str_find(Z_ARRVAL_P(parser->ctag), "value", sizeof("value") - 1))) { | ||
| int newlen = Z_STRLEN_P(myval) + ZSTR_LEN(decoded_value); |
There was a problem hiding this comment.
| int newlen = Z_STRLEN_P(myval) + ZSTR_LEN(decoded_value); | |
| size_t newlen = Z_STRLEN_P(myval) + ZSTR_LEN(decoded_value); |
That was int before this patch, but I think we should fix that right away.
|
I just noticed that empty "cdata" nodes have an empty "value" even if PS: it is actually different, see https://3v4l.org/ue9dU and https://3v4l.org/GTZtn. So, if |
Unfortunately I was not able to find any information regarding XML_OPTION_SKIP_WHITE behavior as well. I may try to work on skipping the whole value object if it makes sense, but it will introduce BC break as you have already mentioned. |
I suggest to leave this as is for now, but I noticed a change due to this PR regarding cdata element. Please see https://3v4l.org/L4vTC; this has no "cdata" element, but with this PR there is one now. I think that needs to fixed. This extension's test suite is definitely not comprehensive enough. :( |
Ok I will leave this PR as it is. Does it make sense to work on a separate PR for skipping an empty "cdata" element as it was before? |
I think this needs to be fixed in this PR; otherwise we would introduce a BC break. By the way, do you test locally with libxml2 or libexpat? I don't think that we have CI testing with libexpat, but the behavior may be different. |
Ok, will try to fix this issue in the current PR. Based on compile info it is libxml2. Sorry, I am really new in C and do not know how to change library and test using libexpat. I need to dig deeper into code to find a way. |
|
No need for you to check with libexpat; I can do that. :) |
Thank you :) |
|
Something like the following might be sufficient to solve the cdata issue: ext/xml/xml.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/ext/xml/xml.c b/ext/xml/xml.c
index 45fd06971d..273e523d94 100644
--- a/ext/xml/xml.c
+++ b/ext/xml/xml.c
@@ -948,7 +948,9 @@ void _xml_characterDataHandler(void *userData, const XML_Char *s, int len)
add_assoc_string(&tag, "tag", SKIP_TAGSTART(parser->ltags[parser->level-1]));
add_assoc_str(&tag, "value", decoded_value);
- add_assoc_string(&tag, "type", "cdata");
+ if (!parser->skipwhite) {
+ add_assoc_string(&tag, "type", "cdata");
+ }
add_assoc_long(&tag, "level", parser->level);
zend_hash_next_index_insert(Z_ARRVAL(parser->data), &tag);This diff won't cleanly apply, because it's based on an earlier version of this PR, but you can apply manually. :) PS: Nope, disregard this. Sorry for the noise. |
|
Maybe we should go for a simpler fix? ext/xml/xml.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/ext/xml/xml.c b/ext/xml/xml.c
index fd8aebe03a..df052b8910 100644
--- a/ext/xml/xml.c
+++ b/ext/xml/xml.c
@@ -900,7 +900,6 @@ void _xml_characterDataHandler(void *userData, const XML_Char *s, int len)
break;
}
}
- if (doprint || (! parser->skipwhite)) {
if (parser->lastwasopen) {
zval *myval;
@@ -915,7 +914,7 @@ void _xml_characterDataHandler(void *userData, const XML_Char *s, int len)
add_assoc_str(parser->ctag, "value", decoded_value);
}
- } else {
+ } else if (doprint || (! parser->skipwhite)) {
zval tag;
zval *curtag, *mytype, *myval;
@@ -949,7 +948,6 @@ void _xml_characterDataHandler(void *userData, const XML_Char *s, int len)
} else if (parser->level == (XML_MAXLEVEL + 1)) {
php_error_docref(NULL, E_WARNING, "Maximum depth exceeded - Results truncated");
}
- }
} else {
zend_string_release_ex(decoded_value, 0);
}And then fix the indentation, and |
Tag |
|
Oh, right! Just when I thought I understand what the function is currently doing, I have to learn more about it. :) |
Latest commit returns an original behavior for skipping empty "cdata" elements. I think all cases are covered now and there is no BC break anymore :) |
cmb69
left a comment
There was a problem hiding this comment.
Thank you! This is good now, and also works fine with libexpat. I'm going to commit.
|
@Flashwade1990 - This introduces a bug, a CRLF and multiple spaces gets added to the structure. using the following code snipet <title>Dynamic Allocations</title> Configuring_Account_Allocation$p = xml_parser_create('UTF-8'); echo $errcode; having the next tag on the same line as the ending CDATA tag is a workaround: |
|
@radu-barbu-sage, this is deliberate. |


…_OPTION_SKIP_WHITE.