Conversation
Also, I borrowed tests from #81561 and from #81461.
Force-pushed from e7bcc44 to 149f268
Force-pushed from 149f268 to 4966edb
I forced the assignment list.
Some fast tests failed. The PR is in draft until I fix them.
```cpp
memory.resize(chunk_size);
in->readStrict(memory.data(), chunk_size);
working_buffer = Buffer(memory.data(), memory.data() + chunk_size);
in.reset();
```
If the stream is broken, we should not read from it anymore.
```cpp
{
    /// We have to distinguish empty buffer and nullptr.
    stream = std::make_unique<EmptyReadBuffer>();
    stream_is_bounded = true;
```
In the first iteration I missed this line; all GET requests were closed after the response.
```cpp
    return stream;
}
```
```cpp
std::pair<std::shared_ptr<WriteBuffer>, std::shared_ptr<WriteBuffer>> HTTPServerResponse::beginSend()
```
That was an unused piece of code.
```cpp
setKeepAlive(false);
// Send header
Poco::Net::HTTPHeaderOutputStream hs(session);
write(hs);
```
This is the HTTPResponse::write method, and it is not virtual. It calls HTTPResponse::beginWrite. But HTTPServerResponse has its own beginWrite method, and that method should be used here.
It is ugly; maybe it is better to do a refactoring here without backporting it to the release branches.
```cpp
bool canKeepAlive() const
{
    if (stream && stream_is_bounded)
        return stream->eof();
```
This is not enough: it takes care of chunked encoding end marker, but not compression end marker. (stream doesn't know about compression, HTTPHandler adds decompressing buffer in_post_maybe_compressed over it.)
The way to test it in test_http_connection_drain_before_reuse is to add lz4 compression similar to the kotlin test, making sure the final \0\0\0\0 bytes are in their own chunked-encoding chunk, and to sleep for a second before sending that chunk. (Make sure the compressed data is bigger than max_query_size = 256 KiB.) (I guess make make_arrow_stream_data use incompressible random data instead of repeated b"xxxxxxxx", then lz4-compress it in Python, split off the last 4 bytes, and yield them in a separate chunk from a yield_then_sleep-like generator?) (After that, I think we wouldn't need to commit the kotlin test; test_http_connection_drain_before_reuse would cover all known problems that that test found.)
To fix it, I'm not necessarily suggesting draining in_post_maybe_compressed like in #81561. We can instead say that IInputFormat should drain the input until eof(); most formats do that anyway, ArrowStream might be the only one that needs a change.
You are right here, and I'm right as well.
My changes are made at the lowest level, before the format readers and below compression. They do not drain any leftover data; they only check that the data was fully read and consume the final empty chunk of the chunked transfer encoding if necessary. So they always work correctly no matter how the format reader misbehaves: the connection is either closed, or preserved if the request was fully read.
The case when the compression's final block is left in the stream is considerably rare, and then the connection is closed, which is correct behavior.
You are talking about making the data reader read all the data, which helps preserve the connection. Before my changes that led to an error; with my changes it leads to closing the connection; you are proposing how to preserve it.
```python
if (resp.getheader('Connection').lower() == 'close'):
    return
```
Don't we want to check that keepalive is working here? Seems important since we don't even print a warning when connection is closed because of leftover data. How else would we notice if the canKeepAlive() mechanism sometimes unnecessarily closes connections?
You are right. I think I changed it while I was debugging tests on the first commit, when I missed stream_is_bounded = true; for GET requests.
After I found the bug I forgot to remove these lines.
```python
if (resp.getheader('Connection').lower() == 'close'):
    return
```
Hold on, I ran this test a few times, and it seems to always get a 'close'. In the HTTPServerRequest::canKeepAlive() call, stream has > 200 KB available() in the buffer. That's probably because reading is done through a wrapper around stream (without compression it's a ReadBufferWrapper), which doesn't necessarily propagate position() updates back to stream. I.e. in_post_maybe_compressed's position() was moved to the end of the buffer, but this change will only be propagated to the inner buffer (stream) when in_post_maybe_compressed->nextImpl() is called.
So I guess what we'll do is just rely on IInputFormat draining to eof() and not try to do anything else to separately drain the chunked encoding tail in particular.
I'll add eof() in ArrowBlockInputFormat and remove half of the long comment in allowKeepAliveIFFRequestIsFullyRead().
al13n321 left a comment:
Looks good to me. Before merging, please review my commit and consider removing the kotlin test.
Failed test 02769_parallel_replicas_unavailable_shards is flaky: #81628
Why should we remove it? Your changes look OK to me.
Could you please explain why not just use something like that at the end of … :

```cpp
size_t ignored = in->tryIgnore(std::numeric_limits<size_t>::max());
if (ignored > 0)
{
#if defined(DEBUG_OR_SANITIZER_BUILD)
    throw Exception(ErrorCodes::LOGICAL_ERROR,
        "HTTP request body was not fully read. Bytes ignored: {}", ignored);
#else
    LOG_WARNING(log, "HTTP request body was not fully read. Bytes ignored: {}", ignored);
#endif
}
```

Or we could add … Or even use both … Then all other changes become unnecessary. It passes all of the new tests in … The reason I am asking is because I am currently merging these changes into my PR (#80141), and some of them look redundant.
Read this #81595 (comment)
I mostly agree about … But then there is another problem I think I've found. The …

```shell
$ cat 10-million-rows-insert.txt | curl -v 'http://localhost:8123/' -H "Transfer-Encoding: chunked" --data-binary @- 2>&1 | grep 'Connection:'
< Connection: Keep-Alive
$ cat 10-million-rows-insert.txt | curl -v 'http://localhost:8123/?send_progress_in_http_headers=1' -H "Transfer-Encoding: chunked" --data-binary @- 2>&1 | grep 'Connection:'
< Connection: Close
```

So I still propose moving the …

P.S. I am still not sure if not consuming the whole request body (!) by an HTTPRequestHandler (!) is a misbehaviour that deserves a connection close.
Good catch.
Actually this violates the HTTP protocol. We need to fix that.
Previously, the error `Invalid HTTP version string: /?query=I` might have occurred while inserting some data over HTTP. This happened because ClickHouse didn't fully consume the input bytes (`\r\n0\r\n\r\n`) and reused the connection for the subsequent requests. This behavior is now fixed.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
The server should not preserve an HTTP connection if the request has not been fully read from the socket.
Documentation entry for user-facing changes