Return an explicit 500 if there's an exception generating metrics#85
Return an explicit 500 if there's an exception generating metrics#85FauxFaux wants to merge 1 commit intoprometheus:masterfrom
Conversation
| class MetricsHandler(BaseHTTPRequestHandler): | ||
| def do_GET(self): | ||
| try: | ||
| content = generate_latest(core.REGISTRY) |
There was a problem hiding this comment.
I'd suggest removing the try and letting the exception propagate up.
There was a problem hiding this comment.
That causes an empty body to be returned, not a 500. I felt that a 500 was more useful than just closing the tcp connection (although I suspect Prometheus would handle the case in the same way).
def do_GET(self):
- try:
- content = generate_latest(core.REGISTRY)
- except:
- logging.exception('error generating response')
- self.send_error(500, 'error generating metrics')
- self.end_headers()
- return
-
+ content = generate_latest(core.REGISTRY)
self.send_response(200)% curl -v http://localhost:23456/
* Trying ::1...
* connect to ::1 port 23456 failed: Connection refused
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 23456 (#0)
> GET / HTTP/1.1
> Host: localhost:23456
> User-Agent: curl/7.47.0
> Accept: */*
>
* Empty reply from server
* Connection #0 to host localhost left intact
curl: (52) Empty reply from server
There was a problem hiding this comment.
Prometheus would handle it the same. If we're going to explicitly return a 500 we should also include the full exception in the response.
There was a problem hiding this comment.
Many security policies discourage returning the stack-trace (or any details about the actual problem) in the response from 500 errors by default. Maybe you could argue that people typically authenticate these metrics endpoints, so it's not a problem? I don't, so would rather the stack-trace wasn't returned. If you decide to put it in, it will just be stripped by our load balancers anyway, so I don't mind either way.
There was a problem hiding this comment.
I would generally expect metrics endpoints not to be exposed to the outside world, and even then for the information you can gleam from the errors to be minimal.
|
Obsoleted by #119 |
The current behaviour observed in #84 is pretty bad. This implements the simplest possible fix: if there's any exception of any kind generating metrics, log it as we are now, but change the response code to a 500.
send_erroris used, which dumps a random bit of html into the response, which doesn't seem like a problem.send_response(500)could be used instead, to get a 500 with a blank body.