Description
What happened so far: We made a spike (#258) for Vector logging in general and decided on an architecture (#259). Now we want to define a CRD in an ADR (#261).
For this, we want to do another spike, based on the decisions made in #259, to serve as the basis for the ADR in #261. This new spike can build on Teo's old code: stackabletech/zookeeper-operator#450
Previous spikes:
Implementation status
The current status of the implementation can be seen in the logging demo.
Goal
The goal of this implementation is a production-ready logging solution for ZooKeeper and everything related to it. Every problem should be solvable just by looking at the collected logs. (Improving the logs so that this is actually possible is out of scope, as are the Kubernetes events.) This implementation will be the template for the other operators, so it is worth doing it right. The decisions documented in the ADR go into the implementation, and the insights gained during the implementation flow back into the ADR. I had planned to also gather the logs of the operator itself, but this would require a lot of changes and new decisions, so I will postpone it and finish the log aggregation of the product first.
Tasks
- Allow custom configuration files
- Analyze the bug that occurs when levelThreshold is set for file but not for console.
- Also use a fragment for the ZookeeperConfig.
- Support log4j (currently only logback is supported, which is used in ZooKeeper 3.8.0).
- Adapt the ADR so that the logging of sidecar containers can be configured.
- Collect the logs of all sidecar containers like the init container and the Vector agent itself.
- Implement log level configuration.
- Move common code to operator-rs.
- Document logging in the concepts section of the top-level documentation. (assigned to @fhennig)
- Improve the documentation of the demo.
- Write an integration test.
- Discuss how production-ready the logging stack in stackablectl should be. Currently, the default configuration of the OpenSearch security plugin is used, which is insecure.
- Test hot-reloading of log configuration.
- Test log file rollover.
- Forward logs to the T2 Vector aggregator.
- Add additional information to the log entries like namespace, cluster, group, and role group.
- Analyze empty log entries in T2.
- Document the calculation of the maximum log file size.
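
The fragment-related tasks above (a levelThreshold given for file but not for console, and using a fragment for the ZookeeperConfig) both come down to overlaying a partial, user-supplied configuration on a set of defaults. The following is a minimal sketch of that idea only, not the operator-rs implementation. Apart from `levelThreshold`, all key names (`console`, `file`, `loggers`, `ROOT`, `level`), the default values, and the helper `merge_fragment` are hypothetical.

```python
from copy import deepcopy

# Hypothetical defaults; the values the operator really uses may differ.
DEFAULTS = {
    "console": {"levelThreshold": "INFO"},
    "file": {"levelThreshold": "INFO"},
    "loggers": {"ROOT": {"level": "INFO"}},
}


def merge_fragment(defaults, fragment):
    """Recursively overlay a partial config (fragment) on the defaults.

    Keys missing from the fragment, or set to None, keep their default.
    """
    merged = deepcopy(defaults)
    for key, value in fragment.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_fragment(merged[key], value)
        elif value is not None:
            # A concrete value in the fragment overrides the default.
            merged[key] = deepcopy(value)
    return merged


# Only the file threshold is given; console falls back to its default.
user_fragment = {"file": {"levelThreshold": "DEBUG"}}
effective = merge_fragment(DEFAULTS, user_fragment)
```

With such a merge, a threshold that is set for only one of the two outputs leaves the other one at its default instead of undefined, which is the case the levelThreshold task is concerned with.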
Overview of the involved branches:
- https://github.com/stackabletech/docker-images/tree/logging (Logging docker-images#268)
- https://github.com/stackabletech/operator-rs/tree/logging ([Merged by Bors] - Logging operator-rs#517)
- https://github.com/stackabletech/zookeeper-operator/tree/logging ([Merged by Bors] - Logging zookeeper-operator#588)
- https://github.com/stackabletech/stackablectl/tree/logging ([Merged by Bors] - Logging stackablectl#187)
- https://github.com/stackabletech/documentation/tree/feature/logging-docs (Logging platform level docs documentation#326)
Temporary Docker images
see https://repo.stackable.tech/#browse/browse:docker:v2%2Fsandbox%2Flogging
Acceptance Criteria
- Log plain text to stdout
- Log structured entries (JSON/XML) to file (or send them to the sidecar Vector agent directly in some other way)
- Vector agent as a sidecar
- Read file with the Vector agent sidecar
- Deploy a Vector aggregator
- Deploy OpenSearch
- Forward log entries to the Vector aggregator
- Vector aggregator pushes entries to OpenSearch
- Use the secret operator for the certificate
- Log level should be configurable per logger in the ZookeeperCluster resource
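
To illustrate what "configurable per logger" means in combination with a threshold per output, here is a small sketch of how an effective level could be resolved. It assumes the usual logback/log4j behaviour that a logger without an explicit level inherits from its nearest configured ancestor, and finally from the root logger. The function names, the `ROOT` key, and the level list are assumptions made for this example and are not taken from the CRD.

```python
# Levels ordered from most to least verbose (assumed set for this sketch).
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def effective_level(loggers, name):
    """Return the level of the nearest configured ancestor of `name`."""
    parts = name.split(".")
    while parts:
        candidate = ".".join(parts)
        if candidate in loggers:
            return loggers[candidate]
        parts.pop()  # walk up one level in the logger hierarchy
    return loggers.get("ROOT", "INFO")


def is_emitted(loggers, threshold, name, level):
    """An entry is written if it passes both the logger level and the
    threshold of the output (console or file)."""
    rank = LEVELS.index
    minimum = max(rank(effective_level(loggers, name)), rank(threshold))
    return rank(level) >= minimum


loggers = {"ROOT": "INFO", "org.apache.zookeeper": "DEBUG"}
```

In this sketch, a DEBUG entry from `org.apache.zookeeper.server` passes an output with threshold DEBUG but is dropped by an output with threshold INFO, so the console can stay quiet while the file that the Vector agent reads is more verbose.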