When I run the operator_wrong_operator_image problem, I get 4 alerts with no namespace attached and one alert with the hotel-reservation namespace. I don't have the hotel-reservation namespace deployed!
I added a logger to print the firing alerts and their namespaces. Since none of the alerts match the expected namespace (tidb-cluster), they all get filtered out. As a result, the oracle reports "[WAIT] No alerts yet" and keeps waiting until it times out.
Run logs:
INFO - all.sregym.conductor - [ENV] Injected fault
INFO - all.sregym.conductor - [WAIT] Waiting for alerts to fire in namespace 'tidb-cluster' (timeout=600s)…
INFO - sregym.conductor.oracles.alert_oracle - All Prometheus alerts (5 total, filtering for namespace='tidb-cluster'):
INFO - sregym.conductor.oracles.alert_oracle - [firing] HTTPProbeFailure — namespace=hotel-reservation
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBInstanceNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBStatefulSetNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBStatefulSetNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBStatefulSetNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - All Prometheus alerts (5 total, filtering for namespace='tidb-cluster'):
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBInstanceNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBStatefulSetNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBStatefulSetNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBStatefulSetNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] HTTPProbeFailure — namespace=hotel-reservation
INFO - sregym.conductor.oracles.alert_oracle - All Prometheus alerts (5 total, filtering for namespace='tidb-cluster'):
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBInstanceNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBStatefulSetNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBStatefulSetNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBStatefulSetNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] HTTPProbeFailure — namespace=hotel-reservation
INFO - sregym.conductor.oracles.alert_oracle - All Prometheus alerts (5 total, filtering for namespace='tidb-cluster'):
INFO - sregym.conductor.oracles.alert_oracle - [firing] HTTPProbeFailure — namespace=hotel-reservation
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBInstanceNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBStatefulSetNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBStatefulSetNotReady — namespace=None
INFO - sregym.conductor.oracles.alert_oracle - [firing] TiDBStatefulSetNotReady — namespace=None
INFO - all.sregym.conductor - [WAIT] No alerts yet — 38/600s elapsed
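To make the failure mode concrete, here is a minimal sketch of strict namespace filtering applied to the five alerts from the logs above. It is not the actual alert_oracle code: the alert dict shape is assumed to mirror entries from Prometheus' /api/v1/alerts endpoint ("labels", "state"), and the function names are hypothetical.

```python
from typing import Any, Optional

# Assumed alert shape, modeled on Prometheus /api/v1/alerts entries.
Alert = dict[str, Any]


def alert_namespace(alert: Alert) -> Optional[str]:
    """Hypothetical helper: return the namespace label, or None if the rule set none."""
    return alert.get("labels", {}).get("namespace")


def filter_by_namespace(alerts: list[Alert], namespace: str) -> list[Alert]:
    """Hypothetical filter: keep firing alerts whose namespace label equals `namespace`.

    Alerts without a namespace label never match, so they are silently dropped.
    """
    return [
        a for a in alerts
        if a.get("state") == "firing" and alert_namespace(a) == namespace
    ]


# The five firing alerts reported in the run logs.
alerts = [
    {"labels": {"alertname": "HTTPProbeFailure", "namespace": "hotel-reservation"}, "state": "firing"},
    {"labels": {"alertname": "TiDBInstanceNotReady"}, "state": "firing"},
    {"labels": {"alertname": "TiDBStatefulSetNotReady"}, "state": "firing"},
    {"labels": {"alertname": "TiDBStatefulSetNotReady"}, "state": "firing"},
    {"labels": {"alertname": "TiDBStatefulSetNotReady"}, "state": "firing"},
]

print(len(filter_by_namespace(alerts, "tidb-cluster")))  # 0, so the oracle keeps waiting
```

Because the TiDB alerts carry no namespace label at all, a strict equality check like this can never match them, no matter which namespace the oracle waits on.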
Kubernetes namespaces (hotel-reservation is not there):
$ kubectl get namespaces
NAME              STATUS   AGE
default           Active   6d5h
fleetcast         Active   4m17s
ingress-nginx     Active   18m
kube-flannel      Active   6d4h
kube-node-lease   Active   6d5h
kube-public       Active   6d5h
kube-system       Active   6d5h
observe           Active   6d4h
openebs           Active   6d4h
sregym            Active   6d4h
tidb-cluster      Active   16m
tidb-operator     Active   16m
A more detailed alert print shows that the hotel-reservation alert comes from an HTTP probe against ingress-nginx. However, that namespace was deployed by the fleetcast app, so I am not sure why the alert it fires gets tagged as hotel-reservation.
More detailed alerts:
[?] TiDBInstanceNotReady [firing]
Detail: TiDB instance basic-tidb.tidb-cluster.svc.cluster.local:10080 has been unreachable or has no connections for more than 2 minutes.
[?] TiDBStatefulSetNotReady [firing]
Detail: StatefulSet basic-pd has 1 ready replicas but desired is higher
[?] TiDBStatefulSetNotReady [firing]
Detail: StatefulSet basic-tikv has 1 ready replicas but desired is higher
[?] TiDBStatefulSetNotReady [firing]
Detail: StatefulSet basic-tidb has 1 ready replicas but desired is higher
[hotel-reservation] HTTPProbeFailure [firing]
Detail: HTTP probe to http://ingress-nginx-controller.ingress-nginx/api/hotels?inDate=2015-04-09&outDate=2015-04-10&lat=38.0235&lon=-122.095 in hotel-reservation is failing.
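To see where (or whether) each alert's namespace label comes from, a small script that dumps the full label set of every active alert could help. This is a sketch that assumes Prometheus is reachable at some base URL (for example through kubectl port-forward); it uses the standard Prometheus HTTP API endpoint GET /api/v1/alerts, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any


def fetch_alerts(prometheus_url: str) -> dict[str, Any]:
    """Fetch active alerts from Prometheus' HTTP API (GET /api/v1/alerts)."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/alerts", timeout=10) as resp:
        return json.load(resp)


def describe_alerts(payload: dict[str, Any]) -> list[str]:
    """Hypothetical helper: format each alert with its namespace and all other labels.

    Showing every label makes it easier to tell whether a namespace label was
    attached by the alerting rule itself or is missing entirely.
    """
    lines = []
    for alert in payload.get("data", {}).get("alerts", []):
        labels = alert.get("labels", {})
        name = labels.get("alertname", "?")
        ns = labels.get("namespace")
        # Remaining labels, sorted by key for stable output.
        rest = {k: v for k, v in sorted(labels.items()) if k not in ("alertname", "namespace")}
        lines.append(f"[{alert.get('state')}] {name} namespace={ns} labels={rest}")
    return lines
```

For example, after port-forwarding the Prometheus service to localhost:9090, `for line in describe_alerts(fetch_alerts("http://localhost:9090")): print(line)` would print one line per active alert.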