| faulty_image_correlated | Correlated Failure | New | All the images in the containers are faulty, causing errors | Container | App |
| update_incompatible_correlated | Correlated Failure | New | The image in all the mongodb containers is updated to an incompatible version, causing errors | Container | App |
| kubelet_crash | Correlated Failure | New | The kubelet process on a worker node crashes, making all the services on the node unavailable | Cluster Management | App |
| incorrect_image | System/Application Software Failure | New | The container of the product-catalog service pulled an incorrect image | Container | App |
| incorrect_port_assignment | Misconfiguration Failure | New | The PRODUCT_CATALOG_ADDR port of the checkout service is misconfigured | Container | App |
| misconfig_app_hotel_res | System/Application Software Failure | AIOpsLab | The container of the geo service pulled an incorrect image | Container | App |
| missing_env_variable_astronomy_shop | Misconfiguration Failure | New | The CART_ADDR environment variable is missing in the frontend containers | Container | App |
| revoke_auth_mongodb-1 | Security Failure | AIOpsLab | Admin privileges in mongodb-geo are revoked | App | App |
| revoke_auth_mongodb-2 | Security Failure | AIOpsLab | Admin privileges in mongodb-rate are revoked | App | App |
| storage_user_unregistered-1 | Security Failure | AIOpsLab | The user is not registered to mongodb-geo | App | App |
| storage_user_unregistered-2 | Security Failure | AIOpsLab | The user is not registered to mongodb-rate | App | App |
| valkey_auth_disruption | Security Failure | New | The valkey password is invalidated, so dependent services cannot work | App | App |
| valkey_memory_disruption | Database Failure | New | The valkey store is in an OOM state | App | App |
| capacity_decrease_rpc_retry_storm | Metastable Failure | New | The RPC module is misconfigured; a CPU capacity decrease then triggers a retry storm | Hardware/App | App |
| gc_capacity_degradation | Metastable Failure | New | GC frequency is set too high; a load spike causes requests to stack up, leading to more GC activity and forming a self-sustaining loop | OS/App | App |
| load_spike_rpc_retry_storm | Metastable Failure | New | The RPC module is misconfigured; a load spike then triggers a retry storm | App | App |
| assign_to_non_existent_node | Cloud Management System Failure | AIOpsLab | The user-service service is assigned to a non-ready node | Cluster Management | App |
| auth_miss_mongodb | Security Failure | AIOpsLab | mongodb requires a TLS certificate, but the client fails to provide one | App | App |
| configmap_drift_hotel_reservation | Misconfiguration Failure | New | The GeoMongoAddress configuration is missing in the geo service, making it fail to connect to mongodb | App | App |
| duplicate_pvc_mounts_astronomy_shop | Cloud Management System Failure | New | Multiple pods in the mongodb-rate service try to mount the same ReadWriteOnce PVC | Cluster Management | App |
| duplicate_pvc_mounts_hotel_reservation | Cloud Management System Failure | New | Multiple pods in the mongodb-rate service try to mount the same ReadWriteOnce PVC | Cluster Management | App |
| duplicate_pvc_mounts_social_network | Cloud Management System Failure | New | Multiple pods in the mongodb-rate service try to mount the same ReadWriteOnce PVC | Cluster Management | App |
| env_variable_shadowing_astronomy_shop | Misconfiguration Failure | New | The FRONTEND_HOST environment variable is incorrectly set to localhost | Container | App |
| k8s_target_port-misconfig | Misconfiguration Failure | AIOpsLab | The target port in user-service is misconfigured | Container | App |
| liveness_probe_misconfiguration_astronomy_shop | Misconfiguration Failure | New | The healthz port is misconfigured, making the liveness probe fail; pods of the frontend service are caught in a continuous restart cycle | Cluster Management | App |
| liveness_probe_misconfiguration_hotel_reservation | Misconfiguration Failure | New | The healthz port is misconfigured, making the liveness probe fail; pods of the recommendation service are caught in a continuous restart cycle | Cluster Management | App |
| liveness_probe_misconfiguration_social_network | Misconfiguration Failure | New | The healthz port is misconfigured, making the liveness probe fail; pods of the user-service service are caught in a continuous restart cycle | Cluster Management | App |
| liveness_probe_too_aggressive_astronomy_shop | Cloud Management System Failure | New | Pods of the aux-service service are caught in a continuous restart cycle | Cluster Management | App |
| liveness_probe_too_aggressive_hotel_reservation | Cloud Management System Failure | New | Pods of the aux-service service are caught in a continuous restart cycle | Cluster Management | App |
| liveness_probe_too_aggressive_social_network | Cloud Management System Failure | New | Pods of the aux-service service are caught in a continuous restart cycle | Cluster Management | App |
| missing_configmap_hotel_reservation | Cloud Management System Failure | New | The mongo-geo-script configmap is missing in the mongodb-geo service | App | App |
| missing_configmap_social_network | Cloud Management System Failure | New | The media-mongodb configmap is missing in the media-mongodb service | App | App |
| missing_service_astronomy_shop | Cloud Management System Failure | New | The ad service is missing in the astronomy_shop app | App | App |
| missing_service_hotel_reservation | Cloud Management System Failure | New | The mongodb-rate service is missing in the hotel_reservation app | App | App |
| missing_service_social_network | Cloud Management System Failure | New | The user-service service is missing in the social_network app | App | App |
| namespace_memory_limit | Cloud Management System Failure | New | Pods in the search service get stuck in Pending because they cannot be scheduled due to the namespace memory limit | OS | App |
| pod_anti_affinity_deadlock | Cloud Management System Failure | New | Pods in the user-service service get stuck in Pending because they cannot be scheduled due to an anti-affinity rule | Cluster Management | App |
| persistent_volume_affinity_violation | Cloud Management System Failure | New | Pods in the user-service service get stuck in Pending because they cannot be scheduled, as they cannot mount the PVC | Cluster Management | App |
| pvc_claim_mismatch | Cloud Management System Failure | New | Pods in mongodb get stuck in Pending because they cannot be scheduled, as they cannot mount the PVC | Cluster Management | App |
| rbac_misconfiguration | Misconfiguration Failure | New | Init containers are unable to perform required operations due to insufficient permissions | Cluster Management | App |
| readiness_probe_misconfiguration_astronomy_shop | Misconfiguration Failure | New | Pods in frontend never enter the ready state, leaving the service unable to handle requests | Cluster Management | App |
| readiness_probe_misconfiguration_hotel_reservation | Misconfiguration Failure | New | Pods in frontend never enter the ready state, leaving the service unable to handle requests | Cluster Management | App |
| readiness_probe_misconfiguration_social_network | Misconfiguration Failure | New | Pods in user-service never enter the ready state, leaving the service unable to handle requests | Cluster Management | App |
| resource_request_too_large | Cloud Management System Failure | New | Pods in mongodb-rate never get scheduled because the container requests more memory than the limit on every node | Container | App |
| resource_request_too_small | Cloud Management System Failure | New | Pods in mongodb-rate never get scheduled because the container requests more memory than the limit on every node | Container | App |
| rolling_update_misconfigured_hotel_reservation | Misconfiguration Failure | New | No pods are available because they get stuck with a contradictory update configuration | Cluster Management | App |
| rolling_update_misconfigured_social_network | Misconfiguration Failure | New | No pods are available because they get stuck with a contradictory update configuration | Container | App |
| scale_pod_zero_social_net | Cloud Management System Failure | AIOpsLab | The replica count is set to 0, leaving no available pods and making the service unavailable | Cluster Management | App |
| service_dns_resolution_failure_astronomy_shop | Network Failure | New | The CoreDNS configuration is modified, making DNS resolution for the frontend service fail | Cluster Management | App |
| service_dns_resolution_failure_social_network | Network Failure | New | The CoreDNS configuration is modified, making DNS resolution for the user-service service fail | Cluster Management | App |
| sidecar_port_conflict_astronomy_shop | Misconfiguration Failure | New | The pods fail to start because a malicious sidecar container competes with the main container for the port | Container | App |
| sidecar_port_conflict_hotel_reservation | Misconfiguration Failure | New | The pods fail to start because a malicious sidecar container competes with the main container for the port | Container | App |
| sidecar_port_conflict_social_network | Misconfiguration Failure | New | The pods fail to start because a malicious sidecar container competes with the main container for the port | Container | App |
| stale_coredns_config_astronomy_shop | Network Failure | New | All communication within the cluster is interrupted because DNS resolution for all services fails | Cluster Management | App |
| stale_coredns_config_social_network | Network Failure | New | All communication within the cluster is interrupted because DNS resolution for all services fails | Cluster Management | App |
| taint_no_toleration_social_network | Cloud Management System Failure | New | The target pod cannot be scheduled because it has no toleration for any available node's taint | Cluster Management | App |
| wrong_bin_usage | System/Application Software Failure | AIOpsLab | The pod starts with the wrong binary | Container | App |
| wrong_dns_policy_astronomy_shop | Network Failure | New | All communication within the cluster is interrupted due to a wrong DNS resolution policy in the cluster | Cluster Management | App |
| wrong_dns_policy_hotel_reservation | Network Failure | New | All communication within the cluster is interrupted due to a wrong DNS resolution policy in the cluster | Cluster Management | App |
| wrong_dns_policy_social_network | Network Failure | New | All communication within the cluster is interrupted due to a wrong DNS resolution policy in the cluster | Cluster Management | App |
| wrong_service_selector_astronomy_shop | Misconfiguration Failure | New | No pods are available in the service due to a wrong selector policy | Cluster Management | App |
| wrong_service_selector_hotel_reservation | Misconfiguration Failure | New | No pods are available in the service due to a wrong selector policy | Cluster Management | App |
| wrong_service_selector_social_network | Misconfiguration Failure | New | No pods are available in the service due to a wrong selector policy | Cluster Management | App |
| astronomy_shop_ad_service_failure | System/Application Software Failure | AIOpsLab | The ad service fails | App | App |
| astronomy_shop_ad_service_high_cpu | System/Application Software Failure | AIOpsLab | Triggers high CPU load in the ad service | Hardware | App |
| astronomy_shop_ad_service_manual_gc | System/Application Software Failure | AIOpsLab | Triggers full manual garbage collections in the ad service | OS | App |
| astronomy_shop_cart_service_failure | System/Application Software Failure | AIOpsLab | Fails the cart service | App | App |
| astronomy_shop_ad_service_image_slow_load | System/Application Software Failure | AIOpsLab | Images load slowly in the frontend | Container | App |
| astronomy_shop_payment_service_failure | System/Application Software Failure | AIOpsLab | Fails n% of payment service charge requests | App | App |
| astronomy_shop_payment_service_unreachable | System/Application Software Failure | AIOpsLab | The payment service is unavailable | App | App |
| astronomy_shop_product_catalog_service_failure | System/Application Software Failure | AIOpsLab | Fails the product_catalog service for a specific product | App | App |
| astronomy_shop_recommendation_service_cache_failure | System/Application Software Failure | AIOpsLab | Fails the recommendation service cache | App | App |
| kafka_queue_problems | System/Application Software Failure | AIOpsLab | Overloads the Kafka queue while simultaneously introducing a consumer-side delay, leading to a lag spike | App | App |
| loadgenerator_flood_homepage | System/Application Software Failure | AIOpsLab | Floods the frontend with a large number of requests | App | App |
| trainticket_f17_nested_sql_select_clause_error | System/Application Software Failure | New | Too many nested 'select' and 'from' clauses in the constructed SQL statement | App | App |
| trainticket_f22_sql_column_name_mismatch_error | System/Application Software Failure | New | The constructed SQL statement includes a column name in the 'select' part that does not match its 'from' part | App | App |
| read_error | Hardware Component Failure | New | Pods on the node encounter read errors when accessing storage | Hardware | App |
| latent_sector_error | Hardware Component Failure | New | Latent sector errors in storage cause I/O errors when mongodb-geo reads bad blocks, causing pod crashes | Hardware | App |
| silent_data_corruption | Hardware Component Failure | New | Silent data corruption in storage (injected via dm-flakey) causes MongoDB to crash when reading corrupted WiredTiger data files | Hardware | App |
| ingress_misroute | Network Failure | New | Kubernetes Ingress rules are modified to route traffic from specific paths to incorrect backend services, simulating request misrouting caused by routing configuration errors or load-balancing misconfigurations | Cluster Management | App |
| network_policy_block | Network Failure | New | Service isolation issues caused by network partitioning, firewall configuration errors, or overly strict security policies | Cluster Management | App |
| social_net_hotel_res_astro_shop_concurrent_failures | Multiple Independent Failures | New | / | / | / |
| workload_imbalance | System/Application Software Failure | New | kube-proxy is modified so that the workload is not distributed evenly across pods | Cluster Management | App |
| operator_overload_replicas | Cloud Management System Failure | New | The TiDB replica count is set to a huge number, leaving many pods stuck | Cluster Management | App |
| operator_non_existent_storage | Cloud Management System Failure | New | The StorageClass name is set to an invalid value, leaving the PVC stuck in Pending | Cluster Management | App |
| operator_invalid_affinity_toleration | Misconfiguration Failure | New | An invalid toleration effect field is set, making pods unschedulable | Cluster Management | App |
| operator_security_context_fault | Security Failure | New | runAsUser is set to an invalid value, causing pods to crash | Cluster Management | App |
| operator_wrong_update_strategy_fault | Misconfiguration Failure | New | statefulSetUpdateStrategy is set to an invalid value, making pods unable to restart and the cluster unable to roll out updates | Cluster Management | App |
| service_port_conflict_astronomy_shop | Misconfiguration Failure | New | Pods of the ad service fail to schedule because a hostPort conflicts with another service in a different namespace | Cluster Management | App |
| service_port_conflict_hotel_reservation | Misconfiguration Failure | New | Pods of the recommendation service fail to schedule because a hostPort conflicts with another service in a different namespace | Cluster Management | App |
| service_port_conflict_social_network | Misconfiguration Failure | New | Pods of the media-service service fail to schedule because a hostPort conflicts with another service in a different namespace | Cluster Management | App |
| top_of_rack_router_failure_hotel_reservation | Network Failure | New | A top-of-rack (ToR) router failure partitions a subset of nodes from the rest of the cluster, leading to partial service reachability loss and cross-service communication failures | Cluster Management | App |
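Several entries above (e.g. `k8s_target_port-misconfig`, `incorrect_port_assignment`) reduce to a Service pointing at a port that no container actually listens on. A minimal sketch of how such a fault could be injected and detected offline, assuming hypothetical manifest snippets and port numbers (the real benchmark's injection mechanism may differ):

```python
import copy

def inject_target_port_fault(service: dict, bad_port: int) -> dict:
    """Return a copy of a Service manifest whose first port's targetPort
    is rewritten to a port nothing listens on (the injected fault)."""
    faulty = copy.deepcopy(service)
    faulty["spec"]["ports"][0]["targetPort"] = bad_port
    return faulty

def target_ports_match(service: dict, deployment: dict) -> bool:
    """True iff every Service targetPort appears among the containerPorts
    declared by the Deployment's pod template (a simple offline check)."""
    container_ports = {
        p["containerPort"]
        for c in deployment["spec"]["template"]["spec"]["containers"]
        for p in c.get("ports", [])
    }
    return all(p["targetPort"] in container_ports
               for p in service["spec"]["ports"])

# Hypothetical manifests for a user-service-like Deployment/Service pair.
deployment = {"spec": {"template": {"spec": {"containers": [
    {"name": "user-service", "ports": [{"containerPort": 9090}]}]}}}}
service = {"spec": {"ports": [{"port": 9090, "targetPort": 9090}]}}

assert target_ports_match(service, deployment)      # healthy baseline
faulty = inject_target_port_fault(service, 9999)    # inject the mismatch
assert not target_ports_match(faulty, deployment)   # the checker flags it
```

The checker only compares manifests; at runtime the same fault surfaces as connection refusals or empty endpoints rather than a scheduling error, which is what makes this class of misconfiguration easy to inject and nontrivial to localize.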