| Problem ID | Type | Origin | Failure to Simulate | Fault Level | Failure Level |
| --- | --- | --- | --- | --- | --- |
| faulty_image_correlated | Correlated Failure | New | The images in all containers are faulty, causing errors | Container | App |
| update_incompatible_correlated | Correlated Failure | New | The images in all mongodb containers are updated to an incompatible version, causing errors | Container | App |
| kubelet_crash | Correlated Failure | New | The kubelet process on worker nodes crashes, making all services on those nodes unavailable | Cluster Management | App |
| incorrect_image | System/Application Software Failure | New | The container of the product-catalog service pulls an incorrect image | Container | App |
| incorrect_port_assignment | Misconfiguration Failure | New | The PRODUCT_CATALOG_ADDR port of the checkout service is misconfigured | Container | App |
| misconfig_app_hotel_res | System/Application Software Failure | AIOpsLab | The container of the geo service pulls an incorrect image | Container | App |
| missing_env_variable_astronomy_shop | Misconfiguration Failure | New | The CART_ADDR environment variable is missing from the frontend containers | Container | App |
| revoke_auth_mongodb-1 | Security Failure | AIOpsLab | Admin privileges in mongodb-geo are revoked | App | App |
| revoke_auth_mongodb-2 | Security Failure | AIOpsLab | Admin privileges in mongodb-rate are revoked | App | App |
| storage_user_unregistered-1 | Security Failure | AIOpsLab | The user is not registered with mongodb-geo | App | App |
| storage_user_unregistered-2 | Security Failure | AIOpsLab | The user is not registered with mongodb-rate | App | App |
| valkey_auth_disruption | Security Failure | New | The password in valkey is invalidated, so dependent services cannot work | App | App |
| valkey_memory_disruption | Database Failure | New | The valkey store is in an OOM (out-of-memory) state | App | App |
| capacity_decrease_rpc_retry_storm | Metastable Failure | New | The RPC module is misconfigured; a CPU containment then triggers a retry storm | Hardware/App | App |
| gc_capacity_degradation | Metastable Failure | New | GC frequency is set high; a load spike then causes requests to pile up, leading to more GC activity and forming a self-sustaining loop | OS/App | App |
| load_spike_rpc_retry_storm | Metastable Failure | New | The RPC module is misconfigured; a load spike then triggers a retry storm | App | App |
| assign_to_non_existent_node | Cloud Management System Failure | AIOpsLab | The user-service service is assigned to a non-ready node | Cluster Management | App |
| auth_miss_mongodb | Security Failure | AIOpsLab | mongodb requires a TLS certificate, which the client fails to provide | App | App |
| configmap_drift_hotel_reservation | Misconfiguration Failure | New | The GeoMongoAddress configuration is missing from the geo service, so it fails to connect to mongodb | App | App |
| duplicate_pvc_mounts_astronomy_shop | Cloud Management System Failure | New | Multiple pods in the mongodb-rate service try to mount the same ReadWriteOnce PVC | Cluster Management | App |
| duplicate_pvc_mounts_hotel_reservation | Cloud Management System Failure | New | Multiple pods in the mongodb-rate service try to mount the same ReadWriteOnce PVC | Cluster Management | App |
| duplicate_pvc_mounts_social_network | Cloud Management System Failure | New | Multiple pods in the mongodb-rate service try to mount the same ReadWriteOnce PVC | Cluster Management | App |
| env_variable_shadowing_astronomy_shop | Misconfiguration Failure | New | The FRONTEND_HOST environment variable is incorrectly set to localhost | Container | App |
| k8s_target_port-misconfig | Misconfiguration Failure | AIOpsLab | The target port in user-service is misconfigured | Container | App |
| liveness_probe_misconfiguration_astronomy_shop | Misconfiguration Failure | New | The healthz port is misconfigured, so the liveness probe fails and pods of the frontend service are caught in a continuous restart cycle | Cluster Management | App |
| liveness_probe_misconfiguration_hotel_reservation | Misconfiguration Failure | New | The healthz port is misconfigured, so the liveness probe fails and pods of the recommendation service are caught in a continuous restart cycle | Cluster Management | App |
| liveness_probe_misconfiguration_social_network | Misconfiguration Failure | New | The healthz port is misconfigured, so the liveness probe fails and pods of the user-service service are caught in a continuous restart cycle | Cluster Management | App |
| liveness_probe_too_aggressive_astronomy_shop | Cloud Management System Failure | New | Pods of the aux-service service are caught in a continuous restart cycle | Cluster Management | App |
| liveness_probe_too_aggressive_hotel_reservation | Cloud Management System Failure | New | Pods of the aux-service service are caught in a continuous restart cycle | Cluster Management | App |
| liveness_probe_too_aggressive_social_network | Cloud Management System Failure | New | Pods of the aux-service service are caught in a continuous restart cycle | Cluster Management | App |
| missing_configmap_hotel_reservation | Cloud Management System Failure | New | The mongo-geo-script configmap is missing from the mongodb-geo service | App | App |
| missing_configmap_social_network | Cloud Management System Failure | New | The media-mongodb configmap is missing from the media-mongodb service | App | App |
| missing_service_astronomy_shop | Cloud Management System Failure | New | The ad service is missing from the astronomy_shop app | App | App |
| missing_service_hotel_reservation | Cloud Management System Failure | New | The mongodb-rate service is missing from the hotel_reservation app | App | App |
| missing_service_social_network | Cloud Management System Failure | New | The user-service service is missing from the social_network app | App | App |
| namespace_memory_limit | Cloud Management System Failure | New | Pods in the search service get stuck in Pending because they cannot be scheduled due to the memory limit | OS | App |
| pod_anti_affinity_deadlock | Cloud Management System Failure | New | Pods in the user-service service get stuck in Pending because they cannot be scheduled due to the affinity rule | Cluster Management | App |
| persistent_volume_affinity_violation | Cloud Management System Failure | New | Pods in the user-service service get stuck in Pending: they cannot be scheduled because they cannot mount the PVC | Cluster Management | App |
| pvc_claim_mismatch | Cloud Management System Failure | New | Pods in mongodb get stuck in Pending: they cannot be scheduled because they cannot mount the PVC | Cluster Management | App |
| rbac_misconfiguration | Misconfiguration Failure | New | Init containers are unable to perform required operations due to insufficient permissions | Cluster Management | App |
| readiness_probe_misconfiguration_astronomy_shop | Misconfiguration Failure | New | Pods in frontend never enter the ready state, so the service cannot handle requests | Cluster Management | App |
| readiness_probe_misconfiguration_hotel_reservation | Misconfiguration Failure | New | Pods in frontend never enter the ready state, so the service cannot handle requests | Cluster Management | App |
| readiness_probe_misconfiguration_social_network | Misconfiguration Failure | New | Pods in user-service never enter the ready state, so the service cannot handle requests | Cluster Management | App |
| resource_request_too_large | Cloud Management System Failure | New | Pods in mongodb-rate are never scheduled because the container requires memory that exceeds the limit on every node | Container | App |
| resource_request_too_small | Cloud Management System Failure | New | Pods in mongodb-rate are never scheduled because the container requires memory that exceeds the limit on every node | Container | App |
| rolling_update_misconfigured_hotel_reservation | Misconfiguration Failure | New | No pods are available because they are stuck with a contradictory update configuration | Cluster Management | App |
| rolling_update_misconfigured_social_network | Misconfiguration Failure | New | No pods are available because they are stuck with a contradictory update configuration | Container | App |
| scale_pod_zero_social_net | Cloud Management System Failure | AIOpsLab | The replica count is set to 0, leaving no available pods and making the service unavailable | Cluster Management | App |
| service_dns_resolution_failure_astronomy_shop | Network Failure | New | The CoreDNS configuration is modified so that DNS resolution for the frontend service fails | Cluster Management | App |
| service_dns_resolution_failure_social_network | Network Failure | New | The CoreDNS configuration is modified so that DNS resolution for the user-service service fails | Cluster Management | App |
| sidecar_port_conflict_astronomy_shop | Misconfiguration Failure | New | The pods fail to start because a malicious sidecar container competes with the main container for the port | Container | App |
| sidecar_port_conflict_hotel_reservation | Misconfiguration Failure | New | The pods fail to start because a malicious sidecar container competes with the main container for the port | Container | App |
| sidecar_port_conflict_social_network | Misconfiguration Failure | New | The pods fail to start because a malicious sidecar container competes with the main container for the port | Container | App |
| stale_coredns_config_astronomy_shop | Network Failure | New | All communication within the cluster is interrupted because DNS resolution for all services fails | Cluster Management | App |
| stale_coredns_config_social_network | Network Failure | New | All communication within the cluster is interrupted because DNS resolution for all services fails | Cluster Management | App |
| taint_no_toleration_social_network | Cloud Management System Failure | New | The target pod cannot be scheduled because it does not tolerate the taints on any available node | Cluster Management | App |
| wrong_bin_usage | System/Application Software Failure | AIOpsLab | The pod starts with the wrong binary file | Container | App |
| wrong_dns_policy_astronomy_shop | Network Failure | New | All communication within the cluster is interrupted due to a wrong DNS resolution policy in the cluster | Cluster Management | App |
| wrong_dns_policy_hotel_reservation | Network Failure | New | All communication within the cluster is interrupted due to a wrong DNS resolution policy in the cluster | Cluster Management | App |
| wrong_dns_policy_social_network | Network Failure | New | All communication within the cluster is interrupted due to a wrong DNS resolution policy in the cluster | Cluster Management | App |
| wrong_service_selector_astronomy_shop | Misconfiguration Failure | New | No pods are available in the service due to a wrong selector policy | Cluster Management | App |
| wrong_service_selector_hotel_reservation | Misconfiguration Failure | New | No pods are available in the service due to a wrong selector policy | Cluster Management | App |
| wrong_service_selector_social_network | Misconfiguration Failure | New | No pods are available in the service due to a wrong selector policy | Cluster Management | App |
| astronomy_shop_ad_service_failure | System/Application Software Failure | AIOpsLab | The Ad service fails | App | App |
| astronomy_shop_ad_service_high_cpu | System/Application Software Failure | AIOpsLab | Triggers high CPU load in the Ad service | Hardware | App |
| astronomy_shop_ad_service_manual_gc | System/Application Software Failure | AIOpsLab | Triggers full manual garbage collections in the Ad service | OS | App |
| astronomy_shop_cart_service_failure | System/Application Software Failure | AIOpsLab | Fails the cart service | App | App |
| astronomy_shop_ad_service_image_slow_load | System/Application Software Failure | AIOpsLab | Slows image loading in the frontend | Container | App |
| astronomy_shop_payment_service_failure | System/Application Software Failure | AIOpsLab | Fails n% of payment service charge requests | App | App |
| astronomy_shop_payment_service_unreachable | System/Application Software Failure | AIOpsLab | The payment service is unavailable | App | App |
| astronomy_shop_product_catalog_service_failure | System/Application Software Failure | AIOpsLab | Fails the product_catalog service on a specific product | App | App |
| astronomy_shop_recommendation_service_cache_failure | System/Application Software Failure | AIOpsLab | Fails the recommendation service cache | App | App |
| kafka_queue_problems | System/Application Software Failure | AIOpsLab | Overloads the Kafka queue while simultaneously introducing a consumer-side delay, leading to a lag spike | App | App |
| loadgenerator_flood_homepage | System/Application Software Failure | AIOpsLab | Floods the frontend with a large number of requests | App | App |
| trainticket_f17_nested_sql_select_clause_error | System/Application Software Failure | New | The constructed SQL statement contains too many nested 'select' and 'from' clauses | App | App |
| trainticket_f22_sql_column_name_mismatch_error | System/Application Software Failure | New | The 'select' part of the constructed SQL statement includes a column name that does not match its 'from' part | App | App |
| read_error | Hardware Component Failure | New | Pods on the node encounter read errors when accessing storage | Hardware | App |
| latent_sector_error | Hardware Component Failure | New | Latent sector errors in storage cause I/O errors when mongodb-geo reads bad blocks, causing pod crashes | Hardware | App |
| silent_data_corruption | Hardware Component Failure | New | Silent data corruption in storage, injected with dm-flakey, causes MongoDB to crash when reading corrupted WiredTiger data files | Hardware | App |
| ingress_misroute | Network Failure | New | Modifies Kubernetes Ingress rules to route traffic from specific paths to incorrect backend services, simulating request misrouting caused by network routing configuration errors or load balancing misconfigurations | Cluster Management | App |
| network_policy_block | Network Failure | New | Service isolation issues caused by network partitioning, firewall configuration errors, or overly strict security policies | Cluster Management | App |
| social_net_hotel_res_astro_shop_concurrent_failures | Multiple Independent Failures | New | / | / | / |
| workload_imbalance | System/Application Software Failure | New | kube-proxy is modified so that the workload is not distributed evenly across pods | Cluster Management | App |
| operator_overload_replicas | Cloud Management System Failure | New | The TiDB replica count is set to a huge number, leaving many pods stuck | Cluster Management | App |
| operator_non_existent_storage | Cloud Management System Failure | New | The storageclass name is set to an invalid value, leaving the PVC stuck in Pending | Cluster Management | App |
| operator_invalid_affinity_toleration | Misconfiguration Failure | New | An invalid toleration effect field is set, making pods unschedulable | Cluster Management | App |
| operator_security_context_fault | Security Failure | New | runAsUser is set to an invalid value, making pods crash | Cluster Management | App |
| operator_wrong_update_strategy_fault | Misconfiguration Failure | New | statefulSetUpdateStrategy is set to an invalid value, so pods cannot restart and the cluster cannot roll out updates | Cluster Management | App |
| service_port_conflict_astronomy_shop | Misconfiguration Failure | New | Pods of the ad service fail to schedule because their hostPort conflicts with another service in a different namespace | Cluster Management | App |
| service_port_conflict_hotel_reservation | Misconfiguration Failure | New | Pods of the recommendation service fail to schedule because their hostPort conflicts with another service in a different namespace | Cluster Management | App |
| service_port_conflict_social_network | Misconfiguration Failure | New | Pods of the media-service service fail to schedule because their hostPort conflicts with another service in a different namespace | Cluster Management | App |
| top_of_rack_router_failure_hotel_reservation | Network Failure | New | A top-of-rack (ToR) router failure partitions a subset of nodes from the rest of the cluster, leading to partial loss of service reachability and cross-service communication failures | Cluster Management | App |
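
The metastable entries in the table (for example `load_spike_rpc_retry_storm`) describe a trigger followed by a loop that sustains itself after the trigger is gone. The toy model below is only a sketch of that idea, not how the benchmark injects the fault: the function `simulate`, the parameter `retries_per_failure`, and all the numbers are hypothetical and chosen for illustration. It assumes every failed request produces a fixed number of retries on the next tick.

```python
def simulate(arrivals, capacity, retries_per_failure):
    """Return the number of failed requests at each tick (toy model).

    arrivals: new requests arriving at each tick.
    capacity: requests the service can complete per tick.
    retries_per_failure: retries each failed request generates on the
        next tick (hypothetical knob standing in for a misconfigured
        RPC retry policy).
    """
    failed_per_tick = []
    pending_retries = 0
    for new_requests in arrivals:
        offered = new_requests + pending_retries
        served = min(offered, capacity)
        failed = offered - served
        failed_per_tick.append(failed)
        pending_retries = failed * retries_per_failure
    return failed_per_tick


# Steady load of 80 req/tick, a 5-tick spike to 200, then back to 80.
arrivals = [80] * 5 + [200] * 5 + [80] * 10
CAPACITY = 100

no_retry = simulate(arrivals, CAPACITY, retries_per_failure=0)
storm = simulate(arrivals, CAPACITY, retries_per_failure=2)

print("failures without retries:", no_retry)
print("failures with 2x retries:", storm)
```

In this model both configurations are healthy before the spike. Without retries, failures stop as soon as the spike ends; with amplified retries, the retry traffic alone keeps the offered load above capacity, so failures keep growing even though new arrivals are back to 80 per tick.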