Problem List.md

| Problem ID | Type | Origin | Failure to Simulate | Fault Level | Failure Level |
| --- | --- | --- | --- | --- | --- |
| faulty_image_correlated | Correlated Failure | New | All container images are faulty, causing errors | Container | App |
| update_incompatible_correlated | Correlated Failure | New | The images in all `mongodb` containers are updated to an incompatible version, causing errors | Container | App |
| kubelet_crash | Correlated Failure | New | The kubelet process on a worker node crashes, making all services on that node unavailable | Cluster Management | App |
| incorrect_image | System/Application Software Failure | New | The container of the `product-catalog` service pulled an incorrect image | Container | App |
| incorrect_port_assignment | Misconfiguration Failure | New | The port in the `PRODUCT_CATALOG_ADDR` setting of the `checkout` service is misconfigured | Container | App |
| misconfig_app_hotel_res | System/Application Software Failure | AIOpsLab | The container of the `geo` service pulled an incorrect image | Container | App |
| missing_env_variable_astronomy_shop | Misconfiguration Failure | New | The `CART_ADDR` environment variable is missing from the `frontend` containers | Container | App |
| revoke_auth_mongodb-1 | Security Failure | AIOpsLab | Admin privileges in `mongodb-geo` are revoked | App | App |
| revoke_auth_mongodb-2 | Security Failure | AIOpsLab | Admin privileges in `mongodb-rate` are revoked | App | App |
| storage_user_unregistered-1 | Security Failure | AIOpsLab | The user is not registered in `mongodb-geo` | App | App |
| storage_user_unregistered-2 | Security Failure | AIOpsLab | The user is not registered in `mongodb-rate` | App | App |
| valkey_auth_disruption | Security Failure | New | The Valkey password is invalidated, so dependent services stop working | App | App |
| valkey_memory_disruption | Database Failure | New | The Valkey store is in an out-of-memory (OOM) state | App | App |
| capacity_decrease_rpc_retry_storm | Metastable Failure | New | The RPC module is misconfigured; CPU contention then triggers a retry storm | Hardware/App | App |
| gc_capacity_degradation | Metastable Failure | New | GC frequency is set high; a load spike then causes requests to pile up, which triggers more GC activity and forms a self-sustaining loop | OS/App | App |
| load_spike_rpc_retry_storm | Metastable Failure | New | The RPC module is misconfigured; a load spike then triggers a retry storm | App | App |
| assign_to_non_existent_node | Cloud Management System Failure | AIOpsLab | The `user-service` service is assigned to a non-ready node | Cluster Management | App |
| auth_miss_mongodb | Security Failure | AIOpsLab | `mongodb` requires a TLS certificate, but the client does not provide one | App | App |
| configmap_drift_hotel_reservation | Misconfiguration Failure | New | The `GeoMongoAddress` setting is missing from the `geo` service's configuration, so it fails to connect to `mongodb` | App | App |
| duplicate_pvc_mounts_astronomy_shop | Cloud Management System Failure | New | Multiple pods in the `mongodb-rate` service try to mount the same `ReadWriteOnce` PVC | Cluster Management | App |
| duplicate_pvc_mounts_hotel_reservation | Cloud Management System Failure | New | Multiple pods in the `mongodb-rate` service try to mount the same `ReadWriteOnce` PVC | Cluster Management | App |
| duplicate_pvc_mounts_social_network | Cloud Management System Failure | New | Multiple pods in the `mongodb-rate` service try to mount the same `ReadWriteOnce` PVC | Cluster Management | App |
| env_variable_shadowing_astronomy_shop | Misconfiguration Failure | New | The `FRONTEND_HOST` environment variable is incorrectly set to `localhost` | Container | App |
| k8s_target_port-misconfig | Misconfiguration Failure | AIOpsLab | The target port of `user-service` is misconfigured | Container | App |
| liveness_probe_misconfiguration_astronomy_shop | Misconfiguration Failure | New | A misconfigured `healthz` port makes the liveness probe fail, so pods of the `frontend` service are stuck in a continuous restart cycle | Cluster Management | App |
| liveness_probe_misconfiguration_hotel_reservation | Misconfiguration Failure | New | A misconfigured `healthz` port makes the liveness probe fail, so pods of the `recommendation` service are stuck in a continuous restart cycle | Cluster Management | App |
| liveness_probe_misconfiguration_social_network | Misconfiguration Failure | New | A misconfigured `healthz` port makes the liveness probe fail, so pods of the `user-service` service are stuck in a continuous restart cycle | Cluster Management | App |
| liveness_probe_too_aggressive_astronomy_shop | Cloud Management System Failure | New | An overly aggressive liveness probe keeps pods of the `aux-service` service stuck in a continuous restart cycle | Cluster Management | App |
| liveness_probe_too_aggressive_hotel_reservation | Cloud Management System Failure | New | An overly aggressive liveness probe keeps pods of the `aux-service` service stuck in a continuous restart cycle | Cluster Management | App |
| liveness_probe_too_aggressive_social_network | Cloud Management System Failure | New | An overly aggressive liveness probe keeps pods of the `aux-service` service stuck in a continuous restart cycle | Cluster Management | App |
| missing_configmap_hotel_reservation | Cloud Management System Failure | New | The `mongo-geo-script` ConfigMap is missing from the `mongodb-geo` service | App | App |
| missing_configmap_social_network | Cloud Management System Failure | New | The `media-mongodb` ConfigMap is missing from the `media-mongodb` service | App | App |
| missing_service_astronomy_shop | Cloud Management System Failure | New | The `ad` service is missing from the `astronomy_shop` app | App | App |
| missing_service_hotel_reservation | Cloud Management System Failure | New | The `mongodb-rate` service is missing from the `hotel_reservation` app | App | App |
| missing_service_social_network | Cloud Management System Failure | New | The `user-service` service is missing from the `social_network` app | App | App |
| namespace_memory_limit | Cloud Management System Failure | New | Pods in the `search` service are stuck in `Pending` because the namespace memory limit prevents them from being scheduled | OS | App |
| pod_anti_affinity_deadlock | Cloud Management System Failure | New | Pods in the `user-service` service are stuck in `Pending` because a pod anti-affinity rule prevents them from being scheduled | Cluster Management | App |
| persistent_volume_affinity_violation | Cloud Management System Failure | New | Pods in the `user-service` service are stuck in `Pending` because they cannot mount the PVC and therefore cannot be scheduled | Cluster Management | App |
| pvc_claim_mismatch | Cloud Management System Failure | New | Pods in `mongodb` are stuck in `Pending` because they cannot mount the PVC and therefore cannot be scheduled | Cluster Management | App |
| rbac_misconfiguration | Misconfiguration Failure | New | Init containers are unable to perform required operations due to insufficient permissions | Cluster Management | App |
| readiness_probe_misconfiguration_astronomy_shop | Misconfiguration Failure | New | Pods in `frontend` never enter the `Ready` state, so the service cannot handle requests | Cluster Management | App |
| readiness_probe_misconfiguration_hotel_reservation | Misconfiguration Failure | New | Pods in `frontend` never enter the `Ready` state, so the service cannot handle requests | Cluster Management | App |
| readiness_probe_misconfiguration_social_network | Misconfiguration Failure | New | Pods in `user-service` never enter the `Ready` state, so the service cannot handle requests | Cluster Management | App |
| resource_request_too_large | Cloud Management System Failure | New | Pods in `mongodb-rate` are never scheduled because the container requests more memory than any node can provide | Container | App |
| resource_request_too_small | Cloud Management System Failure | New | Pods in `mongodb-rate` are never scheduled because the container requests more memory than any node can provide | Container | App |
| rolling_update_misconfigured_hotel_reservation | Misconfiguration Failure | New | No pods are available because they are stuck with a contradictory rolling-update configuration | Cluster Management | App |
| rolling_update_misconfigured_social_network | Misconfiguration Failure | New | No pods are available because they are stuck with a contradictory rolling-update configuration | Container | App |
| scale_pod_zero_social_net | Cloud Management System Failure | AIOpsLab | The replica count is set to `0`, leaving no available pods and making the service unavailable | Cluster Management | App |
| service_dns_resolution_failure_astronomy_shop | Network Failure | New | The CoreDNS configuration is modified so that DNS resolution of the `frontend` service fails | Cluster Management | App |
| service_dns_resolution_failure_social_network | Network Failure | New | The CoreDNS configuration is modified so that DNS resolution of the `user-service` service fails | Cluster Management | App |
| sidecar_port_conflict_astronomy_shop | Misconfiguration Failure | New | Pods fail to start because a malicious sidecar container competes with the main container for the same port | Container | App |
| sidecar_port_conflict_hotel_reservation | Misconfiguration Failure | New | Pods fail to start because a malicious sidecar container competes with the main container for the same port | Container | App |
| sidecar_port_conflict_social_network | Misconfiguration Failure | New | Pods fail to start because a malicious sidecar container competes with the main container for the same port | Container | App |
| stale_coredns_config_astronomy_shop | Network Failure | New | All in-cluster communication is interrupted because DNS resolution fails for every service | Cluster Management | App |
| stale_coredns_config_social_network | Network Failure | New | All in-cluster communication is interrupted because DNS resolution fails for every service | Cluster Management | App |
| taint_no_toleration_social_network | Cloud Management System Failure | New | The target pod cannot be scheduled because it does not tolerate the taints on any available node | Cluster Management | App |
| wrong_bin_usage | System/Application Software Failure | AIOpsLab | Pods start with the wrong binary file | Container | App |
| wrong_dns_policy_astronomy_shop | Network Failure | New | All in-cluster communication is interrupted due to a wrong DNS resolution policy in the cluster | Cluster Management | App |
| wrong_dns_policy_hotel_reservation | Network Failure | New | All in-cluster communication is interrupted due to a wrong DNS resolution policy in the cluster | Cluster Management | App |
| wrong_dns_policy_social_network | Network Failure | New | All in-cluster communication is interrupted due to a wrong DNS resolution policy in the cluster | Cluster Management | App |
| wrong_service_selector_astronomy_shop | Misconfiguration Failure | New | The service has no available pods because its selector is wrong | Cluster Management | App |
| wrong_service_selector_hotel_reservation | Misconfiguration Failure | New | The service has no available pods because its selector is wrong | Cluster Management | App |
| wrong_service_selector_social_network | Misconfiguration Failure | New | The service has no available pods because its selector is wrong | Cluster Management | App |
| astronomy_shop_ad_service_failure | System/Application Software Failure | AIOpsLab | The `Ad` service fails | App | App |
| astronomy_shop_ad_service_high_cpu | System/Application Software Failure | AIOpsLab | Triggers high CPU load in the `Ad` service | Hardware | App |
| astronomy_shop_ad_service_manual_gc | System/Application Software Failure | AIOpsLab | Triggers full manual garbage collections in the `Ad` service | OS | App |
| astronomy_shop_cart_service_failure | System/Application Software Failure | AIOpsLab | Fails the `cart` service | App | App |
| astronomy_shop_ad_service_image_slow_load | System/Application Software Failure | AIOpsLab | Images load slowly in the `frontend` | Container | App |
| astronomy_shop_payment_service_failure | System/Application Software Failure | AIOpsLab | Fails n% of the `payment` service's charge requests | App | App |
| astronomy_shop_payment_service_unreachable | System/Application Software Failure | AIOpsLab | The `payment` service is unavailable | App | App |
| astronomy_shop_product_catalog_service_failure | System/Application Software Failure | AIOpsLab | Fails the `product_catalog` service for a specific product | App | App |
| astronomy_shop_recommendation_service_cache_failure | System/Application Software Failure | AIOpsLab | Fails the `recommendation` service cache | App | App |
| kafka_queue_problems | System/Application Software Failure | AIOpsLab | Overloads the Kafka queue while simultaneously introducing a consumer-side delay, leading to a lag spike | App | App |
| loadgenerator_flood_homepage | System/Application Software Failure | AIOpsLab | Floods the frontend with a large number of requests | App | App |
| trainticket_f17_nested_sql_select_clause_error | System/Application Software Failure | New | The constructed SQL statement contains too many nested `select` and `from` clauses | App | App |
| trainticket_f22_sql_column_name_mismatch_error | System/Application Software Failure | New | The `select` clause of the constructed SQL statement uses a column name that does not match its `from` clause | App | App |
| read_error | Hardware Component Failure | New | Pods on the node encounter read errors when accessing storage | Hardware | App |
| latent_sector_error | Hardware Component Failure | New | Latent sector errors in storage cause I/O errors when `mongodb-geo` reads bad blocks, causing pod crashes | Hardware | App |
| silent_data_corruption | Hardware Component Failure | New | Silent data corruption injected into storage with `dm-flakey` causes MongoDB to crash when it reads corrupted WiredTiger data files | Hardware | App |
| ingress_misroute | Network Failure | New | Modifies Kubernetes Ingress rules to route traffic from specific paths to incorrect backend services, simulating request misrouting caused by network routing configuration errors or load-balancing misconfigurations | Cluster Management | App |
| network_policy_block | Network Failure | New | Simulates service isolation issues caused by network partitioning, firewall configuration errors, or overly strict security policies | Cluster Management | App |
| social_net_hotel_res_astro_shop_concurrent_failures | Multiple Independent Failures | New | / | / | / |
| workload_imbalance | System/Application Software Failure | New | Modifies kube-proxy so that the workload is not distributed evenly across pods | Cluster Management | App |
| operator_overload_replicas | Cloud Management System Failure | New | The TiDB replica count is set to a huge number, leaving many pods stuck | Cluster Management | App |
| operator_non_existent_storage | Cloud Management System Failure | New | The StorageClass name is set to an invalid value, leaving the PVC stuck in `Pending` | Cluster Management | App |
| operator_invalid_affinity_toleration | Misconfiguration Failure | New | An invalid `toleration effect` field is set, so pods cannot be scheduled | Cluster Management | App |
| operator_security_context_fault | Security Failure | New | `runAsUser` is set to an invalid value, causing pods to crash | Cluster Management | App |
| operator_wrong_update_strategy_fault | Misconfiguration Failure | New | `statefulSetUpdateStrategy` is set to an invalid value, so pods cannot restart and the cluster cannot roll out updates | Cluster Management | App |
| service_port_conflict_astronomy_shop | Misconfiguration Failure | New | Pods of the `ad` service fail to schedule because their hostPort conflicts with another service in a different namespace | Cluster Management | App |
| service_port_conflict_hotel_reservation | Misconfiguration Failure | New | Pods of the `recommendation` service fail to schedule because their hostPort conflicts with another service in a different namespace | Cluster Management | App |
| service_port_conflict_social_network | Misconfiguration Failure | New | Pods of the `media-service` service fail to schedule because their hostPort conflicts with another service in a different namespace | Cluster Management | App |
| top_of_rack_router_failure_hotel_reservation | Network Failure | New | A top-of-rack (ToR) router failure partitions a subset of nodes from the rest of the cluster, leading to partial loss of service reachability and cross-service communication failures | Cluster Management | App |
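
The `load_spike_rpc_retry_storm` and `capacity_decrease_rpc_retry_storm` entries describe metastable failures: a trigger (a load spike or lost capacity) starts a feedback loop that keeps the system overloaded after the trigger is gone. The toy discrete-time model below sketches the retry-storm variant. It is not the benchmark's injector; the function name, the fixed per-tick capacity, and the assumption that a misconfigured RPC client sends `retries_per_failure` new attempts for every failed request are all illustrative.

```python
def simulate_retry_storm(ticks: int, capacity: int, base_load: int,
                         spike_load: int, spike_tick: int,
                         retries_per_failure: int) -> list[int]:
    """Return the offered load (new arrivals + retries) at each tick.

    Toy model (assumption): the service completes at most `capacity`
    requests per tick, every request beyond that fails, and each failure
    makes clients send `retries_per_failure` new attempts next tick.
    """
    pending_retries = 0
    offered_history = []
    for tick in range(ticks):
        # The trigger: a one-tick load spike; otherwise the load is steady.
        arrivals = spike_load if tick == spike_tick else base_load
        offered = arrivals + pending_retries
        failed = max(0, offered - capacity)
        # The sustaining loop: failures come back as extra load next tick.
        pending_retries = failed * retries_per_failure
        offered_history.append(offered)
    return offered_history


if __name__ == "__main__":
    # Same spike, with and without retry amplification.
    print(simulate_retry_storm(6, 100, 80, 150, 1, retries_per_failure=0))
    print(simulate_retry_storm(6, 100, 80, 150, 1, retries_per_failure=2))
    # A smaller spike with the same amplification drains on its own.
    print(simulate_retry_storm(6, 100, 80, 110, 1, retries_per_failure=2))
```

The three runs print `[80, 150, 80, 80, 80, 80]`, `[80, 150, 180, 240, 360, 600]`, and `[80, 110, 100, 80, 80, 80]`. With two retries per failure, any offered load above `2 * capacity - base_load` (120 here) grows again on the next tick even though arrivals are back to normal, which is the metastable behavior; below that threshold the backlog drains.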
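
The `liveness_probe_misconfiguration_*` and `liveness_probe_too_aggressive_*` entries both end in a restart cycle. A minimal sketch, assuming a simplified kubelet: the parameters mirror the probe fields `initialDelaySeconds`, `periodSeconds`, and `failureThreshold`, while probe timeouts, startup probes, and the back-off reset after a container runs cleanly are ignored. The restart delays follow the default back-off described in the Kubernetes Pod lifecycle documentation (10 s, doubling, capped at 5 minutes). The `startup_s` input is a hypothetical stand-in for how long the application takes to answer its health check.

```python
def killed_before_healthy(startup_s: float, initial_delay_s: int,
                          period_s: int, failure_threshold: int) -> bool:
    """Return True if the liveness probe kills the container before it is up.

    Simplified model: the app answers the probe once `startup_s` seconds
    have passed; probes run at `initial_delay_s` and then every `period_s`;
    the container is restarted after `failure_threshold` consecutive failures.
    """
    failures = 0
    probe_at = initial_delay_s
    while True:
        if probe_at >= startup_s:
            return False  # the probe succeeds, so the container survives
        failures += 1
        if failures >= failure_threshold:
            return True  # the kubelet kills and restarts the container
        probe_at += period_s


def restart_delays(restarts: int, initial_s: int = 10, cap_s: int = 300) -> list[int]:
    """Back-off before each restart: doubling from 10 s, capped at 300 s."""
    delays, delay = [], initial_s
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap_s)
    return delays


if __name__ == "__main__":
    # App needs 30 s to start; an aggressive probe gives up after 1 s.
    print(killed_before_healthy(30, initial_delay_s=1, period_s=1, failure_threshold=1))
    # A more tolerant probe lets the same app come up.
    print(killed_before_healthy(30, initial_delay_s=20, period_s=10, failure_threshold=3))
    print(restart_delays(7))
```

A probe that can never succeed, such as one pointed at the wrong `healthz` port, corresponds to `startup_s=float("inf")` and is killed under any probe settings; each kill then waits the next delay from `restart_delays`, which is why the pods appear stuck rather than recovering.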
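
The `wrong_service_selector_*` and `readiness_probe_misconfiguration_*` entries leave a Service with nothing to route to for different reasons: a Service only sends traffic to pods whose labels match its selector and that report ready. The sketch below models just that filtering with equality-based matching; the `Pod` class, pod names, and label values are hypothetical, and real endpoint management (EndpointSlices, `publishNotReadyAddresses`) is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    labels: dict[str, str] = field(default_factory=dict)
    ready: bool = True  # outcome of the readiness probe


def selector_matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based match: every selector key must exist with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


def ready_endpoints(selector: dict[str, str], pods: list[Pod]) -> list[str]:
    """Names of pods that would receive traffic from a Service with `selector`."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return []
    return [p.name for p in pods if selector_matches(selector, p.labels) and p.ready]


if __name__ == "__main__":
    pods = [Pod("frontend-1", {"app": "frontend"}), Pod("frontend-2", {"app": "frontend"})]
    print(ready_endpoints({"app": "frontend"}, pods))       # correct selector
    print(ready_endpoints({"app": "frontend-typo"}, pods))  # wrong selector
    unready = [Pod("frontend-1", {"app": "frontend"}, ready=False)]
    print(ready_endpoints({"app": "frontend"}, unready))    # never becomes Ready
```

Both faults produce the same symptom, an empty endpoint list, which is why the Kubernetes objects themselves (selector labels versus readiness conditions) have to be inspected to tell them apart.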
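
For `resource_request_too_large`, the scheduler compares the pod's memory request with each node's allocatable memory minus the requests of pods already placed there; when no node passes, the pod stays `Pending`. A minimal sketch of that filter, assuming hypothetical node names and sizes in MiB and ignoring CPU, other resources, and the scoring phase:

```python
def feasible_nodes(request_mib: int, nodes: dict[str, tuple[int, int]]) -> list[str]:
    """Return nodes whose remaining allocatable memory can hold the request.

    `nodes` maps a node name to (allocatable_mib, already_requested_mib).
    Scheduling uses requests, not the memory the pods actually consume.
    """
    return [
        name
        for name, (allocatable, requested) in nodes.items()
        if requested + request_mib <= allocatable
    ]


if __name__ == "__main__":
    nodes = {"worker-1": (8192, 2048), "worker-2": (16384, 4096)}
    print(feasible_nodes(1024, nodes))   # fits anywhere
    print(feasible_nodes(8192, nodes))   # fits only on the larger node
    print(feasible_nodes(65536, nodes))  # fits nowhere, so the pod stays Pending
```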
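
For `taint_no_toleration_social_network`, a pod can only land on a node if it tolerates every hard (`NoSchedule` or `NoExecute`) taint on it. The sketch below follows the matching rules from the Kubernetes taints and tolerations documentation in simplified form: an empty toleration effect matches every effect, `Exists` ignores the value, and an empty key with `Exists` matches every key. The `dedicated=infra` taint is a hypothetical example, and `PreferNoSchedule` is treated as non-blocking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # An empty key with Exists matches every taint key.
        return tol.key in ("", taint.key)
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """True if every hard (NoSchedule/NoExecute) taint on the node is tolerated."""
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


if __name__ == "__main__":
    node_taints = [Taint("dedicated", "infra", "NoSchedule")]
    print(can_schedule([], node_taints))  # no toleration: blocked
    print(can_schedule([Toleration("dedicated", "Equal", "infra", "NoSchedule")], node_taints))
```

If every available node carries a taint like this and the target pod has no matching toleration, `can_schedule` is false for all of them and the pod stays `Pending`.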