SGLang uses Python's built-in `unittest` library as its testing framework.
```bash
cd sglang/test/srt

# Run a single file
python3 test_srt_endpoint.py

# Run a single test
python3 test_srt_endpoint.py TestSRTEndpoint.test_simple_decode

# Run a suite with multiple files
python3 run_suite.py --suite per-commit
```

```bash
cd sglang/test/lang

# Run a single file
python3 test_choices.py
```

- Create new test files under `test/srt` or `test/lang` depending on the type of test.
- For nightly tests, place them in `test/srt/nightly/`. Use the `NightlyBenchmarkRunner` helper class in `nightly_utils.py` for performance benchmarking tests.
- Ensure they are referenced in the respective `run_suite.py` (e.g., `test/srt/run_suite.py`) so they are picked up in CI. For most small test cases, they can be added to the `per-commit-1-gpu` suite. Sort the test cases alphabetically by name.
- Ensure you added `unittest.main()` for unittest and `sys.exit(pytest.main([__file__]))` for pytest in the scripts. The CI runs them via `python3 test_file.py`.
- The CI will run some suites such as `per-commit-1-gpu`, `per-commit-2-gpu`, and `nightly-1-gpu` automatically. If you need special setup or custom test groups, you may modify the workflows in `.github/workflows/`.
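A minimal skeleton of such a unittest-style test file (the class name, test name, and placeholder URL are illustrative; real tests launch the server with the helpers used by existing files in `sglang/test/srt`):

```python
import unittest


class TestMyFeature(unittest.TestCase):
    """Illustrative skeleton for a new test file under test/srt."""

    @classmethod
    def setUpClass(cls):
        # Launch the server once here and reuse it for every test case
        # below; server startup dominates test time. The URL is a placeholder.
        cls.base_url = "http://127.0.0.1:30000"

    def test_simple_decode(self):
        # Keep each test focused on a single scenario.
        self.assertTrue(self.base_url.startswith("http://"))


if __name__ == "__main__":
    unittest.main()  # required so CI can run `python3 test_file.py`
```

The `unittest.main()` entry point is what lets CI invoke the file directly as a script.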
Tests in `test/registered/` use a registry-based CI system for flexible backend/schedule configuration.
```python
from sglang.test.ci.ci_register import (
    register_cuda_ci,
    register_amd_ci,
    register_cpu_ci,
    register_npu_ci,
)

# Per-commit test (small 1-gpu, runs on 5090)
register_cuda_ci(est_time=80, suite="stage-b-test-small-1-gpu")

# Per-commit test (large 1-gpu, runs on H100)
register_cuda_ci(est_time=120, suite="stage-b-test-large-1-gpu")

# Per-commit test (2-gpu)
register_cuda_ci(est_time=200, suite="stage-b-test-large-2-gpu")

# Nightly-only test
register_cuda_ci(est_time=200, suite="nightly-1-gpu", nightly=True)

# Multi-backend test
register_cuda_ci(est_time=80, suite="stage-b-test-small-1-gpu")
register_amd_ci(est_time=120, suite="stage-a-test-1")

# Temporarily disabled test
register_cuda_ci(est_time=80, suite="stage-b-test-small-1-gpu", disabled="flaky - see #12345")
```

When adding 1-GPU tests, choose the appropriate suite based on hardware compatibility:
| Suite | Runner | GPU | When to Use |
|---|---|---|---|
| `stage-b-test-small-1-gpu` | `1-gpu-5090` | RTX 5090 (32GB, SM120) | 5090-compatible tests (preferred) |
| `stage-b-test-large-1-gpu` | `1-gpu-runner` | H100 (80GB, SM90) | Large models or 5090-incompatible tests |
Use `stage-b-test-small-1-gpu` (5090) whenever possible - this is the preferred suite for most 1-GPU tests.
Use `stage-b-test-large-1-gpu` (H100) if ANY of these apply:
- **Architecture incompatibility (SM120/Blackwell):**
  - FA3 attention backend (requires SM≤90)
  - MLA with FA3 backend
  - FP8/MXFP4 quantization (not supported on SM120)
  - Certain Triton kernels (shared memory limits)
- **Memory requirements:**
  - Models >30B params or large MoE
  - Tests requiring >32GB VRAM
- **Known 5090 failures:**
  - Weight update/sync tests
  - Certain spec decoding tests
If a test cannot run on 5090 due to any of the above, use `stage-b-test-large-1-gpu`, which runs on H100.
Per-Commit (CUDA):
- Stage A: `stage-a-test-1` (locked), `stage-a-test-2`, `stage-a-test-cpu`
- Stage B: `stage-b-test-small-1-gpu` (5090), `stage-b-test-large-1-gpu` (H100), `stage-b-test-large-2-gpu`
- Stage C (4-GPU): `stage-c-test-4-gpu-h100`, `stage-c-test-4-gpu-b200`, `stage-c-test-4-gpu-gb200`, `stage-c-test-deepep-4-gpu`
- Stage C (8-GPU): `stage-c-test-8-gpu-h20`, `stage-c-test-8-gpu-h200`, `stage-c-test-8-gpu-b200`, `stage-c-test-deepep-8-gpu-h200`

Per-Commit (AMD):
- `stage-a-test-1`, `stage-b-test-small-1-gpu-amd`, `stage-b-test-large-2-gpu-amd`

Nightly:
- `nightly-1-gpu`, `nightly-2-gpu`, `nightly-4-gpu`, `nightly-8-gpu`, etc.
```bash
# Run per-commit tests
python test/run_suite.py --hw cuda --suite stage-b-test-small-1-gpu

# Run nightly tests
python test/run_suite.py --hw cuda --suite nightly-1-gpu --nightly

# With auto-partitioning (for parallel CI jobs)
python test/run_suite.py --hw cuda --suite stage-b-test-small-1-gpu \
    --auto-partition-id 0 --auto-partition-size 4
```

- Learn from existing examples in `sglang/test/srt`.
- Reduce test time by using smaller models and reusing the server for multiple test cases; launching a server is slow.
- Use as few GPUs as possible. Do not run long tests on 8-GPU runners.
- If the test cases take too long, consider adding them to nightly tests instead of per-commit tests.
- Keep each test function focused on a single scenario or piece of functionality.
- Give tests descriptive names reflecting their purpose.
- Use robust assertions (e.g., `assert`, `unittest` methods) to validate outcomes.
- Clean up resources to avoid side effects and preserve test independence.
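The reuse-and-cleanup pattern can be sketched as follows (the `sleep` subprocess is a stand-in for a real server launch; actual tests use the launch helpers found in `sglang/test/srt`):

```python
import subprocess
import unittest


class TestWithSharedServer(unittest.TestCase):
    """Illustrative pattern: one server process shared by all test cases."""

    @classmethod
    def setUpClass(cls):
        # Stand-in for a real server launch command; started once for the class.
        cls.proc = subprocess.Popen(
            ["python3", "-c", "import time; time.sleep(60)"]
        )

    @classmethod
    def tearDownClass(cls):
        # Clean up so later tests start from a fresh state.
        cls.proc.terminate()
        cls.proc.wait(timeout=10)

    def test_server_is_running(self):
        # poll() returns None while the process is still alive.
        self.assertIsNone(self.proc.poll())


if __name__ == "__main__":
    unittest.main()
```

`setUpClass`/`tearDownClass` run once per class, so every test method shares the same process, and teardown guarantees it is gone before the next test file runs.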
- For text models: extend the global model list variables in `test_utils.py`, or add new model lists.
- For VLMs: extend the `MODEL_THRESHOLDS` global dictionary in `test/srt/nightly/test_vlms_mmmu_eval.py`.
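As a rough illustration, such a threshold dictionary maps a model ID to its minimum acceptable eval score; the keys, values, and new entry below are made up, and the real structure of `MODEL_THRESHOLDS` in `test_vlms_mmmu_eval.py` may differ:

```python
# Hypothetical sketch of a model-to-accuracy-threshold mapping; the actual
# MODEL_THRESHOLDS in test_vlms_mmmu_eval.py may be structured differently.
MODEL_THRESHOLDS = {
    "Qwen/Qwen2-VL-7B-Instruct": 0.45,  # minimum acceptable MMMU score
}

# Adding a new VLM to the nightly eval means adding one entry with its
# expected lower bound (placeholder model ID and threshold).
MODEL_THRESHOLDS["my-org/my-new-vlm"] = 0.50
```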