`process_result.py` derives the interactivity metrics from the time per output token (TPOT) metrics by taking reciprocals:
```python
for key, value in bmk_result.items():
    if key.endswith('ms'):
        data[key.replace('_ms', '')] = float(value) / 1000.0
    if 'tpot' in key:
        data[key.replace('_ms', '').replace('tpot', 'intvty')] = 1000.0 / float(value)
```

This is incorrect for the standard deviation: the reciprocal of a standard deviation is not the standard deviation of the reciprocal. When the TPOT standard deviation is small, `1000.0 / std` blows up, so we report a huge standard deviation of interactivity. Here's an example from a recent run, where `std_intvty` is roughly ten times `mean_intvty`:
```json
{
  "hw": "gb200",
  "tp": 36,
  "ep": 1,
  "dp_attention": "false",
  "conc": 512,
  "model": "/mnt/lustre01/models/deepseek-r1-0528-fp4-v2",
  "framework": "dynamo-trtllm",
  "precision": "fp4",
  "isl": 1024,
  "osl": 1024,
  "tput_per_gpu": 2423.747715243624,
  "output_tput_per_gpu": 1363.5939611144877,
  "input_tput_per_gpu": 10904.977748276717,
  "disagg": true,
  "num_prefill_gpu": 4,
  "num_decode_gpu": 32,
  "mtp": "on",
  "mean_ttft": 1.2919637345420085,
  "median_ttft": 0.4378697440261021,
  "std_ttft": 2.1574805073166776,
  "p99_ttft": 9.712858405703447,
  "mean_tpot": 0.009590339660491956,
  "mean_intvty": 104.27159364538116,
  "median_tpot": 0.009667698638864516,
  "median_intvty": 103.43723334320352,
  "std_tpot": 0.0009682636562182605,
  "std_intvty": 1032.7765516942895,
  "p99_tpot": 0.011633036075231341,
  "p99_intvty": 85.96208191335067,
  "mean_itl": 0.4402040011098412,
  "median_itl": 0.4408050079946406,
  "std_itl": 0.0884514747228438,
  "p99_itl": 0.8289496807963588,
  "mean_e2el": 10.116033398757338,
  "median_e2el": 9.590883667988237,
  "std_e2el": 2.3359472701507817,
  "p99_e2el": 18.67461088597775
}
```
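As a sanity check on these numbers: taking the reciprocal of `std_tpot` reproduces the reported `std_intvty` exactly. For comparison, the sketch below also applies a first-order (delta-method) approximation, std(1/X) ≈ std(X) / mean(X)², which is not something `process_result.py` does and is only an estimate that assumes the TPOT spread is small relative to its mean. With this run's values it gives roughly 10.5 tokens/s, about 100 times smaller than the emitted figure. The function names here are hypothetical.

```python
# Values copied from the example result above (seconds per output token).
MEAN_TPOT = 0.009590339660491956
STD_TPOT = 0.0009682636562182605


def naive_std_intvty(std_tpot):
    # What the current loop effectively emits: the reciprocal of the std.
    return 1.0 / std_tpot


def delta_method_std_intvty(mean_tpot, std_tpot):
    # First-order approximation for Y = 1/X: std(Y) ~= std(X) / mean(X)**2.
    # An estimate only; it assumes std(X) is small compared with mean(X).
    return std_tpot / mean_tpot ** 2


if __name__ == "__main__":
    print(f"naive std_intvty:  {naive_std_intvty(STD_TPOT):.1f} tokens/s")
    print(f"delta-method est.: {delta_method_std_intvty(MEAN_TPOT, STD_TPOT):.1f} tokens/s")
```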
Unless the correct figure can be obtained from the benchmark harness, the simplest fix is probably to stop calculating and emitting the `std_intvty` field altogether.
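A minimal sketch of both options, assuming the raw keys follow the `<stat>_tpot_ms` pattern implied by the output names (e.g. `std_tpot_ms`). `convert_metrics` mirrors the existing loop but skips `std_intvty`; `interactivity_std` shows the harness-side alternative and assumes per-request TPOT samples in milliseconds are available, which this issue does not confirm. Both function names are hypothetical, and the population standard deviation is an illustrative choice.

```python
import statistics


def convert_metrics(bmk_result):
    """Hypothetical variant of the conversion loop in process_result.py.

    Mirrors the original loop, but does not emit std_intvty, because the
    reciprocal of std(TPOT) is not the standard deviation of 1/TPOT.
    """
    data = {}
    for key, value in bmk_result.items():
        if key.endswith('ms'):
            # Milliseconds -> seconds, e.g. 'mean_tpot_ms' -> 'mean_tpot'.
            data[key.replace('_ms', '')] = float(value) / 1000.0
        if 'tpot' in key and not key.startswith('std_'):
            # ms per token -> tokens/s for the non-std TPOT statistics.
            data[key.replace('_ms', '').replace('tpot', 'intvty')] = 1000.0 / float(value)
    return data


def interactivity_std(per_request_tpot_ms):
    """Std of interactivity from per-request TPOT samples (ms).

    Assumes the harness can expose one TPOT value per request: take the
    reciprocal of each sample first, then the population std.
    """
    intvty = [1000.0 / t for t in per_request_tpot_ms if t > 0]
    return statistics.pstdev(intvty)


if __name__ == "__main__":
    example = {'mean_tpot_ms': 9.59, 'std_tpot_ms': 0.968, 'p99_tpot_ms': 11.63}
    print(convert_metrics(example))
    print(interactivity_std([9.0, 9.5, 10.0, 10.5]))
```

Reciprocating each sample before aggregating is what makes the second function correct; aggregating first and reciprocating afterwards is exactly the error in the current loop.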