14 minute read

NVIDIA Jetson Thor 采用了 Blackwell 架构的 GPU。

性能基准测试分析

部署模型

vllm serve /models/Qwen/Qwen3-8B --served-model-name qwen3

运行性能基准测试

  • 高负载
vllm bench serve \
    --base-url http://localhost:8000 \
    --model qwen3 \
    --tokenizer /models/Qwen/Qwen3-8B \
    --dataset-name random \
    --random-input-len 2048 \
    --random-output-len 128 \
    --num-prompts 100 \
    --max-concurrency 8
  • 低负载
vllm bench serve \
    --base-url http://localhost:8000 \
    --model qwen3 \
    --tokenizer /models/Qwen/Qwen3-8B \
    --dataset-name random \
    --random-input-len 2048 \
    --random-output-len 128 \
    --num-prompts 10 \
    --max-concurrency 1

性能基准测试结果

Qwen3-8B

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  150.59
Total input tokens:                      204169
Total generated tokens:                  12419
Request throughput (req/s):              0.66
Output token throughput (tok/s):         82.47
Total Token throughput (tok/s):          1438.24
---------------Time to First Token----------------
Mean TTFT (ms):                          974.57
Median TTFT (ms):                        959.99
P99 TTFT (ms):                           2200.61
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          85.33
Median TPOT (ms):                        86.05
P99 TPOT (ms):                           90.57
---------------Inter-token Latency----------------
Mean ITL (ms):                           85.33
Median ITL (ms):                         72.52
P99 ITL (ms):                            361.74
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  81.59
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.12
Output token throughput (tok/s):         15.69
Total Token throughput (tok/s):          266.09
---------------Time to First Token----------------
Mean TTFT (ms):                          78.19
Median TTFT (ms):                        78.22
P99 TTFT (ms):                           80.61
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          63.63
Median TPOT (ms):                        63.63
P99 TPOT (ms):                           63.76
---------------Inter-token Latency----------------
Mean ITL (ms):                           63.63
Median ITL (ms):                         63.59
P99 ITL (ms):                            64.89
==================================================

Qwen3-8B-FP8

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  42.94
Total input tokens:                      204169
Total generated tokens:                  12800
Request throughput (req/s):              2.33
Output token throughput (tok/s):         298.07
Total Token throughput (tok/s):          5052.48
---------------Time to First Token----------------
Mean TTFT (ms):                          495.44
Median TTFT (ms):                        455.84
P99 TTFT (ms):                           912.34
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          22.59
Median TPOT (ms):                        23.00
P99 TPOT (ms):                           25.05
---------------Inter-token Latency----------------
Mean ITL (ms):                           22.59
Median ITL (ms):                         17.88
P99 ITL (ms):                            150.30
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  11.52
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.87
Output token throughput (tok/s):         111.15
Total Token throughput (tok/s):          1885.21
---------------Time to First Token----------------
Mean TTFT (ms):                          23.06
Median TTFT (ms):                        23.14
P99 TTFT (ms):                           25.49
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          8.88
Median TPOT (ms):                        8.88
P99 TPOT (ms):                           8.98
---------------Inter-token Latency----------------
Mean ITL (ms):                           8.88
Median ITL (ms):                         8.68
P99 ITL (ms):                            9.99
==================================================

Qwen3-8B-FP4

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  74.12
Total input tokens:                      204169
Total generated tokens:                  12393
Request throughput (req/s):              1.35
Output token throughput (tok/s):         167.20
Total Token throughput (tok/s):          2921.73
---------------Time to First Token----------------
Mean TTFT (ms):                          570.92
Median TTFT (ms):                        460.86
P99 TTFT (ms):                           1935.81
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          42.08
Median TPOT (ms):                        41.89
P99 TPOT (ms):                           50.72
---------------Inter-token Latency----------------
Mean ITL (ms):                           42.09
Median ITL (ms):                         33.41
P99 ITL (ms):                            211.06
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  31.79
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.31
Output token throughput (tok/s):         40.26
Total Token throughput (tok/s):          682.94
---------------Time to First Token----------------
Mean TTFT (ms):                          38.55
Median TTFT (ms):                        38.39
P99 TTFT (ms):                           40.58
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          24.73
Median TPOT (ms):                        24.71
P99 TPOT (ms):                           24.81
---------------Inter-token Latency----------------
Mean ITL (ms):                           24.73
Median ITL (ms):                         24.60
P99 ITL (ms):                            25.78
==================================================

Qwen3-8B-GPTQ-Int4

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     200
Maximum request concurrency:             8
Benchmark duration (s):                  240.75
Total input tokens:                      408281
Total generated tokens:                  24244
Request throughput (req/s):              0.83
Output token throughput (tok/s):         100.70
Total Token throughput (tok/s):          1796.55
---------------Time to First Token----------------
Mean TTFT (ms):                          1918.40
Median TTFT (ms):                        1886.97
P99 TTFT (ms):                           3725.30
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          64.07
Median TPOT (ms):                        65.20
P99 TPOT (ms):                           77.52
---------------Inter-token Latency----------------
Mean ITL (ms):                           63.69
Median ITL (ms):                         31.55
P99 ITL (ms):                            743.86
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  29.70
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.34
Output token throughput (tok/s):         43.09
Total Token throughput (tok/s):          730.96
---------------Time to First Token----------------
Mean TTFT (ms):                          36.87
Median TTFT (ms):                        36.94
P99 TTFT (ms):                           38.49
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          23.09
Median TPOT (ms):                        23.09
P99 TPOT (ms):                           23.19
---------------Inter-token Latency----------------
Mean ITL (ms):                           23.09
Median ITL (ms):                         22.96
P99 ITL (ms):                            24.13
==================================================

Qwen3-8B-GPTQ-Int8

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  156.75
Total input tokens:                      204169
Total generated tokens:                  12419
Request throughput (req/s):              0.64
Output token throughput (tok/s):         79.23
Total Token throughput (tok/s):          1381.78
---------------Time to First Token----------------
Mean TTFT (ms):                          2225.45
Median TTFT (ms):                        1899.85
P99 TTFT (ms):                           5168.47
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          81.04
Median TPOT (ms):                        83.37
P99 TPOT (ms):                           91.55
---------------Inter-token Latency----------------
Mean ITL (ms):                           81.04
Median ITL (ms):                         45.06
P99 ITL (ms):                            858.38
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  47.19
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.21
Output token throughput (tok/s):         27.13
Total Token throughput (tok/s):          460.11
---------------Time to First Token----------------
Mean TTFT (ms):                          50.83
Median TTFT (ms):                        50.65
P99 TTFT (ms):                           53.36
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          36.75
Median TPOT (ms):                        36.74
P99 TPOT (ms):                           36.86
---------------Inter-token Latency----------------
Mean ITL (ms):                           36.75
Median ITL (ms):                         36.83
P99 ITL (ms):                            37.66
==================================================

Qwen3-8B-AWQ

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  123.36
Total input tokens:                      204169
Total generated tokens:                  12392
Request throughput (req/s):              0.81
Output token throughput (tok/s):         100.45
Total Token throughput (tok/s):          1755.51
---------------Time to First Token----------------
Mean TTFT (ms):                          1823.17
Median TTFT (ms):                        1529.95
P99 TTFT (ms):                           4474.37
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          63.92
Median TPOT (ms):                        65.64
P99 TPOT (ms):                           77.46
---------------Inter-token Latency----------------
Mean ITL (ms):                           63.99
Median ITL (ms):                         31.84
P99 ITL (ms):                            745.13
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  30.00
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.33
Output token throughput (tok/s):         42.66
Total Token throughput (tok/s):          723.61
---------------Time to First Token----------------
Mean TTFT (ms):                          36.95
Median TTFT (ms):                        37.39
P99 TTFT (ms):                           38.72
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          23.33
Median TPOT (ms):                        23.36
P99 TPOT (ms):                           23.40
---------------Inter-token Latency----------------
Mean ITL (ms):                           23.33
Median ITL (ms):                         23.17
P99 ITL (ms):                            24.34
==================================================

Qwen3-8B-Int4-W4A16

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  124.74
Total input tokens:                      204169
Total generated tokens:                  12419
Request throughput (req/s):              0.80
Output token throughput (tok/s):         99.56
Total Token throughput (tok/s):          1736.36
---------------Time to First Token----------------
Mean TTFT (ms):                          2114.14
Median TTFT (ms):                        2230.05
P99 TTFT (ms):                           4737.55
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          62.10
Median TPOT (ms):                        61.38
P99 TPOT (ms):                           71.65
---------------Inter-token Latency----------------
Mean ITL (ms):                           62.10
Median ITL (ms):                         31.80
P99 ITL (ms):                            746.19
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  29.98
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.33
Output token throughput (tok/s):         42.70
Total Token throughput (tok/s):          724.22
---------------Time to First Token----------------
Mean TTFT (ms):                          37.85
Median TTFT (ms):                        38.06
P99 TTFT (ms):                           39.91
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          23.30
Median TPOT (ms):                        23.31
P99 TPOT (ms):                           23.41
---------------Inter-token Latency----------------
Mean ITL (ms):                           23.30
Median ITL (ms):                         23.11
P99 ITL (ms):                            24.41
==================================================

Qwen3-8B-Int8-W8A16

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  152.79
Total input tokens:                      204169
Total generated tokens:                  12419
Request throughput (req/s):              0.65
Output token throughput (tok/s):         81.28
Total Token throughput (tok/s):          1417.54
---------------Time to First Token----------------
Mean TTFT (ms):                          2294.06
Median TTFT (ms):                        2376.24
P99 TTFT (ms):                           5134.68
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          77.95
Median TPOT (ms):                        80.18
P99 TPOT (ms):                           87.64
---------------Inter-token Latency----------------
Mean ITL (ms):                           77.95
Median ITL (ms):                         45.00
P99 ITL (ms):                            852.45
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  46.51
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.22
Output token throughput (tok/s):         27.52
Total Token throughput (tok/s):          466.84
---------------Time to First Token----------------
Mean TTFT (ms):                          50.23
Median TTFT (ms):                        50.24
P99 TTFT (ms):                           51.65
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          36.22
Median TPOT (ms):                        36.22
P99 TPOT (ms):                           36.25
---------------Inter-token Latency----------------
Mean ITL (ms):                           36.22
Median ITL (ms):                         36.24
P99 ITL (ms):                            37.03
==================================================

Qwen3-8B-GGUF

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  1617.23
Total input tokens:                      204169
Total generated tokens:                  12800
Request throughput (req/s):              0.06
Output token throughput (tok/s):         7.91
Total Token throughput (tok/s):          134.16
---------------Time to First Token----------------
Mean TTFT (ms):                          43688.41
Median TTFT (ms):                        47162.20
P99 TTFT (ms):                           93242.93
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          670.56
Median TPOT (ms):                        706.07
P99 TPOT (ms):                           830.26
---------------Inter-token Latency----------------
Mean ITL (ms):                           670.56
Median ITL (ms):                         88.76
P99 ITL (ms):                            15722.16
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     1
Benchmark duration (s):                  4.39
Total input tokens:                      2048
Total generated tokens:                  128
Request throughput (req/s):              0.23
Output token throughput (tok/s):         29.14
Total Token throughput (tok/s):          495.41
---------------Time to First Token----------------
Mean TTFT (ms):                          148.53
Median TTFT (ms):                        148.53
P99 TTFT (ms):                           148.53
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          33.41
Median TPOT (ms):                        33.41
P99 TPOT (ms):                           33.41
---------------Inter-token Latency----------------
Mean ITL (ms):                           33.41
Median ITL (ms):                         33.34
P99 ITL (ms):                            34.03
==================================================

Qwen3-32B-AWQ

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  459.09
Total input tokens:                      204169
Total generated tokens:                  12421
Request throughput (req/s):              0.22
Output token throughput (tok/s):         27.06
Total Token throughput (tok/s):          471.78
---------------Time to First Token----------------
Mean TTFT (ms):                          6799.32
Median TTFT (ms):                        6419.03
P99 TTFT (ms):                           19086.70
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          240.60
Median TPOT (ms):                        238.28
P99 TPOT (ms):                           295.35
---------------Inter-token Latency----------------
Mean ITL (ms):                           237.24
Median ITL (ms):                         92.45
P99 ITL (ms):                            3155.36
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  98.36
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.10
Output token throughput (tok/s):         13.01
Total Token throughput (tok/s):          220.74
---------------Time to First Token----------------
Mean TTFT (ms):                          96.62
Median TTFT (ms):                        96.92
P99 TTFT (ms):                           98.87
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          76.68
Median TPOT (ms):                        76.69
P99 TPOT (ms):                           76.72
---------------Inter-token Latency----------------
Mean ITL (ms):                           76.68
Median ITL (ms):                         76.57
P99 ITL (ms):                            77.71
==================================================

Qwen3-30B-A3B

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  180.95
Total input tokens:                      204169
Total generated tokens:                  12800
Request throughput (req/s):              0.55
Output token throughput (tok/s):         70.74
Total Token throughput (tok/s):          1199.07
---------------Time to First Token----------------
Mean TTFT (ms):                          1635.21
Median TTFT (ms):                        1486.42
P99 TTFT (ms):                           4474.52
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          98.74
Median TPOT (ms):                        96.12
P99 TPOT (ms):                           130.45
---------------Inter-token Latency----------------
Mean ITL (ms):                           98.74
Median ITL (ms):                         77.16
P99 ITL (ms):                            728.40
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  37.26
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.27
Output token throughput (tok/s):         34.36
Total Token throughput (tok/s):          582.73
---------------Time to First Token----------------
Mean TTFT (ms):                          81.13
Median TTFT (ms):                        85.54
P99 TTFT (ms):                           92.11
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          28.70
Median TPOT (ms):                        28.70
P99 TPOT (ms):                           28.73
---------------Inter-token Latency----------------
Mean ITL (ms):                           28.70
Median ITL (ms):                         28.68
P99 ITL (ms):                            29.28
==================================================

Qwen3-30B-A3B-FP8

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  112.22
Total input tokens:                      204169
Total generated tokens:                  12800
Request throughput (req/s):              0.89
Output token throughput (tok/s):         114.07
Total Token throughput (tok/s):          1933.51
---------------Time to First Token----------------
Mean TTFT (ms):                          1297.50
Median TTFT (ms):                        1305.82
P99 TTFT (ms):                           2802.43
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          59.88
Median TPOT (ms):                        59.43
P99 TPOT (ms):                           76.87
---------------Inter-token Latency----------------
Mean ITL (ms):                           59.88
Median ITL (ms):                         41.25
P99 ITL (ms):                            456.73
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  13.35
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.75
Output token throughput (tok/s):         95.91
Total Token throughput (tok/s):          1626.78
---------------Time to First Token----------------
Mean TTFT (ms):                          71.24
Median TTFT (ms):                        68.31
P99 TTFT (ms):                           94.13
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          9.95
Median TPOT (ms):                        9.89
P99 TPOT (ms):                           10.33
---------------Inter-token Latency----------------
Mean ITL (ms):                           9.95
Median ITL (ms):                         9.87
P99 ITL (ms):                            11.44
==================================================

Qwen3-Coder-30B-A3B-Instruct-FP8

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  129.15
Total input tokens:                      204169
Total generated tokens:                  12800
Request throughput (req/s):              0.77
Output token throughput (tok/s):         99.11
Total Token throughput (tok/s):          1680.01
---------------Time to First Token----------------
Mean TTFT (ms):                          1368.76
Median TTFT (ms):                        1400.81
P99 TTFT (ms):                           2779.06
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          69.97
Median TPOT (ms):                        74.07
P99 TPOT (ms):                           83.47
---------------Inter-token Latency----------------
Mean ITL (ms):                           69.97
Median ITL (ms):                         57.64
P99 ITL (ms):                            481.45
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  15.52
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.64
Output token throughput (tok/s):         82.45
Total Token throughput (tok/s):          1398.53
---------------Time to First Token----------------
Mean TTFT (ms):                          96.15
Median TTFT (ms):                        96.01
P99 TTFT (ms):                           98.57
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          11.46
Median TPOT (ms):                        11.46
P99 TPOT (ms):                           11.49
---------------Inter-token Latency----------------
Mean ITL (ms):                           11.46
Median ITL (ms):                         11.41
P99 ITL (ms):                            12.12
==================================================

Qwen3-Coder-30B-A3B-Instruct-AWQ-8bit

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  131.51
Total input tokens:                      204169
Total generated tokens:                  12800
Request throughput (req/s):              0.76
Output token throughput (tok/s):         97.33
Total Token throughput (tok/s):          1649.80
---------------Time to First Token----------------
Mean TTFT (ms):                          1665.38
Median TTFT (ms):                        1823.10
P99 TTFT (ms):                           3795.56
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          68.32
Median TPOT (ms):                        69.15
P99 TPOT (ms):                           79.89
---------------Inter-token Latency----------------
Mean ITL (ms):                           68.32
Median ITL (ms):                         44.27
P99 ITL (ms):                            630.03
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  27.37
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.37
Output token throughput (tok/s):         46.77
Total Token throughput (tok/s):          793.33
---------------Time to First Token----------------
Mean TTFT (ms):                          61.33
Median TTFT (ms):                        62.60
P99 TTFT (ms):                           71.58
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          21.06
Median TPOT (ms):                        21.73
P99 TPOT (ms):                           21.80
---------------Inter-token Latency----------------
Mean ITL (ms):                           21.06
Median ITL (ms):                         21.59
P99 ITL (ms):                            22.42
==================================================

Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  102.74
Total input tokens:                      204169
Total generated tokens:                  12675
Request throughput (req/s):              0.97
Output token throughput (tok/s):         123.37
Total Token throughput (tok/s):          2110.67
---------------Time to First Token----------------
Mean TTFT (ms):                          1298.94
Median TTFT (ms):                        1150.24
P99 TTFT (ms):                           3232.16
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          53.75
Median TPOT (ms):                        53.71
P99 TPOT (ms):                           61.10
---------------Inter-token Latency----------------
Mean ITL (ms):                           53.79
Median ITL (ms):                         31.55
P99 ITL (ms):                            541.87
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  20.71
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.48
Output token throughput (tok/s):         61.79
Total Token throughput (tok/s):          1048.09
---------------Time to First Token----------------
Mean TTFT (ms):                          44.06
Median TTFT (ms):                        45.90
P99 TTFT (ms):                           49.16
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          15.96
Median TPOT (ms):                        15.96
P99 TPOT (ms):                           16.00
---------------Inter-token Latency----------------
Mean ITL (ms):                           15.96
Median ITL (ms):                         15.95
P99 ITL (ms):                            16.52
==================================================

Qwen3-Coder-30B-A3B-Instruct-Int4-W4A16

  • 高负载
============ Serving Benchmark Result ============
Successful requests:                     100
Maximum request concurrency:             8
Benchmark duration (s):                  99.27
Total input tokens:                      204169
Total generated tokens:                  12789
Request throughput (req/s):              1.01
Output token throughput (tok/s):         128.83
Total Token throughput (tok/s):          2185.54
---------------Time to First Token----------------
Mean TTFT (ms):                          1314.33
Median TTFT (ms):                        1111.64
P99 TTFT (ms):                           3123.45
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          51.26
Median TPOT (ms):                        52.49
P99 TPOT (ms):                           59.17
---------------Inter-token Latency----------------
Mean ITL (ms):                           51.27
Median ITL (ms):                         30.34
P99 ITL (ms):                            523.20
==================================================
  • 低负载
============ Serving Benchmark Result ============
Successful requests:                     10
Maximum request concurrency:             1
Benchmark duration (s):                  19.68
Total input tokens:                      20431
Total generated tokens:                  1280
Request throughput (req/s):              0.51
Output token throughput (tok/s):         65.05
Total Token throughput (tok/s):          1103.33
---------------Time to First Token----------------
Mean TTFT (ms):                          41.75
Median TTFT (ms):                        43.20
P99 TTFT (ms):                           46.20
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          15.16
Median TPOT (ms):                        15.16
P99 TPOT (ms):                           15.19
---------------Inter-token Latency----------------
Mean ITL (ms):                           15.16
Median ITL (ms):                         15.15
P99 ITL (ms):                            15.68
==================================================

参考资料

Updated: