Peak Benchmark: Geodessical GPU vs Ollama GPU
This file is a raw benchmark summary. Read it alongside ../BENCHMARK_ANALYSIS.md if you want the interpretation and follow-up fixes.
Date: 2026-04-15 13:47
CPU: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
GPU: NVIDIA GeForce RTX 4070 Laptop GPU | VRAM total: 8188 MB
Model: gemma4-2b (google_gemma-4-E2B-it Q4_0, 3.2 GB)
Trials: 5 measured per condition + 1 warmup discarded
Conditions: GPU backend only -- optimal settings for each runtime
Metrics:
Decode t/s = decode-phase tokens per second
Prefill t/s = prompt processing throughput
TTFT ms = Time To First Token
E2E ms = total wall time (TTFT + decode)
Avg GPU% = mean GPU utilization during inference
Peak VRAM = max VRAM used during inference (MB)
Avg Watt = mean GPU power draw (W)
Avg CPU% = mean system CPU load during inference
Peak RAM = peak process working set (MB)
Performance: Head-to-Head
Runtime
N
N-gen
Decode t/s
Prefill t/s
TTFT ms
E2E ms
ms/tok
Geo-GPU
128
128
71
114.4
684
2487.4
19.4
Geo-GPU
512
512
88.4
100.3
689.6
6482.2
12.7
Oll-GPU
128
128
116.9
4105
11.5
1106.2
8.6
Oll-GPU
512
512
114.8
3487.1
12.6
4472.6
8.7
Geo/Oll ratio
128
0.61x
Geo/Oll ratio
512
0.77x
Runtime
N
N-gen
Decode t/s
Prefill t/s
TTFT ms
E2E ms
ms/tok
Geo-GPU
128
14
35.9
115.1
259.2
649.4
46.4
Geo-GPU
512
14
35.8
115
260.2
651
46.5
Oll-GPU
128
128
118
1976.5
10.9
1095.4
8.6
Oll-GPU
512
321
115.5
1549.5
13.1
2794.4
8.7
Geo/Oll ratio
128
0.3x
Geo/Oll ratio
512
0.31x
Resource Usage: Head-to-Head
Runtime
N
Avg GPU%
Peak VRAM MB
Avg Watt
Avg CPU%
Peak RAM MB
Geo-GPU
128
1
3617
20.3
23.4
1173
Geo-GPU
512
74.3
5602
62.8
22.7
2626
Oll-GPU
128
85.7
3617
60.1
24
1045
Oll-GPU
512
73.4
3617
62.9
28.1
1047
Runtime
N
Avg GPU%
Peak VRAM MB
Avg Watt
Avg CPU%
Peak RAM MB
Geo-GPU
128
22.5
3617
35
21.8
1211
Geo-GPU
512
33.8
3617
32.6
29
1167
Oll-GPU
128
87.7
3617
70
28
1047
Oll-GPU
512
83.2
3617
67.1
27.2
1047
Efficiency: Decode t/s per Watt
Runtime
Prompt
N
Decode t/s
Avg Watt
t/s per W
Geo-GPU
long
128
71
20.3
3.498
Geo-GPU
long
512
88.4
62.8
1.408
Geo-GPU
short
128
35.9
35
1.026
Geo-GPU
short
512
35.8
32.6
1.098
Oll-GPU
long
128
116.9
60.1
1.945
Oll-GPU
long
512
114.8
62.9
1.825
Oll-GPU
short
128
118
70
1.686
Oll-GPU
short
512
115.5
67.1
1.721
Runtime
Prompt
N
Trial
N-gen
Decode t/s
Prefill t/s
TTFT ms
E2E ms
GPU%
VRAM MB
Watt
CPU%
RAM MB
Geo-GPU
long
128
1
128
70.4
114.7
701
2518
0
3617
16.9
24
1160
Geo-GPU
long
128
2
128
70.2
113.3
692
2515
1
3617
17.6
23
1131
Geo-GPU
long
128
3
128
71
114.3
682
2484
0
3617
31.1
22
1173
Geo-GPU
long
128
4
128
71.2
114.8
680
2477
1
3617
18.2
19
1150
Geo-GPU
long
128
5
128
72
115
665
2443
0
3617
17.7
29
1160
Geo-GPU
long
512
1
512
88.4
100.4
689
6480
83.5
5602
70.3
19.5
2626
Geo-GPU
long
512
2
512
88.3
100.3
695
6495
70
5602
57.9
27.8
2626
Geo-GPU
long
512
3
512
88.8
100.4
671
6438
74.2
5602
59.6
18
2626
Geo-GPU
long
512
4
512
88.2
100.2
698
6504
70
5602
62.5
31.2
2626
Geo-GPU
long
512
5
512
88.3
100.3
695
6494
74
5602
63.7
17
2626
Geo-GPU
short
128
1
14
37.3
122.3
252
627
0
3617
18.1
22
1160
Geo-GPU
short
128
2
14
35.3
111.1
262
659
1
3617
18.1
21
1150
Geo-GPU
short
128
3
14
37.4
122.9
252
626
44
3617
56.3
19
1160
Geo-GPU
short
128
4
14
34.1
108.6
271
681
0
3617
34
26
1160
Geo-GPU
short
128
5
14
35.4
110.6
259
654
0
3617
48.4
21
1211
Geo-GPU
short
512
1
14
35.4
111.1
260
655
0
3617
16.8
27
1160
Geo-GPU
short
512
2
14
35.1
111.2
264
663
73
3617
56.5
27
1160
Geo-GPU
short
512
3
14
36.6
119.7
257
639
1
3617
16.9
33
1160
Geo-GPU
short
512
4
14
37.1
121.3
254
631
60
3617
54.9
21
1167
Geo-GPU
short
512
5
14
34.9
111.5
266
667
1
3617
17.8
37
1160
Oll-GPU
long
128
1
128
116.6
2123.5
20.2
1118
88
3617
69.6
26
1041
Oll-GPU
long
128
2
128
117.3
4690.1
9.2
1100
83
3617
65.7
30
1044
Oll-GPU
long
128
3
128
118
4464.7
9.6
1094
86
3617
67.8
21
1045
Oll-GPU
long
128
4
128
115.9
4766
9
1114
0
3617
46.7
20
1045
Oll-GPU
long
128
5
128
116.8
4480.8
9.6
1105
0
3617
50.6
23
1045
Oll-GPU
long
512
1
512
114.1
4561.2
9.4
4496
57
3617
60.9
37.3
1047
Oll-GPU
long
512
2
512
114
3270.9
13.1
4506
86
3617
67.6
27.3
1047
Oll-GPU
long
512
3
512
116
3338.3
12.9
4427
87.3
3617
68.8
24.3
1047
Oll-GPU
long
512
4
512
114.6
3130
13.7
4483
57
3617
51.8
22.7
1047
Oll-GPU
long
512
5
512
115.4
3135
13.7
4451
79.7
3617
65.5
29
1047
Oll-GPU
short
128
1
128
119
1108.2
18
1094
0
0
0
0
0
Oll-GPU
short
128
2
128
118.5
2206.2
9.1
1089
87
3617
70.1
24
1047
Oll-GPU
short
128
3
128
117.8
2137.6
9.4
1096
88
3617
70.4
26
1045
Oll-GPU
short
128
4
128
117.2
2251.2
8.9
1101
88
3617
69.6
34
1045
Oll-GPU
short
128
5
128
117.7
2179.2
9.2
1097
0
0
0
0
0
Oll-GPU
short
512
1
342
116.4
1966.7
10.2
2949
85.5
3617
69.4
29.5
1047
Oll-GPU
short
512
2
330
113.8
1443.2
13.9
2914
86.5
3617
69.9
21.5
1047
Oll-GPU
short
512
3
304
114.7
1348.4
14.8
2666
85
3617
69
34
1045
Oll-GPU
short
512
4
358
115.5
1478.8
13.5
3114
71
3617
56.9
26
1047
Oll-GPU
short
512
5
271
117
1510.2
13.2
2329
88
3617
70.2
25
1045