
Commit b567a7b

committed
add en doc
1 parent eb0a242 commit b567a7b

11 files changed

Lines changed: 3266 additions & 26 deletions


docs/.vitepress/config.mts

Lines changed: 60 additions & 26 deletions
@@ -4,35 +4,69 @@ import { defineConfig } from 'vitepress'
 export default defineConfig({
   title: "SQLRec",
   description: "SQLRec docs",
-  themeConfig: {
-    // https://vitepress.dev/reference/default-theme-config
-    nav: [
-      { text: '主页', link: '/' },
-      { text: '文档', link: '/docs/intro' }
-    ],
-
-    outline: [2, 6],
-
-    sidebar: [
-      { text: '介绍', link: '/docs/intro' },
-      { text: '部署', link: '/docs/deployment' },
-      { text: '快速开始', link: '/docs/quick_start' },
-      { text: '性能测试', link: '/docs/benchmark' },
-      { text: '编程模型', link: '/docs/program_model' },
-      { text: 'SQL语法', link: '/docs/sql_reference' },
-      { text: '模型', link: '/docs/models' },
-      { text: '内置UDF', link: '/docs/udf' },
-      {
+  locales: {
+    root: {
+      label: '简体中文',
+      lang: 'zh-CN',
+      themeConfig: {
+        nav: [
+          { text: '主页', link: '/' },
+          { text: '文档', link: '/docs/intro' }
+        ],
+        outline: [2, 6],
+        sidebar: [
+          { text: '介绍', link: '/docs/intro' },
+          { text: '部署', link: '/docs/deployment' },
+          { text: '快速开始', link: '/docs/quick_start' },
+          { text: '性能测试', link: '/docs/benchmark' },
+          { text: '编程模型', link: '/docs/program_model' },
+          { text: 'SQL语法', link: '/docs/sql_reference' },
+          { text: '模型', link: '/docs/models' },
+          { text: '内置UDF', link: '/docs/udf' },
+          {
         text: '教程',
         collapsed: true,
         items: [
-          { text: '召回', link: '/docs/tutorial/recall' }
+            { text: '召回', link: '/docs/tutorial/recall' }
         ]
-      }
-    ],
-
-    socialLinks: [
-      { icon: 'github', link: 'https://github.com/sqlrec/sqlrec' }
-    ]
+          }
+        ],
+        socialLinks: [
+          { icon: 'github', link: 'https://github.com/sqlrec/sqlrec' }
+        ]
+      }
+    },
+    en: {
+      label: 'English',
+      lang: 'en-US',
+      link: '/en/',
+      themeConfig: {
+        nav: [
+          { text: 'Home', link: '/en/' },
+          { text: 'Docs', link: '/en/docs/intro' }
+        ],
+        outline: [2, 6],
+        sidebar: [
+          { text: 'Introduction', link: '/en/docs/intro' },
+          { text: 'Deployment', link: '/en/docs/deployment' },
+          { text: 'Quick Start', link: '/en/docs/quick_start' },
+          { text: 'Benchmark', link: '/en/docs/benchmark' },
+          { text: 'Programming Model', link: '/en/docs/program_model' },
+          { text: 'SQL Reference', link: '/en/docs/sql_reference' },
+          { text: 'Models', link: '/en/docs/models' },
+          { text: 'Built-in UDF', link: '/en/docs/udf' },
+          {
+            text: 'Tutorials',
+            collapsed: true,
+            items: [
+              { text: 'Recall', link: '/en/docs/tutorial/recall' }
+            ]
+          }
+        ],
+        socialLinks: [
+          { icon: 'github', link: 'https://github.com/sqlrec/sqlrec' }
+        ]
+      }
+    }
   }
 })
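
The root and `en` sidebars above list the same pages in the same order. They differ only in their labels and in the `/en` link prefix. Below is a minimal sketch of generating both from a single table, so that adding a page means editing one place. `Page`, `pages`, `tutorials`, and `buildSidebar` are hypothetical names. The local `SidebarItem` interface only mirrors the `text`/`link`/`collapsed`/`items` keys used in `config.mts`; it does not import VitePress's own types.

```typescript
// Mirrors the keys used in config.mts; declared locally so the sketch is self-contained.
interface SidebarItem {
  text: string;
  link?: string;
  collapsed?: boolean;
  items?: SidebarItem[];
}

type Locale = "zh" | "en";

// Hypothetical single source of truth: one path per page, one label per locale.
interface Page {
  path: string;
  label: Record<Locale, string>;
}

const pages: Page[] = [
  { path: "/docs/intro", label: { zh: "介绍", en: "Introduction" } },
  { path: "/docs/deployment", label: { zh: "部署", en: "Deployment" } },
  { path: "/docs/quick_start", label: { zh: "快速开始", en: "Quick Start" } },
  { path: "/docs/benchmark", label: { zh: "性能测试", en: "Benchmark" } },
  { path: "/docs/program_model", label: { zh: "编程模型", en: "Programming Model" } },
  { path: "/docs/sql_reference", label: { zh: "SQL语法", en: "SQL Reference" } },
  { path: "/docs/models", label: { zh: "模型", en: "Models" } },
  { path: "/docs/udf", label: { zh: "内置UDF", en: "Built-in UDF" } },
];

const tutorials: Page[] = [
  { path: "/docs/tutorial/recall", label: { zh: "召回", en: "Recall" } },
];

const tutorialGroupLabel: Record<Locale, string> = { zh: "教程", en: "Tutorials" };

// Build one locale's sidebar; `prefix` is "" for the root locale and "/en" for English.
function buildSidebar(locale: Locale, prefix: string): SidebarItem[] {
  const toItem = (p: Page): SidebarItem => ({ text: p.label[locale], link: prefix + p.path });
  return [
    ...pages.map(toItem),
    { text: tutorialGroupLabel[locale], collapsed: true, items: tutorials.map(toItem) },
  ];
}
```

With this approach, `root.themeConfig.sidebar` would be `buildSidebar("zh", "")` and `en.themeConfig.sidebar` would be `buildSidebar("en", "/en")`. The `nav` arrays could be handled the same way.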

docs/en/docs/benchmark.md

Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
# Benchmark

This document describes how SQLRec's performance is tested and reports the results.

## Test Environment

**Hardware Configuration**:
- CPU: AMD Ryzen 5600H
- Memory: 32GB DDR4

**Software Environment**:
- Operating System: Debian 12
- Kubernetes: Minikube
- SQLRec: Single-instance deployment

## Test Data

The default test configuration is as follows:

| Configuration Item | Value |
|-------------------|-------|
| Number of Users | 100,000 |
| Number of Items | 100,000 |
| Vector Dimension | 8 |
| User Embedding | Fixed value |
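
The benchmark generates this data with Python scripts called from `init.sh`, and those scripts are not shown here. The following is a hypothetical TypeScript stand-in that produces item embeddings matching the table (100,000 items, 8 dimensions) plus one fixed user embedding. Several choices are assumptions made for reproducibility, not the benchmark's actual values: the seeded `mulberry32` generator, the uniform [-1, 1) distribution, the seed, and the all-equal user vector. The sketch also makes the storage cost concrete: 100,000 × 8 float32 values × 4 bytes = 3,200,000 bytes.

```typescript
// Hypothetical stand-in for the Python data generator used by init.sh.
// Sizes follow the "Test Data" table; distribution, seed and user vector are assumptions.
const NUM_ITEMS = 100_000;
const DIM = 8;

// mulberry32: a small seeded PRNG returning floats in [0, 1); Math.random cannot be seeded.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Row-major storage: item i occupies indices [i * dim, (i + 1) * dim).
// Each row is L2-normalized, so cosine similarity reduces to a dot product.
function generateItemEmbeddings(numItems: number, dim: number, seed: number): Float32Array {
  const rand = mulberry32(seed);
  const out = new Float32Array(numItems * dim);
  for (let i = 0; i < numItems; i++) {
    let sumSq = 0;
    for (let d = 0; d < dim; d++) {
      const x = rand() * 2 - 1; // uniform in [-1, 1)
      out[i * dim + d] = x;
      sumSq += x * x;
    }
    const norm = Math.sqrt(sumSq) || 1; // guard against an all-zero row
    for (let d = 0; d < dim; d++) out[i * dim + d] /= norm;
  }
  return out;
}

// "Fixed value" user embedding: assumed here to be the same unit vector with equal components.
function fixedUserEmbedding(dim: number): Float32Array {
  return new Float32Array(dim).fill(1 / Math.sqrt(dim));
}

const items = generateItemEmbeddings(NUM_ITEMS, DIM, 42);
const user = fixedUserEmbedding(DIM);
console.log(`item vectors: ${items.byteLength} bytes`); // NUM_ITEMS x DIM x 4 bytes
console.log(`user embedding: [${Array.from(user).map((v) => v.toFixed(4)).join(", ")}]`);
```

Because every user shares one embedding, the vector search stage issues an identical query for every user. Per-user variation therefore comes from the other recall channels.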

## Recommendation Pipeline

The tested recommendation pipeline consists of the following stages:

### Recall Stage

| Recall Strategy | Description | Recall Count |
|----------------|-------------|--------------|
| Global Hot Recall | Top items by global popularity | 300 |
| User Interest Category Recall | Hot items in the user's interest categories | 300 |
| ItemCF Recall | Item-based collaborative filtering | 300 |
| Vector Search Recall | Vector similarity search | 300 |
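
With four channels of 300 items each, the merge step receives at most 1,200 candidates. The count is lower whenever several channels recall the same item. The sketch below assumes a simple policy: channels are merged in table order, and the first occurrence of each item wins. `RecallChannel`, `Candidate`, `mergeRecall`, and the channel names are hypothetical, because this document does not describe how SQLRec actually merges recall results.

```typescript
// Hypothetical shapes: one ranked list of item IDs per recall channel.
interface RecallChannel {
  source: string; // illustrative names, e.g. "global_hot", "category_hot", "itemcf", "vector"
  itemIds: number[]; // ranked, best first
}

interface Candidate {
  itemId: number;
  source: string; // first channel that produced the item
}

// Merge channels in the given order, truncating each to `perChannelLimit`
// and keeping only the first occurrence of every item ID.
function mergeRecall(channels: RecallChannel[], perChannelLimit = 300): Candidate[] {
  const seen = new Set<number>();
  const merged: Candidate[] = [];
  for (const channel of channels) {
    for (const itemId of channel.itemIds.slice(0, perChannelLimit)) {
      if (seen.has(itemId)) continue; // already recalled by an earlier channel
      seen.add(itemId);
      merged.push({ itemId, source: channel.source });
    }
  }
  return merged;
}
```

Keeping the `source` of each candidate costs little and lets later stages attribute clicks to the channel that surfaced the item.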

### Filtering Stage

| Filtering Strategy | Description |
|-------------------|-------------|
| Exposure Deduplication | Remove items already exposed to the user |
| Category Diversification | Keep at most N items per category |
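
Both filters can be applied in a single pass over the ranked candidates. Below is a minimal sketch. It assumes candidates arrive best-first and that diversification keeps the first N items of each category in that order. `RecItem`, `filterCandidates`, and `maxPerCategory` are hypothetical names, and the document leaves the value of N unspecified.

```typescript
// Hypothetical candidate shape for this sketch.
interface RecItem {
  itemId: number;
  category: string;
}

// 1) Exposure deduplication: drop items the user has already seen.
// 2) Category diversification: keep at most `maxPerCategory` items per category,
//    preserving the input (ranking) order.
function filterCandidates(
  candidates: RecItem[],
  exposed: Set<number>,
  maxPerCategory: number,
): RecItem[] {
  const perCategory = new Map<string, number>();
  const result: RecItem[] = [];
  for (const item of candidates) {
    if (exposed.has(item.itemId)) continue; // already exposed
    const count = perCategory.get(item.category) ?? 0;
    if (count >= maxPerCategory) continue; // category quota reached
    perCategory.set(item.category, count + 1);
    result.push(item);
  }
  return result;
}
```

Running deduplication before the quota check matters: an already-exposed item should not consume one of its category's N slots.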

## Test Scripts

### Initialize Test Environment

```bash
cd benchmark
bash init.sh
```

The `init.sh` script performs the following operations:

1. **Create Milvus Vector Collection**
   - Create the `item_embedding` collection
   - Set the vector dimension to 8
   - Create an index with the COSINE similarity metric

2. **Create Data Tables**
   - User table (`user_table`)
   - Item table (`item_table`)
   - Global hot items table (`global_hot_item`)
   - User interest category table (`user_interest_category1`)
   - Category hot items table (`category1_hot_item`)
   - User recent clicks table (`user_recent_click_item`)
   - User exposure table (`user_exposure_item`)
   - ItemCF I2I table (`itemcf_i2i`)
   - Item vector table (`item_embedding`)
   - Recommendation log table (`rec_log_kafka`)

3. **Generate Simulated Data**
   - Generate data for 100,000 users and 100,000 items with Python scripts
   - Generate user behavior data and upload it to HDFS

4. **Install Test Tools**
   - Install the wrk HTTP benchmarking tool
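
`init.sh` indexes the `item_embedding` collection with the COSINE metric. For reference, the sketch below computes the exact top-k by cosine similarity with a brute-force scan over all items. A Milvus index returns this result either exactly or approximately, depending on the index type, which the description above does not state. `cosine`, `Hit`, and `topKCosine` are illustrative names, not SQLRec or Milvus APIs. At 100,000 items × 8 dimensions, the dot products alone take 800,000 multiply-adds per query.

```typescript
// Cosine similarity of two equal-length vectors; returns 0 if either vector is all zeros.
function cosine(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

interface Hit {
  index: number; // position of the item in `items`
  score: number; // cosine similarity to the query
}

// Exact top-k: score every item, sort by descending similarity, keep the first k.
function topKCosine(query: ArrayLike<number>, items: ArrayLike<number>[], k: number): Hit[] {
  return items
    .map((vec, index) => ({ index, score: cosine(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In this benchmark, k would be 300, the Vector Search Recall count. A full scan plus sort is useful as a correctness baseline, but it grows linearly with the item count, and that cost is what the vector index is meant to avoid.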

### Execute Performance Test

```bash
bash benchmark.sh
```

The `benchmark.sh` script performs the following operations:

1. **Warm-up Phase**
   - Single thread, single connection, running for 10 seconds
   - Warms up the system caches

2. **Test Phase**
   - Concurrency: 10 threads, 10 connections
   - Duration: 30 seconds
   - Endpoint: `/api/v1/main_rec`

### Test Request Script

`request.lua` is a custom request script for wrk:

```lua
-- Seed the random number generator
math.randomseed(os.time())

function request()
  -- Pick a random user ID between 0 and 99999
  local random_id = math.random(0, 99999)

  -- Build the JSON request body
  local request_body = string.format(
    '{"inputs":{"user_info":[{"id":%d}]},"params":{"recall_fun":"recall_fun"}}',
    random_id
  )

  -- Configure the HTTP request
  wrk.method = "POST"
  wrk.headers["Content-Type"] = "application/json"
  wrk.body = request_body

  return wrk.format()
end
```

## Test Results

Results on the test machine (AMD Ryzen 5600H, 32GB DDR4):

```
Running 30s test @ http://192.168.49.2:30301/api/v1/main_rec
  10 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    10.80ms    4.81ms   53.85ms   92.68%
    Req/Sec    95.12     17.61    121.00     82.63%
  28464 requests in 30.02s, 49.80MB read
  Socket errors: connect 0, read 28463, write 0, timeout 0
Requests/sec:    948.09
Transfer/sec:      1.66MB
```

**Performance Metrics**:

| Metric | Value |
|--------|-------|
| Average Latency | 10.80ms |
| Latency Standard Deviation | 4.81ms |
| Max Latency | 53.85ms |
| Average Requests/sec per Thread | 95.12 |
| Total Requests | 28,464 |
| Overall QPS | 948.09 |
| Throughput | 1.66MB/s |
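
The reported figures are internally consistent, and checking that consistency is a quick sanity test for any wrk run. The sketch below recomputes the figures from the raw output in three ways:

- Overall QPS is total requests divided by duration.
- The per-thread `Req/Sec` average times 10 threads should land near the overall figure.
- Little's law (average requests in flight = throughput × average latency) should give roughly the 10 open connections.

The constants are copied from the output above. The function names are illustrative.

```typescript
// Figures copied from the wrk output above.
const run = {
  totalRequests: 28_464,
  durationSec: 30.02,
  threads: 10,
  connections: 10,
  perThreadReqPerSec: 95.12, // wrk's "Req/Sec" row is a per-thread average
  avgLatencyMs: 10.8,
  mbRead: 49.8,
};

type Run = typeof run;

// Overall throughput: all requests divided by wall-clock duration.
function overallQps(r: Run): number {
  return r.totalRequests / r.durationSec;
}

// Per-thread average times thread count should approximate overall throughput.
function qpsFromThreads(r: Run): number {
  return r.perThreadReqPerSec * r.threads;
}

// Little's law: average requests in flight = throughput (req/s) x average latency (s).
function avgInFlight(r: Run): number {
  return overallQps(r) * (r.avgLatencyMs / 1000);
}

// Transfer rate: data read divided by duration.
function transferMBps(r: Run): number {
  return r.mbRead / r.durationSec;
}

console.log(`overall QPS:      ${overallQps(run).toFixed(2)}`);
console.log(`from threads:     ${qpsFromThreads(run).toFixed(2)}`);
console.log(`avg in flight:    ${avgInFlight(run).toFixed(2)} (connections: ${run.connections})`);
console.log(`transfer (MB/s):  ${transferMBps(run).toFixed(2)}`);
```

This prints roughly 948.2, 951.2, 10.2, and 1.66. The small gap from wrk's 948.09 comes from the duration being rounded to 30.02 s in the output. An in-flight average close to the connection count indicates the client kept every connection busy for the whole run.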
