Mysql query will become slow when table contains millions of rows. A simple way of solving this probolem is spliting the big table into small ones. We use some method to decide which table each row should be moved into. A widely used method is "modulus", e.g. group rows by [ID mod 100]. This tool will help you to do this job easily.
- Move records from one table to another(or other) table(s) according to user specified rules.
- Dynamically create new tables during moving.
- Move records between different mysql servers.
- Check data integrity of the new tables.
- For records who have been moved to new tables, remove them from source table.
- Using it as a python module which means dynamically changing the behavior.
python2: pip install pymysql_split_tool
python3: pip3 install pymysql_split_tool
The following python version are supported and tested: 2.6, 2.7, 3.3, 3.4, 3.5, 3.6, 3.7
- Just create a split task file and run this tool with it:
python -m pymysql_split_tool --action split --task ./task.json
- Supported parameters:
| parameter name | description |
|---|---|
| action | what type of task to do. required. |
| task | path to a file contains the detailed infomation for the task. required. |
| debug | print debug log. optional. |
- There are two type of
actionsupported:
| action | description |
|---|---|
| split | Move rows from a big table into small tables. |
| check | Check data integrity of new small tables. ==not implemented yet==. |
| remove | Remove rows from a big table after checking data integrity in small tables. ==not implemented yet==. |
- A quick look at a sample
task file:
{
"src":{
"mysql":{
"host":"127.0.0.1",
"port":"3306",
"unix_socket":"/path/to/mysqld.sock",
"user":"user_name",
"password":"password"
},
"database":"name_of_database",
"table":"name_of_the_big_table"
},
"dest":{
"table":"small_table_[n]"
},
"rule":{
"filter":"id>1000",
"group_method":"modulus",
"group_base":100,
"group_column":"id"
}
}Check the test directory for more sample task files.
| field @ level-1 | field @ level-2 | field @ level-3 | description | sample |
|---|---|---|---|---|
| src | infomation of the being splitted table | |||
| mysql | mysql conneciton parameters for PyMysql | |||
| host | mysql host(domain name or ip) | "127.0.0.1" | ||
| port | mysql port | 3306 | ||
| unix_socket | connect mysql server with unix-socket | "/var/run/mysqld.sock" | ||
| user | mysql user name, need read privilege at least | "root" | ||
| password | mysql user password | |||
| database | name of database | |||
| table | name of table | |||
| dest | infomation of the new created small tables | |||
| mysql | mysql conneciton parameters. If not set, will use the same parameters as "src". Has same level-3 fileds like "src" |
|||
| database | name of database where we create new small tables. If not set, will use the same database name as "src". |
|||
| table | name pattern of table. [n] will be replaced by an integer. | "small_table_[n]" | ||
| create_table_first | if set to 1, it will create the small table if not exists. default is 1. ==0 not supported yet==. |
0 | ||
| create_table_sql | specify a sql to create new table. If not set, will use 'show create table' to get a creation sql. ==can not specify yet==. |
"create table [table_name] if not exists..." | ||
| rule | defines how it do the work | |||
| filter | a sql 'where' clause telling it which rows should be moved | "id>1000" | ||
| page_size | page size when select in paging mode | 1000 | ||
| page_sleep | sleep a few seconds between pages to avoid mysql server busy working | 1 | ||
| order_by | select order in paging mode | "id asc" | ||
| group_method | split rows by [modulus] or [devide], or [all]. each row will map to an integer. | "modulus" | ||
| group_base | [modulus] or [devide] by which number | 100 | ||
| group_column | column name which the method will used on. | |||
| group_int | an int array. if set, this tool will move data whose method result fall in this array. for method [modulus] and [devide] only. pattern: [a,b,[from,to],c,d] |
[1,7,[12,15],20] | ||
| check | for "check" action only. how we check data integrity. | |||
| count | set to 1 to use mysql count() function to compare src and dest.. | 1 | ||
| sum | a column name list on which mysql sum() function is used. | ["id","created"] |
import json
import pymysql_split_tool
task_file = open('path/to/task.json')
task = json.loads(task_file.read())
task_file.close()
#init a split task with a specified task object whose type is dict
pymysql_split_tool.init("split", task)
#run the task
pymysql_split_tool.do_work()Check here for a sample with monkey patch.
For action split, there are 4 different work flows:
-
A: if
dest.mysqlis set,rule.group_intnot set, it works in the following steps:- make connection to "src" and "dest" mysql server.
- get data with this sql: select * from
src.database.src.tablewhererule.filterorder byrule.order_bylimit offset,rule.page_size - for each selected data, use
rule.group_methodon therule.group_column, get a method result:n - ensure table
dest.table_[n]existance ifdest.create_table_firstexists. - insert data into new table
-
B: if
dest.mysqlis set,rule.group_intis set, it works like this:- make connection to "src" and "dest" mysql server.
- generate an integer array according to
rule.group_int. - for each integer
n, ensure tabledest.table_[n]existance ifdest.create_table_firstexists. - for each integer
n, get data with this sql: select * fromsrc.database.src.tablewhererule.filterandrule.group_columnmethod =norder byrule.order_bylimit offset,rule.page_size - insert data into new table
-
C: if
dest.mysqlis not set,rule.group_intnot set, it works like this:- make connection to "src" mysql server
- get group list with this sql: select n from
src.database.src.tablegroup byrule.group_columnmethod result - for each group n, copy data with this sql: replace into
dest.database.dest.tableselect * fromsrc.database.src.tablewhererule.filterwhererule.group_columnmethod result =n
-
D: if
dest.mysqlis not set,rule.group_intis set, it works like this:- make connection to "src" mysql server
- copy data with this sql: replace into
dest.database.dest.tableselect * fromsrc.database.src.tablewhererule.filter
For action check, here is the work flow:
- make connection to mysql server.
- if
rule.group_intis not set, use this sql to get a group integer list: select n fromsrc.database.src.tablegroup byrule.group_columnmethod result - for each integer in group list, select count() and sum() in both src and dest tables. do the comparation.
For action remove, here is the work flow:
- make connection to mysql server.
- if
rule.group_intis not set, use this sql to get a group integer list. - for each integer in group list, delete records in src table.