Commit 73e203b

committed
Add a comprehensive comparison of big-data reading methods
1 parent 330c3d5 commit 73e203b

1 file changed: +30 −10 lines changed

big_data.py

Lines changed: 30 additions & 10 deletions
@@ -23,37 +23,57 @@ def create_data():
 
 
 @tools.time_log.time_log
-def read_data():
+def read_data_line():
     """
-    Read big data
+    Read big data line by line
     """
     file_name = 'static/csv/data.csv'
     with open(file_name) as f:
         for line in f:
             print line.rstrip('\n').split('\t')
 
 
+@tools.time_log.time_log
+def read_data_all():
+    """
+    Read big data all at once
+    """
+    file_name = 'static/csv/data.csv'
+    with open(file_name) as f:
+        data_tmp = f.read()
+        # print data_tmp  # if we only read the data (no parsing), it takes about 2.42s
+        for i in data_tmp.split('\n'):
+            print i.split('\t')
+
+
 if __name__ == '__main__':
     # create_data()
-    read_data()
+    # read_data_line()
+    read_data_all()
 
 
 """
 Run results:
 
 Method create_data running time: 19.08s
+zhanghe@ubuntu:~/code/python$ du -h static/csv/data.csv
+11M     static/csv/data.csv
+zhanghe@ubuntu:~/code/python$ less static/csv/data.csv
 
-Method read_data running time: 65.21s
 
-State while reading the file
+Method read_data_line running time: 68.91s
+State while reading the file line by line
 zhanghe@ubuntu:~/code/python$ top
 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 10059 zhanghe   20   0   12656   4532   2632 R  95.2  0.2   0:13.74 python
 
+Method read_data_all running time: 47.39s
+State while reading the whole file at once
+zhanghe@ubuntu:~/code/python$ top
+PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
+10744 zhanghe   20   0   58808  50632   2628 R  97.2  2.5   0:43.25 python
 
-zhanghe@ubuntu:~/code/python$ du -h static/csv/data.csv
-11M     static/csv/data.csv
-
-zhanghe@ubuntu:~/code/python$ less static/csv/data.csv
-
+From these results we can see:
+Reading the file line by line uses far less memory than loading it all at once, so it suits big-data scenarios.
+Loading the whole file in one go suits data small enough that its memory cost is negligible, when speed matters most.
 """

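The closing note weighs line-by-line reading (low memory) against a single `read()` (faster here). A common middle ground, not part of this commit, is reading fixed-size chunks; this sketch assumes a hypothetical `read_data_chunked` helper with an arbitrary default chunk size:

```python
def read_data_chunked(file_name, chunk_size=1 << 20):
    # Read fixed-size chunks (default 1 MiB, an arbitrary choice): memory stays
    # bounded by chunk_size while making far fewer read() calls than line iteration.
    rows = []
    remainder = ''
    with open(file_name) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            lines = (remainder + chunk).split('\n')
            remainder = lines.pop()  # the last element may be a partial line
            rows.extend(line.split('\t') for line in lines)
    if remainder:
        rows.append(remainder.split('\t'))
    return rows
```

The `remainder` carry-over handles rows that straddle a chunk boundary, so results match the line-by-line version regardless of chunk size.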