Commit 3da2fc0

committed
the 10th chapter
1 parent aa55fe2 commit 3da2fc0

33 files changed

Lines changed: 14494 additions & 0 deletions

chapter10/.ipynb_checkpoints/1_1dataGuiyue-checkpoint.ipynb

Lines changed: 652 additions & 0 deletions
Large diffs are not rendered by default.

chapter10/.ipynb_checkpoints/2_1dataExchange_divideEvent-checkpoint.ipynb

Lines changed: 680 additions & 0 deletions
Large diffs are not rendered by default.

chapter10/.ipynb_checkpoints/2_2dataExchange_thresholdOptimization-checkpoint.ipynb

Lines changed: 440 additions & 0 deletions
Large diffs are not rendered by default.

chapter10/.ipynb_checkpoints/2_3dataExchange_attributeConstruction-checkpoint.ipynb

Lines changed: 2959 additions & 0 deletions
Large diffs are not rendered by default.

chapter10/.ipynb_checkpoints/3_1modelBuild-checkpoint.ipynb

Lines changed: 1111 additions & 0 deletions
Large diffs are not rendered by default.

chapter10/1TimeWaterDivide.xlsx

333 KB
Binary file not shown.

chapter10/1_1dataGuiyue.ipynb

Lines changed: 652 additions & 0 deletions
Large diffs are not rendered by default.

chapter10/1_1dataGuiyue.py

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
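# coding: utf-8

# In[1]:

# 1. Data extraction
# 2. Exploratory data analysis
#    Use a frequency histogram to study how regular the pauses between
#    water-use records are, in order to find the time-interval threshold
#    that separates one complete water-use event from the next.
#
# 3. Data preprocessing
#    (1) Data reduction: data_guiyue.py
#    Drop three attributes: "热水器编号" (heater ID), "有无水流" (water flow
#    present) and "节能模式" (energy-saving mode).
# Note:
#    The book also suggests dropping records where "开关机状态" (power state)
#    is "关" (off) and "水流量" (water flow) is 0, on the grounds that the
#    heater is not working. The later steps show that this filter must not
#    be applied: such records can be the pause inside a single water-use
#    event, and deleting them makes it impossible to compute the pause
#    statistics accurately.
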
# In[2]:

import pandas as pd
import numpy as np
from pandas import DataFrame
# Recent pandas versions reject the `encoding` keyword in read_excel,
# so it is omitted here; reading .xls files requires the xlrd package.
or_data = pd.read_excel('original_data.xls')
or_data.head()


# In[3]:

# Drop the irrelevant attributes by column position (columns 0, 5 and 9)
data = or_data.drop(or_data.columns[[0, 5, 9]], axis=1)
data.head()


# In[4]:

data.info()


# In[5]:

# Convert the timestamp column from YYYYMMDDHHMMSS to datetime values
data[u'发生时间'] = pd.to_datetime(data[u'发生时间'], format='%Y%m%d%H%M%S')
print(len(data))
# As the later steps show, the filter below is not useful, so it stays disabled
# data1 = data[(data[u'开关机状态']==u'开')|(data[u'水流量']!=0)]
# data1.head()
data.head(10)


# In[6]:

# Save the reduced data set
data.to_excel('data_guiyue.xlsx')

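The header comment of `1_1dataGuiyue.py` mentions using a frequency histogram of pause intervals to choose the threshold that splits water-use events. A minimal sketch of that idea follows. The timestamps, the bin edges and the 4-minute threshold are all assumptions made for illustration; none of them come from this commit.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps of records with non-zero water flow.
flow_times = pd.to_datetime(pd.Series([
    "2014-10-19 17:23:00",
    "2014-10-19 17:23:02",
    "2014-10-19 17:23:04",
    "2014-10-19 17:25:04",   # 2-minute pause, plausibly the same event
    "2014-10-19 17:25:06",
    "2014-10-19 18:10:06",   # 45-minute gap, plausibly a new event
]))

# Gap between consecutive flow records, in minutes.
# The first record has no predecessor, so its NaT gap is dropped.
gaps_min = flow_times.diff().dropna().dt.total_seconds() / 60.0

# Frequency histogram of the gaps over assumed bin edges (minutes).
bin_edges = [0, 1, 5, 60]
counts, _ = np.histogram(gaps_min, bins=bin_edges)

# Assumed candidate threshold: a gap longer than this starts a new event.
threshold_min = 4.0
n_events = int((gaps_min > threshold_min).sum()) + 1

print(counts.tolist(), n_events)
```

On real data, the shape of the histogram would guide the choice of `threshold_min`; the value used here only makes the toy example split into two events.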
