<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>cgDeepLearn</title>
  
  <subtitle>More than code</subtitle>
  <link href="/atom.xml" rel="self"/>
  
  <link href="https://cgdeeplearn.github.io/"/>
  <updated>2021-10-10T08:06:15.695Z</updated>
  <id>https://cgdeeplearn.github.io/</id>
  
  <author>
    <name>cgDeepLearn</name>
    
  </author>
  
  <generator uri="http://hexo.io/">Hexo</generator>
  
  <entry>
    <title>scheduler</title>
    <link href="https://cgdeeplearn.github.io/2021/10/10/scheduler/"/>
    <id>https://cgdeeplearn.github.io/2021/10/10/scheduler/</id>
    <published>2021-10-10T07:45:09.000Z</published>
    <updated>2021-10-10T08:06:15.695Z</updated>
    
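    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">An extensible task scheduler framework<br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><p><div class="note info"><p><br>Projects often need background jobs that run at fixed times or at regular intervals, such as scheduled or periodic calculations and rule-based data checks that raise alerts. If these jobs need unified, flexible configuration management and the number of tasks is expected to grow large, could we build a single scheduling and management framework for them?</p><p>Below is an extensible task scheduling framework distilled from situations I have run into in practice…<br></p></div><br><a id="more"></a></p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>The project is on GitHub: <a href="https://github.com/cgDeepLearn/scheduler" target="_blank" rel="noopener">scheduler</a></p>]]></content>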
    
    <summary type="html">
    
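      &lt;p class=&quot;description&quot;&gt;An extensible task scheduler framework&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;p&gt;&lt;div class=&quot;note info&quot;&gt;&lt;p&gt;&lt;br&gt;Projects often need background jobs that run at fixed times or at regular intervals, such as scheduled or periodic calculations and rule-based data checks that raise alerts. If these jobs need unified, flexible configuration management and the number of tasks is expected to grow large, could we build a single scheduling and management framework for them?&lt;/p&gt;
&lt;p&gt;Below is an extensible task scheduling framework distilled from situations I have run into in practice…&lt;br&gt;&lt;/p&gt;&lt;/div&gt;&lt;br&gt;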
    
    </summary>
    
      <category term="Python" scheme="https://cgdeeplearn.github.io/categories/Python/"/>
    
    
      <category term="docker" scheme="https://cgdeeplearn.github.io/tags/docker/"/>
    
      <category term="redis" scheme="https://cgdeeplearn.github.io/tags/redis/"/>
    
      <category term="scheduler" scheme="https://cgdeeplearn.github.io/tags/scheduler/"/>
    
  </entry>
  
  <entry>
    <title>Generating Daily Report Charts and Sending Them by Email</title>
    <link href="https://cgdeeplearn.github.io/2020/12/06/Daily-Report/"/>
    <id>https://cgdeeplearn.github.io/2020/12/06/Daily-Report/</id>
    <published>2020-12-06T12:42:02.000Z</published>
    <updated>2020-12-07T13:00:49.129Z</updated>
    
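    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Generate daily report charts with <code>pandas</code> and send them by email<br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><p><div class="note info"><p></p><p>In our projects we may use a monitoring and alerting system such as <a href="https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/quickstart/why-monitor" target="_blank" rel="noopener">Prometheus</a> to understand how the system is actually running internally, and to do trend analysis, comparative analysis, alerting and fault localization, and data visualization.</p><p>It has many strengths, but it is rather heavyweight for a small project. If all we want in a small project is some simple data analysis and customizable report generation, is there a simple, reusable way to do it? There is! Below we combine <a href="https://www.pypandas.cn/docs/getting_started/10min.html" target="_blank" rel="noopener">pandas</a> with a few other basic Python libraries to build a small template that generates daily report charts and sends them out.</p><p></p></div><br><a id="more"></a></p><h2 id="准备"><a href="#准备" class="headerlink" title="准备"></a>Preparation</h2><h3 id="需求-示例"><a href="#需求-示例" class="headerlink" title="需求(示例)"></a>Requirements (example)</h3><ul><li>Every day many business lines query several data-service APIs provided by third parties. We want to know the basics for each data source over the last few days (total queries, successes and failures), as well as the detailed daily breakdown of how each business line queried each data source. (Every query to a data source is recorded, including its origin, status and so on.)</li><li>Analyze the data on a daily schedule and generate charts</li><li>Send an email (or send an alert email only when something is abnormal)</li></ul><p>A simplified view of the daily data:</p><ul><li>Data source A:</li></ul><table><thead><tr><th>Request ID</th><th>Product</th><th>Query status</th><th>Result data</th><th>Created at</th><th>Updated at</th></tr></thead><tbody><tr><td>10001</td><td>Product 1</td><td>Success</td><td>…</td><td>2020-11-09 12:00:01</td><td>2020-11-09 12:01:03</td></tr><tr><td>10002</td><td>Product 2</td><td>Success</td><td>…</td><td>2020-11-09 12:03:01</td><td>2020-11-09 12:04:03</td></tr><tr><td>10003</td><td>Product 2</td><td>Failure</td><td>…</td><td>2020-11-09 12:05:01</td><td>2020-11-09 12:06:12</td></tr><tr><td>10004</td><td>Product 3</td><td>Special failure</td><td>…</td><td>2020-11-09 12:06:01</td><td>2020-11-09 12:06:15</td></tr><tr><td>…</td><td>…</td><td>…</td><td>…</td><td>…</td><td>…</td></tr></tbody></table><ul><li>Data sources B, C and D are similar</li></ul><h3 id="开发环境准备"><a href="#开发环境准备" class="headerlink" title="开发环境准备"></a>Development environment</h3><ul><li>Python version: 2.7+/3.6+ (the example versions pinned below, such as pandas 1.1.4, require Python 3.6 or later)</li><li>Python packages (example):<figure class="highlight 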
yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="string">DBUtils==2.0</span></span><br><span class="line"><span class="string">matplotlib==3.3.3</span></span><br><span class="line"><span class="string">numpy==1.19.4</span></span><br><span class="line"><span class="string">pandas==1.1.4</span></span><br><span class="line"><span class="string">Pillow==8.0.1</span></span><br><span class="line"><span class="string">PyMySQL==0.10.1</span></span><br><span class="line"><span class="string">pyparsing==2.4.7</span></span><br><span class="line"><span class="string">python-dateutil==2.8.1</span></span><br><span class="line"><span class="string">pytz==2020.4</span></span><br><span class="line"><span class="string">six==1.15.0</span></span><br></pre></td></tr></table></figure></li></ul><h2 id="开发"><a href="#开发" class="headerlink" title="开发"></a>Development</h2><p>The project breaks down into roughly five modules: fetching data, cleaning data, analyzing data, generating charts, and sending email.</p><h3 id="获取数据"><a href="#获取数据" class="headerlink" title="获取数据"></a>Fetching the data</h3><h4 id="连接数据库"><a href="#连接数据库" class="headerlink" title="连接数据库"></a>Connecting to the database</h4><ul><li>Database engine</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span 
class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> pymysql</span><br><span class="line"><span class="keyword">from</span> dbutils.pooled_db <span class="keyword">import</span> PooledDB  <span class="comment"># DBUtils 2.0+ module path; 1.x used DBUtils.PooledDB</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> srf_log <span class="keyword">import</span> logger</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">MySQLEngine</span><span class="params">(object)</span>:</span></span><br><span class="line">    <span class="string">'''</span></span><br><span class="line"><span class="string">    mysql engine</span></span><br><span class="line"><span class="string">    '''</span></span><br><span class="line">    __tablename__ = <span class="keyword">None</span></span><br><span class="line">    placeholder = <span class="string">'%s'</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">connect</span><span class="params">(self, **kwargs)</span>:</span></span><br><span class="line">        <span class="string">'''</span></span><br><span class="line"><span class="string">        mincached: number of idle connections opened at startup (default 0 means no connections are created at startup)</span></span><br><span class="line"><span class="string">        maxcached: maximum number of idle connections kept in the pool (default 0 means the pool size is unlimited)</span></span><br><span class="line"><span class="string">        maxshared: maximum number of shared connections allowed (default 0 means all connections are dedicated); once the maximum is reached, connections requested as shareable are shared.</span></span><br><span class="line"><span 
class="string">        maxconnections: maximum number of connections allowed (default 0 means unlimited)</span></span><br><span class="line"><span class="string">        blocking: behavior when the maximum is reached (default 0 or False raises an error; any other value blocks until the number of connections drops)</span></span><br><span class="line"><span class="string">        maxusage: maximum number of times a single connection may be reused (default 0 or False means unlimited reuse); when the limit is reached the connection is reset automatically (closed and reopened)</span></span><br><span class="line"><span class="string">        '''</span></span><br><span class="line">        db_host = kwargs.get(<span class="string">'db_host'</span>, <span class="string">'localhost'</span>)</span><br><span class="line">        db_port = kwargs.get(<span class="string">'db_port'</span>, <span class="number">3306</span>)</span><br><span class="line">        db_user = kwargs.get(<span class="string">'db_user'</span>, <span class="string">'root'</span>)</span><br><span class="line">        db_pwd = kwargs.get(<span class="string">'db_pwd'</span>, <span class="string">''</span>)</span><br><span class="line">        db = kwargs.get(<span class="string">'db'</span>, <span class="string">''</span>)</span><br><span class="line"></span><br><span class="line">        self.pool = PooledDB(pymysql, maxconnections=<span class="number">5</span>, mincached=<span class="number">1</span>, maxcached=<span class="number">5</span>, blocking=<span class="keyword">True</span>, host=db_host,</span><br><span class="line">                             user=db_user, passwd=db_pwd, db=db, port=db_port, charset=<span class="string">'utf8'</span>)</span><br><span class="line"></span><br><span class="line">        logger.info(<span class="string">'''connect mysql db_host:%s db_port:%d db_user:%s </span></span><br><span class="line"><span class="string">            db:%s'''</span>, db_host, db_port, db_user, db)  <span class="comment"># the password is deliberately kept out of the log</span></span><br><span class="line"></span><br><span class="line"><span class="meta">    @staticmethod</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span 
class="title">escape</span><span class="params">(string)</span>:</span></span><br><span class="line">        <span class="keyword">pass</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_check_parameter</span><span class="params">(self, sql_query, values, req_id=None)</span>:</span></span><br><span class="line">        count = sql_query.count(<span class="string">'%s'</span>)</span><br><span class="line">        <span class="keyword">if</span> count &gt; <span class="number">0</span>:</span><br><span class="line">            <span class="keyword">for</span> elem <span class="keyword">in</span> values:</span><br><span class="line">                <span class="keyword">if</span> <span class="keyword">not</span> elem:</span><br><span class="line">                    <span class="keyword">if</span> req_id:</span><br><span class="line">                        logger.debug(<span class="string">'req_id:%s sql_query:%s values:%s check failed'</span>,</span><br><span class="line">                                     req_id, sql_query, values)</span><br><span class="line">                    <span class="keyword">return</span> <span class="keyword">False</span></span><br><span class="line">        <span class="keyword">return</span> <span class="keyword">True</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_execute</span><span class="params">(self, sql_query, values=[], req_id=None)</span>:</span></span><br><span class="line">        <span class="string">'''</span></span><br><span class="line"><span class="string">        Take a fresh connection from the pool on every call</span></span><br><span class="line"><span class="string">        '''</span></span><br><span class="line">        <span class="keyword">if</span> <span class="keyword">not</span> self._check_parameter(sql_query, values):</span><br><span class="line">    
        <span class="keyword">return</span></span><br><span class="line">        conn = self.pool.connection()</span><br><span class="line">        cur = conn.cursor()</span><br><span class="line">        cur.execute(sql_query, values)</span><br><span class="line">        conn.commit()</span><br><span class="line">        conn.close()</span><br><span class="line">        <span class="keyword">return</span> cur</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">select</span><span class="params">(self, sql_query, values=[], req_id=None)</span>:</span></span><br><span class="line">        sql_query = sql_query.replace(<span class="string">'\n'</span>, <span class="string">''</span>)</span><br><span class="line">        <span class="keyword">while</span> <span class="string">'  '</span> <span class="keyword">in</span> sql_query:</span><br><span class="line">            sql_query = sql_query.replace(<span class="string">'  '</span>, <span class="string">' '</span>)</span><br><span class="line">        <span class="keyword">if</span> <span class="keyword">not</span> self._check_parameter(sql_query, values, req_id):</span><br><span class="line">            <span class="keyword">return</span></span><br><span class="line">        cur = self._execute(sql_query, values, req_id)</span><br><span class="line">        <span class="keyword">for</span> row <span class="keyword">in</span> cur:</span><br><span class="line">            <span class="keyword">yield</span> row</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">execute</span><span class="params">(self, sql_query, values=[], req_id=None)</span>:</span></span><br><span class="line">        sql_query = sql_query.replace(<span class="string">'\n'</span>, <span class="string">''</span>)</span><br><span class="line">        <span 
class="keyword">while</span> <span class="string">'  '</span> <span class="keyword">in</span> sql_query:</span><br><span class="line">            sql_query = sql_query.replace(<span class="string">'  '</span>, <span class="string">' '</span>)</span><br><span class="line">        cur = self._execute(sql_query, values)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">sql_engine = MySQLEngine()</span><br></pre></td></tr></table></figure><ul><li>Database connections</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> config</span><br><span class="line"><span class="keyword">from</span> utils <span class="keyword">import</span> mysql_tools</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">DBInterface</span><span class="params">(object)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span 
class="title">__init__</span><span class="params">(self)</span>:</span></span><br><span class="line">        self.db1_check = mysql_tools.MySQLEngine()</span><br><span class="line">        self.db1_check.connect(db_host=config.db1_host,</span><br><span class="line">                               db_port=config.db1_port,</span><br><span class="line">                               db_user=config.db1_user,</span><br><span class="line">                               db_pwd=config.db1_pwd,</span><br><span class="line">                               db=config.db1_db)</span><br><span class="line"></span><br><span class="line">        self.db2_check = mysql_tools.MySQLEngine()</span><br><span class="line">        self.db2_check.connect(db_host=config.db2_host,</span><br><span class="line">                               db_port=config.db2_port,</span><br><span class="line">                               db_user=config.db2_user,</span><br><span class="line">                               db_pwd=config.db2_pwd,</span><br><span class="line">                               db=config.db2_db)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">db_interface = DBInterface()</span><br></pre></td></tr></table></figure><h4 id="获取近几日数据概况"><a href="#获取近几日数据概况" class="headerlink" title="获取近几日数据概况"></a>Fetching a summary of the last few days</h4><pre><code><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span 
class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">_get_range_data</span><span class="params">(self, begin_date, end_date)</span>:</span></span><br><span class="line">    <span class="string">"""</span></span><br><span class="line"><span class="string">    Fetch the data between the start and end dates</span></span><br><span class="line"><span class="string">    """</span></span><br><span class="line">    STATUS_SUCCESS = <span class="number">1</span>  <span class="comment"># query succeeded</span></span><br><span class="line">    STATUS_CreditA_FAIL = <span class="number">2</span>  <span class="comment"># 
query to data source A failed</span></span><br><span class="line">    STATUS_QUERYING = <span class="number">3</span>  <span class="comment"># query in progress</span></span><br><span class="line">    STATUS_CreditB_FAIL = <span class="number">4</span>  <span class="comment"># query to data source B failed</span></span><br><span class="line">    STATUS_CreditC_FAIL = <span class="number">5</span>  <span class="comment"># query to data source C failed</span></span><br><span class="line">    STATUS_CreditD_FAIL = <span class="number">6</span>  <span class="comment"># query to data source D failed</span></span><br><span class="line">    total_cnt = <span class="number">0</span>  <span class="comment"># total count</span></span><br><span class="line">    success_cnt = <span class="number">0</span>  <span class="comment"># success count</span></span><br><span class="line">    querying_cnt = <span class="number">0</span>  <span class="comment"># in-progress count</span></span><br><span class="line">    fail_cnt = <span class="number">0</span>  <span class="comment"># failure count</span></span><br><span class="line">    special_fail_cnt = <span class="number">0</span>  <span class="comment"># special-failure count</span></span><br><span class="line">    begin_date_str = begin_date.strftime(<span class="string">'%Y-%m-%d'</span>)  <span class="comment"># works for both date and datetime objects</span></span><br><span class="line">    sql = <span class="string">'''</span></span><br><span class="line"><span class="string">    SELECT status, count(1) FROM &#123;&#125;</span></span><br><span class="line"><span class="string">    where source = 1 and method regexp 'rule'  and created_at &gt;= %s and created_at &lt; %s</span></span><br><span class="line"><span class="string">    group by status</span></span><br><span class="line"><span class="string">    '''</span>.format(self.credit_table)  <span class="comment"># count rows per query status</span></span><br><span class="line">    values = [begin_date, end_date]</span><br><span class="line">    <span class="keyword">for</span> row <span class="keyword">in</span> db_interface.db1_check.select(sql, values):</span><br><span class="line">        total_cnt += 
int(row[<span class="number">1</span>]) <span class="comment"># running total</span></span><br><span class="line">        <span class="keyword">if</span> int(row[<span class="number">0</span>]) == STATUS_SUCCESS:</span><br><span class="line">            success_cnt += int(row[<span class="number">1</span>]) <span class="comment"># successes</span></span><br><span class="line">        <span class="keyword">elif</span> int(row[<span class="number">0</span>]) == STATUS_QUERYING:</span><br><span class="line">            querying_cnt += int(row[<span class="number">1</span>])  <span class="comment"># still in progress</span></span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            <span class="comment"># every other status counts as a failure</span></span><br><span class="line">            fail_cnt += int(row[<span class="number">1</span>])</span><br><span class="line">            <span class="comment"># break failures down by data source</span></span><br><span class="line">            <span class="keyword">if</span> self.credit_type == <span class="string">'CreditA'</span> <span class="keyword">and</span> int(row[<span class="number">0</span>]) == STATUS_CreditA_FAIL:</span><br><span class="line">                <span class="comment"># data source A failure</span></span><br><span class="line">                special_fail_cnt += int(row[<span class="number">1</span>])</span><br><span class="line">            <span class="keyword">elif</span> self.credit_type == <span class="string">'CreditB'</span> <span class="keyword">and</span> int(row[<span class="number">0</span>]) == STATUS_CreditB_FAIL:</span><br><span class="line">                <span class="comment"># data source B failure</span></span><br><span class="line">                special_fail_cnt += int(row[<span class="number">1</span>])</span><br><span class="line">            <span class="keyword">elif</span> self.credit_type == <span class="string">'CreditC'</span> <span class="keyword">and</span> int(row[<span class="number">0</span>]) == 
STATUS_CreditC_FAIL:</span><br><span class="line">                <span class="comment"># data source C failure</span></span><br><span class="line">                special_fail_cnt += int(row[<span class="number">1</span>])</span><br><span class="line">            <span class="keyword">elif</span> self.credit_type == <span class="string">'CreditD'</span> <span class="keyword">and</span> int(row[<span class="number">0</span>]) == STATUS_CreditD_FAIL:</span><br><span class="line">                <span class="comment"># data source D failure</span></span><br><span class="line">                special_fail_cnt += int(row[<span class="number">1</span>])</span><br><span class="line">    data = [begin_date_str, self.credit_type, total_cnt, success_cnt,</span><br><span class="line">            querying_cnt, fail_cnt, special_fail_cnt] <span class="comment"># one row of the daily time series</span></span><br><span class="line">    <span class="keyword">return</span> data</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">get_summary_data</span><span class="params">(self)</span>:</span></span><br><span class="line">    <span class="string">"""Fetch the data for the last few days"""</span></span><br><span class="line">    sum_datas = list()</span><br><span class="line">    today = datetime.date.today()</span><br><span class="line">    <span class="comment"># list of (start, end) date pairs, one per day</span></span><br><span class="line">    days_range = range(<span class="number">1</span>, self.report_days + <span class="number">1</span>, <span class="number">1</span>)  <span class="comment"># self.report_days is the number of days the report covers; configure it as needed</span></span><br><span class="line">    date_pairs = [(today - datetime.timedelta(days=i), today - datetime.timedelta(days=i - <span class="number">1</span>)) <span class="keyword">for</span> i <span class="keyword">in</span> days_range]</span><br><span class="line">    <span class="keyword">for</span> date_pair <span class="keyword">in</span> date_pairs:</span><br><span 
class="line">        data = self._get_range_data(date_pair[<span class="number">0</span>], date_pair[<span class="number">1</span>])</span><br><span class="line">        sum_datas.append(data)</span><br><span class="line">    <span class="keyword">return</span> sum_datas</span><br></pre></td></tr></table></figure></code></pre><h4 id="获取昨日详细数据"><a href="#获取昨日详细数据" class="headerlink" title="获取昨日详细数据"></a>Fetching yesterday's detailed data</h4><pre><code><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">get_yestoday_data</span><span class="params">(self, include_today=False)</span>:</span></span><br><span class="line">    <span class="string">"""Fetch the previous day's data.</span></span><br><span class="line"><span class="string">    When include_today is True, fetch today's data up to the current time; by default fetch yesterday's data."""</span></span><br><span class="line">    datas = list()</span><br><span class="line">    end_date = datetime.date.today()</span><br><span class="line">    begin_date = end_date - datetime.timedelta(days=<span class="number">1</span>)</span><br><span class="line">    <span class="keyword">if</span> include_today:</span><br><span class="line">        end_date += datetime.timedelta(days=<span class="number">1</span>)</span><br><span class="line">        begin_date += datetime.timedelta(days=<span class="number">1</span>)</span><br><span 
class="line">    begin_date_str = begin_date.strftime(<span class="string">'%Y-%m-%d'</span>)  <span class="comment"># format after the shift so include_today takes effect</span></span><br><span class="line">    end_date_str = end_date.strftime(<span class="string">'%Y-%m-%d'</span>)</span><br><span class="line">    sql = <span class="string">""" SELECT &#123;&#125; FROM &#123;&#125;</span></span><br><span class="line"><span class="string">    WHERE source = 1 AND created_at &gt;= %s AND created_at &lt; %s</span></span><br><span class="line"><span class="string">    """</span>.format(self.query_fileds, self.credit_table)  <span class="comment"># the selected columns and the table name are parameters, so both are configurable</span></span><br><span class="line">    values = [begin_date_str, end_date_str]</span><br><span class="line">    <span class="keyword">for</span> row <span class="keyword">in</span> db_interface.db1_check.select(sql, values):</span><br><span class="line">        record = [int(r) <span class="keyword">if</span> index &lt;= <span class="number">1</span> <span class="keyword">else</span> r <span class="keyword">for</span> index, r <span class="keyword">in</span> enumerate(row)]  <span class="comment"># convert apply_id/uniq_id and status to int</span></span><br><span class="line">        datas.append(record)</span><br><span class="line">    <span class="keyword">return</span> datas</span><br></pre></td></tr></table></figure></code></pre><h3 id="清洗转换数据"><a href="#清洗转换数据" class="headerlink" title="清洗转换数据"></a>Cleaning and transforming the data</h3><p>Use <code>pandas</code> to clean the fetched data and convert it into a <code>DataFrame</code>, then reshape the axes and aggregate as the requirements dictate.</p><ul><li>Processing a single data source</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">credit_name_map = OrderedDict()  <span class="comment"># requires: from collections import OrderedDict</span></span><br><span class="line">credit_name_map[<span class="string">'CreditA'</span>] = <span class="string">'A数据源'</span></span><br><span class="line">credit_name_map[<span class="string">'CreditB'</span>] = <span class="string">'B数据源'</span></span><br><span class="line">credit_name_map[<span class="string">'CreditC'</span>] = <span class="string">'C数据源'</span></span><br><span class="line">credit_name_map[<span class="string">'CreditD'</span>] = <span class="string">'D数据源'</span></span><br><span class="line">credit_special_fail_map = &#123;</span><br><span class="line">    <span class="string">'CreditA'</span>: <span class="string">'A数据特殊失败'</span>,</span><br><span class="line">    <span class="string">'CreditB'</span>: <span class="string">'B数据特殊失败'</span>,</span><br><span class="line">    <span class="string">'CreditC'</span>: <span class="string">'C数据特殊失败'</span>,</span><br><span class="line">    <span class="string">'CreditD'</span>: <span class="string">'D数据特殊失败'</span>,</span><br><span class="line">&#125;</span><br><span class="line">all_sum_datas = list()</span><br><span class="line"><span class="keyword">for</span> credit_type, credit_name <span class="keyword">in</span> credit_name_map.items():</span><br><span class="line">    sum_data = get_data(credit_type=credit_type, summary=<span class="keyword">True</span>, report_days=<span class="number">7</span>)</span><br><span class="line">    all_sum_datas.extend(sum_data)  <span 
class="comment"># add to the combined summary data</span></span><br><span class="line">    <span class="comment"># the next four lines prepare and add the table</span></span><br><span class="line">    table_name = <span class="string">'&#123;&#125; 近7日查询概要'</span>.format(credit_name)</span><br><span class="line">    header = [<span class="string">''</span>, <span class="string">'数据源'</span>, <span class="string">'日查询总量'</span>, <span class="string">'查询成功量'</span>, <span class="string">'在查询中量'</span>, <span class="string">'查询失败总量'</span>, <span class="string">'特殊失败量'</span>]</span><br><span class="line">    header[<span class="number">-1</span>] = credit_special_fail_map.get(credit_type, <span class="string">'特殊失败量'</span>)</span><br><span class="line">    self.html.add_table(table_name, header, sum_data)</span><br><span class="line">    <span class="comment"># per-source analysis</span></span><br><span class="line">    <span class="comment"># convert to a DataFrame</span></span><br><span class="line">    pd_data = pd.DataFrame(</span><br><span class="line">        np.array(sum_data),</span><br><span class="line">        columns=[<span class="string">'date'</span>, <span class="string">'credit_type'</span>, <span class="string">'total'</span>, <span class="string">'success'</span>, <span class="string">'querying'</span>, <span class="string">'fail'</span>, <span class="string">'special_fail'</span>])</span><br><span class="line">    <span class="comment"># use the date as the row index, sorted in ascending order</span></span><br><span class="line">    pd_data = pd_data.set_index([<span class="string">'date'</span>]).drop(labels=[<span class="string">'credit_type'</span>], axis=<span class="number">1</span>).sort_index(ascending=<span class="keyword">True</span>)</span><br><span class="line">    pd_data = pd_data.apply(pd.to_numeric)  <span class="comment"># convert to numeric types</span></span><br></pre></td></tr></table></figure><ul><li>Processing the combined data sources</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># build a DataFrame from the columns to analyse</span></span><br><span class="line">all_sum_datas_pd = pd.DataFrame(np.array(all_sum_datas), columns=[<span class="string">'date'</span>, <span class="string">'credit_type'</span>, <span class="string">'total'</span>, <span class="string">'success'</span>, <span class="string">'querying'</span>, <span class="string">'fail'</span>, <span class="string">'special_fail'</span>])</span><br><span class="line"><span class="comment"># use date and data source as the row index</span></span><br><span class="line">all_sum_count_pd = all_sum_datas_pd.set_index([<span class="string">'date'</span>, <span class="string">'credit_type'</span>]).apply(pd.to_numeric)  <span class="comment"># re-index</span></span><br><span class="line">all_sum_count_goupby_date = all_sum_count_pd.groupby(<span class="string">'date'</span>).sum()  <span class="comment"># aggregate totals by date</span></span><br></pre></td></tr></table></figure><h3 id="分析数据生成图表"><a href="#分析数据生成图表" class="headerlink" title="分析数据生成图表"></a>Analysing the data and generating charts</h3><h4 id="生成图片"><a href="#生成图片" class="headerlink" title="生成图片"></a>Generating images</h4><ul><li>Use <code>matplotlib</code> to render the processed data as bar charts, line charts, and so on.</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span 
class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"><span class="comment"># @File : gen_img.py </span></span><br><span class="line"><span class="comment"># @Author : cgDeepLearn</span></span><br><span class="line"><span class="comment"># @Create Date : 2020/11/4-11:18 上午</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> matplotlib <span class="keyword">as</span> mpl</span><br><span class="line"><span class="comment"># mpl.rcParams['font.sans-serif'] = ['SimHei']</span></span><br><span class="line"><span class="comment"># mpl.rcParams['font.serif'] = ['SimHei']</span></span><br><span class="line">mpl.rcParams[<span class="string">'axes.unicode_minus'</span>] = <span class="keyword">False</span></span><br><span class="line">mpl.use(<span class="string">'Agg'</span>)</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">import</span> matplotlib.pyplot <span class="keyword">as</span> plt</span><br><span class="line"><span class="keyword">import</span> numpy <span 
class="keyword">as</span> np</span><br><span class="line"><span class="keyword">import</span> pandas <span class="keyword">as</span> pd</span><br><span class="line"></span><br><span class="line">CUR_PATH = os.path.dirname(os.path.abspath(__file__))</span><br><span class="line">IMG_PATH = os.path.join(os.path.dirname(CUR_PATH), <span class="string">"files"</span>)  <span class="comment"># 图片保存位置</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">GenIMG</span><span class="params">(object)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, img_name, pd_data)</span>:</span></span><br><span class="line">        self.img_name = img_name  <span class="comment"># 图片名</span></span><br><span class="line">        self.pd_data = pd_data  <span class="comment"># pandas数据</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">process</span><span class="params">(self, sum=False)</span>:</span></span><br><span class="line">        kind = <span class="string">'line'</span> <span class="comment"># 折线图</span></span><br><span class="line">        title = <span class="string">'query &#123;&#125; info'</span>.format(self.img_name)</span><br><span class="line">        <span class="keyword">if</span> sum:</span><br><span class="line">            <span class="comment"># 汇总的使用柱状图</span></span><br><span class="line">            kind = <span class="string">'bar'</span> <span class="comment"># 柱状图</span></span><br><span class="line">            title = <span class="string">'all credit summary info'</span></span><br><span class="line">        axes_subplot = self.pd_data.plot(kind=kind)</span><br><span class="line">        plt.title(title) <span class="comment"># 
title</span></span><br><span class="line">        plt.xlabel(<span class="string">"date"</span>) <span class="comment"># x-axis label</span></span><br><span class="line">        plt.ylabel(<span class="string">"num"</span>) <span class="comment"># y-axis label</span></span><br><span class="line">        plt.legend(loc=<span class="string">"best"</span>) <span class="comment"># legend</span></span><br><span class="line">        plt.grid(<span class="keyword">True</span>) <span class="comment"># grid</span></span><br><span class="line">        full_path_filename = os.path.join(IMG_PATH, <span class="string">'&#123;&#125;.png'</span>.format(self.img_name)) <span class="comment"># output path of the image</span></span><br><span class="line">        plt.savefig(full_path_filename) <span class="comment"># render and save</span></span><br><span class="line">        <span class="keyword">return</span> full_path_filename</span><br></pre></td></tr></table></figure><h4 id="近几日汇总数据"><a href="#近几日汇总数据" class="headerlink" title="近几日汇总数据"></a>Summary data for recent days</h4><ul><li>Generating the query summary for recent days</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># combined summary data</span></span><br><span class="line">all_sum_datas_pd = pd.DataFrame(np.array(all_sum_datas), columns=[<span class="string">'date'</span>, <span class="string">'credit_type'</span>, <span class="string">'total'</span>, <span class="string">'success'</span>, <span class="string">'querying'</span>, <span class="string">'fail'</span>, <span class="string">'special_fail'</span>])</span><br><span 
class="line">all_sum_count_pd = all_sum_datas_pd.set_index([<span class="string">'date'</span>, <span class="string">'credit_type'</span>]).apply(pd.to_numeric)  <span class="comment"># re-index</span></span><br><span class="line">all_sum_count_goupby_date = all_sum_count_pd.groupby(<span class="string">'date'</span>).sum()  <span class="comment"># aggregate totals by date</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># generate the chart (a method of the report class)</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">df_to_img</span><span class="params">(self, img_name, df, summary=False)</span>:</span></span><br><span class="line">    gen_image = GenIMG(img_name=img_name, pd_data=df)</span><br><span class="line">    img_file = gen_image.process(sum=summary)</span><br><span class="line">    self.images.append(img_file)</span><br><span class="line">    self.html.add_img(img_name)</span><br><span class="line"></span><br><span class="line">self.df_to_img(img_name=<span class="string">'sum'</span>, df=all_sum_count_goupby_date, summary=<span class="keyword">True</span>)  <span class="comment"># generate the image</span></span><br></pre></td></tr></table></figure><ul><li>An example of the generated chart:</li></ul><p><img src="https://github.com/cgDeepLearn/dailyreport/blob/develop/files/sum.png" alt="Query overview for recent days"></p><h4 id="前一日数据"><a href="#前一日数据" class="headerlink" title="前一日数据"></a>Previous day's data</h4><ul><li>Analysing the previous day's data</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span 
class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br></pre></td><td class="code"><pre><span class="line">credit_name_map = OrderedDict()</span><br><span class="line">credit_name_map[<span class="string">'CreditA'</span>] = <span class="string">'A数据源'</span></span><br><span class="line">credit_name_map[<span class="string">'CreditB'</span>] = <span class="string">'B数据源'</span></span><br><span class="line">credit_name_map[<span class="string">'CreditC'</span>] = <span class="string">'C数据源'</span></span><br><span class="line">credit_name_map[<span class="string">'CreditD'</span>] = <span class="string">'D数据源'</span></span><br><span class="line">credit_special_fail_map = &#123;</span><br><span class="line">    <span class="string">'CreditA'</span>: <span class="string">'A数据特殊失败'</span>,</span><br><span class="line">    <span class="string">'CreditB'</span>: <span class="string">'B数据特殊失败'</span>,</span><br><span class="line">    <span class="string">'CreditC'</span>: <span class="string">'C数据特殊失败'</span>,</span><br><span class="line">    <span 
class="string">'CreditD'</span>: <span class="string">'D数据特殊失败'</span>,</span><br><span class="line">&#125;</span><br><span class="line">all_yestoday_data = list()</span><br><span class="line"><span class="keyword">for</span> credit_type, credit_name <span class="keyword">in</span> credit_name_map.items():</span><br><span class="line">    credit_data = get_data(credit_type=credit_type, include_today=include_today) <span class="comment"># fetch the previous day's data</span></span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> credit_data:</span><br><span class="line">        <span class="keyword">continue</span> <span class="comment"># skip this data source if it returned no data</span></span><br><span class="line">    all_yestoday_data.extend(credit_data) <span class="comment"># add to yesterday's combined data</span></span><br><span class="line">    <span class="comment"># build the DataFrame</span></span><br><span class="line">    pd_data = pd.DataFrame(np.array(credit_data),</span><br><span class="line">                            columns=[<span class="string">'apply_id'</span>, <span class="string">'status'</span>, <span class="string">'method'</span>, <span class="string">'cache_key'</span>, <span class="string">'product_id'</span>])</span><br><span class="line">    pd_data[<span class="string">'status'</span>] = pd_data[<span class="string">'status'</span>].apply(int) <span class="comment"># cast status to int</span></span><br><span class="line">    rule_df = pd_data[pd_data[<span class="string">'method'</span>].str.contains(<span class="string">'rule'</span>)].copy()  <span class="comment"># rows whose method contains 'rule'; copy so the column assignment below is safe</span></span><br><span class="line">    rule_df[<span class="string">'product_name'</span>] = rule_df.apply(cachekey_to_product, axis=<span class="number">1</span>)  <span class="comment"># map each row to a product name</span></span><br><span class="line">    count_df = rule_df[[<span class="string">'product_name'</span>, <span class="string">'status'</span>]]  <span class="comment"># analyse only product and query status</span></span><br><span class="line">    
groupby_product = count_df.groupby([<span class="string">'product_name'</span>, <span class="string">'status'</span>]).size()  <span class="comment"># 根据产品和状态汇总统计</span></span><br><span class="line">    product_status_count = groupby_product.unstack(level=<span class="number">1</span>, fill_value=<span class="number">0</span>)  <span class="comment"># status unstack到列</span></span><br><span class="line">    col_name = sorted(product_status_count.columns.tolist()) <span class="comment"># 列名</span></span><br><span class="line">    col_name_str = [status_map[col] <span class="keyword">for</span> col <span class="keyword">in</span> col_name] <span class="comment"># 列名字符串</span></span><br><span class="line"></span><br><span class="line">    product_status_count[<span class="string">'query_total'</span>] = product_status_count.sum(axis=<span class="number">1</span>) <span class="comment"># 新生成查询总量的列</span></span><br><span class="line">    <span class="comment"># 添加请求列</span></span><br><span class="line">    col_name.insert(<span class="number">0</span>, <span class="string">'query_total'</span>) </span><br><span class="line">    col_name_str.insert(<span class="number">0</span>, <span class="string">'请求'</span>)</span><br><span class="line"></span><br><span class="line">    sum_data = [product_status_count.sum(axis=<span class="number">0</span>)[col] <span class="keyword">for</span> col <span class="keyword">in</span> col_name]</span><br><span class="line">    <span class="comment"># 添加合计行</span></span><br><span class="line">    sum_info = [<span class="string">'合计'</span>] + sum_data</span><br><span class="line"></span><br><span class="line">    product_status_count[<span class="string">'product_name'</span>] = product_status_count.index</span><br><span class="line">    <span class="comment"># 添加产品列</span></span><br><span class="line">    col_name.insert(<span class="number">0</span>, <span class="string">'product_name'</span>)</span><br><span class="line">    
col_name_str.insert(<span class="number">0</span>, <span class="string">'产品'</span>)</span><br><span class="line"></span><br><span class="line">    table_data_df = product_status_count.reindex(columns=col_name) <span class="comment"># after adding the columns, reindex by the new column order</span></span><br><span class="line">    table_datas = table_data_df.values.tolist() <span class="comment"># table rows as a list of lists</span></span><br></pre></td></tr></table></figure><h4 id="生成表格"><a href="#生成表格" class="headerlink" title="生成表格"></a>Generating tables</h4><p>Wrap the row and column data in the corresponding HTML so that it can be displayed in the email.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span 
class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/python</span></span><br><span class="line"><span class="comment"># coding:utf8</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">GetHtml</span><span class="params">(object)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self)</span>:</span></span><br><span class="line">        self._html_head = <span class="string">"&lt;html&gt;&lt;body&gt;"</span></span><br><span class="line">        self._format_html_foot = <span class="string">"""&lt;p style="font-family: verdana,arial,sans-serif;font-size:10px;font-weight:lighter;"&gt;%s&lt;/p&gt;"""</span></span><br><span class="line">        self._format_html_head = <span class="string">"""&lt;p style="font-family: 
verdana,arial,sans-serif;font-size:12px;font-weight:bold;"&gt;%s&lt;/p&gt;"""</span></span><br><span class="line">        self._format_html_img = <span class="string">"""&lt;br&gt;&lt;img src="cid:%s" alt="" width="1200" height="600"&gt;&lt;br&gt;"""</span></span><br><span class="line">        self._html_tail = <span class="string">"&lt;/body&gt;&lt;/html&gt;"</span></span><br><span class="line">        self._html_p_head = <span class="string">"""&lt;p style="font-family: verdana,arial,sans-serif;font-size:12px;font-weight:bold;"&gt;%s&lt;/p&gt;"""</span></span><br><span class="line"></span><br><span class="line">        self._table_head = <span class="string">"""&lt;table style="font-family: verdana,arial,sans-serif;font-size:11px;color:#333333;border-width: 1px;border-color: #666666;border-collapse: collapse;" border="1"&gt;"""</span></span><br><span class="line">        self._format_table_th = <span class="string">"""&lt;th style="border-width: 1px;padding: 8px;border-style: solid;border-color: #666666;background-color: #dedede;" nowrap&gt;%s&lt;/th&gt;"""</span></span><br><span class="line"></span><br><span class="line">        self._format_table_td = <span class="string">"""&lt;td style="border-width: 1px;padding: 8px;text-align: right;border-style: solid;border-color: #666666;background-color: #ffffff;" align="right" nowrap&gt;%s&lt;/td&gt;"""</span></span><br><span class="line">        self._table_tail = <span class="string">"&lt;/table&gt;"</span></span><br><span class="line">        self._content = <span class="string">""</span></span><br><span class="line"></span><br><span class="line">        self._table_html = []</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">add_table</span><span class="params">(self, table_title, th_info, td_info_list)</span>:</span></span><br><span class="line">        <span class="string">"""Add a table: a title, a header row and the data rows."""</span></span><br><span 
class="line">        table_str = <span class="string">""</span></span><br><span class="line">        table_p_head = self._html_p_head % (str(table_title))</span><br><span class="line">        table_str = table_p_head + self._table_head</span><br><span class="line">        <span class="comment"># th</span></span><br><span class="line">        table_str += <span class="string">"&lt;tr&gt;"</span></span><br><span class="line">        <span class="keyword">for</span> th <span class="keyword">in</span> th_info:</span><br><span class="line">            temp_str = self._format_table_th % (str(th))</span><br><span class="line">            table_str += temp_str</span><br><span class="line">        table_str += <span class="string">"&lt;/tr&gt;"</span></span><br><span class="line">        <span class="comment"># td</span></span><br><span class="line">        <span class="keyword">for</span> td_info <span class="keyword">in</span> td_info_list:</span><br><span class="line">            table_str += <span class="string">"&lt;tr&gt;"</span></span><br><span class="line">            <span class="keyword">for</span> td <span class="keyword">in</span> td_info:</span><br><span class="line">                temp_str = self._format_table_td % (str(td))</span><br><span class="line">                table_str += temp_str</span><br><span class="line">            table_str += <span class="string">"&lt;/tr&gt;"</span></span><br><span class="line">        <span class="comment">#</span></span><br><span class="line">        table_str += self._table_tail</span><br><span class="line">        self._table_html.append(table_str)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">add_head</span><span class="params">(self, head)</span>:</span></span><br><span class="line">        <span class="string">"""添加表头"""</span></span><br><span class="line">        head_str = self._format_html_head % 
(str(head))</span><br><span class="line">        self._table_html.append(head_str)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">add_foot</span><span class="params">(self, foot)</span>:</span></span><br><span class="line">        <span class="string">"""添加表格注脚"""</span></span><br><span class="line">        foot_str = self._format_html_foot % (str(foot))</span><br><span class="line">        self._table_html.append(foot_str)</span><br><span class="line">    </span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">add_img</span><span class="params">(self, img)</span>:</span></span><br><span class="line">        <span class="string">"""添加图片"""</span></span><br><span class="line">        img_str = self._format_html_img % (str(img))</span><br><span class="line">        self._table_html.append(img_str)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">output_html</span><span class="params">(self)</span>:</span></span><br><span class="line">        <span class="string">"""输出html"""</span></span><br><span class="line">        html_content = self._html_head</span><br><span class="line">        <span class="keyword">for</span> s <span class="keyword">in</span> self._table_html:</span><br><span class="line">            html_content += s</span><br><span class="line">        html_content += self._html_tail</span><br><span class="line">        <span class="keyword">return</span> html_content</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">"__main__"</span>:</span><br><span class="line">    gh = GetHtml()</span><br><span class="line">    p_title = <span class="string">"test"</span></span><br><span class="line">    th = [<span 
class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>]</span><br><span class="line">    td = [[<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>], [<span class="number">4</span>, <span class="number">5</span>, <span class="number">5</span>, <span class="number">56</span>], [<span class="number">3</span>, <span class="number">3</span>, <span class="number">3</span>, <span class="number">3</span>]]</span><br><span class="line">    gh.add_table(p_title, th, td)</span><br><span class="line">    cont = gh.output_html()</span><br></pre></td></tr></table></figure><ul><li>Generating the table of yesterday's detailed request data</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">    table_datas = table_data_df.values.tolist()</span><br><span class="line">    table_datas.append(sum_info)</span><br><span class="line">    table_name = <span class="string">'&#123;&#125; 查征明细'</span>.format(credit_name)</span><br><span class="line">    header = col_name_str</span><br><span class="line">    self.html.add_table(table_name, header, table_datas)</span><br><span class="line">self.html.output_html()  <span class="comment"># after the loop: render the complete HTML</span></span><br></pre></td></tr></table></figure><p><img src="https://github.com/cgDeepLearn/dailyreport/blob/develop/files/yestoday_detail.png" alt="Yesterday's details"></p><h3 id="发送邮件"><a href="#发送邮件" class="headerlink" title="发送邮件"></a>Sending the email</h3><p>Use <code>smtplib</code> and <code>email</code>, add the corresponding configuration, and send the email.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span 
class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/python</span></span><br><span class="line"><span class="comment"># coding:utf-8</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> smtplib</span><br><span class="line"><span class="keyword">from</span> email.mime.text <span class="keyword">import</span> MIMEText</span><br><span class="line"><span class="keyword">from</span> email.mime.multipart <span class="keyword">import</span> MIMEMultipart</span><br><span class="line"><span class="keyword">from</span> email.mime.image <span class="keyword">import</span> MIMEImage</span><br><span class="line"><span class="keyword">from</span> config <span class="keyword">import</span> email_configs  <span class="comment"># 邮件配置</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span 
class="title">Email</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, to_list, sub, content)</span>:</span></span><br><span class="line">        <span class="string">'''</span></span><br><span class="line"><span class="string">        to_list: list of recipients</span></span><br><span class="line"><span class="string">        sub: subject</span></span><br><span class="line"><span class="string">        content: body</span></span><br><span class="line"><span class="string">        send_mail(["aaa@126.com"], "sub", "content")</span></span><br><span class="line"><span class="string">        '''</span></span><br><span class="line">        <span class="comment">#####################</span></span><br><span class="line">        <span class="comment"># Set the server, user name, password, and mailbox suffix</span></span><br><span class="line">        self.to_list = to_list</span><br><span class="line">        self.mail_host = <span class="string">'smtp.exmail.qq.com'</span>  <span class="comment"># mail host</span></span><br><span class="line">        self.mail_user = <span class="string">'report'</span>  <span class="comment"># sending user</span></span><br><span class="line">        self.mail_postfix = <span class="string">'xxx.com'</span> <span class="comment"># mailbox suffix; replace with your own</span></span><br><span class="line"></span><br><span class="line">        self.me = self.mail_user + <span class="string">"&lt;"</span> + self.mail_user + <span class="string">"@"</span> + self.mail_postfix + <span class="string">"&gt;"</span></span><br><span class="line">        <span class="comment"># Try to decode the body and subject to unicode as utf8, then as GBK</span></span><br><span class="line">        <span class="keyword">try</span>:</span><br><span class="line">            content = unicode(content, <span class="string">'utf8'</span>)</span><br><span class="line">            sub = unicode(sub, <span class="string">'utf8'</span>)</span><br><span class="line">        <span 
class="keyword">except</span> UnicodeDecodeError:</span><br><span class="line">            <span class="keyword">try</span>:</span><br><span class="line">                content = unicode(content, <span class="string">'gbk'</span>)</span><br><span class="line">                sub = unicode(sub, <span class="string">'gbk'</span>)</span><br><span class="line">            <span class="keyword">except</span> UnicodeDecodeError:</span><br><span class="line">                <span class="comment"># print format_exc()</span></span><br><span class="line">                <span class="keyword">raise</span>  <span class="comment"># __init__ must return None, so re-raise instead of returning False</span></span><br><span class="line">        <span class="comment"># already unicode</span></span><br><span class="line">        <span class="keyword">except</span> TypeError:</span><br><span class="line">            <span class="keyword">pass</span></span><br><span class="line"></span><br><span class="line">        self.msg = MIMEMultipart(<span class="string">'related'</span>)  <span class="comment"># container for the HTML body and its inline images</span></span><br><span class="line">        self.msg[<span class="string">'Subject'</span>] = sub  <span class="comment"># subject</span></span><br><span class="line">        self.msg[<span class="string">'From'</span>] = <span class="string">'report@xxx.com'</span>  <span class="comment"># if there is no "report" user, create one or use another registered account</span></span><br><span class="line">        self.msg[<span class="string">'To'</span>] = <span class="string">","</span>.join(to_list)  <span class="comment"># recipients (address lists are comma-separated)</span></span><br><span class="line"></span><br><span class="line">        txt = MIMEText(content.encode(<span class="string">'utf-8'</span>), <span class="string">'html'</span>, <span class="string">'UTF-8'</span>)</span><br><span class="line">        self.msg.attach(txt)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">add_image</span><span class="params">(self, 
file_name)</span>:</span></span><br><span class="line">        <span class="string">"""Attach an inline image; its Content-ID is the file name without the extension."""</span></span><br><span class="line">        <span class="comment"># prefix = file_name.split('.')[0]</span></span><br><span class="line">        image = MIMEImage(open(file_name, <span class="string">'rb'</span>).read())</span><br><span class="line">        end_index = file_name.rfind(<span class="string">'/'</span>)</span><br><span class="line">        <span class="keyword">if</span> end_index != <span class="number">-1</span>:</span><br><span class="line">            tag = file_name[end_index + <span class="number">1</span>:]</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            tag = file_name</span><br><span class="line">        tag = tag.split(<span class="string">'.'</span>)[<span class="number">0</span>]</span><br><span class="line">        image.add_header(<span class="string">'Content-ID'</span>, <span class="string">'&lt;'</span> + tag + <span class="string">'&gt;'</span>)</span><br><span class="line">        <span class="comment"># image.add_header("Content-Disposition", "inline", filename=file_name)</span></span><br><span class="line">        <span class="comment"># image.add_header('Content-Disposition', 'attachment', filename=file_name)</span></span><br><span class="line"></span><br><span class="line">        self.msg.attach(image)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">send_mail</span><span class="params">(self)</span>:</span></span><br><span class="line">        <span class="keyword">try</span>:</span><br><span class="line">            s = smtplib.SMTP()</span><br><span class="line">            s.connect(self.mail_host, <span class="number">587</span>) <span class="comment"># connect to the mail host</span></span><br><span class="line">            s.ehlo()</span><br><span class="line">            
s.starttls()</span><br><span class="line">            s.ehlo()</span><br><span class="line">            <span class="comment"># s.set_debuglevel(1)</span></span><br><span class="line">            s.login(<span class="string">'report@xxx.com'</span>, <span class="string">'***'</span>)  <span class="comment"># log in; *** stands for the password</span></span><br><span class="line"></span><br><span class="line">            s.sendmail(self.me, self.to_list, self.msg.as_string()) <span class="comment"># send</span></span><br><span class="line">            s.close()</span><br><span class="line">            <span class="keyword">return</span> <span class="keyword">True</span></span><br><span class="line">        <span class="keyword">except</span> Exception <span class="keyword">as</span> e:</span><br><span class="line">            <span class="keyword">print</span>(e)</span><br><span class="line">            <span class="comment"># print format_exc()</span></span><br><span class="line">            <span class="keyword">return</span> <span class="keyword">False</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">get_tag_name</span><span class="params">(pic_name)</span>:</span></span><br><span class="line">    end_index = pic_name.rfind(<span class="string">'/'</span>)</span><br><span class="line">    <span class="keyword">if</span> end_index != <span class="number">-1</span>:</span><br><span class="line">        tag = pic_name[end_index + <span class="number">1</span>:]</span><br><span class="line">    <span class="keyword">else</span>:</span><br><span class="line">        tag = pic_name</span><br><span class="line">    <span class="keyword">return</span> tag</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">send_mail</span><span class="params">(mail_list, sub, content, 
images=None)</span>:</span></span><br><span class="line">    email_sender = Email(mail_list, sub, content)</span><br><span class="line">    <span class="keyword">if</span> images:</span><br><span class="line">        <span class="keyword">for</span> image <span class="keyword">in</span> images:</span><br><span class="line">            email_sender.add_image(image)</span><br><span class="line">    <span class="keyword">return</span> email_sender.send_mail()</span><br></pre></td></tr></table></figure><h3 id="last"><a href="#last" class="headerlink" title="last"></a>last</h3><p>With that, a basic framework for fetching data, analyzing it, generating a report, and sending it by email is in place. You can also configure a scheduled job to generate the report on a timer, or send special alerts when the data looks abnormal.</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>The project is on GitHub: <a href="https://github.com/cgDeepLearn/dailyreport" target="_blank" rel="noopener">daily_report</a></p><p>This is only a simple example of generating a report from data and sending it. If the data is more complex, we may need finer-grained operations with <code>pandas</code>.<br>And if many such projects need data visualization and real-time alerting, we still recommend a tool such as <code>Prometheus</code>, used together with a cloud or container platform.</p>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;Generate daily report charts with &lt;code&gt;pandas&lt;/code&gt; and send them by email&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
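&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;p&gt;&lt;div class=&quot;note info&quot;&gt;&lt;p&gt;&lt;/p&gt;
&lt;p&gt;In our projects we often use a monitoring and alerting system such as &lt;a href=&quot;https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/quickstart/why-monitor&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Prometheus&lt;/a&gt; to observe the actual runtime state of a system and to do trend analysis, comparative analysis, alerting, fault localization, and data visualization.&lt;/p&gt;
&lt;p&gt;Prometheus has many strengths, but it is somewhat heavy for a small project. If we want simple data analysis and customizable report generation in a small project, is there a simple, reusable approach? There is! Below we combine &lt;a href=&quot;https://www.pypandas.cn/docs/getting_started/10min.html&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;pandas&lt;/a&gt; with a few other basic Python libraries to build a small template that generates a daily report with charts and sends it by email.&lt;/p&gt;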
&lt;p&gt;&lt;/p&gt;&lt;/div&gt;&lt;br&gt;
    
    </summary>
    
      <category term="Python" scheme="https://cgdeeplearn.github.io/categories/Python/"/>
    
    
      <category term="numpy" scheme="https://cgdeeplearn.github.io/tags/numpy/"/>
    
      <category term="pandas" scheme="https://cgdeeplearn.github.io/tags/pandas/"/>
    
      <category term="matplotlib" scheme="https://cgdeeplearn.github.io/tags/matplotlib/"/>
    
      <category term="email" scheme="https://cgdeeplearn.github.io/tags/email/"/>
    
      <category term="Prometheus" scheme="https://cgdeeplearn.github.io/tags/Prometheus/"/>
    
  </entry>
  
  <entry>
    <title>Initializing a Go Project Template</title>
    <link href="https://cgdeeplearn.github.io/2020/12/06/init-go/"/>
    <id>https://cgdeeplearn.github.io/2020/12/06/init-go/</id>
    <published>2020-12-06T12:31:41.000Z</published>
    <updated>2020-12-08T12:14:23.743Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Use <code>cookiecutter</code> to initialize a customizable Go project template<br></p><p><img src="" alt="" style="width:100%"></p><a id="more"></a><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><h2 id="小结"><a href="#小结" class="headerlink" title="小结"></a>Summary</h2><p>Project GitHub repository: <a href="https://github.com/cgDeepLearn/initgo" target="_blank" rel="noopener">https://github.com/cgDeepLearn/initgo</a></p>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;Use &lt;code&gt;cookiecutter&lt;/code&gt; to initialize a customizable Go project template&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
    
    </summary>
    
      <category term="GO" scheme="https://cgdeeplearn.github.io/categories/GO/"/>
    
    
      <category term="cookiecutter" scheme="https://cgdeeplearn.github.io/tags/cookiecutter/"/>
    
      <category term="golang" scheme="https://cgdeeplearn.github.io/tags/golang/"/>
    
  </entry>
  
  <entry>
    <title>Parsing PDF Electronic Signatures</title>
    <link href="https://cgdeeplearn.github.io/2020/12/06/pdf-signature/"/>
    <id>https://cgdeeplearn.github.io/2020/12/06/pdf-signature/</id>
    <published>2020-12-06T12:10:14.000Z</published>
    <updated>2020-12-07T13:11:57.351Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Parsing <code>pdf</code> signatures in Java with <code>itext</code><br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><p><div class="note info"><p></p><p>Many business transactions (including financial, legal, and other regulated transactions) require a high level of assurance when documents are signed. When a document is distributed electronically, the recipient must be able to:</p><ul><li>Verify document authenticity: confirm the identity of everyone who signed the document</li><li>Verify document integrity: confirm that the document was not altered in transit</li></ul><p>So how can a program check the validity of the electronic signature on a signed PDF?</p><p></p></div><br><a id="more"></a></p><h2 id="电子签章"><a href="#电子签章" class="headerlink" title="电子签章"></a>Electronic Signatures</h2><p>An electronic signature carries a signing time plus the start and end of its validity period. From these three timestamps we can make basic checks on whether the signature is valid and whether the times are plausible (that is, whether anything has been tampered with).</p><h2 id="解析电子签章"><a href="#解析电子签章" class="headerlink" title="解析电子签章"></a>Parsing the Electronic Signature</h2><p>We use <code>itext</code> to parse the electronic signature.</p><h3 id="读取加载pdf"><a href="#读取加载pdf" class="headerlink" title="读取加载pdf"></a>Reading and Loading the PDF</h3><p>Fetch the PDF from its URL and load the content into a stream.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">public</span> <span class="keyword">static</span> InputStream <span class="title">getInputStreamByUrl</span><span class="params">(String strUrl)</span> </span>&#123;</span><br><span class="line">        HttpURLConnection conn = <span class="keyword">null</span>;</span><br><span class="line">        <span class="keyword">try</span> &#123;</span><br><span class="line">            URL url = <span class="keyword">new</span> URL(strUrl);</span><br><span 
class="line">            conn = (HttpURLConnection) url.openConnection();</span><br><span class="line">            conn.setRequestMethod(<span class="string">"GET"</span>);</span><br><span class="line">            conn.setConnectTimeout(<span class="number">20</span> * <span class="number">1000</span>);</span><br><span class="line">            <span class="keyword">final</span> ByteArrayOutputStream output = <span class="keyword">new</span> ByteArrayOutputStream();</span><br><span class="line">            IOUtils.copy(conn.getInputStream(), output);</span><br><span class="line">            <span class="keyword">return</span> <span class="keyword">new</span> ByteArrayInputStream(output.toByteArray());</span><br><span class="line">        &#125; <span class="keyword">catch</span> (Exception e) &#123;</span><br><span class="line">            e.printStackTrace();</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> <span class="keyword">null</span>;</span><br><span class="line">    &#125;</span><br></pre></td></tr></table></figure><h3 id="解析签章时间"><a href="#解析签章时间" class="headerlink" title="解析签章时间"></a>Parsing the Signature Times</h3><p>We mainly use three classes from <code>com.itextpdf.text.pdf</code>: <code>AcroFields</code>, <code>PdfReader</code>, and <code>security.PdfPKCS7</code>.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span 
class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> com.google.gson.Gson;</span><br><span class="line"><span class="keyword">import</span> com.itextpdf.text.pdf.AcroFields;</span><br><span class="line"><span class="keyword">import</span> com.itextpdf.text.pdf.PdfReader;</span><br><span class="line"><span class="keyword">import</span> com.itextpdf.text.pdf.security.PdfPKCS7;</span><br><span class="line"><span class="keyword">import</span> com.xiaoying.risksystem.util.StackTraceUtils;</span><br><span class="line"><span class="keyword">import</span> org.bouncycastle.jce.provider.BouncyCastleProvider;</span><br><span class="line"><span class="keyword">import</span> org.slf4j.Logger;</span><br><span class="line"><span class="keyword">import</span> org.slf4j.LoggerFactory;</span><br><span class="line"><span class="comment">// ...</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">public</span> GsonResponse <span class="title">resolvePdf</span><span class="params">(Integer applyId, String pdfUrl)</span> <span 
class="keyword">throws</span> IOException, GeneralSecurityException </span>&#123;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">        GsonResponse gsonResponse = <span class="keyword">new</span> GsonResponse();</span><br><span class="line"></span><br><span class="line">        <span class="keyword">try</span> &#123;</span><br><span class="line">            String signed_on = <span class="string">""</span>; <span class="comment">// signing time</span></span><br><span class="line">            String valid_to = <span class="string">""</span>; <span class="comment">// end of the validity period</span></span><br><span class="line">            String valid_from = <span class="string">""</span>; <span class="comment">// start of the validity period</span></span><br><span class="line"></span><br><span class="line">            SimpleDateFormat date_format = <span class="keyword">new</span> SimpleDateFormat(<span class="string">"yyyy-MM-dd HH:mm:ss"</span>);</span><br><span class="line">            InputStream resource = getInputStreamByUrl(pdfUrl);</span><br><span class="line">            PdfReader reader = <span class="keyword">new</span> PdfReader(resource); <span class="comment">// the PDF reader reads the stream content</span></span><br><span class="line">            AcroFields fields = reader.getAcroFields(); <span class="comment">// get the Acro fields</span></span><br><span class="line">            ArrayList&lt;String&gt; names = fields.getSignatureNames(); <span class="comment">// get the signature names from the Acro fields</span></span><br><span class="line">            <span class="keyword">for</span> (String name : names) &#123;</span><br><span class="line">                PdfPKCS7 pkcs7 = fields.verifySignature(name); <span class="comment">// get the PKCS#7 data of the signature</span></span><br><span class="line">                signed_on = date_format.format(pkcs7.getSignDate().getTime()); <span class="comment">// read and format the signing date and time from the PKCS#7 data</span></span><br><span class="line">                valid_from = 
date_format.format(pkcs7.getSigningCertificate().getNotBefore()); <span class="comment">// start of the validity period from the PKCS#7 data</span></span><br><span class="line">                valid_to = date_format.format(pkcs7.getSigningCertificate().getNotAfter()); <span class="comment">// end of the validity period from the PKCS#7 data</span></span><br><span class="line"></span><br><span class="line">            &#125;</span><br><span class="line"></span><br><span class="line">            GsonResponse.DataBean dataBean = <span class="keyword">new</span> GsonResponse.DataBean();</span><br><span class="line">            dataBean.setSigned_on(signed_on);</span><br><span class="line">            dataBean.setValid_from(valid_from);</span><br><span class="line">            dataBean.setValid_to(valid_to);</span><br><span class="line">            gsonResponse.setData(dataBean);</span><br><span class="line">        &#125; <span class="keyword">catch</span> (Exception e) &#123;</span><br><span class="line">            Log.error(<span class="string">"applyId:&#123;&#125; itextPdf except:&#123;&#125;"</span>, applyId, StackTraceUtils.getStackTrace(e));</span><br><span class="line">            gsonResponse.setErr_code(-<span class="number">1</span>);</span><br><span class="line">        &#125; <span class="keyword">finally</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> gsonResponse;</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">    &#125;</span><br></pre></td></tr></table></figure><h2 id="签章有效性"><a href="#签章有效性" class="headerlink" title="签章有效性"></a>Signature Validity</h2><p>Now we have the three timestamps. Typically you can judge the signature's validity by checking whether the signing time and the validity period fall on the same day, or whether the current date lies within the validity period. We can also use a library method to check:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">PdfPKCS7 pkcs7 = fields.verifySignature(name); <span class="comment">// 
Get the PKCS#7 data of the signature</span></span><br><span class="line"><span class="keyword">boolean</span> isValid = pkcs7.verifySignatureIntegrityAndAuthenticity(); <span class="comment">// check whether the digital signature is valid</span></span><br><span class="line">System.out.println(<span class="string">"Is signature valid: "</span> + isValid);</span><br></pre></td></tr></table></figure><h2 id="小结"><a href="#小结" class="headerlink" title="小结"></a>Summary</h2><p>When we need to deal with the validity of a PDF contract, we can use <code>itextpdf</code> to read and parse its signature times and judge whether the contract's signature is valid.</p>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;Parsing &lt;code&gt;pdf&lt;/code&gt; signatures in Java with &lt;code&gt;itext&lt;/code&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
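&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;p&gt;&lt;div class=&quot;note info&quot;&gt;&lt;p&gt;&lt;/p&gt;
&lt;p&gt;Many business transactions (including financial, legal, and other regulated transactions) require a high level of assurance when documents are signed. When a document is distributed electronically, the recipient must be able to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Verify document authenticity: confirm the identity of everyone who signed the document&lt;/li&gt;
&lt;li&gt;Verify document integrity: confirm that the document was not altered in transit&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So how can a program check the validity of the electronic signature on a signed PDF?&lt;/p&gt;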
&lt;p&gt;&lt;/p&gt;&lt;/div&gt;&lt;br&gt;
    
    </summary>
    
      <category term="Java" scheme="https://cgdeeplearn.github.io/categories/Java/"/>
    
    
      <category term="PDF" scheme="https://cgdeeplearn.github.io/tags/PDF/"/>
    
      <category term="itextpdf" scheme="https://cgdeeplearn.github.io/tags/itextpdf/"/>
    
      <category term="signature" scheme="https://cgdeeplearn.github.io/tags/signature/"/>
    
  </entry>
  
  <entry>
    <title>A Simple Python Microservice Framework</title>
    <link href="https://cgdeeplearn.github.io/2020/05/18/Mirco-service/"/>
    <id>https://cgdeeplearn.github.io/2020/05/18/Mirco-service/</id>
    <published>2020-05-18T13:04:17.000Z</published>
    <updated>2020-12-07T12:56:03.943Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Quickly build your own small services<br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><p><div class="note info"><p></p><p>Microservices are popular these days, and many projects rely on small, autonomous services that work together. For Python there are microservice frameworks such as <code>grpc</code> and <code>nameko</code>, and web frameworks such as <code>Flask</code> and <code>Django</code>. But can we write a lightweight, reusable service framework of our own?</p><p></p></div><br><a id="more"></a></p><h2 id="准备"><a href="#准备" class="headerlink" title="准备"></a>Prerequisites</h2><p>python: 2.7+/3.6+</p><p>Main libraries: PyMySQL, redis, DBUtils, requests, supervisor, urllib3, Werkzeug, gevent, gunicorn</p><h2 id="简介"><a href="#简介" class="headerlink" title="简介"></a>Overview</h2><p>The implementation mainly relies on <code>Werkzeug</code>, <code>gevent</code>, and <code>gunicorn</code>.</p><p><code>Werkzeug</code> is a <code>WSGI</code> toolkit that can serve as the underlying library of a web framework, because it packages much of what a web framework needs, such as <code>Request</code> and <code>Response</code>. The <code>Flask</code> framework, for example, is built on <code>Werkzeug</code>.</p><p><code>Gevent</code> is a Python concurrency framework based on <code>greenlet</code>. It is built around <code>greenlet</code> micro-threads and gets its efficiency from the <code>epoll</code> event-notification mechanism and many other optimizations.</p><p><code>Gunicorn</code> is a high-performance Python WSGI HTTP server widely used on UNIX. It is compatible with most web frameworks and is simple to implement, lightweight, and fast.</p><p>Most microservices are, in essence, a collection of simple database CRUD operations, and a service typically consists of database connections, logging, business logic, and so on. Let's look at a basic implementation of each.</p><h2 id="实现"><a href="#实现" class="headerlink" title="实现"></a>Implementation</h2><h3 id="数据库连接"><a href="#数据库连接" class="headerlink" title="数据库连接"></a>Database Connections</h3><p>For MySQL we build a connection pool with <code>pymysql</code> and <code>DBUtils</code>; for caching we use <code>redis</code>.</p><p>Database operations are also logged; the logging implementation is covered in the next section.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span 
class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span 
class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br><span class="line">163</span><br><span class="line">164</span><br><span class="line">165</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"><span class="comment"># @File : db.py </span></span><br><span class="line"><span class="comment"># @Author : cgDeepLearn</span></span><br><span class="line"><span class="comment"># @Create Date : 2020/11/16-3:30 下午</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> redis</span><br><span class="line"><span class="keyword">from</span> conf <span class="keyword">import</span> config</span><br><span class="line"><span class="keyword">import</span> pymysql</span><br><span class="line"><span class="keyword">from</span> DBUtils.PooledDB <span class="keyword">import</span> PooledDB</span><br><span class="line"><span class="keyword">from</span> utils.log 
<span class="keyword">import</span> logger</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">RedisOps</span><span class="params">(object)</span>:</span></span><br><span class="line">    FIELD_EXIST = <span class="number">0</span></span><br><span class="line">    NEW_FIELD = <span class="number">1</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, host, port, password, db)</span>:</span></span><br><span class="line">        rd = redis.ConnectionPool(host=host, port=port, password=password, db=db)</span><br><span class="line">        self.rd = redis.Redis(connection_pool=rd)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">MysqlOps</span><span class="params">(object)</span>:</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, host, port, user, passwd, db)</span>:</span></span><br><span class="line">        self.pool = PooledDB(</span><br><span class="line">            pymysql,</span><br><span class="line">            mincached=<span class="number">10</span>,</span><br><span class="line">            maxcached=<span class="number">30</span>,</span><br><span class="line">            maxconnections=<span class="number">0</span>,</span><br><span class="line">            host=host,</span><br><span class="line">            user=user,</span><br><span class="line">            passwd=passwd,</span><br><span class="line">            db=db,</span><br><span class="line">            port=port,</span><br><span class="line">            charset=<span 
class="string">'utf8'</span>)</span><br><span class="line">        self.user_apply = <span class="string">'user_apply'</span></span><br><span class="line">        self.user_base = <span class="string">'user_base'</span></span><br><span class="line">        self.flows = <span class="string">'flows'</span></span><br><span class="line">        self.table_list = list()</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_execute</span><span class="params">(self, sql, values)</span>:</span></span><br><span class="line">        <span class="string">'''</span></span><br><span class="line"><span class="string">        Each call takes a fresh connection from the pool</span></span><br><span class="line"><span class="string">        '''</span></span><br><span class="line">        conn = self.pool.connection()</span><br><span class="line">        cur = conn.cursor()</span><br><span class="line">        cur.execute(sql, values)</span><br><span class="line">        conn.commit()</span><br><span class="line">        conn.close()  <span class="comment"># hands the connection back to the pool</span></span><br><span class="line">        <span class="keyword">return</span> cur</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_check_parameter</span><span class="params">(self, sql, values)</span>:</span></span><br><span class="line">        count = sql.count(<span class="string">'%s'</span>)</span><br><span class="line">        <span class="keyword">if</span> count &gt; <span class="number">0</span>:</span><br><span class="line">            <span class="keyword">for</span> elem <span class="keyword">in</span> values:</span><br><span class="line">                <span class="keyword">if</span> <span class="keyword">not</span> elem:</span><br><span class="line">                    <span class="keyword">return</span> <span class="keyword">False</span></span><br><span class="line">        <span 
class="keyword">return</span> <span class="keyword">True</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_get_table_list</span><span class="params">(self)</span>:</span></span><br><span class="line">        <span class="keyword">if</span> len(self.table_list) == <span class="number">0</span>:</span><br><span class="line">            sql = <span class="string">'''SELECT COUNT(id) FROM data_split_info'''</span></span><br><span class="line">            table_num = list(self.select(sql))[<span class="number">0</span>][<span class="number">0</span>]</span><br><span class="line">            self.table_list = [num <span class="keyword">for</span> num <span class="keyword">in</span> range(<span class="number">0</span>, table_num)]</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_replace</span><span class="params">(self, sql, table, num)</span>:</span></span><br><span class="line">        <span class="keyword">if</span> num == <span class="number">0</span>:</span><br><span class="line">            <span class="keyword">if</span> table <span class="keyword">in</span> sql:</span><br><span class="line">                string = <span class="string">' AND %s.deleted_at is null'</span> % table</span><br><span class="line">                sql = sql + string</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            pattern = <span class="string">'%s'</span> % table</span><br><span class="line">            string = <span class="string">'%s_%d'</span> % (table, num)</span><br><span class="line">            sql = sql.replace(pattern, string)</span><br><span class="line">        <span class="keyword">return</span> sql</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span 
class="title">_mulselect</span><span class="params">(self, apply_id, sql, values)</span>:</span></span><br><span class="line">        self._get_table_list()</span><br><span class="line"></span><br><span class="line">        mulcur = list()</span><br><span class="line">        <span class="keyword">for</span> num <span class="keyword">in</span> self.table_list:</span><br><span class="line">            temp_c = <span class="number">0</span></span><br><span class="line">            sql_tmp = sql</span><br><span class="line">            sql_tmp = self._replace(sql_tmp, self.user_apply, num)</span><br><span class="line">            sql_tmp = self._replace(sql_tmp, self.user_base, num)</span><br><span class="line">            sql_tmp = self._replace(sql_tmp, self.flows, num)</span><br><span class="line"></span><br><span class="line">            cur = self._execute(sql_tmp, values)</span><br><span class="line">            <span class="keyword">for</span> row <span class="keyword">in</span> cur:</span><br><span class="line">                temp_c = temp_c + <span class="number">1</span></span><br><span class="line">                mulcur.append(row)</span><br><span class="line">            logger.info(<span class="string">'apply_id:%d _mulselect sql:%s, values:%s, result:%s'</span>,</span><br><span class="line">                        apply_id, sql_tmp, values, temp_c)</span><br><span class="line"></span><br><span class="line">        <span class="keyword">return</span> mulcur</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">mulselect</span><span class="params">(self, sql, values=[], apply_id=<span class="number">0</span>, check=False, log=True)</span>:</span></span><br><span class="line">        <span class="string">'''</span></span><br><span class="line"><span class="string">        Multi-table query interface</span></span><br><span class="line"><span class="string">        
1. Supports basic MySQL queries; aggregate functions, GROUP BY and ORDER BY are not supported</span></span><br><span class="line"><span class="string">        '''</span></span><br><span class="line">        sql = sql.replace(<span class="string">'\n'</span>, <span class="string">''</span>)</span><br><span class="line">        <span class="keyword">if</span> check <span class="keyword">and</span> <span class="keyword">not</span> self._check_parameter(sql, values):</span><br><span class="line">            <span class="keyword">return</span></span><br><span class="line">        <span class="keyword">if</span> log:</span><br><span class="line">            logger.info(<span class="string">'apply_id:%d mulselect sql:%s, values:%s'</span>, apply_id,</span><br><span class="line">                        sql, values)</span><br><span class="line">        cur = self._mulselect(apply_id, sql, values)</span><br><span class="line">        <span class="keyword">for</span> row <span class="keyword">in</span> cur:</span><br><span class="line">            <span class="keyword">yield</span> row</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">sinselect</span><span class="params">(self, sql, values=[], apply_id=<span class="number">0</span>, check=False, log=True)</span>:</span></span><br><span class="line">        sql = sql.replace(<span class="string">'\n'</span>, <span class="string">''</span>)</span><br><span class="line">        <span class="keyword">if</span> check <span class="keyword">and</span> <span class="keyword">not</span> self._check_parameter(sql, values):</span><br><span class="line">            <span class="keyword">return</span></span><br><span class="line">        <span class="comment"># During the transition period, also filter on deleted_at</span></span><br><span class="line">        sql = self._replace(sql, self.user_apply, num=<span class="number">0</span>)</span><br><span class="line">        sql = self._replace(sql, self.user_base, num=<span 
class="number">0</span>)</span><br><span class="line">        sql = self._replace(sql, self.flows, num=<span class="number">0</span>)</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> log:</span><br><span class="line">            logger.info(<span class="string">'apply_id:%d sinselect sql:%s, values:%s'</span>, apply_id,</span><br><span class="line">                        sql, values)</span><br><span class="line">        cur = self._execute(sql, values)</span><br><span class="line">        <span class="keyword">for</span> row <span class="keyword">in</span> cur:</span><br><span class="line">            <span class="keyword">yield</span> row</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">select</span><span class="params">(self, sql, values=[], apply_id=<span class="number">0</span>, check=False, log=True)</span>:</span></span><br><span class="line">        sql = sql.replace(<span class="string">'\n'</span>, <span class="string">''</span>)</span><br><span class="line">        <span class="keyword">if</span> check <span class="keyword">and</span> <span class="keyword">not</span> self._check_parameter(sql, values):</span><br><span class="line">            <span class="keyword">return</span></span><br><span class="line">        <span class="keyword">if</span> log:</span><br><span class="line">            logger.info(<span class="string">'apply_id:%d select sql:%s, values:%s'</span>, apply_id, sql,</span><br><span class="line">                        values)</span><br><span class="line">        cur = self._execute(sql, values)</span><br><span class="line">        <span class="keyword">for</span> row <span class="keyword">in</span> cur:</span><br><span class="line">            <span class="keyword">yield</span> row</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> 
<span class="title">execute</span><span class="params">(self, sql, values=[], apply_id=<span class="number">0</span>, check=False, log=True)</span>:</span></span><br><span class="line">        sql = sql.replace(<span class="string">'\n'</span>, <span class="string">''</span>)</span><br><span class="line">        <span class="keyword">if</span> check <span class="keyword">and</span> <span class="keyword">not</span> self._check_parameter(sql, values):</span><br><span class="line">            <span class="keyword">return</span></span><br><span class="line">        <span class="keyword">if</span> log:</span><br><span class="line">            logger.info(<span class="string">'apply_id:%d execute sql:%s, values:%s'</span>, apply_id, sql,</span><br><span class="line">                        values)</span><br><span class="line">        self._execute(sql, values)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">redis_op = RedisOps(</span><br><span class="line">    host=config.redis_host, port=config.redis_port, password=config.redis_pwd, db=config.redis_db)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">mysql_op = MysqlOps(</span><br><span class="line">    host=config.mysql_host,</span><br><span class="line">    port=config.mysql_port,</span><br><span class="line">    user=config.mysql_user,</span><br><span class="line">    passwd=config.mysql_pwd,</span><br><span class="line">    db=config.mysql_db)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line">    print(dir(redis_op))</span><br><span class="line">    print(dir(mysql_op))</span><br></pre></td></tr></table></figure><h3 id="日志记录"><a href="#日志记录" class="headerlink" title="日志记录"></a>Logging</h3><p>Logging uses the standard logging module with timed rotation: log files roll over at midnight, seven backups are kept, and ERROR-level records are also written to a separate error log.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br></pre></td><td 
class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"><span class="comment"># @File : log.py</span></span><br><span class="line"><span class="comment"># @Author : cgDeepLearn</span></span><br><span class="line"><span class="comment"># @Create Date : 2020/11/12-2:50 PM</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> logging</span><br><span class="line"><span class="keyword">import</span> logging.config</span><br><span class="line"></span><br><span class="line">logger = logging.getLogger(<span class="string">"debug"</span>)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">init_log</span><span class="params">(log_path, log_name, log_level=<span class="string">"DEBUG"</span>)</span>:</span></span><br><span class="line">    <span class="string">"""Initialize logging.</span></span><br><span class="line"><span class="string">    log_path: directory where log files are saved</span></span><br><span class="line"><span class="string">    log_name: log file name</span></span><br><span class="line"><span class="string">    log_level: log level, defaults to DEBUG</span></span><br><span class="line"><span class="string">    """</span></span><br><span class="line">    log_level = log_level.upper()</span><br><span class="line">    LOG_PATH_DEBUG = <span class="string">"%s/%s.log"</span> % (log_path, log_name)  <span class="comment"># main log file path</span></span><br><span class="line">    LOG_PATH_ERROR = <span class="string">"%s/process_server_error.log"</span> % log_path  <span class="comment"># error log file path</span></span><br><span class="line">    LOG_FILE_BACKUP_COUNT = <span class="number">7</span>  <span class="comment"># number of daily log files to keep</span></span><br><span class="line">    <span class="comment"># logging configuration dict</span></span><br><span class="line">    log_conf = 
&#123;</span><br><span class="line">        <span class="string">"version"</span>: <span class="number">1</span>,</span><br><span class="line">        <span class="string">"formatters"</span>: &#123;</span><br><span class="line">            <span class="string">"format1"</span>: &#123;</span><br><span class="line">                <span class="string">"format"</span>: <span class="comment"># log line format</span></span><br><span class="line">                    <span class="string">'%(asctime)-15s [%(thread)d] - [%(filename)s %(lineno)d] %(levelname)s %(message)s'</span>,</span><br><span class="line">            &#125;,</span><br><span class="line">        &#125;,</span><br><span class="line">        <span class="string">"handlers"</span>: &#123;</span><br><span class="line">            <span class="string">"handler1"</span>: &#123;</span><br><span class="line">                <span class="string">"class"</span>: <span class="string">"logging.handlers.TimedRotatingFileHandler"</span>,</span><br><span class="line">                <span class="string">"level"</span>: log_level,</span><br><span class="line">                <span class="string">"formatter"</span>: <span class="string">"format1"</span>,</span><br><span class="line">                <span class="string">"when"</span>: <span class="string">"midnight"</span>,</span><br><span class="line">                <span class="string">"backupCount"</span>: LOG_FILE_BACKUP_COUNT,</span><br><span class="line">                <span class="string">"filename"</span>: LOG_PATH_DEBUG</span><br><span class="line">            &#125;,</span><br><span class="line">            <span class="string">"handler2"</span>: &#123;</span><br><span class="line">                <span class="string">"class"</span>: <span class="string">"logging.handlers.TimedRotatingFileHandler"</span>,</span><br><span class="line">                <span class="string">"level"</span>: <span class="string">'ERROR'</span>,</span><br><span class="line">                
<span class="string">"formatter"</span>: <span class="string">"format1"</span>,</span><br><span class="line">                <span class="string">"when"</span>: <span class="string">"midnight"</span>,</span><br><span class="line">                <span class="string">"backupCount"</span>: LOG_FILE_BACKUP_COUNT,</span><br><span class="line">                <span class="string">"filename"</span>: LOG_PATH_ERROR</span><br><span class="line">            &#125;,</span><br><span class="line">        &#125;,</span><br><span class="line">        <span class="string">"loggers"</span>: &#123;</span><br><span class="line">            <span class="string">"debug"</span>: &#123;</span><br><span class="line">                <span class="string">"handlers"</span>: [<span class="string">"handler1"</span>, <span class="string">"handler2"</span>],</span><br><span class="line">                <span class="string">"level"</span>: log_level</span><br><span class="line">            &#125;,</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    logging.config.dictConfig(log_conf)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">close_log</span><span class="params">()</span>:</span></span><br><span class="line">    logging.shutdown()</span><br></pre></td></tr></table></figure><h3 id="请求返回"><a href="#请求返回" class="headerlink" title="请求返回"></a>Requests and responses</h3><p>We wrote an expose decorator to handle routing. It registers each decorated handler in a werkzeug url_map, using the function name as the endpoint:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"><span class="comment"># @File : decorator.py </span></span><br><span class="line"><span class="comment"># @Author : cgDeepLearn</span></span><br><span class="line"><span class="comment"># @Create Date : 2020/11/12-2:56 下午</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> time</span><br><span class="line"><span class="keyword">from</span> functools <span class="keyword">import</span> wraps</span><br><span class="line"><span class="keyword">from</span> utils.log <span class="keyword">import</span> logger</span><br><span class="line"><span class="keyword">from</span> werkzeug.routing <span class="keyword">import</span> Map, Rule</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">url_map = Map()</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">expose</span><span class="params">(rule, **kw)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">decorate</span><span class="params">(f)</span>:</span></span><br><span class="line">        kw[<span class="string">'endpoint'</span>] = f.__name__</span><br><span class="line">        url_map.add(Rule(rule, **kw))</span><br><span class="line">        <span class="keyword">return</span> f</span><br><span 
class="line"></span><br><span class="line">    <span class="keyword">return</span> decorate</span><br></pre></td></tr></table></figure><ul><li><p>The url_map is then bound to the WSGI environ of each request:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"><span class="comment"># @File : app.py </span></span><br><span class="line"><span class="comment"># @Author : cgDeepLearn</span></span><br><span class="line"><span class="comment"># @Create Date : 2020/11/12-2:47 PM</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> gevent <span class="keyword">import</span> monkey</span><br><span class="line">monkey.patch_all()</span><br><span class="line"></span><br><span 
class="line"><span class="keyword">from</span> traceback <span class="keyword">import</span> format_exc</span><br><span class="line"><span class="keyword">from</span> werkzeug.wrappers <span class="keyword">import</span> Request, Response</span><br><span class="line"><span class="keyword">from</span> werkzeug.exceptions <span class="keyword">import</span> HTTPException, NotFound, MethodNotAllowed, BadRequest</span><br><span class="line"><span class="keyword">from</span> server <span class="keyword">import</span> g_server</span><br><span class="line"><span class="keyword">from</span> utils.log <span class="keyword">import</span> logger</span><br><span class="line"><span class="keyword">from</span> decorator <span class="keyword">import</span> url_map</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">g_app</span><span class="params">(environ, start_response)</span>:</span></span><br><span class="line">    request = Request(environ)</span><br><span class="line">    adapter = url_map.bind_to_environ(environ)</span><br><span class="line">    <span class="keyword">try</span>:</span><br><span class="line">        endpoint, values = adapter.match()</span><br><span class="line">        response = getattr(g_server, endpoint)(request, **values)</span><br><span class="line">    <span class="keyword">except</span> (NotFound, MethodNotAllowed, BadRequest) <span class="keyword">as</span> e:</span><br><span class="line">        response = e</span><br><span class="line">    <span class="keyword">except</span> HTTPException <span class="keyword">as</span> e:</span><br><span class="line">        response = e</span><br><span class="line">    <span class="keyword">except</span> Exception:</span><br><span class="line">        <span class="comment"># Unexpected failure: return a generic 500 and log the traceback</span></span><br><span class="line">        response = Response(<span class="string">'Uncaught Error'</span>, status=<span class="number">500</span>)</span><br><span class="line">        
logger.error(<span class="string">'app uncaught error, exception:%s'</span>, format_exc())</span><br><span class="line">    <span class="keyword">return</span> response(environ, start_response)</span><br></pre></td></tr></table></figure></li><li><p>Example handler for the /test request:</p></li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span 
class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span 
class="line"><span class="comment"># @File : server.py </span></span><br><span class="line"><span class="comment"># @Author : edgar.chen</span></span><br><span class="line"><span class="comment"># @Create Date : 2020/11/12-2:47 下午</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">import</span> json</span><br><span class="line"><span class="keyword">import</span> uuid</span><br><span class="line"><span class="keyword">from</span> werkzeug.wrappers <span class="keyword">import</span> Response</span><br><span class="line"><span class="keyword">from</span> werkzeug.exceptions <span class="keyword">import</span> BadRequest</span><br><span class="line"><span class="keyword">from</span> traceback <span class="keyword">import</span> format_exc</span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> conf <span class="keyword">import</span> config</span><br><span class="line"><span class="keyword">from</span> utils.log <span class="keyword">import</span> logger, init_log</span><br><span class="line"><span class="keyword">from</span> utils.errors <span class="keyword">import</span> ServerProcessError, ParameterError</span><br><span class="line"><span class="keyword">from</span> decorator <span class="keyword">import</span> timer, expose</span><br><span class="line"><span class="keyword">from</span> api <span class="keyword">import</span> TestApi</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Server</span><span class="params">(object)</span>:</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self)</span>:</span></span><br><span class="line">        self._pid = os.getpid()</span><br><span class="line"></span><br><span 
class="line">        os.makedirs(config.log_path) <span class="keyword">if</span> <span class="keyword">not</span> os.path.exists(config.log_path) <span class="keyword">else</span> <span class="keyword">None</span></span><br><span class="line">        log_name = <span class="string">"&#123;&#125;.&#123;&#125;"</span>.format(config.log_name, self._pid)</span><br><span class="line">        init_log(config.log_path, log_name)</span><br><span class="line">        logger.debug(<span class="string">'server is running...'</span>)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_rsp_encode</span><span class="params">(self, rsp)</span>:</span></span><br><span class="line">        <span class="keyword">return</span> json.dumps(rsp, separators=(<span class="string">','</span>, <span class="string">':'</span>))</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_req_decode</span><span class="params">(self, req)</span>:</span></span><br><span class="line">        <span class="keyword">try</span>:</span><br><span class="line">            data = json.loads(req)</span><br><span class="line">            <span class="keyword">return</span> data</span><br><span class="line">        <span class="keyword">except</span> Exception:</span><br><span class="line">            <span class="keyword">raise</span> ParameterError(<span class="string">"post data not json string!"</span>)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_errcode</span><span class="params">(self, code=<span class="number">0</span>, msg=<span class="string">'ok'</span>)</span>:</span></span><br><span class="line">        <span class="keyword">return</span> dict(errCode=code, errMsg=msg, err_code=code, err_msg=msg)</span><br><span 
class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_response</span><span class="params">(self, response)</span>:</span></span><br><span class="line">        <span class="string">"""Wrap the result dict in a JSON response."""</span></span><br><span class="line">        response[<span class="string">'version'</span>] = config.version</span><br><span class="line">        encode_response = self._rsp_encode(response)</span><br><span class="line">        <span class="keyword">return</span> Response(encode_response, mimetype=<span class="string">'application/json'</span>)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_get_query_args</span><span class="params">(self, data, apply_detail)</span>:</span></span><br><span class="line">        <span class="string">"""Extract and validate the request parameters."""</span></span><br><span class="line">        <span class="keyword">try</span>:</span><br><span class="line">            apply_detail[<span class="string">'order_id'</span>] = int(data[<span class="string">"order_id"</span>])</span><br><span class="line">        <span class="keyword">except</span> Exception:</span><br><span class="line">            <span class="keyword">raise</span> ParameterError(<span class="string">'request params not complete or format not right'</span>)</span><br><span class="line"></span><br><span class="line"><span class="meta">    @expose('/test', methods=['POST'])</span></span><br><span class="line"><span class="meta">    @timer</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">api_test</span><span class="params">(self, request)</span>:</span></span><br><span class="line">        <span class="string">"""Handle POST /test."""</span></span><br><span class="line">        <span class="keyword">try</span>:</span><br><span class="line">            req_id, order_id = str(uuid.uuid1()), <span class="keyword">None</span></span><br><span class="line">            
request_data = request.get_data()</span><br><span class="line">            <span class="keyword">if</span> <span class="keyword">not</span> request_data:</span><br><span class="line">                <span class="keyword">raise</span> BadRequest()</span><br><span class="line"></span><br><span class="line">            logger.debug(<span class="string">'req_id: [&#123;&#125;] - request:&#123;&#125;'</span>.format(req_id, request_data))</span><br><span class="line">            <span class="comment"># Decode the request body into a dict</span></span><br><span class="line">            data = self._req_decode(request_data)</span><br><span class="line"></span><br><span class="line">            <span class="comment"># Extract the required request parameters</span></span><br><span class="line">            apply_detail = dict()</span><br><span class="line">            apply_detail[<span class="string">'req_id'</span>] = req_id</span><br><span class="line">            self._get_query_args(data, apply_detail)</span><br><span class="line"></span><br><span class="line">            <span class="comment"># Run the business logic</span></span><br><span class="line">            res_data = TestApi(req_data=apply_detail).process()</span><br><span class="line"></span><br><span class="line">            <span class="comment"># Build the response</span></span><br><span class="line">            response = self._errcode(<span class="number">0</span>)</span><br><span class="line">            order_id = apply_detail[<span class="string">"order_id"</span>]</span><br><span class="line">            result = &#123;<span class="string">"order_id"</span>: order_id,</span><br><span class="line">                      <span class="string">"uuid"</span>: req_id,</span><br><span class="line">                      <span class="string">"data"</span>: res_data&#125;</span><br><span class="line">            response.update(result)</span><br><span class="line">            logger.debug(<span class="string">'req_id: [%s] - order_id: %s, predict_org, response:%s'</span>, req_id, order_id, 
response)</span><br><span class="line">        <span class="comment"># Error handling</span></span><br><span class="line">        <span class="keyword">except</span> BadRequest:</span><br><span class="line">            logger.error(<span class="string">'bad request, request params needed!'</span>)</span><br><span class="line"></span><br><span class="line">            response = self._errcode(<span class="number">-2</span>, <span class="string">'bad request, request params needed!'</span>)</span><br><span class="line"></span><br><span class="line">        <span class="keyword">except</span> ParameterError <span class="keyword">as</span> e:</span><br><span class="line">            logger.error(str(e))</span><br><span class="line">            response = self._errcode(<span class="number">-4</span>, str(e))</span><br><span class="line"></span><br><span class="line">        <span class="keyword">except</span> ServerProcessError <span class="keyword">as</span> e:</span><br><span class="line">            logger.error(<span class="string">'req_id: [%s] - apply_id: [%s] except: %s'</span> % (req_id, order_id, str(e)))</span><br><span class="line">            response = self._errcode(<span class="number">-3</span>, str(e))</span><br><span class="line"></span><br><span class="line">        <span class="keyword">except</span>:</span><br><span class="line">            logger.error(<span class="string">'req_id: [%s] - apply_id: [%s] except: %s'</span> % (req_id, order_id, format_exc()))</span><br><span class="line">            response = self._errcode(<span class="number">-1</span>, <span class="string">'server error'</span>)</span><br><span class="line">        <span class="keyword">finally</span>:</span><br><span class="line">            <span class="keyword">return</span> self._response(response)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">g_server = 
Server()</span><br></pre></td></tr></table></figure><p>To add a new route, just add the corresponding handler method to the server module.</p><ul><li><p>A simple business-logic example</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"><span class="comment"># @File : api.py </span></span><br><span class="line"><span class="comment"># @Author : cgDeepLearn</span></span><br><span class="line"><span class="comment"># @Create Date : 2020/11/16-3:40 PM</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> datetime</span><br><span class="line"><span class="keyword">import</span> random</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">TestApi</span><span class="params">(object)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, req_data)</span>:</span></span><br><span class="line">        
self.req_data = req_data</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">process</span><span class="params">(self)</span>:</span></span><br><span class="line">        <span class="string">"""</span></span><br><span class="line"><span class="string">        Build and return the result.</span></span><br><span class="line"><span class="string">        """</span></span><br><span class="line">        results = [<span class="number">-1</span>, <span class="number">0</span>, <span class="number">1</span>]</span><br><span class="line">        res = &#123;</span><br><span class="line">            <span class="string">'timestamp'</span>: datetime.datetime.now().timestamp(),</span><br><span class="line">            <span class="string">'result'</span>: random.choice(results)</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> res</span><br></pre></td></tr></table></figure></li></ul><h3 id="gunicorn-配置"><a href="#gunicorn-配置" class="headerlink" title="gunicorn 配置"></a>gunicorn configuration</h3><p>We use the gevent coroutine worker class and configure the server IP, port, number of worker processes, and related parameters.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span 
class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"><span class="comment"># @File : gunicorn_config.py </span></span><br><span class="line"><span class="comment"># @Author : cgDeepLearn</span></span><br><span class="line"><span class="comment"># @Create Date : 2020/11/12-3:17 PM</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">import</span> datetime</span><br><span class="line"></span><br><span class="line">server_ip = <span class="string">'127.0.0.1'</span>  <span class="comment"># server address</span></span><br><span class="line"><span class="comment"># linux get ip</span></span><br><span class="line"><span class="comment"># server_ip = os.popen('ifconfig eth0|grep inet|grep -v 127.0.0.1|grep -v inet6|awk \'&#123;print $2&#125;\'|tr -d "addr:"').readlines()[0].strip('\r\n')</span></span><br><span class="line">server_port = <span class="number">10001</span>  <span class="comment"># port number</span></span><br><span class="line">bind = <span class="string">'%s:%s'</span> % (server_ip, server_port)</span><br><span class="line">workers = <span class="number">1</span>  <span class="comment"># number of worker processes</span></span><br><span 
class="line">keepalive = <span class="number">600</span>  <span class="comment"># keep-alive timeout: 10 minutes, to avoid too many short-lived connections</span></span><br><span class="line"></span><br><span class="line">backlog = <span class="number">2048</span>  <span class="comment"># the maximum number of pending connections</span></span><br><span class="line">worker_connections = <span class="number">2048</span>  <span class="comment"># the maximum number of simultaneous clients</span></span><br><span class="line">worker_class = <span class="string">'gevent'</span> <span class="comment"># worker type: sync, eventlet, gevent, tornado, or gthread (default: sync); we use gevent coroutines</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line">loglevel = <span class="string">'info'</span></span><br><span class="line"><span class="comment"># daemon = False  # whether to run the application as a daemon</span></span><br><span class="line">script_path = os.path.dirname(os.path.abspath(__file__))</span><br><span class="line">work_path = os.path.dirname(script_path)</span><br><span class="line">log_name = <span class="string">'server'</span></span><br><span class="line">pidfile = <span class="string">'&#123;&#125;/gunicorn.pid'</span>.format(script_path)</span><br><span class="line">errorlog = <span class="string">'&#123;&#125;/log/gunicorn_error.log'</span>.format(work_path)</span><br><span class="line">chdir = <span class="string">'&#123;&#125;/src'</span>.format(work_path)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">worker_exit</span><span class="params">(server, worker)</span>:</span></span><br><span class="line">    <span class="string">"""Worker exit hook: append the current date to the worker's log file name."""</span></span><br><span class="line">    pid = worker.pid</span><br><span class="line">    logfile = os.path.join(work_path, <span class="string">'log/&#123;&#125;.&#123;&#125;.log'</span>.format(log_name,pid))</span><br><span class="line">    
</span><br><span class="line">    newfile = os.path.join(work_path, <span class="string">'log/&#123;&#125;.&#123;&#125;.log.&#123;&#125;'</span>.format(log_name,</span><br><span class="line">        pid, datetime.datetime.now().strftime(<span class="string">'%Y-%m-%d'</span>)))</span><br><span class="line">    <span class="keyword">if</span> os.path.exists(logfile):</span><br><span class="line">        os.rename(logfile, newfile)</span><br></pre></td></tr></table></figure><h3 id="启动脚本"><a href="#启动脚本" class="headerlink" title="启动脚本"></a>Startup script</h3><p>The script starts the service with gunicorn and uses the <code>kill</code> command to stop, reload, or restart it.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span 
class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#</span>!/bin/bash</span><br><span class="line"></span><br><span class="line"><span class="meta">#</span> IP=`/sbin/ifconfig  | grep 'inet addr:'| grep -v '127.0.0.1' |  cut -d: -f2 | awk '&#123; print $1&#125;'` # linux</span><br><span class="line">BASE_DIR=` cd "$(dirname "$0")"; cd ..;  pwd `</span><br><span class="line"></span><br><span class="line">PROCESS_NAME=$&#123;BASE_DIR##*/&#125;</span><br><span class="line">CONFIG_PATH=$BASE_DIR/script/gunicorn_config.py</span><br><span class="line">PID_FILE=$BASE_DIR/script/gunicorn.pid</span><br><span class="line">CMD="gunicorn --daemon -c $CONFIG_PATH app:g_app "</span><br><span class="line"></span><br><span class="line">function start_gunicorn() &#123;</span><br><span class="line">    cd $BASE_DIR</span><br><span class="line">    $CMD</span><br><span class="line">    echo "$PROCESS_NAME is running."</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">function stop_gunicorn() &#123;</span><br><span class="line">    if [ -f $PID_FILE ]; then</span><br><span class="line">        PID=`cat $PID_FILE`</span><br><span class="line">        echo "gunicorn: kill PID=$&#123;PID&#125;"</span><br><span 
class="line">        kill -TERM $&#123;PID&#125;</span><br><span class="line">    else</span><br><span class="line">        echo "gunicorn: $PID_FILE not exists..."</span><br><span class="line">    fi</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">function reload_gunicorn() &#123;</span><br><span class="line">    if [ -f $PID_FILE ]; then</span><br><span class="line">        PID=`cat $PID_FILE`</span><br><span class="line">        kill -HUP $PID</span><br><span class="line">        echo "$PID reloaded..."</span><br><span class="line">    else</span><br><span class="line">        echo "$PID_FILE not exists..."</span><br><span class="line">    fi</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">function restart_gunicorn () &#123;</span><br><span class="line">    stop_gunicorn</span><br><span class="line">    sleep 1</span><br><span class="line">    start_gunicorn</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">case "$1" in</span><br><span class="line">    start)</span><br><span class="line">        start_gunicorn</span><br><span class="line">        exit $?</span><br><span class="line">        ;;</span><br><span class="line">    stop)</span><br><span class="line">        stop_gunicorn</span><br><span class="line">        exit $?</span><br><span class="line">        ;;</span><br><span class="line">    restart)</span><br><span class="line">        restart_gunicorn</span><br><span class="line">        exit $?</span><br><span class="line">        ;;</span><br><span class="line">    reload)</span><br><span class="line">        reload_gunicorn</span><br><span class="line">        exit $?</span><br><span class="line">        ;;</span><br><span class="line">    *)</span><br><span class="line">        echo "Usage: $0 &#123; start | stop | restart | reload &#125;"</span><br><span class="line">        exit 1</span><br><span class="line">   
     ;;</span><br><span class="line">esac</span><br></pre></td></tr></table></figure><h3 id="请求示例"><a href="#请求示例" class="headerlink" title="请求示例"></a>Request example</h3><ul><li><p>Request:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">curl http://127.0.0.1:10001/test -d '&#123;"order_id":12345&#125;'</span><br></pre></td></tr></table></figure></li><li><p>Response:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">&#123;"order_id": 12345,"uuid": "e3bd4a7e-35eb-11eb-bd7e-5254008f07ce","err_msg":"ok","err_code":0,"data": &#123;"timestamp":1589193745,"result":1&#125;&#125;</span><br></pre></td></tr></table></figure></li></ul><h2 id="小结"><a href="#小结" class="headerlink" title="小结"></a>Summary</h2><p>Above, we used <code>Werkzeug</code>, <code>gevent</code>, and <code>gunicorn</code> to build a lightweight Python service. By changing the configuration and adding the corresponding business logic, you can quickly develop a microservice for a given business need. Project source code: <a href="https://github.com/cgDeepLearn/pyserver" target="_blank" rel="noopener">https://github.com/cgDeepLearn/pyserver</a></p><p>In addition:</p><ul><li>We can use <code>supervisor</code> to monitor the service and restart it automatically.</li><li>Use <code>docker</code> for containerized management.</li><li><p>Use <a href="https://github.com/cookiecutter/cookiecutter" target="_blank" rel="noopener">cookiecutter</a> to generate configurable code.</p></li><li><p>Use a <code>serviceMesh</code> for service registration and discovery.</p></li></ul>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;Quickly develop your own small service&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;p&gt;&lt;div class=&quot;note info&quot;&gt;&lt;p&gt;&lt;/p&gt;
&lt;p&gt;Microservices are popular today, and many projects rely on small, autonomous services that work together. Python already has microservice frameworks such as &lt;code&gt;grpc&lt;/code&gt; and &lt;code&gt;nameko&lt;/code&gt;, and web frameworks such as &lt;code&gt;Flask&lt;/code&gt; and &lt;code&gt;Django&lt;/code&gt;. Even so, can we write a lightweight, reusable service framework of our own?&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;&lt;/div&gt;&lt;br&gt;
    
    </summary>
    
      <category term="Python" scheme="https://cgdeeplearn.github.io/categories/Python/"/>
    
    
      <category term="python" scheme="https://cgdeeplearn.github.io/tags/python/"/>
    
      <category term="微服务" scheme="https://cgdeeplearn.github.io/tags/%E5%BE%AE%E6%9C%8D%E5%8A%A1/"/>
    
      <category term="gevent" scheme="https://cgdeeplearn.github.io/tags/gevent/"/>
    
      <category term="gunicorn" scheme="https://cgdeeplearn.github.io/tags/gunicorn/"/>
    
      <category term="Werkzeug" scheme="https://cgdeeplearn.github.io/tags/Werkzeug/"/>
    
      <category term="ServiceMesh" scheme="https://cgdeeplearn.github.io/tags/ServiceMesh/"/>
    
  </entry>
  
  <entry>
    <title>The select Module and I/O Multiplexing in Python</title>
    <link href="https://cgdeeplearn.github.io/2018/10/31/Python-Select/"/>
    <id>https://cgdeeplearn.github.io/2018/10/31/Python-Select/</id>
    <published>2018-10-31T01:09:27.000Z</published>
    <updated>2018-10-31T08:44:14.674Z</updated>
    
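    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">The select module and I/O multiplexing<br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><p>Python's <code>select</code> module focuses on I/O multiplexing. It provides three methods, <code>select</code>, <code>poll</code>, and <code>epoll</code> (the latter two are available on Linux, while Windows supports only <code>select</code>), and it also provides <code>kqueue</code> (on BSD systems such as FreeBSD).</p><p>The select() mechanism is built on a data structure called <code>fd_set</code>, which is actually an array of <code>long</code> used as a bitmap. Each entry in it can be associated with an open file handle (a socket handle, or a handle for some other file, named pipe, or device), and setting up that association is the programmer's job. When select() is called, the kernel modifies the contents of fd_set according to the I/O state, which tells the process that called select() which sockets or files are readable or writable.</p><p><strong>select is mainly used in socket communication, where it monitors the file descriptors we care about for changes.</strong></p><a id="more"></a><h2 id="非阻塞式I-O编程特点"><a href="#非阻塞式I-O编程特点" class="headerlink" title="非阻塞式I/O编程特点"></a>Characteristics of non-blocking I/O programming</h2><ul><li>If input arrives on one I/O channel and, while it is being read, input arrives on another, nothing happens right away. The program only learns about the new data the next time it calls select.</li><li>When the program calls select and no data is available, it waits until data arrives, so the program needs no polling loop with sleep.</li></ul><p>Select is quite important in socket programming, but people who are new to sockets tend to avoid it and stick to blocking calls such as <code>connect</code>, <code>accept</code>, <code>recv</code>, or <code>recvfrom</code>. (Blocking means that a process or thread that reaches one of these functions must wait for some event to occur; until it does, the process or thread is blocked and the function cannot return.)</p><p>With select, however, we can write programs that work in a non-blocking fashion. (Non-blocking means that the process or thread does not have to wait for the event: the call always returns, and its return value reflects what happened. If the event has occurred, the behavior is the same as in blocking mode; if it has not, the call returns a code that says so and the process or thread carries on, which is more efficient.) Select can monitor the file descriptors we care about for changes: readability, writability, or exceptional conditions.</p><h2 id="Select方法"><a href="#Select方法" class="headerlink" title="Select方法"></a>The select method</h2><h3 id="基本原理"><a href="#基本原理" class="headerlink" title="基本原理"></a>Basic principle</h3><p>The process tells the kernel which events to watch on which file descriptors (at most 1024 fds). While no event has occurred on any of them, the process is blocked; when an event occurs on one or more file descriptors, the process is woken up.</p><p>When we call select():</p><ol><li>A context switch moves execution into kernel mode.</li><li>The fds are copied from user space to kernel space.</li><li>The kernel iterates over all the fds and checks whether the corresponding event has occurred on each.</li><li>If none has, the process is blocked; when a device driver raises an interrupt or the timeout expires, the process is woken up and the kernel iterates again.</li><li>The fds found during the iteration are returned.</li><li>The fds are copied from kernel space back to user space.</li></ol><h3 id="select函数方法参数"><a href="#select函数方法参数" class="headerlink" title="select函数方法参数"></a>Parameters of the select function</h3><figure class="highlight 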
python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">fd_r_list, fd_w_list, fd_e_list = select.select(rlist, wlist, xlist[, timeout])</span><br></pre></td></tr></table></figure><h4 id="参数"><a href="#参数" class="headerlink" title="参数"></a>Parameters</h4><p>It accepts four arguments (the first three are required):</p><ul><li>rlist: wait until ready for reading</li><li>wlist: wait until ready for writing</li><li>xlist: wait for an “exceptional condition”</li><li>timeout: the timeout, in seconds</li></ul><h4 id="返回值：三个列表"><a href="#返回值：三个列表" class="headerlink" title="返回值：三个列表"></a>Return value: three lists</h4><p>select monitors file descriptors (it blocks while none of them meets its condition) and returns three lists once the state of some file descriptor changes:</p><ol><li>When an fd in the first sequence becomes readable, it is added to fd_r_list.</li><li>When an fd in the second sequence is ready for writing, it is added to fd_w_list (a connected socket is usually writable, so in practice most of the fds in that sequence are returned).</li><li>When an error or exceptional condition occurs on an fd in the third sequence, that fd is added to fd_e_list.</li><li>When the timeout is omitted, select blocks until one of the monitored handles changes. When the timeout is n seconds and none of the monitored handles changes, select blocks for n seconds and then returns three empty lists; if a monitored handle does change, select returns right away.</li></ol><h3 id="示例"><a href="#示例" class="headerlink" title="示例"></a>Examples</h3><h4 id="示例1-模拟select-同时监听多个端口"><a href="#示例1-模拟select-同时监听多个端口" class="headerlink" title="示例1:模拟select,同时监听多个端口"></a>Example 1: using select to listen on multiple ports at once</h4><ul><li>Server</li></ul><figure class="highlight python"><figcaption><span>Server: select_server.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span 
class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"><span class="string">"""模拟select,同时监听多个端口"""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> socket</span><br><span class="line"><span class="keyword">import</span> select</span><br><span class="line"></span><br><span class="line">HOST = <span class="string">''</span></span><br><span class="line">PORT1, PORT2, PORT3 = <span class="number">8001</span>, <span class="number">8002</span>, <span class="number">8003</span></span><br><span class="line">BUFSIZ = <span class="number">1024</span></span><br><span class="line">ADDR1, ADDR2, ADDR3 = (HOST, PORT1), (HOST, PORT2), (HOST, PORT3)</span><br><span class="line"></span><br><span class="line">ss1 = socket.socket()</span><br><span class="line">ss1.bind(ADDR1)</span><br><span class="line">ss1.listen()</span><br><span class="line"></span><br><span class="line">ss2 = socket.socket()</span><br><span class="line">ss2.bind(ADDR2)</span><br><span class="line">ss2.listen()</span><br><span class="line"></span><br><span class="line">ss3 = socket.socket()</span><br><span class="line">ss3.bind(ADDR3)</span><br><span class="line">ss3.listen()</span><br><span class="line"></span><br><span class="line">inputs = [ss1, ss2, ss3]</span><br><span class="line"></span><br><span class="line"><span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">    r_list, w_list, e_list = 
select.select(inputs,[],inputs,<span class="number">1</span>)</span><br><span class="line">    <span class="keyword">for</span> ss <span class="keyword">in</span> r_list:</span><br><span class="line">        <span class="comment"># conn is the connection object for each client</span></span><br><span class="line">        conn, address = ss.accept()</span><br><span class="line">        conn.sendall(bytes(<span class="string">'hello'</span>, encoding=<span class="string">'utf-8'</span>))</span><br><span class="line">        conn.close()</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> ss <span class="keyword">in</span> e_list:</span><br><span class="line">        inputs.remove(ss)</span><br></pre></td></tr></table></figure><ul><li>Clients</li></ul><figure class="highlight python"><figcaption><span>Client 1: select_client1.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"><span class="string">"""Client 1"""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> socket</span><br><span class="line"></span><br><span class="line">HOST = <span class="string">'localhost'</span></span><br><span class="line">PORT = <span class="number">8001</span></span><br><span class="line">BUFSIZ = <span 
class="number">1024</span></span><br><span class="line">ADDR = (HOST, PORT)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">cs = socket.socket()</span><br><span class="line">cs.connect(ADDR)</span><br><span class="line"></span><br><span class="line">msg = cs.recv(BUFSIZ)</span><br><span class="line"></span><br><span class="line">print(msg.decode(<span class="string">'utf-8'</span>))</span><br><span class="line"></span><br><span class="line">cs.close()</span><br></pre></td></tr></table></figure><hr><figure class="highlight python"><figcaption><span>Client 2: select_client2.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"><span class="string">"""Client 2"""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> socket</span><br><span class="line"></span><br><span class="line">HOST = <span class="string">'localhost'</span></span><br><span class="line">PORT = <span class="number">8002</span></span><br><span class="line">BUFSIZ = <span class="number">1024</span></span><br><span class="line">ADDR = (HOST, PORT)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">cs = socket.socket()</span><br><span 
class="line">cs.connect(ADDR)</span><br><span class="line"></span><br><span class="line">msg = cs.recv(BUFSIZ)</span><br><span class="line"></span><br><span class="line">print(msg.decode(<span class="string">'utf-8'</span>))</span><br><span class="line"></span><br><span class="line">cs.close()</span><br></pre></td></tr></table></figure><p>Run the server and both clients: clients 1 and 2 can each connect.<br>However, this program cannot respond to input from several clients at the same time (while both client connections are still open). The next example shows I/O multiplexing that can.</p><h4 id="示例2：IO多路复用–使用socket模拟多线程，并实现读写分离"><a href="#示例2：IO多路复用–使用socket模拟多线程，并实现读写分离" class="headerlink" title="示例2：IO多路复用–使用socket模拟多线程，并实现读写分离"></a>Example 2: I/O multiplexing, using sockets to simulate multithreading with separate reads and writes</h4><ul><li>Server</li></ul><figure class="highlight python"><figcaption><span>Server: select_multi_server.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span 
class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"><span 
class="string">"""Simulate multithreading with sockets so that several users can connect at once"""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> socket</span><br><span class="line"><span class="keyword">import</span> select</span><br><span class="line"><span class="keyword">import</span> queue</span><br><span class="line"><span class="keyword">from</span> time <span class="keyword">import</span> ctime</span><br><span class="line"></span><br><span class="line">HOST = <span class="string">''</span></span><br><span class="line">PORT = <span class="number">8001</span></span><br><span class="line">BUFSIZ = <span class="number">1024</span></span><br><span class="line">ADDR = (HOST, PORT)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Create the listening socket</span></span><br><span class="line">ss = socket.socket(socket.AF_INET, socket.SOCK_STREAM)</span><br><span class="line"><span class="comment">#ss.setblocking(False)</span></span><br><span class="line"></span><br><span class="line">ss.bind(ADDR)</span><br><span class="line">ss.listen(<span class="number">5</span>)</span><br><span class="line"></span><br><span class="line">inputs = [ss, ]</span><br><span class="line">outputs = []</span><br><span class="line">message_dict = &#123;&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">while</span> inputs:</span><br><span class="line">    print(<span class="string">'waiting for the next event...'</span>)</span><br><span class="line">    r_list, w_list, e_list = select.select(inputs, outputs, inputs, <span class="number">10</span>)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> s <span class="keyword">in</span> r_list:</span><br><span class="line">        <span class="comment"># Check whether the ready socket is the server socket; if so, a new client is connecting</span></span><br><span class="line">        <span class="keyword">if</span> s <span class="keyword">is</span> ss:</span><br><span class="line">            
<span class="comment"># A new client is connecting</span></span><br><span class="line">            conn, addr = s.accept()</span><br><span class="line">            print(<span class="string">"connection from"</span>, addr)</span><br><span class="line">            <span class="comment"># Add the client socket to the monitored list; select fires when the client sends a message</span></span><br><span class="line">            <span class="comment">#conn.setblocking(0)</span></span><br><span class="line">            inputs.append(conn)</span><br><span class="line">            <span class="comment"># Create a dedicated message queue for this client to hold the messages it sends</span></span><br><span class="line">            message_dict[conn] = queue.Queue()</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            <span class="comment"># An existing client sent a message</span></span><br><span class="line">            <span class="keyword">try</span>:</span><br><span class="line">                data_bytes = s.recv(BUFSIZ)</span><br><span class="line">            <span class="comment"># The client is still connected</span></span><br><span class="line">            <span class="comment">#if data_bytes != '':</span></span><br><span class="line">                data = data_bytes.decode(<span class="string">'utf-8'</span>)</span><br><span class="line">                print(<span class="string">'received "%s" from %s'</span> % (data, s.getpeername()))</span><br><span class="line">                <span class="comment"># Put the received message on the queue of the corresponding client socket</span></span><br><span class="line">                message_dict[s].put(data)</span><br><span class="line">                <span class="comment"># Add sockets that need a reply to outputs so that select monitors them</span></span><br><span class="line">                <span class="keyword">if</span> s <span class="keyword">not</span> <span class="keyword">in</span> outputs:</span><br><span class="line">                    outputs.append(s)</span><br><span class="line">            <span class="keyword">except</span> Exception <span class="keyword">as</span> e:</span><br><span 
class="line">            <span class="comment"># else:</span></span><br><span class="line">                <span class="comment"># The client disconnected (or another error occurred); stop monitoring it in inputs</span></span><br><span class="line">                print(<span class="string">'closing'</span>, addr)</span><br><span class="line">                <span class="keyword">if</span> s <span class="keyword">in</span> outputs:</span><br><span class="line">                    outputs.remove(s)</span><br><span class="line">                inputs.remove(s)</span><br><span class="line">                s.close()</span><br><span class="line">                <span class="comment"># Remove the message queue of that client socket</span></span><br><span class="line">                <span class="keyword">del</span> message_dict[s]</span><br><span class="line"></span><br><span class="line">    <span class="comment"># Handle the sockets that are ready for writing</span></span><br><span class="line">    <span class="keyword">for</span> s <span class="keyword">in</span> w_list:</span><br><span class="line">        <span class="keyword">try</span>:</span><br><span class="line">            <span class="comment"># If the queue holds a message, fetch the one to send</span></span><br><span class="line">            message_queue = message_dict.get(s)</span><br><span class="line">            send_data = <span class="string">''</span></span><br><span class="line">            <span class="keyword">if</span> message_queue <span class="keyword">is</span> <span class="keyword">not</span> <span class="keyword">None</span>:</span><br><span class="line">                send_data = message_queue.get_nowait()</span><br><span class="line">            <span class="keyword">else</span>:</span><br><span class="line">                <span class="comment"># The client connection has been closed</span></span><br><span class="line">                print(<span class="string">'has closed'</span>)</span><br><span class="line">        <span class="keyword">except</span> queue.Empty:</span><br><span class="line">            <span class="comment"># 
Nothing left to send (the queue is empty); stop watching this socket for writes</span></span><br><span class="line">            print(s.getpeername())</span><br><span class="line">            outputs.remove(s)</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            <span class="comment"># Send the message</span></span><br><span class="line">            <span class="keyword">if</span> message_queue <span class="keyword">is</span> <span class="keyword">not</span> <span class="keyword">None</span>:</span><br><span class="line">                <span class="comment"># Prefix the received data with a timestamp and send it back</span></span><br><span class="line">                s.send((<span class="string">"[%s] %s"</span> % (ctime(), send_data)).encode(<span class="string">'utf-8'</span>))</span><br><span class="line">            <span class="keyword">else</span>:</span><br><span class="line">                print(<span class="string">"has closed"</span>)</span><br><span class="line"></span><br><span class="line">    <span class="comment"># Handle sockets in an exceptional condition</span></span><br><span class="line">    <span class="keyword">for</span> s <span class="keyword">in</span> e_list:</span><br><span class="line">        print(<span class="string">"exception condition on"</span>, s.getpeername())</span><br><span class="line">        inputs.remove(s)</span><br><span class="line">        <span class="keyword">if</span> s <span class="keyword">in</span> outputs:</span><br><span class="line">            outputs.remove(s)</span><br><span class="line"></span><br><span class="line">        s.close()</span><br><span class="line"></span><br><span class="line">        <span class="keyword">del</span> message_dict[s]</span><br></pre></td></tr></table></figure><ul><li>Client</li></ul><figure class="highlight python"><figcaption><span>Client: select_multi_client.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"><span class="string">"""Client"""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> socket</span><br><span class="line"></span><br><span class="line">HOST = <span class="string">'localhost'</span></span><br><span class="line">PORT = <span class="number">8001</span></span><br><span class="line">BUFSIZ = <span class="number">1024</span></span><br><span class="line">ADDR = (HOST, PORT)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">sock_num = <span class="number">2</span></span><br><span class="line">socks = [socket.socket(socket.AF_INET, socket.SOCK_STREAM) <span class="keyword">for</span> _ <span class="keyword">in</span> range(sock_num)]</span><br><span class="line"></span><br><span class="line">msgs = [<span class="string">"Hello"</span>, <span class="string">"I'm Robot"</span>, <span class="string">"Bye"</span>]</span><br><span class="line"></span><br><span class="line">print(<span 
class="string">"connecting to %s port %s..."</span> % ADDR)</span><br><span class="line"><span class="comment"># Connect to the server</span></span><br><span class="line"><span class="keyword">for</span> s <span class="keyword">in</span> socks:</span><br><span class="line">    s.connect(ADDR)</span><br><span class="line"></span><br><span class="line"><span class="keyword">for</span> index, msg <span class="keyword">in</span> enumerate(msgs):</span><br><span class="line">    <span class="keyword">for</span> s <span class="keyword">in</span> socks:</span><br><span class="line">        print(<span class="string">'%s: sending "%s" %d'</span> % (s.getpeername(), msg, index))</span><br><span class="line">        s.send(msg.encode(<span class="string">'utf-8'</span>))</span><br><span class="line"></span><br><span class="line"><span class="keyword">for</span> s <span class="keyword">in</span> socks:</span><br><span class="line">    data = s.recv(BUFSIZ).decode(<span class="string">"utf-8"</span>)</span><br><span class="line">    print(<span class="string">'%s: received "%s"'</span> % (s.getpeername(),data))</span><br><span class="line">    <span class="comment"># Close the connection after receiving one reply, so we can see how the server handles what follows</span></span><br><span class="line">    <span class="keyword">if</span> data != <span class="string">""</span>:</span><br><span class="line">        print(<span class="string">'closing socket'</span>, s.getsockname())</span><br><span class="line">        s.close()</span><br></pre></td></tr></table></figure><ul><li>Results</li></ul><p>Run the server and the client programs separately:</p><figure class="highlight shell"><figcaption><span>Server output</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">$</span> python select_multi_server.py</span><br><span class="line">waiting for the next event...</span><br><span class="line">connection from ('127.0.0.1', 9078)</span><br><span class="line">waiting for the next event...</span><br><span class="line">connection from ('127.0.0.1', 9079)</span><br><span class="line">received "HelloI'm Robot" from ('127.0.0.1', 9078)</span><br><span class="line">waiting for the next event...</span><br><span class="line">received "Bye" from ('127.0.0.1', 9078)</span><br><span class="line">received "HelloI'm RobotBye" from ('127.0.0.1', 9079)</span><br><span class="line">waiting for the next event...</span><br><span class="line">waiting for the next event...</span><br><span class="line">('127.0.0.1', 9078)</span><br><span class="line">('127.0.0.1', 9079)</span><br><span class="line">waiting for the next event...</span><br><span class="line">closing ('127.0.0.1', 9079)</span><br><span class="line">waiting for the next event...</span><br><span class="line">received "" from ('127.0.0.1', 9079)</span><br><span class="line">waiting for the next event...</span><br><span class="line">received "" from ('127.0.0.1', 9079)</span><br><span class="line">waiting for the next event...</span><br><span class="line">closing ('127.0.0.1', 9079)</span><br><span class="line">has closed</span><br><span class="line">has closed</span><br><span class="line">waiting for the next event...</span><br></pre></td></tr></table></figure><hr><figure class="highlight 
shell"><figcaption><span>Client output</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">$</span> python select_multi_client.py</span><br><span class="line">connecting to localhost port 8001...</span><br><span class="line">('127.0.0.1', 8001): sending "Hello" 0</span><br><span class="line">('127.0.0.1', 8001): sending "Hello" 0</span><br><span class="line">('127.0.0.1', 8001): sending "I'm Robot" 1</span><br><span class="line">('127.0.0.1', 8001): sending "I'm Robot" 1</span><br><span class="line">('127.0.0.1', 8001): sending "Bye" 2</span><br><span class="line">('127.0.0.1', 8001): sending "Bye" 2</span><br><span class="line">('127.0.0.1', 8001): received "[Wed Oct 31 09:41:05 2018] HelloI'm RobotBye"</span><br><span class="line">closing socket ('127.0.0.1', 9078)</span><br><span class="line">('127.0.0.1', 8001): received "[Wed Oct 31 09:41:05 2018] HelloI'm RobotBye"</span><br><span class="line">closing socket ('127.0.0.1', 9079)</span><br></pre></td></tr></table></figure><p>Run the programs several times and you will notice that the text after <code>received</code> in the client output varies slightly. Can you work out why? (Hint: TCP is a byte stream, so the boundaries between separate <code>send</code> calls are not preserved.)</p><h2 id="select、poll、epoll区别"><a href="#select、poll、epoll区别" class="headerlink" title="select、poll、epoll区别"></a>Differences between select, poll, and epoll</h2><p><code>select</code>, <code>poll</code>, and <code>epoll</code> are all concrete implementations of I/O multiplexing. All three exist simply because they appeared one after another.<br>Once the idea of I/O multiplexing had been proposed, select was the first implementation (in BSD, around 1983).</p><h3 id="select"><a href="#select" class="headerlink" title="select"></a>select</h3><p>Once select had been implemented, a number of problems quickly surfaced:</p><ul><li>select 
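modifies the argument arrays passed to it, which is very unfriendly for a function that has to be called many times.</li><li>Every call to select copies the fd set from user space to kernel space, which becomes expensive when there are many fds.</li><li>When any sock (I/O stream) has data, select merely returns; it does not tell you which sock has the data, so you have to check them one by one yourself. Each call also makes the kernel traverse every fd that was passed in, which is likewise expensive when there are many fds.</li><li>select can monitor only 1024 connections; on Linux the limit is defined in a header file, see <code>FD_SETSIZE</code>.</li><li>select is not thread-safe. Suppose you add a sock to select and another thread suddenly decides that the sock is no longer needed and should be reclaimed. Sorry, select does not support that. And if you go as far as closing that sock, the standard behavior of select is, well, unpredictable.</li></ul><p>So 14 years later (in 1997) a group of people implemented <code>poll</code>, which fixed many of the problems of <code>select</code>.</p><h3 id="poll"><a href="#poll" class="headerlink" title="poll"></a>poll</h3><ul><li>poll removes the 1024-connection limit, so you can have as many connections as you like.</li><li>By design, poll no longer modifies the array passed in, although this depends on your platform, so it still pays to be careful.</li></ul><p>The 14-year delay was not really about efficiency. Hardware of that era was simply so weak that a server handling more than a thousand connections was practically godlike, so <code>select</code> was good enough for a long time.</p><p>However, <code>poll</code> is still not thread-safe, which means that however powerful the server is, you can only handle one set of I/O streams in a single thread. You can of course bring in multiple processes to help, but then you inherit all the problems of multiprocessing.</p><p>So five years later, in 2002, Davide Libenzi implemented <code>epoll</code>.</p><h3 id="epoll"><a href="#epoll" class="headerlink" title="epoll"></a>epoll</h3><p><code>epoll</code> is arguably the most recent implementation of I/O multiplexing, and it fixes most of the problems of <code>poll</code> and <code>select</code>. For example:</p><ul><li>As for copying fds from user space to kernel space on every call, epoll solves this in the epoll_ctl function. Each time a new event is registered on the epoll handle (by passing EPOLL_CTL_ADD to epoll_ctl), the fd is copied into the kernel at that point, instead of being copied again on every epoll_wait. epoll ensures that each fd is copied only once over the whole process.</li><li>Likewise, epoll has no 1024-connection limit.</li><li>epoll is now thread-safe.</li><li>epoll no longer just tells you that some sock in the set has data; it also tells you exactly which socks have data, so you do not have to search for them yourself.</li></ul><p>Unlike <code>select</code> or <code>poll</code>, which add current to the wait queue of every fd's device on each call, <code>epoll</code> hooks current up only once, during <code>epoll_ctl</code> (this one pass is unavoidable), and assigns a callback function to each fd. When a device becomes ready and wakes the waiters on its wait queue, the callback is invoked and adds the ready fd to a ready list. All <code>epoll_wait</code> actually has to do is check whether this ready list contains any ready fds (it uses <code>schedule_timeout()</code> to sleep for a while and then check for a while, similar to step 7 of the select implementation).</p><p><a href="http://www.zhihu.com/question/32163005" target="_blank" rel="noopener">What Zhihu users have to say about I/O multiplexing</a></p><h3 id="select-poll-epoll总结"><a href="#select-poll-epoll总结" class="headerlink" title="select/poll, 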
epoll总结"></a>select/poll, epoll summary</h3><ol><li><p><code>select</code> and <code>poll</code> have to <strong>keep polling the entire fd set</strong> themselves until a device is ready, possibly alternating between sleeping and waking many times. epoll also has to call epoll_wait to poll the ready list repeatedly, and it too may alternate between sleeping and waking many times; but when a device becomes ready, it <strong>invokes the callback function</strong>, which puts the ready fd on the ready list and wakes the process sleeping in epoll_wait. Both approaches alternate between sleeping and waking, but while "awake", select and poll must traverse the whole fd set, whereas epoll only needs to check whether the ready list is empty. This saves a great deal of CPU time, and it is the performance gain that the callback mechanism brings.</p></li><li><p>On every call, select and poll copy the fd set from user space to kernel space and hook current onto the device wait queues once more. epoll needs only a single copy, and it hooks current onto a wait queue only once (at the start of epoll_wait; note that this wait queue is not a device wait queue but one defined internally by epoll). This also saves considerable overhead.</p></li></ol><h3 id="epoll示例1-简单时间戳服务器"><a href="#epoll示例1-简单时间戳服务器" class="headerlink" title="epoll示例1:简单时间戳服务器"></a>epoll example 1: a simple timestamp server</h3><ul><li>Server</li></ul><figure class="highlight python"><figcaption><span>Server: epoll_simple_server.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> socket</span><br><span class="line"><span 
class="keyword">import</span> select</span><br><span class="line"><span class="keyword">from</span> time <span class="keyword">import</span> ctime</span><br><span class="line"></span><br><span class="line">s = socket.socket()</span><br><span class="line">s.bind((<span class="string">'127.0.0.1'</span>,<span class="number">8888</span>))</span><br><span class="line">s.listen(<span class="number">5</span>)</span><br><span class="line">epoll_obj = select.epoll()</span><br><span class="line">epoll_obj.register(s,select.EPOLLIN)</span><br><span class="line">connections = &#123;&#125;</span><br><span class="line"><span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">    events = epoll_obj.poll()</span><br><span class="line">    <span class="keyword">for</span> fd, event <span class="keyword">in</span> events:</span><br><span class="line">        print(<span class="string">"fd : &#123;fd&#125; | event : &#123;event&#125;"</span>.format(fd=fd, event=event))</span><br><span class="line">        <span class="keyword">if</span> fd == s.fileno():</span><br><span class="line">            conn, addr = s.accept()</span><br><span class="line">            connections[conn.fileno()] = conn</span><br><span class="line">            epoll_obj.register(conn,select.EPOLLIN)</span><br><span class="line">            msg = conn.recv(<span class="number">200</span>)</span><br><span class="line">            conn.sendall((<span class="string">'OK, first input --- [%s] %s'</span>% (ctime(), msg.decode(<span class="string">'utf-8'</span>))).encode())</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            <span class="keyword">try</span>:</span><br><span class="line">                fd_obj = connections[fd]</span><br><span class="line">                msg = fd_obj.recv(<span class="number">200</span>)</span><br><span class="line">                fd_obj.sendall((<span class="string">'[%s] 
%s'</span>% (ctime(), msg.decode(<span class="string">'utf-8'</span>))).encode())</span><br><span class="line">            <span class="keyword">except</span> BrokenPipeError:</span><br><span class="line">                epoll_obj.unregister(fd)</span><br><span class="line">                connections[fd].close()</span><br><span class="line">                <span class="keyword">del</span> connections[fd]</span><br><span class="line"></span><br><span class="line">s.close()</span><br><span class="line">epoll_obj.close()</span><br></pre></td></tr></table></figure><ul><li>Client</li></ul><figure class="highlight python"><figcaption><span>Client: epoll_simple_client.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> socket</span><br><span class="line"></span><br><span class="line">flag = <span class="number">1</span></span><br><span class="line">s = socket.socket()</span><br><span class="line">s.connect((<span class="string">'127.0.0.1'</span>,<span class="number">8888</span>))</span><br><span class="line"><span class="keyword">while</span> flag:</span><br><span class="line">    input_msg = input(<span class="string">'input&gt;&gt;&gt;'</span>)</span><br><span class="line">    <span class="keyword">if</span> input_msg == <span class="string">'0'</span>:</span><br><span class="line">        <span class="keyword">break</span></span><br><span class="line">    s.sendall(input_msg.encode())</span><br><span class="line">    msg = s.recv(<span 
class="number">1024</span>)</span><br><span class="line">    print(msg.decode())</span><br><span class="line"></span><br><span class="line">s.close()</span><br></pre></td></tr></table></figure><h3 id="epoll示例2：读写分离的epoll"><a href="#epoll示例2：读写分离的epoll" class="headerlink" title="epoll示例2：读写分离的epoll"></a>epoll example 2: epoll with separate reads and writes</h3><ul><li>Server</li></ul><figure class="highlight python"><figcaption><span>Server: epoll_server.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span 
class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> socket</span><br><span class="line"><span class="keyword">import</span> select</span><br><span 
class="line"><span class="keyword">import</span> queue</span><br><span class="line"><span class="keyword">from</span> time <span class="keyword">import</span> ctime</span><br><span class="line"></span><br><span class="line"><span class="comment"># Create the listening socket</span></span><br><span class="line">ss = socket.socket(socket.AF_INET, socket.SOCK_STREAM)</span><br><span class="line"><span class="comment"># Allow the address to be reused (SO_REUSEADDR)</span></span><br><span class="line">ss.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, <span class="number">1</span>)</span><br><span class="line"><span class="comment"># Server IP address and port</span></span><br><span class="line">SADDR = (<span class="string">"127.0.0.1"</span>, <span class="number">8888</span>)</span><br><span class="line"><span class="comment"># Bind to the address</span></span><br><span class="line">ss.bind(SADDR)</span><br><span class="line"><span class="comment"># Listen; 10 is the backlog of pending connections</span></span><br><span class="line">ss.listen(<span class="number">10</span>)</span><br><span class="line">print(<span class="string">"Server started, listening on:"</span>, SADDR)</span><br><span class="line"><span class="comment"># Put the listening socket in non-blocking mode</span></span><br><span class="line">ss.setblocking(<span class="keyword">False</span>)</span><br><span class="line"><span class="comment"># Poll timeout in seconds</span></span><br><span class="line">timeout = <span class="number">10</span></span><br><span class="line"><span class="comment"># Receive buffer size in bytes</span></span><br><span class="line">BUFSIZ = <span class="number">1024</span></span><br><span class="line"><span class="comment"># Create the epoll object; the fds to monitor are registered with it</span></span><br><span class="line">epoll = select.epoll()</span><br><span class="line"><span class="comment"># Register the listening socket's fd for read events</span></span><br><span class="line">epoll.register(ss.fileno(), select.EPOLLIN)</span><br><span class="line"><span class="comment"># Per-client message queues, in the form &#123;socket: queue&#125;</span></span><br><span class="line">message_queues = &#123;&#125;</span><br><span 
class="comment"># Map from file descriptor to socket object, in the form &#123;fd: socket&#125;</span></span><br><span class="line">fd_to_socket = &#123;ss.fileno(): ss&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">    print(<span class="string">"Waiting for activity..."</span>)</span><br><span class="line">    <span class="comment"># Poll the registered fds; returns [(fd, event), (...), ...]</span></span><br><span class="line">    events = epoll.poll(timeout)</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> events:</span><br><span class="line">        print(<span class="string">"epoll timed out with no activity, polling again..."</span>)</span><br><span class="line">        <span class="keyword">continue</span></span><br><span class="line">    print(<span class="string">"&#123;num&#125; new event(s), processing..."</span>.format(num=len(events)))</span><br><span class="line">  </span><br><span class="line">    <span class="keyword">for</span> fd, event <span class="keyword">in</span> events:</span><br><span class="line">        socket = fd_to_socket[fd]  <span class="comment"># NOTE: this name shadows the socket module from here on</span></span><br><span class="line">        <span class="comment"># Activity on the listening socket means a new connection</span></span><br><span class="line">        <span class="keyword">if</span> socket == ss:</span><br><span class="line">            connection, address = ss.accept()</span><br><span class="line">            print(<span class="string">"New connection:"</span>, address)</span><br><span class="line">            <span class="comment"># Put the new connection in non-blocking mode</span></span><br><span class="line">            connection.setblocking(<span class="keyword">False</span>)</span><br><span class="line">            <span class="comment"># Register the new connection's fd for read events</span></span><br><span class="line">            epoll.register(connection.fileno(), select.EPOLLIN)</span><br><span class="line">            <span class="comment"># Record the new connection's fd and socket object in the dict</span></span><br><span class="line">            
fd_to_socket[connection.fileno()] = connection</span><br><span class="line">            <span class="comment"># Key the queue by the connection object; it holds that connection's pending messages</span></span><br><span class="line">            message_queues[connection] = queue.Queue()</span><br><span class="line">        <span class="comment"># Hang-up event</span></span><br><span class="line">        <span class="keyword">elif</span> event &amp; select.EPOLLHUP:</span><br><span class="line">            print(<span class="string">'client close'</span>)</span><br><span class="line">            <span class="comment"># Unregister the client's fd from epoll</span></span><br><span class="line">            epoll.unregister(fd)</span><br><span class="line">            <span class="comment"># Close the client socket</span></span><br><span class="line">            fd_to_socket[fd].close()</span><br><span class="line">            <span class="comment"># Drop the closed client's entries from both dicts</span></span><br><span class="line">            <span class="keyword">del</span> message_queues[fd_to_socket[fd]]</span><br><span class="line">            <span class="keyword">del</span> fd_to_socket[fd]</span><br><span class="line">        <span class="comment"># Read event</span></span><br><span class="line">        <span class="keyword">elif</span> event &amp; select.EPOLLIN:</span><br><span class="line">            <span class="comment"># Receive data</span></span><br><span class="line">            data = socket.recv(BUFSIZ)</span><br><span class="line">            <span class="keyword">if</span> data:</span><br><span class="line">                data = data.decode(<span class="string">"utf-8"</span>)</span><br><span class="line">                print(<span class="string">"Received data: &#123;data&#125;, client: &#123;client&#125;"</span>.format(data=data, client=socket.getpeername()))</span><br><span class="line">                <span class="comment"># Queue the data for this client</span></span><br><span class="line">                message_queues[socket].put(data)</span><br><span class="line">                <span 
class="comment"># Switch this connection to wait for write events, so the reply is sent once the socket is writable</span></span><br><span class="line">                epoll.modify(fd, select.EPOLLOUT)</span><br><span class="line">            <span class="keyword">else</span>:</span><br><span class="line">                <span class="comment"># An empty read means the client closed the connection, so clean up</span></span><br><span class="line">                epoll.unregister(fd)</span><br><span class="line">                socket.close()</span><br><span class="line">                <span class="keyword">del</span> message_queues[socket]</span><br><span class="line">                <span class="keyword">del</span> fd_to_socket[fd]</span><br><span class="line">        <span class="comment"># Write event</span></span><br><span class="line">        <span class="keyword">elif</span> event &amp; select.EPOLLOUT:</span><br><span class="line">            <span class="keyword">try</span>:</span><br><span class="line">                <span class="comment"># Fetch the next queued message for this client</span></span><br><span class="line">                msg = message_queues[socket].get_nowait()</span><br><span class="line">            <span class="keyword">except</span> queue.Empty:</span><br><span class="line">                print(socket.getpeername(), <span class="string">" queue empty"</span>)</span><br><span class="line">                <span class="comment"># Nothing left to send: switch back to waiting for read events</span></span><br><span class="line">                epoll.modify(fd, select.EPOLLIN)</span><br><span class="line">            <span class="keyword">else</span>:</span><br><span class="line">                print(<span class="string">"Sending data: &#123;data&#125;, client: &#123;client&#125;"</span>.format(data=msg, client=socket.getpeername()))</span><br><span class="line">                <span class="comment"># Send the message with a timestamp prefix</span></span><br><span class="line">                socket.send((<span class="string">'[%s] %s'</span>% (ctime(), msg)).encode())</span><br><span class="line"></span><br><span class="line"><span class="comment"># Cleanup (not reached while the loop above runs forever): unregister the server fd from epoll</span></span><br><span class="line">epoll.unregister(ss.fileno())</span><br><span class="line"><span class="comment"># Close epoll</span></span><br><span class="line">epoll.close()</span><br><span class="line"><span class="comment"># Close the server socket</span></span><br><span class="line">ss.close()</span><br></pre></td></tr></table></figure><ul><li>Client</li></ul><figure class="highlight python"><figcaption><span>Client: epoll_client.py</span></figcaption><table><tr><td class="code"><pre><span class="line"><span class="keyword">import</span> socket</span><br><span class="line"></span><br><span class="line"><span class="comment"># Create the client socket</span></span><br><span class="line">cs = socket.socket(socket.AF_INET, socket.SOCK_STREAM)</span><br><span class="line"><span class="comment"># Server address as an (IP, port) tuple</span></span><br><span class="line">server_address = (<span class="string">'127.0.0.1'</span>, <span class="number">8888</span>)</span><br><span class="line"><span class="comment"># Connect to the server</span></span><br><span class="line">cs.connect(server_address)</span><br><span class="line">BUFSIZE = <span class="number">1024</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">    <span class="comment"># Read a line of input; an empty line ends the session</span></span><br><span class="line">    data = input(<span class="string">'input&gt;'</span>)</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> data:</span><br><span class="line">        <span class="keyword">break</span></span><br><span class="line">    <span class="comment"># Send the data to the server</span></span><br><span class="line">    cs.sendall(data.encode(<span 
class="string">'utf-8'</span>))</span><br><span class="line">    <span class="comment"># Receive the server's reply</span></span><br><span class="line">    server_data = cs.recv(BUFSIZE)</span><br><span class="line">    print(<span class="string">'Data received by client:'</span>, server_data.decode())</span><br><span class="line"></span><br><span class="line"><span class="comment"># Close the client socket</span></span><br><span class="line">cs.close()</span><br></pre></td></tr></table></figure><h2 id="小结"><a href="#小结" class="headerlink" title="小结"></a>Summary</h2><p>This article summarized three I/O multiplexing mechanisms, select, poll, and epoll, and used Python's select module to build a timestamp server and client on top of them.</p>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;The select module and I/O multiplexing&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;p&gt;Python's &lt;code&gt;select&lt;/code&gt; module is dedicated to I/O multiplexing. It provides three methods, &lt;code&gt;select&lt;/code&gt;, &lt;code&gt;poll&lt;/code&gt;, and &lt;code&gt;epoll&lt;/code&gt; (the latter two are available on Linux, while Windows supports only select), and it also provides &lt;code&gt;kqueue&lt;/code&gt; (on FreeBSD).&lt;/p&gt;
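&lt;p&gt;The select() mechanism is built on a data structure called &lt;code&gt;fd_set&lt;/code&gt;, which in practice is an array of &lt;code&gt;long&lt;/code&gt; values used as a bit set. Each bit can be associated with one open file handle (a socket, a regular file, a named pipe, or a device), and setting up that association is the programmer's job. When select() is called, the kernel modifies the contents of fd_set according to the I/O state, which tells the calling process which sockets or files are readable or writable.&lt;/p&gt;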
&lt;p&gt;&lt;strong&gt;select is used mainly in socket communication, where it monitors the file descriptors we care about for changes.&lt;/strong&gt;&lt;/p&gt;
    
    </summary>
    
      <category term="Python" scheme="https://cgdeeplearn.github.io/categories/Python/"/>
    
    
      <category term="python" scheme="https://cgdeeplearn.github.io/tags/python/"/>
    
      <category term="select" scheme="https://cgdeeplearn.github.io/tags/select/"/>
    
      <category term="socket" scheme="https://cgdeeplearn.github.io/tags/socket/"/>
    
      <category term="IO多路复用" scheme="https://cgdeeplearn.github.io/tags/IO%E5%A4%9A%E8%B7%AF%E5%A4%8D%E7%94%A8/"/>
    
      <category term="epoll" scheme="https://cgdeeplearn.github.io/tags/epoll/"/>
    
  </entry>
  
  <entry>
    <title>Python Socket Programming</title>
    <link href="https://cgdeeplearn.github.io/2018/10/24/Socket/"/>
    <id>https://cgdeeplearn.github.io/2018/10/24/Socket/</id>
    <published>2018-10-24T06:59:09.000Z</published>
    <updated>2018-10-24T07:48:02.565Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Sockets<br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><div class="note info"><p><br><br><code>socket</code> originated on Unix, and one of the core tenets of the Unix/Linux philosophy is that "everything is a file": a file is operated on with an open, read/write, close pattern.<br>A socket is one implementation of that pattern. A <code>socket</code> is a special kind of file, and the socket functions are operations on it (read/write I/O, open, close).<br><br>- Differences between socket and file<br><br>1. A file object opens, reads/writes, and closes one specific file<br><br>2. The socket module opens, reads/writes, and closes server-side and client-side sockets<br><br>Below we implement a timestamp server and client in several different ways: <code>TCP</code>, <code>UDP</code>, <code>SocketServer TCP</code>, and <code>Twisted Reactor TCP</code><br><br></p></div><a id="more"></a><h2 id="TCP时间戳服务"><a href="#TCP时间戳服务" class="headerlink" title="TCP时间戳服务"></a>TCP timestamp service</h2><h3 id="TCP服务器端"><a href="#TCP服务器端" class="headerlink" title="TCP服务器端"></a>TCP server</h3><h4 id="TCP服务器端设计方式伪代码"><a href="#TCP服务器端设计方式伪代码" class="headerlink" title="TCP服务器端设计方式伪代码"></a>TCP server design pseudocode</h4><figure class="highlight python"><table><tr><td class="code"><pre><span class="line">ss = socket()               <span class="comment"># create the server socket</span></span><br><span class="line">ss.bind()                   <span class="comment"># bind the socket to an address</span></span><br><span class="line">ss.listen()                 <span class="comment"># listen for connections</span></span><br><span class="line">inf_loop:                   <span class="comment"># server infinite loop</span></span><br><span class="line">    cs = ss.accept()        <span class="comment"># accept a client connection</span></span><br><span class="line">    comm_loop:              <span class="comment"># communication loop</span></span><br><span class="line">        
cs.recv()/cs.send() <span class="comment"># dialog (receive/send)</span></span><br><span class="line">    cs.close()              <span class="comment"># close the client socket</span></span><br><span class="line">ss.close()                  <span class="comment"># close the server socket (optional)</span></span><br></pre></td></tr></table></figure><h4 id="创建TCP时间戳服务器"><a href="#创建TCP时间戳服务器" class="headerlink" title="创建TCP时间戳服务器"></a>Creating the TCP timestamp server</h4><figure class="highlight python"><figcaption><span>TCP timestamp server (tsTserv.py)</span></figcaption><table><tr><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"><span class="string">Create a TCP server that accepts messages from clients, prefixes each message with a timestamp, and sends it back to the client</span></span><br><span 
class="line"><span class="string">"""</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> socket <span class="keyword">import</span> *</span><br><span class="line"><span class="keyword">from</span> time <span class="keyword">import</span> ctime</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">HOST = <span class="string">''</span>  <span class="comment"># empty string: bind to all interfaces</span></span><br><span class="line">PORT = <span class="number">21567</span></span><br><span class="line">BUFSIZ = <span class="number">1024</span></span><br><span class="line">ADDR = (HOST, PORT)</span><br><span class="line"></span><br><span class="line">tcpSerSock = socket(AF_INET, SOCK_STREAM)</span><br><span class="line">tcpSerSock.bind(ADDR)</span><br><span class="line">tcpSerSock.listen(<span class="number">5</span>)</span><br><span class="line"></span><br><span class="line"><span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">    print(<span class="string">"waiting for connection..."</span>)</span><br><span class="line">    tcpCliSock, addr = tcpSerSock.accept()</span><br><span class="line">    print(<span class="string">"...connected from "</span>, addr)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">        data = tcpCliSock.recv(BUFSIZ)</span><br><span class="line">        <span class="keyword">if</span> <span class="keyword">not</span> data:  <span class="comment"># empty read: the client disconnected</span></span><br><span class="line">            <span class="keyword">break</span></span><br><span class="line">        tcpCliSock.sendall(bytes(<span class="string">'[%s] %s'</span> % (ctime(), data.decode(<span class="string">'utf-8'</span>)), <span class="string">'utf-8'</span>))</span><br><span class="line">    tcpCliSock.close()</span><br><span class="line"></span><br><span 
class="line">tcpSerSock.close()</span><br></pre></td></tr></table></figure><h3 id="TCP客户端"><a href="#TCP客户端" class="headerlink" title="TCP客户端"></a>TCP client</h3><h4 id="客户端伪代码"><a href="#客户端伪代码" class="headerlink" title="客户端伪代码"></a>Client pseudocode</h4><figure class="highlight python"><table><tr><td class="code"><pre><span class="line">cs = socket()           <span class="comment"># create the client socket</span></span><br><span class="line">cs.connect()            <span class="comment"># try to connect to the server</span></span><br><span class="line">comm_loop:              <span class="comment"># communication loop</span></span><br><span class="line">    cs.send()/cs.recv() <span class="comment"># dialog (send/receive)</span></span><br><span class="line">cs.close()              <span class="comment"># close the client socket</span></span><br></pre></td></tr></table></figure><h4 id="创建TCP时间戳客户端"><a href="#创建TCP时间戳客户端" class="headerlink" title="创建TCP时间戳客户端"></a>Creating the TCP timestamp client</h4><figure class="highlight python"><figcaption><span>TCP timestamp client (tsTclnt.py)</span></figcaption><table><tr><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"><span class="string">TCP timestamp client</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> socket <span class="keyword">import</span> *</span><br><span class="line"></span><br><span class="line">HOST = <span class="string">'localhost'</span></span><br><span class="line">PORT = <span class="number">21567</span></span><br><span class="line">BUFSIZ = <span class="number">1024</span></span><br><span class="line">ADDR = (HOST, PORT)</span><br><span class="line"></span><br><span class="line">tcpCliSock = socket(AF_INET, SOCK_STREAM)</span><br><span class="line">tcpCliSock.connect(ADDR)</span><br><span class="line"></span><br><span class="line"><span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">    data = input(<span class="string">'&gt; '</span>)</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> data:</span><br><span class="line">        <span class="keyword">break</span></span><br><span class="line">    tcpCliSock.sendall(bytes(data, <span class="string">'utf-8'</span>))</span><br><span class="line">    data = tcpCliSock.recv(BUFSIZ)</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> data:</span><br><span class="line">        <span class="keyword">break</span></span><br><span class="line">    print(data.decode(<span class="string">'utf-8'</span>))</span><br><span class="line"></span><br><span class="line"></span><br><span 
class="line">tcpCliSock.close()</span><br></pre></td></tr></table></figure><h3 id="执行TCP服务器和客户端"><a href="#执行TCP服务器和客户端" class="headerlink" title="执行TCP服务器和客户端"></a>Running the TCP server and client</h3><ul><li>Server:</li></ul><figure class="highlight sh"><table><tr><td class="code"><pre><span class="line">$ python tsTserv.py</span><br><span class="line">waiting for connection...</span><br><span class="line">...connected from ('127.0.0.1', 28182)</span><br><span class="line">waiting for connection...</span><br></pre></td></tr></table></figure><ul><li>Client:</li></ul><figure class="highlight sh"><table><tr><td class="code"><pre><span class="line">$ python tsTclnt.py</span><br><span class="line">&gt; Hi</span><br><span class="line">[Wed Oct 24 10:39:43 2018] Hi</span><br><span class="line">&gt; I'm Jack</span><br><span class="line">[Wed Oct 24 10:39:47 2018] I'm Jack</span><br><span class="line">&gt;</span><br></pre></td></tr></table></figure><h2 id="UDP时间戳服务器"><a href="#UDP时间戳服务器" class="headerlink" title="UDP时间戳服务器"></a>UDP timestamp server</h2><h3 id="UDP服务器端"><a href="#UDP服务器端" class="headerlink" title="UDP服务器端"></a>UDP server</h3><h4 id="TCP服务器端设计方式伪代码-1"><a href="#TCP服务器端设计方式伪代码-1" class="headerlink" title="TCP服务器端设计方式伪代码"></a>UDP server design pseudocode</h4><figure class="highlight python"><table><tr><td 
class="code"><pre><span class="line">ss = socket()                   <span class="comment"># create the server socket</span></span><br><span class="line">ss.bind()                       <span class="comment"># bind the server socket</span></span><br><span class="line">inf_loop:                       <span class="comment"># server infinite loop</span></span><br><span class="line">    ss.recvfrom()/ss.sendto()   <span class="comment"># receive/send</span></span><br><span class="line">ss.close()                      <span class="comment"># close the server socket</span></span><br></pre></td></tr></table></figure><h4 id="创建UDP服务器"><a href="#创建UDP服务器" class="headerlink" title="创建UDP服务器"></a>Creating the UDP server</h4><figure class="highlight python"><figcaption><span>UDP timestamp server (tsUserv.py)</span></figcaption><table><tr><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"><span class="string">UDP TimeStamp server</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"></span><br><span 
class="line"><span class="keyword">from</span> socket <span class="keyword">import</span> *</span><br><span class="line"><span class="keyword">from</span> time <span class="keyword">import</span> ctime</span><br><span class="line"></span><br><span class="line">HOST = <span class="string">''</span></span><br><span class="line">PORT = <span class="number">21567</span></span><br><span class="line">BUFSIZ = <span class="number">1024</span></span><br><span class="line">ADDR = (HOST, PORT)</span><br><span class="line"></span><br><span class="line">udpSerSock = socket(AF_INET, SOCK_DGRAM)</span><br><span class="line">udpSerSock.bind(ADDR)</span><br><span class="line"></span><br><span class="line"><span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">    print(<span class="string">"waiting for message..."</span>)</span><br><span class="line">    data, addr = udpSerSock.recvfrom(BUFSIZ)</span><br><span class="line">    udpSerSock.sendto(bytes(<span class="string">'[%s] %s'</span> %</span><br><span class="line">                            (ctime(), data.decode(<span class="string">'utf-8'</span>)), <span class="string">'utf-8'</span>), addr)</span><br><span class="line">    print(<span class="string">"...received from and returned to:"</span>, addr)</span><br><span class="line"></span><br><span class="line">udpSerSock.close()</span><br></pre></td></tr></table></figure><h3 id="UDP客户端"><a href="#UDP客户端" class="headerlink" title="UDP客户端"></a>UDP client</h3><h4 id="UDP客户端端设计方式伪代码"><a href="#UDP客户端端设计方式伪代码" class="headerlink" title="UDP客户端端设计方式伪代码"></a>UDP client design pseudocode</h4><figure class="highlight python"><table><tr><td class="code"><pre><span class="line">cs = socket()                   <span class="comment"># create the client socket</span></span><br><span class="line">comm_loop:                      
<span class="comment"># communication loop</span></span><br><span class="line">    cs.sendto()/cs.recvfrom()   <span class="comment"># dialog (send/receive)</span></span><br><span class="line">cs.close()                      <span class="comment"># close the client socket</span></span><br></pre></td></tr></table></figure><h4 id="创建UDP客户端"><a href="#创建UDP客户端" class="headerlink" title="创建UDP客户端"></a>Creating the UDP client</h4><figure class="highlight python"><figcaption><span>UDP timestamp client (tsUclnt.py)</span></figcaption><table><tr><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"><span class="string">UDP TimeStamp Client</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> socket <span class="keyword">import</span> *</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">HOST = <span 
class="string">'localhost'</span></span><br><span class="line">PORT = <span class="number">21567</span></span><br><span class="line">BUFSIZ = <span class="number">1024</span></span><br><span class="line">ADDR = (HOST, PORT)</span><br><span class="line"></span><br><span class="line">udpSerSock = socket(AF_INET, SOCK_DGRAM)</span><br><span class="line"></span><br><span class="line"><span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">    data = input(<span class="string">'&gt; '</span>)</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> data:</span><br><span class="line">        <span class="keyword">break</span></span><br><span class="line">    udpSerSock.sendto(bytes(data,<span class="string">'utf-8'</span>), ADDR)</span><br><span class="line">    data, ADDR = udpSerSock.recvfrom(BUFSIZ)</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> data:</span><br><span class="line">        <span class="keyword">break</span></span><br><span class="line">    print(data.decode(<span class="string">'utf-8'</span>))</span><br><span class="line"></span><br><span class="line">udpSerSock.close()</span><br></pre></td></tr></table></figure><h2 id="SocketServer时间戳"><a href="#SocketServer时间戳" class="headerlink" title="SocketServer时间戳"></a>SocketServer时间戳</h2><h3 id="服务器端"><a href="#服务器端" class="headerlink" title="服务器端"></a>服务器端</h3><figure class="highlight python"><figcaption><span>SocketServer时间戳TCP服务器(tsTservSS.py)</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"><span class="string">SocketServer timestamp TCP server</span></span><br><span class="line"><span class="string">Uses the socketserver classes TCPServer and StreamRequestHandler</span></span><br><span class="line"><span class="string">Forking and multithreading</span></span><br><span class="line"><span class="string">Windows does not support forking</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> socketserver <span class="keyword">import</span> (TCPServer <span class="keyword">as</span> TCP, StreamRequestHandler <span class="keyword">as</span> SRH,</span><br><span class="line">                          ForkingMixIn <span class="keyword">as</span> FMI, ThreadingMixIn <span class="keyword">as</span> TMI)</span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> 
time <span class="keyword">import</span> ctime</span><br><span class="line"></span><br><span class="line">HOST = <span class="string">''</span></span><br><span class="line">PORT = <span class="number">21567</span></span><br><span class="line">ADDR = (HOST, PORT)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">FServer</span><span class="params">(FMI, TCP)</span>:</span></span><br><span class="line">    <span class="keyword">pass</span></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">TServer</span><span class="params">(TMI, TCP)</span>:</span></span><br><span class="line">    <span class="keyword">pass</span></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">MyRequestHandler</span><span class="params">(SRH)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">handle</span><span class="params">(self)</span>:</span></span><br><span class="line">        print(<span class="string">"...connected from:"</span>, self.client_address)</span><br><span class="line">        self.wfile.write(</span><br><span class="line">            (<span class="string">"[%s] %s"</span> % (ctime(), self.rfile.readline().decode(<span class="string">'utf-8'</span>))).encode(<span class="string">'utf-8'</span>))</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">tcpServ = TServer(ADDR, MyRequestHandler) <span class="comment"># alternatives: TCP, FServer, TServer</span></span><br><span class="line">print(<span class="string">"waiting for connection..."</span>)</span><br><span class="line"></span><br><span class="line">tcpServ.serve_forever()</span><br></pre></td></tr></table></figure><h3 id="客户端"><a href="#客户端" 
class="headerlink" title="客户端"></a>Client</h3><figure class="highlight python"><figcaption><span>SocketServer timestamp TCP client (tsTclntSS.py)</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"><span class="string">SocketServer timestamp TCP client</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> socket <span class="keyword">import</span> *</span><br><span class="line"></span><br><span class="line">HOST = <span class="string">'localhost'</span></span><br><span class="line">PORT = <span class="number">21567</span></span><br><span class="line">BUFSIZ = <span class="number">1024</span></span><br><span class="line">ADDR = (HOST, 
PORT)</span><br><span class="line"></span><br><span class="line"><span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">    tcpCliSock = socket(AF_INET, SOCK_STREAM)</span><br><span class="line">    tcpCliSock.connect(ADDR)</span><br><span class="line">    data = input(<span class="string">'&gt; '</span>)</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> data:</span><br><span class="line">        <span class="keyword">break</span></span><br><span class="line">    tcpCliSock.send((<span class="string">"%s\r\n"</span> % data).encode(<span class="string">'utf-8'</span>))</span><br><span class="line">    data = tcpCliSock.recv(BUFSIZ).decode(<span class="string">'utf-8'</span>)</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> data:</span><br><span class="line">        <span class="keyword">break</span></span><br><span class="line">    print(data.strip())</span><br><span class="line">    tcpCliSock.close()</span><br></pre></td></tr></table></figure><h2 id="Twisted-Reactor-TCP时间戳"><a href="#Twisted-Reactor-TCP时间戳" class="headerlink" title="Twisted Reactor TCP时间戳"></a>Twisted Reactor TCP Timestamp</h2><h3 id="服务器端-1"><a href="#服务器端-1" class="headerlink" title="服务器端"></a>Server</h3><figure class="highlight python"><figcaption><span>Twisted Reactor timestamp TCP server (tsTservTW.py)</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span 
class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"><span class="string">Twisted Reactor timestamp TCP server, built on the twisted.internet classes</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> twisted.internet <span class="keyword">import</span> protocol, reactor</span><br><span class="line"><span class="keyword">from</span> time <span class="keyword">import</span> ctime</span><br><span class="line"></span><br><span class="line">PORT = <span class="number">21567</span></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">TSServProtocol</span><span class="params">(protocol.Protocol)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">connectionMade</span><span class="params">(self)</span>:</span></span><br><span class="line">        clnt = self.clnt = self.transport.getPeer().host</span><br><span class="line">        print(<span class="string">"...connected from:"</span>, clnt)</span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">dataReceived</span><span class="params">(self, data)</span>:</span></span><br><span class="line">        self.transport.write((<span class="string">"[%s] %s"</span> % (ctime(), data.decode(<span class="string">'utf-8'</span>))).encode(<span 
class="string">"utf-8"</span>))</span><br><span class="line"></span><br><span class="line">factory = protocol.Factory()</span><br><span class="line">factory.protocol = TSServProtocol</span><br><span class="line">print(<span class="string">"waiting for connection..."</span>)</span><br><span class="line">reactor.listenTCP(PORT, factory)</span><br><span class="line">reactor.run()</span><br></pre></td></tr></table></figure><h3 id="客户端-1"><a href="#客户端-1" class="headerlink" title="客户端"></a>Client</h3><figure class="highlight python"><figcaption><span>Twisted Reactor timestamp TCP client (tsTclntTW.py)</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">#!/usr/bin/env python</span></span><br><span class="line"><span class="comment"># coding=utf-8</span></span><br><span class="line"></span><br><span class="line"><span 
class="string">"""</span></span><br><span class="line"><span class="string">Twisted Reactor timestamp TCP client</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> twisted.internet <span class="keyword">import</span> protocol, reactor</span><br><span class="line"></span><br><span class="line">PORT = <span class="number">21567</span></span><br><span class="line">HOST = <span class="string">'localhost'</span></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">TSClntProtocol</span><span class="params">(protocol.Protocol)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">sendData</span><span class="params">(self)</span>:</span></span><br><span class="line">        data = input(<span class="string">'&gt; '</span>)</span><br><span class="line">        <span class="keyword">if</span> data:</span><br><span class="line">            print(<span class="string">"...sending %s..."</span> % data)</span><br><span class="line">            self.transport.write(data.encode(<span class="string">'utf-8'</span>))</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            self.transport.loseConnection()</span><br><span class="line">    </span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">connectionMade</span><span class="params">(self)</span>:</span></span><br><span class="line">        self.sendData()</span><br><span class="line">    </span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">dataReceived</span><span class="params">(self, data)</span>:</span></span><br><span class="line">        print(data.decode())</span><br><span class="line">        
self.sendData()</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">TSClntFactory</span><span class="params">(protocol.ClientFactory)</span>:</span></span><br><span class="line">    protocol = TSClntProtocol</span><br><span class="line">    clientConnectionLost = clientConnectionFailed = <span class="keyword">lambda</span> self, connector, reason: reactor.stop()</span><br><span class="line"></span><br><span class="line">reactor.connectTCP(HOST, PORT, TSClntFactory())</span><br><span class="line">reactor.run()</span><br></pre></td></tr></table></figure><h2 id="小结"><a href="#小结" class="headerlink" title="小结"></a>Summary</h2><p>We have implemented a timestamp server and client in several ways. Next time we will look at I/O multiplexing and Python's <code>select</code> module.</p>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;Sockets&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;div class=&quot;note info&quot;&gt;&lt;p&gt;&lt;br&gt;&lt;br&gt;&lt;code&gt;socket&lt;/code&gt; originated in Unix, and one of the basic philosophies of Unix/Linux is “everything is a file”: a file is operated on with the open, read/write, close pattern.&lt;br&gt;A socket is one implementation of that pattern: a &lt;code&gt;socket&lt;/code&gt; is a special kind of file, and the socket functions are the operations on it (read/write I/O, open, close).&lt;br&gt;&lt;br&gt;- Differences between socket and file&lt;br&gt;&lt;br&gt;1. The file module opens, reads/writes, and closes a specified file.&lt;br&gt;&lt;br&gt;2. The socket module opens, reads/writes, and closes server-side and client-side sockets.&lt;br&gt;&lt;br&gt;Below we implement a timestamp server and client in several ways: &lt;code&gt;TCP&lt;/code&gt;, &lt;code&gt;UDP&lt;/code&gt;, &lt;code&gt;SocketServer TCP&lt;/code&gt;, and &lt;code&gt;Twisted Reactor TCP&lt;/code&gt;.&lt;br&gt;&lt;br&gt;&lt;/p&gt;&lt;/div&gt;
    
    </summary>
    
      <category term="Python" scheme="https://cgdeeplearn.github.io/categories/Python/"/>
    
    
      <category term="python" scheme="https://cgdeeplearn.github.io/tags/python/"/>
    
      <category term="socket" scheme="https://cgdeeplearn.github.io/tags/socket/"/>
    
      <category term="TCP" scheme="https://cgdeeplearn.github.io/tags/TCP/"/>
    
      <category term="UDP" scheme="https://cgdeeplearn.github.io/tags/UDP/"/>
    
      <category term="SocketServer" scheme="https://cgdeeplearn.github.io/tags/SocketServer/"/>
    
      <category term="Twisted" scheme="https://cgdeeplearn.github.io/tags/Twisted/"/>
    
  </entry>
  
  <entry>
    <title>An Introduction to gRPC and Using It in Python</title>
    <link href="https://cgdeeplearn.github.io/2018/10/22/gRPC/"/>
    <id>https://cgdeeplearn.github.io/2018/10/22/gRPC/</id>
    <published>2018-10-22T01:10:35.000Z</published>
    <updated>2018-10-29T08:33:54.741Z</updated>
    
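    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description"><br><br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><p>gRPC is a high-performance, open-source, general-purpose RPC framework designed for mobile and HTTP/2. It is currently available as C, Java, and Go implementations: grpc, grpc-java, and grpc-go. The C implementation supports C, C++, Node.js, Python, Ruby, Objective-C, PHP, and C#.</p><p>gRPC is built on the HTTP/2 standard, which brings features such as bidirectional streaming, flow control, header compression, and multiplexing of requests over a single TCP connection. These features make it perform better on mobile devices, saving power and space.<br><a id="more"></a></p><h2 id="简介"><a href="#简介" class="headerlink" title="简介"></a>Introduction</h2><h3 id="gRPC是什么"><a href="#gRPC是什么" class="headerlink" title="gRPC是什么"></a>What Is gRPC</h3><p>In gRPC, a client application can directly call a method on a server application on a different machine as if it were a local object, which makes it easier to build distributed applications and services. Like many RPC systems, gRPC is based on the idea of defining a service and specifying the methods that can be called remotely, with their parameters and return types. The server implements this interface and runs a gRPC server to handle client calls. The client has a stub that provides the same methods as the server.</p><p><img src="http://www.grpc.io/img/grpc_concept_diagram_00.png" alt="grpc_concept_diagram_00"></p><p>gRPC clients and servers can run and talk to each other in a variety of environments, from servers inside Google to your own laptop, and can be written in any language gRPC supports. So you can easily create a gRPC server in Java with clients in Go, Python, or Ruby. In addition, the latest Google APIs will have gRPC versions of their interfaces, making it easy to build Google functionality into your applications.</p><h3 id="使用protocol-buffers"><a href="#使用protocol-buffers" class="headerlink" title="使用protocol buffers"></a>Using protocol buffers</h3><p>By default gRPC uses <code>protocol buffers</code>, Google's mature open-source <code>mechanism for serializing structured data</code> (although other data formats such as JSON can also be used). As you will see in the example below, you define gRPC services in proto files, with protocol buffers message types as the method parameters and return types. You can find more about Protocol Buffers in the Protocol Buffers documentation.</p><h2 id="安装"><a href="#安装" class="headerlink" title="安装"></a>Installation</h2><h3 id="准备Python环境"><a href="#准备Python环境" class="headerlink" title="准备Python环境"></a>Preparing the Python Environment</h3><figure class="highlight sh"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">$ python -m pip 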
install --upgrade pip</span><br><span class="line"></span><br><span class="line">$ python -m pip install virtualenv</span><br><span class="line">$ virtualenv venv</span><br><span class="line">$ <span class="built_in">source</span> venv/bin/activate</span><br><span class="line">$ python -m pip install --upgrade pip</span><br></pre></td></tr></table></figure><h3 id="安装gRPC"><a href="#安装gRPC" class="headerlink" title="安装gRPC"></a>Installing gRPC</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># run inside the virtual environment activated above</span></span><br><span class="line">pip install grpcio</span><br></pre></td></tr></table></figure><p>Also install the gRPC tools:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">pip install grpcio-tools googleapis-common-protos</span><br></pre></td></tr></table></figure><h2 id="示例"><a href="#示例" class="headerlink" title="示例"></a>Example</h2><h3 id="下载官方例子"><a href="#下载官方例子" class="headerlink" title="下载官方例子"></a>Downloading the Official Example</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">$ git clone -b v1.15.0 https://github.com/grpc/grpc</span><br><span class="line"># Navigate to the &quot;hello, world&quot; Python example:</span><br><span class="line">$ cd grpc/examples/python/helloworld</span><br></pre></td></tr></table></figure><h3 id="运行一个gRPC应用"><a href="#运行一个gRPC应用" class="headerlink" title="运行一个gRPC应用"></a>Running a gRPC Application</h3><p>In the <code>examples/python/helloworld</code> directory:</p><ol><li>Run the server:</li></ol><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">python 
greeter_server.py</span><br></pre></td></tr></table></figure><ol start="2"><li>In another terminal, run the client:</li></ol><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">python greeter_client.py</span><br></pre></td></tr></table></figure><p>Congratulations! You’ve just run a client-server application with gRPC.</p><h2 id="python编写一个RPC服务完整过程"><a href="#python编写一个RPC服务完整过程" class="headerlink" title="python编写一个RPC服务完整过程"></a>Writing an RPC Service in Python, Step by Step</h2><h3 id="定义服务"><a href="#定义服务" class="headerlink" title="定义服务"></a>Defining the Service</h3><p>The first step in building our example is to define a service: an RPC service specifies the methods that can be called remotely, with their parameters and return types. gRPC does this with <code>protocol buffers</code>.<br>We use the protocol buffers interface definition language to define the service methods, and protocol buffer messages to define the parameter and return types. Both the client and the server use interface code generated from the service definition.</p><p>Here is our example service definition, written in the protocol buffers IDL in <code>helloworld.proto</code>. The <code>Greeter</code> service has one method, <code>SayHello</code>, which lets the server receive a <code>HelloRequest</code> message containing the user's name from a remote client and send back a greeting in a <code>HelloReply</code>. This is the simplest kind of RPC you can specify in gRPC; the tutorials for your chosen language have examples of the other kinds.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line">syntax = <span 
class="string">"proto3"</span>;</span><br><span class="line"></span><br><span class="line">option java_multiple_files = <span class="keyword">true</span>;</span><br><span class="line">option java_package = <span class="string">"io.grpc.examples.helloworld"</span>;</span><br><span class="line">option java_outer_classname = <span class="string">"HelloWorldProto"</span>;</span><br><span class="line">option objc_class_prefix = <span class="string">"HLW"</span>;</span><br><span class="line"></span><br><span class="line"><span class="keyword">package</span> helloworld;</span><br><span class="line"></span><br><span class="line"><span class="comment">// The greeting service definition.</span></span><br><span class="line">service Greeter &#123;</span><br><span class="line">  <span class="comment">// Sends a greeting</span></span><br><span class="line">  <span class="function">rpc <span class="title">SayHello</span> <span class="params">(HelloRequest)</span> <span class="title">returns</span> <span class="params">(HelloReply)</span> </span>&#123;&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// The request message containing the user's name.</span></span><br><span class="line">message HelloRequest &#123;</span><br><span class="line">  string name = <span class="number">1</span>;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// The response message containing the greetings</span></span><br><span class="line">message HelloReply &#123;</span><br><span class="line">  string message = <span class="number">1</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="生成gRPC"><a href="#生成gRPC" class="headerlink" title="生成gRPC"></a>Generating the gRPC Code</h3><p>Once the service is defined, we can use the protocol buffer compiler protoc to generate the client and server code our application needs. You can generate code in any language gRPC supports, although PHP and Objective-C only support generating client code. The generated code contains both the client stub and the abstract interface for the server to implement, each with the methods defined by the Greeter service.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">python -m grpc_tools.protoc -I . --python_out=. --grpc_python_out=. helloworld.proto</span><br><span class="line"><span class="comment"># helloworld.proto is the proto file we wrote above</span></span><br></pre></td></tr></table></figure><p>This generates two files, <code>helloworld_pb2.py</code> and <code>helloworld_pb2_grpc.py</code>, which contain the generated client and server classes as well as classes for populating, serializing, and retrieving the <code>HelloRequest</code> and <code>HelloReply</code> message types.</p><h3 id="编写服务器端代码"><a href="#编写服务器端代码" class="headerlink" title="编写服务器端代码"></a>Writing the Server Code</h3><h4 id="服务实现"><a href="#服务实现" class="headerlink" title="服务实现"></a>Service Implementation</h4><p><code>greeter_server.py</code> implements the behavior the <code>Greeter</code> service needs.<br>As you can see, the Greeter class implements the <code>helloworld_pb2_grpc.GreeterServicer</code> interface generated from the proto service definition by implementing the <code>SayHello</code> method:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Greeter</span><span class="params">(helloworld_pb2_grpc.GreeterServicer)</span>:</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">SayHello</span><span class="params">(self, request, context)</span>:</span></span><br><span class="line">        <span class="keyword">return</span> helloworld_pb2.HelloReply(message=<span class="string">'Hello, %s!'</span> % request.name)</span><br></pre></td></tr></table></figure><p>To return the response to the client and complete the call:</p><p>Build and populate a <code>HelloReply</code> response object, as defined in our interface, with our exciting message. Return the HelloReply to the client.</p><h4 id="服务端实现"><a 
href="#服务端实现" class="headerlink" title="服务端实现"></a>Server Implementation</h4><p>The other main part of providing a gRPC service is making the service available on the network.</p><p>greeter_server.py does this with the following code (the file also imports <code>grpc</code>, <code>time</code>, and <code>futures</code> from <code>concurrent</code>, and defines <code>_ONE_DAY_IN_SECONDS</code>).</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">server = grpc.server(futures.ThreadPoolExecutor(max_workers=<span class="number">10</span>))</span><br><span class="line">helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)</span><br><span class="line">server.add_insecure_port(<span class="string">'[::]:50051'</span>)</span><br><span class="line">server.start()</span><br><span class="line"><span class="keyword">try</span>:</span><br><span class="line">    <span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">        time.sleep(_ONE_DAY_IN_SECONDS)</span><br><span class="line"><span class="keyword">except</span> KeyboardInterrupt:</span><br><span class="line">    server.stop(<span class="number">0</span>)</span><br></pre></td></tr></table></figure><p>Here we create a gRPC server and bind our Greeter service implementation to a port. Then we start the server, which is now ready to receive requests from Greeter service clients. The language-specific documentation covers how all of this works in more depth.</p><h3 id="编写客户端代码"><a href="#编写客户端代码" class="headerlink" title="编写客户端代码"></a>Writing the Client Code</h3><h4 id="连接服务"><a href="#连接服务" class="headerlink" title="连接服务"></a>Connecting to the Service</h4><p>First let's look at how we connect to the Greeter server. We create a gRPC channel, specifying the hostname and server port we want to connect to. Then we use the channel to create a stub instance.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">channel = grpc.insecure_channel(<span class="string">'localhost:50051'</span>)</span><br><span class="line">stub = helloworld_pb2_grpc.GreeterStub(channel)</span><br><span class="line">...</span><br></pre></td></tr></table></figure><h4 
id="调用-RPC"><a href="#调用-RPC" class="headerlink" title="调用 RPC"></a>Calling the RPC</h4><p>Now we can contact the service and obtain a greeting:</p><ol><li>We create and populate a HelloRequest to send to the service.</li><li>We call the stub's SayHello() with the request. If the RPC succeeds, we get back a populated HelloReply, from which we can read the greeting.</li></ol><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">response = stub.SayHello(helloworld_pb2.HelloRequest(name=<span class="string">'you'</span>))</span><br><span class="line">print(<span class="string">"Greeter client received: "</span> + response.message)</span><br></pre></td></tr></table></figure><p>The complete code of <code>greeter_client.py</code> is:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> grpc</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> helloworld_pb2</span><br><span class="line"><span class="keyword">import</span> helloworld_pb2_grpc</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">run</span><span class="params">()</span>:</span></span><br><span class="line">    <span class="comment"># NOTE(gRPC Python Team): .close() is possible 
on a channel and should be</span></span><br><span class="line">    <span class="comment"># used in circumstances in which the with statement does not fit the needs</span></span><br><span class="line">    <span class="comment"># of the code.</span></span><br><span class="line">    <span class="keyword">with</span> grpc.insecure_channel(<span class="string">'localhost:50051'</span>) <span class="keyword">as</span> channel:</span><br><span class="line">        stub = helloworld_pb2_grpc.GreeterStub(channel)</span><br><span class="line">        response = stub.SayHello(helloworld_pb2.HelloRequest(name=<span class="string">'you'</span>))</span><br><span class="line">    print(<span class="string">"Greeter client received: "</span> + response.message)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line">    run()</span><br></pre></td></tr></table></figure><h3 id="运行并测试服务"><a href="#运行并测试服务" class="headerlink" title="运行并测试服务"></a>Running and Testing the Service</h3><p>You can build and run the example with the client and server in the same language. Or you can try one of gRPC's most useful features, interoperability between languages, by running the client and server in different languages. Every server and client uses interface code generated from the same proto file, which means any Greeter client can talk to any Greeter server.</p><ol><li>Run the server program, which listens on port 50051:</li></ol><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">python greeter_server.py</span><br></pre></td></tr></table></figure><ol start="2"><li>Run the client program:</li></ol><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">python greeter_client.py</span><br></pre></td></tr></table></figure><h2 id="grpc-4种通信方式"><a href="#grpc-4种通信方式" class="headerlink" title="grpc: 4种通信方式"></a>gRPC: Four Communication Patterns</h2><p>helloworld uses the simplest gRPC communication pattern: a single request and response, similar to HTTP.</p><h3 id="4种通信方式"><a href="#4种通信方式" class="headerlink" 
title="4种通信方式"></a>The four communication modes</h3><p>Depending on the business scenario, gRPC supports four communication modes:</p><ul><li>One client request, one server response (unary)</li><li>One client request, multiple server responses (server streaming)</li><li>Multiple client requests (client streaming), one server response</li><li>Multiple client requests and multiple server responses (bidirectional streaming)</li></ul><p>The official route guide service demo uses all four modes. Its business logic is as follows:</p><ul><li>Data source: a JSON file that stores many places, each consisting of a place name (name) and its latitude and longitude (location, a Point)</li><li>Mode 1: the client asks whether a given point is in the data source</li><li>Mode 2: the client specifies a rectangle (the coordinates of two diagonal corners), and the server returns the places inside it</li><li>Mode 3: the client sends the server multiple points, and the server returns a summary</li><li>Mode 4: the client and the server chat using location information</li></ul><h3 id="对应的proto文件"><a href="#对应的proto文件" class="headerlink" title="对应的proto文件"></a>The corresponding proto file</h3><figure class="highlight java"><figcaption><span>route_guide.proto</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span 
class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br></pre></td><td class="code"><pre><span class="line">syntax = <span class="string">"proto3"</span>;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">option java_multiple_files = <span class="keyword">true</span>;</span><br><span class="line">option java_package = <span class="string">"io.grpc.examples.routeguide"</span>;</span><br><span class="line">option java_outer_classname = <span class="string">"RouteGuideProto"</span>;</span><br><span class="line">option objc_class_prefix = <span class="string">"RTG"</span>;</span><br><span class="line"></span><br><span class="line"><span class="keyword">package</span> routeguide;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Interface exported by the server.</span></span><br><span class="line">service RouteGuide &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="function">rpc <span class="title">GetFeature</span><span class="params">(Point)</span> <span class="title">returns</span> <span class="params">(Feature)</span> </span>&#123;&#125;</span><br><span class="line"></span><br><span class="line">    <span class="function">rpc <span class="title">ListFeatures</span><span class="params">(Rectangle)</span> <span class="title">returns</span> <span class="params">(stream Feature)</span> </span>&#123;&#125;</span><br><span class="line"></span><br><span class="line">    <span class="function">rpc <span class="title">RecordRoute</span><span class="params">( stream Point)</span> <span class="title">returns</span> <span class="params">(RouteSummary)</span> </span>&#123;&#125;</span><br><span class="line"></span><br><span class="line">    <span 
class="function">rpc <span class="title">RouteChat</span><span class="params">(stream RouteNote)</span> <span class="title">returns</span> <span class="params">(stream RouteNote)</span> </span>&#123;&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">message Point &#123;</span><br><span class="line">    int32 latitude = <span class="number">1</span>;</span><br><span class="line">    int32 longitude = <span class="number">2</span>;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">message Rectangle &#123;</span><br><span class="line">    Point lo = <span class="number">1</span>;</span><br><span class="line">    Point hi = <span class="number">2</span>;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">message Feature &#123;</span><br><span class="line">    string name = <span class="number">1</span>;</span><br><span class="line">    Point location = <span class="number">2</span>;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">message RouteNote &#123;</span><br><span class="line">    Point location = <span class="number">1</span>;</span><br><span class="line">    string message = <span class="number">2</span>;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line">message RouteSummary &#123;</span><br><span class="line">    int32 point_count = <span class="number">1</span>;</span><br><span class="line">    int32 feature_count = <span class="number">2</span>;</span><br><span class="line">    int32 distance = <span class="number">3</span>;</span><br><span class="line">    int32 elapsed_time = <span class="number">4</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>To express streaming in a proto file, simply add the <code>stream</code> keyword in front of the request or response type.</p><p>As before, use <code>protoc</code> to generate the code:</p><figure class="highlight shell"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">python -m grpc_tools.protoc --python_out=. --grpc_python_out=. -I. route_guide.proto</span><br></pre></td></tr></table></figure><p>This generates the files <code>route_guide_pb2.py</code> and <code>route_guide_pb2_grpc.py</code>.</p><h3 id="处理数据源文件-route-guide-db-json"><a href="#处理数据源文件-route-guide-db-json" class="headerlink" title="处理数据源文件(route_guide_db.json)"></a>Processing the data source file (route_guide_db.json)</h3><figure class="highlight py"><figcaption><span>route_guide_db.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> json</span><br><span class="line"><span class="keyword">import</span> route_guide_pb2</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">read_route_guide_db</span><span class="params">()</span>:</span></span><br><span class="line">    feature_list = []</span><br><span class="line">    <span class="keyword">with</span> open(<span class="string">'route_guide_db.json'</span>) <span class="keyword">as</span> f:</span><br><span class="line">        <span class="keyword">for</span> item <span class="keyword">in</span> json.load(f):</span><br><span class="line">            feature = route_guide_pb2.Feature(name=item[<span class="string">'name'</span>],</span><br><span class="line">                                              location=route_guide_pb2.Point(latitude=item[<span 
class="string">'location'</span>][<span class="string">'latitude'</span>],</span><br><span class="line">                                                                             longitude=item[<span class="string">'location'</span>][<span class="string">'longitude'</span>]))</span><br><span class="line">            feature_list.append(feature)</span><br><span class="line">    <span class="keyword">return</span> feature_list</span><br></pre></td></tr></table></figure><p>Processing the JSON is straightforward: parsing the JSON data yields a list of Feature objects, each carrying a name and a coordinate point.</p><p>How is streaming data handled? The answer is <code>for ... in</code> plus <code>yield</code>: iterate over an incoming stream with <code>for ... in</code>, and produce an outgoing stream with <code>yield</code>.</p><h3 id="完整服务器端代码"><a href="#完整服务器端代码" class="headerlink" title="完整服务器端代码"></a>Complete server code</h3><figure class="highlight python"><figcaption><span>route_guide_server.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span 
class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span 
class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br></pre></td><td class="code"><pre><span class="line"><span class="string">"""The Python implementation of the gRPC route guide server."""</span></span><br><span class="line"><span class="keyword">from</span> concurrent <span class="keyword">import</span> futures</span><br><span class="line"><span class="keyword">import</span> math</span><br><span class="line"><span class="keyword">import</span> time</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> grpc</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> route_guide_pb2</span><br><span class="line"><span class="keyword">import</span> route_guide_pb2_grpc</span><br><span class="line"><span class="keyword">import</span> route_guide_db</span><br><span class="line"></span><br><span class="line">_ONE_DAY_IN_SECONDS = <span class="number">60</span> * <span class="number">60</span> * <span class="number">24</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">get_feature</span><span class="params">(feature_db, point)</span>:</span></span><br><span class="line">    <span class="string">"""returns feature at given location or None"""</span></span><br><span class="line">    <span class="keyword">for</span> feature <span class="keyword">in</span> feature_db:</span><br><span class="line">        <span class="keyword">if</span> feature.location 
== point:</span><br><span class="line">            <span class="keyword">return</span> feature</span><br><span class="line">    <span class="keyword">return</span> <span class="keyword">None</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">get_distance</span><span class="params">(start, end)</span>:</span></span><br><span class="line">    <span class="string">"""Distance in metres between two points (haversine formula)."""</span></span><br><span class="line">    coord_factor = <span class="number">10000000.0</span></span><br><span class="line">    lat_1 = start.latitude / coord_factor</span><br><span class="line">    lat_2 = end.latitude / coord_factor</span><br><span class="line">    lon_1 = start.longitude / coord_factor</span><br><span class="line">    lon_2 = end.longitude / coord_factor</span><br><span class="line"></span><br><span class="line">    lat_rad_1 = math.radians(lat_1)</span><br><span class="line">    lat_rad_2 = math.radians(lat_2)</span><br><span class="line">    delta_lat_rad = math.radians(lat_2 - lat_1)</span><br><span class="line">    delta_lon_rad = math.radians(lon_2 - lon_1)</span><br><span class="line"></span><br><span class="line">    <span class="comment"># Formula is based on http://mathforum.org/library/drmath/view/51879.html</span></span><br><span class="line">    a = (pow(math.sin(delta_lat_rad / <span class="number">2</span>), <span class="number">2</span>) +</span><br><span class="line">         (math.cos(lat_rad_1) * math.cos(lat_rad_2) * pow(math.sin(delta_lon_rad / <span class="number">2</span>), <span class="number">2</span>)))</span><br><span class="line">    c = <span class="number">2</span> * math.atan2(math.sqrt(a), math.sqrt(<span class="number">1</span> - a))</span><br><span class="line">    R = <span class="number">6371000</span>  <span class="comment"># mean Earth radius in metres</span></span><br><span class="line">    <span class="keyword">return</span> R * c</span><br><span 
class="line"></span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">RouteGuideServicer</span><span class="params">(route_guide_pb2_grpc.RouteGuideServicer)</span>:</span></span><br><span class="line">    <span class="string">"""Provides methods that implement functionality of route guide server."""</span></span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self)</span>:</span></span><br><span class="line">        self.db = route_guide_db.read_route_guide_db()</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">GetFeature</span><span class="params">(self, request, context)</span>:</span></span><br><span class="line">        feature = get_feature(self.db, request)</span><br><span class="line">        <span class="keyword">if</span> feature <span class="keyword">is</span> <span class="keyword">None</span>:</span><br><span class="line">            <span class="keyword">return</span> route_guide_pb2.Feature(name=<span class="string">""</span>, location=request)</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            <span class="keyword">return</span> feature</span><br><span class="line">    </span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">ListFeatures</span><span class="params">(self, request, context)</span>:</span></span><br><span class="line">        left = min(request.lo.longitude, request.hi.longitude)</span><br><span class="line">        right = max(request.lo.longitude, request.hi.longitude)</span><br><span class="line">        top = max(request.lo.latitude, request.hi.latitude)</span><br><span class="line">        bottom = min(request.lo.latitude, 
request.hi.latitude)</span><br><span class="line">        <span class="keyword">for</span> feature <span class="keyword">in</span> self.db:</span><br><span class="line">            <span class="keyword">if</span>(feature.location.longitude &gt;= left <span class="keyword">and</span></span><br><span class="line">                feature.location.longitude &lt;=right <span class="keyword">and</span></span><br><span class="line">                feature.location.latitude &gt;= bottom <span class="keyword">and</span></span><br><span class="line">                feature.location.latitude &lt;= top):</span><br><span class="line">                <span class="keyword">yield</span> feature</span><br><span class="line">        </span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">RecordRoute</span><span class="params">(self, request_iterator, context)</span>:</span></span><br><span class="line">        point_count = <span class="number">0</span></span><br><span class="line">        feature_count = <span class="number">0</span></span><br><span class="line">        distance = <span class="number">0.0</span></span><br><span class="line">        prev_point = <span class="keyword">None</span></span><br><span class="line"></span><br><span class="line">        start_time = time.time()</span><br><span class="line">        <span class="keyword">for</span> point <span class="keyword">in</span> request_iterator:</span><br><span class="line">            point_count += <span class="number">1</span></span><br><span class="line">            <span class="keyword">if</span> get_feature(self.db, point):</span><br><span class="line">                feature_count += <span class="number">1</span></span><br><span class="line">            <span class="keyword">if</span> prev_point:</span><br><span class="line">                distance += get_distance(prev_point, point)</span><br><span class="line">            prev_point = point</span><br><span 
class="line"></span><br><span class="line">        elapsed_time = time.time() - start_time</span><br><span class="line">        <span class="keyword">return</span> route_guide_pb2.RouteSummary(</span><br><span class="line">            point_count=point_count,</span><br><span class="line">            feature_count=feature_count,</span><br><span class="line">            distance=int(distance),</span><br><span class="line">            elapsed_time=int(elapsed_time))</span><br><span class="line">    </span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">RouteChat</span><span class="params">(self, request_iterator, context)</span>:</span></span><br><span class="line">        prev_notes = []</span><br><span class="line">        <span class="keyword">for</span> new_note <span class="keyword">in</span> request_iterator:</span><br><span class="line">            <span class="keyword">for</span> prev_note <span class="keyword">in</span> prev_notes:</span><br><span class="line">                <span class="keyword">if</span> prev_note.location == new_note.location:</span><br><span class="line">                    <span class="keyword">yield</span> prev_note</span><br><span class="line">            prev_notes.append(new_note)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">serve</span><span class="params">()</span>:</span></span><br><span class="line">    server = grpc.server(futures.ThreadPoolExecutor(max_workers=<span class="number">10</span>))</span><br><span class="line">    route_guide_pb2_grpc.add_RouteGuideServicer_to_server(RouteGuideServicer(), server)</span><br><span class="line">    server.add_insecure_port(<span class="string">'[::]:50051'</span>)</span><br><span class="line">    server.start()</span><br><span class="line">    <span class="keyword">try</span>:</span><br><span class="line">        
<span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">            time.sleep(_ONE_DAY_IN_SECONDS)</span><br><span class="line">    <span class="keyword">except</span> KeyboardInterrupt:</span><br><span class="line">        server.stop(<span class="number">0</span>)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line">    print(<span class="string">"route guide server is running..."</span>)</span><br><span class="line">    serve()</span><br></pre></td></tr></table></figure><h3 id="完整客户端代码"><a href="#完整客户端代码" class="headerlink" title="完整客户端代码"></a>Complete client code</h3><figure class="highlight python"><figcaption><span>route_guide_client.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span 
class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span 
class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br></pre></td><td class="code"><pre><span class="line"><span class="string">"""The Python implementation of the gRPC route guide client."""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> __future__ <span class="keyword">import</span> print_function</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> random</span><br><span class="line"><span class="keyword">import</span> grpc</span><br><span class="line"><span class="keyword">import</span> route_guide_pb2</span><br><span class="line"><span class="keyword">import</span> route_guide_pb2_grpc</span><br><span class="line"><span class="keyword">import</span> route_guide_db</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">make_route_note</span><span class="params">(message, latitude, longitude)</span>:</span></span><br><span class="line">    <span class="keyword">return</span> route_guide_pb2.RouteNote(</span><br><span class="line">        message=message,</span><br><span class="line">        location=route_guide_pb2.Point(latitude=latitude, longitude=longitude)</span><br><span class="line">    )</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">guide_get_one_feature</span><span class="params">(stub, point)</span>:</span></span><br><span class="line">    feature = stub.GetFeature(point)</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> feature.location:</span><br><span class="line">        print(<span class="string">"server returned incomplete feature"</span>)</span><br><span class="line">        <span class="keyword">return</span> 
</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">if</span> feature.name:</span><br><span class="line">        print(<span class="string">"Feature called %s at %s"</span> % (feature.name, feature.location))</span><br><span class="line">    <span class="keyword">else</span>:</span><br><span class="line">        print(<span class="string">"Found no feature at %s"</span> % feature.location)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">guide_get_feature</span><span class="params">(stub)</span>:</span></span><br><span class="line">    guide_get_one_feature(stub,</span><br><span class="line">        route_guide_pb2.Point(latitude=<span class="number">409146138</span>, longitude=<span class="number">-746188906</span>))</span><br><span class="line">    guide_get_one_feature(stub, route_guide_pb2.Point(latitude=<span class="number">0</span>, longitude=<span class="number">0</span>))</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">guide_list_features</span><span class="params">(stub)</span>:</span></span><br><span class="line">    rectangle = route_guide_pb2.Rectangle(</span><br><span class="line">        lo=route_guide_pb2.Point(latitude=<span class="number">400000000</span>, longitude=<span class="number">-750000000</span>),</span><br><span class="line">        hi=route_guide_pb2.Point(latitude=<span class="number">420000000</span>, longitude=<span class="number">-730000000</span>))</span><br><span class="line">    print(<span class="string">"looking for features between 40, -75 and 42, -73"</span>)</span><br><span class="line"></span><br><span class="line">    features = stub.ListFeatures(rectangle)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> 
feature <span class="keyword">in</span> features:</span><br><span class="line">        print(<span class="string">"Feature called %s at %s"</span> % (feature.name, feature.location))</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">generate_route</span><span class="params">(feature_list)</span>:</span></span><br><span class="line">    <span class="keyword">for</span> _ <span class="keyword">in</span> range(<span class="number">0</span>, <span class="number">10</span>):</span><br><span class="line">        random_feature = feature_list[random.randint(<span class="number">0</span>, len(feature_list) - <span class="number">1</span>)]  <span class="comment"># randint includes both endpoints</span></span><br><span class="line">        print(<span class="string">"Visiting point %s"</span> % random_feature.location)</span><br><span class="line">        <span class="keyword">yield</span> random_feature.location</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">guide_record_route</span><span class="params">(stub)</span>:</span></span><br><span class="line">    feature_list = route_guide_db.read_route_guide_db()</span><br><span class="line"></span><br><span class="line">    route_iterator = generate_route(feature_list)</span><br><span class="line">    route_summary = stub.RecordRoute(route_iterator)</span><br><span class="line">    print(<span class="string">"Finished trip with %s points "</span> % route_summary.point_count)</span><br><span class="line">    print(<span class="string">"Passed %s features "</span> % route_summary.feature_count)</span><br><span class="line">    print(<span class="string">"Travelled %s meters "</span> % route_summary.distance)</span><br><span class="line">    print(<span class="string">"It took %s seconds "</span> % route_summary.elapsed_time)</span><br><span class="line"></span><br><span 
class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">generate_messages</span><span class="params">()</span>:</span></span><br><span class="line">    messages = [</span><br><span class="line">        make_route_note(<span class="string">"First message"</span>, <span class="number">0</span>, <span class="number">0</span>),</span><br><span class="line">        make_route_note(<span class="string">"Second message"</span>, <span class="number">0</span>, <span class="number">1</span>),</span><br><span class="line">        make_route_note(<span class="string">"Third message"</span>, <span class="number">1</span>, <span class="number">0</span>),</span><br><span class="line">        make_route_note(<span class="string">"Fourth message"</span>, <span class="number">0</span>, <span class="number">0</span>),</span><br><span class="line">        make_route_note(<span class="string">"Fifth message"</span>, <span class="number">1</span>, <span class="number">0</span>),</span><br><span class="line">    ]</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> msg <span class="keyword">in</span> messages:</span><br><span class="line">        print(<span class="string">"Sending %s at %s"</span> % (msg.message, msg.location))</span><br><span class="line">        <span class="keyword">yield</span> msg</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">guide_route_chat</span><span class="params">(stub)</span>:</span></span><br><span class="line">    responses = stub.RouteChat(generate_messages())</span><br><span class="line">    <span class="keyword">for</span> response <span class="keyword">in</span> responses:</span><br><span class="line">        print(<span class="string">"Received message %s at %s"</span> % (response.message, response.location))</span><br><span 
class="line"></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">run</span><span class="params">()</span>:</span></span><br><span class="line">    <span class="keyword">with</span> grpc.insecure_channel(<span class="string">'localhost:50051'</span>) <span class="keyword">as</span> channel:</span><br><span class="line">        stub = route_guide_pb2_grpc.RouteGuideStub(channel)</span><br><span class="line">        print(<span class="string">"-------------- GetFeature --------------"</span>)</span><br><span class="line">        guide_get_feature(stub)</span><br><span class="line">        print(<span class="string">"-------------- ListFeatures --------------"</span>)</span><br><span class="line">        guide_list_features(stub)</span><br><span class="line">        print(<span class="string">"-------------- RecordRoute --------------"</span>)</span><br><span class="line">        guide_record_route(stub)</span><br><span class="line">        print(<span class="string">"-------------- RouteChat --------------"</span>)</span><br><span class="line">        guide_route_chat(stub)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line">    run()</span><br></pre></td></tr></table></figure><h3 id="运行结果"><a href="#运行结果" class="headerlink" title="运行结果"></a>运行结果</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span 
class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span 
class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br><span class="line">163</span><br><span class="line">164</span><br><span class="line">165</span><br><span class="line">166</span><br><span class="line">167</span><br><span class="line">168</span><br><span class="line">169</span><br><span class="line">170</span><br><span class="line">171</span><br><span class="line">172</span><br><span class="line">173</span><br><span class="line">174</span><br><span class="line">175</span><br><span class="line">176</span><br><span class="line">177</span><br><span class="line">178</span><br><span class="line">179</span><br><span class="line">180</span><br><span class="line">181</span><br><span class="line">182</span><br><span class="line">183</span><br><span class="line">184</span><br><span class="line">185</span><br><span class="line">186</span><br><span class="line">187</span><br><span class="line">188</span><br><span class="line">189</span><br><span class="line">190</span><br><span class="line">191</span><br><span class="line">192</span><br><span class="line">193</span><br><span class="line">194</span><br><span class="line">195</span><br><span 
class="line">196</span><br><span class="line">197</span><br><span class="line">198</span><br><span class="line">199</span><br><span class="line">200</span><br><span class="line">201</span><br><span class="line">202</span><br><span class="line">203</span><br><span class="line">204</span><br><span class="line">205</span><br><span class="line">206</span><br><span class="line">207</span><br><span class="line">208</span><br><span class="line">209</span><br><span class="line">210</span><br><span class="line">211</span><br><span class="line">212</span><br><span class="line">213</span><br><span class="line">214</span><br><span class="line">215</span><br><span class="line">216</span><br><span class="line">217</span><br><span class="line">218</span><br><span class="line">219</span><br><span class="line">220</span><br><span class="line">221</span><br><span class="line">222</span><br><span class="line">223</span><br><span class="line">224</span><br><span class="line">225</span><br><span class="line">226</span><br><span class="line">227</span><br><span class="line">228</span><br><span class="line">229</span><br><span class="line">230</span><br><span class="line">231</span><br><span class="line">232</span><br><span class="line">233</span><br><span class="line">234</span><br><span class="line">235</span><br><span class="line">236</span><br><span class="line">237</span><br><span class="line">238</span><br><span class="line">239</span><br><span class="line">240</span><br><span class="line">241</span><br><span class="line">242</span><br><span class="line">243</span><br><span class="line">244</span><br><span class="line">245</span><br><span class="line">246</span><br><span class="line">247</span><br><span class="line">248</span><br><span class="line">249</span><br><span class="line">250</span><br><span class="line">251</span><br><span class="line">252</span><br><span class="line">253</span><br><span class="line">254</span><br><span class="line">255</span><br><span 
class="line">256</span><br><span class="line">257</span><br><span class="line">258</span><br><span class="line">259</span><br><span class="line">260</span><br><span class="line">261</span><br><span class="line">262</span><br><span class="line">263</span><br><span class="line">264</span><br><span class="line">265</span><br><span class="line">266</span><br><span class="line">267</span><br><span class="line">268</span><br><span class="line">269</span><br><span class="line">270</span><br><span class="line">271</span><br><span class="line">272</span><br><span class="line">273</span><br><span class="line">274</span><br><span class="line">275</span><br><span class="line">276</span><br><span class="line">277</span><br><span class="line">278</span><br><span class="line">279</span><br><span class="line">280</span><br><span class="line">281</span><br><span class="line">282</span><br><span class="line">283</span><br><span class="line">284</span><br><span class="line">285</span><br><span class="line">286</span><br><span class="line">287</span><br><span class="line">288</span><br><span class="line">289</span><br><span class="line">290</span><br><span class="line">291</span><br><span class="line">292</span><br><span class="line">293</span><br><span class="line">294</span><br><span class="line">295</span><br><span class="line">296</span><br><span class="line">297</span><br><span class="line">298</span><br><span class="line">299</span><br><span class="line">300</span><br><span class="line">301</span><br><span class="line">302</span><br><span class="line">303</span><br><span class="line">304</span><br><span class="line">305</span><br><span class="line">306</span><br><span class="line">307</span><br><span class="line">308</span><br><span class="line">309</span><br><span class="line">310</span><br><span class="line">311</span><br><span class="line">312</span><br><span class="line">313</span><br><span class="line">314</span><br><span class="line">315</span><br><span 
class="line">316</span><br><span class="line">317</span><br><span class="line">318</span><br><span class="line">319</span><br><span class="line">320</span><br><span class="line">321</span><br><span class="line">322</span><br><span class="line">323</span><br><span class="line">324</span><br><span class="line">325</span><br><span class="line">326</span><br><span class="line">327</span><br><span class="line">328</span><br><span class="line">329</span><br><span class="line">330</span><br><span class="line">331</span><br><span class="line">332</span><br><span class="line">333</span><br><span class="line">334</span><br><span class="line">335</span><br><span class="line">336</span><br><span class="line">337</span><br><span class="line">338</span><br><span class="line">339</span><br><span class="line">340</span><br><span class="line">341</span><br><span class="line">342</span><br><span class="line">343</span><br><span class="line">344</span><br><span class="line">345</span><br><span class="line">346</span><br><span class="line">347</span><br><span class="line">348</span><br><span class="line">349</span><br><span class="line">350</span><br><span class="line">351</span><br><span class="line">352</span><br><span class="line">353</span><br></pre></td><td class="code"><pre><span class="line">-------------- GetFeature --------------</span><br><span class="line">Feature called Berkshire Valley Management Area Trail, Jefferson, NJ, USA at latitude: 409146138</span><br><span class="line">longitude: -746188906</span><br><span class="line"></span><br><span class="line">Found no feature at</span><br><span class="line">-------------- ListFeatures --------------</span><br><span class="line">looking for features between 40, -75 and 42, -73</span><br><span class="line">Feature called Patriots Path, Mendham, NJ 07945, USA at latitude: 407838351</span><br><span class="line">longitude: -746143763</span><br><span class="line"></span><br><span class="line">Feature called 101 New Jersey 10, 
Whippany, NJ 07981, USA at latitude: 408122808</span><br><span class="line">longitude: -743999179</span><br><span class="line"></span><br><span class="line">Feature called U.S. 6, Shohola, PA 18458, USA at latitude: 413628156</span><br><span class="line">longitude: -749015468</span><br><span class="line"></span><br><span class="line">Feature called 5 Conners Road, Kingston, NY 12401, USA at latitude: 419999544</span><br><span class="line">longitude: -740371136</span><br><span class="line"></span><br><span class="line">Feature called Mid Hudson Psychiatric Center, New Hampton, NY 10958, USA at latitude: 414008389</span><br><span class="line">longitude: -743951297</span><br><span class="line"></span><br><span class="line">Feature called 287 Flugertown Road, Livingston Manor, NY 12758, USA at latitude: 419611318</span><br><span class="line">longitude: -746524769</span><br><span class="line"></span><br><span class="line">Feature called 4001 Tremley Point Road, Linden, NJ 07036, USA at latitude: 406109563</span><br><span class="line">longitude: -742186778</span><br><span class="line"></span><br><span class="line">Feature called 352 South Mountain Road, Wallkill, NY 12589, USA at latitude: 416802456</span><br><span class="line">longitude: -742370183</span><br><span class="line"></span><br><span class="line">Feature called Bailey Turn Road, Harriman, NY 10926, USA at latitude: 412950425</span><br><span class="line">longitude: -741077389</span><br><span class="line"></span><br><span class="line">Feature called 193-199 Wawayanda Road, Hewitt, NJ 07421, USA at latitude: 412144655</span><br><span class="line">longitude: -743949739</span><br><span class="line"></span><br><span class="line">Feature called 406-496 Ward Avenue, Pine Bush, NY 12566, USA at latitude: 415736605</span><br><span class="line">longitude: -742847522</span><br><span class="line"></span><br><span class="line">Feature called 162 Merrill Road, Highland Mills, NY 10930, USA at latitude: 
413843930</span><br><span class="line">longitude: -740501726</span><br><span class="line"></span><br><span class="line">Feature called Clinton Road, West Milford, NJ 07480, USA at latitude: 410873075</span><br><span class="line">longitude: -744459023</span><br><span class="line"></span><br><span class="line">Feature called 16 Old Brook Lane, Warwick, NY 10990, USA at latitude: 412346009</span><br><span class="line">longitude: -744026814</span><br><span class="line"></span><br><span class="line">Feature called 3 Drake Lane, Pennington, NJ 08534, USA at latitude: 402948455</span><br><span class="line">longitude: -747903913</span><br><span class="line"></span><br><span class="line">Feature called 6324 8th Avenue, Brooklyn, NY 11220, USA at latitude: 406337092</span><br><span class="line">longitude: -740122226</span><br><span class="line"></span><br><span class="line">Feature called 1 Merck Access Road, Whitehouse Station, NJ 08889, USA at latitude: 406421967</span><br><span class="line">longitude: -747727624</span><br><span class="line"></span><br><span class="line">Feature called 78-98 Schalck Road, Narrowsburg, NY 12764, USA at latitude: 416318082</span><br><span class="line">longitude: -749677716</span><br><span class="line"></span><br><span class="line">Feature called 282 Lakeview Drive Road, Highland Lake, NY 12743, USA at latitude: 415301720</span><br><span class="line">longitude: -748416257</span><br><span class="line"></span><br><span class="line">Feature called 330 Evelyn Avenue, Hamilton Township, NJ 08619, USA at latitude: 402647019</span><br><span class="line">longitude: -747071791</span><br><span class="line"></span><br><span class="line">Feature called New York State Reference Route 987E, Southfields, NY 10975, USA at latitude: 412567807</span><br><span class="line">longitude: -741058078</span><br><span class="line"></span><br><span class="line">Feature called 103-271 Tempaloni Road, Ellenville, NY 12428, USA at latitude: 416855156</span><br><span 
class="line">longitude: -744420597</span><br><span class="line"></span><br><span class="line">Feature called 1300 Airport Road, North Brunswick Township, NJ 08902, USA at latitude: 404663628</span><br><span class="line">longitude: -744820157</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 407113723</span><br><span class="line">longitude: -749746483</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 402133926</span><br><span class="line">longitude: -743613249</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 400273442</span><br><span class="line">longitude: -741220915</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 411236786</span><br><span class="line">longitude: -744070769</span><br><span class="line"></span><br><span class="line">Feature called 211-225 Plains Road, Augusta, NJ 07822, USA at latitude: 411633782</span><br><span class="line">longitude: -746784970</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 415830701</span><br><span class="line">longitude: -742952812</span><br><span class="line"></span><br><span class="line">Feature called 165 Pedersen Ridge Road, Milford, PA 18337, USA at latitude: 413447164</span><br><span class="line">longitude: -748712898</span><br><span class="line"></span><br><span class="line">Feature called 100-122 Locktown Road, Frenchtown, NJ 08825, USA at latitude: 405047245</span><br><span class="line">longitude: -749800722</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 418858923</span><br><span class="line">longitude: -746156790</span><br><span class="line"></span><br><span class="line">Feature called 650-652 Willi Hill Road, Swan Lake, NY 12783, USA at latitude: 417951888</span><br><span class="line">longitude: -748484944</span><br><span class="line"></span><br><span class="line">Feature called 26 
East 3rd Street, New Providence, NJ 07974, USA at latitude: 407033786</span><br><span class="line">longitude: -743977337</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 417548014</span><br><span class="line">longitude: -740075041</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 410395868</span><br><span class="line">longitude: -744972325</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 404615353</span><br><span class="line">longitude: -745129803</span><br><span class="line"></span><br><span class="line">Feature called 611 Lawrence Avenue, Westfield, NJ 07090, USA at latitude: 406589790</span><br><span class="line">longitude: -743560121</span><br><span class="line"></span><br><span class="line">Feature called 18 Lannis Avenue, New Windsor, NY 12553, USA at latitude: 414653148</span><br><span class="line">longitude: -740477477</span><br><span class="line"></span><br><span class="line">Feature called 82-104 Amherst Avenue, Colonia, NJ 07067, USA at latitude: 405957808</span><br><span class="line">longitude: -743255336</span><br><span class="line"></span><br><span class="line">Feature called 170 Seven Lakes Drive, Sloatsburg, NY 10974, USA at latitude: 411733589</span><br><span class="line">longitude: -741648093</span><br><span class="line"></span><br><span class="line">Feature called 1270 Lakes Road, Monroe, NY 10950, USA at latitude: 412676291</span><br><span class="line">longitude: -742606606</span><br><span class="line"></span><br><span class="line">Feature called 509-535 Alphano Road, Great Meadows, NJ 07838, USA at latitude: 409224445</span><br><span class="line">longitude: -748286738</span><br><span class="line"></span><br><span class="line">Feature called 652 Garden Street, Elizabeth, NJ 07202, USA at latitude: 406523420</span><br><span class="line">longitude: -742135517</span><br><span class="line"></span><br><span class="line">Feature called 
349 Sea Spray Court, Neptune City, NJ 07753, USA at latitude: 401827388</span><br><span class="line">longitude: -740294537</span><br><span class="line"></span><br><span class="line">Feature called 13-17 Stanley Street, West Milford, NJ 07480, USA at latitude: 410564152</span><br><span class="line">longitude: -743685054</span><br><span class="line"></span><br><span class="line">Feature called 47 Industrial Avenue, Teterboro, NJ 07608, USA at latitude: 408472324</span><br><span class="line">longitude: -740726046</span><br><span class="line"></span><br><span class="line">Feature called 5 White Oak Lane, Stony Point, NY 10980, USA at latitude: 412452168</span><br><span class="line">longitude: -740214052</span><br><span class="line"></span><br><span class="line">Feature called Berkshire Valley Management Area Trail, Jefferson, NJ, USA at latitude: 409146138</span><br><span class="line">longitude: -746188906</span><br><span class="line"></span><br><span class="line">Feature called 1007 Jersey Avenue, New Brunswick, NJ 08901, USA at latitude: 404701380</span><br><span class="line">longitude: -744781745</span><br><span class="line"></span><br><span class="line">Feature called 6 East Emerald Isle Drive, Lake Hopatcong, NJ 07849, USA at latitude: 409642566</span><br><span class="line">longitude: -746017679</span><br><span class="line"></span><br><span class="line">Feature called 1358-1474 New Jersey 57, Port Murray, NJ 07865, USA at latitude: 408031728</span><br><span class="line">longitude: -748645385</span><br><span class="line"></span><br><span class="line">Feature called 367 Prospect Road, Chester, NY 10918, USA at latitude: 413700272</span><br><span class="line">longitude: -742135189</span><br><span class="line"></span><br><span class="line">Feature called 10 Simon Lake Drive, Atlantic Highlands, NJ 07716, USA at latitude: 404310607</span><br><span class="line">longitude: -740282632</span><br><span class="line"></span><br><span class="line">Feature called 11 Ward 
Street, Mount Arlington, NJ 07856, USA at latitude: 409319800</span><br><span class="line">longitude: -746201391</span><br><span class="line"></span><br><span class="line">Feature called 300-398 Jefferson Avenue, Elizabeth, NJ 07201, USA at latitude: 406685311</span><br><span class="line">longitude: -742108603</span><br><span class="line"></span><br><span class="line">Feature called 43 Dreher Road, Roscoe, NY 12776, USA at latitude: 419018117</span><br><span class="line">longitude: -749142781</span><br><span class="line"></span><br><span class="line">Feature called Swan Street, Pine Island, NY 10969, USA at latitude: 412856162</span><br><span class="line">longitude: -745148837</span><br><span class="line"></span><br><span class="line">Feature called 66 Pleasantview Avenue, Monticello, NY 12701, USA at latitude: 416560744</span><br><span class="line">longitude: -746721964</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 405314270</span><br><span class="line">longitude: -749836354</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 414219548</span><br><span class="line">longitude: -743327440</span><br><span class="line"></span><br><span class="line">Feature called 565 Winding Hills Road, Montgomery, NY 12549, USA at latitude: 415534177</span><br><span class="line">longitude: -742900616</span><br><span class="line"></span><br><span class="line">Feature called 231 Rocky Run Road, Glen Gardner, NJ 08826, USA at latitude: 406898530</span><br><span class="line">longitude: -749127080</span><br><span class="line"></span><br><span class="line">Feature called 100 Mount Pleasant Avenue, Newark, NJ 07104, USA at latitude: 407586880</span><br><span class="line">longitude: -741670168</span><br><span class="line"></span><br><span class="line">Feature called 517-521 Huntington Drive, Manchester Township, NJ 08759, USA at latitude: 400106455</span><br><span class="line">longitude: -742870190</span><br><span 
class="line"></span><br><span class="line">Feature called  at latitude: 400066188</span><br><span class="line">longitude: -746793294</span><br><span class="line"></span><br><span class="line">Feature called 40 Mountain Road, Napanoch, NY 12458, USA at latitude: 418803880</span><br><span class="line">longitude: -744102673</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 414204288</span><br><span class="line">longitude: -747895140</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 414777405</span><br><span class="line">longitude: -740615601</span><br><span class="line"></span><br><span class="line">Feature called 48 North Road, Forestburgh, NY 12777, USA at latitude: 415464475</span><br><span class="line">longitude: -747175374</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 404062378</span><br><span class="line">longitude: -746376177</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 405688272</span><br><span class="line">longitude: -749285130</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 400342070</span><br><span class="line">longitude: -748788996</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 401809022</span><br><span class="line">longitude: -744157964</span><br><span class="line"></span><br><span class="line">Feature called 9 Thompson Avenue, Leonardo, NJ 07737, USA at latitude: 404226644</span><br><span class="line">longitude: -740517141</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 410322033</span><br><span class="line">longitude: -747871659</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 407100674</span><br><span class="line">longitude: -747742727</span><br><span class="line"></span><br><span class="line">Feature called 213 Bush Road, Stone Ridge, 
NY 12484, USA at latitude: 418811433</span><br><span class="line">longitude: -741718005</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 415034302</span><br><span class="line">longitude: -743850945</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 411349992</span><br><span class="line">longitude: -743694161</span><br><span class="line"></span><br><span class="line">Feature called 1-17 Bergen Court, New Brunswick, NJ 08901, USA at latitude: 404839914</span><br><span class="line">longitude: -744759616</span><br><span class="line"></span><br><span class="line">Feature called 35 Oakland Valley Road, Cuddebackville, NY 12729, USA at latitude: 414638017</span><br><span class="line">longitude: -745957854</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 412127800</span><br><span class="line">longitude: -740173578</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 401263460</span><br><span class="line">longitude: -747964303</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 412843391</span><br><span class="line">longitude: -749086026</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 418512773</span><br><span class="line">longitude: -743067823</span><br><span class="line"></span><br><span class="line">Feature called 42-102 Main Street, Belford, NJ 07718, USA at latitude: 404318328</span><br><span class="line">longitude: -740835638</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 419020746</span><br><span class="line">longitude: -741172328</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 404080723</span><br><span class="line">longitude: -746119569</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 401012643</span><br><span 
class="line">longitude: -744035134</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 404306372</span><br><span class="line">longitude: -741079661</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 403966326</span><br><span class="line">longitude: -748519297</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 405002031</span><br><span class="line">longitude: -748407866</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 409532885</span><br><span class="line">longitude: -742200683</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 416851321</span><br><span class="line">longitude: -742674555</span><br><span class="line"></span><br><span class="line">Feature called 3387 Richmond Terrace, Staten Island, NY 10303, USA at latitude: 406411633</span><br><span class="line">longitude: -741722051</span><br><span class="line"></span><br><span class="line">Feature called 261 Van Sickle Road, Goshen, NY 10924, USA at latitude: 413069058</span><br><span class="line">longitude: -744597778</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 418465462</span><br><span class="line">longitude: -746859398</span><br><span class="line"></span><br><span class="line">Feature called  at latitude: 411733222</span><br><span class="line">longitude: -744228360</span><br><span class="line"></span><br><span class="line">Feature called 3 Hasta Way, Newton, NJ 07860, USA at latitude: 410248224</span><br><span class="line">longitude: -747127767</span><br><span class="line"></span><br><span class="line">-------------- RecordRoute --------------</span><br><span class="line">Visiting point latitude: 405002031</span><br><span class="line">longitude: -748407866</span><br><span class="line"></span><br><span class="line">Visiting point latitude: 400106455</span><br><span 
class="line">longitude: -742870190</span><br><span class="line"></span><br><span class="line">Visiting point latitude: 409532885</span><br><span class="line">longitude: -742200683</span><br><span class="line"></span><br><span class="line">Visiting point latitude: 413628156</span><br><span class="line">longitude: -749015468</span><br><span class="line"></span><br><span class="line">Visiting point latitude: 413700272</span><br><span class="line">longitude: -742135189</span><br><span class="line"></span><br><span class="line">Visiting point latitude: 406523420</span><br><span class="line">longitude: -742135517</span><br><span class="line"></span><br><span class="line">Visiting point latitude: 400273442</span><br><span class="line">longitude: -741220915</span><br><span class="line"></span><br><span class="line">Visiting point latitude: 400066188</span><br><span class="line">longitude: -746793294</span><br><span class="line"></span><br><span class="line">Visiting point latitude: 415034302</span><br><span class="line">longitude: -743850945</span><br><span class="line"></span><br><span class="line">Visiting point latitude: 412567807</span><br><span class="line">longitude: -741058078</span><br><span class="line"></span><br><span class="line">Finished trip with 10 points</span><br><span class="line">Passed 10 features</span><br><span class="line">Travelled 551060 meters</span><br><span class="line">It took 0 seconds</span><br><span class="line">-------------- RouteChat --------------</span><br><span class="line">Sending First message at</span><br><span class="line">Sending Second message at longitude: 1</span><br><span class="line"></span><br><span class="line">Sending Third message at latitude: 1</span><br><span class="line"></span><br><span class="line">Sending Fourth message at</span><br><span class="line">Sending Fifth message at latitude: 1</span><br><span class="line"></span><br><span class="line">Received message First message at</span><br><span class="line">Received 
message Third message at latitude: 1</span><br></pre></td></tr></table></figure><h2 id="小结"><a href="#小结" class="headerlink" title="小结"></a>Summary</h2><p>This article covered the basics of running gRPC with Python, along with gRPC's four communication modes.</p>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;&lt;br&gt;&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
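&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;p&gt;gRPC is a high-performance, open-source, general-purpose RPC framework designed for mobile and HTTP/2. It is currently available in C, Java, and Go implementations: grpc, grpc-java, and grpc-go. The C implementation supports C, C++, Node.js, Python, Ruby, Objective-C, PHP, and C#.&lt;/p&gt;
&lt;p&gt;gRPC is built on the HTTP/2 standard, which brings features such as bidirectional streaming, flow control, header compression, and multiplexed requests over a single TCP connection. These features make it perform better on mobile devices, saving power and space.&lt;br&gt;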
    
    </summary>
    
      <category term="Python" scheme="https://cgdeeplearn.github.io/categories/Python/"/>
    
    
      <category term="python" scheme="https://cgdeeplearn.github.io/tags/python/"/>
    
      <category term="微服务" scheme="https://cgdeeplearn.github.io/tags/%E5%BE%AE%E6%9C%8D%E5%8A%A1/"/>
    
      <category term="RPC" scheme="https://cgdeeplearn.github.io/tags/RPC/"/>
    
      <category term="分布式" scheme="https://cgdeeplearn.github.io/tags/%E5%88%86%E5%B8%83%E5%BC%8F/"/>
    
  </entry>
  
  <entry>
    <title>MySQL-Datamask-ProxySQL</title>
    <link href="https://cgdeeplearn.github.io/2018/07/31/MySQL-Datamask-ProxySQL/"/>
    <id>https://cgdeeplearn.github.io/2018/07/31/MySQL-Datamask-ProxySQL/</id>
    <published>2018-07-31T01:59:54.000Z</published>
    <updated>2018-10-23T02:17:11.228Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">MySQL datamasking using ProxySQL<br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><div class="note info"><p><br><br>Environment overview:<br><br>- OS: CentOS 7<br><br>- MySQL: 5.5<br><br>- ProxySQL: 1.4.9<br><br>- ProxySQL host IP: 192.168.48.100<br><br>- MySQL primary IP: 192.168.48.120<br><br></p></div><a id="more"></a><h2 id="场景"><a href="#场景" class="headerlink" title="场景"></a>Scenario</h2><h3 id="描述"><a href="#描述" class="headerlink" title="描述"></a>Description</h3><ul><li>A customers table holds sensitive information such as (fake) credit card data.</li><li>Developers and testers do not actually need sensitive fields such as credit card numbers.</li></ul><h3 id="需求"><a href="#需求" class="headerlink" title="需求"></a>Requirements</h3><ul><li>Developers and testers can access the data through ProxySQL.</li><li>Developers and testers can query every column, but columns holding sensitive information must be masked.</li><li>Developers and testers cannot run SELECT * on specific tables.</li></ul><p>Sample customers table:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">+<span class="comment">----+-----------+-------------+------------+------------------+----------+</span></span><br><span class="line">| id | firstname | lastname    | cc_type    | cc_num           | cc_verif |</span><br><span class="line">+<span class="comment">----+-----------+-------------+------------+------------------+----------+</span></span><br><span class="line">|  1 | Frederic  | Descamps    | mastercard | 5275653223285289 |      456 |</span><br><span class="line">|  8 | Dim0      | Vanoverbeke | mastercard | 5345654523285289 |      123 |</span><br><span class="line">| 15 | Kenny     |  Gryp       |  visa      | 4916066793184589 |      456 |</span><br><span class="line">+<span 
class="comment">----+-----------+-------------+------------+------------------+----------+</span></span><br></pre></td></tr></table></figure><p>We can create this test customers table on the backend MySQL primary (192.168.48.120):</p><ul><li>Create the account</li></ul><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">CREATE</span> <span class="keyword">USER</span> <span class="string">'proxysql'</span>@<span class="string">'192.168.48.120'</span> <span class="keyword">IDENTIFIED</span> <span class="keyword">BY</span> <span class="string">'123456'</span>;</span><br></pre></td></tr></table></figure><ul><li>Create the table</li></ul><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">create</span> <span class="keyword">database</span> <span class="keyword">test</span>;</span><br><span class="line"><span class="keyword">create</span> <span class="keyword">table</span> test.customers</span><br><span class="line">(</span><br><span class="line">  <span class="keyword">id</span> <span class="built_in">int</span>(<span class="number">3</span>) <span class="keyword">not</span> <span class="literal">null</span> primary <span class="keyword">key</span>,</span><br><span class="line">  firstname <span class="built_in">varchar</span>(<span class="number">20</span>) <span class="keyword">not</span> <span class="literal">null</span>,</span><br><span class="line">  lastname <span class="built_in">varchar</span>(<span class="number">20</span>) <span class="keyword">not</span> <span class="literal">null</span>,</span><br><span class="line">  cc_type 
<span class="built_in">varchar</span>(<span class="number">20</span>) <span class="keyword">not</span> <span class="literal">null</span>,</span><br><span class="line">  cc_num <span class="built_in">varchar</span>(<span class="number">50</span>) <span class="keyword">not</span> <span class="literal">null</span>,</span><br><span class="line">  cc_verif <span class="built_in">int</span>(<span class="number">3</span>)</span><br><span class="line">);</span><br></pre></td></tr></table></figure><ul><li>Grant privileges</li></ul><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">GRANT</span> ALL <span class="keyword">ON</span> test.customers <span class="keyword">TO</span> <span class="string">'proxysql'</span>@<span class="string">'192.168.48.120'</span>;</span><br><span class="line"></span><br><span class="line"><span class="keyword">FLUSH</span> <span class="keyword">PRIVILEGES</span>;</span><br></pre></td></tr></table></figure><h2 id="ProxySQL"><a href="#ProxySQL" class="headerlink" title="ProxySQL"></a>ProxySQL</h2><h3 id="安装ProxySQL"><a href="#安装ProxySQL" class="headerlink" title="安装ProxySQL"></a>Install ProxySQL</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#</span> ProxySQL depends on some Perl libraries, so install it with yum</span><br><span class="line">wget https://github.com/sysown/proxysql/releases/download/v1.4.9/proxysql-1.4.9-3-centos7.x86_64.rpm</span><br><span class="line">yum install -y proxysql-1.4.9-3-centos7.x86_64.rpm</span><br></pre></td></tr></table></figure><h3 id="启动ProxySQL"><a href="#启动ProxySQL" class="headerlink" title="启动ProxySQL"></a>Start ProxySQL</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">/etc/init.d/proxysql start</span><br><span class="line"><span class="meta">#</span> ProxySQL listens for client connections on port 6033 and for admin connections on port 6032</span><br><span class="line"><span class="meta">#</span> Connect to the ProxySQL admin interface to configure it:</span><br><span class="line">mysql -uadmin -padmin -h127.0.0.1 -P6032</span><br><span class="line"><span class="meta">#</span> The default admin username and password are both admin; after logging in, you can change them by updating the corresponding variables</span><br></pre></td></tr></table></figure><h3 id="添加后端的mysql主机"><a href="#添加后端的mysql主机" class="headerlink" title="添加后端的mysql主机"></a>Add the backend MySQL host</h3><p>Replace the MySQL server IP below with the IP of your own MySQL server.</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">ProxySQL&gt; INSERT INTO mysql_servers(hostgroup_id,hostname,port) </span><br><span class="line">          VALUES (1,'192.168.48.120',3306);</span><br><span class="line"></span><br><span class="line"><span class="keyword">select</span> * <span class="keyword">from</span> mysql_servers;</span><br><span class="line"></span><br><span class="line">MySQL [(none)]&gt; select * from mysql_servers;</span><br><span class="line">+<span class="comment">--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+</span></span><br><span class="line">| hostgroup_id | hostname  | port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | <span class="keyword">comment</span> |</span><br><span class="line">+<span 
class="comment">--------------+-----------+------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+</span></span><br><span class="line">| <span class="number">1</span>            | <span class="number">192.168</span><span class="number">.48</span><span class="number">.120</span> | <span class="number">3306</span> | <span class="keyword">ONLINE</span> | <span class="number">1</span>      | <span class="number">0</span>           | <span class="number">1000</span>            | <span class="number">0</span>                   | <span class="number">0</span>       | <span class="number">0</span>              |         |</span><br></pre></td></tr></table></figure><h3 id="添加可以访问后端主机的账号"><a href="#添加可以访问后端主机的账号" class="headerlink" title="添加可以访问后端主机的账号"></a>Add an account that can access the backend host</h3><p>On the MySQL primary (192.168.48.120), add the proxysql account with its password and grant it privileges.</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">GRANT</span> ALL <span class="keyword">ON</span> *.* <span class="keyword">TO</span> <span class="string">'proxysql'</span>@<span class="string">'192.168.48.120'</span> <span class="keyword">IDENTIFIED</span> <span class="keyword">BY</span> <span class="string">'123456'</span>;</span><br><span class="line"><span class="comment">-- On the ProxySQL server (192.168.48.100), add the account that can read and write the backend MySQL server</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">insert</span> <span class="keyword">into</span> mysql_users(username,<span class="keyword">password</span>,default_hostgroup,transaction_persistent)<span class="keyword">values</span>(<span class="string">'proxysql'</span>,<span class="string">'123456'</span>,<span class="number">1</span>,<span 
class="number">1</span>);</span><br><span class="line"></span><br><span class="line">MySQL [(none)]&gt; insert into mysql_users(username,password,default_hostgroup,transaction_persistent)values('proxysql','123456',1,1);</span><br><span class="line">Query OK, 1 row affected (0.00 sec)</span><br></pre></td></tr></table></figure><p>Add the account just created to the mysql_users table on the ProxySQL host; ProxySQL clients use this account to access the database.</p><ul><li>default_hostgroup is set to the write group, which is 1.</li><li>When no read/write-splitting routing rule matches, queries go to the database in the default group.</li></ul><p>Load the changes we just made into RUNTIME (see ProxySQL's multi-layer configuration structure):</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">load</span> mysql <span class="keyword">users</span> <span class="keyword">to</span> runtime;</span><br><span class="line"><span class="keyword">load</span> mysql servers <span class="keyword">to</span> runtime;</span><br><span class="line">save mysql users to disk;</span><br><span class="line">save mysql servers to disk;</span><br></pre></td></tr></table></figure><h2 id="DataMasking"><a href="#DataMasking" class="headerlink" title="DataMasking"></a>DataMasking</h2><p>ProxySQL supports query rewriting (Query Rewrite). To rewrite a query, you must match the original statement (using match_pattern), because the original statement is what gets rewritten.</p><h3 id="添加查询规则"><a href="#添加查询规则" class="headerlink" title="添加查询规则"></a>Add query rules</h3><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">ProxySQL&gt; INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,error_msg)</span><br><span class="line">          VALUES (90,1,'proxysql','^<span class="keyword">SELECT</span> \*.*FROM.*customers<span class="string">',</span></span><br><span class="line"><span 
class="string">          '</span><span class="keyword">Query</span> <span class="keyword">not</span> allowed due <span class="keyword">to</span> sensitive information, please contact dba@myapp.com<span class="string">');</span></span><br><span class="line"><span class="string">-- Load the rule into runtime and test it</span></span><br><span class="line"><span class="string">ProxySQL&gt; LOAD MYSQL QUERY RULES TO RUNTIME;</span></span><br></pre></td></tr></table></figure><ul><li>Open another terminal and log in through port 6033 (the data port):</li></ul><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">mysql -uproxysql -p123456 -h 192.168.48.100 -P 6033</span><br></pre></td></tr></table></figure><ul><li>Run a SELECT * query:</li></ul><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">mysql&gt; select * from test.customers;</span><br><span class="line">ERROR 1148 (42000): Query not allowed due to sensitive information, please contact dba@myapp.com</span><br></pre></td></tr></table></figure><p>Yeah! The configured mysql_query_rules successfully blocked <code>SELECT *</code> on the customers table.</p><ul><li>Next, insert the following query rule over the admin connection:</li></ul><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">ProxySQL&gt; INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply)</span><br><span 
class="line">          VALUES (1,1,'proxysql','^[sS][eE][lL][eE][cC][tT] (.*)cc_num([ ,])(.*)', </span><br><span class="line">                "<span class="keyword">SELECT</span> \<span class="number">1</span><span class="keyword">CONCAT</span>(<span class="keyword">REPEAT</span>(<span class="string">'X'</span>,<span class="number">12</span>),<span class="keyword">RIGHT</span>(cc_num,<span class="number">4</span>)) cc_num\<span class="number">2</span>\<span class="number">3</span><span class="string">",1);</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">ProxySQL&gt; LOAD MYSQL QUERY RULES TO RUNTIME;</span></span><br><span class="line"><span class="string">-- Test again over the data connection:</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">mysql&gt; select firstname, cc_num from test.customers;</span></span><br><span class="line"><span class="string">+-----------+------------------+</span></span><br><span class="line"><span class="string">| firstname | cc_num           |</span></span><br><span class="line"><span class="string">+-----------+------------------+</span></span><br><span class="line"><span class="string">| Frederic  | XXXXXXXXXXXX5289 |</span></span><br><span class="line"><span class="string">| Dim0      | XXXXXXXXXXXX5289 |</span></span><br><span class="line"><span class="string">| Kenny     | XXXXXXXXXXXX4589 |</span></span><br><span class="line"><span class="string">+-----------+------------------+</span></span><br></pre></td></tr></table></figure><p>Woohoo! Only the last four digits of the card number are shown now.</p><ul><li>Save the rules to disk</li></ul><figure class="highlight"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ProxySQL&gt; SAVE MYSQL QUERY RULES TO DISK;</span><br></pre></td></tr></table></figure><h2 id="更多"><a href="#更多" class="headerlink" 
title="更多"></a>More</h2><p>To mask more tables and columns (for example, hiding name fields), write additional query rules and add them to the mysql_query_rules table over the admin connection.</p>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;MySQL datamasking using ProxySQL&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;div class=&quot;note info&quot;&gt;&lt;p&gt;&lt;br&gt;&lt;br&gt;Environment overview:&lt;br&gt;&lt;br&gt;- OS: CentOS 7&lt;br&gt;&lt;br&gt;- MySQL: 5.5&lt;br&gt;&lt;br&gt;- ProxySQL: 1.4.9&lt;br&gt;&lt;br&gt;- ProxySQL host IP: 192.168.48.100&lt;br&gt;&lt;br&gt;- MySQL primary IP: 192.168.48.120&lt;br&gt;&lt;br&gt;&lt;/p&gt;&lt;/div&gt;
    
    </summary>
    
      <category term="Linux" scheme="https://cgdeeplearn.github.io/categories/Linux/"/>
    
    
      <category term="MySQL" scheme="https://cgdeeplearn.github.io/tags/MySQL/"/>
    
      <category term="ProxySQL" scheme="https://cgdeeplearn.github.io/tags/ProxySQL/"/>
    
      <category term="DataMask" scheme="https://cgdeeplearn.github.io/tags/DataMask/"/>
    
  </entry>
  
  <entry>
    <title>Use Multiple Inheritance Only for Mix-in Utility Classes</title>
    <link href="https://cgdeeplearn.github.io/2018/06/11/Mix-in-inherit/"/>
    <id>https://cgdeeplearn.github.io/2018/06/11/Mix-in-inherit/</id>
    <published>2018-06-11T08:37:57.000Z</published>
    <updated>2018-06-12T03:04:36.515Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Python is an object-oriented language, and it provides built-in mechanisms that let developers implement multiple inheritance reasonably well. Even so, we should avoid multiple inheritance wherever possible.<br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><div class="note info"><p><br><br>If you really want the convenience and encapsulation that multiple inheritance brings, write a <code>mix-in</code> class. A <code>mix-in</code> is a small class that only defines a set of additional methods other classes may need; it defines no instance attributes of its own, and it does not require users to call its <code>__init__</code> constructor.<br><br></p></div><a id="more"></a><h2 id="例子1-ToDictMixin"><a href="#例子1-ToDictMixin" class="headerlink" title="例子1:ToDictMixin"></a>Example 1: ToDictMixin</h2><p>Suppose we want to convert in-memory Python objects into dictionaries so that they can be serialized. We might as well write this as generic code that other classes can reuse.</p><h3 id="ToDictMixin"><a href="#ToDictMixin" class="headerlink" title="ToDictMixin"></a>ToDictMixin</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># The ToDictMixin class</span></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">ToDictMixin</span><span class="params">(object)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span 
class="title">to_dict</span><span class="params">(self)</span>:</span></span><br><span class="line">        <span class="comment"># Use __dict__ to access the instance's internal dictionary</span></span><br><span class="line">        <span class="keyword">return</span> self._traverse_dict(self.__dict__)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_traverse_dict</span><span class="params">(self, instance_dict)</span>:</span></span><br><span class="line">        output = &#123;&#125;</span><br><span class="line">        <span class="keyword">for</span> key, value <span class="keyword">in</span> instance_dict.items():</span><br><span class="line">            output[key] = self._traverse(key, value)</span><br><span class="line">        <span class="keyword">return</span> output</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_traverse</span><span class="params">(self, key, value)</span>:</span></span><br><span class="line">        <span class="comment"># Handle each value according to its type</span></span><br><span class="line">        <span class="keyword">if</span> isinstance(value, ToDictMixin):</span><br><span class="line">            <span class="keyword">return</span> value.to_dict()</span><br><span class="line">        <span class="keyword">elif</span> isinstance(value, dict):</span><br><span class="line">            <span class="keyword">return</span> self._traverse_dict(value)</span><br><span class="line">        <span class="keyword">elif</span> isinstance(value, list):</span><br><span class="line">            <span class="keyword">return</span> [self._traverse(key, i) <span class="keyword">for</span> i <span class="keyword">in</span> value]</span><br><span class="line">        <span class="keyword">elif</span> hasattr(value, <span class="string">'__dict__'</span>):</span><br><span class="line">            <span 
class="keyword">return</span> self._traverse_dict(value.__dict__)</span><br><span class="line">        <span class="keyword">else</span>:</span><br><span class="line">            <span class="keyword">return</span> value</span><br></pre></td></tr></table></figure><h3 id="BinaryTree"><a href="#BinaryTree" class="headerlink" title="BinaryTree"></a>BinaryTree</h3><p>Use <code>ToDictMixin</code> to represent a binary tree as a dictionary:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Binary tree class</span></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">BinaryTree</span><span class="params">(ToDictMixin)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, value, left=None, right=None)</span>:</span></span><br><span class="line">        self.value = value</span><br><span class="line">        self.left = left</span><br><span class="line">        self.right = right</span><br></pre></td></tr></table></figure><p>Now a whole group of interrelated Python objects can easily be converted into a dictionary:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">tree = BinaryTree(<span class="number">1</span>,</span><br><span class="line">    left=BinaryTree(<span class="number">2</span>, right=BinaryTree(<span class="number">3</span>)),</span><br><span class="line">    right=BinaryTree(<span class="number">4</span>, left=BinaryTree(<span class="number">5</span>)))</span><br><span 
class="line">print(tree.to_dict())</span><br><span class="line">&gt;&gt;&gt;</span><br><span class="line">&#123;<span class="string">'value'</span>: <span class="number">1</span>, <span class="string">'right'</span>: &#123;<span class="string">'value'</span>: <span class="number">4</span>, <span class="string">'right'</span>: <span class="keyword">None</span>, <span class="string">'left'</span>: &#123;<span class="string">'value'</span>: <span class="number">5</span>, <span class="string">'right'</span>: <span class="keyword">None</span>, <span class="string">'left'</span>: <span class="keyword">None</span>&#125;&#125;, <span class="string">'left'</span>: &#123;<span class="string">'value'</span>: <span class="number">2</span>, <span class="string">'right'</span>: &#123;<span class="string">'value'</span>: <span class="number">3</span>, <span class="string">'right'</span>: <span class="keyword">None</span>, <span class="string">'left'</span>: <span class="keyword">None</span>&#125;, <span class="string">'left'</span>: <span class="keyword">None</span>&#125;&#125;</span><br></pre></td></tr></table></figure><h3 id="BinaryTreeWithParent"><a href="#BinaryTreeWithParent" class="headerlink" title="BinaryTreeWithParent"></a>BinaryTreeWithParent</h3><p>The biggest advantage of a <code>mix-in</code> is that users can plug in this generic functionality at any time and override it when necessary.</p><p>The <code>BinaryTree</code> subclass defined below holds a reference to its parent node. If the default <code>ToDictMixin.to_dict</code> processes it, the program loops forever because of the circular reference (parent).</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">BinaryTreeWithParent</span><span class="params">(BinaryTree)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, 
value, left=None, right=None, parent=None)</span>:</span></span><br><span class="line">        super().__init__(value, left=left, right=right)</span><br><span class="line">        self.parent = parent</span><br></pre></td></tr></table></figure><p>The fix is to override the <code>ToDictMixin._traverse</code> method in <code>BinaryTreeWithParent</code> so that it only processes the values that matter for serialization, which keeps the mix-in implementation from looping forever:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Override _traverse so it no longer walks into the parent node and only inserts the parent's value into the resulting dictionary</span></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">BinaryTreeWithParent</span><span class="params">(BinaryTree)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, value, left=None, right=None, parent=None)</span>:</span></span><br><span class="line">        super().__init__(value, left=left, right=right)</span><br><span class="line">        self.parent = parent</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">_traverse</span><span class="params">(self, key, value)</span>:</span></span><br><span class="line">        <span class="keyword">if</span> (isinstance(value, BinaryTreeWithParent) <span class="keyword">and</span> key == <span class="string">'parent'</span>):</span><br><span class="line">            <span class="keyword">return</span> value.value  <span class="comment"># Return the value of the parent node</span></span><br><span 
class="line">        <span class="keyword">else</span>:</span><br><span class="line">            <span class="keyword">return</span> super()._traverse(key, value)</span><br></pre></td></tr></table></figure><p>Call <code>BinaryTreeWithParent.to_dict</code> to see the result:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">root = BinaryTreeWithParent(<span class="number">1</span>)</span><br><span class="line">root.left = BinaryTreeWithParent(<span class="number">2</span>, parent=root)</span><br><span class="line">root.left.right = BinaryTreeWithParent(<span class="number">4</span>, parent=root.left)</span><br><span class="line">print(root.to_dict())</span><br><span class="line">&gt;&gt;&gt;</span><br><span class="line">&#123;<span class="string">'value'</span>: <span class="number">1</span>, <span class="string">'right'</span>: <span class="keyword">None</span>, <span class="string">'left'</span>: &#123;<span class="string">'value'</span>: <span class="number">2</span>, <span class="string">'right'</span>: &#123;<span class="string">'value'</span>: <span class="number">4</span>, <span class="string">'right'</span>: <span class="keyword">None</span>, <span class="string">'left'</span>: <span class="keyword">None</span>, <span class="string">'parent'</span>: <span class="number">2</span>&#125;, <span class="string">'left'</span>: <span class="keyword">None</span>, <span class="string">'parent'</span>: <span class="number">1</span>&#125;, <span class="string">'parent'</span>: <span class="keyword">None</span>&#125;</span><br></pre></td></tr></table></figure><p>With <code>BinaryTreeWithParent._traverse</code> defined, any other class that has an attribute of type <code>BinaryTreeWithParent</code> is handled automatically by <code>ToDictMixin</code>:</p><figure class="highlight 
python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">NamedSubTree</span><span class="params">(ToDictMixin)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, name, tree_with_parent)</span>:</span></span><br><span class="line">        self.name = name</span><br><span class="line">        self.tree_with_parent = tree_with_parent</span><br><span class="line"></span><br><span class="line">my_tree = NamedSubTree(<span class="string">'foobar'</span>, root.left.right)  <span class="comment"># root as defined above</span></span><br><span class="line">print(my_tree.to_dict())</span><br><span class="line">&gt;&gt;&gt;</span><br><span class="line">&#123;<span class="string">'name'</span>: <span class="string">'foobar'</span>, <span class="string">'tree_with_parent'</span>: &#123;<span class="string">'value'</span>: <span class="number">4</span>, <span class="string">'right'</span>: <span class="keyword">None</span>, <span class="string">'left'</span>: <span class="keyword">None</span>, <span class="string">'parent'</span>: <span class="number">2</span>&#125;&#125;</span><br></pre></td></tr></table></figure><h2 id="多个mix-in组合"><a href="#多个mix-in组合" class="headerlink" title="多个mix-in组合"></a>Combining multiple mix-ins</h2><h3 id="JsonMixin"><a href="#JsonMixin" class="headerlink" title="JsonMixin"></a>JsonMixin</h3><p>Mix-ins can also be composed with one another. For example, we can write a mix-in that provides generic JSON serialization for any class. We can assume that the class inheriting the mix-in provides a method named to_dict (which it may or may not get by also inheriting from ToDictMixin).</p><figure class="highlight 
python"><figcaption><span>JsonMixin</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># import json first</span></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">JsonMixin</span><span class="params">(object)</span>:</span></span><br><span class="line"><span class="meta">    @classmethod</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">from_json</span><span class="params">(cls, data)</span>:</span></span><br><span class="line">        kwargs = json.loads(data)</span><br><span class="line">        <span class="keyword">return</span> cls(**kwargs)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">to_json</span><span class="params">(self)</span>:</span></span><br><span class="line">        <span class="keyword">return</span> json.dumps(self.to_dict())</span><br></pre></td></tr></table></figure><p>Note that the JsonMixin class defines both an instance method and a class method; a mix-in can provide either kind of behavior. In this example, a class that wants to inherit from JsonMixin only needs to meet two requirements:</p><ul><li>(1) it has a method named to_dict</li><li>(2) its <code>__init__</code> method accepts keyword arguments</li></ul><h3 id="组合ToDictMixin和JsonMixin"><a href="#组合ToDictMixin和JsonMixin" class="headerlink" title="组合ToDictMixin和JsonMixin"></a>Combining ToDictMixin and JsonMixin</h3><p>We use the following data classes, which inherit from the mix-ins, to represent the topology of a data center:</p><figure class="highlight python"><figcaption><span>DatacenterRack</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">DatacenterRack</span><span class="params">(ToDictMixin, JsonMixin)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, switch=None, machines=None)</span>:</span></span><br><span class="line">        self.switch = Switch(**switch)</span><br><span class="line">        self.machines = [Machine(**kwargs) <span class="keyword">for</span> kwargs <span class="keyword">in</span> machines]</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Switch</span><span class="params">(ToDictMixin, JsonMixin)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, **kwargs)</span>:</span></span><br><span class="line">        super().__init__()</span><br><span class="line">        <span class="comment"># accept the keyword arguments and store them as attributes</span></span><br><span class="line">        <span class="keyword">for</span> k, w <span class="keyword">in</span> kwargs.items():</span><br><span class="line">            setattr(self, k, w)</span><br><span class="line"></span><br><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Machine</span><span class="params">(ToDictMixin, JsonMixin)</span>:</span></span><br><span 
class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, **kwargs)</span>:</span></span><br><span class="line">        super().__init__()</span><br><span class="line">        <span class="comment"># accept the keyword arguments and store them as attributes</span></span><br><span class="line">        <span class="keyword">for</span> k, w <span class="keyword">in</span> kwargs.items():</span><br><span class="line">            setattr(self, k, w)</span><br></pre></td></tr></table></figure><p>Serializing these classes to JSON and loading them back from JSON is straightforward. The code below performs a serialize/deserialize round trip to verify that both operations are implemented correctly.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">serialized = <span class="string">"""&#123;</span></span><br><span class="line"><span class="string">    "switch": &#123;"ports": 5,"speed":1e9&#125;,</span></span><br><span class="line"><span class="string">    "machines": [</span></span><br><span class="line"><span class="string">        &#123;"cores": 8, "ram": 32e9, "disk": 5e12&#125;,</span></span><br><span class="line"><span class="string">        &#123;"cores": 4, "ram": 16e9, "disk": 5e12&#125;,</span></span><br><span class="line"><span class="string">        &#123;"cores": 2, "ram": 4e9, "disk": 500e9&#125;</span></span><br><span class="line"><span class="string">    ]</span></span><br><span class="line"><span class="string">&#125;"""</span></span><br><span class="line"></span><br><span class="line">deserialized = DatacenterRack.from_json(serialized)</span><br><span class="line">roundtrip = deserialized.to_json()</span><br><span 
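class="line"><span class="keyword">assert</span> json.loads(serialized) == json.loads(roundtrip)</span><br></pre></td></tr></table></figure><p>When using mix-ins like these, a class can inherit from several mix-ins directly, as in this example, or other classes in its hierarchy can inherit the relevant mix-ins first and the class can then inherit from those classes, to the same effect.</p><h2 id="小结"><a href="#小结" class="headerlink" title="小结"></a>Summary</h2><ul><li>If mix-ins can achieve the result, don't use multiple inheritance for it.</li><li>Implement each piece of functionality as a pluggable mix-in, then let each class inherit the mix-ins it needs to customize the behavior of its instances.</li><li>Encapsulate simple behaviors in mix-ins, then compose multiple mix-ins to build complex behavior.</li></ul>]]></content>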
    
    <summary type="html">
    
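      &lt;p class=&quot;description&quot;&gt;Python is an object-oriented programming language with built-in mechanisms that let developers implement multiple inheritance reasonably well. Even so, we should avoid multiple inheritance wherever possible.&lt;br&gt;&lt;/p&gt;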

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
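&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;div class=&quot;note info&quot;&gt;&lt;p&gt;&lt;br&gt;&lt;br&gt;If you do want the convenience and encapsulation that multiple inheritance offers, write &lt;code&gt;mix-in&lt;/code&gt; classes instead. A &lt;code&gt;mix-in&lt;/code&gt; is a small class that only defines a set of additional methods that other classes may need to provide; it defines no instance attributes of its own, and it does not require users to call its &lt;code&gt;__init__&lt;/code&gt; constructor.&lt;br&gt;&lt;br&gt;&lt;/p&gt;&lt;/div&gt;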
    
    </summary>
    
      <category term="Python" scheme="https://cgdeeplearn.github.io/categories/Python/"/>
    
    
      <category term="Mix-in" scheme="https://cgdeeplearn.github.io/tags/Mix-in/"/>
    
      <category term="inherit" scheme="https://cgdeeplearn.github.io/tags/inherit/"/>
    
  </entry>
  
  <entry>
    <title>Setting Up a Personal GitLab Server on Ubuntu</title>
    <link href="https://cgdeeplearn.github.io/2018/05/10/start-gitlab/"/>
    <id>https://cgdeeplearn.github.io/2018/05/10/start-gitlab/</id>
    <published>2018-05-10T06:21:22.000Z</published>
    <updated>2018-05-10T06:34:46.176Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">This post records the steps for setting up a GitLab server on Ubuntu 16.04.<br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><blockquote><p>GitLab is an open-source project for repository management. It uses Git as the version-control tool and builds a web service on top of it. Public and private projects can be accessed through the web interface. It offers features similar to GitHub: browsing source code, managing issues and comments, and controlling team access to repositories. It makes it easy to browse committed versions and provides a file history, and team members can communicate through a simple built-in chat program (Wall). It also provides a code-snippet collection feature that makes code reuse easy.<br><a id="more"></a></p></blockquote><h2 id="安装依赖包"><a href="#安装依赖包" class="headerlink" title="安装依赖包"></a>Install the dependencies</h2><figure class="highlight cmd"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo apt-get install curl openssh-server ca-certificates postfix</span><br></pre></td></tr></table></figure><h2 id="执行完成后-出现邮件配置，选择Internet-Site这一项，确定"><a href="#执行完成后-出现邮件配置，选择Internet-Site这一项，确定" class="headerlink" title="执行完成后,出现邮件配置，选择Internet Site这一项，确定"></a>When the mail configuration prompt appears after the command finishes, select <code>Internet Site</code> and confirm</h2><h2 id="添加清华镜像源"><a href="#添加清华镜像源" class="headerlink" title="添加清华镜像源"></a>Add the Tsinghua mirror</h2><ul><li>Add GitLab's GPG public key</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">curl https://packages.gitlab.com/gpg.key 2&gt; /dev/null | sudo apt-key add - &amp;&gt;/dev/null</span><br></pre></td></tr></table></figure><ul><li>Add the package source</li></ul><p>Run <code>sudo vim /etc/apt/sources.list.d/gitlab-ce.list</code> and add the following line:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">deb https://mirrors.tuna.tsinghua.edu.cn/gitlab-ce/ubuntu xenial main</span><br></pre></td></tr></table></figure><h2 id="安装gitlab-ce"><a href="#安装gitlab-ce" class="headerlink" title="安装gitlab-ce"></a>Install gitlab-ce</h2><figure class="highlight plain"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sudo apt-get update</span><br><span class="line">sudo apt-get install gitlab-ce</span><br></pre></td></tr></table></figure><p>The download is a little over 400 MB and the installed package takes more than 1 GB of disk space, so make sure the server has enough room.</p><h2 id="启动各项服务"><a href="#启动各项服务" class="headerlink" title="启动各项服务"></a>Start the services</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo gitlab-ctl reconfigure</span><br></pre></td></tr></table></figure><h2 id="检查GitLab是否安装好并且已经正确运行"><a href="#检查GitLab是否安装好并且已经正确运行" class="headerlink" title="检查GitLab是否安装好并且已经正确运行"></a>Check that GitLab is installed and running correctly</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo gitlab-ctl status</span><br></pre></td></tr></table></figure><p>Output like the following means GitLab is running normally:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">run: gitaly: (pid 25122) 1215s; run: log: (pid 17658) 2960s</span><br><span class="line">run: gitlab-monitor: (pid 25134) 1214s; run: log: (pid 17872) 2948s</span><br><span class="line">run: gitlab-workhorse: (pid 25139) 1213s; run: log: (pid 17594) 2974s</span><br><span class="line">run: logrotate: (pid 25153) 1211s; run: log: (pid 17626) 2966s</span><br><span class="line">run: nginx: (pid 26851) 857s; run: log: (pid 17610) 2972s</span><br><span class="line">run: node-exporter: 
(pid 25180) 1210s; run: log: (pid 17842) 2954s</span><br><span class="line">run: postgres-exporter: (pid 25190) 1209s; run: log: (pid 17956) 2934s</span><br><span class="line">run: postgresql: (pid 25201) 1207s; run: log: (pid 17322) 3028s</span><br><span class="line">run: prometheus: (pid 25210) 1204s; run: log: (pid 17914) 2940s</span><br><span class="line">run: redis: (pid 25226) 1201s; run: log: (pid 17256) 3034s</span><br><span class="line">run: redis-exporter: (pid 25309) 1199s; run: log: (pid 17888) 2946s</span><br><span class="line">run: sidekiq: (pid 26311) 1071s; run: log: (pid 17576) 2976s</span><br><span class="line">run: unicorn: (pid 26662) 933s; run: log: (pid 17532) 2982s</span><br></pre></td></tr></table></figure><h2 id="配置gitlab-external-url访问规则"><a href="#配置gitlab-external-url访问规则" class="headerlink" title="配置gitlab external_url访问规则"></a>Configure GitLab's external_url</h2><ul><li>Edit the gitlab.rb configuration file</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo vim /etc/gitlab/gitlab.rb</span><br></pre></td></tr></table></figure><figure class="highlight diff"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="deletion">- external_url 'http://gitlab.example.com'</span></span><br><span class="line"><span class="addition">+ external_url 'http://192.168.2.200:9876/' </span></span><br><span class="line"><span class="addition">+ ## 192.168.2.200 is the server address; replace it with your own</span></span><br><span class="line"><span class="addition">+ ## 9876 is a custom port; the default is port 80.</span></span><br></pre></td></tr></table></figure><p>After saving the file, run <code>sudo gitlab-ctl reconfigure</code> again so that the new <code>external_url</code> takes effect.</p><ul><li>Add firewall rules</li></ul><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br></pre></td><td class="code"><pre><span class="line">## Open the custom port (9876, as configured above)</span><br><span class="line">sudo iptables -A INPUT -p tcp -m tcp --dport 9876 -j ACCEPT</span><br><span class="line">## If you kept the default port above, open port 80 instead</span><br><span class="line">## sudo iptables -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT</span><br></pre></td></tr></table></figure><h2 id="启动sshd和postfix服务"><a href="#启动sshd和postfix服务" class="headerlink" title="启动sshd和postfix服务"></a>Start the <code>sshd</code> and <code>postfix</code> services</h2><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">sudo service sshd start</span><br><span class="line"></span><br><span class="line">sudo service postfix start</span><br></pre></td></tr></table></figure><h2 id="浏览页面并设置密码"><a href="#浏览页面并设置密码" class="headerlink" title="浏览页面并设置密码"></a>Open the web page and set a password</h2><p>Open <code>http://192.168.2.200:9876</code> in a browser (or <code>http://192.168.2.200</code> if you kept the default port), replacing <code>192.168.2.200</code> with your server's address.</p><p><img src="https://github.com/cgDeepLearn/LinuxSetups/blob/master/pics/gitlab-changepass.png?raw=true" alt="change-pass"></p><p>On the first visit a change-password page appears.</p><p>Enter a password and confirm it. The default user is <code>root</code>.</p><p>After that you can change the username and password, register new users, and so on.</p><h2 id="创建组、项目"><a href="#创建组、项目" class="headerlink" title="创建组、项目"></a>Create groups and projects</h2><p><img src="https://github.com/cgDeepLearn/LinuxSetups/blob/master/pics/gitlab-create.png?raw=true" alt="git-create"></p><p>You can also import projects from GitHub and other hosts.</p><h2 id="添加ssh-key等"><a href="#添加ssh-key等" class="headerlink" title="添加ssh-key等"></a>Add SSH keys, etc.</h2><p>This works the same way as on GitHub.</p>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;This post records the steps for setting up a GitLab server on Ubuntu 16.04.&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
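&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;blockquote&gt;
&lt;p&gt;GitLab is an open-source project for repository management. It uses Git as the version-control tool and builds a web service on top of it. Public and private projects can be accessed through the web interface. It offers features similar to GitHub: browsing source code, managing issues and comments, and controlling team access to repositories. It makes it easy to browse committed versions and provides a file history, and team members can communicate through a simple built-in chat program (Wall). It also provides a code-snippet collection feature that makes code reuse easy.&lt;br&gt;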
    
    </summary>
    
      <category term="Linux" scheme="https://cgdeeplearn.github.io/categories/Linux/"/>
    
    
      <category term="gitlab" scheme="https://cgdeeplearn.github.io/tags/gitlab/"/>
    
      <category term="git" scheme="https://cgdeeplearn.github.io/tags/git/"/>
    
      <category term="Ubuntu" scheme="https://cgdeeplearn.github.io/tags/Ubuntu/"/>
    
  </entry>
  
  <entry>
    <title>Bootstrap Your Django Project with Cookiecutter, Awesome!!!</title>
    <link href="https://cgdeeplearn.github.io/2018/03/19/Cookiecutter-Django/"/>
    <id>https://cgdeeplearn.github.io/2018/03/19/Cookiecutter-Django/</id>
    <published>2018-03-19T06:53:10.000Z</published>
    <updated>2018-03-19T07:16:19.598Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Give it a try, it's simply awesome<br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><p><div class="note info"><p><br>Cookiecutter lets you quickly create projects from templates, and cookiecutter-django is its template for Django, which quickly generates the skeleton of a large <code>Django</code> project. Its features are as follows:</p><p><ul><br>  <li>Cross-platform: Windows, Mac, and Linux are all supported</li><br>  <li>Runs on Python 2.7, 3.3, 3.4, 3.5, 3.6, and PyPy</li><br>  <li>Project templates can be in any language</li><br>  <li>Simple and easy to use</li><br></ul><br></p></div><br><a id="more"></a></p><h2 id="安装配置Cookiecutter-django"><a href="#安装配置Cookiecutter-django" class="headerlink" title="安装配置Cookiecutter-django"></a>Install and configure Cookiecutter-django</h2><h3 id="安装cookiecutter"><a href="#安装cookiecutter" class="headerlink" title="安装cookiecutter"></a>Install cookiecutter</h3><p>First, get Cookiecutter. Trust me, it's awesome:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">$</span> pip install "cookiecutter&gt;=1.4.0"</span><br></pre></td></tr></table></figure><h3 id="生成项目"><a href="#生成项目" class="headerlink" title="生成项目"></a>Generate the project</h3><p>Then use <code>Cookiecutter-django</code> to generate a Django project:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">cookiecutter https://github.com/pydanny/cookiecutter-django.git</span><br></pre></td></tr></table></figure><p>You will be prompted for some values, for example:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br></pre></td><td class="code"><pre><span class="line">Cloning into &apos;cookiecutter-django&apos;...</span><br><span class="line">remote: Counting objects: 550, done.</span><br><span class="line">remote: Compressing objects: 100% (310/310), done.</span><br><span class="line">remote: Total 550 (delta 283), reused 479 (delta 222)</span><br><span class="line">Receiving objects: 100% (550/550), 127.66 KiB | 58 KiB/s, done.</span><br><span class="line">Resolving deltas: 100% (283/283), done.</span><br><span class="line">project_name [Project Name]: My First Django Project</span><br><span class="line">project_slug [reddit_clone]: my_first_django_project</span><br><span class="line">author_name [Daniel Roy Greenfeld]: cgDeepLearn</span><br><span class="line">email [you@example.com]: cgDeepLearn@gmail.com</span><br><span class="line">description [A short description of the 
project.]: My first Django Project with Cookiecutter</span><br><span class="line">domain_name [example.com]: myxxxx.com</span><br><span class="line">version [0.1.0]: 0.0.1</span><br><span class="line">timezone [UTC]: Asia/Shanghai</span><br><span class="line">use_whitenoise [y]: n</span><br><span class="line">use_celery [n]: y</span><br><span class="line">use_mailhog [n]: n</span><br><span class="line">use_sentry_for_error_reporting [y]: y</span><br><span class="line">use_opbeat [n]: y</span><br><span class="line">use_pycharm [n]: y</span><br><span class="line">windows [n]: n</span><br><span class="line">use_docker [y]: n</span><br><span class="line">use_heroku [n]: n</span><br><span class="line">use_compressor [n]: y</span><br><span class="line">Select postgresql_version:</span><br><span class="line">1 - 10.3</span><br><span class="line">2 - 10.2</span><br><span class="line">3 - 10.1</span><br><span class="line">4 - 9.6</span><br><span class="line">5 - 9.5</span><br><span class="line">6 - 9.4</span><br><span class="line">7 - 9.3</span><br><span class="line">Choose from 1, 2, 3, 4 [1]: 4</span><br><span class="line">Select js_task_runner:</span><br><span class="line">1 - Gulp</span><br><span class="line">2 - Grunt</span><br><span class="line">3 - None</span><br><span class="line">Choose from 1, 2, 3, 4 [1]: 1</span><br><span class="line">custom_bootstrap_compilation [n]: n</span><br><span class="line">Select open_source_license:</span><br><span class="line">1 - MIT</span><br><span class="line">2 - BSD</span><br><span class="line">3 - GPLv3</span><br><span class="line">4 - Apache Software License 2.0</span><br><span class="line">5 - Not open source</span><br><span class="line">Choose from 1, 2, 3, 4, 5 [1]: 1</span><br><span class="line">keep_local_envs_in_vcs [y]: y</span><br></pre></td></tr></table></figure><p>Choose the options that fit your needs. Note: <code>project_slug</code> is your project's name as it appears in paths.</p><h3 id="配置使用Django"><a href="#配置使用Django" class="headerlink" 
title="配置使用Django"></a>Configure and use Django</h3><p>Enter the project root directory:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">$</span> cd  my_first_django_project</span><br><span class="line"><span class="meta">$</span> ls</span><br></pre></td></tr></table></figure><h4 id="关联仓库"><a href="#关联仓库" class="headerlink" title="关联仓库"></a>Link a repository</h4><p>Create a repo on GitHub, link it to your project, and make the first push:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">$</span> git init</span><br><span class="line"><span class="meta">$</span> git add .</span><br><span class="line"><span class="meta">$</span> git commit -m "first awesome commit"</span><br><span class="line"><span class="meta">$</span> git remote add origin git@github.com:yourname/yourproject.git</span><br><span class="line"><span class="meta">$</span> git push -u origin master</span><br></pre></td></tr></table></figure><h4 id="配置Django"><a href="#配置Django" class="headerlink" title="配置Django"></a>Configure Django</h4><p>Choose the <code>Django</code> version to install (edit requirements/base.txt):</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">django==1.11.2</span><br><span class="line"># django==2.0.3 ## django 2.0+</span><br></pre></td></tr></table></figure><p>If you chose <code>Postgresql</code> as the database (for installing and using Postgresql, see <a href="https://github.com/cgDeepLearn/LinuxSetups/blob/master/docs/databases/postgresql.md" target="_blank" rel="noopener">Postgresql installation and configuration</a>), you need to install the <code>psycopg2</code> dependency:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">## requirements/local.txt</span><br><span class="line">psycopg2==2.7.4</span><br></pre></td></tr></table></figure><p>Install the dependencies inside an activated virtual environment:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ pip install -r requirements/local.txt</span><br></pre></td></tr></table></figure><h4 id="Pycharm的配置"><a href="#Pycharm的配置" class="headerlink" title="Pycharm的配置"></a>PyCharm configuration</h4><p>If you answered y to use_pycharm when generating the project, here is how to configure it.</p><ul><li>Open <code>File - Settings</code> -&gt; <code>Languages and Frameworks</code> -&gt; <code>Django</code>.</li></ul><p><img src="https://github.com/cgDeepLearn/LinuxSetups/blob/master/pics/pycharm_django_settings.png?raw=true" alt="pycharm-django-settings"></p><ul><li>Check <code>Enable Django Support</code></li></ul><p>Django needs the address of the Postgresql database. Click the <code>...</code> next to <code>Environment variables</code> and add a <code>DATABASE_URL</code> variable (note: DATABASE_URL is used in conf.setting):</p><p><img src="https://github.com/cgDeepLearn/LinuxSetups/blob/master/pics/pycharm_django_env.png?raw=true" alt="pycharm-dajngo-env"></p><ul><li>Run the Server</li></ul><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">$</span> python manage.py migrate</span><br><span class="line"><span class="meta">$</span> python manage.py createsuperuser</span><br><span class="line"><span class="meta">$</span> python manage.py runserver</span><br></pre></td></tr></table></figure><h2 id="Cookiecutter-Django英文文档"><a href="#Cookiecutter-Django英文文档" class="headerlink" title="Cookiecutter-Django英文文档"></a>Cookiecutter-Django English documentation</h2><p>Read the <a href="https://cookiecutter-django.readthedocs.io/en/latest/" target="_blank" 
rel="noopener">English guide</a>:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">https://cookiecutter-django.readthedocs.io/en/latest/</span><br></pre></td></tr></table></figure><h2 id="结束"><a href="#结束" class="headerlink" title="结束"></a>The end</h2>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;Give it a try, it's simply awesome&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;p&gt;&lt;div class=&quot;note info&quot;&gt;&lt;p&gt;&lt;br&gt;Cookiecutter lets you quickly create projects from templates, and cookiecutter-django is its template for Django, which quickly generates the skeleton of a large &lt;code&gt;Django&lt;/code&gt; project. Its features are as follows:&lt;/p&gt;
&lt;p&gt;&lt;ul&gt;&lt;br&gt;  &lt;li&gt;Cross-platform: Windows, Mac, and Linux are all supported&lt;/li&gt;&lt;br&gt;  &lt;li&gt;Runs on Python 2.7, 3.3, 3.4, 3.5, 3.6, and PyPy&lt;/li&gt;&lt;br&gt;  &lt;li&gt;Project templates can be in any language&lt;/li&gt;&lt;br&gt;  &lt;li&gt;Simple and easy to use&lt;/li&gt;&lt;br&gt;&lt;/ul&gt;&lt;br&gt;&lt;/p&gt;&lt;/div&gt;&lt;br&gt;
    
    </summary>
    
      <category term="Python" scheme="https://cgdeeplearn.github.io/categories/Python/"/>
    
      <category term="Django" scheme="https://cgdeeplearn.github.io/categories/Python/Django/"/>
    
    
      <category term="python" scheme="https://cgdeeplearn.github.io/tags/python/"/>
    
      <category term="django" scheme="https://cgdeeplearn.github.io/tags/django/"/>
    
      <category term="cookiecutter" scheme="https://cgdeeplearn.github.io/tags/cookiecutter/"/>
    
  </entry>
  
  <entry>
    <title>The Evolution of Python Coroutines: From yield/send to async/await</title>
    <link href="https://cgdeeplearn.github.io/2018/03/12/asyncio/"/>
    <id>https://cgdeeplearn.github.io/2018/03/12/asyncio/</id>
    <published>2018-03-12T01:51:27.000Z</published>
    <updated>2018-03-19T08:29:15.938Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">The evolution of Python coroutines<br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><p><div class="note info"><p></p><p>Because of Python's well-known <code>GIL</code>, only one thread can run at any moment, so for CPU-bound programs the cost of switching between threads becomes a drag, whereas I/O-bound programs are exactly where coroutines shine:</p><p>Multiple tasks run concurrently (not in parallel), and each task is suspended at the right moment (when it starts I/O) and resumed (when the I/O completes).</p><p>Coroutines in Python have come a long way, passing through roughly three stages:</p><ol><li>yield/send, evolved from the original generators</li><li>yield from (added in Python 3.3) together with @asyncio.coroutine, introduced in Python 3.4</li><li>The async/await syntax, introduced in Python 3.5</li></ol><p></p></div><br><a id="more"></a></p><h2 id="yield-send"><a href="#yield-send" class="headerlink" title="yield/send"></a>yield/send</h2><p>Let's use the Fibonacci sequence as an example.</p><h3 id="传统的方式"><a href="#传统的方式" class="headerlink" title="传统的方式"></a>The traditional approach</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">normal_fib</span><span class="params">(n)</span>:</span></span><br><span class="line">    <span class="string">"""Return the first n terms of the Fibonacci sequence."""</span></span><br><span class="line">    res = [<span class="number">0</span>] * n</span><br><span class="line">    index = <span class="number">0</span></span><br><span class="line">    a, b = <span class="number">0</span>, <span class="number">1</span></span><br><span class="line">    <span class="keyword">while</span> index &lt; n:</span><br><span class="line">        res[index] = b</span><br><span 
class="line">        a, b = b, a + b</span><br><span class="line">        index += <span class="number">1</span></span><br><span class="line">    <span class="keyword">return</span> res</span><br><span class="line"></span><br><span class="line">print(<span class="string">'-'</span>*<span class="number">10</span> + <span class="string">'test old fib'</span> + <span class="string">'-'</span>*<span class="number">10</span>)</span><br><span class="line"><span class="keyword">for</span> fib_res <span class="keyword">in</span> normal_fib(<span class="number">20</span>):</span><br><span class="line">    print(fib_res)</span><br></pre></td></tr></table></figure><p>If we only need the n-th Fibonacci number, or just want to produce the sequence one term at a time, the traditional approach above wastes memory because it builds the whole list first. This is where generators come in handy: <code>yield</code>!!!</p><h3 id="yield"><a href="#yield" class="headerlink" title="yield"></a>yield</h3><p>Let's implement the Fibonacci sequence with <code>yield</code>.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">gen_fib</span><span class="params">(n)</span>:</span></span><br><span class="line">    <span class="string">"""Fibonacci sequence generator."""</span></span><br><span class="line">    index = <span class="number">0</span></span><br><span class="line">    a, b = <span class="number">0</span>, <span class="number">1</span></span><br><span class="line">    <span class="keyword">while</span> index &lt; n:</span><br><span class="line">        <span class="keyword">yield</span> b</span><br><span class="line">        a, b = b, a + b</span><br><span class="line">   
     index += <span class="number">1</span></span><br><span class="line"></span><br><span class="line">print(<span class="string">'-'</span>*<span class="number">10</span> + <span class="string">'test yield fib'</span> + <span class="string">'-'</span>*<span class="number">10</span>)</span><br><span class="line"><span class="keyword">for</span> fib_res <span class="keyword">in</span> gen_fib(<span class="number">20</span>):</span><br><span class="line">    print(fib_res)</span><br></pre></td></tr></table></figure><p>When a function body contains a <code>yield</code> statement, Python treats it as a generator function. Calling <code>gen_fib(20)</code> does not execute the function body; it returns a generator object built from that body.</p><p>Here <code>yield</code> preserves the execution state of <code>gen_fib</code>: it suspends the computation and hands <code>b</code> back to the caller. When the generator is used in a <code>for…in</code> loop, each iteration calls <code>next()</code> on that same generator object, resuming it until it reaches the next <code>yield</code>. Once the body finishes, a <code>StopIteration</code> exception is raised; the <code>for</code> loop catches it and exits.</p><h3 id="send"><a href="#send" class="headerlink" title="send"></a>send</h3><p><code>send</code> makes the generator event-driven and turns it into a coroutine.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> time</span><br><span class="line"><span class="keyword">import</span> random</span><br><span 
class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">coro_fib</span><span class="params">(n)</span>:</span></span><br><span class="line">    <span class="string">"""Fibonacci coroutine: send it a delay in seconds and it yields the next value"""</span></span><br><span class="line">    index = <span class="number">0</span></span><br><span class="line">    a, b = <span class="number">0</span>, <span class="number">1</span></span><br><span class="line">    <span class="keyword">while</span> index &lt; n:</span><br><span class="line">        sleep_sec = <span class="keyword">yield</span> b  <span class="comment"># yield b, then bind the value passed to send() to sleep_sec</span></span><br><span class="line">        print(<span class="string">'wait &#123;&#125; secs.'</span>.format(sleep_sec))</span><br><span class="line">        time.sleep(sleep_sec)</span><br><span class="line">        a, b = b, a + b</span><br><span class="line">        index += <span class="number">1</span></span><br><span class="line"></span><br><span class="line">print(<span class="string">'-'</span>*<span class="number">10</span> + <span class="string">'test yield send'</span> + <span class="string">'-'</span>*<span class="number">10</span>)</span><br><span class="line">N = <span class="number">20</span></span><br><span class="line">cfib = coro_fib(N)</span><br><span class="line">fib_res = next(cfib)  <span class="comment"># prime the coroutine: run to the first yield and pause</span></span><br><span class="line"><span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">    print(fib_res)</span><br><span class="line">    <span class="keyword">try</span>:</span><br><span class="line">        fib_res = cfib.send(random.uniform(<span class="number">0</span>, <span class="number">0.5</span>))  <span class="comment"># drive the coroutine with send(); adjust the delay to watch the execution more clearly</span></span><br><span class="line">    <span class="keyword">except</span> StopIteration:</span><br><span class="line">        <span 
class="keyword">break</span></span><br></pre></td></tr></table></figure><p>For more details on coroutines, see <a href="coroutine.md">python coroutine</a>.</p><h3 id="yield-from"><a href="#yield-from" class="headerlink" title="yield from"></a>yield from</h3><p><code>yield from</code> is used to refactor generators. In its simplest form it can be used like this:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">copy_fib</span><span class="params">(n)</span>:</span></span><br><span class="line">    print(<span class="string">'I am copy from gen_fib'</span>)</span><br><span class="line">    <span class="keyword">yield</span> <span class="keyword">from</span> gen_fib(n)  <span class="comment"># delegate to the gen_fib generator</span></span><br><span class="line">    print(<span class="string">'Copy end'</span>)</span><br><span class="line">print(<span class="string">'-'</span>*<span class="number">10</span> + <span class="string">'test yield from'</span> + <span class="string">'-'</span>*<span class="number">10</span>)</span><br><span class="line"><span class="keyword">for</span> fib_res <span class="keyword">in</span> copy_fib(<span class="number">20</span>):</span><br><span class="line">    print(fib_res)</span><br></pre></td></tr></table></figure><p>This usage is simple, but it is far from everything <code>yield from</code> can do. <code>yield from</code> also acts as a pipe that forwards values passed to <code>send</code> to the inner coroutine, and it <strong>takes care of the various exception cases</strong>. So <code>coro_fib</code> can be wrapped and used in the same way:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">copy_coro_fib</span><span class="params">(n)</span>:</span></span><br><span class="line">    print(<span class="string">'I am copy from coro_fib'</span>)</span><br><span class="line">    <span class="keyword">yield</span> <span class="keyword">from</span> coro_fib(n)  <span class="comment"># delegate to coro_fib; sent values and exceptions are passed through to it</span></span><br><span class="line">    print(<span class="string">'Copy end'</span>)</span><br><span class="line">print(<span class="string">'-'</span>*<span class="number">10</span> + <span class="string">'test yield from and send'</span> + <span class="string">'-'</span>*<span class="number">10</span>)</span><br><span class="line">N = <span class="number">20</span></span><br><span class="line">ccfib = copy_coro_fib(N)</span><br><span class="line">fib_res = next(ccfib)</span><br><span class="line"><span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">    print(fib_res)</span><br><span class="line">    <span class="keyword">try</span>:</span><br><span class="line">        fib_res = ccfib.send(random.uniform(<span class="number">0</span>, <span class="number">0.5</span>))</span><br><span class="line">    <span class="keyword">except</span> StopIteration:</span><br><span class="line">        <span class="keyword">break</span></span><br></pre></td></tr></table></figure><h2 id="asyncio-yield-from"><a href="#asyncio-yield-from" class="headerlink" title="asyncio/yield from"></a>asyncio/yield from</h2><p><code>asyncio</code> is a module that implements <code>asynchronous I/O</code> on top of an event loop. With <code>yield from</code>, a coroutine can hand control to the event loop and suspend itself; the event loop then decides when to wake the coroutine so that it continues executing.</p><p>Using the <code>asyncio.coroutine</code> decorator (deprecated since Python 3.8 and removed in Python 3.11):</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># run two Fibonacci coroutines of different speeds concurrently</span></span><br><span class="line"><span class="keyword">import</span> asyncio</span><br><span class="line"><span class="keyword">import</span> random</span><br><span class="line"></span><br><span class="line"><span class="meta">@asyncio.coroutine</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">fast_fib</span><span class="params">(n)</span>:</span></span><br><span class="line">    <span class="string">"""smart one"""</span></span><br><span class="line">    index = <span class="number">0</span></span><br><span class="line">    a, b = <span class="number">0</span>, <span 
class="number">1</span></span><br><span class="line">    <span class="keyword">while</span> index &lt; n:</span><br><span class="line">        sleep_secs = random.uniform(<span class="number">0</span>, <span class="number">0.2</span>)</span><br><span class="line">        <span class="keyword">yield</span> <span class="keyword">from</span> asyncio.sleep(sleep_secs)</span><br><span class="line">        print(<span class="string">'Fast one think &#123;&#125; secs to get &#123;&#125;'</span>.format(sleep_secs, b))</span><br><span class="line">        a, b = b, a + b</span><br><span class="line">        index += <span class="number">1</span></span><br><span class="line"></span><br><span class="line"><span class="meta">@asyncio.coroutine</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">slow_fib</span><span class="params">(n)</span>:</span></span><br><span class="line">    <span class="string">"""slow one"""</span></span><br><span class="line">    index = <span class="number">0</span></span><br><span class="line">    a, b = <span class="number">0</span>, <span class="number">1</span></span><br><span class="line">    <span class="keyword">while</span> index &lt; n:</span><br><span class="line">        sleep_secs = random.uniform(<span class="number">0</span>, <span class="number">0.4</span>)  <span class="comment"># upper bound of 0.4 s is an assumed value</span></span><br><span class="line">        <span class="keyword">yield</span> <span class="keyword">from</span> asyncio.sleep(sleep_secs)</span><br><span class="line">        print(<span class="string">'Slow one think &#123;&#125; secs to get &#123;&#125;'</span>.format(sleep_secs, b))</span><br><span class="line">        a, b = b, a + b</span><br><span class="line">        index += <span class="number">1</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line">    loop = asyncio.get_event_loop()  <span class="comment"># get a reference to the event loop</span></span><br><span class="line">    
tasks = [</span><br><span class="line">        asyncio.ensure_future(fast_fib(<span class="number">10</span>)),</span><br><span class="line">        asyncio.ensure_future(slow_fib(<span class="number">10</span>))  </span><br><span class="line">        <span class="comment"># either ensure_future or create_task works; asyncio.async is obsolete</span></span><br><span class="line">        <span class="comment"># loop.create_task(fast_fib(10)),</span></span><br><span class="line">        <span class="comment"># loop.create_task(slow_fib(10)) </span></span><br><span class="line">    ]</span><br><span class="line">    loop.run_until_complete(asyncio.wait(tasks)) </span><br><span class="line">    print(<span class="string">'All fib finished.'</span>)</span><br><span class="line">    loop.close()</span><br></pre></td></tr></table></figure><p>The output looks like this:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">...</span><br><span class="line">Fast one think 0.0393240884371622 secs to get 21</span><br><span class="line">Slow one think 0.12157996704037113 secs to get 5</span><br><span class="line">Fast one think 0.08259000223641344 secs to get 34</span><br><span class="line">Slow one think 0.15816909012449587 secs to get 8</span><br><span class="line">Fast one think 0.1967429201039252 secs to get 55</span><br><span class="line">Slow one think 0.25365548691367573 secs to get 13</span><br><span class="line">Slow one think 0.3235222687782598 secs to get 21</span><br><span class="line">Slow one think 0.35160632142878434 secs to get 34</span><br><span class="line">Slow one think 0.34477299780059134 secs to 
get 55</span><br><span class="line">All fib finished.</span><br></pre></td></tr></table></figure><h2 id="async-await"><a href="#async-await" class="headerlink" title="async/await"></a>async/await</h2><p>Once <code>asyncio.coroutine</code> and <code>yield from</code> are clear, the <code>async</code> and <code>await</code> keywords introduced in Python 3.5 are easy to understand:<br>think of them as drop-in replacements for <code>asyncio.coroutine/yield from</code>. From a language-design point of view, <code>async/await</code> also makes coroutines appear independent of generators, hides the details beneath the <code>asyncio</code> module, and gives a cleaner syntax.</p><p>An async/await example:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># using the async/await keywords</span></span><br><span class="line"><span class="keyword">import</span> asyncio</span><br><span class="line"><span class="keyword">import</span> random</span><br><span class="line"></span><br><span class="line"><span class="keyword">async</span> <span class="function"><span class="keyword">def</span> 
<span class="title">fast_fib</span><span class="params">(n)</span>:</span></span><br><span class="line">    <span class="string">"""smart one"""</span></span><br><span class="line">    index = <span class="number">0</span></span><br><span class="line">    a, b = <span class="number">0</span>, <span class="number">1</span></span><br><span class="line">    <span class="keyword">while</span> index &lt; n:</span><br><span class="line">        sleep_secs = random.uniform(<span class="number">0</span>, <span class="number">0.2</span>)</span><br><span class="line">        <span class="keyword">await</span> asyncio.sleep(sleep_secs)</span><br><span class="line">        print(<span class="string">'Fast one think &#123;&#125; secs to get &#123;&#125;'</span>.format(sleep_secs, b))</span><br><span class="line">        a, b = b, a + b</span><br><span class="line">        index += <span class="number">1</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">async</span> <span class="function"><span class="keyword">def</span> <span class="title">slow_fib</span><span class="params">(n)</span>:</span></span><br><span class="line">    <span class="string">"""slow one"""</span></span><br><span class="line">    index = <span class="number">0</span></span><br><span class="line">    a, b = <span class="number">0</span>, <span class="number">1</span></span><br><span class="line">    <span class="keyword">while</span> index &lt; n:</span><br><span class="line">        sleep_secs = random.uniform(<span class="number">0</span>, <span class="number">0.4</span>)  <span class="comment"># upper bound of 0.4 s is an assumed value</span></span><br><span class="line">        <span class="keyword">await</span> asyncio.sleep(sleep_secs)</span><br><span class="line">        print(<span class="string">'Slow one think &#123;&#125; secs to get &#123;&#125;'</span>.format(sleep_secs, b))</span><br><span class="line">        a, b = b, a + b</span><br><span class="line">        index += <span class="number">1</span></span><br><span 
class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line">    loop = asyncio.get_event_loop()  <span class="comment"># get a reference to the event loop</span></span><br><span class="line">    tasks = [</span><br><span class="line">        asyncio.ensure_future(fast_fib(<span class="number">10</span>)),</span><br><span class="line">        asyncio.ensure_future(slow_fib(<span class="number">10</span>))  </span><br><span class="line">        <span class="comment"># either ensure_future or create_task works; asyncio.async is obsolete</span></span><br><span class="line">        <span class="comment"># loop.create_task(fast_fib(10)),</span></span><br><span class="line">        <span class="comment"># loop.create_task(slow_fib(10)) </span></span><br><span class="line">    ]</span><br><span class="line">    loop.run_until_complete(asyncio.wait(tasks)) </span><br><span class="line">    print(<span class="string">'All fib finished.'</span>)</span><br><span class="line">    loop.close()</span><br></pre></td></tr></table></figure><p>Compared with the <code>yield from</code> version above, only two things changed:</p><ul><li>The function definition is prefixed with the <code>async</code> keyword, which makes it explicit that this is a coroutine</li><li><code>yield from</code> is replaced by the <code>await</code> keyword</li></ul><h2 id="总结"><a href="#总结" class="headerlink" title="Summary"></a>Summary</h2><p>The examples all use sleep as a stand-in for asynchronous I/O. In a real project, coroutines can asynchronously read and write network sockets and files, render a UI, and so on, and while a coroutine is waiting the CPU is free to do other work. That is exactly what coroutines are for.</p>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;The evolution of Python coroutines&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
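&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;Preface&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;p&gt;&lt;div class=&quot;note info&quot;&gt;&lt;p&gt;&lt;/p&gt;
&lt;p&gt;Because of Python&#39;s well-known &lt;code&gt;GIL&lt;/code&gt;, only one thread can run at any given moment. For CPU-bound programs, switching between threads is pure overhead, whereas I/O-bound programs are exactly where coroutines shine:&lt;/p&gt;
&lt;p&gt;Multiple tasks run concurrently (not in parallel); each task suspends at the right moment (when it starts an I/O operation) and resumes when the I/O completes.&lt;/p&gt;
&lt;p&gt;Coroutines in Python have come a long way, passing through roughly three stages:&lt;/p&gt;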
&lt;ol&gt;
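&lt;li&gt;yield/send, which grew out of the original generators&lt;/li&gt;
&lt;li&gt;@asyncio.coroutine, introduced in Python 3.4, combined with yield from (available since Python 3.3)&lt;/li&gt;
&lt;li&gt;The async/await keywords, introduced in Python 3.5&lt;/li&gt;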
&lt;/ol&gt;
&lt;p&gt;&lt;/p&gt;&lt;/div&gt;&lt;br&gt;
    
    </summary>
    
      <category term="Python" scheme="https://cgdeeplearn.github.io/categories/Python/"/>
    
      <category term="进程线程协程" scheme="https://cgdeeplearn.github.io/categories/Python/%E8%BF%9B%E7%A8%8B%E7%BA%BF%E7%A8%8B%E5%8D%8F%E7%A8%8B/"/>
    
    
      <category term="yield/send" scheme="https://cgdeeplearn.github.io/tags/yield-send/"/>
    
      <category term="yield from" scheme="https://cgdeeplearn.github.io/tags/yield-from/"/>
    
      <category term="asyncio.coroutine" scheme="https://cgdeeplearn.github.io/tags/asyncio-coroutine/"/>
    
      <category term="async/await" scheme="https://cgdeeplearn.github.io/tags/async-await/"/>
    
      <category term="asyncio" scheme="https://cgdeeplearn.github.io/tags/asyncio/"/>
    
  </entry>
  
  <entry>
    <title>Python Concurrency: concurrent.futures</title>
    <link href="https://cgdeeplearn.github.io/2018/02/06/concurrent-futures/"/>
    <id>https://cgdeeplearn.github.io/2018/02/06/concurrent-futures/</id>
    <published>2018-02-06T09:02:42.000Z</published>
    <updated>2018-03-12T02:09:49.015Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Consider concurrent.futures for parallel computing or concurrent processing<br></p><p><img src="" alt="" style="width:100%"></p><h2 id="导读"><a href="#导读" class="headerlink" title="Introduction"></a>Introduction</h2><p><div class="note info"><p><br>When writing Python programs, we can take advantage of multiple CPU cores and speed up computation by running work in parallel. Unfortunately, Python's global interpreter lock (<code>GIL</code>) prevents true parallel computing with <code>threads</code>.</p><p>To get parallelism, we could move the hot code to C, either by writing a C extension or by using open-source tools such as <code>Cython</code> and <code>Numba</code>. But doing so greatly increases the testing burden and the risk of bugs. So it is worth asking: is there a better way, one that needs only a small amount of Python code yet effectively improves performance and quickly solves demanding computational problems?</p><p>We can try the built-in <code>concurrent.futures</code> module, which gives access to the built-in <code>multiprocessing</code> module, to meet this need. This approach runs additional interpreters in parallel as child processes and so uses multiple CPU cores to speed up execution (the child processes are separate from the main interpreter, so their global interpreter locks are separate too).<br></p></div><br><a id="more"></a><br>The example below shows the effect.</p><h2 id="计算两数最大公约数"><a href="#计算两数最大公约数" class="headerlink" title="Computing the greatest common divisor of two numbers"></a>Computing the greatest common divisor of two numbers</h2><p>Given a list in which each element is a pair of numbers, find the greatest common divisor of each pair.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">numbers = [(<span class="number">1963309</span>, <span class="number">2265973</span>), (<span class="number">2030677</span>, <span class="number">3814172</span>),</span><br><span class="line">            (<span class="number">1551645</span>, <span class="number">2229620</span>), (<span class="number">2039045</span>, <span class="number">2020802</span>)]</span><br></pre></td></tr></table></figure><h3 id="没有做平行计算的版本"><a href="#没有做平行计算的版本" class="headerlink" title="The version without parallel computing"></a>The version without parallel computing</h3><figure class="highlight python"><figcaption><span>Computing the greatest common divisor</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span 
class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">gcd</span><span class="params">(pair)</span>:</span></span><br><span class="line">    a, b = pair</span><br><span class="line">    low = min(a, b)</span><br><span class="line">    <span class="keyword">for</span> i <span class="keyword">in</span> range(low, <span class="number">0</span>, <span class="number">-1</span>):</span><br><span class="line">        <span class="keyword">if</span> a % i == <span class="number">0</span> <span class="keyword">and</span> b % i == <span class="number">0</span>:</span><br><span class="line">            <span class="keyword">return</span> i</span><br></pre></td></tr></table></figure><p>Let's try it out with map:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> time</span><br><span class="line">start = time.time()</span><br><span class="line">results = list(map(gcd, numbers))</span><br><span class="line">end = time.time()</span><br><span class="line">print(<span class="string">'Took %.3f seconds'</span> % (end - start))</span><br><span class="line">&gt;&gt;&gt;</span><br><span class="line">Took <span class="number">0.530</span> seconds</span><br></pre></td></tr></table></figure><p>Next, we use concurrent.futures to try multithreading and multiprocessing.</p><h3 id="使用concurretn-futures的ThreadPoolExecutor"><a href="#使用concurretn-futures的ThreadPoolExecutor" class="headerlink" title="Using ThreadPoolExecutor from concurrent.futures"></a>Using <code>ThreadPoolExecutor</code> from <code>concurrent.futures</code></h3><figure class="highlight python"><figcaption><span>Multithreading with ThreadPoolExecutor</span></figcaption><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> concurrent.futures <span class="keyword">import</span> ThreadPoolExecutor</span><br><span class="line">start = time.time()</span><br><span class="line">pool = ThreadPoolExecutor(max_workers=<span class="number">2</span>) <span class="comment"># one worker thread per CPU core (2 here)</span></span><br><span class="line">results = list(pool.map(gcd, numbers))</span><br><span class="line">end = time.time()</span><br><span class="line">print(<span class="string">'Took %.3f seconds'</span> % (end - start))</span><br><span class="line">&gt;&gt;&gt;</span><br><span class="line">Took <span class="number">0.535</span> seconds</span><br></pre></td></tr></table></figure><p>Two threads took about the same time as the serial version, in fact slightly longer. This shows that threads cannot compute in parallel under the GIL, and that starting and coordinating threads has its own cost.</p><h3 id="使用concurrent-futures的ProcessPoollExecutor"><a href="#使用concurrent-futures的ProcessPoollExecutor" class="headerlink" title="Using ProcessPoolExecutor from concurrent.futures"></a>Using <code>ProcessPoolExecutor</code> from <code>concurrent.futures</code></h3><figure class="highlight python"><figcaption><span>Replacing ThreadPoolExecutor with ProcessPoolExecutor</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> concurrent.futures <span class="keyword">import</span> ProcessPoolExecutor</span><br><span class="line">start = time.time()</span><br><span class="line">pool = ProcessPoolExecutor(max_workers=<span class="number">2</span>) <span 
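class="comment"># one worker process per CPU core (2 here)</span></span><br><span class="line">results = list(pool.map(gcd, numbers))</span><br><span class="line">end = time.time()</span><br><span class="line">print(<span class="string">'Took %.3f seconds'</span> % (end - start))</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span></span><br><span class="line">Took <span class="number">0.287</span> seconds</span><br></pre></td></tr></table></figure><p>On a dual-core machine this runs much faster than the two previous versions. That is because <code>ProcessPoolExecutor</code> uses the low-level machinery provided by the <code>multiprocessing</code> module to carry out the following steps:</p><ol><li>It passes each item of the numbers input list to map.</li><li>It serializes the data into binary form with the pickle module.</li><li>It sends the serialized data over a local socket from the main interpreter's process to a child interpreter's process.</li><li>In the child process, it deserializes the binary data with pickle, restoring the Python objects.</li><li>It imports the Python module that contains the gcd function.</li><li>Each child process runs gcd on its own input data, in parallel with the others.</li><li>It serializes the result into bytes.</li><li>It copies those bytes back to the main process through the socket.</li><li>The main process deserializes the bytes back into Python objects.</li><li>Finally, it merges the results from all child processes into a single list and returns it to the caller.</li></ol><h3 id="编后语"><a href="#编后语" class="headerlink" title="Afterword"></a>Afterword</h3><p>To make parallel computing possible, the <code>multiprocessing</code> module and the <code>ProcessPoolExecutor</code> class do a great deal of work behind the scenes. In most other languages, a single lock or atomic operation is enough to coordinate communication between threads. In Python we have to use the <code>multiprocessing</code> module instead, and its overhead is high because all data passed between the main process and the child processes must be serialized and deserialized.</p><p>This scheme works very well for tasks that are isolated and high-leverage, that is, tasks that share no state with the rest of the program and need only a small amount of data transferred for a large amount of computation. If the computation does not have these characteristics, the overhead of <code>multiprocessing</code> may prevent any speedup. In that case, you can turn to the more advanced facilities that multiprocessing provides, such as shared memory, cross-process locks, queues, and proxies.</p><h2 id="下载进度条显示"><a href="#下载进度条显示" class="headerlink" title="Displaying a download progress bar"></a>Displaying a download progress bar</h2><p>An example of using the <code>ThreadPoolExecutor</code> class from <code>concurrent.futures</code> to handle concurrent tasks with heavy I/O. The implementation is well worth studying.</p><p>flags_common.py holds the default parameters, shared helper functions, and the argparse setup.<br>flags_sequential.py implements sequential single-threaded downloading with a progress bar.<br>flags_threadpool.py implements the multithreaded version using concurrent.futures.</p><ul><li>flags_common.py</li></ul><figure class="highlight python"><figcaption><span>flags_common.py</span></figcaption><table><tr><td 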
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span 
class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span 
class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br></pre></td><td class="code"><pre><span class="line"><span class="string">"""Utilities for second set of flag examples.</span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> os</span><br><span class="line"><span class="keyword">import</span> time</span><br><span class="line"><span class="keyword">import</span> sys</span><br><span class="line"><span class="keyword">import</span> string</span><br><span class="line"><span class="keyword">import</span> argparse</span><br><span class="line"><span class="keyword">from</span> collections <span class="keyword">import</span> namedtuple</span><br><span class="line"><span class="keyword">from</span> enum <span class="keyword">import</span> Enum</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">Result = namedtuple(<span class="string">'Result'</span>, <span class="string">'status data'</span>)</span><br><span class="line"></span><br><span class="line">HTTPStatus = Enum(<span class="string">'Status'</span>, <span 
class="string">'ok not_found error'</span>)</span><br><span class="line"></span><br><span class="line">POP20_CC = (<span class="string">'CN IN US ID BR PK NG BD RU JP '</span></span><br><span class="line">            <span class="string">'MX PH VN ET EG DE IR TR CD FR'</span>).split()</span><br><span class="line"></span><br><span class="line">DEFAULT_CONCUR_REQ = <span class="number">1</span></span><br><span class="line">MAX_CONCUR_REQ = <span class="number">1</span></span><br><span class="line"></span><br><span class="line">SERVERS = &#123;</span><br><span class="line">    <span class="string">'REMOTE'</span>: <span class="string">'http://flupy.org/data/flags'</span>,</span><br><span class="line">    <span class="string">'LOCAL'</span>:  <span class="string">'http://localhost:8001/flags'</span>,</span><br><span class="line">    <span class="string">'DELAY'</span>:  <span class="string">'http://localhost:8002/flags'</span>,</span><br><span class="line">    <span class="string">'ERROR'</span>:  <span class="string">'http://localhost:8003/flags'</span>,</span><br><span class="line">&#125;</span><br><span class="line">DEFAULT_SERVER = <span class="string">'LOCAL'</span></span><br><span class="line"></span><br><span class="line">DEST_DIR = <span class="string">'downloads/'</span></span><br><span class="line">COUNTRY_CODES_FILE = <span class="string">'country_codes.txt'</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">save_flag</span><span class="params">(img, filename)</span>:</span></span><br><span class="line">    path = os.path.join(DEST_DIR, filename)</span><br><span class="line">    <span class="keyword">with</span> open(path, <span class="string">'wb'</span>) <span class="keyword">as</span> fp:</span><br><span class="line">        fp.write(img)</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span 
class="function"><span class="keyword">def</span> <span class="title">initial_report</span><span class="params">(cc_list, actual_req, server_label)</span>:</span></span><br><span class="line">    <span class="keyword">if</span> len(cc_list) &lt;= <span class="number">10</span>:</span><br><span class="line">        cc_msg = <span class="string">', '</span>.join(cc_list)</span><br><span class="line">    <span class="keyword">else</span>:</span><br><span class="line">        cc_msg = <span class="string">'from &#123;&#125; to &#123;&#125;'</span>.format(cc_list[<span class="number">0</span>], cc_list[<span class="number">-1</span>])</span><br><span class="line">    print(<span class="string">'&#123;&#125; site: &#123;&#125;'</span>.format(server_label, SERVERS[server_label]))</span><br><span class="line">    msg = <span class="string">'Searching for &#123;&#125; flag&#123;&#125;: &#123;&#125;'</span></span><br><span class="line">    plural = <span class="string">'s'</span> <span class="keyword">if</span> len(cc_list) != <span class="number">1</span> <span class="keyword">else</span> <span class="string">''</span></span><br><span class="line">    print(msg.format(len(cc_list), plural, cc_msg))</span><br><span class="line">    plural = <span class="string">'s'</span> <span class="keyword">if</span> actual_req != <span class="number">1</span> <span class="keyword">else</span> <span class="string">''</span></span><br><span class="line">    msg = <span class="string">'&#123;&#125; concurrent connection&#123;&#125; will be used.'</span></span><br><span class="line">    print(msg.format(actual_req, plural))</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">final_report</span><span class="params">(cc_list, counter, start_time)</span>:</span></span><br><span class="line">    elapsed = time.time() - start_time</span><br><span class="line">    print(<span 
class="string">'-'</span> * <span class="number">20</span>)</span><br><span class="line">    msg = <span class="string">'&#123;&#125; flag&#123;&#125; downloaded.'</span></span><br><span class="line">    plural = <span class="string">'s'</span> <span class="keyword">if</span> counter[HTTPStatus.ok] != <span class="number">1</span> <span class="keyword">else</span> <span class="string">''</span></span><br><span class="line">    print(msg.format(counter[HTTPStatus.ok], plural))</span><br><span class="line">    <span class="keyword">if</span> counter[HTTPStatus.not_found]:</span><br><span class="line">        print(counter[HTTPStatus.not_found], <span class="string">'not found.'</span>)</span><br><span class="line">    <span class="keyword">if</span> counter[HTTPStatus.error]:</span><br><span class="line">        plural = <span class="string">'s'</span> <span class="keyword">if</span> counter[HTTPStatus.error] != <span class="number">1</span> <span class="keyword">else</span> <span class="string">''</span></span><br><span class="line">        print(<span class="string">'&#123;&#125; error&#123;&#125;.'</span>.format(counter[HTTPStatus.error], plural))</span><br><span class="line">    print(<span class="string">'Elapsed time: &#123;:.2f&#125;s'</span>.format(elapsed))</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">expand_cc_args</span><span class="params">(every_cc, all_cc, cc_args, limit)</span>:</span></span><br><span class="line">    codes = set()</span><br><span class="line">    A_Z = string.ascii_uppercase</span><br><span class="line">    <span class="keyword">if</span> every_cc:</span><br><span class="line">        codes.update(a+b <span class="keyword">for</span> a <span class="keyword">in</span> A_Z <span class="keyword">for</span> b <span class="keyword">in</span> A_Z)</span><br><span class="line">    <span class="keyword">elif</span> 
all_cc:</span><br><span class="line">        <span class="keyword">with</span> open(COUNTRY_CODES_FILE) <span class="keyword">as</span> fp:</span><br><span class="line">            text = fp.read()</span><br><span class="line">        codes.update(text.split())</span><br><span class="line">    <span class="keyword">else</span>:</span><br><span class="line">        <span class="keyword">for</span> cc <span class="keyword">in</span> (c.upper() <span class="keyword">for</span> c <span class="keyword">in</span> cc_args):</span><br><span class="line">            <span class="keyword">if</span> len(cc) == <span class="number">1</span> <span class="keyword">and</span> cc <span class="keyword">in</span> A_Z:</span><br><span class="line">                codes.update(cc+c <span class="keyword">for</span> c <span class="keyword">in</span> A_Z)</span><br><span class="line">            <span class="keyword">elif</span> len(cc) == <span class="number">2</span> <span class="keyword">and</span> all(c <span class="keyword">in</span> A_Z <span class="keyword">for</span> c <span class="keyword">in</span> cc):</span><br><span class="line">                codes.add(cc)</span><br><span class="line">            <span class="keyword">else</span>:</span><br><span class="line">                msg = <span class="string">'each CC argument must be A to Z or AA to ZZ.'</span></span><br><span class="line">                <span class="keyword">raise</span> ValueError(<span class="string">'*** Usage error: '</span>+msg)</span><br><span class="line">    <span class="keyword">return</span> sorted(codes)[:limit]</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">process_args</span><span class="params">(default_concur_req)</span>:</span></span><br><span class="line">    server_options = <span class="string">', '</span>.join(sorted(SERVERS))</span><br><span class="line">    parser = 
argparse.ArgumentParser(</span><br><span class="line">                description=<span class="string">'Download flags for country codes. '</span></span><br><span class="line">                <span class="string">'Default: top 20 countries by population.'</span>)</span><br><span class="line">    parser.add_argument(<span class="string">'cc'</span>, metavar=<span class="string">'CC'</span>, nargs=<span class="string">'*'</span>,</span><br><span class="line">                help=<span class="string">'country code or 1st letter (eg. B for BA...BZ)'</span>)</span><br><span class="line">    parser.add_argument(<span class="string">'-a'</span>, <span class="string">'--all'</span>, action=<span class="string">'store_true'</span>,</span><br><span class="line">                help=<span class="string">'get all available flags (AD to ZW)'</span>)</span><br><span class="line">    parser.add_argument(<span class="string">'-e'</span>, <span class="string">'--every'</span>, action=<span class="string">'store_true'</span>,</span><br><span class="line">                help=<span class="string">'get flags for every possible code (AA...ZZ)'</span>)</span><br><span class="line">    parser.add_argument(<span class="string">'-l'</span>, <span class="string">'--limit'</span>, metavar=<span class="string">'N'</span>, type=int,</span><br><span class="line">                help=<span class="string">'limit to N first codes'</span>, default=sys.maxsize)</span><br><span class="line">    parser.add_argument(<span class="string">'-m'</span>, <span class="string">'--max_req'</span>, metavar=<span class="string">'CONCURRENT'</span>, type=int,</span><br><span class="line">                default=default_concur_req,</span><br><span class="line">                help=<span class="string">'maximum concurrent requests (default=&#123;&#125;)'</span></span><br><span class="line">                      .format(default_concur_req))</span><br><span class="line">    parser.add_argument(<span 
class="string">'-s'</span>, <span class="string">'--server'</span>, metavar=<span class="string">'LABEL'</span>,</span><br><span class="line">                default=DEFAULT_SERVER,</span><br><span class="line">                help=<span class="string">'Server to hit; one of &#123;&#125; (default=&#123;&#125;)'</span></span><br><span class="line">                      .format(server_options, DEFAULT_SERVER))</span><br><span class="line">    parser.add_argument(<span class="string">'-v'</span>, <span class="string">'--verbose'</span>, action=<span class="string">'store_true'</span>,</span><br><span class="line">                help=<span class="string">'output detailed progress info'</span>)</span><br><span class="line">    args = parser.parse_args()</span><br><span class="line">    <span class="keyword">if</span> args.max_req &lt; <span class="number">1</span>:</span><br><span class="line">        print(<span class="string">'*** Usage error: --max_req CONCURRENT must be &gt;= 1'</span>)</span><br><span class="line">        parser.print_usage()</span><br><span class="line">        sys.exit(<span class="number">1</span>)</span><br><span class="line">    <span class="keyword">if</span> args.limit &lt; <span class="number">1</span>:</span><br><span class="line">        print(<span class="string">'*** Usage error: --limit N must be &gt;= 1'</span>)</span><br><span class="line">        parser.print_usage()</span><br><span class="line">        sys.exit(<span class="number">1</span>)</span><br><span class="line">    args.server = args.server.upper()</span><br><span class="line">    <span class="keyword">if</span> args.server <span class="keyword">not</span> <span class="keyword">in</span> SERVERS:</span><br><span class="line">        print(<span class="string">'*** Usage error: --server LABEL must be one of'</span>,</span><br><span class="line">              server_options)</span><br><span class="line">        parser.print_usage()</span><br><span class="line">        
sys.exit(<span class="number">1</span>)</span><br><span class="line">    <span class="keyword">try</span>:</span><br><span class="line">        cc_list = expand_cc_args(args.every, args.all, args.cc, args.limit)</span><br><span class="line">    <span class="keyword">except</span> ValueError <span class="keyword">as</span> exc:</span><br><span class="line">        print(exc.args[<span class="number">0</span>])</span><br><span class="line">        parser.print_usage()</span><br><span class="line">        sys.exit(<span class="number">1</span>)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> cc_list:</span><br><span class="line">        cc_list = sorted(POP20_CC)</span><br><span class="line">    <span class="keyword">return</span> args, cc_list</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">main</span><span class="params">(download_many, default_concur_req, max_concur_req)</span>:</span></span><br><span class="line">    args, cc_list = process_args(default_concur_req)</span><br><span class="line">    actual_req = min(args.max_req, max_concur_req, len(cc_list))</span><br><span class="line">    initial_report(cc_list, actual_req, args.server)</span><br><span class="line">    base_url = SERVERS[args.server]</span><br><span class="line">    t0 = time.time()</span><br><span class="line">    counter = download_many(cc_list, base_url, args.verbose, actual_req)</span><br><span class="line">    <span class="keyword">assert</span> sum(counter.values()) == len(cc_list), \</span><br><span class="line">        <span class="string">'some downloads are unaccounted for'</span></span><br><span class="line">    final_report(cc_list, counter, t0)</span><br></pre></td></tr></table></figure><ul><li>flags_sequential.py</li></ul><figure class="highlight 
python"><figcaption><span>flags_sequential.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span 
class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br></pre></td><td class="code"><pre><span class="line"><span class="string">"""Download flags of countries (with error handling).</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">Sequential version</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">Sample run::</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">    $ python3 flags_sequential.py -s DELAY b</span></span><br><span class="line"><span class="string">    DELAY site: http://localhost:8002/flags</span></span><br><span class="line"><span class="string">    Searching for 26 flags: from BA to BZ</span></span><br><span class="line"><span class="string">    1 concurrent connection will be used.</span></span><br><span class="line"><span class="string">    --------------------</span></span><br><span class="line"><span class="string">    17 flags downloaded.</span></span><br><span class="line"><span class="string">    9 not found.</span></span><br><span 
class="line"><span class="string">    Elapsed time: 13.36s</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> collections</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> requests</span><br><span class="line"><span class="keyword">import</span> tqdm</span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> flags_common <span class="keyword">import</span> main, save_flag, HTTPStatus, Result</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">DEFAULT_CONCUR_REQ = <span class="number">1</span></span><br><span class="line">MAX_CONCUR_REQ = <span class="number">1</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># BEGIN FLAGS2_BASIC_HTTP_FUNCTIONS</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">get_flag</span><span class="params">(base_url, cc)</span>:</span></span><br><span class="line">    url = <span class="string">'&#123;&#125;/&#123;cc&#125;/&#123;cc&#125;.gif'</span>.format(base_url, cc=cc.lower())</span><br><span class="line">    resp = requests.get(url)</span><br><span class="line">    <span class="keyword">if</span> resp.status_code != <span class="number">200</span>:  <span class="comment"># &lt;1&gt;</span></span><br><span class="line">        resp.raise_for_status()</span><br><span class="line">    <span class="keyword">return</span> resp.content</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">download_one</span><span class="params">(cc, base_url, verbose=False)</span>:</span></span><br><span class="line">    <span class="keyword">try</span>:</span><br><span 
class="line">        image = get_flag(base_url, cc)</span><br><span class="line">    <span class="keyword">except</span> requests.exceptions.HTTPError <span class="keyword">as</span> exc:  <span class="comment"># &lt;2&gt;</span></span><br><span class="line">        res = exc.response</span><br><span class="line">        <span class="keyword">if</span> res.status_code == <span class="number">404</span>:</span><br><span class="line">            status = HTTPStatus.not_found  <span class="comment"># &lt;3&gt;</span></span><br><span class="line">            msg = <span class="string">'not found'</span></span><br><span class="line">        <span class="keyword">else</span>:  <span class="comment"># &lt;4&gt;</span></span><br><span class="line">            <span class="keyword">raise</span></span><br><span class="line">    <span class="keyword">else</span>:</span><br><span class="line">        save_flag(image, cc.lower() + <span class="string">'.gif'</span>)</span><br><span class="line">        status = HTTPStatus.ok</span><br><span class="line">        msg = <span class="string">'OK'</span></span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> verbose:  <span class="comment"># &lt;5&gt;</span></span><br><span class="line">        print(cc, msg)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> Result(status, cc)  <span class="comment"># &lt;6&gt;</span></span><br><span class="line"><span class="comment"># END FLAGS2_BASIC_HTTP_FUNCTIONS</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># BEGIN FLAGS2_DOWNLOAD_MANY_SEQUENTIAL</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">download_many</span><span class="params">(cc_list, base_url, verbose, max_req)</span>:</span></span><br><span class="line">    counter = collections.Counter()  <span class="comment"># 
&lt;1&gt;</span></span><br><span class="line">    cc_iter = sorted(cc_list)  <span class="comment"># &lt;2&gt;</span></span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">not</span> verbose:</span><br><span class="line">        cc_iter = tqdm.tqdm(cc_iter)  <span class="comment"># &lt;3&gt;</span></span><br><span class="line">    <span class="keyword">for</span> cc <span class="keyword">in</span> cc_iter:  <span class="comment"># &lt;4&gt;</span></span><br><span class="line">        <span class="keyword">try</span>:</span><br><span class="line">            res = download_one(cc, base_url, verbose)  <span class="comment"># &lt;5&gt;</span></span><br><span class="line">        <span class="keyword">except</span> requests.exceptions.HTTPError <span class="keyword">as</span> exc:  <span class="comment"># &lt;6&gt;</span></span><br><span class="line">            error_msg = <span class="string">'HTTP error &#123;res.status_code&#125; - &#123;res.reason&#125;'</span></span><br><span class="line">            error_msg = error_msg.format(res=exc.response)</span><br><span class="line">        <span class="keyword">except</span> requests.exceptions.ConnectionError <span class="keyword">as</span> exc:  <span class="comment"># &lt;7&gt;</span></span><br><span class="line">            error_msg = <span class="string">'Connection error'</span></span><br><span class="line">        <span class="keyword">else</span>:  <span class="comment"># &lt;8&gt;</span></span><br><span class="line">            error_msg = <span class="string">''</span></span><br><span class="line">            status = res.status</span><br><span class="line"></span><br><span class="line">        <span class="keyword">if</span> error_msg:</span><br><span class="line">            status = HTTPStatus.error  <span class="comment"># &lt;9&gt;</span></span><br><span class="line">        counter[status] += <span class="number">1</span>  <span class="comment"># 
&lt;10&gt;</span></span><br><span class="line">        <span class="keyword">if</span> verbose <span class="keyword">and</span> error_msg: <span class="comment"># &lt;11&gt;</span></span><br><span class="line">            print(<span class="string">'*** Error for &#123;&#125;: &#123;&#125;'</span>.format(cc, error_msg))</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> counter  <span class="comment"># &lt;12&gt;</span></span><br><span class="line"><span class="comment"># END FLAGS2_DOWNLOAD_MANY_SEQUENTIAL</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line">    main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ)</span><br></pre></td></tr></table></figure><ul><li>flags_threadpool.py</li></ul><figure class="highlight python"><figcaption><span>flags_threadpool.py</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span 
class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br></pre></td><td class="code"><pre><span class="line"><span class="string">"""Download flags of countries (with error handling).</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">ThreadPool version</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">Sample run::</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">    $ python3 flags_threadpool.py -s ERROR -e</span></span><br><span class="line"><span class="string">    ERROR site: http://localhost:8003/flags</span></span><br><span class="line"><span class="string">    Searching for 676 flags: from AA to ZZ</span></span><br><span class="line"><span class="string">    30 
concurrent connections will be used.</span></span><br><span class="line"><span class="string">    --------------------</span></span><br><span class="line"><span class="string">    150 flags downloaded.</span></span><br><span class="line"><span class="string">    361 not found.</span></span><br><span class="line"><span class="string">    165 errors.</span></span><br><span class="line"><span class="string">    Elapsed time: 7.46s</span></span><br><span class="line"><span class="string"></span></span><br><span class="line"><span class="string">"""</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># BEGIN FLAGS2_THREADPOOL</span></span><br><span class="line"><span class="keyword">import</span> collections</span><br><span class="line"><span class="keyword">from</span> concurrent <span class="keyword">import</span> futures</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> requests</span><br><span class="line"><span class="keyword">import</span> tqdm  <span class="comment"># &lt;1&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> flags_common <span class="keyword">import</span> main, HTTPStatus  <span class="comment"># &lt;2&gt;</span></span><br><span class="line"><span class="keyword">from</span> flags_sequential <span class="keyword">import</span> download_one  <span class="comment"># &lt;3&gt;</span></span><br><span class="line"></span><br><span class="line">DEFAULT_CONCUR_REQ = <span class="number">30</span>  <span class="comment"># &lt;4&gt;</span></span><br><span class="line">MAX_CONCUR_REQ = <span class="number">1000</span>  <span class="comment"># &lt;5&gt;</span></span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">download_many</span><span class="params">(cc_list, base_url, verbose, 
concur_req)</span>:</span></span><br><span class="line">    counter = collections.Counter()</span><br><span class="line">    <span class="keyword">with</span> futures.ThreadPoolExecutor(max_workers=concur_req) <span class="keyword">as</span> executor:  <span class="comment"># &lt;6&gt;</span></span><br><span class="line">        to_do_map = &#123;&#125;  <span class="comment"># &lt;7&gt;</span></span><br><span class="line">        <span class="keyword">for</span> cc <span class="keyword">in</span> sorted(cc_list):  <span class="comment"># &lt;8&gt;</span></span><br><span class="line">            future = executor.submit(download_one,</span><br><span class="line">                            cc, base_url, verbose)  <span class="comment"># &lt;9&gt;</span></span><br><span class="line">            to_do_map[future] = cc  <span class="comment"># &lt;10&gt;</span></span><br><span class="line">        done_iter = futures.as_completed(to_do_map)  <span class="comment"># &lt;11&gt;</span></span><br><span class="line">        <span class="keyword">if</span> <span class="keyword">not</span> verbose:</span><br><span class="line">            done_iter = tqdm.tqdm(done_iter, total=len(cc_list))  <span class="comment"># &lt;12&gt;</span></span><br><span class="line">        <span class="keyword">for</span> future <span class="keyword">in</span> done_iter:  <span class="comment"># &lt;13&gt;</span></span><br><span class="line">            <span class="keyword">try</span>:</span><br><span class="line">                res = future.result()  <span class="comment"># &lt;14&gt;</span></span><br><span class="line">            <span class="keyword">except</span> requests.exceptions.HTTPError <span class="keyword">as</span> exc:  <span class="comment"># &lt;15&gt;</span></span><br><span class="line">                error_msg = <span class="string">'HTTP &#123;res.status_code&#125; - &#123;res.reason&#125;'</span></span><br><span class="line">                error_msg = 
error_msg.format(res=exc.response)</span><br><span class="line">            <span class="keyword">except</span> requests.exceptions.ConnectionError <span class="keyword">as</span> exc:</span><br><span class="line">                error_msg = <span class="string">'Connection error'</span></span><br><span class="line">            <span class="keyword">else</span>:</span><br><span class="line">                error_msg = <span class="string">''</span></span><br><span class="line">                status = res.status</span><br><span class="line"></span><br><span class="line">            <span class="keyword">if</span> error_msg:</span><br><span class="line">                status = HTTPStatus.error</span><br><span class="line">            counter[status] += <span class="number">1</span></span><br><span class="line">            <span class="keyword">if</span> verbose <span class="keyword">and</span> error_msg:</span><br><span class="line">                cc = to_do_map[future]  <span class="comment"># &lt;16&gt;</span></span><br><span class="line">                print(<span class="string">'*** Error for &#123;&#125;: &#123;&#125;'</span>.format(cc, error_msg))</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> counter</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line">    main(download_many, DEFAULT_CONCUR_REQ, MAX_CONCUR_REQ)</span><br><span class="line"><span class="comment"># END FLAGS2_THREADPOOL</span></span><br></pre></td></tr></table></figure><h2 id="结束"><a href="#结束" class="headerlink" title="结束"></a>结束</h2>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;Consider concurrent.futures for parallel computation and concurrent processing&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
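&lt;h2 id=&quot;导读&quot;&gt;&lt;a href=&quot;#导读&quot; class=&quot;headerlink&quot; title=&quot;Introduction&quot;&gt;&lt;/a&gt;Introduction&lt;/h2&gt;&lt;p&gt;&lt;div class=&quot;note info&quot;&gt;&lt;p&gt;&lt;br&gt;When writing Python programs, we would like to use multiple CPU cores to speed up compute-bound tasks through parallel computation. Unfortunately, Python's global interpreter lock (&lt;code&gt;GIL&lt;/code&gt;) prevents &lt;code&gt;threads&lt;/code&gt; from achieving true parallelism.&lt;/p&gt;
&lt;p&gt;To get real parallelism, we could write C extensions or move the hot code to C with open-source tools such as &lt;code&gt;Cython&lt;/code&gt; and &lt;code&gt;Numba&lt;/code&gt;. Doing so, however, greatly increases the testing effort and the risk. So it is worth asking whether there is a better way: one that needs only a small amount of Python code, yet effectively improves performance and quickly solves complex computational problems.&lt;/p&gt;
&lt;p&gt;We can try the built-in &lt;code&gt;concurrent.futures&lt;/code&gt; module, which builds on the built-in &lt;code&gt;multiprocessing&lt;/code&gt; module, to meet this need. This approach runs several interpreters in parallel as child processes and so uses multiple CPU cores to speed up execution (the child processes are separate from the main interpreter, so each has its own independent global interpreter lock).&lt;br&gt;&lt;/p&gt;&lt;/div&gt;&lt;br&gt;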
    
    </summary>
    
      <category term="Python" scheme="https://cgdeeplearn.github.io/categories/Python/"/>
    
      <category term="进程线程协程" scheme="https://cgdeeplearn.github.io/categories/Python/%E8%BF%9B%E7%A8%8B%E7%BA%BF%E7%A8%8B%E5%8D%8F%E7%A8%8B/"/>
    
    
      <category term="concurrent.futures" scheme="https://cgdeeplearn.github.io/tags/concurrent-futures/"/>
    
      <category term="ProcessPoolExecutor" scheme="https://cgdeeplearn.github.io/tags/ProcessPoolExecutor/"/>
    
      <category term="ThreadPoolExecutor" scheme="https://cgdeeplearn.github.io/tags/ThreadPoolExecutor/"/>
    
      <category term="multiprocessing" scheme="https://cgdeeplearn.github.io/tags/multiprocessing/"/>
    
  </entry>
  
  <entry>
    <title>Python Coroutines</title>
    <link href="https://cgdeeplearn.github.io/2018/01/30/coroutine/"/>
    <id>https://cgdeeplearn.github.io/2018/01/30/coroutine/</id>
    <published>2018-01-30T02:45:38.000Z</published>
    <updated>2018-03-01T09:35:10.701Z</updated>
    
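    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Consider coroutines for running many functions concurrently<br></p><p><img src="" alt="" style="width:100%"></p><h2 id="前言"><a href="#前言" class="headerlink" title="Preface"></a>Preface</h2><div class="note primary"><p><br>We can use threads to run several functions so that they appear to execute at the same time. Threads, however, have <code>three</code> significant drawbacks:<br><ul><br><li><i class="fa fa-minus-square"></i> To keep data safe, we must coordinate the threads with special tools (<code>Lock</code>, <code>Queue</code>, and so on), which makes multithreaded code harder to understand than single-threaded procedural code. Over time, this complexity makes the program hard to extend and maintain.</li><br><li><i class="fa fa-minus-square"></i> Threads <code>use a lot of memory</code>: each running thread takes roughly <code>8MB</code>. Most computers can afford a dozen or so threads, but the cost grows with every thread added.</li><br><li><i class="fa fa-minus-square"></i> Threads are <code>expensive to start</code>. If a program keeps creating new threads to run functions concurrently and waits for them to finish, the overhead of using threads slows the whole program down.</li><br></ul></p></div><a id="more"></a><p>Python's <code>coroutines</code> avoid these problems and let a Python program appear to run many functions at the same time. Coroutines are implemented as an extension of generators. Starting a generator coroutine costs about as much as a function call, and an active coroutine uses less than <code>1KB</code> of memory until it is exhausted.</p><h2 id="协程的工作原理"><a href="#协程的工作原理" class="headerlink" title="How coroutines work"></a>How coroutines work</h2><p>Each time the generator function reaches a <code>yield</code> expression, the code consuming the generator can pass a value back into it with the <code>send</code> method. The value passed in through <code>send</code> is bound to the variable on the left of the <code>yield</code> keyword. If an expression follows <code>yield</code>, its value is handed back to the outside (the caller) as the return value of the <code>send</code> call that advanced the generator to that point; with no expression, that return value is <code>None</code>. The key point is that a coroutine pauses exactly at the <code>yield</code> keyword. In an assignment statement, the code to the right of <code>=</code> runs before the assignment happens. The two examples below illustrate this.</p><h3 id="简单的协程示例"><a href="#简单的协程示例" class="headerlink" title="A simple coroutine example"></a>A simple coroutine example</h3><figure class="highlight python"><figcaption><span>A simple coroutine example</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 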
class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">my_coroutine</span><span class="params">()</span>:</span></span><br><span class="line">    <span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">        received = <span class="keyword">yield</span></span><br><span class="line">        print(<span class="string">'Received:'</span>, received)</span><br><span class="line"></span><br><span class="line">it = my_coroutine()</span><br><span class="line">next(it)  <span class="comment"># 1</span></span><br><span class="line">it.send(<span class="string">'First'</span>)  <span class="comment"># 2</span></span><br><span class="line">it.send(<span class="string">'Second'</span>)</span><br><span class="line"></span><br><span class="line">&gt;&gt;&gt;</span><br><span class="line">Received: First</span><br><span class="line">Received: Second</span><br></pre></td></tr></table></figure><p><i class="fa fa-pencil"></i>Note 1: before calling <code>send</code> on a generator, we must first call <code>next</code> (this is called <code>priming the coroutine</code>) to advance the generator to its first <code>yield</code> expression. Note 2: each <code>send</code> resumes the coroutine, binds the sent value to <code>received</code>, and runs until the next <code>yield</code>.</p><h3 id="协程产出值"><a href="#协程产出值" class="headerlink" title="A coroutine that yields values"></a>A coroutine that yields values</h3><p>In this example, each time the coroutine receives a number, it yields the maximum it has seen so far.</p><figure class="highlight python"><figcaption><span>A coroutine that yields values</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span 
class="line"><span class="function"><span class="keyword">def</span> <span class="title">maximize</span><span class="params">()</span>:</span></span><br><span class="line">    current = <span class="keyword">yield</span>  <span class="comment"># 1</span></span><br><span class="line">    <span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">        value = <span class="keyword">yield</span> current  <span class="comment"># 2</span></span><br><span class="line">        current = max(value, current)  <span class="comment"># 3</span></span><br><span class="line"></span><br><span class="line">it = maximize()</span><br><span class="line">next(it)  <span class="comment"># prime the coroutine: run to the first yield</span></span><br><span class="line">print(it.send(<span class="number">10</span>)) <span class="comment"># runs to #2, yields current, then waits for a value</span></span><br><span class="line">print(it.send(<span class="number">12</span>)) <span class="comment"># binds 12 to value, updates current, yields it at #2, then waits</span></span><br><span class="line">print(it.send(<span class="number">4</span>))  <span class="comment"># same as above: yields the right-hand side, then waits for a value to bind on the left</span></span><br><span class="line">print(it.send(<span class="number">22</span>))</span><br><span class="line"></span><br><span class="line">&gt;&gt;&gt;</span><br><span class="line"><span class="number">10</span></span><br><span class="line"><span class="number">12</span></span><br><span class="line"><span class="number">12</span></span><br><span class="line"><span class="number">22</span></span><br></pre></td></tr></table></figure><p>In the code above, the first <code>yield</code> statement has nothing after the <code>yield</code> keyword, which means the first value sent in from outside is taken as the current maximum.<br>After that, the generator repeatedly executes the <code>yield</code> statement inside the while loop, reporting the current maximum to the outside while waiting for the next value to consider.</p><div class="note info"><p>A coroutine pauses at the yield keyword. In an assignment statement, the code to the right of = runs before the assignment happens. In other words, every stage ends inside a yield expression: the coroutine first yields a value, then pauses at the yield and waits for a value from outside. The next stage resumes from that same line.</p></div><h2 id="yield-from"><a href="#yield-from" class="headerlink" title="yield from"></a>yield 
from</h2><p>A coroutine can use the values it yields to advance other generator functions, so that each of them also runs to its own next yield expression. Advancing many independent generators one after another imitates the concurrent behavior of Python threads and makes the program appear to run many functions at once.</p><h3 id="使用yield-from计算平均值并输出统计报告"><a href="#使用yield-from计算平均值并输出统计报告" class="headerlink" title="Computing averages and printing a report with yield from"></a>Computing averages and printing a report with yield from</h3><p>The script reads the weights and heights of fictional seventh-grade boys and girls from a dict. For example, the 'boys;m' key maps to the heights of 9 boys (in meters), and the 'girls;kg' key maps to the weights of 10 girls (in kilograms). The script feeds each group of data to the averager coroutine defined in the same script and then produces a report.</p><figure class="highlight python"><figcaption><span>Computing averages and printing a report with yield from</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span 
class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"><span class="string">"""Compute averages and print a report using yield from."""</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> collections <span class="keyword">import</span> namedtuple</span><br><span class="line"></span><br><span class="line">Result = namedtuple(<span class="string">'Result'</span>, <span class="string">'count average'</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># the subgenerator</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">averager</span><span class="params">()</span>:</span>  <span class="comment"># 1</span></span><br><span class="line">    total = <span class="number">0.0</span></span><br><span class="line">    count = <span class="number">0</span></span><br><span class="line">    average = <span class="keyword">None</span></span><br><span class="line">    <span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">        term = <span class="keyword">yield</span>  <span class="comment"># 2</span></span><br><span class="line">        <span class="keyword">if</span> term <span class="keyword">is</span> <span class="keyword">None</span>:  <span class="comment"># 3</span></span><br><span class="line">            <span class="keyword">break</span></span><br><span class="line">        total += term</span><br><span class="line">        count += <span class="number">1</span></span><br><span class="line">        average = total / count</span><br><span class="line">    <span class="keyword">return</span> 
Result(count, average)  <span class="comment"># 4</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># the delegating generator</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">grouper</span><span class="params">(results, key)</span>:</span>  <span class="comment"># 5</span></span><br><span class="line">    <span class="keyword">while</span> <span class="keyword">True</span>:  <span class="comment"># 6</span></span><br><span class="line">        results[key] = <span class="keyword">yield</span> <span class="keyword">from</span> averager()  <span class="comment"># 7</span></span><br><span class="line"><span class="comment"># the client code, a.k.a. the caller</span></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">main</span><span class="params">(data)</span>:</span>  <span class="comment"># 8</span></span><br><span class="line">    results = &#123;&#125;</span><br><span class="line">    <span class="keyword">for</span> key, values <span class="keyword">in</span> data.items():</span><br><span class="line">        group = grouper(results, key)  <span class="comment"># 9</span></span><br><span class="line">        next(group)  <span class="comment"># 10</span></span><br><span class="line">        <span class="keyword">for</span> value <span class="keyword">in</span> values:</span><br><span class="line">            group.send(value)  <span class="comment"># 11</span></span><br><span class="line">        group.send(<span class="keyword">None</span>)  <span class="comment"># important! 12</span></span><br><span class="line">    <span class="comment"># print(results)  # uncomment to debug</span></span><br><span class="line">    report(results)</span><br><span class="line"><span class="comment"># output the report</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">report</span><span 
class="params">(results)</span>:</span></span><br><span class="line">    <span class="keyword">for</span> key, result <span class="keyword">in</span> sorted(results.items()):</span><br><span class="line">        group, unit = key.split(<span class="string">';'</span>)</span><br><span class="line">        print(<span class="string">'&#123;:2&#125; &#123;:5&#125; averaging &#123;:.2f&#125;&#123;&#125;'</span>.format(</span><br><span class="line">            result.count, group, result.average, unit))</span><br><span class="line"></span><br><span class="line">DATA = &#123;</span><br><span class="line">    <span class="string">'girls;kg'</span>: [<span class="number">40.9</span>, <span class="number">38.5</span>, <span class="number">44.3</span>, <span class="number">42.2</span>, <span class="number">45.2</span>, <span class="number">41.7</span>, <span class="number">44.5</span>, <span class="number">38.0</span>, <span class="number">40.6</span>, <span class="number">44.5</span>],</span><br><span class="line">    <span class="string">'girls;m'</span>: [<span class="number">1.6</span>, <span class="number">1.51</span>, <span class="number">1.4</span>, <span class="number">1.3</span>, <span class="number">1.41</span>, <span class="number">1.39</span>, <span class="number">1.33</span>, <span class="number">1.46</span>, <span class="number">1.45</span>, <span class="number">1.43</span>],</span><br><span class="line">    <span class="string">'boys;kg'</span>: [<span class="number">39.0</span>, <span class="number">40.8</span>, <span class="number">43.2</span>, <span class="number">40.8</span>, <span class="number">43.1</span>, <span class="number">38.6</span>, <span class="number">41.4</span>, <span class="number">40.6</span>, <span class="number">36.3</span>],</span><br><span class="line">    <span class="string">'boys;m'</span>: [<span class="number">1.38</span>, <span class="number">1.5</span>, <span class="number">1.32</span>, <span class="number">1.25</span>, <span 
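class="number">1.37</span>, <span class="number">1.48</span>, <span class="number">1.25</span>, <span class="number">1.49</span>, <span class="number">1.46</span>],</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">if</span> __name__ == <span class="string">'__main__'</span>:</span><br><span class="line">    main(DATA)</span><br></pre></td></tr></table></figure><div class="note info"><p><br>1-  The averager coroutine; here it is used as the subgenerator.<br>2-  Each value sent by the client code in main is bound to term here.<br>3-  The crucial terminating condition. Without it, a generator that calls this coroutine with yield from would block forever.<br>4- The returned Result becomes the value of the yield from expression in grouper.<br>5-  grouper is the delegating generator.<br>6-  Each iteration of this loop creates a new averager instance; each instance is a generator object used as a coroutine.<br>7-  Every value sent to grouper is piped through yield from into the averager instance. grouper is suspended at the yield from expression while the averager instance consumes the values sent by the client. When an averager instance runs to the end, the value it returns is bound to results[key]. The while loop then creates another averager instance to consume more values.<br>8- main is the client code, or the "caller" in PEP 380 terms. This is the function that drives everything.<br>9- group is the generator object obtained by calling grouper; the first argument is results, which collects the results, and the second is a particular key. group is used as a coroutine.<br>10- Prime the group coroutine.<br>11- Send each value into grouper. The value ends up at the term = yield line in averager; grouper never sees it.<br>12- Sending None into grouper terminates the current averager instance and lets grouper run again, which creates another averager instance for the next group of values.<br></p></div><h2 id="生命游戏：演示协程的协同运作效果。"><a href="#生命游戏：演示协程的协同运作效果。" class="headerlink" title="Game of Life: coroutines working together"></a>Game of Life: coroutines working together</h2><h3 id="游戏规则"><a href="#游戏规则" class="headerlink" title="Rules of the game"></a>Rules of the game</h3><ul><li>In a two-dimensional grid of arbitrary size, every cell is either <code>alive (shown as *)</code> or <code>empty (shown as -)</code>.</li><li>With each clock tick, the game advances one step. On each step we count the eight cells surrounding each cell to see how many of its neighbors are alive, and then decide from that count whether the cell lives on, dies, or regenerates in the next generation.</li><li>The specific rules:<ul><li>If the cell is alive and has fewer than two living neighbors, it dies in the next generation.</li><li>If the cell is alive and has more than three living neighbors, it dies in the next generation.</li><li>If the cell is dead and has exactly three living neighbors, it regenerates in the next generation.</li><li>In every other case, the cell keeps its current state.</li></ul></li></ul><h3 id="建模"><a href="#建模" class="headerlink" title="Modeling"></a>Modeling</h3><p>Based on these rules, we can split the program into three stages: <code>count_neighbors</code>, <code>step_cell</code>, 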
<code>display</code></p><ul><li>count_neighbors: count how many of the 8 cells around each cell are alive</li><li>step_cell: produce the cell's next-generation state from its current state and the number of living neighbors</li><li>display: show the state of the cells after each generation</li></ul><h4 id="count-neighbors"><a href="#count-neighbors" class="headerlink" title="count_neighbors"></a>count_neighbors</h4><p>We define a coroutine that retrieves the states of the surrounding cells. The coroutine yields custom <code>Query</code> objects, and the result of each <code>yield</code> expression is either <code>ALIVE</code> or <code>EMPTY</code>. The count_neighbors generator then uses the states of the neighboring cells to return the number of living cells around this one (a return statement with a value inside a generator is available only in Python 3, from 3.3 on; the result is actually delivered to the caller as the value attribute of the StopIteration exception).</p><figure class="highlight python"><figcaption><span>The count_neighbors coroutine counts the living cells around a cell</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> collections <span class="keyword">import</span> namedtuple</span><br><span class="line"></span><br><span class="line">ALIVE = <span class="string">'*'</span></span><br><span class="line">EMPTY = <span class="string">'-'</span></span><br><span class="line"></span><br><span class="line">Query = namedtuple(<span class="string">'Query'</span>, (<span class="string">'y'</span>, <span class="string">'x'</span>))</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">count_neighbors</span><span 
class="params">(y, x)</span>:</span></span><br><span class="line">    n_ = <span class="keyword">yield</span> Query(y + <span class="number">1</span>, x + <span class="number">0</span>)  <span class="comment"># North</span></span><br><span class="line">    ne = <span class="keyword">yield</span> Query(y + <span class="number">1</span>, x + <span class="number">1</span>)  <span class="comment"># Northeast</span></span><br><span class="line">    e_ = <span class="keyword">yield</span> Query(y + <span class="number">0</span>, x + <span class="number">1</span>)  <span class="comment"># East</span></span><br><span class="line">    se = <span class="keyword">yield</span> Query(y - <span class="number">1</span>, x + <span class="number">1</span>)  <span class="comment"># Southeast</span></span><br><span class="line">    s_ = <span class="keyword">yield</span> Query(y - <span class="number">1</span>, x + <span class="number">0</span>)  <span class="comment"># South</span></span><br><span class="line">    sw = <span class="keyword">yield</span> Query(y - <span class="number">1</span>, x - <span class="number">1</span>)  <span class="comment"># Southwest</span></span><br><span class="line">    w_ = <span class="keyword">yield</span> Query(y + <span class="number">0</span>, x - <span class="number">1</span>)  <span class="comment"># West</span></span><br><span class="line">    nw = <span class="keyword">yield</span> Query(y + <span class="number">1</span>, x - <span class="number">1</span>)  <span class="comment"># Northwest</span></span><br><span class="line">    neighbor_states = [n_, ne, e_, se, s_, sw, w_, nw]</span><br><span class="line">    count = <span class="number">0</span></span><br><span class="line">    <span class="keyword">for</span> state <span class="keyword">in</span> neighbor_states:</span><br><span class="line">        <span class="keyword">if</span> state == ALIVE:</span><br><span class="line">            count += <span 
class="number">1</span></span><br><span class="line">    <span class="keyword">return</span> count</span><br></pre></td></tr></table></figure><p>Let's test the count_neighbors coroutine with made-up data.<br>The code below asks the generator for one <code>Query</code> object per neighboring cell; the coroutine yields each one as a <code>Query namedtuple</code>. We then pass the corresponding state to the coroutine with <code>send</code>, so that <code>count_neighbors</code> receives the state for the previous <code>Query</code> (recall the evaluation order of a <code>yield</code> line described above: the right-hand side first, then the left).</p><figure class="highlight python"><figcaption><span>Testing the count_neighbors coroutine</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">&gt;&gt;&gt; </span>it = count_neighbors(<span class="number">10</span>, <span class="number">5</span>)</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>next(it)  <span class="comment"># Get the first query, for q1</span></span><br><span class="line">Query(y=<span class="number">11</span>, x=<span class="number">5</span>)</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>it.send(ALIVE)  <span class="comment"># Send q1 state, get q2</span></span><br><span class="line">Query(y=<span class="number">11</span>, x=<span class="number">6</span>)</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>it.send(ALIVE)  <span class="comment"># Send q2 state, get q3</span></span><br><span class="line">Query(y=<span class="number">10</span>, x=<span class="number">6</span>)</span><br><span 
class="meta">&gt;&gt;&gt; </span> <span class="comment"># Send q3 ... q7 states, get q4 ... q8</span></span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>[it.send(state) <span class="keyword">for</span> state <span class="keyword">in</span> (EMPTY)*<span class="number">5</span>]  <span class="comment"># doctest: +ELLIPSIS</span></span><br><span class="line">[Query(y=<span class="number">9</span>, x=<span class="number">6</span>), Query(y=<span class="number">9</span>, x=<span class="number">5</span>), ..., Query(y=<span class="number">11</span>, x=<span class="number">4</span>)]</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span><span class="keyword">try</span>:</span><br><span class="line"><span class="meta">... </span>    it.send(EMPTY)  <span class="comment"># Send q8 state, drive coroutine to end</span></span><br><span class="line"><span class="meta">... </span><span class="keyword">except</span> StopIteration <span class="keyword">as</span> e:</span><br><span class="line"><span class="meta">... 
</span>    count = e.value  <span class="comment"># Value from return statement</span></span><br><span class="line">...</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>count</span><br><span class="line"><span class="number">2</span></span><br></pre></td></tr></table></figure><h4 id="step-cell"><a href="#step-cell" class="headerlink" title="step_cell"></a>step_cell</h4><p>Once we know how many living neighbors a cell has, we need to update the cell's state from that count and pass the resulting state to the outside caller.<br>For this we define a custom <code>Transition</code> object, which represents the next-generation state of the cell at coordinates (y, x).</p><figure class="highlight python"><figcaption><span>step_cell produces the next-generation state from the living-neighbor count computed by count_neighbors</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">Transition = namedtuple(<span class="string">'Transition'</span>, (<span class="string">'y'</span>, <span class="string">'x'</span>, <span class="string">'state'</span>))  <span class="comment"># state is the next-generation state</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">step_cell</span><span class="params">(y, x)</span>:</span></span><br><span class="line">    current_state = <span class="keyword">yield</span> Query(y, x) <span class="comment"># get the current state</span></span><br><span class="line">    neighbors = <span class="keyword">yield</span> <span class="keyword">from</span> count_neighbors(y, x)  <span 
class="comment"># delegate to the count_neighbors subgenerator</span></span><br><span class="line">    next_state = game_logic(current_state, neighbors)  <span class="comment"># game_logic applies the rules to pick the next state</span></span><br><span class="line">    <span class="keyword">yield</span> Transition(y, x, next_state)</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">game_logic</span><span class="params">(state, neighbors)</span>:</span></span><br><span class="line">    <span class="comment"># could be condensed: 3 neighbors gives ALIVE, 2 keeps state, anything else gives EMPTY</span></span><br><span class="line">    <span class="keyword">if</span> state == ALIVE:</span><br><span class="line">        <span class="keyword">if</span> neighbors &lt; <span class="number">2</span>:</span><br><span class="line">            <span class="keyword">return</span> EMPTY     <span class="comment"># Die: Too few</span></span><br><span class="line">        <span class="keyword">elif</span> neighbors &gt; <span class="number">3</span>:</span><br><span class="line">            <span class="keyword">return</span> EMPTY     <span class="comment"># Die: Too many</span></span><br><span class="line">    <span class="keyword">else</span>:</span><br><span class="line">        <span class="keyword">if</span> neighbors == <span class="number">3</span>:</span><br><span class="line">            <span class="keyword">return</span> ALIVE     <span class="comment"># Regenerate</span></span><br><span class="line">    <span class="keyword">return</span> state</span><br></pre></td></tr></table></figure><p>Now let's test the <code>step_cell</code> coroutine with made-up data:</p><figure class="highlight python"><figcaption><span>Testing the step_cell coroutine</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span 
class="line"><span class="meta">&gt;&gt;&gt; </span>it = step_cell(<span class="number">10</span>, <span class="number">5</span>)</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>next(it)  <span class="comment"># Initial location query</span></span><br><span class="line">Query(y=<span class="number">10</span>, x=<span class="number">5</span>)</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>[it.send(st) <span class="keyword">for</span> st <span class="keyword">in</span> (ALIVE)*<span class="number">5</span> + (EMPTY)*<span class="number">3</span>]   <span class="comment"># doctest: +ELLIPSIS</span></span><br><span class="line">[Query(y=<span class="number">11</span>, x=<span class="number">5</span>), Query(y=<span class="number">11</span>, x=<span class="number">6</span>), ... Query(y=<span class="number">11</span>, x=<span class="number">4</span>)]</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>it.send(EMPTY)  <span class="comment"># Send q8 state, get game decision</span></span><br><span class="line">Transition(y=<span class="number">10</span>, x=<span class="number">5</span>, state=<span class="string">'-'</span>)</span><br></pre></td></tr></table></figure><p>The session above advances a single cell in the grid by one step. Next we compose <code>step_cell</code> into a new <code>simulate</code> coroutine. The new coroutine uses the yield from expression repeatedly to advance every cell in the grid. After every cell has been processed, <code>simulate</code> yields a <code>TICK</code> object to signal that the whole current generation has been transitioned.</p><figure class="highlight python"><figcaption><span>simulate</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">TICK = object()</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">def</span> <span 
class="title">simulate</span><span class="params">(height, width)</span>:</span></span><br><span class="line">    <span class="keyword">while</span> <span class="keyword">True</span>:</span><br><span class="line">        <span class="keyword">for</span> y <span class="keyword">in</span> range(height):</span><br><span class="line">            <span class="keyword">for</span> x <span class="keyword">in</span> range(width):</span><br><span class="line">                <span class="keyword">yield</span> <span class="keyword">from</span> step_cell(y, x)  <span class="comment"># delegate to the step_cell subgenerator</span></span><br><span class="line">        <span class="keyword">yield</span> TICK</span><br></pre></td></tr></table></figure><h4 id="网格显示状态"><a href="#网格显示状态" class="headerlink" title="Displaying the grid state"></a>Displaying the grid state</h4><p>To run <code>simulate</code> in a real environment, we need to represent the state of every cell in the grid. We define a Grid class to hold the whole grid:</p><figure class="highlight python"><figcaption><span>The Grid class holds the grid and its cell states</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">Grid</span><span class="params">(object)</span>:</span></span><br><span class="line">    <span 
class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, height, width)</span>:</span></span><br><span class="line">        self.height = height</span><br><span class="line">        self.width = width</span><br><span class="line">        self.rows = []</span><br><span class="line">        <span class="keyword">for</span> _ <span class="keyword">in</span> range(self.height):</span><br><span class="line">            self.rows.append([EMPTY] * self.width)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__str__</span><span class="params">(self)</span>:</span></span><br><span class="line">        output = <span class="string">''</span></span><br><span class="line">        <span class="keyword">for</span> row <span class="keyword">in</span> self.rows:</span><br><span class="line">            <span class="keyword">for</span> cell <span class="keyword">in</span> row:</span><br><span class="line">                output += cell</span><br><span class="line">            output += <span class="string">'\n'</span></span><br><span class="line">        <span class="keyword">return</span> output</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__getitem__</span><span class="params">(self, position)</span>:</span></span><br><span class="line">        y, x = position</span><br><span class="line">        <span class="comment"># Out-of-range coordinates wrap around via the modulo operator</span></span><br><span class="line">        <span class="keyword">return</span> self.rows[y % self.height][x % self.width]</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__setitem__</span><span class="params">(self, position, state)</span>:</span></span><br><span class="line">        y, x = position</span><br><span 
class="line">        self.rows[y % self.height][x % self.width] = state</span><br></pre></td></tr></table></figure><p>We define the two special methods <code>__getitem__</code> and <code>__setitem__</code> to read and write a cell's <code>state</code>. Here is how a <code>Grid</code> prints:</p><figure class="highlight python"><figcaption><span>Building a Grid and setting cell states</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">&gt;&gt;&gt; </span>grid = Grid(<span class="number">5</span>, <span class="number">9</span>)</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>grid[<span class="number">0</span>, <span class="number">3</span>] = ALIVE</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>grid[<span class="number">1</span>, <span class="number">4</span>] = ALIVE</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>grid[<span class="number">2</span>, <span class="number">2</span>] = ALIVE</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>grid[<span class="number">2</span>, <span class="number">3</span>] = ALIVE</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>grid[<span class="number">2</span>, <span class="number">4</span>] = ALIVE</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>print(grid)</span><br><span class="line">---*-----</span><br><span class="line">----*----</span><br><span class="line">--***----</span><br><span class="line">---------</span><br><span class="line">---------</span><br></pre></td></tr></table></figure><h4 id="live-a-generation"><a href="#live-a-generation" 
class="headerlink" title="live_a_generation"></a>live_a_generation</h4><p>This function advances every cell in the grid by one step. Once all cells have made their transitions, they form a new grid, which the function returns to the caller.</p><figure class="highlight python"><figcaption><span>live_a_generation</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">def</span> <span class="title">live_a_generation</span><span class="params">(grid, sim)</span>:</span></span><br><span class="line">    <span class="comment"># grid: the current Grid; sim: the simulate generator object</span></span><br><span class="line">    progeny = Grid(grid.height, grid.width)  <span class="comment"># Grid for the next generation</span></span><br><span class="line">    item = next(sim)</span><br><span class="line">    <span class="keyword">while</span> item <span class="keyword">is</span> <span class="keyword">not</span> TICK:</span><br><span class="line">        <span class="keyword">if</span> isinstance(item, Query):  <span class="comment"># still counting neighbors: answer the query</span></span><br><span class="line">            state = grid[item.y, item.x]</span><br><span class="line">            item = sim.send(state)</span><br><span class="line">        <span class="keyword">else</span>:  <span class="comment"># Must be a Transition: neighbor counting is done</span></span><br><span class="line">            progeny[item.y, item.x] = item.state</span><br><span class="line">            item = next(sim) <span class="comment"># resume the generator until its next yield, i.e. the next cell in simulate</span></span><br><span class="line">    <span class="keyword">return</span> progeny  <span 
class="comment"># return the Grid for the next generation</span></span><br></pre></td></tr></table></figure><p><code>live_a_generation</code> advances the current cells by one step. Now let's display the result of every generation:</p><figure class="highlight python"><figcaption><span>ColumnPrinter</span></figcaption><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="class"><span class="keyword">class</span> <span class="title">ColumnPrinter</span><span class="params">(object)</span>:</span></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self)</span>:</span></span><br><span class="line">        self.columns = []</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">append</span><span class="params">(self, data)</span>:</span></span><br><span class="line">        self.columns.append(data)</span><br><span class="line"></span><br><span class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__str__</span><span class="params">(self)</span>:</span></span><br><span class="line">        row_count = <span class="number">1</span></span><br><span class="line">        <span class="keyword">for</span> data <span 
class="keyword">in</span> self.columns:</span><br><span class="line">            row_count = max(row_count, len(data.splitlines()) + <span class="number">1</span>)</span><br><span class="line">        rows = [<span class="string">''</span>] * row_count</span><br><span class="line">        <span class="keyword">for</span> j <span class="keyword">in</span> range(row_count):</span><br><span class="line">            <span class="keyword">for</span> i, data <span class="keyword">in</span> enumerate(self.columns):</span><br><span class="line">                line = data.splitlines()[max(<span class="number">0</span>, j - <span class="number">1</span>)]</span><br><span class="line">                <span class="keyword">if</span> j == <span class="number">0</span>:</span><br><span class="line">                    rows[j] += str(i).center(len(line))</span><br><span class="line">                <span class="keyword">else</span>:</span><br><span class="line">                    rows[j] += line</span><br><span class="line">                <span class="keyword">if</span> (i + <span class="number">1</span>) &lt; len(self.columns):</span><br><span class="line">                    rows[j] += <span class="string">' | '</span></span><br><span class="line">        <span class="keyword">return</span> <span class="string">'\n'</span>.join(rows)</span><br></pre></td></tr></table></figure><p>Let's see the result:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">&gt;&gt;&gt; </span>columns = 
ColumnPrinter()</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>sim = simulate(grid.height, grid.width)</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span><span class="keyword">for</span> i <span class="keyword">in</span> range(<span class="number">5</span>):</span><br><span class="line"><span class="meta">... </span>    columns.append(str(grid))</span><br><span class="line"><span class="meta">... </span>    grid = live_a_generation(grid, sim)</span><br><span class="line">...</span><br><span class="line"><span class="meta">&gt;&gt;&gt; </span>print(columns)  <span class="comment"># doctest: +NORMALIZE_WHITESPACE</span></span><br><span class="line">    <span class="number">0</span>     |     <span class="number">1</span>     |     <span class="number">2</span>     |     <span class="number">3</span>     |     <span class="number">4</span></span><br><span class="line">---*----- | --------- | --------- | --------- | ---------</span><br><span class="line">----*---- | --*-*---- | ----*---- | ---*----- | ----*----</span><br><span class="line">--***---- | ---**---- | --*-*---- | ----**--- | -----*---</span><br><span class="line">--------- | ---*----- | ---**---- | ---**---- | ---***---</span><br><span class="line">--------- | --------- | --------- | --------- | ---------</span><br></pre></td></tr></table></figure><p>The biggest advantage of this approach is that developers can change the code surrounding the game_logic function without modifying the function itself.<br>This example shows how coroutines can separate the concerns of a program, and separation of concerns is an important design principle.</p><h2 id="结束"><a href="#结束" class="headerlink" title="The End"></a>The End</h2>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;Consider coroutines to run many functions concurrently&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;Preface&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;div class=&quot;note primary&quot;&gt;&lt;p&gt;&lt;br&gt;Threads let us run multiple functions seemingly at the same time. However, threads have &lt;code&gt;three&lt;/code&gt; significant drawbacks:&lt;br&gt;&lt;ul&gt;&lt;br&gt;&lt;li&gt;&lt;i class=&quot;fa fa-minus-square&quot;&gt;&lt;/i&gt; To keep data safe, we must coordinate threads with special tools (&lt;code&gt;Lock&lt;/code&gt;, &lt;code&gt;Queue&lt;/code&gt;, etc.), which makes multithreaded code harder to understand than single-threaded procedural code. This complexity gradually makes a program difficult to extend and maintain.&lt;/li&gt;&lt;br&gt;&lt;li&gt;&lt;i class=&quot;fa fa-minus-square&quot;&gt;&lt;/i&gt; Threads &lt;code&gt;require a lot of memory&lt;/code&gt;: each executing thread takes about &lt;code&gt;8MB&lt;/code&gt;. Most computers can afford that for a dozen or so threads.&lt;/li&gt;&lt;br&gt;&lt;li&gt;&lt;i class=&quot;fa fa-minus-square&quot;&gt;&lt;/i&gt; Threads are &lt;code&gt;costly to start&lt;/code&gt;. If a program constantly creates new threads to run functions concurrently and waits for them to finish, the overhead of using threads slows the whole program down.&lt;/li&gt;&lt;br&gt;&lt;/ul&gt;&lt;/p&gt;&lt;/div&gt;
    
    </summary>
    
      <category term="Python" scheme="https://cgdeeplearn.github.io/categories/Python/"/>
    
      <category term="进程线程协程" scheme="https://cgdeeplearn.github.io/categories/Python/%E8%BF%9B%E7%A8%8B%E7%BA%BF%E7%A8%8B%E5%8D%8F%E7%A8%8B/"/>
    
    
      <category term="yield from" scheme="https://cgdeeplearn.github.io/tags/yield-from/"/>
    
      <category term="yield" scheme="https://cgdeeplearn.github.io/tags/yield/"/>
    
      <category term="coroutine" scheme="https://cgdeeplearn.github.io/tags/coroutine/"/>
    
  </entry>
  
  <entry>
    <title>SparkMLlib-Advanced-Topics</title>
    <link href="https://cgdeeplearn.github.io/2018/01/23/SparkMLlib-Advanced-Topics/"/>
    <id>https://cgdeeplearn.github.io/2018/01/23/SparkMLlib-Advanced-Topics/</id>
    <published>2018-01-23T02:45:01.000Z</published>
    <updated>2018-03-01T09:35:47.765Z</updated>
    
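    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Optimization of linear methods</p><p><img src="" alt="" style="width:100%"></p><p>Three optimization methods for linear models:</p><ul><li><code>Limited-memory BFGS (L-BFGS)</code></li><li><code>Normal equation solver for weighted least squares</code></li><li><code>Iteratively reweighted least squares (IRLS)</code></li></ul><a id="more"></a><h2 id="Limited-memory-BFGS-L-BFGS-有限记忆BFGS"><a href="#Limited-memory-BFGS-L-BFGS-有限记忆BFGS" class="headerlink" title="Limited-memory BFGS (L-BFGS)"></a>Limited-memory BFGS (L-BFGS)</h2><p> <code>L-BFGS</code> is an optimization algorithm in the family of <code>quasi-Newton methods</code> that solves optimization problems of the form <code>min_{w ∈ R^d} f(w)</code>. <code>L-BFGS</code> approximates the objective function locally as a quadratic, without evaluating the second partial derivatives of the objective to construct the <code>Hessian</code> matrix. Instead, the <code>Hessian</code> matrix is approximated from previous gradient evaluations, so there is no vertical scalability issue (in the number of training features), unlike computing the Hessian explicitly in Newton's method. As a result, <code>L-BFGS</code> often converges faster than other first-order optimization methods.</p><p>The <a href="http://research-srv.microsoft.com/en-us/um/people/jfgao/paper/icml07scalable.pdf" target="_blank" rel="noopener">Orthant-Wise Limited-memory Quasi-Newton (OWL-QN)</a> algorithm is an extension of L-BFGS that can effectively handle L1 and elastic net regularization. <code>L-BFGS</code> is used in Spark MLlib as a solver for <code>LinearRegression</code>, <code>LogisticRegression</code>, <code>AFTSurvivalRegression</code> and <code>MultilayerPerceptronClassifier</code>.</p><h2 id="Normal-equation-solver-for-weighted-least-square用于加权最小二乘法的正态方程求解器"><a href="#Normal-equation-solver-for-weighted-least-square用于加权最小二乘法的正态方程求解器" class="headerlink" title="Normal equation solver for weighted least squares"></a>Normal equation solver for weighted least squares</h2><p>MLlib implements a normal equation solver for <a href="https://en.wikipedia.org/wiki/Least_squares#Weighted_least_squares" target="_blank" rel="noopener">weighted least squares</a> through <a href="https://github.com/apache/spark/blob/v2.2.1/mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala" target="_blank" rel="noopener">WeightedLeastSquares</a>.</p><p>Spark MLlib currently supports two types of solvers for the normal equations: <code>Cholesky factorization</code> and <code>quasi-Newton methods (L-BFGS / OWL-QN)</code>. Cholesky factorization depends on a positive definite covariance matrix (i.e. the columns of the data matrix must be linearly independent) and fails if this condition is violated. Quasi-Newton methods can still provide a reasonable solution when the covariance matrix is not positive definite, so the normal equation solver can fall back to quasi-Newton methods in that case. This fallback is currently always enabled for the <code>LinearRegression</code> and <code>GeneralizedLinearRegression</code> estimators.</p><p><code>WeightedLeastSquares</code> supports L1, L2 and elastic net regularization, and provides options to enable or disable regularization and standardization. When no L1 regularization is applied (i.e. α = 0), an analytical solution exists and either the Cholesky or the quasi-Newton solver may be used. When α &gt; 0, no analytical solution exists, and the quasi-Newton solver is used to find the coefficients iteratively.</p><p>To keep the normal equation approach efficient, <code>WeightedLeastSquares</code> requires that the number of features be no more than 4096. For larger problems, use <code>L-BFGS</code> instead.</p><h2 id="Iteratively-reweighted-least-squares-IRLS-迭代重新加权最小二乘"><a href="#Iteratively-reweighted-least-squares-IRLS-迭代重新加权最小二乘" class="headerlink" title="Iteratively reweighted least squares (IRLS)"></a>Iteratively reweighted least squares (IRLS)</h2><p>MLlib implements <a href="https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares" target="_blank" rel="noopener">iteratively reweighted least squares (IRLS)</a> through <a href="https://github.com/apache/spark/blob/v2.2.1/mllib/src/main/scala/org/apache/spark/ml/optim/IterativelyReweightedLeastSquares.scala" target="_blank" rel="noopener">IterativelyReweightedLeastSquares</a>. It can be used to find the maximum likelihood estimates of a generalized linear model (GLM), and to find M-estimators in robust regression and other optimization problems. For more information, see <a href="http://www.jstor.org/stable/2345503" target="_blank" rel="noopener">Iteratively Reweighted Least Squares for Maximum Likelihood Estimation, and some Robust and Resistant Alternatives</a>.</p><p>It solves certain optimization problems iteratively through the following procedure:</p><ul><li>Linearize the objective at the current solution and update the corresponding weights.</li><li>Solve a weighted least squares (WLS) problem with WeightedLeastSquares.</li><li>Repeat the steps above until convergence.</li></ul><p>Because each iteration solves a weighted least squares (WLS) problem with <code>WeightedLeastSquares</code>, IRLS also requires that the number of features be no more than <code>4096</code>. IRLS is currently the default solver of <code>GeneralizedLinearRegression</code>.</p><p><strong>For more details, see <a href="https://spark.apache.org/docs/latest/ml-advanced.html" target="_blank" rel="noopener">Spark ml-advanced</a></strong></p><h2 id="结束"><a href="#结束" class="headerlink" title="The End"></a>The End</h2>]]></content>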
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;Optimization of linear methods&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
&lt;p&gt;Three optimization methods for linear models:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Limited-memory BFGS (L-BFGS)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Normal equation solver for weighted least squares&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Iteratively reweighted least squares (IRLS)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
    
    </summary>
    
      <category term="Spark" scheme="https://cgdeeplearn.github.io/categories/Spark/"/>
    
      <category term="MLlib" scheme="https://cgdeeplearn.github.io/categories/Spark/MLlib/"/>
    
    
      <category term="线性方法的优化" scheme="https://cgdeeplearn.github.io/tags/%E7%BA%BF%E6%80%A7%E6%96%B9%E6%B3%95%E7%9A%84%E4%BC%98%E5%8C%96/"/>
    
      <category term="有限记忆BFGS" scheme="https://cgdeeplearn.github.io/tags/%E6%9C%89%E9%99%90%E8%AE%B0%E5%BF%86BFGS/"/>
    
      <category term="用于加权最小二乘法的正态方程求解器" scheme="https://cgdeeplearn.github.io/tags/%E7%94%A8%E4%BA%8E%E5%8A%A0%E6%9D%83%E6%9C%80%E5%B0%8F%E4%BA%8C%E4%B9%98%E6%B3%95%E7%9A%84%E6%AD%A3%E6%80%81%E6%96%B9%E7%A8%8B%E6%B1%82%E8%A7%A3%E5%99%A8/"/>
    
      <category term="迭代重新加权最小二乘" scheme="https://cgdeeplearn.github.io/tags/%E8%BF%AD%E4%BB%A3%E9%87%8D%E6%96%B0%E5%8A%A0%E6%9D%83%E6%9C%80%E5%B0%8F%E4%BA%8C%E4%B9%98/"/>
    
  </entry>
  
  <entry>
    <title>SparkMLlib-ML-Tuning</title>
    <link href="https://cgdeeplearn.github.io/2018/01/23/SparkMLlib-ML-Tuning/"/>
    <id>https://cgdeeplearn.github.io/2018/01/23/SparkMLlib-ML-Tuning/</id>
    <published>2018-01-23T02:29:01.000Z</published>
    <updated>2018-03-01T09:37:05.697Z</updated>
    
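    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Model selection and hyperparameter tuning<br></p><p><img src="" alt="" style="width:100%"><br><code>ML Tuning</code>: <code>model selection</code> and <code>hyperparameter tuning</code><br>This section describes how to use MLlib's tooling to tune ML algorithms and Pipelines. Built-in cross-validation and other tools let users optimize hyperparameters in algorithms and Pipelines.</p><a id="more"></a><h2 id="Model-selection-又叫hyperparameter-tuning"><a href="#Model-selection-又叫hyperparameter-tuning" class="headerlink" title="Model selection (a.k.a. hyperparameter tuning)"></a>Model selection (a.k.a. hyperparameter tuning)</h2><p>An important task in ML is <code>model selection</code>: using data to find the best model or parameters for a given task. This is also called <code>tuning</code>. Tuning may be done for an individual Estimator such as LogisticRegression, or for an entire Pipeline. Users can tune a whole Pipeline at once rather than tuning each element of the Pipeline separately.</p><p>MLlib supports model selection with tools such as <a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.tuning.CrossValidator" target="_blank" rel="noopener">CrossValidator</a> and <a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.tuning.TrainValidationSplit" target="_blank" rel="noopener">TrainValidationSplit</a>. These tools require the following items:</p><ul><li>Estimator: the algorithm or Pipeline to tune</li><li>Set of ParamMaps: the parameters to choose from, sometimes called a "parameter grid" to search over</li><li>Evaluator: the metric that measures how well a fitted Model does on held-out test data</li></ul><p>At a high level, these model selection tools work as follows:</p><ul><li>They split the input data into separate training and test datasets.</li><li>For each (training, test) pair, they iterate through the set of ParamMaps: for each ParamMap, they fit the Estimator with those parameters, obtain the fitted Model, and evaluate the Model's performance with the Evaluator.</li><li>They select the Model produced by the best-performing set of parameters.</li></ul><p>The Evaluator can be a <code>RegressionEvaluator</code> for regression problems, a <code>BinaryClassificationEvaluator</code> for binary data, or a <code>MulticlassClassificationEvaluator</code> for multiclass problems. The default metric of each evaluator can be overridden with its <code>setMetricName</code> method.</p><p>Users can build the parameter grid with <a href="https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.tuning.ParamGridBuilder" target="_blank" rel="noopener">ParamGridBuilder</a>.</p><h2 id="Cross-Validation"><a href="#Cross-Validation" class="headerlink" 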
title="Cross-Validation"></a>Cross-Validation</h2><p><code>CrossValidator</code> begins by splitting the dataset into a set of folds, which are used as separate training and test datasets. For example, with k=3 folds, CrossValidator generates 3 (training, test) dataset pairs, each of which uses 2/3 of the data for training and 1/3 for testing. To evaluate a particular ParamMap, CrossValidator computes the average evaluation metric over the 3 Models produced by fitting the Estimator on the 3 different (training, test) pairs. After identifying the best ParamMap, CrossValidator finally re-fits the Estimator using the best ParamMap and the entire dataset.</p><p>Example:</p><p>Note that cross-validation over a grid of parameters is expensive. In the example below, the parameter grid has 3 values for <code>hashingTF.numFeatures</code> and 2 values for <code>lr.regParam</code>, and CrossValidator uses 2 folds. This multiplies out to <code>(3*2)*2 = 12</code> different models being trained. In realistic settings it is common to try many more parameters and to use more folds (k=3 and k=10 are common), so using CrossValidator can be very expensive. Even so, it is a well-established method for choosing parameters, and it is more statistically sound than heuristic hand-tuning.</p><h3 id="Examples"><a href="#Examples" class="headerlink" title="Examples"></a>Examples</h3><p>For more details on the API, refer to the <a href="https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.tuning.CrossValidator" target="_blank" rel="noopener">CrossValidator Python docs</a>.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span 
class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> pyspark.ml <span class="keyword">import</span> Pipeline</span><br><span class="line"><span class="keyword">from</span> pyspark.ml.classification <span class="keyword">import</span> LogisticRegression</span><br><span class="line"><span class="keyword">from</span> pyspark.ml.evaluation <span class="keyword">import</span> BinaryClassificationEvaluator</span><br><span class="line"><span class="keyword">from</span> pyspark.ml.feature <span class="keyword">import</span> HashingTF, Tokenizer</span><br><span class="line"><span class="keyword">from</span> pyspark.ml.tuning <span class="keyword">import</span> CrossValidator, ParamGridBuilder</span><br><span class="line"><span class="keyword">from</span> pyspark.sql <span class="keyword">import</span> SparkSession</span><br><span class="line"></span><br><span class="line">spark = SparkSession.builder.appName(<span 
class="string">"CrossValidatorExample"</span>).getOrCreate()</span><br><span class="line"><span class="comment"># Prepare training documents, which are labeled.</span></span><br><span class="line">training = spark.createDataFrame([</span><br><span class="line">    (<span class="number">0</span>, <span class="string">"a b c d e spark"</span>, <span class="number">1.0</span>),</span><br><span class="line">    (<span class="number">1</span>, <span class="string">"b d"</span>, <span class="number">0.0</span>),</span><br><span class="line">    (<span class="number">2</span>, <span class="string">"spark f g h"</span>, <span class="number">1.0</span>),</span><br><span class="line">    (<span class="number">3</span>, <span class="string">"hadoop mapreduce"</span>, <span class="number">0.0</span>),</span><br><span class="line">    (<span class="number">4</span>, <span class="string">"b spark who"</span>, <span class="number">1.0</span>),</span><br><span class="line">    (<span class="number">5</span>, <span class="string">"g d a y"</span>, <span class="number">0.0</span>),</span><br><span class="line">    (<span class="number">6</span>, <span class="string">"spark fly"</span>, <span class="number">1.0</span>),</span><br><span class="line">    (<span class="number">7</span>, <span class="string">"was mapreduce"</span>, <span class="number">0.0</span>),</span><br><span class="line">    (<span class="number">8</span>, <span class="string">"e spark program"</span>, <span class="number">1.0</span>),</span><br><span class="line">    (<span class="number">9</span>, <span class="string">"a e c l"</span>, <span class="number">0.0</span>),</span><br><span class="line">    (<span class="number">10</span>, <span class="string">"spark compile"</span>, <span class="number">1.0</span>),</span><br><span class="line">    (<span class="number">11</span>, <span class="string">"hadoop software"</span>, <span class="number">0.0</span>)</span><br><span class="line">], [<span 
class="string">"id"</span>, <span class="string">"text"</span>, <span class="string">"label"</span>])</span><br><span class="line"></span><br><span class="line"><span class="comment"># Configure an ML pipeline, which consists of three stages: tokenizer, hashingTF, and lr.</span></span><br><span class="line">tokenizer = Tokenizer(inputCol=<span class="string">"text"</span>, outputCol=<span class="string">"words"</span>)</span><br><span class="line">hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol=<span class="string">"features"</span>)</span><br><span class="line">lr = LogisticRegression(maxIter=<span class="number">10</span>)</span><br><span class="line">pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])</span><br><span class="line"></span><br><span class="line"><span class="comment"># We now treat the Pipeline as an Estimator, wrapping it in a CrossValidator instance.</span></span><br><span class="line"><span class="comment"># This will allow us to jointly choose parameters for all Pipeline stages.</span></span><br><span class="line"><span class="comment"># A CrossValidator requires an Estimator, a set of Estimator ParamMaps, and an Evaluator.</span></span><br><span class="line"><span class="comment"># We use a ParamGridBuilder to construct a grid of parameters to search over.</span></span><br><span class="line"><span class="comment"># With 3 values for hashingTF.numFeatures and 2 values for lr.regParam,</span></span><br><span class="line"><span class="comment"># this grid will have 3 x 2 = 6 parameter settings for CrossValidator to choose from.</span></span><br><span class="line">paramGrid = ParamGridBuilder() \</span><br><span class="line">    .addGrid(hashingTF.numFeatures, [<span class="number">10</span>, <span class="number">100</span>, <span class="number">1000</span>]) \</span><br><span class="line">    .addGrid(lr.regParam, [<span class="number">0.1</span>, <span class="number">0.01</span>]) \</span><br><span class="line">    
.build()</span><br><span class="line"></span><br><span class="line">crossval = CrossValidator(estimator=pipeline,</span><br><span class="line">                          estimatorParamMaps=paramGrid,</span><br><span class="line">                          evaluator=BinaryClassificationEvaluator(),</span><br><span class="line">                          numFolds=<span class="number">2</span>)  <span class="comment"># use 3+ folds in practice</span></span><br><span class="line"></span><br><span class="line"><span class="comment"># Run cross-validation, and choose the best set of parameters.</span></span><br><span class="line">cvModel = crossval.fit(training)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Prepare test documents, which are unlabeled.</span></span><br><span class="line">test = spark.createDataFrame([</span><br><span class="line">    (<span class="number">4</span>, <span class="string">"spark i j k"</span>),</span><br><span class="line">    (<span class="number">5</span>, <span class="string">"l m n"</span>),</span><br><span class="line">    (<span class="number">6</span>, <span class="string">"mapreduce spark"</span>),</span><br><span class="line">    (<span class="number">7</span>, <span class="string">"apache hadoop"</span>)</span><br><span class="line">], [<span class="string">"id"</span>, <span class="string">"text"</span>])</span><br><span class="line"></span><br><span class="line"><span class="comment"># Make predictions on test documents. 
cvModel uses the best model found (lrModel).</span></span><br><span class="line">prediction = cvModel.transform(test)</span><br><span class="line">selected = prediction.select(<span class="string">"id"</span>, <span class="string">"text"</span>, <span class="string">"probability"</span>, <span class="string">"prediction"</span>)</span><br><span class="line"><span class="keyword">for</span> row <span class="keyword">in</span> selected.collect():</span><br><span class="line">    print(row)</span><br><span class="line">selected.show()</span><br><span class="line">spark.stop()</span><br></pre></td></tr></table></figure><p>output:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">Row(id=<span class="number">4</span>, text=<span class="string">'spark i j k'</span>, probability=DenseVector([<span class="number">0.627</span>, <span class="number">0.373</span>]), prediction=<span class="number">0.0</span>)</span><br><span class="line">Row(id=<span class="number">5</span>, text=<span class="string">'l m n'</span>, probability=DenseVector([<span class="number">0.3451</span>, <span class="number">0.6549</span>]), prediction=<span class="number">1.0</span>)</span><br><span class="line">Row(id=<span class="number">6</span>, text=<span class="string">'mapreduce spark'</span>, probability=DenseVector([<span class="number">0.3351</span>, <span class="number">0.6649</span>]), prediction=<span class="number">1.0</span>)</span><br><span class="line">Row(id=<span class="number">7</span>, text=<span class="string">'apache hadoop'</span>, 
probability=DenseVector([<span class="number">0.2767</span>, <span class="number">0.7233</span>]), prediction=<span class="number">1.0</span>)</span><br><span class="line">+---+---------------+--------------------+----------+</span><br><span class="line">| id|           text|         probability|prediction|</span><br><span class="line">+---+---------------+--------------------+----------+</span><br><span class="line">|  <span class="number">4</span>|    spark i j k|[<span class="number">0.62703425702535</span>...|       <span class="number">0.0</span>|</span><br><span class="line">|  <span class="number">5</span>|          l m n|[<span class="number">0.34509123755317</span>...|       <span class="number">1.0</span>|</span><br><span class="line">|  <span class="number">6</span>|mapreduce spark|[<span class="number">0.33514123783842</span>...|       <span class="number">1.0</span>|</span><br><span class="line">|  <span class="number">7</span>|  apache hadoop|[<span class="number">0.27672019766802</span>...|       <span class="number">1.0</span>|</span><br><span class="line">+---+---------------+--------------------+----------+</span><br></pre></td></tr></table></figure><p>Find full example code at “examples/src/main/python/ml/cross_validator.py” in the Spark repo.</p><h2 id="Train-Validation-Split"><a href="#Train-Validation-Split" class="headerlink" title="Train-Validation Split"></a>Train-Validation Split</h2><p>In addition to cross-validation, Spark also offers <code>TrainValidationSplit</code> for hyperparameter tuning. Whereas cross-validation evaluates each parameter combination k times, <code>TrainValidationSplit</code> evaluates each combination only once. It is therefore cheaper to compute, but its results are less reliable when the training dataset is not sufficiently large.</p><p>Unlike cross-validation, <code>TrainValidationSplit</code> needs only a single (training, validation) dataset pair. It splits the original data into these two parts using the <code>trainRatio</code> parameter. For example, with <code>trainRatio=0.75</code>, 75% of the data is used for training and 25% for validation.</p><p>As with cross-validation, once the best ParamMap has been identified, <code>TrainValidationSplit</code> refits the Estimator using the best ParamMap and the entire dataset.</p><h3 id="Examples-1"><a href="#Examples-1" class="headerlink" title="Examples"></a>Examples</h3><p>For more details on the API, see the <a href="https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.tuning.TrainValidationSplit" target="_blank" rel="noopener">TrainValidationSplit Python docs</a>.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> pyspark.ml.evaluation <span class="keyword">import</span> RegressionEvaluator</span><br><span class="line"><span class="keyword">from</span> pyspark.ml.regression <span class="keyword">import</span> LinearRegression</span><br><span class="line"><span class="keyword">from</span> pyspark.ml.tuning <span class="keyword">import</span> ParamGridBuilder, TrainValidationSplit</span><br><span class="line"><span 
class="keyword">from</span> pyspark.sql <span class="keyword">import</span> SparkSession</span><br><span class="line"></span><br><span class="line">spark = SparkSession.builder.appName(<span class="string">"TrainValidationSplitExample"</span>).getOrCreate()</span><br><span class="line"><span class="comment"># Prepare training and test data.</span></span><br><span class="line">data = spark.read.format(<span class="string">"libsvm"</span>)\</span><br><span class="line">    .load(<span class="string">"data/mllib/sample_linear_regression_data.txt"</span>)</span><br><span class="line">train, test = data.randomSplit([<span class="number">0.9</span>, <span class="number">0.1</span>], seed=<span class="number">12345</span>)</span><br><span class="line"></span><br><span class="line">lr = LinearRegression(maxIter=<span class="number">10</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># We use a ParamGridBuilder to construct a grid of parameters to search over.</span></span><br><span class="line"><span class="comment"># TrainValidationSplit will try all combinations of values and determine best model using</span></span><br><span class="line"><span class="comment"># the evaluator.</span></span><br><span class="line">paramGrid = ParamGridBuilder()\</span><br><span class="line">    .addGrid(lr.regParam, [<span class="number">0.1</span>, <span class="number">0.01</span>]) \</span><br><span class="line">    .addGrid(lr.fitIntercept, [<span class="keyword">False</span>, <span class="keyword">True</span>])\</span><br><span class="line">    .addGrid(lr.elasticNetParam, [<span class="number">0.0</span>, <span class="number">0.5</span>, <span class="number">1.0</span>])\</span><br><span class="line">    .build()</span><br><span class="line"></span><br><span class="line"><span class="comment"># In this case the estimator is simply the linear regression.</span></span><br><span class="line"><span class="comment"># A TrainValidationSplit requires an 
Estimator, a set of Estimator ParamMaps, and an Evaluator.</span></span><br><span class="line">tvs = TrainValidationSplit(estimator=lr,</span><br><span class="line">                           estimatorParamMaps=paramGrid,</span><br><span class="line">                           evaluator=RegressionEvaluator(),</span><br><span class="line">                           <span class="comment"># 80% of the data will be used for training, 20% for validation.</span></span><br><span class="line">                           trainRatio=<span class="number">0.8</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Run TrainValidationSplit, and choose the best set of parameters.</span></span><br><span class="line">model = tvs.fit(train)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Make predictions on test data. model is the model with combination of parameters</span></span><br><span class="line"><span class="comment"># that performed best.</span></span><br><span class="line">model.transform(test)\</span><br><span class="line">    .select(<span class="string">"features"</span>, <span class="string">"label"</span>, <span class="string">"prediction"</span>)\</span><br><span class="line">    .show()</span><br><span class="line">spark.stop()</span><br></pre></td></tr></table></figure><p>output:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span 
class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line">+--------------------+--------------------+--------------------+</span><br><span class="line">|            features|               label|          prediction|</span><br><span class="line">+--------------------+--------------------+--------------------+</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...|  <span class="number">-23.51088409032297</span>| <span class="number">-1.6659388625179559</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...| <span class="number">-21.432387764165806</span>|  <span class="number">0.3400877302576284</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...| <span class="number">-12.977848725392104</span>|<span class="number">-0.02335359093652395</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...| <span class="number">-11.827072996392571</span>|  <span class="number">2.5642684021108417</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span 
class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...| <span class="number">-10.945919657782932</span>| <span class="number">-0.1631314487734783</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...|  <span class="number">-10.58331129986813</span>|   <span class="number">2.517790654691453</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...| <span class="number">-10.288657252388708</span>| <span class="number">-0.9443474180536754</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...|  <span class="number">-8.822357870425154</span>|  <span class="number">0.6872889429113783</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...|  <span class="number">-8.772667465932606</span>|  <span class="number">-1.485408580416465</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...|  <span class="number">-8.605713514762092</span>|   <span 
class="number">1.110272909026478</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...|  <span class="number">-6.544633229269576</span>|  <span class="number">3.0454559778611285</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...|  <span class="number">-5.055293333055445</span>|  <span class="number">0.6441174575094268</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...|  <span class="number">-5.039628433467326</span>|  <span class="number">0.9572366607107066</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...|  <span class="number">-4.937258492902948</span>|  <span class="number">0.2292114538379546</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...|  <span class="number">-3.741044592262687</span>|   <span class="number">3.343205816009816</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span 
class="number">5</span>,...|  <span class="number">-3.731112242951253</span>| <span class="number">-2.6826413698701064</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...|  <span class="number">-2.109441044710089</span>| <span class="number">-2.1930034039595445</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...| <span class="number">-1.8722161156986976</span>| <span class="number">0.49547270330052423</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...| <span class="number">-1.1009750789589774</span>| <span class="number">-0.9441633113006601</span>|</span><br><span class="line">|(<span class="number">10</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,...|<span class="number">-0.48115211266405217</span>| <span class="number">-0.6756196573079968</span>|</span><br><span class="line">+--------------------+--------------------+--------------------+</span><br></pre></td></tr></table></figure><p>Find full example code at “examples/src/main/python/ml/train_validation_split.py” in the Spark repo.</p><p><strong>For more information, see <a href="https://spark.apache.org/docs/latest/ml-tuning.html" target="_blank" rel="noopener">Spark ml-tuning</a></strong></p><h2 id="结束"><a href="#结束" class="headerlink" title="End"></a>End</h2>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;Model selection and hyperparameter tuning&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;br&gt;&lt;code&gt;ML Tuning&lt;/code&gt;: &lt;code&gt;model selection&lt;/code&gt; and &lt;code&gt;hyperparameter tuning&lt;/code&gt;&lt;br&gt;This section describes how to use MLlib's tooling to tune ML algorithms and Pipelines. Built-in cross-validation and other tooling let users optimize hyperparameters in algorithms and Pipelines.&lt;/p&gt;
    
    </summary>
    
      <category term="Spark" scheme="https://cgdeeplearn.github.io/categories/Spark/"/>
    
      <category term="MLlib" scheme="https://cgdeeplearn.github.io/categories/Spark/MLlib/"/>
    
    
      <category term="模型选择" scheme="https://cgdeeplearn.github.io/tags/%E6%A8%A1%E5%9E%8B%E9%80%89%E6%8B%A9/"/>
    
      <category term="交叉验证" scheme="https://cgdeeplearn.github.io/tags/%E4%BA%A4%E5%8F%89%E9%AA%8C%E8%AF%81/"/>
    
      <category term="训练-验证集划分" scheme="https://cgdeeplearn.github.io/tags/%E8%AE%AD%E7%BB%83-%E9%AA%8C%E8%AF%81%E9%9B%86%E5%88%92%E5%88%86/"/>
    
  </entry>
  
  <entry>
    <title>SparkMLlib-Frequent-Pattern-Mining</title>
    <link href="https://cgdeeplearn.github.io/2018/01/23/SparkMLlib-Frequent-Pattern-Mining/"/>
    <id>https://cgdeeplearn.github.io/2018/01/23/SparkMLlib-Frequent-Pattern-Mining/</id>
    <published>2018-01-23T02:02:05.000Z</published>
    <updated>2018-03-01T09:36:53.828Z</updated>
    
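    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Frequent Pattern Mining<br></p><p><img src="" alt="" style="width:100%"><br><code>Frequent Pattern Mining</code>: Mining frequent items, itemsets, subsequences, or other substructures is usually among the first steps in analyzing a large-scale dataset, and it has become an active research topic in <code>data mining</code>. See Wikipedia's <a href="http://en.wikipedia.org/wiki/Association_rule_learning" target="_blank" rel="noopener">association rule learning</a> article for more information.</p><a id="more"></a><h2 id="FP-Growth"><a href="#FP-Growth" class="headerlink" title="FP-Growth"></a>FP-Growth</h2><p>The FP-growth algorithm is described in the paper by Han et al., <a href="http://dx.doi.org/10.1145/335191.335372" target="_blank" rel="noopener">“Mining frequent patterns without candidate generation”</a>, where “FP” stands for frequent pattern. Given a dataset of transactions, the first step of FP-growth is to calculate item frequencies and identify frequent items. Unlike <a href="http://en.wikipedia.org/wiki/Apriori_algorithm" target="_blank" rel="noopener">Apriori-like</a> algorithms designed for the same purpose, the second step of FP-growth uses a suffix tree (FP-tree) structure to encode transactions without explicitly generating candidate sets, which are usually expensive to generate. After the second step, the frequent itemsets can be extracted from the FP-tree. spark.mllib implements a parallel version of FP-growth called PFP (for details, see Li et al., <a href="http://dx.doi.org/10.1145/1454008.1454027" target="_blank" rel="noopener">PFP: Parallel FP-growth for query recommendation</a>). PFP distributes the work of growing FP-trees based on the suffixes of transactions, and is therefore more scalable than a single-machine implementation.</p><ul><li><p>The spark.ml FP-growth implementation takes the following (hyper-)parameters:</p><ul><li><p><code>minSupport</code><br>The minimum support for an itemset to be identified as frequent. For example, if an item appears in 3 out of 5 transactions, it has a support of 3/5 = 0.6.</p></li><li><p><code>minConfidence</code><br>The minimum confidence for generating association rules. Confidence indicates how often an association rule has been found to be true. For example, if the itemset X appears 4 times in the transactions, and X and Y co-occur only 2 times, the confidence for the rule X =&gt; Y is 2/4 = 0.5. This parameter does not affect the mining of frequent itemsets; it only specifies the minimum confidence for generating association rules from frequent itemsets.</p></li><li><p><code>numPartitions</code><br>The number of partitions used to distribute the work. By default the param is not set, and the number of partitions of the input dataset is used.</p></li></ul></li><li><p>FPGrowthModel provides:</p><ul><li><p><code>freqItemsets</code><br>The frequent itemsets, as a DataFrame with columns (“items”[Array], “freq”[Long])</p></li><li><p><code>associationRules</code><br>The association rules generated with confidence above <code>minConfidence</code>, as a <code>DataFrame</code> with columns (“antecedent”[Array], “consequent”[Array], “confidence”[Double]).</p></li><li><p><code>transform</code><br>For each transaction in itemsCol, the transform method compares its items against the antecedents of each association rule. If the record contains all the antecedents of a specific association rule, the rule is considered applicable and its consequents are added to the prediction result. The transform method summarizes the consequents of all applicable rules as the prediction. The prediction column has the same data type as itemsCol and does not contain items already present in itemsCol.</p></li></ul></li></ul><h3 id="Examples"><a href="#Examples" class="headerlink" title="Examples"></a>Examples</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> pyspark.ml.fpm <span class="keyword">import</span> FPGrowth</span><br><span class="line"><span class="keyword">from</span> pyspark.sql <span class="keyword">import</span> SparkSession</span><br><span class="line"></span><br><span class="line">spark = SparkSession.builder.appName(<span class="string">"FrequentPatternMiningExample"</span>).getOrCreate()</span><br><span 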
class="line">df = spark.createDataFrame([</span><br><span class="line">    (<span class="number">0</span>, [<span class="number">1</span>, <span class="number">2</span>, <span class="number">5</span>]),</span><br><span class="line">    (<span class="number">1</span>, [<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">5</span>]),</span><br><span class="line">    (<span class="number">2</span>, [<span class="number">1</span>, <span class="number">2</span>])</span><br><span class="line">], [<span class="string">"id"</span>, <span class="string">"items"</span>])</span><br><span class="line"></span><br><span class="line">fpGrowth = FPGrowth(itemsCol=<span class="string">"items"</span>, minSupport=<span class="number">0.5</span>, minConfidence=<span class="number">0.6</span>)</span><br><span class="line">model = fpGrowth.fit(df)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Display frequent itemsets.</span></span><br><span class="line">model.freqItemsets.show()</span><br><span class="line"></span><br><span class="line"><span class="comment"># Display generated association rules.</span></span><br><span class="line">model.associationRules.show()</span><br><span class="line"></span><br><span class="line"><span class="comment"># transform examines the input items against all the association rules and summarize the</span></span><br><span class="line"><span class="comment"># consequents as prediction</span></span><br><span class="line">model.transform(df).show()</span><br><span class="line">spark.stop()</span><br></pre></td></tr></table></figure><p>output:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line">+---------+----+</span><br><span class="line">|    items|freq|</span><br><span class="line">+---------+----+</span><br><span class="line">|      [<span class="number">5</span>]|   <span class="number">2</span>|</span><br><span class="line">|   [<span class="number">5</span>, <span class="number">1</span>]|   <span class="number">2</span>|</span><br><span class="line">|[<span class="number">5</span>, <span class="number">1</span>, <span class="number">2</span>]|   <span class="number">2</span>|</span><br><span class="line">|   [<span class="number">5</span>, <span class="number">2</span>]|   <span class="number">2</span>|</span><br><span class="line">|      [<span class="number">2</span>]|   <span class="number">3</span>|</span><br><span class="line">|      [<span class="number">1</span>]|   <span class="number">3</span>|</span><br><span class="line">|   [<span class="number">1</span>, <span class="number">2</span>]|   <span class="number">3</span>|</span><br><span class="line">+---------+----+</span><br><span class="line"></span><br><span class="line">+----------+----------+------------------+</span><br><span 
class="line">|antecedent|consequent|        confidence|</span><br><span class="line">+----------+----------+------------------+</span><br><span class="line">|       [<span class="number">5</span>]|       [<span class="number">1</span>]|               <span class="number">1.0</span>|</span><br><span class="line">|       [<span class="number">5</span>]|       [<span class="number">2</span>]|               <span class="number">1.0</span>|</span><br><span class="line">|    [<span class="number">1</span>, <span class="number">2</span>]|       [<span class="number">5</span>]|<span class="number">0.6666666666666666</span>|</span><br><span class="line">|    [<span class="number">5</span>, <span class="number">2</span>]|       [<span class="number">1</span>]|               <span class="number">1.0</span>|</span><br><span class="line">|    [<span class="number">5</span>, <span class="number">1</span>]|       [<span class="number">2</span>]|               <span class="number">1.0</span>|</span><br><span class="line">|       [<span class="number">2</span>]|       [<span class="number">5</span>]|<span class="number">0.6666666666666666</span>|</span><br><span class="line">|       [<span class="number">2</span>]|       [<span class="number">1</span>]|               <span class="number">1.0</span>|</span><br><span class="line">|       [<span class="number">1</span>]|       [<span class="number">5</span>]|<span class="number">0.6666666666666666</span>|</span><br><span class="line">|       [<span class="number">1</span>]|       [<span class="number">2</span>]|               <span class="number">1.0</span>|</span><br><span class="line">+----------+----------+------------------+</span><br><span class="line"></span><br><span class="line">+---+------------+----------+</span><br><span class="line">| id|       items|prediction|</span><br><span class="line">+---+------------+----------+</span><br><span class="line">|  <span class="number">0</span>|   [<span class="number">1</span>, <span 
class="number">2</span>, <span class="number">5</span>]|        []|</span><br><span class="line">|  <span class="number">1</span>|[<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">5</span>]|        []|</span><br><span class="line">|  <span class="number">2</span>|      [<span class="number">1</span>, <span class="number">2</span>]|       [<span class="number">5</span>]|</span><br><span class="line">+---+------------+----------+</span><br></pre></td></tr></table></figure><p>Find full example code at “examples/src/main/python/ml/fpgrowth_example.py” in the Spark repo.</p><p><strong>For more information, see <a href="https://spark.apache.org/docs/latest/ml-frequent-pattern-mining.html" target="_blank" rel="noopener">Spark FPGrowth</a></strong></p><h2 id="结束"><a href="#结束" class="headerlink" title="End"></a>End</h2>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;Frequent Pattern Mining&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;br&gt;&lt;code&gt;Frequent Pattern Mining&lt;/code&gt;: Mining frequent items, itemsets, subsequences, or other substructures is usually among the first steps in analyzing a large-scale dataset, and it has become an active research topic in &lt;code&gt;data mining&lt;/code&gt;. See Wikipedia's &lt;a href=&quot;http://en.wikipedia.org/wiki/Association_rule_learning&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;association rule learning&lt;/a&gt; article for more information.&lt;/p&gt;
    
    </summary>
    
      <category term="Spark" scheme="https://cgdeeplearn.github.io/categories/Spark/"/>
    
      <category term="MLlib" scheme="https://cgdeeplearn.github.io/categories/Spark/MLlib/"/>
    
    
      <category term="数据挖掘" scheme="https://cgdeeplearn.github.io/tags/%E6%95%B0%E6%8D%AE%E6%8C%96%E6%8E%98/"/>
    
      <category term="频繁项集" scheme="https://cgdeeplearn.github.io/tags/%E9%A2%91%E7%B9%81%E9%A1%B9%E9%9B%86/"/>
    
      <category term="关联规则" scheme="https://cgdeeplearn.github.io/tags/%E5%85%B3%E8%81%94%E8%A7%84%E5%88%99/"/>
    
      <category term="FP-Growth" scheme="https://cgdeeplearn.github.io/tags/FP-Growth/"/>
    
  </entry>
  
  <entry>
    <title>SparkMLlib-Collaborative-Filtering</title>
    <link href="https://cgdeeplearn.github.io/2018/01/22/SparkMLlib-Collaborative-Filtering/"/>
    <id>https://cgdeeplearn.github.io/2018/01/22/SparkMLlib-Collaborative-Filtering/</id>
    <published>2018-01-22T03:36:52.000Z</published>
    <updated>2018-03-01T09:36:37.827Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">推荐算法<br></p><p><img src="" alt="" style="width:100%"><br><a href="http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering" target="_blank" rel="noopener">协同过滤</a>常被用于<code>推荐系统</code>。这类技术目标在于填充“用户－商品”联系矩阵中的缺失项。Spark.ml目前支持基于模型的协同过滤，其中用户和商品以少量的潜在因子来描述，用以预测缺失项。Spark.ml使用<a href="http://dl.acm.org/citation.cfm?id=1608614" target="_blank" rel="noopener">交替最小二乘（ALS）</a>算法来学习这些潜在因子。<br><a id="more"></a><br>spark.ml有以下参数：</p><ul><li>numBlocks是为了并行化计算而将用户和项目分割成的块的数量（默认为10）。</li><li>rank是模型中潜在因素的数量（默认为10）。</li><li>maxIter是要运行的最大迭代次数（默认为10）。</li><li>regParam指定ALS中的正则化参数（默认为1.0）。</li><li>implicitPrefs指定是使用显式反馈ALS变体还是用于隐式反馈数据的变体 （默认false使用显式反馈）。</li><li>alpha是适用于ALS的隐式反馈变体的参数，其支配偏好观察值的 基线置信度（默认为1.0）。</li><li>非负指定是否对最小二乘使用非负约束（默认为false）。</li></ul><p>注意：用于ALS的基于DataFrame的API目前仅支持整数类型的用户和项目ID。用户和项目ID列支持其他数字类型，但ID必须在整数值范围内。</p><h2 id="Explicit-vs-Implict-feedfack-显示与隐式反馈"><a href="#Explicit-vs-Implict-feedfack-显示与隐式反馈" class="headerlink" title="Explicit vs Implict feedfack(显示与隐式反馈)"></a><strong>Explicit vs Implict feedfack(显示与隐式反馈)</strong></h2><p>基于矩阵分解的协同过滤的标准方法中，“用户－商品”矩阵中的条目是用户给予商品的显式偏好，例如，用户给电影评级。然而在现实世界中使用时，我们常常只能访问隐式反馈（如意见、点击、购买、喜欢以及分享等），在spark.ml中我们使用“隐式反馈数据集的协同过滤“来处理这类数据。本质上来说它不是直接对评分矩阵进行建模，而是将数据当作数值来看待，这些数值代表用户行为的观察值（如点击次数，用户观看一部电影的持续时间）。这些数值被用来衡量用户偏好观察值的置信水平，而不是显式地给商品一个评分。然后，模型用来寻找可以用来预测用户对商品预期偏好的潜在因子。</p><h2 id="Scaling-of-the-regularization-parameter-正则化参数缩放"><a href="#Scaling-of-the-regularization-parameter-正则化参数缩放" class="headerlink" title="Scaling of the regularization parameter(正则化参数缩放)"></a><strong>Scaling of the regularization parameter(正则化参数缩放)</strong></h2><p>我们调整正则化参数regParam来解决用户在更新用户因子时产生新评分或者商品更新商品因子时收到的新评分带来的最小二乘问题。这个方法叫做<a href="http://dx.doi.org/10.1007/978-3-540-68880-8_32" target="_blank" rel="noopener">“ALS-WR”</a>,它降低regParam对数据集规模的依赖，所以我们可以将从部分子集中学习到的最佳参数应用到整个数据集中时获得同样的性能。</p><h2 id="Cold-start-strategy-冷启动策略"><a 
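href="#Cold-start-strategy-冷启动策略" class="headerlink" title="Cold-start strategy"></a>Cold-start strategy</h2><p>When making predictions with an ALSModel, it is common to encounter users and/or items in the test dataset that were not present while training the model. This typically happens in two scenarios:</p><ol><li>In production, for new users or items that have no rating history and on which the model has not been trained (this is the “cold-start problem”).</li><li>During cross-validation, where the data is split into training and evaluation sets. With simple random splits, as in Spark's CrossValidator or TrainValidationSplit, it is actually very common for the evaluation set to contain users and/or items that do not appear in the training set.</li></ol><p>By default, Spark assigns NaN predictions during ALSModel.transform when a user and/or item factor is not present in the model. This can be useful in a production system, since it indicates a new user or item, so the system can decide on some fallback to use as the prediction.</p><p>During cross-validation, however, this is undesirable, because any NaN predicted value results in a NaN evaluation metric (for example when using RegressionEvaluator). This makes model selection impossible.</p><p>Spark allows users to set the coldStartStrategy parameter to “drop” in order to drop any rows in the DataFrame of predictions that contain NaN values. The evaluation metric is then computed over the non-NaN data and is valid. The example below illustrates the use of this parameter.</p><p>Note: the currently supported cold-start strategies are “nan” (the default behavior described above) and “drop”. Further strategies may be supported in the future.</p><h2 id="Examples"><a href="#Examples" class="headerlink" title="Examples"></a>Examples</h2><p>In the following example, we load ratings data from the <a href="http://grouplens.org/datasets/movielens/" target="_blank" rel="noopener">MovieLens dataset</a>, where each row consists of a user, a movie, a rating and a timestamp. We then train an ALS model, which by default assumes the ratings are explicit (implicitPrefs is false). We evaluate the recommendation model by measuring the root-mean-square error of the rating predictions.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span 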
class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> pyspark.ml.evaluation <span class="keyword">import</span> RegressionEvaluator</span><br><span class="line"><span class="keyword">from</span> pyspark.ml.recommendation <span class="keyword">import</span> ALS</span><br><span class="line"><span class="keyword">from</span> pyspark.sql <span class="keyword">import</span> Row, SparkSession</span><br><span class="line"></span><br><span class="line">spark = SparkSession.builder.appName(<span class="string">"CollaborativeFilteringExample"</span>).getOrCreate()</span><br><span class="line">lines = spark.read.text(<span class="string">"data/mllib/als/sample_movielens_ratings.txt"</span>).rdd</span><br><span class="line">parts = lines.map(<span class="keyword">lambda</span> row: row.value.split(<span class="string">"::"</span>))</span><br><span class="line">ratingsRDD = parts.map(<span class="keyword">lambda</span> p: Row(userId=int(p[<span class="number">0</span>]), movieId=int(p[<span class="number">1</span>]),</span><br><span class="line">                                     rating=float(p[<span class="number">2</span>]), timestamp=int(p[<span class="number">3</span>])))</span><br><span class="line">ratings = spark.createDataFrame(ratingsRDD)</span><br><span class="line">(training, test) = ratings.randomSplit([<span class="number">0.8</span>, <span class="number">0.2</span>])</span><br><span class="line"></span><br><span class="line"><span class="comment"># Build the recommendation model using ALS on the training data</span></span><br><span class="line"><span class="comment"># Note we set cold start strategy to 'drop' to ensure we don't get NaN evaluation metrics</span></span><br><span class="line">als = ALS(maxIter=<span class="number">5</span>, 
regParam=<span class="number">0.01</span>, userCol=<span class="string">"userId"</span>, itemCol=<span class="string">"movieId"</span>, ratingCol=<span class="string">"rating"</span>,</span><br><span class="line">          coldStartStrategy=<span class="string">"drop"</span>)</span><br><span class="line">model = als.fit(training)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Evaluate the model by computing the RMSE on the test data</span></span><br><span class="line">predictions = model.transform(test)</span><br><span class="line">evaluator = RegressionEvaluator(metricName=<span class="string">"rmse"</span>, labelCol=<span class="string">"rating"</span>,</span><br><span class="line">                                predictionCol=<span class="string">"prediction"</span>)</span><br><span class="line">rmse = evaluator.evaluate(predictions)</span><br><span class="line">print(<span class="string">"Root-mean-square error = "</span> + str(rmse))</span><br><span class="line"></span><br><span class="line"><span class="comment"># Generate top 10 movie recommendations for each user</span></span><br><span class="line">userRecs = model.recommendForAllUsers(<span class="number">10</span>)</span><br><span class="line">userRecs.show()</span><br><span class="line"><span class="comment"># userRecs.filter(userRecs['userId'] == 1).select('recommendations').show(truncate=False)  # show the 10 movies recommended to the user with userId == 1</span></span><br><span class="line"><span class="comment"># Generate top 10 user recommendations for each movie</span></span><br><span class="line">movieRecs = model.recommendForAllItems(<span class="number">10</span>)</span><br><span class="line">movieRecs.show()</span><br><span class="line">spark.stop()</span><br></pre></td></tr></table></figure><p>output:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br></pre></td><td class="code"><pre><span class="line">Root-mean-square error = <span class="number">1.742790392299329</span></span><br><span class="line">+------+--------------------+</span><br><span class="line">|userId|     recommendations|</span><br><span class="line">+------+--------------------+</span><br><span class="line">|    <span class="number">28</span>|[[<span class="number">92</span>,<span class="number">5.0226665</span>], 
...|</span><br><span class="line">|    <span class="number">26</span>|[[<span class="number">81</span>,<span class="number">5.6422243</span>], ...|</span><br><span class="line">|    <span class="number">27</span>|[[<span class="number">18</span>,<span class="number">4.069487</span>], [...|</span><br><span class="line">|    <span class="number">12</span>|[[<span class="number">19</span>,<span class="number">6.6280622</span>], ...|</span><br><span class="line">|    <span class="number">22</span>|[[<span class="number">74</span>,<span class="number">5.141776</span>], [...|</span><br><span class="line">|     <span class="number">1</span>|[[<span class="number">46</span>,<span class="number">4.550467</span>], [...|</span><br><span class="line">|    <span class="number">13</span>|[[<span class="number">93</span>,<span class="number">3.4347346</span>], ...|</span><br><span class="line">|     <span class="number">6</span>|[[<span class="number">25</span>,<span class="number">5.163864</span>], [...|</span><br><span class="line">|    <span class="number">16</span>|[[<span class="number">54</span>,<span class="number">4.865331</span>], [...|</span><br><span class="line">|     <span class="number">3</span>|[[<span class="number">75</span>,<span class="number">5.5034533</span>], ...|</span><br><span class="line">|    <span class="number">20</span>|[[<span class="number">22</span>,<span class="number">4.563996</span>], [...|</span><br><span class="line">|     <span class="number">5</span>|[[<span class="number">46</span>,<span class="number">6.402665</span>], [...|</span><br><span class="line">|    <span class="number">19</span>|[[<span class="number">94</span>,<span class="number">4.0123057</span>], ...|</span><br><span class="line">|    <span class="number">15</span>|[[<span class="number">46</span>,<span class="number">4.932741</span>], [...|</span><br><span class="line">|    <span class="number">17</span>|[[<span class="number">46</span>,<span 
class="number">5.196739</span>], [...|</span><br><span class="line">|     <span class="number">9</span>|[[<span class="number">65</span>,<span class="number">4.703967</span>], [...|</span><br><span class="line">|     <span class="number">4</span>|[[<span class="number">85</span>,<span class="number">4.958973</span>], [...|</span><br><span class="line">|     <span class="number">8</span>|[[<span class="number">43</span>,<span class="number">5.747457</span>], [...|</span><br><span class="line">|    <span class="number">23</span>|[[<span class="number">32</span>,<span class="number">5.279368</span>], [...|</span><br><span class="line">|     <span class="number">7</span>|[[<span class="number">62</span>,<span class="number">5.059422</span>], [...|</span><br><span class="line">+------+--------------------+</span><br><span class="line">only showing top <span class="number">20</span> rows</span><br><span class="line"></span><br><span class="line">+-------+--------------------+</span><br><span class="line">|movieId|     recommendations|</span><br><span class="line">+-------+--------------------+</span><br><span class="line">|     <span class="number">31</span>|[[<span class="number">12</span>,<span class="number">3.5030043</span>], ...|</span><br><span class="line">|     <span class="number">85</span>|[[<span class="number">14</span>,<span class="number">5.6425133</span>], ...|</span><br><span class="line">|     <span class="number">65</span>|[[<span class="number">23</span>,<span class="number">4.9570875</span>], ...|</span><br><span class="line">|     <span class="number">53</span>|[[<span class="number">14</span>,<span class="number">5.271897</span>], [...|</span><br><span class="line">|     <span class="number">78</span>|[[<span class="number">12</span>,<span class="number">1.4262005</span>], ...|</span><br><span class="line">|     <span class="number">34</span>|[[<span class="number">2</span>,<span class="number">3.9721959</span>], [...|</span><br><span class="line">| 
    <span class="number">81</span>|[[<span class="number">26</span>,<span class="number">5.6422243</span>], ...|</span><br><span class="line">|     <span class="number">28</span>|[[<span class="number">18</span>,<span class="number">5.0155253</span>], ...|</span><br><span class="line">|     <span class="number">76</span>|[[<span class="number">14</span>,<span class="number">4.9423637</span>], ...|</span><br><span class="line">|     <span class="number">26</span>|[[<span class="number">5</span>,<span class="number">4.06113</span>], [<span class="number">15.</span>..|</span><br><span class="line">|     <span class="number">27</span>|[[<span class="number">11</span>,<span class="number">5.220525</span>], [...|</span><br><span class="line">|     <span class="number">44</span>|[[<span class="number">18</span>,<span class="number">3.830072</span>], [...|</span><br><span class="line">|     <span class="number">12</span>|[[<span class="number">28</span>,<span class="number">4.8217144</span>], ...|</span><br><span class="line">|     <span class="number">91</span>|[[<span class="number">12</span>,<span class="number">3.090134</span>], [...|</span><br><span class="line">|     <span class="number">22</span>|[[<span class="number">18</span>,<span class="number">8.003841</span>], [...|</span><br><span class="line">|     <span class="number">93</span>|[[<span class="number">2</span>,<span class="number">4.621838</span>], [<span class="number">2.</span>..|</span><br><span class="line">|     <span class="number">47</span>|[[<span class="number">6</span>,<span class="number">4.48774</span>], [<span class="number">25.</span>..|</span><br><span class="line">|      <span class="number">1</span>|[[<span class="number">27</span>,<span class="number">3.527709</span>], [...|</span><br><span class="line">|     <span class="number">52</span>|[[<span class="number">8</span>,<span class="number">5.0824013</span>], [...|</span><br><span class="line">|     <span class="number">13</span>|[[<span 
class="number">23</span>,<span class="number">4.004786</span>], [...|</span><br><span class="line">+-------+--------------------+</span><br><span class="line">only showing top <span class="number">20</span> rows</span><br></pre></td></tr></table></figure><p>Find full example code at “examples/src/main/python/ml/als_example.py” in the Spark repo.</p><p>If the rating matrix is derived from another source of information (i.e. it is inferred from other signals), you can set implicitPrefs to true to get better results:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">als = ALS(maxIter=<span class="number">5</span>, regParam=<span class="number">0.01</span>, implicitPrefs=<span class="keyword">True</span>,</span><br><span class="line">          userCol=<span class="string">"userId"</span>, itemCol=<span class="string">"movieId"</span>, ratingCol=<span class="string">"rating"</span>)</span><br></pre></td></tr></table></figure><p><strong>For more information, see the <a href="https://spark.apache.org/docs/latest/ml-collaborative-filtering.html" target="_blank" rel="noopener">Spark collaborative filtering guide</a>.</strong></p><h2 id="结束"><a href="#结束" class="headerlink" title="End"></a>End</h2>]]></content>
    
    <summary type="html">
    
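      &lt;p class=&quot;description&quot;&gt;Recommendation algorithms&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;br&gt;&lt;a href=&quot;http://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;Collaborative filtering&lt;/a&gt; is commonly used for &lt;code&gt;recommender systems&lt;/code&gt;. These techniques aim to fill in the missing entries of a user-item association matrix. spark.ml currently supports model-based collaborative filtering, in which users and items are described by a small set of latent factors that can be used to predict missing entries. spark.ml uses the &lt;a href=&quot;http://dl.acm.org/citation.cfm?id=1608614&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;alternating least squares (ALS)&lt;/a&gt; algorithm to learn these latent factors.&lt;br&gt;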
    
    </summary>
    
      <category term="Spark" scheme="https://cgdeeplearn.github.io/categories/Spark/"/>
    
      <category term="MLlib" scheme="https://cgdeeplearn.github.io/categories/Spark/MLlib/"/>
    
    
      <category term="协同过滤" scheme="https://cgdeeplearn.github.io/tags/%E5%8D%8F%E5%90%8C%E8%BF%87%E6%BB%A4/"/>
    
      <category term="推荐系统" scheme="https://cgdeeplearn.github.io/tags/%E6%8E%A8%E8%8D%90%E7%B3%BB%E7%BB%9F/"/>
    
  </entry>
  
  <entry>
    <title>SparkMLlib-Clustering</title>
    <link href="https://cgdeeplearn.github.io/2018/01/22/SparkMLlib-Clustering/"/>
    <id>https://cgdeeplearn.github.io/2018/01/22/SparkMLlib-Clustering/</id>
    <published>2018-01-22T03:09:09.000Z</published>
    <updated>2018-03-01T09:36:26.302Z</updated>
    
    <content type="html"><![CDATA[<script src="/assets/js/APlayer.min.js"> </script><p class="description">Clustering algorithms<br></p><p><img src="" alt="" style="width:100%"></p><p>This section covers the clustering algorithms in MLlib (<code>KMeans</code>, <code>LDA</code>, <code>GMM</code>). The <a href="https://spark.apache.org/docs/latest/mllib-clustering.html" target="_blank" rel="noopener">guide for clustering in the RDD-based API</a> also has relevant information about these algorithms.</p><a id="more"></a><h2 id="K-means"><a href="#K-means" class="headerlink" title="K-means"></a>K-means</h2><p><code>K-means</code> is a commonly used clustering algorithm that groups data points into a predefined number of clusters. The basic idea is to cluster around k center points in the feature space, assigning each object to its nearest center, and to update the cluster centers iteratively until they stop changing.</p><p>Suppose the sample set is to be divided into c classes. The algorithm then proceeds as follows (here k denotes the iteration index, not the number of clusters):</p><p>(1) Choose suitable initial centers for the c classes.</p><p>(2) In the k-th iteration, for every sample, compute its distance to each of the c centers and assign the sample to the class of the nearest center.</p><p>(3) Update the center of each class, for example as the mean of its members.</p><p>(4) If none of the c cluster centers changes after the updates in steps (2) and (3), stop; otherwise continue iterating.</p><p>The MLlib implementation includes a parallelized variant of the k-means++ method called kmeans||. KMeans is implemented as an Estimator and generates a KMeansModel as the base model.</p><ul><li><strong>Input Columns</strong></li></ul><table><thead><tr><th>Param name</th><th>Type(s)</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td>featuresCol</td><td>Vector</td><td>“features”</td><td>Feature vector</td></tr></tbody></table><ul><li><strong>Output Columns</strong></li></ul><table><thead><tr><th>Param name</th><th>Type(s)</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td>predictionCol</td><td>Int</td><td>“prediction”</td><td>Predicted cluster center</td></tr></tbody></table><h3 id="Examples"><a href="#Examples" class="headerlink" title="Examples"></a>Examples</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> pyspark.ml.clustering <span class="keyword">import</span> KMeans</span><br><span class="line"><span class="keyword">from</span> pyspark.sql <span class="keyword">import</span> SparkSession</span><br><span class="line"></span><br><span class="line">spark = SparkSession.builder.appName(<span class="string">"ClusterExample"</span>).getOrCreate()</span><br><span class="line"><span class="comment"># Loads data.</span></span><br><span class="line">dataset = spark.read.format(<span class="string">"libsvm"</span>).load(<span class="string">"data/mllib/sample_kmeans_data.txt"</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Trains a k-means model.</span></span><br><span class="line">kmeans = KMeans().setK(<span class="number">2</span>).setSeed(<span class="number">1</span>)</span><br><span class="line">model = kmeans.fit(dataset)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Evaluate clustering by computing Within Set Sum of Squared Errors.</span></span><br><span class="line">wssse = model.computeCost(dataset)</span><br><span class="line">print(<span class="string">"Within Set Sum of Squared Errors = "</span> + str(wssse))</span><br><span class="line"></span><br><span class="line"><span class="comment"># Shows the result.</span></span><br><span class="line">centers = model.clusterCenters()</span><br><span class="line">print(<span class="string">"Cluster Centers: "</span>)</span><br><span class="line"><span class="keyword">for</span> center <span class="keyword">in</span> centers:</span><br><span 
class="line">    print(center)</span><br><span class="line">spark.stop()</span><br></pre></td></tr></table></figure><p>output:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"></span><br><span class="line">Within Set Sum of Squared Errors = <span class="number">0.11999999999994547</span></span><br><span class="line">Cluster Centers: </span><br><span class="line">[ <span class="number">0.1</span>  <span class="number">0.1</span>  <span class="number">0.1</span>]</span><br><span class="line">[ <span class="number">9.1</span>  <span class="number">9.1</span>  <span class="number">9.1</span>]</span><br></pre></td></tr></table></figure><p>Find full example code at “examples/src/main/python/ml/kmeans_example.py” in the Spark repo.</p><h2 id="Latent-Dirichlet-allocation-LDA"><a href="#Latent-Dirichlet-allocation-LDA" class="headerlink" title="Latent Dirichlet allocation(LDA)"></a>Latent Dirichlet allocation(LDA)</h2><p><code>LDA</code> (Latent Dirichlet Allocation) is a generative topic model for documents, also described as a three-level Bayesian probabilistic model whose levels are words, topics and documents. “Generative” here means that every word of a document is assumed to be produced by a process of choosing a topic with some probability and then choosing a word from that topic with some probability. The document-to-topic distribution is multinomial, and so is the topic-to-word distribution.</p><p>LDA is an unsupervised machine learning technique that can be used to identify latent topic information in a large document collection or corpus. It uses the bag-of-words approach, which treats each document as a word-frequency vector and thereby turns text into numeric data that is easy to model. The bag-of-words approach ignores the order of words, which simplifies the problem and also leaves room for improving the model. Each document represents a probability distribution over topics, and each topic in turn represents a probability distribution over words.</p><p>LDA is implemented as an Estimator that supports both EMLDAOptimizer and OnlineLDAOptimizer, and generates an LDAModel as the base model. Expert users may cast an LDAModel generated by EMLDAOptimizer to a DistributedLDAModel if needed.</p><h3 id="Examples-1"><a href="#Examples-1" class="headerlink" title="Examples"></a>Examples</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> pyspark.ml.clustering <span class="keyword">import</span> LDA</span><br><span class="line"><span class="keyword">from</span> pyspark.sql <span class="keyword">import</span> SparkSession</span><br><span class="line"></span><br><span class="line">spark = SparkSession.builder.appName(<span class="string">"LDAExample"</span>).getOrCreate()</span><br><span class="line"><span class="comment"># Loads data.</span></span><br><span class="line">dataset = spark.read.format(<span class="string">"libsvm"</span>).load(<span class="string">"data/mllib/sample_lda_libsvm_data.txt"</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Trains a LDA model.</span></span><br><span class="line">lda = LDA(k=<span class="number">10</span>, maxIter=<span class="number">10</span>)</span><br><span class="line">model = lda.fit(dataset)</span><br><span class="line"></span><br><span class="line">ll = model.logLikelihood(dataset)</span><br><span class="line">lp = model.logPerplexity(dataset)</span><br><span class="line">print(<span class="string">"The lower bound on the log likelihood of the entire corpus: "</span> + str(ll))</span><br><span class="line">print(<span class="string">"The upper bound 
on perplexity: "</span> + str(lp))</span><br><span class="line"></span><br><span class="line"><span class="comment"># Describe topics.</span></span><br><span class="line">topics = model.describeTopics(<span class="number">3</span>)</span><br><span class="line">print(<span class="string">"The topics described by their top-weighted terms:"</span>)</span><br><span class="line">topics.show(truncate=<span class="keyword">False</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Shows the result</span></span><br><span class="line">transformed = model.transform(dataset)</span><br><span class="line">transformed.show(truncate=<span class="keyword">False</span>)</span><br><span class="line">spark.stop()</span><br></pre></td></tr></table></figure><p>output:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span 
class="line">The lower bound on the log likelihood of the entire corpus: <span class="number">-797.8018456907539</span></span><br><span class="line">The upper bound on perplexity: <span class="number">3.068468635551357</span></span><br><span class="line">The topics described by their top-weighted terms:</span><br><span class="line">+-----+-----------+---------------------------------------------------------------+</span><br><span class="line">|topic|termIndices|termWeights                                                    |</span><br><span class="line">+-----+-----------+---------------------------------------------------------------+</span><br><span class="line">|<span class="number">0</span>    |[<span class="number">0</span>, <span class="number">4</span>, <span class="number">7</span>]  |[<span class="number">0.13939487929625935</span>, <span class="number">0.13346874874963285</span>, <span class="number">0.11911498796394984</span>]|</span><br><span class="line">|<span class="number">1</span>    |[<span class="number">8</span>, <span class="number">6</span>, <span class="number">0</span>]  |[<span class="number">0.09761719173430919</span>, <span class="number">0.09664530483154511</span>, <span class="number">0.0959033498887414</span>] |</span><br><span class="line">|<span class="number">2</span>    |[<span class="number">5</span>, <span class="number">9</span>, <span class="number">1</span>]  |[<span class="number">0.09763288175177705</span>, <span class="number">0.0967699480930826</span>, <span class="number">0.09474971437446654</span>] |</span><br><span class="line">|<span class="number">3</span>    |[<span class="number">6</span>, <span class="number">2</span>, <span class="number">5</span>]  |[<span class="number">0.09993087551790403</span>, <span class="number">0.09802667103524504</span>, <span class="number">0.09669791743434605</span>]|</span><br><span class="line">|<span class="number">4</span>    |[<span class="number">10</span>, <span 
class="number">5</span>, <span class="number">8</span>] |[<span class="number">0.10838084105098059</span>, <span class="number">0.1065719519796393</span>, <span class="number">0.10564271921581836</span>] |</span><br><span class="line">|<span class="number">5</span>    |[<span class="number">2</span>, <span class="number">5</span>, <span class="number">3</span>]  |[<span class="number">0.09975664174839147</span>, <span class="number">0.09917147147531298</span>, <span class="number">0.09482946730767593</span>]|</span><br><span class="line">|<span class="number">6</span>    |[<span class="number">1</span>, <span class="number">7</span>, <span class="number">3</span>]  |[<span class="number">0.1025918379349122</span>, <span class="number">0.09670884980694468</span>, <span class="number">0.09661321616852961</span>] |</span><br><span class="line">|<span class="number">7</span>    |[<span class="number">3</span>, <span class="number">10</span>, <span class="number">6</span>] |[<span class="number">0.18074276445784626</span>, <span class="number">0.17140880975201497</span>, <span class="number">0.11846617165050731</span>]|</span><br><span class="line">|<span class="number">8</span>    |[<span class="number">7</span>, <span class="number">9</span>, <span class="number">1</span>]  |[<span class="number">0.10376667278659339</span>, <span class="number">0.10266984655859988</span>, <span class="number">0.10261491999135175</span>]|</span><br><span class="line">|<span class="number">9</span>    |[<span class="number">5</span>, <span class="number">9</span>, <span class="number">4</span>]  |[<span class="number">0.17217259005160918</span>, <span class="number">0.11130983487715354</span>, <span class="number">0.10625585388024414</span>]|</span><br><span class="line">+-----+-----------+---------------------------------------------------------------+</span><br><span class="line"></span><br><span 
class="line">+-----+---------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+</span><br><span class="line">|label|features                                                       |topicDistribution                                                                                                                                                                                                      |</span><br><span class="line">+-----+---------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+</span><br><span class="line">|<span class="number">0.0</span>  |(<span class="number">11</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">4</span>,<span class="number">5</span>,<span class="number">6</span>,<span class="number">7</span>,<span class="number">10</span>],[<span class="number">1.0</span>,<span class="number">2.0</span>,<span class="number">6.0</span>,<span class="number">2.0</span>,<span class="number">3.0</span>,<span class="number">1.0</span>,<span class="number">1.0</span>,<span class="number">3.0</span>])      |[<span class="number">0.004834482522877391</span>,<span class="number">0.004775061546874506</span>,<span class="number">0.0047750850624618665</span>,<span class="number">0.00477508209536724</span>,<span class="number">0.004775110752126829</span>,<span class="number">0.0047751198765325934</span>,<span class="number">0.0047750802565546275</span>,<span class="number">0.44999380128294686</span>,<span class="number">0.004775119757841731</span>,<span 
class="number">0.5117460568464164</span>]     |</span><br><span class="line">|<span class="number">1.0</span>  |(<span class="number">11</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">7</span>,<span class="number">10</span>],[<span class="number">1.0</span>,<span class="number">3.0</span>,<span class="number">1.0</span>,<span class="number">3.0</span>,<span class="number">2.0</span>,<span class="number">1.0</span>])                  |[<span class="number">0.9268994208923648</span>,<span class="number">0.007965511763080765</span>,<span class="number">0.007965521320089061</span>,<span class="number">0.007965447383722308</span>,<span class="number">0.007965587789582014</span>,<span class="number">0.007965461329343004</span>,<span class="number">0.00796558757403698</span>,<span class="number">0.009276986136774072</span>,<span class="number">0.007965614108681227</span>,<span class="number">0.008064861702326028</span>]       |</span><br><span class="line">|<span class="number">2.0</span>  |(<span class="number">11</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">5</span>,<span class="number">6</span>,<span class="number">8</span>,<span class="number">9</span>],[<span class="number">1.0</span>,<span class="number">4.0</span>,<span class="number">1.0</span>,<span class="number">4.0</span>,<span class="number">9.0</span>,<span class="number">1.0</span>,<span class="number">2.0</span>])             |[<span class="number">0.004202815262490896</span>,<span class="number">0.004151229704235803</span>,<span class="number">0.004151279248440336</span>,<span class="number">0.004151250849060332</span>,<span class="number">0.004151298320120848</span>,<span class="number">0.004151248811452763</span>,<span class="number">0.004151213592542253</span>,<span class="number">0.6501025149437936</span>,<span 
class="number">0.00415114952939257</span>,<span class="number">0.3166359997384707</span>]         |</span><br><span class="line">|<span class="number">3.0</span>  |(<span class="number">11</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">3</span>,<span class="number">6</span>,<span class="number">8</span>,<span class="number">9</span>,<span class="number">10</span>],[<span class="number">2.0</span>,<span class="number">1.0</span>,<span class="number">3.0</span>,<span class="number">5.0</span>,<span class="number">2.0</span>,<span class="number">3.0</span>,<span class="number">9.0</span>])            |[<span class="number">0.0037170513237456872</span>,<span class="number">0.0036715329471578005</span>,<span class="number">0.0036715360552429213</span>,<span class="number">0.003671511493261907</span>,<span class="number">0.003671797370463146</span>,<span class="number">0.0036715102318871204</span>,<span class="number">0.0036715134308361727</span>,<span class="number">0.9668647838101413</span>,<span class="number">0.003671504403863317</span>,<span class="number">0.003717258933400576</span>] |</span><br><span class="line">|<span class="number">4.0</span>  |(<span class="number">11</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">6</span>,<span class="number">9</span>,<span class="number">10</span>],[<span class="number">3.0</span>,<span class="number">1.0</span>,<span class="number">1.0</span>,<span class="number">9.0</span>,<span class="number">3.0</span>,<span class="number">2.0</span>,<span class="number">1.0</span>,<span class="number">3.0</span>])      |[<span class="number">0.004027376743557338</span>,<span class="number">0.003977866599137274</span>,<span class="number">0.003977850254362953</span>,<span class="number">0.003977835428829377</span>,<span 
class="number">0.0039778820932092175</span>,<span class="number">0.003977853048840427</span>,<span class="number">0.003977852184563374</span>,<span class="number">0.9641001717255747</span>,<span class="number">0.0039778458818949</span>,<span class="number">0.004027466040030248</span>]       |</span><br><span class="line">|<span class="number">5.0</span>  |(<span class="number">11</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,<span class="number">6</span>,<span class="number">7</span>,<span class="number">8</span>,<span class="number">9</span>],[<span class="number">4.0</span>,<span class="number">2.0</span>,<span class="number">3.0</span>,<span class="number">4.0</span>,<span class="number">5.0</span>,<span class="number">1.0</span>,<span class="number">1.0</span>,<span class="number">1.0</span>,<span class="number">4.0</span>]) |[<span class="number">0.003717509832713523</span>,<span class="number">0.0036716615407946934</span>,<span class="number">0.0036716846624067615</span>,<span class="number">0.0036716395255962085</span>,<span class="number">0.0036717149575019995</span>,<span class="number">0.0036716664005927474</span>,<span class="number">0.0036716667567801204</span>,<span class="number">0.27461258177043146</span>,<span class="number">0.0036716647781321666</span>,<span class="number">0.6959682097750503</span>]|</span><br><span class="line">|<span class="number">6.0</span>  |(<span class="number">11</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">3</span>,<span class="number">6</span>,<span class="number">8</span>,<span class="number">9</span>,<span class="number">10</span>],[<span class="number">2.0</span>,<span class="number">1.0</span>,<span class="number">3.0</span>,<span class="number">5.0</span>,<span class="number">2.0</span>,<span class="number">2.0</span>,<span class="number">9.0</span>])            
|[<span class="number">0.0038659082828356533</span>,<span class="number">0.003818570338009387</span>,<span class="number">0.0038185658000222077</span>,<span class="number">0.0038185390646671936</span>,<span class="number">0.003818726199778954</span>,<span class="number">0.0038185379956121677</span>,<span class="number">0.003818554784511252</span>,<span class="number">0.9655379642100607</span>,<span class="number">0.003818526437489602</span>,<span class="number">0.003866106887012979</span>]  |</span><br><span class="line">|<span class="number">7.0</span>  |(<span class="number">11</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,<span class="number">6</span>,<span class="number">9</span>,<span class="number">10</span>],[<span class="number">1.0</span>,<span class="number">1.0</span>,<span class="number">1.0</span>,<span class="number">9.0</span>,<span class="number">2.0</span>,<span class="number">1.0</span>,<span class="number">2.0</span>,<span class="number">1.0</span>,<span class="number">3.0</span>])|[<span class="number">0.004394125793389081</span>,<span class="number">0.004340066102223131</span>,<span class="number">0.004340117929521572</span>,<span class="number">0.004340091402319875</span>,<span class="number">0.004340183500883856</span>,<span class="number">0.004340117374988447</span>,<span class="number">0.004340096103563213</span>,<span class="number">0.9608305723851966</span>,<span class="number">0.004340058125232322</span>,<span class="number">0.004394571282681922</span>]      |</span><br><span class="line">|<span class="number">8.0</span>  |(<span class="number">11</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,<span class="number">6</span>,<span class="number">7</span>],[<span 
class="number">4.0</span>,<span class="number">4.0</span>,<span class="number">3.0</span>,<span class="number">4.0</span>,<span class="number">2.0</span>,<span class="number">1.0</span>,<span class="number">3.0</span>])             |[<span class="number">0.9601715212707249</span>,<span class="number">0.0043400767428901635</span>,<span class="number">0.004340086711133699</span>,<span class="number">0.004340041546373581</span>,<span class="number">0.004340093118553618</span>,<span class="number">0.004340077924408194</span>,<span class="number">0.004340099543124161</span>,<span class="number">0.005053547015193133</span>,<span class="number">0.004340064942938327</span>,<span class="number">0.004394391184660286</span>]     |</span><br><span class="line">|<span class="number">9.0</span>  |(<span class="number">11</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">2</span>,<span class="number">4</span>,<span class="number">6</span>,<span class="number">8</span>,<span class="number">9</span>,<span class="number">10</span>],[<span class="number">2.0</span>,<span class="number">8.0</span>,<span class="number">2.0</span>,<span class="number">3.0</span>,<span class="number">2.0</span>,<span class="number">2.0</span>,<span class="number">7.0</span>,<span class="number">2.0</span>])      |[<span class="number">0.003332384443784424</span>,<span class="number">0.0032914608990001755</span>,<span class="number">0.003291474583522146</span>,<span class="number">0.003291442358715674</span>,<span class="number">0.003291502923651029</span>,<span class="number">0.0032914477446806248</span>,<span class="number">0.003291451230227242</span>,<span class="number">0.9702948142666302</span>,<span class="number">0.0032914840138979083</span>,<span class="number">0.0033325375358905047</span>]  |</span><br><span class="line">|<span class="number">10.0</span> |(<span class="number">11</span>,[<span class="number">0</span>,<span class="number">1</span>,<span 
class="number">2</span>,<span class="number">3</span>,<span class="number">5</span>,<span class="number">6</span>,<span class="number">9</span>,<span class="number">10</span>],[<span class="number">1.0</span>,<span class="number">1.0</span>,<span class="number">1.0</span>,<span class="number">9.0</span>,<span class="number">2.0</span>,<span class="number">2.0</span>,<span class="number">3.0</span>,<span class="number">3.0</span>])      |[<span class="number">0.004202933475197545</span>,<span class="number">0.004151218860618786</span>,<span class="number">0.004151338270182237</span>,<span class="number">0.004151288340705789</span>,<span class="number">0.004151431515312671</span>,<span class="number">0.004151332888945593</span>,<span class="number">0.00415129142785515</span>,<span class="number">0.9625342071313459</span>,<span class="number">0.0041512327632204984</span>,<span class="number">0.004203725326615945</span>]      |</span><br><span class="line">|<span class="number">11.0</span> |(<span class="number">11</span>,[<span class="number">0</span>,<span class="number">1</span>,<span class="number">4</span>,<span class="number">5</span>,<span class="number">6</span>,<span class="number">7</span>,<span class="number">9</span>],[<span class="number">4.0</span>,<span class="number">1.0</span>,<span class="number">4.0</span>,<span class="number">5.0</span>,<span class="number">1.0</span>,<span class="number">3.0</span>,<span class="number">1.0</span>])             |[<span class="number">0.5794463100207559</span>,<span class="number">0.004774699657046339</span>,<span class="number">0.004774740812070836</span>,<span class="number">0.0047746922036681246</span>,<span class="number">0.004774755044701768</span>,<span class="number">0.004774721978296648</span>,<span class="number">0.0047747158288502884</span>,<span class="number">0.0055583559655138</span>,<span class="number">0.004774694223725667</span>,<span class="number">0.38157231426537064</span>]       |</span><br><span 
class="line">+-----+---------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+</span><br></pre></td></tr></table></figure><p>Find full example code at “examples/src/main/python/ml/lda_example.py” in the Spark repo.</p><h2 id="Bisecting-k-means"><a href="#Bisecting-k-means" class="headerlink" title="Bisecting k-means"></a>Bisecting k-means</h2><p><code>Bisecting k-means</code> is a <code>hierarchical clustering</code> algorithm that takes a top-down (divisive) approach: all observations start in a single cluster, and clusters are split recursively as one moves down the hierarchy. At each step, the cluster whose split most reduces the clustering cost function (the sum of squared errors) is divided into two. This continues until the number of clusters equals the user-specified k. Bisecting k-means is often faster than regular k-means, but it generally produces a different clustering.</p><p>BisectingKMeans is implemented as an Estimator and generates a BisectingKMeansModel as the base model.</p><h3 id="Examples-2"><a href="#Examples-2" class="headerlink" title="Examples"></a>Examples</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> pyspark.ml.clustering <span class="keyword">import</span> BisectingKMeans</span><br><span class="line"><span class="keyword">from</span> pyspark.sql <span class="keyword">import</span> SparkSession</span><br><span class="line"></span><br><span 
class="line">spark = SparkSession.builder.appName(<span class="string">"BisectingKMeansExample"</span>).getOrCreate()</span><br><span class="line"><span class="comment"># Loads data.</span></span><br><span class="line">dataset = spark.read.format(<span class="string">"libsvm"</span>).load(<span class="string">"data/mllib/sample_kmeans_data.txt"</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Trains a bisecting k-means model.</span></span><br><span class="line">bkm = BisectingKMeans().setK(<span class="number">2</span>).setSeed(<span class="number">1</span>)</span><br><span class="line">model = bkm.fit(dataset)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Evaluate clustering.</span></span><br><span class="line">cost = model.computeCost(dataset)</span><br><span class="line">print(<span class="string">"Within Set Sum of Squared Errors = "</span> + str(cost))</span><br><span class="line"></span><br><span class="line"><span class="comment"># Shows the result.</span></span><br><span class="line">print(<span class="string">"Cluster Centers: "</span>)</span><br><span class="line">centers = model.clusterCenters()</span><br><span class="line"><span class="keyword">for</span> center <span class="keyword">in</span> centers:</span><br><span class="line">    print(center)</span><br><span class="line">spark.stop()</span><br></pre></td></tr></table></figure><p>output:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">Within Set Sum of Squared Errors = <span class="number">0.11999999999994547</span></span><br><span class="line">Cluster Centers: </span><br><span class="line">[ <span class="number">0.1</span>  <span class="number">0.1</span>  <span class="number">0.1</span>]</span><br><span class="line">[ <span 
class="number">9.1</span>  <span class="number">9.1</span>  <span class="number">9.1</span>]</span><br></pre></td></tr></table></figure><p>Find full example code at “examples/src/main/python/ml/bisecting_k_means_example.py” in the Spark repo.</p><h2 id="Gaussian-Mixture-Model-GMM"><a href="#Gaussian-Mixture-Model-GMM" class="headerlink" title="Gaussian Mixture Model(GMM)"></a>Gaussian Mixture Model (GMM)</h2><p>A <code>Gaussian Mixture Model</code> represents a composite distribution in which each data point is drawn from one of k Gaussian sub-distributions, each with its own probability. The spark.ml implementation uses the expectation-maximization (EM) algorithm to induce the maximum-likelihood model given a set of samples.</p><p>GaussianMixture is implemented as an Estimator and generates a GaussianMixtureModel as the base model.</p><ul><li><strong>Input Columns</strong></li></ul><table><thead><tr><th>Param name</th><th>Type(s)</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td>featuresCol</td><td>Vector</td><td>“features”</td><td>Feature vector</td></tr></tbody></table><ul><li><strong>Output Columns</strong></li></ul><table><thead><tr><th>Param name</th><th>Type(s)</th><th>Default</th><th>Description</th></tr></thead><tbody><tr><td>predictionCol</td><td>Int</td><td>“prediction”</td><td>Predicted cluster center</td></tr><tr><td>probabilityCol</td><td>Vector</td><td>“probability”</td><td>Probability of each cluster</td></tr></tbody></table><h3 id="Examples-3"><a href="#Examples-3" class="headerlink" title="Examples"></a>Examples</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> pyspark.ml.clustering <span class="keyword">import</span> GaussianMixture</span><br><span class="line"><span class="keyword">from</span> 
pyspark.sql <span class="keyword">import</span> SparkSession</span><br><span class="line"></span><br><span class="line">spark = SparkSession.builder.appName(<span class="string">"GaussianMixtureExample"</span>).getOrCreate()</span><br><span class="line"><span class="comment"># loads data</span></span><br><span class="line">dataset = spark.read.format(<span class="string">"libsvm"</span>).load(<span class="string">"data/mllib/sample_kmeans_data.txt"</span>)</span><br><span class="line"></span><br><span class="line">gmm = GaussianMixture().setK(<span class="number">2</span>).setSeed(<span class="number">538009335</span>)</span><br><span class="line">model = gmm.fit(dataset)</span><br><span class="line"></span><br><span class="line">print(<span class="string">"Gaussians shown as a DataFrame: "</span>)</span><br><span class="line">model.gaussiansDF.show(truncate=<span class="keyword">False</span>)</span><br><span class="line">spark.stop()</span><br></pre></td></tr></table></figure><p>output:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">Gaussians shown <span class="keyword">as</span> a DataFrame: </span><br><span class="line">+-------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+</span><br><span class="line">|mean                                                         |cov                                                                                
                                                                                                                     |</span><br><span class="line">+-------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+</span><br><span class="line">|[<span class="number">0.10000000000001552</span>,<span class="number">0.10000000000001552</span>,<span class="number">0.10000000000001552</span>]|<span class="number">0.006666666666806454</span>  <span class="number">0.006666666666806454</span>  <span class="number">0.006666666666806454</span>  </span><br><span class="line"><span class="number">0.006666666666806454</span>  <span class="number">0.006666666666806454</span>  <span class="number">0.006666666666806454</span>  </span><br><span class="line"><span class="number">0.006666666666806454</span>  <span class="number">0.006666666666806454</span>  <span class="number">0.006666666666806454</span>  |</span><br><span class="line">|[<span class="number">9.099999999999984</span>,<span class="number">9.099999999999984</span>,<span class="number">9.099999999999984</span>]      |<span class="number">0.006666666666812185</span>  <span class="number">0.006666666666812185</span>  <span class="number">0.006666666666812185</span>  </span><br><span class="line"><span class="number">0.006666666666812185</span>  <span class="number">0.006666666666812185</span>  <span class="number">0.006666666666812185</span>  </span><br><span class="line"><span class="number">0.006666666666812185</span>  <span class="number">0.006666666666812185</span>  <span class="number">0.006666666666812185</span>  |</span><br><span 
class="line">+-------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+</span><br></pre></td></tr></table></figure><p>Find full example code at “examples/src/main/python/ml/gaussian_mixture_example.py” in the Spark repo.</p><p><strong>For more information, see the <a href="https://spark.apache.org/docs/latest/ml-clustering.html" target="_blank" rel="noopener">Spark Clustering documentation</a>.</strong></p><h2 id="结束"><a href="#结束" class="headerlink" title="结束"></a>End</h2>]]></content>
    
    <summary type="html">
    
      &lt;p class=&quot;description&quot;&gt;Clustering algorithms&lt;br&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;&quot; alt=&quot;&quot; style=&quot;width:100%&quot;&gt;&lt;/p&gt;
&lt;p&gt;This section covers the clustering algorithms in MLlib (&lt;code&gt;KMeans&lt;/code&gt;, &lt;code&gt;LDA&lt;/code&gt;, &lt;code&gt;GMM&lt;/code&gt;). The &lt;a href=&quot;https://spark.apache.org/docs/latest/mllib-clustering.html&quot; target=&quot;_blank&quot; rel=&quot;noopener&quot;&gt;clustering guide for the RDD-based API&lt;/a&gt; also provides relevant information about these algorithms.&lt;/p&gt;
    
    </summary>
    
      <category term="Spark" scheme="https://cgdeeplearn.github.io/categories/Spark/"/>
    
      <category term="MLlib" scheme="https://cgdeeplearn.github.io/categories/Spark/MLlib/"/>
    
    
      <category term="KMeans" scheme="https://cgdeeplearn.github.io/tags/KMeans/"/>
    
      <category term="LDA" scheme="https://cgdeeplearn.github.io/tags/LDA/"/>
    
      <category term="Bisecting KMeans" scheme="https://cgdeeplearn.github.io/tags/Bisecting-KMeans/"/>
    
      <category term="GMM" scheme="https://cgdeeplearn.github.io/tags/GMM/"/>
    
  </entry>
  
</feed>
