<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Stupig</title>
  
  
  <link href="https://plpan.github.io/atom.xml" rel="self"/>
  
  <link href="https://plpan.github.io/"/>
  <updated>2021-11-30T03:43:26.099Z</updated>
  <id>https://plpan.github.io/</id>
  
  <author>
    <name>plpan</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>client-go informer 缓存失效问题排查</title>
    <link href="https://plpan.github.io/client-go-informer-%E7%BC%93%E5%AD%98%E5%A4%B1%E6%95%88%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5/"/>
    <id>https://plpan.github.io/client-go-informer-%E7%BC%93%E5%AD%98%E5%A4%B1%E6%95%88%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5/</id>
    <published>2021-11-27T02:36:56.000Z</published>
    <updated>2021-11-30T03:43:26.099Z</updated>
    
    <content type="html"><![CDATA[<h1 id="背景"><a href="#背景" class="headerlink" title="背景"></a>背景</h1><p>长期以来，弹性云线上服务一直饱受缓存不一致的困扰。</p><p>缓存不一致的发生一般伴随着kube-apiserver的升级或重启。且当缓存不一致问题发生时，用户侧能够较为明显的感知，问题严重时会引发线上故障。而常见的故障有：</p><ul><li>平台数据不一致：Pod状态一会正常，一会不正常，并且来回跳动</li><li>服务管理事件丢失：服务变更时，服务管理未正常工作，如服务树未挂载、流量未接入等等</li></ul><p>在问题未定位之前，弹性云制定了诸多问题感知与及时止损策略：</p><ul><li>问题感知：<ul><li>人工：kube-apiserver升级或重启时，人工通知关联方也重启平台服务</li><li>智能：配置监控与报警策略，当一段时间内未收到k8s对象的变更事件时，发送告警信息</li></ul></li><li>及时止损：<ul><li>重启：缓存不一致问题发生时，重启服务，并从kube-apiserver全量拉取最新的数据</li><li>自愈：部分场景下，即使服务重启也不能完全恢复，添加自愈策略，主动感知并处理异常情况</li></ul></li></ul><p>问题感知与止损策略并没有真正意义上解决问题，而仅仅是在确定性场景下尝试恢复服务，并且伴随着更多异常场景的发现，策略也需同步调整。</p><h1 id="问题定位"><a href="#问题定位" class="headerlink" title="问题定位"></a>问题定位</h1><p>感知与止损是一种类似亡羊补牢的修复手段，显然，我们更希望的是一个彻底解决问题的方案。那么，我们先从引起缓存不一致的根因开始排查。</p><p>我们选择notifier来排查该问题，notifier是一个集群管理服务的控制器集合，其功能主要包含：</p><ul><li>服务树挂载</li><li>DNS注册</li><li>LVS摘接流等</li></ul><p>选择notifier的原因，在于其功能较为简单：notifier使用了client-go的informer，并对核心资源事件注册处理函数；此外也没有复杂的业务流程来干扰问题排查。</p><h2 id="问题复现"><a href="#问题复现" class="headerlink" title="问题复现"></a>问题复现</h2><p>我们在线下环境中进行测试，发现kube-apiserver服务重启后，问题能够稳定复现，这给我们排查问题带来了极大的便利。因此问题复现步骤如下：</p><ul><li>启动notifier服务</li><li>重启kube-apiserver服务</li></ul><h2 id="状态分析"><a href="#状态分析" class="headerlink" title="状态分析"></a>状态分析</h2><p>当问题发生时，我们首先对服务状态做一些基本检查：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span 
class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># #服务存活状态</span></span><br><span class="line"><span class="comment"># ps -ef|grep notifier</span></span><br><span class="line">stupig 1007922 1003335  2 13:41 pts/1    00:00:08 ./notifier -c configs/notifier-test.toml</span><br><span class="line"> </span><br><span class="line"><span class="comment"># #服务FD打开状态</span></span><br><span class="line"><span class="comment"># lsof -nP -p 1007922</span></span><br><span class="line">COMMAND       PID       USER   FD      TYPE             DEVICE SIZE/OFF        NODE NAME</span><br><span class="line">nobody    1007922     stupig   0u       CHR              136,1      0t0           4 /dev/pts/1</span><br><span class="line">nobody    1007922     stupig   1u       CHR              136,1      0t0           4 /dev/pts/1</span><br><span class="line">nobody    1007922     stupig   2u       CHR              136,1      0t0           4 /dev/pts/1</span><br><span class="line">nobody    1007922     stupig   3u      unix 0xffff8810a3132400      0t0  4254094659 socket</span><br><span class="line">nobody    1007922     stupig   4u   a_inode                0,9        0        8548 [eventpoll]</span><br><span class="line">nobody    1007922     stupig   5r      FIFO                0,8      0t0  4253939077 pipe</span><br><span class="line">nobody    1007922     stupig   6w      FIFO                0,8      0t0  4253939077 pipe</span><br><span class="line">nobody    1007922     stupig   8u      IPv4         4254094660      0t0         UDP *:37087</span><br><span class="line">nobody    1007922     stupig   9r       CHR                1,9      0t0        2057 /dev/urandom</span><br><span class="line">nobody    1007922     stupig   10u     IPv4         4253939079      0t0         TCP *:4397 
(LISTEN)</span><br><span class="line">nobody    1007922     stupig   11u      REG               8,17 12538653  8604570895 ./logs/notifier.stupig.log.INFO.20211127-134138.1007922</span><br><span class="line">nobody    1007922     stupig   15u     IPv4         4254204931      0t0         TCP 127.0.0.1:43566-&gt;127.0.0.1:2479 (ESTABLISHED)   <span class="comment"># ETCD</span></span><br><span class="line">nobody    1007922     stupig   19u      REG                8,5   252384         821 /tmp/notifier.stupig.log.ERROR.20211127-134505.1007922</span><br><span class="line">nobody    1007922     stupig   20u      REG                8,5   252384         822 /tmp/notifier.stupig.log.WARNING.20211127-134505.1007922</span><br><span class="line">nobody    1007922     stupig   21u      REG               8,17   414436  8606917935 ./logs/notifier.stupig.log.WARNING.20211127-134139.1007922</span><br><span class="line">nobody    1007922     stupig   24u      REG               8,17   290725  8606917936 ./logs/notifier.stupig.log.ERROR.20211127-134238.1007922</span><br><span class="line">nobody    1007922     stupig   30u      REG                8,5   252384         823 /tmp/notifier.stupig.log.INFO.20211127-134505.1007922</span><br></pre></td></tr></table></figure><p>对比问题发生前的服务状态信息，我们发现一个严重的问题，notifier与kube-apiserver (服务地址：<a href="https://localhost:6443/">https://localhost:6443</a>) 建立的连接消失了。</p><p>因此，notifier与kube-apiserver的数据失去了同步，其后notifier也感知不到业务的变更事件，并最终丧失了对服务的管理能力。</p><h2 id="日志分析"><a href="#日志分析" class="headerlink" title="日志分析"></a>日志分析</h2><p>现在我们分析notifier的运行日志，重点关注kube-apiserver重启时，notifier打印的日志，其中关键日志信息如下：</p><figure class="highlight routeros"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">E1127 14:08:19.728515 1041482 reflector.go:251] notifier/monitor/endpointInformer.go:140: Failed <span 
class="keyword">to</span> watch *v1.Endpoints: <span class="builtin-name">Get</span> <span class="string">&quot;https://127.0.0.1:6443/api/v1/endpoints?resourceVersion=276025109&amp;timeoutSeconds=395&amp;watch=true&quot;</span>: http2: <span class="literal">no</span> cached<span class="built_in"> connection </span>was available</span><br><span class="line">E1127 14:08:20.731407 1041482 reflector.go:134] notifier/monitor/endpointInformer.go:140: Failed <span class="keyword">to</span> list *v1.Endpoints: <span class="builtin-name">Get</span> <span class="string">&quot;https://127.0.0.1:6443/api/v1/endpoints?limit=500&amp;resourceVersion=0&quot;</span>: http2: <span class="literal">no</span> cached<span class="built_in"> connection </span>was available</span><br><span class="line">E1127 14:08:21.733509 1041482 reflector.go:134] notifier/monitor/endpointInformer.go:140: Failed <span class="keyword">to</span> list *v1.Endpoints: <span class="builtin-name">Get</span> <span class="string">&quot;https://127.0.0.1:6443/api/v1/endpoints?limit=500&amp;resourceVersion=0&quot;</span>: http2: <span class="literal">no</span> cached<span class="built_in"> connection </span>was available</span><br><span class="line">E1127 14:08:22.734679 1041482 reflector.go:134] notifier/monitor/endpointInformer.go:140: Failed <span class="keyword">to</span> list *v1.Endpoints: <span class="builtin-name">Get</span> <span class="string">&quot;https://127.0.0.1:6443/api/v1/endpoints?limit=500&amp;resourceVersion=0&quot;</span>: http2: <span class="literal">no</span> cached<span class="built_in"> connection </span>was available</span><br></pre></td></tr></table></figure><p>上面展示了关键的异常信息 <code>http2: no cached connection was available</code> ，而其关联的操作正是EndpointInformer的ListAndWatch操作。</p><p>这里我们已经掌握了关键线索，下一步，我们将结合代码分析定位根因。</p><h2 id="代码分析"><a href="#代码分析" class="headerlink" title="代码分析"></a>代码分析</h2><p>Informer的工作机制介绍不是本文重点，我们仅关注下面的代码片段：</p><figure class="highlight go"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Run starts a watch and handles watch events. Will restart the watch if it is closed.</span></span><br><span class="line"><span class="comment">// Run will exit when stopCh is closed.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(r *Reflector)</span> <span class="title">Run</span><span class="params">(stopCh &lt;-<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;)</span></span> &#123;</span><br><span class="line">   glog.V(<span class="number">3</span>).Infof(<span class="string">&quot;Starting reflector %v (%s) from %s&quot;</span>, r.expectedType, r.resyncPeriod, r.name)</span><br><span class="line">   wait.Until(<span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">      <span class="keyword">if</span> err := r.ListAndWatch(stopCh); err != <span class="literal">nil</span> &#123;</span><br><span class="line">         utilruntime.HandleError(err)</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;, r.period, stopCh)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Informer的Reflector组件运行在一个独立的goroutine中，并循环调用ListAndWatch接收kube-apiserver的通知事件。</p><p>我们结合日志分析可得出结论：当kube-apiserver服务重启后，notifier服务的所有ListAndWatch操作都返回了 <code>http2: no cached connection was available</code> 错误。</p><p>因此，我们将关注的重点转移至该错误信息上。</p><p>通过代码检索，我们定位了该错误的定义及返回位置：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/transport.go:L301</span></span><br><span class="line"><span class="keyword">var</span> ErrNoCachedConn = errors.New(<span class="string">&quot;http2: no cached connection was available&quot;</span>)</span><br><span class="line"> </span><br><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/client_conn_pool.go:L55~80</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *clientConnPool)</span> <span class="title">getClientConn</span><span class="params">(req *http.Request, addr <span class="keyword">string</span>, dialOnMiss <span class="keyword">bool</span>)</span> <span class="params">(*ClientConn, error)</span></span> &#123;</span><br><span class="line">   <span class="keyword">if</span> isConnectionCloseRequest(req) &amp;&amp; dialOnMiss &#123;</span><br><span class="line">      <span class="comment">// It gets its own connection.</span></span><br><span 
class="line">      <span class="keyword">const</span> singleUse = <span class="literal">true</span></span><br><span class="line">      cc, err := p.t.dialClientConn(addr, singleUse)</span><br><span class="line">      <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">         <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">      &#125;</span><br><span class="line">      <span class="keyword">return</span> cc, <span class="literal">nil</span></span><br><span class="line">   &#125;</span><br><span class="line">   p.mu.Lock()</span><br><span class="line">   <span class="keyword">for</span> _, cc := <span class="keyword">range</span> p.conns[addr] &#123;</span><br><span class="line">      <span class="keyword">if</span> cc.CanTakeNewRequest() &#123;</span><br><span class="line">         p.mu.Unlock()</span><br><span class="line">         <span class="keyword">return</span> cc, <span class="literal">nil</span></span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">if</span> !dialOnMiss &#123;</span><br><span class="line">      p.mu.Unlock()</span><br><span class="line">      <span class="keyword">return</span> <span class="literal">nil</span>, ErrNoCachedConn</span><br><span class="line">   &#125;</span><br><span class="line">   call := p.getStartDialLocked(addr)</span><br><span class="line">   p.mu.Unlock()</span><br><span class="line">   &lt;-call.done</span><br><span class="line">   <span class="keyword">return</span> call.res, call.err</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>上述代码返回 <code>ErrNoCachedConn</code> 的条件为：</p><ul><li>参数dialOnMiss值为false</li><li>p.conns连接池内没有可用连接</li></ul><p>理论上，在发送http请求时，如果连接池为空，则会先建立一个连接，然后发送请求；并且连接池能够自动剔除状态异常的连接。那么本文关注的问题又是如何发生的呢？</p><p>现在我们关注 <code>getClientConn</code> 方法的调用链，主要有二：</p><p>栈一：</p><figure 
class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line"> <span class="number">0</span>  <span class="number">0x0000000000a590b8</span> in notifier/vendor/golang.org/x/net/http2.(*clientConnPool).getClientConn</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/client_conn_pool.<span class="keyword">go</span>:<span class="number">55</span></span><br><span class="line"> <span class="number">1</span>  <span class="number">0x0000000000a5aea6</span> in notifier/vendor/golang.org/x/net/http2.noDialClientConnPool.GetClientConn</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/client_conn_pool.<span class="keyword">go</span>:<span class="number">255</span></span><br><span class="line"> <span class="number">2</span>  <span class="number">0x0000000000a6c4f9</span> in 
notifier/vendor/golang.org/x/net/http2.(*Transport).RoundTripOpt</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/transport.<span class="keyword">go</span>:<span class="number">345</span></span><br><span class="line"> <span class="number">3</span>  <span class="number">0x0000000000a6bd0e</span> in notifier/vendor/golang.org/x/net/http2.(*Transport).RoundTrip</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/transport.<span class="keyword">go</span>:<span class="number">313</span></span><br><span class="line"> <span class="number">4</span>  <span class="number">0x0000000000a5b97e</span> in notifier/vendor/golang.org/x/net/http2.noDialH2RoundTripper.RoundTrip</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/configure_transport.<span class="keyword">go</span>:<span class="number">75</span></span><br><span class="line"> <span class="number">5</span>  <span class="number">0x0000000000828e45</span> in net/http.(*Transport).roundTrip</span><br><span class="line">    at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/transport.<span class="keyword">go</span>:<span class="number">537</span></span><br><span class="line"> <span class="number">6</span>  <span class="number">0x00000000008016de</span> in net/http.(*Transport).RoundTrip</span><br><span class="line">    at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/roundtrip.<span class="keyword">go</span>:<span class="number">17</span></span><br><span class="line"> <span class="number">7</span>  <span class="number">0x00000000016a1ef8</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/transport.(*userAgentRoundTripper).RoundTrip</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span 
class="keyword">go</span>/transport/round_trippers.<span class="keyword">go</span>:<span class="number">162</span></span><br><span class="line"> <span class="number">8</span>  <span class="number">0x00000000007a3aa2</span> in net/http.send</span><br><span class="line">    at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">251</span></span><br><span class="line"> <span class="number">9</span>  <span class="number">0x00000000007a324b</span> in net/http.(*Client).send</span><br><span class="line">    at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">175</span></span><br><span class="line"><span class="number">10</span>  <span class="number">0x00000000007a6ed5</span> in net/http.(*Client).do</span><br><span class="line">    at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">717</span></span><br><span class="line"><span class="number">11</span>  <span class="number">0x00000000007a5d9e</span> in net/http.(*Client).Do</span><br><span class="line">    at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">585</span></span><br><span class="line"><span class="number">12</span>  <span class="number">0x00000000016b9487</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest.(*Request).request</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest/request.<span class="keyword">go</span>:<span class="number">732</span></span><br><span class="line"><span class="number">13</span>  <span class="number">0x00000000016b9f2d</span> in notifier/vendor/k8s.io/client-<span 
class="keyword">go</span>/rest.(*Request).Do</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest/request.<span class="keyword">go</span>:<span class="number">804</span></span><br><span class="line"><span class="number">14</span>  <span class="number">0x00000000017093bb</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/kubernetes/typed/core/v1.(*endpoints).List</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/kubernetes/typed/core/v1/endpoints.<span class="keyword">go</span>:<span class="number">83</span></span><br><span class="line">……</span><br></pre></td></tr></table></figure><p>栈二：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"> <span class="number">0</span>  <span class="number">0x0000000000a590b8</span> in notifier/vendor/golang.org/x/net/http2.(*clientConnPool).getClientConn</span><br><span 
class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/client_conn_pool.<span class="keyword">go</span>:<span class="number">55</span></span><br><span class="line"> <span class="number">1</span>  <span class="number">0x0000000000a5aea6</span> in notifier/vendor/golang.org/x/net/http2.noDialClientConnPool.GetClientConn</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/client_conn_pool.<span class="keyword">go</span>:<span class="number">255</span></span><br><span class="line"> <span class="number">2</span>  <span class="number">0x0000000000a6c4f9</span> in notifier/vendor/golang.org/x/net/http2.(*Transport).RoundTripOpt</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/transport.<span class="keyword">go</span>:<span class="number">345</span></span><br><span class="line"> <span class="number">3</span>  <span class="number">0x0000000000a6bd0e</span> in notifier/vendor/golang.org/x/net/http2.(*Transport).RoundTrip</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/golang.org/x/net/http2/transport.<span class="keyword">go</span>:<span class="number">313</span></span><br><span class="line"> <span class="number">4</span>  <span class="number">0x00000000008296ed</span> in net/http.(*Transport).roundTrip</span><br><span class="line">    at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/transport.<span class="keyword">go</span>:<span class="number">590</span></span><br><span class="line"> <span class="number">5</span>  <span class="number">0x00000000008016de</span> in net/http.(*Transport).RoundTrip</span><br><span class="line">    at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/roundtrip.<span class="keyword">go</span>:<span class="number">17</span></span><br><span class="line"> <span 
class="number">6</span>  <span class="number">0x00000000016a1ef8</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/transport.(*userAgentRoundTripper).RoundTrip</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/transport/round_trippers.<span class="keyword">go</span>:<span class="number">162</span></span><br><span class="line"> <span class="number">7</span>  <span class="number">0x00000000007a3aa2</span> in net/http.send</span><br><span class="line">    at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">251</span></span><br><span class="line"> <span class="number">8</span>  <span class="number">0x00000000007a324b</span> in net/http.(*Client).send</span><br><span class="line">    at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">175</span></span><br><span class="line"> <span class="number">9</span>  <span class="number">0x00000000007a6ed5</span> in net/http.(*Client).do</span><br><span class="line">    at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">717</span></span><br><span class="line"><span class="number">10</span>  <span class="number">0x00000000007a5d9e</span> in net/http.(*Client).Do</span><br><span class="line">    at /usr/local/go1<span class="number">.16</span><span class="number">.7</span>/src/net/http/client.<span class="keyword">go</span>:<span class="number">585</span></span><br><span class="line"><span class="number">11</span>  <span class="number">0x00000000016b9487</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest.(*Request).request</span><br><span class="line">    at ./<span 
class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest/request.<span class="keyword">go</span>:<span class="number">732</span></span><br><span class="line"><span class="number">12</span>  <span class="number">0x00000000016b9f2d</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest.(*Request).Do</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/rest/request.<span class="keyword">go</span>:<span class="number">804</span></span><br><span class="line"><span class="number">13</span>  <span class="number">0x00000000017093bb</span> in notifier/vendor/k8s.io/client-<span class="keyword">go</span>/kubernetes/typed/core/v1.(*endpoints).List</span><br><span class="line">    at ./<span class="keyword">go</span>/src/notifier/vendor/k8s.io/client-<span class="keyword">go</span>/kubernetes/typed/core/v1/endpoints.<span class="keyword">go</span>:<span class="number">83</span></span><br><span class="line">……</span><br></pre></td></tr></table></figure><p>分别跟踪两个调用栈后，我们可以很快排除栈一的因素：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span 
class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// file: net/http/transport.go:L502~620</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *Transport)</span> <span class="title">roundTrip</span><span class="params">(req *Request)</span> <span class="params">(*Response, error)</span></span> &#123;</span><br><span class="line">   <span class="keyword">if</span> altRT := t.alternateRoundTripper(req); altRT != <span class="literal">nil</span> &#123;               <span class="comment">// L537</span></span><br><span class="line">   <span class="keyword">if</span> resp, err := altRT.RoundTrip(req); err != ErrSkipAltProtocol &#123;</span><br><span class="line">      <span class="keyword">return</span> resp, err</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">var</span> err error</span><br><span class="line">   req, err = rewindBody(req)</span><br><span class="line">   <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">      <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">   &#125;</span><br><span class="line">&#125;</span><br><span class="line"> </span><br><span class="line"><span class="comment">// file: 
vendor/golang.org/x/net/http2/configure_transport.go:L74~80</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(rt noDialH2RoundTripper)</span> <span class="title">RoundTrip</span><span class="params">(req *http.Request)</span> <span class="params">(*http.Response, error)</span></span> &#123;</span><br><span class="line">   res, err := rt.t.RoundTrip(req)                                        <span class="comment">// L75</span></span><br><span class="line">   <span class="keyword">if</span> err == ErrNoCachedConn &#123;</span><br><span class="line">      <span class="keyword">return</span> <span class="literal">nil</span>, http.ErrSkipAltProtocol</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">return</span> res, err</span><br><span class="line">&#125;</span><br><span class="line"> </span><br><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/transport.go:L312~314</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *Transport)</span> <span class="title">RoundTrip</span><span class="params">(req *http.Request)</span> <span class="params">(*http.Response, error)</span></span> &#123;</span><br><span class="line">   <span class="keyword">return</span> t.RoundTripOpt(req, RoundTripOpt&#123;&#125;)                             <span class="comment">// L313</span></span><br><span class="line">&#125;</span><br><span class="line"> </span><br><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/transport.go:L337~379</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *Transport)</span> <span class="title">RoundTripOpt</span><span class="params">(req *http.Request, opt RoundTripOpt)</span> <span class="params">(*http.Response, error)</span></span> &#123;</span><br><span class="line">   
addr := authorityAddr(req.URL.Scheme, req.URL.Host)</span><br><span class="line">   <span class="keyword">for</span> retry := <span class="number">0</span>; ; retry++ &#123;</span><br><span class="line">      cc, err := t.connPool().GetClientConn(req, addr)                    <span class="comment">// L345</span></span><br><span class="line">      <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">         t.vlogf(<span class="string">&quot;http2: Transport failed to get client conn for %s: %v&quot;</span>, addr, err)</span><br><span class="line">         <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line">&#125;</span><br><span class="line"> </span><br><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/client_conn_pool.go:L254~256</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p noDialClientConnPool)</span> <span class="title">GetClientConn</span><span class="params">(req *http.Request, addr <span class="keyword">string</span>)</span> <span class="params">(*ClientConn, error)</span></span> &#123;</span><br><span class="line">   <span class="keyword">return</span> p.getClientConn(req, addr, noDialOnMiss)                        <span class="comment">// L255</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>栈一调用 <code>getClientConn</code> 返回了 <code>ErrNoCachedConn</code> 错误，并在 <code>noDialH2RoundTripper.RoundTrip</code> 函数中被替换为 <code>http.ErrSkipAltProtocol</code> 错误，返回 <code>roundTrip</code> 函数后继续执行余下流程，并进入栈二的流程。</p><p>因此我们重点关注栈二的流程：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// file: net/http/transport.go:L502~620</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *Transport)</span> <span class="title">roundTrip</span><span class="params">(req *Request)</span> <span class="params">(*Response, error)</span></span> &#123;</span><br><span class="line">   <span class="keyword">for</span> &#123;</span><br><span class="line">      <span class="keyword">var</span> resp *Response</span><br><span class="line">      <span class="keyword">if</span> pconn.alt != <span class="literal">nil</span> &#123;</span><br><span class="line">         <span class="comment">// HTTP/2 path.</span></span><br><span class="line">         t.setReqCanceler(cancelKey, <span class="literal">nil</span>) <span class="comment">// not cancelable with CancelRequest</span></span><br><span class="line">         resp, err = pconn.alt.RoundTrip(req)                             <span class="comment">// L590</span></span><br><span class="line">      &#125;</span><br><span class="line">      <span 
class="keyword">if</span> err == <span class="literal">nil</span> &#123;</span><br><span class="line">         resp.Request = origReq</span><br><span class="line">         <span class="keyword">return</span> resp, <span class="literal">nil</span></span><br><span class="line">      &#125;</span><br><span class="line"> </span><br><span class="line">      <span class="comment">// Failed. Clean up and determine whether to retry.</span></span><br><span class="line">      <span class="keyword">if</span> http2isNoCachedConnError(err) &#123;</span><br><span class="line">         <span class="keyword">if</span> t.removeIdleConn(pconn) &#123;</span><br><span class="line">            t.decConnsPerHost(pconn.cacheKey)</span><br><span class="line">         &#125;</span><br><span class="line">      &#125; <span class="keyword">else</span> <span class="keyword">if</span> !pconn.shouldRetryRequest(req, err) &#123;</span><br><span class="line">         <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line">&#125;</span><br><span class="line"> </span><br><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/transport.go:L312~314</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *Transport)</span> <span class="title">RoundTrip</span><span class="params">(req *http.Request)</span> <span class="params">(*http.Response, error)</span></span> &#123;</span><br><span class="line">   <span class="keyword">return</span> t.RoundTripOpt(req, RoundTripOpt&#123;&#125;)                             <span class="comment">// L313</span></span><br><span class="line">&#125;</span><br><span class="line"> </span><br><span class="line"><span class="comment">// 内层调用栈同栈一，不再列出</span></span><br></pre></td></tr></table></figure><p>区别于栈一，栈二不再对返回错误做一个转换，而是直接返回了 <code>ErrNoCachedConn</code> 
错误，并且 <code>roundTrip</code> 的错误处理流程中也特殊处理了本类错误。如果 <code>http2isNoCachedConnError</code> 返回true，则连接池会移除该异常连接。</p><p>一切都那么的合乎情理，那么问题是如何发生的呢？这里问题就发生在 <code>http2isNoCachedConnError</code>：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// file: net/http/h2_bundle.go:L6922~6928</span></span><br><span class="line"><span class="comment">// isNoCachedConnError reports whether err is of type noCachedConnError</span></span><br><span class="line"><span class="comment">// or its equivalent renamed type in net/http2&#x27;s h2_bundle.go. Both types</span></span><br><span class="line"><span class="comment">// may coexist in the same running program.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">http2isNoCachedConnError</span><span class="params">(err error)</span> <span class="title">bool</span></span> &#123;</span><br><span class="line">   _, ok := err.(<span class="keyword">interface</span>&#123; IsHTTP2NoCachedConnError() &#125;)</span><br><span class="line">   <span class="keyword">return</span> ok</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>如果 <code>err</code> 对象实现了匿名接口 (仅定义了一个函数 <code>IsHTTP2NoCachedConnError</code>)，那么返回true，否则返回false。</p><p>那么，<code>getClientConn</code> 返回的错误类型实现了该接口吗？很显然：没有。</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// file: vendor/golang.org/x/net/http2/transport.go:L301</span></span><br><span class="line"><span class="keyword">var</span> 
ErrNoCachedConn = errors.New(<span class="string">&quot;http2: no cached connection was available&quot;</span>)</span><br></pre></td></tr></table></figure><p>至此，问题发生的原因已基本定位清楚。</p><h1 id="解决方案"><a href="#解决方案" class="headerlink" title="解决方案"></a>解决方案</h1><p>既然问题是由于 <code>getClientConn</code> 返回的错误类型 <code>ErrNoCachedConn</code> 没有实现 <code>IsHTTP2NoCachedConnError</code> 函数引起，那么其修复策略自然是：修改返回错误类型，并实现该接口函数。</p><p>注意，由于该部分代码是我们引用的外部代码库的内容，我们检查最新的 <code>golang.org/x/net</code> 代码发现，问题早在2018年1月份就已被修复。。。具体参见：<a href="https://github.com/golang/net/commit/ab555f366c4508dbe0802550b1b20c46c5c18aa0">golang.org/x/net修复方案</a>。</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// noCachedConnError is the concrete type of ErrNoCachedConn, which</span></span><br><span class="line"><span class="comment">// needs to be detected by net/http regardless of whether it&#x27;s its</span></span><br><span class="line"><span class="comment">// bundled version (in h2_bundle.go with a rewritten type name) or</span></span><br><span class="line"><span class="comment">// from a user&#x27;s x/net/http2. 
As such, as it has a unique method name</span></span><br><span class="line"><span class="comment">// (IsHTTP2NoCachedConnError) that net/http sniffs for via func</span></span><br><span class="line"><span class="comment">// isNoCachedConnError.</span></span><br><span class="line"><span class="keyword">type</span> noCachedConnError <span class="keyword">struct</span>&#123;&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(noCachedConnError)</span> <span class="title">IsHTTP2NoCachedConnError</span><span class="params">()</span></span> &#123;&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(noCachedConnError)</span> <span class="title">Error</span><span class="params">()</span> <span class="title">string</span></span>             &#123; <span class="keyword">return</span> <span class="string">&quot;http2: no cached connection was available&quot;</span> &#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// isNoCachedConnError reports whether err is of type noCachedConnError</span></span><br><span class="line"><span class="comment">// or its equivalent renamed type in net/http2&#x27;s h2_bundle.go. 
Both types</span></span><br><span class="line"><span class="comment">// may coexist in the same running program.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">isNoCachedConnError</span><span class="params">(err error)</span> <span class="title">bool</span></span> &#123;</span><br><span class="line">   _, ok := err.(<span class="keyword">interface</span>&#123; IsHTTP2NoCachedConnError() &#125;)</span><br><span class="line">   <span class="keyword">return</span> ok</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">var</span> ErrNoCachedConn error = noCachedConnError&#123;&#125;</span><br></pre></td></tr></table></figure><p>而我们线上使用的版本仍然为：1c05540f6。</p><p>因此，我们的修复策略变得更为简单，升级vendor中的依赖库版本即可。</p><p>目前，线上notifier服务已升级依赖版本，全量上线所有机房。并且也已验证kube-apiserver重启，不会导致notifier服务异常。</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;背景&quot;&gt;&lt;a href=&quot;#背景&quot; class=&quot;headerlink&quot; title=&quot;背景&quot;&gt;&lt;/a&gt;背景&lt;/h1&gt;&lt;p&gt;长期以来，弹性云线上服务一直饱受缓存不一致的困扰。&lt;/p&gt;
&lt;p&gt;缓存不一致的发生一般伴随着kube-apiserver的升级或重启。且当缓</summary>
      
    
    
    
    <category term="问题排查" scheme="https://plpan.github.io/categories/%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5/"/>
    
    
    <category term="kubernetes" scheme="https://plpan.github.io/tags/kubernetes/"/>
    
    <category term="client-go" scheme="https://plpan.github.io/tags/client-go/"/>
    
    <category term="informer" scheme="https://plpan.github.io/tags/informer/"/>
    
  </entry>
  
  <entry>
    <title>pod terminating 排查之旅(二)</title>
    <link href="https://plpan.github.io/pod-terminating-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85-%E4%BA%8C/"/>
    <id>https://plpan.github.io/pod-terminating-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85-%E4%BA%8C/</id>
    <published>2021-06-24T08:39:23.000Z</published>
    <updated>2021-06-28T02:19:32.413Z</updated>
    
<content type="html"><![CDATA[<h1 id="背景"><a href="#背景" class="headerlink" title="背景"></a>背景</h1><p>近期，线上报障了多起Pod删除失败的Case，用户的多次删除请求均以失败告终。Pod删除失败的影响主要有二：</p><ul><li>面向用户：用户体验下降，且无法对该Pod执行后续的发布流程</li><li>面向弹性云：失败率提升，SLA无法达标</li></ul><p>并且，随着弹性云Docker版本升级 (1.13 → 18.06) 进度的推进，线上出现Pod删除失败的Case隐隐有增多的趋势。</p><p>线上问题无小事！不论是从哪个角度出发，我们都应该给线上环境把把脉，看看是哪个系统出了问题。</p><h1 id="问题定位"><a href="#问题定位" class="headerlink" title="问题定位"></a>问题定位</h1><p>由于线上出Case的频率并不低，基本每周都会出现，这反而给我们定位问题带来了便利^_^。</p><p>排查线上问题的思路一般分如下两个步骤：</p><ul><li>定位出现问题的组件</li><li>定位组件出现的问题</li></ul><h2 id="组件定位"><a href="#组件定位" class="headerlink" title="组件定位"></a>组件定位</h2><p>对于弹性云的同学来说，定位问题组件已经有一套标准流程：从上往下，看看问题是由哪个组件引起。【不清楚组件通信流程的同学可以看看<a href="https://plpan.github.io/pod-terminating-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">容器删除失败排查</a>】</p><p>1）第一嫌疑人：kubelet</p><p>在kubernetes体系架构下，删除Pod的执行者就是kubelet，作为第一个提审对象，它不冤。</p><p>虽然无奈，但是kubelet也早已习惯了，并且在多次经历过社会的毒打之后，练就了一身的甩锅能力：</p><figure class="highlight angelscript"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">I0624 <span class="number">11</span>:<span class="number">00</span>:<span class="number">26.658872</span>   <span class="number">21280</span> kubelet.go:<span class="number">1923</span>] skipping pod synchronization - [PLEG <span class="keyword">is</span> <span class="keyword">not</span> healthy: pleg was last seen active <span class="number">3</span>m0<span class="number">.439656895</span>s ago; threshold <span class="keyword">is</span> <span class="number">3</span>m0s]</span><br></pre></td></tr></table></figure><p>什么意思？死贫道不死道友呗。</p><p>PLEG模块不健康？<code>PLEG</code>是kubelet的一个子模块单元，用来统一管理底层容器的运行状态。</p><p>kubelet招供：我是好人啊，不能冤枉我，都是docker惹的祸！</p><p>2）第二嫌疑人：dockerd</p><p>根据kubelet的证词，我们很快提审本案的第二嫌疑人：dockerd。</p><p>为自证清白，dockerd三下五除二打出一套军体拳：</p><figure class="highlight awk"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># docker ps</span></span><br><span class="line"><span class="regexp">//</span> 运行正常，dockerd轻舒一口气</span><br><span class="line"><span class="comment"># docker ps -a | grep -v NAMES | awk &#x27;&#123;print $1&#125;&#x27; | xargs -ti docker inspect -f &#123;&#123;.State.Pid&#125;&#125; &#123;&#125;</span></span><br><span class="line"><span class="regexp">//</span> 执行到 docker inspect -f &#123;&#123;.State.Pid&#125;&#125; <span class="number">60</span>f253d59f26 时，命令卡住</span><br></pre></td></tr></table></figure><p>该容器恰好属于用户删除失败的Pod。</p><p>你们可能没看到当时dockerd的脸色，面如死灰，且一直喃喃自语：难道真是我的锅？</p><p>好几分钟后，dockerd才慢慢缓过来，理了理思绪，想好了一套甩锅流程：虽然问题出现在我这，但是你们的证据不足，不能证明是我亲手干的。我手下养着一大帮人，可能是一些小弟自己偷偷干的。</p><p>嘿，你还有理了，那好，我继续收集证据，让你死得明明白白。</p><p>3）第三嫌疑人：containerd</p><p>作为dockerd手下的二当家，我们首先传讯了containerd。这人一看就老实忠厚，它看着dockerd的证词，苦笑了下，吐槽道：跟着大哥这么多年，还是没有得到大哥的信任（所以kubernetes在1.20版本中，将containerd扶上了大哥的位置？哈哈）。</p><p>containerd有条不紊地祭出三板斧：</p><figure class="highlight gradle"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"># docker-containerd-ctr -a <span class="regexp">/var/</span>run<span class="regexp">/docker/</span>containerd/docker-containerd.sock -n moby c ls</span><br><span class="line"><span class="comment">// 运行正常</span></span><br><span class="line"># docker-containerd-ctr -a <span class="regexp">/var/</span>run<span class="regexp">/docker/</span>containerd/docker-containerd.sock -n moby t ls</span><br><span class="line"><span class="comment">// 命令卡死，containerd老脸一僵，但是很快恢复正常</span></span><br><span class="line"># docker-containerd-ctr -a <span class="regexp">/var/</span>run<span class="regexp">/docker/</span>containerd<span 
class="regexp">/docker-containerd.sock -n moby c ls | grep -v IMAGE | awk &#x27;&#123;print $1&#125;&#x27; | xargs -ti docker-containerd-ctr -a /</span>var<span class="regexp">/run/</span>docker<span class="regexp">/containerd/</span>docker-containerd.sock -n moby t ps &#123;&#125;</span><br><span class="line"><span class="comment">// 执行到 docker-containerd-ctr -a /var/run/docker/containerd/docker-containerd.sock -n moby t ps 60f253d59f26e1c573d4ba5f824e73b3a4b1bb1629edace85caba4c620755d4d 时，命令卡住</span></span><br></pre></td></tr></table></figure><p>containerd神色不自然地说：不好意思啊，警官，可能是我家里不争气的孩子惹的祸，这孩子以前也犯过事【<a href="https://plpan.github.io/docker-hang-%E6%AD%BB%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">docker hang 死排查</a>】。</p><p>4）第四嫌疑人：containerd-shim</p><p>containerd的话还没说完，待在一旁的儿子containerd-shim跳出来指着containerd叛逆地说：你凭什么说是我？你自己干的那些破事，我都不稀罕说你。</p><p>清官难断家务事！案件排查至此，从已知证据，还真不好确认到底是老子，还是儿子犯的罪。</p><p>5）第五嫌疑人：runc</p><p>万般无奈，我们传讯了本案的最后一个嫌疑人，也是年纪最小的runc。runc只会呀呀自语地说：不是我，不是我！</p><figure class="highlight apache"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># docker-runc --root /var/run/docker/runtime-runc/moby/ list</span></span><br><span class="line"><span class="attribute">60f253d59f26e1c573d4ba5f824e73b3a4b1bb1629edace85caba4c620755d4d</span>   <span class="number">0</span>           stopped     /run/docker/containerd/daemon/io.containerd.runtime.v<span class="number">1</span>.linux/moby/<span class="number">60</span>f<span class="number">253</span>d<span class="number">59</span>f<span class="number">26</span>e<span class="number">1</span>c<span class="number">573</span>d<span class="number">4</span>ba<span class="number">5</span>f<span class="number">824</span>e<span class="number">73</span>b<span class="number">3</span>a<span class="number">4</span>b<span class="number">1</span>bb<span class="number">1629</span>edace<span class="number">85</span>caba<span 
class="number">4</span>c<span class="number">620755</span>d<span class="number">4</span>d   <span class="number">2021</span>-<span class="number">05</span>-<span class="number">07</span>T<span class="number">12</span>:<span class="number">43</span>:<span class="number">02</span>.<span class="number">62261156</span>Z    root</span><br></pre></td></tr></table></figure><p>从现场收集到的证据表明，这个案件和runc还真没什么关系，案发时，它已经离开了现场，只不过留下了一个烂摊子等着别人来清理。</p><p>从众人招供的语录来看，案件嫌疑人初步锁定了containerd与containerd-shim，但是具体是谁，都还不好说。</p><p>正当大家一筹莫展之际，一位老刑警从现场提取到了一些新的线索，案件终于有了新的进展。</p><p>6）新线索</p><p>老刑警领着大家观察它收集到的新线索：</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span 
class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># sudo docker ps -a | grep PODNAME</span></span><br><span class="line"><span class="string">60f253d59f26</span>  <span class="string">image</span>      <span class="string">&quot;/dockerinit&quot;</span>  <span class="number">6</span> <span class="string">weeks</span> <span class="string">ago</span>    <span class="string">Up</span> <span class="number">6</span> <span class="string">weeks</span>                        <span class="string">k8s_CNAME_PODNAME_default_013e3b0e-8d17-11eb-8ef7-246e9693e13c_10</span></span><br><span class="line"><span class="string">6e6fc586dc12</span>  <span class="string">pause:3.1</span>  <span class="string">&quot;/pause&quot;</span>       <span class="number">6</span> <span class="string">weeks</span> <span class="string">ago</span>    <span class="string">Exited</span> <span class="string">(0)</span> <span class="string">About</span> <span class="string">an</span> <span class="string">hour</span> <span class="string">ago</span>      <span class="string">k8s_POD_PODNAME_default_013e3b0e-8d17-11eb-8ef7-246e9693e13c_1</span></span><br><span class="line"> </span><br><span class="line"> </span><br><span class="line"><span class="comment"># ps -ef|grep 60f253d59f26</span></span><br><span class="line"><span class="string">root</span>      <span class="number">119820</span>    <span class="number">3608  </span><span class="number">0</span> <span class="string">May07</span> <span class="string">?</span>        <span 
class="number">00</span><span class="string">:11:16</span> <span class="string">docker-containerd-shim</span> <span class="string">-namespace</span> <span class="string">moby</span> <span class="string">-workdir</span> <span class="string">/docker/docker_rt/containerd/daemon/io.containerd.runtime.v1.linux/moby/60f253d59f26e1c573d4ba5f824e73b3a4b1bb1629edace85caba4c620755d4d</span> <span class="string">-address</span> <span class="string">/var/run/docker/containerd/docker-containerd.sock</span> <span class="string">-containerd-binary</span> <span class="string">/usr/bin/docker-containerd</span> <span class="string">-runtime-root</span> <span class="string">/var/run/docker/runtime-runc</span></span><br><span class="line"><span class="string">stupig</span>    <span class="number">1629698</span> <span class="number">1599793</span>  <span class="number">0</span> <span class="number">11</span><span class="string">:44</span> <span class="string">pts/0</span>    <span class="number">00</span><span class="string">:00:00</span> <span class="string">grep</span> <span class="string">--color=auto</span> <span class="string">60f253d59f26</span></span><br><span class="line"> </span><br><span class="line"> </span><br><span class="line"><span class="comment"># ps -ef|grep 119820</span></span><br><span class="line"><span class="string">root</span>       <span class="number">40825</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun04</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>      <span class="number">40833</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun04</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span 
class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>      <span class="number">119820</span>    <span class="number">3608  </span><span class="number">0</span> <span class="string">May07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:11:16</span> <span class="string">docker-containerd-shim</span> <span class="string">-namespace</span> <span class="string">moby</span> <span class="string">-workdir</span> <span class="string">/docker/docker_rt/containerd/daemon/io.containerd.runtime.v1.linux/moby/60f253d59f26e1c573d4ba5f824e73b3a4b1bb1629edace85caba4c620755d4d</span> <span class="string">-address</span> <span class="string">/var/run/docker/containerd/docker-containerd.sock</span> <span class="string">-containerd-binary</span> <span class="string">/usr/bin/docker-containerd</span> <span class="string">-runtime-root</span> <span class="string">/var/run/docker/runtime-runc</span></span><br><span class="line"><span class="string">root</span>      <span class="number">119886</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">May07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:04:29</span> [<span class="string">dockerinit</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>      <span class="number">568896</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>     <span class="number">568898</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span 
class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>      <span class="number">695031</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">May08</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">74647</span>     <span class="number">695037</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">May08</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>      <span class="number">802705</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun09</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">74647</span>     <span class="number">802709</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun09</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>      <span class="number">865131</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span 
class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">69099</span>     <span class="number">865133</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">1073407</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun23</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>    <span class="number">1073428</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun23</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">1375526</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun22</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">69099</span>    <span class="number">1375561</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun22</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span 
class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">1397568</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun16</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">69099</span>    <span class="number">1397570</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun16</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">1483339</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun23</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>    <span class="number">1483341</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun23</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">stupig</span>   <span class="number">1631234</span> <span class="number">1599793</span>  <span class="number">0</span> <span class="number">11</span><span class="string">:44</span> <span class="string">pts/0</span>    <span class="number">00</span><span class="string">:00:00</span> <span class="string">grep</span> <span class="string">--color=auto</span> <span 
class="number">119820</span></span><br><span class="line"><span class="string">root</span>     <span class="number">1692888</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>    <span class="number">1692903</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">1882984</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun21</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>    <span class="number">1882985</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun21</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">1964311</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span 
class="number">76183</span>    <span class="number">1964318</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">2019760</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>    <span class="number">2019784</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">2122420</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>    <span class="number">2122434</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">2288703</span>  <span 
class="number">119820</span>  <span class="number">0</span> <span class="string">Jun09</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">69099</span>    <span class="number">2288705</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun09</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">2330164</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>    <span class="number">2330166</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">2406740</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">May27</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">74647</span>    <span class="number">2406745</span>  <span class="number">119820</span>  <span class="number">0</span> <span 
class="string">May27</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">2421050</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>    <span class="number">2421069</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">2445918</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun22</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>    <span class="number">2445927</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun22</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">2487600</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun22</span> <span class="string">?</span>        <span 
class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>    <span class="number">2487602</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun22</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="string">root</span>     <span class="number">2897660</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">su</span>] <span class="string">&lt;defunct&gt;</span></span><br><span class="line"><span class="number">76183</span>    <span class="number">2897662</span>  <span class="number">119820</span>  <span class="number">0</span> <span class="string">Jun07</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> [<span class="string">bash</span>] <span class="string">&lt;defunct&gt;</span></span><br></pre></td></tr></table></figure><p>同一个Pod内pause容器依然退出，但是业务容器却没有退出，并且业务容器关联的containerd-shim进程并未执行子进程收割动作，就像是卡住了。</p><p>面对这些新证据，containerd-shim毫无征兆地崩溃了，大哭道：为什么总是我？</p><h2 id="问题定位-1"><a href="#问题定位-1" class="headerlink" title="问题定位"></a>问题定位</h2><p>言归正传，我们如何能根据上述现象快速定位问题呢？思路有三：</p><ol><li>拿着现象问谷歌</li><li>带着问题看代码</li><li>深挖现场定问题</li></ol><p>1）谷歌大法</p><p>当我们拿着问题呈现的现象搜索谷歌时，还真搜到了关联的内容：<a href="https://github.com/containerd/containerd/issues/2709">Exec process may cause shim hang</a>。</p><p>该issue中所描述的内容和我们碰到的问题基本一致。问题是由于 <a href="https://github.com/containerd/containerd/blob/v1.1.2/reaper/reaper.go#L34">reaper.Default</a> 
处定义的channel大小太小引起的，调整channel大小可规避该问题。</p><p>2）理解代码</p><p>尽管本问题可以通过一些手段规避，但是我们还是需要理解代码中出现的问题。</p><p>containerd-shim的主要职责是执行runc命令，托管真正的容器进程，并对外暴露一个服务，供外部用户与容器进行交互。</p><p>containerd-shim内部处理逻辑如下图所示：</p><p><img src="containerd-shim.png" alt="containerd-shim简单架构"></p><p>GRPC服务：containerd-shim的核心服务，对外暴露众多接口，诸如创建/启动task等，并调用runc执行对应的命令。</p><p>此外，containerd-shim内启动了三个协程（包含主协程）共同处理容器内进程退出事件。首先是主协程handleSignals：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">handleSignals</span><span class="params">(logger *logrus.Entry, signals <span class="keyword">chan</span> os.Signal, server *ttrpc.Server, sv *shim.Service)</span> <span class="title">error</span></span> &#123;</span><br><span class="line">   signals = <span class="built_in">make</span>(<span class="keyword">chan</span> os.Signal, <span class="number">32</span>)</span><br><span class="line">   signal.Notify(signals, unix.SIGTERM, unix.SIGINT, unix.SIGCHLD, unix.SIGPIPE)</span><br><span class="line">   runc.Monitor = reaper.Default</span><br><span class="line">   <span class="comment">// set the shim as the subreaper for all orphaned processes created by the
container</span></span><br><span class="line">   <span class="keyword">if</span> err := system.SetSubreaper(<span class="number">1</span>); err != <span class="literal">nil</span> &#123;</span><br><span class="line">      <span class="keyword">return</span> err</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">for</span> &#123;</span><br><span class="line">      <span class="keyword">select</span> &#123;</span><br><span class="line">      <span class="keyword">case</span> s := &lt;-signals:</span><br><span class="line">         <span class="keyword">switch</span> s &#123;</span><br><span class="line">         <span class="keyword">case</span> unix.SIGCHLD:</span><br><span class="line">            <span class="keyword">if</span> err := reaper.Reap(); err != <span class="literal">nil</span> &#123;</span><br><span class="line">               logger.WithError(err).Error(<span class="string">&quot;reap exit status&quot;</span>)</span><br><span class="line">            &#125;</span><br><span class="line">         <span class="keyword">case</span> unix.SIGTERM, unix.SIGINT:</span><br><span class="line">            <span class="comment">// shim退出处理</span></span><br><span class="line">         &#125;</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>containerd-shim调用<code>system.SetSubreaper</code>将自己作为容器内进程的收割者，一般容器内的1号进程也具备收割僵尸进程的能力，因此containerd-shim更多的是收割<code>runc exec</code>进容器内的进程。</p><p>当有僵尸进程出现时，就执行收割逻辑：</p><figure class="highlight css"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span
class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="selector-tag">func</span> <span class="selector-tag">Reap</span>() <span class="selector-tag">error</span> &#123;</span><br><span class="line">   <span class="attribute">now </span>:= time.<span class="built_in">Now</span>()</span><br><span class="line">   exits, err := sys.<span class="built_in">Reap</span>(false)          // 调用wait系统调用处理僵尸进程</span><br><span class="line">   Default.<span class="built_in">Lock</span>()</span><br><span class="line">   for c := range Default.subscribers &#123;   // 将退出事件发送给所有订阅者</span><br><span class="line">      for _, e := range exits &#123;</span><br><span class="line">         c &lt;- runc.Exit&#123;</span><br><span class="line">            Timestamp: now,</span><br><span class="line">            Pid:       e.Pid,</span><br><span class="line">            Status:    e.Status,</span><br><span class="line">         &#125;</span><br><span class="line">      &#125;</span><br><span class="line"> </span><br><span class="line">   &#125;</span><br><span class="line">   <span class="selector-tag">Default</span><span class="selector-class">.Unlock</span>()</span><br><span class="line">   <span class="selector-tag">return</span> <span class="selector-tag">err</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>那么谁又是订阅者呢？shim在初始化时就订阅了一份进程退出事件：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span 
class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewService</span><span class="params">(config Config, publisher events.Publisher)</span> <span class="params">(*Service, error)</span></span> &#123;</span><br><span class="line">   s := &amp;Service&#123;</span><br><span class="line">      processes: <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="keyword">string</span>]proc.Process),</span><br><span class="line">      events:    <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="keyword">interface</span>&#123;&#125;, <span class="number">128</span>),</span><br><span class="line">      ec:        reaper.Default.Subscribe(),    <span class="comment">// 订阅进程退出事件</span></span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">go</span> s.processExits()                          <span class="comment">// 退出事件处理</span></span><br><span class="line">   <span class="keyword">go</span> s.forward(publisher)                      <span class="comment">// 退出事件转发</span></span><br><span class="line">   <span class="keyword">return</span> s, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>其中退出事件处理逻辑如下：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td 
class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *Service)</span> <span class="title">processExits</span><span class="params">()</span></span> &#123;</span><br><span class="line">   <span class="keyword">for</span> e := <span class="keyword">range</span> s.ec &#123;</span><br><span class="line">      s.checkProcesses(e)</span><br><span class="line">   &#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *Service)</span> <span class="title">checkProcesses</span><span class="params">(e runc.Exit)</span></span> &#123;</span><br><span class="line">   s.mu.Lock()</span><br><span class="line">   <span class="keyword">defer</span> s.mu.Unlock()</span><br><span class="line">   <span class="keyword">for</span> _, p := <span class="keyword">range</span> s.processes &#123;</span><br><span class="line">      <span class="keyword">if</span> p.Pid() == e.Pid &#123;</span><br><span class="line">         s.events &lt;- &amp;eventstypes.TaskExit&#123;&#125;</span><br><span class="line">         <span class="keyword">return</span></span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>只处理<code>s.processes</code>的退出事件，而<code>s.processes</code>关联的都是什么对象呢？主要有二：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Create a new initial process and container 
with the underlying OCI runtime</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *Service)</span> <span class="title">Create</span><span class="params">(ctx context.Context, r *shimapi.CreateTaskRequest)</span> <span class="params">(*shimapi.CreateTaskResponse, error)</span></span> &#123;</span><br><span class="line">   ......</span><br><span class="line">   s.processes[r.ID] = process</span><br><span class="line">   ......</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Exec an additional process inside the container</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *Service)</span> <span class="title">Exec</span><span class="params">(ctx context.Context, r *shimapi.ExecProcessRequest)</span> <span class="params">(*ptypes.Empty, error)</span></span> &#123;</span><br><span class="line">   ......</span><br><span class="line">   s.processes[r.ID] = process</span><br><span class="line">   ......</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>代码就展示到这里，其余部分感兴趣的读者可自行查阅。</p><p>现在，我们再来根据现象查问题。从现象可知，异常容器待收割的僵尸进程较多，肯定超过了32个。当shim收割众多僵尸进程，并往订阅者信道（大小32）中发送退出事件时出现阻塞，阻塞点见：<a href="https://github.com/containerd/containerd/blob/v1.1.2/reaper/reaper.go#L44">阻塞信号</a>，并且此时持有<code>Default.Lock</code>这一把大锁。</p><p>那么只要这时候再有人来申请这把锁，就会形成死锁。</p><p>那么究竟谁会来申请这把锁呢？这时候，要是能查看containerd-shim的协程栈就好了。</p><p>3）现场分析</p><p>确实，containerd-shim启动了一个协程方便用户导出协程栈信息，我们来看看能不能行呢？</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">executeShim</span><span class="params">()</span> <span class="title">error</span></span> &#123;</span><br><span class="line">   dump := <span class="built_in">make</span>(<span class="keyword">chan</span> os.Signal, <span class="number">32</span>)</span><br><span class="line">   signal.Notify(dump, syscall.SIGUSR1)</span><br><span class="line">   <span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">      <span class="keyword">for</span> <span class="keyword">range</span> dump &#123;</span><br><span class="line">         dumpStacks(logger)</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;()</span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">dumpStacks</span><span class="params">(logger *logrus.Entry)</span></span> &#123;</span><br><span class="line">   <span class="keyword">var</span> (</span><br><span class="line">      buf       []<span class="keyword">byte</span></span><br><span class="line">      stackSize <span class="keyword">int</span></span><br><span class="line">   )</span><br><span class="line">   bufferLen := <span class="number">16384</span></span><br><span class="line">   <span class="keyword">for</span> stackSize == <span class="built_in">len</span>(buf) &#123;</span><br><span class="line">      buf = <span 
class="built_in">make</span>([]<span class="keyword">byte</span>, bufferLen)</span><br><span class="line">      stackSize = runtime.Stack(buf, <span class="literal">true</span>)</span><br><span class="line">      bufferLen *= <span class="number">2</span></span><br><span class="line">   &#125;</span><br><span class="line">   buf = buf[:stackSize]</span><br><span class="line">   logger.Infof(<span class="string">&quot;=== BEGIN goroutine stack dump ===\n%s\n=== END goroutine stack dump ===&quot;</span>, buf)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>一切看起来没什么问题，我们发送<code>SIGUSR1</code>就能获取一份协程栈。</p><p>当我们真正去执行时，却发现导出了一个寂寞。根本原因在于：</p><ul><li>logger.Infof()往os.Stdout输出</li><li>由于线上环境docker没有开启<code>debug</code>模式，线上containerd-shim的os.Stdout被赋值为<code>/dev/null</code></li></ul><figure class="highlight apache"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="attribute">COMMAND</span>       PID USER   FD      TYPE             DEVICE SIZE/<span class="literal">OFF</span>       NODE NAME</span><br><span class="line"><span class="attribute">docker</span>-co  <span class="number">119820</span> root  cwd       DIR               <span class="number">0</span>,<span class="number">18</span>      <span class="number">120</span>  <span class="number">916720892</span> /run/docker/containerd/daemon/io.containerd.runtime.v<span class="number">1</span>.linux/moby/<span class="number">60</span>f<span class="number">253</span>d<span class="number">59</span>f<span class="number">26</span>e<span class="number">1</span>c<span class="number">573</span>d<span class="number">4</span>ba<span class="number">5</span>f<span class="number">824</span>e<span class="number">73</span>b<span
class="number">3</span>a<span class="number">4</span>b<span class="number">1</span>bb<span class="number">1629</span>edace<span class="number">85</span>caba<span class="number">4</span>c<span class="number">620755</span>d<span class="number">4</span>d</span><br><span class="line"><span class="attribute">docker</span>-co  <span class="number">119820</span> root  rtd       DIR                <span class="number">8</span>,<span class="number">3</span>     <span class="number">4096</span>          <span class="number">2</span> /</span><br><span class="line"><span class="attribute">docker</span>-co  <span class="number">119820</span> root  txt       REG                <span class="number">8</span>,<span class="number">3</span>  <span class="number">4173632</span>     <span class="number">392525</span> /usr/bin/docker-containerd-shim</span><br><span class="line"><span class="attribute">docker</span>-co  <span class="number">119820</span> root    <span class="number">0</span>r      CHR                <span class="number">1</span>,<span class="number">3</span>      <span class="number">0</span>t<span class="number">0</span>       <span class="number">2052</span> /dev/null</span><br><span class="line"><span class="attribute">docker</span>-co  <span class="number">119820</span> root    <span class="number">1</span>w      CHR                <span class="number">1</span>,<span class="number">3</span>      <span class="number">0</span>t<span class="number">0</span>       <span class="number">2052</span> /dev/null</span><br><span class="line"><span class="attribute">docker</span>-co  <span class="number">119820</span> root    <span class="number">2</span>w      CHR                <span class="number">1</span>,<span class="number">3</span>      <span class="number">0</span>t<span class="number">0</span>       <span class="number">2052</span> 
/dev/null</span><br></pre></td></tr></table></figure><p>问题排查至此，似乎又僵住了！好在，之前看过一丢丢内核问题排查相关知识。虽然线上containerd-shim将协程栈的信息全部导出到了<code>/dev/null</code>中，但是我们还是有一些手段获取。</p><p>赶紧找组里的内核大佬帮忙，并很快确定了方案，基于kprobe，在操作系统往<code>/dev/null</code>设备写入协程栈时，拷贝一份内容写到内核日志中。方案实施起来也不复杂：</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span 
class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string">&lt;linux/kernel.h&gt;</span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string">&lt;linux/module.h&gt;</span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string">&lt;linux/kprobes.h&gt;</span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string">&lt;linux/sched.h&gt;</span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string">&lt;asm/uaccess.h&gt;</span></span></span><br><span class="line"><span class="meta">#<span class="meta-keyword">include</span> <span class="meta-string">&lt;linux/slab.h&gt;</span></span></span><br><span class="line"> </span><br><span class="line"><span 
class="keyword">static</span> <span class="keyword">int</span> pid;</span><br><span class="line">module_param(pid, <span class="keyword">int</span>, <span class="number">0</span>);</span><br><span class="line"> </span><br><span class="line"><span class="keyword">static</span> <span class="class"><span class="keyword">struct</span> <span class="title">kprobe</span> <span class="title">kp</span> =</span> &#123;</span><br><span class="line">    .symbol_name    = <span class="string">&quot;write_null&quot;</span>,</span><br><span class="line">&#125;;</span><br><span class="line"> </span><br><span class="line"><span class="meta">#<span class="meta-keyword">define</span> SEGMENT 512</span></span><br><span class="line"> </span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> <span class="title">handler_pre</span><span class="params">(struct kprobe *p, struct pt_regs *regs)</span></span></span><br><span class="line"><span class="function"></span>&#123;</span><br><span class="line">    <span class="keyword">char</span> *wbuf;</span><br><span class="line">    <span class="keyword">size_t</span> count, place = <span class="number">0</span>;</span><br><span class="line"> </span><br><span class="line">    <span class="keyword">if</span> (pid != current-&gt;tgid) &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">    &#125;</span><br><span class="line"> </span><br><span class="line">    count = (<span class="keyword">size_t</span>)(regs-&gt;dx);</span><br><span class="line"> </span><br><span class="line">    printk(KERN_INFO <span class="string">&quot;%u call write_null count: %zu\n&quot;</span>, current-&gt;tgid, count);</span><br><span class="line"> </span><br><span class="line">    wbuf = (<span class="keyword">char</span> *)kmalloc(count + <span class="number">1</span>, GFP_ATOMIC);</span><br><span
class="line">    <span class="keyword">if</span> (wbuf == <span class="literal">NULL</span>) &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="built_in">memset</span>(wbuf, <span class="number">0x0</span>, count + <span class="number">1</span>);</span><br><span class="line"> </span><br><span class="line">    <span class="keyword">if</span> (copy_from_user(wbuf, (<span class="keyword">void</span> *)regs-&gt;si, count)) &#123;</span><br><span class="line">        printk(KERN_ERR <span class="string">&quot;copy_from_user fail\n&quot;</span>);</span><br><span class="line">        <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">    &#125;</span><br><span class="line"> </span><br><span class="line">    <span class="keyword">while</span> (place &lt; count) &#123;</span><br><span class="line">        <span class="keyword">char</span> tmp[SEGMENT + <span class="number">1</span>];</span><br><span class="line">        <span class="built_in">memset</span>(tmp, <span class="number">0x0</span>, SEGMENT + <span class="number">1</span>);</span><br><span class="line"> </span><br><span class="line">        <span class="built_in">snprintf</span>(tmp, SEGMENT + <span class="number">1</span>, <span class="string">&quot;%s&quot;</span>, wbuf + place);</span><br><span class="line">        <span class="keyword">if</span> ((count - place) &gt;= SEGMENT) &#123;</span><br><span class="line">            place += SEGMENT;</span><br><span class="line">        &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">            place = count;</span><br><span class="line">        &#125;</span><br><span class="line">        printk(KERN_INFO <span class="string">&quot;%s\n&quot;</span>, tmp);</span><br><span class="line">    &#125;</span><br><span class="line"> </span><br><span 
class="line">    <span class="keyword">if</span> (wbuf) &#123;</span><br><span class="line">        kfree(wbuf);</span><br><span class="line">    &#125;</span><br><span class="line"> </span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br><span class="line"> </span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">int</span> __init <span class="title">kprobe_init</span><span class="params">(<span class="keyword">void</span>)</span></span></span><br><span class="line"><span class="function"></span>&#123;</span><br><span class="line">    <span class="keyword">int</span> ret;</span><br><span class="line">    kp.pre_handler = handler_pre;</span><br><span class="line"> </span><br><span class="line">    ret = register_kprobe(&amp;kp);</span><br><span class="line">    <span class="keyword">if</span> (ret &lt; <span class="number">0</span>) &#123;</span><br><span class="line">        printk(KERN_INFO <span class="string">&quot;register_kprobe failed, returned %d\n&quot;</span>, ret);</span><br><span class="line">        <span class="keyword">return</span> ret;</span><br><span class="line">    &#125;</span><br><span class="line">    printk(KERN_INFO <span class="string">&quot;Planted kprobe at %p\n&quot;</span>, kp.addr);</span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>;</span><br><span class="line">&#125;</span><br><span class="line"> </span><br><span class="line"><span class="function"><span class="keyword">static</span> <span class="keyword">void</span> __exit <span class="title">kprobe_exit</span><span class="params">(<span class="keyword">void</span>)</span></span></span><br><span class="line"><span class="function"></span>&#123;</span><br><span class="line">    unregister_kprobe(&amp;kp);</span><br><span class="line">    printk(KERN_INFO <span class="string">&quot;kprobe at %p 
unregistered\n&quot;</span>, kp.addr);</span><br><span class="line">&#125;</span><br><span class="line"> </span><br><span class="line">module_init(kprobe_init)</span><br><span class="line">module_exit(kprobe_exit)</span><br><span class="line">MODULE_LICENSE(<span class="string">&quot;GPL&quot;</span>);</span><br></pre></td></tr></table></figure><p>Special thanks to 睿哥 for providing this kprobe code. Readers interested in how kprobes work can study them on their own.</p><p>After deploying this kprobe kernel module in production, we sent <code>SIGUSR1</code> to containerd-shim and successfully captured its goroutine stacks, filling in the last piece of the troubleshooting puzzle.</p><p>Addendum: special thanks to 飞哥 (a seasoned hand indeed) for the reminder that there is a much simpler way to capture the goroutine stacks: use strace to trace the system calls of the process, e.g. <code>strace -f -p PID</code>.</p><p>The goroutine stacks of the production containerd-shim are shown below (a large number of unimportant goroutines have been trimmed, and the formatting adjusted):</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span 
class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span 
class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br><span class="line">163</span><br><span 
class="line">164</span><br><span class="line">165</span><br><span class="line">166</span><br><span class="line">167</span><br><span class="line">168</span><br><span class="line">169</span><br><span class="line">170</span><br><span class="line">171</span><br><span class="line">172</span><br></pre></td><td class="code"><pre><span class="line"><span class="string">time=&quot;2021-06-23T16:35:07+08:00&quot;</span> <span class="string">level=info</span> <span class="string">msg=&quot;===</span> <span class="string">BEGIN</span> <span class="string">goroutine</span> <span class="string">stack</span> <span class="string">dump</span> <span class="string">===</span></span><br><span class="line"><span class="string">goroutine</span> <span class="number">22</span> [<span class="string">running</span>]<span class="string">:</span></span><br><span class="line"><span class="string">main.dumpStacks(0xc4201c81e0)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:228</span> <span class="string">+0x8a</span></span><br><span class="line"><span class="string">main.executeShim.func1(0xc42012c300,</span> <span class="number">0xc4201c81e0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:148</span> <span class="string">+0x3d</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">main.executeShim</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:146</span> <span class="string">+0x5de</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">1</span> [<span class="string">chan</span> <span class="string">send</span>, <span 
class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/reaper.Reap(0xc420243be0,</span> <span class="number">0x1</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/reaper/reaper.go:44</span> <span class="string">+0x168</span></span><br><span class="line"><span class="string">main.handleSignals(0xc4201c81e0,</span> <span class="number">0xc42012c240</span><span class="string">,</span> <span class="number">0xc4201d6090</span><span class="string">,</span> <span class="number">0xc4201e4000</span><span class="string">,</span> <span class="number">0xc420117ea0</span><span class="string">,</span> <span class="number">0x86</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:197</span> <span class="string">+0x2a1</span></span><br><span class="line"><span class="string">main.executeShim(0x2,</span> <span class="number">0x60</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:151</span> <span class="string">+0x616</span></span><br><span class="line"><span class="string">main.main()</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:96</span> <span class="string">+0x81</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">5</span> [<span class="string">syscall</span>]<span class="string">:</span></span><br><span class="line"><span class="string">os/signal.signal_recv(0x6c14c0)</span></span><br><span class="line">      <span 
class="string">/usr/local/go/src/runtime/sigqueue.go:139</span> <span class="string">+0xa6</span></span><br><span class="line"><span class="string">os/signal.loop()</span></span><br><span class="line">      <span class="string">/usr/local/go/src/os/signal/signal_unix.go:22</span> <span class="string">+0x22</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">os/signal.init.0</span></span><br><span class="line">      <span class="string">/usr/local/go/src/os/signal/signal_unix.go:28</span> <span class="string">+0x41</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">6</span> [<span class="string">chan</span> <span class="string">receive</span>]<span class="string">:</span></span><br><span class="line"><span class="string">main.main.func1()</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:81</span> <span class="string">+0x7b</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">main.main</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:80</span> <span class="string">+0x46</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">7</span> [<span class="string">select</span>, <span class="number">92427</span> <span class="string">minutes</span>, <span class="string">locked</span> <span class="string">to</span> <span class="string">thread</span>]<span class="string">:</span></span><br><span class="line"><span class="string">runtime.gopark(0x6a70a0,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x696b74</span><span class="string">,</span> <span 
class="number">0x6</span><span class="string">,</span> <span class="number">0x18</span><span class="string">,</span> <span class="number">0x1</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/runtime/proc.go:291</span> <span class="string">+0x11a</span></span><br><span class="line"><span class="string">runtime.selectgo(0xc420104f50,</span> <span class="number">0xc42014a180</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/runtime/select.go:392</span> <span class="string">+0xe50</span></span><br><span class="line"><span class="string">runtime.ensureSigM.func1()</span></span><br><span class="line">      <span class="string">/usr/local/go/src/runtime/signal_unix.go:549</span> <span class="string">+0x1f4</span></span><br><span class="line"><span class="string">runtime.goexit()</span></span><br><span class="line">      <span class="string">/usr/local/go/src/runtime/asm_amd64.s:2361</span> <span class="string">+0x1</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">18</span> [<span class="string">semacquire</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">sync.runtime_SemacquireMutex(0xc4201e4004,</span> <span class="number">0x403200</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/runtime/sema.go:71</span> <span class="string">+0x3d</span></span><br><span class="line"><span class="string">sync.(*Mutex).Lock(0xc4201e4000)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/sync/mutex.go:134</span> <span class="string">+0x108</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/shim.(*Service).checkProcesses(0xc4201e4000,</span> <span 
class="number">0xc02cd591b40305ef</span><span class="string">,</span> <span class="number">0x13af4280e2630f</span><span class="string">,</span> <span class="number">0x7fc440</span><span class="string">,</span> <span class="number">0x1c11b</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:470</span> <span class="string">+0x45</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/shim.(*Service).processExits(0xc4201e4000)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:465</span> <span class="string">+0xd0</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/linux/shim.NewService</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:86</span> <span class="string">+0x3e9</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">19</span> [<span class="string">syscall</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">syscall.Syscall6(0xe8,</span> <span class="number">0x4</span><span class="string">,</span> <span class="number">0xc4201189b8</span><span class="string">,</span> <span class="number">0x80</span><span class="string">,</span> <span class="number">0xffffffffffffffff</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0xc420118938</span><span class="string">,</span> 
<span class="number">0x45b793</span><span class="string">,</span> <span class="number">0xc42047add0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/syscall/asm_linux_amd64.s:44</span> <span class="string">+0x5</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/golang.org/x/sys/unix.EpollWait(0x4,</span> <span class="number">0xc4201189b8</span><span class="string">,</span> <span class="number">0x80</span><span class="string">,</span> <span class="number">0x80</span><span class="string">,</span> <span class="number">0xffffffffffffffff</span><span class="string">,</span> <span class="number">0x5</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/golang.org/x/sys/unix/zsyscall_linux_amd64.go:1518</span> <span class="string">+0x7a</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/containerd/console.(*Epoller).Wait(0xc4201be060,</span> <span class="number">0xc420117aa8</span><span class="string">,</span> <span class="number">0xc420117ab0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/containerd/console/console_linux.go:110</span> <span class="string">+0x7a</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/linux/shim.(*Service).initPlatform</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service_linux.go:109</span> <span class="string">+0xc6</span></span><br><span class="line"> 
</span><br><span class="line"><span class="string">goroutine</span> <span class="number">20</span> [<span class="string">chan</span> <span class="string">receive</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/shim.(*Service).forward(0xc4201e4000,</span> <span class="number">0x6c0600</span><span class="string">,</span> <span class="number">0xc4201d4010</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:514</span> <span class="string">+0x62</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/linux/shim.NewService</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:90</span> <span class="string">+0x49b</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">21</span> [<span class="string">IO</span> <span class="string">wait</span>, <span class="number">92427</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">internal/poll.runtime_pollWait(0x7f75331fcf00,</span> <span class="number">0x72</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/runtime/netpoll.go:173</span> <span class="string">+0x57</span></span><br><span class="line"><span class="string">internal/poll.(*pollDesc).wait(0xc4201e6118,</span> <span class="number">0x72</span><span class="string">,</span> <span class="number">0xc420010100</span><span class="string">,</span> <span 
class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/internal/poll/fd_poll_runtime.go:85</span> <span class="string">+0x9b</span></span><br><span class="line"><span class="string">internal/poll.(*pollDesc).waitRead(0xc4201e6118,</span> <span class="number">0xffffffffffffff00</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/internal/poll/fd_poll_runtime.go:90</span> <span class="string">+0x3d</span></span><br><span class="line"><span class="string">internal/poll.(*FD).Accept(0xc4201e6100,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/internal/poll/fd_unix.go:372</span> <span class="string">+0x1a8</span></span><br><span class="line"><span class="string">net.(*netFD).accept(0xc4201e6100,</span> <span class="number">0xc4201d60a0</span><span class="string">,</span> <span class="number">0xc4201d6060</span><span class="string">,</span> <span class="number">0xc4201563c0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/net/fd_unix.go:238</span> <span class="string">+0x42</span></span><br><span class="line"><span class="string">net.(*UnixListener).accept(0xc4201d6270,</span> <span class="number">0x451c70</span><span class="string">,</span> <span 
class="number">0xc42011cea8</span><span class="string">,</span> <span class="number">0xc42011ceb0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/net/unixsock_posix.go:162</span> <span class="string">+0x32</span></span><br><span class="line"><span class="string">net.(*UnixListener).Accept(0xc4201d6270,</span> <span class="number">0x6a6b10</span><span class="string">,</span> <span class="number">0xc4201563c0</span><span class="string">,</span> <span class="number">0x6c3840</span><span class="string">,</span> <span class="number">0xc420012018</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/net/unixsock.go:253</span> <span class="string">+0x49</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Server).Serve(0xc4201d6090,</span> <span class="number">0x6c3440</span><span class="string">,</span> <span class="number">0xc4201d6270</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:69</span> <span class="string">+0x106</span></span><br><span class="line"><span class="string">main.serve.func1(0x6c3440,</span> <span class="number">0xc4201d6270</span><span class="string">,</span> <span class="number">0xc4201d6090</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:176</span> <span class="string">+0x71</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">main.serve</span></span><br><span class="line">      <span 
class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/cmd/containerd-shim/main_unix.go:174</span> <span class="string">+0x1be</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">8</span> [<span class="string">select</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run(0xc4201563c0,</span> <span class="number">0x6c3840</span><span class="string">,</span> <span class="number">0xc420012018</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:398</span> <span class="string">+0x3f0</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Server).Serve</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:109</span> <span class="string">+0x25c</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">9</span> [<span class="string">IO</span> <span class="string">wait</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">internal/poll.runtime_pollWait(0x7f75331fce30,</span> <span class="number">0x72</span><span class="string">,</span> <span class="number">0xc42011ea48</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/runtime/netpoll.go:173</span> <span class="string">+0x57</span></span><br><span 
class="line"><span class="string">internal/poll.(*pollDesc).wait(0xc4201a0118,</span> <span class="number">0x72</span><span class="string">,</span> <span class="number">0xffffffffffffff00</span><span class="string">,</span> <span class="number">0x6c0a40</span><span class="string">,</span> <span class="number">0x7e11b8</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/internal/poll/fd_poll_runtime.go:85</span> <span class="string">+0x9b</span></span><br><span class="line"><span class="string">internal/poll.(*pollDesc).waitRead(0xc4201a0118,</span> <span class="number">0xc4201ed000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/internal/poll/fd_poll_runtime.go:90</span> <span class="string">+0x3d</span></span><br><span class="line"><span class="string">internal/poll.(*FD).Read(0xc4201a0100,</span> <span class="number">0xc4201ed000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/internal/poll/fd_unix.go:157</span> <span class="string">+0x17d</span></span><br><span class="line"><span class="string">net.(*netFD).Read(0xc4201a0100,</span> <span class="number">0xc4201ed000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0xc42011eb30</span><span class="string">,</span> <span class="number">0x451430</span><span 
class="string">,</span> <span class="number">0xc420001b00</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/net/fd_unix.go:202</span> <span class="string">+0x4f</span></span><br><span class="line"><span class="string">net.(*conn).Read(0xc42000c050,</span> <span class="number">0xc4201ed000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0x1000</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/net/net.go:176</span> <span class="string">+0x6a</span></span><br><span class="line"><span class="string">bufio.(*Reader).Read(0xc42012c480,</span> <span class="number">0xc4200105a0</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x2</span><span class="string">,</span> <span class="number">0x2</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/bufio/bufio.go:216</span> <span class="string">+0x238</span></span><br><span class="line"><span class="string">io.ReadAtLeast(0x6c02c0,</span> <span class="number">0xc42012c480</span><span class="string">,</span> <span class="number">0xc4200105a0</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0x6c62e0</span><span class="string">,</span> <span class="number">0xc42011ef50</span><span class="string">,</span> <span 
class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/io/io.go:309</span> <span class="string">+0x86</span></span><br><span class="line"><span class="string">io.ReadFull(0x6c02c0,</span> <span class="number">0xc42012c480</span><span class="string">,</span> <span class="number">0xc4200105a0</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0xc42011eef0</span><span class="string">,</span> <span class="number">0x3</span><span class="string">,</span> <span class="number">0x3</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/io/io.go:327</span> <span class="string">+0x58</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.readMessageHeader(0xc4200105a0,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0xa</span><span class="string">,</span> <span class="number">0x6c02c0</span><span class="string">,</span> <span class="number">0xc42012c480</span><span class="string">,</span> <span class="number">0xc42011ee70</span><span class="string">,</span> <span class="number">0x2</span><span class="string">,</span> <span class="number">0x2</span><span class="string">,</span> <span class="number">0xc42011eed0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/channel.go:38</span> <span class="string">+0x60</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*channel).recv(0xc420010580,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span 
class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0xc420392940</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x73</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/channel.go:86</span> <span class="string">+0x6d</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run.func1(0xc42014a2a0,</span> <span class="number">0xc4201563c0</span><span class="string">,</span> <span class="number">0xc42014a360</span><span class="string">,</span> <span class="number">0xc420010580</span><span class="string">,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc42014a300</span><span class="string">,</span> <span class="number">0xc42012c4e0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:329</span> <span class="string">+0x1bf</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:299</span> <span class="string">+0x247</span></span><br><span class="line"> </span><br><span class="line"><span 
class="string">goroutine</span> <span class="number">22661144</span> [<span class="string">chan</span> <span class="string">receive</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/reaper.(*Monitor).Wait(0x7fb5d0,</span> <span class="number">0xc42017a580</span><span class="string">,</span> <span class="number">0xc420674360</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x68c3c0</span><span class="string">)</span></span><br><span class="line">    <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/reaper/reaper.go:82</span> <span class="string">+0x52</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/containerd/go-runc.cmdOutput(0xc42017a580,</span> <span class="number">0x1</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">    <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/containerd/go-runc/runc.go:693</span> <span class="string">+0x110</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/containerd/go-runc.(*Runc).runOrError(0xc42018b6c0,</span> <span class="number">0xc42017a580</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc4201d6c30</span><span class="string">)</span></span><br><span class="line">    <span 
class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/containerd/go-runc/runc.go:673</span> <span class="string">+0x19b</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/containerd/go-runc.(*Runc).Kill(0xc42018b6c0,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc420014480</span><span class="string">,</span> <span class="number">0x40</span><span class="string">,</span> <span class="number">0xf</span><span class="string">,</span> <span class="number">0xc42054bbc7</span><span class="string">,</span> <span class="number">0xc420393080</span><span class="string">,</span> <span class="number">0xc42012d9e0</span><span class="string">)</span></span><br><span class="line">    <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/containerd/go-runc/runc.go:320</span> <span class="string">+0x1e2</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/proc.(*Init).kill(0xc42018c3c0,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xf</span><span class="string">,</span> <span class="number">0x40</span><span class="string">,</span> <span class="number">0xc42054bc01</span><span class="string">)</span></span><br><span class="line">    <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/proc/init.go:341</span> <span class="string">+0x78</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/proc.(*runningState).Kill(0xc4201ca030,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xf</span><span 
class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">    <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/proc/init_state.go:331</span> <span class="string">+0xa5</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/shim.(*Service).Kill(0xc4201e4000,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc42000a940</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">    <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:356</span> <span class="string">+0x271</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/shim/v1.RegisterShimService.func10(0x6c3800,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc42000a920</span><span class="string">,</span> <span class="number">0xc4201ce968</span><span class="string">,</span> <span class="number">0x4</span><span class="string">,</span> <span class="number">0xc4201be1a0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">    <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.go:1670</span> <span class="string">+0xc5</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serviceSet).dispatch(0xc4201ca008,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span 
class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc4204e6570</span><span class="string">,</span> <span class="number">0x25</span><span class="string">,</span> <span class="number">0xc4201ce968</span><span class="string">,</span> <span class="number">0x4</span><span class="string">,</span> <span class="number">0xc420226c30</span><span class="string">,</span> <span class="number">0x44</span><span class="string">,</span> <span class="number">0x44</span><span class="string">,</span> <span class="string">...)</span></span><br><span class="line">    <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/services.go:71</span> <span class="string">+0x10e</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serviceSet).call(0xc4201ca008,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc4204e6570</span><span class="string">,</span> <span class="number">0x25</span><span class="string">,</span> <span class="number">0xc4201ce968</span><span class="string">,</span> <span class="number">0x4</span><span class="string">,</span> <span class="number">0xc420226c30</span><span class="string">,</span> <span class="number">0x44</span><span class="string">,</span> <span class="number">0x44</span><span class="string">,</span> <span class="string">...)</span></span><br><span class="line">    <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/services.go:44</span> <span class="string">+0xb5</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run.func2(0xc4201563c0,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span 
class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xace455</span><span class="string">,</span> <span class="number">0xc420392900</span><span class="string">,</span> <span class="number">0xc42014a2a0</span><span class="string">,</span> <span class="number">0xc42014a360</span><span class="string">,</span> <span class="number">0xc400ace455</span><span class="string">)</span></span><br><span class="line">    <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:402</span> <span class="string">+0xaa</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run</span></span><br><span class="line">    <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:401</span> <span class="string">+0x763</span></span><br><span class="line"> </span><br><span class="line"><span class="string">goroutine</span> <span class="number">22661145</span> [<span class="string">semacquire</span>, <span class="number">83</span> <span class="string">minutes</span>]<span class="string">:</span></span><br><span class="line"><span class="string">sync.runtime_SemacquireMutex(0xc4201e4004,</span> <span class="number">0x64b800</span><span class="string">)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/runtime/sema.go:71</span> <span class="string">+0x3d</span></span><br><span class="line"><span class="string">sync.(*Mutex).Lock(0xc4201e4000)</span></span><br><span class="line">      <span class="string">/usr/local/go/src/sync/mutex.go:134</span> <span class="string">+0x108</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/shim.(*Service).State(0xc4201e4000,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span 
class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc420120af0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/service.go:271</span> <span class="string">+0x59</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/linux/shim/v1.RegisterShimService.func1(0x6c3800,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc42000a980</span><span class="string">,</span> <span class="number">0xc4201ce988</span><span class="string">,</span> <span class="number">0x5</span><span class="string">,</span> <span class="number">0xc4201be080</span><span class="string">,</span> <span class="number">0x0</span><span class="string">)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.go:1607</span> <span class="string">+0xc8</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serviceSet).dispatch(0xc4201ca008,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc4204e65a0</span><span class="string">,</span> <span class="number">0x25</span><span class="string">,</span> <span class="number">0xc4201ce988</span><span class="string">,</span> <span class="number">0x5</span><span class="string">,</span> <span class="number">0xc420226c80</span><span class="string">,</span> <span class="number">0x42</span><span class="string">,</span> <span class="number">0x42</span><span class="string">,</span> <span 
class="string">...)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/services.go:71</span> <span class="string">+0x10e</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serviceSet).call(0xc4201ca008,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xc4204e65a0</span><span class="string">,</span> <span class="number">0x25</span><span class="string">,</span> <span class="number">0xc4201ce988</span><span class="string">,</span> <span class="number">0x5</span><span class="string">,</span> <span class="number">0xc420226c80</span><span class="string">,</span> <span class="number">0x42</span><span class="string">,</span> <span class="number">0x42</span><span class="string">,</span> <span class="string">...)</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/services.go:44</span> <span class="string">+0xb5</span></span><br><span class="line"><span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run.func2(0xc4201563c0,</span> <span class="number">0x6c3800</span><span class="string">,</span> <span class="number">0xc4200105c0</span><span class="string">,</span> <span class="number">0xace457</span><span class="string">,</span> <span class="number">0xc420392940</span><span class="string">,</span> <span class="number">0xc42014a2a0</span><span class="string">,</span> <span class="number">0xc42014a360</span><span class="string">,</span> <span class="number">0xc400ace457</span><span class="string">)</span></span><br><span class="line">      <span 
class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:402</span> <span class="string">+0xaa</span></span><br><span class="line"><span class="string">created</span> <span class="string">by</span> <span class="string">github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*serverConn).run</span></span><br><span class="line">      <span class="string">/tmp/tmp.ma5UpKvnFZ/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/server.go:401</span> <span class="string">+0x763</span></span><br><span class="line"> </span><br><span class="line"><span class="string">===</span> <span class="string">END</span> <span class="string">goroutine</span> <span class="string">stack</span> <span class="string">dump</span> <span class="string">===&quot;</span> <span class="string">namespace=moby</span> <span class="string">path=&quot;/run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/60f253d59f26e1c573d4ba5f824e73b3a4b1bb1629edace85caba4c620755d4d&quot;</span> <span class="string">pid=119820</span></span><br></pre></td></tr></table></figure><p>Analyzing the goroutine stacks above shows:</p><ul><li>goroutine 1: <code>handleSignals</code> is indeed blocked at <code>reaper.go:44</code>, which prevents any subsequent child processes from being reaped</li><li>goroutine 18: <code>checkProcesses</code> is blocked at <code>service.go:470</code> failing to acquire a lock, though not the big <code>reaper.Default</code> lock</li><li>goroutine 22661144: <code>shim.(*Service).Kill</code> is blocked at <code>reaper.go:82</code></li></ul><p>Of these, the most anomalous is <code>goroutine 22661144</code>; the code at its blocking point is as follows:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span 
class="keyword">func</span> <span class="params">(m *Monitor)</span> <span class="title">Wait</span><span class="params">(c *exec.Cmd, ec <span class="keyword">chan</span> runc.Exit)</span> <span class="params">(<span class="keyword">int</span>, error)</span></span> &#123;</span><br><span class="line">   <span class="keyword">for</span> e := <span class="keyword">range</span> ec &#123;                   <span class="comment">// reaper.go:82</span></span><br><span class="line">      <span class="keyword">if</span> e.Pid == c.Process.Pid &#123;</span><br><span class="line">         c.Wait()</span><br><span class="line">         m.Unsubscribe(ec)</span><br><span class="line">         <span class="keyword">return</span> e.Status, <span class="literal">nil</span></span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">return</span> <span class="number">-1</span>, ErrNoSuchProcess</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Here, <code>ec</code> is one of <code>reaper.Default</code>'s subscriber channels.</p><p>The deadlock forms as follows:</p><ul><li>goroutine 22661144: waits for the exit event of its child process to arrive, while holding the <code>Service.mu</code> lock</li><li>goroutine 18: waits to acquire <code>Service.mu</code> before it can consume the events it subscribed to</li><li>goroutine 1: sends events to every subscriber</li></ul><p>Together these three goroutines form a perfect deadlock: goroutine 1 blocks sending into a full subscriber channel, that channel cannot be drained until goroutine 18 acquires <code>Service.mu</code>, and <code>Service.mu</code> is held by goroutine 22661144, which is itself waiting for an exit event that goroutine 1 has yet to deliver.</p><h1 id="解决方案"><a href="#解决方案" class="headerlink" title="解决方案"></a>Solution</h1><p>With the root cause understood, the fix is straightforward: adjust the default size of the subscriber channel. The community shipped two improvements:</p><ul><li>Enlarge the channel and silently drop overflowing events: <a href="https://github.com/containerd/containerd/pull/2748/files">https://github.com/containerd/containerd/pull/2748/files</a></li><li>Rework the locking logic: <a href="https://github.com/containerd/containerd/pull/2743">https://github.com/containerd/containerd/pull/2743</a> (spans several commits)</li></ul><p>However, replacing containerd-shim only helps containers created afterwards; containers created before the replacement can still hit the problem.</p><p>That remaining case can be handled with an alerting-and-self-healing step: kill the containerd-shim process directly, so that all of its processes are reaped by the init process instead.</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;背景&quot;&gt;&lt;a href=&quot;#背景&quot; class=&quot;headerlink&quot; title=&quot;背景&quot;&gt;&lt;/a&gt;Background&lt;/h1&gt;&lt;p&gt;Recently, several cases of failed Pod deletions were reported in production, with users' repeated deletion requests all ending in failure. A failed Pod deletion has two main impacts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;面向</summary>
      
    
    
    
    <category term="问题排查" scheme="https://plpan.github.io/categories/%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5/"/>
    
    
    <category term="kubernetes" scheme="https://plpan.github.io/tags/kubernetes/"/>
    
    <category term="docker" scheme="https://plpan.github.io/tags/docker/"/>
    
    <category term="containers" scheme="https://plpan.github.io/tags/containers/"/>
    
  </entry>
  
  <entry>
    <title>docker exec 失败问题排查之旅</title>
    <link href="https://plpan.github.io/docker-exec-%E5%A4%B1%E8%B4%A5%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/"/>
    <id>https://plpan.github.io/docker-exec-%E5%A4%B1%E8%B4%A5%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/</id>
    <published>2021-05-13T11:43:51.000Z</published>
    <updated>2021-06-27T08:33:36.285Z</updated>
    
<content type="html"><![CDATA[<p>Hoeing at noon beneath the sun, on-call duty is bitter toil;</p><p>Sweat drips down into the soil, one bug costs a whole afternoon.</p><h3 id="问题描述"><a href="#问题描述" class="headerlink" title="问题描述"></a>Problem description</h3><p>Today, while on call investigating a production issue, I found the system log being flooded with docker error messages:</p><figure class="highlight apache"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="attribute">May</span> <span class="number">12</span> <span class="number">09</span>:<span class="number">08</span>:<span class="number">40</span> HOSTNAME dockerd[<span class="number">4085</span>]: time=<span class="string">&quot;2021-05-12T09:08:40.642410594+08:00&quot;</span> level=error msg=<span class="string">&quot;stream copy error: reading from a closed fifo&quot;</span></span><br><span class="line"><span class="attribute">May</span> <span class="number">12</span> <span class="number">09</span>:<span class="number">08</span>:<span class="number">40</span> HOSTNAME dockerd[<span class="number">4085</span>]: time=<span class="string">&quot;2021-05-12T09:08:40.642418571+08:00&quot;</span> level=error msg=<span class="string">&quot;stream copy error: reading from a closed fifo&quot;</span></span><br><span class="line"><span class="attribute">May</span> <span class="number">12</span> <span class="number">09</span>:<span class="number">08</span>:<span class="number">40</span> HOSTNAME dockerd[<span class="number">4085</span>]: time=<span class="string">&quot;2021-05-12T09:08:40.663754355+08:00&quot;</span> level=error msg=<span class="string">&quot;Error running exec 110deb1c1b2a2d2671d7368bd02bfc18a968e4712a3c771dedf0b362820e73cb in container: OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \&quot;read init-p: connection reset by peer\&quot;: unknown&quot;</span></span><br></pre></td></tr></table></figure><p>From a system-risk point of view, the cause of these error logs had to be pinned down, and we needed to determine whether they affect business workloads.</p><p>The rest of this post briefly walks through the troubleshooting process and the root cause.</p><h3 id="问题排查"><a 
href="#问题排查" class="headerlink" title="问题排查"></a>Troubleshooting</h3><p>At this point, the only information we have is the system log telling us that dockerd failed to run an exec.</p><p>Before analyzing further, let's review how docker works and what its call chain looks like:</p><p><img src="docker-call-path.png" alt="docker call chain"></p><p>As the diagram shows, docker's call chain is quite long and involves many components. Our investigation therefore splits into two steps:</p><ul><li>Identify the component that causes the failure</li><li>Determine why that component fails</li></ul><h4 id="定位组件"><a href="#定位组件" class="headerlink" title="定位组件"></a>Locating the component</h4><p>Users familiar with docker can spot the culprit at a glance, but let's follow the standard troubleshooting flow anyway:</p><figure class="highlight angelscript"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span 
class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span 
class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// 1. Locate a problem container</span></span><br><span class="line"># sudo docker ps | grep -v pause | grep -v NAMES | awk <span class="string">&#x27;&#123;print $1&#125;&#x27;</span> | xargs -ti sudo docker exec &#123;&#125; sleep <span class="number">1</span></span><br><span class="line">sudo docker exec aa1e331ec24f sleep <span class="number">1</span></span><br><span class="line">OCI runtime exec failed: exec failed: container_linux.go:<span class="number">348</span>: starting container process caused <span class="string">&quot;read init-p: connection reset by peer&quot;</span>: unknown</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">// 2. Rule out docker</span></span><br><span class="line"># docker-containerd-ctr -a /var/run/docker/containerd/docker-containerd.sock -n moby t exec --exec-id stupig1 aa1e331ec24f621ab3152ebe94f1e533734164af86c9df0f551eab2b1967ec4e sleep <span class="number">1</span></span><br><span class="line">ctr: OCI runtime exec failed: exec failed: container_linux.go:<span class="number">348</span>: starting container process caused <span class="string">&quot;read init-p: connection reset by peer&quot;</span>: unknown</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">// 3. Rule out containerd and containerd-shim</span></span><br><span class="line"># docker-runc --root /var/run/docker/runtime-runc/moby/ exec aa1e331ec24f621ab3152ebe94f1e533734164af86c9df0f551eab2b1967ec4e sleep</span><br><span class="line">runtime/cgo: pthread_create failed: Resource temporarily unavailable</span><br><span class="line">SIGABRT: abort</span><br><span class="line">PC=<span class="number">0x6b657e</span> m=<span class="number">0</span> sigcode=<span class="number">18446744073709551610</span></span><br><span class="line"></span><br><span class="line">goroutine <span class="number">0</span> [idle]:</span><br><span class="line">runtime: unknown pc <span class="number">0x6b657e</span></span><br><span class="line">stack: frame=&#123;sp:<span class="number">0x7ffd30f0d218</span>, fp:<span class="number">0x0</span>&#125; stack=[<span class="number">0x7ffd2ab0e738</span>,<span class="number">0x7ffd30f0d760</span>)</span><br><span class="line"><span class="number">00007f</span>fd30f0d118:  <span class="number">0000000000000002</span>  <span class="number">00007f</span>fd30f7f184</span><br><span class="line"><span class="number">00007f</span>fd30f0d128:  <span class="number">000000000069</span>c31c  <span class="number">00007f</span>fd30f0d1a8</span><br><span class="line"><span class="number">00007f</span>fd30f0d138:  <span class="number">000000000045814</span>e &lt;runtime.callCgoMmap+<span class="number">62</span>&gt;  <span class="number">00007f</span>fd30f0d140</span><br><span class="line"><span class="number">00007f</span>fd30f0d148:  <span class="number">00007f</span>fd30f0d190  <span class="number">0000000000411</span>a88 &lt;runtime.persistentalloc1+<span class="number">456</span>&gt;</span><br><span class="line"><span class="number">00007f</span>fd30f0d158:  <span class="number">0000000000</span>bf6dd0  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d168:  <span 
class="number">0000000000010000</span>  <span class="number">0000000000000008</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d178:  <span class="number">0000000000</span>bf6dd8  <span class="number">0000000000</span>bf7ca0</span><br><span class="line"><span class="number">00007f</span>fd30f0d188:  <span class="number">00007f</span>dcbb4b7000  <span class="number">00007f</span>fd30f0d1c8</span><br><span class="line"><span class="number">00007f</span>fd30f0d198:  <span class="number">0000000000451205</span> &lt;runtime.persistentalloc.func1+<span class="number">69</span>&gt;  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d1a8:  <span class="number">0000000000000000</span>  <span class="number">0000000000</span>c1c080</span><br><span class="line"><span class="number">00007f</span>fd30f0d1b8:  <span class="number">00007f</span>dcbb4b7000  <span class="number">00007f</span>fd30f0d1e0</span><br><span class="line"><span class="number">00007f</span>fd30f0d1c8:  <span class="number">00007f</span>fd30f0d210  <span class="number">00007f</span>fd30f0d220</span><br><span class="line"><span class="number">00007f</span>fd30f0d1d8:  <span class="number">0000000000000000</span>  <span class="number">00000000000000f</span>1</span><br><span class="line"><span class="number">00007f</span>fd30f0d1e8:  <span class="number">0000000000000011</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d1f8:  <span class="number">000000000069</span>c31c  <span class="number">0000000000</span>c1c080</span><br><span class="line"><span class="number">00007f</span>fd30f0d208:  <span class="number">000000000045814</span>e &lt;runtime.callCgoMmap+<span class="number">62</span>&gt;  <span class="number">00007f</span>fd30f0d210</span><br><span class="line"><span class="number">00007f</span>fd30f0d218: &lt;<span 
class="number">00007f</span>fd30f0d268  fffffffe7fffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d228:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d238:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d248:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d258:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d268:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d278:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d288:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d298:  ffffffffffffffff  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2a8:  <span class="number">00000000006</span>b68ba  <span class="number">0000000000000020</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2b8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2c8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2d8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2e8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2f8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span 
class="line"><span class="number">00007f</span>fd30f0d308:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line">runtime: unknown pc <span class="number">0x6b657e</span></span><br><span class="line">stack: frame=&#123;sp:<span class="number">0x7ffd30f0d218</span>, fp:<span class="number">0x0</span>&#125; stack=[<span class="number">0x7ffd2ab0e738</span>,<span class="number">0x7ffd30f0d760</span>)</span><br><span class="line"><span class="number">00007f</span>fd30f0d118:  <span class="number">0000000000000002</span>  <span class="number">00007f</span>fd30f7f184</span><br><span class="line"><span class="number">00007f</span>fd30f0d128:  <span class="number">000000000069</span>c31c  <span class="number">00007f</span>fd30f0d1a8</span><br><span class="line"><span class="number">00007f</span>fd30f0d138:  <span class="number">000000000045814</span>e &lt;runtime.callCgoMmap+<span class="number">62</span>&gt;  <span class="number">00007f</span>fd30f0d140</span><br><span class="line"><span class="number">00007f</span>fd30f0d148:  <span class="number">00007f</span>fd30f0d190  <span class="number">0000000000411</span>a88 &lt;runtime.persistentalloc1+<span class="number">456</span>&gt;</span><br><span class="line"><span class="number">00007f</span>fd30f0d158:  <span class="number">0000000000</span>bf6dd0  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d168:  <span class="number">0000000000010000</span>  <span class="number">0000000000000008</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d178:  <span class="number">0000000000</span>bf6dd8  <span class="number">0000000000</span>bf7ca0</span><br><span class="line"><span class="number">00007f</span>fd30f0d188:  <span class="number">00007f</span>dcbb4b7000  <span class="number">00007f</span>fd30f0d1c8</span><br><span class="line"><span 
class="number">00007f</span>fd30f0d198:  <span class="number">0000000000451205</span> &lt;runtime.persistentalloc.func1+<span class="number">69</span>&gt;  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d1a8:  <span class="number">0000000000000000</span>  <span class="number">0000000000</span>c1c080</span><br><span class="line"><span class="number">00007f</span>fd30f0d1b8:  <span class="number">00007f</span>dcbb4b7000  <span class="number">00007f</span>fd30f0d1e0</span><br><span class="line"><span class="number">00007f</span>fd30f0d1c8:  <span class="number">00007f</span>fd30f0d210  <span class="number">00007f</span>fd30f0d220</span><br><span class="line"><span class="number">00007f</span>fd30f0d1d8:  <span class="number">0000000000000000</span>  <span class="number">00000000000000f</span>1</span><br><span class="line"><span class="number">00007f</span>fd30f0d1e8:  <span class="number">0000000000000011</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d1f8:  <span class="number">000000000069</span>c31c  <span class="number">0000000000</span>c1c080</span><br><span class="line"><span class="number">00007f</span>fd30f0d208:  <span class="number">000000000045814</span>e &lt;runtime.callCgoMmap+<span class="number">62</span>&gt;  <span class="number">00007f</span>fd30f0d210</span><br><span class="line"><span class="number">00007f</span>fd30f0d218: &lt;<span class="number">00007f</span>fd30f0d268  fffffffe7fffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d228:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d238:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d248:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d258:  ffffffffffffffff  
ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d268:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d278:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d288:  ffffffffffffffff  ffffffffffffffff</span><br><span class="line"><span class="number">00007f</span>fd30f0d298:  ffffffffffffffff  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2a8:  <span class="number">00000000006</span>b68ba  <span class="number">0000000000000020</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2b8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2c8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2d8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2e8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d2f8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007f</span>fd30f0d308:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"></span><br><span class="line">goroutine <span class="number">1</span> [running]:</span><br><span class="line">runtime.systemstack_switch()</span><br><span class="line">/usr/local/go/src/runtime/asm_amd64.s:<span class="number">363</span> fp=<span class="number">0xc4200fe788</span> sp=<span 
class="number">0xc4200fe780</span> pc=<span class="number">0x454120</span></span><br><span class="line">runtime.main()</span><br><span class="line">/usr/local/go/src/runtime/proc.go:<span class="number">128</span> +<span class="number">0x63</span> fp=<span class="number">0xc4200fe7e0</span> sp=<span class="number">0xc4200fe788</span> pc=<span class="number">0x42bb83</span></span><br><span class="line">runtime.goexit()</span><br><span class="line">/usr/local/go/src/runtime/asm_amd64.s:<span class="number">2361</span> +<span class="number">0x1</span> fp=<span class="number">0xc4200fe7e8</span> sp=<span class="number">0xc4200fe7e0</span> pc=<span class="number">0x456c91</span></span><br><span class="line"></span><br><span class="line">rax    <span class="number">0x0</span></span><br><span class="line">rbx    <span class="number">0xbe2978</span></span><br><span class="line">rcx    <span class="number">0x6b657e</span></span><br><span class="line">rdx    <span class="number">0x0</span></span><br><span class="line">rdi    <span class="number">0x2</span></span><br><span class="line">rsi    <span class="number">0x7ffd30f0d1a0</span></span><br><span class="line">rbp    <span class="number">0x8347ce</span></span><br><span class="line">rsp    <span class="number">0x7ffd30f0d218</span></span><br><span class="line">r8     <span class="number">0x0</span></span><br><span class="line">r9     <span class="number">0x6</span></span><br><span class="line">r10    <span class="number">0x8</span></span><br><span class="line">r11    <span class="number">0x246</span></span><br><span class="line">r12    <span class="number">0x2bedc30</span></span><br><span class="line">r13    <span class="number">0xf1</span></span><br><span class="line">r14    <span class="number">0x11</span></span><br><span class="line">r15    <span class="number">0x0</span></span><br><span class="line">rip    <span class="number">0x6b657e</span></span><br><span class="line">rflags <span 
class="number">0x246</span></span><br><span class="line">cs     <span class="number">0x33</span></span><br><span class="line">fs     <span class="number">0x0</span></span><br><span class="line">gs     <span class="number">0x0</span></span><br><span class="line">exec failed: container_linux.go:<span class="number">348</span>: starting container process caused <span class="string">&quot;read init-p: connection reset by peer&quot;</span></span><br></pre></td></tr></table></figure><p>From the output above, the error clearly originates from runc.</p><h4 id="定位原因"><a href="#定位原因" class="headerlink" title="定位原因"></a>Locating the cause</h4><p>While pinpointing the failing component, runc also handed us a bonus: a detailed error log.</p><p>The log shows that runc exec failed because of <code>Resource temporarily unavailable</code>, a classic resource-exhaustion error. The usual resource limits involved include (see ulimit -a):</p><ul><li>the thread count hit its limit</li><li>the open-file count hit its limit</li><li>memory hit its limit</li></ul><p>We therefore went on to check the business container's monitoring metrics to identify which resource was exhausted.</p><p><img src="thread-monitor.png" alt="业务线程数监控指标"></p><p>The chart above shows the thread-count metrics of the business containers. Every container had reached 10,000 threads, which is exactly the platform's default per-container thread cap; the cap exists to keep a single leaky container from exhausting the host's threads.</p><figure class="highlight gradle"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"># cat <span class="regexp">/sys/</span>fs<span class="regexp">/cgroup/</span>pids<span class="regexp">/kubepods/</span>burstable<span class="regexp">/pod64a6c0e7-830c-11eb-86d6-b8cef604db88/</span>aa1e331ec24f621ab3152ebe94f1e533734164af86c9df0f551eab2b1967ec4e/pids.max</span><br><span class="line"><span class="number">10000</span></span><br></pre></td></tr></table></figure><p>At this point the root cause was fully identified. Yes, it really was that simple.</p><h3 id="runc梳理"><a href="#runc梳理" class="headerlink" title="runc梳理"></a>A walkthrough of runc</h3><p>Although we had identified what produced the error log, our picture of how runc actually works had always been fuzzy.</p><p>So we took the opportunity to walk through runc's workflow, using runc exec as the example.</p><ul><li>runc exec first starts a child process, runc init</li><li>runc init initializes the container namespaces<ul><li>runc init uses the C constructor feature so that the container namespace setup runs before any Go code starts</li><li>the C code nsexec calls clone twice, yielding three processes in total: the parent, the child, and the grandchild, which together finish initializing the container namespaces</li><li>the parent and the child exit once their initialization work is done; the grandchild, now inside the container namespaces, runs the Go-side initialization and waits for runc exec to send its configuration</li></ul></li><li>runc exec adds the grandchild to the container's cgroups</li><li>runc exec sends the configuration to the grandchild, chiefly the command to exec and its arguments</li><li>the grandchild calls system.Execv to run the user's command</li></ul><p>Note:</p><ul><li>step 2.c and step 3 run concurrently</li><li>runc exec and runc init communicate over a socket pair (init-p and init-c)</li></ul><p>The interaction among these processes during runc exec, along with the namespace and cgroup initialization, is shown in the figure below:</p><p><img src="runc-detail.png" alt="runc工作流程"></p><p>Combining this walkthrough of the runc exec flow with the error message it returned, we can pinpoint the code where runc exec produces the error:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span 
class="line">42</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *setnsProcess)</span> <span class="title">start</span><span class="params">()</span> <span class="params">(err error)</span></span> &#123;</span><br><span class="line">   <span class="keyword">defer</span> p.parentPipe.Close()</span><br><span class="line">   err = p.cmd.Start()</span><br><span class="line">   p.childPipe.Close()</span><br><span class="line">   <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">      <span class="keyword">return</span> newSystemErrorWithCause(err, <span class="string">&quot;starting setns process&quot;</span>)</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">if</span> p.bootstrapData != <span class="literal">nil</span> &#123;</span><br><span class="line">      <span class="keyword">if</span> _, err := io.Copy(p.parentPipe, p.bootstrapData); err != <span class="literal">nil</span> &#123;       <span class="comment">// clone flags and namespace config</span></span><br><span class="line">         <span class="keyword">return</span> newSystemErrorWithCause(err, <span class="string">&quot;copying bootstrap data to pipe&quot;</span>)</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">if</span> err = p.execSetns(); err != <span class="literal">nil</span> &#123;</span><br><span class="line">      <span class="keyword">return</span> newSystemErrorWithCause(err, <span class="string">&quot;executing setns process&quot;</span>)</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">if</span> <span class="built_in">len</span>(p.cgroupPaths) &gt; <span class="number">0</span> &#123;</span><br><span class="line">      <span class="keyword">if</span> err := cgroups.EnterPid(p.cgroupPaths, p.pid()); err != <span class="literal">nil</span> &#123;        <span class="comment">// add the runc init process to the container cgroups</span></span><br><span class="line">         <span class="keyword">return</span> newSystemErrorWithCausef(err, <span class="string">&quot;adding pid %d to cgroups&quot;</span>, p.pid())</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">if</span> err := utils.WriteJSON(p.parentPipe, p.config); err != <span class="literal">nil</span> &#123;            <span class="comment">// send the config: command, env vars, etc.</span></span><br><span class="line">      <span class="keyword">return</span> newSystemErrorWithCause(err, <span class="string">&quot;writing config to pipe&quot;</span>)</span><br><span class="line">   &#125;</span><br><span class="line"></span><br><span class="line">   ierr := parseSync(p.parentPipe, <span class="function"><span class="keyword">func</span><span class="params">(sync *syncT)</span> <span class="title">error</span></span> &#123;                  <span class="comment">// this is where read init-p: connection reset by peer is returned</span></span><br><span class="line">      <span class="keyword">switch</span> sync.Type &#123;</span><br><span class="line">      <span class="keyword">case</span> procReady:</span><br><span class="line">         <span class="comment">// This shouldn&#x27;t happen.</span></span><br><span class="line">         <span class="built_in">panic</span>(<span class="string">&quot;unexpected procReady in setns&quot;</span>)</span><br><span class="line">      <span class="keyword">case</span> procHooks:</span><br><span class="line">         <span class="comment">// This shouldn&#x27;t happen.</span></span><br><span class="line">         <span class="built_in">panic</span>(<span class="string">&quot;unexpected procHooks in setns&quot;</span>)</span><br><span class="line">      <span class="keyword">default</span>:</span><br><span class="line">         <span 
class="keyword">return</span> newSystemError(fmt.Errorf(<span class="string">&quot;invalid JSON payload from child&quot;</span>))</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;)</span><br><span class="line">   <span class="keyword">if</span> ierr != <span class="literal">nil</span> &#123;</span><br><span class="line">      p.wait()</span><br><span class="line">      <span class="keyword">return</span> ierr</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>With that, both the root-cause analysis and the code walkthrough are complete.</p><h3 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h3><ol><li><a href="https://www.kernel.org/doc/Documentation/cgroup-v1/pids.txt">https://www.kernel.org/doc/Documentation/cgroup-v1/pids.txt</a></li><li><a href="https://github.com/opencontainers/runc">https://github.com/opencontainers/runc</a></li></ol>]]></content>
    
    
      
      
    <summary type="html">&lt;p&gt;Hoeing the fields at high noon, on-call duty is bitter toil;&lt;/p&gt;
&lt;p&gt;Sweat drips into the soil below, one issue takes the whole afternoon.&lt;/p&gt;
&lt;h3 id=&quot;问题描述&quot;&gt;&lt;a href=&quot;#问题描述&quot; class=&quot;headerlink&quot; title=&quot;问题描述&quot;&gt;&lt;/a&gt;Problem description&lt;/h3&gt;&lt;p&gt;Today, while on call investigating a production issue, I noticed that the system log</summary>
      
    
    
    
    <category term="问题排查" scheme="https://plpan.github.io/categories/%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5/"/>
    
    
    <category term="docker" scheme="https://plpan.github.io/tags/docker/"/>
    
    <category term="containerd" scheme="https://plpan.github.io/tags/containerd/"/>
    
    <category term="runc" scheme="https://plpan.github.io/tags/runc/"/>
    
  </entry>
  
  <entry>
    <title>A troubleshooting journey: "device or resource busy" errors when deleting containers</title>
    <link href="https://plpan.github.io/%E5%88%A0%E9%99%A4%E5%AE%B9%E5%99%A8%E6%8A%A5%E9%94%99-device-or-resource-busy-%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/"/>
    <id>https://plpan.github.io/%E5%88%A0%E9%99%A4%E5%AE%B9%E5%99%A8%E6%8A%A5%E9%94%99-device-or-resource-busy-%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/</id>
    <published>2020-10-15T03:59:32.000Z</published>
    <updated>2020-11-12T14:53:22.099Z</updated>
    
    <content type="html"><![CDATA[<h3 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h3><p>Following up on the <a href="https://plpan.github.io/pod-terminating-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">previous post</a>: several recent production incidents on our elastic cloud platform were caused by a combination of factors, listed below:</p><ul><li>we were gradually rolling out a docker upgrade to <code>18.06.3-ce</code></li><li>for historical reasons, the docker service's systemd unit was configured with <code>MountFlags=slave</code></li><li>to keep business containers from being rebuilt when dockerd restarts, we enabled <code>live-restore=true</code>; after a docker restart, dockerd and the shim processes end up in different mnt namespaces</li></ul><p>Under the combined effect of these three factors, container deletion failed during container rebuilds and migrations.</p><p>As before, that post ended with two possible fixes:</p><ul><li>the long pain: patch the code to ignore the error</li><li>the short pain: change the configuration once and for all</li></ul><p>Being good successors to the socialist cause, we naturally chose the short pain! The official guidance says <code>MountFlags=slave</code> and <code>live-restore=true</code> cannot work together, so disabling either one should solve the problem.</p><p>For us, docker's <code>live-restore</code> capability is a key feature. docker may restart for all sorts of reasons, from manual debugging to unexpected machine behavior, and when it does we do not want user containers rebuilt as well. That seemed to leave disabling <code>MountFlags=slave</code> as our only option.</p><p>But wait. Recall the <a href="https://blog.terminus.io/docker-device-is-busy/">docker device busy write-up</a>: its author enabled this very option precisely to stop mount leaks from making container deletion fail.</p><p>Does that conclusion from 2017 still hold, though? The only way to know is to verify it ourselves.</p><h3 id="对比实验"><a href="#对比实验" class="headerlink" title="对比实验"></a>Comparison experiment</h3><p>To check whether docker leaks mount points once <code>MountFlags=slave</code> is disabled, we picked one host running <code>1.13.1</code> and one running <code>18.06.3-ce</code>. Following the steps suggested by the <a href="https://blog.terminus.io/docker-device-is-busy/">docker device busy write-up</a>, we prepared the environment as follows:</p><ul><li>remove the systemd option <code>MountFlags=slave</code> from the docker service</li><li>pick any service whose systemd unit enables <code>PrivateTmp=true</code>; this post uses <code>httpd</code></li></ul><p>Now to the verification:</p><figure class="highlight awk"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line"><span class="regexp">//</span><span class="regexp">//</span><span class="regexp">//</span> docker <span class="number">1.13</span>.<span class="number">1</span> verification steps and results</span><br><span class="line"><span class="regexp">//</span> <span class="number">1</span>. Reload the systemd configuration</span><br><span class="line">[stupig@hostname2 ~]$ sudo systemctl daemon-reload</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">2</span>. Restart docker</span><br><span class="line">[stupig@hostname2 ~]$ sudo systemctl restart docker</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">3</span>. Create a container</span><br><span class="line">[stupig@hostname2 ~]$ sudo docker run -d nginx</span><br><span class="line">c89c2aeff6e3e6414dfc7f448b4a560b4aac96d69a82ba021b78ee576bf6771c</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">4</span>. Restart httpd</span><br><span class="line">[stupig@hostname2 ~]$ sudo systemctl restart httpd</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">5</span>. Stop the container</span><br><span class="line">[stupig@hostname2 ~]$ sudo docker stop c89c2aeff6e3e6414dfc7f448b4a560b4aac96d69a82ba021b78ee576bf6771c</span><br><span class="line">c89c2aeff6e3e6414dfc7f448b4a560b4aac96d69a82ba021b78ee576bf6771c</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">6</span>. Remove the container</span><br><span class="line">[stupig@hostname2 ~]$ sudo docker rm c89c2aeff6e3e6414dfc7f448b4a560b4aac96d69a82ba021b78ee576bf6771c</span><br><span class="line">Error response from daemon: Driver overlay2 failed to remove root filesystem c89c2aeff6e3e6414dfc7f448b4a560b4aac96d69a82ba021b78ee576bf6771c: remove <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged: device or resource busy</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">7</span>. 
Locate the leaked mount points</span><br><span class="line">[stupig@hostname2 ~]$ grep -rwn <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346<span class="regexp">/merged /</span>proc<span class="regexp">/*/m</span>ountinfo</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">19973</span><span class="regexp">/mountinfo:40:231 227 0:40 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged rw,relatime shared:<span class="number">119</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">19974</span><span class="regexp">/mountinfo:40:231 227 0:40 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged rw,relatime shared:<span class="number">119</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">19975</span><span class="regexp">/mountinfo:40:231 227 0:40 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged rw,relatime shared:<span class="number">119</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">19976</span><span class="regexp">/mountinfo:40:231 227 0:40 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged rw,relatime 
shared:<span class="number">119</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">19977</span><span class="regexp">/mountinfo:40:231 227 0:40 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged rw,relatime shared:<span class="number">119</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">19978</span><span class="regexp">/mountinfo:40:231 227 0:40 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">6</span>c77cfb6c0c4b1e809c47af3c5ff6a4732a783cc14ff53270a7709c837c96346/merged rw,relatime shared:<span class="number">119</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> <span class="number">8</span>. Locate the target processes</span><br><span class="line">[stupig@hostname2 ~]$ ps -ef|egrep <span class="string">&#x27;19973|19974|19975|19976|19977|19978&#x27;</span></span><br><span class="line">root     <span class="number">19973</span>     <span class="number">1</span>  <span class="number">0</span> <span class="number">15</span>:<span class="number">13</span> ?        <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line">apache   <span class="number">19974</span> <span class="number">19973</span>  <span class="number">0</span> <span class="number">15</span>:<span class="number">13</span> ?        
<span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line">apache   <span class="number">19975</span> <span class="number">19973</span>  <span class="number">0</span> <span class="number">15</span>:<span class="number">13</span> ?        <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line">apache   <span class="number">19976</span> <span class="number">19973</span>  <span class="number">0</span> <span class="number">15</span>:<span class="number">13</span> ?        <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line">apache   <span class="number">19977</span> <span class="number">19973</span>  <span class="number">0</span> <span class="number">15</span>:<span class="number">13</span> ?        <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line">apache   <span class="number">19978</span> <span class="number">19973</span>  <span class="number">0</span> <span class="number">15</span>:<span class="number">13</span> ?        
<span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br></pre></td></tr></table></figure><p>docker <code>1.13.1</code> 版本的实验结果正如网文所料，容器读写层挂载点出现了泄漏，并且 <code>docker rm</code> 无法清理该容器（注意 <code>docker rm -f</code> 仍然可以清理，原因参考上文）。</p><p>弹性云启用docker配置 <code>MountFlags=slave</code> 也是为了避免该问题发生。</p><p>那么现在压力转移到 docker <code>18.06.3-ce</code> 这边来了，新版本是否仍然存在这个问题呢？</p><figure class="highlight llvm"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">////// docker <span class="number">18.06</span>.<span class="number">3</span>-ce 验证步骤及结果</span><br><span class="line">[stupig<span class="title">@hostname</span> ~]$ sudo systemctl daemon-reload</span><br><span class="line"> </span><br><span class="line">[stupig<span class="title">@hostname</span> ~]$ sudo systemctl restart docker</span><br><span class="line"> </span><br><span class="line">[stupig<span class="title">@hostname</span> ~]$ sudo docker run -d nginx</span><br><span class="line"><span class="number">718114321</span>d<span class="number">67</span>a<span class="number">817</span><span class="keyword">c</span><span class="number">1498e530</span>b<span class="number">943</span><span class="keyword">c</span><span class="number">2514</span>ed<span class="number">4200</span>f<span class="number">2</span>d<span class="number">0</span>d<span class="number">138880</span>f<span class="number">8</span><span 
class="keyword">c</span><span class="number">345</span>df<span class="number">7048</span>f</span><br><span class="line"> </span><br><span class="line">[stupig<span class="title">@hostname</span> ~]$ sudo systemctl restart httpd</span><br><span class="line"> </span><br><span class="line">[stupig<span class="title">@hostname</span> ~]$ sudo docker stop <span class="number">718114321</span>d<span class="number">67</span>a<span class="number">817</span><span class="keyword">c</span><span class="number">1498e530</span>b<span class="number">943</span><span class="keyword">c</span><span class="number">2514</span>ed<span class="number">4200</span>f<span class="number">2</span>d<span class="number">0</span>d<span class="number">138880</span>f<span class="number">8</span><span class="keyword">c</span><span class="number">345</span>df<span class="number">7048</span>f</span><br><span class="line"><span class="number">718114321</span>d<span class="number">67</span>a<span class="number">817</span><span class="keyword">c</span><span class="number">1498e530</span>b<span class="number">943</span><span class="keyword">c</span><span class="number">2514</span>ed<span class="number">4200</span>f<span class="number">2</span>d<span class="number">0</span>d<span class="number">138880</span>f<span class="number">8</span><span class="keyword">c</span><span class="number">345</span>df<span class="number">7048</span>f</span><br><span class="line"> </span><br><span class="line">[stupig<span class="title">@hostname</span> ~]$ sudo docker rm <span class="number">718114321</span>d<span class="number">67</span>a<span class="number">817</span><span class="keyword">c</span><span class="number">1498e530</span>b<span class="number">943</span><span class="keyword">c</span><span class="number">2514</span>ed<span class="number">4200</span>f<span class="number">2</span>d<span class="number">0</span>d<span class="number">138880</span>f<span class="number">8</span><span class="keyword">c</span><span 
class="number">345</span>df<span class="number">7048</span>f</span><br><span class="line"><span class="number">718114321</span>d<span class="number">67</span>a<span class="number">817</span><span class="keyword">c</span><span class="number">1498e530</span>b<span class="number">943</span><span class="keyword">c</span><span class="number">2514</span>ed<span class="number">4200</span>f<span class="number">2</span>d<span class="number">0</span>d<span class="number">138880</span>f<span class="number">8</span><span class="keyword">c</span><span class="number">345</span>df<span class="number">7048</span>f</span><br></pre></td></tr></table></figure><p>针对docker <code>18.06.3-ce</code> 的实验非常丝滑顺畅，不存在任何问题。回顾上文知识点，当容器读写层挂载点出现泄漏后，docker <code>18.06.3-ce</code> 清理容器必定失败，而现在的结果却成功了，说明容器读写层挂载点没有泄漏。</p><p>这简直就是黎明的曙光。</p><h3 id="蛛丝马迹"><a href="#蛛丝马迹" class="headerlink" title="蛛丝马迹"></a>蛛丝马迹</h3><p>上一节对比实验的结果给了我们莫大的鼓励，本节我们探索两个版本的docker的表现差异，以期定位症结所在。</p><p>既然核心问题在于挂载点是否被泄漏，那么我们就以挂载点为切入点，深入分析两个版本docker的差异性。我们对比在两个环境下执行完 <code>步骤4</code> 后，不同进程内的挂载详情，结果如下：</p><figure class="highlight gradle"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// docker 1.13.1</span></span><br><span class="line">[stupig@hostname2 ~]$ sudo docker run -d nginx</span><br><span class="line"><span 
class="number">0</span>fe8d412f99a53229ea0df3ec44c93496e150a39f724ea304adb7f924910d61b</span><br><span class="line"> </span><br><span class="line">[stupig@hostname2 ~]$ sudo docker <span class="keyword">inspect</span> -f &#123;&#123;.GraphDriver.Data.MergedDir&#125;&#125; <span class="number">0</span>fe8d412f99a53229ea0df3ec44c93496e150a39f724ea304adb7f924910d61b</span><br><span class="line"><span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">4</span>e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97/merged</span><br><span class="line"> </span><br><span class="line"><span class="comment">// 共享命名空间</span></span><br><span class="line">[stupig@hostname2 ~]$ <span class="keyword">grep</span> -rw <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">4</span>e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97<span class="regexp">/merged /</span>proc<span class="regexp">/$$/m</span>ountinfo</span><br><span class="line"><span class="number">223</span> <span class="number">1143</span> <span class="number">0</span>:<span class="number">40</span> <span class="regexp">/ /</span>home<span class="regexp">/docker_rt/</span>overlay2<span class="regexp">/4e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97/m</span>erged rw,relatime - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"> </span><br><span class="line">[stupig@hostname2 ~]$ sudo systemctl restart httpd</span><br><span class="line"> </span><br><span class="line">[stupig@hostname2 ~]$ ps -ef|<span class="keyword">grep</span> httpd|head -n <span class="number">1</span></span><br><span class="line">root     <span class="number">16715</span>     <span class="number">1</span>  <span class="number">2</span> <span class="number">16</span>:<span class="number">09</span> ?        
<span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line"> </span><br><span class="line"><span class="comment">// httpd进程命名空间</span></span><br><span class="line">[stupig@hostname2 ~]$ <span class="keyword">grep</span> -rw <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">4</span>e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97<span class="regexp">/merged /</span>proc<span class="regexp">/16715/m</span>ountinfo</span><br><span class="line"><span class="number">257</span> <span class="number">235</span> <span class="number">0</span>:<span class="number">40</span> <span class="regexp">/ /</span>home<span class="regexp">/docker_rt/</span>overlay2<span class="regexp">/4e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97/m</span>erged rw,relatime shared:<span class="number">123</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"></span><br></pre></td></tr></table></figure><figure class="highlight gradle"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// docker 18.06.3-ce</span></span><br><span class="line">[stupig@hostname ~]$ sudo docker run -d nginx</span><br><span 
class="line">ce75d4fdb6df6d13a7bf4270f71b3752ee2d3849df1f64d5d5d19a478ac7db8d</span><br><span class="line"> </span><br><span class="line">[stupig@hostname ~]$ sudo docker <span class="keyword">inspect</span> -f &#123;&#123;.GraphDriver.Data.MergedDir&#125;&#125; ce75d4fdb6df6d13a7bf4270f71b3752ee2d3849df1f64d5d5d19a478ac7db8d</span><br><span class="line"><span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76/merged</span><br><span class="line"> </span><br><span class="line"><span class="comment">// 共享命名空间</span></span><br><span class="line">[stupig@hostname ~]$ <span class="keyword">grep</span> -rw <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76<span class="regexp">/merged /</span>proc<span class="regexp">/$$/m</span>ountinfo</span><br><span class="line"><span class="number">218</span> <span class="number">43</span> <span class="number">0</span>:<span class="number">105</span> <span class="regexp">/ /</span>home<span class="regexp">/docker_rt/</span>overlay2<span class="regexp">/a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76/m</span>erged rw,relatime shared:<span class="number">109</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"> </span><br><span class="line">[stupig@hostname ~]$ sudo systemctl restart httpd</span><br><span class="line"> </span><br><span class="line">[stupig@hostname ~]$ ps -ef|<span class="keyword">grep</span> httpd|head -n <span class="number">1</span></span><br><span class="line">root      <span class="number">63694</span>      <span class="number">1</span>  <span class="number">0</span> <span class="number">16</span>:<span class="number">14</span> ?        
<span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> <span class="regexp">/usr/</span>sbin/httpd -DFOREGROUND</span><br><span class="line"> </span><br><span class="line"><span class="comment">// httpd进程命名空间</span></span><br><span class="line">[stupig@hostname ~]$ <span class="keyword">grep</span> -rw <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76<span class="regexp">/merged /</span>proc<span class="regexp">/63694/m</span>ountinfo</span><br><span class="line"><span class="number">435</span> <span class="number">376</span> <span class="number">0</span>:<span class="number">105</span> <span class="regexp">/ /</span>home<span class="regexp">/docker_rt/</span>overlay2<span class="regexp">/a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76/m</span>erged rw,relatime shared:<span class="number">122</span> master:<span class="number">109</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br></pre></td></tr></table></figure><p>乍一看，好像没啥区别啊！睁大你们的火眼金睛，是否发现差异所在了？</p><p>如果细心对比，还是很容易分辨出差异所在的：</p><ul><li>共享命名空间中<ul><li>docker <code>18.06.3-ce</code> 版本创建的挂载点是shared的</li><li>而docker <code>1.13.1</code> 版本创建的挂载点是private的</li></ul></li><li>httpd进程命名空间中<ul><li>docker <code>18.06.3-ce</code> 创建的挂载点仍然是共享的，并且接收共享组109传递的挂载与卸载事件，注意：共享组109正好就是共享命名空间中对应挂载点所属的共享组</li><li>而docker <code>1.13.1</code> 版本创建的挂载点虽然也是共享的，但是却与共享命名空间中对应的挂载点没有关联关系</li></ul></li></ul><p>可能会有用户不禁要问：怎么分辨挂载点是什么类型？以及不同类型挂载点的传递属性呢？请参阅：<a href="https://man7.org/linux/man-pages/man7/mount_namespaces.7.html">mount命名空间说明文档</a>。</p><p>问题已然明了，由于两个版本docker所创建的容器读写层挂载点具备不同的属性，导致它们之间的行为差异。</p><h3 id="刨根问底"><a href="#刨根问底" class="headerlink" 
title="刨根问底"></a>刨根问底</h3><p>相信大家如果理解了上一节的内容，就已经了解了问题的本质。本节我们继续探索问题的根因。</p><p>为什么两个版本的docker行为表现不一致？不外乎两个主要原因：</p><ol><li>docker处理逻辑发生变动</li><li>宿主环境不一致，主要指内核</li></ol><p>第二个因素很好排除，我们对比了两个测试环境的宿主内核版本，结果是一致的。所以，基本还是因docker代码升级而产生的行为不一致。理论上，我们只需逐个分析docker <code>1.13.1</code> 与 docker <code>18.06.3-ce</code> 两个版本间的所有提交记录，就一定能够定位到关键提交信息，大力总是会出现奇迹。</p><p>但是，我们还是希望能够从现场中发现有用信息，缩小检索范围。</p><p>仍然从挂载点切入，既然两个版本的docker所创建的挂载点在共享命名空间中就已经出现差异，我们顺藤摸瓜，找找容器读写层挂载点链路上是否存在差异：</p><figure class="highlight awk"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"><span class="regexp">//</span> docker <span class="number">1.13</span>.<span class="number">1</span></span><br><span class="line"><span class="regexp">//</span> 本挂载点</span><br><span class="line">[stupig@hostname2 ~]$ grep -rw <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span><span class="number">4</span>e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97<span class="regexp">/merged /</span>proc<span class="regexp">/$$/m</span>ountinfo</span><br><span 
class="line"><span class="number">223</span> <span class="number">1143</span> <span class="number">0</span>:<span class="number">40</span> <span class="regexp">/ /</span>home<span class="regexp">/docker_rt/</span>overlay2<span class="regexp">/4e09fa6803feab9d96fe72a44fb83d757c1788812ff60071ac2e62a5cf14cd97/m</span>erged rw,relatime - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> 定位本挂载点的父挂载点</span><br><span class="line">[stupig@hostname2 ~]$ grep -rw <span class="number">1143</span> <span class="regexp">/proc/</span>$$/mountinfo</span><br><span class="line"><span class="number">1143</span> <span class="number">44</span> <span class="number">8</span>:<span class="number">4</span> <span class="regexp">/docker_rt/</span>overlay2 <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2 rw,relatime - xfs /</span>dev/sda4 rw,attr2,inode64,logbsize=<span class="number">256</span>k,sunit=<span class="number">512</span>,swidth=<span class="number">512</span>,prjquota</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> 继续定位祖父挂载点</span><br><span class="line">[stupig@hostname2 ~]$ grep -rw <span class="number">44</span> <span class="regexp">/proc/</span>$$/mountinfo</span><br><span class="line"><span class="number">44</span> <span class="number">39</span> <span class="number">8</span>:<span class="number">4</span> <span class="regexp">/ /</span>home rw,relatime shared:<span class="number">28</span> - xfs <span class="regexp">/dev/</span>sda4 rw,attr2,inode64,logbsize=<span class="number">256</span>k,sunit=<span class="number">512</span>,swidth=<span class="number">512</span>,prjquota</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> 继续往上</span><br><span class="line">[stupig@hostname2 ~]$ grep -rw <span class="number">39</span> <span 
class="regexp">/proc/</span>$$/mountinfo</span><br><span class="line"><span class="number">39</span> <span class="number">1</span> <span class="number">8</span>:<span class="number">3</span> <span class="regexp">/ /</span> rw,relatime shared:<span class="number">1</span> - ext4 <span class="regexp">/dev/</span>sda3 rw,stripe=<span class="number">64</span>,data=ordered</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> docker <span class="number">18.06</span>.<span class="number">3</span>-ce</span><br><span class="line"><span class="regexp">//</span> 本挂载点</span><br><span class="line">[stupig@hostname ~]$ grep -rw <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76<span class="regexp">/merged /</span>proc<span class="regexp">/$$/m</span>ountinfo</span><br><span class="line"><span class="number">218</span> <span class="number">43</span> <span class="number">0</span>:<span class="number">105</span> <span class="regexp">/ /</span>home<span class="regexp">/docker_rt/</span>overlay2<span class="regexp">/a9823ed6b3c5a752eaa92072ff9d91dbe1467ceece3eedf613bf6ffaa5183b76/m</span>erged rw,relatime shared:<span class="number">109</span> - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> 定位本挂载点的父挂载点</span><br><span class="line">[stupig@hostname ~]$ grep -rw <span class="number">43</span> <span class="regexp">/proc/</span>$$/mountinfo</span><br><span class="line"><span class="number">43</span> <span class="number">61</span> <span class="number">8</span>:<span class="number">17</span> <span class="regexp">/ /</span>home rw,noatime shared:<span class="number">29</span> - xfs <span class="regexp">/dev/</span>sdb1 rw,attr2,nobarrier,inode64,prjquota</span><br><span class="line"> </span><br><span class="line"><span class="regexp">//</span> 
继续定位祖父挂载点</span><br><span class="line">[stupig@hostname ~]$ grep -rw <span class="number">61</span> <span class="regexp">/proc/</span>$$/mountinfo</span><br><span class="line"><span class="number">61</span> <span class="number">1</span> <span class="number">8</span>:<span class="number">3</span> <span class="regexp">/ /</span> rw,relatime shared:<span class="number">1</span> - ext4 <span class="regexp">/dev/</span>sda3 rw,data=ordered</span><br></pre></td></tr></table></figure><p>两个版本的docker所创建的容器读写层挂载点链路上差异还是非常明显的：</p><ul><li>容器读写层挂载点的父级挂载点不同<ul><li>docker <code>18.06.3-ce</code> 创建的容器读写层挂载点的父级挂载点是 <code>/home/</code> ，并且是共享的</li><li>docker <code>1.13.1</code> 创建的容器读写层挂载点的父级挂载点是 <code>/home/docker_rt/overlay2</code> ，并且是私有的</li></ul></li></ul><p>这里补充一个背景，弹性云机器在初始化阶段，会将 <code>/home</code> 初始化为xfs文件系统类型，因此所有宿主上 <code>/home</code> 挂载点都具备相同属性。</p><p>那么，问题基本就是由 docker <code>1.13.1</code> 中多出的一层挂载层 <code>/home/docker_rt/overlay2</code> 引起。</p><p>如何验证这个猜想呢？现在，其实我们已经具备了检索代码的关键目标，docker <code>1.13.1</code> 会设置容器镜像层根目录的传递属性。拿着这个先验知识，我们直接查代码，检索过程基本没费什么功夫，直接展示相关代码：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span 
class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// filepath: daemon/graphdriver/overlay2/overlay.go</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">init</span><span class="params">()</span></span> &#123;</span><br><span class="line">   graphdriver.Register(driverName, Init)</span><br><span class="line">&#125;</span><br><span class="line"> </span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">Init</span><span class="params">(home <span class="keyword">string</span>, options []<span class="keyword">string</span>, uidMaps, gidMaps []idtools.IDMap)</span> <span class="params">(graphdriver.Driver, error)</span></span> &#123;</span><br><span class="line">   <span class="keyword">if</span> err := mount.MakePrivate(home); err != <span class="literal">nil</span> &#123;</span><br><span class="line">      <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">   &#125;</span><br><span class="line"> </span><br><span class="line">   supportsDType, err := fsutils.SupportsDType(home)</span><br><span class="line">   <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">      <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">if</span> !supportsDType &#123;</span><br><span class="line">      <span class="comment">// not a fatal error until v1.16 (#27443)</span></span><br><span 
class="line">      logrus.Warn(overlayutils.ErrDTypeNotSupported(<span class="string">&quot;overlay2&quot;</span>, backingFs))</span><br><span class="line">   &#125;</span><br><span class="line"> </span><br><span class="line">   d := &amp;Driver&#123;</span><br><span class="line">      home:          home,</span><br><span class="line">      uidMaps:       uidMaps,</span><br><span class="line">      gidMaps:       gidMaps,</span><br><span class="line">      ctr:           graphdriver.NewRefCounter(graphdriver.NewFsChecker(graphdriver.FsMagicOverlay)),</span><br><span class="line">      supportsDType: supportsDType,</span><br><span class="line">   &#125;</span><br><span class="line"> </span><br><span class="line">   d.naiveDiff = graphdriver.NewNaiveDiffDriver(d, uidMaps, gidMaps)</span><br><span class="line"> </span><br><span class="line">   <span class="keyword">if</span> backingFs == <span class="string">&quot;xfs&quot;</span> &#123;</span><br><span class="line">      <span class="comment">// Try to enable project quota support over xfs.</span></span><br><span class="line">      <span class="keyword">if</span> d.quotaCtl, err = quota.NewControl(home); err == <span class="literal">nil</span> &#123;</span><br><span class="line">         projectQuotaSupported = <span class="literal">true</span></span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line"> </span><br><span class="line">   <span class="keyword">return</span> d, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>很明显，问题就出在 <code>mount.MakePrivate</code> 函数调用上。</p><p>官方将 <code>GraphDriver</code> 根目录设置为 <code>Private</code>，本意是为了避免容器读写层挂载点泄漏。那为什么在高版本中去掉了这个逻辑呢？显然官方也意识到这么做并不能实现期望的目的，官方也在<a href="https://github.com/moby/moby/pull/36047">修复</a>中给出了详细说明。</p><p>实际上，不设置 <code>GraphDriver</code> 根目录的传播属性，反而能避免绝大多数挂载点泄漏的问题。。。</p><h3 id="结语"><a href="#结语" class="headerlink" 
title="结语"></a>结语</h3><p>现在，我们已经了解了问题的来龙去脉，我们总结一下问题的解决方案：</p><ul><li>针对 <code>1.13.1</code> 版本docker，存量宿主较多，我们可以忽略 <code>device or resource busy</code> 问题，基本也不会给线上服务带来什么影响</li><li>针对 <code>18.06.3-ce</code> 版本docker，存量宿主较少，我们删除docker服务的systemd配置项 <code>MountFlags</code>，通过故障自愈解决docker卡死问题</li><li>针对增量宿主，全部删除docker服务的systemd配置项 <code>MountFlags</code></li></ul><p>最后，告诫大家不要迷信网络解决方案，甚至是官方。</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h3 id=&quot;背景&quot;&gt;&lt;a href=&quot;#背景&quot; class=&quot;headerlink&quot; title=&quot;背景&quot;&gt;&lt;/a&gt;背景&lt;/h3&gt;&lt;p&gt;承接&lt;a href=&quot;https://plpan.github.io/pod-terminating-%E6%8E%92%E6%9F%A5%</summary>
      
    
    
    
    <category term="问题排查" scheme="https://plpan.github.io/categories/%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5/"/>
    
    
    <category term="docker" scheme="https://plpan.github.io/tags/docker/"/>
    
    <category term="linux" scheme="https://plpan.github.io/tags/linux/"/>
    
  </entry>
  
  <entry>
    <title>pod terminating 排查之旅</title>
    <link href="https://plpan.github.io/pod-terminating-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/"/>
    <id>https://plpan.github.io/pod-terminating-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/</id>
    <published>2020-10-14T10:53:03.000Z</published>
    <updated>2020-11-12T14:52:44.125Z</updated>
    
    <content type="html"><![CDATA[<h3 id="背景"><a href="#背景" class="headerlink" title="背景"></a>背景</h3><p>近期，弹性云线上集群发生了几起特殊的容器漂移失败事件，其特殊之处在于容器处于Pod Terminating状态，而宿主则处于Ready状态。</p><p>宿主状态为Ready说明其能够正常处理Pod事件，但是Pod却卡在了退出阶段，说明此问题并非由kubelet引起，那么docker就是1号犯罪嫌疑人了。</p><p>下文将详细介绍问题的排查与分析全过程。</p><h3 id="抽丝剥茧"><a href="#抽丝剥茧" class="headerlink" title="抽丝剥茧"></a>抽丝剥茧</h3><h4 id="排除kubelet嫌疑"><a href="#排除kubelet嫌疑" class="headerlink" title="排除kubelet嫌疑"></a>排除kubelet嫌疑</h4><p>Pod状态如下：</p><figure class="highlight angelscript"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="string">[stupig@master ~]</span>$ kubectl <span class="keyword">get</span> pod -owide</span><br><span class="line">pod<span class="number">-976</span>a0<span class="number">-5</span>              <span class="number">0</span>/<span class="number">1</span>     Terminating        <span class="number">0</span>          <span class="number">112</span>m</span><br></pre></td></tr></table></figure><p>尽管kubelet的犯罪嫌疑已经很小，但是我们还是需要排查kubelet日志进一步确认。截取kubelet关键日志片段如下：</p><figure class="highlight apache"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="attribute">I1014</span> <span class="number">10</span>:<span class="number">56</span>:<span class="number">46</span>.<span class="number">492682</span>   <span class="number">34976</span> kubelet_pods.go:<span class="number">1017</span>] Pod <span class="string">&quot;pod-976a0-5_default(f1e03a3d-0dc7-11eb-b4b1-246e967c4efc)&quot;</span> is terminated, but some containers have not been cleaned up: &#123;ID:&#123;Type:docker ID:<span class="number">41020461</span>ed<span class="number">4</span>d<span class="number">801</span>afa<span class="number">8</span>d<span class="number">10847</span>a<span 
class="number">16907</span>e<span class="number">65</span>f<span class="number">6</span>e<span class="number">8</span>ca<span class="number">34</span>d<span class="number">1704</span>edf<span class="number">15</span>b<span class="number">0</span>d<span class="number">0</span>e<span class="number">72</span>bf<span class="number">4</span>ef&#125; Name:stupig State:exited CreatedAt:<span class="number">2020</span>-<span class="number">10</span>-<span class="number">14</span> <span class="number">10</span>:<span class="number">49</span>:<span class="number">57</span>.<span class="number">859913657</span> +<span class="number">0800</span> CST StartedAt:<span class="number">2020</span>-<span class="number">10</span>-<span class="number">14</span> <span class="number">10</span>:<span class="number">49</span>:<span class="number">57</span>.<span class="number">928654495</span> +<span class="number">0800</span> CST FinishedAt:<span class="number">2020</span>-<span class="number">10</span>-<span class="number">14</span> <span class="number">10</span>:<span class="number">50</span>:<span class="number">28</span>.<span class="number">661263065</span> +<span class="number">0800</span> CST ExitCode:<span class="number">0</span> Hash:<span class="number">2101852810</span> HashWithoutResources:<span class="number">2673273670</span> RestartCount:<span class="number">0</span> Reason:Completed Message: Resources:map[CpuQuota:<span class="number">200000</span> Memory:<span class="number">2147483648</span> MemorySwap:<span class="number">2147483648</span>]&#125;</span><br><span class="line"><span class="attribute">E1014</span> <span class="number">10</span>:<span class="number">56</span>:<span class="number">46</span>.<span class="number">709255</span>   <span class="number">34976</span> remote_runtime.go:<span class="number">250</span>] RemoveContainer <span class="string">&quot;41020461ed4d801afa8d10847a16907e65f6e8ca34d1704edf15b0d0e72bf4ef&quot;</span> from runtime service failed: 
rpc error: code = Unknown desc = failed to remove container <span class="string">&quot;41020461ed4d801afa8d10847a16907e65f6e8ca34d1704edf15b0d0e72bf4ef&quot;</span>: Error response from daemon: container <span class="number">41020461</span>ed<span class="number">4</span>d<span class="number">801</span>afa<span class="number">8</span>d<span class="number">10847</span>a<span class="number">16907</span>e<span class="number">65</span>f<span class="number">6</span>e<span class="number">8</span>ca<span class="number">34</span>d<span class="number">1704</span>edf<span class="number">15</span>b<span class="number">0</span>d<span class="number">0</span>e<span class="number">72</span>bf<span class="number">4</span>ef: driver <span class="string">&quot;overlay2&quot;</span> failed to remove root filesystem: unlinkat /home/docker_rt/overlay<span class="number">2</span>/e<span class="number">5</span>dab<span class="number">77</span>be<span class="number">213</span>d<span class="number">9</span>f<span class="number">9</span>cfc<span class="number">0</span>b<span class="number">0</span>b<span class="number">3281</span>dbef<span class="number">9</span>c<span class="number">2878</span>fee<span class="number">3</span>b<span class="number">8</span>e<span class="number">406</span>bc<span class="number">8</span>ab<span class="number">97</span>adc<span class="number">30</span>ae<span class="number">4</span>d<span class="number">5</span>/merged: device or resource busy</span><br><span class="line"><span class="attribute">E1014</span> <span class="number">10</span>:<span class="number">56</span>:<span class="number">46</span>.<span class="number">709292</span>   <span class="number">34976</span> kuberuntime_gc.go:<span class="number">126</span>] Failed to remove container <span class="string">&quot;41020461ed4d801afa8d10847a16907e65f6e8ca34d1704edf15b0d0e72bf4ef&quot;</span>: rpc error: code = Unknown desc = failed to remove container <span 
class="string">&quot;41020461ed4d801afa8d10847a16907e65f6e8ca34d1704edf15b0d0e72bf4ef&quot;</span>: Error response from daemon: container <span class="number">41020461</span>ed<span class="number">4</span>d<span class="number">801</span>afa<span class="number">8</span>d<span class="number">10847</span>a<span class="number">16907</span>e<span class="number">65</span>f<span class="number">6</span>e<span class="number">8</span>ca<span class="number">34</span>d<span class="number">1704</span>edf<span class="number">15</span>b<span class="number">0</span>d<span class="number">0</span>e<span class="number">72</span>bf<span class="number">4</span>ef: driver <span class="string">&quot;overlay2&quot;</span> failed to remove root filesystem: unlinkat /home/docker_rt/overlay<span class="number">2</span>/e<span class="number">5</span>dab<span class="number">77</span>be<span class="number">213</span>d<span class="number">9</span>f<span class="number">9</span>cfc<span class="number">0</span>b<span class="number">0</span>b<span class="number">3281</span>dbef<span class="number">9</span>c<span class="number">2878</span>fee<span class="number">3</span>b<span class="number">8</span>e<span class="number">406</span>bc<span class="number">8</span>ab<span class="number">97</span>adc<span class="number">30</span>ae<span class="number">4</span>d<span class="number">5</span>/merged: device or resource busy</span><br></pre></td></tr></table></figure><p>The logs make the cause of the Pod's Terminating state clear: kubelet failed to clean up the container.</p><p>kubelet removes containers with <code>docker rm -f</code>, which failed here because deleting the container directory <code>xxx/merged</code> returned the error <code>device or resource busy</code>.</p><p>Beyond that, kubelet offers no further clues.</p><p>Logging in to the host, we check the state of the container in question:</p><figure class="highlight subunit"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[stupig@hostname ~]$ sudo docker 
ps -a | grep pod<span class="string">-976</span>a0<span class="string">-5</span></span><br><span class="line">41020461ed4d            Removal In Progress                            k8s_stupig_pod<span class="string">-976</span>a0<span class="string">-5</span>_default_f1e03a3d<span class="string">-0</span>dc7<span class="string">-11</span>eb-b4b1<span class="string">-246</span>e967c4efc_0</span><br><span class="line">f0a75e10b252            Exited (0) 2 minutes ago                       k8s_POD_pod<span class="string">-976</span>a0<span class="string">-5</span>_default_f1e03a3d<span class="string">-0</span>dc7<span class="string">-11</span>eb-b4b1<span class="string">-246</span>e967c4efc_0</span><br><span class="line">[stupig@hostname ~]$ sudo docker rm -f 41020461ed4d</span><br><span class="line"><span class="keyword">Error </span>response from daemon: container 41020461ed4d801afa8d10847a16907e65f6e8ca34d1704edf15b0d0e72bf4ef: driver &quot;overlay2&quot; failed to remove root filesystem: unlinkat /home/docker_rt/overlay2/e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged: device or resource busy</span><br></pre></td></tr></table></figure><p>The failing call is now clear, and we have two ways to proceed:</p><ul><li>Follow the common fixes for the <code>device or resource busy</code> error found via Google</li><li>Analyze the code in light of the observed symptoms</li></ul><h4 id="Google大法"><a href="#Google大法" class="headerlink" title="Google大法"></a>Asking Google</h4><p>When in doubt, ask Google! A search shows that plenty of people have run into similar problems.</p><p>The prevailing fix online is to set the docker service's MountFlags to slave, preventing docker's mount points from leaking into other mnt namespaces; for the details, see this write-up: <a href="https://blog.terminus.io/docker-device-is-busy/">a solution to the docker device-busy problem</a>.</p><p>That simple? Hardly. A check shows our docker service is already configured with MountFlags set to slave, so the internet's silver bullet misses once again.</p><p>So we roll up our sleeves and analyze the code against the evidence at hand.</p><h4 id="docker处理流程"><a href="#docker处理流程" class="headerlink" title="docker处理流程"></a>docker's processing flow</h4><p>Before digging into the docker source, here is a brief overview of docker's processing flow, so we are not fumbling around blindly.</p><p><img src="docker-procedure.png" alt="docker处理流程"></p><p>With docker's processing flow in mind, let's return to the scene.</p><h4 id="提审docker"><a href="#提审docker" class="headerlink" 
title="提审docker"></a>Interrogating docker</h4><p>The failure happens in docker's cleanup phase: removing the container's writable layer errors out with <code>device or resource busy</code>, which suggests the writable layer was not unmounted correctly, or not unmounted completely. The following command verifies this conclusion:</p><figure class="highlight gradle"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[stupig@hostname ~]$ <span class="keyword">grep</span> -rwn <span class="string">&#x27;/home/docker_rt/overlay2/e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged&#x27;</span> <span class="regexp">/proc/</span>*/mountinfo</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">22283</span><span class="regexp">/mountinfo:50:386 542 0:92 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged rw,relatime - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">22407</span><span class="regexp">/mountinfo:50:386 542 0:92 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged rw,relatime - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">28454</span><span class="regexp">/mountinfo:50:386 542 0:92 /</span> <span class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged rw,relatime - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br><span class="line"><span class="regexp">/proc/</span><span class="number">28530</span><span class="regexp">/mountinfo:50:386 542 0:92 /</span> <span 
class="regexp">/home/</span>docker_rt<span class="regexp">/overlay2/</span>e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged rw,relatime - overlay overlay rw,lowerdir=XXX,upperdir=XXX,workdir=XXX</span><br></pre></td></tr></table></figure><p>As expected, the container's writable layer is still mounted in the namespaces of the four processes above, which is exactly why docker errors out when removing the writable-layer directory.</p><p>The next question is: why didn't docker unmount the writable layer correctly? Let's first look at the part of the <code>docker stop</code> path that unmounts the container's writable layer:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(daemon *Daemon)</span> <span class="title">Cleanup</span><span class="params">(container *container.Container)</span></span> &#123;</span><br><span class="line">   <span class="keyword">if</span> 
err := daemon.conditionalUnmountOnCleanup(container); err != <span class="literal">nil</span> &#123;</span><br><span class="line">      <span class="keyword">if</span> mountid, err := daemon.imageService.GetLayerMountID(container.ID, container.OS); err == <span class="literal">nil</span> &#123;</span><br><span class="line">         daemon.cleanupMountsByID(mountid)</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(daemon *Daemon)</span> <span class="title">conditionalUnmountOnCleanup</span><span class="params">(container *container.Container)</span> <span class="title">error</span></span> &#123;</span><br><span class="line">   <span class="keyword">return</span> daemon.Unmount(container)</span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(daemon *Daemon)</span> <span class="title">Unmount</span><span class="params">(container *container.Container)</span> <span class="title">error</span></span> &#123;</span><br><span class="line">   <span class="keyword">if</span> container.RWLayer == <span class="literal">nil</span> &#123;</span><br><span class="line">      <span class="keyword">return</span> errors.New(<span class="string">&quot;RWLayer of container &quot;</span> + container.ID + <span class="string">&quot; is unexpectedly nil&quot;</span>)</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">if</span> err := container.RWLayer.Unmount(); err != <span class="literal">nil</span> &#123;</span><br><span class="line">      logrus.Errorf(<span class="string">&quot;Error unmounting container %s: %s&quot;</span>, container.ID, err)</span><br><span class="line">      <span class="keyword">return</span> err</span><br><span class="line">   &#125;</span><br><span 
class="line"> </span><br><span class="line">   <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(rl *referencedRWLayer)</span> <span class="title">Unmount</span><span class="params">()</span> <span class="title">error</span></span> &#123;</span><br><span class="line">   <span class="keyword">return</span> rl.layerStore.driver.Put(rl.mountedLayer.mountID)</span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(d *Driver)</span> <span class="title">Put</span><span class="params">(id <span class="keyword">string</span>)</span> <span class="title">error</span></span> &#123;</span><br><span class="line">   d.locker.Lock(id)</span><br><span class="line">   <span class="keyword">defer</span> d.locker.Unlock(id)</span><br><span class="line">   dir := d.dir(id)</span><br><span class="line">   mountpoint := path.Join(dir, <span class="string">&quot;merged&quot;</span>)</span><br><span class="line">   logger := logrus.WithField(<span class="string">&quot;storage-driver&quot;</span>, <span class="string">&quot;overlay2&quot;</span>)</span><br><span class="line">   <span class="keyword">if</span> err := unix.Unmount(mountpoint, unix.MNT_DETACH); err != <span class="literal">nil</span> &#123;</span><br><span class="line">      logger.Debugf(<span class="string">&quot;Failed to unmount %s overlay: %s - %v&quot;</span>, id, mountpoint, err)</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">if</span> err := unix.Rmdir(mountpoint); err != <span class="literal">nil</span> &amp;&amp; !os.IsNotExist(err) &#123;</span><br><span class="line">      logger.Debugf(<span class="string">&quot;Failed to remove %s overlay: %v&quot;</span>, id, err)</span><br><span class="line">   &#125;</span><br><span 
class="line">   <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>The code path is clear: docker ultimately issues a <code>SYS_UMOUNT2</code> system call to unmount the container's writable layer.</p><p>Yet docker reported an error while cleaning up the writable layer, and the layer's mount information also shows up in other processes. Could docker have skipped the unmount entirely? The docker logs tell us more:</p><figure class="highlight routeros"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">Oct 14 10:50:28 hostname dockerd: <span class="attribute">time</span>=<span class="string">&quot;2020-10-14T10:50:28.769199725+08:00&quot;</span> <span class="attribute">level</span>=debug <span class="attribute">msg</span>=<span class="string">&quot;Failed to unmount e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5 overlay: /home/docker_rt/overlay2/e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5/merged - invalid argument&quot;</span> <span class="attribute">storage-driver</span>=overlay2</span><br><span class="line">Oct 14 10:50:28 hostname dockerd: <span class="attribute">time</span>=<span class="string">&quot;2020-10-14T10:50:28.769213547+08:00&quot;</span> <span class="attribute">level</span>=debug <span class="attribute">msg</span>=<span class="string">&quot;Failed to remove e5dab77be213d9f9cfc0b0b3281dbef9c2878fee3b8e406bc8ab97adc30ae4d5 overlay: device or resource busy&quot;</span> <span class="attribute">storage-driver</span>=overlay2</span><br></pre></td></tr></table></figure><p>The logs show that the unmount command itself failed, with <code>invalid argument</code>. Per the <a href="https://man7.org/linux/man-pages/man2/umount.2.html">umount2</a> documentation, this means the writable layer is not even a mount point in the namespace of dockerd (the docker daemon)?!</p><p>Now, turning back to the processes that still hold the writable-layer mount, we find something startling:</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td 
class="code"><pre><span class="line">[<span class="string">stupig@hostname</span> <span class="string">~</span>]<span class="string">$</span> <span class="string">ps</span> <span class="string">-ef|grep</span> <span class="string">-E</span> <span class="string">&quot;22283|22407|28454|28530&quot;</span></span><br><span class="line"><span class="string">root</span>      <span class="number">22283</span>      <span class="number">1</span>  <span class="number">0</span> <span class="number">10</span><span class="string">:48</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> <span class="string">docker-containerd-shim</span> <span class="string">-namespace</span> <span class="string">moby</span></span><br><span class="line"><span class="string">root</span>      <span class="number">22407</span>      <span class="number">1</span>  <span class="number">0</span> <span class="number">10</span><span class="string">:48</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> <span class="string">docker-containerd-shim</span> <span class="string">-namespace</span> <span class="string">moby</span></span><br><span class="line"><span class="string">root</span>      <span class="number">28454</span>      <span class="number">1</span>  <span class="number">0</span> <span class="number">10</span><span class="string">:49</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> <span class="string">docker-containerd-shim</span> <span class="string">-namespace</span> <span class="string">moby</span></span><br><span class="line"><span class="string">root</span>      <span class="number">28530</span>      <span class="number">1</span>  <span class="number">0</span> <span class="number">10</span><span class="string">:49</span> <span class="string">?</span>        <span class="number">00</span><span class="string">:00:00</span> 
<span class="string">docker-containerd-shim</span> <span class="string">-namespace</span> <span class="string">moby</span></span><br></pre></td></tr></table></figure><p>The writable layer's mount does not appear in the dockerd process's namespace, yet it appears in the namespaces of the shim processes hosting other containers. The inference is that dockerd was restarted; comparing process start times against namespace details bears this out:</p><figure class="highlight angelscript"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line"><span class="string">[stupig@hostname ~]</span>$ ps -eo pid,cmd,lstart|grep dockerd</span><br><span class="line"> <span class="number">34836</span> /usr/bin/dockerd --storage- Wed Oct <span class="number">14</span> <span 
class="number">10</span>:<span class="number">50</span>:<span class="number">15</span> <span class="number">2020</span></span><br><span class="line"> </span><br><span class="line"><span class="string">[stupig@hostname ~]</span>$ sudo ls -la /proc/$(pidof dockerd)/ns</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> ipc -&gt; ipc:[<span class="number">4026531839</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> mnt -&gt; mnt:[<span class="number">4026533327</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> net -&gt; net:[<span class="number">4026531968</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> pid -&gt; pid:[<span class="number">4026531836</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> user -&gt; user:[<span class="number">4026531837</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> uts -&gt; uts:[<span class="number">4026531838</span>]</span><br><span class="line"> </span><br><span class="line"><span class="string">[stupig@hostname ~]</span>$ ps -eo pid,cmd,lstart|grep -w containerd|grep -v 
shim</span><br><span class="line"> <span class="number">34849</span> docker-containerd --config  Wed Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span>:<span class="number">15</span> <span class="number">2020</span></span><br><span class="line"> </span><br><span class="line"><span class="string">[stupig@hostname ~]</span>$ sudo ls -la /proc/$(pidof docker-containerd)/ns</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> ipc -&gt; ipc:[<span class="number">4026531839</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> mnt -&gt; mnt:[<span class="number">4026533327</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> net -&gt; net:[<span class="number">4026531968</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> pid -&gt; pid:[<span class="number">4026531836</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> user -&gt; user:[<span class="number">4026531837</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> uts -&gt; uts:[<span 
class="number">4026531838</span>]</span><br><span class="line"> </span><br><span class="line"><span class="string">[stupig@hostname ~]</span>$ ps -eo pid,cmd,lstart|grep -w containerd-shim</span><br><span class="line"> <span class="number">22283</span> docker-containerd-shim -nam Wed Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">48</span>:<span class="number">50</span> <span class="number">2020</span></span><br><span class="line"> <span class="number">22407</span> docker-containerd-shim -nam Wed Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">48</span>:<span class="number">55</span> <span class="number">2020</span></span><br><span class="line"> <span class="number">28454</span> docker-containerd-shim -nam Wed Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">49</span>:<span class="number">53</span> <span class="number">2020</span></span><br><span class="line"> <span class="number">28530</span> docker-containerd-shim -nam Wed Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">49</span>:<span class="number">53</span> <span class="number">2020</span></span><br><span class="line"> </span><br><span class="line"><span class="string">[stupig@hostname ~]</span>$ sudo ls -la /proc/<span class="number">28454</span>/ns</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> ipc -&gt; ipc:[<span class="number">4026531839</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> mnt -&gt; mnt:[<span class="number">4026533200</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span 
class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> net -&gt; net:[<span class="number">4026531968</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> pid -&gt; pid:[<span class="number">4026531836</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> user -&gt; user:[<span class="number">4026531837</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> root root <span class="number">0</span> Oct <span class="number">14</span> <span class="number">10</span>:<span class="number">50</span> uts -&gt; uts:[<span class="number">4026531838</span>]</span><br><span class="line"> </span><br><span class="line"><span class="string">[stupig@hostname ~]</span>$ sudo ls -la /proc/$$/ns</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> stupig stupig <span class="number">0</span> Oct <span class="number">14</span> <span class="number">21</span>:<span class="number">49</span> ipc -&gt; ipc:[<span class="number">4026531839</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> stupig stupig <span class="number">0</span> Oct <span class="number">14</span> <span class="number">21</span>:<span class="number">49</span> mnt -&gt; mnt:[<span class="number">4026531840</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> stupig stupig <span class="number">0</span> Oct <span class="number">14</span> <span class="number">21</span>:<span class="number">49</span> net -&gt; net:[<span class="number">4026531968</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> stupig stupig <span 
class="number">0</span> Oct <span class="number">14</span> <span class="number">21</span>:<span class="number">49</span> pid -&gt; pid:[<span class="number">4026531836</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> stupig stupig <span class="number">0</span> Oct <span class="number">14</span> <span class="number">21</span>:<span class="number">49</span> user -&gt; user:[<span class="number">4026531837</span>]</span><br><span class="line">lrwxrwxrwx <span class="number">1</span> stupig stupig <span class="number">0</span> Oct <span class="number">14</span> <span class="number">21</span>:<span class="number">49</span> uts -&gt; uts:[<span class="number">4026531838</span>]</span><br></pre></td></tr></table></figure><p>结果验证了我们推断的正确性。现在再补充下docker组件的进程树模型，用以解释这个现象，模型如下：</p><figure class="highlight brainfuck"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">                       <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span>                      </span><br><span class="line">                       <span class="comment">|</span>   <span class="comment">dockerd</span>   <span class="comment">|</span>                      </span><br><span class="line">                       <span 
class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="comment">|</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span>                      </span><br><span class="line">                              <span class="comment">|</span>                             </span><br><span class="line">                       <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="comment">|</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span>                      </span><br><span class="line">                       <span class="comment">|</span> <span class="comment">containerd</span>  <span class="comment">|</span>                      </span><br><span class="line">                       <span class="literal">+</span>--<span class="literal">-</span><span class="comment">|</span>--<span class="comment">|</span>--<span class="literal">-</span><span class="comment">|</span>--<span class="literal">+</span>                      </span><br><span class="line">         <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span>  <span class="comment">|</span>   <span class="literal">+</span>--<span 
class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span>        </span><br><span class="line"><span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="comment">|</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span>  <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="comment">|</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span>  <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="comment">|</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span></span><br><span class="line"><span class="comment">|</span> <span class="comment">containerd</span><span class="literal">-</span><span 
class="comment">shim</span> <span class="comment">|</span>  <span class="comment">|</span> <span class="comment">containerd</span><span class="literal">-</span><span class="comment">shim</span> <span class="comment">|</span>  <span class="comment">|</span> <span class="comment">containerd</span><span class="literal">-</span><span class="comment">shim</span> <span class="comment">|</span></span><br><span class="line"><span class="comment"></span><span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span>  <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">+</span>  <span class="literal">+</span>--<span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span class="literal">-</span><span 
class="literal">+</span></span><br><span class="line"></span><br></pre></td></tr></table></figure><p>dockerd进程启动时，会自动拉起containerd进程；当用户创建并启动容器时，containerd会启动containerd-shim进程用于托管容器进程，最终由containerd-shim调用runc启动容器进程。runc负责初始化进程命名空间，并exec容器启动命令。</p><p>上述模型中shim进程存在的意义是：允许dockerd/containerd升级或重启，同时不影响已运行容器。docker提供了 <code>live-restore</code> 的能力，而我们的集群也的确启用了该配置。</p><p>此外，由于我们在systemd的docker配置选项中配置了 <code>MountFlags=slave</code>，参考<a href="https://freedesktop.org/software/systemd/man/systemd.exec.html#MountFlags=">systemd配置说明</a>，systemd在启动dockerd进程时，会创建一个新的mnt命名空间。</p><p>至此，问题已基本定位清楚：</p><ul><li>systemd在启动dockerd服务时，将dockerd安置在一个新的mnt命名空间中</li><li>用户创建并启动容器时，dockerd会在本mnt命名空间内挂载容器读写层目录，并启动shim进程托管容器进程</li><li>由于某种原因，dockerd服务发生重启，systemd会将其安置在另一个新的mnt命名空间内</li><li>用户删除容器时，dockerd在清理容器读写层挂载时报错，因为挂载并非在当前dockerd的mnt命名空间内</li></ul><p>后来，我们在docker issue中也发现了<a href="https://github.com/moby/moby/issues/35873#issuecomment-386467562">官方给出的说明</a>，<code>MountFlags=slave</code> 与 <code>live-restore</code> 确实不能同时使用。</p><h4 id="一波又起"><a href="#一波又起" class="headerlink" title="一波又起"></a>一波又起</h4><p>还没等我们从解决问题的喜悦中回过神来，另一个疑问接踵而来。我们线上集群好多宿主同时配置了 <code>MountFlags=slave</code> 和 <code>live-restore=true</code>，为什么问题直到最近才报出来呢？</p><p>当我们分析了几起 <code>Pod Terminating</code> 的涉事宿主后，发现它们的一个共性是docker版本为 <code>18.06.3-ce</code>，而我们当前主流的版本仍然是 <code>1.13.1</code>。</p><p>难道是新版本中才引入的问题？我们首先在测试环境中对 <code>1.13.1</code> 版本的docker进行了验证，Pod确实没有被阻塞在 Terminating 状态，这是不是说明低版本docker不存在挂载点泄漏的问题呢？</p><p>事实并非如此。当我们再次进行验证时，在删除Pod前记录了测试容器的读写层，之后发送删除Pod指令，Pod顺利退出，但此时，我们登录Pod之前所在宿主，发现docker日志中同样也存在如下日志：</p><figure class="highlight apache"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="attribute">Oct</span> <span class="number">14</span> <span class="number">22</span>:<span class="number">12</span>:<span class="number">43</span> hostname<span class="number">2</span> dockerd: time=<span
class="string">&quot;2020-10-14T22:12:43.730726978+08:00&quot;</span> level=debug msg=<span class="string">&quot;Failed to unmount fb41efa2cfcbfbb8d90bd1d8d77d299e17518829faf52af40f7a1552ec8aa165 overlay: /home/docker_rt/overlay2/fb41efa2cfcbfbb8d90bd1d8d77d299e17518829faf52af40f7a1552ec8aa165/merged - invalid argument&quot;</span></span><br><span class="line"></span><br></pre></td></tr></table></figure><p>同样存在卸载问题的情况下，高低版本的docker却呈现出了不同的结果，这显然是docker的处理逻辑发生了变更，这里我们对比源码能够很快得出结论：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span 
class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// 1.13.1 版本处理逻辑</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(daemon *Daemon)</span> <span class="title">cleanupContainer</span><span class="params">(container *container.Container, forceRemove, removeVolume <span class="keyword">bool</span>)</span> <span class="params">(err error)</span></span> &#123;</span><br><span class="line">   <span class="comment">// If force removal is required, delete container from various</span></span><br><span class="line">   <span class="comment">// indexes even if removal failed.</span></span><br><span class="line">   <span class="keyword">defer</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">      <span class="keyword">if</span> err == <span class="literal">nil</span> || forceRemove &#123;</span><br><span class="line">         daemon.nameIndex.Delete(container.ID)</span><br><span class="line">         daemon.linkIndex.<span class="built_in">delete</span>(container)</span><br><span class="line"> 
        selinuxFreeLxcContexts(container.ProcessLabel)</span><br><span class="line">         daemon.idIndex.Delete(container.ID)</span><br><span class="line">         daemon.containers.Delete(container.ID)</span><br><span class="line">         <span class="keyword">if</span> e := daemon.removeMountPoints(container, removeVolume); e != <span class="literal">nil</span> &#123;</span><br><span class="line">            logrus.Error(e)</span><br><span class="line">         &#125;</span><br><span class="line">         daemon.LogContainerEvent(container, <span class="string">&quot;destroy&quot;</span>)</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;()</span><br><span class="line"> </span><br><span class="line">   <span class="keyword">if</span> err = os.RemoveAll(container.Root); err != <span class="literal">nil</span> &#123;</span><br><span class="line">      <span class="keyword">return</span> fmt.Errorf(<span class="string">&quot;Unable to remove filesystem for %v: %v&quot;</span>, container.ID, err)</span><br><span class="line">   &#125;</span><br><span class="line"> </span><br><span class="line">   <span class="comment">// When container creation fails and `RWLayer` has not been created yet, we</span></span><br><span class="line">   <span class="comment">// do not call `ReleaseRWLayer`</span></span><br><span class="line">   <span class="keyword">if</span> container.RWLayer != <span class="literal">nil</span> &#123;</span><br><span class="line">      metadata, err := daemon.layerStore.ReleaseRWLayer(container.RWLayer)</span><br><span class="line">      layer.LogReleaseMetadata(metadata)</span><br><span class="line">      <span class="keyword">if</span> err != <span class="literal">nil</span> &amp;&amp; err != layer.ErrMountDoesNotExist &#123;</span><br><span class="line">         <span class="keyword">return</span> fmt.Errorf(<span class="string">&quot;Driver %s failed to remove root filesystem %s: %s&quot;</span>, 
daemon.GraphDriverName(), container.ID, err)</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line"> </span><br><span class="line">   <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"> </span><br><span class="line"> </span><br><span class="line"><span class="comment">// 18.06.3-ce 版本处理逻辑</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(daemon *Daemon)</span> <span class="title">cleanupContainer</span><span class="params">(container *container.Container, forceRemove, removeVolume <span class="keyword">bool</span>)</span> <span class="params">(err error)</span></span> &#123;</span><br><span class="line">   <span class="comment">// When container creation fails and `RWLayer` has not been created yet, we</span></span><br><span class="line">   <span class="comment">// do not call `ReleaseRWLayer`</span></span><br><span class="line">   <span class="keyword">if</span> container.RWLayer != <span class="literal">nil</span> &#123;</span><br><span class="line">      err := daemon.imageService.ReleaseLayer(container.RWLayer, container.OS)</span><br><span class="line">      <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">         err = errors.Wrapf(err, <span class="string">&quot;container %s&quot;</span>, container.ID)</span><br><span class="line">         container.SetRemovalError(err)</span><br><span class="line">         <span class="keyword">return</span> err</span><br><span class="line">      &#125;</span><br><span class="line">      container.RWLayer = <span class="literal">nil</span></span><br><span class="line">   &#125;</span><br><span class="line"> </span><br><span class="line">   <span class="keyword">if</span> err := system.EnsureRemoveAll(container.Root); err != <span class="literal">nil</span> 
&#123;</span><br><span class="line">      e := errors.Wrapf(err, <span class="string">&quot;unable to remove filesystem for %s&quot;</span>, container.ID)</span><br><span class="line">      container.SetRemovalError(e)</span><br><span class="line">      <span class="keyword">return</span> e</span><br><span class="line">   &#125;</span><br><span class="line"> </span><br><span class="line">   linkNames := daemon.linkIndex.<span class="built_in">delete</span>(container)</span><br><span class="line">   selinuxFreeLxcContexts(container.ProcessLabel)</span><br><span class="line">   daemon.idIndex.Delete(container.ID)</span><br><span class="line">   daemon.containers.Delete(container.ID)</span><br><span class="line">   daemon.containersReplica.Delete(container)</span><br><span class="line">   <span class="keyword">if</span> e := daemon.removeMountPoints(container, removeVolume); e != <span class="literal">nil</span> &#123;</span><br><span class="line">      logrus.Error(e)</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">for</span> _, name := <span class="keyword">range</span> linkNames &#123;</span><br><span class="line">      daemon.releaseName(name)</span><br><span class="line">   &#125;</span><br><span class="line">   container.SetRemoved()</span><br><span class="line">   stateCtr.del(container.ID)</span><br><span class="line">   <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>改动一目了然，官方在<a href="https://github.com/moby/moby/pull/31012">清理容器变更</a>中给出了详细的说明。也即在低版本docker中，问题并非不存在，仅仅是被隐藏了，并在高版本中被暴露出来。</p><h3 id="问题影响"><a href="#问题影响" class="headerlink" 
title="问题影响"></a>问题影响</h3><p>既然所有版本的docker都存在这个问题，那么其影响是什么呢？</p><p>在高版本docker中，其影响是显式的，会引起容器清理失败，进而造成Pod删除失败。</p><p>而在低版本docker中，其影响是隐式的，造成挂载点泄漏，可能造成的影响如下：</p><ul><li>inode被打满：由于挂载点泄漏，容器读写层不会被清理，长时间累计可能会造成inode耗尽问题，但这是小概率事件</li><li>容器ID复用：由于挂载点未被卸载，当docker复用了原来已经退出的容器ID时，在挂载容器init层与读写层时会失败。由于docker生成容器ID是随机的，因此也是小概率事件</li></ul><h3 id="解决方案"><a href="#解决方案" class="headerlink" title="解决方案"></a>解决方案</h3><p>问题已然明确，如何解决问题成了当务之急。思路有二：</p><ol><li>治标：对标 <code>1.13.1</code> 版本的处理逻辑，修改 <code>18.06.3-ce</code> 处理代码</li><li>治本：既然官方也提及 <code>MountFlags=slave</code> 与 <code>live-restore</code> 不能同时使用，那么我们修改两个配置选项之一即可</li></ol><p>考虑到 <strong>重启docker不重启容器</strong> 这样一个强需求的存在，似乎我们唯一的解决方案就是关闭 <code>MountFlags=slave</code> 配置。关闭该配置后，随之而来的疑问如下：</p><ul><li>能否解决本问题？</li><li>如网络所传，其他systemd托管服务启用PrivateTmp是否会造成挂载点泄漏？</li></ul><p>欲知后事如何，且听下回分解！</p><p><a href="https://plpan.github.io/%E5%88%A0%E9%99%A4%E5%AE%B9%E5%99%A8%E6%8A%A5%E9%94%99-device-or-resource-busy-%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">传送门</a></p>]]></content>
    
    
      
      
    <summary type="html">&lt;h3 id=&quot;背景&quot;&gt;&lt;a href=&quot;#背景&quot; class=&quot;headerlink&quot; title=&quot;背景&quot;&gt;&lt;/a&gt;背景&lt;/h3&gt;&lt;p&gt;近期，弹性云线上集群发生了几起特殊的容器漂移失败事件，其特殊之处在于容器处于Pod Terminating状态，而宿主则处于Ready状态。</summary>
      
    
    
    
    <category term="问题排查" scheme="https://plpan.github.io/categories/%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5/"/>
    
    
    <category term="kubernetes" scheme="https://plpan.github.io/tags/kubernetes/"/>
    
    <category term="docker" scheme="https://plpan.github.io/tags/docker/"/>
    
  </entry>
  
  <entry>
    <title>一次读 pipe 引发的血案</title>
    <link href="https://plpan.github.io/%E4%B8%80%E6%AC%A1%E8%AF%BB-pipe-%E5%BC%95%E5%8F%91%E7%9A%84%E8%A1%80%E6%A1%88/"/>
    <id>https://plpan.github.io/%E4%B8%80%E6%AC%A1%E8%AF%BB-pipe-%E5%BC%95%E5%8F%91%E7%9A%84%E8%A1%80%E6%A1%88/</id>
    <published>2020-07-19T10:53:03.000Z</published>
    <updated>2020-11-12T14:53:08.617Z</updated>
    
<content type="html"><![CDATA[<h3 id="背景"><a href="#背景" class="headerlink" title="背景"></a>背景</h3><p>背景详见：<a href="https://plpan.github.io/docker-hang-%E6%AD%BB%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">docker-hang-死排查之旅</a>。总结成一句话：runc非预期写pipe造成一系列组件的阻塞，当我们读pipe以消除阻塞时，发生了一个非预期的现象——宿主上所有的容器都被重建了。</p><p>在详细分析问题原因之前，我们先简单回顾下linux pipe的基础知识。</p><h3 id="linux-pipe"><a href="#linux-pipe" class="headerlink" title="linux pipe"></a>linux pipe</h3><p>linux pipe（也即管道），相信大家对它都不陌生，是一种典型的进程间通信机制。管道主要分为两类：命名管道与匿名管道。其区别在于：</p><ul><li>命名管道：管道以文件形式存储在文件系统之上，系统中的任意两个进程都可以借助命名管道通信</li><li>匿名管道：管道不以文件形式存储在文件系统之上，仅存储在进程的文件描述符表中，只有具有血缘关系的进程之间才能借助管道通信，如父子进程、子子进程、祖孙进程等</li></ul><p>管道可以被想象成一个固定大小的文件，分为读写两端，阻塞型管道读写有如下特点：</p><ul><li>读端：当管道内无数据时，读操作阻塞，直到有数据写入，或者所有写端关闭</li><li>写端：当管道已被写满时，写操作阻塞，直到数据被读出</li></ul><p>linux pipe默认大小为16个内存页[ref.2]，也即65536字节。</p><p>这里我有一个小疑惑：写pipe超过65536字节才会被阻塞，我们在宿主上也验证了这个结论，但是<a href="https://plpan.github.io/docker-hang-%E6%AD%BB%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">docker-hang-死排查之旅</a>写入5378字符时就已被阻塞。欢迎了解的小伙伴解惑。</p><h3 id="血案发生"><a href="#血案发生" class="headerlink" title="血案发生"></a>血案发生</h3><p>由于runc init非预期往pipe里写入大量数据而引起阻塞，我们消除阻塞的做法很简单，人为读取pipe中的内容。当我们读取完pipe中的内容时，原本一切都应该按照我们的预期发展：收集到runc init非预期写pipe的真正原因；异常容器恢复响应。确实，以上两点都如我们预期的发生了，然而，此时还发生了一个非预期的动作：宿主上所有容器都被重建了。</p><p>一个线上事故就此发生。原本其他线上容器运行正常，当我们解决docker hang死问题时，却引起了其他容器的一次重建，这显然是不可接受的。</p><h3 id="问题定位"><a href="#问题定位" class="headerlink" title="问题定位"></a>问题定位</h3><p>我们的第一嫌犯是docker，怀疑宿主docker服务发生了重启。当我们验证docker服务状态时，排除了docker的嫌疑，因为docker上一次重启时间是好多天前。</p><p>既然不是docker，那基本就是kubernetes了。kubernetes组件又分为master端与node端两大类。node端组件仅有kubelet，但是kubelet的嫌疑很小，因为它就是个打工仔，所有事情都是听从master的安排。而master端组件有三：控制器、调度器，与API服务。由于调度器包含驱逐功能，原本调度器嫌疑最大，但是因为我们线上关闭了驱逐功能，因此也基本不可能是调度器搞的鬼；而API服务则是被动地接收变更请求，也能排除嫌疑；那么嫌疑犯只剩下控制器了，控制器为什么要重建宿主上的所有容器呢？</p><p>以上是我们的猜测环节，为了验证猜测正确与否，我们必须收集证据。证据何在？基本就埋没在海量的组件日志中。天网恢恢疏而不漏，在控制器日志中，我们掌握了它犯罪的关键证据：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td
class="code"><pre><span class="line">/<span class="keyword">var</span>/log/kubernetes/kube-controller-manager.root.log.INFO<span class="number">.20200712</span><span class="number">-014245.35913</span>:I0712 <span class="number">03</span>:<span class="number">19</span>:<span class="number">59.590703</span>   <span class="number">35913</span> controller_utils.<span class="keyword">go</span>:<span class="number">95</span>] Starting deletion of pod <span class="keyword">default</span>/kproxy-sf<span class="number">-69466</span><span class="number">-1</span></span><br></pre></td></tr></table></figure><p>我们在0709发现宿主docker异常，而控制器在0712主动删除了宿主上的容器。证据在手，我们就开始审问控制器，它的交代如下：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span
class="line"><span class="comment">// DeletePods will delete all pods from master running on given node,</span></span><br><span class="line"><span class="comment">// and return true if any pods were deleted, or were found pending</span></span><br><span class="line"><span class="comment">// deletion.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">DeletePods</span><span class="params">(kubeClient clientset.Interface, recorder record.EventRecorder, nodeName, nodeUID <span class="keyword">string</span>, daemonStore extensionslisters.DaemonSetLister)</span> <span class="params">(<span class="keyword">bool</span>, error)</span></span> &#123;</span><br><span class="line">......</span><br><span class="line"><span class="keyword">for</span> _, pod := <span class="keyword">range</span> pods.Items &#123;</span><br><span class="line">......</span><br><span class="line">glog.V(<span class="number">2</span>).Infof(<span class="string">&quot;Starting deletion of pod %v/%v&quot;</span>, pod.Namespace, pod.Name)</span><br><span class="line"><span class="keyword">if</span> err := kubeClient.CoreV1().Pods(pod.Namespace).Delete(pod.Name, <span class="literal">nil</span>); err != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="literal">false</span>, err</span><br><span class="line">&#125;</span><br><span class="line">remaining = <span class="literal">true</span></span><br><span class="line">&#125;</span><br><span class="line">  ......</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(nc *Controller)</span> <span class="title">doEvictionPass</span><span class="params">()</span></span> &#123;</span><br><span class="line">nc.evictorLock.Lock()</span><br><span class="line"><span class="keyword">defer</span> 
nc.evictorLock.Unlock()</span><br><span class="line"><span class="keyword">for</span> k := <span class="keyword">range</span> nc.zonePodEvictor &#123;</span><br><span class="line"><span class="comment">// Function should return &#x27;false&#x27; and a time after which it should be retried, or &#x27;true&#x27; if it shouldn&#x27;t (it succeeded).</span></span><br><span class="line">nc.zonePodEvictor[k].Try(<span class="function"><span class="keyword">func</span><span class="params">(value scheduler.TimedValue)</span> <span class="params">(<span class="keyword">bool</span>, time.Duration)</span></span> &#123;</span><br><span class="line">......</span><br><span class="line">remaining, err := nodeutil.DeletePods(nc.kubeClient, nc.recorder, value.Value, nodeUID, nc.daemonSetStore)</span><br><span class="line">......</span><br><span class="line">&#125;)</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Run starts an asynchronous loop that monitors the status of cluster nodes.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(nc *Controller)</span> <span class="title">Run</span><span class="params">(stopCh &lt;-<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;)</span></span> &#123;</span><br><span class="line">  ......</span><br><span class="line"><span class="comment">// Managing eviction of nodes:</span></span><br><span class="line"><span class="comment">// When we delete pods off a node, if the node was not empty at the time we then</span></span><br><span class="line"><span class="comment">// queue an eviction watcher. 
If we hit an error, retry deletion.</span></span><br><span class="line"><span class="keyword">go</span> wait.Until(nc.doEvictionPass, scheduler.NodeEvictionPeriod, stopCh)</span><br><span class="line">    ......</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>这个控制器就是node_lifecycle_controller，也即宿主生命周期控制器，该控制器定时 (每100ms) 驱逐宿主上的容器。这个控制器并非不分青红皂白就一通乱杀，不然线上早就乱套了，我们再来看看其判断条件：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// monitorNodeStatus verifies node status are constantly updated by kubelet, and if not,</span></span><br><span class="line"><span class="comment">// post 
&quot;NodeReady==ConditionUnknown&quot;. It also evicts all pods if node is not ready or</span></span><br><span class="line"><span class="comment">// not reachable for a long period of time.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(nc *Controller)</span> <span class="title">monitorNodeStatus</span><span class="params">()</span> <span class="title">error</span></span> &#123;</span><br><span class="line">  ......</span><br><span class="line">  <span class="keyword">if</span> currentReadyCondition != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="comment">// Check eviction timeout against decisionTimestamp</span></span><br><span class="line"><span class="keyword">if</span> observedReadyCondition.Status == v1.ConditionFalse &#123;</span><br><span class="line"><span class="keyword">if</span> decisionTimestamp.After(nc.nodeStatusMap[node.Name].readyTransitionTimestamp.Add(nc.podEvictionTimeout)) &#123;</span><br><span class="line"><span class="keyword">if</span> nc.evictPods(node) &#123;</span><br><span class="line">glog.V(<span class="number">2</span>).Infof(<span class="string">&quot;Node is NotReady. 
Adding Pods on Node %s to eviction queue: %v is later than %v + %v&quot;</span>,</span><br><span class="line">node.Name,</span><br><span class="line">decisionTimestamp,</span><br><span class="line">nc.nodeStatusMap[node.Name].readyTransitionTimestamp,</span><br><span class="line">nc.podEvictionTimeout,</span><br><span class="line">)</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> observedReadyCondition.Status == v1.ConditionUnknown &#123;</span><br><span class="line"><span class="keyword">if</span> decisionTimestamp.After(nc.nodeStatusMap[node.Name].probeTimestamp.Add(nc.podEvictionTimeout)) &#123;</span><br><span class="line"><span class="keyword">if</span> nc.evictPods(node) &#123;</span><br><span class="line">glog.V(<span class="number">2</span>).Infof(<span class="string">&quot;Node is unresponsive. Adding Pods on Node %s to eviction queues: %v is later than %v + %v&quot;</span>,</span><br><span class="line">node.Name,</span><br><span class="line">decisionTimestamp,</span><br><span class="line">nc.nodeStatusMap[node.Name].readyTransitionTimestamp,</span><br><span class="line">nc.podEvictionTimeout-gracePeriod,</span><br><span class="line">)</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">  &#125;</span><br><span class="line">  ......</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(nc *Controller)</span> <span class="title">evictPods</span><span class="params">(node *v1.Node)</span> <span class="title">bool</span></span> &#123;</span><br><span class="line">nc.evictorLock.Lock()</span><br><span class="line"><span class="keyword">defer</span> nc.evictorLock.Unlock()</span><br><span class="line"><span class="keyword">return</span> 
nc.zonePodEvictor[utilnode.GetZoneKey(node)].Add(node.Name, <span class="keyword">string</span>(node.UID))</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>可见，控制器驱逐该宿主上的所有Pod的条件有二：</p><ol><li>宿主的状态为NotReady或者Unknown</li><li>宿主状态保持非Ready超过指定时间阈值。该时间阈值由nc.podEvictionTimeout定义，默认为5分钟，我们的线上集群将其定制为2000分钟</li></ol><p>在<a href="https://plpan.github.io/docker-hang-%E6%AD%BB%E9%98%BB%E5%A1%9E-kubelet-%E5%88%9D%E5%A7%8B%E5%8C%96%E6%B5%81%E7%A8%8B/">docker-hang-死阻塞-kubelet-初始化流程</a>中，我们提到由于docker hang死，kubelet初始化流程被阻塞，宿主状态为NotReady，命中条件1；我们检查kubelet NotReady的起始时间为2020-07-10 17:58:59，与控制器删除Pod的时间间隔基本为2000分钟，命中条件2。</p><p>至此，本问题基本已盖棺定论：由于线上宿主状态非Ready持续时间太长，引起控制器驱逐宿主上所有容器导致。</p><h3 id="思考"><a href="#思考" class="headerlink" title="思考"></a>思考</h3><p>清楚了其原理之后，大家再来思考一个问题：当宿主状态非Ready时，无法处理控制器发出的驱逐容器的请求，当且仅当宿主状态变成Ready之后，才能开始处理。既然宿主已恢复，是否还有必要立即驱逐其上的所有容器？尤其是针对有状态服务，删除Pod之后，立马又在原宿主创建该Pod。我个人感觉非但没有必要，而且还存在一定风险。</p><p>针对控制器的驱逐策略，我们调大了线上的驱逐时间间隔，从原来的2000分钟，调整为3年。</p><h3 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h3><ol><li><a href="https://elixir.bootlin.com/linux/v3.10/source/fs/pipe.c#L496">https://elixir.bootlin.com/linux/v3.10/source/fs/pipe.c#L496</a></li><li><a href="https://elixir.bootlin.com/linux/v3.10/source/include/linux/pipe_fs_i.h#L4">https://elixir.bootlin.com/linux/v3.10/source/include/linux/pipe_fs_i.h#L4</a></li></ol>]]></content>
    
    
      
      
    <summary type="html">&lt;h3 id=&quot;背景&quot;&gt;&lt;a href=&quot;#背景&quot; class=&quot;headerlink&quot; title=&quot;背景&quot;&gt;&lt;/a&gt;背景&lt;/h3&gt;&lt;p&gt;背景详见：&lt;a href=&quot;https://plpan.github.io/docker-hang-%E6%AD%BB%E6%8E%92%E</summary>
      
    
    
    
    <category term="问题排查" scheme="https://plpan.github.io/categories/%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5/"/>
    
    
    <category term="kubernetes" scheme="https://plpan.github.io/tags/kubernetes/"/>
    
    <category term="docker" scheme="https://plpan.github.io/tags/docker/"/>
    
  </entry>
  
  <entry>
    <title>docker hang 死阻塞 kubelet 初始化流程</title>
    <link href="https://plpan.github.io/docker-hang-%E6%AD%BB%E9%98%BB%E5%A1%9E-kubelet-%E5%88%9D%E5%A7%8B%E5%8C%96%E6%B5%81%E7%A8%8B/"/>
    <id>https://plpan.github.io/docker-hang-%E6%AD%BB%E9%98%BB%E5%A1%9E-kubelet-%E5%88%9D%E5%A7%8B%E5%8C%96%E6%B5%81%E7%A8%8B/</id>
    <published>2020-07-18T08:50:27.000Z</published>
    <updated>2020-11-12T14:52:25.727Z</updated>
    
<content type="html"><![CDATA[<h3 id="背景"><a href="#背景" class="headerlink" title="背景"></a>背景</h3><p>最近升级了一版kubelet，修复因kubelet删除Sidecar类型Pod慢导致平台删除集群超时的问题。在灰度redis隔离集群的时候，发现升级kubelet并重启服务后，少量宿主状态变成了NotReady，并且回滚kubelet至之前版本，宿主状态仍然是NotReady。查看宿主状态时提示 ‘container runtime is down’ ，显示容器运行时出了问题。</p><p>我们使用的容器运行时是docker。我们就去检查docker的状态，检查结果如下：</p><ul><li>docker ps 查看所有容器列表，执行正常</li><li>docker inspect 查看容器详细状态，某一容器执行阻塞</li></ul><p>典型的docker hang死行为。我们最近在升级docker版本，存量宿主docker的版本为1.13.1，并且在逐步升级至18.06.3，新宿主的docker版本都是18.06.3。docker hang死问题在1.13.1版本上表现得更彻底，在执行docker ps的时候就已经hang死了；而docker 18.06.3做了一点小小的优化，在执行docker ps时去掉了容器级别的加锁操作。但很多docker命令在执行前仍会申请容器锁，因此一旦某个容器出现问题，受影响的仅仅是针对该容器的操作无法执行，并不会造成整个docker服务不可响应。</p><p>至于为什么以docker ps与docker inspect为指标检查docker状态，因为kubelet就是依赖这两个docker命令获取容器状态。</p><p>所以，现在问题有二：</p><ul><li>docker hang死的根因是什么？</li><li>docker hang死时，为什么重启kubelet，会导致宿主状态变为NotReady？</li></ul><p>docker hang死的排查详见：<a href="https://plpan.github.io/docker-hang-%E6%AD%BB%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/">docker-hang-死排查之旅</a>。现在我们再来分析，当容器异常时，为什么重启kubelet，宿主的状态会从Ready变成NotReady。</p><h3 id="宿主状态生成机制"><a href="#宿主状态生成机制" class="headerlink" title="宿主状态生成机制"></a>宿主状态生成机制</h3><p>在问题排查之前，我们需要先了解宿主状态的生成机制。</p><p>宿主的所有状态都是node.Status的属性，因此我们直接定位kubelet设置node.Status的代码即可。</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Run starts the kubelet reacting to config updates</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(kl *Kubelet)</span> <span class="title">Run</span><span class="params">(updates &lt;-<span
class="keyword">chan</span> kubetypes.PodUpdate)</span></span> &#123;</span><br><span class="line">  ......</span><br><span class="line">  <span class="keyword">if</span> kl.kubeClient != <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="comment">// Start syncing node status immediately, this may set up things the runtime needs to run.</span></span><br><span class="line"><span class="keyword">go</span> wait.Until(kl.syncNodeStatus, kl.nodeStatusUpdateFrequency, wait.NeverStop)</span><br><span class="line"><span class="keyword">go</span> kl.fastStatusUpdateOnce()</span><br><span class="line">&#125;</span><br><span class="line">  ......</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>kubelet在启动时创建了一个goroutine，周期性地向apiserver同步本宿主的状态，同步周期默认是10s。</p><p>跟踪调用链路，我们可以看到kubelet针对宿主会设置多个Condition，表明宿主当前所处的状态，比如宿主内存是否告急、线程数是否告急，以及宿主是否就绪。</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// defaultNodeStatusFuncs is a factory that generates the default set of</span></span><br><span class="line"><span class="comment">// setNodeStatus funcs</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(kl *Kubelet)</span> <span class="title">defaultNodeStatusFuncs</span><span 
class="params">()</span> []<span class="title">func</span><span class="params">(*v1.Node)</span> <span class="title">error</span></span> &#123;</span><br><span class="line">   ......</span><br><span class="line">   setters = <span class="built_in">append</span>(setters,</span><br><span class="line">      nodestatus.OutOfDiskCondition(kl.clock.Now, kl.recordNodeStatusEvent),</span><br><span class="line">      nodestatus.MemoryPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderMemoryPressure, kl.recordNodeStatusEvent),</span><br><span class="line">      nodestatus.DiskPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderDiskPressure, kl.recordNodeStatusEvent),</span><br><span class="line">      nodestatus.PIDPressureCondition(kl.clock.Now, kl.evictionManager.IsUnderPIDPressure, kl.recordNodeStatusEvent),</span><br><span class="line">      nodestatus.ReadyCondition(kl.clock.Now, kl.runtimeState.runtimeErrors, kl.runtimeState.networkErrors, validateHostFunc, kl.containerManager.Status, kl.recordNodeStatusEvent),</span><br><span class="line">      nodestatus.VolumesInUse(kl.volumeManager.ReconcilerStatesHasBeenSynced, kl.volumeManager.GetVolumesInUse),</span><br><span class="line">      <span class="comment">// TODO(mtaufen): I decided not to move this setter for now, since all it does is send an event</span></span><br><span class="line">      <span class="comment">// and record state back to the Kubelet runtime object. 
In the future, I&#x27;d like to isolate</span></span><br><span class="line">      <span class="comment">// these side-effects by decoupling the decisions to send events and partial status recording</span></span><br><span class="line">      <span class="comment">// from the Node setters.</span></span><br><span class="line">      kl.recordNodeSchedulableEvent,</span><br><span class="line">   )</span><br><span class="line">   <span class="keyword">return</span> setters</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Among these, the Ready Condition indicates whether the node is ready; the Status shown when viewing nodes with kubectl is exactly the state of the Ready Condition. The common states and their meanings:</p><ul><li>Ready: the node is fine and can respond to Pod events normally</li><li>NotReady: kubelet is still running but can no longer handle Pod events; in the vast majority of cases NotReady is caused by a container runtime failure</li><li>Unknown: kubelet on the node has stopped running</li></ul><p>kubelet's criteria for the Ready Condition are defined as follows:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span 
class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// ReadyCondition returns a Setter that updates the v1.NodeReady condition on the node.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ReadyCondition</span><span class="params">(</span></span></span><br><span class="line"><span class="function"><span class="params">nowFunc <span class="keyword">func</span>()</span> <span class="title">time</span>.<span class="title">Time</span>, // <span class="title">typically</span> <span class="title">Kubelet</span>.<span class="title">clock</span>.<span class="title">Now</span></span></span><br><span class="line">runtimeErrorsFunc <span class="function"><span class="keyword">func</span><span class="params">()</span> []<span class="title">string</span>, // <span class="title">typically</span> <span class="title">Kubelet</span>.<span class="title">runtimeState</span>.<span class="title">runtimeErrors</span></span></span><br><span class="line">networkErrorsFunc <span class="function"><span class="keyword">func</span><span class="params">()</span> []<span class="title">string</span>, // <span class="title">typically</span> <span class="title">Kubelet</span>.<span class="title">runtimeState</span>.<span class="title">networkErrors</span></span></span><br><span class="line">appArmorValidateHostFunc <span class="function"><span class="keyword">func</span><span class="params">()</span> <span class="title">error</span>, // <span class="title">typically</span> <span class="title">Kubelet</span>.<span class="title">appArmorValidator</span>.<span class="title">ValidateHost</span>, <span class="title">might</span> <span class="title">be</span> <span class="title">nil</span> <span class="title">depending</span> <span class="title">on</span> <span class="title">whether</span> <span class="title">there</span> <span 
class="title">was</span> <span class="title">an</span> <span class="title">appArmorValidator</span></span></span><br><span class="line">cmStatusFunc <span class="function"><span class="keyword">func</span><span class="params">()</span> <span class="title">cm</span>.<span class="title">Status</span>, // <span class="title">typically</span> <span class="title">Kubelet</span>.<span class="title">containerManager</span>.<span class="title">Status</span></span></span><br><span class="line">recordEventFunc <span class="function"><span class="keyword">func</span><span class="params">(eventType, event <span class="keyword">string</span>)</span>, // <span class="title">typically</span> <span class="title">Kubelet</span>.<span class="title">recordNodeStatusEvent</span></span></span><br><span class="line">) Setter &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="function"><span class="keyword">func</span><span class="params">(node *v1.Node)</span> <span class="title">error</span></span> &#123;</span><br><span class="line">......</span><br><span class="line">rs := <span class="built_in">append</span>(runtimeErrorsFunc(), networkErrorsFunc()...)</span><br><span class="line">    requiredCapacities := []v1.ResourceName&#123;v1.ResourceCPU, v1.ResourceMemory, v1.ResourcePods&#125;</span><br><span class="line"><span class="keyword">if</span> utilfeature.DefaultFeatureGate.Enabled(features.LocalStorageCapacityIsolation) &#123;</span><br><span class="line">requiredCapacities = <span class="built_in">append</span>(requiredCapacities, v1.ResourceEphemeralStorage)</span><br><span class="line">&#125;</span><br><span class="line">missingCapacities := []<span class="keyword">string</span>&#123;&#125;</span><br><span class="line"><span class="keyword">for</span> _, resource := <span class="keyword">range</span> requiredCapacities &#123;</span><br><span class="line"><span class="keyword">if</span> _, found := node.Status.Capacity[resource]; !found 
&#123;</span><br><span class="line">missingCapacities = <span class="built_in">append</span>(missingCapacities, <span class="keyword">string</span>(resource))</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> <span class="built_in">len</span>(missingCapacities) &gt; <span class="number">0</span> &#123;</span><br><span class="line">rs = <span class="built_in">append</span>(rs, fmt.Sprintf(<span class="string">&quot;Missing node capacity for resources: %s&quot;</span>, strings.Join(missingCapacities, <span class="string">&quot;, &quot;</span>)))</span><br><span class="line">&#125;</span><br><span class="line">    <span class="keyword">if</span> <span class="built_in">len</span>(rs) &gt; <span class="number">0</span> &#123;</span><br><span class="line">newNodeReadyCondition = v1.NodeCondition&#123;</span><br><span class="line">Type:              v1.NodeReady,</span><br><span class="line">Status:            v1.ConditionFalse,</span><br><span class="line">Reason:            <span class="string">&quot;KubeletNotReady&quot;</span>,</span><br><span class="line">Message:           strings.Join(rs, <span class="string">&quot;,&quot;</span>),</span><br><span class="line">LastHeartbeatTime: currentTime,</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line">......</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>As the snippet shows, whether a node is Ready depends on many conditions, including the runtime check, the network check, and basic resource checks.</p><h3 id="宿主状态变化定位"><a href="#宿主状态变化定位" class="headerlink" title="宿主状态变化定位"></a>Locating the status change</h3><p>Next, we focus on the runtime check and analyze why the node status changed. The runtime check is defined as follows:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *runtimeState)</span> <span class="title">runtimeErrors</span><span class="params">()</span> []<span class="title">string</span></span> &#123;</span><br><span class="line">   s.RLock()</span><br><span class="line">   <span class="keyword">defer</span> s.RUnlock()</span><br><span class="line">   <span class="keyword">var</span> ret []<span class="keyword">string</span></span><br><span class="line">   <span class="keyword">if</span> !s.lastBaseRuntimeSync.Add(s.baseRuntimeSyncThreshold).After(time.Now()) &#123;  <span class="comment">// 1</span></span><br><span class="line">      ret = <span class="built_in">append</span>(ret, <span class="string">&quot;container runtime is down&quot;</span>)</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">if</span> s.internalError != <span class="literal">nil</span> &#123;</span><br><span class="line">      ret = <span class="built_in">append</span>(ret, s.internalError.Error())</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">for</span> _, hc := <span class="keyword">range</span> s.healthChecks &#123;                                            <span class="comment">// 2</span></span><br><span class="line">      <span class="keyword">if</span> ok, err := hc.fn(); !ok &#123;</span><br><span class="line">         ret = <span class="built_in">append</span>(ret, fmt.Sprintf(<span class="string">&quot;%s is not healthy: %v&quot;</span>, hc.name, err))</span><br><span class="line">      &#125;</span><br><span 
class="line">   &#125;</span><br><span class="line">   <span class="keyword">return</span> ret</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>当出现如下两种状况之一时，则判定运行时检查不通过：</p><ol><li>当前时间距最近一次运行时同步操作 (lastBaseRuntimeSync) 的时间间隔超过指定阈值（默认30s）</li><li>运行时健康检查未通过</li></ol><p>那么，当时宿主的NotReady是由哪种状况引起的呢？结合kubelet日志分析，kubelet每隔5s就输出一条日志：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">......</span><br><span class="line">I0715 <span class="number">10</span>:<span class="number">43</span>:<span class="number">28.049240</span>   <span class="number">16315</span> kubelet.<span class="keyword">go</span>:<span class="number">1835</span>] skipping pod synchronization - [container runtime is down]</span><br><span class="line">I0715 <span class="number">10</span>:<span class="number">43</span>:<span class="number">33.049359</span>   <span class="number">16315</span> kubelet.<span class="keyword">go</span>:<span class="number">1835</span>] skipping pod synchronization - [container runtime is down]</span><br><span class="line">I0715 <span class="number">10</span>:<span class="number">43</span>:<span class="number">38.049492</span>   <span class="number">16315</span> kubelet.<span class="keyword">go</span>:<span class="number">1835</span>] skipping pod synchronization - [container runtime is down]</span><br><span class="line">......</span><br></pre></td></tr></table></figure><p>因此，状况1是宿主NotReady的元凶。</p><p>我们继续分析为什么kubelet没有按照预期设置lastBaseRuntimeSync。kubelet启动时会创建一个goroutine，并在该goroutine中循环设置lastBaseRuntimeSync，循环如下：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(kl *Kubelet)</span> <span class="title">Run</span><span class="params">(updates &lt;-<span class="keyword">chan</span> kubetypes.PodUpdate)</span></span> &#123;</span><br><span class="line">   ......</span><br><span class="line">   <span class="keyword">go</span> wait.Until(kl.updateRuntimeUp, <span class="number">5</span>*time.Second, wait.NeverStop)</span><br><span class="line">   ......</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(kl *Kubelet)</span> <span class="title">updateRuntimeUp</span><span class="params">()</span></span> &#123;</span><br><span class="line">   kl.updateRuntimeMux.Lock()</span><br><span class="line">   <span class="keyword">defer</span> kl.updateRuntimeMux.Unlock()</span><br><span class="line">   ......</span><br><span class="line">   kl.oneTimeInitializer.Do(kl.initializeRuntimeDependentModules)</span><br><span class="line">   kl.runtimeState.setRuntimeSync(kl.clock.Now())</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *runtimeState)</span> <span class="title">setRuntimeSync</span><span class="params">(t time.Time)</span></span> &#123;</span><br><span class="line"> s.Lock()</span><br><span 
class="line"> <span class="keyword">defer</span> s.Unlock()</span><br><span class="line"> s.lastBaseRuntimeSync = t</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Normally, kubelet sets lastBaseRuntimeSync to the current time every 5s; on the affected hosts this timestamp was never updated. That is, updateRuntimeUp was stuck at some step before setting lastBaseRuntimeSync. So we only needed to check the function calls inside updateRuntimeUp one by one; omitting the details, the final call chain is:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">initializeRuntimeDependentModules -&gt; kl.cadvisor.Start -&gt; cc.Manager.Start -&gt; self.createContainer -&gt; m.createContainerLocked -&gt; container.NewContainerHandler -&gt; factory.CanHandleAndAccept -&gt; self.client.ContainerInspect</span><br></pre></td></tr></table></figure><p>The call chain shows that cadvisor ultimately hung on docker inspect, blocking a critical step of kubelet initialization.</p><p>Had the container failure occurred after kubelet finished initializing, the node's Ready state would not have been affected at all, because oneTimeInitializer is a sync.Once and therefore runs only once, at kubelet startup. In that case the impact on kubelet is limited: it cannot handle any events for that Pod (deletion, updates, and so on), but it can still handle events for other Pods.</p><p>This explains why the host was Ready before the kubelet restart and NotReady after it.</p><h3 id="后续"><a href="#后续" class="headerlink" title="后续"></a>Follow-up</h3><p>One might ask why cadvisor's docker inspect call has no timeout. Indeed, with a timeout in place, restarting kubelet would not change the node status. Adding a timeout seems fine to me, but I am not sure whether it hides any pitfalls; I will follow up once I have dug deeper.</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h3 id=&quot;背景&quot;&gt;&lt;a href=&quot;#背景&quot; class=&quot;headerlink&quot; title=&quot;背景&quot;&gt;&lt;/a&gt;Background&lt;/h3&gt;&lt;p&gt;We recently upgraded kubelet to fix an issue where kubelet deleted Sidecar-type Pods slowly, causing platform cluster-deletion timeouts. While rolling the upgrade out to a canary Redis isolation cluster,</summary>
      
    
    
    
    <category term="问题排查" scheme="https://plpan.github.io/categories/%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5/"/>
    
    
    <category term="kubernetes" scheme="https://plpan.github.io/tags/kubernetes/"/>
    
    <category term="docker" scheme="https://plpan.github.io/tags/docker/"/>
    
  </entry>
  
  <entry>
    <title>docker hang 死排查之旅</title>
    <link href="https://plpan.github.io/docker-hang-%E6%AD%BB%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/"/>
    <id>https://plpan.github.io/docker-hang-%E6%AD%BB%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/</id>
    <published>2020-07-17T08:20:51.000Z</published>
    <updated>2020-11-12T14:53:31.389Z</updated>
    
    <content type="html"><![CDATA[<h3 id="背景"><a href="#背景" class="headerlink" title="背景"></a>Background</h3><p>Recently, while upgrading kubelet, we found docker hanging on some hosts; how we discovered this is described in <a href="https://plpan.github.io/docker-hang-%E6%AD%BB%E9%98%BB%E5%A1%9E-kubelet-%E5%88%9D%E5%A7%8B%E5%8C%96%E6%B5%81%E7%A8%8B/">docker-hang-死阻塞-kubelet-初始化流程</a>.</p><p>Now we focus on docker itself: the symptoms when the hang occurs, how it arises, how we located the problem, and the corresponding fix. This post records the whole process in detail.</p><h3 id="docker-hang死"><a href="#docker-hang死" class="headerlink" title="docker hang死"></a>docker hang</h3><p>docker hangs are nothing new to us; we have hit quite a few, with varied symptoms. Previous investigations on docker 1.13.1 turned up some clues but never pinned down the root cause, and most incidents were ultimately resolved by restarting docker. This time, for a hang on docker 18.06.3, our four-person squad finally identified the cause after nearly a week of diagnosis. Note that there is far more than one way for docker to hang, so the method and conclusions here are not universally applicable.</p><p>Before starting the investigation, let us lay out what we currently know:</p><ul><li>A specific container is broken and does not respond to docker inspect</li></ul><p>Beyond that, we know nothing.</p><p>When investigating an unknown problem, the usual approach is to find an entry point and follow the thread, gradually narrowing the scope until the problem is pinned down in the fine details. Here, docker is obviously our only entry point.</p><h4 id="链路跟踪"><a href="#链路跟踪" class="headerlink" title="链路跟踪"></a>Call-chain tracing</h4><p>First, we wanted a rough picture of what docker was doing overall. Anyone familiar with Go development will think of the debugging tool pprof. With pprof we sketched a blueprint of docker's state at the time:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span 
class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br></pre></td><td class="code"><pre><span class="line">goroutine profile: total <span class="number">722373</span></span><br><span class="line"><span class="number">717594</span> @ <span class="number">0x7fe8bc202980</span> <span 
class="number">0x7fe8bc202a40</span> <span class="number">0x7fe8bc2135d8</span> <span class="number">0x7fe8bc2132ef</span> <span class="number">0x7fe8bc238c1a</span> <span class="number">0x7fe8bd56f7fe</span> <span class="number">0x7fe8bd56f6bd</span> <span class="number">0x7fe8bcea8719</span> <span class="number">0x7fe8bcea938b</span> <span class="number">0x7fe8bcb726ca</span> <span class="number">0x7fe8bcb72b01</span> <span class="number">0x7fe8bc71c26b</span> <span class="number">0x7fe8bcb85f4a</span> <span class="number">0x7fe8bc4b9896</span> <span class="number">0x7fe8bc72a438</span> <span class="number">0x7fe8bcb849e2</span> <span class="number">0x7fe8bc4bc67e</span> <span class="number">0x7fe8bc4b88a3</span> <span class="number">0x7fe8bc230711</span></span><br><span class="line">#<span class="number">0x7fe8bc2132ee</span>sync.runtime_SemacquireMutex+<span class="number">0x3e</span>/usr/local/<span class="keyword">go</span>/src/runtime/sema.<span class="keyword">go</span>:<span class="number">71</span></span><br><span class="line">#<span class="number">0x7fe8bc238c19</span>sync.(*Mutex).Lock+<span class="number">0x109</span>/usr/local/<span class="keyword">go</span>/src/sync/mutex.<span class="keyword">go</span>:<span class="number">134</span></span><br><span class="line">#<span class="number">0x7fe8bd56f7fd</span>github.com/docker/docker/daemon.(*Daemon).ContainerInspectCurrent+<span class="number">0x8d</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.<span class="keyword">go</span>:<span class="number">40</span></span><br><span class="line">#<span class="number">0x7fe8bd56f6bc</span>github.com/docker/docker/daemon.(*Daemon).ContainerInspect+<span class="number">0x11c</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line">#<span 
class="number">0x7fe8bcea8718</span>github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersByName+<span class="number">0x118</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/inspect.<span class="keyword">go</span>:<span class="number">15</span></span><br><span class="line">#<span class="number">0x7fe8bcea938a</span>github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersByName)-fm+<span class="number">0x6a</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.<span class="keyword">go</span>:<span class="number">39</span></span><br><span class="line">#<span class="number">0x7fe8bcb726c9</span>github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+<span class="number">0xd9</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.<span class="keyword">go</span>:<span class="number">26</span></span><br><span class="line">#<span class="number">0x7fe8bcb72b00</span>github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+<span class="number">0x400</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.<span class="keyword">go</span>:<span class="number">62</span></span><br><span class="line">#<span class="number">0x7fe8bc71c26a</span>github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+<span class="number">0x7aa</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.<span class="keyword">go</span>:<span class="number">59</span></span><br><span class="line">#<span class="number">0x7fe8bcb85f49</span>github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+<span 
class="number">0x199</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.<span class="keyword">go</span>:<span class="number">141</span></span><br><span class="line">#<span class="number">0x7fe8bc4b9895</span>net/http.HandlerFunc.ServeHTTP+<span class="number">0x45</span>/usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1947</span></span><br><span class="line">#<span class="number">0x7fe8bc72a437</span>github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+<span class="number">0x227</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.<span class="keyword">go</span>:<span class="number">103</span></span><br><span class="line">#<span class="number">0x7fe8bcb849e1</span>github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+<span class="number">0x71</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line">#<span class="number">0x7fe8bc4bc67d</span>net/http.serverHandler.ServeHTTP+<span class="number">0xbd</span>/usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">2694</span></span><br><span class="line">#<span class="number">0x7fe8bc4b88a2</span>net/http.(*conn).serve+<span class="number">0x652</span>/usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1830</span></span><br><span class="line"></span><br><span class="line"><span class="number">4175</span> @ <span class="number">0x7fe8bc202980</span> <span class="number">0x7fe8bc202a40</span> <span class="number">0x7fe8bc2135d8</span> <span class="number">0x7fe8bc2132ef</span> <span class="number">0x7fe8bc238c1a</span> <span class="number">0x7fe8bcc2eccf</span> <span 
class="number">0x7fe8bd597af4</span> <span class="number">0x7fe8bcea2456</span> <span class="number">0x7fe8bcea956b</span> <span class="number">0x7fe8bcb73dff</span> <span class="number">0x7fe8bcb726ca</span> <span class="number">0x7fe8bcb72b01</span> <span class="number">0x7fe8bc71c26b</span> <span class="number">0x7fe8bcb85f4a</span> <span class="number">0x7fe8bc4b9896</span> <span class="number">0x7fe8bc72a438</span> <span class="number">0x7fe8bcb849e2</span> <span class="number">0x7fe8bc4bc67e</span> <span class="number">0x7fe8bc4b88a3</span> <span class="number">0x7fe8bc230711</span></span><br><span class="line">#<span class="number">0x7fe8bc2132ee</span>sync.runtime_SemacquireMutex+<span class="number">0x3e</span>/usr/local/<span class="keyword">go</span>/src/runtime/sema.<span class="keyword">go</span>:<span class="number">71</span></span><br><span class="line">#<span class="number">0x7fe8bc238c19</span>sync.(*Mutex).Lock+<span class="number">0x109</span>/usr/local/<span class="keyword">go</span>/src/sync/mutex.<span class="keyword">go</span>:<span class="number">134</span></span><br><span class="line">#<span class="number">0x7fe8bcc2ecce</span>github.com/docker/docker/container.(*State).IsRunning+<span class="number">0x2e</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/state.<span class="keyword">go</span>:<span class="number">240</span></span><br><span class="line">#<span class="number">0x7fe8bd597af3</span>github.com/docker/docker/daemon.(*Daemon).ContainerStats+<span class="number">0xb3</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/stats.<span class="keyword">go</span>:<span class="number">30</span></span><br><span class="line">#<span class="number">0x7fe8bcea2455</span>github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersStats+<span 
class="number">0x1e5</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container_routes.<span class="keyword">go</span>:<span class="number">115</span></span><br><span class="line">#<span class="number">0x7fe8bcea956a</span>github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersStats)-fm+<span class="number">0x6a</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.<span class="keyword">go</span>:<span class="number">42</span></span><br><span class="line">#<span class="number">0x7fe8bcb73dfe</span>github.com/docker/docker/api/server/router.cancellableHandler.func1+<span class="number">0xce</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/local.<span class="keyword">go</span>:<span class="number">92</span></span><br><span class="line">#<span class="number">0x7fe8bcb726c9</span>github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+<span class="number">0xd9</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.<span class="keyword">go</span>:<span class="number">26</span></span><br><span class="line">#<span class="number">0x7fe8bcb72b00</span>github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+<span class="number">0x400</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.<span class="keyword">go</span>:<span class="number">62</span></span><br><span class="line">#<span class="number">0x7fe8bc71c26a</span>github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+<span class="number">0x7aa</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.<span class="keyword">go</span>:<span 
class="number">59</span></span><br><span class="line">#<span class="number">0x7fe8bcb85f49</span>github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+<span class="number">0x199</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.<span class="keyword">go</span>:<span class="number">141</span></span><br><span class="line">#<span class="number">0x7fe8bc4b9895</span>net/http.HandlerFunc.ServeHTTP+<span class="number">0x45</span>/usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1947</span></span><br><span class="line">#<span class="number">0x7fe8bc72a437</span>github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+<span class="number">0x227</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.<span class="keyword">go</span>:<span class="number">103</span></span><br><span class="line">#<span class="number">0x7fe8bcb849e1</span>github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+<span class="number">0x71</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line">#<span class="number">0x7fe8bc4bc67d</span>net/http.serverHandler.ServeHTTP+<span class="number">0xbd</span>/usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">2694</span></span><br><span class="line">#<span class="number">0x7fe8bc4b88a2</span>net/http.(*conn).serve+<span class="number">0x652</span>/usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1830</span></span><br><span class="line"></span><br><span class="line"><span class="number">1</span> @ <span class="number">0x7fe8bc202980</span> <span class="number">0x7fe8bc202a40</span> <span 
class="number">0x7fe8bc2135d8</span> <span class="number">0x7fe8bc2131fb</span> <span class="number">0x7fe8bc239a3b</span> <span class="number">0x7fe8bcbb679d</span> <span class="number">0x7fe8bcc26774</span> <span class="number">0x7fe8bd570b20</span> <span class="number">0x7fe8bd56f81c</span> <span class="number">0x7fe8bd56f6bd</span> <span class="number">0x7fe8bcea8719</span> <span class="number">0x7fe8bcea938b</span> <span class="number">0x7fe8bcb726ca</span> <span class="number">0x7fe8bcb72b01</span> <span class="number">0x7fe8bc71c26b</span> <span class="number">0x7fe8bcb85f4a</span> <span class="number">0x7fe8bc4b9896</span> <span class="number">0x7fe8bc72a438</span> <span class="number">0x7fe8bcb849e2</span> <span class="number">0x7fe8bc4bc67e</span> <span class="number">0x7fe8bc4b88a3</span> <span class="number">0x7fe8bc230711</span></span><br><span class="line">#<span class="number">0x7fe8bc2131fa</span>sync.runtime_Semacquire+<span class="number">0x3a</span>/usr/local/<span class="keyword">go</span>/src/runtime/sema.<span class="keyword">go</span>:<span class="number">56</span></span><br><span class="line">#<span class="number">0x7fe8bc239a3a</span>sync.(*RWMutex).RLock+<span class="number">0x4a</span>/usr/local/<span class="keyword">go</span>/src/sync/rwmutex.<span class="keyword">go</span>:<span class="number">50</span></span><br><span class="line">#<span class="number">0x7fe8bcbb679c</span>github.com/docker/docker/daemon/exec.(*Store).List+<span class="number">0x4c</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec/exec.<span class="keyword">go</span>:<span class="number">140</span></span><br><span class="line">#<span class="number">0x7fe8bcc26773</span>github.com/docker/docker/container.(*Container).GetExecIDs+<span class="number">0x33</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/container/container.<span class="keyword">go</span>:<span class="number">423</span></span><br><span 
class="line">#<span class="number">0x7fe8bd570b1f</span>github.com/docker/docker/daemon.(*Daemon).getInspectData+<span class="number">0x5cf</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.<span class="keyword">go</span>:<span class="number">178</span></span><br><span class="line">#<span class="number">0x7fe8bd56f81b</span>github.com/docker/docker/daemon.(*Daemon).ContainerInspectCurrent+<span class="number">0xab</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.<span class="keyword">go</span>:<span class="number">42</span></span><br><span class="line">#<span class="number">0x7fe8bd56f6bc</span>github.com/docker/docker/daemon.(*Daemon).ContainerInspect+<span class="number">0x11c</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/inspect.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line">#<span class="number">0x7fe8bcea8718</span>github.com/docker/docker/api/server/router/container.(*containerRouter).getContainersByName+<span class="number">0x118</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/inspect.<span class="keyword">go</span>:<span class="number">15</span></span><br><span class="line">#<span class="number">0x7fe8bcea938a</span>github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.getContainersByName)-fm+<span class="number">0x6a</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.<span class="keyword">go</span>:<span class="number">39</span></span><br><span class="line">#<span class="number">0x7fe8bcb726c9</span>github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+<span 
class="number">0xd9</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.<span class="keyword">go</span>:<span class="number">26</span></span><br><span class="line">#<span class="number">0x7fe8bcb72b00</span>github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+<span class="number">0x400</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.<span class="keyword">go</span>:<span class="number">62</span></span><br><span class="line">#<span class="number">0x7fe8bc71c26a</span>github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+<span class="number">0x7aa</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.<span class="keyword">go</span>:<span class="number">59</span></span><br><span class="line">#<span class="number">0x7fe8bcb85f49</span>github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+<span class="number">0x199</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.<span class="keyword">go</span>:<span class="number">141</span></span><br><span class="line">#<span class="number">0x7fe8bc4b9895</span>net/http.HandlerFunc.ServeHTTP+<span class="number">0x45</span>/usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1947</span></span><br><span class="line">#<span class="number">0x7fe8bc72a437</span>github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+<span class="number">0x227</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.<span class="keyword">go</span>:<span class="number">103</span></span><br><span class="line">#<span class="number">0x7fe8bcb849e1</span>github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+<span 
class="number">0x71</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line">#<span class="number">0x7fe8bc4bc67d</span>net/http.serverHandler.ServeHTTP+<span class="number">0xbd</span>/usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">2694</span></span><br><span class="line">#<span class="number">0x7fe8bc4b88a2</span>net/http.(*conn).serve+<span class="number">0x652</span>/usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1830</span></span><br><span class="line"></span><br><span class="line"><span class="number">1</span> @ <span class="number">0x7fe8bc202980</span> <span class="number">0x7fe8bc212946</span> <span class="number">0x7fe8bc8b6881</span> <span class="number">0x7fe8bc8b699d</span> <span class="number">0x7fe8bc8e259b</span> <span class="number">0x7fe8bc8e1695</span> <span class="number">0x7fe8bc8c47d5</span> <span class="number">0x7fe8bd2e0c06</span> <span class="number">0x7fe8bd2eda96</span> <span class="number">0x7fe8bc8c42fb</span> <span class="number">0x7fe8bc8c4613</span> <span class="number">0x7fe8bd2a6474</span> <span class="number">0x7fe8bd2e6976</span> <span class="number">0x7fe8bd3661c5</span> <span class="number">0x7fe8bd56842f</span> <span class="number">0x7fe8bcea7bdb</span> <span class="number">0x7fe8bcea9f6b</span> <span class="number">0x7fe8bcb726ca</span> <span class="number">0x7fe8bcb72b01</span> <span class="number">0x7fe8bc71c26b</span> <span class="number">0x7fe8bcb85f4a</span> <span class="number">0x7fe8bc4b9896</span> <span class="number">0x7fe8bc72a438</span> <span class="number">0x7fe8bcb849e2</span> <span class="number">0x7fe8bc4bc67e</span> <span class="number">0x7fe8bc4b88a3</span> <span class="number">0x7fe8bc230711</span></span><br><span class="line">#<span 
class="number">0x7fe8bc8b6880</span>github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).waitOnHeader+<span class="number">0x100</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.<span class="keyword">go</span>:<span class="number">222</span></span><br><span class="line">#<span class="number">0x7fe8bc8b699c</span>github.com/docker/docker/vendor/google.golang.org/grpc/transport.(*Stream).RecvCompress+<span class="number">0x2c</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/transport/transport.<span class="keyword">go</span>:<span class="number">233</span></span><br><span class="line">#<span class="number">0x7fe8bc8e259a</span>github.com/docker/docker/vendor/google.golang.org/grpc.(*csAttempt).recvMsg+<span class="number">0x63a</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.<span class="keyword">go</span>:<span class="number">515</span></span><br><span class="line">#<span class="number">0x7fe8bc8e1694</span>github.com/docker/docker/vendor/google.golang.org/grpc.(*clientStream).RecvMsg+<span class="number">0x44</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/stream.<span class="keyword">go</span>:<span class="number">395</span></span><br><span class="line">#<span class="number">0x7fe8bc8c47d4</span>github.com/docker/docker/vendor/google.golang.org/grpc.invoke+<span class="number">0x184</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.<span class="keyword">go</span>:<span class="number">83</span></span><br><span class="line">#<span class="number">0x7fe8bd2e0c05</span>github.com/docker/docker/vendor/github.com/containerd/containerd.namespaceInterceptor.unary+<span 
class="number">0xf5</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.<span class="keyword">go</span>:<span class="number">35</span></span><br><span class="line">#<span class="number">0x7fe8bd2eda95</span>github.com/docker/docker/vendor/github.com/containerd/containerd.(namespaceInterceptor).(github.com/docker/docker/vendor/github.com/containerd/containerd.unary)-fm+<span class="number">0xf5</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/grpc.<span class="keyword">go</span>:<span class="number">51</span></span><br><span class="line">#<span class="number">0x7fe8bc8c42fa</span>github.com/docker/docker/vendor/google.golang.org/grpc.(*ClientConn).Invoke+<span class="number">0x10a</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.<span class="keyword">go</span>:<span class="number">35</span></span><br><span class="line">#<span class="number">0x7fe8bc8c4612</span>github.com/docker/docker/vendor/google.golang.org/grpc.Invoke+<span class="number">0xc2</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/google.golang.org/grpc/call.<span class="keyword">go</span>:<span class="number">60</span></span><br><span class="line">#<span class="number">0x7fe8bd2a6473</span>github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1.(*tasksClient).Start+<span class="number">0xd3</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.<span class="keyword">go</span>:<span class="number">421</span></span><br><span class="line">#<span class="number">0x7fe8bd2e6975</span>github.com/docker/docker/vendor/github.com/containerd/containerd.(*process).Start+<span 
class="number">0xf5</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/containerd/containerd/process.<span class="keyword">go</span>:<span class="number">109</span></span><br><span class="line">#<span class="number">0x7fe8bd3661c4</span>github.com/docker/docker/libcontainerd.(*client).Exec+<span class="number">0x4b4</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/libcontainerd/client_daemon.<span class="keyword">go</span>:<span class="number">381</span></span><br><span class="line">#<span class="number">0x7fe8bd56842e</span>github.com/docker/docker/daemon.(*Daemon).ContainerExecStart+<span class="number">0xb4e</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/daemon/exec.<span class="keyword">go</span>:<span class="number">251</span></span><br><span class="line">#<span class="number">0x7fe8bcea7bda</span>github.com/docker/docker/api/server/router/container.(*containerRouter).postContainerExecStart+<span class="number">0x34a</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/exec.<span class="keyword">go</span>:<span class="number">125</span></span><br><span class="line">#<span class="number">0x7fe8bcea9f6a</span>github.com/docker/docker/api/server/router/container.(*containerRouter).(github.com/docker/docker/api/server/router/container.postContainerExecStart)-fm+<span class="number">0x6a</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router/container/container.<span class="keyword">go</span>:<span class="number">59</span></span><br><span class="line">#<span class="number">0x7fe8bcb726c9</span>github.com/docker/docker/api/server/middleware.ExperimentalMiddleware.WrapHandler.func1+<span class="number">0xd9</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/experimental.<span class="keyword">go</span>:<span 
class="number">26</span></span><br><span class="line">#<span class="number">0x7fe8bcb72b00</span>github.com/docker/docker/api/server/middleware.VersionMiddleware.WrapHandler.func1+<span class="number">0x400</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/middleware/version.<span class="keyword">go</span>:<span class="number">62</span></span><br><span class="line">#<span class="number">0x7fe8bc71c26a</span>github.com/docker/docker/pkg/authorization.(*Middleware).WrapHandler.func1+<span class="number">0x7aa</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/pkg/authorization/middleware.<span class="keyword">go</span>:<span class="number">59</span></span><br><span class="line">#<span class="number">0x7fe8bcb85f49</span>github.com/docker/docker/api/server.(*Server).makeHTTPHandler.func1+<span class="number">0x199</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/server.<span class="keyword">go</span>:<span class="number">141</span></span><br><span class="line">#<span class="number">0x7fe8bc4b9895</span>net/http.HandlerFunc.ServeHTTP+<span class="number">0x45</span>/usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1947</span></span><br><span class="line">#<span class="number">0x7fe8bc72a437</span>github.com/docker/docker/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+<span class="number">0x227</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/vendor/github.com/gorilla/mux/mux.<span class="keyword">go</span>:<span class="number">103</span></span><br><span class="line">#<span class="number">0x7fe8bcb849e1</span>github.com/docker/docker/api/server.(*routerSwapper).ServeHTTP+<span class="number">0x71</span>/root/rpmbuild/BUILD/src/engine/.gopath/src/github.com/docker/docker/api/server/router_swapper.<span class="keyword">go</span>:<span class="number">29</span></span><br><span 
class="line">#<span class="number">0x7fe8bc4bc67d</span>net/http.serverHandler.ServeHTTP+<span class="number">0xbd</span>/usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">2694</span></span><br><span class="line">#<span class="number">0x7fe8bc4b88a2</span>net/http.(*conn).serve+<span class="number">0x652</span>/usr/local/<span class="keyword">go</span>/src/net/http/server.<span class="keyword">go</span>:<span class="number">1830</span></span><br></pre></td></tr></table></figure><p>Note that this is a trimmed-down dump of the docker daemon goroutine stacks. From this snapshot we can draw the following conclusions:</p><ul><li>717594 goroutines are blocked in docker inspect</li><li>4175 goroutines are blocked in docker stats</li><li>1 goroutine is blocked fetching the task ID of a docker exec</li><li>1 goroutine is blocked in the execution of a docker exec</li></ul><p>These conclusions largely explain why the abnormal container hangs: a docker exec (4) issued in the container never returned, which blocked the goroutine fetching the docker exec task ID (3); and since (3) holds the container lock, docker inspect (1) and docker stats (2) got stuck in turn. So the real culprit is not docker inspect but docker exec.</p><p>Before digging further, some background is in order. The complete call path that kubelet takes to start a container, or to execute a command inside one, is as follows:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line">+--------------------------------------------------------------+</span><br><span class="line">|                                                              
|</span><br><span class="line">|   +------------+                                             |</span><br><span class="line">|   |            |                                             |</span><br><span class="line">|   |   kubelet  |                                             |</span><br><span class="line">|   |            |                                             |</span><br><span class="line">|   +------|-----+                                             |</span><br><span class="line">|          |                                                   |</span><br><span class="line">|          |                                                   |</span><br><span class="line">|   +------v-----+       +---------------+                     |</span><br><span class="line">|   |            |       |               |                     |</span><br><span class="line">|   |   dockerd  -------&gt;|  containerd   |                     |</span><br><span class="line">|   |            |       |               |                     |</span><br><span class="line">|   +------------+       +-------|-------+                     |</span><br><span class="line">|                                |                             |</span><br><span class="line">|                                |                             |</span><br><span class="line">|                        +-------v-------+     +-----------+   |</span><br><span class="line">|                        |               |     |           |   |</span><br><span class="line">|                        |containerd-shim-----&gt;|   runc    |   |</span><br><span class="line">|                        |               |     |           |   |</span><br><span class="line">|                        +---------------+     +-----------+   |</span><br><span class="line">|                                                              |</span><br><span 
class="line">+--------------------------------------------------------------+</span><br></pre></td></tr></table></figure><p>dockerd and containerd can be regarded as two layers of nginx-like proxies, containerd-shim is the container&#39;s babysitter, and runc is the tool that actually starts containers and executes commands. What runc does is quite simple: create namespaces according to the user-supplied config, or enter specific namespaces, and then execute the user&#39;s command. Put plainly, creating a container just means creating new namespaces and then executing the user-specified command inside them.</p><p>With this background in place, we continue down the chain into containerd. Fortunately, pprof again lets us sketch containerd&#39;s state at this moment:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">goroutine profile: total <span class="number">430</span></span><br><span class="line"><span class="number">1</span> @ <span class="number">0x7f6e55f82740</span> <span class="number">0x7f6e55f92616</span> <span class="number">0x7f6e56a8412c</span> <span class="number">0x7f6e56a83d6d</span> <span class="number">0x7f6e56a911bf</span> <span class="number">0x7f6e56ac6e3b</span> <span class="number">0x7f6e565093de</span> <span class="number">0x7f6e5650dd3b</span> <span class="number">0x7f6e5650392b</span> <span class="number">0x7f6e56b51216</span> <span class="number">0x7f6e564e5909</span> <span class="number">0x7f6e563ec76a</span> <span class="number">0x7f6e563f000a</span> <span class="number">0x7f6e563f6791</span> <span class="number">0x7f6e55fb0151</span></span><br><span class="line">#<span class="number">0x7f6e56a8412b</span>github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).dispatch+<span class="number">0x24b</span>/<span class="keyword">go</span>/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.<span
class="keyword">go</span>:<span class="number">102</span></span><br><span class="line">#<span class="number">0x7f6e56a83d6c</span>github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc.(*Client).Call+<span class="number">0x15c</span>/<span class="keyword">go</span>/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.<span class="keyword">go</span>:<span class="number">73</span></span><br><span class="line">#<span class="number">0x7f6e56a911be</span>github.com/containerd/containerd/linux/shim/v1.(*shimClient).Start+<span class="number">0xbe</span>/<span class="keyword">go</span>/src/github.com/containerd/containerd/linux/shim/v1/shim.pb.<span class="keyword">go</span>:<span class="number">1745</span></span><br><span class="line">#<span class="number">0x7f6e56ac6e3a</span>github.com/containerd/containerd/linux.(*Process).Start+<span class="number">0x8a</span>/<span class="keyword">go</span>/src/github.com/containerd/containerd/linux/process.<span class="keyword">go</span>:<span class="number">125</span></span><br><span class="line">#<span class="number">0x7f6e565093dd</span>github.com/containerd/containerd/services/tasks.(*local).Start+<span class="number">0x14d</span>/<span class="keyword">go</span>/src/github.com/containerd/containerd/services/tasks/local.<span class="keyword">go</span>:<span class="number">187</span></span><br><span class="line">#<span class="number">0x7f6e5650dd3a</span>github.com/containerd/containerd/services/tasks.(*service).Start+<span class="number">0x6a</span>/<span class="keyword">go</span>/src/github.com/containerd/containerd/services/tasks/service.<span class="keyword">go</span>:<span class="number">72</span></span><br><span class="line">#<span class="number">0x7f6e5650392a</span>github.com/containerd/containerd/api/services/tasks/v1._Tasks_Start_Handler.func1+<span class="number">0x8a</span>/<span class="keyword">go</span>/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.<span 
class="keyword">go</span>:<span class="number">624</span></span><br><span class="line">#<span class="number">0x7f6e56b51215</span>github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/<span class="keyword">go</span>-grpc-prometheus.UnaryServerInterceptor+<span class="number">0xa5</span>/<span class="keyword">go</span>/src/github.com/containerd/containerd/vendor/github.com/grpc-ecosystem/<span class="keyword">go</span>-grpc-prometheus/server.<span class="keyword">go</span>:<span class="number">29</span></span><br><span class="line">#<span class="number">0x7f6e564e5908</span>github.com/containerd/containerd/api/services/tasks/v1._Tasks_Start_Handler+<span class="number">0x168</span>/<span class="keyword">go</span>/src/github.com/containerd/containerd/api/services/tasks/v1/tasks.pb.<span class="keyword">go</span>:<span class="number">626</span></span><br><span class="line">#<span class="number">0x7f6e563ec769</span>github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).processUnaryRPC+<span class="number">0x849</span>/<span class="keyword">go</span>/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.<span class="keyword">go</span>:<span class="number">920</span></span><br><span class="line">#<span class="number">0x7f6e563f0009</span>github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).handleStream+<span class="number">0x1319</span>/<span class="keyword">go</span>/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.<span class="keyword">go</span>:<span class="number">1142</span></span><br><span class="line">#<span class="number">0x7f6e563f6790</span>github.com/containerd/containerd/vendor/google.golang.org/grpc.(*Server).serveStreams.func1<span class="number">.1</span>+<span class="number">0xa0</span>/<span class="keyword">go</span>/src/github.com/containerd/containerd/vendor/google.golang.org/grpc/server.<span class="keyword">go</span>:<span 
class="number">637</span></span><br></pre></td></tr></table></figure><p>Again, only the key goroutines are kept. From the stack above we can see that containerd is blocked waiting to receive the result of the exec; the relevant code confirms it:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(c *Client)</span> <span class="title">dispatch</span><span class="params">(ctx context.Context, req *Request, resp *Response)</span> <span class="title">error</span></span> &#123;</span><br><span class="line">   errs := <span class="built_in">make</span>(<span class="keyword">chan</span> error, <span class="number">1</span>)</span><br><span class="line">   call := &amp;callRequest&#123;</span><br><span class="line">      req:  req,</span><br><span class="line">      resp: resp,</span><br><span class="line">      errs: errs,</span><br><span class="line">   &#125;</span><br><span class="line"></span><br><span class="line">   <span class="keyword">select</span> &#123;</span><br><span class="line">   <span class="keyword">case</span> c.calls &lt;- call:</span><br><span class="line">   <span class="keyword">case</span> &lt;-c.done:</span><br><span class="line">      <span class="keyword">return</span> c.err</span><br><span class="line">   &#125;</span><br><span class="line"></span><br><span
class="line">   <span class="keyword">select</span> &#123;        <span class="comment">// this select corresponds to the stack frame above: /go/src/github.com/containerd/containerd/vendor/github.com/stevvooe/ttrpc/client.go:102</span></span><br><span class="line">   <span class="keyword">case</span> err := &lt;-errs:</span><br><span class="line">      <span class="keyword">return</span> filterCloseErr(err)</span><br><span class="line">   <span class="keyword">case</span> &lt;-c.done:</span><br><span class="line">      <span class="keyword">return</span> c.err</span><br><span class="line">   &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>After passing the request on to containerd-shim, containerd keeps waiting for containerd-shim to reply.</p><p>Normally, if we could walk the call chain and analyze each component&#39;s goroutine stacks one by one, we would quickly pinpoint the problem. Unfortunately, docker in production was not started in debug mode, so we cannot collect pprof data from containerd-shim, and runc does not expose pprof either. Locating the problem purely through goroutine call chains is therefore a dead end.</p><p>Still, we have gathered several key facts so far, and the scope of the investigation has narrowed further, to the segment between containerd-shim and runc. Next, we continue from a different angle.</p><h4 id="进程排查"><a href="#进程排查" class="headerlink" title="进程排查"></a>Process investigation</h4><p>When the components&#39; runtime state can no longer be obtained, we change perspective and examine the container&#39;s runtime state instead, that is, the current state of the abnormal container&#39;s processes.</p><p>Since docker ps still works while docker inspect hangs, we first locate the abnormal container with the following command (the loop prints each container ID and then stalls on the one whose inspect hangs):</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">docker ps | grep -v NAME | awk <span class="string">&#x27;&#123;print $1&#125;&#x27;</span> | while read cid; do echo $cid; docker inspect -f &#123;&#123;.State.Pid&#125;&#125; $cid; done</span><br></pre></td></tr></table></figure><p>With the abnormal container&#39;s ID in hand, we can scan all processes related to that container:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">UID      PID    PPID  C STIME TTY          TIME CMD</span><br><span class="line">root     <span class="number">11646</span>  <span class="number">6655</span>  <span
class="number">0</span> Jun17 ?        <span class="number">00</span>:<span class="number">01</span>:<span class="number">04</span> docker-containerd-shim -namespace moby -workdir /home/docker_rt/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5 -address /<span class="keyword">var</span>/run/docker/containerd/docker-containerd.sock -containerd-binary /usr/bin/docker-containerd -runtime-root /<span class="keyword">var</span>/run/docker/runtime-runc</span><br><span class="line">root     <span class="number">11680</span> <span class="number">11646</span>  <span class="number">0</span> Jun17 ?        <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> /dockerinit</span><br><span class="line">root     <span class="number">15581</span> <span class="number">11646</span>  <span class="number">0</span> Jun17 ?        <span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> docker-runc --root /<span class="keyword">var</span>/run/docker/runtime-runc/moby --log /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/log.json --log-format json exec --process /tmp/runc-process616674997 --detach --pid-file /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/<span class="number">0594</span>c5897a41d401e4d1d7ddd44dacdd316c7e7d53bfdae7f16b0f6b26fcbcda.pid bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5</span><br><span class="line">root     <span class="number">15638</span> <span class="number">15581</span>  <span class="number">0</span> Jun17 ?        
<span class="number">00</span>:<span class="number">00</span>:<span class="number">00</span> docker-runc init</span><br></pre></td></tr></table></figure><p>The core processes are listed above; a few notes:</p><ul><li>6655: the containerd process</li><li>11646: the containerd-shim process of the abnormal container</li><li>11680: the abnormal container&#39;s startup process; viewed from inside the container, its PID is 1 because of PID namespace isolation</li><li>15581: the process executing a user command inside the abnormal container, which at this point has not yet entered the container</li><li>15638: the process that enters the container&#39;s namespaces when the user command is executed inside the abnormal container</li></ul><p>One more piece of background here: when we start a container, a runc init process is created first, which creates and enters the container&#39;s new namespaces; when we execute a command inside a container, a runc init process is likewise created first, which enters the container&#39;s existing namespaces. Only after entering the container&#39;s isolated namespaces is the user-specified command executed.</p><p>From the process list alone we cannot tell at a glance which process caused the problem, so we also need to know what state each process is currently in. With strace, we examine the processes&#39; activity one by one:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// 11646 
(container-shim)</span></span><br><span class="line">Process <span class="number">11646</span> attached with <span class="number">10</span> threads</span><br><span class="line">[pid <span class="number">37342</span>] epoll_pwait(<span class="number">5</span>,  &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">11656</span>] futex(<span class="number">0x818cc0</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">11655</span>] restart_syscall(&lt;... resuming interrupted call ...&gt; &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">11654</span>] futex(<span class="number">0x818bd8</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">11653</span>] futex(<span class="number">0x7fc730</span>, FUTEX_WAKE, <span class="number">1</span> &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">11651</span>] futex(<span class="number">0xc4200b4148</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">11650</span>] futex(<span class="number">0xc420082948</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">11649</span>] futex(<span class="number">0xc420082548</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">11647</span>] restart_syscall(&lt;... resuming interrupted call ...&gt; &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">11646</span>] futex(<span class="number">0x7fd008</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">11653</span>] &lt;... 
futex resumed&gt; )       = <span class="number">0</span></span><br><span class="line">[pid <span class="number">11647</span>] &lt;... restart_syscall resumed&gt; ) = <span class="number">-1</span> EAGAIN (Resource temporarily unavailable)</span><br><span class="line">[pid <span class="number">11653</span>] epoll_wait(<span class="number">4</span>,  &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">11647</span>] pselect6(<span class="number">0</span>, NULL, NULL, NULL, &#123;<span class="number">0</span>, <span class="number">20000</span>&#125;, <span class="number">0</span>) = <span class="number">0</span> (Timeout)</span><br><span class="line">[pid <span class="number">11647</span>] futex(<span class="number">0x7fc730</span>, FUTEX_WAIT, <span class="number">0</span>, &#123;<span class="number">60</span>, <span class="number">0</span>&#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">// 15581 (runc exec)</span></span><br><span class="line">Process <span class="number">15581</span> attached with <span class="number">7</span> threads</span><br><span class="line">[pid <span class="number">15619</span>] read(<span class="number">6</span>,  &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">15592</span>] futex(<span class="number">0xc4200be148</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">15591</span>] futex(<span class="number">0x7fd6d25f6238</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">15590</span>] futex(<span class="number">0xc420084d48</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">15586</span>] futex(<span class="number">0x7fd6d25f6320</span>, FUTEX_WAIT, <span 
class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">15584</span>] restart_syscall(&lt;... resuming interrupted call ...&gt; &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">15581</span>] futex(<span class="number">0x7fd6d25d9b28</span>, FUTEX_WAIT, <span class="number">0</span>, NULL</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">// 15638 (runc init)</span></span><br><span class="line">Process <span class="number">15638</span> attached with <span class="number">7</span> threads</span><br><span class="line">[pid <span class="number">15648</span>] futex(<span class="number">0x7f512cea5320</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">15647</span>] futex(<span class="number">0x7f512cea5238</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">15645</span>] futex(<span class="number">0xc4200bc148</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">15643</span>] futex(<span class="number">0xc420082d48</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">15642</span>] futex(<span class="number">0xc420082948</span>, FUTEX_WAIT, <span class="number">0</span>, NULL &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">15639</span>] restart_syscall(&lt;... 
resuming interrupted call ...&gt; &lt;unfinished ...&gt;</span><br><span class="line">[pid <span class="number">15638</span>] write(<span class="number">2</span>, <span class="string">&quot;/usr/local/go/src/runtime/proc.g&quot;</span>..., <span class="number">33</span></span><br></pre></td></tr></table></figure><p>From the activity of the related processes, we can draw the following conclusions:</p><ul><li>runc exec is blocked waiting to read from FD 6</li><li>runc init is blocked waiting to write to FD 2</li></ul><p>Which files do these FDs actually map to? We can check with lsof:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span 
class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// 15638 (runc init)</span></span><br><span class="line">COMMAND     PID USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root  cwd       DIR               <span class="number">0</span>,<span class="number">41</span>      <span class="number">192</span> <span class="number">1066743071</span> /</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root  rtd       DIR               <span class="number">0</span>,<span class="number">41</span>      <span class="number">192</span> <span class="number">1066743071</span> /</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root  txt       REG                <span class="number">0</span>,<span class="number">4</span>  <span class="number">7644224</span> <span class="number">1070360467</span> /memfd:runc_cloned:/proc/self/exe (deleted)</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root  mem       REG                <span class="number">8</span>,<span class="number">3</span>  <span class="number">2107816</span>    <span class="number">1053962</span> /usr/lib64/libc<span class="number">-2.17</span>.so</span><br><span 
class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root  mem       REG                <span class="number">8</span>,<span class="number">3</span>    <span class="number">19512</span>    <span class="number">1054285</span> /usr/lib64/libdl<span class="number">-2.17</span>.so</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root  mem       REG                <span class="number">8</span>,<span class="number">3</span>   <span class="number">266688</span>    <span class="number">1050626</span> /usr/lib64/libseccomp.so<span class="number">.2</span><span class="number">.3</span><span class="number">.1</span></span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root  mem       REG                <span class="number">8</span>,<span class="number">3</span>   <span class="number">142296</span>    <span class="number">1055698</span> /usr/lib64/libpthread<span class="number">-2.17</span>.so</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root  mem       REG                <span class="number">8</span>,<span class="number">3</span>    <span class="number">27168</span>    <span class="number">3024893</span> /usr/local/gundam/gundam_client/preload/lib64/gundam_preload.so</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root  mem       REG                <span class="number">8</span>,<span class="number">3</span>   <span class="number">164432</span>    <span class="number">1054515</span> /usr/lib64/ld<span class="number">-2.17</span>.so</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root    <span class="number">0</span>r     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span class="number">0</span>t0 <span class="number">1070361745</span> 
pipe</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root    <span class="number">1</span>w     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span class="number">0</span>t0 <span class="number">1070361746</span> pipe</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root    <span class="number">2</span>w     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span class="number">0</span>t0 <span class="number">1070361747</span> pipe</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root    <span class="number">3</span>u     unix <span class="number">0xffff881ff8273000</span>      <span class="number">0</span>t0 <span class="number">1070361341</span> socket</span><br><span class="line">runc:[<span class="number">2</span>:I <span class="number">15638</span> root    <span class="number">5</span>u  a_inode                <span class="number">0</span>,<span class="number">9</span>        <span class="number">0</span>       <span class="number">7180</span> [eventpoll]</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">// 15581 (runc exec)</span></span><br><span class="line">COMMAND     PID USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME</span><br><span class="line">docker-ru <span class="number">15581</span> root  cwd       DIR               <span class="number">0</span>,<span class="number">18</span>      <span class="number">120</span> <span class="number">1066743076</span> /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5</span><br><span class="line">docker-ru <span class="number">15581</span> root  rtd       DIR                <span class="number">8</span>,<span class="number">3</span>    
 <span class="number">4096</span>          <span class="number">2</span> /</span><br><span class="line">docker-ru <span class="number">15581</span> root  txt       REG                <span class="number">8</span>,<span class="number">3</span>  <span class="number">7644224</span>     <span class="number">919775</span> /usr/bin/docker-runc</span><br><span class="line">docker-ru <span class="number">15581</span> root  mem       REG                <span class="number">8</span>,<span class="number">3</span>  <span class="number">2107816</span>    <span class="number">1053962</span> /usr/lib64/libc<span class="number">-2.17</span>.so</span><br><span class="line">docker-ru <span class="number">15581</span> root  mem       REG                <span class="number">8</span>,<span class="number">3</span>    <span class="number">19512</span>    <span class="number">1054285</span> /usr/lib64/libdl<span class="number">-2.17</span>.so</span><br><span class="line">docker-ru <span class="number">15581</span> root  mem       REG                <span class="number">8</span>,<span class="number">3</span>   <span class="number">266688</span>    <span class="number">1050626</span> /usr/lib64/libseccomp.so<span class="number">.2</span><span class="number">.3</span><span class="number">.1</span></span><br><span class="line">docker-ru <span class="number">15581</span> root  mem       REG                <span class="number">8</span>,<span class="number">3</span>   <span class="number">142296</span>    <span class="number">1055698</span> /usr/lib64/libpthread<span class="number">-2.17</span>.so</span><br><span class="line">docker-ru <span class="number">15581</span> root  mem       REG                <span class="number">8</span>,<span class="number">3</span>    <span class="number">27168</span>    <span class="number">3024893</span> /usr/local/gundam/gundam_client/preload/lib64/gundam_preload.so</span><br><span class="line">docker-ru <span class="number">15581</span> root  mem       REG      
          <span class="number">8</span>,<span class="number">3</span>   <span class="number">164432</span>    <span class="number">1054515</span> /usr/lib64/ld<span class="number">-2.17</span>.so</span><br><span class="line">docker-ru <span class="number">15581</span> root    <span class="number">0</span>r     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span class="number">0</span>t0 <span class="number">1070361745</span> pipe</span><br><span class="line">docker-ru <span class="number">15581</span> root    <span class="number">1</span>w     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span class="number">0</span>t0 <span class="number">1070361746</span> pipe</span><br><span class="line">docker-ru <span class="number">15581</span> root    <span class="number">2</span>w     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span class="number">0</span>t0 <span class="number">1070361747</span> pipe</span><br><span class="line">docker-ru <span class="number">15581</span> root    <span class="number">3</span>w      REG               <span class="number">0</span>,<span class="number">18</span>     <span class="number">5456</span> <span class="number">1066709902</span> /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/log.json</span><br><span class="line">docker-ru <span class="number">15581</span> root    <span class="number">4</span>u  a_inode                <span class="number">0</span>,<span class="number">9</span>        <span class="number">0</span>       <span class="number">7180</span> [eventpoll]</span><br><span class="line">docker-ru <span class="number">15581</span> root    <span class="number">6</span>u     unix <span class="number">0xffff881ff8275400</span>      <span class="number">0</span>t0 <span class="number">1070361342</span> socket</span><br><span 
class="line"></span><br><span class="line"></span><br><span class="line"><span class="comment">// 11646 (container-shim)</span></span><br><span class="line">COMMAND     PID USER   FD      TYPE             DEVICE SIZE/OFF       NODE NAME</span><br><span class="line">docker-co <span class="number">11646</span> root  cwd       DIR               <span class="number">0</span>,<span class="number">18</span>      <span class="number">120</span> <span class="number">1066743076</span> /run/docker/containerd/daemon/io.containerd.runtime.v1.linux/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5</span><br><span class="line">docker-co <span class="number">11646</span> root  rtd       DIR                <span class="number">8</span>,<span class="number">3</span>     <span class="number">4096</span>          <span class="number">2</span> /</span><br><span class="line">docker-co <span class="number">11646</span> root  txt       REG                <span class="number">8</span>,<span class="number">3</span>  <span class="number">4173632</span>     <span class="number">919772</span> /usr/bin/docker-containerd-shim</span><br><span class="line">docker-co <span class="number">11646</span> root    <span class="number">0</span>r      CHR                <span class="number">1</span>,<span class="number">3</span>      <span class="number">0</span>t0       <span class="number">2052</span> /dev/null</span><br><span class="line">docker-co <span class="number">11646</span> root    <span class="number">1</span>w      CHR                <span class="number">1</span>,<span class="number">3</span>      <span class="number">0</span>t0       <span class="number">2052</span> /dev/null</span><br><span class="line">docker-co <span class="number">11646</span> root    <span class="number">2</span>w      CHR                <span class="number">1</span>,<span class="number">3</span>      <span class="number">0</span>t0       <span class="number">2052</span> /dev/null</span><br><span 
class="line">docker-co <span class="number">11646</span> root    <span class="number">3</span>r     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span class="number">0</span>t0 <span class="number">1070361745</span> pipe</span><br><span class="line">docker-co <span class="number">11646</span> root    <span class="number">4</span>u  a_inode                <span class="number">0</span>,<span class="number">9</span>        <span class="number">0</span>       <span class="number">7180</span> [eventpoll]</span><br><span class="line">docker-co <span class="number">11646</span> root    <span class="number">5</span>u  a_inode                <span class="number">0</span>,<span class="number">9</span>        <span class="number">0</span>       <span class="number">7180</span> [eventpoll]</span><br><span class="line">docker-co <span class="number">11646</span> root    <span class="number">6</span>u     unix <span class="number">0xffff881e8cac2800</span>      <span class="number">0</span>t0 <span class="number">1066743079</span> @/containerd-shim/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/shim.sock</span><br><span class="line">docker-co <span class="number">11646</span> root    <span class="number">7</span>u     unix <span class="number">0xffff881e8cac3400</span>      <span class="number">0</span>t0 <span class="number">1066743968</span> @/containerd-shim/moby/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/shim.sock</span><br><span class="line">docker-co <span class="number">11646</span> root    <span class="number">8</span>r     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span class="number">0</span>t0 <span class="number">1066743970</span> pipe</span><br><span class="line">docker-co <span class="number">11646</span> root    <span class="number">9</span>w     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span 
class="number">0</span>t0 <span class="number">1070361745</span> pipe</span><br><span class="line">docker-co <span class="number">11646</span> root   <span class="number">10</span>r     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span class="number">0</span>t0 <span class="number">1066743971</span> pipe</span><br><span class="line">docker-co <span class="number">11646</span> root   <span class="number">11</span>u     FIFO               <span class="number">0</span>,<span class="number">18</span>      <span class="number">0</span>t0 <span class="number">1066700778</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout</span><br><span class="line">docker-co <span class="number">11646</span> root   <span class="number">12</span>r     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span class="number">0</span>t0 <span class="number">1066743972</span> pipe</span><br><span class="line">docker-co <span class="number">11646</span> root   <span class="number">13</span>w     FIFO               <span class="number">0</span>,<span class="number">18</span>      <span class="number">0</span>t0 <span class="number">1066700778</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout</span><br><span class="line">docker-co <span class="number">11646</span> root   <span class="number">14</span>u     FIFO               <span class="number">0</span>,<span class="number">18</span>      <span class="number">0</span>t0 <span class="number">1066700778</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout</span><br><span class="line">docker-co <span class="number">11646</span> root   <span class="number">15</span>r     FIFO               <span class="number">0</span>,<span class="number">18</span>      <span class="number">0</span>t0 <span 
class="number">1066700778</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stdout</span><br><span class="line">docker-co <span class="number">11646</span> root   <span class="number">16</span>u     FIFO               <span class="number">0</span>,<span class="number">18</span>      <span class="number">0</span>t0 <span class="number">1066700779</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr</span><br><span class="line">docker-co <span class="number">11646</span> root   <span class="number">17</span>w     FIFO               <span class="number">0</span>,<span class="number">18</span>      <span class="number">0</span>t0 <span class="number">1066700779</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr</span><br><span class="line">docker-co <span class="number">11646</span> root   <span class="number">18</span>u     FIFO               <span class="number">0</span>,<span class="number">18</span>      <span class="number">0</span>t0 <span class="number">1066700779</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr</span><br><span class="line">docker-co <span class="number">11646</span> root   <span class="number">19</span>r     FIFO               <span class="number">0</span>,<span class="number">18</span>      <span class="number">0</span>t0 <span class="number">1066700779</span> /run/docker/containerd/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/init-stderr</span><br><span class="line">docker-co <span class="number">11646</span> root   <span class="number">20</span>r     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span class="number">0</span>t0 <span class="number">1070361746</span> pipe</span><br><span class="line">docker-co <span class="number">11646</span> root   <span 
class="number">26</span>r     FIFO                <span class="number">0</span>,<span class="number">8</span>      <span class="number">0</span>t0 <span class="number">1070361747</span> pipe</span><br></pre></td></tr></table></figure><p>Careful readers who combine the strace and lsof results can already draw the conclusion themselves.</p><p>runc init is blocked writing to FD 2, and FD 2 is a pipe. A Linux pipe has a default buffer capacity: when the data written exceeds that capacity while the read end is not draining it, the writer blocks.</p><p>To summarize: containerd-shim starts runc exec to run the user command inside the container; when runc exec starts runc init to enter the container, runc init blocks because it writes more data to FD 2 than the pipe buffer can hold. With the bottom-most runc init blocked, every process along the call chain is blocked as well:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">runc init → runc exec → containerd-shim exec → containerd exec → dockerd exec</span><br></pre></td></tr></table></figure><p>At this point we understand why docker hangs. However, the following questions remain unresolved:</p><ul><li>Why would runc init write more data to FD 2 (os.Stderr in Go) than the Linux pipe capacity allows?</li><li>Why does the runc init problem occur only in specific containers?</li></ul><p>If runc init normally needed to write large amounts of data to os.Stdout or os.Stderr, then every container creation would fail. So we can be sure that some unknown condition in this particular container caused runc init to unexpectedly write a large amount of data to os.Stderr, and the data runc init is writing to os.Stderr is very likely to reveal that unexpected anomaly.</p><p>Therefore, we need to capture the data runc init is currently writing. Since runc init's FD 2 is an anonymous pipe, we cannot fetch the pipe's contents the way we read a regular file. Thanks to He Ge for blazing this trail and finding a way to read the contents of an anonymous pipe:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span 
class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span 
class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br></pre></td><td class="code"><pre><span class="line"># cat /proc/<span class="number">15638</span>/fd/<span class="number">2</span></span><br><span class="line">runtime/cgo: pthread_create failed: Resource temporarily unavailable</span><br><span class="line">SIGABRT: abort</span><br><span class="line">PC=<span class="number">0x7f512b7365f7</span> m=<span class="number">0</span> sigcode=<span class="number">18446744073709551610</span></span><br><span class="line"></span><br><span class="line">goroutine <span class="number">0</span> [idle]:</span><br><span class="line">runtime: unknown pc <span class="number">0x7f512b7365f7</span></span><br><span class="line">stack: frame=&#123;sp:<span class="number">0x7ffe1121a658</span>, fp:<span class="number">0x0</span>&#125; stack=[<span 
class="number">0x7ffe0ae1bb28</span>,<span class="number">0x7ffe1121ab50</span>)</span><br><span class="line"><span class="number">00007</span>ffe1121a558:  <span class="number">00007</span>ffe1121a6d8  <span class="number">00007</span>ffe1121a6b0</span><br><span class="line"><span class="number">00007</span>ffe1121a568:  <span class="number">0000000000000001</span>  <span class="number">00007</span>f512c527660</span><br><span class="line"><span class="number">00007</span>ffe1121a578:  <span class="number">00007</span>f512c54d560  <span class="number">00007</span>f512c54d208</span><br><span class="line"><span class="number">00007</span>ffe1121a588:  <span class="number">00007</span>f512c333e6f  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a598:  <span class="number">00007</span>f512c527660  <span class="number">0000000000000005</span></span><br><span class="line"><span class="number">00007</span>ffe1121a5a8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000001</span></span><br><span class="line"><span class="number">00007</span>ffe1121a5b8:  <span class="number">00007</span>f512c54d208  <span class="number">00007</span>f512c528000</span><br><span class="line"><span class="number">00007</span>ffe1121a5c8:  <span class="number">00007</span>ffe1121a600  <span class="number">00007</span>f512b704b0c</span><br><span class="line"><span class="number">00007</span>ffe1121a5d8:  <span class="number">00007</span>f512b7110c0  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a5e8:  <span class="number">00007</span>f512c54d560  <span class="number">00007</span>ffe1121a620</span><br><span class="line"><span class="number">00007</span>ffe1121a5f8:  <span class="number">00007</span>ffe1121a610  <span class="number">000000000</span>f11ed7d</span><br><span class="line"><span class="number">00007</span>ffe1121a608:  <span 
class="number">00007</span>f512c550153  <span class="number">00000000</span>ffffffff</span><br><span class="line"><span class="number">00007</span>ffe1121a618:  <span class="number">00007</span>f512c550a9b  <span class="number">00007</span>f512b707d00</span><br><span class="line"><span class="number">00007</span>ffe1121a628:  <span class="number">00007</span>f512babc868  <span class="number">00007</span>f512c9e9e5e</span><br><span class="line"><span class="number">00007</span>ffe1121a638:  <span class="number">00007</span>f512d3bb080  <span class="number">00000000000000</span>f1</span><br><span class="line"><span class="number">00007</span>ffe1121a648:  <span class="number">0000000000000011</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a658: &lt;<span class="number">00007</span>f512b737ce8  <span class="number">0000000000000020</span></span><br><span class="line"><span class="number">00007</span>ffe1121a668:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a678:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a688:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a698:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6a8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6b8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6c8: 
 <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6d8:  <span class="number">0000000000000000</span>  <span class="number">00007</span>f512babc868</span><br><span class="line"><span class="number">00007</span>ffe1121a6e8:  <span class="number">00007</span>f512c9e9e5e  <span class="number">00007</span>f512d3bb080</span><br><span class="line"><span class="number">00007</span>ffe1121a6f8:  <span class="number">00007</span>f512c33f260  <span class="number">00007</span>f512babc1c0</span><br><span class="line"><span class="number">00007</span>ffe1121a708:  <span class="number">00007</span>f512babc1c0  <span class="number">0000000000000001</span></span><br><span class="line"><span class="number">00007</span>ffe1121a718:  <span class="number">00007</span>f512babc243  <span class="number">00000000000000</span>f1</span><br><span class="line"><span class="number">00007</span>ffe1121a728:  <span class="number">00007</span>f512b7787ec  <span class="number">0000000000000001</span></span><br><span class="line"><span class="number">00007</span>ffe1121a738:  <span class="number">00007</span>f512babc1c0  <span class="number">000000000000000</span>a</span><br><span class="line"><span class="number">00007</span>ffe1121a748:  <span class="number">00007</span>f512b7e8a4d  <span class="number">000000000000000</span>a</span><br><span class="line">runtime: unknown pc <span class="number">0x7f512b7365f7</span></span><br><span class="line">stack: frame=&#123;sp:<span class="number">0x7ffe1121a658</span>, fp:<span class="number">0x0</span>&#125; stack=[<span class="number">0x7ffe0ae1bb28</span>,<span class="number">0x7ffe1121ab50</span>)</span><br><span class="line"><span class="number">00007</span>ffe1121a558:  <span class="number">00007</span>ffe1121a6d8  <span class="number">00007</span>ffe1121a6b0</span><br><span class="line"><span class="number">00007</span>ffe1121a568:  <span 
class="number">0000000000000001</span>  <span class="number">00007</span>f512c527660</span><br><span class="line"><span class="number">00007</span>ffe1121a578:  <span class="number">00007</span>f512c54d560  <span class="number">00007</span>f512c54d208</span><br><span class="line"><span class="number">00007</span>ffe1121a588:  <span class="number">00007</span>f512c333e6f  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a598:  <span class="number">00007</span>f512c527660  <span class="number">0000000000000005</span></span><br><span class="line"><span class="number">00007</span>ffe1121a5a8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000001</span></span><br><span class="line"><span class="number">00007</span>ffe1121a5b8:  <span class="number">00007</span>f512c54d208  <span class="number">00007</span>f512c528000</span><br><span class="line"><span class="number">00007</span>ffe1121a5c8:  <span class="number">00007</span>ffe1121a600  <span class="number">00007</span>f512b704b0c</span><br><span class="line"><span class="number">00007</span>ffe1121a5d8:  <span class="number">00007</span>f512b7110c0  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a5e8:  <span class="number">00007</span>f512c54d560  <span class="number">00007</span>ffe1121a620</span><br><span class="line"><span class="number">00007</span>ffe1121a5f8:  <span class="number">00007</span>ffe1121a610  <span class="number">000000000</span>f11ed7d</span><br><span class="line"><span class="number">00007</span>ffe1121a608:  <span class="number">00007</span>f512c550153  <span class="number">00000000</span>ffffffff</span><br><span class="line"><span class="number">00007</span>ffe1121a618:  <span class="number">00007</span>f512c550a9b  <span class="number">00007</span>f512b707d00</span><br><span class="line"><span class="number">00007</span>ffe1121a628:  
<span class="number">00007</span>f512babc868  <span class="number">00007</span>f512c9e9e5e</span><br><span class="line"><span class="number">00007</span>ffe1121a638:  <span class="number">00007</span>f512d3bb080  <span class="number">00000000000000</span>f1</span><br><span class="line"><span class="number">00007</span>ffe1121a648:  <span class="number">0000000000000011</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a658: &lt;<span class="number">00007</span>f512b737ce8  <span class="number">0000000000000020</span></span><br><span class="line"><span class="number">00007</span>ffe1121a668:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a678:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a688:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a698:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6a8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6b8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6c8:  <span class="number">0000000000000000</span>  <span class="number">0000000000000000</span></span><br><span class="line"><span class="number">00007</span>ffe1121a6d8:  <span class="number">0000000000000000</span>  <span class="number">00007</span>f512babc868</span><br><span class="line"><span 
class="number">00007</span>ffe1121a6e8:  <span class="number">00007</span>f512c9e9e5e  <span class="number">00007</span>f512d3bb080</span><br><span class="line"><span class="number">00007</span>ffe1121a6f8:  <span class="number">00007</span>f512c33f260  <span class="number">00007</span>f512babc1c0</span><br><span class="line"><span class="number">00007</span>ffe1121a708:  <span class="number">00007</span>f512babc1c0  <span class="number">0000000000000001</span></span><br><span class="line"><span class="number">00007</span>ffe1121a718:  <span class="number">00007</span>f512babc243  <span class="number">00000000000000</span>f1</span><br><span class="line"><span class="number">00007</span>ffe1121a728:  <span class="number">00007</span>f512b7787ec  <span class="number">0000000000000001</span></span><br><span class="line"><span class="number">00007</span>ffe1121a738:  <span class="number">00007</span>f512babc1c0  <span class="number">000000000000000</span>a</span><br><span class="line"><span class="number">00007</span>ffe1121a748:  <span class="number">00007</span>f512b7e8a4d  <span class="number">000000000000000</span>a</span><br><span class="line"></span><br><span class="line">goroutine <span class="number">1</span> [running, locked to thread]:</span><br><span class="line">runtime.systemstack_switch()</span><br><span class="line">/usr/local/<span class="keyword">go</span>/src/runtime/asm_amd64.s:<span class="number">363</span> fp=<span class="number">0xc4200a3ed0</span> sp=<span class="number">0xc4200a3ec8</span> pc=<span class="number">0x7f512c7281d0</span></span><br><span class="line">runtime.startTheWorld()</span><br><span class="line">/usr/local/<span class="keyword">go</span>/src/runtime/proc.<span class="keyword">go</span>:<span class="number">978</span> +<span class="number">0x2f</span> fp=<span class="number">0xc4200a3ee8</span> sp=<span class="number">0xc4200a3ed0</span> pc=<span class="number">0x7f512c70221f</span></span><br><span 
class="line">runtime.GOMAXPROCS(<span class="number">0x1</span>, <span class="number">0xc42013d9a0</span>)</span><br><span class="line">/usr/local/<span class="keyword">go</span>/src/runtime/debug.<span class="keyword">go</span>:<span class="number">30</span> +<span class="number">0xa0</span> fp=<span class="number">0xc4200a3f10</span> sp=<span class="number">0xc4200a3ee8</span> pc=<span class="number">0x7f512c6d9810</span></span><br><span class="line">main.init<span class="number">.0</span>()</span><br><span class="line">/<span class="keyword">go</span>/src/github.com/opencontainers/runc/init.<span class="keyword">go</span>:<span class="number">14</span> +<span class="number">0x61</span> fp=<span class="number">0xc4200a3f30</span> sp=<span class="number">0xc4200a3f10</span> pc=<span class="number">0x7f512c992801</span></span><br><span class="line">main.init()</span><br><span class="line">&lt;autogenerated&gt;:<span class="number">1</span> +<span class="number">0x624</span> fp=<span class="number">0xc4200a3f88</span> sp=<span class="number">0xc4200a3f30</span> pc=<span class="number">0x7f512c9a1014</span></span><br><span class="line">runtime.main()</span><br><span class="line">/usr/local/<span class="keyword">go</span>/src/runtime/proc.<span class="keyword">go</span>:<span class="number">186</span> +<span class="number">0x1d2</span> fp=<span class="number">0xc4200a3fe0</span> sp=<span class="number">0xc4200a3f88</span> pc=<span class="number">0x7f512c6ff962</span></span><br><span class="line">runtime.goexit()</span><br><span class="line">/usr/local/<span class="keyword">go</span>/src/runtime/asm_amd64.s:<span class="number">2361</span> +<span class="number">0x1</span> fp=<span class="number">0xc4200a3fe8</span> sp=<span class="number">0xc4200a3fe0</span> pc=<span class="number">0x7f512c72ad71</span></span><br><span class="line"></span><br><span class="line">goroutine <span class="number">6</span> [syscall]:</span><br><span class="line">os/signal.signal_recv(<span 
class="number">0x0</span>)</span><br><span class="line">/usr/local/<span class="keyword">go</span>/src/runtime/sigqueue.<span class="keyword">go</span>:<span class="number">139</span> +<span class="number">0xa8</span></span><br><span class="line">os/signal.loop()</span><br><span class="line">/usr/local/<span class="keyword">go</span>/src/os/signal/signal_unix.<span class="keyword">go</span>:<span class="number">22</span> +<span class="number">0x24</span></span><br><span class="line">created by os/signal.init<span class="number">.0</span></span><br><span class="line">/usr/local/<span class="keyword">go</span>/src/os/signal/signal_unix.<span class="keyword">go</span>:<span class="number">28</span> +<span class="number">0x43</span></span><br><span class="line"></span><br><span class="line">rax    <span class="number">0x0</span></span><br><span class="line">rbx    <span class="number">0x7f512babc868</span></span><br><span class="line">rcx    <span class="number">0xffffffffffffffff</span></span><br><span class="line">rdx    <span class="number">0x6</span></span><br><span class="line">rdi    <span class="number">0x271</span></span><br><span class="line">rsi    <span class="number">0x271</span></span><br><span class="line">rbp    <span class="number">0x7f512c9e9e5e</span></span><br><span class="line">rsp    <span class="number">0x7ffe1121a658</span></span><br><span class="line">r8     <span class="number">0xa</span></span><br><span class="line">r9     <span class="number">0x7f512c524740</span></span><br><span class="line">r10    <span class="number">0x8</span></span><br><span class="line">r11    <span class="number">0x206</span></span><br><span class="line">r12    <span class="number">0x7f512d3bb080</span></span><br><span class="line">r13    <span class="number">0xf1</span></span><br><span class="line">r14    <span class="number">0x11</span></span><br><span class="line">r15    <span class="number">0x0</span></span><br><span class="line">rip    <span 
class="number">0x7f512b7365f7</span></span><br><span class="line">rflags <span class="number">0x206</span></span><br><span class="line">cs     <span class="number">0x33</span></span><br><span class="line">fs     <span class="number">0x0</span></span><br><span class="line">gs     <span class="number">0x0</span></span><br><span class="line">exec failed: container_linux.<span class="keyword">go</span>:<span class="number">348</span>: starting container process caused <span class="string">&quot;read init-p: connection reset by peer&quot;</span></span><br></pre></td></tr></table></figure><p>Hmm, so runc init failed to create a thread because it ran out of resources? This output is clearly not produced by runc itself; it is unexpected output from the go runtime. So exactly which resource was exhausted? Let's analyze it together with the /var/log/message logs:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span 
class="number">18</span>:<span class="number">17</span> host-xx kernel: runc:[<span class="number">2</span>:INIT] invoked oom-killer: gfp_mask=<span class="number">0xd0</span>, order=<span class="number">0</span>, oom_score_adj=<span class="number">997</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: CPU: <span class="number">14</span> PID: <span class="number">12788</span> Comm: runc:[<span class="number">2</span>:INIT] Tainted: G        W  OE  ------------ T <span class="number">3.10</span><span class="number">.0</span><span class="number">-514.16</span><span class="number">.1</span>.el7.stable.v1<span class="number">.4</span>.x86_64 #<span class="number">1</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Hardware name: Inspur SA5212M4/YZMB<span class="number">-00370</span><span class="number">-107</span>, BIOS <span class="number">4.1</span><span class="number">.10</span> <span class="number">11</span>/<span class="number">14</span>/<span class="number">2016</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: ffff88103841dee0 <span class="number">00000000</span>c4394691 ffff880263e4bcb8 ffffffff8168863d</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: ffff880263e4bd50 ffffffff81683585 ffff88203cc5e300 ffff880ee02b2380</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: <span class="number">0000000000000001</span> <span 
class="number">0000000000000000</span> <span class="number">0000000000000000</span> <span class="number">0000000000000046</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Call Trace:</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [&lt;ffffffff8168863d&gt;] dump_stack+<span class="number">0x19</span>/<span class="number">0x1b</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [&lt;ffffffff81683585&gt;] dump_header+<span class="number">0x85</span>/<span class="number">0x27f</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [&lt;ffffffff81185b06&gt;] ? find_lock_task_mm+<span class="number">0x56</span>/<span class="number">0xc0</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [&lt;ffffffff81185fbe&gt;] oom_kill_process+<span class="number">0x24e</span>/<span class="number">0x3c0</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [&lt;ffffffff81093c2e&gt;] ? 
has_capability_noaudit+<span class="number">0x1e</span>/<span class="number">0x30</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [&lt;ffffffff811f4d91&gt;] mem_cgroup_oom_synchronize+<span class="number">0x551</span>/<span class="number">0x580</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [&lt;ffffffff811f41b0&gt;] ? mem_cgroup_charge_common+<span class="number">0xc0</span>/<span class="number">0xc0</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [&lt;ffffffff81186844&gt;] pagefault_out_of_memory+<span class="number">0x14</span>/<span class="number">0x90</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [&lt;ffffffff816813fa&gt;] mm_fault_error+<span class="number">0x68</span>/<span class="number">0x12b</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [&lt;ffffffff81694405&gt;] __do_page_fault+<span class="number">0x395</span>/<span class="number">0x450</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [&lt;ffffffff816944f5&gt;] do_page_fault+<span class="number">0x35</span>/<span class="number">0x90</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: 
[&lt;ffffffff81690708&gt;] page_fault+<span class="number">0x28</span>/<span class="number">0x30</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: memory: usage <span class="number">3145728</span>kB, limit <span class="number">3145728</span>kB, failcnt <span class="number">14406932</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: memory+swap: usage <span class="number">3145728</span>kB, limit <span class="number">9007199254740988</span>kB, failcnt <span class="number">0</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: kmem: usage <span class="number">3143468</span>kB, limit <span class="number">9007199254740988</span>kB, failcnt <span class="number">0</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Memory cgroup stats <span class="keyword">for</span> /kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda: cache:<span class="number">0</span>KB rss:<span class="number">0</span>KB rss_huge:<span class="number">0</span>KB mapped_file:<span class="number">0</span>KB swap:<span class="number">0</span>KB inactive_anon:<span class="number">0</span>KB active_anon:<span class="number">0</span>KB inactive_file:<span class="number">0</span>KB active_file:<span class="number">0</span>KB unevictable:<span class="number">0</span>KB</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: 
Memory cgroup stats <span class="keyword">for</span> /kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda/b761e05249245695278b3f409d2d6e5c6a5bff6995ff0cf44d03af4aa9764a30: cache:<span class="number">0</span>KB rss:<span class="number">40</span>KB rss_huge:<span class="number">0</span>KB mapped_file:<span class="number">0</span>KB swap:<span class="number">0</span>KB inactive_anon:<span class="number">0</span>KB active_anon:<span class="number">40</span>KB inactive_file:<span class="number">0</span>KB active_file:<span class="number">0</span>KB unevictable:<span class="number">0</span>KB</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Memory cgroup stats <span class="keyword">for</span> /kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda/<span class="number">1</span>d1750ecc627cc5d60d80c071b2eb4d515ee8880c5b5136883164f08319869b0: cache:<span class="number">0</span>KB rss:<span class="number">0</span>KB rss_huge:<span class="number">0</span>KB mapped_file:<span class="number">0</span>KB swap:<span class="number">0</span>KB inactive_anon:<span class="number">0</span>KB active_anon:<span class="number">0</span>KB inactive_file:<span class="number">0</span>KB active_file:<span class="number">0</span>KB unevictable:<span class="number">0</span>KB</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Memory cgroup stats <span class="keyword">for</span> /kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5: cache:<span class="number">0</span>KB rss:<span class="number">2220</span>KB 
rss_huge:<span class="number">0</span>KB mapped_file:<span class="number">0</span>KB swap:<span class="number">0</span>KB inactive_anon:<span class="number">0</span>KB active_anon:<span class="number">2140</span>KB inactive_file:<span class="number">0</span>KB active_file:<span class="number">0</span>KB unevictable:<span class="number">0</span>KB</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Memory cgroup stats <span class="keyword">for</span> /kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5/super-agent: cache:<span class="number">0</span>KB rss:<span class="number">0</span>KB rss_huge:<span class="number">0</span>KB mapped_file:<span class="number">0</span>KB swap:<span class="number">0</span>KB inactive_anon:<span class="number">0</span>KB active_anon:<span class="number">0</span>KB inactive_file:<span class="number">0</span>KB active_file:<span class="number">0</span>KB unevictable:<span class="number">0</span>KB</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<span class="number">30598</span>]     <span class="number">0</span> <span class="number">30598</span>      <span class="number">255</span>        <span class="number">1</span>       <span class="number">4</span>        <span class="number">0</span>          <span class="number">-998</span> pause</span><br><span class="line">Jun <span class="number">17</span> <span 
class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<span class="number">11680</span>]     <span class="number">0</span> <span class="number">11680</span>   <span class="number">164833</span>     <span class="number">1118</span>      <span class="number">20</span>        <span class="number">0</span>           <span class="number">997</span> dockerinit</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: [<span class="number">12788</span>]     <span class="number">0</span> <span class="number">12788</span>   <span class="number">150184</span>     <span class="number">1146</span>      <span class="number">23</span>        <span class="number">0</span>           <span class="number">997</span> runc:[<span class="number">2</span>:INIT]</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: oom-kill:,cpuset=bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5,mems_allowed=<span class="number">0</span><span class="number">-1</span>,oom_memcg=/kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda,task_memcg=/kubepods/burstable/pod6c4333b3-a663<span class="number">-11</span>ea-b39f<span class="number">-6</span>c92bf85beda/bbd5e4b5f9c13666dd0ec7ff7afb2c4c2b0ede40a4adf1de43cc31c606f283f5,task=runc:[<span class="number">2</span>:INIT],pid=<span class="number">12800</span>,uid=<span class="number">0</span></span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Memory cgroup out of memory: Kill process <span class="number">12800</span> (runc:[<span class="number">2</span>:INIT]) score <span 
class="number">997</span> or sacrifice child</span><br><span class="line">Jun <span class="number">17</span> <span class="number">03</span>:<span class="number">18</span>:<span class="number">17</span> host-xx kernel: Killed process <span class="number">12788</span> (runc:[<span class="number">2</span>:INIT]) total-vm:<span class="number">600736</span>kB, anon-rss:<span class="number">3296</span>kB, file-rss:<span class="number">276</span>kB, shmem-rss:<span class="number">1012</span>kB</span><br></pre></td></tr></table></figure><p>/var/log/message had recorded a large number of OOM entries for this container from roughly one month earlier, and that time matches the start time of the abnormal runc init process.</p><p>To summarize why runc init blocked: at a critical moment, runc init failed to create a thread due to insufficient memory, which triggered unexpected output from the go runtime and in turn left runc init blocked on a pipe write.</p><p>At this point the full picture of the problem is basically clear. But one question remains: if runc init was writing data into a pipe, was there really no other process reading from it?</p><p>Remember the lsof output above? The attentive reader will have noticed who holds the read end of that pipe: containerd-shim, and the pipe's inode number is 1070361747. So why didn't containerd-shim read the pipe's contents? Let's look at the code:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td 
class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(e *execProcess)</span> <span class="title">start</span><span class="params">(ctx context.Context)</span> <span class="params">(err error)</span></span> &#123;</span><br><span class="line">   ......</span><br><span class="line">   <span class="keyword">if</span> err := e.parent.runtime.Exec(ctx, e.parent.id, e.spec, opts); err != <span class="literal">nil</span> &#123;   <span class="comment">// exec runc init</span></span><br><span class="line">      <span class="built_in">close</span>(e.waitBlock)</span><br><span class="line">      <span class="keyword">return</span> e.parent.runtimeError(err, <span class="string">&quot;OCI runtime exec failed&quot;</span>)</span><br><span class="line">   &#125;</span><br><span class="line">   ......</span><br><span class="line">   <span class="keyword">else</span> <span class="keyword">if</span> !e.stdio.IsNull() &#123;</span><br><span class="line">      fifoCtx, cancel := context.WithTimeout(ctx, <span class="number">15</span>*time.Second)</span><br><span class="line">      <span class="keyword">defer</span> cancel()</span><br><span class="line"></span><br><span class="line">      <span class="keyword">if</span> err := copyPipes(fifoCtx, e.io, e.stdio.Stdin, e.stdio.Stdout, e.stdio.Stderr, &amp;e.wg, &amp;copyWaitGroup); err != <span class="literal">nil</span> &#123;   <span class="comment">// read the pipes</span></span><br><span class="line">         <span class="keyword">return</span> errors.Wrap(err, <span class="string">&quot;failed to start io pipe copy&quot;</span>)</span><br><span class="line">      &#125;</span><br><span class="line">   &#125;</span><br><span class="line">   ......</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(r *Runc)</span> <span 
class="title">Exec</span><span class="params">(context context.Context, id <span class="keyword">string</span>, spec specs.Process, opts *ExecOpts)</span> <span class="title">error</span></span> &#123;</span><br><span class="line">   ......</span><br><span class="line">   cmd := r.command(context, <span class="built_in">append</span>(args, id)...)</span><br><span class="line">   <span class="keyword">if</span> opts != <span class="literal">nil</span> &amp;&amp; opts.IO != <span class="literal">nil</span> &#123;</span><br><span class="line">      opts.Set(cmd)</span><br><span class="line">   &#125;</span><br><span class="line">   ......</span><br><span class="line">   ec, err := Monitor.Start(cmd)</span><br><span class="line">   ......</span><br><span class="line">   status, err := Monitor.Wait(cmd, ec)</span><br><span class="line">   ......</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>额，containerd-shim的设计是，等待runc init执行完成之后，再来读取pipe中的内容。但是此时的runc init由于非预期的写入数据量比较大，被阻塞在了写pipe操作处。。。完美的死锁。</p><p>终于，本次docker hang死问题的核心脉络都已理清。接下来我们再来聊聊解决方案。</p><h3 id="解决方案"><a href="#解决方案" class="headerlink" title="解决方案"></a>解决方案</h3><p>当了解了docker hang死的成因之后，我们可以针对性的提出如下解决办法。</p><h4 id="最直观的办法"><a href="#最直观的办法" class="headerlink" title="最直观的办法"></a>最直观的办法</h4><p>既然docker exec可能会引起docker hang死，那么我们禁用系统中所有的docker exec操作即可。最典型的是kubelet的probe，当前我们默认给所有Pod添加了ReadinessProbe，并且是以exec的形式进入容器内执行命令。我们调整kubelet的探测行为，修改为tcp或者http probe即可。</p><p>这里组件虽然改动不大，但是涉及业务容器的改造成本太大了，如何迁移存量集群是个大问题。</p><h4 id="最根本的办法"><a href="#最根本的办法" class="headerlink" title="最根本的办法"></a>最根本的办法</h4><p>既然当前containerd-shim读pipe需要等待runc exec执行完毕，如果我们将读pipe的操作提前至runc exec命令执行之前，理论上也可以避免死锁。</p><p>同样。这种方案的升级成本太高了，升级containerd-shim时需要重启存量的所有容器，这个方案基本不可能通过评审。</p><h4 id="最简单的办法"><a href="#最简单的办法" class="headerlink" title="最简单的办法"></a>最简单的办法</h4><p>既然runc init阻塞在写pipe，我们主动读取pipe内的内容，也能让runc init顺利退出。</p><p>在将本解决方案自动化的过程中，如何能够识别如docker 
hang死是由于写pipe导致的，是一个小小的挑战。但是相对于以上两种解决方案，我认为还是值得一试，毕竟其影响微乎其微。</p><h3 id="后续"><a href="#后续" class="headerlink" title="后续"></a>后续</h3><p>其实我们在读pipe的时候还引发了另外一个问题，详见：<a href="https://plpan.github.io/%E4%B8%80%E6%AC%A1%E8%AF%BB-pipe-%E5%BC%95%E5%8F%91%E7%9A%84%E8%A1%80%E6%A1%88/">一次读-pipe-引发的血案</a>。</p><p>另外，docker hang死的原因远非这一种，本次排查的结果也并非适用于所有场景。希望各位看官能够根据自己的现场排查问题。</p><p>本次docker hang死的排查之旅已然告终。</p><p>本次结论由四人小分队 @飞哥 @鹤哥 @博哥 @我 历经数天排查得出，欢迎大家一键三连，以表支持。</p><p>以上排查如果有误，也欢迎指正。</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h3 id=&quot;背景&quot;&gt;&lt;a href=&quot;#背景&quot; class=&quot;headerlink&quot; title=&quot;背景&quot;&gt;&lt;/a&gt;背景&lt;/h3&gt;&lt;p&gt;最近，我们在升级kubelet时，发现部分宿主机上docker出现hang死的现象，发现过程详见：&lt;a href=&quot;https://plpa</summary>
      
    
    
    
    <category term="问题排查" scheme="https://plpan.github.io/categories/%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5/"/>
    
    
    <category term="docker" scheme="https://plpan.github.io/tags/docker/"/>
    
    <category term="containerd" scheme="https://plpan.github.io/tags/containerd/"/>
    
    <category term="runc" scheme="https://plpan.github.io/tags/runc/"/>
    
  </entry>
  
  <entry>
    <title>netns leak 排查之旅</title>
    <link href="https://plpan.github.io/netns-leak-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/"/>
    <id>https://plpan.github.io/netns-leak-%E6%8E%92%E6%9F%A5%E4%B9%8B%E6%97%85/</id>
    <published>2020-05-15T02:19:24.000Z</published>
    <updated>2020-11-12T01:23:48.060Z</updated>
    
    <content type="html"><![CDATA[<h3 id="揭开面纱"><a href="#揭开面纱" class="headerlink" title="揭开面纱"></a>揭开面纱</h3><p>周一，接到RD反馈线上容器网络访问存在异常，具体线上描述如下：</p><ul><li>上游服务driver-api所有容器访问下游服务duse-api某一容器TCP【telnet测试】连接不通，访问其余下游容器均正常</li><li>上游服务容器测试下游容器IP连通性【ping测试】正常</li></ul><p>从以上两点现象可以得出一个结论：</p><ul><li>容器的网络设备存在，IP地址连通，但是容器服务进程未启动，端口未启动</li><li>但是，当我们和业务RD确认之后，发现业务容器状态正常，业务进程也正运行着。嗯，问题不简单。</li></ul><p>此外，同事这边排查还有一个结论：</p><ul><li>arp反向解析duse-api特殊容器IP时，不返回MAC地址信息</li><li>当telnet失败后，立即执行arp，会返回MAC地址信息</li></ul><p>当我们拿着arp解析的MAC地址与容器当前的MAC地址作比较时，发现MAC地址不一致。唔，基本上确定问题所在了，net ns泄漏了。执行如下命令验证：</p><figure class="highlight routeros"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo<span class="built_in"> ip </span>netns ls | <span class="keyword">while</span> read ns; <span class="keyword">do</span> sudo<span class="built_in"> ip </span>netns exec <span class="variable">$ns</span><span class="built_in"> ip </span>addr; done | grep inet | grep -v 127 | awk <span class="string">&#x27;&#123;print $2&#125;&#x27;</span> | sort | uniq -c</span><br></pre></td></tr></table></figure><p>确实发现该容器对应的IP出现了两次，该容器IP对应了两个网络命名空间，也即该容器的网络命名空间出现了泄漏。</p><h3 id="误入迷障"><a href="#误入迷障" class="headerlink" title="误入迷障"></a>误入迷障</h3><p>当确定了问题所在之后，我们立马调转排查方向，重新投入到net ns泄漏的排查事业当中。</p><p>既然net ns出现了泄漏，我们只需要排查被泄露的net ns的成因即可。在具体定位之前，首先补充一个背景：</p><ul><li>ip netns 命令默认扫描 /var/run/netns 目录，从该目录下的文件读取net ns的信息</li><li>默认情况下，kubelet调用docker创建容器时，docker会将net ns文件隐藏，如果不做特殊处理，我们执行 ip netns 命令将看不到任何数据</li><li>当前弹性云为了方便排查问题，做了一个特殊处理，将容器的网络命名空间mount到 /var/run/netns 目录 【注意，这里有个大坑】</li></ul><p>有了弹性云当前的特殊处理，我们就可以知道所有net ns的创建时间，也即 /var/run/netns 目录下对应文件的创建时间。</p><p>我们查看该泄漏ns文件的创建时间为2020-04-17 11:34:07，排查范围进一步缩小，只需从该时间点附近排查即可。</p><p>接下来，我们分析了该附近时间段，容器究竟遭遇了什么：</p><ul><li>2020-04-17 11:33:26 用户执行发布更新操作</li><li>2020-04-17 11:34:24 平台显示容器已启动</li><li>2020-04-17 11:34:28 平台显示容器启动脚本执行失败</li><li>2020-04-17 11:36:22 用户重新部署该容器</li><li>2020-04-17 11:36:31 
平台显示容器已删除成功</li></ul><p>既然是容器网络命名空间泄漏，则说明在删除容器的时候，没有执行ns的清理操作。【注：这里由于基础知识不足，导致问题排查绕了地球一圈】</p><p>我们梳理kubelet在该时间段对该容器的清理日志，核心相关日志展示如下：</p><figure class="highlight apache"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="attribute">I0417</span> <span class="number">11</span>:<span class="number">36</span>:<span class="number">30</span>.<span class="number">974674</span>   <span class="number">37736</span> kubelet_pods.go:<span class="number">1180</span>] Killing unwanted pod <span class="string">&quot;duse-api-xxxxx-0&quot;</span></span><br><span class="line"><span class="attribute">I0417</span> <span class="number">11</span>:<span class="number">36</span>:<span class="number">30</span>.<span class="number">976803</span>   <span class="number">37736</span> plugins.go:<span class="number">391</span>] Calling network plugin cni to tear down pod <span class="string">&quot;duse-api-xxxxx-0_default&quot;</span></span><br><span class="line"><span class="attribute">I0417</span> <span class="number">11</span>:<span class="number">36</span>:<span class="number">30</span>.<span class="number">983499</span>   <span class="number">37736</span> kubelet_pods.go:<span class="number">1780</span>] Orphaned pod <span class="string">&quot;4ae28778-805c-11ea-a54c-b4055d1e6372&quot;</span> found, removing pod cgroups</span><br><span class="line"><span class="attribute">I0417</span> <span class="number">11</span>:<span class="number">36</span>:<span class="number">30</span>.<span class="number">986360</span>   <span class="number">37736</span> pod_container_manager_linux.go:<span class="number">167</span>] Attempt to kill process with pid: <span class="number">48892</span></span><br><span class="line"><span class="attribute">I0417</span> <span
class="number">36</span>:<span class="number">30</span>.<span class="number">986382</span>   <span class="number">37736</span> pod_container_manager_linux.go:<span class="number">174</span>] successfully killed <span class="literal">all</span> unwanted processes.</span><br></pre></td></tr></table></figure><p>简单描述流程：</p><ul><li>I0417 11:36:30.974674 根据删除容器执行，执行杀死Pod操作</li><li>I0417 11:36:30.976803 调用cni插件清理网络命名空间</li><li>I0417 11:36:30.983499 常驻协程检测到Pod已终止运行，开始执行清理操作，包括清理目录、cgroup</li><li>I0417 11:36:30.986360 清理cgroup时杀死容器中还未退出的进程</li><li>I0417 11:36:30.986382 显示所有容器进程都已被杀死</li></ul><p>这里提示一点：正常情况下，容器退出时，容器内所有进程都已退出。而上面之所以出现清理cgroup时需要杀死容器内未退出进程，是由于常驻协程的检测机制导致的，常驻协程判定Pod已终止运行的条件是：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// podIsTerminated returns true if pod is in the terminated state (&quot;Failed&quot; or &quot;Succeeded&quot;).</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(kl *Kubelet)</span> <span class="title">podIsTerminated</span><span class="params">(pod *v1.Pod)</span> <span class="title">bool</span></span> &#123;</span><br><span class="line">   <span class="comment">// Check the cached pod status which was set after the last sync.</span></span><br><span class="line">   status, ok := kl.statusManager.GetPodStatus(pod.UID)</span><br><span class="line">   <span class="keyword">if</span> !ok &#123;</span><br><span class="line">      <span class="comment">// If there is no cached status, use the status from 
the</span></span><br><span class="line">      <span class="comment">// apiserver. This is useful if kubelet has recently been</span></span><br><span class="line">      <span class="comment">// restarted.</span></span><br><span class="line">      status = pod.Status</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">return</span> status.Phase == v1.PodFailed || status.Phase == v1.PodSucceeded || (pod.DeletionTimestamp != <span class="literal">nil</span> &amp;&amp; notRunning(status.ContainerStatuses))</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>这个容器命中了第三个或条件：容器已被标记删除，并且所有业务容器都不在运行中（业务容器启动失败，根本就没运行起来过），但是Pod的sandbox容器可能仍然处于运行状态。</p><p>仅依据上面的kubelet日志，难以发现问题所在。我们接着又分析了cni插件的日志，截取cni在删除该Pod容器网络时的日志如下：</p><figure class="highlight routeros"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[pid:98497] 2020/04/17 11:36:30.990707 main.go:89: ===== start cni process =====</span><br><span class="line">[pid:98497] 2020/04/17 11:36:30.990761 main.go:90: os env: [<span class="attribute">CNI_COMMAND</span>=DEL <span class="attribute">CNI_CONTAINERID</span>=c2ef79f7596b6b558f0c01c0715bac46714eefd1e9966625a09414c7218e1013 <span class="attribute">CNI_NETNS</span>=/proc/48892/ns/net <span class="attribute">CNI_ARGS</span>=IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=duse-api-xxxxx-0;K8S_POD_INFRA_CONTAINER_ID=c2ef79f7596b6b558f0c01c0715bac46714eefd1e9966625a09414c7218e1013 <span class="attribute">CNI_IFNAME</span>=eth0 <span class="attribute">CNI_PATH</span>=/home/user/cloud/cni-plugins/bin <span class="attribute">LANG</span>=en_US.UTF-8 <span class="attribute">PATH</span>=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin <span class="attribute">KUBE_LOGTOSTDERR</span>=--logtostderr=false <span 
class="attribute">KUBE_LOG_LEVEL</span>=--v=3 <span class="attribute">KUBE_ALLOW_PRIV</span>=--allow-privileged=true <span class="attribute">KUBE_MASTER</span>=--master=https://10.xxx.xxx.xxx:6443 <span class="attribute">KUBELET_ADDRESS</span>=--address=0.0.0.0 <span class="attribute">KUBELET_HOSTNAME</span>=--hostname_override=10.xxx.xxx.xxx KUBELET_POD_INFRA_CONTAINER= <span class="attribute">KUBELET_ARGS</span>=--network-plugin=cni <span class="attribute">--cni-bin-dir</span>=/home/user/cloud/cni-plugins/bin <span class="attribute">--cni-conf-dir</span>=/home/user/cloud/cni-plugins/conf <span class="attribute">--kubeconfig</span>=/etc/kubernetes/kubeconfig/kubelet.kubeconfig <span class="attribute">--cert-dir</span>=/etc/kubernetes/ssl <span class="attribute">--log-dir</span>=/var/log/kubernetes <span class="attribute">--stderrthreshold</span>=3 <span class="attribute">--allowed-unsafe-sysctls</span>=net.*,kernel.shm*,kernel.msg*,kernel.sem,fs.mqueue.* <span class="attribute">--pod-infra-container-image</span>=registry.keji.com/k8s/pause:3.0 --eviction-hard=  <span class="attribute">--image-gc-high-threshold</span>=75 <span class="attribute">--image-gc-low-threshold</span>=65 <span class="attribute">--feature-gates</span>=KubeletPluginsWatcher=false <span class="attribute">--restart-count-limit</span>=5 <span class="attribute">--last-upgrade-time</span>=2019-07-01]</span><br><span class="line">[pid:98497] 2020/04/17 11:36:30.990771 main.go:91: stdin : &#123;<span class="string">&quot;cniVersion&quot;</span>:<span class="string">&quot;0.3.0&quot;</span>,<span class="string">&quot;logDir&quot;</span>:<span class="string">&quot;/home/user/cloud/cni-plugins/acllogs&quot;</span>,<span class="string">&quot;name&quot;</span>:<span class="string">&quot;cloudcni&quot;</span>,<span class="string">&quot;type&quot;</span>:<span class="string">&quot;aclCni&quot;</span>&#125;</span><br><span class="line">[pid:98497] 2020/04/17 11:36:30.990790 main.go:181: failed <span 
class="keyword">to</span> Statfs <span class="string">&quot;/proc/48892/ns/net&quot;</span>: <span class="literal">no</span> such file <span class="keyword">or</span> directory</span><br><span class="line">[pid:98497] 2020/04/17 11:36:30.990814 main.go:94: ===== end cni process =====</span><br></pre></td></tr></table></figure><p>其中，main.go:181行的错误日志一下就抓住了我们的眼球，结合代码分析下：</p><figure class="highlight stata"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">func cmdDel(<span class="keyword">args</span> *skel.CmdArgs) <span class="keyword">error</span> &#123;</span><br><span class="line">    <span class="keyword">n</span>, _, <span class="keyword">err</span> := loadConf(<span class="keyword">args</span>.StdinData)</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">err</span> != nil &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="keyword">err</span></span><br><span class="line">    &#125;</span><br><span class="line"> </span><br><span class="line">    netns, <span class="keyword">err</span> := ns.GetNS(<span class="keyword">args</span>.Netns)</span><br><span class="line">    <span class="keyword">if</span> <span class="keyword">err</span> != nil &#123;</span><br><span class="line">        <span class="keyword">log</span>.Println(<span class="keyword">err</span>)     <span class="comment">//// Line 181</span></span><br><span class="line">        <span class="keyword">return</span> fmt.Errorf(<span class="string">&quot;failed 
to open netns %q: %v&quot;</span>, netns, <span class="keyword">err</span>)</span><br><span class="line">    &#125;</span><br><span class="line">    defer netns.<span class="keyword">Close</span>()</span><br><span class="line">    ...</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>可以看到，cni在调用该插件清理容器网络命名空间时，由于181行的错误，导致cni插件提前退出，并没有执行后面的清理操作。唔，终于找到你，小虫子。</p><p>这里，我们先简单总结下问题排查至此，得出的阶段性结论：</p><ul><li>由于容器启动失败，在删除Pod时，常驻协程定时清理非运行状态Pod的cgroup，杀死了Pod的sandbox容器</li><li>当删除容器命令触发的cni清理操作执行时，发现sandbox的pause进程已退出，定位不到容器的网络命名空间，因此退出cni的清理操作</li><li>最终容器网络命名空间泄漏</li></ul><p>既然，明确了问题所在，我们就赶紧来定制修复方案吧，甚至于，我们很快就给出了一版修复：</p><ul><li>保证在Pod的所有容器退出之前，不会执行cgroup清理操作</li></ul><p>这样就保证了删除容器命令触发的清理操作能够按照顺序执行：</p><ul><li>杀死所有业务容器</li><li>执行cni插件清理工作</li><li>杀死sandbox容器</li><li>执行cgroup清理工作</li></ul><p>我们风风火火的修复了内部版本之后，还验证了社区新版本代码中这块逻辑仍旧保持原样，就想着给社区送温暖（事实证明是妄想）。我们就去开源版本搭建的集群中，复现这个问题。然后噩梦就来了。。。</p><p>相同的Pod配置文件，我们在弹性云内部版本几乎能够百分百复现net ns泄漏的问题，而在开源社区版本中，从未出现过一次net ns泄漏。难不成，搞不好，莫不是说，不是我们定位的这个原因？</p><h3 id="拨云现月"><a href="#拨云现月" class="headerlink" title="拨云现月"></a>拨云现月</h3><p>这个结论对我们来说，不是一个好消息。费力不小，不说南辕北辙，但是确实还未发现问题的根因。</p><p>为了进一步缩小问题排查范围，我们找内核组同学请教了一个基础知识：</p><ul><li>在删除net ns时，如果该ns内仍有网络设备，系统自动先删除网络设备，然后再删除ns</li></ul><p>掌握了这个基础知识，我们再来排查。既然原生k8s集群不存在net ns泄漏问题，那问题一定由我们定制的某个模块引起。由于net ns泄漏发生在node上，当前弹性云在node节点上部署的模块包含：</p><ul><li>kubelet</li><li>cni plugins</li><li>other tools</li></ul><p>由于kubelet已经被排除嫌疑，那么罪魁祸首基本就是cni插件了。对比原生集群与弹性云线上集群的cni插件，发现一个极有可能会造成net ns泄漏的点：</p><ul><li>定制的cni插件为了排查问题的方便，将容器的网络命名空间文件绑定挂载到了 /var/run/netns 目录下 【参考上面的大坑】</li></ul><p>我们赶紧着手验证元凶是否就是它。修改cni插件代码，删除绑定挂载操作，然后在测试环境验证。验证结果符合预期，net ns不再泄漏。至此，真相终于大白于天下了。</p><h3 id="亡羊补牢"><a href="#亡羊补牢" class="headerlink" title="亡羊补牢"></a>亡羊补牢</h3><p>当初为net ns做一个绑定挂载，其目的就是为了方便我们排查问题，使得 ip netns 命令能够访问当前宿主上所有Pod的网络命名空间。</p><p>但其实一个简单的软链操作就能够实现这个目标。Pod退出时，如果这个软链文件未被清理，也不会引起net ns的泄漏，同时 ls -la /var/run/netns 命令可以清晰的看到哪些net ns仍有效，哪些已无效。</p><h3 id="事后诸葛"><a href="#事后诸葛" class="headerlink"
title="事后诸葛"></a>事后诸葛</h3><p>为什么绑定挂载能够导致net ns泄漏呢？这是由linux 网络命名空间特性决定的：</p><ul><li>只要该命名空间中仍有一个进程存活，或者存在绑定挂载的情况（可能还存在其他情况），该ns就不会被回收</li><li>而一旦所有进程都已退出，并且也无特殊状况，linux将自动回收该ns</li></ul><p>最后，这个问题本身并不复杂，之所以问题存在如此之久，排查如此曲折，主要暴露了我们的基础知识有所欠缺。</p><p>好好学习，天天向上，方是王道！</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h3 id=&quot;揭开面纱&quot;&gt;&lt;a href=&quot;#揭开面纱&quot; class=&quot;headerlink&quot; title=&quot;揭开面纱&quot;&gt;&lt;/a&gt;揭开面纱&lt;/h3&gt;&lt;p&gt;周一，接到RD反馈线上容器网络访问存在异常，具体线上描述如下：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;上游服务driver-api所有容</summary>
      
    
    
    
    <category term="问题排查" scheme="https://plpan.github.io/categories/%E9%97%AE%E9%A2%98%E6%8E%92%E6%9F%A5/"/>
    
    
    <category term="kubernetes" scheme="https://plpan.github.io/tags/kubernetes/"/>
    
    <category term="docker" scheme="https://plpan.github.io/tags/docker/"/>
    
    <category term="cni" scheme="https://plpan.github.io/tags/cni/"/>
    
    <category term="linux namespace" scheme="https://plpan.github.io/tags/linux-namespace/"/>
    
  </entry>
  
  <entry>
    <title>go sync.pool</title>
    <link href="https://plpan.github.io/go-sync-pool/"/>
    <id>https://plpan.github.io/go-sync-pool/</id>
    <published>2019-04-14T07:26:29.000Z</published>
    <updated>2020-11-12T01:23:48.060Z</updated>
    
    <content type="html"><![CDATA[<p>众所周知，Go实现了自动垃圾回收，这就意味着：当我们在申请内存时，不必关心如何以及何时释放内存，这些都是由Go语言内部实现的。注：我们关心的是堆内存，因为栈内存会随着函数调用的返回自动释放。</p><p>自动垃圾回收极大地降低了我们写程序时的心智负担，但是，这是否就意味着我们能够随心所欲的申请大量内存呢？理论上当然可以，但实际写代码时强烈不推荐这种做法，因为大量的临时堆内存会给GC线程的造成负担。</p><p>此时，小明同学就问：有没有办法能缓解海量临时对象的分配问题呢？</p><p>当然是有的，内存复用就是一个典型方案，而内存池就是该方案的一个实例，Go语言官方提供一种内存池的实现方案——sync.Pool。</p><p>首先我们来看sync.Pool的使用方式：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">main</span><span class="params">()</span></span> &#123;</span><br><span class="line">pool := sync.Pool&#123;</span><br><span class="line">New: <span class="function"><span class="keyword">func</span><span class="params">()</span> <span class="title">interface</span></span>&#123;&#125; &#123;</span><br><span class="line"><span class="keyword">return</span> <span class="string">&quot;Hello&quot;</span></span><br><span class="line">&#125;,</span><br><span class="line">&#125;</span><br><span class="line">old := pool.Get()</span><br><span class="line">pool.Put(old.(<span class="keyword">string</span>) + <span class="string">&quot; World&quot;</span>)</span><br><span class="line"><span class="built_in">new</span> := pool.Get()</span><br><span class="line">fmt.Println(<span class="built_in">new</span>) <span class="comment">// Hello World</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>借助上面这段简单代码，我们验证了sync.Pool的内存复用。那么sync.Pool又是如何实现内存复用的呢？让我们来深入Go源码看一看。</p><p>sync.Pool的源码位于$GOROOT/src/sync/pool.go，其结构体定义如下：</p><figure class="highlight 
pgsql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> Pool struct &#123;</span><br><span class="line">noCopy noCopy</span><br><span class="line"></span><br><span class="line"><span class="keyword">local</span>     unsafe.Pointer // <span class="keyword">local</span> fixed-size per-P pool, actual <span class="keyword">type</span> <span class="keyword">is</span> [P]poolLocal</span><br><span class="line">localSize uintptr        // size <span class="keyword">of</span> the <span class="keyword">local</span> <span class="keyword">array</span></span><br><span class="line"></span><br><span class="line">// <span class="built_in">New</span> optionally specifies a <span class="keyword">function</span> <span class="keyword">to</span> generate</span><br><span class="line">// a <span class="keyword">value</span> <span class="keyword">when</span> <span class="keyword">Get</span> would otherwise <span class="keyword">return</span> nil.</span><br><span class="line">// It may <span class="keyword">not</span> be changed <span class="keyword">concurrently</span> <span class="keyword">with</span> calls <span class="keyword">to</span> <span class="keyword">Get</span>.</span><br><span class="line"><span class="built_in">New</span> func() interface&#123;&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><ul><li>noCopy字段：go vet静态扫描代码时提示对象拷贝，不影响编译和运行</li><li>local：对象池数组，实际上是[P]poolLocal，而poolLocal则为每个P的本地内存池，P的本地内存池有两个对象：<ul><li>private interface{}：一个私有坑位</li><li>shared  
[]interface{}：一组公有坑位</li></ul></li><li>localSize：local数组的大小，一般等于P的数量（在调用GOMAXPROCS时会出现短暂不一致）</li><li>New：当对象池为空时，就调用New方法创建一个临时对象</li></ul><p>这里需要注意的是：sync.Pool内存池并非P结构体的一个字段，而是sync.Pool自己维护了一个数组，取P的id作为数组下标来获取内存池对象。</p><p>了解了sync.Pool的数据结构之后，我们再来看其操作原理，sync.Pool的操作有两个：Get和Put，因为Put简单，我们先来看Put:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Put adds x to the pool.</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *Pool)</span> <span class="title">Put</span><span class="params">(x <span class="keyword">interface</span>&#123;&#125;)</span></span> &#123;</span><br><span class="line"><span class="keyword">if</span> x == <span class="literal">nil</span> &#123;</span><br><span class="line"><span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line">l := p.pin()</span><br><span class="line"><span class="keyword">if</span> l.private == <span class="literal">nil</span> &#123;</span><br><span class="line">l.private = x</span><br><span class="line">x = <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line">runtime_procUnpin()</span><br><span class="line"><span class="keyword">if</span> x != <span class="literal">nil</span> &#123;</span><br><span class="line">l.Lock()</span><br><span class="line">l.shared = 
<span class="built_in">append</span>(l.shared, x)</span><br><span class="line">l.Unlock()</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>p.pin用于获取P的对象池，Put优先将内存对象存储到内存池私有坑位，如果私有坑位已经被占，则将其存储到公有坑位</p><p>注意：如果内存对象被存储至公有坑位，则需要加锁。</p><p>接着我们再来看Get操作：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(p *Pool)</span> <span class="title">Get</span><span class="params">()</span> <span class="title">interface</span></span>&#123;&#125; &#123;</span><br><span class="line">l := p.pin()</span><br><span class="line">x := l.private</span><br><span class="line">l.private = <span class="literal">nil</span></span><br><span class="line">runtime_procUnpin()</span><br><span class="line"><span class="keyword">if</span> x == <span class="literal">nil</span> &#123;</span><br><span class="line">l.Lock()</span><br><span class="line">last := <span class="built_in">len</span>(l.shared) - <span class="number">1</span></span><br><span class="line"><span class="keyword">if</span> last &gt;= <span class="number">0</span> &#123;</span><br><span class="line">x = l.shared[last]</span><br><span 
class="line">l.shared = l.shared[:last]</span><br><span class="line">&#125;</span><br><span class="line">l.Unlock()</span><br><span class="line"><span class="keyword">if</span> x == <span class="literal">nil</span> &#123;</span><br><span class="line">x = p.getSlow()</span><br><span class="line">&#125;</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">if</span> x == <span class="literal">nil</span> &amp;&amp; p.New != <span class="literal">nil</span> &#123;</span><br><span class="line">x = p.New()</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">return</span> x</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><ul><li>如果本P的私有坑位有对象，则直接返回</li><li>如果本P私有坑位没有对象，则从本P的公有坑位中获取一个对象返回</li><li>如果本P的公有坑位也没有对象，则依次遍历其他P的公有坑位，取走一个对象返回</li><li>如果所有P的公有坑位都没有对象，并且定义New函数，则调用New函数创建一个对象</li><li>否则返回nil</li></ul><p>注意：每当遍历一个P的公有坑位时，都需要加锁，因此最多加锁N次，最少0次，其中N为P的数目</p><p>了解了以上原理，我们就能够开开心心的使用sync.Pool了。此时，小明同学又问了，我明明已经使用了sync.Pool了，为什么GC压力还非常大？</p><p>这就涉及到sync.Pool本身的内存回收了：sync.Pool缓存临时对象并非是永久保存，它保活的时间作用域其实也非常短：我们发现sync/pool.go中还定义了poolCleanup函数用于内存池的清理，我们再看其调用时机：</p><figure class="highlight autoit"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">init</span><span class="params">()</span> &#123;</span></span><br><span class="line">runtime_registerPoolCleanup(poolCleanup)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>runtime_xxx函数都可以对应到$GOROOT/src/runtime包下的xxx函数，我们找到对应的函数定义：</p><figure class="highlight swift"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">//go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">sync_runtime_registerPoolCleanup</span><span class="params">(f <span class="keyword">func</span><span class="params">()</span></span></span>) &#123;</span><br><span class="line">poolcleanup = f</span><br><span class="line">&#125;</span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">clearpools</span><span class="params">()</span></span> &#123;</span><br><span class="line">   <span class="comment">// clear sync.Pools</span></span><br><span class="line">   <span class="keyword">if</span> poolcleanup != <span class="literal">nil</span> &#123;</span><br><span class="line">      poolcleanup()</span><br><span class="line">   &#125;</span><br><span class="line">  ......</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>因此，我们只需要定位clearpools的调用时机即可：</p><figure class="highlight reasonml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// gcStart transitions the GC from _GCoff to _GCmark (if</span></span><br><span class="line"><span class="comment">// !mode.stwMark) or _GCmarktermination (if mode.stwMark) by</span></span><br><span class="line"><span class="comment">// performing sweep 
termination and GC initialization.</span></span><br><span class="line"><span class="comment">//</span></span><br><span class="line"><span class="comment">// This may return without performing this transition in some cases,</span></span><br><span class="line"><span class="comment">// such as when called on a system stack or with locks held.</span></span><br><span class="line">func gc<span class="constructor">Start(<span class="params">mode</span> <span class="params">gcMode</span>, <span class="params">trigger</span> <span class="params">gcTrigger</span>)</span> &#123;</span><br><span class="line">    ......</span><br><span class="line">    clearpools<span class="literal">()</span></span><br><span class="line">   ......</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>我们发现每当GC开始时，都会清理sync.Pool内存对象池，这就意味着sync.Pool缓存的临时对象都活不过一个GC周期。如果我们的程序在疯狂分配临时对象，这就会推高GC的触发频率，而GC开始时又会释放sync.Pool内存池，这简直就是一个死循环。</p><p>所以小明啊，最佳的实践是什么呢？当然是优化代码逻辑咯，尽量减少内存分配次数。具体的代码优化可以借助pprof实现。</p>]]></content>
    
    
      
      
    <summary type="html">&lt;p&gt;众所周知，Go实现了自动垃圾回收，这就意味着：当我们在申请内存时，不必关心如何以及何时释放内存，这些都是由Go语言内部实现的。注：我们关心的是堆内存，因为栈内存会随着函数调用的返回自动释放。&lt;/p&gt;
&lt;p&gt;自动垃圾回收极大地降低了我们写程序时的心智负担，但是，这是否就意味着</summary>
      
    
    
    
    <category term="源码分析" scheme="https://plpan.github.io/categories/%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/"/>
    
    
    <category term="go" scheme="https://plpan.github.io/tags/go/"/>
    
  </entry>
  
</feed>
