<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>szza</title>
  
  <subtitle>look code art</subtitle>
  <link href="https://szza.github.io/atom.xml" rel="self"/>
  
  <link href="https://szza.github.io/"/>
  <updated>2026-01-06T13:10:38.776Z</updated>
  <id>https://szza.github.io/</id>
  
  <author>
    <name>fibonaccii</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>A Comprehensive Analysis of the Milvus Search Flow</title>
    <link href="https://szza.github.io/2025/08/10/Milvus/19_search_flow_comprehensive_analysis/"/>
    <id>https://szza.github.io/2025/08/10/Milvus/19_search_flow_comprehensive_analysis/</id>
    <published>2025-08-10T08:00:00.000Z</published>
    <updated>2026-01-06T13:10:38.776Z</updated>
    
    <content type="html"><![CDATA[<h2 id="目录"><a href="#目录" class="headerlink" title="目录"></a>Table of Contents</h2><ol><li><a href="#%E6%A6%82%E8%BF%B0">Overview</a></li><li><a href="#%E6%90%9C%E7%B4%A2%E4%BB%BB%E5%8A%A1%E5%85%A5%E5%8F%A3">Search Task Entry Points</a></li><li><a href="#segment-%E6%90%9C%E7%B4%A2%E6%89%A7%E8%A1%8C">Segment Search Execution</a></li><li><a href="#searchrequest-%E5%92%8C-plan-%E5%8F%82%E6%95%B0">SearchRequest and Plan Parameters</a></li><li><a href="#segment-search-%E5%86%85%E9%83%A8%E6%89%A7%E8%A1%8C">Segment Search Internals</a></li><li><a href="#%E7%BB%93%E6%9E%9C%E5%BD%92%E7%BA%A6reduce">Result Reduction (Reduce)</a></li><li><a href="#%E9%AB%98%E7%BA%A7%E6%90%9C%E7%B4%A2advanced-search">Advanced Search</a></li><li><a href="#%E5%AE%8C%E6%95%B4%E6%B5%81%E7%A8%8B%E6%80%BB%E7%BB%93">End-to-End Flow Summary</a></li></ol><hr><h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p>This document analyzes the Milvus search flow end to end, from the moment a QueryNode receives a search request until the final result is returned. It covers search task execution, segment search, plan node execution, and result reduction.</p><h3 id="搜索流程概览"><a href="#搜索流程概览" class="headerlink" title="搜索流程概览"></a>Search Flow at a Glance</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">Proxy → QueryNode → SearchTask → SearchSegments → Segment.Search → Reduce → Result</span><br></pre></td></tr></table></figure><hr><h2 id="搜索任务入口"><a href="#搜索任务入口" class="headerlink" title="搜索任务入口"></a>Search Task Entry Points</h2><h3 id="1-SearchTask-Execute"><a href="#1-SearchTask-Execute" class="headerlink" title="1. SearchTask.Execute"></a>1. 
SearchTask.Execute</h3><p><code>SearchTask.Execute</code> is the execution entry point for regular search tasks; it runs the search on the QueryNode and reduces the results.</p><h4 id="函数签名"><a href="#函数签名" class="headerlink" title="函数签名"></a>Function Signature</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *SearchTask)</span></span> Execute() <span class="type">error</span></span><br></pre></td></tr></table></figure><h4 id="执行流程"><a href="#执行流程" class="headerlink" title="执行流程"></a>Execution Flow</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *SearchTask)</span></span> Execute() <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// 1. Prepare the search request</span></span><br><span class="line">    err := t.combinePlaceHolderGroups()</span><br><span class="line">    searchReq, err := segcore.NewSearchRequest(t.collection.GetCCollection(), req, t.placeholderGroup)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 
Search historical or streaming data depending on Scope</span></span><br><span class="line">    <span class="keyword">if</span> req.GetScope() == querypb.DataScope_Historical &#123;</span><br><span class="line">        results, searchedSegments, err = segments.SearchHistorical(...)</span><br><span class="line">    &#125; <span class="keyword">else</span> <span class="keyword">if</span> req.GetScope() == querypb.DataScope_Streaming &#123;</span><br><span class="line">        results, searchedSegments, err = segments.SearchStreaming(...)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Reduce the results from multiple segments</span></span><br><span class="line">    reducedResult, err := segcore.ReduceSearchResultsAndFillData(...)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Fill in primary keys and target fields</span></span><br><span class="line">    err = segments.FillPrimaryKeys(...)</span><br><span class="line">    err = segments.FillTargetEntry(...)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. Encode and return the result</span></span><br><span class="line">    t.result = EncodeSearchResults(...)</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="关键步骤"><a href="#关键步骤" class="headerlink" title="关键步骤"></a>Key Steps</h4><ol><li><strong>Combine placeholder groups</strong>: merge multiple query vectors into a single PlaceholderGroup</li><li><strong>Create the SearchRequest</strong>: wrap the search plan, PlaceholderGroup, timestamps, and related information</li><li><strong>Search segments</strong>: search historical or streaming segments depending on Scope</li><li><strong>Reduce results</strong>: merge the per-segment results into a global topK</li><li><strong>Fill fields</strong>: fill in primary key and target field data</li><li><strong>Encode and return</strong>: encode the result as protobuf</li></ol><h3 id="2-StreamingSearchTask-Execute"><a href="#2-StreamingSearchTask-Execute" class="headerlink" title="2. StreamingSearchTask.Execute"></a>2. 
StreamingSearchTask.Execute</h3><p><code>StreamingSearchTask.Execute</code> is the execution entry point for streaming search tasks. It supports streaming reduction: results are reduced while segments are still being searched, which lowers memory usage.</p><h4 id="执行流程-1"><a href="#执行流程-1" class="headerlink" title="执行流程"></a>Execution Flow</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *StreamingSearchTask)</span></span> Execute() <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// 1. Prepare the search request</span></span><br><span class="line">    t.combinePlaceHolderGroups()</span><br><span class="line">    searchReq, err := segcore.NewSearchRequest(...)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 
Streaming search and reduction</span></span><br><span class="line">    <span class="keyword">if</span> req.GetScope() == querypb.DataScope_Historical &#123;</span><br><span class="line">        streamReduceFunc := <span class="function"><span class="keyword">func</span><span class="params">(result *segments.SearchResult)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> t.streamReduce(t.ctx, searchReq.Plan(), result, ...)</span><br><span class="line">        &#125;</span><br><span class="line">        pinnedSegments, err := segments.SearchHistoricalStreamly(</span><br><span class="line">            t.ctx, t.segmentManager, searchReq, ..., streamReduceFunc)</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// 3. Fetch the streaming reduction result</span></span><br><span class="line">        t.resultBlobs, err = segcore.GetStreamReduceResult(t.ctx, t.streamReducer)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="流式搜索的优势"><a href="#流式搜索的优势" class="headerlink" title="流式搜索的优势"></a>Advantages of Streaming Search</h4><ul><li><strong>Memory efficiency</strong>: reduction does not wait for every segment search to finish; results are reduced as they arrive</li><li><strong>Lower latency</strong>: partial results can be returned earlier</li><li><strong>Suited to large data volumes</strong>: when there are many segments, streaming processing avoids memory peaks</li></ul><hr><h2 id="Segment-搜索执行"><a href="#Segment-搜索执行" class="headerlink" title="Segment 搜索执行"></a>Segment Search Execution</h2><h3 id="1-searchSegments-函数"><a href="#1-searchSegments-函数" class="headerlink" title="1. searchSegments 函数"></a>1. 
searchSegments Function</h3><p><code>searchSegments</code> runs the search concurrently across multiple segments and collects the results.</p><h4 id="函数签名-1"><a href="#函数签名-1" class="headerlink" title="函数签名"></a>Function Signature</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">searchSegments</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function">    ctx context.Context, </span></span></span><br><span class="line"><span class="params"><span class="function">    mgr *Manager, </span></span></span><br><span class="line"><span class="params"><span class="function">    segments []Segment, </span></span></span><br><span class="line"><span class="params"><span class="function">    segType SegmentType, </span></span></span><br><span class="line"><span class="params"><span class="function">    searchReq *SearchRequest)</span></span> ([]*SearchResult, <span class="type">error</span>)</span><br></pre></td></tr></table></figure><h4 id="执行流程-2"><a href="#执行流程-2" class="headerlink" title="执行流程"></a>Execution Flow</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span 
class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">searchSegments</span><span class="params">(ctx context.Context, mgr *Manager, segments []Segment, </span></span></span><br><span class="line"><span class="params"><span class="function">    segType SegmentType, searchReq *SearchRequest)</span></span> ([]*SearchResult, <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. Set the metric label</span></span><br><span class="line">    searchLabel := metrics.SealedSegmentLabel</span><br><span class="line">    <span class="keyword">if</span> segType == commonpb.SegmentState_Growing &#123;</span><br><span class="line">        searchLabel = metrics.GrowingSegmentLabel</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 
Search all segments concurrently</span></span><br><span class="line">    results := <span class="built_in">make</span>([]*SearchResult, <span class="number">0</span>, <span class="built_in">len</span>(segments))</span><br><span class="line">    <span class="keyword">var</span> wg sync.WaitGroup</span><br><span class="line">    <span class="keyword">var</span> mu sync.Mutex</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> _, s := <span class="keyword">range</span> segments &#123;</span><br><span class="line">        wg.Add(<span class="number">1</span>)</span><br><span class="line">        <span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">(seg Segment)</span></span> &#123;</span><br><span class="line">            <span class="keyword">defer</span> wg.Done()</span><br><span class="line">            </span><br><span class="line">            <span class="comment">// Pin the segment so it cannot be released</span></span><br><span class="line">            <span class="keyword">if</span> err := seg.PinIfNotReleased(); err != <span class="literal">nil</span> &#123;</span><br><span class="line">                <span class="keyword">return</span></span><br><span class="line">            &#125;</span><br><span class="line">            <span class="keyword">defer</span> seg.Unpin()</span><br><span class="line">            </span><br><span class="line">            <span class="comment">// Check whether lazy loading is needed</span></span><br><span class="line">            <span class="keyword">if</span> seg.IsLazyLoad() &#123;</span><br><span class="line">                mgr.DiskCache.Do(seg, <span class="function"><span class="keyword">func</span><span class="params">()</span></span> <span class="type">error</span> &#123;</span><br><span class="line">                    <span class="keyword">return</span> seg.Load(ctx)</span><br><span class="line">                &#125;)</span><br><span class="line">            &#125;</span><br><span 
class="line">            </span><br><span class="line">            <span class="comment">// Run the search</span></span><br><span class="line">            result, err := seg.Search(ctx, searchReq)</span><br><span class="line">            <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">                <span class="keyword">return</span></span><br><span class="line">            &#125;</span><br><span class="line">            </span><br><span class="line">            mu.Lock()</span><br><span class="line">            results = <span class="built_in">append</span>(results, result)</span><br><span class="line">            mu.Unlock()</span><br><span class="line">        &#125;(s)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    wg.Wait()</span><br><span class="line">    <span class="keyword">return</span> results, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="关键特性"><a href="#关键特性" class="headerlink" title="关键特性"></a>Key Characteristics</h4><ol><li><strong>Concurrent execution</strong>: goroutines search multiple segments concurrently</li><li><strong>Segment pinning</strong>: each segment is pinned so it cannot be released during the search</li><li><strong>Lazy loading support</strong>: a segment that requires lazy loading is loaded through DiskCache</li><li><strong>Error handling</strong>: in the simplified code above, a segment whose pin or search fails is skipped without affecting the other segments, and its error is dropped rather than returned to the caller</li></ol><h3 id="2-SearchHistorical-和-SearchStreaming"><a href="#2-SearchHistorical-和-SearchStreaming" class="headerlink" title="2. SearchHistorical 和 SearchStreaming"></a>2. 
SearchHistorical and SearchStreaming</h3><h4 id="SearchHistorical"><a href="#SearchHistorical" class="headerlink" title="SearchHistorical"></a>SearchHistorical</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">SearchHistorical</span><span class="params">(ctx context.Context, manager *Manager, searchReq *SearchRequest, </span></span></span><br><span class="line"><span class="params"><span class="function">    collID <span class="type">int64</span>, partIDs []<span class="type">int64</span>, segIDs []<span class="type">int64</span>)</span></span> ([]*SearchResult, []Segment, <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Validate and fetch the historical segments</span></span><br><span class="line">    segments, err := validateOnHistorical(ctx, manager, collID, partIDs, segIDs)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, <span class="literal">nil</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Search the sealed segments</span></span><br><span class="line">    searchResults, err := searchSegments(ctx, manager, segments, SegmentTypeSealed, searchReq)</span><br><span class="line">    <span 
class="keyword">return</span> searchResults, segments, err</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Search scope</strong>:</p><ul><li>If <code>segIDs</code> is not specified: search all historical segments in the partitions given by <code>partIDs</code></li><li>If <code>segIDs</code> is specified: search only those segments</li><li>If <code>partIDs</code> is empty: search all partitions of the loaded collection</li></ul><h4 id="SearchStreaming"><a href="#SearchStreaming" class="headerlink" title="SearchStreaming"></a>SearchStreaming</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">SearchStreaming</span><span class="params">(ctx context.Context, manager *Manager, searchReq *SearchRequest, </span></span></span><br><span class="line"><span class="params"><span class="function">    collID <span class="type">int64</span>, partIDs []<span class="type">int64</span>, segIDs []<span class="type">int64</span>)</span></span> ([]*SearchResult, []Segment, <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Validate and fetch the streaming segments</span></span><br><span class="line">    segments, err := validateOnStream(ctx, manager, collID, partIDs, segIDs)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, <span class="literal">nil</span>, err</span><br><span class="line">   
 &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Search the growing segments</span></span><br><span class="line">    searchResults, err := searchSegments(ctx, manager, segments, SegmentTypeGrowing, searchReq)</span><br><span class="line">    <span class="keyword">return</span> searchResults, segments, err</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><hr><h2 id="SearchRequest-和-Plan-参数"><a href="#SearchRequest-和-Plan-参数" class="headerlink" title="SearchRequest 和 Plan 参数"></a>SearchRequest and Plan Parameters</h2><h3 id="1-SearchRequest-结构"><a href="#1-SearchRequest-结构" class="headerlink" title="1. SearchRequest 结构"></a>1. SearchRequest Structure</h3><p><code>SearchRequest</code> encapsulates all the information for a search request.</p><h4 id="结构定义"><a href="#结构定义" class="headerlink" title="结构定义"></a>Struct Definition</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> SearchRequest <span class="keyword">struct</span> &#123;</span><br><span class="line">    plan              *SearchPlan              <span class="comment">// Search plan</span></span><br><span class="line">    cPlaceholderGroup C.CPlaceholderGroup      <span class="comment">// PlaceholderGroup on the C++ side</span></span><br><span class="line">    msgID             <span class="type">int64</span>                    <span class="comment">// Message ID</span></span><br><span class="line">    searchFieldID     <span class="type">int64</span>                    <span class="comment">// Search field ID</span></span><br><span class="line">    mvccTimestamp     typeutil.Timestamp       <span class="comment">// MVCC timestamp</span></span><br><span class="line">  
  consistencyLevel  commonpb.ConsistencyLevel <span class="comment">// Consistency level</span></span><br><span class="line">    collectionTTL     typeutil.Timestamp       <span class="comment">// Collection TTL</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="创建过程"><a href="#创建过程" class="headerlink" title="创建过程"></a>Creation Process</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewSearchRequest</span><span class="params">(collection *CCollection, req *querypb.SearchRequest, </span></span></span><br><span class="line"><span class="params"><span class="function">    placeholderGrp []<span class="type">byte</span>)</span></span> (*SearchRequest, <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. 
Create the SearchPlan from the serialized Plan</span></span><br><span class="line">    expr := req.Req.SerializedExprPlan</span><br><span class="line">    plan, err := createSearchPlanByExpr(collection, expr)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Parse the PlaceholderGroup</span></span><br><span class="line">    <span class="keyword">var</span> cPlaceholderGroup C.CPlaceholderGroup</span><br><span class="line">    status := C.ParsePlaceholderGroup(plan.cSearchPlan, blobPtr, blobSize, &amp;cPlaceholderGroup)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Validate the MetricType</span></span><br><span class="line">    metricTypeInPlan := plan.GetMetricType()</span><br><span class="line">    <span class="keyword">if</span> <span class="built_in">len</span>(metricType) != <span class="number">0</span> &amp;&amp; metricType != metricTypeInPlan &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, merr.WrapErrParameterInvalid(...)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. 
Get the search field ID</span></span><br><span class="line">    <span class="keyword">var</span> fieldID C.int64_t</span><br><span class="line">    status = C.GetFieldID(plan.cSearchPlan, &amp;fieldID)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> &amp;SearchRequest&#123;</span><br><span class="line">        plan:              plan,</span><br><span class="line">        cPlaceholderGroup: cPlaceholderGroup,</span><br><span class="line">        searchFieldID:     <span class="type">int64</span>(fieldID),</span><br><span class="line">        mvccTimestamp:     req.GetReq().GetMvccTimestamp(),</span><br><span class="line">        consistencyLevel:  req.GetReq().GetConsistencyLevel(),</span><br><span class="line">        collectionTTL:     req.GetReq().GetCollectionTtlTimestamps(),</span><br><span class="line">    &#125;, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="2-Plan-参数的作用"><a href="#2-Plan-参数的作用" class="headerlink" title="2. Plan 参数的作用"></a>2. 
Role of the Plan Parameter</h3><p>The <code>Plan</code> parameter is the central argument of <code>segment-&gt;Search</code> and carries all the information needed to execute the search.</p><h4 id="Plan-结构（C-）"><a href="#Plan-结构（C-）" class="headerlink" title="Plan 结构（C++）"></a>Plan Structure (C++)</h4><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">struct</span> <span class="title class_">Plan</span> &#123;</span><br><span class="line">    SchemaPtr schema_;                                    <span class="comment">// Schema information</span></span><br><span class="line">    std::unique_ptr&lt;VectorPlanNode&gt; plan_node_;          <span class="comment">// Plan node tree</span></span><br><span class="line">    std::map&lt;std::string, FieldId&gt; tag2field_;          <span class="comment">// PlaceholderName -&gt; FieldId</span></span><br><span class="line">    std::vector&lt;FieldId&gt; target_entries_;               <span class="comment">// IDs of the fields to return</span></span><br><span class="line">    std::vector&lt;std::string&gt; target_dynamic_fields_;    <span class="comment">// List of dynamic fields</span></span><br><span class="line">    std::optional&lt;ExtractedPlanInfo&gt; extra_info_opt_;    <span class="comment">// Extra information (the fields involved)</span></span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><h4 id="Plan-的核心作用"><a href="#Plan-的核心作用" class="headerlink" title="Plan 的核心作用"></a>Core Roles of the Plan</h4><h5 id="1-搜索前验证（check-search）"><a href="#1-搜索前验证（check-search）" class="headerlink" title="1. 搜索前验证（check_search）"></a>1. 
Pre-search Validation (check_search)</h5><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">ChunkedSegmentSealedImpl::check_search</span><span class="params">(<span class="type">const</span> query::Plan* plan)</span> <span class="type">const</span> </span>&#123;</span><br><span class="line">    <span class="comment">// Check that the involved fields are loaded</span></span><br><span class="line">    <span class="keyword">auto</span>&amp; request_fields = plan-&gt;extra_info_opt_.<span class="built_in">value</span>().involved_fields_;</span><br><span class="line">    <span class="keyword">auto</span> field_ready_bitset = field_data_ready_bitset_ | index_ready_bitset_ | binlog_index_bitset_;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">auto</span> absent_fields = request_fields - field_ready_bitset;</span><br><span class="line">    <span class="keyword">if</span> (absent_fields.<span class="built_in">any</span>()) &#123;</span><br><span class="line">        <span class="comment">// Throw a FieldNotLoaded error</span></span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h5 id="2-执行计划节点树"><a href="#2-执行计划节点树" class="headerlink" title="2. 执行计划节点树"></a>2. 
Executing the Plan Node Tree</h5><p><code>plan-&gt;plan_node_</code> is a <code>VectorPlanNode</code>, which contains:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">struct</span> <span class="title class_">VectorPlanNode</span> : PlanNode &#123;</span><br><span class="line">    SearchInfo search_info_;                              <span class="comment">// Search info (topk, metric_type, field_id, etc.)</span></span><br><span class="line">    std::string placeholder_tag_;                         <span class="comment">// Placeholder tag</span></span><br><span class="line">    std::shared_ptr&lt;milvus::plan::PlanNode&gt; plannodes_;  <span class="comment">// Execution plan node tree</span></span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><p>The plan node tree typically includes:</p><ul><li><strong>MvccNode</strong>: MVCC filtering (timestamp filtering and delete filtering)</li><li><strong>VectorSearchNode</strong>: vector search (including topK)</li><li><strong>FilterNode</strong>: expression filtering</li><li><strong>GroupByNode</strong>: group-by operation (if enabled)</li><li><strong>RescoresNode</strong>: rescoring (if enabled)</li></ul><h5 id="3-填充返回字段"><a href="#3-填充返回字段" class="headerlink" title="3. 填充返回字段"></a>3. 
填充返回字段</h5><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">SegmentInternalInterface::FillTargetEntry</span><span class="params">(<span class="type">const</span> query::Plan* plan, </span></span></span><br><span class="line"><span class="params"><span class="function">    SearchResult&amp; results)</span> <span class="type">const</span> </span>&#123;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 遍历 plan-&gt;target_entries_，填充每个字段的数据</span></span><br><span class="line">    <span class="keyword">for</span> (<span class="keyword">auto</span> field_id : plan-&gt;target_entries_) &#123;</span><br><span class="line">        <span class="keyword">auto</span>&amp; field_meta = plan-&gt;schema_-&gt;<span class="keyword">operator</span>[](field_id);</span><br><span class="line">        field_data = <span class="built_in">bulk_subscript</span>(&amp;op_ctx, field_id, results.seg_offsets_.<span class="built_in">data</span>(), size);</span><br><span class="line">        results.output_fields_data_[field_id] = std::<span class="built_in">move</span>(field_data);</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="Plan-的创建流程"><a href="#Plan-的创建流程" class="headerlink" title="Plan 的创建流程"></a>Plan 的创建流程</h4><ol><li><strong>Proxy 层</strong>：解析 DSL 表达式，生成 PlanNode</li><li><strong>设置 SearchInfo</strong>：从 search_params 中提取 topK、metric_type 等</li><li><strong>设置 target_entries</strong>：从 output_fields 
extract the fields to return</li><li><strong>Serialize</strong>: serialize to protobuf and send to the QueryNode</li><li><strong>QueryNode</strong>: deserialize and create the C++ Plan object</li></ol><hr><h2 id="Segment-Search-内部执行"><a href="#Segment-Search-内部执行" class="headerlink" title="Segment Search 内部执行"></a>Segment Search internal execution</h2><h3 id="1-segment-Search-执行流程"><a href="#1-segment-Search-执行流程" class="headerlink" title="1. segment-&gt;Search 执行流程"></a>1. segment-&gt;Search execution flow</h3><p><code>segment-&gt;Search</code> is the search execution entry point in the C++ layer; it completes the search by executing the plan node tree.</p><h4 id="函数签名-2"><a href="#函数签名-2" class="headerlink" title="函数签名"></a>Function signature</h4><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">std::unique_ptr&lt;SearchResult&gt;</span></span><br><span class="line"><span class="function"><span class="title">SegmentInternalInterface::Search</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function">    <span class="type">const</span> query::Plan* plan,</span></span></span><br><span class="line"><span class="params"><span class="function">    <span class="type">const</span> query::PlaceholderGroup* placeholder_group,</span></span></span><br><span class="line"><span class="params"><span class="function">    Timestamp timestamp,</span></span></span><br><span class="line"><span class="params"><span class="function">    <span class="type">const</span> folly::CancellationToken&amp; cancel_token,</span></span></span><br><span class="line"><span class="params"><span class="function">    <span class="type">int32_t</span> consistency_level,</span></span></span><br><span class="line"><span class="params"><span class="function">    Timestamp collection_ttl)</span> <span 
class="type">const</span></span></span><br></pre></td></tr></table></figure><h4 id="执行流程-3"><a href="#执行流程-3" class="headerlink" title="执行流程"></a>Execution flow</h4><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">std::unique_ptr&lt;SearchResult&gt;</span></span><br><span class="line"><span class="function"><span class="title">SegmentInternalInterface::Search</span><span class="params">(...)</span> <span class="type">const</span> </span>&#123;</span><br><span class="line">    <span class="function">std::shared_lock <span class="title">lck</span><span class="params">(mutex_)</span></span>;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. Validate the Plan</span></span><br><span class="line">    <span class="built_in">check_search</span>(plan);</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 
Create the execution visitor</span></span><br><span class="line">    <span class="function">query::ExecPlanNodeVisitor <span class="title">visitor</span><span class="params">(*<span class="keyword">this</span>, timestamp, placeholder_group, </span></span></span><br><span class="line"><span class="params"><span class="function">                                       cancel_token, consistency_level, collection_ttl)</span></span>;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Execute the plan node tree</span></span><br><span class="line">    <span class="keyword">auto</span> results = std::<span class="built_in">make_unique</span>&lt;SearchResult&gt;();</span><br><span class="line">    *results = visitor.<span class="built_in">get_moved_result</span>(*plan-&gt;plan_node_);</span><br><span class="line">    results-&gt;segment_ = (<span class="type">void</span>*)<span class="keyword">this</span>;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> results;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="2-ExecPlanNodeVisitor-执行计划节点树"><a href="#2-ExecPlanNodeVisitor-执行计划节点树" class="headerlink" title="2. ExecPlanNodeVisitor 执行计划节点树"></a>2. 
ExecPlanNodeVisitor executes the plan node tree</h3><h4 id="visit-VectorPlanNode-node"><a href="#visit-VectorPlanNode-node" class="headerlink" title="visit(VectorPlanNode&amp; node)"></a>visit(VectorPlanNode&amp; node)</h4><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">ExecPlanNodeVisitor::visit</span><span class="params">(VectorPlanNode&amp; node)</span> </span>&#123;</span><br><span class="line">    <span class="comment">// 1. Get the active row count</span></span><br><span class="line">    <span class="keyword">auto</span> active_count = segment-&gt;<span class="built_in">get_active_count</span>(timestamp_);</span><br><span class="line">    <span class="keyword">if</span> (active_count == <span class="number">0</span>) &#123;</span><br><span class="line">        search_result_opt_ = <span class="built_in">empty_search_result</span>(...);</span><br><span class="line">        <span class="keyword">return</span>;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 
Build the plan fragment</span></span><br><span class="line">    <span class="keyword">auto</span> plan = plan::<span class="built_in">PlanFragment</span>(node.plannodes_);</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Create the query context</span></span><br><span class="line">    <span class="keyword">auto</span> query_context = std::<span class="built_in">make_shared</span>&lt;milvus::exec::QueryContext&gt;(...);</span><br><span class="line">    query_context-&gt;<span class="built_in">set_search_info</span>(node.search_info_);</span><br><span class="line">    query_context-&gt;<span class="built_in">set_placeholder_group</span>(placeholder_group_);</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Execute the task</span></span><br><span class="line">    <span class="keyword">auto</span> result = <span class="built_in">ExecuteTask</span>(plan, query_context);</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. Retrieve the search result</span></span><br><span class="line">    search_result_opt_ = std::<span class="built_in">move</span>(query_context-&gt;<span class="built_in">get_search_result</span>());</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="3-计划节点执行顺序"><a href="#3-计划节点执行顺序" class="headerlink" title="3. 计划节点执行顺序"></a>3. 
Plan node execution order</h3><h4 id="MvccNode-MVCC-过滤"><a href="#MvccNode-MVCC-过滤" class="headerlink" title="MvccNode - MVCC 过滤"></a>MvccNode - MVCC filtering</h4><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">RowVectorPtr <span class="title">PhyMvccNode::GetOutput</span><span class="params">()</span> </span>&#123;</span><br><span class="line">    <span class="function">TargetBitmapView <span class="title">data</span><span class="params">(col_input-&gt;GetRawData(), col_input-&gt;size())</span></span>;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Timestamp filtering (MVCC)</span></span><br><span class="line">    segment_-&gt;<span class="built_in">mask_with_timestamps</span>(data, query_timestamp_, collection_ttl_timestamp_);</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Delete-marker filtering</span></span><br><span class="line">    segment_-&gt;<span class="built_in">mask_with_delete</span>(data, active_count_, query_timestamp_);</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> std::<span class="built_in">make_shared</span>&lt;RowVector&gt;(std::vector&lt;VectorPtr&gt;&#123;col_input&#125;);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Purpose</strong>:</p><ul><li><code>mask_with_timestamps</code>: filters out rows whose timestamp is later than the query timestamp (the call is also passed the collection TTL timestamp)</li><li><code>mask_with_delete</code>: filters out deleted rows</li></ul><h4 id="VectorSearchNode-向量搜索-TopK"><a href="#VectorSearchNode-向量搜索-TopK" class="headerlink" title="VectorSearchNode - 向量搜索 + 
TopK"></a>VectorSearchNode - vector search + TopK</h4><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">RowVectorPtr <span class="title">PhyVectorSearchNode::GetOutput</span><span class="params">()</span> </span>&#123;</span><br><span class="line">    <span class="comment">// Run the vector search</span></span><br><span class="line">    segment_-&gt;<span class="built_in">vector_search</span>(search_info_,</span><br><span class="line">                           src_data,</span><br><span class="line">                           src_offsets,</span><br><span class="line">                           num_queries,</span><br><span class="line">                           query_timestamp_,</span><br><span class="line">                           final_view,</span><br><span class="line">                           op_context,</span><br><span class="line">                           search_result);</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Store the search result (already limited to topK)</span></span><br><span class="line">    query_context_-&gt;<span class="built_in">set_search_result</span>(std::<span class="built_in">move</span>(search_result));</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Purpose</strong>:</p><ul><li>Runs the vector search, using <code>search_info_.topk_</code> as topK</li><li>The returned result already contains the topK <code>seg_offsets_</code> and <code>distances_</code></li></ul><h4 id="GroupByNode-Group-By-操作"><a href="#GroupByNode-Group-By-操作" 
class="headerlink" title="GroupByNode - Group By 操作"></a>GroupByNode - Group By operation</h4><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">RowVectorPtr <span class="title">PhyGroupByNode::GetOutput</span><span class="params">()</span> </span>&#123;</span><br><span class="line">    <span class="keyword">auto</span> search_result = query_context_-&gt;<span class="built_in">get_search_result</span>();</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Run Group By</span></span><br><span class="line">    milvus::exec::<span class="built_in">SearchGroupBy</span>(op_context,</span><br><span class="line">                                search_result.vector_iterators_.<span class="built_in">value</span>(),</span><br><span class="line">                                search_info_,</span><br><span class="line">                                group_by_values,</span><br><span class="line">                                *segment_,</span><br><span class="line">                                search_result.seg_offsets_,</span><br><span class="line">                                search_result.distances_,</span><br><span class="line">                                search_result.topk_per_nq_prefix_sum_);</span><br><span class="line">    </span><br><span class="line">    
search_result.group_by_values_ = std::<span class="built_in">move</span>(group_by_values);</span><br><span class="line">    search_result.group_size_ = search_info_.group_size_;</span><br><span class="line">    </span><br><span class="line">    query_context_-&gt;<span class="built_in">set_search_result</span>(std::<span class="built_in">move</span>(search_result));</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Purpose</strong>:</p><ul><li>Applies group-by processing to the search result</li><li>Updates <code>search_result.group_by_values_</code> and <code>group_size_</code></li></ul><h3 id="4-Segment-Search-完成的操作总结"><a href="#4-Segment-Search-完成的操作总结" class="headerlink" title="4. Segment Search 完成的操作总结"></a>4. Summary of operations completed by Segment Search</h3><p>Within <code>segment-&gt;Search</code>, the following operations have already been completed:</p><p>✅ <strong>MVCC filtering</strong>: timestamp and delete filtering via <code>MvccNode</code><br>✅ <strong>TopK selection</strong>: vector search via <code>VectorSearchNode</code>; the result is already limited to topK<br>✅ <strong>Group By</strong>: via <code>GroupByNode</code> (if enabled)<br>✅ <strong>Expression filtering</strong>: via <code>FilterNode</code> (if a filter expression is present)<br>✅ <strong>Rescoring</strong>: via <code>RescoresNode</code> (if enabled)</p><p><strong>Key point</strong>: the result each segment returns is only <strong>locally optimal</strong>: it represents the topK within that segment, not the global optimum.</p><hr><h2 id="结果归约（Reduce）"><a href="#结果归约（Reduce）" class="headerlink" title="结果归约（Reduce）"></a>Result reduction (Reduce)</h2><h3 id="1-为什么需要-Reduce？"><a href="#1-为什么需要-Reduce？" class="headerlink" title="1. 为什么需要 Reduce？"></a>1. Why is Reduce needed?</h3><p>Although each segment has already completed MVCC, topK, group by, and so on, the results from multiple segments still have to be reduced, for the following reasons:</p><ol><li><strong>Data distribution</strong>: data is spread across multiple segments</li><li><strong>Local vs. global</strong>: each segment returns a local topK, while the user needs the global topK</li><li><strong>Deduplication</strong>: duplicate entities across segments must be removed</li><li><strong>Group By</strong>: group by must be applied globally</li><li><strong>Cost aggregation</strong>: cost information from each segment must be aggregated</li></ol><h3 id="2-ReduceSearchResults-函数"><a href="#2-ReduceSearchResults-函数" class="headerlink" title="2. ReduceSearchResults 函数"></a>2. 
ReduceSearchResults function</h3><h4 id="函数签名-3"><a href="#函数签名-3" class="headerlink" title="函数签名"></a>Function signature</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ReduceSearchResults</span><span class="params">(ctx context.Context, results []*internalpb.SearchResults, </span></span></span><br><span class="line"><span class="params"><span class="function">    info *reduce.ResultInfo)</span></span> (*internalpb.SearchResults, <span class="type">error</span>)</span><br></pre></td></tr></table></figure><h4 id="执行流程-4"><a href="#执行流程-4" class="headerlink" title="执行流程"></a>Execution flow</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ReduceSearchResults</span><span class="params">(ctx context.Context, 
results []*internalpb.SearchResults, </span></span></span><br><span class="line"><span class="params"><span class="function">    info *reduce.ResultInfo)</span></span> (*internalpb.SearchResults, <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. Filter out empty results</span></span><br><span class="line">    results = lo.Filter(results, <span class="function"><span class="keyword">func</span><span class="params">(result *internalpb.SearchResults, _ <span class="type">int</span>)</span></span> <span class="type">bool</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> result != <span class="literal">nil</span> &amp;&amp; result.GetSlicedBlob() != <span class="literal">nil</span></span><br><span class="line">    &#125;)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Short-circuit: if there is only one result, return it directly</span></span><br><span class="line">    <span class="keyword">if</span> <span class="built_in">len</span>(results) == <span class="number">1</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> results[<span class="number">0</span>], <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Decode the search results</span></span><br><span class="line">    searchResultData, err := DecodeSearchResults(ctx, results)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Initialize the reducer</span></span><br><span class="line">    searchReduce := InitSearchReducer(info)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. Reduce the search results</span></span><br><span class="line">    reducedResultData, err := searchReduce.ReduceSearchResultData(ctx, searchResultData, info)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 6. 
Encode the result</span></span><br><span class="line">    searchResults, err := EncodeSearchResultData(ctx, reducedResultData, ...)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 7. Aggregate cost information</span></span><br><span class="line">    searchResults.CostAggregation = mergeRequestCost(requestCosts)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> searchResults, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="3-K-Way-Merge-算法"><a href="#3-K-Way-Merge-算法" class="headerlink" title="3. K-Way Merge 算法"></a>3. K-Way Merge algorithm</h3><p>Reduce uses a <strong>K-way merge algorithm</strong> to select the globally best topK from multiple already-sorted results.</p><h4 id="算法原理"><a href="#算法原理" class="headerlink" title="算法原理"></a>How the algorithm works</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span 
class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(scr *SearchCommonReduce)</span></span> ReduceSearchResultData(</span><br><span class="line">    ctx context.Context, </span><br><span class="line">    searchResultData []*schemapb.SearchResultData, </span><br><span class="line">    info *reduce.ResultInfo) (*schemapb.SearchResultData, <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> i := <span class="type">int64</span>(<span class="number">0</span>); i &lt; info.GetNq(); i++ &#123;</span><br><span class="line">        <span class="comment">// Per-query state: one cursor into each input result, reset for every query</span></span><br><span class="line">        offsets := <span class="built_in">make</span>([]<span class="type">int64</span>, <span class="built_in">len</span>(searchResultData))</span><br><span class="line">        idSet := <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="keyword">interface</span>&#123;&#125;]<span class="keyword">struct</span>&#123;&#125;)  <span class="comment">// Dedup set for this query</span></span><br><span class="line">        </span><br><span class="line">        <span class="keyword">var</span> j <span class="type">int64</span></span><br><span class="line">        <span class="keyword">for</span> j = <span class="number">0</span>; j &lt; info.GetTopK(); &#123;</span><br><span class="line">            <span class="comment">// Pick the current highest score across all segments</span></span><br><span class="line">            sel := SelectSearchResultData(searchResultData, resultOffsets, offsets, i)</span><br><span class="line">            <span class="keyword">if</span> sel == <span class="number">-1</span> &#123;</span><br><span class="line">                <span 
class="keyword">break</span></span><br><span class="line">            &#125;</span><br><span class="line">            </span><br><span class="line">            idx := resultOffsets[sel][i] + offsets[sel]</span><br><span class="line">            id := typeutil.GetPK(searchResultData[sel].GetIds(), idx)</span><br><span class="line">            score := searchResultData[sel].Scores[idx]</span><br><span class="line">            </span><br><span class="line">            <span class="comment">// Deduplicate: skip entities that are already present</span></span><br><span class="line">            <span class="keyword">if</span> _, ok := idSet[id]; !ok &#123;</span><br><span class="line">                <span class="comment">// Append to the final result</span></span><br><span class="line">                retSize += typeutil.AppendFieldData(ret.FieldsData, </span><br><span class="line">                    searchResultData[sel].FieldsData, idx)</span><br><span class="line">                typeutil.AppendPKs(ret.Ids, id)</span><br><span class="line">                ret.Scores = <span class="built_in">append</span>(ret.Scores, score)</span><br><span class="line">                idSet[id] = <span class="keyword">struct</span>&#123;&#125;&#123;&#125;</span><br><span class="line">                j++</span><br><span class="line">            &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">                skipDupCnt++  <span class="comment">// Skip the duplicate entity</span></span><br><span class="line">            &#125;</span><br><span class="line">            offsets[sel]++  <span class="comment">// Advance the cursor</span></span><br><span class="line">        &#125;</span><br><span class="line">        ret.Topks = <span class="built_in">append</span>(ret.Topks, j)</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> ret, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="SelectSearchResultData-函数"><a 
href="#SelectSearchResultData-函数" class="headerlink" title="SelectSearchResultData 函数"></a>SelectSearchResultData function</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">SelectSearchResultData</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function">    dataArray []*schemapb.SearchResultData, </span></span></span><br><span class="line"><span class="params"><span class="function">    resultOffsets [][]<span class="type">int64</span>, </span></span></span><br><span class="line"><span class="params"><span class="function">    offsets []<span class="type">int64</span>, </span></span></span><br><span class="line"><span class="params"><span class="function">    qi <span class="type">int64</span>)</span></span> <span class="type">int</span> &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">var</span> sel = <span class="number">-1</span></span><br><span class="line">    <span class="keyword">var</span> maxDistance = -<span 
class="type">float32</span>(math.MaxFloat32)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Scan all segments and pick the current highest score</span></span><br><span class="line">    <span class="keyword">for</span> i, offset := <span class="keyword">range</span> offsets &#123;</span><br><span class="line">        <span class="keyword">if</span> offset &gt;= dataArray[i].Topks[qi] &#123;</span><br><span class="line">            <span class="keyword">continue</span>  <span class="comment">// This segment's results are exhausted</span></span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        idx := resultOffsets[i][qi] + offset</span><br><span class="line">        distance := dataArray[i].Scores[idx]</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">if</span> distance &gt; maxDistance &#123;</span><br><span class="line">            sel = i</span><br><span class="line">            maxDistance = distance</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> sel</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="4-Reduce-的其他作用"><a href="#4-Reduce-的其他作用" class="headerlink" title="4. Reduce 的其他作用"></a>4. 
Other roles of Reduce</h3><h4 id="去重（Deduplication）"><a href="#去重（Deduplication）" class="headerlink" title="去重（Deduplication）"></a>Deduplication</h4><p>The same entity can appear in multiple segments at the same time:</p><ul><li><strong>Growing + Sealed</strong>: newly written data sits in a growing segment; while it is being flushed and handed off, the same rows may also be present in the corresponding sealed segment</li><li><strong>Compaction</strong>: during compaction, data may exist in both the old and the new segments</li></ul><h4 id="Group-By-处理"><a href="#Group-By-处理" class="headerlink" title="Group By 处理"></a>Group By handling</h4><p>If group by is requested, the Reduce stage applies group by globally:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(sbr *SearchGroupByReduce)</span></span> ReduceSearchResultData(...) 
&#123;</span><br><span class="line">    groupByValueMap := <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="keyword">interface</span>&#123;&#125;]<span class="type">int64</span>)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> j = <span class="number">0</span>; j &lt; groupBound; &#123;</span><br><span class="line">        sel := SelectSearchResultData(...)</span><br><span class="line">        groupByVal := groupByValIterator[sel](idx)</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Check the group limit</span></span><br><span class="line">        <span class="keyword">if</span> groupByValueMap[groupByVal] &gt;= groupSize &#123;</span><br><span class="line">            <span class="comment">// Skip: this group is already full</span></span><br><span class="line">        &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">            <span class="comment">// Add to this group</span></span><br><span class="line">            groupByValueMap[groupByVal]++</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="成本聚合"><a href="#成本聚合" class="headerlink" title="成本聚合"></a>Cost aggregation</h4><p>Reduce also aggregates the storage scan cost of each segment:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">storageCost := lo.Reduce(results, <span class="function"><span class="keyword">func</span><span class="params">(acc segcore.StorageCost, </span></span></span><br><span class="line"><span class="params"><span class="function">    result *internalpb.SearchResults, _ <span class="type">int</span>)</span></span> segcore.StorageCost 
&#123;</span><br><span class="line">    acc.ScannedRemoteBytes += result.GetScannedRemoteBytes()</span><br><span class="line">    acc.ScannedTotalBytes += result.GetScannedTotalBytes()</span><br><span class="line">    <span class="keyword">return</span> acc</span><br><span class="line">&#125;, segcore.StorageCost&#123;&#125;)</span><br></pre></td></tr></table></figure><hr><h2 id="高级搜索（Advanced-Search）"><a href="#高级搜索（Advanced-Search）" class="headerlink" title="高级搜索（Advanced Search）"></a>Advanced Search</h2><h3 id="1-ResultInfo-isAdvance-的作用"><a href="#1-ResultInfo-isAdvance-的作用" class="headerlink" title="1. ResultInfo.isAdvance 的作用"></a>1. The Role of ResultInfo.isAdvance</h3><p><code>ResultInfo.isAdvance</code> marks whether the request is an advanced search (Hybrid Search) and determines which result reduction strategy is used at the QueryNode layer.</p><h4 id="核心作用"><a href="#核心作用" class="headerlink" title="核心作用"></a>Core Role</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ReduceSearchOnQueryNode</span><span class="params">(ctx context.Context, results []*internalpb.SearchResults, </span></span></span><br><span class="line"><span class="params"><span class="function">    info *reduce.ResultInfo)</span></span> (*internalpb.SearchResults, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="keyword">if</span> info.GetIsAdvance() &#123;</span><br><span class="line">        <span class="keyword">return</span> ReduceAdvancedSearchResults(ctx, results)</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> ReduceSearchResults(ctx, results, info)</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="两种归约策略"><a href="#两种归约策略" class="headerlink" title="两种归约策略"></a>Two Reduction Strategies</h4><h5 id="1-普通搜索（isAdvance-false）"><a href="#1-普通搜索（isAdvance-false）" class="headerlink" title="1. 普通搜索（isAdvance &#x3D; false）"></a>1. Regular Search (isAdvance &#x3D; false)</h5><p>Uses <code>ReduceSearchResults</code>:</p><ul><li>Reduces all results immediately</li><li>Returns the final topK results</li><li>Applies to a single vector search</li></ul><h5 id="2-高级搜索（isAdvance-true）"><a href="#2-高级搜索（isAdvance-true）" class="headerlink" title="2. 高级搜索（isAdvance &#x3D; true）"></a>2. Advanced Search (isAdvance &#x3D; true)</h5><p>Uses <code>ReduceAdvancedSearchResults</code>:</p><ul><li><strong>Does not reduce immediately</strong>; instead, it stores the sub-results in <code>SubResults</code></li><li>Defers the reduction to the Proxy layer</li><li>Applies to hybrid search (Hybrid Search), which contains multiple sub-search requests</li><li>Requires special reduction logic at the Proxy layer (such as rerank)</li></ul><h4 id="ReduceAdvancedSearchResults-实现"><a href="#ReduceAdvancedSearchResults-实现" class="headerlink" title="ReduceAdvancedSearchResults 实现"></a>ReduceAdvancedSearchResults Implementation</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ReduceAdvancedSearchResults</span><span class="params">(ctx context.Context, 
</span></span></span><br><span class="line"><span class="params"><span class="function">    results []*internalpb.SearchResults)</span></span> (*internalpb.SearchResults, <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    searchResults := &amp;internalpb.SearchResults&#123;</span><br><span class="line">        IsAdvanced: <span class="literal">true</span>,</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// No reduction here: append each sub-result as is</span></span><br><span class="line">    <span class="keyword">for</span> index, result := <span class="keyword">range</span> results &#123;</span><br><span class="line">        subResult := &amp;internalpb.SubSearchResults&#123;</span><br><span class="line">            MetricType:     result.GetMetricType(),</span><br><span class="line">            NumQueries:     result.GetNumQueries(),</span><br><span class="line">            TopK:           result.GetTopK(),</span><br><span class="line">            SlicedBlob:     result.GetSlicedBlob(),</span><br><span class="line">            SlicedNumCount: result.GetSlicedNumCount(),</span><br><span class="line">            SlicedOffset:   result.GetSlicedOffset(),</span><br><span class="line">            ReqIndex:       <span class="type">int64</span>(index),</span><br><span class="line">        &#125;</span><br><span class="line">        searchResults.SubResults = <span class="built_in">append</span>(searchResults.SubResults, subResult)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> searchResults, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><hr><h2 id="完整流程总结"><a href="#完整流程总结" class="headerlink" title="完整流程总结"></a>End-to-End Flow Summary</h2><h3 id="1-搜索流程时序图"><a href="#1-搜索流程时序图" class="headerlink" title="1. 搜索流程时序图"></a>1. 
Search Flow Sequence Diagram</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br></pre></td><td class="code"><pre><span class="line">┌─────────┐</span><br><span class="line">│  Proxy  │</span><br><span class="line">└────┬────┘</span><br><span class="line">     │ 1. 
SearchRequest</span><br><span class="line">     ▼</span><br><span class="line">┌─────────────┐</span><br><span class="line">│  QueryNode  │</span><br><span class="line">└─────┬───────┘</span><br><span class="line">      │ 2. SearchTask.Execute</span><br><span class="line">      ▼</span><br><span class="line">┌─────────────────┐</span><br><span class="line">│ SearchTask      │</span><br><span class="line">│ - combinePlaceHolderGroups</span><br><span class="line">│ - NewSearchRequest</span><br><span class="line">└─────┬───────────┘</span><br><span class="line">      │ 3. SearchHistorical/SearchStreaming</span><br><span class="line">      ▼</span><br><span class="line">┌─────────────────┐</span><br><span class="line">│ searchSegments  │</span><br><span class="line">│ - search multiple segments concurrently</span><br><span class="line">└─────┬───────────┘</span><br><span class="line">      │ 4. Segment.Search</span><br><span class="line">      ▼</span><br><span class="line">┌─────────────────┐</span><br><span class="line">│ Segment.Search  │</span><br><span class="line">│ - check_search</span><br><span class="line">│ - ExecPlanNodeVisitor</span><br><span class="line">└─────┬───────────┘</span><br><span class="line">      │ 5. ExecuteTask</span><br><span class="line">      ▼</span><br><span class="line">┌─────────────────┐</span><br><span class="line">│ Plan Nodes      │</span><br><span class="line">│ - MvccNode (MVCC filtering)</span><br><span class="line">│ - VectorSearchNode (vector search + TopK)</span><br><span class="line">│ - FilterNode (expression filtering)</span><br><span class="line">│ - GroupByNode (Group By)</span><br><span class="line">└─────┬───────────┘</span><br><span class="line">      │ 6. 
SearchResult (local topK)</span><br><span class="line">      ▼</span><br><span class="line">┌─────────────────┐</span><br><span class="line">│ ReduceSearchResults │</span><br><span class="line">│ - K-Way Merge</span><br><span class="line">│ - Deduplication</span><br><span class="line">│ - Group By</span><br><span class="line">│ - Cost aggregation</span><br><span class="line">└─────┬───────────┘</span><br><span class="line">      │ 7. Global topK result</span><br><span class="line">      ▼</span><br><span class="line">┌─────────┐</span><br><span class="line">│ Result  │</span><br><span class="line">└─────────┘</span><br></pre></td></tr></table></figure><h3 id="2-关键数据流"><a href="#2-关键数据流" class="headerlink" title="2. 关键数据流"></a>2. Key Data Flows</h3><h4 id="SearchRequest-创建"><a href="#SearchRequest-创建" class="headerlink" title="SearchRequest 创建"></a>SearchRequest Creation</h4><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">QueryRequest (protobuf)</span><br><span class="line">  ↓</span><br><span class="line">SerializedExprPlan (bytes)</span><br><span class="line">  ↓</span><br><span class="line">createSearchPlanByExpr</span><br><span class="line">  ↓</span><br><span class="line">SearchPlan (C++)</span><br><span class="line">  ↓</span><br><span class="line">SearchRequest (Go wrapper)</span><br></pre></td></tr></table></figure><h4 id="Segment-Search-执行"><a href="#Segment-Search-执行" class="headerlink" title="Segment Search 执行"></a>Segment Search Execution</h4><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">SearchRequest</span><br><span class="line">  ↓</span><br><span class="line">Plan (contains plan_node_)</span><br><span class="line">  ↓</span><br><span class="line">ExecPlanNodeVisitor.visit(VectorPlanNode)</span><br><span class="line">  ↓</span><br><span class="line">PlanFragment (execution plan node tree)</span><br><span class="line">  ↓</span><br><span class="line">ExecuteTask</span><br><span class="line">  ↓</span><br><span class="line">SearchResult (local topK)</span><br></pre></td></tr></table></figure><h4 id="Result-Reduce"><a href="#Result-Reduce" class="headerlink" title="Result Reduce"></a>Result Reduce</h4><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">Multiple SearchResult (local topK)</span><br><span class="line">  ↓</span><br><span class="line">DecodeSearchResults</span><br><span class="line">  ↓</span><br><span class="line">ReduceSearchResultData (K-Way Merge)</span><br><span class="line">  ↓</span><br><span class="line">EncodeSearchResultData</span><br><span class="line">  ↓</span><br><span class="line">SearchResults (global topK)</span><br></pre></td></tr></table></figure><h3 id="3-性能考虑"><a href="#3-性能考虑" class="headerlink" title="3. 性能考虑"></a>3. 
Performance Considerations</h3><h4 id="时间复杂度"><a href="#时间复杂度" class="headerlink" title="时间复杂度"></a>Time Complexity</h4><ul><li><strong>Segment Search</strong>: O(N × log K), where N is the number of rows in the segment and K is topK</li><li><strong>Reduce</strong>: O(M × K), where M is the number of segments and K is topK</li></ul><p>Because M (the number of segments) is usually far smaller than N (the data volume), the cost of Reduce is comparatively small.</p><h4 id="优化策略"><a href="#优化策略" class="headerlink" title="优化策略"></a>Optimization Strategies</h4><ol><li><strong>Short-circuit</strong>: if only one segment produced results, return them directly</li><li><strong>Concurrent search</strong>: multiple segments are searched concurrently</li><li><strong>Streaming</strong>: StreamingSearchTask performs the reduction in a streaming fashion</li><li><strong>Memory limit</strong>: the result size is checked to avoid OOM</li></ol><h3 id="4-关键要点总结"><a href="#4-关键要点总结" class="headerlink" title="4. 关键要点总结"></a>4. Key Takeaways</h3><h4 id="Segment-Search-层面"><a href="#Segment-Search-层面" class="headerlink" title="Segment Search 层面"></a>At the Segment Search Level</h4><p>✅ Each segment's <code>Search</code> has already performed:</p><ul><li>MVCC filtering (timestamp and delete filtering)</li><li>TopK selection (vector search)</li><li>Group By (if enabled)</li><li>Expression filtering (if any)</li></ul><p>✅ However, it returns a <strong>locally optimal</strong> result that represents only the topK within that segment</p><h4 id="Reduce-层面"><a href="#Reduce-层面" class="headerlink" title="Reduce 层面"></a>At the Reduce Level</h4><p>✅ Reduce merges the results of multiple segments with the <strong>K-Way Merge</strong> algorithm:</p><ul><li>Selects the globally optimal topK</li><li>Removes entities duplicated across segments</li><li>Handles the global group by</li><li>Aggregates cost information</li></ul><p>✅ The final output is the <strong>globally optimal</strong> topK result</p><h4 id="类比理解"><a href="#类比理解" class="headerlink" title="类比理解"></a>An Analogy</h4><ul><li><strong>Segment Search</strong>: each class picks its own top 10 students</li><li><strong>Reduce</strong>: from the top 10 of every class, pick the top 10 of the whole school</li></ul><p>Even though every class has already finished its own ranking, a school-wide Reduce step is still needed to select the final topK.</p><hr><h2 id="参考资料"><a href="#参考资料" class="headerlink" title="参考资料"></a>References</h2><ul><li><a href="./why_reduce_search_results.md">Why Search Results Need to Be Reduced</a></li><li><a href="./proxy-reduce.md">Proxy Reduce Algorithm in Detail</a></li><li><a href="./query_request_flow_analysis.md">Query Request Flow Analysis</a></li><li><a href="../design_docs/segcore/segment_interface.md">Segment Search Implementation</a></li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;目录&quot;&gt;&lt;a href=&quot;#目录&quot; class=&quot;headerlink&quot; title=&quot;目录&quot;&gt;&lt;/a&gt;Table of Contents&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;#%E6%A6%82%E8%BF%B0&quot;&gt;Overview&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#%E6%90%9</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Why Milvus Needs to Reduce Search Results</title>
    <link href="https://szza.github.io/2025/08/10/Milvus/18_why_reduce_search_results/"/>
    <id>https://szza.github.io/2025/08/10/Milvus/18_why_reduce_search_results/</id>
    <published>2025-08-10T07:00:00.000Z</published>
    <updated>2026-01-06T13:10:50.533Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p>In the Milvus search flow, each segment has already performed MVCC filtering, topK selection, group by, and related steps inside <code>segment-&gt;Search</code>, yet the search results from multiple segments still have to go through a Reduce step. This document explains in detail why Reduce is needed and what exactly it does.</p><h2 id="核心原因：数据分布与全局-TopK"><a href="#核心原因：数据分布与全局-TopK" class="headerlink" title="核心原因：数据分布与全局 TopK"></a>Core Reason: Data Distribution and the Global TopK</h2><p><strong>The core problem</strong>: a query request usually touches multiple segments, and each segment returns only the topK within its own data range, whereas the user wants the <strong>globally optimal topK</strong>.</p><h2 id="1-多个-Segments-的场景"><a href="#1-多个-Segments-的场景" class="headerlink" title="1. 多个 Segments 的场景"></a>1. Scenarios with Multiple Segments</h2><p>In Milvus, the data of a collection may be spread across:</p><ul><li><strong>Multiple sealed segments</strong>: historical data is split into multiple sealed segments</li><li><strong>Multiple growing segments</strong>: data written in real time is stored in growing segments</li><li><strong>Multiple shards</strong>: in a distributed setup, data is spread over multiple shards</li><li><strong>Multiple QueryNodes</strong>: in a distributed deployment, different segments may live on different QueryNodes</li></ul><h3 id="示例场景"><a href="#示例场景" class="headerlink" title="示例场景"></a>Example Scenario</h3><p>Suppose a collection holds 10 million rows, split into:</p><ul><li>Segment 1: 3 million rows (sealed)</li><li>Segment 2: 3 million rows (sealed)</li><li>Segment 3: 3 million rows (sealed)</li><li>Segment 4: 1 million rows (growing)</li></ul><p>When the user requests topK&#x3D;10, every segment returns its own topK&#x3D;10 (40 candidates in total), but what we need is the globally optimal topK&#x3D;10 over all 10 million rows.</p><h2 id="2-每个-Segment-返回的是局部-TopK"><a href="#2-每个-Segment-返回的是局部-TopK" class="headerlink" title="2. 每个 Segment 返回的是局部 TopK"></a>2. 
Each Segment Returns a Local TopK</h2><h3 id="Segment-Search-的执行流程"><a href="#Segment-Search-的执行流程" class="headerlink" title="Segment Search 的执行流程"></a>Segment Search Execution Flow</h3><p>When each segment executes <code>segment-&gt;Search</code>:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function">std::unique_ptr&lt;SearchResult&gt;</span></span><br><span class="line"><span class="function"><span class="title">SegmentInternalInterface::Search</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function">    <span class="type">const</span> query::Plan* plan,</span></span></span><br><span class="line"><span class="params"><span class="function">    <span class="type">const</span> query::PlaceholderGroup* placeholder_group,</span></span></span><br><span class="line"><span class="params"><span class="function">    Timestamp timestamp,</span></span></span><br><span class="line"><span class="params"><span class="function">    <span class="type">const</span> folly::CancellationToken&amp; cancel_token,</span></span></span><br><span class="line"><span class="params"><span class="function">    <span class="type">int32_t</span> consistency_level,</span></span></span><br><span class="line"><span class="params"><span class="function">    Timestamp collection_ttl)</span> <span class="type">const</span> </span>&#123;</span><br><span class="line">    <span class="comment">// ... 
execute the search plan node tree</span></span><br><span class="line">    <span class="comment">// Includes: MVCC filtering, vector search, topK selection, group by, etc.</span></span><br><span class="line">    *results = visitor.<span class="built_in">get_moved_result</span>(*plan-&gt;plan_node_);</span><br><span class="line">    <span class="keyword">return</span> results;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="Segment-返回的结果结构"><a href="#Segment-返回的结果结构" class="headerlink" title="Segment 返回的结果结构"></a>Structure of the Result Returned by a Segment</h3><p>Each segment's <code>SearchResult</code> contains:</p><ul><li><code>seg_offsets_</code>: the topK offsets within the segment (sorted)</li><li><code>distances_</code>: the topK distance values within the segment (sorted)</li><li><code>group_by_values_</code>: the group by values (if enabled)</li><li><code>topk_per_nq_prefix_sum_</code>: the prefix sums of the topK counts for each query</li></ul><p><strong>Key point</strong>: these results are <strong>locally optimal</strong>; they represent only the topK within that segment, not the global optimum.</p><h2 id="3-Reduce-的作用：K-Way-Merge-算法"><a href="#3-Reduce-的作用：K-Way-Merge-算法" class="headerlink" title="3. Reduce 的作用：K-Way Merge 算法"></a>3. 
What Reduce Does: The K-Way Merge Algorithm</h2><p>The Reduce operation uses the <strong>K-Way Merge algorithm</strong> to select the globally optimal topK from multiple sorted result lists.</p><h3 id="算法原理"><a href="#算法原理" class="headerlink" title="算法原理"></a>How the Algorithm Works</h3><p>The core idea of the Reduce algorithm resembles the merge phase of merge sort:</p><ol><li><strong>Multiple cursors</strong>: keep one cursor (offset) for each segment's result</li><li><strong>Compare</strong>: at every step, compare the values at the current cursor position of every segment</li><li><strong>Pick the best</strong>: take the largest current score and advance the cursor of the segment it came from</li><li><strong>Deduplicate</strong>: skip the entry if its entity ID has already been seen</li><li><strong>Repeat until topK</strong>: repeat the steps above until the global topK results have been selected</li></ol><h3 id="代码实现"><a href="#代码实现" class="headerlink" title="代码实现"></a>Code Implementation</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span 
class="keyword">func</span> <span class="params">(scr *SearchCommonReduce)</span></span> ReduceSearchResultData(</span><br><span class="line">    ctx context.Context, </span><br><span class="line">    searchResultData []*schemapb.SearchResultData, </span><br><span class="line">    info *reduce.ResultInfo) (*schemapb.SearchResultData, <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> i := <span class="type">int64</span>(<span class="number">0</span>); i &lt; info.GetNq(); i++ &#123;</span><br><span class="line">        <span class="comment">// Per-query state: reset the per-segment cursors and the dedup set for every query</span></span><br><span class="line">        offsets := <span class="built_in">make</span>([]<span class="type">int64</span>, <span class="built_in">len</span>(searchResultData))</span><br><span class="line">        idSet := <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="keyword">interface</span>&#123;&#125;]<span class="keyword">struct</span>&#123;&#125;)  <span class="comment">// used for deduplication</span></span><br><span class="line">        </span><br><span class="line">        <span class="keyword">var</span> j <span class="type">int64</span></span><br><span class="line">        <span class="keyword">for</span> j = <span class="number">0</span>; j &lt; info.GetTopK(); &#123;</span><br><span class="line">            <span class="comment">// Pick the current highest score across all segments</span></span><br><span class="line">            sel := SelectSearchResultData(searchResultData, resultOffsets, offsets, i)</span><br><span class="line">            <span class="keyword">if</span> sel == <span class="number">-1</span> &#123;</span><br><span class="line">                <span class="keyword">break</span></span><br><span class="line">            &#125;</span><br><span class="line">            </span><br><span class="line">            idx := resultOffsets[sel][i] + offsets[sel]</span><br><span class="line">            id := 
typeutil.GetPK(searchResultData[sel].GetIds(), idx)</span><br><span class="line">            score := searchResultData[sel].Scores[idx]</span><br><span class="line">            </span><br><span class="line">            <span class="comment">// Deduplicate: skip entities that are already in the result</span></span><br><span class="line">            <span class="keyword">if</span> _, ok := idSet[id]; !ok &#123;</span><br><span class="line">                <span class="comment">// Append to the final result</span></span><br><span class="line">                retSize += typeutil.AppendFieldData(ret.FieldsData, </span><br><span class="line">                    searchResultData[sel].FieldsData, idx)</span><br><span class="line">                typeutil.AppendPKs(ret.Ids, id)</span><br><span class="line">                ret.Scores = <span class="built_in">append</span>(ret.Scores, score)</span><br><span class="line">                idSet[id] = <span class="keyword">struct</span>&#123;&#125;&#123;&#125;</span><br><span class="line">                j++</span><br><span class="line">            &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">                skipDupCnt++  <span class="comment">// skip the duplicate entity</span></span><br><span class="line">            &#125;</span><br><span class="line">            offsets[sel]++  <span class="comment">// advance the cursor</span></span><br><span class="line">        &#125;</span><br><span class="line">        ret.Topks = <span class="built_in">append</span>(ret.Topks, j)</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> ret, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="SelectSearchResultData-函数"><a href="#SelectSearchResultData-函数" class="headerlink" title="SelectSearchResultData 函数"></a>The SelectSearchResultData Function</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">SelectSearchResultData</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function">    dataArray []*schemapb.SearchResultData, </span></span></span><br><span class="line"><span class="params"><span class="function">    resultOffsets [][]<span class="type">int64</span>, </span></span></span><br><span class="line"><span class="params"><span class="function">    offsets []<span class="type">int64</span>, </span></span></span><br><span class="line"><span class="params"><span class="function">    qi <span class="type">int64</span>)</span></span> <span class="type">int</span> &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">var</span> sel = <span class="number">-1</span></span><br><span class="line">    <span class="keyword">var</span> maxDistance = -<span class="type">float32</span>(math.MaxFloat32)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Scan every segment and find the current highest score</span></span><br><span class="line">    <span class="keyword">for</span> i, offset := <span class="keyword">range</span> 
offsets &#123;</span><br><span class="line">        <span class="keyword">if</span> offset &gt;= dataArray[i].Topks[qi] &#123;</span><br><span class="line">            <span class="keyword">continue</span>  <span class="comment">// this segment's results are exhausted</span></span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        idx := resultOffsets[i][qi] + offset</span><br><span class="line">        distance := dataArray[i].Scores[idx]</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">if</span> distance &gt; maxDistance &#123;</span><br><span class="line">            sel = i</span><br><span class="line">            maxDistance = distance</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> sel</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="4-示例说明"><a href="#4-示例说明" class="headerlink" title="4. 示例说明"></a>4. Worked Example</h2><p>Suppose there are 3 segments, each returning topK&#x3D;10, and the user requests a global topK&#x3D;10:</p><h3 id="Segment-结果（已排序）"><a href="#Segment-结果（已排序）" class="headerlink" title="Segment 结果（已排序）"></a>Segment Results (Sorted)</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">Segment 1: [0.95, 0.90, 0.85, 0.80, 0.75, ...] (10 results)</span><br><span class="line">Segment 2: [0.92, 0.88, 0.82, 0.78, 0.73, ...] (10 results)</span><br><span class="line">Segment 3: [0.89, 0.87, 0.80, 0.76, 0.71, ...] 
(10 results)</span><br></pre></td></tr></table></figure><h3 id="Reduce-过程"><a href="#Reduce-过程" class="headerlink" title="Reduce 过程"></a>The Reduce Process</h3><table><thead><tr><th>Step</th><th>Segment 1 cursor</th><th>Segment 2 cursor</th><th>Segment 3 cursor</th><th>Selected</th><th>Global result</th></tr></thead><tbody><tr><td>1</td><td>0 (0.95)</td><td>0 (0.92)</td><td>0 (0.89)</td><td>Seg1</td><td>[0.95]</td></tr><tr><td>2</td><td>1 (0.90)</td><td>0 (0.92)</td><td>0 (0.89)</td><td>Seg2</td><td>[0.95, 0.92]</td></tr><tr><td>3</td><td>1 (0.90)</td><td>1 (0.88)</td><td>0 (0.89)</td><td>Seg1</td><td>[0.95, 0.92, 0.90]</td></tr><tr><td>4</td><td>2 (0.85)</td><td>1 (0.88)</td><td>0 (0.89)</td><td>Seg3</td><td>[0.95, 0.92, 0.90, 0.89]</td></tr><tr><td>5</td><td>2 (0.85)</td><td>1 (0.88)</td><td>1 (0.87)</td><td>Seg2</td><td>[0.95, 0.92, 0.90, 0.89, 0.88]</td></tr><tr><td>…</td><td>…</td><td>…</td><td>…</td><td>…</td><td>…</td></tr><tr><td>10</td><td>…</td><td>…</td><td>…</td><td>…</td><td>[global topK&#x3D;10]</td></tr></tbody></table><p><strong>Result</strong>: the final output is the globally optimal topK&#x3D;10 results.</p><h2 id="5-Reduce-的其他重要作用"><a href="#5-Reduce-的其他重要作用" class="headerlink" title="5. Reduce 的其他重要作用"></a>5. 
Other Important Roles of Reduce</h2><h3 id="5-1-去重（Deduplication）"><a href="#5-1-去重（Deduplication）" class="headerlink" title="5.1 去重（Deduplication）"></a>5.1 Deduplication</h3><p>The same entity may appear in several segments at the same time:</p><ul><li><strong>Growing + Sealed</strong>: newly written data lives in a growing segment and may not yet have been flushed to a sealed segment</li><li><strong>Compaction</strong>: while compaction is running, data may exist in both the old and the new segments</li><li><strong>Multiple replicas</strong>: in a distributed setup, replicas of the data may exist</li></ul><p>Reduce uses <code>idSet</code> to ensure that each entity appears only once:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">idSet := <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="keyword">interface</span>&#123;&#125;]<span class="keyword">struct</span>&#123;&#125;)</span><br><span class="line"><span class="keyword">if</span> _, ok := idSet[id]; !ok &#123;</span><br><span class="line">    <span class="comment">// Append to the result</span></span><br><span class="line">    idSet[id] = <span class="keyword">struct</span>&#123;&#125;&#123;&#125;</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line">    skipDupCnt++  <span class="comment">// skip the duplicate</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="5-2-Group-By-处理"><a href="#5-2-Group-By-处理" class="headerlink" title="5.2 Group By 处理"></a>5.2 Group By Handling</h3><p>If the request specifies group by, the Reduce stage applies group by globally across segments:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(sbr *SearchGroupByReduce)</span></span> ReduceSearchResultData(...) &#123;</span><br><span class="line">    <span class="comment">// 1. Group by the group-by value</span></span><br><span class="line">    groupByValueMap := <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="keyword">interface</span>&#123;&#125;]<span class="type">int64</span>)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Select the topK for each group</span></span><br><span class="line">    <span class="keyword">for</span> j = <span class="number">0</span>; j &lt; groupBound; &#123;</span><br><span class="line">        <span class="comment">// Pick the current highest score</span></span><br><span class="line">        sel := SelectSearchResultData(...)</span><br><span class="line">        groupByVal := groupByValIterator[sel](idx)</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// 3. 
Check the group limit</span></span><br><span class="line">        <span class="keyword">if</span> groupCount &gt;= groupSize &#123;</span><br><span class="line">            <span class="comment">// Skip: this group is already full</span></span><br><span class="line">        &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">            <span class="comment">// Add to this group</span></span><br><span class="line">            groupByValueMap[groupByVal]++</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="5-3-成本聚合"><a href="#5-3-成本聚合" class="headerlink" title="5.3 成本聚合"></a>5.3 Cost Aggregation</h3><p>Reduce also aggregates the storage scan cost reported by each segment:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">storageCost := lo.Reduce(results, <span class="function"><span class="keyword">func</span><span class="params">(acc segcore.StorageCost, </span></span></span><br><span class="line"><span class="params"><span class="function">    result *internalpb.SearchResults, _ <span class="type">int</span>)</span></span> segcore.StorageCost &#123;</span><br><span class="line">    acc.ScannedRemoteBytes += result.GetScannedRemoteBytes()</span><br><span class="line">    acc.ScannedTotalBytes += result.GetScannedTotalBytes()</span><br><span class="line">    <span class="keyword">return</span> acc</span><br><span class="line">&#125;, segcore.StorageCost&#123;&#125;)</span><br></pre></td></tr></table></figure><h2 id="6-性能考虑"><a href="#6-性能考虑" class="headerlink" title="6. 性能考虑"></a>6. 
Performance Considerations</h2><h3 id="时间复杂度"><a href="#时间复杂度" class="headerlink" title="时间复杂度"></a>Time Complexity</h3><ul><li><strong>Segment Search</strong>: O(N × log K), where N is the amount of data in the segment and K is topK</li><li><strong>Reduce</strong>: O(M × K), where M is the number of segments and K is topK</li></ul><p>Because M (the number of segments) is usually far smaller than N (the amount of data), the cost of Reduce is relatively small.</p><h3 id="优化策略"><a href="#优化策略" class="headerlink" title="优化策略"></a>Optimization Strategies</h3><ol><li><p><strong>Short-circuit</strong>: if only one segment produced results, return them directly without running Reduce</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> <span class="built_in">len</span>(results) == <span class="number">1</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> results[<span class="number">0</span>], <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></li><li><p><strong>Parallel processing</strong>: with multiple queries (nq &gt; 1), the Reduce for each query can run in parallel</p></li><li><p><strong>Memory limit</strong>: check the result size to avoid OOM</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> retSize &gt; maxOutputSize &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;search results exceed the maxOutputSize Limit&quot;</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></li></ol><h2 id="7-总结"><a href="#7-总结" class="headerlink" title="7. 总结"></a>7. 
Summary</h2><h3 id="为什么需要-Reduce？"><a href="#为什么需要-Reduce？" class="headerlink" title="为什么需要 Reduce？"></a>Why Is Reduce Needed?</h3><ol><li><strong>Data distribution</strong>: the data is spread across multiple segments</li><li><strong>Local vs. global</strong>: each segment returns a local topK, but the user needs the global topK</li><li><strong>Deduplication</strong>: entities duplicated across segments must be removed</li><li><strong>Group By</strong>: group by has to be applied globally</li><li><strong>Cost aggregation</strong>: the cost information from every segment has to be summed</li></ol><h3 id="关键要点"><a href="#关键要点" class="headerlink" title="关键要点"></a>Key Points</h3><ul><li>✅ Each segment&#39;s <code>Search</code> has already handled MVCC, topK, group by, and so on</li><li>✅ What it returns, however, is only a <strong>locally optimal</strong> result</li><li>✅ Reduce merges the results of multiple segments with a <strong>K-Way Merge</strong> algorithm</li><li>✅ The outcome is the <strong>globally optimal</strong> topK</li><li>✅ Deduplication, group by, and cost aggregation are handled in the same pass</li></ul><h3 id="类比理解"><a href="#类比理解" class="headerlink" title="类比理解"></a>An Analogy</h3><p>Think of it this way:</p><ul><li><strong>Segment Search</strong>: each class picks its own top 10 students</li><li><strong>Reduce</strong>: from the top 10 of every class, pick the top 10 of the whole school</li></ul><p>Even though every class has already ranked its students, a school-wide Reduce is still required to select the final topK.</p><h2 id="参考资料"><a href="#参考资料" class="headerlink" title="参考资料"></a>References</h2><ul><li><a href="./proxy-reduce.md">Proxy Reduce Algorithm Explained</a></li><li><a href="./query_request_flow_analysis.md">Query Request Flow Analysis</a></li><li><a href="../design_docs/segcore/segment_interface.md">Segment Search Implementation</a></li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;概述&quot;&gt;&lt;/a&gt;Overview&lt;/h2&gt;&lt;p&gt;In the Milvus search flow, although each segment&#39;s &lt;code&gt;segment-&amp;gt;Search&lt;/code&gt; has already completed </summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Detailed Analysis of the Milvus organizeSubTask Function</title>
    <link href="https://szza.github.io/2025/08/10/Milvus/17_organize_subtask_analysis/"/>
    <id>https://szza.github.io/2025/08/10/Milvus/17_organize_subtask_analysis/</id>
    <published>2025-08-10T06:00:00.000Z</published>
    <updated>2026-01-06T13:10:49.877Z</updated>
    
    <content type="html"><![CDATA[<p>This document takes a close look at the implementation of <code>organizeSubTask</code>, the core function in the QueryNode delegator that organizes and dispatches sub-tasks.</p><h2 id="函数签名"><a href="#函数签名" class="headerlink" title="函数签名"></a>Function Signature</h2><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">organizeSubTask</span>[<span class="title">T</span> <span class="title">any</span>]<span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function">    ctx context.Context,</span></span></span><br><span class="line"><span class="params"><span class="function">    req T,</span></span></span><br><span class="line"><span class="params"><span class="function">    sealed []SnapshotItem,</span></span></span><br><span class="line"><span class="params"><span class="function">    growing []SegmentEntry,</span></span></span><br><span class="line"><span class="params"><span class="function">    sd *shardDelegator,</span></span></span><br><span class="line"><span class="params"><span class="function">    skipEmpty <span class="type">bool</span>,</span></span></span><br><span class="line"><span class="params"><span class="function">    modify <span class="keyword">func</span>(T, querypb.DataScope, []<span class="type">int64</span>, <span class="type">int64</span>)</span></span> T,</span><br><span class="line">) ([]subTask[T], <span class="type">error</span>)</span><br></pre></td></tr></table></figure><p><strong>Location</strong>: <code>internal/querynodev2/delegator/delegator.go:751</code></p><h2 id="函数职责"><a href="#函数职责" class="headerlink" 
title="函数职责"></a>Function Responsibilities</h2><p><code>organizeSubTask</code> is a generic function that splits a query&#x2F;search request into multiple sub-tasks, each covering the group of segments that must be queried on one particular node.</p><h2 id="参数说明"><a href="#参数说明" class="headerlink" title="参数说明"></a>Parameters</h2><table><thead><tr><th>Parameter</th><th>Type</th><th>Description</th></tr></thead><tbody><tr><td><code>ctx</code></td><td><code>context.Context</code></td><td>Context used for cancellation and timeout control</td></tr><tr><td><code>req</code></td><td><code>T</code> (generic)</td><td>The original request; can be a <code>QueryRequest</code>, <code>SearchRequest</code>, or <code>GetStatisticsRequest</code></td></tr><tr><td><code>sealed</code></td><td><code>[]SnapshotItem</code></td><td>Snapshot of the sealed-segment distribution, grouped by node</td></tr><tr><td><code>growing</code></td><td><code>[]SegmentEntry</code></td><td>List of growing segments (normally on the local node)</td></tr><tr><td><code>sd</code></td><td><code>*shardDelegator</code></td><td>The shard delegator instance, which provides resources such as workerManager</td></tr><tr><td><code>skipEmpty</code></td><td><code>bool</code></td><td>Whether to skip empty sub-tasks (those with no segments)</td></tr><tr><td><code>modify</code></td><td><code>func(...)</code></td><td>Callback that rewrites the request, customizing its parameters for each sub-task</td></tr></tbody></table><h2 id="返回值"><a href="#返回值" class="headerlink" title="返回值"></a>Return Values</h2><ul><li><code>[]subTask[T]</code>: the list of sub-tasks; each one holds the request, the target node ID, and the worker client</li><li><code>error</code>: error information (the current implementation always returns nil)</li></ul><h2 id="核心数据结构"><a href="#核心数据结构" class="headerlink" title="核心数据结构"></a>Core Data Structures</h2><h3 id="subTask"><a href="#subTask" class="headerlink" title="subTask"></a>subTask</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> subTask[T any] <span class="keyword">struct</span> &#123;</span><br><span class="line">    req      T              <span class="comment">// Modified request (carries the specific segmentIDs and targetID)</span></span><br><span class="line">    targetID <span 
class="type">int64</span>          <span class="comment">// Target node ID</span></span><br><span class="line">    worker   cluster.Worker <span class="comment">// Worker client (may be nil if the node is unavailable)</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="SnapshotItem"><a href="#SnapshotItem" class="headerlink" title="SnapshotItem"></a>SnapshotItem</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> SnapshotItem <span class="keyword">struct</span> &#123;</span><br><span class="line">    NodeID   <span class="type">int64</span>           <span class="comment">// Node ID</span></span><br><span class="line">    Segments []SegmentEntry  <span class="comment">// Segments on this node</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Note</strong>: a SnapshotItem represents the group of sealed segments held by one node.</p><h3 id="SegmentEntry"><a href="#SegmentEntry" class="headerlink" title="SegmentEntry"></a>SegmentEntry</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> SegmentEntry <span class="keyword">struct</span> &#123;</span><br><span class="line">    NodeID        <span class="type">int64</span>                <span class="comment">// Node ID</span></span><br><span class="line">    SegmentID     UniqueID             <span class="comment">// Segment ID</span></span><br><span class="line">    PartitionID   UniqueID             <span 
class="comment">// Partition ID</span></span><br><span class="line">    Version       <span class="type">int64</span>                <span class="comment">// Segment version</span></span><br><span class="line">    TargetVersion <span class="type">int64</span>                <span class="comment">// Target version</span></span><br><span class="line">    Level         datapb.SegmentLevel  <span class="comment">// Segment level (L0, L1, etc.)</span></span><br><span class="line">    Offline       <span class="type">bool</span>                 <span class="comment">// Whether the segment is offline</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="执行流程详解"><a href="#执行流程详解" class="headerlink" title="执行流程详解"></a>Execution Flow in Detail</h2><h3 id="1-初始化结果列表-Line-760"><a href="#1-初始化结果列表-Line-760" class="headerlink" title="1. 初始化结果列表 (Line 760)"></a>1. Initialize the Result List (Line 760)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">log := sd.getLogger(ctx)</span><br><span class="line">result := <span class="built_in">make</span>([]subTask[T], <span class="number">0</span>, <span class="built_in">len</span>(sealed)+<span class="number">1</span>)</span><br></pre></td></tr></table></figure><ul><li><strong>Capacity preallocation</strong>: <code>len(sealed)+1</code><ul><li><code>len(sealed)</code>: one task per sealed snapshot</li><li><code>+1</code>: one task for the growing segments</li></ul></li></ul><h3 id="2-定义-packSubTask-闭包函数-Lines-762-788"><a href="#2-定义-packSubTask-闭包函数-Lines-762-788" class="headerlink" title="2. 定义 packSubTask 闭包函数 (Lines 762-788)"></a>2. 
Define the packSubTask Closure (Lines 762-788)</h3><p><code>packSubTask</code> is the core task-packing function; it creates one sub-task for a given group of segments.</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line">packSubTask := <span class="function"><span class="keyword">func</span><span class="params">(segments []SegmentEntry, workerID <span class="type">int64</span>, scope querypb.DataScope)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// Step 1: extract the segment IDs</span></span><br><span class="line">    segmentIDs := lo.Map(segments, <span class="function"><span class="keyword">func</span><span class="params">(item SegmentEntry, _ <span class="type">int</span>)</span></span> <span class="type">int64</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> item.SegmentID</span><br><span class="line">    &#125;)</span><br><span class="line">    
</span><br><span class="line">    <span class="comment">// Step 2: skip empty tasks (optional)</span></span><br><span class="line">    <span class="keyword">if</span> skipEmpty &amp;&amp; <span class="built_in">len</span>(segmentIDs) == <span class="number">0</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Step 3: rewrite the request</span></span><br><span class="line">    req := modify(req, scope, segmentIDs, workerID)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Step 4: get the worker (fault tolerant)</span></span><br><span class="line">    worker, err := sd.workerManager.GetWorker(ctx, workerID)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        log.Warn(<span class="string">&quot;failed to get worker for sub task&quot;</span>,</span><br><span class="line">            zap.Int64(<span class="string">&quot;nodeID&quot;</span>, workerID),</span><br><span class="line">            zap.Int64s(<span class="string">&quot;segments&quot;</span>, segmentIDs),</span><br><span class="line">            zap.Error(err),</span><br><span class="line">        )</span><br><span class="line">        <span class="comment">// Note: the task is still created even if getting the worker fails</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Step 5: create and append the sub-task</span></span><br><span class="line">    result = <span class="built_in">append</span>(result, subTask[T]&#123;</span><br><span class="line">        req:      req,</span><br><span class="line">        targetID: workerID,</span><br><span class="line">        worker:   worker,  <span class="comment">// may be nil</span></span><br><span class="line">    &#125;)</span><br><span class="line">    <span 
class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="packSubTask-的关键特性"><a href="#packSubTask-的关键特性" class="headerlink" title="packSubTask 的关键特性"></a>Key Properties of packSubTask</h4><ol><li><strong>Segment ID extraction</strong>: uses <code>lo.Map</code> to collect all segment IDs</li><li><strong>Empty-task handling</strong>: <code>skipEmpty</code> decides whether tasks with no segments are skipped</li><li><strong>Request customization</strong>: the <code>modify</code> function tailors the request parameters for each sub-task</li><li><strong>Fault-tolerant design</strong>: even when the worker cannot be obtained, the task is still created (with a nil worker) and left for the later <code>executeSubTasks</code> step to handle</li></ol><h3 id="3-处理-Sealed-Segments-Lines-790-795"><a href="#3-处理-Sealed-Segments-Lines-790-795" class="headerlink" title="3. 处理 Sealed Segments (Lines 790-795)"></a>3. Handle Sealed Segments (Lines 790-795)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> _, entry := <span class="keyword">range</span> sealed &#123;</span><br><span class="line">    err := packSubTask(entry.Segments, entry.NodeID, querypb.DataScope_Historical)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Logic</strong>:</p><ul><li>Iterate over every <code>SnapshotItem</code> (each one represents the sealed segments on a single node)</li><li>Create one sub-task per node</li><li><strong>Scope</strong>: <code>DataScope_Historical</code> (historical data)</li><li><strong>Worker ID</strong>: <code>entry.NodeID</code> (the node that holds those segments, usually remote)</li></ul><p><strong>Example</strong>:</p><figure class="highlight 
plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">sealed = [</span><br><span class="line">    &#123;NodeID: 1, Segments: [seg1, seg2, seg3]&#125;,</span><br><span class="line">    &#123;NodeID: 2, Segments: [seg4, seg5]&#125;,</span><br><span class="line">    &#123;NodeID: 3, Segments: [seg6]&#125;,</span><br><span class="line">]</span><br><span class="line"></span><br><span class="line">Produces 3 sub-tasks:</span><br><span class="line">- Task 1: NodeID=1, SegmentIDs=[seg1, seg2, seg3], Scope=Historical</span><br><span class="line">- Task 2: NodeID=2, SegmentIDs=[seg4, seg5], Scope=Historical</span><br><span class="line">- Task 3: NodeID=3, SegmentIDs=[seg6], Scope=Historical</span><br></pre></td></tr></table></figure><h3 id="4-处理-Growing-Segments-Line-797"><a href="#4-处理-Growing-Segments-Line-797" class="headerlink" title="4. 处理 Growing Segments (Line 797)"></a>4. 
Handle Growing Segments (Line 797)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">packSubTask(growing, paramtable.GetNodeID(), querypb.DataScope_Streaming)</span><br></pre></td></tr></table></figure><p><strong>Logic</strong>:</p><ul><li>All growing segments go into a single sub-task</li><li><strong>Scope</strong>: <code>DataScope_Streaming</code> (streaming data)</li><li><strong>Worker ID</strong>: <code>paramtable.GetNodeID()</code> (the local node)</li><li>Growing segments always live on the current node (the leader)</li></ul><p><strong>Why the local node</strong>:</p><ul><li>Growing segments hold data that is still being written</li><li>Only the shard leader (the node hosting the current delegator) is responsible for growing segments</li><li>They are never distributed to other nodes</li></ul><h3 id="5-返回结果-Line-799"><a href="#5-返回结果-Line-799" class="headerlink" title="5. 返回结果 (Line 799)"></a>5. Return the Result (Line 799)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">return</span> result, <span class="literal">nil</span></span><br></pre></td></tr></table></figure><p>Returns every sub-task that was created.</p><h2 id="modify-函数详解"><a href="#modify-函数详解" class="headerlink" title="modify 函数详解"></a>The modify Function in Detail</h2><p><code>modify</code> is a callback that customizes the request parameters for each sub-task. Each operation type supplies its own implementation.</p><h3 id="Query-操作的-modify-函数"><a href="#Query-操作的-modify-函数" class="headerlink" title="Query 操作的 modify 函数"></a>The modify Function for Query</h3><p><strong>Function</strong>: <code>shardDelegator.modifyQueryRequest</code> (line 292)</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(sd *shardDelegator)</span></span> modifyQueryRequest(</span><br><span class="line">    req *querypb.QueryRequest, </span><br><span class="line">    scope querypb.DataScope, </span><br><span class="line">    segmentIDs []<span class="type">int64</span>, </span><br><span class="line">    targetID <span class="type">int64</span>,</span><br><span class="line">) *querypb.QueryRequest &#123;</span><br><span class="line">    nodeReq := proto.Clone(req).(*querypb.QueryRequest)</span><br><span class="line">    nodeReq.Scope = scope                      <span class="comment">// Set the scope</span></span><br><span class="line">    nodeReq.Req.Base.TargetID = targetID        <span class="comment">// Set the target node</span></span><br><span class="line">    nodeReq.SegmentIDs = segmentIDs             <span class="comment">// Set the segments to query</span></span><br><span class="line">    nodeReq.DmlChannels = []<span class="type">string</span>&#123;sd.vchannelName&#125;</span><br><span class="line">    <span class="keyword">return</span> nodeReq</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="Search-操作的-modify-函数"><a href="#Search-操作的-modify-函数" class="headerlink" title="Search 操作的 modify 函数"></a>The modify Function for Search</h3><p><strong>Function</strong>: <code>shardDelegator.modifySearchRequest</code> (line 280)</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span 
class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(sd *shardDelegator)</span></span> modifySearchRequest(</span><br><span class="line">    req *querypb.SearchRequest, </span><br><span class="line">    scope querypb.DataScope, </span><br><span class="line">    segmentIDs []<span class="type">int64</span>, </span><br><span class="line">    targetID <span class="type">int64</span>,</span><br><span class="line">) *querypb.SearchRequest &#123;</span><br><span class="line">    nodeReq := &amp;querypb.SearchRequest&#123;</span><br><span class="line">        DmlChannels:     []<span class="type">string</span>&#123;sd.vchannelName&#125;,</span><br><span class="line">        SegmentIDs:      segmentIDs,</span><br><span class="line">        Scope:           scope,</span><br><span class="line">        Req:             sd.shallowCopySearchRequest(req.GetReq(), targetID),</span><br><span class="line">        FromShardLeader: req.FromShardLeader,</span><br><span class="line">        TotalChannelNum: req.TotalChannelNum,</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> nodeReq</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="GetStatistics-操作的-modify-函数"><a href="#GetStatistics-操作的-modify-函数" class="headerlink" title="GetStatistics 操作的 modify 函数"></a>The modify Function for GetStatistics</h3><p><strong>Usage</strong>: defined inline (line 717)</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span><span 
class="params">(req *querypb.GetStatisticsRequest, scope querypb.DataScope, segmentIDs []<span class="type">int64</span>, targetID <span class="type">int64</span>)</span></span> *querypb.GetStatisticsRequest &#123;</span><br><span class="line">    nodeReq := proto.Clone(req).(*querypb.GetStatisticsRequest)</span><br><span class="line">    nodeReq.GetReq().GetBase().TargetID = targetID</span><br><span class="line">    nodeReq.Scope = scope</span><br><span class="line">    nodeReq.SegmentIDs = segmentIDs</span><br><span class="line">    nodeReq.FromShardLeader = <span class="literal">true</span></span><br><span class="line">    <span class="keyword">return</span> nodeReq</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="Worker-获取机制"><a href="#Worker-获取机制" class="headerlink" title="Worker 获取机制"></a>How Workers Are Obtained</h2><h3 id="WorkerManager-GetWorker"><a href="#WorkerManager-GetWorker" class="headerlink" title="WorkerManager.GetWorker"></a>WorkerManager.GetWorker</h3><p><strong>Location</strong>: <code>internal/querynodev2/cluster/manager.go:47</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span 
class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *grpcWorkerManager)</span></span> GetWorker(ctx context.Context, nodeID <span class="type">int64</span>) (Worker, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// 1. Try the cache first</span></span><br><span class="line">    worker, ok := m.workers.Get(nodeID)</span><br><span class="line">    <span class="keyword">var</span> err <span class="type">error</span></span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. On a miss, create the worker through singleflight (avoids duplicate creation)</span></span><br><span class="line">    <span class="keyword">if</span> !ok &#123;</span><br><span class="line">        worker, err, _ = m.sf.Do(strconv.FormatInt(nodeID, <span class="number">10</span>), <span class="function"><span class="keyword">func</span><span class="params">()</span></span> (Worker, <span class="type">error</span>) &#123;</span><br><span class="line">            <span class="comment">// Build the worker with the builder</span></span><br><span class="line">            worker, err = m.builder(ctx, nodeID)</span><br><span class="line">            <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">                <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">            &#125;</span><br><span class="line">            </span><br><span class="line">            <span class="comment">// Insert into the cache (reuse the existing entry if there is one)</span></span><br><span class="line">            old, exist := m.workers.GetOrInsert(nodeID, worker)</span><br><span class="line">            <span 
class="keyword">if</span> exist &#123;</span><br><span class="line">                worker.Stop()</span><br><span class="line">                worker = old</span><br><span class="line">            &#125;</span><br><span class="line">            <span class="keyword">return</span> worker, <span class="literal">nil</span></span><br><span class="line">        &#125;)</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Check the worker&#x27;s health</span></span><br><span class="line">    <span class="keyword">if</span> !worker.IsHealthy() &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;node is not healthy: %d&quot;</span>, nodeID)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> worker, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Properties</strong>:</p><ul><li><strong>Caching</strong>: workers that already exist are reused</li><li><strong>Singleflight</strong>: prevents the same worker from being created concurrently</li><li><strong>Health check</strong>: ensures the worker is usable</li></ul><h3 id="Worker-类型"><a href="#Worker-类型" class="headerlink" title="Worker 类型"></a>Worker Types</h3><h4 id="本地-Worker-LocalWorker"><a href="#本地-Worker-LocalWorker" class="headerlink" title="本地 Worker (LocalWorker)"></a>Local Worker (LocalWorker)</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span 
class="line"><span class="comment">// 当 nodeID == 当前节点 ID 时创建</span></span><br><span class="line"><span class="keyword">if</span> nodeID == node.GetNodeID() &#123;</span><br><span class="line">    <span class="keyword">return</span> NewLocalWorker(node), <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><ul><li>直接调用本地 QueryNode 方法</li><li>无网络开销</li></ul><h4 id="远程-Worker-RemoteWorker"><a href="#远程-Worker-RemoteWorker" class="headerlink" title="远程 Worker (RemoteWorker)"></a>远程 Worker (RemoteWorker)</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// 远程节点</span></span><br><span class="line"><span class="keyword">return</span> cluster.NewPoolingRemoteWorker(<span class="function"><span class="keyword">func</span><span class="params">()</span></span> (types.QueryNodeClient, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> grpcquerynodeclient.NewClient(node.ctx, addr, nodeID)</span><br><span class="line">&#125;)</span><br></pre></td></tr></table></figure><ul><li>通过 gRPC 调用远程 QueryNode</li><li>支持连接池（<code>WorkerPoolingSize</code> 配置）</li><li>使用 round-robin 选择连接</li></ul><p><strong>Worker 接口</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> Worker <span class="keyword">interface</span> 
&#123;</span><br><span class="line">    LoadSegments(context.Context, *querypb.LoadSegmentsRequest) <span class="type">error</span></span><br><span class="line">    ReleaseSegments(context.Context, *querypb.ReleaseSegmentsRequest) <span class="type">error</span></span><br><span class="line">    SearchSegments(ctx context.Context, req *querypb.SearchRequest) (*internalpb.SearchResults, <span class="type">error</span>)</span><br><span class="line">    QuerySegments(ctx context.Context, req *querypb.QueryRequest) (*internalpb.RetrieveResults, <span class="type">error</span>)</span><br><span class="line">    QueryStreamSegments(ctx context.Context, req *querypb.QueryRequest, srv streamrpc.QueryStreamServer) <span class="type">error</span></span><br><span class="line">    GetStatistics(ctx context.Context, req *querypb.GetStatisticsRequest) (*internalpb.GetStatisticsResponse, <span class="type">error</span>)</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">    IsHealthy() <span class="type">bool</span></span><br><span class="line">    Stop()</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="使用场景"><a href="#使用场景" class="headerlink" title="使用场景"></a>使用场景</h2><h3 id="1-Query-操作"><a href="#1-Query-操作" class="headerlink" title="1. Query 操作"></a>1. Query 操作</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">tasks, err := organizeSubTask(ctx, req, sealed, growing, sd, <span class="literal">true</span>, sd.modifyQueryRequest)</span><br></pre></td></tr></table></figure><ul><li>用于普通查询操作</li><li><code>skipEmpty = true</code>: 跳过没有 segments 的任务</li></ul><h3 id="2-QueryStream-操作"><a href="#2-QueryStream-操作" class="headerlink" title="2. QueryStream 操作"></a>2. 
QueryStream 操作</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">tasks, err := organizeSubTask(ctx, req, sealed, growing, sd, <span class="literal">true</span>, sd.modifyQueryRequest)</span><br></pre></td></tr></table></figure><ul><li>用于流式查询操作</li><li>与普通查询使用相同的逻辑</li></ul><h3 id="3-Search-操作"><a href="#3-Search-操作" class="headerlink" title="3. Search 操作"></a>3. Search 操作</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">tasks, err := organizeSubTask(ctx, req, sealed, growing, sd, <span class="literal">true</span>, sd.modifySearchRequest)</span><br></pre></td></tr></table></figure><ul><li>用于向量搜索操作</li><li>使用 <code>modifySearchRequest</code> 定制请求</li></ul><h3 id="4-GetStatistics-操作"><a href="#4-GetStatistics-操作" class="headerlink" title="4. GetStatistics 操作"></a>4. GetStatistics 操作</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">tasks, err := organizeSubTask(ctx, req, sealed, growing, sd, <span class="literal">true</span>, </span><br><span class="line">    <span class="function"><span class="keyword">func</span><span class="params">(req *querypb.GetStatisticsRequest, scope querypb.DataScope, segmentIDs []<span class="type">int64</span>, targetID <span class="type">int64</span>)</span></span> *querypb.GetStatisticsRequest &#123;</span><br><span class="line">        <span class="comment">// 内联定制函数</span></span><br><span class="line">    &#125;)</span><br></pre></td></tr></table></figure><ul><li>用于获取统计信息</li><li>使用内联函数定制请求</li></ul><h3 id="5-UpdateSchema-操作"><a href="#5-UpdateSchema-操作" class="headerlink" title="5. UpdateSchema 操作"></a>5. 
UpdateSchema operation</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">tasks, err := organizeSubTask(ctx, &amp;querypb.UpdateSchemaRequest&#123;...&#125;, </span><br><span class="line">    sealed, growing, sd, <span class="literal">false</span>,  <span class="comment">// skipEmpty = false</span></span><br><span class="line">    <span class="function"><span class="keyword">func</span><span class="params">(...)</span></span> *querypb.UpdateSchemaRequest &#123;</span><br><span class="line">        <span class="comment">// customization function</span></span><br><span class="line">    &#125;)</span><br></pre></td></tr></table></figure><ul><li>Used to update the schema</li><li><code>skipEmpty = false</code>: a task is created even for a node with no segments, because every node has to be notified</li></ul><h2 id="设计优势"><a href="#设计优势" class="headerlink" title="设计优势"></a>Design advantages</h2><h3 id="1-泛型设计"><a href="#1-泛型设计" class="headerlink" title="1. 泛型设计"></a>1. Generic design</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">organizeSubTask</span>[<span class="title">T</span> <span class="title">any</span>]<span class="params">(...)</span></span></span><br></pre></td></tr></table></figure><ul><li><strong>Type safety</strong>: request types are checked at compile time</li><li><strong>Code reuse</strong>: one function serves several request types</li><li><strong>Flexibility</strong>: the <code>modify</code> function customizes the request for each type</li></ul><h3 id="2-容错设计"><a href="#2-容错设计" class="headerlink" title="2. 容错设计"></a>2. 
Fault-tolerant design</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">worker, err := sd.workerManager.GetWorker(ctx, workerID)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">    log.Warn(<span class="string">&quot;failed to get worker for sub task&quot;</span>, ...)</span><br><span class="line">    <span class="comment">// keep building the task; worker stays nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><ul><li>The task is still created when its worker is unavailable</li><li>The later <code>executeSubTasks</code> step decides how to handle it (partial result vs. complete failure)</li></ul><h3 id="3-关注点分离"><a href="#3-关注点分离" class="headerlink" title="3. 关注点分离"></a>3. Separation of concerns</h3><ul><li><strong>organizeSubTask</strong>: only organizes tasks and never executes them</li><li><strong>executeSubTasks</strong>: executes the tasks and collects the results</li><li><strong>modify function</strong>: customizes the request</li></ul><h3 id="4-批量操作"><a href="#4-批量操作" class="headerlink" title="4. 批量操作"></a>4. Batching</h3><ul><li>All segments on the same node are grouped into a single task</li><li>This reduces the number of RPC calls</li><li>Fewer calls mean less per-request network overhead</li></ul><h2 id="性能考虑"><a href="#性能考虑" class="headerlink" title="性能考虑"></a>Performance considerations</h2><h3 id="1-预分配容量"><a href="#1-预分配容量" class="headerlink" title="1. 预分配容量"></a>1. Preallocated capacity</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">result := <span class="built_in">make</span>([]subTask[T], <span class="number">0</span>, <span class="built_in">len</span>(sealed)+<span class="number">1</span>)</span><br></pre></td></tr></table></figure><p>Sizing the slice up front avoids reallocations while tasks are appended.</p><h3 id="2-Worker-缓存"><a href="#2-Worker-缓存" class="headerlink" title="2. Worker 缓存"></a>2. 
Worker caching</h3><ul><li>WorkerManager caches the workers it has already created</li><li>This avoids re-establishing gRPC connections</li></ul><h3 id="3-连接池"><a href="#3-连接池" class="headerlink" title="3. 连接池"></a>3. Connection pool</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">poolSize := paramtable.Get().QueryNodeCfg.WorkerPoolingSize.GetAsInt()</span><br></pre></td></tr></table></figure><ul><li>Each remote worker maintains several gRPC connections</li><li>Connections are picked round-robin, which improves concurrent throughput</li></ul><h3 id="4-本地优化"><a href="#4-本地优化" class="headerlink" title="4. 本地优化"></a>4. Local optimization</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> nodeID == node.GetNodeID() &#123;</span><br><span class="line">    <span class="keyword">return</span> NewLocalWorker(node), <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><ul><li>A local worker is called directly, with no network overhead</li></ul><h2 id="错误处理"><a href="#错误处理" class="headerlink" title="错误处理"></a>Error handling</h2><h3 id="Worker-获取失败"><a href="#Worker-获取失败" class="headerlink" title="Worker 获取失败"></a>Failure to obtain a worker</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">    log.Warn(<span class="string">&quot;failed to get worker for sub task&quot;</span>, ...)</span><br><span class="line">    <span class="comment">// do not return the error; keep building the task</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Strategy</strong>: deferred error handling</p><ul><li>The <code>organizeSubTask</code> stage only logs a warning</li><li>The <code>executeSubTasks</code> 
stage then decides, according to the partial-result policy, whether the request fails</li></ul><h3 id="空-Segment-列表"><a href="#空-Segment-列表" class="headerlink" title="空 Segment 列表"></a>Empty segment list</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> skipEmpty &amp;&amp; <span class="built_in">len</span>(segmentIDs) == <span class="number">0</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span>  <span class="comment">// skip this task</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Scenario</strong>:</p><ul><li>A node holds no segments that need to be queried</li><li>The <code>skipEmpty</code> parameter decides whether a task is still created for it</li></ul><h2 id="完整示例"><a href="#完整示例" class="headerlink" title="完整示例"></a>Complete example</h2><h3 id="输入"><a href="#输入" class="headerlink" title="输入"></a>Input</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">sealed = [</span><br><span class="line">    &#123;NodeID: <span class="number">1</span>, Segments: [</span><br><span class="line">        &#123;SegmentID: <span class="number">101</span>, PartitionID: <span class="number">1</span>&#125;,</span><br><span class="line">        &#123;SegmentID: <span class="number">102</span>, PartitionID: <span class="number">1</span>&#125;,</span><br><span class="line">    
]&#125;,</span><br><span class="line">    &#123;NodeID: <span class="number">2</span>, Segments: [</span><br><span class="line">        &#123;SegmentID: <span class="number">103</span>, PartitionID: <span class="number">2</span>&#125;,</span><br><span class="line">    ]&#125;,</span><br><span class="line">]</span><br><span class="line"></span><br><span class="line">growing = [</span><br><span class="line">    &#123;SegmentID: <span class="number">201</span>, NodeID: <span class="number">0</span>, PartitionID: <span class="number">1</span>&#125;,</span><br><span class="line">    &#123;SegmentID: <span class="number">202</span>, NodeID: <span class="number">0</span>, PartitionID: <span class="number">2</span>&#125;,</span><br><span class="line">]</span><br><span class="line"></span><br><span class="line">currentNodeID = <span class="number">0</span></span><br></pre></td></tr></table></figure><h3 id="输出"><a href="#输出" class="headerlink" title="输出"></a>输出</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span 
class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line">tasks = [</span><br><span class="line">    &#123;</span><br><span class="line">        req: QueryRequest&#123;</span><br><span class="line">            SegmentIDs: [<span class="number">101</span>, <span class="number">102</span>],</span><br><span class="line">            Scope: DataScope_Historical,</span><br><span class="line">            TargetID: <span class="number">1</span>,</span><br><span class="line">            DmlChannels: [<span class="string">&quot;channel-1&quot;</span>],</span><br><span class="line">        &#125;,</span><br><span class="line">        targetID: <span class="number">1</span>,</span><br><span class="line">        worker: RemoteWorker&#123;nodeID: <span class="number">1</span>&#125;,</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">        req: QueryRequest&#123;</span><br><span class="line">            SegmentIDs: [<span class="number">103</span>],</span><br><span class="line">            Scope: DataScope_Historical,</span><br><span class="line">            TargetID: <span class="number">2</span>,</span><br><span class="line">            DmlChannels: [<span class="string">&quot;channel-1&quot;</span>],</span><br><span class="line">        &#125;,</span><br><span class="line">        targetID: <span class="number">2</span>,</span><br><span class="line">        worker: RemoteWorker&#123;nodeID: <span class="number">2</span>&#125;,</span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">        req: QueryRequest&#123;</span><br><span class="line">            SegmentIDs: [<span class="number">201</span>, <span class="number">202</span>],</span><br><span class="line">            Scope: DataScope_Streaming,</span><br><span class="line">            TargetID: <span class="number">0</span>,</span><br><span class="line">     
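       DmlChannels: [<span class="string">&quot;channel-1&quot;</span>],</span><br><span class="line">        &#125;,</span><br><span class="line">        targetID: <span class="number">0</span>,</span><br><span class="line">        worker: LocalWorker&#123;&#125;,</span><br><span class="line">    &#125;,</span><br><span class="line">]</span><br></pre></td></tr></table></figure><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p><code>organizeSubTask</code> is a carefully designed generic function with the following characteristics:</p><ol><li><strong>Highly abstract</strong>: generics and a callback let it support several request types</li><li><strong>Fault tolerant</strong>: tasks are still created when some workers are unavailable</li><li><strong>Performance-minded</strong>: batching, connection pooling, and a local fast path</li><li><strong>Separation of concerns</strong>: it only organizes tasks and leaves execution to other code</li><li><strong>Easy to extend</strong>: a new request type only needs a new modify function</li></ol><p>The function is a core part of the task-dispatch mechanism in the Milvus query system: it lets a query be split across multiple nodes and executed in parallel, efficiently and reliably.</p>]]></content>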
    
    
      
      
    <summary type="html">&lt;p&gt;This document takes a close look at the implementation of the &lt;code&gt;organizeSubTask&lt;/code&gt; function, the core function in the QueryNode delegator that organizes and dispatches subtasks.&lt;/p&gt;</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>A Detailed Analysis of the Milvus ShardDelegator.Query Method</title>
    <link href="https://szza.github.io/2025/08/10/Milvus/16_shard_delegator_query_analysis/"/>
    <id>https://szza.github.io/2025/08/10/Milvus/16_shard_delegator_query_analysis/</id>
    <published>2025-08-10T05:00:00.000Z</published>
    <updated>2026-01-06T13:10:49.529Z</updated>
    
    <content type="html"><![CDATA[<p>This document takes a close look at the implementation of the <code>shardDelegator.Query</code> method, the core method QueryNode uses to handle query requests.</p><h2 id="方法签名"><a href="#方法签名" class="headerlink" title="方法签名"></a>Method signature</h2><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(sd *shardDelegator)</span></span> Query(ctx context.Context, req *querypb.QueryRequest) ([]*internalpb.RetrieveResults, <span class="type">error</span>)</span><br></pre></td></tr></table></figure><p><strong>Location</strong>: <code>internal/querynodev2/delegator/delegator.go:577</code></p><h2 id="方法职责"><a href="#方法职责" class="headerlink" title="方法职责"></a>Responsibilities</h2><p><code>shardDelegator.Query</code> executes a query at the shard level: it coordinates the query across multiple segments and handles timestamp synchronization, segment management, task dispatch, and result collection.</p><h2 id="执行流程详解"><a href="#执行流程详解" class="headerlink" title="执行流程详解"></a>Execution flow in detail</h2><h3 id="1-生命周期管理-Lines-579-582"><a href="#1-生命周期管理-Lines-579-582" class="headerlink" title="1. 生命周期管理 (Lines 579-582)"></a>1. 
Lifecycle management (Lines 579-582)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> err := sd.lifetime.Add(sd.IsWorking); err != <span class="literal">nil</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">defer</span> sd.lifetime.Done()</span><br></pre></td></tr></table></figure><ul><li><strong>Purpose</strong>: make sure the delegator is in the working state</li><li><strong>Mechanism</strong>: a lifetime manager tracks the component's state</li><li><strong>Effect</strong>: no query runs while the delegator is closed or unavailable</li></ul><h3 id="2-Channel-验证-Lines-584-589"><a href="#2-Channel-验证-Lines-584-589" class="headerlink" title="2. Channel 验证 (Lines 584-589)"></a>2. Channel validation (Lines 584-589)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> !funcutil.SliceContain(req.GetDmlChannels(), sd.vchannelName) &#123;</span><br><span class="line">    log.Warn(<span class="string">&quot;delegator received query request not belongs to it&quot;</span>, ...)</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span>, fmt.Errorf(<span class="string">&quot;dml channel not match, ...&quot;</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><ul><li><strong>Purpose</strong>: verify that the requested channel belongs to this delegator</li><li><strong>Reason</strong>: each delegator is responsible for exactly one virtual channel</li><li><strong>Error handling</strong>: on a mismatch, an error is returned immediately</li></ul><h3 id="3-优化-Guarantee-Timestamp-Lines-591-597"><a 
href="#3-优化-Guarantee-Timestamp-Lines-591-597" class="headerlink" title="3. 优化 Guarantee Timestamp (Lines 591-597)"></a>3. 优化 Guarantee Timestamp (Lines 591-597)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">req.Req.GuaranteeTimestamp = sd.speedupGuranteeTS(</span><br><span class="line">    ctx,</span><br><span class="line">    req.Req.GetConsistencyLevel(),</span><br><span class="line">    req.Req.GetGuaranteeTimestamp(),</span><br><span class="line">    req.Req.GetMvccTimestamp(),</span><br><span class="line">    req.Req.GetIsIterator(),</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p><strong>speedupGuranteeTS 方法</strong> (line 918):</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(sd *shardDelegator)</span></span> speedupGuranteeTS(...) 
<span class="type">uint64</span> &#123;</span><br><span class="line">    <span class="comment">// 如果不是 Strong 一致性或已设置 mvccTS，直接返回原值</span></span><br><span class="line">    <span class="keyword">if</span> isIterator || cl != commonpb.ConsistencyLevel_Strong || mvccTS != <span class="number">0</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> guaranteeTS</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="comment">// 使用 WAL 的 MVCC timestamp 来加速 Strong 一致性查询</span></span><br><span class="line">    <span class="keyword">if</span> mvcc, err := streaming.WAL().Local().GetLatestMVCCTimestampIfLocal(ctx, sd.vchannelName); </span><br><span class="line">       err == <span class="literal">nil</span> &amp;&amp; mvcc &lt; guaranteeTS &#123;</span><br><span class="line">        <span class="keyword">return</span> mvcc</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> guaranteeTS</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><ul><li><strong>优化原理</strong>: 对于 Strong 一致性查询，如果 WAL 的 MVCC timestamp 小于 guarantee timestamp，可以使用更小的值来加速查询</li><li><strong>适用场景</strong>: <ul><li>一致性级别为 <code>Strong</code></li><li>不是 iterator 模式</li><li>未设置 mvccTS</li></ul></li><li><strong>效果</strong>: 减少等待时间，提高查询性能</li></ul><h3 id="4-等待-tSafe-Lines-599-621"><a href="#4-等待-tSafe-Lines-599-621" class="headerlink" title="4. 等待 tSafe (Lines 599-621)"></a>4. 
等待 tSafe (Lines 599-621)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">partialResultRequiredDataRatio := paramtable.Get().QueryNodeCfg.PartialResultRequiredDataRatio.GetAsFloat()</span><br><span class="line">waitTr := timerecord.NewTimeRecorder(<span class="string">&quot;wait tSafe&quot;</span>)</span><br><span class="line"><span class="keyword">var</span> tSafe <span class="type">uint64</span></span><br><span class="line"><span class="keyword">var</span> err <span class="type">error</span></span><br><span class="line"><span class="keyword">if</span> partialResultRequiredDataRatio &gt;= <span class="number">1.0</span> &#123;</span><br><span class="line">    <span class="comment">// 需要完整结果，必须等待 tSafe</span></span><br><span class="line">    tSafe, err = sd.waitTSafe(ctx, req.Req.GuaranteeTimestamp)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">if</span> req.GetReq().GetMvccTimestamp() == <span class="number">0</span> &#123;</span><br><span class="line">        req.Req.MvccTimestamp = tSafe</span><br><span class="line">    
&#125;</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line">    <span class="comment">// 允许部分结果，使用当前 tSafe 即可</span></span><br><span class="line">    <span class="keyword">if</span> req.GetReq().GetMvccTimestamp() == <span class="number">0</span> &#123;</span><br><span class="line">        req.Req.MvccTimestamp = sd.GetTSafe()</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>waitTSafe 方法</strong> (line 939):</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(sd 
*shardDelegator)</span></span> waitTSafe(ctx context.Context, ts <span class="type">uint64</span>) (<span class="type">uint64</span>, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// Already satisfied: return right away</span></span><br><span class="line">    latestTSafe := sd.latestTsafe.Load()</span><br><span class="line">    <span class="keyword">if</span> latestTSafe &gt;= ts &#123;</span><br><span class="line">        <span class="keyword">return</span> latestTSafe, <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Check whether downgrade mode is enabled</span></span><br><span class="line">    <span class="keyword">if</span> paramtable.Get().QueryNodeCfg.DowngradeTsafe.GetAsBool() &#123;</span><br><span class="line">        <span class="keyword">return</span> latestTSafe, <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Check whether the timestamp lag is too large; gt and st are the physical times parsed from ts and latestTSafe (their declarations are omitted from this excerpt)</span></span><br><span class="line">    lag := gt.Sub(st)</span><br><span class="line">    maxLag := paramtable.Get().QueryNodeCfg.MaxTimestampLag.GetAsDuration(time.Second)</span><br><span class="line">    <span class="keyword">if</span> lag &gt; maxLag &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">0</span>, WrapErrTsLagTooLarge(lag, maxLag)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Wait for tSafe to advance</span></span><br><span class="line">    ch := <span class="built_in">make</span>(<span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;)</span><br><span class="line">    <span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">        
sd.tsCond.L.Lock()</span><br><span class="line">        <span class="keyword">defer</span> sd.tsCond.L.Unlock()</span><br><span class="line">        <span class="keyword">for</span> sd.latestTsafe.Load() &lt; ts &amp;&amp; ctx.Err() == <span class="literal">nil</span> &amp;&amp; sd.Serviceable() &#123;</span><br><span class="line">            sd.tsCond.Wait()</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="built_in">close</span>(ch)</span><br><span class="line">    &#125;()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Wait for either cancellation/timeout or the completion signal</span></span><br><span class="line">    <span class="keyword">select</span> &#123;</span><br><span class="line">    <span class="keyword">case</span> &lt;-ctx.Done():</span><br><span class="line">        sd.tsCond.Broadcast()</span><br><span class="line">        <span class="keyword">return</span> <span class="number">0</span>, ctx.Err()</span><br><span class="line">    <span class="keyword">case</span> &lt;-ch:</span><br><span class="line">        <span class="keyword">return</span> sd.latestTsafe.Load(), <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key points</strong>:</p><ul><li><strong>tSafe (Time Safe)</strong>: the timestamp up to which data is safe to read</li><li><strong>Wait mechanism</strong>: a condition variable (<code>tsCond</code>) blocks until tSafe advances</li><li><strong>Partial-result mode</strong>: <ul><li><code>partialResultRequiredDataRatio &gt;= 1.0</code>: the query waits until tSafe reaches the guarantee timestamp, so the full data set is visible and strong consistency is preserved</li><li><code>partialResultRequiredDataRatio &lt; 1.0</code>: partial results are allowed, so the current tSafe is used without waiting, which lowers latency</li></ul></li><li><strong>Lag check and timeout</strong>: if the timestamp lag exceeds <code>MaxTimestampLag</code>, an error is returned instead of waiting; a cancelled or expired <code>ctx</code> ends the wait with <code>ctx.Err()</code></li><li><strong>Downgrade mode</strong>: when <code>DowngradeTsafe</code> is enabled, the current tSafe is returned immediately, without waiting</li></ul><h3 id="5-Pin-Readable-Segments-Lines-623-628"><a href="#5-Pin-Readable-Segments-Lines-623-628" class="headerlink" title="5. Pin Readable Segments (Lines 623-628)"></a>5. 
Pin Readable Segments (Lines 623-628)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">sealed, growing, sealedRowCount, version, err := sd.distribution.PinReadableSegments(</span><br><span class="line">    partialResultRequiredDataRatio, </span><br><span class="line">    req.GetReq().GetPartitionIDs()...)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br><span class="line"><span class="keyword">defer</span> sd.distribution.Unpin(version)</span><br></pre></td></tr></table></figure><p><strong>The PinReadableSegments method</strong> (distribution.go:162):</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span 
class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(d *distribution)</span></span> PinReadableSegments(requiredLoadRatio <span class="type">float64</span>, partitions ...<span class="type">int64</span>) </span><br><span class="line">    (sealed []SnapshotItem, growing []SegmentEntry, sealedRowCount <span class="keyword">map</span>[<span class="type">int64</span>]<span class="type">int64</span>, version <span class="type">int64</span>, err <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    requireFullResult := requiredLoadRatio &gt;= <span class="number">1.0</span></span><br><span class="line">    loadRatioSatisfy := d.queryView.GetLoadedRatio() &gt;= requiredLoadRatio</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">var</span> isServiceable <span class="type">bool</span></span><br><span class="line">    <span class="keyword">if</span> requireFullResult &#123;</span><br><span class="line">        isServiceable = 
d.queryView.Serviceable()</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        isServiceable = loadRatioSatisfy</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">if</span> !isServiceable &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, <span class="literal">nil</span>, <span class="literal">nil</span>, <span class="number">-1</span>, merr.WrapErrChannelNotAvailable(...)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Verify that every requested partition is loaded</span></span><br><span class="line">    <span class="keyword">for</span> _, partition := <span class="keyword">range</span> partitions &#123;</span><br><span class="line">        <span class="keyword">if</span> !current.partitions.Contain(partition) &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, <span class="literal">nil</span>, <span class="literal">nil</span>, <span class="number">-1</span>, merr.WrapErrPartitionNotLoaded(partition)</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Fetch the segments</span></span><br><span class="line">    sealed, growing = current.Get(partitions...)</span><br><span class="line">    version = current.version</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Filter segments according to the loaded ratio</span></span><br><span class="line">    <span class="keyword">if</span> d.queryView.GetLoadedRatio() == <span class="number">1.0</span> &#123;</span><br><span class="line">        <span class="comment">// Fully loaded: filter by target version</span></span><br><span class="line">        targetVersion := 
current.GetTargetVersion()</span><br><span class="line">        filterReadable := d.readableFilter(targetVersion)</span><br><span class="line">        sealed, growing = d.filterSegments(sealed, growing, filterReadable)</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        <span class="comment">// Partially loaded: return only the segments that are already loaded</span></span><br><span class="line">        sealed = lo.Map(sealed, <span class="function"><span class="keyword">func</span><span class="params">(item SnapshotItem, _ <span class="type">int</span>)</span></span> SnapshotItem &#123;</span><br><span class="line">            <span class="keyword">return</span> SnapshotItem&#123;</span><br><span class="line">                NodeID: item.NodeID,</span><br><span class="line">                Segments: lo.Filter(item.Segments, <span class="function"><span class="keyword">func</span><span class="params">(entry SegmentEntry, _ <span class="type">int</span>)</span></span> <span class="type">bool</span> &#123;</span><br><span class="line">                    <span class="keyword">return</span> d.queryView.sealedSegmentRowCount[entry.SegmentID] &gt; <span class="number">0</span></span><br><span class="line">                &#125;),</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;)</span><br><span class="line">        <span class="comment">// Growing segments are handled similarly</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Pin the snapshot so it cannot be released while in use</span></span><br><span class="line">    snapshot, _ := d.snapshots.GetOrInsert(version, <span class="function"><span class="keyword">func</span><span class="params">()</span></span> *snapshot &#123;</span><br><span class="line">        <span class="keyword">return</span> current.Clone()</span><br><span class="line">    &#125;)</span><br><span class="line">    snapshot.AddRef()</span><br><span 
class="line">    </span><br><span class="line">    <span class="keyword">return</span> sealed, growing, sealedRowCount, version, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key concepts</strong>:</p><ul><li><strong>Pin</strong>: increments the snapshot's reference count so that segments cannot be released while a query is running</li><li><strong>Snapshot</strong>: a point-in-time view of the segment distribution</li><li><strong>Sealed Segments</strong>: segments that have been sealed; their data no longer changes</li><li><strong>Growing Segments</strong>: segments that are still receiving data, so their contents may still change</li><li><strong>Serviceable</strong>: whether the channel can serve requests (all required segments are loaded)</li><li><strong>Load Ratio</strong>: the fraction of segments already loaded, used in partial result mode</li></ul><h3 id="6-处理-IgnoreGrowing-标志-Lines-630-632"><a href="#6-处理-IgnoreGrowing-标志-Lines-630-632" class="headerlink" title="6. Handle the IgnoreGrowing Flag (Lines 630-632)"></a>6. Handle the IgnoreGrowing Flag (Lines 630-632)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> req.Req.IgnoreGrowing &#123;</span><br><span class="line">    growing = []SegmentEntry&#123;&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><ul><li><strong>Purpose</strong>: if the request asks to ignore growing segments, the growing list is cleared</li><li><strong>Use case</strong>: some queries only need historical data and not the latest incremental data</li></ul><h3 id="7-Segment-剪枝-Lines-634-640"><a href="#7-Segment-剪枝-Lines-634-640" class="headerlink" title="7. Segment Pruning (Lines 634-640)"></a>7. 
Segment Pruning (Lines 634-640)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> paramtable.Get().QueryNodeCfg.EnableSegmentPrune.GetAsBool() &#123;</span><br><span class="line">    <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">        sd.partitionStatsMut.RLock()</span><br><span class="line">        <span class="keyword">defer</span> sd.partitionStatsMut.RUnlock()</span><br><span class="line">        PruneSegments(ctx, sd.partitionStats, <span class="literal">nil</span>, req.GetReq(), </span><br><span class="line">            sd.collection.Schema(), sealed, </span><br><span class="line">            PruneInfo&#123;paramtable.Get().QueryNodeCfg.DefaultSegmentFilterRatio.GetAsFloat()&#125;)</span><br><span class="line">    &#125;()</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><ul><li><strong>Purpose</strong>: prune irrelevant segments using statistics</li><li><strong>How it works</strong>: partition statistics (min&#x2F;max values) determine whether a segment could contain matching results</li><li><strong>Effect</strong>: fewer unnecessary segment queries, and therefore better performance</li><li><strong>Thread safety</strong>: a read lock guards partitionStats</li></ul><h3 id="8-组织子任务-Lines-648-652"><a href="#8-组织子任务-Lines-648-652" class="headerlink" title="8. Organize Subtasks (Lines 648-652)"></a>8. 
Organize Subtasks (Lines 648-652)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">tasks, err := organizeSubTask(ctx, req, sealed, growing, sd, <span class="literal">true</span>, sd.modifyQueryRequest)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>The organizeSubTask function</strong> (line 751):</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span 
class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">organizeSubTask</span>[<span class="title">T</span> <span class="title">any</span>]<span class="params">(ctx context.Context,</span></span></span><br><span class="line"><span class="params"><span class="function">    req T,</span></span></span><br><span class="line"><span class="params"><span class="function">    sealed []SnapshotItem,</span></span></span><br><span class="line"><span class="params"><span class="function">    growing []SegmentEntry,</span></span></span><br><span class="line"><span class="params"><span class="function">    sd *shardDelegator,</span></span></span><br><span class="line"><span class="params"><span class="function">    skipEmpty <span class="type">bool</span>,</span></span></span><br><span class="line"><span class="params"><span class="function">    modify <span class="keyword">func</span>(T, querypb.DataScope, []<span class="type">int64</span>, <span class="type">int64</span>)</span></span> T,</span><br><span class="line">) ([]subTask[T], <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    result := <span class="built_in">make</span>([]subTask[T], <span class="number">0</span>, <span class="built_in">len</span>(sealed)+<span class="number">1</span>)</span><br><span class="line">    </span><br><span class="line">    packSubTask := <span class="function"><span class="keyword">func</span><span class="params">(segments []SegmentEntry, workerID <span class="type">int64</span>, scope querypb.DataScope)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">   
     segmentIDs := lo.Map(segments, <span class="function"><span class="keyword">func</span><span class="params">(item SegmentEntry, _ <span class="type">int</span>)</span></span> <span class="type">int64</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> item.SegmentID</span><br><span class="line">        &#125;)</span><br><span class="line">        <span class="keyword">if</span> skipEmpty &amp;&amp; <span class="built_in">len</span>(segmentIDs) == <span class="number">0</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Rewrite the request: set scope, segmentIDs and targetID</span></span><br><span class="line">        req := modify(req, scope, segmentIDs, workerID)</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Look up the worker</span></span><br><span class="line">        worker, err := sd.workerManager.GetWorker(ctx, workerID)</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            log.Warn(<span class="string">&quot;failed to get worker for sub task&quot;</span>, ...)</span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        result = <span class="built_in">append</span>(result, subTask[T]&#123;</span><br><span class="line">            req:      req,</span><br><span class="line">            targetID: workerID,</span><br><span class="line">            worker:   worker,</span><br><span class="line">        &#125;)</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span 
class="comment">// Create one task per sealed snapshot item</span></span><br><span class="line">    <span class="keyword">for</span> _, entry := <span class="keyword">range</span> sealed &#123;</span><br><span class="line">        packSubTask(entry.Segments, entry.NodeID, querypb.DataScope_Historical)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Create the task for growing segments (local node)</span></span><br><span class="line">    packSubTask(growing, paramtable.GetNodeID(), querypb.DataScope_Streaming)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> result, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>The modifyQueryRequest method</strong> (line 292):</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(sd *shardDelegator)</span></span> modifyQueryRequest(req *querypb.QueryRequest, </span><br><span class="line">    scope querypb.DataScope, segmentIDs []<span class="type">int64</span>, targetID <span class="type">int64</span>) *querypb.QueryRequest &#123;</span><br><span class="line">    nodeReq := proto.Clone(req).(*querypb.QueryRequest)</span><br><span class="line">    nodeReq.Scope = scope                    <span class="comment">// Historical or Streaming</span></span><br><span class="line">    nodeReq.Req.Base.TargetID = targetID      <span class="comment">// target node ID</span></span><br><span class="line">    nodeReq.SegmentIDs = segmentIDs           <span 
class="comment">// segment IDs to query</span></span><br><span class="line">    nodeReq.DmlChannels = []<span class="type">string</span>&#123;sd.vchannelName&#125;</span><br><span class="line">    <span class="keyword">return</span> nodeReq</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key points</strong>:</p><ul><li><strong>Task grouping</strong>: segments are grouped by node, one task per node</li><li><strong>Scope</strong>: <ul><li><code>DataScope_Historical</code>: sealed segments</li><li><code>DataScope_Streaming</code>: growing segments</li></ul></li><li><strong>Worker management</strong>: worker clients are obtained through workerManager</li><li><strong>Fault tolerance</strong>: if the worker lookup fails, the task is still created (for partial result mode)</li></ul><h3 id="9-执行子任务-Lines-654-661"><a href="#9-执行子任务-Lines-654-661" class="headerlink" title="9. Execute Subtasks (Lines 654-661)"></a>9. Execute Subtasks (Lines 654-661)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">results, err := executeSubTasks(ctx, tasks, </span><br><span class="line">    NewRowCountBasedEvaluator(sealedRowCount), </span><br><span class="line">    <span class="function"><span class="keyword">func</span><span class="params">(ctx context.Context, req *querypb.QueryRequest, worker cluster.Worker)</span></span> (*internalpb.RetrieveResults, <span class="type">error</span>) &#123;</span><br><span class="line">        resp, err := worker.QuerySegments(ctx, req)</span><br><span class="line">        status, ok := status.FromError(err)</span><br><span class="line">        <span class="keyword">if</span> ok &amp;&amp; status.Code() == codes.Unavailable &#123;</span><br><span 
class="line">            sd.markSegmentOffline(req.GetSegmentIDs()...)</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> resp, err</span><br><span class="line">    &#125;, </span><br><span class="line">    <span class="string">&quot;Query&quot;</span>, log)</span><br></pre></td></tr></table></figure><p><strong>The executeSubTasks function</strong> (line 802):</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span 
class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">executeSubTasks</span>[<span class="title">T</span> <span class="title">any</span>, <span class="title">R</span> 
<span class="title">interface</span></span>&#123; GetStatus() *commonpb.Status &#125;](</span><br><span class="line">    ctx context.Context, </span><br><span class="line">    tasks []subTask[T], </span><br><span class="line">    evaluator PartialResultEvaluator, </span><br><span class="line">    execute <span class="function"><span class="keyword">func</span><span class="params">(context.Context, T, cluster.Worker)</span></span> (R, <span class="type">error</span>), </span><br><span class="line">    taskType <span class="type">string</span>, </span><br><span class="line">    log *log.MLogger,</span><br><span class="line">) ([]R, <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    ctx, cancel := context.WithCancel(ctx)</span><br><span class="line">    <span class="keyword">defer</span> cancel()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 确定部分结果要求</span></span><br><span class="line">    <span class="keyword">var</span> partialResultRequiredDataRatio <span class="type">float64</span></span><br><span class="line">    <span class="keyword">if</span> taskType == <span class="string">&quot;Query&quot;</span> || taskType == <span class="string">&quot;Search&quot;</span> &#123;</span><br><span class="line">        partialResultRequiredDataRatio = paramtable.Get().QueryNodeCfg.PartialResultRequiredDataRatio.GetAsFloat()</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        partialResultRequiredDataRatio = <span class="number">1.0</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 并发执行所有任务</span></span><br><span class="line">    wg, ctx := errgroup.WithContext(ctx)</span><br><span class="line">    resultCh := <span class="built_in">make</span>(<span class="keyword">chan</span> channelResult, <span 
class="built_in">len</span>(tasks))</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> _, task := <span class="keyword">range</span> tasks &#123;</span><br><span class="line">        task := task <span class="comment">// capture loop variable</span></span><br><span class="line">        wg.Go(<span class="function"><span class="keyword">func</span><span class="params">()</span></span> <span class="type">error</span> &#123;</span><br><span class="line">            <span class="keyword">var</span> result R</span><br><span class="line">            <span class="keyword">var</span> err <span class="type">error</span></span><br><span class="line">            </span><br><span class="line">            <span class="keyword">if</span> task.targetID == <span class="number">-1</span> || task.worker == <span class="literal">nil</span> &#123;</span><br><span class="line">                <span class="comment">// Worker unavailable</span></span><br><span class="line">                err = fmt.Errorf(<span class="string">&quot;segments not loaded in any worker: %v&quot;</span>, ...)</span><br><span class="line">            &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">                <span class="comment">// Execute the task</span></span><br><span class="line">                result, err = execute(ctx, task.req, task.worker)</span><br><span class="line">                <span class="keyword">if</span> result.GetStatus().GetErrorCode() != commonpb.ErrorCode_Success &#123;</span><br><span class="line">                    err = fmt.Errorf(<span class="string">&quot;worker(%d) query failed: %s&quot;</span>, task.targetID, result.GetStatus().GetReason())</span><br><span class="line">                &#125;</span><br><span class="line">            &#125;</span><br><span class="line">            </span><br><span class="line">            <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span 
class="line">                log.Warn(<span class="string">&quot;failed to execute sub task&quot;</span>, ...)</span><br><span class="line">                <span class="comment">// Fail fast when partial results are disabled</span></span><br><span class="line">                <span class="keyword">if</span> partialResultRequiredDataRatio == <span class="number">1</span> &#123;</span><br><span class="line">                    <span class="keyword">return</span> err</span><br><span class="line">                &#125;</span><br><span class="line">            &#125;</span><br><span class="line">            </span><br><span class="line">            <span class="comment">// Send the result to the channel</span></span><br><span class="line">            resultCh &lt;- channelResult&#123;</span><br><span class="line">                nodeID:   task.targetID,</span><br><span class="line">                result:   result,</span><br><span class="line">                err:      err,</span><br><span class="line">                segments: task.req.GetSegmentIDs(),</span><br><span class="line">            &#125;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">        &#125;)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Wait for all tasks to finish</span></span><br><span class="line">    <span class="keyword">if</span> err := wg.Wait(); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="built_in">close</span>(resultCh)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Collect results</span></span><br><span class="line">    successSegmentList := typeutil.NewSet[<span class="type">int64</span>]()</span><br><span class="line">    failureSegmentList := <span 
class="built_in">make</span>([]<span class="type">int64</span>, <span class="number">0</span>)</span><br><span class="line">    <span class="keyword">var</span> errors []<span class="type">error</span></span><br><span class="line">    results := <span class="built_in">make</span>([]R, <span class="number">0</span>, <span class="built_in">len</span>(tasks))</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> item := <span class="keyword">range</span> resultCh &#123;</span><br><span class="line">        <span class="keyword">if</span> item.err == <span class="literal">nil</span> &#123;</span><br><span class="line">            successSegmentList.Insert(item.segments...)</span><br><span class="line">            results = <span class="built_in">append</span>(results, item.result)</span><br><span class="line">        &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">            failureSegmentList = <span class="built_in">append</span>(failureSegmentList, item.segments...)</span><br><span class="line">            errors = <span class="built_in">append</span>(errors, item.err)</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Everything succeeded: return directly</span></span><br><span class="line">    <span class="keyword">if</span> <span class="built_in">len</span>(errors) == <span class="number">0</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> results, <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Ask the evaluator whether partial results may be returned</span></span><br><span class="line">    <span class="keyword">if</span> evaluator != <span class="literal">nil</span> &#123;</span><br><span class="line">        shouldReturnPartial, accessedDataRatio := 
evaluator(</span><br><span class="line">            taskType, successSegmentList, failureSegmentList, errors)</span><br><span class="line">        <span class="keyword">if</span> shouldReturnPartial &#123;</span><br><span class="line">            log.Info(<span class="string">&quot;partial result executed successfully&quot;</span>,</span><br><span class="line">                zap.Float64(<span class="string">&quot;accessedDataRatio&quot;</span>, accessedDataRatio),</span><br><span class="line">                zap.Int64s(<span class="string">&quot;failureSegmentList&quot;</span>, failureSegmentList),</span><br><span class="line">            )</span><br><span class="line">            <span class="keyword">return</span> results, <span class="literal">nil</span></span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span>, merr.Combine(errors...)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key mechanisms</strong>:</p><ul><li><strong>Concurrent execution</strong>: all subtasks run concurrently via <code>errgroup</code></li><li><strong>Result collection</strong>: results are gathered through a buffered channel</li><li><strong>Error handling</strong>: <ul><li>If <code>partialResultRequiredDataRatio == 1.0</code>, any failure fails the whole request</li><li>Otherwise, the evaluator decides whether partial results can be returned</li></ul></li><li><strong>Segment marking</strong>: if a worker is unavailable, its segments are marked offline</li><li><strong>Partial Result Evaluator</strong>: decides whether to return partial results based on the ratio of data that was accessed successfully</li></ul><p><strong>RowCountBasedEvaluator</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> RowCountBasedEvaluator <span class="keyword">struct</span> &#123;</span><br><span class="line">    sealedRowCount <span class="keyword">map</span>[<span class="type">int64</span>]<span class="type">int64</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(e *RowCountBasedEvaluator)</span></span> Evaluate(</span><br><span class="line">    taskType <span class="type">string</span>,</span><br><span class="line">    successSegments typeutil.Set[<span class="type">int64</span>],</span><br><span class="line">    failureSegments []<span class="type">int64</span>,</span><br><span class="line">    errors []<span class="type">error</span>,</span><br><span class="line">) (shouldReturn <span class="type">bool</span>, accessedDataRatio <span class="type">float64</span>) &#123;</span><br><span class="line">    <span class="comment">// Count the rows accessed successfully (int64 to match the map values)</span></span><br><span class="line">    <span class="keyword">var</span> successRows <span class="type">int64</span></span><br><span class="line">    <span class="keyword">for</span> segID := <span class="keyword">range</span> successSegments &#123;</span><br><span class="line">        successRows += 
e.sealedRowCount[segID]</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Count the total rows (int64 to match the map values)</span></span><br><span class="line">    <span class="keyword">var</span> totalRows <span class="type">int64</span></span><br><span class="line">    <span class="keyword">for</span> _, count := <span class="keyword">range</span> e.sealedRowCount &#123;</span><br><span class="line">        totalRows += count</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    accessedDataRatio = <span class="type">float64</span>(successRows) / <span class="type">float64</span>(totalRows)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Check whether the partial-result requirement is met</span></span><br><span class="line">    requiredRatio := paramtable.Get().QueryNodeCfg.PartialResultRequiredDataRatio.GetAsFloat()</span><br><span class="line">    shouldReturn = accessedDataRatio &gt;= requiredRatio</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> shouldReturn, accessedDataRatio</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="10-返回结果-Lines-667-685"><a href="#10-返回结果-Lines-667-685" class="headerlink" title="10. 返回结果 (Lines 667-685)"></a>10. 
Returning Results (Lines 667-685)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">log.Debug(<span class="string">&quot;Delegator Query done&quot;</span>)</span><br><span class="line"><span class="keyword">if</span> log.Core().Enabled(zap.DebugLevel) &#123;</span><br><span class="line">    <span class="comment">// Log the queried segment IDs</span></span><br><span class="line">    sealedIDs := lo.FlatMap(sealed, <span class="function"><span class="keyword">func</span><span class="params">(item SnapshotItem, _ <span class="type">int</span>)</span></span> []<span class="type">int64</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> lo.Map(item.Segments, <span class="function"><span class="keyword">func</span><span class="params">(segment SegmentEntry, _ <span class="type">int</span>)</span></span> <span class="type">int64</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> segment.SegmentID</span><br><span class="line">        &#125;)</span><br><span class="line">    &#125;)</span><br><span class="line">    growingIDs := lo.Map(growing, <span class="function"><span class="keyword">func</span><span class="params">(item SegmentEntry, _ <span class="type">int</span>)</span></span> <span class="type">int64</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> 
item.SegmentID</span><br><span class="line">    &#125;)</span><br><span class="line">    log.Debug(<span class="string">&quot;execute count on segments...&quot;</span>,</span><br><span class="line">        zap.Int64s(<span class="string">&quot;sealedIDs&quot;</span>, sealedIDs),</span><br><span class="line">        zap.Int64s(<span class="string">&quot;growingIDs&quot;</span>, growingIDs),</span><br><span class="line">    )</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">return</span> results, <span class="literal">nil</span></span><br></pre></td></tr></table></figure><ul><li><strong>Result</strong>: returns the list of all successful query results</li><li><strong>Logging</strong>: logs the queried segment IDs when debug logging is enabled</li></ul><h2 id="关键数据结构"><a href="#关键数据结构" class="headerlink" title="关键数据结构"></a>Key Data Structures</h2><h3 id="subTask"><a href="#subTask" class="headerlink" title="subTask"></a>subTask</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> subTask[T any] <span class="keyword">struct</span> &#123;</span><br><span class="line">    req      T              <span class="comment">// query request</span></span><br><span class="line">    targetID <span class="type">int64</span>          <span class="comment">// target node ID</span></span><br><span class="line">    worker   cluster.Worker <span class="comment">// worker client</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="SnapshotItem"><a href="#SnapshotItem" class="headerlink" title="SnapshotItem"></a>SnapshotItem</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td 
class="code"><pre><span class="line"><span class="keyword">type</span> SnapshotItem <span class="keyword">struct</span> &#123;</span><br><span class="line">    NodeID   <span class="type">int64</span>          <span class="comment">// node ID</span></span><br><span class="line">    Segments []SegmentEntry <span class="comment">// list of segments</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="SegmentEntry"><a href="#SegmentEntry" class="headerlink" title="SegmentEntry"></a>SegmentEntry</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> SegmentEntry <span class="keyword">struct</span> &#123;</span><br><span class="line">    SegmentID   <span class="type">int64</span></span><br><span class="line">    NodeID      <span class="type">int64</span></span><br><span class="line">    PartitionID <span class="type">int64</span></span><br><span class="line">    Version     <span class="type">int64</span></span><br><span class="line">    Offline     <span class="type">bool</span></span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="性能优化点"><a href="#性能优化点" class="headerlink" title="性能优化点"></a>Performance Optimizations</h2><ol><li><strong>Guarantee timestamp optimization</strong>: uses the WAL MVCC timestamp to speed up Strong consistency queries</li><li><strong>Partial result mode</strong>: allows partial results to be returned before the data is fully loaded, which lowers latency</li><li><strong>Segment pruning</strong>: skips irrelevant segments based on statistics</li><li><strong>Concurrent execution</strong>: queries on multiple segments run concurrently</li><li><strong>Pin mechanism</strong>: prevents segments from being released while a query is running</li></ol><h2 id="错误处理"><a href="#错误处理" class="headerlink" 
title="错误处理"></a>Error Handling</h2><ol><li><strong>Delegator unavailable</strong>: the lifecycle check fails</li><li><strong>Channel mismatch</strong>: the requested channel does not belong to the current delegator</li><li><strong>tSafe wait timeout</strong>: the timestamp lag is too large</li><li><strong>Distribution not serviceable</strong>: segments are not fully loaded and partial results are not allowed</li><li><strong>Worker unavailable</strong>: the node is offline; the partial-result policy decides whether the request fails</li><li><strong>Query failure</strong>: a segment query fails; the evaluator decides whether to return partial results</li></ol><h2 id="配置参数"><a href="#配置参数" class="headerlink" title="配置参数"></a>Configuration Parameters</h2><ul><li><code>QueryNodeCfg.PartialResultRequiredDataRatio</code>: ratio of data required for a partial result (default 1.0)</li><li><code>QueryNodeCfg.EnableSegmentPrune</code>: whether segment pruning is enabled</li><li><code>QueryNodeCfg.DefaultSegmentFilterRatio</code>: default segment filter ratio</li><li><code>QueryNodeCfg.DowngradeTsafe</code>: whether to downgrade tSafe (skip waiting)</li><li><code>QueryNodeCfg.MaxTimestampLag</code>: maximum timestamp lag</li></ul><h2 id="调用链"><a href="#调用链" class="headerlink" title="调用链"></a>Call Chain</h2><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">QueryNode.Query()</span><br><span class="line">    ↓</span><br><span class="line">QueryNode.queryChannel()</span><br><span class="line">    ↓</span><br><span class="line">shardDelegator.Query()  ← the method analyzed in this document</span><br><span class="line">    ↓</span><br><span class="line">organizeSubTask()</span><br><span class="line">    ↓</span><br><span class="line">executeSubTasks()</span><br><span class="line">    ↓</span><br><span class="line">worker.QuerySegments()</span><br><span class="line">    ↓</span><br><span class="line">Query the actual segment 
data</span><br></pre></td></tr></table></figure><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p><code>shardDelegator.Query</code> is a complex coordination method. It:</p><ol><li><strong>Manages timestamps</strong>: optimizes the guarantee timestamp and waits for tSafe</li><li><strong>Manages segments</strong>: pins segments so they cannot be released</li><li><strong>Optimizes queries</strong>: prunes segments to skip irrelevant data</li><li><strong>Dispatches tasks</strong>: distributes query tasks to different nodes</li><li><strong>Executes concurrently</strong>: queries multiple segments concurrently</li><li><strong>Tolerates faults</strong>: supports partial results to improve availability</li></ol><p>The method is designed to balance performance, consistency, and availability, and it is one of the core components of the Milvus query system.</p>]]></content>
    
    
      
      
    <summary type="html">&lt;p&gt;This document analyzes the implementation of the &lt;code&gt;shardDelegator.Query&lt;/code&gt; method in depth; it is the core method in QueryNode for handling query requests.&lt;/p&gt;</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Milvus Query Request Processing Flow Analysis</title>
    <link href="https://szza.github.io/2025/08/10/Milvus/15_query_request_flow_analysis/"/>
    <id>https://szza.github.io/2025/08/10/Milvus/15_query_request_flow_analysis/</id>
    <published>2025-08-10T04:00:00.000Z</published>
    <updated>2026-01-06T13:10:49.203Z</updated>
    
    <content type="html"><![CDATA[<p>This document analyzes in detail the complete processing flow of a Query Request in Milvus, from receipt to response.</p><h2 id="1-请求入口层"><a href="#1-请求入口层" class="headerlink" title="1. 请求入口层"></a>1. Request Entry Layer</h2><h3 id="1-1-HTTP-接口入口"><a href="#1-1-HTTP-接口入口" class="headerlink" title="1.1 HTTP 接口入口"></a>1.1 HTTP Entry Point</h3><p><strong>Files</strong>: <code>internal/distributed/proxy/httpserver/handler_v1.go</code>, <code>handler_v2.go</code></p><ul><li><p><strong>V1 API</strong>: <code>HandlersV1.query()</code> (line 496)</p><ul><li>Receives the HTTP JSON request</li><li>Parses the parameters: <code>collectionName</code>, <code>filter</code>, <code>outputFields</code>, <code>limit</code>, <code>offset</code></li><li>Converts them into a <code>milvuspb.QueryRequest</code></li><li>Calls <code>proxy.Query()</code></li></ul></li><li><p><strong>V2 API</strong>: <code>HandlersV2.query()</code> (line 912)</p><ul><li>Supports more parameters: <code>partitionNames</code>, <code>consistencyLevel</code>, <code>exprParams</code></li><li>Handles time zone conversion and expression templates</li></ul></li></ul><h3 id="1-2-gRPC-接口入口"><a href="#1-2-gRPC-接口入口" class="headerlink" title="1.2 gRPC 接口入口"></a>1.2 gRPC Entry Point</h3><p><strong>File</strong>: <code>internal/distributed/proxy/httpserver/handler.go</code></p><ul><li><code>handleQuery()</code> (line 384)<ul><li>Receives a <code>milvuspb.QueryRequest</code> directly</li><li>Calls <code>proxy.Query()</code></li></ul></li></ul><h2 id="2-Proxy-层处理"><a href="#2-Proxy-层处理" class="headerlink" title="2. Proxy 层处理"></a>2. 
Proxy Layer Processing</h2><h3 id="2-1-Query-方法入口"><a href="#2-1-Query-方法入口" class="headerlink" title="2.1 Query 方法入口"></a>2.1 Query Method Entry</h3><p><strong>File</strong>: <code>internal/proxy/impl.go</code></p><p><strong>Method</strong>: <code>Proxy.Query()</code> (line 3554)</p><p><strong>Main steps</strong>:</p><ol><li>Create the <code>queryTask</code> object</li><li>Initialize the base fields of the <code>RetrieveRequest</code></li><li>Record metrics</li><li>Call the <code>query()</code> method to run the actual query</li></ol><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">qt := &amp;queryTask&#123;</span><br><span class="line">    ctx:       ctx,</span><br><span class="line">    Condition: NewTaskCondition(ctx),</span><br><span class="line">    RetrieveRequest: &amp;internalpb.RetrieveRequest&#123;...&#125;,</span><br><span class="line">    request:   request,</span><br><span class="line">    ...</span><br><span class="line">&#125;</span><br><span class="line">res, storageCost, err := node.query(ctx, qt, sp)</span><br></pre></td></tr></table></figure><h3 id="2-2-query-方法执行"><a href="#2-2-query-方法执行" class="headerlink" title="2.2 query 方法执行"></a>2.2 query Method Execution</h3><p><strong>File</strong>: <code>internal/proxy/impl.go</code></p><p><strong>Method</strong>: <code>Proxy.query()</code> (line 3447)</p><p><strong>Main steps</strong>:</p><ol><li>Health check</li><li>Record logs and tracing information</li><li><strong>Enqueue the task</strong>: <code>node.sched.dqQueue.Enqueue(qt)</code></li><li><strong>Wait for the task to finish</strong>: <code>qt.WaitToFinish()</code></li><li>Record slow queries and metrics</li></ol><h3 id="2-3-QueryTask-执行流程"><a href="#2-3-QueryTask-执行流程" class="headerlink" title="2.3 QueryTask 执行流程"></a>2.3 QueryTask Execution Flow</h3><p><strong>File</strong>: <code>internal/proxy/task_query.go</code></p><p>QueryTask 
follows the standard task execution pattern: <strong>PreExecute → Execute → PostExecute</strong></p><h4 id="2-3-1-PreExecute-阶段"><a href="#2-3-1-PreExecute-阶段" class="headerlink" title="2.3.1 PreExecute 阶段"></a>2.3.1 PreExecute Phase</h4><p><strong>Method</strong>: <code>queryTask.PreExecute()</code> (line 369)</p><p><strong>Main responsibilities</strong>:</p><ol><li><p><strong>Validate the request and fetch metadata</strong></p><ul><li>Validate the collection name</li><li>Fetch the collection ID and schema</li><li>Validate the partition names</li><li>Check the partition key mode</li></ul></li><li><p><strong>Parse the query parameters</strong></p><ul><li>Parse <code>limit</code>, <code>offset</code>, <code>timezone</code>, <code>extractTimeFields</code></li><li>Handle iterator mode</li><li>Validate the maximum query window</li></ul></li><li><p><strong>Create the query plan</strong></p><ul><li>Call <code>createPlanArgs()</code> (line 299)</li><li>Parse the expression: <code>planparserv2.CreateRetrievePlanArgs()</code></li><li>Handle the special logic for count(*)</li><li>Convert output field names to field IDs</li></ul></li><li><p><strong>Set the timestamps</strong></p><ul><li>Compute <code>guaranteeTimestamp</code> from the consistency level</li><li>Handle <code>mvccTimestamp</code></li><li>Handle the collection TTL</li><li>Set the timeout timestamp</li></ul></li><li><p><strong>Serialize the query plan</strong></p><ul><li>Serialize the PlanNode into <code>SerializedExprPlan</code></li></ul></li></ol><h4 id="2-3-2-Execute-阶段"><a href="#2-3-2-Execute-阶段" class="headerlink" title="2.3.2 Execute 阶段"></a>2.3.2 Execute Phase</h4><p><strong>Method</strong>: <code>queryTask.Execute()</code> (line 587)</p><p><strong>Main responsibilities</strong>:</p><ol><li><p><strong>Initialize the result buffer</strong></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">t.resultBuf = typeutil.NewConcurrentSet[*internalpb.RetrieveResults]()</span><br></pre></td></tr></table></figure></li><li><p><strong>Execute through the load balancer</strong></p><ul><li>Call <code>t.lb.Execute()</code> (line 595)</li><li>Call the <code>queryShard()</code> method for each shard</li><li>Use <code>shardclient.CollectionWorkLoad</code> to manage the workload</li></ul></li><li><p><strong>queryShard method</strong> (line 704)</p><ul><li>Build the 
<code>querypb.QueryRequest</code></li><li>Set the target node ID</li><li>Handle the MVCC timestamp override</li><li>Call <code>qn.Query()</code> to send the request to the QueryNode</li><li>Collect the results into <code>resultBuf</code></li></ul></li></ol><h4 id="2-3-3-PostExecute-阶段"><a href="#2-3-3-PostExecute-阶段" class="headerlink" title="2.3.3 PostExecute 阶段"></a>2.3.3 PostExecute Phase</h4><p><strong>Method</strong>: <code>queryTask.PostExecute()</code> (line 611)</p><p><strong>Main responsibilities</strong>:</p><ol><li><p><strong>Collect all results</strong></p><ul><li>Collect every <code>RetrieveResults</code> from <code>resultBuf</code></li><li>Tally the total number of queried rows and the data size</li></ul></li><li><p><strong>Merge the results (Reduce)</strong></p><ul><li>Create the reducer: <code>createMilvusReducer()</code></li><li>Call <code>reducer.Reduce()</code> to merge the results from multiple shards</li><li>Apply limit and offset</li><li>Handle sorting</li></ul></li><li><p><strong>Post-process the results</strong></p><ul><li>Validate geometry fields</li><li>Rebuild struct field data</li><li>Convert time zones (timestamp to ISO string)</li><li>Extract time fields (if specified)</li></ul></li><li><p><strong>Set the final result</strong></p><ul><li>Set the output field names</li><li>Set the primary key field name</li><li>Handle the iterator session timestamp</li></ul></li></ol><h2 id="3-QueryNode-层处理"><a href="#3-QueryNode-层处理" class="headerlink" title="3. QueryNode 层处理"></a>3. 
QueryNode Layer Processing</h2><h3 id="3-1-Query-方法入口"><a href="#3-1-Query-方法入口" class="headerlink" title="3.1 Query 方法入口"></a>3.1 Query Method Entry</h3><p><strong>File</strong>: <code>internal/querynodev2/services.go</code></p><p><strong>Method</strong>: <code>QueryNode.Query()</code> (line 972)</p><p><strong>Main steps</strong>:</p><ol><li><p><strong>Health check</strong></p><ul><li>Check the node lifecycle state</li><li>Check whether the collection is loaded</li></ul></li><li><p><strong>Process multiple channels concurrently</strong></p><ul><li>Start a separate goroutine for each DML channel</li><li>Call <code>queryChannel()</code> to handle each channel</li><li>Use <code>errgroup</code> to manage the concurrency</li></ul></li><li><p><strong>Merge the results</strong></p><ul><li>Collect the results from all channels</li><li>Merge them with <code>segments.CreateInternalReducer</code></li><li>Return the final result</li></ul></li></ol><h3 id="3-2-queryChannel-方法"><a href="#3-2-queryChannel-方法" class="headerlink" title="3.2 queryChannel 方法"></a>3.2 queryChannel Method</h3><p><strong>File</strong>: <code>internal/querynodev2/handlers.go</code></p><p><strong>Method</strong>: <code>QueryNode.queryChannel()</code> (line 218)</p><p><strong>Main steps</strong>:</p><ol><li><p><strong>Get the shard delegator</strong></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sd, ok := node.delegators.Get(channel)</span><br></pre></td></tr></table></figure></li><li><p><strong>Run the query through the delegator</strong></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">results, err := sd.Query(queryCtx, req)</span><br></pre></td></tr></table></figure></li><li><p><strong>Merge the results</strong></p><ul><li>Merge the results with <code>segments.ReduceRetrieveResults()</code></li><li>Handle the collection reference count</li></ul></li></ol><h3 id="3-3-Delegator-Query-方法"><a href="#3-3-Delegator-Query-方法" class="headerlink" title="3.3 Delegator Query 方法"></a>3.3 Delegator Query Method</h3><p><strong>File</strong>: 
<code>internal/querynodev2/delegator/delegator.go</code></p><p><strong>Method</strong>: <code>shardDelegator.Query()</code> (line 577)</p><p><strong>Main steps</strong>:</p><ol><li><p><strong>Validate the request</strong></p><ul><li>Check that the channel matches</li><li>Check that the delegator is available</li></ul></li><li><p><strong>Handle timestamps</strong></p><ul><li>Call <code>speedupGuranteeTS()</code> to optimize the guarantee timestamp</li><li>Wait for tSafe (depending on the <code>partialResultRequiredDataRatio</code> setting)</li><li>Set the MVCC timestamp</li></ul></li><li><p><strong>Get the readable segments</strong></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sealed, growing, sealedRowCount, version, err := </span><br><span class="line">    sd.distribution.PinReadableSegments(partialResultRequiredDataRatio, ...)</span><br></pre></td></tr></table></figure><ul><li>Obtains the sealed segments and growing segments</li><li>Pinning prevents the segments from being released</li></ul></li><li><p><strong>Segment pruning</strong> (optional)</p><ul><li>Applies only when <code>EnableSegmentPrune</code> is enabled</li><li>Prunes irrelevant segments based on statistics</li></ul></li><li><p><strong>Organize the subtasks</strong></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">tasks, err := organizeSubTask(ctx, req, sealed, growing, sd, ...)</span><br></pre></td></tr></table></figure><ul><li>Groups the segments into query tasks</li><li>Assigns the tasks to different workers</li></ul></li><li><p><strong>Execute the subtasks</strong></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">results, err := executeSubTasks(ctx, tasks, evaluator, </span><br><span class="line">    <span class="function"><span class="keyword">func</span><span class="params">(ctx, req, worker)</span></span> &#123; worker.QuerySegments(ctx, req) &#125;, ...)</span><br></pre></td></tr></table></figure><ul><li>Runs all subtasks concurrently</li><li>Use 
<code>RowCountBasedEvaluator</code> to evaluate partial results</li><li>Handles failures (partial result mode)</li></ul></li><li><p><strong>Return the results</strong></p><ul><li>Returns all successful query results</li><li>Unpin segments</li></ul></li></ol><h2 id="4-数据流图"><a href="#4-数据流图" class="headerlink" title="4. 数据流图"></a>4. Data Flow Diagram</h2><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line">Client Request (HTTP/gRPC)</span><br><span class="line">    ↓</span><br><span class="line">Proxy Handler (handler.go/handler_v1.go/handler_v2.go)</span><br><span class="line">    ↓</span><br><span class="line">Proxy.Query() [impl.go:3554]</span><br><span class="line">    ↓</span><br><span class="line">Proxy.query() 
[impl.go:3447]</span><br><span class="line">    ↓</span><br><span class="line">Enqueue to dqQueue</span><br><span class="line">    ↓</span><br><span class="line">QueryTask.PreExecute() [task_query.go:369]</span><br><span class="line">    ├─ Validate the request</span><br><span class="line">    ├─ Fetch metadata</span><br><span class="line">    ├─ Create the query plan</span><br><span class="line">    └─ Set the timestamps</span><br><span class="line">    ↓</span><br><span class="line">QueryTask.Execute() [task_query.go:587]</span><br><span class="line">    ├─ Load Balancer selects a node</span><br><span class="line">    └─ queryShard() [task_query.go:704]</span><br><span class="line">        ↓</span><br><span class="line">        QueryNode.Query() [services.go:972]</span><br><span class="line">            ├─ Process multiple channels concurrently</span><br><span class="line">            └─ queryChannel() [handlers.go:218]</span><br><span class="line">                ↓</span><br><span class="line">                Delegator.Query() [delegator.go:577]</span><br><span class="line">                    ├─ Wait for tSafe</span><br><span class="line">                    ├─ Pin Readable Segments</span><br><span class="line">                    ├─ Organize the subtasks</span><br><span class="line">                    └─ executeSubTasks()</span><br><span class="line">                        ↓</span><br><span class="line">                        Worker.QuerySegments()</span><br><span class="line">                            ↓</span><br><span class="line">                            Query the actual segment data</span><br><span class="line">    ↓</span><br><span class="line">QueryTask.PostExecute() [task_query.go:611]</span><br><span class="line">    ├─ Collect the results</span><br><span class="line">    ├─ Reduce (merge the results)</span><br><span class="line">    └─ Post-process (time zone conversion, etc.)</span><br><span class="line">    ↓</span><br><span class="line">Return the result to the Client</span><br></pre></td></tr></table></figure><h2 id="5-关键组件说明"><a href="#5-关键组件说明" class="headerlink" title="5. 关键组件说明"></a>5. 
Key Components</h2><h3 id="5-1-Load-Balancer"><a href="#5-1-Load-Balancer" class="headerlink" title="5.1 Load Balancer"></a>5.1 Load Balancer</h3><p><strong>Location</strong>: <code>internal/proxy/shardclient</code></p><ul><li>Selects the QueryNode to send each request to</li><li>Supports multiple load-balancing policies</li><li>Manages shard client connections</li></ul><h3 id="5-2-Query-Plan-Parser"><a href="#5-2-Query-Plan-Parser" class="headerlink" title="5.2 Query Plan Parser"></a>5.2 Query Plan Parser</h3><p><strong>Location</strong>: <code>internal/parser/planparserv2</code></p><ul><li>Parses query expressions (for example <code>id &gt; 100</code>)</li><li>Generates the execution plan (PlanNode)</li><li>Optimizes the query plan</li></ul><h3 id="5-3-Reducer"><a href="#5-3-Reducer" class="headerlink" title="5.3 Reducer"></a>5.3 Reducer</h3><p><strong>Location</strong>: <code>internal/util/reduce</code></p><ul><li>Merges the query results from multiple shards</li><li>Applies limit and offset</li><li>Handles sorting and deduplication</li></ul><h3 id="5-4-Delegator"><a href="#5-4-Delegator" class="headerlink" title="5.4 Delegator"></a>5.4 Delegator</h3><p><strong>Location</strong>: <code>internal/querynodev2/delegator</code></p><ul><li>Manages the shard-level query logic</li><li>Organizes segment query tasks</li><li>Handles partial results and fault tolerance</li></ul><h3 id="5-5-Segment-Manager"><a href="#5-5-Segment-Manager" class="headerlink" title="5.5 Segment Manager"></a>5.5 Segment Manager</h3><p><strong>Location</strong>: <code>internal/querynodev2/segments</code></p><ul><li>Manages the segment lifecycle</li><li>Provides the segment query interface</li><li>Handles loading and releasing segments</li></ul><h2 id="6-关键配置参数"><a href="#6-关键配置参数" class="headerlink" title="6. 关键配置参数"></a>6. Key Configuration Parameters</h2><ul><li><code>ProxyCfg.SlowLogSpanInSeconds</code>: slow query threshold</li><li><code>QueryNodeCfg.PartialResultRequiredDataRatio</code>: ratio of data required for a partial result</li><li><code>QueryNodeCfg.EnableSegmentPrune</code>: whether segment pruning is enabled</li><li><code>QueryNodeCfg.DefaultSegmentFilterRatio</code>: default segment filter ratio</li></ul><h2 id="7-性能优化点"><a href="#7-性能优化点" class="headerlink" title="7. 性能优化点"></a>7. 
Performance Optimizations</h2><ol><li><strong>Concurrent processing</strong>: multiple channels and segments are queried concurrently</li><li><strong>Segment pruning</strong>: skips irrelevant segments based on statistics</li><li><strong>Partial results</strong>: partial results can be returned to lower latency</li><li><strong>Load balancing</strong>: the load balancer chooses which QueryNode serves each request</li><li><strong>Result caching</strong>: query results can be cached in some scenarios</li></ol><h2 id="8-错误处理"><a href="#8-错误处理" class="headerlink" title="8. 错误处理"></a>8. Error Handling</h2><ul><li><strong>Collection not loaded</strong>: returns a <code>CollectionNotLoaded</code> error</li><li><strong>Channel mismatch</strong>: the delegator verifies that the channel belongs to it</li><li><strong>Segment not loaded</strong>: the segment is marked offline, which may trigger a retry</li><li><strong>Partial failure</strong>: <code>partialResultRequiredDataRatio</code> decides whether partial results are returned</li></ul><h2 id="9-监控指标"><a href="#9-监控指标" class="headerlink" title="9. 监控指标"></a>9. Monitoring Metrics</h2><ul><li><code>ProxyReceivedNQ</code>: number of query requests received by the Proxy</li><li><code>ProxySQLatency</code>: Proxy query latency</li><li><code>QueryNodeSQCount</code>: QueryNode query count</li><li><code>QueryNodeSQLatencyWaitTSafe</code>: latency spent waiting for tSafe</li><li><code>ProxyReduceResultLatency</code>: result merge latency</li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;p&gt;This document analyzes in detail the complete processing flow of a Query Request in Milvus, from receipt to response.&lt;/p&gt;</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Milvus TSO (Timestamp Oracle): Generation and Usage Analysis</title>
    <link href="https://szza.github.io/2025/08/10/Milvus/14_tso_generation_and_usage_analysis/"/>
    <id>https://szza.github.io/2025/08/10/Milvus/14_tso_generation_and_usage_analysis/</id>
    <published>2025-08-10T03:00:00.000Z</published>
    <updated>2026-01-06T13:10:48.744Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p>TSO (Timestamp Oracle) is a core component of the Milvus distributed system. It generates globally unique, monotonically increasing timestamps. Every data operation (Insert, Delete, Search, and so on) obtains its timestamp from TSO, which keeps event ordering consistent across the distributed environment.</p><h2 id="一、TSO-的基本概念"><a href="#一、TSO-的基本概念" class="headerlink" title="一、TSO 的基本概念"></a>1. Basic Concepts of TSO</h2><h3 id="1-1-为什么需要-TSO？"><a href="#1-1-为什么需要-TSO？" class="headerlink" title="1.1 为什么需要 TSO？"></a>1.1 Why Is TSO Needed?</h3><p>A distributed system faces the following problems:</p><ol><li><strong>Clock skew</strong>: local clocks on different nodes may disagree</li><li><strong>Network latency</strong>: messages are delayed in transit and may arrive out of order</li><li><strong>Event ordering</strong>: events must be ordered globally</li></ol><p><strong>Example scenario</strong>:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">Users u1 and u2 operate on different nodes:</span><br><span class="line">- t0: u1 creates Collection C0</span><br><span class="line">- t2: u2 searches C0 (should see an empty collection)</span><br><span class="line">- t5: u1 inserts data A1</span><br><span class="line">- t7: u2 searches C0 (should see A1)</span><br><span class="line">- t15: u1 deletes A1</span><br><span class="line">- t17: u2 searches C0 (should not see A1)</span><br></pre></td></tr></table></figure><p>Without TSO, clock skew and network latency could let u2 observe an inconsistent data state.</p><h3 id="1-2-TSO-的解决方案"><a href="#1-2-TSO-的解决方案" class="headerlink" title="1.2 TSO 的解决方案"></a>1.2 How TSO Solves This</h3><ul><li><strong>Single time source</strong>: every component obtains timestamps from the TSO service in RootCoord instead of using its local clock</li><li><strong>Global ordering</strong>: TSO guarantees that generated timestamps are globally unique and monotonically increasing</li><li><strong>Time synchronization</strong>: the TimeTick mechanism ensures that message streams are processed in order</li></ul><h2 id="二、TSO-的数据结构"><a href="#二、TSO-的数据结构" class="headerlink" title="二、TSO 的数据结构"></a>2. TSO Data Structure</h2><h3 id="2-1-混合时间戳（Hybrid-Timestamp）"><a href="#2-1-混合时间戳（Hybrid-Timestamp）" class="headerlink" title="2.1 混合时间戳（Hybrid Timestamp）"></a>2.1 Hybrid Timestamp</h3><p>A timestamp generated by TSO is a 
<code>uint64</code> value with a hybrid layout:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">┌─────────────────────────────────────────────────────────┐</span><br><span class="line">│                   64 bits (uint64)                       │</span><br><span class="line">├──────────────────────────────┬──────────────────────────┤</span><br><span class="line">│   Physical Part (46 bits)    │ Logical Part (18 bits)   │</span><br><span class="line">│   UTC time (milliseconds)    │   Logical counter        │</span><br><span class="line">└──────────────────────────────┴──────────────────────────┘</span><br></pre></td></tr></table></figure><p><strong>Implementation</strong> (<code>pkg/util/tsoutil/tso.go</code>):</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">const</span> (</span><br><span class="line">    logicalBits     = <span class="number">18</span></span><br><span class="line">    logicalBitsMask = (<span class="number">1</span> &lt;&lt; logicalBits) - <span class="number">1</span>  <span class="comment">// 0x3FFFF</span></span><br><span class="line">)</span><br><span 
class="line"></span><br><span class="line"><span class="comment">// ComposeTS combines the physical time and the logical counter</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ComposeTS</span><span class="params">(physical, logical <span class="type">int64</span>)</span></span> <span class="type">uint64</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="type">uint64</span>((physical &lt;&lt; logicalBits) + logical)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// ParseTS splits a timestamp into its physical and logical parts</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ParseTS</span><span class="params">(ts <span class="type">uint64</span>)</span></span> (time.Time, <span class="type">uint64</span>) &#123;</span><br><span class="line">    logical := ts &amp; logicalBitsMask           <span class="comment">// extract the logical part (low 18 bits)</span></span><br><span class="line">    physical := ts &gt;&gt; logicalBits              <span class="comment">// extract the physical part (high 46 bits)</span></span><br><span class="line">    physicalTime := time.Unix(<span class="type">int64</span>(physical/<span class="number">1000</span>), </span><br><span class="line">        <span class="type">int64</span>(physical)%<span class="number">1000</span>*time.Millisecond.Nanoseconds())</span><br><span class="line">    <span class="keyword">return</span> physicalTime, logical</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="2-2-各部分的作用"><a href="#2-2-各部分的作用" class="headerlink" title="2.2 各部分的作用"></a>2.2 Role of Each Part</h3><table><thead><tr><th>Part</th><th>Bits</th><th>Purpose</th><th>Range</th></tr></thead><tbody><tr><td><strong>Physical Part</strong></td><td>46 bits</td><td>UTC time (milliseconds)</td><td>about 2,230 years (2^46 ms)</td></tr><tr><td><strong>Logical Part</strong></td><td>18 bits</td><td>Logical counter</td><td>0 ~ 262,143 
(2^18-1)</td></tr></tbody></table><p><strong>Advantages</strong>:</p><ul><li><strong>Fine granularity</strong>: up to 262,143 timestamps can be issued within a single millisecond</li><li><strong>Human-readable time</strong>: the physical part converts directly to UTC time</li><li><strong>Global uniqueness</strong>: physical time plus the logical counter guarantees uniqueness</li></ul><h2 id="三、TSO-的生成流程"><a href="#三、TSO-的生成流程" class="headerlink" title="三、TSO 的生成流程"></a>3. TSO Generation Flow</h2><h3 id="3-1-整体架构"><a href="#3-1-整体架构" class="headerlink" title="3.1 整体架构"></a>3.1 Overall Architecture</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">┌─────────────┐         ┌──────────────┐         ┌─────────────┐</span><br><span class="line">│   Proxy     │────────▶│  RootCoord   │────────▶│    etcd      │</span><br><span class="line">│             │ Request │              │ Save    │              │</span><br><span class="line">│             │◀────────│  TSO Service │◀────────│              │</span><br><span class="line">└─────────────┘ Response└──────────────┘ Load    └─────────────┘</span><br></pre></td></tr></table></figure><h3 id="3-2-RootCoord-中的-TSO-服务"><a href="#3-2-RootCoord-中的-TSO-服务" class="headerlink" title="3.2 RootCoord 中的 TSO 服务"></a>3.2 The TSO Service in RootCoord</h3><p><strong>File</strong>: <code>internal/tso/tso.go</code></p><h4 id="3-2-1-timestampOracle-结构"><a href="#3-2-1-timestampOracle-结构" class="headerlink" title="3.2.1 timestampOracle 结构"></a>3.2.1 The timestampOracle Struct</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> timestampOracle <span class="keyword">struct</span> &#123;</span><br><span class="line">    key   <span class="type">string</span>              <span class="comment">// key in etcd</span></span><br><span class="line">    txnKV kv.TxnKV            <span class="comment">// etcd client</span></span><br><span class="line">    </span><br><span class="line">    saveInterval  time.Duration  <span class="comment">// save interval (3 seconds by default)</span></span><br><span class="line">    maxResetTSGap <span class="function"><span class="keyword">func</span><span class="params">()</span></span> time.Duration</span><br><span class="line">    </span><br><span class="line">    TSO           unsafe.Pointer  <span class="comment">// current TSO (atomic pointer)</span></span><br><span class="line">    lastSavedTime atomic.Value    <span class="comment">// last saved time</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// atomicObject holds the current TSO state</span></span><br><span class="line"><span class="keyword">type</span> atomicObject <span class="keyword">struct</span> &#123;</span><br><span class="line">    physical time.Time  <span class="comment">// physical time</span></span><br><span class="line">    logical  <span class="type">int64</span>      <span class="comment">// logical counter</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="3-2-2-初始化（InitTimestamp）"><a href="#3-2-2-初始化（InitTimestamp）" class="headerlink" title="3.2.2 初始化（InitTimestamp）"></a>3.2.2 Initialization (InitTimestamp)</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *timestampOracle)</span></span> InitTimestamp() <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// 1. Load the last saved timestamp from etcd</span></span><br><span class="line">    last, err := t.loadTimestamp()</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Read the current system time</span></span><br><span class="line">    next := time.Now()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
If the system time is too close to the time saved in etcd, derive the start time from the etcd time instead</span></span><br><span class="line">    <span class="keyword">if</span> typeutil.SubTimeByWallClock(next, last) &lt; updateTimestampGuard &#123;</span><br><span class="line">        next = last.Add(updateTimestampGuard)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Save a future time window to etcd (saved ahead of time to avoid frequent etcd writes)</span></span><br><span class="line">    save := next.Add(t.saveInterval)  <span class="comment">// saveInterval = 3 seconds</span></span><br><span class="line">    <span class="keyword">if</span> err := t.saveTimestamp(save); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. Initialize the in-memory TSO</span></span><br><span class="line">    current := &amp;atomicObject&#123;</span><br><span class="line">        physical: next,</span><br><span class="line">        logical:  <span class="number">0</span>,</span><br><span class="line">    &#125;</span><br><span class="line">    atomic.StorePointer(&amp;t.TSO, unsafe.Pointer(current))</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key points</strong>:</p><ul><li>Recovers the last saved timestamp from etcd</li><li>Saves a time window 3 seconds into the future ahead of time, which reduces etcd writes</li><li>Uses an atomic pointer for thread safety</li></ul><h4 id="3-2-3-更新时间戳（UpdateTimestamp）"><a href="#3-2-3-更新时间戳（UpdateTimestamp）" class="headerlink" title="3.2.3 更新时间戳（UpdateTimestamp）"></a>3.2.3 Updating the Timestamp (UpdateTimestamp)</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *timestampOracle)</span></span> UpdateTimestamp() <span class="type">error</span> &#123;</span><br><span class="line">    prev := (*atomicObject)(atomic.LoadPointer(&amp;t.TSO))</span><br><span class="line">    now := time.Now()</span><br><span class="line">    </span><br><span class="line">    jetLag := typeutil.SubTimeByWallClock(now, prev.physical)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">var</span> next time.Time</span><br><span class="line">    </span><br><span class="line">    prevLogical := atomic.LoadInt64(&amp;prev.logical)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Case 1: system time is ahead of the TSO physical time; sync to system time</span></span><br><span class="line">    <span class="keyword">if</span> jetLag &gt; updateTimestampGuard &#123;</span><br><span class="line">        next = now</span><br><span class="line">    &#125; <span class="keyword">else</span> <span class="keyword">if</span> prevLogical &gt; maxLogical/<span class="number">2</span> &#123;</span><br><span class="line">        <span class="comment">// Case 2: the logical counter is more than half used; advance physical time by 1 ms</span></span><br><span class="line">        next = prev.physical.Add(time.Millisecond)</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        <span class="comment">// Case 3: the current physical time is still usable; nothing to update</span></span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// If the saved time window is about to run out, write a new one to etcd</span></span><br><span class="line">    <span class="keyword">if</span> typeutil.SubTimeByWallClock(t.lastSavedTime.Load().(time.Time), next) &lt;= updateTimestampGuard &#123;</span><br><span class="line">        save := next.Add(t.saveInterval)</span><br><span class="line">        <span class="keyword">if</span> err := t.saveTimestamp(save); err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> err</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Update the in-memory TSO</span></span><br><span class="line">    current := &amp;atomicObject&#123;</span><br><span class="line">        physical: next,</span><br><span class="line">        logical:  <span class="number">0</span>,  <span class="comment">// reset the logical counter</span></span><br><span class="line">    &#125;</span><br><span class="line">    atomic.StorePointer(&amp;t.TSO, unsafe.Pointer(current))</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Update trigger conditions</strong>:</p><ol><li>System time is ahead of the TSO physical time by more than <code>updateTimestampGuard</code> (clock sync)</li><li>The logical counter exceeds <code>maxLogical/2</code> (131,072 for an 18-bit logical part), so the physical time must be advanced</li><li>The saved time window is nearly exhausted, so a new window must be written to etcd</li></ol><h3 id="3-3-生成时间戳（GenerateTSO）"><a href="#3-3-生成时间戳（GenerateTSO）" class="headerlink" title="3.3 生成时间戳（GenerateTSO）"></a>3.3 Generating Timestamps (GenerateTSO)</h3><p><strong>File</strong>: <code>internal/tso/global_allocator.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(gta *GlobalTSOAllocator)</span></span> GenerateTSO(count <span class="type">uint32</span>) (<span class="type">uint64</span>, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="keyword">var</span> physical, logical <span class="type">int64</span></span><br><span class="line">    </span><br><span class="line">    maxRetryCount := <span 
class="number">10</span></span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; maxRetryCount; i++ &#123;</span><br><span class="line">        <span class="comment">// 1. Load the current TSO object</span></span><br><span class="line">        current := (*atomicObject)(atomic.LoadPointer(&amp;gta.tso.TSO))</span><br><span class="line">        <span class="keyword">if</span> current == <span class="literal">nil</span> || current.physical.Equal(typeutil.ZeroTime) &#123;</span><br><span class="line">            <span class="comment">// TSO is not initialized yet; wait</span></span><br><span class="line">            time.Sleep(<span class="number">200</span> * time.Millisecond)</span><br><span class="line">            <span class="keyword">continue</span></span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// 2. Atomically advance the logical counter</span></span><br><span class="line">        physical = current.physical.UnixMilli()</span><br><span class="line">        logical = atomic.AddInt64(&amp;current.logical, <span class="type">int64</span>(count))</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// 3. Check whether the logical counter overflowed</span></span><br><span class="line">        <span class="keyword">if</span> logical &gt;= maxLogical &amp;&amp; gta.LimitMaxLogic &#123;</span><br><span class="line">            log.Info(<span class="string">&quot;logical part outside of max logical interval, please check ntp time&quot;</span>)</span><br><span class="line">            time.Sleep(UpdateTimestampStep)  <span class="comment">// wait 50 ms so the next UpdateTimestamp can run</span></span><br><span class="line">            <span class="keyword">continue</span></span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// 4. 
Compose the timestamp</span></span><br><span class="line">        <span class="keyword">return</span> tsoutil.ComposeTS(physical, logical), <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>, errors.New(<span class="string">&quot;can not get timestamp&quot;</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key points</strong>:</p><ul><li>The logical counter is incremented atomically, which keeps allocation thread-safe</li><li>Batch allocation is supported (<code>count</code> timestamps per call)</li><li>If the logical counter overflows, the allocator waits for <code>UpdateTimestamp</code> to advance the physical time and then retries</li></ul><h3 id="3-4-RootCoord-的-AllocTimestamp-RPC"><a href="#3-4-RootCoord-的-AllocTimestamp-RPC" class="headerlink" title="3.4 RootCoord 的 AllocTimestamp RPC"></a>3.4 RootCoord's AllocTimestamp RPC</h3><p><strong>File</strong>: <code>internal/rootcoord/root_coord.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span 
class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(c *Core)</span></span> AllocTimestamp(ctx context.Context, in *rootcoordpb.AllocTimestampRequest,</span><br><span class="line">) (*rootcoordpb.AllocTimestampResponse, <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. Check health status</span></span><br><span class="line">    <span class="keyword">if</span> err := merr.CheckHealthy(c.GetStateCode()); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> &amp;rootcoordpb.AllocTimestampResponse&#123;Status: merr.Status(err)&#125;, <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Handle the block timestamp (used in scenarios such as data recovery)</span></span><br><span class="line">    <span class="keyword">if</span> in.BlockTimestamp &gt; <span class="number">0</span> &#123;</span><br><span class="line">        blockTime, _ := tsoutil.ParseTS(in.BlockTimestamp)</span><br><span class="line">        lastTime := c.tsoAllocator.GetLastSavedTime()</span><br><span class="line">        deltaDuration := blockTime.Sub(lastTime)</span><br><span class="line">        <span class="keyword">if</span> deltaDuration &gt; <span class="number">0</span> &#123;</span><br><span class="line">            time.Sleep(deltaDuration + time.Millisecond*<span class="number">200</span>)</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
Generate the TSO</span></span><br><span class="line">    ts, err := c.tsoAllocator.GenerateTSO(in.GetCount())</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> &amp;rootcoordpb.AllocTimestampResponse&#123;Status: merr.Status(err)&#125;, <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Return the first usable timestamp</span></span><br><span class="line">    <span class="comment">// Note: GenerateTSO returns the last timestamp of the batch, so subtract count-1</span></span><br><span class="line">    ts = ts - <span class="type">uint64</span>(in.GetCount()) + <span class="number">1</span></span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> &amp;rootcoordpb.AllocTimestampResponse&#123;</span><br><span class="line">        Status:    merr.Success(),</span><br><span class="line">        Timestamp: ts,  <span class="comment">// first usable timestamp</span></span><br><span class="line">        Count:     in.GetCount(),</span><br><span class="line">    &#125;, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Proto definition</strong>:</p><figure class="highlight protobuf"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">service </span><span class="title class_">RootCoord</span> &#123;</span><br><span 
class="line">    <span class="function"><span class="keyword">rpc</span> AllocTimestamp(AllocTimestampRequest) <span class="keyword">returns</span> (AllocTimestampResponse) </span>&#123;&#125;</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">message </span><span class="title class_">AllocTimestampRequest</span> &#123;</span><br><span class="line">    common.MsgBase base = <span class="number">1</span>;</span><br><span class="line">    <span class="type">uint32</span> count = <span class="number">3</span>;  <span class="comment">// number of timestamps requested</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">message </span><span class="title class_">AllocTimestampResponse</span> &#123;</span><br><span class="line">    common.Status status = <span class="number">1</span>;</span><br><span class="line">    <span class="type">uint64</span> timestamp = <span class="number">2</span>;  <span class="comment">// first usable timestamp</span></span><br><span class="line">    <span class="type">uint32</span> count = <span class="number">3</span>;      <span class="comment">// number of timestamps allocated</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="四、Proxy-如何使用-TSO"><a href="#四、Proxy-如何使用-TSO" class="headerlink" title="四、Proxy 如何使用 TSO"></a>4. How Proxy Uses TSO</h2><h3 id="4-1-TimestampAllocator"><a href="#4-1-TimestampAllocator" class="headerlink" title="4.1 TimestampAllocator"></a>4.1 TimestampAllocator</h3><p><strong>File</strong>: <code>internal/proxy/timestamp.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span 
class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> timestampAllocator <span class="keyword">struct</span> &#123;</span><br><span class="line">    peerID <span class="type">int64</span></span><br><span class="line">    tso    types.RootCoordClient</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(ta *timestampAllocator)</span></span> alloc(ctx context.Context, count <span class="type">uint32</span>) ([]Timestamp, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// 1. 
Build the request</span></span><br><span class="line">    req := &amp;rootcoordpb.AllocTimestampRequest&#123;</span><br><span class="line">        Base: commonpbutil.NewMsgBase(</span><br><span class="line">            commonpbutil.WithMsgType(commonpb.MsgType_RequestTSO),</span><br><span class="line">            commonpbutil.WithSourceID(ta.peerID),</span><br><span class="line">        ),</span><br><span class="line">        Count: count,</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Call AllocTimestamp on RootCoord</span></span><br><span class="line">    resp, err := ta.tso.AllocTimestamp(ctx, req)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
生成时间戳数组</span></span><br><span class="line">    start, cnt := resp.GetTimestamp(), resp.GetCount()</span><br><span class="line">    ret := <span class="built_in">make</span>([]Timestamp, cnt)</span><br><span class="line">    <span class="keyword">for</span> i := <span class="type">uint32</span>(<span class="number">0</span>); i &lt; cnt; i++ &#123;</span><br><span class="line">        ret[i] = start + <span class="type">uint64</span>(i)  <span class="comment">// 连续的时间戳</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> ret, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// AllocOne 分配单个时间戳</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(ta *timestampAllocator)</span></span> AllocOne(ctx context.Context) (Timestamp, <span class="type">error</span>) &#123;</span><br><span class="line">    ret, err := ta.alloc(ctx, <span class="number">1</span>)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="number">0</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> ret[<span class="number">0</span>], <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="4-2-Insert-操作中的-TSO-使用"><a href="#4-2-Insert-操作中的-TSO-使用" class="headerlink" title="4.2 Insert 操作中的 TSO 使用"></a>4.2 Insert 操作中的 TSO 使用</h3><p><strong>文件</strong>: <code>internal/proxy/task_insert.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(it *insertTask)</span></span> insertPreExecute(ctx context.Context) <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// 1. 分配主键 ID</span></span><br><span class="line">    rowNums := <span class="type">uint32</span>(it.insertMsg.NRows())</span><br><span class="line">    rowIDBegin, rowIDEnd, _ := common.AllocAutoID(it.idAllocator.Alloc, rowNums, clusterID)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 为每行分配时间戳</span></span><br><span class="line">    rowNum := it.insertMsg.NRows()</span><br><span class="line">    it.insertMsg.Timestamps = <span class="built_in">make</span>([]<span class="type">uint64</span>, rowNum)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
批量分配时间戳（性能优化）</span></span><br><span class="line">    timestamps, err := it.timestampAllocator.AllocTimestamp(ctx, <span class="type">uint32</span>(rowNum))</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. 为每行设置时间戳</span></span><br><span class="line">    <span class="keyword">for</span> index := <span class="keyword">range</span> it.insertMsg.Timestamps &#123;</span><br><span class="line">        it.insertMsg.Timestamps[index] = timestamps[index]</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="4-3-Delete-操作中的-TSO-使用"><a href="#4-3-Delete-操作中的-TSO-使用" class="headerlink" title="4.3 Delete 操作中的 TSO 使用"></a>4.3 Delete 操作中的 TSO 使用</h3><p><strong>文件</strong>: <code>internal/proxy/task_delete.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dr *deleteRunner)</span></span> Run(ctx context.Context) <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// 1. 
Allocate a timestamp</span></span><br><span class="line">    ts, err := dr.tsoAllocatorIns.AllocOne(ctx)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Set the timestamp on the delete message</span></span><br><span class="line">    dr.deleteMsg.Timestamps = []<span class="type">uint64</span>&#123;ts&#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Send the delete message to the message stream</span></span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="五、TSO-的作用"><a href="#五、TSO-的作用" class="headerlink" title="五、TSO 的作用"></a>5. The Role of TSO</h2><h3 id="5-1-保证全局事件顺序"><a href="#5-1-保证全局事件顺序" class="headerlink" title="5.1 保证全局事件顺序"></a>5.1 Guaranteeing a Global Event Order</h3><p><strong>Scenario</strong>: multiple Proxies write data at the same time</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">Proxy1: Insert A (ts=100)</span><br><span class="line">Proxy2: Insert B (ts=150)</span><br><span class="line">Proxy1: Insert C (ts=200)</span><br><span class="line">Proxy2: Delete A (ts=250)</span><br></pre></td></tr></table></figure><p>Every operation obtains its timestamp from the same TSO, which guarantees a global order:</p><ul><li>A is inserted before B (100 &lt; 150)</li><li>C is inserted after B (200 &gt; 150)</li><li>Delete A happens after Insert C (250 &gt; 200)</li></ul><h3 id="5-2-时间同步（Time-Synchronization）"><a href="#5-2-时间同步（Time-Synchronization）" class="headerlink" title="5.2 时间同步（Time Synchronization）"></a>5.2 Time Synchronization</h3><p><strong>Problem</strong>: how can we be sure that every message in a stream with a timestamp below a given value has already been processed?</p><p><strong>Solution</strong>: the TimeTick mechanism</p><ol><li><p><strong>Proxy 
reports timestamps</strong>:</p><ul><li>Each Proxy periodically (every 200ms by default) reports the latest timestamp of each message stream to RootCoord</li></ul></li><li><p><strong>RootCoord computes the minimum timestamp</strong>:</p><ul><li>For each message stream, RootCoord takes the minimum of the timestamps reported by all Proxies</li></ul></li><li><p><strong>Insert a TimeTick message</strong>:</p><ul><li>RootCoord inserts that minimum timestamp into the message stream as a TimeTick message</li></ul></li><li><p><strong>Consumer handling</strong>:</p><ul><li>When a consumer reads a TimeTick message, it knows that every message with a smaller timestamp has already been processed</li></ul></li></ol><p><strong>Code</strong> (<code>internal/rootcoord/root_coord.go</code>):</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(c *Core)</span></span> UpdateChannelTimeTick(ctx context.Context,</span><br><span class="line">    in *internalpb.ChannelTimeTickMsg) (*commonpb.Status, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// Update the timestamp of each channel</span></span><br><span class="line">    err := c.chanTimeTick.updateTimeTick(in, <span class="string">&quot;gRPC&quot;</span>)</span><br><span class="line">    <span class="keyword">return</span> merr.Status(err), <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="5-3-数据一致性保证"><a href="#5-3-数据一致性保证" class="headerlink" title="5.3 数据一致性保证"></a>5.3 Data Consistency Guarantees</h3><p><strong>Scenario</strong>: a query needs to see a consistent snapshot of the data</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">t1: Insert A (ts=100)</span><br><span class="line">t2: Insert B (ts=200)</span><br><span class="line">t3: Search (ts=150)  // 
should see only A, not B</span><br><span class="line">t4: Delete A (ts=250)</span><br><span class="line">t5: Search (ts=300)   // should see only B</span><br></pre></td></tr></table></figure><p>With TSO timestamps:</p><ul><li>A query obtains a timestamp <code>ts_query</code></li><li>Only data with a timestamp <code>&lt;= ts_query</code> is processed</li><li>The query result is therefore consistent</li></ul><h3 id="5-4-故障恢复"><a href="#5-4-故障恢复" class="headerlink" title="5.4 故障恢复"></a>5.4 Failure Recovery</h3><p><strong>Scenario</strong>: restoring timestamps after a system restart</p><ol><li><p><strong>Persist to etcd</strong>:</p><ul><li>RootCoord periodically saves the timestamp to etcd</li></ul></li><li><p><strong>Restore the timestamp</strong>:</p><ul><li>After a restart, the last saved timestamp is loaded from etcd</li><li>Newly generated timestamps are guaranteed to be greater than the saved one</li></ul></li><li><p><strong>Prevent time from going backwards</strong>:</p><ul><li>If the system clock moves backwards, the timestamp stored in etcd is used</li><li>Timestamps stay monotonically increasing</li></ul></li></ol><h2 id="六、TSO-的性能优化"><a href="#六、TSO-的性能优化" class="headerlink" title="六、TSO 的性能优化"></a>6. TSO Performance Optimizations</h2><h3 id="6-1-批量分配"><a href="#6-1-批量分配" class="headerlink" title="6.1 批量分配"></a>6.1 Batch Allocation</h3><ul><li><strong>Fewer RPC calls</strong>: a Proxy can request multiple timestamps in a single call</li><li><strong>Higher throughput</strong>: fewer network round trips</li></ul><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Allocate 100 timestamps in one call</span></span><br><span class="line">timestamps, err := tsoAllocator.AllocTimestamp(ctx, <span class="number">100</span>)</span><br></pre></td></tr></table></figure><h3 id="6-2-时间窗口预分配"><a href="#6-2-时间窗口预分配" class="headerlink" title="6.2 时间窗口预分配"></a>6.2 Time-Window Pre-allocation</h3><ul><li><strong>Fewer etcd writes</strong>: a time window 3 seconds into the future is saved in advance</li><li><strong>Better performance</strong>: avoids frequent writes to etcd</li></ul><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">saveInterval = <span class="number">3</span> * time.Second  <span class="comment">// save 3 seconds ahead</span></span><br></pre></td></tr></table></figure><h3 id="6-3-内存分配"><a href="#6-3-内存分配" class="headerlink" title="6.3 内存分配"></a>6.3 
In-Memory Allocation</h3><ul><li><strong>Atomic operations</strong>: the logical counter is updated with atomic operations, without locks</li><li><strong>High performance</strong>: no lock contention</li></ul><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">logical = atomic.AddInt64(&amp;current.logical, <span class="type">int64</span>(count))</span><br></pre></td></tr></table></figure><h3 id="6-4-逻辑计数器溢出处理"><a href="#6-4-逻辑计数器溢出处理" class="headerlink" title="6.4 逻辑计数器溢出处理"></a>6.4 Handling Logical Counter Overflow</h3><ul><li><strong>Automatic physical-time advance</strong>: when the logical counter exceeds <code>maxLogical/2</code>, the physical time is advanced automatically</li><li><strong>Availability</strong>: allocation does not stall just because the logical counter has been used up</li></ul><h2 id="七、TSO-的限制和注意事项"><a href="#七、TSO-的限制和注意事项" class="headerlink" title="七、TSO 的限制和注意事项"></a>7. TSO Limitations and Caveats</h2><h3 id="7-1-时钟同步要求"><a href="#7-1-时钟同步要求" class="headerlink" title="7.1 时钟同步要求"></a>7.1 Clock Synchronization Requirements</h3><ul><li><strong>NTP synchronization</strong>: all nodes are advised to synchronize their clocks with NTP</li><li><strong>Clock-drift detection</strong>: if the clock drift exceeds 3 times <code>UpdateTimestampStep</code> (150ms), a warning is logged</li></ul><h3 id="7-2-逻辑计数器限制"><a href="#7-2-逻辑计数器限制" class="headerlink" title="7.2 逻辑计数器限制"></a>7.2 Logical Counter Limit</h3><ul><li><strong>Maximum logical value</strong>: <code>maxLogical = 2^18 = 262,144</code></li><li><strong>Maximum per millisecond</strong>: 262,143 timestamps, because the logical part must stay below <code>maxLogical</code></li><li><strong>If exceeded</strong>: allocation has to wait until the physical time advances</li></ul><h3 id="7-3-单点故障"><a href="#7-3-单点故障" class="headerlink" title="7.3 单点故障"></a>7.3 Single Point of Failure</h3><ul><li><strong>RootCoord is a single point</strong>: if RootCoord fails, no timestamps can be allocated</li><li><strong>High-availability approach</strong>: achieved through etcd leader election</li></ul><h3 id="7-4-时间戳范围"><a href="#7-4-时间戳范围" class="headerlink" title="7.4 时间戳范围"></a>7.4 Timestamp Range</h3><ul><li><strong>Physical time range</strong>:<ul><li>Minimum: <code>1546300800000</code> (2019-01-01 00:00:00 UTC)</li><li>Maximum: <code>253402300799000</code> (9999-12-31 23:59:59 UTC)</li></ul></li></ul><h2 id="八、总结"><a href="#八、总结" class="headerlink" title="八、总结"></a>8. Summary</h2><h3 id="8-1-TSO-的核心价值"><a href="#8-1-TSO-的核心价值" class="headerlink" title="8.1 TSO 的核心价值"></a>8.1 Core Value of TSO</h3><ol><li>✅ <strong>Globally unique timestamps</strong>: guarantee event ordering in a distributed environment</li><li>✅ 
<strong>Time synchronization</strong>: handles unsynchronized clocks and network latency</li><li>✅ <strong>Data consistency</strong>: supports timestamp-based snapshot queries</li><li>✅ <strong>High performance</strong>: batch allocation, time-window pre-allocation, and similar optimizations</li></ol><h3 id="8-2-关键设计"><a href="#8-2-关键设计" class="headerlink" title="8.2 关键设计"></a>8.2 Key Design Points</h3><ul><li><strong>Hybrid timestamp</strong>: physical time (46 bits) + logical counter (18 bits)</li><li><strong>Persistence</strong>: periodically saved to etcd to support failure recovery</li><li><strong>Atomic operations</strong>: atomic operations provide thread safety</li><li><strong>Batch allocation</strong>: multiple timestamps can be allocated in one call</li></ul><h3 id="8-3-使用场景"><a href="#8-3-使用场景" class="headerlink" title="8.3 使用场景"></a>8.3 Usage Scenarios</h3><ul><li><strong>Insert&#x2F;Delete operations</strong>: a timestamp is assigned to every row</li><li><strong>Query operations</strong>: a query timestamp is obtained to guarantee consistency</li><li><strong>Message stream processing</strong>: TimeTick ensures message ordering</li><li><strong>Failure recovery</strong>: timestamp state is restored from etcd</li></ul><p>TSO is core infrastructure of the Milvus distributed system: it gives the whole system a globally ordered time base, which underpins data consistency and system reliability.</p><h2 id="九、GenerateTSO-失败处理机制"><a href="#九、GenerateTSO-失败处理机制" class="headerlink" title="九、GenerateTSO 失败处理机制"></a>9. How GenerateTSO Failures Are Handled</h2><h3 id="9-1-GenerateTSO-内部重试机制"><a href="#9-1-GenerateTSO-内部重试机制" class="headerlink" title="9.1 GenerateTSO 内部重试机制"></a>9.1 Retries Inside GenerateTSO</h3><p><code>GenerateTSO</code> retries internally and does not fail immediately:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span 
class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(gta *GlobalTSOAllocator)</span></span> GenerateTSO(count <span class="type">uint32</span>) (<span class="type">uint64</span>, <span class="type">error</span>) &#123;</span><br><span class="line">    maxRetryCount := <span class="number">10</span>  <span class="comment">// at most 10 attempts</span></span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; maxRetryCount; i++ &#123;</span><br><span class="line">        current := (*atomicObject)(atomic.LoadPointer(&amp;gta.tso.TSO))</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Case 1: TSO not initialized yet, wait 200ms and retry</span></span><br><span class="line">        <span class="keyword">if</span> current == <span class="literal">nil</span> || current.physical.Equal(typeutil.ZeroTime) &#123;</span><br><span class="line">            log.Info(<span class="string">&quot;sync hasn&#x27;t completed yet, wait for a while&quot;</span>)</span><br><span class="line">            time.Sleep(<span class="number">200</span> * time.Millisecond)</span><br><span class="line">            <span class="keyword">continue</span></span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Case 2: logical counter overflow, wait 50ms and retry</span></span><br><span class="line">        logical := atomic.AddInt64(&amp;current.logical, <span class="type">int64</span>(count))</span><br><span class="line">        <span class="keyword">if</span> logical &gt;= maxLogical &amp;&amp; gta.LimitMaxLogic &#123;</span><br><span class="line">            log.Info(<span class="string">&quot;logical part outside of max logical interval&quot;</span>)</span><br><span class="line">            
time.Sleep(UpdateTimestampStep)  <span class="comment">// 50ms</span></span><br><span class="line">            <span class="keyword">continue</span></span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">return</span> tsoutil.ComposeTS(current.physical.UnixMilli(), logical), <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Every attempt failed, return an error</span></span><br><span class="line">    <span class="keyword">return</span> <span class="number">0</span>, errors.New(<span class="string">&quot;can not get timestamp&quot;</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Retry scenarios</strong>:</p><ol><li><strong>TSO not initialized</strong>: wait 200ms and retry (at most 10 attempts, about 2 seconds of waiting in total)</li><li><strong>Logical counter overflow</strong>: wait 50ms and retry (giving <code>UpdateTimestamp</code> time to advance the physical time)</li></ol><h3 id="9-2-Proxy-层的超时和重试"><a href="#9-2-Proxy-层的超时和重试" class="headerlink" title="9.2 Proxy 层的超时和重试"></a>9.2 Timeouts and Retries in the Proxy Layer</h3><p>The Proxy is protected by several layers when it calls TSO:</p><h4 id="9-2-1-Context-超时"><a href="#9-2-1-Context-超时" class="headerlink" title="9.2.1 Context 超时"></a>9.2.1 Context Timeout</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(ta *timestampAllocator)</span></span> alloc(ctx context.Context, count <span class="type">uint32</span>) ([]Timestamp, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// Set a 10-second timeout</span></span><br><span class="line">    ctx, cancel := context.WithTimeout(ctx, 
<span class="number">10</span>*time.Second)</span><br><span class="line">    <span class="keyword">defer</span> cancel()</span><br><span class="line">    </span><br><span class="line">    resp, err := ta.tso.AllocTimestamp(ctx, req)</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Timeout protection</strong>: if RootCoord does not respond, the call times out after 10 seconds instead of blocking forever.</p><h4 id="9-2-2-gRPC-客户端重试"><a href="#9-2-2-gRPC-客户端重试" class="headerlink" title="9.2.2 gRPC 客户端重试"></a>9.2.2 gRPC Client Retries</h4><p>The Proxy's gRPC client retries automatically:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/util/grpcclient/client.go</span></span><br><span class="line">retry.Handle(ctx, <span class="function"><span class="keyword">func</span><span class="params">()</span></span> (<span class="type">bool</span>, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// Perform the RPC call</span></span><br><span class="line">    ret, err = caller(wrapper.client)</span><br><span class="line">    <span class="comment">// Inspect the error and decide whether to retry</span></span><br><span class="line">    <span class="keyword">return</span> needRetry, err</span><br><span class="line">&#125;, </span><br><span class="line">    retry.Attempts(<span class="number">10</span>),                    <span class="comment">// at most 10 attempts</span></span><br><span class="line">    retry.Sleep(<span class="number">200</span>*time.Millisecond),     <span class="comment">// initial backoff 200ms</span></span><br><span class="line">    retry.MaxSleepTime(<span 
class="number">10</span>*time.Second))    <span class="comment">// maximum backoff 10s</span></span><br></pre></td></tr></table></figure><p><strong>Retry policy</strong>:</p><ul><li>At most 10 attempts</li><li>Initial backoff: 200ms</li><li>Maximum backoff: 10 seconds</li><li>Total time: up to about 52.8 seconds (if every attempt fails)</li></ul><h3 id="9-3-任务入队失败处理"><a href="#9-3-任务入队失败处理" class="headerlink" title="9.3 任务入队失败处理"></a>9.3 Handling Task Enqueue Failures</h3><p>When TSO allocation fails, the task cannot be enqueued and an error is returned immediately:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(queue *baseTaskQueue)</span></span> Enqueue(t task) <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">    <span class="keyword">if</span> t.CanSkipAllocTimestamp() &#123;</span><br><span class="line">        <span class="comment">// Some tasks can skip TSO allocation</span></span><br><span class="line">        ts = tsoutil.ComposeTS(time.Now().UnixMilli(), <span class="number">0</span>)</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        <span class="comment">// Tasks that must allocate a TSO</span></span><br><span class="line">        ts, err = queue.tsoAllocatorIns.AllocOne(t.TraceCtx())</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> err  <span class="comment">// 
return the error directly; the task is not enqueued</span></span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key points</strong>:</p><ul><li>✅ <strong>Non-blocking</strong>: a failed enqueue returns an error immediately and does not block other tasks</li><li>✅ <strong>Error propagation</strong>: the error is returned to the caller (for example an Insert&#x2F;Delete request)</li><li>✅ <strong>Visible to users</strong>: the user receives an explicit error response</li></ul><h3 id="9-4-失败场景分析"><a href="#9-4-失败场景分析" class="headerlink" title="9.4 失败场景分析"></a>9.4 Failure Scenario Analysis</h3><h4 id="场景-1：RootCoord-未启动或不可用"><a href="#场景-1：RootCoord-未启动或不可用" class="headerlink" title="场景 1：RootCoord 未启动或不可用"></a>Scenario 1: RootCoord Not Started or Unavailable</h4><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">GenerateTSO internal retries (10 × 200ms = 2s)</span><br><span class="line">    ↓</span><br><span class="line">Proxy gRPC retries (10 attempts, up to 52.8s)</span><br><span class="line">    ↓</span><br><span class="line">Context timeout (10s)</span><br><span class="line">    ↓</span><br><span class="line">Task enqueue fails; the error is returned to the user</span><br></pre></td></tr></table></figure><p><strong>Result</strong>: the user request receives an error response after about 10 seconds and does not block forever.</p><h4 id="场景-2：TSO-未初始化（RootCoord-刚启动）"><a href="#场景-2：TSO-未初始化（RootCoord-刚启动）" class="headerlink" title="场景 2：TSO 未初始化（RootCoord 刚启动）"></a>Scenario 2: TSO Not Initialized (RootCoord Just Started)</h4><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span 
class="line">GenerateTSO detects that TSO is not initialized</span><br><span class="line">    ↓</span><br><span class="line">Wait 200ms</span><br><span class="line">    ↓</span><br><span class="line">Retry (at most 10 attempts, about 2s of waiting in total)</span><br><span class="line">    ↓</span><br><span class="line">If initialization has completed, return successfully</span><br><span class="line">    ↓</span><br><span class="line">If still not initialized, return an error</span><br></pre></td></tr></table></figure><p><strong>Result</strong>: under normal conditions RootCoord initializes quickly (&lt; 1 second), so this does not fail.</p><h4 id="场景-3：逻辑计数器溢出"><a href="#场景-3：逻辑计数器溢出" class="headerlink" title="场景 3：逻辑计数器溢出"></a>Scenario 3: Logical Counter Overflow</h4><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">GenerateTSO detects a logical counter overflow</span><br><span class="line">    ↓</span><br><span class="line">Wait 50ms (for UpdateTimestamp to run)</span><br><span class="line">    ↓</span><br><span class="line">Retry (at most 10 attempts, about 0.5s of waiting in total)</span><br><span class="line">    ↓</span><br><span class="line">If UpdateTimestamp succeeded, continue allocating</span><br><span class="line">    ↓</span><br><span class="line">If the counter still overflows, return an error</span><br></pre></td></tr></table></figure><p><strong>Result</strong>: under normal conditions <code>UpdateTimestamp</code> completes within 50ms, so this does not fail.</p><h3 id="9-5-总结"><a href="#9-5-总结" class="headerlink" title="9.5 总结"></a>9.5 Summary</h3><p><strong>A GenerateTSO failure never blocks processing permanently</strong>:</p><ol><li><p>✅ <strong>Multiple retry layers</strong>:</p><ul><li>Retries inside GenerateTSO (10 attempts)</li><li>gRPC client retries (10 attempts)</li><li>Up to 100 attempts in total</li></ul></li><li><p>✅ <strong>Timeout protection</strong>:</p><ul><li>Context timeout (10 seconds)</li><li>Guarantees the wait is bounded</li></ul></li><li><p>✅ <strong>Fail fast</strong>:</p><ul><li>A failed enqueue returns an error immediately</li><li>Other tasks are not blocked</li><li>The user receives an explicit error response</li></ul></li><li><p>✅ <strong>Performance in the normal case</strong>:</p><ul><li>TSO 
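allocation normally completes within milliseconds</li><li>Retries are triggered only under abnormal conditions</li><li>Normal request latency is not affected</li></ul></li></ol><p><strong>Best practices</strong>:</p><ul><li>Keep RootCoord highly available (via etcd leader election)</li><li>Monitor the TSO allocation failure rate</li><li>If failures are frequent, check RootCoord health and network connectivity</li></ul>]]></content>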
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;概述&quot;&gt;&lt;/a&gt;概述&lt;/h2&gt;&lt;p&gt;TSO（Timestamp Oracle）是 Milvus 分布式系统中的核心组件，负责生成全局唯一、单调递增的时间戳。所有数据操作（Ins</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Analysis of Milvus C++ Functionality Integrated via CGO</title>
    <link href="https://szza.github.io/2025/08/10/Milvus/13_cgo_cpp_integration_analysis/"/>
    <id>https://szza.github.io/2025/08/10/Milvus/13_cgo_cpp_integration_analysis/</id>
    <published>2025-08-10T02:00:00.000Z</published>
    <updated>2026-01-06T13:10:48.425Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p>Milvus uses a hybrid architecture: the performance-critical core paths are implemented in C++ and exposed to Go code through CGO (cgo, Go's mechanism for calling C code). This design delivers the required performance while keeping the Go code simple and maintainable.</p><h2 id="一、CGO-集成架构"><a href="#一、CGO-集成架构" class="headerlink" title="一、CGO 集成架构"></a>1. CGO Integration Architecture</h2><h3 id="1-1-基本结构"><a href="#1-1-基本结构" class="headerlink" title="1.1 基本结构"></a>1.1 Basic Structure</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">┌─────────────────────────────────────────┐</span><br><span class="line">│         Go code layer                   │</span><br><span class="line">│  (Proxy, QueryNode, DataNode, etc.)     
│</span><br><span class="line">└──────────────┬──────────────────────────┘</span><br><span class="line">               │ CGO 调用</span><br><span class="line">               ↓</span><br><span class="line">┌─────────────────────────────────────────┐</span><br><span class="line">│      C 接口层 (_c.h, _c.cpp)            │</span><br><span class="line">│  封装 C++ 实现，提供 C 兼容接口          │</span><br><span class="line">└──────────────┬──────────────────────────┘</span><br><span class="line">               │</span><br><span class="line">               ↓</span><br><span class="line">┌─────────────────────────────────────────┐</span><br><span class="line">│      C++ 实现层                         │</span><br><span class="line">│  (高性能核心逻辑)                        │</span><br><span class="line">└─────────────────────────────────────────┘</span><br></pre></td></tr></table></figure><h3 id="1-2-CGO-使用方式"><a href="#1-2-CGO-使用方式" class="headerlink" title="1.2 CGO 使用方式"></a>1.2 CGO 使用方式</h3><p><strong>典型示例</strong>：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">package</span> segcore</span><br><span class="line"></span><br><span class="line"><span class="comment">/*</span></span><br><span class="line"><span class="comment">#cgo pkg-config: milvus_core</span></span><br><span 
class="line"><span class="comment"></span></span><br><span class="line"><span class="comment">#include &quot;segcore/collection_c.h&quot;</span></span><br><span class="line"><span class="comment">#include &quot;segcore/segment_c.h&quot;</span></span><br><span class="line"><span class="comment">#include &quot;common/type_c.h&quot;</span></span><br><span class="line"><span class="comment">*/</span></span><br><span class="line"><span class="keyword">import</span> <span class="string">&quot;C&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// Go 代码调用 C 函数</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">CreateCCollection</span><span class="params">(req *CreateCCollectionRequest)</span></span> (*CCollection, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="keyword">var</span> ptr C.CCollection</span><br><span class="line">    status := C.NewCollection(</span><br><span class="line">        unsafe.Pointer(&amp;schemaBlob[<span class="number">0</span>]), </span><br><span class="line">        (C.int64_t)(<span class="built_in">len</span>(schemaBlob)), </span><br><span class="line">        &amp;ptr)</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="二、主要-C-功能模块"><a href="#二、主要-C-功能模块" class="headerlink" title="二、主要 C++ 功能模块"></a>二、主要 C++ 功能模块</h2><h3 id="2-1-索引构建（Index-Building）"><a href="#2-1-索引构建（Index-Building）" class="headerlink" title="2.1 索引构建（Index Building）"></a>2.1 索引构建（Index Building）</h3><p><strong>位置</strong>：<code>internal/core/src/indexbuilder/</code></p><p><strong>CGO 接口</strong>：<code>indexbuilder/index_c.h</code></p><p><strong>Go 包装</strong>：<code>internal/util/indexcgowrapper/</code></p><p><strong>功能</strong>：</p><ul><li>✅ 向量索引构建（IVF、HNSW、FLAT 等）</li><li>✅ 标量索引构建（倒排索引、B+树等）</li><li>✅ 文本索引构建（BM25、N-gram 
etc.)</li><li>✅ Index serialization and deserialization</li><li>✅ Index file management</li></ul><p><strong>Key code</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/util/indexcgowrapper/index.go</span></span><br><span class="line"><span class="keyword">type</span> CgoIndex <span class="keyword">struct</span> &#123;</span><br><span class="line">    indexPtr C.CIndex</span><br><span class="line">    <span class="built_in">close</span>    <span class="type">bool</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">CreateIndex</span><span class="params">(ctx context.Context, buildIndexInfo *indexcgopb.BuildIndexInfo)</span></span> (CodecIndex, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="keyword">var</span> indexPtr C.CIndex</span><br><span class="line">    status := C.CreateIndex(&amp;indexPtr, </span><br><span class="line">        (*C.uint8_t)(unsafe.Pointer(&amp;buildIndexInfoBlob[<span class="number">0</span>])), </span><br><span class="line">        (C.uint64_t)(<span class="built_in">len</span>(buildIndexInfoBlob)))</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>C++ implementation files</strong>:</p><ul><li><code>index_c.cpp</code> - index creation and management</li><li><code>VecIndexCreator.cpp</code> - vector index creation</li><li><code>ScalarIndexCreator.cpp</code> - 
scalar index creation</li></ul><h3 id="2-2-段核心（Segcore）"><a href="#2-2-段核心（Segcore）" class="headerlink" title="2.2 段核心（Segcore）"></a>2.2 Segment Core (Segcore)</h3><p><strong>Location</strong>: <code>internal/core/src/segcore/</code></p><p><strong>CGO interface</strong>: multiple <code>*_c.h</code> files</p><p><strong>Go wrapper</strong>: <code>internal/util/segcore/</code></p><p><strong>Functional modules</strong>:</p><h4 id="2-2-1-Collection-管理"><a href="#2-2-1-Collection-管理" class="headerlink" title="2.2.1 Collection 管理"></a>2.2.1 Collection Management</h4><p><strong>CGO interface</strong>: <code>segcore/collection_c.h</code></p><p><strong>Features</strong>:</p><ul><li>✅ Collection creation and destruction</li><li>✅ Schema management</li><li>✅ Index meta management</li><li>✅ Field loading configuration</li></ul><p><strong>Key code</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/util/segcore/collection.go</span></span><br><span class="line"><span class="keyword">type</span> CCollection <span class="keyword">struct</span> &#123;</span><br><span class="line">    ptr          C.CCollection</span><br><span class="line">    collectionID <span class="type">int64</span></span><br><span class="line">    schema       *schemapb.CollectionSchema</span><br><span class="line">    indexMeta    *segcorepb.CollectionIndexMeta</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span 
class="title">CreateCCollection</span><span class="params">(req *CreateCCollectionRequest)</span></span> (*CCollection, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="keyword">var</span> ptr C.CCollection</span><br><span class="line">    status := C.NewCollection(</span><br><span class="line">        unsafe.Pointer(&amp;schemaBlob[<span class="number">0</span>]), </span><br><span class="line">        (C.int64_t)(<span class="built_in">len</span>(schemaBlob)), </span><br><span class="line">        &amp;ptr)</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="2-2-2-Segment-操作"><a href="#2-2-2-Segment-操作" class="headerlink" title="2.2.2 Segment 操作"></a>2.2.2 Segment Operations</h4><p><strong>CGO interface</strong>: <code>segcore/segment_c.h</code></p><p><strong>Features</strong>:</p><ul><li>✅ Segment creation (Growing&#x2F;Sealed)</li><li>✅ Data insertion (Insert)</li><li>✅ Data deletion (Delete)</li><li>✅ Field data loading (LoadFieldData)</li><li>✅ Index loading (LoadIndex)</li><li>✅ Memory usage statistics</li></ul><p><strong>Key code</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/util/segcore/segment.go</span></span><br><span class="line"><span class="keyword">type</span> cSegmentImpl <span class="keyword">struct</span> &#123;</span><br><span class="line">    id  <span class="type">int64</span></span><br><span class="line">    ptr C.CSegmentInterface</span><br><span 
class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *cSegmentImpl)</span></span> Insert(ctx context.Context, request *InsertRequest) (*InsertResult, <span class="type">error</span>) &#123;</span><br><span class="line">    status := C.Insert(s.ptr,</span><br><span class="line">        (*C.uint8_t)(unsafe.Pointer(&amp;insertBlob[<span class="number">0</span>])),</span><br><span class="line">        (C.int64_t)(<span class="built_in">len</span>(insertBlob)),</span><br><span class="line">        C.int64_t(request.NumRows),</span><br><span class="line">        C.int64_t(request.Timestamp))</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>C++ implementation files</strong>:</p><ul><li><code>segment_c.cpp</code> - Segment C interface</li><li><code>SegmentGrowingImpl.cpp</code> - Growing Segment implementation</li><li><code>ChunkedSegmentSealedImpl.cpp</code> - Sealed Segment implementation</li></ul><h4 id="2-2-3-查询计划（Query-Plan）"><a href="#2-2-3-查询计划（Query-Plan）" class="headerlink" title="2.2.3 查询计划（Query Plan）"></a>2.2.3 Query Plan</h4><p><strong>CGO interface</strong>: <code>segcore/plan_c.h</code></p><p><strong>Features</strong>:</p><ul><li>✅ Query plan generation</li><li>✅ Expression parsing</li><li>✅ Query optimization</li></ul><p><strong>Key code</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/util/segcore/plan.go</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">CreatePlan</span><span 
class="params">(collection *CCollection, planBlob []<span class="type">byte</span>)</span></span> (CPlan, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="keyword">var</span> planPtr C.CPlan</span><br><span class="line">    status := C.CreatePlan(collection.rawPointer(),</span><br><span class="line">        (*C.uint8_t)(unsafe.Pointer(&amp;planBlob[<span class="number">0</span>])),</span><br><span class="line">        (C.int64_t)(<span class="built_in">len</span>(planBlob)),</span><br><span class="line">        &amp;planPtr)</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="2-2-4-结果归并（Reduce）"><a href="#2-2-4-结果归并（Reduce）" class="headerlink" title="2.2.4 结果归并（Reduce）"></a>2.2.4 Result Merging (Reduce)</h4><p><strong>CGO interface</strong>: <code>segcore/reduce_c.h</code></p><p><strong>Features</strong>:</p><ul><li>✅ Merging query results across multiple segments</li><li>✅ TopK result merging</li><li>✅ Aggregation</li></ul><p><strong>Key code</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/util/segcore/reduce.go</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ReduceSearchResults</span><span class="params">(ctx context.Context, </span></span></span><br><span class="line"><span class="params"><span class="function">    plan CPlan, </span></span></span><br><span class="line"><span class="params"><span class="function">    searchResults []*SearchResult, </span></span></span><br><span class="line"><span class="params"><span class="function">    numSegments <span 
class="type">int64</span>)</span></span> (*SearchResult, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// Call the C++ reduce function</span></span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>C++ implementation files</strong>:</p><ul><li><code>reduce_c.cpp</code> - Reduce C interface</li><li><code>reduce/Reduce.cpp</code> - Reduce implementation</li><li><code>reduce/GroupReduce.cpp</code> - grouped Reduce</li></ul><h4 id="2-2-5-向量索引操作"><a href="#2-2-5-向量索引操作" class="headerlink" title="2.2.5 向量索引操作"></a>2.2.5 Vector Index Operations</h4><p><strong>CGO interface</strong>: <code>segcore/vector_index_c.h</code></p><p><strong>Features</strong>:</p><ul><li>✅ Vector index loading</li><li>✅ Vector search</li><li>✅ Index management</li></ul><p><strong>Key code</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/util/vecindexmgr/vector_index_mgr.go</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">LoadVectorIndex</span><span class="params">(ctx context.Context, </span></span></span><br><span class="line"><span class="params"><span class="function">    indexBlob []<span class="type">byte</span>, </span></span></span><br><span class="line"><span class="params"><span class="function">    indexParams <span class="keyword">map</span>[<span class="type">string</span>]<span class="type">string</span>)</span></span> (C.CVectorIndex, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// Call C++ to load the vector index</span></span><br><span class="line">    <span class="comment">// ...</span></span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="2-2-6-字段数据加载"><a href="#2-2-6-字段数据加载" class="headerlink" title="2.2.6 字段数据加载"></a>2.2.6 Field Data Loading</h4><p><strong>CGO interface</strong>: <code>segcore/load_field_data_c.h</code></p><p><strong>Features</strong>:</p><ul><li>✅ Loading field data from storage</li><li>✅ Data format conversion</li><li>✅ Memory management</li></ul><p><strong>C++ implementation files</strong>:</p><ul><li><code>load_field_data_c.cpp</code> - field data loading C interface</li></ul><h4 id="2-2-7-索引加载"><a href="#2-2-7-索引加载" class="headerlink" title="2.2.7 索引加载"></a>2.2.7 Index Loading</h4><p><strong>CGO interface</strong>: <code>segcore/load_index_c.h</code></p><p><strong>Features</strong>:</p><ul><li>✅ Index file loading</li><li>✅ Index memory mapping</li><li>✅ Index validation</li></ul><p><strong>C++ implementation files</strong>:</p><ul><li><code>load_index_c.cpp</code> - index loading C interface</li></ul><h3 id="2-3-存储（Storage）"><a href="#2-3-存储（Storage）" class="headerlink" title="2.3 存储（Storage）"></a>2.3 Storage</h3><p><strong>Location</strong>: <code>internal/core/src/storage/</code></p><p><strong>CGO interface</strong>: <code>storage/storage_c.h</code>, <code>segcore/packed_writer_c.h</code>, <code>segcore/packed_reader_c.h</code></p><p><strong>Go wrapper</strong>: <code>internal/storagev2/packed/</code></p><p><strong>Features</strong>:</p><h4 id="2-3-1-Parquet-读写"><a href="#2-3-1-Parquet-读写" class="headerlink" title="2.3.1 Parquet 读写"></a>2.3.1 Parquet Read and Write</h4><p><strong>Features</strong>:</p><ul><li>✅ Writing Arrow RecordBatch → Parquet files</li><li>✅ Reading Parquet files → Arrow RecordBatch</li><li>✅ Column group management</li><li>✅ Compression and encoding</li></ul><p><strong>Key code</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 
class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/storagev2/packed/packed_writer.go</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(pw *PackedWriter)</span></span> WriteRecordBatch(recordBatch arrow.Record) <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// Export Arrow data into C structures</span></span><br><span class="line">    cArrays := <span class="built_in">make</span>([]CArrowArray, recordBatch.NumCols())</span><br><span class="line">    cSchemas := <span class="built_in">make</span>([]CArrowSchema, recordBatch.NumCols())</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> i := <span class="keyword">range</span> recordBatch.NumCols() &#123;</span><br><span class="line">        <span class="keyword">var</span> caa cdata.CArrowArray</span><br><span class="line">        <span class="keyword">var</span> cas cdata.CArrowSchema</span><br><span class="line">        cdata.ExportArrowArray(recordBatch.Column(<span class="type">int</span>(i)), &amp;caa, &amp;cas)</span><br><span class="line">        cArrays[i] = *(*CArrowArray)(unsafe.Pointer(&amp;caa))</span><br><span class="line">        cSchemas[i] = *(*CArrowSchema)(unsafe.Pointer(&amp;cas))</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Call C++ to write Parquet</span></span><br><span class="line">    status := C.WriteRecordBatch(pw.cPackedWriter, </span><br><span class="line">        &amp;cArrays[<span class="number">0</span>], </span><br><span class="line">        &amp;cSchemas[<span class="number">0</span>], 
</span><br><span class="line">        cSchema)</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>C++ implementation files</strong>:</p><ul><li><code>packed_writer_c.cpp</code> - Parquet write C interface</li><li><code>packed_reader_c.cpp</code> - Parquet read C interface</li><li><code>loon_ffi/ffi_writer_c.cpp</code> - FFI write interface</li><li><code>loon_ffi/ffi_reader_c.cpp</code> - FFI read interface</li></ul><h4 id="2-3-2-存储抽象"><a href="#2-3-2-存储抽象" class="headerlink" title="2.3.2 存储抽象"></a>2.3.2 Storage Abstraction</h4><p><strong>Features</strong>:</p><ul><li>✅ File system abstraction</li><li>✅ Object storage support (S3, MinIO, etc.)</li><li>✅ Local file system support</li></ul><p><strong>C++ implementation files</strong>:</p><ul><li><code>storage_c.cpp</code> - storage C interface</li></ul><h3 id="2-4-表达式执行（Expression-Execution）"><a href="#2-4-表达式执行（Expression-Execution）" class="headerlink" title="2.4 表达式执行（Expression Execution）"></a>2.4 Expression Execution</h3><p><strong>Location</strong>: <code>internal/core/src/exec/expression/</code></p><p><strong>Features</strong>:</p><ul><li>✅ Expression parsing and compilation</li><li>✅ Scalar function execution</li><li>✅ Vector function execution</li><li>✅ JSON expression handling</li><li>✅ Filter condition evaluation</li></ul><p><strong>C++ implementation files</strong>:</p><ul><li><code>function/init_c.cpp</code> - expression function initialization</li></ul><h3 id="2-5-聚类分析（Clustering）"><a href="#2-5-聚类分析（Clustering）" class="headerlink" title="2.5 聚类分析（Clustering）"></a>2.5 Clustering</h3><p><strong>Location</strong>: <code>internal/core/src/clustering/</code></p><p><strong>CGO interface</strong>: <code>clustering/analyze_c.h</code></p><p><strong>Go wrapper</strong>: <code>internal/util/analyzecgowrapper/</code></p><p><strong>Features</strong>:</p><ul><li>✅ K-means clustering</li><li>✅ Data analysis and statistics</li></ul><p><strong>Key code</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span 
class="comment">// internal/util/analyzecgowrapper/helper.go</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">Analyze</span><span class="params">(ctx context.Context, </span></span></span><br><span class="line"><span class="params"><span class="function">    analyzeInfo *indexcgopb.AnalyzeInfo)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// Call the C++ clustering analysis</span></span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>C++ implementation files</strong>:</p><ul><li><code>analyze_c.cpp</code> - clustering analysis C interface</li><li><code>KmeansClustering.cpp</code> - K-means implementation</li></ul><h3 id="2-6-监控（Monitor）"><a href="#2-6-监控（Monitor）" class="headerlink" title="2.6 监控（Monitor）"></a>2.6 Monitor</h3><p><strong>Location</strong>: <code>internal/core/src/monitor/</code></p><p><strong>CGO interface</strong>: <code>monitor/monitor_c.h</code></p><p><strong>Features</strong>:</p><ul><li>✅ Performance metrics collection</li><li>✅ Resource usage monitoring</li><li>✅ Metrics export</li></ul><p><strong>C++ implementation files</strong>:</p><ul><li><code>monitor_c.cpp</code> - monitor C interface</li><li><code>Monitor.cpp</code> - monitor implementation</li></ul><h3 id="2-7-Tokenizer"><a href="#2-7-Tokenizer" class="headerlink" title="2.7 Tokenizer"></a>2.7 Tokenizer</h3><p><strong>Location</strong>: <code>internal/core/src/segcore/</code></p><p><strong>CGO interface</strong>: <code>segcore/tokenizer_c.h</code>, <code>segcore/token_stream_c.h</code></p><p><strong>Features</strong>:</p><ul><li>✅ Text tokenization</li><li>✅ Token stream processing</li><li>✅ Support for multiple tokenizers</li></ul><p><strong>C++ implementation files</strong>:</p><ul><li><code>tokenizer_c.cpp</code> - Tokenizer C interface</li><li><code>token_stream_c.cpp</code> - Token Stream C interface</li></ul><h3 id="2-8-Future（异步操作）"><a href="#2-8-Future（异步操作）" class="headerlink" title="2.8 Future（异步操作）"></a>2.8 Future (Asynchronous Operations)</h3><p><strong>Location</strong>: <code>internal/core/src/futures/</code></p><p><strong>CGO 
interface</strong>: <code>futures/future_c.h</code></p><p><strong>Go wrapper</strong>: <code>internal/util/cgo/futures.go</code></p><p><strong>Features</strong>:</p><ul><li>✅ Asynchronous operation support</li><li>✅ Future&#x2F;Promise pattern</li><li>✅ Asynchronous search and retrieval</li></ul><p><strong>Key code</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/util/cgo/futures.go</span></span><br><span class="line"><span class="keyword">type</span> CFuturePtr = C.CFuturePtr</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">WaitForFuture</span><span class="params">(future CFuturePtr)</span></span> (*C.CProto, <span class="type">error</span>) &#123;</span><br><span class="line">    result := C.WaitAndGetFuture(future)</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>C++ implementation files</strong>:</p><ul><li><code>future_c.cpp</code> - Future C interface</li><li><code>Future.cpp</code> - Future implementation</li></ul><h3 id="2-9-向量索引检查"><a href="#2-9-向量索引检查" class="headerlink" title="2.9 向量索引检查"></a>2.9 Vector Index Check</h3><p><strong>Location</strong>: <code>internal/core/src/segcore/</code></p><p><strong>CGO interface</strong>: <code>segcore/check_vec_index_c.h</code></p><p><strong>Go wrapper</strong>: <code>internal/proxy/cgo_util.go</code></p><p><strong>Features</strong>:</p><ul><li>✅ Vector index existence check</li><li>✅ Index type validation</li></ul><p><strong>Key code</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/proxy/cgo_util.go</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">CheckVecIndexWithDataTypeExist</span><span class="params">(name <span class="type">string</span>, </span></span></span><br><span class="line"><span class="params"><span class="function">    dataType schemapb.DataType, </span></span></span><br><span class="line"><span class="params"><span class="function">    indexType <span class="type">string</span>)</span></span> (<span class="type">bool</span>, <span class="type">error</span>) &#123;</span><br><span class="line">    cName := C.CString(name)</span><br><span class="line">    <span class="keyword">defer</span> C.free(unsafe.Pointer(cName))</span><br><span class="line">    </span><br><span class="line">    cIndexType := C.CString(indexType)</span><br><span class="line">    <span class="keyword">defer</span> C.free(unsafe.Pointer(cIndexType))</span><br><span class="line">    </span><br><span class="line">    ret := C.CheckVecIndexWithDataTypeExist(cName, </span><br><span class="line">        C.int32_t(<span class="type">int32</span>(dataType)), </span><br><span class="line">        cIndexType)</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="三、CGO-接口设计模式"><a href="#三、CGO-接口设计模式" class="headerlink" title="三、CGO 接口设计模式"></a>3. CGO Interface Design Patterns</h2><h3 id="3-1-错误处理"><a href="#3-1-错误处理" class="headerlink" title="3.1 错误处理"></a>3.1 Error Handling</h3><p><strong>C struct</strong>:</p><figure class="highlight 
c"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">typedef</span> <span class="class"><span class="keyword">struct</span> <span class="title">CStatus</span> &#123;</span></span><br><span class="line">    <span class="type">int32_t</span> error_code;</span><br><span class="line">    <span class="type">char</span>* error_msg;</span><br><span class="line">&#125; CStatus;</span><br></pre></td></tr></table></figure><p><strong>Go handling</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ConsumeCStatusIntoError</span><span class="params">(status *C.CStatus)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">    <span class="keyword">if</span> status.error_code == <span class="number">0</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    errorCode := status.error_code</span><br><span class="line">    errorMsg := C.GoString(status.error_msg)</span><br><span class="line">    C.free(unsafe.Pointer(status.error_msg))</span><br><span class="line">    <span class="keyword">return</span> merr.SegcoreError(<span class="type">int32</span>(errorCode), errorMsg)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="3-2-内存管理"><a href="#3-2-内存管理" class="headerlink" title="3.2 
内存管理"></a>3.2 Memory Management</h3><p><strong>Principles</strong>:</p><ul><li>Memory allocated by C is freed by C</li><li>Go strings passed to C must first be converted to C strings</li><li>Use <code>defer C.free()</code> to make sure the memory is released</li></ul><p><strong>Example</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">cStr := C.CString(goString)</span><br><span class="line"><span class="keyword">defer</span> C.free(unsafe.Pointer(cStr))</span><br><span class="line"></span><br><span class="line"><span class="comment">// Call the C function</span></span><br><span class="line">C.SomeFunction(cStr)</span><br></pre></td></tr></table></figure><h3 id="3-3-数据转换"><a href="#3-3-数据转换" class="headerlink" title="3.3 数据转换"></a>3.3 Data Conversion</h3><p><strong>Protobuf conversion</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Go Protobuf → C memory</span></span><br><span class="line">schemaBlob, err := proto.Marshal(req.Schema)</span><br><span class="line">status := C.NewCollection(</span><br><span class="line">    unsafe.Pointer(&amp;schemaBlob[<span class="number">0</span>]), </span><br><span class="line">    (C.int64_t)(<span class="built_in">len</span>(schemaBlob)), </span><br><span class="line">    &amp;ptr)</span><br><span class="line"></span><br><span class="line"><span class="comment">// C memory → Go Protobuf</span></span><br><span 
class="line"><span class="function"><span class="keyword">func</span> <span class="title">UnmarshalProtoLayout</span><span class="params">(protoLayout any, msg proto.Message)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">    layout := unsafe.Pointer(reflect.ValueOf(protoLayout).Pointer())</span><br><span class="line">    cProtoLayout := (*C.ProtoLayout)(layout)</span><br><span class="line">    blob := (*(*[math.MaxInt32]<span class="type">byte</span>)(cProtoLayout.blob))[:<span class="type">int</span>(cProtoLayout.size)]</span><br><span class="line">    <span class="keyword">return</span> proto.Unmarshal(blob, msg)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Arrow data conversion</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Arrow RecordBatch → C Arrow Array</span></span><br><span class="line"><span class="keyword">var</span> caa cdata.CArrowArray</span><br><span class="line"><span class="keyword">var</span> cas cdata.CArrowSchema</span><br><span class="line">cdata.ExportArrowArray(recordBatch.Column(i), &amp;caa, &amp;cas)</span><br><span class="line">cArray := (*CArrowArray)(unsafe.Pointer(&amp;caa))</span><br></pre></td></tr></table></figure><h2 id="四、性能优化考虑"><a href="#四、性能优化考虑" class="headerlink" title="四、性能优化考虑"></a>4. Performance Considerations</h2><h3 id="4-1-CGO-调用开销"><a href="#4-1-CGO-调用开销" class="headerlink" title="4.1 CGO 调用开销"></a>4.1 CGO Call Overhead</h3><p><strong>Issues</strong>:</p><ul><li>Each CGO call has a fixed overhead (roughly 50-100 ns; the exact figure depends on the Go version and hardware)</li><li>Data is copied across the language boundary</li><li>Thread-switching overhead</li></ul><p><strong>Optimization strategies</strong>:</p><ol><li><strong>Batching</strong>: reduce the number of CGO calls</li><li><strong>Zero copy</strong>: use the Arrow C Data Interface</li><li><strong>Asynchronous operations</strong>: use the Future pattern</li><li><strong>Thread pool</strong>: reuse 
CGO threads</li></ol><h3 id="4-2-线程管理"><a href="#4-2-线程管理" class="headerlink" title="4.2 线程管理"></a>4.2 Thread Management</h3><p><strong>CGO thread binding</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/proxy/cgo_util.go</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">initDynamicPool</span><span class="params">()</span></span> &#123;</span><br><span class="line">    pool := conc.NewPool[any](</span><br><span class="line">        hardware.GetCPUNum(),</span><br><span class="line">        conc.WithPreHandler(runtime.LockOSThread), <span class="comment">// lock the OS thread</span></span><br><span class="line">    )</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Reasons</strong>:</p><ul><li>Keeps each worker's CGO calls on the same OS thread</li><li>Avoids thread-switching overhead</li><li>Helps preserve thread safety for C++ code that depends on thread-local state</li></ul><h3 id="4-3-内存管理优化"><a href="#4-3-内存管理优化" class="headerlink" title="4.3 内存管理优化"></a>4.3 Memory Management Optimization</h3><p><strong>Strategies</strong>:</p><ul><li>Use object pools to reduce allocations</li><li>Preallocate buffers</li><li>Free C memory promptly</li></ul><h2 id="五、主要使用场景"><a href="#五、主要使用场景" class="headerlink" title="五、主要使用场景"></a>5. Main Usage Scenarios</h2><h3 id="5-1-QueryNode-中的使用"><a href="#5-1-QueryNode-中的使用" class="headerlink" title="5.1 QueryNode 中的使用"></a>5.1 QueryNode Usage</h3><p><strong>Scenarios</strong>:</p><ul><li>Segment loading and management</li><li>Query execution (Search&#x2F;Retrieve)</li><li>Result merging</li><li>Index loading</li></ul><p><strong>Key files</strong>:</p><ul><li><code>internal/querynodev2/segments/segment.go</code></li><li><code>internal/util/segcore/</code></li></ul><h3 id="5-2-DataNode-中的使用"><a href="#5-2-DataNode-中的使用" class="headerlink" title="5.2 DataNode 中的使用"></a>5.2 DataNode 
Usage</h3><p><strong>Scenarios</strong>:</p><ul><li>Index building</li><li>Field data loading</li><li>Storage reads and writes</li></ul><p><strong>Key files</strong>:</p><ul><li><code>internal/util/indexcgowrapper/</code></li><li><code>internal/datanode/index/</code></li></ul><h3 id="5-3-Proxy-中的使用"><a href="#5-3-Proxy-中的使用" class="headerlink" title="5.3 Proxy 中的使用"></a>5.3 Proxy Usage</h3><p><strong>Scenarios</strong>:</p><ul><li>Vector index checks</li><li>Parameter validation</li></ul><p><strong>Key files</strong>:</p><ul><li><code>internal/proxy/cgo_util.go</code></li></ul><h3 id="5-4-StorageV2-写入流程"><a href="#5-4-StorageV2-写入流程" class="headerlink" title="5.4 StorageV2 写入流程"></a>5.4 StorageV2 Write Path</h3><p><strong>Scenarios</strong>:</p><ul><li>Arrow Record → Parquet files</li><li>Column group management</li><li>Compression and encoding</li></ul><p><strong>Key files</strong>:</p><ul><li><code>internal/storagev2/packed/packed_writer.go</code></li><li><code>internal/core/src/segcore/packed_writer_c.cpp</code></li></ul><h2 id="六、C-核心模块总结"><a href="#六、C-核心模块总结" class="headerlink" title="六、C++ 核心模块总结"></a>6. Summary of C++ Core Modules</h2><table><thead><tr><th>Module</th><th>CGO interface</th><th>Go wrapper</th><th>Main function</th></tr></thead><tbody><tr><td><strong>Index building</strong></td><td><code>indexbuilder/index_c.h</code></td><td><code>internal/util/indexcgowrapper/</code></td><td>Vector&#x2F;scalar&#x2F;text index building</td></tr><tr><td><strong>Collection</strong></td><td><code>segcore/collection_c.h</code></td><td><code>internal/util/segcore/collection.go</code></td><td>Collection management</td></tr><tr><td><strong>Segment</strong></td><td><code>segcore/segment_c.h</code></td><td><code>internal/util/segcore/segment.go</code></td><td>Segment 
operations (Insert&#x2F;Delete&#x2F;Load)</td></tr><tr><td><strong>Query plan</strong></td><td><code>segcore/plan_c.h</code></td><td><code>internal/util/segcore/plan.go</code></td><td>Query plan generation</td></tr><tr><td><strong>Result reduce</strong></td><td><code>segcore/reduce_c.h</code></td><td><code>internal/util/segcore/reduce.go</code></td><td>Merging results across segments</td></tr><tr><td><strong>Vector index</strong></td><td><code>segcore/vector_index_c.h</code></td><td><code>internal/util/vecindexmgr/</code></td><td>Vector index operations</td></tr><tr><td><strong>Storage</strong></td><td><code>storage/storage_c.h</code></td><td><code>internal/storagev2/packed/</code></td><td>Parquet reads and writes</td></tr><tr><td><strong>Parquet write</strong></td><td><code>segcore/packed_writer_c.h</code></td><td><code>internal/storagev2/packed/packed_writer.go</code></td><td>Arrow → Parquet</td></tr><tr><td><strong>Parquet read</strong></td><td><code>segcore/packed_reader_c.h</code></td><td><code>internal/storagev2/packed/packed_reader.go</code></td><td>Parquet → Arrow</td></tr><tr><td><strong>Clustering analysis</strong></td><td><code>clustering/analyze_c.h</code></td><td><code>internal/util/analyzecgowrapper/</code></td><td>K-means clustering</td></tr><tr><td><strong>Monitoring</strong></td><td><code>monitor/monitor_c.h</code></td><td>-</td><td>Performance monitoring</td></tr><tr><td><strong>Tokenizer</strong></td><td><code>segcore/tokenizer_c.h</code></td><td>-</td><td>Text tokenization</td></tr><tr><td><strong>Future</strong></td><td><code>futures/future_c.h</code></td><td><code>internal/util/cgo/futures.go</code></td><td>Asynchronous operations</td></tr></tbody></table><h2 id="七、总结"><a href="#七、总结" class="headerlink" title="七、总结"></a>7. Summary</h2><h3 id="7-1-设计优势"><a href="#7-1-设计优势" class="headerlink" title="7.1 设计优势"></a>7.1 Design Advantages</h3><ol><li>✅ <strong>Performance</strong>: the core path is written in C++ and takes full advantage of hardware optimizations</li><li>✅ <strong>Maintainability</strong>: the Go code is concise and easy to maintain</li><li>✅ <strong>Ecosystem</strong>: builds on the C++ ecosystem (Arrow, Parquet, Knowhere, and others)</li><li>✅ <strong>Flexibility</strong>: functionality can be migrated to C++ incrementally</li></ol><h3 id="7-2-关键设计原则"><a href="#7-2-关键设计原则" class="headerlink" title="7.2 关键设计原则"></a>7.2 
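Key Design Principles</h3><ol><li><strong>Clear interface boundary</strong>: a C interface wraps the C++ implementation</li><li><strong>Unified error handling</strong>: CStatus gives every call the same error format</li><li><strong>Memory safety</strong>: responsibility for memory management is explicit</li><li><strong>Performance optimization</strong>: fewer CGO calls, and zero-copy where possible</li></ol><h3 id="7-3-未来方向"><a href="#7-3-未来方向" class="headerlink" title="7.3 未来方向"></a>7.3 Future Directions</h3><ul><li>Keep reducing CGO call overhead</li><li>Migrate more functionality to C++</li><li>Better asynchronous support</li><li>Improve the memory management strategy</li></ul><p>Milvus's C++&#x2F;CGO integration gives the system a good balance between high performance and maintainability, and is a solid example of a mixed-language architecture.</p>]]></content>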
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;概述&quot;&gt;&lt;/a&gt;Overview&lt;/h2&gt;&lt;p&gt;Milvus uses a hybrid architecture: the performance-critical core paths are implemented in C++ and exposed to Go code through CGO (C Go). This design preserves performance while also</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Milvus StorageV2 Write Flow Analysis</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/12_storagev2_write_flow_analysis/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/12_storagev2_write_flow_analysis/</id>
    <published>2025-08-09T15:00:00.000Z</published>
    <updated>2026-01-06T13:10:48.068Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p>StorageV2 is Milvus's next-generation storage format. It is built on the Apache Arrow + Parquet stack and, compared with StorageV1 (the Binlog format), offers a higher compression ratio and better query performance. This document walks through the complete StorageV2 write flow in detail.</p><h2 id="一、整体架构"><a href="#一、整体架构" class="headerlink" title="一、整体架构"></a>1. Overall Architecture</h2><h3 id="1-1-数据流向"><a href="#1-1-数据流向" class="headerlink" title="1.1 数据流向"></a>1.1 Data Flow</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line">User Insert/Delete request</span><br><span class="line">    ↓</span><br><span class="line">WriteBuffer.BufferData()        // buffer the data in memory</span><br><span class="line">    ↓</span><br><span class="line">triggerSync()                   // trigger a sync</span><br><span class="line">    ↓</span><br><span class="line">getSyncTask()                   // build the SyncTask</span><br><span class="line">    ├─ yieldBuffer()            // take the data out of the buffer</span><br><span class="line">    └─ NewSyncTask()            // create the sync task</span><br><span class="line">    ↓</span><br><span class="line">SyncTask.Run()</span><br><span class="line">    ↓</span><br><span class="line">BulkPackWriterV2.Write()        // StorageV2 write</span><br><span class="line">    ├─ serializeBinlog()        // InsertData → Arrow 
Record</span><br><span class="line">    ├─ writeInserts()           // Arrow Record → Parquet</span><br><span class="line">    ├─ writeStats()             // write the stats</span><br><span class="line">    ├─ writeDelta()             // write the delete data</span><br><span class="line">    └─ writeBM25Stasts()        // write the BM25 stats</span><br><span class="line">    ↓</span><br><span class="line">PackedRecordWriter.Write()      // write the Parquet file</span><br><span class="line">    ↓</span><br><span class="line">Object storage (MinIO/S3/Local)</span><br></pre></td></tr></table></figure><h3 id="1-2-关键组件"><a href="#1-2-关键组件" class="headerlink" title="1.2 关键组件"></a>1.2 Key Components</h3><table><thead><tr><th>Component</th><th>Responsibility</th><th>Location</th></tr></thead><tbody><tr><td><strong>WriteBuffer</strong></td><td>In-memory buffer that holds incoming writes</td><td><code>internal/flushcommon/writebuffer/</code></td></tr><tr><td><strong>SyncTask</strong></td><td>Sync task that writes the data to storage</td><td><code>internal/flushcommon/syncmgr/task.go</code></td></tr><tr><td><strong>BulkPackWriterV2</strong></td><td>StorageV2 writer</td><td><code>internal/flushcommon/syncmgr/pack_writer_v2.go</code></td></tr><tr><td><strong>PackedRecordWriter</strong></td><td>Arrow Record → Parquet converter</td><td><code>internal/storage/record_writer.go</code></td></tr><tr><td><strong>PackedWriter</strong></td><td>Low-level Parquet writer (C++ FFI)</td><td><code>internal/storagev2/packed/</code></td></tr></tbody></table><h2 id="二、详细流程分析"><a href="#二、详细流程分析" class="headerlink" title="二、详细流程分析"></a>2. Detailed Flow Analysis</h2><h3 id="2-1-数据准备阶段（WriteBuffer-→-SyncPack）"><a href="#2-1-数据准备阶段（WriteBuffer-→-SyncPack）" class="headerlink" title="2.1 数据准备阶段（WriteBuffer → SyncPack）"></a>2.1 Data Preparation (WriteBuffer → SyncPack)</h3><p><strong>File</strong>: <code>internal/flushcommon/writebuffer/write_buffer.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(wb *writeBufferBase)</span></span> getSyncTask(ctx context.Context, segmentID <span class="type">int64</span>) (syncmgr.Task, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// 1. Extract the data from the WriteBuffer</span></span><br><span class="line">    insert, bm25, delta, schema, timeRange, startPos := wb.yieldBuffer(segmentID)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Build the SyncPack</span></span><br><span class="line">    pack := &amp;syncmgr.SyncPack&#123;&#125;</span><br><span class="line">    pack.WithInsertData(insert).      <span class="comment">// slice of InsertData</span></span><br><span class="line">        WithDeleteData(delta).         <span class="comment">// DeleteData</span></span><br><span class="line">        WithBM25Stats(bm25).           <span class="comment">// BM25 stats</span></span><br><span class="line">        WithCollectionID(wb.collectionID).</span><br><span class="line">        WithPartitionID(segmentInfo.PartitionID()).</span><br><span class="line">        WithSegmentID(segmentID).</span><br><span class="line">        WithTimeRange(tsFrom, tsTo).   <span class="comment">// timestamp range</span></span><br><span class="line">        WithBatchRows(batchSize)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
Create the SyncTask</span></span><br><span class="line">    task := syncmgr.NewSyncTask(...)</span><br><span class="line">    <span class="keyword">return</span> task, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key data structures</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// InsertData holds the data for every field</span></span><br><span class="line"><span class="keyword">type</span> InsertData <span class="keyword">struct</span> &#123;</span><br><span class="line">    Data <span class="keyword">map</span>[FieldID]FieldData  <span class="comment">// map[fieldID]FieldData</span></span><br><span class="line">    <span class="comment">// method (not a struct field): RowNum() int</span></span><br><span class="line">    <span class="comment">// method (not a struct field): GetMemorySize() int64</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// SyncPack holds all the data for a single sync</span></span><br><span class="line"><span class="keyword">type</span> SyncPack <span class="keyword">struct</span> &#123;</span><br><span class="line">    insertData []*storage.InsertData  <span class="comment">// insert data</span></span><br><span class="line">    deltaData  *storage.DeleteData     <span class="comment">// delete data</span></span><br><span class="line">    bm25Stats  <span class="keyword">map</span>[<span 
class="type">int64</span>]*storage.BM25Stats</span><br><span class="line">    tsFrom, tsTo typeutil.Timestamp   <span class="comment">// timestamp range</span></span><br><span class="line">    collectionID, partitionID, segmentID <span class="type">int64</span></span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="2-2-SyncTask-执行阶段"><a href="#2-2-SyncTask-执行阶段" class="headerlink" title="2.2 SyncTask 执行阶段"></a>2.2 SyncTask Execution</h3><p><strong>File</strong>: <code>internal/flushcommon/syncmgr/task.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *SyncTask)</span></span> Run(ctx context.Context) (err <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// 1. Look up the segment info and the column-group configuration</span></span><br><span class="line">    segmentInfo, has := t.metacache.GetSegmentByID(t.segmentID)</span><br><span class="line">    columnGroups := t.getColumnGroups(segmentInfo)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 
Pick the writer based on StorageVersion</span></span><br><span class="line">    <span class="keyword">switch</span> segmentInfo.GetStorageVersion() &#123;</span><br><span class="line">    <span class="keyword">case</span> storage.StorageV2:</span><br><span class="line">        writer := NewBulkPackWriterV2(</span><br><span class="line">            t.metacache, t.schema, t.chunkManager, t.allocator,</span><br><span class="line">            <span class="number">0</span>, packed.DefaultMultiPartUploadSize,</span><br><span class="line">            t.storageConfig, columnGroups, t.writeRetryOpts...)</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// 3. Run the write</span></span><br><span class="line">        t.insertBinlogs, t.deltaBinlog, t.statsBinlogs, </span><br><span class="line">        t.bm25Binlogs, t.manifestPath, t.flushedSize, err = </span><br><span class="line">            writer.Write(ctx, t.pack)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="2-3-BulkPackWriterV2-写入阶段"><a href="#2-3-BulkPackWriterV2-写入阶段" class="headerlink" title="2.3 BulkPackWriterV2 写入阶段"></a>2.3 BulkPackWriterV2 Write Stage</h3><p><strong>File</strong>: <code>internal/flushcommon/syncmgr/pack_writer_v2.go</code></p><h4 id="2-3-1-Write-方法主流程"><a href="#2-3-1-Write-方法主流程" class="headerlink" title="2.3.1 Write 方法主流程"></a>2.3.1 Main Flow of the Write Method</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span 
class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(bw *BulkPackWriterV2)</span></span> Write(ctx context.Context, pack *SyncPack) (</span><br><span class="line">    inserts <span class="keyword">map</span>[<span class="type">int64</span>]*datapb.FieldBinlog,</span><br><span class="line">    deltas *datapb.FieldBinlog,</span><br><span class="line">    stats <span class="keyword">map</span>[<span class="type">int64</span>]*datapb.FieldBinlog,</span><br><span class="line">    bm25Stats <span class="keyword">map</span>[<span class="type">int64</span>]*datapb.FieldBinlog,</span><br><span class="line">    manifest <span class="type">string</span>,</span><br><span class="line">    size <span class="type">int64</span>,</span><br><span class="line">    err <span class="type">error</span>,</span><br><span class="line">) &#123;</span><br><span class="line">    <span class="comment">// 1. Pre-allocate IDs</span></span><br><span class="line">    err = bw.prefetchIDs(pack)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 
Write the insert data (the core step)</span></span><br><span class="line">    <span class="keyword">if</span> inserts, manifest, err = bw.writeInserts(ctx, pack); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Write the stats</span></span><br><span class="line">    <span class="keyword">if</span> stats, err = bw.writeStats(ctx, pack); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Write the delete data</span></span><br><span class="line">    <span class="keyword">if</span> deltas, err = bw.writeDelta(ctx, pack); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. 
Write the BM25 stats</span></span><br><span class="line">    <span class="keyword">if</span> bm25Stats, err = bw.writeBM25Stasts(ctx, pack); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    size = bw.sizeWritten</span><br><span class="line">    <span class="keyword">return</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="2-3-2-serializeBinlog：InsertData-→-Arrow-Record"><a href="#2-3-2-serializeBinlog：InsertData-→-Arrow-Record" class="headerlink" title="2.3.2 serializeBinlog：InsertData → Arrow Record"></a>2.3.2 serializeBinlog: InsertData → Arrow Record</h4><p><strong>Key step</strong>: convert the in-memory InsertData into an Arrow Record</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span 
class="line">35</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(bw *BulkPackWriterV2)</span></span> serializeBinlog(_ context.Context, pack *SyncPack) (storage.Record, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="keyword">if</span> <span class="built_in">len</span>(pack.insertData) == <span class="number">0</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. Convert the Milvus schema into an Arrow schema</span></span><br><span class="line">    arrowSchema, err := storage.ConvertToArrowSchema(bw.schema, <span class="literal">true</span>)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Create the Arrow RecordBuilder</span></span><br><span class="line">    builder := array.NewRecordBuilder(memory.DefaultAllocator, arrowSchema)</span><br><span class="line">    <span class="keyword">defer</span> builder.Release()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
Walk every InsertData chunk and feed it into the record builder</span></span><br><span class="line">    <span class="keyword">for</span> _, chunk := <span class="keyword">range</span> pack.insertData &#123;</span><br><span class="line">        <span class="keyword">if</span> err := storage.BuildRecord(builder, chunk, bw.schema); err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Create the Arrow Record</span></span><br><span class="line">    rec := builder.NewRecord()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. Build the FieldID → column index mapping</span></span><br><span class="line">    allFields := typeutil.GetAllFieldSchemas(bw.schema)</span><br><span class="line">    field2Col := <span class="built_in">make</span>(<span class="keyword">map</span>[storage.FieldID]<span class="type">int</span>, <span class="built_in">len</span>(allFields))</span><br><span class="line">    <span class="keyword">for</span> c, field := <span class="keyword">range</span> allFields &#123;</span><br><span class="line">        field2Col[field.FieldID] = c</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 6. 
Return a SimpleArrowRecord (which carries the FieldID mapping)</span></span><br><span class="line">    <span class="keyword">return</span> storage.NewSimpleArrowRecord(rec, field2Col), <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>BuildRecord in detail</strong> (<code>internal/storage/serde.go</code>):</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">BuildRecord</span><span class="params">(b 
*array.RecordBuilder, data *InsertData, schema *schemapb.CollectionSchema)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">    idx := <span class="number">0</span></span><br><span class="line">    </span><br><span class="line">    serializeField := <span class="function"><span class="keyword">func</span><span class="params">(field *schemapb.FieldSchema)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">        fBuilder := b.Field(idx)  <span class="comment">// builder for this field's column</span></span><br><span class="line">        idx++</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Fetch the field's data</span></span><br><span class="line">        fieldData, exists := data.Data[field.FieldID]</span><br><span class="line">        <span class="keyword">if</span> !exists &#123;</span><br><span class="line">            <span class="keyword">return</span> merr.WrapErrFieldNotFound(field.FieldID, ...)</span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Look up the serializer for this field's data type</span></span><br><span class="line">        typeEntry := serdeMap[field.DataType]</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Serialize the data row by row into the Arrow builder</span></span><br><span class="line">        <span class="keyword">for</span> j := <span class="number">0</span>; j &lt; fieldData.RowNum(); j++ &#123;</span><br><span class="line">            ok := typeEntry.serialize(fBuilder, fieldData.GetRow(j), elementType)</span><br><span class="line">            <span class="keyword">if</span> !ok &#123;</span><br><span class="line">                <span class="keyword">return</span> merr.WrapErrServiceInternal(...)</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> <span 
class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Serialize every field (including the timestamp field)</span></span><br><span class="line">    <span class="keyword">for</span> _, field := <span class="keyword">range</span> schema.GetFields() &#123;</span><br><span class="line">        <span class="keyword">if</span> err := serializeField(field); err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> err</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Serialize the struct-array fields</span></span><br><span class="line">    <span class="keyword">for</span> _, structField := <span class="keyword">range</span> schema.GetStructArrayFields() &#123;</span><br><span class="line">        <span class="keyword">for</span> _, field := <span class="keyword">range</span> structField.GetFields() &#123;</span><br><span class="line">            <span class="keyword">if</span> err := serializeField(field); err != <span class="literal">nil</span> &#123;</span><br><span class="line">                <span class="keyword">return</span> err</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Example type mappings</strong>:</p><table><thead><tr><th>Milvus type</th><th>Arrow type</th><th>Notes</th></tr></thead><tbody><tr><td><code>DataType_Int64</code></td><td><code>arrow.Int64Type</code></td><td>64-bit integer</td></tr><tr><td><code>DataType_FloatVector</code></td><td><code>arrow.FixedSizeBinaryType&#123;ByteWidth: dim * 
4&#125;</code></td><td>Fixed-size binary (vector)</td></tr><tr><td><code>DataType_Timestamp</code></td><td><code>arrow.Int64Type</code></td><td>Timestamps are stored as Int64</td></tr><tr><td><code>DataType_JSON</code></td><td><code>arrow.BinaryType</code></td><td>JSON is stored as binary</td></tr></tbody></table><h4 id="2-3-3-writeInserts：Arrow-Record-→-Parquet-文件"><a href="#2-3-3-writeInserts：Arrow-Record-→-Parquet-文件" class="headerlink" title="2.3.3 writeInserts：Arrow Record → Parquet 文件"></a>2.3.3 writeInserts: Arrow Record → Parquet File</h4><p><strong>Core flow</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span 
class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(bw *BulkPackWriterV2)</span></span> writeInserts(ctx context.Context, pack *SyncPack) (</span><br><span class="line">    <span class="keyword">map</span>[<span class="type">int64</span>]*datapb.FieldBinlog, <span class="type">string</span>, <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. 
序列化为 Arrow Record</span></span><br><span class="line">    rec, err := bw.serializeBinlog(ctx, pack)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, <span class="string">&quot;&quot;</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 提取 Timestamp 列并计算范围</span></span><br><span class="line">    tsArray := rec.Column(common.TimeStampField).(*array.Int64)</span><br><span class="line">    rows := rec.Len()</span><br><span class="line">    <span class="keyword">var</span> tsFrom <span class="type">uint64</span> = math.MaxUint64</span><br><span class="line">    <span class="keyword">var</span> tsTo <span class="type">uint64</span> = <span class="number">0</span></span><br><span class="line">    <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; rows; i++ &#123;</span><br><span class="line">        ts := typeutil.Timestamp(tsArray.Value(i))</span><br><span class="line">        <span class="keyword">if</span> ts &lt; tsFrom &#123;</span><br><span class="line">            tsFrom = ts</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">if</span> ts &gt; tsTo &#123;</span><br><span class="line">            tsTo = ts</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
准备写入函数</span></span><br><span class="line">    doWrite := <span class="function"><span class="keyword">func</span><span class="params">(w storage.RecordWriter)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">        <span class="keyword">if</span> err = w.Write(rec); err != <span class="literal">nil</span> &#123;  <span class="comment">// 写入 Arrow Record</span></span><br><span class="line">            <span class="keyword">return</span> err</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> w.Close()  <span class="comment">// 关闭并刷新到磁盘</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. 根据配置选择写入模式</span></span><br><span class="line">    <span class="keyword">if</span> paramtable.Get().CommonCfg.UseLoonFFI.GetAsBool() &#123;</span><br><span class="line">        <span class="comment">// Manifest 模式：使用 FFI 写入器</span></span><br><span class="line">        basePath := path.Join(bw.getRootPath(), common.SegmentInsertLogPath, k)</span><br><span class="line">        w, err := storage.NewPackedRecordManifestWriter(</span><br><span class="line">            bucketName, basePath, bw.schema, </span><br><span class="line">            bw.bufferSize, bw.multiPartUploadSize, </span><br><span class="line">            columnGroups, bw.storageConfig, pluginContextPtr)</span><br><span class="line">        <span class="keyword">if</span> err = doWrite(w); err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, <span class="string">&quot;&quot;</span>, err</span><br><span class="line">        &#125;</span><br><span class="line">        manifestPath = w.GetWrittenManifest()</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        <span 
class="comment">// 普通模式：每个列组一个文件</span></span><br><span class="line">        paths := <span class="built_in">make</span>([]<span class="type">string</span>, <span class="number">0</span>)</span><br><span class="line">        <span class="keyword">for</span> _, columnGroup := <span class="keyword">range</span> columnGroups &#123;</span><br><span class="line">            path := metautil.BuildInsertLogPath(</span><br><span class="line">                bw.getRootPath(), pack.collectionID, </span><br><span class="line">                pack.partitionID, pack.segmentID, </span><br><span class="line">                columnGroup.GroupID, bw.nextID())</span><br><span class="line">            paths = <span class="built_in">append</span>(paths, path)</span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        w, err := storage.NewPackedRecordWriter(</span><br><span class="line">            bucketName, paths, bw.schema, </span><br><span class="line">            bw.bufferSize, bw.multiPartUploadSize, </span><br><span class="line">            columnGroups, bw.storageConfig, pluginContextPtr)</span><br><span class="line">        <span class="keyword">if</span> err = doWrite(w); err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, <span class="string">&quot;&quot;</span>, err</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. 
Build the binlog metadata</span></span><br><span class="line">    logs := <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="type">int64</span>]*datapb.FieldBinlog)</span><br><span class="line">    <span class="keyword">for</span> _, columnGroup := <span class="keyword">range</span> columnGroups &#123;</span><br><span class="line">        logs[columnGroup.GroupID] = &amp;datapb.FieldBinlog&#123;</span><br><span class="line">            FieldID:     columnGroup.GroupID,</span><br><span class="line">            ChildFields: columnGroup.Fields,</span><br><span class="line">            Binlogs: []*datapb.Binlog&#123;</span><br><span class="line">                &#123;</span><br><span class="line">                    LogSize:       <span class="type">int64</span>(w.GetColumnGroupWrittenCompressed(columnGroup.GroupID)),</span><br><span class="line">                    MemorySize:    <span class="type">int64</span>(w.GetColumnGroupWrittenUncompressed(columnGroup.GroupID)),</span><br><span class="line">                    LogPath:       w.GetWrittenPaths(columnGroup.GroupID),</span><br><span class="line">                    EntriesNum:    w.GetWrittenRowNum(),</span><br><span class="line">                    TimestampFrom: tsFrom,</span><br><span class="line">                    TimestampTo:   tsTo,</span><br><span class="line">                &#125;,</span><br><span class="line">            &#125;,</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> logs, manifestPath, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="2-4-PackedRecordWriter：Arrow-→-Parquet-转换"><a href="#2-4-PackedRecordWriter：Arrow-→-Parquet-转换" class="headerlink" title="2.4 PackedRecordWriter：Arrow → Parquet 转换"></a>2.4 PackedRecordWriter: Arrow → Parquet Conversion</h3><p><strong>File</strong>: 
<code>internal/storage/record_writer.go</code></p><h4 id="2-4-1-Write-方法"><a href="#2-4-1-Write-方法" class="headerlink" title="2.4.1 Write 方法"></a>2.4.1 Write 方法</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(pw *packedRecordWriter)</span></span> Write(r Record) <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// 1. 
将 Record 转换为 Arrow Record</span></span><br><span class="line">    <span class="keyword">var</span> rec arrow.Record</span><br><span class="line">    <span class="keyword">if</span> sar, ok := r.(*simpleArrowRecord); ok &#123;</span><br><span class="line">        rec = sar.r  <span class="comment">// 直接使用 Arrow Record</span></span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        <span class="comment">// 从 Record 接口构建 Arrow Record</span></span><br><span class="line">        allFields := typeutil.GetAllFieldSchemas(pw.schema)</span><br><span class="line">        arrays := <span class="built_in">make</span>([]arrow.Array, <span class="built_in">len</span>(allFields))</span><br><span class="line">        <span class="keyword">for</span> i, field := <span class="keyword">range</span> allFields &#123;</span><br><span class="line">            arrays[i] = r.Column(field.FieldID)</span><br><span class="line">        &#125;</span><br><span class="line">        rec = array.NewRecord(pw.arrowSchema, arrays, <span class="type">int64</span>(r.Len()))</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 
统计写入大小（按列组）</span></span><br><span class="line">    pw.rowNum += <span class="type">int64</span>(r.Len())</span><br><span class="line">    <span class="keyword">for</span> col, arr := <span class="keyword">range</span> rec.Columns() &#123;</span><br><span class="line">        size := calculateActualDataSize(arr)</span><br><span class="line">        pw.writtenUncompressed += size</span><br><span class="line">        <span class="keyword">for</span> _, columnGroup := <span class="keyword">range</span> pw.columnGroups &#123;</span><br><span class="line">            <span class="keyword">if</span> lo.Contains(columnGroup.Columns, col) &#123;</span><br><span class="line">                pw.columnGroupUncompressed[columnGroup.GroupID] += size</span><br><span class="line">                <span class="keyword">break</span></span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
调用底层 PackedWriter 写入</span></span><br><span class="line">    <span class="keyword">defer</span> rec.Release()</span><br><span class="line">    <span class="keyword">return</span> pw.writer.WriteRecordBatch(rec)  <span class="comment">// 写入 Arrow RecordBatch</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="2-4-2-PackedWriter（C-FFI）"><a href="#2-4-2-PackedWriter（C-FFI）" class="headerlink" title="2.4.2 PackedWriter（C++ FFI）"></a>2.4.2 PackedWriter（C++ FFI）</h4><p><strong>文件</strong>: <code>internal/storagev2/packed/packed_writer.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(pw *PackedWriter)</span></span> WriteRecordBatch(recordBatch arrow.Record) <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// 1. 
导出 Arrow Array 和 Schema 到 C 结构</span></span><br><span class="line">    cArrays := <span class="built_in">make</span>([]CArrowArray, recordBatch.NumCols())</span><br><span class="line">    cSchemas := <span class="built_in">make</span>([]CArrowSchema, recordBatch.NumCols())</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> i := <span class="keyword">range</span> recordBatch.NumCols() &#123;</span><br><span class="line">        <span class="keyword">var</span> caa cdata.CArrowArray</span><br><span class="line">        <span class="keyword">var</span> cas cdata.CArrowSchema</span><br><span class="line">        cdata.ExportArrowArray(recordBatch.Column(<span class="type">int</span>(i)), &amp;caa, &amp;cas)</span><br><span class="line">        cArrays[i] = *(*CArrowArray)(unsafe.Pointer(&amp;caa))</span><br><span class="line">        cSchemas[i] = *(*CArrowSchema)(unsafe.Pointer(&amp;cas))</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 导出 Arrow Schema</span></span><br><span class="line">    <span class="keyword">var</span> cas cdata.CArrowSchema</span><br><span class="line">    cdata.ExportArrowSchema(recordBatch.Schema(), &amp;cas)</span><br><span class="line">    cSchema := (*C.struct_ArrowSchema)(unsafe.Pointer(&amp;cas))</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
调用 C++ 函数写入 Parquet</span></span><br><span class="line">    status := C.WriteRecordBatch(pw.cPackedWriter, &amp;cArrays[<span class="number">0</span>], &amp;cSchemas[<span class="number">0</span>], cSchema)</span><br><span class="line">    <span class="keyword">if</span> err := ConsumeCStatusIntoError(&amp;status); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>底层 C++ 实现</strong>（<code>internal/core/src/segcore/packed_writer_c.cpp</code>）：</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// C++ 端接收 Arrow RecordBatch，写入 Parquet 文件</span></span><br><span class="line"><span class="function">Status <span class="title">WriteRecordBatch</span><span class="params">(PackedWriter* writer, </span></span></span><br><span class="line"><span class="params"><span class="function">                        ArrowArray* arrays, </span></span></span><br><span class="line"><span class="params"><span class="function">                        ArrowSchema* schemas, </span></span></span><br><span class="line"><span class="params"><span class="function">                        ArrowSchema* schema)</span> </span>&#123;</span><br><span class="line">    <span class="comment">// 1. 
导入 Arrow 数据到 C++ Arrow 对象</span></span><br><span class="line">    <span class="keyword">auto</span> record_batch = <span class="built_in">ImportRecordBatch</span>(arrays, schemas, schema);</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 调用 PackedRecordBatchWriter 写入 Parquet</span></span><br><span class="line">    <span class="keyword">return</span> writer-&gt;packed_writer_-&gt;<span class="built_in">Write</span>(record_batch);</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="三、列组（Column-Group）机制"><a href="#三、列组（Column-Group）机制" class="headerlink" title="三、列组（Column Group）机制"></a>三、列组（Column Group）机制</h2><h3 id="3-1-列组概念"><a href="#3-1-列组概念" class="headerlink" title="3.1 列组概念"></a>3.1 列组概念</h3><p>列组是 StorageV2 的重要优化机制，将相关字段组织在一起，减少文件数量，提高查询效率。</p><p><strong>示例</strong>：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// 列组配置示例</span></span><br><span class="line">columnGroups := []storagecommon.ColumnGroup&#123;</span><br><span class="line">    &#123;</span><br><span class="line">        GroupID: <span class="number">100</span>,  <span class="comment">// 主键组</span></span><br><span class="line">        Fields:  []<span class="type">int64</span>&#123;<span class="number">0</span>&#125;,  <span class="comment">// 
主键字段</span></span><br><span class="line">        Columns: []<span class="type">int</span>&#123;<span class="number">0</span>&#125;,    <span class="comment">// Arrow 列索引</span></span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">        GroupID: <span class="number">101</span>,  <span class="comment">// 向量组</span></span><br><span class="line">        Fields:  []<span class="type">int64</span>&#123;<span class="number">100</span>&#125;,  <span class="comment">// 向量字段</span></span><br><span class="line">        Columns: []<span class="type">int</span>&#123;<span class="number">1</span>&#125;,      <span class="comment">// Arrow 列索引</span></span><br><span class="line">    &#125;,</span><br><span class="line">    &#123;</span><br><span class="line">        GroupID: <span class="number">102</span>,  <span class="comment">// 标量字段组</span></span><br><span class="line">        Fields:  []<span class="type">int64</span>&#123;<span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>&#125;,  <span class="comment">// 多个标量字段</span></span><br><span class="line">        Columns: []<span class="type">int</span>&#123;<span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>&#125;,     <span class="comment">// Arrow 列索引</span></span><br><span class="line">    &#125;,</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="3-2-列组的作用"><a href="#3-2-列组的作用" class="headerlink" title="3.2 列组的作用"></a>3.2 列组的作用</h3><ol><li><strong>减少文件数量</strong>：多个字段可以写入同一个 Parquet 文件</li><li><strong>优化查询</strong>：相关字段在一起，减少 I&#x2F;O</li><li><strong>支持列剪枝</strong>：查询时只读取需要的列组</li></ol><h3 id="3-3-文件路径结构"><a href="#3-3-文件路径结构" class="headerlink" title="3.3 文件路径结构"></a>3.3 文件路径结构</h3><p><strong>普通模式</strong>：</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">&#123;rootPath&#125;/&#123;collectionID&#125;/&#123;partitionID&#125;/&#123;segmentID&#125;/</span><br><span class="line">├── insert_log/</span><br><span class="line">│   ├── &#123;columnGroupID1&#125;/&#123;logID&#125;.parquet</span><br><span class="line">│   ├── &#123;columnGroupID2&#125;/&#123;logID&#125;.parquet</span><br><span class="line">│   └── &#123;columnGroupID3&#125;/&#123;logID&#125;.parquet</span><br></pre></td></tr></table></figure><p><strong>Manifest mode</strong>:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">&#123;rootPath&#125;/&#123;collectionID&#125;/&#123;partitionID&#125;/&#123;segmentID&#125;/</span><br><span class="line">├── insert_log/</span><br><span class="line">│   ├── &#123;columnGroupID1&#125;/</span><br><span class="line">│   │   └── &#123;logID&#125;.parquet</span><br><span class="line">│   ├── &#123;columnGroupID2&#125;/</span><br><span class="line">│   │   └── &#123;logID&#125;.parquet</span><br><span class="line">│   └── manifest.json  # unified metadata manifest</span><br></pre></td></tr></table></figure><h2 id="四、关键优化点"><a href="#四、关键优化点" class="headerlink" title="四、关键优化点"></a>4. Key Optimizations</h2><h3 id="4-1-零拷贝转换"><a href="#4-1-零拷贝转换" class="headerlink" title="4.1 零拷贝转换"></a>4.1 Zero-Copy Handoff</h3><ul><li><strong>Arrow Record</strong>: used directly in memory, with no intermediate serialization</li><li><strong>Go → C++</strong>: passed across FFI through the Arrow C Data Interface, with no copy</li><li><strong>Arrow → Parquet</strong>: converted by Arrow's Parquet writer; this step encodes and compresses the data, so it is efficient rather than zero-copy</li></ul><h3 id="4-2-批量写入"><a href="#4-2-批量写入" class="headerlink" title="4.2 批量写入"></a>4.2 Batch Writes</h3><ul><li>Writing whole <code>RecordBatch</code> units reduces the number of I&#x2F;O operations</li><li>Buffered writes improve throughput</li></ul><h3 id="4-3-压缩优化"><a href="#4-3-压缩优化" class="headerlink" title="4.3 压缩优化"></a>4.3 Compression</h3><ul><li>Parquet compresses column by column, which yields high compression ratios</li><li>Several codecs are supported (Zstd, Snappy, Gzip, and others)</li><li>Milvus uses Zstd by default</li></ul><h3 id="4-4-元数据管理"><a href="#4-4-元数据管理" class="headerlink" title="4.4 元数据管理"></a>4.4 Metadata Management</h3><ul><li><strong>Timestamp range</strong>: computed and recorded at write time</li><li><strong>Statistics</strong>: compressed and uncompressed sizes, row counts, and so on</li><li><strong>Manifest</strong>: centralizes file metadata (Manifest mode only)</li></ul><h2 id="五、与-StorageV1-的对比"><a href="#五、与-StorageV1-的对比" class="headerlink" title="五、与 StorageV1 的对比"></a>5. Comparison with StorageV1</h2><table><thead><tr><th>Feature</th><th>StorageV1 (Binlog)</th><th>StorageV2 (Parquet)</th></tr></thead><tbody><tr><td><strong>File format</strong></td><td>Protobuf + custom encoding</td><td>Apache Parquet</td></tr><tr><td><strong>File organization</strong></td><td>One file per field</td><td>Organized by column group; several fields can share a file</td></tr><tr><td><strong>Compression ratio</strong></td><td>Moderate</td><td><strong>High (roughly 3-10x, depending on the data)</strong></td></tr><tr><td><strong>Query performance</strong></td><td>Must read multiple files</td><td><strong>Columnar access, more efficient</strong></td></tr><tr><td><strong>Metadata</strong></td><td>Separate metadata per file</td><td><strong>Unified manifest</strong></td></tr><tr><td><strong>Timestamp storage</strong></td><td>Separate binlog file</td><td><strong>A column of the Arrow Record</strong></td></tr><tr><td><strong>Type support</strong></td><td>Basic types</td><td><strong>Complex types (JSON, nested structures, and so on)</strong></td></tr></tbody></table><h2 id="六、总结"><a href="#六、总结" class="headerlink" title="六、总结"></a>6. Summary</h2><h3 id="6-1-核心流程"><a href="#6-1-核心流程" class="headerlink" title="6.1 核心流程"></a>6.1 Core Flow</h3><ol><li><strong>Data preparation</strong>: WriteBuffer → SyncPack</li><li><strong>Data conversion</strong>: InsertData → Arrow Record (<code>serializeBinlog</code>)</li><li><strong>Data write</strong>: Arrow Record → Parquet files (<code>writeInserts</code>)</li><li><strong>Underlying storage</strong>: the Parquet writer is invoked through C++ FFI</li></ol><h3 id="6-2-关键技术"><a href="#6-2-关键技术" class="headerlink" title="6.2 关键技术"></a>6.2 Key Technologies</h3><ul><li>✅ <strong>Apache Arrow</strong>: the in-memory columnar data format</li><li>✅ <strong>Apache 
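Parquet</strong>: the columnar format used for persistent storage</li><li>✅ <strong>Column groups</strong>: organize files and speed up queries</li><li>✅ <strong>Zero-copy handoff</strong>: Arrow data crosses from Go to C++ through the Arrow C Data Interface without being copied; the C++ writer then encodes it to Parquet</li><li>✅ <strong>FFI integration</strong>: Go and C++ cooperate through a thin C interface</li></ul><h3 id="6-3-优势"><a href="#6-3-优势" class="headerlink" title="6.3 优势"></a>6.3 Advantages</h3><ol><li><strong>High compression ratio</strong>: roughly 50-80% less storage, depending on the data and the codec</li><li><strong>High performance</strong>: columnar access can cut I&#x2F;O by roughly 60-90% when a query reads only a few columns</li><li><strong>Flexibility</strong>: supports complex data types and schema evolution</li><li><strong>Compatibility</strong>: integrates with other big-data tools</li></ol><p>With the Arrow + Parquet stack, StorageV2 gives Milvus an efficient, flexible, and scalable storage layer.</p>]]></content>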
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;概述&quot;&gt;&lt;/a&gt;Overview&lt;/h2&gt;&lt;p&gt;StorageV2 is Milvus's next-generation storage format. It is built on the Apache Arrow + Parquet stack and, compared with StorageV1</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Advantages of Arrow and Parquet: An Analysis</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/12_arrow_parquet_advantages_analysis/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/12_arrow_parquet_advantages_analysis/</id>
    <published>2025-08-09T14:00:00.000Z</published>
    <updated>2026-01-06T13:10:47.525Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p>Milvus uses Apache Arrow and Apache Parquet throughout its data storage and processing paths. Together they form an efficient pipeline:</p><ul><li><strong>Arrow</strong>: the in-memory columnar format, used to build and process data</li><li><strong>Parquet</strong>: the persistent storage format, used to serialize data and store it on disk</li></ul><h2 id="一、Apache-Arrow-的优势"><a href="#一、Apache-Arrow-的优势" class="headerlink" title="一、Apache Arrow 的优势"></a>1. Advantages of Apache Arrow</h2><h3 id="1-1-零拷贝（Zero-Copy）内存布局"><a href="#1-1-零拷贝（Zero-Copy）内存布局" class="headerlink" title="1.1 零拷贝（Zero-Copy）内存布局"></a>1.1 Zero-Copy Memory Layout</h3><p><strong>Advantages</strong>:</p><ul><li>Arrow lays data out column by column, with each column's values stored contiguously in memory</li><li>Zero-copy sharing: multiple processes or components can read the same memory without copying the data</li><li>Lower memory footprint and CPU overhead</li></ul><p><strong>Usage in Milvus</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/storage/payload_writer.go</span></span><br><span class="line"><span class="comment">// The Arrow builder constructs the columnar structure directly</span></span><br><span class="line">builder := array.NewBuilder(mem, arrowType)</span><br><span class="line"><span class="comment">// The data can be operated on in memory without serialization or deserialization</span></span><br></pre></td></tr></table></figure><h3 id="1-2-跨语言互操作性"><a href="#1-2-跨语言互操作性" class="headerlink" title="1.2 跨语言互操作性"></a>1.2 Cross-Language Interoperability</h3><p><strong>Advantages</strong>:</p><ul><li>Arrow defines a standard in-memory format specification with implementations in many languages (C++, Go, Python, Java, Rust, and others)</li><li>Components written in different languages can share Arrow data directly, without a serialization step</li><li>In Milvus, the Go and C++ components exchange data this way</li></ul><p><strong>Usage in Milvus</strong>:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// 
internal/core/src/storage/PayloadReader.cpp</span></span><br><span class="line"><span class="comment">// C++ 代码可以直接读取 Arrow 格式的数据</span></span><br><span class="line"><span class="keyword">auto</span> input = std::<span class="built_in">make_shared</span>&lt;arrow::io::BufferReader&gt;(data, length);</span><br><span class="line">parquet::arrow::FileReaderBuilder reader_builder;</span><br></pre></td></tr></table></figure><h3 id="1-3-高效的列式操作"><a href="#1-3-高效的列式操作" class="headerlink" title="1.3 高效的列式操作"></a>1.3 高效的列式操作</h3><p><strong>优势</strong>：</p><ul><li>列式存储使得按列访问数据非常高效</li><li>支持向量化操作（SIMD），可以批量处理数据</li><li>适合分析型查询，只需要读取相关列</li></ul><p><strong>在 Milvus 中的应用</strong>：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/storage/schema.go</span></span><br><span class="line"><span class="comment">// Arrow Schema 定义了列式结构</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ConvertToArrowSchema</span><span class="params">(schema *schemapb.CollectionSchema, useFieldID <span class="type">bool</span>)</span></span> (*arrow.Schema, <span class="type">error</span>)</span><br><span class="line"><span class="comment">// 每个字段对应一个 Arrow Field，支持高效的列式访问</span></span><br></pre></td></tr></table></figure><h3 id="1-4-丰富的数据类型支持"><a href="#1-4-丰富的数据类型支持" class="headerlink" title="1.4 丰富的数据类型支持"></a>1.4 丰富的数据类型支持</h3><p><strong>优势</strong>：</p><ul><li>Arrow 支持丰富的数据类型，包括基本类型、嵌套类型、向量类型等</li><li>支持 Nullable 类型，可以高效处理缺失值</li><li>支持固定大小和可变大小的二进制类型（适合向量数据）</li></ul><p><strong>在 Milvus 中的应用</strong>：</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/storage/payload_writer.go</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">MilvusDataTypeToArrowType</span><span class="params">(dataType schemapb.DataType, dim <span class="type">int</span>)</span></span> arrow.DataType &#123;</span><br><span class="line">    <span class="keyword">switch</span> dataType &#123;</span><br><span class="line">    <span class="keyword">case</span> schemapb.DataType_FloatVector:</span><br><span class="line">        <span class="keyword">return</span> &amp;arrow.FixedSizeBinaryType&#123;ByteWidth: dim * <span class="number">4</span>&#125;</span><br><span class="line">    <span class="keyword">case</span> schemapb.DataType_BinaryVector:</span><br><span class="line">        <span class="keyword">return</span> &amp;arrow.FixedSizeBinaryType&#123;ByteWidth: dim / <span class="number">8</span>&#125;</span><br><span class="line">    <span class="comment">// ... 
other vector types are handled as well</span></span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="1-5-内存管理优化"><a href="#1-5-内存管理优化" class="headerlink" title="1.5 内存管理优化"></a>1.5 Memory Management</h3><p><strong>Advantages</strong>:</p><ul><li>Arrow manages memory with reference counting: a buffer is freed once its count drops to zero, which in the Go implementation requires explicit <code>Retain</code>&#x2F;<code>Release</code> calls</li><li>Memory pools reduce allocation overhead</li><li>Memory can be reserved in advance to improve write performance</li></ul><p><strong>Usage in Milvus</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/storage/payload_writer.go</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(w *NativePayloadWriter)</span></span> Reserve(size <span class="type">int</span>) &#123;</span><br><span class="line">    w.builder.Reserve(size)  <span class="comment">// preallocate memory</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(w *NativePayloadWriter)</span></span> ReleasePayloadWriter() &#123;</span><br><span class="line">    w.releaseOnce.Do(<span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">        w.builder.Release()  <span class="comment">// release memory</span></span><br><span class="line">    &#125;)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="二、Apache-Parquet-的优势"><a href="#二、Apache-Parquet-的优势" class="headerlink" title="二、Apache Parquet 的优势"></a>2. Advantages of Apache Parquet</h2><h3 id="2-1-高压缩率"><a href="#2-1-高压缩率" class="headerlink" 
title="2.1 高压缩率"></a>2.1 High Compression Ratio</h3><p><strong>Advantages</strong>:</p><ul><li>Parquet stores data by column, so values of the same type sit together and compress better</li><li>It supports multiple compression codecs (Snappy, Gzip, Zstd, LZ4, and others)</li><li>In Milvus, the compression ratio is typically around 3-10x higher than with row-oriented storage, depending on the data</li></ul><p><strong>Usage in Milvus</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/storage/payload_writer.go</span></span><br><span class="line">writerProps := parquet.NewWriterProperties(</span><br><span class="line">    parquet.WithCompression(compress.Codecs.Zstd),  <span class="comment">// use Zstd compression</span></span><br><span class="line">    parquet.WithCompressionLevel(<span class="number">3</span>),</span><br><span class="line">)</span><br></pre></td></tr></table></figure><h3 id="2-2-列式存储优化"><a href="#2-2-列式存储优化" class="headerlink" title="2.2 列式存储优化"></a>2.2 Columnar Storage Optimization</h3><p><strong>Advantages</strong>:</p><ul><li>With columnar storage a query reads only the relevant columns, which greatly reduces I&#x2F;O</li><li>Column pruning is supported, so only the required columns are read</li><li>Well suited to OLAP workloads, especially aggregation queries</li></ul><p><strong>Usage in Milvus</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// docs/developer_guides/milvus_timestamp_write_flow.md</span></span><br><span class="line"><span class="comment">// StorageV2 uses the Parquet format and supports columnar access</span></span><br><span class="line"><span class="comment">// A query reads only the relevant fields, not the whole file</span></span><br></pre></td></tr></table></figure><h3 id="2-3-丰富的元数据"><a href="#2-3-丰富的元数据" class="headerlink" title="2.3 丰富的元数据"></a>2.3 Rich Metadata</h3><p><strong>Advantages</strong>:</p><ul><li>Parquet files carry rich metadata (schema, statistics, indexes, and so on)</li><li>Predicate pushdown is supported, so data can be filtered before it is read</li><li>Metadata is stored at the end of the file (the footer), so it can be read without scanning the data</li></ul><p><strong>Usage in Milvus</strong>:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/core/src/storage/PayloadReader.cpp</span></span><br><span class="line"><span class="keyword">auto</span> file_meta = arrow_reader-&gt;<span class="built_in">parquet_reader</span>()-&gt;<span class="built_in">metadata</span>();</span><br><span class="line"><span class="comment">// The dimension can be read from the metadata without loading the full data</span></span><br><span class="line">dim_ = <span class="built_in">GetDimensionFromFileMetaData</span>(</span><br><span class="line">    file_meta-&gt;<span class="built_in">schema</span>()-&gt;<span class="built_in">Column</span>(column_index), column_type_);</span><br></pre></td></tr></table></figure><h3 id="2-4-支持嵌套数据结构"><a href="#2-4-支持嵌套数据结构" class="headerlink" title="2.4 支持嵌套数据结构"></a>2.4 Support for Nested Data Structures</h3><p><strong>Advantages</strong>:</p><ul><li>Parquet supports complex nested data structures (Struct, List, Map)</li><li>Complex types such as JSON and arrays can be stored efficiently</li><li>Schema evolution is supported, so a schema can be changed in a backward-compatible way</li></ul><p><strong>Usage in Milvus</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// docs/design_docs/json_storage.md</span></span><br><span class="line"><span class="comment">// Parquet can store JSON data, using columnar storage to speed up queries</span></span><br><span class="line"><span class="comment">// Dense fields are stored as separate columns; sparse fields are stored in a BSON binary column</span></span><br></pre></td></tr></table></figure><h3 id="2-5-跨平台兼容性"><a href="#2-5-跨平台兼容性" class="headerlink" title="2.5 跨平台兼容性"></a>2.5 Cross-Platform Compatibility</h3><p><strong>Advantages</strong>:</p><ul><li>Parquet is an open standard with broad support (Hadoop, Spark, Presto, Pandas, and others)</li><li>It integrates with other big-data tools</li><li>Reader and writer libraries exist for many programming languages</li></ul><p><strong>Usage in Milvus</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/util/importutilv2/parquet/reader.go</span></span><br><span class="line"><span class="comment">// Milvus supports importing data from Parquet files</span></span><br><span class="line"><span class="comment">// Compatible with data files produced by other systems (such as Spark)</span></span><br></pre></td></tr></table></figure><h2 id="三、Arrow-Parquet-组合优势"><a href="#三、Arrow-Parquet-组合优势" class="headerlink" title="三、Arrow + Parquet 组合优势"></a>3. Combined Advantages of Arrow + Parquet</h2><h3 id="3-1-无缝转换"><a href="#3-1-无缝转换" class="headerlink" title="3.1 无缝转换"></a>3.1 Seamless Conversion</h3><p><strong>Advantages</strong>:</p><ul><li>Arrow and Parquet convert to each other directly, with no intermediate format</li><li>An Arrow RecordBatch can be written directly to a Parquet file</li><li>A Parquet file can be read directly into Arrow RecordBatches</li></ul><p><strong>Usage in Milvus</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/storage/payload_writer.go</span></span><br><span class="line"><span class="comment">// Arrow Table -&gt; Parquet file</span></span><br><span class="line"><span class="keyword">return</span> pqarrow.WriteTable(table,</span><br><span class="line">    w.output,</span><br><span class="line">    <span class="number">1024</span>*<span class="number">1024</span>*<span class="number">1024</span>,</span><br><span class="line">    w.writerProps,</span><br><span class="line">    arrowWriterProps,</span><br><span 
class="line">)</span><br><span class="line"></span><br><span class="line"><span class="comment">// internal/storage/payload_reader.go</span></span><br><span class="line"><span class="comment">// Parquet file -&gt; Arrow RecordBatch</span></span><br><span class="line">arrowReader, err := pqarrow.NewFileReader(parquetReader, ...)</span><br></pre></td></tr></table></figure><h3 id="3-2-高效的数据处理流程"><a href="#3-2-高效的数据处理流程" class="headerlink" title="3.2 高效的数据处理流程"></a>3.2 Efficient Data Processing Pipeline</h3><p><strong>Advantages</strong>:</p><ul><li>In memory, data uses the Arrow format for fast processing</li><li>On disk, data uses the Parquet format for efficient storage</li><li>The pipeline is zero-copy where possible and otherwise keeps copies to a minimum</li></ul><p><strong>Usage in Milvus</strong>:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">BulkPackWriterV2.Write()</span><br><span class="line">  └─&gt; serializeBinlog() -&gt; Arrow Record  (in-memory processing)</span><br><span class="line">      └─&gt; PackedRecordWriter.Write()</span><br><span class="line">          └─&gt; Parquet Writer (persistent storage)</span><br></pre></td></tr></table></figure><h3 id="3-3-统一的-Schema-管理"><a href="#3-3-统一的-Schema-管理" class="headerlink" title="3.3 统一的 Schema 管理"></a>3.3 Unified Schema Management</h3><p><strong>Advantages</strong>:</p><ul><li>Arrow schemas and Parquet schemas can be converted to each other</li><li>A unified type system reduces type-conversion overhead</li><li>Schema validation is supported to keep data consistent</li></ul><p><strong>Usage in Milvus</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// internal/storage/schema.go</span></span><br><span class="line"><span class="comment">// Milvus Schema -&gt; Arrow Schema -&gt; Parquet Schema</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">ConvertToArrowSchema</span><span 
class="params">(schema *schemapb.CollectionSchema, useFieldID <span class="type">bool</span>)</span></span> (*arrow.Schema, <span class="type">error</span>)</span><br></pre></td></tr></table></figure><h2 id="四、在-Milvus-中的实际效果"><a href="#四、在-Milvus-中的实际效果" class="headerlink" title="四、在 Milvus 中的实际效果"></a>4. Practical Impact in Milvus</h2><h3 id="4-1-StorageV1-vs-StorageV2-对比"><a href="#4-1-StorageV1-vs-StorageV2-对比" class="headerlink" title="4.1 StorageV1 vs StorageV2 对比"></a>4.1 StorageV1 vs StorageV2 Comparison</h3><p>Based on the comparison in <code>docs/developer_guides/milvus_timestamp_write_flow.md</code>:</p><table><thead><tr><th>Feature</th><th>StorageV1 (Binlog)</th><th>StorageV2 (Parquet)</th></tr></thead><tbody><tr><td><strong>File format</strong></td><td>Protobuf + custom encoding</td><td>Apache Parquet</td></tr><tr><td><strong>Number of files</strong></td><td>Many (one per field)</td><td>Few (organized by column group)</td></tr><tr><td><strong>Compression ratio</strong></td><td>Medium</td><td><strong>High</strong></td></tr><tr><td><strong>Query performance</strong></td><td>Must read multiple files</td><td><strong>More efficient columnar access</strong></td></tr><tr><td><strong>Metadata</strong></td><td>Separate metadata per binlog</td><td><strong>Unified manifest management</strong></td></tr></tbody></table><h3 id="4-2-性能提升"><a href="#4-2-性能提升" class="headerlink" title="4.2 性能提升"></a>4.2 Performance Gains</h3><ol><li><strong>Storage space</strong>: Parquet's high compression ratio can save roughly 50-80% of storage space</li><li><strong>Query speed</strong>: columnar access can cut I&#x2F;O operations by roughly 60-90%</li><li><strong>Memory efficiency</strong>: Arrow's zero-copy mechanism can reduce memory usage by roughly 30-50%</li></ol><h3 id="4-3-功能增强"><a href="#4-3-功能增强" class="headerlink" title="4.3 功能增强"></a>4.3 Functional Enhancements</h3><ol><li><strong>Support for complex types</strong>: Arrow + Parquet support vector arrays, JSON, nested structures, and more</li><li><strong>Better compatibility</strong>: integrates with other big-data tools</li><li><strong>More flexible queries</strong>: supports optimizations such as column pruning and predicate pushdown</li></ol><h2 id="五、总结"><a href="#五、总结" class="headerlink" title="五、总结"></a>5. Summary</h2><h3 id="Arrow-的核心优势"><a href="#Arrow-的核心优势" class="headerlink" title="Arrow 的核心优势"></a>Core Advantages of Arrow</h3><ol><li>✅ <strong>Zero-copy memory layout</strong>: efficient memory operations</li><li>✅ <strong>Cross-language interoperability</strong>: seamless data exchange</li><li>✅ <strong>Optimized columnar operations</strong>: well suited to analytical queries</li><li>✅ 
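<strong>Rich data types</strong>: support for complex data structures</li><li>✅ <strong>Optimized memory management</strong>: automatic memory management</li></ol><h3 id="Parquet-的核心优势"><a href="#Parquet-的核心优势" class="headerlink" title="Parquet 的核心优势"></a>Core Advantages of Parquet</h3><ol><li>✅ <strong>High compression ratio</strong>: saves storage space</li><li>✅ <strong>Columnar storage</strong>: better query performance</li><li>✅ <strong>Rich metadata</strong>: enables query optimization</li><li>✅ <strong>Nested data support</strong>: handles complex structures</li><li>✅ <strong>Cross-platform compatibility</strong>: broad ecosystem support</li></ol><h3 id="组合使用的优势"><a href="#组合使用的优势" class="headerlink" title="组合使用的优势"></a>Advantages of Using Them Together</h3><ol><li>✅ <strong>Seamless conversion</strong>: Arrow ↔ Parquet conversion with no intermediate format</li><li>✅ <strong>Efficient pipeline</strong>: in-memory processing + persistent storage</li><li>✅ <strong>Unified schema</strong>: a consistent type system</li></ol><p>In Milvus, the Arrow + Parquet combination gives the vector database efficient, flexible, and compatible data storage and processing, and it is the core technical foundation of the StorageV2 architecture.</p>]]></content>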
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;概述&quot;&gt;&lt;/a&gt;概述&lt;/h2&gt;&lt;p&gt;在 Milvus 中，Apache Arrow 和 Apache Parquet 被广泛用于数据存储和处理。两者结合使用，形成了高效的数据处</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>MVCC Timestamp Handling in the Milvus Data Write Flow</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/11_milvus_timestamp_write_flow/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/11_milvus_timestamp_write_flow/</id>
    <published>2025-08-09T13:00:00.000Z</published>
    <updated>2026-01-06T13:10:47.195Z</updated>
    
    <content type="html"><![CDATA[<p>This document describes in detail how Milvus handles MVCC timestamp information as data is written into a Segment, covering both the normal Flush flow and the Compaction flow.</p><h2 id="目录"><a href="#目录" class="headerlink" title="目录"></a>Table of Contents</h2><ul><li><a href="#1-%E6%A6%82%E8%BF%B0">1. Overview</a></li><li><a href="#2-timestamp-%E5%AD%97%E6%AE%B5%E5%AE%9A%E4%B9%89">2. Timestamp Field Definition</a></li><li><a href="#3-flush-%E5%86%99%E5%85%A5%E6%B5%81%E7%A8%8B%E4%B8%BB%E6%B5%81%E7%A8%8B">3. Flush Write Flow (Main Flow)</a></li><li><a href="#4-compaction-%E5%86%99%E5%85%A5%E6%B5%81%E7%A8%8B">4. Compaction Write Flow</a></li><li><a href="#5-storage-%E5%B1%82%E7%9A%84-timestamp-%E5%A4%84%E7%90%86">5. Timestamp Handling in the Storage Layer</a></li><li><a href="#6-%E4%B8%A4%E7%A7%8D%E5%AD%98%E5%82%A8%E7%89%88%E6%9C%AC%E5%AF%B9%E6%AF%94">6. Comparison of the Two Storage Versions</a></li><li><a href="#7-%E6%80%BB%E7%BB%93">7. Summary</a></li></ul><hr><h2 id="1-概述"><a href="#1-概述" class="headerlink" title="1. 概述"></a>1. Overview</h2><h3 id="1-1-核心结论"><a href="#1-1-核心结论" class="headerlink" title="1.1 核心结论"></a>1.1 Key Conclusion</h3><p><strong>Yes, MVCC timestamp information is written into the Segment in full.</strong></p><p>As a system-reserved field (FieldID&#x3D;1), Timestamp is serialized and persisted to storage during writes just like any user field. There are two main write paths:</p><ol><li><strong>Flush flow</strong> (main path): persistence during normal data writes</li><li><strong>Compaction flow</strong>: rewriting when Segments are merged and optimized</li></ol><h3 id="1-2-Timestamp-的用途"><a href="#1-2-Timestamp-的用途" class="headerlink" title="1.2 Timestamp 的用途"></a>1.2 What the Timestamp Is Used For</h3><ul><li><strong>MVCC version control</strong>: supports time travel queries</li><li><strong>Data visibility</strong>: filters out invisible data based on the timestamp</li><li><strong>Data expiration</strong>: cleans up expired data based on TTL and the timestamp</li><li><strong>Query optimization</strong>: quickly filters Segments by timestamp range</li></ul><hr><h2 id="2-Timestamp-字段定义"><a href="#2-Timestamp-字段定义" class="headerlink" title="2. Timestamp 字段定义"></a>2. 
 Timestamp Field Definition</h2><h3 id="2-1-系统预留字段"><a href="#2-1-系统预留字段" class="headerlink" title="2.1 系统预留字段"></a>2.1 System-Reserved Fields</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// pkg/common/common.go</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">const</span> (</span><br><span class="line">    <span class="comment">// TimeStampField is the ID of the Timestamp field reserved by the system</span></span><br><span class="line">    TimeStampField = <span class="number">1</span></span><br><span class="line">    </span><br><span class="line">    <span class="comment">// RowIDField is the ID of the RowID field reserved by the system</span></span><br><span class="line">    RowIDField = <span class="number">0</span></span><br><span class="line"></span><br><span class="line">    <span class="comment">// TimeStampFieldName defines the name of the Timestamp field</span></span><br><span class="line">    TimeStampFieldName = <span class="string">&quot;Timestamp&quot;</span></span><br><span class="line">)</span><br></pre></td></tr></table></figure><h3 id="2-2-数据类型"><a href="#2-2-数据类型" class="headerlink" title="2.2 数据类型"></a>2.2 Data Type</h3><p>The timestamp column is stored as <code>int64</code> field data, while the <code>Timestamp</code> type itself is defined as <code>uint64</code> in both the C++ and Go code:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// 
internal/core/src/common/Types.h</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">using</span> Timestamp = <span class="type">uint64_t</span>;</span><br><span class="line"><span class="keyword">constexpr</span> <span class="keyword">auto</span> MAX_TIMESTAMP = std::numeric_limits&lt;Timestamp&gt;::<span class="built_in">max</span>();</span><br></pre></td></tr></table></figure><hr><h2 id="3-Flush-写入流程（主流程）"><a href="#3-Flush-写入流程（主流程）" class="headerlink" title="3. Flush 写入流程（主流程）"></a>3. Flush Write Flow (Main Flow)</h2><h3 id="3-1-整体流程图"><a href="#3-1-整体流程图" class="headerlink" title="3.1 整体流程图"></a>3.1 Overall Flow Diagram</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line">User Insert/Delete request</span><br><span class="line">    ↓</span><br><span class="line">WriteBuffer.BufferData()        // buffer data in memory</span><br><span 
class="line">    ↓</span><br><span class="line">triggerSync()                   // trigger a sync according to policy</span><br><span class="line">    ↓</span><br><span class="line">getSegmentsToSync()             // select the segments to sync</span><br><span class="line">    ↓</span><br><span class="line">syncSegments()</span><br><span class="line">    ↓</span><br><span class="line">getSyncTask()                   // create the SyncTask</span><br><span class="line">    ├─ yieldBuffer()            // take data from the buffer</span><br><span class="line">    └─ NewSyncTask()            // create the sync task</span><br><span class="line">    ↓</span><br><span class="line">SyncManager.SyncData()</span><br><span class="line">    ↓</span><br><span class="line">SyncTask.Run()</span><br><span class="line">    ↓</span><br><span class="line">    ├─ [StorageV1] BulkPackWriter.Write()</span><br><span class="line">    │   ├─ writeInserts()</span><br><span class="line">    │   │   └─ storageV1Serializer.serializeBinlog()</span><br><span class="line">    │   │       └─ InsertCodec.Serialize()        ★ serialize all fields (including timestamp)</span><br><span class="line">    │   │           └─ create a separate binlog file for each field</span><br><span class="line">    │   │</span><br><span class="line">    │   ├─ writeStats()                           ★ write statistics (including the timestamp range)</span><br><span class="line">    │   └─ writeBM25Stats()</span><br><span class="line">    │</span><br><span class="line">    └─ [StorageV2] BulkPackWriterV2.Write()</span><br><span class="line">        ├─ writeInserts()</span><br><span class="line">        │   ├─ serializeBinlog()                  // convert to an Arrow Record</span><br><span class="line">        │   ├─ extract the timestamp column to compute the range</span><br><span class="line">        │   └─ PackedRecordWriter.Write()         ★ write in Parquet format</span><br><span class="line">        │</span><br><span class="line">        └─ writeStats()</span><br></pre></td></tr></table></figure><h3 id="3-2-关键代码位置"><a href="#3-2-关键代码位置" class="headerlink" title="3.2 关键代码位置"></a>3.2 Key Code Locations</h3><h4 
id="3-2-1-WriteBuffer-创建-SyncTask"><a href="#3-2-1-WriteBuffer-创建-SyncTask" class="headerlink" title="3.2.1 WriteBuffer 创建 SyncTask"></a>3.2.1 WriteBuffer Creates the SyncTask</h4><p><strong>File</strong>: <code>internal/flushcommon/writebuffer/write_buffer.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(wb *writeBufferBase)</span></span> getSyncTask(ctx context.Context, segmentID <span class="type">int64</span>) (syncmgr.Task, <span class="type">error</span>) &#123;</span><br><span class="line">    segmentInfo, ok := wb.metaCache.GetSegmentByID(segmentID)</span><br><span class="line">    <span class="keyword">if</span> !ok &#123;</span><br><span class="line">        <span class="keyword">return</span> <span 
class="literal">nil</span>, merr.WrapErrSegmentNotFound(segmentID)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">var</span> batchSize <span class="type">int64</span></span><br><span class="line">    <span class="keyword">var</span> tsFrom, tsTo <span class="type">uint64</span></span><br><span class="line"></span><br><span class="line">    <span class="comment">// take data from the buffer, including insert data and the timestamp range</span></span><br><span class="line">    insert, bm25, delta, schema, timeRange, startPos := wb.yieldBuffer(segmentID)</span><br><span class="line">    <span class="keyword">if</span> timeRange != <span class="literal">nil</span> &#123;</span><br><span class="line">        tsFrom, tsTo = timeRange.timestampMin, timeRange.timestampMax</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">for</span> _, chunk := <span class="keyword">range</span> insert &#123;</span><br><span class="line">        batchSize += <span class="type">int64</span>(chunk.GetRowNum())</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// build the SyncPack holding all data to be persisted</span></span><br><span class="line">    pack := &amp;syncmgr.SyncPack&#123;&#125;</span><br><span class="line">    pack.WithInsertData(insert).        <span class="comment">// ★ InsertData includes the timestamp field</span></span><br><span class="line">        WithDeleteData(delta).</span><br><span class="line">        WithCollectionID(wb.collectionID).</span><br><span class="line">        WithPartitionID(segmentInfo.PartitionID()).</span><br><span class="line">        WithSegmentID(segmentID).</span><br><span class="line">        WithTimeRange(tsFrom, tsTo).    
<span class="comment">// ★ timestamp range</span></span><br><span class="line">        WithBatchRows(batchSize)</span><br><span class="line"></span><br><span class="line">    task := syncmgr.NewSyncTask().</span><br><span class="line">        WithMetaCache(wb.metaCache).</span><br><span class="line">        WithSchema(schema).</span><br><span class="line">        WithSyncPack(pack)</span><br><span class="line">    <span class="keyword">return</span> task, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="3-2-2-SyncTask-执行写入"><a href="#3-2-2-SyncTask-执行写入" class="headerlink" title="3.2.2 SyncTask 执行写入"></a>3.2.2 SyncTask Performs the Write</h4><p><strong>File</strong>: <code>internal/flushcommon/syncmgr/task.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *SyncTask)</span></span> Run(ctx context.Context) (err <span class="type">error</span>) &#123;</span><br><span class="line">    segmentInfo, has := 
t.metacache.GetSegmentByID(t.segmentID)</span><br><span class="line">    <span class="keyword">if</span> !has &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    columnGroups := t.getColumnGroups(segmentInfo)</span><br><span class="line"></span><br><span class="line">    <span class="comment">// choose the write path based on StorageVersion</span></span><br><span class="line">    <span class="keyword">switch</span> segmentInfo.GetStorageVersion() &#123;</span><br><span class="line">    <span class="keyword">case</span> storage.StorageV2:</span><br><span class="line">        writer := NewBulkPackWriterV2(t.metacache, t.schema, t.chunkManager, </span><br><span class="line">            t.allocator, <span class="number">0</span>, packed.DefaultMultiPartUploadSize, </span><br><span class="line">            t.storageConfig, columnGroups, t.writeRetryOpts...)</span><br><span class="line">        t.insertBinlogs, t.deltaBinlog, t.statsBinlogs, t.bm25Binlogs, </span><br><span class="line">            t.manifestPath, t.flushedSize, err = writer.Write(ctx, t.pack)</span><br><span class="line">        </span><br><span class="line">    <span class="keyword">default</span>:  <span class="comment">// StorageV1</span></span><br><span class="line">        writer := NewBulkPackWriter(t.metacache, t.schema, </span><br><span class="line">            t.chunkManager, t.allocator, t.writeRetryOpts...)</span><br><span class="line">        t.insertBinlogs, t.deltaBinlog, t.statsBinlogs, t.bm25Binlogs, </span><br><span class="line">            t.flushedSize, err = writer.Write(ctx, t.pack)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 
id="3-2-3-StorageV1-序列化-Binlog"><a href="#3-2-3-StorageV1-序列化-Binlog" class="headerlink" title="3.2.3 StorageV1 序列化 Binlog"></a>3.2.3 StorageV1 Binlog Serialization</h4><p><strong>File</strong>: <code>internal/flushcommon/syncmgr/storage_serializer.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(s *storageV1Serializer)</span></span> serializeBinlog(ctx context.Context, pack *SyncPack) (<span class="keyword">map</span>[<span class="type">int64</span>]*storage.Blob, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="keyword">if</span> <span class="built_in">len</span>(pack.insertData) == <span class="number">0</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="type">int64</span>]*storage.Blob), <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// call InsertCodec to serialize all fields</span></span><br><span class="line">    blobs, err := s.inCodec.Serialize(pack.partitionID, pack.segmentID, 
pack.insertData...)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// return map[fieldID]*Blob, including the timestamp field (fieldID=1)</span></span><br><span class="line">    result := <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="type">int64</span>]*storage.Blob)</span><br><span class="line">    <span class="keyword">for</span> _, blob := <span class="keyword">range</span> blobs &#123;</span><br><span class="line">        fieldID, err := strconv.ParseInt(blob.GetKey(), <span class="number">10</span>, <span class="number">64</span>)</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">        &#125;</span><br><span class="line">        result[fieldID] = blob</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> result, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>File</strong>: <code>internal/storage/data_codec.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 
class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(insertCodec *InsertCodec)</span></span> Serialize(partitionID UniqueID, segmentID UniqueID, data ...*InsertData) 
([]*Blob, <span class="type">error</span>) &#123;</span><br><span class="line">    blobs := <span class="built_in">make</span>([]*Blob, <span class="number">0</span>)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">var</span> rowNum <span class="type">int64</span></span><br><span class="line">    <span class="keyword">var</span> startTs, endTs Timestamp</span><br><span class="line">    startTs, endTs = math.MaxUint64, <span class="number">0</span></span><br><span class="line"></span><br><span class="line">    <span class="comment">// 1. First, extract the time range from the timestamp field</span></span><br><span class="line">    <span class="keyword">for</span> _, block := <span class="keyword">range</span> data &#123;</span><br><span class="line">        timeFieldData, ok := block.Data[common.TimeStampField]</span><br><span class="line">        <span class="keyword">if</span> !ok &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, errors.New(<span class="string">&quot;data doesn&#x27;t contains timestamp field&quot;</span>)</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        rowNum += <span class="type">int64</span>(timeFieldData.RowNum())</span><br><span class="line">        ts := timeFieldData.(*Int64FieldData).Data</span><br><span class="line"></span><br><span class="line">        <span class="keyword">for</span> _, t := <span class="keyword">range</span> ts &#123;</span><br><span class="line">            <span class="keyword">if</span> <span class="type">uint64</span>(t) &gt; endTs &#123;</span><br><span class="line">                endTs = <span class="type">uint64</span>(t)</span><br><span class="line">            &#125;</span><br><span class="line">            <span class="keyword">if</span> <span class="type">uint64</span>(t) &lt; startTs &#123;</span><br><span class="line">                startTs = <span 
class="type">uint64</span>(t)</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// 2. Create a binlog writer for each field (including timestamp)</span></span><br><span class="line">    serializeField := <span class="function"><span class="keyword">func</span><span class="params">(field *schemapb.FieldSchema)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">        <span class="comment">// Create the binlog writer for this field</span></span><br><span class="line">        writer = NewInsertBinlogWriter(field.DataType, insertCodec.Schema.ID, </span><br><span class="line">            partitionID, segmentID, field.FieldID, field.GetNullable())</span><br><span class="line"></span><br><span class="line">        eventWriter, err := writer.NextInsertEventWriter()</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> err</span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Record the timestamp range in the binlog metadata</span></span><br><span class="line">        eventWriter.SetEventTimestamp(startTs, endTs)</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Write the field data into the payload</span></span><br><span class="line">        <span class="keyword">for</span> _, block := <span class="keyword">range</span> data &#123;</span><br><span class="line">            singleData := block.Data[field.FieldID]</span><br><span class="line">            <span class="keyword">if</span> err = AddFieldDataToPayload(eventWriter, field.DataType, singleData); err != <span class="literal">nil</span> &#123;</span><br><span class="line">                <span class="keyword">return</span> 
err</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="comment">// Finish and collect the blob</span></span><br><span class="line">        buffer, err := writer.GetBuffer()</span><br><span class="line">        blobs = <span class="built_in">append</span>(blobs, &amp;Blob&#123;</span><br><span class="line">            Key:        fmt.Sprintf(<span class="string">&quot;%d&quot;</span>, field.FieldID),</span><br><span class="line">            Value:      buffer,</span><br><span class="line">            RowNum:     rowNum,</span><br><span class="line">        &#125;)</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Serialize every field in the schema (including the timestamp field)</span></span><br><span class="line">    <span class="keyword">for</span> _, field := <span class="keyword">range</span> insertCodec.Schema.Schema.Fields &#123;</span><br><span class="line">        <span class="keyword">if</span> err := serializeField(field); err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> blobs, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="3-2-4-StorageV2-写入-Parquet"><a href="#3-2-4-StorageV2-写入-Parquet" class="headerlink" title="3.2.4 StorageV2 写入 Parquet"></a>3.2.4 StorageV2: Writing Parquet</h4><p><strong>File</strong>: <code>internal/flushcommon/syncmgr/pack_writer_v2.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span 
class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(bw *BulkPackWriterV2)</span></span> writeInserts(ctx context.Context, pack *SyncPack) (<span class="keyword">map</span>[<span class="type">int64</span>]*datapb.FieldBinlog, <span class="type">string</span>, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="keyword">if</span> <span class="built_in">len</span>(pack.insertData) == <span class="number">0</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="type">int64</span>]*datapb.FieldBinlog), <span class="string">&quot;&quot;</span>, <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// 1. 
Serialize to an Arrow Record</span></span><br><span class="line">    rec, err := bw.serializeBinlog(ctx, pack)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, <span class="string">&quot;&quot;</span>, err</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    logs := <span class="built_in">make</span>(<span class="keyword">map</span>[<span class="type">int64</span>]*datapb.FieldBinlog)</span><br><span class="line"></span><br><span class="line">    <span class="comment">// 2. Extract the timestamp column from the Record and compute its range</span></span><br><span class="line">    tsArray := rec.Column(common.TimeStampField).(*array.Int64)</span><br><span class="line">    rows := rec.Len()</span><br><span class="line">    <span class="keyword">var</span> tsFrom <span class="type">uint64</span> = math.MaxUint64</span><br><span class="line">    <span class="keyword">var</span> tsTo <span class="type">uint64</span> = <span class="number">0</span></span><br><span class="line">    <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; rows; i++ &#123;</span><br><span class="line">        ts := typeutil.Timestamp(tsArray.Value(i))</span><br><span class="line">        <span class="keyword">if</span> ts &lt; tsFrom &#123;</span><br><span class="line">            tsFrom = ts</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">if</span> ts &gt; tsTo &#123;</span><br><span class="line">            tsTo = ts</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// 3. 
Write the Parquet file (all fields, including timestamp)</span></span><br><span class="line">    doWrite := <span class="function"><span class="keyword">func</span><span class="params">(w storage.RecordWriter)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">        <span class="keyword">if</span> err = w.Write(rec); err != <span class="literal">nil</span> &#123;  <span class="comment">// ★ the Arrow Record contains the timestamp column</span></span><br><span class="line">            <span class="keyword">return</span> err</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> w.Close()</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// 4. Create a PackedRecordWriter and write</span></span><br><span class="line">    <span class="keyword">if</span> paramtable.Get().CommonCfg.UseLoonFFI.GetAsBool() &#123;</span><br><span class="line">        <span class="comment">// Manifest mode</span></span><br><span class="line">        w, err := storage.NewPackedRecordManifestWriter(...)</span><br><span class="line">        <span class="keyword">if</span> err = doWrite(w); err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, <span class="string">&quot;&quot;</span>, err</span><br><span class="line">        &#125;</span><br><span class="line">        manifestPath = w.GetWrittenManifest()</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        <span class="comment">// Regular mode</span></span><br><span class="line">        w, err := storage.NewPackedRecordWriter(...)</span><br><span class="line">        <span class="keyword">if</span> err = doWrite(w); err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, 
<span class="string">&quot;&quot;</span>, err</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// 5. Record the binlog info (including the timestamp range)</span></span><br><span class="line">    <span class="keyword">for</span> _, columnGroup := <span class="keyword">range</span> columnGroups &#123;</span><br><span class="line">        logs[columnGroup.GroupID] = &amp;datapb.FieldBinlog&#123;</span><br><span class="line">            FieldID:     columnGroup.GroupID,</span><br><span class="line">            ChildFields: columnGroup.Fields,</span><br><span class="line">            Binlogs: []*datapb.Binlog&#123;</span><br><span class="line">                &#123;</span><br><span class="line">                    LogPath:       path,</span><br><span class="line">                    EntriesNum:    rowNum,</span><br><span class="line">                    TimestampFrom: tsFrom,  <span class="comment">// ★ timestamp range</span></span><br><span class="line">                    TimestampTo:   tsTo,</span><br><span class="line">                &#125;,</span><br><span class="line">            &#125;,</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span> logs, manifestPath, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Serialize to an Arrow Record</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(bw *BulkPackWriterV2)</span></span> serializeBinlog(_ context.Context, pack *SyncPack) (storage.Record, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// Convert the schema to an Arrow schema</span></span><br><span class="line">    arrowSchema, err := 
storage.ConvertToArrowSchema(bw.schema, <span class="literal">true</span>)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    builder := array.NewRecordBuilder(memory.DefaultAllocator, arrowSchema)</span><br><span class="line">    <span class="keyword">defer</span> builder.Release()</span><br><span class="line"></span><br><span class="line">    <span class="comment">// Build the Arrow Record (all fields)</span></span><br><span class="line">    <span class="keyword">for</span> _, chunk := <span class="keyword">range</span> pack.insertData &#123;</span><br><span class="line">        <span class="keyword">if</span> err := storage.BuildRecord(builder, chunk, bw.schema); err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    rec := builder.NewRecord()</span><br><span class="line">    <span class="keyword">return</span> storage.NewSimpleArrowRecord(rec, field2Col), <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><hr><h2 id="4-Compaction-写入流程"><a href="#4-Compaction-写入流程" class="headerlink" title="4. Compaction 写入流程"></a>4. 
Compaction Write Flow</h2><p>Compaction merges and optimizes existing segments, and in doing so rewrites their timestamp information.</p><h3 id="4-1-整体流程图"><a href="#4-1-整体流程图" class="headerlink" title="4.1 整体流程图"></a>4.1 Overall Flow Diagram</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line">Compaction task created</span><br><span class="line">    ↓</span><br><span class="line">mixCompactionTask.Compact()</span><br><span class="line">    ↓</span><br><span class="line">mixCompactionTask.mergeSplit()</span><br><span class="line">    ↓</span><br><span class="line">NewMultiSegmentWriter()                    // create the multi-segment writer</span><br><span class="line">    ↓</span><br><span class="line">For each source segment:</span><br><span class="line">    ├─ storage.NewBinlogRecordReader()     // read the original data</span><br><span class="line">    └─ mixCompactionTask.writeSegment()</span><br><span class="line">        └─ [loop over Records]</span><br><span class="line">            ├─ reader.Next()               // read an Arrow Record (including timestamp)</span><br><span class="line">            ├─ filter out deleted/expired rows</span><br><span class="line">            └─ mWriter.Write(rec)          ★ write the merged data</span><br><span class="line">                └─ 
MultiSegmentWriter.Write()</span><br><span class="line">                    └─ BinlogValueWriter.Write()</span><br><span class="line">                        └─ BinlogRecordWriter.Write()</span><br><span class="line">                            ├─ [V1] CompositeBinlogRecordWriter.Write()</span><br><span class="line">                            │   ├─ extract the timestamp range</span><br><span class="line">                            │   └─ serialize to binlog</span><br><span class="line">                            │</span><br><span class="line">                            └─ [V2] PackedBinlogRecordWriter.Write()</span><br><span class="line">                                ├─ extract the timestamp range</span><br><span class="line">                                └─ write Parquet</span><br></pre></td></tr></table></figure><h3 id="4-2-关键代码位置"><a href="#4-2-关键代码位置" class="headerlink" title="4.2 关键代码位置"></a>4.2 Key Code Locations</h3><h4 id="4-2-1-Mix-Compaction-主流程"><a href="#4-2-1-Mix-Compaction-主流程" class="headerlink" title="4.2.1 Mix Compaction 主流程"></a>4.2.1 Mix Compaction Main Flow</h4><p><strong>File</strong>: <code>internal/datanode/compactor/mix_compactor.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span 
class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *mixCompactionTask)</span></span> mergeSplit(ctx context.Context) ([]*datapb.CompactionSegment, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// Create the ID allocators</span></span><br><span class="line">    segIDAlloc := allocator.NewLocalAllocator(</span><br><span class="line">        t.plan.GetPreAllocatedSegmentIDs().GetBegin(), </span><br><span class="line">        t.plan.GetPreAllocatedSegmentIDs().GetEnd())</span><br><span class="line">    logIDAlloc := allocator.NewLocalAllocator(</span><br><span class="line">        t.plan.GetPreAllocatedLogIDs().GetBegin(), </span><br><span class="line">        t.plan.GetPreAllocatedLogIDs().GetEnd())</span><br><span class="line">    compAlloc := NewCompactionAllocator(segIDAlloc, logIDAlloc)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Create the MultiSegmentWriter</span></span><br><span class="line">    mWriter, err := NewMultiSegmentWriter(ctx, t.binlogIO, compAlloc, </span><br><span class="line">        t.plan.GetMaxSize(), t.plan.GetSchema(), t.compactionParams, </span><br><span class="line">        t.maxRows, t.partitionID, t.collectionID, t.GetChannelName(), <span class="number">4096</span>)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, 
err</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    deletedRowCount := <span class="type">int64</span>(<span class="number">0</span>)</span><br><span class="line">    expiredRowCount := <span class="type">int64</span>(<span class="number">0</span>)</span><br><span class="line"></span><br><span class="line">    <span class="comment">// Process each source segment</span></span><br><span class="line">    <span class="keyword">for</span> _, seg := <span class="keyword">range</span> t.plan.GetSegmentBinlogs() &#123;</span><br><span class="line">        del, exp, err := t.writeSegment(ctx, seg, mWriter, pkField)</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">        &#125;</span><br><span class="line">        deletedRowCount += del</span><br><span class="line">        expiredRowCount += exp</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Close the writer to finish writing</span></span><br><span class="line">    <span class="keyword">if</span> err := mWriter.Close(); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> mWriter.GetCompactionSegments(), <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="4-2-2-写入单个-Segment"><a href="#4-2-2-写入单个-Segment" class="headerlink" title="4.2.2 写入单个 Segment"></a>4.2.2 Writing a Single Segment</h4><p><strong>File</strong>: <code>internal/datanode/compactor/mix_compactor.go</code></p><figure class="highlight go"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span 
class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *mixCompactionTask)</span></span> writeSegment(ctx context.Context,</span><br><span class="line">    seg *datapb.CompactionSegmentBinlogs,</span><br><span class="line">    mWriter *MultiSegmentWriter, </span><br><span class="line">    pkField *schemapb.FieldSchema) (deletedRowCount, expiredRowCount <span class="type">int64</span>, err <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. 
Read the delta logs (delete records)</span></span><br><span class="line">    deltaPaths := <span class="built_in">make</span>([]<span class="type">string</span>, <span class="number">0</span>)</span><br><span class="line">    <span class="keyword">for</span> _, fieldBinlog := <span class="keyword">range</span> seg.GetDeltalogs() &#123;</span><br><span class="line">        <span class="keyword">for</span> _, binlog := <span class="keyword">range</span> fieldBinlog.GetBinlogs() &#123;</span><br><span class="line">            deltaPaths = <span class="built_in">append</span>(deltaPaths, binlog.GetLogPath())</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    delta, err := compaction.ComposeDeleteFromDeltalogs(ctx, t.binlogIO, deltaPaths)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    entityFilter := compaction.NewEntityFilter(delta, t.plan.GetCollectionTtl(), t.currentTime)</span><br><span class="line"></span><br><span class="line">    <span class="comment">// 2. 
Create a RecordReader to read the original data</span></span><br><span class="line">    <span class="keyword">var</span> reader storage.RecordReader</span><br><span class="line">    <span class="keyword">if</span> seg.GetManifest() != <span class="string">&quot;&quot;</span> &#123;</span><br><span class="line">        reader, err = storage.NewManifestRecordReader(ctx, seg.GetManifest(), </span><br><span class="line">            t.plan.GetSchema(), ...)</span><br><span class="line">    &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">        reader, err = storage.NewBinlogRecordReader(ctx, seg.GetFieldBinlogs(), </span><br><span class="line">            t.plan.GetSchema(), ...)</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">defer</span> reader.Close()</span><br><span class="line"></span><br><span class="line">    <span class="comment">// 3. 
Read and write the data in a loop</span></span><br><span class="line">    <span class="keyword">for</span> &#123;</span><br><span class="line">        <span class="keyword">var</span> r storage.Record</span><br><span class="line">        r, err = reader.Next()</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">if</span> err == sio.EOF &#123;</span><br><span class="line">                err = <span class="literal">nil</span></span><br><span class="line">                <span class="keyword">break</span></span><br><span class="line">            &#125;</span><br><span class="line">            <span class="keyword">return</span></span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="comment">// 4. Filter out deleted and expired rows</span></span><br><span class="line">        <span class="keyword">var</span> (</span><br><span class="line">            pkArray    = r.Column(pkField.FieldID)</span><br><span class="line">            tsArray    = r.Column(common.TimeStampField).(*array.Int64)  <span class="comment">// ★ read the timestamp column</span></span><br><span class="line">            sliceStart = <span class="number">-1</span></span><br><span class="line">            rb         *storage.RecordBuilder</span><br><span class="line">        )</span><br><span class="line"></span><br><span class="line">        <span class="keyword">for</span> i := <span class="keyword">range</span> r.Len() &#123;</span><br><span class="line">            <span class="comment">// Get the PK and timestamp</span></span><br><span class="line">            <span class="keyword">var</span> pk any</span><br><span class="line">            <span class="keyword">switch</span> pkField.DataType &#123;</span><br><span class="line">            <span class="keyword">case</span> schemapb.DataType_Int64:</span><br><span class="line">                pk = 
pkArray.(*array.Int64).Value(i)</span><br><span class="line">            <span class="keyword">case</span> schemapb.DataType_VarChar:</span><br><span class="line">                pk = pkArray.(*array.String).Value(i)</span><br><span class="line">            &#125;</span><br><span class="line">            ts := typeutil.Timestamp(tsArray.Value(i))</span><br><span class="line">            </span><br><span class="line">            <span class="comment">// Filter by timestamp and delete records</span></span><br><span class="line">            <span class="keyword">if</span> entityFilter.Filtered(pk, ts) &#123;</span><br><span class="line">                <span class="keyword">if</span> rb == <span class="literal">nil</span> &#123;</span><br><span class="line">                    rb = storage.NewRecordBuilder(t.plan.GetSchema())</span><br><span class="line">                &#125;</span><br><span class="line">                <span class="keyword">if</span> sliceStart != <span class="number">-1</span> &#123;</span><br><span class="line">                    rb.Append(r, sliceStart, i)</span><br><span class="line">                &#125;</span><br><span class="line">                sliceStart = <span class="number">-1</span></span><br><span class="line">                <span class="keyword">continue</span></span><br><span class="line">            &#125;</span><br><span class="line"></span><br><span class="line">            <span class="keyword">if</span> sliceStart == <span class="number">-1</span> &#123;</span><br><span class="line">                sliceStart = i</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="comment">// 5. 
Write the filtered data</span></span><br><span class="line">        <span class="keyword">if</span> rb != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">if</span> sliceStart != <span class="number">-1</span> &#123;</span><br><span class="line">                rb.Append(r, sliceStart, r.Len())</span><br><span class="line">            &#125;</span><br><span class="line">            <span class="keyword">if</span> rb.GetRowNum() &gt; <span class="number">0</span> &#123;</span><br><span class="line">                rec := rb.Build()</span><br><span class="line">                <span class="keyword">defer</span> rec.Release()</span><br><span class="line">                err := mWriter.Write(rec)  <span class="comment">// ★ write the Record (timestamp included)</span></span><br><span class="line">                <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">                    <span class="keyword">return</span> <span class="number">0</span>, <span class="number">0</span>, err</span><br><span class="line">                &#125;</span><br><span class="line">            &#125;</span><br><span class="line">        &#125; <span class="keyword">else</span> &#123;</span><br><span class="line">            err := mWriter.Write(r)</span><br><span class="line">            <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">                <span class="keyword">return</span> <span class="number">0</span>, <span class="number">0</span>, err</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="keyword">return</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="4-2-3-MultiSegmentWriter-写入"><a href="#4-2-3-MultiSegmentWriter-写入" class="headerlink" 
title="4.2.3 MultiSegmentWriter 写入"></a>4.2.3 Writing with MultiSegmentWriter</h4><p><strong>File</strong>: <code>internal/datanode/compactor/segment_writer.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br></pre></td><td 
class="code"><pre><span class="line"><span class="keyword">type</span> MultiSegmentWriter <span class="keyword">struct</span> &#123;</span><br><span class="line">    ctx       context.Context</span><br><span class="line">    binlogIO  io.BinlogIO</span><br><span class="line">    allocator *compactionAlloactor</span><br><span class="line"></span><br><span class="line">    writer           *storage.BinlogValueWriter  <span class="comment">// underlying writer</span></span><br><span class="line">    currentSegmentID typeutil.UniqueID</span><br><span class="line"></span><br><span class="line">    maxRows     <span class="type">int64</span></span><br><span class="line">    segmentSize <span class="type">int64</span></span><br><span class="line">    </span><br><span class="line">    schema        *schemapb.CollectionSchema</span><br><span class="line">    partitionID   <span class="type">int64</span></span><br><span class="line">    collectionID  <span class="type">int64</span></span><br><span class="line">    </span><br><span class="line">    res []*datapb.CompactionSegment</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(w *MultiSegmentWriter)</span></span> Write(r storage.Record) <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// Check whether to rotate to a new segment</span></span><br><span class="line">    <span class="keyword">if</span> w.writer == <span class="literal">nil</span> || w.writer.GetWrittenUncompressed() &gt;= <span class="type">uint64</span>(w.segmentSize) &#123;</span><br><span class="line">        <span class="keyword">if</span> err := w.rotateWriter(); err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> err</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span 
class="line">    <span class="comment">// Write the Record through the underlying writer</span></span><br><span class="line">    <span class="keyword">return</span> w.writer.Write(r)  <span class="comment">// ★ the Record contains all fields (including timestamp)</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(w *MultiSegmentWriter)</span></span> rotateWriter() <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// Close the current writer</span></span><br><span class="line">    <span class="keyword">if</span> err := w.closeWriter(); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// Allocate a new segment ID</span></span><br><span class="line">    newSegmentID, err := w.allocator.allocSegmentID()</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line">    w.currentSegmentID = newSegmentID</span><br><span class="line"></span><br><span class="line">    <span class="comment">// Create a new BinlogRecordWriter</span></span><br><span class="line">    rw, err := storage.NewBinlogRecordWriter(w.ctx, w.collectionID, </span><br><span class="line">        w.partitionID, newSegmentID, w.schema, w.allocator.logIDAlloc, </span><br><span class="line">        chunkSize, w.maxRows, w.rwOption...)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span 
class="comment">// Wrap it in a BinlogValueWriter</span></span><br><span class="line">    w.writer = storage.NewBinlogValueWriter(rw, w.batchSize)</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="4-2-4-底层-Record-Writer"><a href="#4-2-4-底层-Record-Writer" class="headerlink" title="4.2.4 底层 Record Writer"></a>4.2.4 Underlying Record Writer</h4><p><strong>File</strong>: <code>internal/storage/binlog_record_writer.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(pw *PackedBinlogRecordWriter)</span></span> Write(r Record) <span class="type">error</span> &#123;</span><br><span class="line">    <span 
class="keyword">if</span> err := pw.initWriters(r); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// 1. Extract the timestamp range</span></span><br><span class="line">    tsArray := r.Column(common.TimeStampField).(*array.Int64)</span><br><span class="line">    rows := r.Len()</span><br><span class="line">    <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; rows; i++ &#123;</span><br><span class="line">        ts := typeutil.Timestamp(tsArray.Value(i))</span><br><span class="line">        <span class="keyword">if</span> ts &lt; pw.tsFrom &#123;</span><br><span class="line">            pw.tsFrom = ts</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">if</span> ts &gt; pw.tsTo &#123;</span><br><span class="line">            pw.tsTo = ts</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// 2. Collect statistics</span></span><br><span class="line">    <span class="keyword">if</span> err := pw.pkCollector.Collect(r); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">if</span> err := pw.bm25Collector.Collect(r); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// 3. 
Write the data (all fields included)</span></span><br><span class="line">    err := pw.writer.Write(r)  <span class="comment">// ★ the complete Arrow Record</span></span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    pw.writtenUncompressed = pw.writer.GetWrittenUncompressed()</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><hr><h2 id="5-Storage-层的-Timestamp-处理"><a href="#5-Storage-层的-Timestamp-处理" class="headerlink" title="5. Storage 层的 Timestamp 处理"></a>5. Timestamp Handling in the Storage Layer</h2><h3 id="5-1-Growing-Segment-中的-Timestamp"><a href="#5-1-Growing-Segment-中的-Timestamp" class="headerlink" title="5.1 Growing Segment 中的 Timestamp"></a>5.1 Timestamp in Growing Segments</h3><p><strong>File</strong>: <code>internal/core/src/segcore/SegmentGrowingImpl.cpp</code></p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="type">void</span> <span class="title">SegmentGrowingImpl::load_field_data_common</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function">    
FieldId field_id,</span></span></span><br><span class="line"><span class="params"><span class="function">    <span class="type">size_t</span> reserved_offset,</span></span></span><br><span class="line"><span class="params"><span class="function">    <span class="type">const</span> std::vector&lt;FieldDataPtr&gt;&amp; field_data,</span></span></span><br><span class="line"><span class="params"><span class="function">    FieldId primary_field_id,</span></span></span><br><span class="line"><span class="params"><span class="function">    <span class="type">size_t</span> num_rows)</span> </span>&#123;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Special handling for the timestamp field</span></span><br><span class="line">    <span class="keyword">if</span> (field_id == TimestampFieldID) &#123;</span><br><span class="line">        <span class="comment">// query node already guarantees that the timestamp is ordered</span></span><br><span class="line">        <span class="comment">// fill into Segment.ConcurrentVector</span></span><br><span class="line">        insert_record_.timestamps_.<span class="built_in">set_data_raw</span>(reserved_offset, field_data);</span><br><span class="line">        <span class="keyword">return</span>;</span><br><span class="line">    &#125;</span><br><span class="line"></span><br><span class="line">    <span class="comment">// Handle the other fields...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="5-2-Segment-Writer-中的-Timestamp-追踪"><a href="#5-2-Segment-Writer-中的-Timestamp-追踪" class="headerlink" title="5.2 Segment Writer 中的 Timestamp 追踪"></a>5.2 Timestamp Tracking in the Segment Writer</h3><p><strong>File</strong>: <code>internal/datanode/compactor/segment_writer.go</code></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(w *SegmentWriter)</span></span> WriteRecord(r storage.Record) <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// 1. 
Extract timestamps and update the range</span></span><br><span class="line">    tsArray := r.Column(common.TimeStampField).(*array.Int64)</span><br><span class="line">    rows := r.Len()</span><br><span class="line">    <span class="keyword">for</span> i := <span class="number">0</span>; i &lt; rows; i++ &#123;</span><br><span class="line">        ts := typeutil.Timestamp(tsArray.Value(i))</span><br><span class="line">        <span class="keyword">if</span> ts &lt; w.tsFrom &#123;</span><br><span class="line">            w.tsFrom = ts</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">if</span> ts &gt; w.tsTo &#123;</span><br><span class="line">            w.tsTo = ts</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        <span class="comment">// 2. Update the PK statistics</span></span><br><span class="line">        <span class="keyword">switch</span> schemapb.DataType(w.pkstats.PkType) &#123;</span><br><span class="line">        <span class="keyword">case</span> schemapb.DataType_Int64:</span><br><span class="line">            pkArray := r.Column(w.GetPkID()).(*array.Int64)</span><br><span class="line">            pk := &amp;storage.Int64PrimaryKey&#123;Value: pkArray.Value(i)&#125;</span><br><span class="line">            w.pkstats.Update(pk)</span><br><span class="line">        <span class="keyword">case</span> schemapb.DataType_VarChar:</span><br><span class="line">            pkArray := r.Column(w.GetPkID()).(*array.String)</span><br><span class="line">            pk := &amp;storage.VarCharPrimaryKey&#123;Value: pkArray.Value(i)&#125;</span><br><span class="line">            w.pkstats.Update(pk)</span><br><span class="line">        &#125;</span><br><span class="line"></span><br><span class="line">        w.rowCount.Inc()</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
Write the complete Record (including timestamp)</span></span><br><span class="line">    <span class="keyword">return</span> w.writer.Write(r)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(w *SegmentWriter)</span></span> GetTimeRange() *writebuffer.TimeRange &#123;</span><br><span class="line">    <span class="keyword">return</span> writebuffer.NewTimeRange(w.tsFrom, w.tsTo)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><hr><h2 id="6-两种存储版本对比"><a href="#6-两种存储版本对比" class="headerlink" title="6. 两种存储版本对比"></a>6. Comparing the Two Storage Versions</h2><h3 id="6-1-StorageV1-Binlog-格式"><a href="#6-1-StorageV1-Binlog-格式" class="headerlink" title="6.1 StorageV1 (Binlog 格式)"></a>6.1 StorageV1 (Binlog Format)</h3><p><strong>Characteristics</strong>:</p><ul><li>One separate binlog file per field</li><li>The Timestamp field maps to the file <code>&#123;segmentID&#125;/insert_log/1/&#123;logID&#125;</code></li><li>Serialized with <code>InsertCodec</code></li><li>Stored in Protobuf format</li></ul><p><strong>File layout</strong>:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">&#123;collectionID&#125;/&#123;partitionID&#125;/&#123;segmentID&#125;/</span><br><span class="line">├── insert_log/</span><br><span class="line">│   ├── 0/          # RowID field</span><br><span class="line">│   │   └── &#123;logID&#125;</span><br><span class="line">│   ├── 1/          # Timestamp field ★</span><br><span class="line">│   │   └── 
&#123;logID&#125;</span><br><span class="line">│   ├── 100/        # User field 1</span><br><span class="line">│   │   └── &#123;logID&#125;</span><br><span class="line">│   └── 101/        # User field 2</span><br><span class="line">│       └── &#123;logID&#125;</span><br><span class="line">├── delta_log/</span><br><span class="line">│   └── &#123;logID&#125;</span><br><span class="line">└── stats_log/</span><br><span class="line">    └── &#123;logID&#125;</span><br></pre></td></tr></table></figure><p><strong>Code path</strong>:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">BulkPackWriter.Write()</span><br><span class="line">  └─&gt; storageV1Serializer.serializeBinlog()</span><br><span class="line">      └─&gt; InsertCodec.Serialize()</span><br><span class="line">          └─&gt; create one binlog per field</span><br></pre></td></tr></table></figure><h3 id="6-2-StorageV2-Parquet-格式"><a href="#6-2-StorageV2-Parquet-格式" class="headerlink" title="6.2 StorageV2 (Parquet 格式)"></a>6.2 StorageV2 (Parquet Format)</h3><p><strong>Characteristics</strong>:</p><ul><li>Columnar storage built on Apache Arrow + Parquet</li><li>Timestamp is stored as one column of the Arrow Record</li><li>Supports column group optimization</li><li>Higher compression ratio and better query performance</li></ul><p><strong>File layout</strong>:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">&#123;collectionID&#125;/&#123;partitionID&#125;/&#123;segmentID&#125;/</span><br><span class="line">├── insert_log/</span><br><span class="line">│   ├── 
&#123;columnGroupID&#125;/</span><br><span class="line">│   │   └── &#123;logID&#125;.parquet  # holds multiple fields (including timestamp)</span><br><span class="line">│   └── manifest.json         # metadata manifest</span><br><span class="line">├── delta_log/</span><br><span class="line">│   └── &#123;logID&#125;</span><br><span class="line">└── stats_log/</span><br><span class="line">    └── &#123;logID&#125;</span><br></pre></td></tr></table></figure><p><strong>Code path</strong>:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">BulkPackWriterV2.Write()</span><br><span class="line">  └─&gt; serializeBinlog() -&gt; Arrow Record</span><br><span class="line">      └─&gt; PackedRecordWriter.Write()</span><br><span class="line">          └─&gt; Parquet Writer (calls into C++ via FFI)</span><br></pre></td></tr></table></figure><h3 id="6-3-对比表"><a href="#6-3-对比表" class="headerlink" title="6.3 对比表"></a>6.3 Comparison Table</h3><table><thead><tr><th>Feature</th><th>StorageV1 (Binlog)</th><th>StorageV2 (Parquet)</th></tr></thead><tbody><tr><td><strong>File format</strong></td><td>Protobuf + custom encoding</td><td>Apache Parquet</td></tr><tr><td><strong>Timestamp storage</strong></td><td>Separate binlog file (FieldID&#x3D;1)</td><td>One column of the Arrow Record</td></tr><tr><td><strong>Number of files</strong></td><td>Many (one per field)</td><td>Few (organized by column group)</td></tr><tr><td><strong>Compression ratio</strong></td><td>Medium</td><td>High</td></tr><tr><td><strong>Query performance</strong></td><td>Must read multiple files</td><td>Columnar access is more efficient</td></tr><tr><td><strong>Metadata</strong></td><td>Separate metadata per binlog</td><td>Managed by a single manifest</td></tr><tr><td><strong>Serialization class</strong></td><td><code>InsertCodec</code></td><td>Arrow Builder</td></tr><tr><td><strong>Writer class</strong></td><td><code>BulkPackWriter</code></td><td><code>BulkPackWriterV2</code></td></tr></tbody></table><hr><h2 id="7-总结"><a href="#7-总结" class="headerlink" title="7. 总结"></a>7. 
Summary</h2><h3 id="7-1-核心要点"><a href="#7-1-核心要点" class="headerlink" title="7.1 核心要点"></a>7.1 Key Points</h3><ol><li><p><strong>Timestamps are fully persisted</strong></p><ul><li>As the system field with FieldID&#x3D;1, Timestamp is fully persisted on every write path</li><li>Both Flush and Compaction preserve the complete timestamp information</li></ul></li><li><p><strong>Two main write paths</strong></p><ul><li><strong>Flush</strong>: the main path for regular data writes, triggered from the WriteBuffer</li><li><strong>Compaction</strong>: the segment merge and optimization path, which reads old data and rewrites it</li></ul></li><li><p><strong>Storage format evolution</strong></p><ul><li><strong>StorageV1</strong>: a separate binlog file for each field</li><li><strong>StorageV2</strong>: columnar Parquet format, more efficient</li></ul></li><li><p><strong>Timestamp serves several purposes</strong></p><ul><li>MVCC version control and visibility checks</li><li>Cleanup of TTL-expired data</li><li>Query optimization (filtering by timestamp range)</li><li>Binlog metadata (TimestampFrom&#x2F;TimestampTo)</li></ul></li></ol><h3 id="7-2-数据流向图"><a href="#7-2-数据流向图" class="headerlink" title="7.2 数据流向图"></a>7.2 Data Flow Diagram</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line">User data Insert/Delete</span><br><span class="line">    ↓</span><br><span class="line">[Memory] WriteBuffer</span><br><span class="line">    ↓ (Buffer Full / Time Trigger)</span><br><span class="line">[Flush] SyncTask</span><br><span class="line">    ↓</span><br><span class="line">[Serialize] InsertCodec / Arrow Builder</span><br><span class="line">    ↓</span><br><span class="line">[Persist] Binlog / Parquet files</span><br><span class="line">    ↓ (many small segments)</span><br><span class="line">[Compaction] Mix/Clustering/L0 
Compactor</span><br><span class="line">    ↓</span><br><span class="line">[Merge] MultiSegmentWriter</span><br><span class="line">    ↓</span><br><span class="line">[Persist] new segment files</span><br></pre></td></tr></table></figure><h3 id="7-3-相关文件索引"><a href="#7-3-相关文件索引" class="headerlink" title="7.3 相关文件索引"></a>7.3 Index of Related Files</h3><p><strong>Flush path</strong>:</p><ul><li><code>internal/flushcommon/writebuffer/write_buffer.go</code> - WriteBuffer management</li><li><code>internal/flushcommon/syncmgr/task.go</code> - SyncTask execution</li><li><code>internal/flushcommon/syncmgr/pack_writer.go</code> - StorageV1 writes</li><li><code>internal/flushcommon/syncmgr/pack_writer_v2.go</code> - StorageV2 writes</li><li><code>internal/flushcommon/syncmgr/storage_serializer.go</code> - serializer</li></ul><p><strong>Compaction path</strong>:</p><ul><li><code>internal/datanode/compactor/mix_compactor.go</code> - Mix Compaction</li><li><code>internal/datanode/compactor/segment_writer.go</code> - Segment Writer</li><li><code>internal/datanode/compactor/clustering_compactor.go</code> - Clustering Compaction</li></ul><p><strong>Storage layer</strong>:</p><ul><li><code>internal/storage/data_codec.go</code> - InsertCodec serialization</li><li><code>internal/storage/binlog_record_writer.go</code> - binlog writes</li><li><code>internal/storage/record_writer.go</code> - Parquet writes</li><li><code>internal/storage/serde_events.go</code> - event serialization</li></ul><p><strong>C++ core</strong>:</p><ul><li><code>internal/core/src/segcore/SegmentGrowingImpl.cpp</code> - Growing Segment</li><li><code>internal/core/src/segcore/ChunkedSegmentSealedImpl.cpp</code> - Sealed Segment</li></ul><h3 id="7-4-扩展阅读"><a href="#7-4-扩展阅读" class="headerlink" title="7.4 扩展阅读"></a>7.4 Further Reading</h3><p>Related design documents:</p><ul><li><code>docs/developer_guides/flush_pipeline_write_buffer_manager.md</code></li><li><code>docs/developer_guides/flush_pipeline_flush_checkpoint.md</code></li><li><code>docs/design_docs/segcore/segment_sealed.md</code></li></ul><hr><h2 id="附录：术语表"><a href="#附录：术语表" 
class="headerlink" title="附录：术语表"></a>Appendix: Glossary</h2><table><thead><tr><th>Term</th><th>Description</th></tr></thead><tbody><tr><td><strong>MVCC</strong></td><td>Multi-Version Concurrency Control</td></tr><tr><td><strong>Timestamp</strong></td><td>Reserved system field (FieldID&#x3D;1) used for MVCC version control</td></tr><tr><td><strong>Flush</strong></td><td>The process of persisting in-memory data to storage</td></tr><tr><td><strong>Compaction</strong></td><td>Segment merge optimization that combines multiple small segments into larger ones</td></tr><tr><td><strong>WriteBuffer</strong></td><td>In-memory buffer for incoming writes</td></tr><tr><td><strong>SyncTask</strong></td><td>Data sync task responsible for persisting buffered data</td></tr><tr><td><strong>Binlog</strong></td><td>Binary log, one of Milvus's data file formats</td></tr><tr><td><strong>Parquet</strong></td><td>Apache Parquet, a columnar storage format used by StorageV2</td></tr><tr><td><strong>Arrow Record</strong></td><td>Apache Arrow's record format, an in-memory columnar data representation</td></tr><tr><td><strong>Growing Segment</strong></td><td>A segment that is still growing and accepting writes</td></tr><tr><td><strong>Sealed Segment</strong></td><td>A sealed segment that no longer accepts new data</td></tr><tr><td><strong>InsertCodec</strong></td><td>Codec for insert data, used by StorageV1</td></tr><tr><td><strong>TTL</strong></td><td>Time To Live, the retention period for data</td></tr></tbody></table><hr><p><strong>Document version</strong>: v1.1<br><strong>Last updated</strong>: 2025-01-24<br><strong>Applicable Milvus version</strong>: 2.5.x</p><hr><h1 id="附录-A：Segment-内部-Layout"><a href="#附录-A：Segment-内部-Layout" class="headerlink" title="附录 A：Segment 内部 Layout"></a>Appendix A: Segment Internal Layout</h1><h2 id="A-1-Segment-类型概述"><a href="#A-1-Segment-类型概述" class="headerlink" title="A.1 Segment 类型概述"></a>A.1 Overview of Segment Types</h2><p>Milvus has two main segment types:</p><h3 id="Growing-Segment-增长中的-Segment"><a href="#Growing-Segment-增长中的-Segment" class="headerlink" title="Growing Segment (增长中的 Segment)"></a>Growing Segment</h3><ul><li><strong>State</strong>: <code>SegmentState_Growing</code></li><li><strong>Characteristics</strong>: currently accepting writes</li><li><strong>Implementation class</strong>: <code>SegmentGrowingImpl</code></li><li><strong>Data structure</strong>: uses <code>ConcurrentVector</code> 
containers, supporting concurrent writes</li><li><strong>Index</strong>: optional interim index (a temporary index)</li></ul><h3 id="Sealed-Segment-封存的-Segment"><a href="#Sealed-Segment-封存的-Segment" class="headerlink" title="Sealed Segment (封存的 Segment)"></a>Sealed Segment</h3><ul><li><strong>State</strong>: <code>SegmentState_Sealed</code> &#x2F; <code>SegmentState_Flushing</code> &#x2F; <code>SegmentState_Flushed</code></li><li><strong>Characteristics</strong>: no longer accepts writes; data has been persisted</li><li><strong>Implementation class</strong>: <code>ChunkedSegmentSealedImpl</code></li><li><strong>Data structure</strong>: stored via <code>ChunkedColumnInterface</code>, with support for mmap and lazy loading</li><li><strong>Index</strong>: can load persistent indexes</li></ul><hr><h2 id="A-2-Growing-Segment-内部结构"><a href="#A-2-Growing-Segment-内部结构" class="headerlink" title="A.2 Growing Segment 内部结构"></a>A.2 Growing Segment Internal Structure</h2><h3 id="A-2-1-核心组件"><a href="#A-2-1-核心组件" class="headerlink" title="A.2.1 核心组件"></a>A.2.1 Core Components</h3><p><strong>File</strong>: <code>internal/core/src/segcore/SegmentGrowingImpl.h</code></p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title 
class_">SegmentGrowingImpl</span> : <span class="keyword">public</span> SegmentGrowing &#123;</span><br><span class="line"><span class="keyword">private</span>:</span><br><span class="line">    <span class="comment">// 1. Schema and metadata</span></span><br><span class="line">    SchemaPtr schema_;                    <span class="comment">// Collection schema</span></span><br><span class="line">    IndexMetaPtr index_meta_;             <span class="comment">// Index metadata</span></span><br><span class="line">    SegcoreConfig segcore_config_;        <span class="comment">// Segment config</span></span><br><span class="line">    <span class="type">int64_t</span> id_;                          <span class="comment">// Segment ID</span></span><br><span class="line"></span><br><span class="line">    <span class="comment">// 2. Insert record ★ core data structure</span></span><br><span class="line">    InsertRecord&lt;<span class="literal">false</span>&gt; insert_record_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Indexing record (Growing Index)</span></span><br><span class="line">    IndexingRecord indexing_record_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Deleted record</span></span><br><span class="line">    DeletedRecord&lt;<span class="literal">false</span>&gt; deleted_record_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. Statistics</span></span><br><span class="line">    SegmentStats stats_&#123;&#125;;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 6. Concurrency control</span></span><br><span class="line">    <span class="keyword">mutable</span> std::shared_mutex chunk_mutex_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 7. 
mmap management</span></span><br><span class="line">    storage::MmapChunkDescriptorPtr mmap_descriptor_;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><h3 id="A-2-2-InsertRecord-数据布局"><a href="#A-2-2-InsertRecord-数据布局" class="headerlink" title="A.2.2 InsertRecord 数据布局"></a>A.2.2 InsertRecord Data Layout</h3><p><strong>File</strong>: <code>internal/core/src/segcore/InsertRecord.h</code></p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">InsertRecordGrowing</span> &#123;</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line">    <span class="comment">// ★ Timestamp column (system field)</span></span><br><span class="line">    ConcurrentVector&lt;Timestamp&gt; timestamps_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// ★ Timestamp index (for time-travel queries)</span></span><br><span class="line">    TimestampIndex timestamp_index_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// ★ PK → 
Offset map (for fast lookup)</span></span><br><span class="line">    std::unique_ptr&lt;OffsetMap&gt; pk2offset_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// ★ Pre-allocated space</span></span><br><span class="line">    std::atomic&lt;<span class="type">int64_t</span>&gt; reserved = <span class="number">0</span>;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// ★ Ack responder (for concurrency control)</span></span><br><span class="line">    AckResponder ack_responder_;</span><br><span class="line"></span><br><span class="line"><span class="keyword">private</span>:</span><br><span class="line">    <span class="comment">// ★ Field data storage (map[FieldID] -&gt; Vector)</span></span><br><span class="line">    std::unordered_map&lt;FieldId, std::unique_ptr&lt;VectorBase&gt;&gt; data_&#123;&#125;;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// ★ Valid data for nullable fields</span></span><br><span class="line">    std::unordered_map&lt;FieldId, ThreadSafeValidDataPtr&gt; valid_data_&#123;&#125;;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Concurrency protection</span></span><br><span class="line">    <span class="keyword">mutable</span> std::shared_mutex shared_mutex_&#123;&#125;;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><h3 id="A-2-3-数据组织方式"><a href="#A-2-3-数据组织方式" class="headerlink" title="A.2.3 数据组织方式"></a>A.2.3 Data Organization</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span 
class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line">Growing Segment Memory Layout:</span><br><span class="line">┌─────────────────────────────────────────────────┐</span><br><span class="line">│  SegmentGrowingImpl                              │</span><br><span class="line">├─────────────────────────────────────────────────┤</span><br><span class="line">│  insert_record_ (InsertRecordGrowing)           │</span><br><span class="line">│  ├─ timestamps_: ConcurrentVector&lt;Timestamp&gt;    │  ★ Timestamp column</span><br><span class="line">│  ├─ timestamp_index_: TimestampIndex           │  ★ Timestamp index</span><br><span class="line">│  ├─ pk2offset_: OffsetMap                       │  ★ PK index</span><br><span class="line">│  └─ data_: map&lt;FieldId, VectorBase&gt;            │  ★ Field data</span><br><span class="line">│      ├─ FieldID=100: ConcurrentVector&lt;int64&gt;   │     (user field 1)</span><br><span class="line">│      ├─ FieldID=101: ConcurrentVector&lt;float&gt;   │     (user field 2)</span><br><span class="line">│      ├─ FieldID=102: ConcurrentVector&lt;vector&gt;  │     (vector field)</span><br><span class="line">│      └─ ...                                     
│</span><br><span class="line">├─────────────────────────────────────────────────┤</span><br><span class="line">│  indexing_record_ (IndexingRecord)              │</span><br><span class="line">│  ├─ Chunk-level indexes                        │  ★ Per-chunk indexes</span><br><span class="line">│  └─ Interim vector indexes                     │  ★ Interim vector indexes</span><br><span class="line">├─────────────────────────────────────────────────┤</span><br><span class="line">│  deleted_record_ (DeletedRecord)                │</span><br><span class="line">│  └─ Deleted PKs with timestamps                │  ★ Deleted records</span><br><span class="line">└─────────────────────────────────────────────────┘</span><br></pre></td></tr></table></figure><h3 id="A-2-4-Chunk-机制"><a href="#A-2-4-Chunk-机制" class="headerlink" title="A.2.4 Chunk 机制"></a>A.2.4 Chunk Mechanism</h3><p>A Growing Segment organizes its data in <strong>chunks</strong>:</p><ul><li><strong>Chunk Size</strong>: 32768 rows by default (configurable)</li><li><strong>Purpose</strong>: efficient storage and querying of large-scale data</li><li><strong>Index</strong>: each chunk can be indexed independently</li></ul><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Code example</span></span><br><span class="line"><span class="type">int64_t</span> chunk_rows = segcore_config_.<span class="built_in">get_chunk_rows</span>();  <span class="comment">// default 32768</span></span><br><span class="line"><span class="keyword">auto</span> chunk_id = offset / chunk_rows;</span><br><span class="line"><span class="keyword">auto</span> offset_in_chunk = offset % chunk_rows;</span><br></pre></td></tr></table></figure><hr><h2 id="A-3-Sealed-Segment-内部结构"><a href="#A-3-Sealed-Segment-内部结构" class="headerlink" title="A.3 Sealed Segment 内部结构"></a>A.3 Sealed Segment Internal Structure</h2><h3 id="A-3-1-核心组件"><a href="#A-3-1-核心组件" class="headerlink" title="A.3.1 核心组件"></a>A.3.1 
Core Components</h3><p><strong>File</strong>: <code>internal/core/src/segcore/ChunkedSegmentSealedImpl.h</code></p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br></pre></td><td 
class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">ChunkedSegmentSealedImpl</span> : <span class="keyword">public</span> SegmentSealed &#123;</span><br><span class="line"><span class="keyword">private</span>:</span><br><span class="line">    <span class="comment">// 1. Schema and metadata</span></span><br><span class="line">    SchemaPtr schema_;</span><br><span class="line">    IndexMetaPtr col_index_meta_;</span><br><span class="line">    <span class="type">int64_t</span> id_;</span><br><span class="line">    SegcoreConfig segcore_config_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Row count</span></span><br><span class="line">    std::optional&lt;<span class="type">int64_t</span>&gt; num_rows_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Load state</span></span><br><span class="line">    BitsetType field_data_ready_bitset_;     <span class="comment">// whether field data is loaded</span></span><br><span class="line">    BitsetType index_ready_bitset_;          <span class="comment">// whether the index is loaded</span></span><br><span class="line">    BitsetType binlog_index_bitset_;         <span class="comment">// binlog index flags</span></span><br><span class="line">    std::atomic&lt;<span class="type">int</span>&gt; system_ready_count_;    <span class="comment">// system field counter</span></span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. ★ Field data storage (columnar)</span></span><br><span class="line">    folly::Synchronized&lt;std::unordered_map&lt;</span><br><span class="line">        FieldId, </span><br><span class="line">        std::shared_ptr&lt;ChunkedColumnInterface&gt;</span><br><span class="line">    &gt;&gt; fields_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. 
★ Index storage</span></span><br><span class="line">    <span class="comment">// 5.1 Scalar indexes</span></span><br><span class="line">    folly::Synchronized&lt;std::unordered_map&lt;</span><br><span class="line">        FieldId, </span><br><span class="line">        index::CacheIndexBasePtr</span><br><span class="line">    &gt;&gt; scalar_indexings_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5.2 Vector indexes</span></span><br><span class="line">    SealedIndexingRecord vector_indexings_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5.3 N-gram indexes (for text)</span></span><br><span class="line">    folly::Synchronized&lt;std::unordered_map&lt;</span><br><span class="line">        FieldId,</span><br><span class="line">        std::unordered_map&lt;std::string, index::CacheIndexBasePtr&gt;</span><br><span class="line">    &gt;&gt; ngram_indexings_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 6. ★ InsertRecord (system fields and PK only)</span></span><br><span class="line">    InsertRecord&lt;<span class="literal">true</span>&gt; insert_record_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 7. Deleted record</span></span><br><span class="line">    DeletedRecord&lt;<span class="literal">true</span>&gt; deleted_record_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 8. Mmap fields</span></span><br><span class="line">    std::unordered_set&lt;FieldId&gt; mmap_field_ids_;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 9. Statistics</span></span><br><span class="line">    SegmentStats stats_&#123;&#125;;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 10. 
Sorted-by-PK flag</span></span><br><span class="line">    <span class="type">bool</span> is_sorted_by_pk_ = <span class="literal">false</span>;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 11. Storage V2 Reader</span></span><br><span class="line">    std::unique_ptr&lt;milvus_storage::api::Reader&gt; reader_;</span><br><span class="line">&#125;;</span><br></pre></td></tr></table></figure><h3 id="A-3-2-数据布局"><a href="#A-3-2-数据布局" class="headerlink" title="A.3.2 数据布局"></a>A.3.2 Data Layout</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line">Sealed Segment Memory Layout:</span><br><span class="line">┌────────────────────────────────────────────────┐</span><br><span class="line">│  ChunkedSegmentSealedImpl                      │</span><br><span class="line">├────────────────────────────────────────────────┤</span><br><span class="line">│  insert_record_ (InsertRecordSealed)           │</span><br><span class="line">│  ├─ timestamps_: ConcurrentVector&lt;Timestamp&gt;   │  ★ System fields only</span><br><span class="line">│  ├─ 
timestamp_index_: TimestampIndex          │</span><br><span class="line">│  └─ pk2offset_: OffsetOrderedArray            │  ★ PK index (sorted)</span><br><span class="line">├────────────────────────────────────────────────┤</span><br><span class="line">│  fields_: map&lt;FieldId, ChunkedColumnInterface&gt;│  ★ Column data</span><br><span class="line">│  ├─ FieldID=100: ChunkedColumn&lt;int64&gt;         │     (scalar field)</span><br><span class="line">│  │   ├─ Chunk 0: [data...]                    │</span><br><span class="line">│  │   ├─ Chunk 1: [data...]                    │</span><br><span class="line">│  │   └─ ...                                    │</span><br><span class="line">│  ├─ FieldID=102: ChunkedColumn&lt;vector&gt;        │     (vector field)</span><br><span class="line">│  │   └─ may be replaced by the index           │</span><br><span class="line">│  └─ ...                                        │</span><br><span class="line">├────────────────────────────────────────────────┤</span><br><span class="line">│  scalar_indexings_: map&lt;FieldId, Index&gt;       │  ★ Scalar indexes</span><br><span class="line">│  └─ FieldID=100: ScalarIndex                  │</span><br><span class="line">├────────────────────────────────────────────────┤</span><br><span class="line">│  vector_indexings_: SealedIndexingRecord       │  ★ Vector indexes</span><br><span class="line">│  ├─ FieldID=102: VectorIndex (HNSW/IVF/...)  
│</span><br><span class="line">│  └─ index_has_raw_data_: bool                 │     (whether raw data is retained)</span><br><span class="line">├────────────────────────────────────────────────┤</span><br><span class="line">│  deleted_record_: DeletedRecord                │  ★ Deleted records</span><br><span class="line">└────────────────────────────────────────────────┘</span><br></pre></td></tr></table></figure><h3 id="A-3-3-ChunkedColumn-结构"><a href="#A-3-3-ChunkedColumn-结构" class="headerlink" title="A.3.3 ChunkedColumn 结构"></a>A.3.3 ChunkedColumn Structure</h3><p>A Sealed Segment stores each column as a sequence of chunks:</p><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Column interface</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">ChunkedColumnInterface</span> &#123;</span><br><span class="line"><span class="keyword">public</span>:</span><br><span class="line">    <span class="function"><span class="keyword">virtual</span> <span class="type">size_t</span> <span class="title">num_chunks</span><span class="params">()</span> <span class="type">const</span> </span>= <span class="number">0</span>;</span><br><span class="line">    <span class="function"><span class="keyword">virtual</span> <span class="type">size_t</span> <span class="title">chunk_size</span><span class="params">(<span class="type">size_t</span> chunk_id)</span> <span class="type">const</span> </span>= <span 
class="number">0</span>;</span><br><span class="line">    <span class="function"><span class="keyword">virtual</span> SpanBase <span class="title">chunk_data</span><span class="params">(<span class="type">size_t</span> chunk_id)</span> <span class="type">const</span> </span>= <span class="number">0</span>;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// mmap support</span></span><br><span class="line">    <span class="function"><span class="keyword">virtual</span> <span class="type">bool</span> <span class="title">is_mmaped</span><span class="params">()</span> <span class="type">const</span> </span>= <span class="number">0</span>;</span><br><span class="line">&#125;;</span><br><span class="line"></span><br><span class="line"><span class="comment">// Usage example (fields_ is a folly::Synchronized map, so read it under a shared lock)</span></span><br><span class="line"><span class="keyword">auto</span> column = fields_.<span class="built_in">rlock</span>()-&gt;<span class="built_in">at</span>(field_id);</span><br><span class="line"><span class="keyword">for</span> (<span class="type">size_t</span> chunk_id = <span class="number">0</span>; chunk_id &lt; column-&gt;<span class="built_in">num_chunks</span>(); ++chunk_id) &#123;</span><br><span class="line">    <span class="keyword">auto</span> data = column-&gt;<span class="built_in">chunk_data</span>(chunk_id);</span><br><span class="line">    <span class="comment">// process the chunk data</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><hr><h2 id="A-4-文件系统中的-Layout"><a href="#A-4-文件系统中的-Layout" class="headerlink" title="A.4 文件系统中的 Layout"></a>A.4 Layout on the File System</h2><h3 id="A-4-1-StorageV1-Binlog-布局"><a href="#A-4-1-StorageV1-Binlog-布局" class="headerlink" title="A.4.1 StorageV1 (Binlog) 布局"></a>A.4.1 StorageV1 (Binlog) Layout</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line">&#123;root_path&#125;/&#123;collectionID&#125;/&#123;partitionID&#125;/&#123;segmentID&#125;/</span><br><span class="line">├── insert_log/              # insert data</span><br><span class="line">│   ├── 0/                   # RowID field</span><br><span class="line">│   │   └── &#123;logID&#125;          # binlog file</span><br><span class="line">│   ├── 1/                   # ★ Timestamp field</span><br><span class="line">│   │   └── &#123;logID&#125;</span><br><span class="line">│   ├── 100/                 # User field 1</span><br><span class="line">│   │   └── &#123;logID&#125;</span><br><span class="line">│   ├── 101/                 # User field 2</span><br><span class="line">│   │   └── &#123;logID&#125;</span><br><span class="line">│   └── 102/                 # Vector field</span><br><span class="line">│       └── &#123;logID&#125;</span><br><span class="line">│</span><br><span class="line">├── delta_log/               # delete data</span><br><span class="line">│   └── &#123;logID&#125;</span><br><span class="line">│</span><br><span class="line">├── stats_log/               # statistics (PK stats)</span><br><span class="line">│   └── &#123;logID&#125;</span><br><span class="line">│</span><br><span class="line">└── index/                   # index files</span><br><span class="line">    └── 
&#123;fieldID&#125;/</span><br><span class="line">        ├── &#123;indexBuildID&#125;/</span><br><span class="line">        │   ├── index_params</span><br><span class="line">        │   ├── index_info</span><br><span class="line">        │   └── index_data_*</span><br><span class="line">        └── ...</span><br></pre></td></tr></table></figure><h3 id="A-4-2-StorageV2-Parquet-布局"><a href="#A-4-2-StorageV2-Parquet-布局" class="headerlink" title="A.4.2 StorageV2 (Parquet) 布局"></a>A.4.2 StorageV2 (Parquet) Layout</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line">&#123;root_path&#125;/&#123;collectionID&#125;/&#123;partitionID&#125;/&#123;segmentID&#125;/</span><br><span class="line">├── insert_log/</span><br><span class="line">│   ├── &#123;columnGroupID&#125;/</span><br><span class="line">│   │   └── &#123;logID&#125;.parquet    # Parquet file (holds multiple fields)</span><br><span class="line">│   │                           # ★ Timestamp is one of its columns</span><br><span class="line">│   ├── manifest.json           # metadata manifest</span><br><span class="line">│   │   ├── schema</span><br><span class="line">│   │   ├── column_groups</span><br><span class="line">│   │   │   ├─ group_id: 1</span><br><span class="line">│   │   │   │  └─ fields: [0, 1, 100]  # RowID, Timestamp, Field1</span><br><span class="line">│   │   │   └─ group_id: 2</span><br><span class="line">│   │   │      └─ fields: [101, 102]   # Field2, VectorField</span><br><span class="line">│   │   └── files</span><br><span class="line">│   └── ...</span><br><span class="line">│</span><br><span class="line">├── delta_log/</span><br><span class="line">│   └── &#123;logID&#125;</span><br><span class="line">│</span><br><span class="line">├── stats_log/</span><br><span class="line">│   └── &#123;logID&#125;</span><br><span class="line">│</span><br><span class="line">└── index/</span><br><span class="line">    └── ... (same as V1)</span><br></pre></td></tr></table></figure><h3 id="A-4-3-Binlog-文件内部结构"><a href="#A-4-3-Binlog-文件内部结构" class="headerlink" title="A.4.3 Binlog 文件内部结构"></a>A.4.3 Binlog File Internal Structure</h3><p>Each binlog file contains:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line">Binlog File Structure:</span><br><span class="line">┌─────────────────────────────┐</span><br><span class="line">│  Magic Number               │  4 bytes</span><br><span class="line">├─────────────────────────────┤</span><br><span class="line">│  Descriptor Event           │</span><br><span class="line">│  ├─ CollectionID            │</span><br><span class="line">│  ├─ PartitionID             │</span><br><span class="line">│  ├─ SegmentID               │</span><br><span class="line">│  ├─ FieldID                 │</span><br><span class="line">│  ├─ StartTimestamp          │  ★ Min timestamp</span><br><span class="line">│  ├─ EndTimestamp            │  ★ Max timestamp</span><br><span class="line">│  └─ PayloadDataType         │</span><br><span class="line">├─────────────────────────────┤</span><br><span class="line">│  Insert Event 1             │</span><br><span class="line">│  ├─ StartTimestamp          │</span><br><span class="line">│  ├─ EndTimestamp            │</span><br><span class="line">│  └─ Payload (field data)    │  ★ Actual field data</span><br><span class="line">├─────────────────────────────┤</span><br><span class="line">│  Insert Event 2             │</span><br><span class="line">│  └─ ...                     │</span><br><span class="line">├─────────────────────────────┤</span><br><span class="line">│  ...                        
│</span><br><span class="line">└─────────────────────────────┘</span><br></pre></td></tr></table></figure><hr><h2 id="A-5-查询时的数据访问"><a href="#A-5-查询时的数据访问" class="headerlink" title="A.5 查询时的数据访问"></a>A.5 Data Access at Query Time</h2><h3 id="A-5-1-访问路径"><a href="#A-5-1-访问路径" class="headerlink" title="A.5.1 访问路径"></a>A.5.1 Access Path</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line">Query Request</span><br><span class="line">    ↓</span><br><span class="line">Plan Node</span><br><span class="line">    ↓</span><br><span class="line">Segment Interface</span><br><span class="line">    ↓</span><br><span class="line">    ├─ [Growing Segment]</span><br><span class="line">    │   └─&gt; InsertRecord.data_[fieldID]</span><br><span class="line">    │       └─&gt; ConcurrentVector::get(offset)</span><br><span class="line">    │</span><br><span class="line">    └─ [Sealed Segment]</span><br><span class="line">        ├─&gt; [Has Index?]</span><br><span class="line">        │   ├─ Yes → vector_indexings_[fieldID]</span><br><span class="line">        │   │         └─&gt; Index Search</span><br><span class="line">        │   └─ No  → fields_[fieldID]</span><br><span class="line">        │             └─&gt; ChunkedColumn::chunk_data(chunk_id)</span><br><span class="line">        │</span><br><span 
class="line">        └─&gt; [Filter by Timestamp]</span><br><span class="line">            └─&gt; insert_record_.timestamps_</span><br><span class="line">                └─&gt; TimestampIndex (Binary Search)</span><br></pre></td></tr></table></figure><h3 id="A-5-2-Timestamp-过滤示例"><a href="#A-5-2-Timestamp-过滤示例" class="headerlink" title="A.5.2 Timestamp 过滤示例"></a>A.5.2 Timestamp Filtering Example</h3><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// File: internal/core/src/segcore/SegmentGrowingImpl.cpp</span></span><br><span class="line"></span><br><span class="line"><span class="function"><span class="type">void</span> <span class="title">SegmentGrowingImpl::mask_with_timestamps</span><span class="params">(</span></span></span><br><span class="line"><span class="params"><span class="function">    BitsetTypeView&amp; bitset_chunk,</span></span></span><br><span class="line"><span class="params"><span class="function">    Timestamp timestamp,</span></span></span><br><span class="line"><span class="params"><span class="function">    Timestamp ttl)</span> <span class="type">const</span> </span>&#123;</span><br><span 
class="line">    </span><br><span class="line">    <span class="keyword">auto</span> timestamps_data_ptr = insert_record_.timestamps_.<span class="built_in">data</span>();</span><br><span class="line">    <span class="keyword">auto</span> size = insert_record_.<span class="built_in">size</span>();</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Filter rows by timestamp</span></span><br><span class="line">    <span class="keyword">for</span> (<span class="type">int64_t</span> i = <span class="number">0</span>; i &lt; size; ++i) &#123;</span><br><span class="line">        <span class="keyword">auto</span> ts = timestamps_data_ptr[i];</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Mask out future rows (ts &gt; query_timestamp)</span></span><br><span class="line">        <span class="keyword">if</span> (ts &gt; timestamp) &#123;</span><br><span class="line">            bitset_chunk[i] = <span class="literal">true</span>;  <span class="comment">// mask out</span></span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Mask out expired rows (based on TTL)</span></span><br><span class="line">        <span class="keyword">if</span> (ttl &gt; <span class="number">0</span> &amp;&amp; timestamp - ts &gt; ttl) &#123;</span><br><span class="line">            bitset_chunk[i] = <span class="literal">true</span>;  <span class="comment">// mask out</span></span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><hr><h2 id="A-6-内存管理"><a href="#A-6-内存管理" class="headerlink" title="A.6 内存管理"></a>A.6 Memory Management</h2><h3 id="A-6-1-Growing-Segment-内存管理"><a href="#A-6-1-Growing-Segment-内存管理" class="headerlink" title="A.6.1 Growing Segment 内存管理"></a>A.6.1 Growing Segment Memory Management</h3><ol><li><strong>Dynamic growth</strong>: <code>ConcurrentVector</code> allocates memory on demand</li><li><strong>Chunk management</strong>: data is organized into chunks, which supports incremental loading</li><li><strong>Mmap support</strong>: mmap can optionally be used to reduce memory usage</li></ol><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Chunk size configuration</span></span><br><span class="line"><span class="type">int64_t</span> chunk_rows = segcore_config_.<span class="built_in">get_chunk_rows</span>();  <span class="comment">// default 32768</span></span><br><span class="line"></span><br><span class="line"><span class="comment">// Estimate memory usage</span></span><br><span class="line"><span class="type">size_t</span> memory_per_row = <span class="built_in">EstimateRowSize</span>(schema);</span><br><span class="line"><span class="type">size_t</span> total_memory = num_rows * memory_per_row;</span><br></pre></td></tr></table></figure><h3 id="A-6-2-Sealed-Segment-内存管理"><a href="#A-6-2-Sealed-Segment-内存管理" class="headerlink" title="A.6.2 Sealed Segment 内存管理"></a>A.6.2 Sealed Segment Memory Management</h3><ol><li><strong>Lazy loading</strong>: fields and indexes are loaded on demand</li><li><strong>Mmap</strong>: mmap mode is supported, which reduces memory copies</li><li><strong>Cache management</strong>: an LRU cache manages index and field data</li></ol><figure class="highlight cpp"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Load state check</span></span><br><span class="line"><span class="function"><span class="type">bool</span> <span class="title">CanQuery</span><span class="params">()</span> </span>&#123;</span><br><span class="line">    <span class="keyword">return</span> system_ready_count_ == <span class="number">2</span>  <span class="comment">// RowID 
&amp; Timestamp loaded</span></span><br><span class="line">        &amp;&amp; <span class="built_in">AllRequiredFieldsReady</span>()   <span class="comment">// Query fields loaded</span></span><br><span class="line">        &amp;&amp; (<span class="built_in">HasIndex</span>() || <span class="built_in">HasRawData</span>()); <span class="comment">// Index or raw data available</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><hr><h2 id="A-7-关键数据结构总结"><a href="#A-7-关键数据结构总结" class="headerlink" title="A.7 关键数据结构总结"></a>A.7 Summary of Key Data Structures</h2><table><thead><tr><th>Component</th><th>Growing Segment</th><th>Sealed Segment</th><th>Notes</th></tr></thead><tbody><tr><td><strong>Timestamp</strong></td><td><code>ConcurrentVector&lt;Timestamp&gt;</code></td><td><code>ConcurrentVector&lt;Timestamp&gt;</code></td><td>Both store the full timestamps</td></tr><tr><td><strong>TimestampIndex</strong></td><td><code>TimestampIndex</code></td><td><code>TimestampIndex</code></td><td>Used for time-travel queries</td></tr><tr><td><strong>PK Index</strong></td><td><code>OffsetOrderedMap</code></td><td><code>OffsetOrderedArray</code></td><td>Growing uses a map; Sealed uses an array</td></tr><tr><td><strong>Field Data</strong></td><td><code>map&lt;FieldId, ConcurrentVector&gt;</code></td><td><code>map&lt;FieldId, ChunkedColumn&gt;</code></td><td>Columnar storage</td></tr><tr><td><strong>Vector Index</strong></td><td><code>IndexingRecord</code> (interim)</td><td><code>SealedIndexingRecord</code> (permanent)</td><td>Different index types</td></tr><tr><td><strong>Scalar Index</strong></td><td>Optional</td><td><code>map&lt;FieldId, Index&gt;</code></td><td>Sealed supports more index types</td></tr><tr><td><strong>Delete Record</strong></td><td><code>DeletedRecord&lt;false&gt;</code></td><td><code>DeletedRecord&lt;true&gt;</code></td><td>Deletion records</td></tr><tr><td><strong>Concurrency control</strong></td><td><code>std::shared_mutex</code></td><td><code>folly::Synchronized</code></td><td>Different synchronization mechanisms</td></tr></tbody></table><hr><p><strong>Document version</strong>: v1.1<br><strong>Last updated</strong>: 
2025-01-24<br><strong>Applicable Milvus version</strong>: 2.5.x</p>]]></content>
    
    
      
      
    <summary type="html">&lt;p&gt;This document describes in detail how Milvus handles MVCC timestamp information while data is written to a Segment, covering both the normal Flush flow and the Compaction flow.&lt;/p&gt;</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Analysis of the Index Synchronization Flow During Milvus Data Writes</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/11_index_sync_flow_analysis/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/11_index_sync_flow_analysis/</id>
    <published>2025-08-09T12:00:00.000Z</published>
    <updated>2026-01-06T13:10:46.794Z</updated>
    
    <content type="html"><![CDATA[<p>This document analyzes the complete flow by which Milvus builds indexes for newly written data.</p><h2 id="1-概述"><a href="#1-概述" class="headerlink" title="1. 概述"></a>1. Overview</h2><p>Milvus builds indexes <strong>asynchronously</strong>: an index is not built as soon as data is written; instead, an index build task is triggered after the Segment finishes its Flush. This design:</p><ul><li>Improves write performance: no index is built on every write</li><li>Enables batch optimization: building an index over a complete Segment is more efficient</li><li>Supports resource management: the task scheduler controls the resources used for index building</li></ul><h2 id="2-数据写入和-Flush-流程"><a href="#2-数据写入和-Flush-流程" class="headerlink" title="2. 数据写入和 Flush 流程"></a>2. Data Write and Flush Flow</h2><h3 id="2-1-数据写入"><a href="#2-1-数据写入" class="headerlink" title="2.1 数据写入"></a>2.1 Data Write</h3><p><strong>File</strong>: <code>internal/datanode/</code> (DataNode component)</p><p>The write flow:</p><ol><li>The client sends an Insert request through the Proxy</li><li>DataNode receives the data and writes it to an in-memory buffer</li><li>When a condition is met (for example, a size or time threshold is reached), a Flush is triggered</li></ol><h3 id="2-2-Flush-完成通知"><a href="#2-2-Flush-完成通知" class="headerlink" title="2.2 Flush 完成通知"></a>2.2 Flush Completion Notification</h3><p><strong>File</strong>: <code>internal/datacoord/services.go</code></p><p><strong>Method</strong>: <code>Server.SaveBinlogPaths()</code> (line 700-738)</p><p>After DataNode finishes flushing a Segment, it calls DataCoord's <code>SaveBinlogPaths</code> method:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// notify building index and compaction for &quot;flushing/flushed&quot; level one segment</span></span><br><span class="line"><span class="keyword">if</span> req.GetFlushed() &#123;</span><br><span class="line">    <span class="comment">// notify building index</span></span><br><span class="line">    s.flushCh &lt;- req.SegmentID</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// notify compaction</span></span><br><span class="line">    
s.compactionTrigger.TriggerCompaction(ctx, ...)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key points</strong>:</p><ul><li>Index building is triggered only when <code>req.GetFlushed() == true</code></li><li>The SegmentID is sent through the <code>flushCh</code> channel</li><li>A Compaction task is triggered at the same time</li></ul><h3 id="2-3-Flush-Channel-转发"><a href="#2-3-Flush-Channel-转发" class="headerlink" title="2.3 Flush Channel 转发"></a>2.3 Flush Channel Forwarding</h3><p><strong>File</strong>: <code>internal/datacoord/server.go</code></p><p><strong>Method</strong>: <code>Server.postFlush()</code> (line 960-976)</p><p>Messages in <code>flushCh</code> are forwarded to the global <code>getBuildIndexChSingleton()</code> channel, or to the stats task channel when sort compaction is enabled:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> enableSortCompaction() &#123;</span><br><span class="line">    <span class="keyword">select</span> &#123;</span><br><span class="line">    <span class="keyword">case</span> getStatsTaskChSingleton() &lt;- segmentID:</span><br><span class="line">    <span class="keyword">default</span>:</span><br><span class="line">    &#125;</span><br><span class="line">&#125; <span class="keyword">else</span> &#123;</span><br><span class="line">    <span class="keyword">select</span> &#123;</span><br><span class="line">    <span class="keyword">case</span> getBuildIndexChSingleton() &lt;- segmentID:</span><br><span class="line">    <span class="keyword">default</span>:</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key points</strong>:</p><ul><li>If Sort Compaction is enabled, the SegmentID is first sent to the stats task channel</li><li>Otherwise it is sent directly to the index build channel</li><li>The send uses a non-blocking <code>select</code> (with a <code>default</code> branch), so it never blocks when the channel is full</li></ul><h2 id="3-索引构建任务创建"><a href="#3-索引构建任务创建" class="headerlink" title="3. 索引构建任务创建"></a>3. Index Build Task Creation</h2><h3 id="3-1-Index-Inspector-监听"><a href="#3-1-Index-Inspector-监听" class="headerlink" title="3.1 Index Inspector 监听"></a>3.1 Index Inspector Listening</h3><p><strong>File</strong>: <code>internal/datacoord/index_inspector.go</code></p><p><strong>Method</strong>: <code>indexInspector.createIndexForSegmentLoop()</code> (line 87-132)</p><p>Index Inspector runs as a background loop that continuously listens for the following events:</p><ol><li><p><strong>Periodic check</strong> (ticker.C):</p><ul><li>Periodically checks for Flushed Segments whose indexes have not been built</li><li>Calls <code>getUnIndexTaskSegments()</code> to get the Segments that need indexes</li></ul></li><li><p><strong>Collection index notification</strong> (notifyIndexChan):</p><ul><li>Triggered when a new index is created on a Collection</li><li>Creates index tasks for all Segments that have already been flushed</li></ul></li><li><p><strong>Flush completion notification</strong> (getBuildIndexChSingleton()):</p><ul><li>Receives the ID of a newly flushed Segment</li><li>Immediately creates index tasks for that Segment</li></ul></li></ol><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">case</span> segID := &lt;-getBuildIndexChSingleton():</span><br><span class="line">    log.Info(<span class="string">&quot;receive new flushed segment&quot;</span>, zap.Int64(<span class="string">&quot;segmentID&quot;</span>, segID))</span><br><span class="line">    segment := i.meta.GetSegment(ctx, segID)</span><br><span class="line">    <span class="keyword">if</span> segment == <span class="literal">nil</span> &#123;</span><br><span class="line">        log.Warn(<span 
class="string">&quot;segment is not exist, no need to build index&quot;</span>, zap.Int64(<span class="string">&quot;segmentID&quot;</span>, segID))</span><br><span class="line">        <span class="keyword">continue</span></span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">if</span> err := i.createIndexesForSegment(ctx, segment); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        log.Warn(<span class="string">&quot;create index for segment fail, wait for retry&quot;</span>, zap.Int64(<span class="string">&quot;segmentID&quot;</span>, segID))</span><br><span class="line">        <span class="keyword">continue</span></span><br><span class="line">    &#125;</span><br></pre></td></tr></table></figure><h3 id="3-2-创建索引任务"><a href="#3-2-创建索引任务" class="headerlink" title="3.2 创建索引任务"></a>3.2 Creating Index Tasks</h3><p><strong>Method</strong>: <code>indexInspector.createIndexesForSegment()</code> (line 148-170)</p><p>The main steps for creating index tasks for a Segment:</p><ol><li><p><strong>Check the Segment state</strong>:</p><ul><li>If sort compaction is enabled, wait until the Segment has been sorted</li><li>L0 Segments do not get indexes</li></ul></li><li><p><strong>Get all indexes of the Collection</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">indexes := i.meta.indexMeta.GetIndexesForCollection(segment.CollectionID, <span class="string">&quot;&quot;</span>)</span><br></pre></td></tr></table></figure></li><li><p><strong>Check which indexes have not been built yet</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">indexIDToSegIndexes := i.meta.indexMeta.GetSegmentIndexes(segment.CollectionID, segment.ID)</span><br><span 
class="keyword">for</span> _, index := <span class="keyword">range</span> indexes &#123;</span><br><span class="line">    <span class="keyword">if</span> _, ok := indexIDToSegIndexes[index.IndexID]; !ok &#123;</span><br><span class="line">        <span class="comment">// Create a task for this index</span></span><br><span class="line">        i.createIndexForSegment(ctx, segment, index.IndexID)</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></li></ol><h3 id="3-3-创建-SegmentIndex-元数据"><a href="#3-3-创建-SegmentIndex-元数据" class="headerlink" title="3.3 创建 SegmentIndex 元数据"></a>3.3 Creating SegmentIndex Metadata</h3><p><strong>Method</strong>: <code>indexInspector.createIndexForSegment()</code> (line 172-225)</p><p>For each index, create the SegmentIndex metadata and add the task to the scheduling queue:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// 1. Allocate a BuildID</span></span><br><span class="line">buildID, err := i.allocator.AllocID(context.Background())</span><br><span class="line"></span><br><span class="line"><span class="comment">// 2. Get the index parameters</span></span><br><span class="line">indexParams := i.meta.indexMeta.GetIndexParams(segment.CollectionID, indexID)</span><br><span class="line">indexType := GetIndexType(indexParams)</span><br><span class="line"></span><br><span class="line"><span class="comment">// 3. Compute the task slot (used for resource scheduling)</span></span><br><span class="line">isVectorIndex := vecindexmgr.GetVecIndexMgrInstance().IsVecIndex(indexType)</span><br><span class="line">segSize := segment.getSegmentSize()</span><br><span class="line">taskSlot := calculateIndexTaskSlot(segSize, isVectorIndex)</span><br><span class="line"></span><br><span class="line"><span class="comment">// 4. Create the SegmentIndex metadata</span></span><br><span class="line">segIndex := &amp;model.SegmentIndex&#123;</span><br><span class="line">    SegmentID:      segment.ID,</span><br><span class="line">    CollectionID:   segment.CollectionID,</span><br><span class="line">    PartitionID:    segment.PartitionID,</span><br><span class="line">    NumRows:        segment.NumOfRows,</span><br><span class="line">    IndexID:        indexID,</span><br><span class="line">    BuildID:        buildID,</span><br><span class="line">    CreatedUTCTime: <span class="type">uint64</span>(time.Now().Unix()),</span><br><span class="line">    WriteHandoff:   <span class="literal">false</span>,</span><br><span class="line">    IndexType:      indexType,</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 5. Save to the metadata store</span></span><br><span class="line"><span class="keyword">if</span> err = i.meta.indexMeta.AddSegmentIndex(ctx, segIndex); err != <span class="literal">nil</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 6. Add to the scheduler queue</span></span><br><span class="line">i.scheduler.Enqueue(newIndexBuildTask(...))</span><br></pre></td></tr></table></figure><p><strong>Key points</strong>:</p><ul><li><code>BuildID</code> uniquely identifies an index build task</li><li><code>IndexState</code> is initially <code>Unissued</code></li><li>The task slot (TaskSlot) controls concurrency and is computed from the Segment size and the index type</li></ul><h2 id="4-索引任务调度和执行"><a href="#4-索引任务调度和执行" class="headerlink" title="4. 索引任务调度和执行"></a>4. Index Task Scheduling and Execution</h2><h3 id="4-1-DataCoord-任务调度"><a href="#4-1-DataCoord-任务调度" class="headerlink" title="4.1 DataCoord 任务调度"></a>4.1 DataCoord Task Scheduling</h3><p><strong>File</strong>: <code>internal/datacoord/task_index.go</code></p><p><strong>Method</strong>: <code>indexBuildTask.CreateTaskOnWorker()</code> (line 148-206)</p><p>DataCoord's scheduler picks a suitable DataNode to run the index build task:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td 
class="code"><pre><span class="line"><span class="comment">// 1. Validate the task and Segment state</span></span><br><span class="line">segIndex, exist := it.meta.indexMeta.GetIndexJob(it.BuildID)</span><br><span class="line">segment := it.meta.GetSegment(ctx, segIndex.SegmentID)</span><br><span class="line"></span><br><span class="line"><span class="comment">// 2. Check whether an index needs to be built</span></span><br><span class="line"><span class="comment">// - Some index types need no training (e.g. FLAT)</span></span><br><span class="line"><span class="comment">// - Small Segments may not need an index</span></span><br><span class="line"><span class="keyword">if</span> isNoTrainIndex(indexType) || segIndex.NumRows &lt; MinSegmentNumRowsToEnableIndex &#123;</span><br><span class="line">    <span class="comment">// Mark as finished (fake finish)</span></span><br><span class="line">    it.UpdateStateWithMeta(indexpb.JobState_JobStateFinished, <span class="string">&quot;fake finished&quot;</span>)</span><br><span class="line">    <span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 3. Prepare the job request</span></span><br><span class="line">req, err := it.prepareJobRequest(ctx, segment, segIndex, indexParams, indexType)</span><br><span class="line"></span><br><span class="line"><span class="comment">// 4. Select a DataNode and send the request</span></span><br><span class="line"><span class="keyword">if</span> err = cluster.CreateIndex(nodeID, req); err != <span class="literal">nil</span> &#123;</span><br><span class="line">    log.Warn(<span class="string">&quot;failed to send job to worker&quot;</span>, zap.Error(err))</span><br><span class="line">    <span class="keyword">return</span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 5. Update the task state to InProgress</span></span><br><span class="line">it.UpdateStateWithMeta(indexpb.JobState_JobStateInProgress, ...)</span><br></pre></td></tr></table></figure><h3 id="4-2-DataNode-接收任务"><a href="#4-2-DataNode-接收任务" class="headerlink" title="4.2 DataNode 接收任务"></a>4.2 DataNode Receives the Task</h3><p><strong>File</strong>: <code>internal/datanode/index_services.go</code></p><p><strong>Method</strong>: <code>DataNode.CreateJob()</code> (line 44-120)</p><p>DataNode receives the index build request:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// 1. Create the task context</span></span><br><span class="line">taskCtx, taskCancel := context.WithCancel(node.ctx)</span><br><span class="line"></span><br><span class="line"><span class="comment">// 2. Check whether the task already exists (prevents duplicates)</span></span><br><span class="line"><span class="keyword">if</span> oldInfo := node.taskManager.LoadOrStoreIndexTask(...); oldInfo != <span class="literal">nil</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> merr.WrapErrIndexDuplicate(...)</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// 3. Create the storage manager</span></span><br><span class="line">cm, err := node.storageFactory.NewChunkManager(node.ctx, req.GetStorageConfig())</span><br><span class="line"></span><br><span class="line"><span class="comment">// 4. Create the index build task</span></span><br><span class="line">task := index.NewIndexBuildTask(taskCtx, taskCancel, req, cm, node.taskManager, pluginContext)</span><br><span class="line"></span><br><span class="line"><span class="comment">// 5. Add to the task queue</span></span><br><span class="line"><span class="keyword">if</span> err := node.taskScheduler.TaskQueue.Enqueue(task); err != <span class="literal">nil</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> merr.Status(err)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="4-3-索引构建执行"><a href="#4-3-索引构建执行" class="headerlink" title="4.3 索引构建执行"></a>4.3 Index Build Execution</h3><p><strong>File</strong>: <code>internal/datanode/index/task_index.go</code></p><p>The index build task follows the standard task execution pattern: <strong>PreExecute → Execute → PostExecute</strong></p><h4 id="4-3-1-PreExecute-阶段"><a href="#4-3-1-PreExecute-阶段" class="headerlink" title="4.3.1 PreExecute 阶段"></a>4.3.1 PreExecute Phase</h4><p><strong>Method</strong>: <code>indexBuildTask.PreExecute()</code> (line 146-221)</p><p>The main work in the preparation phase:</p><ol><li><p><strong>Build the data paths</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> <span class="built_in">len</span>(it.req.DataPaths) == <span class="number">0</span> &#123;</span><br><span class="line">    <span class="keyword">for</span> _, id := <span class="keyword">range</span> it.req.GetDataIds() &#123;</span><br><span class="line">        path := metautil.BuildInsertLogPath(...)</span><br><span class="line">        it.req.DataPaths = <span class="built_in">append</span>(it.req.DataPaths, path)</span><br><span class="line">    &#125;</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure></li><li><p><strong>Parse the index parameters</strong>:</p><ul><li>Extract <code>typeParams</code> and <code>indexParams</code> from the request</li><li>Handle special parameters (such as <code>mmap_enabled</code>)</li></ul></li><li><p><strong>Fill in the field metadata</strong>:</p><ul><li>If the request lacks field information, parse it from the Binlog</li></ul></li><li><p><strong>Set the index versions</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">it.req.CurrentIndexVersion = getCurrentIndexVersion(...)</span><br><span class="line">it.req.CurrentScalarIndexVersion = getCurrentScalarIndexVersion(...)</span><br></pre></td></tr></table></figure></li></ol><h4 id="4-3-2-Execute-阶段"><a href="#4-3-2-Execute-阶段" class="headerlink" title="4.3.2 Execute 阶段"></a>4.3.2 Execute Phase</h4><p><strong>Method</strong>: <code>indexBuildTask.Execute()</code> (line 223-330)</p><p>The core work in the execution phase:</p><ol><li><p><strong>Prepare the build parameters</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">buildIndexParams := &amp;indexcgopb.BuildIndexInfo&#123;</span><br><span class="line">    ClusterID:                 it.req.GetClusterID(),</span><br><span class="line">    BuildID:                   it.req.GetBuildID(),</span><br><span class="line">    CollectionID:              it.req.GetCollectionID(),</span><br><span class="line">    SegmentID:                 it.req.GetSegmentID(),</span><br><span class="line">    NumRows:                   it.req.GetNumRows(),</span><br><span 
class="line">    Dim:                       it.req.GetDim(),</span><br><span class="line">    InsertFiles:               it.req.GetDataPaths(),</span><br><span class="line">    FieldSchema:               it.req.GetField(),</span><br><span class="line">    IndexParams:               mapToKVPairs(it.newIndexParams),</span><br><span class="line">    StorageConfig:             storageConfig,</span><br><span class="line">    <span class="comment">// ...</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure></li><li><p><strong>Call the CGO interface to build the index</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">it.index, err = indexcgowrapper.CreateIndex(ctx, buildIndexParams)</span><br></pre></td></tr></table></figure><p>This step:</p><ul><li>Loads the Segment data from storage</li><li>Calls the Knowhere library to build the index</li><li>Serializes the index once it is built</li></ul></li><li><p><strong>Record build metrics</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">metrics.DataNodeKnowhereBuildIndexLatency.Observe(buildIndexLatency.Seconds())</span><br></pre></td></tr></table></figure></li></ol><h4 id="4-3-3-PostExecute-阶段"><a href="#4-3-3-PostExecute-阶段" class="headerlink" title="4.3.3 PostExecute 阶段"></a>4.3.3 PostExecute Phase</h4><p><strong>Method</strong>: <code>indexBuildTask.PostExecute()</code> (line 333-375)</p><p>The main work in the post-processing phase:</p><ol><li><p><strong>Upload the index files to storage</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">indexStats, err := it.index.UpLoad()</span><br></pre></td></tr></table></figure><p>The upload:</p><ul><li>Serializes the index data</li><li>Uploads it in parts to object storage (MinIO&#x2F;S3, etc.)</li><li>Returns the index file paths and size information</li></ul></li><li><p><strong>Clean up the local index data</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">gcIndex := <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">    <span class="keyword">if</span> err := it.index.Delete(); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        log.Warn(<span class="string">&quot;indexBuildTask Execute CIndexDelete failed&quot;</span>, zap.Error(err))</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br><span class="line">gcIndex() <span class="comment">// release early to save memory</span></span><br></pre></td></tr></table></figure></li><li><p><strong>Save the index metadata</strong>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">it.manager.StoreIndexFilesAndStatistic(</span><br><span class="line">    it.req.GetClusterID(),</span><br><span class="line">    it.req.GetBuildID(),</span><br><span class="line">    saveFileKeys,        <span class="comment">// list of index file paths</span></span><br><span class="line">    serializedSize,      <span class="comment">// size after serialization</span></span><br><span class="line">    <span class="type">uint64</span>(indexStats.MemSize),  <span class="comment">// in-memory size</span></span><br><span class="line">    it.req.GetCurrentIndexVersion(),</span><br><span class="line">    it.req.GetCurrentScalarIndexVersion(),</span><br><span class="line">)</span><br></pre></td></tr></table></figure></li><li><p><strong>Update the task state</strong>:</p><ul><li>Via 
<code>SetState()</code>, the task state is set to <code>Finished</code></li><li>The completion time is recorded</li></ul></li></ol><h3 id="4-4-索引元数据更新"><a href="#4-4-索引元数据更新" class="headerlink" title="4.4 索引元数据更新"></a>4.4 Index Metadata Update</h3><p><strong>File</strong>: <code>internal/datacoord/index_meta.go</code></p><p><strong>Method</strong>: <code>indexMeta.FinishTask()</code> (line 885-920)</p><p>Once the DataNode finishes building the index, DataCoord is notified to update the metadata:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *indexMeta)</span></span> FinishTask(taskInfo *workerpb.IndexTaskInfo) <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// 1. Look up the SegmentIndex</span></span><br><span class="line">    segIdx, ok := m.segmentBuildInfo.Get(taskInfo.GetBuildID())</span><br><span class="line">    <span class="keyword">if</span> !ok &#123; <span class="keyword">return</span> <span class="literal">nil</span> &#125; <span class="comment">// unknown BuildID: nothing to update (simplified)</span></span><br><span class="line">    <span class="comment">// 2. Update the index state and file information</span></span><br><span class="line">    segIdx.IndexState = taskInfo.GetState()</span><br><span class="line">    segIdx.IndexFileKeys = common.CloneStringList(taskInfo.GetIndexFileKeys())</span><br><span class="line">    segIdx.IndexSerializedSize = taskInfo.GetSerializedSize()</span><br><span class="line">    segIdx.IndexMemSize = taskInfo.GetMemSize()</span><br><span class="line">    segIdx.FinishedUTCTime = <span class="type">uint64</span>(time.Now().Unix())</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
Persist to the metadata store</span></span><br><span class="line">    <span class="keyword">return</span> m.alterSegmentIndexes([]*model.SegmentIndex&#123;segIdx&#125;)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key points</strong>:</p><ul><li><code>IndexState</code> is set to the state the worker reported (<code>Finished</code> on success)</li><li>The list of index file paths (<code>IndexFileKeys</code>) is saved</li><li>The index size information is recorded for use when the index is loaded at query time</li></ul><h2 id="5-完整流程图"><a href="#5-完整流程图" class="headerlink" title="5. 完整流程图"></a>5. End-to-End Flow</h2><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span 
class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br></pre></td><td class="code"><pre><span class="line">Data write flow</span><br><span class="line">    ↓</span><br><span class="line">DataNode receives data → writes it to the in-memory buffer</span><br><span class="line">    ↓</span><br><span class="line">Flush is triggered (threshold reached or manual trigger)</span><br><span class="line">    ↓</span><br><span class="line">DataNode completes the flush → saves the binlog to storage</span><br><span class="line">    ↓</span><br><span class="line">Call DataCoord.SaveBinlogPaths()</span><br><span class="line">    ↓</span><br><span class="line">if req.GetFlushed() == true:</span><br><span class="line">    s.flushCh &lt;- segmentID</span><br><span class="line">    ↓</span><br><span class="line">DataCoord.postFlush() forwards it</span><br><span class="line">    ↓</span><br><span class="line">getBuildIndexChSingleton() &lt;- segmentID</span><br><span class="line">    ↓</span><br><span class="line">Index Inspector is listening</span><br><span class="line">    ↓</span><br><span class="line">indexInspector.createIndexesForSegment()</span><br><span class="line">    ├─ Get all indexes of the collection</span><br><span class="line">    ├─ Check which indexes are not built yet</span><br><span class="line">    └─ Create a SegmentIndex for each index</span><br><span class="line">        ├─ Allocate a BuildID</span><br><span class="line">        ├─ Create metadata (state: Unissued)</span><br><span class="line">        └─ Enqueue into the scheduler</span><br><span class="line">            ↓</span><br><span class="line">DataCoord scheduler picks a DataNode</span><br><span class="line">    ↓</span><br><span class="line">Send the CreateIndex request to the DataNode</span><br><span class="line">    ↓</span><br><span class="line">DataNode receives the request</span><br><span class="line">    ├─ Create the index build task</span><br><span class="line">    └─ Enqueue it in the task queue</span><br><span class="line">        ↓</span><br><span class="line">Task execution (PreExecute → Execute → PostExecute)</span><br><span class="line">    ├─ PreExecute: prepare parameters and paths</span><br><span class="line">    ├─ Execute: call Knowhere 
to build the index</span><br><span class="line">    └─ PostExecute: upload the index files</span><br><span class="line">        ├─ Upload to object storage</span><br><span class="line">        ├─ Save the file paths to metadata</span><br><span class="line">        └─ Notify DataCoord to update the state</span><br><span class="line">            ↓</span><br><span class="line">DataCoord updates the index metadata</span><br><span class="line">    ├─ IndexState: Finished</span><br><span class="line">    ├─ IndexFileKeys: [list of file paths]</span><br><span class="line">    └─ Persist to the metadata store</span><br><span class="line">        ↓</span><br><span class="line">Index build complete, ready for queries</span><br></pre></td></tr></table></figure><h2 id="6-关键组件说明"><a href="#6-关键组件说明" class="headerlink" title="6. 关键组件说明"></a>6. Key Components</h2><h3 id="6-1-Index-Inspector"><a href="#6-1-Index-Inspector" class="headerlink" title="6.1 Index Inspector"></a>6.1 Index Inspector</h3><p><strong>Location</strong>: <code>internal/datacoord/index_inspector.go</code></p><ul><li><strong>Responsibility</strong>: listens for flush-completion events and creates index build tasks</li><li><strong>Triggers</strong>:<ul><li>Periodic check (interval configured by <code>TaskCheckInterval</code>)</li><li>Flush-completion notification (through a channel)</li><li>Collection index-creation notification</li></ul></li></ul><h3 id="6-2-Index-Scheduler"><a href="#6-2-Index-Scheduler" class="headerlink" title="6.2 Index Scheduler"></a>6.2 Index Scheduler</h3><p><strong>Location</strong>: <code>internal/datacoord/task/</code></p><ul><li><strong>Responsibility</strong>: dispatches index build tasks to a suitable DataNode</li><li><strong>Scheduling strategy</strong>:<ul><li>Limits concurrency through task slots (TaskSlot)</li><li>Prefers DataNodes with lower load</li><li>Supports task retry and failure handling</li></ul></li></ul><h3 id="6-3-Index-Task-Manager"><a href="#6-3-Index-Task-Manager" class="headerlink" title="6.3 Index Task Manager"></a>6.3 Index Task Manager</h3><p><strong>Location</strong>: <code>internal/datanode/index/</code></p><ul><li><strong>Responsibility</strong>: manages the index build tasks on a DataNode</li><li><strong>Functions</strong>:<ul><li>Task queue management</li><li>Task state tracking</li><li>Resource cleanup</li></ul></li></ul><h3 id="6-4-Index-Meta"><a href="#6-4-Index-Meta" class="headerlink" title="6.4 Index Meta"></a>6.4 Index Meta</h3><p><strong>Location</strong>: 
<code>internal/datacoord/index_meta.go</code></p><ul><li><strong>Responsibility</strong>: manages index metadata</li><li><strong>Stored content</strong>:<ul><li>SegmentIndex information (BuildID, SegmentID, IndexID)</li><li>Index state (Unissued, InProgress, Finished, Failed)</li><li>Index file paths and sizes</li></ul></li></ul><h2 id="7-关键配置参数"><a href="#7-关键配置参数" class="headerlink" title="7. 关键配置参数"></a>7. Key Configuration Parameters</h2><ul><li><code>DataCoordCfg.TaskCheckInterval</code>: interval of the Index Inspector's periodic check (default: 1 second)</li><li><code>DataCoordCfg.MinSegmentNumRowsToEnableIndex</code>: minimum number of rows a segment needs before it is indexed</li><li><code>KnowhereConfig.Enable</code>: whether the Knowhere index library is enabled</li><li>Index parameters: <code>index_type</code>, <code>metric_type</code>, <code>nlist</code>, and so on; <code>nprobe</code> is the related search-time parameter</li></ul><h2 id="8-索引状态流转"><a href="#8-索引状态流转" class="headerlink" title="8. 索引状态流转"></a>8. Index State Transitions</h2><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">Unissued → InProgress → Finished</span><br><span class="line">              ↓</span><br><span class="line">           Failed (retryable)</span><br></pre></td></tr></table></figure><ul><li><strong>Unissued</strong>: the task has been created and is waiting to be scheduled</li><li><strong>InProgress</strong>: the task is running</li><li><strong>Finished</strong>: the index build is complete</li><li><strong>Failed</strong>: the build failed and can be retried</li></ul><h2 id="9-性能优化点"><a href="#9-性能优化点" class="headerlink" title="9. 性能优化点"></a>9. Performance Optimizations</h2><ol><li><strong>Asynchronous build</strong>: index building does not affect write performance</li><li><strong>Batch processing</strong>: indexes are built over whole segments, which is more efficient</li><li><strong>Resource control</strong>: TaskSlot limits concurrency to avoid exhausting resources</li><li><strong>Early release</strong>: the local index memory is released as soon as the index has been uploaded</li><li><strong>Chunked upload</strong>: large index files are uploaded in chunks, which improves reliability</li></ol><h2 id="10-错误处理"><a href="#10-错误处理" class="headerlink" title="10. 错误处理"></a>10. 
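Error Handling</h2><ul><li><strong>Segment does not exist</strong>: the task is ignored</li><li><strong>Index build fails</strong>: the task state is marked Failed and the task can be retried</li><li><strong>Upload fails</strong>: local data is cleaned up and the task is marked as failed</li><li><strong>Metadata update fails</strong>: the error is logged and the update waits for a retry</li></ul><h2 id="11-监控指标"><a href="#11-监控指标" class="headerlink" title="11. 监控指标"></a>11. Monitoring Metrics</h2><ul><li><code>DataNodeBuildIndexTaskCounter</code>: number of index build tasks</li><li><code>DataNodeKnowhereBuildIndexLatency</code>: index build latency</li><li><code>DataNodeEncodeIndexFileLatency</code>: latency of encoding and uploading index files</li><li><code>DataNodeSaveIndexFileLatency</code>: latency of saving index file metadata</li><li><code>DataCoordStoredIndexFilesSize</code>: total size of the stored index files</li></ul><h2 id="12-注意事项"><a href="#12-注意事项" class="headerlink" title="12. 注意事项"></a>12. Notes</h2><ol><li><strong>Index building is asynchronous</strong>: newly written data has no index right away; it becomes indexed only after the flush and the index build complete</li><li><strong>Small segments may not be indexed</strong>: if a segment has fewer rows than the threshold (<code>MinSegmentNumRowsToEnableIndex</code>), index building may be skipped</li><li><strong>L0 segments are not indexed</strong>: an L0 segment holds only delete records (delta logs), so no persistent index is built for it</li><li><strong>Sort compaction mode</strong>: if sort compaction is enabled, the index is built only after the segment has been sorted</li></ol>]]></content>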
    
    
      
      
    <summary type="html">&lt;p&gt;This document analyzes in detail the complete flow by which Milvus builds indexes after data is written.&lt;/p&gt;
&lt;h2 id=&quot;1-概述&quot;&gt;&lt;a href=&quot;#1-概述&quot; class=&quot;headerlink&quot; title=&quot;1. 概述&quot;&gt;&lt;/a&gt;1. Overview&lt;/h2&gt;&lt;p&gt;Milvus uses&lt;strong</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Milvus TTNode (TimeTick Node) Explained</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/10_flush_pipeline_tt_node/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/10_flush_pipeline_tt_node/</id>
    <published>2025-08-09T11:00:00.000Z</published>
    <updated>2026-01-06T13:10:46.358Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p><code>ttNode</code> (TimeTick Node) is the last node in the FlowGraph and is responsible for managing and updating the channel checkpoint. It asynchronously reports the checkpoint to DataCoord, either periodically or when a trigger condition is met, so that the data consumption position is persisted.</p><h2 id="架构设计"><a href="#架构设计" class="headerlink" title="架构设计"></a>Architecture</h2><h3 id="核心结构"><a href="#核心结构" class="headerlink" title="核心结构"></a>Core Structure</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> ttNode <span class="keyword">struct</span> &#123;</span><br><span class="line">    BaseNode</span><br><span class="line">    </span><br><span class="line">    vChannelName       <span class="type">string</span></span><br><span class="line">    metacache          metacache.MetaCache</span><br><span class="line">    writeBufferManager writebuffer.BufferManager</span><br><span class="line">    lastUpdateTime     *atomic.Time</span><br><span class="line">    cpUpdater          *util.ChannelCheckpointUpdater</span><br><span class="line">    dropMode           *atomic.Bool</span><br><span class="line">    dropCallback       <span class="function"><span class="keyword">func</span><span class="params">()</span></span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key fields:</strong></p><ul><li><code>writeBufferManager</code>: the WriteBuffer manager, used to fetch the checkpoint</li><li><code>lastUpdateTime</code>: time of the last update, used to throttle the update frequency</li><li><code>cpUpdater</code>: the checkpoint updater, which updates checkpoints asynchronously</li><li><code>dropMode</code>: drop-mode flag, set once a drop-collection message is seen</li></ul><h2 id="消息处理流程"><a href="#消息处理流程" class="headerlink" 
title="消息处理流程"></a>Message Processing Flow</h2><h3 id="Operate-方法"><a href="#Operate-方法" class="headerlink" title="Operate 方法"></a>The Operate Method</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span 
class="params">(ttn *ttNode)</span></span> Operate(in []Msg) []Msg &#123;</span><br><span class="line">    fgMsg := in[<span class="number">0</span>].(*FlowGraphMsg)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. Handle the drop-collection message</span></span><br><span class="line">    <span class="keyword">if</span> fgMsg.dropCollection &#123;</span><br><span class="line">        ttn.dropMode.Store(<span class="literal">true</span>)</span><br><span class="line">        <span class="keyword">if</span> ttn.dropCallback != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">defer</span> ttn.dropCallback()</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Skip checkpoint updates in drop mode</span></span><br><span class="line">    <span class="keyword">if</span> ttn.dropMode.Load() &#123;</span><br><span class="line">        <span class="keyword">return</span> []Msg&#123;&#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
Parse the current timestamp up front; the close branch below uses it</span></span><br><span class="line">    curTs, _ := tsoutil.ParseTS(fgMsg.TimeRange.TimestampMax)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">if</span> fgMsg.IsCloseMsg() &#123; <span class="comment">// handle the close message</span></span><br><span class="line">        <span class="keyword">if</span> ttn.dropMode.Load() &#123;</span><br><span class="line">            <span class="keyword">return</span> in</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="comment">// Force a checkpoint update</span></span><br><span class="line">        <span class="keyword">if</span> <span class="built_in">len</span>(fgMsg.EndPositions) &gt; <span class="number">0</span> &#123;</span><br><span class="line">            channelPos, _, err := ttn.writeBufferManager.GetCheckpoint(ttn.vChannelName)</span><br><span class="line">            <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">                <span class="keyword">return</span> []Msg&#123;&#125;</span><br><span class="line">            &#125;</span><br><span class="line">            ttn.updateChannelCP(channelPos, curTs, <span class="literal">false</span>)</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> in</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Fetch the checkpoint</span></span><br><span class="line">    channelPos, needUpdate, err := ttn.writeBufferManager.GetCheckpoint(ttn.vChannelName)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> []Msg&#123;&#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. 
Periodic update (based on the time interval)</span></span><br><span class="line">    <span class="keyword">if</span> curTs.Sub(ttn.lastUpdateTime.Load()) &gt;= paramtable.Get().DataNodeCfg.UpdateChannelCheckpointInterval.GetAsDuration(time.Second) &#123;</span><br><span class="line">        ttn.updateChannelCP(channelPos, curTs, <span class="literal">false</span>)</span><br><span class="line">        <span class="keyword">return</span> []Msg&#123;&#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 6. Triggered update (based on the flushTs condition)</span></span><br><span class="line">    <span class="keyword">if</span> needUpdate &#123;</span><br><span class="line">        ttn.updateChannelCP(channelPos, curTs, <span class="literal">true</span>)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> []Msg&#123;&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="检查点更新机制"><a href="#检查点更新机制" class="headerlink" title="检查点更新机制"></a>Checkpoint Update Mechanism</h2><h3 id="updateChannelCP-方法"><a href="#updateChannelCP-方法" class="headerlink" title="updateChannelCP 方法"></a>The updateChannelCP Method</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(ttn *ttNode)</span></span> updateChannelCP(channelPos *msgpb.MsgPosition, curTs time.Time, 
flush <span class="type">bool</span>) &#123;</span><br><span class="line">    callBack := <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">        channelCPTs, _ := tsoutil.ParseTS(channelPos.GetTimestamp())</span><br><span class="line">        <span class="comment">// Reset flushTs to avoid flushing too frequently</span></span><br><span class="line">        ttn.writeBufferManager.NotifyCheckpointUpdated(ttn.vChannelName, channelPos.GetTimestamp())</span><br><span class="line">        log.Debug(<span class="string">&quot;UpdateChannelCheckpoint success&quot;</span>,</span><br><span class="line">            zap.String(<span class="string">&quot;channel&quot;</span>, ttn.vChannelName),</span><br><span class="line">            zap.Uint64(<span class="string">&quot;cpTs&quot;</span>, channelPos.GetTimestamp()),</span><br><span class="line">            zap.Time(<span class="string">&quot;cpTime&quot;</span>, channelCPTs))</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Hand the task to the updater</span></span><br><span class="line">    ttn.cpUpdater.AddTask(channelPos, flush, callBack)</span><br><span class="line">    ttn.lastUpdateTime.Store(curTs)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key logic:</strong></p><ul><li><code>flush</code> parameter: marks whether the update was triggered by a flush</li><li><code>callBack</code>: callback run after a successful update; it resets <code>flushTs</code></li></ul><h2 id="ChannelCheckpointUpdater-详解"><a href="#ChannelCheckpointUpdater-详解" class="headerlink" title="ChannelCheckpointUpdater 详解"></a>ChannelCheckpointUpdater in Detail</h2><h3 id="核心结构-1"><a href="#核心结构-1" class="headerlink" title="核心结构"></a>Core Structure</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> ChannelCheckpointUpdater <span class="keyword">struct</span> &#123;</span><br><span class="line">    broker broker.Broker</span><br><span class="line">    </span><br><span class="line">    mu         sync.RWMutex</span><br><span class="line">    tasks      <span class="keyword">map</span>[<span class="type">string</span>]*channelCPUpdateTask</span><br><span class="line">    notifyChan <span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;</span><br><span class="line">    </span><br><span class="line">    closeCh            <span class="keyword">chan</span> <span class="keyword">struct</span>&#123;&#125;</span><br><span class="line">    closeOnce          sync.Once</span><br><span class="line">    updateDoneCallback <span class="function"><span class="keyword">func</span><span class="params">(*msgpb.MsgPosition)</span></span></span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="keyword">type</span> channelCPUpdateTask <span class="keyword">struct</span> &#123;</span><br><span class="line">    pos      *msgpb.MsgPosition</span><br><span class="line">    callback <span class="function"><span class="keyword">func</span><span class="params">()</span></span></span><br><span class="line">    flush    <span class="type">bool</span>  <span class="comment">// whether the task was triggered by a flush</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="启动循环"><a href="#启动循环" class="headerlink" title="启动循环"></a>Start Loop</h3><figure 
class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(ccu *ChannelCheckpointUpdater)</span></span> Start() &#123;</span><br><span class="line">    ticker := time.NewTicker(paramtable.Get().DataNodeCfg.ChannelCheckpointUpdateTickInSeconds.GetAsDuration(time.Second))</span><br><span class="line">    <span class="keyword">defer</span> ticker.Stop()</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> &#123;</span><br><span class="line">        <span class="keyword">select</span> &#123;</span><br><span class="line">        <span class="keyword">case</span> &lt;-ccu.closeCh:</span><br><span class="line">            <span class="keyword">return</span></span><br><span class="line">        <span class="keyword">case</span> &lt;-ccu.notifyChan:</span><br><span class="line">            <span class="comment">// Handle flush-triggered updates</span></span><br><span class="line">            <span class="keyword">var</span> tasks 
[]*channelCPUpdateTask</span><br><span class="line">            ccu.mu.Lock()</span><br><span class="line">            <span class="keyword">for</span> _, task := <span class="keyword">range</span> ccu.tasks &#123;</span><br><span class="line">                <span class="keyword">if</span> task.flush &#123;</span><br><span class="line">                    task.flush = <span class="literal">false</span>  <span class="comment">// reset the flag</span></span><br><span class="line">                    tasks = <span class="built_in">append</span>(tasks, task)</span><br><span class="line">                &#125;</span><br><span class="line">            &#125;</span><br><span class="line">            ccu.mu.Unlock()</span><br><span class="line">            <span class="keyword">if</span> <span class="built_in">len</span>(tasks) &gt; <span class="number">0</span> &#123;</span><br><span class="line">                ccu.updateCheckpoints(tasks)</span><br><span class="line">            &#125;</span><br><span class="line">        <span class="keyword">case</span> &lt;-ticker.C:</span><br><span class="line">            <span class="comment">// Run the periodic update</span></span><br><span class="line">            ccu.execute()</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="添加任务"><a href="#添加任务" class="headerlink" title="添加任务"></a>Adding a Task</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span 
class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(ccu *ChannelCheckpointUpdater)</span></span> AddTask(channelPos *msgpb.MsgPosition, flush <span class="type">bool</span>, callback <span class="function"><span class="keyword">func</span><span class="params">()</span></span>) &#123;</span><br><span class="line">    <span class="keyword">if</span> flush &#123;</span><br><span class="line">        <span class="comment">// On a flush trigger, attempt an update right away</span></span><br><span class="line">        <span class="keyword">defer</span> ccu.trigger()</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    channel := channelPos.GetChannelName()</span><br><span class="line">    task, ok := ccu.getTask(channel)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">if</span> !ok &#123;</span><br><span class="line">        <span class="comment">// New task</span></span><br><span class="line">        ccu.mu.Lock()</span><br><span class="line">        <span class="keyword">defer</span> 
ccu.mu.Unlock()</span><br><span class="line">        ccu.tasks[channel] = &amp;channelCPUpdateTask&#123;</span><br><span class="line">            pos:      channelPos,</span><br><span class="line">            callback: callback,</span><br><span class="line">            flush:    flush,</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 合并任务</span></span><br><span class="line">    max := <span class="function"><span class="keyword">func</span><span class="params">(a, b *msgpb.MsgPosition)</span></span> *msgpb.MsgPosition &#123;</span><br><span class="line">        <span class="keyword">if</span> a.GetTimestamp() &gt; b.GetTimestamp() &#123;</span><br><span class="line">            <span class="keyword">return</span> a</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> b</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 更新条件：</span></span><br><span class="line">    <span class="comment">// 1. 位置更新了（时间戳更大）</span></span><br><span class="line">    <span class="comment">// 2. 
刷新触发但任务未标记为刷新</span></span><br><span class="line">    <span class="keyword">if</span> task.pos.GetTimestamp() &lt; channelPos.GetTimestamp() || (flush &amp;&amp; !task.flush) &#123;</span><br><span class="line">        ccu.mu.Lock()</span><br><span class="line">        <span class="keyword">defer</span> ccu.mu.Unlock()</span><br><span class="line">        ccu.tasks[channel] = &amp;channelCPUpdateTask&#123;</span><br><span class="line">            pos:      max(channelPos, task.pos),  <span class="comment">// 取最大时间戳</span></span><br><span class="line">            callback: callback,</span><br><span class="line">            flush:    flush || task.flush,  <span class="comment">// 保留刷新标志</span></span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="更新检查点"><a href="#更新检查点" class="headerlink" title="更新检查点"></a>更新检查点</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span 
class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(ccu *ChannelCheckpointUpdater)</span></span> updateCheckpoints(tasks []*channelCPUpdateTask) &#123;</span><br><span class="line">    <span class="comment">// 1. 分批处理（每批最大数量限制）</span></span><br><span class="line">    taskGroups := lo.Chunk(tasks, paramtable.Get().DataNodeCfg.MaxChannelCheckpointsPerRPC.GetAsInt())</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 
并行处理（最大并行数限制）</span></span><br><span class="line">    updateChanCPMaxParallel := paramtable.Get().DataNodeCfg.UpdateChannelCheckpointMaxParallel.GetAsInt()</span><br><span class="line">    rpcGroups := lo.Chunk(taskGroups, updateChanCPMaxParallel)</span><br><span class="line">    </span><br><span class="line">    finished := typeutil.NewConcurrentMap[<span class="type">string</span>, *channelCPUpdateTask]()</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> _, groups := <span class="keyword">range</span> rpcGroups &#123;</span><br><span class="line">        wg := &amp;sync.WaitGroup&#123;&#125;</span><br><span class="line">        <span class="keyword">for</span> _, tasks := <span class="keyword">range</span> groups &#123;</span><br><span class="line">            wg.Add(<span class="number">1</span>)</span><br><span class="line">            <span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">(tasks []*channelCPUpdateTask)</span></span> &#123;</span><br><span class="line">                <span class="keyword">defer</span> wg.Done()</span><br><span class="line">                </span><br><span class="line">                <span class="comment">// 3. 
构建 RPC 请求</span></span><br><span class="line">                timeout := paramtable.Get().DataNodeCfg.UpdateChannelCheckpointRPCTimeout.GetAsDuration(time.Second)</span><br><span class="line">                ctx, cancel := context.WithTimeout(context.Background(), timeout)</span><br><span class="line">                <span class="keyword">defer</span> cancel()</span><br><span class="line">                </span><br><span class="line">                channelCPs := lo.Map(tasks, <span class="function"><span class="keyword">func</span><span class="params">(t *channelCPUpdateTask, _ <span class="type">int</span>)</span></span> *msgpb.MsgPosition &#123;</span><br><span class="line">                    <span class="keyword">return</span> t.pos</span><br><span class="line">                &#125;)</span><br><span class="line">                </span><br><span class="line">                <span class="comment">// 4. 发送 RPC</span></span><br><span class="line">                err := ccu.broker.UpdateChannelCheckpoint(ctx, channelCPs)</span><br><span class="line">                <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">                    log.Warn(<span class="string">&quot;update channel checkpoint failed&quot;</span>, zap.Error(err))</span><br><span class="line">                    <span class="keyword">return</span>  <span class="comment">// 失败时不删除任务，等待重试</span></span><br><span class="line">                &#125;</span><br><span class="line">                </span><br><span class="line">                <span class="comment">// 5. 
执行回调</span></span><br><span class="line">                <span class="keyword">for</span> _, task := <span class="keyword">range</span> tasks &#123;</span><br><span class="line">                    task.callback()</span><br><span class="line">                    finished.Insert(task.pos.GetChannelName(), task)</span><br><span class="line">                    <span class="keyword">if</span> ccu.updateDoneCallback != <span class="literal">nil</span> &#123;</span><br><span class="line">                        ccu.updateDoneCallback(task.pos)</span><br><span class="line">                    &#125;</span><br><span class="line">                &#125;</span><br><span class="line">            &#125;(tasks)</span><br><span class="line">        &#125;</span><br><span class="line">        wg.Wait()</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 6. 清理已完成的任务</span></span><br><span class="line">    ccu.mu.Lock()</span><br><span class="line">    <span class="keyword">defer</span> ccu.mu.Unlock()</span><br><span class="line">    finished.Range(<span class="function"><span class="keyword">func</span><span class="params">(_ <span class="type">string</span>, task *channelCPUpdateTask)</span></span> <span class="type">bool</span> &#123;</span><br><span class="line">        channel := task.pos.GetChannelName()</span><br><span class="line">        <span class="comment">// 只有当任务位置 &gt;= 当前任务位置时才删除（避免删除更新的任务）</span></span><br><span class="line">        <span class="keyword">if</span> ccu.tasks[channel].pos.GetTimestamp() &lt;= task.pos.GetTimestamp() &#123;</span><br><span class="line">            <span class="built_in">delete</span>(ccu.tasks, channel)</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">    &#125;)</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="检查点更新触发条件"><a href="#检查点更新触发条件" class="headerlink" title="检查点更新触发条件"></a>Checkpoint Update Triggers</h2><h3 id="1-定期更新"><a href="#1-定期更新" class="headerlink" title="1. 定期更新"></a>1. Periodic Update</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> curTs.Sub(ttn.lastUpdateTime.Load()) &gt;= UpdateChannelCheckpointInterval &#123;</span><br><span class="line">    ttn.updateChannelCP(channelPos, curTs, <span class="literal">false</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Trigger condition:</strong> at least <code>UpdateChannelCheckpointInterval</code> has elapsed since the last update.</p><h3 id="2-刷新触发"><a href="#2-刷新触发" class="headerlink" title="2. 刷新触发"></a>2. Flush-Triggered Update</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">channelPos, needUpdate, err := ttn.writeBufferManager.GetCheckpoint(ttn.vChannelName)</span><br><span class="line"><span class="keyword">if</span> needUpdate &#123;</span><br><span class="line">    ttn.updateChannelCP(channelPos, curTs, <span class="literal">true</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Trigger condition:</strong> <code>GetCheckpoint</code> returns <code>needUpdate = true</code>.</p><p><strong>How needUpdate is computed:</strong></p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">flushTs := buf.GetFlushTimestamp()</span><br><span class="line"><span class="keyword">return</span> cp, flushTs != nonFlushTS &amp;&amp; cp.GetTimestamp() &gt;= flushTs, <span 
class="literal">nil</span></span><br></pre></td></tr></table></figure><p><strong>Meaning:</strong> <code>needUpdate</code> is true only when a flush timestamp has been set (<code>flushTs != nonFlushTS</code>) and the checkpoint timestamp has reached it (<code>cp.GetTimestamp() &gt;= flushTs</code>).</p><h2 id="检查点失败处理"><a href="#检查点失败处理" class="headerlink" title="检查点失败处理"></a>Checkpoint Failure Handling</h2><h3 id="失败重试机制"><a href="#失败重试机制" class="headerlink" title="失败重试机制"></a>Retry on Failure</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">err := ccu.broker.UpdateChannelCheckpoint(ctx, channelCPs)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">    log.Warn(<span class="string">&quot;update channel checkpoint failed&quot;</span>, zap.Error(err))</span><br><span class="line">    <span class="keyword">return</span>  <span class="comment">// keep the task; it will be retried later</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Characteristics:</strong></p><ul><li>A failed RPC does not delete the task</li><li>The task stays in <code>ccu.tasks</code></li><li>It is retried the next time the ticker calls <code>execute()</code>; the <code>notifyChan</code> path retries it only after a later <code>AddTask</code> marks it <code>flush</code> again, because that path clears the flag before sending</li></ul><h3 id="任务合并"><a href="#任务合并" class="headerlink" title="任务合并"></a>Task Merging</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Merge rule: keep the larger timestamp and preserve the flush flag</span></span><br><span class="line">ccu.tasks[channel] = &amp;channelCPUpdateTask&#123;</span><br><span class="line">    pos:      max(channelPos, task.pos),</span><br><span class="line">    callback: callback,</span><br><span class="line">    flush:    flush || task.flush,</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Benefits:</strong></p><ul><li>Avoids redundant updates for the same channel</li><li>Ensures the newest checkpoint is the one sent</li><li>Preserves the flush-trigger flag</li></ul><h2 id="设计特点"><a href="#设计特点" class="headerlink" title="设计特点"></a>Design Highlights</h2><h3 id="1-异步更新"><a href="#1-异步更新" class="headerlink" title="1. 异步更新"></a>1. Asynchronous Updates</h3><p>Updates go through <code>ChannelCheckpointUpdater</code> asynchronously, so they do not block the FlowGraph.</p><h3 id="2-批量处理"><a href="#2-批量处理" class="headerlink" title="2. 批量处理"></a>2. Batching</h3><p>Checkpoints of multiple channels are sent in one RPC, up to <code>MaxChannelCheckpointsPerRPC</code> per request.</p><h3 id="3-并行执行"><a href="#3-并行执行" class="headerlink" title="3. 并行执行"></a>3. Parallel Execution</h3><p>Batches are sent in parallel, with at most <code>UpdateChannelCheckpointMaxParallel</code> RPCs in flight per round.</p><h3 id="4-失败重试"><a href="#4-失败重试" class="headerlink" title="4. 失败重试"></a>4. Retry on Failure</h3><p>Failed tasks stay in <code>ccu.tasks</code> and are retried automatically in a later round.</p><h3 id="5-刷新加速"><a href="#5-刷新加速" class="headerlink" title="5. 刷新加速"></a>5. Flush Fast Path</h3><p>A flush-triggered update is attempted immediately instead of waiting for the ticker.</p><h2 id="配置参数"><a href="#配置参数" class="headerlink" title="配置参数"></a>Configuration Parameters</h2><ul><li><code>UpdateChannelCheckpointInterval</code>: interval between periodic updates</li><li><code>ChannelCheckpointUpdateTickInSeconds</code>: tick interval of the updater's timer, in seconds</li><li><code>MaxChannelCheckpointsPerRPC</code>: maximum number of checkpoints per RPC</li><li><code>UpdateChannelCheckpointMaxParallel</code>: maximum number of parallel RPCs</li><li><code>UpdateChannelCheckpointRPCTimeout</code>: RPC timeout</li></ul><h2 id="相关文档"><a href="#相关文档" class="headerlink" title="相关文档"></a>Related Documents</h2><ul><li><a href="./flush_pipeline_write_buffer_manager.md">WriteBuffer Manager</a></li><li><a href="./flush_pipeline_flush_checkpoint.md">Flush and Checkpoint Mechanism</a></li><li><a href="./flush_pipeline_overview.md">Overview</a></li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;概述&quot;&gt;&lt;/a&gt;概述&lt;/h2&gt;&lt;p&gt;&lt;code&gt;ttNode&lt;/code&gt;（TimeTick Node）是 FlowGraph 中的最后一个节点，负责管理和更新通道检查点（Ch</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Milvus SyncManager and SyncTask in Detail</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/9_flush_pipeline_sync_manager/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/9_flush_pipeline_sync_manager/</id>
    <published>2025-08-09T10:00:00.000Z</published>
    <updated>2026-01-06T13:10:45.842Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p><code>SyncManager</code> is the core component in the Milvus DataNode that manages asynchronous data sync: it persists in-memory data to object storage. <code>SyncTask</code> is the unit of work that performs one sync, covering the full flow of writing data and updating metadata.</p><h2 id="SyncManager-架构"><a href="#SyncManager-架构" class="headerlink" title="SyncManager 架构"></a>SyncManager Architecture</h2><h3 id="核心结构"><a href="#核心结构" class="headerlink" title="核心结构"></a>Core Structure</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> syncManager <span class="keyword">struct</span> &#123;</span><br><span class="line">    *keyLockDispatcher[<span class="type">int64</span>]</span><br><span class="line">    chunkManager storage.ChunkManager</span><br><span class="line">    </span><br><span class="line">    tasks     *typeutil.ConcurrentMap[<span class="type">string</span>, Task]</span><br><span class="line">    taskStats *expirable.LRU[<span class="type">string</span>, Task]</span><br><span class="line">    handler   config.EventHandler</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key fields:</strong></p><ul><li><code>keyLockDispatcher</code>: task dispatcher keyed by segment ID; tasks for the same segment run serially</li><li><code>chunkManager</code>: object storage manager</li><li><code>tasks</code>: map of tasks currently in flight</li><li><code>taskStats</code>: task statistics (LRU cache of up to 64 entries, each expiring after 15 minutes)</li><li><code>handler</code>: config event handler, registered in <code>NewSyncManager</code> to resize the worker pool when the pool-size setting changes</li></ul><h3 id="接口定义"><a href="#接口定义" class="headerlink" title="接口定义"></a>Interface Definition</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td 
class="code"><pre><span class="line"><span class="keyword">type</span> SyncManager <span class="keyword">interface</span> &#123;</span><br><span class="line">    SyncData(ctx context.Context, task Task, callbacks ...<span class="keyword">func</span>(<span class="type">error</span>) <span class="type">error</span>) (*conc.Future[<span class="keyword">struct</span>&#123;&#125;], <span class="type">error</span>)</span><br><span class="line">    SyncDataWithChunkManager(ctx context.Context, task Task, chunkManager storage.ChunkManager, callbacks ...<span class="keyword">func</span>(<span class="type">error</span>) <span class="type">error</span>) (*conc.Future[<span class="keyword">struct</span>&#123;&#125;], <span class="type">error</span>)</span><br><span class="line">    Close() <span class="type">error</span></span><br><span class="line">    TaskStatsJSON() <span class="type">string</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="SyncManager-初始化"><a href="#SyncManager-初始化" class="headerlink" title="SyncManager 初始化"></a>SyncManager 初始化</h2><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span 
class="title">NewSyncManager</span><span class="params">(chunkManager storage.ChunkManager)</span></span> SyncManager &#123;</span><br><span class="line">    params := paramtable.Get()</span><br><span class="line">    cpuNum := hardware.GetCPUNum()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 初始化工作池大小：CPU 核心数 × 每核心任务数</span></span><br><span class="line">    initPoolSize := cpuNum * params.DataNodeCfg.MaxParallelSyncMgrTasksPerCPUCore.GetAsInt()</span><br><span class="line">    dispatcher := newKeyLockDispatcher[<span class="type">int64</span>](initPoolSize)</span><br><span class="line">    </span><br><span class="line">    syncMgr := &amp;syncManager&#123;</span><br><span class="line">        keyLockDispatcher: dispatcher,</span><br><span class="line">        chunkManager:      chunkManager,</span><br><span class="line">        tasks:             typeutil.NewConcurrentMap[<span class="type">string</span>, Task](),</span><br><span class="line">        taskStats:         expirable.NewLRU[<span class="type">string</span>, Task](<span class="number">64</span>, <span class="literal">nil</span>, time.Minute*<span class="number">15</span>),</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 监听配置变更，动态调整工作池大小</span></span><br><span class="line">    handler := config.NewHandler(<span class="string">&quot;datanode.syncmgr.poolsize&quot;</span>, syncMgr.resizeHandler)</span><br><span class="line">    syncMgr.handler = handler</span><br><span class="line">    params.Watch(params.DataNodeCfg.MaxParallelSyncMgrTasksPerCPUCore.Key, handler)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> syncMgr</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="任务提交流程"><a href="#任务提交流程" class="headerlink" title="任务提交流程"></a>任务提交流程</h2><h3 id="SyncData-方法"><a href="#SyncData-方法" 
class="headerlink" title="SyncData 方法"></a>SyncData 方法</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(mgr *syncManager)</span></span> SyncData(ctx context.Context, task Task, callbacks ...<span class="keyword">func</span>(<span class="type">error</span>) <span class="type">error</span>) (*conc.Future[<span class="keyword">struct</span>&#123;&#125;], <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="keyword">if</span> mgr.workerPool.IsClosed() &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, errors.New(<span class="string">&quot;sync manager is closed&quot;</span>)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 为 SyncTask 设置 ChunkManager</span></span><br><span class="line">    <span class="keyword">switch</span> t := task.(<span class="keyword">type</span>) &#123;</span><br><span class="line">    <span class="keyword">case</span> *SyncTask:</span><br><span class="line">        t.WithChunkManager(mgr.chunkManager)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> mgr.safeSubmitTask(ctx, task, callbacks...), <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 
id="safeSubmitTask-方法"><a href="#safeSubmitTask-方法" class="headerlink" title="safeSubmitTask 方法"></a>safeSubmitTask 方法</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(mgr *syncManager)</span></span> safeSubmitTask(ctx context.Context, task Task, callbacks ...<span class="keyword">func</span>(<span class="type">error</span>) <span class="type">error</span>) *conc.Future[<span class="keyword">struct</span>&#123;&#125;] &#123;</span><br><span class="line">    <span class="comment">// 生成任务键：segmentID-timestamp</span></span><br><span class="line">    taskKey := fmt.Sprintf(<span class="string">&quot;%d-%d&quot;</span>, task.SegmentID(), task.Checkpoint().GetTimestamp())</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 记录任务</span></span><br><span class="line">    mgr.tasks.Insert(taskKey, task)</span><br><span class="line">    mgr.taskStats.Add(taskKey, task)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 使用段 ID 作为分发键，确保同一段的任务串行执行</span></span><br><span class="line">    key := task.SegmentID()</span><br><span class="line">    <span class="keyword">return</span> mgr.submit(ctx, key, task, callbacks...)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="submit-方法"><a href="#submit-方法" class="headerlink" title="submit 方法"></a>submit 方法</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(mgr *syncManager)</span></span> submit(ctx context.Context, key <span class="type">int64</span>, task Task, callbacks ...<span class="keyword">func</span>(<span class="type">error</span>) <span class="type">error</span>) *conc.Future[<span class="keyword">struct</span>&#123;&#125;] &#123;</span><br><span class="line">    handler := <span class="function"><span class="keyword">func</span><span class="params">(err <span class="type">error</span>)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">        taskKey := fmt.Sprintf(<span class="string">&quot;%d-%d&quot;</span>, task.SegmentID(), task.Checkpoint().GetTimestamp())</span><br><span class="line">        <span class="keyword">defer</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">            mgr.tasks.Remove(taskKey)  <span class="comment">// 任务完成后移除</span></span><br><span class="line">        &#125;()</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">if</span> err == <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">        
&#125;</span><br><span class="line">        </span><br><span class="line">        task.HandleError(err)  <span class="comment">// 处理错误</span></span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    callbacks = <span class="built_in">append</span>([]<span class="function"><span class="keyword">func</span><span class="params">(<span class="type">error</span>)</span></span> <span class="type">error</span>&#123;handler&#125;, callbacks...)</span><br><span class="line">    <span class="keyword">return</span> mgr.Submit(ctx, key, task, callbacks...)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="SyncTask-详解"><a href="#SyncTask-详解" class="headerlink" title="SyncTask 详解"></a>SyncTask 详解</h2><h3 id="核心结构-1"><a href="#核心结构-1" class="headerlink" title="核心结构"></a>核心结构</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span 
class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> SyncTask <span class="keyword">struct</span> &#123;</span><br><span class="line">    chunkManager storage.ChunkManager</span><br><span class="line">    allocator    allocator.Interface</span><br><span class="line">    </span><br><span class="line">    collectionID  <span class="type">int64</span></span><br><span class="line">    partitionID   <span class="type">int64</span></span><br><span class="line">    segmentID     <span class="type">int64</span></span><br><span class="line">    channelName   <span class="type">string</span></span><br><span class="line">    startPosition *msgpb.MsgPosition</span><br><span class="line">    checkpoint    *msgpb.MsgPosition</span><br><span class="line">    dataSource    <span class="type">string</span></span><br><span class="line">    batchRows     <span class="type">int64</span></span><br><span class="line">    level         datapb.SegmentLevel</span><br><span class="line">    </span><br><span class="line">    tsFrom typeutil.Timestamp</span><br><span class="line">    tsTo   typeutil.Timestamp</span><br><span class="line">    </span><br><span class="line">    metacache  metacache.MetaCache</span><br><span class="line">    metaWriter MetaWriter</span><br><span class="line">    schema     *schemapb.CollectionSchema</span><br><span class="line">    </span><br><span class="line">    pack *SyncPack</span><br><span class="line">    </span><br><span class="line">    insertBinlogs <span class="keyword">map</span>[<span class="type">int64</span>]*datapb.FieldBinlog</span><br><span class="line">    statsBinlogs  <span class="keyword">map</span>[<span 
class="type">int64</span>]*datapb.FieldBinlog</span><br><span class="line">    bm25Binlogs   <span class="keyword">map</span>[<span class="type">int64</span>]*datapb.FieldBinlog</span><br><span class="line">    deltaBinlog   *datapb.FieldBinlog</span><br><span class="line">    </span><br><span class="line">    manifestPath <span class="type">string</span></span><br><span class="line">    </span><br><span class="line">    writeRetryOpts []retry.Option</span><br><span class="line">    failureCallback <span class="function"><span class="keyword">func</span><span class="params">(err <span class="type">error</span>)</span></span></span><br><span class="line">    </span><br><span class="line">    tr *timerecord.TimeRecorder</span><br><span class="line">    </span><br><span class="line">    flushedSize <span class="type">int64</span></span><br><span class="line">    execTime    time.Duration</span><br><span class="line">    </span><br><span class="line">    storageConfig *indexpb.StorageConfig</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="SyncPack（数据包）"><a href="#SyncPack（数据包）" class="headerlink" title="SyncPack（数据包）"></a>SyncPack（数据包）</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span 
class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> SyncPack <span class="keyword">struct</span> &#123;</span><br><span class="line">    metacache  metacache.MetaCache</span><br><span class="line">    metawriter MetaWriter</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Data</span></span><br><span class="line">    insertData []*storage.InsertData</span><br><span class="line">    deltaData  *storage.DeleteData</span><br><span class="line">    bm25Stats  <span class="keyword">map</span>[<span class="type">int64</span>]*storage.BM25Stats</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Statistics</span></span><br><span class="line">    tsFrom        typeutil.Timestamp</span><br><span class="line">    tsTo          typeutil.Timestamp</span><br><span class="line">    startPosition *msgpb.MsgPosition</span><br><span class="line">    checkpoint    *msgpb.MsgPosition</span><br><span class="line">    batchRows     <span class="type">int64</span></span><br><span class="line">    dataSource    <span class="type">string</span></span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Flags</span></span><br><span class="line">    isFlush <span class="type">bool</span></span><br><span class="line">    isDrop  <span class="type">bool</span></span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Metadata</span></span><br><span class="line">    collectionID <span class="type">int64</span></span><br><span class="line">    partitionID  <span class="type">int64</span></span><br><span class="line">    segmentID    <span class="type">int64</span></span><br><span class="line">    channelName  <span 
class="type">string</span></span><br><span class="line">    level        datapb.SegmentLevel</span><br><span class="line">    </span><br><span class="line">    errHandler <span class="function"><span class="keyword">func</span><span class="params">(err <span class="type">error</span>)</span></span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="SyncTask-执行流程"><a href="#SyncTask-执行流程" class="headerlink" title="SyncTask 执行流程"></a>SyncTask Execution Flow</h2><h3 id="Run-方法"><a href="#Run-方法" class="headerlink" title="Run 方法"></a>The Run Method</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span 
class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *SyncTask)</span></span> Run(ctx context.Context) (err <span class="type">error</span>) &#123;</span><br><span class="line">    t.tr = timerecord.NewTimeRecorder(<span class="string">&quot;syncTask&quot;</span>)</span><br><span class="line">    log := t.getLogger()</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">defer</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            t.HandleError(err)</span><br><span class="line">        &#125;</span><br><span class="line">   
 &#125;()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. Check whether the segment exists</span></span><br><span class="line">    segmentInfo, has := t.metacache.GetSegmentByID(t.segmentID)</span><br><span class="line">    <span class="keyword">if</span> !has &#123;</span><br><span class="line">        <span class="keyword">if</span> t.pack.isDrop &#123;</span><br><span class="line">            log.Info(<span class="string">&quot;segment dropped, discard sync task&quot;</span>)</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">        &#125;</span><br><span class="line">        log.Warn(<span class="string">&quot;segment not found in metacache, may be already synced&quot;</span>)</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Get column group information (StorageV2)</span></span><br><span class="line">    columnGroups := t.getColumnGroups(segmentInfo)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
Write data to object storage</span></span><br><span class="line">    <span class="keyword">switch</span> segmentInfo.GetStorageVersion() &#123;</span><br><span class="line">    <span class="keyword">case</span> storage.StorageV2:</span><br><span class="line">        writer := NewBulkPackWriterV2(...)</span><br><span class="line">        t.insertBinlogs, t.deltaBinlog, t.statsBinlogs, t.bm25Binlogs, </span><br><span class="line">            t.manifestPath, t.flushedSize, err = writer.Write(ctx, t.pack)</span><br><span class="line">    <span class="keyword">default</span>:</span><br><span class="line">        writer := NewBulkPackWriter(...)</span><br><span class="line">        t.insertBinlogs, t.deltaBinlog, t.statsBinlogs, t.bm25Binlogs, </span><br><span class="line">            t.flushedSize, err = writer.Write(ctx, t.pack)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Record monitoring metrics</span></span><br><span class="line">    metrics.DataNodeWriteDataCount.Add(<span class="type">float64</span>(t.batchRows))</span><br><span class="line">    metrics.DataNodeFlushedSize.Add(<span class="type">float64</span>(t.flushedSize))</span><br><span class="line">    metrics.DataNodeSave2StorageLatency.Observe(<span class="type">float64</span>(t.tr.RecordSpan().Milliseconds()))</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. 
Update metadata</span></span><br><span class="line">    <span class="keyword">if</span> t.metaWriter != <span class="literal">nil</span> &#123;</span><br><span class="line">        err = t.writeMeta(ctx)</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> err</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 6. Release data</span></span><br><span class="line">    t.pack.ReleaseData()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 7. Update the MetaCache</span></span><br><span class="line">    actions := []metacache.SegmentAction&#123;metacache.FinishSyncing(t.batchRows)&#125;</span><br><span class="line">    <span class="keyword">if</span> columnGroups != <span class="literal">nil</span> &#123;</span><br><span class="line">        actions = <span class="built_in">append</span>(actions, metacache.UpdateCurrentSplit(columnGroups))</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">if</span> t.pack.isFlush &#123;</span><br><span class="line">        actions = <span class="built_in">append</span>(actions, metacache.UpdateState(commonpb.SegmentState_Flushed))</span><br><span class="line">    &#125;</span><br><span class="line">    t.metacache.UpdateSegments(metacache.MergeSegmentAction(actions...), </span><br><span class="line">        metacache.WithSegmentIDs(t.segmentID))</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 8. 
Handle dropped segments</span></span><br><span class="line">    <span class="keyword">if</span> t.pack.isDrop &#123;</span><br><span class="line">        t.metacache.RemoveSegments(metacache.WithSegmentIDs(t.segmentID))</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    t.execTime = t.tr.ElapseSpan()</span><br><span class="line">    log.Info(<span class="string">&quot;task done&quot;</span>, zap.Int64(<span class="string">&quot;flushedSize&quot;</span>, t.flushedSize), </span><br><span class="line">        zap.Duration(<span class="string">&quot;timeTaken&quot;</span>, t.execTime))</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="执行流程图"><a href="#执行流程图" class="headerlink" title="执行流程图"></a>Execution Flowchart</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span 
class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br></pre></td><td class="code"><pre><span class="line">┌─────────────────────────────────────────┐</span><br><span class="line">│         SyncTask.Run()                  │</span><br><span class="line">└──────────────┬──────────────────────────┘</span><br><span class="line">               │</span><br><span class="line">               ▼</span><br><span class="line">    ┌──────────────────────┐</span><br><span class="line">    │ Check segment exists │</span><br><span class="line">    └──────────┬───────────┘</span><br><span class="line">               │</span><br><span class="line">               ▼</span><br><span class="line">    ┌──────────────────────┐</span><br><span class="line">    │ Get column groups    │</span><br><span class="line">    └──────────┬───────────┘</span><br><span class="line">               │</span><br><span class="line">               ▼</span><br><span class="line">    ┌──────────────────────┐</span><br><span class="line">    │ Write to object store│</span><br><span class="line">    │ (BulkPackWriter)     │</span><br><span class="line">    └──────────┬───────────┘</span><br><span class="line">               │</span><br><span class="line">               ▼</span><br><span class="line">    ┌──────────────────────┐</span><br><span class="line">    │ Record metrics       │</span><br><span class="line">    └──────────┬───────────┘</span><br><span class="line">               │</span><br><span class="line">               ▼</span><br><span class="line">    ┌──────────────────────┐</span><br><span class="line">    │ Update DataCoord meta│</span><br><span 
class="line">    │ (MetaWriter)         │</span><br><span class="line">    └──────────┬───────────┘</span><br><span class="line">               │</span><br><span class="line">               ▼</span><br><span class="line">    ┌──────────────────────┐</span><br><span class="line">    │ Release data pack    │</span><br><span class="line">    └──────────┬───────────┘</span><br><span class="line">               │</span><br><span class="line">               ▼</span><br><span class="line">    ┌──────────────────────┐</span><br><span class="line">    │ Update MetaCache     │</span><br><span class="line">    └──────────┬───────────┘</span><br><span class="line">               │</span><br><span class="line">               ▼</span><br><span class="line">    ┌──────────────────────┐</span><br><span class="line">    │ Task done            │</span><br><span class="line">    └──────────────────────┘</span><br></pre></td></tr></table></figure><h2 id="元数据更新（writeMeta）"><a href="#元数据更新（writeMeta）" class="headerlink" title="元数据更新（writeMeta）"></a>Metadata Update (writeMeta)</h2><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *SyncTask)</span></span> writeMeta(ctx context.Context) <span class="type">error</span> &#123;</span><br><span class="line">    <span class="keyword">return</span> t.metaWriter.UpdateSync(ctx, t)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="MetaWriter-实现"><a href="#MetaWriter-实现" class="headerlink" title="MetaWriter 实现"></a>MetaWriter Implementation</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(b *brokerMetaWriter)</span></span> UpdateSync(ctx context.Context, pack *SyncTask) <span class="type">error</span> &#123;</span><br><span class="line">    <span class="comment">// 1. Build the SaveBinlogPathsRequest</span></span><br><span class="line">    req := &amp;datapb.SaveBinlogPathsRequest&#123;</span><br><span class="line">        Base: &amp;commonpb.MsgBase&#123;</span><br><span class="line">            MsgType: commonpb.MsgType_SaveBinlogPaths,</span><br><span class="line">        &#125;,</span><br><span class="line">        SegmentID:     pack.segmentID,</span><br><span class="line">        CollectionID:  pack.collectionID,</span><br><span class="line">        Field2BinlogPaths: pack.insertBinlogs,</span><br><span class="line">        Field2StatslogPaths: pack.statsBinlogs,</span><br><span class="line">        Field2BM25StatslogPaths: pack.bm25Binlogs,</span><br><span class="line">        Deltalogs:     []*datapb.FieldBinlog&#123;pack.deltaBinlog&#125;,</span><br><span class="line">        CheckPoints:   checkPoints,</span><br><span class="line">        Flushed:       pack.pack.isFlush,  <span class="comment">// flush flag</span></span><br><span class="line">        Dropped:       pack.pack.isDrop,    <span 
class="comment">// drop flag</span></span><br><span class="line">        <span class="comment">// ...</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Send the RPC with retries</span></span><br><span class="line">    err := retry.Handle(ctx, <span class="function"><span class="keyword">func</span><span class="params">()</span></span> (<span class="type">bool</span>, <span class="type">error</span>) &#123;</span><br><span class="line">        err := b.broker.SaveBinlogPaths(ctx, req)</span><br><span class="line">        <span class="keyword">return</span> err != <span class="literal">nil</span>, err</span><br><span class="line">    &#125;)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> err</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="错误处理"><a href="#错误处理" class="headerlink" title="错误处理"></a>Error Handling</h2><h3 id="HandleError-方法"><a href="#HandleError-方法" class="headerlink" title="HandleError 方法"></a>The HandleError Method</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(t *SyncTask)</span></span> HandleError(err <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// Invoke the failure callback</span></span><br><span class="line">    <span class="keyword">if</span> t.failureCallback != <span class="literal">nil</span> &#123;</span><br><span class="line">        
t.failureCallback(err)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Record failure metrics</span></span><br><span class="line">    metrics.DataNodeFlushBufferCount.WithLabelValues(metrics.FailLabel, t.level.String()).Inc()</span><br><span class="line">    <span class="keyword">if</span> !t.pack.isFlush &#123;</span><br><span class="line">        metrics.DataNodeAutoFlushBufferCount.WithLabelValues(metrics.FailLabel, t.level.String()).Inc()</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="Builder-模式创建"><a href="#Builder-模式创建" class="headerlink" title="Builder 模式创建"></a>Creating Tasks with the Builder Pattern</h2><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Create the SyncPack</span></span><br><span class="line">pack := &amp;syncmgr.SyncPack&#123;&#125;</span><br><span class="line">pack.WithInsertData(insertData).</span><br><span class="line">    WithDeleteData(deltaData).</span><br><span class="line">    WithCollectionID(collectionID).</span><br><span class="line">    WithSegmentID(segmentID).</span><br><span class="line">    WithCheckpoint(checkpoint).</span><br><span class="line">    WithFlush()  <span class="comment">// mark as flush</span></span><br><span 
class="line"></span><br><span class="line"><span class="comment">// Create the SyncTask</span></span><br><span class="line">task := syncmgr.NewSyncTask().</span><br><span class="line">    WithSyncPack(pack).</span><br><span class="line">    WithAllocator(allocator).</span><br><span class="line">    WithMetaWriter(metaWriter).</span><br><span class="line">    WithMetaCache(metaCache).</span><br><span class="line">    WithSchema(schema)</span><br><span class="line"></span><br><span class="line"><span class="comment">// Submit to the SyncManager</span></span><br><span class="line">future, err := syncMgr.SyncData(ctx, task, callback)</span><br></pre></td></tr></table></figure><h2 id="设计特点"><a href="#设计特点" class="headerlink" title="设计特点"></a>Design Highlights</h2><h3 id="1-异步执行"><a href="#1-异步执行" class="headerlink" title="1. 异步执行"></a>1. Asynchronous Execution</h3><p>Tasks run asynchronously on a worker pool, so the main flow is not blocked.</p><h3 id="2-串行保证"><a href="#2-串行保证" class="headerlink" title="2. 串行保证"></a>2. Serial Execution Guarantee</h3><p><code>keyLockDispatcher</code> ensures that tasks for the same segment run serially, which avoids concurrent conflicts.</p><h3 id="3-存储版本支持"><a href="#3-存储版本支持" class="headerlink" title="3. 存储版本支持"></a>3. Storage Version Support</h3><p>Both the StorageV1 and StorageV2 storage formats are supported.</p><h3 id="4-列分组优化（StorageV2）"><a href="#4-列分组优化（StorageV2）" class="headerlink" title="4. 列分组优化（StorageV2）"></a>4. Column Group Optimization (StorageV2)</h3><p>Columns are grouped dynamically based on column statistics to optimize query performance.</p><h3 id="5-完善的监控"><a href="#5-完善的监控" class="headerlink" title="5. 完善的监控"></a>5. Comprehensive Monitoring</h3><p>Detailed execution metrics and performance data are recorded.</p><h2 id="相关文档"><a href="#相关文档" class="headerlink" title="相关文档"></a>Related Documents</h2><ul><li><a href="./flush_pipeline_write_buffer.md">WriteBuffer Implementation</a></li><li><a href="./flush_pipeline_tt_node.md">TT Node in Detail</a></li><li><a href="./flush_pipeline_flush_checkpoint.md">Flush and Checkpoint Mechanisms</a></li><li><a href="./flush_pipeline_overview.md">Overview</a></li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;概述&quot;&gt;&lt;/a&gt;Overview&lt;/h2&gt;&lt;p&gt;&lt;code&gt;SyncManager&lt;/code&gt; is the core component in the Milvus DataNode that manages asynchronous data synchronization; it is responsible for persisting in-memory data</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Milvus Flush and Checkpoint Mechanisms Explained</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/8_flush_pipeline_flush_checkpoint/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/8_flush_pipeline_flush_checkpoint/</id>
    <published>2025-08-09T09:00:00.000Z</published>
    <updated>2026-01-06T13:10:45.427Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p>Flush and Checkpoint are the two core mechanisms in the Milvus DataNode that ensure data reliability and consistency. They work together to carry data through the complete path from memory to persistent storage, and to keep data consistent during failure recovery.</p><h2 id="Flush-操作全链路影响"><a href="#Flush-操作全链路影响" class="headerlink" title="Flush 操作全链路影响"></a>End-to-End Impact of a Flush Operation</h2><h3 id="Flush-触发方式"><a href="#Flush-触发方式" class="headerlink" title="Flush 触发方式"></a>How Flush Is Triggered</h3><h4 id="1-手动刷新（Manual-Flush）"><a href="#1-手动刷新（Manual-Flush）" class="headerlink" title="1. 手动刷新（Manual Flush）"></a>1. Manual Flush</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Triggered by a DDL message</span></span><br><span class="line"><span class="keyword">case</span> commonpb.MsgType_ManualFlush:</span><br><span class="line">    manualFlushMsg := msg.(*adaptor.ManualFlushMessageBody)</span><br><span class="line">    ddn.msgHandler.HandleManualFlush(manualFlushMsg.ManualFlushMessage)</span><br></pre></td></tr></table></figure><p><strong>Flow:</strong></p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">DataCoord → ManualFlush message → ddNode → MsgHandler → WriteBufferManager.FlushChannel</span><br></pre></td></tr></table></figure><h4 id="2-自动刷新（Auto-Flush）"><a href="#2-自动刷新（Auto-Flush）" class="headerlink" title="2. 自动刷新（Auto Flush）"></a>2. 
Auto Flush</h4><p><strong>Trigger conditions:</strong></p><ul><li>Memory usage exceeds the threshold (<code>MemoryForceSyncWatermark</code>)</li><li>The buffer is full (<code>GetFullBufferPolicy</code>)</li><li>Time-based policy (<code>GetSyncStaleBufferPolicy</code>)</li><li>Flush timestamp (<code>GetFlushTsPolicy</code>)</li></ul><h3 id="Flush-执行流程"><a href="#Flush-执行流程" class="headerlink" title="Flush 执行流程"></a>Flush Execution Flow</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 1. 
FlushChannel sets flushTs                                │</span><br><span class="line">│    WriteBufferManager.FlushChannel(channel, flushTs)        │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 2. WriteBuffer applies the flush policy                     │</span><br><span class="line">│    GetFlushTsPolicy: select segments when ts &gt;= flushTs     │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 3. Create SyncTask (isFlush = true)                         │</span><br><span class="line">│    SyncPack.WithFlush()                                     │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 4. 
SyncTask.Run() executes                                  │</span><br><span class="line">│    - Write data to object storage                           │</span><br><span class="line">│    - Update metadata in DataCoord                           │</span><br><span class="line">│    - Update local MetaCache (state → Flushed)               │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 5. Remove the segment from MetaCache                        │</span><br><span class="line">│    metaCache.RemoveSegments(segmentID)                      │</span><br><span class="line">└─────────────────────────────────────────────────────────────┘</span><br></pre></td></tr></table></figure><h3 id="Flush-对系统的影响"><a href="#Flush-对系统的影响" class="headerlink" title="Flush 对系统的影响"></a>Impact of Flush on the System</h3><h4 id="1-数据持久化"><a href="#1-数据持久化" class="headerlink" title="1. 数据持久化"></a>1. Data Persistence</h4><ul><li><strong>Object storage</strong>: data is written to object storage (S3&#x2F;MinIO, etc.)</li><li><strong>Binlog generation</strong>: InsertLogs, StatsLogs, and DeltaLogs are generated</li><li><strong>Manifest</strong>: a Manifest file is generated in StorageV2 mode</li></ul><h4 id="2-元数据更新"><a href="#2-元数据更新" class="headerlink" title="2. 元数据更新"></a>2. Metadata Update</h4><ul><li><strong>DataCoord</strong>: segment metadata is updated through the <code>SaveBinlogPaths</code> RPC</li><li><strong>MetaCache</strong>: the segment state is updated to <code>Flushed</code>, and the segment is then removed</li></ul><h4 id="3-内存释放"><a href="#3-内存释放" class="headerlink" title="3. 内存释放"></a>3. Memory Release</h4><ul><li><strong>WriteBuffer</strong>: the segment's buffer memory is released</li><li><strong>MetaCache</strong>: the segment's metadata is removed</li></ul><h4 id="4-监控指标"><a href="#4-监控指标" class="headerlink" title="4. 监控指标"></a>4. 
Monitoring Metrics</h4><ul><li><code>DataNodeFlushBufferCount</code>: flush count</li><li><code>DataNodeFlushedSize</code>: flushed data size</li><li><code>DataNodeFlushedRows</code>: flushed row count</li><li><code>DataNodeSave2StorageLatency</code>: storage latency</li></ul><h2 id="Checkpoint-机制"><a href="#Checkpoint-机制" class="headerlink" title="Checkpoint 机制"></a>Checkpoint Mechanism</h2><h3 id="Checkpoint-的作用"><a href="#Checkpoint-的作用" class="headerlink" title="Checkpoint 的作用"></a>Purpose of Checkpoint</h3><ol><li><strong>Data consumption position</strong>: records how far the channel's data has been consumed</li><li><strong>Failure recovery</strong>: after a restart, the node recovers from the checkpoint</li><li><strong>Data consistency</strong>: ensures data is neither lost nor duplicated</li></ol><h3 id="Checkpoint-更新流程"><a href="#Checkpoint-更新流程" class="headerlink" title="Checkpoint 更新流程"></a>Checkpoint Update Flow</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 1. 
ttNode fetches the checkpoint                            │</span><br><span class="line">│    channelPos, needUpdate := GetCheckpoint(channel)         │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 2. Decide whether an update is needed                       │</span><br><span class="line">│    needUpdate = (flushTs != 0 &amp;&amp; cp.Timestamp &gt;= flushTs)   │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 3. Add a task to CheckpointUpdater                          │</span><br><span class="line">│    cpUpdater.AddTask(channelPos, flush=true, callback)      │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 4. Update DataCoord asynchronously                          │</span><br><span class="line">│    broker.UpdateChannelCheckpoint(ctx, channelCPs)          │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 5. 
Callback on successful update                            │</span><br><span class="line">│    NotifyCheckpointUpdated(channel, ts)                     │</span><br><span class="line">│    if ts &gt; flushTs: reset flushTs                            │</span><br><span class="line">└─────────────────────────────────────────────────────────────┘</span><br></pre></td></tr></table></figure><h3 id="Checkpoint-获取逻辑"><a href="#Checkpoint-获取逻辑" class="headerlink" title="Checkpoint 获取逻辑"></a>Checkpoint retrieval logic</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(wb *writeBufferBase)</span></span> GetCheckpoint() *msgpb.MsgPosition &#123;</span><br><span class="line">    <span class="comment">// 1. 
If any buffer is non-empty, return the earliest start position</span></span><br><span class="line">    <span class="keyword">var</span> earliest *checkpointCandidate</span><br><span class="line">    <span class="keyword">for</span> _, buf := <span class="keyword">range</span> wb.segmentBuffers &#123;</span><br><span class="line">        <span class="keyword">if</span> buf.insertBuffer.rows &gt; <span class="number">0</span> || buf.deltaBuffer.Size() &gt; <span class="number">0</span> &#123;</span><br><span class="line">            <span class="keyword">if</span> earliest == <span class="literal">nil</span> || buf.startPosition.GetTimestamp() &lt; earliest.position.GetTimestamp() &#123;</span><br><span class="line">                earliest = &amp;checkpointCandidate&#123;</span><br><span class="line">                    segmentID: buf.segmentID,</span><br><span class="line">                    position:  buf.startPosition,</span><br><span class="line">                &#125;</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. 
Otherwise, return the latest checkpoint</span></span><br><span class="line">    <span class="keyword">if</span> earliest != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> earliest.position</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> wb.checkpoint</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Design rationale:</strong></p><ul><li>While unsynced data exists, the checkpoint must not move past the start position of the earliest unsynced data</li><li>This ensures that no data is lost during failure recovery</li></ul><h2 id="Flush-与-Checkpoint-的关系"><a href="#Flush-与-Checkpoint-的关系" class="headerlink" title="Flush 与 Checkpoint 的关系"></a>How Flush and Checkpoint relate</h2><h3 id="协同工作流程"><a href="#协同工作流程" class="headerlink" title="协同工作流程"></a>Cooperative workflow</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br></pre></td><td class="code"><pre><span 
class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│                    Flush operation                          │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 1. Set flushTs = T1                                         │</span><br><span class="line">│    FlushChannel(channel, flushTs=T1)                        │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 2. Trigger data sync                                        │</span><br><span class="line">│    SyncTask runs and writes data to object storage          │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 3. 
Checkpoint advances to T2 (T2 &gt;= T1)                     │</span><br><span class="line">│    GetCheckpoint() returns checkpoint = T2                  │</span><br><span class="line">│    needUpdate = (T2 &gt;= T1) = true                            │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 4. Update the checkpoint in DataCoord                       │</span><br><span class="line">│    UpdateChannelCheckpoint(T2)                                │</span><br><span class="line">└──────────────────────┬──────────────────────────────────────┘</span><br><span class="line">                       │</span><br><span class="line">                       ▼</span><br><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│ 5. 
Reset flushTs                                            │</span><br><span class="line">│    NotifyCheckpointUpdated(channel, T2)                     │</span><br><span class="line">│    if T2 &gt; T1: flushTs = 0                                  │</span><br><span class="line">└─────────────────────────────────────────────────────────────┘</span><br></pre></td></tr></table></figure><h3 id="关键条件：ts-flushTs"><a href="#关键条件：ts-flushTs" class="headerlink" title="关键条件：ts &gt; flushTs"></a>Key condition: ts &gt; flushTs</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> NotifyCheckpointUpdated(channel <span class="type">string</span>, ts <span class="type">uint64</span>) &#123;</span><br><span class="line">    buf, loaded := m.buffers.Get(channel)</span><br><span class="line">    <span class="keyword">if</span> !loaded &#123;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    flushTs := buf.GetFlushTimestamp()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Key condition: strictly greater than</span></span><br><span class="line">    <span class="keyword">if</span> flushTs != nonFlushTS &amp;&amp; ts &gt; flushTs &#123;</span><br><span class="line">        buf.SetFlushTimestamp(nonFlushTS)</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Why require 
<code>ts &gt; flushTs</code> (strictly greater than)?</strong></p><h4 id="1-异步操作的时序问题"><a href="#1-异步操作的时序问题" class="headerlink" title="1. 异步操作的时序问题"></a>1. Ordering of asynchronous operations</h4><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">Timeline:</span><br><span class="line">T1: FlushChannel(flushTs=T1) sets the flush timestamp</span><br><span class="line">T2: SyncTask starts running (asynchronously)</span><br><span class="line">T3: SyncTask finishes; the data has been written</span><br><span class="line">T4: Checkpoint is updated to T1 (exactly equal to flushTs)</span><br><span class="line">T5: Checkpoint is updated to T2 (greater than flushTs)</span><br></pre></td></tr></table></figure><p><strong>With <code>ts &gt;= flushTs</code>:</strong></p><ul><li>At T4, <code>T1 &gt;= T1</code> is true, so <code>flushTs</code> would be reset</li><li>But the SyncTask may not have finished at that point, which can leave the data inconsistent</li></ul><p><strong>With <code>ts &gt; flushTs</code>:</strong></p><ul><li>At T4, <code>T1 &gt; T1</code> is false, so nothing is reset</li><li>At T5, <code>T2 &gt; T1</code> is true; by then the SyncTask has finished, so the reset is safe</li></ul><h4 id="2-边界情况处理"><a href="#2-边界情况处理" class="headerlink" title="2. 边界情况处理"></a>2. Handling the boundary case</h4><p><strong>Scenario: the checkpoint timestamp equals flushTs</strong></p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">flushTs = 1000</span><br><span class="line">checkpoint = 1000  // exactly equal</span><br></pre></td></tr></table></figure><p><strong>Problem:</strong> data may still be syncing at this point, so the flush cannot be assumed complete</p><p><strong>Solution:</strong> wait until the checkpoint moves past flushTs, which confirms that the flush has completed</p><h4 id="3-数据一致性保证"><a href="#3-数据一致性保证" class="headerlink" title="3. 数据一致性保证"></a>3. 
Data consistency guarantee</h4><p><strong>Benefits of the strict comparison:</strong></p><ul><li>Ensures the flush has fully completed</li><li>Avoids resetting flushTs too early</li><li>Preserves data consistency</li></ul><p><strong>Comparison:</strong></p><ul><li><code>GetFlushTsPolicy</code> uses <code>ts &gt;= flushTs</code>: triggers the flush (boundary included)</li><li><code>NotifyCheckpointUpdated</code> uses <code>ts &gt; flushTs</code>: confirms completion (boundary excluded)</li></ul><h2 id="Checkpoint-失败处理"><a href="#Checkpoint-失败处理" class="headerlink" title="Checkpoint 失败处理"></a>Handling Checkpoint failures</h2><h3 id="失败场景"><a href="#失败场景" class="headerlink" title="失败场景"></a>Failure scenarios</h3><ol><li><strong>Network failure</strong>: the RPC call fails</li><li><strong>DataCoord unavailable</strong>: the service is temporarily unavailable</li><li><strong>Timeout</strong>: the RPC times out</li></ol><h3 id="处理机制"><a href="#处理机制" class="headerlink" title="处理机制"></a>Handling mechanism</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">err := ccu.broker.UpdateChannelCheckpoint(ctx, channelCPs)</span><br><span class="line"><span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">    log.Warn(<span class="string">&quot;update channel checkpoint failed&quot;</span>, zap.Error(err))</span><br><span class="line">    <span class="keyword">return</span>  <span class="comment">// keep the task and wait for a retry</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Characteristics:</strong></p><ul><li>The task is not deleted on failure</li><li>The task stays in <code>ccu.tasks</code></li><li>It is retried automatically the next time <code>execute()</code> runs or <code>notifyChan</code> fires</li></ul><h3 id="任务合并"><a href="#任务合并" class="headerlink" title="任务合并"></a>Task merging</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span 
class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Merge logic: take the larger timestamp and keep the flush flag</span></span><br><span class="line"><span class="keyword">if</span> task.pos.GetTimestamp() &lt; channelPos.GetTimestamp() || (flush &amp;&amp; !task.flush) &#123;</span><br><span class="line">    ccu.tasks[channel] = &amp;channelCPUpdateTask&#123;</span><br><span class="line">        pos:      max(channelPos, task.pos),</span><br><span class="line">        callback: callback,</span><br><span class="line">        flush:    flush || task.flush,</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Advantages:</strong></p><ul><li>Avoids duplicate updates</li><li>Ensures the latest checkpoint is used</li><li>Keeps the flush-trigger flag</li></ul><h2 id="数据一致性保证"><a href="#数据一致性保证" class="headerlink" title="数据一致性保证"></a>Data consistency guarantees</h2><h3 id="1-刷新完成确认"><a href="#1-刷新完成确认" class="headerlink" title="1. 刷新完成确认"></a>1. Confirming flush completion</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">FlushChannel(flushTs=T1)</span><br><span class="line">    ↓</span><br><span class="line">SyncTask runs (asynchronously)</span><br><span class="line">    ↓</span><br><span class="line">Checkpoint is updated to T2 (T2 &gt; T1)</span><br><span class="line">    ↓</span><br><span class="line">NotifyCheckpointUpdated(T2)</span><br><span class="line">    ↓</span><br><span class="line">Reset flushTs (flush confirmed complete)</span><br></pre></td></tr></table></figure><h3 id="2-故障恢复"><a href="#2-故障恢复" class="headerlink" title="2. 故障恢复"></a>2. 
Failure recovery</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">Node restarts</span><br><span class="line">    ↓</span><br><span class="line">Fetch the checkpoint from DataCoord</span><br><span class="line">    ↓</span><br><span class="line">Resume data consumption from the checkpoint position</span><br><span class="line">    ↓</span><br><span class="line">Data is neither lost nor duplicated</span><br></pre></td></tr></table></figure><h3 id="3-检查点推进规则"><a href="#3-检查点推进规则" class="headerlink" title="3. 检查点推进规则"></a>3. Checkpoint advancement rules</h3><ul><li><strong>Unsynced data exists</strong>: the checkpoint must not move past the start position of the earliest unsynced data</li><li><strong>No unsynced data</strong>: the checkpoint advances to the latest position</li></ul><h2 id="监控与调试"><a href="#监控与调试" class="headerlink" title="监控与调试"></a>Monitoring and debugging</h2><h3 id="关键指标"><a href="#关键指标" class="headerlink" title="关键指标"></a>Key metrics</h3><ul><li><code>DataNodeFlushBufferCount</code>: number of flushes</li><li><code>DataNodeFlushedSize</code>: size of flushed data</li><li><code>DataNodeSave2StorageLatency</code>: storage latency</li><li><code>UpdateChannelCheckpoint</code>: number of checkpoint updates</li></ul><h3 id="日志关键点"><a href="#日志关键点" class="headerlink" title="日志关键点"></a>Key log messages</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Flush triggered</span></span><br><span class="line">log.Info(<span class="string">&quot;receive manual flush message&quot;</span>, zap.Uint64(<span class="string">&quot;flushTs&quot;</span>, flushTs))</span><br><span class="line"></span><br><span class="line"><span class="comment">// Checkpoint update</span></span><br><span 
class="line">log.Info(<span class="string">&quot;reset channel flushTs&quot;</span>, zap.String(<span class="string">&quot;channel&quot;</span>, channel))</span><br><span class="line"></span><br><span class="line"><span class="comment">// Checkpoint update succeeded</span></span><br><span class="line">log.Debug(<span class="string">&quot;UpdateChannelCheckpoint success&quot;</span>, zap.Uint64(<span class="string">&quot;cpTs&quot;</span>, ts))</span><br></pre></td></tr></table></figure><h2 id="最佳实践"><a href="#最佳实践" class="headerlink" title="最佳实践"></a>Best practices</h2><h3 id="1-Flush-时机"><a href="#1-Flush-时机" class="headerlink" title="1. Flush 时机"></a>1. When to flush</h3><ul><li><strong>Manual flush</strong>: after a data import completes</li><li><strong>Automatic flush</strong>: driven by memory thresholds and time-based policies</li></ul><h3 id="2-Checkpoint-更新频率"><a href="#2-Checkpoint-更新频率" class="headerlink" title="2. Checkpoint 更新频率"></a>2. Checkpoint update frequency</h3><ul><li><strong>Periodic updates</strong>: avoid overly frequent RPCs</li><li><strong>Flush-triggered updates</strong>: update immediately to speed up confirmation that the flush has completed</li></ul><h3 id="3-故障处理"><a href="#3-故障处理" class="headerlink" title="3. 故障处理"></a>3. Failure handling</h3><ul><li><strong>Retry mechanism</strong>: failed checkpoint updates are retried automatically</li><li><strong>Task merging</strong>: avoids duplicate updates</li></ul><h2 id="相关文档"><a href="#相关文档" class="headerlink" title="相关文档"></a>Related documents</h2><ul><li><a href="./flush_pipeline_write_buffer_manager.md">WriteBuffer manager</a></li><li><a href="./flush_pipeline_sync_manager.md">SyncManager in detail</a></li><li><a href="./flush_pipeline_tt_node.md">TT Node in detail</a></li><li><a href="./flush_pipeline_overview.md">Overview document</a></li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;概述&quot;&gt;&lt;/a&gt;Overview&lt;/h2&gt;&lt;p&gt;Flush and Checkpoint are the two core mechanisms in the Milvus DataNode that ensure data reliability and consistency. They work together</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Milvus WriteBuffer Implementation in Detail</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/7_flush_pipeline_write_buffer/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/7_flush_pipeline_write_buffer/</id>
    <published>2025-08-09T08:00:00.000Z</published>
    <updated>2026-01-06T13:10:45.056Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p><code>WriteBuffer</code> is the core component in the Milvus DataNode that manages data buffering for a single channel. It receives and buffers insert&#x2F;delete data, triggers data sync according to its sync policies, and manages the segment lifecycle.</p><h2 id="接口定义"><a href="#接口定义" class="headerlink" title="接口定义"></a>Interface definition</h2><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> WriteBuffer <span class="keyword">interface</span> &#123;</span><br><span class="line">    HasSegment(segmentID <span class="type">int64</span>) <span class="type">bool</span></span><br><span class="line">    CreateNewGrowingSegment(partitionID <span class="type">int64</span>, segmentID <span class="type">int64</span>, startPos *msgpb.MsgPosition)</span><br><span class="line">    BufferData(insertMsgs []*InsertData, deleteMsgs []*msgstream.DeleteMsg, startPos, endPos *msgpb.MsgPosition) <span class="type">error</span></span><br><span class="line">    SetFlushTimestamp(flushTs <span class="type">uint64</span>)</span><br><span class="line">    GetFlushTimestamp() <span class="type">uint64</span></span><br><span class="line">    SealSegments(ctx context.Context, segmentIDs []<span class="type">int64</span>) <span class="type">error</span></span><br><span class="line">    DropPartitions(partitionIDs []<span class="type">int64</span>)</span><br><span class="line">    GetCheckpoint() *msgpb.MsgPosition</span><br><span class="line">    MemorySize() <span class="type">int64</span></span><br><span class="line">    EvictBuffer(policies 
...SyncPolicy)</span><br><span class="line">    Close(ctx context.Context, drop <span class="type">bool</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="基础实现（writeBufferBase）"><a href="#基础实现（writeBufferBase）" class="headerlink" title="基础实现（writeBufferBase）"></a>Base implementation (writeBufferBase)</h2><h3 id="核心结构"><a href="#核心结构" class="headerlink" title="核心结构"></a>Core structure</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> writeBufferBase <span class="keyword">struct</span> &#123;</span><br><span class="line">    channelName <span class="type">string</span></span><br><span class="line">    collectionID <span class="type">int64</span></span><br><span class="line">    </span><br><span class="line">    mut sync.RWMutex</span><br><span class="line">    </span><br><span class="line">    segmentBuffers <span class="keyword">map</span>[<span class="type">int64</span>]*segmentBuffer</span><br><span class="line">    checkpoint     *msgpb.MsgPosition</span><br><span class="line">    flushTimestamp *atomic.Uint64</span><br><span class="line">    </span><br><span class="line">    syncMgr   syncmgr.SyncManager</span><br><span class="line">    metaCache metacache.MetaCache</span><br><span class="line">    </span><br><span class="line">    syncPolicies []SyncPolicy</span><br><span class="line">    <span class="comment">// ... 
other fields</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="段缓冲（segmentBuffer）"><a href="#段缓冲（segmentBuffer）" class="headerlink" title="段缓冲（segmentBuffer）"></a>Segment buffer (segmentBuffer)</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> segmentBuffer <span class="keyword">struct</span> &#123;</span><br><span class="line">    segmentID <span class="type">int64</span></span><br><span class="line">    </span><br><span class="line">    insertBuffer *InsertBuffer</span><br><span class="line">    deltaBuffer  *DeltaBuffer</span><br><span class="line">    </span><br><span class="line">    startPosition *msgpb.MsgPosition</span><br><span class="line">    checkpoint    *msgpb.MsgPosition</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Responsibilities:</strong></p><ul><li><code>insertBuffer</code>: stores insert data</li><li><code>deltaBuffer</code>: stores delete data</li><li>Tracks the segment's start position and checkpoint</li></ul><h2 id="L0-WriteBuffer-特殊实现"><a href="#L0-WriteBuffer-特殊实现" class="headerlink" title="L0 WriteBuffer 特殊实现"></a>The L0 WriteBuffer specialization</h2><h3 id="为什么需要-L0-WriteBuffer？"><a href="#为什么需要-L0-WriteBuffer？" class="headerlink" title="为什么需要 L0 WriteBuffer？"></a>Why is an L0 WriteBuffer needed?</h3><p>In streaming service mode, flushed segments no longer maintain a Bloom Filter. To handle deletes efficiently, Milvus introduced the L0 segment (Level-0 Segment), which is dedicated to storing delete data.</p><h3 id="核心结构-1"><a href="#核心结构-1" class="headerlink" title="核心结构"></a>Core structure</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> l0WriteBuffer <span class="keyword">struct</span> &#123;</span><br><span class="line">    *writeBufferBase</span><br><span class="line">    </span><br><span class="line">    l0Segments  <span class="keyword">map</span>[<span class="type">int64</span>]<span class="type">int64</span>  <span class="comment">// partitionID =&gt; l0 segment ID</span></span><br><span class="line">    l0partition <span class="keyword">map</span>[<span class="type">int64</span>]<span class="type">int64</span>  <span class="comment">// l0 segment id =&gt; partition id</span></span><br><span class="line">    </span><br><span class="line">    syncMgr     syncmgr.SyncManager</span><br><span class="line">    idAllocator allocator.Interface</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="关键差异"><a href="#关键差异" class="headerlink" title="关键差异"></a>Key differences</h3><h4 id="1-删除消息分发（不使用-Bloom-Filter）"><a href="#1-删除消息分发（不使用-Bloom-Filter）" class="headerlink" title="1. 删除消息分发（不使用 Bloom Filter）"></a>1. 
Delete message dispatch (without a Bloom Filter)</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(wb *l0WriteBuffer)</span></span> dispatchDeleteMsgsWithoutFilter(deleteMsgs []*msgstream.DeleteMsg, </span><br><span class="line">    startPos, endPos *msgpb.MsgPosition) &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> _, msg := <span class="keyword">range</span> deleteMsgs &#123;</span><br><span class="line">        <span class="comment">// Get or create the L0 segment ID</span></span><br><span class="line">        l0SegmentID := wb.getL0SegmentID(msg.GetPartitionID(), startPos)</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Parse the primary keys and timestamps</span></span><br><span class="line">        pks := storage.ParseIDs2PrimaryKeys(msg.GetPrimaryKeys())</span><br><span class="line">        pkTss := msg.GetTimestamps()</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">if</span> <span class="built_in">len</span>(pks) &gt; <span class="number">0</span> &#123;</span><br><span class="line">            <span class="comment">// Buffer the delete data directly, with no Bloom Filter filtering</span></span><br><span class="line">            wb.bufferDelete(l0SegmentID, pks, pkTss, startPos, endPos)</span><br><span class="line">        
&#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Characteristics:</strong></p><ul><li>No Bloom Filter filtering is performed</li><li>Every delete message is written to an L0 segment</li><li>The delete-handling logic is simpler as a result</li></ul><h4 id="2-L0-段管理"><a href="#2-L0-段管理" class="headerlink" title="2. L0 段管理"></a>2. L0 segment management</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(wb *l0WriteBuffer)</span></span> getL0SegmentID(partitionID <span class="type">int64</span>, startPos *msgpb.MsgPosition) <span class="type">int64</span> &#123;</span><br><span class="line">    segmentID, ok := wb.l0Segments[partitionID]</span><br><span class="line">    <span class="keyword">if</span> !ok &#123;</span><br><span class="line">        <span class="comment">// Allocate a new L0 segment ID</span></span><br><span class="line">        err := retry.Do(context.Background(), <span class="function"><span class="keyword">func</span><span class="params">()</span></span> <span class="type">error</span> &#123;</span><br><span class="line">            <span 
class="keyword">var</span> err <span class="type">error</span></span><br><span class="line">            segmentID, err = wb.idAllocator.AllocOne()</span><br><span class="line">            <span class="keyword">return</span> err</span><br><span class="line">        &#125;)</span><br><span class="line">        </span><br><span class="line">        wb.l0Segments[partitionID] = segmentID</span><br><span class="line">        wb.l0partition[segmentID] = partitionID</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Add the segment to MetaCache, marked as an L0 Growing segment</span></span><br><span class="line">        wb.metaCache.AddSegment(&amp;datapb.SegmentInfo&#123;</span><br><span class="line">            ID:            segmentID,</span><br><span class="line">            PartitionID:   partitionID,</span><br><span class="line">            CollectionID:  wb.collectionID,</span><br><span class="line">            InsertChannel: wb.channelName,</span><br><span class="line">            StartPosition: startPos,</span><br><span class="line">            State:         commonpb.SegmentState_Growing,</span><br><span class="line">            Level:         datapb.SegmentLevel_L0,  <span class="comment">// L0 level</span></span><br><span class="line">        &#125;, ...)</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> segmentID</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Characteristics:</strong></p><ul><li>Each partition maps to exactly one L0 segment</li><li>An L0 segment is always marked as Growing</li><li>An L0 segment always needs to be flushed (isFlush &#x3D; true)</li></ul><h2 id="数据缓冲流程"><a href="#数据缓冲流程" class="headerlink" title="Data Buffering Flow"></a>Data Buffering Flow</h2><h3 id="BufferData-方法"><a href="#BufferData-方法" class="headerlink" title="The BufferData Method"></a>The BufferData Method</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(wb *l0WriteBuffer)</span></span> BufferData(insertData []*InsertData, deleteMsgs []*msgstream.DeleteMsg, </span><br><span class="line">    startPos, endPos *msgpb.MsgPosition) <span class="type">error</span> &#123;</span><br><span class="line">    </span><br><span class="line">    wb.mut.Lock()</span><br><span class="line">    <span class="keyword">defer</span> wb.mut.Unlock()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. 
Buffer the insert data</span></span><br><span class="line">    <span class="keyword">for</span> _, inData := <span class="keyword">range</span> insertData &#123;</span><br><span class="line">        err := wb.bufferInsert(inData, startPos, endPos)</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> err</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Dispatch delete messages (without Bloom Filter)</span></span><br><span class="line">    wb.dispatchDeleteMsgsWithoutFilter(deleteMsgs, startPos, endPos)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Update the checkpoint</span></span><br><span class="line">    wb.checkpoint = endPos</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Trigger sync</span></span><br><span class="line">    segmentsSync := wb.triggerSync()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. 
Clean up the mappings of L0 segments that were synced</span></span><br><span class="line">    <span class="keyword">for</span> _, segment := <span class="keyword">range</span> segmentsSync &#123;</span><br><span class="line">        partition, ok := wb.l0partition[segment]</span><br><span class="line">        <span class="keyword">if</span> ok &#123;</span><br><span class="line">            <span class="built_in">delete</span>(wb.l0partition, segment)</span><br><span class="line">            <span class="built_in">delete</span>(wb.l0Segments, partition)</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="同步策略（SyncPolicy）"><a href="#同步策略（SyncPolicy）" class="headerlink" title="Sync Policy (SyncPolicy)"></a>Sync Policy (SyncPolicy)</h2><h3 id="策略接口"><a href="#策略接口" class="headerlink" title="Policy Interface"></a>Policy Interface</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> SyncPolicy <span class="keyword">interface</span> &#123;</span><br><span class="line">    SelectSegments(buffers []*segmentBuffer, ts typeutil.Timestamp) []<span class="type">int64</span></span><br><span class="line">    Reason() <span class="type">string</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="常用策略"><a href="#常用策略" class="headerlink" title="Common Policies"></a>Common Policies</h3><h4 id="1-满缓冲区策略"><a href="#1-满缓冲区策略" class="headerlink" title="1. Full Buffer Policy"></a>1. 
Full Buffer Policy</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">GetFullBufferPolicy</span><span class="params">()</span></span> SyncPolicy &#123;</span><br><span class="line">    <span class="keyword">return</span> wrapSelectSegmentFuncPolicy(<span class="function"><span class="keyword">func</span><span class="params">(buffers []*segmentBuffer, ts typeutil.Timestamp)</span></span> []<span class="type">int64</span> &#123;</span><br><span class="line">        <span class="keyword">var</span> result []<span class="type">int64</span></span><br><span class="line">        <span class="keyword">for</span> _, buf := <span class="keyword">range</span> buffers &#123;</span><br><span class="line">            <span class="keyword">if</span> buf.insertBuffer.IsFull() &#123;</span><br><span class="line">                result = <span class="built_in">append</span>(result, buf.segmentID)</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> result</span><br><span class="line">    &#125;, <span class="string">&quot;full buffer&quot;</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="2-刷新时间戳策略"><a href="#2-刷新时间戳策略" class="headerlink" title="2. Flush Timestamp Policy"></a>2. 
Flush Timestamp Policy</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">GetFlushTsPolicy</span><span class="params">(flushTimestamp *atomic.Uint64, meta metacache.MetaCache)</span></span> SyncPolicy &#123;</span><br><span class="line">    <span class="keyword">return</span> wrapSelectSegmentFuncPolicy(<span class="function"><span class="keyword">func</span><span class="params">(buffers []*segmentBuffer, ts typeutil.Timestamp)</span></span> []<span class="type">int64</span> &#123;</span><br><span class="line">        flushTs := flushTimestamp.Load()</span><br><span class="line">        <span class="keyword">if</span> flushTs != nonFlushTS &amp;&amp; ts &gt;= flushTs &#123;</span><br><span class="line">            <span class="keyword">var</span> result []<span class="type">int64</span></span><br><span class="line">            <span class="keyword">for</span> _, buf := <span class="keyword">range</span> buffers &#123;</span><br><span class="line">                <span class="keyword">if</span> buf.insertBuffer.MinTimestamp() &lt; flushTs &#123;</span><br><span class="line">                    result = <span class="built_in">append</span>(result, buf.segmentID)</span><br><span class="line">                &#125;</span><br><span class="line">            &#125;</span><br><span class="line">            <span class="keyword">return</span> result</span><br><span class="line">      
  &#125;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">    &#125;, <span class="string">&quot;flush ts&quot;</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Trigger condition:</strong> a flush timestamp has been set (flushTs is not nonFlushTS) and the current timestamp &gt;&#x3D; flushTs. Only segments whose insert buffer has a minimum timestamp earlier than flushTs are selected.</p><h4 id="3-最老缓冲区策略"><a href="#3-最老缓冲区策略" class="headerlink" title="3. Oldest Buffer Policy"></a>3. Oldest Buffer Policy</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">GetOldestBufferPolicy</span><span class="params">(maxNum <span class="type">int</span>)</span></span> SyncPolicy &#123;</span><br><span class="line">    <span class="keyword">return</span> wrapSelectSegmentFuncPolicy(<span class="function"><span class="keyword">func</span><span class="params">(buffers []*segmentBuffer, ts typeutil.Timestamp)</span></span> []<span class="type">int64</span> &#123;</span><br><span class="line">        <span class="comment">// Sort by timestamp and select the oldest N buffers</span></span><br><span class="line">        <span class="comment">// ...</span></span><br><span class="line">    &#125;, <span class="string">&quot;oldest buffer&quot;</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="同步触发机制"><a href="#同步触发机制" class="headerlink" title="Sync Trigger Mechanism"></a>Sync Trigger Mechanism</h2><h3 id="EvictBuffer-方法"><a href="#EvictBuffer-方法" class="headerlink" title="The EvictBuffer Method"></a>The EvictBuffer Method</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(wb *writeBufferBase)</span></span> EvictBuffer(policies ...SyncPolicy) &#123;</span><br><span class="line">    wb.mut.RLock()</span><br><span class="line">    buffers := <span class="built_in">make</span>([]*segmentBuffer, <span class="number">0</span>, <span class="built_in">len</span>(wb.segmentBuffers))</span><br><span class="line">    <span class="keyword">for</span> _, buf := <span class="keyword">range</span> wb.segmentBuffers &#123;</span><br><span class="line">        buffers = <span class="built_in">append</span>(buffers, buf)</span><br><span class="line">    &#125;</span><br><span class="line">    currentTs := wb.checkpoint.GetTimestamp()</span><br><span class="line">    wb.mut.RUnlock()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Combine the built-in policies with the ones passed in</span></span><br><span class="line">    allPolicies := <span class="built_in">append</span>(wb.syncPolicies, policies...)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Collect the segments that need to be synced</span></span><br><span class="line">    segmentIDs := typeutil.NewUniqueSet()</span><br><span class="line">    <span class="keyword">for</span> _, policy := <span 
class="keyword">range</span> allPolicies &#123;</span><br><span class="line">        selected := policy.SelectSegments(buffers, currentTs)</span><br><span class="line">        <span class="keyword">for</span> _, id := <span class="keyword">range</span> selected &#123;</span><br><span class="line">            segmentIDs.Insert(id)</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Run the sync</span></span><br><span class="line">    <span class="keyword">if</span> segmentIDs.Len() &gt; <span class="number">0</span> &#123;</span><br><span class="line">        wb.syncSegments(context.Background(), segmentIDs.Collect())</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="syncSegments-方法"><a href="#syncSegments-方法" class="headerlink" title="The syncSegments Method"></a>The syncSegments Method</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(wb *writeBufferBase)</span></span> syncSegments(ctx 
context.Context, segmentIDs []<span class="type">int64</span>) []*conc.Future[<span class="keyword">struct</span>&#123;&#125;] &#123;</span><br><span class="line">    <span class="keyword">var</span> futures []*conc.Future[<span class="keyword">struct</span>&#123;&#125;]</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> _, segmentID := <span class="keyword">range</span> segmentIDs &#123;</span><br><span class="line">        <span class="comment">// Create the sync task</span></span><br><span class="line">        task, err := wb.getSyncTask(ctx, segmentID)</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">continue</span></span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Submit the task to the SyncManager</span></span><br><span class="line">        future, err := wb.syncMgr.SyncData(ctx, task, <span class="function"><span class="keyword">func</span><span class="params">(err <span class="type">error</span>)</span></span> <span class="type">error</span> &#123;</span><br><span class="line">            <span class="keyword">if</span> err == <span class="literal">nil</span> &amp;&amp; task.IsFlush() &#123;</span><br><span class="line">                <span class="comment">// After a successful flush, remove the segment from MetaCache</span></span><br><span class="line">                wb.metaCache.RemoveSegments(metacache.WithSegmentIDs(segmentID))</span><br><span class="line">            &#125;</span><br><span class="line">            <span class="keyword">return</span> err</span><br><span class="line">        &#125;)</span><br><span class="line">        </span><br><span class="line">        futures = <span class="built_in">append</span>(futures, future)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span 
class="keyword">return</span> futures</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="检查点管理"><a href="#检查点管理" class="headerlink" title="Checkpoint Management"></a>Checkpoint Management</h2><h3 id="GetCheckpoint"><a href="#GetCheckpoint" class="headerlink" title="GetCheckpoint"></a>GetCheckpoint</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(wb *writeBufferBase)</span></span> GetCheckpoint() *msgpb.MsgPosition &#123;</span><br><span class="line">    wb.mut.RLock()</span><br><span class="line">    <span class="keyword">defer</span> wb.mut.RUnlock()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Among the non-empty buffers, find the earliest start position</span></span><br><span class="line">    <span class="keyword">var</span> earliest *checkpointCandidate</span><br><span class="line">    <span class="keyword">for</span> _, buf := <span class="keyword">range</span> wb.segmentBuffers &#123;</span><br><span class="line">        <span class="keyword">if</span> buf.insertBuffer.rows &gt; <span class="number">0</span> || buf.deltaBuffer.Size() &gt; <span 
class="number">0</span> &#123;</span><br><span class="line">            candidate := &amp;checkpointCandidate&#123;</span><br><span class="line">                segmentID: buf.segmentID,</span><br><span class="line">                position:  buf.startPosition,</span><br><span class="line">            &#125;</span><br><span class="line">            <span class="keyword">if</span> earliest == <span class="literal">nil</span> || candidate.position.GetTimestamp() &lt; earliest.position.GetTimestamp() &#123;</span><br><span class="line">                earliest = candidate</span><br><span class="line">            &#125;</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Return the earliest position if one was found; otherwise return the latest checkpoint</span></span><br><span class="line">    <span class="keyword">if</span> earliest != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> earliest.position</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> wb.checkpoint</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="段生命周期管理"><a href="#段生命周期管理" class="headerlink" title="Segment Lifecycle Management"></a>Segment Lifecycle Management</h2><h3 id="SealSegments"><a href="#SealSegments" class="headerlink" title="SealSegments"></a>SealSegments</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span 
class="line"><span class="function"><span class="keyword">func</span> <span class="params">(wb *writeBufferBase)</span></span> SealSegments(ctx context.Context, segmentIDs []<span class="type">int64</span>) <span class="type">error</span> &#123;</span><br><span class="line">    wb.mut.Lock()</span><br><span class="line">    <span class="keyword">defer</span> wb.mut.Unlock()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Update MetaCache: Growing -&gt; Sealed</span></span><br><span class="line">    wb.metaCache.UpdateSegments(</span><br><span class="line">        metacache.UpdateState(commonpb.SegmentState_Sealed),</span><br><span class="line">        metacache.WithSegmentIDs(segmentIDs...),</span><br><span class="line">    )</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Trigger sync</span></span><br><span class="line">    wb.syncSegments(ctx, segmentIDs)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="默认实现选择"><a href="#默认实现选择" class="headerlink" title="Default Implementation Choice"></a>Default Implementation Choice</h2><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewWriteBuffer</span><span class="params">(channel <span class="type">string</span>, metacache metacache.MetaCache, </span></span></span><br><span class="line"><span class="params"><span class="function">    syncMgr syncmgr.SyncManager, opts ...WriteBufferOption)</span></span> (WriteBuffer, <span class="type">error</span>) &#123;</span><br><span class="line">    
</span><br><span class="line">    <span class="comment">// L0WriteBuffer is used by default</span></span><br><span class="line">    <span class="keyword">return</span> NewL0WriteBuffer(channel, metacache, syncMgr, option)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Reasons:</strong></p><ul><li>L0WriteBuffer fits the streaming service mode</li><li>It simplifies the delete-handling logic</li><li>It improves the performance of delete operations</li></ul><h2 id="设计特点"><a href="#设计特点" class="headerlink" title="Design Highlights"></a>Design Highlights</h2><h3 id="1-分层设计"><a href="#1-分层设计" class="headerlink" title="1. Layered Design"></a>1. Layered Design</h3><ul><li>The <code>WriteBuffer</code> interface: defines the abstraction</li><li><code>writeBufferBase</code>: the base implementation</li><li><code>l0WriteBuffer</code>: the specialized implementation</li></ul><h3 id="2-策略模式"><a href="#2-策略模式" class="headerlink" title="2. Strategy Pattern"></a>2. Strategy Pattern</h3><p>Sync behavior is made pluggable through <code>SyncPolicy</code>.</p><h3 id="3-线程安全"><a href="#3-线程安全" class="headerlink" title="3. Thread Safety"></a>3. Thread Safety</h3><p>Shared state is protected by an <code>RWMutex</code>.</p><h3 id="4-异步同步"><a href="#4-异步同步" class="headerlink" title="4. Asynchronous Sync"></a>4. Asynchronous Sync</h3><p>Data is synced asynchronously through the <code>SyncManager</code>.</p><h2 id="相关文档"><a href="#相关文档" class="headerlink" title="Related Documents"></a>Related Documents</h2><ul><li><a href="./flush_pipeline_write_buffer_manager.md">WriteBuffer Manager</a></li><li><a href="./flush_pipeline_sync_manager.md">SyncManager in Detail</a></li><li><a href="./flush_pipeline_overview.md">Overview</a></li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;Overview&quot;&gt;&lt;/a&gt;Overview&lt;/h2&gt;&lt;p&gt;&lt;code&gt;WriteBuffer&lt;/code&gt; is the core component in the Milvus DataNode that manages data buffering for a single channel. It receives and buffers insert</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Milvus WriteBuffer Manager (bufferManager) Explained</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/6_flush_pipeline_write_buffer_manager/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/6_flush_pipeline_write_buffer_manager/</id>
    <published>2025-08-09T07:00:00.000Z</published>
    <updated>2026-01-06T13:10:44.620Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="Overview"></a>Overview</h2><p><code>bufferManager</code> is the core component in the Milvus DataNode that manages multiple <code>WriteBuffer</code> instances. It organizes and manages data buffers per channel and implements memory monitoring, automatic eviction, and flush policies.</p><h2 id="架构设计"><a href="#架构设计" class="headerlink" title="Architecture"></a>Architecture</h2><h3 id="核心结构"><a href="#核心结构" class="headerlink" title="Core Structure"></a>Core Structure</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> bufferManager <span class="keyword">struct</span> &#123;</span><br><span class="line">    syncMgr syncmgr.SyncManager</span><br><span class="line">    buffers *typeutil.ConcurrentMap[<span class="type">string</span>, WriteBuffer]</span><br><span class="line">    </span><br><span class="line">    wg sync.WaitGroup</span><br><span class="line">    ch lifetime.SafeChan</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key fields:</strong></p><ul><li><code>syncMgr</code>: the sync manager, used to submit sync tasks</li><li><code>buffers</code>: a thread-safe map whose key is the channel name and whose value is the corresponding WriteBuffer</li><li><code>wg</code>: a WaitGroup, used to wait for the background goroutine to exit</li><li><code>ch</code>: a safe channel, used to start and stop the background check</li></ul><h3 id="接口定义"><a href="#接口定义" class="headerlink" title="Interface Definition"></a>Interface Definition</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 
class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> BufferManager <span class="keyword">interface</span> &#123;</span><br><span class="line">    Register(channel <span class="type">string</span>, metacache metacache.MetaCache, opts ...WriteBufferOption) <span class="type">error</span></span><br><span class="line">    CreateNewGrowingSegment(ctx context.Context, channel <span class="type">string</span>, partition <span class="type">int64</span>, segmentID <span class="type">int64</span>) <span class="type">error</span></span><br><span class="line">    SealSegments(ctx context.Context, channel <span class="type">string</span>, segmentIDs []<span class="type">int64</span>) <span class="type">error</span></span><br><span class="line">    FlushChannel(ctx context.Context, channel <span class="type">string</span>, flushTs <span class="type">uint64</span>) <span class="type">error</span></span><br><span class="line">    RemoveChannel(channel <span class="type">string</span>)</span><br><span class="line">    DropChannel(channel <span class="type">string</span>)</span><br><span class="line">    DropPartitions(channel <span class="type">string</span>, partitionIDs []<span class="type">int64</span>)</span><br><span class="line">    BufferData(channel <span class="type">string</span>, insertData []*InsertData, deleteMsgs []*msgstream.DeleteMsg, startPos, endPos *msgpb.MsgPosition) <span class="type">error</span></span><br><span class="line">    GetCheckpoint(channel <span class="type">string</span>) (*msgpb.MsgPosition, <span class="type">bool</span>, <span class="type">error</span>)</span><br><span class="line">    NotifyCheckpointUpdated(channel <span class="type">string</span>, ts <span class="type">uint64</span>)</span><br><span class="line">    </span><br><span class="line">    Start()</span><br><span class="line">    Stop()</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="核心功能"><a href="#核心功能" class="headerlink" title="Core Functionality"></a>Core Functionality</h2><h3 id="1-通道注册"><a href="#1-通道注册" class="headerlink" title="1. Channel Registration"></a>1. Channel Registration</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> Register(channel <span class="type">string</span>, metacache metacache.MetaCache, opts ...WriteBufferOption) <span class="type">error</span> &#123;</span><br><span class="line">    buf, err := NewWriteBuffer(channel, metacache, m.syncMgr, opts...)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    _, loaded := m.buffers.GetOrInsert(channel, buf)</span><br><span class="line">    <span class="keyword">if</span> loaded &#123;</span><br><span class="line">        buf.Close(context.Background(), <span class="literal">false</span>)</span><br><span class="line">        <span class="keyword">return</span> merr.WrapErrChannelReduplicate(channel)</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>What it does:</strong></p><ul><li>Creates a new WriteBuffer instance for the channel</li><li>Registers it with the manager</li><li>Rejects duplicate registration: if the channel already has a buffer, the new one is closed and a reduplicate-channel error is returned</li></ul><h3 id="2-数据缓冲"><a href="#2-数据缓冲" class="headerlink" title="2. 数据缓冲"></a>2. Data Buffering</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> BufferData(channel <span class="type">string</span>, insertData []*InsertData, </span><br><span class="line">    deleteMsgs []*msgstream.DeleteMsg, startPos, endPos *msgpb.MsgPosition) <span class="type">error</span> &#123;</span><br><span class="line">    </span><br><span class="line">    buf, loaded := m.buffers.Get(channel)</span><br><span class="line">    <span class="keyword">if</span> !loaded &#123;</span><br><span class="line">        <span class="keyword">return</span> merr.WrapErrChannelNotFound(channel)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> buf.BufferData(insertData, deleteMsgs, startPos, endPos)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="3-段管理"><a href="#3-段管理" class="headerlink" title="3. 段管理"></a>3. 
Segment Management</h3><h4 id="创建-Growing-段"><a href="#创建-Growing-段" class="headerlink" title="创建 Growing 段"></a>Creating a Growing Segment</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> CreateNewGrowingSegment(ctx context.Context, channel <span class="type">string</span>, </span><br><span class="line">    partitionID <span class="type">int64</span>, segmentID <span class="type">int64</span>) <span class="type">error</span> &#123;</span><br><span class="line">    </span><br><span class="line">    buf, loaded := m.buffers.Get(channel)</span><br><span class="line">    <span class="keyword">if</span> !loaded &#123;</span><br><span class="line">        <span class="keyword">return</span> merr.WrapErrChannelNotFound(channel)</span><br><span class="line">    &#125;</span><br><span class="line">    buf.CreateNewGrowingSegment(partitionID, segmentID, <span class="literal">nil</span>)</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="封装段"><a href="#封装段" class="headerlink" title="封装段"></a>Sealing Segments</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span 
class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> SealSegments(ctx context.Context, channel <span class="type">string</span>, segmentIDs []<span class="type">int64</span>) <span class="type">error</span> &#123;</span><br><span class="line">    buf, loaded := m.buffers.Get(channel)</span><br><span class="line">    <span class="keyword">if</span> !loaded &#123;</span><br><span class="line">        <span class="keyword">return</span> merr.WrapErrChannelNotFound(channel)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> buf.SealSegments(ctx, segmentIDs)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>What it does:</strong></p><ul><li>Changes the segment state from Growing to Sealed</li><li>Triggers a sync of the segment</li></ul><h3 id="4-刷新通道"><a href="#4-刷新通道" class="headerlink" title="4. 刷新通道"></a>4. Flushing a Channel</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> FlushChannel(ctx context.Context, channel <span class="type">string</span>, flushTs <span class="type">uint64</span>) <span class="type">error</span> &#123;</span><br><span class="line">    buf, loaded := m.buffers.Get(channel)</span><br><span class="line">    <span class="keyword">if</span> !loaded &#123;</span><br><span class="line">        <span class="keyword">return</span> merr.WrapErrChannelNotFound(channel)</span><br><span class="line">    &#125;</span><br><span class="line">    buf.SetFlushTimestamp(flushTs)</span><br><span class="line">    <span class="keyword">return</span> 
<span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>What it does:</strong></p><ul><li>Sets the channel's flush timestamp (flushTs)</li><li>Activates the timestamp-based flush policy; the call itself only records flushTs and returns</li></ul><h3 id="5-检查点管理"><a href="#5-检查点管理" class="headerlink" title="5. 检查点管理"></a>5. Checkpoint Management</h3><h4 id="获取检查点"><a href="#获取检查点" class="headerlink" title="获取检查点"></a>Getting the Checkpoint</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> GetCheckpoint(channel <span class="type">string</span>) (*msgpb.MsgPosition, <span class="type">bool</span>, <span class="type">error</span>) &#123;</span><br><span class="line">    buf, loaded := m.buffers.Get(channel)</span><br><span class="line">    <span class="keyword">if</span> !loaded &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, <span class="literal">false</span>, merr.WrapErrChannelNotFound(channel)</span><br><span class="line">    &#125;</span><br><span class="line">    cp := buf.GetCheckpoint()</span><br><span class="line">    flushTs := buf.GetFlushTimestamp()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Return the checkpoint and a flag indicating whether an update is needed</span></span><br><span class="line">    <span class="keyword">return</span> cp, flushTs != nonFlushTS &amp;&amp; cp.GetTimestamp() &gt;= flushTs, <span class="literal">nil</span></span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Return values:</strong></p><ul><li><code>*msgpb.MsgPosition</code>: the channel checkpoint</li><li><code>bool</code>: whether an update is needed (true when a flush timestamp is set and the checkpoint timestamp &gt;&#x3D; flushTs)</li><li><code>error</code>: error information</li></ul><h4 id="通知检查点更新"><a href="#通知检查点更新" class="headerlink" title="通知检查点更新"></a>Notifying a Checkpoint Update</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> NotifyCheckpointUpdated(channel <span class="type">string</span>, ts <span class="type">uint64</span>) &#123;</span><br><span class="line">    buf, loaded := m.buffers.Get(channel)</span><br><span class="line">    <span class="keyword">if</span> !loaded &#123;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    flushTs := buf.GetFlushTimestamp()</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Key condition: ts &gt; flushTs</span></span><br><span class="line">    <span class="keyword">if</span> flushTs != nonFlushTS &amp;&amp; ts &gt; flushTs &#123;</span><br><span class="line">        log.Info(<span class="string">&quot;reset channel flushTs&quot;</span>, zap.String(<span class="string">&quot;channel&quot;</span>, channel))</span><br><span class="line">        buf.SetFlushTimestamp(nonFlushTS)</span><br><span class="line">    &#125;</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key logic:</strong></p><ul><li>When the checkpoint timestamp is <strong>strictly greater than</strong> flushTs, flushTs is reset to nonFlushTS</li><li>This guarantees that the flush has completed, so the reset is safe</li></ul><h2 id="内存管理"><a href="#内存管理" class="headerlink" title="内存管理"></a>Memory Management</h2><h3 id="后台检查机制"><a href="#后台检查机制" class="headerlink" title="后台检查机制"></a>Background Check Mechanism</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> Start() &#123;</span><br><span class="line">    m.wg.Add(<span class="number">1</span>)</span><br><span class="line">    <span class="keyword">go</span> <span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">        <span class="keyword">defer</span> m.wg.Done()</span><br><span class="line">        m.check()</span><br><span class="line">    &#125;()</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> check() &#123;</span><br><span class="line">    timer := 
time.NewTimer(paramtable.Get().DataNodeCfg.MemoryCheckInterval.GetAsDuration(time.Millisecond))</span><br><span class="line">    <span class="keyword">defer</span> timer.Stop()</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> &#123;</span><br><span class="line">        <span class="keyword">select</span> &#123;</span><br><span class="line">        <span class="keyword">case</span> &lt;-timer.C:</span><br><span class="line">            m.memoryCheck()</span><br><span class="line">            timer.Reset(paramtable.Get().DataNodeCfg.MemoryCheckInterval.GetAsDuration(time.Millisecond))</span><br><span class="line">        <span class="keyword">case</span> &lt;-m.ch.CloseCh():</span><br><span class="line">            log.Info(<span class="string">&quot;buffer manager memory check stopped&quot;</span>)</span><br><span class="line">            <span class="keyword">return</span></span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="内存检查逻辑"><a href="#内存检查逻辑" class="headerlink" title="内存检查逻辑"></a>Memory Check Logic</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span 
class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> memoryCheck() &#123;</span><br><span class="line">    <span class="keyword">if</span> !paramtable.Get().DataNodeCfg.MemoryForceSyncEnable.GetAsBool() &#123;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> &#123;</span><br><span class="line">        <span class="keyword">var</span> total <span class="type">int64</span></span><br><span class="line">        <span class="keyword">var</span> candidate WriteBuffer</span><br><span class="line">        <span class="keyword">var</span> candiSize <span class="type">int64</span></span><br><span class="line">        <span class="keyword">var</span> candiChan <span class="type">string</span></span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Walk all buffers, summing their sizes and tracking the largest one</span></span><br><span class="line">        m.buffers.Range(<span class="function"><span class="keyword">func</span><span class="params">(chanName <span class="type">string</span>, buf WriteBuffer)</span></span> <span class="type">bool</span> &#123;</span><br><span class="line">            size := buf.MemorySize()</span><br><span class="line">      
      total += size</span><br><span class="line">            <span class="keyword">if</span> size &gt; candiSize &#123;</span><br><span class="line">                candiSize = size</span><br><span class="line">                candidate = buf</span><br><span class="line">                candiChan = chanName</span><br><span class="line">            &#125;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">        &#125;)</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Check whether the watermark has been reached</span></span><br><span class="line">        totalMemory := hardware.GetMemoryCount()</span><br><span class="line">        memoryWatermark := <span class="type">float64</span>(totalMemory) * paramtable.Get().DataNodeCfg.MemoryForceSyncWatermark.GetAsFloat()</span><br><span class="line">        </span><br><span class="line">        <span class="keyword">if</span> <span class="type">float64</span>(total) &lt; memoryWatermark &#123;</span><br><span class="line">            <span class="keyword">return</span>  <span class="comment">// Below the watermark: nothing to do</span></span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// Trigger a sync on the largest buffer</span></span><br><span class="line">        <span class="keyword">if</span> candidate != <span class="literal">nil</span> &#123;</span><br><span class="line">            candidate.EvictBuffer(GetOldestBufferPolicy(</span><br><span class="line">                paramtable.Get().DataNodeCfg.MemoryForceSyncSegmentNum.GetAsInt()))</span><br><span class="line">            log.Info(<span class="string">&quot;notify writebuffer to sync&quot;</span>,</span><br><span class="line">                zap.String(<span class="string">&quot;channel&quot;</span>, candiChan), </span><br><span class="line">                zap.Float64(<span 
class="string">&quot;bufferSize(MB)&quot;</span>, logutil.ToMB(<span class="type">float64</span>(candiSize))))</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Memory management strategy:</strong></p><ol><li><strong>Periodic check</strong>: memory usage is checked at the configured interval</li><li><strong>Watermark</strong>: a sync is triggered once the total buffered memory reaches the watermark (total memory × <code>MemoryForceSyncWatermark</code>)</li><li><strong>Selection</strong>: the buffer using the most memory is chosen for the sync</li><li><strong>Eviction</strong>: <code>GetOldestBufferPolicy</code> picks the oldest segments in that buffer to sync</li></ol><h2 id="通道生命周期管理"><a href="#通道生命周期管理" class="headerlink" title="通道生命周期管理"></a>Channel Lifecycle Management</h2><h3 id="移除通道"><a href="#移除通道" class="headerlink" title="移除通道"></a>Removing a Channel</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> RemoveChannel(channel <span class="type">string</span>) &#123;</span><br><span class="line">    buf, loaded := m.buffers.GetAndRemove(channel)</span><br><span class="line">    <span class="keyword">if</span> !loaded &#123;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Discard all buffered data</span></span><br><span class="line">    buf.Close(context.Background(), <span class="literal">false</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Use case:</strong> when a channel is migrated away, the locally buffered data is discarded</p><h3 id="删除通道"><a href="#删除通道" class="headerlink" title="删除通道"></a>Dropping a Channel</h3><figure class="highlight go"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> DropChannel(channel <span class="type">string</span>) &#123;</span><br><span class="line">    buf, loaded := m.buffers.GetAndRemove(channel)</span><br><span class="line">    <span class="keyword">if</span> !loaded &#123;</span><br><span class="line">        <span class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Persist all buffered data</span></span><br><span class="line">    buf.Close(context.Background(), <span class="literal">true</span>)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Use case:</strong> when a collection is dropped, all buffered data is persisted</p><h3 id="删除分区"><a href="#删除分区" class="headerlink" title="删除分区"></a>Dropping Partitions</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(m *bufferManager)</span></span> DropPartitions(channel <span class="type">string</span>, partitionIDs []<span class="type">int64</span>) &#123;</span><br><span class="line">    buf, loaded := m.buffers.Get(channel)</span><br><span class="line">    <span class="keyword">if</span> !loaded &#123;</span><br><span class="line">        <span 
class="keyword">return</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    buf.DropPartitions(partitionIDs)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="设计特点"><a href="#设计特点" class="headerlink" title="设计特点"></a>Design Highlights</h2><h3 id="1-线程安全"><a href="#1-线程安全" class="headerlink" title="1. 线程安全"></a>1. Thread Safety</h3><p>A <code>ConcurrentMap</code> provides thread-safe concurrent access to the per-channel buffers.</p><h3 id="2-内存保护"><a href="#2-内存保护" class="headerlink" title="2. 内存保护"></a>2. Memory Protection</h3><p>The background check keeps buffered data from exhausting memory.</p><h3 id="3-灵活的策略"><a href="#3-灵活的策略" class="headerlink" title="3. 灵活的策略"></a>3. Flexible Policies</h3><p>Several sync policies are supported (memory-based, time-based, manual flush, and so on).</p><h3 id="4-优雅关闭"><a href="#4-优雅关闭" class="headerlink" title="4. 优雅关闭"></a>4. Graceful Shutdown</h3><p>Graceful shutdown is implemented with <code>SafeChan</code> and <code>WaitGroup</code>.</p><h2 id="与其他组件的关系"><a href="#与其他组件的关系" class="headerlink" title="与其他组件的关系"></a>Relationship to Other Components</h2><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">bufferManager</span><br><span class="line">    │</span><br><span class="line">    ├── WriteBuffer (one per channel)</span><br><span class="line">    │       │</span><br><span class="line">    │       ├── segmentBuffer (one per segment)</span><br><span class="line">    │       │       ├── InsertBuffer</span><br><span class="line">    │       │       └── DeltaBuffer</span><br><span class="line">    │       │</span><br><span class="line">    │       └── SyncPolicy (sync policy)</span><br><span class="line">    │</span><br><span class="line">    └── SyncManager</span><br><span class="line">            
└── SyncTask</span><br></pre></td></tr></table></figure><h2 id="配置参数"><a href="#配置参数" class="headerlink" title="配置参数"></a>Configuration Parameters</h2><ul><li><code>MemoryCheckInterval</code>: interval between memory checks (read as milliseconds)</li><li><code>MemoryForceSyncEnable</code>: whether forced sync is enabled</li><li><code>MemoryForceSyncWatermark</code>: memory watermark for forced sync, as a fraction of total memory</li><li><code>MemoryForceSyncSegmentNum</code>: number of segments synced per forced sync</li></ul><h2 id="相关文档"><a href="#相关文档" class="headerlink" title="相关文档"></a>Related Documents</h2><ul><li><a href="./flush_pipeline_write_buffer.md">WriteBuffer Implementation</a></li><li><a href="./flush_pipeline_sync_manager.md">SyncManager Explained</a></li><li><a href="./flush_pipeline_flush_checkpoint.md">Flush and Checkpoint Mechanism</a></li><li><a href="./flush_pipeline_overview.md">Overview</a></li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;概述&quot;&gt;&lt;/a&gt;Overview&lt;/h2&gt;&lt;p&gt;In the Milvus DataNode, &lt;code&gt;bufferManager&lt;/code&gt; manages multiple &lt;code&gt;WriteBuffer&lt;/</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Milvus DDNode (Data Distribution Node) Explained</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/5_flush_pipeline_dd_node/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/5_flush_pipeline_dd_node/</id>
    <published>2025-08-09T06:00:00.000Z</published>
    <updated>2026-01-06T13:10:44.131Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p><code>ddNode</code> (Data Distribution Node) is the first processing node in the FlowGraph. It filters and dispatches the messages read from the message stream. It handles insert messages, delete messages, and DDL messages (such as CreateSegment, Flush, and DropCollection), so that only valid data messages reach the downstream nodes.</p><h2 id="架构设计"><a href="#架构设计" class="headerlink" title="架构设计"></a>Architecture</h2><h3 id="核心结构"><a href="#核心结构" class="headerlink" title="核心结构"></a>Core Structure</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> ddNode <span class="keyword">struct</span> &#123;</span><br><span class="line">    BaseNode</span><br><span class="line">    </span><br><span class="line">    ctx          context.Context</span><br><span class="line">    collectionID typeutil.UniqueID</span><br><span class="line">    vChannelName <span class="type">string</span></span><br><span class="line">    </span><br><span class="line">    dropMode   atomic.Value</span><br><span class="line">    msgHandler util.MsgHandler</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Segment info cache (used for filtering)</span></span><br><span class="line">    growingSegInfo    <span class="keyword">map</span>[typeutil.UniqueID]*datapb.SegmentInfo</span><br><span class="line">    sealedSegInfo     <span class="keyword">map</span>[typeutil.UniqueID]*datapb.SegmentInfo</span><br><span class="line">    droppedSegmentIDs []<span class="type">int64</span></span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key fields:</strong></p><ul><li><code>dropMode</code>: atomic value that marks whether the node is in drop mode</li><li><code>msgHandler</code>: handler for DDL messages</li><li><code>growingSegInfo</code>: map of Growing segment info</li><li><code>sealedSegInfo</code>: map of Sealed segment info</li><li><code>droppedSegmentIDs</code>: list of dropped segment IDs</li></ul><h2 id="消息处理流程"><a href="#消息处理流程" class="headerlink" title="消息处理流程"></a>Message Processing Flow</h2><h3 id="Operate-方法"><a href="#Operate-方法" class="headerlink" title="Operate 方法"></a>The Operate Method</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span 
class="line"><span class="function"><span class="keyword">func</span> <span class="params">(ddn *ddNode)</span></span> Operate(in []Msg) []Msg &#123;</span><br><span class="line">    msMsg, ok := in[<span class="number">0</span>].(*MsgStreamMsg)</span><br><span class="line">    <span class="keyword">if</span> !ok &#123;</span><br><span class="line">        <span class="keyword">return</span> []Msg&#123;&#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. Handle the close message</span></span><br><span class="line">    <span class="keyword">if</span> msMsg.IsCloseMsg() &#123;</span><br><span class="line">        <span class="keyword">return</span> []Msg&#123;&amp;FlowGraphMsg&#123;...&#125;&#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Check drop mode: once set, discard all incoming messages</span></span><br><span class="line">    <span class="keyword">if</span> load := ddn.dropMode.Load(); load != <span class="literal">nil</span> &amp;&amp; load.(<span class="type">bool</span>) &#123;</span><br><span class="line">        <span class="keyword">return</span> []Msg&#123;&#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
Process each message in the message stream</span></span><br><span class="line">    fgMsg := FlowGraphMsg&#123;...&#125;</span><br><span class="line">    <span class="keyword">for</span> _, msg := <span class="keyword">range</span> msMsg.TsMessages() &#123;</span><br><span class="line">        <span class="keyword">switch</span> msg.Type() &#123;</span><br><span class="line">        <span class="keyword">case</span> commonpb.MsgType_DropCollection:</span><br><span class="line">            <span class="comment">// Handle the drop-collection message</span></span><br><span class="line">        <span class="keyword">case</span> commonpb.MsgType_DropPartition:</span><br><span class="line">            <span class="comment">// Handle the drop-partition message</span></span><br><span class="line">        <span class="keyword">case</span> commonpb.MsgType_Insert:</span><br><span class="line">            <span class="comment">// Handle the insert message</span></span><br><span class="line">        <span class="keyword">case</span> commonpb.MsgType_Delete:</span><br><span class="line">            <span class="comment">// Handle the delete message</span></span><br><span class="line">        <span class="keyword">case</span> commonpb.MsgType_CreateSegment:</span><br><span class="line">            <span class="comment">// Handle the create-segment message</span></span><br><span class="line">        <span class="keyword">case</span> commonpb.MsgType_FlushSegment:</span><br><span class="line">            <span class="comment">// Handle the flush-segment message</span></span><br><span class="line">        <span class="keyword">case</span> commonpb.MsgType_ManualFlush:</span><br><span class="line">            <span class="comment">// Handle the manual-flush message</span></span><br><span class="line">        <span class="keyword">case</span> commonpb.MsgType_AddCollectionField:</span><br><span class="line">            <span class="comment">// Handle the schema-change message</span></span><br><span class="line">        <span class="keyword">case</span> commonpb.MsgType_AlterCollection:</span><br><span class="line">            <span class="comment">// Handle the alter-collection message</span></span><br><span 
class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> []Msg&#123;&amp;fgMsg&#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="消息类型处理"><a href="#消息类型处理" class="headerlink" title="消息类型处理"></a>Message Type Handling</h2><h3 id="1-Insert-消息处理"><a href="#1-Insert-消息处理" class="headerlink" title="1. Insert 消息处理"></a>1. Insert Message Handling</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">case</span> commonpb.MsgType_Insert:</span><br><span class="line">    imsg := msg.(*msgstream.InsertMsg)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Check that the collection ID matches</span></span><br><span class="line">    <span class="keyword">if</span> imsg.CollectionID != ddn.collectionID &#123;</span><br><span class="line">        <span class="keyword">continue</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Filter segment messages</span></span><br><span class="line">    <span class="keyword">if</span> ddn.tryToFilterSegmentInsertMessages(imsg) &#123;</span><br><span class="line">        <span class="keyword">continue</span></span><br><span class="line">    
&#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Record metrics</span></span><br><span class="line">    metrics.DataNodeConsumeMsgCount.Inc()</span><br><span class="line">    metrics.DataNodeConsumeMsgRowsCount.Add(<span class="type">float64</span>(imsg.GetNumRows()))</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Append to the output message</span></span><br><span class="line">    fgMsg.InsertMessages = <span class="built_in">append</span>(fgMsg.InsertMessages, imsg)</span><br></pre></td></tr></table></figure><h3 id="2-Delete-消息处理"><a href="#2-Delete-消息处理" class="headerlink" title="2. Delete 消息处理"></a>2. Delete Message Handling</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">case</span> commonpb.MsgType_Delete:</span><br><span class="line">    dmsg := msg.(*msgstream.DeleteMsg)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Check that the collection ID matches</span></span><br><span class="line">    <span class="keyword">if</span> dmsg.CollectionID != ddn.collectionID &#123;</span><br><span class="line">        <span class="keyword">continue</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Record metrics</span></span><br><span class="line">    metrics.DataNodeConsumeMsgCount.Inc()</span><br><span class="line">    metrics.DataNodeConsumeMsgRowsCount.Add(<span 
class="type">float64</span>(dmsg.GetNumRows()))</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Append to the output message</span></span><br><span class="line">    fgMsg.DeleteMessages = <span class="built_in">append</span>(fgMsg.DeleteMessages, dmsg)</span><br></pre></td></tr></table></figure><h3 id="3-DDL-消息处理"><a href="#3-DDL-消息处理" class="headerlink" title="3. DDL 消息处理"></a>3. DDL Message Handling</h3><h4 id="CreateSegment"><a href="#CreateSegment" class="headerlink" title="CreateSegment"></a>CreateSegment</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">case</span> commonpb.MsgType_CreateSegment:</span><br><span class="line">    createSegment := msg.(*adaptor.CreateSegmentMessageBody)</span><br><span class="line">    <span class="keyword">if</span> err := ddn.msgHandler.HandleCreateSegment(ddn.ctx, createSegment.CreateSegmentMessage); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        log.Warn(<span class="string">&quot;handle create segment message failed&quot;</span>, zap.Error(err))</span><br><span class="line">    &#125;</span><br></pre></td></tr></table></figure><h4 id="FlushSegment"><a href="#FlushSegment" class="headerlink" title="FlushSegment"></a>FlushSegment</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">case</span> commonpb.MsgType_FlushSegment:</span><br><span class="line">    flushMsg := msg.(*adaptor.FlushMessageBody)</span><br><span class="line">    <span class="keyword">if</span> err := 
ddn.msgHandler.HandleFlush(flushMsg.FlushMessage); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        log.Warn(<span class="string">&quot;handle flush message failed&quot;</span>, zap.Error(err))</span><br><span class="line">    &#125;</span><br></pre></td></tr></table></figure><h4 id="ManualFlush"><a href="#ManualFlush" class="headerlink" title="ManualFlush"></a>ManualFlush</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">case</span> commonpb.MsgType_ManualFlush:</span><br><span class="line">    manualFlushMsg := msg.(*adaptor.ManualFlushMessageBody)</span><br><span class="line">    <span class="keyword">if</span> err := ddn.msgHandler.HandleManualFlush(manualFlushMsg.ManualFlushMessage); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        log.Warn(<span class="string">&quot;handle manual flush message failed&quot;</span>, zap.Error(err))</span><br><span class="line">    &#125;</span><br></pre></td></tr></table></figure><h2 id="消息过滤机制"><a href="#消息过滤机制" class="headerlink" title="消息过滤机制"></a>Message Filtering Mechanism</h2><h3 id="tryToFilterSegmentInsertMessages"><a href="#tryToFilterSegmentInsertMessages" class="headerlink" title="tryToFilterSegmentInsertMessages"></a>tryToFilterSegmentInsertMessages</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 
class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(ddn *ddNode)</span></span> tryToFilterSegmentInsertMessages(msg *msgstream.InsertMsg) <span class="type">bool</span> &#123;</span><br><span class="line">    <span class="comment">// 1. Check that the shard name matches</span></span><br><span class="line">    <span class="keyword">if</span> msg.GetShardName() != ddn.vChannelName &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Filter out dropped segments</span></span><br><span class="line">    <span class="keyword">if</span> ddn.isDropped(msg.GetSegmentID()) &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. 
Filter sealed segments (until the current timestamp passes the segment's checkpoint)</span></span><br><span class="line">    <span class="keyword">for</span> segID, segInfo := <span class="keyword">range</span> ddn.sealedSegInfo &#123;</span><br><span class="line">        <span class="keyword">if</span> msg.EndTs() &gt; segInfo.GetDmlPosition().GetTimestamp() &#123;</span><br><span class="line">            <span class="built_in">delete</span>(ddn.sealedSegInfo, segID)</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="keyword">if</span> _, ok := ddn.sealedSegInfo[msg.GetSegmentID()]; ok &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Filter growing segments (until the current timestamp passes the segment's DML position)</span></span><br><span class="line">    <span class="keyword">if</span> si, ok := ddn.growingSegInfo[msg.GetSegmentID()]; ok &#123;</span><br><span class="line">        <span class="keyword">if</span> msg.EndTs() &lt;= si.GetDmlPosition().GetTimestamp() &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">        &#125;</span><br><span class="line">        <span class="built_in">delete</span>(ddn.growingSegInfo, msg.GetSegmentID())</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> <span class="literal">false</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Filtering logic:</strong></p><ol><li><strong>Shard name check</strong>: ensures the message belongs to the current channel.</li><li><strong>Dropped-segment filter</strong>: discards messages for segments that have been dropped.</li><li><strong>Sealed-segment filter</strong>: discards messages for a sealed segment until the message timestamp passes the segment's checkpoint; the segment is then removed from the filter set.</li><li><strong>Growing-segment filter</strong>: discards messages for a growing segment whose end timestamp is at or before the segment's DML position, i.e. messages that were already processed.</li></ol><h2 id="消息类型（MsgType）赋值流程"><a href="#消息类型（MsgType）赋值流程" 
class="headerlink" title="消息类型（MsgType）赋值流程"></a>How the Message Type (MsgType) Is Assigned</h2><h3 id="消息创建时的赋值"><a href="#消息创建时的赋值" class="headerlink" title="消息创建时的赋值"></a>Assignment at Message Creation</h3><p>A message's <code>MsgType</code> is set at creation time via <code>commonpbutil.WithMsgType</code>:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// At message creation</span></span><br><span class="line">msg := &amp;msgstream.InsertMsg&#123;</span><br><span class="line">    BaseMsg: msgstream.BaseMsg&#123;</span><br><span class="line">        Base: commonpbutil.NewMsgBase(</span><br><span class="line">            commonpbutil.WithMsgType(commonpb.MsgType_Insert),</span><br><span class="line">            commonpbutil.WithMsgID(...),</span><br><span class="line">            commonpbutil.WithTimeStamp(...),</span><br><span class="line">        ),</span><br><span class="line">    &#125;,</span><br><span class="line">    InsertRequest: &amp;msgpb.InsertRequest&#123;...&#125;,</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="消息类型获取"><a href="#消息类型获取" class="headerlink" title="消息类型获取"></a>Reading the Message Type</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// InsertMsg implementation</span></span><br><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(it *InsertMsg)</span></span> Type() MsgType &#123;</span><br><span class="line">    <span 
class="keyword">return</span> it.Base.MsgType  <span class="comment">// Returns the MsgType stored in Base</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="消息流中的传递"><a href="#消息流中的传递" class="headerlink" title="消息流中的传递"></a>Propagation Through the Message Stream</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">Message creation → serialization → message stream → MsgStreamMsg wrapping → ddNode.Operate → msg.Type()</span><br></pre></td></tr></table></figure><h2 id="初始化流程"><a href="#初始化流程" class="headerlink" title="初始化流程"></a>Initialization</h2><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">newDDNode</span><span class="params">(ctx context.Context, collID typeutil.UniqueID, vChannelName <span class="type">string</span>,</span></span></span><br><span class="line"><span class="params"><span class="function">    droppedSegmentIDs []typeutil.UniqueID,</span></span></span><br><span class="line"><span 
class="params"><span class="function">    sealedSegments []*datapb.SegmentInfo, </span></span></span><br><span class="line"><span class="params"><span class="function">    growingSegments []*datapb.SegmentInfo, </span></span></span><br><span class="line"><span class="params"><span class="function">    handler util.MsgHandler)</span></span> *ddNode &#123;</span><br><span class="line">    </span><br><span class="line">    dd := &amp;ddNode&#123;</span><br><span class="line">        ctx:               ctx,</span><br><span class="line">        collectionID:      collID,</span><br><span class="line">        sealedSegInfo:     <span class="built_in">make</span>(<span class="keyword">map</span>[typeutil.UniqueID]*datapb.SegmentInfo, <span class="built_in">len</span>(sealedSegments)),</span><br><span class="line">        growingSegInfo:    <span class="built_in">make</span>(<span class="keyword">map</span>[typeutil.UniqueID]*datapb.SegmentInfo, <span class="built_in">len</span>(growingSegments)),</span><br><span class="line">        droppedSegmentIDs: droppedSegmentIDs,</span><br><span class="line">        vChannelName:      vChannelName,</span><br><span class="line">        msgHandler:        handler,</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// Initialize segment info</span></span><br><span class="line">    <span class="keyword">for</span> _, s := <span class="keyword">range</span> sealedSegments &#123;</span><br><span class="line">        dd.sealedSegInfo[s.GetID()] = s</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">for</span> _, s := <span class="keyword">range</span> growingSegments &#123;</span><br><span class="line">        dd.growingSegInfo[s.GetID()] = s</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    dd.dropMode.Store(<span 
class="literal">false</span>)</span><br><span class="line">    <span class="keyword">return</span> dd</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="监控指标"><a href="#监控指标" class="headerlink" title="监控指标"></a>Monitoring Metrics</h2><h3 id="消费指标"><a href="#消费指标" class="headerlink" title="消费指标"></a>Consumption Metrics</h3><ul><li><code>DataNodeConsumeMsgCount</code>: number of messages consumed</li><li><code>DataNodeConsumeMsgRowsCount</code>: number of rows consumed</li><li><code>DataNodeConsumeBytesCount</code>: number of bytes consumed</li></ul><h3 id="速率指标"><a href="#速率指标" class="headerlink" title="速率指标"></a>Throughput Metrics</h3><ul><li><code>InsertConsumeThroughput</code>: insert message throughput</li><li><code>DeleteConsumeThroughput</code>: delete message throughput</li></ul><h2 id="设计特点"><a href="#设计特点" class="headerlink" title="设计特点"></a>Design Highlights</h2><h3 id="1-早期过滤"><a href="#1-早期过滤" class="headerlink" title="1. 早期过滤"></a>1. Early Filtering</h3><p>Filtering happens at the first node of the FlowGraph, so invalid messages are neither passed on nor processed downstream.</p><h3 id="2-状态管理"><a href="#2-状态管理" class="headerlink" title="2. 状态管理"></a>2. State Management</h3><p>The node tracks each segment's Growing&#x2F;Sealed&#x2F;Dropped state, which makes precise message filtering possible.</p><h3 id="3-DDL-委托"><a href="#3-DDL-委托" class="headerlink" title="3. DDL 委托"></a>3. DDL Delegation</h3><p>DDL message handling is delegated to <code>MsgHandler</code>, which keeps concerns separated.</p><h3 id="4-线程安全"><a href="#4-线程安全" class="headerlink" title="4. 线程安全"></a>4. Thread Safety</h3><p><code>dropMode</code> is managed with atomic operations, so it can be accessed concurrently.</p><h2 id="相关文档"><a href="#相关文档" class="headerlink" title="相关文档"></a>Related Documents</h2><ul><li><a href="./flush_pipeline_data_sync_service.md">DataSyncService Explained</a></li><li><a href="./flush_pipeline_write_buffer_manager.md">WriteBuffer Manager</a></li><li><a href="./flush_pipeline_overview.md">Overview</a></li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;概述&quot;&gt;&lt;/a&gt;Overview&lt;/h2&gt;&lt;p&gt;&lt;code&gt;ddNode&lt;/code&gt; (Data Distribution Node) is the first processing node in the FlowGraph; from the message</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Milvus Data Consumption: DataSyncService Explained</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/4_flush_pipeline_data_sync_service/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/4_flush_pipeline_data_sync_service/</id>
    <published>2025-08-09T05:00:00.000Z</published>
    <updated>2026-01-06T13:10:43.773Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p><code>DataSyncService</code> is the core service in the Milvus DataNode that controls data synchronization for a single channel. It assembles and manages the FlowGraph and coordinates the components that carry data from the message stream to persistent storage.</p><h2 id="架构设计"><a href="#架构设计" class="headerlink" title="架构设计"></a>Architecture</h2><h3 id="核心结构"><a href="#核心结构" class="headerlink" title="核心结构"></a>Core Structure</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> DataSyncService <span class="keyword">struct</span> &#123;</span><br><span class="line">    ctx          context.Context</span><br><span class="line">    cancelFn     context.CancelFunc</span><br><span class="line">    metacache    metacache.MetaCache</span><br><span class="line">    opID         <span class="type">int64</span></span><br><span class="line">    collectionID typeutil.UniqueID</span><br><span class="line">    vchannelName <span class="type">string</span></span><br><span class="line">    serverID     typeutil.UniqueID</span><br><span class="line">    </span><br><span class="line">    fg *flowgraph.TimeTickedFlowGraph  <span class="comment">// internal FlowGraph</span></span><br><span class="line">    </span><br><span class="line">    broker         broker.Broker</span><br><span class="line">    timetickSender util.StatsUpdater</span><br><span class="line">    dispClient     
msgdispatcher.Client</span><br><span class="line">    chunkManager   storage.ChunkManager</span><br><span class="line">    </span><br><span class="line">    stopOnce sync.Once</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key fields:</strong></p><ul><li><code>fg</code>: the TimeTickedFlowGraph, i.e. the message-processing pipeline</li><li><code>metacache</code>: segment metadata cache</li><li><code>broker</code>: broker used to communicate with DataCoord</li><li><code>timetickSender</code>: time-tick sender</li><li><code>dispClient</code>: message dispatcher client</li><li><code>chunkManager</code>: object storage manager</li></ul><h2 id="FlowGraph-组装流程"><a href="#FlowGraph-组装流程" class="headerlink" title="FlowGraph 组装流程"></a>FlowGraph Assembly</h2><h3 id="节点配置"><a href="#节点配置" class="headerlink" title="节点配置"></a>Node Configuration</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> nodeConfig <span class="keyword">struct</span> &#123;</span><br><span class="line">    msFactory    msgstream.Factory</span><br><span class="line">    collectionID typeutil.UniqueID</span><br><span class="line">    vChannelName <span class="type">string</span></span><br><span class="line">    metacache    metacache.MetaCache</span><br><span class="line">    serverID     typeutil.UniqueID</span><br><span class="line">    dropCallback <span class="function"><span class="keyword">func</span><span class="params">()</span></span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="节点创建与组装"><a href="#节点创建与组装" class="headerlink" title="节点创建与组装"></a>Node Creation and Assembly</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">getServiceWithChannel</span><span class="params">(...)</span></span> (*DataSyncService, <span class="type">error</span>) &#123;</span><br><span class="line">    <span class="comment">// 
1. Create the FlowGraph</span></span><br><span class="line">    fg := flowgraph.NewTimeTickedFlowGraph(params.Ctx)</span><br><span class="line">    nodeList := []flowgraph.Node&#123;&#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Create the input node</span></span><br><span class="line">    dmStreamNode := newDmInputNode(config, input)</span><br><span class="line">    nodeList = <span class="built_in">append</span>(nodeList, dmStreamNode)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Create the DD node (data distribution node)</span></span><br><span class="line">    ddNode := newDDNode(</span><br><span class="line">        params.Ctx,</span><br><span class="line">        collectionID,</span><br><span class="line">        channelName,</span><br><span class="line">        info.GetVchan().GetDroppedSegmentIds(),</span><br><span class="line">        flushed,</span><br><span class="line">        unflushed,</span><br><span class="line">        params.MsgHandler,</span><br><span class="line">    )</span><br><span class="line">    nodeList = <span class="built_in">append</span>(nodeList, ddNode)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. 
Create the embedding node (if enabled)</span></span><br><span class="line">    <span class="keyword">if</span> <span class="built_in">len</span>(info.GetSchema().GetFunctions()) &gt; <span class="number">0</span> &#123;</span><br><span class="line">        emNode, err := newEmbeddingNode(channelName, config.metacache)</span><br><span class="line">        <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">        &#125;</span><br><span class="line">        nodeList = <span class="built_in">append</span>(nodeList, emNode)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 5. Create the write node</span></span><br><span class="line">    writeNode, err := newWriteNode(params.Ctx, params.WriteBufferManager, ds.timetickSender, config)</span><br><span class="line">    <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">    nodeList = <span class="built_in">append</span>(nodeList, writeNode)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 6. Create the TT node (time-tick node)</span></span><br><span class="line">    ttNode := newTTNode(config, params.WriteBufferManager, params.CheckpointUpdater)</span><br><span class="line">    nodeList = <span class="built_in">append</span>(nodeList, ttNode)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 7. 
Assemble the nodes</span></span><br><span class="line">    <span class="keyword">if</span> err := fg.AssembleNodes(nodeList...); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">    ds.fg = fg</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 8. Register the channel with the WriteBufferManager</span></span><br><span class="line">    err = params.WriteBufferManager.Register(channelName, metacache,</span><br><span class="line">        writebuffer.WithMetaWriter(syncmgr.BrokerMetaWriter(params.Broker, config.serverID)),</span><br><span class="line">        writebuffer.WithIDAllocator(params.Allocator),</span><br><span class="line">        writebuffer.WithTaskObserverCallback(wbTaskObserverCallback))</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> ds, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="FlowGraph-节点链路"><a href="#FlowGraph-节点链路" class="headerlink" title="FlowGraph 节点链路"></a>FlowGraph Node Chain</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">Input node (dmStreamNode)</span><br><span class="line">    ↓</span><br><span class="line">DD Node (ddNode) - message filtering and DDL handling</span><br><span class="line">    ↓</span><br><span class="line">Embedding Node (emNode) - optional, vector embedding</span><br><span class="line">    ↓</span><br><span class="line">Write Node (writeNode) - data buffering</span><br><span class="line">    ↓</span><br><span 
class="line">TT Node (ttNode) - checkpoint update</span><br></pre></td></tr></table></figure><h2 id="MetaCache-初始化"><a href="#MetaCache-初始化" class="headerlink" title="MetaCache 初始化"></a>MetaCache Initialization</h2><h3 id="从-Checkpoint-恢复"><a href="#从-Checkpoint-恢复" class="headerlink" title="从 Checkpoint 恢复"></a>Recovering from a Checkpoint</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">initMetaCache</span><span class="params">(...)</span></span> (metacache.MetaCache, <span class="type">error</span>) 
&#123;</span><br><span class="line">    <span class="comment">// 1. Load segment statistics (Bloom filters)</span></span><br><span class="line">    loadSegmentStats := <span class="function"><span class="keyword">func</span><span class="params">(segType <span class="type">string</span>, segments []*datapb.SegmentInfo)</span></span> &#123;</span><br><span class="line">        <span class="keyword">for</span> _, segment := <span class="keyword">range</span> segments &#123;</span><br><span class="line">            future := io.GetOrCreateStatsPool().Submit(<span class="function"><span class="keyword">func</span><span class="params">()</span></span> (any, <span class="type">error</span>) &#123;</span><br><span class="line">                stats, err := compaction.LoadStats(initCtx, chunkManager, info.GetSchema(), </span><br><span class="line">                    segment.GetID(), segment.GetStatslogs())</span><br><span class="line">                <span class="keyword">if</span> err != <span class="literal">nil</span> &#123;</span><br><span class="line">                    <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">                &#125;</span><br><span class="line">                segmentPks.Insert(segment.GetID(), pkoracle.NewBloomFilterSet(stats...))</span><br><span class="line">                </span><br><span class="line">                <span class="comment">// Load BM25 statistics (growing segments only)</span></span><br><span class="line">                <span class="keyword">if</span> segType == <span class="string">&quot;growing&quot;</span> &amp;&amp; <span class="built_in">len</span>(segment.GetBm25Statslogs()) &gt; <span class="number">0</span> &#123;</span><br><span class="line">                    bm25stats, err := compaction.LoadBM25Stats(...)</span><br><span class="line">                    segmentBm25.Insert(segment.GetID(), bm25stats)</span><br><span class="line">                &#125;</span><br><span class="line">                
<span class="keyword">return</span> <span class="keyword">struct</span>&#123;&#125;&#123;&#125;, <span class="literal">nil</span></span><br><span class="line">            &#125;)</span><br><span class="line">            futures = <span class="built_in">append</span>(futures, future)</span><br><span class="line">        &#125;</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Load statistics for growing segments, and for sealed segments when the streaming service is disabled</span></span><br><span class="line">    loadSegmentStats(<span class="string">&quot;growing&quot;</span>, unflushed)</span><br><span class="line">    <span class="keyword">if</span> !streamingutil.IsStreamingServiceEnabled() &#123;</span><br><span class="line">        loadSegmentStats(<span class="string">&quot;sealed&quot;</span>, flushed)</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Wait for all statistics to finish loading</span></span><br><span class="line">    <span class="keyword">if</span> err := conc.AwaitAll(futures...); err != <span class="literal">nil</span> &#123;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">nil</span>, err</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. 
Create the MetaCache</span></span><br><span class="line">    pkStatsFactory := <span class="function"><span class="keyword">func</span><span class="params">(segment *datapb.SegmentInfo)</span></span> pkoracle.PkStat &#123;</span><br><span class="line">        pkStat, _ := segmentPks.Get(segment.GetID())</span><br><span class="line">        <span class="keyword">return</span> pkStat</span><br><span class="line">    &#125;</span><br><span class="line">    </span><br><span class="line">    metacache := metacache.NewMetaCache(info, pkStatsFactory, bm25StatsFactor, schemaManager)</span><br><span class="line">    <span class="keyword">return</span> metacache, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="生命周期管理"><a href="#生命周期管理" class="headerlink" title="生命周期管理"></a>Lifecycle management</h2><h3 id="启动-FlowGraph"><a href="#启动-FlowGraph" class="headerlink" title="启动 FlowGraph"></a>Starting the FlowGraph</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dsService *DataSyncService)</span></span> Start() &#123;</span><br><span class="line">    <span class="keyword">if</span> dsService.fg != <span class="literal">nil</span> &#123;</span><br><span class="line">        log.Info(<span class="string">&quot;dataSyncService starting flow graph&quot;</span>, </span><br><span class="line">            zap.Int64(<span class="string">&quot;collectionID&quot;</span>, dsService.collectionID),</span><br><span class="line">            zap.String(<span class="string">&quot;vChanName&quot;</span>, dsService.vchannelName))</span><br><span class="line">        
dsService.fg.Start()</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="优雅关闭"><a href="#优雅关闭" class="headerlink" title="优雅关闭"></a>Graceful shutdown</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dsService *DataSyncService)</span></span> GracefullyClose() &#123;</span><br><span class="line">    <span class="keyword">if</span> dsService.fg != <span class="literal">nil</span> &#123;</span><br><span class="line">        log.Info(<span class="string">&quot;dataSyncService gracefully closing flowgraph&quot;</span>)</span><br><span class="line">        dsService.fg.SetCloseMethod(flowgraph.CloseGracefully)</span><br><span class="line">        dsService.<span class="built_in">close</span>()</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="关闭流程"><a href="#关闭流程" class="headerlink" title="关闭流程"></a>Shutdown sequence</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span 
class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(dsService *DataSyncService)</span></span> <span class="built_in">close</span>() &#123;</span><br><span class="line">    dsService.stopOnce.Do(<span class="function"><span class="keyword">func</span><span class="params">()</span></span> &#123;</span><br><span class="line">        <span class="comment">// 1. Deregister the channel from the message dispatcher client</span></span><br><span class="line">        <span class="keyword">if</span> dsService.dispClient != <span class="literal">nil</span> &#123;</span><br><span class="line">            dsService.dispClient.Deregister(dsService.vchannelName)</span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// 2. Close the FlowGraph</span></span><br><span class="line">        <span class="keyword">if</span> dsService.fg != <span class="literal">nil</span> &#123;</span><br><span class="line">            dsService.fg.Close()</span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// 3. Cancel the context</span></span><br><span class="line">        dsService.cancelFn()</span><br><span class="line">        </span><br><span class="line">        <span class="comment">// 4. 
Clean up monitoring metrics</span></span><br><span class="line">        pChan := funcutil.ToPhysicalChannel(dsService.vchannelName)</span><br><span class="line">        metrics.CleanupDataNodeCollectionMetrics(paramtable.GetNodeID(), </span><br><span class="line">            dsService.collectionID, pChan)</span><br><span class="line">    &#125;)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="创建方式"><a href="#创建方式" class="headerlink" title="创建方式"></a>Creation modes</h2><h3 id="标准-DataNode-模式"><a href="#标准-DataNode-模式" class="headerlink" title="标准 DataNode 模式"></a>Standard DataNode mode</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewDataSyncService</span><span class="params">(initCtx context.Context, pipelineParams *util.PipelineParams, </span></span></span><br><span class="line"><span class="params"><span class="function">    info *datapb.ChannelWatchInfo, tickler *util.Tickler)</span></span> (*DataSyncService, <span class="type">error</span>) &#123;</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 1. 
Fetch segment info</span></span><br><span class="line">    unflushedSegmentInfos, err := pipelineParams.Broker.GetSegmentInfo(...)</span><br><span class="line">    flushedSegmentInfos, err := pipelineParams.Broker.GetSegmentInfo(...)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 2. Initialize the MetaCache</span></span><br><span class="line">    metaCache, err := getMetaCacheWithTickler(initCtx, pipelineParams, info, </span><br><span class="line">        tickler, unflushedSegmentInfos, flushedSegmentInfos)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 3. Create the message input</span></span><br><span class="line">    input, err := createNewInputFromDispatcher(initCtx, pipelineParams.DispClient, ...)</span><br><span class="line">    </span><br><span class="line">    <span class="comment">// 4. Create the service</span></span><br><span class="line">    ds, err := getServiceWithChannel(initCtx, pipelineParams, info, metaCache, </span><br><span class="line">        unflushedSegmentInfos, flushedSegmentInfos, input, <span class="literal">nil</span>, <span class="literal">nil</span>)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> ds, <span class="literal">nil</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h3 id="Streaming-Node-模式"><a href="#Streaming-Node-模式" class="headerlink" title="Streaming Node 模式"></a>Streaming Node mode</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="title">NewStreamingNodeDataSyncService</span><span class="params">(...)</span></span> (*DataSyncService, <span class="type">error</span>) 
&#123;</span><br><span class="line">    <span class="comment">// Use a simplified MetaCache initialization (statistics for sealed segments are not loaded)</span></span><br><span class="line">    metaCache, err := getMetaCacheForStreaming(...)</span><br><span class="line">    </span><br><span class="line">    <span class="keyword">return</span> getServiceWithChannel(..., input, wbTaskObserverCallback, dropCallback)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="组件依赖关系"><a href="#组件依赖关系" class="headerlink" title="组件依赖关系"></a>Component dependencies</h2><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line">DataSyncService</span><br><span class="line">    │</span><br><span class="line">    ├── FlowGraph</span><br><span class="line">    │   ├── dmStreamNode (input)</span><br><span class="line">    │   ├── ddNode (filtering)</span><br><span class="line">    │   ├── emNode (optional, embedding)</span><br><span class="line">    │   ├── writeNode (buffering)</span><br><span class="line">    │   └── ttNode (checkpoint)</span><br><span class="line">    │</span><br><span class="line">    ├── MetaCache</span><br><span class="line">    │   ├── Segment metadata</span><br><span class="line">    │   ├── Bloom Filter</span><br><span class="line">    │   └── BM25 
statistics</span><br><span class="line">    │</span><br><span class="line">    ├── Broker</span><br><span class="line">    │   ├── Communication with DataCoord</span><br><span class="line">    │   └── Metadata updates</span><br><span class="line">    │</span><br><span class="line">    ├── WriteBufferManager</span><br><span class="line">    │   └── Data buffer management</span><br><span class="line">    │</span><br><span class="line">    └── ChunkManager</span><br><span class="line">        └── Object storage access</span><br></pre></td></tr></table></figure><h2 id="关键设计点"><a href="#关键设计点" class="headerlink" title="关键设计点"></a>Key design points</h2><ol><li><p>One service per channel</p><p>Each channel has its own DataSyncService, which isolates channels from one another and lets them be processed concurrently.</p></li><li><p>Component aggregation</p><p>DataSyncService aggregates core components such as FlowGraph, MetaCache, and Broker, and serves as their single entry point.</p></li><li><p>Graceful shutdown</p><p><code>sync.Once</code> ensures the shutdown logic runs only once, so repeated close calls do not release the same resources twice.</p></li><li><p>Monitoring integration</p><p>Channel-related monitoring metrics are cleaned up automatically on close, which prevents stale metrics from accumulating.</p></li></ol><h2 id="相关文档"><a href="#相关文档" class="headerlink" title="相关文档"></a>Related documents</h2><ul><li><a href="./flush_pipeline_flowgraph_manager.md">FlowGraph manager</a></li><li><a href="./flush_pipeline_dd_node.md">DD Node in depth</a></li><li><a href="./flush_pipeline_write_buffer_manager.md">WriteBuffer manager</a></li><li><a href="./flush_pipeline_tt_node.md">TT Node in depth</a></li><li><a href="./flush_pipeline_overview.md">Overview</a></li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;概述&quot;&gt;&lt;/a&gt;Overview&lt;/h2&gt;&lt;p&gt;&lt;code&gt;DataSyncService&lt;/code&gt; is the core of data synchronization for a single channel in the Milvus DataNode</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Milvus FlowGraph Manager (fgManagerImpl)</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/3_flush_pipeline_flowgraph_manager/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/3_flush_pipeline_flowgraph_manager/</id>
    <published>2025-08-09T04:00:00.000Z</published>
    <updated>2026-01-06T13:10:43.368Z</updated>
    
    <content type="html"><![CDATA[<h2 id="概述"><a href="#概述" class="headerlink" title="概述"></a>Overview</h2><p><code>fgManagerImpl</code> is the core component in the Milvus DataNode that manages multiple data sync services (DataSyncService). It organizes and manages FlowGraph instances by channel, which allows multiple channels to be processed concurrently.</p><h2 id="架构设计"><a href="#架构设计" class="headerlink" title="架构设计"></a>Architecture</h2><h3 id="核心结构"><a href="#核心结构" class="headerlink" title="核心结构"></a>Core structure</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> fgManagerImpl <span class="keyword">struct</span> &#123;</span><br><span class="line">    ctx        context.Context</span><br><span class="line">    cancelFunc context.CancelFunc</span><br><span class="line">    flowgraphs *typeutil.ConcurrentMap[<span class="type">string</span>, *DataSyncService]</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Key fields:</strong></p><ul><li><code>ctx</code>: the context that controls the manager's lifecycle</li><li><code>cancelFunc</code>: the cancel function, used for graceful shutdown</li><li><code>flowgraphs</code>: a thread-safe map whose key is the channel name and whose value is the corresponding DataSyncService</li></ul><h3 id="接口定义"><a href="#接口定义" class="headerlink" title="接口定义"></a>Interface definition</h3><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">type</span> FlowgraphManager <span 
class="keyword">interface</span> &#123;</span><br><span class="line">    AddFlowgraph(ds *DataSyncService)</span><br><span class="line">    RemoveFlowgraph(channel <span class="type">string</span>)</span><br><span class="line">    ClearFlowgraphs()</span><br><span class="line">    </span><br><span class="line">    GetFlowgraphService(channel <span class="type">string</span>) (*DataSyncService, <span class="type">bool</span>)</span><br><span class="line">    HasFlowgraph(channel <span class="type">string</span>) <span class="type">bool</span></span><br><span class="line">    HasFlowgraphWithOpID(channel <span class="type">string</span>, opID <span class="type">int64</span>) <span class="type">bool</span></span><br><span class="line">    GetFlowgraphCount() <span class="type">int</span></span><br><span class="line">    GetCollectionIDs() []<span class="type">int64</span></span><br><span class="line">    </span><br><span class="line">    GetChannelsJSON(collectionID <span class="type">int64</span>) <span class="type">string</span></span><br><span class="line">    GetSegmentsJSON(collectionID <span class="type">int64</span>) <span class="type">string</span></span><br><span class="line">    Close()</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="核心功能"><a href="#核心功能" class="headerlink" title="核心功能"></a>Core functionality</h2><h3 id="1-FlowGraph-生命周期管理"><a href="#1-FlowGraph-生命周期管理" class="headerlink" title="1. FlowGraph 生命周期管理"></a>1. 
FlowGraph lifecycle management</h3><h4 id="添加-FlowGraph"><a href="#添加-FlowGraph" class="headerlink" title="添加 FlowGraph"></a>Adding a FlowGraph</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(fm *fgManagerImpl)</span></span> AddFlowgraph(ds *DataSyncService) &#123;</span><br><span class="line">    fm.flowgraphs.Insert(ds.vchannelName, ds)</span><br><span class="line">    metrics.DataNodeNumFlowGraphs.WithLabelValues(fmt.Sprint(paramtable.GetNodeID())).Inc()</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>What it does:</strong></p><ul><li>Registers the DataSyncService with the manager</li><li>Increments the FlowGraph count metric</li></ul><h4 id="移除-FlowGraph"><a href="#移除-FlowGraph" class="headerlink" title="移除 FlowGraph"></a>Removing a FlowGraph</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(fm *fgManagerImpl)</span></span> RemoveFlowgraph(channel <span class="type">string</span>) &#123;</span><br><span class="line">    <span class="keyword">if</span> fg, loaded := fm.flowgraphs.Get(channel); loaded &#123;</span><br><span class="line">        fg.<span class="built_in">close</span>()</span><br><span class="line">        fm.flowgraphs.Remove(channel)</span><br><span class="line">        </span><br><span class="line">        
metrics.DataNodeNumFlowGraphs.WithLabelValues(fmt.Sprint(paramtable.GetNodeID())).Dec()</span><br><span class="line">        util.GetRateCollector().RemoveFlowGraphChannel(channel)</span><br><span class="line">    &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>What it does:</strong></p><ul><li>Closes the DataSyncService</li><li>Removes it from the manager</li><li>Cleans up related resources (the metric and the rate collector entry)</li></ul><h4 id="清空所有-FlowGraph"><a href="#清空所有-FlowGraph" class="headerlink" title="清空所有 FlowGraph"></a>Clearing all FlowGraphs</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(fm *fgManagerImpl)</span></span> ClearFlowgraphs() &#123;</span><br><span class="line">    log.Info(<span class="string">&quot;start drop all flowgraph resources in DataNode&quot;</span>)</span><br><span class="line">    fm.flowgraphs.Range(<span class="function"><span class="keyword">func</span><span class="params">(key <span class="type">string</span>, value *DataSyncService)</span></span> <span class="type">bool</span> &#123;</span><br><span class="line">        value.GracefullyClose()</span><br><span class="line">        fm.flowgraphs.GetAndRemove(key)</span><br><span class="line">        </span><br><span class="line">        log.Info(<span class="string">&quot;successfully dropped flowgraph&quot;</span>, zap.String(<span class="string">&quot;vChannelName&quot;</span>, key))</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">    &#125;)</span><br><span 
class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>功能：</strong></p><ul><li>优雅关闭所有 FlowGraph</li><li>清理所有资源</li></ul><h3 id="2-查询功能"><a href="#2-查询功能" class="headerlink" title="2. 查询功能"></a>2. 查询功能</h3><h4 id="获取-FlowGraph-服务"><a href="#获取-FlowGraph-服务" class="headerlink" title="获取 FlowGraph 服务"></a>获取 FlowGraph 服务</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(fm *fgManagerImpl)</span></span> GetFlowgraphService(channel <span class="type">string</span>) (*DataSyncService, <span class="type">bool</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> fm.flowgraphs.Get(channel)</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="检查-FlowGraph-是否存在"><a href="#检查-FlowGraph-是否存在" class="headerlink" title="检查 FlowGraph 是否存在"></a>检查 FlowGraph 是否存在</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(fm *fgManagerImpl)</span></span> HasFlowgraph(channel <span class="type">string</span>) <span class="type">bool</span> &#123;</span><br><span class="line">    _, exist := fm.flowgraphs.Get(channel)</span><br><span class="line">    <span class="keyword">return</span> exist</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="按操作-ID-检查"><a href="#按操作-ID-检查" class="headerlink" title="按操作 ID 检查"></a>按操作 ID 检查</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span 
class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(fm *fgManagerImpl)</span></span> HasFlowgraphWithOpID(channel <span class="type">string</span>, opID typeutil.UniqueID) <span class="type">bool</span> &#123;</span><br><span class="line">    ds, exist := fm.flowgraphs.Get(channel)</span><br><span class="line">    <span class="keyword">return</span> exist &amp;&amp; ds.opID == opID</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Purpose:</strong> confirms that the FlowGraph's operation ID matches the expected one, so that a stale FlowGraph is not used.</p><h3 id="3-统计信息"><a href="#3-统计信息" class="headerlink" title="3. 统计信息"></a>3. Statistics</h3><h4 id="获取集合-ID-列表"><a href="#获取集合-ID-列表" class="headerlink" title="获取集合 ID 列表"></a>Listing collection IDs</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(fm *fgManagerImpl)</span></span> GetCollectionIDs() []<span class="type">int64</span> &#123;</span><br><span class="line">    collectionSet := typeutil.UniqueSet&#123;&#125;</span><br><span class="line">    fm.flowgraphs.Range(<span class="function"><span class="keyword">func</span><span class="params">(key <span class="type">string</span>, value *DataSyncService)</span></span> <span class="type">bool</span> &#123;</span><br><span class="line">        collectionSet.Insert(value.metacache.Collection())</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">    &#125;)</span><br><span class="line">    <span 
class="keyword">return</span> collectionSet.Collect()</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h4 id="获取通道-JSON"><a href="#获取通道-JSON" class="headerlink" title="获取通道 JSON"></a>Channel info as JSON</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">func</span> <span class="params">(fm *fgManagerImpl)</span></span> GetChannelsJSON(collectionID <span class="type">int64</span>) <span class="type">string</span> &#123;</span><br><span class="line">    <span class="keyword">var</span> channels []*metricsinfo.Channel</span><br><span class="line">    fm.flowgraphs.Range(<span class="function"><span class="keyword">func</span><span class="params">(ch <span class="type">string</span>, ds *DataSyncService)</span></span> <span class="type">bool</span> &#123;</span><br><span class="line">        <span class="keyword">if</span> collectionID &gt; <span class="number">0</span> &amp;&amp; ds.metacache.Collection() != collectionID &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">        &#125;</span><br><span class="line">        latestTimeTick := ds.timetickSender.GetLatestTimestamp(ch)</span><br><span class="line">        channels = <span class="built_in">append</span>(channels, 
&amp;metricsinfo.Channel&#123;</span><br><span class="line">            Name:           ch,</span><br><span class="line">            WatchState:     ds.fg.Status(),</span><br><span class="line">            LatestTimeTick: tsoutil.PhysicalTimeFormat(latestTimeTick),</span><br><span class="line">            NodeID:         paramtable.GetNodeID(),</span><br><span class="line">            CollectionID:   ds.metacache.Collection(),</span><br><span class="line">        &#125;)</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">    &#125;)</span><br><span class="line">    <span class="comment">// ... serialize to JSON</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><strong>Returned fields:</strong></p><ul><li>Channel name</li><li>FlowGraph status</li><li>Latest time tick</li><li>Node ID</li><li>Collection ID</li></ul><h4 id="获取段-JSON"><a href="#获取段-JSON" class="headerlink" title="获取段 JSON"></a>Segment info as JSON</h4><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span 
class="keyword">func</span> <span class="params">(fm *fgManagerImpl)</span></span> GetSegmentsJSON(collectionID <span class="type">int64</span>) <span class="type">string</span> &#123;</span><br><span class="line">    <span class="keyword">var</span> segments []*metricsinfo.Segment</span><br><span class="line">    fm.flowgraphs.Range(<span class="function"><span class="keyword">func</span><span class="params">(ch <span class="type">string</span>, ds *DataSyncService)</span></span> <span class="type">bool</span> &#123;</span><br><span class="line">        <span class="keyword">if</span> collectionID &gt; <span class="number">0</span> &amp;&amp; ds.metacache.Collection() != collectionID &#123;</span><br><span class="line">            <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">        &#125;</span><br><span class="line">        </span><br><span class="line">        meta := ds.metacache</span><br><span class="line">        <span class="keyword">for</span> _, segment := <span class="keyword">range</span> meta.GetSegmentsBy() &#123;</span><br><span class="line">            segments = <span class="built_in">append</span>(segments, &amp;metricsinfo.Segment&#123;</span><br><span class="line">                SegmentID:      segment.SegmentID(),</span><br><span class="line">                CollectionID:   meta.Collection(),</span><br><span class="line">                PartitionID:    segment.PartitionID(),</span><br><span class="line">                Channel:        ch,</span><br><span class="line">                State:          segment.State().String(),</span><br><span class="line">                Level:          segment.Level().String(),</span><br><span class="line">                NodeID:         paramtable.GetNodeID(),</span><br><span class="line">                NumOfRows:      segment.NumOfRows(),</span><br><span class="line">                FlushedRows:    segment.FlushedRows(),</span><br><span class="line">      
          SyncBufferRows: segment.BufferRows(),</span><br><span class="line">                SyncingRows:    segment.SyncingRows(),</span><br><span class="line">            &#125;)</span><br><span class="line">        &#125;</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">true</span></span><br><span class="line">    &#125;)</span><br><span class="line">    <span class="comment">// ... serialize to JSON</span></span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><h2 id="设计特点"><a href="#设计特点" class="headerlink" title="设计特点"></a>Design Highlights</h2><h3 id="1-线程安全"><a href="#1-线程安全" class="headerlink" title="1. 线程安全"></a>1. Thread Safety</h3><p>Uses <code>typeutil.ConcurrentMap</code> for thread-safe concurrent access, so multiple goroutines can operate on it at the same time.</p><h3 id="2-资源管理"><a href="#2-资源管理" class="headerlink" title="2. 资源管理"></a>2. Resource Management</h3><ul><li>Manages lifecycles uniformly through <code>context.Context</code></li><li>Provides a graceful shutdown mechanism (<code>GracefullyClose</code>)</li><li>Automatically cleans up metrics and resources</li></ul><h3 id="3-监控集成"><a href="#3-监控集成" class="headerlink" title="3. 监控集成"></a>3. 
Monitoring Integration</h3><ul><li>Integrates Prometheus metrics</li><li>Provides statistics in JSON format</li><li>Supports filtering by collection</li></ul><h2 id="使用场景"><a href="#使用场景" class="headerlink" title="使用场景"></a>Usage Scenarios</h2><h3 id="场景-1-通道注册"><a href="#场景-1-通道注册" class="headerlink" title="场景 1: 通道注册"></a>Scenario 1: Channel Registration</h3><p>When DataCoord assigns a new channel to a DataNode:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Create the DataSyncService</span></span><br><span class="line">ds, err := NewDataSyncService(ctx, params, watchInfo, tickler)</span><br><span class="line"></span><br><span class="line"><span class="comment">// Register it with the manager</span></span><br><span class="line">fgManager.AddFlowgraph(ds)</span><br><span class="line"></span><br><span class="line"><span class="comment">// Start the FlowGraph</span></span><br><span class="line">ds.Start()</span><br></pre></td></tr></table></figure><h3 id="场景-2-通道释放"><a href="#场景-2-通道释放" class="headerlink" title="场景 2: 通道释放"></a>Scenario 2: Channel Release</h3><p>When a channel needs to migrate to another node:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Remove the FlowGraph</span></span><br><span class="line">fgManager.RemoveFlowgraph(channelName)</span><br></pre></td></tr></table></figure><h3 id="场景-3-节点关闭"><a href="#场景-3-节点关闭" class="headerlink" title="场景 3: 节点关闭"></a>Scenario 3: Node Shutdown</h3><p>When the DataNode shuts down:</p><figure class="highlight go"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span 
class="comment">// Clear all FlowGraphs</span></span><br><span class="line">fgManager.ClearFlowgraphs()</span><br><span class="line"></span><br><span class="line"><span class="comment">// Close the manager</span></span><br><span class="line">fgManager.Close()</span><br></pre></td></tr></table></figure><h2 id="与其他组件的关系"><a href="#与其他组件的关系" class="headerlink" title="与其他组件的关系"></a>Relationship to Other Components</h2><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">fgManagerImpl</span><br><span class="line">    │</span><br><span class="line">    ├── DataSyncService (organized by channel)</span><br><span class="line">    │       │</span><br><span class="line">    │       ├── FlowGraph (message processing pipeline)</span><br><span class="line">    │       ├── MetaCache (segment metadata cache)</span><br><span class="line">    │       ├── WriteBufferManager (data buffer manager)</span><br><span class="line">    │       └── Broker (communicates with DataCoord)</span><br><span class="line">    │</span><br><span class="line">    └── Metrics (monitoring metrics)</span><br></pre></td></tr></table></figure><h2 id="相关文档"><a href="#相关文档" class="headerlink" title="相关文档"></a>Related Documents</h2><ul><li><a href="./flush_pipeline_data_sync_service.md">DataSyncService in Detail</a></li><li><a href="./flush_pipeline_write_buffer_manager.md">WriteBuffer Manager</a></li><li><a href="./flush_pipeline_overview.md">Overview</a></li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;概述&quot;&gt;&lt;a href=&quot;#概述&quot; class=&quot;headerlink&quot; title=&quot;概述&quot;&gt;&lt;/a&gt;概述&lt;/h2&gt;&lt;p&gt;&lt;code&gt;fgManagerImpl&lt;/code&gt; 是 Milvus DataNode 中管理多个数据同步服务（DataSyncServi</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
  <entry>
    <title>Milvus DataNode Overview</title>
    <link href="https://szza.github.io/2025/08/09/Milvus/2_flush_pipeline_overview/"/>
    <id>https://szza.github.io/2025/08/09/Milvus/2_flush_pipeline_overview/</id>
    <published>2025-08-09T03:00:00.000Z</published>
    <updated>2026-01-06T13:10:42.896Z</updated>
    
    <content type="html"><![CDATA[<h2 id="文档索引"><a href="#文档索引" class="headerlink" title="文档索引"></a>Document Index</h2><p>This document series describes in detail the core components and mechanisms of the data flush pipeline in Milvus DataNode. The documents are organized by component and functional module to make the system design easier to study in depth.</p><h3 id="📚-核心组件文档"><a href="#📚-核心组件文档" class="headerlink" title="📚 核心组件文档"></a>📚 Core Component Documents</h3><ol><li><p><strong><a href="./flush_pipeline_flowgraph_manager.md">FlowGraph Manager</a></strong></p><ul><li>Design and responsibilities of <code>fgManagerImpl</code></li><li>FlowGraph lifecycle management</li><li>Concurrent management of multiple channels</li></ul></li><li><p><strong><a href="./flush_pipeline_data_sync_service.md">DataSyncService</a></strong></p><ul><li>DataSyncService architecture</li><li>FlowGraph assembly process</li><li>Node initialization and startup</li></ul></li><li><p><strong><a href="./flush_pipeline_dd_node.md">DD Node (data distribution node)</a></strong></p><ul><li>Message filtering mechanism of <code>ddNode</code></li><li>DDL message handling</li><li>How the message type (MsgType) is assigned</li></ul></li><li><p><strong><a href="./flush_pipeline_write_buffer_manager.md">WriteBuffer Manager</a></strong></p><ul><li>Design and implementation of <code>bufferManager</code></li><li>Memory management and eviction policy</li><li>Channel registration and lifecycle</li></ul></li><li><p><strong><a href="./flush_pipeline_write_buffer.md">WriteBuffer Implementation</a></strong></p><ul><li>The <code>WriteBuffer</code> interface and its base implementation</li><li>Special design of <code>l0WriteBuffer</code></li><li>Segment buffer (segmentBuffer) management</li></ul></li><li><p><strong><a href="./flush_pipeline_sync_manager.md">SyncManager and SyncTask</a></strong></p><ul><li>Asynchronous sync mechanism of <code>SyncManager</code></li><li>Execution flow of <code>SyncTask</code></li><li>Data persistence and metadata updates</li></ul></li><li><p><strong><a href="./flush_pipeline_tt_node.md">TT Node (timestamp node)</a></strong></p><ul><li>Checkpoint update mechanism of <code>ttNode</code></li><li>Asynchronous updates through <code>ChannelCheckpointUpdater</code></li><li>Handling of checkpoint failures</li></ul></li><li><p><strong><a href="./flush_pipeline_flush_checkpoint.md">Flush and Checkpoint Mechanisms</a></strong></p><ul><li>End-to-end impact of a Flush operation</li><li>Checkpoint update flow</li><li>Role of <code>flushTs</code> and the conditions for resetting it</li><li>Data consistency guarantees</li></ul></li></ol><h3 id="🔗-组件关系图"><a href="#🔗-组件关系图" class="headerlink" title="🔗 组件关系图"></a>🔗 Component Relationship Diagram</h3><figure class="highlight plaintext"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br></pre></td><td class="code"><pre><span class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│                    fgManagerImpl                            │</span><br><span class="line">│  (manages multiple DataSyncService, organized by channel)   │</span><br><span class="line">└────────────────────┬────────────────────────────────────────┘</span><br><span class="line">                     │</span><br><span class="line">                     ▼</span><br><span 
class="line">┌─────────────────────────────────────────────────────────────┐</span><br><span class="line">│                 DataSyncService                             │</span><br><span class="line">│           (controls the FlowGraph of one channel)           │</span><br><span class="line">└──────┬──────────────┬──────────────┬────────────────────────┘</span><br><span class="line">       │              │              │</span><br><span class="line">       ▼              ▼              ▼</span><br><span class="line">┌──────────┐   ┌──────────┐   ┌──────────┐</span><br><span class="line">│  ddNode  │──▶│writeNode │──▶│  ttNode  │</span><br><span class="line">│  filter  │   │  buffer  │   │checkpoint│</span><br><span class="line">└────┬─────┘   └────┬─────┘   └─────┬────┘</span><br><span class="line">     │              │               │</span><br><span class="line">     │              ▼               │</span><br><span class="line">     │      ┌──────────────┐        │</span><br><span class="line">     │      │WriteBufferMgr│        │</span><br><span class="line">     │      └──────┬───────┘        │</span><br><span class="line">     │             │                │</span><br><span class="line">     │             ▼                │</span><br><span class="line">     │      ┌──────────────┐        │</span><br><span class="line">     │      │ WriteBuffer  │        │</span><br><span class="line">     │      └──────┬───────┘        │</span><br><span class="line">     │             │                │</span><br><span class="line">     │             ▼                │</span><br><span class="line">     │      ┌──────────────┐        │</span><br><span class="line">     │      │ SyncManager  │        │</span><br><span class="line">     │      └──────┬───────┘        │</span><br><span class="line">     │             │                │</span><br><span class="line">     │             ▼                │</span><br><span class="line">     │      ┌──────────────┐        │</span><br><span 
class="line">     │      │  SyncTask    │        │</span><br><span class="line">     │      └──────────────┘        │</span><br><span class="line">     │                              │</span><br><span class="line">     └──────────────┬───────────────┘</span><br><span class="line">                    │</span><br><span class="line">                    ▼</span><br><span class="line">          ┌──────────────────┐</span><br><span class="line">          │ ChannelCheckpoint│</span><br><span class="line">          │    Updater       │</span><br><span class="line">          └──────────────────┘</span><br></pre></td></tr></table></figure><h3 id="📖-关键概念"><a href="#📖-关键概念" class="headerlink" title="📖 关键概念"></a>📖 Key Concepts</h3><h4 id="消息流（Message-Stream）"><a href="#消息流（Message-Stream）" class="headerlink" title="消息流（Message Stream）"></a>Message Stream</h4><ul><li><strong>InsertMsg</strong>: a message that inserts data</li><li><strong>DeleteMsg</strong>: a message that deletes data</li><li><strong>DDL messages</strong>: control messages such as CreateSegment, Flush, and DropCollection</li></ul><h4 id="数据缓冲（WriteBuffer）"><a href="#数据缓冲（WriteBuffer）" class="headerlink" title="数据缓冲（WriteBuffer）"></a>Data Buffering (WriteBuffer)</h4><ul><li><strong>Growing Segment</strong>: a segment that is still receiving data</li><li><strong>Sealed Segment</strong>: a sealed segment that is ready to be flushed</li><li><strong>L0 Segment</strong>: a segment dedicated to delete data in streaming mode</li></ul><h4 id="同步机制（Sync）"><a href="#同步机制（Sync）" class="headerlink" title="同步机制（Sync）"></a>Synchronization (Sync)</h4><ul><li><strong>Automatic sync</strong>: triggered by memory thresholds and time-based policies</li><li><strong>Manual flush</strong>: triggered by a Flush message</li><li><strong>SyncTask</strong>: the unit of work that persists data</li></ul><h4 id="检查点（Checkpoint）"><a href="#检查点（Checkpoint）" class="headerlink" title="检查点（Checkpoint）"></a>Checkpoint</h4><ul><li><strong>Channel Checkpoint</strong>: the channel-level data consumption position</li><li><strong>flushTs</strong>: the target timestamp of a flush</li><li><strong>Checkpoint updates</strong>: sent to DataCoord asynchronously</li></ul><h3 id="🎯-数据流程"><a href="#🎯-数据流程" class="headerlink" title="🎯 数据流程"></a>🎯 Data Flow</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">message stream → ddNode (filter) → writeNode (buffer) → WriteBuffer → SyncManager → SyncTask → object storage</span><br><span class="line">                                                                                       ↓</span><br><span class="line">                                                                                 MetaCache update</span><br><span class="line">                                                                                       ↓</span><br><span class="line">                                                                                 ttNode → CheckpointUpdater → DataCoord</span><br></pre></td></tr></table></figure><h3 id="📝-相关代码路径"><a href="#📝-相关代码路径" class="headerlink" title="📝 相关代码路径"></a>📝 Related Code Paths</h3><ul><li>FlowGraph management: <code>internal/flushcommon/pipeline/flow_graph_manager.go</code></li><li>DataSyncService: <code>internal/flushcommon/pipeline/data_sync_service.go</code></li><li>DD Node: <code>internal/flushcommon/pipeline/flow_graph_dd_node.go</code></li><li>WriteBuffer manager: <code>internal/flushcommon/writebuffer/manager.go</code></li><li>WriteBuffer implementation: <code>internal/flushcommon/writebuffer/write_buffer.go</code></li><li>L0 WriteBuffer: <code>internal/flushcommon/writebuffer/l0_write_buffer.go</code></li><li>SyncManager: <code>internal/flushcommon/syncmgr/sync_manager.go</code></li><li>SyncTask: <code>internal/flushcommon/syncmgr/task.go</code></li><li>TT Node: <code>internal/flushcommon/pipeline/flow_graph_time_tick_node.go</code></li><li>CheckpointUpdater: <code>internal/flushcommon/util/checkpoint_updater.go</code></li></ul><hr><p><strong>Next step</strong>: start with the <a href="./flush_pipeline_flowgraph_manager.md">FlowGraph Manager</a> to explore each component in depth.</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;文档索引&quot;&gt;&lt;a href=&quot;#文档索引&quot; class=&quot;headerlink&quot; title=&quot;文档索引&quot;&gt;&lt;/a&gt;文档索引&lt;/h2&gt;&lt;p&gt;本文档系列详细介绍了 Milvus DataNode 中数据刷新（Flush）管道的核心组件和机制。文档按组件和功能模块组织</summary>
      
    
    
    
    
    <category term="Milvus" scheme="https://szza.github.io/tags/Milvus/"/>
    
  </entry>
  
</feed>
