tag:github.com,2008:https://github.com/WorksApplications/Sudachi/releases
Release notes from Sudachi
2024-11-05T05:37:57Z
tag:github.com,2008:Repository/100921897/v0.7.5
2024-11-05T05:55:27Z
Sudachi version 0.7.5
<h1>Highlights</h1>
<ul>
<li>The behavior of the dictionary printer and builder has changed (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2439554903" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/234" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/234/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/234">#234</a>)
<ul>
<li><code>DictionaryPrinter</code> now prints word references in the (Surface, POS, Reading) triple format, instead of the line number format.</li>
<li><code>DictionaryBuilder</code> now allows the dictionary form to be written in the triple format, not only the line number format.</li>
</ul>
</li>
</ul>
<h1>Added</h1>
<ul>
<li>Benchmark scripts are added (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2502361366" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/235" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/235/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/235">#235</a>)</li>
</ul>
<h1>Fixed</h1>
<ul>
<li>Tutorial and readme are updated (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2588157902" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/237" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/237/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/237">#237</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2634438355" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/240" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/240/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/240">#240</a>)</li>
<li><code>Config.Resource.asByteBuffer</code> now always returns ByteBuffer with little endian byte order (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2610297806" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/239" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/239/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/239">#239</a>)
<ul>
<li><code>StringUtil.readAllBytes</code> also now returns ByteBuffer with little endian byte order.</li>
</ul>
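<p>To see why a fixed byte order matters, here is a minimal JDK-only sketch. It does not call Sudachi; the class name <code>ByteOrderDemo</code> and the helper <code>readInt</code> are illustrative assumptions. It shows that a freshly wrapped <code>ByteBuffer</code> is big-endian by default, so the same four bytes decode to different integers unless the order is set explicitly.</p>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteOrderDemo {
    // Hypothetical helper: reads the first 4 bytes of `data` as an int
    // using the given byte order.
    static int readInt(byte[] data, ByteOrder order) {
        return ByteBuffer.wrap(data).order(order).getInt(0);
    }

    public static void main(String[] args) {
        byte[] data = {0x01, 0x00, 0x00, 0x00};

        // A newly created or wrapped ByteBuffer always starts as big-endian.
        System.out.println(ByteBuffer.wrap(data).order());          // BIG_ENDIAN

        // The same bytes give different values depending on the order.
        System.out.println(readInt(data, ByteOrder.BIG_ENDIAN));    // 16777216
        System.out.println(readInt(data, ByteOrder.LITTLE_ENDIAN)); // 1
    }
}
```

<p>Code that receives a buffer from these methods can check <code>buffer.order()</code> rather than assuming it.</p>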
</li>
</ul>
github-actions[bot]
tag:github.com,2008:Repository/100921897/v0.7.4
2024-07-02T07:27:42Z
Sudachi version 0.7.4
<h1>Highlights</h1>
<ul>
<li>Added <code>Tokenizer.lazyTokenizeSentences(SplitMode mode, Readable input)</code>, which performs analysis lazily and reduces memory usage (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2369244789" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/231" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/231/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/231">#231</a>)
<ul>
<li><code>Tokenizer.tokenizeSentences(SplitMode mode, Reader input)</code> is marked as deprecated.</li>
</ul>
</li>
</ul>
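<p>The idea behind lazy sentence processing can be sketched with the JDK alone. This is not Sudachi's implementation and does not call its API: the helper <code>lazySentences</code>, the 16-character chunk size, and splitting after the ideographic full stop (U+3002) are all assumptions made for illustration. The iterator pulls input from a <code>Readable</code> only when the next sentence is requested, so at most one pending sentence plus one chunk is held in memory.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class LazySentenceDemo {
    // Hypothetical helper (not a Sudachi API): yields one sentence at a time,
    // cutting after each ideographic full stop (U+3002).
    static Iterator<String> lazySentences(Readable input) {
        return new Iterator<String>() {
            private final CharBuffer chunk = CharBuffer.allocate(16); // small on purpose
            private final StringBuilder pending = new StringBuilder();
            private boolean eof = false;

            // Reads more input until `pending` holds a complete sentence or the
            // input ends. Returns the delimiter index, or -1 if there is none.
            private int fill() {
                int end;
                while ((end = pending.indexOf("\u3002")) < 0 && !eof) {
                    try {
                        chunk.clear();
                        if (input.read(chunk) < 0) {
                            eof = true;
                        } else {
                            chunk.flip();
                            pending.append(chunk);
                        }
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
                return end;
            }

            @Override
            public boolean hasNext() {
                fill();
                return pending.length() > 0;
            }

            @Override
            public String next() {
                int end = fill();
                if (pending.length() == 0) {
                    throw new NoSuchElementException();
                }
                // Without a delimiter, the remaining text is the last sentence.
                int cut = end < 0 ? pending.length() : end + 1;
                String sentence = pending.substring(0, cut);
                pending.delete(0, cut);
                return sentence;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> it =
                lazySentences(new StringReader("first\u3002second\u3002tail"));
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

<p>Accepting a <code>Readable</code> rather than a <code>Reader</code> lets the caller pass any character source that can fill a <code>CharBuffer</code>.</p>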
<h1>Fixed</h1>
<ul>
<li>Do not segfault on tokenizing with closed dictionary (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1885138167" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/217" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/217/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/217">#217</a>)</li>
<li>The default config sudachi.json sets non-existent property joinKanjiNumeric in JoinNumericPlugin (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2136761701" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/221" data-hovercard-type="issue" data-hovercard-url="/WorksApplications/Sudachi/issues/221/hovercard" href="https://github.com/WorksApplications/Sudachi/issues/221">#221</a>)</li>
<li>Fix incorrect size calculation when expanding (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2277341109" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/227" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/227/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/227">#227</a>)</li>
<li>Update tutorial.md (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2237023218" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/226" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/226/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/226">#226</a>)</li>
</ul>
github-actions[bot]
tag:github.com,2008:Repository/100921897/v0.7.3
2023-06-26T02:09:00Z
Sudachi version 0.7.3
<p>This is a support release for the Elasticsearch/OpenSearch integration 3.1.0 release.</p>
<h1>Highlights</h1>
<ul>
<li>Added <code>Config.fromResource</code> method for reading Configs via PathAnchor. (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1770684229" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/212" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/212/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/212">#212</a>)</li>
</ul>
<h1>Internals</h1>
<ul>
<li>Plugin classloading is done by PathAnchor and supports multiple classloaders (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1742902128" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/210" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/210/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/210">#210</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1716593229" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/209" data-hovercard-type="issue" data-hovercard-url="/WorksApplications/Sudachi/issues/209/hovercard" href="https://github.com/WorksApplications/Sudachi/issues/209">#209</a>)</li>
</ul>
<h1>Notes about v0.7.2</h1>
<p>Release v0.7.2 contains a subset of the functionality of this release but did not contain crucial features. It is not a broken release, but there are no user-visible changes from v0.7.1.</p>
github-actions[bot]
tag:github.com,2008:Repository/100921897/v0.7.2
2023-06-15T02:25:54Z
v0.7.2
<p>bump version -> v0.7.2</p>
eiennohito
tag:github.com,2008:Repository/100921897/v0.7.1
2023-03-09T09:51:28Z
Sudachi version 0.7.1
<p>This is a maintenance release</p>
<h1>Highlights</h1>
<ul>
<li>Fixed analysis truncation that occurred when sentence splitting was enabled and the input contained no data that could be split into sentences</li>
<li>Fixed O(N^2) performance in sentence splitting when the underlying reader does not fill the buffer fully at once</li>
<li>Stopped calling into the reader when the buffer is already full</li>
</ul>
github-actions[bot]
tag:github.com,2008:Repository/100921897/v0.6.4
2023-03-09T09:51:14Z
0.6.4
<p>This is a maintenance release</p>
<h1>Highlights</h1>
<ul>
<li>Fixed analysis truncation that occurred when sentence splitting was enabled and the input contained no data that could be split into sentences</li>
<li>Fixed O(N^2) performance in sentence splitting when the underlying reader does not fill the buffer fully at once</li>
<li>Stopped calling into the reader when the buffer is already full</li>
</ul>
github-actions[bot]
tag:github.com,2008:Repository/100921897/v0.6.3
2022-08-29T12:50:39Z
Sudachi version 0.6.3
<p>Port relaxed boundary mode from 0.7.0 while keeping ABI compatibility with pre-0.7.0 versions.</p>
github-actions[bot]
tag:github.com,2008:Repository/100921897/v0.7.0
2022-08-16T03:00:38Z
Sudachi version 0.7.0
<h1>Highlights</h1>
<ul>
<li>The <code>Tokenizer.tokenize</code> API now returns <code>MorphemeList</code> instead of <code>List&lt;Morpheme&gt;</code>. This change is ABI-incompatible with previous versions, so applications that use Sudachi <strong>require recompilation</strong>. The change should be source-compatible: no changes should be required in source code that uses Sudachi.</li>
<li>New API: <code>MorphemeList.split</code> re-splits a C-mode token sequence into lower-level units without re-analyzing the whole string.</li>
<li>Added relaxed boundary matching mode for Regex OOV handler</li>
</ul>
github-actions[bot]
tag:github.com,2008:Repository/100921897/v0.6.2
2022-06-21T01:05:24Z
Sudachi version 0.6.2
<h1>Highlights</h1>
<ul>
<li>Fixed invalid POS tags which appeared when using user-defined POS tags both in user dictionaries and OOV handlers. You are not affected by this bug if you did not use user-defined POS in OOV handlers.</li>
</ul>
github-actions[bot]
tag:github.com,2008:Repository/100921897/v0.6.1
2022-06-10T08:45:35Z
Sudachi version 0.6.1
<h1>Highlights</h1>
<ul>
<li><strong>DO NOT USE 0.6.0, IT IS INCOMPATIBLE WITH 0.6.1</strong></li>
<li>Regex OOV plugin has configurable maximum token length</li>
<li>SettingsAnchor is renamed to PathAnchor to make its purpose clearer</li>
<li>Added useful Config methods, e.g. for the common case of loading the default configuration with a provided PathAnchor to resolve default paths in another directory.</li>
<li>Filesystem-based PathAnchor now works correctly when a SecurityManager is present (e.g. in Elasticsearch).</li>
</ul>
<h2>Regex OOV length</h2>
<p>Use the <code>maxLength</code> field of the plugin configuration object to set the maximum allowed length in UTF-8 bytes (32 by default). The unit will change to Unicode code points in the future.</p>
github-actions[bot]
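<p>The difference between the two units is easy to underestimate for Japanese text. The following JDK-only sketch compares them; it does not call the plugin, and the class and helper names are illustrative assumptions. Katakana such as スダチ takes 3 UTF-8 bytes per character, so a 32-byte limit admits at most 10 such characters.</p>

```java
import java.nio.charset.StandardCharsets;

public class LengthUnitsDemo {
    // Length of `s` when encoded as UTF-8, in bytes.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Length of `s` in Unicode code points (not Java chars).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String kana = "\u30B9\u30C0\u30C1"; // katakana spelling of "sudachi"
        String ascii = "sudachi-0.6.1";     // ASCII only
        String lemon = "\uD83C\uDF4B";      // U+1F34B, outside the BMP

        System.out.println(utf8Bytes(kana) + " bytes, " + codePoints(kana) + " code points");   // 9 bytes, 3 code points
        System.out.println(utf8Bytes(ascii) + " bytes, " + codePoints(ascii) + " code points"); // 13 bytes, 13 code points
        System.out.println(utf8Bytes(lemon) + " bytes, " + codePoints(lemon) + " code points"); // 4 bytes, 1 code points
    }
}
```

<p>Note that <code>String.length()</code> matches neither unit in general: it counts UTF-16 code units, which is 2 for the last string above.</p>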