tag:github.com,2008:https://github.com/WorksApplications/Sudachi/releases Release notes from Sudachi 2024-11-05T05:37:57Z tag:github.com,2008:Repository/100921897/v0.7.5 2024-11-05T05:55:27Z Sudachi version 0.7.5 <h1>Highlights</h1> <ul> <li>The behavior of the dictionary printer and builder has changed (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2439554903" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/234" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/234/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/234">#234</a>) <ul> <li><code>DictionaryPrinter</code> now prints word references in the (Surface, POS, Reading) triple format instead of the line number format.</li> <li><code>DictionaryBuilder</code> now allows the dictionary form to be written in the triple format, not only the line number format.</li> </ul> </li> </ul> <h1>Added</h1> <ul> <li>Benchmark scripts are added (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2502361366" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/235" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/235/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/235">#235</a>)</li> </ul> <h1>Fixed</h1> <ul> <li>Tutorial and readme are updated (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2588157902" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/237" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/237/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/237">#237</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2634438355" data-permission-text="Title is private" 
data-url="https://github.com/WorksApplications/Sudachi/issues/240" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/240/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/240">#240</a>)</li> <li><code>Config.Resource.asByteBuffer</code> now always returns a ByteBuffer with little-endian byte order (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2610297806" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/239" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/239/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/239">#239</a>) <ul> <li><code>StringUtil.readAllBytes</code> now also returns a ByteBuffer with little-endian byte order.</li> </ul> </li> </ul> github-actions[bot] tag:github.com,2008:Repository/100921897/v0.7.4 2024-07-02T07:27:42Z Sudachi version 0.7.4 <h1>Highlights</h1> <ul> <li>Added <code>Tokenizer.lazyTokenizeSentences(SplitMode mode, Readable input)</code>, which performs analysis lazily and reduces memory usage (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2369244789" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/231" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/231/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/231">#231</a>) <ul> <li><code>Tokenizer.tokenizeSentences(SplitMode mode, Reader input)</code> is marked as deprecated.</li> </ul> </li> </ul> <h1>Fixed</h1> <ul> <li>Do not segfault when tokenizing with a closed dictionary (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1885138167" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/217" data-hovercard-type="pull_request" 
data-hovercard-url="/WorksApplications/Sudachi/pull/217/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/217">#217</a>)</li> <li>The default config sudachi.json sets the non-existent property joinKanjiNumeric in JoinNumericPlugin (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2136761701" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/221" data-hovercard-type="issue" data-hovercard-url="/WorksApplications/Sudachi/issues/221/hovercard" href="https://github.com/WorksApplications/Sudachi/issues/221">#221</a>)</li> <li>Fix incorrect size calculation when expanding (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2277341109" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/227" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/227/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/227">#227</a>)</li> <li>Update tutorial.md (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="2237023218" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/226" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/226/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/226">#226</a>)</li> </ul> github-actions[bot] tag:github.com,2008:Repository/100921897/v0.7.3 2023-06-26T02:09:00Z Sudachi version 0.7.3 <p>This is a support release for the Elasticsearch/OpenSearch integration 3.1.0 release.</p> <h1>Highlights</h1> <ul> <li>Added <code>Config.fromResource</code> method for reading Configs via PathAnchor. 
(<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1770684229" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/212" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/212/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/212">#212</a>)</li> </ul> <h1>Internals</h1> <ul> <li>Plugin classloading is done by PathAnchor and supports multiple classloaders (<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1742902128" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/210" data-hovercard-type="pull_request" data-hovercard-url="/WorksApplications/Sudachi/pull/210/hovercard" href="https://github.com/WorksApplications/Sudachi/pull/210">#210</a>, <a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="1716593229" data-permission-text="Title is private" data-url="https://github.com/WorksApplications/Sudachi/issues/209" data-hovercard-type="issue" data-hovercard-url="/WorksApplications/Sudachi/issues/209/hovercard" href="https://github.com/WorksApplications/Sudachi/issues/209">#209</a>)</li> </ul> <h1>Notes about v0.7.2</h1> <p>Release v0.7.2 contains a subset of the functionality of this release but lacks crucial features. 
It is not a broken release, but there are no user-visible changes from v0.7.1.</p> github-actions[bot] tag:github.com,2008:Repository/100921897/v0.7.2 2023-06-15T02:25:54Z v0.7.2 <p>bump version -&gt; v0.7.2</p> eiennohito tag:github.com,2008:Repository/100921897/v0.7.1 2023-03-09T09:51:28Z Sudachi version 0.7.1 <p>This is a maintenance release.</p> <h1>Highlights</h1> <ul> <li>Fixed analysis truncation that occurred when analyzing with sentence splitting and the input did not contain data which can be treated as splittable sentences</li> <li>Fixed O(N^2) performance in sentence splitting when the underlying reader does not fill the buffer fully at once</li> <li>Stop calling into the reader with a full buffer</li> </ul> github-actions[bot] tag:github.com,2008:Repository/100921897/v0.6.4 2023-03-09T09:51:14Z 0.6.4 <p>This is a maintenance release.</p> <h1>Highlights</h1> <ul> <li>Fixed analysis truncation that occurred when analyzing with sentence splitting and the input did not contain data which can be treated as splittable sentences</li> <li>Fixed O(N^2) performance in sentence splitting when the underlying reader does not fill the buffer fully at once</li> <li>Stop calling into the reader with a full buffer</li> </ul> github-actions[bot] tag:github.com,2008:Repository/100921897/v0.6.3 2022-08-29T12:50:39Z Sudachi version 0.6.3 <p>Port relaxed boundary mode from 0.7.0 while keeping ABI compatibility with pre-0.7.0 versions.</p> github-actions[bot] tag:github.com,2008:Repository/100921897/v0.7.0 2022-08-16T03:00:38Z Sudachi version 0.7.0 <h1>Highlights</h1> <ul> <li><code>Tokenizer.tokenize</code> API returns <code>MorphemeList</code> instead of <code>List&lt;Morpheme&gt;</code>. This change is ABI-incompatible with previous versions and applications which use Sudachi <strong>require recompilation</strong>. 
The change should be source-compatible with no changes required to the source code which uses Sudachi.</li> <li>New API: <code>MorphemeList.split</code>: resplit a C-mode token sequence to a lower level without re-analyzing the whole string.</li> <li>Added relaxed boundary matching mode for Regex OOV handler</li> </ul> github-actions[bot] tag:github.com,2008:Repository/100921897/v0.6.2 2022-06-21T01:05:24Z Sudachi version 0.6.2 <h1>Highlights</h1> <ul> <li>Fixed invalid POS tags which appeared when using user-defined POS tags both in user dictionaries and OOV handlers. You are not affected by this bug if you did not use user-defined POS in OOV handlers.</li> </ul> github-actions[bot] tag:github.com,2008:Repository/100921897/v0.6.1 2022-06-10T08:45:35Z Sudachi version 0.6.1 <h1>Highlights</h1> <ul> <li><strong>DO NOT USE 0.6.0, IT IS INCOMPATIBLE WITH 0.6.1</strong></li> <li>Regex OOV plugin has configurable maximum token length</li> <li>SettingsAnchor renamed to PathAnchor to make its purpose clearer</li> <li>Add useful Config methods, e.g. for a common case of loading default configuration with provided PathAnchor to resolve default paths in another directory.</li> <li>Filesystem-based PathAnchor now plays correctly with SecurityManager present (e.g. in Elasticsearch).</li> </ul> <h2>Regex OOV length</h2> <p>Use the <code>maxLength</code> field of the plugin configuration object to set the maximum allowed length, in UTF-8 bytes (by default 32). The unit will change to Unicode code points in the future.</p> github-actions[bot]
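<p>The Regex OOV length note for 0.6.1 measures <code>maxLength</code> in UTF-8 bytes, which differs noticeably from a code point count for Japanese text. The following is a minimal sketch of that difference using only the Java standard library. It does not reproduce the plugin's internal check: the class <code>MaxLengthUnits</code> and its methods are hypothetical names introduced for illustration.</p>

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Sudachi: compares the two length units
// mentioned in the 0.6.1 note (UTF-8 bytes now, code points in the future).
class MaxLengthUnits {
    // Length of the text when encoded as UTF-8, in bytes.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points in the text.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Assumed reading of the limit: a token fits if its UTF-8 size
    // does not exceed maxLength.
    static boolean fitsUtf8Limit(String s, int maxLength) {
        return utf8Bytes(s) <= maxLength;
    }

    public static void main(String[] args) {
        String text = "東京都"; // three kanji, three UTF-8 bytes each
        System.out.println(utf8Bytes(text));         // prints 9
        System.out.println(codePoints(text));        // prints 3
        System.out.println(fitsUtf8Limit(text, 32)); // prints true
    }
}
```

<p>Under this reading, the default of 32 bytes leaves room for at most ten kanji of three UTF-8 bytes each, while the same limit counted in code points would allow 32 of them.</p>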