BasicCAT — Computer-Aided Translation (CAT) Tools BasicCAT Website https://www.basiccat.org/ Tue, 17 Mar 2026 13:24:41 +0000 Tue, 17 Mar 2026 13:24:41 +0000 Jekyll v3.10.0 How to Detect Text Colors in Images <p>Detecting text colors in images, including background colors and stroke colors, can be used for layout restoration, translation, and other operations.</p> <p>Original image:</p> <p><img src="/album/text-color-detection/source.jpg" alt="Original image" /></p> <p>Translated image:</p> <p><img src="/album/text-color-detection/translated.jpg" alt="Translated image" /></p> <h2 id="implementation-principle">Implementation Principle</h2> <p>For background colors, KMeans can be used to cluster colors and identify the dominant colors in the image.</p> <p>For text colors, contour detection can be performed to calculate the average pixel value of the text contour and exclude pixels that are similar to the background color.</p> <p><img src="/album/text-color-detection/plain-color.jpg" alt="Plain color background" /></p> <p>However, if the background is complex and stroke colors or the color of each character need to be extracted, traditional image processing methods may not perform well. In such cases, convolutional neural networks can be used for extraction.</p> <p>The computer-assisted image translation software <a href="/imagetrans/">ImageTrans</a> integrates the algorithms mentioned above.</p> <h2 id="examples">Examples</h2> <p>Below are some operational examples.</p> <h3 id="example-1">Example 1</h3> <p>In many CG images, different colors are often used to distinguish dialogues between different characters.</p> <p><img src="/album/text-color-detection/example.jpg" alt="Example" /></p> <p>ImageTrans can detect stroke colors and text colors. 
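The KMeans idea from the Implementation Principle section can be sketched in a few lines of pure Python (a toy Lloyd's k-means over a synthetic pixel list; ImageTrans's actual implementation is not published, so this only illustrates the principle):

```python
def dominant_colors(pixels, k=2, iters=10):
    """Lloyd's k-means over RGB tuples. Returns centroids ordered by
    cluster size, so the first entry approximates the background color."""
    # naive init: seed centroids with the first k distinct colors
    centroids = list(dict.fromkeys(pixels))[:k]
    k = len(centroids)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            # assign each pixel to its nearest centroid (squared RGB distance)
            j = min(range(k), key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # recompute centroid as the channel-wise mean
                centroids[j] = tuple(sum(ch) / len(members)
                                     for ch in zip(*members))
    order = sorted(range(k), key=lambda c: len(clusters[c]), reverse=True)
    return [centroids[c] for c in order]

# synthetic "image": 90% near-white background, 10% dark text pixels
pixels = [(250, 250, 250)] * 90 + [(20, 20, 20)] * 10
background = dominant_colors(pixels)[0]
```

With k = 2 on a text-on-background crop, the larger cluster's centroid approximates the background color and the smaller one the text color.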
Below are the raw recognition results.</p> <p><img src="/album/text-color-detection/detected.jpg" alt="Detection results" /></p> <p>As can be seen, the text colors and stroke colors have been detected, but they are not very accurate.</p> <p>In the project settings, we can predefine several styles and specify the colors used for each style.</p> <p><img src="/album/text-color-detection/project-settings.jpg" alt="Project settings" /></p> <p>Afterward, perform a color matching operation to match the styles based on the text colors.</p> <p><img src="/album/text-color-detection/workflow.jpg" alt="Workflow" /></p> <p>This resolves the issue of inaccuracies in the detected colors.</p> <p><img src="/album/text-color-detection/adjusted.jpg" alt="Adjusted results" /></p> <h3 id="example-2">Example 2</h3> <p>Some texts use rich text formatting, where a single line of text may contain multiple colors. ImageTrans supports recognizing the style of each character and outputting results with rich text tags.</p> <p><img src="/album/text-color-detection/inline-text.jpg" alt="Inline styles" /></p> Sun, 15 Mar 2026 08:28:50 +0000 https://www.basiccat.org/how-to-detect-text-color-in-image/ https://www.basiccat.org/how-to-detect-text-color-in-image/ imagetrans blog How to Convert Traditional Chinese PDF from Vertical to Horizontal and Enhance Text Clarity <p>I recently came across several older Taiwanese academic papers and found that they were all black-and-white scanned PDFs with vertical Traditional Chinese text. The text was so blurry and hard to read that I worried it might worsen my myopia. So, I came up with some ways to optimize the reading experience. For users in Mainland China, converting the vertical text to horizontal format makes it much easier to read. However, sometimes preserving the original layout is better. So I also tried using OCR to recognize the text, remove the original content, and reformat it with clear, readable text. 
Below is a demonstration of the results and how to do it.</p> <h2 id="demonstration-of-results">Demonstration of Results</h2> <p>Original file (<a href="https://kmweb.moa.gov.tw/redirect_files.php?id=98635">PDF</a>):</p> <p><img src="/album/vertical-text-PDF/original.png" alt="Original file" /></p> <p>Converted to horizontal markdown format:</p> <div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>卷五十第 期一十二第 <span class="gu">## 那一個甘蔗品種好?</span> <span class="gu">## 選擇蔗種的基本原則</span> 選擇甘蔗品種的基本原則,是「適地適種」,而「適地」的意義,並不是單指適合的土壤的意思,還包括蔗園所在地點的一切環境條件,諸如氣候狀況、灌溉條件、肥料多少、勞力充份否,間作不間作,留宿根不留等等,都要考慮到.只有適合種蔗環境的品種,才能有好的收成,所以在選擇品種之前,首先要明白所經營的蔗園的環境.下面幾點是主要的考慮對象: (1)土質和地力:土質的粘重或輕鬆,主要關係土壤水分的保持與喪失.粘重的土壤,保水力较强,纵然在灌溉水不够充分的地方,也能栽培比較不耐旱的品種,但是粘重的蔗園容易排水不良,根系太深,不耐浸的品种就发育不良.土质轻松的土壤適恰相反,不適合栽培不耐旱品種,而適應根系深的品種. 甘蔗品種的耐肥性是個個不同的,有的適合在好地方的蔗園,有的在地方好的蔗園反而減產.所以地方的肥沃瘦瘠也需要密切注意. (2)灌溉的條件:要說甘蔗是水做的,一點也沒有錯,一支甘蔗有七〇%以上是水.因之,水成了種蔗的必具條件,只是甘蔗品種裏有些特別喜愛水一些,缺了水,產量隨即降低,另一些對水淡漠一些,早一點影響產量不大.選擇品種的時候,不能不顧到這點. (3)宿根:每一個品種的宿根能力多有不同,有些可以宿根,有些宿根产量很低,留宿根化不來,要是選錯了就要自己吃虧. (4)植期不同:有些品种只能作为秋植,而且必須早秋植;延遲種植,產量降低.另一些品種又只適合春植,作為秋植時並不理想.而二期糊仔有二期糊仔的品種,一期糊仔有一期糊仔的品種, 都混雜不得的. (5)勞力的多少:有的蔗友家中勞力很多,或者財力雄厚,足以招雇够量的勞力來從事精耕栽培,那麼就應該選擇適合精耕栽培的品種.否則,應該選擇在粗放耕作時仍有相當產量的品種. (6)病蟲害問題:現在推廣的品種,大致說對於主要的病蟲害都是抵抗的,但在容易發生病蟲害的環境如玉米栽培地區,有些品種比較容易發生露菌病,應該避免栽培. <span class="gu">## 目前推廣的優良蔗種</span> 談完了選擇品種時考慮的主要種蔗環境條件,現在來選擇品種,首先要介紹一些最近由糖業試驗所育成,在推廣中的甘蔗新品種. F一四六已經推廣三個年期了,大家對它都已十分認識,它是一個中大莖,多分櫱的品種.初期生育緩慢,而在次年高溫多水的季節迅F一四六速生長.原料莖很多,只是短一點,適合在地力中等以上,有水灌溉,不很粘重的地方作为早秋植,F一四六可以間作,宿根也不錯,最高的產量可以達到三十萬公斤. F一四八作為春植或晚秋植時,它增產能力比在早秋植時高,同時也不適合二期糊仔.蔗莖又細小一點,所以蔗友們不很歡迎.事實F一四八上,用F一四八作晚植蔗園的品種,它的產量比F一四六,和F一五二安定.而且在所有的新品種中,F一四八是糖份最高的一個,也是最不怕露菌病的品種,又因為它很早熟,種了它,多半是在製糖季初收穫,蔗友們可以提前利用種蔗後的土地. F一四八也不適合在粘重或過份輕鬆,以及乾旱的地方栽培,間作和宿根的條件不及F一四六. <span class="gu">## F一五一</span> F一五一適合在粘土蔗園秋植及二期糊仔 ,在溫度較低,土壤較乾的時候,萌 芽比其他新品種好.它是中莖種,原料莖比F一四六細而長,也比F一四六容易倒伏,但不易开花;和F一四六同样地晚熟,步留也差不多.露菌病的感染程度較重.F一五一的新植產量在輕鬆的壤土裏不及F一四六,在粘重的土壤則勝過F一四六.宿根的產量和F一四六相仿. F一五一不能作為晚植蔗園的品種. 
F一五二是地力肥沃,灌溉水充沛蕉園的增產王牌,它的原料莖高而粗,只要保持適當的原料莖數,很容易的得上二十萬公斤以上的F一五一甲當產量.不過在灌溉水缺乏,地力不夠而又沒有辦法施用重肥的蔗園,F一五二沒有增產的希望. 七、八月種植的F一五二容易母莖徒長,因此不如九月種植的蔗園產量穩定.F一五二是個好<span class="sb"> </span><span class="gu">## ·紹介者作期本·</span> ▼湯冠雄:曾任臺灣糖業試驗所種藝系技師,現任該所虎尾蔗作改良場場長,從事甘蔗栽培研究已十七年. <span class="p">![</span><span class="nv">图片</span><span class="p">](</span><span class="sx">)</span> 夏輔禹:安東省岡城縣人,臺灣省立農學院畢業,現任糖業試驗所虎尾蔗作改良場研究組組長,一直擔任蔗田間作試驗方面工作. <span class="p">![</span><span class="nv">图片</span><span class="p">](</span><span class="sx">)</span> </code></pre></div></div> <p>A clear version preserving the original layout (Image + PDF):</p> <p><img src="/album/vertical-text-PDF/reconstructed.png" alt="Clear version" /></p> <p>Text-based PDF with vector text that remains sharp even when zoomed in:</p> <p><a href="/album/vertical-text-PDF/reconstructed-pdf.pdf">File Link</a></p> <h2 id="how-to-guide">How-to Guide</h2> <p>This requires using the OCR software <a href="/imagetrans/">ImageTrans</a>.</p> <ol> <li>Use the built-in bubble detection feature to detect vertical text lines.</li> <li>Use a large language model for OCR. Since the text here is quite blurry, only large language models can achieve good results. I chose the <code class="language-plaintext highlighter-rouge">qwen3-vl-235b-a22b-instruct</code> model for this task.</li> <li>Use PaddleOCR’s DocLayout panel detection model to identify text paragraphs and sort the text lines from right to left according to the reading order.</li> <li>Specify text styles for both vertical and horizontal text. Then, determine which text is horizontal and which is vertical based on the aspect ratio.</li> <li>Export as PDF.</li> <li>Next, export as markdown. 
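The paragraph merging done during markdown export can be sketched as a simple join that drops layout-only line breaks. This is a hedged approximation: the CJK check below is a crude code-point range test, not ImageTrans's actual logic.

```python
def merge_paragraph(lines):
    """Join OCR text lines into one paragraph, dropping layout-only
    line breaks. CJK characters join directly; Latin words get a space."""
    def cjk(ch):
        # ideographs plus CJK symbols / fullwidth punctuation blocks
        return "\u3000" <= ch <= "\u9fff" or "\uff00" <= ch <= "\uffef"
    merged = ""
    for line in (l.strip() for l in lines):
        if not line:
            continue
        if merged and not (cjk(merged[-1]) or cjk(line[0])):
            merged += " "
        merged += line
    return merged

# lines from a vertical-text paragraph, already sorted into reading order
para = merge_paragraph(["選擇甘蔗品種的基本原則,", "是「適地適種」"])
```

The same function leaves a space between Latin words, so mixed-language exports still read naturally.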
Here, we can merge text lines by paragraph to remove unnecessary line breaks.</li> </ol> <p>Software main interface:</p> <p><img src="/album/vertical-text-PDF/imagetrans.jpg" alt="Software main interface" /></p> <p>Displaying panels and order:</p> <p><img src="/album/vertical-text-PDF/imagetrans-order-mode.jpg" alt="Displaying panels and order" /></p> Sun, 08 Mar 2026 11:32:50 +0000 https://www.basiccat.org/how-to-convert-Traditional-Chinese-PDF-from-vertical-to-horizontal-and-enhance-text-clarity/ https://www.basiccat.org/how-to-convert-Traditional-Chinese-PDF-from-vertical-to-horizontal-and-enhance-text-clarity/ imagetrans blog How to Scan Negatives <p>Photos taken with old film cameras usually leave behind negatives (film) after being developed and printed at a photo lab. I recently came across quite a few of these negatives. The negatives contain a type of silver salt that is highly light-sensitive. Standard negatives are 35mm wide and 24mm high, and the developed content is inverted.</p> <p>Because negatives are very small and have an additional orange mask (color cast), digitizing them is not an easy task. A common method is to use a professional film scanner. If a scanner isn’t available, you can also use a mobile phone or a camera to “scan” them by photographing.</p> <h2 id="photographing-with-a-camera">Photographing with a Camera</h2> <ol> <li> <p>Negatives are transparent and require transmitted light to reveal the image. You can prepare a white light source, such as a phone screen or a ceiling light.</p> <p><img src="/album/negative/negative_on_screen.jpg" alt="Negative on a phone screen" /></p> </li> <li> <p>Use a macro lens to take the photo. 
For a mobile phone, you can attach an external macro lens.</p> <p><img src="/album/negative/macro_lens.jpg" alt="Macro lens attached to a phone" /></p> </li> <li> <p>After shooting, use software like Adobe Photoshop for further processing: inverting the colors and using the “Auto Color” function to remove the orange mask.</p> <p><img src="/album/negative/invert.jpg" alt="Inverted image" /></p> <p><img src="/album/negative/auto_color_result.jpg" alt="Result after auto color adjustment" /></p> </li> </ol> <p>Since a mobile phone was used here, there might be issues like insufficient sharpness, inaccurate colors, and screen moiré patterns. Using a professional camera would yield much better results, potentially offering greater sharpness than some scanners. Additionally, professional cameras can save files in raw format, which is more convenient for color correction.</p> <h2 id="scanning-with-a-scanner">Scanning with a Scanner</h2> <p>Choose a flatbed scanner that supports scanning negatives, such as the Epson V850. The one I used here is an Epson V300, which comes with a transparency unit and film holders, making it suitable for scanning negatives.</p> <p><img src="/album/negative/epson-v300.jpg" alt="Epson V300 scanner" /></p> <p>You can use the manufacturer’s software, like Epson Scan, to scan. 
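The invert and auto-color steps from the camera workflow above can be approximated in code. Here is a toy sketch on bare RGB tuples, with a per-channel contrast stretch standing in for Photoshop's "Auto Color" (the real tool is considerably more sophisticated):

```python
def invert(pixels):
    """A developed color negative stores an inverted image: flip every channel."""
    return [tuple(255 - c for c in p) for p in pixels]

def auto_color(pixels):
    """Crude stand-in for Photoshop's Auto Color: stretch each channel
    to the full 0-255 range, which also removes a uniform color cast."""
    lo = [min(p[i] for p in pixels) for i in range(3)]
    hi = [max(p[i] for p in pixels) for i in range(3)]
    def stretch(v, i):
        return 0 if hi[i] == lo[i] else round((v - lo[i]) * 255 / (hi[i] - lo[i]))
    return [tuple(stretch(p[i], i) for i in range(3)) for p in pixels]

# two sample pixels from a negative with an orange-ish mask
negative = [(230, 180, 120), (180, 120, 60)]
positive = auto_color(invert(negative))
```

Stretching each channel independently is what pulls out the orange mask, since the mask shifts every channel's minimum and maximum by roughly the same amount.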
It can automatically crop negatives, invert colors, and remove the orange mask.</p> <p><img src="/album/negative/epson-scan.jpg" alt="Epson Scan software interface" /></p> <p>If you prefer to obtain only the raw scan data, you can use software like <a href="/imagetrans/">ImageTrans</a> to control the scanner via interfaces such as TWAIN, ICA, or SANE.</p> <p>Afterwards, you can use ImageTrans or other software for cropping, inverting, and color correction.</p> <p>Scanning using ImageTrans:</p> <p><img src="/album/negative/imagetrans_transparency_unit.jpg" alt="ImageTrans scanning interface showing transparency unit option" /></p> <p>Result after cropping, inverting, and color correction using ImageTrans:</p> <p><img src="/album/negative/imagetrans_processed_photo.jpg" alt="ImageTrans processed photo result" /></p> Fri, 27 Feb 2026 12:28:50 +0000 https://www.basiccat.org/how-to-scan-negatives/ https://www.basiccat.org/how-to-scan-negatives/ imagetrans blog Reverse Engineering of Document Scanner Protocols <p>Scanners are typically accessed through interfaces such as SCSI, USB, and networks. To control a scanner to perform scanning tasks and return scanned data, a set of standards must be followed. Modern scanners generally support the eSCL standard for network connections. However, not all scanners comply with this standard; manufacturers often define their own protocols and provide drivers for applications to use.</p> <p>But drivers are not always available for all operating systems and CPU architectures. As a result, some people have reverse-engineered these protocols, enabling the use of scanners without official drivers. Examples of such software include the open-source SANE and commercial options like ExactScan and VueScan. Additionally, writing custom drivers allows precise control over scanners, such as adjusting colors, enabling or disabling paper jam detection sensors, and previewing scan results in real time. 
Moreover, with browsers now supporting USB device control, it is even possible to operate scanners directly within a browser.</p> <p>Below are some methods for reverse engineering scanners:</p> <ol> <li><strong>Directly Leverage SANE’s Source Code</strong>: SANE has reverse-engineered most scanners. By studying its code, we can understand the communication methods of various scanners.</li> <li> <p><strong>Capture USB Traffic in an Environment with Working Drivers</strong>: For example, here’s how to do it on Linux:</p> <p>Start capturing:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>modprobe usbmon <span class="nb">sudo </span>tcpdump <span class="nt">-i</span> usbmon2 <span class="nt">-w</span> scan.pcap </code></pre></div> </div> <p>Use SANE to scan a document:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>scanimage <span class="nt">-o</span> t.jpg <span class="nt">-l</span> 0 <span class="nt">-t</span> 0 <span class="nt">-x</span> 5 <span class="nt">-y</span> 5 </code></pre></div> </div> <p>Then press Ctrl+C to stop the capture and save the results. Finally, analyze the capture in Wireshark.</p> <p>This method requires strong analytical skills regarding the USB protocol and is relatively challenging.</p> </li> </ol> <p>Using Qoder, I had the AI write a Python program based on SANE’s Pixma driver to control the Canon Lide 300 via Python and libusb. I tested it, and it works: <a href="https://github.com/xulihang/Canon-Lide-300-Python-USB-Driver">https://github.com/xulihang/Canon-Lide-300-Python-USB-Driver</a>.</p> <p>For general users, it’s simpler to use existing scanning software, such as <a href="/imagetrans/">ImageTrans</a>, which supports various scanning APIs like TWAIN, WIA, ICA, SANE, and eSCL. It allows users to operate most scanners across different operating systems. 
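For eSCL-capable scanners specifically, little reverse engineering is needed, since the protocol is plain HTTP plus XML. Below is a sketch of building a scan request; the endpoint path, namespaces, and element names are my recollection of the Mopria eSCL specification, so verify them against your device's /eSCL/ScannerCapabilities response.

```python
import xml.etree.ElementTree as ET

# eSCL namespaces as I recall them from the Mopria spec -- treat these
# (and the element names below) as assumptions to verify against your
# scanner's /eSCL/ScannerCapabilities response.
ESCL = "http://schemas.hp.com/imaging/escl/2011/05/03"
PWG = "http://www.pwg.org/schemas/2010/12/sm"

def scan_settings(resolution=300, color_mode="RGB24"):
    """Build the XML body POSTed to http://<scanner>/eSCL/ScanJobs."""
    ET.register_namespace("scan", ESCL)
    ET.register_namespace("pwg", PWG)
    root = ET.Element(f"{{{ESCL}}}ScanSettings")
    ET.SubElement(root, f"{{{PWG}}}Version").text = "2.0"
    ET.SubElement(root, f"{{{PWG}}}InputSource").text = "Platen"
    ET.SubElement(root, f"{{{ESCL}}}ColorMode").text = color_mode
    ET.SubElement(root, f"{{{ESCL}}}XResolution").text = str(resolution)
    ET.SubElement(root, f"{{{ESCL}}}YResolution").text = str(resolution)
    return ET.tostring(root, encoding="unicode")

body = scan_settings()
# A real client would POST `body` to the scanner, poll the returned job
# URL for status, then GET <job>/NextDocument to fetch the image data.
```

Because the transport is ordinary HTTP, this kind of client works from any language or platform without a driver, which is exactly why eSCL needs no reverse engineering.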
Scanned documents can then be further processed for tasks like translation, OCR, and generating searchable PDFs.</p> Fri, 27 Feb 2026 02:42:50 +0000 https://www.basiccat.org/reverse-engineer-document-scanner/ https://www.basiccat.org/reverse-engineer-document-scanner/ imagetrans blog Compiling SANE for macOS <p>I recently purchased a Fujitsu Fi-6130 scanner for just over 500 RMB, which is a relatively affordable sheet-fed scanner. I wanted to use it on macOS, but I found that for older, low-end scanners like this, the official drivers are no longer provided. The only options are to use SANE or VueScan.</p> <p>SANE can be installed via Homebrew, but since I wanted to compile a version that is easy to distribute, I decided to recompile it myself. Here are the steps I took.</p> <ol> <li> <p>Install Homebrew.</p> </li> <li> <p>Install the necessary dependencies.</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew <span class="nb">install </span>autoconf automake libtool gettext git pkg-config libusb libjpeg </code></pre></div> </div> </li> <li> <p>Download the source code package from the SANE official website.</p> </li> <li> <p>Run the following commands to compile:</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./autogen.sh ./configure <span class="nt">--prefix</span><span class="o">=</span>/usr/local <span class="se">\ </span> <span class="nv">CPPFLAGS</span><span class="o">=</span><span class="s2">"-I/usr/local/include -I/opt/homebrew/include"</span> <span class="se">\ </span> <span class="nv">LDFLAGS</span><span class="o">=</span><span class="s2">"-L/usr/local/lib -L/opt/homebrew/lib"</span> make make <span class="nb">install</span> </code></pre></div> </div> </li> <li> <p>If you need to distribute it to an environment without Homebrew, you can use <code class="language-plaintext highlighter-rouge">otool</code> and <code class="language-plaintext 
highlighter-rouge">install_name_tool</code> to modify the paths.</p> </li> </ol> <p>The macOS version of <a href="/imagetrans/">ImageTrans</a> has already integrated SANE, allowing you to directly use scanners like the old fi-6130 for scanning.</p> Tue, 24 Feb 2026 02:42:50 +0000 https://www.basiccat.org/compile-sane-for-macos/ https://www.basiccat.org/compile-sane-for-macos/ imagetrans blog Overview of Document Scanning Interfaces <p>Modern document scanners emerged around the 1980s. To connect scanners with computers, many document scanning APIs were developed: TWAIN, ICA, SANE, WIA, eSCL, and so on. This article provides an overview of these protocols.</p> <h2 id="twain">TWAIN</h2> <p>Scanner manufacturers provide specialized scanning software for mainstream operating systems, typically Windows and macOS. TWAIN is a universal interface for calling the manufacturer-provided software to perform scanning. It is strongly tied to the user interface (UI); although the default scanning interface can be hidden, different UIs may still appear during operations.</p> <p>Because it directly calls the manufacturer-provided software, it offers extensive capabilities, such as acquiring scanned images line by line, detecting barcodes within images, image enhancement, and more.</p> <p>Using TWAIN to call Epson Scan for scanning:</p> <p><img src="/album/document-scanning-api/epson-scan.jpg" alt="epson twain" /></p> <p>TWAIN is primarily used on Windows.</p> <h2 id="wia">WIA</h2> <p>WIA is the officially supported interface for image acquisition devices on Windows. After installing the scanner driver, scanning can be performed through the Windows Fax and Scan application.</p> <p><img src="/album/document-scanning-api/windows-fax.jpg" alt="windows fax" /></p> <p>It can also be called programmatically via COM. You can show the UI or acquire the image silently. 
The UI, when shown, is a unified one provided by WIA itself.</p> <p><img src="/album/document-scanning-api/wia.jpg" alt="wia" /></p> <p>When using WIA, you cannot use the manufacturer’s specialized scanning software.</p> <h2 id="ica">ICA</h2> <p>ICA is the official interface provided by Apple. After installing the dedicated ICA driver, a customized scanning interface becomes available in the Image Capture application. It also supports calling via the interface without displaying the UI.</p> <p><img src="/album/document-scanning-api/imagecapture.jpg" alt="ica" /></p> <h2 id="sane">SANE</h2> <p>SANE is the primary scanning interface on Unix-like systems (mainly Linux). It can also be used on macOS.</p> <p>SANE scanner drivers are mostly written through reverse engineering, though some manufacturers, like Epson, do provide dedicated SANE drivers.</p> <p>SANE was designed from the start for network scanning, so it is not as tightly bound to the UI as TWAIN.</p> <h2 id="escl">eSCL</h2> <p>eSCL is an HTTP-based network document scanning protocol promoted by Apple. As long as the scanner is connected to the network, scanning can be performed directly through this interface. It is now maintained by the Mopria Alliance, which was initiated by manufacturers such as Canon and HP.</p> <h2 id="scanning-software">Scanning Software</h2> <h3 id="desktop-software">Desktop Software</h3> <ul> <li><strong>NAPS2</strong>: Supports TWAIN, WIA, SANE, eSCL, and ICA. It is open-source, all-in-one, cross-platform scanning software.</li> <li><strong><a href="/imagetrans/">ImageTrans</a></strong>: Integrates document scanning functionality based on WIA, SANE, ICA, and eSCL. 
It can directly scan documents and perform tasks such as OCR, translation, and generating searchable PDFs.</li> <li><strong>VueScan</strong>: Reverse engineers the drivers of most scanners, allowing direct scanner usage without additional driver installations.</li> <li><strong>SilverFast</strong>: Highly professional in scanning photos and film negatives; typically bundled with scanner purchases.</li> </ul> <h3 id="sdks">SDKs</h3> <ul> <li><strong>Dynamsoft</strong>: Dynamic Web TWAIN</li> <li><strong>Asprise</strong>: JSane, JTWAIN, Scanner.js</li> <li><strong>Leadtools</strong></li> <li><strong>Vintasoft</strong></li> <li><strong>ScanOnWeb</strong></li> </ul> <p>Among these SDKs, Dynamic Web TWAIN supports the most protocols and is the most actively maintained.</p> Sat, 31 Jan 2026 11:32:50 +0000 https://www.basiccat.org/overview-of-document-scanning-interfaces/ https://www.basiccat.org/overview-of-document-scanning-interfaces/ imagetrans blog Batch OCR and Translate Image and PDF files <p>ImageTrans v5.6.0 adds a Hot Folder feature that monitors a folder for newly added files and, when new files appear, processes them by executing the specified workflow, using the current project as a template.</p> <p>Here are the basic steps:</p> <ol> <li>Create a new project and set up things like language, OCR, custom workflows, and more.</li> <li>If you want to support importing and exporting PDFs, you need to define the import and export settings in advance in the project settings and add the PDF-export step to the workflow.</li> <li>Open Hot Folder through the menu bar (Tools - Hot Folder), set the folder to be monitored, and then you can batch OCR and translate files.</li> </ol> Mon, 05 Jan 2026 11:00:50 +0000 https://www.basiccat.org/batch-ocr-and-translate-image-and-PDF/ https://www.basiccat.org/batch-ocr-and-translate-image-and-PDF/ imagetrans blog Clean, Deskew and Enhance Scanned Books <p>I downloaded several e-books from the Superstar 
e-library and planned to convert them to PDF to read on Dasung e-ink tablets. However, the scanned books had low clarity, low contrast, and other problems such as skewed pages and show-through from the reverse side, which made them hard to read on e-book readers.</p> <p><img src="/album/clean-scanned-document/uncleaned-on-eink-tablet.jpg" alt="Uncleaned on electronic ink screen" /></p> <p>After some processing, I finally got a clear version of the PDF and was able to read it on the e-ink device.</p> <p><img src="/album/clean-scanned-document/cleaned-on-eink-tablet.jpg" alt="Cleaned on electronic ink screen" /></p> <p>Here are the processing steps I went through.</p> <p>Original image:</p> <p><img src="/album/clean-scanned-document/original.jpg" alt="Original" /></p> <p>Apply super-resolution to the image to improve clarity:</p> <p><img src="/album/clean-scanned-document/superresolution.jpg" alt="" /></p> <p>Recognize the rotation of the text in the image and deskew the image accordingly:</p> <p><img src="/album/clean-scanned-document/deskewed.jpg" alt="deskewed" /></p> <p>Recognize the text in the image and binarize the image by text area to get a version that is only black and white:</p> <p><img src="/album/clean-scanned-document/black-white.png" alt="black and white" /></p> <p>The final PDF of this 203-page book is only 8MB in size, and it supports searching for the text in the PDF.</p> <p>The above operations are all done in one place using <a href="/imagetrans/">ImageTrans</a>.</p> Mon, 05 Jan 2026 10:36:50 +0000 https://www.basiccat.org/clean-deskew-enhance-scanned-books/ https://www.basiccat.org/clean-deskew-enhance-scanned-books/ imagetrans blog Use Large Language Models to Get Good OCR Results <p>Large language models can accurately understand and process text, and some multimodal vision models can directly process images. 
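As a concrete illustration of calling such a vision model, here is a sketch that builds an OCR request for an OpenAI-compatible chat endpoint. The URL and model name are placeholders, not configuration taken from the article.

```python
import base64
import json

# Placeholders: neither the endpoint nor the model name comes from the
# article -- substitute whatever OpenAI-compatible service you use.
API_URL = "https://example.com/v1/chat/completions"
MODEL = "qwen-vl-plus"

def build_ocr_request(image_bytes, prompt="Extract all text in this image."):
    """Build the JSON body of a vision chat request that asks the
    model to OCR an image sent inline as a base64 data URL."""
    data_url = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode()
    return json.dumps({
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    })

payload = build_ocr_request(b"\xff\xd8\xff")  # stand-in JPEG bytes
# A real client would POST `payload` to API_URL with an Authorization header.
```

The same request shape works for the correction and layout-analysis uses as well; only the prompt and the attached content change.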
Below are several methods for using them to achieve good OCR results.</p> <h2 id="direct-ocr">Direct OCR</h2> <p>Use a large model directly, for example qwen-vl, to process images and extract the text. The results can be very precise.</p> <p>However, large models currently cannot reliably return text coordinates. You usually need to first use a specialized text-localization method to locate text regions, and then use the large model to recognize the text.</p> <h2 id="correcting-recognition-results">Correcting Recognition Results</h2> <p>Use a large model to directly proofread or correct OCR outputs. This approach has lower computational requirements than processing images directly. But it is best to use a model with a sufficient number of parameters: smaller models (for example, 7B) tend to perform poorly at correction.</p> <h2 id="layout-analysis">Layout Analysis</h2> <p>Large models can also perform layout analysis. This can be used to determine which paragraph each piece of text in an image belongs to and to output text in the correct reading order.</p> Mon, 08 Dec 2025 12:26:50 +0000 https://www.basiccat.org/use-large-language-models-to-get-good-OCR-results/ https://www.basiccat.org/use-large-language-models-to-get-good-OCR-results/ imagetrans blog Can Large Language Models Replace Human Translators in Manga Translation? <style> .post-content table { width: auto; } </style> <p>Can large language models replace human translators in the field of manga translation? Let’s take the following <em>Ranma 1/2</em> manga as an example to compare translations by large language models and human translators. 
We compared online large language models such as ChatGPT, Claude, and Gemini, offline large language models like Sakura and Qwen, and traditional machine translation engines such as Google, Baidu and Caiyun.</p> <p><img src="/album/imagetrans-language-learning/Ranma1_012.jpg" alt="original" /></p> <p>Comparison table (Japanese to Chinese):</p> <table> <thead> <tr> <th>Translator</th> <th>1. 破っ!!</th> <th>2. あー、</th> <th>3. 調子いい。</th> <th>4. まーたあかねはー。</th> <th>5. んなことばっかやってるからまともにモテないのよ。</th> <th>6. よけーなお世話よ。</th> <th>7. あたしはおねーちゃんと違って男なんか、</th> <th>8. 大っ嫌いなの。</th> <th>9. ふーん、</th> <th>10. じゃーこの話あんたにゃ関係ないか。</th> </tr> </thead> <tbody> <tr> <td>Human (Hongkong)</td> <td>嘿!</td> <td>嘘!</td> <td>厉害吧。</td> <td>小茜。</td> <td>这玩意不适合女孩子玩。</td> <td>多谢关心了。</td> <td>我和你不同。</td> <td>我最讨厌男孩子!</td> <td>唔!</td> <td>那这事和你一点关系也没有了……</td> </tr> <tr> <td>Human (Bilibili)</td> <td>哈!</td> <td>呼!</td> <td>状态不错。</td> <td>小茜你又在练这个。</td> <td>老干这种事,可是交不到男朋友的哦。</td> <td>要你管。</td> <td>我跟姐姐你不一样,我最讨厌的就是。</td> <td>男孩子了!</td> <td>呵呵。</td> <td>那这事就和你没关系了吧。</td> </tr> <tr> <td>Human (op)</td> <td>哈!</td> <td>呼,</td> <td>状态不错。</td> <td>小茜,你又在这里训练…</td> <td>总是做这种事,可要找不到对象噢。</td> <td>不用你管。</td> <td>我和姐姐不一样,我最讨厌的</td> <td>就是男生。</td> <td>这样啊…</td> <td>那这件事和你就没关系了。</td> </tr> <tr> <td>baidu</td> <td>打破!!</td> <td>啊</td> <td>状态很好。</td> <td>啊,茜啊。</td> <td>因为只做这样的事,所以不受欢迎。</td> <td>少管闲事。</td> <td>我和姐姐不一样,男人什么的</td> <td>最讨厌了。</td> <td>嗯</td> <td>那么这件事和你没关系吗。</td> </tr> <tr> <td>chatgpt-gpt4o</td> <td>分裂!!</td> <td>啊—</td> <td>状态不错。</td> <td>又是茜。</td> <td>就是因为老干这些事情,才不会真正受欢迎。</td> <td>不用你多管闲事。</td> <td>我可不像姐姐那样,</td> <td>最讨厌男人了。</td> <td>哦,</td> <td>那这件事就跟你没关系了。</td> </tr> <tr> <td>claude-4.5-sonnet</td> <td>破!!</td> <td>啊—</td> <td>感觉不错。</td> <td>茜又来了—</td> <td>就是因为老做这种事才没人喜欢你。</td> <td>多管闲事。</td> <td>我和姐姐不一样,</td> <td>我最讨厌男人了。</td> <td>哦—</td> <td>那这件事跟你没关系了。</td> </tr> <tr> <td>caiyun</td> <td>突破!!</td> <td>啊 ——</td> <td>状态不错。</td> <td>又见面了,茜。</td> <td>就是因为总是做这种事,所以才不受欢迎啊。</td> <td>多管闲事啊。</td> 
<td>我和姐姐不一样,对男人什么的,</td> <td>最讨厌了。</td> <td>嗯 ——</td> <td>那么,这件事跟你没关系吗?</td> </tr> <tr> <td>deepseek-chat-v3.2</td> <td>破っ!!</td> <td>啊——</td> <td>状态不错。</td> <td>真是的 小红又这样。</td> <td>你总干这种事所以才不受欢迎啊。</td> <td>多管闲事。</td> <td>我和姐姐不一样 对男人这种生物</td> <td>最——讨厌了。</td> <td>哼——</td> <td>那这事就跟你没关系咯。</td> </tr> <tr> <td>gemini-2.5-flash</td> <td>打破!!</td> <td>啊,</td> <td>状态很好。</td> <td>茜又来了。</td> <td>就是因为老做这种事才没人喜欢你。</td> <td>多管闲事。</td> <td>我可不像姐姐,对男人什么的,</td> <td>最讨厌了。</td> <td>哦,</td> <td>那这件事就和你没关系咯。</td> </tr> <tr> <td>google</td> <td>打破它!!</td> <td>啊,</td> <td>我感觉很好。</td> <td>又是茜。</td> <td>这就是你永远不会真正受欢迎的原因。</td> <td>那不关你的事。</td> <td>与我姐姐不同,我不喜欢男生。</td> <td>我恨它。</td> <td>唔,</td> <td>所以这个故事与你无关。</td> </tr> <tr> <td>gpt-oss-120b</td> <td>破!!</td> <td>啊,</td> <td>状态不错。</td> <td>又是あかね啊——。</td> <td>因为只会干那种事,根本不受人喜欢。</td> <td>真是多管闲事啊。</td> <td>我和姐姐不同,男人这种东西,</td> <td>非常讨厌。</td> <td>哼,</td> <td>那么这件事跟你没关系吧。</td> </tr> <tr> <td>gpt-oss-20b</td> <td>破了!!</td> <td>啊,</td> <td>状态很好。</td> <td>又是阿卡ネ啊。</td> <td>因为只做那种事,根本没法正常受欢迎呢。</td> <td>真麻烦的照顾啊。</td> <td>我和姐姐不同,男人之类的,</td> <td>非常讨厌。</td> <td>嗯,</td> <td>那这事跟你没关系吧。</td> </tr> <tr> <td>qwen3-235b</td> <td>破了!!</td> <td>啊——,</td> <td>状态不错。</td> <td>又在搞那个红呢。</td> <td>净做这种事,难怪一点都吸引不了别人。</td> <td>少管闲事。</td> <td>我和姐姐可不一样,对男人什么的,</td> <td>最讨厌了。</td> <td>哦——,</td> <td>那这事跟你没关系了。</td> </tr> <tr> <td>qwen3-8b</td> <td>破了!</td> <td>啊——</td> <td>气势不错。</td> <td>好吧,我又输了。</td> <td>你老是这么说话,怎么能吸引到正常的人呢。</td> <td>你可真是个麻烦人物。</td> <td>我跟那个姐姐不一样,我一点都不喜欢男人,</td> <td>非常讨厌。</td> <td>哦——</td> <td>所以这个故事跟你无关对吧。</td> </tr> <tr> <td>sakura-14b</td> <td>破!!</td> <td>啊——</td> <td>真是的。</td> <td>又来了。</td> <td>就是成天搞这种事才会没有异性缘哦。</td> <td>真是多管闲事。</td> <td>我和姐姐不一样,最讨厌男生了。</td> <td>大家都一个样。</td> <td>嗯——</td> <td>那这事和你没关系了。</td> </tr> <tr> <td>sakura-7b</td> <td>破!!</td> <td>啊——</td> <td>遵命。</td> <td>又来了啊。</td> <td>就是因为整天做这种事,所以才不受欢迎。</td> <td>多管闲事。</td> <td>我和姐姐不一样,</td> <td>最讨厌男生了。</td> <td>哦——</td> <td>那这件事和你无关。</td> </tr> </tbody> </table> <p>Comparison table (Japanese 
to English):</p> <table> <thead> <tr> <th>Translator</th> <th>1. 破っ!!</th> <th>2. あー、</th> <th>3. 調子いい。</th> <th>4. まーたあかねはー。</th> <th>5. んなことばっかやってるからまともにモテないのよ。</th> <th>6. よけーなお世話よ。</th> <th>7. あたしはおねーちゃんと違って男なんか、</th> <th>8. 大っ嫌いなの。</th> <th>9. ふーん、</th> <th>10. じゃーこの話あんたにゃ関係ないか。</th> </tr> </thead> <tbody> <tr> <td>Human (vizmedia)</td> <td>Hyaah!</td> <td>Ahh!</td> <td>That was nice.</td> <td>There you go again, Akane.</td> <td>No wonder the boys all think you’re so weird.</td> <td>So why should I care?</td> <td>Not everybody thinks the world revolves around boys, Nabiki.</td> <td>Especially not me.</td> <td>No?</td> <td>Then I guess this would’t interest you.</td> </tr> <tr> <td>Human (op)</td> <td>Yack…</td> <td>Gasp…</td> <td>Good.</td> <td>Akane, you are training again…</td> <td>It won’t help you with boys.</td> <td>It is none of your business.</td> <td>I am different from you…</td> <td>I hate boys.</td> <td>If so…</td> <td>it has nothing to do with you.</td> </tr> <tr> <td>baidu</td> <td>Break!</td> <td>Ah</td> <td>Good condition.</td> <td>Akane Akane.</td> <td>Because I’m doing things like that, I’m really moody.</td> <td>Good luck!</td> <td>I’m quite different from a man</td> <td>I hate you.</td> <td>HMM</td> <td>Why don’t you have this story?</td> </tr> <tr> <td>chatgpt-gpt4o</td> <td>Broke!!</td> <td>Ah,</td> <td>I’m feeling good.</td> <td>There goes Akane again.</td> <td>That’s why you never genuinely attract anyone, because you’re always doing stuff like that.</td> <td>None of your business.</td> <td>Unlike you, big sis, I</td> <td>absolutely hate guys.</td> <td>Hmm,</td> <td>so this has nothing to do with you then.</td> </tr> <tr> <td>claude-4.5-sonnet</td> <td>Hya!!</td> <td>Ah,</td> <td>I feel great.</td> <td>There goes Akane again.</td> <td>That’s why you can’t get a boyfriend acting like that.</td> <td>Mind your own business.</td> <td>Unlike you, sis, I</td> <td>absolutely hate boys.</td> <td>Hmm,</td> <td>then this doesn’t concern you.</td> 
</tr> <tr> <td>caiyun</td> <td>Break!!</td> <td>Ah,</td> <td>she‘s in good shape.</td> <td>Well, Akane.</td> <td>I‘m not really popular because I do all these things.</td> <td>You‘re being too kind.</td> <td>I‘m different from onee-chan, I’m not a man.</td> <td>I really hate them.</td> <td>Hmm.</td> <td>Then this story has nothing to do with you?</td> </tr> <tr> <td>deepseek-chat-v3.2</td> <td>Break!!</td> <td>Ahh,</td> <td>Feeling good.</td> <td>Maa, Akane is…</td> <td>That’s why you keep doing things like that and can’t get a proper boyfriend.</td> <td>Mind your own business.</td> <td>Unlike you, sis, I hate men,</td> <td>I really hate them.</td> <td>Hmm,</td> <td>Then this story has nothing to do with you.</td> </tr> <tr> <td>gemini-2.5-flash</td> <td>Smash!!</td> <td>Ah,</td> <td>I feel great.</td> <td>Akane, again…</td> <td>You’re always doing things like that, no wonder you’re not popular.</td> <td>Mind your own business.</td> <td>Unlike my sister, I don’t care about guys,</td> <td>Hate them!</td> <td>Hmm,</td> <td>Then this conversation has nothing to do with you.</td> </tr> <tr> <td>google</td> <td>Break it!!</td> <td>ah,</td> <td>I’m feeling good.</td> <td>Akane again.</td> <td>That’s why you’re never really popular.</td> <td>That’s none of your business.</td> <td>Unlike my sister, I don’t like guys.</td> <td>I hate it.</td> <td>Hmm,</td> <td>So this story doesn’t concern you.</td> </tr> <tr> <td>gpt-oss-120b</td> <td>Break!!</td> <td>Ah,</td> <td>Feeling good.</td> <td>There goes Akane again.</td> <td>Because you’re always doing stuff like that, you never get any proper attention from the opposite sex.</td> <td>What a big help.</td> <td>Unlike my sister, I…</td> <td>I really hate men.</td> <td>Hmm,</td> <td>Well then, this story doesn’t concern you, does it?</td> </tr> <tr> <td>gpt-oss-20b</td> <td>Shattered!!</td> <td>Ah,</td> <td>I’m feeling good.</td> <td>Akane again.</td> <td>Because I keep doing things like that, I can’t attract anyone properly.</td> 
<td>You’re such a nuisance.</td> <td>Unlike my older sister, I don’t like men.</td> <td>I hate them.</td> <td>Hmm,</td> <td>So this story doesn’t concern you.</td> </tr> <tr> <td>qwen3-235b</td> <td>Break!!</td> <td>Ah,</td> <td>I’m feeling great.</td> <td>Akane, again…</td> <td>That’s why you never get properly liked by anyone.</td> <td>None of your business.</td> <td>Unlike onee-chan, I hate guys,</td> <td>I really hate them.</td> <td>Huh,</td> <td>Then this conversation has nothing to do with you.</td> </tr> <tr> <td>qwen3-8b</td> <td>Break!!</td> <td>Ugh,</td> <td>I’m in a good mood.</td> <td>You’re always so clumsy.</td> <td>Because you keep doing such stupid things, you can’t attract anyone properly.</td> <td>You’re really making things easy for me.</td> <td>Unlike you, I don’t like guys,</td> <td>At all.</td> <td>Hmmm,</td> <td>Then this story has nothing to do with you.</td> </tr> </tbody> </table> <p>Through comparison, it can be observed that current large AI models still fall short of human translation when translating high-context languages like Japanese. Issues include incorrect pronouns, mistranslated names, poor coherence, and overly literal sentence structures that lack naturalness.</p> <p>However, human translation also has its problems, such as translators over-interpreting or not adhering closely enough to the original text. This was more common before the emergence of high-quality machine translation. Nowadays, with the widespread adoption of machine translation post-editing, many human translators may directly use machine translation results, resulting in higher fidelity to the original text.</p> <p>As for which large language model performs the best, it can be seen that the more parameters a model has, the better its quality tends to be. Some large language models, such as Sakura, are fine-tuned using Japanese-Chinese corpora. 
However, because of their smaller parameter counts, their translation quality remains suboptimal and they are more prone to hallucinations than larger language models. Still, they usually outperform general-purpose models with a similar number of parameters.</p> <p>Below are tables of each engine’s BLEU score. The scores are computed as similarity to the human translation, which serves as a rough proxy for translation quality.</p> <p>Japanese to Chinese:</p> <table> <thead> <tr> <th>Engine</th> <th>BLEU@bilibili</th> <th>BLEU@Hongkong</th> <th>BLEU@op</th> <th>Average</th> </tr> </thead> <tbody> <tr> <td>qwen3-235b</td> <td>0.2035</td> <td>0.0484</td> <td>0.2247</td> <td>0.1589</td> </tr> <tr> <td>caiyun</td> <td>0.1760</td> <td>0.0481</td> <td>0.2147</td> <td>0.1463</td> </tr> <tr> <td>deepseek-chat-v3.2</td> <td>0.1803</td> <td>0.0335</td> <td>0.1719</td> <td>0.1286</td> </tr> <tr> <td>claude-4.5-sonnet</td> <td>0.1104</td> <td>0.0917</td> <td>0.1751</td> <td>0.1257</td> </tr> <tr> <td>chatgpt-gpt4o</td> <td>0.1253</td> <td>0.0565</td> <td>0.1448</td> <td>0.1089</td> </tr> <tr> <td>sakura-14b</td> <td>0.1115</td> <td>0.0652</td> <td>0.1370</td> <td>0.1046</td> </tr> <tr> <td>sakura-7b</td> <td>0.0647</td> <td>0.0816</td> <td>0.1530</td> <td>0.0998</td> </tr> <tr> <td>baidu</td> <td>0.0933</td> <td>0.0533</td> <td>0.1408</td> <td>0.0958</td> </tr> <tr> <td>gpt-oss-120b</td> <td>0.1032</td> <td>0.0225</td> <td>0.1226</td> <td>0.0828</td> </tr> <tr> <td>gemini-2.5-flash</td> <td>0.0961</td> <td>0.0400</td> <td>0.0981</td> <td>0.0781</td> </tr> <tr> <td>gpt-oss-20b</td> <td>0.0616</td> <td>0.0320</td> <td>0.0868</td> <td>0.0601</td> </tr> <tr> <td>qwen3-8b</td> <td>0.0527</td> <td>0.0225</td> <td>0.0601</td> <td>0.0451</td> </tr> <tr> <td>google</td> <td>0.0247</td> <td>0.0552</td> <td>0.0502</td> <td>0.0434</td> </tr> </tbody> </table> <p>Japanese to English:</p> <table> <thead> <tr> <th>Engine</th> <th>BLEU@vizmedia</th> 
<th>BLEU@op</th> <th>Average</th> </tr> </thead> <tbody> <tr> <td>chatgpt-gpt4o</td> <td>0.0545</td> <td>0.1968</td> <td>0.1257</td> </tr> <tr> <td>qwen3-235b</td> <td>0.0523</td> <td>0.1864</td> <td>0.1194</td> </tr> <tr> <td>deepseek-chat-v3.2</td> <td>0.0813</td> <td>0.1432</td> <td>0.1122</td> </tr> <tr> <td>claude-4.5-sonnet</td> <td>0.0606</td> <td>0.1267</td> <td>0.0937</td> </tr> <tr> <td>caiyun</td> <td>0.0625</td> <td>0.1193</td> <td>0.0909</td> </tr> <tr> <td>google</td> <td>0.0468</td> <td>0.1348</td> <td>0.0908</td> </tr> <tr> <td>gemini-2.5-flash</td> <td>0.0515</td> <td>0.1294</td> <td>0.0904</td> </tr> <tr> <td>baidu</td> <td>0.0540</td> <td>0.1065</td> <td>0.0803</td> </tr> <tr> <td>qwen3-8b</td> <td>0.0438</td> <td>0.1081</td> <td>0.0760</td> </tr> <tr> <td>gpt-oss-120b</td> <td>0.0473</td> <td>0.0850</td> <td>0.0662</td> </tr> <tr> <td>gpt-oss-20b</td> <td>0.0477</td> <td>0.0711</td> <td>0.0594</td> </tr> </tbody> </table> <p>Of course, manga translation is multimodal; it also involves other work such as lettering and retouching. Here is the English version by Viz Media:</p> <p><img src="/album/ranma-vizmedia.webp" alt="Vizmedia" /></p> <p>Check out <a href="/imagetrans/">ImageTrans</a>, a computer-aided image translation tool for completing manga translation with the help of various large language models. It is an integrated app that handles OCR, translation, lettering, and retouching in one place.</p> Tue, 25 Nov 2025 13:20:50 +0000 https://www.basiccat.org/can-large-language-model-replace-human-translator-in-terms-of-manga-translation/ https://www.basiccat.org/can-large-language-model-replace-human-translator-in-terms-of-manga-translation/ imagetrans blog
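<p>As a rough illustration of what the BLEU scores in the tables above measure, here is a minimal pure-Python sketch of sentence-level BLEU (uniform 1–4-gram weights, brevity penalty, no smoothing). The whitespace tokenization and example sentences are purely illustrative and are not how the scores above were produced; real evaluations should use a maintained implementation such as sacrebleu.</p>

```python
# Minimal sentence-level BLEU sketch: geometric mean of modified n-gram
# precisions against a single human reference, times a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # clipped overlap: each candidate n-gram counts at most as often
        # as it appears in the reference
        overlap = sum((cand_counts & ref_counts).values())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0  # any zero precision drives the geometric mean to zero
        log_precisions.append(math.log(overlap / total))
    # brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)
```

<p>An identical candidate scores 1.0, a candidate sharing no words scores 0.0, and partial overlap lands in between, which is why the table values cluster well below 1 even for good translations.</p>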