Neil Ernst activity https://gitlab.com/neilernst 2026-01-26T20:30:57Z

tag:gitlab.com,2026-01-26:5034888881 Neil Ernst commented on issue #218 at Remi Rampin / taguette 2026-01-26T20:30:57Z neilernst Neil Ernst

Hi @remram44 any interest in pursuing this? ROCK seems to be under active development - new links are:

tag:gitlab.com,2024-04-26:3325310178 Neil Ernst commented on issue #322 at Remi Rampin / taguette 2024-04-26T19:22:08Z neilernst Neil Ernst

I restarted and the error went away, so perhaps something got corrupted. I had left the server running while upgrading Homebrew packages, which might have messed with Python. I also set include-system-site-packages = true in pyvenv.cfg. At any rate it now works, thanks.
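For anyone checking the same setting, here is a minimal sketch that reads a virtualenv's pyvenv.cfg and reports whether include-system-site-packages is enabled. It assumes the usual `key = value` layout of that file; the helper names `read_pyvenv_cfg` and `uses_system_site_packages` are made up for illustration and are not part of Taguette or Python.

```python
from pathlib import Path
import sys


def read_pyvenv_cfg(path):
    """Parse the simple 'key = value' lines of a pyvenv.cfg file into a dict."""
    settings = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            settings[key.strip().lower()] = value.strip()
    return settings


def uses_system_site_packages(path):
    """Return True if the venv is configured to see system site-packages."""
    value = read_pyvenv_cfg(path).get("include-system-site-packages", "false")
    return value.lower() == "true"


if __name__ == "__main__":
    # Assumption: inside a venv, sys.prefix is the directory holding pyvenv.cfg.
    cfg = Path(sys.prefix) / "pyvenv.cfg"
    if cfg.exists():
        print(cfg, "include-system-site-packages:", uses_system_site_packages(cfg))
    else:
        print("No pyvenv.cfg under", sys.prefix, "(probably not a venv)")
```

Run it with the virtualenv's own interpreter so that sys.prefix points at the environment in question.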

tag:gitlab.com,2024-04-25:3322708541 Neil Ernst opened issue #322: Taguette export fails with custom db name at Remi Rampin / taguette 2024-04-25T20:24:05Z neilernst Neil Ernst

I am using a taguette-config.py file. My db is named sqlite:///taguette.sqlite3 (local to that directory). This is on a Mac M2 with Taguette 1.4.1 from pip.

If I go to the UI and select Project->Export Project, it throws an exception:

2024-04-25 13:12:15,600 INFO: Connecting to SQL database 'sqlite:////var/folders/ht/l1bt4z0x5fb5j5crxp4_wv5m0000gn/T/taguette_export_o_rom58_/db.sqlite3'
2024-04-25 13:12:15,603 WARNING: The tables don't seem to exist; creating
2024-04-25 13:12:15,624 INFO: Context impl SQLiteImpl.
2024-04-25 13:12:15,624 INFO: Will assume non-transactional DDL.
2024-04-25 13:12:15,633 INFO: Running stamp_revision  -> db5e31a0233d
2024-04-25 13:12:15,635 INFO: Context impl SQLiteImpl.
2024-04-25 13:12:15,635 INFO: Will assume non-transactional DDL.
2024-04-25 13:12:15,659 ERROR: Uncaught exception GET /project/2/export/project.sqlite3 (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:7465', method='GET', uri='/project/2/export/project.sqlite3', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
  File "/Users/nernst/Documents/projects/sloan-td/taguette.virtualenv.arm/lib/python3.11/site-packages/tornado/web.py", line 1790, in _execute
    result = await result
             ^^^^^^^^^^^^
  File "/Users/nernst/Documents/projects/sloan-td/taguette.virtualenv.arm/lib/python3.11/site-packages/taguette/web/export.py", line 306, in get
    database.copy_project(
  File "/opt/homebrew/Cellar/python@3.11/3.11.6/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 81, in inner
  File "/Users/nernst/Documents/projects/sloan-td/taguette.virtualenv.arm/lib/python3.11/site-packages/taguette/database/copy.py", line 62, in copy_project
    mapping_document = copy(
                       ^^^^^
  File "/Users/nernst/Documents/projects/sloan-td/taguette.virtualenv.arm/lib/python3.11/site-packages/taguette/database/copy.py", line 24, in copy
    return copy_table(
           ^^^^^^^^^^^
  File "/Users/nernst/Documents/projects/sloan-td/taguette.virtualenv.arm/lib/python3.11/site-packages/taguette/database/copy.py", line 251, in copy_table
    if not validators[key](value):
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nernst/Documents/projects/sloan-td/taguette.virtualenv.arm/lib/python3.11/site-packages/taguette/convert.py", line 256, in is_html_safe
    cleaned = bleach.clean(
              ^^^^^^^^^^^^^
  File "/Users/nernst/Documents/projects/sloan-td/taguette.virtualenv.arm/lib/python3.11/site-packages/bleach/__init__.py", line 74, in clean
    cleaner = Cleaner(
              ^^^^^^^^
  File "/Users/nernst/Documents/projects/sloan-td/taguette.virtualenv.arm/lib/python3.11/site-packages/bleach/sanitizer.py", line 132, in __init__
    self.walker = html5lib_shim.getTreeWalker("etree")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nernst/Documents/projects/sloan-td/taguette.virtualenv.arm/lib/python3.11/site-packages/bleach/_vendor/html5lib/treewalkers/__init__.py", line 57, in getTreeWalker
    from . import etree
  File "/Users/nernst/Documents/projects/sloan-td/taguette.virtualenv.arm/lib/python3.11/site-packages/bleach/_vendor/html5lib/treewalkers/etree.py", line 8, in <module>
    from . import base
  File "/Users/nernst/Documents/projects/sloan-td/taguette.virtualenv.arm/lib/python3.11/site-packages/bleach/_vendor/html5lib/treewalkers/base.py", line 3, in <module>
    from xml.dom import Node
ModuleNotFoundError: No module named 'xml.dom'
2024-04-25 13:12:15,662 ERROR: 500 GET /project/2/export/project.sqlite3 (127.0.0.1) 75.84ms (admin) lang=en-CA,en-US;q=0.9,en;q=0.8
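
The traceback ends in a failed import of a standard-library module, so a quick check is whether the same interpreter can locate xml.dom at all, and where its xml package is loaded from. The sketch below is only a diagnostic under that assumption; it does not establish the actual cause here, and the helper names are hypothetical.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the import system can locate `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (here, 'xml') is missing or is not a package.
        return False


def module_origin(name):
    """Return the file a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix:", sys.prefix, "base prefix:", sys.base_prefix)
    print("xml loaded from:", module_origin("xml"))
    print("xml.dom available:", module_available("xml.dom"))
```

If xml resolves to a path outside the Python installation, something may be shadowing the standard library; if it is missing entirely, the installation itself may be incomplete, for example after an upgrade while the server was running.
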
tag:gitlab.com,2024-03-07:3211924421 Neil Ernst commented on issue #321 at Remi Rampin / taguette 2024-03-07T20:40:21Z neilernst Neil Ernst

There might be people relying on the heuristic I guess?

Honestly, from an effort point of view it's probably simpler to make people do the conversion outside Taguette, using Calibre, Pandoc, etc. Then Taguette could support only HTML and plain text formats, which would remove a major source of hassle. I guess it's a question of where to put resources. From my point of view, tag editing, export, and reporting are more important features.

Thanks for the tool!

tag:gitlab.com,2024-03-07:3211898157 Neil Ernst opened issue #321: Allow Calibre to parse plain text as plain at Remi Rampin / taguette 2024-03-07T20:23:16Z neilernst Neil Ernst

Transcriptions exported from e.g. Otter.ai are usually lightly formatted plain text (formatting = new lines + timestamps). However, when I import such a file, Taguette makes Calibre use heuristics to identify structure, which adds <H2> and similar tags. I don't actually want any of this - ideally Taguette preserves the original text document's look and feel.

The way to do this seems to be passing --formatting-type plain to Calibre when converting a plain text document. It would be helpful if the import dialog had an advanced option that lets Taguette pass options to Calibre. Alternatively, just assume the plain text file should not be enhanced with heuristics, since most people likely don't understand what those are (and we aren't in an ebook context anyway).
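
As a rough sketch of the idea, the function below builds an ebook-convert command line that uses --formatting-type plain for text input and falls back to --enable-heuristics otherwise. Both flags are Calibre options (--formatting-type applies to TXT input), but the function name, its parameters, and the output target are assumptions for illustration; this is not how Taguette's own conversion code is written.

```python
import shutil
import subprocess


def build_convert_command(input_path, output_path, plain_text=True,
                          executable="ebook-convert"):
    """Build the argument list for a Calibre ebook-convert call.

    plain_text=True asks Calibre to treat TXT input as plain text;
    otherwise the heuristic processing described above is enabled.
    """
    cmd = [executable, input_path, output_path]
    if plain_text:
        cmd += ["--formatting-type", "plain"]
    else:
        cmd.append("--enable-heuristics")
    return cmd


def run_conversion(input_path, output_path, plain_text=True):
    """Run the conversion if ebook-convert is on PATH; return True on success."""
    if shutil.which("ebook-convert") is None:
        return False  # Calibre is not installed or not on PATH
    cmd = build_convert_command(input_path, output_path, plain_text)
    return subprocess.run(cmd, check=False).returncode == 0
```

An import dialog could map a "treat as plain text" checkbox to the plain_text argument, which would avoid exposing raw Calibre options to users.
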

MWE:

original file (snippet) from Otter.ai:

A 45:02
Right? And I guess, you know, your career will be similar, right? You'll be looking for permanent job and you don't want to be postdoc forever.

P10 45:15
Yeah. Yeah. Same thing. Right.

Calibre conversion with Taguette and its use of --enable-heuristics:

<p class="calibre1">A 45:02</p>

<p class="calibre1">Right? And I guess, you know, your career will be similar, right? You’ll be looking for permanent job and you don’t want to be postdoc forever.</p>

<h2 class="calibre2">P10 45:15</h2>

<p class="calibre1">Yeah. Yeah. Same thing. Right.</p>

<p class="whitespace"> </p>

The workaround is to convert the text file to HTML directly with Calibre, and then import that HTML file into Taguette instead of the text file.
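
For simple transcripts like the snippet above, a similar result can be had without Calibre at all. The sketch below is a swapped-in alternative, not what Calibre or Taguette does: it turns each blank-line-separated block into one paragraph, keeps the line breaks inside it, and never guesses at headings. The function name `text_to_html` is hypothetical.

```python
import html


def text_to_html(text):
    """Convert plain text to minimal HTML, one <p> per blank-line-separated block."""
    paragraphs = []
    for block in text.replace("\r\n", "\n").split("\n\n"):
        # Escape each non-empty line so characters like '<' survive as text.
        lines = [html.escape(line) for line in block.strip().split("\n") if line.strip()]
        if lines:
            paragraphs.append("<p>" + "<br>\n".join(lines) + "</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "A 45:02\nRight? And I guess, you know.\n\nP10 45:15\nYeah. Yeah. Same thing. Right."
    print(text_to_html(sample))
```

The resulting HTML can then be saved to a file and imported like any other HTML document.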