This commit was created on GitHub.com and signed with GitHub’s verified signature.
release: 1.0.0 (#2)
* feat: added pdf extraction with initial "greedy" chunking strategy
* refactor: chunk strategies using `langchain`'s implementations
* doc: added more information
* refactor: `by_char` chunking was changed to `by_separator`
- Flags were added to `pdf`.
- A verbose option on execution.
- Other code structure changes
* refactor: better readability
* doc: more info on executing and flags
* chore: adding requirements file
* chore: adding script to run multiple strategies
* chore: added script to handle resource download for some of the strategies
* fix: setting correct `punkt_tab` resource
* docs: removed the production 'how to run' section, which is still unclear
* refactor: changes on script flags to get terminal auto complete
* refactor: restart from scratch
- `argparse` instead of `click` to reduce dependencies.
- packages restructured for easier execution and maintainability.
- dropping `run_pdf_strategies.sh` script, now the `--file` flag can handle multiple files and folders.
- `--chunk-strategy` can handle multiple strategies at once.
- switched to a different PDF reader for better performance.
- other minor changes.
* fix: some fixes to the project
- fixed decoding of `\n` in `--chunk-separator`.
- removed useless validations on flags.
- other fixes in code.
* refactor: removed prints
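
For readers unfamiliar with the flag changes listed above, here is a minimal sketch of how an `argparse` parser could accept several values for `--file` and `--chunk-strategy` and decode a literal `\n` passed to `--chunk-separator`. It is not taken from the project: the program name `chunker`, the `--verbose` spelling, the defaults, and the helper `decode_separator` are assumptions made for illustration; only the flag names `--file`, `--chunk-strategy`, `--chunk-separator` and the strategy names `greedy` and `by_separator` come from the commit message.

```python
import argparse


def decode_separator(raw: str) -> str:
    """Convert escape sequences typed literally in a shell into real characters.

    A shell passes the two characters backslash + 'n', not a newline, so the
    value has to be decoded before it is used as a separator.
    (Hypothetical helper; the project's actual fix may differ.)
    """
    return raw.replace("\\n", "\n").replace("\\t", "\t").replace("\\r", "\r")


def build_parser() -> argparse.ArgumentParser:
    # "chunker" is a placeholder program name, not the project's real one.
    parser = argparse.ArgumentParser(prog="chunker")
    # nargs="+" lets one flag collect several files and/or folders.
    parser.add_argument("--file", nargs="+", required=True,
                        help="one or more PDF files or folders")
    # The same mechanism lets several strategies run in a single invocation.
    parser.add_argument("--chunk-strategy", nargs="+", default=["by_separator"],
                        help="one or more chunking strategies")
    # type= runs the decoder on the raw command-line string.
    parser.add_argument("--chunk-separator", type=decode_separator, default="\n",
                        help="separator used by the by_separator strategy")
    # Assumed spelling of the verbose option mentioned in the commit message.
    parser.add_argument("--verbose", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.verbose:
        print(args)
```

With this layout, an invocation such as `chunker --file a.pdf docs/ --chunk-strategy greedy by_separator --chunk-separator '\n'` would yield two entries in `args.file`, two in `args.chunk_strategy`, and a real newline in `args.chunk_separator`.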