Comparing changes
base repository: mehmet-kozan/pdf-parse
base: v2.4.3
head repository: mehmet-kozan/pdf-parse
compare: v2.4.4
- 12 commits
- 254 files changed
- 1 contributor
Commits on Oct 17, 2025
* Refactor reports and tests directory structure: Migrated all reporting and test files from 'reports_site' and 'test' to new 'reports' and 'tests' directories for improved organization. Updated scripts, configs, and documentation to reflect new paths. Cleaned up .gitignore and workflow to match new structure. Added Vite configs for browser and worker builds. Updated references throughout the project for consistency.
* Update README.md
* Update test and config paths to new reports directory: Refactored all test and config file references from legacy 'test/pdf_file' and 'reports_site' directories to the new 'reports/pdf' and 'reports' structure. This improves consistency and aligns with the updated project organization.
* Update config paths and remove test outputs: Changed test output ignore patterns in .gitignore to match 'tests/' directory. Updated api-extractor.json to use new tsconfig and report paths. Deleted multiple test output files and images from the tests/unit directory to clean up obsolete or unnecessary test artifacts.
* Update clean:report script and remove API reports: The clean:report npm script now deletes the reports/api directory in addition to other report folders. Removed pdf-parse.api.json and tsdoc-metadata.json from reports/api, likely to clean up generated API documentation artifacts.
* Refactor build configs and update CDN references: Removed vite.config.browser.min.ts and consolidated minification into vite.config.browser.ts. Renamed vite.config.node.ts to vite.config.cjs.ts and updated build scripts in package.json to reflect these changes. Updated CDN links and documentation in README.md, improved .gitignore rules, and adjusted the browser export type path. The build process now outputs minified files by default and copies necessary worker files and type definitions.
* Refactor config structure and update build scripts: Moved and renamed Vite and TypeScript config files into a 'configs' directory for better organization. Deleted redundant or duplicate config files. Updated build scripts in package.json to reference new config paths and simplified the browser build to use the default vite.config.ts. Adjusted CJS Vite config to use 'terser' minification and removed unused options.
* Update TypeScript config paths and output directories: Adjusted tsconfig and api-extractor.json to use correct relative paths for rootDir and outDir, moving type config files to the configs directory. Updated include and exclude patterns to reflect new directory structure and ensure correct type declaration output.
* Update release workflow and integration test packaging: Changed the release workflow to create draft releases and updated triggers for deploy and publish workflows to run on release publication. Integration test scripts now pack the local pdf-parse package, and integration test dependencies use the local .tgz file instead of 'latest'. Also incremented package version to 2.4.4 and improved clean script.
* Update benchmark setup and integration test packaging: Set 'bench:install' to always install the latest pdf2json. Update integration test script to also copy the packed tarball to the reports directory. Fix benchmark import to use the correct browser build output.
* Move CustomCanvasFactory export to worker module: CustomCanvasFactory is now exported from 'pdf-parse/worker' instead of 'pdf-parse/canvas'. Updated documentation, type definitions, and tests to reflect this change. Also removed unused subpath exports from package.json.
* Refactor test data and update test imports: Moved PDF test data files from tests/unit/pdf_data to tests/unit/helper and updated references in test files accordingly. PDF files are now stored under reports/pdf. Removed redundant and legacy test files, and simplified test imports to use the new helper structure. Updated CONTRIBUTING.md to reflect new test data and PDF file locations.
Commit 691aeff
Refactor table test helpers and update site scripts (#26)
Moved PDF test files to reports/pdf and refactored table test helpers into individual modules for empty, full, multi, and simple tables. Updated test cases to use new helpers and removed redundant data files. Adjusted site build and documentation scripts to use 'report:build' and 'report' instead of 'site:build' and 'site'.
Commit 55567a2
* Refactor import order in test files: Moved all 'data' imports to the top of each test file for consistency and improved readability. Also removed three type-specific test files from the test-types directory.
* Add custom exception classes and refactor error handling: Introduces custom exception classes (InvalidPDFException, PasswordException, FormatError) in src/Exception.ts and exports them. Refactors PDFParse to use getException for error mapping. Updates API documentation, test helpers, and test structure to use new exceptions and reorganizes test files and sample PDFs. Adjusts tsconfig and vitest config for improved module resolution and coverage.
* Refactor test data and add custom exception classes: Moved PDF test data to helper classes and updated test files to use them, improving maintainability. Added AbortException, ResponseException, and UnknownErrorException classes to src/Exception.ts for better error handling. Updated vitest config to use node environment. Cleaned up and replaced old test files with new modular tests.
Commit 82dc4f0
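One way to picture the getException-based error mapping described in this commit is the self-contained sketch below. It assumes only that errors coming from the PDF engine carry a name string equal to one of the class names listed above (pdf.js uses similar names for its own exceptions); the class bodies and the toPdfParseException helper are hypothetical stand-ins, not the package's actual getException or src/Exception.ts.

```typescript
// Local stand-ins for the exception classes named in this commit; the real
// classes in src/Exception.ts may carry extra fields or constructor arguments.
abstract class PdfParseError extends Error {
  constructor(message: string) {
    super(message);
    // Keep instanceof working even when compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}
class InvalidPDFException extends PdfParseError {}
class PasswordException extends PdfParseError {}
class FormatError extends PdfParseError {}
class AbortException extends PdfParseError {}
class ResponseException extends PdfParseError {}
class UnknownErrorException extends PdfParseError {}

type ExceptionClass = new (message: string) => PdfParseError;

// Engine-side error name -> public exception class. A Map avoids accidental
// hits on Object.prototype keys such as "constructor".
const EXCEPTIONS_BY_NAME = new Map<string, ExceptionClass>([
  ["InvalidPDFException", InvalidPDFException],
  ["PasswordException", PasswordException],
  ["FormatError", FormatError],
  ["AbortException", AbortException],
  ["ResponseException", ResponseException],
]);

// Hypothetical helper: re-wrap any thrown value as one of the public classes,
// falling back to UnknownErrorException for unrecognised errors.
function toPdfParseException(err: unknown): PdfParseError {
  const name = err instanceof Error ? err.name : "";
  const message = err instanceof Error ? err.message : String(err);
  const ExceptionCtor = EXCEPTIONS_BY_NAME.get(name) ?? UnknownErrorException;
  return new ExceptionCtor(message);
}

// Usage: an engine error that reports itself as a PasswordException.
const engineError = Object.assign(new Error("No password given"), { name: "PasswordException" });
const mapped = toPdfParseException(engineError);
console.log(mapped instanceof PasswordException, mapped.message); // true No password given
```

Mapping by name rather than by instanceof keeps the sketch independent of which engine class actually threw the error.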
Commits on Oct 18, 2025
New utils api implemented. (#28)
* Add Japanese PDF test and helper files: Added a Japanese PDF file for testing, a corresponding helper module, and a unit test to verify text extraction and page count. This enhances test coverage for Japanese language PDFs.
* Move getHeader to utils and refactor usage: The getHeader function and HeaderResult type were moved from src/HeaderResult.ts to utils/getHeader.ts, and related exports were updated. PDFParse no longer includes getHeader; tests and documentation now use getHeader from pdf-parse/utils. Build scripts, configs, and package.json were updated to support the new utils module and its TypeScript and bundling setup.
Commit 881fbbe
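A header check like getHeader (whose isPdf result is renamed to magic in the later CLI commit, #32) boils down to reading the first bytes of a file and looking for the %PDF- signature. The sketch below assumes the bytes are already in memory; HeaderSketch and inspectHeader are hypothetical names, and the real HeaderResult fields and URL-fetching logic may differ.

```typescript
// Hypothetical result shape; the package's HeaderResult may use other fields.
interface HeaderSketch {
  magic: boolean;   // true when the data starts with the "%PDF-" signature
  version?: string; // header version such as "1.7", when present
}

const PDF_SIGNATURE = "%PDF-";

// Inspect the first bytes of a file. For a remote file these bytes could come
// from an HTTP request with a header like "Range: bytes=0-1023"; that network
// step is left out so the sketch stays self-contained.
function inspectHeader(bytes: Uint8Array): HeaderSketch {
  // Map each of the first 16 bytes to one character (the header is ASCII).
  const prefix = Array.from(bytes.subarray(0, 16), (b) => String.fromCharCode(b)).join("");
  // This sketch only accepts the signature at offset 0; some readers tolerate
  // leading junk before it.
  if (!prefix.startsWith(PDF_SIGNATURE)) return { magic: false };
  const version = /^%PDF-(\d+\.\d+)/.exec(prefix)?.[1];
  return version === undefined ? { magic: true } : { magic: true, version };
}

const sample = Uint8Array.from("%PDF-1.7\n", (c) => c.charCodeAt(0));
console.log(inspectHeader(sample)); // { magic: true, version: '1.7' }
```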
API Documentation, Public Type Annotations and Workflow Updates (#29)
* Update GitHub Actions triggers and caching settings: Renamed several workflow files for backup purposes. Updated triggers for test, integration, and unsupported test workflows to run on push to main, pull requests, and workflow_dispatch. Enabled package-manager cache in test workflow and commented out NPM cache cleaning step.
* Add PDF reports and update test URLs to local assets: Added climate-change.pdf and climate.pdf to reports/pdf. Updated test cases in large-file.test.ts and url.test.ts to use local GitHub Pages PDF assets instead of external URLs for more reliable and consistent testing. Added @public JSDoc annotations to exception classes in src/Exception.ts for improved documentation. Minor workflow improvements in test.yml for Node.js setup and npm install.
* Add and improve JSDoc @public annotations: Added and refined JSDoc @public annotations and documentation for exported classes, interfaces, and types across core modules. This improves API clarity and TypeScript documentation, making public API surfaces explicit for consumers.
* Update settings and API extractor config: Associate tsdoc-metadata.json with JSONC in VS Code settings. Enable 'includeForgottenExports' and update reportFolder in api-extractor.json for improved API documentation output.
Commit dd1c603
getTable() return type changed. (#30)
* Update GitHub Actions triggers and caching settings: Renamed several workflow files for backup purposes. Updated triggers for test, integration, and unsupported test workflows to run on push to main, pull requests, and workflow_dispatch. Enabled package-manager cache in test workflow and commented out NPM cache cleaning step.
* Add PDF reports and update test URLs to local assets: Added climate-change.pdf and climate.pdf to reports/pdf. Updated test cases in large-file.test.ts and url.test.ts to use local GitHub Pages PDF assets instead of external URLs for more reliable and consistent testing. Added @public JSDoc annotations to exception classes in src/Exception.ts for improved documentation. Minor workflow improvements in test.yml for Node.js setup and npm install.
* Add and improve JSDoc @public annotations: Added and refined JSDoc @public annotations and documentation for exported classes, interfaces, and types across core modules. This improves API clarity and TypeScript documentation, making public API surfaces explicit for consumers.
* Update settings and API extractor config: Associate tsdoc-metadata.json with JSONC in VS Code settings. Enable 'includeForgottenExports' and update reportFolder in api-extractor.json for improved API documentation output.
* Refactor table extraction and update examples/docs: Refactored table extraction logic to group tables per page, introduced TableArray type, and updated related interfaces. Added new example scripts for all major features and improved README with clearer usage, exception handling, and browser integration instructions. Removed obsolete demo and adjusted tests to match new table result structure.
Commit 4b6c46e
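The per-page grouping introduced by the getTable() change can be sketched as follows. TableRows, DetectedTable, PageTables and groupTablesByPage are hypothetical names chosen for illustration; the actual TableArray type and table result interfaces in pdf-parse are not shown in this log and may be shaped differently.

```typescript
// Hypothetical shapes loosely modelled on the per-page table result described
// in this commit; the real TableArray type and its interfaces may differ.
type TableRows = string[][]; // one table as rows of cell strings

interface DetectedTable {
  pageNumber: number; // 1-based page on which the table was found
  rows: TableRows;
}

interface PageTables {
  num: number;
  tables: TableRows[];
}

// Group a flat list of detected tables into one entry per page, sorted by page.
function groupTablesByPage(found: DetectedTable[]): PageTables[] {
  const byPage = new Map<number, TableRows[]>();
  for (const table of found) {
    const list = byPage.get(table.pageNumber) ?? [];
    list.push(table.rows);
    byPage.set(table.pageNumber, list);
  }
  return Array.from(byPage.entries())
    .sort(([a], [b]) => a - b)
    .map(([num, tables]) => ({ num, tables }));
}

const grouped = groupTablesByPage([
  { pageNumber: 2, rows: [["a"]] },
  { pageNumber: 1, rows: [["b"]] },
  { pageNumber: 2, rows: [["c"]] },
]);
console.log(JSON.stringify(grouped));
// [{"num":1,"tables":[[["b"]]]},{"num":2,"tables":[[["a"]],[["c"]]]}]
```

Returning one entry per page, rather than a flat list, lets a caller find every table on a given page without re-scanning all results.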
Commits on Oct 19, 2025
CDN URLs updated; pdf-parse/utils replaced with pdf-parse/node (#31)
* Refactor demo styles and remove example scripts: Moved shared CSS for demo HTML files into a new styles.css file and updated HTML files to reference it, reducing duplication. Removed all files from the examples directory. Updated package.json keywords for improved discoverability and set dependency versions to exact values.
* Refactor utils config paths and improve PDF text extraction: Moved utility config files to project root and updated related paths in scripts and configs. Enhanced PDFParse to support custom page joiners and improved line break detection based on line height. Cleaned up test descriptions and documentation, and simplified Vitest config plugin usage and path aliases.
* Update path alias and remove vite-tsconfig-paths plugin: Moved the 'pdf-parse' alias below 'pdf-parse/utils' in vitest.config.ts and removed the vite-tsconfig-paths plugin. Also made a minor formatting change in tsconfig.json for the 'pdf-parse' path. This streamlines path resolution and removes unnecessary plugin usage.
* Add Vitest package config and update scripts: Introduces vitest.config.package.ts for package-specific test configuration. Updates package.json to add a beta version, new test script 'test:p', and reorganizes devDependencies for improved test management.
* Refactor project structure and update build configs: Moved core source files to src/pdf-parse and node-specific files to src/node. Updated build outputs, TypeScript configs, and package.json exports to reflect new directory structure. Renamed utility configs and extractor configs for clarity. Updated test imports and removed legacy TableUtil and related test. Adjusted Vite and Vitest configs for new paths. Expanded API documentation for geometry and table types.
* Add worker build script and canvas utilities: Introduces a new build script for worker code using esbuild, adds a worker entry point and canvas utility classes for use in worker environments, and updates the Vite config to copy the worker bundle. Also adds a new npm script for building the worker and makes minor improvements to integration test script.
* Add API Extractor config and update worker build process: Introduces api-extractor.worker.json and tsconfig.worker.json for generating type declarations and API reports for the worker build. Updates package.json exports and scripts to use the new build and type output structure. Enhances build-worker.mjs to handle type file copying and cleanup, and adds ESM build output. Test imports updated to use CanvasFactory instead of CustomCanvasFactory.
* Switch pdf.worker to .mjs and add type declaration: Updated all references from pdf.worker.js to pdf.worker.mjs for consistency with ES module usage. Added a TypeScript ambient module declaration for pdf.worker.mjs to prevent import errors. Improved worker loading logic with error handling and fallback in index.ts.
* Update worker API and examples for consistency: Renamed getDataUrl to getData in worker API and updated all references in documentation and examples. Improved worker path resolution for both CJS and ESM environments. Added troubleshooting examples for worker usage. Updated CDN URLs and usage instructions in README for clarity. Enhanced build script to handle import.meta.url replacement and improved type cleanup.
* Remove worker and canvas build scripts and files: Deleted all files related to the custom worker and canvas build pipeline, including bin/canvas, bin/worker, scripts/rename-cjs.mjs, and vite.config.worker.ts. Updated package.json to remove unused build:worker.back and cleaned up scripts. Added 'pdf-parse/worker' alias in vitest.config.ts and updated VSCode settings for new metadata files. This refactor removes legacy worker/canvas build logic in favor of a new approach.
* Rename browser build to web and update references: Renamed all 'browser' build outputs and references to 'web' for consistency across the codebase, including source files, build scripts, documentation, and example/demo imports. Updated related paths in .gitignore, biome.json, package.json, Vite config, and test/benchmark imports. Added new worker tests and adjusted test/benchmark structure for improved coverage and organization.
Commit 8a7a044
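The two text-extraction changes in this commit, custom page joiners and line-break detection based on line height, can be illustrated with a small sketch. The TextRun shape, the 0.5 tolerance and the joiner callback signature are assumptions made here for illustration, not PDFParse's actual options or algorithm.

```typescript
// Hypothetical text run: a string with its baseline y and font height, in PDF
// user-space units. Real pdf.js text items carry more data than this.
interface TextRun {
  str: string;
  y: number;
  height: number;
}

// Start a new line when the vertical jump between consecutive runs exceeds
// tolerance * the previous run's height; smaller jumps (e.g. superscripts)
// stay on the same line. Same-line runs are concatenated as-is.
function joinRuns(runs: TextRun[], tolerance = 0.5): string {
  let out = "";
  let prev: TextRun | undefined;
  for (const run of runs) {
    if (prev !== undefined && Math.abs(run.y - prev.y) > prev.height * tolerance) {
      out += "\n";
    }
    out += run.str;
    prev = run;
  }
  return out;
}

// Hypothetical page joiner: receives the 1-based number of the page just
// finished and the page count, and returns the separator text.
type PageJoiner = (pageNumber: number, total: number) => string;

function joinPages(pages: string[], joiner: PageJoiner): string {
  let out = "";
  pages.forEach((text, i) => {
    out += text;
    // Insert a separator after every page except the last.
    if (i < pages.length - 1) out += joiner(i + 1, pages.length);
  });
  return out;
}

const page1 = joinRuns([
  { str: "Hello ", y: 700, height: 12 },
  { str: "world", y: 700, height: 12 },
  { str: "2", y: 704, height: 7 }, // jump of 4 is below 12 * 0.5, so same line
  { str: "Next line", y: 686, height: 12 },
]);
const doc = joinPages([page1, "Page two"], (n, total) => `\n[page ${n}/${total}]\n`);
console.log(JSON.stringify(doc)); // "Hello world2\nNext line\n[page 1/2]\nPage two"
```

Scaling the threshold by line height, rather than using a fixed distance, keeps the rule meaningful across documents set in very different font sizes.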
Command-line interface for quick PDF processing implemented. (#32)
* Add CLI tool and tests for pdf-parse: Introduces a new CLI entry point (bin/cli.mjs) for extracting PDF metadata, text, images, screenshots, and tables. Adds argument parsing (bin/minimist.mjs), basic CLI tests (bin/cli.test.mjs), and registers the CLI in package.json as 'pdf-parse'.
* Add 'ss' alias and update image threshold option: Introduces 'ss' as an alias for the 'screenshot' command and changes the image size threshold option from '--imageThreshold' to '--min' for consistency. Updates help text and argument parsing accordingly.
* Add 'check' command to validate PDF headers from URL: Introduces a new 'check' command to the CLI that validates PDF file headers and format for files accessible via URL. Updates help text and examples, and adds supporting functions for header retrieval and output formatting.
* Add CLI magic option and expand tests, docs: Introduces a new --magic option to the CLI for PDF magic byte validation. Expands CLI test coverage for commands, options, and error handling. Adds comprehensive CLI usage documentation in docs/README.cli.md.
* Update README.md
* Refactor PDF header check to use 'magic' property: Replaces the 'isPdf' property with 'magic' in getHeader logic and updates CLI output, tests, and documentation accordingly. This clarifies the result of PDF magic bytes validation and improves consistency across the codebase.
* Add --large flag for optimized large PDF processing: Introduces a --large CLI flag to enable performance optimizations for large PDF files, including disabling auto-fetch, disabling streaming, and increasing range chunk size. Updates help text and documentation to describe the new flag and its usage.
Commit 1bfae05
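The 'ss' alias and the --large flag from this commit can be sketched in plain TypeScript. parseArgs and loadOptionsFor are hypothetical helpers, not code from bin/cli.mjs; only the three pdf.js getDocument() parameters (disableAutoFetch, disableStream, rangeChunkSize) are real, and the chunk size chosen for --large here is an assumption.

```typescript
// Hypothetical sketch: the real bin/cli.mjs uses a bundled minimist and may
// behave differently.
interface ParsedArgs {
  command: string;
  file?: string;
  flags: Set<string>;          // bare switches such as --large or --magic
  values: Map<string, string>; // --name=value options such as --min=100
}

const COMMAND_ALIASES = new Map<string, string>([["ss", "screenshot"]]);

// Minimal parser: the first argument is the command, the first non-option is
// the file. Only the --name=value form is handled for valued options here.
function parseArgs(argv: string[]): ParsedArgs {
  const [rawCommand = "", ...rest] = argv;
  const parsed: ParsedArgs = {
    command: COMMAND_ALIASES.get(rawCommand) ?? rawCommand,
    flags: new Set(),
    values: new Map(),
  };
  for (const arg of rest) {
    if (arg.startsWith("--")) {
      const eq = arg.indexOf("=");
      if (eq === -1) parsed.flags.add(arg.slice(2));
      else parsed.values.set(arg.slice(2, eq), arg.slice(eq + 1));
    } else if (parsed.file === undefined) {
      parsed.file = arg;
    }
  }
  return parsed;
}

// disableAutoFetch, disableStream and rangeChunkSize are pdf.js getDocument()
// parameters; the 1 MiB chunk size is an assumed value (pdf.js defaults to 64 KiB).
interface LoadOptions {
  disableAutoFetch?: boolean;
  disableStream?: boolean;
  rangeChunkSize?: number;
}

function loadOptionsFor(args: ParsedArgs): LoadOptions {
  if (!args.flags.has("large")) return {};
  return { disableAutoFetch: true, disableStream: true, rangeChunkSize: 1024 * 1024 };
}

const cli = parseArgs(["ss", "report.pdf", "--large", "--min=100"]);
console.log(cli.command, cli.file, cli.values.get("min"), loadOptionsFor(cli));
```

Disabling auto-fetch and streaming means pdf.js requests only the byte ranges it needs, which is why larger range chunks help on big files.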
NPM publish workflow now triggers only on published releases. (#33)
* Refactor NPM publish workflow and update CLI docs: Simplifies the GitHub Actions workflow by removing integration tests, only triggering on release publication, and improving npm publish tagging logic. Also updates CLI documentation to clarify the '--magic' option.
* Update publish_npm_package.yml
Commit 69eec3e
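The "npm publish tagging logic" is not spelled out in the log. A common approach, sketched here under that assumption, is to publish stable versions under the latest dist-tag and prereleases such as 2.4.4-beta.1 under their prerelease identifier, passing the result to npm publish --tag. distTagFor is a hypothetical helper, not code from publish_npm_package.yml.

```typescript
// Hypothetical helper illustrating dist-tag selection for `npm publish --tag`;
// the actual workflow's logic may differ.
// Stable versions publish as "latest"; prereleases such as 2.4.4-beta.1 use
// their first prerelease identifier ("beta") so a plain `npm install` keeps
// resolving to the last stable release.
function distTagFor(version: string): string {
  const semver = /^\d+\.\d+\.\d+(?:-([0-9A-Za-z-]+)(?:\.[0-9A-Za-z-]+)*)?(?:\+[0-9A-Za-z.-]+)?$/;
  const match = semver.exec(version);
  if (match === null) throw new Error(`Not a semver version: ${version}`);
  const pre = match[1];
  if (pre === undefined) return "latest";
  // npm rejects tags that parse as semver ranges (a purely numeric identifier
  // such as "0" would), so fall back to "next" in that case.
  return /^\d+$/.test(pre) ? "next" : pre;
}

for (const v of ["2.4.4", "2.4.4-beta.1"]) {
  console.log(v, "->", distTagFor(v));
}
// 2.4.4 -> latest
// 2.4.4-beta.1 -> beta
```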
Release v2.4.4 and update README formatting
Bumped package version from 2.4.4-beta.1 to 2.4.4 and removed the sideEffects field from package.json. README.md received minor formatting improvements and updated feature descriptions for clarity.
Commit c153bf7
Commits on Oct 20, 2025
Update API Extractor config and rename API docs
Changed API Extractor report file names and output folders for consistency. Renamed generated API documentation files to match new naming convention and added node API report file.
Commit 1e87edd
Commit 54937b6
The full diff (254 files) is too large to render on the comparison page. To view it locally, run:
git diff v2.4.3...v2.4.4