chore/change default split page behavior to true #118
Merged
* Set the split_pdf_page default to true
* Update the readme, add another reference back to our docs
* Change some warning logs to info. The user should not be warned about default behavior for non-PDF files
Klaijan (Contributor) approved these changes on Jun 17, 2024 and left a comment:
Tested with non-PDF files. The logging is at INFO level, and the output elements are then printed, e.g.:
INFO: Preparing to split document for partition.
INFO: Given file doesn't have '.pdf' extension, so splitting is not enabled.
INFO: Partitioning without split.
INFO: Successfully partitioned the document.
[
  {
    "type": "Title",
    "element_id": "7366bfb62015d7f749e8c38e7284a60c",
    "text": "Lorem ipsum dolor sit amet.",
    "metadata": {
      "category_depth": 0,
      "filename": "fake.doc",
      "languages": [
        "por",
        "cat"
      ],
      "filetype": "application/msword"
    }
  }
]
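The behavior in the log above (splitting on by default, skipped with an INFO message for non-PDF input) can be sketched roughly as follows. This is a minimal illustration and not the client's actual implementation: the function name `should_split`, the logger name, and the case-insensitive extension check are all assumptions made for this example. Only the log message text and the `split_pdf_page` parameter name come from this PR.

```python
import logging
from pathlib import Path

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("split_sketch")


def should_split(filename: str, split_pdf_page: bool = True) -> bool:
    """Return True only when splitting is requested and the file looks like a PDF.

    Hypothetical helper: the real client may decide this differently.
    """
    if not split_pdf_page:
        # The caller explicitly opted out of the (new) default.
        return False
    if Path(filename).suffix.lower() != ".pdf":
        # This is expected default behavior for non-PDF files,
        # so it is logged at INFO rather than WARNING.
        logger.info(
            "Given file doesn't have '.pdf' extension, so splitting is not enabled."
        )
        return False
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
    for name in ("fake.doc", "report.pdf"):
        print(name, should_split(name))
```

With the default left on, a caller who passes a `.doc` file sees one INFO line and gets an unsplit request. Passing `split_pdf_page=False` skips the check entirely.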
awalker4 added a commit to Unstructured-IO/unstructured-js-client that referenced this pull request on Jun 17, 2024
Mirror of Unstructured-IO/unstructured-python-client#118

* Set the split_pdf_page default to true and run `make client-generate` locally.
* Update the readme, add another reference back to our docs, bring back some autogenerated sections like in the python repo
* Change some warning logs to info. The user should not be warned about default behavior for non-PDF files

# Testing
Use the client locally and verify that split mode is the default, and that the dev experience is good

* Create a new test dir and run `npm init -y; npm install typescript tsx`
* Check out this branch and install from your test dir: `npm i file:~/repos/unstructured-js-client`
* Run this sample script with `npx tsx unstructured.ts`. Try some different files and verify that the logging and results look acceptable.

```
import { UnstructuredClient } from "unstructured-client";
import { PartitionResponse } from "unstructured-client/sdk/models/operations";
import { Strategy } from "unstructured-client/sdk/models/shared";
import * as fs from "fs";

const key = "free-api-key";

const client = new UnstructuredClient({
    security: {
        apiKeyAuth: key,
    },
});

const filename = "fake-html.html";
const data = fs.readFileSync(filename);

client.general.partition({
    partitionParameters: {
        files: {
            content: data,
            fileName: filename,
        },
        strategy: Strategy.Auto,
    }
}).then((res: PartitionResponse) => {
    if (res.statusCode == 200) {
        console.log(res.elements);
    }
}).catch((e) => {
    if (e.statusCode) {
        console.log(e.statusCode);
        console.log(e.body);
    } else {
        console.log(e);
    }
});
```
* ... `make client-generate` locally.

# Testing
Use the client locally and verify that split mode is the default, and that the client behavior is consistent with older versions.

* `pyenv virtualenv 3.12 unstructured-client; pyenv activate unstructured-client`
* `pip install -e .`
* ... `_sample_docs` and verify that the logging and results look acceptable.