| Input | 1 text file with 1 sentence per line |
| Output | top_k number of sentences that match input query |
| Jina version | 0.9.22 |
This is an example of using Jina's neural search framework to search through a selection of individual Wikipedia sentences downloaded from Kaggle. It's based on code generated by `jina hub new --type app`. It uses the `distilbert-base-uncased` language model from Transformers.
To test this example you can run a Docker image with 30,000 pre-indexed sentences:

    docker run -p 45678:45678 jinahub/app.example.wikipedia-sentences-30k:0.2.8-0.9.23

You can then query by running:

    curl --request POST -d '{"top_k": 10, "mode": "search", "data": ["text:hello world"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:45678/api/search'

To run the example yourself, first install the requirements:

    pip install -r requirements.txt

We'll start off by indexing a small dataset of 50 sentences (data/toy-input.txt) to make sure everything is working:

    python app.py -t index

To index the full dataset (almost 900 MB):
- Set up Kaggle
- Run the script:
  sh ./get_data.sh
- Set the input file:
  export JINA_DATA_FILE='data/input.txt'
- Set the number of docs to index:
  export JINA_MAX_DOCS=500
  (or whatever number you prefer. The default is 50)
- Delete the old index:
  rm -rf workspace
- Index your new dataset:
  python app.py -t index
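The two environment variables above control which file is read and how many lines are indexed. As a rough illustration only (this is not the code in app.py), a reader that honours them could look like the sketch below. The function name `read_sentences` and the fallback file `data/toy-input.txt` are assumptions made for this example; the fallback of 50 documents follows the default stated above.

```python
import os
from itertools import islice
from typing import Iterator


def read_sentences(default_file: str = "data/toy-input.txt",
                   default_max_docs: int = 50) -> Iterator[str]:
    """Yield up to JINA_MAX_DOCS non-empty lines from JINA_DATA_FILE.

    Hypothetical helper: the fallback file name is an assumption,
    not something taken from app.py.
    """
    data_file = os.environ.get("JINA_DATA_FILE", default_file)
    max_docs = int(os.environ.get("JINA_MAX_DOCS", default_max_docs))
    with open(data_file, encoding="utf-8") as f:
        # The input format is one sentence per line; blank lines are skipped.
        sentences = (line.strip() for line in f if line.strip())
        yield from islice(sentences, max_docs)
```

Calling `list(read_sentences())` after the `export` commands above would return at most 500 sentences from data/input.txt.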
To search with the REST API, run:

    python app.py -t query_restful

Then:

    curl --request POST -d '{"top_k": 10, "mode": "search", "data": ["text:hello world"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:45678/api/search'

Or use Jinabox with endpoint http://127.0.0.1:45678/api/search
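The same REST query can be sent from Python. The sketch below uses only the standard library and mirrors the payload and URL of the curl command above. The helper names `build_search_request` and `search` are made up for this example, and it assumes the endpoint answers with a JSON body, whose structure is not described here.

```python
import json
from urllib import request

# Same endpoint as in the curl example above.
SEARCH_URL = "http://0.0.0.0:45678/api/search"


def build_search_request(query: str, top_k: int = 10,
                         url: str = SEARCH_URL) -> request.Request:
    """Build a POST request carrying the same JSON payload as the curl call."""
    payload = {"top_k": top_k, "mode": "search", "data": [f"text:{query}"]}
    return request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, top_k: int = 10, url: str = SEARCH_URL):
    """Send the query and return the parsed response (assumed to be JSON)."""
    with request.urlopen(build_search_request(query, top_k, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the REST server running, `search("hello world")` sends the same request as the curl command.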
To search without the REST API, run:

    python app.py -t query

The following steps will create a Docker image with pre-indexed data and an open port for REST queries.
- Run all the steps in setup and index first. Don't run anything in the search step!
- If you want to push to Jina Hub be sure to edit the `LABEL`s in `Dockerfile` to avoid clashing with other images
- Run `docker build -t <your_image_name> .` in the root directory of this repo
- Run it with `docker run -p 45678:45678 <your_image_name>`
- Search using instructions from Search above
If you want to push your image to Jina Hub, please use the following name format for it; otherwise it will be rejected.
jinahub/type.kind.image-name:image-version-jina_version
For example:
jinahub/app.example.wikipedia-sentences-30k:0.2.8-0.9.23
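To check a name against this format before pushing, a small helper can assemble and parse it. This is a sketch only: the function names are hypothetical, and the regular expression (lowercase words for type and kind, three-part numeric versions) is an assumption inferred from the example above, not Jina Hub's actual validation rule.

```python
import re

# Assumed shape: jinahub/type.kind.image-name:image-version-jina_version
# The character classes below are guesses based on the example name.
IMAGE_NAME_PATTERN = re.compile(
    r"^jinahub/(?P<type>[a-z]+)\.(?P<kind>[a-z]+)\.(?P<name>[a-z0-9-]+)"
    r":(?P<image_version>\d+\.\d+\.\d+)-(?P<jina_version>\d+\.\d+\.\d+)$"
)


def build_image_name(type_: str, kind: str, name: str,
                     image_version: str, jina_version: str) -> str:
    """Assemble an image name in the documented format."""
    return f"jinahub/{type_}.{kind}.{name}:{image_version}-{jina_version}"


def parse_image_name(image_name: str) -> dict:
    """Split an image name into its parts, or raise ValueError if it does not fit."""
    match = IMAGE_NAME_PATTERN.match(image_name)
    if match is None:
        raise ValueError(f"unexpected image name format: {image_name!r}")
    return match.groupdict()
```

For instance, `build_image_name("app", "example", "wikipedia-sentences-30k", "0.2.8", "0.9.23")` reproduces the example name above.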
Push to Jina Hub
- Ensure hub is installed with `pip install jina[hub]`
- Run `jina hub login` and paste the code into your browser to authenticate
- Run `jina hub push <your_image_name>`