This repository packages Replit Code 1.3B as a Truss.
Replit Code 1.3B is an LLM released by Replit, optimized and trained for generating code autocompletions.
First, clone this repository:
git clone https://github.com/basetenlabs/truss-examples/
cd truss-examples/replit-code-1.3b-truss
Before deployment:
- Make sure you have a Baseten account and API key.
- Install the latest version of Truss:
pip install --upgrade truss
With replit-code-1.3b-truss as your working directory, you can deploy the model with:
truss push
Paste your Baseten API key if prompted.
For more information, see the Truss documentation.
We found this model runs reasonably fast on an A10G GPU; you can change the hardware under resources in config.yaml:
...
resources:
  cpu: "3"
  memory: 14Gi
  use_gpu: true
  accelerator: A10G
...
Before deployment:
- Make sure you have a Baseten account and API key. You can sign up for a Baseten account here.
- Install Truss and the Baseten Python client:
pip install --upgrade baseten truss
- Authenticate your development environment with:
baseten login
Deploying the Truss from a Python script is straightforward: load it and push it.
import baseten
import truss
replit_code_truss = truss.load('.')
baseten.deploy(replit_code_truss)
The usual GPT-style parameters pass straight through to the inference endpoint:
- max_new_tokens (default: 64)
- temperature (default: 0.5)
- top_p (default: 0.9)
- top_k (default: 0)
- num_beams (default: 4)
- do_sample (default: False)
Note that we recommend setting do_sample to True for best results, and
increasing the max_new_tokens parameter to 200-300.
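
Because the request body is JSON, Python's `True` has to be sent as lowercase `true`. The following is a minimal sketch of building such a body, assuming the parameter names and defaults listed above. `build_payload` and `DEFAULTS` are hypothetical names for illustration and are not part of Truss or Baseten; rejecting unlisted parameters is a choice made by this sketch, not documented model behavior.

```python
import json

# Parameter names and defaults as listed in this README.
# The model code remains the source of truth for what is accepted.
DEFAULTS = {
    "max_new_tokens": 64,
    "temperature": 0.5,
    "top_p": 0.9,
    "top_k": 0,
    "num_beams": 4,
    "do_sample": False,
}


def build_payload(prompt, **overrides):
    """Return a JSON request body (hypothetical helper, not a Truss API)."""
    # Catch typos early by only allowing the parameters listed above.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # Only the overrides are sent; omitted parameters keep their defaults.
    return json.dumps({"prompt": prompt, **overrides})


# Recommended settings from the note above: sampling on, longer output.
body = build_payload("def fib(n):", do_sample=True, max_new_tokens=300)
print(body)
```

`json.dumps` handles the boolean conversion, so the resulting string can be passed as the `-d` argument without hand-editing.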
truss predict -d '{"prompt": "def fib(n):", "do_sample": true, "max_new_tokens": 300}'
You can also invoke your model via a REST API:
curl -X POST "https://app.baseten.co/models/YOUR_MODEL_ID/predict" \
  -H "Content-Type: application/json" \
  -H 'Authorization: Api-Key {YOUR_API_KEY}' \
  -d '{
    "prompt": "def fib(n):",
    "do_sample": true,
    "max_new_tokens": 300
  }'
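
The same REST call can be made from Python. This is a rough sketch using only the standard library, assuming the URL pattern and `Api-Key` header shown in the curl command. `build_request` and `predict` are hypothetical helper names, and the assumption that the response body is JSON is not confirmed by this README.

```python
import json
import urllib.request


def build_request(model_id, api_key, payload):
    """Build a POST request mirroring the curl command (hypothetical helper)."""
    url = f"https://app.baseten.co/models/{model_id}/predict"
    return urllib.request.Request(
        url,
        # json.dumps turns Python True into JSON true and adds no trailing comma.
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Api-Key {api_key}",
        },
        method="POST",
    )


def predict(model_id, api_key, payload):
    """Send the request and parse the reply, assuming it is JSON."""
    with urllib.request.urlopen(build_request(model_id, api_key, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request makes no network call; substitute real values to send it.
request = build_request(
    "YOUR_MODEL_ID",
    "YOUR_API_KEY",
    {"prompt": "def fib(n):", "do_sample": True, "max_new_tokens": 300},
)
```

Calling `predict("YOUR_MODEL_ID", "YOUR_API_KEY", {...})` with a real model ID and API key sends the request.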