qwen-7b-chat

Qwen 7B Chat Truss

This is a Truss for Qwen-7B Chat. Qwen is a family of models developed by Alibaba Cloud. This LLM supports both English and Chinese.

Truss

Truss is an open-source model serving framework developed by Baseten. It allows you to develop and deploy machine learning models onto Baseten (and other platforms like AWS or GCP). Using Truss, you can develop a GPU model using live-reload, package models and their associated code, create Docker containers, and deploy on Baseten.

Deployment

First, clone this repository:

git clone https://github.com/basetenlabs/truss-examples/
cd qwen-7b-chat

Before deployment:

Make sure you have a Baseten account and API key.
Install the latest version of Truss: pip install --upgrade truss

With qwen-7b-chat as your working directory, you can deploy the model with:

truss push

Paste your Baseten API key if prompted.

For more information, see the Truss documentation.

Hardware notes

This seven billion parameter model requires an A10 GPU.
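One plausible way to read this requirement is as a back-of-the-envelope memory estimate. The sketch below assumes the weights are loaded in 16-bit precision (2 bytes per parameter) and uses the A10's nominal 24 GB of memory, treated as GiB for simplicity. Neither the precision nor the exact parameter count is stated in this README, so the numbers are only indicative.

```python
# Rough GPU memory estimate for the model weights alone.
PARAMS = 7e9          # "seven billion parameter model" (nominal count)
BYTES_PER_PARAM = 2   # assumption: float16/bfloat16 weights
A10_MEMORY_GIB = 24   # nominal A10 memory, treated as GiB here

weights_gib = PARAMS * BYTES_PER_PARAM / 1024**3
headroom_gib = A10_MEMORY_GIB - weights_gib  # left for activations and KV cache

print(f"weights: {weights_gib:.1f} GiB, headroom: {headroom_gib:.1f} GiB")
```

Under these assumptions the weights fit with room to spare for activations and the attention cache, which is consistent with the A10 recommendation.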

Qwen-7B Chat API documentation

This section provides an overview of the Qwen-7B Chat model, its parameters, and how to use it. The API consists of a single route named predict, which you can invoke to generate text based on the provided prompt.

API route: `predict`

The predict route is the primary method for generating text completions based on a given prompt. It takes several parameters:

prompt: The input text that you want the model to generate a response for.
stream (optional, default=False): A boolean determining whether the model should stream a response back. When True, the API returns generated text as it becomes available.
max_new_tokens (optional, default=512): The maximum number of tokens to return, counting input tokens. Maximum of 4096.
temperature (optional, default=0.5): Controls the randomness of the generated text. Higher values produce more diverse results, while lower values produce more deterministic results.
top_p (optional, default=0.95): The cumulative probability threshold for nucleus sampling. The model samples only from the smallest set of highest-probability tokens whose cumulative probability reaches this threshold.
top_k (optional, default=40): The number of top tokens to consider when sampling. The model will only consider the top_k highest-probability tokens.

The API also supports passing any parameter supported by the generate method in Hugging Face Transformers.
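To show how the parameters above fit together, here is a minimal Python sketch that assembles a request body for predict. The helper name build_payload and the client-side validation are hypothetical and not part of this Truss. The defaults and the 4096 cap are taken from the parameter list above. Extra keyword arguments (repetition_penalty is used as an example) are passed through unchanged, on the assumption that the model forwards them to generate.

```python
import json

# Defaults as documented in the parameter list above.
DEFAULTS = {
    "stream": False,
    "max_new_tokens": 512,
    "temperature": 0.5,
    "top_p": 0.95,
    "top_k": 40,
}
MAX_NEW_TOKENS_LIMIT = 4096  # documented maximum


def build_payload(prompt, **overrides):
    """Hypothetical helper: merge caller overrides into the documented defaults."""
    if not isinstance(prompt, str) or not prompt:
        raise ValueError("prompt must be a non-empty string")
    # Later entries win, so overrides replace defaults and extras are kept.
    payload = {"prompt": prompt, **DEFAULTS, **overrides}
    if not 0 < payload["max_new_tokens"] <= MAX_NEW_TOKENS_LIMIT:
        raise ValueError("max_new_tokens must be between 1 and 4096")
    return payload


payload = build_payload(
    "What is the meaning of life?",
    temperature=0.2,
    repetition_penalty=1.1,  # example of an extra generate parameter
)
print(json.dumps(payload, indent=2))
```

Sending the defaults explicitly is optional, since the model applies them when a key is omitted. Spelling them out here just makes the full request visible.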

Example usage

truss predict -d '{"prompt": "What is the meaning of life?", "max_new_tokens": 512}'

You can also invoke your model via a REST API:

curl -X POST "https://app.baseten.co/model_versions/YOUR_MODEL_VERSION_ID/predict" \
     -H "Content-Type: application/json" \
     -H "Authorization: Api-Key YOUR_API_KEY" \
     -d '{
           "prompt": "What is the meaning of life?",
           "max_new_tokens": 512
         }'
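The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl command above: same URL pattern, same headers, JSON body. The function names build_request and predict are hypothetical. The response format is not specified in this README, so the sketch assumes the body is UTF-8 text and yields it in chunks as they arrive, which also covers the stream=True case.

```python
import codecs
import json
import urllib.request

BASE_URL = "https://app.baseten.co/model_versions/{version_id}/predict"


def build_request(version_id, api_key, payload):
    """Build the same POST request as the curl command above."""
    return urllib.request.Request(
        BASE_URL.format(version_id=version_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Api-Key {api_key}",
        },
        method="POST",
    )


def predict(version_id, api_key, payload, chunk_size=1024):
    """Send the request and yield the response body as decoded text chunks.

    Assumption: the body is UTF-8 text. An incremental decoder is used so
    that multi-byte characters (e.g. Chinese) split across chunks decode
    correctly.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    request = build_request(version_id, api_key, payload)
    with urllib.request.urlopen(request) as response:
        while True:
            # read1 returns as soon as some data is available.
            chunk = response.read1(chunk_size)
            if not chunk:
                break
            text = decoder.decode(chunk)
            if text:
                yield text
    tail = decoder.decode(b"", final=True)  # flush any buffered bytes
    if tail:
        yield tail


# Usage (needs a deployed model and real credentials):
# body = {"prompt": "What is the meaning of life?", "stream": True}
# for text in predict("YOUR_MODEL_VERSION_ID", "YOUR_API_KEY", body):
#     print(text, end="", flush=True)
```

For a non-streaming call, joining the yielded chunks gives the complete response body.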

Model Output

The meaning of life is a philosophical question that has been debated throughout history. Different people have different beliefs and opinions about what the purpose of existence is, and there is no one definitive answer.

Some believe that the meaning of life is to seek happiness and fulfillment, while others think it is to serve a higher power or to fulfill a specific destiny. Some believe that life has no inherent meaning and that we must create our own purpose through our actions and experiences.

Ultimately, the meaning of life is a deeply personal and subjective concept that may vary from person to person. It is up to each individual to determine their own values and beliefs, and to create a purposeful life that aligns with those beliefs.
