Oracle Cloud Infrastructure (OCI) Generative AI Service is a fully managed service that provides a set of versatile language models covering a variety of use cases.
Oracle has released an SDK that makes it easy to call the OCI Generative AI services. However, many packaged projects still require some code changes before they can integrate with the OCI Generative AI services.
Because OpenAI's services are so widely used, their API format is supported by the vast majority of AI applications. To speed up application integration, this project was created to make the OCI Generative AI services compatible with the OpenAI API.
With this project, you can quickly integrate any application that supports a custom OpenAI endpoint, without modifying the application itself.
This is a project inspired by aws-samples/bedrock-access-gateway
- 20260119: Now support Response API for grok and openai models
- 20251223: Now support imported models, see oracle doc
- 20251223: Use OpenAI-compatible endpoint `/20231130/actions/v1/chat/completions` for meta and xai models
- 20251209: Now support Gemini models from Google
- 20251011: Add circuit breaker and exponential backoff. Thanks @munger1985
- 20251009: Deploy the app onto OCI OKE; refer here for more. Thanks @RahulMR42
- 20250925: Add Easy mode, where you can use environment variables to set the models without `models.yaml`
- 20250925: Support OpenAI gpt-oss models on OCI
- 20250925: Refactored the code using OCI and OpenAI's SDK, built an Adapter system, and made the code more robust
- 20250812: Support token usage for all chat models
- 20250626: Now support Grok models from XAI on Oracle Cloud
- 20250620: Made fixes and tests on the dedicated endpoint
- 20250619: Support OpenAI models on OCI
- 20250523: Fixed a series of major bugs in meta model function calls
- 20250407: Now support tool calls for both cohere and llama in stream/non-stream mode. Set `tool_call: true` and `stream_tool_call: true` in `models.yaml`
- 20250221: Now support image input for multimodal models like `meta.llama-3.2-90b-vision-instruct`
- 20250121: Add `gunicorn` to support parallel threads, getting a 9x speed up. Thanks to @streamnsight
- 20241219: Add a parameter `EMBED_TRUNCATE` in `config.py`. This is a parameter that OpenAI does not have. The default setting `END` will truncate input that exceeds the maximum token length and keep the beginning part.
- 20241031: Now you can run this app in Docker, made simpler thanks to @streamnsight
- 20241031: Add MIT license
- 20241022: Support LLM service deployed through the AI Quick Action of OCI Data Science; Optimize model configuration;
- 20240905: Support Instance principals auth. Thanks to @munger1985;
- 20240815: Add Dedicated AI Cluster support;
- 20240729: first commit;
- Create an OCI API Key, following Set authentication. Note that the key referenced by the `key_file` parameter in `.oci/config` needs to be located in the `/root/.oci/` directory, as this is the user folder for the Docker runtime environment;
- Run the following command, and it will download an image from GitHub and deploy it as a container. The environment variables `OCI_REGION` and `OCI_COMPARTMENT` should be configured according to the tenancy information.
```
docker run -it -p 8088:8088 \
    -v /root/.oci:/root/.oci \
    -e OCI_REGION="us-chicago-1" \
    -e OCI_COMPARTMENT="ocid1.compartment.oc1..xxxxxx" \
    --name oci-genai-access-gateway \
    ghcr.io/jin38324/oci-genai-access-gateway:v20251217
```
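Once the container is running, you can sanity-check the gateway before pointing a real application at it. The snippet below is a minimal sketch, assuming the container runs on the local machine on port 8088 and that the default API key `ocigenerativeai` has not been changed:

```python
from openai import OpenAI

# Point the standard OpenAI client at the gateway container started above.
# localhost and the default API key are assumptions; adjust them to your deployment.
client = OpenAI(
    api_key="ocigenerativeai",
    base_url="http://localhost:8088/v1/",
)

# Listing models is a cheap way to confirm the gateway is reachable
# and can load its model definitions.
for model in client.models.list():
    print(model.id)
```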
- Clone this repository;
- Finish prerequisites, following Set Prerequisites;
- Run this app:

  In directory `./app` run uvicorn:

  ```
  python app.py
  ```

  or use `gunicorn` to enable parallel threads (only supported on Linux):

  ```
  gunicorn app:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --timeout 600 --bind 0.0.0.0:8088
  ```
Make sure the `key_file` parameter in the user's `~/.oci/config` points into `~/.oci`, where the config file and private key are located.

```
docker build -t oci_genai_gateway_local .

docker run -it -p 8088:8088 \
    -v /root/.oci:/root/.oci \
    -e OCI_REGION="us-chicago-1" \
    -e OCI_COMPARTMENT="ocid1.compartment.oc1..xxxxxx" \
    --name oci-genai-access-gateway \
    oci_genai_gateway_local
```
- Configure your application: set `API Key` and `Host`, like this (this is an example in Cherry Studio):
It's OK now!
Install the Python dependencies:

```
pip install -r requirements.txt
```
Create access authentication for OCI. There are two ways to achieve this:
- Use an API key. This needs a little effort, but it is easy to understand and can be used everywhere.
- Use an instance principal. This is easy to set up, but only available on OCI host machines.
Option 1: Use API Key

Create a config file on the OCI console, following SDK and CLI Configuration File.

Set the config.py file to point to your config location, like `OCI_CONFIG_FILE = "~/.oci/config"`.
Option 2: Use instance principal

Set an OCI policy:

```
allow dynamic-group <xxxx> to manage generative-ai-family in tenancy
```

where `xxxx` is the dynamic group that contains your VM or other resources.

In config.py, set `AUTH_TYPE=INSTANCE_PRINCIPAL`.
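For reference, the two options map onto the OCI Python SDK roughly as sketched below. This is an illustrative sketch only, not the gateway's actual code; the config path and profile name are assumptions that should match your own setup:

```python
import oci

AUTH_TYPE = "API_KEY"  # or "INSTANCE_PRINCIPAL", mirroring the AUTH_TYPE setting described above

if AUTH_TYPE == "API_KEY":
    # Option 1: read the profile created from the OCI console; the key_file entry
    # inside it must point to the private key on this machine.
    config = oci.config.from_file("~/.oci/config", "DEFAULT")
    signer = oci.signer.Signer(
        tenancy=config["tenancy"],
        user=config["user"],
        fingerprint=config["fingerprint"],
        private_key_file_location=config["key_file"],
    )
else:
    # Option 2: instance principal auth, available only on OCI compute hosts that
    # belong to a dynamic group covered by the policy shown above.
    config = {}
    signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()

# The resulting (config, signer) pair is what OCI SDK clients,
# such as oci.generative_ai_inference.GenerativeAiInferenceClient, accept.
print("Signer ready:", type(signer).__name__)
```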
You can set environment variables to use the models. Below are the minimal required environment variables; more settings can be found in [Other settings](#other-settings).
On Windows PowerShell:

```
$env:OCI_REGION="us-chicago-1"
$env:OCI_COMPARTMENT = "ocid1.compartment.oc1..aaaxxxxxx"
```

On Linux:

```
export OCI_REGION="us-chicago-1"
export OCI_COMPARTMENT="ocid1.compartment.oc1..aaaxxxxxx"
```

If you want to call models in different regions or compartments, you can modify the models.yaml file.
Generative AI is a rapidly evolving field, with new models being added and old models being retired.
So I abandoned hard-coding model information in the code and instead define the models through models.yaml.
Don't worry, most of the models are already defined in the file; you just need to use them.
You can change models.yaml in your runtime if new models are available.
You can define 4 types of models:
- ondemand: pre-trained chat model provided by OCI generative AI service, accessed through a unified API.
- embedding: pre-trained embedding model provided by OCI generative AI service, accessed through a unified API.
- dedicated: a model hosted on the OCI Generative AI service's dedicated AI cluster; the model to be accessed is determined by specifying the endpoint
- datascience: LLM service deployed through the AI Quick Action function of OCI Data Science.
AI Quick Actions makes it easy for you to browse foundation models, and deploy, fine-tune, and evaluate them inside Data Science notebooks.
The model to be accessed is determined by specifying the endpoint, which should end with `/predict`. When creating the Data Science deployment, `Inference mode` should be `/v1/chat/completions` and `Inference container` should be `VLLM`.
Model information parameters:
- `region`: OCI services are provided in multiple regions, so you can configure the region to be called
- `compartment_id`: Required; this parameter determines the compartment where the service is initiated, which is basically related to cost and permission management
- `name`: a custom name, any legal string is fine
- `model_id`: the standard model ID
- `endpoint`: the call endpoint, which can be viewed through the OCI console
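To make these parameters concrete, here is a purely illustrative sketch that builds one hypothetical model entry in Python and prints it as YAML. The field names follow the list above, but the exact layout of models.yaml (top-level keys, nesting, extra flags such as `tool_call`) should be taken from the file shipped with this repository, not from this sketch:

```python
import yaml  # PyYAML

# Hypothetical entry built only from the parameters documented above; the real
# models.yaml in this repository may nest or name things differently.
example_model = {
    "name": "my-chat-model",                         # custom name exposed to OpenAI clients
    "model_id": "meta.llama-3.3-70b-instruct",       # assumed model ID, check the OCI console
    "region": "us-chicago-1",                        # region where the service is called
    "compartment_id": "ocid1.compartment.oc1..xxx",  # compartment, mainly for cost and permissions
    "endpoint": None,                                # only needed for dedicated/datascience models
}

print(yaml.safe_dump({"models": [example_model]}, sort_keys=False))
```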
These settings have default values; you can modify the config.py file to change the default settings.
| Variable Name | Description | Default Value |
|---|---|---|
| OCI_CONFIG_FILE | OCI config file location | "~/.oci/config" |
| OCI_CONFIG_FILE_KEY | multiple configs can be added in one config file, so you can use this key to determine which one to use | "DEFAULT" |
| PORT | service HTTP port | 8088 |
| RELOAD | if True, the web service will reload if any file changes in the project | True |
| DEBUG | if True, more logs will be displayed | True |
| DEFAULT_API_KEYS | authorization token for the API | ocigenerativeai |
| API_ROUTE_PREFIX | API URL prefix | "/v1" |
| AUTH_TYPE | API_KEY or INSTANCE_PRINCIPAL | "API_KEY" |
| GUNICORN_WORKERS | number of gunicorn workers; impacts the number of concurrent requests | 4 |
| GUNICORN_TIMEOUT | gunicorn timeout, in seconds | 600 |
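For example, overriding a few of these defaults is just a matter of editing the corresponding constants in config.py; the earlier `OCI_CONFIG_FILE = "~/.oci/config"` example suggests they are plain module-level assignments. Treat this as a sketch and check the variable names and types in your copy of config.py:

```python
# Hypothetical excerpt of config.py with a few defaults changed;
# the variable names follow the table above.
OCI_CONFIG_FILE = "~/.oci/config"
OCI_CONFIG_FILE_KEY = "DEFAULT"
PORT = 8080                          # serve on a different HTTP port
DEBUG = False                        # reduce log output
DEFAULT_API_KEYS = "my-secret-key"   # token your OpenAI clients must send; check config.py for the expected type
API_ROUTE_PREFIX = "/v1"
AUTH_TYPE = "API_KEY"
GUNICORN_WORKERS = 8                 # more workers allow more concurrent requests
GUNICORN_TIMEOUT = 600               # seconds
```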
```python
from openai import OpenAI

client = OpenAI(
    api_key="ocigenerativeai",
    base_url="http://xxx.xxx.xxx.xxx:8088/v1/",
)

models = client.models.list()
for model in models:
    print(model.id)
```

For more examples, please check the notebook Endpoint test.ipynb.
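Beyond listing models, any standard OpenAI call works the same way. The sketch below streams a chat completion and requests an embedding through the gateway; the model names are assumptions and must match entries in your models.yaml:

```python
from openai import OpenAI

client = OpenAI(
    api_key="ocigenerativeai",
    base_url="http://xxx.xxx.xxx.xxx:8088/v1/",
)

# Stream a chat completion; replace the model name with a chat model from your models.yaml.
stream = client.chat.completions.create(
    model="meta.llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Say hello from OCI Generative AI."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

# Request an embedding; replace the model name with an embedding model from your models.yaml.
embedding = client.embeddings.create(
    model="cohere.embed-multilingual-v3.0",
    input=["hello world"],
)
print(len(embedding.data[0].embedding))
```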

