Oracle Cloud Infrastructure (OCI) Generative AI Service is a fully managed service that provides a set of versatile language models covering a variety of use cases.
Oracle has released an SDK that makes it easy to call the OCI Generative AI services. However, many packaged projects still require some code changes before they can integrate with the OCI Generative AI services.
Because OpenAI's services are so widely used, their API format is supported by the vast majority of AI applications. To speed up application integration, this project was created to make the OCI Generative AI services compatible with the OpenAI API.
With this project, you can quickly integrate any application that supports a custom OpenAI endpoint, without modifying the application itself.
This is a project inspired by aws-samples/bedrock-access-gateway
- 20260119: Now support Response API for grok and openai models
- 20251223: Now support imported models, see oracle doc
- 20251223: Use OpenAI-compatible endpoint `/20231130/actions/v1/chat/completions` for meta and xai models
- 20251209: Now support Gemini models from Google
- 20251011: Add circuit breaker and exponential backoff. Thanks @munger1985
- 20251009: Deploy the app onto OCI OKE; refer here for more. Thanks @RahulMR42
- 20250925: Add Easy mode, where you can use environment variables to set the models without `models.yaml`
- 20250925: Support OpenAI gpt-oss models on OCI
- 20250925: Refactored the code using OCI and OpenAI's SDK, built an Adapter system, and made the code more robust
- 20250812: Support token usage for all chat models
- 20250626: Now support Grok models from XAI on Oracle Cloud
- 20250620: Made fixes and tests on the dedicated endpoint
- 20250619: Support OpenAI models on OCI
- 20250523: Fixed a series of major bugs in meta model function calls
- 20250407: Now support tool calls for both cohere and llama in stream/non-stream mode. Set `tool_call: true` and `stream_tool_call: true` in `models.yaml`
- 20250221: Now support image input for multimodal models like `meta.llama-3.2-90b-vision-instruct`
- 20250121: Add `gunicorn` to support parallel threads, getting a 9x speed up. Thanks to @streamnsight
- 20241219: Add a parameter `EMBED_TRUNCATE` in `config.py`. This is a parameter that OpenAI does not have. The default setting `END` will truncate input that exceeds the maximum token length and keep the beginning part.
- 20241031: Now you can run this app in Docker, made simpler thanks to @streamnsight
- 20241031: Add MIT license
- 20241022: Support LLM service deployed through the AI Quick Action of OCI Data Science; Optimize model configuration;
- 20240905: Support Instance principals auth. Thanks to @munger1985;
- 20240815: Add Dedicated AI Cluster support;
- 20240729: first commit;
- Create an OCI API Key, following Set authentication. Note that the key referenced by the `key_file` parameter in `.oci/config` needs to be located in the `/root/.oci/` directory, as this is the user folder for the Docker runtime environment;
- Run the following command, and it will download an image from GitHub and deploy it as a container. The environment variables `OCI_REGION` and `OCI_COMPARTMENT` should be configured according to the tenancy information.
```
docker run -it -p 8088:8088 \
    -v /root/.oci:/root/.oci \
    -e OCI_REGION="us-chicago-1" \
    -e OCI_COMPARTMENT="ocid1.compartment.oc1..xxxxxx" \
    --name oci-genai-access-gateway \
    ghcr.io/jin38324/oci-genai-access-gateway:v20251217
```
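Once the container is running, you can sanity-check the gateway before pointing a real application at it. The snippet below is a minimal sketch, assuming the container runs on the local machine on port 8088 and that the default API key `ocigenerativeai` has not been changed:

```python
from openai import OpenAI

# Point the standard OpenAI client at the gateway container started above.
# localhost and the default API key are assumptions; adjust them to your deployment.
client = OpenAI(
    api_key="ocigenerativeai",
    base_url="http://localhost:8088/v1/",
)

# Listing models is a cheap way to confirm the gateway is reachable
# and can load its model definitions.
for model in client.models.list():
    print(model.id)
```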
- Clone this repository;
- Finish prerequisites, following Set Prerequisites;
- Run this app:

  In directory `./app` run uvicorn:

  ```
  python app.py
  ```

  or use `gunicorn` to enable parallel threads (only supported on Linux):

  ```
  gunicorn app:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --timeout 600 --bind 0.0.0.0:8088
  ```
Make sure the `key_file` parameter in the user's `~/.oci/config` points into `~/.oci`, where the config file and private key are located.

```
docker build -t oci_genai_gateway_local .

docker run -it -p 8088:8088 \
    -v /root/.oci:/root/.oci \
    -e OCI_REGION="us-chicago-1" \
    -e OCI_COMPARTMENT="ocid1.compartment.oc1..xxxxxx" \
    --name oci-genai-access-gateway \
    oci_genai_gateway_local
```
- Configure your application: set `API Key` and `Host`, like this (this is an example in Cherry Studio):
It's OK now!
Install the Python dependencies:

```
pip install -r requirements.txt
```
Create access authentication for OCI. There are two ways to achieve this:
- Use an API key. This needs a little effort, but it is easy to understand and can be used everywhere.
- Use an instance principal. This is easy to set up, but only available on OCI host machines.
Option 1: Use API Key

Create a config file on the OCI console, following SDK and CLI Configuration File.

Set the config.py file to point to your config location, like `OCI_CONFIG_FILE = "~/.oci/config"`.
Option 2: Use instance principal

Set an OCI policy:

```
allow dynamic-group <xxxx> to manage generative-ai-family in tenancy
```

where `xxxx` is the dynamic group that contains your VM or other resources.

In config.py, set `AUTH_TYPE=INSTANCE_PRINCIPAL`.
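For reference, the two options map onto the OCI Python SDK roughly as sketched below. This is an illustrative sketch only, not the gateway's actual code; the config path and profile name are assumptions that should match your own setup:

```python
import oci

AUTH_TYPE = "API_KEY"  # or "INSTANCE_PRINCIPAL", mirroring the AUTH_TYPE setting described above

if AUTH_TYPE == "API_KEY":
    # Option 1: read the profile created from the OCI console; the key_file entry
    # inside it must point to the private key on this machine.
    config = oci.config.from_file("~/.oci/config", "DEFAULT")
    signer = oci.signer.Signer(
        tenancy=config["tenancy"],
        user=config["user"],
        fingerprint=config["fingerprint"],
        private_key_file_location=config["key_file"],
    )
else:
    # Option 2: instance principal auth, available only on OCI compute hosts that
    # belong to a dynamic group covered by the policy shown above.
    config = {}
    signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()

# The resulting (config, signer) pair is what OCI SDK clients,
# such as oci.generative_ai_inference.GenerativeAiInferenceClient, accept.
print("Signer ready:", type(signer).__name__)
```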
You can set environment variables to use the models. Below are the minimal required environment variables; more settings can be found in [Other settings](#other-settings).
On Windows PowerShell:

```
$env:OCI_REGION="us-chicago-1"
$env:OCI_COMPARTMENT = "ocid1.compartment.oc1..aaaxxxxxx"
```

On Linux:

```
export OCI_REGION="us-chicago-1"
export OCI_COMPARTMENT="ocid1.compartment.oc1..aaaxxxxxx"
```

If you want to call models in different regions or compartments, you can modify the models.yaml file.
Generative AI is a rapidly evolving field, with new models being added and old models being retired.
So I abandoned hard-coding model information in the code and instead define the models through models.yaml.
Don't worry, most of the models are already defined in the file; you just need to use them.
You can change models.yaml in your runtime if new models are available.
You can define 4 types of models:
- ondemand: pre-trained chat model provided by OCI generative AI service, accessed through a unified API.
- embedding: pre-trained embedding model provided by OCI generative AI service, accessed through a unified API.
- dedicated: a model hosted on the OCI Generative AI service's dedicated AI cluster; the model to be accessed is determined by specifying the endpoint
- datascience: LLM service deployed through the AI Quick Action function of OCI Data Science.
AI Quick Actions makes it easy for you to browse foundation models, and deploy, fine-tune, and evaluate them inside Data Science notebooks.
The model to be accessed is determined by specifying the endpoint, which should end with `/predict`. When creating the Data Science deployment, `Inference mode` should be `/v1/chat/completions` and `Inference container` should be `VLLM`.
Model information parameters:
- `region`: OCI services are provided in multiple regions, so you can configure the region to be called
- `compartment_id`: Required; this parameter determines the compartment where the service is initiated, which is basically related to cost and permission management
- `name`: a custom name, any legal string is fine
- `model_id`: the standard model ID
- `endpoint`: the call endpoint, which can be viewed through the OCI console
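To make these parameters concrete, here is a purely illustrative sketch that builds one hypothetical model entry in Python and prints it as YAML. The field names follow the list above, but the exact layout of models.yaml (top-level keys, nesting, extra flags such as `tool_call`) should be taken from the file shipped with this repository, not from this sketch:

```python
import yaml  # PyYAML

# Hypothetical entry built only from the parameters documented above; the real
# models.yaml in this repository may nest or name things differently.
example_model = {
    "name": "my-chat-model",                         # custom name exposed to OpenAI clients
    "model_id": "meta.llama-3.3-70b-instruct",       # assumed model ID, check the OCI console
    "region": "us-chicago-1",                        # region where the service is called
    "compartment_id": "ocid1.compartment.oc1..xxx",  # compartment, mainly for cost and permissions
    "endpoint": None,                                # only needed for dedicated/datascience models
}

print(yaml.safe_dump({"models": [example_model]}, sort_keys=False))
```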
These settings have default values; you can modify the config.py file to change the default settings.
| Variable Name | Description | Default Value |
|---|---|---|
| OCI_CONFIG_FILE | OCI config file location | "~/.oci/config" |
| OCI_CONFIG_FILE_KEY | multiple configs can be added in one config file, so you can use this key to determine which one to use | "DEFAULT" |
| PORT | service HTTP port | 8088 |
| RELOAD | if True, the web service will reload if any file changes in the project | True |
| DEBUG | if True, more logs will be displayed | True |
| DEFAULT_API_KEYS | authorization token for the API | ocigenerativeai |
| API_ROUTE_PREFIX | API URL prefix | "/v1" |
| AUTH_TYPE | API_KEY or INSTANCE_PRINCIPAL | "API_KEY" |
| GUNICORN_WORKERS | number of gunicorn workers; impacts the number of concurrent requests | 4 |
| GUNICORN_TIMEOUT | gunicorn timeout, in seconds | 600 |
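For example, overriding a few of these defaults is just a matter of editing the corresponding constants in config.py; the earlier `OCI_CONFIG_FILE = "~/.oci/config"` example suggests they are plain module-level assignments. Treat this as a sketch and check the variable names and types in your copy of config.py:

```python
# Hypothetical excerpt of config.py with a few defaults changed;
# the variable names follow the table above.
OCI_CONFIG_FILE = "~/.oci/config"
OCI_CONFIG_FILE_KEY = "DEFAULT"
PORT = 8080                          # serve on a different HTTP port
DEBUG = False                        # reduce log output
DEFAULT_API_KEYS = "my-secret-key"   # token your OpenAI clients must send; check config.py for the expected type
API_ROUTE_PREFIX = "/v1"
AUTH_TYPE = "API_KEY"
GUNICORN_WORKERS = 8                 # more workers allow more concurrent requests
GUNICORN_TIMEOUT = 600               # seconds
```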
```python
from openai import OpenAI

client = OpenAI(
    api_key="ocigenerativeai",
    base_url="http://xxx.xxx.xxx.xxx:8088/v1/",
)

models = client.models.list()
for model in models:
    print(model.id)
```

For more examples, please check the notebook Endpoint test.ipynb.
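Beyond listing models, any standard OpenAI call works the same way. The sketch below streams a chat completion and requests an embedding through the gateway; the model names are assumptions and must match entries in your models.yaml:

```python
from openai import OpenAI

client = OpenAI(
    api_key="ocigenerativeai",
    base_url="http://xxx.xxx.xxx.xxx:8088/v1/",
)

# Stream a chat completion; replace the model name with a chat model from your models.yaml.
stream = client.chat.completions.create(
    model="meta.llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Say hello from OCI Generative AI."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

# Request an embedding; replace the model name with an embedding model from your models.yaml.
embedding = client.embeddings.create(
    model="cohere.embed-multilingual-v3.0",
    input=["hello world"],
)
print(len(embedding.data[0].embedding))
```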

