@deepinfra

Deep Infra

Inference cloud

Popular repositories

  1. deepctl Public

    Command line tool for Deep Infra cloud ML inference service

    Rust 34 3

  2. deepinfra-node Public

    Official TypeScript wrapper for DeepInfra Inference API

    TypeScript 20 3

  3. text-generation-inference Public

    Forked from huggingface/text-generation-inference

    Large Language Model Text Generation Inference

    Python 9 2

  4. ocr-tools Public

    Python 5 2

  5. langchain Public

    Forked from langchain-ai/langchain

    ⚡ Building applications with LLMs through composability ⚡

    Python 1

  6. deepinfra-chat Public

    Sample Next.js AI chat app using Deep Infra inference and the Vercel AI SDK

    TypeScript 1 2

Repositories

Showing 10 of 42 repositories
  • dynamo Public Forked from ai-dynamo/dynamo

    A Datacenter Scale Distributed Inference Serving Framework

    Rust 0 924 0 0 Updated Mar 13, 2026
  • TensorRT-LLM Public Forked from NVIDIA/TensorRT-LLM

    TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

    Python 0 2,204 0 1 Updated Mar 12, 2026
  • vllm-omni Public Forked from vllm-project/vllm-omni

    A framework for efficient model inference with omni-modality models

    Python 0 Apache-2.0 540 0 0 Updated Mar 6, 2026
  • docs Public
    MDX 0 MIT 0 0 2 Updated Mar 2, 2026
  • vllm Public Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python 0 Apache-2.0 14,556 0 1 Updated Feb 27, 2026
  • hub-docs Public Forked from huggingface/hub-docs

    Docs of the Hugging Face Hub

    Handlebars 0 Apache-2.0 449 0 0 Updated Feb 24, 2026
  • huggingface.js Public Forked from huggingface/huggingface.js

    Use Hugging Face with JavaScript

    TypeScript 0 MIT 668 0 0 Updated Feb 23, 2026
  • tiktoken Public Forked from openai/tiktoken

    tiktoken is a fast BPE tokeniser for use with OpenAI's models.

    Python 0 MIT 1,466 0 0 Updated Feb 8, 2026
  • Model-Optimizer Public Forked from NVIDIA/Model-Optimizer

    A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.

    Python 0 Apache-2.0 289 0 0 Updated Jan 12, 2026
  • cookbooks Public

    A collection of cookbooks, tutorials, and examples for using AI models on DeepInfra. This repository provides practical guides, performance benchmarks, and production-ready code examples to help developers build with AI models efficiently. Each cookbook includes comprehensive Jupyter notebooks, benchmarking suites, and real-world use case examples.

    Jupyter Notebook 0 0 0 0 Updated Dec 15, 2025

