cturan/piramit


Piramit [ARCHIVED]

Status: Archived / Test Project

This repository is an archived test project. Active development has ceased completely.

Project Overview and System Purpose

Piramit is an experimental Large Language Model (LLM) inference engine written in Rust, leveraging Vulkan for GPU acceleration. The main goal is to load and run language models efficiently.

How the system broadly operates:

  1. It takes a standard model (e.g., in safetensors format) and converts it into its custom format called .cmf.
  2. It loads this converted .cmf format into GPU memory using Vulkan.
  3. It runs tensor operations as compute shaders on the GPU to generate text from user prompts, and serves the output over HTTP through an Axum web server.

What is the .cmf Format?

CMF (Custom/Compiled Model Format): A custom, single-file model format designed specifically for the Piramit project. It is generated from standard models using the project's internal cmf-convert tool.
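
As a rough illustration of what a single-file model format can look like, the sketch below packs a configuration blob, a tokenizer blob, and raw tensor bytes into one buffer as length-prefixed sections. The actual .cmf layout is not described in this README; the `CMF0` magic tag, the section order, and the `pack`/`unpack` helpers are all assumptions made for this example.

```rust
use std::io::{self, Cursor, Read, Write};

// Hypothetical magic tag; the real .cmf header is not documented here.
const MAGIC: &[u8; 4] = b"CMF0";

/// Writes one section as a u64 little-endian length followed by the bytes.
fn write_section<W: Write>(w: &mut W, bytes: &[u8]) -> io::Result<()> {
    w.write_all(&(bytes.len() as u64).to_le_bytes())?;
    w.write_all(bytes)
}

/// Reads one section previously written by `write_section`.
/// A real loader would bound the length before allocating.
fn read_section<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len = [0u8; 8];
    r.read_exact(&mut len)?;
    let mut buf = vec![0u8; u64::from_le_bytes(len) as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

/// Packs config, tokenizer, and tensor bytes into a single buffer.
fn pack(config: &[u8], tokenizer: &[u8], tensors: &[u8]) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    out.write_all(MAGIC)?;
    for section in [config, tokenizer, tensors] {
        write_section(&mut out, section)?;
    }
    Ok(out)
}

/// Unpacks the three sections, rejecting input with the wrong magic tag.
fn unpack(file: &[u8]) -> io::Result<(Vec<u8>, Vec<u8>, Vec<u8>)> {
    let mut r = Cursor::new(file);
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    if &magic != MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic"));
    }
    let config = read_section(&mut r)?;
    let tokenizer = read_section(&mut r)?;
    let tensors = read_section(&mut r)?;
    Ok((config, tokenizer, tensors))
}
```

Calling `unpack(&pack(config, tokenizer, tensors)?)` returns the three sections unchanged. The benefit of a layout in this spirit is that a loader needs only one file handle and one sequential read.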

  • Purpose & Benefits: It applies mixed quantization (storing weights at different precisions such as Q4, Q6, Q8, and f16) so that the model occupies significantly less memory. It also embeds the model configuration (config.json) and the tokenizer (tokenizer.json) in the same binary file, so the system can read everything it needs from a single file and load it directly onto the GPU at run time.
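
To make the memory saving concrete, here is a minimal sketch of symmetric 8-bit block quantization, one common way to implement a "Q8"-style precision. It is not taken from Piramit's source: the `BlockQ8` type and both function names are hypothetical, and the project's real quantization scheme may differ.

```rust
/// One quantized block: a shared f32 scale plus one signed byte per weight.
/// (Hypothetical type, for illustration only.)
struct BlockQ8 {
    scale: f32,
    values: Vec<i8>,
}

/// Quantizes a block of weights to 8 bits using a single per-block scale.
fn quantize_block(weights: &[f32]) -> BlockQ8 {
    // The largest magnitude in the block maps to +/-127.
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero for an all-zero block.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let values = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    BlockQ8 { scale, values }
}

/// Recovers approximate f32 weights from a quantized block.
fn dequantize_block(block: &BlockQ8) -> Vec<f32> {
    block.values.iter().map(|&q| q as f32 * block.scale).collect()
}
```

Each weight shrinks from four bytes to one, plus a small per-block overhead for the scale. The cost is a rounding error of at most about half a scale step per weight.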

Final Known State and Critical Bugs

This section is derived from final developer notes indicating why development was halted.

  • GPU Upload Performance: Generally working fine, displaying good capability.
  • Caching Mechanism: Improved and working better than previous iterations.
  • Concurrency Failure: [CRITICAL] The project fails when running 4 or more concurrent operations. Instead of coherent text, it outputs gibberish.
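
One way to check for a failure of this kind is to compare concurrent output against a sequential baseline. The sketch below is not part of the project: `generate` is a hypothetical, deterministic stand-in for the engine's generation call, and the comparison is only meaningful if the real engine decodes deterministically (for example, with greedy sampling).

```rust
use std::thread;

/// Hypothetical stand-in for the engine's generation call.
/// It is deterministic here so that outputs can be compared exactly.
fn generate(prompt: &str) -> String {
    prompt.chars().rev().collect()
}

/// Runs every prompt on its own thread and returns the indices of prompts
/// whose concurrent output differs from the sequential baseline.
fn find_divergent(prompts: &[&str]) -> Vec<usize> {
    // Baseline: one request at a time.
    let baseline: Vec<String> = prompts.iter().map(|p| generate(p)).collect();

    // Concurrent run: one thread per prompt.
    let handles: Vec<_> = prompts
        .iter()
        .map(|p| {
            let p = p.to_string();
            thread::spawn(move || generate(&p))
        })
        .collect();

    handles
        .into_iter()
        .enumerate()
        .filter_map(|(i, h)| {
            let out = h.join().expect("worker thread panicked");
            if out != baseline[i] { Some(i) } else { None }
        })
        .collect()
}
```

With a correct engine behind `generate`, `find_divergent` returns an empty list at any concurrency level. A non-empty result at four or more prompts would match the failure described above.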

Because of this critical concurrency failure and the project's broad scope, development has been paused with no current plans to resume, and the repository is archived.
