CloneGuard is a Python command-line tool that inspects a public GitHub repository for suspicious or high-risk patterns using the GitHub API, without cloning files locally.
It is designed to help with fast risk triage, not to prove a repository is safe.
CloneGuard does not guarantee safety. It only flags suspicious behavior based on rule-based heuristics and optional AI summarization.
- Prompt-based CLI workflow for repository checks
- GitHub URL validation
- GitHub API token-based repository inspection
- Selectable scan mode: `quick` (faster, reduced coverage) or `deep` (full coverage)
- Live progress output with percentages during fetch and scan phases
- Recursive rule-based scanner with modular pattern definitions
- Detection of common suspicious patterns, including:
  - `curl | bash`-style execution
  - `wget` download-and-run flows
  - Encoded PowerShell execution
  - `eval`/`exec` with encoded payloads
  - Base64 decode-then-execute patterns
  - Risky install hooks (`postinstall`, `install`)
  - Potential hardcoded secrets
  - Suspicious subprocess shell usage
  - Potential obfuscation/exfiltration indicators
  - Suspicious filenames
- Risk score + risk level (`LOW`, `MEDIUM`, `HIGH`)
- Condensed findings output grouped by file and rule to reduce duplicate noise
- Clean terminal output with colored status messages
- Optional AI summary when `GEMINI_API_KEY` is set
- Optional terminal risk chat with Gemini
- Optional ElevenLabs cloned-voice text-to-speech for chat responses
- Graceful error handling for invalid URLs, missing API keys, API failures, and scan issues
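A few of the rule-based patterns above can be sketched as compiled regexes. The names, severities, and reasons below are a hypothetical subset for illustration; the real definitions live in `cloneguard/patterns.py`:

```python
import re

# Hypothetical subset of rule definitions; the actual rules in
# cloneguard/patterns.py are more extensive and tuned.
SUSPICIOUS_PATTERNS = {
    "curl_pipe_bash": (
        re.compile(r"curl\s+[^\n|]*\|\s*(ba)?sh"),
        "HIGH",
        "Piping a remote script straight into a shell hides its contents.",
    ),
    "base64_decode_exec": (
        re.compile(r"base64\s+(-d|--decode)[^\n]*\|\s*(ba)?sh"),
        "HIGH",
        "Decoding base64 and executing the result is a common obfuscation trick.",
    ),
    "encoded_powershell": (
        re.compile(r"powershell[^\n]*-enc(odedcommand)?\s", re.IGNORECASE),
        "HIGH",
        "Encoded PowerShell commands can hide malicious behavior.",
    ),
}

def scan_text(text: str):
    """Return (rule_name, severity) for every rule that matches the text."""
    return [
        (name, severity)
        for name, (pattern, severity, _reason) in SUSPICIOUS_PATTERNS.items()
        if pattern.search(text)
    ]
```

Keeping each rule as a (pattern, severity, reason) tuple is what makes the scanner modular: adding a detection is a one-entry change.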
- `cloneguard/main.py` - CLI entrypoint and workflow orchestration
- `cloneguard/scanner.py` - Recursive file scanning and finding generation
- `cloneguard/patterns.py` - Regex rules and suspicious filename definitions
- `cloneguard/repo_utils.py` - GitHub API and URL utilities
- `cloneguard/risk.py` - Risk scoring model
- `cloneguard/formatter.py` - Output formatting helpers
- `cloneguard/ai_summary.py` - Optional AI-based summary with local fallback
- `cloneguard/risk_chat.py` - Optional terminal risk chat (Gemini + ElevenLabs TTS)
- `requirements.txt` - Dependencies
- `.gitignore` - Ignored local/generated files
- `.env.example` - API key template
- Ensure Python 3.10+ is installed.
- Create and activate a virtual environment:

```bash
python3 -m venv .venv
source .venv/bin/activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Optional AI summary setup:

```bash
cp .env.example .env
# Then edit .env and add your key
```

If your shell does not auto-load `.env`, export the keys manually:

```bash
export GEMINI_API_KEY="your_key_here"
export GITHUB_API_KEY="your_github_token_here"
export ELEVENLABS_API_KEY="your_elevenlabs_key_here"
export ELEVENLABS_VOICE_ID="your_voice_id_here"
# Optional:
export ELEVENLABS_MODEL_ID="eleven_multilingual_v2"
export CLONEGUARD_CHAT_LANGUAGE="English"
```

Run from the project root:

```bash
python -m cloneguard.main
```

Or pass the mode directly:

```bash
python -m cloneguard.main --mode quick
python -m cloneguard.main --mode deep
```

Then:
- Paste a public GitHub repository URL.
- Choose scan mode (`quick` or `deep`) if not passed via CLI.
- CloneGuard uses GitHub API calls to read repository files in memory and scan them.
- It shows progress percentages while fetching/scanning.
- It prints findings, risk score, and risk level.
- It saves a markdown report in `output/`.
- It automatically opens the generated markdown file.
- Optional terminal risk chat lets you type questions.
- If chat is enabled, choose a mode: `text` only or `voice` (text + ElevenLabs playback).
- It asks whether to clone the repository.
- If yes, it prompts for destination folder and validates:
- The path exists
- The path is a directory
- The target repo folder does not already exist there
- It exits and clears only `output/audio/` (whether you clone or not).
After scan/report output, CloneGuard can start terminal risk chat:
- Uses Gemini to answer questions about the scanned repository risks.
- You type questions directly in terminal.
- Uses ElevenLabs TTS to speak responses using your cloned voice.
- Saves generated voice audio files in `output/audio/`.
- Before playback, you can choose per response:
  - normal speed
  - `2x` speed
- While audio is playing, press `s` + Enter to skip immediately.
To use voice output, set both:

- `ELEVENLABS_API_KEY`
- `ELEVENLABS_VOICE_ID`
- Optional response language (e.g. `English`): `CLONEGUARD_CHAT_LANGUAGE`
If ElevenLabs vars are missing, chat still works in text-only mode.
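That fallback can be sketched as a simple environment check (hypothetical helper name; the actual logic lives in `cloneguard/risk_chat.py`):

```python
import os

def voice_enabled(env=None) -> bool:
    """True only when both ElevenLabs variables are set and non-empty;
    otherwise the chat should fall back to text-only mode."""
    env = os.environ if env is None else env
    return bool(env.get("ELEVENLABS_API_KEY")) and bool(env.get("ELEVENLABS_VOICE_ID"))
```

Passing the environment in as a parameter keeps the check trivially testable without touching real keys.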
When suspicious patterns are detected:

```text
Scanning repository...
Warning: Suspicious patterns detected
Risk Score: 78
Risk Level: HIGH
Findings:
1. Encoded PowerShell execution found in scripts/setup.ps1
   Reason: Encoded PowerShell commands can hide malicious behavior.
Do you want to clone this repository anyway? (yes/no)
```
When no major issues are detected:

```text
Scanning repository...
No major suspicious patterns detected.
Risk Score: 8
Risk Level: LOW
Do you want to clone this repository? (yes/no)
```
Each grouped finding contributes points by severity:

- `LOW` = 3 points
- `MEDIUM` = 8 points
- `HIGH` = 18 points
Context weighting is applied before scoring:

- `DOC` files (README/docs): lower weight
- `CI` files (`.github/workflows`): medium weight
- `RUNTIME` scripts/code: full weight
Repeated matches are capped so one repeated pattern does not dominate the score.
Risk level thresholds:

- `LOW`: score < 20
- `MEDIUM`: score 20-49
- `HIGH`: score >= 50, or multiple high-severity findings
The model is intentionally simple and interpretable so new rules can be added easily.
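Under those rules, the score computation can be sketched as below. The severity points and thresholds come from the text above; the `DOC`/`CI` weights and the repeat cap of 3 are illustrative assumptions, and the real model lives in `cloneguard/risk.py`:

```python
SEVERITY_POINTS = {"LOW": 3, "MEDIUM": 8, "HIGH": 18}
# DOC and CI weights are illustrative; only "full weight" for RUNTIME
# is stated in the text.
CONTEXT_WEIGHT = {"DOC": 0.3, "CI": 0.6, "RUNTIME": 1.0}

def risk_score(findings, max_repeats=3):
    """findings: list of (severity, context) tuples, one per grouped finding.
    Identical findings beyond max_repeats are capped so one repeated
    pattern does not dominate the score."""
    score = 0.0
    seen = {}
    for severity, context in findings:
        key = (severity, context)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > max_repeats:
            continue  # cap repeated matches
        score += SEVERITY_POINTS[severity] * CONTEXT_WEIGHT[context]
    return round(score)

def risk_level(score, high_findings=0):
    """Map a score (and a count of high-severity findings) to a level."""
    if score >= 50 or high_findings > 1:
        return "HIGH"
    if score >= 20:
        return "MEDIUM"
    return "LOW"
```

For example, one `HIGH` runtime finding plus one `MEDIUM` CI finding would score 18 + 8 × 0.6 ≈ 23, landing in `MEDIUM` under these assumed weights.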
- Never hardcode API keys in source code.
- Use `GEMINI_API_KEY` from environment variables.
- Use `GITHUB_API_KEY` or `GITHUB_TOKEN` from environment variables.
- Use `ELEVENLABS_API_KEY` and `ELEVENLABS_VOICE_ID` for voice chat.
- Keep `.env` local and uncommitted; `.env` is already listed in `.gitignore`.
- Do not print keys in logs or terminal output.
- Rule-based scanning can miss novel or heavily obfuscated threats.
- Regex pattern matching may produce false positives.
- Some behavior only appears at runtime and cannot be identified statically.
- Large/binary files are skipped for safety and performance.
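The large/binary-file skip can be sketched as a helper like this. The size limit and null-byte heuristic are illustrative assumptions, not CloneGuard's exact thresholds:

```python
def should_skip(data: bytes, max_bytes: int = 1_000_000) -> bool:
    """Skip files that are too large, or that look binary because a
    null byte appears in the first 1 KiB (a common text/binary heuristic)."""
    return len(data) > max_bytes or b"\x00" in data[:1024]
```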
- Add CLI flags (`--url`, `--json`, `--strict`)
- Add per-language rule packs
- Add allowlist/ignore configuration
- Add unit tests and fixture repositories
- Add SARIF/JSON report output for CI integration