This repository is a small monorepo with three primary packages, six pipeline solutions, four standalone tools, and supporting docs:
- packages/github-rest: lightweight GitHub REST client and reusable helpers (endpoints, pagination, permissions). See packages/github-rest/README.md for usage and exported helpers.
- packages/gh-cleanup: CLI tools that implement repository-cleanup features (commands: remove-forks, archive-stale-repos, delete-empty-repos, categorize-repos, summary, evaluate-actions). See packages/gh-cleanup/README.md for CLI options and examples.
- packages/llm-completion: OpenAI/Azure OpenAI LLM client for repository description generation and pattern analysis. See packages/llm-completion/README.md for API details.
Docs and artifacts
- `generated/` contains example or generated Markdown outputs (e.g., catalogs and summaries) produced by the CLI for site consumption. These are intended as the site/content inputs for dfberry.github.io or similar static sites. To place the generated folder at the root of the repo, use `../../generated` (the path is relative to the package directories).
- `docs/GET-GITHUB-TOKEN.md` documents how to create a GitHub token with the right scopes for dry-run and destructive operations (delete/archive/update). Use this when preparing CI or local runs.
- `scripts/` holds utility scripts used by maintainers — notably `scripts/run-all.sh`, which runs the full pipeline (summary, categorize, delete-empty, remove-forks, archive-stale, describe-repos) in a safe, mostly dry-run flow and forwards `--apply` for destructive steps. When invoked via the npm wrapper scripts (`npm run run-all*`), the wrapper creates `./generated` and captures stdout/stderr to `./generated/run-all*.log` files.
- `.github/` contains repository-maintenance artifacts and CI/workflow definitions:
  - `.github/LLM_DESCRIBE_REPO_PROMPT.md`: the default LLM prompt template used by the `describe-repo`/`describe-repos` commands. The CLI searches upward for this file when `--prompt` isn't provided.
  - `.github/describe-files.json`: metadata used by repository tooling to enumerate or validate files that should be present in packages or the monorepo (used by CI or publishing scripts).
  - `.github/package-placement-rules.md`: guidance on where code/packages belong in the monorepo (helps contributors place new packages consistently).
  - `.github/copilot-instructions.md`: project-specific Copilot / AI assistant guidance for consistent code suggestions, testing, and style rules.
  - `.github/ISSUE_TEMPLATE/`: issue templates to help contributors file useful issues (e.g., `outdated-actions.md` documents reporting of outdated GitHub Actions).
  - `.github/workflows/`: GitHub Actions workflows for CI, release, and site publishing. Examples in this repo include:
    - `describe-repo.yml`: runs the `describe-repos` command in CI (useful for scheduled content generation or audits).
    - `gh-sdk-ci.yml`: CI for the `github-rest` package.
    - `update-site.yml`: pushes generated site artifacts (from `generated/`) to a target repo using a deployment token.
- Other support files: `README.replace.md`, `docs/`, and `generated/` contain content and scripts used to assemble the static site content; treat them as source assets for publishing.
The six solutions work together in an orchestrated pipeline for automated GitHub repository health analysis and remediation:
- security-audit-repos — Security vulnerability scanning (P0)
- sample-health-check — Repository health scoring across 7 dimensions (P0)
- create-remediation-issues — Auto-create GitHub issues for findings (P1)
- pr-feedback-aggregator — Aggregate PR review patterns + LLM analysis (P1)
- azure-best-practices-check — Azure SDK/IaC/CI/config/security scoring (P2)
- sample-auto-fix — Automated remediation: creates branches, writes fixes, opens PRs (P2)
```bash
# Build everything
npm ci && npm run build

# Run the full 6-step pipeline (dry-run by default)
node scripts/run-pipeline.mjs

# Or via npm script:
npm run pipeline

# Apply destructive operations (creates issues + PRs)
npm run pipeline:apply
```

All output goes to `generated/{solution-name}/` with timestamped JSON reports.
- Node.js >= 22
- A `.env` file at the repository root with `GITHUB_TOKEN` or `GH_TOKEN` set
- For pr-feedback-aggregator: also requires `OPENAI_API_KEY` or a compatible LLM endpoint in `.env`
- Run `npm ci && npm run build` to install dependencies and compile TypeScript
| Step | Solution | Purpose | Output |
|---|---|---|---|
| 1 | Security Audit | Scan security posture (Dependabot, code scanning, secrets, branch protection) — scores 0-100 | security-audit/{timestamp}-audit.{json,md} |
| 2 | Sample Health Check | Multi-dimension health analysis (Documentation, CI/CD, Dependencies, Activity, Hygiene, Azure, Branch Protection) — grades A–F | sample-health-check/{timestamp}-health.{json,md} |
| 3 | Create Remediation Issues | Extract actionable findings from steps 1–2 and create GitHub Issues with deduplication | remediation-issues/{timestamp}-issues.json |
| 4 | PR Feedback Aggregator | Fetch PR comments, identify patterns via LLM, generate actionable recommendations | pr-feedback-aggregator/{timestamp}-feedback.json |
| 5 | Azure Best Practices | Score Azure patterns: SDK usage, IaC/Bicep, CI/CD config, security — 15 rules across 5 dimensions | azure-best-practices/{timestamp}-check.json |
| 6 | Sample Auto-Fix | Create branches and PRs with auto-fixes for security, health, and Azure findings — includes 6-layer safety model | sample-auto-fix/{timestamp}-fixes.json |
For detailed documentation, see docs/PIPELINE.md.
The six solutions include 300+ tests ensuring reliable operation across varied repository configurations.
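As an illustration of how a numeric score from step 1 could map onto the A–F grades produced in step 2, here is a minimal sketch; the thresholds below are assumptions for illustration, not the actual values used by sample-health-check.

```typescript
// Illustrative only: one plausible mapping from a 0-100 score to A–F grades.
// The real thresholds live in the sample-health-check solution.
function scoreToGrade(score: number): "A" | "B" | "C" | "D" | "F" {
  if (score >= 90) return "A";
  if (score >= 80) return "B";
  if (score >= 70) return "C";
  if (score >= 60) return "D";
  return "F";
}

console.log(scoreToGrade(85)); // "B"
console.log(scoreToGrade(42)); // "F"
```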
In addition to the pipeline, the monorepo includes four standalone tools for targeted operations:
| Tool | Purpose | Documentation |
|---|---|---|
| get-pr-comments | Extract PR comments from a single repository | solutions/get-pr-comments/README.md |
| get-user-comments | Fetch all comments by a specific user across repositories | solutions/get-user-comments/README.md |
| move-between-repos | Transfer issues and PRs between repositories | solutions/move-between-repos/README.md |
| get-instruction-from-pr-comments | Extract actionable instructions from PR review comments | solutions/get-instruction-from-pr-comments/README.md |
These tools run independently and are not part of the automated pipeline.
This section describes the key functionality for the repository-cleanup tooling and the expected behaviors for each tool. It's a concise spec to guide implementation, testing, and safe operation.
- Remove forks: Identify repositories that are forks and are owned by the authenticated user. Provide a dry-run listing with metadata (name, full_name, parent, last_push, size, topics). Support an interactive or non-interactive deletion mode. Safety: default to dry-run; require `--yes` to perform deletions and `--force` to skip the final typed confirmation.
- Archive old repositories: Find repositories with no code or issue activity for a configurable threshold (default 365 days). Provide options to filter by org/user, exclude forks or archived repos, and produce a report before archiving. Safety: default to dry-run; require `--yes` to PATCH the repo to `archived=true`.
- Remove empty repositories: Detect repos that are effectively empty using three checks: `size === 0` in repo metadata, no commits (commits API returns 409 or empty), and no open pull requests. Provide a dry-run list and optionally delete. Safety: default dry-run; `--yes` plus interactive confirmation, or `--force` to skip typing the confirmation string.
- Categorize remaining repositories: Run lightweight per-repo analysis to assign categories (e.g., library, cli, infra, docs, sample). Use heuristics such as language, topics, README presence, package manifests, and last activity. Emit structured output linking repositories to category tags and confidence scores.
- Generate repository descriptions & topics (LLM): Use a configured LLM chat model with a prompt template to generate a short and long description, suggested topics, and useful links for a repository. This is implemented via the `describe-repo` and `describe-repos` commands. The action defaults to dry-run; provide `--apply` to PATCH repository descriptions and update topics. Supported flags include `--openai-key=`, `--prompt=`, and `--out=`.
- Generate Markdown table for dfberry.github.io: From categorized results, generate a Markdown table with columns: Name, Description, Topics, Language, Category, Last Updated, Link.
- Summary command: a `summary` command produces aggregated summaries of repositories (counts, categories, and other high-level metrics) used by the CLI and reporting tools.
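The three emptiness checks described above can be sketched as a single predicate; `RepoMeta` and its fields are illustrative stand-ins for whatever the real client returns, not the package's actual types.

```typescript
// Sketch of the three emptiness checks as one predicate.
// `RepoMeta` is an illustrative stand-in, not the real client type.
interface RepoMeta {
  size: number;           // size from repo metadata (0 for empty repos)
  commitsStatus: number;  // HTTP status from the commits API (409 = Git repo is empty)
  commitCount: number;    // number of commits returned when the call succeeds
  openPullRequests: number;
}

function isEffectivelyEmpty(repo: RepoMeta): boolean {
  const sizeZero = repo.size === 0;
  const noCommits = repo.commitsStatus === 409 || repo.commitCount === 0;
  const noOpenPrs = repo.openPullRequests === 0;
  // All three checks must agree before the repo appears in the dry-run delete plan.
  return sizeZero && noCommits && noOpenPrs;
}

console.log(isEffectivelyEmpty({ size: 0, commitsStatus: 409, commitCount: 0, openPullRequests: 0 })); // true
console.log(isEffectivelyEmpty({ size: 12, commitsStatus: 200, commitCount: 3, openPullRequests: 1 })); // false
```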
Run one example command per main feature (uses the npm wrapper to run the package CLI):
- Summary (initial active list + summary Markdown — matches `scripts/run-all.sh` step 1):

  ```bash
  npm run start -w gh-cleanup -- summary --output=md --out=../../generated/initial-active.md --summary-out=../../generated/initial-summary.md
  ```

  Switches used:
  - `--output=md`: output format for the active list (`md` or `json`).
  - `--out=...`: destination file for the active list output.
  - `--summary-out=...`: write the full summary Markdown including counts and the active table.
  - `--older-than-days=<n>`: change the stale cutoff (default 365).
  - `--allow-forks`: include forks when computing stale/empty/archived sets.
  - `--verify`: perform extra metadata checks (slower) to verify stale/empty status.
- Categorize repositories (fetch languages + README and output Markdown — matches `scripts/run-all.sh` step 2):

  ```bash
  npm run start -w gh-cleanup -- categorize-repos --fetch --output=md --out=../../generated/catalog.md
  ```

  Switches used:
  - `--fetch`: fetch repository languages and README to improve categorization.
  - `--output=md|json`: choose Markdown or JSON output.
  - `--out=...`: destination file for the catalog output.
  - `--rules=path`: optional rules file to override default categorization heuristics.
- Delete empty repositories (dry-run or apply — matches `scripts/run-all.sh` step 3):

  Dry-run:

  ```bash
  npm run start -w gh-cleanup -- delete-empty-repos --out=../../generated/delete-empty.json
  ```

  Apply (destructive):

  ```bash
  npm run start -w gh-cleanup -- delete-empty-repos --yes --out=../../generated/delete-empty.json
  ```

  Switches used:
  - `--yes`: perform the destructive action (delete) instead of a dry-run.
  - `--force`: skip interactive typed confirmation.
  - `--allow-forks`: include forks in the scan.
  - `--out=...`: write the plan or results to a file.
  - `--no-audit`: omit permission details from the output.
- Remove forks (dry-run or apply — matches `scripts/run-all.sh` step 4):

  Dry-run:

  ```bash
  npm run start -w gh-cleanup -- remove-forks --out=../../generated/remove-forks.json
  ```

  Apply (destructive):

  ```bash
  npm run start -w gh-cleanup -- remove-forks --yes --out=../../generated/remove-forks.json
  ```

  Switches used:
  - `--yes`: actually delete matched forked repos.
  - `--force`: skip interactive confirmation.
  - `--out=...`: write dry-run or action details to a file.
  - `--no-audit`: omit permission/audit details.
- Archive stale repositories (dry-run or apply — matches `scripts/run-all.sh` step 5):

  Dry-run (default cutoff 365 days):

  ```bash
  npm run start -w gh-cleanup -- archive-stale-repos --out=../../generated/stale.json
  ```

  Apply (archive matched repos):

  ```bash
  npm run start -w gh-cleanup -- archive-stale-repos --older-than-days=365 --yes --out=../../generated/stale.json
  ```

  Switches used:
  - `--older-than-days=<n>`: threshold for inactivity (days).
  - `--yes`: perform the archival PATCH.
  - `--allow-forks`: include forks.
  - `--out=...`: write the list of matched repos.
- Produce the active list JSON (used by the run-all pipeline before describing repos):

  ```bash
  npm run start -w gh-cleanup -- summary --output=json --out=../../generated/active.json
  ```

  Switches used:
  - `--output=json`: produce machine-readable JSON instead of Markdown.
  - `--out=...`: destination JSON file.
- Describe repositories (LLM-driven, matches the `scripts/run-all.sh` describe step):

  Dry-run against the active JSON:

  ```bash
  npm run start -w gh-cleanup -- describe-repos --repos=../../generated/active.json --out=../../generated/descriptions.json
  ```

  Apply LLM suggestions to repos (to write descriptions/topics back to GitHub):

  ```bash
  npm run start -w gh-cleanup -- describe-repos --repos=../../generated/active.json --out=../../generated/descriptions.json --apply
  ```

  Switches used:
  - `--repos=FILE` (alias `--input=FILE`): input JSON file with the active list (array or object shapes supported).
  - `--out=FILE`: write aggregated AI outputs (JSON or `.md` inferred by extension).
  - `--prompt=PATH`: override the prompt template file (otherwise searches upward for `.github/LLM_DESCRIBE_REPO_PROMPT.md`).
  - `--openai-key=KEY`: supply an OpenAI key; otherwise the `OPENAI_API_KEY` env var is used.
  - `--apply`: PATCH repository description and update topics on GitHub (destructive; requires `GH_TOKEN` with repo permissions).
- Evaluate GitHub Actions workflows (dry-run):
```bash
npm run start -w gh-cleanup -- evaluate-actions --out=../../generated/actions.json
```
Debugging LLM calls

- The describe commands support optional debug flags to record prompts and provider responses for inspection:
  - `--debug`: enable debug recording.
  - `--debug-dir=<path>`: directory to write debug files (input prompt and full response JSON).
- These map to the `LLMConfig.debug` settings passed to the LLM client; callers may also enable debugging programmatically.
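A minimal sketch of what the debug recording could look like, assuming the recorder simply writes the prompt and the full response JSON into the debug directory; the function and file names here are assumptions, not the package's actual convention.

```typescript
// Hypothetical sketch of --debug/--debug-dir behavior: persist the input
// prompt and the provider's full response for later inspection.
import { mkdirSync, writeFileSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { tmpdir } from "node:os";

function recordLlmDebug(debugDir: string, prompt: string, response: unknown): void {
  mkdirSync(debugDir, { recursive: true });                 // ensure the directory exists
  writeFileSync(join(debugDir, "input-prompt.txt"), prompt); // assumed file name
  writeFileSync(join(debugDir, "response.json"), JSON.stringify(response, null, 2));
}

// Demo round-trip using a temp directory.
const demoDir = join(tmpdir(), "llm-debug-demo");
recordLlmDebug(demoDir, "Describe repo X", { choices: [] });
console.log(readFileSync(join(demoDir, "input-prompt.txt"), "utf8")); // "Describe repo X"
```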
Commands follow a small, consistent calling convention to support both standalone CLI usage and orchestrated group runs.
- CLI: creates a single `GitHubClient` and calls the top-level `runCommand(name, argv, client?)` helper. The CLI is the single place that should call `getGitHubClient()` for grouped runs.
- `runCommand` (top-level command dispatcher): forwards the optional `client` to the command module wrapper. It may be invoked without a `client` for standalone testing.
- Wrapper (module CLI entry): accepts `(argv: string[], client?: GitHubClient)` — parses `argv` into `args` and may validate flags. The wrapper should forward the `client` into the implementation call.
- Implementation (`runCommand` inside the module): accepts `(client?: GitHubClient, args: ParsedArgs)` — contains the command logic and should handle a missing `client` when appropriate (for read-only or test modes).
Flow example:
- The CLI creates a single `client` and calls: `await runCommand(cmd, childArgv, client)`.
- The dispatcher forwards the `client` and calls the module wrapper: `await module.wrapper(childArgv, client)`.
- The wrapper parses `childArgv` into `args` and calls the module's implementation: `await runCommand(client, args)`.

This keeps a single `getGitHubClient()` call for grouped runs while preserving the ability to run commands standalone without a client.
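The flow above can be sketched end to end; `GitHubClient`, `ParsedArgs`, and the remove-forks module below are illustrative stand-ins rather than the repository's real implementations.

```typescript
// Sketch of the CLI → dispatcher → wrapper → implementation flow.
// All types and the command module are illustrative stand-ins.
type GitHubClient = { token: string };
type ParsedArgs = Record<string, string | boolean>;

// Implementation: tolerates a missing client (read-only or test modes).
async function removeForksImpl(client: GitHubClient | undefined, args: ParsedArgs): Promise<string> {
  const mode = args["yes"] ? "apply" : "dry-run";
  return client ? `${mode} with client` : `${mode} without client`;
}

// Wrapper (module CLI entry): parses argv into args, forwards the optional client.
async function removeForksWrapper(argv: string[], client?: GitHubClient): Promise<string> {
  const args: ParsedArgs = {};
  for (const a of argv) {
    if (!a.startsWith("--")) continue;
    const [k, v] = a.slice(2).split("=");
    args[k] = v ?? true; // bare flags become boolean true
  }
  return removeForksImpl(client, args);
}

// Top-level dispatcher: forwards the optional client to the module wrapper.
const commands: Record<string, (argv: string[], c?: GitHubClient) => Promise<string>> = {
  "remove-forks": removeForksWrapper,
};

async function runCommand(name: string, argv: string[], client?: GitHubClient): Promise<string> {
  return commands[name](argv, client);
}

// Grouped run: the CLI creates one client and reuses it across commands.
runCommand("remove-forks", ["--yes"], { token: "ghp_example" }).then(console.log); // "apply with client"
```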
You can generate short descriptions and topic lists for repositories using an LLM-driven CLI command implemented in packages/gh-cleanup/src/commands/describe-repo.ts. The LLM prompt template is at ./.github/LLM_DESCRIBE_REPO_PROMPT.md.
Prerequisites:

- A `.env` file at the repository root, or equivalent environment variables set. Example `samples.env` snippet:

  ```bash
  GH_TOKEN=
  GH_USER=YOUR_GITHUB_USER_NAME
  OPENAI_API_KEY=
  OPENAI_ENDPOINT=https://RESOURCE-NAME.openai.azure.com/openai/deployments/gpt-4.1-mini/chat/completions?api-version=API_VERSION
  OPENAI_MODEL=gpt-4.1-mini
  OPENAI_TEMPERATURE=0.2
  ```

Variable descriptions:

- `GH_TOKEN`/`GITHUB_TOKEN`: a GitHub personal access token (PAT) used by the CLI to read and modify repositories. For destructive operations (delete/archive/update topics) the token must have appropriate repo/admin scopes; for read-only operations a token with `repo` or read scopes is sufficient.
- `GH_USER`: your GitHub username. Used for readability in outputs and to filter owned repositories in summaries.
- `OPENAI_API_KEY`: API key for OpenAI (or compatible) services used by the LLM-driven `describe-*` commands. If omitted you can pass `--openai-key=` on the CLI.
- `OPENAI_ENDPOINT`: (optional) full endpoint URL for Azure OpenAI or other hosted endpoints when not using the default OpenAI API base. The example value is the Azure-style chat completions endpoint with deployment and api-version.
- `OPENAI_MODEL`: (optional) the model/deployment identifier to use for generation (e.g., `gpt-4.1-mini`). When omitted, the LLM client uses a sensible default.
- `OPENAI_TEMPERATURE`: (optional) sampling temperature for LLM completions (0-1 scale). Lower values produce more deterministic results; higher values increase creativity.
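A sketch of the precedence implied above (an explicit CLI flag overrides the environment variable) and of keeping `OPENAI_TEMPERATURE` on the documented 0-1 scale; the helper names and the 0.2 default are assumptions for illustration.

```typescript
// Illustrative config resolution: --openai-key= beats OPENAI_API_KEY,
// and the temperature is parsed and clamped to 0-1.
type Env = Record<string, string | undefined>;

function resolveOpenAiKey(cliFlag: string | undefined, env: Env): string | undefined {
  return cliFlag ?? env.OPENAI_API_KEY; // flag wins, env var is the fallback
}

function resolveTemperature(env: Env): number {
  const raw = env.OPENAI_TEMPERATURE;
  const parsed = raw === undefined ? NaN : Number(raw);
  if (Number.isNaN(parsed)) return 0.2;     // assumed default when unset/invalid
  return Math.min(1, Math.max(0, parsed));  // clamp to the documented 0-1 scale
}

console.log(resolveOpenAiKey("sk-cli", { OPENAI_API_KEY: "sk-env" })); // "sk-cli"
console.log(resolveTemperature({ OPENAI_TEMPERATURE: "0.7" })); // 0.7
```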
This repository uses a TypeScript project-reference build. Recommended workflow:

- Node: >=22 (CI uses Node 22).
- npm: >=7 (recommended for workspace behavior and `npm ci`).

Local/CI build commands:

```bash
npm ci
npm run build
```

- `npm run build` executes `tsc -b` from the repository root to perform an incremental project-reference build across packages. Ensure `package-lock.json` is committed and that CI runs `npm ci` for deterministic installs.
- If you plan to publish packages, build outputs are emitted to each package's `dist/` directory, and packages export their compiled entry points from `dist/`.
Notes:

- When running commands that modify repositories (e.g., with `--yes` or `--apply`), ensure `GH_TOKEN` has the necessary permissions and consider running a dry-run first.
- `OPENAI_ENDPOINT` and `OPENAI_MODEL` are primarily for Azure OpenAI customers; the CLI falls back to the public OpenAI API unless overridden.
- This repository uses npm workspaces with a single root `package-lock.json` that records all workspace dependencies.
- When you add, remove, or change dependencies in any workspace `package.json`, update the lockfile at the repository root by running:

  ```bash
  npm install
  git add package-lock.json
  git commit -m "Update package-lock.json"
  ```

- In CI the workflow runs `npm ci` at the repository root. If a PR changes `package.json` without updating `package-lock.json`, the build will fail with a clear message. Run `npm install` locally to update the lockfile and push the change to fix the CI failure.
Single repo (dry-run):

```bash
npm run start -w gh-cleanup -- describe-repo --repo=owner/repo
```

Apply changes (update description & topics):

```bash
npm run start -w gh-cleanup -- describe-repo owner/repo --apply
```

Batch run against the active list

The active repository list is in `generated/active.md`. To run the command for every owner/repo found in that file (dry-run):

```bash
grep -Eo '[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+' generated/active.md | sort -u | xargs -n1 -I{} npm run start -w gh-cleanup -- describe-repo --repo={}
```

To apply changes for each repository, add `--apply` to the end of the command above.

Optional OpenAI CLI flags supported: `--openai-key=`, `--openai-model=`, `--openai-temp=`, `--openai-endpoint=`.
Output

The command prints validated JSON to stdout containing `short_description`, `long_description`, `topics`, and `links`. When run with `--apply` it PATCHes the repository description and updates topics (up to 20).
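The printed JSON could be modeled roughly as follows; the field names come from the description above, but the exact schema (notably the shape of `links`) is an assumption.

```typescript
// Rough model of the JSON printed by describe-repo. Field names come from the
// docs above; the `links` element shape is assumed for illustration.
interface DescribeRepoOutput {
  short_description: string;               // one-line repo description
  long_description: string;                // longer paragraph for catalogs
  topics: string[];                        // --apply writes at most 20 topics
  links: { label: string; url: string }[]; // assumed shape for useful links
}

const example: DescribeRepoOutput = {
  short_description: "Lightweight GitHub REST client.",
  long_description: "Reusable helpers for endpoints, pagination, and permissions.",
  topics: ["github", "rest", "typescript"],
  links: [{ label: "README", url: "https://github.com/owner/repo#readme" }],
};

// A pre-apply guard could enforce the 20-topic limit documented above.
const topicsToApply = example.topics.slice(0, 20);
console.log(topicsToApply.length); // 3
```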
Using the run-all.sh script

- The `scripts/run-all.sh` helper runs the describe step only when an OpenAI key is available in the environment (`OPENAI_API_KEY`).
- If you run `scripts/run-all.sh --apply`, the script forwards `--apply` to the `describe-repos` command so the LLM-generated descriptions/topics are applied to each repository. Without `--apply` the describe step runs in dry-run mode and does not modify repositories.

Example (run-all dry-run):

```bash
./scripts/run-all.sh
```

Example (apply changes and allow the describe step to write updates):

```bash
./scripts/run-all.sh --apply
```

These switches are used across the CLI commands (examples above and `scripts/run-all.sh`):