VCL — Verbal Command Language

VCL (Verbal Command Language) is a structured, extensible query language designed to describe cognitive operations a human performs on documents. Think of it as a semantics-first abstraction layer between natural language and the concrete operations performed on legal/technical documents: search, summarize, extract, compare, integrate, verify, and so on.

The project implements the following core modules:

Preprocessor: Converts Natural Language (NL) input into a Directed Acyclic Graph (DAG), subsequently transforming it into a list of tasks in NL to guide the execution.
Translator: Converts natural language queries into structured VCL (JSON).
Planner: Turns structured VCL into an ordered execution plan, managing dependencies between operations.
Executor: Runs atomic operations against a document corpus using LLMs and deterministic tools where appropriate.
GUI: A Streamlit-based interface for interactive usage.
Storage & LLM Layer: Storage integrations (MongoDB/Elasticsearch) and a configurable multi-provider LLM layer.
Evaluation Script: Includes a dedicated evaluation script to assess system performance, accuracy, and reliability across the workflow.

The system is intentionally modular so components can be swapped (different LLM providers, plug-in deterministic tools, alternative UIs).

1. Features

This section explains, in depth, what VCL provides now (V0.2) and what it intends to provide.

Structured cognitive primitives: A catalog of atomic operations (e.g., search, summarize, extract semantic, extract logical, compare, integrate, verify, classify, reorganize) with deterministic semantics and operational guidelines. Commands are configured in a single JSON language file, so the semantics are explicit and editable.
Natural Language → VCL translation: The translator subsystem converts user NL queries into a structured VCL representation (JSON) that captures command, what, source, conditions, and optional parameters.
Operation planner: A planner maps a VCL representation into an ordered list of atomic operations (an execution plan). The planner resolves dependencies and can break some commands into multiple sub-operations when beneficial.
LLM-guided execution: Execution currently relies primarily on an LLM. In the future, deterministic tools or algorithmic routines (e.g., exact reference extraction, regex-based monetary extraction) will be used where possible.
Command-specific behavior & constraints: Each VCL command has a specific set of operational guidelines (for example, search must return verbatim text snippets only, with no paraphrase; summarize must be a conceptual rewrite). These guidelines are enforced by the executor.
Streamlit GUI: Lightweight web UI for interactive queries, document uploads, and result visualization.
Document DB integrations: Built to work with MongoDB and Elasticsearch. Note: document upload and indexing have not been implemented yet.
Logging & monitoring: Full logging of the translation, planning, and execution phases for auditability and debugging.
Conversation memory: Keeps a history of interactions (V.1: stored but not used to influence understanding by default).
Editable configuration: The possible VCL sources and the "what" types are defined in a config JSON that can be edited from the GUI.

2. Supported VCL Commands

Below is a compact reference of the supported commands. Each command entry summarizes the intent and the key operational guidelines the executor follows.

Commands

search (cerca) — Conceptual search for entities, legal concepts, or snippets.
- Output: verbatim excerpts only (no paraphrase), only snippets that directly answer the query.
- Use: find references, parties, locations, monetary amounts, precedent citations.
summarize (riassumi) — Concise, autonomous summary preserving key facts, arguments and dispositive.
- Output: new (rewritten) text capturing the facts, the relevant arguments, and the dispositive; no verbatim copy of entire sections.
extract semantic (estrai semantico) — Extract semantically coherent sections (facts, reasons, dispositive parts).
- Output: focused sections, reformulated for clarity but preserving conceptual content.
extract logical (estrai logico) — Reconstruct the argumentation chain (syllogisms, premises → conclusion).
- Output: stepwise mapping of premises, intermediate inferences, and conclusion(s). Reformulation allowed to clarify logical structure.
compare (confronta) — Comparative analysis across documents, highlighting agreements and divergences.
- Output: grouped concordances and discordances, optionally short textual citations as evidence.
integrate (integra) — Merge multiple documents into a consolidated single text.
- Output: consolidated text where duplicates are removed, conflicts are flagged or resolved under chosen policy.
verify (verifica) — Consistency and reference checking (legal citations, internal contradictions).
- Output: report of anomalies (incorrect citations, logical inconsistencies). No opinions on merits.
analyze (analizza) — Structural decomposition and evaluation of completeness and argumentative robustness.
- Output: an analytic report mapping sections, evaluating completeness and logical support.
reorganize (riorganizza) — Re-order sections according to a chosen criterion (chronological, topical).
- Output: restructured text in which the content of each unit is preserved.
classify (classifica) — Tag portions of text with legal function labels (e.g., petition, counterclaim, dispositive).
- Output: mapping of text spans → labels.
other (altro) — Fallback for intentions not covered by predefined commands.
- Output: best-effort response that may combine behaviors from other commands.
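
The command names and the verbatim-only rule for search lend themselves to a small check. The following is a minimal sketch, not the project's implementation: the function names `normalize_command` and `non_verbatim_snippets` are hypothetical, and the whitespace-insensitive comparison is an assumption about what "verbatim" should tolerate.

```python
import re

# Italian aliases taken from the command list above.
COMMAND_ALIASES = {
    "cerca": "search",
    "riassumi": "summarize",
    "estrai semantico": "extract semantic",
    "estrai logico": "extract logical",
    "confronta": "compare",
    "integra": "integrate",
    "verifica": "verify",
    "analizza": "analyze",
    "riorganizza": "reorganize",
    "classifica": "classify",
    "altro": "other",
}


def normalize_command(name: str) -> str:
    """Map an Italian or English command name to its English form.

    Unknown names fall back to "other", mirroring the fallback command.
    """
    key = " ".join(name.lower().split())
    if key in COMMAND_ALIASES:
        return COMMAND_ALIASES[key]
    if key in COMMAND_ALIASES.values():
        return key
    return "other"


def _collapse_ws(text: str) -> str:
    """Collapse runs of whitespace so line breaks do not affect matching."""
    return re.sub(r"\s+", " ", text).strip()


def non_verbatim_snippets(snippets: list[str], document: str) -> list[str]:
    """Return the snippets that do not occur verbatim in the document."""
    doc = _collapse_ws(document)
    return [s for s in snippets if _collapse_ws(s) not in doc]
```

A check like this could run after a search operation to flag paraphrased output before it reaches the user.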

3. Architecture

The system was designed as a modular pipeline with clear separation of concerns.
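
As a rough illustration of the pipeline idea, the stages can be modeled as plain callables applied in sequence. This is a sketch under stated assumptions: `run_pipeline` and the toy stage functions are hypothetical stand-ins, not the project's modules, and the real stages call LLMs and a database.

```python
from functools import reduce
from typing import Any, Callable

Stage = Callable[[Any], Any]


def run_pipeline(stages: list[Stage], query: str) -> Any:
    """Feed the query through each stage in order, passing results along."""
    return reduce(lambda value, stage: stage(value), stages, query)


# Toy stand-ins for preprocessing, translation, planning, and execution.
def preprocess(query: str) -> str:
    return query.strip().lower()


def translate(query: str) -> dict:
    command = "search" if query.startswith("cerca") else "other"
    return {"command": command, "what": query}


def plan(vcl: dict) -> list[dict]:
    return [vcl]


def execute(operations: list[dict]) -> list[str]:
    return [f"{op['command']}: {op['what']}" for op in operations]


result = run_pipeline([preprocess, translate, plan, execute], "  Cerca le parti ")
```

Because each stage only depends on the output of the previous one, any stage can be replaced by another callable with the same input and output shape.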

4. Descriptions of main components

This section briefly describes each major component involved in execution.

GUI — Streamlit App

Role: User entry point. Query entry, document upload, view results.
Implementation: Streamlit + minimal front-end logic; calls orchestrator API or runs in-process.
LLM-based: No (UI only; however, it can display LLM results).

Orchestrator

Role: Central coordinator. Receives user requests and routes them through preprocessing, translation, planning and execution steps. Responsible for format conversions (NL → JSON/DQL, JSON → plan).
Implementation: Python service with a pluggable module for each pipeline stage.
LLM-based: No.

Preprocessing

Role: Normalize text (lowercasing where needed), perform spellchecking/normalization, sanitize input.
Implementation: Two modes:
- LLM-based spelling correction (default) — uses an LLM to correct typos and preserve intent.
- Rule-based or dictionary-based fallback when -parsers is set.
LLM-based: Optional (default yes, configurable).

Translator (NL → VCL)

Role: Convert an NL query to a VCL JSON structure with fields such as command, what, source, conditions, params.
Submodules (all LLM-based):
1. Command Classifier: chooses the best-matching VCL command.
2. Source Extractor: identifies which document sources are relevant (sentenza 1°, memoria, ricorso).
3. What Extractor: determines the object of the operation (e.g., fact, decision, precedent, syllogism).
4. Condition Extractor: pulls filters (dates, parties, jurisdiction).
LLM-based: Yes. Each component takes as input the NL query and possibly the results of the previous steps.
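
To make the target structure concrete, here is a minimal sketch of a VCL request with the fields named above. The class name `VCLRequest`, the field types, and the example values are assumptions for illustration; the actual schema is defined by the project's language configuration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class VCLRequest:
    """Hypothetical container for the fields the translator produces."""

    command: str
    what: str
    source: list[str] = field(default_factory=list)
    conditions: dict[str, str] = field(default_factory=dict)
    params: dict[str, object] = field(default_factory=dict)

    def to_json(self) -> str:
        # ensure_ascii=False keeps characters such as "°" readable.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Example values are illustrative only.
request = VCLRequest(
    command="search",
    what="precedent",
    source=["sentenza 1°"],
    conditions={"party": "ACME S.p.A."},
)
print(request.to_json())
```

Each submodule fills in one field, so a partially built request can be passed from the Command Classifier to the later extractors.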

Planner

Role: Convert VCL JSON into an ordered plan of atomic operations. Resolve dependencies and split complex requests into sub-operations when needed.
Implementation: Rule engine plus a planner algorithm that maps VCL -> sequence of tasks (JSON list).
LLM-based: No (deterministic planning logic).
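
Dependency resolution of this kind is commonly done with a topological sort. The sketch below uses Python's standard `graphlib` module as one possible approach; the operation identifiers and the `order_operations` helper are hypothetical, and the project's planner may work differently.

```python
from graphlib import CycleError, TopologicalSorter


def order_operations(dependencies: dict[str, set[str]]) -> list[str]:
    """Return operation ids so that each one follows its prerequisites.

    `dependencies` maps an operation id to the ids it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        raise ValueError(f"cyclic plan: {exc.args[1]}") from exc


# A compare that needs two searches first, followed by a summary.
deps = {
    "search_1": set(),
    "search_2": set(),
    "compare_1": {"search_1", "search_2"},
    "summarize_1": {"compare_1"},
}
plan = order_operations(deps)
```

The relative order of independent operations (here the two searches) is not guaranteed, only that every operation comes after the ones it depends on.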

Executor

Role: Executes the plan. For each atomic operation, invokes the appropriate tool:
- Deterministic functions: regex extraction, citation parsing, DB queries.
- LLM-invocations for summarization, logical reconstruction, paraphrasing, synthesis.
Behavior: The executor enforces the operational guidelines for each command (e.g., search returns verbatim snippets).
LLM-based: Yes. It takes as input the information available from the language configuration and the user's VCL request.
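
The split between deterministic tools and LLM calls can be sketched as a lookup with an LLM fallback. Everything here is an assumption for illustration: the `DETERMINISTIC_TOOLS` table, the `run_operation` signature, the prompt layout, and the amount regex (which only covers amounts prefixed by € or EUR).

```python
import re
from typing import Callable

# Matches amounts such as "€ 12.500,00" or "EUR 300" (illustrative pattern).
AMOUNT_RE = re.compile(r"(?:€|EUR)\s?\d+(?:[.,]\d{3})*(?:[.,]\d{2})?")


def extract_amounts(text: str) -> list[str]:
    """Deterministic tool: return monetary amounts exactly as written."""
    return AMOUNT_RE.findall(text)


# Hypothetical routing table: (command, what) -> deterministic function.
DETERMINISTIC_TOOLS: dict[tuple[str, str], Callable[[str], object]] = {
    ("search", "monetary amount"): extract_amounts,
}


def run_operation(op: dict, text: str, llm: Callable[[str], str]) -> object:
    """Use a deterministic tool when one is registered, else call the LLM."""
    tool = DETERMINISTIC_TOOLS.get((op["command"], op.get("what", "")))
    if tool is not None:
        return tool(text)
    prompt = f"Command: {op['command']}\nTarget: {op.get('what')}\nText:\n{text}"
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real provider call, so the sketch runs offline."""
    return "LLM answer for: " + prompt.splitlines()[0]


text = "The court awarded € 12.500,00 and EUR 300 in costs."
amounts = run_operation({"command": "search", "what": "monetary amount"}, text, fake_llm)
summary = run_operation({"command": "summarize", "what": "decision"}, text, fake_llm)
```

Routing by command and target keeps the verbatim guarantee cheap for cases a regex can handle, while everything else still reaches the LLM.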

Databases (Document Corpus)

Role: Store documents and chat history, and serve them to the executor and the GUI.
Implementation: MongoDB/Elasticsearch.

5. Installation

1) Clone repository

git clone https://github.com/unimib-datAI/VCL.git
cd VCL

2) Python environment

python -m venv .venv
source .venv/bin/activate    # Linux/macOS
.\.venv\Scripts\activate     # Windows

3) Install dependencies

pip install -r requirements.txt

4) Create `.env` file

Create a .env file in the repo root with the variables you need:

# Example .env
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
HUGGINGFACEHUB_API_TOKEN=...
MONGO_URI=...
MONGO_INITDB_ROOT_USERNAME=...
MONGO_INITDB_ROOT_PASSWORD=...
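
For reference, a file in this KEY=VALUE format can be read with a few lines of standard-library Python. This is only a sketch: how the project itself loads the file is not stated here, and `parse_env` is a hypothetical helper.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


sample = """# Example .env
OPENAI_API_KEY=sk-test
MONGO_URI="mongodb://localhost:27017/dql"
"""
settings = parse_env(sample)
```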

5) Insert files

Store your documents in the documents directory. Each document is a JSON file containing four keys:

name: the name of the file
text: the text content to be processed
type_doc: the label describing the document type
owner: the username of the file owner
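
A document file with these keys can be produced and checked as follows. This is a minimal sketch: the helper names, the `<name>.json` file naming, and the example values are assumptions, not requirements stated by the project.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = {"name", "text", "type_doc", "owner"}


def validate_document(doc: dict) -> dict:
    """Raise ValueError if any of the expected keys is missing."""
    missing = REQUIRED_KEYS - doc.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return doc


def write_document(directory, name: str, text: str, type_doc: str, owner: str) -> Path:
    """Write one document as <name>.json (file naming is an assumption)."""
    doc = {"name": name, "text": text, "type_doc": type_doc, "owner": owner}
    path = Path(directory) / f"{name}.json"
    path.write_text(json.dumps(doc, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def load_document(path) -> dict:
    """Read a document file and check that it has the expected keys."""
    return validate_document(json.loads(Path(path).read_text(encoding="utf-8")))


# Round trip in a temporary directory; use the documents directory in practice.
with tempfile.TemporaryDirectory() as tmp:
    path = write_document(tmp, "ricorso_001", "Text of the appeal...", "ricorso", "alice")
    loaded = load_document(path)
```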

6) Docker

A docker-compose.yml is included for MongoDB.

docker compose up --build -d

If you run with Docker, set env vars in docker-compose.yml or mount an .env into the container.

7) Running the application

python main.py

Then open: http://localhost:8501

The script accepts several optional flags (arguments) to customize its behavior.

-api <KEY>
- Description: Provides the API key for the LLM.
-uri_db <URI>
- Description: Provides the connection URL for the database (e.g., MongoDB).
-provider <PROVIDER_NAME>
- Description: Specifies which LLM provider to use.
- Default: google_genai
- Choices: google_genai, openai, copilot, huggingface.
-model_name <MODEL_NAME>
- Description: Specifies the exact LLM model name to use.
- Default: gemini-2.5-flash
- Examples: gpt-4o-mini, claude-3-5-sonnet, mistralai/Mistral-7B-Instruct-v0.2.
-wait_seconds <NUMBER>
- Description: Sets the number of seconds to wait after each LLM call.
- Default: 0
-evaluation_mode
- Description: If present, the evaluation script is run instead of the Streamlit application.
- Usage: Just add the flag; it does not require a value.

Example

python main.py -api sk-XXXX -provider openai -model_name gpt-4o-mini -uri_db mongodb://localhost:27017/dql
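
The flags listed above map naturally onto `argparse`. The sketch below mirrors the documented names, defaults, and choices, but it is not taken from main.py: `build_parser`, the `float` type for `-wait_seconds`, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line flags."""
    parser = argparse.ArgumentParser(description="VCL launcher (illustrative)")
    parser.add_argument("-api", help="API key for the LLM")
    parser.add_argument("-uri_db", help="connection URL for the database")
    parser.add_argument(
        "-provider",
        default="google_genai",
        choices=["google_genai", "openai", "copilot", "huggingface"],
        help="LLM provider to use",
    )
    parser.add_argument("-model_name", default="gemini-2.5-flash", help="LLM model name")
    parser.add_argument(
        "-wait_seconds", type=float, default=0, help="seconds to wait after each LLM call"
    )
    parser.add_argument(
        "-evaluation_mode",
        action="store_true",
        help="run the evaluation script instead of the Streamlit app",
    )
    return parser


# Parse the example command line shown above (without the program name).
args = build_parser().parse_args(
    ["-api", "sk-XXXX", "-provider", "openai", "-model_name", "gpt-4o-mini",
     "-uri_db", "mongodb://localhost:27017/dql"]
)
```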
