
Control Flow and Data Flow in Large Language Models (LLMs)

This repository contains research conducted during an internship at the AI Institute of South Carolina, focusing on evaluating LLMs' capabilities in handling control flow and data flow tasks. The work involved creating specialized datasets and developing evaluation frameworks to assess how well different language models can understand and process procedural information.

The research demonstrates that while LLMs have made impressive advances in many natural language tasks, their ability to follow procedural instructions correctly and track information flow between steps remains a challenging area that requires specialized evaluation frameworks and targeted datasets.

Table of Contents

  • Project Overview
  • Objectives
  • Methodology
  • Evaluation Framework
  • Repository Structure
  • Installation and Usage
  • Results and Key Findings
  • Contributions
  • Acknowledgments

Project Overview

Control flow and data flow understanding are two fundamental capabilities required for LLMs to successfully perform complex, multi-step tasks that involve procedural knowledge:

  • Control Flow: The ability of a model to understand and follow a sequence of steps in the correct order (procedural knowledge).
  • Data Flow: The capability to track how information passes between steps and how variables change throughout a process.
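As a toy illustration (not code from this repository), the two properties can be modeled by tracking what each step consumes and produces — a reordered sequence breaks data flow as soon as a step needs something no earlier step has produced:

```python
# Each step declares what it consumes and what it produces.
steps = [
    ({"flour", "water"}, "dough"),
    ({"dough"}, "kneaded dough"),
    ({"kneaded dough"}, "bread"),
]

def data_flow_ok(steps, initial=("flour", "water")):
    """True if every step's inputs exist before the step runs, i.e. the
    ordering respects the data flow between steps."""
    available = set(initial)
    for consumes, produces in steps:
        if not consumes <= available:
            return False
        available.add(produces)
    return True
```

Here `data_flow_ok(steps)` is True for the correct order, while `data_flow_ok(steps[::-1])` is False because baking would run before the dough exists.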

This research developed specialized datasets and evaluation frameworks to test these capabilities in various LLMs, with a particular focus on creating benchmarks that can help improve model performance on tasks requiring sequential reasoning.

Objectives

  1. Preparation of high-quality datasets for benchmarking control flow and data flow capabilities in LLMs
  2. Evaluation of LLMs for their control flow and data flow understanding
  3. Development of evaluation methodologies to assess how well LLMs can:
    • Follow sequential instructions in the correct order
    • Track information transfer between steps and maintain variable state
  4. Implementation of testing frameworks to quantitatively measure LLM performance

Methodology

Dataset Preparation: WikiHow

The first phase involved creating a control flow dataset from WikiHow instructions:

  1. Started with the WikiHow Dataset
  2. Applied preprocessing steps
  3. Created two datasets:
  • Ground truth dataset: "wikihowAll-5-dataset before jumbling.csv"
  • Control flow benchmark: "control_flow_dataset.csv" (with randomly shuffled steps)
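The jumbling used to build the control flow benchmark can be sketched as follows. This is a minimal illustration: the `step-1` … `step-5` column names are an assumption about the CSV schema, not the repository's exact code.

```python
import random

import pandas as pd

# Hypothetical column names; the actual CSV schema may differ.
STEP_COLS = [f"step-{i}" for i in range(1, 6)]

def jumble_row(row):
    """Return a copy of the row with its steps randomly reordered."""
    steps = [row[c] for c in STEP_COLS]
    shuffled = steps[:]
    while shuffled == steps:  # assumes at least two distinct steps
        random.shuffle(shuffled)
    out = row.copy()
    for col, step in zip(STEP_COLS, shuffled):
        out[col] = step
    return out

ground_truth = pd.DataFrame(
    [{f"step-{i}": f"do thing {i}" for i in range(1, 6)}]
)
jumbled = ground_truth.apply(jumble_row, axis=1)
```

The ground-truth rows stay untouched; only the benchmark copy is shuffled, so the two CSVs can be compared row by row.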

The WikiHow dataset presented challenges due to complex steps and inconsistent notations, which motivated the creation of a second dataset with cleaner structure.

Dataset Preparation: Recipes

To address limitations in the WikiHow dataset, a recipe dataset was adopted:

  1. Selected from Kaggle: Recipe Dataset
  2. Key advantages:
  • Simple, clear procedural steps
  • 9,997 entries
  • MIT license
  • Consistent formatting
  3. Applied a preprocessing workflow
  4. Created three datasets:
  • Ground truth: "recipes_correct_steps.csv"
  • Control flow benchmark: "controlflow_recipies.csv" (shuffled steps)
  • Data flow benchmark: "dataflow_recipies.csv" (steps replaced with ones from different categories)
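The data flow corruption (swapping in steps from a different category) can be sketched like this. The toy data and the `category`/`steps` columns are assumptions about the dataset's shape, not the repository's exact code.

```python
import random

import pandas as pd

# Toy stand-in for the recipe dataset; real column names may differ.
recipes = pd.DataFrame({
    "category": ["dessert", "soup", "salad"],
    "steps": [
        ["whisk eggs", "fold in sugar", "bake 20 minutes"],
        ["chop onions", "simmer the stock", "season to taste"],
        ["rinse the greens", "dice tomatoes", "toss with dressing"],
    ],
})

def corrupt_data_flow(df, row_idx, n_swaps=1, seed=0):
    """Replace n_swaps steps in one recipe with steps drawn from recipes
    in a different category, breaking information flow between steps."""
    rng = random.Random(seed)
    steps = list(df.at[row_idx, "steps"])
    own_category = df.at[row_idx, "category"]
    foreign = [s for _, r in df.iterrows()
               if r["category"] != own_category
               for s in r["steps"]]
    for pos in rng.sample(range(len(steps)), n_swaps):
        steps[pos] = rng.choice(foreign)
    return steps

corrupted = corrupt_data_flow(recipes, 0)
```

A model with good data flow understanding should notice that, say, "simmer the stock" cannot follow "whisk eggs" in a dessert recipe.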

Evaluation Framework

The evaluation approach tests LLMs' ability to identify correct procedural sequences:

  1. Multiple-choice prompts present models with the correct sequence and distractor options
  2. Implementation uses the Ollama framework for model inference
  3. Key functions:

```python
import random

def load_steps_from_row(row, prefix="step-"):
    """Extract non-empty steps from a dataset row."""
    steps = []
    for col in sorted(row.index):
        if col.startswith(prefix):
            cell = str(row[col]).strip()
            if cell and cell.lower() != "nan":
                steps.append(cell)
    return steps

def generate_distractor(correct_steps, num_distractors=3):
    """Generate distractor sequences by shuffling the correct sequence."""
    distractors = []
    for _ in range(num_distractors):
        shuffled = correct_steps.copy()
        while True:  # assumes at least two distinct steps
            random.shuffle(shuffled)
            if shuffled != correct_steps:
                break
        distractors.append("; ".join(shuffled))
    return distractors
```

  4. Evaluation metrics:
  • Accuracy: percentage of correct sequence identifications
  • Error analysis: patterns in model mistakes

This framework provides a standardized approach for comparing different LLM architectures on their procedural reasoning capabilities.
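Tying the pieces together, prompt construction and accuracy scoring might look like the sketch below. `build_prompt`, the prompt wording, and the `gemma3` model name are illustrative assumptions, not the repository's exact code.

```python
import random
import string

def build_prompt(correct_steps, distractors, seed=0):
    """Build a multiple-choice prompt; return (prompt_text, gold_letter)."""
    rng = random.Random(seed)
    correct = "; ".join(correct_steps)
    options = [correct] + list(distractors)
    rng.shuffle(options)
    gold = string.ascii_uppercase[options.index(correct)]
    lines = ["Which option lists the steps in the correct order?"]
    lines += [f"{letter}. {opt}"
              for letter, opt in zip(string.ascii_uppercase, options)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines), gold

def accuracy(predictions, golds):
    """Fraction of prompts where the model picked the gold option."""
    return sum(p == g for p, g in zip(predictions, golds)) / len(golds)

prompt, gold = build_prompt(
    ["boil water", "add the pasta", "drain"],
    ["add the pasta; drain; boil water"],
)
# Inference would run through Ollama, e.g. (assuming the ollama package):
#   import ollama
#   reply = ollama.chat(model="gemma3",
#                       messages=[{"role": "user", "content": prompt}])
```

Shuffling the option order per prompt avoids position bias; comparing the model's letter to `gold` over the whole benchmark yields the accuracy metric above.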

Repository Structure

├── Evaluation Code
│ ├── controlflow_dataflow_Gemma3.ipynb # Evaluation code for Gemma3
│ └── readme.txt # Documentation for evaluation code

├── Recipes Dataset
│ ├── code.ipynb # Data processing code
│ ├── controlflow_recipies.csv # Control flow benchmark
│ ├── dataflow_recipies.csv # Data flow benchmark
│ ├── recipes_correct_steps.csv # Ground truth dataset
│ └── readme.txt # Dataset documentation

├── wikiHow Dataset
│ ├── code.ipynb # Data processing code
│ ├── control_flow_dataset.csv # Control flow benchmark
│ ├── wikihowAll-5-dataset before jumbling.csv # Ground truth dataset
│ └── readme.txt # Dataset documentation

├── LICENSE # MIT License
└── README.md # Project documentation

Installation and Usage

Prerequisites

  • Python 3.8+
  • pandas
  • re (regular expressions; part of the Python standard library)
  • Ollama (for model inference)

Setup
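The repository does not spell out setup commands; a plausible setup, assuming the Gemma3 evaluation notebook and the `ollama` Python package, is:

```shell
# Python dependencies (the re module ships with the standard library)
pip install pandas ollama jupyter

# Install Ollama from https://ollama.com, then pull the evaluated model
ollama pull gemma3
```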

Results and Key Findings

The research provides several insights into LLMs' procedural reasoning capabilities:

  1. Control Flow Understanding:

    • Models struggle more with longer procedural sequences
    • Performance varies significantly between model architectures
    • Specific procedural patterns cause consistent difficulties
  2. Data Flow Challenges:

    • Tracking variable state changes remains difficult for most models
    • Models often fail to maintain consistency when information must be carried through multiple steps
  3. Future Directions:

    • Developing specialized pre-training tasks for procedural reasoning
    • Creating more diverse benchmarks across domains
    • Exploring hybrid approaches that combine symbolic reasoning with neural language models

Contributions

As highlighted in the recommendation letter from Prof. Amit Sheth:

  1. Creation and Curation of Control Flow and Data Flow Datasets

    • Independently sourced and curated high-quality datasets
    • Developed structured datasets from recipe repositories with clear step dependencies
    • Formatted datasets for compatibility with LLM analysis tools
  2. Development of Analysis Code for LLM Workflows

    • Implemented Python code using the Ollama framework
    • Generated step dependency graphs and tracked variable states
    • Provided statistical analysis of control and data flow consistency
  3. Collaborative Problem Solving and Initiative

    • Proactively identified and addressed dataset quality issues
    • Maintained consistent communication with the research team
    • Demonstrated adaptability by responding to evolving project requirements
  4. Engagement with Research Community

    • Regularly attended AI Institute meetings from September 2024 onward
    • Maintained detailed documentation and organized GitHub repository

Acknowledgments

  • Prof. Amit Sheth - Supervisor, AI Institute of South Carolina
  • Vishal Pallagani - PhD student mentor
  • AI Institute of South Carolina - For providing research infrastructure and support

This research contributes to the growing body of work on evaluating and improving procedural reasoning in large language models, addressing fundamental capabilities needed for real-world applications.
