How to visually audit Excel workbooks with Python
Stringfest Analytics · https://stringfestanalytics.com/how-to-visually-audit-excel-workbooks-with-python/ · Mon, 09 Mar 2026
In a previous post, I built a simple Python tool to audit an Excel workbook. The goal there was fairly straightforward: scan the model programmatically and flag common structural issues such as hard-coded numbers in formulas, constants living outside designated input sheets, external links, and formulas that drift out of pattern down a column.

That approach gives us something useful: a list of potential problems. But lists of warnings only get you so far. They tell you what might be wrong, but they do not help you understand how the model itself is structured.

In this post I want to take the idea a step further and show how to visualize the structure of an Excel model using Python. Instead of just printing audit findings, we start thinking about the workbook as a network of dependencies. Once you look at it that way, you can actually map the model and make its logic visible.

If you want to follow along, you can download the demo workbook and the accompanying notebook below.

 

The notebook and full output are presented below. An explanation of what the script is doing and why these checks matter follows afterward.

At a very high level, the notebook treats the spreadsheet less like a grid and more like a system of connected calculations. That shift in perspective matters because many spreadsheet problems are not obvious when you look at cells one by one. They become clearer when you look at how information flows through the model.

With that in place, let’s walk through the code step by step and look at what each part is doing and what it tells us about the structure of the workbook.

Step 1: Loading the workbook as structured data

The first important move in the notebook is this:

# Imports used throughout the notebook
import re

import matplotlib.pyplot as plt
import networkx as nx
import openpyxl
from openpyxl.utils import column_index_from_string, get_column_letter
from openpyxl.utils.cell import coordinate_from_string

file = "demo_financial_model.xlsx"
wb = openpyxl.load_workbook(file, data_only=False)
print("Sheets:", wb.sheetnames)

When this runs, Python opens the workbook and reads it as a structured file rather than as a live Excel interface. That distinction matters.

Inside Excel, you are usually interacting with the model visually. You click cells, inspect formulas, trace arrows, and move around sheet by sheet. Python approaches the workbook differently. It reads the file as data. That means it can inspect every worksheet, every cell, and every formula systematically.

In this example, the workbook contains three sheets:

['Inputs', 'Calculations', 'Outputs']

That already tells us something useful about the intended design of the model. The workbook appears to be organized using a familiar structure:

  • Inputs for assumptions
  • Calculations for the working logic
  • Outputs for final results

That separation is common in well-organized analytical models, and it gives the rest of the notebook something to work with. The code can now ask whether the workbook is actually behaving the way that structure suggests it should.

Step 2: Looking for hardcoded constants outside the Inputs sheet

The next section of the notebook scans the workbook for numeric constants that appear outside the Inputs sheet:

hardcoded_cells = set()
input_sheets = ["Inputs"]

for sheet in wb.sheetnames:
    ws = wb[sheet]
    for row in ws.iter_rows():
        for cell in row:
            if cell.data_type == "n":
                if sheet not in input_sheets:
                    if cell.value is not None:
                        hardcoded_cells.add(f"{sheet}!{cell.coordinate}")

This part is based on a simple modeling principle: in many structured workbooks, important assumptions should live in dedicated input areas, not be scattered across calculation sheets.

When assumptions are centralized, they are easier to review, update, and explain. If key numbers start appearing directly inside the calculation layer, the model becomes harder to trust. A reviewer has to ask: is this value intentional, or did someone overwrite a formula?

In this example, the notebook reports:

Hardcoded cells: {'Calculations!A3', 'Calculations!A4', 'Calculations!A2', 'Calculations!C2'}

This ties directly back to the previous post. There, I already discussed that A2, A3, and A4 were not especially concerning because they are just year labels. But Calculations!C2 was more significant. It represented a hardcoded revenue value in a place where I would normally expect a formula driven by inputs such as units and price.

That is the first reason this notebook matters. It is not just building a pretty visualization. It is carrying forward actual audit findings from the earlier script and placing them into the structural picture of the model.

Step 3: Expanding Excel ranges so the relationships are real

Next, the notebook defines a helper function called expand_range():

def expand_range(sheet, start, end):
    col1, row1 = coordinate_from_string(start)
    col2, row2 = coordinate_from_string(end)
    col1_i = column_index_from_string(col1)
    col2_i = column_index_from_string(col2)
    cells = []
    for col in range(col1_i, col2_i + 1):
        for row in range(row1, row2 + 1):
            coord = f"{get_column_letter(col)}{row}"
            cells.append(f"{sheet}!{coord}")
    return cells

This may look technical, but the idea behind it is very straightforward.

Excel formulas often refer to ranges instead of individual cells. For example, a formula might sum a range like E2:E4. To a human reader, that clearly means three cells. But if we want to build a dependency graph, we cannot treat E2:E4 as one mysterious blob. We need to expand it into its underlying parts:

  • E2
  • E3
  • E4

That is exactly what this function does. If we skip this step, the graph will miss real dependencies and the workbook will appear simpler than it actually is. A model might look like it has fewer connections than it really does, and that could hide the true influence of certain cells.

This is a good example of where Python adds value beyond Excel’s built-in tools. Excel can certainly evaluate a range formula correctly, but Python lets us unpack that formula into explicit relationships that can be counted, visualized, and analyzed.
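To make the expansion concrete, here is a self-contained, pure-Python equivalent of the helper (reimplemented without openpyxl so it runs on its own) applied to the E2:E4 example from above:

```python
import re

def expand_range(sheet, start, end):
    # Parse an "E2"-style coordinate into (column letters, row number).
    def parse(coord):
        m = re.match(r"([A-Za-z]+)(\d+)", coord)
        return m.group(1), int(m.group(2))

    # Convert "A" -> 1, "Z" -> 26, "AA" -> 27, and back again.
    def col_to_index(letters):
        idx = 0
        for ch in letters.upper():
            idx = idx * 26 + (ord(ch) - ord("A") + 1)
        return idx

    def index_to_col(idx):
        letters = ""
        while idx:
            idx, rem = divmod(idx - 1, 26)
            letters = chr(ord("A") + rem) + letters
        return letters

    col1, row1 = parse(start)
    col2, row2 = parse(end)
    cells = []
    for col in range(col_to_index(col1), col_to_index(col2) + 1):
        for row in range(row1, row2 + 1):
            cells.append(f"{sheet}!{index_to_col(col)}{row}")
    return cells

print(expand_range("Inputs", "E2", "E4"))
# → ['Inputs!E2', 'Inputs!E3', 'Inputs!E4']
```

The range collapses into three explicit cell references, which is exactly what the dependency graph needs to record three separate edges.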

Step 4: Building the dependency graph

This is the core of the notebook:

G = nx.DiGraph()

cell_pattern = r"[A-Za-z_]+!\$?[A-Za-z]+\$?\d+"
range_pattern = r"([A-Za-z_]+)!([A-Za-z]+\d+):([A-Za-z]+\d+)"

for sheet in wb.sheetnames:
    ws = wb[sheet]
    for row in ws.iter_rows():
        for cell in row:
            location = f"{sheet}!{cell.coordinate}"
            if cell.data_type == "f":
                formula = cell.value
                ranges = re.findall(range_pattern, formula)
                for r in ranges:
                    rng_sheet, start, end = r
                    expanded = expand_range(rng_sheet, start, end)
                    for ref in expanded:
                        G.add_edge(ref, location)
                refs = re.findall(cell_pattern, formula)
                for ref in refs:
                    ref = ref.replace("$", "")
                    G.add_edge(ref, location)

At a high level, this code scans every formula in the workbook and asks a simple question:

Which cells does this formula depend on?

For every dependency it finds, it adds an edge to the graph. This is where the ideas of nodes and edges become important.

  • A node is a cell
  • An edge is a dependency relationship between cells

If one cell feeds another, the graph captures that as an arrow. So if Calculations!B3 depends on Inputs!B2, the graph includes a relationship like this:

Inputs!B2 → Calculations!B3

That arrow matters because it tells us something about the model’s logic. It says that the value in the calculation cell is not independent. It comes from somewhere else. If the source changes, the downstream cell changes too.

This is one of the biggest conceptual shifts in the notebook. Instead of seeing the workbook as just a set of formulas, we start seeing it as a directed network of cause and effect.

In this example, after building the graph, the notebook reports:

Nodes: 19
Edges: 10

So in this small workbook, Python has identified 19 participating cells and 10 relationships between them.

That may not sound dramatic, but it is the foundation for everything that follows. Once those relationships are represented as a graph, we can ask much more interesting questions than Excel’s built-in tools usually make easy.
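To show the kind of question the graph unlocks, here is a miniature sketch (the cell names mirror the Inputs!B2 → Calculations!B3 example above, but the edges are illustrative, not the demo workbook's actual ones). Once relationships live in a `networkx` graph, one call answers "what does this cell ultimately depend on?" or "what does it ultimately affect?":

```python
import networkx as nx

# A miniature stand-in for the workbook's dependency graph.
G = nx.DiGraph()
G.add_edge("Inputs!B2", "Calculations!B3")   # an input feeds a calculation
G.add_edge("Inputs!B3", "Calculations!B3")   # a second input feeds the same cell
G.add_edge("Calculations!B3", "Outputs!B2")  # the calculation feeds an output

# Everything a cell ultimately depends on (all upstream cells):
print(sorted(nx.ancestors(G, "Outputs!B2")))
# → ['Calculations!B3', 'Inputs!B2', 'Inputs!B3']

# Everything a cell ultimately influences (all downstream cells):
print(sorted(nx.descendants(G, "Inputs!B2")))
# → ['Calculations!B3', 'Outputs!B2']
```

Excel's Trace Precedents shows one hop at a time; `ancestors` and `descendants` walk the entire chain in a single call.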

Step 5: Making sure disconnected hardcoded cells still show up

After building the graph from formulas, the notebook adds this step:

for cell in hardcoded_cells:
if cell not in G:
G.add_node(cell)

If a value is hardcoded and not referenced by any formulas, it may never appear in the graph automatically. That is because the graph is being built from formula relationships. No formula relationship means no edge, and possibly no node.

But from an auditing perspective, that kind of disconnected cell can be extremely important.

Take Calculations!C2 in this workbook. That cell contains a hardcoded revenue value. If nothing is pointing to it and it is not pointing anywhere else, it could easily disappear from the graph unless we explicitly add it. But the fact that it is disconnected is itself informative. It tells us that the value is not participating in the model’s logic the way we might expect.

So this step ensures that hardcoded constants still appear in the visualization, even if they are structurally isolated.

That matters because the absence of a connection can be just as meaningful as the presence of one. A disconnected calculation may indicate a harmless label, an abandoned piece of logic, or a genuine modeling error. The graph should give us the chance to see that.

Step 6: Coloring the nodes so the structure becomes readable

Next, the notebook assigns colors based on each node’s role:

def node_color(node):
    if node in hardcoded_cells:
        return "gold"
    if node.startswith("Inputs"):
        return "lightgreen"
    elif node.startswith("Calculations"):
        return "lightskyblue"
    elif node.startswith("Outputs"):
        return "salmon"
    return "lightgray"

# Build the color list in node order; the drawing step below uses it
colors = [node_color(n) for n in G.nodes]

This is not just a cosmetic step. It helps translate the graph into something an Excel user can interpret quickly.

The colors in this example mean:

  • green = inputs
  • blue = calculations
  • red = outputs
  • gold = hardcoded constants outside the input layer

That last category is especially useful because it overlays the audit results from the first notebook directly onto the model structure. Instead of only printing a list of suspicious cells, we can now see where those cells live inside the architecture of the workbook.

This is where the graph starts to become more than just a technical artifact. It becomes an explanatory tool.

For example, a gold node sitting inside the Calculations sheet tells a story. It says: this cell is behaving more like an input, but it is not living in the input layer. That is a structural inconsistency.

That is exactly what was happening with Calculations!C2.

Step 7: Forcing a layered layout that mirrors how models are designed

Then the notebook creates positions for the nodes:

layers = {
    "Inputs": 0,
    "Calculations": 1,
    "Outputs": 2,
}

pos = {}
y_positions = {}

for node in sorted(G.nodes):
    sheet = node.split("!")[0]
    layer = layers.get(sheet, 3)
    y_positions.setdefault(sheet, 0)
    pos[node] = (layer, -y_positions[sheet])
    y_positions[sheet] += 1

This step matters because raw network graphs can get messy very quickly. If we simply let a graphing library place everything wherever it wants, the result often looks like a tangle of arrows.

Here, instead of using a free-form layout, the notebook deliberately places the sheets in columns from left to right:

Inputs → Calculations → Outputs

That mirrors the intended design of the workbook itself.

This is a subtle but important choice. It makes the graph easier for Excel users to read because it aligns with the way they already think about a model. Inputs feed calculations, and calculations feed outputs.

So even though the graph is generated with Python, it still respects familiar spreadsheet design principles. The purpose is not to turn Excel into an abstract math problem. The purpose is to make the model’s structure more visible.

Step 8: Drawing the graph and sizing nodes by influence

The graph is then drawn with this code:

plt.figure(figsize=(12, 10))

sizes = [400 + len(nx.descendants(G, n)) * 100 for n in G.nodes]

nx.draw(
    G,
    pos,
    with_labels=True,
    node_color=colors,
    node_size=sizes,
    font_size=8,
    arrows=True,
)

plt.title("Excel Model Dependency Graph")
plt.show()

Here’s the resulting visualization of our model:

[Image: dependency graph of the demo workbook, with Inputs, Calculations, and Outputs arranged left to right]

Two things are happening here. First, the notebook is visualizing the dependency network we just built. Second, it is sizing each node based on the number of downstream cells that depend on it. In other words, cells that influence more of the model appear larger.

This matters because not all cells are equally important. Some cells are relatively local. Others sit near the top of the dependency chain and affect many downstream calculations.

Sizing nodes this way helps the graph communicate influence, not just existence.

That is another place where Python adds value beyond Excel’s built-in auditing tools. Excel can tell you the direct precedents or dependents of a selected cell, but Python can help summarize influence more broadly across the model.
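That broader summary of influence can be sketched in a few lines. This is a self-contained illustration (the graph edges here are hypothetical, not the demo workbook's): rank every cell by how many downstream cells ultimately depend on it, the same quantity used for node sizing above.

```python
import networkx as nx

# A hypothetical chain of dependencies for illustration.
G = nx.DiGraph()
G.add_edges_from([
    ("Inputs!B2", "Calculations!B3"),
    ("Inputs!B3", "Calculations!B3"),
    ("Calculations!B3", "Calculations!B4"),
    ("Calculations!B4", "Outputs!B2"),
])

# For each cell, count every downstream cell that depends on it.
influence = {n: len(nx.descendants(G, n)) for n in G.nodes}

# Most influential cells first.
for cell, count in sorted(influence.items(), key=lambda kv: -kv[1]):
    print(cell, count)
```

Cells near the top of this ranking are the ones where a single bad value propagates furthest, which makes them natural priorities in a review.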

Even in a small workbook, this starts to reveal which parts of the spreadsheet are structurally important.

Step 9: Identifying orphan cells

Finally, the notebook looks for orphan cells:

orphans = [n for n in G.nodes if G.degree(n) == 0]
print("Orphan cells:", orphans)

In this example, the result is:

Orphan cells: ['Calculations!A3', 'Calculations!A4', 'Calculations!A2', 'Calculations!C2']

This is one of the most useful outputs in the notebook because it reinforces the earlier audit findings with structural evidence.

The year labels in A2, A3, and A4 are disconnected because they are simply labels. That is fine.

But Calculations!C2 is different. It is disconnected for a more interesting reason: it is a hardcoded value sitting outside the normal assumption flow of the model.

That is exactly what I pointed out in the previous post. There, the script flagged Calculations!C2 as a hardcoded constant. Here, the graph shows us something more: it is not just hardcoded, it is also structurally isolated.

That combination makes the issue much easier to understand. It is no longer just a suspicious number on a list. It is a number that sits outside the model’s normal flow.

How this compares to Excel’s built-in auditing tools

Excel already includes several useful tools for auditing models. Features such as Trace Precedents, Trace Dependents, and Show Formulas can be very helpful when you are trying to understand a particular formula or troubleshoot a specific section of a workbook.

Where these tools become more limited is at the workbook level. They are designed primarily for interactive, cell-by-cell inspection. That approach works well when you already know where to look, but it becomes harder when you want to step back and understand the structure of the entire model.

The Python approach used in this notebook is aimed at a different level of analysis. Instead of examining formulas individually, the script scans the workbook in a single pass, extracts the relationships between cells programmatically, and then summarizes the structure of the model.

In practical terms, this means we can inspect the entire workbook at once rather than reviewing formulas one at a time. It also allows the same logic to be applied consistently across multiple files, makes it possible to combine structural visualization with audit findings, and can automatically surface cells that appear disconnected or suspicious.

The goal is not to replace Excel’s built-in auditing features. Rather, Python provides an additional layer of visibility. Excel remains excellent for manually exploring individual formulas and tracing specific dependencies. Python, on the other hand, is well suited for stepping back, analyzing a workbook systematically, and creating repeatable checks. Used together, the two approaches complement each other very well.

Why this matters now

This kind of workflow is becoming increasingly relevant as spreadsheet generation becomes faster and easier. AI tools can now produce workbook structures, formulas, and even entire models from prompts. As that capability grows, the bottleneck shifts. The challenge is no longer simply producing spreadsheets; it is understanding them, validating them, and maintaining them.

That is where an approach like this becomes useful. With a relatively small amount of code, we can begin to inspect workbook structure automatically, visualize how logic flows through the model, highlight structural violations, and identify cells that fall outside the intended design.

None of this removes the need for human judgment. As the earlier post showed, some audit findings will inevitably be false positives that require interpretation. What it does provide is a faster and more focused way to review a model. Instead of manually searching through a workbook for potential issues, we can use code to narrow down where our attention is most needed.

Want to build tools like this for your team?

This example is intentionally small, but the underlying idea is very practical. Teams that rely heavily on Excel can use Python to build lightweight validation tools, auditing utilities, and analysis workflows around their spreadsheets.

If you want to set up tools like this for your team, strengthen Python capabilities around Excel, or build more reliable analytics workflows, take a look at my How I Work page:

That is where I outline how I help organizations combine Excel, Python, and modern AI tools in ways that are useful, practical, and grounded in real analytical work.

The post How to visually audit Excel workbooks with Python first appeared on Stringfest Analytics.

Python in Excel: How to do rolling correlations
Stringfest Analytics · https://stringfestanalytics.com/python-in-excel-how-to-do-rolling-correlations/ · Wed, 02 Jul 2025
As a financial analyst, you often want to trace how relationships between asset returns evolve over time. Maybe you’re looking to understand diversification benefits, detect changing market dynamics, or flag periods of abnormal correlation between two securities. That’s where rolling correlations come in. In this walkthrough, we’ll use Python in Excel to track how eBay (EBAY) and ExxonMobil (XOM) stock returns move together across time. Using Python makes this analysis simpler and more dynamic than it would otherwise be in Excel.

Download the exercise file below to follow along:


 

You’ll find the following code blocks already set up in the downloadable workbook. They’re shown below as Gists for easier reading and reference.

In these first steps, we’re loading the historical price data from the Excel table, setting the Date column as the index so we can work with time series properly, and then calculating daily percentage changes to get the returns for each stock.

We drop the first row of missing values since you can’t calculate a return without a previous day’s price. This gives us a clean DataFrame of daily returns that we’ll use for correlation analysis.
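The workbook's actual Gist isn't reproduced in this extract, but based on the description the first steps likely look something like the following sketch. Inside Excel you would read the table with the `xl()` function; here a small synthetic price series (and the table name `Prices`) stands in as an assumption so the code runs on its own:

```python
import pandas as pd

# In the workbook this would come from the Excel table, e.g.:
#   df = xl("Prices[#All]", headers=True)   # "Prices" is a hypothetical table name
df = pd.DataFrame(
    {
        "Date": pd.date_range("2024-01-01", periods=5, freq="B"),
        "EBAY": [44.0, 44.5, 44.2, 45.1, 45.0],
        "XOM": [102.0, 101.5, 103.0, 103.2, 104.0],
    }
)

# Set Date as the index so pandas treats the data as a time series.
df = df.set_index("Date")

# Daily percentage changes; the first row has no prior price, so drop it.
returns = df.pct_change().dropna()
print(returns.shape)
```

With five prices, four returns survive the `dropna()`, which matches the "drop the first row" step described above.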

Next, we’ll set up the rolling correlation calculation. Instead of hardcoding the window size (like 30 days), we’ll pull it dynamically from cell F1 in Excel. This lets the user adjust the rolling window directly in the sheet, much like a parameter in Power BI or Tableau, so they can explore how different time spans affect the correlation.
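A sketch of what that calculation likely looks like, with synthetic returns standing in for the real data and `window = 30` standing in for the value read from cell F1:

```python
import numpy as np
import pandas as pd

# Synthetic daily returns stand in for the real EBAY/XOM series.
rng = np.random.default_rng(42)
returns = pd.DataFrame(
    {"EBAY": rng.normal(0, 0.01, 200), "XOM": rng.normal(0, 0.01, 200)}
)

# In Python in Excel the window would be read from the sheet, e.g.:
#   window = int(xl("F1"))
window = 30  # stand-in for the value in cell F1

# Rolling correlation of EBAY's returns against XOM's over the window.
rolling_corr = returns["EBAY"].rolling(window).corr(returns["XOM"])
```

The first `window - 1` points are NaN because the window isn't full yet; every value after that is a correlation between -1 and +1 computed over the most recent 30 days.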

Next, we’ll create a line chart to visualize how the rolling correlation between EBAY and XOM changes over time. We start by setting the figure size for a cleaner layout, then plot the rolling_corr Series. The title automatically reflects the window size selected in cell F1, and we add axis labels to make the chart easier to interpret. This lets you quickly spot periods when the two stocks moved more or less in sync.
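The charting step likely resembles this sketch (again with synthetic data; in the workbook the x-axis would carry actual dates and the window would come from F1):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic returns stand in for the real series.
rng = np.random.default_rng(1)
returns = pd.DataFrame(
    {"EBAY": rng.normal(0, 0.01, 200), "XOM": rng.normal(0, 0.01, 200)}
)
window = 30  # stand-in for the value read from cell F1
rolling_corr = returns["EBAY"].rolling(window).corr(returns["XOM"])

# Plot the rolling correlation; the title reflects the chosen window.
fig, ax = plt.subplots(figsize=(10, 5))
rolling_corr.plot(ax=ax)
ax.set_title(f"{window}-Day Rolling Correlation: EBAY vs. XOM")
ax.set_xlabel("Day")
ax.set_ylabel("Correlation")
```

Because the title string interpolates `window`, changing F1 in the sheet would regenerate the chart with an updated label automatically.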

This rolling correlation plot shows how the relationship between EBAY and XOM stock returns changes over time. Instead of calculating a single correlation value for the entire dataset, we’re calculating it over a moving window so we can see how their connection strengthens or weakens during different periods.

When the line is closer to +1, it means the two stocks tended to move in the same direction during that window. A value near 0 means there wasn’t much of a consistent relationship—sometimes they moved together, sometimes not. When the line dips toward -1, it suggests the stocks were moving in opposite directions.

[Image: rolling correlation chart with the window size chosen from a cell in Excel]

This is useful because market conditions change, and relationships between assets can shift. A static correlation won’t show that. With a rolling view, you can identify periods where diversification may have helped (when correlation is low or negative) or when it didn’t (when correlation is high). And since the window size comes from a cell in Excel, you can easily adjust it to explore how different timeframes affect the trends you see.

A natural next question after looking at a rolling correlation plot is: what window size should I use? The window determines how many days of data are used to calculate each point on the chart. Smaller windows pick up changes more quickly, while larger windows smooth out the noise and give you a broader trend.

There’s no one-size-fits-all answer. Like with most things as an analyst, it depends on your goals.

If you’re trying to spot quick shifts in behavior, a shorter window may be useful. But if you want a more stable picture that avoids overreacting to day-to-day fluctuations, a longer window might be better. Here’s a quick reference table to help you decide:

Window | Meaning                      | Pros                                      | Cons
5      | 1 trading week               | Very responsive to recent changes         | Extremely volatile
20     | 1 trading month              | Good for detecting short-term co-movement | Still a bit noisy
60     | 3 trading months (a quarter) | Balances short-term trends with stability | May miss fast-changing relationships
120    | ~6 months                    | Smooth and stable correlation trend       | Slow to react to rapid changes

In addition to plotting the rolling correlation over time, a scatterplot comparing all return points is a good idea too because it gives you a quick visual sense of the overall relationship between the two assets. While the rolling chart shows how correlation changes, the scatterplot reveals the general pattern of how often and how strongly the returns move together across the whole dataset. We can derive that with this code block:
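The scatterplot Gist isn't embedded in this extract; here is a minimal sketch matching the description (semi-transparent dots via `alpha=0.6`, with synthetic returns standing in for the real data):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic daily returns stand in for the real EBAY/XOM series.
rng = np.random.default_rng(0)
returns = pd.DataFrame(
    {"EBAY": rng.normal(0, 0.01, 250), "XOM": rng.normal(0, 0.01, 250)}
)

fig, ax = plt.subplots(figsize=(6, 6))
# Each dot is one trading day; alpha=0.6 keeps overlapping points visible.
ax.scatter(returns["EBAY"], returns["XOM"], alpha=0.6)
ax.set_xlabel("EBAY daily return")
ax.set_ylabel("XOM daily return")
ax.set_title("EBAY vs. XOM daily returns")
```

A tight diagonal cloud would signal strong correlation; a loose, roughly circular cloud like the one in the article signals a weak relationship.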

In this case, the code creates a scatterplot of EBAY vs. XOM daily returns, where each dot is a trading day. The alpha=0.6 parameter sets the dots to be semi-transparent, making it easier to see where the points are most concentrated. If the stocks moved in lockstep, you’d see a clear diagonal line—either upward or downward. Here, the points are more loosely spread, suggesting a weak positive correlation.

[Image: scatterplot of EBAY vs. XOM daily returns]

You’ll also notice some outliers—points that sit far away from the center cluster. These could reflect unusual events like earnings surprises or broader market shocks. While they may be interesting, they can also distort the correlation, so it’s worth considering how much weight to give them depending on your analysis goals.

In this demo, we calculated and visualized rolling correlations using Python in Excel. By working with returns instead of raw prices, we uncovered how the relationship between EBAY and XOM changes over time, something a single correlation number can’t show.

Rolling correlations are great for spotting patterns and shifts in asset behavior, but they depend on window size and assume a linear relationship. Use them as one piece of your broader analysis. If you have questions or want help bringing this kind of insight into your workflow, drop a comment below or get in touch.

The post Python in Excel: How to do rolling correlations first appeared on Stringfest Analytics.

Excel for Healthcare Preview: What is a Dashboard, Anyway?
Stringfest Analytics · https://stringfestanalytics.com/excelhealthcaredashboard/ · Fri, 05 Aug 2016


This is a preview of my course on Excel for healthcare. Learn more at http://georgejmount.com/excelforhealthcare 

Every healthcare analyst will be asked to build a dashboard at some point. But what, exactly, is a dashboard? Just like the dashboard on your car, if you focus so much on a data dashboard that you don’t look up at the road, you will crash!

In this section, I will present some tips on building and managing dashboard requests.


The post Excel for Healthcare Preview: What is a Dashboard, Anyway? first appeared on Stringfest Analytics.
