scatterplot - Stringfest Analytics

Python in Excel: How to do rolling correlations

George Mount — Wed, 02 Jul 2025 15:46:20 +0000

As a financial analyst, you often want to trace how relationships between asset returns evolve over time. Maybe you’re looking to understand diversification benefits, detect changing market dynamics, or flag periods of abnormal correlation between two securities. That’s where rolling correlations come in. In this walkthrough, we’ll use Python in Excel to track how eBay (EBAY) and ExxonMobil (XOM) stock returns move together across time. Using Python makes this analysis a bit simpler and more dynamic than otherwise in Excel.

Download the exercise file below to follow along:

Download the exercise file here

You’ll find the following code blocks already set up in the downloadable workbook. They’re shown below as Gists for easier reading and reference.

In these first steps, we’re loading the historical price data from the Excel table, setting the Date column as the index so we can work with time series properly, and then calculating daily percentage changes to get the returns for each stock.

We drop the first row of missing values since you can’t calculate a return without a previous day’s price. This gives us a clean DataFrame of daily returns that we’ll use for correlation analysis.

Next, we’ll set up the rolling correlation calculation. Instead of hardcoding the window size (like 30 days), we’ll pulling it dynamically from cell F1 in Excel. This lets the user adjust the rolling window directly in the sheet, much like a parameter in Power BI or Tableau, so they can explore how different time spans affect the correlation.

Next, we’ll create a line chart to visualize how the rolling correlation between EBAY and XOM changes over time. We start by setting the figure size for a cleaner layout, then plot the rolling_corr Series. The title automatically reflects the window size selected in cell F1, and we add axis labels to make the chart easier to interpret. This lets you quickly spot periods when the two stocks moved more or less in sync.

This rolling correlation plot shows how the relationship between EBAY and XOM stock returns changes over time. Instead of calculating a single correlation value for the entire dataset, we’re calculating it over a moving window so we can see how their connection strengthens or weakens during different periods.

When the line is closer to +1, it means the two stocks tended to move in the same direction during that window. A value near 0 means there wasn’t much of a consistent relationship—sometimes they moved together, sometimes not. When the line dips toward -1, it suggests the stocks were moving in opposite directions.

This is useful because market conditions change, and relationships between assets can shift. A static correlation won’t show that. With a rolling view, you can identify periods where diversification may have helped (when correlation is low or negative) or when it didn’t (when correlation is high). And since the window size comes from a cell in Excel, you can easily adjust it to explore how different timeframes affect the trends you see.

A natural next question after looking at a rolling correlation plot is: what window size should I use? The window determines how many days of data are used to calculate each point on the chart. Smaller windows pick up changes more quickly, while larger windows smooth out the noise and give you a broader trend.

There’s no one-size-fits-all answer. Like with most things as an analyst, it depends on your goals.

The data analyst's favorite phrase that every manager LOVES to hear: pic.twitter.com/3pwk2IXqcB

— George Mount (@gjmount) November 20, 2023

If you’re trying to spot quick shifts in behavior, a shorter window may be useful. But if you want a more stable picture that avoids overreacting to day-to-day fluctuations, a longer window might be better. Here’s a quick reference table to help you decide:

Window	Meaning	Pros	Cons
5	1 trading week	Very responsive to recent changes	Extremely volatile
20	1 trading month	Good for detecting short-term co-movement	Still a bit noisy
60	3 trading months (quarter)	Balances short-term trends with stability	May miss fast-changing relationships
120	~6 months	Smooth and stable correlation trend	Slow to react to rapid changes

In addition to plotting the rolling correlation over time, a scatterplot comparing all return points is a good idea too because it gives you a quick visual sense of the overall relationship between the two assets. While the rolling chart shows how correlation changes, the scatterplot reveals the general pattern of how often and how strongly the returns move together across the whole dataset. We can derive that with this code block:

In this case, the code creates a scatterplot of EBAY vs. XOM daily returns, where each dot is a trading day. The alpha=0.6 parameter sets the dots to be semi-transparent, making it easier to see where the points are most concentrated. If the stocks moved in lockstep, you’d see a clear diagonal line—either upward or downward. Here, the points are more loosely spread, suggesting a weak positive correlation.

You’ll also notice some outliers—points that sit far away from the center cluster. These could reflect unusual events like earnings surprises or broader market shocks. While they may be interesting, they can also distort the correlation, so it’s worth considering how much weight to give them depending on your analysis goals.

In this demo, we calculated and visualized rolling correlations using Python in Excel. By working with returns instead of raw prices, we uncovered how their relationship changes over time… something a single correlation number can’t show.

Rolling correlations are great for spotting patterns and shifts in asset behavior, but they depend on window size and assume a linear relationship. Use them as one piece of your broader analysis. If you have questions or want help bringing this kind of insight into your workflow, drop a comment below or get in touch.

The post Python in Excel: How to do rolling correlations first appeared on Stringfest Analytics.

Python in Excel: How to understand the Seaborn package

George Mount — Fri, 06 Dec 2024 19:49:41 +0000

Python’s integration into Excel has opened a new world of possibilities for data analysis and visualization, and Seaborn is one of the most exciting tools to leverage within this environment. Seaborn builds on Matplotlib, simplifying complex visualizations while maintaining flexibility and producing professional, aesthetically pleasing charts.

This post introduces Seaborn and its core functionality, with practical examples to help Excel users get started with this powerful library. Download the exercise file below to follow along:

Download the exercise file here

The building blocks of Seaborn

Seaborn stands out for its intuitive design, built to simplify the process of creating plots. Here are its key features:

Dataset-oriented approach: Seaborn integrates seamlessly with Pandas DataFrames, allowing you to directly reference column names in your visualizations. This makes it easier to create charts without manually restructuring your data, streamlining the workflow for Python in Excel users.
Plot types: Seaborn offers two categories of plotting functions:
- Axis-level functions: Functions like sns.scatterplot() and sns.barplot() are used for single plots, such as scatterplots and bar charts. These functions offer fine-grained control for customization and are a great starting point for simpler, more direct visualizations.
- Figure-level functions: Functions like sns.relplot() and sns.catplot() operate at the figure level, which means they can generate complex visualizations composed of multiple subplots. These functions are particularly useful for exploratory data analysis, as they allow you to facet your data across multiple categories or dimensions automatically. For example, you can use sns.relplot() to create scatterplots that are grouped by a categorical variable into a grid of subplots, providing deeper insights into your dataset. While powerful, figure-level functions require a slightly more involved setup and are beyond the scope of this post.
Themes and styles: With several built-in themes and styles, Seaborn ensures that your charts are professional and readable without requiring extensive manual formatting.

Seaborn relies heavily on specific functions to create different plot types, each with arguments to control data sources, variables, colors, and other aesthetics. As an Excel user, you’re already well-versed in functions—they’re the bread and butter of the Excel programming language. This familiarity puts you in a great position to dive into Seaborn.

Data import and creating your first Seaborn plot

In the following examples, we’ll work with two datasets from the exercise workbook, loading them into Python in Excel as ad_spend_df and ratings_df, respectively:

In the following code, we’ll create a scatter plot to visualize the relationship between advertising spend and monthly sales across different stores.

sns.set_theme(style='white') sets the plot’s overall style to a clean, white background, giving the visualization a minimalist and professional look. This theme will be universally applied to all subsequent Seaborn plots unless explicitly changed.

The scatterplot() function generates the scatter plot. It uses the ad_spend_df dataset, mapping the ‘Advertising Spend’ column to the x-axis and the ‘Monthly Sales’ column to the y-axis. The hue='Store' argument applies color coding to the points based on the ‘Store’ column, making it easy to distinguish data points by store.

The remaining functions are from matplotlib.pyplot, which Seaborn relies on for additional customizations. We’ll use these frequently in other examples to enhance visualizations. Here, plt.title sets the plot’s title, “Advertising Spend vs. Monthly Sales,” providing context for the chart. Similarly, plt.xlabel and plt.ylabel define the axis labels as “Advertising Spend ($)” and “Monthly Sales ($),” clarifying the data being displayed and adding essential details to the visualization.

The resulting scatterplot will look something like this. To refine it further, we might consider enhancing the axis formatting, though this requires a bit more effort compared to Excel’s point-and-click interface. However, we were able to effortlessly add color to each point by category—a task that would have been more challenging to achieve in Excel.

Exploring Seaborn’s chart types

Next, let’s explore a range of other basic plots that can be created using a similar approach with Seaborn. Some of these plots are simpler to replicate in Excel than others:

Line plot

Let’s start with a basic line chart using the lineplot() function. Here, we specify the x-axis and y-axis to map the data appropriately. The marker='o' argument adds circular markers to each data point along the line, making individual values stand out and emphasizing key points in the trend.

The ci=None argument removes the default confidence interval shading typically added by Seaborn, keeping the chart clean and focused. Finally, we’ll include chart and axis titles. Since we’ll consistently add these in future examples, we won’t call attention to it again.

The resulting chart should look like this:

KDE plot

Now let’s create a KDE plot, which offers a smoothed representation of a histogram. Unlike histograms that display data frequency within bins, KDE plots estimate the probability density function of a variable using kernels to create a smooth curve. This results in a continuous line that more accurately reflects the overall distribution of the data.

The kdeplot() function is used here with the ad_spend_df dataset, mapping the ‘Advertising Spend’ column to the x-axis. The fill=True argument fills the area under the curve, enhancing the visual appeal and clarity of the distribution. The color='purple' argument sets the curve’s color to purple, adding a touch of differentiation and aesthetics.

Violin plot

Now, let’s explore creating a violin plot. A violin plot combines the features of a boxplot and a KDE plot. It displays both the summary statistics (like a boxplot) and the data’s full distribution (like a KDE plot). The “violin” shape represents the density of the data at different values, providing deeper insights into the distribution compared to a boxplot alone.

The violinplot() function is used with the ratings_df dataset. The x='Store' argument places the store names on the x-axis, while y='Customer Rating' maps the customer ratings to the y-axis, visualizing their distribution for each store.

The inner='box' argument adds a boxplot inside each violin, summarizing the central tendency and variability of the data, such as the median and interquartile range. The palette='pastel' argument applies a soft pastel color scheme, enhancing the visual appeal of the plot.

Bar plot

Finally, let’s dive into creating a classic bar plot. To start, we calculate the average monthly sales for each store, sorting the results in descending order with .sort_values(ascending=False). The .index method retrieves the sorted store names, which are used to customize the x-axis order, ensuring the data is displayed from highest to lowest.

The sns.set_theme(style="dark") line applies a dark background theme to the plot. This theme will now act as the default for subsequent plots.

To generate the bar plot, we use the barplot() function, mapping the ‘Store’ column to the x-axis and ‘Monthly Sales’ to the y-axis. The estimator='mean' argument ensures the bars represent the mean of ‘Monthly Sales’ for each store.

By setting ci=None, we remove confidence intervals to simplify the chart and emphasize the averages. The palette='Blues_r' applies a reversed blue gradient, where darker shades indicate higher values. Finally, order=sorted_stores arranges the bars in descending order of average sales, as defined earlier.

Conclusion

Seaborn offers an incredible variety of plots, and by now, you’re likely noticing a consistent pattern of using Seaborn functions to map variables and aesthetics, followed by customizing the results with both high-level Seaborn options and Matplotlib tools. This flexibility allows you to tailor your visualizations to your exact needs.

For Excel users, Seaborn is a game-changer, enabling the creation of advanced, visually compelling charts that go beyond the basics. By integrating Python into your workflow, you gain access to powerful analytical tools and precise customization, making it easier to build sophisticated, statistically informed plots with a level of detail and polish that can be difficult to beat in Excel alone.

Have questions about using Seaborn or Python for data visualization in Excel? Drop them in the comments—I’d love to help! Don’t forget to check out the Seaborn documentation for inspiration and examples. Be sure to visit the gallery for some amazing visualization ideas.

The post Python in Excel: How to understand the Seaborn package first appeared on Stringfest Analytics.

How to create interactive scatterplots with Python in Excel

George Mount — Mon, 11 Nov 2024 19:11:34 +0000

In an earlier post, I discussed adding conditional formatting to an Excel chart. One key benefit of using Python in Excel for this—rather than relying on standard Excel charting—is the flexibility it offers. With Python, it’s simpler to adjust the chart when adding new variables or expanding categories:

How to conditionally format a scatterplot in Excel

Let’s take it up a notch. In this post, we’ll make the scatterplot fully dynamic, allowing users to select the x, y, and categorical hue variables from dropdown menus. The chart will automatically update based on these selections.

Follow along using the exercise file provided below:

Download the exercise file here

For this example, we’ll use the popular Palmer penguins dataset. To start, we need to identify which variables are quantitative and which are categorical, as this will determine how each is used in the plot. Luckily, we can use a bit of Pandas magic here!

Each column in a DataFrame is assigned a datatype, and Pandas is generally quite reliable at interpreting the correct type for each variable.

To begin, I’ll add some cells in a helper worksheet. This is where we’ll organize the options for quantitative and categorical variable selections.

Next, I’ll import the dataset into Python in Excel and create a variable to identify all the quantitative variables in the dataset. This will help us separate them for easy selection and use in the scatterplot.

I’ll do the same with the categorical variables:

At this point, you should see something like the following in the workbook: dynamic arrays displaying all the quantitative and categorical variables from the dataset. This result will make it easy to view and select the variables as we proceed with the scatterplot.

We can actually use this dynamic array as an input for data validation, thanks to the spill operator, #. If this notation is new to you, check out my blog post on the topic:

How to understand the # spill operator in Excel

Now, set up a spot to build the chart—either to the right of the dataset or in a new worksheet. Then, from the ribbon go to Data > Data Validation. Select List as the validation criteria and refer to the list of quantitative variables using the spill operator (#). This allows us to use the dynamic array directly, so whenever the list updates, the dropdown options will too.

Repeat the data validation setup for the y-axis, using the same quantitative variables list as the input, and then do the same for the category, using the category range for these options.

Finally, let’s add a dynamic chart title. Unfortunately, Seaborn doesn’t add one automatically, so we’ll have to set it ourselves. Now, we have all the inputs needed to build this chart! The user can define what goes along each axis and select the category to color by, making for a fully customizable visualization:

Now it’s time to create the scatterplot, and I’ll be using Seaborn for this. Since these inputs are only intended for this scatterplot, I’ll place them directly in the function arguments. However, if there’s a chance we might reuse these inputs elsewhere, it’s a good idea to assign them to their own variables first and then reference those variables in the plot. This keeps things modular and makes future adjustments easier!

Awesome! With everything set up, you should now see a scatterplot that updates dynamically like below. The user can select their own x-axis, y-axis, and hue categories, and the plot adjusts automatically—no manual tweaking needed. This shows just how scalable data visualization with Python in Excel can be.

This approach can easily be adapted to build other user-driven interactive visualizations. For example, you could create a bar chart where users can not only select the category variable to plot but also choose how to aggregate the results—like displaying sums, averages, and so forth.

What questions do you have about using these techniques to create interactive data visualizations with Seaborn? Let me know in the comments.

The post How to create interactive scatterplots with Python in Excel first appeared on Stringfest Analytics.