matplotlib - Stringfest Analytics

Copilot for Excel: How to do exploratory data analysis with Python

George Mount — Wed, 06 Aug 2025 23:48:35 +0000

Exploratory data analysis (EDA) is the process of analyzing data to uncover trends, anomalies, and relationships without preconceived assumptions. It typically involves summarizing data with descriptive statistics, visualizing patterns with charts and plots, and spotting potential issues like missing values or outliers.

For Excel users, EDA is essential because it ensures better-informed decisions by deeply understanding the data, reducing costly misinterpretations. While Excel’s familiar tools like PivotTables, charts and the Analysis ToolPak support basic EDA, they can fall short with larger or more complex datasets.

Python in Excel expands Excel’s EDA capabilities significantly, allowing users to leverage powerful data science libraries such as pandas and matplotlib directly within their spreadsheets. This enables sophisticated data manipulation and visualizations without ever leaving Excel. Copilot further simplifies EDA by automatically generating Python code and providing immediate insights, greatly reducing the need for coding expertise.

However, this ease and speed make strong analytical thinking even more crucial. Tools can’t replace judgment, and that’s why resources like The Data Detective by Tim Harford and Becoming a Data Head by Alex J. Gutman and Jordan Goldmeier are so valuable. They help Excel users think critically, spot biases, and responsibly interpret data insights.

Also keep in mind that all actions here will generate Python code. It’s highly beneficial to be fluent in basic Python and statistics to confidently vet and repurpose the outputs.

Download the exercise file here

If you’ve never used Advanced Analysis with Copilot in Excel before, check out this post for the basics. One key takeaway: your data needs to be uploaded to OneDrive first.

How to get started with Advanced Analysis with Python for Copilot in Excel

Also, keep in mind that results from Advanced Analysis, just like with other generative AI tools, are probabilistic, meaning individual outcomes might vary. You may even see some of these outputs from Copilot without explicitly requesting them, especially since this tool often kicks things off with exploratory analysis and visualizations. As every good analysis should!

Summarizing basic statistics

Let’s start with the foundational statistics. Sure, Excel can calculate basic statistics, but it quickly becomes cumbersome, particularly with non-adjacent columns. Moreover, Excel’s Analysis ToolPak isn’t dynamic, meaning it won’t update automatically when your data changes.

Here’s a prompt for Copilot:

“Summarize the basic statistics for the mpg dataset, including mean, median, mode, and range for key numerical features like mpg, cylinders, and horsepower.”

This prompt helps you quickly generate a high-level statistical summary of your data. Copilot even offers a concise narrative explanation you could easily adapt as the basis of a report or analysis.

And because the code here is live, all the values in your workbook update automatically whenever your data changes. (As for dynamically updating all the figures in your report, we’ll explore that another time… that could be another exciting use case for Python in Excel.)

Checking missing values and data types

Excel struggles when handling missing values and data types, both of which are critical for accurate quantitative analysis.

Copilot can efficiently tackle this with the following prompt:

“Generate a summary of missing values and data types for each column in the mpg dataset.”

Copilot quickly summarizes missing values and data types, showing that the “horsepower” column has 6 missing values while other columns are complete. Knowing the data types, such as numeric (Float64, Int64) or categorical (ObjectDType), helps analysts choose appropriate methods for handling missing data, making data cleaning faster and more accurate.

Creating visualizations

Visualizations are central to effective exploratory data analysis (EDA). While Excel can produce basic charts, it often struggles with scaling visualizations across multiple categories or generating several charts at once.

This is where combining Python and Copilot within Excel provides a substantial advantage: you can quickly create detailed visualizations, like histograms for multiple variables, all at once and without manual setup:

“Create histograms for the distribution of mpg, horsepower, and weight across all cars in the dataset.”

To compare fuel efficiency clearly across categories, box plots are an excellent choice because they visually summarize the median, variability, and presence of outliers within each group.

Use prompts like this to easily compare data across categories:

“Show box plots to compare fuel efficiency across different car manufacturers.”

The box plots shown above allow you to quickly compare MPG (miles per gallon) among the top 10 car manufacturers, making it easy to identify brands that consistently offer better or worse fuel efficiency, as well as those with more variability. With Python and Copilot integrated directly into Excel, creating these insightful visualizations is straightforward and requires minimal manual effort.

Diving deeper with correlation

Once comfortable with your data, you might transition toward confirmatory analysis or predictive modeling. Perhaps you’re curious about identifying which vehicle characteristics strongly predict fuel efficiency (mpg) or understanding how these factors relate to each other.

Start examining these relationships easily with prompts like:

“Use Copilot to calculate the correlation matrix for mpg, cylinders, displacement, horsepower, and weight.”

Correlation matrices, like the one generated above with Copilot, quickly highlight relationships by displaying correlation coefficients between pairs of variables. For example, we see that mpg is strongly negatively correlated with weight, displacement, horsepower, and cylinders.

This means that heavier vehicles, those with larger engines, or more cylinders typically have lower fuel efficiency. Such insights guide your analysis toward selecting meaningful predictors for building regression models or further confirmatory tests.

Once you’ve identified important relationships using the correlation matrix, a natural next step is visualizing these correlations to quickly detect patterns or clusters. A heatmap is ideal for this—colors intuitively highlight strong positive or negative relationships, making it easy to interpret complex correlation matrices at a glance.

For example, you can ask Copilot:

“Create a heatmap to visualize the correlation matrix for mpg, cylinders, displacement, horsepower, and weight.”

The heatmap vividly displays strong negative correlations (shown in blue) between mpg and variables like weight, horsepower, displacement, and cylinders. Meanwhile, it reveals strong positive relationships (shown in red) among these predictors themselves. This visual clarifies that as cars get heavier or have larger engines, fuel efficiency decreases, making it easier to select variables for predictive modeling.

Conclusion and next steps

In this post, we’ve explored how integrating Python and Copilot within Excel significantly expands your EDA capabilities. This integration simplifies tasks such as summarizing statistics, handling missing values, and creating visualizations, while dramatically improving the identification and interpretation of correlations. Tools like correlation matrices and heatmaps help you quickly pinpoint key relationships, ensuring you’re better prepared for predictive modeling.

As you move forward, you might explore building predictive models like linear or logistic regression to forecast outcomes based on identified relationships. You could also consider clustering techniques to segment your data into meaningful groups or even venture into advanced machine learning methods such as random forests or neural networks for deeper insights. Additionally, integrating simulation modeling or hypothesis testing within your workflow could further strengthen your analytical toolkit.

Ultimately, integrating Python and Copilot into Excel enhances both your efficiency and the quality of your insights, empowering you to make truly data-driven decisions.

How did this exploration of Python-powered EDA with Copilot in Excel resonate with you? Are there other analytical scenarios or techniques you’re curious about? Let me know in the comments.

The post Copilot for Excel: How to do exploratory data analysis with Python first appeared on Stringfest Analytics.

Python in Excel: How to understand the Matplotlib package

George Mount — Tue, 24 Dec 2024 15:56:44 +0000

Python’s incorporation into Excel offers data analysts robust new tools, particularly in the realm of data visualization. This blog post introduces Matplotlib, a core Python library for crafting visualizations. We’ll begin with the essential components, explore Matplotlib’s customization features, and discuss the benefits of using Pandas DataFrames as your data source for Matplotlib when working with Python in Excel—though this approach isn’t a panacea.

To dive in, download the exercise file here:

Download the exercise file here

The building blocks of Matplotlib

We’ll dive into creating some basic Matplotlib plots using simple datasets shortly. Before that, it’s important to grasp the structure of Matplotlib. Similar to how Excel charts consist of various elements, Matplotlib visualizations are constructed from several essential components:

pyplot: This is Matplotlib’s high-level interface, functioning similarly to Excel’s chart toolbar. It provides tools to quickly create and customize plots. In Python in Excel, this interface is typically imported with the common alias plt, allowing you to use its functions conveniently.
Figure (fig): The overall canvas that contains one or more plots, comparable to the chart area in Excel. The figure acts as the foundation for your visualization, encompassing all elements of your plot.
Axes (ax): The specific region within the figure where the data is plotted. This includes the x-axis, y-axis, gridlines, and the plot itself. It is analogous to the plot area in an Excel chart, where the main visual representation of your data appears.

When these components are brought together, you might see code like this:

fig, ax = plt.subplots()

This command creates a blank canvas—an empty figure and axes. While no data is plotted yet, it’s fully prepared for you to add and customize your visualization.

Let’s explore a more comprehensive example of creating a plot using Matplotlib. We’ll start with the same basic structure and then add layers of data to enhance it:

This will produce a simple line chart in your workbook:

Let’s break down the steps of what was done here.

First, we used plt.subplots() to generate a figure (fig) and one or more axes (ax).
Next, we used the ax.plot() method to visualize data on the axes.
Finally, we added titles, labels, legends, and gridlines using methods like ax.set_title() and ax.legend().

Python in Excel leverages an IPython kernel, which simplifies working with Matplotlib compared to standalone Python scripts. This means there’s no need to use plt.show() to explicitly render a plot as you might see in some online tutorials. Plots are displayed automatically as soon as the code is executed. Additionally, the matplotlib.pyplot module is pre-imported in Python for Excel, allowing you to use functions like plt.subplots() immediately, without requiring an import statement.

`fig, ax` isn’t always used

In many tutorials or examples, you might not see the fig, ax structure explicitly. That’s because matplotlib.pyplot also provides an interface that implicitly manages figures and axes behind the scenes. This allows for quick and simple visualizations, which can be especially useful for straightforward plots.

For instance, the following code creates a line plot without explicitly defining fig and ax:

This will return the same basic line chart as produced in the previous example.

Changing chart types in Matplotlib

What if you wanted to switch from a line plot to something else? In Matplotlib, the plot() function generates a line chart by default. To display a different plot type, you can use other Matplotlib functions. In this example, we’ll use fig and ax to showcase a variety of plot types, each displayed in a separate subplot. We’ll also specify the exact location of each plot within a 2×2 grid.

This approach lets us create a mini-dashboard with just a few lines of code.

Matplotlib and Pandas DataFrames

While we’ve been using lists as data inputs for Matplotlib so far, in Python for Excel, you’ll often work with DataFrames, as they are the default data storage type. Using DataFrames as inputs offers several key advantages, especially when paired with Matplotlib.

One major benefit is the seamless integration between Pandas and Matplotlib, which allows you to easily generate a wide range of visualizations directly from your DataFrame. Since Pandas is built on top of Matplotlib, its .plot() API simplifies the process of creating charts. This consistency in using a DataFrame structure for different chart types makes your workflow more intuitive and efficient.

Here’s a straightforward example of creating a basic line chart. The sales data is located in the sales_df worksheet of the exercise file. We’ll load this data into Python within Excel, and then you’ll see how simple the code is for generating the line chart. We just specify the x and y axis variables from the DataFrame within the arguments, and then set a title for the chart.

While visualizing a DataFrame, many customizations can be achieved using the plot() function or with some assistance from pyplot. However, by defining the fig and ax, you gain access to a broader range of customization options.

It’s worth noting that some options, like setting the y-axis label, could also be done using pyplot(), leading to some redundancy. Matplotlib’s extensive customization capabilities are both a strength and a complexity; it offers numerous ways to modify your plots, sometimes making it feel overwhelmingly customizable.

Conclusion

If you’re interested in data visualization with Python in Excel, starting with Matplotlib is a wise choice. As indicated by Google Trends, it’s one of the oldest and most widely used data visualization libraries available for Python in Excel, surpassing other data visualization packages in both age and popularity.

(If the visualization below doesn’t load, try refreshing the page or visit the page directly using this link.)

At the same time, it can be quite overwhelming and confusing with its layered approach and manual plot construction. While the Pandas API simplifies creating some basic plots directly from DataFrames, it doesn’t cover everything. So, it’s essential to get comfortable with Matplotlib, but also explore other packages that might suit the specific needs of Excel users who often work with tabular data.

Do you have any questions about the basics of Matplotlib or about using Python for data visualization in Excel more broadly? Let me know in the comments.

The post Python in Excel: How to understand the Matplotlib package first appeared on Stringfest Analytics.

Python in Excel: Why use Python for data visualization?

George Mount — Tue, 10 Dec 2024 23:25:39 +0000

Excel has for decades been a mainstay for business and data professionals, providing a familiar environment for organizing, analyzing, and visualizing data. Its built-in charts and graphs offer a quick, accessible solution for conveying insights. In many cases, these standard visuals are still perfectly adequate: if all you need is a simple line graph or a standard bar chart, Excel’s native functionality often provides the fastest path to a finished product.

As data grows more complex and the demands for more nuanced visualization options expand, however, some analysts find themselves looking beyond Excel’s native charting capabilities. This doesn’t mean Excel visuals become irrelevant; quite the opposite. They remain a critical tool in your arsenal, an easy go-to when time is short and the audience expects something clean, clear, and uncomplicated. Yet, as challenges evolve, so do the tools at our disposal. Python’s recent integration into Excel offers one such avenue for growth—not as a replacement for Excel’s familiar charts, but as an enhancement, adding another versatile option to the analyst’s toolkit.

Until recently, working with Python for data visualization typically meant operating outside Excel’s environment. If you wanted the flexibility and customization options offered by Python’s libraries you had to use external Python environments and tools and write scripts to manually load static images of those visualizations into Excel. This added complexity often deterred Excel-based analysts from experimenting with Python, keeping many locked into the point-and-click world of the spreadsheet.

That’s all changed now that Python can run natively inside Excel. By bringing Python directly to the place where so many analysts are already comfortable, Microsoft has lowered the barrier to entry. Suddenly, it’s possible to marry the best of both worlds: the familiarity and convenience of Excel’s interface with the expansive capabilities of Python’s visualization libraries. This means you can incrementally learn Python while still relying on Excel’s tried-and-true methods, switching between native charts and code-based visuals as needed. Instead of tossing out your Excel playbook, you can add Python as a complementary chapter.

A quick comparison of Excel’s native charts vs Python in Excel visualizations follows:

While Python integration in Excel is a significant step forward, it’s not a silver bullet. Within Excel’s environment, Python-generated charts currently lack some of the interactive bells and whistles analysts might enjoy when working in a standalone Python environment. For instance, popular Python-based tools for interactive elements—such as tooltips, sliders, or clickable legends—aren’t yet supported. If one of your main goals in adopting Python is to create fully interactive dashboards directly inside Excel, you may need to wait and see how this functionality evolves.

These limitations might feel like a letdown, but they shouldn’t overshadow Python’s fundamental advantages. At its core, Python’s visualization libraries bring a code-based approach to creating and refining charts. Instead of manually selecting colors, fonts, or axis options through a series of clicks and menus, you write a few lines of code that define these elements as parameters. This means your visualizations are inherently reproducible: when it comes time to update the data or tweak a detail, you only need to adjust a line or two of code. By streamlining repetitive tasks, Python helps reduce the time and effort spent clicking through formatting dialogs, making it easier to produce consistent, high-quality visuals on a regular basis.

Moreover, Python’s vast ecosystem, being open-source, encourages experimentation and collaboration. Libraries like Matplotlib, which is highly customizable, let you create anything from standard line charts to intricate multi-panel figures. Seaborn builds on Matplotlib to simplify the creation of statistically oriented plots and aesthetically pleasing defaults. Plotnine, inspired by the grammar-of-graphics approach popularized by R’s ggplot2, offers a more conceptual framework for building plots layer by layer. Each of these libraries brings unique strengths, and together they cover a wide spectrum of visualization needs. If you ever find Excel’s native charts too limited or cumbersome, Python provides an escape hatch—one that doesn’t force you to abandon your spreadsheets entirely.

A quick comparison of the Python packages currently available in Excel for data visualization follows:

Outside Excel, Python’s capabilities expand even further. In standalone Python environments or web-based dashboards, interactive elements like tooltips, zooming, and panning are easily accessible. You can integrate with tools like Bokeh, Altair, or Plotly to build rich, interactive visualizations that invite exploration. While these features aren’t currently available inside Excel’s Python integration, the ecosystem is dynamic and evolving. It’s entirely possible that, over time, the lines between what’s feasible inside Excel and outside it will blur, granting Excel analysts even more interactive capabilities without leaving their native environment.

In the meantime, think of Python as one more tool at your disposal. Excel’s charts aren’t going anywhere, and they remain a solid choice for many day-to-day tasks. Python simply broadens your horizons. By investing a bit of time in learning Python’s syntax and core visualization methods, you stand to gain a level of flexibility and efficiency that might otherwise be out of reach. Instead of viewing Python as a replacement for Excel’s charting features, view it as a supplement—an advanced set of features you can call upon when the situation demands something beyond the usual options.

Ultimately, the integration of Python into Excel marks an exciting milestone. Not because it turns Excel into a cutting-edge visualization platform overnight, but because it removes barriers. It encourages Excel’s massive user base to dip a toe into more programmatic, reproducible workflows. Over time, you can gradually incorporate Python’s libraries as you see fit, without losing the simplicity and comfort of Excel’s built-in charts. In doing so, you expand your analytical toolkit and position yourself to take advantage of whatever improvements—and increased interactivity—may come down the line.

What questions do you have about data visualization with Python in Excel? How do you see the strengths of native Excel charts compared to the new Python in Excel charts? Are there any features you wish Python in Excel offered? Let me know in the comments.

The post Python in Excel: Why use Python for data visualization? first appeared on Stringfest Analytics.

How to understand the differences between matplotlib, seaborn and plotnine for Python in Excel data visualization

George Mount — Tue, 10 Dec 2024 17:50:29 +0000

Data visualization is a fundamental aspect of effective data analysis. And although Excel is a versatile tool, certain visualization types remain difficult or even impossible to create with its built-in features alone. That’s where Python libraries such as Matplotlib, Seaborn, and Plotnine come in. These libraries provide advanced visualization capabilities that can be seamlessly integrated into Excel through Python in Excel.

But how do you decide which library to use? And what are some practical, time-saving techniques for creating visualizations that Excel’s native tools struggle with? This blog post will guide you through the essentials.

Download the exercise file below to follow along:

Download the exercise file here

Since we’re covering a variety of visualizations, we’ll be using small, one-off datasets to keep things simple. If you need help applying these concepts to your own work—likely with larger datasets—feel free to ask any questions in the comments. AI can also likely assist in adapting these techniques to fit your needs.

For this post I assume you have some familiarity with inserting plots with Python in Excel, if that’s not the case, you can check out this post:

Here are some quick wins for visualizing data with Python in Excel

Matplotlib

Matplotlib is often called the “grandfather” of Python visualization libraries, and it serves as the foundation for many others. At its core, Matplotlib gives you low-level control over every aspect of a chart. You define what your chart looks like from scratch—everything from axes labels and tick marks to colors, grids, and legends is in your hands.

This level of flexibility comes at the cost of verbosity. Tasks that might take one or two lines in another library may take several in Matplotlib. However, if you’re looking for complete customization, it’s unmatched.

How It Works

Matplotlib operates through a figure-based approach: you first create a figure and an axis (or multiple axes) and then define what to draw on those axes. Think of it as creating a blank canvas and painting on it step by step.

Example 1: Creating a Gantt Chart for Project Timelines

A Gantt chart is a classic business visualization for tracking project progress. In Excel, creating one often involves tedious manual work, but with Matplotlib, it’s straightforward:

Here, you define a horizontal bar chart, setting each bar’s starting position and length to represent tasks’ timelines. The result is clean and precise—ready for inclusion in a project report or presentation.

Example 2: Custom Branding for Sales Trends

Suppose you’re creating an executive report and need visuals that align with your company’s branding. Matplotlib provides powerful customization options, allowing you to tailor colors, fonts, and annotations to match your corporate style. Here’s an example:

This flexibility is especially valuable for creating publication-ready visuals directly in Python.

Seaborn

If Matplotlib is the raw material, Seaborn is the refined product. Built on top of Matplotlib, Seaborn simplifies the process of creating aesthetically pleasing and statistically insightful visualizations. It’s designed to work seamlessly with Pandas DataFrames, which are the default data object in Python in Excel. This makes it a natural choice for business users analyzing data directly from Excel workbooks.

Seaborn automates many of the intricate details of plotting—like adding confidence intervals to line plots or managing subplots—so you can focus on interpreting the results. It excels at identifying relationships and patterns in data, making it especially useful for exploring trends, distributions, and correlations.

How It Works

Seaborn uses a dataset-oriented API, allowing you to pass an entire Pandas DataFrame as input. You simply define which columns represent your data of interest, and Seaborn handles the rest, from aesthetic defaults to the layout of the plot.

Example 1: Correlation Heatmap for Marketing Effectiveness

Heatmaps are one of Seaborn’s standout features. Let’s say you’re trying to understand how marketing spend correlates with ROI and customer engagement. A heatmap makes these relationships clear at a glance:

This visualization highlights which variables are strongly correlated, providing actionable insights into which factors drive results.

Example 2: Small Multiples for Sales Trends by Region

If you’re monitoring sales trends across regions, Seaborn’s FacetGrid is an excellent tool for creating small multiples—individual plots for each region displayed in a neatly organized grid.

This approach simplifies the process of comparing sales trends across regions by automating the creation of separate plots for each group. It streamlines your workflow, eliminating the need to generate individual charts manually, and provides an intuitive way to visualize data patterns efficiently.

Seaborn’s tight integration with Pandas DataFrames makes it a natural fit for business users working with Python in Excel. Whether you’re analyzing correlations, exploring multi-dimensional relationships, or creating small multiples, Seaborn helps you extract insights with minimal effort and maximum impact.

Plotnine

Plotnine takes a completely different approach to visualization. It’s inspired by the “grammar of graphics” philosophy, popularized by R’s ggplot2 library. Instead of thinking in terms of “chart types,” Plotnine lets you build visualizations layer by layer, combining data elements like axes, geoms (shapes), and facets to create charts.

This structure is intuitive for users who think in terms of breaking down data relationships and is ideal for layered visualizations or multi-panel plots.

How It Works

In Plotnine, you define a plot by specifying the dataset, the aesthetic mapping (e.g., what the axes represent), and the geoms (the actual chart elements). Each of these components is added incrementally, giving you fine-grained control over the final output.

Example 1: Faceted Chart for Regional Sales Trends

If you want to compare sales trends across regions, Plotnine makes it simple to create side-by-side charts:

Faceted plots like this are perfect for breaking down KPIs by categories like region, product line, or customer segment.

Example 2: KDE plot for distribution

A KDE plot (Kernel Density Estimate) visualizes the probability density of a continuous variable, smoothing the data to show its distribution without the discrete bins of a histogram. It’s useful for understanding the shape and spread of data or identifying multiple modes.

Here, plotnine builds the KDE plot using a layered approach. The ggplot function specifies the dataset and aesthetics, mapping “Sales” to the x-axis. The geom_density layer computes and plots the density curve, filling it with a semi-transparent blue color. Finally, labs adds a title and axis labels. This modular process combines data, aesthetics, and visual elements into a cohesive plot.

Conclusion

Matplotlib, Seaborn, and Plotnine each bring distinct advantages to data visualization, especially for Python users working in Excel. Matplotlib stands out for its extensive customization options, making it ideal for specialized tasks like Gantt charts or branded visuals. However, this flexibility often comes with a steeper learning curve. The table below summarizes the key pros, cons, and differences among these three libraries.

Seaborn, with its seamless integration with Pandas DataFrames, simplifies the creation of insightful statistical visualizations, making it a natural choice for uncovering patterns and relationships with minimal effort. Plotnine’s structured, grammar-of-graphics approach excels at layered and faceted visualizations like small multiples or comparisons but may take some adjustment for users accustomed to Excel’s interface.

These libraries complement Excel’s capabilities, allowing you to overcome its limitations and tell more impactful data stories. The best choice depends on your specific needs—whether it’s control, simplicity, or advanced layering.

Exploring these tools can feel overwhelming at first, but they open up a world of possibilities for enhancing your data visualizations. If you’re unsure which library to start with or how to incorporate them into your workflows, feel free to ask in the comments. I’d also love to hear about your experiences using Matplotlib, Seaborn, or Plotnine—what challenges have you faced, and how have these tools helped? Let me know below.

The post How to understand the differences between matplotlib, seaborn and plotnine for Python in Excel data visualization first appeared on Stringfest Analytics.

Python in Excel: How to understand the Seaborn package

George Mount — Fri, 06 Dec 2024 19:49:41 +0000

Python’s integration into Excel has opened a new world of possibilities for data analysis and visualization, and Seaborn is one of the most exciting tools to leverage within this environment. Seaborn builds on Matplotlib, simplifying complex visualizations while maintaining flexibility and producing professional, aesthetically pleasing charts.

This post introduces Seaborn and its core functionality, with practical examples to help Excel users get started with this powerful library. Download the exercise file below to follow along:

Download the exercise file here

The building blocks of Seaborn

Seaborn stands out for its intuitive design, built to simplify the process of creating plots. Here are its key features:

Dataset-oriented approach: Seaborn integrates seamlessly with Pandas DataFrames, allowing you to directly reference column names in your visualizations. This makes it easier to create charts without manually restructuring your data, streamlining the workflow for Python in Excel users.
Plot types: Seaborn offers two categories of plotting functions:
- Axis-level functions: Functions like sns.scatterplot() and sns.barplot() are used for single plots, such as scatterplots and bar charts. These functions offer fine-grained control for customization and are a great starting point for simpler, more direct visualizations.
- Figure-level functions: Functions like sns.relplot() and sns.catplot() operate at the figure level, which means they can generate complex visualizations composed of multiple subplots. These functions are particularly useful for exploratory data analysis, as they allow you to facet your data across multiple categories or dimensions automatically. For example, you can use sns.relplot() to create scatterplots that are grouped by a categorical variable into a grid of subplots, providing deeper insights into your dataset. While powerful, figure-level functions require a slightly more involved setup and are beyond the scope of this post.
Themes and styles: With several built-in themes and styles, Seaborn ensures that your charts are professional and readable without requiring extensive manual formatting.

Seaborn relies heavily on specific functions to create different plot types, each with arguments to control data sources, variables, colors, and other aesthetics. As an Excel user, you’re already well-versed in functions—they’re the bread and butter of the Excel programming language. This familiarity puts you in a great position to dive into Seaborn.

Data import and creating your first Seaborn plot

In the following examples, we’ll work with two datasets from the exercise workbook, loading them into Python in Excel as ad_spend_df and ratings_df, respectively:

In the following code, we’ll create a scatter plot to visualize the relationship between advertising spend and monthly sales across different stores.

sns.set_theme(style='white') sets the plot’s overall style to a clean, white background, giving the visualization a minimalist and professional look. This theme will be universally applied to all subsequent Seaborn plots unless explicitly changed.

The scatterplot() function generates the scatter plot. It uses the ad_spend_df dataset, mapping the ‘Advertising Spend’ column to the x-axis and the ‘Monthly Sales’ column to the y-axis. The hue='Store' argument applies color coding to the points based on the ‘Store’ column, making it easy to distinguish data points by store.

The remaining functions are from matplotlib.pyplot, which Seaborn relies on for additional customizations. We’ll use these frequently in other examples to enhance visualizations. Here, plt.title sets the plot’s title, “Advertising Spend vs. Monthly Sales,” providing context for the chart. Similarly, plt.xlabel and plt.ylabel define the axis labels as “Advertising Spend ($)” and “Monthly Sales ($),” clarifying the data being displayed and adding essential details to the visualization.

The resulting scatterplot will look something like this. To refine it further, we might consider enhancing the axis formatting, though this requires a bit more effort compared to Excel’s point-and-click interface. However, we were able to effortlessly add color to each point by category—a task that would have been more challenging to achieve in Excel.

Exploring Seaborn’s chart types

Next, let’s explore a range of other basic plots that can be created using a similar approach with Seaborn. Some of these plots are simpler to replicate in Excel than others:

Line plot

Let’s start with a basic line chart using the lineplot() function. Here, we specify the x-axis and y-axis to map the data appropriately. The marker='o' argument adds circular markers to each data point along the line, making individual values stand out and emphasizing key points in the trend.

The ci=None argument removes the default confidence interval shading typically added by Seaborn, keeping the chart clean and focused. Finally, we’ll include chart and axis titles. Since we’ll consistently add these in future examples, we won’t call attention to it again.

The resulting chart should look like this:

KDE plot

Now let’s create a KDE plot, which offers a smoothed representation of a histogram. Unlike histograms that display data frequency within bins, KDE plots estimate the probability density function of a variable using kernels to create a smooth curve. This results in a continuous line that more accurately reflects the overall distribution of the data.

The kdeplot() function is used here with the ad_spend_df dataset, mapping the ‘Advertising Spend’ column to the x-axis. The fill=True argument fills the area under the curve, enhancing the visual appeal and clarity of the distribution. The color='purple' argument sets the curve’s color to purple, adding a touch of differentiation and aesthetics.

Violin plot

Now, let’s explore creating a violin plot. A violin plot combines the features of a boxplot and a KDE plot. It displays both the summary statistics (like a boxplot) and the data’s full distribution (like a KDE plot). The “violin” shape represents the density of the data at different values, providing deeper insights into the distribution compared to a boxplot alone.

The violinplot() function is used with the ratings_df dataset. The x='Store' argument places the store names on the x-axis, while y='Customer Rating' maps the customer ratings to the y-axis, visualizing their distribution for each store.

The inner='box' argument adds a boxplot inside each violin, summarizing the central tendency and variability of the data, such as the median and interquartile range. The palette='pastel' argument applies a soft pastel color scheme, enhancing the visual appeal of the plot.

Bar plot

Finally, let’s dive into creating a classic bar plot. To start, we calculate the average monthly sales for each store, sorting the results in descending order with .sort_values(ascending=False). The .index method retrieves the sorted store names, which are used to customize the x-axis order, ensuring the data is displayed from highest to lowest.

The sns.set_theme(style="dark") line applies a dark background theme to the plot. This theme will now act as the default for subsequent plots.

To generate the bar plot, we use the barplot() function, mapping the ‘Store’ column to the x-axis and ‘Monthly Sales’ to the y-axis. The estimator='mean' argument ensures the bars represent the mean of ‘Monthly Sales’ for each store.

By setting ci=None, we remove confidence intervals to simplify the chart and emphasize the averages. The palette='Blues_r' applies a reversed blue gradient, where darker shades indicate higher values. Finally, order=sorted_stores arranges the bars in descending order of average sales, as defined earlier.

Conclusion

Seaborn offers an incredible variety of plots, and by now, you’re likely noticing a consistent pattern of using Seaborn functions to map variables and aesthetics, followed by customizing the results with both high-level Seaborn options and Matplotlib tools. This flexibility allows you to tailor your visualizations to your exact needs.

For Excel users, Seaborn is a game-changer, enabling the creation of advanced, visually compelling charts that go beyond the basics. By integrating Python into your workflow, you gain access to powerful analytical tools and precise customization, making it easier to build sophisticated, statistically informed plots with a level of detail and polish that can be difficult to beat in Excel alone.

Have questions about using Seaborn or Python for data visualization in Excel? Drop them in the comments—I’d love to help! Don’t forget to check out the Seaborn documentation for inspiration and examples. Be sure to visit the gallery for some amazing visualization ideas.

The post Python in Excel: How to understand the Seaborn package first appeared on Stringfest Analytics.

Python in Excel: How to create a running total

George Mount — Tue, 19 Mar 2024 16:42:23 +0000

One of my favorite applications of Python is time series analysis, particularly through the use of the Pandas package. This package was designed, in large part, to handle time series data in ways that Excel cannot always manage.

Consider, for example, a running total. While constructing a basic one in Excel is not particularly challenging, configuring it to seamlessly work with expanding data sources can be somewhat difficult. Pandas addresses this challenge by providing a simple syntax that is not only optimized for time series data but also dynamic enough to adapt to changing characteristics of the source data.

To follow along, please check out the exercise file provided below:

Download the exercise file here

In this dataset, we track the number of pledges made to a campaign and compare it to our overall pledge goal.

For this post, I’ll proceed under the assumption that you’re proficient in importing Excel tables into Python and utilizing Python for plotting within Excel. Additionally, it’s assumed that you’re familiar with creating basic calculated columns and charts using Python. If this isn’t the case, I recommend my book Advancing into Analytics.

Calculating a running total in Pandas is straightforward enough. To achieve this, we’ll simply create a new column by applying the cumsum() method to the Sales column.

Adjusting the size of the source table or the calculated column does not affect its functionality—it’s as straightforward as that.

Having addressed the calculation itself, let’s shift our focus to visualizing this data, aiming to enhance its interactivity and assist users in monitoring its progress.

A practical approach could involve generating a line chart to observe the evolution of the running total over time. I plan to also include my pledge goal as a horizontal line within this chart, providing a clear comparison to ascertain whether the goal has been achieved or to gauge any shortfall.

Let’s go ahead and do this with a combination of seaborn and matplotlib.pyplot functionality:

This chart is quite impressive, but it’s missing a crucial element that all charts should have: a title. Let’s fix that. However, instead of opting for a simple title, let’s make it a little more interesting with some dynamic inputs.

To start, I’ll incorporate a section in the workbook that allows the user to dynamically adjust the pledge goal, which will, in turn, alter the chart’s horizontal line accordingly.

Following that, I’ll introduce a dynamic chart title. Should the pledge goal be reached, the chart will display the date on which this milestone was achieved. Conversely, if the goal remains unmet, the chart will indicate the number of pledges still required.

These new requirements have made the code somewhat more complex, and it can’t be neatly captured in a single screenshot. Below, you’ll find the code snippet instead.

It utilizes an if statement in Python to determine the appropriate title for the plot, which is then set as the plot’s title. For those eager to delve deeper into Python programming fundamentals, I again recommend my book Advancing into Analytics.

This provides you with a tool like the following.

Here, the user can effortlessly insert a new pledge goal and determine whether it has been achieved or not.

You can further customize this chart by dynamically changing its color to indicate whether the pledge goal has been met. Additionally, you might consider forecasting the expected time to achieve this goal. Both tasks can be efficiently accomplished using Python within Excel.

I hope this post has demonstrated a quick win for integrating Python with Excel, especially for time series data. Do you have any questions about using Python in Excel for time series analysis or anything else? Please share your thoughts in the comments.

Additionally, if you’re aiming to enhance your team’s skills to maximize the impact of financial data in Excel using Python, consider exploring my Python Foundations for Finance corporate workshops:

Python Foundations for Finance workshops

The post Python in Excel: How to create a running total first appeared on Stringfest Analytics.

Python in Excel: How to make an open-high-low-close (OHLC) chart

George Mount — Mon, 04 Mar 2024 22:43:27 +0000

As a Python enthusiast for a few years and an Excel aficionado for even longer, it’s been exciting to see Python become officially supported in Excel. Yet, I still receive a lot of questions from Excel users about who this tool is right for and when they should use it.

It’s a fair question and one that’s hard to always have an answer for, especially with Excel offering so much great native functionality these days! However, one discipline that particularly stands to benefit from Python is finance. It has truly become a tool of the trade for finance professionals, to the extent that the CFA exams now even assume proficiency in Python.

This post will demonstrate how to create a particular kind of financial markets analysis plot using Python, directly within Excel. You can follow along with the link below:

Download the exercise file here

What is an OHLC plot?

An OHLC plot is a financial chart showing Open, High, Low, and Close prices for a set period—day, week, or month. It visually summarizes price movements, offering market behavior insights. Traders, investors, and analysts use it to gauge market trends, make decisions, and spot trading opportunities from historical data.

OHLC plots with Python in Excel

Summarizing the data

The first task on our agenda is to summarize the data. Currently, we have hourly prices, and to facilitate plotting, I aim to extract the open, high, low, and close prices for each date. This task shouldn’t be too daunting as pandas offers convenient tools for such operations which I can employ in our Excel-based Python environment.

To kick things off, I’ll convert the stock prices from the exercise workbook’s stocks table into a DataFrame. From there, I’ll leverage pandas to determine the daily open, high, low, and close prices—a process that involves transforming the DataFrame index into dates. By employing a combination of the resample() and ohlc() methods, I can reformat the data into a structure conducive for Python-based plotting and user comprehension.

I have pasted the code in a snippet below for ease of visibility and use:

Building the OHLC plot

The next step is to actually build the plot. Although there are specific Python packages like mplfinance that can easily generate financially-related plots in Python such as this one, we do not have access to them in the Python in Excel environment. Therefore, we will need to be a little creative by using matplotlib.pyplot to build the chart mostly from our own code.

Since this code is somewhat challenging to understand, and it’s difficult to fit everything into one screenshot, I will provide it below in a snippet with some comments. Then, I will display the results inside Excel. Please note that when you receive a plot from Python in Excel, it will appear in a single cell, so you will want to resize it appropriately.

Your Python in Excel plot should look something like this:

Interpretation

Let’s take a moment to interpret what we’ve created. The horizontal lines to the left and right show the opening and closing prices for each day, respectively. A green bar means the stock closed higher than it opened, which is seen as positive, while a red bar means it closed lower, seen as negative.

In the chart you’ve shown, the stock had two days where it ended higher than it started, and three days where it ended lower. The lengths of the vertical lines show how much the price fluctuated during the day—the longer the line, the more the price moved.

Conclusion

I hope this blog post has helped you consider what types of tools can assist you in leveraging Python within Excel. Perhaps you’ve discovered a new finance plot style, or at the very least, gained insights into how to create one. What questions do you have, or what ideas are you considering for utilizing Python in Excel? Feel free to share your thoughts in the comments. I also offer corporate training services on various topics in Python, Excel, finance and data analytics.

The post Python in Excel: How to make an open-high-low-close (OHLC) chart first appeared on Stringfest Analytics.

How to understand the initialization environment in Python for Excel

George Mount — Sun, 05 Nov 2023 14:51:20 +0000

Note: The Python in Excel initialization environment is now editable! Check out this post from Microsoft for more.

The availability of Python integration within Excel has made Python an accessible and valuable skill for many Excel users to consider adding to their toolkit.

However, as an open-source programming language, Python operates differently from proprietary spreadsheet software like Excel.

This post examines a significant distinction—the use of packages in Python. We will discuss how these packages function, their management in Python-enabled Excel environments, and their essential role in the vast majority of data analysis tasks carried out with Python.

What are packages?

At its core, a Python package is a collection of Python files and code that, when combined, extend the capabilities of Python programs. In this way, they function similarly to Excel add-ins.

However, unlike VBA add-ins, which are typically used sporadically and independently of one another, the utilization of packages is fundamental to programming in Python. Almost every Python script you wil come across begins with commands to import one or more packages, and this holds true for data analysis in Python as well.

Python packages for data analysis

When guiding Excel users through learning Python, it’s important to emphasize that Python was not originally designed with a primary focus on data analysis. What seems straightforward in Excel, such as multiplying a range by two, can yield surprising results in Python, as illustrated in this blog post:

Python was not designed for data analysis (and why that’s OK)

Certainly, the fact that Python was not inherently set up for data analysis does not deter its use for such purposes. On the contrary, the robustness of Python’s data analysis capabilities has grown significantly due to the availability of specialized packages.

To effectively use Python for data analysis, it’s essential to incorporate a variety of these packages into your code.

This necessity is where the initialization script for Python in Excel proves invaluable. It sets the stage for Excel users to harness Python’s power by managing package importation and setting up the environment for data manipulation and analysis tasks.

Python in Excel is configured with a specific setup that prepares Python to carry out various tasks within your workbook. This configuration includes setting up essential packages. Some parts of this code ensure that Python and Excel operate together seamlessly behind the scenes, while most of the setup involves making useful packages accessible for data analysis and visualization in Python on the user interface.

Let’s explore the location of the initialization script in Excel and examine its contents.

Viewing the initialization script

To view the initialization script for Python in Excel, navigate to the ‘Formulas’ tab on the ribbon and then click on ‘Initialization’. An ‘Initialization’ sidebar should appear on the right side of your worksheet, as follows:

To make this set of imports a little more interpretable, I’ve moved some things around and added comments into the below script:

Much of the code here pertains to backend integration between Python and Excel, which I wouldn’t recommend for beginners to focus on initially. However, there are five Python packages that are particularly useful for users looking to maximize Python for data analysis:

numpy & pandas for data manipulation and analysis
matplotlib.pyplot & seaborn for data visualization
statsmodels for statistical analysis

For a basic overview of using all of these packages, check out my book, Advancing into Analytics:

Advancing into Analytics: From Excel to Python and R (O’Reilly)

Advancing into Analytics: From Excel to Python and R (O’Reilly)” — Stringfest Analytics" src="proxy.php?url=https://stringfestanalytics.com/book/embed/#?secret=dauNH2A7u3#?secret=H92KtGawyU" data-secret="H92KtGawyU" width="500" height="282" frameborder="0" marginwidth="0" marginheight="0" scrolling="no">

What other packages are available?

While there are many more thousands of packages available for Python, only a handful more are currently supported in the Excel environment. You can get a full list of them here.

To include any of these additional packages in your workbook or to otherwise make any additions to the intialization, Excel at this time suggests the following:

Tip: The initialization settings are currently read-only. You can work around this by creating a separate sheet that is the first sheet in your workbook and entering desired import statements and settings on this worksheet. We calculate Python formulas in row-major order and then worksheet order, so code on the first worksheet is the first to run.

To conclude, Python packages add a new dimension to Excel for data analysis, functioning like supercharged add-ins. This post walked you through the key elements of integrating Python packages with Excel, focusing on tasks such as data manipulation, visualization, and statistical analysis.

Do you have any questions about Python packages, whether in general or specifically related to their use with Excel? Feel free to share your queries in the comments section below.

The post How to understand the initialization environment in Python for Excel first appeared on Stringfest Analytics.

matplotlib - Stringfest Analytics

Copilot for Excel: How to do exploratory data analysis with Python

Summarizing basic statistics

Checking missing values and data types

Creating visualizations

Diving deeper with correlation

Conclusion and next steps

Python in Excel: How to understand the Matplotlib package

The building blocks of Matplotlib

fig, ax isn’t always used

Changing chart types in Matplotlib

Matplotlib and Pandas DataFrames

Conclusion

Python in Excel: Why use Python for data visualization?

How to understand the differences between matplotlib, seaborn and plotnine for Python in Excel data visualization

Matplotlib

How It Works

Example 1: Creating a Gantt Chart for Project Timelines

Example 2: Custom Branding for Sales Trends

Seaborn

How It Works

Example 1: Correlation Heatmap for Marketing Effectiveness

Example 2: Small Multiples for Sales Trends by Region

Plotnine

How It Works

Example 1: Faceted Chart for Regional Sales Trends

Example 2: KDE plot for distribution

Conclusion

Python in Excel: How to understand the Seaborn package

The building blocks of Seaborn

Data import and creating your first Seaborn plot

Exploring Seaborn’s chart types

Line plot

KDE plot

Violin plot

Bar plot

Conclusion

Python in Excel: How to create a running total

Python in Excel: How to make an open-high-low-close (OHLC) chart

What is an OHLC plot?

OHLC plots with Python in Excel

Summarizing the data

Building the OHLC plot

Interpretation

Conclusion

How to understand the initialization environment in Python for Excel

What are packages?

Python packages for data analysis

Viewing the initialization script

What other packages are available?

`fig, ax` isn’t always used