Python in Excel: How to build optimization models with Copilot
https://stringfestanalytics.com/python-in-excel-how-to-build-optimization-models-with-copilot/ (November 12, 2025)

Excel users typically turn to Solver for optimization, but let’s face it—Solver isn’t exactly user-friendly, with its confusing menus, tricky constraints, and cryptic error messages. Luckily, there’s another option: Python in Excel’s Advanced Analysis.

But here’s the catch: Advanced Analysis expects neatly structured datasets with clear rows and columns. Optimization problems, however, usually involve exploring relationships between variables, calculations, and constraints, making them quite different from typical data analysis tasks.

That means we’ll need a creative workaround, but it’s also a great chance to rethink optimization and discover what’s possible with Excel and Python together. Ready? Open a blank workbook and type “Advanced Analysis with Python” into Copilot to launch Advanced Analysis and get started.

Example 1: Product optimization

Product optimization is a classic analytics problem, and Python in Excel gives us a surprisingly clean way to model it without Solver. In this example we’re working with a simple two-product scenario. I frame it almost like a word problem: lay out the parameters, constraints, and objective in plain language, then hand the entire setup to Copilot.

Copilot interprets the problem and returns the optimal solution:

Advanced Analysis optimization model 1

The entire optimization model is translated into and then executed in Python. It’s helpful to look at the code Copilot generates behind the scenes:

If you know Python, you’ll notice it uses plain lists, tuples, and NumPy arrays rather than DataFrames, even though Python in Excel normally wraps most inputs and outputs as DataFrames. That difference introduces a little friction when you’re doing optimization inside Excel. You may notice in the following examples that important final values like total profit or total cost don’t always appear in the results, because they don’t fit cleanly into the output DataFrame Copilot creates. When that happens, you may need to ask Copilot for those numbers directly or print them from the code yourself.

Still, the workflow is straightforward: describe the optimization problem clearly, pass it to Copilot, and let Python build and solve the model directly inside Excel.
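For a problem like this, the code Copilot writes typically follows SciPy’s linprog pattern. Here’s a minimal sketch with made-up profit and resource figures (the real ones live in the prompt and screenshots); note that linprog minimizes, so the profits are negated:

```python
from scipy.optimize import linprog

# Hypothetical two-product problem: maximize 40*A + 30*B
# subject to labor and material limits. All figures are placeholders.
profits = [-40, -30]                  # negated because linprog minimizes
A_ub = [[2, 1],                       # labor hours per unit of A, B
        [1, 3]]                       # material units per unit of A, B
b_ub = [100, 90]                      # available labor, available material

res = linprog(profits, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print(res.x)       # optimal units of A and B
print(-res.fun)    # total profit, sign flipped back
```

With these placeholder numbers the optimum is 42 units of A and 16 of B for a profit of 2,160 — but the point is the structure, which matches the shape of what Copilot generates.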

Three-product optimization with added constraints

Here’s a slightly more realistic scenario. Instead of choosing between two products, we now have three: A, B, and C. Each one has different profit margins and different demands on labor and material, and now we add real-world constraints like minimum production and maximum capacity. Again, I describe the entire optimization problem in plain language, just as you’d see it in a textbook or case study, and hand that straight to Copilot.

Copilot interprets the structure, builds a Python optimization model underneath, and returns the production plan.

Optimization example 2 Copilot Excel Python

Python decides that the best mix is roughly 13 units of A, 49 units of B, and zero units of C. That result fully respects the labor and material limits, meets the minimum production requirement for A, and stays under the cap for C.

As with the previous example, you don’t automatically see the total profit or resource utilization in the output table. Those values exist in the Python environment (or could be easily produced), but they’re not displayed unless you explicitly ask for them.
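Producing those totals yourself takes only a few lines of NumPy. Everything below is illustrative: the quantities echo the rough mix above, and the profits, usage matrix, and limits are placeholders for whatever your prompt specified:

```python
import numpy as np

x = np.array([13.33, 48.89, 0.0])      # solved units of A, B, C (illustrative)
profits = np.array([25, 30, 18])       # profit per unit (placeholder)
usage = np.array([[1.5, 2.0, 1.0],     # labor hours per unit of A, B, C
                  [2.0, 1.0, 3.0]])    # material units per unit of A, B, C
limits = np.array([120, 80])           # labor and material available

total_profit = profits @ x
utilization = usage @ x / limits       # fraction of each resource consumed
print(round(total_profit, 2))
print(utilization.round(3))
```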

Minimizing shipping costs across locations

This last example shifts from production planning to a classic transportation problem. We have two warehouses and three retail stores, each with its own shipping cost, capacity, and demand. Again, I describe the entire problem as a word-problem: the costs, the capacities, the store requirements, and the objective of minimizing total shipping cost.

Copilot takes that plain-language description, builds the underlying optimization model in Python, and returns the shipment plan.

Advanced analysis optimization example 3

Warehouse 1 fully supplies Store 1 and part of Store 3, while Warehouse 2 picks up the remaining demand for Stores 2 and 3. Some routes end up with zero shipments because the optimizer naturally avoids expensive or unnecessary paths. And just like before, the output appears as a small DataFrame because that’s the default format for Python in Excel. If you want additional information like the total minimized cost you’d need to ask Copilot for it or extract it directly from the Python code.
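Structurally, the model behind this is still linprog: one decision variable per route, capacity rows as inequalities, and demand rows as equalities. Here’s a sketch with hypothetical costs, capacities, and demands standing in for the figures in the actual prompt:

```python
import numpy as np
from scipy.optimize import linprog

# Routes flattened row by row: (W1→S1, W1→S2, W1→S3, W2→S1, W2→S2, W2→S3).
cost = [4, 6, 9,
        7, 5, 3]                      # per-unit shipping costs (hypothetical)

A_ub = [[1, 1, 1, 0, 0, 0],           # Warehouse 1 ships at most its capacity
        [0, 0, 0, 1, 1, 1]]           # Warehouse 2 ships at most its capacity
b_ub = [80, 100]

A_eq = [[1, 0, 0, 1, 0, 0],           # Store 1 receives exactly its demand
        [0, 1, 0, 0, 1, 0],           # Store 2
        [0, 0, 1, 0, 0, 1]]           # Store 3
b_eq = [50, 60, 70]

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 6)
plan = np.array(res.x).reshape(2, 3)
print(plan)       # shipment plan, warehouses × stores
print(res.fun)    # total minimized shipping cost
```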

Want to check your results against mine? Download the solution workbook below:

 

In summary, while Excel users traditionally turn to Solver for optimization, Python in Excel’s Advanced Analysis offers an attractive alternative. This approach simplifies model building by allowing users to frame optimization scenarios clearly in everyday language, bypassing Solver’s often complex interface. Leveraging Python’s powerful analytical capabilities directly within Excel makes this method intuitive and accessible.

However, it’s important to recognize some limitations. The optimization packages available within Python in Excel—primarily SciPy, NumPy, and pandas—can effectively handle many of the same linear and nonlinear optimization tasks as Solver. Yet advanced capabilities such as integer programming, or the convex optimization tasks typically handled by specialized libraries like PuLP or CVXOPT, aren’t supported within the current Python environment in Excel.

Additionally, users may encounter issues where essential outputs, like total profits or minimized costs, aren’t immediately visible and require manual extraction from the Python environment or additional prompts to Copilot.

By clearly defining optimization problems in plain language and using Python libraries available in Excel, you can streamline complex modeling tasks and gain practical insights without leaving the familiar Excel interface.

The post Python in Excel: How to build optimization models with Copilot first appeared on Stringfest Analytics.

Python in Excel: How to calculate value at risk (VaR)
https://stringfestanalytics.com/python-in-excel-how-to-calculate-value-at-risk-var/ (July 1, 2025)

Value at Risk (VaR) measures the potential loss of an asset or portfolio under normal market conditions, typically defined by a specific confidence level and time period. It’s essential for risk management, allowing analysts and portfolio managers to quantify and control potential losses.

Traditionally, calculating VaR in Excel has been cumbersome, involving complex techniques that are error-prone and difficult to scale. With Python now integrated into Excel, performing accurate and efficient VaR calculations is simpler than ever. Python brings powerful statistical and computational tools directly into Excel’s familiar interface, making sophisticated risk calculations accessible to all Excel users, even those new to coding.

In this post, you’ll learn exactly how Python in Excel simplifies VaR calculations. Follow along with the demonstration using the exercise file linked below.

 

This workbook includes all of the code you’ll need, shown step by step. First, let’s load and transform the data:

To start our VaR calculation, we’ll import our historical price data from Excel into a DataFrame. Next, we’ll calculate daily returns, since VaR measures the potential loss based on changes in value rather than absolute prices. By analyzing these daily returns, we can quantify the typical range of gains and losses, which forms the basis of our risk assessment. With this data prepared, we’re now ready to accurately estimate and visualize the Value at Risk.
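In code, those two steps are a DataFrame and a pct_change() call. The prices here are illustrative; inside the workbook you’d load the table with something like xl() instead of typing values:

```python
import pandas as pd

# Illustrative closing prices; in Excel this would be something like
# prices = xl("Prices[#All]", headers=True)
prices = pd.DataFrame({"close": [100.0, 101.5, 100.8, 102.2, 101.0]})

# Daily returns: percent change from one day to the next.
returns = prices["close"].pct_change().dropna()
print(returns)
```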

Next, we set our confidence level to 95% and use NumPy’s percentile function to find the daily return threshold that marks the worst 5% of outcomes—that is, our VaR. To clearly present the result, we use Python’s f-string formatting, which lets us embed calculations directly into a string, neatly displaying our VaR as a percentage.
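That threshold is just the 5th percentile of the return series, which np.percentile computes directly. The returns below are illustrative:

```python
import numpy as np

returns = np.array([0.012, -0.008, 0.005, -0.021, 0.009,
                    -0.015, 0.003, 0.018, -0.004, -0.011])  # illustrative

confidence = 0.95
var_1d = np.percentile(returns, (1 - confidence) * 100)
print(f"1-day VaR at {confidence:.0%} confidence: {var_1d:.2%}")
```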

Now, we extend our one-day VaR to a longer period: in this case, 10 days. Because risk tends to scale with time, we multiply our original one-day VaR by the square root of the holding period. This calculation gives us a practical estimate of our maximum expected loss over the longer timeframe. Again, we’re using an f-string to neatly format and display our 10-day VaR result clearly as a percentage.
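The scaling itself is one line; the 1-day figure below is illustrative:

```python
import math

var_1d = -0.0150            # illustrative 1-day VaR as a decimal
holding_period = 10         # days
var_10d = var_1d * math.sqrt(holding_period)
print(f"{holding_period}-day VaR: {var_10d:.2%}")
```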

Finally, we visualize the distribution of daily returns using a histogram to clearly see the frequency of different returns. We also add a vertical dashed red line at our 95% VaR threshold to highlight this critical risk boundary visually. This plot makes it easy to understand how frequently different returns occur, and where our VaR fits within this distribution, providing a clear visual understanding of potential risk exposure.
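Here’s a sketch of that plot, using simulated returns in place of the workbook’s actual series:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1234)
returns = rng.normal(0.0005, 0.01, 1000)   # simulated daily returns
var_95 = np.percentile(returns, 5)

fig, ax = plt.subplots()
ax.hist(returns, bins=50)
ax.axvline(var_95, color="red", linestyle="--",
           label=f"95% VaR: {var_95:.2%}")
ax.set_xlabel("Daily return")
ax.set_ylabel("Frequency")
ax.legend()
```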

Here’s our resulting plot:

Value at risk histogram

 

This histogram shows the distribution of daily returns for a particular asset or portfolio. The red dashed line represents the Value at Risk (VaR) at a 95% confidence level, positioned at -1.50%. This means there is a 5% chance of experiencing a daily loss exceeding 1.50%. In other words, on 95% of trading days, the losses are expected to be better (less severe) than this threshold. Visualizing VaR in this manner helps financial analysts quickly grasp the potential downside risk, highlighting how extreme losses, while infrequent, are still a vital consideration in risk management strategies.

Last but not least, let’s translate our VaR percentages into meaningful dollar amounts. By multiplying the VaR values by the total portfolio value, we calculate exactly how much money could potentially be lost within both one-day and 10-day periods at the 95% confidence level.

Setting this up in an Excel output, this summarizes the Value at Risk analysis in practical terms. With a portfolio valued at $100,000, the 95% one-day VaR indicates a potential loss of approximately $1,503.37 on a single day, meaning there’s only a 5% chance that daily losses would exceed this amount under typical market conditions.
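Using the figures from the post, the dollar conversion is straightforward:

```python
portfolio_value = 100_000
var_1d_pct = -0.0150337                 # 1-day 95% VaR from the analysis
var_10d_pct = var_1d_pct * 10 ** 0.5    # scaled to a 10-day horizon

var_1d_dollars = portfolio_value * var_1d_pct
var_10d_dollars = portfolio_value * var_10d_pct
print(f"1-day 95% VaR:  ${abs(var_1d_dollars):,.2f}")
print(f"10-day 95% VaR: ${abs(var_10d_dollars):,.2f}")
```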

Value at risk dollar assessment

 

Extending the timeframe, the 10-day VaR at the same confidence level shows a potential loss of approximately $4,754.08, highlighting how risk scales over longer holding periods. This output provides clear, dollar-specific insights, making it easier to communicate and manage the portfolio’s potential risk exposure.

In this post, you’ve learned how to calculate and visualize Value at Risk (VaR) using Python directly within Excel. By integrating Python’s statistical power into Excel’s familiar environment, you’ve simplified a previously complex task, making advanced financial risk analysis more accessible. Understanding VaR allows you to better quantify and manage potential losses, enhancing your decision-making in finance.

It’s important to recognize that this example is quite simplistic, and real-world scenarios often involve more sophisticated modeling. Additionally, VaR has faced criticism for contributing to inadequate risk assessments, particularly during financial crises, because it relies heavily on assumptions like normal market conditions and consistent volatility. Therefore, while VaR is a useful tool, always combine it with additional risk measures, scenario analyses, and stress tests for a more robust risk assessment.

If you have any questions about this approach or wish to explore Python in Excel further for financial analytics, feel free to leave a comment below.

The post Python in Excel: How to calculate value at risk (VaR) first appeared on Stringfest Analytics.

Python in Excel: How to generate fake data with Faker
https://stringfestanalytics.com/python-in-excel-how-to-generate-fake-data-with-faker/ (August 17, 2024)

In a previous blog post I showed how to get over limitations of creating dummy datasets directly in Microsoft 365 Copilot by asking for a Python script instead, particularly the faker package.

Technology moves quickly, and since publishing that post, the Faker package has become available directly within the Python in Excel environment! In this post, I’d like to show you how to generate datasets directly in Excel using this package.

To follow along with this tutorial, download the exercise file below.

 

When to use Faker

Faker is especially useful when generating categorical data, such as locations, email addresses, or names. To get started with any Faker session, we’ll begin with a couple of lines of code:

  • from faker import Faker imports the Faker class from the Faker library.
  • fake = Faker() creates a new instance of the Faker class and assigns it to the variable fake.

This Faker instance can then be used to generate various types of fake data. If you’re not familiar with this vocabulary of classes and instances, they relate to Python’s use of object-oriented programming. However, you don’t need to worry too much about the theory for now—let’s dive into creating some fake data!

To do so, let’s get started by creating a fake name! The code fake_name = fake.name() generates a fake name using the Faker instance and assigns it to the variable fake_name. We’ll then view the results in our Excel workbook.

Faker package Python

We just created a single fake name, which can be useful, but you’ll likely need to generate more data than that for most tasks. You can generate multiple names at once using a Python feature called list comprehension, like so:

This code uses the same fake.name() method as before to generate a new fake name for each iteration over the range(10), resulting in a list of 10 unique names:

Generating fake names with Python in Excel

So far, very cool! You might notice that the names I’m getting are completely different from yours and that your names keep changing every time you run the code. This can be a problem if you want to share your results or reuse the data later. To make the data reusable, we can stabilize the random number generation by setting a random seed. Once you set the seed, everything else in your Faker instance will consistently produce the same random numbers.

In the example below, my 10 names will remain the same after setting the random seed, but the one generated before setting the seed will continue to change. I usually set my random seed to 1234, but you can choose any integer—just be consistent.

Awesome work so far! Now, let’s get a bit more sophisticated. Faker can generate much more than just names. You can explore the thorough documentation to see all the possibilities. There are numerous categories, and it can even set the locale of the random data to make it more location-appropriate. For example, you can generate phone numbers formatted for the United States versus those you’d find in India.

Generating a DataFrame of fake data

Let’s take our data generation a step further. This time, I want to generate a list of both names and emails. Now, we’re entering the realm of data that makes sense to organize into rows and columns. This sounds like a job for Pandas!

What I’m going to do is create a dictionary in Python, which will be a great way to hold the data before we shape it into a Pandas DataFrame. This will give us a typical two-dimensional dataset that Excel users will find familiar:

name email DataFrame Python faker Excel

This approach is pretty cool, but we might want to ensure the data is more internally consistent by aligning the email addresses with the corresponding names. To achieve this, we can start by generating the person’s name and then convert that name into an email address in lowercase, using @example.com. Once these elements are created, we can directly insert both lists into a DataFrame.

You’ll see that the results in our workbook are much more consistent but perhaps overly simplified.

Generated dataframe fake data emails

For example, you might want these email addresses to use more realistic and varied domains rather than just example.com. You’ll also notice that the first entry results in a bogus email address—since the “Mrs.” in the name wasn’t stripped, it created an odd-looking email address. These issues can definitely be fixed with more advanced techniques, but we’ll leave it as is for now. Again, the point of this exercise is to generate data that’s close enough to realistic. After all, this is just fake data.

Awesome work! If you plan to continue generating this specific set of fake data, it’s a great idea to package it into a function. This approach allows you to easily call the function later in your workbook to generate exactly the number of rows you need. To make this function self-contained, I’ve ensured that everything necessary for it to work is defined internally. You’ll notice that I’m setting up a standalone instance and seed for Faker directly within the function:

Generate fake names function in Python in Excel


Generating quantitative data with Faker

The Faker package can generate a wide range of categories and handle tasks involving quantities. For example, if I wanted to create a function to randomly generate a DataFrame with sales categories and a sales quantity variable, with numbers evenly distributed between 1 and 100, my code would look like this. Notice that I’ll create a custom list of sales categories to randomize from, using Python’s random package instead of Faker. This means I need to set the random seed for both Faker and random since the randomizations are coming from two different sources.

However, if you’re looking to generate more complex statistical distributions or need greater control over the statistical properties of the data (such as the mean, variance, and so forth), you might consider using other packages specifically designed for statistical simulations. In particular, the numpy package is excellent for this. Keep in mind that if you use a different package than Faker, you’ll want to set the random seed for it as well, as the random generators used will be different.

For example, let’s generate the same dataset of categories and quantities, but this time we’ll use numpy to create a variable that is normally distributed with a mean of 50 and a standard deviation of 10. At this point, we’re not even using Faker anymore because we’ve created our own custom list of business categories tailored to our specific use cases. Additionally, we’re working with more specific variable distributions, which Faker isn’t really designed to handle.
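A sketch of the NumPy version — the same hypothetical categories, but the quantity now comes from a normal distribution with mean 50 and standard deviation 10:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1234)      # NumPy has its own seed
categories = ["Electronics", "Clothing", "Groceries", "Toys"]  # hypothetical

df = pd.DataFrame({
    "category": rng.choice(categories, size=100),
    "quantity": rng.normal(loc=50, scale=10, size=100).round(2),
})
```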

This demonstrates that while Faker is excellent for generating categorical data, it’s not the ultimate tool for all data generation needs in Python.

Getting help with Copilot

Because there are so many types of variables you might want to create, and Faker is just one tool in the toolkit, a great way to expand your range of possibilities is by using Copilot to have generative AI write the script for you. For example, here’s a prompt you can give to Copilot in Excel to generate a dataset. Unfortunately, this feature can be a bit hit or miss right now in Copilot for Excel. If it doesn’t work, I’d suggest trying it in Copilot for Microsoft 365, as shown in the post linked at the beginning of this one. But here’s a prompt you can try in Copilot in Excel:

Create a Python script to give me a dataset for an organization’s employee performance review for 500 employees with the following details:

  • Columns: Employee ID, Department, Performance Score, Salary, Years of Experience
  • Performance Score: Normally distributed with a mean of 70 and a standard deviation of 10
  • Salary: Log-normally distributed with a mean of $50,000 and a standard deviation of $15,000
  • Years of Experience: Exponentially distributed with a lambda of 0.1
  • Department: Randomly chosen from ‘Sales’, ‘HR’, ‘IT’, ‘Marketing’, ‘Finance’
  • Random seed: Set to 1234
  • Faker seed instance: Set to 1234

Keep in mind that Copilot requires data to be stored in a table format to be used effectively. To work around this, I created a fake table as a workaround to run the prompt and achieve the desired result.

Copilot does provide some results.
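The generated script isn’t reproduced in this version of the post, but one following the prompt’s spec might look like this NumPy-based sketch. The column names come straight from the prompt; the log-normal parametrization is one reasonable reading of 'mean $50,000':

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1234)
n = 500
departments = ["Sales", "HR", "IT", "Marketing", "Finance"]

df = pd.DataFrame({
    "Employee ID": [f"EMP{i:04d}" for i in range(1, n + 1)],
    "Department": rng.choice(departments, size=n),
    "Performance Score": rng.normal(70, 10, n).round(1),
    "Salary": rng.lognormal(mean=np.log(50_000), sigma=0.3, size=n).round(2),
    "Years of Experience": rng.exponential(scale=1 / 0.1, size=n).round(1),
})
```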

Excel Copilot prompt

Conclusion

In conclusion, the Faker package in Python is a powerful and flexible tool for generating fake data that can be seamlessly integrated into Excel. It’s an ideal solution for Excel analysts and data enthusiasts who need to test and validate their spreadsheets. Whether you’re preparing data for demonstrations, training sessions, or simply ensuring the robustness of your Excel applications, Faker offers a straightforward way to create realistic datasets on the fly. However, it’s definitely not the only tool you need to know for generating realistic datasets. Be aware of NumPy, particularly for quantitative data, and don’t hesitate to use Copilot for creating more complex generation scripts.

What questions do you have about the Faker package specifically, or Python in Excel more generally? Let me know in the comments.

The post Python in Excel: How to generate fake data with Faker first appeared on Stringfest Analytics.

Python in Excel: How to understand how objects work
https://stringfestanalytics.com/python-in-excel-how-to-understand-how-objects-work/ (November 28, 2023)

The concept of objects significantly distinguishes Python from Excel, enhancing its importance in programming. Integrating Python objects into Excel introduces unique behaviors and functionalities, which will be the focal point of this blog post.

Download the following to explore this intriguing intersection of Python and Excel:

 

We’ll begin the exploration in the formulas-objects worksheet.

How do Python objects work differently than Excel cells?

One of the most notable distinctions between Python and Excel lies in Python’s utilization of objects. In Python, each data element is stored as a distinct object in memory, isolated from its predecessors. This architecture ensures that data is not immediately recalculated whenever another object is modified. This approach significantly simplifies auditing and reproducing work, making it more efficient than in Excel.

Consider a basic scenario in an Excel worksheet, where formulas are sequentially stacked to derive a result:

Example of sequence of Excel cell calculations

A single misplaced keystroke or alteration can lead to errors cascading throughout an Excel spreadsheet. Imagine the consequences if this erroneous cell resides in a separate worksheet or, even more challenging, in a different workbook.

A series of Excel cell calculations gone wrong

Troubleshooting and debugging in such scenarios become exceedingly difficult, resembling a precarious and haphazard game of Jenga rather than a methodical debugging process.

Contrarily, Python operates by storing each data element within a distinct object. While this is a simplified explanation, envision assigning each data element as placing it into its unique container, termed an object. Each of these objects is stored separately in memory, and altering one does not inherently affect the others. Therefore, if an anomaly occurs in one object, it does not trigger widespread chaos among the rest.

It’s important to note that in standard Python, when data is encapsulated into an object, you must print or metaphorically “open the box” to view its contents. This characteristic of Python differs in its application within Excel. Additionally, the default method of object storage in Python is modified when integrated with Excel.

Python in Excel is optimized for Pandas DataFrames

For this next part of the demo, head to the points worksheet, which displays the top five NBA players ranked by points scored at the time this blog post was composed.

Python, much like a collection of boxes with varying shapes and sizes, accommodates a diverse array of object types.

Excel’s grid-based layout is particularly suited for handling tabular datasets. It’s a powerful tool for tasks such as generating new columns from existing ones, altering column formats, and other similar operations.

When dealing with such data in Excel, the ideal Python object type is the Pandas DataFrame. This is the default object type utilized when integrating Python with Excel, perfectly tailored for managing Excel data.

To begin, activate a Python cell in your Excel worksheet using the PY() function, and link it to the points data. This action will generate a Pandas DataFrame right within Excel:

Creating DataFrame in Python in Excel

By default, the name of the created object appears in the Excel cell. However, if you wish to view the actual data it contains, simply use the keyboard shortcut Ctrl + Alt + Shift + M to reveal the underlying dataset.

Don’t forget to assign the object

So far, we haven’t assigned the results of our data manipulation to a Python object. Utilizing Python without assigning operations to objects might seem feasible, but doing so significantly limits its capabilities. It’s akin to using a hand-held calculator for ad-hoc calculations without retaining any of the results for future use. To leverage Python’s full potential in Excel, return to your Python code cell and begin by assigning the results to a DataFrame named points:

Assigning DataFrame to object in Python in Excel

Having successfully assigned and named the DataFrame, we now have the flexibility to perform extensive data analysis and manipulation, thanks to the capabilities of Pandas.

Suppose we want to determine the average number of points scored in this dataset. To accomplish this, we can utilize the powerful features of Pandas. The resulting computation will be assigned back to a new object, which I’ll name avg_points_pd:
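With illustrative numbers standing in for the workbook’s points table, the Pandas calculation is a one-liner:

```python
import pandas as pd

# Illustrative stand-in for the points DataFrame in the workbook.
points = pd.DataFrame({"player": ["A", "B", "C", "D", "E"],
                       "points": [2500, 2400, 2300, 2200, 2100]})

avg_points_pd = points["points"].mean()
print(avg_points_pd)   # 2300.0
```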

Calculating average points via Pandas

Switching to other Python object types

While Python in Excel defaults to storing data in a DataFrame, it’s important to note that this isn’t the only data structure available. As previously mentioned, Python offers a variety of object types for different purposes. Generally, Pandas is an excellent choice for handling data, thanks to its robust and efficient functionalities.

However, there are instances when opting for a different data type is advantageous. Take, for instance, my current task: I plan to convert the points column into a NumPy array to calculate its mean. This method yields the same result as using Pandas, albeit requiring a bit more effort. So, one might wonder, why choose this approach?
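The NumPy route converts the column to an array first — same answer, slightly more ceremony. The numbers here are the same illustrative stand-ins:

```python
import numpy as np
import pandas as pd

points = pd.DataFrame({"points": [2500, 2400, 2300, 2200, 2100]})  # illustrative

points_np = points["points"].to_numpy()   # DataFrame column -> NumPy array
avg_points_np = np.mean(points_np)
print(avg_points_np)   # 2300.0
```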

The benefits of switching to other object types

One of the most significant aspects to consider is the efficiency in terms of time and performance. This can be effectively evaluated using the %%time magic cell command in Python. By placing this command at the beginning of your cell, you can swiftly obtain performance metrics.

These speed measurements are conveniently displayed in the Diagnostics menu, located to the right of your data:

Speed test for numpy vs pandas

Let’s delve into the diagnostics together. A millisecond (ms) is equivalent to 1000 microseconds (µs), making the first calculation notably quicker than the second. Specifically, the NumPy method, utilized in cell A10, demonstrates a speed roughly five times greater than the Pandas method, when considering both CPU and wall times. To understand the nuances between CPU and wall time, and their applications in Python, you might find this post informative.

This difference in performance time is quite significant on a relative scale. However, the distinction between microseconds and milliseconds might seem negligible to the average user. Thus, it’s important to weigh whether the efficiency gained with NumPy justifies the slightly more complex coding requirements. Imagine you’re working with a large dataset where calculating the average is faster with NumPy by a few seconds compared to Pandas. In such cases, opting for NumPy could be advantageous.

Beyond speed and efficiency, there are scenarios where transitioning between different object types in Python can simplify your tasks in Excel. A fascinating instance is provided by Owen from FlexYourData, demonstrating how 3D NumPy arrays can be employed to circumvent the need for writing loops or iterations in solving Excel problems.

In both my example and Owen’s, we emphasize the NumPy array as an optimal choice for converting your DataFrame. This serves as an indication of effective practices, but numerous other possibilities exist for transforming your DataFrame into different objects, offering substantial benefits. Below is a summary table that illustrates these methods:

Conversion Target | Code to Convert from DataFrame | Code to Convert Back to DataFrame
NumPy array       | df.to_numpy() or df.values     | pd.DataFrame(array)
List              | df.values.tolist()             | pd.DataFrame(list)
Tuple             | tuple(map(tuple, df.values))   | pd.DataFrame(list_of_tuples)
Dictionary        | df.to_dict()                   | pd.DataFrame(dict)

Additionally, it’s worth noting that Owen’s demonstrations are remarkably impressive, showcasing an aspirational level of proficiency in utilizing diverse Python objects within Excel. However, it’s equally important to acknowledge and celebrate your current achievements. Remember, by integrating NumPy, you’ve achieved notable efficiency gains. Feel free to compare your work with mine to see the progress you’ve made:

What questions do you have about how Python objects work in Excel specifically, or Python in Excel more generally? Let me know in the comments.

The post Python in Excel: How to understand how objects work first appeared on Stringfest Analytics.

Python was not designed for data analysis (and why that’s OK)
https://stringfestanalytics.com/python-was-not-designed-for-data-analysis-and-why-thats-ok/ (July 2, 2022)

A major reason I think it’s easier for Excel users to pick up R versus Python is that Excel and R tend to “think” more similarly to each other than Python does. See what I mean here: let’s take a range of numbers and attempt to multiply it by two using the built-in range, list, and vector objects in Excel, Python, and R respectively:

Looks pretty straightforward in Excel and R, right? Take the range, multiply by two, get each number times two. By contrast, Python does something rather different: it literally takes the list and duplicates it (so we get eight numbers, not four). Weird, right?
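Concretely, with a plain Python list:

```python
import numpy as np

nums = [1, 2, 3, 4]

# Multiplying a list by two repeats it rather than scaling each element:
print(nums * 2)                 # [1, 2, 3, 4, 1, 2, 3, 4]

# Element-wise math needs a comprehension -- or a NumPy array,
# which "thinks" the way Excel ranges and R vectors do:
print([n * 2 for n in nums])    # [2, 4, 6, 8]
print(np.array(nums) * 2)       # [2 4 6 8]
```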

Well, not necessarily. Excel and R were designed for statistics and arithmetic. Python was designed more generally to communicate with the operating system, process errors, and so forth. The way a program ought to “think” for these tasks is rather different than for analyzing data.

“You’re crazy, bud. Python’s cleaning up in the data space right now,” you may be thinking (pun intended). That’s true. And it’s with the help of a fantastic set of packages to make analyzing data there feel a lot more natural (You may have heard of some of these: pandas, scikit-learn, numpy, etc.). 

This post isn’t a takedown of Python or an endorsement of R. You could never pick a favorite child. It’s just an exploration of how software objectives inform software behavior, with a very simple example.

To get started with this great set of tools for data analysis, check out my book Advancing into Analytics.

The post Python was not designed for data analysis (and why that’s OK) first appeared on Stringfest Analytics.
