dynamic array functions - Stringfest Analytics

How to understand the TEXTSPLIT() function in Excel

George Mount — Fri, 17 Jan 2025 23:16:18 +0000

When working with text data in Excel, it’s common to need to split strings by various delimiters such as commas, semicolons, and so forth. Although the legacy Text to Columns feature has its uses, it’s somewhat limited as a one-time operation that doesn’t automatically update with new data, can’t handle unique circumstances like missing delimiters, and so forth.

This is where the TEXTSPLIT() function shines. In this post, we’ll explore its capabilities with practical examples. To follow along, download the example file provided below.

Download the exercise file here

First, let’s run through the parameters of TEXTSPLIT(). Like many of the newer dynamic array functions in Excel, this one comes with quite a few of parameters. We won’t cover all of them in use here, but they’re listed below, and you can always check out the Microsoft documentation for further details:

Parameter	Description	Required/Optional
text	The text string that you want to split.	Required
col_delimiter	The character or string that separates columns.	Required
row_delimiter	The character or string that separates rows. If omitted, splits only into columns.	Optional
ignore_empty	A logical value that determines whether to ignore empty cells (TRUE) or include them (FALSE). Defaults to FALSE.	Optional
match_mode	Controls case sensitivity for delimiter matching. 1 for case-insensitive, 0 for case-sensitive. Defaults to 0.	Optional
pad_with	The value used to pad the result when there are more delimiters than text elements. Defaults to #N/A.	Optional

Great! Let’s dive into some examples. To get started, let’s split this list of fruits across columns using the second parameter, using a comma as a delimiter:

=TEXTSPLIT(A1, ",")

You’ll notice that the result is a spilled array. If you’re not familiar with dynamic arrays and dynamic array functions, please check out this post:

How to understand dynamic arrays in Excel

If we want to display these results down the rows, all we need to do is use the row delimiter argument instead. We can simply leave the column delimiter blank:

=TEXTSPLIT(A1, , ",")

One thing we might notice here, which wasn’t as obvious when they were in their own columns, is that now these records look a little out of alignment. This is because there’s actually an extra space between “Banana” and “Cherry” that we didn’t account for. It’s causing the leading space in those two entries. To fix that, we can just add that whitespace next to the comma:

Next, let’s explore splitting by both rows and columns. Here, we have a set of data that resembles key-value pairs about an individual which is quite hard to read. To make it more readable, we can expand this into a table. We’ll split the colons to create new columns and use semicolons to indicate new rows:

=TEXTSPLIT(A1, ":", ";")

If there’s a missing value or an extra delimiter, by default Excel will leave an empty cell for that particular value in the results. To override this behavior, simply set the fourth parameter, ignore_empty, to TRUE.

=TEXTSPLIT(A1, ",", , TRUE)

Last but not least, we can handle a similar scenario when the data is set up to split into both rows and columns, but a missing value would disrupt the structure of the result, leading to an uneven number of rows in each column. In this case, we can use the last parameter, pad_with, to fix this mismatch:

=TEXTSPLIT(A1, "=", ", ", , , "N/A")

In conclusion, TEXTSPLIT() is a powerful dynamic array function that excels at organizing often text-heavy data from complex strings, cleaning datasets, transforming key-value pairs into tabular formats, or preparing data for analysis. How do you plan to use TEXTSPLIT() in your Excel work, or have you already discovered a creative application for it? Please share your experiences, plans, or any questions or comments below.

The post How to understand the TEXTSPLIT() function in Excel first appeared on Stringfest Analytics.

How to understand the PIVOTBY() function in Excel

George Mount — Mon, 17 Jun 2024 15:30:39 +0000

In a previous post, we explored the GROUPBY() function as a formula-driven alternative to PivotTables in Excel:

How to understand the GROUPBY() function in Excel

Although the GROUPBY() function is useful, it lacks a key feature: the ability to add variables across the columns of a PivotTable. In this post, you’ll see how we can overcome this limitation using the PIVOTBY() function in Excel. This function is somewhat more complex than GROUPBY(), but it offers additional flexibility.

You can follow along with the exercise file provided below:

Download the exercise file here

The PIVOTBY() function in Excel requires four essential arguments and offers several optional ones. This function is truly a next-generation feature based on its complexity alone! We will concentrate on the four mandatory arguments and the fifth optional one, which enables us to add labels to our results. For a deeper understanding of the syntax, refer to Microsoft’s official documentation.

The key difference here, compared to using GROUPBY(), is that we can add categories to both rows and columns. Consider the following example using the demo dataset. Here, we use GROUPBY() to categorize the rows and calculate the sum of sales:

=GROUPBY(sales[[#All],[category]], sales[[#All],[sales]], SUM, 3)

If desired, we could achieve the same outcome using the PIVOTBY() function. This time, however, we would leave an argument blank to indicate that no columns are to be included:

=PIVOTBY(sales[[#All],[category]], ,
sales[[#All],[sales]],
SUM, 3)

Now, let’s explore capabilities beyond GROUPBY(). For example, we can flip around our arguments to rearrange the PivotTable to display categories along the columns:

OK, let’s now move on to something more realistic and visually appealing. This time, I’ll position country along the rows and category atop the columns:

=PIVOTBY( sales[[#All],[country]],
sales[[#All],[category]],
sales[[#All],[sales]],
SUM, 3)

PivotTables become especially useful when displaying multiple categories along the rows, allowing for a detailed breakdown by combinations of categories. Additionally, customizing the aggregation types within the PivotTable enables the use of the most illuminative methods for each variable. Let’s delve into constructing a more complex PivotTable next.

=PIVOTBY(sales[[#All],[country]:[customer_gender]],
sales[[#All],[category]],
sales[[#All],[quantity]:[sales]],
HSTACK(AVERAGE, SUM), 3)

In this example, I placed both country and customer_gender along the rows, and “category” across the columns. I even specified that I wanted an average of quantity and a sum of sales using the HSTACK() function.

One annoying thing about creating a PivotTable-like output here, as opposed to using an actual PivotTable, is the lack of a menu to assist with formatting. You could certainly format this manually, but if you’re looking for something more responsive and enjoyable to build, consider adding some conditional formatting. I set up a rule to format the data based on whether row 4 indicates quantity or sales:

It’s also worth noting that referencing multiple columns simultaneously is simpler when those columns are adjacent in the PivotTable. While not essential, this proximity simplifies the process. If the columns are not adjacent, you can utilize the CHOOSECOLS() function:

How to select two noncontiguous columns in an Excel table

One particularly cool application of the PIVOTBY() function is the ability to create a quick two-way proportion table using the new PERCENTOF() function:

=PIVOTBY(sales[[#All],[country]],
sales[[#All],[category]],
sales[[#All],[sales]],
PERCENTOF, 3)

You even have the option here to change how the proportions are calculated in this table.

For example, instead of basing the proportions on the grand total, you can calculate them across the row totals. This requires modifying the very last of so many function arguments in this function. It looks pretty unusual, doesn’t it?

=PIVOTBY(sales[[#All],[country]],
sales[[#All],[category]],
sales[[#All],[sales]],
PERCENTOF, 3, , , , , , 1)

I hope this post got you excited about the power of PIVOTBY(). There is so much you can achieve with IT, and I encourage you to explore its many parametes.

This function offers a formula-driven, instantly updated approach to creating PivotTable-like solutions. Nice work for getting the hang of it — remembering how to perform these operations can even be mentally taxing to programmers in languages like Python or R.

If you’re seeking an introduction to dynamic array functions in Excel, consider exploring my book, Modern Data Analytics in Excel:

Modern Data Analytics in Excel: Using Power Query, Power Pivot and More for Enhanced Data Analytics (O’Reilly)

Modern Data Analytics in Excel: Using Power Query, Power Pivot and More for Enhanced Data Analytics (O’Reilly)” — Stringfest Analytics" src="proxy.php?url=https://stringfestanalytics.com/maxl/embed/#?secret=o5wLZuddlP#?secret=NNARAeAXXc" data-secret="NNARAeAXXc" width="500" height="282" frameborder="0" marginwidth="0" marginheight="0" scrolling="no">

For corporate training solutions to help your team maximize their use of Excel for data analytics, check out my Modern Data Analytics in Excel workshops:

Modern Data Analytics in Excel workshops

The post How to understand the PIVOTBY() function in Excel first appeared on Stringfest Analytics.

How to understand the # spill operator in Excel

George Mount — Sat, 18 May 2024 18:55:51 +0000

Dynamic array functions represent one of the most exciting new features in Excel. Although generally as straightforward in syntax to use as other Excel functions, their underlying mechanism for handling data differs significantly.

Unlike traditional functions that deliver a single data piece in a single cell, dynamic array functions can return multiple pieces of data, or an array, simultaneously. In this post, we will explore how to leverage the power of this spill range to construct similarly dynamic summaries and aggregations.

Please feel free to follow along using the exercise file provided below:

Download the exercise file here

Dynamic array spill ranges and the spill operator

To start, let’s discuss why dynamic array functions in Excel are unique and the necessity of a spill range.

Dynamic array functions can generate multiple outputs from a single formula, filling a range of cells—known as a spill range—with results from one input formula. The spill range’s size and shape dynamically adapt based on input or data changes. For example, if you were to add a new product to the dm_sales table in the demo workbook, you will see the results in H2 expand by one row.

Should any cells within the intended spill range be occupied, Excel will display a #SPILL! error, signaling an obstruction preventing full display of the array.

Aggregating dynamic array spill ranges

Next, let’s see how to reference the complete dynamic array in further formulas or analysis, use the spill operator #. Adding # to the array’s initial cell reference (e.g., A1#) tells Excel to reference the entire spill range, not just the top-left cell. This ensures that downstream aggregations or references involving the dynamic array adjust dynamically, preserving the accuracy and relevance of your data analysis.

For example, using the COUNTA() function on the H3# spill range will essentially take a count of unique items.

Try adding or removing products from the source data table; both the unique item array and the downstream aggregation will update automatically.

Use case: dynamic drop-down

Now that you understand the basics of spill ranges and the spill operator, let’s explore a more complex use case: creating a dynamic dropdown list based on the sales reps represented in the dataset.

First, I’ll use the UNIQUE() function again to find all unique entries by combining the first name and last name columns:

To display the names in a dropdown in the format “last name, first name,” I need to rearrange the columns dynamically. This ensures that the list updates automatically if any names are added or removed.

To concatenate the columns, I’ll refer to the spill range. To reference both the first and second columns of the spill range, I can use the INDEX() function. Here’s how I’ll access the second column of the spill range and then the first:

Here, I’ve concatenated the second column with the first, using the necessary comma as a delimiter.

Now, the exciting part is that this new name format is also a dynamic array! To refer to this range with the spill operator, you can use it with data validation. Go to Data > Data Validation, then under List, set the source to =J2# to successfully link the dropdown to a dynamic array:

There’s much more potential with the spill operator when used alongside dynamic array functions. You can even integrate these with visualizations to create dynamic, interactive charts.

To master the essentials of dynamic array functions and other crucial features for creating dynamic, powerful analytics tools in Excel, check out my book Modern Data Analytics in Excel:

Modern Data Analytics in Excel: Using Power Query, Power Pivot and More for Enhanced Data Analytics (O’Reilly)

Modern Data Analytics in Excel: Using Power Query, Power Pivot and More for Enhanced Data Analytics (O’Reilly)” — Stringfest Analytics" src="proxy.php?url=https://stringfestanalytics.com/maxl/embed/#?secret=H5v0KbEGQh#?secret=IPlVva66n2" data-secret="IPlVva66n2" width="500" height="282" frameborder="0" marginwidth="0" marginheight="0" scrolling="no">

The post How to understand the # spill operator in Excel first appeared on Stringfest Analytics.

How to understand the RANDARRAY() function in Excel

George Mount — Thu, 16 May 2024 19:37:46 +0000

The RANDARRAY() function is one of the newer additions to Excel’s function library, providing a dynamic and versatile way to generate random numbers. Unlike the legacy RANDBETWEEN() and RAND() functions, which are limited to generating single values, RANDARRAY() offers more flexibility and efficiency, especially for business applications where random data generation is essential.

Let’s take a look at some examples in this post. You can follow along by downloading the exercise file below:

Download the exercise file here

`RANDARRAY()`‘s syntax

The RANDARRAY() function efficiently generates random integers or decimal numbers across specified rows and columns. Its syntax is straightforward:

RANDARRAY([rows], [columns], [min], [max], [integer])

Each parameter is optional:

rows: Defines the number of rows in the array; defaults to 1 if omitted.
columns: Defines the number of columns; defaults to 1 if omitted.
min: Sets the minimum value of the random numbers, with a default of 0. Useful for starting random numbers above zero.
max: Sets the maximum value, defaulting to 1. This constrains random numbers to a specific range.
integer: A Boolean where TRUE generates integers and FALSE (default) produces decimals between min and max.

RANDARRAY() streamlines the generation of random data, eliminating the need for multiple formulas and simplifying data layouts. It is ideal for financial modeling, Monte Carlo simulations, and preparing datasets for analysis or training, offering precision and reducing errors. This function is versatile, allowing for control over the range and type of numbers generated, and capable of creating large matrices more efficiently than RAND() or RANDBETWEEN().

Now that we’ve covered the syntax, let’s explore some practical examples.

Example 1: Sales Forecasting

Consider a marketing team needing to simulate sales data for various products over a quarter. By using the RANDARRAY() function, you can create a random dataset representing daily sales figures across different products. Suppose we need random sales data for 10 products over 90 days. Using the formula =RANDARRAY(90, 10, 0, 100, TRUE), we generate a 90×10 array of random integers between 0 and 100.

This dataset can be used to analyze trends, forecast sales, and prepare for quarterly reviews, providing a comprehensive view of potential sales patterns.

You can see this in the sales worksheet of the demo workbook.

Example 2: Risk Analysis in Corporate Finance

In corporate finance, RANDARRAY proves invaluable in risk assessment and financial modeling. Let’s say we need to simulate the daily returns of a stock portfolio over a year to analyze potential risks.

Using =RANDARRAY(365, 5, -0.05, 0.05), we generate a 365×5 array of random returns between -5% and 5% for five different stocks. This data helps in understanding the variability and risk profile of the portfolio, allowing for better investment decisions and risk management strategies.

You can see this on the risk worksheet of the demo workbook.

Example 3: Quality Control in Manufacturing

Another practical example is in HR analytics, where RANDARRAY() can help simulate employee performance scores. Suppose we want to analyze potential performance scores for 50 employees over a year. The formula =RANDARRAY(12, 50, 1, 5, TRUE) generates a 12×50 array of random integers between 1 and 5, representing monthly performance ratings. This simulated data can assist in performance reviews, identifying trends, and planning development programs.

You can see this on the quality worksheet of the demo workbook:

Conclusion

RANDARRAY() stands out for its versatility, making it an invaluable tool across numerous business applications. Unlike RANDBETWEEN() and RAND(), which generate only single random values, RANDARRAY can create entire arrays with a single formula. This capability not only saves time but also increases the robustness and clarity of your data analysis efforts.

While RANDARRAY() is exceptionally useful for generating random numbers based on a uniform distribution, it’s important to be aware of its two distinct limitations.

Firstly, if your statistical model requires different distributions, RANDARRAY() might not be the best fit, as it is limited to uniform distributions and cannot generate random numbers from other types of distributions.

Secondly, RANDARRAY() does not allow the customization of the random seed. This limitation hinders the ability to reproduce the same set of random numbers consistently, which is essential for ensuring reproducibility in your analysis.

To overcome these limitations, integrating Python with Excel can be highly advantageous. Python’s extensive libraries enable the generation of random numbers from a variety of distributions and allow the setting of random seeds to ensure reproducibility.

For those interested in further exploring dynamic array functions and their applications, I recommend checking out my book, Modern Data Analytics in Excel. This resource delves deeper into dynamic arrays and provides practical examples to enhance your Excel skills.

Modern Data Analytics in Excel: Using Power Query, Power Pivot and More for Enhanced Data Analytics (O’Reilly)

Modern Data Analytics in Excel: Using Power Query, Power Pivot and More for Enhanced Data Analytics (O’Reilly)” — Stringfest Analytics" src="proxy.php?url=https://stringfestanalytics.com/maxl/embed/#?secret=q3oiN1nzo8#?secret=SHqF1iP8Jk" data-secret="SHqF1iP8Jk" width="500" height="282" frameborder="0" marginwidth="0" marginheight="0" scrolling="no">

The post How to understand the RANDARRAY() function in Excel first appeared on Stringfest Analytics.

How to understand the SEQUENCE() function in Excel

George Mount — Thu, 16 May 2024 13:58:02 +0000

The SEQUENCE() function in Excel is a powerful tool for generating lists of values in a defined sequence. This function is particularly useful for business professionals who need to create a series of data without manually entering each number. Understanding and utilizing SEQUENCE() can save time and reduce errors, making it a superior alternative to the fill handle for creating sequences.

To follow along with this demonstration, please download the exercise file below:

Download the exercise file here

`SEQUENCE()` syntax

The SEQUENCE() function syntax is as follows:

=SEQUENCE(rows, [columns], [start], [step])

Here’s a breakdown of each parameter:

rows: The number of rows in the sequence.
columns: (Optional) The number of columns in the sequence. If omitted, defaults to 1.
start: (Optional) The starting number of the sequence. If omitted, defaults to 1.
step: (Optional) The increment for each subsequent number in the sequence. If omitted, defaults to 1.

Next, let’s explore a few quick examples of the SEQUENCE() function in action.

Example 1: Inventory Management

For our first example, let’s begin with a straightforward task on the inventory worksheet of the demo file: creating new stock keeping units (SKUs) for our inventory. We need to list 100 new SKUs in a column, starting from SKU 101.

In this case, we’ll request a sequence of 100 rows, use the default parameter of 1 column, and step increments of 1, starting at 101. This approach might seem unusual if you’ve never done it before in Excel, but in such instances, we will leave arguments we don’t need to modify as blank.

Go ahead and experiment by changing the various arguments of the function to see what you end up with. This might be the best way to learn any Excel function, including SEQUENCE().

Example 1: Monthly Budget Planning

Next for something a little more complex, head to the budget worksheet of the demo workbook.

Imagine you are preparing a monthly budget and need to list the months of the year along with a sequential number for each month. Instead of manually typing each month number using the fill handle, you can use SEQUENCE() to generate this automatically.

=SEQUENCE(12)

This formula generates a sequence from 1 to 12, representing each month. Keep in mind that because we left the other arguments blank, they defaulted to 1, meaning the sequence will generate a single column of numbers starting at 1 and extending down 12 cells.

While this might not seem very exciting—after all, we’ve just ended up with 12 numbers with no clear indication of what months they represent—it’s still more dynamic than if you had manually entered the numbers. For instance, if you needed to forecast for 18 periods, you could simply replace 12 with 18.

But let’s take it a step further. To indicate the month-end date for each row, I can use a combination of the EDATE() function with SEQUENCE():

=EDATE(A2 - 1, SEQUENCE(A5))

This formula now facilitates fully dynamic sequence generation, serving as an excellent starting point for developing various monthly budgets.

The most unusual aspect of this formula is likely the subtraction of 1 from A2. If you’re unfamiliar with the EDATE() function, its second argument specifies the number of months to add or subtract from a given start date. Since our SEQUENCE() starts at 1, it would normally add one month to the date figures. Therefore, subtracting 1 acts as a counterbalance, ensuring we get the month-end date for the last day of each previous month. Feel free to experiment further on your own if this concept is unclear!

Example 3: Sum of top N items

For our final example in this post, let’s explore how SEQUENCE() can assist in summing the top N items from a sales list. This technique is invaluable for understanding what percentage of an assortment’s performance is derived from a select group of top items. Check it out on the sales worksheet of the demo workbook.

Perhaps you’ve previously used the LARGE() function to identify the Nth largest value in a dataset. With SEQUENCE(), instead of isolating just the Nth largest values, we can capture the first through Nth values. In this example, I’ve set N to 3, which will retrieve the first, second, and third largest values.

I’ve also designed this to be user-defined, allowing the user to adjust N as needed. Consider adding data validation or a dropdown menu to provide users with guidance and sanity checks. With this setup, it’s easy to see how the top three items contribute nearly half of all sales:

=SUM(LARGE(sales[Sales],SEQUENCE(D3)))

Conclusion

Using SEQUENCE() in these scenarios not only saves time but also enhances data integrity. The function’s ability to dynamically adjust to changes in your dataset makes it a more robust solution compared to using the fill handle. By leveraging SEQUENCE(), you can streamline your workflows, reduce the risk of errors, and ensure that your data remains consistent and reliable.

For more insights into dynamic array functions, check out my book, Modern Data Analytics in Excel:

Modern Data Analytics in Excel: Using Power Query, Power Pivot and More for Enhanced Data Analytics (O’Reilly)

Modern Data Analytics in Excel: Using Power Query, Power Pivot and More for Enhanced Data Analytics (O’Reilly)” — Stringfest Analytics" src="proxy.php?url=https://stringfestanalytics.com/maxl/embed/#?secret=noGW49JCIG#?secret=1obSjUHH5G" data-secret="1obSjUHH5G" width="500" height="282" frameborder="0" marginwidth="0" marginheight="0" scrolling="no">

The post How to understand the SEQUENCE() function in Excel first appeared on Stringfest Analytics.

Modern Data Analytics in Excel (Excel Virtually Global online conference 2024)

George Mount — Fri, 22 Sep 2023 19:44:36 +0000

I’ll be presenting online at the Excel Virtually Global online conference, this coming Weds, Oct 11 2024 at 1pm Eastern US time on the topic, “Modern Data Analytics in Excel.”

Get all the details and register for the conference here.

This is a free, all-volunteer conference hosted and presented by primarily Microsoft MVPs. The organizers ask that attendees donate to a charity of their choice and let them know in US dollars the amount (how to do this will be reviewed later).

Here are some details about my session, which is heavily influenced by my forthcoming O’Reilly Media book by the same title:

If you haven’t modernized your reporting and analysis in Excel, it may be time! In this free one-hour webinar, we’ll be looking at some new features for powerful analysis:

Make first impressions with the data using Power Query profiling
Conduct ad-hoc analysis with PivotTables and Charts
Pull data fast with dynamic arrays

I hope to see you at my and so many other awesome session! Learn more about the event and get your questions answered from the organizers here.

The post Modern Data Analytics in Excel (Excel Virtually Global online conference 2024) first appeared on Stringfest Analytics.

dynamic array functions - Stringfest Analytics

How to understand the TEXTSPLIT() function in Excel

How to understand the PIVOTBY() function in Excel

How to understand the # spill operator in Excel

Dynamic array spill ranges and the spill operator

Aggregating dynamic array spill ranges

Use case: dynamic drop-down

How to understand the RANDARRAY() function in Excel

RANDARRAY()‘s syntax

Example 1: Sales Forecasting

Example 2: Risk Analysis in Corporate Finance

Example 3: Quality Control in Manufacturing

Conclusion

How to understand the SEQUENCE() function in Excel

SEQUENCE() syntax

Example 1: Inventory Management

Example 1: Monthly Budget Planning

Example 3: Sum of top N items

Conclusion

Modern Data Analytics in Excel (Excel Virtually Global online conference 2024)

Get all the details and register for the conference here.

`RANDARRAY()`‘s syntax

`SEQUENCE()` syntax