null - Stringfest Analytics https://stringfestanalytics.com Analytics & AI for Modern Excel Thu, 24 Oct 2024 02:37:11 +0000 en-US hourly 1 https://wordpress.org/?v=6.9.4 https://i0.wp.com/stringfestanalytics.com/wp-content/uploads/2020/05/cropped-RGB-SEAL-LOGO-STRINGFEST-01.png?fit=32%2C32&ssl=1 null - Stringfest Analytics https://stringfestanalytics.com 32 32 98759290 How to replace blank values with nulls in Excel Power Query https://stringfestanalytics.com/how-to-replace-blank-values-with-nulls-in-excel-power-query/ Sat, 18 Nov 2023 21:04:39 +0000 https://stringfestanalytics.com/?p=12048 In Excel’s Power Query, blank values are treated as nulls to signify the absence of data or to indicate an unknown, missing, or inapplicable value. This approach is consistent with database and programming standards, enhancing the precision of data processing, analysis, and integration with various data systems. By default, Power Query looks for empty values […]

The post How to replace blank values with nulls in Excel Power Query first appeared on Stringfest Analytics.

]]>
In Excel’s Power Query, blank values are treated as nulls to signify the absence of data or to indicate an unknown, missing, or inapplicable value. This approach is consistent with database and programming standards, enhancing the precision of data processing, analysis, and integration with various data systems.

By default, Power Query looks for empty values and codes them as nulls. However, sometimes values that look blank, actually aren’t, and Power Query has an issue making this conversion. Let’s take a look using this penguins csv file found on GitHub:

https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv

To import the dataset into Power Query, navigate to Data > Get Data > From File > From Text/CSV, and enter the GitHub URL.

You’ll see a data preview. In the fourth row, most missing values are correctly marked as null, except for the ‘sex’ column, which inexplicably shows blanks. There are a few more blank values visible in the Power Query preview:

This issue arises because the dataset uses empty strings instead of true blanks, which Power Query interprets as valid, zero-length string data rather than a data absence.

The use of capital letters in the sex column is also somewhat jarring compared to other columns, suggesting a different coding approach. To rectify this, let’s replace these values.

Select the sex column, then go to Replace Values on the Home tab of the ribbon:

Leave the first field blank and set the second field to null:

Replace missing values with null

Upon confirming, these values will display in italics, distinguishing them as a reserved keyword:

Blank values replaced with nulls

For consistency, I’ll also convert this column to proper case. To do that, right click on the column and select Transform > Capitalize Each Word:

Convert column to proper case

Once done, close and load the results back to Power Query:

Penguins replace missing values with nulls  finished file

To the untrained eye, our modifications might seem trivial, but they significantly enhance data handling in Power Query.

Did you manage to follow these steps? Compare your file with mine to see the difference.

If you have any questions about working with null values, replacing values, or other Power Query features, feel free to ask in the comments.

The post How to replace blank values with nulls in Excel Power Query first appeared on Stringfest Analytics.

]]>
12048
How to understand null and missing values in Power Query https://stringfestanalytics.com/how-to-understand-null-and-missing-values-in-power-query/ https://stringfestanalytics.com/how-to-understand-null-and-missing-values-in-power-query/#comments Fri, 19 Aug 2022 14:58:38 +0000 https://stringfestanalytics.com/?p=9732 I’ve written in the past how Power Query enforces good data hygiene practices that aren’t always in Excel. One of those is the presence of a null or dedicated missing value. This has been a real problem for people in the past: Good news… it doesn’t have to be anymore, with Power Query. In this […]

The post How to understand null and missing values in Power Query first appeared on Stringfest Analytics.

]]>
I’ve written in the past how Power Query enforces good data hygiene practices that aren’t always in Excel. One of those is the presence of a null or dedicated missing value. This has been a real problem for people in the past:

https://twitter.com/pk_sullivan/status/899637402?s=20&t=7-3Ylu5JMnw17RK-o_Q5Fw

Good news… it doesn’t have to be anymore, with Power Query. In this post we’ll look at what null means in Power Query, and how to use it properly. Let’s practice on a a modified version of the penguins dataset:

 

Go ahead and load this data into Power Query by clicking inside the table and selecting Data > Get Data > From Table/Range. (Check this post for more detailed steps.)

For this exercise, we will focus on the row ID = 4 (row 5 back in Excel). Take a look at how these values are represented in Excel versus Power Query:

Power Query vs Excel: missing values

The values that were blank in Excel are now null in Power Query. (We’ll circle back to the NA under sex.) This is Power Query’s dedicated way to encode missing values. But what is a missing value?

It does NOT equal zero!

Possibly the most common misconception of a missing value is that it equals zero. It does not! If we knew what the value should be, we would just fill in a zero! By definition, we don’t know what the missing value stands for. To be extra careful this mistake isn’t made, we use a special value null just for missing values. Otherwise, it’s too easy to mix up zeros and nulls.

We don’t know why it’s missing

Did you know the word data comes from the Latin for “what is given?” That means if our data is missing, it wasn’t given. It’s an unknown unknown: we don’t know why it’s missing, nor do we know what the real value is.

This seems like a trivial point, but I remember a past boss asking me why values were missing. And while it’s important to understand how the data was collected, the data itself can’t tell us that. The best we can do is represent and store that missing data the best we can. And that is by using the null value.

We should always represent missing values as null (which starts as a blank in Excel)

Not all coworkers or bosses are going to “get” null. They may prefer a more “human readable” missing value like “Not available” used in the raw spreadsheet data. And, to be fair, Excel in the past was pretty ambiguous about what to use for missing values. But these days, we should always represent missing values as blanks in Excel, because those become null in Power Query.

To understand the importance of being consistent with missing values, let’s take a look at summarizing our nulls:

We can count up nulls with column quality

With missing values, many statistical models break. They may be signs of poor data quality and collection methods. When exploring the data, we should be able to take stock of missing values easily. That’s what null lets us do.

In the Power Query editor, head over to the View tab and check on “Column quality” under Data Preview. Here you will see a menu appear at the top of each column summarizing what percent of values are valid, contain formula errors, and are empty. And to Power Query, empty means null and null alone.

Power Query column quality missing values

Recoding values as null

Those of you who have been been eating your carrots you might have seen the NA in record 4 of sex. It appears that whoever collected this data marked some of the missing values this way, perhaps as “not available?” It makes sense as a value, but not to Power Query. If we keep these values as NA, they won’t be identified as missing which could impact later steps of the analysis.

To fix this, we will right-click on the sex columns and select Replace Values. This should look very similar to a Find and Replace in regular Excel, except this time we’ll replace the selected phrase with the Power Query null:

Recode NA as Null in Power Query

If you look carefully, you’ll even notice that the percentage of missing values in the Column Quality menu goes from 2% to 3%. Want to take a closer look? Get the completed exercise file here:

 

Don’t “miss” out… use nulls!

If you use databases and SQLs, null serves a similar function. It’s one that Excel sorely lacked and has a lot of benefits. Most importantly, it makes things unmistakably clear to data users what values are truly missing, and not just 0 or some placeholder value.

How have you used null in the past for data analysis? How could this have helped you in Excel in the past? Let me know in the comments.

You can learn more about Power Query in my book Modern Data Analytics in Excel:

The post How to understand null and missing values in Power Query first appeared on Stringfest Analytics.

]]>
https://stringfestanalytics.com/how-to-understand-null-and-missing-values-in-power-query/feed/ 1 9732