sql - Stringfest Analytics https://stringfestanalytics.com Analytics & AI for Modern Excel Thu, 24 Oct 2024 02:37:11 +0000 en-US hourly 1 https://wordpress.org/?v=6.9.4 https://i0.wp.com/stringfestanalytics.com/wp-content/uploads/2020/05/cropped-RGB-SEAL-LOGO-STRINGFEST-01.png?fit=32%2C32&ssl=1 sql - Stringfest Analytics https://stringfestanalytics.com 32 32 98759290 How to use the SWITCH() function in Excel Power Pivot https://stringfestanalytics.com/how-to-use-the-switch-function-in-excel-power-pivot/ Wed, 08 Mar 2023 19:40:22 +0000 https://stringfestanalytics.com/?p=10634 It’s amazing to think that everything we do on our computers (at least for now) is powered by a series of True/False states. And while most of these operations are done behind the scenes, there’s still a fair amount of conditional or Boolean logic found in many data analysis tasks. If/then Excel statements start simple […]

The post How to use the SWITCH() function in Excel Power Pivot first appeared on Stringfest Analytics.

]]>
It’s amazing to think that everything we do on our computers (at least for now) is powered by a series of True/False states. And while most of these operations are done behind the scenes, there’s still a fair amount of conditional or Boolean logic found in many data analysis tasks.

If/then Excel statements start simple enough. But add a few layers and they quickly become a nightmare to decipher:

Four if statements. What the hell is this? meme

 

Many programming languages like SQL or Tableau provide CASE statements to write cleaner conditional logic statements. DAX in Excel Power Pivot provides something similar with the SWITCH() function. Let’s look at some examples.

 

Creating SWITCH() as a calculated column

In general, SWITCH() is used to recode values using a set of conditional statements. Because each row is evaluated individually and these results are not likely to be aggregated, it makes most sense to store SWITCH() results as a calculated column rather than a measure.

A simple SWITCH() to recode values

At its most basic, the SWITCH() function can be used to recode values. For example, we might want to number the segments Consumer, Corporate and Home Office as 1, 2 and 3 respectively. If a match is not found, we’ll recode the value as "Unknown":

orders_recode: = SWITCH(
    [segment],
    "Consumer", "1",
    "Corporate", "2",
    "Home Office", "3",
    "Unknown"
  )

Remember, every value in a data model table must be of the same type, so because we are including an "Unknown" as a string we must also make the other values like 1 and 2 strings.

Using SWITCH() with TRUE() to bin quantities

In the previous example we recoded the segment variable so that Consumer became segment 1, Corporate became segment 2 and so forth. But what if we wanted to create grouped or binned variables, such as the following:

If the order quantity is less than 3, call it a “Small order.” If it’s less than 6, call it a “Medium order.” Otherwise, call it a “Large order.”

This case is a little more complex, as we are evaluating a series of logical tests rather than simply matching specific values. Because we want every conditional statement to be evaluated at once and the best result returned, we will start the statement with TRUE() to force a result:

= SWITCH(
    TRUE(),
    [quantity] < 3, "Small order",
    [quantity] < 6, "Medium order",
    "Large order"
)
Example of SWITCH function taking TRUE

With these columns derived, go ahead and load your data model to a PivotTable. From here, you’re able to slice and count records by these recoded variables. For example, here’s a count of order sizes by segment:

SWITCH columns used in PivotTable

If SWITCH() is helpful, then use it!

SWITCH() is great for a number of use cases. For example, you might want to group two categories into one, recode uncommon values as "Other" and more. Remember, however, that as more columns get created, the memory needed to store your data model will expand. So as always, compute with caution.

How have you used SWITCH() before in Power Pivot? How does it compare to the classic Excel IF() operators? Let me know in the comments.

The post How to use the SWITCH() function in Excel Power Pivot first appeared on Stringfest Analytics.

]]>
10634
How to understand null and missing values in Power Query https://stringfestanalytics.com/how-to-understand-null-and-missing-values-in-power-query/ https://stringfestanalytics.com/how-to-understand-null-and-missing-values-in-power-query/#comments Fri, 19 Aug 2022 14:58:38 +0000 https://stringfestanalytics.com/?p=9732 I’ve written in the past how Power Query enforces good data hygiene practices that aren’t always in Excel. One of those is the presence of a null or dedicated missing value. This has been a real problem for people in the past: Good news… it doesn’t have to be anymore, with Power Query. In this […]

The post How to understand null and missing values in Power Query first appeared on Stringfest Analytics.

]]>
I’ve written in the past how Power Query enforces good data hygiene practices that aren’t always in Excel. One of those is the presence of a null or dedicated missing value. This has been a real problem for people in the past:

https://twitter.com/pk_sullivan/status/899637402?s=20&t=7-3Ylu5JMnw17RK-o_Q5Fw

Good news… it doesn’t have to be anymore, with Power Query. In this post we’ll look at what null means in Power Query, and how to use it properly. Let’s practice on a a modified version of the penguins dataset:

 

Go ahead and load this data into Power Query by clicking inside the table and selecting Data > Get Data > From Table/Range. (Check this post for more detailed steps.)

For this exercise, we will focus on the row ID = 4 (row 5 back in Excel). Take a look at how these values are represented in Excel versus Power Query:

Power Query vs Excel: missing values

The values that were blank in Excel are now null in Power Query. (We’ll circle back to the NA under sex.) This is Power Query’s dedicated way to encode missing values. But what is a missing value?

It does NOT equal zero!

Possibly the most common misconception of a missing value is that it equals zero. It does not! If we knew what the value should be, we would just fill in a zero! By definition, we don’t know what the missing value stands for. To be extra careful this mistake isn’t made, we use a special value null just for missing values. Otherwise, it’s too easy to mix up zeros and nulls.

We don’t know why it’s missing

Did you know the word data comes from the Latin for “what is given?” That means if our data is missing, it wasn’t given. It’s an unknown unknown: we don’t know why it’s missing, nor do we know what the real value is.

This seems like a trivial point, but I remember a past boss asking me why values were missing. And while it’s important to understand how the data was collected, the data itself can’t tell us that. The best we can do is represent and store that missing data the best we can. And that is by using the null value.

We should always represent missing values as null (which starts as a blank in Excel)

Not all coworkers or bosses are going to “get” null. They may prefer a more “human readable” missing value like “Not available” used in the raw spreadsheet data. And, to be fair, Excel in the past was pretty ambiguous about what to use for missing values. But these days, we should always represent missing values as blanks in Excel, because those become null in Power Query.

To understand the importance of being consistent with missing values, let’s take a look at summarizing our nulls:

We can count up nulls with column quality

With missing values, many statistical models break. They may be signs of poor data quality and collection methods. When exploring the data, we should be able to take stock of missing values easily. That’s what null lets us do.

In the Power Query editor, head over to the View tab and check on “Column quality” under Data Preview. Here you will see a menu appear at the top of each column summarizing what percent of values are valid, contain formula errors, and are empty. And to Power Query, empty means null and null alone.

Power Query column quality missing values

Recoding values as null

Those of you who have been been eating your carrots you might have seen the NA in record 4 of sex. It appears that whoever collected this data marked some of the missing values this way, perhaps as “not available?” It makes sense as a value, but not to Power Query. If we keep these values as NA, they won’t be identified as missing which could impact later steps of the analysis.

To fix this, we will right-click on the sex columns and select Replace Values. This should look very similar to a Find and Replace in regular Excel, except this time we’ll replace the selected phrase with the Power Query null:

Recode NA as Null in Power Query

If you look carefully, you’ll even notice that the percentage of missing values in the Column Quality menu goes from 2% to 3%. Want to take a closer look? Get the completed exercise file here:

 

Don’t “miss” out… use nulls!

If you use databases and SQLs, null serves a similar function. It’s one that Excel sorely lacked and has a lot of benefits. Most importantly, it makes things unmistakably clear to data users what values are truly missing, and not just 0 or some placeholder value.

How have you used null in the past for data analysis? How could this have helped you in Excel in the past? Let me know in the comments.

You can learn more about Power Query in my book Modern Data Analytics in Excel:

The post How to understand null and missing values in Power Query first appeared on Stringfest Analytics.

]]>
https://stringfestanalytics.com/how-to-understand-null-and-missing-values-in-power-query/feed/ 1 9732
Free checklist: 30 Days to SQL Data Analyst https://stringfestanalytics.com/30dayssql/ Sun, 23 Jan 2022 19:33:00 +0000 https://stringfestanalytics.com/?p=8600 In a time when it seems like data analysts are faced with more and more things to learn, one skillset has stood the test of time: SQL. Whether you’re ensconsed in the Microsoft BI stack, pledged your analysis allegiance to R or whatever other scenario, it’s near-essential to know about relational databases and how to […]

The post Free checklist: 30 Days to SQL Data Analyst first appeared on Stringfest Analytics.

]]>
In a time when it seems like data analysts are faced with more and more things to learn, one skillset has stood the test of time: SQL.

Whether you’re ensconsed in the Microsoft BI stack, pledged your analysis allegiance to R or whatever other scenario, it’s near-essential to know about relational databases and how to communicate with them — and I don’t see that changing. That’s where SQL comes in.

In fact, learning platform Dataquest found SQL the most common required skill among all job listings:

With this free checklist, you’ll learn the basics of SQL:

  • The role of relational databases in data analytics, and the role of SQL in relational databases
  • How to perform common row and column operations on database tables
  • How to join two or more tables
  • How to work with strings, dates and other data types
  • The basics of data analysis using SQL

With this learning path you’ll read, watch and download your way through some of the best SQL learning content on the web. Be sure to bookmarks the sites you visit as great places to visit if (really, when) you get blocked with SQL.

This 30-day checklist is tailored to data analysts comfortable in tools like Excel, Power BI or Python learn the basics of reading from one or more tables and performing basic data analysis using SQL.

I assume that readers are on a Windows machine, although I suspect many of the tutorials can be completed just fine on a Mac.

Installing SQL on your computer

Speaking of the environment: SQL is not the easiest program to get up and running on your computer. I suggest installing SQLite and SQLiteStudio for (as the name implies) a lightweight solution. From there you can connect to a practice database: I suggest Sean Lahman’s baseball database.

You will learn how to connect to a database in Day 6 of the checklist. Simply download the Lahman database and connect to it using the steps shown in the video.

I recognize that Lahman is not the data you are going to see in the tutorials. It will be OK. Many of the tutorials will have in-browser tools for you to practice, and you can then translate what you’ve learned to your Lahman database.

Get the checklist here 👇

Also, be sure to sign up below for complete access to my analytics resource library with special event invitations, discounts and more:

What questions about SQL do you have? Do you have any favorite resources to share? Let me know in the comments.

The post Free checklist: 30 Days to SQL Data Analyst first appeared on Stringfest Analytics.

]]>
8600
State of Ohio TechCred program now taking applications (Reimbursed workforce upskilling programs) https://stringfestanalytics.com/tech-cred-applications/ Thu, 22 Jul 2021 18:37:00 +0000 https://georgejmount.com/?p=7166 TL;DR: The State of Ohio is offering FREE MONEY for workforce development in fields including data analytics. Learn more about the program and how to apply here. If you spend any time in Ohio it won’t take long before seeing a Stuck in Ohio bumper sticker. It’s apparently meant to be taken ironically, but I’m […]

The post State of Ohio TechCred program now taking applications (Reimbursed workforce upskilling programs) first appeared on Stringfest Analytics.

]]>
TL;DR: The State of Ohio is offering FREE MONEY for workforce development in fields including data analytics. Learn more about the program and how to apply here.

If you spend any time in Ohio it won’t take long before seeing a Stuck in Ohio bumper sticker. It’s apparently meant to be taken ironically, but I’m not sure it usually is. Our state is often the butt of jokes, the stand-in for everything dull about Middle America.

I’ve never felt “stuck” in Ohio, only blessed. Here in Cleveland we have some of the world’s best cultural institutions with a relatively lower cost of living and slower pace of life than traditional tech hubs.

But I will admit, it’s not always been easy to establish a career since I’ve decided to stay here. Ohio is not a traditional tech hub, and it’s not always had the economy conducive to becoming one.

Fortunately, the state is improving, and offering this program with which I look forward to getting involved: the TechCred workforce upskilling initiative. Check out the following video or this guide to learn a bit more about it.

TechCred offers certifications in nearly every professional field imaginable, from CAD drafting to Excel… Well, from what I hear engineers use Excel all the time, so maybe this isn’t the best example. Take a look at the full credential list here and you’ll see it runs the gamut.

These programs are offered by third-party providers, and employers apply and are reimbursed for them on the TechCred website. I am excited to add Stringfest to the list, where I offer a range of opportunities in data analytics such as Excel, statistical analysis, business intelligence, R, and Python.

Learn more about TechCred and apply here.

If you run a data team in Ohio, I’d love to partner with you on the program. If you have questions about participating, you can contact the TechCred office (who are more knowledgeable about logistics than I am). If you have specific questions about the training that I offer, you’re welcome to contact me.

TechCred will be opening new applications for the program beginning March 1, 2022.

The post State of Ohio TechCred program now taking applications (Reimbursed workforce upskilling programs) first appeared on Stringfest Analytics.

]]>
7166
Advancing into Analytics: the reading list https://stringfestanalytics.com/aina-reading-list/ Mon, 19 Jul 2021 11:50:00 +0000 https://stringfestanalytics.com/?p=7809 You can learn a lot about a book by reviewing its bibliography: often, an author’s biases and beliefs will be on full display there. I suppose that’s the same for me: if you list all the books I cited in Advancing into Analytics: From Excel to Python and R, you’ll see that I really like […]

The post Advancing into Analytics: the reading list first appeared on Stringfest Analytics.

]]>
You can learn a lot about a book by reviewing its bibliography: often, an author’s biases and beliefs will be on full display there.

I suppose that’s the same for me: if you list all the books I cited in Advancing into Analytics: From Excel to Python and R, you’ll see that I really like O’Reilly books, and that you need to know more than statistical analysis in Excel, R and Python to be a data analyst.

This PDF guide lists all the books I referred to in Advancing into Analytics, sorted alphabetically by title with the publisher’s name and a brief description of each.

It’s a treasure trove of some of the best titles in analytics. You can start with my book — learn here how to purchase or even read it for free — and work your way through the list by interest, A-Z, or whatever works for you.

However you decide to read mine and these other books, please leave a review to help other data analysts identify the best analytics resources.

Sign up below to access this PDF download. When you do, you’ll get access to my entire resource library.

Sign up below to get the PDF reading list and access to my resource library: 👇

* indicates required





The post Advancing into Analytics: the reading list first appeared on Stringfest Analytics.

]]>
7809
Learning guide: SQL for Excel users, half-day workshop https://stringfestanalytics.com/sql-excel-users-half-day/ Mon, 05 Jul 2021 10:33:00 +0000 https://georgejmount.com/?p=6100 I like to call SQL the cool old uncle of analytics because, no matter what you are trying to do with your data, SQL has probably done it before. For as impressive as the new Microsoft stack is, it’s important to remember that Microsoft didn’t invent things like a left outer join or one-to-one relationship. […]

The post Learning guide: SQL for Excel users, half-day workshop first appeared on Stringfest Analytics.

]]>
I like to call SQL the cool old uncle of analytics because, no matter what you are trying to do with your data, SQL has probably done it before.

For as impressive as the new Microsoft stack is, it’s important to remember that Microsoft didn’t invent things like a left outer join or one-to-one relationship. These concepts are old relational data modeling, and SQL has classically been the language to interact with relational database management systems.

From this in Excel to that same thing in SQL

We tend to do the same common data tasks regardless of program used: it could be sorting, filtering, or creating a calculated field. You may perform these all the time in Excel, and you will similarly do them all the time in SQL.

That’s why in this workshop I “pivot” from common Excel tasks into their corresponding SQL tasks. One of the best ways to learn something new is to relate it to what you already know, and this approach really accelerates what can be learned in a short amount of time. This workshop covers the major read operations in SQL. By the end of the workshop, users are able to query information from multiple tables in a database.

SQL is essential knowledge for anyone looking to improve their data chops. This workshop accelerates that knowledge acqusition by relating SQL skills right back to Excel skills.

SQL for Excel users half-day workshop

Lesson 1: SQL and Excel for data analytics

Objective: Student can visually inspect a database with SQLiteStudio

Description:

  • What is a database and when would you use it?
  • Exploring the SQLiteStudio interface

Exercises: Practice exploring a second database

Assets needed: Baseball database

Time: 30 minutes

Lesson 2: From Excel tables to SQL SELECT

Objective: Student select some or all fields from various tables of a database

Description:

  • The grammar of SQL
  • Querying fields
  • Functions
  • Aliases

Exercises: Drills

Assets needed: Baseball database

Time: 35 minutes

Lesson 3: From Excel sort & filter to SQL WHERE and ORDER BY

Objective: Student can sort and filter a query’s results

Description:

  • Sorting and filtering
  • How NULLs work
  • Limiting a query’s results

Exercises: Drills

Assets needed: Baseball database

Time: 30 minutes

Lesson 4: From Excel PivotTable to SQL aggregation

Objective: Student can aggregate and summarize query results

Description:

  • Counting records
  • Counting and listing distinct records
  • Grouping and aggregating records
  • Filtering aggregated records
  • Using all read clauses

Exercises: Drills

Assets needed: Baseball database

Time: 35 minutes

Lesson 5: From VLOOKUP() to JOIN

Objective: Student can join multiple tables from the same database

Description:

  • Relational database modeling
  • Left outer joins
  • Inner joins
  • Set operators

Exercises: Drills

Assets needed: Baseball database

Time: 45 minutes

Lesson 6: From “That’s hard in Excel” to “That’s easy in SQL!”

Objective: Student can conduct intermediate data analysis

Description:

  • Subqueries
  • Case operators
  • Common table expressions

Exercises: Drills

Assets needed: Baseball database

Time: 45 minutes

This learning guide is a part of my resource library. For exclusive free access, subscribe below.

The post Learning guide: SQL for Excel users, half-day workshop first appeared on Stringfest Analytics.

]]>
6100
Learning guide: Introduction to SQL, one-day workshop https://stringfestanalytics.com/intro-to-sql-one-day/ https://stringfestanalytics.com/intro-to-sql-one-day/#comments Fri, 21 Aug 2020 19:16:00 +0000 https://georgejmount.com/?p=5934 I like to call Structured Query Language, or SQL, the “cool old uncle of data analysis,” because SQL has been around the block, and can still keep up. While R has been around since 2000, and Python since 1991, SQL dates to the early 1970s. Don’t let the vocal, relatively recent adoption of these other […]

The post Learning guide: Introduction to SQL, one-day workshop first appeared on Stringfest Analytics.

]]>
I like to call Structured Query Language, or SQL, the “cool old uncle of data analysis,” because SQL has been around the block, and can still keep up.

While R has been around since 2000, and Python since 1991, SQL dates to the early 1970s. Don’t let the vocal, relatively recent adoption of these other languages distract you: SQL is a vital part of the data analysts’s stack of tools. (You also should consider these tools not as substitutes, but as complementary slices of the data analytics stack.)

The below graphic comes from a DataQuest blog post entitled “Want a Job in Data? Learn SQL.” The author mined 25,000 jobs on Indeed containing “data” as a search term.

DataQuest’s results of mining 25,000 data jobs on Indeed (source)

Before Python, R or any other skill came SQL. So, what is this language, and what do data analysts need to know?

SQL is the language used to manage data held in a relational database. These tasks are summarized by the famous “CRUD” acronym: create, read, update, and delete. Generally, data analysts are responsible for the second activity, or reading information from a database.

That is the focus of this half-day workshop: to give analysts a command over the most common tasks for reading information out of a database for data analysis.

SQL comes in many flavors, from Access to PostgreSQL. For these elementary reading commands, the differences are negligible. I have designed to conduct the class in SQLite, which has the benefit of being fast, flexible and open-source. We will use the SQLiteStudio environment to move from visual interaction with the database to coding.

https://georgejmount.com/wp-content/uploads/2020/05/introductory-sql-one-day-workshop-1.pdf

Lesson 1: Databases and data analytics

Objective: Student can navigate the SQLiteStudio environment to visually inspect tables in a database

Description:

  • The basics of working with a database
  • Database parts of speech
  • The basics of working with SQLite + SQLiteStudio

Exercises: Retrieve dimensions and database types of a table

Assets needed: Home prices dataset

Time: 50 minutes

Lesson 2: Coding in SQL

Objective: Student can write style-compliant SQL scripts to retrieve tables from a field

Description:

  • Styling in SQL
  • Working with scripts: opening, saving, executing
  • Selecting all fields from a table
  • Selecting some fields from a table
  • Aliasing a field
  • Arithmetic operations on a field
  • String operations on a field
  • Limiting query results

Exercises: Drills

Assets needed: Home prices dataset

Time: 50 minutes

Lesson 3: Sorting & filtering

Objective: Student can sort and filter the results of a query

Description:

  • Conditional operators
  • Logical operators
  • Filtering one or more fields
  • Sorting one or more fields

Exercises: Drills

Assets needed: Home prices dataset

Lesson 4: Aggregating

Objective: Student can group and aggregate the results of a SQL query

Description:

  • Counting and listing distinct records
  • Understanding NULL
  • Grouping and field arithmetic
  • Aggregating and aliasing
  • Filtering aggregation results

Exercises: Drills

Assets needed: Home prices dataset

Time: 50 minutes

Lesson 5: SQL and table joins

Objective: Student can join two or more tables from the same database

Description:

  • How relational databases work
  • JOINs and NULLs
  • INNER JOIN
  • LEFT OUTER JOIN

Exercises: Drills

Assets needed: Home prices dataset

Time: 50 minutes

Lesson 6: Intermediate joins

Objective: Student can use less common techniques to join two or more tables from the same database

Description:

  • RIGHT OUTER JOIN
  • FULL OUTER JOIN

Set operators

Exercises: Drills

Assets needed: Home prices dataset

Time: 50 minutes

Lesson 7: SQL for data analysis

Objective: Student can conduct basic data exploration and analysis in SQL

Description:

  • CASE expressions
  • Subqueries
  • Common table expressions

Exercises: Drills

Assets needed: Home prices dataset

Time: 50 minutes

Lesson 8: SQL for data analysis, continued

Objective: Student can conduct intermediate data exploration and analysis in SQL

Description:

  • Window functions
  • Correlations
  • Regression coefficients

Exercises: Drills

Assets needed: Home prices dataset

Time: 50 minutes

While the next wave in data analytics features tools like Power Query that allow users to accomplish many tasks with no coding, there is something to be said for learning data analysis the old-fashioned way: through SQL commands.

After all, it’s a “structured query language,” and I find that learning SQL tightens and structures your thoughts around data analysis.

This download is part of my resource library. For exclusive free access, subscribe below.

The post Learning guide: Introduction to SQL, one-day workshop first appeared on Stringfest Analytics.

]]>
https://stringfestanalytics.com/intro-to-sql-one-day/feed/ 2 5934
My upcoming O’Reilly business analytics online training courses https://stringfestanalytics.com/oreilly-training-announcement/ Fri, 20 Sep 2019 18:16:44 +0000 https://georgejmount.com/?p=5435 Next week is my first installment of live training offered through O’Reilly Media’s online platform. I’m so excited to partner with this outstanding organization to deliver workshops on business analytics. My first two courses will offer a unique perspective on SQL and R. For each, I will start with a data concept as demonstrated in Excel, […]

The post My upcoming O’Reilly business analytics online training courses first appeared on Stringfest Analytics.

]]>

Next week is my first installment of live training offered through O’Reilly Media’s online platform. I’m so excited to partner with this outstanding organization to deliver workshops on business analytics.

My first two courses will offer a unique perspective on SQL and R. For each, I will start with a data concept as demonstrated in Excel, then move into its equivalent in each of these tools.

Through this method, I hope to show how much spreadsheet users already instinctively understand about data — that rather than “giving up on bad habits,” working with these tools from an existing “mental model” of spreadsheets is a huge advantage.

Below are the course descriptions. Read the full schedule at the link.

  • SQL-Powered Excel for Business Analytics: Structured Query Language (SQL) is the primary language used for accessing and managing data in a relational database system. Analysts, project managers, and other professionals want to make informed decisions using large datasets, but often have to request IT staff to perform these data retrievals. By relating the foundations of SQL to everyday Excel tasks, professionals will be empowered to perform their own data analysis and exploration on databases.
  • R-Powered Excel for Business Analytics: R is an open-source programming language widely used for statistical and data analysis. Project managers, analysts, and other professionals want to make informed decisions using data, but may run into limitations using Excel for advanced, reproducible analysis. By relating the foundations of R to everyday Excel tasks, professionals will be empowered to perform data analysis and exploration in R.

I hope to see you there — O’Reilly’s platform is required to register, but there’s so much in store when you do. If you can’t make this week’s workshops, I will be offering them again in November.

Learn more about my approach to data education here.

The post My upcoming O’Reilly business analytics online training courses first appeared on Stringfest Analytics.

]]>
5435
Review: Excel TV’s Data Science with Power BI and R https://stringfestanalytics.com/exceltvpowerbircoursereview/ Fri, 12 Oct 2018 15:29:11 +0000 https://georgejmount.com/?p=4838 I have had a long history with Excel TV and, like Excel TV (and Excel itself) the channel has changed over the years. Gone are their regular live-streaming interviews with leading Excel authorities. Excel TV’s main product is now online courses. These are pre-recorded classes taught by the same caliber of talent as that of […]

The post Review: Excel TV’s Data Science with Power BI and R first appeared on Stringfest Analytics.

]]>
I have had a long history with Excel TV and, like Excel TV (and Excel itself) the channel has changed over the years.

Gone are their regular live-streaming interviews with leading Excel authorities. Excel TV’s main product is now online courses. These are pre-recorded classes taught by the same caliber of talent as that of the interviews.

Excel TV’s first course on dashboards remains among my favorite and I was thrilled to learn that in its latest course Excel TV turns its attention to the world of Big Data and the Microsoft BI ecosystem writ large.

The course, entitled “Data Science with R and Power BI,” is a well-researched course on combining these applications to deliver insight from data.

Course basics

Course instructor Ryan Wade.

This course is taught by Ryan Wade who has over 20 years of experience in business intelligence. It is delivered on Excel TV’s easy-to-use course platform; if you are a student of its other courses, it is easy to navigate between them.

Lectures are delivered over screenshot video with crisp audio and visuals and include a link to download all source code. Ideally this would include source data as well to reproduce the results, but I have been able to modify the supplied code to apply to my own data.

Understanding R’s place in Microsoft’s World

Probably the biggest strength of this course is how it clearly positions R as a tool to use within the Microsoft ecosystem. This pairing should serve as no surprise to astute Microsoft watchers as Microsoft has for years maintained its own distribution of R with Microsoft R Open and for some time has made R visuals available in Power BI.

Ryan makes full use of the Microsoft BI stack from using its R distribution to using the R Tools for Visual Studio development environment to using SQL Server to store data to (obviously) Power BI to present it. There is a lot going on between these various applications and an outside primer on SQL Server and Power BI might be useful.

Pain points defined and accounted for

Every application has its strengths and weaknesses and it appears that by incorporating R so handily into its BI stack Microsoft has tacitly noted some places where R can fill in some gaps.

Ryan does a great job at explicitly identifying and providing examples of these “pain points.” For example, much of the course focuses on mining unstructured text data from the web and on using regular expressions to clean text, weaker points in Power BI.

The course also includes solid introductions to the popular ggplot2 package for data visualization and the dplyr package for data manipulation.

Of course R itself is not without its problems one of which is memory management and capacity. For this Ryan shows how to use Microsoft’s SQL server to overcome this pain point and soon enough you will have integrated R with SQL Server, Power BI and Visual Studio on your computer. This is a very sensible and well-constructed BI stack.

Meeting in the middle

 

As alluded to before, this course’s curriculum lies at getting you started in data science at the intersection of R and Power BI. I illustrate this with the above Venn Diagram. What I hope to show is that this course is not best suited as an introduction to R or Power BI but rather an introduction to using these tools together (plus SQL Server, I would add). While the course does go into some basic data types in R, novices might have difficulty comprehending the videos and code. This holds true to a lesser extent for Power BI.

At the risk of a shameless plug, for a more comprehensive introduction to R for the Excel user, I suggest (wait for it) my own course, “R Explained for Excel Users.” Here you will get a more brass-tacks introduction to R which will leave you in a better position to tackle more advanced courses such as Excel TV’s.

On the whole, I recommend Excel TV’s Data Science with R and Power BI. The ability to construct data science application using a combination of applications such as in this course is quite powerful and impressive, and the course does a nice job at tailoring a curriculum based on this specific use case.

Ready to get started? Learn more about the class here.

The post Review: Excel TV’s Data Science with Power BI and R first appeared on Stringfest Analytics.

]]>
4838