databases - Stringfest Analytics https://stringfestanalytics.com Analytics & AI for Modern Excel Tue, 07 Jun 2022 13:30:12 +0000 en-US hourly 1 https://wordpress.org/?v=6.9.4 https://i0.wp.com/stringfestanalytics.com/wp-content/uploads/2020/05/cropped-RGB-SEAL-LOGO-STRINGFEST-01.png?fit=32%2C32&ssl=1 databases - Stringfest Analytics https://stringfestanalytics.com 32 32 98759290 Learning guide: Python + SQLAlchemy, half-day workshop https://stringfestanalytics.com/learning-guide-python-sqlalchemy/ Fri, 22 Oct 2021 07:46:00 +0000 https://georgejmount.com/?p=6318 Analytics tools are best thought of as a “stack,” so that each slice is seen in context with the others. The most exciting analytics products span multiple slices, combining the advantages of each. Using SQLAlchemy inside Python is such a case. This open-source tool allows you to interact with relational databases from inside Python. As […]

The post Learning guide: Python + SQLAlchemy, half-day workshop first appeared on Stringfest Analytics.

]]>
Analytics tools are best thought of as a “stack,” so that each slice is seen in context with the others. The most exciting analytics products span multiple slices, combining the advantages of each.

Using SQLAlchemy inside Python is such a case. This open-source tool allows you to interact with relational databases from inside Python. As such, SQLAlchemy crosses two slices of the analytics stack: databases and programming languages.

As the name implies, SQLAlchemy is based on the tenets of SQL, the traditional language for interacting with relational databases. The “Alchemy” is that by adding Python to the mix, magic happens.

Not only is it incredibly useful to pull information from databases into Python objects, the Python language adds versatility to traditional SQL querying.

Take a look at the below half-day workshop as a way to get started with this popular Python package. By the end of the workshop, learners will be able to build an end-to-end data pipeline, from a relational database for data storage and integrity to a pandas DataFrame for data analysis and visualization.

Lesson 1: Combining the powers of SQL & Python

Objective: Student can compare and contrast the uses of Python and SQL for data analysis

Description:

  • Databases versus scripting languages
  • Declarative versus procedural languages
  • Object-oriented programming

Time: 20 minutes

Assets needed: none

Lesson 2: Connecting to a database and returning results

Objective: Student can connect to, retrieve from, and close a database

Description:

  • Installing and loading the packages
  • Connecting to a database
  • Retrieving keys and items
  • Retrieving records
  • Closing the connection          

Time: 45 minutes

Assets needed: Baseball database

Lesson 3: Sorting & filtering results

Objective: Student can sort and filter query results

Description:

  • Conditional logic
  • Filtering results
  • Ordering results

Time: 35 minutes

Assets needed: Baseball database

Lesson 4: Student can group by and create calculations from query results

Objective: Student can create a data manipulation pipeline

  • Grouping results
  • Creating aggregated calculations
  • Creating calculated fields

Time: 35 minutes

Assets needed: Baseball database

Lesson 5: Joining tables

Objective: Student can join two or more tables

  • Inner joins
  • Left outer joins

Time: 35 minutes

Assets needed: Baseball database

Lesson 6: SQLAlchemy & pandas

Objective: Student can load query results into pandas DataFrames and analyze

  • Loading results to a DataFrame
  • Summarizing and analyzing in pandas

Time: 30 minutes

Assets needed: Baseball database

Lesson 7: Capstone: creating a data pipeline

Objective: Student can create an end-to-end data pipeline using Python, SQLAlchemy and pandas

  • Connect to, retrieve and close a database
  • Load results to DataFrame
  • Analyze, inspect and interpret results

Time: 35 minutes

Assets needed: Flights database

The post Learning guide: Python + SQLAlchemy, half-day workshop first appeared on Stringfest Analytics.

]]>
6318
Learning guide: Introduction to Power Pivot in Excel, one-day workshop https://stringfestanalytics.com/learning-guide-power-pivot-one-day-workshop/ Thu, 25 Feb 2021 10:00:00 +0000 https://georgejmount.com/?p=6071 Ah, the “Franken-Table.” You’ve probably created one: a worksheet with umpteen VLOOKUP()‘s referencing various other worksheets and perhaps (gasp!) even other workbooks. It’s ugly and inefficient: you may not even be able to build the report you were hoping to without the workbook crashing. Fortunately Microsoft has developed a Frankentable-killer: Power Pivot. This is a […]

The post Learning guide: Introduction to Power Pivot in Excel, one-day workshop first appeared on Stringfest Analytics.

]]>
Ah, the “Franken-Table.” You’ve probably created one: a worksheet with umpteen VLOOKUP()‘s referencing various other worksheets and perhaps (gasp!) even other workbooks.

It’s ugly and inefficient: you may not even be able to build the report you were hoping to without the workbook crashing.

Fortunately Microsoft has developed a Frankentable-killer: Power Pivot. This is a relational modeling technology that you can use from right within Excel to define relationships between datasets to build efficient, dynamic reports.

For as awesome as this technology is, it’s important to understand that the relational data model was not invented with Power Pivot: it’s a classic field of computer science and provided scaffolding for the Age of Information. That’s why I take a decent amount of time in this workshop to explain the origins and theories of relational models.

Power Pivot is just one plank of the new Microsoft BI platform, so I also take some time to help learners contextualize how these tools work together.

Get your copy of the guide below. You are welcome to use this learning guide at your school or workplace to guide workshops or for however you can benefit from it. Access to the learning guide is not access to a workshop itself. You can use this learning guide to conduct a workshop at your organization. Consider this guide more like a recipe or blueprint.

Download your free copy of this guide below, then check out the course outline.

Learning guide: Power Pivot for Excel

Lesson 1: Power Pivot and Modern Excel

Objective: Student can contextual Power Pivot in the Microsoft BI stack

Description:

  • Modern Excel and the “Power Platform”
  • Power Query, Pivot, View: Oh my!
  • DAX & M, Query & Pivot

Assets needed: None

Time: 25 minutes

Lesson 2: How relational models work

Objective: Student can evaluate relational models for best practices in modeling

Description:

  • A relational model of data
  • Database normalization
  • From lookups from joins
  • Table relationships & cardinality
  • Filter directions
  • Hierarchies

Exercises: Inspect a data model

Assets needed: Employee database

Time: 90 minutes

Lesson 3: From PivotTables to “Power” PivotTables

Objective: Student can perform row-wise data cleaning

Description:

  • Importing data & creating models
  • String and date functions
  • Conditional logic
  • Implicit & explicit measures
  • Creating PivotTable reports

Exercises: Build a data model & PivotTable report

Assets needed:  Employee database

Time: 120 minutes

Lesson 4: Up and running with DAX

Objective: Student can write and modify measures with DAX

Description:

  • Counting and mathematics functions
  • Conditional logic functions
  • Filter functions
  • Iterator functions
  • Sorting & aggregation

Exercises: Drills

Assets needed: Employee database

Time: 120 minutes

Lesson 5: Intermediate DAX

Objective: Student can perform intermediate data analysis operations with DAX

Description:

  • Time intelligence
  • Dependent measures
  • DAX Studio
  • Creating Power View reports

Exercises: Drills

Assets needed: E-commerce database

Time: 90 minutes

This learning guide is part of my resource library. For exclusive free access, sign up below.

The post Learning guide: Introduction to Power Pivot in Excel, one-day workshop first appeared on Stringfest Analytics.

]]>
6071
What is the “data analytics stack?” https://stringfestanalytics.com/data-analytics-stack/ Sat, 08 Aug 2020 11:30:00 +0000 https://georgejmount.com/?p=5973 A poor craftsman blames his tools. But if all you have is a hammer, everything looks like a nail. It’s common for web developers or database adminstrators to refer to their “stack” of tools used to do the job, but I’ve never heard this moniker used for data analysts. So it got me thinking, what […]

The post What is the “data analytics stack?” first appeared on Stringfest Analytics.

]]>

A poor craftsman blames his tools. But if all you have is a hammer, everything looks like a nail.

It’s common for web developers or database adminstrators to refer to their “stack” of tools used to do the job, but I’ve never heard this moniker used for data analysts. So it got me thinking, what is the data analytics stack?

Data analysts make range of a wide variety of software, for a wide variety of tasks. When a solution comes up short, the focus ought not to be on “blaming” tools for their shortcomings, but on possessing alternatives and choosing a better one (or ones) for the given scenario.

That is, it’s better to think of these tools as “slices” of the same stack to be used concurrently, rather than as misfits to be entirely discarded.

To imagine what the analytics stack might look like, I used the below data products Venn diagram, placing the logos of popular data analytics tools in their respective segments.

http://www.datacommunitydc.org/blog/2013/09/the-data-products-venn-diagram 

After stepping back from my marked-up Venn diagram, four categories or “slices” of the stack appeared to me. Let’s get to them below; but first, a caveat.

Staying vendor agnostic

Some vendors have packaged their own “stack” of tools for data analysis; for example, Microsoft’s Power Platform or Google Data Studio. I am keeping my overview of the stack vendor-agnostic.

While you may learn that some slices fit better together, it’s better to start with the context of what category to tool to use, when, rather than what vendor. I will, however, provide a brief industry landscape of these products below, and suggestions for future learning.

Spreadsheets

Reports of the death of spreadsheets are greatly exaggerated. For their ease of use and flexibility, spreadsheets are an excellent choice for back-of-the-envelope calculations and prototyping.

However, spreadsheets do have their limitations. They can lack data integrity, storage and delivery functionalities. These limitations are often what cause pundits to give spreadsheets their last rites. But this misses the point of “the stack” entirely — those tasks aren’t the proper context for spreadsheets in the first place.

The major spreadsheet applications are Microsoft Excel and Google Sheets. I won’t tell you outright my preference, but you may find out if you follow me on social media for long.

Databases

Databases are a relatively ancient technology in the analytics space, but show no signs of slowing. They offer more reliable and extensible methods for data storage and integrity, but the actual analysis easily done directly inside databases is limited.

Structured query language, or SQL, is the language used to interact with relational database management systems. While many SQL platforms exist, the types of read-only operations necessary for most data analysts won’t change across them.

For data analysts new to SQL, I suggest SQLite or Microsoft Access as lightweight tools for learning SQL.

Business intelligence & dashboard platforms

This is a broad swathe of tools and it’s likely the most ambiguous slice of the stack, but here I mean enterprise tools that allow users to gather, model and display data.

Data warehousing tools like MicroStrategy and SAP BusinessObjects straddle the line here, since they are tools designed for self-service data gathering and analysis. But these often have limited visualization and iteractive report-building included.

That’s where tools like Power BI, Tableau and Looker come in. These tools allow users to build data models, dashboards and reports with minimal coding. Importantly, they make it easy to disseminate and update information across an organization.

However, these tools tend to be inflexible in the way they handle and visualize data. They can also be expensive, with single-user annual licenses running several hundred or even thousands of dollars.

Advancing into Analytics Cover Image

Want to learn how to enhance your analytics skills with Python and R? Check out my book Advancing into Analytics.

Data programming languages

While many vendor tools are moving to a place where coding is not as essential to the data workflow, I still think it’s a good idea to learn programming. This helps sharpen understanding of how data processing works, and gives users fuller control of their workflow over using a graphical user interface (GUI).

For data analytics, two open-source programming language are good fits: R and Python. Each include a dizzying universe of free packages made to help with everything from social media automation to geospatial analysis. Learning these tools also opens the door to advanced analytics and data science.

However, this slice could have the steepest learning curve in the stack, and many analysts may struggle to see the benefit of learning to code, when they can do most of what they need easily enough from a GUI.

Not better or worse, just different

Seen in the light of a “stack,” it makes little sense to compare any of these slices, or claim one as inferior than the other. They are meant to be complementary.

Data analysts often wonder which tool they should focus on learning or become the expert in. I would suggest not becoming the expert in any single one, but in learning each slice of the stack well enough to contextualize and choose between them.

Entering the stack

Learning one data tool is daunting. Learning a whole “stack” of them can seem impossible. However, this cross-training can expedite growth, as connections are made across platforms in how to use data effectively.

If it all seems like too much, or you would like specifics in pivoting knowledge in one slice of the stack to another, check out my data education resource library by subscribing below.

What data tools do you use? How do you fit together? Other thoughts on the idea of an “analytics stack?” Let’s discuss in the comments.

The post What is the “data analytics stack?” first appeared on Stringfest Analytics.

]]>
5973