elakew/book_sales_analysis

Book Sales - Publisher Revenue

In this project, we uncover high-level and granular insights to inform the strategic decisions of publishing stakeholders, namely agents, financiers, and strategists. This exploratory data analysis (EDA), performed in SQLite and Python, focuses on the primary sources and topical trends of publisher revenue.

Key Insights

  • Top publishers, tight market: The five leading publishers (Penguin Group and Random House, followed by Amazon Digital, Hachette, and HarperCollins) are separated by narrow margins.
  • Genre performance: Fiction leads non-fiction by a wide margin (roughly 10x). Children's books account for a tiny share of publisher revenue but at times post the highest revenue percentages (60-62%).
  • Publishing year peaks: Books published between approximately 2009 and 2012 yield the highest volume of units sold but relatively low and erratic revenue percentages (~28-53%).
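The insights above come from grouping and summing publisher revenue. The following is a minimal sketch of those aggregations using Python's built-in `sqlite3` module. It assumes a table named `books` with snake_case columns (`publisher`, `genre`, `publishing_year`, `units_sold`, `gross_sales`, `publisher_revenue`); the table, column names, and sample rows are hypothetical, not the project's actual data. The column list below does not name a publisher field, so `publisher` is itself an assumption. The sketch also assumes "revenue percentage" means publisher revenue as a share of gross sales.

```python
import sqlite3

# Hypothetical sample rows; real column names and values will differ.
# (publisher, genre, publishing_year, units_sold, gross_sales, publisher_revenue)
ROWS = [
    ("Penguin Group", "fiction", 2010, 900, 12000.0, 7200.0),
    ("Random House", "fiction", 2011, 800, 11000.0, 6000.0),
    ("HarperCollins", "nonfiction", 2005, 300, 4000.0, 2000.0),
    ("Hachette", "children", 2012, 50, 500.0, 250.0),
]


def build_db(rows):
    """Load rows into an in-memory SQLite table named `books` (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE books (
               publisher TEXT, genre TEXT, publishing_year INTEGER,
               units_sold INTEGER, gross_sales REAL, publisher_revenue REAL)"""
    )
    conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def top_publishers(conn, limit=5):
    """Publishers ranked by total publisher revenue."""
    return conn.execute(
        """SELECT publisher, SUM(publisher_revenue) AS revenue
           FROM books
           GROUP BY publisher
           ORDER BY revenue DESC
           LIMIT ?""",
        (limit,),
    ).fetchall()


def revenue_by_genre(conn):
    """Total publisher revenue per genre, plus revenue as a % of gross sales (assumed definition)."""
    return conn.execute(
        """SELECT genre,
                  SUM(publisher_revenue) AS revenue,
                  ROUND(100.0 * SUM(publisher_revenue) / SUM(gross_sales), 1) AS revenue_pct
           FROM books
           GROUP BY genre
           ORDER BY revenue DESC"""
    ).fetchall()


def units_by_year(conn):
    """Units sold per publishing year, highest first (null years excluded)."""
    return conn.execute(
        """SELECT publishing_year, SUM(units_sold) AS units
           FROM books
           WHERE publishing_year IS NOT NULL
           GROUP BY publishing_year
           ORDER BY units DESC"""
    ).fetchall()


conn = build_db(ROWS)
print(top_publishers(conn))
print(revenue_by_genre(conn))
print(units_by_year(conn))
```

Keeping each question as its own small query makes it easy to swap in the real table and compare results against the dashboard.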

Tableau Dashboard

This dashboard presents a comparative analysis of publishers and a granular view of the primary sources of publisher revenue. Stakeholders can explore these trends and data points using the filters (publisher, publishing year, and author) and by selecting individual elements (for example, the 2011 publishing year bubble, the HarperCollins bar, or a row in the author profile).
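The dashboard's filters behave like optional `WHERE` conditions. As a rough sketch (not the dashboard's actual implementation), the same filtering can be expressed as a parameterized SQLite query in which each filter applies only when it is set. The `books` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def build_db():
    """In-memory `books` table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books (book_name TEXT, author TEXT, publisher TEXT, "
        "publishing_year INTEGER, publisher_revenue REAL)"
    )
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?, ?, ?)",
        [
            ("Book A", "Author X", "HarperCollins", 2011, 5200.0),
            ("Book B", "Author Y", "HarperCollins", 2009, 3100.0),
            ("Book C", "Author X", "Penguin Group", 2011, 4100.0),
        ],
    )
    return conn


def filter_books(conn, publisher=None, year=None, author=None):
    """Return rows matching every filter that is set; unset filters are ignored."""
    clauses, params = [], []
    for column, value in (("publisher", publisher),
                          ("publishing_year", year),
                          ("author", author)):
        if value is not None:
            # Column names come from the fixed tuple above, never from user input.
            clauses.append(f"{column} = ?")
            params.append(value)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = ("SELECT book_name, author, publisher_revenue FROM books"
           + where + " ORDER BY publisher_revenue DESC")
    return conn.execute(sql, params).fetchall()


conn = build_db()
print(filter_books(conn, year=2011))
print(filter_books(conn, publisher="HarperCollins", author="Author X"))
```

Filter values travel as bound parameters rather than being pasted into the SQL string, which keeps the query safe if the values come from user input.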

Jupyter Notebook on GitHub

The Python and SQL exploratory data analysis (EDA) walks through the ETL process behind the featured insights and offers additional points of analysis for stakeholders in RevOps, product, and marketing.
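The notebook's exact ETL steps are not reproduced here. The sketch below only illustrates the general extract, transform, load pattern with the standard library (`csv`, `sqlite3`). The inline CSV, its headers (borrowed from the column labels in this README), and the cleaning rules (blank fields become `NULL`, genres are lowercased) are assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; headers borrow the README's column labels.
RAW_CSV = """Publishing Year,Book Name,Author,Genre,Units sold,Publisher revenue
2011,Book A,Author X,Fiction,1200,3400.50
,Book B,Author Y,nonfiction,300,900.00
1997,Book C,Author Z,Children,80,
"""


def to_int(value):
    """Blank strings become None (SQL NULL); everything else is cast to int."""
    return int(value) if value.strip() else None


def to_float(value):
    """Blank strings become None (SQL NULL); everything else is cast to float."""
    return float(value) if value.strip() else None


def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: cast types, normalize genre labels, keep a fixed column order."""
    return [
        (
            to_int(rec["Publishing Year"]),
            rec["Book Name"].strip(),
            rec["Author"].strip(),
            rec["Genre"].strip().lower(),
            to_int(rec["Units sold"]),
            to_float(rec["Publisher revenue"]),
        )
        for rec in records
    ]


def load(rows):
    """Load: write the cleaned rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books (publishing_year INTEGER, book_name TEXT, author TEXT, "
        "genre TEXT, units_sold INTEGER, publisher_revenue REAL)"
    )
    conn.executemany("INSERT INTO books VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


conn = load(transform(extract(RAW_CSV)))
print(conn.execute("SELECT COUNT(*), COUNT(publishing_year) FROM books").fetchone())
```

Mapping blanks to `None` lets SQLite's aggregate functions skip missing values instead of treating them as zero.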

Considerations and Next Steps

  • The units sold for certain books and publishers, particularly Amazon Digital, are very high (10K+), while gross sales and publisher revenue are relatively low. Many of these books are likely available through Kindle Unlimited or Amazon Prime subscriptions, where a book that a user selects and opens through their Amazon account may be logged as a unit sold.
  • The original dataset does not include any dates beyond each book's publishing year. Further analysis, particularly time-series and predictive modeling, would require integrating datasets that contain dated sales transactions.
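One way to surface the high-volume, low-revenue titles described above is to compute publisher revenue per unit sold and filter on it. In this sketch the 10,000-unit floor mirrors the 10K+ figure. The $0.50 per-unit ceiling, the `books` table, and the sample rows are hypothetical.

```python
import sqlite3


def build_db():
    """In-memory `books` table with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE books (book_name TEXT, publisher TEXT, "
        "units_sold INTEGER, publisher_revenue REAL)"
    )
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?, ?)",
        [
            ("Book K", "Amazon Digital", 25000, 5000.0),   # 0.20 per unit
            ("Book L", "Penguin Group", 12000, 60000.0),   # 5.00 per unit
            ("Book M", "Amazon Digital", 800, 100.0),      # below the unit floor
        ],
    )
    return conn


def flag_high_volume_low_revenue(conn, min_units=10_000, max_revenue_per_unit=0.50):
    """Titles with many units sold but little publisher revenue per unit."""
    return conn.execute(
        """SELECT book_name, publisher, units_sold,
                  ROUND(publisher_revenue / units_sold, 2) AS revenue_per_unit
           FROM books
           WHERE units_sold >= ?
             AND publisher_revenue / units_sold < ?
           ORDER BY units_sold DESC""",
        (min_units, max_revenue_per_unit),
    ).fetchall()


conn = build_db()
print(flag_high_volume_low_revenue(conn))
```

Because `publisher_revenue` is declared `REAL`, the division is floating point rather than integer division. Both thresholds are parameters so they can be tuned against the real distribution.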

Original dataset

Original Dataset Columns

  • Publishing Year: The year in which each book was published, ranging from 1308 to 2016; some values are null.
  • Book Name: The title of each book.
  • Author: The name of the author who wrote the book.
  • Language_code: The code representing the language in which the book is written.
  • Author_Rating: The rating assigned to the author based on their previous works.
  • Book_average_rating: The average rating given to the book by readers.
  • Book_ratings_count: The number of ratings given to the book by readers.
  • Genre: The genre or category to which the book belongs.
  • Gross sales: The total sales revenue generated by each book.
  • Publisher revenue: The revenue earned by publishers from selling each book.
  • Sale price: The price at which each copy of a book is sold.
  • Sales rank: A numeric value indicating a book's rank based on its sales performance in comparison to other books within its category (genre).
  • Units sold: Total number of copies sold for each specific title.
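For loading these columns into SQLite, one possible schema is sketched below, along with a query that profiles the publishing-year range and null count. The snake_case names and SQL types are assumptions. For example, `author_rating` is stored as text in case it is a categorical label. The sample rows are invented.

```python
import sqlite3

# One possible schema for the listed columns; names and types are assumptions.
SCHEMA = """
CREATE TABLE books (
    publishing_year     INTEGER,  -- nullable: some books have no year
    book_name           TEXT,
    author              TEXT,
    language_code       TEXT,
    author_rating       TEXT,     -- stored as text in case it is a categorical label
    book_average_rating REAL,
    book_ratings_count  INTEGER,
    genre               TEXT,
    gross_sales         REAL,
    publisher_revenue   REAL,
    sale_price          REAL,
    sales_rank          INTEGER,
    units_sold          INTEGER
)
"""


def year_profile(conn):
    """Return (earliest year, latest year, number of rows with no year)."""
    return conn.execute(
        "SELECT MIN(publishing_year), MAX(publishing_year), "
        "COUNT(*) - COUNT(publishing_year) FROM books"
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Hypothetical rows; only two columns are filled, the rest default to NULL.
conn.executemany(
    "INSERT INTO books (publishing_year, book_name) VALUES (?, ?)",
    [(1850, "Book P"), (2014, "Book Q"), (None, "Book R"), (2003, "Book S")],
)
print(year_profile(conn))
```

`MIN` and `MAX` ignore nulls, and `COUNT(column)` counts only non-null values, so the difference from `COUNT(*)` gives the number of missing years.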

About

Exploratory data analysis of publisher revenue with Python and SQL
