Microsoft finally released SQL Server 11 “Denali” CTP3 (Community Technology Preview 3) for public preview. Microsoft is (and these are the politest words I can type) stubbornly refusing to build its own Data Visualization product. I doubt the Crescent “experience” can be considered a product, especially because it is Silverlight-based while the world has already moved to HTML5.

If you have 7 minutes, you can watch the Crescent demo from WPC11, which shows that, while trailing a few years behind the DV Leaders and Google, Microsoft is giving its die-hard followers something to cheer about:

I have to admit that while there is nothing new (for a DV expert) in the video above, it is huge progress compared with the Excel-based Data Visualizations that Microsoft tried to promote as a replacement for ProClarity and PerformancePoint Server. Even Microsoft itself positions Crescent (which is 32-bit only!) as a replacement for SSRS Report Builder, so the DV Leaders can sleep well for another night.

However, Microsoft’s BI stack is number 4 or 5 on my list of DV Leaders, and CTP3 is so rich with cool new functionality that it deserves to be covered on this blog.

Of course, the major news is the availability of the Tabular Data Model, which means the VertiPaq in-memory columnar engine, similar to the PowerPivot engine but running on a server without any SharePoint (which is a slow virus, as far as I am concerned) and without the stupid SharePoint UI and its limitations. I quote Microsoft: “In contrast with the previous release, where VertiPaq was only available in PowerPivot for SharePoint, you can now use VertiPaq on a standalone Analysis Services instance with no dependency on SharePoint.”!

SSAS (SQL Server Analysis Services) has new features (they may have existed before, but before CTP3 everyone who knew about them was under NDA) like memory paging (which allows models to be larger than the physical memory of the server, meaning unlimited scalability and Big Data support), row-level security (user identity is used to hide/show visible data), KPIs, and partitions. CTP3 also removes the 4GB maximum file size limit for string storage files and removes the limit of 2 billion rows per table (each column is still limited to a maximum of 2 billion distinct values, but in a columnar database that is a much more tolerable restriction!).

A new version of PowerPivot has been released with support for the Tabular Model, and I quote: “You can use this version of the add-in to author and publish PowerPivot workbooks from Excel 2010 to Microsoft SQL Server”, which again means no SharePoint involvement! As Marco Russo put it: “Import your existing PowerPivot workbooks in a Tabular project (yes, you can!)”, and I agree 100% with Marco when he said four times: Learn DAX!

After 3 years of delays, Microsoft finally has BIDS for Visual Studio 2010, and that is huge too. I quote again: “The Tabular Model Designer … is now integrated with Microsoft SQL Server “Denali” (CTP 3) Business Intelligence Development Studio.” It means that BIDS is now not just available but is the main unified development interface for both Multidimensional and Tabular Data Models. Now we can forget about Visual Studio 2008 and finally use the more modern VS2010!

Another feature extremely important for Data Visualization is not in SSAS but in SQL Server itself: the columnstore index is finally released, and I quote one more time: “The … SQL Server (CTP 3) introduces a new data warehouse query acceleration feature based on a new type of index called the columnstore. This new index … improves DW query performance by hundreds to thousands of times in some cases, and can routinely give a tenfold speedup for a broad range of decision support queries… columnstore indexes limit or eliminate the need to rely on pre-built aggregates, including user-defined summary tables, and indexed (materialized) views. Furthermore, columnstore indexes can greatly improve ROLAP performance” (ROLAP can be used for real-time cubes and real-time Data Visualizations).
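
For the curious, here is a minimal sketch of what creating such an index looks like when driven from Python via pyodbc. The server, database and dbo.FactSales table below are hypothetical stand-ins of mine; only the CREATE … COLUMNSTORE INDEX statement itself reflects the CTP3 feature described in the quote:

```python
# Hedged sketch: connection details and table/column names are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={SQL Server Native Client 11.0};"
    "SERVER=localhost;DATABASE=SalesDW;Trusted_Connection=yes;"
)

# The CTP3 columnstore index: column-oriented storage for DW-style scans.
conn.cursor().execute(
    "CREATE NONCLUSTERED COLUMNSTORE INDEX ix_FactSales_cs "
    "ON dbo.FactSales (OrderDateKey, ProductKey, StoreKey, SalesAmount);"
)
conn.commit()

# Typical decision-support aggregations should now benefit from the index.
for row in conn.cursor().execute(
    "SELECT ProductKey, SUM(SalesAmount) "
    "FROM dbo.FactSales GROUP BY ProductKey"
):
    print(row)
```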

All this cool new SQL Server 11 stuff is coming soon to the Azure cloud, and that can be scary for any DV vendor unless it knows how to be friendly with Microsoft (Tableau does; Qliktech and Spotfire still ignore SSAS).

As we now know, the term BISM (Business Intelligence Semantic Model), newly coined by Microsoft, was a marketing attempt to put a “unified” umbrella over 2 different Data Models and Data Engines: Multidimensional Cubes (invented by Mosha Pasumansky 15 years ago and the foundation for SSAS (SQL Server Analysis Services) and MDX) and the Tabular Model (used in PowerPivot and in the VertiPaq in-memory columnar database, with the new DAX language, which is going to be very important for future Data Visualization projects).

The new CTP3-released BIDS 2010 (finally, the almighty Visual Studio 2010 will have a “Business Intelligence Development Studio” after 3+ years of unjustified delays!) will be able to handle these 2 Data Models UI-wise, but it gives me a clue as to why Mosha left Microsoft for Google. And the lack of a DV product is a clue for me as to why Donald Farmer (the face of Microsoft BI) left Microsoft for Qliktech.

Even more: if you need both Data Models to be present, you need to install 2 (TWO!) different instances of “Analysis Services”: one with the Multidimensional engine and one with the new Tabular (VertiPaq/PowerPivot) engine. That seems to me not like ONE “BI” architecture but TWO “BI” architectures, glued together on the surface by BIDS 2010 and on the back end by all kinds of data connectors. Basically, Microsoft is in a confused BI state now, because financially it can afford 2 BI architectures and NO Data Visualization product!

I cannot believe I am saying this, but I wish Bill Gates would come back from retirement (it would be good for Microsoft shares and for Microsoft’s market capitalization too; just ask Apple’s shareholders about Steve and they will say he is a god)!

Permalink: https://apandre.wordpress.com/2011/07/14/tabular-model/

In the last few days, something important (3 news items, covered here in one post below) happened for the future of Data Visualization and Big Data Analytics. IBM recently had its 100th birthday, and almost at the same time its engineers published a new invention based on PCM (Phase-Change Memory).

  • PCM will not lose data when power is turned off.
  • PCM is 100 times faster (10-microsecond latency!) than flash and HDD.
  • PCM can endure at least 10 million write cycles (flash maxes out at about 30,000).
  • PCM is cheap, has huge capacity, and will be mass-produced before 2016.
  • PCM can be used everywhere, from huge servers to smartphones.


This invention changes the approach to how “Big Data” is stored and accessed, and to what portion of it needs to be in memory (RAM) for Data Visualization purposes as opposed to outside of RAM (say, on hard disk, flash or PCM). IBM may have the keys to the Big Data kingdom…

To some people it may seem unrelated, but not to me: Teradata just got the patent on the SQL-MapReduce technology it acquired with Aster Data. This technology also allows integration with Apache Hadoop and derived database systems, which are used in many Big Data applications.

And last but not least is a recent acknowledgment from IBM (for some reason it came from the India branch of IBM Software, and I wonder why, but finally it came “straight from the horse’s mouth”!) that Data Visualization is the future of Business Intelligence (I said THIS many years ago and still repeat it from time to time: DV is the new BI, or in other words, BI is dead, all hail DV!). IBM very proudly says that Cognos 10 supports an “enormous” number of charts (I guess it will make Qlikview, Spotfire and Tableau people laugh)

and that the most discussed feature in Cognos 10 is Active Reports. This functionality allows report authors to create interactive reports (apparently a big deal for IBM!).

IBM has even been spreading rumors for weeks (through people who signed NDAs with them) about a Cognos TM1-based “new visualization tool” which will “disrupt” the DV market… I guess because IBM knows that BI is dead (and IBM wasted $14+B buying 24 BI companies lately) and DV is the new BI.

Since IBM improved PCM (see above) and had its 100th birthday, I really wish them good luck, but I wish IBM would stay focused on what they are good at instead of spreading all over high tech. All 3 of these “news” items were published yesterday and today; they are somehow connected in my mind to Data Visualization’s future, and they forced me to publish this “eclectic” post…

Below is Part 3 of the guest post by my guest blogger Dr. Kadakal (CEO of Pagos, Inc.). This article is about how to build Dashboards and Data Visualizations with Excel. The topic is large: the first portion of the article (published on this blog 3 weeks ago) contains the general Introduction and Part 1, “Use of Excel as a BI Platform Today”. Part 2, “Dos and Don’ts of building dashboards in Excel”, was published 2 weeks ago, and Part 3, “Publishing Excel dashboards to the Internet”, starts below; its full text is here.

As I have said many times, BI is just a marketing umbrella for multiple products and technologies, and Data Visualization recently became one of the most important among them. Data Visualization (DV) so far is a very focused technology, and the article below shows how to publish Excel Data Visualizations and Dashboards on the web. A few vendors provide tools to publish Excel-based dashboards on the web, including Microsoft, Google, Zoho, Pagos and 4+ other vendors:

I leave it to the reader to decide whether other vendors can compete in the business of publishing Excel-based dashboards on the web, but the author of the article below provides 3 very good criteria for selecting the vendor, tool and technology for it (and when I applied them myself, they left me with only 2 choices, the same ones described in the article).

Author: Ugur Kadakal, Ph.D., CEO and founder of Pagos, Inc. 

Publishing of Excel Dashboards on the Internet

Introduction

In the previous article (see “Excel as BI Platform” here) I discussed Excel’s use as a Business Intelligence platform and why it is exceedingly popular software among business users. In the 2nd article (“Dos & Don’ts of Building Successful Dashboards in Excel”) I talked about some of the principles to follow when building a dashboard or a report in Excel. Together, these form a discussion of why Excel is the most powerful self-service BI platform.

However, one of the most important facets of any BI platform is web enablement and collaboration. It is important for business users to be able to create their own dashboards, but it is equally important for them to be able to distribute those dashboards securely over the web. In this article, I will discuss two technologies that enable business users to publish and distribute their Excel-based dashboards over the web.

Selection Criteria

The following criteria were selected in order to compare the products:

  1. Ability to convert a workbook with most Excel-supported features into a web-based application with little to no programming.
  2. Dashboard management, security and access control capabilities that can be handled by business users.
  3. On-premise, server-based deployment options.

Criterion #3 eliminates online spreadsheet products such as Google Docs or Zoho. As much as I support cloud-based technologies, in order for a BI product to be successful it should have on-premise deployment options. Without on-premise deployment, you neglect the possibility of integration with other data sources within an organization.

There are other web-based Excel conversion products on the market, but none of them meet the criterion of supporting most Excel features relevant to BI; therefore, they were not included in this article about how to publish Excel dashboards on the web.

Below is Part 2 of the guest post by my guest blogger Dr. Kadakal (CEO of Pagos, Inc.). This article is about how to build Dashboards and Data Visualizations with Excel. The topic is large: the first portion of the article (published on this blog last week) contains the general Introduction and Part 1, “Use of Excel as a BI Platform Today”.

Part 2, “Dos and Don’ts of building dashboards in Excel”, is below, and Part 3, “Publishing Excel dashboards to the Internet”, is coming soon. It is easy to fall into a trap with Excel, but if you avoid the risks described in the article below, Excel can become one of the most valuable BI and Data Visualization (DV) tools for a user. Dr. Kadakal said to me recently: “if the user doesn’t know what he is doing he may end up spending lots of time maintaining the file or create unnecessary calculation errors”. So we (Dr. Kadakal and I) hope that the article below can save time for visitors of this blog.

BI in my mind is a marketing umbrella for multiple products and technologies, including RDBMS, Data Collection, ETL, DW, Reporting, Multidimensional Cubes, OLAP, Columnar and in-Memory Databases, Predictive and Visual Analytics, Modeling and DV.

Data Visualization (aka DV), on the other hand, is a technology that enables people to explore, drill down into and visually analyze their data, and to visually search for data patterns such as trends, clusters and outliers. So BI is a super-abused marketing term, while DV so far is a focused technology, and the article below shows how to use Excel as a great dashboard builder and Data Visualization tool.

Dos&Don’ts of Building Successful Dashboards in Excel

Introduction (click to see the full article here)

In the previous week’s post (see also the article “Excel as BI Platform” here) I discussed Excel’s use as a Business Intelligence platform and why it is exceedingly popular software among business users. In this article I will talk about some of the principles to follow when building a dashboard or a report in Excel.

One of the greatest advantages of Excel is its flexibility: it puts little or no constraints on the user’s ability to create their ideal dashboard environments. As a result, Excel is being used as a platform for solving practically any business challenge. You will find individuals using Excel to solve a number of business-specific challenges in practically any organization or industry. This makes Excel the ultimate business software.

On the other hand, this same flexibility can lead to errors and long-term maintenance issues if not handled properly. There are no constraints on the separation of data, business logic or user interface, and inexperienced users tend to build their Excel files by mixing these up. When these facets of a spreadsheet are not properly separated, it becomes much harder to maintain those workbooks and they become prone to errors.

In this article, I will discuss how you can build successful dashboards and reports by separating data, calculations and the user interface. You can find the rest of this post in the full article, “Dos and Don’ts of building dashboards in Excel”, here.

It discusses how to prepare data (both static and external) for dashboards; how to build formulas and calculation models, UI and input controls for dashboards; and of course Pivots, Charts, Sparklines and Conditional Formatting for innovative and powerful Data Visualizations in Excel.

This is Part 1 of a surprise guest post. My guest is Ugur Kadakal, Ph.D.; he is the CEO and founder of Pagos, Inc., which he started almost 10 years ago.

Dr. Kadakal is an expert in Excel, Business Intelligence, Data Analytics and Data Visualization. His comprehensive knowledge of Excel, along with his ambitious inventions and ideas, supplies the foundation for all Pagos products, which include SpreadsheetWEB (which converts Excel spreadsheets into web applications), SpreadsheetLIVE (a fully-featured, browser-based spreadsheet application environment) and the Pagos Spreadsheet Component (which integrates Excel spreadsheets into enterprise web applications).

Pagos started and hosts the largest free collection and repository of professional Excel spreadsheet templates on the web: http://spreadsheetzone.com . The 3 Excel-based dashboards below, built by Dr. Kadakal, can be found in this very popular repository:

Dashboard 1 : Human Resources Dashboard: http://spreadsheetzone.com/templateview.aspx?i=498

Dashboard 2 : Business Activity Dashboard in EuroZone: http://spreadsheetzone.com/templateview.aspx?i=490

Dashboard 3 : Energy Dashboard for Euro Zone: http://spreadsheetzone.com/templateview.aspx?i=491

The topic is large, so this guest article is split across 3 blog posts. The first portion of the article contains the Introduction and Part 1, “Use of Excel as a BI Platform Today”; I then expect Dr. Kadakal to do at least 2 more posts: Part 2, “Dos and Don’ts of building dashboards in Excel”, and Part 3, “Moving Excel dashboards to the Web”.

Excel as a Business Intelligence Platform – Part 1

Introduction

Electronic spreadsheets were among the very first Business Intelligence (BI) software. While the availability of spreadsheet software and its use as a tool for data analysis date back to the 1960s, its application in the BI field began with the integration of OLAP and pivot tables. In 1991, Lotus released Improv, followed by Microsoft’s release of the PivotTable in 1993. However, Essbase was the first scalable OLAP software to handle the large data sets that early spreadsheet software was incapable of; this is where its name comes from: Extended Spread Sheet Database.

There is no doubt that Microsoft Excel is the most commonly used software for BI purposes. While Excel is general-purpose business software, its flexibility and ease of use make it popular for data analysis, with millions of users worldwide. Excel has an install base of hundreds of millions of desktops: far more than any other BI platform. It has become a household name. From educational utilization to domestic applications and enterprise implementation, Excel has proven incredibly indispensable. Most people with commercial or corporate backgrounds have developed a proficient Excel skillset. This makes Excel the ultimate self-service BI platform. However, like all systems, Excel has some weaknesses that make it difficult to use as a BI tool under certain conditions.

Use of Excel as a BI Platform Today

Small Businesses

Traditionally, small businesses are not considered an important market segment by most BI vendors. Their data analysis and reporting needs are limited, primarily due to their smaller commercial volumes. However, this is changing quickly as smaller organizations begin to collect large amounts of data, thanks to the Internet and social media, and require tools to manage that data. What is not changing is the limited financial resources available to them. Small businesses cannot afford to spend large amounts of money on BI software or on consultants to aid them in creating applications. That is why Excel is the ideal platform for them and will most probably remain so for the foreseeable future. The reasons are clear: (1) most of them already have Excel licenses, (2) most of their users know how to use Excel, and (3) their needs are simpler and can be met with Excel.

Mid-Range Businesses

Mid-range businesses are a quickly growing market segment for BI vendors. Traditionally, Excel as a BI platform has been more popular among these businesses; cost and availability are the primary factors. However, two factors have been steering them toward alternatives: (1) Excel can no longer handle their growing data volumes, and (2) other BI vendors have started offering cost-effective alternatives.

As a result, Excel’s market share in this field is in decline, although it still remains the most popular. On the other hand, with the release of Office 2010 and its extended capabilities for handling very large data sets, Excel stands a good chance of reversing this decline.

Large Enterprises

The situation with large enterprises is rather complex. Most of them already have a large-scale BI implementation in place. Those implementations often connect various databases and data warehouses within the organization. They have made significant investments and continue doing so to expand and maintain their BI systems. They already have a number of dashboards and reports designed to serve their business units. However, business users always need new and different dashboards and reporting tools. The only software that gives them the ultimate flexibility to create their own reports is Excel. As a result, even in large enterprises, the use of Excel for BI purposes is common. Business users often go to their data warehouses or BI tools, get a data extract, and bring it into Excel. They can then prepare their analysis and build their reports in Excel.

Enterprises will continue using their existing platforms because they have made huge investments building those systems. However, Excel use by business users as their secondary BI and reporting tool will continue to rise unless the alternative vendors significantly improve their self-service capabilities.

Summary

Excel is one of the ultimate business platforms and offers unparalleled features and capabilities to non-programmers. This makes it an ideal self-service BI platform. In this article, we examined the use of Excel as a BI platform in companies of different sizes. In the next article of this series, we will discuss how to use Excel more efficiently as a BI platform, from handling data to calculations and visual interactions.

Comparison of DV Tools is the most popular page (and post) on this site, visited by many thousands of people. Some of them keep asking me to extend this comparison with additional features; one request is to compare the leading DV tools’ file and memory footprints, as well as their reading and saving times.

I took a mid-sized dataset (428,999 rows and 135 columns), exported it to CSV and compressed it to ZIP format, because all the native DV formats (QVW for Qlikview, DXP for Spotfire, TWBX for Tableau and XLSX for Excel and PowerPivot) are compressed one way or another. My starting file size (of the zipped dataset) was 56 MB. Here is what I got; see for yourself:

One comment: the numbers above are all relative to the hardware configuration used for the tests, and they also depend on the other software I ran during the tests, because that software also requires RAM, CPU cycles and disk I/O; they even depend on the speed of repainting application windows on screen, especially for Excel. I will probably add more comments to this post/page, but my first impression from this comparison is that Tableau’s new Data Engine (released in version 6.0 and soon to be updated in 6.1) has made Tableau more competitive. Please keep in mind that the comparison of in-memory footprint was much less significant in the test above, because Qlikview, Excel and PowerPivot put the entire dataset into RAM, while Tableau and Spotfire can leave some data (unneeded for visualization) on disk, treating it as “virtual memory”. Also, Tableau uses 2 executables (not just one EXE like the others): tableau.exe (or tabreader.exe) and tdserver64.exe.
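
For context on the numbers above, here is a minimal Python sketch of the two raw-baseline measurements involved: the file footprint of the zipped CSV and the time to scan all of its rows. It is not the exact procedure I used, and “dataset.zip” is a hypothetical name for the 56 MB zipped export:

```python
# Hedged sketch: measures zip footprint and CSV scan time for a dataset
# like the one above (428,999 rows x 135 columns); the file name is assumed.
import csv
import io
import os
import time
import zipfile

path = "dataset.zip"
print(f"file footprint: {os.path.getsize(path) / 2**20:.1f} MB")

start = time.perf_counter()
with zipfile.ZipFile(path) as z:
    member = z.namelist()[0]                      # the single CSV inside
    with z.open(member) as raw:
        reader = csv.reader(io.TextIOWrapper(raw, encoding="utf-8"))
        header = next(reader)
        rows = sum(1 for _ in reader)
print(f"read {rows} rows x {len(header)} columns "
      f"in {time.perf_counter() - start:.1f} s")
```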

Since Tableau is the only leading DV software capable of reading from SSAS cubes and from PowerPivot (local SSAS) cubes, I also took a large SSAS cube and, for testing purposes, selected an SSAS sub-cube with 3 dimensions, 2 measures and 156,439 “rows”. I measured the time and footprint needed for Tableau to read the sub-cube, refresh it in memory and save it to a local application file, also measured its “cubical” footprint in memory and on disk, and then compared all the results with the same tests run with Excel 2010 alone and with Excel 2010 plus PowerPivot:

While Tableau’s ability to read and visualize cubes is cool, performance-wise Tableau is far behind Excel and PowerPivot, especially in the reading department and in memory footprint. In the saving department and in file footprint, Tableau does nothing for SSAS cubes, because it does not save the cube locally in its TWBX application file (it keeps the data in the SSAS cube, outside of Tableau); so Tableau’s file footprint for SSAS cubes is not an indicator. For PowerPivot-based local cubes, however, Tableau does a better job (saving data into a local application file) than both Excel and PowerPivot!

TIBCO released Spotfire 3.3, and the first thing that jumped out at me (see what is new here) was how mature this product is. For example, among the new features is improved scalability: each additional simultaneous user of a web analysis initially claims very little additional system memory:

Many Spotfire customers will be able to support a greater number of web users on their existing hardware by upgrading to 3.3. Spotfire Web Player 3.3 includes significant improvements in memory consumption (as shown above for certain scenarios). The goal is to minimize the amount of system memory needed to support larger numbers of simultaneous users on the same analysis file. The main use case here: the larger the file and the greater the number of simultaneous web users on that file, the less initial system memory is required to support each additional user; it is greatly reduced compared to version 3.2.1 and earlier.

A comparison with the competition and thorough testing of the new Spotfire scalability have yet to be done (similar to what Qliktech did with Qlikview here), but my initial reaction is as I said in the title: we are witnessing very mature software. Apparently the Defense Intelligence Agency (DIA) agrees with me: “Defense Intelligence Agency Selects TIBCO Spotfire Analytics Solutions for Department of Defense Intelligence Information System Community”. “With more than 16,500 military and civilian employees worldwide, DIA is a major producer and manager of foreign military intelligence.”

Spotfire 3.3 also includes collaborative bookmarking, which enables all Spotfire users to capture a dashboard (its complete configuration, including markings, drop-down selections and filter settings) and share that visualization immediately with other users of the same dashboard, regardless of the client in use. Spotfire is actually not just a piece of Data Visualization software but a real analytical platform with a large portfolio of products, including completely integrated S-PLUS (a commercial version of the R library, which has millions of users), the best web client (you can go zero-footprint with Spotfire Web Player and/or the partially free Spotfire Silver), a free iPad client version 1.1.1 (requires iTunes, so be prepared for Apple intrusion), a very rich API, an SDK, integration with Visual Studio, support for IronPython and JavaScript, well-thought-out web architecture, a set of extension points, etc.

System requirements for Spotfire 3.3 can be found here. Coincidentally with the 3.3 release, the Spotfire VAR program got an expansion too. Spotfire has a very rich set of training options; see them here. You can also find a set of good Spotfire videos in Colin White’s screencast library, especially the 2011 webcasts.

My only (and large) concern with Spotfire is its focus, since it is part of TIBCO, a large corporation with 50+ products and 50+ reasons to focus on something else. Indirectly this can be confirmed by sales: my estimate is that Tableau is growing much faster than Spotfire (sales-wise), and Qlikview sales are probably 3 times larger (dollar-wise) than Spotfire sales. Since TIBCO bought Spotfire in 2007, I had expected Spotfire to be integrated with other great TIBCO products, but after 4 years it is still not the case… And TIBCO has no reason to change its corporate policies, since its business is good and the stock is doing well:

(at least a 500% increase in share price since the end of 2008!). Also see the article written by Ted Stamas for SeekingAlpha and the comparison of TIBX vs. an ETF here:

I think it is interesting to note that TIBCO recently rejected a buyout offer from HP!

The last week of April 2011 was good for Qliktech: it released its results for the first quarter of 2011, and they are very positive.

Revenue is up 44% YoY compared with Q1 2010, reaching $63M (growth does not look like it is slowing down), and the projection for total 2011 revenue is now about $300M (up from the preliminary projection of $280M before Q1 happened). Qliktech ended the first quarter of 2011 with an active customer count of approximately 19,000 (meaning about 900,000 licensed, paying Data Visualization and BI users now; the number of Qlikview users may exceed 1 million in 2011!), up from approximately 14,000 active customers at the end of the first quarter of 2010! Among other news:

  • Qliktech hired 103 new employees in Q1 2011 and currently employs 883 people (a 43% increase year-over-year).
  • Qliktech signed a strategic alliance with Deloitte, starting with the Netherlands and planning to expand the alliance to Deloitte worldwide.
  • About 2 weeks ago Qliktech unveiled one of the first HTML5-based full client applications: Qlikview on iPad (free [the user needs a license to access a Qlikview Server anyway] and delivered through the Safari mobile web browser); Qliktech claims that it is “every bit as rich as a native app.”

I guess most DV client applications will have HTML5 reincarnations soon… As a result of all these positive sound bites, Qliktech shares ended the week above $32, having more than tripled in 9 months:


and I compared Qliktech’s relative growth in the Annotated Timeline chart above with Microstrategy, TIBCO and Apple (yes, Qliktech is growing at least twice as fast as … Apple). I could not include Tableau in the comparison, because Tableau Software is still … a private company.

Qliktech’s capitalization as of today, 4/30/11, is $2.5B: $1B more than Microstrategy’s and only half of TIBCO’s. I know at least 3 software vendors who are focused only on BI and DV: Tableau (still a private company; BTW, Tableau 6.1 will be released soon), which is growing faster (114% YoY, see it here) than anybody; Qliktech (share price tripled in the last 9 months); and Microstrategy (share price almost doubled in the last 9 months). I consider dedication to DV and BI very important for future success in the DV market; for example, TIBCO’s Spotfire is only one of 50+ TIBCO products… and that endangers the future of one of the most advanced and mature DV products, Spotfire (version 3.3 is coming soon).

One of the reasons for Qliktech’s growth is its 1000+ partners and extensive partner programs for OEM partners, solution providers, business consultants and system integrators. Those overdeveloped partner programs required mandatory commitments from partners in terms of revenue targets, membership fees, Qlikview certifications and a minimum number of trained employees. Lately Qliktech unreasonably raised those requirements; this may backfire, slow Qliktech’s growth, and help competitors like Tableau (actually the opposite of Qliktech: its partnership program is underdeveloped, in my opinion, and requires big improvements) and, recently, Microstrategy (which seems to be learning from its own and its competitors’ mistakes and has been catching up lately).

Update 3 months later:

in Q2 2011 Qliktech reached 21,000 customers worldwide (which means almost 1 million licensed users), $74 million in revenue (45% over Q2 2010), 1,000 full-time employees (400+ more compared with Q2 2010), a $2.4B market capitalization and, guess what, a $2.2 million loss!

Permalink: https://apandre.wordpress.com/2011/04/30/good-week-for-qliktech/

Microstrategy, a famous BI-dedicated company that has been operating for 22+ years, recently released Visual Insight (as part of this week’s Microstrategy 9.2 release) and joined the DV race. A couple of years ago I advised a local company on choosing a Data Visualization partner, and the final 3 choices were Qlikview, Spotfire and Microstrategy. Microstrategy was the most competitive pricing-wise, but its Data Visualization functionality was not ready yet. It is ready now; see it here (from this week’s webcast):

Visual Insight, as part of Microstrategy 9.2, targets so-called “self-service BI” and the transition (they acknowledged this) from “old BI” (tabular reports: published static and OLAP reports) to “new BI” (Data Visualization and Dashboards), from desktop to mobile clients (a forward-looking statement for sure), and from physical to cloud.

Microstrategy claims that Visual Insight allows you to visualize data in 30 minutes (good to know, but the DV Leaders have had this for a while; welcome to the club!), compared with 30 days for the same process with “traditional BI”:

(I have been saying this for 6 years now, and on this blog since its inception; does it mean that old BI is now useless and too pricey? Microstrategy’s presenters say the answer is yes! and I want to thank Microstrategy for validating my 6-year-old conclusion). For the full set of Microstrategy 9.2 slides, click here.

Microstrategy 9.2 has a full BI product portfolio, a fast in-memory data engine, free mobile and tablet clients, and even a free Reporting Suite. Microstrategy (like Qliktech, Tableau and Visokio) is completely focused on Business Intelligence and Data Visualization functionality, unlike its giant competitors SAP, IBM, Oracle and Microsoft.

Update 9/27/11: Microstrategy released the free Cloud Personal edition, based on Visual Insight; see it for yourself here:

Since many people will use Excel regardless of how good other BI and DV tools are, I regularly compare Excel’s ability to solve the Data Visualization problems discussed on this site. In most cases Excel 2003 is completely inappropriate and obsolete (especially visually); Excel 2007 is good only for limited DV tasks like infographics, data slides, data presentations, static dashboards and single-chart visualizations. Excel 2010 has some features relevant to Data Visualization, including one of the best columnar in-memory databases (PowerPivot, a free add-in), the ability to synchronize multiple charts through slicers, a limited ability to drill down into data using slicers, and even support for both 64-bit and 32-bit. However, compared with Qlikview, Spotfire and Tableau, Excel 2010 feels like a stone-age tool, or at least 2 generations behind, as far as Data Visualization (and BI) is concerned…

That was my impression until I started to use an Excel plugin called Vizubi (from the company of the same name; see it here). Suddenly my Excel 2003 and Excel 2007 (I keep them for historical purposes) became almost as capable as Excel 2010, because Vizubi adds to all those versions of Excel a very capable columnar in-memory database, slicers, and many features you cannot find in Excel 2010 and PowerPivot; in addition, it greatly improves the functionality of Excel Tables and PivotTables! Vizubi enables me to read (in addition to the usual data sources like ODBC, CSV, XLS, XLSX, etc.) even my QVD files (Qlikview data files)! Vizubi, unlike PowerPivot, will create Time Dimension(s) the same way SSAS does. All of the above means that users are not forced to migrate to Office 2010, yet they get many PowerPivot features with their old version of Excel. On top of that, Vizubi adds a unique feature to Excel tables and pivots: I can easily switch back and forth between Table and PivotTable presentations of my data.

Vizubi’s most important feature is that all of its tables and pivots are interactive; each piece of data is clickable and enables me to drill down/up/through my entire dataset:

This basically equals or exceeds the drilldown ability of Qlikview, with one exception: Qlikview allows you to do it through charts, while Vizubi does it through Tables and PivotTables. Vizubi enables an Excel user to create large databases with millions of rows (e.g. its test database has 15 million rows) and enables ordinary users (non-developers) to easily create tables, reports, charts, graphs and dashboards with such a database, all within the familiar Excel environment, using an easy drag-and-drop UI:

Vizubi’s database(s) enable users to share data through a central datastore, while keeping Excel as a personal desktop DV (or BI) client. See Vizubi videos here and tutorials here.

Vizubi is a small (15 employees), profitable Italian company, and it is living proof that size does not matter: Vizubi did something extremely valuable and cool for Excel users that giant Microsoft failed to do for many years, even with PowerPivot. The price of Vizubi is minimal considering the value it adds to Excel: between $99 and $279, depending on the version and the number of seats (discounts are available; see here).

Vizubi is not perfect (it is only at version 1.21, a product less than one year old). For example, I wish it supported graphical drilldown like Qlikview does (outlining rectangles right on charts and then instantly selecting the corresponding subset of data), a web client (like Spotfire) and web publishing of its functionality (even Excel 2010 supports slicers on the web in the Office Live environment), 64-bit Excel (32 bits is so 20th century), the ability to read and use SSAS and PowerPivot directly (like Tableau does), some scripting (JavaScript or VBScript, like Qlikview) and a “formula” language (like PowerPivot with DAX), etc.

I suggest reviewing these articles about Vizubi: one in TDWI by Stephen Swoyer, and a relatively old article by Marco Russo at SQLBlog.

Permalink: https://apandre.wordpress.com/2011/04/10/visubi/

Last week Deloitte suddenly declared that 2011 will be the year of Data Visualization (DV for short, at least on this site) and that the main technology trend in 2011 will be Data Visualization as an “Emerging Enabler”. It took Deloitte many years to see the trend (I advise them to re-read posts by observers and analysts like Stephen Few, David Raab, Boris Evelson, Curt Monash, Mark Smith, Fern Halper and other known experts). Yes, I welcome Deloitte to the DV party anyway: better late than never. You can download their “full” report here, in which they allocated the first(!) 6 pages to Data Visualization. I cannot resist noting that the “DV specialists” at Deloitte are just recycling (in their own words!) material (even from this blog) that has been known for ages and from multiple places on the web, and I am glad that Deloitte knows how to use the Internet and how to read.

However, some details in Deloitte’s report amazed me with how out of touch with reality they are, and made me wonder in what cave or cage (or ivory tower?)

these guys are wasting their well-paid time. On a sidebar of their “Visualization” pages/post they published a poll: “What type of visualization platform is most effective in supporting your organization’s business decision making?”. Among the most laughable options to choose/vote for, you can find “Lotus” (hello, people, are you there? The 20th century ended many years ago!), Access (what are you smoking, people?), Excel (it cannot even have interactive charts and proper drilldown functionality, but yes, everybody has it), Crystal Reports (static reports are among the main reasons why people look for interactive Data Visualization alternatives), “Many Eyes” (I love enthusiasts, but it will not help me produce actionable data views) and some “standalone options” like SAS and ILOG, which are 2 generations behind the leading DV tools. What is even more amazing is that the “BI and Reporting” option (Crystal, BO, etc.) collected 30% of the voters, while the other vote-getters were the “standalone option” (Deloitte thinks SAS and ILOG belong there) with 19%, and the “None of the Above” option with 22%!

In the second part of their 2011 Tech Trends report, Deloitte declares “Real Analytics” the main trend among “Disruptive Deployments”. The use of the term “Real Analytics” made me laugh again; it reminds me of other funny usages of the word “real”: “Real Man”, “Real Woman”, etc. I just want to see what “unreal analytics” or “not real analytics” (or whatever the real antonym of “real analytics” is) would look like.

Update: Deloitte and Qliktech formed an alliance in the last week of April 2011; see it here.

More updates: In August 2011 Deloitte opened “The Real Analytics website” here: http://realanalyticsinsights.com/ and on 9/13/11 they “joined forces” in the US with Qliktech: http://investor.qlikview.com/releasedetail.cfm?ReleaseID=604843

Permalink: https://apandre.wordpress.com/2011/03/29/deloitte-too/

Heritage Provider Network is offering a cool $3 million in prize money for the development of an algorithm that can best predict how often people are likely to be sent to the hospital. Jonathan Gluck, a senior executive at Heritage, said the goal of the competition is to create a model that can “identify people who can benefit from additional services,” such as nurse visits and preventive care. Such additional services could reduce health care spending and cut back on excessive hospitalizations, Gluck said.

The algorithm contest, the largest of its kind so far, is an attempt (also see the Slate article here) to help find the best answers to complicated data-analysis questions. The best-known previous example was the $1 million Netflix Inc. prize awarded in 2009 for a model that better predicts what movies people would like. In 2009, a global team of seven members, consisting of statisticians, machine-learning experts and computer engineers, was awarded the $1 million contest prize, and Netflix replaced its legacy recommendation system with the team’s new algorithm (the 2nd Netflix competition was stopped by the FTC and lawyers). I personally think that this time Data Visualization will be a large part of the winning solution.

The competition, which will be run by the Australian startup firm Kaggle, begins on April 4 and will be open for about two years. Contestants will have access to de-identified insurance claims data to help them develop a system for predicting the number of days an individual is likely to spend in a hospital in one year. Kaggle spent months streamlining claims data and removing potentially identifying information, such as names, addresses, treatment dates and diagnostic codes. Teams will have access to three years of non-identifiable healthcare data for thousands of patients.

The data will include outpatient visits, hospitalizations, medication claims and outpatient laboratory visits, including some test results. The data for each de-identified patient will be organized into two sections: “Historical Data” and “Admission Data.” Historical Data will represent three years of past claims data; this section of the dataset will be used to predict whether that patient is going to be admitted during the Admission Data period. Admission Data represents the subsequent claims period and will contain whether or not a hospital admission occurred for that patient; it will be a binary flag.

The training dataset includes several thousand anonymized patients and will be made available, securely and in full, to any registered team for the purpose of developing effective screening algorithms. The quiz/test dataset is a smaller set of anonymized patients. Teams will only receive the Historical Data section of these datasets, and the two datasets will be mixed together so that teams will not know which de-identified patients are in which set.
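
To make the task framing concrete, below is a minimal Python sketch on synthetic stand-in data. The feature names, model choice and log-scale error metric are illustrative assumptions of mine, not the competition’s official specification:

```python
# Hedged sketch of the prediction task: features aggregated from the
# "Historical Data" section; target = days in hospital the following year.
# All data below is synthetic; the real claims files differ.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n_patients = 5000
# hypothetical per-patient aggregates: claim count, prior admissions,
# distinct providers seen
X = rng.poisson(lam=[4.0, 0.3, 2.0], size=(n_patients, 3)).astype(float)
y = rng.poisson(lam=0.4, size=n_patients)   # days in hospital next year

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# A log-scale error (RMSLE-style) is a natural fit for skewed day counts.
pred = np.clip(model.predict(X_test), 0, None)
rmsle = np.sqrt(np.mean((np.log1p(pred) - np.log1p(y_test)) ** 2))
print(f"held-out RMSLE: {rmsle:.3f}")
```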

Teams will make predictions based on these datasets and submit them to HPN through the official Heritage Health Prize web site. HPN will use the quiz dataset for the initial assessment of a team’s algorithms, and will evaluate and report scores back to the teams through the prize website’s leaderboard.

Scores from the final test dataset will not be made available to teams until the accuracy thresholds are passed. The test dataset will be used in the final judging, and its results will be kept hidden; these scores preserve the integrity of scoring and help validate the predictive algorithms. You can find more about online testing and judging here.

The American Hospital Association estimates that more than 71 million people are admitted to the hospital each year, and that $30 Billion is spent on unnecessary admissions.

Pagos released SpreadsheetWEB 3.2 (PSW for short) this week, with new Data Visualization features (Pagos Data Visualizer, or PDV for short). Among those features is the ability to drill down into any visible data through synchronized filters, which immediately makes SpreadsheetWEB a player in the Data Visualization market.

Tools like Tableau, Qlikview or Spotfire allow people to visualize data but have a very limited ability to collect and update data. PSW (Pagos SpreadsheetWEB), on the other hand, has been able since versions 1.x to convert any Excel spreadsheet into a web application and web-based data collector, to save the collected data into a SQL Server database (including the latest SQL Server 2008 R2), and to report on or visualize the data online through a SaaS web-based spreadsheet which looks and behaves like an Excel spreadsheet! SpreadsheetWEB has the unique ability to collect data in a batch process and run large datasets against a SpreadsheetWEB application. This video demonstrates data collection, data management and collaboration utilizing workflow capabilities and the SpreadsheetWEB Control Panel interface. SpreadsheetWEB can use a web service as a data source (like Excel does) and allows web-based spreadsheets to function as web services too:

One of the reasons why most people still use and like Excel as a BI tool is that they can use many of the built-in worksheet formulas to process data in real time while filtering a dashboard. SpreadsheetWEB converts those formulas and can execute them on the server. Database-driven SpreadsheetWEB applications support most Excel features, including worksheet formulas, 333+ Excel functions, formatting, and 33+ types of Excel charts as well as Sparklines,

also see video here:

as well as pivot tables, validation, comments, filters and hyperlinks, while almost completely eliminating the need for application and database developers, as well as the need for IT services. Basically, if a person knows Excel, then he knows how to use SpreadsheetWEB. SpreadsheetWEB (both 64-bit and 32-bit) has an HTML editor and scripting support (JavaScript), similar to what macros do for Excel (be aware that it is not a port of VBA):

Among the 3 DV Leaders, only Tableau is able to read Microsoft SQL Server Analysis Services (SSAS) data sources, which is a must for long-term success in the Visual Analytics market. SpreadsheetWEB has this functionality the same way Excel does, and it is therefore ahead of Qlikview and Spotfire in this extremely important department. Among other advanced Data Visualization features, SpreadsheetWEB supports maps in dashboards

and multi-page dashboard reports. I like the version control for applications and the server monitoring features; they can be very attractive for enterprise users. SpreadsheetWEB does not require SharePoint Server to execute Excel workbooks on the server: Pagos developed proprietary spreadsheet technology to achieve that independence from SharePoint Server (I personally consider SharePoint a virus). This makes Pagos very attractive to cost-conscious small and medium-size organizations. Installing SpreadsheetWEB requires only Windows Server and Microsoft SQL Server. In addition, SpreadsheetWEB works with the free SQL Server Express Edition, which is an additional saving for customers with small datasets.

For advanced Data Visualization functionality, Pagos established an OEM partnership with TIBCO and integrates SpreadsheetWEB with the TIBCO Spotfire Analytics Platform. For advanced SaaS features, including the strictest security and hosting requirements and SAS 70 compliance, Pagos partners with Rackspace.

SpreadsheetWEB is one of the few players in the market that offer Software-as-a-Service (SaaS) licensing along with traditional server licensing. Pagos has very attractive SaaS fees and extremely competitive pricing for those who want to buy their own SpreadsheetWEB server: $4,900 per SpreadsheetWEB server for 50 named users and 25 web applications and dashboards; that price is at least 10 times better than prices from Qlikview, Spotfire and Tableau. Pagos provides 44+ video tutorials, 53+ online demos, a free non-expiring trial and full wiki-based documentation for SpreadsheetWEB, so people can review, browse and evaluate SpreadsheetWEB well before they buy it.

Pagos has been in the BI business since 2002, profitable and fully self-funded since inception, with hundreds of customers. Pagos has other advanced BI-related products, like SpreadsheetLIVE (which offers a fully featured spreadsheet application environment within a web browser) and the Pagos Spreadsheet Component (which allows software developers to create web and desktop applications that can read, execute, and create Excel spreadsheets without requiring Microsoft Excel). If you compare SpreadsheetWEB with Microsoft’s own attempt to webify Excel and Microsoft’s own long list of unsupported Excel features, you can easily appreciate the significance of what Pagos has achieved!

Permalink: https://apandre.wordpress.com/2011/03/13/spreadsheetweb/

For many years Gartner has annoyed me every January by publishing the so-called “Magic Quadrant for Business Intelligence Platforms” (MQ4BI for short), and most vendors mentioned in it (this is funny: even Donald Farmer quotes MQ4BI) almost immediately republish it, either in the so-called reprint area of the Gartner website (e.g. here, for a few months) or on their own websites; some of them also make this “report” available to web visitors for free, in exchange for contact info. To channel my feelings toward Gartner into something constructive, I decided to produce my own “Quadrant” for Data Visualization Platforms (the DV “Quadrant”, or Q4DV for short); it is below, is a work in progress, and will be modified and republished over time:

The 3 DV Leaders (the green dots in the upper right corner of the Q4DV above) have been compared with each other and with the Microsoft BI stack on this blog, and were also voted on in my DV poll on LinkedIn. The MQ4BI report actually contains a lot of useful info, and it deserves to be used as one of the possible data sources for my new post, which has a more specific target: Data Visualization platforms. As I said above, I will call it a quadrant too: Q4DV. But before I do that, I have to comment on Gartner’s annual MQ4BI.

The MQ4BI customer survey included vendor-provided references as well as survey responses from BI users on Gartner’s BI summit and inquiry lists. There were 1,225 survey responses (funnily enough, almost the same number of responses as in my DV poll on LinkedIn), with 247 (20%) from non-vendor-supplied reference lists. Gartner promised to publish the Magic Quadrant customer survey results in 1Q11. Gartner has somewhat reasonable “Inclusion and Exclusion Criteria” (for the Data Visualization Q4DV I excluded some vendors from Gartner’s list and included a few others) and an almost tolerable but fuzzy BI market definition (based on 13 loosely pre-defined capabilities organized into 3 categories of functionality: integration, information delivery and analysis).

I also partially agree with the definition and usage of “Ability to Execute” as one (the Y axis) of the 2 dimensions of the bubble chart above (named the same way as the entire report, “Magic Quadrant for Business Intelligence Platforms”). However, I disagree with Gartner’s ordering of vendors by their ability to execute, and for DV purposes I had to completely change the order of DV vendors on the X axis (“Completeness of Vision”).

For Q4DV purposes I am reusing Gartner’s MQ as a template. I also excluded almost all vendors classified by Gartner as niche players with lower ability to execute (the bottom-left quarter of MQ4BI), except Panorama Software (Gartner put Panorama in last place, which is unfair), and will add the following vendors: Panopticon, Visokio, Pagos and maybe some others after further testing.

I am going to update this DV “Quadrant” using the method suggested by Jon Peltier: http://peltiertech.com/WordPress/excel-chart-with-colored-quadrant-background/ (thank you, Jon!). I hope I will have time for it before the end of 2011…
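
The linked tutorial builds the colored quadrant background in Excel; purely as an illustration of the same idea, here is a minimal matplotlib sketch (the vendor dots and their positions below are placeholders, not my actual Q4DV scores):

```python
# Hedged sketch of a colored-quadrant chart; positions are placeholders.
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 6))
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)

# Four quadrant backgrounds (axhspan xmin/xmax are axis fractions, which
# coincide with data coordinates here because the limits are 0..1).
ax.axhspan(0.5, 1.0, xmin=0.5, xmax=1.0, color="#d9ead3")  # leaders
ax.axhspan(0.5, 1.0, xmin=0.0, xmax=0.5, color="#fff2cc")  # challengers
ax.axhspan(0.0, 0.5, xmin=0.5, xmax=1.0, color="#cfe2f3")  # visionaries
ax.axhspan(0.0, 0.5, xmin=0.0, xmax=0.5, color="#f4cccc")  # niche players

vendors = {"Qlikview": (0.80, 0.85), "Tableau": (0.85, 0.80),
           "Spotfire": (0.90, 0.75), "Panorama": (0.40, 0.30)}
for name, (x, y) in vendors.items():
    leader = x > 0.5 and y > 0.5
    ax.plot(x, y, "o", color="green" if leader else "gray")
    ax.annotate(name, (x, y), xytext=(6, 4), textcoords="offset points")

ax.set_xlabel("Completeness of Vision")
ax.set_ylabel("Ability to Execute")
ax.set_title("Q4DV (illustrative)")
plt.show()
```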

Permalink: https://apandre.wordpress.com/2011/02/13/q4dv/

On New Year’s Eve I started the poll “What tool is better for Data Visualization?” on LinkedIn, and 1,340 people voted (an unusually high turnout for LinkedIn polls, most of which get fewer than 1,000 votes), on average one vote per hour over 8 weeks, which is statistically significant as a reflection of the fact that the Data Visualization market has 3 clear leaders, probably at least a generation ahead of all other competitors: Spotfire, Tableau and Qlikview. Spotfire is the top vote-getter: as of 2/27/11, 1pm EST, Spotfire got 450 votes (34%), Tableau 308 (23%), Qlikview 305 (23%; Qlikview’s result improved during the last 3 weeks of the poll), PowerPivot 146 (11%, more votes than all “Other” DV tools) and all other DV tools together got just 131 votes (10%); a quick arithmetic check of these counts appears after the comment summary below. The poll got 88 comments (more than 6% of voters commented on the poll!) and will be open for more unique voters until 2/27/11, 7pm; its results have been consistent during the last 5 weeks, so statistically the poll represents the user preferences of the LinkedIn population:

The URL is http://linkd.in/f5SRw9 , but you need to log in to LinkedIn.com to vote. Also see some demographic info about the poll voters (in a somewhat ugly visualization by … LinkedIn) below:

Interestingly, Tableau voters are younger than those for the other DV tools, and more than 82% of the voters in the poll are men. A summary of some comments:

  • The poll’s question is too generic, because the answer partially depends on what you are trying to visualize;
  • The poll is limited by LinkedIn restrictions, which allow no more than 5 possible/optional answers to a poll question;
  • The poll’s results may correlate with the number of Qlikview/Tableau/Spotfire groups (and the size of their membership) on LinkedIn, and also with the ability of the employees of the respective vendors to vote in favor of the tool produced by their company (I did not see this happen). LinkedIn has 85 groups related to Qlikview (with almost 5,000 members), 34 groups related to Tableau (with 2,000+ members total) and 7 groups related to Spotfire (with about 400 members total).
  • Randall Hand posted interesting comments about my poll here: http://www.vizworld.com/2011/01/tool-data-visualization/#more-19190 . I disagreed with some of Randall’s assessments: that “Gartner is probably right” (in my opinion Gartner is usually wrong when it talks about BI; I have posted about this on the blog, and Randall agreed with me) and that “IBM & Microsoft rule … markets”. In fact, IBM is very far behind (Qlikview, Spotfire and Tableau), and Microsoft, while it has excellent technologies (like PowerPivot and SSAS), is behind too, because Microsoft made the strategic mistake of not having a visualization product, only technologies for it.
  • Spotfire fans got some “advice” from Facebook, here: http://www.facebook.com/TIBCOSpotfire (the post said “TIBCO Spotfire LinkedIn users: Spotfire needs your votes! Weigh in on this poll and make us the Data Visualization tool of choice…”; there is nothing I can do to prevent people from doing that, sorry). I think the poll is statistically significant anyway, and voters from Facebook may have added just a couple of dozen votes for … their favorite tool.
  • Among the other Data Visualization tools mentioned in the 88 comments so far were JMP, R, Panopticon, Omniscope (from Visokio), BO/SAP Explorer and Xcelsius, IBM Cognos, SpreadsheetWEB, IBM’s Elixir Enterprise Edition, iCharts, UC4 Insight, Birst, Digdash, Constellation Roamer, BIme, Bissantz DeltaMaster, RA.Pid, Corda Technologies, Advizor, LogiXML, TeleView, etc.
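
As promised above, here is a quick check of the vote arithmetic (counts exactly as reported in this post):

```python
# Vote counts from the poll; the percentages round to 34/23/23/11/10.
votes = {"Spotfire": 450, "Tableau": 308, "Qlikview": 305,
         "PowerPivot": 146, "Other DV tools": 131}
total = sum(votes.values())                  # 1340 voters
for tool, n in votes.items():
    print(f"{tool:>14}: {n:4d} votes = {n / total:6.1%}")
print(f"{'total':>14}: {total} votes")
```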

Permalink: https://apandre.wordpress.com/dvpoll/

“Big Data Analytics” (BDA) is going to be the new buzzword for 2011. The same companies, and some new ones (in some cases even the same people), who tried for 20+ years to use the term BI to sell their underused software are now trying to use the new term BDA in the hope of increasing their sales and relevancy. Suddenly, one of the main reasons why BI tools are underused is the rapidly growing size of data.

Now a new generation of existing tools (Teradata, Exadata, Netezza, Greenplum, PDW, etc.) and of course “new” tools (can you say VoltDB, Aster Data (part of Teradata now!), the nPario “Platform”, Hadoop, MapReduce, Cassandra, R, HANA, Paradigm4, MPP appliances, etc., all of which are cool and hot at the same time) and companies will enable users to collect, store, access and manipulate much larger datasets (petabytes).

For users, the level of noise will now be much higher than before (and the SNR, the signal-to-noise ratio, will be lower), because BDA solves a HUGE back-end problem (massive amounts of data are everywhere, from genomes to RFID to application and network logfiles to health data), while users interact with the front end and care about trends, outliers, clusters, patterns, drilldowns and other visually intensive data phenomena. However, the SNR can be increased if BDA technologies are used together with, and as supporting tools for, the signal-producing tools, which are … Data Visualization tools.

An example of this is the recent partnership between Tableau Software and Aster Data (Teradata bought Aster Data in March 2011!). I know for sure that EMC is trying to partner Greenplum with the most viable Data Visualizers, Microsoft will integrate its PDW with PowerPivot and Excel, and I can imagine how to integrate Spotfire with BDA. Integrating Qlikview with BDA may be more difficult, since Qlikview can currently manipulate only data in its own memory. In any case, I see DV tools as the main attraction and selling point for end users, and I hope BDA vendors can and will understand this simple truth and behave accordingly.

Permalink: https://apandre.wordpress.com/2011/01/16/bigdata/

I have never before seen one man move from one company to another and 46+ people almost immediately comment on it. But this is what happened during the last few days, when Donald Farmer, the Principal Program Manager for the Microsoft BI Platform for 10 years, left Microsoft for Qliktech. Less than one year ago, Donald compared Qlikview and PowerPivot, and while he was respectful to Qlikview, his comparison favored PowerPivot and the Microsoft BI stack. I can think of, or guess, multiple reasons why he did it (and I quote him: “I look forward to telling you more about this role and what promises to be a thrilling new direction for me with the most exciting company I have seen in years”), for example:

  • Microsoft does not have a DV Product (and one can guess that Donald wants to be the “face” of a product),
  • Qliktech had a successful IPO and secondary offering (money talks, especially when a 700-strong company has a $2B and growing market capitalization),
  • lack of confidence in Microsoft’s BI Vision (one can guess that Donald has a different “vision”),
  • SharePoint is a virus (SharePoint created a billion-dollar industry, which one can consider wasted),
  • Qlikview makes a DV Developer much more productive (a cool 30 to 50 times more productive) than Microsoft’s toolset (Microsoft did not even migrate BIDS 2008 to Visual Studio 2010!),
  • and many others (Donald said that for him it is mostly user empowerment and user inspiration by Qlikview – it sounds like he was under-inspired by the Microsoft BI stack, so is it just a move away from Microsoft rather than a move to Qliktech? – I guess I need a better explanation),

but Donald did explain it in his next blog post: “QlikView stands out for me, because it not only enables and empowers users; QlikView users are also inspired. This is, in a way, beyond our control. BI vendors and analysts cannot prescribe inspiration“. I have to be honest – and I repeat it again – I wish for a better explanation… For example, one of my friends made a “ridiculous guess” that Microsoft sent Donald inside Qliktech to figure out whether and when it makes sense to buy Qliktech (I think it is too late for that, but at least it is an interesting thought: a good/evil buyer/VC/investor will do “due diligence” first, preferably internal and “technical due diligence” too) and who should stay and who should go.

I actually know other people who recently moved to Qliktech (e.g. from Spotfire), but I have a question for Donald about his new title: “QlikView Product Advocate”. According to http://dictionary.reference.com/ an Advocate is a person who defends, supports and promotes a cause. I will argue that Qlikview does not need any of that (no need to defend it, for sure; Qlikview has plenty of Supporters and Promoters); instead Qliktech needs a strong strategist and visionary (and Donald is the best at it) who can lead and convince Qliktech to add new functionality in order to stay ahead of the competition, which includes at least Tableau, Spotfire and Microsoft. One of many examples would be the ability to read … Microsoft’s SSAS multidimensional cubes, as Tableau 6.0 and Omniscope 2.6 can now.

Almost unrelated – I updated this page:  https://apandre.wordpress.com/market/competitors/qliktech/

Permalink: https://apandre.wordpress.com/2011/01/09/farmer_goes_2_qlikview/

Happy holidays to the visitors of this blog and my best wishes for 2011! December 2010 was so busy for me that I did not have time to blog about anything. I will just mention some news in this last post of 2010.

Tableau’s sales will exceed $40M in 2010 (and they are planning to employ 300+ by the end of 2011!), which is almost 20% of Qliktech’s sales in 2010. My guesstimate (if anybody has better data, please comment on it) is that Spotfire’s sales in 2010 are about $80M. Qliktech’s market capitalization recently exceeded $2B, more than twice Microstrategy’s cap ($930M as of today)!

I recently noticed that Gartner is trying to coin a new catchphrase because the old one does not work (referring to BI, which never worked, because intelligence is an attribute of humans, not an attribute of businesses). Now they are saying that for the last 20+ years, when they talked about business intelligence (BI), they meant an intelligent business. I think this is confusing because (at least in the USA) business is all about profit, and Chief Business Intelligent Dr. Karl Marx would agree with that. I respect the phrase “Profitable Business”, but “Intelligent Business” reminds me of the old phrase “Crocodile tears“. Gartner is also saying that BI projects should be treated as a “cultural transformation”, which reminds me of a road paved with good intentions.

I also noticed the huge attention paid by Forrester to Advanced Data Visualization, probably for 4 good reasons (I have different reasoning, but I am not part of Forrester):

  • data visualization can fit many more data points (tens of thousands) into one screen or page, compared with numerical information in a datagrid (hundreds of datapoints per screen);
  • the ability to visually drill down and zoom through interactive and synchronized charts;
  • the ability to convey the story behind the data to a wider audience through data visualization;
  • analysts and decision makers cannot see patterns (and in many cases also trends and outliers) in data without data visualization – see the 37+ year-old example known as Anscombe’s quartet, which comprises four datasets that have identical simple statistical properties, yet appear very different when visualized (a sketch reproducing this follows the table below). They were constructed by F.J. Anscombe to demonstrate the importance of Data Visualization (DV):
Anscombe’s quartet

          I               II              III             IV
     x      y        x      y        x      y        x      y
   10.0   8.04     10.0   9.14     10.0   7.46      8.0   6.58
    8.0   6.95      8.0   8.14      8.0   6.77      8.0   5.76
   13.0   7.58     13.0   8.74     13.0  12.74      8.0   7.71
    9.0   8.81      9.0   8.77      9.0   7.11      8.0   8.84
   11.0   8.33     11.0   9.26     11.0   7.81      8.0   8.47
   14.0   9.96     14.0   8.10     14.0   8.84      8.0   7.04
    6.0   7.24      6.0   6.13      6.0   6.08      8.0   5.25
    4.0   4.26      4.0   3.10      4.0   5.39     19.0  12.50
   12.0  10.84     12.0   9.13     12.0   8.15      8.0   5.56
    7.0   4.82      7.0   7.26      7.0   6.42      8.0   7.91
    5.0   5.68      5.0   4.74      5.0   5.73      8.0   6.89
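If you want to reproduce this effect yourself, here is a minimal Python sketch (assuming numpy and matplotlib are installed; the data is exactly the table above): it prints the near-identical summary statistics and draws the four very different scatter plots.

    import numpy as np
    import matplotlib.pyplot as plt

    # Anscombe's quartet, exactly as in the table above
    x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
    x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
    ys = [
        [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],  # I
        [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],   # II
        [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],  # III
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],  # IV
    ]
    xs = [x123, x123, x123, x4]

    fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
    for label, x, y, ax in zip("I II III IV".split(), xs, ys, axes.flat):
        x, y = np.asarray(x), np.asarray(y)
        # The simple statistics are (nearly) identical across all four datasets...
        print(label, round(x.mean(), 2), round(y.mean(), 2),
              round(np.corrcoef(x, y)[0, 1], 3))
        # ...but the scatter plots look completely different
        ax.scatter(x, y)
        ax.set_title("Dataset " + label)
    plt.show()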

In the 2nd half of 2010 all 3 DV leaders released new versions of their beautiful software: Qlikview, Spotfire and Tableau. Visokio’s Omniscope 2.6 will be available soon, and I have been waiting for it since June 2010… In 2010 Microsoft, IBM, SAP, SAS, Oracle, Microstrategy etc. all tried hard to catch up with the DV leaders, and I wish all of them the best of luck in 2011. Here is a list of some other things I still remember from 2010:

  • Microsoft officially declared that it prefers BISM over OLAP and will invest in their futures accordingly. I am very disappointed with Microsoft, because it did not include BIDS (Business Intelligence Development Studio) in Visual Studio 2010. Even with the release of the supercool and free PowerPivot, it is likely now that Microsoft will not be a leader in DV (Data Visualization), given that it discontinued ProClarity and PerformancePoint, and considering the ugliness of SharePoint. Project Crescent (the new visualization “experience” from Microsoft) was announced 6 weeks ago, but there are still not many details about it, except that it is mostly done with Silverlight 5 and a Community Technology Preview will be available in the 1st half of 2011.
  • SAP bought Sybase, released new version 4.0 of Business Objects and HANA “analytic appliance”
  • IBM bought Netezza and released Cognos 10.
  • Oracle released OBIEE 11g with ROLAP and MOLAP unified
  • Microstrategy released its version 9 Release 3 with much faster performance, integration with ESRI and support for web-service data
  • EMC bought Greenplum and started a new DCD (Data Computing Division), which is an obvious attempt to join the BI and DV market
  • Panorama released NovaView for PowerPivot, which natively connects to PowerPivot in-memory models.
  • Actuate’s BIRT was downloaded 10 million times (!) and has over a million (!) BIRT developers
  • Panopticon 5.7 was released recently (on 11/22/10) and adds the ability to display real-time streaming data.

David Raab, one of my favorite DV and BI gurus, published on his blog an interesting comparison of some leading DV tools. According to David’s scenario, one possible ranking of DV Tools is: Tableau 1st, then Advizor (version 5.6 available since June 2010), then Spotfire and Qlikview (it seems to me David implied that order). In my recent DV comparison “my scenario” gave a different ranking: Qlikview slightly ahead, with Spotfire and Tableau sharing 2nd place (but very competitive with Qlikview) and Microsoft a distant 4th, but it is possible that David knows something which I don’t…

In addition to David, I want to thank  Boris Evelson, Mark Smith, Prof. Shneiderman, Prof. Rosling, Curt Monash, Stephen Few and others for their publications, articles, blogs and demos dedicated to Data Visualization in 2010 and before.

Permalink: https://apandre.wordpress.com/2010/12/25/hny2011/

Microsoft reused its patented VertiPaq column-oriented DB technology in the upcoming SQL Server 11.0 release by introducing columnstore indexes, where each column is stored in a separate set of disk pages. Below is a “compressed” extraction from a Microsoft publication, and I think it is very relevant to the future of Data Visualization technologies. Traditionally an RDBMS uses a “row store”, where a heap or a B-tree contains multiple rows per page. In a columnstore index, the columns are stored in different groups of pages. The benefits of this are (see also the toy sketch after this list):

  • only the columns needed to solve a query are fetched from disk (this is often fewer than 15% of the columns in a typical fact table),
  • it’s easier to compress the data due to the redundancy of data within a column, and
  • buffer hit rates are improved because data is highly compressed, and frequently accessed parts of commonly used columns remain in memory, while infrequently used parts are paged out.
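To make the first two benefits concrete, here is a toy Python sketch – my own illustration, not SQL Server’s actual implementation – of why a column store touches less data and compresses better than a row store:

    from itertools import groupby

    # A tiny fact table: 100,000 rows, 4 columns
    n = 100_000
    rows = [(i, i % 50, "store-%d" % (i % 10), 1.0) for i in range(n)]  # "row store"

    # "Column store": each column kept as its own sequence
    # (in SQL Server, a separate set of disk pages)
    cols = {"id": [r[0] for r in rows],
            "region": [r[1] for r in rows],
            "store": [r[2] for r in rows],
            "amount": [r[3] for r in rows]}

    # Query: SUM(amount). The row store drags all 4 columns through memory...
    total_rows = sum(r[3] for r in rows)
    # ...while the column store reads only the one column the query needs
    total_cols = sum(cols["amount"])
    assert total_rows == total_cols

    # Redundancy within a column makes it easy to compress (run-length encoding)
    rle = [(v, len(list(g))) for v, g in groupby(sorted(cols["store"]))]
    print("%d values compressed to %d (value, count) pairs" % (n, len(rle)))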

“The columnstore index in SQL Server employs Microsoft’s patented Vertipaq™ technology, which it shares with SQL Server Analysis Services and PowerPivot. SQL Server columnstore indexes don’t have to fit in main memory, but they can effectively use as much memory as is available on the server. Portions of columns are moved in and out of memory on demand.” SQL Server is the first major database product to support a pure columnstore index. Columnstore indexes are recommended for fact tables in a data warehouse, for large dimensions (say, with more than 10 million records) and for any large tables designated to be used as read-only.

“In memory-constrained environments when the columnstore working set fits in RAM but the row store working set doesn’t fit, it is easy to demonstrate thousand-fold speedups. When both the column store and the row store fit in RAM, the differences are smaller but are usually in the 6X to 100X range for star join queries with grouping and aggregation.” Your results will of course depend on your data, workload, and hardware. Columnstore index query processing is most heavily optimized for star join queries. OLTP-style queries, including point lookups, and fetches of every column of a wide row, will usually not perform as well with a columnstore index as with a B-tree index.

Columnstore compressed the data by a factor of 4 to 15 on different fact tables. The columnstore index is a secondary index; the row store is still present, though during query processing it is often not needed and ends up being paged out. A clustered columnstore index, which will be the master copy of the data, is planned for the future. This will give significant space savings.

Tables with columnstore indexes can’t be updated directly using INSERT, UPDATE, DELETE, and MERGE statements, or bulk load operations. To move data into a columnstore table you can switch in a partition, or disable the columnstore index, update the table, and rebuild the index. Columnstore indexes on partitioned tables must be partition-aligned. Most data warehouse customers have a daily, weekly or monthly load cycle, and treat the data warehouse as read-only during the day, so they’ll almost certainly be able to use columnstore indexes. You can also create a view that uses UNION ALL to combine a table with a columnstore index and an updatable table without a columnstore index into one logical table. This view can then be referenced by queries. This allows dynamic insertion of new data into a single logical fact table while still retaining much of the performance benefit of the columnstore capability.

Most important for DV systems is this statement: “Users who were using OLAP systems only to get fast query performance, but who prefer to use the T-SQL language to write queries, may find they can have one less moving part in their environment, reducing cost and complexity. Users who like the sophisticated reporting tools, dimensional modeling capability, forecasting facilities, and decision-support specific query languages that OLAP tools offer can continue to benefit from them. Moreover, they may now be able to use ROLAP against a columnstore-indexed SQL Server data warehouse, and meet or exceed the performance they were used to in the past with OLAP, but save time by eliminating the cube building process“. This sounds like Microsoft finally figured out how to compete with Qlikview (technology-wise only, because Microsoft still does not have – maybe intentionally(?) – a DV product).

Permalink: https://apandre.wordpress.com/2010/12/03/columnstore-index/

SAP released HANA today, which does in-memory computing with an in-memory database. A sample appliance has 10 blades with 32 cores each (using XEON 7500); the sample (another buzzword: “data source agnostic”) appliance costs approximately half a million dollars. SAP claimed that “Very complex reports and queries against 500 billion point-of-sale records were run in less than one minute” using parallel processing. SAP HANA “scales linearly”, with performance proportional to hardware improvements, which enables complex real-time analytics.

Pricing will likely be value-based, and SAP is looking for an all-in figure of around $10 million per deal. Each deal will be evaluated based upon requirements, and during the call the company confirmed that each engagement will be unique (so SAP is hoping for 40-60 deals in the pipeline).

I think with such pricing and data size the HANA appliance (as well as other pricey data appliances) can be useful mostly in 2 scenarios:

  • when it integrates with mathematical models to enable users to discover patterns, clusters, trends, outliers and hidden dependencies, and
  • when those mountains of data can be visualized, interactively explored and searched, drilled down and pivoted…

8/8/11 Update: The 400 million-euro ($571 million) pipeline for Hana, which was officially released in June, is the biggest in the history of Walldorf, Germany-based SAP, the largest maker of business-management software. It’s growing by 10 million euros a week, co-Chief Executive Officer Bill McDermott said last month. BASF, the world’s largest chemical company, has been able to analyze commodity sales 120 times faster with Hana, it said last month. Russian oil producer OAO Surgutneftegas, which has been using Hana in test programs since February, said the analysis of raw data directly from the operational system made an additional data warehouse obsolete.

Permalink: https://apandre.wordpress.com/2010/12/01/sap-hana/

Microsoft used to be the greatest marketing machine in the software industry. But after losing the search business to Google and the smartphone business to Apple and Google, they lost their winning skills. It is clear now that this is also true in the so-called BI Market (Business Intelligence is just a marketing term). Microsoft bought ProClarity and it disappeared; they released PerformancePoint Server and it is disappearing too. They have (or had?) the best BI Stack (SQL Server 2008 R2 and its Analysis Services, Business Intelligence Development Studio 2008 (BIDS), Excel 2010, PowerPivot etc.), and yet they failed to release any BI or Data Visualization Product, despite having all the technological pieces and components. Microsoft even released Visual Studio 2010 without any support for BIDS, and recently they talked about their Roadmap for BI and again – they delayed any mention of BIDS 2010 and declared NO plans for BI or DV products! Instead they are talking about a “new ad hoc reporting and data visualization experience codenamed “Project Crescent””!

And then there is the BISM model as a part of the Roadmap: “A new Business Intelligence Semantic Model (BISM) in Analysis Services that will power Crescent as well as other Microsoft BI front end experiences such as Excel, Reporting Services and SharePoint Insights”.

An Experience and a Model instead of a Product? What Microsoft did with PowerPivot is clear: they gave some users a reason to upgrade to Office 2010, and as a result Microsoft preserved and protected (for another 2 years?) their lucrative Office business, but diminished their chances to get a significant piece of the $11B (and growing 10% per year) BI Market. The new BISM (Business Intelligence Semantic Model) is a clear sign of a losing technological edge.


I have to quote (because they finally admitted that BIDS will be replaced by BISM – when “Project Juneau” becomes available): “The BI Semantic Model can be authored by BI professionals in the Visual Studio 2010 environment using a new project type that will be available as part of “Project Juneau”. Juneau is an integrated development environment for all of SQL Server and subsumes the Business Intelligence Development Studio (BIDS). When a business user creates a PowerPivot application, the model that is embedded inside the workbook is also a BI Semantic Model. When the workbook is published to SharePoint, the model is hosted inside an SSAS server and served up to other applications and services such as Excel Services, Reporting Services, etc. Since it is the same BI Semantic Model that is powering PowerPivot for Excel, PowerPivot for SharePoint and Analysis Services, it enables seamless transition of BI applications from Personal BI to Team BI to Organizational (or Professional) BI.”

The funniest part of the quote above is that Microsoft honestly believes that SharePoint is not a virus but a viable Product, and that it will escape the fate of its “step-brother” – PerformancePoint Server. Sweet dreams! It is clear that Microsoft failed to understand that Data Visualization is the future of the BI market, and they keep recycling for themselves the obvious lie that “Analysis Services is the industry leading BI platform in this space today“! Indirectly they acknowledged it in the very next statement: “With the introduction of the BI Semantic Model, there are two flavors of Analysis Services – one that runs the UDM (OLAP) model and one that runs the BISM model”. Hello?

Why do we need 2 BI Models instead of 1 BI product? BIDS 2008 itself is already a buggy and much less productive development environment than Qlikview, Spotfire and Tableau, but now Microsoft wants us to be confused by 2 co-existing approaches: OLAP and BISM? And now get this: “you should expect to see more investment put into the BISM and less in the UDM(OLAP)”!

Dirty Harry would say in such a situation: “Go ahead, make my day!” And I guess that Microsoft does not care that Apple’s market cap is larger than Microsoft’s now.

Afterthought (looking at this from a 2011 point of view): I think I now know why Donald Farmer left Microsoft 2 months after the BISM announcement above.

p010: http://wp.me/pCJUg-7r

It looks like the honeymoon for Qlikview after Qliktech’s IPO is over. In addition to Spotfire 3.2/Silver, we now have a 3rd great piece of software in the form of Tableau 6. Tableau 6.0 was released today (both 32-bit and 64-bit) with a new in-memory data engine (very fast, say 67 million rows in 2 seconds) and quick data blending from multiple data sources while normalizing across them. This Data Visualization software is available as a Server (with web browsers as free Clients) and as a Desktop (Pro for $1999, Personal for $999, Reader for free).

New Data Sources include local PowerPivot files (!) and Aster Data; new Data Connections include OData and the (recently released) Windows Azure Marketplace DataMarket; a Data Connection can be Direct/Live or go to the in-memory data engine. Tableau 6 does full or partial automatic data updates; supports parameters for calculations, what-if modeling, and selectability of displayed fields on a chart’s axis; supports combo charts of any pair of charts; has new project views; and supports Motion Charts (a la Hans Rosling) etc. Also see Ventana Research and comments by Tableau followers. This post may be expanded, since today is officially the 1st day of the release.
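Since “data blending” may be a new term for some readers: conceptually, it joins records from independent sources on a common field after normalizing that field across the sources. A minimal pandas sketch with made-up data (my illustration of the concept, not Tableau’s engine):

    import pandas as pd

    # Two independent sources describing the same entities differently
    sales = pd.DataFrame({"Country": ["US", "FR", "DE"],
                          "Sales": [100.0, 80.0, 90.0]})
    quota = pd.DataFrame({"country_name": ["United States", "France", "Germany"],
                          "Quota": [120.0, 70.0, 95.0]})

    # Normalize the common field across sources, then blend on it
    names = {"United States": "US", "France": "FR", "Germany": "DE"}
    quota["Country"] = quota["country_name"].map(names)
    blended = sales.merge(quota[["Country", "Quota"]], on="Country")
    print(blended.assign(Attainment=blended.Sales / blended.Quota))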

n009: http://wp.me/sCJUg-tableau6

DV (Data Visualization) makes more sense when you are trying to visualize huge datasets, which indirectly implies an eventual need for DW (Data Warehouses) and DW appliances (DWA). Among the pioneers of DWA we can name Teradata. This was not a very hot area until 7/6/10, when EMC bought Greenplum with its own MPP architecture. On 9/20/10 IBM bought Netezza for $1.7B, and the DWA market officially became hot, in anticipation of DV and BI users needing a lot of DWA for their “big data”. Teradata claimed 2 years ago that Netezza is far behind performance-wise, but apparently IBM disagrees or does not care… Please note that Netezza, before it was bought, pro-actively partnered with DV vendors, using them as a way to expand its market share – and this points us to the future.

With the “big data” buzz everywhere, I suspect a large wave of partnerships between DWA vendors (EMC DCA (Data Computing Appliance), IBM, Teradata, Microsoft / DATAllegro, Oracle / Exadata, SAP (HANA + Sybase IQ), as well as vendors of virtual DWAs) and DV vendors is coming in 2011. Data Visualization makes DWA much more attractive for end users with huge datasets! Microsoft’s PDW was released on 11/9/10, and SAP HANA will be released in November 2010 too.

p008: http://wp.me/sCJUg-dwa

BI and DV vendors do not want me to relax, and they keep releasing new stuff too often. I feel guilty now, and I will comment (3+ months after it was released) on the Spotfire 3.2 release soon. But today I have to comment on the Cognos 10 release (which will be available Oct. 30; everybody does pre-announcements now: 2 weeks ago Qlikview 10, yesterday BO4, today Cognos 10). I quote: “IBM acquired Cognos in early 2008 during a five year buying spree that saw it swallow over 24 analytics companies in five years for a total bill of US$14 billion”. Rob Ashe, general manager for BI at IBM, said: “Analytics is a key part of our 2015 roadmap. Last year, analytics contributed $9 billion to our revenues, and we expect to see that grow to $16 billion in 2015.”

Cognos 10 embeds SPSS and Lotus Connections, supports SaaS, active/interactive reports via email (no need to install anything), mobile devices such as iPhones, iPads and BlackBerrys (as well as Symbian phones and Windows Mobile devices), real-time updates, and has a “modern” Web 2.0 user interface. Cognos TM1 (from Applix) is a multidimensional, 64-bit, in-memory OLAP engine which provides fast performance for analyzing complex and sophisticated models, large data sets and even streamed data.

Personally I think Cognos 10 compares favorably against BO4, SAS 9.2 and OBIEE 11g, but all 4 have at least 2 common problems: they are all engaged too much with Java, and they are far behind Qlikview, Spotfire, Omniscope, Tableau etc. in Data Visualization.

n006: http://wp.me/pCJUg-4Z

“Business Objects 4.0 will be available this (2010) year” – SAP teases its own customers at ASUG. It has become a habit for SAP to say something about a product they have not released yet. For example, they did a pre-announcement of HANA (the in-memory analytics appliance) in May 2010, see http://www.infoworld.com/d/applications/sap-build-new-in-memory-database-appliances-392 , and now they are saying that HANA will be released in November 2010: http://www.infoworld.com/d/applications/saps-in-memory-analytics-boxes-set-november-release-117 . It is very funny to see how 3 (SAP, IBM, Oracle) or 4 (if you include the mindshare leader SAS) BI behemoths are trying to compete (using money instead of creativity) with DV leaders like Qlikview and Spotfire, who have had in-memory columnar DBs for years. E.g. IBM recently bought Netezza, SPSS and Applix and is trying to marry Applix with Cognos. Or Oracle (after buying Sun) is releasing Exadata and Exalogic to compete with… IBM’s Netezza and SAP’s HANA. SAP actually owns now (after recently buying Sybase) the best collection of BI- and DV-related technologies, like the best columnar DB, Sybase IQ (ok, Vertica too, but Qlikview, PowerPivot and Spotfire have theirs in-memory).

Back to BO4: it will be 64-bit only, Desktop Intelligence will not be included in this release, BO4 will be more dependent on Java (SAP, IBM, Oracle and SAS – all 4 are making a strategic mistake by integrating their products with dying Java), BO4 will have “data federation”, BO4 will be integrated with the SAP Portfolio (e.g. NetWeaver), BO4 now has multi-dimensional analytical ability, and SAP Explorer allows in-memory Analytics etc. It took SAP 4+ months from pre-announcement to release of BO4 – I guess they learned from Microsoft (I am not sure how that helps).

Update as of 7/27/11: BI 4.0 is still not released, and SAP is now planning to release it in August 2011, basically 10 months later than it was pre-announced! Among other updates: on 7/25/11 SAP released an interesting video with a demo:

Update as of 8/31/11: It took SAP 11 months from the pre-announcement of BO4 to officially release it, see http://blogs.sap.com/analytics/2011/08/31/update-on-sap-businessobjects-bi-4-0-general-availability/ . SAP said today: “Based on efforts over the last several weeks, BI 4.0 is targeted to become generally available starting September 16, 2011.” Also: “For customers and partners currently using BI 4.0, new eLearning tutorials are now available on the SAP Community Network. Check out the latest tutorials and take advantage of the new capabilities BI 4.0 has to offer.” It is a very funny and very sad RELEASE process.

Enterprise Deployment of SAP BO may look like this:


n005: http://wp.me/pCJUg-4o


Tableau added 1500 new customers during the last year (5500 total; it is also used by Oracle on an OEM basis as Oracle Hyperion Visual Explorer), had $20M in sales in 2009, with Q3 of 2010 showing 123% growth over the same period a year ago, and is claiming to be the fastest growing software company in the BI market (faster than Qliktech), see http://www.tableausoftware.com/press_release/tableau-massive-growth-hiring-q3-2010

Tableau 6.0 will be released next month; they claim it is 100 times faster than the previous version (5.2), with an in-memory columnar DB, 64-bit support and optional data compression. They are so confident (due to increasing sales) that they posted 40 job openings last week (they had 99 employees in 2009, 180 now, and plan to have 200 by the end of 2010). Tableau is raising (!) the price of Tableau Desktop Professional from $1800 to $1999 in November 2010, while Personal will stay at $999. They aim directly at Qliktech, saying (through a loyal customer) this: “Competitive BI software like QlikView from QlikTech is difficult to use without a consultant or IT manager by your side, a less than optimal allocation of our team’s time and energy. Tableau is a powerful tool that’s easy to use, built to last, and continues to impress my customers.”

In Tableau’s new sales pitch they claim (among 60 other new features):

  • New super-fast data engine that can cross-tab 10 million rows in under 1 second
  • The ability to blend data from multiple sources in just a click
  • Create endless combination graphs such as bars with lines, circles with bars, etc.

n004: http://wp.me/pCJUg-3Z

Qliktech released, as planned, the new version 10 of Qlikview last week (see http://www.qlikview.com/us/company/press-room/press-releases/2010/us/1012-qlikview-10-delivers-consumer-bi-software ), delivering a lot of new functionality – see

https://apandre.wordpress.com/wp-content/uploads/2010/10/ds-whats-new-in-qlikview-10-en.pdf

for details – in addition to its already impressive list: an in-memory columnar database and the leading set of visual controls (pie/10, bar/7, column/7, line/6, combo/6, area/4, radar/4, scatter/5, bubble/3, heat-map/block/5, gauge/7, pivot/12, table/12, funnel/2, mekko, sparkline, motion charts etc.), totaling more than 80 different charts (almost comparable with Excel 2010 diversity-wise). Qlikview has enjoyed the position of DV Leader in the Data Visualization market for the last few years, thanks to the functionality above and to its charts, which function as visual filters with interactive drill-down functionality, with the best productivity for developers, the easiest UI and a multitude of clients (desktop, IE plugin, Java, ajax, most smartphones). Also take a look at this: http://www.ventanaresearch.com/blog/commentblog.aspx?id=4006 and this: http://customerexperiencematrix.blogspot.com/2010/12/qlikviews-new-release-focuses-on.html

Qliktech recently had a successful IPO and secondary offering (see http://www.google.com/finance?q=Qlik ), which brought Qliktech’s capitalization close to $2B. The DV competition is far from over: recently Qlikview got very strong competition from Spotfire 3.2, PowerPivot and the upcoming (this or next month) releases of Tableau 6 and Omniscope 2.6. And don’t forget the DV misleaders with a bunch of money trying to catch up: SAP, IBM, Oracle, Microsoft, Microstrategy, even Google and others are trying very hard to be DV contenders. (n002: https://apandre.wordpress.com/2010/10/19/qlikview10/)

Qliktech uses this Diagram to present its current set of Components and DataFlow between them:

QV10 Components and DataFlow.


My original intention was to write a book about Data Visualization, but I realized that any book in the Data Visualization area would become obsolete very quickly, and that a blog is a much more appropriate format. This blog was started just a few months ago and it is always a work in progress, because in addition to the blog’s posts it has multiple webpages, most of which will be completed over time, at approximately 1 post or page per week. After a few months of blogging I really started to appreciate what E.M. Forster (in “Aspects of the Novel”), Graham Wallas (in “The Art of Thought”) and Andre Gide said almost 90 years ago: “How do I know what I think until I see what I say?”.

So yes, it is under construction as a website and it is mostly a weekly blog.

Update for 3/24/2011: This site has gotten 22 posts since the first post (since January 2010, roughly one post per 2 weeks), 43 (and still growing) pages (some of them incomplete, and all a work in progress), 20 comments, and in the last few weeks it has been getting (on average) almost 200 visitors per day (a number that is actually growing steadily). I am starting to get a lot of feedback, and some of the new posts were actually prompted by questions and requests from visitors and by phone conversations with some of them (they asked to keep their confidentiality).

Update for 11/11/11: This site/blog has (as of today) 46 posts and 61 pages (about 1 post or page per week, or should I say per weekend), 46 comments, hundreds of images and demos, 400+ visitors per weekday and 200+ visitors on weekend days, and many RSS and email subscribers. Almost half of the new content on this blog/site is now created due to demand from visitors and in response to their needs and requests. I can claim now that it is a visitor-driven blog, very much aligned with the current state of the science and art of Data Visualization.

Update for 9/8/12: 67 posts, 65 pages, 133 comments, 12000+ visitors per month, a Google+ extension of this Blog with 1580+ followers here: https://plus.google.com/u/0/111053008130113715119/posts#111053008130113715119/posts , and 435 images, diagrams and screenshots.

Permalink: https://apandre.wordpress.com/2010/09/03/dvblogasworkinprogress/

Published the comparison of 4 leading DV Products, see http://wp.me/PCJUg-1T

I did not include in the comparison the 5th leading product – Visokio’s Omniscope – because it has very limited scalability due to the specifics of its implementation: Java does not allow visualizing too much data. Among the factors to consider when comparing DV tools:

  • – memory optimization [Qlikview is the leader in in-memory columnar database technology];
  • – load time [I tested all the products above and PowerPivot is the fastest];
  • – memory swapping [Spotfire is the only one that can use a disk as virtual memory, while Qlikview is limited by RAM];
  • – incremental updates [Qlikview is probably the best in this area];
  • – thin clients [Spotfire has the best THIN/Web/ZFC (zero-footprint) client, especially with the recent release of Spotfire 3.2 and Spotfire Silver];
  • – thick clients [Qlikview has the best THICK client];
  • – access by 3rd party tools [PowerPivot’s integration with Excel 2010, SQL Server 2008 R2 Analysis Services and SharePoint 2010 is a big attraction];
  • – interface with SSAS cubes [PowerPivot has it, Tableau has it, Omniscope will have it very soon; Qlikview and Spotfire do not have it];
  • – GUI [a 3-way tie; it heavily depends on personal preferences, but in my opinion Qlikview is easier to use than the others];
  • – advanced analytics [Spotfire 3.2 is the leader here, with its integration with S-PLUS and support for IronPython and other add-ons];
  • – the productivity of developers involved with the tools mentioned above [in my experience Qlikview is a much more productive tool in this regard].

p003: http://wp.me/pCJUg-3R

Since I commented on recent releases of competing DV products (Qlikview, Tableau, Cognos, Business Objects etc.), I feel the need to post about Spotfire 3.2. For me the most important new feature in 3.2 is the availability of all the functionality of the Spotfire THICK client in the Spotfire 3.2 WebPlayer; specifically, the Spotfire WebPlayer can now do the same visual drill-down that Qlikview has done for a while. Overall the 3.2 release enabled Spotfire to catch up with Qlikview and become a co-leader in the DV market. Also, Spotfire Clinical 3.2 was released, which enables Spotfire to connect with Oracle Clinical databases. TIBCO Spotfire offers a unique memory-swapping or paging feature, which lets it analyze models that are larger than a single available memory space.

Among the new features are the ability to export any Pages and Visualizations to PDF, improved integration with S-PLUS and IronPython, the ability to embed more than 4GB (actually unlimited) of an application’s data into the application file (the TIBCO Spotfire Binary Data Format file), and other improvements like subtotals in the Cross Table, SSO with NTLMv2 (Vista, Win7), List Tools and LDAP synchronization, and multiple localizations for major Asian and European languages. Update on 11/2/10: TIBCO released Spotfire WebPlayer 3.2.1, which now fully supports the iPad and its native multi-touch interface.

A few days later, on 7/14/10, TIBCO released Spotfire Silver as a fully SaaS/ZFC version of Spotfire 3.2, designated for Self-Service BI users who prefer to minimize their interactions with their own IT/MIS departments. Spotfire Silver is ahead of all DV competitors in terms of a fully web-based but fully functional DV environment.

In case users prefer a behind-firewall clustering and fail-over configuration for a Spotfire deployment, it may look like this:

n007=http://wp.me/pCJUg-5n

Data Visualization stands on the shoulders of giants – previously tried and true technologies like Columnar Databases, in-memory Data Engines and multi-dimensional Data Cubes (also known as OLAP Cubes).

An OLAP (online analytical processing) cube on one hand extends a 2-dimensional array (a spreadsheet table, or an array of facts/measures plus keys/pointers to dictionaries) to a multidimensional DataCube; on the other hand, the DataCube uses data warehouse schemas like the Star Schema or the Snowflake Schema.


The OLAP cube consists of facts, also called measures, categorized by dimensions (there can be many more than 3 Dimensions; dimensions are referenced from the Fact Table by “foreign keys”). Measures are derived from the records in the Fact Table, and Dimensions are derived from the dimension tables, where each column represents one attribute (also called a dictionary; a dimension can have many attributes). Such a multidimensional DataCube organization is close to Columnar DB data structures. One of the most popular usages of DataCubes is visualizing them in the form of Pivot tables, where attributes are used as rows, columns and filters, while the values in the cells are appropriate aggregates (SUM, AVG, MAX, MIN, etc.) of measures.
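Here is a minimal pandas sketch of that Fact-Table-plus-dimensions idea (my own star-schema-flavored toy example, not tied to any particular OLAP server):

    import pandas as pd

    # Dimension table: each column is one attribute ("dictionary")
    product_dim = pd.DataFrame({"product_id": [1, 2, 3],
                                "category": ["Bikes", "Bikes", "Helmets"]})

    # Fact table: measures plus foreign keys into the dimension tables
    facts = pd.DataFrame({"product_id": [1, 2, 3, 1, 2],
                          "year": [2010, 2010, 2010, 2011, 2011],
                          "sales": [100.0, 150.0, 30.0, 120.0, 160.0]})

    # Join facts to dimensions, then pivot: attributes become rows/columns,
    # cells hold aggregates (here SUM) of the measure
    cube = facts.merge(product_dim, on="product_id")
    print(pd.pivot_table(cube, values="sales", index="category",
                         columns="year", aggfunc="sum"))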

OLAP operations are the foundation for most of the UI and functionality used by Data Visualization tools. The DV user (sometimes called an analyst) navigates through the DataCube and its DataViews for a particular subset of the data, changing the data’s orientation and defining analytical calculations. The user-initiated process of navigating by calling for page displays interactively, through the specification of slices via rotations and drill down/up, is sometimes called “slice and dice”. Common operations include slice and dice, drill down, roll up, and pivot (a small sketch mimicking them follows the definitions below):

Slice:

A slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset.

Dice:

The dice operation is a slice on more than two dimensions of a data cube (or more than two consecutive slices).

Drill Down/Up:

Drilling down or up is a specific analytical technique whereby the user navigates among levels of data ranging from the most summarized (up) to the most detailed (down).

Roll-up:

(Aggregate, Consolidate) A roll-up involves computing all of the data relationships for one or more dimensions. To do this, a computational relationship or formula might be defined.

Pivot:

This operation is also called rotate operation. It rotates the data in order to provide an alternative presentation of data – the report or page display takes a different dimensional orientation.
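All of these operations are easy to mimic on a small in-memory “cube”; here is a pandas sketch (the toy data and naming are mine, not any vendor’s API):

    import pandas as pd

    cube = pd.DataFrame({
        "year":    [2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011],
        "region":  ["East", "East", "West", "West"] * 2,
        "product": ["Bikes", "Helmets"] * 4,
        "sales":   [10.0, 2.0, 8.0, 1.0, 12.0, 3.0, 9.0, 2.0],
    })

    slice_ = cube[cube.year == 2010]                   # Slice: fix one dimension member
    dice = cube[(cube.year == 2010) & (cube.region == "East")]   # Dice: fix several
    rollup = cube.groupby("region").sales.sum()        # Roll-up: aggregate a dimension away
    drilldown = cube.groupby(["region", "product"]).sales.sum()  # Drill down: add detail back
    pivot = cube.pivot_table(values="sales", index="product",
                             columns="region", aggfunc="sum")    # Pivot: rotate orientation
    print(slice_, dice, rollup, drilldown, pivot, sep="\n\n")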

The OLAP Servers with the most market share are: SSAS (Microsoft SQL Server Analysis Services), Intelligence Server (Microstrategy), Essbase (Oracle also has the so-called Oracle Database OLAP Option), SAS OLAP Server, NetWeaver Business Warehouse (SAP BW), TM1 (IBM Cognos), Jedox-Palo (I cannot recommend it) etc.

Microsoft had (and still has) the best IDE for creating OLAP Cubes (a slightly redressed version of Visual Studio 2008, known as BIDS – Business Intelligence Development Studio – usually delivered as part of SQL Server 2008), but Microsoft failed (for more than 2 years) to update it for Visual Studio 2010 (the update is coming together with SQL Server 2012). So people are forced to keep using BIDS 2008 or to use some tricks with Visual Studio 2010.

Permalink: https://apandre.wordpress.com/2010/06/13/data-visualization-and-cubes/

Recently I had a few reasons to review the Data Visualization technologies in Google’s portfolio. In short: Google (if it decided to do so) has all the components to create a good visualization tool, but the same thing can be said about Microsoft, and Microsoft decided to postpone the production of a DV tool in favor of other business goals.

I remember a few years ago Google bought Gapminder (Hans Rosling did some very impressive demos with it a while ago) and converted it into a Motion Chart “technology” of its own. A Motion Chart allows 5-6 dimensions to be crammed into a 2-dimensional chart: the shape, color and size of the bubbles, the X and Y axes as usual (above: Life Expectancy and Income per Person), and an animated time series (see the light blue 1985 in the background above – all bubbles move as “time” goes by). For the Motion Chart demo I did (see also here a sample I made myself, using Google’s Motion Chart), please choose a few countries (e.g. check the checkboxes for US and France) and then click the “Right Arrow” button in the bottom left corner of the Motion Chart. Google uses this and other visualization technologies of its own in its very useful Public Data Explorer.

Google Fusion Tables is a free service for sharing and visualizing data online. It allows you to upload and share data, merge data from multiple tables into interesting derived tables, and see the most up-to-date data from all sources. It has Tutorials, a User’s Group, a Developer’s Guide and sample code, as well as examples. You can check a video here:

The Google Fusion Tables API enables programmatic access to Google Fusion Tables content. It is an extension of Google’s existing structured data capabilities for developers. A developer can populate a table in Google Fusion Tables with data, from a single row to hundreds at a time. The data can come from a variety of sources, such as a local database, a .CSV file, a data collection form, or a mobile device. The Google Fusion Tables API is built on top of a subset of the SQL querying language. By referencing data values in SQL-like query expressions, a developer can find the needed data and then download it for use by an application. The app can do any desired processing on the data, such as computing aggregates or feeding it into a visualization gadget. When you add or change data in the tables in your offline repository, you can ensure the most up-to-date version is available to the world by synchronizing those changes up to Google Fusion Tables.
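As a rough illustration of those SQL-like expressions, here is a Python sketch. Caveat: the query endpoint, the placeholder table id 123456 and the CSV response shape are my assumptions based on the circa-2010 documentation, so check the Developer’s Guide before relying on any of it:

    import csv
    import io
    import requests  # third-party HTTP library

    # ASSUMPTION: circa-2010 Fusion Tables query endpoint returning CSV;
    # 123456 is a placeholder id for a public, exportable table
    resp = requests.get("https://www.google.com/fusiontables/api/query",
                        params={"sql": "SELECT Country, Population FROM 123456"})
    resp.raise_for_status()

    # The response is CSV text: a header row, then the data rows, ready to
    # feed into aggregates or a visualization gadget
    for row in csv.reader(io.StringIO(resp.text)):
        print(row)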

Everybody knows about Google Analytics for your web traffic: visitors, visits, pageviews, length and depth of visits, presented in very simple charts and a dashboard, see the sample below:

Fewer people know that Panorama Software has an OEM partnership with Google, enabling Google Spreadsheets with SaaS Data Visualizations and Pivot Tables.

Google has a Visualization API (and interactive Charts, including all the standard Charts, GeoMap, Intensity Map, Map, DyGraph, Sparkline, WordCloud and other Charts), which enables developers to expose their own data, stored on any data-store connected to the web, as a Visualization-compliant datasource. The Google Visualization API also provides a platform that can be used to create, share and reuse visualizations written by the developer community at large. Google provides samples, a Chart/API Gallery (Javascript-based visualizations) and a Gadget Gallery.

And last but not least, Google has excellent back-end technologies needed for big Data Visualization applications, like BigTable (a compressed, high performance, proprietary database system built on the Google File System (GFS), Chubby Lock Service, and a few other Google programs; it is currently not distributed or used outside of Google, although Google offers access to it as part of Google App Engine) and MapReduce. Add to this list Google Maps and Google Earth, and then ask yourself: what is stopping Google from producing a competitor to the Holy Trinity (Qlikview+Spotfire+Tableau) of DV?

Permalink: https://apandre.wordpress.com/2011/02/08/dvgoogle/

William Playfair said more than 200 years ago (according to Doug McCune and others, Playfair was the first person to visualize data, unless the legend about Munehisa Homma is finally proven): “As the eye is the best judge of proportion, being able to estimate it with more quickness and accuracy than any other of our organs, it follows, that wherever relative quantities are in question …[the Line Chart] … is peculiarly applicable; it gives a simple, accurate, and permanent idea, by giving form and shape to a number of separate ideas, which are otherwise abstract and unconnected.” William Playfair invented four types of Data Visualizations: in 1786 the Line Chart, see it at Wikipedia here:

http://upload.wikimedia.org/wikipedia/commons/5/52/Playfair_TimeSeries-2.png

and the Bar Chart of economic data, and in 1801 the Pie Chart and circle graph, used to show part-whole relations. Recreations of some Playfair Charts can be found here. Some legends (I have yet to see proof of them) attribute to Munehisa Homma (also known as Munehisa Honma, Sokyu Honma and Sokuta Honma) the invention of Candlestick Charts (around 1755?), way before the first charts were used and published in Western countries.

An article in the “Economist” named “Worth a thousand words” referred to “Three of History’s Best Charts Ever”. The Economist obviously had no access (or knowledge?) to the original Candlestick Charts (please let me know if you have these images or links to them). The 3 visualizations that The Economist described as “three of history’s best” include…

1. Florence Nightingale’s 1858 graphic demonstrating the factors affecting the lives (and death rates) of the British army (which resulted in a graphic type called “Nightingale’s Rose” or “Nightingale’s Coxcomb”), see it on the “Economist” site here:

http://media.economist.com/sites/default/files/cf_images/20071222/5107CR3B.jpg.

She showed in a visual graphic that it wasn’t wounds killing the highest number of soldiers – it was infections. This Radar (or Polar?) Chart was done in 1859.

2. Charles Joseph Minard’s very famous 1861 graphic depicting the Russian campaign of 1812 – Tufte called it “the best statistical graphic ever drawn”, see it on the “Economist” site here:

http://media.economist.com/sites/default/files/cf_images/20071222/5107CR2B.jpg .

What a dramatic story it tells. This Area Chart, overlaid on a map, was created in 1869.

Old Area Chart by Minard, 1869

Smart people in France even figured out how to make it dynamic in Excel:

3. William Playfair’s 1821 chart comparing the “weekly wages of a good mechanic” and the “price of a quarter of wheat” over time, see it on “Economist” site here:

http://media.economist.com/sites/default/files/cf_images/20071222/5107CR1B.jpg .

He was one of the first people to use data not just to educate but also to persuade and convince. This old Column Chart, combined with a Line (or Area Chart?) – basically one of the first known published Combo Charts – was created in 1821 (almost 200 years ago!).

Minard actually created more charts, way before computers and Data Visualization software existed. For example, in 1861 he created this Multiline Chart:

In 1866 Mr. Minard created one of the first Stacked Area Charts:

In 1859 Minard published one of the first Bubble Charts, overlaid on a map:

In short, Column, Bar, Line, Combo, Area, Bubble and other types of Charts were used way before (150-200 years before) people started to use Data Visualization Software. You can see the oldest charts above and some other very old charts (some created in the USA!) in this slideshow: http://picasaweb.google.com/pandre/Chartology#slideshow/ or/and you can watch this video:

However, as I said in the beginning, some Data Visualization techniques were known and used even before William Playfair. At least 266 years ago in Japan, Munehisa Homma invented Candlestick Charts (again, it is a legend, because even Steve Nison has no copies of the original hand-drawn Japanese Candlestick Charts from the 18th century), which eventually became a part of Financial Visualization and were reused for Stock Charts (a combo of daily Trading Volume and an Open-High-Low-Close Multiline Chart of daily prices).

Permalink: https://apandre.wordpress.com/2010/04/12/history-of-data-visualization/

Data Visualization can be a good thing for Trend Analysis: it allows you to “see this” before you “analyze this”, and to take advantage of the human eye’s ability to recognize trends quicker than any other method. Dr. Ahlberg (after selling Spotfire to TIBCO and claiming that “Second place is first loser”) started “Recorded Future” to basically sell … future trends, mostly in the form of Sparklines; he succeeded at least in selling RecordedFuture to investors from the CIA and Google. Trend analysis is an attempt to “spot” a pattern, or trend, in data (in most cases a well-ordered set of datapoints, e.g. by timestamps) or to predict future events.

Visualizing Trends in many cases means either a Time Series Chart (can you spot a pattern here with your naked eye?):

or a Motion Chart (both best done by … Google, see them here: http://visibledata.blogspot.com/p/demos.html ) – can you predict the future here(?):

or Sparklines (I like the Sparkline implementations in Qlikview and Excel 2010) – sparklines are scale-less visualizations of “trends”:

or maybe a Scatter chart (Excel is good for it too):

and in some cases a Stock Chart (Volume-Open-High-Low-Close, best done with Excel) – for example, Microsoft’s stock has fluctuated near the same level for many years, so I guess there is no visible trend here, which may spell trouble for Microsoft’s future (compare with the visible trends of Apple’s and Google’s stocks):

Or you can see Motion, Timeline, Sparkline and Scatter charts live/online below: for the Motion Chart demo, please choose a few countries (e.g. check the checkboxes for US and France) and then click the “Right Arrow” button in the bottom left corner of the Motion Chart below:

In statistics, trend analysis often refers to techniques for extracting an underlying pattern of behavior in a well-ordered dataset which would otherwise be partly hidden by “noise data”. It means that if one cannot “spot” a pattern by visualizing such a dataset, then (and only then) it is time to apply regression analysis and other mathematical methods (unless you are smart or lucky enough to remove the noise from your data). As I said in the beginning: try to see it first! However, extrapolating the past into the future can be a source of very dangerous mistakes (just check the history of almost any empire: Roman, Mongol, British, Ottoman, Austrian, Russian etc.).
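Here is a minimal numpy sketch of that “regression after looking” step, with synthetic data of my own (purely illustrative): it generates a noisy well-ordered series and then extracts the underlying linear trend that the noise partly hides.

    import numpy as np

    rng = np.random.default_rng(1)
    t = np.arange(100)                                 # a well-ordered parameter (e.g. time)
    series = 0.5 * t + 10 + rng.normal(0, 8, t.size)   # a trend buried in noise

    # Least-squares linear fit: the mathematical way to extract the trend
    slope, intercept = np.polyfit(t, series, deg=1)
    print("estimated trend: %.2f per step (true value 0.5)" % slope)

    # A rolling mean is the visual analog: it smooths the noise so the eye
    # can "spot" the same pattern on a chart before any regression is run
    window = 10
    rolling = np.convolve(series, np.ones(window) / window, mode="valid")
    print(rolling[:3].round(1))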

The human eye has its own Curse of Dimensionality (a term suggested in 1961 by R. Bellman and described independently by G. Hughes in 1968). In most cases the data (before being visualized) are organized in multidimensional Cubes (n-Cubes) and/or Data Warehouses and/or – speaking more cloudily – in a Data Cloud, and need to be projected into lower-dimensional datasets (small-dimensional Cubes, e.g. 3d-Cubes) before they can be exposed through the 2-dimensional surface of a computer monitor in the form of Charts (preferably an interactive and synchronized set of charts, sometimes called a dashboard).

Projection of DataCloud to DataCubes and then to Charts

During the last 200+ years people kept inventing all types of charts to be printed on paper or shown on screen, so most charts show 2- or 3-dimensional datasets. Prof. Hans Rosling led Gapminder.org to create the web-based, animated, 6-dimensional Color Bubble Motion Chart (Trendalyzer), which he used in his famous demos: http://www.gapminder.org/world/ . The 6 dimensions in this specific Chart are (almost a record for a 2-dimensional chart to carry):

  • X coordinate of the Bubble = Income per person,
  • Y coordinate of the Bubble = Life expectancy,
  • Size of the Bubble = Population of the Country,
  • Color of the Bubble = Continent of the Country,
  • Name of the Bubble = Country,
  • Year = animated 6th Dimension/Parameter as time-stamp of the Bubble.

Trendalyzer was bought from Gapminder in 2007 by Google and was converted into the Google Motion Chart, but Google somehow is not in a rush to enter the Data Visualization (DV) market.

The dimensionality of this Motion Chart can be pushed even further, to 7 dimensions (dimension here as an expression of measurement without units), if we use different Shapes (in addition to filled Circles we can use Triangles, Squares etc.), but that would literally push the limit of what the human eye can handle. If you add to the consideration the tendency of DV Designers to squeeze more than one chart onto a screen (how about overcrowded Dashboards with multiple synchronized interactive Charts?), we are literally approaching the limits of both the human eye and the human brain, regardless of the dimensionality of the Data Warehouse in the backend.
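To see how quickly those visual “channels” get used up, here is a small matplotlib sketch with invented data (a static snapshot, so the animated time dimension is reduced to a text label): X, Y, bubble size, bubble color and a name label already consume five dimensions of the dataset.

    import matplotlib.pyplot as plt
    import numpy as np

    rng = np.random.default_rng(7)
    n = 8
    income = rng.uniform(1_000, 40_000, n)        # dimension 1: X
    life_exp = rng.uniform(45, 82, n)             # dimension 2: Y
    population = rng.uniform(1e6, 1e9, n)         # dimension 3: bubble size
    continent = rng.integers(0, 4, n)             # dimension 4: bubble color
    names = ["Country %d" % i for i in range(n)]  # dimension 5: bubble label

    fig, ax = plt.subplots()
    ax.scatter(income, life_exp, s=population / 2e6, c=continent, alpha=0.6)
    for x, y, name in zip(income, life_exp, names):
        ax.annotate(name, (x, y), fontsize=8)
    ax.set(xlabel="Income per person", ylabel="Life expectancy",
           title="Year: 1985 (the 6th, animated dimension)")
    plt.show()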

Below I have approximately assessed the dimensionality of the datasets behind some popular charts (please feel free to send me corrections). For each dataset and respective chart I estimated the number of measures (usually a real or integer number; can be a calculation from other dimensions of the dataset), the number of attributes (in many cases they are categories, enumerations, or have string as their datatype), 0 or 1 parameters (representing a well-ordered set: time (for time series), date, year, sequence (can be used for Data Slicing), or a natural, integer or real number), and the Dimensionality (the number of Dimensions) as the total number of measures, attributes and parameters in a given dataset.

Chart                  Measures  Attributes  Parameter  Dimensionality
Gauge, Bullet, KPI         0         0           0            0
Monochromatic Pie          1         0           0            1
Colorful Pie               1         1           0            2
Bar/Column                 1         1           0            2
Sparkline                  1         0           1            2
Line                       1         0           1            2
Area                       1         0           1            2
Radar                      1         1           0            2
Stacked Line               1         1           1            3
Multiline                  1         1           1            3
Stacked Area               1         1           1            3
Overlapped Radar           1         1           1            3
Stacked Bar/Column         1         1           1            3
Heatmap                    1         2           0            3
Combo                      1         2           0            3
Mekko                      2         1           0            3
Scatter (2-d set)          2         1           0            3
Bubble (3-d set)           3         1           0            4
Shaped Motion Bubble       3         1           1            5
Color Shaped Bubble        3         2           0            5
Color Motion Bubble        3         2           1            6
Motion Chart               3         3           1            7


The diversity of Charts and their Dimensionality adds another complexity for the DV Designer: which Chart(s) to choose. You can find some good suggestions about that on the web. Dr. Andrew Abela created the Chart Chooser Diagram

Choosing a good chart by Dr. Abela

and it was even converted into an online “application“!

Permalink: https://apandre.wordpress.com/2011/03/02/dimensionality/

“How do I know what I think until I see what I say?” Or let me rephrase Mr. E.M. Forster: “How do YOU know what I think until I blog about it“?

I resisted the idea of having a blog since 1996, because I perceived blogging as very similar to fasting in the desert (actually, after a few months of blogging I am amazed – according to WordPress Statistics – that my blog has hundreds and hundreds of visitors every day!). But recently I got a few excellent pushes to start my own blog, because when I posted comments on somebody else’s blog they got deleted against my will. It turned out that the owners of those blogs can delete my comments and thoughts anytime they do not like what I said. It happened to me on one of Forrester’s Blogs, and it happened to me on my own profile on LinkedIn – when I posted a so-called “update” and some LinkedIn employee decided to delete it. In both cases the administrators did not even bother to send me my own thoughts for archiving purposes – they just disappeared!

So I decided to start this blog about Data Visualization (DV), because I have been doing DV for many years and have accumulated many DV implementations and thoughts about DV, DV tools, DV Vendors, the DV Market etc. For now I will have 8 main pages (and they will be used as root pages for a hierarchy of sub-pages):

  • Home Page of this blog  is a place where all posts and comments will go,
  • Visualization Page (with sub-pages) is for DV Samples and Demos,
  • DataViews Page (and it’s sub-pages) is about … Data Views, Charts and Chartology,
  • Tools Page designated for DV Software and comparison of DV Tools,
  • Solutions Page will describe possible DV solutions, DV System, products  and DV services I can provide,
  • Market Page dedicated to DV Vendors and DV market news and analyses,
  • Data Page is about ETL processes, Data Collection and Data Sources
  • About page gives you info about me

Another argument (for me to do DV blogging) was made 2500 years ago by Confucius: “Choose a job you love, and you will never have to work a day in your life.” And finally, I have to mention this 500-year-old story in the hope it will help me filter out all unneeded pieces from this blog: “An admirer asked Michelangelo how he sculpted the famous statue of David that now sits in the Academia Gallery in Florence. How did he craft this masterpiece of form and beauty? Michelangelo offered this strikingly simple description: He first fixed his attention on the slab of raw marble. He studied it and then “chipped away all that wasn’t David.”

p001: http://wp.me/pCJUg-3
