Yesterday, as we left Yosemite, we decided to stop at Glacier Point. On the way we started seeing a huge plume of smoke emerging from behind Half Dome and stopped to check it out. This is the Meadows Fire, which had been burning for a month until yesterday, when it “blew up” into a much bigger fire.
After I completed a number of analytics engagements, a common theme emerged on how to approach them systematically. This framework covers how to go from a business's desire to be data driven to a working analytics solution.
Most companies today are collecting large amounts of transactional, customer and employee related data. This data is stored in business systems such as Salesforce.com, SAP, Oracle, Teradata, Hadoop, etc. Data and analytics can help companies gain a competitive advantage by leveraging this data, and almost always generate additional revenue. These competitive gains are only possible if a solution is devised that continuously ingests the company’s internal data-sets and produces timely, actionable insights. The goal of such solutions is to incorporate data insights into the day-to-day operations of the organization.
The diagram below shows the process and the checklists for each of the sub-processes.
This framework for building such solutions has been tested several times and incorporates many of the lessons learned over the years. It generally follows these six steps:
Understand Business Process: Work closely with client to understand business strategy and goals for data/analytics solutions. Typical goals can be as simple as bringing business process transparency to more complex goals such as enabling experimentation, customized experiences or automated decision making.
Form Analytics Strategy: The next step includes determining what Key Performance Indicators (KPIs) or metrics measure business success. During this step existing internal and external data-sets are analyzed to identify useful metrics and highlight gaps in data instrumentation. Ad-hoc analysis and data exploration during this step can lead to better KPIs. Lastly, during this step wire-frame designs are constructed and shared with client to get feedback on potential solutions. At this step decisions are also made about which features are ‘must haves’, ‘performance enhancers’ and ‘delighters’.
Design & Develop Solution: With a clear understanding of data-sources, business goals and KPIs, a solution architecture starts to take shape. Design and development of such solutions is done using one of several tools available. Typical tools used are discussed later in this post. While larger clients tend to want most features incorporated during this step, it is often best to limit the first version to the ‘must have’ features.
Deploy to Client: Deploying an analytical solution involves setting frequency of data collection, data validation, analysis and dashboard updates. Additionally, this is when clients get a first look at the end product. End product in this case can be a dashboard, report or even a detailed presentation. Using feedback from client, this is also a good opportunity to make any major required changes and evaluate if features meet client needs. Using the Kano model is an easy way to quantitatively understand the needs of your client.
Execute & Iterate: Once deployed, a cycle of execution, measurement and iteration starts. From the client’s perspective this cycle has four steps: (1) gather and sanitize data (2) update visualizations (3) perform analysis and (4) share with clients. At the end of each cycle analysts assess and iterate on the product. This is also a good opportunity to incorporate ‘performance enhancer’ and ‘delighter’ features.
Evaluate: After a few rounds of execution, the analytics product becomes operational. At this point analysts work with the client to evaluate analytics ROI and value added to the business.
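As a rough illustration of how the Kano model mentioned in the deployment step can be made quantitative, here is a minimal Python sketch. It assumes the commonly used Kano evaluation table, in which each respondent answers a "functional" question (how do you feel if the feature is present?) and a "dysfunctional" question (how do you feel if it is absent?). The answer labels, the function names and the mapping of Kano categories onto the ‘must have’, ‘performance enhancer’ and ‘delighter’ labels used in this post are my own assumptions, not part of the framework itself.

```python
from collections import Counter

# Answer scale used for both the functional and dysfunctional question
# (labels are an assumption for this sketch).
ANSWERS = ("like", "expect", "neutral", "tolerate", "dislike")

# Commonly used Kano evaluation table.
# Row = answer to the functional question, column = answer to the
# dysfunctional question (in ANSWERS order).
# A = attractive, O = one-dimensional, M = must-be,
# I = indifferent, R = reverse, Q = questionable.
KANO_TABLE = {
    "like":     ("Q", "A", "A", "A", "O"),
    "expect":   ("R", "I", "I", "I", "M"),
    "neutral":  ("R", "I", "I", "I", "M"),
    "tolerate": ("R", "I", "I", "I", "M"),
    "dislike":  ("R", "R", "R", "R", "Q"),
}

# Assumed mapping from Kano codes to the labels used in this post.
LABELS = {
    "M": "must have",
    "O": "performance enhancer",
    "A": "delighter",
    "I": "indifferent",
    "R": "reverse",
    "Q": "questionable",
}


def classify(functional, dysfunctional):
    """Return the Kano code for one respondent's pair of answers."""
    return KANO_TABLE[functional][ANSWERS.index(dysfunctional)]


def classify_feature(responses):
    """Label a feature by the most frequent Kano code across respondents.

    responses: list of (functional_answer, dysfunctional_answer) pairs.
    """
    counts = Counter(classify(f, d) for f, d in responses)
    code, _ = counts.most_common(1)[0]
    return LABELS[code]


if __name__ == "__main__":
    survey = [("like", "neutral"), ("like", "tolerate"), ("like", "dislike")]
    print(classify_feature(survey))
```

Taking the most frequent category is the simplest way to summarize a survey; with real client feedback you may prefer to look at the full distribution of categories before deciding what goes into the next iteration.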
TOOLS FOR BUSINESS ANALYTICS
Tools used for analytics and data science projects could be a separate topic on their own. However, given their relevance to this framework, some of today’s (2013/2014) tools are highlighted here:
Data Curation: SQL, Pig/Hive
Data Discovery (ad-hoc analysis): R, Python, Matlab, Excel
MixPanel, Google Analytics and GoodData are also common at startups for web and mobile analytics. I have also seen the following used, but am personally not a big fan of them: MicroStrategy, Essbase, etc. There are several players in this space offering all kinds of tools, as shown in the diagram below:
A more detailed marketplace chart is available here.
The most important deliverable is a product/dashboard/report that meets the needs of your client to run a data driven business. Each process gate can also have less structured deliverables as shown below:
Ideally, success criteria should also be defined for each deliverable at each step. For the overall project a common criterion applies: “Timely Actionable Insights”. A solution that meets these three criteria can ultimately help move the business forward.
As a leader and manager of a data driven organization you may be able to leverage analytics to generate additional revenue streams. If you are not in the business of selling data (e.g. Jawbone) but are a data driven company (e.g. LinkedIn, Chase, Target), this is your future. So far we have seen consumable analytics come from firms that have a core business of providing insights. Prime examples are companies in the wearable apps and web/business analytics space.
Let us put aside the technical hurdles of setting up analytics products in this post and focus on what kind of customer, when to monetize and what analytics to provide.
Analytics for Businesses
In a recent interview, a CEO of a leading LED manufacturer said “I need analytics on where the fish are swimming right now.” He wanted to know as soon as a property is leased or sold so that he could pursue it as a sales lead. This data can be curated by combining data from Zillow, Craigslist and other open data sources.
Companies in a two-sided marketplace business model are in the best situation to provide analytics to both consumers and sellers. Consider a real-estate listings company that can provide realtors with analytics on how their listings are performing and how they are doing compared to other realtors in the area. For listings, providing Google Analytics-like information would help sellers understand potential buyers better.
Analytics products that provide interactive automated reports for such businesses can very likely be monetized through a subscription model. Independent and small businesses are continually looking to find prospective clients and to understand the competitive forces in their marketplace. Large businesses may also find performance reports valuable and it may help them compare their services with their competitors.
Analytics for business customers can include
Market Assessment: Analytics around competitors selling similar products along with assessment of demand for those products. eBay, LinkedIn and Craigslist are good examples of two-sided marketplaces which can provide information to sellers about the competitive environment, and demand for items listed. For example, LinkedIn could share information about how many potential candidates meet criteria for job requirements.
Client/Service Performance: Insights about seller’s performance and transactions. A real-estate service could track realtor’s performance (sales) over time and provide analytics about how her listings perform.
Client’s Consumer Behavior: Aggregated demographic data about end-consumers that sheds light on buying behavior. Companies like Amazon or eBay can share insights about buyers for particular merchants. This could be data about the consumer acquisition funnel or changes in Average Order Value (AOV).
Lead Qualification: In cases where consumers are willing to share their data, your service may be able to provide sellers with leads and their propensity to buy. A good example is recruiters and candidates for job listings on LinkedIn. LinkedIn can share profiles of the candidates that best fit the job description. Similarly, real-estate listing services like Trulia can provide realtors with screens of property buyers who may be interested in appropriate listings.
Insights for Consumers
Below is a snapshot of what LinkedIn offers when you apply for a job using their website. If you sign up for a premium account, they’ll give you detailed insights about how many people applied for the role and what LinkedIn estimates your chances are based on your profile.
Consumers only tend to pay for analytics when they have a very specific need for which an investment in analytics results in a financial return. For example someone actively looking for a job or a house may invest in a subscription service that provides them with a competitive advantage over other candidates or properties. In most other cases, an analytics product can be an added feature to your main product.
Types of consumer facing analytics may include
Customized Analysis: Customized recommendations for consumers who provide self-reported data (LinkedIn recommended jobs, Amazon recommended purchases, and Netflix recommended movies). A potential service would be a medical website/app that takes details about your symptoms, x-ray reports, bloodwork, etc. and tells you what type of medical condition you likely have, and which local doctors specialize in it. Another potential service would provide customized travel itineraries for vacations based on your travel preferences.
Combining Several Data Sources: Consumers provide credentials to their data sources (financial accounts, retirement funds, social networks) and a service combines those sources of data to provide insights. Combining data from all your financial institutions, Mint.com does an excellent job of providing insight about a consumer’s expenditure and investment behavior.
Social Analytics: If your company has millions of consumers and you can aggregate data, you can provide analytics around how a consumer compares to the rest of the population. Spotify in theory could provide analytics around what kind of music your friends like and how your musical tastes differ.
Issues Relating to Privacy
One of the main concerns relating to monetizing data is the question of privacy and data ownership. If you are looking to monetize data, this has to be a primary issue that is addressed upfront. Consumers and businesses want to first make sure that any data that’s either shared with them or collected about them is never sold or publicized to a third party. Any data that can somehow be attributed to a particular user or business is referred to as PII (Personally Identifiable Information). Anyone looking to monetize data must think through how to communicate their commitment to protecting PII data to end customers.
When providing analytics that gives a market or competitive perspective you must aggregate data in a way that no single customer can be identified using that data. Aggregating data to form collective inference enables analysis that can potentially be shared with customers or third parties. Some of the criteria that may be used to ensure that aggregate data is not compromising any customer’s privacy are
Using an adequate sample size for segments and attributes so that a single customer cannot skew analytics
Not splitting data or data segments in a way that a single customer could be identified
Not showing medians, minima and maxima when possible
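The first two criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `(customer_id, segment, amount)` row layout and the thresholds `min_customers=5` and `max_share=0.5` are all hypothetical, and the right thresholds depend on your data and your legal obligations. The idea is to publish a segment average only when the segment contains enough distinct customers and no single customer dominates the total.

```python
from collections import defaultdict


def safe_segment_averages(rows, min_customers=5, max_share=0.5):
    """Average amount per customer for each segment, or None if suppressed.

    rows: iterable of (customer_id, segment, amount) tuples.
    A segment is suppressed (None) when it has fewer than `min_customers`
    distinct customers, or when one customer contributes more than
    `max_share` of the segment total.
    """
    # segment -> customer_id -> total amount for that customer
    per_segment = defaultdict(lambda: defaultdict(float))
    for customer_id, segment, amount in rows:
        per_segment[segment][customer_id] += amount

    report = {}
    for segment, totals in per_segment.items():
        segment_total = sum(totals.values())
        too_small = len(totals) < min_customers
        dominated = (
            segment_total > 0
            and max(totals.values()) / segment_total > max_share
        )
        if too_small or dominated:
            report[segment] = None  # suppress instead of publishing
        else:
            report[segment] = segment_total / len(totals)
    return report


if __name__ == "__main__":
    rows = [(f"c{i}", "north", 10.0) for i in range(5)]
    rows.append(("c9", "south", 100.0))  # a single-customer segment
    print(safe_segment_averages(rows))
```

Suppressing a segment entirely is the bluntest option; merging small segments into a broader one is a common alternative when the suppressed cells would leave too many holes in the report.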
For companies that collect a lot of data about consumer behavior there are several opportunities to create new revenue streams. This can be done by providing analytics to consumers and businesses. While two-sided marketplaces are poised to benefit the most, many businesses can provide valuable, actionable insights about service and product usage. When sharing analytics about market assessment and competitive insights, privacy issues must be addressed upfront.
Most people use statistics the way a drunkard uses a lamp post, more for support than illumination.
I recently listed an old HP mini computer on Craigslist. One “John Clara” reached out to me via text and tried to sucker me in. Funny thing is that at the time of this post I am still a PayPal employee. How amusing that this sucker is trying to scam a PayPal employee. This is how the conversation went:
+12242028827: HP Mini 110-3510NR Netbook laptop notebook - $145 (san jose downtown) 7:30 PM
+12242028827: still for sale? am john „i want you to get back to me with the condition,and your firm price 7:30 PM
Me: Yes, Condition is like new. Software factory reset, so super fast and no junk software. 7:51 PM
Me: Last price is $140 7:52 PM
+12242028827: I’m okay with your asking price I’m buying it for my cousin in abroad , I want you to get back to me with your PayPal email address and some detailed 7:52 PM
+12242028827: pictures of this item so that I can pay in asap. I’ll add the shipping fee to the total payment I’ll be sending Thanks. 7:52 PM
Me: what’s your paypal ID ? 7:53 PM
+12242028827: firstname.lastname@example.org 7:54 PM
+12242028827: also send me pics to my email email@example.com 7:55 PM
Me: Cool. Pics are on the craigslist posting 7:55 PM
Me: I can’t really ship it because of various reasons. 7:56 PM
+12242028827: tell me the reason 7:57 PM
Me: Btw. I work for PayPal just so you know 7:57 PM
+12242028827: so what does that mean to me now? 7:57 PM
Me: Can you pick up in person or have someone pick up? 7:58 PM
Me: I am cool with you using paypal. Please pay to this cell phone 8:01 PM
Me: No bringing in email addresses 8:01 PM
+12242028827: what are you saying since 8:02 PM
Me: I am saying I am ready to do the deal if you can pay me with paypal 8:04 PM
+12242028827: i told you already 8:06 PM
Me: ?? 8:07 PM
+12242028827: get back to me with your paypal email address for the payment 8:08 PM
Me: ***************@**.com 8:09 PM
Me: Or you can send payment to 814 *** **** using the paypal app 8:09 PM
+12242028827: okay 8:10 PM
+12242028827: send me the pics to my email,firstname.lastname@example.org, so i will proceed with the payment now,get back to me when am done 8:11 PM
Me: Cool. Let me confirm I have received the payment. 8:12 PM
BAD: How many customers churned (went dormant) last month?
GOOD: What percentage of our customers are expected to churn in the next few months and why?
BETTER: What short term impact can we have on our business by reducing customer churn and what options do we have to do that?
Needless to say, bad questions get bad answers. As business leaders we often approach our data science and analytics teams with very specific questions. In most cases asking specific questions gets us very specific answers, which alone are useless. In this case, the good and better questions are likely to generate some very useful and actionable analysis.
Good data/analytics questions seek to get three things out of the answers or analysis. These are:
Insight - Something you didn’t know before
Action - Asking for possible solutions
Time - In a time-frame that your actions can have an impact
What you want is information (not data)
INSIGHTFUL The answer should provide some new thought or idea that was not apparent before. For example, learning something about your organization’s market share or your customer lifetime value that you didn’t know before. To ask questions that generate insight you should almost never ask for absolute numbers. Instead, ask for ratios and percentages that show how data points compare.
ACTIONABLE An insight is still pretty useless if you can’t do much about it. If you were simply told what your market share was, that’s not so useful. However, if you were told of a market share shift because your competitor was introducing a new product at a lower price point, that would be actionable.
TIMELY Ask questions that don’t generate a rear view mirror answer. While historic data can help generate trends, it is the ‘prediction’ that’s valuable. For example, instead of finding out your current market share, it might be better asking what is the predicted market share in the next few months. Another important aspect of timeliness is the reproduction and periodic rerun of the analysis so that it continues to provide insights. You want your analysts to automate this process as well so that they are not spinning their wheels each week or month doing the same thing.
Business insights extracted through data mining are valuable for a very short period of time
Just because your organization has Salesforce.com or an Oracle database doesn’t mean you should be going out and hiring a team of data scientists to work full-time. The value of the insights that these professionals generate tends to diminish over time if no new data-sets are being populated or expanded as a result of growing business operations.
Data-sets, like gold mines, are finite and vary based on the three V’s (volume, variety and velocity). Looking from an organization’s leadership perspective, insights discovery typically takes place in four stages.
Initial Stage (Finding the gold mine)
The value an organization gets out of data sets varies over time, but for most typical engagements the process is very similar. The figure below is an abstract view of how much value is generated over time (Note that this figure is not cumulative) during each of the aforementioned stages.
Finding the gold mine
Today most companies are sitting on Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) tools that over time become data gold mines. Even small tools like QuickBooks, IBM Rational ClearQuest and ClearCase are storing away valuable log files that can unlock tremendous efficiency gains for organizations. The problem is that most organizations do not realize, or even understand, the significant value their data-sets can generate for them.
In this first stage, while no value is extracted from data-sets, an acknowledgement has to happen by the organization’s leadership recognizing the value of data. The leader then assembles a team of business analytics and data science professionals that are asked to explore the possibility of finding golden nuggets in these data sets.
There are many different ways data brings value to an organization: transparency, platform for experimentation, customized actions, and automation of decision making. A right combination of skill sets is also required to analyze this data.
Following the formation of a team, Data discovery is the phase where a number of processes happen. Business analysts look at the data and try to determine what analytics frameworks should be applied to the data; what Key Performance Indicators (KPIs) to extract; and what potential business cases can be constructed. Business analysts and scientists are not only examining the data but also determining the best ways to perform robustness and data hygiene checks.
Once some inroads are made into data structure and cleaning, rich insights and trends start to emerge. Many insights at this point are interesting, but fewer are useful and actionable. This stage is where logged data is converted to daily active users, transaction amounts per day, useful sales ratios and histogram plots of various segments.
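To make the conversion from logged data to daily metrics concrete, here is a minimal Python sketch. It assumes a hypothetical event log in which each record is an `(iso_timestamp, user_id, amount)` tuple; real logs will have a different layout, and the function name is mine. It derives daily active users and the transaction amount per day described above.

```python
from collections import defaultdict
from datetime import datetime


def daily_metrics(events):
    """Roll a raw event log up into per-day metrics.

    events: iterable of (iso_timestamp, user_id, amount) tuples,
            e.g. ("2014-03-01T09:15:00", "u1", 20.0).
    Returns {day: {"dau": distinct users, "amount": total amount}}.
    """
    users = defaultdict(set)      # day -> distinct user ids seen that day
    amounts = defaultdict(float)  # day -> total transaction amount
    for timestamp, user_id, amount in events:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        users[day].add(user_id)
        amounts[day] += amount
    return {
        day: {"dau": len(users[day]), "amount": amounts[day]}
        for day in sorted(users)
    }


if __name__ == "__main__":
    log = [
        ("2014-03-01T09:15:00", "u1", 20.0),
        ("2014-03-01T17:40:00", "u1", 5.0),
        ("2014-03-01T18:05:00", "u2", 0.0),
        ("2014-03-02T08:00:00", "u3", 12.5),
    ]
    for day, metrics in daily_metrics(log).items():
        print(day, metrics)
```

In practice this kind of roll-up would usually live in SQL or Hive close to the data; the point here is only how little logic separates a raw log from a first useful metric.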
During this exploratory stage, a lot of visualization tools become useful as well. Time series and historical analysis are conducted to see how things are changing over time. Predictive analysis can be done to see where the organization is headed. In summary, this stage brings a tremendous amount of transparency to leadership uncovering opportunities for data to play a role.
After a significant amount of brainstorming with data-sets, analysts and leaders define a clear framework that separates vanity metrics from ones that are useful and actionable. This is an important step because two very important things happen: (1) determining what data the organization needs to consider and, more importantly, what data it does not, and (2) putting a system in place that determines the right intervals at which to repeat the insight generation process.
Determining what metrics to focus on is a very critical step because it ties analytics to the organization’s strategic plan. A connection to the strategic plan also means that leaders of the organization have to openly assimilate the metric into the goals of the organization. The framework becomes important because it helps members of the organization understand how they contribute to the goal.
Another important aspect at this stage of the process is the development and deployment of data visualization that is accessible by all members of the organization. This enables democratization of data and allows everyone to be on the same page.
After a visualization tool has been set up, organizations start to monitor their data. The organization also gets aligned with a certain analytical framework. Leaders get updated on the statistics of the data periodically, and they take action when things get out of trend. The amount of value being generated at each data generation cycle is at best incremental, and data scientists and analysts don’t have much more value to add besides making sure data updates are accurate. Organizations that become smart enable reporting by exception.
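Reporting by exception can be as simple as flagging only the metrics whose latest value falls outside their recent trend. The Python sketch below is one possible approach, not a prescribed method: it assumes each metric has a short history of past values and flags the latest value when it is more than `threshold` standard deviations away from the historical mean. The function name, the data layout and the default threshold of 3.0 are assumptions for illustration.

```python
from statistics import mean, stdev


def out_of_trend(history, latest, threshold=3.0):
    """Return the sorted names of metrics whose latest value is an exception.

    history: {metric_name: [past values]} (at least two values per metric)
    latest:  {metric_name: newest value}
    A metric is flagged when its newest value is more than `threshold`
    sample standard deviations away from the mean of its history.
    """
    flagged = []
    for name, values in history.items():
        if len(values) < 2 or name not in latest:
            continue  # not enough history to judge this metric
        mu = mean(values)
        sigma = stdev(values)
        if sigma == 0:
            # Perfectly flat history: any change at all is an exception.
            is_exception = latest[name] != mu
        else:
            is_exception = abs(latest[name] - mu) / sigma > threshold
        if is_exception:
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    history = {
        "daily_active_users": [100, 102, 98, 101, 99],
        "orders": [50, 52, 49, 51, 48],
    }
    latest = {"daily_active_users": 100, "orders": 20}
    print(out_of_trend(history, latest))
```

With a check like this running after each data refresh, leaders only hear about the metrics that need attention instead of reviewing every dashboard each cycle.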
Hiring a data science or business analytics team makes sense if an organization is going through tremendous growth and continues to generate new data sets. If this is not the case, data scientists are only valuable for a very small stage of the insights generation process. As a leader of an organization not undergoing tremendous growth, you must carefully think about your goals and the depth of the gold mine you think you are about to mine.
In their efforts to compensate for the unreliability of human performance, the designers of automated control systems have unwittingly created opportunities for new error types that can be even more serious than those they were seeking to avoid