Yesterday, as we left Yosemite, we decided to stop at Glacier Point. On the way we started seeing a huge plume of smoke emerging from behind Half Dome and stopped to check it out. This is the Meadows Fire, which had been burning for a month until yesterday, when it “blew up” into a much bigger fire.

Analytics Engagement Framework

After completing a number of analytics engagements, a common theme has emerged on how to approach them systematically. This framework covers how to go from a business’s desire to be data-driven to a working analytics solution.

Most companies today are collecting large amounts of transactional, customer and employee related data. This data is stored in business systems such as SAP, Oracle, Teradata, Hadoop, etc. Data and analytics can help companies gain competitive advantage by leveraging this data, almost always generating additional revenue. These competitive gains are only possible if a solution is devised that continuously ingests the company’s internal data-sets and produces timely, actionable insights. The goal of such solutions is to incorporate data insights into the day-to-day operations of the organization.


The diagram below shows the process and the checklists for each of the sub-processes.


This framework has been tested on several engagements and incorporates many of the lessons learned over the years. It generally follows these six steps:

  1. Understand Business Process: Work closely with the client to understand the business strategy and goals for data/analytics solutions. Goals can range from simply bringing transparency to a business process to more complex aims such as enabling experimentation, customized experiences or automated decision making.

  2. Form Analytics Strategy: The next step includes determining what Key Performance Indicators (KPIs) or metrics measure business success. During this step existing internal and external data-sets are analyzed to identify useful metrics and highlight gaps in data instrumentation. Ad-hoc analysis and data exploration during this step can lead to better KPIs. Lastly, wire-frame designs are constructed and shared with the client to get feedback on potential solutions. At this step decisions are also made about which features are ‘must haves’, ‘performance enhancers’ and ‘delighters’.

  3. Design & Develop Solution: With a clear understanding of data-sources, business goals and KPIs, a solution architecture starts to take shape. Design and development of such solutions is done using one of several tools available. Typical tools used are discussed later in this post. While larger clients tend to want most features incorporated during this step, it is often best to limit the first version to the ‘must have’ features.

  4. Deploy to Client: Deploying an analytical solution involves setting the frequency of data collection, data validation, analysis and dashboard updates. Additionally, this is when clients get a first look at the end product. The end product in this case can be a dashboard, report or even a detailed presentation. Using feedback from the client, this is also a good opportunity to make any major required changes and evaluate whether features meet client needs. Using the Kano model is an easy way to quantitatively understand the needs of your client.

  5. Execute & Iterate: Once deployed, a cycle of execution, measurement and iteration starts. From the client’s perspective this cycle has four steps: (1) gather and sanitize data (2) update visualizations (3) perform analysis and (4) share with clients. At the end of each cycle analysts assess and iterate on the product. This is also a good opportunity to incorporate ‘performance enhancer’ and ‘delighter’ features.

  6. Evaluate: After a few rounds of execution, the analytics product becomes operational. At this point analysts work with the client to evaluate analytics ROI and value added to the business.


Tools used for analytics and data science projects could be a separate topic on their own. However, given their relevance to this framework, some of today’s (2013/2014) tools are highlighted here:

  • Data Curation: SQL, Pig/Hive
  • Data Discovery (ad-hoc analysis): R, Python, Matlab, Excel
  • Dashboard Designs & Mockups: Balsamiq, MS Visio
  • Data Processing & Product Prototyping: SAS, R
  • Machine Learning: R, Python & Mahout
  • Visualization & Dashboarding: Plotly, Tableau, QlikView
  • Presentation & Reports: PowerPoint & PDFs
  • Web A/B testing: Optimizely

MixPanel, Google Analytics and GoodData are also common at startups for web and mobile analytics. I have also seen the following used, but am personally not a big fan of them: MicroStrategy, Essbase, etc. There are several players in this space offering all kinds of tools, as shown in the diagram below:


A more detailed marketplace chart is available here.


The most important deliverable is a product/dashboard/report that meets the needs of your client to run a data driven business. Each process gate can also have less structured deliverables as shown below:

Ideally, success criteria should also be defined for the deliverable at each step. For the overall project a common set of criteria applies: “Timely Actionable Insights”. A solution that meets these three criteria can ultimately help move the business forward.

Adding supplemental revenue streams through analytics

As a leader and manager of a data driven organization you may be able to leverage analytics to generate additional revenue streams. If you are not in the business of selling data (e.g. Jawbone) but are a data driven company (e.g. LinkedIn, Chase, Target), this is your future. So far we have seen consumable analytics come from firms that have a core business of providing insights. Prime examples are companies in the wearable apps and web/business analytics space.

Let us put aside the technical hurdles of setting up analytics products in this post and focus on what kind of customer, when to monetize and what analytics to provide.

Analytics for Businesses

In a recent interview, a CEO of a leading LED manufacturer said “I need analytics on where the fish are swimming right now.” He wanted to know as soon as a property is leased or sold so that he could pursue it as a sales lead. This data can be curated by combining data from Zillow, Craigslist and other open data sources.

Companies in a two-sided marketplace business model are in the best situation to provide analytics to both consumers and sellers. Consider a real-estate listings company that can provide realtors with analytics on how their listings are performing and how they are doing compared to other realtors in the area. For listings, providing Google Analytics-like information would help sellers understand potential buyers better.

Analytics products that provide interactive automated reports for such businesses can very likely be monetized through a subscription model. Independent and small businesses are continually looking to find prospective clients and to understand the competitive forces in their marketplace. Large businesses may also find performance reports valuable and it may help them compare their services with their competitors.

Analytics for business customers can include

  1. Market Assessment: Analytics around competitors selling similar products along with assessment of demand for those products. eBay, LinkedIn and Craigslist are good examples of two-sided marketplaces which can provide information to sellers about the competitive environment, and demand for items listed. For example, LinkedIn could share information about how many potential candidates meet criteria for job requirements.
  2. Client/Service Performance: Insights about a seller’s performance and transactions. A real-estate service could track a realtor’s performance (sales) over time and provide analytics about how her listings perform.
  3. Client’s Consumer Behavior: Aggregated demographics data about end-consumers that sheds light on buying behavior. Companies like Amazon or eBay can share insights about buyers for particular merchants. This could be data about the consumer acquisition funnel or changes in Average Order Value (AOV).
  4. Lead Qualification: In cases where consumers are willing to share their data, your service may be able to provide sellers with leads and their propensity to buy. A good example is recruiters and candidates for job listings on LinkedIn. LinkedIn can share the profiles of candidates that best fit the description of a job listing. Similarly, real-estate listing services like Trulia can provide realtors with screens of property buyers who may be interested in appropriate listings.

Insights for Consumers

Below is a snapshot of what LinkedIn will offer if you go to apply for a job using their website. If you sign up for a premium account, they’ll give you detailed insights about how many people applied for the role and what LinkedIn estimates your chances are based on your profile.

Consumers tend to pay for analytics only when they have a very specific need for which an investment in analytics results in a financial return. For example, someone actively looking for a job or a house may invest in a subscription service that provides them with a competitive advantage over other candidates or buyers. In most other cases, an analytics product can be an added feature to your main product.

Types of consumer facing analytics may include

  1. Customized Analysis: Customized recommendations for consumers who provide self-reported data (LinkedIn recommended jobs, Amazon recommended purchases, and Netflix recommended movies). A potential service would be a medical website/app that takes details about your symptoms, x-ray reports, bloodwork, etc. and tells you what type of medical condition you may have, and what local doctors specialize in it. Another potential service would provide customized travel itineraries for vacations based on your travel preferences.
  2. Combining Several Data Sources: Consumers provide credentials to their data sources (financial accounts, retirement funds, social networks) and a service combines those sources of data to provide insights. A service that combines data from all your financial institutions can do an excellent job of providing insight into a consumer’s expenditure and investment behavior.
  3. Social Analytics: If your company has millions of consumers and you can aggregate data, you can provide analytics around how a consumer compares to the rest of the population. Spotify in theory could provide analytics around what kind of music your friends like and how your musical tastes differ.

Issues Relating to Privacy

One of the main concerns relating to monetizing data is the question of privacy and data ownership. If you are looking to monetize data, this has to be a primary issue that is addressed upfront. Consumers and businesses want to first make sure that any data that’s either shared with them or collected about them is never sold or publicized to a third party. Any data that can somehow be attributed to a particular user or business is referred to as PII (Personally Identifiable Information). Anyone looking to monetize data must think through how to communicate their commitment to protecting PII to end customers.

When providing analytics that gives a market or competitive perspective you must aggregate data in a way that no single customer can be identified using that data. Aggregating data to form collective inference enables analysis that can potentially be shared with customers or third parties. Some of the criteria that may be used to ensure that aggregate data is not compromising any customer’s privacy are

  1. Using an adequate sample size for segments and attributes so that a single customer cannot skew analytics
  2. Not splitting data or data segments in a way that a single customer could be identified
  3. Not showing medians, minima and maxima when possible
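
The first two criteria can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `safe_segment_averages`, the record layout and the threshold of 5 customers are all hypothetical, and the right threshold depends on your own privacy policy.

```python
from collections import defaultdict

MIN_SEGMENT_SIZE = 5  # assumed threshold, not a standard; tune to your policy


def safe_segment_averages(records, min_size=MIN_SEGMENT_SIZE):
    """Average `value` per segment, suppressing segments with too few customers.

    `records` is an iterable of (customer_id, segment, value) tuples.
    Suppressed segments are reported as None instead of a number.
    """
    customers = defaultdict(set)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for customer_id, segment, value in records:
        customers[segment].add(customer_id)
        totals[segment] += value
        counts[segment] += 1

    report = {}
    for segment in totals:
        if len(customers[segment]) < min_size:
            report[segment] = None  # too few distinct customers to share safely
        else:
            report[segment] = totals[segment] / counts[segment]
    return report


# Hypothetical data: six customers downtown, a single customer in the suburb.
records = [(f"c{i}", "downtown", 100 + i) for i in range(6)]
records.append(("c99", "suburb", 500))
print(safe_segment_averages(records))
```

The suburb segment is suppressed because its single customer would otherwise be exposed through the "average".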


For companies that collect a lot of data about consumer behavior there are several opportunities to create new revenue streams. This can be done by providing analytics to consumers and businesses. While two-sided marketplaces are poised to benefit the most, many businesses can provide valuable, actionable insights about service and product usage. When sharing analytics about market assessment and competitive insights, privacy issues must be addressed upfront.

Catching criminals off-guard

I recently listed an old HP mini computer on Craigslist. One “John Clara” reached out to me via text and tried to sucker me in. Funny thing is that at the time of this post I am still a PayPal employee. How amusing that this sucker is trying to scam a PayPal employee. This is how the conversation went:

+12242028827: HP Mini 110-3510NR Netbook laptop notebook - $145 (san jose downtown) 7:30 PM
+12242028827: still for sale? am john „i want you to get back to me with the condition,and your firm price 7:30 PM
Me: Yes, Condition is like new. Software factory reset, so super fast and no junk software. 7:51 PM
Me: Last price is $140 7:52 PM
+12242028827: I’m okay with your asking price I’m buying it for my cousin in abroad , I want you to get back to me with your PayPal email address and some detailed 7:52 PM
+12242028827: pictures of this item so that I can pay in asap. I’ll add the shipping fee to the total payment I’ll be sending Thanks. 7:52 PM
Me: what’s your paypal ID ? 7:53 PM
+12242028827: 7:54 PM
+12242028827: also send me pics to my email 7:55 PM
Me: Cool. Pics are on the craigslist posting 7:55 PM
Me: I can’t really ship it because of various reasons. 7:56 PM
+12242028827: tell me the reason 7:57 PM
Me: Btw. I work for PayPal just so you know 7:57 PM
+12242028827: so what does that mean to me now? 7:57 PM
Me: Can you pick up in person or have someone pick up? 7:58 PM
Me: I am cool with you using paypal. Please pay to this cell phone 8:01 PM
Me: No bringing in email addresses 8:01 PM
+12242028827: what are you saying since 8:02 PM
Me: I am saying I am ready to do the deal if you can pay me with paypal 8:04 PM
+12242028827: i told you already 8:06 PM
Me: ?? 8:07 PM
+12242028827: get back to me with your paypal email address for the payment 8:08 PM
Me: ***************@**.com 8:09 PM
Me: Or you can send payment to 814 *** **** using the paypal app 8:09 PM
+12242028827: okay 8:10 PM
+12242028827: send me the pics to my email,, so i will proceed with the payment now,get back to me when am done 8:11 PM
Me: Cool. Let me confirm I have received the payment. 8:12 PM
Me: No receipt of payment yet. The pix are available on the post right here btw****08994.html 8:15 PM
Me: Are you still there ? 8:18 PM
+12242028827: what 8:19 PM
Me: Did you send the payment already? 8:19 PM
+12242028827: E**@***.com,can you just stop all these rubbish 8:22 PM
Me: Yeah. I am serious. 8:22 PM
+12242028827: can you check stop doing rubbish 8:22 PM
Me: I told you I work at PayPal 8:23 PM
Me: That’s my legit email 8:23 PM
+12242028827: okay,thank you,give me 4hours 8:24 PM
Me: Ok 8:25 PM

How to form good data questions

Consider the following questions:

BAD: How many customers churned (went dormant) last month?

GOOD: What percentage of our customers are expected to churn in the next few months and why?

BETTER: What short term impact can we have on our business by reducing customer churn and what options do we have to do that?

Needless to say, bad questions get bad answers. As business leaders we often approach our data science and analytics teams with very specific questions. In most cases asking specific questions gets us very specific answers, which alone are useless. In this case, the good and better questions are likely to generate some very useful and actionable analysis.

Good data/analytics questions seek to get three things out of the answers or analysis. These are:

  1. Insight - Something you didn’t know before
  2. Action - Asking for possible solutions
  3. Time - In a time-frame that your actions can have an impact

What you want is information (not data)

The answer should provide some new thought or idea that was not apparent before. For example, being told your organization’s market share or your customer lifetime value when you didn’t know it before. To ask questions that generate insight you should almost never ask for absolute numbers. Instead, ask for ratios and percentages that show how data points compare.
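
As a small illustration of why a ratio says more than an absolute number, consider churn again. The helper `churn_rate` and all figures below are hypothetical.

```python
def churn_rate(active_at_start, churned):
    """Return churned customers as a percentage of those active at period start."""
    if active_at_start <= 0:
        raise ValueError("active_at_start must be positive")
    return 100.0 * churned / active_at_start


# Hypothetical figures: the same absolute number tells two very different stories.
small_base = churn_rate(active_at_start=2_000, churned=500)
large_base = churn_rate(active_at_start=50_000, churned=500)
print(small_base, large_base)  # 25.0 1.0
```

"500 customers churned" is alarming for the first business and unremarkable for the second, which only the percentage reveals.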

An insight is still pretty useless if you can’t do much about it. If you were simply told what your market share was, that’s not so useful. However, if you were told of a market share shift because your competitor was introducing a new product at a lower price point, that would be actionable.

Ask questions that don’t generate a rear-view-mirror answer. While historic data can help generate trends, it is the ‘prediction’ that’s valuable. For example, instead of finding out your current market share, it might be better to ask what the predicted market share is for the next few months. Another important aspect of timeliness is the reproduction and periodic rerun of the analysis so that it continues to provide insights. You want your analysts to automate this process as well so that they are not spinning their wheels each week or month doing the same thing.

Time value of business insights

Business insights extracted through data mining are valuable for a very short period of time

Just because your organization has an Oracle database doesn’t mean you should be going out and hiring a team of data scientists to work full-time. The value of the insights that these professionals generate tends to diminish over time if no new data-sets are being populated or expanded as a result of growing business operations.

Data-sets, like gold mines, are finite and vary based on the three V’s (volume, variety and velocity). Looking from an organization’s leadership perspective, insights discovery typically takes place in four stages.

  1. Initial Stage (Finding the gold mine)
  2. Data Discovery
  3. Decision Making
  4. Monitoring

The value an organization gets out of data sets varies over time, but for most typical engagements the process is very similar. The figure below is an abstract view of how much value is generated over time (Note that this figure is not cumulative) during each of the aforementioned stages.


Finding the gold mine

Today most companies are sitting on Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) tools that over time become data gold mines. Even small tools like QuickBooks, IBM Rational ClearQuest and ClearCase are storing away valuable log files that can unlock tremendous efficiency gains for organizations. The problem is that most organizations do not realize, or even understand, the significant value their data-sets can generate for them.

In this first stage, while no value is extracted from data-sets, an acknowledgement has to happen by the organization’s leadership recognizing the value of data. The leader then assembles a team of business analytics and data science professionals that are asked to explore the possibility of finding golden nuggets in these data sets.

There are many different ways data brings value to an organization: transparency, a platform for experimentation, customized actions, and automation of decision making. The right combination of skill sets is also required to analyze this data.

Discovery Stage

Following the formation of a team, data discovery is the phase where a number of processes happen. Business analysts look at the data and try to determine what analytics frameworks should be applied to the data; what Key Performance Indicators (KPIs) to extract; and what potential business cases can be constructed. Business analysts and scientists are not only examining the data but also determining the best ways to perform robustness and data hygiene checks.

Once some inroads are made into data structure and cleaning, rich insights and trends start to emerge. Many insights at this point are interesting but fewer are useful and actionable. This stage is where logged data is converted to daily active users, transaction amounts per day, useful sales ratios and histogram plots of various segments.
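
A minimal sketch of that conversion, assuming the log has already been cleaned into (timestamp, user, amount) records. The function name `daily_metrics` and the sample events are hypothetical; a real engagement would more likely do this in SQL, R or a dataframe library.

```python
from collections import defaultdict
from datetime import datetime


def daily_metrics(events):
    """Roll raw log events up into per-day active users and transaction totals.

    `events` is an iterable of (iso_timestamp, user_id, amount) tuples.
    """
    users = defaultdict(set)
    amounts = defaultdict(float)
    for timestamp, user_id, amount in events:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        users[day].add(user_id)  # a set counts each user once per day
        amounts[day] += amount
    return {
        day: {"active_users": len(users[day]), "transaction_amount": amounts[day]}
        for day in sorted(users)
    }


# Hypothetical log records.
events = [
    ("2014-03-01T09:15:00", "alice", 20.0),
    ("2014-03-01T17:40:00", "alice", 5.5),
    ("2014-03-01T18:05:00", "bob", 12.0),
    ("2014-03-02T08:30:00", "bob", 7.0),
]
for day, metrics in daily_metrics(events).items():
    print(day, metrics)
```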

During this exploratory stage, a lot of visualization tools become useful as well. Time series and historical analysis are conducted to see how things are changing over time. Predictive analysis can be done to see where the organization is headed. In summary, this stage brings a tremendous amount of transparency to leadership uncovering opportunities for data to play a role.

Decision Making

After a significant amount of brainstorming with data-sets, analysts and leaders define a clear framework that separates vanity metrics from ones that are useful and actionable. This is an important step because two very important things happen: (1) the organization determines what data it needs to consider and, more importantly, what data it does not, and (2) it puts a system in place that determines the right intervals at which to repeat the insight generation process.

Determining what metrics to focus on is a very critical step because it ties analytics to the organization’s strategic plan. A connection to the strategic plan also means that leaders of the organization have to openly assimilate the metric into the goals of the organization. The framework becomes important because it helps members of the organization understand how they contribute to the goal.

Another important aspect at this stage of the process is the development and deployment of data visualization that is accessible by all members of the organization. This enables democratization of data and allows everyone to be on the same page.

Monitoring Stage

After a visualization tool has been set up, organizations start to monitor their data. The organization also aligns itself with a certain analytical framework. Leaders get periodic updates on the data, and they take action when things go off trend. The value generated at each data generation cycle is at best incremental, and data scientists and analysts don’t have much more value to add besides making sure data updates are accurate. Organizations that become smart enable reporting by exception.
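
Reporting by exception can be as simple as flagging only the points that break from the recent trend. The sketch below is one possible rule, not a prescribed method: the function name `exceptions`, the four-period window and the two-standard-deviation threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def exceptions(series, window=4, threshold=2.0):
    """Return (index, value) pairs that fall outside the recent trend.

    A point is flagged when it lies more than `threshold` standard deviations
    from the mean of the `window` points immediately before it.
    """
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if abs(series[i] - mu) > threshold * sigma:
            flagged.append((i, series[i]))
    return flagged


# Hypothetical weekly metric with one out-of-trend week.
weekly_orders = [100, 102, 98, 101, 99, 100, 140, 101]
print(exceptions(weekly_orders))  # [(6, 140)]
```

Leaders then only hear about week 6 instead of reading every weekly update.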


Hiring a data science or business analytics team makes sense if an organization is going through tremendous growth and continues to generate new data sets. If this is not the case, data scientists are only valuable for a very small stage of the insights generation process. As a leader of an organization not undergoing tremendous growth, you must carefully think about your goals and the depth of the gold mine you think you are about to mine.

In their efforts to compensate for the unreliability of human performance, the designers of automated control systems have unwittingly created opportunities for new error types that can be even more serious than those they were seeking to avoid

- Chris Hart, acting chairman of the National Transportation Safety Board (NTSB) talking about the cause of the Asiana crash.

Unforeseen consequences of automation

Since the advent of autopilots, life for pilots has become much easier. In the aerospace industry this was largely driven by something called Crew Resource Management, whose goal is to focus the crew on only a manageable number of tasks at a time. The adverse effect is that it can make pilots complacent and lead them to depend on autopilot systems for more than those systems are capable of.

As businesses start to gather more data and move towards automation they are going to start facing the same kind of problems. Leaders of organizations trying to run lean may rely too heavily on automation resulting in unforeseen disasters.

Ways to get screwed by taxi drivers in Buenos Aires

On our recent trip to Argentina we primarily used taxis as a means of transportation in and around the city. I think I learned every possible way a taxi driver can deceive and cheat a passenger. Here are the top methods that were used on us on five different trips by various drivers.

1. Currency conversion

Argentina has two exchange rates: one official and the other unofficial. Officially you get 8 pesos per dollar and unofficially 11 per dollar. Upon arriving at the airport we only had American moola in our pockets, and the taxi driver convinced us to pay him 30 US dollars instead of local currency. Obviously the trip would have been cheaper in local currency at the street rate.

2. Take you for a longer ride

This is a classic one: if you don’t know the way, they take you by the longer route. How did we know we were getting screwed? We had Google Maps pulling directions for us and the taxi driver was obviously ‘taking us for a ride’.

3. Take you for a very slow ride

Aware of the shenanigans of the long ride, I started giving turn-by-turn directions on the next trip. This driver caught on and tried a brand new trick: he started driving 5 miles per hour below the speed limit. A trip that Google Maps predicted would take 40 minutes took us an hour and cost 5 more dollars.

4. Meter rates

Another classic one is a faster meter. A trip that cost 10 dollars one way cost us $20 returning by the same route.

5. Charge extra for your bags

After all these lessons learned you would think we should be negotiating rates in advance. That did not work either. Even after negotiating the price, the taxi driver asked for more for ‘additional’ services such as unloading bags.

All said and done, any money we saved during our awesome negotiations with Peruvians, we lost to Argentina. Chalking another one up to #costofdoingbusiness

P.S. Shout out to Peruvians for being some of the nicest and politest people.