How Does Did Recommend Differ from Likely to Recommend?

Have you recommended a product to a friend?

Will you recommend that same product to a friend again?

Which of these questions is a better indicator of whether you will actually recommend a product?

If people were consistent, rational, and honest, the answer would be simple: the second question asks about future intent, so it would be the logical choice.

It may come as no surprise that people aren’t necessarily good predictors of their own future behavior.

How the question is phrased, the activity being asked about, and how far in the future it lies each affect how predictive the question is. It may, therefore, be risky to rely heavily on people’s predictions of whether they WILL recommend.

We’ll look at how well stated intentions predict future actions in a future article. In this article, we’ll examine how asking people whether they recommended something (recalling the past) differs from asking whether they will recommend (a future prediction of behavior). Are you really asking something different? If so, how do the questions differ, and what problems arise from using only reports of past behavior to predict future behavior?

Predicting Is Hard, Especially About the Future

In theory, people should be more accurate at recalling whether they did something than at predicting whether they will do something. With this in mind, Tomer Sharon recommends something he calls the aNPS (actual NPS)—whether people said they recommended over a recent period of time. Similarly, Jared Spool argues past recommendations are a better predictor of someone’s loyalty than asking what people will do in the future. He uses an example from a Netflix survey to illustrate the point. Neither, however, provides data to support the idea that these are superior measures of future intent to recommend. (Finding data is hard.)

There is success in predicting what will happen based on what has happened, especially over short intervals and when people and things remain the same.

But even when ideal conditions exist—even for behavior that is habitual—past behavior is far from a perfect predictor of what will happen. For example, when predicting whether people will exercise, past exercise behavior was found to be the best predictor, but its correlation with future exercise, while high (r = .69), was not perfect.

In a meta-analysis of 16 studies [pdf], Ouellette and Wood (1998) also found that past behavior correlated with future behavior, but the relationship was more modest (r = .39). Interestingly, they found behavioral intentions (how likely you say you are to do something) were better predictors of future behavior (r = .54). They also found a more complex interaction between past behavior and future intent: in stable contexts, past behavior is the better predictor of future behavior, but in unstable contexts, and when the behavior isn’t performed frequently, intention is the better predictor.

Recommending products or brands to friends and colleagues is likely a less stable and infrequent action. A future analysis will examine this in more detail.

People and Products Change

Using past recommendations to predict future behavior also presents another challenge to consider: changing experiences.

Even if we have a perfect record of what people did and could exclude memory bias, change happens. People change their minds, and products and companies change (often quickly). So what people did in the past, even the recent past, may not be as good an indicator of what they will do in the future. A good example of this also comes from Netflix.

In 2011, Netflix split its mail-order DVD business from its online streaming service and raised prices. This angered customers (many of whom had likely recently recommended the service), and Netflix lost 800,000 customers by one estimate.

We were tracking Netflix’s NPS in 2011, and the change was captured in its Net Promoter Score, which went from one of the highest in our database (73) to a low of -7, as shown in Figure 1.

Figure 1: The Netflix NPS in Feb 2011 and Oct 2011, before and after announcing an unpopular product change.

The Netflix example shows how past recommendations (even recent ones) can quickly become poor predictors of future attitudes and behavior. Netflix ultimately reversed its decision and more than recovered its subscriber base (and its NPS).

But that’s probably an extreme example of how a major change to a product can produce a major change in usage and recommendations. To understand more systematically how reports of past recommendations differ from stated likelihood to recommend in the future, we examined two large datasets and compared the two measures.

Study 1: Software Recommendations

In our 2014 consumer and business software reports, we asked current users of software products whether they recommended the product and how likely they are to recommend it in the future using the 11-point Likelihood to Recommend (LTR) item from the Net Promoter Score. We used this information to quantify the value of a promoter.

On average, 29% of the software users in our study reported that they had recommended the product to at least one person in the prior year. Of these past recommenders, 41% were most likely to recommend again—giving a 10 on the LTR item (Figure 2). Extending this to the top two boxes (9 and 10), 64% of people who had recommended are also promoters. Those who responded 8–10 on the LTR item accounted for a substantial portion (85%) of the recommendations. Including passives and promoters (7–10) accounts for almost all (93%) of the recommendations.

You can see the large drop-off in recommendations between 8 and 7 (Figure 2), where the percentage of software users who reported recommending drops by more than half (from 21% to 8%), and then by half again (8% to 3%) moving from 7 (passive) to 6 (detractor). It would be interesting to know why respondents in this study who did recommend in the past are less likely to in the future.

Figure 2: Percent of respondents who reported recommending a software product in the past and would recommend it in the future using the 0 to 10 LTR scale.
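The banding used in Figure 2 can be reproduced directly from raw survey responses. Below is a minimal sketch with made-up responses (not the study’s data) showing how past recommenders break out into the standard NPS bands:

```python
from collections import Counter

# Hypothetical (did_recommend_last_year, LTR score 0-10) pairs --
# illustrative data only, not the study's responses.
responses = [
    (True, 10), (True, 10), (True, 9), (True, 8), (True, 8),
    (True, 7), (True, 6), (False, 5), (False, 8), (False, 10),
]

def band(ltr):
    """Standard NPS bands: 9-10 promoter, 7-8 passive, 0-6 detractor."""
    if ltr >= 9:
        return "promoter"
    if ltr >= 7:
        return "passive"
    return "detractor"

# Keep only the people who said they recommended in the past year.
past = [ltr for did, ltr in responses if did]
counts = Counter(band(ltr) for ltr in past)

for b in ("promoter", "passive", "detractor"):
    print(f"{b}: {counts[b] / len(past):.0%} of past recommenders")
```

With real survey data, the same grouping shows what share of past recommendations each LTR band accounts for.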

Study 2: Recent Recommendation

To corroborate the pattern and look for reasons why past recommendations don’t account for all future recommendations we conducted another study in November 2018 with a more diverse set of products and services. We asked 2,672 participants in an online survey to think about what product or service they most recently recommended to a friend or colleague, either online or offline. Some examples of recently recommended companies and products were:

  • Barnes & Noble
  • Amazon
  • eBay
  • Colourpop Lipstick
  • PlayStation Plus
  • Spotify music
  • Bojangles Chicken and Biscuits
  • Ryka—shoes for women

After the first few hundred responses we noticed many people had recently recommended Amazon, so we asked respondents to think of another company. After recalling their most recent recommendation, we used the 0 to 10 LTR scale (0 = not at all likely to recommend and 10 = extremely likely) to find out how likely they would be to recommend this same product or service again to a friend.

The distribution of likelihood-to-recommend responses is shown in Figure 3. About half (52%) of participants selected the highest response of 10 (extremely likely) and 17% selected 9. Put differently, of the people who recommended a product or service in the past, 69% are promoters (Figure 4). Figure 3 also shows the bulk of values are at 8 and above, accounting for 84% of responses, and another substantial drop in recommendations happens from 7 to 6 (from 8% to 2%).

Figure 3: Distribution of likelihood to recommend from 2,672 US respondents who recently recommended a product or service.

Figure 4 shows that 92% of all recommendations came from passives and promoters (almost identical to the 93% in Study 1). Across both studies, around 8% of past recommenders were not very likely to recommend again. Why won’t they recommend again?

Figure 4: Percent of respondents that are promoters, passives, and detractors for products or services they reported recommending in the past (n = 2,672).

To find out, we asked respondents who gave 6 or below to briefly tell us why they’re now less likely to recommend in the future.

The most common reasons given were that the product or service was disappointing to them or the person they recommended it to, or it changed since they used it. Examples include:

  • I am slightly less likely to recommend because my purchase contained dead pixels (TCL television).
  • We’ve had recent issues with their products (lightexture).
  • The website was not as user-friendly as it could be (Kohl’s).
  • Service not what I was told it would be (HughesNet).
  • Product didn’t perform as expected (Pony-O).
  • The person I recommended it to went on to have an issue using the site (Fandango).

Several respondents also didn’t feel the opportunity would come up again, supporting the idea that recommendations may be infrequent and that past behavior may therefore be a poor predictor of future behavior.

  • I usually recommend Roxy to those who compliment me on my Roxy shoes. If that doesn’t happen then I don’t really bring them up.
  • I don’t expect anyone in my immediate circle to be soliciting recommendations for Dell. If they asked, I would give them a positive recommendation.
  • If it comes up in the conversation, I will recommend it. If it doesn’t I won’t necessarily go out of my way to do it.

Summary and Takeaways

In our analysis of the literature and across two studies examining past recommendations and future intent to recommend, we found:

Did recommend and will recommend offer similar results. If you’re tempted to use past recommendations as a “better” measure of future intent, this analysis suggests the two are highly related. Across two studies, around 92% of respondents who did recommend are also likely to recommend again (if you include both promoters and passives).

Around 8% won’t recommend again. While past recommendations and future intent to recommend are highly related, around 8% of respondents who recently recommended a company, product, or experience won’t recommend it in the future (recommendation attrition). Even the most recent recommendation wasn’t a perfect predictor of stated intent to recommend. Expect recommendation attrition to increase as more time passes between the recommendation and when you ask about it.

Poor experiences cause recommendation loss. The main reason people who recommended in the past won’t in the future is that they (or the person they recommended to) had a disappointing experience. This can happen when participants have a more recent bad experience or when the product or service changes (as was the case when Netflix changed its pricing). Several participants also indicated they didn’t feel an opportunity to recommend would come up again.

Don’t dismiss future intentions. Past behavior may be a good indicator of future behavior, but not universally. A meta-analysis in the psychology literature suggests stated intent has an important moderating effect on past behavior and, in many cases, is a better predictor of future behavior. We’ll investigate how this holds for product and company recommendations in a future article.

Ask both past recommendations and future intent. If you’re able, ask about both the past and the future. People who recommended and are extremely likely to recommend again are likely the best predictors of who will recommend in the future. A literature review found that a combination of past behavior and future intentions may better predict future behavior, depending on the context and frequency of the behavior. Several participants in this study indicated that their recommendations were infrequent and that, despite prior recommendations, they may be less likely to recommend again (even though their attitude toward the experience hadn’t changed).




Source: How Does Did Recommend Differ from Likely to Recommend?

Relationships, Transactions and Heuristics

There are two general types of customer satisfaction surveys: 1) customer transaction surveys and 2) customer relationship surveys.

Customer transaction surveys allow you to track satisfaction for specific events. They are typically administered soon after the customer has a specific interaction with the company and ask the customer to rate that specific interaction.

Customer relationship surveys allow you to measure your customers’ attitudes across different customer touchpoints (e.g., marketing, sales, product, service/support) at a given point in time. Their administration is not linked to any specific customer interaction with the company; they are typically administered at periodic times throughout the year (e.g., every other quarter, annually). Consequently, the relationship survey asks customers to rate the company based on their past experience.

While the two survey types differ in what is being rated (a transaction vs. a relationship), they can share identical customer touchpoint questions (e.g., technical support, sales).

A high-tech company was conducting both a transactional survey and a relationship survey that shared identical items. Given that the ratings came from the same company’s customers and covered identical touchpoint questions, the company expected the ratings on the two surveys to be the same. The general finding, however, was that ratings on the transactional survey were typically higher than ratings for the same question on the relationship survey. Which score accurately reflects the customer relationship? Why don’t identical items on relationship and transactional surveys produce the same score? Humans are fallible.

Availability Heuristic

There is a line of research that examines the process by which people make judgments. This research shows that people use heuristics, or rules of thumb, when asked to make decisions or judgments about frequencies and probabilities of events. There is a heuristic called the “availability heuristic” that applies here quite well and might help us explain the difference between transactional ratings and relationship ratings of identical items.

People are said to employ the availability heuristic whenever their estimate of the frequency or probability of some event is based on the ease with which instances of that event can be brought to mind. Basically, you judge the things you can recall more easily to be more frequent in the world than the things you can’t. For example, when presented with a list containing an equal number of male and female names, people are more likely to think the list contains more male names when the male names belong to famous men. Because the famous names were more easily recalled, people conclude there must be more male names than female names.

Customers, when rating companies as a whole (relationship surveys), are recalling their prior interactions with the company (e.g., their call into phone support, receipt of marketing material). Their “relationship rating” is a mental composite of these past interactions: negative, positive, and mundane. Negative customer experiences tend to be more vivid and visceral than positive or mundane ones and are consequently more easily recalled. When I think of my past interactions with companies, it is much easier for me to recall negative experiences than positive ones. Due to the availability heuristic, customers thinking about a particular company might overestimate the number of negative experiences, relative to positive experiences, that actually occurred. Thus, their relationship ratings would be adversely affected.

Ratings from transactional surveys, however, are less vulnerable to the availability heuristic. Because customers are rating one recent, specific interaction, there is little room for selective recall to distort their ratings.

Summary and Implications

Customer satisfaction ratings in relationship surveys are based on customers’ judgments of past experiences with the company and, consequently, are susceptible to the availability heuristic. Customers may more easily recall negative experiences, which then disproportionately lower their overall ratings of the company. While it would appear that a transactional survey is a more accurate measure than a relationship survey, you shouldn’t throw out relationship surveys just yet.

While average scores on items in relationship surveys might be decreased due to the availability heuristic, the correlation among items should not be impacted by the availability heuristic because correlations are independent of scale values; decreasing ratings by a constant across all customers does not have any effect on the correlation coefficients among the items being rated. Consequently, the same drivers of satisfaction/loyalty would be found irrespective of survey type.

I’d like to hear your thoughts.

Source: Relationships, Transactions and Heuristics

How the Right Loyalty and Operational Metrics Drive Service Excellence – Webinar


Last week, I spoke at the CustomerThink Customer Experience Thought Leader Forum, which includes customer experience researchers and practitioners sharing leading-edge practices. Bob Thompson, founder of CustomerThink, organized several sessions focusing on specific CX issues facing business today. In our session, titled Customer Service Excellence: How to Optimize Channel and Metrics to Drive Omnichannel Excellence, Stephen Fioretti, VP at Oracle, and I addressed two areas about customer service. He talked about how customer service organizations can align their channel strategy to customer needs by guiding customers to the right channel based on the complexity and time sensitivity of interactions. I talked about the different types of metrics that help us understand relationship-level and transaction-level attitudes around service quality.

Self-Service Channel Adoption Increases but Delivers a Poor Experience

Stephen reported some interesting industry statistics from Forrester and the Technology Services Industry Association. While the adoption of self-service is on the rise, customers are substantially less satisfied with these channels (47% satisfied) than with the traditional (and still most popular) telephone channel (74% satisfied). So while automated service platforms save companies money, they do so at the peril of the customer experience. As more customers adopt these automated channels, companies need to ensure they deliver a great self-service experience.

Improving the Customer Experience of Automated Channels through Behavioral/Attitudinal Analytics

In the talk, I showed how companies, by using linkage analysis, can better understand the self-service channel by analyzing the data behind the transactions, both behavioral and attitudinal. After integrating different data silos, companies can apply predictive analytics on their customer-generated data (e.g., web analytics) to make predictions about customers’ satisfaction with the experience. Using web analytics of online behavior patterns, companies might be able to profile customers who are predicted to be dissatisfied and intervene during the transaction to either improve their service experience or ameliorate its negative impact.

Stephen and I cover a lot more information in the webinar. To learn more, you can access the complete CX Forum webinar recording and slides here (free registration required).


Source: How the Right Loyalty and Operational Metrics Drive Service Excellence – Webinar

Beachbody Gets Data Management in Shape with Talend Solutions

This post is co-authored by Hari Umapathy, Lead Data Engineer at Beachbody, and Aarthi Sridharan, Sr. Director of Data (Enterprise Technology) at Beachbody.

Beachbody is a leading provider of fitness, nutrition, and weight-loss programs that deliver results for our more than 23 million customers. Our 350,000 independent “coach” distributors help people reach their health and financial goals.

The company was founded in 1998, and has more than 800 employees. Digital business and the management of data is a vital part of our success. We average more than 5 million monthly unique visits across our digital platforms, which generates an enormous amount of data that we can leverage to enhance our services, provide greater customer satisfaction, and create new business opportunities.

Building a Big Data Lake

One of our most important decisions with regard to data management was deploying Talend’s Real Time Big Data platform about two years ago. We wanted to build a new data environment, including a cloud-based data lake, that could help us manage the fast-growing volumes of data and the growing number of data sources. We also wanted to glean more and better business insights from all the data we are gathering, and respond more quickly to changes.

We are planning to gradually add at least 40 new data sources, including our own in-house databases as well as external sources such as Google Adwords, Doubleclick, Facebook, and a number of other social media sites.

We have a process in which we ingest data from the various sources, store the data that we ingested into the data lake, process the data and then build the reporting and the visualization layer on top of it. The process is enabled in part by Talend’s ETL (Extract, Transform, Load) solution, which can gather data from an unlimited number of sources, organize the data, and centralize it into a single repository such as a data lake.

We already had a traditional, on-premise data warehouse, which we still use, but we were looking for a new platform that could work well with both cloud and big data-related components, and could enable us to bring on the new data sources without as much need for additional development efforts.

The Talend solution enables us to execute new jobs again and again as we add new data sources to the data lake, without having to write new code each time. We now reuse an existing job as a template and just bring in a different set of parameters. That saves time and money and shortens the turnaround time for any new data acquisition we undertake as an organization.
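Talend jobs are configured graphically rather than in code, but the template-plus-parameters reuse pattern described above can be sketched in plain Python. The source names, URIs, and paths below are hypothetical, not Beachbody’s actual configuration:

```python
# One generic ingest "job"; each new data source is just a new
# parameter set, not new code. All names below are hypothetical.
def run_ingest_job(source_name, connection_uri, target_path):
    print(f"extract: {source_name} from {connection_uri}")
    print(f"load:    {target_path}")

sources = [
    {"source_name": "google_adwords",
     "connection_uri": "api://adwords",
     "target_path": "s3://lake/raw/adwords"},
    {"source_name": "facebook_ads",
     "connection_uri": "api://facebook",
     "target_path": "s3://lake/raw/facebook"},
]

# Adding another source means appending one more parameter set.
for params in sources:
    run_ingest_job(**params)
```

The design choice is the same either way: keep the job logic generic and push everything source-specific into parameters.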

The Results of Digital Transformation

For example, whenever a business analytics team or other group comes to us with a request for a new job, we can usually complete it over a two-week sprint. The data will be there for them to write any kind of analytics queries on top of it. That’s a great benefit.

The new data sources we are acquiring allow us to bring all kinds of data into the data lake. For example, we’re adding information such as reports related to the advertisements that we place on Google sites, the user interaction that has taken place on those sites, and the revenue we were able to generate based on those advertisements.

We are also gathering clickstream data from our on-demand streaming platform, and all the activities and transactions related to that. And we are ingesting data from the marketing cloud, which has all the information related to the email marketing that we do. For instance, there’s data about whether people opened the email, whether they responded to the email and how.

Currently, we have about 60 terabytes of data in the data lake, and as we continue to add data sources we anticipate that the volume will at least double in size within the next year.

Getting Data Management in Shape for GDPR

One of the best use cases we’ve had that’s enabled by the Talend solution relates to our efforts to comply with the General Data Protection Regulation (GDPR). The regulation, a set of rules created by the European Parliament, European Council, and European Commission that took effect in May 2018, is designed to bolster data protection and privacy for individuals within the European Union (EU).

We leverage the data lake whenever we need to quickly access customer data that falls under the domain of GDPR. So when a customer asks us for data specific to that customer we have our team create the files from the data lake.

The entire process is simple, making it much easier to comply with such requests. Without a data lake that provides a single, centralized source of information, we would have to go to individual departments within the company to gather customer information. That’s far more complex and time-consuming.

When we built the data lake it was principally for the analytics team. But when different data projects such as this arise we can now leverage the data lake for those purposes, while still benefiting from the analytics use cases.

Looking to the Future

Our next effort, which will likely take place in 2019, will be to consolidate various data stores within the organization with our data lake. Right now different departments have their own data stores, which are siloed. Having this consolidation, which we will achieve using the Talend solutions and the automation these tools provide, will give us an even more convenient way to access data and run business analytics on the data.

We are also planning to leverage the Talend platform to increase data quality. Now that we’re increasing our data sources and getting much more into data analytics and data science, quality becomes an increasingly important consideration. Members of our organization will be able to use the data quality side of the solution in the upcoming months.

Beachbody has always been an innovative company when it comes to gleaning value from our data. But with the Talend technology we can now take data management to the next level. A variety of processes and functions within the company will see use cases and benefits from this, including sales and marketing, customer service, and others.

About the Authors: 

Hari Umapathy

Hari Umapathy is a Lead Data Engineer at Beachbody working on architecting, designing, and developing their data lake using AWS, Talend, Hadoop, and Redshift. Hari is a Cloudera Certified Developer for Apache Hadoop. Previously, he worked at Infosys Limited as a Technical Project Lead managing applications and databases for a major automotive manufacturer in the United States. Hari holds a bachelor’s degree in Information Technology from Vellore Institute of Technology, Vellore, India.


Aarthi Sridharan

Aarthi Sridharan is the Sr. Director of Data (Enterprise Technology) at Beachbody LLC, a health and fitness company in Santa Monica. Aarthi’s leadership drives the organization’s ability to make data-driven decisions for accelerated growth and operational excellence. Aarthi and her team are responsible for ingesting and transforming large volumes of data into the traditional enterprise data warehouse and the data lake, and for building analytics on top of it.


Source: Beachbody Gets Data Management in Shape with Talend Solutions

Do Detractors Really Say Bad Things about a Company?

Can you think of a bad experience you had with a company?

Did you tell a friend about the bad experience?

Negative word of mouth can be devastating for company and product reputation. If companies can track it and do something to fix the problem, the damage can be contained.

This is one of the selling points of the Net Promoter Score: customers who rate companies low on the 0 to 10 scale (6 and below) are dubbed “Detractors” because they’re more likely to spread negative word of mouth and discourage others from buying from a company. Companies with too much negative word of mouth would be unable to grow as much as those with more positive word of mouth.
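For reference, the score built on these designations is simply the percentage of promoters minus the percentage of detractors; a minimal sketch:

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100 * (promoters - detractors) / len(scores)

# 4 promoters, 2 passives, 2 detractors out of 8 respondents.
print(nps([10, 9, 9, 8, 7, 6, 3, 10]))  # 25.0
```

Passives (7–8) count toward the total number of respondents but neither add to nor subtract from the score.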

But is there any evidence that low scorers are really more likely to say bad things?

Is the NPS Scoring Divorced from Reality?

There is some concern that these NPS designations are divorced from reality—that is, that there’s no evidence (or reason) for classifying detractors as 0 to 6 and promoters as 9 to 10. If these designations are indeed arbitrary, that’s concerning. (See the tweet from a vocal critic in Figure 1.)


Figure 1: Example of a concern being expressed about the validity of the NPS designations.

To look for evidence for the designations, I re-read the 2003 HBR article by Fred Reichheld that made the NPS famous. Reichheld mentions that the reason for the promoter classification is customer referral and repurchase rates but doesn’t provide much detail (not too surprising given it’s an HBR article) or address the reason for the detractor cutoff.


Figure 2: Quote from the HBR article “The One Number You Need to Grow,” showing the justification for the designation of detractors, passives, and promoters.

In his 2006 book, The Ultimate Question, Reichheld further explains the justification for the cutoffs between detractors, passives, and promoters. In analyzing several thousand comments, he reported that 80% of the negative word-of-mouth comments came from those who responded 0 to 6 on the likelihood-to-recommend item (p. 30). He also reported that 80% of customer referrals came from promoters (9s and 10s).

Contrary to at least one prominent UX voice on social media, there is some evidence and justification for the designations. It’s based on referral and repurchase behaviors and the sharing of negative comments. This might not be enough evidence to convince people (and certainly not dogmatic critics) to use these designations though. It would be good to find corroborating data.

The Challenges with Purchases and Referrals

Corroborating the promoter designation means finding purchases and referrals. It’s not easy associating actual purchases and actual referrals with attitudinal data. You need a way to associate customer survey data with purchases and then track purchases from friends and colleagues. Privacy issues aside, even in the same company, purchase data is often kept in different (and guarded) databases making associations challenging. It was something I dealt with constantly while at Oracle.

What’s more, companies have little incentive to share repurchase rates and survey data with outside firms and third parties may not have access to actual purchase history. Instead, academics and researchers often rely on reported purchases and reported referrals, which may be less accurate than records of actual purchases and actual referrals (a topic for an upcoming article). It’s nonetheless common in the Market Research literature to rely on stated past behavior as a reasonable proxy for actual behavior. We’ll also address purchases and referrals in a future article.

Collecting Word-of-Mouth Comments

But what about the negative comments used to justify the cutoff between detractors and passives? We wanted to replicate Reichheld’s findings that detractors accounted for a substantial portion of negative comments using another dataset to see whether the pattern held.

We looked at open-ended comments we collected from about 500 U.S. customers regarding their most recent experiences with one of nine prominent brands and products. We collected the data ourselves from an online survey in November 2017. It included a mix of airlines, TV providers, and digital experiences. In total, we had 452 comments regarding the most recent experience with the following brands/products:

  • American Airlines
  • Delta Airlines
  • United Airlines
  • Comcast
  • DirecTV
  • Dish Network
  • Facebook
  • iTunes
  • Netflix

Participants in the survey also answered the 11-point Likelihood to Recommend question, as well as a 10-point and 5-point version of the same question.

Coding the Sentiments

The open-ended comments were coded into sentiments by two independent evaluators. Negative comments were coded -1, neutral 0, and positive 1. During the coding process, the evaluators didn't have access to the raw LTR scores (0 to 10) or other quantitative information.

In general, there was good agreement between the evaluators. The correlation between sentiment scores was high (r = .83) and they agreed 82% of the time on scores. On the remaining 18% where there was disagreement, differences were reconciled, and a sentiment was selected.

Most comments were neutral (43%) or positive (39%), with only 18% of the comments being coded as negative.

Examples of positive comments

“I flew to Hawaii for vacation, the staff was friendly and helpful! I would recommend it to anyone!”—American Airlines Customer

“I love my service with Dish network. I use one of their affordable plans and get many options. I have never had an issue with them, and they are always willing to work with me if something has financially changed.”—Dish Network Customer

Examples of neutral comments

“I logged onto Facebook, checked my notifications, scrolled through my feed, liked a few things, commented on one thing, and looked at some memories.”—Facebook User

“I have a rental property and this is the current TV subscription there. I access the site to manage my account and pay my bill.”—DirecTV User

Examples of negative comments

“I took a flight back from Boston to San Francisco 2 weeks ago on United. It was so terrible. My seat was tiny and the flight attendants were rude. It also took forever to board and deboard.”—United Airlines Customer

“I do not like Comcast because their services consistently have errors and we always have issues with the internet. They also frequently try to raise prices on our bill through random fees that increase over time. And their customer service is unsatisfactory. The only reason we still have Comcast is because it is the best option in our area.”—Comcast Customer

Associating Sentiments to Likelihood to Recommend (Qual to Quant)

We then associated each coded sentiment with the 0 to 10 values on the Likelihood to Recommend item provided by the respondent. Figure 3 shows this relationship.

Figure 3: Percent of positive or negative comments associated with each LTR score from 0 to 10.

For example, 24% of all negative comments were associated with people who gave a 0 on the Likelihood to Recommend scale (the lowest response option). In contrast, 35% of positive comments were associated with people who scored the maximum 10 (most likely to recommend). This is further evidence for the extreme responder effect we’ve discussed in an earlier article.

You can see a pattern: as the score increases from 0 to 10, the percent of negative comments goes down (r = -.71) and the percent of positive comments goes up (r = .87). The relationship between comment sentiment and scores isn't perfectly linear (otherwise the correlations would be 1). For example, the percent of positive comments is actually higher at responses of 8 than 9, and the percent of negative comments is higher at 5 than 4 (possibly an artifact of this sample size). Nonetheless, this relationship is very strong.

Detractor Threshold Supported

What’s quite interesting from this analysis is that at a score of 6, the ratio of positive to negative comments flips. Respondents with scores above a 6 (7s-10s) are more likely to make positive comments about their most recent experience. Respondents who scored their Likelihood to Recommend at 6 and below are more likely to make negative comments (spread negative word of mouth) about their most recent experience.

At a score of 6, a participant is about 70% more likely to make a negative comment than a positive comment (10% vs. 6% respectively). As scores go lower, the ratio goes up dramatically. At a score of 5, participants are more than three times as likely to make a negative comment as a positive comment. At a score of 0, customers are 42 times more likely to make a negative comment than a positive one (24% vs. 0.6% respectively).

When aggregating the raw scores into promoters, passives, and detractors, we can see that a substantial 90% of negative comments are associated with detractors (0 to 6s). This is shown in Figure 4.

The positive pattern is less pronounced, but still a majority (54%) of positive comments are associated with promoters (9s and 10s). It’s also interesting to see that the passives (7s and 8s) have a much more uniform chance of making a positive, neutral, or negative comment.

This corroborates the data from Reichheld, which showed 80% of negative comments were associated with those who scored 0 to 6. He didn’t report the percent of positive comments with promoters and didn’t associate the responses to each scale point as we did here (you’re welcome).

Figure 4: Percent of positive or negative comments associated with each NPS classification.
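If you want to reproduce this kind of aggregation on your own data, the standard NPS designations and the negative-comment tally can be sketched like this (the scores and sentiments below are illustrative, not the study's data):

```python
from collections import Counter

def nps_group(score: int) -> str:
    """Standard NPS designations on the 0-10 LTR scale."""
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"

# Hypothetical (LTR score, sentiment) pairs; sentiment coded -1/0/1 as in the study
responses = [(0, -1), (3, -1), (5, -1), (6, -1), (6, 0),
             (7, 0), (8, 1), (9, 1), (10, 1), (10, 1)]

# Tally which NPS group each negative comment came from
negatives = Counter(nps_group(score) for score, sentiment in responses if sentiment == -1)
share = negatives["detractor"] / sum(negatives.values())
print(f"{share:.0%} of negative comments came from detractors")
```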

If your organization uses a five-point Likelihood to Recommend scale (5 = extremely likely and 1 = not at all likely), there are similar patterns, albeit on a more compressed scale (see Figure 5). At a response of 3, the ratio of positive to negative comments also flips—making responses of 3 or below good designations for detractors. At a score of 3, a customer is almost four times as likely to make a negative comment about their experience as a positive comment.

Figure 5: Percent of positive or negative comments associated with each LTR score from 1 to 5 (for companies that use a 5-point scale).

Summary & Takeaways

An examination of 452 open-ended comments about customers' most recent experiences with nine prominent brands and products revealed:

  • Detractors accounted for 90% of negative comments. This independent evaluation corroborates the earlier analysis by Reichheld that found detractors accounted for a majority of negative word-of-mouth comments. This smaller dataset actually found a higher percentage of negative comments associated with 0 to 6 responses than Reichheld reported.
  • Six is a good threshold for identifying negative comments. The probability a comment will be negative (negative word of mouth) starts to exceed positive comment probability at 6 (on the 11-point LTR scale) and 3 (on a 5-point scale). Researchers looking at LTR scores alone can use this threshold to provide some idea about the probability of the customer sentiment about their most recent experience.
  • Repurchase and referral rates need to be examined. This analysis didn’t examine the relationship between referrals or repurchases (reported and observed) and likelihood to recommend, a topic for future research to corroborate the promoter designation.
  • Results are for specific brands used. In this analysis, we selected a range of brands and products we expected to represent a good range of NPS scores (from low to high). Future analyses can examine whether the pattern of scores at 6 or below correspond to negative sentiment in different contexts (e.g. for the most recent purchase) or for other brands/products/websites.
  • Think probabilistically. This analysis doesn’t mean a customer who gave a score of 6 or below necessarily had a bad experience or will say bad things about a company. Nor does it mean that a customer who gives a 9 or 10 necessarily had a favorable experience. You should think probabilistically about UX measures in general and NPS too. That is, it’s more likely (higher probability) that as scores go down on the Likelihood to Recommend item, the chance someone will be saying negative things goes up (but doesn’t guarantee it).
  • Examine your relationships between scores and comments. Most companies we work with have a lot of NPS data associated with verbatim comments. Use the method of coding sentiments described here to see how well the detractor designation matches sentiment and, if possible, see how well the promoter designations correspond with repurchase and referral rates or other behavioral measures (and consider sharing your results!).
  • Take a measured approach to making decisions. Many aspects of measurement aren’t intuitive and it’s easy to dismiss what we don’t understand or are skeptical about. Conversely, it’s easy to accept what’s “always been done” or published in high profile journals. Take a measured approach to deciding what’s best (including on how to use the NPS). Don’t blindly accept programs that claim to be revolutionary without examining the evidence. And don’t be quick to toss out the whole system because it has shortcomings or is over-hyped (we’d have to toss out a lot of methods and authors if this were the case). In all cases, look for corroborating evidence…probably something more than what you find on Twitter.


Humana Using Analytics to Improve Health Outcomes


Earlier this year a CDW survey revealed that analytics is a top priority for two thirds of decision-makers in the health care industry. Nearly 70 percent of respondents said they were planning for or already implementing analytics.

This is no surprise, given the strong results seen by analytics from early adopters like Humana.

The health insurer has made analytics a foundational piece of its clinical operations and consumer engagement efforts. Humana uses predictive models to identify members who would benefit from regular contact with clinical professionals, helping them coordinate care and making needed changes in healthy lifestyle, diet and other areas. This proactive approach results in improved quality of life for members, at a lower cost, said Dr. Vipin Gopal, Enterprise VP, Clinical Analytics.

According to Humana, it identified 1.9 million members with high risk for some aspect of their health through predictive models in 2014. It also used analytics to detect and close 4.3 million instances where recommended care, such as an eye exam for a member with diabetes, had not been given. In those cases, Humana notified members and their physicians so the gaps in care could be addressed.

“Every touch point with the health care system yields data, whether it’s a physician visit or a visit to a hospital or an outpatient facility,” Gopal said. “We use analytics to understand what can be done to improve health outcomes. Humana has over 15,000 care managers and other professionals who work with members to coordinate care and help them live safely at home, even when faced with medical and functional challenges. All of that work is powered by analytics.”

While health care has lagged other industries in adopting analytics, it accumulates a large volume of data which can be used to generate useful insights, Gopal said, adding, “Health care can hugely benefit from the analytics revolution.”

Until recently, Gopal said, many in health care “did not see analytics as a key component of doing business.” That is rapidly changing, however, largely based on the example of companies like Humana.

Real-time Analytics

Humana also used predictive analytics to help reduce the hospital readmission rate by roughly 40 percent through its Humana at Home programs. After noting that about one in six members enrolled in Humana’s Medicare plans were readmitted within 30 days of a hospital visit, the company built a predictive model to determine which members were most likely to get readmitted. It created a score quantifying the likelihood of readmission for each member; if the score rose above a certain point, a clinician would immediately follow up with the member.

This effort is especially notable, Gopal said, because it incorporates real-time analytics.

“When you are discharged from the hospital, for instance, the score is updated in real time and sent to a nurse,” he said. “If you are trying to prevent a readmission from happening within 30 days, you cannot run a predictive model once a month or even once a week.”
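Purely as an illustration (Humana hasn't published its model or threshold, so the cutoff, names, and data below are invented), the real-time alerting loop Gopal describes might look like:

```python
# Invented threshold; the actual model and cutoff are not public
THRESHOLD = 0.7

def on_event(member_scores: dict, alerts: list, member_id: str, new_score: float) -> None:
    """Update a member's readmission-risk score and alert when it crosses the threshold."""
    previous = member_scores.get(member_id, 0.0)
    member_scores[member_id] = new_score
    if new_score >= THRESHOLD > previous:
        alerts.append(member_id)  # stand-in for routing the case to a nurse

scores, alerts = {}, []
on_event(scores, alerts, "M123", 0.4)   # routine update: below threshold, no alert
on_event(scores, alerts, "M123", 0.8)   # discharge event pushes the score over the cutoff
print(alerts)  # ['M123']
```

The key design point is that the score is recomputed on every event rather than on a batch schedule, which is what makes a 30-day readmission window actionable.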

One of Humana’s latest efforts involves using analytics to address the progression of diseases like diabetes, which Gopal said affects about 30 percent of senior citizens. It is classifying its members with diabetes into low, medium and high severity categories. As a person goes from low to high severity, costs of care increase by seven times and quality of life steeply declines. Foot wounds go up 36 times, for example, and the number of foot amputations rises. So Humana is using predictive models to identify members most likely to progress and, hopefully, to slow progression through clinical interventions.

“Really understanding the variables through deep analytics and helping people to not progress, will be huge for our members and for overall public health as well,” Gopal said.

Keys to Analytics Success

Humana has benefited from a relatively mature technology infrastructure, a supportive CEO and an analytics team that Gopal built by design to include a mix of professionals with varied backgrounds — not just data scientists but those with backgrounds in public health, computer science, applied math and engineering.

Of his team, Gopal said, “These are deep problems, and we need the best multidisciplinary talent working on them. It’s not something just public health people can solve, or just computer science people can solve.”

Perhaps the biggest factor in Humana’s success with its analytics program, Gopal said, is using analytics to solve meaningful challenges.

“We do not work on stuff just because it’s cool to do, we work on problems where we can make a direct impact on the business,” he said. “That is how we select projects, and see it through to implementation and right through to results.”

Gopal will discuss Humana’s use of analytics in a presentation at TechFestLou, a conference hosted by the Technology Association of Louisville Kentucky next week in Louisville. A full schedule of events and other information is available on the event website.

Note: This article originally appeared in Enterprise Apps Today.


Big Data Will Keep the Shale Boom Rolling

The number of active oil rigs in the United States continued to fall in May, as low prices pushed oil companies to temporarily shut down some of their production facilities. Since the end of May 2014, the U.S. rig count has fallen from 1,536 to 646, according to the energy analysis firm Platts—a 58 percent drop.

Low prices and plummeting rig counts have prompted a gusher of headlines claiming that the shale oil revolution, which by early this year boosted American oil production to nearly 10 million barrels a day, is grinding to a halt. The doomsayers, however, are missing a key parallel trend: lower prices are prompting unprecedented innovation in the oil fields, increasing production per well and slashing costs.
That’s the main reason that even as rig counts have fallen, total production has held steady or continued to rise. In the Eagle Ford, a major shale formation in South Texas, production in April was 22 percent higher than the same month in 2014, according to Platts.

In fact, some observers expect a second wave of technological innovation in shale oil production that will equal or surpass the first one, which was based on horizontal drilling and hydraulic fracturing, or fracking. Fueled by rapid advances in data analytics—aka big data—this new wave promises to usher in a second American oil renaissance: “Shale 2.0,” according to a May 2015 report by Mark Mills, a senior fellow at the Manhattan Institute, a free-market think tank.

Much of the new technological innovation in shale comes from a simple fact: practice makes perfect. Tapping hydrocarbons in “tight,” geologically complex formations means drilling lots and lots of wells—many more than in conventional oil fields. Drilling thousands of wells since the shale revolution began in 2006 has enabled producers—many of them relatively small and nimble—to apply lessons learned at a much higher rate than their counterparts in the conventional oil industry.

This “high iteration learning,” as Judson Jacobs, senior director for upstream analysis at energy research firm IHS, describes it, includes a shift to “walking rigs,” which can move from one location to another on a drilling pad, allowing for the simultaneous exploitation of multiple holes. Advances in drill bits, the blend of water, sand, and chemicals used to frack shale formations, and remote, real-time control of drilling and production equipment are all contributing to efficiency gains.

Photo courtesy of Ken Hodge via Flickr

At the same time, producers have learned when to pause: more than half the cost of shale oil wells comes in the fracking phase, when it’s time to pump pressurized fluids underground to crack open the rock. This is known as well completion, and hundreds of wells in the U.S. are now completion-ready, awaiting a rise in oil prices that will make them economical to pump. Several oil company executives in recent weeks have said that once oil prices rebound to around $65 a barrel (the price was at $64.92 per barrel as of June 1), another wave of production will be unleashed.

This could help the U.S. to replace Saudi Arabia as the top swing producer—able to quickly ramp up (or down) production in response to price shifts. The real revolution on the horizon, however, is not in drilling equipment or practices: it’s in big data.

Thanks to new sensing capabilities, the volume of data produced by a modern unconventional drilling operation is immense—up to one megabyte per foot drilled, according to Mills’s “Shale 2.0” report, or between one and 15 terabytes per well, depending on the length of the underground pipes. That flood of data can be used to optimize drill bit location, enhance subterranean mapping, improve overall production and transportation efficiencies—and predict where the next promising formation lies. Many oil companies are now investing as much in information technology and data analytics as in old-school exploration and production.

At the same time, a raft of petroleum data startups, such as Ayata, FracKnowledge, and Blade Energy Partners, is offering 21st century analytics to oil companies, which have not been known for rapid, data-based innovation. Early efforts to bring modern data analytics into the oil and gas industry faltered, Jacobs says: “The oil companies tried to hire a bunch of data scientists, and teach them to be petroleum engineers. That didn’t go so well. The approach now is to take petroleum engineers and pair them up with technical experts who can supply the analytic horsepower, and try to marry these two groups.”

U.K.-based BP, for example, established a “decision analytics network” in 2012 that now employs more than 200 people “to examine ways to advance use of data and to help BP’s businesses harness these opportunities.”

If these initiatives succeed, big data could not only prolong the shale boom in the U.S., but also launch similar revolutions overseas. Applying the lessons from North America to low-producing oil fields elsewhere could unlock 141 billion barrels of oil in countries like China, Iran, Russia, and Mexico, IHS forecast in a report released last month.

To read the original article on the MIT Technology Review, click here.


Can You Use a Single Item to Predict SUS Scores?


The System Usability Scale (SUS) has been around for decades and is used by hundreds of organizations globally.

The 10-item SUS questionnaire is a measure of a user’s perception of the usability of a “system,” which can be anything from software, hardware, websites, apps, or voice interfaces.

The items are:

  1. I think that I would like to use this system frequently.
  2. I found the system unnecessarily complex.
  3. I thought the system was easy to use.
  4. I think that I would need the support of a technical person to be able to use this system.
  5. I found the various functions in this system were well integrated.
  6. I thought there was too much inconsistency in this system.
  7. I would imagine that most people would learn to use this system very quickly.
  8. I found the system very cumbersome/awkward to use.
  9. I felt very confident using the system.
  10. I needed to learn a lot of things before I could get going with this system.

Why 10 items?

The SUS was built by John Brooke using an approach inspired by what’s come to be known as Classical Test Theory in psychometrics. He started with 50 items that he thought would address the construct of what people think of when they think of the ease of use of systems.

The final set of 10 were the ones that best differentiated between a software application that was known to be easy and one that was difficult. The original study used a relatively small sample size (n = 20) and reliability figures weren't reported. Later research with more data has shown these final 10 items correlate with each other (some very highly, r > .7) and with the total SUS score (all r > .62), and have high internal consistency reliability (Cronbach's alpha > .85).

The items are somewhat redundant (hence the high intercorrelations) but some redundancy is by design. To achieve high reliability in Classical Test Theory, you essentially ask the same question about a single construct in many different ways.
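That internal consistency is typically quantified with Cronbach's alpha, which compares the sum of the item variances to the variance of respondents' total scores: alpha = k/(k-1) × (1 - Σ item variances / variance of totals). A minimal sketch on made-up data:

```python
import statistics

# Made-up responses: 3 items x 4 respondents, each on a 1-5 scale
items = [
    [4, 5, 3, 4],  # item 1
    [4, 4, 3, 5],  # item 2
    [5, 5, 2, 4],  # item 3
]
k = len(items)
totals = [sum(resp) for resp in zip(*items)]  # each respondent's total score

# alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
item_variance = sum(statistics.variance(item) for item in items)
alpha = k / (k - 1) * (1 - item_variance / statistics.variance(totals))
print(round(alpha, 2))
```

When items track each other closely (redundancy by design), the totals vary much more than the individual items do, and alpha rises toward 1.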

At the time the SUS was put forth, it was believed that these 10 items measured only one construct (perceived ease of use). With only 20 participants, it was difficult to test whether the SUS was unidimensional. The more items, the greater the chance of measuring more than one construct.

About 10 years ago, Jim Lewis and I had enough data collected to test the dimensionality of the SUS using a factor analysis. We originally found two dimensions, which we labeled “usability” and “learnability” based on the items that loaded on each factor. This finding was even replicated by other researchers. However, with more data sets we found that the two dimensions were actually an artifact of the positively worded and negatively worded items  [pdf] and not two meaningful dimensions about perceived ease.

In other words, the SUS is after all unidimensional—measuring only the construct of perceived usability. And if it’s measuring only a single construct, do we still need all 10 items?

Too Redundant?

While the SUS is relatively short, are all 10 items necessary? How much is lost by reducing the number of items? How accurate would the SUS be if we used only two items (the bare minimum to assess reliability or load on a factor), or even a single item, which we've found sufficient for other simple constructs, including individual SUS items?

To find out, we analyzed SUS scores from 16,010 respondents from 148 products and websites with between 5 and 1,969 responses per product experience—one of the largest sets of SUS scores ever analyzed.

SUS scores for an interface are created by averaging together all the individual scores. Predicting individual scores is more difficult because of the higher variability at the participant level (a topic for a future article). Because our goal was to estimate the SUS score by product, we computed SUS scores for the 148 products, then created average scores for each of the 10 items. We tested all combinations of items to predict the SUS scores.

We found three items with the highest correlations to the total SUS score: item 2 (r = .96), item 3 (r = .95), and item 8 (r = .95); the three correlations weren't statistically different from each other. Each of these items alone can explain at least a whopping 90% of the variation in SUS scores.

The best two-item combinations are items 3 and 8, followed by items 7 and 8; items 3 and 8 together account for 96% of the variability in SUS scores by product. It's interesting that item 8 is one of the best predictors because this is also the item that's caused some trouble for respondents—cumbersome is usually changed to awkward because some participants have trouble understanding what cumbersome means. We suspect this item best predicts SUS because it's related to the negatively worded tone of half the items in the SUS. It could also be that these negative items add error to the measurement (people respond to the negative wording rather than to the experience), which is a good topic for our future research.

Interestingly, there are significant diminishing returns beyond two items. The best combination of three items adds only 1.8% more explanatory power than two items, and adding a fourth item adds only 0.8% more. Table 1 shows how much explanatory power each best combination of items adds to predicting the overall SUS score compared to one fewer item.

# of Items   R-Sq (adj)   Improved Prediction
     1          91.7             —
     2          96.1             4.4
     3          97.9             1.8
     4          98.7             0.8
     5          99.2             0.5
     6          99.4             0.2
     7          99.6             0.2
     8          99.8             0.2
     9          99.9             0.1
    10         100.0             0.1

Table 1: Improved prediction in going from 1 to 2, 3… 10 items.

Using Item 3: Easy To Use

Given the similarly high correlations among the best single items, we selected item 3 (I thought the system was easy to use), which we felt has the best content validity and is used in other questionnaires (SUPR-Q, UMUX Lite). You can see the relationship between the mean of item 3 and the total SUS score in the plot in Figure 1.

Figure 1: Item 3 ("I thought the system was easy to use") accounts for 90% of the variation in total SUS scores at the product level.

To predict the SUS score from item 3, we simply use the regression equation:

SUS (estimated) = -2.279 + 19.2048 (Mean of Item 3)



To predict a SUS score using items 3 and 8, use the regression equation:

SUS (estimated) = -6.33 + 9.85 (SUS03) + 10.2 (reverse-coded item SUS08*)

*Note: The negatively worded items have been reverse-coded in our regression equation and scaled from 1 to 5 so higher values all indicate better scores.

For example, if a product receives a mean score of 3.43 on item 3, it has a predicted SUS score of about 64:

SUS (estimated) = -2.279 + 19.2048 (3.43)

SUS (estimated) = 63.59

If a product receives a mean score of 3.81 on item 3, and 3.48 on item 8,* then the predicted SUS score is about 67:

SUS (estimated) = -6.33 + 9.85 (3.81) + 10.2(3.48)

SUS (estimated) = 66.68
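For convenience, both regression equations can be wrapped in small helper functions (the coefficients come directly from the equations above; the function names are ours):

```python
def sus_from_item3(mean_item3: float) -> float:
    """Predicted SUS score from the mean of item 3."""
    return -2.279 + 19.2048 * mean_item3

def sus_from_items3_8(mean_item3: float, mean_item8_reversed: float) -> float:
    """Predicted SUS from items 3 and 8 (item 8 reverse-coded so higher = better)."""
    return -6.33 + 9.85 * mean_item3 + 10.2 * mean_item8_reversed

# The worked examples from the text
print(round(sus_from_item3(3.43), 2))           # 63.59
print(round(sus_from_items3_8(3.81, 3.48), 2))  # 66.69 (small rounding difference from the text)
```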

How much is lost?

By reducing the number of items, we, of course, lose information. We introduce errors: the fewer the items, the less accurate the prediction of the 10-item SUS score. Some estimates will overestimate the actual SUS while others will underestimate it. We can assess how much is lost when using fewer items by using the regression equations, generating a predicted SUS score, and then comparing the predicted scores to the actual scores from this data set.

For example, one of the products in the dataset had a SUS score of 76.8. (See Table 2.) Using the mean from item 3 (4.20), the predicted SUS score is 78.4 (from the regression equation). This represents an error of 1.6 points or about 2%. Using the means from items 3 and 8 in the regression equation, the predicted SUS score is 77, an error of only 0.3%.

Item                        Mean
1                           4.03
2                           4.07
3                           4.20
4                           4.03
5                           4.03
6                           4.10
7                           3.98
8                           4.12
9                           4.23
10                          3.93
Actual SUS Score            76.8
Item 3 Predicted SUS        78.4
Error                       1.6 (2%)
Items 3 & 8 Predicted SUS   77.0
Error                       0.3 (0.3%)

Table 2: Means for each of the 10 SUS items for a software product in the database.

Across all 148 products, the median absolute error is 3.5% when using item 3 alone and 2.1% when using both items 3 and 8. However, in some cases, the score for item 3 was off the mark (predictions are rarely perfect). Eight products had a predicted value that deviated by at least 6 points (the highest deviation was 13.8 points). It’s unclear whether some of these deviations can be explained by improper coding of the SUS or other scoring aberrations that may be examined in future research.

Figure 2 shows the predicted SUS score from using just item 3. For example, if the mean score is 4 on item 3 from a group of participants, the estimated SUS score is 75. Anything below 3 isn’t good (below a predicted SUS score of 55) and anything above 4.5 is very good (above a predicted SUS of 84).

Figure 2: Predicted SUS score from the mean of item 3.

Grade Change

SUS scores can be interpreted by associating letter grades based on their percentile rank. The best-performing products, above the 90th percentile, get an A (raw SUS scores of at least 80.8); average products around the 50th percentile get a C (raw SUS scores around 68); and anything below the 14th percentile gets a failing grade of F (raw SUS scores below 52).

Another way to interpret the accuracy of the prediction is to see how well the predicted SUS scores predict the associated SUS grades. A bit more than half of the product grades (57%; 84 of 148) differed between the predicted and actual SUS scores. While this seems like a lot of deviation, 57 of these 84 (68%) changed by only half a letter grade. Figure 3 shows the differences between predicted grades and actual grades from the full 10 items.

Figure 3: Grade differences between item 3 only and the full 10 items.

For example, 7 products were predicted to be a B- using only item 3 but ended up being Bs for the full 10 items. Put another way, 82% of all grades stayed the same or differed by half a letter grade (121 out of the 148 products). An example of predictions that changed by more than half a letter grade: 6 products predicted to be a B ended up being an A- or A using the full 10 items (see the Figure 3 row that starts with "B").

We can continue with the grading metaphor by assigning letter grades numbers, as is done to compute a Grade Point Average or GPA (having high school flashbacks now?). The College Board method assigns numbers to grades (A = 4, B = 3, C = 2, D = 1, F = 0), and the "+" and "-" get a 0.3 adjustment from the base letter value in the indicated direction.

Letter Grade 4.0 Scale
A+ 4.0
A 4.0
A- 3.7
B+ 3.3
B 3.0
B- 2.7
C+ 2.3
C 2.0
C- 1.7
D+ 1.3
D 1.0
F 0.0

Table 3: Number assigned to letters using the College Board designations.
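The College Board mapping and the resulting GPA calculation are easy to sketch (the list of product grades below is hypothetical):

```python
# College Board mapping: base letter value with a 0.3 adjustment for "+" and "-"
BASE = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def grade_points(grade: str) -> float:
    points = BASE[grade[0]]
    if grade.endswith("+"):
        return min(points + 0.3, 4.0)  # A+ is capped at 4.0
    if grade.endswith("-"):
        return points - 0.3
    return points

# Hypothetical product grades
grades = ["A", "B+", "C", "A-", "B-"]
gpa = sum(grade_points(g) for g in grades) / len(grades)
print(round(gpa, 2))  # 3.14
```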

The average GPA of the 148 products is 2.46 (about a C+) and the average GPA using the predicted grade is also 2.46! In fact, the difference is only 0.0014 points (not statistically different; p=.97). This is in large part because a lot of the half grade differences washed out and 7 products had predicted A scores but were actually A+ (both A and A+ have the same GPA number of 4).

Summary and Conclusions

An analysis of over 16,000 individual SUS responses across 148 products found that you can use a single item to predict a SUS score with high accuracy. Doing this has a cost though, as some information is lost (as is the case with most predictions).

Item 3 predicts SUS scores with 90% accuracy. Researchers who ask only the single "easy to use" item can still predict SUS scores with 90% accuracy and should expect the full SUS score to differ from the prediction by 3.5% on average.

Two items predict with 96% accuracy. Using only two items (items 3 and 8) predicts the SUS with 96% accuracy, with the full SUS score differing from the prediction by 2.1% on average. Future research will examine whether there are better ways to predict the SUS using different items (e.g. the UMUX-Lite).

After three items, not much is gained. There is a major diminishing return in adding items to improve the SUS score prediction. After three items, each additional item adds less than 1% to the accuracy of the SUS score.

Grades differ only slightly. Using only a single item to generate a grade, 82% of all grades were the same or differed by no more than half a letter grade (e.g. predicted A-, actual B+) compared to the full 10-item grades. Using the GPA method, the average GPA was essentially identical, suggesting the differences are minor.

Thanks to Jim Lewis and Lawton Pybus for contributing to this analysis.


Source by analyticsweek

Sisense BloX – Go Beyond Dashboards

Your boss comes to you at the end of the day and wants you to create an analytic web application for inventory management. Your first instinct is probably to get down to business coding. First, you create a sketch board, go through the UX and UI, review all the specifications, start development, QA, develop some more, and then QA some more…you know the drill.

What if I told you that you could do all of that in less than 10 minutes instead?

At Sisense Labs, we’re driven by how people will consume data in the future. So, over the past year, we have been creating a framework for developers to create their own analytics and BI applications – packaged BI capabilities for specific needs – that can be placed anywhere. We call it Sisense BloX.

Loops = Value

The idea for Sisense BloX comes as the next step in our journey to embed analytics everywhere. The idea was inspired by this piece on The Upshot, which gave us our “Eureka! moment” to give interactive functionality to our customers wherever and however they need it. Back in November 2017, I presented the idea internally here at Sisense as “Loops = Value.”

Here’s my original slide:

The slide may be pretty bare bones, but the idea was there: data allows you to create applications, applications allow you to take concrete actions, and these actions allow you to create more data. Combining higher user engagement with a low-code environment that is easy to support and deploy enables companies to become more data-driven by tying business actions to their data, and so to monetize their data investments faster.

So what is Sisense BloX?

Sisense BloX makes it easier than ever to create custom actionable analytic applications from complex data by leveraging powerful prebuilt templates to integrate application-like functionality into dashboards.

Sisense BloX is the next evolution of our Sisense Everywhere initiative, in which we unveiled integrations with products like the Amazon Echo and a smart bulb. It’s another step in Sisense Labs’ pursuit of democratizing the BI world and increasing the value of data for everyone. With Sisense BloX, we transform the world of analytics into an open platform that customizes business applications in order to be more efficient with the way we interact with our data.

Let’s break that down step by step.

First, the Sisense BloX framework includes a robust library of templates to ensure that you can get started quickly by adding new visualization options or integration points with other applications. That tedious development cycle we mentioned earlier is a thing of the past.

Then, because we live in a world where customization is key, you can customize the code of your analytics app using both HTML and JSON. Essentially, this means you can take code from anywhere on the web and simply add it to a BloX application. This helps non-developers create applications they only dreamed about before and gives developers the UX layer for their BI.

And, finally, the Sisense BloX framework includes an easy-to-use interface to expose and access many API capabilities directly in the Sisense UI using standard CSS and JSON. What we’ve done is create a low-code environment that makes these APIs accessible to a much wider range of developers and even to non-developers. You can integrate whatever action you want right into your dashboards. Anyone can create an actual BI application using this new UX layer.

Sisense BloX is currently available as a plugin in Sisense Marketplace but make no mistake, the vision is clear—soon every developer will be able to connect data with actions by using a simple coding framework and add buttons, interactivity, animation, and just about anything HTML will allow.

The Future Belongs to Action

Interacting with data is complex. With unlimited use cases and ways to use data, ensuring we provide the right analytical solution for the right scenario is critical. Sisense BloX integrates BI and UX in one platform, creating BI apps of all shapes and sizes.

Sisense BloX empowers data application designers to create business applications with actions wrapped in one container, creating a narrative and having a deeper impact on the organization’s business offering. With Sisense BloX, the paradigm shifts from dashboard designer to analytic application builder and maker. Maybe you want to create a calculator, a slider, or a form that connects and writes back to Salesforce. Sisense BloX allows for this and much more.

I’m excited to introduce Sisense BloX to the world.

Source: Sisense BloX – Go Beyond Dashboards by analyticsweek

Accountants Increasingly Use Data Analysis to Catch Fraud

When a team of forensic accountants began sifting through refunds issued by a national call center, something didn’t add up: There were too many fours in the data. And it was up to the accountants to figure out why.

Until recently, such a subtle anomaly might have slipped by unnoticed. But with employee fraud costing the country an estimated $300 billion a year, forensic accountants are increasingly wielding mathematical weapons to catch cheats.

“The future of forensic accounting lies in data analytics,” said Timothy Hedley, a fraud expert at KPMG, the firm that did the call-center audit.

In the curious case of the call centers, several hundred operators across the country were authorized to issue refunds up to $50; anything larger required the permission of a supervisor. Each operator had processed more than 10,000 refunds over several years. With so much money going out the door, there was opportunity for theft, and KPMG decided to check the validity of the payments with a test called Benford’s Law.


According to Benford’s Law—named for a Depression-era physicist who calculated the expected frequency of digits in lists of numbers—more numbers start with one than any other digit, followed by those that begin with two, then three and so on.

“The low digits are expected to occur far more frequently than the high digits,” said Mark J. Nigrini, author of Benford’s Law: Applications for Forensic Accounting, Auditing, and Fraud Detection and an accounting professor at West Virginia University. “It’s counterintuitive.”

Most people expect digits to occur at about the same frequency. But according to Benford’s Law, ones should account for 30% of leading digits, and each successive number should represent a progressively smaller proportion, with nines coming last, at under 5%.
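Those expected frequencies follow directly from the Benford formula, P(d) = log10(1 + 1/d) for leading digit d:

```python
import math

# Expected leading-digit frequencies under Benford's Law.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford.items():
    print(d, f"{p:.1%}")
# Ones lead about 30.1% of the time; nines only about 4.6%.
```

The nine probabilities sum to exactly 1, since the logarithms telescope to log10(10).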

In their call-center probe, Mr. Hedley and his colleagues stripped off the first digits of the refunds issued by each operator, calculated the frequencies and compared them with the expected distribution.

“For certain people answering the phones, the refunds did not follow Benford’s Law,” Mr. Hedley said. “In the ‘four’ category, there was a huge spike. It led us to think they were giving out lots of refunds just below the $50 threshold.”
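The audit procedure described here is straightforward to sketch: strip the leading digit of each amount, tally the observed frequencies, and compare them with the Benford expectation. The refund amounts below are invented for illustration, with a cluster just under the $50 approval threshold:

```python
import math
from collections import Counter

def first_digit_freqs(amounts):
    """Observed leading-digit proportions for positive amounts."""
    digits = [int(str(a).lstrip("0.")[0]) for a in amounts]
    counts = Counter(digits)
    n = len(amounts)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}

expected = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Hypothetical refunds: many sit just below the $50 threshold.
refunds = [12, 14, 19, 23, 28, 31, 45, 46, 47, 48, 48, 49, 49, 49]
observed = first_digit_freqs(refunds)

for d in range(1, 10):
    gap = observed[d] - expected[d]
    flag = "  <-- spike" if gap > 0.15 else ""
    print(d, f"obs {observed[d]:.1%}  exp {expected[d]:.1%}{flag}")
```

On this sample, only the digit 4 is flagged: it leads 8 of 14 refunds (about 57%) against a Benford expectation of roughly 9.7%, exactly the kind of spike the KPMG team saw. In practice, such an anomaly is a prompt for further investigation, not proof of fraud.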

The accountants identified a handful of operators—fewer than a dozen—who had issued fraudulent refunds to themselves, friends and family totaling several hundred thousand dollars.

That’s a lot of $40 refunds. But before running the Benford analysis, neither the company nor its auditors had evidence of a problem.

Getting the accounting profession to adopt Benford’s Law and similar tests has been a slow process, but Mr. Nigrini has spent two decades inculcating Benford’s Law in the accounting and auditing community, promoting it through articles, books and lectures.

“It has the potential to add some big-time value,” said Kurt Schulzke, an accounting professor at Kennesaw State University in Georgia. “There has not been much innovation in the auditing profession in a long time, partly because they have ignored mathematics.”

Now, the Association to Advance Collegiate Schools of Business emphasizes the importance of analytical capabilities. Off-the-shelf forensic-accounting software packages such as IDEA and ACL include Benford’s Law tests. Even the Securities and Exchange Commission is reviewing how it can use such measures in its renewed efforts to police fraud.

Recently, at the invitation of the agency, Dan Amiram, an accounting professor at Columbia University, and his co-authors Zahn Bozanic of Ohio State University and Ethan Rouen, a doctoral student at Columbia, demonstrated their method for applying Benford’s Law to publicly available data in companies’ income statements, balance sheets and statements of cash flow. For example, a look at Enron’s notorious fraudulent accounting from 2000 showed a clear variation from Benford’s Law.

“We decided to take a different approach,” Mr. Amiram said. “Those are the main financial statements that companies report.”

Auditors, who are employed by companies to examine their accounts, are given free access to data that can reveal potential fraud. Investors and other individuals don’t have that luxury. But, Mr. Amiram said, they all have the same goals: “To make capital markets more efficient and make sure bad guys are not cheating anyone.”

Benford’s Law isn’t a magic bullet. It’s only one approach. It isn’t appropriate for all data sets. And when it is a good tool for the job, it simply identifies anomalies in data, which must be explained with further investigation. In many cases, there are reasonable explanations for incongruities.

And with so much attention now paid to Benford’s Law, it might occur to some hucksters to try to evade detection while still cheating. But Mr. Nigrini said it isn’t that simple.

“While you are doing your scheme, you don’t know what the data look like,” he said. “Because you don’t know what the population looks like while you are committing fraud, it’s a little tricky to beat Benford’s.”

Write to Jo Craven McGinty at

Originally posted via “Accountants Increasingly Use Data Analysis to Catch Fraud”
