Feb 28, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Data analyst  Source

[ AnalyticsWeek BYTES]

>> Big data: managing the legal and regulatory risks by analyticsweekpick

>> An Introduction to Apache Airflow and Talend: Orchestrate your Containerized Data Integration and Big Data Jobs by analyticsweekpick

>> Stream-based Data Integration – No Formatting Required by analyticsweek

Wanna write? Click Here

[ NEWS BYTES]

>> Financial Analytics Market Trends and Opportunities by types and Application in Grooming Regions; Edition 2018-2023 – The West Chronicle (press release) (blog) Under Financial Analytics

>> Streaming Analytics Market Analysis, Outlook, Opportunities, Size, Share Forecast and Supply Demand 2018-2025 – The Smyrna Themes (press release) (blog) Under Streaming Analytics

>> CVS Closes $69B Acquisition of Aetna, Altering Consumer Landscape – HealthPayerIntelligence.com Under Health Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Process Mining: Data science in Action

image

Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking

image

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for e… more

[ TIPS & TRICKS OF THE WEEK]

Strong business case could save your project
Like anything in corporate culture, the project is oftentimes about the business, not the technology. The same type of thinking applies to data analysis: it’s not always about the technicality but about the business implications. Data science project success criteria should include project management success criteria as well. This will ensure smooth adoption, easy buy-ins, room for wins and cooperating stakeholders. So, a good data scientist should also possess some qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q:What is a decision tree?
A: 1. Take the entire data set as input
2. Search for a split that maximizes the “separation” of the classes. A split is any test that divides the data in two (e.g. if variable2 > 10)
3. Apply the split to the input data (divide step)
4. Re-apply steps 1 to 2 to the divided data
5. Stop when you meet some stopping criteria
6. (Optional) Clean up the tree if you went too far doing splits (called pruning)

Finding a split: methods vary, from greedy search (e.g. C4.5) to randomly selecting attributes and split points (random forests)

Purity measure: information gain, Gini coefficient, Chi Squared values

Stopping criteria: methods vary, from a minimum node size to a particular confidence in the prediction or a purity threshold

Pruning: reduced error pruning, out of bag error pruning (ensemble methods)
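A minimal scikit-learn sketch of the recipe above (the dataset and parameter values are illustrative choices, not part of the Q&A):

# Fit a decision tree following the steps described above.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                      # step 1: take the data set as input
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(
    criterion="gini",        # purity measure (could also be "entropy" for information gain)
    min_samples_leaf=5,      # stopping criterion: minimum leaf size
    ccp_alpha=0.01,          # cost-complexity pruning, one way to "clean up" an overgrown tree
)
tree.fit(X_train, y_train)   # steps 2-5: search for splits, divide, recurse, stop
print("test accuracy:", tree.score(X_test, y_test))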

Source

[ VIDEO OF THE WEEK]

#FutureOfData with Rob(@telerob) / @ConnellyAgency on running innovation in agency


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

War is 90% information. – Napoleon Bonaparte

[ PODCAST OF THE WEEK]

@TimothyChou on World of #IOT & Its #Future Part 2 #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Facebook users send on average 31.25 million messages and view 2.77 million videos every minute.

Sourced from: Analytics.CLUB #WEB Newsletter

How to Architect, Engineer and Manage Performance (Part 1)

This is the first of a series of blogs on how to architect, engineer and manage performance.  In it, I’d like to attempt to demystify performance by defining it clearly as well as describing methods and techniques to achieve performance requirements. I’ll also cover how to make sure the requirements are potentially achievable.

Why is Performance Important?

To start, let’s look at a few real-world scenarios to illustrate why performance is critical in today’s business:

  • It is the January sales and you are in the queue at your favorite clothing store. There are ten people in front of you, all with 10-20 items to scan, and scanning is slow. Are you going to wait or give up? Are you happy? What would a 30% increase in scan rates bring to the business? Would it mean fewer tills and staff at quiet times? Money to save, more money to make and better customer satisfaction.
  • Let’s say you work at a large bank and you are struggling to meet the payment deadline for inter-bank transfers on peak days. If you fail to meet these, you get fined 1% of the value of all transfers waiting. You need to make the process faster, increase capacity and improve parallelism. Failing could lose you money or, worse, damage your reputation with other banks and customers.
  • You need a mortgage. XYZ Bank offers the best interest rate and they can give you a decision within a month. ABC Bank costs 0.1% more, but guarantees a decision in 72 hours. The vendors of the house you want to buy need a decision within the week.
  • Traders in New York and London can make millions by simply being 1ms faster than others. This is one of the rare performance goals, where there is no limit to how fast, except the cost versus the return.

Why is performance important? Because it means happy customers, cost savings, avoiding lost sales opportunities, differentiating your services, protecting your reputation and much more.

The Outline for Better Performance Processes

Let’s start with performance testing. While this is part of the performance process, on its own it is usually a case of “too little, too late”, and the costs of late change are often severe.

So, let me outline a better process for achieving performance and then I’ll talk a little about each part now, and in more detail in the next few blogs:

  1. Someone should be ultimately responsible for performance – the buck stops with someone, not some team.
  2. This performance owner leads the process – they help others achieve the goals.
  3. Performance goals need to be measurable and realistic, and potentially achievable – they are the subject of discussions and agreement between parties.
  4. Goals will be broken down amongst the team delivering the project – performance is a team game.
  5. Performance will be achieved in stages using tools and techniques – there will be no miracles, magic or silver bullets.
  6. Performance must be monitored, measured and managed – this is how we deal with the problems that will occur.

Responsibility for Performance

Earlier, I stated that one person should be solely responsible for performance. Let me expand upon that. When you have more than one person in charge, you can no longer be sure how the responsibility is owned. Even with two people, there will be things which you haven’t considered and which do not fit neatly into one role or the other. If possible, do not combine roles like Application Architect and Performance Architect (Leader), as this may lead to poor compromises. In my opinion, it is far better for each person to fight for one thing and have the PM or Chief Architect judge the arguments than for one person to try to weigh the benefits on their own. Clearly, in small projects this is not possible, and care is necessary to ensure anyone carrying multiple roles can balance them and bring out each role without bias. It is very easy to favour the role we know best or enjoy most.

I probably won’t return to this topic until towards the end of the series; it will be easier to understand after looking at the other parts in more detail.

The Performance Architect – Performance Expert, Mentor, Leader and Manager

In an ideal scenario, a Performance Architect’s role is to guide others through the performance improvement process, not to do it all themselves. No one is enough of an expert in all aspects of performance to do that. Performance Architects should:

  • Manage and orchestrate the process on behalf of the project.
  • Lead by taking responsibility for the division of a requirement(s) and alignment of the performance requirements across the project.
  • Provide expertise in performance, giving more specialised roles ways to solve the challenges of performance – estimation, measurement, tuning, etc.

This needs a bit more explanation in a future blog in this series, but I will cover some of the other parts first.

Setting Measurable Goals – Clear Goals that Reflect Reality

Setting measurable performance goals requires more work than you might first think. Consider this example: “95% of transactions must complete in < 5 secs”. On the face of it this seems OK, but consider:

  • Are all transactions equal – logon, account update, new account opening?
  • What do the five seconds mean – elapsed time, processing time, what about the thinking time of the user?
  • What if the transaction fails?
  • Data variations – all customers named “Peabody” versus “Mohammed” or “Smith”.

That one requirement will need to be expanded and broken down into much more detail. The whole process of getting the performance requirements “right” is a major task, and the requirements will continue to be refined as the project’s requirements change.

It is worth stating that trying to tune something to go as fast as possible is not the aim of performance. It isn’t a measurable goal: you don’t know when you’ve achieved it, and it would be very expensive.
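To make a goal like “95% of transactions must complete in < 5 secs” testable, you ultimately compare measured response times against the stated percentile. Here is a minimal Python sketch of that check; the function name and the simulated sample data are illustrative assumptions, not part of the original article.

import numpy as np

def meets_goal(response_times_secs, percentile=95, target_secs=5.0):
    """Return the observed percentile and whether it beats the target."""
    observed = np.percentile(response_times_secs, percentile)
    return observed, observed < target_secs

# e.g. login transaction times gathered from a test run (simulated here)
samples = np.random.lognormal(mean=0.8, sigma=0.5, size=10_000)
p95, ok = meets_goal(samples)
print(f"95th percentile = {p95:.2f}s, goal met: {ok}")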

This area can be involved; it takes time, effort and practice. It is also the basis for the rest of the work, so it is a topic for more discussion in the next blog.

Breaking the Goals Down – Dividing the Cake

If we have five seconds to perform the login transaction, we need to divide that time between the components and systems that will deliver the transaction. The Performance Architect will do this with the teams involved, but I’ve found that they’ll get it wrong at the start – they don’t have all the facts. That doesn’t matter, as long as it is right at the finish. You are probably struggling to see where to start; don’t worry about that for now, the next section will help.
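As a purely illustrative sketch of “dividing the cake” (the component names and weights below are invented first guesses, not the article’s numbers), a five-second login budget might be apportioned like this and then revised as facts arrive:

# Split a total latency budget across components by agreed relative weights.
total_budget_secs = 5.0
initial_weights = {                 # relative share agreed with each team (illustrative)
    "browser rendering": 1.0,
    "network round trips": 1.0,
    "web/app tier": 2.0,
    "authentication service": 2.0,
    "database": 3.0,
    "contingency": 1.0,
}
total_weight = sum(initial_weights.values())
budgets = {name: total_budget_secs * w / total_weight for name, w in initial_weights.items()}
for name, secs in budgets.items():
    print(f"{name:>25}: {secs:.2f}s")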

I’ll look at this in a future blog, but it is probably best discussed after looking at estimation techniques.

Achieving Performance – The Tools and Techniques

When most people drop a ball, they know it falls downwards because of gravity, yet not many of us are physicists. Four-year-old children haven’t studied any physics either, but they don’t seem to struggle to use that experience (knowledge).

How long will an ice cube take to melt? An immediate reaction goes something like this: “I don’t know, it depends on the temperature of the water/air around the ice cube.” So make some assumptions, provide an estimate, then start asking questions and doing research to confirm the assumptions or produce a new estimate.

How long will the process X take?  How long did it take last time? What is different this time? What is similar? What does the internet suggest? Could we do an experiment? Could we build a model? Has it been done before?

Start with an estimate (a guess, an educated guess, a rule of thumb – use the best method or methods available) and then, as the project progresses, use other techniques to model, measure and refine.

You might be saying, “But I still don’t know!” That’s correct, and you probably never will at the outset.

Statistically, it is almost certain you won’t die by being struck by lightning (Google turns up estimates of 24,000 or 6,000 people per year on Earth dying this way – no one knows the real figure; think about that, we don’t know the real answer). Most of us (or all, I hope) assume we won’t be sick tomorrow, but some people will. Nothing is certain; you don’t know the answer to almost anything in the future with accuracy, but that doesn’t stop you making reasonable assumptions and dealing with the surprises, both good and bad.

This is a huge topic and I need to spend some time on it in the series to build your confidence in your skills by showing you just some of the options you already have and can learn and develop.

“Data Science” and “Statistics” are whole areas of academic study interested in prediction, so this is more than a single topic. There are probably more ways to produce estimates than there are ways to solve the IT problem you are estimating.

Keeping on track – Monitoring, measuring and managing

Donald Rumsfeld made the point that there are things we know, things we know we don’t know, and things that surprise us (unknown unknowns). Actually, it is worse: people often think they know and are wrong, or assume they don’t know and then realise their estimate was better than the one the project used.

Risk management is how we deal with the whole performance process. Risk management, just as a Project Manager uses it, will help us manage the process. As we progress through the project we will build up our knowledge of, and confidence in, the estimates, reduce the risk, and use that information to focus our effort where the greatest risk is.

A Project Manager measures the likelihood of a risk occurring and its impact. For performance, we measure the chance of achieving the performance goal and how confident we are in the current estimate.
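One small, illustrative way to keep such a register (the goals, numbers and field names below are invented, not from the article) is to score each goal by its chance of being achieved and the confidence in that estimate, then rank where to focus effort:

from dataclasses import dataclass

@dataclass
class PerformanceRisk:
    goal: str
    chance_of_achieving: float     # 0.0-1.0, current best estimate
    confidence_in_estimate: float  # 0.0-1.0, how much evidence backs the estimate

    @property
    def priority(self) -> float:
        # low chance and low confidence both push a goal up the worry list
        return (1 - self.chance_of_achieving) * (1 - self.confidence_in_estimate)

risks = [
    PerformanceRisk("login p95 < 5s", 0.8, 0.6),
    PerformanceRisk("batch settlement by 02:00", 0.5, 0.3),
    PerformanceRisk("search p99 < 2s", 0.9, 0.8),
]
for r in sorted(risks, key=lambda r: r.priority, reverse=True):
    print(f"{r.goal:<30} priority={r.priority:.2f}")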

This will be easier to understand as we look at the other parts in more depth; we’ll revisit it in a future blog.

In future blogs in the series I will cover:

  • Setting goals – Refining the performance requirements.
  • Tools and techniques – Where to start – Estimation and research.
  • Monitoring, measuring and managing – Risk and confidence.
  • Breaking the goals down across the team.
  • Tools and techniques – Next steps – Volumetrics, models and statistics.
  • Tools and techniques – Later stages – Some testing (and monitoring) options.
  • Responsibility and leadership.

The Author

Chris first became interested in computers aged 9. His formal IT education was from ages 13 to 21, informally it has never stopped. He joined the British Computer Society (BCS) at 17 and is a Chartered Engineer, Chartered IT Professional and Fellow of the BCS. He is proud and honoured to have held several senior positions in the BCS including Trustee, Chair of Council and Chair of Membership Committee, and remains committed to IT professionalism and the development of IT professionals.

He worked for two world-leading companies before joining his third, Talend, as a Customer Success Architect. He has over 30 years of professional working experience with data and information systems. He is passionate about customer success, understanding requirements and the engineering of IT solutions.

Our Team

The Talend Customer Success Architecture (CSA) team is a growing worldwide team of 20+ highly experienced technical information professionals. It is part of Talend’s Customer Success organisation, which is dedicated to the mission of making all Talend’s customers successful and encompasses Customer Success Management, Professional Services, Education and Enablement, Support and the CSA teams.

Talend

Built for cloud data integration, Talend Cloud allows you to liberate your data from everything holding it back. Data integration is a critical step in digital transformation. Talend makes it easy.

The post How to Architect, Engineer and Manage Performance (Part 1) appeared first on Talend Real-Time Open Source Data Integration Software.

Originally Posted at: How to Architect, Engineer and Manage Performance (Part 1)

In the Absence of Data, Everyone is Right

I wrote a post last week that compared two ways to make decisions/predictions: 1) opinion-driven and 2) data-driven. I am a big believer in using data to help make decisions/predictions. Many pundits/analysts made predictions about who would win the US presidential election. Now that the election is over, we can compare how well each approach predicted the outcome. Comparing the pundits with Nate Silver, Mr. Silver is clearly the winner, predicting the winner of the presidential election in each state perfectly (yes, 50 out of 50 states) as well as the winner of the popular vote.

Summary of polling results from fivethirtyeight.com published on 11/6/2012, one day before the 2012 presidential election.

Let’s compare how each party made their predictions. While both used publicly available polling data, political pundits appeared to base their predictions on the results of specific polls. Nate Silver, on the other hand, applied his algorithm to many publicly available state-level polls. Because of sampling error, results varied across the different polls. So, even though the aggregated results of all polls pointed to a highly probable Obama win, the pundits could still find particular poll results to support their beliefs. (Here is a good summary of pundits who had predicted Romney would win the Electoral College and the popular vote.)

Next, I want to present a psychological phenomenon to help explain how the situation above unfolded. How could the pundits make decisions that were counter to the preponderance of evidence available to them? Can we learn how to improve decision making when it comes to improving the customer experience?

Confirmation Bias and Decision Making

Confirmation bias is a psychological phenomenon where people tend to favor information that confirms or supports their existing beliefs and ignores or discounts information that contradicts their beliefs.

Here are three different forms of confirmation bias with some simple guidelines to help you minimize their impact on decision making. These guidelines are not meant to be comprehensive. Look at them as a starting point to help you think more critically about how you make decisions using customer data. If you have suggestions about how to minimize the impact of confirmation bias, please share what you know. I would love to hear your opinion.

  1. People tend to seek out information that supports their beliefs or hypotheses. In our example, the pundits hand-picked specific poll results to make/support their predictions. What can you do? Specifically look for data to refute your beliefs. If you believe product quality is more important than service quality in predicting customer loyalty, be sure to collect evidence about the relative impact of service quality (compared to product quality).
  2. People tend to remember information that supports their position and not remember information that does not support their position. Don’t rely on your memory. When making decisions based on any kind of data, cite the specific reports/studies in which those data appear. Referencing your information source can help other people verify the information and help them understand your decision and how you arrived at it. If they arrive at a different conclusion than you, understand the source of the difference (data quality? different metrics? different analysis?).
  3. People tend to interpret information in a way that supports their opinion. There are a few things you can do to minimize the impact of confirmation bias. First, use inferential statistics to separate real, systematic, meaningful variance in the data from random noise (see the sketch after this list). Place verbal descriptions of the interpretation next to the graph. A clear description ensures that the graph has little room for misinterpretation. Also, let multiple people interpret the information contained in customer reports. People from different perspectives (e.g., IT vs. Marketing) might provide highly different (and revealing) interpretations of the same data.
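A brief illustration of that first guideline: a two-sample t-test tells you whether an apparent gap between two sets of customer ratings is likely to be more than noise. The groups and numbers below are simulated for illustration only.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
product_quality_ratings = rng.normal(loc=7.8, scale=1.5, size=200)  # simulated 0-10 ratings
service_quality_ratings = rng.normal(loc=7.5, scale=1.5, size=200)

t_stat, p_value = stats.ttest_ind(product_quality_ratings, service_quality_ratings)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("The difference is unlikely to be random noise.")
else:
    print("No evidence of a real difference - don't read a story into the gap.")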

Summary

My good friend and colleague, Stephen King (CEO of TCELab) put it well when describing the problem of not using data in decision-making: “In the absence of data, everyone is right.” We tend to seek out information that supports our beliefs and disregard information that does not. This confirmation bias negatively impacts decisions by limiting what data we seek out and ignore and how we use those data. To minimize the impact of confirmation bias, act like a scientist. Test competing theories, cite your evidence and apply statistical rigor to your data.

Using Big Data integration principles to organize your disparate business data is one way to improve the quality of decision-making. Data integration around your customers facilitates open dialogue across different departments and improves hypothesis testing using different customer metrics across disparate data sources (e.g., operational, constituency, attitudinal), improving the decisions that will ultimately help you win and keep customers.

Source

Feb 21, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Complex data  Source

[ AnalyticsWeek BYTES]

>> Relationships, Transactions and Heuristics by bobehayes

>> 8 ways IBM Watson Analytics is transforming business by analyticsweekpick

>> For the Bold, Bullied & Beautiful by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>> Democratizing big data: How Brainspace can create citizen data scientists in less than a day – WTOP Under Big Data

>> Global Risk Analytics Market 2018 Forecast to 2023 – Honest Facts Under Risk Analytics

>> How much do you really want artificial intelligence running your life? – American Thinker (blog) Under Artificial Intelligence

More NEWS ? Click Here

[ FEATURED COURSE]

A Course in Machine Learning

image

Machine learning is the study of algorithms that learn from data and experience. It is applied in a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need… more

[ FEATURED READ]

The Industries of the Future

image

The New York Times bestseller, from leading innovation expert Alec Ross, a “fascinating vision” (Forbes) of what’s next for the world and how to navigate the changes the future will bring…. more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up and create awareness within the organization. An aware organization goes a long way in helping get quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q:What is latent semantic indexing? What is it used for? What are the specific limitations of the method?
A: * Indexing and retrieval method that uses singular value decomposition to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text
* Based on the principle that words that are used in the same contexts tend to have similar meanings
* “Latent”: the semantic associations between words are present not explicitly but only latently
* For example: two synonyms may never occur in the same passage but should nonetheless have highly associated representations

Used for:

* Learning correct word meanings
* Subject matter comprehension
* Information retrieval
* Sentiment analysis (social network analysis)
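As a minimal sketch of the idea (the toy documents are made up, and scikit-learn’s TruncatedSVD on a TF-IDF matrix is one common way to apply the SVD step; the Q&A does not prescribe a specific library):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the car is driven on the road",
    "the truck is driven on the highway",
    "a sentiment lexicon scores the tweet",
]
tfidf = TfidfVectorizer().fit_transform(docs)          # term-document matrix
lsi = TruncatedSVD(n_components=2, random_state=0)     # SVD keeps the strongest "latent" concepts
doc_topics = lsi.fit_transform(tfidf)                  # documents mapped into concept space
print(doc_topics)                                      # similar documents end up close together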

Source

[ VIDEO OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Data that is loved tends to survive. – Kurt Bollacker, Data Scientist, Freebase/Infochimps

[ PODCAST OF THE WEEK]

Discussing Forecasting with Brett McLaughlin (@akabret), @Akamai


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Every second we create new data. For example, we perform 40,000 search queries every second (on Google alone), which works out to roughly 3.5 billion searches per day and 1.2 trillion searches per year. In August 2015, over 1 billion people used Facebook in a single day.

Sourced from: Analytics.CLUB #WEB Newsletter

Are Top Box Scores a Better Predictor of Behavior?

What does 4.1 on a 5-point scale mean? Or 5.6 on a 7-point scale?

Interpreting rating scale data can be difficult in the absence of an external benchmark or historical norms.

A popular technique used often by marketers to interpret rating scale data is the so-called “top box” and “top-two box” scoring approach.

For example, on a 5-point scale such as the one shown in Figure 1, the respondents who selected the most favorable response (“strongly agree”) fall into the top box. (See how it looks like a box and is the “top” response to select?)

Strongly Disagree (1) – Disagree (2) – Undecided (3) – Agree (4) – Strongly Agree (5), with 5 (“strongly agree”) as the top box.

Figure 1: Top box of a 5-point question.

Likewise, the top-two box counts responses in the two most extreme response options (4 and 5 in Figure 1). This approach is popular when the number of response options is between 7 and 11. For example, the 11-point Net Promoter question (“How likely are you to recommend this product to a friend?”) has the top-two boxes of 9 and 10 designated as “Promoters” (Figure 2).

The idea behind this practice is that you’re capturing only those who have the strongest feelings toward a statement. This applies to standard Likert item options (Strongly Disagree to Strongly Agree) and to other response options (Definitely Will Not Purchase to Definitely Will Purchase). The Net Promoter Score not only uses the top-two box, but also the bottom-six box approach in computing the score, which captures both kinds of extreme responders (likely to recommend and likely to dissuade others).

 

Not at all Likely (0) … Neutral (5) … Extremely Likely (10); responses 0–6 are Detractors, 7–8 are Passives, and 9–10 (the top-two box) are Promoters.

Figure 2: Top-two box for the 11-point Likelihood to Recommend (LTR) question used for the Net Promoter Score.

Of course, shifting the analysis from using means to top box percentages may seem like it provides more meaning even though it doesn’t. For example, what does it mean when 56% of respondents select 4 or 5 on a 5-point scale or 63% select 6 or 7 on a 7-point scale? Do you really have more information than with the mean? Without an external benchmark, you still don’t know whether these are good, indifferent, or poor percentages.
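For concreteness, here is a short Python sketch of the three scoring approaches discussed so far (top box, top-two box and the official NPS); the rating vectors are made up for illustration:

import numpy as np

likert_5pt = np.array([5, 4, 3, 5, 2, 4, 5, 1, 4, 5])        # 1-5 agreement item
ltr_11pt = np.array([10, 9, 7, 8, 6, 10, 3, 9, 8, 10])       # 0-10 likelihood to recommend

top_box = np.mean(likert_5pt == 5) * 100                     # % selecting the single top option
top_two_box = np.mean(likert_5pt >= 4) * 100                 # % selecting the top two options

promoters = np.mean(ltr_11pt >= 9) * 100                     # 9-10
detractors = np.mean(ltr_11pt <= 6) * 100                    # 0-6
nps = promoters - detractors                                 # official Net Promoter Score

print(f"top box: {top_box:.0f}%, top-two box: {top_two_box:.0f}%, NPS: {nps:.0f}")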

Loss of Information

The major problem with the top box approach is that you lose information in the transformation from rating scale to proportion. Should a person who responds with 1 on a 5-point scale be treated the same (computationally) as those who provide a neutral (3) response? The issues seem even more concerning on the 11-point LTR item. Are 0s and 1s really the same as 5s and 6s when determining detractors?

For example, from an analysis of 87 software products, we found converting the 11 points into essentially a two-point scale lost 4% of the information.

The negative impact is:

  1. Wider margins of error (more uncertainty)
  2. Needing a larger sample size to detect differences
  3. Changes over time or to competitors become less easy to detect with the same sample size (loss of precision)

This increase in the margin of error (and its effect on sample size) can be seen in the responses of 53 participants to their Likelihood to Recommend scores toward the brand Coca-Cola in Figure 3. Using the mean LTR response, the confidence interval width is 5.2% of the range (.57/11); for the NPS computation, the confidence interval width is 9.4% of the range (18.7/200).


Figure 3: Difference in width of confidence intervals using the mean (right panel) or the official NPS (left panel) from LTR toward Coca-Cola. The NPS margin of error is almost twice the width of the mean (twice the uncertainty).
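A hedged sketch of the comparison behind Figure 3, using a simple bootstrap on simulated 0–10 responses (illustrative values, not the actual Coca-Cola data), so each interval width can be expressed on its own range (11 points for the mean, 200 points for NPS):

import numpy as np

rng = np.random.default_rng(1)
ltr = rng.integers(0, 11, size=53)            # 0-10 likelihood-to-recommend responses

def nps(x):
    return (np.mean(x >= 9) - np.mean(x <= 6)) * 100

boot_means, boot_nps = [], []
for _ in range(5000):
    sample = rng.choice(ltr, size=ltr.size, replace=True)
    boot_means.append(sample.mean())
    boot_nps.append(nps(sample))

mean_ci = np.percentile(boot_means, [2.5, 97.5])
nps_ci = np.percentile(boot_nps, [2.5, 97.5])

# express each interval width as a percentage of its scale's range
print("mean CI width as % of range:", (mean_ci[1] - mean_ci[0]) / 11 * 100)
print("NPS  CI width as % of range:", (nps_ci[1] - nps_ci[0]) / 200 * 100)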

Moving the Mean or the Extremes?

The intent of using measures like customer satisfaction, likelihood to recommend, and perceived usability is of course not just an exercise in moving the mean from 4.5 to 5.1. It should be about using generally easy-to-collect leading indicators to predict harder-to-measure behavior.

This is the general idea behind models like the service profit chain: Increased customer satisfaction is expected to lead to greater customer retention. Improved customer retention leads to greater profitability.

Reichheld and others have argued, though, that in fact it’s not the mean companies should be concerned with, but rather the extreme responders, which have a better association with repurchasing (growth). In his 2003 HBR article, Reichheld says:

“‘Promoters,’ the customers with the highest rates of repurchase and referral, gave ratings of nine or ten to the [likelihood to recommend] question.”

Reichheld also talks about the impact of extremely low responses (detractors). But is there other evidence to support the connection between extreme attitudes and behavior that Reichheld found?

The Extremes of Attitudes

There is evidence that attitudes (at least in some situations) don’t follow a simple linear pattern and in fact, it’s the extremes in attitudes, which are better predictors of behavior.

Oliver et al. (1997) suggest that moderate attitudes fall into a “zone of indifference” and only when attitudes become extremely positive or negative do they begin to map to behavior.

Anderson and Mittal (2000) also echo this non-linear relationship and asymmetry and note that often a decrease in satisfaction will have a greater impact on behavior than an equivalent increase. They describe two types of attributes:

  • Satisfaction-maintaining attributes are what customers expect and are more likely to exhibit “negative asymmetry.” For example, consumers have come to expect clear calls and good coverage from their wireless provider; when clarity and coverage go down, consumers get angry. As such, performance changes in the middle of a satisfaction scale are more consequential than those at the upper extreme of satisfaction (i.e. 5 out of 5).
  • Satisfaction-enhancing attributes exhibit positive asymmetry. These are often called delighters and changes in the upper range have more consequence than the middle range. For example, having free Wi-Fi on an airplane may delight customers and lead to higher levels of repeat purchasing and recommending. In this case, changes in the upper extremes of satisfaction have a better link to behavior.

van Doorn et al. (2007) conducted two studies of Dutch consumers to understand the relationship between attitudes and behavior. In the first study, they surveyed 266 Dutch consumers using five 6-point rating scales on environmental consciousness. They found an exponential relationship between attitude toward the environment and the number of categories of organic products purchased (e.g. meat, eggs, fruit).


Figure 4: Relationship between the number of organic products purchased and environmental concerns from van Doorn et al. (2007) show a non-linear pattern.

They found the relationship between environmental concern and the number of organic product categories purchased is negligible for environmental concern below 5, but for extremely high levels of environmental concern, the relation is much stronger than in the linear model (see Figure 4).

In a second study, they examined the relationship between the number of loyalty cards and attitudes toward privacy from 3,657 Dutch respondents in 2004. They used two 5-point items asking about privacy concerns. In this study though, they found weaker evidence for the non-linear relationship but still found that privacy scores below 2.5 didn’t have much impact on loyalty cards. For privacy scores above 2.5 (see Figure 5), the average number of customer cards decreased more rapidly (less linear).


Figure 5: Relationship between privacy concerns and number of loyalty cards from van Doorn et al. (2007) also show a non-linear pattern.

van Doorn et al. (2007) concluded it makes sense to target only those consumers close to or at the extreme points of the attitudinal scale: bottom-two box and top-two box.

The authors argue that in some circumstances it makes more sense to pay attention to the extremes (echoing Anderson and Mittal). Customers with very low satisfaction (bottom box) may have a greater effect on things like churn. Likewise, high satisfaction (top-two box) customers are likely to drive customer retention, which means that efforts should be made to shift customers just beneath the top-two box to above the threshold.

This asymmetry was also seen by Mittal, Ross, and Baldasare (1998). [pdf] Three studies in the healthcare and automotive industries found that overall satisfaction and repurchase intentions are affected asymmetrically: negative outcomes had a disproportionate impact on satisfaction.

But not all studies show this effect with extremes. Morgan and Rego (2006), in their analysis of U.S. companies, showed that top-two box scores are a good predictor of future business performance, but actually perform slightly worse than using average satisfaction (they used a Net Promoter type question in their analysis).

de Haan et al. (2015), using data from 93 Dutch services firms in 18 industries, found that top-two box customer satisfaction performed best for predicting customer retention among 1,375 customers in a two-year follow-up survey. The top-two box satisfaction score and the officially scored NPS (its top-two minus bottom-six approach) were slightly better predictors of customer retention than the full-scale means (Sat Mean r = .15 vs. Sat Top-Two Box r = .18; NPS Mean r = .16 vs. NPS Scored r = .17). They suggested it’s useful to transform scores to focus on very positive (or very negative) groups when predicting customer metrics, including customer retention and tenure.

Extremes of UX Attitudes

Echoing this effect of extreme attitudes on behavior, in an analysis I conducted in 2012 for a wireless carrier I looked at the relationship between attitudes toward a phone handset’s usability (using SUS) and likelihood to recommend it (NPS), and how both related to return rates.

Running a linear regression on both SUS and NPS to predict return rates at a product (not individual) level, I was able to explain 8% and 14% of the variability in return rates, respectively. However, when I transformed the data into extremes (e.g. SUS > 80 = high; NPS > 30% = high and NPS < -25% = low), I was able to more than double the explanatory power of attitude predicting behavior, to 20% and 27% R-squared, respectively.

This can be seen in Figure 6 (the pictures are for illustration only). Handsets with the highest SUS scores had less than half the return rate of handsets that scored average or below. This illustrates the non-linear relationship: movement of SUS scores from horrible (in the 30s-40s) to below average (50s-60s) didn’t affect the return rate.


Figure 6: Non-linear relationship between SUS (usability) and 30-day return rates (phone images are for illustration purposes and don’t represent the actual phones in the analysis).
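A simulated illustration of this kind of comparison (not the handset data): regress an outcome on the raw attitude score and then on an extreme-coded version, and compare the R-squared values. The thresholds, data and noise levels are assumptions for the sketch only.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
sus = rng.uniform(30, 95, size=60)                              # product-level SUS scores
# non-linear "truth": only very usable products see lower return rates, plus noise
return_rate = 0.12 - 0.06 * (sus > 80) + rng.normal(0, 0.02, size=60)

raw = sus.reshape(-1, 1)
extreme = (sus > 80).astype(float).reshape(-1, 1)               # top-extreme coding

r2_raw = LinearRegression().fit(raw, return_rate).score(raw, return_rate)
r2_extreme = LinearRegression().fit(extreme, return_rate).score(extreme, return_rate)
print(f"R^2 raw score: {r2_raw:.2f}, R^2 extreme-coded: {r2_extreme:.2f}")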

Summary & Takeaways

This analysis of the literature and our own research found:

Using top box scores loses information and increases uncertainty around the mean. The actual loss will depend on the data, but we found it was around 4% in one study. The margin of error around the estimate will in many situations approximately double when going from mean to NPS. This leads to needing larger sample sizes to detect the same differences over time or against competitors.

Data lost using top or bottom box scoring might be worth shedding. Some published research and our own analysis have found that in some situations, when predicting behavior, more extreme responses are a better predictor. More research is needed to understand the limitations and extent of this relationship.

The relationship between attitudes and behavior may be non-linear (in some cases). In situations where the relationship is non-linear, top box and bottom box scoring may capture this non-linearity better than using the mean (or other transformations), lending credence to the NPS approach.

Context matters. Not all studies showed a non-linear relationship and superiority of the top box scoring approach. In some cases, the mean was a better predictor of behavior (albeit slightly) and using both as measures of behavior seems prudent.

Bottom box might be as important. While top box scoring tends to be more common, in many cases it’s not the top box but the bottom box that matters more. There is some evidence that extreme negative attitudes (e.g. losing what’s expected) predict behavior better, especially in cases where customers expect an attribute in a product or service.

Thanks to Jim Lewis for commenting on an earlier draft of this article.

Originally Posted at: Are Top Box Scores a Better Predictor of Behavior?

Feb 14, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Correlation-Causation  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> The Future of Big Data [Infographics] by v1shal

>> Specificity is the Soul of Data Narrative by analyticsweek

>> @ChuckRehberg / @TrigentSoftware on Translating Technology to Solve Business Problems #FutureOfData by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>> JMMMMC Recognized For Overall Excellence In Quality Among Rural Hospitals – Sand Hills Express Under Health Analytics

>> Webinar: Cloud security – Five questions to help decide who to trust – The Lawyer Under Cloud Security

>> Global Streaming Analytics Market 2018 Share, Trend, Segmentation and Forecast to 2023 – Finance Exchange Under Streaming Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Statistical Thinking and Data Analysis

image

This course is an introduction to statistical data analysis. Topics are chosen from applied probability, sampling, estimation, hypothesis testing, linear regression, analysis of variance, categorical data analysis, and n… more

[ FEATURED READ]

Antifragile: Things That Gain from Disorder

image

Antifragile is a standalone book in Nassim Nicholas Taleb’s landmark Incerto series, an investigation of opacity, luck, uncertainty, probability, human error, risk, and decision-making in a world we don’t understand. The… more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we are heading into winter, what better time to talk about our increasing dependence on data analytics to help with our decision making. Data- and analytics-driven decision making is rapidly sneaking its way into our core corporate DNA, and we are not churning out practice grounds to test those models fast enough. Such snug-looking models have hidden nails which could cause uncharted pain if they go unchecked. This is the right time to start thinking about putting an Analytics Club [Data Analytics CoE] in your workplace to help lab out the best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q:Do you know / used data reduction techniques other than PCA? What do you think of step-wise regression? What kind of step-wise techniques are you familiar with?
A: data reduction techniques other than PCA?:
Partial least squares: like PCR (principal component regression) but chooses the principal components in a supervised way. Gives higher weights to variables that are most strongly related to the response

step-wise regression?
– the choice of predictive variables is carried out using a systematic procedure
– Usually, it takes the form of a sequence of F-tests, t-tests, adjusted R-squared, AIC, BIC
– at any given step, the model is fit using unconstrained least squares
– can get stuck in local optima
– Better: Lasso

step-wise techniques:
– Forward-selection: begin with no variables, adding them when they improve a chosen model comparison criterion
– Backward-selection: begin with all the variables, removing them when it improves a chosen model comparison criterion

Better than reduced data:
Example 1: If all the components have a high variance: which components to discard with a guarantee that there will be no significant loss of the information?
Example 2 (classification):
– One has 2 classes; the within class variance is very high as compared to between class variance
– PCA might discard the very information that separates the two classes

Better than a sample:
– When number of variables is high relative to the number of observations
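As a brief sketch of two of the techniques named above – forward step-wise selection and the Lasso – using scikit-learn on synthetic data (the data and parameter choices are illustrative):

from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LassoCV, LinearRegression

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, noise=5, random_state=0)

# forward selection: add variables one at a time while they improve a cross-validated criterion
forward = SequentialFeatureSelector(LinearRegression(), n_features_to_select=5,
                                    direction="forward", cv=5).fit(X, y)
print("forward-selected features:", forward.get_support(indices=True))

# Lasso: shrinkage that zeroes out weak predictors, often preferred to step-wise selection
lasso = LassoCV(cv=5).fit(X, y)
print("features kept by Lasso:", (lasso.coef_ != 0).sum())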

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek: Big Data Health Informatics for the 21st Century: Gil Alterovitz


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Data is the new science. Big Data holds the answers. – Pat Gelsinger

[ PODCAST OF THE WEEK]

Andrea Gallego(@risenthink) / @BCG on Managing Analytics Practice #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

571 new websites are created every minute of the day.

Sourced from: Analytics.CLUB #WEB Newsletter

Six Practices Critical to Creating Value from Data and Analytics [INFOGRAPHIC]

IBM Institute for Business Value surveyed 900 IT and business executives from 70 countries from June through August 2013. The 50+ survey questions were designed to help translate concepts relating to generating value from analytics into actions. They found that business leaders adopt specific strategies to create value from data and analytics. Leaders:

  1. are 166% more likely to make decisions based on data.
  2. are 2.2 times more likely to have a formal career path for analytics.
  3. cite growth as the key source of value.
  4. measure the impact of analytics investments.
  5. have predictive analytics capabilities.
  6. have some form of shared analytics resources.

Read my summary of IBM’s study here. Download the entire study here. And check out IBM’s infographic below.

IBM Institute for Business Value - 2013 Infographic

 

Source by bobehayes

6 things that you should know about VMware vSphere 6.5

vSphere 6.5 offers a resilient, highly available, on-demand infrastructure that is the perfect groundwork for any cloud environment. It provides innovation that will assist digital transformation for the business and make the job of the IT administrator simpler. This means that most of their time will be freed up so that they can carry out more innovation instead of maintaining the status quo. Furthermore, vSphere is the foundation of VMware’s hybrid cloud strategy and is necessary for cross-cloud architectures. Here are essential features of the new and updated vSphere.

vCenter Server appliance

vCenter is an essential backend tool that controls VMware’s virtual infrastructure. vCenter 6.5 has lots of innovative upgraded features. It has a migration tool that aids in shifting from vSphere 5.5 or 6.0 to vSphere 6.5. The vCenter Server Appliance also includes VMware Update Manager, which eliminates the need for restarting external VM tasks or using pesky plugins.

vSphere client

In the past, the front-end client used for accessing the vCenter Server was quite old-fashioned and clunky. The vSphere client has now undergone a much-needed HTML5 overhaul. Aside from the foreseeable performance upgrades, the change also makes the tool cross-browser compatible and more mobile-friendly. Plugins are no longer needed, and the UI has been switched to a more cutting-edge aesthetic founded on the VMware Clarity UI.

Backup and restore

The backup and restore capability of vSphere 6.5 is an excellent piece of functionality that enables clients to back up the vCenter Server or any Platform Services Controller appliance directly from the Application Programming Interface (API) or the Virtual Appliance Management Interface (VAMI). In addition, it can back up both VUM and Auto Deploy embedded within the appliance. The backup mainly consists of files that are streamed to a preferred storage device over the SCP, FTP(S), or HTTP(S) protocols.

Superior automation capabilities

With regard to automation, VMware vSphere 6.5 works perfectly because of the new upgrades. The new PowerCLI tweak has been an excellent addition because it is completely module-based, and the APIs are currently in very high demand. This enables IT administrators to fully automate tasks down to the virtual machine level.

Secure boot

The secure boot element of vSphere covers secure boot-enabled virtual machines. This feature is available for both Linux and Windows VMs, and it allows secure boot to be enabled by ticking a simple checkbox in the VM properties. Once it is enabled, only properly signed VMs can boot in the virtual environment.

Improved auditing

vSphere 6.5 offers clients improved audit-quality logging. This helps in accessing more forensic detail about user actions. With this feature, it is easier to determine what was done, when, and by whom, and whether any investigation into anomalies or security threats is needed.

VMware’s vSphere developed out of the complexity and necessity of an expanding virtualization market. The earlier server products were not robust enough to deal with the increasing demands of IT departments. As businesses invested in virtualization, they had to consolidate and simplify their physical server farms into virtualized ones, and this triggered the need for virtual infrastructure. With these vSphere 6.5 features in mind, you can unleash its full potential. Make the switch today to the new and innovative VMware vSphere 6.5.

 

Source: 6 things that you should know about VMware vSphere 6.5 by thomassujain

Feb 07, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Insights  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Caterpillar digs in to data analytics—investing in hot startup Uptake by analyticsweekpick

>> Healthcare Dashboards: Examples of Visualizing Key Metrics & KPIs by analyticsweek

>> 8 ways IBM Watson Analytics is transforming business by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>> Five Data Science Job Opportunities At Microsoft India That You Should Not Miss Out On – Analytics India Magazine Under Data Science

>> Big Data Analytics in Banking Market Competition from Opponents, Constraints and Threat Growth Rate … – thebankingsector.com Under Big Data Analytics

>> Survey: What’s Ahead for Martech in 2019? – Franchising.com Under Marketing Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

CS229 – Machine Learning

image

This course provides a broad introduction to machine learning and statistical pattern recognition. … more

[ FEATURED READ]

On Intelligence

image

Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up and create awareness within the organization. An aware organization goes a long way in helping get quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q:Do you think 50 small decision trees are better than a large one? Why?
A: * Yes!
* More robust model (ensemble of weak learners that come and make a strong learner)
* Better to improve a model by taking many small steps than fewer large steps
* If one tree is erroneous, it can be corrected by the following ones
* Less prone to overfitting
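A quick illustrative comparison on synthetic data (the dataset, tree depth and ensemble size are arbitrary choices for the sketch):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=0)

big_tree = DecisionTreeClassifier(random_state=0)                               # one deep tree
forest = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)   # 50 small trees

print("single tree CV accuracy:", cross_val_score(big_tree, X, y, cv=5).mean())
print("50 small trees CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())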

Source

[ VIDEO OF THE WEEK]

@chrisbishop on futurist's lens on #JobsOfFuture #FutureofWork #JobsOfFuture #Podcast


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The world is one big data problem. – Andrew McAfee

[ PODCAST OF THE WEEK]

@EdwardBoudrot / @Optum on #DesignThinking & #DataDriven Products #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

According to execs, the influx of data is putting a strain on IT infrastructure, with 55 percent of respondents reporting a slowdown of IT systems and 47 percent citing data security problems, according to a global survey from Avanade.

Sourced from: Analytics.CLUB #WEB Newsletter

The pros of using facial recognition technology

In present times, facial recognition technology (FRT) is widely used in workplaces and by company owners to prevent and reduce crime. The workplace is not the only area that has seen the adoption of facial recognition; with time it has become a practical application that is part of everyday life. The mobile phone a person uses can also be enabled with a facial lock, allowing the user to maintain privacy over his/her device.

The concept of facial recognition technology

The buzz about facial recognition and its implementation in various gadgets and spheres of human life can be better understood once the core concept of this technology is known. Facial recognition is software, used in devices operated by artificial intelligence, that makes use of biometric data to create a mathematical mapping of the features of a human face. This is done so that it becomes easier to identify a person from his/her digital image. The software uses specific algorithms to compare a person’s face with the stored digital image, which allows the identification to be authenticated. Other identifying technologies used alongside facial recognition include fingerprint matching and voice recognition.
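As a hedged sketch of that matching step, the open-source face_recognition Python library (an assumption – the article does not name a specific tool) encodes a face as a numeric vector and compares it with a stored encoding; the image file names below are placeholders:

import face_recognition

# "stored digital image" - the enrolled employee photo
known_image = face_recognition.load_image_file("enrolled_employee.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]   # mathematical mapping of the face

# image captured at the entrance camera
probe_image = face_recognition.load_image_file("camera_frame.jpg")
probe_encodings = face_recognition.face_encodings(probe_image)

for encoding in probe_encodings:
    match = face_recognition.compare_faces([known_encoding], encoding, tolerance=0.6)[0]
    print("access granted" if match else "no match - raise an alert")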
The advantages of using facial recognition technology
When FRT is applied, the identity of a person is confirmed using three methods: first, the person’s fingerprints are scanned; second, face detection tools are employed; and third, verification is done through comparison. Facial recognition can also be used in crowded places to pick out individuals. There are multiple advantages to using this technology, some of which are discussed below:

• Automation of the process of identification and identity verification:

In any workplace or business center, numerous individuals are working in different areas at the same time. Having security guards at each and every entry and exit checkpoint would require employing multiple guards, and a manual process of checking and identification would also take a considerable amount of time. However, if FRT is used, the entire process becomes automatic, saving operational cost and time. The process of allowing people to enter the premises of the business organization is simplified with an FRT installation, and extra staff for monitoring surveillance cameras won’t be required once the identification process is automated.

• Provision of acquiring enhanced security:

Security is a huge factor in almost every place. In organizations, a breach in security can result in significant damage, so investing in applications that make security foolproof is sensible. A biometric system will help a business owner track anyone who enters the business organization. The system is designed in such a manner that if a person attempts to trespass, an alert is raised immediately. Along with the alert, a picture of the person who entered the premises without permission is shown, which helps in finding that person quickly. For correct optimization of a business organization, a person can visit the site of redtailseo.com/las-vegas/.
Security staff might not notice such trespassers, because people who enter a premises illegally usually have wrong intentions and will therefore know a way around the security guards. However, it is not that easy to fool a facial recognition security system, so trespassers will think twice before entering a protected building.

• Superior level of accuracy:

The technology of facial recognition has undergone many changes since its inception and is now equipped with infrared cameras and 3D capability. This has undoubtedly increased the accuracy of the identification process. It is true that it is possible to hoodwink a facial recognition security system, but it is tough to do so. The level of accuracy also helps in cataloging all the individuals who have visited the facility, and in case of any unpleasant situation, the recording made by the software throughout the day can be reviewed.

FRT system – The potential areas that can be worked upon

There is no doubt that facial recognition has changed the way of looking at security, but it also has some problem areas which can be worked upon. These problems are discussed below:

• The angle of the surveillance:

The angle at which the surveillance camera is placed often creates a lot of pressure to ensure flawless identification. Numerous angles have to be used together when uploading a face into the software’s storage system. A frontal view is necessary for generating a face template that is clear and recognizable. The photos need to have high resolution, and more than one angle has to be used when capturing the picture. If an intruder wears sunglasses, it might be difficult for the software to pick up that individual.

• Processing of information and storage of data:

On a digital platform, the main thing that is indispensable is storage. All the information that is recorded has to be stored correctly so it can be reviewed later. In a facial recognition system, each frame has to be processed, so a group of computers has to be used to minimize the time taken to process the information.

• The quality of the image stored:

Low-quality images will give a facial recognition system underwhelming results, so an advanced software system has to be used to operate the digital cameras responsible for image capture. In the present system, the image is captured from a particular video and then compared with the stored photo, but such comparisons can be flawed, as small, low-resolution images do not produce correct results.

Therefore, it can be understood that there are some areas in case of facial recognition technology which can be improved but the introduction of this software has unquestionably revolutionized the security system.

Originally Posted at: The pros of using facial recognition technology