Apr 27, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Statistically Significant (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Relative Performance Assessment: Improving your Competitive Advantage by bobehayes

>> Defining Predictive Analytics in Healthcare by analyticsweekpick

>> Development of the Customer Sentiment Index: Measuring Customers’ Attitudes by bobehayes

Wanna write? Click Here

[ NEWS BYTES]

>> Machine learning firm steps up its federal game – Washington Technology under Machine Learning

>> Verisk Analytics, Inc., Acquires Analyze Re – Yahoo Sports under Risk Analytics

>> Face Value: sentiment analysis shows business leaders are positive about the year ahead – The Conversation AU under Sentiment Analysis

More NEWS? Click Here

[ FEATURED COURSE]

Applied Data Science: An Introduction


As the world’s data grow exponentially, organizations across all sectors, including government and not-for-profit, need to understand, manage and use big, complex data sets—known as big data…. more

[ FEATURED READ]

Introduction to Graph Theory (Dover Books on Mathematics)


A stimulating excursion into pure mathematics aimed at “the mathematically traumatized,” but great fun for mathematical hobbyists and serious mathematicians as well. Requiring only high school algebra as mathematical bac… more

[ TIPS & TRICKS OF THE WEEK]

Finding success in your data science career? Find a mentor
Yes, most of us don’t feel the need, but most of us really could use one. Since most data science professionals work in isolation, getting an unbiased perspective is not easy. It is also often hard to see how a data science career will progress. A network of mentors addresses these issues: it gives data professionals an outside perspective and an unbiased ally. It is extremely important for successful data science professionals to build a mentor network and use it throughout their careers.

[ DATA SCIENCE JOB Q&A]

Q:What is root cause analysis? How to identify a cause vs. a correlation? Give examples
A: Root cause analysis:
– Method of problem solving used for identifying the root causes or faults of a problem
– A factor is considered a root cause if removal of it prevents the final undesirable event from recurring

Identify a cause vs. a correlation:
– Correlation: statistical measure that describes the size and direction of a relationship between two or more variables. A correlation between two variables doesn’t imply that the change in one variable is the cause of the change in the values of the other variable
– Causation: indicates that one event is the result of the occurrence of the other event; there is a causal relationship between the two events
– Differences between the two types of relationships are easy to identify, but establishing a cause and effect is difficult

Example: sleeping with one’s shoes on is strongly correlated with waking up with a headache. The correlation-implies-causation fallacy concludes: therefore, sleeping with one’s shoes on causes headaches.
A more plausible explanation: both are caused by a third factor, going to bed drunk.

Identifying a cause vs. a correlation: use a controlled study
– In medical research, one group may receive a placebo (control) while the other receives a treatment. If the two groups have noticeably different outcomes, the different experiences may have caused the different outcomes
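
As an illustration (a minimal simulation with hypothetical numbers, using numpy only), the sketch below reproduces the example above: a “drunk” confounder drives both “shoes on” and “headache”, so the two are strongly correlated overall, yet the association vanishes once we control for the confounder, as a controlled study would.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical confounder: going to bed drunk
drunk = rng.random(n) < 0.2

# Both observed variables depend only on the confounder, not on each other
shoes_on = np.where(drunk, rng.random(n) < 0.8, rng.random(n) < 0.05)
headache = np.where(drunk, rng.random(n) < 0.7, rng.random(n) < 0.10)

# Strong overall correlation despite no direct causal link
print(np.corrcoef(shoes_on, headache)[0, 1])  # noticeably positive

# "Controlling" for the confounder: within each group the association vanishes
for group in (drunk, ~drunk):
    print(np.corrcoef(shoes_on[group], headache[group])[0, 1])  # near zero
```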

Source

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Michael O'Connell, @Tibco

Subscribe to YouTube

[ QUOTE OF THE WEEK]

Data are becoming the new raw material of business. – Craig Mundie

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Beena Ammanath, @GE

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

94% of Hadoop users perform analytics on large volumes of data not possible before; 88% analyze data in greater detail; while 82% can now retain more of their data.

Sourced from: Analytics.CLUB #WEB Newsletter

Apr 24, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data Accuracy (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> March 13, 2017 Health and Biotech analytics news roundup by pstein

>> Word For Social Media Strategy for Brick-Mortar Stores: “Community” by v1shal

>> How oil and gas firms are failing to grasp the necessity of Big Data analytics by anum

Wanna write? Click Here

[ NEWS BYTES]

>> Hybrid Cloud Transforms Enterprises – Business 2 Community under Hybrid Cloud

>> Is Oil A Long Term Buy? – Hellenic Shipping News Worldwide under Risk Analytics

>> How Big Data is Improving Cyber Security – CSO Online under cyber security

More NEWS? Click Here

[ FEATURED COURSE]

Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz


Use data to build a better startup faster in partnership with Geckoboard… more

[ FEATURED READ]

The Black Swan: The Impact of the Highly Improbable


A black swan is an event, positive or negative, that is deemed improbable yet causes massive consequences. In this groundbreaking and prophetic book, Taleb shows in a playful way that Black Swan events explain almost eve… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from a zombie apocalypse of unscalable models
One living, breathing zombie in today’s analytical models is the absence of error bars. Not every model is scalable or holds its ground as data grows. The error bars attached to almost every model should be duly calibrated: as business models rake in more data, the error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failures we never want to see.

[ DATA SCIENCE JOB Q&A]

Q:What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
A: Lift:
It’s a measure of the performance of a targeting model (or rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random-choice targeting model. Lift is simply: target response / average response.

Suppose a population has an average response rate of 5% (to a mailing, for instance). If a certain model (or rule) has identified a segment with a response rate of 20%, then lift = 20/5 = 4.

Typically, the modeler seeks to divide the population into quantiles and rank the quantiles by lift. He can then consider each quantile and, by weighing the predicted response rate against the cost, decide whether to market to that quantile. Example: “If we use the probability scores on customers, we can get 60% of the total responders we’d get mailing randomly by only mailing the top 30% of the scored customers.”
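
A minimal sketch of the lift calculation described above (a hypothetical pandas DataFrame with assumed `score` and `responded` columns, for illustration only):

```python
import pandas as pd

# Hypothetical scored customers: model score and actual response flag
df = pd.DataFrame({
    "score":     [0.9, 0.8, 0.75, 0.6, 0.5, 0.4, 0.35, 0.2, 0.1, 0.05],
    "responded": [1,   1,   0,    1,   0,   0,   0,    0,   0,   0],
})

avg_response = df["responded"].mean()  # population average response rate

# Divide the population into quantiles by score (5 bins here for brevity)
df["quantile"] = pd.qcut(df["score"], q=5, labels=False)

# Lift per quantile = segment response rate / average response rate
lift = df.groupby("quantile")["responded"].mean() / avg_response
print(lift.sort_index(ascending=False))  # top-scored quantile should show lift > 1
```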

KPI:
– Key performance indicator
– A type of performance measurement
– Examples: 0 defects, 10/10 customer satisfaction
– Relies upon a good understanding of what is important to the organization

More examples:

Marketing & Sales:
– New customers acquisition
– Customer attrition
– Revenue (turnover) generated by segments of the customer population
– Often done with a data management platform

IT operations:
– Mean time between failure
– Mean time to repair

Robustness:
– Statistics with good performance even if the underlying distribution is not normal
– Statistics that are not affected by outliers
– A learning algorithm that can reduce the chance of fitting noise is called robust
– Median is a robust measure of central tendency, while mean is not
– Median absolute deviation is also more robust than the standard deviation
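
A quick numeric check of the robustness claims above (hypothetical numbers, numpy only):

```python
import numpy as np

data = np.array([9.8, 10.1, 10.0, 9.9, 10.2])
corrupted = np.append(data, 1000.0)  # one gross outlier

print(np.mean(data), np.mean(corrupted))      # mean jumps from ~10 to ~175
print(np.median(data), np.median(corrupted))  # median barely moves

# Median absolute deviation vs. standard deviation under the same outlier
mad = np.median(np.abs(corrupted - np.median(corrupted)))
print(mad, np.std(corrupted))                 # MAD stays small; std explodes
```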

Model fitting:
– How well a statistical model fits a set of observations
– Examples: AIC, R², Kolmogorov-Smirnov test, chi-squared test, deviance (GLM)

Design of experiments:
The design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation.
In its simplest form, an experiment aims at predicting the outcome by changing the preconditions, the predictors.
– Selection of the suitable predictors and outcomes
– Delivery of the experiment under statistically optimal conditions
– Randomization
– Blocking: an experiment may be conducted with the same equipment to avoid any unwanted variations in the input
– Replication: performing the same combination run more than once, in order to get an estimate for the amount of random error that could be part of the process
– Interaction: when an experiment has 3 or more variables, the situation in which the joint effect of two variables on a third is not additive
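
A minimal sketch of randomization with blocking (hypothetical units and block names, numpy only); replication would mean running each (block, arm) combination several times:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical experimental units, blocked by the equipment they run on
blocks = {"machine_A": np.arange(0, 8), "machine_B": np.arange(8, 16)}

for name, units in blocks.items():
    shuffled = rng.permutation(units)      # randomization within each block
    treatment, control = shuffled[:4], shuffled[4:]
    # Replication: each assignment would be run more than once so the
    # run-to-run spread estimates the random error in the process.
    print(name, "treatment:", sorted(treatment), "control:", sorted(control))
```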

80/20 rule:
– Pareto principle
– 80% of the effects come from 20% of the causes
– 80% of your sales come from 20% of your clients
– 80% of a company’s complaints come from 20% of its customers

Source

[ VIDEO OF THE WEEK]

#DataScience Approach to Reducing #Employee #Attrition

Subscribe to YouTube

[ QUOTE OF THE WEEK]

Everybody gets so much information all day long that they lose their common sense. – Gertrude Stein

[ PODCAST OF THE WEEK]

Unconference Panel Discussion: #Workforce #Analytics Leadership Panel

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Data is growing faster than ever before and by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.

Originally Posted at: Apr 24, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

Emergency Preparation Checklist for Severe Weather

With a hurricane approaching, I searched around and prepared this list of things to keep in mind and do in case of a weather emergency such as a hurricane. Create a central place where you keep all the equipment and supplies needed in a weather emergency, and post a list of the items and important phone numbers there for easy access. There, you can keep items you’ll need in case disaster strikes suddenly or you need to evacuate. Regularly audit the area to make sure all supplies are ready to grab.

Here are recommendations on what to do before a storm approaches:
— Download weather apps; the Red Cross has a Hurricane App available in the Apple App Store and the Google Play Store.
— Also, download a First Aid app.
— If high wind is expected, seal up windows and doors with 5/8 inch plywood.
— Bring in or tie down any outside items that could be picked up by the wind.
— Make sure gutters are clear from any debris.
— Reinforce the garage door.
— Turn the refrigerator to its coldest setting in case power goes off.
— Use a cooler to keep from opening the doors on the freezer or refrigerator.
— Park your car in a safe place, away from trees or any object that could be blown into it.
— Fill a bathtub with water, sealing the drain with a plastic bag so the water doesn’t slowly leak out.
— Top off the fuel tank on your car.
— Go over the evacuation plan with the family, and learn alternate routes to safety.
— Learn the location of the nearest shelter or, if you have pets, the nearest pet-friendly shelter.
— In case of severe flooding, keep an ax in your attic.
— If evacuation is needed, stick to marked evacuation routes if possible.
— Store important documents — passports, Social Security cards, birth certificates, deeds — in a watertight container.
— Create inventory list for your household property.
— Leave a note at some noticeable place about your whereabouts.
— Unplug small appliances and electronics before leaving.
— If possible, turn off the electricity, gas and water for residence.
Here is a list of supplies:
— Water: one gallon per person per day; gather a three-day supply.
— Three days of food, with suggested items including: canned meats, canned or dried fruits, canned vegetables, canned juice, peanut butter, jelly, salt-free crackers, energy/protein bars, trail mix/nuts, dry cereal, cookies or other comfort food.
— Flashlight(s).
— A battery-powered radio, preferably a weather radio.
— Extra batteries.
— A can opener.
— A small fire extinguisher.
— Whistles for each person.
— A first aid kit, including latex gloves; sterile dressings; soap/cleaning agent; antibiotic ointment; burn ointment; adhesive bandages in small, medium and large sizes; eye wash; a thermometer; aspirin/pain reliever; anti-diarrhea tablets; antacids; laxatives; small scissors; tweezers; petroleum jelly.
— A seven-day supply of medications.
— Vitamins.
— A map of the area.
— Baby supplies.
— Pet supplies.
— Wet wipes.
— A camera (to document storm damage).
— A multipurpose tool, with pliers and a screwdriver.
— Cell phones and chargers.
— Contact information for the family.
— A sleeping bag for each person.
— Extra cash.
— An extra set of house keys.
— An extra set of car keys.
— An emergency ladder to evacuate the second floor.
— Household bleach.
— Paper cups, plates and paper towels.
— A silver foil emergency blanket (else a normal blanket will do).
— Insect repellent.
— Rain gear.
— Tools and supplies for securing your home.
— Plastic sheeting.
— Duct tape.
— Dust masks.
— Activities for children.
— Charcoal and matches, if you have a portable grill. But only use it outside.
American Red Cross tips on what to do after the storm arrives:
— Continue listening to a NOAA Weather Radio or the local news for the latest updates.
— Stay alert for extended rainfall and subsequent flooding even after the hurricane or tropical storm has ended.
— If you evacuated, return home only when officials say it is safe.
— Drive only if necessary and avoid flooded roads and washed out bridges.
— Keep away from loose or dangling power lines and report them immediately to the power company.
— Stay out of any building that has water around it.
— Inspect your home for damage. Take pictures of damage, both of the building and its contents, for insurance purposes.
— Use flashlights in the dark. Do NOT use candles.
— Avoid drinking or preparing food with tap water until you are sure it’s not contaminated.
— Check refrigerated food for spoilage. If in doubt, throw it out.
— Wear protective clothing and be cautious when cleaning up to avoid injury.
— Watch animals closely and keep them under your direct control.
— Use the telephone only for emergency calls.
Sources: American Red Cross, Federal Emergency Management Agency, National Hurricane Center. The Red Cross also provides checklists for specific weather emergencies; they can be found here.

Source

Apr 20, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

 

Issue #15    Web Version
Contact Us: info@analyticsweek.com

[  ANNOUNCEMENT ]

I hope this note finds you well. Please excuse the brief interruption in our newsletter. Over the past few weeks, we have been doing some A/B testing and mounting our newsletter on our AI-led coach TAO.ai. This newsletter and future editions will use TAO’s capabilities. As with any AI, it needs some training, so kindly excuse/report any rough edges.

– Team TAO/AnalyticsCLUB

[  COVER OF THE WEEK ]

Weak data (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Collaborative Analytics: Analytics for your BigData by v1shal

>> Colleges are using big data to identify when students are likely to flame out by analyticsweekpick

>> Rise of Data Capital by Paul Sonderegger by thebiganalytics

Wanna write? Click Here

[ NEWS BYTES]

>> Strategy Analytics: Android accounts for 88% of smartphones shipped in Q3 2016 – GSMArena.com under Analytics

>> Did you know we’re sedentary but less obese than average? So says Miami statistics website – Miami Herald under Statistics

>> MHS grad sinks Steel Roots in cyber security – News – North of … – Wicked Local North of Boston under cyber security

More NEWS? Click Here

[ FEATURED COURSE]

Statistical Thinking and Data Analysis


This course is an introduction to statistical data analysis. Topics are chosen from applied probability, sampling, estimation, hypothesis testing, linear regression, analysis of variance, categorical data analysis, and n… more

[ FEATURED READ]

The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t


People love statistics. Statistics, however, do not always love them back. The Signal and the Noise, Nate Silver’s brilliant and elegant tour of the modern science-slash-art of forecasting, shows what happens when Big Da… more

[ TIPS & TRICKS OF THE WEEK]

Grow at the speed of collaboration
Research by Cornerstone OnDemand pointed out the need for better collaboration within the workforce, and the data analytics domain is no different. A rapidly changing and growing industry like data analytics is very difficult for an isolated workforce to keep up with. A good collaborative work environment facilitates a better flow of ideas, improved team dynamics, rapid learning, and an increased ability to cut through the noise. So, embrace collaborative team dynamics.

[ DATA SCIENCE JOB Q&A]

Q:What is cross-validation? How to do it right?
A: It’s a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. Mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model in the training phase (i.e. validation data set) in order to limit problems like overfitting, and get an insight on how the model will generalize to an independent data set.

Examples: leave-one-out cross validation, K-fold cross validation

How to do it right?

– The training and validation data sets have to be drawn from the same population
– Predicting stock prices: if a model is trained on a certain 5-year period, it’s unrealistic to treat the subsequent 5-year period as a draw from the same population
– Common mistake: steps such as choosing the kernel parameters of an SVM should be cross-validated as well

Bias-variance trade-off for k-fold cross-validation:

Leave-one-out cross-validation gives approximately unbiased estimates of the test error, since each training set contains almost the entire data set (n−1 observations).

But: we average the outputs of n fitted models, each of which is trained on an almost identical set of observations, so the outputs are highly correlated. Since the variance of a mean of quantities increases as the correlation between those quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation.

Typically, we choose k=5 or k=10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor from high variance.
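
To make the “common mistake” above concrete, here is a minimal scikit-learn sketch (synthetic data, assumed parameter grid) in which scaling and the SVM kernel parameters are chosen inside each cross-validation fold via a pipeline, so the whole procedure is validated together:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)  # synthetic stand-in data

# Preprocessing and the kernel-parameter search live inside the pipeline, so
# every fold re-fits them on its own training portion only.
pipe = make_pipeline(StandardScaler(), SVC())
search = GridSearchCV(pipe, {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]}, cv=5)

# Outer k-fold (k=5) estimates the generalization error of the whole procedure
scores = cross_val_score(search, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores.mean(), scores.std())
```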
Source

[ ENGAGE WITH CLUB]

 ASK Club      FIND Project   

Get HIRED  #GetTAO Coach

 

[ FOLLOW & SIGNUP]

TAO

iTunes

XbyTAO

Facebook

Twitter

Youtube

Analytic.Club

LinkedIn

Newsletter

[ ENGAGE WITH TAO]

#GetTAO Coach

  Join @xTAOai  

[ VIDEO OF THE WEEK]

Data-As-A-Service (#DAAS) to enable compliance reporting

Subscribe to YouTube

[ QUOTE OF THE WEEK]

It is a capital mistake to theorize before one has data. Insensibly, one begins to twist the facts to suit theories, instead of theories to suit facts. – Arthur Conan Doyle

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with David Rose, @DittoLabs

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

By 2020, our accumulated digital universe of data will grow from 4.4 zettabytes today to around 44 zettabytes, or 44 trillion gigabytes.

[ TAO DEMO]

AnalyticsClub Demo Video

 

[ PROGRAMS]

Invite top local professionals to your office

 

↓

 

Data Analytics Hiring Drive

 

 
*This newsletter is hand-curated and auto-generated using #TEAMTAO & TAO; please excuse some initial blemishes. As with any AI, it may get worse before it gets relevant, so please bear with us and share your feedback.
Let us know how we could improve the experience using: feedbackform

Copyright © 2016 AnalyticsWeek LLC.

Data-As-A-Service to enable compliance reporting


Big Data tools are clearly very powerful & flexible for dealing with unstructured information. However, they are equally applicable, especially when combined with columnar stores such as Parquet, to rapidly changing regulatory requirements that involve reporting & analyzing data across multiple silos of structured information. This talk is an example of applying multiple big data tools to create data-as-a-service: bringing the silos together into a data hub and enabling very high-performance analytics & reporting by leveraging a combination of HDFS, Spark, Cassandra, Parquet, Talend and Jasper. In this talk, we will discuss the architecture, challenges & opportunities of designing data-as-a-service that enables businesses to respond to changing regulatory & compliance requirements.
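
As a hedged sketch of the pattern described in the talk (hypothetical paths, table and column names; PySpark standing in for the fuller HDFS/Cassandra/Talend/Jasper stack):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("compliance-daas").getOrCreate()

# Land data from separate silos into one columnar hub (hypothetical sources)
trades = spark.read.json("hdfs:///silos/trades/")         # semi-structured silo
accounts = spark.read.parquet("hdfs:///silos/accounts/")  # structured silo

hub = trades.join(accounts, on="account_id", how="left")
hub.write.mode("overwrite").partitionBy("report_date").parquet("hdfs:///hub/compliance/")

# Reporting reads only the partitions/columns it needs, which Parquet makes fast
report = (spark.read.parquet("hdfs:///hub/compliance/")
          .where(F.col("report_date") == "2017-04-20")
          .groupBy("regulation_id")
          .agg(F.sum("exposure").alias("total_exposure")))
report.show()
```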

Speaker:
Girish Juneja, Senior Vice President/CTO at Altisource

Girish Juneja is in charge of guiding Altisource’s technology vision and will lead technology teams across Boston, Los Angeles, Seattle and other cities nationally, according to a release.

Girish was formerly general manager of big data products and chief technology officer of data center software at California-based chip maker Intel Corp. (Nasdaq: INTC). He helped lead several acquisitions including the acquisition of McAfee Inc. in 2011, according to a release.

He was also the co-founder of technology company Sarvega Inc., acquired by Intel in 2005, and he holds a master’s degree in computer science and an MBA in finance and strategy from the University of Chicago.

Slideshare:
[slideshare id=40783439&doc=girshmeetuppresntationmm3-141027135842-conversion-gate01]
Video:

Source: Data-As-A-Service to enable compliance reporting

Why So Many ‘Fake’ Data Scientists?

Have you noticed how many people are suddenly calling themselves data scientists? Your neighbour, that gal you met at a cocktail party — even your accountant has had his business cards changed!

There are so many people out there that suddenly call themselves ‘data scientists’ because it is the latest fad. The Harvard Business Review even called it the sexiest job of the 21st century! But in fact, many calling themselves data scientists are lacking the full skill set I would expect were I in charge of hiring a data scientist.

What I see is many business analysts who don’t have any understanding of big data technology or programming languages calling themselves data scientists. Then there are programmers from the IT function who understand programming but lack the business skills, analytics skills or creativity needed to be a true data scientist.

Part of the problem here is simple supply and demand economics: There simply aren’t enough true data scientists out there to fill the need, and so less qualified (or not qualified at all!) candidates make it into the ranks.

Second is that the role of a data scientist is often ill-defined within the field and even within a single company.  People throw the term around to mean everything from a data engineer (the person responsible for creating the software “plumbing” that collects and stores the data) to statisticians who merely crunch the numbers.

A true data scientist is so much more. In my experience, a data scientist is:

  • multidisciplinary. I have seen many companies try to narrow their recruiting by searching only for candidates who have a PhD in mathematics, but in truth, a good data scientist could come from a variety of backgrounds — and may not necessarily have an advanced degree in any of them.
  • business savvy.  If a candidate does not have much business experience, the company must compensate by pairing him or her with someone who does.
  • analytical. A good data scientist must be naturally analytical and have a strong ability to spot patterns.
  • good at visual communications. Anyone can make a chart or graph; it takes someone who understands visual communications to create a representation of data that tells the story the audience needs to hear.
  • versed in computer science. Professionals who are familiar with Hadoop, Java, Python, etc. are in high demand. If your candidate is not an expert in these tools, he or she should be paired with a data engineer who is.
  • creative. Creativity is vital for a data scientist, who needs to be able to look beyond a particular set of numbers, beyond even the company’s data sets to discover answers to questions — and perhaps even pose new questions.
  • able to add significant value to data. If someone only presents the data, he or she is a statistician, not a data scientist. Data scientists offer great additional value over data through insights and analysis.
  • a storyteller. In the end, data is useless without context. It is the data scientist’s job to provide that context, to tell a story with the data that provides value to the company.

If you can find a candidate with all of these traits — or most of them with the ability and desire to grow — then you’ve found someone who can deliver incredible value to your company, your systems, and your field.

But skimp on any of these traits, and you run the risk of hiring an imposter, someone just hoping to ride the data science bubble until it bursts.

To read the original article on Data Science Central, click here.

Originally Posted at: Why So Many ‘Fake’ Data Scientists? by analyticsweekpick

Apr 13, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

 

Issue #15    Web Version
Contact Us: info@analyticsweek.com

[  ANNOUNCEMENT ]

I hope this note finds you well. Please excuse the brief interruption in our newsletter. Over the past few weeks, we have been doing some A/B testing and mounting our newsletter on our AI-led coach TAO.ai. This newsletter and future editions will use TAO’s capabilities. As with any AI, it needs some training, so kindly excuse/report any rough edges.

– Team TAO/AnalyticsCLUB

[  COVER OF THE WEEK ]

Data security (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> The What and Where of Big Data: A Data Definition Framework by bobehayes

>> The Cost Of Too Much Data by v1shal

>> Unraveling the Mystery of Big Data by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>> How a Data Scientist’s Job ‘Play in Front’ than other BI and Analytic Roles – CIOReview under Data Scientist

>> AI, Machine Learning to Reach $47 Billion by 2020 – Infosecurity Magazine under Machine Learning

>> Software to “Encode the Mindset” of Lawyers – Lawfuel (blog) under Prescriptive Analytics

More NEWS? Click Here

[ FEATURED COURSE]

Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz


Use data to build a better startup faster in partnership with Geckoboard… more

[ FEATURED READ]

Storytelling with Data: A Data Visualization Guide for Business Professionals


Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You’ll discover the power of storytelling and the way to make data a pivotal point in your story. Th… more

[ TIPS & TRICKS OF THE WEEK]

Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle it can lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric. This is the metric that matters the most to your startup. Some advantages of TUM: it answers the most important business question, it cleans up your goals, it inspires innovation and it helps you understand the entire quantified business.

[ DATA SCIENCE JOB Q&A]

Q:What is cross-validation? How to do it right?
A: It’s a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. Mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model in the training phase (i.e. validation data set) in order to limit problems like overfitting, and get an insight on how the model will generalize to an independent data set.

Examples: leave-one-out cross validation, K-fold cross validation

How to do it right?

– The training and validation data sets have to be drawn from the same population
– Predicting stock prices: if a model is trained on a certain 5-year period, it’s unrealistic to treat the subsequent 5-year period as a draw from the same population
– Common mistake: steps such as choosing the kernel parameters of an SVM should be cross-validated as well

Bias-variance trade-off for k-fold cross-validation:

Leave-one-out cross-validation gives approximately unbiased estimates of the test error, since each training set contains almost the entire data set (n−1 observations).

But: we average the outputs of n fitted models, each of which is trained on an almost identical set of observations, so the outputs are highly correlated. Since the variance of a mean of quantities increases as the correlation between those quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation.

Typically, we choose k=5 or k=10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor from high variance.
Source

[ ENGAGE WITH CLUB]

 ASK Club      FIND Project   

Get HIRED  #GetTAO Coach

 

[ FOLLOW & SIGNUP]

TAO

iTunes

XbyTAO

Facebook

Twitter

Youtube

Analytic.Club

LinkedIn

Newsletter

[ ENGAGE WITH TAO]

#GetTAO Coach

  Join @xTAOai  

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Eloy Sasot, News Corp

Subscribe to YouTube

[ QUOTE OF THE WEEK]

Processed data is information. Processed information is knowledge. Processed knowledge is wisdom. – Ankala V. Subbarao

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Joe DeCosmo, @Enova

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

140,000 to 190,000: the projected shortfall of people with deep analytical skills needed to fill Big Data jobs in the U.S. by 2018.

[ TAO DEMO]

AnalyticsClub Demo Video

 

[ PROGRAMS]

Invite top local professionals to your office

 

↓

 

Data Analytics Hiring Drive

 

 
*This newsletter is hand-curated and auto-generated using #TEAMTAO & TAO; please excuse some initial blemishes. As with any AI, it may get worse before it gets relevant, so please bear with us and share your feedback.
Let us know how we could improve the experience using: feedbackform

Copyright © 2016 AnalyticsWeek LLC.

Analyzing Big Data: A Customer-Centric Approach

Big Data

The latest buzzword in business is Big Data. According to Pat Gelsinger, President and COO of EMC, in an article in The Wall Street Journal, Big Data refers to the idea that companies can extract value from collecting, processing and analyzing vast quantities of data. Businesses that can get a better handle on these data will be more likely to outperform their competitors who do not.

When people talk about Big Data, they are typically referring to three characteristics of the data:

  1. Volume: the amount of data being collected is massive
  2. Velocity: the speed at which data are being generated/collected is very fast (consider the streams of tweets)
  3. Variety: the different types of data like structured and unstructured data

Because extremely large data sets cannot be processed using conventional database systems, companies have created new ways of processing (e.g., storing, accessing and analyzing) this big data. Big Data is about housing data on multiple servers for quick access and employing parallel processing of the data (rather than following sequential steps).

Business Value of Big Data Will Come From Analytics

In a late 2010 study, researchers from MIT Sloan Management Review and IBM asked 3,000 executives, managers and analysts how they obtain value from their massive amounts of data. They found that organizations that used business information and analytics outperformed organizations that did not. Specifically, these researchers found that top-performing businesses were twice as likely as their low-performing counterparts to use analytics to guide future strategies and day-to-day operations.

The MIT/IBM researchers, however, also found that the number one obstacle to the adoption of analytics in their organizations was a lack of understanding of how to use analytics to improve the business (the second and third top obstacles were: Lack of management bandwidth due to competing priorities and a lack of skills internally). In addition, there are simply not enough people with Big Data analysis skills.  McKinsey and Company estimates that the “United States faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.”

Customer Experience Management and Big Data

The problem of Big Data is one of applying appropriate analytic techniques to business data to extract value. Companies who can apply appropriate statistical models to their data will make better sense of the data and, consequently, get more value from those data. Generally speaking, business data can be divided into four types:

  1. Operational
  2. Financial
  3. Constituency (includes employees, partners)
  4. Customer

Customer Experience Management (CEM) is the process of understanding and managing customers’ interactions with and perceptions about the company/brand. Businesses are already realizing the value of integrating different types of customer data to improve customer loyalty. In my research on best practices in customer feedback programs, I found that the integration of different types of customer data (purchase history, service history, values and satisfaction) are necessary for an effective customer feedback program. Specifically, I found that loyalty leading companies, compared to their loyalty lagging counterparts, link customer feedback metrics to a variety of business metrics (operational, financial, constituency) to uncover deeper customer insights. Additionally, to facilitate this integration between attitudinal data and objective business data, loyalty leaders also integrate customer feedback into their daily business processes and customer relationship management system.

While I have not yet used new technology that supports Big Data (e.g., Hadoop, MapReduce) to process data, I have worked with businesses to merge disparate data sets to conduct what is commonly called Business Linkage Analysis. Business linkage analysis is a problem of data organization. The ultimate goal of linkage analysis is to understand the causes and consequences of customer loyalty (e.g., advocacy, purchasing, retention). I think that identifying the correlates of customer metrics is central to extracting value from Big Data.
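
A minimal pandas sketch of this data-organization step (hypothetical column names, for illustration): aggregate transaction-level feedback up to the customer level, link it to financial outcomes, and estimate the relationship.

```python
import pandas as pd

# Hypothetical silos: transaction-level survey feedback, customer-level revenue
feedback = pd.DataFrame({
    "customer_id":  [1, 1, 2, 2, 3, 3],
    "satisfaction": [9, 8, 4, 5, 7, 6],   # per-transaction survey score
})
financial = pd.DataFrame({
    "customer_id":    [1, 2, 3],
    "annual_revenue": [12000, 3000, 7000],
})

# Aggregate feedback to the customer level, the unit of the financial data
by_customer = feedback.groupby("customer_id", as_index=False)["satisfaction"].mean()

# Linkage: merge the silos and estimate the satisfaction-revenue relationship
linked = by_customer.merge(financial, on="customer_id")
print(linked["satisfaction"].corr(linked["annual_revenue"]))
```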

Customer-Centric Approach to Analyzing Big Data

I have written three posts on different types of linkage analysis, each presenting a data model (a way to organize the data) to conduct each type of linkage analysis. The key to conducting linkage analysis is to ensure the different data sets are organized (e.g., aggregated) properly to support the conclusions you want to make from your combined data.

  • Linking operational and customer metrics: We are interested in calculating the statistical relationships between customer metrics and operational metrics. Data are aggregated at the transaction level.  Understanding these relationships allows businesses to build/identify customer-centric business metrics, manage customer relationships using objective operational metrics and reward employee behavior that will drive customer satisfaction.
  • Linking financial and customer metrics: We are interested in calculating the statistical relationships between customer metrics and financial business outcomes. Data are aggregated at the customer level. Understanding these relationships allows you to strengthen the business case for your CEM program, identify drivers of real customer behaviors and determine ROI for customer experience improvement solutions.
  • Linking constituency and customer metrics: We are interested in calculating the statistical relationship between customer metrics and employee/partner metrics (e.g., satisfaction, loyalty, training metrics). Data are aggregated at the constituency level. Understanding these relationships allows businesses to understand the impact of employee and partner experience on the customer experience, improve the health of the customer relationship by improving the health of the employee and partner relationship and build a customer centric culture.

Summary

The era of Big Data is upon us. From small and midsize companies to large enterprise companies, their ability to extract value from big data through smart analytics will be the key to their business success. In this post, I presented a few analytic approaches in which different types of data sources are merged with customer feedback data. This customer-centric approach allows for businesses to analyze their data in a way that helps them understand the reasons for customer dis/loyalty and the impact dis/loyalty has to the growth of the company.

Download Free Paper on Linkage Analysis

Source: Analyzing Big Data: A Customer-Centric Approach by bobehayes

Planning The Future with Wearables, IOT, and Big Data

According to Dataconomy, this year’s BARC study shows 83% of companies have already invested in Big Data or are planning future engagement – a 20% increase on Gartner’s 2013 calculations. The Internet of Things has changed our data collection processes from computer-bound functions to real-world operations, with newly connected everyday objects providing in-depth information about individual habits, preferences, and personal stats. This data allows companies to create and adapt their products and services for enhanced user experiences and personalized services.

Healthcare

With Fitbit’s second-quarter revenue of $400 million tripling expectations, and reported sales of 4.5 million devices in that quarter alone, it’s obvious that health-conscious individuals are eager to make use of the fitness benefits Wearables offer. However, Wearables are not only encouraging users to be more active; they are also being used to simplify and transform patient-centric care. Able to monitor heart rate and vital signs as well as activity levels, Wearables can alert users, doctors, emergency responders or family members to signs of distress. The heart rates of those with heart disease can be carefully monitored, alerts can discourage harmful behaviors and encourage positive ones, surgeons can use smart glasses to monitor vital signs during operations, and the vast quantities of data collected can be used for epidemiological studies. Healthcare providers have many IoT opportunities available to them, and those who make use of them correctly will improve patient welfare and treatment as well as ensure their own success.

Insurance

Insurers also have a wealth of opportunities available to them should they properly utilize Wearables and the Internet of Things. By using data acquired from Wearables for more accurate underwriting, products can be tailored to the individual. Information such as location, level of exercise, driving record, medications used, work history, credit ratings, hobbies and interests, and spending habits can be acquired through data amalgamation, and instead of relying on client declarations, companies have access to more accurate and honest data.

Entertainment


Not only useful, Wearables and the Internet of Things have a strong base in amusement. Though these devices are accumulating enormous quantities of practical data, their primary purpose for users is often recreational. Macworld suggests the Apple Watch is not here to entertain, but the array of applications available would suggest otherwise. CIO looks at some weird and wacky Wearables that would suit anyone’s Christmas list, including Ping, a social networking garment; Motorola digital tattoos; tweeting bras; Peekiboo for seeing the world through your child’s eyes; and smart pajamas that let you know when your kids are ready for bed. Most of us don’t need any of these things, but we want them. And they all collect massive quantities of data by the microsecond.

Legalities

But of course, all this data flying around comes with some serious risks, not least of all invasion of privacy. As the years have gone by, we’ve become less and less concerned about how much data we’re offering up, never considering its security or the implications of providing it. Questions around whether data recorded from Wearables is legally governed ‘personal data’ have arisen, and the collection and use of this data is likely to face some serious legal challenges in the future. It’s not likely Wearables are going to disappear, but shrewd developers are creating safer, more secure products to best navigate these waters.

Article originally appeared HERE.

Source: Planning The Future with Wearables, IOT, and Big Data by analyticsweekpick

Webinar: Improving the Customer Experience Using Big Data, Customer-Centric Measurement and Analytics

I recently gave a talk on how to improve the customer experience using Big Data, customer-centric measurement and analytics. My talk was hosted by the good people at Pivotal (formerly Cetas).

You can view the webinar by registering here or you can view the slides below. In this webinar, Improving the Customer Experience Using Big Data, Customer-Centric Measurement and Analytics, I include content from my new book “TCE – Total Customer Experience: Building Business Through Customer-Centric Measurement and Analytics.” I discuss three areas: measuring the right customer metrics, integrating disparate data silos and using Big Data to answer strategic business questions. Using the right customer metrics in conjunction with other business data, businesses will be able to extract meaningful results that help executives make the right decisions to move their company forward.

In the book, I present best practices in measurement and analytics for customer experience management (CEM) programs.  Drawing on decades of research and practice, I illustrate analytical best practices in the field of customer experience management that will help you increase the value of all your business data to help improve the customer experience and increase customer loyalty.

 

Source