May 11, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Data Mining  Source

[ NEWS BYTES]

>> Incentives need to change for firms to take cyber-security more … – The Economist Under cyber security

>> Xilinx Expands into Wide Range of Vision-Guided Machine Learning Applications with reVISION – Design and Reuse (press release) Under Machine Learning

>> Neustar forms marketing analytics partnership with Facebook, 26 September 2016 – Research Magazine Under Marketing Analytics

More NEWS? Click Here

[ FEATURED COURSE]

CS229 – Machine Learning

image

This course provides a broad introduction to machine learning and statistical pattern recognition. … more

[ FEATURED READ]

Data Science from Scratch: First Principles with Python

image

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn … more

[ TIPS & TRICKS OF THE WEEK]

Keeping biases in check during the last mile of decision making
A data-driven leader, data scientist or analytics expert is constantly put to the test by helping the team solve problems with their skills and expertise. Like it or not, part of that decision tree is derived from intuition, which introduces bias into our judgement and can taint the resulting suggestions. Most skilled professionals understand and handle their biases well, but occasionally we fall into small traps and find our judgement impaired. So it is important to keep intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q: What is the Law of Large Numbers?
A: * A theorem that describes the result of performing the same experiment a large number of times
* Forms the basis of frequency-style thinking
* It says that the sample mean, the sample variance and the sample standard deviation converge to the quantities they are estimating
* Example: roll a die; the expected value is 3.5. Over a large number of rolls, the average converges to 3.5
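A minimal Python sketch of the die example (standard library only): the average of the simulated rolls drifts toward 3.5 as the number of rolls grows.

    import random

    # Law of Large Numbers demo: the mean of n die rolls converges
    # to the expected value of 3.5 as n grows.
    random.seed(42)

    for n in (10, 100, 10_000, 1_000_000):
        rolls = [random.randint(1, 6) for _ in range(n)]
        print(f"n={n:>9,}  mean={sum(rolls) / n:.4f}")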

Source

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Dr. Nipa Basu, @DnBUS

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Without big data, you are blind and deaf and in the middle of a freeway. – Geoffrey Moore

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Scott Zoldi, @fico

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

The amount of data generated daily is equivalent to every person in the world having more than 215m high-resolution MRI scans a day.

Sourced from: Analytics.CLUB #WEB Newsletter

Can Hadoop be Apple easy?

Hadoop is now on the minds of executives who care deeply about the power of their rapidly accumulating data. It has already inspired a broad range of big data experiments, established a beachhead as a production system in the enterprise and garnered tremendous optimism for expanded use.

However, it is also starting to create tremendous frustration. A recent analyst report showed less enthusiasm for Hadoop pilots this year than last. Many companies are getting lost on their way to big data glory. Instead, they find themselves in a confusing place of complexity and befuddlement. What’s going on?

While there are heady predictions that by 2020, 75 percent of the Fortune 2000 will be running a 1,000-node Hadoop cluster, there is also evidence that Hadoop is not being adopted as easily as one would think. In 2013, six years after the birth of Hadoop, Gartner said that only 10 percent of the organizations it surveyed were using Hadoop. According to the most recent Gartner survey, less than 50 percent of 284 respondents have invested in Hadoop technology or even plan to do so.

Image: data center (Tim Dorr, Flickr)

The current attempts to transform Hadoop into a full-blown enterprise product only accomplish the basics and leave the most challenging activities, the operations part, to the users, who, for good reason, wonder what to do next. Now we get to the problem. Hadoop is still complex to run at scale and in production.

Once you get Hadoop running, the real work is just beginning. To provide value to the business, you need to maintain a cluster that is always up and performing well while remaining transparent to the end user. You must make sure jobs don’t get in each other’s way. You need to support different types of jobs that compete for resources. You have to monitor and troubleshoot work as it flows through the system. All of this is work that must be managed, controlled, and monitored by experts: diagnosing problems with users’ jobs, handling resource contention between users, resolving problems with jobs that block each other, and so on.

How can companies get past the painful stage and start achieving the cost and big data benefits that Hadoop promises? When we look at the advanced practitioners, those companies that have ample data and ample resources to pursue the benefits of Hadoop, we find evidence that the current ways of using Hadoop still require significant end-customer involvement and hands-on support in order to be successful.

For example, Netflix created the Genie project to streamline the use of Amazon Elastic MapReduce by its data scientists, whom Netflix wanted to insulate from the complexity of creating and managing clusters. The Genie project fills the gaps between what Amazon offers and what Netflix actually needs to run diverse workloads in an efficient manner. After a user describes the nature of a desired workload by using metadata, Genie matches the workload with clusters that are best suited to run it, thereby granting the user’s wish.

Once Hadoop finds its “genie,” it can solve the problem of turning Hadoop into a useful tool that can be run at scale and in production. The reason Hadoop adoption and the move into production is going slowly is that these hard problems are being figured out over and over again, stalling progress. By filling this gap for Hadoop, users can do just what they want to do, and learn things about data, without having to waste time learning about Hadoop.

To read the original article on Venture Beat, click here.

Source: Can Hadoop be Apple easy?

Deriving “Inherently Intelligent” Information from Artificial Intelligence

The emergence of big data and scalable data lakes has made it easy for organizations to focus on amassing enormous quantities of data–almost to the exclusion of the analytic insight which renders big data an asset.

According to Paxata co-founder and Chief Product Officer Nenshad Bardoliwalla, “People are collecting this data but they have no idea what’s actually in the data lake, so they can’t take advantage of it.”

Instead of focusing on data and its collection, enterprises should focus on information and its insight, which is the natural outcome of intelligent analytics. Data preparation exists at the nexus between ingesting data and obtaining valuable information from it, and it is the critical requisite that has traditionally kept data in the backrooms of IT and away from the business users who need it.

Self-service data preparation tools, however, enable business users to actuate most aspects of preparation—including integration, data quality measures, data governance adherence, and transformation—themselves. The incorporation of myriad facets of artificial intelligence including machine learning, natural language processing, and semantic ontologies both automates and expedites these processes, delivering their vaunted capabilities to the business users who have the most to gain from them.

“How do I get information that allows me to very rapidly do analysis or get insight without having to make this a PhD thesis for every person in the company?” asked Paxata co-founder and CEO Prakash Nanduri. “That’s actually the challenge that’s facing our industry these days.”

Preparing Analytics with Artificial Intelligence
Contemporary artificial intelligence, and its accessibility to the enterprise today, is the answer to Nanduri’s question and the key to intelligent information. Transitioning from initial data ingestion to analytic insight in business-viable time frames requires leveraging the aforementioned artificial intelligence capabilities in smart data preparation platforms. These tools render obsolete the manual data preparation that otherwise threatens to consume the time of data scientists and IT departments. “We cannot do any analysis until we have complete, clean, contextual, and consumable data,” Nanduri maintained, enumerating (at a high level) the responsibility of data preparation platforms. Artificial intelligence facilitates those necessities with smart systems that learn from data-derived precedents, user input, natural language, and evolving semantic models “to do all the heavy lifting for the human beings,” Nanduri said.

Utilizing Natural Language
Artificial intelligence algorithms are at the core of modern data preparation platforms such as Paxata that have largely replaced manual preparation. “There are a series of algorithmic techniques that can automate the process of turning data into information,” Bardoliwalla explained. Those algorithms exploit natural language processing in three key ways that offer enhanced user experiences for self-service:

  • User experience is directly improved with natural language search capabilities that hasten data discovery.
  • The aforementioned algorithms are invaluable for joining relevant data sets to one another for integration purposes, while suggesting to end users the best way to do so.
  • NLP is also used to standardize terms that were entered in different ways yet have the same meaning across different systems, reinforcing data quality.

“I always like to say that I just want the system to do it for me,” Bardoliwalla remarked. “Just look at my data, tell me where all the variations are, then recommend the right answer.” The human involvement in this process is vital, particularly with these types of machine learning algorithms that provide recommendations for users to choose from, and which then become the basis for future actions.
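A tiny illustrative sketch of that standardization idea, using only Python’s standard library (this is not Paxata’s actual algorithm): variant spellings are matched to canonical terms by string similarity, and low-confidence cases are surfaced for a human to confirm, mirroring the recommend-then-choose workflow described above.

    import difflib

    # Hypothetical sketch -- not Paxata's algorithm. Suggest a canonical
    # spelling for each observed value via string similarity; anything
    # below the similarity cutoff is left for a human to review.
    canonical = ["IBM", "Microsoft"]
    lookup = {c.upper(): c for c in canonical}
    observed = ["I.B.M.", "ibm", "Microsoft Corp", "Alphabet Inc"]

    for value in observed:
        match = difflib.get_close_matches(value.upper(), list(lookup),
                                          n=1, cutoff=0.6)
        print(value, "->",
              lookup[match[0]] if match else "no suggestion (ask the user)")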

At Scale
Perhaps one of the most discernible advantages of smart data management platforms is their ability to utilize artificial intelligence technologies at scale. Scale is one of the critical prerequisites for making such options enterprise grade, and it encompasses affirmative answers to the critical questions Nanduri asked of these tools: can they “handle security, can you handle lineage, can you handle mixed workloads, can you deal with full automation, do you allow for both interactive workloads and batch jobs, and have a full audit trail?” The key to accounting for these different facets of enterprise-grade data preparation at scale is a distributed computing environment that relies on in-memory techniques to meet the most exacting demands of big data. The scalable nature of the algorithms that power such platforms is optimized in that setting. “It’s not enough to run these algorithms on my desktop,” Bardoliwalla commented. “You have to be able to run this on a billion rows, and standardize a billion rows, or join a billion row data set with a 500 billion row data set.”

Shifting the ETL Paradigm with “Point and Click” Transformation
Such scalability is virtually useless without the swiftness to meet the real-time and near real-time needs of modern business. “With a series of technologies we built on top of Apache Spark including a compiler and optimizer, including columnar caching, including our own transformations that are coded as RDDs which is the core Spark abstraction, we have built a very intelligent distributed computing layer… that allows us to interact with sub-second response time on very large volumes of data,” Bardoliwalla mentioned. The most cogent example of this intersection of scale and speed is in transforming data, which is typically an arduous, time-consuming process with traditional ETL methods. Whereas ETL in relational environments requires exhaustively modeling all possible questions of the data in advance—and significant recalibration time for additional requirements or questions—incorporating a semantic model across all data sources voids such concerns. “Instead of presupposing what semantics are, and premodeling the transformations necessary to get uniform semantics, in a Paxata model we build from the ground up and infer our way into a standardized model,” Bardoliwalla revealed. “We are able to allow the data to emerge into information based on the precedents and the algorithmic recommendations people are doing.”

Overcoming the Dark
The significance of self-service data preparation is not easy to summarize. It involves, yet transcends, its alignment with the overall tendency within the data management landscape to empower the business and facilitate timely control over its data at scale. It is predicated on, yet supersedes, the placement of the foremost technologies in the data space—artificial intelligence and all of its particulars—in the hands of those same people. Similarly, it is about more than the comprehensive nature of these solutions and their ability to reinforce parts of data quality, data governance, transformation, and data integration.

Quintessentially, it symbolizes a much needed victory over dark data, and helps to bring to light information assets that might otherwise remain untapped through a process vital to analytics itself.

“The entire ETL paradigm is broken,” Bardoliwalla said. “The reason we have dark data in the enterprise is because the vast majority of people cannot use the tools that are already available to them to turn data into information. They have to rely on the elite few who have these capabilities, but don’t have the business context.”

In light of this situation, smart data preparation is not only ameliorating ETL, but also fulfilling a longstanding industry need to truly democratize data and their management.

Source: Deriving “Inherently Intelligent” Information from Artificial Intelligence by jelaniharper

May 04, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Insights  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> October 10, 2016 Health and Biotech Analytics News Roundup by pstein

>> One Word Can Speak Volumes About Your Company Culture by bobehayes

>> Happy Holidays! Top 10 blogs from 2012 by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>> Gladbach vs Dortmund: Line-ups and statistics – Bundesliga – official website Under Statistics

>> 7 steps for success with predictive analytics and machine learning … – Health Data Management Under Machine Learning

>> Essay Writing Competition in Statistics – Mathrubhumi English Under Statistics

More NEWS? Click Here

[ FEATURED COURSE]

Python for Beginners with Examples

image

A practical Python course for beginners with examples and exercises…. more

[ FEATURED READ]

Thinking, Fast and Slow

image

Drawing on decades of research in psychology that resulted in a Nobel Prize in Economic Sciences, Daniel Kahneman takes readers on an exploration of what influences thought example by example, sometimes with unlikely wor… more

[ TIPS & TRICKS OF THE WEEK]

Finding success in your data science career? Find a mentor
Yes, most of us don’t feel the need, but most of us really could use one. Because many data science professionals work in isolation, getting an unbiased perspective is not easy. It is also often hard to see how a data science career will progress. A network of mentors addresses these issues, giving data professionals an outside perspective and an unbiased ally. It’s extremely important for successful data science professionals to build a mentor network and use it throughout their careers.

[ DATA SCIENCE JOB Q&A]

Q: Compare R and Python
A: R
– Focuses on better, user-friendly data analysis, statistics and graphical models
– The closer you are to statistics, data science and research, the more you might prefer R
– Statistical models can be written in only a few lines in R
– The same piece of functionality can be written in several ways in R
– Mainly used for standalone computing or analysis on individual servers
– Large number of packages, for anything!

Python
– Used by programmers who want to delve into data science
– The closer you are to an engineering environment, the more you might prefer Python
– Coding and debugging are easier, mainly because of the clean syntax
– Any piece of functionality is always written the same way in Python
– A better fit when data analysis needs to be integrated with web apps
– Good tool for implementing algorithms for production use
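As a small illustration of the comparison above, a statistical model takes only a few lines in Python too, using pandas and statsmodels with an R-style formula (sales.csv and its column names are hypothetical):

    # Assumes pandas and statsmodels are installed, and that sales.csv
    # (a hypothetical dataset) has columns: sales, price, advertising.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("sales.csv")
    model = smf.ols("sales ~ price + advertising", df).fit()  # R-style formula
    print(model.summary())  # coefficients, R-squared, p-values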

Source

[ VIDEO OF THE WEEK]

Understanding #Customer Buying Journey with #BigData

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Data are becoming the new raw material of business. – Craig Mundie

[ PODCAST OF THE WEEK]

#GlobalBusiness at the speed of The #BigAnalytics

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Market research firm IDC has released a new forecast that shows the big data market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015.

Sourced from: Analytics.CLUB #WEB Newsletter

Is big data dating the key to long-lasting romance?

If you want to know if a prospective date is relationship material, just ask them three questions, says Christian Rudder, one of the founders of US internet dating site OKCupid.

  • “Do you like horror movies?”
  • “Have you ever travelled around another country alone?”
  • “Wouldn’t it be fun to chuck it all and go live on a sailboat?”

Why? Because these are the questions first date couples agree on most often, he says.

Mr Rudder discovered this by analysing large amounts of data on OKCupid members who ended up in relationships.

Dating agencies like OKCupid, Match.com – which acquired OKCupid in 2011 for $50m (£30m) – eHarmony and many others, amass this data by making users answer questions about themselves when they sign up.

Some agencies ask as many as 400 questions, and the answers are fed in to large data repositories. Match.com estimates that it has more than 70 terabytes (70,000 gigabytes) of data about its customers.

Applying big data analytics to these treasure troves of information is helping the agencies provide better matches for their customers. And more satisfied customers mean bigger profits.

US internet dating revenues top $2bn (£1.2bn) annually, according to research company IBISWorld. Just under one in 10 of all American adults have tried it.

Image: Morecambe & Wise with Glenda Jackson as Cleopatra. If Cleopatra had used big data analytics, perhaps she wouldn’t have made the ultimately fatal decision to hook up with Mark Antony.

The market for dating using mobile apps is particularly strong and is predicted to grow from about $1bn in 2011 to $2.3bn by 2016, according to Juniper Research.

Porky pies

There is, however, a problem: people lie.

To present themselves in what they believe to be a better light, the information customers provide about themselves is not always completely accurate: men are most commonly economical with the truth about age, height and income, while with women it’s age, weight and build.

Mr Rudder adds that many users also supply other inaccurate information about themselves unintentionally.

“My intuition is that most of what users enter is true, but people do misunderstand themselves,” he says.

For example, a user may honestly believe that they listen mostly to classical music, but analysis of their iTunes listening history or their Spotify playlists might provide a far more accurate picture of their listening habits.

Image: lovers on a picnic. Can big data analytics really engineer the perfect match?

Inaccurate data is a problem because it can lead to unsuitable matches, so some dating agencies are exploring ways to supplement user-provided data with that gathered from other sources.

With users’ permission, dating services could access vast amounts of data from sources including their browser and search histories, film-viewing habits from services such as Netflix and Lovefilm, and purchase histories from online shops like Amazon.

But the problem with this approach is that there is a limit to how much data is really useful, Mr Rudder believes.

“We’ve found that the answers to some questions provide useful information, but if you just collect more data you don’t get high returns on it,” he says.

Social engineering

This hasn’t stopped Hinge, a Washington DC-based dating company, gathering information about its customers from their Facebook pages.

The data is likely to be accurate because other Facebook users police it, Justin McLeod, the company’s founder, believes.

Image: a man pressing a “Like” button. Dating site Hinge uses Facebook data to supplement members’ online dating profiles.

“You can’t lie about where you were educated because one of your friends is likely to say, ‘You never went to that school’,” he points out.

It also infers information about people by looking at their friends, Mr McLeod says.

“There is definitely useful information contained in the fact that you are a friend of someone.”

Hinge suggests matches with people known to their Facebook friends.

“If you show a preference for people who work in finance, or you tend to like Bob’s friends but not Ann’s, we use that when we curate possible matches,” he explains.

The pool of potential matches can be considerable, because Hinge users have an average of 700 Facebook friends, Mr McLeod adds.

‘Collaborative filtering’

But it turns out that algorithms can produce good matches without asking users for any data about themselves at all.

For example, Dr Kang Zhao, an assistant professor at the University of Iowa and an expert in business analytics and social network analysis, has created a match-making system based on a technique known as collaborative filtering.

Dr Zhao’s system looks at users’ behaviour as they browse a dating site for prospective partners, and at the responses they receive from people they contact.

“If you are a boy we identify people who like the same girls as you – which indicates similar taste – and people who get the same response from these girls as you do – which indicates similar attractiveness,” he explains.

Image: a model of the word “love” on a laptop. Do opposites attract, or does it come down to whether you share friends and musical taste?

Dr Zhao’s algorithm can then suggest potential partners in the same way websites like Amazon or Netflix recommend products or movies, based on the behaviour of other customers who have bought the same products, or enjoyed the same films.
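A toy sketch of that idea in Python with NumPy (illustrative only, not Dr Zhao’s actual system): users are rows of a 0/1 contact matrix, similarity of taste is measured on those rows, and candidates are drawn from what the nearest neighbour liked.

    import numpy as np

    # Rows are users; columns are profiles they contacted (1) or not (0).
    # Users with similar contact patterns are assumed to share taste.
    likes = np.array([
        [1, 1, 0, 0, 1],   # user 0
        [1, 1, 0, 0, 0],   # user 1 (similar taste to user 0)
        [0, 0, 1, 1, 0],   # user 2
    ], dtype=float)

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def recommend(user, likes):
        """Suggest profiles contacted by the most similar other user."""
        sims = [cosine(likes[user], likes[v]) if v != user else -1.0
                for v in range(len(likes))]
        neighbour = int(np.argmax(sims))
        # Profiles the neighbour liked that `user` has not contacted yet.
        return np.where((likes[neighbour] == 1) & (likes[user] == 0))[0]

    print(recommend(1, likes))   # -> [4]: user 0 also liked profile 4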

Internet dating may be big business, but no-one has yet devised the perfect matching system. It may well be that the secret of true love is simply not susceptible to big data or any other type of analysis.

“Two people may have exactly the same iTunes history,” OKCupid’s Christian Rudder concludes, “but if one doesn’t like the other’s clothes or the way they look then there simply won’t be any future in that relationship.”

Originally posted via “Is big data dating the key to long-lasting romance?”

Source: Is big data dating the key to long-lasting romance?

Apr 27, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Statistically Significant  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Relative Performance Assessment: Improving your Competitive Advantage by bobehayes

>> Defining Predictive Analytics in Healthcare by analyticsweekpick

>> Development of the Customer Sentiment Index: Measuring Customers’ Attitudes by bobehayes

Wanna write? Click Here

[ NEWS BYTES]

>> Machine learning firm steps up its federal game – Washington Technology Under Machine Learning

>> Verisk Analytics, Inc., Acquires Analyze Re – Yahoo Sports Under Risk Analytics

>> Face Value: sentiment analysis shows business leaders are positive about the year ahead – The Conversation AU Under Sentiment Analysis

More NEWS? Click Here

[ FEATURED COURSE]

Applied Data Science: An Introduction

image

As the world’s data grow exponentially, organizations across all sectors, including government and not-for-profit, need to understand, manage and use big, complex data sets—known as big data…. more

[ FEATURED READ]

Introduction to Graph Theory (Dover Books on Mathematics)

image

A stimulating excursion into pure mathematics aimed at “the mathematically traumatized,” but great fun for mathematical hobbyists and serious mathematicians as well. Requiring only high school algebra as mathematical bac… more

[ TIPS & TRICKS OF THE WEEK]

Finding success in your data science career? Find a mentor
Yes, most of us don’t feel the need, but most of us really could use one. Because many data science professionals work in isolation, getting an unbiased perspective is not easy. It is also often hard to see how a data science career will progress. A network of mentors addresses these issues, giving data professionals an outside perspective and an unbiased ally. It’s extremely important for successful data science professionals to build a mentor network and use it throughout their careers.

[ DATA SCIENCE JOB Q&A]

Q: What is root cause analysis? How do you identify a cause vs. a correlation? Give examples.
A: Root cause analysis:
– A method of problem solving used for identifying the root causes or faults of a problem
– A factor is considered a root cause if removing it prevents the final undesirable event from recurring

Identifying a cause vs. a correlation:
– Correlation: a statistical measure that describes the size and direction of a relationship between two or more variables. A correlation between two variables doesn’t imply that a change in one variable causes the change in the values of the other
– Causation: indicates that one event is the result of the occurrence of the other event; there is a causal relationship between the two events
– The difference between the two types of relationships is easy to state, but establishing cause and effect is difficult

Example: sleeping with one’s shoes on is strongly correlated with waking up with a headache. The correlation-implies-causation fallacy would conclude that sleeping with one’s shoes on causes headaches.
A more plausible explanation is that both are caused by a third factor: going to bed drunk.

Identifying a cause vs. a correlation: use a controlled study
– In medical research, one group may receive a placebo (control) while the other receives a treatment. If the two groups have noticeably different outcomes, the different treatments may have caused the different outcomes
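The shoes/headache example can be made concrete with a small simulation in Python (standard library only, probabilities invented for illustration): a hidden “drunk” factor drives both variables, so they correlate strongly even though neither causes the other.

    import random

    # A third factor (going to bed drunk) drives both sleeping with
    # shoes on and waking with a headache: correlation, not causation.
    random.seed(0)
    n = 10_000
    drunk = [random.random() < 0.2 for _ in range(n)]
    shoes = [d and random.random() < 0.7 for d in drunk]
    headache = [d and random.random() < 0.8 for d in drunk]

    both = sum(s and h for s, h in zip(shoes, headache))
    print(f"P(headache | shoes on) = {both / sum(shoes):.2f}")   # ~0.80
    print(f"P(headache)            = {sum(headache) / n:.2f}")   # ~0.16
    # Conditioning on 'drunk' makes the association vanish,
    # exposing the confounder.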

Source

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Michael OConnell, @Tibco

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Data are becoming the new raw material of business. – Craig Mundie

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Beena Ammanath, @GE

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

94% of Hadoop users perform analytics on large volumes of data not possible before; 88% analyze data in greater detail; while 82% can now retain more of their data.

Sourced from: Analytics.CLUB #WEB Newsletter

Apr 24, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Data Accuracy  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> March 13, 2017 Health and Biotech analytics news roundup by pstein

>> Word For Social Media Strategy for Brick-Mortar Stores: “Community” by v1shal

>> How oil and gas firms are failing to grasp the necessity of Big Data analytics by anum

Wanna write? Click Here

[ NEWS BYTES]

>> Hybrid Cloud Transforms Enterprises – Business 2 Community Under Hybrid Cloud

>> Is Oil A Long Term Buy? – Hellenic Shipping News Worldwide Under Risk Analytics

>> How Big Data is Improving Cyber Security – CSO Online Under cyber security

More NEWS? Click Here

[ FEATURED COURSE]

Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz

image

Use data to build a better startup faster in partnership with Geckoboard… more

[ FEATURED READ]

The Black Swan: The Impact of the Highly Improbable

image

A black swan is an event, positive or negative, that is deemed improbable yet causes massive consequences. In this groundbreaking and prophetic book, Taleb shows in a playful way that Black Swan events explain almost eve… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from a zombie apocalypse of unscalable models
One living, breathing zombie in today’s analytical models is the absence of error bars. Not every model is scalable or holds up as data grows. The error bars attached to almost every model should be duly calibrated: as business models take in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading to a Halloween we never want to see.

[ DATA SCIENCE JOB Q&A]

Q: What is: lift, KPI, robustness, model fitting, design of experiments, the 80/20 rule?
A: Lift:
It’s a measure of the performance of a targeting model (or a rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random-choice targeting model. Lift is simply: target response / average response.

Suppose a population has an average response rate of 5% (to a mailing, for instance). If a certain model (or rule) has identified a segment with a response rate of 20%, then lift = 20/5 = 4.

Typically, the modeler divides the population into quantiles and ranks the quantiles by lift. They can then consider each quantile and, by weighing the predicted response rate against the cost, decide whether to market to that quantile. For example: “if we use the probability scores on customers, we can get 60% of the total responders we’d get mailing randomly by only mailing the top 30% of the scored customers”.
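A minimal sketch of that calculation in Python (the scores and responded lists are invented for illustration): rank customers by model score, take the top fraction, and compare its response rate to the baseline.

    # Lift of the top-scored segment vs. the whole population.
    def lift_at(scores, responded, top_fraction=0.3):
        ranked = sorted(zip(scores, responded), key=lambda p: -p[0])
        cutoff = int(len(ranked) * top_fraction)
        target_rate = sum(r for _, r in ranked[:cutoff]) / cutoff
        overall_rate = sum(responded) / len(responded)
        return target_rate / overall_rate

    scores    = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.15, 0.1, 0.05, 0.01]
    responded = [1,   1,   0,   0,   0,   0,   0,    0,   0,    0]
    print(lift_at(scores, responded))   # (2/3) / 0.2 = 3.33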

KPI:
– Key performance indicator
– A type of performance measurement
– Examples: 0 defects, 10/10 customer satisfaction
– Relies upon a good understanding of what is important to the organization

More examples:

Marketing & Sales:
– New customers acquisition
– Customer attrition
– Revenue (turnover) generated by segments of the customer population
– Often done with a data management platform

IT operations:
– Mean time between failure
– Mean time to repair

Robustness:
– Statistics with good performance even if the underlying distribution is not normal
– Statistics that are not affected by outliers
– A learning algorithm that can reduce the chance of fitting noise is called robust
– The median is a robust measure of central tendency, while the mean is not (see the snippet below)
– The median absolute deviation is likewise more robust than the standard deviation
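A quick demonstration with Python’s statistics module: a single outlier drags the mean far off while the median barely moves.

    import statistics

    data = [10, 11, 9, 10, 12, 10, 11]
    print(statistics.mean(data), statistics.median(data))   # 10.43, 10

    data_with_outlier = data + [1000]
    print(statistics.mean(data_with_outlier),               # 134.125
          statistics.median(data_with_outlier))             # 10.5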

Model fitting:
– How well a statistical model fits a set of observations
– Examples: AIC, R², Kolmogorov–Smirnov test, Chi-squared, deviance (GLM)

Design of experiments:
The design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation.
In its simplest form, an experiment aims at predicting the outcome by changing the preconditions, the predictors.
– Selection of the suitable predictors and outcomes
– Delivery of the experiment under statistically optimal conditions
– Randomization
– Blocking: an experiment may be conducted with the same equipment to avoid any unwanted variations in the input
– Replication: performing the same combination run more than once, in order to get an estimate for the amount of random error that could be part of the process
– Interaction: when an experiment has 3 or more variables, a situation in which the simultaneous influence of two variables on a third is not additive

80/20 rule:
– Pareto principle
– 80% of the effects come from 20% of the causes
– 80% of your sales come from 20% of your clients
– 80% of a company’s complaints come from 20% of its customers

Source

[ VIDEO OF THE WEEK]

#DataScience Approach to Reducing #Employee #Attrition

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Everybody gets so much information all day long that they lose their common sense. – Gertrude Stein

[ PODCAST OF THE WEEK]

Unconference Panel Discussion: #Workforce #Analytics Leadership Panel

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Data is growing faster than ever before and by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.

Originally Posted at: Apr 24, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

Emergency Preparation Checklist for Severe Weather

With a hurricane approaching, I searched for and prepared this list of things to keep in mind and do in case of a weather emergency such as a hurricane. Create a central place where you keep all the equipment and supplies needed in a weather emergency, and post a list of items and important numbers there for easy access. That is where you keep the items you’ll need if disaster strikes suddenly or you need to evacuate. Audit the area regularly to make sure all supplies are ready to grab.

Here are recommendations on what to do before a storm approaches:
— Download weather apps, The Red Cross has a Hurricane App available in the Apple App Store and the Google Play Store.
— Also, download a First Aid app.
— If high wind is expected, cover windows and doors with 5/8-inch plywood.
— Bring in or tie down all outside items that could be picked up by the wind.
— Make sure gutters are clear from any debris.
— Reinforce the garage door.
— Turn the refrigerator to its coldest setting in case power goes off.
— Use a cooler to keep from opening the doors on the freezer or refrigerator.
— Park your car at a safe place, away from Trees or any object that could fly and damage the car.
— Fill a bathtub with water, and seal the drain with a plastic bag to keep the water from slowly leaking out.
— Top off the fuel tank on your car.
— Go over the evacuation plan with the family, and learn alternate routes to safety.
— Learn the location of the nearest shelter or in case of pets, nearest pet-friendly shelter.
— In case of severe flooding, put an ax in your attic.
— If evacuation is needed, stick to marked evacuation routes if possible.
— Store important documents — passports, Social Security cards, birth certificates, deeds — in a watertight container.
— Create an inventory list of your household property.
— Leave a note at some noticeable place about your whereabouts.
— Unplug small appliances and electronics before leaving.
— If possible, turn off the electricity, gas and water for residence.
Here is a list of supplies:
— Water: one gallon per person per day; gather a three-day supply.
— Three days of food, with suggested items including: canned meats, canned or dried fruits, canned vegetables, canned juice, peanut butter, jelly, salt-free crackers, energy/protein bars, trail mix/nuts, dry cereal, cookies or other comfort food.
— Flashlight(s).
— A battery-powered radio, preferably a weather radio.
— Extra batteries.
— A can opener.
— A small fire extinguisher.
— Whistles for each person.
— A first aid kit, including latex gloves; sterile dressings; soap/cleaning agent; antibiotic ointment; burn ointment; adhesive bandages in small, medium and large sizes; eye wash; a thermometer; aspirin/pain reliever; anti-diarrhea tablets; antacids; laxatives; small scissors; tweezers; petroleum jelly.
— A seven-day supply of medications.
— Vitamins.
— A map of the area.
— Baby supplies.
— Pet supplies.
— Wet wipes.
— A camera (to document storm damage).
— A multipurpose tool, with pliers and a screwdriver.
— Cell phones and chargers.
— Contact information for the family.
— A sleeping bag for each person.
— Extra cash.
— An extra set of house keys.
— An extra set of car keys.
— An emergency ladder to evacuate the second floor.
— Household bleach.
— Paper cups, plates and paper towels.
— A silver foil emergency blanket (else a normal blanket will do).
— Insect repellent.
— Rain gear.
— Tools and supplies for securing your home.
— Plastic sheeting.
— Duct tape.
— Dust masks.
— Activities for children.
— Charcoal and matches, if you have a portable grill. But only use it outside.
American Red Cross tips on what to do after the storm arrives:
— Continue listening to a NOAA Weather Radio or the local news for the latest updates.
— Stay alert for extended rainfall and subsequent flooding even after the hurricane or tropical storm has ended.
— If you evacuated, return home only when officials say it is safe.
— Drive only if necessary and avoid flooded roads and washed out bridges.
— Keep away from loose or dangling power lines and report them immediately to the power company.
— Stay out of any building that has water around it.
— Inspect your home for damage. Take pictures of damage, both of the building and its contents, for insurance purposes.
— Use flashlights in the dark. Do NOT use candles.
— Avoid drinking or preparing food with tap water until you are sure it’s not contaminated.
— Check refrigerated food for spoilage. If in doubt, throw it out.
— Wear protective clothing and be cautious when cleaning up to avoid injury.
— Watch animals closely and keep them under your direct control.
— Use the telephone only for emergency calls.
Sources: American Red Cross, Federal Emergency Management Agency, National Hurricane Center. The Red Cross also provides checklists for specific weather emergencies; they can be found here.

Source

Apr 20, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

 

Issue #15    Web Version
Contact Us: info@analyticsweek.com

[  ANNOUNCEMENT ]

I hope this note finds you well. Please excuse the brief interruption in our newsletter. Over the past few weeks, we have been doing some A/B testing and mounting our newsletter on TAO.ai, our AI-led coach. This newsletter and future editions will use TAO’s capabilities. As with any AI, it needs some training, so kindly excuse/report the rough edges.

– Team TAO/AnalyticsCLUB

[  COVER OF THE WEEK ]

image
Weak data  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Collaborative Analytics: Analytics for your BigData by v1shal

>> Colleges are using big data to identify when students are likely to flame out by analyticsweekpick

>> Rise of Data Capital by Paul Sonderegger by thebiganalytics

Wanna write? Click Here

[ NEWS BYTES]

>> Strategy Analytics: Android accounts for 88% of smartphones shipped in Q3 2016 – GSMArena.com Under Analytics

>> Did you know we’re sedentary but less obese than average? So says Miami statistics website – Miami Herald Under Statistics

>> MHS grad sinks Steel Roots in cyber security – News – North of … – Wicked Local North of Boston Under cyber security

More NEWS? Click Here

[ FEATURED COURSE]

Statistical Thinking and Data Analysis

image

This course is an introduction to statistical data analysis. Topics are chosen from applied probability, sampling, estimation, hypothesis testing, linear regression, analysis of variance, categorical data analysis, and n… more

[ FEATURED READ]

The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t

image

People love statistics. Statistics, however, do not always love them back. The Signal and the Noise, Nate Silver’s brilliant and elegant tour of the modern science-slash-art of forecasting, shows what happens when Big Da… more

[ TIPS & TRICKS OF THE WEEK]

Grow at the speed of collaboration
Research by Cornerstone OnDemand points to the need for better collaboration within the workforce, and the data analytics domain is no different. A rapidly changing and growing industry like data analytics is very difficult to keep up with as an isolated workforce. A good collaborative work environment facilitates a better flow of ideas, improved team dynamics, rapid learning, and a greater ability to cut through the noise. So, embrace collaborative team dynamics.

[ DATA SCIENCE JOB Q&A]

Q:What is cross-validation? How to do it right?
A: It’s a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. Mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model in the training phase (i.e. validation data set) in order to limit problems like overfitting, and get an insight on how the model will generalize to an independent data set.

Examples: leave-one-out cross validation, K-fold cross validation

How to do it right?

– The training and validation data sets have to be drawn from the same population
– Predicting stock prices: for a model trained on a certain 5-year period, it’s unrealistic to treat the subsequent 5 years as a draw from the same population
– Common mistake: steps such as choosing the kernel parameters of an SVM should be cross-validated as well

Bias-variance trade-off for k-fold cross-validation:

Leave-one-out cross-validation: gives approximately unbiased estimates of the test error, since each training set contains almost the entire data set (n−1 observations).

But: we average the outputs of n fitted models, each of which is trained on an almost identical set of observations, hence the outputs are highly correlated. Since the variance of a mean of quantities increases when the correlation of those quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation.

Typically, we choose k=5 or k=10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor from high variance.
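A short sketch of 5-fold cross-validation, assuming scikit-learn is available; the GridSearchCV step shows a hyperparameter (C) being chosen inside the cross-validation loop, avoiding the common mistake noted above.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, cross_val_score

    X, y = load_iris(return_X_y=True)

    # Plain 5-fold CV: five held-out score estimates, then their average.
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(scores.mean(), scores.std())

    # Doing it right: tune C inside the cross-validation loop.
    grid = GridSearchCV(LogisticRegression(max_iter=1000),
                        {"C": [0.01, 0.1, 1, 10]}, cv=5)
    grid.fit(X, y)
    print(grid.best_params_)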
Source

[ ENGAGE WITH CLUB]

 ASK Club      FIND Project   

Get HIRED  #GetTAO Coach

 

[ FOLLOW & SIGNUP]

TAO

iTunes

XbyTAO

Facebook

Twitter

Youtube

Analytic.Club

LinkedIn

Newsletter

[ ENGAGE WITH TAO]

#GetTAO Coach

  Join @xTAOai  

[ VIDEO OF THE WEEK]

Data-As-A-Service (#DAAS) to enable compliance reporting

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

It is a capital mistake to theorize before one has data. Insensibly, one begins to twist the facts to suit theories, instead of theories to suit facts. – Arthur Conan Doyle

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with David Rose, @DittoLabs

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

By 2020, our accumulated digital universe of data will grow from 4.4 zettabytes today to around 44 zettabytes, or 44 trillion gigabytes.

[ TAO DEMO]

AnalyticsClub Demo Video

 

[ PROGRAMS]

Invite top local professionals to your office

 

↓

 

Data Analytics Hiring Drive

 

 
*This Newsletter is hand-curated and autogenerated using #TEAMTAO & TAO; please excuse some initial blemishes. As with any AI, it may get worse before it gets relevant, so we ask for your patience & feedback.
Let us know how we could improve the experience using: feedbackform

Copyright © 2016 AnalyticsWeek LLC.

Data-As-A-Service to enable compliance reporting


Big data tools are clearly very powerful & flexible for dealing with unstructured information. However, they are equally applicable, especially when combined with columnar stores such as Parquet, to rapidly changing regulatory requirements that involve reporting & analyzing data across multiple silos of structured information. This is an example of applying multiple big data tools to create data-as-a-service that brings together a data hub and enables very high-performance analytics & reporting, leveraging a combination of HDFS, Spark, Cassandra, Parquet, Talend and Jasper. In this talk, we will discuss the architecture, challenges & opportunities of designing data-as-a-service that enables businesses to respond to changing regulatory & compliance requirements.
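As a rough illustration of the reporting side of such a stack, here is a hypothetical PySpark sketch (paths, table layout and column names are invented; the talk’s actual pipeline also spans HDFS, Cassandra, Talend and Jasper) that reads a Parquet-backed data hub, caches it in memory, and writes a compliance aggregate:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("compliance-report").getOrCreate()

    # Hypothetical Parquet-backed data hub on HDFS.
    loans = spark.read.parquet("hdfs:///datahub/loans")
    loans.cache()  # columnar data held in memory for fast, repeated queries

    # Aggregate for a quarterly compliance report (illustrative columns).
    report = (loans
              .filter(F.col("report_date") == "2017-03-31")
              .groupBy("state", "product")
              .agg(F.count("*").alias("loan_count"),
                   F.sum("balance").alias("total_balance")))

    report.write.mode("overwrite").parquet("hdfs:///reports/q1_compliance")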

Speaker:
Girish Juneja, Senior Vice President/CTO at Altisource

Girish Juneja is in charge of guiding Altisource’s technology vision and will lead technology teams across Boston, Los Angeles, Seattle and other cities nationally, according to a release.

Girish was formerly general manager of big data products and chief technology officer of data center software at California-based chip maker Intel Corp. (Nasdaq: INTC). He helped lead several acquisitions including the acquisition of McAfee Inc. in 2011, according to a release.

He was also the co-founder of technology company Sarvega Inc., acquired by Intel in 2005, and he holds a master’s degree in computer science and an MBA in finance and strategy from the University of Chicago.

Slideshare:
[slideshare id=40783439&doc=girshmeetuppresntationmm3-141027135842-conversion-gate01]
Video:

Source: Data-As-A-Service to enable compliance reporting