SAS enlarges its palette for big data analysis

SAS offers new tools for training, as well as for banking and network security.

SAS Institute did big data decades before big data was the buzz, and now the company is expanding on the ways large-scale computerized analysis can help organizations.

As part of its annual SAS Global Forum, being held in Dallas this week, the company has released new software customized for banking and cybersecurity, for training more people to understand SAS analytics, and for helping non-data scientists do predictive analysis with visual tools.

Founded in 1976, SAS was one of the first companies to offer analytics software for businesses. A private company that generated US$3 billion in revenue in 2014, SAS has devoted considerable research and development funds to enhance its core Statistical Analysis System (SAS) platform over the years. The new releases are the latest fruits of these labors.

With the aim of getting more people trained in the SAS ways, the company has posted its training software, SAS University Edition, on the Amazon Web Services Marketplace. Using AWS eliminates the work of setting up the software on a personal computer, and first-time AWS users can take advantage of the 12-month free tier to train on the software at no cost.

SAS launched the University Edition a year ago, and it has since been downloaded over 245,000 times, according to the company.

With the release, SAS is taking aim at one of the chief problems organizations face today in data analysis: finding qualified talent. By 2018, the U.S. alone will face a shortage of between 140,000 and 190,000 people with analytical expertise, the McKinsey Global Institute has estimated.

Predictive analytics is becoming necessary even in fields where it hasn’t been heavily used before. One example is information technology security. Security managers at large organizations are growing increasingly frustrated at learning of breaches only after they happen. SAS is betting that applying predictive and behavioral analytics to operational IT data, such as server logs, can help identify and deter break-ins and other malicious activity as they unfold.

Last week, SAS announced that it’s building a new software package, called SAS Cybersecurity, which will process large amounts of real-time data from network operations. The software, which will be generally available by the end of the year, will build a model of routine activity, which it can then use to identify and flag suspicious behavior.

SAS is also customizing its software for the banking industry. A new package, called SAS Model Risk Management, provides a detailed model of how a bank operates so that the bank can better understand its financial risks, as well as convey them to regulators.

SAS also plans to broaden its user base by making its software more appealing beyond computer statisticians and data scientists. To this end, the company has paired its data exploration software, called SAS Visual Analytics, with its software for developing predictive models, called SAS Visual Statistics. The pairing can allow non-data scientists, such as line of business analysts and risk managers, to predict future trends based on current data.

The combined products can also be tied in with SAS In-Memory Analytics, software designed to allow large amounts of data to be held entirely in the server’s memory, speeding analysis. It can also work with data on Hadoop clusters, relational database systems or SAS servers.

QVC, the TV and online retailer, has already paired the two products. At its Italian operations, QVC streamlined its supply chain operations by allowing its sales staff to spot buying trends more easily, and spend less time building reports, according to SAS.

The combined package of SAS Visual Analytics and SAS Visual Statistics will be available in May.


Source: SAS enlarges its palette for big data analysis

Can Hadoop be Apple easy?

Hadoop is now on the minds of executives who care deeply about the power of their rapidly accumulating data. It has already inspired a broad range of big data experiments, established a beachhead as a production system in the enterprise and garnered tremendous optimism for expanded use.

However, it is also starting to create tremendous frustration. A recent analyst report showed less enthusiasm for Hadoop pilots this year than last. Many companies are getting lost on their way to big data glory. Instead, they find themselves in a confusing place of complexity and befuddlement. What’s going on?

While there are heady predictions that by 2020, 75 percent of the Fortune 2000 will be running a 1,000-node Hadoop cluster, there is also evidence that Hadoop is not being adopted as easily as one would think. In 2013, six years after the birth of Hadoop, Gartner said that only 10 percent of the organizations it surveyed were using it. In the most recent Gartner survey, fewer than half of the 284 respondents had invested in Hadoop technology or even planned to do so.


The current attempts to transform Hadoop into a full-blown enterprise product accomplish only the basics and leave the most challenging part, operations, to the users, who, for good reason, wonder what to do next. And here we get to the problem: Hadoop is still complex to run at scale and in production.

Once you get Hadoop running, the real work is just beginning. To provide value to the business, you need to maintain a cluster that is always up and performing well while remaining transparent to the end user. You must make sure jobs don’t get in each other’s way, support different types of jobs that compete for resources, and monitor and troubleshoot work as it flows through the system. All of this is managed, controlled, and monitored by experts, whose tasks include diagnosing problems with users’ jobs, handling resource contention between users, and resolving problems with jobs that block each other.

How can companies get past the painful stage and start achieving the cost and big data benefits that Hadoop promises? When we look at the advanced practitioners, those companies that have ample data and ample resources to pursue the benefits of Hadoop, we find evidence that the current ways of using Hadoop still require significant end-customer involvement and hands-on support in order to be successful.

For example, Netflix created the Genie project to streamline the use of Amazon Elastic MapReduce by its data scientists, whom Netflix wanted to insulate from the complexity of creating and managing clusters. The Genie project fills the gaps between what Amazon offers and what Netflix actually needs to run diverse workloads in an efficient manner. After a user describes the nature of a desired workload by using metadata, Genie matches the workload with clusters that are best suited to run it, thereby granting the user’s wish.
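The metadata-matching idea can be sketched in a few lines of Python. The cluster names, tags, and scoring rule below are invented for illustration and are not Genie’s actual API, which is considerably more involved:

```python
# Hypothetical sketch of metadata-based cluster matching in the spirit of
# Netflix's Genie. Cluster names and tags are invented for illustration.

CLUSTERS = [
    {"name": "etl-prod", "tags": {"hadoop", "hive", "batch"}},
    {"name": "adhoc",    "tags": {"hadoop", "pig", "interactive"}},
    {"name": "ml-spark", "tags": {"spark", "interactive"}},
]

def match_cluster(job_tags):
    """Return the cluster whose tags best cover the job's requested tags."""
    score, name = max((len(job_tags & c["tags"]), c["name"]) for c in CLUSTERS)
    return name if score else None

print(match_cluster({"hadoop", "hive"}))  # etl-prod
```

A user describes the workload only through tags; the system picks the best-suited cluster, which is the gap-filling role the article attributes to Genie.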

Once Hadoop finds its “genie,” it can solve the problem of turning Hadoop into a useful tool that can be run at scale and in production. The reason Hadoop adoption and the move into production is going slowly is that these hard problems are being figured out over and over again, stalling progress. By filling this gap for Hadoop, users can do just what they want to do, and learn things about data, without having to waste time learning about Hadoop.

The original article appeared on VentureBeat.

Source: Can Hadoop be Apple easy?

Deriving “Inherently Intelligent” Information from Artificial Intelligence

The emergence of big data and scalable data lakes has made it easy for organizations to focus on amassing enormous quantities of data–almost to the exclusion of the analytic insight which renders big data an asset.

According to Paxata co-founder and Chief Product Officer Nenshad Bardoliwalla, “People are collecting this data but they have no idea what’s actually in the data lake, so they can’t take advantage of it.”

Instead of focusing on data and its collection, enterprises should focus on information and its insight, the natural outcome of intelligent analytics. Data preparation sits at the nexus between ingesting data and obtaining valuable information from it, and it is the critical prerequisite that has traditionally kept data in the backrooms of IT and away from the business users who need it.

Self-service data preparation tools, however, enable business users to actuate most aspects of preparation—including integration, data quality measures, data governance adherence, and transformation—themselves. The incorporation of myriad facets of artificial intelligence including machine learning, natural language processing, and semantic ontologies both automates and expedites these processes, delivering their vaunted capabilities to the business users who have the most to gain from them.

“How do I get information that allows me to very rapidly do analysis or get insight without having to make this a PhD thesis for every person in the company?” asked Paxata co-founder and CEO Prakash Nanduri. “That’s actually the challenge that’s facing our industry these days.”

Preparing Analytics with Artificial Intelligence
Contemporary artificial intelligence, and its accessibility to the enterprise today, is the answer to Nanduri’s question, and the key to intelligent information. Transitioning from initial data ingestion to analytic insight in business-viable time frames requires leveraging the aforementioned artificial intelligence capabilities in smart data preparation platforms. These tools effectively render obsolete the manual data preparation that otherwise threatens to consume the time of data scientists and IT departments. “We cannot do any analysis until we have complete, clean, contextual, and consumable data,” Nanduri maintained, enumerating (at a high level) the responsibility of data preparation platforms. Artificial intelligence facilitates those necessities with smart systems that learn from data-derived precedents, user input, natural language, and evolving semantic models “to do all the heavy lifting for the human beings,” Nanduri said.

Utilizing Natural Language
Artificial intelligence algorithms are at the core of modern data preparation platforms such as Paxata that have largely replaced manual preparation. “There are a series of algorithmic techniques that can automate the process of turning data into information,” Bardoliwalla explained. Those algorithms exploit natural language processing in three key ways that offer enhanced user experiences for self-service:

  • User experience is directly improved with search capabilities via natural language processing that hasten aspects of data discovery.
  • The aforementioned algorithms are invaluable for joining relevant data sets to one another for integration purposes, while suggesting to end users the best way to do so.
  • NLP is also used to standardize terms that may have been entered in different ways yet have the same meaning across different systems, in a manner that reinforces data quality.
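As a loose illustration of the third point (not Paxata’s actual, proprietary algorithms), fuzzy matching against a canonical vocabulary can reconcile variant spellings; the vocabulary and inputs below are invented:

```python
import difflib

# Canonical vocabulary a preparation platform might maintain (invented).
CANONICAL = ["International Business Machines", "United States", "New York"]

def standardize(term, cutoff=0.6):
    """Map a variant spelling to its closest canonical term, if any clears the cutoff."""
    hits = difflib.get_close_matches(term, CANONICAL, n=1, cutoff=cutoff)
    return hits[0] if hits else term

print(standardize("Intl Business Machines"))  # International Business Machines
print(standardize("Untied States"))           # United States
```

A production system would learn the canonical terms and thresholds from data and user feedback rather than hard-coding them, but the standardization step looks much like this.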

“I always like to say that I just want the system to do it for me,” Bardoliwalla remarked. “Just look at my data, tell me where all the variations are, then recommend the right answer.” The human involvement in this process is vital, particularly with these types of machine learning algorithms, which provide recommendations that users choose from—and which then become the basis for future actions.

At Scale
Perhaps one of the most discernible advantages of smart data management platforms is their ability to utilize artificial intelligence technologies at scale. Scale is one of the critical prerequisites for making such options enterprise grade, and encompasses affirmative responses to critical questions Nanduri asked of these tools, such as can they “handle security, can you handle lineage, can you handle mixed workloads, can you deal with full automation, do you allow for both interactive workloads and batch jobs, and have a full audit trail?” The key to accounting for these different facets of enterprise-grade data preparation at scale is a distributed computing environment that relies on in-memory techniques to meet the most exacting demands of big data. The scalable nature of the algorithms that power such platforms is optimized in that setting. “It’s not enough to run these algorithms on my desktop,” Bardoliwalla commented. “You have to be able to run this on a billion rows, and standardize a billion rows, or join a billion row data set with a 500 billion row data set.”

Shifting the ETL Paradigm with “Point and Click” Transformation
Such scalability is virtually useless without the swiftness to account for the real-time and near real-time needs of modern business. “With a series of technologies we built on top of Apache Spark including a compiler and optimizer, including columnar caching, including our own transformations that are coded as RDDs which is the core Spark abstraction, we have built a very intelligent distributed computing layer… that allows us to interact with sub-second response time on very large volumes of data,” Bardoliwalla mentioned. The most cogent example of this intersection of scale and expeditiousness is in transforming data, which is typically an arduous, time consuming process utilizing traditional ETL methods. Whereas ETL in relational environments requires exhaustive modeling for all possible questions of data in advance—and significant re-calibration times for additional requirements or questions—the incorporation of a semantic model across all data sources voids such concerns. “Instead of presupposing what semantics are, and premodeling the transformations necessary to get uniform semantics, in a Paxata model we build from the ground up and infer our way into a standardized model,” Bardoliwalla revealed. “We are able to allow the data to emerge into information based on the precedents and the algorithmic recommendations people are doing.”

Overcoming the Dark
The significance of self-service data preparation is not easy to summarize. It involves, yet transcends, its alignment with the overall tendency within the data management landscape to empower the business and facilitate timely control over its data at scale. It is predicated on, yet supersedes, the placement of the foremost technologies in the data space—artificial intelligence and all of its particulars—in the hands of those same people. Similarly, it is about more than the comprehensive nature of these solutions and their ability to reinforce parts of data quality, data governance, transformation, and data integration.

Quintessentially, it symbolizes a much needed victory over dark data, and helps to bring to light information assets that might otherwise remain untapped through a process vital to analytics itself.

“The entire ETL paradigm is broken,” Bardoliwalla said. “The reason we have dark data in the enterprise is because the vast majority of people cannot use the tools that are already available to them to turn data into information. They have to rely on the elite few who have these capabilities, but don’t have the business context.”

In light of this situation, smart data preparation is not only ameliorating ETL, but also fulfilling a longstanding industry need to truly democratize data and their management.

Source: Deriving “Inherently Intelligent” Information from Artificial Intelligence by jelaniharper

Is big data dating the key to long-lasting romance?

If you want to know if a prospective date is relationship material, just ask them three questions, says Christian Rudder, one of the founders of US internet dating site OKCupid.

  • “Do you like horror movies?”
  • “Have you ever travelled around another country alone?”
  • “Wouldn’t it be fun to chuck it all and go live on a sailboat?”

Why? Because these are the questions first date couples agree on most often, he says.

Mr Rudder discovered this by analysing large amounts of data on OKCupid members who ended up in relationships.

Dating agencies like OKCupid – which was acquired in 2011 for $50m (£30m) – eHarmony and many others amass this data by making users answer questions about themselves when they sign up.

Some agencies ask as many as 400 questions, and the answers are fed into large data repositories. One agency estimates that it has more than 70 terabytes (70,000 gigabytes) of data about its customers.

Applying big data analytics to these treasure troves of information is helping the agencies provide better matches for their customers. And more satisfied customers mean bigger profits.

US internet dating revenues top $2bn (£1.2bn) annually, according to research company IBISWorld. Just under one in 10 of all American adults have tried it.

If Cleopatra had used big data analytics, perhaps she wouldn’t have made the ultimately fatal decision to hook up with Mark Antony

The market for dating using mobile apps is particularly strong and is predicted to grow from about $1bn in 2011 to $2.3bn by 2016, according to Juniper Research.

Porky pies

There is, however, a problem: people lie.

To present themselves in what they believe to be a better light, customers do not always provide completely accurate information about themselves: men are most commonly economical with the truth about age, height and income, while with women it’s age, weight and build.

Mr Rudder adds that many users also supply other inaccurate information about themselves unintentionally.

“My intuition is that most of what users enter is true, but people do misunderstand themselves,” he says.

For example, a user may honestly believe that they listen mostly to classical music, but analysis of their iTunes listening history or their Spotify playlists might provide a far more accurate picture of their listening habits.

Can big data analytics really engineer the perfect match?

Inaccurate data is a problem because it can lead to unsuitable matches, so some dating agencies are exploring ways to supplement user-provided data with that gathered from other sources.

With users’ permission, dating services could access vast amounts of data from sources including their browser and search histories, film-viewing habits from services such as Netflix and Lovefilm, and purchase histories from online shops like Amazon.

But the problem with this approach is that there is a limit to how much data is really useful, Mr Rudder believes.

“We’ve found that the answers to some questions provide useful information, but if you just collect more data you don’t get high returns on it,” he says.

Social engineering

This hasn’t stopped Hinge, a Washington DC-based dating company, gathering information about its customers from their Facebook pages.

The data is likely to be accurate because other Facebook users police it, Justin McLeod, the company’s founder, believes.

Dating site Hinge uses Facebook data to supplement members’ online dating profiles

“You can’t lie about where you were educated because one of your friends is likely to say, ‘You never went to that school’,” he points out.

It also infers information about people by looking at their friends, Mr McLeod says.

“There is definitely useful information contained in the fact that you are a friend of someone.”

Hinge suggests matches with people known to their Facebook friends.

“If you show a preference for people who work in finance, or you tend to like Bob’s friends but not Ann’s, we use that when we curate possible matches,” he explains.

The pool of potential matches can be considerable, because Hinge users have an average of 700 Facebook friends, Mr McLeod adds.

‘Collaborative filtering’

But it turns out that algorithms can produce good matches without asking users for any data about themselves at all.

For example, Dr Kang Zhao, an assistant professor at the University of Iowa and an expert in business analytics and social network analysis, has created a match-making system based on a technique known as collaborative filtering.

Dr Zhao’s system looks at users’ behaviour as they browse a dating site for prospective partners, and at the responses they receive from people they contact.

“If you are a boy we identify people who like the same girls as you – which indicates similar taste – and people who get the same response from these girls as you do – which indicates similar attractiveness,” he explains.
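The intuition can be sketched with a toy example. The users and “likes” below are invented, and a real system such as Dr Zhao’s also models the responses people receive, which this sketch omits:

```python
# Toy collaborative-filtering sketch: recommend profiles liked by the other
# user whose taste is most similar. All names and "likes" are invented.

LIKES = {
    "alex": {"dana", "erin", "fay"},
    "ben":  {"dana", "erin", "gil"},
    "carl": {"fay"},
}

def jaccard(a, b):
    """Similarity of two sets of liked profiles."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(user):
    """Suggest profiles liked by the most similar other user but not yet by this one."""
    _, nearest = max((jaccard(LIKES[user], LIKES[u]), u) for u in LIKES if u != user)
    return LIKES[nearest] - LIKES[user]

print(recommend("alex"))  # {'gil'}: ben shares alex's taste for dana and erin
```

This is the same "people like you also liked" mechanic the article compares to Amazon and Netflix recommendations.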

Do opposites attract or does it come down to whether you share friends and musical taste?

Dr Zhao’s algorithm can then suggest potential partners in the same way websites like Amazon or Netflix recommend products or movies, based on the behaviour of other customers who have bought the same products, or enjoyed the same films.

Internet dating may be big business, but no-one has yet devised the perfect matching system. It may well be that the secret of true love is simply not susceptible to big data or any other type of analysis.

“Two people may have exactly the same iTunes history,” OKCupid’s Christian Rudder concludes, “but if one doesn’t like the other’s clothes or the way they look then there simply won’t be any future in that relationship.”


Source: Is big data dating the key to long-lasting romance?

Apr 24, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)





[ AnalyticsWeek BYTES]

>> March 13, 2017 Health and Biotech analytics news roundup by pstein

>> Word For Social Media Strategy for Brick-Mortar Stores: “Community” by v1shal

>> How oil and gas firms are failing to grasp the necessity of Big Data analytics by anum



Hybrid Cloud Transforms Enterprises – Business 2 Community (under Hybrid Cloud)

Is Oil A Long Term Buy? – Hellenic Shipping News Worldwide (under Risk Analytics)

How Big Data is Improving Cyber Security – CSO Online (under cyber security)



Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz


Use data to build a better startup faster in partnership with Geckoboard… more


The Black Swan: The Impact of the Highly Improbable


A black swan is an event, positive or negative, that is deemed improbable yet causes massive consequences. In this groundbreaking and prophetic book, Taleb shows in a playful way that Black Swan events explain almost eve… more


Save yourself from zombie apocalypse from unscalable models
One lurking zombie in today’s analytical models is the absence of error bars. Not every model is scalable or holds up as data grows. Error bars should be attached to almost every model and duly calibrated: as business models rake in more data, the error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failures that nobody wants to see.


Q: What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
A: Lift:
It’s a measure of the performance of a targeting model (or a rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random-choice targeting model. Lift is simply: target response / average response.

Suppose a population has an average response rate of 5% (for a mailing, for instance). If a certain model (or rule) has identified a segment with a response rate of 20%, then lift = 20/5 = 4.

Typically, the modeler seeks to divide the population into quantiles, and rank the quantiles by lift. He can then consider each quantile, and by weighing the predicted response rate against the cost, he can decide to market that quantile or not.
“if we use the probability scores on customers, we can get 60% of the total responders we’d get mailing randomly by only mailing the top 30% of the scored customers”.
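The decile-ranking procedure described above can be sketched as follows, using synthetic (score, responded) pairs rather than real campaign data:

```python
# Sketch of lift by decile: rank cases by model score, split into ten equal
# groups, and compare each group's response rate to the overall rate.

def lift_by_decile(scored):
    """scored: list of (score, responded) pairs; assumes len is a multiple of 10."""
    ranked = sorted(scored, key=lambda p: p[0], reverse=True)
    overall = sum(r for _, r in ranked) / len(ranked)
    n = len(ranked) // 10
    return [
        sum(r for _, r in ranked[d * n:(d + 1) * n]) / n / overall
        for d in range(10)
    ]

# 100 synthetic cases where only the 10 highest-scored respond: the top
# decile's 100% response rate against a 10% overall rate gives lift 10.
data = [(100 - i, 1 if i < 10 else 0) for i in range(100)]
print(lift_by_decile(data)[0])  # 10.0
```

The modeler would then weigh each decile’s lift against the cost of marketing to it, as described above.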

KPI:
– Key performance indicator
– A type of performance measurement
– Examples: 0 defects, 10/10 customer satisfaction
– Relies upon a good understanding of what is important to the organization

More examples:

Marketing & Sales:
– New customers acquisition
– Customer attrition
– Revenue (turnover) generated by segments of the customer population
– Often done with a data management platform

IT operations:
– Mean time between failure
– Mean time to repair

Robustness:
– Statistics with good performance even if the underlying distribution is not normal
– Statistics that are not affected by outliers
– A learning algorithm that can reduce the chance of fitting noise is called robust
– Median is a robust measure of central tendency, while mean is not
– Median absolute deviation is also more robust than the standard deviation
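A quick illustration with Python’s standard library (the data values are invented) shows how the median and MAD shrug off an outlier that distorts the mean:

```python
import statistics

# Synthetic data with one extreme outlier.
data = [10, 11, 9, 10, 12, 10, 11, 1000]

def mad(xs):
    """Median absolute deviation: a robust counterpart of the standard deviation."""
    m = statistics.median(xs)
    return statistics.median([abs(x - m) for x in xs])

print(statistics.mean(data))    # 134.125 -- dragged up by the outlier
print(statistics.median(data))  # 10.5    -- barely notices it
print(mad(data))                # 0.5
```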

Model fitting:
– How well a statistical model fits a set of observations
– Examples: AIC, R2, Kolmogorov-Smirnov test, Chi 2, deviance (glm)

Design of experiments:
The design of any task that aims to describe or explain the variation of information under conditions that are hypothesized to reflect the variation.
In its simplest form, an experiment aims at predicting the outcome by changing the preconditions, the predictors.
– Selection of the suitable predictors and outcomes
– Delivery of the experiment under statistically optimal conditions
– Randomization
– Blocking: an experiment may be conducted with the same equipment to avoid any unwanted variations in the input
– Replication: performing the same combination run more than once, in order to get an estimate for the amount of random error that could be part of the process
– Interaction: when an experiment has 3 or more variables, the situation in which the interaction of two variables on a third is not additive

80/20 rule:
– Pareto principle
– 80% of the effects come from 20% of the causes
– 80% of your sales come from 20% of your clients
– 80% of a company’s complaints come from 20% of its customers
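The sales-concentration version of the rule is easy to check on data; here is a tiny sketch with invented per-client revenue figures:

```python
# Quick check of the 80/20 pattern on invented per-client revenue figures.

sales = [800, 450, 60, 50, 40, 30, 25, 20, 15, 10]

def top_share(values, fraction=0.2):
    """Fraction of the total contributed by the top `fraction` of values."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

print(f"{top_share(sales):.0%}")  # share of sales from the top 20% of clients
```

On this made-up book of business the top two clients contribute about 83% of revenue, roughly the Pareto pattern.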



#DataScience Approach to Reducing #Employee #Attrition



Everybody gets so much information all day long that they lose their common sense. – Gertrude Stein


Unconference Panel Discussion: #Workforce #Analytics Leadership Panel





Data is growing faster than ever before and by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.

Originally Posted at: Apr 24, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

Emergency Preparation Checklist for Severe Weather

Ahead of a hurricane’s arrival, I searched for and compiled this list of things to keep in mind and do in case of a weather emergency such as a hurricane. Create a central place where you keep all the equipment and supplies needed in a weather emergency, and post a list of the items and important phone numbers there for easy access. That is where you can keep items you’ll need if disaster strikes suddenly or you need to evacuate. Audit the area regularly to make sure all supplies are ready to grab.

Here are recommendations on what to do before a storm approaches:
— Download weather apps. The Red Cross has a Hurricane App available in the Apple App Store and the Google Play Store.
— Also, download a First Aid app.
— If high wind is expected, seal up windows and doors with 5/8 inch plywood.
— Bring in or tie down any outside items that could be picked up by the wind.
— Make sure gutters are clear from any debris.
— Reinforce the garage door.
— Turn the refrigerator to its coldest setting in case power goes off.
— Use a cooler to keep from opening the doors on the freezer or refrigerator.
— Park your car in a safe place, away from trees or any objects that could be blown into it and damage it.
— Fill a bathtub with water, and seal the drain with a plastic bag so the water doesn’t slowly leak away.
— Top off the fuel tank on your car.
— Go over the evacuation plan with the family, and learn alternate routes to safety.
— Learn the location of the nearest shelter or in case of pets, nearest pet-friendly shelter.
— In case of severe flooding, put an ax in your attic.
— If evacuation is needed, stick to marked evacuation routes if possible.
— Store important documents — passports, Social Security cards, birth certificates, deeds — in a watertight container.
— Create inventory list for your household property.
— Leave a note at some noticeable place about your whereabouts.
— Unplug small appliances and electronics before leaving.
— If possible, turn off the electricity, gas and water for residence.
Here is a list of supplies:
— Water: one gallon per person per day; gather a three-day supply.
— Three days of food, with suggested items including: canned meats, canned or dried fruits, canned vegetables, canned juice, peanut butter, jelly, salt-free crackers, energy/protein bars, trail mix/nuts, dry cereal, cookies or other comfort food.
— Flashlight(s).
— A battery-powered radio, preferably a weather radio.
— Extra batteries.
— A can opener.
— A small fire extinguisher.
— Whistles for each person.
— A first aid kit, including latex gloves; sterile dressings; soap/cleaning agent; antibiotic ointment; burn ointment; adhesive bandages in small, medium and large sizes; eye wash; a thermometer; aspirin/pain reliever; anti-diarrhea tablets; antacids; laxatives; small scissors; tweezers; petroleum jelly.
— A seven-day supply of medications.
— Vitamins.
— A map of the area.
— Baby supplies.
— Pet supplies.
— Wet wipes.
— A camera (to document storm damage).
— A multipurpose tool, with pliers and a screwdriver.
— Cell phones and chargers.
— Contact information for the family.
— A sleeping bag for each person.
— Extra cash.
— An extra set of house keys.
— An extra set of car keys.
— An emergency ladder to evacuate the second floor.
— Household bleach.
— Paper cups, plates and paper towels.
— A silver foil emergency blanket (a normal blanket will also do).
— Insect repellent.
— Rain gear.
— Tools and supplies for securing your home.
— Plastic sheeting.
— Duct tape.
— Dust masks.
— Activities for children.
— Charcoal and matches, if you have a portable grill. But only use it outside.
American Red Cross tips on what to do after the storm arrives:
— Continue listening to a NOAA Weather Radio or the local news for the latest updates.
— Stay alert for extended rainfall and subsequent flooding even after the hurricane or tropical storm has ended.
— If you evacuated, return home only when officials say it is safe.
— Drive only if necessary and avoid flooded roads and washed out bridges.
— Keep away from loose or dangling power lines and report them immediately to the power company.
— Stay out of any building that has water around it.
— Inspect your home for damage. Take pictures of damage, both of the building and its contents, for insurance purposes.
— Use flashlights in the dark. Do NOT use candles.
— Avoid drinking or preparing food with tap water until you are sure it’s not contaminated.
— Check refrigerated food for spoilage. If in doubt, throw it out.
— Wear protective clothing and be cautious when cleaning up to avoid injury.
— Watch animals closely and keep them under your direct control.
— Use the telephone only for emergency calls.
Sources: American Red Cross, Federal Emergency Management Agency, National Hurricane Center. The Red Cross also provides checklists for specific weather emergencies.


Data-As-A-Service to enable compliance reporting


Big Data tools are clearly powerful and flexible for dealing with unstructured information. However, they are equally applicable, especially when combined with columnar stores such as Parquet, to rapidly changing regulatory requirements that involve reporting and analyzing data across multiple silos of structured information. This is an example of applying multiple big data tools to create data-as-a-service: a data hub that enables very high-performance analytics and reporting by combining HDFS, Spark, Cassandra, Parquet, Talend and Jasper. In this talk, we will discuss the architecture, challenges and opportunities of designing data-as-a-service that enables businesses to respond to changing regulatory and compliance requirements.
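The core pattern the talk describes, joining structured silos into one hub that serves compliance reports, can be sketched in miniature without the full HDFS/Spark stack. The sketch below uses plain Python and invented table names and fields; in the architecture above, the same join would run in Spark over Parquet files on HDFS.

```python
# Minimal sketch of the data-hub idea: join two structured "silos"
# (a loan-origination system and a servicing system) into one
# denormalized view suitable for a compliance report.

loans = [  # silo 1: loan origination (hypothetical schema)
    {"loan_id": 1, "borrower": "Acme", "principal": 100_000},
    {"loan_id": 2, "borrower": "Birch", "principal": 250_000},
]
payments = [  # silo 2: servicing system (hypothetical schema)
    {"loan_id": 1, "amount": 5_000},
    {"loan_id": 1, "amount": 5_000},
    {"loan_id": 2, "amount": 12_500},
]

def build_report(loans, payments):
    """Denormalize the silos into one row per loan with totals."""
    paid = {}
    for p in payments:
        paid[p["loan_id"]] = paid.get(p["loan_id"], 0) + p["amount"]
    return [
        {
            "loan_id": loan["loan_id"],
            "borrower": loan["borrower"],
            "outstanding": loan["principal"] - paid.get(loan["loan_id"], 0),
        }
        for loan in loans
    ]

report = build_report(loans, payments)
print(report)
```

The point of the columnar/Parquet piece of the stack is that a report like this touches only a few columns of each silo, so a column-oriented layout avoids scanning the rest.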

Girish Juneja, Senior Vice President/CTO at Altisource

Girish Juneja is in charge of guiding Altisource’s technology vision and will lead technology teams across Boston, Los Angeles, Seattle and other cities nationally, according to a release.

Girish was formerly general manager of big data products and chief technology officer of data center software at California-based chip maker Intel Corp. (Nasdaq: INTC). He helped lead several acquisitions including the acquisition of McAfee Inc. in 2011, according to a release.

He was also the co-founder of technology company Sarvega Inc., acquired by Intel in 2005, and he holds a master’s degree in computer science and an MBA in finance and strategy from the University of Chicago.


Source: Data-As-A-Service to enable compliance reporting

Why So Many ‘Fake’ Data Scientists?

Have you noticed how many people are suddenly calling themselves data scientists? Your neighbour, that gal you met at a cocktail party — even your accountant has had his business cards changed!

There are so many people out there who suddenly call themselves ‘data scientists’ because it is the latest fad. The Harvard Business Review even called it the sexiest job of the 21st century! But in fact, many who call themselves data scientists lack the full skill set I would expect were I in charge of hiring one.

What I see is many business analysts with no understanding of big data technology or programming languages calling themselves data scientists. Then there are programmers from the IT function who understand programming but lack the business skills, analytics skills or creativity needed to be a true data scientist.

Part of the problem here is simple supply and demand economics: There simply aren’t enough true data scientists out there to fill the need, and so less qualified (or not qualified at all!) candidates make it into the ranks.

Second is that the role of a data scientist is often ill-defined within the field and even within a single company.  People throw the term around to mean everything from a data engineer (the person responsible for creating the software “plumbing” that collects and stores the data) to statisticians who merely crunch the numbers.

A true data scientist is so much more. In my experience, a data scientist is:

  • multidisciplinary. I have seen many companies try to narrow their recruiting by searching only for candidates who have a PhD in mathematics, but in truth, a good data scientist could come from a variety of backgrounds, and may not necessarily have an advanced degree in any of them.
  • business savvy.  If a candidate does not have much business experience, the company must compensate by pairing him or her with someone who does.
  • analytical. A good data scientist must be naturally analytical and have a strong ability to spot patterns.
  • good at visual communications. Anyone can make a chart or graph; it takes someone who understands visual communications to create a representation of data that tells the story the audience needs to hear.
  • versed in computer science. Professionals who are familiar with Hadoop, Java, Python, etc. are in high demand. If your candidate is not an expert in these tools, he or she should be paired with a data engineer who is.
  • creative. Creativity is vital for a data scientist, who needs to be able to look beyond a particular set of numbers, beyond even the company’s data sets to discover answers to questions — and perhaps even pose new questions.
  • able to add significant value to data. If someone only presents the data, he or she is a statistician, not a data scientist. Data scientists offer great additional value over data through insights and analysis.
  • a storyteller. In the end, data is useless without context. It is the data scientist’s job to provide that context, to tell a story with the data that provides value to the company.

If you can find a candidate with all of these traits — or most of them with the ability and desire to grow — then you’ve found someone who can deliver incredible value to your company, your systems, and your field.

But skimp on any of these traits, and you run the risk of hiring an imposter, someone just hoping to ride the data science bubble until it bursts.

To read the original article on Data Science Central, click here.

Originally Posted at: Why So Many ‘Fake’ Data Scientists? by analyticsweekpick

Analyzing Big Data: A Customer-Centric Approach

Big Data

The latest buzz word in business is Big Data. According to Pat Gelsinger, President and COO of EMC, in an article in The Wall Street Journal, Big Data refers to the idea that companies can extract value from collecting, processing and analyzing vast quantities of data. Businesses that can get a better handle on these data will be more likely to outperform their competitors who do not.

When people talk about Big Data, they are typically referring to three characteristics of the data:

  1. Volume: the amount of data being collected is massive
  2. Velocity: the speed at which data are being generated/collected is very fast (consider the streams of tweets)
  3. Variety: the different types of data like structured and unstructured data

Because extremely large data sets cannot be processed using conventional database systems, companies have created new ways of processing (e.g., storing, accessing and analyzing) this big data. Big Data is about housing data on multiple servers for quick access and employing parallel processing of the data (rather than following sequential steps).
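The parallel-processing idea behind these systems can be illustrated with a toy MapReduce-style word count. In a real framework the map phase runs on many servers at once, pairs are shuffled by key across the cluster, and the reduce phase is parallel as well; the plain-Python sketch below runs the phases sequentially just to show the shape of the computation.

```python
from collections import defaultdict

# Toy MapReduce-style word count over a tiny "corpus".
documents = ["big data big value", "data at velocity", "big volume"]

# Map phase: each document independently emits (word, 1) pairs.
# In a distributed system, each server would map its own documents.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group the pairs by key (the word), so all counts
# for one word end up together.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce phase: sum the counts within each group.
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)
```

Because each document is mapped independently and each word is reduced independently, both phases can be split across servers without changing the result, which is exactly the property that lets Big Data systems avoid sequential processing.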

Business Value of Big Data Will Come From Analytics

In a late 2010 study, researchers from MIT Sloan Management Review and IBM asked 3,000 executives, managers and analysts how they obtain value from their massive amounts of data. They found that organizations that used business information and analytics outperformed organizations that did not. Specifically, these researchers found that top-performing businesses were twice as likely as their low-performing counterparts to use analytics to guide future strategies and day-to-day operations.

The MIT/IBM researchers, however, also found that the number one obstacle to the adoption of analytics in their organizations was a lack of understanding of how to use analytics to improve the business (the second and third top obstacles were: Lack of management bandwidth due to competing priorities and a lack of skills internally). In addition, there are simply not enough people with Big Data analysis skills.  McKinsey and Company estimates that the “United States faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data.”

Customer Experience Management and Big Data

The problem of Big Data is one of applying appropriate analytic techniques to business data to extract value. Companies that can apply appropriate statistical models to their data will make better sense of the data and, consequently, get more value from those data. Generally speaking, business data can be divided into four types:

  1. Operational
  2. Financial
  3. Constituency (includes employees, partners)
  4. Customer

Customer Experience Management (CEM) is the process of understanding and managing customers’ interactions with and perceptions about the company/brand. Businesses are already realizing the value of integrating different types of customer data to improve customer loyalty. In my research on best practices in customer feedback programs, I found that the integration of different types of customer data (purchase history, service history, values and satisfaction) is necessary for an effective customer feedback program. Specifically, I found that loyalty-leading companies, compared to their loyalty-lagging counterparts, link customer feedback metrics to a variety of business metrics (operational, financial, constituency) to uncover deeper customer insights. Additionally, to facilitate this integration between attitudinal data and objective business data, loyalty leaders also integrate customer feedback into their daily business processes and customer relationship management systems.

While I have not yet used new technology that supports Big Data (e.g., Hadoop, MapReduce) to process data, I have worked with businesses to merge disparate data sets to conduct what is commonly called Business Linkage Analysis. Business linkage analysis is a problem of data organization. The ultimate goal of linkage analysis is to understand the causes and consequences of customer loyalty (e.g., advocacy, purchasing, retention). I think that identifying the correlates of customer metrics is central to extracting value from Big Data.

Customer-Centric Approach to Analyzing Big Data

I have written three posts on different types of linkage analysis, each presenting a data model (a way to organize the data) to conduct each type of linkage analysis. The key to conducting linkage analysis is to ensure the different data sets are organized (e.g., aggregated) properly to support the conclusions you want to make from your combined data.

  • Linking operational and customer metrics: We are interested in calculating the statistical relationships between customer metrics and operational metrics. Data are aggregated at the transaction level.  Understanding these relationships allows businesses to build/identify customer-centric business metrics, manage customer relationships using objective operational metrics and reward employee behavior that will drive customer satisfaction.
  • Linking financial and customer metrics: We are interested in calculating the statistical relationships between customer metrics and financial business outcomes. Data are aggregated at the customer level. Understanding these relationships allows you to strengthen the business case for your CEM program, identify drivers of real customer behaviors and determine ROI for customer experience improvement solutions.
  • Linking constituency and customer metrics: We are interested in calculating the statistical relationship between customer metrics and employee/partner metrics (e.g., satisfaction, loyalty, training metrics). Data are aggregated at the constituency level. Understanding these relationships allows businesses to understand the impact of the employee and partner experience on the customer experience, improve the health of the customer relationship by improving the health of the employee and partner relationship, and build a customer-centric culture.
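The common mechanic across all three linkage models is the same: align two data sets at the chosen aggregation level, then compute the statistical relationship between the metrics. A minimal Python sketch of the second model, linking financial and customer metrics at the customer level, is below; the customer IDs, metrics, and values are invented for illustration.

```python
from statistics import mean

# Hypothetical per-customer data: satisfaction ratings from surveys
# and annual revenue from the financial system, keyed by customer ID.
satisfaction = {"c1": 9, "c2": 4, "c3": 7, "c4": 8}
revenue = {"c1": 120_000, "c2": 40_000, "c3": 90_000, "c4": 110_000}

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Linkage step: keep only customers present in both data sets,
# then correlate the attitudinal metric with the financial one.
customers = sorted(set(satisfaction) & set(revenue))
r = pearson([satisfaction[c] for c in customers],
            [revenue[c] for c in customers])
print(f"satisfaction-revenue correlation: r = {r:.2f}")
```

For the first and third models the code is the same shape; only the join key changes (transaction ID or constituency group instead of customer ID), which is why the post frames linkage analysis as chiefly a data-organization problem.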


The era of Big Data is upon us. For companies of all sizes, from small and midsize firms to large enterprises, the ability to extract value from big data through smart analytics will be key to business success. In this post, I presented a few analytic approaches in which different types of data sources are merged with customer feedback data. This customer-centric approach allows businesses to analyze their data in a way that helps them understand the reasons for customer dis/loyalty and the impact dis/loyalty has on the growth of the company.

Download Free Paper on Linkage Analysis

Source: Analyzing Big Data: A Customer-Centric Approach by bobehayes

Planning The Future with Wearables, IOT, and Big Data

According to Dataconomy, this year’s BARC study shows that 83% of companies have already invested in Big Data or are planning future engagement, a 20% increase on Gartner’s 2013 calculations. The Internet of Things has changed our data collection processes from computer-bound functions to real-world operations, with newly connected everyday objects providing in-depth information on individual habits, preferences, and personal stats. This data allows companies to create and adapt their products and services for enhanced user experiences and personalized services.


With Fitbit’s second-quarter revenue of $400 million tripling expectations, and reported sales of 4.5 million devices in that quarter alone, it’s obvious that health-conscious individuals are eager to make use of the fitness benefits Wearables offer. However, Wearables are not only encouraging users to be more active; they are also being used to simplify and transform patient-centric care. Able to monitor heart rate and vital signs as well as activity levels, Wearables can alert users, doctors, emergency responders or family members to signs of distress. The heart rates of those with heart disease can be carefully monitored, alerts can discourage users from harmful behaviors and encourage positive ones, surgeons can use smart glasses to monitor vital signs during operations, and the vast quantities of data received can be used for epidemiological studies. Healthcare providers have many IoT opportunities available to them, and those who make use of them correctly will improve patient welfare and treatment as well as ensure their own success.


Insurers also have a wealth of opportunities available to them should they properly utilize Wearables and the Internet of Things. By using data acquired from Wearables for more accurate underwriting, products can be tailored to the individual. Information such as location, level of exercise, driving record, medications used, work history, credit ratings, hobbies and interests, and spending habits can be acquired through data amalgamation, and instead of relying on client declarations, companies have access to more accurate and honest data.



Not only useful, Wearables and the Internet of Things have a strong base in amusement. Though these devices are accumulating enormous quantities of practical data, their primary purpose for users is often recreational. Macworld suggests the Apple Watch is not here to entertain, but the array of applications available would suggest otherwise. CIO looks at some weird and wacky Wearables that would suit anyone’s Christmas list, including Ping, a social networking garment; Motorola digital tattoos; tweeting bras; Peekiboo for seeing the world through your child’s eyes; and smart pajamas that let you know when your kids are ready for bed. Most of us don’t need any of these things, but we want them. And they all collect massive quantities of data by the microsecond.


But of course, all this data flying around comes with some serious risks, not least of all invasion of privacy. As the years have gone by, we’ve become less and less concerned about how much data we’re offering up, never considering its security or the implications of providing it. Questions around whether data recorded from Wearables is legally governed ‘personal data’ have arisen, and the collection and use of this data is likely to face some serious legal challenges in the future. It’s not likely Wearables are going to disappear, but shrewd developers are creating safer, more secure products to best navigate these waters.

Article originally appeared HERE.

Source: Planning The Future with Wearables, IOT, and Big Data by analyticsweekpick