Nov 23, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Cover image: Complex data (Source)

[ AnalyticsWeek BYTES]

>> Nov 23, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> December 5, 2016 Health and Biotech analytics news roundup by pstein

>> What Felix Baumgartner Space Jump Could Teach Entrepreneurs by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>> Hardware Trends 2017: Complete Slides, And Some Analysis – Forbes Under IOT

>> Qualcomm Launches 48-core Centriq for $1995: Arm Servers for … – AnandTech Under Cloud

>> VMware working with Amazon on Hybrid Cloud Product – Financialbuzz.com Under Hybrid Cloud

More NEWS ? Click Here

[ FEATURED COURSE]

Process Mining: Data science in Action


Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

Introduction to Graph Theory (Dover Books on Mathematics)


A stimulating excursion into pure mathematics aimed at “the mathematically traumatized,” but great fun for mathematical hobbyists and serious mathematicians as well. Requiring only high school algebra as mathematical bac… more

[ TIPS & TRICKS OF THE WEEK]

Data Have Meaning
We live in a Big Data world in which everything is quantified. While the emphasis of Big Data has been focused on distinguishing the three characteristics of data (the infamous three Vs), we need to be cognizant of the fact that data have meaning. That is, the numbers in your data represent something of interest, an outcome that is important to your business. The meaning of those numbers is about the veracity of your data.

[ DATA SCIENCE Q&A]

Q:What is random forest? Why is it good?
A: Random forest? (Intuition):
– Underlying principle: several weak learners combined provide a strong learner
– Builds several decision trees on bootstrapped training samples of data
– On each tree, each time a split is considered, a random sample of m predictors is chosen as split candidates, out of all p predictors
– Rule of thumb: at each split, m ≈ √p predictors are considered
– Predictions: made by majority vote across the trees

Why is it good?
– Very good performance (the random feature selection decorrelates the trees)
– Can model non-linear class boundaries
– Generalization error for free: no cross-validation needed; the out-of-bag error gives an unbiased estimate of the generalization error as the trees are built
– Generates variable importance

Source
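For readers who want to try this out, here is a minimal scikit-learn sketch of the ideas above (bootstrapped trees, roughly √p features considered at each split, out-of-bag error in place of cross-validation, and variable importance). The dataset and parameter values are illustrative assumptions, not part of the original answer.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Illustrative dataset; any tabular classification problem works the same way
X, y = load_breast_cancer(return_X_y=True)

clf = RandomForestClassifier(
    n_estimators=500,      # number of trees built on bootstrapped samples
    max_features="sqrt",   # m ≈ √p predictors considered at each split
    oob_score=True,        # out-of-bag estimate of generalization error "for free"
    random_state=0,
)
clf.fit(X, y)

print("OOB accuracy:", round(clf.oob_score_, 3))                    # no cross-validation needed
print("Most important features:", clf.feature_importances_.argsort()[::-1][:5])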

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @ScottZoldi, @FICO

Subscribe to YouTube

[ QUOTE OF THE WEEK]

I keep saying that the sexy job in the next 10 years will be statisticians. And I’m not kidding. – Hal Varian

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Joe DeCosmo, @Enova

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data, not including using big data to reduce fraud and errors and boost the collection of tax revenues.

Sourced from: Analytics.CLUB #WEB Newsletter

IBM Invests to Help Open-Source Big Data Software — and Itself

The IBM “endorsement effect” has often shaped the computer industry over the years. In 1981, when IBM entered the personal computer business, the company decisively pushed an upstart technology into the mainstream.

In 2000, the open-source operating system Linux was viewed askance in many corporations as an oddball creation and even legally risky to use, since the open-source ethos prefers sharing ideas rather than owning them. But IBM endorsed Linux and poured money and people into accelerating the adoption of the open-source operating system.

On Monday, IBM is to announce a broadly similar move in big data software. The company is placing a large investment — contributing software developers, technology and education programs — behind an open-source project for real-time data analysis, called Apache Spark.

The commitment, according to Robert Picciano, senior vice president for IBM’s data analytics business, will amount to “hundreds of millions of dollars” a year.

Photo courtesy of Pingdom via Flickr

In the big data software market, much of the attention and investment so far has been focused on Apache Hadoop and the companies distributing that open-source software, including Cloudera, Hortonworks and MapR. Hadoop, put simply, is the software that makes it possible to handle and analyze vast volumes of all kinds of data. The technology came out of the pure Internet companies like Google and Yahoo, and is increasingly being used by mainstream companies, which want to do similar big data analysis in their businesses.

But if Hadoop opens the door to probing vast volumes of data, Spark promises speed. Real-time processing is essential for many applications, from analyzing sensor data streaming from machines to sales transactions on online marketplaces. The Spark technology was developed at the Algorithms, Machines and People Lab at the University of California, Berkeley. A group from the Berkeley lab founded a company two years ago, Databricks, which offers Spark software as a cloud service.
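For readers unfamiliar with Spark's appeal, a rough sketch of the programming model follows. This is illustrative only, not IBM's or Databricks' code; the file name and columns are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sensor-demo").getOrCreate()

# Hypothetical CSV of machine sensor readings: device_id, timestamp, temperature
readings = spark.read.csv("sensor_readings.csv", header=True, inferSchema=True)

# Distributed, in-memory aggregation -- the kind of fast, iterative analysis
# the article credits Spark with
summary = readings.groupBy("device_id").agg(
    F.avg("temperature").alias("avg_temp"),
    F.count("*").alias("n_readings"),
)
summary.show()

spark.stop()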

Spark, Mr. Picciano said, is crucial technology that will make it possible to “really deliver on the promise of big data.” That promise, he said, is to quickly gain insights from data to save time and costs, and to spot opportunities in fields like sales and new product development.

IBM said it will put more than 3,500 of its developers and researchers to work on Spark-related projects. It will contribute machine-learning technology to the open-source project, and embed Spark in IBM’s data analysis and commerce software. IBM will also offer Spark as a service on its programming platform for cloud software development, Bluemix. The company will open a Spark technology center in San Francisco to pursue Spark-based innovations.

And IBM plans to partner with academic and private education organizations including UC Berkeley’s AMPLab, DataCamp, Galvanize and Big Data University to teach Spark to as many as 1 million data engineers and data scientists.

Ion Stoica, the chief executive of Databricks, who is a Berkeley computer scientist on leave from the university, called the IBM move “a great validation for Spark.” He had talked to IBM people in recent months and knew they planned to back Spark, but, he added, “the magnitude is impressive.”

With its Spark initiative, analysts said, IBM wants to lend a hand to an open-source project, woo developers and strengthen its position in the fast-evolving market for big data software.

By aligning itself with a popular open-source project, IBM, they said, hopes to attract more software engineers to use its big data software tools, too. “It’s first and foremost a play for the minds — and hearts — of developers,” said Dan Vesset, an analyst at IDC.

IBM is investing in its own future as much as it is contributing to Spark. IBM needs a technology ecosystem, where it is a player and has influence, even if it does not immediately profit from it. IBM mainly makes its living selling applications, often tailored to individual companies, which address challenges in their business like marketing, customer service, supply-chain management and developing new products and services.

“IBM makes its money higher up, building solutions for customers,” said Mike Gualtieri, an analyst for Forrester Research. “That’s ultimately why this makes sense for IBM.”

To read the original article on The New York Times, click here.

Source: IBM Invests to Help Open-Source Big Data Software — and Itself by analyticsweekpick

The Value of Enterprise Feedback Management Vendors

In an excellent post, Bob Thompson reviews the VoC space in his blog on Voice of the Customer (Voc) Command Centers, including a discussion of 1) the six feedback dimensions, 2) how the VoC command center needs to include technology to a) capture feedback, b) analyze feedback and c) manage top priorities to resolution, and 3) the consolidation of the Enterprise Feedback Management (EFM) industry, including mention of the Verint acquisition of Vovici.

Enterprise Feedback Management (EFM) is the process of collecting, managing, analyzing and disseminating feedback from different sources (e.g., customers, employees, partners). EFM vendors help companies facilitate their customer experience management (CEM) efforts, hoping to improve the customer experience and increase customer loyalty. The value that the Verint-Vovici solution provides its customers is stated in their press release:

“As the market’s most comprehensive VoC Analytics platform available, the Verint-Vovici solution will enable organizations to implement a single-vendor solution for collecting, analyzing and acting on customer insights.”

Advice to Verint-Vovici: VoC Programs are about People, Processes

A VoC program involves more than technology that helps companies capture, analyze and manage feedback. A VoC program contains many components, each impacting the program’s effectiveness. To improve customer loyalty, companies must consider how they structure their VoC program across these components. A VoC program has six major areas or components: Strategy, Governance, Business Process Integration, Method, Reporting, and Applied Research. Figure 1 below represents the components of customer feedback programs.

Components of a Customer Feedback Program
Figure 1. Elements of a Customer Feedback Program

The success of VoC programs depends on proper adoption of certain business practices in each of these six areas. While each of the six areas has VoC best practice standards, the major success drivers are related to strategy/governance, business process integration, and applied research. Companies who adopt the following practices experience higher levels of customer loyalty compared to companies who do not adopt these practices:

  • Customer feedback included in the company’s strategic vision, mission and goals.
  • Customer feedback results used in executives’ objectives and incentive compensation.
  • Customer feedback results included in the company/executive dashboards.
  • Customer feedback program integrated into business processes and technology (e.g., CRM system).
  • All areas of the customer feedback program (e.g., process and goals) communicated regularly to the entire company.
  • Customer feedback results shared throughout the company.
  • Statistical relationships established between customer feedback data and operational metrics (e.g., turnaround time, hold time).
  • Applied research using customer feedback data regularly conducted.
  • Statistical relationships established between customer feedback data and other constituency metrics (e.g., employee satisfaction or partner satisfaction metrics).

Reanalyzing that same data (note: the following has not yet been published), I examined respondents’ answers about their use of third-party survey vendors and their satisfaction with these vendors. Surprisingly, I found that companies who used third-party survey vendors did not have more loyal customers (Mean = 68th percentile in industry on customer loyalty) than companies who did not use third-party vendors (Mean = 65th percentile). Furthermore, of those companies who used third-party vendors, only 60% of the companies were satisfied (20% very satisfied) with them. The use of EFM vendors does not guarantee improvements in the customer experience and customer loyalty.

While technology will continue to play a role in improving VoC programs by capturing, aggregating and disseminating customer feedback, it appears that the success of a VoC program is more about people and processes and less about technology. To be successful, the Verint-Vovici solution (for that matter, any EFM solution) needs to be cognizant of all the components of its customers’ VoC programs and must consider how its technology will improve the people and processes (building a customer-centric culture).

Assessing your VoC Program

The way your VoC program is structured impacts its success. If you are a VoC professional who manages customer feedback programs for your company, you can take the Customer Feedback Program Diagnostic (CFPD) to determine if your VoC program adopts best practices. This brief assessment process can help your company:

  1. identify your customer feedback program’s strengths and weaknesses
  2. understand how to improve your customer feedback program
  3. facilitate your customer experience improvement efforts
  4. increase customer loyalty
  5. accelerate business growth

Upon completion of this 10-minute assessment, you will receive immediate feedback on your company’s VoC program. Additionally, all respondents will receive a free summary report of the research findings. Take the CFPD now.

Source: The Value of Enterprise Feedback Management Vendors

Nov 16, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Cover image: Statistics (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Put big data to work with Cortana Analytics by analyticsweekpick

>> Respondents Needed for a Study about the Use of Net Scores and Mean Scores in Customer Experience Management by bobehayes

>> Understanding Customer Buying Journey with Big Data by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>> What will you regulate? Former White House chief data scientist questions Tesla’s Elon Musk on his AI fear – Economic Times Under Data Scientist

>> How Big Data Analytics are Empowering Customer’s Acquisition in Native Advertising – Customer Think Under Big Data Analytics

>> Maine businesses respond to global cyber attack – WCSH-TV Under Big Data Security

More NEWS ? Click Here

[ FEATURED COURSE]

Deep Learning Prerequisites: The Numpy Stack in Python


The Numpy, Scipy, Pandas, and Matplotlib stack: prep for deep learning, machine learning, and artificial intelligence… more

[ FEATURED READ]

The Black Swan: The Impact of the Highly Improbable


A black swan is an event, positive or negative, that is deemed improbable yet causes massive consequences. In this groundbreaking and prophetic book, Taleb shows in a playful way that Black Swan events explain almost eve… more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up to create awareness within the organization. An aware organization goes a long way in helping secure quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q:What is an outlier? Explain how you might screen for outliers and what would you do if you found them in your dataset. Also, explain what an inlier is and how you might screen for them and what would you do if you found them in your dataset
A: Outliers:
– An observation point that is distant from other observations
– Can occur by chance in any distribution
– Often, they indicate measurement error or a heavy-tailed distribution
– Measurement error: discard them or use robust statistics
– Heavy-tailed distribution: high skewness, can’t use tools assuming a normal distribution
– Three-sigma rule (normally distributed data): about 1 in 22 observations will differ by more than twice the standard deviation from the mean
– Three-sigma rule: about 1 in 370 observations will differ by more than three times the standard deviation from the mean

Three-sigma rule example: in a sample of 1000 observations, the presence of up to 5 observations deviating from the mean by more than three times the standard deviation is within the range of what can be expected, being less than twice the expected number and hence within 1 standard deviation of the expected number (Poisson distribution).

If the nature of the distribution is known a priori, it is possible to see whether the number of outliers deviates significantly from what can be expected. For a given cutoff (samples fall beyond the cutoff with probability p), the number of outliers can be approximated by a Poisson distribution with lambda = p·n. Example: for a normal distribution with a cutoff 3 standard deviations from the mean, p ≈ 0.3%, so in a sample of 1000 observations the number of points whose deviations exceed 3 sigma can be approximated by a Poisson with lambda ≈ 3.
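As a quick numerical check of that example (a scipy sketch, not part of the original answer):

from scipy import stats

p = 2 * (1 - stats.norm.cdf(3))    # P(|z| > 3) ≈ 0.0027 for a normal distribution
lam = 1000 * p                     # expected 3-sigma outliers in 1000 points ≈ 2.7 (the text rounds p to 0.3%, lambda ≈ 3)
print(round(lam, 2))                              # ~2.7
print(round(1 - stats.poisson.cdf(4, lam), 2))    # P(5 or more such points) ≈ 0.14, so 5 is unsurprising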

Identifying outliers:
– No rigid mathematical method
– Subjective exercise: be careful
– Boxplots
– Q-Q plots (sample quantiles vs. theoretical quantiles)

Handling outliers:
– Depends on the cause
– Retention: when the underlying model is confidently known
– Regression problems: only exclude points which exhibit a large degree of influence on the estimated coefficients (Cook’s distance)

Inlier:
– Observation lying within the general distribution of other observed values
– Doesn’t perturb the results but is non-conforming and unusual
– Simple example: observation recorded in the wrong unit (°F instead of °C)

Identifying inliers:
– Mahalanobis distance
– Used to calculate the distance between two random vectors
– Difference with Euclidean distance: accounts for correlations
– Discard them

Source
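A minimal Python sketch of these screening ideas follows (simulated data; the 3-sigma and chi-square cutoffs are conventional choices, not prescribed by the answer above):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=1000)  # two correlated features

# Univariate screen: flag points more than 3 standard deviations from the mean
z = np.abs(stats.zscore(X[:, 0]))
print("3-sigma flags in column 0:", int((z > 3).sum()))

# Multivariate screen: Mahalanobis distance accounts for the correlation between columns
mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", X - mu, cov_inv, X - mu)  # squared Mahalanobis distances
cutoff = stats.chi2.ppf(0.999, df=2)                    # chi-square cutoff for 2 features
print("Mahalanobis flags:", int((d2 > cutoff).sum()))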

[ VIDEO OF THE WEEK]

@JohnTLangton from @Wolters_Kluwer discussed his #AI Lead Startup Journey #FutureOfData #Podcast

Subscribe to YouTube

[ QUOTE OF THE WEEK]

The data fabric is the next middleware. – Todd Papaioannou

[ PODCAST OF THE WEEK]

Jeff Palmucci @TripAdvisor discusses managing a #MachineLearning #AI Team

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide.

Sourced from: Analytics.CLUB #WEB Newsletter

8 ways IBM Watson Analytics is transforming business


IBM says Watson represents a new era of computing — a step forward to cognitive computing, where apps and systems interact with humans via natural language and help us augment our own understanding of the world with big data insights.

The Watson Analytics offering is intended to provide the benefits of advanced analytics without the complexity. The data discovery service, available via the cloud, guides data exploration, automates predictive analytics and enables dashboard and infographic creation.

Here are eight examples of organizations using Watson Analytics to transform their operations.

Build Predictive Model – Paschall Truck Lines (PTL)

Analyze injury report to improve safety – Mears Group

Identify new business opportunity – Minter Ellison

Save costs and optimize travel time – Caliber Patient Care

Identify Customer Behavior Trends – Kristalytics

Gain insights on concession stand performance – Legends

Teach students how to leverage social media sentiments – Iowa State University

Unveil insights and build advanced visualization – University of Memphis

Read the complete article at http://www.cio.com/article/3026691/analytics/8-ways-ibm-watson-analytics-is-transforming-business.html

Source: 8 ways IBM Watson Analytics is transforming business by analyticsweekpick

Big Data Analytics, Supercomputing Seed Growth in Plant Research

Over the millennia our ability to utilize plants in many different ways has allowed us to flourish as a species. Most importantly, they turn our waste carbon dioxide into oxygen.

But we have also used plants to provide shelter, to publish and transmit information on paper and as a food source. In fact, developing new ways to utilize plants has even led to population explosions throughout time, such as when we first developed granaries to store grain thousands of years ago. In these modern times of climate change, global warming, ever-increasing populations and fossil fuels, plants have never been more important.

We need to ensure that there is enough plant biomass available to satisfy the world’s needs as well as ensuring there is enough to preserve habitats, act as a carbon sink and supply us with enough oxygen. To do this we must find ways to increase biomass yields, increase land available for plant production, reduce the risk to crop yields from pests and disease, limit wasted biomass and optimize plant properties to better suit specific applications such as increasing nutritional composition for human health.

IBM is acutely aware of the importance of plants and is developing and utilizing a number of approaches in agriculture. Precision agriculture is one approach combining real-time data such as weather and soil quality with predictive analytics to allow the best decisions to be made when planting, fertilizing and harvesting crops. Another approach, in collaboration with Mars, has used bioinformatics to sequence the genome of cocoa.

Photo courtesy of Building a Smarter Planet

Leveraging the technologies developed from other life sciences studies and developing specialized algorithms, IBM was able to identify genes that produced healthier and better tasting cocoa plants. Both of these approaches utilize IBM’s expertise in Big Data and Analytics.

Here in Australia I am part of a team at the IBM Research Collaboratory for Life Sciences – Melbourne working in collaboration with researchers in the ARC Centre of Excellence in Plant Cell Walls. We are using computational biology to examine the structure of the wall that surrounds all plant cells. We are investigating how the plant cell wall is produced, its structure and organisation and how it gets deconstructed.

In work just published in the journal Plant Physiology we used molecular dynamics techniques and high performance computing to model the major component of plant cell walls: cellulose. Our results strongly suggest that the fibres of cellulose are much smaller than previously believed. We are now investigating how these cellulose fibres interact with each other and other wall components to form the plant cell wall. Through these studies we intend to produce more accurate models of the plant cell wall and make predictions about how changes will affect the plant’s physical properties.

The possible application areas are vast. In the area of food security, we could optimize the properties of plant cell walls to make plants that are more drought/salt tolerant or more resistant to disease pathogens. In the area of human health, we hope to increase the nutritional composition of plant cell walls. In the paper and textiles industry, we could increase the physical strength of the plant cell wall making plants better for pulping or fibre production. In the area of biofuels, our studies should help to limit the effect of recalcitrance leading to more efficient ethanol extraction.

Supercomputers, such as the IBM Blue Gene/Q that we used at the University of Melbourne’s Victorian Life Sciences Computation Initiative (VLSCI), are essential in these types of projects, where we examine the dynamics of biological systems at the nanoscale. Such work requires simulating the motion of each atom, and to do this we must calculate how all these atoms interact.

This must be done for many millions of time steps – a process that would take years on a standard desktop computer but days on a supercomputer. Supercomputing accelerates science and that is what the University of Melbourne and IBM Research set out to do with the formation of the VLSCI and the Collaboratory. Our work with the ARC Centre of Excellence in Plant Cell Walls is an excellent example of the success of these two organisations and the importance of plant research to the world.

Over the past few decades the major focus of life sciences research has been on animals and humans. In many ways, today we are at a similar position with plant research. We predict this research to grow dramatically in the coming years and it’s exciting to be at the forefront of the field.

To read the original article on Building a Smarter Planet, click here.

Originally Posted at: Big Data Analytics, Supercomputing Seed Growth in Plant Research by analyticsweekpick

Nov 09, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Cover image: Data Storage (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Genomics England exploits big data analytics to personalise cancer treatment by analyticsweekpick

>> Social Media and the Future of Customer Support [Infographics] by v1shal

>> Business Linkage Analysis: An Overview by bobehayes

Wanna write? Click Here

[ NEWS BYTES]

>> Trick or treat! Halloween spending statistics – Seeking Alpha Under Statistics

>> People are paying more attention to data security in wake of cyber attack – Mississippi News Now Under Data Security

>> Guidelines for Understanding the Next-Generation of Hadoop Technologies – Database Trends and Applications Under Hadoop

More NEWS ? Click Here

[ FEATURED COURSE]

CPSC 540 Machine Learning


Machine learning (ML) is one of the fastest growing areas of science. It is largely responsible for the rise of giant data companies such as Google, and it has been central to the development of lucrative products, such … more

[ FEATURED READ]

Rise of the Robots: Technology and the Threat of a Jobless Future


What are the jobs of the future? How many will there be? And who will have them? As technology continues to accelerate and machines begin taking care of themselves, fewer people will be necessary. Artificial intelligence… more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up to create awareness within the organization. An aware organization goes a long way in helping secure quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q:Is mean imputation of missing data acceptable practice? Why or why not?
A: * Bad practice in general
* If just estimating means: mean imputation preserves the mean of the observed data
* Leads to an underestimate of the standard deviation
* Distorts relationships between variables by “pulling” estimates of the correlation toward zero

Source
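A short simulation makes these points concrete (illustrative numbers, not from the original answer): mean imputation leaves the mean roughly intact but understates the standard deviation and pulls the correlation toward zero.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 0.8 * x + rng.normal(scale=0.6, size=2000)

x_missing = x.copy()
x_missing[rng.random(2000) < 0.4] = np.nan                           # 40% of x missing at random
x_imputed = np.where(np.isnan(x_missing), np.nanmean(x_missing), x_missing)

print("mean :", round(x.mean(), 2), "->", round(x_imputed.mean(), 2))   # roughly preserved
print("std  :", round(x.std(), 2), "->", round(x_imputed.std(), 2))     # underestimated
print("corr :", round(np.corrcoef(x, y)[0, 1], 2), "->",
      round(np.corrcoef(x_imputed, y)[0, 1], 2))                        # pulled toward zero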

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Joe DeCosmo, @Enova

Subscribe to YouTube

[ QUOTE OF THE WEEK]

Information is the oil of the 21st century, and analytics is the combustion engine. – Peter Sondergaard

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Dr. Nipa Basu, @DnBUS

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

In 2015, a staggering 1 trillion photos will be taken and billions of them will be shared online. By 2017, nearly 80% of photos will be taken on smart phones.

Sourced from: Analytics.CLUB #WEB Newsletter

Inovalon’s Next Generation Big Data Platform Solution Achieves NCQA Measure Certification

Big Data Processing and Advanced Analytics Deliver Unprecedented Speed and Real-Time Insights for Improved Clinical and Quality Outcomes.

BOWIE, Md., March 19, 2015 (GLOBE NEWSWIRE) — Inovalon (Nasdaq:INOV), a leading technology company providing advanced cloud-based data analytics and data-driven intervention platforms to the healthcare industry, today announced that on March 16, 2015, the next generation of big data platform for Healthcare Effectiveness Data and Information Set (HEDIS®) quality measurement and reporting, Inovalon’s Quality Spectrum Insight (QSI®-XL) solution, received National Committee for Quality Assurance (NCQA) Measure Certification for HEDIS® 2015.

The QSI®-XL platform, the core analytics engine within Inovalon’s HEDIS Advantage™ solution, leverages big data processing with the industry’s most robust analytics platform available, further enhancing the industry-leading solution utilized by more than two-thirds of the nation’s outsourced quality measurement initiatives. Inovalon’s advanced technology architecture delivers unprecedented processing times that enable accelerated decision-making and speed-to-impact to inform quality analysis and improvement programs – achieving dramatic improvements in data aggregation simplicity and processing speeds more than 15 times faster than the company’s previously available industry-leading technologies.

“Inovalon’s advanced cloud-based technologies continue to transform the healthcare industry,” said Joseph Rostock, chief technology officer at Inovalon. “As the industry shifts to quality- and value-based outcome models, the ability to efficiently and rapidly integrate and analyze large sets of data has become increasingly critical for healthcare organizations to be successful. Inovalon drives superior value for our clients by providing the timely, granular insights that health plans, Accountable Care Organizations, provider organizations, and employer groups need to improve quality initiatives, while solving the challenges of volume and disparate sources of healthcare and patient information.”

Prior to the rise in quality-driven care, the aggregation and analysis of quality measurement data was predominantly undertaken on an annual basis – submitted to oversight bodies and to state and federal regulatory agencies to fulfill licensing and accreditation requirements. With the dramatic rise in performance-based incentives across the U.S. healthcare landscape, including programs such as the Five-Star Quality Rating System for Medicare Advantage plans, state Medicaid quality programs such as New York’s Quality Assurance Reporting Requirements (QARR) program, the Affordable Care Act’s Quality Rating System (QRS) program, the Medicare Shared Savings Program for Accountable Care Organizations (ACOs), and a host of other programs, the need for advanced sophistication in quality data aggregation and analysis has risen significantly. These programs now drive billions of dollars in performance incentives. As a result, adjacent constituents of the incentivized market are becoming similarly subject to the same market forces through shared-risk and shared-savings agreements. The ultimate impact is to broaden the influence of quality- and value-based incentives beyond the health plan marketplace to include hospitals, providers, pharma/life sciences, and even device manufacturers.

Further emphasizing the speed and significance of the market’s transition to quality- and value-based incentives within healthcare, on January 26, 2015, the U.S. Health and Human Services (HHS) announced its intention to transition 50 percent of its payments (or approximately $215 billion) to quality-based payments in Medicare by 2018, with even higher percentage goals in the years shortly thereafter.

“Providing the industry’s most robust and widely used quality analytics platform positions Inovalon well to serve its clients and the market in the rapidly expanding need for quality data aggregation and analytics,” said Dan Rizzo, Inovalon’s chief innovation officer. “We believe that the ability for Inovalon to receive massive amounts of data easily within its data lake repository, and process this data in near real-time, providing clients not only with up-to-date insights into their patients, provider and population quality status, but also to serve as a critical element of their quality improvement program, is truly unique in the marketplace.”

Built upon Inovalon’s QSI® engine, Inovalon’s HEDIS Advantage™ platform performs the analytics for more than 1200 Interactive Data Submission System (IDSS) submissions annually for leading health plans across the U.S. to the NCQA, the Centers for Medicare and Medicaid Services (CMS), the Utilization Review Accreditation Commission (URAC), and other state regulatory agencies. The majority of HEDIS Advantage™ users access these capabilities via Inovalon’s cloud-based platform, with the balance of users leveraging installed software versions. As of the date of this release, QSI®-XL is already in place and operating with healthcare organization’s NCQA HEDIS® reporting and quality improvement programs. Going forward, both QSI® and QSI®-XL solutions will be offered by Inovalon with clients selecting their desired platform based on dataset size and compute speed needs.

In addition to supporting client needs in quality data aggregation and analytics, the QSI®-XL engine will replace QSCL®, Inovalon’s prior-generation big data quality measurement platform, as a critical component of Inovalon’s predictive analytics platform, Star Advantage®, which identifies future gaps in quality and optimal venue, timing, modality, and content for resolving such identified gaps in quality through the utilization of Inovalon’s data-driven intervention platform. With Inovalon’s arsenal of cloud-based quality analytics tools, clients have the ability to uniquely understand and improve their quality score in near-real time – a capability that has demonstrated improved quality rates by 300 percent compared to populations unaided by such a technology.

The QSI®-XL engine additionally enables the application of custom analytical modules, created through Inovalon’s proprietary Quality Spectrum Flowchart Designer (QSFD®), across massive datasets at dramatically accelerated speeds. QSFD® allows non-technologists to design proprietary algorithms through easy-to-use “drag and drop” graphical interfaces, compiling logic routines for application within the cloud minutes after creation. These capabilities allow a diverse array of data users, from internal product designers to device manufacturers and pharma/life science researchers, to design unique analyses and apply them against Inovalon’s large-scale datasets in seconds and minutes.

In addition to the NCQA Measure Certification received by Inovalon’s QSI®-XL big data quality analytics engine on March 16th, 2015, Inovalon’s Quality Spectrum Insight (QSI®) software engine (version 18) has also received NCQA Measure Certification for HEDIS® 2015. Inovalon’s QSI® software engine has now received NCQA’s Measure Certification for HEDIS® every year, since 2001. Going forward, both QSI® and QSI®-XL solutions will be offered to Inovalon’s clients.

NCQA is an independent, not-for-profit organization dedicated to improving health care quality. NCQA accredits and certifies a wide range of health care organizations and manages the evolution of HEDIS®, the performance measurement tool used by more than 90 percent of the nation’s health plans. HEDIS® is a registered trademark of the NCQA.

About Inovalon

Inovalon is a leading technology company that combines advanced cloud-based data analytics and data-driven intervention platforms to achieve meaningful insight and impact in clinical and quality outcomes, utilization, and financial performance across the healthcare landscape. Inovalon’s unique achievement of value is delivered through the effective progression of Turning Data into Insight, and Insight into Action®. Large proprietary datasets, advanced integration technologies, sophisticated predictive analytics, data-driven intervention platforms, and deep subject matter expertise deliver a seamless, end-to-end capability that brings the benefits of big data and large-scale analytics to the point of care. Driven by data, Inovalon uniquely identifies gaps in care, quality, data integrity, and financial performance – while bringing to bear the unique capabilities to resolve them. Providing technology that supports hundreds of healthcare organizations in 98.2% of U.S. counties and Puerto Rico, Inovalon’s cloud-based analytical and data-driven intervention platforms are informed by data pertaining to more than 754,000 physicians, 248,000 clinical facilities, and more than 120 million Americans, providing a powerful solution suite that drives high-value impact, improving quality and economics for health plans, ACOs, hospitals, physicians, consumers and pharma/life-sciences researchers. For more information, visit www.inovalon.com.

CONTACT: Inovalon
         Kim E. Collins
         4321 Collington Road
         Bowie, Maryland 20716
         Phone: 301-809-4000
         kimecollins@inovalon.com
         
         Greenough, on behalf of Inovalon
         Andrea LePain
         Phone: 617-275-6526
         alepain@greenoughcom.com

Originally posted via "Inovalon's Next Generation Big Data Platform Solution Achieves NCQA Measure Certification"

Source: Inovalon’s Next Generation Big Data Platform Solution Achieves NCQA Measure Certification by analyticsweekpick

Nov 02, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Cover image: Data Mining (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> What is Customer Loyalty? Part 1 by bobehayes

>> Deriving “Inherently Intelligent” Information from Artificial Intelligence by jelaniharper

>> Lessons From Apple Retail Stores, Gold for Others by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>> Cloud formation looks like giant seagulls over Newquay – Cornwall Live Under Cloud

>> IBM’s Watson Data Platform aims to become data science operating system – ZDNet Under Data Science

>> Bayer Leverkusen vs. Schalke: Line-ups and statistics – bundesliga … – Bundesliga – official website Under Statistics

More NEWS ? Click Here

[ FEATURED COURSE]

Data Mining


Data that has relevance for managerial decisions is accumulating at an incredible rate due to a host of technological advances. Electronic data capture has become inexpensive and ubiquitous as a by-product of innovations… more

[ FEATURED READ]

Hypothesis Testing: A Visual Introduction To Statistical Significance


Statistical significance is a way of determining if an outcome occurred by random chance, or did something cause that outcome to be different than the expected baseline. Statistical significance calculations find their … more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today a data-driven leader, a data scientist or a data-driven expert is constantly put to the test, asked to help his or her team solve problems using skill and expertise. Believe it or not, a part of that decision tree is derived from intuition, which adds a bias to our judgment and can taint the suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and can find ourselves caught in biases that impair judgment. So it is important that we keep the intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q:Is it beneficial to perform dimensionality reduction before fitting an SVM? Why or why not?
A: * Yes, when the number of features is large compared to the number of observations (e.g., a document-term matrix)
* The SVM will often perform better in this reduced space

Source
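A minimal scikit-learn sketch of the document-term case described above follows (the dataset, component count, and classifier are illustrative assumptions; fetch_20newsgroups downloads data on first use):

from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Text classification: far more features (terms) than observations (documents)
data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])

pipe = make_pipeline(
    TfidfVectorizer(),               # documents -> large, sparse term matrix
    TruncatedSVD(n_components=100),  # project onto 100 latent dimensions
    LinearSVC(),                     # SVM fit in the reduced space
)
print(cross_val_score(pipe, data.data, data.target, cv=3).mean())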

[ VIDEO OF THE WEEK]

@SidProbstein / @AIFoundry on Leading #DataDriven Technology Transformation #FutureOfData #Podcast

Subscribe to YouTube

[ QUOTE OF THE WEEK]

Big Data is not the new oil. – Jer Thorp

[ PODCAST OF THE WEEK]

@AnalyticsWeek #FutureOfData with Robin Thottungal(@rathottungal), Chief Data Scientist at @EPA

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

30 billion pieces of content are shared on Facebook every month.

Sourced from: Analytics.CLUB #WEB Newsletter

For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights

Technology revolutions come in measured, sometimes foot-dragging steps. The lab science and marketing enthusiasm tend to underestimate the bottlenecks to progress that must be overcome with hard work and practical engineering.

The field known as “big data” offers a contemporary case study. The catchphrase stands for the modern abundance of digital data from many sources — the web, sensors, smartphones and corporate databases — that can be mined with clever software for discoveries and insights. Its promise is smarter, data-driven decision-making in every field. That is why data scientist is the economy’s hot new job.

Yet far too much handcrafted work — what data scientists call “data wrangling,” “data munging” and “data janitor work” — is still required. Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.

“Data wrangling is a huge — and surprisingly so — part of the job,” said Monica Rogati, vice president for data science at Jawbone, whose sensor-filled wristband and software track activity, sleep and food consumption, and suggest dietary and health tips based on the numbers. “It’s something that is not appreciated by data civilians. At times, it feels like everything we do.”

Several start-ups are trying to break through these big data bottlenecks by developing software to automate the gathering, cleaning and organizing of disparate data, which is plentiful but messy. The modern Wild West of data needs to be tamed somewhat so it can be recognized and exploited by a computer program.

“It’s an absolute myth that you can send an algorithm over raw data and have insights pop up,” said Jeffrey Heer, a professor of computer science at the University of Washington and a co-founder of Trifacta, a start-up based in San Francisco.

Timothy Weaver, the chief information officer of Del Monte Foods, calls the predicament of data wrangling big data’s “iceberg” issue, meaning attention is focused on the result that is seen rather than all the unseen toil beneath. But it is a problem born of opportunity. Increasingly, there are many more sources of data to tap that can deliver clues about a company’s business, Mr. Weaver said.

In the food industry, he explained, the data available today could include production volumes, location data on shipments, weather reports, retailers’ daily sales and social network comments, parsed for signals of shifts in sentiment and demand.

The result, Mr. Weaver said, is being able to see each stage of a business in greater detail than in the past, to tailor product plans and trim inventory. “The more visibility you have, the more intelligent decisions you can make,” he said.

But if the value comes from combining different data sets, so does the headache. Data from sensors, documents, the web and conventional databases all come in different formats. Before a software algorithm can go looking for answers, the data must be cleaned up and converted into a unified form that the algorithm can understand.

Data formats are one challenge, but so is the ambiguity of human language. Iodine, a new health start-up, gives consumers information on drug side effects and interactions. Its lists, graphics and text descriptions are the result of combining the data from clinical research, government reports and online surveys of people’s experience with specific drugs.

But the Food and Drug Administration, National Institutes of Health and pharmaceutical companies often apply slightly different terms to describe the same side effect. For example, “drowsiness,” “somnolence” and “sleepiness” are all used. A human would know they mean the same thing, but a software algorithm has to be programmed to make that interpretation. That kind of painstaking work must be repeated, time and again, on data projects.
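A toy sketch of that normalization step gives a feel for the work involved (the mapping and the records are invented for illustration):

# Map each source's vocabulary onto one canonical side-effect term
CANONICAL = {
    "drowsiness": "drowsiness",
    "somnolence": "drowsiness",
    "sleepiness": "drowsiness",
}

def normalize_effect(term: str) -> str:
    """Return the canonical side-effect name, falling back to the raw term."""
    return CANONICAL.get(term.strip().lower(), term.strip().lower())

reports = ["Somnolence", "sleepiness ", "Drowsiness", "nausea"]
print([normalize_effect(r) for r in reports])
# -> ['drowsiness', 'drowsiness', 'drowsiness', 'nausea']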

Data experts try to automate as many steps in the process as possible. “But practically, because of the diversity of data, you spend a lot of your time being a data janitor, before you can get to the cool, sexy things that got you into the field in the first place,” said Matt Mohebbi, a data scientist and co-founder of Iodine.

The big data challenge today fits a familiar pattern in computing. A new technology emerges and initially it is mastered by an elite few. But with time, ingenuity and investment, the tools get better, the economics improve, business practices adapt and the technology eventually gets diffused and democratized into the mainstream.

In software, for example, the early programmers were a priesthood who understood the inner workings of the machine. But the door to programming was steadily opened to more people over the years with higher-level languages from Fortran to Java, and even simpler tools like spreadsheets.

Spreadsheets made financial math and simple modeling accessible to millions of nonexperts in business. John Akred, chief technology officer at Silicon Valley Data Science, a consulting firm, sees something similar in the modern data world, as the software tools improve.

“We are witnessing the beginning of that revolution now, of making these data problems addressable by a far larger audience,” Mr. Akred said.

ClearStory Data, a start-up in Palo Alto, Calif., makes software that recognizes many data sources, pulls them together and presents the results visually as charts, graphics or data-filled maps. Its goal is to reach a wider market of business users beyond data masters.

Six to eight data sources typically go into each visual presentation. One for a retailer might include scanned point-of-sale data, weather reports, web traffic, competitors’ pricing data, the number of visits to the merchant’s smartphone app and video tracking of parking lot traffic, said Sharmila Shahani-Mulligan, chief executive of ClearStory.

“You can’t do this manually,” Ms. Shahani-Mulligan said. “You’re never going to find enough data scientists and analysts.”

Trifacta makes a tool for data professionals. Its software employs machine-learning technology to find, present and suggest types of data that might be useful for a data scientist to see and explore, depending on the task at hand.

“We want to lift the burden from the user, reduce the time spent on data preparation and learn from the user,” said Joseph M. Hellerstein, chief strategy officer of Trifacta, who is also a computer science professor at the University of California, Berkeley.

Paxata, a start-up in Redwood City, Calif., is focused squarely on automating data preparation — finding, cleaning and blending data so that it is ready to be analyzed. The data refined by Paxata can then be fed into a variety of analysis or visualization software tools, chosen by the data scientist or business analyst, said Prakash Nanduri, chief executive of Paxata.

“We’re trying to liberate people from data-wrangling,” Mr. Nanduri said. “We want to free up their time and save them from going blind.”

Data scientists emphasize that there will always be some hands-on work in data preparation, and there should be. Data science, they say, is a step-by-step process of experimentation.

“You prepared your data for a certain purpose, but then you learn something new, and the purpose changes,” said Cathy O’Neil, a data scientist at the Columbia University Graduate School of Journalism, and co-author, with Rachel Schutt, of “Doing Data Science” (O’Reilly Media, 2013).

Plenty of progress is still to be made in easing the analysis of data. “We really need better tools so we can spend less time on data wrangling and get to the sexy stuff,” said Michael Cavaretta, a data scientist at Ford Motor, which has used big data analysis to trim inventory levels and guide changes in car design.

Mr. Cavaretta is familiar with the work of ClearStory, Trifacta, Paxata and other start-ups in the field. “I’d encourage these start-ups to keep at it,” he said. “It’s a good problem, and a big one.”

Correction: August 19, 2014
An article on Monday about the development of software to automate the gathering, cleaning and organizing of disparate data misstated the year when “Doing Data Science” was published. It first came out in 2013, not 2014.

Originally posted via “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights”

Source by analyticsweekpick