Aug 30, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Convincing  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> October 10, 2016 Health and Biotech Analytics News Roundup by pstein

>> Ten Guidelines for Clean Customer Feedback Data by bobehayes

>> Apr 20, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

Wanna write? Click Here

[ NEWS BYTES]

>> Data Analytics Add Value to Healthcare Supply Chain Management – RevCycleIntelligence.com Under Health Analytics

>> The men’s fashion company that’s part apparel, part big data – Marketplace.org Under Big Data

>> TV Time’s New Analytics Tool Breaks Down Fan Reaction to Shows … – Variety Under Social Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Introduction to Apache Spark

image

Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals…. more

[ FEATURED READ]

On Intelligence

image

Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics to help with decision making. Data and analytics driven decision making is rapidly working its way into our core corporate DNA, yet we are not creating practice grounds to test those models fast enough. Snug-looking models can hide nails that cause uncharted pain if they go unchecked. This is the right time to start thinking about putting an Analytics Club [Data Analytics CoE] in your workplace to help lab out best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q:What is an outlier? Explain how you might screen for outliers and what you would do if you found them in your dataset. Also, explain what an inlier is, how you might screen for them, and what you would do if you found them in your dataset.
A: Outliers:
– An observation point that is distant from other observations
– Can occur by chance in any distribution
– Often, they indicate measurement error or a heavy-tailed distribution
– Measurement error: discard them or use robust statistics
– Heavy-tailed distribution: high skewness, can’t use tools assuming a normal distribution
– Three-sigma rule (normally distributed data): about 1 in 22 observations will differ from the mean by more than twice the standard deviation
– Three-sigma rule: about 1 in 370 observations will differ from the mean by more than three times the standard deviation

Three-sigma rule example: in a sample of 1,000 observations, the presence of up to 5 observations deviating from the mean by more than three times the standard deviation is within the range of what can be expected, since it is less than twice the expected count of about 3 (Poisson distribution).

If the nature of the distribution is known a priori, it is possible to check whether the number of outliers deviates significantly from what can be expected. For a given cutoff (samples fall beyond the cutoff with probability p), the number of outliers can be approximated with a Poisson distribution with lambda = pn. Example: for a normal distribution with a cutoff 3 standard deviations from the mean, p ≈ 0.3%, so in a sample of n = 1000 the number of observations whose deviation exceeds 3 sigma can be approximated by a Poisson with lambda = 3.
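
As a quick numerical check of the approximation above, here is a minimal sketch (assuming the rounded normal tail probability of 0.3% used in the text) that computes the expected count of 3-sigma outliers in 1,000 observations and the Poisson probability of seeing at most five of them:

```python
from math import exp, factorial

n = 1000      # sample size
p = 0.003     # P(deviation > 3 sigma) for a normal distribution, ~0.3% as above
lam = p * n   # expected number of 3-sigma outliers: lambda = pn = 3

def poisson_pmf(k, lam):
    """Probability of observing exactly k outliers under a Poisson(lambda) model."""
    return exp(-lam) * lam ** k / factorial(k)

# Probability of seeing at most 5 such outliers in the sample
prob_at_most_5 = sum(poisson_pmf(k, lam) for k in range(6))
print(f"expected outliers: {lam:.1f}")
print(f"P(at most 5 outliers) = {prob_at_most_5:.2f}")  # ~0.92, so five outliers is unsurprising
```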

Identifying outliers:
– No rigid mathematical method
– Subjective exercise: be careful
– Boxplots
– QQ plots (sample quantiles vs. theoretical quantiles); see the screening sketch below
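
Alongside the visual checks above, simple numeric screens are common in practice. A minimal sketch on synthetic data, using the same fences a boxplot draws (the 1.5 × IQR rule) plus a three-sigma screen; the planted values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(50, 5, 200), [95.0, 4.0]])  # two planted outliers

# IQR rule (the fences a boxplot uses): flag points beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = data[(data < lower) | (data > upper)]

# Three-sigma rule: flag points more than 3 standard deviations from the mean
z = (data - data.mean()) / data.std()
sigma_outliers = data[np.abs(z) > 3]

print("IQR-rule outliers:", np.sort(iqr_outliers))
print("3-sigma outliers: ", np.sort(sigma_outliers))
```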

Handling outliers:
– Depends on the cause
– Retention: when the underlying model is confidently known
– Regression problems: only exclude points which exhibit a large degree of influence on the estimated coefficients (Cook’s distance)

Inlier:
– Observation lying within the general distribution of other observed values
– Doesn’t perturb the results but is non-conforming and unusual
– Simple example: observation recorded in the wrong unit (°F instead of °C)

Identifying inliers:
– Mahalanobis distance
– Used to calculate the distance between two random vectors
– Difference from Euclidean distance: it accounts for correlations
– Discard them once identified (see the sketch below)
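
A minimal sketch of the Mahalanobis screen on synthetic data. The planted last row stays inside each variable’s individual range (so a per-feature three-sigma check misses it) but violates the correlation structure, which the Mahalanobis distance picks up; the data and the chi-square cutoff are illustrative assumptions:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
# Two strongly correlated features, e.g. two sensors measuring the same process
X = rng.multivariate_normal(mean=[50, 100], cov=[[100, 90], [90, 100]], size=300)
X = np.vstack([X, [70, 80]])  # planted inlier: each value is plausible on its own,
                              # but the pair contradicts the correlation

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu
d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)  # squared Mahalanobis distances

cutoff = chi2.ppf(0.999, df=X.shape[1])  # distances behave ~chi-square with p degrees of freedom
print("flagged rows:", np.where(d2 > cutoff)[0])     # includes the planted last row (index 300)

# A per-feature z-score screen does not flag the planted point
z = (X - mu) / X.std(axis=0)
print("max |z| of planted point:", round(float(np.abs(z[-1]).max()), 2))  # ~2, below 3 sigma
```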

Source

[ VIDEO OF THE WEEK]

@TimothyChou on World of #IOT & Its #Future Part 1 #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The world is one big data problem. – Andrew McAfee

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @MichOConnell, @Tibco

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

571 new websites are created every minute of the day.

Sourced from: Analytics.CLUB #WEB Newsletter

CMOs’ Journey from Big Data to Big Profits (Infographic)

The consumer purchase funnel generates huge amounts of data, and as consumers add social media and mobile channels to their decision-making, that data has become extremely difficult to track and make sense of. This fuels the ever-mounting pressure on CMOs to show how their budgets deliver incremental business value.

Better data management is turning out to be a strong competitive edge and a powerful value-generation tool for organizations, so a well-managed marketing organization will make good use of its data.

This has pushed many marketers overwhelmingly toward better big data analytics, as analytics will become a major component of their business over the next several years. According to the Teradata Data Driven Marketing Survey 2013, released by Teradata earlier this year, 71 percent of marketers say they plan to implement big data analytics within the next two years.

Marketers already rely on a number of common and easily accessible forms of data to drive their marketing initiatives—customer service data, customer satisfaction data, digital interaction data and demographic data. But true data-driven marketing takes it to the next level: Marketers need to collect and analyze massive amounts of complicated, unstructured data that combines the traditional data their companies have collected with interaction data (e.g., data pulled from social media), integrating both online and offline data sources to create a single view of their customer.

Visually and McKinsey & Company co-published this infographic to illustrate the pressures that CMOs find themselves under and the potential benefits of leveraging big data.

CMOs’ Journey from Big Data to Big Profits (Infographic)

Source by v1shal

Aug 23, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Insights  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Big data’s big problem: How to make it work in the real world by analyticsweekpick

>> January 16, 2017 Health and Biotech analytics news roundup by pstein

>> March 6, 2017 Health and Biotech analytics news roundup by pstein

Wanna write? Click Here

[ FEATURED COURSE]

Deep Learning Prerequisites: The Numpy Stack in Python

image

The Numpy, Scipy, Pandas, and Matplotlib stack: prep for deep learning, machine learning, and artificial intelligence… more

[ FEATURED READ]

On Intelligence

image

Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more

[ TIPS & TRICKS OF THE WEEK]

Grow at the speed of collaboration
Research by Cornerstone On Demand pointed out the need for better collaboration within the workforce, and the data analytics domain is no different. A rapidly changing and growing industry like data analytics is very difficult for an isolated workforce to keep up with. A good collaborative work environment facilitates a better flow of ideas, improved team dynamics, rapid learning, and an increased ability to cut through the noise. So, embrace collaborative team dynamics.

[ DATA SCIENCE Q&A]

Q:How to detect individual paid accounts shared by multiple users?
A: * Check the geographical region: a login from Paris on Friday morning and a login from Tokyo on Friday evening (sketched below)
* Bandwidth consumption: flag a user who goes over some high limit
* Count of live sessions: 100 sessions per day (roughly 4 per hour) is more than one person can plausibly generate
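
A minimal sketch of the first heuristic (impossible travel between consecutive logins). The event fields and the 900 km/h speed cap are illustrative assumptions; the bandwidth and session-count checks would just be threshold comparisons on top of the same log:

```python
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

# Hypothetical login events for one account: (timestamp, city, lat, lon)
logins = [
    (datetime(2018, 8, 24, 9, 0), "Paris", 48.85, 2.35),
    (datetime(2018, 8, 24, 18, 0), "Tokyo", 35.68, 139.69),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

MAX_SPEED_KMH = 900  # roughly the cruising speed of a commercial flight

def impossible_travel(events):
    """Flag consecutive logins whose implied travel speed exceeds MAX_SPEED_KMH."""
    flags = []
    for (t1, c1, la1, lo1), (t2, c2, la2, lo2) in zip(events, events[1:]):
        hours = (t2 - t1).total_seconds() / 3600
        speed = haversine_km(la1, lo1, la2, lo2) / max(hours, 1e-9)
        if speed > MAX_SPEED_KMH:
            flags.append((c1, c2, round(speed)))
    return flags

print(impossible_travel(logins))  # Paris -> Tokyo in 9 hours implies ~1,080 km/h: flag it
```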

Source

[ VIDEO OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

We chose it because we deal with huge amounts of data. Besides, it sounds really cool. – Larry Page

[ PODCAST OF THE WEEK]

#FutureOfData with @CharlieDataMine, @Oracle discussing running analytics in an enterprise

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Distributed computing (performing computing tasks using a network of computers in the cloud) is very real. Google uses it every day to involve about 1,000 computers in answering a single search query, which takes no more than 0.2 seconds to complete.

Sourced from: Analytics.CLUB #WEB Newsletter

“To Cloud or Not”: Practical Considerations for Disaster Recovery and High Availability in Public Clouds

The perceived benefits of low cost storage, per-usage pricing models, and flexible accessibility—and those facilitated by multi-tenant, public cloud providers in particular—are compelling. Across industries and use cases, organizations are hastening to migrate to the cloud for applications that are increasingly becoming more mission critical with each deployment.

What many fail to realize (until after the fact) is that the question of security is not the only cautionary consideration accompanying this change in enterprise architecture. There are also numerous distinctions related to disaster recovery, failover clustering, and high availability for public cloud use cases which drastically differ from conventional on-premise methods for ensuring business continuity. Most times, businesses are tasked with making significant network configurations to enable these preventive measures which can ultimately determine how successful cloud deployments are.

“Once you’ve made the decision that the cloud is your platform, high availability and security are two things you can’t do without,” explained Dave Bermingham, Senior Technical Evangelist, Microsoft Cloud and Datacenter Management MVP at SIOS. “You have them on-prem. Whenever you have a business critical application, you make sure you take all the steps you can to make sure it’s highly available and your network is secure.”

Availability Realities
The realities of the potential for high availability in the cloud vastly differ from their general perception. According to Bermingham, most of the major public cloud providers such as AWS, Google, Azure and others “have multiple data centers across the entire globe. Within each geographic location they have redundancy in what they call zones. Each region is divided so you can have zone one and zone two be entirely independent of one another, so there should be no single point of failure between the zones.” The standard promises of nearly 100 percent availability contained in most service-level agreements are predicated on organizations running instances in more than one zone.

However, for certain critical applications such as database management systems like Microsoft SQL Server, for example, “the data is being written in one instance,” noted Bermingham. “Even if you have a second instance up and running, it’s not going to do you any good because the data written on the primary instance won’t be on the secondary instance unless you take steps to make that happen.” Some large cloud providers don’t have Storage Area Networks (SANs) used for conventional on-premise high availability, while there are also few out-the-box opportunities for failovers between regions. The latter is especially essential when “you have a larger outage that affects an entire region,” Bermingham said. “A lot of what we’ve seen to date has been some user error…that has a far reaching impact that could bring down an entire region. These are also susceptible to natural disasters that are regional specific.”

Disaster Recovery
Organizations can maximize disaster recovery efforts in public clouds or even mitigate the need for them with a couple different approaches. Foremost of these involves SANless clusters, which provide failover capabilities not predicated on SAN. Instead of relying on storage networks not supported by some large public clouds, this approach relies on software to facilitate failovers via an experience that is “the same as their experience on-prem with their traditional storage cluster,” Bermingham mentioned. Moreover, it is useful for standard editions of database systems like SQL Server as opposed to options like Always On availability groups.

The latter enables the replication of databases and failovers, but is a feature of the pricey enterprise edition of database management systems such as SQL Server. These alternative methods to what public clouds offer for high availability can assist with redundancy between regions, as opposed to just between zones. “You really want to have a plan B for your availability beyond just distributed across different zones in the same region,” Bermingham commented. “Being able to get your data and have a recovery plan for an entirely different region, or even from one cloud provider to another if something really awful happened and Google went offline across multiple zones, that would be really bad.”

Business Continuity
Other factors pertaining to inter-region disaster recovery expressly relate to networking differences between public clouds and on-premise settings. Typically, when failing over to additional clusters clients can simply connect to a virtual IP address that moves between servers depending on which node is active at that given point in time. This process involves gratuitous Address Resolution Protocols (ARPs), which are not supported by some of the major public cloud vendors. One solution for notifying clients of an updated IP address involves “creating some host-specific routes in different subnets so each of your nodes would live in a different subnet,” Bermingham said. “Depending upon whichever node is online, it will bring an IP address specific to that subnet online. Then, the routing tables would automatically be configured to route directly to that address with a host-specific route.”

Another option is to leverage an internal load balancer for client re-direction, which doesn’t work across regions. According to Bermingham: “Many people want to not only have multiple instances in different zones in a same region, but also some recovery option should there be failure in an entire region so they can stand up another instance in an entirely different region in Google Cloud. Or, they can do a hybrid cloud and replicate back on-prem, then use your on-prem as a disaster recovery site. For those configurations that span regions, the route update method is going to be the most reliable for client re-direction.”

Security Necessities
By taking these dedicated measures to ensure business continuity, disaster recovery, and high availability courtesy of failovers, organizations can truly make public cloud deployments a viable means of extracting business value. They simply require a degree of upfront preparation which many businesses aren’t aware of until they’ve already invested in public clouds. There’s also the issue of security, which correlates to certain aspects of high availability. “A lot of times when you’re talking about high availability, you’re talking about moving data across networks so you have to leverage the tools the cloud provider gives you,” Bermingham remarked. “That’s really where high availability and security intersect: making sure your data is secure in transit, and the cloud vendors will give you tools to ensure that.”

Originally Posted at: “To Cloud or Not”: Practical Considerations for Disaster Recovery and High Availability in Public Clouds

Technology Considerations for CIOs for Data Driven Organization

Technology Considerations for CIOs for Data Driven Projects

Making your business data driven requires monitoring far more data than you are used to, so there is a big-data hump to get over. The bigger the data at play, the greater the need to handle larger interfaces, multiple data sources and so on, and a strong data management strategy is required to manage the other considerations that come with it. Technology considerations will vary depending on the business and the data involved. The following are areas you could consider when shaping a technology strategy for a data driven project. Use this as a basic roadmap; every business is specific and may have more or fewer things to worry about.

Key technology considerations for today’s CIOs include:

Database Considerations:

One of the primary things that make an entire data driven project workable is the choice of database. This choice depends on the risks associated with the database.

Consider the following:

Coexisting with existing architecture:

One thing you have to ask yourself is how the required technology will square with the existing infrastructure. Technology integration, if not planned well, could toast a well-run business, so careful consideration is needed: it has a direct bearing on the organization’s performance and cost. ETL (Extract, Transform and Load) tools must act as a bridge between relational environments such as Oracle and analytics data warehouses such as Teradata.

– Storage and Hardware:

Making the engine work requires a lot of processing around compression, deduplication and cache management. These functions are critical for making data analytics work efficiently, and data analytics is the backbone of most data driven projects. Various vendors offer tools sophisticated enough to handle up to 45-fold compression and re-inflation, which makes processing and storage demanding. Consideration of the tools and their hardware and storage needs is therefore critical: each tool must be carefully studied for its footprint on the organization, and resource allocations should be made according to the complexity of the tools and tasks.

– Query and Analytics:

Query complexity varies by use case. Some queries require little pre- or post-processing, while others require deep analytics and heavy pre- and post-processing. Each use case comes with its own requirements and must be dealt with accordingly; some may even need visualization tools to make the data consumable. Careful consideration must therefore be given to the low- and high-bar requirements of the use cases. Query and analytics requirements will have an indirect impact on cost as well as on the infrastructure the business needs.

– Scale and Manageability:

Businesses often have to accumulate data from disparate sources and analyze it in different environments, which makes the entire model difficult to scale and manage. Understanding the complications around data modeling is another big task: it encompasses the infrastructure, tooling and talent requirements needed to provision for future growth. Deep consideration should be given to infrastructure scalability and manageability for the business once it is running on a data driven model. It is a delicate task and must be done carefully to get the measurements right.

Data Architecture:
There are many other decisions to be made when considering the information architecture design as it relates to big data storage/analysis. These include choosing between relational or non-relational data stores; virtualized on-premise servers or external clouds; in-memory or disk-based processing; uncompressed data formats (quicker access) or compressed (cheaper storage). Companies also need to decide whether or not to shard – split tables by row and distribute them across multiple servers – to improve performance. Other choices include selecting either column-oriented or row-oriented processing as the dominant method, and a hybrid platform or a greenfield approach. The solution could be the best mix of the combinations stated above, so data requirements must be thought through carefully.

– Column-oriented Database:

As opposed to relational, row-based databases, a column-oriented database groups and stores data that share the same attribute, e.g. one record contains the age of every customer. This type of data organization is conducive to performing many selective queries rapidly, a benchmark of big data analytics.
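
This is not a database, just a minimal illustration of why a columnar layout favors selective analytic queries: each attribute sits in its own contiguous array, and a query touches only the columns it needs. The columns and the query are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Columnar layout: one contiguous array per attribute
age = rng.integers(18, 90, size=n)
revenue = rng.gamma(2.0, 50.0, size=n)
region = rng.integers(0, 5, size=n)  # stored, but never read by the query below

# "Average revenue for customers over 65" scans only the two columns it needs
mask = age > 65
print(f"avg revenue, age > 65: {revenue[mask].mean():.2f}")
```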

– In-memory Database:

In-memory is another way to speed up processing, by turning to database platforms that use CPU memory for data storage instead of physical disks. This cuts down the number of cycles required for data retrieval, aggregation and processing, enabling complex queries to be executed much faster. These are expensive systems and are best used when a high real-time processing rate is a priority; many trading desks use this model to process real-time trades.
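
A minimal sketch of the idea, using SQLite’s in-memory mode as a readily available stand-in for a dedicated in-memory database platform: the table and every query against it live entirely in RAM. The trade data is invented for the example:

```python
import sqlite3

# An SQLite database kept entirely in RAM: no disk I/O on the query path
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("ABC", 100, 10.5), ("ABC", 200, 10.7), ("XYZ", 50, 99.1)],
)

# Aggregation runs against memory-resident pages only
for row in conn.execute("SELECT symbol, SUM(qty * price) FROM trades GROUP BY symbol"):
    print(row)
conn.close()
```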

– NoSQL:

“Not Only SQL” provides a semi-structured model for handling inconsistent or sparse data. Because the data is not rigidly structured, it does not require fixed table schemas or join operations, and it can scale horizontally across nodes (locally or in the cloud). NoSQL offerings come in different shapes and sizes, with open-source and licensed options, and are built with the needs of various social and Web platforms in mind.

– Database Appliances:

These are ready-to-use data nodes: self-contained combinations of hardware and software that extend the storage capabilities of relational systems or provide an engineered system for new big data capabilities such as columnar or in-memory databases.

– Map Reduce:

MapReduce is a technique for distributing the computation of large data sets across a cluster of commodity processing nodes. Processing can be performed in parallel because the workload is broken down into discrete, independent operations, which allows some workloads to be delivered most effectively via a cloud-based infrastructure. It comes in handy when dealing with big data problems at low cost.
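
A minimal, single-process sketch of the MapReduce programming model (not a distributed implementation): the map phase processes each document independently, and the reduce phase merges the partial word counts key by key. The documents are invented for the example:

```python
from collections import Counter
from functools import reduce

documents = [
    "big data needs distributed processing",
    "map reduce distributes processing across nodes",
    "nodes process data in parallel",
]

# MAP: each document is counted independently (each call could run on a separate node)
def map_phase(doc):
    return Counter(doc.split())

# REDUCE: partial counts are merged key by key
def reduce_phase(counts_a, counts_b):
    return counts_a + counts_b

partials = [map_phase(doc) for doc in documents]  # embarrassingly parallel step
totals = reduce(reduce_phase, partials, Counter())
print(totals.most_common(3))
```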

Other Constraints:

Other constraints that are linked to technology and must be considered include resource requirements and the market risks associated with tools. This may look like an ad hoc task, but it carries similar pain points and must be taken seriously. Such decisions include whether resources and support are cheap or expensive, the risks of the technologies being adopted, and the affordability of the tools.

– Resource Availability:

As you nail down the technologies needed to fuel the data engine, one question to ask is whether knowledgeable resources are available in abundance or whether it will be a nightmare to find someone to help out. It is always helpful to adopt technologies that are popular and have more resources available at lower cost. It is simple supply-and-demand math, but it helps a lot later down the road.

– Associated Risks with tools:

As we all know, the world is changing at a rate that is difficult to keep pace with, and with this change comes a changing technological landscape. It is crucial to consider the maturity of the tools and technologies under consideration. Installing something new and fresh carries the risk of lower adoption and hence less support; similarly, old-school technologies are always vulnerable to disruption and being run down by the competition. So, technology that sits somewhere in the middle should be adopted.

– Indirect Costs & Affordability:

Another interesting point linked to technology is the cost and affordability associated with a particular choice. License agreements, scaling costs, and the cost of managing organizational change are some of the important considerations that need to be taken care of. What starts cheap need not stay cheap later on, and vice versa, so carefully planning for customer growth, market change and so on will help in understanding the complete long-term terms and costs of adopting any technology.

So, getting your organization onto the rails of a data driven engine is fun, super cool, and sustainable, but it involves some serious technology considerations that must be weighed before jumping on a particular technology.

Here is a video from IDC’s Crawford Del Prete discussing CIO Strategies around Big Data and Analytics

Source

Aug 16, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Convincing  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> How Retailer Should Use QR Code To Hit The Pot Of Gold by v1shal

>> Why the time is ripe for security behaviour analytics by analyticsweekpick

>> “To Cloud or Not”: Practical Considerations for Disaster Recovery and High Availability in Public Clouds by jelaniharper

Wanna write? Click Here

[ NEWS BYTES]

>> Cyber security news round-up – Digital Health Under cyber security

>> Global Streaming Analytics Market Current and Future Industry Trends, 2016 – 2024 – Exclusive Reportage Under Streaming Analytics

>> UnitedHealth to Spread in Pharmacy Business With Genoa Deal – Zacks.com Under Health Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Learning from data: Machine learning course

image

This is an introductory course in machine learning (ML) that covers the basic theory, algorithms, and applications. ML is a key technology in Big Data, and in many financial, medical, commercial, and scientific applicati… more

[ FEATURED READ]

Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th Edition

image

The eagerly anticipated Fourth Edition of the title that pioneered the comparison of qualitative, quantitative, and mixed methods research design is here! For all three approaches, Creswell includes a preliminary conside… more

[ TIPS & TRICKS OF THE WEEK]

Data Have Meaning
We live in a Big Data world in which everything is quantified. While the emphasis of Big Data has been focused on distinguishing the three characteristics of data (the infamous three Vs), we need to be cognizant of the fact that data have meaning. That is, the numbers in your data represent something of interest, an outcome that is important to your business. The meaning of those numbers is about the veracity of your data.

[ DATA SCIENCE Q&A]

Q:Do we always need the intercept term in a regression model?
A: * It guarantees that the residuals have a zero mean
* It guarantees that the least squares slope estimates are unbiased
* The regression line floats up and down, by adjusting the constant, to a point where the mean of the residuals is zero (see the sketch below)
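
A minimal sketch of the first point on synthetic data: with a column of ones in the design matrix the residuals average to zero, while forcing the fit through the origin leaves a nonzero mean residual and a biased slope. The data is invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 5.0 + 2.0 * x + rng.normal(0, 1, 200)  # true intercept is 5, true slope is 2

# With an intercept: the design matrix gets a column of ones
X_with = np.column_stack([np.ones_like(x), x])
beta_with, *_ = np.linalg.lstsq(X_with, y, rcond=None)
resid_with = y - X_with @ beta_with

# Without an intercept: the regression is forced through the origin
X_without = x.reshape(-1, 1)
beta_without, *_ = np.linalg.lstsq(X_without, y, rcond=None)
resid_without = y - X_without @ beta_without

print("mean residual, with intercept:   ", round(float(resid_with.mean()), 6))     # ~0
print("mean residual, without intercept:", round(float(resid_without.mean()), 4))  # nonzero
print("slope with vs without intercept: ", round(float(beta_with[1]), 2),
      round(float(beta_without[0]), 2))
```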

Source

[ VIDEO OF THE WEEK]

#GlobalBusiness at the speed of The #BigAnalytics

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

If we have data, let’s look at data. If all we have are opinions, let’s go with mine. – Jim Barksdale

[ PODCAST OF THE WEEK]

Scott Harrison (@SRHarrisonJD) on leading the learning organization #JobsOfFuture #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

100 terabytes of data uploaded daily to Facebook.

Sourced from: Analytics.CLUB #WEB Newsletter

“More Than Just Data”: The Impact of Bayesian Machine Learning on Predictive Modeling

Predictive analytics is rapidly reshaping the power of data-driven practices. The influence of its most accessible format, machine learning, encompasses everything from standard data preparation processes to the most far-flung aspects of cognitive computing. In all but the rarest instances, there is a surfeit of vendors and platforms provisioning self-service options for lay users.

Facilitating the overarching predictive models that function as the basis for predictive analytics results can involve many different approaches. According to Shapiro+Raj Senior Vice President of Strategy, Research and Analytics Dr. Lauren Tucker, two of those readily surpass the others.

“Some of the most advanced techniques are going to be Bayesian and machine learning, because those tools are the best ones to use in an environment of uncertainty,” Tucker disclosed.

In a world of swiftly changing business requirements and ever fickle customer bases, a look into the future can sometimes prove the best way to gain competitive advantage—particularly when it’s backed by both data and knowledge.

The Future from the Present, Not the Past
Tucker largely advocates Bayesian machine learning methods for determining predictive modeling because of their intrinsically forward-facing nature. These models involve an element of immediacy that far transcends that of “traditional advanced analytics tools that use the history to predict the future” according to Tucker. The combination of Bayesian techniques within an overarching framework for machine learning models is able to replace a reliance on historic data with information that is firmly entrenched in the present. “Bayesian machine learning allows for much more forward-facing navigation of the future by incorporating other kinds of assumptions and elements to help predict the future based on certain types of assessments of what’s happening now,” Tucker said. Their efficacy becomes all the more pronounced in use cases in which there is a wide range of options for future events or factors that could impact them, such as marketing or sales trends that are found in any number of verticals.

The Knowledge Base of Bayesian Models
Bayesian modeling techniques are knowledge based, and require the input of a diversity of stakeholders to flesh out their models. Integrating this knowledge base with the data-centric approach of machine learning successfully amalgamates both stakeholder insight and undisputed data-driven facts. This situation enables the strengths of the former to fortify those of the latter, forming a gestalt that provides a reliable means of determining future outcomes. “The Bayesian approach to predictive modeling incorporates more than just data,” Tucker said. “It ties in people’s assessments of what might happen in the future, it blends in people’s expertise and knowledge and turns that into numerical values that, when combined with hardcore data, gives you a more accurate picture of what’s going to happen in the future.” Historically, the effectiveness of Bayesian modeling was hampered by limitations on computing power, memory, and other technical circumscriptions that have been overcome by modern advancements in those areas. Thus, in use cases in which organizations have to bank the enterprise on performance, the confluence of Bayesian and machine learning models utilizes more resources—both data-driven and otherwise—to predict outcomes.
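
As a deliberately simple illustration of the blend described above (expert judgment encoded as a prior, observed data as the likelihood), here is a beta-binomial sketch. The prior parameters and campaign numbers are invented for the example, and this is not the modeling approach of any particular firm mentioned in the article:

```python
from scipy.stats import beta

# Expert judgment encoded as a prior: "the conversion rate is around 5%"
# Beta(a, b) has mean a / (a + b) = 0.05; the strength chosen here is an assumption.
prior_a, prior_b = 5, 95

# Observed (hypothetical) campaign data: 40 conversions out of 500 impressions
conversions, trials = 40, 500

# Conjugate update: the posterior is Beta(a + successes, b + failures)
post_a = prior_a + conversions
post_b = prior_b + (trials - conversions)

posterior_mean = post_a / (post_a + post_b)
low, high = beta.ppf([0.05, 0.95], post_a, post_b)
print(f"posterior mean conversion rate: {posterior_mean:.3f}")
print(f"90% credible interval: ({low:.3f}, {high:.3f})")
```

With few observations the posterior stays close to the expert prior; as campaign data accumulates, the likelihood increasingly dominates, which is exactly the mix of judgment and hard data described above.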

Transparent Data Culture
The Bayesian machine learning approach to predictive modeling offers a multitude of additional advantages, one of which is an added degree of transparency in data-centric processes. By merging human expertise into data-driven processes, organizations gain a degree of trust whose absence might otherwise prevent them from fully embracing the benefits of data culture. “The irony of course is that traditional models, because they didn’t incorporate knowledge or expertise, sort of drew predictive outcomes that just could not be assimilated into a larger organization because people were like, ‘I don’t believe that; it’s not intuitive to me,’” Tucker said. Bayesian machine learning models are able to emphasize a data-driven company culture by democratizing input into the models for predictive analytics. Instead of merely relying on an elite group of data scientists or IT professionals to determine the future results of processes, “stakeholders are able to incorporate their perspectives, their expertise, their understanding of the marketplace and the future of the marketplace into the modeling process,” Tucker mentioned.

High Stakes, Performance-Based Outcomes
The most convincing use cases for Bayesian machine learning are high stakes situations in which the success of organizations is dependent on the outcomes of their predictive models. A good example of such a need for reliable predictive modeling is performance-based marketing and performance-based pay incentives, in which organizations are only rewarded for the merits that they are able to produce from their efforts. McDonald’s recently made headlines when it mandated performance-based rewards for its marketing entities. “I would argue that more and more clients, whether bricks and mortar or e-commerce, are going to start to have that expectation that their marketing partners and their ad agencies are required to have advanced analytics become more integrated into the core offering of the agency, and not be peripheral to it,” Tucker commented.

Upfront Preparation
The key to the Bayesian methodology for predictive modeling is a thorough assessment process in which organizations—both at the departmental and individual levels—obtain employee input for specific use cases. Again, the primary boon associated with this approach is both a more accurate model and increased involvement in reinforcing data culture. “I’m always amazed at how many people in an organization don’t have a full picture of their business situation,” Tucker revealed. “This forces that kind of conversation, and forces it to happen.” The ensuing assessments in such situations uncover almost as much business expertise of a particular domain as they do information to inform the underlying predictive models. The machine learning algorithms that influence the results of predictive analytics in such cases are equally as valuable, and are designed to improve with time and experience.

Starting Small
The actual data that organizations want to utilize as the basis of Bayesian machine learning predictive models can include big data. Or, they can include traditional small data, or just small data applications of big data. “One of the things that makes Bayesian approaches and machine learning approaches challenging is how to better and more efficiently get the kind of knowledge and some of the small data that we need— insights from that that we need—in order to do the bigger predictive modeling projects,” Tucker explained. Such information can include quantitative projections for ROI and revenues, as well as qualitative insights drawn from hard-earned experience. Bayesian machine learning models are able to utilize both data and the experiences of those that are dependent on that data to do their jobs in such a way that ultimately, they both benefit. The outcome is predictive analytics that can truly inform the future of an organization’s success.

Originally Posted at: “More Than Just Data”: The Impact of Bayesian Machine Learning on Predictive Modeling

Aug 09, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Data Mining  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> April 3, 2017 Health and Biotech analytics news roundup by pstein

>> Firing Up Innovation in Big Enterprises by d3eksha

>> Sisense Boto Brings The Future To Your Fingertips by analyticsweek

Wanna write? Click Here

[ NEWS BYTES]

>> NEWSMAKER- UK watchdog warns financial firms over Big Data – Reuters Under Big Data

>> Cloud security threats and solutions – Cyprus Mail Under Cloud Security

>> This Discounted Course Will Introduce You to The World of Data Science – Daily Beast Under Data Science

More NEWS ? Click Here

[ FEATURED COURSE]

Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz

image

Use data to build a better startup faster in partnership with Geckoboard… more

[ FEATURED READ]

Superintelligence: Paths, Dangers, Strategies

image

The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. Other animals have stronger muscles or sharper claws, but … more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck towards achieving the comparative enterprise adoption. One of the primal reason is lack of understanding and knowledge within the stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members needs to step up to create awareness within the organization. An aware organization goes a long way in helping get quick buy-ins and better funding which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q:Is it better to design robust or accurate algorithms?
A: A. The ultimate goal is to design systems with good generalization capacity, that is, systems that correctly identify patterns in data instances not seen before
B. The generalization performance of a learning system strongly depends on the complexity of the model assumed
C. If the model is too simple, the system can only capture the actual data regularities in a rough manner. In this case, the system has poor generalization properties and is said to suffer from underfitting
D. By contrast, when the model is too complex, the system can identify accidental patterns in the training data that need not be present in the test set. These spurious patterns can be the result of random fluctuations or of measurement errors during the data collection process. In this case, the generalization capacity of the learning system is also poor. The learning system is said to be affected by overfitting (illustrated in the sketch below)
E. Spurious patterns, which are only present by accident in the data, tend to have complex forms. This is the idea behind the principle of Occam’s razor for avoiding overfitting: simpler models are preferred if more complex models do not significantly improve the quality of the description for the observations
Quick response: Occam’s Razor. It depends on the learning task. Choose the right balance
F. Ensemble learning can help balance bias and variance (several weak learners together = a strong learner)
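
A minimal sketch of the underfitting/overfitting trade-off described in points C and D, fitting polynomials of increasing degree to noisy synthetic data and comparing training error with held-out error. The signal, noise level, and degrees are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)  # true signal plus noise

# Hold out every other point to measure generalization
train, test = np.arange(0, 40, 2), np.arange(1, 40, 2)

for degree in (1, 3, 9):  # too simple, about right, flexible enough to chase noise
    coeffs = np.polyfit(x[train], y[train], degree)
    mse_train = np.mean((y[train] - np.polyval(coeffs, x[train])) ** 2)
    mse_test = np.mean((y[test] - np.polyval(coeffs, x[test])) ** 2)
    print(f"degree {degree}: train MSE {mse_train:.3f}, test MSE {mse_test:.3f}")
```

Typically the degree-1 fit underfits (both errors are high), degree 3 generalizes well, and the degree-9 fit drives training error down while held-out error drifts upward.
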
Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Marketing Analytics

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

War is 90% information. – Napoleon Bonaparte

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Dr. Nipa Basu, @DnBUS

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Estimates suggest that by better integrating big data, healthcare could save as much as $300 billion a year — that’s equal to reducing costs by $1000 a year for every man, woman, and child.

Sourced from: Analytics.CLUB #WEB Newsletter

[Podcast] Radical Transparency: Building A Kinder, Gentler Financial Marketplace

[soundcloud url=”https://api.soundcloud.com/tracks/463116444″ params=”color=#ff5500&auto_play=false&hide_related=false&show_comments=false&show_user=true&show_reposts=false&show_teaser=true” width=”100%” height=”166″ iframe=”true” /]

Ah – financial markets: a place where buyers and sellers come together to nail each other to the wall.

Like a giant Thunderdome of capitalism, buyers buy because they think the value of a security is going to go up, and sellers sell because they think it’s going to go down. It’s “two people enter, one person leaves”… having got the better of the other.

At least, that’s the way it’s supposed to work. But, apparently, PolicyBazaar, an India-based online marketplace for all types of insurance (and our customer), did not get the memo.

No, our friends at PolicyBazaar seem to think that a marketplace for financial products is where buyers and sellers come together to have a giant tea party, where both parties hold hands, exchange goods for services, and magically find themselves enriched, and wiser, for having engaged in a transaction.

By leveraging analytics to create a more transparent marketplace, PolicyBazaar has delivered valuable insights to sellers, helping them enhance their offerings to provide better, more-competitive products that more people want to buy.

On the other side of the transaction, PolicyBazaar has leveraged analytics so that their service team can counsel buyers, helping ensure that they get the best policy for the best price.

And why do they do this? Are they working both sides of the street to get consulting fees from the sellers and tips for good service from the buyers?

No. They do not.

The reason is that they do make a set fee on what’s sold online. And, as it turns out, that’s a great model, because they sell a whole lot of insurance. In fact, PolicyBazaar has captured a 90 percent market share in insurance in India in just a few years.

In the end, the radical transparency embraced by PolicyBazaar ensures that buyers buy more and are more satisfied, and sellers have better products and sell more. And absolutely nobody gets nailed to the wall on the deal.

But what’s the fun in that?

In this episode of Radical Transparency, we talk to the architect behind PolicyBazaar, CTO Ashish Gupta. We also speak to Max Wolff, Chief Economist at The Phoenix Group, who talks through malevolent markets (used cars) and benevolent markets (launching smartphones) to explain where PolicyBazaar resides.

Wolff also discusses whether analytics-driven markets approach what has been only a theoretical construct – a perfect market outcome – and whether the PolicyBazaar model is original, sustainable, and ethical.

You can check out the current episode here.

About Radical Transparency

Radical Transparency shows how business decision-makers are using analytics to make unexpected discoveries that revolutionize companies, disrupt industries, and, sometimes, change the world. Radical Transparency combines storytelling with analysis, economics, and data science to highlight the big opportunities and the big mistakes uncovered by analytics in business.

Source: [Podcast] Radical Transparency: Building A Kinder, Gentler Financial Marketplace

The Bifurcation of Data Science: Making it Work

Today’s data scientists face expectations of an enormous magnitude. These professionals are singlehandedly tasked with mastering a variety of domains, from those relevant to various business units and their objectives to some of the more far-reaching principles of statistics, algorithms, and advanced analytics.

Moreover, organizations are increasingly turning to them to solve complicated business problems with an immediacy which justifies both their salaries and the scarcity with which these employees are found. When one couples these factors with the mounting pressures facing the enterprise today in the form of increasing expenditures, operational complexity, and regulatory concerns, it appears this discipline as a whole is being significantly challenged to pave the way for successfully monetizing data-driven practices.

According to Cambridge Semantics advisor Carl Reed, who spent a substantial amount of time employing data-centric solutions at both Goldman Sachs and Credit Suisse, a number of these expectations are well-intentioned, but misplaced:

“Everyone is hitting the same problem with big data, and it starts with this perception and anticipation that the concept of turning data into business intelligence that can be a differentiator, a competitive weapon, is easy to do. But the majority of the people having that conversation don’t truly understand the data dimension of the data science they’re talking about.”

Truly understanding that data dimension involves codifying it into two elements: that which pertains to science and analytics and that which pertains to the business. According to Reed, mastering these elements of data science results in a profound ability to “really understand the investment you have to make to monetize your output and, by the way, it’s the exact same investment you have to make for all these other data fronts that are pressurizing your organization. So, the smart thing to do is to make that investment once and maximize your return.”

The Problems of Data Science
The problems associated with data science are twofold, although their impacts on the enterprise are far from commensurate with each other. This bifurcation is based on the pair of roles which data scientists have to fundamentally fulfill for organizations. The first is to function as a savvy analytics expert (what Reed termed “the guy that’s got PhD level experience in the complexities of advanced machine learning, neural networks, vector machines, etc.”); the second is to do so as a business professional attuned to the needs of the multiple units who rely on both the data scientist and his or her analytics output to do their jobs better. This second dimension is pivotal and effectively represents the initial problem organizations encounter when attempting to leverage data science—translating that science into quantitative business value. “You want to couple the first role with the business so that the data scientist can apply his modeling expertise to real world business problems, leveraging the subject matter expertise of business partners so that what he comes up with can be monetized,” Reed explained. “That’s problem number one: the model can be fantastic but it has to be monetized.”

The second problem attending the work of data scientists has been well documented—a fact which has done little to alleviate it. The copious quantities of big data involved in such work require an inordinate amount of what Reed termed “data engineering”, the oftentimes manual preparation work of cleansing, disambiguating, and curating data for a suitable model prior to actually implementing data for any practical purpose. This problem is particularly poignant because, as Reed mentioned, it “isn’t pushed to the edge; it’s more pulled to the center of your universe. Every single data scientist that I’ve spoken to and certainly the ones that I have on my team consider data engineering from a data science perspective to be “data janitorial work”.

Data Governance and Data Quality Ramifications
The data engineering quandary is so disadvantageous to data science and the enterprise today for multiple reasons. Firstly, it’s work that these professionals (with their advanced degrees and analytics shrewdness) are typically over qualified for. Furthermore, the ongoing data preparation process is exorbitantly time-consuming, which keeps these professionals from spending more time working on solutions that help the business realize organizational objectives. The time consumption involved in this process becomes substantially aggravated when advanced analytics algorithms for machine learning or deep learning are involved, as Reed revealed. “I’ve had anything from young, fresh, super productive 26 year olds coming straight out of PhD programs telling me that they’re spending 80 percent of their time cleaning and connecting data so they can fit into their model. If you think about machine learning where you’ve got a massive learning set of data which is indicative of the past, especially if you’re talking about supervised [machine learning], then you need a whole bunch of connected test data to make sure your model is doing what’s expected. That’s an enormous amount of data engineering work, not modeling work.”

The necessity of data engineering is that it must take place—correctly—in order to get the desired results out of data models and any subsequent analytics or applications dependent on such data. However, it is just that necessity which frequently creates situations in which such engineering is done locally for the needs of specific data scientists using sources for specific applications, which are oftentimes not reusable or applicable for additional business units or use cases. The resulting data quality and governance complications can swiftly subtract any value attached to the use of such data.

According to Reed: “The data scientist is important, but unless you want the data scientist to be constantly distracted by cleaning data with all the negative connotations that drag with it, which is everyone’s going to use their own lingua franca, everyone’s going to use their own context, everyone’s going to clean data duplicitously, everyone’s going to clean data for themselves which, in reality, is just adding yet another entity for the enterprise to disambiguate when they look across stuff. Essentially, if you leave it to the data science they’re not turning the data into an asset. They’re turning the data into a by-product of their model.”

Recurring Centralization Wins
The solution to these data science woes (and the key to its practical implementation) is as binary as the two primary aspects of these problems are. Organizations should make the data engineering process a separate function from data science, freeing data scientists’ time to focus on the creation of data-centric answers to business problems. Additionally, they should ensure that their data engineering function is centralized to avoid any localized issues of data governance and data quality stemming from provincial data preparation efforts. There are many methods by which organizations can achieve these objectives, from deploying any variety of data preparation and self-service data preparation options, to utilizing the standards-based approach of semantic graph technologies which implement such preparation work upfront for a naturally evolving user experience. By separating the data engineering necessity as an individual responsibility in a centralized fashion, the enterprise is able to “push that responsibility hopefully towards the center of your organization where you can stitch data together, you can disambiguate it, [and] you can start treating it like an enterprise asset making it available to your data science teams which you can then associate with your businesses and pull to the edge of your organization so that they leverage clean signal many times over as consumers versus having to create clean signals parochially for themselves,” Reed remarked.

Source by jelaniharper