Aug 16, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Convincing  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> How Retailer Should Use QR Code To Hit The Pot Of Gold by v1shal

>> Why the time is ripe for security behaviour analytics by analyticsweekpick

>> “To Cloud or Not”: Practical Considerations for Disaster Recovery and High Availability in Public Clouds by jelaniharper

Wanna write? Click Here

[ NEWS BYTES]

>> Cyber security news round-up – Digital Health Under cyber security

>> Global Streaming Analytics Market Current and Future Industry Trends, 2016 – 2024 – Exclusive Reportage Under Streaming Analytics

>> UnitedHealth to Spread in Pharmacy Business With Genoa Deal – Zacks.com Under Health Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Learning from data: Machine learning course

image

This is an introductory course in machine learning (ML) that covers the basic theory, algorithms, and applications. ML is a key technology in Big Data, and in many financial, medical, commercial, and scientific applicati… more

[ FEATURED READ]

Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th Edition

image

The eagerly anticipated Fourth Edition of the title that pioneered the comparison of qualitative, quantitative, and mixed methods research design is here! For all three approaches, Creswell includes a preliminary conside… more

[ TIPS & TRICKS OF THE WEEK]

Data Have Meaning
We live in a Big Data world in which everything is quantified. While the emphasis of Big Data has been on its three defining characteristics (the infamous three Vs), we need to be cognizant of the fact that data have meaning. That is, the numbers in your data represent something of interest, an outcome that is important to your business. The meaning of those numbers is about the veracity of your data.

[ DATA SCIENCE Q&A]

Q: Do we always need the intercept term in a regression model?
A: * It guarantees that the residuals have a zero mean
* It guarantees that the least squares slope estimates are unbiased
* The regression line floats up and down (by adjusting the constant) to the point where the mean of the residuals is zero
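
A minimal sketch of the first and third points above, using NumPy on simulated data (the numbers are purely illustrative): with a column of ones in the design matrix the residuals average to zero, while dropping the intercept forces the slope to absorb the offset, biasing it and leaving a nonzero mean residual.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(5, 2, size=200)
y = 3.0 + 1.5 * x + rng.normal(0, 1, size=200)   # true intercept 3.0, slope 1.5

# With an intercept: the design matrix includes a column of ones
X_with = np.column_stack([np.ones_like(x), x])
beta_with, *_ = np.linalg.lstsq(X_with, y, rcond=None)
resid_with = y - X_with @ beta_with

# Without an intercept: the slope alone must absorb the offset
X_without = x.reshape(-1, 1)
beta_without, *_ = np.linalg.lstsq(X_without, y, rcond=None)
resid_without = y - X_without @ beta_without

print("mean residual, with intercept:   ", resid_with.mean())     # ~0 by construction
print("mean residual, without intercept:", resid_without.mean())  # generally nonzero
print("slope estimates:", beta_with[1], "vs", beta_without[0])     # the latter is biased here
```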

Source

[ VIDEO OF THE WEEK]

#GlobalBusiness at the speed of The #BigAnalytics


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

If we have data, let’s look at data. If all we have are opinions, let’s go with mine. – Jim Barksdale

[ PODCAST OF THE WEEK]

Scott Harrison (@SRHarrisonJD) on leading the learning organization #JobsOfFuture #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

100 terabytes of data uploaded daily to Facebook.

Sourced from: Analytics.CLUB #WEB Newsletter

“More Than Just Data”: The Impact of Bayesian Machine Learning on Predictive Modeling

Predictive analytics is rapidly reshaping the power of data-driven practices. The influence of its most accessible format, machine learning, encompasses everything from standard data preparation processes to the most far-flung aspects of cognitive computing. In all but the rarest instances, there is a surfeit of vendors and platforms provisioning self-service options for lay users.

Facilitating the overarching predictive models that function as the basis for predictive analytics results can involve many different approaches. According to Shapiro+Raj Senior Vice President of Strategy, Research and Analytics Dr. Lauren Tucker, two of those readily surpass the others.

“Some of the most advanced techniques are going to be Bayesian and machine learning, because those tools are the best ones to use in an environment of uncertainty,” Tucker disclosed.

In a world of swiftly changing business requirements and ever fickle customer bases, a look into the future can sometimes prove the best way to gain competitive advantage—particularly when it’s backed by both data and knowledge.

The Future from the Present, Not the Past
Tucker largely advocates Bayesian machine learning methods for predictive modeling because of their intrinsically forward-facing nature. These models involve an element of immediacy that far transcends that of “traditional advanced analytics tools that use the history to predict the future,” according to Tucker. The combination of Bayesian techniques within an overarching framework for machine learning models is able to replace a reliance on historic data with information that is firmly entrenched in the present. “Bayesian machine learning allows for much more forward-facing navigation of the future by incorporating other kinds of assumptions and elements to help predict the future based on certain types of assessments of what’s happening now,” Tucker said. Their efficacy becomes all the more pronounced in use cases in which there is a wide range of options for future events or factors that could impact them, such as the marketing or sales trends found in any number of verticals.

The Knowledge Base of Bayesian Models
Bayesian modeling techniques are knowledge-based, and require the input of a diversity of stakeholders to flesh out their models. Integrating this knowledge base with the data-centric approach of machine learning successfully amalgamates both stakeholder insight and undisputed data-driven facts. This situation enables the strengths of the former to fortify those of the latter, forming a gestalt that provides a reliable means of determining future outcomes. “The Bayesian approach to predictive modeling incorporates more than just data,” Tucker said. “It ties in people’s assessments of what might happen in the future, it blends in people’s expertise and knowledge and turns that into numerical values that, when combined with hardcore data, gives you a more accurate picture of what’s going to happen in the future.” Historically, the effectiveness of Bayesian modeling was hampered by limitations on computing power, memory, and other technical circumscriptions that have been overcome by modern advancements in those areas. Thus, in use cases in which organizations have to bank the enterprise on performance, the confluence of Bayesian and machine learning models utilizes more resources—both data-driven and otherwise—to predict outcomes.
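
As a rough illustration of how expert knowledge can be “turned into numerical values” and blended with data, the sketch below runs a conjugate Bayesian linear regression in which an informative prior encodes a team’s belief about marketing return. The scenario, figures, and variable names are invented for illustration; this is a generic textbook technique, not necessarily the specific method Tucker’s team uses.

```python
import numpy as np

# Toy scenario (hypothetical numbers): weekly ad spend (x, in $k) versus
# incremental sales (y, in $k). Only a handful of observations exist.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
X = np.column_stack([np.ones_like(x), x])   # intercept + slope
sigma2 = 0.5                                 # assumed noise variance

# "More than just data": stakeholder expertise enters as a prior, e.g. the
# team believes each $1k of spend returns roughly $1.5k, with some uncertainty.
m0 = np.array([0.0, 1.5])                    # prior mean for [intercept, slope]
S0 = np.diag([10.0, 0.25])                   # prior covariance (tight on the slope)

# Conjugate Bayesian linear-regression update: posterior over the weights.
S0_inv = np.linalg.inv(S0)
S_n = np.linalg.inv(S0_inv + X.T @ X / sigma2)
m_n = S_n @ (S0_inv @ m0 + X.T @ y / sigma2)

# Forward-looking prediction for a planned spend of $6k, with uncertainty
# that blends the prior knowledge and the observed data.
x_new = np.array([1.0, 6.0])
pred_mean = x_new @ m_n
pred_var = sigma2 + x_new @ S_n @ x_new
print(f"posterior slope: {m_n[1]:.2f}")
print(f"predicted lift at $6k spend: {pred_mean:.2f} +/- {np.sqrt(pred_var):.2f}")
```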

Transparent Data Culture
The Bayesian machine learning approach to predictive modeling offers a multitude of additional advantages, one of which is an added degree of transparency in data-centric processes. By merging human expertise into data-driven processes, organizations gain a degree of trust in the results without which they might never fully embrace the benefits of data culture. “The irony of course is that traditional models, because they didn’t incorporate knowledge or expertise, sort of drew predictive outcomes that just could not be assimilated into a larger organization because people were like, ‘I don’t believe that; it’s not intuitive to me,’” Tucker said. Bayesian machine learning models are able to emphasize a data-driven company culture by democratizing input into the models for predictive analytics. Instead of merely relying on an elite group of data scientists or IT professionals to determine the future results of processes, “stakeholders are able to incorporate their perspectives, their expertise, their understanding of the marketplace and the future of the marketplace into the modeling process,” Tucker mentioned.

High Stakes, Performance-Based Outcomes
The most convincing use cases for Bayesian machine learning are high-stakes situations in which the success of organizations depends on the outcomes of their predictive models. A good example of such a need for reliable predictive modeling is performance-based marketing and performance-based pay incentives, in which organizations are rewarded only for the results they are able to produce from their efforts. McDonald’s recently made headlines when it mandated performance-based rewards for its marketing entities. “I would argue that more and more clients, whether bricks and mortar or e-commerce, are going to start to have that expectation that their marketing partners and their ad agencies are required to have advanced analytics become more integrated into the core offering of the agency, and not be peripheral to it,” Tucker commented.

Upfront Preparation
The key to the Bayesian methodology for predictive modeling is a thorough assessment process in which organizations—both at the departmental and individual levels—obtain employee input for specific use cases. Again, the primary boon associated with this approach is both a more accurate model and increased involvement in reinforcing data culture. “I’m always amazed at how many people in an organization don’t have a full picture of their business situation,” Tucker revealed. “This forces that kind of conversation, and forces it to happen.” The ensuing assessments in such situations uncover almost as much business expertise in a particular domain as they do information to inform the underlying predictive models. The machine learning algorithms that influence the results of predictive analytics in such cases are equally valuable, and are designed to improve with time and experience.

Starting Small
The data that organizations want to use as the basis of Bayesian machine learning predictive models can include big data, traditional small data, or small data applications of big data. “One of the things that makes Bayesian approaches and machine learning approaches challenging is how to better and more efficiently get the kind of knowledge and some of the small data that we need—insights from that that we need—in order to do the bigger predictive modeling projects,” Tucker explained. Such information can include quantitative projections for ROI and revenues, as well as qualitative insights drawn from hard-earned experience. Bayesian machine learning models are able to utilize both the data and the experience of those who depend on that data to do their jobs, in such a way that ultimately both benefit. The outcome is predictive analytics that can truly inform the future of an organization’s success.

Originally Posted at: “More Than Just Data”: The Impact of Bayesian Machine Learning on Predictive Modeling

Aug 09, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Data Mining  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> April 3, 2017 Health and Biotech analytics news roundup by pstein

>> Firing Up Innovation in Big Enterprises by d3eksha

>> Sisense Boto Brings The Future To Your Fingertips by analyticsweek

Wanna write? Click Here

[ NEWS BYTES]

>> NEWSMAKER- UK watchdog warns financial firms over Big Data – Reuters Under Big Data

>> Cloud security threats and solutions – Cyprus Mail Under Cloud Security

>> This Discounted Course Will Introduce You to The World of Data Science – Daily Beast Under Data Science

More NEWS ? Click Here

[ FEATURED COURSE]

Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz

image

Use data to build a better startup faster in partnership with Geckoboard… more

[ FEATURED READ]

Superintelligence: Paths, Dangers, Strategies

image

The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. Other animals have stronger muscles or sharper claws, but … more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up to industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up to create awareness within the organization. An aware organization goes a long way in helping get quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q: Is it better to design robust or accurate algorithms?
A: A. The ultimate goal is to design systems with good generalization capacity, that is, systems that correctly identify patterns in data instances not seen before
B. The generalization performance of a learning system strongly depends on the complexity of the model assumed
C. If the model is too simple, the system can only capture the actual data regularities in a rough manner. In this case, the system has poor generalization properties and is said to suffer from underfitting
D. By contrast, when the model is too complex, the system can identify accidental patterns in the training data that need not be present in the test set. These spurious patterns can be the result of random fluctuations or of measurement errors during the data collection process. In this case, the generalization capacity of the learning system is also poor. The learning system is said to be affected by overfitting
E. Spurious patterns, which are only present by accident in the data, tend to have complex forms. This is the idea behind the principle of Occam’s razor for avoiding overfitting: simpler models are preferred if more complex models do not significantly improve the quality of the description for the observations
Quick response: Occam’s Razor. It depends on the learning task. Choose the right balance.
F. Ensemble learning can help balance bias and variance (several weak learners together = a strong learner)
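
A small numerical illustration of points C and D, assuming a simulated noisy sine wave: the low-degree fit underfits (high error everywhere), the high-degree fit overfits (low training error, high validation error), and a moderate degree balances the two.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, size=x.size)

# Hold out every third point as a validation set
val = np.arange(x.size) % 3 == 0
x_tr, y_tr, x_va, y_va = x[~val], y[~val], x[val], y[val]

for degree in (1, 3, 10):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    mse_va = np.mean((np.polyval(coeffs, x_va) - y_va) ** 2)
    print(f"degree {degree:2d}: train MSE {mse_tr:.3f}  validation MSE {mse_va:.3f}")

# degree 1 underfits (both errors high); degree 10 overfits (train error low,
# validation error high); degree 3 strikes the balance Occam's razor favors.
```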
Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Marketing Analytics


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

War is 90% information. – Napoleon Bonaparte

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Dr. Nipa Basu, @DnBUS


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Estimates suggest that by better integrating big data, healthcare could save as much as $300 billion a year — that’s equal to reducing costs by $1000 a year for every man, woman, and child.

Sourced from: Analytics.CLUB #WEB Newsletter

[Podcast] Radical Transparency: Building A Kinder, Gentler Financial Marketplace

[soundcloud url=”https://api.soundcloud.com/tracks/463116444″ params=”color=#ff5500&auto_play=false&hide_related=false&show_comments=false&show_user=true&show_reposts=false&show_teaser=true” width=”100%” height=”166″ iframe=”true” /]

Ah – financial markets: a place where buyers and sellers come together to nail each other to the wall.

Like a giant Thunderdome of capitalism, buyers buy because they think the value of a security is going to go up, and sellers sell because they think it’s going to go down. It’s “two people enter, one person leaves”… having got the better of the other.

At least, that’s the way it’s supposed to work. But, apparently, PolicyBazaar, an India-based online marketplace for all types of insurance (and our customer), did not get the memo.

No, our friends at PolicyBazaar seem to think that a marketplace for financial products is where buyers and sellers come together to have a giant tea party, where both parties hold hands, exchange goods for services, and magically find themselves enriched, and wiser, for having engaged in a transaction.

By leveraging analytics to create a more transparent marketplace, PolicyBazaar has delivered valuable insights to sellers, helping them enhance their offerings to provide better, more-competitive products that more people want to buy.

On the other side of the transaction, PolicyBazaar has leveraged analytics so that their service team can counsel buyers, helping ensure that they get the best policy for the best price.

And why do they do this? Are they working both sides of the street to get consulting fees from the sellers and tips for good service from the buyers?

No. They do not.

The reason is simpler: they make a set fee on what’s sold online. And, as it turns out, that’s a great model, because they sell a whole lot of insurance. In fact, PolicyBazaar has captured a 90 percent market share in insurance in India in just a few years.

In the end, the radical transparency embraced by PolicyBazaar ensures that buyers buy more and are more satisfied, and sellers have better products and sell more. And absolutely nobody gets nailed to the wall on the deal.

But what’s the fun in that?

In this episode of Radical Transparency, we talk to the architect behind PolicyBazaar, CTO Ashish Gupta. We also speak to Max Wolff, Chief Economist at The Phoenix Group, who talks through malevolent markets (used cars) and benevolent markets (launching smartphones) to explain where PolicyBazaar resides.

Wolff also discusses whether analytics-driven markets approach what has been only a theoretical construct – a perfect market outcome – and whether the PolicyBazaar model is original, sustainable, and ethical.

You can check out the current episode here.

About Radical Transparency

Radical Transparency shows how business decision-makers are using analytics to make unexpected discoveries that revolutionize companies, disrupt industries, and, sometimes, change the world. Radical Transparency combines storytelling with analysis, economics, and data science to highlight the big opportunities and the big mistakes uncovered by analytics in business.

Source: [Podcast] Radical Transparency: Building A Kinder, Gentler Financial Marketplace

The Bifurcation of Data Science: Making it Work

Today’s data scientists face expectations of an enormous magnitude. These professionals are singlehandedly tasked with mastering a variety of domains, from those relevant to various business units and their objectives to some of the more far-reaching principles of statistics, algorithms, and advanced analytics.

Moreover, organizations are increasingly turning to them to solve complicated business problems with an immediacy that justifies both their salaries and their scarcity. When one couples these factors with the mounting pressures facing the enterprise today in the form of increasing expenditures, operational complexity, and regulatory concerns, it appears this discipline as a whole is being significantly challenged to pave the way for successfully monetizing data-driven practices.

According to Cambridge Semantics advisor Carl Reed, who spent a substantial amount of time employing data-centric solutions at both Goldman Sachs and Credit Suisse, a number of these expectations are well-intentioned, but misplaced:

“Everyone is hitting the same problem with big data, and it starts with this perception and anticipation that the concept of turning data into business intelligence that can be a differentiator, a competitive weapon, is easy to do. But the majority of the people having that conversation don’t truly understand the data dimension of the data science they’re talking about.”

Truly understanding that data dimension involves codifying it into two elements: that which pertains to science and analytics and that which pertains to the business. According to Reed, mastering these elements of data science results in a profound ability to “really understand the investment you have to make to monetize your output and, by the way, it’s the exact same investment you have to make for all these other data fronts that are pressurizing your organization. So, the smart thing to do is to make that investment once and maximize your return.”

The Problems of Data Science
The problems associated with data science are twofold, although their impacts on the enterprise are far from commensurate with each other. This bifurcation is based on the pair of roles which data scientists have to fundamentally fulfill for organizations. The first is to function as a savvy analytics expert (what Reed termed “the guy that’s got PhD level experience in the complexities of advanced machine learning, neural networks, vector machines, etc.”); the second is to do so as a business professional attuned to the needs of the multiple units who rely on both the data scientist and his or her analytics output to do their jobs better. This second dimension is pivotal and effectively represents the initial problem organizations encounter when attempting to leverage data science—translating that science into quantitative business value. “You want to couple the first role with the business so that the data scientist can apply his modeling expertise to real world business problems, leveraging the subject matter expertise of business partners so that what he comes up with can be monetized,” Reed explained. “That’s problem number one: the model can be fantastic but it has to be monetized.”

The second problem attending the work of data scientists has been well documented—a fact which has done little to alleviate it. The copious quantities of big data involved in such work require an inordinate amount of what Reed termed “data engineering”, the oftentimes manual preparation work of cleansing, disambiguating, and curating data for a suitable model prior to actually implementing data for any practical purpose. This problem is particularly poignant because, as Reed mentioned, it “isn’t pushed to the edge; it’s more pulled to the center of your universe. Every single data scientist that I’ve spoken to and certainly the ones that I have on my team consider data engineering from a data science perspective to be ‘data janitorial work.’”

Data Governance and Data Quality Ramifications
The data engineering quandary is disadvantageous to data science and the enterprise today for multiple reasons. Firstly, it’s work that these professionals (with their advanced degrees and analytics shrewdness) are typically overqualified for. Furthermore, the ongoing data preparation process is exorbitantly time-consuming, which keeps these professionals from spending more time working on solutions that help the business realize organizational objectives. The time consumption involved in this process becomes substantially aggravated when advanced analytics algorithms for machine learning or deep learning are involved, as Reed revealed. “I’ve had anything from young, fresh, super productive 26-year-olds coming straight out of PhD programs telling me that they’re spending 80 percent of their time cleaning and connecting data so they can fit into their model. If you think about machine learning where you’ve got a massive learning set of data which is indicative of the past, especially if you’re talking about supervised [machine learning], then you need a whole bunch of connected test data to make sure your model is doing what’s expected. That’s an enormous amount of data engineering work, not modeling work.”

The necessity of data engineering is that it must take place—correctly—in order to get the desired results out of data models and any subsequent analytics or applications dependent on such data. However, it is just that necessity which frequently creates situations in which such engineering is done locally for the needs of specific data scientists using sources for specific applications, which are oftentimes not reusable or applicable for additional business units or use cases. The resulting data quality and governance complications can swiftly subtract any value attached to the use of such data.

According to Reed: “The data scientist is important, but unless you want the data scientist to be constantly distracted by cleaning data with all the negative connotations that drag with it, which is everyone’s going to use their own lingua franca, everyone’s going to use their own context, everyone’s going to clean data duplicitously, everyone’s going to clean data for themselves which, in reality, is just adding yet another entity for the enterprise to disambiguate when they look across stuff. Essentially, if you leave it to the data science they’re not turning the data into an asset. They’re turning the data into a by-product of their model.”

Recurring Centralization Wins
The solution to these data science woes (and the key to its practical implementation) is as binary as the two primary aspects of these problems are. Organizations should make the data engineering process a separate function from data science, freeing data scientists’ time to focus on the creation of data-centric answers to business problems. Additionally, they should ensure that their data engineering function is centralized to avoid any localized issues of data governance and data quality stemming from provincial data preparation efforts. There are many methods by which organizations can achieve these objectives, from deploying any variety of data preparation and self-service data preparation options, to utilizing the standards-based approach of semantic graph technologies which implement such preparation work upfront for a naturally evolving user experience. By separating the data engineering necessity as an individual responsibility in a centralized fashion, the enterprise is able to “push that responsibility hopefully towards the center of your organization where you can stitch data together, you can disambiguate it, [and] you can start treating it like an enterprise asset making it available to your data science teams which you can then associate with your businesses and pull to the edge of your organization so that they leverage clean signal many times over as consumers versus having to create clean signals parochially for themselves,” Reed remarked.
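
As a toy sketch of the “clean once, reuse many times” idea described above (the table names, columns, and cleaning rules are hypothetical), a single centrally owned preparation step can publish a disambiguated customer table that every data science team consumes, instead of each team re-cleaning the raw feed for itself:

```python
import pandas as pd

def build_clean_customers(raw: pd.DataFrame) -> pd.DataFrame:
    """Centralized data engineering: one place where identities are typed,
    names are standardized, and duplicates are resolved."""
    clean = raw.copy()
    clean["customer_id"] = clean["customer_id"].astype("int64")
    clean["name"] = clean["name"].str.strip().str.title()
    clean["country"] = clean["country"].str.upper().replace({"UK": "GB"})
    return clean.drop_duplicates(subset="customer_id")

# Hypothetical raw feed with the usual inconsistencies
raw = pd.DataFrame({
    "customer_id": ["1", "1", "2"],
    "name": [" acme corp ", "Acme Corp", "globex"],
    "country": ["UK", "GB", "US"],
})

customers = build_clean_customers(raw)   # published as a shared enterprise asset
# Modeling teams consume `customers` directly rather than re-engineering `raw`.
print(customers)
```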

Source by jelaniharper

Aug 02, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Data Mining  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ NEWS BYTES]

>> Business Analytics & Enterprise Software Market Business Attractiveness 2018 to 2021 – Expert Consulting Under Business Analytics

>> Set up Hyper-V nested virtualization for production – TechTarget Under Virtualization

>> Virtualization Is Kicking Juniper in the Berries – Light Reading Under Virtualization

More NEWS ? Click Here

[ FEATURED COURSE]

Baseball Data Wrangling with Vagrant, R, and Retrosheet

image

Analytics with the Chadwick tools, dplyr, and ggplot…. more

[ FEATURED READ]

The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t

image

People love statistics. Statistics, however, do not always love them back. The Signal and the Noise, Nate Silver’s brilliant and elegant tour of the modern science-slash-art of forecasting, shows what happens when Big Da… more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today a data-driven leader, data scientist, or data-driven expert is always put to the test by helping their team solve a problem using their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which introduces a bias into our judgment and can taint the suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and can find ourselves caught in biases that impair our judgment. So, it is important that we keep the intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q: Explain the difference between “long” and “wide” format data. Why would you use one or the other?
A: * Long: one column contains the values and another column identifies the context of each value (e.g. columns fam_id, year, fam_inc)

* Wide: each variable is in a separate column (e.g. columns fam_id, fam_inc96, fam_inc97, fam_inc98)

Long vs. wide:
– Data manipulations such as summarizing and filtering are typically easier when the data is in the long format
– Program requirements may dictate one format or the other
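
A brief pandas sketch of moving between the two layouts, using the hypothetical family-income columns from the answer above:

```python
import pandas as pd

# Wide format: one row per family, one income column per year (made-up data)
wide = pd.DataFrame({
    "fam_id": [1, 2],
    "fam_inc96": [40000, 52000],
    "fam_inc97": [41000, 53500],
    "fam_inc98": [42500, 55000],
})

# Wide -> long: one row per (family, year) observation
long = wide.melt(id_vars="fam_id", var_name="year", value_name="fam_inc")
long["year"] = long["year"].str.replace("fam_inc", "19", regex=False)

# Long -> wide again
back_to_wide = long.pivot(index="fam_id", columns="year", values="fam_inc")

print(long)
print(back_to_wide)
```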

Source

[ VIDEO OF THE WEEK]

Advanced #Analytics in #Hadoop


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Data beats emotions. – Sean Rad, founder of Ad.ly

[ PODCAST OF THE WEEK]

Scott Harrison (@SRHarrisonJD) on leading the learning organization #JobsOfFuture #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

39 percent of marketers say that their data is collected ‘too infrequently or not real-time enough.’

Sourced from: Analytics.CLUB #WEB Newsletter

Better Business Intelligence Boons with Data Virtualization

Data virtualization’s considerable influence throughout the data sphere is inextricably linked to its pervasive utility. It is at once the forefather of the current attention surrounding data preparation, the precursor to the data lake phenomenon, and one of the principal enablers of the abundant volumes and varieties of big data in its raw form.

Nevertheless, its enduring reputation as a means of enhancing business intelligence and analytics continues to persist, and is possibly the most compelling, cost-saving, and time-saving application of its widespread utility.

“It brings together related pieces of information sitting in different data stores, and then provides that to an analytical platform for them to view or make some analysis for reporting,” explained Ravi Shankar, Denodo Chief Marketing Officer. “Or it could be used to accomplish some use case such as a customer service rep that is looking at all the customer information, all the products they have bought, and how many affect them.”

Better BI
Data virtualization technologies are responsible for providing an abstraction layer of multiple sources, structures, types, and locations of data, which is accessible in a singular platform for any purpose. The data themselves do not actually move, but users can access them from the same place. The foundation of virtualization rests upon its ability to integrate data for a single, or even multiple, applications. Although there is no shortage of use cases for such singular, seamless integration—which, in modern or what Shankar calls “third generation” platforms, involves a uniform semantic layer to rectify disparities in modeling and meaning—the benefits to BI users are among the most immediately discernible.
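
As a highly simplified, hypothetical sketch of the idea (not any vendor’s actual API): a virtual layer keeps each source where it is, queries the sources only on demand, and joins the results at access time, so a consumer sees one customer view without any data being copied or moved.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class VirtualView:
    """Toy abstraction layer: sources stay in place and are queried on demand."""
    sources: Dict[str, Callable[[], List[dict]]]  # source name -> fetch function

    def customer_360(self, customer_id: int) -> dict:
        # Federate two sources at access time; nothing is replicated up front.
        crm = next(r for r in self.sources["crm"]() if r["id"] == customer_id)
        orders = [o for o in self.sources["orders"]() if o["customer_id"] == customer_id]
        return {**crm, "orders": orders}

# Hypothetical fetchers standing in for real connectors (databases, REST APIs, files)
crm_fetch = lambda: [{"id": 1, "name": "Acme Corp", "segment": "enterprise"}]
orders_fetch = lambda: [{"customer_id": 1, "product": "policy-A", "amount": 1200}]

view = VirtualView(sources={"crm": crm_fetch, "orders": orders_fetch})
print(view.customer_360(1))
```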

Typical BI processing of various data from different stores and locations requires mapping of each store to an external one, and replicating data (which is usually regarded as a bottleneck) to that destination. The transformation for ETL involves the generation of copious amounts of code which is further slowed by a lengthy testing period. “This could take up to three months for an organization to do this work, especially for the volume of big data involved in some projects,” Shankar noted.

Virtualization’s Business Model
By working with what Shankar termed a “business model” associated with data virtualization, users simply drag and drop the requisite data into a user interface before publishing it to conventional BI or analytics tools. There is much less code involved with the process, which translates into decreased time to insight and less dedication of manpower. “What takes three months with a data integration ETL tool might take a week with data virtualization,” Shankar mentioned. “Usually the comparison is between 1 to 4 and 1 to 6.” That same ratio applies to IT people working on the ETL process versus a solitary data analyst leveraging virtualization technologies. “So you save on the cost and you save on the time with data virtualization,” Shankar said.

Beyond Prep, Before Data Lakes
Those cost and time reductions are worthy of examination in multiple ways. Not only do they apply to the data preparation process of integrating and transforming data, but they also benefit the front offices, which can leverage data more quickly and cheaply than standard BI paradigms allow. Since data are both stored and accessed in their native forms, data virtualization technologies represent the first rudimentary attempts at the data lake concept, in which all data are stored in their native formats. This sort of aggregation is ideal for use with big data and its plentiful forms and structures, enabling organizations to analyze it with traditional BI tools. Still, such integration becomes more troublesome than valuable without common semantics. “The source systems, which are databases, have a very technical way of defining semantics,” Shankar mentioned. “But for the business user, you need to have a semantic layer.”

Before BI: Semantics and Preparation
A critical aspect of the integration that data virtualization provides is its designation of a unified semantic layer, which Shankar states is essential to “the transformation from a technical to a business level” of understanding the data’s significance. Semantic consistency is invaluable to ensuring successful integration and standardized meaning of terms and definitions. Traditional BI mechanisms require ETL tools and separate measures for data quality to unify such semantics. However, this pivotal step is frequently complicated in some of the leading BI platforms on the market, which account for semantics in multiple layers.

This complexity is amplified by the implementation of multiple tools and use cases across the enterprise. Virtualization platforms address this requisite by provisioning a central location for common semantics that are applicable to the plethora of uses and platforms that organizations have. “What customers are doing now is centralizing their semantic layer and definitions within the data virtualization layer itself,” Shankar remarked. “So they don’t have to duplicate that within any of the tools.”

Governance and Security
The lack of governance and security that can conceptually hamper data lakes—turning them into proverbial data swamps—does not exist with data virtualization platforms. There are multiple ways in which these technologies account for governance. Firstly, they enable the same sort of access controls from the source system at the virtualization layer. “If I’m going to Salesforce.com within my own company, I can see the sales opportunities but someone else in marketing who’s below me cannot see those sales opportunities,” Shankar said. “They can see what else there is of the marketing. If you have that level of security already set up, then the data virtualization will be able to block you from being able to see that information.”

This security measure is augmented (or possibly even superseded) by leveraging virtualization as a form of single sign-on, whereby users no longer access an application directly but instead go through the virtualization layer first. In this case, “the data virtualization layer becomes the layer where we will do the authorization and authentication for all of the source systems,” Shankar said. “That way all the security policies are governed in one central place and you don’t have to program them for each of the separate applications.”

Beyond BI
The benefits that data virtualization produces easily extend beyond business intelligence. Still, the more efficient and expedient analytical insight virtualization technologies beget can revamp almost any BI deployment. Furthermore, it does so in a manner that reinforces security and governance, while helping to further the overarching self-service movement within the data sphere. With both cloud and on-premise options available, it helps to significantly simplify integration and many of the eminent facets of data preparation that make analytics possible.

Source

Jul 26, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
statistical anomaly  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Build a Mobile Gaming Events Data Pipeline with Databricks Delta by analyticsweek

>> Chatters, silences, and signs: Google has launched two major updates in the past one month by thomassujain

>> How Data Science Is Fueling Social Entrepreneurship by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>> Microsoft Sinks Subsea Data Center off Scotland – Light Reading Under Data Center

>> Summit shock fades; Samsonite struggles; Europe’s big data day – CNNMoney Under Big Data

>> At events round the world, operators question short term virtualization case – Rethink Research Under Virtualization

More NEWS ? Click Here

[ FEATURED COURSE]

A Course in Machine Learning

image

Machine learning is the study of algorithms that learn from data and experience. It is applied in a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need… more

[ FEATURED READ]

Storytelling with Data: A Data Visualization Guide for Business Professionals

image

Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You’ll discover the power of storytelling and the way to make data a pivotal point in your story. Th… more

[ TIPS & TRICKS OF THE WEEK]

Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle data can lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric. This is the metric that matters the most to your startup. Some advantages of TUM: it answers the most important business question, it cleans up your goals, it inspires innovation, and it helps you understand the entire quantified business.

[ DATA SCIENCE Q&A]

Q: Provide examples of machine-to-machine communications.
A: Telemedicine
– Heart patients wear specialized monitors that gather information about the state of the heart
– The collected data are sent to an implanted electronic device, which delivers electric shocks to the patient to correct abnormal rhythms

Product restocking
– Vending machines are capable of messaging the distributor whenever an item is running out of stock

Source

[ VIDEO OF THE WEEK]

Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

You can have data without information, but you cannot have information without data. – Daniel Keys Moran

[ PODCAST OF THE WEEK]

@DrewConway on creating socially responsible data science practice #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

In that same survey, by a small but noticeable margin, executives at small companies (fewer than 1,000 employees) are nearly 10 percent more likely to view data as a strategic differentiator than their counterparts at large enterprises.

Sourced from: Analytics.CLUB #WEB Newsletter

An Agile Approach to Big Data Management: Intelligent Workload Routing Automation

Monolithic approaches to data management, and the management of big data in particular, are no longer sufficient. Organizations have entirely too many storage and computing environments to account for, including bare metal on-premise servers, virtualization options, the cloud, and any combination of hybrid implementations.

Capitalizing on this distributed data landscape requires the ability to seamlessly shift resources between hosts, machines, and settings in response to real-time factors affecting workload optimization. Whether spurred by instances of failure, maintenance, surges, or fleeting pricing models (between cloud providers, for example), contemporary enterprises must react nimbly enough to take advantage of their infrastructural and architectural complexity.

Furthermore, regulatory mandates such as the EU General Data Protection Regulation necessitate the dynamic positioning of workflows in accordance with critical data governance protocols. A careful synthesis of well-planned governance policy, Artificial Intelligence techniques, and instance-level high availability execution can create the agility of a global data fabric in which “if [organizations] can get smart and let the automation take place, then everything’s done the way they want it done, all the time, right,” DH2i CEO Don Boxley said.

Smart Availability
The automation of intelligent workload routing is predicated on mitigating downtime and maximizing performance. Implicit in these goals is the notion of what Boxley referred to as Smart Availability, in which “you’re always able to pick the best execution venue for your workloads.” Thus, the concept of high availability, which relies on techniques such as clustering, failovers, and other redundancy measures to ensure availability, is enhanced by dynamically (and even proactively) moving workloads to intelligently improve performance. Use cases for doing so are innumerable, but are particularly acute in online transaction processing in verticals such as finance or insurance. Whether processing insurance claims or handling bank teller transactions, “those situations are very sensitive to performance,” Boxley explained. “So, the ability to move a workload when it’s under stress to balance out that performance is a big deal.” Of equal value is the ability to move workloads between settings in the cloud, which can encompass provisioning workloads to “span clouds”, as Boxley mentioned, or even between them. “The idea of using the cloud for burst performance also becomes an option, assuming all the data governance issues are aligned,” Boxley added.

Data Governance
The flexibility of automating intelligent workloads is only limited by data governance policy, which is a crucial piece of the success of dynamically shifting workload environments. Governance mandates are especially important for data hosted in the cloud, as there are strict regulations about where certain information (pertaining to industry, location, etc.) is stored. Organizations must also contend with governance protocols about who can view or access data, while also taking care to protect sensitive and personally identifiable information. In fact, the foresight required for comprehensive policies about where data and their workloads are stored and enacted is one of the fundamental aspects of the intelligence involved in routing them. “That’s what the key is: acting the right way every time,” Boxley observed. “The burden is on organizations to ensure the policies they write are an accurate reflection of what they want to take place.” Implementing proper governance policies about where data are is vitally important when automating their routing, whether for downtime or upsurges. “One or two workloads, okay I can manage that,” Boxley said. “If I’ve got 100, 500 workloads, that becomes difficult. It’s better to get smart, write those policies, and let the automation take place.”

Artificial Intelligence Automation
Once the proper policies are formulated, workloads are automatically routed in accordance with them. Depending on the organization, use case, and the particular workload, that automation is tempered with human-based decision support. According to Boxley, certain facets of the underlying automation are underpinned by “AI built into the solution which is constantly monitoring, getting information from all the hosts of all the workloads under management.” AI algorithms are deployed to detect substantial changes in workloads attributed to either failure or spikes. In either of these events (or any other event that an organization specifies), alerts are generated to a third-party monitor tasked with overseeing designated workflows. Depending on the specifics of the policy, that monitor will either take the previously delineated action or issue alerts to the organization, which can then choose from a variety of contingencies. “A classic use case is if a particular process isn’t getting a certain level of CPU power; then you look for another CPU to meet that need,” Boxley said. The aforementioned governance policies will identify which alternative CPUs to use.
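
A minimal, hypothetical sketch of policy-driven routing along the lines described above (host names, thresholds, and the telemetry feed are invented; this is not DH2i’s implementation): a workload stays put while its host satisfies the policy, moves to the best allowed alternative when it does not, and otherwise falls back to alerting an operator.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Policy:
    """Governance policy for one workload: where it may run (e.g. to satisfy
    data-residency rules) and how much CPU headroom it needs."""
    workload: str
    allowed_hosts: List[str]
    min_free_cpu: float  # required free fraction of a core

def route(policy: Policy, current_host: str, telemetry: Dict[str, float]) -> str:
    """Return the host the workload should run on under the policy."""
    if telemetry.get(current_host, 0.0) >= policy.min_free_cpu:
        return current_host                                  # healthy: leave it alone
    candidates = [h for h in policy.allowed_hosts if h != current_host]
    best = max(candidates, key=lambda h: telemetry.get(h, 0.0), default=current_host)
    if telemetry.get(best, 0.0) >= policy.min_free_cpu:
        return best                                          # automated re-route
    return current_host                                      # nothing better: alert a human

# Stand-in for a live telemetry feed (free CPU fraction per managed host).
telemetry = {"onprem-a": 0.05, "onprem-b": 0.60, "cloud-eu-1": 0.85}
policy = Policy("claims-oltp", ["onprem-a", "onprem-b", "cloud-eu-1"], 0.40)
print(route(policy, "onprem-a", telemetry))  # -> "cloud-eu-1"
```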

Data Fabric
The governance requisites for shifting workloads (and their data) between environments are a pivotal aspect of holistically stitching together what’s termed a comprehensive data fabric of the enterprise’s computational and data resources. “In terms of the technology part, we’re getting there,” Boxley acknowledged. “The bigger challenge is the governance part.” With regulations such as GDPR propelling the issue of governance to the forefront of data-driven organizations, the need to solidify policies prior to moving data around is clear.

Equally unequivocal, however, is the need to intelligently shift those resources around an organization’s entire computing fabric to avail itself of the advantages found in different environments. As Boxley indicated, “The environments are going to continue to get more complex.” The degree of complexity implicit in varying environments such as Windows or Linux, in addition to computational settings ranging from the cloud to on-premise, virtualization options to physical servers, seemingly supports the necessity of managing these variations in a uniform manner. The ability to transfer workloads and data between these diverse settings with machine intelligent methods in alignment with predefined governance policies is an ideal way of reducing that complexity so it becomes much more manageable—and agile.

Originally Posted at: An Agile Approach to Big Data Management: Intelligent Workload Routing Automation