Jul 12, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Big Data knows everything  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Apr 27, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> Free Research Report on the State of Patient Experience in US Hospitals by bobehayes

>> Jun 01, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

Wanna write? Click Here

[ NEWS BYTES]

>> Bill Promoting Behavioral Health EHR Incentives Passes House – EHRIntelligence.com Under Health Analytics

>> India sees 77% growth in HR analytics professionals in 5 years, says report – Business Standard Under Talent Analytics

>> Utilizing Healthcare Data Security, Cloud for a Stronger Environment – HealthITSecurity.com Under Cloud Security

More NEWS ? Click Here

[ FEATURED COURSE]

Tackle Real Data Challenges

Learn scalable data management, evaluate big data technologies, and design effective visualizations…. more

[ FEATURED READ]

Antifragile: Things That Gain from Disorder

Antifragile is a standalone book in Nassim Nicholas Taleb’s landmark Incerto series, an investigation of opacity, luck, uncertainty, probability, human error, risk, and decision-making in a world we don’t understand. The… more

[ TIPS & TRICKS OF THE WEEK]

Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle that data can lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric: the metric that matters the most to your startup. Some advantages of TUM: it answers the most important business question, it cleans up your goals, it inspires innovation, and it helps you understand the entire quantified business.

[ DATA SCIENCE Q&A]

Q:Explain what a long-tailed distribution is and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?
A: * In long tailed distributions, a high frequency population is followed by a low frequency population, which gradually tails off asymptotically
* Rule of thumb: the majority of occurrences (more than half, and when the Pareto principle applies, 80%) are accounted for by the first 20% of items in the distribution
* The least frequently occurring 80% of items, taken together, still account for a substantial proportion of the total population (the long tail)
* Zipf’s law, Pareto distribution, power laws

Examples:
1) Natural language
– Given some corpus of natural language, the frequency of any word is inversely proportional to its rank in the frequency table
– The most frequent word occurs roughly twice as often as the second most frequent, three times as often as the third most frequent, and so on
– ‘The’ accounts for about 7% of all word occurrences (70,000 per million words)
– ‘of’ accounts for about 3.5%, followed by ‘and’…
– Only 135 vocabulary items are needed to account for half the English corpus!

2) Allocation of wealth among individuals: a large portion of any society’s wealth is controlled by a small percentage of its people

3) File size distribution of Internet traffic

Additional: Hard disk error rates, values of oil reserves in a field (a few large fields, many small ones), sizes of sand particles, sizes of meteorites

Importance in classification and regression problems:
– Skewed distribution
– Which metrics to use? Accuracy paradox (classification), F-score, AUC
– Issue when using models that assume linearity (e.g. linear regression): a monotone transformation of the data may be needed (logarithm, square root, sigmoid function…)
– Issue when sampling: your data becomes even more unbalanced! Use stratified sampling instead of simple random sampling, SMOTE (“Synthetic Minority Over-sampling Technique”, N.V. Chawla) or an anomaly-detection approach
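
A minimal sketch in Python of the points above (not from the original Q&A; the use of numpy and scikit-learn is an assumption): it draws a Zipf-distributed sample, checks how much of the mass sits in the head of the distribution, and shows a log transform plus a stratified split as mitigations.

# Illustrative sketch only: long-tailed data and two common mitigations.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Word-frequency-like data drawn from a Zipf (power-law) distribution.
samples = rng.zipf(a=2.0, size=100_000)
_, counts = np.unique(samples, return_counts=True)
counts_desc = np.sort(counts)[::-1]

# Share of all occurrences accounted for by the most frequent 20% of items.
top_20pct = max(1, int(0.2 * len(counts_desc)))
head_share = counts_desc[:top_20pct].sum() / counts_desc.sum()
print(f"Top 20% of items account for {head_share:.1%} of occurrences")

# Mitigation 1: a monotone transform (log) compresses the heavy right tail
# before the variable is fed to a linear model.
log_samples = np.log1p(samples)
print("max/median before log:", samples.max() / np.median(samples))
print("max/median after log: ", log_samples.max() / np.median(log_samples))

# Mitigation 2: stratified sampling keeps the rare "tail" class represented
# when splitting an imbalanced classification dataset.
y = (samples >= 10).astype(int)              # rare class drawn from the tail
X = samples.reshape(-1, 1).astype(float)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
print("tail-class rate in train vs. test:", y_tr.mean(), y_te.mean())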

Source

[ VIDEO OF THE WEEK]

Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

You can use all the quantitative data you can get, but you still have to distrust it and use your own intelligence and judgment. – Alvin Toffler

[ PODCAST OF THE WEEK]

@ChuckRehberg / @TrigentSoftware on Translating Technology to Solve Business Problems #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

40% projected growth in global data generated per year vs. 5% growth in global IT spending.

Sourced from: Analytics.CLUB #WEB Newsletter

The Meaning of Scale Values for Likelihood to Recommend Ratings

Customer experience management professionals use self-reported “likelihood” questions to measure customer loyalty. In their basic form, customer loyalty questions ask customers to rate their likelihood of engaging in different types of loyalty behaviors toward the company, including behaviors around retention (likelihood to leave), advocacy (likelihood to recommend) and purchasing (likelihood to buy different products). These loyalty questions typically employ a “0 to 10” rating scale that includes verbal anchors at each end of the scale to indicate low or high likelihood (e.g., 0 = not at all likely to 10 = extremely likely; 0 = least likely to 10 = most likely).

While it is generally accepted that higher likelihood ratings represent higher levels of loyalty, we are left with some ambiguity about what defines “good” likelihood ratings. How much better is a score of 8 than 7? Is there a meaningful difference between a rating of 7 and 6? I explored the meaning of likelihood ratings in the present analysis.

The Data

Data are from a customer survey of a large automobile dealership located on the east coast. The current analysis is based on 30,236 respondents over a 4-year period (2010 thru Nov 2013) across 20 different locations. The survey asked customers six questions, including two similar loyalty questions:

  1. How likely is it that you would recommend your sales person to a friend or colleague? Rating options were on a rating scale from 0 (Least Likely) to 10 (Most Likely).
  2. Would you recommend your sales associate to a friend, relative or neighbor? Response options were either “Yes” or “No.”
Figure 1. Meaning of Scale Values for Likelihood to Recommend Ratings

Results

Figure 1 illustrates the relationship between these two loyalty questions, showing the percent of “Yes” responses to the second loyalty question across each of the 11 response options of the first. Generally, the percent of customers who say they would recommend the company drops as the likelihood to recommend rating drops. We see a few interesting things when we look a little closer at this figure:

  1. There is no major difference between ratings of 8, 9 and 10 with respect to the percent of customers who say they would recommend. For each of the top likelihood scale values, nearly all customers (over 99%) say they would recommend.
  2. Likelihood ratings of 0 through 6 represent negative loyalty responses; for each of these likelihood values, most customers would not recommend.
  3. There is a substantial increase in loyalty from a likelihood rating of 6 (38% say “Yes”) to 7 (80% say “Yes”).

Implications

There are a few implications of the current analysis. First, companies use customer segmentation analysis to create homogeneous groups of customers so they can target each group with specific marketing, sales and service initiatives to improve or maintain overall customer loyalty. The developers of the Net Promoter Score segment customers into three groups based on their likelihood rating: Promoters (ratings of 9 or 10), Passives (ratings of 7 or 8) and Detractors (ratings of 0 through 6). The results of the present analysis, however, suggest a slightly different segmentation. They show that customers who give a likelihood rating of 8 are more similar to customers who give a rating of 9 or 10 (Promoters) than they are to customers who give a rating of 7 (Passives).

Second, the segmentation of customers based on the NPS trichotomy seems arbitrary. The current results support a slightly different segmentation of customers into different, more homogeneous groups. These three new customer segments are:

  1. Strong Advocates: Customers who give a likelihood rating of 8, 9 or 10. Over 99% of Strong Advocates say they would recommend.
  2. Advocates: Customers who give a likelihood rating of 7. About 80% of Advocates say they would recommend.
  3. Non-Advocates: Customers who give a likelihood rating of 0 through 6. On average, 84% of Non-Advocates say they would not recommend. These customers are labeled Detractors in NPS lingo; relabeling this group as “Non-Advocates” more honestly reflects what is being measured by the “likelihood to recommend” question. See the Word-of-Mouth Index, which more fully explores this notion.

Finally, it appears that all Detractors are not created equal. While ratings of 0 through 6 do represent low loyalty, a likelihood rating of 6 is a little more positive than the lower likelihood ratings. Improvement initiatives are typically directed broadly at customers who report low levels of loyalty, those giving a likelihood rating of 6 or less. However, it might be beneficial to further segment these at-risk customers and target improvement initiatives at those giving a rating of 6. Recall that a likelihood rating of 7 results in a substantial increase in recommending. Moving customers from a likelihood rating of 0 to 1, or 1 to 2, does very little to increase recommendations. Moving customers from a likelihood rating of 6 to 7, however, results in a substantial increase in recommending.
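
As a rough illustration of the two segmentation schemes discussed above, the following Python sketch (not the author’s code; the column names ltr and recommend_yes and the tiny inline data set are hypothetical) computes the percent of “Yes” responses by rating and by segment.

# Illustrative sketch only: comparing the NPS segmentation with the
# Advocate segmentation proposed in this article (hypothetical data).
import pandas as pd

# Assumed columns: 'ltr' = 0-10 likelihood rating,
# 'recommend_yes' = 1 for "Yes", 0 for "No" on the binary question.
df = pd.DataFrame({
    "ltr":           [10, 9, 8, 7, 6, 3, 0, 8, 7, 5],
    "recommend_yes": [1,  1, 1, 1, 0, 0, 0, 1, 1, 0],
})

# Percent "Yes" for each of the 11 scale values (the Figure 1 view).
pct_yes_by_rating = df.groupby("ltr")["recommend_yes"].mean().mul(100)
print(pct_yes_by_rating)

def nps_segment(r):
    return "Promoter" if r >= 9 else "Passive" if r >= 7 else "Detractor"

def advocate_segment(r):
    if r >= 8:
        return "Strong Advocate"
    return "Advocate" if r == 7 else "Non-Advocate"

for name, fn in [("NPS", nps_segment), ("Advocacy", advocate_segment)]:
    seg = df["ltr"].map(fn)
    print(name, "segments (% Yes):")
    print(df.groupby(seg)["recommend_yes"].mean().mul(100).round(1))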

Source by bobehayes

10 Misconceptions about Big Data

A lot of content is thrown around about Big Data and how it could change the market landscape. It is running its hype cycle, and many bloggers, experts, consultants and professionals are lining up to align themselves with big data. Not everything claimed in this industry is accurate, though, and some of the misconceptions that show up here and there are discussed below.

  1. Big Data has the answer to everything: This one floats through many coffee-table conversations. Big data can help, but it is not a magic wand that will find the answer to everything. It can certainly help you answer some very cryptic questions, but it is not for everything. So, fine-tune your expectations of what a big data strategy will deliver.
  2. Data Scientists drive Big Data: Every now and then we stumble upon someone who claims to be a data scientist and boasts about how they are driving big data in their company. They are certainly doing the important work of finding insights in data, but big data is already happening whether data scientists drive it or not. The big data journey begins with capturing as much data as possible; data scientists help steer the insights drawn from that data. So, don’t wait for a data scientist before you start preparing for big data.
  3. Big Data is complicated: With the escalating pay cheques of data scientists, it is not hard to see why big data is perceived as rocket science that only a few can tame and understand. This is a perception most businesses have to deal with. Big data is just more volume, velocity and variety of data, and it need not be complicated. In fact, a well-designed big data system is often scalable, simple and fast. So, breathe easy if you find your big data nicely laid out and easy to understand.
  4. The more data the better: There is a debate about how much data is effective and whether more is always better. There are certainly two schools of thought, one suggesting that the more data you have, the more you can learn from it. But the effectiveness of data has more to do with its quality than its quantity. So, depending on the circumstances, quality, and only in some cases quantity, determines the impact you get from data.
  5. Big data is just hype: You will surely find yourself either for or against this statement. Big data is getting a lot of press hours and PR time, partly because there is hype, but partly because the tools for dealing with big data have unveiled the capability to parse previously unmanageable blobs of data into insights using commodity hardware. So, the hype is evident, but there is a genuine shift in capability fueling it: the demand to handle more data and to get better insights from that data. Big data is not just hype but a real shift in how businesses look at their data.
  6. Big data is unstructured: If you have been in the big data domain for more than a day, you must have heard the claim that big data means unstructured data. It is not true. As stated earlier, big data is just data that exceeds your expectations along three vectors: volume, velocity and variety. Data can be structured or unstructured; it is the three Vs that define its big data status, not its structure.
  7. Data eliminates uncertainty: Data surely conveys more information about a particular use case, but it cannot guarantee certainty. Future data is as uncertain as the market conditions it describes. Uncertainty enters the business through many channels: the competitive landscape, customer experience, market conditions and other business-dependent factors. So, data alone does not eliminate uncertainty.
  8. We must capture everything in order to analyze our Big Data: Sure, it sounds awesome to capture everything so you can learn everything, but it is delusional. “Everything” is circumstantial: businesses regularly shift their reliance from one set of KPIs to another, so there can never be an exhaustive list of what to capture; it will keep changing with the market. It also helps to recognize that some data sets have little to no impact on the business, so data should be picked according to its impact, and the chosen KPIs should be re-evaluated regularly to track shifting markets.
  9. Big Data systems are expensive to implement and maintain: Yes, this misconception still persists in many businesses. But the very reason big data sits in the hot seat is that commodity hardware can now be used to tackle it. Big data systems are not expensive anymore; their cost is low and getting lower. So, cost should never be the deterrent that keeps you from a big data project.
  10. Big Data is for Big Companies Only: As the previous point noted, big data tools are cheap and run on cheap commodity hardware. They are accessible, and no longer the dream of big corporations only. Small and mid-size companies have almost the same leverage when it comes to thinking like big corporations. So, big data capabilities are for the strong of heart, not the deep of pocket.

So, the big data landscape is filled with both truth and myth; make sure to check which side your hurdle falls on before calling it quits and throwing in the towel.

Source: 10 Misconceptions about Big Data

Jul 05, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Correlation-Causation  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Why the time is ripe for security behaviour analytics by analyticsweekpick

>> #FutureOfData with @CharlieDataMine, @Oracle discussing running analytics in an enterprise by v1shal

>> CMS Predictive Readmission Models ‘Not Very Good’ by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>> Global BPO Business Analytics Market by 2023: Accenture, HP, TCS … – Healthcare Journal Under Business Analytics

>> Streaming Analytics Market: Business Opportunities, Current Trends, Market Challenges & Global Industry Analysis By … – Business Investor Under Streaming Analytics

>> Microsoft Azure Container Instances Now Production-Ready – Virtualization Review Under Virtualization

More NEWS ? Click Here

[ FEATURED COURSE]

Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz

Use data to build a better startup faster in partnership with Geckoboard… more

[ FEATURED READ]

The Black Swan: The Impact of the Highly Improbable

A black swan is an event, positive or negative, that is deemed improbable yet causes massive consequences. In this groundbreaking and prophetic book, Taleb shows in a playful way that Black Swan events explain almost eve… more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics to help with our decision making. Data- and analytics-driven decision making is rapidly sneaking its way into our core corporate DNA, and we are not churning out practice grounds to test those models fast enough. Such snug-looking models have hidden nails that could cause uncharted pain if they go unchecked. This is the right time to start thinking about putting an Analytics Club [a Data Analytics CoE] in your workplace to help lab out best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q:What is principal component analysis? Explain the sort of problems you would use PCA for. Also explain its limitations as a method?

A: Statistical method that uses an orthogonal transformation to convert a set of observations of correlated variables into a set of values of linearly uncorrelated variables called principal components.

Reduce the data from n to k dimensions: find the k vectors onto which to project the data so as to minimize the projection error.
Algorithm:
1) Preprocessing (standardization): PCA is sensitive to the relative scaling of the original variables
2) Compute the covariance matrix Σ
3) Compute the eigenvectors of Σ
4) Choose k principal components so as to retain x% of the variance (typically x = 99)

Applications:
1) Compression
– Reduce disk/memory needed to store data
– Speed up learning algorithm. Warning: mapping should be defined only on training set and then applied to test set

2) Visualization: project onto 2 or 3 principal components, so as to summarize the data

Limitations:
– PCA is not scale invariant
– The directions with largest variance are assumed to be of most interest
– Only considers orthogonal transformations (rotations) of the original variables
– PCA is only based on the mean vector and covariance matrix. Some distributions (multivariate normal) are characterized by this but some are not
– If the variables are correlated, PCA can achieve dimension reduction. If not, PCA just orders them according to their variances
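
A minimal sketch of the algorithm above in Python (not from the original answer; the randomly generated data and the use of scikit-learn for the standardization and eigendecomposition steps are assumptions):

# Illustrative sketch only: standardize, then keep the smallest k that
# retains ~99% of the variance.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
X[:, 1] = 0.9 * X[:, 0] + 0.1 * X[:, 1]      # make two columns correlated

# 1) Standardization: PCA is sensitive to the relative scaling of variables.
X_std = StandardScaler().fit_transform(X)

# 2-4) Covariance/eigenvector computation and the choice of k are handled by
# PCA when n_components is given as the fraction of variance to retain.
pca = PCA(n_components=0.99, svd_solver="full").fit(X_std)
X_reduced = pca.transform(X_std)
print("k =", pca.n_components_,
      "| variance retained = %.3f" % pca.explained_variance_ratio_.sum())

# Warning from the answer: fit the scaler and the PCA mapping on the
# training set only, then apply the same transform to the test set.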

Source

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData with Jon Gibs(@jonathangibs) @L2_Digital

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The most valuable commodity I know of is information. – Gordon Gekko

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Joe DeCosmo, @Enova

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Big data is a top business priority and drives enormous opportunity for business improvement. Wikibon’s own study projects that big data will be a $50 billion business by 2017.

Sourced from: Analytics.CLUB #WEB Newsletter

Gleanster – Actionable Insights at a Glance

I am happy to announce that I have joined Gleanster’s Thought Leader group as a contributing analyst. Gleanster is a market research and advisory services firm that benchmarks best practices in technology-enabled business initiatives, delivering actionable insights that allow companies to make smart business decisions and match their needs with vendor solutions.

In my role at Gleanster, I will be involved in providing insight into the Enterprise Feedback Management (EFM) and Customer Experience Management (CEM) space. Building on Gleanster’s 2010 Customer Feedback Management report as well as my own research on best practices in customer feedback programs (See Beyond the Ultimate Question for complete results of my research), I will be directing Gleanster’s upcoming benchmark study on Customer Experience Management. In this study, we will identify specific components of CEM that are essential in helping companies deliver a great customer experience that increases customer loyalty.

“We are excited to have Dr. Hayes as part of our distinguished thought leader group. Dr. Hayes brings over 20 years of experience to bear on important issues in customer experience management and enterprise feedback management. Specifically, his prior research on the measurement and meaning of customer loyalty and best practices in customer feedback programs has helped advance the field tremendously. His scientific research is highly regarded by his industry peers, and we are confident that Dr. Hayes’ continuing contributions to the field will bring great value to the Gleanster community.”

Jeff Zabin, CEO
Gleanster

As a proud member of the 1% for the Planet alliance, Gleanster is committed to donating at least 1% of their annual sales revenue to nonprofit organizations focused on environmental sustainability.

Source

Jun 28, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Accuracy  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Enterprise Architecture for the Internet of Things: Containerization and Microservices by jelaniharper

>> 25 Hilarious Geek Quotes for Geek-Wear by v1shal

>> May 22, 2017 Health and Biotech analytics news roundup by pstein

Wanna write? Click Here

[ NEWS BYTES]

>> What the country’s first undergrad program in artificial intelligence will look like – EdScoop News Under Artificial Intelligence

>> Rugby Statistics: Cooney’s figures still stand up to scrutiny – Irish Times Under Statistics

>> Witad Awards 2018 Write-Ups: Data Scientist of the Year … – www.waterstechnology.com Under Data Scientist

More NEWS ? Click Here

[ FEATURED COURSE]

The Analytics Edge

This is an Archived Course
EdX keeps courses open for enrollment after they end to allow learners to explore content and continue learning. All features and materials may not be available, and course content will not be… more

[ FEATURED READ]

The Future of the Professions: How Technology Will Transform the Work of Human Experts

This book predicts the decline of today’s professions and describes the people and systems that will replace them. In an Internet society, according to Richard Susskind and Daniel Susskind, we will neither need nor want … more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics to help with our decision making. Data- and analytics-driven decision making is rapidly sneaking its way into our core corporate DNA, and we are not churning out practice grounds to test those models fast enough. Such snug-looking models have hidden nails that could cause uncharted pain if they go unchecked. This is the right time to start thinking about putting an Analytics Club [a Data Analytics CoE] in your workplace to help lab out best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q:Explain what a local optimum is and why it is important in a specific context, such as K-means clustering. What are specific ways of determining if you have a local optimum problem? What can be done to avoid local optima?

A: * A solution that is optimal within a neighboring set of candidate solutions
* In contrast with global optimum: the optimal solution among all others

* K-means clustering context:
It’s proven that the objective cost function will always decrease until a local optimum is reached.
Results will depend on the initial random cluster assignment

* Determining if you have a local optimum problem:
Tendency of premature convergence
Different initialization induces different optima

* Avoid local optima in a K-means context: repeat K-means and take the solution that has the lowest cost
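
A minimal sketch of that restart strategy in Python (not from the original answer; the toy data and the use of scikit-learn’s KMeans are assumptions):

# Illustrative sketch only: repeat K-means with different random
# initializations and keep the solution with the lowest cost (inertia).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
               for c in ((0, 0), (5, 5), (0, 5))])

best = None
for seed in range(10):                       # 10 random restarts
    km = KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X)
    if best is None or km.inertia_ < best.inertia_:
        best = km                            # keep the lowest-cost fit

print("best inertia over 10 restarts:", round(best.inertia_, 2))
# scikit-learn's n_init parameter performs this repetition automatically.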

Source

[ VIDEO OF THE WEEK]

@ReshanRichards on creating a learning startup for preparing for #FutureOfWork #JobsOfFuture #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Without big data, you are blind and deaf and in the middle of a freeway. – Geoffrey Moore

[ PODCAST OF THE WEEK]

@DrewConway on creating socially responsible data science practice #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

29 percent report that their marketing departments have ‘too little or no customer/consumer data.’ When data is collected by marketers, it is often not appropriate for real-time decision making.

Sourced from: Analytics.CLUB #WEB Newsletter

How oil and gas firms are failing to grasp the necessity of Big Data analytics

An explosion in information volumes and processing power is transforming the energy sector. Even the major players are dragging their feet to catch up.

The business of oil and gas profit-making takes place increasingly in the realm of bits and bytes. The information explosion is everywhere, be it in the geosciences, engineering and management or even on the financial and regulatory sides. The days of easy oil are running out; unconventional plays are becoming the norm. For producers that means operations are getting trickier, more expensive and data-intensive.

“Companies are spending a lot of money on IT. Suncor alone spends about $500 million per year.”

Thirty years ago geoscientists could get their work done by scribbling on paper; today they are watching well data flow, in real time and by the petabyte, across their screens. Despite what many think, the challenge for them doesn’t lie in storing the mountains of data. That’s the easy part. The challenge is more about building robust IT infrastructures that holistically integrate operations data and enable different systems and sensors to talk to each other. With greater transparency over the data, operators can better analyze it and draw actionable insights that bring real competitive value.

“Even the big guys aren’t progressive in this area,” says Nicole Jardin, CEO of Emerald Associates, a Calgary-based firm that provides project management solutions from Oracle. “They often make decisions without real big data analytics and collaborative tools. But people aren’t always ready for the level of transparency that’s now possible.” Asked why a company would not automatically buy into a solution that would massively help decision-makers, her answer is terse: “Firefighters want glory.”

The suggestion is, of course, that many big data management tools are so powerful that they can dramatically de-risk oil and gas projects. Many problems end up much more predictable and avoidable. As a result, people whose jobs depend on solving those problems and putting out fires see their livelihoods threatened by this IT trend. Resistance and suspicion, always a dark side of any corporate culture, rears its ugly face.

On the other hand, more progressive companies have already embraced the opportunities of big data. They don’t need convincing and have long since moved from resistance to enthusiastic adoption. They have grown shrewder and savvier and base their IT investments very objectively according to cost-benefit metrics. The central question for vendors: “So what’s the ROI?”

There is big confusion about big data, and there are different views about where the oil and gas industry is lagging in terms of adopting cutting-edge tools. Scott Fawcett, director at Alberta Innovates – Technology Futures in Calgary and a former executive at global technology companies like Apptio, SAP SE and Cisco Systems, points out that this is not small potato stuff. “There has been an explosion of data. How are you to deal with all the data coming in in terms of storage, processing, analytics? Companies are spending a lot of money on IT. Suncor alone spends about $500 million per year.” He then adds, “And that’s even at a time when memory costs have plummeted.”

 

The big data story had its modest beginnings in the 1980s, with the introduction of the first systems that allowed the energy industry to put data in a digital format. Very suddenly, the traditional characteristics of oil and gas and other resource industries – often unfairly snubbed as a field of “hewers of wood and carriers of water” – changed fundamentally. The shift was from an analog to a digital business template; operations went high-tech.

It was also the beginning of what The Atlantic writer Jonathan Rauch has called the “new old economy.” With the advent of digitization, innovation accelerated and these innovations cross-fertilized each other in an ever-accelerating positive feedback loop. “Measurement-while-drilling, directional drilling and 3-D seismic imaging not only developed simultaneously but also developed one another,” wrote Rauch. “Higher resolution seismic imaging increased the payoff for accurate drilling, and so companies scrambled to invest in high-tech downhole sensors; power sensors, in turn, increased yields and hence the payoff for expensive directional drilling; and faster, cheaper directional drilling increased the payoff for still higher resolution from 3-D seismic imaging.”

One of the biggest issues in those early days was storage, but when that problem was more or less solved, the industry turned to the next challenge of improving the processing and analysis of the enormous and complex data sets it collects daily. Traditional data applications such as Microsoft Excel were hopelessly inadequate for the task.

In fact, the more data and analytical capacities the industry got, the more it wanted. It wasn’t long ago that E&P companies would evaluate an area and then drill a well. Today, companies still evaluate then drill, but the data collected in real time from the drilling is entered into the system to guide planning for the next well. Learnings are captured and their value compounded immediately. In the process, the volume of collected data mushrooms.

The label “big data” creates confusion, just as does the term Big Oil. The “big” part of big data is widely misunderstood. It is, therefore, helpful to define big data with the three v’s of volume, velocity and variety. With regard to the first “v,” technology analysts International Data Corp. estimated that there were 2.7 zettabytes of data worldwide as of March 2012. A zettabyte equals 1.1 trillion gigabytes. The amount of data in the world doubles each year, and the data in the oil and gas industry, which makes up a non-trivial part of the data universe, keeps flooding in from every juncture along the exploration, production and processing value chain.

Velocity, the second “v,” refers to the speed at which the volume of data is accumulating. This is driven by the fact that, in accordance with Moore’s famous law, computational power keeps increasing exponentially, storage costs keep falling, and communication and ubiquitous smart technology keep generating more and more information.

“In the old days, people were driving around in trucks, measuring things. Now there are sensors that do that work.”

On the velocity side, Scott Fawcett says, “In the old days people were driving around in trucks, measuring things. Now there are sensors doing that work.” Sensors are everywhere in operations now. Just in their downhole deployment, there are flowmeters and pressure, temperature and vibration gauges, as well as acoustic and electromagnetic sensors.

Big data analytics is the ability to assess and draw rich insights from data sets so decision-makers can better de-risk projects. Oil and gas companies commonly focus their big data efforts on logistics and optimization, according to Dale Sperrazza, general manager Europe and sub-Saharan Africa at Halliburton Landmark. If this focus is too one-sided, companies may end up just optimizing a well drilled in a suboptimal location.

“So while there is great value in big data and advanced analytics for oilfield operations and equipment, no matter if the sand truck shows up on time, drilling times are reduced and logistical delays are absolutely minimized, a poorly chosen well is a poorly performing well,” writes Luther Birdzell in the blog OAG Analytics.

Birdzell goes on to explain that the lack of predictive analytics results in about 25 per cent of the wells in large U.S. resource plays underperforming, at a cost of roughly $10 million per well. After all, if a company fails to have enough trucks to haul away production from a site before a storage facility fills up, then the facility shuts down. Simply put, when a facility is shut down, production is deferred, deferred production is deferred revenue, and deferred revenue can be the kiss of death for companies in fragile financial health.

The application of directional drilling and hydraulic multi-stage fracturing to hydrocarbon-rich source rocks has made the petroleum business vastly more complex, according to the Deloitte white paper The Challenge of Renaissance, and this complexity can only be managed by companies with a real mastery of big data and its analytical tools. The age of easy oil continues to fade out while the new data- and technology-driven age of “hard oil” is taking center stage. The capital costs of unconventional oil and gas plays are now so high and the technical requirements so convoluted, the margins for error have grown very small. Decision-makers can’t afford to make too many bad calls.

Despite the investments companies are putting into data-generating tools like sensors, much of the data is simply discarded, because the right infrastructure is missing. “IT infrastructure should not be confused with just storage; it is rather the capacity to warehouse and model data,” according to Nicole Jardin at Emerald Associates. If the right infrastructure is in place, the sensor-generated data could be deeply analyzed and opportunities identified for production, safety or environmental improvements.

Today, operators are even introducing automated controls that register data anomalies and point to the possible imminent occurrence of dangerous events. Behind these automated controls are predictive models which monitor operational processes in real time. They are usually coupled with systems that not only alert companies to issues but also make recommendations to deal with them. Pipelines are obviously investing heavily in these systems, but automated controls are part of a much larger development now sweeping across all industries and broadly called “the Internet of things” or “the industrial Internet.”

“In the ’80s, when data was being stored digitally, it was fragmented with systems that weren’t capable of communicating with each other,” Fawcett says. The next wave in big data is toward the holistic view of data system de-fragmentation and integration. “Ultimately,” Jardin says, “in order to analyze data, you need to federate it. Getting all the parts to speak to each other should now be high priority for competitively minded energy companies.”

Originally posted via “How oil and gas firms are failing to grasp the necessity of Big Data analytics”

Source: How oil and gas firms are failing to grasp the necessity of Big Data analytics by analyticsweekpick

Jun 21, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Ethics  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Enterprise Data Modeling Made Easy by jelaniharper

>> Assessment of Risk Maps in Data Scientist Jobs by thomassujain

>> Mar 01, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

Wanna write? Click Here

[ NEWS BYTES]

>> Network Intelligence witnesses 48% growth in FY 2017-18 – Tech Observer Under Big Data Security

>> Predictive and Prescriptive Analytics Market valued of USD 16.84 billion, at a CAGR of 20.43% by the end of 2023 – The Financial Analyst Under Prescriptive Analytics

>> New York City Spending the Focus at Fourth Annual Business Analytics Conference – Manhattan College News Under Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Deep Learning Prerequisites: The Numpy Stack in Python

The Numpy, Scipy, Pandas, and Matplotlib stack: prep for deep learning, machine learning, and artificial intelligence… more

[ FEATURED READ]

The Misbehavior of Markets: A Fractal View of Financial Turbulence

Mathematical superstar and inventor of fractal geometry, Benoit Mandelbrot, has spent the past forty years studying the underlying mathematics of space and natural patterns. What many of his followers don’t realize is th… more

[ TIPS & TRICKS OF THE WEEK]

Data aids, not replace judgement
Data is a tool: a means to help build consensus and facilitate human decision-making, not replace it. Analysis converts data into information; information, via context, leads to insight; insights lead to decisions, which ultimately lead to outcomes that bring value. So, data is just the start; context and intuition also play a role.

[ DATA SCIENCE Q&A]

Q:What is cross-validation? How to do it right?
A: It’s a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. Mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model in the training phase (i.e. validation data set) in order to limit problems like overfitting, and get an insight on how the model will generalize to an independent data set.

Examples: leave-one-out cross validation, K-fold cross validation

How to do it right?

– The training and validation data sets have to be drawn from the same population
– Predicting stock prices: if a model is trained on a certain 5-year period, it is unrealistic to treat the subsequent 5 years as a draw from the same population
– Common mistake: for instance, the step of choosing the kernel parameters of an SVM should be cross-validated as well
Bias-variance trade-off for k-fold cross validation:

Leave-one-out cross-validation: gives approximately unbiased estimates of the test error since each training set contains almost the entire data set (n − 1 observations).

But: we average the outputs of n fitted models, each of which is trained on an almost identical set of observations; hence the outputs are highly correlated. Since the variance of a mean of quantities increases when the correlation between those quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation.

Typically, we choose k=5 or k=10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor high variance.
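
A minimal sketch of doing this right in Python (not from the original answer; the synthetic dataset and the scikit-learn calls are assumptions): the SVM kernel parameters are tuned inside the cross-validation, so the outer folds give an honest estimate of generalization error.

# Illustrative sketch only: nested cross-validation so that the "common
# mistake" above (tuning kernel parameters outside CV) is avoided.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Inner loop: choose C and gamma; outer loop: estimate generalization error.
inner = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
    cv=5,
)
outer_scores = cross_val_score(inner, X, y, cv=5)
print("nested 5-fold CV accuracy: %.3f +/- %.3f"
      % (outer_scores.mean(), outer_scores.std()))

# For time-ordered data (e.g. stock prices), use TimeSeriesSplit rather than
# shuffled folds, since future observations are not drawn from the same
# population as past ones.
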
Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Health Informatics Analytics

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

What we have is a data glut. – Vernor Vinge

[ PODCAST OF THE WEEK]

Want to fix #DataScience ? fix #governance by @StephenGatchell @Dell #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

The data volumes are exploding, more data has been created in the past two years than in the entire previous history of the human race.

Sourced from: Analytics.CLUB #WEB Newsletter

2016 Trends in Big Data Governance: Modeling the Enterprise

A number of changes in the contemporary data landscape have affected the implementation of data governance. The normalization of big data has resulted in a situation in which such deployments are so common that they’re merely considered a standard part of data management. The confluence of technologies largely predicated on big data—cloud, mobile and social—are gaining similar prominence, transforming the expectations of not only customers but business consumers of data.

Consequently, the demands for big data governance are greater than ever, as organizations attempt to implement policies to reflect their corporate values and sate customer needs in a world in which increased regulatory consequences and security breaches are not aberrations.

The most pressing developments for big data governance in 2016 include three dominant themes. Organizations need to enforce it outside the corporate firewalls via the cloud, democratize the level of data stewardship requisite for the burgeoning self-service movement, and provide metadata and semantic consistency to negate the impact of silos while promoting sharing of data across the enterprise.

These objectives are best achieved with a degree of foresight and stringency that provides a renewed emphasis on modeling in its myriad forms. According to TopQuadrant co-founder, executive VP and director of TopBraid Technologies Ralph Hodgson, “What you find is the meaning of data governance is shifting. I sometimes get criticized for saying this, but it’s shifting towards a sense of modeling the enterprise.”

In the Cloud

Perhaps the single most formidable challenge facing big data governance is accounting for the plethora of use cases involving the cloud, which appears tailored for the storage and availability demands of big data deployments. These factors, in conjunction with the analytics options available from third-party providers, make utilizing the cloud more attractive than ever. However, cloud architecture challenges data governance in a number of ways including:

  • Semantic modeling: Each cloud application has its own semantic model. Without dedicated governance measures on the part of an organization, integrating those different models can hinder data’s meaning and its reusability.
  • Service provider models: Additionally, each cloud service provider has its own model which may or may not be congruent with enterprise models for data. Organizations have to account for these models as well as those at the application level.
  • Metadata: Applications and cloud providers also have disparate metadata standards which need to be reconciled. According to Tamr Global Head of Strategy, Operations and Marketing Nidhi Aggarwal, “Seeing the metadata is important from a governance standpoint because you don’t want the data available to anybody. You want the metadata about the data transparent.” Vendor lock-in in the form of proprietary metadata issued by providers and their applications can be a problem too, especially since such metadata can encompass an organization’s own data so that it effectively belongs to the provider.

Rectifying these issues requires a substantial degree of planning prior to entering into service level agreements. Organizations should consider both current and future integration plans and their ramifications for semantics and metadata, which is part of the basic needs assessment that accompanies any competent governance program. Business input is vital to this process. Methods for addressing these cloud-based points of inconsistency include transformation and writing code, or adopting enterprise-wide semantic models via ontologies, taxonomies, and RDF graphs. The critical element is doing so in a way that involves the provider prior to establishing service.

The Democratization of Data Stewardship

The democratization of big data is responsible for the emergence of what Gartner refers to as ‘citizen stewardship’ in two capital ways. The popularity of data lakes and the availability of data preparation tools with cognitive computing capabilities are empowering end users to assert more control over their data. The result is a shift from the centralized model of data stewardship (which typically encompassed stewards from both the business and IT, the former organized by domain) to a decentralized one in which virtually everyone actually using data plays a role in its stewardship.

Both preparation tools and data lakes herald this movement by giving end users the opportunity to perform data integration. Machine learning technologies inform the former and can identify which data is best integrated with other data on an application or domain-wide basis. The celerity of this self-service access and integration necessitates that the onus of integrating data in accordance with governance policy falls on the end user. Preparation tools can augment that process by facilitating ETL and other operations with machine learning algorithms, which can maintain semantic consistency.

Data lakes equipped with semantic capabilities can facilitate a number of preparation functions from initial data discovery to integration while ensuring the sort of metadata and semantic consistency for proper data governance. Regardless, “if you put data in a data lake, there still has to be some metadata associated with it,” MapR Chief Marketing Officer Jack Norris explained. “You need some sort of schema that’s defined so you can accomplish self-service.”

Metadata and Semantic Consistency

No matter what type of architecture is employed (either cloud or on-premise), consistent metadata and semantics represent the foundation of secure governance once enterprise wide policies based on business objectives are formulated. As noted by Franz CEO Jans Aasman, “That’s usually how people define data governance: all the processes that enable you to have more consistent data”. Perhaps the most thorough means of ensuring consistency in these two aspects of governance involves leveraging a data lake or single repository enriched with semantic technologies. The visual representation of data elements on an RDF graph is accessible for end user consumption, while semantic models based on ontological descriptions of data elements clarify their individual meanings. These models can be mapped to metadata to grant uniformity in this vital aspect of governance and provide semantic consistency on diverse sets of big data.
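
To make the idea of mapping data elements to a shared semantic model more concrete, here is a small, purely illustrative Python sketch using the rdflib library; the namespace, the DataElement/represents terms and the column names are hypothetical and not part of any product described above.

# Illustrative sketch only (hypothetical ontology terms): two physical
# columns from different systems mapped to one shared "Customer" concept.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/governance#")     # assumed namespace
g = Graph()
g.bind("ex", EX)

# Enterprise-wide semantic model: one ontology class for "Customer".
g.add((EX.Customer, RDF.type, RDFS.Class))
g.add((EX.Customer, RDFS.comment, Literal("A party purchasing goods or services")))

# Metadata for two source columns, each mapped to the same shared concept.
for dataset, column in [("crm_cloud", "cust_id"), ("billing", "customer_no")]:
    col = EX[f"{dataset}_{column}"]
    g.add((col, RDF.type, EX.DataElement))
    g.add((col, RDFS.label, Literal(f"{dataset}.{column}")))
    g.add((col, EX.represents, EX.Customer))          # the mapping itself

# Any consumer can now ask which physical columns mean "Customer".
for col in g.subjects(EX.represents, EX.Customer):
    print(col)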

Alternatively, it is possible to achieve metadata consistency via processes instead of technologies. Doing so is more tenuous, yet perhaps preferable to organizations still utilizing a silo approach among different business domains. Sharing and integrating that data is possible through the means of an enterprise-wide governance council with business membership across those domains, which rigorously defines and monitors metadata attributes so that there is still a common semantic model across units. This approach might behoove less technologically savvy organizations, although the sustainment of such councils could become difficult. Still, this approach results in consistent metadata and semantic models on disparate sets of big data.

Enterprise Modeling

The emphasis on modeling that is reflected in all of these trends substantiates the viewpoint that effective big data governance requires stringent modeling. Moreover, it is important to implement it at a granular level so that data can be reused and maintain its meaning across different technologies, applications, business units, and personnel changes. The degree of prescience and planning required to successfully model the enterprise and ensure governance objectives are met will be at the forefront of governance concerns in 2016, whether organizations are seeking new data management solutions or refining established ones. In this respect, governance is actually the foundation upon which data management rests. According to Cambridge Semantics president Alok Prasad, “Even if you are the CEO, you will not go against your IT department in terms of security and governance. Even if you can get a huge ROI, if the governance and security are not there you will not adopt a solution.”

 

Originally Posted at: 2016 Trends in Big Data Governance: Modeling the Enterprise