Jul 26, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
statistical anomaly  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Build a Mobile Gaming Events Data Pipeline with Databricks Delta by analyticsweek

>> Chatters, silences, and signs: Google has launched two major updates in the past one month by thomassujain

>> How Data Science Is Fueling Social Entrepreneurship by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>> Microsoft Sinks Subsea Data Center off Scotland – Light Reading Under Data Center

>> Summit shock fades; Samsonite struggles; Europe’s big data day – CNNMoney Under Big Data

>> At events round the world, operators question short term virtualization case – Rethink Research Under Virtualization

More NEWS ? Click Here

[ FEATURED COURSE]

A Course in Machine Learning

image

Machine learning is the study of algorithms that learn from data and experience. It is applied in a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need… more

[ FEATURED READ]

Storytelling with Data: A Data Visualization Guide for Business Professionals

image

Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You’ll discover the power of storytelling and the way to make data a pivotal point in your story. Th… more

[ TIPS & TRICKS OF THE WEEK]

Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle that data can lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric. This is the metric that matters the most to your startup. Some advantages of TUM: it answers the most important business question, it cleans up your goals, it inspires innovation, and it helps you understand the entire quantified business.

[ DATA SCIENCE Q&A]

Q:Provide examples of machine-to-machine communications?
A: Telemedicine
– Heart patients wear a specialized monitor which gathers information about the state of the heart
– The collected data is sent to an implanted electronic device which sends electric shocks back to the patient to correct abnormal rhythms

Product restocking
– Vending machines are capable of messaging the distributor whenever an item is running out of stock (a minimal sketch of such a message follows below)
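
To make the restocking example concrete, here is a minimal, purely hypothetical Python sketch of the kind of message a connected vending machine might send to its distributor; the field names and the reorder threshold are assumptions for illustration, not any vendor's actual schema.

```python
import json
from datetime import datetime, timezone

# Hypothetical restock alert a connected vending machine might send to its
# distributor when an item's remaining count crosses a reorder threshold.
REORDER_THRESHOLD = 3

def build_restock_message(machine_id, item_sku, remaining_units):
    """Return a JSON payload if stock is low, otherwise None."""
    if remaining_units > REORDER_THRESHOLD:
        return None
    return json.dumps({
        "machine_id": machine_id,
        "item_sku": item_sku,
        "remaining_units": remaining_units,
        "requested_action": "restock",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

print(build_restock_message("VM-042", "SKU-COLA-500ML", 2))
```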

Source

[ VIDEO OF THE WEEK]

Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast

 Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

You can have data without information, but you cannot have information without data. – Daniel Keys Moran

[ PODCAST OF THE WEEK]

@DrewConway on creating socially responsible data science practice #FutureOfData #Podcast

 @DrewConway on creating socially responsible data science practice #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

In that same survey, by a small but noticeable margin, executives at small companies (fewer than 1,000 employees) are nearly 10 percent more likely to view data as a strategic differentiator than their counterparts at large enterprises.

Sourced from: Analytics.CLUB #WEB Newsletter

An Agile Approach to Big Data Management: Intelligent Workload Routing Automation

Monolithic approaches to data management, and the management of big data in particular, are no longer sufficient. Organizations have entirely too many storage and computing environments to account for, including bare metal on-premise servers, virtualization options, the cloud, and any combination of hybrid implementations.

Capitalizing on this distributed data landscape requires the ability to seamlessly shift resources between hosts, machines, and settings for real-time factors affecting workload optimization. Whether spurred by instances of failure, maintenance, surges, or fleeting pricing models (between cloud providers, for example), contemporary enterprises must react nimbly enough to take advantage of their infrastructural and architectural complexity.

Furthermore, regulatory mandates such as the EU General Data Protection Regulation and others necessitate the dynamic positioning of workflows in accordance with critical data governance protocols. A careful synthesis of well-planned governance policy, Artificial Intelligence techniques, and instance-level high availability execution can create the agility of a global data fabric in which “if [organizations] can get smart and let the automation take place, then everything’s done the way they want it done, all the time, right,” DH2i CEO Don Boxley said.

Smart Availability
The automation of intelligent workload routing is predicated on mitigating downtime and maximizing performance. Implicit to these goals is the notion of what Boxley referred to as Smart Availability in which “you’re always able to pick the best execution venue for your workloads.” Thus, the concept of high availability, which relies on techniques such as clustering, failovers, and other redundancy measures to ensure availability, is enhanced by dynamically (and even proactively) moving workloads to intelligently improve performance. Use cases for doing so are interminable, but are particularly acute in instances of online transaction processing in verticals such as finance or insurance. Whether processing insurance claims or handling bank teller transactions, “those situations are very sensitive to performance,” Boxley explained. “So, the ability to move a workload when it’s under stress to balance out that performance is a big deal.” Of equal value is the ability to move workloads between settings in the cloud, which can encompass provisioning workloads to “span clouds”, as Boxley mentioned, or even between them. “The idea of using the cloud for burst performance also becomes an option, assuming all the data governance issues are aligned,” Boxley added.

Data Governance
The flexibility of automating intelligent workloads is only limited by data governance policy, which is a crucial piece of the success of dynamically shifting workload environments. Governance mandates are especially important for data hosted in the cloud, as there are strict regulations about where certain information (pertaining to industry, location, etc.) is stored. Organizations must also contend with governance protocols about who can view or access data, while also taking care to protect sensitive and personally identifiable information. In fact, the foresight required for comprehensive policies about where data and their workloads are stored and enacted is one of the fundamental aspects of the intelligence involved in routing them. “That’s what the key is: acting the right way every time,” Boxley observed. “The burden is on organizations to ensure the policies they write are an accurate reflection of what they want to take place.” Implementing proper governance policies about where data are is vitally important when automating their routing, whether for downtime or upsurges. “One or two workloads, okay I can manage that,” Boxley said. “If I’ve got 100, 500 workloads, that becomes difficult. It’s better to get smart, write those policies, and let the automation take place.”

Artificial Intelligence Automation
Once the proper policies are formulated, workloads are automatically routed in accordance with them. Depending on the organization, use case, and the particular workload, that automation is tempered with human-based decision support. According to Boxley, certain facets of the underlying automation are underpinned by “AI built into the solution which is constantly monitoring, getting information from all the hosts of all the workloads under management.” AI algorithms are deployed to detect substantial changes in workloads attributed to either failure or spikes. In either of these events (or any other event that an organization specifies), alerts are generated to a third-party monitor tasked with overseeing designated workflows. Depending on the specifics of the policy, that monitor will either take the previously delineated action or issue alerts to the organization, which can then choose from a variety of contingencies. “A classic use case is if a particular process isn’t getting a certain level of CPU power; then you look for another CPU to meet that need,” Boxley said. The aforementioned governance policies will identify which alternative CPUs to use.
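
To ground the idea, here is an illustrative Python sketch (not DH2i's implementation) of policy-constrained workload routing: when a workload's current host can no longer meet a CPU requirement, the router picks an alternative host that both satisfies the requirement and is permitted by the governance policy, and raises an alert if none exists. All host names, workload names, and thresholds are hypothetical.

```python
# Illustrative sketch only: route a workload to an alternative host when its
# current host can no longer meet a CPU requirement, restricted to hosts
# permitted by a governance policy. All names and numbers are hypothetical.

GOVERNANCE_POLICY = {
    # workload -> hosts where its data is allowed to reside (e.g., GDPR regions)
    "claims-processing": {"eu-host-1", "eu-host-2"},
    "teller-transactions": {"us-host-1", "eu-host-1"},
}

def route_workload(workload, current_host, host_free_cpu, required_free_cpu):
    """Return the host the workload should run on, or raise an alert."""
    allowed = GOVERNANCE_POLICY.get(workload, set())
    if host_free_cpu.get(current_host, 0.0) >= required_free_cpu:
        return current_host  # no move needed
    candidates = [
        host for host, free_cpu in host_free_cpu.items()
        if host in allowed and free_cpu >= required_free_cpu
    ]
    if not candidates:
        raise RuntimeError(f"ALERT: no compliant host can serve {workload}")
    # pick the allowed host with the most free CPU
    return max(candidates, key=host_free_cpu.get)

metrics = {"eu-host-1": 0.15, "eu-host-2": 0.60, "us-host-1": 0.90}
print(route_workload("claims-processing", "eu-host-1", metrics, required_free_cpu=0.40))
# -> 'eu-host-2' (us-host-1 has more free CPU but is not governance-compliant)
```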

Data Fabric
The governance requisites for shifting workloads (and their data) between environments are a pivotal aspect of holistically stitching together what’s termed a comprehensive data fabric of the enterprise’s computational and data resources. “In terms of the technology part, we’re getting there,” Boxley acknowledged. “The bigger challenge is the governance part.” With regulations such as GDPR propelling the issue of governance to the forefront of data-driven organizations, the need to solidify policies prior to moving data around is clear.

Equally unequivocal, however, is the need to intelligently shift those resources around an organization’s entire computing fabric to avail itself of the advantages found in different environments. As Boxley indicated, “The environments are going to continue to get more complex.” The degree of complexity implicit in varying environments such as Windows or Linux, in addition to computational settings ranging from the cloud to on-premise, virtualization options to physical servers, seemingly supports the necessity of managing these variations in a uniform manner. The ability to transfer workloads and data between these diverse settings with machine intelligent methods in alignment with predefined governance policies is an ideal way of reducing that complexity so it becomes much more manageable—and agile.

Originally Posted at: An Agile Approach to Big Data Management: Intelligent Workload Routing Automation

Podcast – How Talend Is Helping Companies Liberate Their Data

Earlier this year, I had the opportunity to chat with Nathan Latka on his podcast “The Top.” It’s a broad sweeping discussion that covers everything from Talend’s origins through our current market approach and growth. We talk about Talend’s financial model, acquisition strategy, product focus, and plan for competing in a $16B market opportunity. Hope you enjoy!

[soundcloud url=”https://api.soundcloud.com/tracks/459114396″ params=”color=#ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false&show_teaser=true&visual=true” width=”100%” height=”300″ iframe=”true” /]

[youtube https://www.youtube.com/watch?v=WwaYyhErtUI]

 

The post Podcast – How Talend Is Helping Companies Liberate Their Data appeared first on Talend Real-Time Open Source Data Integration Software.

Source: Podcast – How Talend Is Helping Companies Liberate Their Data

Jul 19, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Statistically Significant  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> #BigData #BigOpportunity in Big #HR by @MarcRind #JobsOfFuture #Podcast by v1shal

>> Gleanster – Actionable Insights at a Glance by bobehayes

>> 2017 Trends in Cognitive Computing: Humanizing Artificial Intelligence by jelaniharper

Wanna write? Click Here

[ NEWS BYTES]

>> Monica C. Smith, CEO Of Marketsmith, Makes No Apologies As A ‘Catalyst of Innovation’ – MarTech Series Under Marketing Analytics

>> Machine Learning, EHR Big Data Analytics Predict Sepsis – Health IT Analytics Under Big Data Analytics

>> 2018 Big Data 100: 30 Coolest Business Analytics Vendors – CRN Under Business Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

R Basics – R Programming Language Introduction

image

Learn the essentials of R Programming – R Beginner Level!… more

[ FEATURED READ]

Hypothesis Testing: A Visual Introduction To Statistical Significance

image

Statistical significance is a way of determining if an outcome occurred by random chance, or did something cause that outcome to be different than the expected baseline. Statistical significance calculations find their … more

[ TIPS & TRICKS OF THE WEEK]

Data Analytics Success Starts with Empowerment
Being data driven is not as much of a tech challenge as it is an adoption challenge, and adoption has its roots in the cultural DNA of any organization. Great data-driven organizations work the data-driven culture into their corporate DNA. A culture of connection, interaction, sharing and collaboration is what it takes to be data driven. It’s about being empowered more than it is about being educated.

[ DATA SCIENCE Q&A]

Q:How would you define and measure the predictive power of a metric?
A: * Predictive power of a metric: how accurately the metric predicts the empirical outcome
* These measures are all domain specific
* Example: in a field like manufacturing, tool failure rates are easily observable; a metric can be trained and its success measured as its deviation over time from the observed failures (see the sketch below)
* In information security: if the metric says an attack is coming and one should do X, did the recommendation stop the attack, or would the attack never have happened anyway?
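
As a concrete illustration of measuring deviation from the observed, here is a minimal sketch, assuming a manufacturing-style setting with observed binary tool failures and a continuous metric, that scores the metric's predictive power as an AUC; values near 0.5 indicate no predictive power. The data are synthetic.

```python
# Quantify a metric's predictive power as the AUC of its scores against what
# actually happened (1 = tool failed). Synthetic data for illustration only.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
observed_failure = rng.integers(0, 2, size=500)                        # observed outcome
metric_score = observed_failure * 0.8 + rng.normal(0, 0.5, size=500)   # noisy predictive metric

print("Predictive power (AUC):", round(roc_auc_score(observed_failure, metric_score), 3))
```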

Source

[ VIDEO OF THE WEEK]

Unconference Panel Discussion: #Workforce #Analytics Leadership Panel

 Unconference Panel Discussion: #Workforce #Analytics Leadership Panel

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The data fabric is the next middleware. – Todd Papaioannou

[ PODCAST OF THE WEEK]

@TimothyChou on World of #IOT & Its #Future Part 2 #FutureOfData #Podcast

 @TimothyChou on World of #IOT & Its #Future Part 2 #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Decoding the human genome originally took 10 years to process; now it can be achieved in one week.

Sourced from: Analytics.CLUB #WEB Newsletter

How the NFL is Using Big Data

Your fantasy football team just went high tech.

Like many businesses, the National Football League is experimenting with big data to help players, fans, and teams alike.

The NFL recently announced a deal with tech firm Zebra to install RFID data sensors in players’ shoulder pads and in all of the NFL’s arenas. The chips collect detailed location data on each player, and from that data, things like player acceleration and speed can be analyzed.
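
As a rough illustration of how speed and acceleration can be derived from that kind of location data, here is a hypothetical Python sketch using finite differences over timestamped (x, y) positions; the data format, sampling rate, and units are assumptions, not the NFL's or Zebra's actual schema.

```python
# Estimate speed and acceleration from timestamped player positions
# using finite differences. Positions and sampling rate are made up.
import numpy as np

t = np.array([0.0, 0.1, 0.2, 0.3, 0.4])   # seconds
x = np.array([0.0, 0.5, 1.2, 2.1, 3.2])   # metres
y = np.array([0.0, 0.1, 0.1, 0.3, 0.6])   # metres

dt = np.diff(t)
speed = np.hypot(np.diff(x), np.diff(y)) / dt   # m/s between samples
accel = np.diff(speed) / dt[1:]                 # m/s^2

print("speed (m/s):", np.round(speed, 2))
print("acceleration (m/s^2):", np.round(accel, 2))
```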

The NFL plans to make the data available to fans and teams, though not during game play. The thought is that statistics-mad fans will jump at the chance to consume more data about their favorite players and teams.

In the future, the data collection might be expanded. In last year’s Pro Bowl, sensors were installed in the footballs to show exactly how far they were thrown.

Big data on the gridiron
Of course, this isn’t the NFL’s first foray into big data. In fact, like other statistics-dependent sports leagues, the NFL was crunching big data before the term even existed.

However, in the last few years, the business has embraced the technology side, hiring its first chief information officer, and developing its own platform available to all 32 teams. Individual teams can create their own applications to mine the data to improve scouting, education, and preparation for meeting an opposing team.

It’s also hoped that the data will help coaches make better decisions. They can review real statistics about an opposing team’s plays or how often one of their own plays worked rather than relying solely on instinct. They will also, in the future, be able to use the data on an individual player to determine if he is improving.

Diehard fans can, for a fee, access this same database to build their perfect fantasy football team. Because, at heart, the NFL believes that the best fans are engaged fans. They want to encourage the kind of obsessive statistics-keeping that many sports fans are known for.


Will big data change the game?

It’s hard to predict how this flood of new data will impact the game. Last year, only 14 stadiums and a few teams were outfitted with the sensors. And this year, the NFL decided against installing sensors in all footballs after the politics of last year’s “deflate gate,” when the Patriots were accused of under-inflating footballs for an advantage.

Still, it seems fairly easy to predict that the new data will quickly make its way into TV broadcast booths and instant replays. Broadcasters love to have additional data points to examine between plays and between games.

And armchair quarterbacks will now have yet another insight into the game, allowing them access (for a fee) to the same information the coaches have. Which will, of course, mean they can make better calls than the coaches. Right?

Bernard Marr is a best-selling author, keynote speaker and business consultant in big data, analytics and enterprise performance. His new books are ‘Big Data’ and ‘Key Business Analytics’.

Source: How the NFL is Using Big Data

Customer Churn or Retention? A Must Watch Customer Experience Tutorial

Care about churn or retention? Here is a brilliant watch for you.

Customer retention and reduced churn rank high on the charts for most businesses. So, how can companies work through their customer experience to achieve them? In the video below, TCELab touches on some brilliant points that could help any company work through its strategy to build a Voice of Customer program.

The video is taken from one of our affiliate calls and it got a lot of positive responses, so we decided to use it for educational purposes. If you don’t have an hour to spend, here is a timeline of what is covered and when.

Happy scrolling. Don’t forget to share it with your network so that they can get things right as well.

0:00:07 What is Customer Experience Management (CEM)?
0:02:04 Why do CEOs care?
0:04:15 Why should a CEM vendor be excited?
0:07:15 What does a CEM program look like?
0:07:45 Design of a CEM Program: CEM Program Components
0:11:20 Design of a CEM Program: Disparate Sources of Business Data
0:14:23 Design of a CEM Program: Data Linkage (connecting data to answer different questions)
0:17:17 Design of a CEM Program: Integrating your business data (mapping organization silos with survey type)
0:20:58 Design of a CEM Program: Three ways to grow business… why NPS alone is not enough
0:25:40 TCELab product plug, with some crosswinds of CEM gold information
0:33:10 TCELab CLAAP Platform, with some crosswinds of CEM gold information
0:39:00 TCELab product execution process, time-lengths & other relevant information (relevant to affiliate networks)
0:43:30 TCELab product lists (information relevant to affiliate networks)
0:52:40 TCELab case study: Kashoo, plus a lot of good information for SaaS companies’ CEM programs
Blog source

Source by v1shal

Jul 12, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Big Data knows everything  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Apr 27, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> Free Research Report on the State of Patient Experience in US Hospitals by bobehayes

>> Jun 01, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

Wanna write? Click Here

[ NEWS BYTES]

>> Bill Promoting Behavioral Health EHR Incentives Passes House – EHRIntelligence.com Under Health Analytics

>> India sees 77% growth in HR analytics professionals in 5 years, says report – Business Standard Under Talent Analytics

>> Utilizing Healthcare Data Security, Cloud for a Stronger Environment – HealthITSecurity.com Under Cloud Security

More NEWS ? Click Here

[ FEATURED COURSE]

Tackle Real Data Challenges

image

Learn scalable data management, evaluate big data technologies, and design effective visualizations…. more

[ FEATURED READ]

Antifragile: Things That Gain from Disorder

image

Antifragile is a standalone book in Nassim Nicholas Taleb’s landmark Incerto series, an investigation of opacity, luck, uncertainty, probability, human error, risk, and decision-making in a world we don’t understand. The… more

[ TIPS & TRICKS OF THE WEEK]

Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle that data can lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric. This is the metric that matters the most to your startup. Some advantages of TUM: it answers the most important business question, it cleans up your goals, it inspires innovation, and it helps you understand the entire quantified business.

[ DATA SCIENCE Q&A]

Q:Explain what a long-tailed distribution is and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?
A: * In long-tailed distributions, a high-frequency population is followed by a low-frequency population, which gradually tails off asymptotically
* Rule of thumb: the majority of occurrences (more than half, and when the Pareto principle applies, 80%) are accounted for by the first 20% of items in the distribution
* The least frequently occurring 80% of items are more important as a proportion of the total population
* Zipf’s law, Pareto distribution, power laws

Examples:
1) Natural language
– Given some corpus of natural language, the frequency of any word is inversely proportional to its rank in the frequency table
– The most frequent word will occur twice as often as the second most frequent, three times as often as the third most frequent…
– “The” accounts for 7% of all word occurrences (70,000 over 1 million)
– “of” accounts for 3.5%, followed by “and”…
– Only 135 vocabulary items are needed to account for half the English corpus!

2) Allocation of wealth among individuals: the larger portion of the wealth of any society is controlled by a smaller percentage of the people

3) File size distribution of Internet traffic

Additional: Hard disk error rates, values of oil reserves in a field (a few large fields, many small ones), sizes of sand particles, sizes of meteorites

Importance in classification and regression problems:
– Skewed distribution
– Which metrics to use? Accuracy paradox (classification), F-score, AUC
– Issue when using models that make linearity assumptions (linear regression): need to apply a monotone transformation to the data (logarithm, square root, sigmoid function…)
– Issue when sampling: your data becomes even more unbalanced! Use stratified sampling instead of random sampling, SMOTE (“Synthetic Minority Over-sampling Technique”, NV Chawla) or an anomaly detection approach (a short illustration follows below)
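
As a quick illustration of the rule of thumb above, the following sketch draws a Zipf-distributed sample and checks what share of all occurrences the most frequent 20% of items account for; the distribution parameter and sample size are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(42)
sample = rng.zipf(a=2.0, size=100_000)         # heavy-tailed, Zipf-like counts

_, counts = np.unique(sample, return_counts=True)
counts = np.sort(counts)[::-1]                  # most frequent items first
top20 = max(1, int(0.2 * len(counts)))
share = counts[:top20].sum() / counts.sum()
print(f"Top 20% of items account for {share:.1%} of occurrences")

# For regression on such skewed data, apply a monotone transform before fitting:
log_sample = np.log(sample)
```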

Source

[ VIDEO OF THE WEEK]

Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast

 Solving #FutureOfOrgs with #Detonate mindset (by @steven_goldbach & @geofftuff) #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

You can use all the quantitative data you can get, but you still have to distrust it and use your own intelligence and judgment. – Alvin Toffler

[ PODCAST OF THE WEEK]

@ChuckRehberg / @TrigentSoftware on Translating Technology to Solve Business Problems #FutureOfData #Podcast

 @ChuckRehberg / @TrigentSoftware on Translating Technology to Solve Business Problems #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

40% projected growth in global data generated per year vs. 5% growth in global IT spending.

Sourced from: Analytics.CLUB #WEB Newsletter

The Meaning of Scale Values for Likelihood to Recommend Ratings

Customer experience management professionals use self-reported “likelihood” questions to measure customer loyalty. In their basic form, customer loyalty questions ask customers to rate their likelihood of engaging in different types of loyalty behaviors toward the company, including behaviors around retention (likelihood to leave), advocacy (likelihood to recommend) and purchasing (likelihood to buy different products). These loyalty questions typically employ a “0 to 10” rating scale that includes verbal anchors at each end of the scale to indicate low or high likelihood (e.g., 0 = not at all likely to 10 = extremely likely; 0 = least likely to 10 = most likely).

While it is generally accepted that higher likelihood ratings represent higher levels of loyalty, we are left with some ambiguity about what defines “good” likelihood ratings. How much better is a score of 8 than 7? Is there a meaningful difference between a rating of 7 and 6? I explored the meaning of likelihood ratings in the present analysis.

The Data

Data are from a customer survey of a large automobile dealership located on the East Coast. The current analysis is based on 30,236 respondents over a 4-year period (2010 through November 2013) across 20 different locations. The survey asked customers six questions, including two similar loyalty questions:

  1. How likely is it that you would recommend your sales person to a friend or colleague? Rating options were on a rating scale from 0 (Least Likely) to 10 (Most Likely).
  2. Would you recommend your sales associate to a friend, relative or neighbor? Response options were either “Yes” or “No.”
Figure 1. Meaning of Scale Values for Likelihood to Recommend Ratings

Results

Figure 1 illustrates the relationship between these two loyalty questions, showing the percent of “Yes” responses to the second loyalty question across each of the 11 response options of the first loyalty question. Generally, the percent of customers who say they would recommend the company drops as the likelihood to recommend rating drops. We see a few interesting things when we look a little closer at this figure:

  1. There is no major difference between ratings of 8, 9 and 10 with respect to the percent of customers who say they would recommend. For each of the top likelihood scale values, nearly all (over 99%) customers say they would recommend.
  2. Likelihood ratings of 0 through 6 represent negative loyalty responses; for each of these likelihood values, most customers would not recommend.
  3. There is a substantial increase in loyalty from a likelihood rating of 6 (38% say “Yes”) to 7 (80% say “Yes”).

Implications

There are a few implications regarding the results of the current analysis. First, companies use customer segmentation analysis to create homogeneous groups of customers so they can target each group with specific marketing, sales and service initiatives to improve or maintain overall customer loyalty. The developers of the Net Promoter Score segment customers into three groups based on their likelihood rating: Promoters (ratings of 9 or 10), Passives (ratings of 7 or 8) and Detractors (ratings of 0 through 6). The results of the present analysis, however, suggest a slightly different segmentation. The results show that customers who give a likelihood rating of 8 are more similar to customers who give a rating of 9 or 10 (Promoters) than they are to customers who give a rating of 7 (Passives).

Second, the segmentation of customers based on the NPS trichotomy seems arbitrary. The current results support a slightly different segmentation of customers into different, more homogeneous groups. These three new customer segments are:

  1. Strong Advocates: Customers who give a likelihood rating of 8, 9 or 10. Over 99% of Strong Advocates say they would recommend.
  2. Advocates: Customers who give a likelihood rating of 7. About 80% of Advocates say they would recommend.
  3. Non-Advocates: Customers who give a likelihood rating of 0 through 6. On average, 84% of Non-Advocates say they would not recommend. These customers are labeled Detractors in NPS lingo. Relabeling this group of customers as “Non-Advocates” more honestly reflects what is being measured by the “likelihood to recommend” question. See the Word-of-Mouth Index, which more fully explores this notion.

Finally, it appears that not all Detractors are created equal. While ratings of 0 through 6 do represent low loyalty, a likelihood rating of 6 is a little more positive than the lower likelihood ratings. Improvement initiatives are typically directed broadly at customers who report low levels of loyalty, those giving a likelihood rating of 6 or less. However, it might be beneficial to further segment these at-risk customers and target improvement initiatives at those giving a rating of 6. Recall that a likelihood rating of 7 results in a substantial increase in recommending. Moving customers from a likelihood rating of 0 to 1, or 1 to 2, does very little to increase recommendations. Moving customers from a likelihood rating of 6 to 7, however, results in a substantial increase in recommending.
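
For readers who want to reproduce this kind of analysis on their own survey data, here is a minimal sketch (with made-up rows, not the dealership data) that computes the percent of “Yes” recommend responses per 0-10 likelihood rating and then applies the three-segment grouping proposed above.

```python
# Percent of "Yes" recommend responses per likelihood rating, then the proposed
# three-segment grouping. The rows below are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "likelihood": [10, 9, 8, 7, 7, 6, 6, 5, 3, 0],
    "would_recommend": ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "No"],
})

pct_yes = (df["would_recommend"].eq("Yes")
             .groupby(df["likelihood"]).mean()
             .mul(100).round(1))
print(pct_yes.sort_index(ascending=False))

def segment(rating):
    if rating >= 8:
        return "Strong Advocate"
    if rating == 7:
        return "Advocate"
    return "Non-Advocate"

df["segment"] = df["likelihood"].map(segment)
print(df["segment"].value_counts())
```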

Source by bobehayes

10 Misconceptions about Big Data

A lot of content is thrown around about Big Data and how it could change the market landscape. It is running through its hype cycle, and many bloggers, experts, consultants, and professionals are lining up to align themselves with big data. So, not everything that is known to this industry is accurate, and some misconceptions that show up here and there are discussed below.

  1. Big Data has the answer to everything: This has been floating around as part of many coffee-table conversations. Big data could help, but it’s not a magic wand that will help you find answers to everything. Certainly BigData could potentially help you answer the most cryptic questions, but it’s not for everything. So, fine-tune your expectations of what to get out of a BigData strategy.
  2. Data Scientists drive BigData: Every now and then, we stumble upon someone who claims to be a data scientist and boasts about how they are driving BigData in their company. Surely, they are all doing the important work of helping find insights in data, but BigData is already happening whether data scientists drive it or not. The BigData journey begins with capturing as much data as possible; data scientists just help steer the insights from that data. So, don’t wait on a data scientist before you start prepping for BigData.
  3. Big Data is complicated: With the escalating pay cheques of Data Scientists, it is not difficult to understand why BigData is perceived as rocket science that only a few can tame and understand. This is a pure evil that most businesses have to deal with. BigData is just more volume, velocity and variety of data; it need not be complicated. In fact, a well-designed big-data system is often scalable, simple and fast. So, breathe easy if you find your big data nicely laid out and easy to understand.
  4. The more data the better: There is a debate on how much data is effective and whether more data is better. There are certainly two schools of thought, one suggesting that the more data you have, the more you can learn from it. But I believe the effectiveness of data depends more on its quality than on its quantity. So, based on the circumstances, quality, and in some cases quantity, determines the impact you get from data.
  5. Big data is just hype: Surely, you must find yourself either for or against this statement, but that is because it is what it is. BigData is getting a lot of press hours and PR time. That is partly because there is hype, but partly because the tools for dealing with big data have unveiled the capability to address an unmanageable blob of data and parse it into insights using commodity hardware. So, the hype is evident, but there is a whole capability shift fueling it, such as the demand to handle more data and get to better insights within that data. So, big data is not just hype but a real shift in capabilities in how businesses start to look at their data.
  6. Big data is unstructured: If you have been in the BigData domain for more than a day, you must have heard rants about BigData being unstructured data. It is not true. As stated earlier, big data is just data that is beyond your expectations along 3 vectors: Volume, Velocity and Variety. So, the data could be structured or unstructured; it is the 3 Vs that define its BigData status and not the structure of the data.
  7. Data eliminates uncertainty: Data surely helps convey more information about a particular use case, but it is certainly not a guarantee of certainty. Future data is as uncertain as the market condition. Uncertainty comes into the business through various areas: the competitive landscape, customer experience, market conditions, and other business-dependent conditions. So, data is certainly not a good tool for eliminating uncertainty.
  8. We must capture everything in order to analyze our Big Data: Sure, it sounds awesome to capture everything in order to learn everything, but it’s delusional. “Everything” is a very circumstantial thing. Businesses shift their dependence from one set of KPIs to another all the time, so there could never be an exhaustive list to capture; it will keep changing with the market. Another key thing to understand is that some data sets have limited to no impact on the business, so data should be picked according to its impact on the business. And these KPIs must be re-evaluated regularly to track the changing market.
  9. Big Data systems are expensive to implement and maintain: Yes, this still exists as a misconception in many businesses. But businesses should understand that the very reason BigData is sitting in the hot seat is that commodity hardware can now be used to tackle it. So, BigData systems are not expensive anymore; their cost has been low and is getting lower. Cost should never be a deterrent to indulging in a BigData project.
  10. Big Data is for Big Companies Only: As noted in the previous point, big data tools are cheap and they run on cheap commodity hardware. So, they are accessible and no longer the dream of big corporations only. Small and mid-size companies have almost the same leverage when it comes to thinking like big corporations. So, BigData capabilities are for the strong of heart and not just the deep of pocket.

So, the BigData landscape is filled with both truth and myth; make sure to check which side your hurdle lies on before calling it quits and throwing in the towel.

Source: 10 Misconceptions about Big Data

Jul 05, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Correlation-Causation  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Why the time is ripe for security behaviour analytics by analyticsweekpick

>> #FutureOfData with @CharlieDataMine, @Oracle discussing running analytics in an enterprise by v1shal

>> CMS Predictive Readmission Models ‘Not Very Good’ by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>> Global BPO Business Analytics Market by 2023: Accenture, HP, TCS … – Healthcare Journal Under Business Analytics

>> Streaming Analytics Market: Business Opportunities, Current Trends, Market Challenges & Global Industry Analysis By … – Business Investor Under Streaming Analytics

>> Microsoft Azure Container Instances Now Production-Ready – Virtualization Review Under Virtualization

More NEWS ? Click Here

[ FEATURED COURSE]

Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz

image

Use data to build a better startup faster in partnership with Geckoboard… more

[ FEATURED READ]

The Black Swan: The Impact of the Highly Improbable

image

A black swan is an event, positive or negative, that is deemed improbable yet causes massive consequences. In this groundbreaking and prophetic book, Taleb shows in a playful way that Black Swan events explain almost eve… more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better way than to talk about our increasing dependence on data analytics to help with our decision making. Data- and analytics-driven decision making is rapidly sneaking its way into our core corporate DNA, and we are not churning out practice grounds to test those models fast enough. Such snug-looking models have hidden nails which could induce uncharted pain if they go unchecked. This is the right time to start thinking about putting an Analytics Club [Data Analytics CoE] in your workplace to help lab out the best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q:What is principal component analysis? Explain the sort of problems you would use PCA for. Also explain its limitations as a method?

A: Statistical method that uses an orthogonal transformation to convert a set of observations of correlated variables into a set of values of linearly uncorrelated variables called principal components.

Reduce the data from n to k dimensions: find the k vectors onto which to project the data so as to minimize the projection error.
Algorithm:
1) Preprocessing (standardization): PCA is sensitive to the relative scaling of the original variables
2) Compute the covariance matrix Σ
3) Compute the eigenvectors of Σ
4) Choose k principal components so as to retain x% of the variance (typically x = 99); see the sketch after this answer

Applications:
1) Compression
– Reduce disk/memory needed to store data
– Speed up the learning algorithm. Warning: the mapping should be defined only on the training set and then applied to the test set

2) Visualization: 2 or 3 principal components, so as to summarize the data

Limitations:
– PCA is not scale invariant
– The directions with largest variance are assumed to be of most interest
– Only considers orthogonal transformations (rotations) of the original variables
– PCA is only based on the mean vector and covariance matrix. Some distributions (multivariate normal) are characterized by this but some are not
– If the variables are correlated, PCA can achieve dimension reduction. If not, PCA just orders them according to their variances
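
As referenced above, here is a short sketch of that recipe using scikit-learn on random, purely illustrative data: standardize, then keep enough principal components to retain roughly 99% of the variance (scikit-learn accepts the variance fraction directly as n_components).

```python
# Standardize, then keep enough principal components to explain ~99% of variance.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 10))   # 10 correlated features, rank 3

X_std = StandardScaler().fit_transform(X)   # step 1: PCA is sensitive to scaling
pca = PCA(n_components=0.99)                # steps 2-4: covariance, eigenvectors, retain 99% variance
X_reduced = pca.fit_transform(X_std)

print("components kept:", pca.n_components_)
print("variance explained:", pca.explained_variance_ratio_.sum().round(4))
```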

Source

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData with Jon Gibs(@jonathangibs) @L2_Digital

 #BigData @AnalyticsWeek #FutureOfData with Jon Gibs(@jonathangibs) @L2_Digital

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The most valuable commodity I know of is information. – Gordon Gekko

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Joe DeCosmo, @Enova

 #BigData @AnalyticsWeek #FutureOfData #Podcast with Joe DeCosmo, @Enova

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Big data is a top business priority and drives enormous opportunity for business improvement. Wikibon’s own study projects that big data will be a $50 billion business by 2017.

Sourced from: Analytics.CLUB #WEB Newsletter