Jun 21, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Ethics  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Enterprise Data Modeling Made Easy by jelaniharper

>> Assessment of Risk Maps in Data Scientist Jobs by thomassujain

>> Mar 01, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

Wanna write? Click Here

[ NEWS BYTES]

>>
 Network Intelligence witnesses 48% growth in FY 2017-18 – Tech Observer Under  Big Data Security

>>
 Predictive and Prescriptive Analytics Market valued of USD 16.84 billion, at a CAGR of 20.43% by the end of 2023 – The Financial Analyst Under  Prescriptive Analytics

>>
 New York City Spending the Focus at Fourth Annual Business Analytics Conference – Manhattan College News Under  Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Deep Learning Prerequisites: The Numpy Stack in Python

image

The Numpy, Scipy, Pandas, and Matplotlib stack: prep for deep learning, machine learning, and artificial intelligence… more

[ FEATURED READ]

The Misbehavior of Markets: A Fractal View of Financial Turbulence

image

Mathematical superstar and inventor of fractal geometry, Benoit Mandelbrot, has spent the past forty years studying the underlying mathematics of space and natural patterns. What many of his followers don’t realize is th… more

[ TIPS & TRICKS OF THE WEEK]

Data aids, not replace judgement
Data is a tool and means to help build a consensus to facilitate human decision-making but not replace it. Analysis converts data into information, information via context leads to insight. Insights lead to decision making which ultimately leads to outcomes that brings value. So, data is just the start, context and intuition plays a role.

[ DATA SCIENCE Q&A]

Q:What is cross-validation? How to do it right?
A: It’s a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. Mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model in the training phase (i.e. validation data set) in order to limit problems like overfitting, and get an insight on how the model will generalize to an independent data set.

Examples: leave-one-out cross validation, K-fold cross validation

How to do it right?

the training and validation data sets have to be drawn from the same population
predicting stock prices: trained for a certain 5-year period, it’s unrealistic to treat the subsequent 5-year a draw from the same population
common mistake: for instance the step of choosing the kernel parameters of a SVM should be cross-validated as well
Bias-variance trade-off for k-fold cross validation:

Leave-one-out cross-validation: gives approximately unbiased estimates of the test error since each training set contains almost the entire data set (n?1n?1 observations).

But: we average the outputs of n fitted models, each of which is trained on an almost identical set of observations hence the outputs are highly correlated. Since the variance of a mean of quantities increases when correlation of these quantities increase, the test error estimate from a LOOCV has higher variance than the one obtained with k-fold cross validation

Typically, we choose k=5 or k=10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor high variance.
Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Health Informatics Analytics

 @AnalyticsWeek Panel Discussion: Health Informatics Analytics

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

What we have is a data glut. – Vernon Vinge

[ PODCAST OF THE WEEK]

Want to fix #DataScience ? fix #governance by @StephenGatchell @Dell #FutureOfData #Podcast

 Want to fix #DataScience ? fix #governance by @StephenGatchell @Dell #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

The data volumes are exploding, more data has been created in the past two years than in the entire previous history of the human race.

Sourced from: Analytics.CLUB #WEB Newsletter

2016 Trends in Big Data Governance: Modeling the Enterprise

A number of changes in the contemporary data landscape have affected the implementation of data governance. The normalization of big data has resulted in a situation in which such deployments are so common that they’re merely considered a standard part of data management. The confluence of technologies largely predicated on big data—cloud, mobile and social—are gaining similar prominence, transforming the expectations of not only customers but business consumers of data.

Consequently, the demands for big data governance are greater than ever, as organizations attempt to implement policies to reflect their corporate values and sate customer needs in a world in which increased regulatory consequences and security breaches are not aberrations.

The most pressing developments for big data governance in 2016 include three dominant themes. Organizations need to enforce it outside the corporate firewalls via the cloud, democratize the level of data stewardship requisite for the burgeoning self-service movement, and provide metadata and semantic consistency to negate the impact of silos while promoting sharing of data across the enterprise.

These objectives are best achieved with a degree of foresight and stringency that provides a renewed emphasis on modeling in its myriad forms. According to TopQuadrant co-founder, executive VP and director of TopBraid Technologies Ralph Hodgson, “What you find is the meaning of data governance is shifting. I sometimes get criticized for saying this, but it’s shifting towards a sense of modeling the enterprise.”

In the Cloud

Perhaps the single most formidable challenge facing big data governance is accounting for the plethora of use cases involving the cloud, which appears tailored for the storage and availability demands of big data deployments. These factors, in conjunction with the analytics options available from third-party providers, make utilizing the cloud more attractive than ever. However, cloud architecture challenges data governance in a number of ways including:

  • Semantic modeling: Each cloud application has its own semantic model. Without dedicated governance measures on the part of an organization, integrating those different models can hinder data’s meaning and its reusability.
  • Service provider models: Additionally, each cloud service provider has its own model which may or may not be congruent with enterprise models for data. Organizations have to account for these models as well as those at the application level.
  • Metadata: Applications and cloud providers also have disparate metadata standards which need to be reconciled. According to Tamr Global Head of Strategy, Operations and Marketing Nidhi Aggarwal, “Seeing the metadata is important from a governance standpoint because you don’t want the data available to anybody. You want the metadata about the data transparent.” Vendor lock-in in the form of proprietary metadata issued by providers and their applications can be a problem too—especially since such metadata can encompass an organization’s so that it effectively belongs to the provider.

Rectifying these issues requires a substantial degree of planning prior to entering into service level agreements. Organizations should consider both current and future integration plans and their ramifications for semantics and metadata, which is part of the basic needs assessment that accompanies any competent governance program. Business input is vital to this process. Methods for addressing these cloud-based points of inconsistency include transformation and writing code, or adopting enterprise-wide semantic models via ontologies, taxonomies, and RDF graphs. The critical element is doing so in a way that involves the provider prior to establishing service.

The Democratization of Data Stewardship

The democratization of big data is responsible for an emergence of what Gartner refers to as ‘citizen stewardship’ in two capital ways. The popularity of data lakes and the availability of data preparation tools with cognitive computing capabilities are empowering end users to assert more control over their data. The result is a shifting from the centralized model of data stewardship (which typically encompassed stewards from both the business and IT, the former in accordance to domains) to a decentralized one in which virtually everyone actually using data plays a role in its stewardship.

Both preparation tools and data lakes herald this movement by giving end users the opportunity to perform data integration. Machine learning technologies inform the former and can identify which data is best integrated with others on an application or domain-wide basis. The celerity of this self-service access and integration to data necessitates that the onus of integrating data in accordance to governance policy falls on the end user. Preparation tools can augment that process by facilitating ETL and other forms of action with machine learning algorithms, which can maintain semantic consistency.

Data lakes equipped with semantic capabilities can facilitate a number of preparation functions from initial data discovery to integration while ensuring the sort of metadata and semantic consistency for proper data governance. Regardless, “if you put data in a data lake, there still has to be some metadata associated with it,” MapR Chief Marketing Officer Jack Norris explained. “You need some sort of schema that’s defined so you can accomplish self-service.”

Metadata and Semantic Consistency

No matter what type of architecture is employed (either cloud or on-premise), consistent metadata and semantics represent the foundation of secure governance once enterprise wide policies based on business objectives are formulated. As noted by Franz CEO Jans Aasman, “That’s usually how people define data governance: all the processes that enable you to have more consistent data”. Perhaps the most thorough means of ensuring consistency in these two aspects of governance involves leveraging a data lake or single repository enriched with semantic technologies. The visual representation of data elements on an RDF graph is accessible for end user consumption, while semantic models based on ontological descriptions of data elements clarify their individual meanings. These models can be mapped to metadata to grant uniformity in this vital aspect of governance and provide semantic consistency on diverse sets of big data.

Alternatively, it is possible to achieve metadata consistency via processes instead of technologies. Doing so is more tenuous, yet perhaps preferable to organizations still utilizing a silo approach among different business domains. Sharing and integrating that data is possible through the means of an enterprise-wide governance council with business membership across those domains, which rigorously defines and monitors metadata attributes so that there is still a common semantic model across units. This approach might behoove less technologically savvy organizations, although the sustainment of such councils could become difficult. Still, this approach results in consistent metadata and semantic models on disparate sets of big data.

Enterprise Modeling

The emphasis on modeling that is reflected in all of these trends substantiates the viewpoint that effective big data governance requires strident modeling. Moreover, it is important to implement at a granular level so that data is able to be reused and maintain its meaning across different technologies, applications, business units, and personnel changes. The degree of prescience and planning required to successfully model the enterprise to ensure governance objectives are met will be at the forefront of governance concerns in 2016, whether organizations are seeking new data management solutions or refining established ones. In this respect, governance is actually the foundation upon which data management rests. According to Cambridge Semantics president Alok Prasad, “Even if you are the CEO, you will not go against your IT department in terms of security and governance. Even if you can get a huge ROI, if the governance and security are not there you will not adopt a solution.”

 

Originally Posted at: 2016 Trends in Big Data Governance: Modeling the Enterprise

February 13, 2017 Health and Biotech analytics news roundup

News and commentary about health and biotech data:

How data science is transforming cancer treatment scheduling: Scheduling appointments is a problem that many electronic systems are not able to handle efficiently. New holistic approaches inspired by manufacturing are helping to improve this process.

Data Analytics May Keep Cancer Patients out of Emergency Departments: Researchers at the UPenn school of medicine are developing a model to predict when patients need emergency care, as well as best practices for when they need such care.

Why practices are struggling to exchange records: A recent survey found some key issues in the exchange of electronic health records, including a lack of confidence in the technology and difficulties transferring data across different systems.

Pharmacist and drug associations want better data on medicine shortages: The associations called for more up-to-date information about possible shortages from providers.

Secrets of Life in a Spoonful of Blood: Researchers have increasingly powerful tools to study fetal development, including sequencing DNA found in the mother’s blood.

Originally Posted at: February 13, 2017 Health and Biotech analytics news roundup by pstein

Jun 14, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Productivity  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> What is data security For Organizations? by analyticsweekpick

>> Data Modeling Tomorrow: Self-Describing Data Formats by jelaniharper

>> Data-As-A-Service to enable compliance reporting by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>>
 Amazon’s cloud business is using this weird 1976 ‘Saturday Night Live’ skit to explain its new blockchain product – Business Insider Under  Cloud

>>
 Virtualization Software Market Overview, Cost Structure Analysis, Growth Opportunities and Forecast to 2023 – Exclusive Reportage Under  Virtualization

>>
 Stellar Lumens [XLM], thrusters burned out? – Sentiment Analysis – April 16 – AMBCrypto Under  Sentiment Analysis

More NEWS ? Click Here

[ FEATURED COURSE]

Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz

image

Use data to build a better startup faster in partnership with Geckoboard… more

[ FEATURED READ]

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World

image

In the world’s top research labs and universities, the race is on to invent the ultimate learning algorithm: one capable of discovering any knowledge from data, and doing anything we want, before we even ask. In The Mast… more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we are heading into winter what better way but to talk about our increasing dependence on data analytics to help with our decision making. Data and analytics driven decision making is rapidly sneaking its way into our core corporate DNA and we are not churning practice ground to test those models fast enough. Such snugly looking models have hidden nails which could induce unchartered pain if go unchecked. This is the right time to start thinking about putting Analytics Club[Data Analytics CoE] in your work place to help Lab out the best practices and provide test environment for those models.

[ DATA SCIENCE Q&A]

Q:What is: collaborative filtering, n-grams, cosine distance?
A: Collaborative filtering:
– Technique used by some recommender systems
– Filtering for information or patterns using techniques involving collaboration of multiple agents: viewpoints, data sources.
1. A user expresses his/her preferences by rating items (movies, CDs.)
2. The system matches this user’s ratings against other users’ and finds people with most similar tastes
3. With similar users, the system recommends items that the similar users have rated highly but not yet being rated by this user

n-grams:
– Contiguous sequence of n items from a given sequence of text or speech
– ‘Andrew is a talented data scientist”
– Bi-gram: ‘Andrew is”, ‘is a”, ‘a talented”.
– Tri-grams: ‘Andrew is a”, ‘is a talented”, ‘a talented data”.
– An n-gram model models sequences using statistical properties of n-grams; see: Shannon Game
– More concisely, n-gram model: P(Xi|Xi?(n?1)…Xi?1): Markov model
– N-gram model: each word depends only on the n?1 last words

Issues:
– when facing infrequent n-grams
– solution: smooth the probability distributions by assigning non-zero probabilities to unseen words or n-grams
– Methods: Good-Turing, Backoff, Kneser-Kney smoothing

Cosine distance:
– How similar are two documents?
– Perfect similarity/agreement: 1
– No agreement : 0 (orthogonality)
– Measures the orientation, not magnitude

Given two vectors A and B representing word frequencies:
cosine-similarity(A,B)=?A,B?/||A||?||B||

Source

[ VIDEO OF THE WEEK]

Making sense of unstructured data by turning strings into things

 Making sense of unstructured data by turning strings into things

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

In God we trust. All others must bring data. – W. Edwards Deming

[ PODCAST OF THE WEEK]

#BigData #BigOpportunity in Big #HR by @MarcRind #JobsOfFuture #Podcast

 #BigData #BigOpportunity in Big #HR by @MarcRind #JobsOfFuture #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Big data is a top business priority and drives enormous opportunity for business improvement. Wikibon’s own study projects that big data will be a $50 billion business by 2017.

Sourced from: Analytics.CLUB #WEB Newsletter

How to Use Social Media to Find Customers (Infographic)

How to Use Social Media to Find Customers (Infographic)
How to Use Social Media to Find Customers (Infographic)

Everyone talks about how important it is to be on Social Media.  But how do you use it to gather more customers?  Well, Kathleen Davis has shared some information compiled by Wishpond.

 

Did you know that 77% of B2C have found new customers through Facebook, while LinkedIn has proven significant for B2B. Try 277% more effective than Facebook. Those are some outstanding numbers!

 

Check out the rest in the Infographic below.

 

Lessons Big Data Projects Could Use From Startups

 

source

Source

The Pitch Deck We Used To Raise $500,000 For Our Startup

The Pitch Deck We Used To Raise $500,000 For Our Startup
The Pitch Deck We Used To Raise $500,000 For Our Startup

I came across Pitch Deck used by Buffer co-founders Joel Gascoigne and Leo Widrich to raise $500k round. This deck has lots of good information and it could be really useful for other startups seeking to raise capital.

Why it is relevant for most of startups? Because buffer founders were also first timers.

One of the big no-no’s we’ve learnt about early on in Silicon Valley is to publicly share the pitchdeck you’ve used to raise money. At least, not before you’ve been acquired or failed or in any other way been removed from stage. That’s a real shame, we thought. Sharing the actual slidedeck we used (and one, that’s not 10 years old) is by far one of the most useful things for others to learn from. In fact, both Joel and I have privately shared the deck with fledging founders to help them with their fundraising. On top of that, our case study is hopefully uniquely insightful for lots of people. Here is why:
Half a million is not a crazy amount: It’s therefore hopefully an example that helps the widest range of founders trying to raise money.
Both Joel and myself are first-timers: We couldn’t just throw big names onto a slideshow and ride with it. We had to test and change the flow and deck a lot.

As a summary: this deck is build upto one key slide: Traction.

So without further ado – have a look at their pitch deck.

ps: If you want to read more? check out OnStartups.com

Source by v1shal

Jun 07, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Tour of Accounting  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> December 5, 2016 Health and Biotech analytics news roundup by pstein

>> Is data analytics about causes … or correlations? by analyticsweekpick

>> Mar 15, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

Wanna write? Click Here

[ NEWS BYTES]

>>
 Getting Data Science Right: How To Structure Data Science Teams For Maximum Results – Forbes Under  Data Science

>>
 4 Companies Betting on Cloud Gaming – Motley Fool Under  Cloud

>>
 From Hawkeye country to the Vikings with data analytics – The Gazette: Eastern Iowa Breaking News and Headlines Under  Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Deep Learning Prerequisites: The Numpy Stack in Python

image

The Numpy, Scipy, Pandas, and Matplotlib stack: prep for deep learning, machine learning, and artificial intelligence… more

[ FEATURED READ]

How to Create a Mind: The Secret of Human Thought Revealed

image

Ray Kurzweil is arguably today’s most influential—and often controversial—futurist. In How to Create a Mind, Kurzweil presents a provocative exploration of the most important project in human-machine civilization—reverse… more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we are heading into winter what better way but to talk about our increasing dependence on data analytics to help with our decision making. Data and analytics driven decision making is rapidly sneaking its way into our core corporate DNA and we are not churning practice ground to test those models fast enough. Such snugly looking models have hidden nails which could induce unchartered pain if go unchecked. This is the right time to start thinking about putting Analytics Club[Data Analytics CoE] in your work place to help Lab out the best practices and provide test environment for those models.

[ DATA SCIENCE Q&A]

Q:What is the Central Limit Theorem? Explain it. Why is it important?
A: The CLT states that the arithmetic mean of a sufficiently large number of iterates of independent random variables will be approximately normally distributed regardless of the underlying distribution. i.e: the sampling distribution of the sample mean is normally distributed.
– Used in hypothesis testing
– Used for confidence intervals
– Random variables must be iid: independent and identically distributed
– Finite variance

Source

[ VIDEO OF THE WEEK]

#FutureOfData with Rob(@telerob) / @ConnellyAgency on running innovation in agency

 #FutureOfData with Rob(@telerob) / @ConnellyAgency on running innovation in agency

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The temptation to form premature theories upon insufficient data is the bane of our profession. – Sherlock Holmes

[ PODCAST OF THE WEEK]

Jeff Palmucci @TripAdvisor discusses managing a #MachineLearning #AI Team

 Jeff Palmucci @TripAdvisor discusses managing a #MachineLearning #AI Team

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes of data.

Sourced from: Analytics.CLUB #WEB Newsletter

Best Practices For Building Talent In Analytics

Best practice pinned on noticeboard

Companies across all industries depend more and more on analytics and insights to run their businesses profitably. But, attracting, managing and retaining talented personnel to execute on those strategies remains a challenge. This is not the case for consumer products heavyweight The Procter & Gamble Company (P&G), which has been at the top of its analytics game for 50 years now.

During the 2014 Retail/Consumer Goods Analytics Summit, Glenn Wegryn, retired associate director of analytics for P&G, shared best practices for building the talent capabilities required to ensure success. A leadership council is in charge of sharing analytics best practices across P&G — breaking down silos to make sure the very best talent is being leveraged to solve the company’s most pressing business issues.

So, what are the characteristics of a great data analyst and where can you find them?

“I always look for people with solid quantitative backgrounds because that is the hardest thing to learn on the job,” said Wegryn.

Combine that with mature communication skills and a talent for business acumen and you’ve got the perfect formula for a great data analyst.

When it comes to sourcing analytics, Wegryn says companies have an important strategic decision to make: Do you build it internally, leveraging resources like consultants and universities? Do you buy it from a growing community of technology solution providers? Or, do you adopt a hybrid model?

“Given the explosion of business analytics programs across the country, your organization should find ample opportunities to tap into those resources,” advised Wegryn.

To retain and nurture your organization’s business analysts, Wegryn recommended creating a career path that grows and the importance of encouraging talented personnel internally until they reach a trusted CEO advisory role.

Wegryn also shared key questions an organization should ask to unleash the value of analytics, and suggested that analytics should always start and end with a decision.

“You make a decision in business that leads to action that gleans insights that leads to another decision,” he said. “While the business moves one way, the business analyst works backward in a focused, disciplined and controlled manner.”

Perhaps most importantly, the key to building the talent capability to ensure analytics success came from P&G’s retired chairman, president and CEO Bob McDonald: “… having motivation from the top helps.”

Wegryn agreed: “It really helps when the person at the top of the chain is driven on data.”

The inaugural Retail & Consumer Goods Analytics Summit event was held September 11-12, 2014 at the W Hotel in San Diego, California. The conference featured keynotes from retail and consumer goods leaders, peer-to-peer exchanges and relationship building.

Article originally appeared HERE.

Originally Posted at: Best Practices For Building Talent In Analytics by analyticsweekpick