Jun 28, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Accuracy  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Enterprise Architecture for the Internet of Things: Containerization and Microservices by jelaniharper

>> 25 Hilarious Geek Quotes for Geek-Wear by v1shal

>> May 22, 2017 Health and Biotech analytics news roundup by pstein

Wanna write? Click Here

[ NEWS BYTES]

>> What the country’s first undergrad program in artificial intelligence will look like – EdScoop News Under Artificial Intelligence

>> Rugby Statistics: Cooney’s figures still stand up to scrutiny – Irish Times Under Statistics

>> Witad Awards 2018 Write-Ups: Data Scientist of the Year … – www.waterstechnology.com Under Data Scientist

More NEWS ? Click Here

[ FEATURED COURSE]

The Analytics Edge


This is an Archived Course
EdX keeps courses open for enrollment after they end to allow learners to explore content and continue learning. Not all features and materials may be available, and course content will not be… more

[ FEATURED READ]

The Future of the Professions: How Technology Will Transform the Work of Human Experts


This book predicts the decline of today’s professions and describes the people and systems that will replace them. In an Internet society, according to Richard Susskind and Daniel Susskind, we will neither need nor want … more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics to help with our decision making. Data- and analytics-driven decision making is rapidly sneaking its way into our core corporate DNA, yet we are not building practice grounds to test those models fast enough. Such snug-looking models have hidden nails that can cause uncharted pain if they go unchecked. This is the right time to start thinking about putting an Analytics Club [Data Analytics CoE] in your workplace to help lab out best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q:Explain what a local optimum is and why it is important in a specific context,
such as K-means clustering. What are specific ways of determining if you have a local optimum problem? What can be done to avoid local optima?

A: * A solution that is optimal within a neighboring set of candidate solutions
* In contrast with the global optimum: the optimal solution among all candidate solutions

* K-means clustering context:
It’s proven that the objective cost function will always decrease until a local optimum is reached.
Results will depend on the initial random cluster assignment

* Determining if you have a local optimum problem:
Tendency of premature convergence
Different initialization induces different optima

* Avoid local optima in a K-means context: repeat K-means and take the solution that has the lowest cost
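A minimal sketch of that restart strategy, assuming scikit-learn is available; the data, K and number of restarts below are illustrative, not from the newsletter:

# Hypothetical sketch: restart K-means several times and keep the lowest-cost run.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))          # toy data standing in for a real feature matrix
K, N_RESTARTS = 3, 10

best_model, best_cost = None, np.inf
for seed in range(N_RESTARTS):
    km = KMeans(n_clusters=K, n_init=1, random_state=seed).fit(X)
    if km.inertia_ < best_cost:        # inertia_ = within-cluster sum of squares (the cost)
        best_cost, best_model = km.inertia_, km

print(f"best cost over {N_RESTARTS} restarts: {best_cost:.2f}")
print(best_model.cluster_centers_)     # centroids of the lowest-cost run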

Source

[ VIDEO OF THE WEEK]

@ReshanRichards on creating a learning startup for preparing for #FutureOfWork #JobsOfFuture #Podcast


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Without big data, you are blind and deaf and in the middle of a freeway. – Geoffrey Moore

[ PODCAST OF THE WEEK]

@DrewConway on creating socially responsible data science practice #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

29 percent report that their marketing departments have ‘too little or no customer/consumer data.’ Even when data is collected by marketers, it is often not suited to real-time decision making.

Sourced from: Analytics.CLUB #WEB Newsletter

How oil and gas firms are failing to grasp the necessity of Big Data analytics

An explosion in information volumes and processing power is transforming the energy sector. Even the major players are struggling to catch up.

The business of oil and gas profit-making takes place increasingly in the realm of bits and bytes. The information explosion is everywhere, be it in the geosciences, engineering and management or even on the financial and regulatory sides. The days of easy oil are running out; unconventional plays are becoming the norm. For producers that means operations are getting trickier, more expensive and data-intensive.

“Companies are spending a lot of money on IT. Suncor alone spends about $500 million per year.”

Thirty years ago geoscientists could get their work done by scribbling on paper; today they are watching well data flow, in real time and by the petabyte, across their screens. Despite what many think, the challenge for them doesn’t lie in storing the mountains of data. That’s the easy part. The challenge is more about building robust IT infrastructures that holistically integrate operations data and enable different systems and sensors to talk to each other. With greater transparency over the data, operators can better analyze it and draw actionable insights that bring real competitive value.

“Even the big guys aren’t progressive in this area,” says Nicole Jardin, CEO of Emerald Associates, a Calgary-based firm that provides project management solutions from Oracle. “They often make decisions without real big data analytics and collaborative tools. But people aren’t always ready for the level of transparency that’s now possible.” Asked why a company would not automatically buy into a solution that would massively help decision-makers, her answer is terse: “Firefighters want glory.”

The suggestion is, of course, that many big data management tools are so powerful that they can dramatically de-risk oil and gas projects. Many problems end up much more predictable and avoidable. As a result, people whose jobs depend on solving those problems and putting out fires see their livelihoods threatened by this IT trend. Resistance and suspicion, always a dark side of any corporate culture, rear their ugly heads.

On the other hand, more progressive companies have already embraced the opportunities of big data. They don’t need convincing and have long since moved from resistance to enthusiastic adoption. They have grown shrewder and savvier and base their IT investments very objectively according to cost-benefit metrics. The central question for vendors: “So what’s the ROI?”

There is big confusion about big data, and there are different views about where the oil and gas industry is lagging in terms of adopting cutting-edge tools. Scott Fawcett, director at Alberta Innovates – Technology Futures in Calgary and a former executive at global technology companies like Apptio, SAP SE and Cisco Systems, points out that this is not small potato stuff. “There has been an explosion of data. How are you to deal with all the data coming in in terms of storage, processing, analytics? Companies are spending a lot of money on IT. Suncor alone spends about $500 million per year.” He then adds, “And that’s even at a time when memory costs have plummeted.”

 

The big data story had its modest beginnings in the 1980s, with the introduction of the first systems that allowed the energy industry to put data in a digital format. Very suddenly, the traditional characteristics of oil and gas and other resource industries – often unfairly snubbed as a field of “hewers of wood and carriers of water” – changed fundamentally. The shift was from an analog to a digital business template; operations went high-tech.

It was also the beginning of what The Atlantic writer Jonathan Rauch has called the “new old economy.” With the advent of digitization, innovation accelerated and these innovations cross-fertilized each other in an ever-accelerating positive feedback loop. “Measurement-while-drilling, directional drilling and 3-D seismic imaging not only developed simultaneously but also developed one another,” wrote Rauch. “Higher resolution seismic imaging increased the payoff for accurate drilling, and so companies scrambled to invest in high-tech downhole sensors; power sensors, in turn, increased yields and hence the payoff for expensive directional drilling; and faster, cheaper directional drilling increased the payoff for still higher resolution from 3-D seismic imaging.”

One of the biggest issues in those early days was storage, but when that problem was more or less solved, the industry turned to the next challenge of improving the processing and analysis of the enormous and complex data sets it collects daily. Traditional data applications such as Microsoft Excel were hopelessly inadequate for the task.

In fact, the more data and analytical capacities the industry got, the more it wanted. It wasn’t long ago that E&P companies would evaluate an area and then drill a well. Today, companies still evaluate then drill, but the data collected in real time from the drilling is entered into the system to guide planning for the next well. Learnings are captured and their value compounded immediately. In the process, the volume of collected data mushrooms.

The label “big data” creates confusion, just as does the term Big Oil. The “big” part of big data is widely misunderstood. It is, therefore, helpful to define big data with the three v’s of volume, velocity and variety. With regard to the first “v,” technology analysts International Data Corp. estimated that there were 2.7 zettabytes of data worldwide as of March 2012. A zettabyte equals 1.1 trillion gigabytes. The amount of data in the world doubles each year, and the data in the oil and gas industry, which makes up a non-trivial part of the data universe, keeps flooding in from every juncture along the exploration, production and processing value chain.

Velocity, the second “v,” refers to the speed at which the volume of data is accumulating. This is caused by the fact that, in accordance with Moore’s famous law, computational power keeps increasing exponentially, storage costs keep falling, and communication and ubiquitous smart technology keep generating more and more information.

“In the old days, people were driving around in trucks, measuring things. Now there are sensors that do that work.”

On the velocity side, Scott Fawcett says, “In the old days people were driving around in trucks, measuring things. Now there are sensors doing that work.” Sensors are everywhere in operations now. Just in their downhole deployment, there are flowmeters and pressure, temperature and vibration gauges, as well as acoustic and electromagnetic sensors.

Big data analytics is the ability to assess and draw rich insights from data sets so decision-makers can better de-risk projects. Oil and gas companies commonly focus their big data efforts on logistics and optimization, according to Dale Sperrazza, general manager Europe and sub-Saharan Africa at Halliburton Landmark. If this focus is too one-sided, companies may end up just optimizing a well drilled in a suboptimal location.

“So while there is great value in big data and advanced analytics for oilfield operations and equipment, no matter if the sand truck shows up on time, drilling times are reduced and logistical delays are absolutely minimized, a poorly chosen well is a poorly performing well,” writes Luther Birdzell in the blog OAG Analytics.

Birdzell goes on to explain that the lack of predictive analytics results in about 25 per cent of the wells in large U.S. resource plays underperforming, at a cost of roughly $10 million per well. After all, if a company fails to have enough trucks to haul away production from a site before a storage facility fills up, then the facility shuts down. Simply put, when a facility is shut down, production is deferred, deferred production is deferred revenue, and deferred revenue can be the kiss of death for companies in fragile financial health.

The application of directional drilling and hydraulic multi-stage fracturing to hydrocarbon-rich source rocks has made the petroleum business vastly more complex, according to the Deloitte white paper The Challenge of Renaissance, and this complexity can only be managed by companies with a real mastery of big data and its analytical tools. The age of easy oil continues to fade out while the new data- and technology-driven age of “hard oil” is taking center stage. The capital costs of unconventional oil and gas plays are now so high and the technical requirements so convoluted, the margins for error have grown very small. Decision-makers can’t afford to make too many bad calls.

Despite the investments companies are putting into data-generating tools like sensors, much of the data is simply discarded, because the right infrastructure is missing. “IT infrastructure should not be confused with just storage; it is rather the capacity to warehouse and model data,” according to Nicole Jardin at Emerald Associates. If the right infrastructure is in place, the sensor-generated data could be deeply analyzed and opportunities identified for production, safety or environmental improvements.

Today, operators are even introducing automated controls that register data anomalies and point to the possible imminent occurrence of dangerous events. Behind these automated controls are predictive models which monitor operational processes in real time. They are usually coupled with systems that not only alert companies to issues but also make recommendations to deal with them. Pipelines are obviously investing heavily in these systems, but automated controls are part of a much larger development now sweeping across all industries and broadly called “the Internet of things” or “the industrial Internet.”
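To make the idea concrete, here is a deliberately simplified sketch of that kind of automated control: a rolling baseline over recent sensor readings and an alert when a new reading deviates too far. The threshold, window size and simulated pressure series are invented for the example and are not from the article.

# Illustrative anomaly-alert sketch only, not any vendor's system.
import numpy as np

rng = np.random.default_rng(7)
pressure = rng.normal(100.0, 1.0, size=500)  # simulated sensor stream
pressure[400:] += 8.0                        # simulate an abnormal excursion

WINDOW, THRESHOLD = 50, 4.0                  # rolling baseline length and z-score cutoff
for t in range(WINDOW, len(pressure)):
    baseline = pressure[t - WINDOW:t]
    z = (pressure[t] - baseline.mean()) / baseline.std()
    if abs(z) > THRESHOLD:
        print(f"t={t}: reading {pressure[t]:.1f} deviates {z:.1f} sigma from baseline")
        break                                # in practice this would raise an alert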

“In the ’80s, when data was being stored digitally, it was fragmented with systems that weren’t capable of communicating with each other,” Fawcett says. The next wave in big data is toward the holistic view of data system de-fragmentation and integration. “Ultimately,” Jardin says, “in order to analyze data, you need to federate it. Getting all the parts to speak to each other should now be high priority for competitively minded energy companies.”

Originally posted via “How oil and gas firms are failing to grasp the necessity of Big Data analytics”

Source: How oil and gas firms are failing to grasp the necessity of Big Data analytics by analyticsweekpick

Jun 21, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Ethics  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Enterprise Data Modeling Made Easy by jelaniharper

>> Assessment of Risk Maps in Data Scientist Jobs by thomassujain

>> Mar 01, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

Wanna write? Click Here

[ NEWS BYTES]

>> Network Intelligence witnesses 48% growth in FY 2017-18 – Tech Observer Under Big Data Security

>> Predictive and Prescriptive Analytics Market valued of USD 16.84 billion, at a CAGR of 20.43% by the end of 2023 – The Financial Analyst Under Prescriptive Analytics

>> New York City Spending the Focus at Fourth Annual Business Analytics Conference – Manhattan College News Under Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Deep Learning Prerequisites: The Numpy Stack in Python


The Numpy, Scipy, Pandas, and Matplotlib stack: prep for deep learning, machine learning, and artificial intelligence… more

[ FEATURED READ]

The Misbehavior of Markets: A Fractal View of Financial Turbulence


Mathematical superstar and inventor of fractal geometry, Benoit Mandelbrot, has spent the past forty years studying the underlying mathematics of space and natural patterns. What many of his followers don’t realize is th… more

[ TIPS & TRICKS OF THE WEEK]

Data aids, not replaces, judgement
Data is a tool and a means to help build consensus and facilitate human decision-making, not replace it. Analysis converts data into information; information, via context, leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start; context and intuition also play a role.

[ DATA SCIENCE Q&A]

Q:What is cross-validation? How to do it right?
A: It’s a model validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and one wants to estimate how accurately a model will perform in practice. The goal of cross-validation is to define a data set to test the model on during the training phase (i.e. a validation data set) in order to limit problems like overfitting and to gain insight into how the model will generalize to an independent data set.

Examples: leave-one-out cross validation, K-fold cross validation

How to do it right?

the training and validation data sets have to be drawn from the same population
predicting stock prices: if a model is trained on a certain 5-year period, it’s unrealistic to treat the subsequent 5-year period as a draw from the same population
common mistake: for instance, the step of choosing the kernel parameters of an SVM should be cross-validated as well
Bias-variance trade-off for k-fold cross validation:

Leave-one-out cross-validation: gives approximately unbiased estimates of the test error since each training set contains almost the entire data set (n−1 observations).

But: we average the outputs of n fitted models, each of which is trained on an almost identical set of observations; hence the outputs are highly correlated. Since the variance of a mean of quantities increases when the correlation between those quantities increases, the test error estimate from LOOCV has higher variance than the one obtained with k-fold cross-validation.

Typically, we choose k=5 or k=10, as these values have been shown empirically to yield test error estimates that suffer neither from excessively high bias nor high variance.
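As an illustration of the “common mistake” above, here is a minimal sketch of k-fold cross-validation with the SVM kernel parameters tuned inside each training fold (nested CV). It assumes scikit-learn; the toy data set and parameter grid are made up for the example.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)  # toy data

inner = KFold(n_splits=5, shuffle=True, random_state=1)   # tunes C and gamma
outer = KFold(n_splits=5, shuffle=True, random_state=2)   # estimates the test error

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=inner)
scores = cross_val_score(grid, X, y, cv=outer)             # nested cross-validation

print(f"estimated accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")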
Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Health Informatics Analytics


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

What we have is a data glut. – Vernor Vinge

[ PODCAST OF THE WEEK]

Want to fix #DataScience ? fix #governance by @StephenGatchell @Dell #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Data volumes are exploding: more data has been created in the past two years than in the entire previous history of the human race.

Sourced from: Analytics.CLUB #WEB Newsletter

2016 Trends in Big Data Governance: Modeling the Enterprise

A number of changes in the contemporary data landscape have affected the implementation of data governance. The normalization of big data has resulted in a situation in which such deployments are so common that they’re merely considered a standard part of data management. The confluence of technologies largely predicated on big data—cloud, mobile and social—are gaining similar prominence, transforming the expectations of not only customers but business consumers of data.

Consequently, the demands for big data governance are greater than ever, as organizations attempt to implement policies to reflect their corporate values and sate customer needs in a world in which increased regulatory consequences and security breaches are not aberrations.

The most pressing developments for big data governance in 2016 include three dominant themes. Organizations need to enforce it outside the corporate firewalls via the cloud, democratize the level of data stewardship requisite for the burgeoning self-service movement, and provide metadata and semantic consistency to negate the impact of silos while promoting sharing of data across the enterprise.

These objectives are best achieved with a degree of foresight and stringency that provides a renewed emphasis on modeling in its myriad forms. According to TopQuadrant co-founder, executive VP and director of TopBraid Technologies Ralph Hodgson, “What you find is the meaning of data governance is shifting. I sometimes get criticized for saying this, but it’s shifting towards a sense of modeling the enterprise.”

In the Cloud

Perhaps the single most formidable challenge facing big data governance is accounting for the plethora of use cases involving the cloud, which appears tailored for the storage and availability demands of big data deployments. These factors, in conjunction with the analytics options available from third-party providers, make utilizing the cloud more attractive than ever. However, cloud architecture challenges data governance in a number of ways including:

  • Semantic modeling: Each cloud application has its own semantic model. Without dedicated governance measures on the part of an organization, integrating those different models can hinder data’s meaning and its reusability.
  • Service provider models: Additionally, each cloud service provider has its own model which may or may not be congruent with enterprise models for data. Organizations have to account for these models as well as those at the application level.
  • Metadata: Applications and cloud providers also have disparate metadata standards which need to be reconciled. According to Tamr Global Head of Strategy, Operations and Marketing Nidhi Aggarwal, “Seeing the metadata is important from a governance standpoint because you don’t want the data available to anybody. You want the metadata about the data transparent.” Vendor lock-in in the form of proprietary metadata issued by providers and their applications can be a problem too—especially since such metadata can encompass an organization’s own metadata so that it effectively belongs to the provider.

Rectifying these issues requires a substantial degree of planning prior to entering into service level agreements. Organizations should consider both current and future integration plans and their ramifications for semantics and metadata, which is part of the basic needs assessment that accompanies any competent governance program. Business input is vital to this process. Methods for addressing these cloud-based points of inconsistency include transformation and writing code, or adopting enterprise-wide semantic models via ontologies, taxonomies, and RDF graphs. The critical element is doing so in a way that involves the provider prior to establishing service.

The Democratization of Data Stewardship

The democratization of big data is responsible for the emergence of what Gartner refers to as ‘citizen stewardship’ in two key ways. The popularity of data lakes and the availability of data preparation tools with cognitive computing capabilities are empowering end users to assert more control over their data. The result is a shift from the centralized model of data stewardship (which typically encompassed stewards from both the business and IT, the former organized by domain) to a decentralized one in which virtually everyone actually using data plays a role in its stewardship.

Both preparation tools and data lakes herald this movement by giving end users the opportunity to perform data integration. Machine learning technologies inform the former and can identify which data is best integrated with other data on an application or domain-wide basis. The speed of this self-service access to and integration of data means that the onus of integrating data in accordance with governance policy falls on the end user. Preparation tools can augment that process by facilitating ETL and other forms of action with machine learning algorithms, which can maintain semantic consistency.

Data lakes equipped with semantic capabilities can facilitate a number of preparation functions from initial data discovery to integration while ensuring the sort of metadata and semantic consistency for proper data governance. Regardless, “if you put data in a data lake, there still has to be some metadata associated with it,” MapR Chief Marketing Officer Jack Norris explained. “You need some sort of schema that’s defined so you can accomplish self-service.”

Metadata and Semantic Consistency

No matter what type of architecture is employed (either cloud or on-premise), consistent metadata and semantics represent the foundation of secure governance once enterprise wide policies based on business objectives are formulated. As noted by Franz CEO Jans Aasman, “That’s usually how people define data governance: all the processes that enable you to have more consistent data”. Perhaps the most thorough means of ensuring consistency in these two aspects of governance involves leveraging a data lake or single repository enriched with semantic technologies. The visual representation of data elements on an RDF graph is accessible for end user consumption, while semantic models based on ontological descriptions of data elements clarify their individual meanings. These models can be mapped to metadata to grant uniformity in this vital aspect of governance and provide semantic consistency on diverse sets of big data.
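As a rough illustration of this idea (not a description of any vendor’s product), the sketch below uses Python’s rdflib to map two differently named source fields onto one shared ontological concept so that both carry the same meaning; the namespace and field names are invented for the example.

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/governance#")   # hypothetical ontology namespace
g = Graph()
g.bind("ex", EX)

# One shared ontological concept with a human-readable definition...
g.add((EX.CustomerID, RDF.type, RDFS.Class))
g.add((EX.CustomerID, RDFS.comment, Literal("Unique identifier for a customer")))

# ...and two differently named source fields mapped to that concept.
g.add((EX.crm_cust_id, RDFS.subClassOf, EX.CustomerID))
g.add((EX.billing_customer_key, RDFS.subClassOf, EX.CustomerID))

# Any tool can now ask the graph which fields carry the CustomerID meaning.
for field in g.subjects(RDFS.subClassOf, EX.CustomerID):
    print(field)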

Alternatively, it is possible to achieve metadata consistency via processes instead of technologies. Doing so is more tenuous, yet perhaps preferable to organizations still utilizing a silo approach among different business domains. Sharing and integrating that data is possible through the means of an enterprise-wide governance council with business membership across those domains, which rigorously defines and monitors metadata attributes so that there is still a common semantic model across units. This approach might behoove less technologically savvy organizations, although the sustainment of such councils could become difficult. Still, this approach results in consistent metadata and semantic models on disparate sets of big data.

Enterprise Modeling

The emphasis on modeling that is reflected in all of these trends substantiates the viewpoint that effective big data governance requires stringent modeling. Moreover, it is important to implement it at a granular level so that data can be reused and retain its meaning across different technologies, applications, business units, and personnel changes. The degree of prescience and planning required to successfully model the enterprise to ensure governance objectives are met will be at the forefront of governance concerns in 2016, whether organizations are seeking new data management solutions or refining established ones. In this respect, governance is actually the foundation upon which data management rests. According to Cambridge Semantics president Alok Prasad, “Even if you are the CEO, you will not go against your IT department in terms of security and governance. Even if you can get a huge ROI, if the governance and security are not there you will not adopt a solution.”

 

Originally Posted at: 2016 Trends in Big Data Governance: Modeling the Enterprise

February 13, 2017 Health and Biotech analytics news roundup

News and commentary about health and biotech data:

How data science is transforming cancer treatment scheduling: Scheduling appointments is a problem that many electronic systems are not able to handle efficiently. New holistic approaches inspired by manufacturing are helping to improve this process.

Data Analytics May Keep Cancer Patients out of Emergency Departments: Researchers at the UPenn school of medicine are developing a model to predict when patients need emergency care, as well as best practices for when they need such care.

Why practices are struggling to exchange records: A recent survey found some key issues in the exchange of electronic health records, including a lack of confidence in the technology and difficulties transferring data across different systems.

Pharmacist and drug associations want better data on medicine shortages: The associations called for more up-to-date information about possible shortages from providers.

Secrets of Life in a Spoonful of Blood: Researchers have increasingly powerful tools to study fetal development, including sequencing DNA found in the mother’s blood.

Originally Posted at: February 13, 2017 Health and Biotech analytics news roundup by pstein

Jun 14, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Productivity  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> What is data security For Organizations? by analyticsweekpick

>> Data Modeling Tomorrow: Self-Describing Data Formats by jelaniharper

>> Data-As-A-Service to enable compliance reporting by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>> Amazon’s cloud business is using this weird 1976 ‘Saturday Night Live’ skit to explain its new blockchain product – Business Insider Under Cloud

>> Virtualization Software Market Overview, Cost Structure Analysis, Growth Opportunities and Forecast to 2023 – Exclusive Reportage Under Virtualization

>> Stellar Lumens [XLM], thrusters burned out? – Sentiment Analysis – April 16 – AMBCrypto Under Sentiment Analysis

More NEWS ? Click Here

[ FEATURED COURSE]

Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz


Use data to build a better startup faster in partnership with Geckoboard… more

[ FEATURED READ]

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World


In the world’s top research labs and universities, the race is on to invent the ultimate learning algorithm: one capable of discovering any knowledge from data, and doing anything we want, before we even ask. In The Mast… more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics to help with our decision making. Data- and analytics-driven decision making is rapidly sneaking its way into our core corporate DNA, yet we are not building practice grounds to test those models fast enough. Such snug-looking models have hidden nails that can cause uncharted pain if they go unchecked. This is the right time to start thinking about putting an Analytics Club [Data Analytics CoE] in your workplace to help lab out best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q:What is: collaborative filtering, n-grams, cosine distance?
A: Collaborative filtering:
– Technique used by some recommender systems
– Filtering for information or patterns using techniques involving collaboration of multiple agents: viewpoints, data sources.
1. A user expresses his/her preferences by rating items (movies, CDs, etc.)
2. The system matches this user’s ratings against other users’ ratings and finds the people with the most similar tastes
3. Based on those similar users, the system recommends items that they have rated highly but that this user has not yet rated
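A minimal sketch of those three steps as user-based collaborative filtering on a tiny made-up ratings matrix (rows = users, columns = items, 0 = not yet rated); all names and numbers are illustrative.

import numpy as np

ratings = np.array([
    [5, 4, 0, 1],   # target user: has not rated item 2
    [5, 5, 4, 1],   # user with similar taste
    [1, 0, 2, 5],   # user with different taste
], dtype=float)

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

target = ratings[0]
sims = np.array([cosine(target, other) for other in ratings[1:]])

# Predict the unrated item as a similarity-weighted average of the other users' ratings.
item = 2
prediction = np.dot(sims, ratings[1:, item]) / sims.sum()
print(f"predicted rating for item {item}: {prediction:.2f}")   # pulled toward the similar user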

n-grams:
– Contiguous sequence of n items from a given sequence of text or speech
– “Andrew is a talented data scientist”
– Bi-grams: “Andrew is”, “is a”, “a talented”.
– Tri-grams: “Andrew is a”, “is a talented”, “a talented data”.
– An n-gram model models sequences using statistical properties of n-grams; see: Shannon Game
– More concisely, an n-gram model gives P(x_i | x_{i−(n−1)}, …, x_{i−1}): a Markov model
– N-gram model: each word depends only on the last n−1 words

Issues:
– when facing infrequent n-grams
– solution: smooth the probability distributions by assigning non-zero probabilities to unseen words or n-grams
– Methods: Good-Turing, Backoff, Kneser-Ney smoothing

Cosine distance:
– How similar are two documents?
– Perfect similarity/agreement: 1
– No agreement: 0 (orthogonality)
– Measures the orientation, not magnitude

Given two vectors A and B representing word frequencies:
cosine-similarity(A, B) = ⟨A, B⟩ / (||A|| · ||B||)
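A minimal sketch of the n-gram and cosine-similarity computations in plain Python/NumPy; the example sentences and helper names are illustrative, not from the newsletter.

import numpy as np
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cosine_similarity(a, b):
    """Cosine of the angle between two word-frequency vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sent_a = "Andrew is a talented data scientist".split()
sent_b = "Andrew is a gifted data scientist".split()
print(ngrams(sent_a, 2))        # bi-grams: ('Andrew', 'is'), ('is', 'a'), ...

vocab = sorted(set(sent_a) | set(sent_b))
def freq_vector(tokens):
    counts = Counter(tokens)
    return np.array([counts[w] for w in vocab], dtype=float)

print(round(cosine_similarity(freq_vector(sent_a), freq_vector(sent_b)), 3))  # close to 1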

Source

[ VIDEO OF THE WEEK]

Making sense of unstructured data by turning strings into things


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

In God we trust. All others must bring data. – W. Edwards Deming

[ PODCAST OF THE WEEK]

#BigData #BigOpportunity in Big #HR by @MarcRind #JobsOfFuture #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Big data is a top business priority and drives enormous opportunity for business improvement. Wikibon’s own study projects that big data will be a $50 billion business by 2017.

Sourced from: Analytics.CLUB #WEB Newsletter

How to Use Social Media to Find Customers (Infographic)


Everyone talks about how important it is to be on Social Media.  But how do you use it to gather more customers?  Well, Kathleen Davis has shared some information compiled by Wishpond.

 

Did you know that 77% of B2C companies have found new customers through Facebook, while LinkedIn has proven significantly more effective for B2B: 277% more effective than Facebook? Those are some outstanding numbers!

 

Check out the rest in the Infographic below.

 


 

source

Source

The Pitch Deck We Used To Raise $500,000 For Our Startup


I came across the pitch deck used by Buffer co-founders Joel Gascoigne and Leo Widrich to raise their $500k round. This deck has lots of good information and it could be really useful for other startups seeking to raise capital.

Why is it relevant for most startups? Because the Buffer founders were also first-timers.

One of the big no-no’s we’ve learnt about early on in Silicon Valley is to publicly share the pitch deck you’ve used to raise money. At least, not before you’ve been acquired or failed or in any other way been removed from stage. That’s a real shame, we thought. Sharing the actual slide deck we used (and one that’s not 10 years old) is by far one of the most useful things for others to learn from. In fact, both Joel and I have privately shared the deck with fledgling founders to help them with their fundraising. On top of that, our case study is hopefully uniquely insightful for lots of people. Here is why:
Half a million is not a crazy amount: It’s therefore hopefully an example that helps the widest range of founders trying to raise money.
Both Joel and myself are first-timers: We couldn’t just throw big names onto a slideshow and ride with it. We had to test and change the flow and deck a lot.

In summary: this deck is built up to one key slide: Traction.

So without further ado – have a look at their pitch deck.

PS: If you want to read more, check out OnStartups.com

Source by v1shal

Jun 07, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Tour of Accounting  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> December 5, 2016 Health and Biotech analytics news roundup by pstein

>> Is data analytics about causes … or correlations? by analyticsweekpick

>> Mar 15, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

Wanna write? Click Here

[ NEWS BYTES]

>> Getting Data Science Right: How To Structure Data Science Teams For Maximum Results – Forbes Under Data Science

>> 4 Companies Betting on Cloud Gaming – Motley Fool Under Cloud

>> From Hawkeye country to the Vikings with data analytics – The Gazette: Eastern Iowa Breaking News and Headlines Under Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Deep Learning Prerequisites: The Numpy Stack in Python


The Numpy, Scipy, Pandas, and Matplotlib stack: prep for deep learning, machine learning, and artificial intelligence… more

[ FEATURED READ]

How to Create a Mind: The Secret of Human Thought Revealed


Ray Kurzweil is arguably today’s most influential—and often controversial—futurist. In How to Create a Mind, Kurzweil presents a provocative exploration of the most important project in human-machine civilization—reverse… more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics to help with our decision making. Data- and analytics-driven decision making is rapidly sneaking its way into our core corporate DNA, yet we are not building practice grounds to test those models fast enough. Such snug-looking models have hidden nails that can cause uncharted pain if they go unchecked. This is the right time to start thinking about putting an Analytics Club [Data Analytics CoE] in your workplace to help lab out best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q:What is the Central Limit Theorem? Explain it. Why is it important?
A: The CLT states that the arithmetic mean of a sufficiently large number of iterates of independent random variables will be approximately normally distributed, regardless of the underlying distribution – i.e. the sampling distribution of the sample mean is approximately normal.
– Used in hypothesis testing
– Used for confidence intervals
– Random variables must be iid: independent and identically distributed
– Finite variance
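A quick illustrative simulation of the CLT using NumPy only; the exponential distribution, sample size and repetition count are arbitrary choices for the example.

import numpy as np

rng = np.random.default_rng(42)
n, reps = 200, 10_000                        # sample size and number of repeated samples
sample_means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# CLT: the sample mean is roughly Normal(mu, sigma / sqrt(n)); for Exp(1), mu = sigma = 1.
print(sample_means.mean())                   # close to 1.0
print(sample_means.std())                    # close to 1 / sqrt(200), i.e. about 0.0707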

Source

[ VIDEO OF THE WEEK]

#FutureOfData with Rob(@telerob) / @ConnellyAgency on running innovation in agency


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The temptation to form premature theories upon insufficient data is the bane of our profession. – Sherlock Holmes

[ PODCAST OF THE WEEK]

Jeff Palmucci @TripAdvisor discusses managing a #MachineLearning #AI Team


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Walmart handles more than 1 million customer transactions every hour, which is imported into databases estimated to contain more than 2.5 petabytes of data.

Sourced from: Analytics.CLUB #WEB Newsletter