May 24, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Data interpretation  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ NEWS BYTES]

>>
 The case for one giant, multibillion-dollar cloud contract for DoD – C4ISRNet Under  Cloud

>>
 Streaming Analytics Market Is Constantly Growing On Account Of the Increasing Operational Efficiency And Production … – Expert Consulting Under  Streaming Analytics

>>
 ‘Salah’s statistics in his debut season at Anfield are quite astonishing’ – how the papers saw Liverpool FC’s … – Daily Post North Wales Under  Statistics

More NEWS ? Click Here

[ FEATURED COURSE]

Tackle Real Data Challenges

image

Learn scalable data management, evaluate big data technologies, and design effective visualizations…. more

[ FEATURED READ]

The Future of the Professions: How Technology Will Transform the Work of Human Experts

image

This book predicts the decline of today’s professions and describes the people and systems that will replace them. In an Internet society, according to Richard Susskind and Daniel Susskind, we will neither need nor want … more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up with industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up and create awareness within the organization. An aware organization goes a long way toward getting quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q:Explain Tufte’s concept of ‘chart junk’?
A: All visual elements in charts and graphs that are not necessary to comprehend the information represented, or that distract the viewer from this information.

Examples of unnecessary elements include:
– Unnecessary text
– Heavy or dark grid lines
– Ornamented chart axes
– Pictures
– Background
– Unnecessary dimensions
– Elements depicted out of scale to one another
– 3-D simulations in line or bar charts

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek: Big Data at Work: Paul Sonderegger

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

With data collection, ‘the sooner the better’ is always the best answer. – Marissa Mayer

[ PODCAST OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

29 percent report that their marketing departments have ‘too little or no customer/consumer data.’ When data is collected by marketers, it is often not appropriate to real-time decision making.

Sourced from: Analytics.CLUB #WEB Newsletter

May 17, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Big Data knows everything  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Customer Service Excellence in 8 steps! by martin

>> Underpinning Enterprise Data Governance with Machine Intelligence by jelaniharper

>> Navigating Big Data Careers with a Statistics PhD by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>>
 Startup Dremio Promises Improved Big Data Access For Business Analytics With New Release – CRN Under  Business Analytics

>>
 Teladoc taps IBM Watson machine learning for second opinion service – Healthcare IT News Under  Machine Learning

>>
 Amazon or no, banks are in for big changes, one analyst says … – MarketWatch Under  Financial Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Machine Learning

image

6.867 is an introductory course on machine learning which gives an overview of many concepts, techniques, and algorithms in machine learning, beginning with topics such as classification and linear regression and ending … more

[ FEATURED READ]

Hypothesis Testing: A Visual Introduction To Statistical Significance

image

Statistical significance is a way of determining if an outcome occurred by random chance, or did something cause that outcome to be different than the expected baseline. Statistical significance calculations find their … more

[ TIPS & TRICKS OF THE WEEK]

Fix the Culture, spread awareness to get awareness
Adoption of analytics tools and capabilities has not yet caught up with industry standards. Talent has always been the bottleneck to achieving comparable enterprise adoption. One of the primary reasons is a lack of understanding and knowledge among stakeholders. To facilitate wider adoption, data analytics leaders, users, and community members need to step up and create awareness within the organization. An aware organization goes a long way toward getting quick buy-ins and better funding, which ultimately leads to faster adoption. So be the voice that you want to hear from leadership.

[ DATA SCIENCE Q&A]

Q:What do you think about the idea of injecting noise in your data set to test the sensitivity of your models?
A: * Effect would be similar to regularization: avoid overfitting
* Used to increase robustness
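
A minimal sketch of such a sensitivity test, assuming a one-feature least-squares model and made-up toy data (the function names, noise levels, and data here are illustrative, not from the original answer). It refits the model on noisy copies of the inputs and reports how much the fitted slope moves:

```python
import random
import statistics

def fit_slope(xs, ys):
    """Ordinary least-squares slope for a one-feature linear model."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def slope_under_noise(xs, ys, sigma, trials=200, seed=0):
    """Refit on inputs perturbed by Gaussian noise; return all fitted slopes."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(trials):
        noisy_xs = [x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(fit_slope(noisy_xs, ys))
    return slopes

# Hypothetical toy data: y = 2x exactly.
xs = [float(i) for i in range(20)]
ys = [2.0 * x for x in xs]

baseline = fit_slope(xs, ys)
for sigma in (0.1, 1.0, 3.0):
    slopes = slope_under_noise(xs, ys, sigma)
    # Mean and spread of the refitted slope at this noise level.
    print(sigma, round(statistics.fmean(slopes), 3), round(statistics.stdev(slopes), 3))
```

A model whose fitted parameters swing widely under small input noise is a candidate for stronger regularization or more data.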

Source

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Eloy Sasot, News Corp

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

It is a capital mistake to theorize before one has data. Insensibly, one begins to twist the facts to suit theories, instead of theories to suit facts. – Sherlock Holmes

[ PODCAST OF THE WEEK]

@ChuckRehberg / @TrigentSoftware on Translating Technology to Solve Business Problems #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

73% of organizations have already invested or plan to invest in big data by 2016

Sourced from: Analytics.CLUB #WEB Newsletter

May 10, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Pacman  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Dec 21, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

>> “Putting Data Everywhere”: Leveraging Centralized Business Intelligence for Full-Blown Data Culture by jelaniharper

>> Finance Best Practices Are Changing—Is Your Organization Keeping Pace? by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>>
 6 Tips for Keeping Iot Devices Safe – Security Sales & Integration Under  IOT

>>
 Software AG ramps up Australian push with IoT platform – IoT Hub Under  Streaming Analytics

>>
 Customer experience in a new dimension: 3D Augmented Reality App Mercedes cAR and Virtual Reality goggles … – Automotive World (press release) Under  Customer Experience

More NEWS ? Click Here

[ FEATURED COURSE]

Python for Beginners with Examples

image

A practical Python course for beginners with examples and exercises…. more

[ FEATURED READ]

The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t

image

People love statistics. Statistics, however, do not always love them back. The Signal and the Noise, Nate Silver’s brilliant and elegant tour of the modern science-slash-art of forecasting, shows what happens when Big Da… more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics for decision making? Data- and analytics-driven decision making is rapidly working its way into our core corporate DNA, and we are not creating practice grounds to test those models fast enough. Such snug-looking models have hidden nails that could cause uncharted pain if left unchecked. This is the right time to start thinking about setting up an Analytics Club (a Data Analytics CoE) in your workplace to lab out best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q:What is star schema? Lookup tables?
A: The star schema is a traditional database schema with a central (fact) table (the “observations”, with database “keys” for joining with satellite tables, and with several fields encoded as IDs). Satellite tables map IDs to physical names or descriptions and can be “joined” to the central fact table using the ID fields; these tables are known as lookup tables, and are particularly useful in real-time applications, as they save a lot of memory. Sometimes star schemas involve multiple layers of summarization (summary tables, from granular to less granular) to retrieve information faster.

Lookup tables:
– An array that replaces runtime computations with a simpler array indexing operation
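
As a rough illustration of the star schema described above, the sketch below builds one fact table and two satellite (lookup) tables in an in-memory SQLite database and joins them on their ID fields. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Satellite (lookup) tables: map IDs to human-readable names.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
-- Central fact table: one row per observation, fields encoded as IDs.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
INSERT INTO dim_store VALUES (10, 'Boston'), (20, 'Austin');
INSERT INTO fact_sales VALUES (1, 10, 5.0), (1, 20, 7.5), (2, 10, 3.0), (1, 10, 2.5);
""")

# Join the fact table to both satellites and summarize by name and city.
rows = conn.execute("""
SELECT p.name, s.city, SUM(f.amount) AS total
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_store AS s ON s.store_id = f.store_id
GROUP BY p.name, s.city
ORDER BY p.name, s.city
""").fetchall()
print(rows)
```

The fact table stores only compact IDs and measures; the descriptive text lives once in each satellite table, which is where the memory saving comes from.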

Source

[ VIDEO OF THE WEEK]

@JohnTLangton from @Wolters_Kluwer discussed his #AI Lead Startup Journey #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

War is 90% information. – Napoleon Bonaparte

[ PODCAST OF THE WEEK]

@JohnNives on ways to demystify AI for enterprise #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

73% of organizations have already invested or plan to invest in big data by 2016

Sourced from: Analytics.CLUB #WEB Newsletter

May 03, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Data security  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> The Value of Opinion versus Data in Customer Experience Management by bobehayes

>> The Case For Pure Play Virtualization by analyticsweekpick

>> Steph Curry’s Season Stats in 13 lines of R Code by stattleship

Wanna write? Click Here

[ NEWS BYTES]

>>
 How Machine Learning can help Cryptocurrency Traders Maximize their Gains – Cryptovest Under  Machine Learning

>>
 Mulvaney response to CFPB data security gaps baffles cyber experts – American Banker Under  Data Security

>>
 No experience + hiring freeze + political donor = $121000 job? – The Boston Globe Under  Data Security

More NEWS ? Click Here

[ FEATURED COURSE]

Machine Learning

image

6.867 is an introductory course on machine learning which gives an overview of many concepts, techniques, and algorithms in machine learning, beginning with topics such as classification and linear regression and ending … more

[ FEATURED READ]

The Misbehavior of Markets: A Fractal View of Financial Turbulence

image

Mathematical superstar and inventor of fractal geometry, Benoit Mandelbrot, has spent the past forty years studying the underlying mathematics of space and natural patterns. What many of his followers don’t realize is th… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from a zombie apocalypse of unscalable models
One living, breathing zombie in today’s analytical models is the glaring absence of error bars. Not every model is scalable or holds its ground as data grows. The error bars attached to almost every model should be duly calibrated. As business models take in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading us to a Halloween we never want to see.

[ DATA SCIENCE Q&A]

Q:What is star schema? Lookup tables?
A: The star schema is a traditional database schema with a central (fact) table (the “observations”, with database “keys” for joining with satellite tables, and with several fields encoded as IDs). Satellite tables map IDs to physical names or descriptions and can be “joined” to the central fact table using the ID fields; these tables are known as lookup tables, and are particularly useful in real-time applications, as they save a lot of memory. Sometimes star schemas involve multiple layers of summarization (summary tables, from granular to less granular) to retrieve information faster.

Lookup tables:
– An array that replaces runtime computations with a simpler array indexing operation
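
The array-style lookup table in the last bullet can be sketched as follows. The example is an assumption chosen for illustration (counting set bits per byte): the table is computed once, and each later query becomes a plain index operation.

```python
# Precompute the number of set bits for every possible byte value once.
POPCOUNT_TABLE = [bin(value).count("1") for value in range(256)]

def popcount(data: bytes) -> int:
    """Count set bits by indexing the table instead of recomputing per byte."""
    return sum(POPCOUNT_TABLE[byte] for byte in data)

print(popcount(b"\x00\xff\x0f"))
```

The trade-off is a small, fixed amount of memory (256 entries here) in exchange for skipping the per-item computation at runtime.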

Source

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @Beena_Ammanath, @GE

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The world is one big data problem. – Andrew McAfee

[ PODCAST OF THE WEEK]

#FutureOfData with @CharlieDataMine, @Oracle discussing running analytics in an enterprise

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

94% of Hadoop users perform analytics on large volumes of data not possible before; 88% analyze data in greater detail; while 82% can now retain more of their data.

Sourced from: Analytics.CLUB #WEB Newsletter

Apr 26, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Accuracy check  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Data Science and Big Data: Two very Different Beasts by analyticsweekpick

>> Automating Data Modeling for the Internet of Things: Accelerating Transformation and Data Preparation by jelaniharper

>> Digital textbook analytics can predict student outcomes, study finds by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>>
 Telefónica Expands Its Global Telco Cloud – Light Reading Under  Cloud

>>
 Johnson Controls Acquires Smartvue to Add IoT Video Services Platform – Security Sales & Integration Under  IOT

>>
 Margaret Mary Health earns nationwide honor – Country 103.9 WRBI Under  Health Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

CS109 Data Science

image

Learning from data in order to gain useful predictions and insights. This course introduces methods for five key facets of an investigation: data wrangling, cleaning, and sampling to get a suitable data set; data managem… more

[ FEATURED READ]

The Future of the Professions: How Technology Will Transform the Work of Human Experts

image

This book predicts the decline of today’s professions and describes the people and systems that will replace them. In an Internet society, according to Richard Susskind and Daniel Susskind, we will neither need nor want … more

[ TIPS & TRICKS OF THE WEEK]

Keeping Biases Checked during the last mile of decision making
Today, a data-driven leader, data scientist, or data-driven expert is constantly put to the test by helping the team solve problems using their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which adds a bias to our judgment and taints our suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and find ourselves caught in biases that impair our judgment. So it is important to keep intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q:How would you come up with a solution to identify plagiarism?
A: * Vector space model approach
* Represent documents (the suspect and original ones) as vectors of terms
* Terms: n-grams, with n from 1 up to as large as practical (to detect passage plagiarism)
* Measure the similarity between both documents
* Similarity measure: cosine distance, Jaro-Winkler, Jaccard
* Declare plagiarism at a certain threshold
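
The steps above can be sketched in a few lines. This is only an illustration under stated assumptions: word-level n-grams, cosine similarity (rather than cosine distance) over raw n-gram counts, and a default `n=3` and `threshold=0.5` that are made-up values, not recommendations.

```python
import math
from collections import Counter

def ngrams(text, n):
    """Split text into lowercase word n-grams."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def cosine_similarity(a, b):
    """Cosine similarity between two Counter term vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard_similarity(a, b):
    """Share of distinct terms that the two documents have in common."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def looks_plagiarized(suspect, original, n=3, threshold=0.5):
    """Flag the suspect when n-gram cosine similarity reaches the threshold."""
    vec_s = Counter(ngrams(suspect, n))
    vec_o = Counter(ngrams(original, n))
    return cosine_similarity(vec_s, vec_o) >= threshold

original = "the quick brown fox jumps over the lazy dog"
print(looks_plagiarized("the quick brown fox jumps over the lazy dog", original))
print(looks_plagiarized("an entirely different sentence about data science", original))
```

In practice the threshold would be tuned on labeled examples, and longer n-grams catch copied passages while shorter ones catch shared vocabulary.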

Source

[ VIDEO OF THE WEEK]

@SidProbstein / @AIFoundry on Leading #DataDriven Technology Transformation #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Information is the oil of the 21st century, and analytics is the combustion engine. – Peter Sondergaard

[ PODCAST OF THE WEEK]

@CRGutowski from @GE_Digital on Using #Analytics to #Transform Sales #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Every second we create new data. For example, we perform 40,000 search queries every second (on Google alone), which makes it 3.5 billion searches per day and 1.2 trillion searches per year. In Aug 2015, over 1 billion people used Facebook in a single day.

Sourced from: Analytics.CLUB #WEB Newsletter

Apr 19, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Accuracy  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Gleanster – Actionable Insights at a Glance by bobehayes

>> 7 Things to Look before Picking Your Data Discovery Vendor by v1shal

>> Three Upcoming Talks on Big Data and Customer Experience Management by bobehayes

Wanna write? Click Here

[ NEWS BYTES]

>>
 Tyler PD crime statistics show decrease in homicides, increase in violent crimes – KLTV Under  Statistics

>>
 100 Top Hospitals announced by IBM Watson Health – Healthcare Finance News Under  Health Analytics

>>
 Morristown Police Department statistics March 30-April 5 | Police … – Stowe Today Under  Statistics

More NEWS ? Click Here

[ FEATURED COURSE]

Baseball Data Wrangling with Vagrant, R, and Retrosheet

image

Analytics with the Chadwick tools, dplyr, and ggplot…. more

[ FEATURED READ]

Storytelling with Data: A Data Visualization Guide for Business Professionals

image

Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You’ll discover the power of storytelling and the way to make data a pivotal point in your story. Th… more

[ TIPS & TRICKS OF THE WEEK]

Finding success in data science? Find a mentor
Yes, most of us don’t feel the need for one, but most of us really could use one. Because most data science professionals work in isolation, getting an unbiased perspective is not easy. Often it is also hard to see how a data science career will progress. A network of mentors addresses both issues: it gives data professionals an outside perspective and an unbiased ally. It’s extremely important for successful data science professionals to build a mentor network and use it throughout their success.

[ DATA SCIENCE Q&A]

Q:How to optimize algorithms? (parallel processing and/or faster algorithms). Provide examples for both?
A: “Premature optimization is the root of all evil.” – Donald Knuth

Parallel processing: for instance, in R on a single machine:
– doParallel and foreach packages
– doParallel: parallel backend; uses n cores of the machine
– foreach: assigns tasks to each core
– using Hadoop on a single node
– using Hadoop on multi-node

Faster algorithm:
– In computer science: Pareto principle; 90% of the execution time is spent executing 10% of the code
– Data structures: the choice affects performance
– Caching: avoid unnecessary work
– Improve at the source-code level
For instance, on early C compilers, WHILE(something) was slower than FOR(;;), because WHILE evaluated “something” and then made a conditional jump that tested whether it was true, while FOR made an unconditional jump.
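
To make the caching bullet concrete, here is a small sketch in Python rather than R, using the standard library's `functools.lru_cache`. The function `slow_square` and the call counter are hypothetical stand-ins for an expensive computation; the sketch covers only caching, not the parallel-processing options listed above.

```python
from functools import lru_cache

call_count = 0  # counts how often the underlying work actually runs

@lru_cache(maxsize=None)
def slow_square(n):
    """Stand-in for an expensive computation; results are cached per argument."""
    global call_count
    call_count += 1
    return n * n

# Five calls, but only two distinct arguments, so the body runs twice.
results = [slow_square(n) for n in (3, 4, 3, 3, 4)]
print(results, call_count, slow_square.cache_info())
```

Caching pays off only when the same inputs recur and the function has no side effects that callers depend on.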

Source

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Eloy Sasot, News Corp

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

He uses statistics as a drunken man uses lamp posts—for support rather than for illumination. – Andrew Lang

[ PODCAST OF THE WEEK]

Using Analytics to build A #BigData #Workforce

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Poor data can cost businesses 20%–35% of their operating revenue.

Sourced from: Analytics.CLUB #WEB Newsletter

Apr 12, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Extrapolating  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Venu (@VenuV62 | @ProcterGamble) on creating a rockstar data science team #FutureOfData by admin

>> Tackling 4th Industrial Revolution with HR4.0 – Playcast – Data Analytics Leadership Playbook Podcast by v1shal

>> Google loses data as lightning strikes by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>>
 Lakefront Hammond data center to include tech incubator, Purdue greenhouse – nwitimes.com Under  Data Center

>>
 GE tries to shed complexity with sale of HIT lines – Health Data Management Under  Health Analytics

>>
 As Worlds Collide: Join Xconomy for Big Data Meets Big Biology – Xconomy Under  Big Data

More NEWS ? Click Here

[ FEATURED COURSE]

Artificial Intelligence

image

This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more

[ FEATURED READ]

Storytelling with Data: A Data Visualization Guide for Business Professionals

image

Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You’ll discover the power of storytelling and the way to make data a pivotal point in your story. Th… more

[ TIPS & TRICKS OF THE WEEK]

Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle that data can lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric: the metric that matters most to your startup. Some advantages of TUM: it answers the most important business question, cleans up your goals, inspires innovation, and helps you understand the entire quantified business.

[ DATA SCIENCE Q&A]

Q:What is the life cycle of a data science project ?
A: 1. Data acquisition
Acquiring data from both internal and external sources, including social media or web scraping. In a steady state, data extraction and routines should be in place, and new sources, once identified, would be acquired following the established processes.

2. Data preparation
Also called data wrangling: cleaning the data and shaping it into a suitable form for later analyses. Involves exploratory data analysis and feature extraction.

3. Hypothesis & modelling
As in data mining, but using all the data rather than samples: machine learning techniques are applied to all the data. A key sub-step is model selection. This involves preparing a training set for model candidates, and validation and test sets for comparing model performance, selecting the best-performing model, gauging model accuracy, and preventing overfitting.

4. Evaluation & interpretation

Steps 2 to 4 are repeated a number of times as needed; as the understanding of data and business becomes clearer and results from initial models and hypotheses are evaluated, further tweaks are performed. These may sometimes include step 5 and be performed in a pre-production environment.

5. Deployment

6. Operations
Regular maintenance and operations. Includes performance tests to measure model performance, and can alert when performance goes beyond a certain acceptable threshold

7. Optimization
Can be triggered by failing performance, or due to the need to add new data sources and retraining the model or even to deploy new versions of an improved model

Note: with increasing maturity and well-defined project goals, pre-defined performance criteria can help evaluate the feasibility of the data science project early enough in the data-science life cycle. This early comparison helps the team refine the hypothesis, discard the project if it is non-viable, or change approaches.
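
The model-selection sub-step in step 3 might look roughly like the sketch below. Everything in it is an assumption for illustration: the synthetic data, the 60/20/20 split, and the two candidate models (a constant mean predictor and a least-squares line). The candidates are fitted on the training set, compared on the validation set, and the winner is scored once on the held-out test set.

```python
import random
import statistics

def split(rows, seed=0, train=0.6, val=0.2):
    """Shuffle a copy of the rows and cut it into train/validation/test."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def fit_mean(train_rows):
    """Candidate 1: always predict the training mean of y."""
    mean_y = statistics.fmean(y for _, y in train_rows)
    return lambda x: mean_y

def fit_line(train_rows):
    """Candidate 2: one-feature least-squares line."""
    xs = [x for x, _ in train_rows]
    ys = [y for _, y in train_rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in train_rows) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def mse(model, rows):
    """Mean squared error of a model on (x, y) rows."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in rows)

# Hypothetical data: a noisy linear relationship.
rng = random.Random(1)
data = [(float(x), 3.0 * x + 1.0 + rng.gauss(0.0, 1.0)) for x in range(50)]

train_rows, val_rows, test_rows = split(data)
candidates = {"mean": fit_mean(train_rows), "line": fit_line(train_rows)}
best_name = min(candidates, key=lambda name: mse(candidates[name], val_rows))
test_error = mse(candidates[best_name], test_rows)
print(best_name, round(test_error, 3))
```

Keeping the test set out of the selection loop is what makes the final error estimate a fair check against overfitting.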

Source

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @ScottZoldi, @FICO

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Without big data, you are blind and deaf and in the middle of a freeway. – Geoffrey Moore

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData with Jon Gibs(@jonathangibs) @L2_Digital

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

And one of my favourite facts: At the moment less than 0.5% of all data is ever analysed and used, just imagine the potential here.

Sourced from: Analytics.CLUB #WEB Newsletter

Apr 05, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Insights  Source

[ AnalyticsWeek BYTES]

>> Who could be a Startup CEO? Ben Horowitz’s 2 cents  by v1shal

>> Where Big Data Projects Fail by analyticsweekpick

>> August 7, 2017 Health and Biotech analytics news roundup by pstein

Wanna write? Click Here

[ NEWS BYTES]

>>
 Big firms take an FB break – Khaleej Times Under  Data Security

>>
 4 modern challenges for the Internet of Things – Information Age Under  Internet Of Things

>>
 A Data Scientist Was Sick of Seeing Spam on His Facebook so He Built a Fake News Detector – Motherboard Under  Data Scientist

More NEWS ? Click Here

[ FEATURED COURSE]

Machine Learning

image

6.867 is an introductory course on machine learning which gives an overview of many concepts, techniques, and algorithms in machine learning, beginning with topics such as classification and linear regression and ending … more

[ FEATURED READ]

Superintelligence: Paths, Dangers, Strategies

image

The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. Other animals have stronger muscles or sharper claws, but … more

[ TIPS & TRICKS OF THE WEEK]

Strong business case could save your project
Like anything in corporate culture, the project is oftentimes about the business, not the technology. With data analysis, the same type of thinking goes. It’s not always about the technicality but about the business implications. Data science project success criteria should include project management success criteria as well. This will ensure smooth adoption, easy buy-ins, room for wins and co-operating stakeholders. So, a good data scientist should also possess some qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q:How to efficiently scrape web data, or collect tons of tweets?
A: * Python example
* Requesting and fetching the webpage into the code: httplib2 module
* Parsing the content and getting the necessary info: BeautifulSoup from bs4 package
* Twitter API: the Python wrapper for performing API requests. It handles all the OAuth and API queries in a single Python interface
* MongoDB as the database
* PyMongo: the Python wrapper for interacting with the MongoDB database
* Cronjobs: a time-based scheduler for running scripts at specific intervals; spacing out requests helps avoid the “rate limit exceeded” error
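
A minimal sketch of the parsing step only. To stay runnable without third-party packages, it swaps in the standard library's `html.parser` in place of BeautifulSoup, and it parses a made-up HTML string instead of fetching a live page; the class name `LinkCollector` is hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag encountered while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Stand-in for a fetched page body.
page = (
    '<html><body><a href="/a">A</a><p>text</p>'
    '<a href="https://example.com/b">B</a></body></html>'
)

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

In a real pipeline the page body would come from an HTTP client, the extracted records would be written to the database, and a scheduler would space out the runs.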

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek #FutureOfData with Robin Thottungal(@rathottungal), Chief Data Scientist at @EPA

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The temptation to form premature theories upon insufficient data is the bane of our profession. – Sherlock Holmes

[ PODCAST OF THE WEEK]

Unconference Panel Discussion: #Workforce #Analytics Leadership Panel

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

More than 200bn HD movies – which would take a person 47m years to watch.

Sourced from: Analytics.CLUB #WEB Newsletter

Mar 29, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Correlation-Causation  Source

[ AnalyticsWeek BYTES]

>> Why You Must Not Have Any Doubts About Cloud Security by thomassujain

>> Investing in Big Data by Bill Pieroni by thebiganalytics

>> The backlash against big data by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>>
 Apartment Investment & Management Co (NYSE:AIV) Institutional Investor Sentiment Analysis – Frisco Fastball Under  Sentiment Analysis

>>
 Hadoop Infrastructure Engineer – Built In Chicago Under  Hadoop

>>
 Watch the #MeToo campaign spread around the world on Facebook, Twitter, and Instagram – Fast Company Under  Social Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Pattern Discovery in Data Mining

image

Learn the general concepts of data mining along with basic methodologies and applications. Then dive into one subfield in data mining: pattern discovery. Learn in-depth concepts, methods, and applications of pattern disc… more

[ FEATURED READ]

Storytelling with Data: A Data Visualization Guide for Business Professionals

image

Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You’ll discover the power of storytelling and the way to make data a pivotal point in your story. Th… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from a zombie apocalypse of unscalable models
One living, breathing zombie in today’s analytical models is the glaring absence of error bars. Not every model scales or holds its ground as data grows. The error bars attached to almost every model should be duly calibrated. As business models take in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading us to a Halloween we never want to see.
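
As a rough illustration of the tip above, the sketch below attaches an approximate 95% error bar to a mean and shows how the bar narrows as more data arrives. The function name `mean_with_error_bar`, the normal approximation, and the synthetic data are assumptions made for this example only.

```python
import math
import random
import statistics


def mean_with_error_bar(values, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    Assumes a normal approximation for the sampling distribution of the mean.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate spread")
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    return mean, z * std_err


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for n in (10, 100, 1000, 10000):
        # Synthetic measurements, invented for illustration.
        sample = [rng.gauss(50.0, 10.0) for _ in range(n)]
        mean, half_width = mean_with_error_bar(sample)
        print(f"n={n:>5}  mean={mean:6.2f} +/- {half_width:.2f}")
```

The half-width shrinks roughly with the square root of the sample size, so a model evaluated on little data should be reported with a visibly wider bar.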

[ DATA SCIENCE Q&A]

Q: Explain what a long-tailed distribution is and provide three examples of relevant phenomena that have long tails. Why are they important in classification and regression problems?
A: * In long-tailed distributions, a high-frequency population is followed by a low-frequency population, which gradually tails off asymptotically
* Rule of thumb: the majority of occurrences (more than half, and where the Pareto principle applies, 80%) are accounted for by the first 20% of items in the distribution
* The least frequently occurring 80% of items are more important as a proportion of the total population
* Related concepts: Zipf’s law, Pareto distribution, power laws

Examples:
1. Natural language
– Given some corpus of natural language, the frequency of any word is roughly inversely proportional to its rank in the frequency table (Zipf’s law)
– The most frequent word occurs about twice as often as the second most frequent, three times as often as the third most frequent…
– In the Brown Corpus of American English, ‘the’ accounts for nearly 7% of all word occurrences (about 70,000 in 1 million)
– ‘of’ accounts for about 3.5%, followed by ‘and’…
– Only 135 vocabulary items are needed to account for half of that corpus!

2. Allocation of wealth among individuals: the larger portion of the wealth of any society is controlled by a smaller percentage of the people

3. File size distribution of Internet Traffic

Additional: Hard disk error rates, values of oil reserves in a field (a few large fields, many small ones), sizes of sand particles, sizes of meteorites
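
The arithmetic in the natural-language example can be sketched in a few lines. This assumes an idealised Zipf’s law in which a word’s share of occurrences is exactly the top word’s share divided by its rank; the function name `zipf_share` is made up for illustration, and real corpora follow the law only approximately.

```python
def zipf_share(rank, top_share=0.07):
    """Share of all word occurrences for the word at a given rank.

    Assumes an idealised Zipf's law: share is inversely proportional to rank.
    top_share=0.07 is the 7% quoted above for the most frequent word.
    """
    if rank < 1:
        raise ValueError("rank starts at 1")
    return top_share / rank


if __name__ == "__main__":
    corpus_size = 1_000_000  # the one-million-word corpus from the example
    for rank in range(1, 6):
        share = zipf_share(rank)
        print(f"rank {rank}: {share:.2%} of occurrences, "
              f"about {round(share * corpus_size):,} words")
```

Under this assumption rank 2 comes out at 3.5%, matching the figure quoted for ‘of’.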

Importance in classification and regression problems:
– Skewed distribution
– Which metrics to use? Accuracy can mislead (the accuracy paradox in classification); consider F-score or AUC
– Issue when using models that assume linearity (linear regression): need to apply a monotone transformation to the data (logarithm, square root, sigmoid function…)
– Issue when sampling: your data becomes even more unbalanced! Use stratified sampling instead of random sampling, SMOTE (‘Synthetic Minority Over-sampling Technique’, N. V. Chawla), or an anomaly detection approach
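
The accuracy paradox mentioned in the list above can be seen on a small made-up example. The labels, the two sets of predictions, and the helper names below are all invented for illustration; this is a sketch of the idea, not a recommended evaluation pipeline.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


# Made-up skewed data: 950 negatives followed by 50 positives.
y_true = [0] * 950 + [1] * 50

# Baseline that always predicts the majority class.
always_negative = [0] * 1000

# Hypothetical model: 60 false alarms, 40 of the 50 positives found.
some_model = [1] * 60 + [0] * 890 + [1] * 40 + [0] * 10

if __name__ == "__main__":
    for name, pred in [("always negative", always_negative),
                       ("some model", some_model)]:
        print(f"{name}: accuracy={accuracy(y_true, pred):.3f}, "
              f"F1={f1_score(y_true, pred):.3f}")
```

The baseline that never predicts the rare class scores higher on accuracy than the model that actually finds most positives, while the F-score ranks them the other way round.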

Source

[ VIDEO OF THE WEEK]

#FutureOfData with Rob(@telerob) / @ConnellyAgency on running innovation in agency


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

If you can’t explain it simply, you don’t understand it well enough. – Albert Einstein

[ PODCAST OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

IDC estimates that by 2020, business transactions on the internet – business-to-business and business-to-consumer – will reach 450 billion per day.

Sourced from: Analytics.CLUB #WEB Newsletter

Mar 22, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Data Mining  Source

[ NEWS BYTES]

>>
 Trick or treat! Halloween spending statistics – Seeking Alpha Under  Statistics

>>
 Battleground on drug price legislation shifts to states and so does lobbying by PhRMA – MedCity News Under  Health Analytics

>>
 Increase in Data Discovery Tools to Propel the Global Prescriptive Analytics Market – Edition Truth Under  Prescriptive Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Statistical Thinking and Data Analysis


This course is an introduction to statistical data analysis. Topics are chosen from applied probability, sampling, estimation, hypothesis testing, linear regression, analysis of variance, categorical data analysis, and n… more

[ FEATURED READ]

Rise of the Robots: Technology and the Threat of a Jobless Future


What are the jobs of the future? How many will there be? And who will have them? As technology continues to accelerate and machines begin taking care of themselves, fewer people will be necessary. Artificial intelligence… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from a zombie apocalypse of unscalable models
One living, breathing zombie in today’s analytical models is the glaring absence of error bars. Not every model scales or holds its ground as data grows. The error bars attached to almost every model should be duly calibrated. As business models take in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading us to a Halloween we never want to see.

[ DATA SCIENCE Q&A]

Q: Explain what resampling methods are and why they are useful.
A: * Resampling means repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model
* Example: repeatedly draw different samples from the training data, fit a linear regression to each new sample, and then examine the extent to which the resulting fits differ
* The most common methods are cross-validation and the bootstrap
* Cross-validation: random sampling without replacement
* Bootstrap: random sampling with replacement
* Cross-validation is used for evaluating model performance and for model selection (choosing the appropriate level of flexibility)
* Bootstrap is mostly used to quantify the uncertainty associated with a given estimator or statistical learning method
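
The two sampling schemes above can be sketched with the Python standard library. The helper names `bootstrap_std_error` and `k_fold_indices` and the synthetic data are assumptions for illustration; the cross-validation part only produces the train/test index splits and leaves model fitting out.

```python
import random
import statistics


def bootstrap_std_error(data, estimator, n_resamples=1000, seed=0):
    """Estimate the standard error of `estimator` by sampling WITH replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = [estimator(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(estimates)


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train, test) index lists for k disjoint folds.

    Each index lands in exactly one test fold: sampling WITHOUT replacement.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic sample

    se = bootstrap_std_error(data, statistics.fmean)
    print(f"bootstrap standard error of the mean: {se:.3f}")

    for train_idx, test_idx in k_fold_indices(len(data), k=5):
        print(f"train size={len(train_idx)}, test size={len(test_idx)}")
```

In a bootstrap resample the same observation can appear several times, whereas each cross-validation fold holds out a set of observations that appears in no other test fold.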

Source

[ VIDEO OF THE WEEK]

Data-As-A-Service (#DAAS) to enable compliance reporting


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

He uses statistics as a drunken man uses lamp posts—for support rather than for illumination. – Andrew Lang

[ PODCAST OF THE WEEK]

Pascal Marmier (@pmarmier) @SwissRe discusses running data driven innovation catalyst


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Data production will be 44 times greater in 2020 than it was in 2009.

Sourced from: Analytics.CLUB #WEB Newsletter