Dec 13, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Statistics  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Big Data Advances in Customer Experience Management by bobehayes

>> Tackling 4th Industrial Revolution with HR4.0 – Playcast – Data Analytics Leadership Playbook Podcast by v1shal

>> Jeff Palmucci / @TripAdvisor discusses managing a #MachineLearning #AI Team by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>> Global Sentiment Analysis Software Market Size, Growth Opportunities, Current Trends, Forecast by 2025 – Redfield Herald (press release) (blog) Under Sentiment Analysis

>> Ultimate Software Climbs Into the Cloud – Motley Fool Under Cloud

>> Google Analytics updated with Google Material Theme tweaks on the web – 9to5Google Under Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Artificial Intelligence

image

This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more

[ FEATURED READ]

Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking

image

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for e… more

[ TIPS & TRICKS OF THE WEEK]

Grow at the speed of collaboration
Research by Cornerstone On Demand points to the need for better collaboration within the workforce, and the data analytics domain is no different. A rapidly changing and growing industry like data analytics is very difficult for an isolated workforce to keep up with. A good collaborative work environment facilitates a better flow of ideas, improved team dynamics, rapid learning, and an increased ability to cut through the noise. So, embrace collaborative team dynamics.

[ DATA SCIENCE Q&A]

Q:Explain what a false positive and a false negative are. Why is it important to distinguish them from each other? Provide examples of when false positives are more important than false negatives, when false negatives are more important than false positives, and when these two types of errors are equally important
A: * False positive
Improperly reporting the presence of a condition when it is not present in reality. Example: an HIV-positive test result when the patient is actually HIV negative

* False negative
Improperly reporting the absence of a condition when it is in fact present. Example: not detecting a disease when the patient actually has it

When false positives are more important than false negatives:
– In a non-contagious disease, where treatment delay doesn’t have any long-term consequences but the treatment itself is grueling
– HIV test: psychological impact

When false negatives are more important than false positives:
– If early treatment is important for good outcomes
– In quality control: a defective item passes through the cracks!
– Software testing: a test to catch a virus has failed
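For a concrete (hypothetical) illustration of the two error types, the short Python sketch below counts false positives and false negatives from made-up ground-truth and predicted labels:

# Minimal sketch: counting false positives and false negatives
# from hypothetical labels (1 = condition present, 0 = absent).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]   # made-up ground truth
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]   # made-up classifier output

false_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
false_negatives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print("False positives:", false_positives)  # condition reported but absent
print("False negatives:", false_negatives)  # condition present but missed

Which of the two counts you try hardest to minimize depends on the scenarios above.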

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Finance and Insurance Analytics

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Without big data, you are blind and deaf and in the middle of a freeway. – Geoffrey Moore

[ PODCAST OF THE WEEK]

@AlexWG on Unwrapping Intelligence in #ArtificialIntelligence #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

And one of my favourite facts: at the moment, less than 0.5% of all data is ever analysed and used; just imagine the potential here.

Sourced from: Analytics.CLUB #WEB Newsletter

The Data Driven Road Less Traveled

data_img

To help explain the topic, let me take a small detour and describe the conventional big-business paradox and why it is a threat in today’s economy. Remember the days when large businesses were called 800-pound gorillas and small businesses only dreamt about touching their market share? That, in some sense, is the conventional big-business paradox. It is not true anymore. In an ever-connected world, with easy access to cutting-edge platforms and methodologies, even small businesses have access to disruptive technologies and practices. In fact, small businesses have an advantage: they can react quickly, act nimbly, and maintain better focus. So it is no surprise that, every now and then, big businesses take blows from companies running against the conventional big-business paradox. So, what is wrong? Big organizations, more often than they should, run in their conventional ways. Sure, you could argue that scale and size make them slow; however, try explaining that to businesses like Amazon, Salesforce, and Google. In the current landscape, rapidly changing market and customer dynamics demand better ways to analyze evolving customer expectations and faster response times.

Besides getting their hands on the best talent in the market, large businesses should create ways to introduce another paradigm that helps them identify and understand changing customer expectations and technology shifts. I call it the Data Driven Enterprise.

Data never lies, never introduces bias, and never leaves anything to assumptions. Data-driven businesses have proven time and again to be more sustainable. In fact, the star products of big businesses are already extensively monitored. What fails most businesses is their lack of attention to the nooks and crannies where we first observe the signs of changing customer expectations, preferences, and technology. That is why a centralized, data-driven approach is the way to go.

A data-driven framework is not complicated; it is a series of focused steps taken to achieve a data-focused enterprise. Think of it as an engine to rapidly validate hypotheses in a lean, iterative manner. Sounds cool? Yes, it is! More than ever before, businesses are aligning themselves toward a data-driven enterprise. Want more information…

I’ve written an ebook, which crossed 3,000 downloads last month. It lays out a series of easy-to-digest steps for building the thought leadership needed to take a big business down the data-driven innovation route. Please feel free to download the ebook at http://pxl.me/ddibook and let me know your thoughts.
Remember, it’s just the start of the discussion; together we all have to travel a long road to sustained business growth.

Originally Posted at: The Data Driven Road Less Traveled by d3eksha

Dec 06, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Big Data knows everything  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Development of the Customer Sentiment Index: Lexical Differences by bobehayes

>> Using Big Data In A Crisis: Nepal Earthquake by analyticsweekpick

>> Customer Loyalty and Customer Lifetime Value by bobehayes

Wanna write? Click Here

[ NEWS BYTES]

>> Senior Data Scientist – Built In Chicago Under Data Scientist

>> Risk Analytics Market 2018-2023: Top Company, Highest manufactures, Competitors, challenges and Drivers with … – News Egypt (press release) (blog) Under Risk Analytics

>> Egyptian Pollution Plan Helps Combat ‘Black Cloud’ – Voice of America Under Cloud

More NEWS ? Click Here

[ FEATURED COURSE]

Python for Beginners with Examples

image

A practical Python course for beginners with examples and exercises…. more

[ FEATURED READ]

Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th Edition

image

The eagerly anticipated Fourth Edition of the title that pioneered the comparison of qualitative, quantitative, and mixed methods research design is here! For all three approaches, Creswell includes a preliminary conside… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from a zombie apocalypse of unscalable models
One living, breathing zombie in today’s analytical models is the absence of error bars. Not every model is scalable or holds up as data grows. The error bars attached to almost every model should be duly calibrated; as business models take in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading us to a Halloween we never want to see.
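As a rough, hypothetical sketch of the idea (not from the original tip): one simple way to attach an error bar to a model metric is bootstrap resampling. The data and model below are made up for illustration.

# Minimal sketch: attaching a bootstrap error bar to a model's accuracy,
# so the estimate stays honest as data volumes change.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))                          # made-up features
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(int)   # made-up labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = []
for i in range(200):                                   # 200 bootstrap resamples
    Xb, yb = resample(X_tr, y_tr, random_state=i)
    scores.append(LogisticRegression().fit(Xb, yb).score(X_te, y_te))

low, high = np.percentile(scores, [2.5, 97.5])
print(f"Accuracy {np.mean(scores):.3f} with error bar [{low:.3f}, {high:.3f}]")

If the interval widens or drifts as more data arrives, that is the signal the tip warns about.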

[ DATA SCIENCE Q&A]

Q:Explain what a local optimum is and why it is important in a specific context, such as K-means clustering. What are specific ways of determining if you have a local optimum problem? What can be done to avoid local optima?

A: * A solution that is optimal within a neighboring set of candidate solutions
* In contrast with the global optimum: the optimal solution among all candidates

* K-means clustering context:
It is proven that the objective cost function will always decrease until a local optimum is reached.
Results will depend on the initial random cluster assignment.

* Determining if you have a local optimum problem:
A tendency toward premature convergence
Different initializations yield different optima

* Avoiding local optima in a K-means context: repeat K-means several times and take the solution with the lowest cost (see the sketch below)
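A minimal sketch of that restart strategy, using scikit-learn on synthetic data (the data and parameter choices are made up for illustration):

# Minimal sketch: mitigate local optima by repeating K-means with
# different random initializations and keeping the lowest-cost solution.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))                     # made-up data

best_model, best_cost = None, np.inf
for seed in range(10):                            # 10 independent restarts
    km = KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X)
    if km.inertia_ < best_cost:                   # inertia_ = within-cluster sum of squares
        best_model, best_cost = km, km.inertia_

print("Lowest cost found:", best_cost)

In practice, scikit-learn's n_init parameter performs these restarts internally; the explicit loop just makes the idea visible.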

Source

[ VIDEO OF THE WEEK]

Harsh Tiwari talks about fabric of data driven leader in Financial Sector #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

If we have data, let’s look at data. If all we have are opinions, let’s go with mine. – Jim Barksdale

[ PODCAST OF THE WEEK]

Harsh Tiwari talks about fabric of data driven leader in Financial Sector #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

We are seeing a massive growth in video and photo data, where every minute up to 300 hours of video are uploaded to YouTube alone.

Sourced from: Analytics.CLUB #WEB Newsletter

Big Data Explained in Less Than 2 Minutes – To Absolutely Anyone

There are some things that are so big that they have implications for everyone, whether we want them to or not. Big Data is one of those concepts; it is completely transforming the way we do business and impacting most other parts of our lives.

It’s such an important idea that everyone from your grandma to your CEO needs to have a basic understanding of what it is and why it’s important.

Source for cartoon: click here

What is Big Data?

“Big Data” means different things to different people and there isn’t, and probably never will be, a commonly agreed upon definition out there. But the phenomenon is real and it is producing benefits in so many different areas, so it makes sense for all of us to have a working understanding of the concept.

So here’s my quick and dirty definition:

The basic idea behind the phrase ‘Big Data’ is that everything we do is increasingly leaving a digital trace (or data), which we (and others) can use and analyse. Big Data therefore refers to that data being collected and our ability to make use of it.

I don’t love the term “big data” for a lot of reasons, but it seems we’re stuck with it. It’s basically a ‘stupid’ term for a very real phenomenon – the datafication of our world and our increasing ability to analyze data in a way that was never possible before.

Of course, data collection itself isn’t new. We as humans have been collecting and storing data since as far back as 18,000 BCE. What’s new are the recent technological advances in chip and sensor technology, the Internet, cloud computing, and our ability to store and analyze data, which together have changed the quantity of data we can collect.

Things that have been a part of everyday life for decades — shopping, listening to music, taking pictures, talking on the phone — now happen more and more wholly or in part in the digital realm, and therefore leave a trail of data.

The other big change is in the kind of data we can analyze. It used to be that data fit neatly into tables and spreadsheets, things like sales figures and wholesale prices and the number of customers that came through the door.

Now data analysts can also look at “unstructured” data like photos, tweets, emails, voice recordings and sensor data to find patterns.

How is it being used?

As with any leap forward in innovation, the tool can be used for good or nefarious purposes. Some people are concerned about privacy, as more and more details of our lives are being recorded and analyzed by businesses, agencies, and governments every day. Those concerns are real and not to be taken lightly, and I believe that best practices, rules, and regulations will evolve alongside the technology to protect individuals.

But the benefits of big data are very real, and truly remarkable.

Most people have some idea that companies are using big data to better understand and target customers. Using big data, retailers can predict what products will sell, telecom companies can predict if and when a customer might switch carriers, and car insurance companies understand how well their customers actually drive.

It’s also used to optimize business processes. Retailers are able to optimize their stock levels based on what’s trending on social media, what people are searching for on the web, or even weather forecasts. Supply chains can be optimized so that delivery drivers use less gas and reach customers faster.

But big data goes way beyond shopping and consumerism. Big data analytics enable us to find new cures and better understand and predict the spread of diseases. Police forces use big data tools to catch criminals and even predict criminal activity, and credit card companies use big data analytics to detect fraudulent transactions. A number of cities are even using big data analytics with the aim of turning themselves into Smart Cities, where a bus knows to wait for a delayed train and traffic signals predict traffic volumes and operate to minimize jams.

Why is it so important?

The biggest reason big data is important to everyone is that it’s a trend that’s only going to grow.

As the tools to collect and analyze the data become less and less expensive and more and more accessible, we will develop more and more uses for it — everything from smart yoga mats to better healthcare tools and a more effective police force.

And, if you live in the modern world, it’s not something you can escape. Whether you’re all for the benefits big data can bring, or worried about Big Brother, it’s important to be aware of the phenomena and tuned in to how it’s affecting your daily life.

What are your biggest questions about big data? I’d love to hear them in the comments below — and they may inspire future posts to address them.

To read the full article on Data Science Central, click here.

Source: Big Data Explained in Less Than 2 Minutes – To Absolutely Anyone

Bias: Breaking the Chain that Holds Us Back

Speaker Bio: Dr. Vivienne Ming, named one of 10 Women to Watch in Tech by Inc. Magazine, is a theoretical neuroscientist, entrepreneur, and author. She co-founded Socos Labs, her fifth company, an independent think tank exploring the future of human potential. Dr. Ming launched Socos Labs to combine her varied work with that of other creative experts and expand their impact on global policy issues, both inside companies and throughout our communities. Previously, Vivienne was a visiting scholar at UC Berkeley’s Redwood Center for Theoretical Neuroscience, pursuing her research in cognitive neuroprosthetics. In her free time, Vivienne has invented AI systems to help treat her diabetic son, predict manic episodes in bipolar sufferers weeks in advance, and reunite orphan refugees with extended family members. She sits on the boards of numerous companies and nonprofits including StartOut, The Palm Center, Cornerstone Capital, Platypus Institute, Shiftgig, Zoic Capital, and SmartStones. Dr. Ming also speaks frequently on her AI-driven research into inclusion and gender in business. For relaxation, she is a wife and mother of two.

Distilled Blog Post Summary: Dr. Vivienne Ming’s talk at a recent Domino MeetUp delved into bias and its implications, including potential liabilities for algorithms, models, businesses, and humans. Dr. Ming’s evidence included first-hand knowledge fundraising for multiple startups, data analysis completed during her tenure as the Chief Scientist at Gild, as well as citing studies within data, economics, recruiting, and education. This blog post provides text and video clip highlights from the talk. The full video is available for viewing. If you are interested in viewing additional content from Domino’s past events, review the Data Science Popup Playlist. If you are interested in attending an event in-person, then consider the upcoming Rev.

Research, Experimentation, and Discovery: Core of Science

Research, experimentation, and discovery are at the core of all types of science, including data science. Dr. Ming kicked off the talk by noting that “one of the powers of doing a lot of rich data work, there’s this whole range– I mean, there’s very little in this world that’s not an entree into”. While Dr. Ming provided detailed insights and evidence pointing to the potential of rich data work throughout the talk, this blog post focuses on the implications and liabilities of bias within gender, names, and ethnic demographics. It also covers how bias isn’t solely a data or algorithm problem; it is a human problem. The first step to addressing bias is acknowledging that it exists.

Do You See the Chameleon? The Roots of Bias

Each one of us has biases and makes assessments based on those biases. Dr. Ming uses Johannes Stotter’s Chameleon to point out that “the roots of bias are fundamental and unavoidable”. Many people, when they see the image, see a chameleon. However, the chameleon image consists of two people covered in body paint and strategically placed to look like a chameleon. In the video clip below, Dr. Ming indicates

“I cannot make an unbiased AI. There are no unbiased rats in the world. In a very basic sense, these systems are making decisions on their uncertainty, and the only rational way to do that is to act the best we can given the data. The problem is when you refuse to acknowledge there’s a problem with our bias and actually do something about it. And we have this tremendous amount of evidence that there is a serious problem, and it’s holding, not just small things back. But as I’m going to get to later, it’s holding us back from a transformed world, one that I think anyone can selfishly celebrate.”


Bias as the Pat on the Head (or the Chain) that Holds Us Back

While history is filled with moments when bias was not acknowledged as a problem, there are also moments when people addressed societally reinforced gender bias. Women have assumed male noms de plume to write epic novels, fight in wars, win judo championships, run marathons, and even, as Dr. Ming pointed out, create an all-women software company called Freelance Programmers in the 1960s. During the meetup, Dr. Ming indicated that Dame Stephanie “Steve” Shirley’s TED Talk, “Why do ambitious women have flat heads?”, helped her parse two distinctly different startup fundraising experiences that were grounded in gender bias.

Prior to Dr. Ming co-founding her current education technology company and obtaining her academic credentials, she dropped out of college and started a film company. When

“we started this company, and the funny thing is, despite having nothing, nothing that anyone should invest in– we didn’t have a script. We didn’t have talent. Literally, we didn’t even have talent. We didn’t have experience. We had nothing. We essentially raised what you might in the tech industry called seed round after a few phone calls.“

However, raising funding was more difficult the second time, for her current company, despite having substantially more academic, technology, and business credentials. During one of the funding meetings with a small firm with 5 partners, Dr. Ming relayed how the last partner said “‘you should feel so proud of what you’ve built’. And at the time, I thought, oh, Jesus, at least one of these people is on our side. In fact, as we were leaving the room, he literally patted me on the head, which seemed a little strange.” This prompted Dr. Ming to consider how

“my credentials are transformed that second time. No one questioned us about the technology. They loved it. They questioned whether we know how to run a business. The product itself people loved versus a film. Everything the second time around should have been dramatically easier. Except the only real difference that I can see is that the first time I was a man and the second time I was a woman.“

This led Dr. Ming to understand what Stephanie Shirley meant by ambitious women having flat heads from all the times they have been patted on the head. Dr. Ming relays that

“I’ve learned ever since as an entrepreneur is, as soon as it feels like they’re dealing with their favorite niece rather than me as a business person, then I know, I know that they simply are not taking me seriously. And all the PhD’s in the world doesn’t matter, all the past successes in my other companies doesn’t matter. You are just that thing to me. And what I’ve learned is, figure that out ahead of time. Don’t bother wasting days and hours, and prepping to pitch to people that simply are not capable of understanding who you are, but of course, in a lot of context, that’s all you’ve got.“

Dr. Ming also pointed out that the bias due to gender also manifested at an organization where she worked before and after her gender transition. She noted when she went into work after her gender transition,

“That’s the last day anyone ever asked me a math question, which is kind of funny. I do happen to also have a PhD in psychology. But somehow one day to the next, I didn’t forget how to do convergence proofs. I didn’t forget what it meant to invent algorithms. And yet that was how people dealt with it, people who knew before. You see how powerful the change is to see someone in a different skin.”

This experience is similar to Dame Shirley’s, who, in order to start what would become a multi-billion dollar software company in the 1960s, “started to challenge the conventions of the time, even to the extent of changing my name from “Stephanie” to “Steve” in my business development letters, so as to get through the door before anyone realized that he was a she”. Dame Shirley subverted bias during a time when she, as a female, was prevented from working on the stock exchange, driving a bus, or, “Indeed, I couldn’t open a bank account without my husband’s permission”. Yet, despite the bias, Dame Shirley remarked

“who would have guessed that the programming of the black box flight recorder of Supersonic Concord would have been done by a bunch of women working in their own homes” ….”And later, when it was a company valued at over three billion dollars, and I’d made 70 of the staff into millionaires, they sort of said, “Well done, Steve!”

While it is no longer the 1960s, bias implications and liabilities are still present. Yet we in data science are able to access data and have open conversations about bias as the first step toward avoiding inaccuracies, training data liabilities, and model liabilities within our data science projects and analyses. What if, in 2018, people built and trained models on the assumption that humans with XY chromosomes lacked the ability to code, because they only reviewed and used data from Dame Shirley’s company in the 1960s? Consider that for a moment, as that is what happened to Dame Shirley, Dr. Ming, and many others. Bias implications and liabilities have real-world consequences. Being aware of bias and then addressing it moves the industry forward toward breaking the chain that holds research, data science, and us back.

Say My Name: Biased Perceptions Uncovered

When Dr. Ming was the Chief Scientist at Gild, a reporter called her for a quote on the Jose Zamora story. This also fed into the research for her upcoming book, “The Tax of Being Different”. Dr. Ming relayed anecdotes during the meetup (see the video clip) and has also written about this research for the Financial Times:

“To calculate the tax on being different I made use of a data set of 122m professional profiles collected by Gild, a company specialising in tech for hiring and HR, where I worked as chief scientist. From that data, I was able to compare the career trajectories of specific populations by examining the actual individuals. For example, our data set had 151,604 people called “Joe” and 103,011 named “José”. After selecting only for software developers we still had 7,105 and 4,896 respectively, real people writing code for a living. Analysing their career trajectories I found that José typically needs a masters degree or higher compared to Joe with no degree at all to be equally likely to get a promotion for the same quality of work. The tax on being different is largely implicit. People need not act maliciously for it to be levied. This means that José needs six additional years of education and all of the tuition and opportunity costs that education entails. This is the tax on being different, and for José that tax costs $500,000-$1m over his lifetime.” (Financial Times)


While this particular example focuses on ethnicity-oriented demographic bias, during the meetup discussion Dr. Ming referenced quite a few research studies regarding name bias. In case Domino Data Science Blog readers do not have the research she cites on hand, a sample of published studies on name bias includes: names that suggest male gender, “noble-sounding” surnames in Europe, and names perceived as “easy to pronounce”, which also has implications for how organizations choose their names. Yet Dr. Ming did not limit the discussion to bias within gender and naming; she also dived right into how demographic bias impacts image classification, particularly with ethnicity.

Bias within Image Classification: Missing Uhura and Not Unlocking your iPhone X

Before Dr. Ming was the Chief Scientist at Gild, she saw a demo of Paul Viola’s face recognition algorithm. In that demo, she noticed that the algorithm didn’t detect Uhura. Viola indicated that this was a problem and that it would be addressed. Fast forward years later, to when Dr. Ming was the Chief Scientist at Gild: she relayed how she received “a call from The Wall Street Journal [and the WSJ asked her] ‘So Google’s face recognition system just labeled a black couple as gorillas. Is AI racist?’ And I said, ‘Well, it’s the same as the rest of us. It depends on how you raise it.’”

For background context: in 2015, Google released a new photo app, and a software developer discovered that the app labeled two people of color as “gorillas”. Yonatan Zunger was the Chief Architect for Social at Google at the time; since leaving Google, he has provided candid commentary about the bias. Then, in January 2018, Wired ran a follow-up story on the 2015 event. In the article, Wired tested Google Photos and found that the labels for gorillas, chimpanzees, chimp, and monkey “were censored from searches and image tags after the 2015 incident”. This was confirmed by Google. Wired also ran a test to assess how the app viewed people by searching for “African American”, “black man”, “black woman”, or “black person”, which resulted in “an image of a grazing antelope” (on the search “African American”) as well as “black-and-white images of people, correctly sorted by gender but not filtered by race”. This points to the continued challenges involved in addressing bias in machine learning and models, bias that also has implications beyond social justice.

As Dr. Ming pointed out in the meetup video clip below, facial recognition is also built into the iPhone X. The face recognition feature has potential challenges in recognizing global faces of color. Yet, despite all of this, Dr. Ming indicates “but what you have to recognize, none of these are algorithm problems. These are human problems.” Humans made decisions to build algorithms, build models, train models, and roll out products that include bias that has wide implications.


Conclusion

Introducing liability into an algorithm or model via bias isn’t solely a data or algorithm problem, it is a human problem. Understanding that it is a problem is the first step in addressing it. In the recent Domino Meetup, Dr. Ming relayed how

“AI is an amazing tool, but it’s just a tool. It will never solve your problems for you. You have to solve them. And particularly in the work I do, there are only ever messy human problems, and they only ever have messy human solutions. What’s amazing about machine learning is that once we found some of those issues, we can actually use it to reach as many people as possible, to make this essentially cost-effective, to scale that solution to everyone. But if you think some deep neural network is going to somehow magically figure out who you want to hire when you have not been hiring the right people in the first place, what is it you think is happening in that data set?”

Domino continually curates and amplifies ideas, perspectives, and research to contribute to discussions that accelerate data science work. The full video of Dr. Ming’s talk at the recent Domino MeetUp is available. There is also an additional technical talk that Dr. Ming gave at the Berkeley Institute of Data Science on “Maximizing Human Potential Using Machine Learning-Driven Applications”. If you are interested in similar content to these talks, please feel free to visit the Domino Data Science Popup Playlist or attend the upcoming Rev.

The post Bias: Breaking the Chain that Holds Us Back appeared first on Data Science Blog by Domino.

Source: Bias: Breaking the Chain that Holds Us Back

Nov 29, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Complex data  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Top 4 Instagram Analytics Tools That Digital Marketers Can Use for Business by thomassujain

>> Meet the Robot Reading Your Resume [infographics] by v1shal

>> Adopting a Multi-Cloud Strategy: Challenges vs. Benefits by analyticsweek

Wanna write? Click Here

[ NEWS BYTES]

>> 10 Best Social Media Management Tools for Marketers – TGDaily – TG Daily (blog) Under Social Analytics

>> Tableau cracks the business data code…be a data scientist now for just $19 – TNW Under Data Scientist

>> How efficient smart cities will be built on IoT sensors – TechRepublic Under IOT

More NEWS ? Click Here

[ FEATURED COURSE]

Artificial Intelligence

image

This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more

[ FEATURED READ]

The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t

image

People love statistics. Statistics, however, do not always love them back. The Signal and the Noise, Nate Silver’s brilliant and elegant tour of the modern science-slash-art of forecasting, shows what happens when Big Da… more

[ TIPS & TRICKS OF THE WEEK]

Data aids, not replaces, judgement
Data is a tool and a means to help build consensus and facilitate human decision-making, not replace it. Analysis converts data into information; information, via context, leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start; context and intuition also play a role.

[ DATA SCIENCE Q&A]

Q:Why is naive Bayes so bad? How would you improve a spam detection algorithm that uses naive Bayes?
A: Naïve: the features are assumed to be independent/uncorrelated
This assumption is not realistic in many cases
Improvement: decorrelate the features (transform the covariance matrix into the identity matrix); see the sketch below
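A minimal sketch of that improvement, assuming continuous features for simplicity: PCA whitening decorrelates the features (approximately identity covariance) before a Gaussian naive Bayes fit. The data set below is synthetic and made up for illustration; a real spam filter would build features from message text first.

# Minimal sketch: whiten correlated features before naive Bayes so the
# independence assumption is less badly violated.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=500)  # correlated, made-up features
y = (X[:, 0] + X[:, 1] > 0).astype(int)                                  # made-up labels

# PCA with whiten=True rotates and rescales the features so their
# covariance matrix is (approximately) the identity.
model = make_pipeline(PCA(whiten=True), GaussianNB()).fit(X, y)
print("Training accuracy:", model.score(X, y))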

Source

[ VIDEO OF THE WEEK]

The History and Use of R

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

It’s easy to lie with statistics. It’s hard to tell the truth without statistics. – Andrejs Dunkels

[ PODCAST OF THE WEEK]

@TimothyChou on World of #IOT & Its #Future Part 1 #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

100 terabytes of data are uploaded to Facebook daily.

Sourced from: Analytics.CLUB #WEB Newsletter

Unraveling the Mystery of Big Data

Slide01
Synopsis:
Curious about the Big Data hype? Want to find out just how big BIG is? Who’s using Big Data for what, and what can you use it for? How about the architecture underpinnings and technology stacks? Where might you fit in the stack? Maybe some gotchas to avoid? Lionel Silberman, a seasoned Data Architect, sheds some light on it. A good and wholesome refresher on Big Data and all it can do.
Our guest speaker:

Lionel Silberman,
Senior Data Architect, Compuware
Lionel Silberman has over thirty years of experience in big data product development. He has expert knowledge of relational databases, both internals and applications, performance tuning, modeling, and programming. His product and development experience encompasses the major RDBMS vendors, object-oriented, time-series, OLAP, transaction-driven, MPP, distributed and federated database applications, data appliances, NoSQL systems Hadoop and Cassandra, as well as data parallel and mathematical algorithm development. He is currently employed at Compuware, integrating enterprise products at the data level. All are welcome to join us.

Video:

Slideshare:

Source by v1shal

Data Science Programming: Python vs R

“Data Scientist – The Sexiest Job of 21st Century.”- Harvard Business Review

If you are already in a big data related career then you must be familiar with the set of big data skills you need to master to grab the sexiest job of the 21st century. With every industry generating massive amounts of data, crunching it calls for more powerful and sophisticated programming tools like Python and R. Python and R are among the popular programming languages that a data scientist must know to pursue a lucrative career in data science.

Python is popular as a general-purpose programming language, whereas R is popular for its great data visualization features, as it was developed particularly for statistical computing. At DeZyre, our career counsellors often get questions from prospective students about whether they should learn Python programming or R programming first. If you are unsure which programming language to learn first, you are on the right page.

Python and R top the list of basic tools for statistical computing in the data scientist’s skill set. Data scientists often debate which is more valuable, R programming or Python programming; however, both languages have specialized key features that complement each other.

Data Science with Python Language

Data science consists of several interrelated but different activities such as computing statistics, building predictive models, accessing and manipulating data, building explanatory models, data visualizations, integrating models into production systems and much more on data. Python programming provides data scientists with a set of libraries that helps them perform all these operations on data.

Python is a general purpose, multi-paradigm programming language for data science that has gained wide popularity because of its simple syntax and its operability across different ecosystems. Python lets programmers play with data and do almost anything they need with it: data munging, data wrangling, website scraping, web application building, data engineering and more. Python makes it easy for programmers to write maintainable, large-scale, robust code.

“Python programming has been an important part of Google since the beginning, and remains so as the system grows and evolves. Today dozens of Google engineers use Python language, and we’re looking for more people with skills in this language.” – said Peter Norvig, Director at Google.

Unlike R, Python does not have built-in statistical packages, but it has support for libraries like scikit-learn, NumPy, Pandas, SciPy and Seaborn that data scientists can use to perform useful statistical and machine learning tasks. Python code reads almost like pseudocode and makes sense immediately, much like plain English. The expressions and characters used in the code can be mathematical; however, the logic can easily be followed from the code.
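As a tiny, made-up illustration of the kind of work these libraries support (the column names and numbers below are invented for the example), a few lines of pandas and scikit-learn cover summary statistics and a simple predictive model:

# Minimal sketch: descriptive statistics with pandas and a simple
# regression with scikit-learn on a made-up dataset.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ad_spend": np.arange(1.0, 11.0),                            # made-up predictor
    "sales": 3.0 * np.arange(1.0, 11.0) + rng.normal(size=10),   # made-up response
})

print(df.describe())                                             # quick summary statistics
model = LinearRegression().fit(df[["ad_spend"]], df["sales"])
print("Estimated effect of ad spend on sales:", model.coef_[0])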

What makes Python language the King of Data Science Programming Languages?

“In Python programming, everything is an object. It’s possible to write applications in Python language using several programming paradigms, but it does make for writing very clear and understandable object-oriented code.”- said Brian Curtin, member of Python Software Foundation

1) Broadness

The public package index for Python, popularly known as PyPI, has approximately 40K add-ons listed under 300 different categories. So, if a developer or data scientist has to do something in Python, there is a high probability that someone has already done it, and they need not begin from scratch. Python is used extensively for tasks ranging from CGI and web development, system testing and automation, and ETL to gaming.

2) Efficient

Developers these days spend a lot of time defining and processing big data. With the increasing amount of data that needs to be processed, it becomes extremely important for programmers to manage in-memory usage efficiently. Python has generators, both as functions and as expressions, which help with iterative processing, i.e. handling one item at a time. When a large number of processing steps must be applied to a data set, generators prove to be a great advantage: they grab the source data one item at a time and pass it through the entire processing chain.

The generator-based migration tool collective.transmogrifier helps make complex and interdependent updates to the data as it is processed from the old site, and then allows programmers to create and store objects in constant memory at the new site. The transmogrifier plays a vital role in Python programming when dealing with larger data sets.
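To make the one-item-at-a-time idea concrete, here is a minimal, generic generator pipeline; it does not use collective.transmogrifier’s actual API, and the file name is hypothetical:

# Minimal sketch: a generator-based processing chain that handles one
# record at a time, so memory use stays flat regardless of input size.
def read_records(path):
    with open(path) as f:               # stream lines instead of loading the whole file
        for line in f:
            yield line.rstrip("\n")

def clean(records):
    for r in records:
        yield r.strip().lower()

def keep_non_empty(records):
    return (r for r in records if r)    # generator expression

# Hypothetical file name; each stage pulls one item at a time from the previous one.
pipeline = keep_non_empty(clean(read_records("old_site_export.txt")))
for record in pipeline:
    pass  # store or transform the record at the new site here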

3) Can be Easily Mastered Under Expert Guidance-Read It, Use it with Ease

Python has gained wide popularity because its syntax is clear and readable, making it easy to learn under expert guidance. Data scientists can gain expertise and master programming with Python for scientific computing by taking industry-expert-oriented Python programming courses. The readability of the syntax makes it easier for peer programmers to update existing Python programs at a faster pace and also helps them write new programs quickly.

Applications of Python language-

  • Python programming is used by Mozilla for exploring their broad code base. Mozilla releases several open source packages built using Python.
  • Dropbox, a popular file hosting service, was founded by Drew Houston because he kept forgetting his USB drive. The project was started to fulfill his personal needs, but it turned out so well that others started using it. Dropbox is written completely in Python and now has close to 150 million registered users.
  • Walt Disney uses Python language to enhance the supremacy of their creative processes.
  • Some other exceptional products written in Python language are –

i. Cocos2d – A popular open source 2D gaming framework

ii. Mercurial – A popular cross-platform, distributed code revision control tool used by developers

iii. BitTorrent – File sharing software

iv. Reddit – Entertainment and social news website

Limitations of Python Programming-

  • Python is an interpreted language and thus is often slower than compiled languages.
  • “A possible disadvantage of Python is its slow speed of execution. But many Python packages have been optimized over the years and execute at C speed.”- said Pierre Carbonnelle, a Python programmer who runs the PyPL language index.
  • Being a dynamically typed language, Python poses certain design restrictions. It requires rigorous testing because errors show up only at runtime.
  • Python has gained popularity on desktop and server platforms but is still weak on mobile computing platforms, as very few mobile apps are developed using Python. Python is rarely found on the client side of web applications.

Click here to know more about our IBM Certified Hadoop Developer course

Data Science with R Language

Millions of data scientists and statisticians use R programming to tackle challenging problems in statistical computing and quantitative marketing. R has become an essential tool for finance and business analytics-driven organizations like LinkedIn, Twitter, Bank of America, Facebook and Google.

R is an open source programming language and environment for statistical computing and graphics, available on Linux, Windows and Mac. R has an innovative package system that allows developers to extend its functionality to new heights by providing cross-platform distribution and testing of data and code. With more than 5K publicly released packages available for download, it is a great language for exploratory data analysis. R can easily be integrated with other object-oriented programming languages like C, C++ and Java. R has array-oriented syntax that makes it easier for programmers to translate math to code, in particular for professionals with minimal programming background.

Why use R programming for data science?

1. R is one of the best tools for data scientists in the world of data visualization. It has virtually everything a data scientist needs: statistical models, data manipulation and visualization charts.

2. Data scientists can create unique and beautiful data visualizations with R that go far beyond outdated line plots and bar charts. With R programming, data scientists can draw meaningful insights from data in multiple dimensions using 3D surfaces and multi-panel charts. The Economist and The New York Times exploit the custom charting capabilities of R to create stunning infographics.

3. One great feature of R programming is reproducible research: the code and data can be given to an interested third party, who can trace them back to reproduce the same results. Data scientists write code that extracts the data, analyses it and generates an HTML, PDF or PPT report; when another party is interested, the original author can share the code and data so they can reproduce similar results.

4. R is designed particularly for data analysis, with the flexibility to mix and match various statistical and predictive models for the best possible outcomes. R scripts can easily be automated to promote production deployments and reproducible research.

5. R has a rich community of approximately 2 million users and thousands of developers, drawing on the talents of data scientists spread across the world. The community has packages spread across actuarial analysis, finance, machine learning, web technologies and pharmaceuticals that can help predict component failure times, analyse genomic sequences and optimize portfolios. All these resources, created by experts in various domains, can be accessed easily and for free online.

Applications of R Language

  • Ford uses open source tools like R programming and Hadoop for data driven decision support and statistical data analysis.
  • The popular insurance giant Lloyd’s uses R language to create motion charts that provide analysis reports to investors.
  • Google uses R programming to analyse the effectiveness of online advertising campaigns, predict economic activities and measure the ROI of advertising campaigns.
  • Facebook uses R language to analyse the status updates and create the social network graph.
  • Zillow makes use of R programming to promote the housing prices.

Limitations of R Language

  • R programming has a steep learning curve for professionals who do not come from a programming background (professionals hailing from a GUI world like that of Microsoft Excel).
  • Working with R can at times be slow if the code is written poorly; however, there are solutions to this, such as the FastR package, pqR and Renjin.

Data Science with Python or R Programming- What to learn first?

There are certain strategies that will help professionals decide their call of action on whether to begin learning data science with Python language or with R language –

  • If professionals know what kind of project they will be working on, they can decide which language to learn first. If the project requires working with jumbled or scraped data from files, websites or other sources, they should start by learning Python. On the other hand, if the project requires working with clean data, they should first focus on the data analysis part, which calls for learning R first.
  • It is always better to be on par with your team, so find out which data science programming language they are using, R or Python. Collaboration and learning become much easier if you and your teammates work in the same language paradigm.
  • Trends in data scientist job openings will help you make a better decision on what to learn first, R or Python.
  • Last but not least, consider your personal preferences: what interests you more and which is easier for you to grasp.

Having briefly reviewed Python and R, the bottom line is that it is difficult to choose just one language to learn first in order to crack data scientist jobs at top big data companies. Each has its own advantages and disadvantages depending on the scenarios and tasks to be performed. Thus, the best solution is to make a smart move based on the strategies listed above, decide which language to learn first to fetch a job with a big data scientist salary, and later add to your skill set by learning the other language.

To read the original article on DeZyre, click here.

Originally Posted at: Data Science Programming: Python vs R

Nov 22, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Ethics  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> How to #Leap into the #FutureOfWork by @HowardHYu #JobsOfFuture by v1shal

>> The Key to DevOps for Big Data Applications: Containers and Stateful Storage by jelaniharper

>> #OpenAnalyticsDay: A Day for Analytics by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>> Social media analytic tools for business marketers – mtltimes.ca Under Social Analytics

>> A How to Get More Bang from Your Big Data Clusters – CIO Under Big Data

>> Five9 Aims To Unlock Insight From Contact Center With Artificial Intelligence – Forbes Under Artificial Intelligence

More NEWS ? Click Here

[ FEATURED COURSE]

The Analytics Edge

image

This is an Archived Course
EdX keeps courses open for enrollment after they end to allow learners to explore content and continue learning. All features and materials may not be available, and course content will not be… more

[ FEATURED READ]

Big Data: A Revolution That Will Transform How We Live, Work, and Think

image

“Illuminating and very timely . . . a fascinating — and sometimes alarming — survey of big data’s growing effect on just about everything: business, government, science and medicine, privacy, and even on the way we think… more

[ TIPS & TRICKS OF THE WEEK]

Grow at the speed of collaboration
Research by Cornerstone On Demand points to the need for better collaboration within the workforce, and the data analytics domain is no different. A rapidly changing and growing industry like data analytics is very difficult for an isolated workforce to keep up with. A good collaborative work environment facilitates a better flow of ideas, improved team dynamics, rapid learning, and an increased ability to cut through the noise. So, embrace collaborative team dynamics.

[ DATA SCIENCE Q&A]

Q:When you sample, what bias are you inflicting?
A: Selection bias:
– An online survey about computer use is likely to attract people more interested in technology than is typical (illustrated in the sketch after this list)

Undercoverage bias:
– Sampling too few observations from a segment of the population

Survivorship bias:
– Observations at the end of the study are a non-random set of those present at the beginning of the investigation
– In finance and economics: the tendency for failed companies to be excluded from performance studies because they no longer exist
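As a small, made-up simulation of selection bias (not from the original answer): an online survey that over-samples technology enthusiasts overestimates average daily computer use compared with a random sample.

# Minimal sketch: selection bias in an online survey about computer use.
# All numbers (hours per day) are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
enthusiasts = rng.normal(7.0, 1.0, size=2_000)     # heavy users
typical     = rng.normal(3.0, 1.0, size=8_000)     # everyone else
population  = np.concatenate([enthusiasts, typical])

random_sample = rng.choice(population, size=500, replace=False)
# Online survey: enthusiasts are far more likely to respond.
biased_sample = np.concatenate([
    rng.choice(enthusiasts, size=400, replace=False),
    rng.choice(typical, size=100, replace=False),
])

print("True mean:          ", population.mean())
print("Random-sample mean: ", random_sample.mean())
print("Biased-sample mean: ", biased_sample.mean())  # overestimates usage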

Source

[ VIDEO OF THE WEEK]

#HumansOfSTEAM feat. Hussain Gadwal, Mechanical Designer via @SciThinkers #STEM #STEAM

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Getting information off the Internet is like taking a drink from a firehose. – Mitchell Kapor

[ PODCAST OF THE WEEK]

Understanding #BigData #BigOpportunity in Big HR by @MarcRind #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Three-quarters of decision-makers (76 per cent) surveyed anticipate significant impacts in the domain of storage systems as a result of the “Big Data” phenomenon.

Sourced from: Analytics.CLUB #WEB Newsletter

March 6, 2017 Health and Biotech analytics news roundup

Here’s the latest in health and biotech analytics:

Mathematical Analysis Reveals Prognostic Signature for Prostate Cancer: University of East Anglia researchers used an unsupervised technique to categorize cancers based on gene expression levels. Their method was better able than current supervised methods to identify patients with more harmful variants of the disease.

Assisting Pathologists in Detecting Cancer with Deep Learning: Scientists at Google have trained deep learning models to detect tumors in images of tissue samples. These models beat pathologists’ diagnoses by one metric.

Patient expectations for health data sharing exceed reality, study says: The Humana study shows that, among other beliefs, most patients think doctors share more information than they actually do. They also expect information from digital devices will be beneficial.

NHS accused of covering up huge data loss that put thousands at risk: The UK’s national health service failed to deliver half a million medically relevant documents between 2011 and 2016. They had previously briefed Parliament about the failure, but not the scale of it.

Entire operating system written into DNA at 215 Pbytes/gram: Yaniv Erlich and Dina Zielinski (New York Genome Center) used a “fountain code” to translate a 2.1 MB archive into DNA. They were able to retrieve the data by sequencing the resulting fragments, a process that was robust to mutations and loss of sequences.

Source: March 6, 2017 Health and Biotech analytics news roundup