Dec 27, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data analyst  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Data Matching with Different Regional Data Sets by analyticsweekpick

>> Jeff Palmucci / @TripAdvisor discusses managing a #MachineLearning #AI Team by v1shal

>> The Methods UX Professionals Use (2018) by analyticsweek

Wanna write? Click Here

[ NEWS BYTES]

>>
 Financial Contrast: LendingClub (LC) and Marchex (MCHX) – Fairfield Current Under  Social Analytics

>>
 Unraveling the Data Analytics Advantage – CDOTrends Under  Business Analytics

>>
 Cna Financial Corp (NYSE:CNA) Institutional Investor Sentiment Analysis – The Cardinal Weekly (press release) Under  Sentiment Analysis

More NEWS ? Click Here

[ FEATURED COURSE]

Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz

Use data to build a better startup faster in partnership with Geckoboard… more

[ FEATURED READ]

The Industries of the Future

The New York Times bestseller, from leading innovation expert Alec Ross, a “fascinating vision” (Forbes) of what’s next for the world and how to navigate the changes the future will bring…. more

[ TIPS & TRICKS OF THE WEEK]

Data Analytics Success Starts with Empowerment
Being data driven is less a technology challenge than an adoption challenge. Adoption has its roots in the cultural DNA of an organization. Great data-driven organizations build a data-driven culture into their corporate DNA. A culture of connection, interaction, sharing, and collaboration is what it takes to be data driven. It is about being empowered more than it is about being educated.

[ DATA SCIENCE Q&A]

Q: Explain what a local optimum is and why it is important in a specific context, such as K-means clustering. What are specific ways of determining if you have a local optimum problem? What can be done to avoid local optima?

A: * A solution that is optimal within a neighboring set of candidate solutions
* In contrast with a global optimum: the optimal solution among all candidate solutions

* K-means clustering context:
The objective (cost) function never increases from one iteration to the next, so the algorithm converges, but possibly only to a local optimum.
Results depend on the initial random cluster assignment

* Determining if you have a local optimum problem:
Tendency of premature convergence
Different initializations lead to different optima

* Avoiding local optima in a K-means context: repeat K-means with different initializations and keep the solution with the lowest cost
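
The restart strategy described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (kmeans_once, kmeans_restarts), the toy data set of three tight groups, and the choice of 50 restarts are all made up for the example.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_once(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random initialization.

    Returns (cost, centroids), where cost is the sum of squared
    distances from each point to its assigned centroid.
    """
    centroids = rng.sample(points, k)  # random initial centroids
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:  # converged, possibly to a local optimum
            break
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    cost = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))
    return cost, centroids

def kmeans_restarts(points, k, n_restarts=50, seed=0):
    """Repeat K-means and keep the run with the lowest cost."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_restarts)]
    best = min(runs, key=lambda r: r[0])
    return best, [r[0] for r in runs]

# Toy data (hypothetical): three tight groups of four points each.
corners = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5)]
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (0, 10)]
          for dx, dy in corners]

(best_cost, best_centroids), all_costs = kmeans_restarts(points, k=3)
print("lowest cost:", best_cost)
print("distinct costs seen:", sorted(set(round(c, 6) for c in all_costs)))
```

If the list of distinct costs contains more than one value, some initializations ended in a local optimum, which is the "different initialization, different optimum" symptom mentioned in the answer.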

Source

[ VIDEO OF THE WEEK]

@DrewConway on creating socially responsible data science practice #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

It’s easy to lie with statistics. It’s hard to tell the truth without statistics. – Andrejs Dunkels

[ PODCAST OF THE WEEK]

@DrewConway on fabric of an IOT Startup #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

The largest AT&T database boasts titles including the largest volume of data in one unique database (312 terabytes) and the second largest number of rows in a unique database (1.9 trillion), which comprises AT&T’s extensive calling records.

Sourced from: Analytics.CLUB #WEB Newsletter

What Happens When You Put Hundreds of BI Experts in One Room?

Last week we wrapped up the second day of our global, two-day client conference: Eureka!. Our sold-out event brought together hundreds of business leaders and analytics professionals from around the globe to listen to thought-provoking presentations and engage in discussions about the evolution of the analytics industry.

You may be wondering why we chose to call our client conference “Eureka!”. I’m glad you asked. A “eureka moment” is an “aha!” moment, a moment where something clicks and finally makes sense. By hearing and sharing stories, experiences, and perspectives with industry veterans and peers, we hoped attendees would experience moments of surprise and enlightenment.

Unsurprisingly, some of the hottest topics at Eureka! were the shift to embedding analytics everywhere, the impact of AI and augmented analytics on businesses, and how to drive transformational change with analytics.

Embedding Analytics Everywhere

In his opening day keynote, Sisense CEO, Amir Orad, emphasized the importance of lowering the barrier to analytics and empowering everyone to use data to make decisions. Providing analytics to everyone, everywhere means catering to the different ways people understand data. This means moving beyond desktop dashboards and offering insights naturally throughout our lives.

Continuing with the non-traditional side of analytics, Amir pointed to three organizations using analytics in unique ways:

  1. Celestica, a global electronics manufacturer, leverages analytics to reduce its carbon footprint. Within just four months of implementing analytics, it saw a reduction of 1,041 metric tons of CO2e. That’s enough energy to power 110 homes for one full year!
  2. Skullcandy, the incredibly popular maker of headphones, earbuds, and other audio and wireless products, has used analytics in their business to virtually eliminate fraudulent returns.
  3. Indiana Donor Network, the organ and tissue donation network for the state of Indiana, has used analytics to increase skin donations by 70% and cornea donations by a whopping 224%.

Solidifying the need to embed analytics everywhere in order to transform industries was Sham Sokka of Philips, who spoke about revolutionizing patient care by delivering relevant data and analytics to the right individual at each stage of client care. “We fully believe in this concept of data democratization,” Sham said. “Not everyone is a data scientist so you want to have a platform that can serve simple data to a patient but complex data to an administrator. Getting the right data to the right person is super critical.”

AI and Augmented Analytics

There’s no doubt that artificial intelligence and augmented analytics are going to continue to impact every aspect of analytics – from data prep to insight discovery.

In her keynote, Jen Underwood of Impact Analytix discussed the unprecedented pace of continuous technological change we’re currently witnessing. According to Jen, organizations that adopt augmented analytics see a multitude of benefits, including:

  1. Empowering the masses: Rather than providing analytics for only around 30% of an organization, augmented analytics makes discovering insight easy enough for everyone.
  2. Saving time: Augmented data prep automates and accelerates the process, applies reinforcement learning while humans drive algorithms, and helps improve data quality for faster results.
  3. Revealing hidden patterns: Augmented analytics can find patterns in your data that a human might never detect – or detect when it’s too late – using manual techniques.
  4. Improving accuracy: With the ability to apply statistical significance, uncertainty, and risk model estimates, augmented analytics takes into account aspects of data prep and modeling that manual approaches may miss.

Joining in on the topic of artificial intelligence, professor and author Avi Goldfarb gave a keynote that had participants glued to their chairs. His session demonstrated how artificial intelligence will affect business, public policy, and society in virtually all fields. The point he drove home? Prediction isn’t useful unless you can do something with it. What’s useful about AI and prediction is the ability to take action and create a feedback loop; that’s where the competitive edge comes into play.

Transformational Change

Advancements in technology are great but it’s the changes they bring to organizations that make all the difference in the real world. In his session, Bill Janczak from Indiana Donor Network told his organization’s story of transformation through the implementation of analytics.

As a small organization with a small IT budget, Indiana Donor Network has a large mission – to help people during their time of need. Run traditionally like a non-profit, Indiana Donor Network realized that changing their behavior and adding in analytics was the missing piece to ensuring organs make it to the right place at the right time. Using analytics they were able to make some major, important changes:

  1. Within hours they can now catch errors and common data entry challenges that would normally take around 30-45 days to find. This led to improved matches for organ transplants.
  2. They are now able to monitor which donor outreach programs are successful and which are not in order to focus their activities and spend their resources on programs that actually drive more awareness and donor authorization so that more people can be helped in the long run.

We’ve Struck Gold!

The last two days were a whirlwind of bright ideas, futuristic visions, and practical applications of analytics to improve businesses around the globe. If the excitement in the room surrounding all of the technological transformations was any indication, I’d say the future for analytics is bright.

I’d like to extend a quick thank you to all of our speakers and customers for contributing to an awesome, fascinating, and fun event. Until next year!

Originally Posted at: What Happens When You Put Hundreds of BI Experts in One Room? by analyticsweek

Dec 20, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

SQL Database  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Better Business Intelligence Boons with Data Virtualization by jelaniharper

>> Virtualization – A Look to the Future by analyticsweekpick

>> Future of Public Sector and Jobs in #BigData World #FutureOfData #Podcast by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>>
 2018-2023 Prescriptive Analytics Market Overview, Growth, Types, Applications, Market Dynamics, Companies … – Stock Analysis Under  Prescriptive Analytics

>>
 HPE Big Data VP On BlueData Deal, Dell EMC And The ‘Huge’ AI Opportunity – CRN Under  Big Data

>>
 Global Mobile Marketing Analytics Market Booming Growth Status, Market estimation, Analysis and Future forecast … – FastOnlineNews Under  Marketing Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Applied Data Science: An Introduction

As the world’s data grow exponentially, organizations across all sectors, including government and not-for-profit, need to understand, manage and use big, complex data sets—known as big data…. more

[ FEATURED READ]

Storytelling with Data: A Data Visualization Guide for Business Professionals

Storytelling with Data teaches you the fundamentals of data visualization and how to communicate effectively with data. You’ll discover the power of storytelling and the way to make data a pivotal point in your story. Th… more

[ TIPS & TRICKS OF THE WEEK]

Data Analytics Success Starts with Empowerment
Being data driven is less a technology challenge than an adoption challenge. Adoption has its roots in the cultural DNA of an organization. Great data-driven organizations build a data-driven culture into their corporate DNA. A culture of connection, interaction, sharing, and collaboration is what it takes to be data driven. It is about being empowered more than it is about being educated.

[ DATA SCIENCE Q&A]

Q: What is latent semantic indexing? What is it used for? What are the specific limitations of the method?
A: * Indexing and retrieval method that uses singular value decomposition to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text
* Based on the principle that words that are used in the same contexts tend to have similar meanings
* “Latent”: semantic associations between words are present not explicitly but only latently
* For example: two synonyms may never occur in the same passage but should nonetheless have highly associated representations

Used for:

* Learning correct word meanings
* Subject matter comprehension
* Information retrieval
* Sentiment analysis (social network analysis)
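
The synonym example above can be made concrete with a tiny, hand-built term-document matrix. The sketch below is illustrative only: the three-document corpus is invented, and instead of a library SVD routine it uses plain power iteration with deflation on A^T A to obtain the leading right singular vectors, which is enough for a matrix this small.

```python
import math

# Toy term-document matrix (rows = terms, columns = documents); illustrative only.
# doc0: "car engine", doc1: "automobile engine", doc2: "banana fruit"
terms = ["car", "automobile", "engine", "banana", "fruit"]
A = [
    [1, 0, 0],  # car
    [0, 1, 0],  # automobile
    [1, 1, 0],  # engine
    [0, 0, 1],  # banana
    [0, 0, 1],  # fruit
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def top_eigenvectors(sym, k, iters=500):
    """Top-k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [list(map(float, row)) for row in sym]
    vectors = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # fixed start vector
        for _step in range(iters):
            w = [dot(row, v) for row in work]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in work])  # Rayleigh quotient
        vectors.append(v)
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return vectors

# A^T A (document by document); its eigenvectors are the right singular vectors of A.
n_docs = len(A[0])
cols = [[row[j] for row in A] for j in range(n_docs)]
gram = [[dot(ci, cj) for cj in cols] for ci in cols]

V = top_eigenvectors(gram, k=2)  # keep 2 latent dimensions
# Projecting each term row onto the kept singular vectors gives the rows of U_k * S_k.
latent = {t: [dot(row, v) for v in V] for t, row in zip(terms, A)}
raw = dict(zip(terms, A))

raw_sim = cosine(raw["car"], raw["automobile"])
latent_sim = cosine(latent["car"], latent["automobile"])
unrelated_sim = cosine(latent["car"], latent["banana"])
print("raw cosine(car, automobile):   ", raw_sim)
print("latent cosine(car, automobile):", round(latent_sim, 6))
print("latent cosine(car, banana):    ", round(unrelated_sim, 6))
```

In this toy corpus "car" and "automobile" never share a document, so their raw similarity is zero, yet both co-occur with "engine" and end up close together in the reduced space.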

Source

[ VIDEO OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Juan Gorricho, @disney

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

In God we trust. All others must bring data. – W. Edwards Deming

[ PODCAST OF THE WEEK]

@ChuckRehberg / @TrigentSoftware on Translating Technology to Solve Business Problems #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Data production will be 44 times greater in 2020 than it was in 2009.

Sourced from: Analytics.CLUB #WEB Newsletter

Data Management Rules for Analytics

With analytics taking a central role in most companies’ daily operations, managing the massive data streams organizations create is more important than ever. Effective business intelligence is the product of data that is scrubbed, properly stored, and easy to find. When your organization uses raw data without proper management procedures, your results suffer.

The first step towards creating better data for analytics starts with managing data the right way. Establishing clear protocols and following them can help streamline the analytics process, offer better insights, and simplify the process of handling data. You can start by implementing these five rules to manage your data more efficiently.

1. Establish Clear Analytics Goals Before Getting Started

As the amount of data produced by organizations daily grows exponentially, sorting through terabytes of information can become problematic and reduce the efficiency of analytics. Such large data sets require significantly longer times to scrub and properly organize. For companies that deal with multiple streams that exhibit heavy bandwidth, having a clear line of sight towards business and analytics goals can help reduce inflows and prioritize relevant data.

It’s important to establish clear objectives for data and create parameters that filter out data points that are irrelevant or unclear. This facilitates pre-screening datasets and makes scrubbing and sorting easier by reducing white noise. Additionally, you can focus even more on measuring specific KPIs to further filter out the right data from the stream.

6 crucial steps of preparing data for analysis

2. Simplify and Centralize Your Data Streams

Another problem analytics suites face is reconciling disparate data from multiple streams. Organizations have internal, third-party, customer, and other data that must be considered as part of a larger whole instead of viewed in isolation. Leaving data as-is can be damaging to insights, as different sources may use unique formats or different styles.

Before allowing multiple streams to connect to your data analytics software, your first step should be establishing a process to collect data more centrally and unify it. This centralization makes it easier to input data seamlessly into analytics tools, but also simplifies the methodology for users to find and manipulate data. Consider how to set up your data streams best to reduce the number of sources to eventually produce more unified sets.
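
As a rough sketch of what such unification might look like, the snippet below maps records from two hypothetical streams onto one common layout before they reach an analytics tool. The source names ("crm", "billing"), field names, and date formats are assumptions invented for the example.

```python
from datetime import date, datetime

# Hypothetical source layouts: field names and date formats differ per stream.
SOURCE_SCHEMAS = {
    "crm": {"id": "customer_id", "date": "signup", "fmt": "%Y-%m-%d"},
    "billing": {"id": "CustID", "date": "SignupDate", "fmt": "%d/%m/%Y"},
}

def unify(record, source):
    """Map a source-specific record onto one common layout."""
    schema = SOURCE_SCHEMAS[source]
    return {
        "customer_id": int(record[schema["id"]]),
        "signup_date": datetime.strptime(record[schema["date"]], schema["fmt"]).date(),
        "source": source,
    }

streams = [
    ("crm", {"customer_id": "42", "signup": "2018-12-06"}),
    ("billing", {"CustID": 7, "SignupDate": "06/12/2018"}),
]

unified = [unify(rec, src) for src, rec in streams]
same_day = unified[0]["signup_date"] == unified[1]["signup_date"] == date(2018, 12, 6)
for row in unified:
    print(row)
```

Keeping the per-source differences in one lookup table means adding a new stream is a configuration change rather than a new code path.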

3. Scrub Your Data Before Warehousing

The endless stream of data raises questions about quality and quantity. While having more information is preferable, data loses its usefulness when it’s surrounded by noise and irrelevant points. Unscrubbed data sets make it harder to uncover insights, properly manage databases, and access information later.

Before worrying about data warehousing and access, consider the processes in place to scrub data to produce clean sets. Create phases that ensure data relevance is considered while effectively filtering out data that is not pertinent. Additionally, make sure the process is as automated as possible to reduce wasted resources. Implementing functions such as data classification and pre-sorting can help expedite the cleaning process.
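
One possible shape for an automated scrubbing phase is sketched below: it drops records with missing required fields and records whose key has already been seen. The function name, the order records, and the field names are hypothetical, and real pipelines would usually add more checks.

```python
def scrub(records, required_fields, key_field):
    """Drop records with missing or blank required fields and repeated keys."""
    seen = set()
    clean = []
    for rec in records:
        # Skip records where any required field is absent, None, or empty.
        if any(rec.get(f) in (None, "") for f in required_fields):
            continue
        key = rec[key_field]
        if key in seen:  # duplicate of a record already kept
            continue
        seen.add(key)
        clean.append(rec)
    return clean

raw_records = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": None},   # missing amount
    {"order_id": 1, "amount": 19.99},  # duplicate key
    {"order_id": 3, "amount": 5.00},
    {"amount": 7.50},                  # missing order_id
]

clean_records = scrub(raw_records, required_fields=["order_id", "amount"], key_field="order_id")
print([r["order_id"] for r in clean_records])
```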

4. Establish Clear Data Governance Protocols

One of the biggest emerging issues facing data management is data governance. Because of the sensitive nature of many sources—consumer information, sensitive financial details, and so on—concerns about who has access to information are becoming a central topic in data management. Moreover, allowing free access to datasets and storage can lead to manipulation, mistakes, and deletions that could prove damaging.

It’s vital to establish clear and explicit rules about who can access data, when, and how. Creating tiered permission systems (read, read/write, admin) can help limit the exposure to mistakes and danger. Additionally, sorting data in ways that facilitate access to different groups can help manage data access better without the need to give free rein to all team members.
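
A tiered permission scheme like the one described (read, read/write, admin) can be sketched as an ordered enumeration plus a lookup of the minimum tier each action needs. The action names and user roles below are invented for illustration and are not tied to any particular database product.

```python
from enum import IntEnum

class Access(IntEnum):
    """Ordered permission tiers: a higher value includes the lower ones."""
    READ = 1
    READ_WRITE = 2
    ADMIN = 3

# Hypothetical mapping from action to the minimum tier it requires.
REQUIRED_LEVEL = {
    "select": Access.READ,
    "insert": Access.READ_WRITE,
    "update": Access.READ_WRITE,
    "drop_table": Access.ADMIN,
}

def is_allowed(user_level, action):
    """Return True if the tier covers the action; unknown actions are denied."""
    required = REQUIRED_LEVEL.get(action)
    return required is not None and user_level >= required

users = {"analyst": Access.READ, "engineer": Access.READ_WRITE, "dba": Access.ADMIN}

for name, level in users.items():
    allowed = [a for a in REQUIRED_LEVEL if is_allowed(level, a)]
    print(name, "->", allowed)
```

Denying unknown actions by default is a deliberate choice here: a typo or a newly added operation fails closed instead of silently granting access.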

5. Create Dynamic Data Structures

Many times, storing data is reduced to a single database that limits how you can manipulate it. Static data structures are effective for holding data, but they are restrictive when it comes to analyzing and processing it. Instead, data managers should place a greater emphasis on creating structures that encourage deeper analysis.

Dynamic data structures present a way to store real-time data that allows users to connect points better. Using three-dimensional databases, finding methods to reshape data rapidly, and creating more inter-connected data silos can help contribute to more agile business intelligence. Generate databases and structures that simplify accessing and interacting with data rather than isolating it.

The fields of data management and analytics are constantly evolving. For analytics teams, it’s vital to create infrastructures that are future-proofed and offer the best possible insights for users. By establishing best practices and following them as closely as possible, organizations can significantly enhance the quality of the insights their data produces.

Source

Dec 13, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Statistics  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Big Data Advances in Customer Experience Management by bobehayes

>> Tackling 4th Industrial Revolution with HR4.0 – Playcast – Data Analytics Leadership Playbook Podcast by v1shal

>> Jeff Palmucci / @TripAdvisor discusses managing a #MachineLearning #AI Team by v1shal

Wanna write? Click Here

[ NEWS BYTES]

>>
 Global Sentiment Analysis Software Market Size, Growth Opportunities, Current Trends, Forecast by 2025 – Redfield Herald (press release) (blog) Under  Sentiment Analysis

>>
 Ultimate Software Climbs Into the Cloud – Motley Fool Under  Cloud

>>
 Google Analytics updated with Google Material Theme tweaks on the web – 9to5Google Under  Analytics

More NEWS ? Click Here

[ FEATURED COURSE]

Artificial Intelligence

This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more

[ FEATURED READ]

Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for e… more

[ TIPS & TRICKS OF THE WEEK]

Grow at the speed of collaboration
Research by Cornerstone On Demand pointed out the need for better collaboration within the workforce, and the data analytics domain is no different. A rapidly changing and growing industry like data analytics is very difficult for an isolated workforce to keep up with. A good collaborative work environment facilitates a better flow of ideas, improved team dynamics, rapid learning, and an increasing ability to cut through the noise. So, embrace collaborative team dynamics.

[ DATA SCIENCE Q&A]

Q: Explain what a false positive and a false negative are. Why is it important to distinguish these from each other? Provide examples when false positives are more important than false negatives, when false negatives are more important than false positives, and when these two types of errors are equally important.
A: * False positive
Improperly reporting the presence of a condition when it is not actually present. Example: an HIV-positive test result when the patient is actually HIV negative.

* False negative
Improperly reporting the absence of a condition when it is actually present. Example: not detecting a disease when the patient has this disease.

When false positives are more important than false negatives:
– In a non-contagious disease, where treatment delay doesn’t have any long-term consequences but the treatment itself is grueling
– HIV test: psychological impact

When false negatives are more important than false positives:
– If early treatment is important for good outcomes
– In quality control: a defective item passes through the cracks!
– Software testing: a test to catch a virus has failed
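
To make the trade-off concrete, the sketch below counts both error types for a small set of binary predictions and then weighs them with two different cost assumptions. The labels, predictions, and cost weights are made up for illustration; in practice the weights would come from the domain.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels (1 = condition present, 0 = absent)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed case
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def total_cost(fp, fn, cost_fp, cost_fn):
    """Weigh the two error types by how much each one hurts."""
    return fp * cost_fp + fn * cost_fn

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
false_positive_rate = fp / (fp + tn)
false_negative_rate = fn / (fn + tp)

# Early treatment matters: a missed case is assumed to be 10x worse than a false alarm.
cost_when_fn_is_worse = total_cost(fp, fn, cost_fp=1, cost_fn=10)
# Grueling treatment, harmless delay: a false alarm is assumed to be 10x worse.
cost_when_fp_is_worse = total_cost(fp, fn, cost_fp=10, cost_fn=1)

print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("FPR:", false_positive_rate, "FNR:", false_negative_rate)
print("cost (FN worse):", cost_when_fn_is_worse, "cost (FP worse):", cost_when_fp_is_worse)
```

The same predictions look acceptable or poor depending on which error is costlier, which is why the two types need to be tracked separately.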

Source

[ VIDEO OF THE WEEK]

@AnalyticsWeek Panel Discussion: Finance and Insurance Analytics

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Without big data, you are blind and deaf and in the middle of a freeway. – Geoffrey Moore

[ PODCAST OF THE WEEK]

@AlexWG on Unwrapping Intelligence in #ArtificialIntelligence #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

And one of my favourite facts: At the moment less than 0.5% of all data is ever analysed and used, just imagine the potential here.

Sourced from: Analytics.CLUB #WEB Newsletter

The Data Driven Road Less Traveled

To help explain the topic, let me take a small detour and describe the conventional big business paradox and why it is a threat in today’s economy. Remember the days when large businesses were called 800-pound gorillas and small businesses could only dream of touching their market share? That, in some sense, is the conventional big business paradox. It no longer holds. In an ever more connected world, with easy access to cutting-edge platforms and methodologies, even small businesses have access to disruptive technologies and practices. In fact, small businesses have an advantage: they can react quickly, act nimbly, and keep a sharper focus. So it is no surprise that, every now and then, big businesses take small blows from companies that defy the conventional big business paradox. So, what is wrong? Big organizations, more often than necessary, run on their conventional ways. Sure, you could argue that scale and size make them slow, but try explaining that to businesses like Amazon, Salesforce, and Google. In the current landscape, rapidly changing market and customer dynamics demand better ways to analyze evolving customer expectations, and faster response times.

Besides getting your hands on best talent in the market, large businesses should create ways to introduce another paradigm that would help them identify and understand changing customer expectations and technology paradigm. I call it Data Driven Enterprise.

Data never lies, never introduces bias, and leaves nothing to assumption. Data-driven businesses have proven time and again to be more sustainable. In fact, the star products of big businesses are already extensively monitored. What fails most businesses is their lack of attention to the nooks and crannies, where the first signs of changing customer expectations, preferences, and technology appear. That is why a centralized, data-driven approach is the way to go.

A data-driven framework is not complicated; it is a series of focused steps taken to achieve a data-focused enterprise. Think of it as an engine to rapidly validate hypotheses in a lean, iterative manner. Sounds cool? Yes, it is! More and more businesses are aligning themselves toward becoming data-driven enterprises. Want more information…

I’ve written an ebook, which crossed 3k downloads last month. It lays out a series of easy-to-digest steps for building the thought leadership needed to take a big business down the data-driven innovation route. Please feel free to download the ebook at: http://pxl.me/ddibook and let me know your thoughts.
Remember, it’s just the start of the discussion; together we all have to travel a long road to sustained business growth.

Originally Posted at: The Data Driven Road Less Traveled by d3eksha

Dec 06, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Big Data knows everything  Source

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Development of the Customer Sentiment Index: Lexical Differences by bobehayes

>> Using Big Data In A Crisis: Nepal Earthquake by analyticsweekpick

>> Customer Loyalty and Customer Lifetime Value by bobehayes

Wanna write? Click Here

[ NEWS BYTES]

>>
 Senior Data Scientist – Built In Chicago Under  Data Scientist

>>
 Risk Analytics Market 2018-2023: Top Company, Highest manufactures, Competitors, challenges and Drivers with … – News Egypt (press release) (blog) Under  Risk Analytics

>>
 Egyptian Pollution Plan Helps Combat ‘Black Cloud’ – Voice of America Under  Cloud

More NEWS ? Click Here

[ FEATURED COURSE]

Python for Beginners with Examples

A practical Python course for beginners with examples and exercises…. more

[ FEATURED READ]

Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 4th Edition

The eagerly anticipated Fourth Edition of the title that pioneered the comparison of qualitative, quantitative, and mixed methods research design is here! For all three approaches, Creswell includes a preliminary conside… more

[ TIPS & TRICKS OF THE WEEK]

Save yourself from the zombie apocalypse of unscalable models
One living, breathing zombie in today’s analytical models is the glaring absence of error bars. Not every model is scalable or holds its ground as data grows. The error bars attached to almost every model should be duly calibrated. As business models take in more data, error bars keep them sensible and in check. If error bars are not accounted for, our models become susceptible to failure, leading us to a Halloween we never want to see.
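
As a small illustration of what an error bar on an estimate can look like, the sketch below computes a mean with its standard error and an approximate 95% interval. It assumes a normal approximation (the z = 1.96 multiplier) and uses made-up numbers; the "larger" sample simply repeats the small one to show how the bar narrows as observations grow, and is not a substitute for genuinely independent data.

```python
import math
import statistics

def mean_with_error_bar(sample, z=1.96):
    """Return (mean, standard_error, (low, high)) using a normal approximation."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return mean, se, (mean - z * se, mean + z * se)

small = [2, 4, 4, 4, 5, 5, 7, 9]
large = small * 8  # same spread, eight times as many observations (illustrative only)

m_small, se_small, ci_small = mean_with_error_bar(small)
m_large, se_large, ci_large = mean_with_error_bar(large)

print(f"n={len(small):3d}  mean={m_small:.2f}  se={se_small:.3f}  interval={ci_small}")
print(f"n={len(large):3d}  mean={m_large:.2f}  se={se_large:.3f}  interval={ci_large}")
```

Reporting the interval alongside the point estimate makes it visible when a model's output rests on too little data to be trusted.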

[ DATA SCIENCE Q&A]

Q: Explain what a local optimum is and why it is important in a specific context, such as K-means clustering. What are specific ways of determining if you have a local optimum problem? What can be done to avoid local optima?

A: * A solution that is optimal within a neighboring set of candidate solutions
* In contrast with a global optimum: the optimal solution among all candidate solutions

* K-means clustering context:
The objective (cost) function never increases from one iteration to the next, so the algorithm converges, but possibly only to a local optimum.
Results depend on the initial random cluster assignment

* Determining if you have a local optimum problem:
Tendency of premature convergence
Different initializations lead to different optima

* Avoiding local optima in a K-means context: repeat K-means with different initializations and keep the solution with the lowest cost

Source

[ VIDEO OF THE WEEK]

Harsh Tiwari talks about fabric of data driven leader in Financial Sector #FutureOfData #Podcast

Subscribe to  Youtube

[ QUOTE OF THE WEEK]

If we have data, let’s look at data. If all we have are opinions, let’s go with mine. – Jim Barksdale

[ PODCAST OF THE WEEK]

Harsh Tiwari talks about fabric of data driven leader in Financial Sector #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

We are seeing a massive growth in video and photo data, where every minute up to 300 hours of video are uploaded to YouTube alone.

Sourced from: Analytics.CLUB #WEB Newsletter

Big Data Explained in Less Than 2 Minutes – To Absolutely Anyone

There are some things that are so big that they have implications for everyone, whether we want them to or not. Big Data is one of those concepts, and is completely transforming the way we do business and is impacting most other parts of our lives.

It’s such an important idea that everyone from your grandma to your CEO needs to have a basic understanding of what it is and why it’s important.

Source for cartoon: click here

What is Big Data?

“Big Data” means different things to different people and there isn’t, and probably never will be, a commonly agreed upon definition out there. But the phenomenon is real and it is producing benefits in so many different areas, so it makes sense for all of us to have a working understanding of the concept.

So here’s my quick and dirty definition:

The basic idea behind the phrase ‘Big Data’ is that everything we do is increasingly leaving a digital trace (or data), which we (and others) can use and analyse. Big Data therefore refers to that data being collected and our ability to make use of it.

I don’t love the term “big data” for a lot of reasons, but it seems we’re stuck with it. It’s basically a ‘stupid’ term for a very real phenomenon – the datafication of our world and our increasing ability to analyze data in a way that was never possible before.

Of course, data collection itself isn’t new. We as humans have been collecting and storing data since as far back as 18,000 BCE. What’s new are the recent technological advances in chip and sensor technology, the Internet, cloud computing, and our ability to store and analyze data that have changed the quantity of data we can collect.

Things that have been a part of everyday life for decades — shopping, listening to music, taking pictures, talking on the phone — now happen more and more wholly or in part in the digital realm, and therefore leave a trail of data.

The other big change is in the kind of data we can analyze. It used to be that data fit neatly into tables and spreadsheets, things like sales figures and wholesale prices and the number of customers that came through the door.

Now data analysts can also look at “unstructured” data like photos, tweets, emails, voice recordings and sensor data to find patterns.

How is it being used?

As with any leap forward in innovation, the tool can be used for good or nefarious purposes. Some people are concerned about privacy, as more and more details of our lives are being recorded and analyzed by businesses, agencies, and governments every day. Those concerns are real and not to be taken lightly, and I believe that best practices, rules, and regulations will evolve alongside the technology to protect individuals.

But the benefits of big data are very real, and truly remarkable.

Most people have some idea that companies are using big data to better understand and target customers. Using big data, retailers can predict what products will sell, telecom companies can predict if and when a customer might switch carriers, and car insurance companies understand how well their customers actually drive.

It’s also used to optimize business processes. Retailers are able to optimize their stock levels based on what’s trending on social media, what people are searching for on the web, or even weather forecasts. Supply chains can be optimized so that delivery drivers use less gas and reach customers faster.

But big data goes way beyond shopping and consumerism. Big data analytics enable us to find new cures and better understand and predict the spread of diseases. Police forces use big data tools to catch criminals and even predict criminal activity, and credit card companies use big data analytics to detect fraudulent transactions. A number of cities are even using big data analytics with the aim of turning themselves into Smart Cities, where a bus would know to wait for a delayed train and where traffic signals predict traffic volumes and operate to minimize jams.

Why is it so important?

The biggest reason big data is important to everyone is that it’s a trend that’s only going to grow.

As the tools to collect and analyze the data become less and less expensive and more and more accessible, we will develop more and more uses for it — everything from smart yoga mats to better healthcare tools and a more effective police force.

And, if you live in the modern world, it’s not something you can escape. Whether you’re all for the benefits big data can bring, or worried about Big Brother, it’s important to be aware of the phenomenon and tuned in to how it’s affecting your daily life.

What are your biggest questions about big data? I’d love to hear them in the comments below — and they may inspire future posts to address them.

To read the full article on Data Science Central, click here.

Source: Big Data Explained in Less Than 2 Minutes – To Absolutely Anyone

Bias: Breaking the Chain that Holds Us Back

Speaker Bio: Dr. Vivienne Ming, named one of 10 Women to Watch in Tech by Inc. Magazine, is a theoretical neuroscientist, entrepreneur, and author. She co-founded Socos Labs, her fifth company, an independent think tank exploring the future of human potential. Dr. Ming launched Socos Labs to combine her varied work with that of other creative experts and expand their impact on global policy issues, both inside companies and throughout our communities. Previously, Vivienne was a visiting scholar at UC Berkeley’s Redwood Center for Theoretical Neuroscience, pursuing her research in cognitive neuroprosthetics. In her free time, Vivienne has invented AI systems to help treat her diabetic son, predict manic episodes in bipolar sufferers weeks in advance, and reunite orphan refugees with extended family members. She sits on boards of numerous companies and nonprofits including StartOut, The Palm Center, Cornerstone Capital, Platypus Institute, Shiftgig, Zoic Capital, and SmartStones. Dr. Ming also speaks frequently on her AI-driven research into inclusion and gender in business. For relaxation, she is a wife and mother of two.

Distilled Blog Post Summary: Dr. Vivienne Ming’s talk at a recent Domino MeetUp delved into bias and its implications, including potential liabilities for algorithms, models, businesses, and humans. Dr. Ming’s evidence included first-hand knowledge fundraising for multiple startups, data analysis completed during her tenure as the Chief Scientist at Gild, as well as citing studies within data, economics, recruiting, and education. This blog post provides text and video clip highlights from the talk. The full video is available for viewing. If you are interested in viewing additional content from Domino’s past events, review the Data Science Popup Playlist. If you are interested in attending an event in-person, then consider the upcoming Rev.

Research, Experimentation, and Discovery: Core of Science

Research, experimentation, and discovery are at the core of all types of science, including data science. Dr. Ming kicked off the talk by indicating “one of the powers of doing a lot of rich data work, there’s this whole range– I mean, there’s very little in this world that’s not an entree into”. While Dr. Ming provided detailed insights and evidence that pointed to the potential of rich data work during the entire talk, this blog post focuses on the implications and liabilities of bias within gender, names, and ethnic demographics. It also covers how bias isn’t solely a data or algorithm problem; it is a human problem. The first step to address bias is acknowledging that it exists.

Do You See the Chameleon? The Roots of Bias

Each one of us has biases and makes assessments based on those biases. Dr. Ming uses Johannes Stotter’s Chameleon to point out that “the roots of bias are fundamental and unavoidable”. Many people, when they see the image, see a chameleon. However, the image consists of two people who are covered in body paint and strategically placed to look like a chameleon. In a video clip from the talk, Dr. Ming indicates:

“I cannot make an unbiased AI. There are no unbiased rats in the world. In a very basic sense, these systems are making decisions on their uncertainty, and the only rational way to do that is to act the best we can given the data. The problem is when you refuse to acknowledge there’s a problem with our bias and actually do something about it. And we have this tremendous amount of evidence that there is a serious problem, and it’s holding, not just small things back. But as I’m going to get to later, it’s holding us back from a transformed world, one that I think anyone can selfishly celebrate.”

Bias as the Pat on the Head (or the Chain) that Holds Us Back

While history is filled with moments when bias is not acknowledged as a problem, there are also moments when people addressed societally reinforced gender bias. Women have assumed male noms de plume to write epic novels, fight in wars, win Judo championships, run marathons, and even, as Dr. Ming pointed out, create an all-women software company called Freelance Programmers in the 1960s. During the meetup, Dr. Ming indicated that Dame Stephanie “Steve” Shirley’s TED Talk, “Why do ambitious women have flat heads?”, helped her parse two distinctly different startup fundraising experiences that were grounded in gender bias.

Prior to Dr. Ming co-founding her current education technology company and obtaining her academic credentials, she dropped out of college and started a film company. When

“we started this company, and the funny thing is, despite having nothing, nothing that anyone should invest in– we didn’t have a script. We didn’t have talent. Literally, we didn’t even have talent. We didn’t have experience. We had nothing. We essentially raised what you might in the tech industry called seed round after a few phone calls.”

However, raising funding was more difficult the second time, for her current company, despite having substantially more academic, technology, and business credentials. During one of the funding meetings with a small firm with 5 partners, Dr. Ming relayed how the last partner said “‘you should feel so proud of what you’ve built’. And at the time, I thought, oh, Jesus, at least one of these people is on our side. In fact, as we were leaving the room, he literally patted me on the head, which seemed a little strange.” This prompted Dr. Ming to consider how

“my credentials are transformed that second time. No one questioned us about the technology. They loved it. They questioned whether we know how to run a business. The product itself people loved versus a film. Everything the second time around should have been dramatically easier. Except the only real difference that I can see is that the first time I was a man and the second time I was a woman.”

This led Dr. Ming to understand what Stephanie Shirley meant by ambitious women having flat heads: it comes from all of the times they have been patted on the head. Dr. Ming relays that

“I’ve learned ever since as an entrepreneur is, as soon as it feels like they’re dealing with their favorite niece rather than me as a business person, then I know, I know that they simply are not taking me seriously. And all the PhD’s in the world doesn’t matter, all the past successes in my other companies doesn’t matter. You are just that thing to me. And what I’ve learned is, figure that out ahead of time. Don’t bother wasting days and hours, and prepping to pitch to people that simply are not capable of understanding who you are, but of course, in a lot of context, that’s all you’ve got.”

Dr. Ming also pointed out that the bias due to gender also manifested at an organization where she worked before and after her gender transition. She noted when she went into work after her gender transition,

“That’s the last day anyone ever asked me a math question, which is kind of funny. I do happen to also have a PhD in psychology. But somehow one day to the next, I didn’t forget how to do convergence proofs. I didn’t forget what it meant to invent algorithms. And yet that was how people dealt with it, people who knew before. You see how powerful the change is to see someone in a different skin.”

This experience is similar to that of Dame Shirley, who, in order to start what would become a multibillion-dollar software company in the 1960s, “started to challenge the conventions of the time, even to the extent of changing my name from ‘Stephanie’ to ‘Steve’ in my business development letters, so as to get through the door before anyone realized that he was a she”. Dame Shirley subverted bias during a time when she, as a woman, was prevented from working on the stock exchange or driving a bus, or, in her words, “Indeed, I couldn’t open a bank account without my husband’s permission”. Yet, despite the bias, Dame Shirley remarked

“who would have guessed that the programming of the black box flight recorder of Supersonic Concorde would have been done by a bunch of women working in their own homes” … “And later, when it was a company valued at over three billion dollars, and I’d made 70 of the staff into millionaires, they sort of said, ‘Well done, Steve!’”

While it is no longer the 1960s, bias implications and liabilities are still present. Yet we in data science are able to access data and have open conversations about bias as the first step toward avoiding inaccuracies, training data liabilities, and model liabilities within our data science projects and analysis. What if, in 2018, people built and trained models on the assumption that humans with XY chromosomes lacked the ability to code, because the modelers only reviewed and used data from Dame Shirley’s company in the 1960s? Consider that for a moment, as that is what happened to Dame Shirley, Dr. Ming, and many others. Bias implications and liabilities have real world consequences. Being aware of the bias and then addressing it moves the industry forward toward breaking the chain that holds research, data science, and us back.
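
The thought experiment above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the records are made up by hand (they are not data from Dame Shirley’s company or from the talk), and the names `fit_rate_by_group` and `predict` are hypothetical. It shows how a naive frequency model simply repeats whatever skew its training sample contains.

```python
from collections import Counter

def fit_rate_by_group(records):
    """Return the observed share of programmers per group in the training data."""
    totals, positives = Counter(), Counter()
    for group, is_programmer in records:
        totals[group] += 1
        positives[group] += int(is_programmer)
    return {group: positives[group] / totals[group] for group in totals}

def predict(rates, group):
    """Predict 'can code' when the learned rate for the group is at least 50%."""
    # A group the model never saw is treated as having a rate of 0.
    return rates.get(group, 0.0) >= 0.5

# Made-up toy sample: every programmer comes from one all-women company,
# and every non-programmer in the sample happens to be a man.
training = [("women", True)] * 8 + [("men", False)] * 8

rates = fit_rate_by_group(training)
print(rates)                    # learned share of programmers per group
print(predict(rates, "men"))    # the model repeats the sampling skew
print(predict(rates, "women"))
```

Nothing in the code is wrong in a mechanical sense; the skewed conclusion comes entirely from the human choice of which data to collect and train on.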

Say My Name: Biased Perceptions Uncovered

When Dr. Ming was the Chief Scientist at Gild, a reporter called her for a quote on the Jose Zamora story. This also led to Dr. Ming’s research for her upcoming book, “The Tax of Being Different”. Dr. Ming relayed anecdotes during the meetup (see video clip) and has also written about this research for the Financial Times:

“To calculate the tax on being different I made use of a data set of 122m professional profiles collected by Gild, a company specialising in tech for hiring and HR, where I worked as chief scientist. From that data, I was able to compare the career trajectories of specific populations by examining the actual individuals. For example, our data set had 151,604 people called “Joe” and 103,011 named “José”. After selecting only for software developers we still had 7,105 and 4,896 respectively, real people writing code for a living. Analysing their career trajectories I found that José typically needs a masters degree or higher compared to Joe with no degree at all to be equally likely to get a promotion for the same quality of work. The tax on being different is largely implicit. People need not act maliciously for it to be levied. This means that José needs six additional years of education and all of the tuition and opportunity costs that education entails. This is the tax on being different, and for José that tax costs $500,000-$1m over his lifetime.” (Financial Times)
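
As a rough check on the cohort sizes quoted above, the following Python sketch computes what share of each name group remained after selecting only software developers. It uses only the four counts from the excerpt; the variable and function names are illustrative assumptions, and the promotion analysis itself depends on career data that is not reproduced here.

```python
# Counts quoted in the Financial Times excerpt.
profiles = {"Joe": 151_604, "José": 103_011}
developers = {"Joe": 7_105, "José": 4_896}

def developer_share(name):
    """Fraction of profiles with this name that are software developers."""
    return developers[name] / profiles[name]

for name in profiles:
    print(f"{name}: {developer_share(name):.2%} of profiles are software developers")
```

Both shares come out near 4.7%, so the two developer cohorts are a similar slice of their respective name groups.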

While this particular example focuses on ethnicity-oriented demographic bias, Dr. Ming referenced quite a few research studies on name bias during the meetup discussion. In case Domino Data Science Blog readers do not have the research she cites on hand, a sample of published studies on name bias includes: names that suggest male gender, “noble-sounding” surnames in Europe, and names that are perceived as “easy-to-pronounce”, which also has implications for how organizations choose their names. Yet Dr. Ming did not limit the discussion to bias within gender and naming; she also dived right into how demographic bias impacts image classification, particularly with ethnicity.

Bias within Image Classification: Missing Uhura and Not Unlocking your iPhone X

Before Dr. Ming was the Chief Scientist at Gild, she saw a demo of Paul Viola’s face recognition algorithm. In that demo, she noticed that the algorithm didn’t detect Uhura. Viola indicated that this was a problem and that it would be addressed. Fast forward to years later, when Dr. Ming was the Chief Scientist at Gild: she relayed how she received “a call from The Wall Street Journal [and WSJ asked her] ‘So Google’s face recognition system just labeled a black couple as gorillas. Is AI racist?’ And I said, ‘Well, it’s the same as the rest of us. It depends on how you raise it.’”

For background, in 2015 Google released a new photo app, and a software developer discovered that the app labeled two people of color as “gorillas”. Yonatan Zunger was the Chief Architect for Social at Google at the time; now that he is no longer at Google, he has provided candid commentary about bias. Then, in January 2018, Wired ran a follow-up story on the 2015 event. For the article, Wired tested Google Photos and found that the labels for gorillas, chimpanzees, chimp, and monkey “were censored from searches and image tags after the 2015 incident”. Google confirmed this. Wired also tested how the app views people by searching for “African American”, “black man”, “black woman”, or “black person”, which returned “an image of a grazing antelope” (for the search “African American”) as well as “black-and-white images of people, correctly sorted by gender but not filtered by race”. This points to the continued challenges of addressing bias in machine learning and models, a bias that also has implications beyond social justice.

As Dr. Ming pointed out in a video clip from the meetup, facial recognition is also built into the iPhone X, and the feature has potential challenges in recognizing faces of color around the world. Yet, despite all of this, Dr. Ming indicates “but what you have to recognize, none of these are algorithm problems. These are human problems.” Humans made the decisions to build algorithms, build models, train models, and roll out products that include bias with wide implications.

Conclusion

Introducing liability into an algorithm or model via bias isn’t solely a data or algorithm problem; it is a human problem. Understanding that it is a problem is the first step in addressing it. At the recent Domino MeetUp, Dr. Ming relayed how

“AI is an amazing tool, but it’s just a tool. It will never solve your problems for you. You have to solve them. And particularly in the work I do, there are only ever messy human problems, and they only ever have messy human solutions. What’s amazing about machine learning is that once we found some of those issues, we can actually use it to reach as many people as possible, to make this essentially cost-effective, to scale that solution to everyone. But if you think some deep neural network is going to somehow magically figure out who you want to hire when you have not been hiring the right people in the first place, what is it you think is happening in that data set?”

Domino continually curates and amplifies ideas, perspectives, and research to contribute to discussions that accelerate data science work. The full video of Dr. Ming’s talk at the recent Domino MeetUp is available. There is also an additional technical talk that Dr. Ming gave at the Berkeley Institute of Data Science on “Maximizing Human Potential Using Machine Learning-Driven Applications”. If you are interested in similar content to these talks, please feel free to visit the Domino Data Science Popup Playlist or attend the upcoming Rev.

The post Bias: Breaking the Chain that Holds Us Back appeared first on Data Science Blog by Domino.

Source: Bias: Breaking the Chain that Holds Us Back