@JohnNives on ways to demystify AI for enterprise #FutureOfData

Youtube: https://www.youtube.com/watch?v=daiVHrsZQMU
iTunes: http://math.im/itunes

In this podcast @JohnNives discusses ways to demystify AI for enterprise. He shares his perspective on how businesses should engage with AI, along with best practices and considerations for businesses adopting AI in their strategic roadmap. This podcast is great for anyone seeking to learn about ways to adopt AI in the enterprise landscape.

John’s Recommended Listen:
FutureOfData Podcast http://math.im/itunes
War and Peace Leo Tolstoy (Author),‎ Frederick Davidson (Narrator),‎ Inc. Blackstone Audio (Publisher) https://amzn.to/2w7ObkI

Podcast Link:
iTunes: http://math.im/itunes
GooglePlay: http://math.im/gplay

Jean’s BIO:
Jean-Louis (John) Nives serves as Chief Digital Officer and the Global Chair of the Digital Transformation practice at N2Growth. Prior to joining N2Growth, Mr. Nives was at IBM Global Business Services, within the Watson and Analytics Center of Competence. There he worked on Cognitive Digital Transformation projects related to Watson, Big Data, Analytics, Social Business and Marketing/Advertising Technology. Examples include CognitiveTV and the application of external unstructured data (social, weather, etc.) for business transformation.
Prior relevant experience includes executive leadership positions at Nielsen, IRI, Kraft and two successful advertising technology acquisitions (Appnexus and SintecMedia). In this capacity, Jean-Louis combined information, analytics and technology to create significant business value in transformative ways.
Jean-Louis earned a Bachelor’s Degree in Industrial Engineering from University at Buffalo and an MBA in Finance and Computer Science from Pace University. He is married with four children and lives in the New York City area.

About #Podcast:
#FutureOfData podcast is a conversation starter that brings leaders, influencers and lead practitioners on the show to discuss their journey in creating the data driven future.

Wanna Join?
If you or anyone you know wants to join in,
Register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor?
Email us @ info@analyticsweek.com

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Source: @JohnNives on ways to demystify AI for enterprise #FutureOfData by admin

2017 Trends in Data Modeling

The projected expansion of the data ecosystem in 2017 is causing extremely deliberate, systematic challenges for organizations attempting to exploit the most effective techniques available for maximizing data utility.

The plenitude of cognitive computing options, cloud paradigms, data science, and mobile technologies for big data has demonstrated its business value in a multitude of use cases. Pragmatically, however, its inclusion alongside conventional data management processes poses substantial questions on the back end pertaining to data governance and, more fundamentally, to data modeling.

Left unchecked, these concerns could potentially compromise any front-end merit while cluttering data-driven methods with unnecessary silos and neglected data sets. The key to addressing them lies in the implementation of swiftly adjustable data models which can broaden to include the attributes of the constantly changing business environments in which organizations compete.

According to TopQuadrant Executive VP and Director of TopBraid Technologies Ralph Hodgson, the consistency and adaptability of data modeling may play an even more dire role for the enterprise today:

“You have physical models and logical models, and they make their way into different databases from development to user acceptance into production. On that journey, things change. People might change the names of some of the columns of some of those data bases. The huge need is to be able to trace that through that whole assembly line of data.”
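
The traceability Hodgson describes can be sketched in a few lines. This is only an illustration under stated assumptions: the stage names, the column names, and the `trace_column` helper are hypothetical, and a real lineage tool would read rename logs from metadata rather than a hard-coded list.

```python
# Hypothetical rename logs: one mapping (old name -> new name) per promotion step.
RENAMES_BY_STAGE = [
    ("development -> user acceptance", {"cust_nm": "customer_name"}),
    ("user acceptance -> production", {"customer_name": "client_name", "amt": "amount"}),
]


def trace_column(column, renames_by_stage):
    """Return the name a column carries at each stage, starting with its original name."""
    history = [column]
    current = column
    for _stage, renames in renames_by_stage:
        # A column that was not renamed at this step keeps its current name.
        current = renames.get(current, current)
        history.append(current)
    return history


cust_history = trace_column("cust_nm", RENAMES_BY_STAGE)
amt_history = trace_column("amt", RENAMES_BY_STAGE)
print(cust_history)
print(amt_history)
```

Keeping one mapping per promotion step, rather than a single final mapping, is what lets the name be followed along the whole "assembly line" instead of only at its end.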

Enterprise Data Models
One of the surest ways to create a flexible enterprise model for a top down approach to the multiple levels of modeling Hodgson denoted is to use the linked data approach reliant upon semantic standards. Although there are other means of implementing enterprise data models, this approach has the advantages of being based on uniform standards applicable to all data which quickly adjust to include new requirements and use cases. Moreover, it has the added benefit of linking all data on an enterprise knowledge graph which, according to Franz CEO Jans Aasman, is one of the dominant trends to impact the coming year. “We don’t have to even talk about it anymore,” Aasman stated. “Everyone is trying to produce a knowledge graph of their data assets.”

The merit of a uniform data model for multiple domains throughout the enterprise is evinced in Master Data Management platforms as well; one can argue the linked data approach of ontological models merely extends that concept throughout the enterprise. In both cases, organizations are able to avoid situations in which “they spend so much time trying to figure out what the data model looks like and how do we integrate these different systems together so they can talk,” Stibo Systems Director of Vertical Solutions Shahrukh Arif claimed. “If you have it all in one platform, now you can actually realize that full value because you don’t have to spend so much time and money on the integrations and data models.”

Data Utility Models
The consistency of comprehensive approaches to data modeling is particularly crucial for cloud-based architecture or for incorporating data external to the enterprise. Frequently, organizations may encounter situations in which they must reconcile differences in modeling and metadata when obtaining data from third-party sources. They can address these issues upfront by creating what DISCERN Chairman and CEO Harry Blount termed a “data utility model”, in which “all of the relevant data was available and mapped to all of the relevant macro-metadata, a metamodel I should say, and you could choose which data you want” from the third party in accordance with the utility model. Building such a model requires going through the conventional modeling process of determining business requirements and facilitating them through IT, a step that organizations can have done for them by competitive service providers. “Step one is asking all the right questions, step two is you need to have a federated, real-time data integration platform so you can take in any data in any format at any time in any place and always keep it up to date,” Blount acknowledged. “The third requirement is you need to have a scalable semantic graph structure.”

Relational Data Modeling (On-Demand Schema)
Data modeling in the relational world is increasingly impacted by the modeling techniques associated with contemporary big data initiatives. Redressing the inherent modeling disparities between the two is largely a means of accounting for semi-structured and unstructured data in relational environments primarily designed for structured data. Organizations are able to hurdle this modeling issue through the means of file formats which derive schema on demand. Options such as JSON and Avro are ideal for those who “want what is modeled in the big data world to align with what they have in their relational databases so they can do analytics held in their main databases,” Hodgson remarked.

One of the boons of utilizing Avro is the complete traceability it provides for data in relational settings, although such data may have originated from more contemporary unstructured sources associated with big data. The Avro format, and other file formats in this vein, allow modelers to bridge relational schema requirements and what may be a lack of such schema intrinsic to most big data. According to Hodgson, Avro “still has the ontological connection, but it still talks in terms of property values and columns. It’s basically a table in the same sense you find in a spreadsheet. It’s that kind of table but the columns all align with the columns in a relational database, and those columns can be associated with a logical model which need not be an entity-relationship model. It can be an ontology.”
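
To make the idea of deriving schema on demand concrete, here is a minimal sketch using plain JSON and the Python standard library. It is not Avro itself and does not use an Avro library; the sample records and the helper names `derive_schema` and `to_rows` are assumptions made for illustration.

```python
import json

# Hypothetical semi-structured input: records do not share the same fields.
RAW_RECORDS = [
    '{"id": 1, "name": "Ada", "tags": ["vip"]}',
    '{"id": 2, "name": "Lin", "email": "lin@example.com"}',
]


def derive_schema(records):
    """Map each field name to the sorted Python type names observed for it."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return {field: sorted(types) for field, types in sorted(schema.items())}


def to_rows(records, schema):
    """Align every record to the derived columns, filling missing fields with None."""
    columns = list(schema)
    rows = [tuple(record.get(column) for column in columns) for record in records]
    return columns, rows


records = [json.loads(line) for line in RAW_RECORDS]
schema = derive_schema(records)
columns, rows = to_rows(records, schema)
print(schema)
print(columns)
print(rows)
```

The derived columns are what would be lined up against the columns of a relational table; a format such as Avro carries a declared schema with the data rather than inferring it after the fact as this sketch does.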

Predictive Models
Predictive models have been widely impacted by cognitive computing methods and other aspects of data science, although these two realms of data management are not necessarily synonymous with classic statistically-trained predictive models. Still, the influx of algorithms associated with various means of cognitive computing is paramount to the creation of predictive models which illustrate their full utility on unstructured big data sets at high velocities. Organizations can access entire libraries of machine learning and deep learning models from third-party vendors through the cloud, and either readily deploy them with their own data or customize them. “As a platform, we allow customers to build their own models or extend our models in service of their own specific needs,” indico Chief Customer Officer Vishal Daga said.

The result is not only a dramatic reduction in the overall cost, labor, and salaries of hard to find data scientists to leverage cognitive computing techniques for predictive models, but also a degree of personalization—facilitated by the intelligent algorithms involved—enabling organizations to tailor those models to their own particular use cases. Thus, AI-centered SaaS opportunities actually reflect a predictive models on-demand service based on some of the most relevant data-centric processes to date.

Enterprise Representation
The nucleus of the enduring appositeness of data modeling is the increasingly complicated data landscape—including cognitive computing, a bevy of external data sources heralded by the cloud and mobile technologies in big data quantities—and the need to effectually structure data in a meaningful way. Modeling data is the initial step to gleaning its meaning and provides the basis for all of the different incarnations of data modeling, regardless of the particular technologies involved. However, there appears to be a burgeoning sense of credence associated with doing so on an enterprise-wide scale as “Knowing how data’s flowing and who it’s supporting, and what kind of new sources might make a difference to those usages, it’s all going to be possible when you have a representation of the enterprise,” Hodgson commented.

Adding further conviction to the value of enterprise data modeling is the analytic output facilitated by it. All-inclusive modeling techniques at the core of enterprise-spanning knowledge graphs appear well-suited for the restructuring of the data sphere caused by the big data disruption—particularly when paired with in-memory, parallel processing graph-aware analytics engines. “As modern data diversity and volumes grow, relational database management systems (RDBMS) are proving too inflexible, expensive and time-consuming for enterprises,” Cambridge Semantics VP of Engineering Barry Zane said. “Graph-based online analytical processing (GOLAP) will find a central place in everyday business by taking on data analytics challenges of all shapes and sizes, rapidly accelerating time-to-value in data discovery and analytics.”



What Happens When You Put Hundreds of BI Experts in One Room?

Last week we wrapped up the second day of our global, two-day client conference: Eureka!. Our sold-out event brought together hundreds of business leaders and analytics professionals from around the globe to listen to thought-provoking presentations and engage in discussions about the evolution of the analytics industry.

You may be wondering why we chose to call our client conference “Eureka!”. I’m glad you asked. A “eureka moment” is an “aha!” moment, a moment where something clicks and finally makes sense. In hearing and sharing stories, experiences, and perspectives with industry veterans and peers, it was our hope that attendees experienced moments of surprise and enlightenment.

Unsurprisingly, some of the hottest topics at Eureka! were the shift to embedding analytics everywhere, the impact of AI and augmented analytics on businesses, and how to drive transformational change with analytics.


Embedding Analytics Everywhere

In his opening day keynote, Sisense CEO, Amir Orad, emphasized the importance of lowering the barrier to analytics and empowering everyone to use data to make decisions. Providing analytics to everyone, everywhere means catering to the different ways people understand data. This means moving beyond desktop dashboards and offering insights naturally throughout our lives.

Continuing with the non-traditional side of analytics, Amir pointed to three organizations using analytics in unique ways:

  1. Celestica, a global electronics manufacturer, leverages analytics to reduce its carbon footprint. Within just four months of implementing analytics, they saw a reduction of 1,041 metric tons of CO2e. That’s enough energy to power 110 homes for one full year!
  2. Skullcandy, the incredibly popular maker of headphones, earbuds, and other audio and wireless products, has used analytics in their business to virtually eliminate fraudulent returns.
  3. Indiana Donor Network, the organ and tissue donation network for the state of Indiana, has used analytics to increase skin donations by 70% and cornea donations by a whopping 224%.

Solidifying the need to embed analytics everywhere in order to transform industries was Sham Sokka of Philips, who spoke about revolutionizing patient care by delivering relevant data and analytics to the right individual at each stage of client care. “We fully believe in this concept of data democratization,” Sham said. “Not everyone is a data scientist so you want to have a platform that can serve simple data to a patient but complex data to an administrator. Getting the right data to the right person is super critical.”

AI and Augmented Analytics

There’s no doubt that artificial intelligence and augmented analytics are going to continue to impact every aspect of analytics – from data prep to insight discovery.

In her keynote, Jen Underwood of Impact Analytix, discussed the unprecedented pace of continuous technological change we’re currently witnessing. When organizations adopt augmented analytics, Jen said they see a multitude of benefits, which include:

  1. Empowering the masses: Rather than providing analytics for only around 30% of an organization, augmented analytics makes discovering insight easy enough for everyone.
  2. Saving time: Augmented data prep automates and accelerates the process, applies reinforcement learning while humans drive algorithms, and helps improve data quality for faster results.
  3. Revealing hidden patterns: Augmented analytics can find patterns in your data that a human might never detect – or detect when it’s too late – using manual techniques.
  4. Improving accuracy: With the ability to apply statistical significance, uncertainty, and risk model estimates, augmented analytics takes into account aspects of data prep and modeling that manual approaches may miss.

Joining in on the topic of artificial intelligence, Professor and Author Avi Goldfarb, gave a keynote that had participants glued to their chairs. His session demonstrated how artificial intelligence will affect business, public policy, and society in virtually all fields. The point he drove home? Prediction isn’t useful unless you can do something with it. What’s useful about AI and prediction is the ability to take action and create a feedback loop – that’s where the competitive edge comes into play.

Transformational Change

Advancements in technology are great but it’s the changes they bring to organizations that make all the difference in the real world. In his session, Bill Janczak from Indiana Donor Network told his organization’s story of transformation through the implementation of analytics.


As a small organization with a small IT budget, Indiana Donor Network has a large mission – to help people during their time of need. Run traditionally like a non-profit, Indiana Donor Network realized that changing their behavior and adding in analytics was the missing piece to ensuring organs make it to the right place at the right time. Using analytics they were able to make some major, important changes:

  1. Within hours they can now catch errors and common data entry challenges that would normally take around 30-45 days to find. This led to improved matches for organ transplants.
  2. They are now able to monitor which donor outreach programs are successful and which are not in order to focus their activities and spend their resources on programs that actually drive more awareness and donor authorization so that more people can be helped in the long run.

We’ve Struck Gold!

The last two days were a whirlwind of bright ideas, futuristic visions, and practical applications of analytics to improve businesses around the globe. If the excitement in the room surrounding all of the technological transformations was any indication, I’d say the future for analytics is bright.

I’d like to extend a quick thank you to all of our speakers and customers for contributing to an awesome, fascinating, and fun event. Until next year!

Originally Posted at: What Happens When You Put Hundreds of BI Experts in One Room? by analyticsweek

Data Management Rules for Analytics

With analytics taking a central role in most companies’ daily operations, managing the massive data streams organizations create is more important than ever. Effective business intelligence is the product of data that is scrubbed, properly stored, and easy to find. When your organization uses raw data without proper management procedures, your results suffer.

The first step towards creating better data for analytics starts with managing data the right way. Establishing clear protocols and following them can help streamline the analytics process, offer better insights, and simplify the process of handling data. You can start by implementing these five rules to manage your data more efficiently.

1. Establish Clear Analytics Goals Before Getting Started

As the amount of data produced by organizations daily grows exponentially, sorting through terabytes of information can become problematic and reduce the efficiency of analytics. Such large data sets require significantly longer times to scrub and properly organize. For companies that deal with multiple streams that exhibit heavy bandwidth, having a clear line of sight towards business and analytics goals can help reduce inflows and prioritize relevant data.

It’s important to establish clear objectives for data and create parameters that filter out data points that are irrelevant or unclear. This facilitates pre-screening datasets and makes scrubbing and sorting easier by reducing white noise. Additionally, you can focus even more on measuring specific KPIs to further filter out the right data from the stream.
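
As a rough illustration of this kind of pre-screening, the sketch below keeps only events tied to a chosen set of KPIs and drops unclear ones. The metric names, the event layout, and the `prescreen` helper are all hypothetical.

```python
# Hypothetical KPIs that the analytics goals say matter.
RELEVANT_METRICS = {"conversion_rate", "churn_rate"}


def prescreen(events, relevant_metrics):
    """Keep only events that report a relevant metric and carry a value."""
    kept = []
    for event in events:
        if event.get("metric") not in relevant_metrics:
            continue  # irrelevant to the stated goals
        if event.get("value") is None:
            continue  # unclear data point: no value recorded
        kept.append(event)
    return kept


events = [
    {"metric": "conversion_rate", "value": 0.031},
    {"metric": "page_color", "value": "blue"},
    {"metric": "churn_rate", "value": None},
    {"metric": "churn_rate", "value": 0.12},
]
screened = prescreen(events, RELEVANT_METRICS)
print(screened)
```

Filtering this early means the later scrubbing and sorting steps only ever see data that maps to a stated objective.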

2. Simplify and Centralize Your Data Streams

Another problem analytics suites face is reconciling disparate data from multiple streams. Organizations have internal, third-party, customer, and other data that must be considered as part of a larger whole instead of viewed in isolation. Leaving data as-is can be damaging to insights, as different sources may use unique formats or different styles.

Before allowing multiple streams to connect to your data analytics software, your first step should be establishing a process to collect data more centrally and unify it. This centralization makes it easier to input data seamlessly into analytics tools, but also simplifies the methodology for users to find and manipulate data. Consider how to set up your data streams best to reduce the number of sources to eventually produce more unified sets.
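
One way to picture this unification step is a small adapter per source that maps each record into a single shared layout. The sketch below assumes two made-up sources (a CRM export and a third-party vendor feed) with invented field names; it is not tied to any particular analytics product.

```python
from datetime import datetime


def from_crm(record):
    """Adapt a hypothetical CRM record (US-style dates) to the shared layout."""
    signup = datetime.strptime(record["Signup"], "%m/%d/%Y").date()
    return {
        "customer_id": str(record["CustomerID"]),
        "signup_date": signup.isoformat(),
        "source": "crm",
    }


def from_vendor(record):
    """Adapt a hypothetical vendor record (already ISO dates) to the shared layout."""
    return {
        "customer_id": str(record["cust"]),
        "signup_date": record["signup_date"],
        "source": "vendor",
    }


crm_stream = [{"CustomerID": 17, "Signup": "03/15/2017"}]
vendor_stream = [{"cust": "A-204", "signup_date": "2017-04-02"}]

# Every record reaches the analytics tool in the same shape, whatever its origin.
unified = [from_crm(r) for r in crm_stream] + [from_vendor(r) for r in vendor_stream]
print(unified)
```

Keeping the source name on each unified record preserves the ability to trace a value back to the stream it came from.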

3. Scrub Your Data Before Warehousing

The endless stream of data raises questions about quality and quantity. While having more information is preferable, data loses its usefulness when it’s surrounded by noise and irrelevant points. Unscrubbed data sets make it harder to uncover insights, properly manage databases, and access information later.

Before worrying about data warehousing and access, consider the processes in place to scrub data to produce clean sets. Create phases that ensure data relevance is considered while effectively filtering out data that is not pertinent. Additionally, make sure the process is as automated as possible to reduce wasted resources. Implementing functions such as data classification and pre-sorting can help expedite the cleaning process.
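
A minimal automated scrubbing pass might look like the sketch below. The required field, the sample records, and the `scrub` helper are assumptions for illustration; a production pipeline would add rules specific to its own data.

```python
def scrub(records, required_fields):
    """Trim whitespace, drop records missing required fields, and remove exact duplicates."""
    seen = set()
    clean = []
    for record in records:
        # Normalize string values so that "a " and "a" count as the same value.
        normalized = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        if any(normalized.get(field) in (None, "") for field in required_fields):
            continue  # not usable without its required fields
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue  # exact duplicate of a record already kept
        seen.add(fingerprint)
        clean.append(normalized)
    return clean


raw = [
    {"id": 1, "email": " a@example.com "},
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "c@example.com"},
]
cleaned = scrub(raw, required_fields=["email"])
print(cleaned)
```

Running a pass like this before warehousing keeps the noise out of storage instead of leaving every later query to filter it again.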

4. Establish Clear Data Governance Protocols

One of the biggest emerging issues facing data management is data governance. Because of the sensitive nature of many sources—consumer information, sensitive financial details, and so on—concerns about who has access to information are becoming a central topic in data management. Moreover, allowing free access to datasets and storage can lead to manipulation, mistakes, and deletions that could prove damaging.

It’s vital to establish clear and explicit rules about who can access data, when, and how. Creating tiered permission systems (read, read/write, admin) can help limit the exposure to mistakes and danger. Additionally, sorting data in ways that facilitate access to different groups can help manage data access better without the need to give free rein to all team members.
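
The tiered permission idea (read, read/write, admin) can be sketched as ordered access levels. The action names and the `is_allowed` helper below are hypothetical; real systems usually delegate this to the database or an identity provider.

```python
from enum import IntEnum


class Access(IntEnum):
    """Ordered tiers: a higher tier includes everything below it."""
    READ = 1
    READ_WRITE = 2
    ADMIN = 3


# Hypothetical mapping from an action to the minimum tier it requires.
REQUIRED = {
    "view": Access.READ,
    "edit": Access.READ_WRITE,
    "delete": Access.ADMIN,
}


def is_allowed(user_level, action):
    """Return True when the user's tier meets or exceeds the tier the action requires."""
    return user_level >= REQUIRED[action]


print(is_allowed(Access.READ, "view"))
print(is_allowed(Access.READ, "edit"))
print(is_allowed(Access.ADMIN, "delete"))
```

Because the tiers are ordered, granting a team a level is a single decision, and destructive actions such as deletion stay limited to the smallest group.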

5. Create Dynamic Data Structures

Many times, storing data is reduced to a single database that limits how you can manipulate it. Static data structures are effective for holding data, but they are restrictive when it comes to analyzing and processing it. Instead, data managers should place a greater emphasis towards creating structures that encourage deeper analysis.

Dynamic data structures present a way to store real-time data that allows users to connect points better. Using three-dimensional databases, finding methods to reshape data rapidly, and creating more inter-connected data silos can help contribute to more agile business intelligence. Generate databases and structures that simplify accessing and interacting with data rather than isolating it.

The fields of data management and analytics are constantly evolving. For analytics teams, it’s vital to create infrastructures that are future-proofed and offer the best possible insights for users. By establishing best practices and following them as closely as possible, organizations can significantly enhance the quality of the insights their data produces.


The Data Driven Road Less Traveled


To help better explain the topic, let me take a small detour and explain the conventional big business paradox and why it is a threat in today’s economy. Remember the days when large businesses were called 800-pound gorillas and small businesses only dreamt about touching their market share? That, in some sense, is the conventional big business paradox. It is not true anymore. With an ever-connected world and easy access to cutting edge platforms and methodologies, even small businesses have access to disruptive technologies and ways. In fact, small businesses have an advantage. They can react quickly, act nimbly and keep a better focus. So it is not a surprise that every now and then, big businesses take small blows from companies running against the conventional big business paradox. So, what is wrong? Big organizations, more often than required, run on their conventional ways. Sure, you could argue that scale and size make them slow; however, try explaining that to businesses like Amazon, Salesforce and Google. In the current landscape, rapidly changing market and customer dynamics demand better ways to analyze evolving customer expectations and faster response times.

Besides getting their hands on the best talent in the market, large businesses should create ways to introduce another paradigm that would help them identify and understand changing customer expectations and technology paradigms. I call it the Data Driven Enterprise.

Data never lies, never introduces bias, nor leaves anything to assumptions. Data driven businesses have been proven time and again to be more sustainable businesses. In fact, if you talk about the star products of big businesses, they are extensively monitored. What fails most businesses is their lack of attention towards the nooks and crannies, where we first observe the signs of changing customer expectations, preferences and technology. That is why a centralized data driven approach is the way to go.

A data driven framework is not complicated; it is a series of focused steps taken to achieve a data focused enterprise. Think of it as an engine to rapidly validate hypotheses in a lean, iterative manner. Sounds cool? Yes, it is! More and more businesses are aligning themselves towards a better data driven enterprise. Want more information…

I’ve written an ebook, which crossed 3k downloads last month. It has a series of easy to digest steps for building the thought leadership needed to help take a big business down the data driven innovation route. Please feel free to download the ebook at: http://pxl.me/ddibook and let me know your thoughts.
Remember, it’s just the start of the discussion; together we all have to travel a long road to sustained business growth.

Originally Posted at: The Data Driven Road Less Traveled by d3eksha

Big Data Explained in Less Than 2 Minutes – To Absolutely Anyone

There are some things that are so big that they have implications for everyone, whether we want them to or not. Big Data is one of those concepts, and is completely transforming the way we do business and is impacting most other parts of our lives.

It’s such an important idea that everyone from your grandma to your CEO needs to have a basic understanding of what it is and why it’s important.

What is Big Data?

“Big Data” means different things to different people and there isn’t, and probably never will be, a commonly agreed upon definition out there. But the phenomenon is real and it is producing benefits in so many different areas, so it makes sense for all of us to have a working understanding of the concept.

So here’s my quick and dirty definition:

The basic idea behind the phrase ‘Big Data’ is that everything we do is increasingly leaving a digital trace (or data), which we (and others) can use and analyse. Big Data therefore refers to that data being collected and our ability to make use of it.

I don’t love the term “big data” for a lot of reasons, but it seems we’re stuck with it. It’s basically a ‘stupid’ term for a very real phenomenon – the datafication of our world and our increasing ability to analyze data in a way that was never possible before.

Of course, data collection itself isn’t new. We as humans have been collecting and storing data since as far back as 18,000 BCE. What’s new are the recent technological advances in chip and sensor technology, the Internet, cloud computing, and our ability to store and analyze data that have changed the quantity of data we can collect.

Things that have been a part of everyday life for decades — shopping, listening to music, taking pictures, talking on the phone — now happen more and more wholly or in part in the digital realm, and therefore leave a trail of data.

The other big change is in the kind of data we can analyze. It used to be that data fit neatly into tables and spreadsheets, things like sales figures and wholesale prices and the number of customers that came through the door.

Now data analysts can also look at “unstructured” data like photos, tweets, emails, voice recordings and sensor data to find patterns.

How is it being used?

As with any leap forward in innovation, the tool can be used for good or nefarious purposes. Some people are concerned about privacy, as more and more details of our lives are being recorded and analyzed by businesses, agencies, and governments every day. Those concerns are real and not to be taken lightly, and I believe that best practices, rules, and regulations will evolve alongside the technology to protect individuals.

But the benefits of big data are very real, and truly remarkable.

Most people have some idea that companies are using big data to better understand and target customers. Using big data, retailers can predict what products will sell, telecom companies can predict if and when a customer might switch carriers, and car insurance companies understand how well their customers actually drive.

It’s also used to optimize business processes. Retailers are able to optimize their stock levels based on what’s trending on social media, what people are searching for on the web, or even weather forecasts. Supply chains can be optimized so that delivery drivers use less gas and reach customers faster.

But big data goes way beyond shopping and consumerism. Big data analytics enable us to find new cures and better understand and predict the spread of diseases. Police forces use big data tools to catch criminals and even predict criminal activity, and credit card companies use big data analytics to detect fraudulent transactions. A number of cities are even using big data analytics with the aim of turning themselves into Smart Cities, where a bus would know to wait for a delayed train and where traffic signals predict traffic volumes and operate to minimize jams.

Why is it so important?

The biggest reason big data is important to everyone is that it’s a trend that’s only going to grow.

As the tools to collect and analyze the data become less and less expensive and more and more accessible, we will develop more and more uses for it — everything from smart yoga mats to better healthcare tools and a more effective police force.

And, if you live in the modern world, it’s not something you can escape. Whether you’re all for the benefits big data can bring, or worried about Big Brother, it’s important to be aware of the phenomenon and tuned in to how it’s affecting your daily life.

What are your biggest questions about big data? I’d love to hear them in the comments below — and they may inspire future posts to address them.

To read the full article on Data Science Central, click here.

Source: Big Data Explained in Less Than 2 Minutes – To Absolutely Anyone

Bias: Breaking the Chain that Holds Us Back

Speaker Bio: Named one of 10 Women to Watch in Tech by Inc. Magazine, Dr. Vivienne Ming is a theoretical neuroscientist, entrepreneur, and author. She co-founded Socos Labs, her fifth company, an independent think tank exploring the future of human potential. Dr. Ming launched Socos Labs to combine her varied work with that of other creative experts and expand their impact on global policy issues, both inside companies and throughout our communities. Previously, Vivienne was a visiting scholar at UC Berkeley’s Redwood Center for Theoretical Neuroscience, pursuing her research in cognitive neuroprosthetics. In her free time, Vivienne has invented AI systems to help treat her diabetic son, predict manic episodes in bipolar sufferers weeks in advance, and reunite orphan refugees with extended family members. She sits on boards of numerous companies and nonprofits including StartOut, The Palm Center, Cornerstone Capital, Platypus Institute, Shiftgig, Zoic Capital, and SmartStones. Dr. Ming also speaks frequently on her AI-driven research into inclusion and gender in business. For relaxation, she is a wife and mother of two.

Distilled Blog Post Summary: Dr. Vivienne Ming’s talk at a recent Domino MeetUp delved into bias and its implications, including potential liabilities for algorithms, models, businesses, and humans. Dr. Ming’s evidence included first-hand knowledge fundraising for multiple startups, data analysis completed during her tenure as the Chief Scientist at Gild, as well as citing studies within data, economics, recruiting, and education. This blog post provides text and video clip highlights from the talk. The full video is available for viewing. If you are interested in viewing additional content from Domino’s past events, review the Data Science Popup Playlist. If you are interested in attending an event in-person, then consider the upcoming Rev.

Research, Experimentation, and Discovery: Core of Science

Research, experimentation, and discovery are at the core of all types of science, including data science. Dr. Ming kicked off the talk by noting that “one of the powers of doing a lot of rich data work, there’s this whole range– I mean, there’s very little in this world that’s not an entree into”. While Dr. Ming provided detailed insights and evidence that pointed to the potential of rich data work during the entire talk, this blog post focuses on the implications and liabilities of bias within gender, names, and ethnic demographics. It also covers how bias isn’t solely a data or algorithm problem; it is a human problem. The first step in addressing bias is acknowledging that it exists.

Do You See the Chameleon? The Roots of Bias

Each one of us has biases and makes assessments based on those biases. Dr. Ming uses Johannes Stotter’s Chameleon to point out that “the roots of bias are fundamental and unavoidable”. Many people see a chameleon when they look at the image. However, the image actually consists of two people covered in body paint and strategically posed to look like a chameleon. In a video clip from the talk, Dr. Ming says:

“I cannot make an unbiased AI. There are no unbiased rats in the world. In a very basic sense, these systems are making decisions on their uncertainty, and the only rational way to do that is to act the best we can given the data. The problem is when you refuse to acknowledge there’s a problem with our bias and actually do something about it. And we have this tremendous amount of evidence that there is a serious problem, and it’s holding, not just small things back. But as I’m going to get to later, it’s holding us back from a transformed world, one that I think anyone can selfishly celebrate.”


Bias as the Pat on the Head (or the Chain) that Holds Us Back

While history is filled with moments when bias was not acknowledged as a problem, there are also moments when people confronted socially reinforced gender bias. Women have assumed male noms de plume to write epic novels, fight in wars, win Judo championships, run marathons, and even, as Dr. Ming pointed out, create an all-women software company called Freelance Programmers in the 1960s. During the meetup, Dr. Ming indicated that Dame Stephanie “Steve” Shirley’s TED Talk, “Why do ambitious women have flat heads?”, helped her parse two distinctly different startup fundraising experiences that were grounded in gender bias.

Prior to Dr. Ming co-founding her current education technology company and obtaining her academic credentials, she dropped out of college and started a film company. When

“we started this company, and the funny thing is, despite having nothing, nothing that anyone should invest in– we didn’t have a script. We didn’t have talent. Literally, we didn’t even have talent. We didn’t have experience. We had nothing. We essentially raised what you might in the tech industry called seed round after a few phone calls.“

However, raising funding was more difficult the second time, for her current company, despite having substantially more academic, technology, and business credentials. During one of the funding meetings with a small firm with 5 partners, Dr. Ming relayed how the last partner said “‘you should feel so proud of what you’ve built’. And at the time, I thought, oh, Jesus, at least one of these people is on our side. In fact, as we were leaving the room, he literally patted me on the head, which seemed a little strange.” This prompted Dr. Ming to consider how

“my credentials are transformed that second time. No one questioned us about the technology. They loved it. They questioned whether we know how to run a business. The product itself people loved versus a film. Everything the second time around should have been dramatically easier. Except the only real difference that I can see is that the first time I was a man and the second time I was a woman.“

This led Dr. Ming to understand what Stephanie Shirley meant by ambitious women having flat heads: they come from all of the times they have been patted on the head. Dr. Ming relays that

“I’ve learned ever since as an entrepreneur is, as soon as it feels like they’re dealing with their favorite niece rather than me as a business person, then I know, I know that they simply are not taking me seriously. And all the PhD’s in the world doesn’t matter, all the past successes in my other companies doesn’t matter. You are just that thing to me. And what I’ve learned is, figure that out ahead of time. Don’t bother wasting days and hours, and prepping to pitch to people that simply are not capable of understanding who you are, but of course, in a lot of context, that’s all you’ve got.“

Dr. Ming also pointed out that the bias due to gender also manifested at an organization where she worked before and after her gender transition. She noted when she went into work after her gender transition,

“That’s the last day anyone ever asked me a math question, which is kind of funny. I do happen to also have a PhD in psychology. But somehow one day to the next, I didn’t forget how to do convergence proofs. I didn’t forget what it meant to invent algorithms. And yet that was how people dealt with it, people who knew before. You see how powerful the change is to see someone in a different skin.”

This experience is similar to Dame Shirley’s, who, in order to start what would become a multi-billion dollar software company in the 1960s, “started to challenge the conventions of the time, even to the extent of changing my name from “Stephanie” to “Steve” in my business development letters, so as to get through the door before anyone realized that he was a she”. Dame Shirley subverted bias during a time when she, as a female, was prevented from working on the stock exchange, driving a bus, or, “Indeed, I couldn’t open a bank account without my husband’s permission”. Yet, despite the bias, Dame Shirley remarked

“who would have guessed that the programming of the black box flight recorder of Supersonic Concord would have been done by a bunch of women working in their own homes” ….”And later, when it was a company valued at over three billion dollars, and I’d made 70 of the staff into millionaires, they sort of said, “Well done, Steve!”

While it is no longer the 1960s, bias implications and liabilities are still present. Yet we in data science are able to access data and have open conversations about bias as the first step toward avoiding inaccuracies, training data liabilities, and model liabilities within our data science projects and analysis. What if, in 2018, people built and trained models on the assumption that humans with XY chromosomes lacked the ability to code, because they only reviewed and used data from Dame Shirley’s company in the 1960s? Consider that for a moment, as the mirror image of it is what happened to Dame Shirley, Dr. Ming, and many others. Bias implications and liabilities have real-world consequences. Being aware of bias and then addressing it moves the industry forward, toward breaking the chain that holds research, data science, and us back.

Say My Name: Biased Perceptions Uncovered

When Dr. Ming was the Chief Scientist at Gild, a reporter called her for a quote on the Jose Zamora story. This also led to Dr. Ming’s research for her upcoming book, “The Tax of Being Different”. Dr. Ming relayed anecdotes during the meetup (see video clip) and has also written about this research for the Financial Times:

“To calculate the tax on being different I made use of a data set of 122m professional profiles collected by Gild, a company specialising in tech for hiring and HR, where I worked as chief scientist. From that data, I was able to compare the career trajectories of specific populations by examining the actual individuals. For example, our data set had 151,604 people called “Joe” and 103,011 named “José”. After selecting only for software developers we still had 7,105 and 4,896 respectively, real people writing code for a living. Analysing their career trajectories I found that José typically needs a masters degree or higher compared to Joe with no degree at all to be equally likely to get a promotion for the same quality of work. The tax on being different is largely implicit. People need not act maliciously for it to be levied. This means that José needs six additional years of education and all of the tuition and opportunity costs that education entails. This is the tax on being different, and for José that tax costs $500,000-$1m over his lifetime.” (Financial Times)
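The sample sizes quoted above can be sanity-checked with a few lines of Python. This is only a minimal sketch using the four counts from the excerpt; it does not reproduce Dr. Ming’s career-trajectory analysis, and the variable names are chosen here purely for illustration.

```python
# Counts quoted in the Financial Times excerpt above.
profiles = {"Joe": 151_604, "José": 103_011}
developers = {"Joe": 7_105, "José": 4_896}

# Share of each name group that remained after selecting software developers.
developer_share = {name: developers[name] / profiles[name] for name in profiles}

for name, share in developer_share.items():
    print(f"{name}: {share:.2%} of profiles are software developers")
```

Both groups keep a similar fraction of software developers, which suggests the two samples are at least comparable in how they were filtered.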


While this particular example focuses on ethnicity-oriented demographic bias, Dr. Ming referenced quite a few research studies regarding name bias during the meetup discussion. For Domino Data Science Blog readers who do not have the research she cites on hand, a sample of published studies on name bias includes: names that suggest male gender, “noble-sounding” surnames in Europe, and names that are perceived as “easy-to-pronounce”, which also has implications for how organizations choose their names. Yet Dr. Ming did not limit the discussion to bias within gender and naming; she also dived right into how demographic bias impacts image classification, particularly with ethnicity.

Bias within Image Classification: Missing Uhura and Not Unlocking your iPhone X

Before Dr. Ming was the Chief Scientist at Gild, she saw a demo of Paul Viola’s face recognition algorithm. In that demo, she noticed that the algorithm didn’t detect Uhura. Viola indicated that this was a problem and that it would be addressed. Fast forward several years to when Dr. Ming was the Chief Scientist at Gild: she relayed how she received “a call from The Wall Street Journal [and WSJ asked her] ‘So Google’s face recognition system just labeled a black couple as gorillas. Is AI racist?’ And I said, ‘Well, it’s the same as the rest of us. It depends on how you raise it.’”

For background context, in 2015 Google released a new photo app, and a software developer discovered that the app labeled two people of color as “gorillas”. Yonatan Zunger was the Chief Architect for Social at Google at the time; since leaving Google, he has provided candid commentary about bias. Then, in January 2018, Wired ran a follow-up story regarding the 2015 event. In the article, Wired tested Google Photos and found that the labels for gorillas, chimpanzees, chimp, and monkey “were censored from searches and image tags after the 2015 incident”. This was confirmed by Google. Wired also tested how the app handles people by searching for “African American”, “black man”, “black woman”, or “black person”, which resulted in “an image of a grazing antelope” (on the search “African American”) as well as “black-and-white images of people, correctly sorted by gender but not filtered by race”. This points to the continued challenges involved in addressing bias in machine learning and models, bias that also has implications beyond social justice.

As Dr. Ming pointed out in a video clip from the meetup, facial recognition is also built into the iPhone X, and the feature may struggle to recognize faces of color from around the globe. Yet, despite all of this, Dr. Ming indicates “but what you have to recognize, none of these are algorithm problems. These are human problems.” Humans made the decisions to build algorithms, build models, train models, and roll out products that include bias with wide implications.



Introducing liability into an algorithm or model via bias isn’t solely a data or algorithm problem, it is a human problem. Understanding that it is a problem is the first step in addressing it. In the recent Domino Meetup, Dr. Ming relayed how

“AI is an amazing tool, but it’s just a tool. It will never solve your problems for you. You have to solve them. And particularly in the work I do, there are only ever messy human problems, and they only ever have messy human solutions. What’s amazing about machine learning is that once we found some of those issues, we can actually use it to reach as many people as possible, to make this essentially cost-effective, to scale that solution to everyone. But if you think some deep neural network is going to somehow magically figure out who you want to hire when you have not been hiring the right people in the first place, what is it you think is happening in that data set?”

Domino continually curates and amplifies ideas, perspectives, and research to contribute to discussions that accelerate data science work. The full video of Dr. Ming’s talk at the recent Domino MeetUp is available. There is also an additional technical talk that Dr. Ming gave at the Berkeley Institute of Data Science on “Maximizing Human Potential Using Machine Learning-Driven Applications”. If you are interested in similar content to these talks, please feel free to visit the Domino Data Science Popup Playlist or attend the upcoming Rev.

The post Bias: Breaking the Chain that Holds Us Back appeared first on Data Science Blog by Domino.

Source: Bias: Breaking the Chain that Holds Us Back

Unraveling the Mystery of Big Data

Curious about the Big Data hype? Want to find out just how big BIG is? Who’s using Big Data for what, and what can you use it for? How about the architectural underpinnings and technology stacks? Where might you fit in the stack? Maybe some gotchas to avoid? Lionel Silberman, a seasoned Data Architect, sheds some light on it. A good and wholesome refresher on Big Data and all that it can do.
Our guest speaker:

Lionel Silberman,
Senior Data Architect, Compuware
Lionel Silberman has over thirty years of experience in big data product development. He has expert knowledge of relational databases, both internals and applications, performance tuning, modeling, and programming. His product and development experience encompasses the major RDBMS vendors; object-oriented, time-series, OLAP, transaction-driven, MPP, distributed and federated database applications; data appliances; Hadoop and NoSQL systems such as Cassandra; as well as data-parallel and mathematical algorithm development. He is currently employed at Compuware, integrating enterprise products at the data level. All are welcome to join us.



Source by v1shal

Data Science Programming: Python vs R

“Data Scientist – The Sexiest Job of 21st Century.”- Harvard Business Review

If you are already in a big data career, then you are probably familiar with the set of big data skills that you need to master to grab the sexiest job of the 21st century. With every industry generating massive amounts of data, crunching that data requires powerful and sophisticated programming tools like Python and R. Python and R are among the popular programming languages that a data scientist must know to pursue a lucrative career in data science.

Python is popular as a general-purpose programming language, whereas R, which was developed specifically for statistical computing, is popular for its data visualization features. At DeZyre, our career counsellors often get questions from prospective students about which to learn first: Python or R. If you are unsure which programming language to learn first, then you are on the right page.

Python and R top the list of basic tools for statistical computing among the set of data scientist skills. Data scientists often debate which one is more valuable, R or Python; however, both languages have specialized key features that complement each other.

Data Science with Python Language

Data science consists of several interrelated but different activities such as computing statistics, building predictive models, accessing and manipulating data, building explanatory models, data visualizations, integrating models into production systems and much more on data. Python programming provides data scientists with a set of libraries that helps them perform all these operations on data.

Python is a general-purpose, multi-paradigm programming language that has gained wide popularity in data science because of its simple syntax and its ability to run across different ecosystems. Python lets programmers do almost anything they need with data: data munging, data wrangling, website scraping, web application building, data engineering and more. Python makes it easy for programmers to write maintainable, robust, large-scale code.

“Python programming has been an important part of Google since the beginning, and remains so as the system grows and evolves. Today dozens of Google engineers use Python language, and we’re looking for more people with skills in this language.” – said Peter Norvig, Director at Google.

Unlike R, Python was not designed around statistical computing, so most of that functionality comes from libraries such as scikit-learn, NumPy, pandas, SciPy and Seaborn, which data scientists can use to perform useful statistical and machine learning tasks. Python code is similar to pseudocode and reads almost like English. The expressions and characters used in the code can be mathematical; however, the logic can be easily followed from the code.

What makes Python language the King of Data Science Programming Languages?

“In Python programming, everything is an object. It’s possible to write applications in Python language using several programming paradigms, but it does make for writing very clear and understandable object-oriented code.”- said Brian Curtin, member of Python Software Foundation
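As a rough illustration of the “everything is an object” remark, the sketch below (Python 3; the function name `greet` and the attribute `language` are made up for this example) shows that numbers, strings, functions and even classes can be stored and inspected in the same way.

```python
def greet(name):
    """Return a simple greeting (hypothetical helper for illustration)."""
    return "Hello, " + name

# Functions are objects: they can carry attributes and be stored in lists.
greet.language = "en"

values = [42, "text", greet, int]

# Every value, including the function and the class `int`, is an instance of `object`.
all_objects = all(isinstance(v, object) for v in values)

# Each object knows its own type.
kinds = [type(v).__name__ for v in values]

print(all_objects, kinds, greet.language)
```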

1) Broadness

The public package index for Python, popularly known as PyPI, has approximately 40K add-ons listed under 300 different categories. So if a developer or data scientist needs to do something with Python, there is a high probability that someone has already built it and they need not begin from scratch. Python is used extensively for tasks ranging from CGI and web development, system testing and automation, and ETL to gaming.

2) Efficient

Developers these days spend a lot of time defining and processing big data. With the increasing amount of data that needs to be processed, it becomes extremely important for programmers to manage memory usage efficiently. Python has generators, written either as functions or as expressions, which support iterative processing, i.e. one item at a time. When a large number of processing steps must be applied to a set of data, generators are a great advantage: they grab the source data one item at a time and pass each item through the entire processing chain.
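A minimal sketch of such a processing chain might look like the following. The stage names (`read_records`, `parse`, `only_large`) and the sample data are assumptions made for illustration; the point is only that each stage pulls one item at a time from the stage before it, so the full data set never has to sit in memory between steps.

```python
def read_records(lines):
    """Generator function: yield one cleaned line at a time."""
    for line in lines:
        yield line.strip()

def parse(records):
    """Generator function: turn 'name,value' text into a (name, int) pair."""
    for record in records:
        name, value = record.split(",")
        yield name, int(value)

def only_large(pairs, threshold):
    """Generator expression: keep pairs whose value meets the threshold."""
    return ((name, value) for name, value in pairs if value >= threshold)

# Hypothetical input; in practice this could be an open file object.
raw = ["a,1\n", "b,20\n", "c,300\n"]

# Nothing is processed yet: the pipeline is lazy until it is iterated.
pipeline = only_large(parse(read_records(raw)), threshold=10)

result = list(pipeline)
print(result)
```

Replacing `raw` with a file handle would leave the rest of the chain unchanged, because every stage only relies on being able to iterate over its input.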

The generator-based migration tool collective.transmogrifier helps make complex and interdependent updates to the data as it is being processed from the old site, and then allows programmers to create and store objects in constant memory at the new site. The transmogrifier plays a vital role in Python programming when dealing with larger data sets.

3) Can be Easily Mastered Under Expert Guidance-Read It, Use it with Ease

Python has gained wide popularity because its syntax is clear and readable, making it easy to learn under expert guidance. Data scientists can build expertise and master Python for scientific computing by taking Python programming courses led by industry experts. The readability of the syntax makes it easier for peer programmers to update existing Python programs at a faster pace, and it also helps them write new programs quickly.

Applications of Python language-

  • Python programming is used by Mozilla for exploring their broad code base. Mozilla releases several open source packages built using Python.
  • Dropbox, a popular file hosting service, was founded by Drew Houston because he kept forgetting his USB drive. The project was started to fulfill his personal needs, but it turned out to be so good that others started using it. Dropbox, which now has close to 150 million registered users, is written almost entirely in Python.
  • Walt Disney uses Python to power its creative processes.
  • Some other exceptional products written in Python language are –

i. Cocos2d - A popular open source 2D gaming framework.

ii. Mercurial - A popular cross-platform, distributed code revision control tool used by developers.

iii. BitTorrent - File sharing software.

iv. Reddit - Entertainment and social news website.

Limitations of Python Programming-

  • Python is an interpreted language and is therefore often slower than compiled languages.
  • “A possible disadvantage of Python is its slow speed of execution. But many Python packages have been optimized over the years and execute at C speed.”- said Pierre Carbonnelle, a Python programmer who runs the PyPL language index.
  • Python is a dynamically typed language, which poses certain design restrictions. It requires rigorous testing because many errors show up only at runtime.
  • Python has gained popularity on desktop and server platforms but is still weak on mobile computing platforms, as very few mobile apps are developed in Python. Python is also rarely found on the client side of web applications.
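The dynamic typing limitation listed above can be seen in a short sketch. The function `total_price` is a hypothetical helper invented for this example; the snippet shows that Python accepts a call with the wrong argument type and only reports the problem when the offending line actually runs, which is why thorough tests matter.

```python
def total_price(quantity, unit_price):
    """Multiply quantity by unit price (hypothetical helper for illustration)."""
    return quantity * unit_price

# Correct call: both arguments are numbers.
ok = total_price(3, 2.5)

# Buggy call: quantity arrives as text, e.g. read straight from a file.
# Python accepts the call and only raises once the multiplication runs.
try:
    total_price("3", 2.5)
    error_seen = False
except TypeError:
    error_seen = True

print(ok, error_seen)
```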


Data Science with R Language

Millions of data scientists and statisticians use R to tackle challenging problems related to statistical computing and quantitative marketing. R has become an essential tool for finance and business analytics-driven organizations like LinkedIn, Twitter, Bank of America, Facebook and Google.

R is an open source programming language and environment for statistical computing and graphics, available on Linux, Windows and Mac. R has an innovative package system that allows developers to extend its functionality by providing cross-platform distribution and testing of data and code. With more than 5K publicly released packages available for download, it is a great programming language for exploratory data analysis. R can easily be integrated with other programming languages like C, C++ and Java. R has an array-oriented syntax that makes it easier for programmers to translate math to code, in particular for professionals with a minimal programming background.

Why use R programming for data science?

1. R is one of the best tools for data scientists in the world of data visualization. It has virtually everything that a data scientist needs: statistical models, data manipulation and visualization charts.

2. Data scientists can create unique and beautiful data visualizations with R that go far beyond outdated line plots and bar charts. With R, data scientists can draw meaningful insights from data in multiple dimensions using 3D surfaces and multi-panel charts. The Economist and The New York Times exploit the custom charting capabilities of R to create stunning infographics.

3. One great feature of R is its support for reproducible research: the code and data can be given to an interested third party, who can trace them back and reproduce the same results. Data scientists therefore need to write code that extracts the data, analyses it and generates an HTML, PDF or PPT report. When a third party is interested, the original author can share the code and data with them to reproduce the results.

4. R is designed particularly for data analysis, with the flexibility to mix and match various statistical and predictive models for the best possible outcomes. R scripts can also be automated with ease to support production deployments and reproducible research.

5. R has a rich community of approximately 2 million users and thousands of developers, drawing on the talents of data scientists spread across the world. The community maintains packages spanning actuarial analysis, finance, machine learning, web technologies and pharmaceuticals that can be of great help to predict component failure times, analyse genomic sequences, and optimize portfolios. All these resources, created by experts in various domains, can be accessed easily online for free.

Applications of R Language

  • Ford uses open source tools like R programming and Hadoop for data driven decision support and statistical data analysis.
  • The popular insurance giant Lloyd’s uses R language to create motion charts that provide analysis reports to investors.
  • Google uses R programming to analyse the effectiveness of online advertising campaigns, predict economic activities and measure the ROI of advertising campaigns.
  • Facebook uses R language to analyse the status updates and create the social network graph.
  • Zillow makes use of R programming to estimate housing prices.

Limitations of R Language

  • R has a steep learning curve for professionals who do not come from a programming background (professionals hailing from a GUI world like that of Microsoft Excel).
  • Working with R can at times be slow if the code is written poorly; however, there are solutions to this, such as the alternative implementations FastR, pqR and Renjin.

Data Science with Python or R Programming- What to learn first?

There are certain strategies that will help professionals decide on their course of action, whether that is to begin learning data science with Python or with R:

  • If professionals know what kind of project they will be working on, they can use that to decide which language to learn first. If the project requires working with jumbled or scraped data from files, websites or other sources, then professionals should start with Python. On the other hand, if the project involves clean data, then professionals should focus on the data analysis part first, which means learning R first.
  • It is always better to be on par with your team, so find out which data science programming language they are using, R or Python. Collaboration and learning become much easier if you and your teammates use the same language.
  • Trends in data scientist job postings will help you make a better decision on which to learn first, R or Python.
  • Last but not least, do consider your personal preferences: what interests you more, and which language is easier for you to grasp.

Having looked briefly at Python and R, the bottom line is that it is difficult to pick just one of them to learn first in order to land data scientist jobs at top big data companies. Each has its own advantages and disadvantages depending on the scenario and the tasks to be performed. Thus, the best solution is to make a smart move based on the strategies listed above: decide which language will help you land a job with a data scientist's salary, learn it first, and later add to your skill set by learning the other.

To read the original article on DeZyre, click here.

Originally Posted at: Data Science Programming: Python vs R

March 6, 2017 Health and Biotech analytics news roundup

Here’s the latest in health and biotech analytics:

Mathematical Analysis Reveals Prognostic Signature for Prostate Cancer: University of East Anglia researchers used an unsupervised technique to categorize cancers based on gene expression levels. Their method was better able than current supervised methods to identify patients with more harmful variants of the disease.

Assisting Pathologists in Detecting Cancer with Deep Learning: Scientists at Google have trained deep learning models to detect tumors in images of tissue samples. These models beat pathologists’ diagnoses by one metric.

Patient expectations for health data sharing exceed reality, study says: The Humana study shows that, among other beliefs, most patients think doctors share more information than they actually do. They also expect information from digital devices will be beneficial.

NHS accused of covering up huge data loss that put thousands at risk: The UK’s national health service failed to deliver half a million medically relevant documents between 2011 and 2016. They had previously briefed Parliament about the failure, but not the scale of it.

Entire operating system written into DNA at 215 Pbytes/gram: Yaniv Erlich and Dina Zielinski (New York Genome Center) used a “fountain code” to translate a 2.1 MB archive into DNA. They were able to retrieve the data by sequencing the resulting fragments, a process that was robust to mutations and loss of sequences.

Source: March 6, 2017 Health and Biotech analytics news roundup