The AI Talent Gap: Locating Global Data Science Centers

Good AI talent is hard to find. The pool of people with deep expertise in modern artificial intelligence techniques is painfully thin, even as more and more companies commit to data and artificial intelligence as their differentiator. Early adopters quickly discover how difficult it is to determine which data science expertise meets their needs. And the AI talent itself? If you are not Google, Facebook, Netflix, Amazon, or Apple, good luck.

With the popularity of AI, pockets of expertise are emerging around the world. For a firm that needs AI expertise to advance its digital strategy, finding these data science hubs becomes increasingly important. In this article we look at the initiatives different countries are pushing in the race to become AI leaders and we examine existing and potential data science centers.

It seems as though every country wants to become a global AI power. With the Chinese government pledging billions of dollars in AI funding, other countries don’t want to be left behind.

In Europe, France plans to invest €1.5 billion in AI research over the next 4 years while Germany has universities joining forces with corporations such as Porsche, Bosch, and Daimler to collaborate on AI research. Even Amazon, with a contribution of €1.25 million, is collaborating in the AI efforts in Germany’s Cyber Valley around the city of Stuttgart. Not one to be left behind, the UK pledged £300 million for AI research as well.

Other countries committing money to AI include Singapore, which pledged $150 million, and Canada, which not only committed $125 million but also has large data science hubs in Toronto and Montreal. Yoshua Bengio, one of the fathers of deep learning, is based in Montreal, a city often cited as having the largest concentration of AI researchers in the world. Toronto has a booming tech industry that naturally attracts AI money.

Data Scientists Worldwide

Examining a variety of sources, we find data science professionals spread across the regions where we would expect them. The graphic below shows the number of members of the site Data Science Central. Since the site is in English, we expect most of its members to come from English-speaking countries; even so, it gives us some insight into which countries have higher representation.

[Graphic: Data Science Central members by country]

Source: www.datasciencecentral.com 

It becomes difficult then to determine AI hubs without classifying talent by levels. One example of this is India; despite its large number of data science professionals, many of them are employed in lower-skilled roles such as data labeling and processing.

So what would be considered a data science hub? The graphic below defines a hub by the number of advanced AI professionals in the country. The countries shown here have AI talent working in companies such as Google, Baidu, Apple and Amazon. However, this omits a large group of talent that is not hired by these types of companies.

[Graphic: countries with advanced AI professionals at leading AI companies]

Source: https://medium.com

Matching the previous graph with a study conducted by Element AI, we see some commonalities, but also some new hubs emerging. The same talent centers remain, but more countries are highlighted on the map. Element AI's approach analyzed LinkedIn profiles, factored in participation in conferences and publications, and gave significant weight to skills.

[Graphic: Element AI global AI talent map]

Source: http://www.jfgagne.ai/talent/

As you search for AI talent, we recommend basing your search on four factors: workforce availability, cost of labor, English proficiency, and skill level. Kaggle, one of the most popular data science websites, conducted a salary survey with respondents from 171 countries. The results can be seen below.

Source: www.kaggle.com

Salaries are as expected, but show high variability. By combining the salary data with the talent pool maps, you can decide which countries best suit your goals. The EF English Proficiency Index shows which countries have the highest proficiency in English and can further weed out those that may have a strong AI presence or a low cost of labor, but low English proficiency.
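To make the comparison concrete, here is a minimal sketch of how these four factors could be rolled into a single score per country. The countries, numbers, and weights below are purely illustrative assumptions, not survey data.

```python
# Hypothetical country profiles: each factor is normalized to 0-1, where higher
# is better ("cost" is expressed as affordability, so cheaper labor scores higher).
candidates = {
    "Country A": {"talent_pool": 0.9, "cost": 0.4, "english": 0.8, "skill": 0.7},
    "Country B": {"talent_pool": 0.6, "cost": 0.9, "english": 0.6, "skill": 0.8},
}
weights = {"talent_pool": 0.3, "cost": 0.2, "english": 0.2, "skill": 0.3}

def score(profile):
    """Weighted sum of the four factors for one country."""
    return sum(weights[factor] * profile[factor] for factor in weights)

# Rank candidate countries from best to worst fit
for country, profile in sorted(candidates.items(), key=lambda kv: -score(kv[1])):
    print(country, round(score(profile), 2))
```

Adjusting the weights to match your own priorities (for example, weighting skill level more heavily for research-oriented work) changes the ranking accordingly.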

In the end, you want to hire professionals who understand the problems you are facing and can tailor their work to your specific needs. With a global mindset, companies can mitigate talent scarcity. If you are considering sourcing talent globally, we recommend hiring strong local leadership to act as AI product managers who can manage a team, and hiring production managers located on-site with your global talent; they can oversee any data science or AI development and report back to the product manager. KUNGFU.AI will continue to study these global trends and help ensure companies have access to the best talent to meet their needs.

Originally Posted at: The AI Talent Gap: Locating Global Data Science Centers

When Worlds Collide: Blockchain and Master Data Management

Master Data Management (MDM) is an approach to managing golden records that has been around for over a decade, and it is finding a growth spurt lately as organizations exceed their pain thresholds in the management of common data. Blockchain has a slightly shorter history, coming aboard with bitcoin, but it too is seeing its revolution these days as data gets distributed far and wide and trust takes center stage in business relationships.

Volumes could be written about each on its own, and given that most organizations still have a way to go with each discipline, that might be appropriate. However, good ideas wait for no one and today’s idea is MDM on Blockchain.

Thinking back over our MDM implementations over the years, it is easy to see the data distribution network becoming wider. In fact, master data distribution is now usually the most time-intensive and unwieldy part of an MDM implementation. Blockchain removes overhead, cost, and unreliability from authenticated peer-to-peer data exchange between network partners. It can address one of the big challenges of MDM through governed, bi-directional synchronization of master data between the blockchain and enterprise MDM.

Another core MDM challenge is arriving at the “single version of the truth”. It’s elusive even with MDM because everyone must tacitly agree to the process used to instantiate the data in the first place. While many MDM practitioners go to great lengths to utilize the data rules from a data governance process, it is still a process subject to criticism. The consensus that blockchain can achieve is a governance proxy for that elusive “single version of the truth” by achieving group consensus for trust as well as full lineage of data.

Blockchain enables the major components and tackles the major challenges in MDM.

Blockchain provides a distributed database, as opposed to a centralized hub, that can store certified data in perpetuity. Because blocks are timestamped and linked, the blockchain is unalterable and permanent. Though not yet suited to low-latency transactions, transactions involving master data, such as financial settlements, are ideal for blockchain and can be sped up by an order of magnitude, since blockchain removes the friction of a normal process.
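As a rough illustration of why timestamped, linked blocks are tamper-evident, here is a minimal Python sketch of a hash-chained ledger of master data changes. The record fields are invented for illustration; this is not any particular blockchain's implementation.

```python
import hashlib
import json
import time

def make_block(record, previous_hash):
    """Wrap one master data record in a timestamped block linked to its predecessor."""
    block = {
        "timestamp": time.time(),
        "record": record,                # e.g. a golden customer record
        "previous_hash": previous_hash,
    }
    # The block's hash covers its contents plus the previous hash, so editing
    # any earlier block invalidates every block that follows it.
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    return block

genesis = make_block({"customer_id": 1, "name": "Acme Corp"}, previous_hash="0")
update = make_block({"customer_id": 1, "name": "Acme Corporation"}, genesis["hash"])

def chain_is_valid(chain):
    """Recompute hashes to confirm no block has been altered after the fact."""
    for prev, curr in zip(chain, chain[1:]):
        body = {k: v for k, v in curr.items() if k != "hash"}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if curr["hash"] != recomputed or curr["previous_hash"] != prev["hash"]:
            return False
    return True

print(chain_is_valid([genesis, update]))  # True until any block is edited
```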

Blockchain uses pre-defined rules that act as gatekeepers of data quality and govern the way in which data is utilized. Blockchains can be deployed publicly (like bitcoin) or internally (like an implementation of Hyperledger). There could be a blockchain per subject area (such as customer or product) in the implementation. MDM will begin by utilizing these internal blockchain networks, also known as Distributed Ledger Technology, though the use of public blockchains is inevitable.
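To show what pre-defined rules acting as gatekeepers might look like for a customer subject-area ledger, here is a small sketch. The field names and rules are assumptions for illustration, not a real governance policy or smart-contract API.

```python
# Hypothetical governance rules applied before a master record is accepted
# onto the internal customer ledger; field names are illustrative only.
REQUIRED_FIELDS = {"customer_id", "name", "country"}

def passes_governance(record):
    """Reject records that violate the agreed data quality rules."""
    if not REQUIRED_FIELDS.issubset(record):
        return False
    if not str(record["customer_id"]).isdigit():
        return False
    return True

candidate = {"customer_id": "42", "name": "Acme Corporation", "country": "US"}
if passes_governance(candidate):
    print("record accepted for the customer subject-area ledger")
else:
    print("record rejected by the data quality gatekeeper")
```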

A shared master data ledger beyond company boundaries can, for example, contain common and agreed master data, including public company information and common contract clauses, with only the counterparties able to see the content and destination of the communication.

Hyperledger, hosted by The Linux Foundation, is quickly becoming the standard for open source blockchain. IBM, with Hyperledger Fabric, is establishing the framework for blockchain in the enterprise. Supporting master data management with a programmable interface for confidential transactions over a permissioned network is becoming a key inflection point for blockchain and Hyperledger.

Data management is about the right data at the right time, and master data is fundamental to great data management, which is why centralized approaches like the discipline of master data management have taken center stage. MDM can utilize the blockchain for distribution and governance, and blockchain can clearly utilize the great master data produced by MDM. Blockchain data needs data governance like any other data; if anything, it needs it more, given its importance to the network.

MDM and blockchain are now going to be intertwined. Blockchain enables the key components of establishing and distributing the single version of the truth: it provides trusted, governed data, integrates that data across broad networks, prevents duplication, and provides data lineage.

It will start in MDM in niches that demand these traits such as financial, insurance and government data. You can get to know the customer better with native fuzzy search and matching in the blockchain. You can track provenance, ownership, relationship and lineage of assets, do trade/channel finance and post-trade reconciliation/settlement.

Blockchain is now a disruption vector for MDM. MDM vendors need to be at least blockchain-aware today, creating the ability for blockchain integration in the near future, such as what IBM InfoSphere Master Data Management is doing this year. Others will lose ground.

Source: When Worlds Collide: Blockchain and Master Data Management by analyticsweek

The Paradigmatic Shift of Data Fabrics: Connecting to Big Data

The notion of a data fabric encompasses several fundamental aspects of contemporary data management. It provides a singular means of data modeling, a centralized method of enforcing data governance, and an inimitable propensity for facilitating data discovery.

Its overarching significance to the enterprise, however, supersedes these individual facets of managing data. At its core, the data fabric tenet effectively signifies a transition in the way data is accessed and, to a lesser extent, even deployed.

According to Denodo Chief Marketing Officer Ravi Shankar, in a big data driven world with increasing infrastructure costs and regulatory repercussions, it has become all but necessary: “to be able to connect to the data instead of collect the data.”

Incorporating various facets of data lakes, virtualization technologies, data preparation, and data ingestion tools, a unified data fabric enables organizations to access data from a variety of locations, whether on-premise, in the cloud, on remote devices, or in traditional systems. Moreover, it does so in a manner that reinforces traditional centralization benefits such as consistency, uniformity, and oversight, all vital to regulatory compliance and data governance.

“The world is going connected,” Shankar noted. “There’s connected cars, connected devices. All of these are generating a lot of different data that is all being used to analyze information.”

Analyzing that data in times commensurate with consumer expectations—and with the self-service reporting tools that business users have increasingly become accustomed to—is possible today with a data fabric. According to Cambridge Semantics Chief Technology Officer Sean Martin, when properly implemented a data fabric facilitates “exposing all of the processing and information assets of the business to some sort of portal that has a way of exchanging data between the different sources”, which effectively de-silos the enterprise in the process.

Heterogeneous Data, Single Repository
The quintessential driver for the emergence of the enterprise data fabric concept is the ubiquity of big data and its multiple manifestations. The amounts and diversity of the types of data ingested test the limits of traditional data warehousing methods, which were not explicitly designed to account for the challenges of big data. Instead, organizations began turning to the cloud more and more frequently, while options such as Hadoop (renowned for its cheap storage) became increasingly viable.

Consequently, “companies have moved away from a single consuming model in the sense that it used to be standardized for [platforms such as] BusinessObjects,” Shankar explained. “Now with the oncoming of Tableau and QlikView, there are multiple different reporting solutions through the use of the cloud. IT now wants to provide an independence to use any reporting tool.” The freedom to select the tool of choice for data analysis largely hinges on the modeling benefits of a data fabric, which helps to “connect to all the different sources,” Shankar stated. “It could be data warehousing, which many people have. It could be a big data system, cloud systems, and also other on-premises systems. The data fabric stitches all of these things together into a virtual repository and makes it available to the consumers.”

Data Modeling
From a data modeling perspective, a data fabric helps to reconcile the individual semantics involved with proprietary tools accessed through the cloud. Virtually all platforms accessed through the cloud (and many on-premise ones) have respective semantics and taxonomies which can quickly lead to vendor lock-in. “QlikView, Tableau, BusinessObjects, Cognos, all of these have semantic models that cater to their applications,” Shankar said. “Now, if you want to report with all these different forms you have to create different semantic models.” The alternative is to use the virtualization capabilities of a data fabric for effectively “unifying the semantic models within the data fabric,” Shankar said.

One of the principal advantages of this approach is to do so with semantics tailored for an organization’s own business needs, as opposed to those of a particular application. What Shankar referred to as the “high level logical data model” of a data fabric provides a single place for definitions, terms, and mapping which is applicable across the enterprise’s data. Subsequently, the individual semantic models of application tools are used in conjunction with that overlying logical business model, which provides the basis for the interchange of tools, data types, and data sources. “When the data’s in a data store it’s usually in a pretty obscure form,” Martin commented. “If you want to use it universally to make it available across your enterprise you need to convert that into a form that makes it meaningful. Typically the way we do that is by mapping it to an ontology.”
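A minimal sketch of that idea follows, with invented source and column names: tool- or source-specific fields are mapped onto one shared business vocabulary, which is the role the fabric's overlying logical model (or ontology) plays.

```python
# Hypothetical mappings from source-specific column names to shared business terms.
SEMANTIC_MAP = {
    "tableau_sales_extract": {"cust_nm": "customer_name", "rev_usd": "revenue"},
    "warehouse_orders_table": {"CUSTOMER": "customer_name", "NET_SALES": "revenue"},
}

def to_business_model(source, row):
    """Rename a row's source-specific columns to the shared business vocabulary."""
    mapping = SEMANTIC_MAP[source]
    return {mapping.get(col, col): value for col, value in row.items()}

print(to_business_model("tableau_sales_extract", {"cust_nm": "Acme", "rev_usd": 1200}))
print(to_business_model("warehouse_orders_table", {"CUSTOMER": "Acme", "NET_SALES": 900}))
# Both rows now use the same terms, regardless of which tool or store produced them.
```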

Data Governance
The defining characteristic of a data fabric is the aforementioned virtual repository for all data sources, which is one of the ways in which it builds upon the data lake concept. In addition to the uniform modeling it enables, it also supplies a singular place in which to store the necessary metadata for all sources and data types. That metadata, in turn, is one of the main ways users can create intelligent action for data discovery or search. “Since this single virtual repository actually stores all of this metadata information, the data fabric has evolved to support other functions like data discovery and search because this is one place where you can see all the enterprise data,” Shankar observed. Another benefit is the enhanced governance and security facilitated by this centralized approach in which the metadata about the data and the action created from the data is stored.

“The data fabric stores what we call metadata information,” Shankar said. “It stores information about the data, where to go find it, what type of data, what type of association and so on. It contains a bridge of the data.” This information is invaluable for determining data lineage, which becomes pivotal for effecting regulatory compliance. It can also function as a means of implementing role-based access to data “at the data fabric layer,” Shankar commented. “Since you check the source systems directly, if it comes through the data fabric it will make sure it only gives you the data you have access to.” Mapping the data to requisite business glossaries helps to buttress the enterprise-wide definitions and usage of terminology which are hallmarks of effective governance.
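As a toy illustration of how metadata plus role-based access could be enforced at the fabric layer (the catalog entries, sources, and roles below are invented for illustration):

```python
# A miniature metadata catalog: the fabric records where each field lives,
# what kind of data it is, and which roles may see it.
CATALOG = {
    "customer_email": {"source": "crm_db", "type": "PII", "roles": {"marketing"}},
    "order_total": {"source": "erp_db", "type": "numeric", "roles": {"marketing", "finance"}},
}

def accessible_fields(role):
    """Return only the fields this role is allowed to query through the fabric."""
    return [name for name, meta in CATALOG.items() if role in meta["roles"]]

print(accessible_fields("finance"))    # ['order_total']
print(accessible_fields("marketing"))  # ['customer_email', 'order_total']
```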

Data Preparation
The data fabric tenet is also a critical means of implementing data preparation quickly and relatively painlessly—particularly when compared to conventional ETL methods. According to Shankar: “Connecting to the data is much easier than collecting, since collecting requires moving the data, replicating it, and transforming it, all of which takes time.” Interestingly enough, those temporal benefits also translate into advantages for resources. Shankar estimated that for every four IT personnel required to enact ETL, only one person is needed to connect data with virtualization technologies. These temporal and resource advantages naturally translate to a greater sense of agility, which is critical for swiftly incorporating new data sources and satisfying customers in the age of real-time. In this regard, the business value of a data fabric directly relates to the abstraction capabilities of its virtualization technologies. According to Martin, “You start to build those abstractions that give you an agility with your data and your processing. Right now, how easy is it for you to move all your things from Amazon [Web Services] to Google? It’s a big effort. How about if the enterprise data fabric was pretty much doing that for you? That’s a value to you; you don’t get locked in with any particular infrastructure provider, so you get better pricing.”
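A rough sketch of “connect, don't collect”: the virtual view below federates two hypothetical systems at query time instead of copying and transforming their data up front.

```python
# Stand-ins for two live source systems; in practice these would be
# connectors to a CRM and a billing database, queried on demand.
def query_crm(customer_id):
    crm = {1: {"name": "Acme Corporation", "segment": "Enterprise"}}
    return crm[customer_id]

def query_billing(customer_id):
    billing = {1: {"open_invoices": 3, "balance_usd": 12500}}
    return billing[customer_id]

def virtual_customer_view(customer_id):
    """Join both systems on the fly; nothing is replicated or pre-transformed."""
    return {**query_crm(customer_id), **query_billing(customer_id)}

print(virtual_customer_view(1))
```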

Source: The Paradigmatic Shift of Data Fabrics: Connecting to Big Data by jelaniharper

The Future Of Big Data Looks Like Streaming

Big data is big news, but it’s still in its infancy. While most enterprises at least talk about launching Big Data projects, the reality is that very few do in any significant way. In fact, according to new survey data from Dimensional, while 91% of corporate data professionals have considered investment in Big Data, only 5% actually put any investment into a deployment, and only 11% even had a pilot in place.


Real Time Gets Real

ReadWrite: Hadoop has been all about batch processing, but the new world of streaming analytics is all about real time and involves a different stack of technologies.

Langseth: Yes; however, I would not entangle the concepts of real-time and streaming. Real-time data is obviously best handled as a stream. But it’s possible to stream historical data as well, just as your DVR can stream Gone with the Wind or last week’s American Idol to your TV.

This distinction is important, as we at Zoomdata believe that analyzing data as a stream adds huge scalability and flexibility benefits, regardless of whether the data is real-time or historical.

RW: So what are the components of this new stack? And how is this new big data stack impacting enterprise plans?

JL: The new stack is in some ways an extension of the old stack, and in some ways really new.

Data has always started its life as a stream. A stream of transactions in a point of sale system. A stream of stocks being bought and sold. A stream of agricultural goods being traded for valuable metals in Mesopotamia.

Traditional ETL processes would batch that data up and kill its stream nature. They did so because the data could not be transported as a stream; it needed to be loaded onto removable disks and tapes to be transported from place to place.

But now it is possible to take streams from their sources, through any enrichment or transformation processes, through analytical systems, and into the data’s “final resting place”—all as a stream. There is no real need to batch up data given today’s modern architectures such as Kafka and Kinesis, modern data stores such as MongoDB, Cassandra, HBase, and DynamoDB (which can accept and store data as a stream), and modern business intelligence tools like the ones we make at Zoomdata that are able to process and visualize these streams as well as historical data, in a very seamless way.

Just like your home DVR can play live TV, rewind a few minutes or hours, or play movies from last century, the same is possible with data analysis tools like Zoomdata that treat time as a fluid.
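As a concrete sketch of that end-to-end stream, here is a minimal example assuming the kafka-python package, a broker on localhost:9092, and a hypothetical "orders" topic: the same topic serves live consumers and, like a DVR, can be replayed from the beginning.

```python
import json
from kafka import KafkaProducer, KafkaConsumer  # assumes kafka-python is installed

# Produce events onto the stream as they happen
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", {"order_id": 1, "amount": 42.50})
producer.flush()

# Consume the same stream; "earliest" rewinds the DVR and replays history
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    consumer_timeout_ms=5000,   # stop iterating if no new events arrive
)
for message in consumer:
    print(message.value)        # historical and live events arrive the same way
```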

Throw That Batch In The Stream

Also, we believe that those who have proposed a “Lambda Architecture,” effectively separating paths for real-time and batched data, are espousing an unnecessary trade-off, optimized for legacy tooling that simply wasn’t engineered to handle streams of data, be they historical or real-time.

At Zoomdata we believe that it is not necessary to put real-time and historical data on separate tracks, as there is now end-to-end tooling that can handle both, from sourcing to transport to storage to analysis and visualization.

RW: So this shift toward streaming data is real, and not hype?

JL: It’s real. It’s affecting modern deployments right now, as architects realize that it isn’t necessary to ever batch up data, at all, if it can be handled as a stream end-to-end. This massively simplifies Big Data architectures if you don’t need to worry about batch windows, recovering from batch process failures, etc.

So again, even if you don’t need to analyze data from five seconds or even five minutes ago to make business decisions, it still may be simplest and easiest to handle the data as a stream. This is a radical departure from the way things in big data have been done before, as Hadoop encouraged batch thinking.

But it is much easier to just handle data as a stream, even if you don’t care at all—or perhaps not yet—about real-time analysis.

RW: So is streaming analytics what Big Data really means?

JL: Yes. Data is just like water, or electricity. You can put water in bottles, or electricity in batteries, and ship them around the world by planes, trains, and automobiles. For some liquids, such as Dom Perignon, this makes sense. For other liquids, and for electricity, it makes sense to deliver them as a stream through wires or pipes. It’s simply more efficient if you don’t need to worry about batching it up and dealing with it in batches.

Data is very similar. It’s easier to stream big data end-to-end than it is to bottle it up.

Article originally appeared HERE.

Source: The Future Of Big Data Looks Like Streaming

Mar 07, 19: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

[Cover image: Data shortage]

[ AnalyticsWeek BYTES]

>> Dresner’s 2018 Embedded BI Market Study: Top 5 Key Findings by analyticsweek

>> My Conversation with Oracle on Customer Experience Management by bobehayes

>> Data Scientist? Programmer? Are They Mutually Exclusive? by analyticsweek


[ NEWS BYTES]

>> Delta Risk CEO Scott Kaine Featured on “Insights & Intelligence” Cloud Security Podcast – Security Boulevard Under Cloud Security

>> Global Risk Analytics Market – Current & Future trends, Growth Opportunities, Industry analysis & forecast by 2025 – TechnoBust Under Risk Analytics

>> Prescriptive Analytics Market – The Growing Prominence Of Big Data – CMFE News (press release) (blog) Under Prescriptive Analytics


[ FEATURED COURSE]

Python for Beginners with Examples


A practical Python course for beginners with examples and exercises…. more

[ FEATURED READ]

Machine Learning With Random Forests And Decision Trees: A Visual Guide For Beginners


If you are looking for a book to help you understand how the machine learning algorithms “Random Forest” and “Decision Trees” work behind the scenes, then this is a good book for you. Those two algorithms are commonly u… more

[ TIPS & TRICKS OF THE WEEK]

Data aids, not replaces, judgement
Data is a tool and a means to help build consensus and facilitate human decision-making, not to replace it. Analysis converts data into information; information, via context, leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start; context and intuition also play a role.

[ DATA SCIENCE Q&A]

Q: How would you define and measure the predictive power of a metric?
A: * Predictive power of a metric: the accuracy with which the metric predicts the empirically observed outcome
* It is domain-specific
* Example: in a field like manufacturing, failure rates of tools are easily observable; a metric can be trained, and its success can be measured as the deviation over time from the observed failure rates
* In information security: if the metric says an attack is coming and one should do X, did the recommendation stop the attack, or would the attack never have happened anyway?
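A minimal sketch of one way to quantify this, assuming scikit-learn is available: score how well a single metric separates the observed outcomes using ROC AUC. The numbers below are invented for illustration.

```python
from sklearn.metrics import roc_auc_score

# A hypothetical metric (e.g. a tool's vibration score) and the observed outcome
# (whether the tool actually failed within 30 days).
vibration_score = [0.2, 0.4, 0.35, 0.8, 0.9, 0.15, 0.7, 0.6]
failed_in_30d = [0, 0, 0, 1, 1, 0, 1, 0]

auc = roc_auc_score(failed_in_30d, vibration_score)
print(f"predictive power (ROC AUC): {auc:.2f}")  # 0.5 is no better than chance
```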

Source

[ VIDEO OF THE WEEK]

@EdwardBoudrot / @Optum on #DesignThinking & #DataDriven Products #FutureOfData #Podcast



[ QUOTE OF THE WEEK]

Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world. – Atul Butte, Stanford

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Eloy Sasot, News Corp



[ FACT OF THE WEEK]

Akamai analyzes 75 million events per day to better target advertisements.

Sourced from: Analytics.CLUB #WEB Newsletter

What Is Happening With Women Entrepreneurs? [Infographics]

Business-Plan-Woman

On this International Women’s Day, it is a good time to learn how women are shaping the entrepreneurial landscape. The impact is not only impressive and growing, but also built on sustained growth. In some respects, the impact equals or exceeds that of their male counterparts.

Women entrepreneurs have been on the rise for some time; more specifically, women-owned businesses grew twice as fast as men-owned businesses between 1997 and 2007, at a pace of 44% growth. If that is not a cool stat, not sure what is.

Here are a dozen interesting factoids about how women are shaping the business landscape:

  1. In 2005, there were 7 women CEOs in the Fortune 500. As of May 2011, there were 12 women CEOs in Fortune 500 companies; not many, but growing.
  2. Approximately 32% of women business owners believe that being a woman in a male-dominated industry is beneficial.
  3. The number of women-owned companies with 100 or more employees has increased at nearly twice the growth rate of all other companies.
  4. The vast majority (83%) of women business owners are personally involved in selecting and purchasing technology for their businesses.
  5. The workforces of women-owned firms show more gender equality. Women business owners overall employ a roughly balanced workforce (52% women, 48% men), while men business owners employ 38% women and 62% men, on average.
  6. 3% of all women-owned firms have revenues of $1 million or more compared with 6% of men-owned firms.
  7. Women business owners are nearly twice as likely as men business owners to intend to pass the business on to a daughter or daughters (37% vs. 19%).
  8. Between 1997 and 2002, women-owned firms increased their employment by 70,000, whereas firms owned by men lost 1 million employees.
  9. One in five firms with revenue of $1 million or more is woman-owned.
  10. Women owners of firms with $1 million or more in revenue are more likely to belong to formal business organizations, associations or networks than other women business owners (81% vs. 61%).
  11. Women-owned firms in the U.S. are more likely than all firms to offer flex-time, tuition reimbursement and, at a smaller size, profit sharing to their workers.
  12. 86% of women entrepreneurs say they use the same products and services at home that they do in their business, for familiarity and convenience.

The road is well traveled, and we have covered quite a distance. Let us keep embracing progress and breaking the glass ceiling. Finally, Happy International Women’s Day to you all!

Infographic: Women in Business
Courtesy of: CreditDonkey

Source