Three ways to help your data science team network with other big data pros


One of the most exciting ways to use big data analytics in your corporate strategy is to target other data scientists (e.g., Cloudera). I call this using big data as a core strategy as opposed to a supporting strategy, wherein analytic strategies are incorporated into traditional products and services that target a non-analytic market (e.g., Progressive).

A core strategy is exciting for your data science team because they get to build products and services for people just like them — other data scientists. This is a very sound idea that I fervently advocate.

Like attracts like
People have a natural affinity for others like them, and data scientists are no exception.

Although data science is a multidisciplinary field with tentacles in a wide range of areas, it’s the narrow intersection of those disciplines that defines it. As such, the population of true data science enthusiasts is quite small, which makes their social bonds very tight.

Two data scientists meeting for the first time can carry on a conversation for hours on subjects the vast majority of the population won’t understand, much less care about. So when the people creating your offering (your in-house team) also have the same passion and knowledge as the people consuming your offering (your customers), you have an amazing opportunity to accelerate customer loyalty.

Be intentional about setting up these meetings
These relationships are going to form no matter what, so it’s best to be intentional about how these meetings happen. Like any other group of professionals, there are several associations available for data scientists, and with the recent explosion of corporate interest in data scientists, it seems like a new one pops up every other day. Add to this trade shows, online forums, and other community events, and you have a great potential for your staff to at least casually bump into your customers, if not meet with them on a regular basis.

Wouldn’t you rather shape these interactions than leave the relationships to grow organically on their own? It makes sense to me.

Suggestions to point you in the right direction
There are several possibilities for controlling the interactions between your data scientists and your customers, and the one you choose depends on your resources and the value you place on strategic loyalty. I’m an advocate of infusing loyalty into your strategy, so I’ll always recommend that you show no reticence in pouring funds in this direction. That said, this approach isn’t for everyone, and I respect that.

Sponsored events

For those who would rather reserve the bulk of their strategic stockpile for other pursuits, I recommend at least a moderate investment in bringing your staff and your customers together with regularly sponsored events. It doesn’t take much to sponsor a regular (and fun) event where your data scientists can network with existing and potential customers. It’s also a great opportunity for you to strengthen your brand within a very vertical market.

Don’t let the informal structure of sponsored events deter you from coaching your data scientists on the necessary do’s and don’ts. It’s good to talk freely with other professionals; however, there’s a line of confidentiality that must be maintained. It’s important that you explain this to your data scientists, as you probably won’t be asking your guests to sign a non-disclosure agreement before they start eating their salad.

Strategic, legal partnerships

On the other end of the spectrum is a strategic, legal partnership; this makes sense if you have a very short list of high-value customers and/or you face fierce competition in the marketplace. Bringing your customers on as partners binds their allegiance and widens the communication channels without fear of a confidentiality breach.

You must be willing to commit a serious amount of time and resources to make this work. It defeats the purpose of structuring a formal arrangement like this only to hold a single annual get-together where very little information is exchanged.

Special projects

Another idea is special projects, which is somewhere between sponsored events and legal partnerships. Similar to a consulting arrangement, a special project has a beginning and an end and serves a specific objective. The idea is to put your data scientists and customers together as a team to accomplish a goal. The project sponsor could be you, your customer, or a third party. Confidentiality agreements are in place to promote an open exchange of ideas, but the relationship isn’t evergreen like a legal partnership. In this way, you can network and brand with a larger audience without the anxiety of trade secrets leaving your fortress.

I’ve given you three ideas for putting your data science staff and your customers together, and there are many more worth exploring. Take some time today to figure out which idea makes the most sense for your organization, and put a plan in place to make it happen.

Birds of a feather flock together; it’s your job to manage their migration path.

What is Customer Loyalty? Part 1


There seems to be a consensus among customer feedback professionals that business growth depends on improving customer loyalty. It appears, however, that there is little agreement in how they define and measure customer loyalty. In this and subsequent blog posts, I examine the concept of customer loyalty, presenting different definitions of this construct. I attempt to summarize their similarities and differences and present a definition of customer loyalty that is based on theory and practical measurement considerations.

The Different Faces of Customer Loyalty

There are many different definitions of customer loyalty. I did a search on Google using “customer loyalty definition” and found the following:

  • Esteban Kolsky proposes two models of loyalty:  emotional and intellectual. In this approach, Kolsky posits that emotional loyalty is about how the customer feels about doing business with you and your products, “loves” what you do and could not even think of doing business with anybody else. Intellectual loyalty, on the other hand, is more transactionally-based where customers must justify doing business with you rather than someone else.
  • Don Peppers talks about customer loyalty from two perspectives: attitudinal and behavioral. From Peppers’ perspective, attitudinal loyalty is no more than customer preference; behavioral loyalty, however, is concerned about actual behaviors regardless of the customers’ attitude or preference behind that behavior.
  • Bruce Temkin proposed that customer loyalty equates to willingness to consider, trust and forgive.
  • Customer Loyalty Institute states that customer loyalty is “all about attracting the right customer, getting them to buy, buy often, buy in higher quantities and bring you even more customers.”
  • Beyond Philosophy states that customer loyalty is “the result of consistently positive emotional experience, physical attribute-based satisfaction and perceived value of an experience, which includes the product or services.” From this definition, it is unclear to me if they view customer loyalty as some “thing” or rather a process.
  • Jim Novo defines customer loyalty in behavioral terms. Specifically, he states that customer loyalty, “describes the tendency of a customer to choose one business or product over another for a particular need.”

These definitions illustrate the ambiguity of the term, “customer loyalty.” Some people take an emotional/attitudinal approach to defining customer loyalty while others emphasize the behavioral aspect of customer loyalty. Still others define customer loyalty in process terms.

Emotional Loyalty

Customers can experience positive feelings about your company/brand. Kolsky uses the word “love” to describe this feeling of emotional loyalty. I think that Kolsky’s two models of customer loyalty (emotional and intellectual) are not really different types of loyalty; they simply reflect two ends of the same continuum, with the feeling of “love” for the brand at one end and “indifference” at the other.

Temkin’s model of customer loyalty is clearly emotional; he measures customer loyalty using questions about willingness to consider, trust and forgive, each representing positive feelings when someone “loves” a company.

Behavioral Loyalty

Customers can engage in positive behaviors toward the company/brand. Peppers believes what is important to companies is customer behavior, what customers do. That is, what matters to business is whether or not customers exhibit positive behaviors toward the company. Also, Novo’s definition is behavioral in nature as he emphasizes the word, “choose.” While loyalty behaviors can take different forms, they each benefit the company and brand in different ways.

Customer Loyalty as an Attribute about the Customers

To me (due perhaps to my training as a psychologist), customer loyalty is best conceptualized as an attribute of the customer. Customer loyalty is a quality, characteristic or thing about the customer that can be measured. Customers can possess high levels of loyalty or low levels of loyalty, whether it be an attitude or a behavior. While the process of managing customer relationships is important in understanding how to increase customer loyalty (Customer Loyalty Institute, Beyond Philosophy), it is different from customer loyalty itself.

Definition of Customer Loyalty

Considering the different conceptualizations of customer loyalty, I offer a definition of customer loyalty that incorporates prior definitions of customer loyalty:

Customer loyalty is the degree to which customers experience positive feelings for and exhibit positive behaviors toward a company/brand.

This definition reflects an attribute or characteristic of the customer that supports both attitudinal and behavioral components of loyalty. It is deliberately general to reflect the different positive emotions (e.g., love, willingness to forgive, trust) and behaviors (e.g., buy, buy more often, stay) that customers can exhibit.
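Because this definition treats loyalty as a measurable customer attribute with both emotional and behavioral components, one minimal way to operationalize it is as a composite of survey-item ratings. The sketch below is purely illustrative; the item names and 0–10 scales are assumptions, not a published loyalty instrument.

```python
from statistics import mean

def loyalty_score(emotional_ratings, behavioral_ratings):
    """Composite loyalty: average of the emotional (attitudinal) item mean
    and the behavioral item mean, each on an assumed 0-10 scale."""
    return mean([mean(emotional_ratings), mean(behavioral_ratings)])

# Hypothetical survey responses for one customer (0-10 scales)
emotional = [9, 8, 7]   # e.g., trust, willingness to forgive, willingness to consider
behavioral = [10, 6]    # e.g., likelihood to repurchase, likelihood to recommend
print(round(loyalty_score(emotional, behavioral), 2))  # 8.0
```

Averaging the two component means first (rather than pooling all items) weights the attitudinal and behavioral halves of the definition equally, regardless of how many items each half contains.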

In an upcoming post, I will present research on the measurement of customer loyalty that will help clarify this definition. This research helps shed light on the meaning of customer loyalty and how businesses can benefit by taking a more rigorous approach to measuring customer loyalty.


Why Cloud-native is more than software just running on someone else’s computer

The cloud is not “just someone else’s computer”, despite how fast that meme has spread across the internet. The cloud consists of extremely scalable data centers with highly optimized and automated processes. This makes a huge difference at the level of application software.

So what is “cloud-native” really?

“Cloud-native” is more than just a marketing slogan. And a “cloud-native application” is not simply a conventionally developed application which is running on “someone else’s computer”. It is designed especially for the cloud, for scalable data centers with automated processes.

Software that is truly born in the cloud (i.e. cloud-native) leads to a change in thinking and a paradigm shift on many levels. From the outset, cloud-native applications are designed with scalability in mind and are optimized for maintainability and agility.

They are based on the “continuous delivery” approach and thus lead to continuously improving applications. The time from development to deployment is reduced considerably and often takes only a few hours or even minutes. This can only be achieved with test-driven development and highly automated processes.

Rather than some sort of monolithic structure, applications are usually designed as a loosely coupled system of comparatively simple components such as microservices. Agile methods are practically always used, and the DevOps approach is more or less essential. This, in turn, raises the demands on developers, who specifically need well-founded operations knowledge.


Cloud-native = IT agility

With a “cloud-native” approach, organizations expect more agility and, especially, more flexibility and speed. Applications can be delivered faster and continuously at high levels of quality; they are also better aligned to real needs, and their time to market is much shorter. In these times of “software is eating the world”, where software is an essential factor of survival for almost all organizations, the significance of these advantages should not be underestimated.

In this context: the cloud certainly is not “just someone else’s computer”. And the “Talend Cloud” is more than just an installation from Talend that runs in the cloud. The Talend Cloud is cloud-native.

In order to achieve the highest levels of agility, in the end, it is just not possible to avoid changing over to the cloud. Potentially there could be a complete change in thinking in the direction of “serverless”, with the prospect of optimizing cost efficiency as well as agility.  As in all things enterprise technology, time will tell. But to be sure, cloud-native is an enabler on the rise.

About the author Dr. Gero Presser

Dr. Gero Presser is a co-founder and managing partner of Quinscape GmbH in Dortmund. Quinscape has positioned itself on the German market as a leading system integrator for the Talend, Jaspersoft/Spotfire, Kony and Intrexx platforms and, with their 100 members of staff, they take care of renowned customers including SMEs, large corporations and the public sector. 

Gero Presser did his doctorate in decision-making theory in the field of artificial intelligence and at Quinscape he is responsible for setting up the business field of Business Intelligence with a focus on analytics and integration.


The post Why Cloud-native is more than software just running on someone else’s computer appeared first on Talend Real-Time Open Source Data Integration Software.


Sabre Airline Solutions Gives Airline Data a Critical Upgrade

Sabre Airline Solutions (Sabre) supplies applications to airlines that enable them to manage a variety of planning tasks and strategic operations, including crew schedules, flight paths, and weight and balance for aircraft.

The challenge for Sabre was that many airlines had not implemented the proper upgrades. That meant some large customers were as many as five versions behind. And moving them to the new suite would have been a time-consuming, expensive, version-by-version process. Customers, understandably, were nervous about tackling that process.


Reducing upgrade time and costs for customers

Sabre gave Talend a deadline of two weeks to complete the migration for an important customer, and Talend surpassed the company’s expectations: the needed migrations were completed in just a matter of hours.

“To help our airline customers succeed in a very competitive industry, we need a way to migrate data more efficiently. Talend is the solution for data mobility.” – Dave Gebhart, Software Development Principal

Replicating a process to save time and money

As a result of the shorter, more cost-efficient process, Sabre can now easily replicate it. The new process reduced the cost of doing migrations by 80 percent, and it enabled Sabre to do as many as 25 upgrades in a year, whereas previously they could manage only about 10. That means Sabre more than doubled the upgrade slots they are able to serve, thanks to the benefits of using Talend.

What’s next? Sabre is currently working on a project that uses Talend for a more complex task. “We’re integrating three legacy applications, and we’re using Talend to extract data and transform it into objects that can be converted into XML service requests, which are then processed so the data can be loaded via web services into a new system,” says Gebhart. “Talend is the engine we’re using to drive this multi-step process.”
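The extract → transform → XML-service-request step Gebhart describes can be sketched in a few lines. This is a generic illustration, not Sabre’s actual code or schema (Talend itself generates such transformations from a graphical job design); the element names and record fields below are hypothetical.

```python
import xml.etree.ElementTree as ET

def record_to_service_request(record):
    """Transform one extracted legacy record into an XML service request.
    Element and field names are illustrative, not an actual airline schema."""
    req = ET.Element("ServiceRequest")
    ET.SubElement(req, "FlightId").text = str(record["flight_id"])
    ET.SubElement(req, "CrewCount").text = str(record["crew_count"])
    return ET.tostring(req, encoding="unicode")

# A record extracted from one of the legacy applications
legacy_record = {"flight_id": 1042, "crew_count": 5}
xml_payload = record_to_service_request(legacy_record)
print(xml_payload)
# <ServiceRequest><FlightId>1042</FlightId><CrewCount>5</CrewCount></ServiceRequest>
```

In the pipeline described, each such payload would then be posted to the new system’s web-service endpoint, making the XML the neutral interchange format between the three legacy applications and the target.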



The post Sabre Airline Solutions Gives Airline Data a Critical Upgrade appeared first on Talend Real-Time Open Source Data Integration Software.


10 Things Your Customers WISH You Knew About Them [Infographic]


Understanding your customers is an integral part of any successful business. It is instrumental in building a loyal customer base. Here is an infographic listing 10 research studies that reveal the things your customers WISH you knew.

10 Things Your Customers WISH You Knew About Them


How To Create Content Ideas For A New Client [Infographic]


As a content creator, you might sometimes meet with companies and brands that are looking to market their products with content. It’s one of the most efficient ways to get a message out, and people go to great lengths to craft content that can make an impact through blogs and websites. For the content creator, though, it can be difficult to put together content ideas that will satisfy an often detail-oriented client.

The ideas for good content usually come from a need-to-know moment. You might be browsing the Internet when you come across something you want to know more about. Usually there are places to find this information, but every once in a while you hit a roadblock: no single source yields the information you want. That’s usually when you get the idea of putting together a piece of content that fills in those missing pieces.

The same routine works when you have to create ideas for a client. Usually you need to ask the client a few questions before you have a clear picture of what they are looking for and what they want to promote or market.

How to Create Content Ideas for a New Client

by CopyPress.



The AI Talent Gap: Locating Global Data Science Centers

Good AI talent is hard to find. The talent pool for anyone with deep expertise in modern artificial intelligence techniques is terribly thin. More and more companies are committing to data and artificial intelligence as their differentiator. The early adopters will quickly find difficulties in determining which data science expertise meets their needs. And the AI talent? If you are not Google, Facebook, Netflix, Amazon, or Apple, good luck.

With the popularity of AI, pockets of expertise are emerging around the world. For a firm that needs AI expertise to advance its digital strategy, finding these data science hubs becomes increasingly important. In this article we look at the initiatives different countries are pushing in the race to become AI leaders and we examine existing and potential data science centers.

It seems as though every country wants to become a global AI power. With the Chinese government pledging billions of dollars in AI funding, other countries don’t want to be left behind.

In Europe, France plans to invest €1.5 billion in AI research over the next 4 years while Germany has universities joining forces with corporations such as Porsche, Bosch, and Daimler to collaborate on AI research. Even Amazon, with a contribution of €1.25 million, is collaborating in the AI efforts in Germany’s Cyber Valley around the city of Stuttgart. Not one to be left behind, the UK pledged £300 million for AI research as well.

Other countries to commit money to AI are Singapore, which committed $150 million and Canada, which not only committed $125 million, but also has large data science hubs in Toronto and Montreal. Yoshua Bengio, one of the fathers of deep learning, is from Montreal, the city with the biggest group of AI researchers in the world. Toronto has a booming tech industry that naturally attracts AI money.

Data scientists worldwide.

Examining a variety of sources shows that data science professionals are spread across the regions where we would expect them. The graphic below shows the number of members of the site Data Science Central. Since the site is in English, we expect most of its members to come from English speaking countries; however, it still gives us some insight as to which countries have higher representation.



It becomes difficult then to determine AI hubs without classifying talent by levels. One example of this is India; despite its large number of data science professionals, many of them are employed in lower-skilled roles such as data labeling and processing.

So what would be considered a data science hub? The graphic below defines a hub by the number of advanced AI professionals in the country. The countries shown here have AI talent working in companies such as Google, Baidu, Apple and Amazon. However, this omits a large group of talent that is not hired by these types of companies.



Matching the previous graph with a study conducted by Element AI, we see some commonalities, but also see some new hubs emerge. The same talent centers remain, but more countries are highlighted on the map. Element AI’s approach consisted of analyzing LinkedIn profiles, factoring in participation in conferences and publications and weighting skills highly.



As you search for AI talent, we recommend basing your search on four factors: workforce availability, cost of labor, English proficiency, and skill level. Kaggle, one of the most popular data science websites, conducted a salary survey with respondents from 171 countries. The results can be seen below.


Salaries are as expected, but show high variability. By aggregating salary data and the talent pool map, you can decide which countries suit your goals better. The EF English Proficiency Index shows which countries have the highest proficiency in English and can further weed out those that may have a strong AI presence or low cost of labor, but low English proficiency.
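Combining the factors into a single ranking can be as simple as a weighted score per country. The sketch below is a minimal illustration of that aggregation; the country figures and weights are made up for the example, not taken from the Kaggle survey or the EF index.

```python
def rank_countries(countries, weights):
    """Rank countries by a weighted sum of pre-normalized factors
    (higher is better; all figures below are illustrative)."""
    def score(country):
        return sum(weights[k] * country[k] for k in weights)
    return sorted(countries, key=score, reverse=True)

# Factors pre-normalized to 0-1: talent availability, affordability
# (inverse of salary cost), and English proficiency
countries = [
    {"name": "A", "talent": 0.9, "affordability": 0.3, "english": 0.95},
    {"name": "B", "talent": 0.6, "affordability": 0.8, "english": 0.55},
    {"name": "C", "talent": 0.4, "affordability": 0.9, "english": 0.80},
]
weights = {"talent": 0.5, "affordability": 0.3, "english": 0.2}
print([c["name"] for c in rank_countries(countries, weights)])  # ['A', 'B', 'C']
```

Adjusting the weights is how you encode your priorities: a startup optimizing for cost would raise the affordability weight, while a research lab would raise the talent weight.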

In the end, you want to hire professionals who understand the problems you are facing and can tailor their work to your specific needs. With a global mindset, companies can mitigate talent scarcity. If you are considering sourcing talent globally, we recommend hiring strong leadership locally to act as AI product managers. Then hire production managers located on-site with your global talent; they can oversee any data science or AI development and report back to the product manager. KUNGFU.AI will continue to study these global trends and help ensure companies are equipped with access to the best talent to meet their needs.


When Worlds Collide: Blockchain and Master Data Management

Master Data Management (MDM) is an approach to the management of golden records that has been around for over a decade, only recently finding a growth spurt as some organizations exceed pain thresholds in the management of common data. Blockchain has a slightly shorter history, arriving with bitcoin, but is also seeing its revolution these days as data gets distributed far and wide and trust takes center stage in business relationships.

Volumes could be written about each on its own, and given that most organizations still have a way to go with each discipline, that might be appropriate. However, good ideas wait for no one and today’s idea is MDM on Blockchain.

Thinking back over our MDM implementations over the years, it is easy to see the data distribution network becoming wider. As a matter of fact, master data distribution is now usually the most time-intensive and unwieldy part of an MDM implementation. The blockchain removes overhead, costs and unreliability from authenticated peer-to-peer network partner transactions involving data exchange. It can support one of the big challenges of MDM with governed, bi-directional synchronization of master data between the blockchain and enterprise MDM.

Another core MDM challenge is arriving at the “single version of the truth”. It’s elusive even with MDM because everyone must tacitly agree to the process used to instantiate the data in the first place. While many MDM practitioners go to great lengths to utilize the data rules from a data governance process, it is still a process subject to criticism. The consensus that blockchain can achieve is a governance proxy for that elusive “single version of the truth” by achieving group consensus for trust as well as full lineage of data.

Blockchain enables the major components and tackles the major challenges in MDM.

Blockchain provides a distributed database, as opposed to a centralized hub, that can store data that is certified, and for perpetuity. By storing timestamped and linked blocks, the blockchain is unalterable and permanent. Though not yet suited to low-latency transactions, transactions involving master data, such as financial settlements, are ideal for blockchain and can be sped up by an order of magnitude since blockchain removes the friction from a normal process.
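The “timestamped and linked blocks” property is what makes the ledger tamper-evident, and it can be shown in a few lines. This is a deliberately simplified sketch (no consensus protocol, no Merkle trees, and the master-data payloads are hypothetical), but it captures why altering an earlier block invalidates every block after it.

```python
import hashlib
import json
import time

def make_block(data, prev_hash):
    """Each block embeds the hash of its predecessor; changing any earlier
    block changes every later hash, making tampering evident."""
    block = {"timestamp": time.time(), "data": data, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()
    ).hexdigest()
    return block

# Two master-data records chained together (payloads are illustrative)
genesis = make_block({"customer_id": 1, "name": "Acme Corp"}, prev_hash="0" * 64)
update = make_block({"customer_id": 1, "name": "Acme Corporation"}, genesis["hash"])

# Verify the link: recomputing the genesis hash must match what 'update' references
body = dict(genesis)
stored_hash = body.pop("hash")
recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
print(recomputed == stored_hash == update["prev_hash"])  # True
```

In a real deployment the same linkage also gives the full lineage mentioned above: walking the `prev_hash` pointers back from any block reproduces the complete history of a master record.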

Blockchain uses pre-defined rules that act as gatekeepers of data quality and governs the way in which data is utilized. Blockchains can be deployed publicly (like bitcoin) or internally (like an implementation of Hyperledger). There could be a blockchain per subject area (like customer or product) in the implementation. MDM will begin by utilizing these internal blockchain networks, also known as Distributed Ledger Technology, though utilization of public blockchains are inevitable.

A shared master data ledger beyond company boundaries can, for example, contain common and agreed master data including public company information and common contract clauses with only counterparties able to see the content and destination of the communication.

Hyperledger is quickly becoming the standard for open source blockchain. Hyperledger is hosted by The Linux Foundation. IBM, with the Hyperledger Fabric, is establishing the framework for blockchain in the enterprise. Supporting master data management with a programmable interface for confidential transactions over a permissioned network is becoming a key inflection point for blockchain and Hyperledger.

Data management is about right data at the right time, and master data is fundamental to great data management, which is why centralized approaches like the discipline of master data management have taken center stage. MDM can utilize the blockchain for distribution and governance, and blockchain can clearly utilize the great master data produced by MDM. Blockchain data needs data governance like any data; in fact, it needs it more, given its importance on the network.

MDM and blockchain are now intertwined. Blockchain enables the key components of establishing and distributing the single version of the truth of data. It delivers trusted, governed data, integrates that data across broad networks, prevents duplication, and provides data lineage.

It will start in MDM in niches that demand these traits such as financial, insurance and government data. You can get to know the customer better with native fuzzy search and matching in the blockchain. You can track provenance, ownership, relationship and lineage of assets, do trade/channel finance and post-trade reconciliation/settlement.

Blockchain is now a disruption vector for MDM. MDM vendors need to be at least blockchain-aware today, creating the ability for blockchain integration in the near future, such as what IBM InfoSphere Master Data Management is doing this year. Others will lose ground.


The Paradigmatic Shift of Data Fabrics: Connecting to Big Data

The notion of a data fabric encompasses several fundamental aspects of contemporary data management. It provides a singular means of data modeling, a centralized method of enforcing data governance, and an inimitable propensity for facilitating data discovery.

Its overarching significance to the enterprise, however, supersedes these individual facets of managing data. At its core, the data fabric tenet effectively signifies a transition in the way data is accessed and, to a lesser extent, even deployed.

According to Denodo Chief Marketing Officer Ravi Shankar, in a big data driven world with increasing infrastructure costs and regulatory repercussions, it has become all but necessary: “to be able to connect to the data instead of collect the data.”

Incorporating various facets of data lakes, virtualization technologies, data preparation, and data ingestion tools, a unified data fabric enables organizations to access data from any variety of locations including on-premise, in the cloud, from remote devices, or from traditional ones. Moreover, it does so in a manner which reinforces traditional centralization benefits such as consistency, uniformity, and oversight vital to regulatory compliance and data governance.

“The world is going connected,” Shankar noted. “There’s connected cars, connected devices. All of these are generating a lot of different data that is all being used to analyze information.”

Analyzing that data in times commensurate with consumer expectations—and with the self-service reporting tools that business users have increasingly become accustomed to—is possible today with a data fabric. According to Cambridge Semantics Chief Technology Officer Sean Martin, when properly implemented a data fabric facilitates “exposing all of the processing and information assets of the business to some sort of portal that has a way of exchanging data between the different sources”, which effectively de-silos the enterprise in the process.

Heterogeneous Data, Single Repository
The quintessential driver for the emergence of the enterprise data fabric concept is the ubiquity of big data and its multiple manifestations. The amounts and diversity of the types of data ingested test the limits of traditional data warehousing methods, which were not explicitly designed to account for the challenges of big data. Instead, organizations began turning to the cloud more and more frequently, while options such as Hadoop (renowned for its cheap storage) became increasingly viable. Consequently, “companies have moved away from a single consuming model in the sense that it used to be standardized for [platforms such as] BusinessObjects,” Shankar explained. “Now with the oncoming of Tableau and QlikView, there are multiple different reporting solutions through the use of the cloud. IT now wants to provide an independence to use any reporting tool.” The freedom to select the tool of choice for data analysis largely hinges on the modeling benefits of a data fabric, which helps to “connect to all the different sources,” Shankar stated. “It could be data warehousing, which many people have. It could be a big data system, cloud systems, and also other on-premises systems. The data fabric stitches all of these things together into a virtual repository and makes it available to the consumers.”

Data Modeling
From a data modeling perspective, a data fabric helps to reconcile the individual semantics involved with proprietary tools accessed through the cloud. Virtually all platforms accessed through the cloud (and many on-premises ones) have respective semantics and taxonomies which can quickly lead to vendor lock-in. “QlikView, Tableau, BusinessObjects, Cognos, all of these have semantic models that cater to their applications,” Shankar said. “Now, if you want to report with all these different forms you have to create different semantic models.” The alternative is to use the virtualization capabilities of a data fabric for effectively “unifying the semantic models within the data fabric,” Shankar said.

One of the principal advantages of this approach is to do so with semantics tailored for an organization’s own business needs, as opposed to those of a particular application. What Shankar referred to as the “high level logical data model” of a data fabric provides a single place for definitions, terms, and mapping which is applicable across the enterprise’s data. Subsequently, the individual semantic models of application tools are used in conjunction with that overlying logical business model, which provides the basis for the interchange of tools, data types, and data sources. “When the data’s in a data store it’s usually in a pretty obscure form,” Martin commented. “If you want to use it universally to make it available across your enterprise you need to convert that into a form that makes it meaningful. Typically the way we do that is by mapping it to an ontology.”
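The mapping Martin describes can be sketched in a few lines. Everything below is hypothetical for illustration only: the source names, field names, and logical model are invented, but the shape shows how per-tool semantics collapse into one shared business model.

```python
# Illustrative sketch: mapping tool-specific field names onto one shared
# logical model, so any reporting tool can query the same business terms.
# All names (sources, fields) are invented, not from any real product.

LOGICAL_MODEL = {"customer_id", "order_total", "order_date"}

# Each source exposes its own semantics; the fabric keeps one mapping per source.
SEMANTIC_MAPPINGS = {
    "warehouse": {"CUST_ID": "customer_id", "TOTAL_AMT": "order_total", "ORD_DT": "order_date"},
    "cloud_crm": {"customerId": "customer_id", "amount": "order_total", "placedAt": "order_date"},
}

def to_logical(source: str, record: dict) -> dict:
    """Translate a source-native record into the shared logical model."""
    mapping = SEMANTIC_MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

row = to_logical("warehouse", {"CUST_ID": 42, "TOTAL_AMT": 99.5, "ORD_DT": "2017-03-01"})
```

The point of the sketch is that a consumer asking for `customer_id` never needs to know whether the answer came from the warehouse or the cloud CRM.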

Data Governance
The defining characteristic of a data fabric is the aforementioned virtual repository for all data sources, which is one of the ways in which it builds upon the data lake concept. In addition to the uniform modeling it enables, it also supplies a single place in which to store the necessary metadata for all sources and data types. That metadata, in turn, is one of the main ways users can drive intelligent data discovery and search. “Since this single virtual repository actually stores all of this metadata information, the data fabric has evolved to support other functions like data discovery and search because this is one place where you can see all the enterprise data,” Shankar observed. Another benefit is the enhanced governance and security of this centralized approach, since both the metadata and the actions derived from the data are stored in one place.
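The discovery-and-search function Shankar describes rests on that central metadata store. A minimal sketch, with entirely invented catalog entries, shows why one place for metadata makes enterprise-wide search trivial:

```python
# Illustrative sketch of the fabric's central metadata catalog enabling
# data discovery: one catalog records what each source holds, so a single
# search spans every source. All entries are invented for illustration.

CATALOG = [
    {"source": "warehouse", "dataset": "orders", "columns": ["customer_id", "order_total"]},
    {"source": "hadoop", "dataset": "clickstream", "columns": ["session_id", "url"]},
]

def discover(column: str) -> list:
    """Find every dataset, in any source, that exposes a given column."""
    return [entry for entry in CATALOG if column in entry["columns"]]

hits = discover("customer_id")
```

Without the central catalog, the same search would mean interrogating each source system separately.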

“The data fabric stores what we call metadata information,” Shankar said. “It stores information about the data, where to go find it, what type of data, what type of association and so on. It contains a bridge of the data.” This information is invaluable for determining data lineage, which becomes pivotal for effecting regulatory compliance. It can also function as a means of implementing role-based access to data “at the data fabric layer,” Shankar commented. “Since you check the source systems directly, if it comes through the data fabric it will make sure it only gives you the data you have access to.” Mapping the data to requisite business glossaries helps to buttress the enterprise-wide definitions and usage of terminology which are hallmarks of effective governance.
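Role-based access “at the data fabric layer,” as Shankar puts it, can be sketched as a filter the fabric applies before any data reaches the consumer. The roles and column names below are hypothetical:

```python
# Illustrative sketch of role-based access enforced at the fabric layer:
# the fabric consults its access metadata and returns only permitted
# columns. Roles and column names are invented for illustration.

COLUMN_ACL = {
    "customer_id": {"analyst", "auditor"},
    "order_total": {"analyst"},
    "ssn": {"auditor"},
}

def fetch(record: dict, role: str) -> dict:
    """Return only the fields the caller's role is allowed to see."""
    return {k: v for k, v in record.items() if role in COLUMN_ACL.get(k, set())}

visible = fetch({"customer_id": 1, "order_total": 50.0, "ssn": "xxx"}, "analyst")
```

Because every query passes through the fabric, the policy lives in one place instead of being re-implemented per source system.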

Data Preparation
The data fabric is also a critical means of implementing data preparation quickly and relatively painlessly—particularly when compared to conventional ETL methods. According to Shankar: “Connecting to the data is much easier than collecting, since collecting requires moving the data, replicating it, and transforming it, all of which takes time.” Interestingly enough, those temporal benefits also translate into advantages for resources. Shankar estimated that for every four IT personnel required to enact ETL, only one person is needed to connect data with virtualization technologies. These temporal and resource advantages naturally translate to a greater sense of agility, which is critical for swiftly incorporating new data sources and satisfying customers in the age of real-time. In this regard, the business value of a data fabric directly relates to the abstraction capabilities of its virtualization technologies. According to Martin, “You start to build those abstractions that give you an agility with your data and your processing. Right now, how easy is it for you to move all your things from Amazon [Web Services] to Google? It’s a big effort. How about if the enterprise data fabric was pretty much doing that for you? That’s a value to you; you don’t get locked in with any particular infrastructure provider, so you get better pricing.”
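Shankar's connect-versus-collect distinction can be made concrete with a toy virtual repository: registering a source stores only a handle, and data is pulled at query time rather than replicated up front. Class and source names here are invented for illustration:

```python
# Illustrative sketch contrasting "connecting" with "collecting": the
# virtual repository holds callables that fetch on demand, so nothing is
# moved or replicated until a query runs. Names are invented.

class VirtualRepository:
    def __init__(self):
        self._sources = {}

    def register(self, name, fetcher):
        # "Connecting": store a handle to the source; move no data.
        self._sources[name] = fetcher

    def query(self, name):
        # Data is pulled from the source only at query time.
        return self._sources[name]()

repo = VirtualRepository()
repo.register("warehouse", lambda: [{"sku": "A1", "qty": 3}])
repo.register("cloud_app", lambda: [{"sku": "B2", "qty": 7}])

rows = [r for name in ("warehouse", "cloud_app") for r in repo.query(name)]
```

The ETL alternative would copy both sources into a staging area before the first query could run, which is exactly the time and staffing cost Shankar's estimate captures.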

Source: The Paradigmatic Shift of Data Fabrics: Connecting to Big Data by jelaniharper

The Future Of Big Data Looks Like Streaming

Big data is big news, but it’s still in its infancy. While most enterprises at least talk about launching Big Data projects, the reality is that very few do in any significant way. In fact, according to new survey data from Dimensional, while 91% of corporate data professionals have considered investment in Big Data, only 5% actually put any investment into a deployment, and only 11% even had a pilot in place.


Real Time Gets Real

ReadWrite: Hadoop has been all about batch processing, but the new world of streaming analytics is all about real time and involves a different stack of technologies.

Langseth: Yes, though I would not conflate the concepts of real time and streaming. Real-time data is obviously best handled as a stream. But it’s possible to stream historical data as well, just as your DVR can stream Gone with the Wind or last week’s American Idol to your TV.

This distinction is important, as we at Zoomdata believe that analyzing data as a stream adds huge scalability and flexibility benefits, regardless of whether the data is real-time or historical.

RW: So what are the components of this new stack? And how is this new big data stack impacting enterprise plans?

JL: The new stack is in some ways an extension of the old stack, and in some ways really new.

Data has always started its life as a stream. A stream of transactions in a point of sale system. A stream of stocks being bought and sold. A stream of agricultural goods being traded for valuable metals in Mesopotamia.

Traditional ETL processes would batch that data up and kill its stream nature. They did so because the data could not be transported as a stream; it had to be loaded onto removable disks and tapes to be transported from place to place.

But now it is possible to take streams from their sources, through any enrichment or transformation processes, through analytical systems, and into the data’s “final resting place”—all as a stream. There is no real need to batch up data given today’s modern architectures such as Kafka and Kinesis, modern data stores such as MongoDB, Cassandra, HBase, and DynamoDB (which can accept and store data as a stream), and modern business intelligence tools like the ones we make at Zoomdata, which can process and visualize these streams as well as historical data in a very seamless way.
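The end-to-end stream Langseth describes can be sketched with plain Python generators: historical and live events flow through the same enrichment step with no batch stage in between. Here `itertools.chain` stands in for a transport like Kafka, and all event fields and the tax rate are invented for illustration:

```python
# Illustrative sketch of an end-to-end stream: source -> enrichment ->
# sink, with no batch step. chain() stands in for a stream transport
# such as Kafka; the events and the 1.2 multiplier are invented.

from itertools import chain

def historical():
    # Replayed from a data store, but still delivered as a stream.
    yield from [{"price": 10}, {"price": 12}]

def live():
    # Arriving in real time.
    yield from [{"price": 11}]

def enrich(events):
    # Transformation applied element by element, never to a batch.
    for e in events:
        yield {**e, "taxed": round(e["price"] * 1.2, 2)}

# Historical and real-time data flow through one identical pipeline.
sink = list(enrich(chain(historical(), live())))
```

Nothing in `enrich` knows or cares whether an event is five years old or five milliseconds old, which is the DVR analogy in code.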

Just like your home DVR can play live TV, rewind a few minutes or hours, or play movies from last century, the same is possible with data analysis tools like Zoomdata that treat time as a fluid.

Throw That Batch In The Stream

We also believe that those who have proposed a “Lambda Architecture,” effectively separating paths for real-time and batched data, are espousing an unnecessary trade-off, optimized for legacy tooling that simply wasn’t engineered to handle streams of data, be they historical or real-time.

At Zoomdata we believe it is not necessary to put real-time and historical data on separate tracks, as there is now end-to-end tooling that can handle both, from sourcing to transport to storage to analysis and visualization.

RW: So this shift toward streaming data is real, and not hype?

JL: It’s real. It’s affecting modern deployments right now, as architects realize that it isn’t necessary to ever batch up data, at all, if it can be handled as a stream end-to-end. This massively simplifies Big Data architectures if you don’t need to worry about batch windows, recovering from batch process failures, etc.

So again, even if you don’t need to analyze data from five seconds or even five minutes ago to make business decisions, it still may be simplest and easiest to handle the data as a stream. This is a radical departure from the way things in big data have been done before, as Hadoop encouraged batch thinking.

But it is much easier to just handle data as a stream, even if you don’t care at all—or perhaps not yet—about real-time analysis.

RW: So is streaming analytics what Big Data really means?

JL: Yes. Data is just like water, or electricity. You can put water in bottles, or electricity in batteries, and ship them around the world by planes, trains, and automobiles. For some liquids, such as Dom Perignon, this makes sense. For other liquids, and for electricity, it makes sense to deliver them as a stream through wires or pipes. It’s simply more efficient if you don’t need to worry about batching it up and dealing with it in batches.

Data is very similar. It’s easier to stream big data end-to-end than it is to bottle it up.


Source: The Future Of Big Data Looks Like Streaming