Data virtualization's considerable influence throughout the data sphere is inextricably linked to its pervasive utility. It is at once the forefather of the current attention surrounding data preparation, the precursor to the data lake phenomenon, and one of the principal enablers of the abundant volumes and varieties of big data in its raw form.
Nevertheless, its enduring reputation as a means of enhancing business intelligence and analytics persists, and is possibly the most compelling, cost-saving, and time-saving application of its widespread utility.
"It brings together related pieces of information sitting in different data stores, and then provides that to an analytical platform for them to view or make some analysis for reporting," explained Ravi Shankar, Denodo Chief Marketing Officer. "Or it could be used to accomplish some use case such as a customer service rep that is looking at all the customer information, all the products they have bought, and how many affect them."
Data virtualization technologies provide an abstraction layer over multiple sources, structures, types, and locations of data, which becomes accessible through a single platform for any purpose. The data themselves do not actually move, but users can access them from the same place. The foundation of virtualization rests upon its ability to integrate data for a single, or even multiple, applications. Although there is no shortage of use cases for such singular, seamless integration—which, in modern or what Shankar calls "third generation" platforms involves a uniform semantic layer to rectify disparities in modeling and meaning—the benefits to BI users are among the most immediately discernible.
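The abstraction-layer idea can be illustrated with a minimal sketch: a "virtual view" that joins related records from two sources on demand, so nothing is replicated into a separate store. The source names, fields, and join key below are illustrative assumptions, not any vendor's actual schema.

```python
# Hypothetical records living in two separate systems (assumed for illustration).
crm_orders = [  # imagine this sits in a CRM database
    {"customer_id": 1, "product": "Widget"},
    {"customer_id": 2, "product": "Gadget"},
]
billing = [  # imagine this sits in a separate billing system
    {"customer_id": 1, "balance": 120.0},
    {"customer_id": 2, "balance": 0.0},
]

def virtual_customer_view(customer_id):
    """Federate related records at query time; no data is copied or stored."""
    products = [o["product"] for o in crm_orders if o["customer_id"] == customer_id]
    balance = next(
        (b["balance"] for b in billing if b["customer_id"] == customer_id), None
    )
    return {"customer_id": customer_id, "products": products, "balance": balance}
```

A BI tool pointed at `virtual_customer_view` sees one coherent record per customer, even though the underlying pieces never left their source systems.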
Typical BI processing of various data from different stores and locations requires mapping each store to an external one, and replicating data (which is usually regarded as a bottleneck) to that destination. The transformation for ETL involves the generation of copious amounts of code, which is further slowed by a lengthy testing period. "This could take up to three months for an organization to do this work, especially for the volume of big data involved in some projects," Shankar noted.
Virtualization's Business Model
By working with what Shankar termed a "business model" associated with data virtualization, users simply drag and drop the requisite data into a user interface before publishing it to conventional BI or analytics tools. There is much less code involved with the process, which translates into decreased time to insight and less manpower. "What takes three months with a data integration ETL tool might take a week with data virtualization," Shankar mentioned. "Usually the comparison is between 1 to 4 and 1 to 6." That same ratio applies to IT people working on the ETL process versus a solitary data analyst leveraging virtualization technologies. "So you save on the cost and you save on the time with data virtualization," Shankar said.
Beyond Prep, Before Data Lakes
Those cost and time reductions are worthy of examination in multiple ways. Not only do they apply to the data preparation process of integrating and transforming data, but they also affect the front offices that can leverage data more quickly and cheaply than standard BI paradigms allow. Since data are both stored and accessed in their native forms, data virtualization technologies represent the first rudimentary attempts at the data lake concept, in which all data are stored in their native formats. This sort of aggregation is ideal for use with big data and its plentiful forms and structures, enabling organizations to analyze it with traditional BI tools. Still, such integration becomes more troublesome than valuable without common semantics. "The source systems, which are databases, have a very technical way of defining semantics," Shankar mentioned. "But for the business user, you need to have a semantic layer."
Before BI: Semantics and Preparation
A critical aspect of the integration that data virtualization provides is its designation of a unified semantic layer, which Shankar states is essential to "the transformation from a technical to a business level" of understanding the data's significance. Semantic consistency is invaluable to ensuring successful integration and standardized meaning of terms and definitions. Traditional BI mechanisms require ETL tools and separate data quality measures to harmonize such semantics. However, this pivotal step is frequently complicated in some of the leading BI platforms on the market, which account for semantics in multiple layers.
This complexity is amplified by the implementation of multiple tools and use cases across the enterprise. Virtualization platforms address this requisite by provisioning a central location for common semantics that are applicable to the plethora of uses and platforms that organizations have. "What customers are doing now is centralizing their semantic layer and definitions within the data virtualization layer itself," Shankar remarked. "So they don't have to duplicate that within any of the tools."
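A centralized semantic layer can be sketched as a single mapping from technical column names to business terms, reused by every downstream tool rather than duplicated in each one. The column names and business terms below are hypothetical examples, not a real schema.

```python
# One shared semantic mapping, assumed to live in the virtualization layer.
SEMANTIC_LAYER = {
    "cust_nm": "Customer Name",
    "acct_bal_amt": "Account Balance",
    "ord_dt": "Order Date",
}

def to_business_terms(record):
    """Translate a technical record into business-friendly field names.

    Unknown columns pass through unchanged rather than failing.
    """
    return {SEMANTIC_LAYER.get(key, key): value for key, value in record.items()}

row = {"cust_nm": "Acme Corp", "acct_bal_amt": 250.0}
print(to_business_terms(row))
# {'Customer Name': 'Acme Corp', 'Account Balance': 250.0}
```

Because every BI or analytics tool consults the same mapping, a change to a definition is made once, in one place.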
Governance and Security
The lack of governance and security that can conceptually hamper data lakes—turning them into proverbial data swamps—does not exist with data virtualization platforms. There are multiple ways in which these technologies account for governance. First, they enforce the same access controls as the source system at the virtualization layer. "If I'm going to Salesforce.com within my own company, I can see the sales opportunities but someone else in marketing who's below me cannot see those sales opportunities," Shankar said. "They can see what else there is of the marketing. If you have that level of security already set up, then the data virtualization will be able to block you from being able to see that information."
This security measure is augmented (or possibly even superseded) by leveraging virtualization as a form of single sign-on, whereby users can no longer directly access an application but instead have to go through the virtualization layer first. In this case, "the data virtualization layer becomes the layer where we will do the authorization and authentication for all of the source systems," Shankar said. "That way all the security policies are governed in one central place and you don't have to program them for each of the separate applications."
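Centralizing authorization at the virtualization layer can be sketched as a single policy table consulted before any source system is reached, so rules are defined once rather than re-programmed per application. The roles, source names, and fields below are illustrative assumptions.

```python
# Hypothetical policy table held at the virtualization layer:
# (role, source) -> set of fields that role may read.
POLICIES = {
    ("sales", "salesforce"): {"opportunities", "accounts"},
    ("marketing", "salesforce"): {"campaigns"},
}

def fetch(role, source, field, get_from_source):
    """Single enforcement point: block the request unless policy allows it.

    get_from_source is a stand-in for however the platform actually
    reaches the underlying system.
    """
    if field not in POLICIES.get((role, source), set()):
        raise PermissionError(f"{role} may not read {source}.{field}")
    return get_from_source(source, field)

# A marketing user is stopped at the virtual layer, never reaching Salesforce:
try:
    fetch("marketing", "salesforce", "opportunities", lambda s, f: "...")
except PermissionError as err:
    print(err)
```

Because every request funnels through `fetch`, changing a policy updates behavior for all downstream tools at once, mirroring Shankar's point about governing security policies in one central place.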
The benefits that data virtualization produces easily extend beyond business intelligence. Still, the more efficient and expedient analytical insight virtualization technologies beget can revamp almost any BI deployment. Furthermore, virtualization does so in a manner that reinforces security and governance, while helping to further the overarching self-service movement within the data sphere. With both cloud and on-premises options available, it helps to significantly simplify integration and many of the essential facets of data preparation that make analytics possible.