November 21, 2016 Health and Biotech analytics news roundup

Here’s the latest in health and biotech analytics, in particular some new partnerships between academia and industry:

Analytical Booster Platform to Deliver “Smarter Healthcare”: THB (Technology, Healthcare, Big Data Analytics) is targeting the Indian healthcare system. They aim to give providers the “right information and tools at the right time.”

Pitt, Pfizer team up on health data analytics: The one-year partnership will use public and private data to find relationships between brain disease, brain imaging, and genetic markers.

Broad Institute Teams Up With Intel To Integrate Genomic Data From Diverse Sources And Enhance Genomic Data Analytic Capabilities: The Intel-Broad Center for Genomic Data Engineering will seek to optimize tools to be used on Intel-based computational platforms. They will also seek to enable collaborations through common workflow models.

UC San Francisco and GE Healthcare Launch Deep Learning Partnership to Advance Care Globally: They will be developing deep learning algorithms to help with many facets of healthcare problem solving, like determining what requires normal care and what requires quick intervention.

Source: November 21, 2016 Health and Biotech analytics news roundup

Innovating at The Power of Incubation

Having worked with corporate innovation and watched innovations evolve across different sectors, I have a clearer picture of what an innovation cycle entails. Some companies do it better than others, but most spend a lot of money on innovation with little visibility into the outcomes and returns. And in large corporates, a strong sense of bias still exists at all levels that can taint a disruptive idea before it is ever executed well. So, how do we fix that?
Organize an incubator to disrupt your own business. Wait, don’t panic; let me explain how it can really help a Fortune company stay in business for a sustainable, foreseeable future. Incubators are the next level of the business case competition: more rigorous, longer in duration, and more effective. In an incubator, you don’t just get the next big idea for the company; you also get the one that can be executed in the most effective way.

Here are my five reasons why it is relevant:

1. It surfaces opportunities that were not visible to your focused eyes: Startup entrepreneurs have an open mind and will try out the most amazing and complex projects. They also don’t face the restrictions (legal, bureaucracy, brand impact, accountability to shareholders) that large corporates have, restrictions that pose a big hurdle to innovation. Startups are also free from the many biases that plague large corporations. Hence, it is much easier and faster to innovate and try out new things in a lean manner in a bureaucracy-free startup environment. Incubators bring out the best of both worlds: the best minds compete to bring the best products to market, while the corporate provides the necessary support and its culture is not able to taint the ideas.

2. Stay close to the ideas that could disrupt your market and sleep better: What Salesforce did to Oracle, and what Amazon did to retail stores, is not something you would look forward to. So, if you are not keeping your eyes and ears open for the next disruption, you might miss the very last boat that could keep you afloat. Incubators can act as a breeding ground for disruptive ideas, not just for your current market landscape but also for things that might change the marketplace for your goods for good. They provide an opportunity to grow in your own territory as well as in neighboring ones while investing limited resources.

3. An opportunity to hire entrepreneurs, who rarely show up in the HR resume pile: Incubators can be a great place to spot talent, especially people who are motivated to endure the land of the unknown and can make things happen. It is HR’s dream to hire the 20 percent who lift 80 percent of the company and take it into the green. I have been in numerous roles and seen a variable talent pool; true entrepreneurs always stand out, hustling and giving 110 percent of their heart and soul to make things happen. Money does not motivate them, but creating something useful does. Every company craves people who think outside the box and are lean, fast, and ambitious enough to make things happen. This is the talent every organization needs for sustainability, and especially for fostering innovation and success.

4. Keeps you current, sexy and relevant: Think of Larry Ellison dismissing cloud and big data as fluff, Steve Ballmer laughing at the iPhone, or BlackBerry’s tumble. Every big conglomerate lives in its own bubble, with a limited window onto the world it operates in. Reality distortion holds for almost all big companies: the bigger the size, the thicker the lens and the poorer the vision. Startups tend to stay current and act on the latest and greatest methodologies that exist today. Big companies get that know-how almost for free when they are associated with startups; they become part of what and how the world is changing and the roles startups play in it, so they can adapt their practices and stay current. Companies don’t need to invest millions to stay current; sometimes all it takes is thousands to make the difference.

5. Good karma points, positive PR and strong brand building: Last but not least, incubators can really help the brand image of a large corporate, as startups are considered interesting, sexy and young. They attract youth and early adopters, and the press and media want to find and write about the next big idea in the industry. So, being attached to an incubator and its startups gets you good media coverage and publicity and creates brand awareness. It also creates positive vibes among existing customers, reinforces their support for the brand, and attracts new ones.

Many large corporates have leveraged incubation as a technique to get an edge on innovation in their industries, among them Pepsi, GE, Nike, and Microsoft. All of these companies are pioneers in their respective fields and have leveraged and profited from their involvement with incubators and startups.

What should we do?

No, you don’t have to get into the incubation business; there are plenty of incubators out there. Find one and partner with it to get going on the road to fixing some of the innovation gaps that could not be fixed by data innovation alone. Yes, big data innovation is still super relevant, and yes, incubators can help you innovate as well. So there is more than one easy, cost-effective, and optimal way for big enterprises to innovate.

Here is a quick video by Christie Hefner on designing a corporate culture that is open to all ideas.

Source

Creating Great Choices to Enable #FutureOfWork by @JenniferRiel #JobsOfFuture #Podcast

 

In this podcast Jennifer Riel (@JenniferRiel) sat with Vishal (@Vishaltx from @AnalyticsWeek) to discuss her book “Creating Great Choices: A Leader’s Guide to Integrative Thinking”. She sheds light on the importance of integrative thinking in generating long-lasting solutions. She shared some of the innovative ways businesses can approach creative problem solving that prevents bias and isolation and brings diversity of opinion. Jennifer also spoke about the challenges that tribalism poses to the quality of decision making. This conversation and her book are great for anyone looking to create a future-proof organization that makes measured decisions for effective outcomes.

Her Book Link:
Creating Great Choices: A Leader’s Guide to Integrative Thinking by Jennifer Riel (Author), Roger L. Martin (Author) https://amzn.to/2JGeljS

Jennifer’s Recommended Read:
Pride and Prejudice by Jane Austen and Tony Tanner https://amzn.to/2MbHkeb
Thinking, Fast and Slow by Daniel Kahneman https://amzn.to/2sNzgbt
The Righteous Mind: Why Good People Are Divided by Politics and Religion by Jonathan Haidt https://amzn.to/2xUZFZD
Give and Take: Why Helping Others Drives Our Success by Adam M. Grant Ph.D. https://amzn.to/2xYtWHa

Podcast Link:
iTunes: http://math.im/jofitunes
GooglePlay: http://math.im/jofgplay

Here is Jennifer’s Bio:
Jennifer Riel is an adjunct professor at the Rotman School of Management, University of Toronto, specializing in creative problem solving. Her focus is on helping everyone, from undergraduate students to business executives, to create better choices, more of the time.

Jennifer is the co-author of Creating Great Choices: A Leader’s Guide to Integrative Thinking (with Roger L. Martin, former Dean of the Rotman School of Management). Based on a decade of teaching and practice with integrative thinking, the book lays out a practical methodology for tackling our most vexing business problems. Using illustrations from organizations like LEGO, Vanguard and Unilever, the book shows how individuals can leverage the tension of opposing ideas to create a third, better way forward.

An award-winning teacher, Jennifer leads training on integrative thinking, strategy and innovation at organizations of all types, from small non-profits to some of the largest companies in the world.

About #Podcast:
#JobsOfFuture podcast is a conversation starter to bring leaders, influencers and lead practitioners to come on show and discuss their journey in creating the work, worker and workplace of the future.

Want to sponsor?
Email us @ info@analyticsweek.com

Keywords:
#JobsOfFuture
JobsOfFuture
Jobs of future
Future of work
Leadership
Strategy

Source: Creating Great Choices to Enable #FutureOfWork by @JenniferRiel #JobsOfFuture #Podcast by v1shal

CMOs’ Journey from Big Data to Big Profits (Infographic)

The consumer purchase funnel is generating enormous amounts of data, and it has become extremely difficult to track and make sense of it as consumers add social media and mobile channels to their decision-making. This fuels the ever-mounting pressure on CMOs to show how their budgets deliver incremental business value.

Better data management is turning out to be a strong competitive edge and a great value-generation tool for organizations, so a well-managed marketing organization will make good use of its data.

This has pushed many marketers overwhelmingly towards better big data analytics, which will become a major component of their business over the next several years. According to the Teradata Data Driven Marketing Survey 2013, released by Teradata earlier this year, 71 percent of marketers say they plan to implement big data analytics within the next two years.

Marketers already rely on a number of common and easily accessible forms of data to drive their marketing initiatives—customer service data, customer satisfaction data, digital interaction data and demographic data. But true data-driven marketing takes it to the next level: Marketers need to collect and analyze massive amounts of complicated, unstructured data that combines the traditional data their companies have collected with interaction data (e.g., data pulled from social media), integrating both online and offline data sources to create a single view of their customer.
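For illustration only, here is a minimal sketch of that kind of integration, assuming hypothetical pandas DataFrames that stand in for an offline CRM extract and an online interaction log keyed on a shared customer identifier:

```python
import pandas as pd

# Hypothetical offline (CRM) data the company already collects.
crm = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "segment": ["gold", "silver", "gold"],
    "lifetime_value": [1200.0, 340.0, 980.0],
})

# Hypothetical online interaction data (e.g., social and mobile touchpoints).
interactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "channel": ["web", "social", "mobile", "web", "social", "social"],
})

# Summarize interactions per customer and channel, then join onto the CRM
# record so each customer ends up as one row: a "single view".
touches = (
    interactions.groupby(["customer_id", "channel"])
    .size()
    .unstack(fill_value=0)
    .add_prefix("touches_")
    .reset_index()
)
single_view = crm.merge(touches, on="customer_id", how="left").fillna(0)
print(single_view)
```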

Visually and McKinsey & Company co-published this infographic to illustrate the pressures that CMOs find themselves under and the potential benefits of leveraging big data.

Source by v1shal

“To Cloud or Not”: Practical Considerations for Disaster Recovery and High Availability in Public Clouds

The perceived benefits of low cost storage, per-usage pricing models, and flexible accessibility—and those facilitated by multi-tenant, public cloud providers in particular—are compelling. Across industries and use cases, organizations are hastening to migrate to the cloud for applications that are increasingly becoming more mission critical with each deployment.

What many fail to realize (until after the fact) is that security is not the only cautionary consideration accompanying this change in enterprise architecture. There are also numerous distinctions related to disaster recovery, failover clustering, and high availability in public cloud use cases that differ drastically from conventional on-premise methods for ensuring business continuity. More often than not, businesses must make significant network configuration changes to enable these preventive measures, which can ultimately determine how successful cloud deployments are.

“Once you’ve made the decision that the cloud is your platform, high availability and security are two things you can’t do without,” explained Dave Bermingham, Senior Technical Evangelist, Microsoft Cloud and Datacenter Management MVP at SIOS. “You have them on-prem. Whenever you have a business critical application, you make sure you take all the steps you can to make sure it’s highly available and your network is secure.”

Availability Realities
The realities of the potential for high availability in the cloud vastly differ from their general perception. According to Bermingham, most of the major public cloud providers such as AWS, Google, Azure and others “have multiple data centers across the entire globe. Within each geographic location they have redundancy in what they call zones. Each region is divided so you can have zone one and zone two be entirely independent of one another, so there should be no single point of failure between the zones.” The standard promises of nearly 100 percent availability contained in most service-level agreements are predicated on organizations running instances in more than one zone.

However, for certain critical applications such as database management systems like Microsoft SQL Server, for example, “the data is being written in one instance,” noted Bermingham. “Even if you have a second instance up and running, it’s not going to do you any good because the data written on the primary instance won’t be on the secondary instance unless you take steps to make that happen.” Some large cloud providers don’t have the Storage Area Networks (SANs) used for conventional on-premise high availability, and there are also few out-of-the-box options for failovers between regions. The latter is especially essential when “you have a larger outage that affects an entire region,” Bermingham said. “A lot of what we’ve seen to date has been some user error…that has a far reaching impact that could bring down an entire region. These are also susceptible to natural disasters that are regional specific.”

Disaster Recovery
Organizations can maximize disaster recovery efforts in public clouds, or even mitigate the need for them, with a couple of different approaches. Foremost of these involves SANless clusters, which provide failover capabilities not predicated on a SAN. Instead of relying on storage networks not supported by some large public clouds, this approach relies on software to facilitate failovers via an experience that is “the same as their experience on-prem with their traditional storage cluster,” Bermingham mentioned. Moreover, it is useful for standard editions of database systems like SQL Server, as opposed to options like Always On availability groups.

The latter enables the replication of databases and failovers, but is a feature of the pricey enterprise edition of database management systems such as SQL Server. These alternative methods to what public clouds offer for high availability can assist with redundancy between regions, as opposed to just between zones. “You really want to have a plan B for your availability beyond just distributed across different zones in the same region,” Bermingham commented. “Being able to get your data and have a recovery plan for an entirely different region, or even from one cloud provider to another if something really awful happened and Google went offline across multiple zones, that would be really bad.”

Business Continuity
Other factors pertaining to inter-region disaster recovery expressly relate to networking differences between public clouds and on-premise settings. Typically, when failing over to additional clusters clients can simply connect to a virtual IP address that moves between servers depending on which node is active at that given point in time. This process involves gratuitous Address Resolution Protocols (ARPs), which are not supported by some of the major public cloud vendors. One solution for notifying clients of an updated IP address involves “creating some host-specific routes in different subnets so each of your nodes would live in a different subnet,” Bermingham said. “Depending upon whichever node is online, it will bring an IP address specific to that subnet online. Then, the routing tables would automatically be configured to route directly to that address with a host-specific route.”
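As a rough sketch of the route-update idea described above (not any vendor's actual API), the loop below assumes hypothetical `is_healthy` and `update_route` helpers standing in for the provider's health-check and routing calls; it simply repoints a host-specific route for a virtual IP at whichever node is currently healthy:

```python
import time

# Hypothetical helpers -- stand-ins for whatever health-check mechanism and
# route-update API the cloud provider or clustering software actually exposes.
def is_healthy(node_ip: str) -> bool:
    """Placeholder health probe; replace with a real TCP/application check."""
    return True

def update_route(virtual_ip: str, next_hop_ip: str) -> None:
    """Placeholder call that repoints the host-specific route for the
    virtual IP at the currently active node's subnet address."""
    print(f"route {virtual_ip} -> {next_hop_ip}")

VIRTUAL_IP = "10.0.100.10/32"       # the address clients always connect to
NODES = ["10.0.1.5", "10.0.2.5"]    # one node per subnet (one subnet per zone)

def failover_loop(poll_seconds: int = 10) -> None:
    active = NODES[0]
    update_route(VIRTUAL_IP, active)
    while True:
        if not is_healthy(active):
            # Promote the first healthy standby and repoint the route so
            # client traffic follows the new primary automatically.
            standbys = [n for n in NODES if n != active and is_healthy(n)]
            if standbys:
                active = standbys[0]
                update_route(VIRTUAL_IP, active)
        time.sleep(poll_seconds)
```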

Another option is to leverage an internal load balancer for client re-direction, which doesn’t work across regions. According to Bermingham: “Many people want to not only have multiple instances in different zones in a same region, but also some recovery option should there be failure in an entire region so they can stand up another instance in an entirely different region in Google Cloud. Or, they can do a hybrid cloud and replicate back on-prem, then use your on-prem as a disaster recovery site. For those configurations that span regions, the route update method is going to be the most reliable for client re-direction.”

Security Necessities
By taking these dedicated measures to ensure business continuity, disaster recovery, and high availability courtesy of failovers, organizations can truly make public cloud deployments a viable means of extracting business value. They simply require a degree of upfront preparation which many businesses aren’t aware of until they’ve already invested in public clouds. There’s also the issue of security, which correlates to certain aspects of high availability. “A lot of times when you’re talking about high availability, you’re talking about moving data across networks so you have to leverage the tools the cloud provider gives you,” Bermingham remarked. “That’s really where high availability and security intersect: making sure your data is secure in transit, and the cloud vendors will give you tools to ensure that.”

Originally Posted at: “To Cloud or Not”: Practical Considerations for Disaster Recovery and High Availability in Public Clouds

Technology Considerations for CIOs for Data Driven Projects

Making your business data driven requires monitoring a lot more data than you are used to, so you will have a big-data hump to get over. The bigger the data at play, the greater the need to handle bigger interfaces, multiple data sources, and so on, and the more it requires a strong data management strategy to handle various other considerations. Depending on the business and the relevant data, technology considerations may vary. The following are areas you could consider in a technology strategy for pursuing a data driven project. Use this as a basic roadmap; every business is specific and may have more or fewer things to worry about.

So key technology considerations for today’s CIOs include:

Database Considerations:

One of the primary things that will make the entire data driven project workable is the set of database considerations, which depend on the risks associated with the database.

Consider the following:

– Coexisting with existing architecture:

One thing you have to ask yourself is how the required technology will square with the existing infrastructure. Technology integration, if not planned well, can sink a well-run business, so careful consideration is needed, as it bears directly on the organization’s performance and cost. ETL (Extract, Transform and Load) tools must act as a bridge between a relational environment such as Oracle and an analytics data warehouse such as Teradata.
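A bare-bones illustration of that bridge, with placeholder connection strings and table names (the real dialects and schemas will differ), might look like this sketch using pandas and SQLAlchemy:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection strings for a relational source and an analytics warehouse.
source = create_engine("oracle+cx_oracle://user:pwd@oltp-host/ORCL")      # hypothetical
warehouse = create_engine("teradatasql://user:pwd@dw-host")               # hypothetical

# Extract: pull raw orders from the relational system.
orders = pd.read_sql(
    "SELECT order_id, customer_id, amount, order_ts FROM orders", source
)

# Transform: light cleansing and conforming before analytics use.
orders["amount"] = orders["amount"].fillna(0.0)
orders["order_date"] = pd.to_datetime(orders["order_ts"]).dt.date

# Load: write the conformed table into the warehouse for analysis.
orders.drop(columns=["order_ts"]).to_sql(
    "fact_orders", warehouse, if_exists="append", index=False
)
```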

– Storage and Hardware:

Making the engine work requires a lot of processing around compression, deduplication, and cache management. These functions are critical for making data analytics work efficiently, and data analytics is the backbone of most data driven projects. There are various vendors with tools sophisticated enough to handle up to 45-fold compression and re-inflation, which makes processing and storage demanding. Consideration of the tools and their hardware and storage needs is therefore critical: each tool must be carefully studied for its footprint on the organization, and resource allocations should be made according to the complexity of the tools and tasks.

– Query and Analytics:

Query complexity varies by use case. Some queries do not require much pre- or post-processing, while others require deep analytics and heavy pre- and post-processing. Each use case comes with its own requirements and must be dealt with accordingly; some may even require visualization tools to make the data consumable. So careful consideration must be given to the low- and high-bar requirements of the use cases. Query and analytics requirements will indirectly affect both the cost and the infrastructure requirements for the business.

– Scale and Manageability:

Businesses often have to accumulate data from disparate sources and analyze it in different environments, making the whole model difficult to scale and manage. Understanding the complications around data modeling is another big task: it encompasses infrastructure requirements, tool requirements, talent requirements, and so on, to provision for future growth. Deep consideration should be given to infrastructure scalability and manageability for the business once it is running on the data driven model. It is a delicate task and must be done carefully to get the measurements right.

Data Architecture:
There are many other decisions to be made when considering the information architecture design as it relates to big data storage and analysis. These include choosing between relational or non-relational data stores; virtualized on-premise servers or external clouds; in-memory or disk-based processing; and uncompressed data formats (quicker access) or compressed (cheaper storage). Companies also need to decide whether or not to shard – split tables by row and distribute them across multiple servers – to improve performance. Other choices include column-oriented versus row-oriented as the dominant processing method, and a hybrid platform versus a greenfield approach. The solution could be the best mix of the combinations above, so careful thought must be given to the data requirements.

– Column-oriented Database:

As opposed to relational, row-based databases, a column-oriented database stores together data that share the same attribute, e.g. one entry holds the age of every customer. This type of data organization is conducive to performing many selective queries rapidly, a benchmark of big data analytics.
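A toy illustration of the difference using plain Python structures (not a real database engine): the column layout keeps all values of one attribute together, which is what makes selective scans and aggregations over a single attribute fast.

```python
# Row-oriented layout: one record per customer.
rows = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": 41, "city": "Delhi"},
    {"id": 3, "age": 29, "city": "Pune"},
]

# Column-oriented layout: one array per attribute.
columns = {
    "id":   [1, 2, 3],
    "age":  [34, 41, 29],
    "city": ["Pune", "Delhi", "Pune"],
}

# An aggregate over a single attribute touches only that column's array,
# instead of walking every field of every row.
avg_age_rowwise = sum(r["age"] for r in rows) / len(rows)
avg_age_colwise = sum(columns["age"]) / len(columns["age"])
assert avg_age_rowwise == avg_age_colwise
```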

– In-memory Database:

In-memory is another way to speed up processing: these database platforms use main memory for data storage instead of physical disks. This cuts down the number of cycles required for data retrieval, aggregation and processing, enabling complex queries to be executed much faster. They are expensive systems and are most useful when a high processing rate in real time is a priority; many trading desks use this model to process real-time trades.
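As a simple, non-production illustration of the idea, Python's built-in sqlite3 module can hold an entire database in RAM; dedicated in-memory platforms add persistence, scale-out, and real-time ingest on top, but the storage model is similar in spirit:

```python
import sqlite3

# ":memory:" keeps the entire database in RAM -- no disk I/O on the query path.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("ACME", 100, 10.5), ("ACME", 50, 10.7), ("INIT", 200, 3.2)],
)

# Aggregation runs entirely against memory-resident data.
for symbol, notional in conn.execute(
    "SELECT symbol, SUM(qty * price) FROM trades GROUP BY symbol"
):
    print(symbol, notional)
```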

– NoSQL:

“Not Only SQL” provides a semi-structured model for handling inconsistent or sparse data. Because the data is not rigidly structured, it does not require fixed table schemas or join operations and can scale horizontally across nodes (locally or in the cloud). NoSQL offerings come in different shapes and sizes, with open-source and licensed options, and are built with the needs of various social and Web platforms in mind.

– Database Appliances:

Database appliances are readily usable data nodes: self-contained combinations of hardware and software that extend the storage capabilities of relational systems or provide an engineered system for new big data capabilities such as columnar, in-memory databases.

– MapReduce:

MapReduce is a technique for distributing computation over large data sets across a cluster of commodity processing nodes. Processing can be performed in parallel, as the workload is broken into discrete, independent operations, allowing some workloads to be delivered most effectively via cloud-based infrastructure. It comes in really handy when tackling big data problems at low cost.
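A minimal, single-process sketch of the MapReduce pattern (a word count over a few short documents); real frameworks distribute the map and reduce phases across many commodity nodes, but the shape of the computation is the same:

```python
from collections import defaultdict
from itertools import chain

documents = ["big data at low cost", "big clusters of commodity nodes", "data in parallel"]

# Map: each document is processed independently, emitting (word, 1) pairs.
def map_phase(doc):
    return [(word, 1) for word in doc.split()]

mapped = list(chain.from_iterable(map_phase(d) for d in documents))

# Shuffle: group intermediate pairs by key (word).
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: each key is reduced independently, so reducers can run in parallel.
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)
```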

Other Constraints:

Other constraints that are only loosely linked to technology but must still be considered include resource requirements, the market risks associated with the tools, and so on. These may feel like ad hoc tasks, but they carry similar pain points and must be taken seriously. Such decisions include whether resources and support are cheap or expensive, the risks of the technologies being adopted, and the affordability of the tools.

– Resource Availability:

As you nail down the technologies needed to fuel the data engine, one question to ask is whether knowledgeable resources are abundantly available or whether it will be a nightmare to find someone to help out. It is always helpful to adopt technologies that are popular and have more resources available at lower cost. It is simple supply-and-demand math, but it helps a lot later down the road.

– Associated Risks with tools:

As we all know, the world is changing at a rate that is difficult to keep pace with, and with this change comes a shifting technological landscape. It is crucial to consider the maturity of the tools and technologies under consideration. Installing something brand new carries the risk of low adoption and hence weaker support, while old-school technologies are always vulnerable to disruption and being run down by the competition. So technology that sits somewhere in the middle should be adopted.

– Indirect Costs & Affordability:

Another interesting point linked to technology is the cost and affordability of a particular technology. License agreements, scaling costs, and the cost of managing organizational change are some of the important considerations to take care of. What starts cheap need not stay cheap later on, and vice versa, so carefully planning for customer growth, market change, and so on will help in understanding the complete long-term terms and costs of adopting any technology.

So, getting your organization onto the rails of a data driven engine is fun, super cool, and sustainable, but it carries some serious technology considerations that must be weighed before jumping on a particular technology.

Here is a video from IDC’s Crawford Del Prete discussing CIO Strategies around Big Data and Analytics

Source

“More Than Just Data”: The Impact of Bayesian Machine Learning on Predictive Modeling

Predictive analytics is rapidly reshaping the power of data-driven practices. The influence of its most accessible format, machine learning, encompasses everything from standard data preparation processes to the most far-flung aspects of cognitive computing. In all but the rarest instances, there is a surfeit of vendors and platforms provisioning self-service options for lay users.

Facilitating the overarching predictive models that function as the basis for predictive analytics results can involve many different approaches. According to Shapiro+Raj Senior Vice President of Strategy, Research and Analytics Dr. Lauren Tucker, two of those readily surpass the others.

“Some of the most advanced techniques are going to be Bayesian and machine learning, because those tools are the best ones to use in an environment of uncertainty,” Tucker disclosed.

In a world of swiftly changing business requirements and ever fickle customer bases, a look into the future can sometimes prove the best way to gain competitive advantage—particularly when it’s backed by both data and knowledge.

The Future from the Present, Not the Past
Tucker largely advocates Bayesian machine learning methods for determining predictive modeling because of their intrinsically forward-facing nature. These models involve an element of immediacy that far transcends that of “traditional advanced analytics tools that use the history to predict the future” according to Tucker. The combination of Bayesian techniques within an overarching framework for machine learning models is able to replace a reliance on historic data with information that is firmly entrenched in the present. “Bayesian machine learning allows for much more forward-facing navigation of the future by incorporating other kinds of assumptions and elements to help predict the future based on certain types of assessments of what’s happening now,” Tucker said. Their efficacy becomes all the more pronounced in use cases in which there is a wide range of options for future events or factors that could impact them, such as marketing or sales trends that are found in any number of verticals.

The Knowledge Base of Bayesian Models
Bayesian modeling techniques are knowledge based, and require the input of a diversity of stakeholders to flesh out their models. Integrating this knowledge base with the data-centric approach of machine learning successfully amalgamates both stakeholder insight and undisputed data-driven facts. This situation enables the strengths of the former to fortify those of the latter, forming a gestalt that provides a reliable means of determining future outcomes. “The Bayesian approach to predictive modeling incorporates more than just data,” Tucker said. “It ties in people’s assessments of what might happen in the future, it blends in people’s expertise and knowledge and turns that into numerical values that, when combined with hardcore data, gives you a more accurate picture of what’s going to happen in the future.” Historically, the effectiveness of Bayesian modeling was hampered by limitations on computing power, memory, and other technical circumscriptions that have been overcome by modern advancements in those areas. Thus, in use cases in which organizations have to bank the enterprise on performance, the confluence of Bayesian and machine learning models utilizes more resources—both data-driven and otherwise—to predict outcomes.
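As a small worked example of the blend described here, consider a hypothetical marketing scenario: an expert's prior judgment about a campaign's conversion rate is encoded as a Beta distribution and then updated with observed data, so the posterior reflects both the knowledge and the hard numbers. (The figures below are invented for illustration.)

```python
from scipy import stats

# Expert judgment: "conversion rate is probably around 5%" encoded as a Beta prior.
prior_alpha, prior_beta = 5, 95          # prior mean = 5 / (5 + 95) = 5%

# Observed data from the current campaign.
conversions, trials = 30, 400            # 7.5% observed rate

# Conjugate Beta-Binomial update: the posterior combines prior belief and data.
post_alpha = prior_alpha + conversions
post_beta = prior_beta + (trials - conversions)
posterior = stats.beta(post_alpha, post_beta)

print(f"posterior mean conversion rate: {posterior.mean():.3f}")
lo, hi = posterior.interval(0.95)
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```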

Transparent Data Culture
The Bayesian machine learning approach to predictive modeling offers a multitude of additional advantages, one of which is an added degree of transparency in data-centric processes. By merging human expertise into data-driven processes, organizations are able to gain a sense of credence that might otherwise prevent them from fully embracing the benefits of data culture. “The irony of course is that traditional models, because they didn’t incorporate knowledge or expertise, sort of drew predictive outcomes that just could not be assimilated into a larger organization because people were like, ‘I don’t believe that; it’s not intuitive to me,’” Tucker said. Bayesian machine learning models are able to emphasize a data-driven company culture by democratizing input into the models for predictive analytics. Instead of merely relying on an elite group of data scientists or IT professionals to determine the future results of processes, “stakeholders are able to incorporate their perspectives, their expertise, their understanding of the marketplace and the future of the marketplace into the modeling process,” Tucker mentioned.

High Stakes, Performance-Based Outcomes
The most convincing use cases for Bayesian machine learning are high stakes situations in which the success of organizations is dependent on the outcomes of their predictive models. A good example of such a need for reliable predictive modeling is performance-based marketing and performance-based pay incentives, in which organizations are only rewarded for the merits that they are able to produce from their efforts. McDonald’s recently made headlines when it mandated performance-based rewards for its marketing entities. “I would argue that more and more clients, whether bricks and mortar or e-commerce, are going to start to have that expectation that their marketing partners and their ad agencies are required to have advanced analytics become more integrated into the core offering of the agency, and not be peripheral to it,” Tucker commented.

Upfront Preparation
The key to the Bayesian methodology for predictive modeling is a thorough assessment process in which organizations—both at the departmental and individual levels—obtain employee input for specific use cases. Again, the primary boon associated with this approach is both a more accurate model and increased involvement in reinforcing data culture. “I’m always amazed at how many people in an organization don’t have a full picture of their business situation,” Tucker revealed. “This forces that kind of conversation, and forces it to happen.” The ensuing assessments in such situations uncover almost as much business expertise of a particular domain as they do information to inform the underlying predictive models. The machine learning algorithms that influence the results of predictive analytics in such cases are equally as valuable, and are designed to improve with time and experience.

Starting Small
The actual data that organizations want to use as the basis of Bayesian machine learning predictive models can include big data, traditional small data, or just small data applications of big data. “One of the things that makes Bayesian approaches and machine learning approaches challenging is how to better and more efficiently get the kind of knowledge and some of the small data that we need— insights from that that we need—in order to do the bigger predictive modeling projects,” Tucker explained. Such information can include quantitative projections for ROI and revenues, as well as qualitative insights born of hard-earned experience. Bayesian machine learning models are able to utilize both data and the experiences of those that are dependent on that data to do their jobs in such a way that ultimately, they both benefit. The outcome is predictive analytics that can truly inform the future of an organization’s success.

Originally Posted at: “More Than Just Data”: The Impact of Bayesian Machine Learning on Predictive Modeling

[Podcast] Radical Transparency: Building A Kinder, Gentler Financial Marketplace

[soundcloud url=”https://api.soundcloud.com/tracks/463116444″ params=”color=#ff5500&auto_play=false&hide_related=false&show_comments=false&show_user=true&show_reposts=false&show_teaser=true” width=”100%” height=”166″ iframe=”true” /]

Ah – financial markets: a place where buyers and sellers come together to nail each other to the wall.

Like a giant Thunderdome of capitalism, buyers buy because they think the value of a security is going to go up, and sellers sell because they think it’s going to go down. It’s “two people enter, one person leaves”… having got the better of the other.

At least, that’s the way it’s supposed to work. But, apparently, PolicyBazaar, an India-based online marketplace for all types of insurance (and our customer), did not get the memo.

No, our friends at PolicyBazaar seem to think that a marketplace for financial products is where buyers and sellers come together to have a giant tea party, where both parties hold hands, exchange goods for services, and magically find themselves enriched, and wiser, for having engaged in a transaction.

By leveraging analytics to create a more transparent marketplace, PolicyBazaar has delivered valuable insights to sellers, helping them enhance their offerings to provide better, more-competitive products that more people want to buy.

On the other side of the transaction, PolicyBazaar has leveraged analytics so that their service team can counsel buyers, helping ensure that they get the best policy for the best price.

And why do they do this? Are they working both sides of the street to get consulting fees from the sellers and tips for good service from the buyers?

No. They do not.

The reason is that they do make a set fee on what’s sold online. And, as it turns out, that’s a great model, because they sell a whole lot of insurance. In fact, PolicyBazaar has captured a 90 percent market share in insurance in India in just a few years.

In the end, the radical transparency embraced by PolicyBazaar ensures that buyers buy more and are more satisfied, and sellers have better products and sell more. And absolutely nobody gets nailed to the wall on the deal.

But what’s the fun in that?

In this episode of Radical Transparency, we talk to the architect behind PolicyBazaar, CTO Ashish Gupta. We also speak to Max Wolff, Chief Economist at The Phoenix Group, who talks through malevolent markets (used cars) and benevolent markets (launching smartphones) to explain where PolicyBazaar resides.

Wolff also discusses whether analytics-driven markets approach what has been only a theoretical construct – a perfect market outcome – and whether the PolicyBazaar model is original, sustainable, and ethical.

You can check out the current episode here.

About Radical Transparency

Radical Transparency shows how business decision-makers are using analytics to make unexpected discoveries that revolutionize companies, disrupt industries, and, sometimes, change the world. Radical Transparency combines storytelling with analysis, economics, and data science to highlight the big opportunities and the big mistakes uncovered by analytics in business.

Source: [Podcast] Radical Transparency: Building A Kinder, Gentler Financial Marketplace

The Bifurcation of Data Science: Making it Work

Today’s data scientists face expectations of an enormous magnitude. These professionals are singlehandedly tasked with mastering a variety of domains, from those relevant to various business units and their objectives to some of the more far-reaching principles of statistics, algorithms, and advanced analytics.

Moreover, organizations are increasingly turning to them to solve complicated business problems with an immediacy which justifies both their salaries and the scarcity with which these employees are found. When one couples these factors with the mounting pressures facing the enterprise today in the form of increasing expenditures, operational complexity, and regulatory concerns, it appears this discipline as a whole is being significantly challenged to pave the way for successfully monetizing data-driven practices.

According to Cambridge Semantics advisor Carl Reed, who spent a substantial amount of time employing data-centric solutions at both Goldman Sachs and Credit Suisse, a number of these expectations are well-intentioned, but misplaced:

“Everyone is hitting the same problem with big data, and it starts with this perception and anticipation that the concept of turning data into business intelligence that can be a differentiator, a competitive weapon, is easy to do. But the majority of the people having that conversation don’t truly understand the data dimension of the data science they’re talking about.”

Truly understanding that data dimension involves codifying it into two elements: that which pertains to science and analytics and that which pertains to the business. According to Reed, mastering these elements of data science results in a profound ability to “really understand the investment you have to make to monetize your output and, by the way, it’s the exact same investment you have to make for all these other data fronts that are pressurizing your organization. So, the smart thing to do is to make that investment once and maximize your return.”

The Problems of Data Science
The problems associated with data science are twofold, although their impacts on the enterprise are far from commensurate with each other. This bifurcation is based on the pair of roles which data scientists have to fundamentally fulfill for organizations. The first is to function as a savvy analytics expert (what Reed termed “the guy that’s got PhD level experience in the complexities of advanced machine learning, neural networks, vector machines, etc.”); the second is to do so as a business professional attuned to the needs of the multiple units who rely on both the data scientist and his or her analytics output to do their jobs better. This second dimension is pivotal and effectively represents the initial problem organizations encounter when attempting to leverage data science—translating that science into quantitative business value. “You want to couple the first role with the business so that the data scientist can apply his modeling expertise to real world business problems, leveraging the subject matter expertise of business partners so that what he comes up with can be monetized,” Reed explained. “That’s problem number one: the model can be fantastic but it has to be monetized.”

The second problem attending the work of data scientists has been well documented—a fact which has done little to alleviate it. The copious quantities of big data involved in such work require an inordinate amount of what Reed termed “data engineering”, the oftentimes manual preparation work of cleansing, disambiguating, and curating data for a suitable model prior to actually implementing data for any practical purpose. This problem is particularly poignant because, as Reed mentioned, it “isn’t pushed to the edge; it’s more pulled to the center of your universe. Every single data scientist that I’ve spoken to and certainly the ones that I have on my team consider data engineering from a data science perspective to be ‘data janitorial work.’”

Data Governance and Data Quality Ramifications
The data engineering quandary is so disadvantageous to data science and the enterprise today for multiple reasons. Firstly, it’s work that these professionals (with their advanced degrees and analytics shrewdness) are typically overqualified for. Furthermore, the ongoing data preparation process is exorbitantly time-consuming, which keeps these professionals from spending more time working on solutions that help the business realize organizational objectives. The time consumption involved in this process becomes substantially aggravated when advanced analytics algorithms for machine learning or deep learning are involved, as Reed revealed. “I’ve had anything from young, fresh, super productive 26 year olds coming straight out of PhD programs telling me that they’re spending 80 percent of their time cleaning and connecting data so they can fit into their model. If you think about machine learning where you’ve got a massive learning set of data which is indicative of the past, especially if you’re talking about supervised [machine learning], then you need a whole bunch of connected test data to make sure your model is doing what’s expected. That’s an enormous amount of data engineering work, not modeling work.”
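A hedged illustration of what that "janitorial" 80 percent often looks like in practice, using hypothetical pandas frames standing in for two source extracts that must be cleaned and connected before any modeling can start:

```python
import pandas as pd

# Two hypothetical source extracts that disagree on formatting and keys.
crm = pd.DataFrame({"cust_id": ["A01", "a02", "A03 "], "revenue": ["1,200", "340", None]})
web = pd.DataFrame({"customer": ["A01", "A02", "A04"], "visits": [12, 7, 3]})

# Cleaning: normalize keys, fix types, handle missing values.
crm["cust_id"] = crm["cust_id"].str.strip().str.upper()
crm["revenue"] = (
    crm["revenue"].str.replace(",", "", regex=False).astype(float).fillna(0.0)
)

# Connecting: join the cleansed sources so a model sees one consistent table.
training_frame = crm.merge(web, left_on="cust_id", right_on="customer", how="outer")
print(training_frame)
```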

The necessity of data engineering is that it must take place—correctly—in order to get the desired results out of data models and any subsequent analytics or applications dependent on such data. However, it is just that necessity which frequently creates situations in which such engineering is done locally for the needs of specific data scientists using sources for specific applications, which are oftentimes not reusable or applicable for additional business units or use cases. The resulting data quality and governance complications can swiftly subtract any value attached to the use of such data.

According to Reed: “The data scientist is important, but unless you want the data scientist to be constantly distracted by cleaning data with all the negative connotations that drag with it, which is everyone’s going to use their own lingua franca, everyone’s going to use their own context, everyone’s going to clean data duplicitously, everyone’s going to clean data for themselves which, in reality, is just adding yet another entity for the enterprise to disambiguate when they look across stuff. Essentially, if you leave it to the data science they’re not turning the data into an asset. They’re turning the data into a by-product of their model.”

Recurring Centralization Wins
The solution to these data science woes (and the key to its practical implementation) is as binary as the two primary aspects of these problems are. Organizations should make the data engineering process a separate function from data science, freeing data scientists’ time to focus on the creation of data-centric answers to business problems. Additionally, they should ensure that their data engineering function is centralized to avoid any localized issues of data governance and data quality stemming from provincial data preparation efforts. There are many methods by which organizations can achieve these objectives, from deploying any variety of data preparation and self-service data preparation options, to utilizing the standards-based approach of semantic graph technologies which implement such preparation work upfront for a naturally evolving user experience. By separating the data engineering necessity as an individual responsibility in a centralized fashion, the enterprise is able to “push that responsibility hopefully towards the center of your organization where you can stitch data together, you can disambiguate it, [and] you can start treating it like an enterprise asset making it available to your data science teams which you can then associate with your businesses and pull to the edge of your organization so that they leverage clean signal many times over as consumers versus having to create clean signals parochially for themselves,” Reed remarked.

Source by jelaniharper

Better Business Intelligence Boons with Data Virtualization

Data virtualization’s considerable influence throughout the data sphere is inextricably linked to its pervasive utility. It is at once the forefather of the current attention surrounding data preparation, the precursor to the data lake phenomenon, and one of the principal enablers of the abundant volumes and varieties of big data in its raw form.

Nevertheless, its enduring reputation as a means of enhancing business intelligence and analytics continues to persist, and is possibly the most compelling, cost-saving, and time-saving application of its widespread utility.

“It brings together related pieces of information sitting in different data stores, and then provides that to an analytical platform for them to view or make some analysis for reporting,” explained Ravi Shankar, Denodo Chief Marketing Officer. “Or it could be used to accomplish some use case such as a customer service rep that is looking at all the customer information, all the products they have bought, and how many affect them.”

Better BI
Data virtualization technologies are responsible for providing an abstraction layer over multiple sources, structures, types, and locations of data, which is accessible in a singular platform for any purpose. The data themselves do not actually move, but users can access them from the same place. The foundation of virtualization rests upon its ability to integrate data for a single, or even multiple, applications. Although there is no shortage of use cases for such singular, seamless integration—which, in modern or what Shankar calls “third generation” platforms, involves a uniform semantic layer to rectify disparities in modeling and meaning—the benefits to BI users are among the most immediately discernible.

Typical BI processing of various data from different stores and locations requires mapping of each store to an external one, and replicating data (which is usually regarded as a bottleneck) to that destination. The transformation for ETL involves the generation of copious amounts of code which is further slowed by a lengthy testing period. “This could take up to three months for an organization to do this work, especially for the volume of big data involved in some projects,” Shankar noted.

Virtualization’s Business Model
By working with what Shankar termed a “business model” associated with data virtualization, users simply drag and drop the requisite data into a user interface before publishing it to conventional BI or analytics tools. There is much less code involved with the process, which translates into decreased time to insight and less dedication of manpower. “What takes three months with a data integration ETL tool might take a week with data virtualization,” Shankar mentioned. “Usually the comparison is between 1 to 4 and 1 to 6.” That same ratio applies to IT people working on the ETL process versus a solitary data analyst leveraging virtualization technologies. “So you save on the cost and you save on the time with data virtualization,” Shankar said.

Beyond Prep, Before Data Lakes
Those cost and time reductions are worthy of examination in multiple ways. Not only do they apply to the data preparation process of integrating and transforming data, but they also benefit the front offices that are leveraging data more quickly and cheaply than standard BI paradigms allow. Since data is both stored and accessed in its native form, data virtualization technologies represent the first rudimentary attempts at the data lake concept, in which all data are stored in their native formats. This sort of aggregation is ideal for use with big data and its plentiful forms and structures, enabling organizations to analyze it with traditional BI tools. Still, such integration becomes more troublesome than valuable without common semantics. “The source systems, which are databases, have a very technical way of defining semantics,” Shankar mentioned. “But for the business user, you need to have a semantic layer.”

Before BI: Semantics and Preparation
A critical aspect of the integration that data virtualization provides is its designation of a unified semantic layer, which Shankar states is essential to “the transformation from a technical to a business level” of understanding the data’s significance. Semantic consistency is invaluable to ensuring successful integration and standardized meaning of terms and definitions. Traditional BI mechanisms require ETL tools and separate measures for data quality to cohere such semantics. However, this pivotal step is frequently complicated in some of the leading BI platforms on the market, which account for semantics in multiple layers.

This complexity is amplified by the implementation of multiple tools and use cases across the enterprise. Virtualization platforms address this requisite by provisioning a central location for common semantics that are applicable to the plethora of uses and platforms that organizations have. “What customers are doing now is centralizing their semantic layer and definitions within the data virtualization layer itself,” Shankar remarked. “So they don’t have to duplicate that within any of the tools.”

Governance and Security
The lack of governance and security that can conceptually hamper data lakes—turning them into proverbial data swamps—does not exist with data virtualization platforms. There are multiple ways in which these technologies account for governance. Firstly, they enable the same sort of access controls from the source system at the virtualization layer. “If I’m going to Salesforce.com within my own company, I can see the sales opportunities but someone else in marketing who’s below me cannot see those sales opportunities,” Shankar said. “They can see what else there is of the marketing. If you have that level of security already set up, then the data virtualization will be able to block you from being able to see that information.”

This security measure is augmented (or possibly even superseded) by leveraging virtualization as a form of single sign-on, whereby users no longer access an application directly but instead have to go through the virtualization layer first. In this case, “the data virtualization layer becomes the layer where we will do the authorization and authentication for all of the source systems,” Shankar said. “That way all the security policies are governed in one central place and you don’t have to program them for each of the separate applications.”
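A purely schematic sketch of the kind of centralized policy check described here; the policy structure, roles, and fields are assumptions for illustration, not any vendor's actual API:

```python
# Hypothetical central policy: role -> (visible columns, row filter).
POLICIES = {
    "sales_rep": {"columns": {"account", "opportunity", "stage"}, "row_filter": lambda r: True},
    "marketing": {"columns": {"account", "campaign"}, "row_filter": lambda r: r.get("region") == "EMEA"},
}

def apply_policy(role, rows):
    """Enforce column- and row-level security once, at the virtualization layer,
    regardless of which underlying source or BI tool the request touches."""
    policy = POLICIES[role]
    visible = []
    for row in rows:
        if policy["row_filter"](row):
            visible.append({k: v for k, v in row.items() if k in policy["columns"]})
    return visible

records = [
    {"account": "Acme", "opportunity": "Q3 renewal", "stage": "closed", "campaign": "X", "region": "EMEA"},
    {"account": "Globex", "opportunity": "New deal", "stage": "open", "campaign": "Y", "region": "APAC"},
]

print(apply_policy("marketing", records))   # only EMEA rows, only marketing-visible columns
```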

Beyond BI
The benefits that data virtualization produces easily extend beyond business intelligence. Still, the more efficient and expedient analytical insight virtualization technologies beget can revamp almost any BI deployment. Furthermore, it does so in a manner that reinforces security and governance, while helping to further the overarching self-service movement within the data sphere. With both cloud and on-premise options available, it helps to significantly simplify integration and many of the eminent facets of data preparation that make analytics possible.

Source