Improving Big Data Governance with Semantics

By Dr. Jans Aasman, CEO of Franz Inc.

Effective data governance consists of protocols, practices, and the people necessary for implementation to ensure trustworthy, consistent data. Its yields include regulatory compliance, improved data quality, and data’s increased valuation as a monetary asset that organizations can bank on.

Nonetheless, these aspects of governance would be impossible without what is arguably its most important component: the common terminologies and definitions that are sustainable throughout an entire organization, and which comprise the foundation for the aforementioned policy and governance outcomes.

When intrinsically related to the technologies used to implement governance protocols, terminology systems (containing vocabularies and taxonomies) can unify terms and definitions at a granular level. The result is a greatly increased ability to tackle the most pervasive challenges associated with big data governance, including recurring issues with unstructured and semi-structured data, integration efforts (such as mergers and acquisitions), and regulatory compliance.

A Realistic Approach
Designating the common terms and definitions that are the rudiments of governance varies according to organization, business units, and specific objectives for data management. Creating policy from them and embedding them in technology that can achieve governance goals is perhaps most expediently and sustainably facilitated by semantic technologies, which are playing an increasingly pivotal role in the overall implementation of data governance in the wake of big data’s emergence.

Once organizations adopt a glossary of terminology and definitions, they can then determine rules about terms based on their relationships to one another via taxonomies. Taxonomies are useful for disambiguation and can clarify preferred labels—among any number of synonyms—for different terms in accordance with governance conventions. These definitions and taxonomies form the basis for automated terminology systems that label data according to governance standards via inputs and outputs. Ingested data adheres to terminology conventions and is stored according to preferred labels; data captured prior to the implementation of such a system can still be queried according to the system’s standards.
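To make this concrete, here is a minimal sketch in Python of how an automated terminology system might map synonyms in ingested records to the preferred labels set by governance policy. The mini-vocabulary and field names are invented for illustration and are not drawn from any real terminology system.

```python
# Minimal sketch of governance-driven term normalization.
# The vocabulary below is illustrative; a real system would load
# SKOS/ontology-backed taxonomies rather than a hard-coded dict.

PREFERRED_LABELS = {
    # synonym (lower-cased)   -> preferred label per governance policy
    "myocardial infarction": "Heart Attack",
    "mi":                    "Heart Attack",
    "heart attack":          "Heart Attack",
    "hypertension":          "High Blood Pressure",
    "htn":                   "High Blood Pressure",
}

def normalize_term(raw_term: str) -> str:
    """Return the governance-preferred label for a raw term,
    falling back to the original text if it is not in the vocabulary."""
    return PREFERRED_LABELS.get(raw_term.strip().lower(), raw_term)

def normalize_record(record: dict) -> dict:
    """Apply terminology conventions to every string field of an ingested record."""
    return {field: normalize_term(value) if isinstance(value, str) else value
            for field, value in record.items()}

if __name__ == "__main__":
    ingested = {"diagnosis": "MI", "comorbidity": "HTN", "age": 67}
    print(normalize_record(ingested))
    # {'diagnosis': 'Heart Attack', 'comorbidity': 'High Blood Pressure', 'age': 67}
```

Records captured before such a system existed can be passed through the same mapping at query time, which is how older data can still be retrieved under the current standards.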

Linking Terminology Systems: Endless Possibilities
The possibilities that such terminology systems produce (especially for unstructured and semi-structured big data) are virtually limitless, particularly with the linking capabilities of semantic technologies. In the medical field, a handwritten note hastily scribbled by a doctor can be readily transcribed by the terminology system in accordance with governance policy using preferred terms, effectively giving structure to unstructured data. Moreover, it can be linked to billing coding systems per business function. That structured data can then be stored in a knowledge repository and queried along with other data, adding to the comprehensive integration and accumulation of data that gives big data its value.

Focusing on common definitions and linking terminology systems enables organizations to leverage business intelligence and analytics across different databases and business units. This method is also critical for customer disambiguation, a frequently occurring problem across vertical industries. In finance, it allows institutions with numerous subsidiaries and acquisitions (such as Citigroup, Citibank, Citi Bike, etc.) to determine which subsidiary actually spent how much money with the parent company, and to resolve additional internal, data-sensitive problems, by using a common repository. Linking the different terminology repositories for these distinct yet related entities can achieve the same objective.

The primary way in which semantics addresses linking between terminology systems is by ensuring that those systems are utilizing the same words and definitions for the commonality of meaning required for successful linking. Vocabularies and taxonomies can provide such commonality of meaning, which can be implemented with ontologies to provide a standards-based approach to disparate systems and databases.

Subsequently, all systems that utilize those vocabularies and ontologies can be linked. In finance, the Financial Industry Business Ontology (FIBO) is being developed to grant “data harmonization and…the unambiguous sharing of meaning across different repositories.” The life sciences industry is similarly working on industry-wide standards so that numerous databases can be made available to everyone in the industry, while still restricting access to each organization’s internal drug discovery processes.
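As a rough illustration of the linking idea, the sketch below (assuming Python’s rdflib library) declares the same concept in two repositories and asserts a SKOS mapping between them; the namespaces and terms are invented for the example and are not taken from FIBO or any real vocabulary.

```python
# Illustrative only: linking equivalent concepts in two terminology
# repositories with a SKOS mapping. Namespaces and terms are invented.
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import SKOS

BANK_A = Namespace("http://bank-a.example.org/terms/")
BANK_B = Namespace("http://bank-b.example.org/terms/")

g = Graph()
g.bind("skos", SKOS)

# Each repository defines its own concept with a preferred label...
g.add((BANK_A.InterestRateSwap, SKOS.prefLabel, Literal("Interest Rate Swap")))
g.add((BANK_B.IRSwap, SKOS.prefLabel, Literal("IR Swap")))

# ...and a mapping asserts that the two concepts mean the same thing,
# so a query against either repository can be answered from both.
g.add((BANK_A.InterestRateSwap, SKOS.exactMatch, BANK_B.IRSwap))

print(g.serialize(format="turtle"))
```

Once such mappings exist, tools that understand the shared vocabulary can treat the two repositories as one linked resource.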

Regulatory Compliance and Ontologies
In terms of regulatory compliance, organizations can respond far more quickly and flexibly to new requirements when data throughout disparate systems and databases is linked and commonly shared—requiring just a single update as opposed to numerous time-consuming updates in multiple places. Issues of regulatory compliance are also eased in a semantic environment through the use of ontological models, which provide the schema for creating a model specifically in adherence to regulatory requirements.

Organizations can use ontologies to describe such requirements, then write rules for them that both restrict and permit access and usage according to regulations. Although ontological models can also be created for any other sort of governance requirement (metadata, reference data, etc.), it is somewhat idealistic to attempt to account for all facets of governance implementation via such models. The more thorough approach is to do so with terminology systems and supplement them accordingly with ontological models.
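A highly simplified sketch of what rules written against such a model might look like follows; the data categories, roles, and permissions are hypothetical placeholders rather than actual regulatory requirements.

```python
# Hypothetical example of access rules derived from an ontological model
# of regulatory requirements. Categories, roles, and permissions are
# placeholders for illustration only.

RULES = {
    # data category        -> roles permitted to use it under the modeled regulation
    "customer_pii":          {"compliance_officer", "account_manager"},
    "transaction_history":   {"compliance_officer", "auditor", "account_manager"},
    "marketing_profile":     {"marketing_analyst"},
}

def access_permitted(role: str, data_category: str) -> bool:
    """Return True if the role may use data of this category under the modeled rules."""
    return role in RULES.get(data_category, set())

print(access_permitted("auditor", "customer_pii"))         # False -> restricted
print(access_permitted("auditor", "transaction_history"))  # True  -> permitted
```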

Terminologies First
The true value of a semantic approach to big data governance that focuses on terminology systems, their requisite taxonomies, and vocabularies is that this method is effective for governing unstructured data. Regardless of what particular schema (or lack thereof) is available, organizations can get their data to adhere to governance protocols by focusing on the terms, definitions, and relationships between them. Ontological models, by contrast, have a demonstrated efficacy with structured data. Given that the majority of new data created is unstructured, the best means of wrapping effective governance policies and practices around it is to leverage terminology systems and semantic approaches that consistently achieve governance outcomes.

About the Author: Dr. Jans Aasman is the CEO of Franz Inc., an early innovator in Artificial Intelligence and a leading supplier of Semantic Graph Database technology. Dr. Aasman’s previous experience and educational background include:
• Experimental and cognitive psychology at the University of Groningen, specialization: Psychophysiology, Cognitive Psychology.
• Tenured Professor in Industrial Design at the Technical University of Delft. Title of the chair: Informational Ergonomics of Telematics and Intelligent Products
• KPN Research, the research lab of the major Dutch telecommunication company
• Carnegie Mellon University. Visiting Scientist at the Computer Science Department of Prof. Dr. Allen Newell

Originally Posted at: Improving Big Data Governance with Semantics

How the lack of the right data affects the promise of big data in India

Big data is the big buzzword these days. Big data refers to a collection of data sets or information too large and complex to be processed by standard tools. It is the art and science of combining enterprise data, social data and machine data to derive new insights that would not otherwise be possible. It is also about combining past data with real-time data to predict or suggest outcomes in a current or future context.


The digital footprint is progressively expanding world over, into fragmented mediums (blogs, tweets, reviews etc.) and technologies (mobile, web, cloud/SaaS etc.).

Digital landscape in India

India’s digital landscape too may be evolving quickly but overall penetration remains low, with only 1 in 5 Indians using the Internet in July 2014.

In India, enterprises and businesses have access to a veritable wealth of information. And though some of the larger organisations have made a start in harnessing this information, most Indian companies are still learning how to collect and store big data.

Telecom providers, online travel agencies and online retail stores are some of the industries that are using big data analytics to engage customers in some way or another.

However, big data analytics is still in its infancy in India. Most companies are still learning to store the data collected. Also, there are several challenges when it comes to the collection of data sets themselves. Past and current data is required to make the application of big data analytics really useful, and there is a scarcity of this in public and private sectors in India. Some of the reasons for the lack of enough data are:

Yet to be fully computerised

Healthcare, economic, and statistical data, in both private and public sectors in India, is yet to be computerised. The main reason for this is the late adoption of IT in India. Unlike in the West, most industries in India made the transition from manual records to computerised information systems only during the last decade.

Over the years, the state and central ministries have made moves towards e-governance.  Efforts to deliver public services, and to make access to these services easier, are being made as well. This is still a work in progress; huge amounts of data across many government sectors are yet to be digitised.

Quality of data

In big data analytics, data sufficiency plays a critical role when samples are run across different dimensions. Sufficient data points are required to make informed analyses. Not only the quantity but also the quality of the data being crunched influences the quality of insights. If the signal-to-noise ratio is low, the accuracy of results suffers, especially for less-than-optimal data samples. In a country like India, there is very little information about individuals, partly because Indians are not overly expressive, especially on public forums.

Public social media information that is available for most individuals from India lacks quality information about users themselves. Random facts and figures in individual profiles, sharing of spam content, and fake social media accounts that are created for bots are very common in India.

Spam

Social media sites are becoming increasingly vulnerable to spam attacks. Time spent by a captive audience on social media sites opens up windows of opportunities for online threats and spammers.

Again, social media spam lowers the signal-to-noise ratio that defines the quality of big data, which takes away from the accuracy of results.

Cultural and Social influences

In most western markets, insights generated through big data can be applied across the whole consumer base. However, given the extensive cultural and linguistic variation across India, any insight generated for a consumer based out of Chandigarh, for example, will not be directly applicable to a consumer based in Chennai. This problem is made worse by the fact that a lot of local data lives in regional publications, in different languages, and has very limited online visibility.

Unstructured data leads to mapping issues

Big data in India is not structured. Most transactional data in the healthcare and retail segments is stored purely for book-keeping purposes. It contains very little of the kind of information that can help big data analytics map enterprise-generated transactional data to public information.

In the case of developed countries, user data is rich enough to provide demographic or group level markers that can be used to generate customized insights while maintaining individual privacy. Lack of these standard identifiers in Indian consumer data is one of the biggest bottlenecks while mapping various transactional and social records in India.

Handsets and internet connectivity

Even though smart phones are driving the new handset market in India, feature phones still dominate everyday usage. Most connections in India are pre-paid and fewer than 10% of users have access to 3G networks. To add to it, internet connection speeds are amongst the lowest in Asia. As a result, consumer data, especially retail enterprise data, is limited.

As more people in India make the move to smart phones, and internet connectivity improves, there will be an increase in the amount of usable data generated. As big data analytics is in its infancy in India today, huge efforts would need to be made to improve the quality of data stored by organisations and enterprises. However, key contributors to the promise of big data analytics in India are steadily gaining ground. An increase in social media users, and efforts by enterprises, both public and private for optimum collection and storage of transactional enterprise data, will contribute to better quality data sets for the better application of big data analytics.

 About the Author: Srikant Sastri is the Co-founder of Crayon Data.

To read the original article on YourStory, click here.

Source by analyticsweekpick

Jun 15, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data security

[ AnalyticsWeek BYTES]

>> Talent analytics in practice by analyticsweekpick

>> Data center location – your DATA harbour by martin

>> The 10 Commandments for data driven leaders by v1shal


[ NEWS BYTES]

>> Women and Men Now Grocery Shop Equally: Study … – Progressive Grocer (Under: Prescriptive Analytics)

>> Fast-moving big data changes data preparation process for analytics – TechTarget (Under: Big Data)

>> Scientists use Tweet ‘sentiment analysis’ to predict Hillary Clinton win – Daily News & Analysis (Under: Sentiment Analysis)


[ FEATURED COURSE]

Applied Data Science: An Introduction


As the world’s data grow exponentially, organizations across all sectors, including government and not-for-profit, need to understand, manage and use big, complex data sets—known as big data…. more

[ FEATURED READ]

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython


Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored f… more

[ TIPS & TRICKS OF THE WEEK]

Strong business case could save your project
Like anything in corporate culture, the project is oftentimes about the business, not the technology. With data analysis, the same type of thinking goes. It’s not always about the technicality but about the business implications. Data science project success criteria should include project management success criteria as well. This will ensure smooth adoption, easy buy-ins, room for wins and co-operating stakeholders. So, a good data scientist should also possess some qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q: Do you think 50 small decision trees are better than a large one? Why?
A: * Yes! (see the sketch below)
* More robust model (an ensemble of weak learners combines into a strong learner)
* Better to improve a model by taking many small steps than a few large steps
* If one tree is erroneous, its error can be corrected by the others
* Less prone to overfitting
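A quick way to check this claim in practice, assuming scikit-learn is available and using a synthetic dataset rather than any real data, is to compare the cross-validated accuracy of one large tree against an ensemble of 50 small trees:

```python
# Compare a single large decision tree with an ensemble of 50 small trees
# on synthetic data. Exact numbers will vary run to run; the point is the
# comparison, not the specific scores.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=8, random_state=0)

single_tree = DecisionTreeClassifier(max_depth=None, random_state=0)
forest = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0)

print("one large tree :", cross_val_score(single_tree, X, y, cv=5).mean())
print("50 small trees :", cross_val_score(forest, X, y, cv=5).mean())
```

Exact scores depend on the data, but the ensemble typically matches or beats the single deep tree while being far less sensitive to any one tree’s errors.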


[ VIDEO OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership


[ QUOTE OF THE WEEK]

For every two degrees the temperature goes up, check-ins at ice cream shops go up by 2%. – Andrew Hogue, Foursquare

[ PODCAST OF THE WEEK]

#FutureOfData Podcast: Conversation With Sean Naismith, Enova Decisions


[ FACT OF THE WEEK]

Estimates suggest that by better integrating big data, healthcare could save as much as $300 billion a year — that’s equal to reducing costs by $1000 a year for every man, woman, and child.

Sourced from: Analytics.CLUB #WEB Newsletter

Big Data: Are you ready for blast-off?

As Technology of Business begins a month-long series of features on the theme of Big Data, we kick off with a Q&A backgrounder answering some of those basic questions you were too afraid to ask.

Good question. After all, we’ve always had large amounts of data, haven’t we, from loyalty card schemes, till receipts, medical records, tax returns and so on?

As Laurie Miles, head of analytics for big data specialist SAS, says: “The term big data has been around for decades, and we’ve been doing analytics all this time. It’s not big, it’s just bigger.”

But it’s the velocity, variety and volume of data that have merited the new term.

So what made it bigger?

Most traditional data was structured, or neatly organised in databases. Then the world went digital and the internet came along. Most of what we do could be translated into strings of ones and noughts capable of being recorded, stored, searched, and analysed.

There was a proliferation of so-called unstructured data generated by all our digital interactions, from email to online shopping, text messages to tweets, Facebook updates to YouTube videos.

As the number of mobile phones grows globally, so does the volume of data they generate from call metadata, texts, emails, social media updates, photos, videos, and location

And the number of gadgets recording and transmitting data, from smartphones to intelligent fridges, industrial sensors to CCTV cameras, has also proliferated globally, leading to an explosion in the volume of data.

These data sets are now so large and complex that we need new tools and approaches to make the most of them.

How much data is there?

Nobody really knows because the volume is growing so fast. Some say that about 90% of all the data in the world today has been created in the past few years.

According to computer giant IBM, 2.5 exabytes – that’s 2.5 billion gigabytes (GB) – of data was generated every day in 2012. That’s big by anyone’s standards. “About 75% of data is unstructured, coming from sources such as text, voice and video,” says Mr Miles.

And as mobile phone penetration is forecast to grow from about 61% of the global population in 2013 to nearly 70% by 2017, those figures can only grow. The US government’s open data project already offers more than 120,000 publicly available data sets.

Where is it all stored?

The first computers came with memories measured in kilobytes, but the latest smartphones can now store 32GB and many laptops now have one terabyte (1,000GB) hard drives as standard. Storage is not really an issue anymore.

The US National Security Agency has built a huge data centre in Bluffdale, Utah – codenamed Bumblehive – capable of storing a yottabyte of data – that’s one thousand trillion gigabytes

For large businesses “the cost of data storage has plummeted,” says Andrew Carr, UK and Ireland chief executive of IT consultancy Bull. Businesses can either keep all their data on-site, in their own remote data centres, or farm it out to “cloud-based” data storage providers.

A number of open source platforms have grown up specifically to handle these vast amounts of data quickly and efficiently, including Hadoop and NoSQL databases such as MongoDB and Cassandra.

Why is it important?

Data is only as good as the intelligence we can glean from it, and that entails effective data analytics and a whole lot of computing power to cope with the exponential increase in volume.

But a recent Bain & Co report found that, of 400 large companies, those that had already adopted big data analytics “have gained a significant lead over the rest of the corporate world.”

“Big data is not just historic business intelligence,” says Mr Carr, “it’s the addition of real-time data and the ability to mash together several data sets that makes it so valuable.”

Practically anyone who makes, grows or sells anything can use big data analytics to make their manufacturing and production processes more efficient and their marketing more targeted and cost-effective.

It is throwing up interesting findings in the fields of healthcare, scientific research, agriculture, logistics, urban design, energy, retailing, crime reduction, and business operations – several of which we’ll be exploring over the coming weeks.

By analysing weather, soil, topography and GPS tractor data, farmers can increase crop yields

“It’s a big deal for corporations, for society and for each individual,” says Ralf Dreischmeier, head of The Boston Consulting Group’s information technology practice.

Can we handle all this data?

Big data needs new skills, but the business and academic worlds are playing catch up. “The job of data scientist didn’t exist five or 10 years ago,” says Duncan Ross, director of data science at Teradata. “But where are they? There’s a shortage.”

And many businesses are only just waking up to the realisation that data is a valuable asset that they need to protect and exploit. “Banks only use a third of their available data because it often sits in databases that are hard to access,” says Mr Dreischmeier.

“We need to find ways to make this data more easily accessible.”

Businesses, governments and public bodies also need to keep sensitive data safe from hackers, spies and natural disasters – an increasingly tall order in this mobile, networked world.

Who owns it all?

That’s the billion dollar question. A lot depends on the service provider hosting the data, the global jurisdiction it is stored in, and how it was generated. It is a legal minefield.

Facebook’s logo – created using photos of its global users – adorns the wall of a new data centre in Sweden – its first outside the US. But who has rights to all the data?

Does telephone call metadata – the location, time, and duration of calls rather than their conversational content – belong to the caller, the phone network or any government spying agency that happens to be listening in?

When our cars become networked, will it be the drivers, owners or manufacturers who own the data they generate?

Social media platforms will often say that their users own their own content, but then lay claim to how that content is used, reserving the right to share it with third parties. So when you tweet you effectively give up any control over how that tweet is used in future, even though Twitter terms and conditions say: “What’s yours is yours.”

Privacy and intellectual property laws have not kept up with the pace of technological change.

Originally posted via “Big Data: Are you ready for blast-off?”

Source: Big Data: Are you ready for blast-off? by anum

It’s Time to Tap into the Cloud Data Protection Market Opportunity

Until now, most businesses did not have the access or resources to implement more complete data protection, including advanced backup, disaster recovery, and secure file sync and share. In fact, a recent study from research firm IDC found that 70% of SMBs have insufficient disaster recovery protection today. At the same time, a recent Spiceworks survey reported that cloud backup and recovery is the top cloud service that IT Pros plan to start using in the next six months.

The good news is that companies today have more options for data protection than ever before. The cloud makes enterprise-grade backup and disaster recovery solutions accessible and affordable for SMBs–and this translates into a massive market opportunity for service providers.

At Acronis, we believe that service providers are uniquely positioned to tap into the cloud to bring best-in-class data protection services to their customers.

We all know that service providers are experts at providing IT services, including administration, maintenance and customer support. They’ve opened up the door to cloud computing for businesses of all sizes, especially for SMBs.

But, service providers do much more than provide cloud solutions, servers and storage. For example, service providers are constantly improving upon the efficiency and cost-effectiveness of the solutions they deliver, including integrating different services into completely transparent and uniform services for their customers.

Service providers also look for opportunities to continuously enhance their offerings to provide end customers with the best possible solutions–now and in the future. Finally, service providers are the best cost managers in the business–they know how to scale solutions and make them easier to buy and deploy for end users. This relentless focus on cost-effectiveness benefits both their businesses with higher margins and their end customers with better value at a lower cost.

This is why Acronis delivers a complete set of cloud data protection solutions for service providers. We know service providers, and we know what it takes to make them successful. And there is a huge and unmet market need for easy, complete and affordable data protection for small and midsize businesses.

The bottom line: Now’s the ideal time to check out how you can grow your business with the latest solutions in cloud data protection, leveraging highly flexible go-to-market models and support for the broadest range of service provider workloads.

If you’d like to learn more about how Acronis can help you quickly tap into the growing market for cloud data protection services, you’ll find more information about our solutions here.

Read more at: http://mspmentor.net/blog/it-s-time-tap-cloud-data-protection-market-opportunity

Originally Posted at: It’s Time to Tap into the Cloud Data Protection Market Opportunity by analyticsweekpick

Jun 08, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data interpretation

[ AnalyticsWeek BYTES]

>> 8 Best Practices to Maximize ROI from Predictive Analytics by analyticsweekpick

>> Data Driven Innovation: A Primer by v1shal

>> Map of US Hospitals and their Patient Experience Ratings by bobehayes


[ NEWS BYTES]

>> RFx (request for x) encompasses the entire formal request process and can include any of the following: – TechTarget (Under: Sales Analytics)

>> SAP’s Leonardo points towards Applied Data Science as a Service – Diginomica (Under: Data Science)

>> Four ways to create the ultimate personalized customer experience – TechTarget (Under: Customer Experience)


[ FEATURED COURSE]

Pattern Discovery in Data Mining


Learn the general concepts of data mining along with basic methodologies and applications. Then dive into one subfield in data mining: pattern discovery. Learn in-depth concepts, methods, and applications of pattern disc… more

[ FEATURED READ]

Superintelligence: Paths, Dangers, Strategies


The human brain has some capabilities that the brains of other animals lack. It is to these distinctive capabilities that our species owes its dominant position. Other animals have stronger muscles or sharper claws, but … more

[ TIPS & TRICKS OF THE WEEK]

Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle data could lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric. This is the metric that matters the most to your startup. Some advantages of TUM: it answers the most important business question, it cleans up your goals, it inspires innovation and helps you understand the entire quantified business.

[ DATA SCIENCE Q&A]

Q: When you sample, what bias are you introducing?
A: Selection bias:
– An online survey about computer use is likely to attract people more interested in technology than is typical (see the simulation sketch below)

Undercoverage bias:
– Sampling too few observations from a segment of the population

Survivorship bias:
– Observations at the end of the study are a non-random set of those present at the beginning of the investigation
– In finance and economics: the tendency for failed companies to be excluded from performance studies because they no longer exist
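A small simulation can make selection bias tangible; every number below is invented purely for illustration. The idea: people who are more interested in technology are more likely to answer an online survey about computer use, so the surveyed average overstates the population average.

```python
# Toy illustration of selection bias (all numbers invented).
# In the population, average interest in technology is moderate, but
# people with higher interest are more likely to answer an online survey.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=5.0, scale=2.0, size=100_000)   # "tech interest" score

# Probability of responding rises with interest (logistic response model).
p_respond = 1 / (1 + np.exp(-(population - 7.0)))
responded = rng.random(population.size) < p_respond

print("true population mean :", round(population.mean(), 2))
print("online-sample mean   :", round(population[responded].mean(), 2))
# The sample mean comes out noticeably higher than the population mean.
```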


[ VIDEO OF THE WEEK]

@AnalyticsWeek: Big Data Health Informatics for the 21st Century: Gil Alterovitz


[ QUOTE OF THE WEEK]

Data beats emotions. – Sean Rad, founder of Ad.ly

[ PODCAST OF THE WEEK]

#DataScience Approach to Reducing #Employee #Attrition


[ FACT OF THE WEEK]

100 terabytes of data uploaded daily to Facebook.

Sourced from: Analytics.CLUB #WEB Newsletter

Malaysia opens digital government lab for big data analytics

Malaysia today officially launched a government lab to analyse data from across agencies and to test new ways of using the data to improve public services.

The Digital Government Lab will “facilitate various ministries and agencies to have a greater level of analytics of the collected data, in strict adherence to the government’s data security, integrity and sovereignty guidelines”, said Datuk Abdul Wahab Abdullah, Chief Executive of MIMOS, the country’s national ICT research agency.

The lab was set up in January by MIMOS, Modernisation and Management Planning Unit and Multimedia Development Corporation, as part of a wider national Big Data Analytics Innovation Network.

It is already testing ideas to analyse the public mood on a newly introduced tax and on flooding. With the Ministry of Finance, “we are extracting some data related to the newly implemented GST [Goods and Services Tax] by the Malaysian government, and we are looking at the sentiments of the public, extracting data from social network and blogs, so that this information will provide a better reading of the sentiments”, a MIMOS spokesperson told FutureGov.

Another project with the Department of Irrigation and Drainage is “looking at data from sensors and also feedback from the public on social media on flood issues, and others related to irrigation”, he said.

Other agencies testing their ideas at the lab are the Department of Islamic Development and National Hydraulic Research Institute.

The Minister of Science, Technology and Innovation, Datuk Dr Ewon Ebin, said that his ministry will work with the Ministry of Communications and Multimedia to ensure that the lab maintains data security and sovereignty.

The lab will eventually be opened up for use by the private sector, the MIMOS spokesperson said.

Originally posted via “Malaysia opens digital government lab for big data analytics”

Source by anum

Is the Importance of Customer Experience Overinflated?

Companies rely on customer experience management (CEM) programs to provide insight about how to manage customer relationships effectively to grow their business. CEM programs require measurement of primarily two types of variables, satisfaction with customer experience and customer loyalty. These metrics are used specifically to assess the importance of customer experience in improving customer loyalty. Determining the “importance” of different customer experience attributes needs to be precise as it plays a major role in helping companies: 1) prioritize improvement efforts, 2) estimate return on investment (ROI) of improvement efforts and 3) allocate company resources.

How We Determine Importance of Customer Experience Attributes

When we label a customer experience attribute as “important,” we typically are referring to the magnitude of the correlation between customer ratings on that attribute (e.g., product quality, account management, customer service) and a measure of customer loyalty (e.g., recommend, renew service contract). These correlations range from 0.0 (no relationship) to 1.0 (a perfect positive relationship). Attributes that have a high correlation with customer loyalty (approaching 1.0) are considered more “important” than attributes that have a low correlation with customer loyalty (approaching 0.0).

Measuring Satisfaction with the Customer Experience and Customer Loyalty Via Surveys

Companies typically (almost always?) rely on customer surveys to measure both the satisfaction with the customer experience (CX) as well as the level of customer loyalty.  That is, customers are given a survey that includes questions about the customer experience and customer loyalty. The customers are asked to make ratings about their satisfaction with the customer experience and their level of customer loyalty (typically likelihood ratings).

As mentioned earlier, to identify the importance of customer experience attributes on customer loyalty, ratings of CX metrics and customer loyalty are correlated with each other.
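In code, this “importance” calculation is simply a set of correlations between attribute ratings and a loyalty rating. The sketch below uses randomly generated survey data and the pandas library; it illustrates the mechanics only and is not the author’s actual analysis.

```python
# Illustrative "importance" calculation: correlate each CX attribute's
# ratings with a loyalty rating and rank attributes by that correlation.
# The survey data here is randomly generated for the example.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
loyalty = rng.integers(0, 11, size=n)                      # 0-10 likelihood to recommend
survey = pd.DataFrame({
    "product_quality":  loyalty + rng.normal(0, 2.0, n),   # strongly related
    "account_mgmt":     loyalty + rng.normal(0, 4.0, n),   # weakly related
    "customer_service": rng.normal(5, 2.0, n),             # unrelated
    "loyalty":          loyalty,
})

importance = (survey.drop(columns="loyalty")
                    .corrwith(survey["loyalty"])
                    .sort_values(ascending=False))
print(importance.round(2))
```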

The Problem of a Single Method of Measurement: Common Method Variance

The magnitude of the correlations between measures of satisfaction (with the customer experience) and measures of customer loyalty are made up of different components. On one hand, the correlation is due to the “true” relationship between satisfaction with the experience and customer loyalty.

On the other hand, because the two variables are measured using the same method (a survey with self-reported ratings), the magnitude of the correlation is partly due to how the data are collected. This effect is referred to as Common Method Variance (CMV) and has been studied in the social sciences (see Campbell and Fiske, 1959), where surveys are a common method of data collection; the general finding is that the correlation between two different measures is driven partly by the true relationship between the constructs being measured and partly by the way they are measured.

The impact of CMV in customer experience management likely occurs when you use the same method of collecting data (e.g., survey questions) for both predictors (e.g., satisfaction with the customer experience) and outcomes (e.g., customer loyalty). That is, the size of the correlation between satisfaction and loyalty metrics is likely due to the fact that both variables are measured using a survey instrument.

Customer Loyalty Measures: Real Behaviors v. Expected Behaviors

The CMV problem is not really about how we measure satisfaction with the customer experience; a survey is a good way to measure the feelings/perceptions behind the customers’ experience. The problem lies with how we measure customer loyalty. Customer loyalty is about actual customer behavior. It is real customer behavior (e.g., number of recommendations, number of products purchased, whether a customer renewed their service contract) that drives company profits. Popular self-report measures ask for customers’ estimation of their likelihood of engaging in certain behaviors in the future (e.g., likely to recommend, likely to purchase, likely to renew).

Using self-report measures of satisfaction and loyalty, researchers have found high correlations between these two variables. For example, Bruce Temkin has found correlations between satisfaction with the customer experience and NPS to be around .70. Similarly, in my research, I have found comparably sized correlations (r ≈ .50) when looking at the impact of the customer experience on advocacy loyalty (the recommend question is part of my advocacy metric). Are these correlations a good reflection of the importance of the customer experience in predicting loyalty (as measured by the recommend question)? Before I answer that question, let us first look at work (Sharma, Yetton and Crawford, 2009) that helps us classify different types of customer measurement and their impact on correlations.

Different Ways to Measure Customer Loyalty

Sharma et al. highlight four different types of measurement methods. I have slightly modified their four types to order customer loyalty measures from those least susceptible to CMV (coded as 1) to those most susceptible to CMV (coded as 4):

  1. System-captured metrics reflect objective metrics of customer loyalty: Data are obtained from historical records and other objective sources, including purchase records (captured in a CRM system). Example: Computer generated records of “time spent on the Web site” or “number of products/services purchased” or “whether a customer renewed their service contract.”
  2. Behavioral-continuous items reflect specific loyalty behaviors that respondents have carried out: Responses are typically captured on a continuous scale. Example item: How many friends did you tell about company XYZ in the past 12 months? None to 10, say.
  3. Behaviorally-anchored items reflect specific actions that respondents have carried out: Responses are typically captured on scales with behavioral anchors. Example item: How often have you shopped at store XYZ in the past month? Not at all to Very Often.
  4. Perceptually-anchored items reflect perceptions of loyalty behavior: Responses are typically on Likert scales, semantic differential or “agree/disagree scale”. Example: I shop at the store regularly. Agree to Disagree.

These researchers looked at 75 different studies examining the correlation between perceived usefulness (predictor) and usage of IT (criterion). While all studies used perceptually-anchored measures for perceived usefulness (perception/attitude), different studies used one of four different types of measures of usage (behavior). These researchers found that CMV accounted for 59% of the variance in the relationship between perceived usefulness and usage (r = .59 for perceptually-anchored items; r = .42 for behaviorally anchored items; r = .29 for behavioral continuous items; r = .16 for system-captured metrics). That is, the method with which researchers measure “usage” impacts the outcome of the results; as the usage measures become less susceptible to CMV (moving up the scale from 4 to 1 above), the magnitude of the correlation decreases between perceived usefulness and usage.
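The direction of the effect Sharma et al. report can be mimicked with a toy simulation: when both the predictor and the criterion share a common “survey response style” component, their correlation is inflated relative to the correlation with the underlying behavior. The parameters below are arbitrary; this is a sketch of the mechanism, not a re-analysis of their data.

```python
# Toy simulation of common method variance (CMV): arbitrary parameters,
# meant only to show the direction of the effect.
import numpy as np

rng = np.random.default_rng(42)
n = 5000

true_satisfaction = rng.normal(size=n)     # latent satisfaction
response_style    = rng.normal(size=n)     # person-specific survey bias (CMV)

# A survey rating of satisfaction picks up the response style...
rated_satisfaction = true_satisfaction + 0.8 * response_style + rng.normal(scale=0.5, size=n)

# ...and so does a perceptually-anchored loyalty rating, while the
# system-captured behavior (e.g., actual renewals) does not.
rated_loyalty   = 0.5 * true_satisfaction + 0.8 * response_style + rng.normal(scale=0.5, size=n)
actual_behavior = 0.5 * true_satisfaction + rng.normal(scale=1.0, size=n)

corr = lambda a, b: np.corrcoef(a, b)[0, 1]
print("rating vs rated loyalty   :", round(corr(rated_satisfaction, rated_loyalty), 2))
print("rating vs actual behavior :", round(corr(rated_satisfaction, actual_behavior), 2))
# The first correlation is substantially larger, mirroring the CMV pattern.
```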

Looking at research in the CEM space, we commonly see that customer loyalty is measured using questions that reflect perceptually-anchored questions (type 4 above), the type of measure most susceptible to CMV.

Table 1. Descriptive statistics and correlations of two types of recommend loyalty metrics (behavioral-continuous and perceptually-anchored) with customer experience ratings.

An Example

I have some survey data on the wireless service industry that examined the impact of customer satisfaction with customer touch points (e.g., product, coverage/reliability and customer service) on customer loyalty. This study included measures of satisfaction with the customer experience (perceptually-anchored) and two different measures of customer loyalty:

  1. self-reported number of people you recommended the company to in the past 12 months (behavioral-continuous).
  2. self-reported likelihood to recommend (perceptually-anchored)

The correlations among these measures are located in Table 1.

As you can see, the two recommend loyalty metrics are weakly related to each other (r = .47), suggesting that they measure two different constructs. Additionally, and as expected by the CMV model, the behavioral-continuous measure of customer loyalty (number of friends/colleagues) shows a significantly lower correlation (average r = .28) with customer experience ratings compared to the perceptually-anchored measure of customer loyalty (likelihood to recommend) (average r = .52). These findings are strikingly similar to the above findings of Sharma et al. (2009).

Summary and Implications

The way in which we measure the customer experience and customer loyalty impacts the correlations we see between them. As measures of both variables use perceptually-anchored questions on the same survey, the correlation between the two are likely overinflated. I contend that the true impact of customer experience on customer loyalty can only be determined when real customer loyalty behaviors are used in the statistical modeling process.

We may be overestimating the importance (e.g., impact) of customer experience on customer loyalty simply due to the fact that we measure both variables (experience and loyalty) using the same instrument, a survey with similar scale characteristics. Companies commonly use the correlations (or squared correlation) between a given attribute and customer loyalty as the basis for estimating the return on investment (ROI) when improving the customer experience. The use of overinflated correlations will likely result in an overestimate of the ROI of customer experience improvement efforts. As such, companies need to temper this estimation when perceptually-anchored customer loyalty metrics are used.
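The article’s own numbers give a feel for the size of that overestimate: since variance explained is the squared correlation, the perceptually-anchored figure (r = .52) implies roughly three to four times as much explained variance as the behavioral-continuous figure (r = .28).

```python
# Squared-correlation comparison using the average correlations reported above.
r_perceptual = 0.52   # average r with likelihood-to-recommend (perceptually-anchored)
r_behavioral = 0.28   # average r with self-reported number of recommendations

print(round(r_perceptual ** 2, 3))                      # ~0.270 variance explained
print(round(r_behavioral ** 2, 3))                      # ~0.078 variance explained
print(round(r_perceptual ** 2 / r_behavioral ** 2, 1))  # ~3.4x larger
```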

I argue elsewhere that we need to use more objective metrics of customer loyalty whenever they are available. Using Big Data principles, companies can link real loyalty behaviors with customer satisfaction ratings. Using a customer-centric approach to linkage analysis, our company, TCELab, helps companies integrate customer feedback data with their CRM data, where real customer loyalty data are housed (see CEM Linkage for a deeper discussion).

While measuring customer loyalty using real, objective metrics (system-captured) would be ideal, many companies do not have the resources to collect and link customer loyalty behaviors to customer ratings of their experience. Perhaps loyalty measures that are less susceptible to CMV could be developed and used to get a more realistic assessment of the importance of the customer experience on customer loyalty.  For example, self-reported metrics that are more easily verifiable by the company (e.g., “likelihood to renew service contract” is more easily verifiable by the company than “likelihood to recommend”) might encourage customers to provide realistic ratings about their expected behaviors, thus reflecting a truer measure of customer loyalty. At TCELab, our customer survey, the Customer Relationship Diagnostic (CRD), includes verifiable types of loyalty questions (e.g., likely to renew contract, likely to purchase additional/different products, likely to upgrade).

The impact of the Common Method Variance (CMV) in CEM research is likely strong in studies in which the data for customer satisfaction (the predictor) and customer loyalty (the criterion) are collected using surveys with similar item characteristics (perceptually-anchored). CEM professionals need to keep the problem of CMV in mind when interpreting customer survey results (any survey results, really) and estimating the impact of customer experience on customer loyalty and financial performance.

What kind of loyalty metrics do you use in your organization? How do you measure them?

Originally Posted at: Is the Importance of Customer Experience Overinflated?

Jun 01, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Fake data

[ AnalyticsWeek BYTES]

>> RSPB Conservation Efforts Take Flight Thanks To Data Analytics by analyticsweekpick

>> Data Science 101: Interactive Analysis with Jupyter and Pandas by john-hammink

>> Big Data in China Is a Big Deal by anum


[ NEWS BYTES]

>> [Bootstrap Heroes] G-Square brings in a bot and plug-and-play element into analytics – YourStory.com (Under: Sales Analytics)

>> Security Experts Warn Congress That the Internet of Things Could Kill People – MIT Technology Review (Under: Internet Of Things)

>> Grady Health System earns HIMSS Analytics Stage 7 award … – Healthcare IT News (Under: Analytics)


[ FEATURED COURSE]

Process Mining: Data science in Action


Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

The Industries of the Future


The New York Times bestseller, from leading innovation expert Alec Ross, a “fascinating vision” (Forbes) of what’s next for the world and how to navigate the changes the future will bring…. more

[ TIPS & TRICKS OF THE WEEK]

Strong business case could save your project
Like anything in corporate culture, the project is oftentimes about the business, not the technology. With data analysis, the same type of thinking goes. It’s not always about the technicality but about the business implications. Data science project success criteria should include project management success criteria as well. This will ensure smooth adoption, easy buy-ins, room for wins and co-operating stakeholders. So, a good data scientist should also possess some qualities of a good project manager.

[ DATA SCIENCE Q&A]

Q: Provide examples of machine-to-machine communications.
A: Telemedicine
– Heart patients wear specialized monitors that gather information about the state of the heart
– The collected data is sent to an implanted electronic device, which delivers electric shocks to the patient to correct abnormal rhythms

Product restocking
– Vending machines are capable of messaging the distributor whenever an item is running out of stock (a rough sketch follows below)
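As a rough sketch of the vending-machine case, a machine could post a restock request to a distributor’s service when stock runs low. The endpoint, payload fields, and threshold below are invented placeholders, not a real distributor API.

```python
# Hypothetical machine-to-machine restock message; the URL, payload fields,
# and threshold are placeholders, not a real distributor API.
import json
import urllib.request

MIN_STOCK = 3

def report_low_stock(machine_id: str, item: str, remaining: int) -> None:
    """Notify the distributor when an item falls below the restock threshold."""
    if remaining >= MIN_STOCK:
        return
    payload = json.dumps({"machine": machine_id, "item": item,
                          "remaining": remaining}).encode("utf-8")
    req = urllib.request.Request("https://distributor.example.com/restock",
                                 data=payload,
                                 headers={"Content-Type": "application/json"})
    try:
        urllib.request.urlopen(req, timeout=5)
    except OSError:
        # The placeholder endpoint does not exist; in a real deployment this
        # would reach the distributor's ordering service.
        print("would send:", payload.decode())

report_low_stock("VM-042", "sparkling water", remaining=1)
```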


[ VIDEO OF THE WEEK]

Decision-Making: The Last Mile of Analytics and Visualization


[ QUOTE OF THE WEEK]

He uses statistics as a drunken man uses lamp posts—for support rather than for illumination. – Andrew Lang

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with Scott Zoldi, @fico


[ FACT OF THE WEEK]

Estimates suggest that by better integrating big data, healthcare could save as much as $300 billion a year — that’s equal to reducing costs by $1000 a year for every man, woman, and child.

Sourced from: Analytics.CLUB #WEB Newsletter

Will China use big data as a tool of the state?

Since imperial times Chinese governments have yearned for a perfect surveillance state. Will big data now deliver it? On 5 July 2009, residents of Xinjiang, China’s far western province, found the internet wasn’t working. It’s a regular frustration in remote areas, but it rapidly became apparent that this time it wasn’t coming back. The government had hit the kill switch on the entire province when a protest in the capital Ürümqi by young Uighur men (of the area’s indigenous Turkic population) turned into a riot against the Han Chinese, in which at least 197 people were killed.

The shutdown was intended to prevent similar uprisings by the Uighur, long subjected to religious and cultural repression, and to halt revenge attacks by Han. In that respect, it might have worked; officially, there was no fatal retaliation, but in retrospect the move came to be seen as an error.

Speaking anonymously, a Chinese security advisor described the blackout as ‘a serious mistake… now we are years behind where we could have been in tracking terrorists’. Young Uighur learnt to see the internet as hostile territory – a lesson reinforced by the arrest of Ilham Tohti, a popular professor of economics, on trumped-up charges of extremism linked to an Uighur-language website he administered. ‘We turn off our phones before we talk politics’, a tech-savvy Uighur acquaintance remarked.

The Uighur continued to consume digital media, but increasingly in off-line form, whether viewing discs full of Turkish TV series or jihadist propaganda passed on memory sticks. Where once Chinese media reports claimed that arrested Uighur had been visiting ‘separatist’ websites, now they noted drawers full of burnt DVDs and flash drives.

A series of brutal terrorist attacks early in 2014 reinforced the lesson for the Chinese authorities; by driving Uighur off-line they had thrown away valuable data. Last summer, the Public Security University in Beijing began recruiting overseas experts in data analysis, including, I’m told, former members of the Israeli security forces.

In Xinjiang, tightened control means less information, and the Chinese government has always had a fraught relationship with information – private and public. Today, an explosion in available data promises to open up sources of knowledge previously tightly locked away. To some, this seems a shift toward democracy. But technocrats within the government also see it as a way to create a more efficient form of authoritarianism.

In functioning democratic societies, information is gathered from numerous independent sources: universities, newspapers, non-government organisations (NGOs), pollsters. But for the Chinese Communist Party, the idea of independent, uncontrolled media remains anathema. Media is told to ‘direct public opinion’, not reflect it.


In the 2000s, a nascent civil society was gradually forming, much of it digital. Social media, public forums, online whistle-blowing, and investigative journalism offered ways to expose corrupt officials and to force the state to follow its own laws. But in the past three years all such efforts have been crushed with fresh ruthlessness. Lawyers, journalists and activists who were once public-opinion leaders have been jailed, exiled, banned from social media, silenced through private threats, or publicly humiliated by being forced to ‘confess’ on national television. Ideological alternatives to the Party, such as house churches, have seen heightened persecution. And the life was choked from services such as Weibo, the Chinese Twitter-alike, with thousands of accounts banned and posts deleted.

Yet this has left Beijing with the same problem it has always faced. From the beginning, central government has tried simultaneously to gather information for itself and to keep it out of the hands of the public.

A vast array of locals, from seismological surveyors to secret police services, gathered data for the government, but in its progress through the hierarchies of party and state the data inevitably became distorted for political and personal ends. Now some people in government see technology as a solution: a world in which data can be gathered directly, from bottom to top, circumventing the distortions of hierarchy and the threat of oversight alike. But for others, even letting go to the minimal degree necessary to gather such data presents a threat to their own power.

Ren (a pseudonym), a native Beijinger in his late 20s, spent his college years in the West fervently defending China online. But, he now says: ‘I realised that I didn’t know what was going on, and there are so many problems everywhere.’ Back in China, and working for the government, he sees monitoring social media as the best way for the government to keep abreast of and respond to public opinion, allowing a ‘responsible’ authoritarianism. Corrupt officials can be identified, local problems brought to the attention of higher levels, and the public’s voice heard. At the same time, data-analysis techniques can be used to identify worrying groupings in certain areas, and predict possible ‘mass group incidents’ (riots and protests) before they occur.

‘Now that we’ve taken out the “big Vs”,’ Ren told me, ‘we shouldn’t worry about ordinary people speaking.’ ‘Big Vs’ are famous and much-followed ‘Verified’ users on Weibo and other social media services; celebrities, but also ‘public intellectuals’ who have been systematically eliminated over the past three years. With potential rallying points such as opinion leaders or alternative ideologies crushed, the government can view the seething mass of public grievances as a potential source of information, not a direct challenge.

The central government is acutely aware of how little it knows about the country it rules, as fragmented local authorities contend to bend data to their own ends. For instance, the evaluation of officials’ performance by their superiors is, formally, deeply dependent on statistical measurements, predominantly local gross domestic product (GDP) growth. (Informally, it depends also on family connections and outright bribery.) As a result, officials go to great lengths to juke the stats. As Li Keqiang, now the Chinese premier, told a US official in 2007 when he was Party Secretary of Liaoning Province, in a conversation later released by WikiLeaks, GDP figures are ‘man-made’ and ‘for reference only’.

Li said he relied, as many analysts do, on proxy data that’s far harder to fake. To measure growth in his own province, for instance, he looked at electricity, volume of rail cargo and disbursed loans. But he also relied on ‘official and unofficial channels’ to find information on the area he ran, including ‘friends who are not from Liaoning to gather information [I] cannot obtain myself.’
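The proxy approach Li described is easy to mimic in code. The weights below are assumptions made for the sketch (popular renditions of a “Li Keqiang index” weight bank lending, electricity use and rail freight, but no formula is given in the article), and the growth figures are hypothetical.

```python
# Illustrative proxy-growth index built from hard-to-fake indicators.
# The weights are assumptions for the sketch, not an official formula.
WEIGHTS = {"electricity_growth": 0.40, "rail_freight_growth": 0.20, "loan_growth": 0.40}

def proxy_growth_index(indicators: dict) -> float:
    """Weighted year-on-year growth of the proxy indicators, in percent."""
    return sum(WEIGHTS[name] * value for name, value in indicators.items())

# Hypothetical year-on-year growth figures for one province, in percent.
print(round(proxy_growth_index({"electricity_growth": 6.2,
                                "rail_freight_growth": -1.5,
                                "loan_growth": 9.0}), 2))
```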

Li’s dilemma would have been familiar to any past emperor. China’s rulers have always struggled to get good data, especially in the countryside. The imperial Chinese state did its best to make its vast and diverse population legible. Household registration systems, dating back to China’s ancient precursor kingdoms, tried to monitor subjects from birth to death. Government officials trudged their way to isolated hamlets across mountains, jungles and deserts. But at the same time, local leaders reported pleasant fictions to the capital to cover their backs.

The People’s Republic inherited these problems, but added to them an obsession with statistics, acquired from the Soviets. Communism was ‘scientific’, and so the evidence had to be manufactured to support it. Newspapers in the 1950s included paragraphs of figures about increased production and national dedication. Chinese reporters still cram unnecessary (and often fictitious) statistics into stories. (‘The new factory has an area of 2,794 square meters.’) ‘According to statistics’ is one of the most overused phrases in mainland writing.

All of this has made China a society in which real information is guarded with unusual jealousy, even within the government. For decades, even the most innocuous data was treated like a state secret. Even the phone numbers of government departments were given out only to a privileged few.

The government in Beijing is well aware that information it receives from below is mangled or invented on the way up. The National Bureau of Statistics (NBS), which chiefly manages industrial data, frequently demands more direct reporting; it has repeatedly called for businesses to send their information directly to the NBS, and has begun naming and shaming firms that don’t do so, as well as local authorities that it catches fixing numbers. In September 2013, for instance, it reported on its website that a Yunnanese county had inflated its industrial growth four-fold. But the NBS is largely ‘helpless’, a junior official at a more powerful body smugly told me, lacking the internal clout to enforce its demands.

One corrective has been sudden descents by higher authorities for ‘inspection tours’. But these are usually anticipated and controlled by local officials who have long since mastered the Potemkin arts. Another longstanding solution was the petitioning system, first institutionalised in the seventh century AD. It lets individuals circumvent local officials and present their plea for justice directly to higher authorities, or even directly to the capital. The system is still in place, handling millions of requests a year. But it has never worked well: petitioners are more likely to be branded as troublemakers, beaten up, or imprisoned than to have their information reach anyone of note. Part of the problem is that one of the metrics used to measure officials is the number of petitioners their district produces, the theory being that good governance produces fewer complaints, and so suppressing petitioners has been incentivised.


The constant interference of middlemen is why some in central government are so excited by the possibility of gathering data directly. Take the contentious issue of population. Incentives to distort information cut two ways; under the one‑child policy, rural families often try to avoid reporting births at all, but rural authorities have a strong incentive to over-report their population, since they receive size-linked benefits from the centre. Urban areas, meanwhile, have a strong incentive to under-report population figures, as they’re supposed to be limiting the speed of urbanisation to controllable levels. Beijing’s official population is 21.5 million, but public transport figures suggest the real figure might be 30-35 million.

In theory, China’s surveillance state already generates massive amounts of personal data that could provide the government with valuable information. The ID card, now radio-frequency-based, is central to Chinese citizens’ lives, required everywhere from banks to hospitals. A centralised database lets ordinary people check ID numbers against names online and confirm identities. But individual transactions with the ID card often go unrecorded unless the Public Security Bureau (PSB) – essentially, the local police station – has already taken an interest in you. And so the files of perceived dissidents are thick, but the records of ordinary life are thin. Even if central agencies go looking for the information, it is distorted as it passes up from municipal to provincial PSBs.

Despite the vast amounts of data produced within the government, Chinese scientists and officials often find themselves turning to the same sources as Western ones. They’ve seized on projects from abroad that demonstrate how analysis could potentially map population mobility through mobile-phone usage. The mass of consumer data produced by the online shopping services run by the $255 billion Alibaba group is another huge bonanza. Now smartphones produce more of the information that the government needs than secret policemen do.

In and of itself, China’s search for data is morally neutral. As the US political scientist James Scott points out in Seeing Like a State (1998), population data can equally be used for universal vaccinations or genocidal round-ups.

If big data is used by China’s central government to identify corrupt officials, pinpoint potential epidemics and ease traffic, that can only be laudable. Better data would also help NGOs seeking to aid a huge and complex population, and firms looking to invest in China’s future. The flow of data could circumvent vested interests and open up the country’s potential. For Professor Shi Yong, deputy director of the Research Center on Fictitious Economy and Data Science in Beijing, this is a moral issue, not just a question of governance. ‘The data comes from the people,’ he said strongly, ‘so it should be shared by the people.’

Most people in China don’t want to protest against government. They want to know where the good schools are, how clean the air is, and what mortality rates are at local hospitals. Shi returned to China after spending two decades at universities in the US because he was excited by the possibilities of China’s growing information society.

‘Let’s say I want to move to a small city here,’ he told me. ‘I want to know school districts, rent, health: we don’t have this information easily available. Instead, people use personal contacts to get it.’ Shi says that there’s huge resistance to the idea of open data, from within the government and even more from businesses. ‘They might want to protect the way they run their business, they may want to hide something.’ One of his current projects is working with the People’s Bank of China (PBOC) on establishing a nationwide personal credit‑rating system.

‘Actually,’ Shi told me, ‘they have two databases: one for personal information and one for companies’ information, and they wanted us to work on both. But I said no, we would only work on the first. This data is very beautiful! Better than the American data, because all the other banks must send the information directly to the PBOC, the central bank, every day.’ The company data, in contrast, was bad enough to be unworkable. ‘You know garbage in, garbage out? With data analysis, small garbage in, big garbage out.’

Shi highlighted the ways in which the internet had already opened up the provinces for the central government. ‘Look at the PX protests,’ he said, pointing to the local outrage in August 2011 in Dalian and elsewhere against factories producing the chemical paraxylene (PX). ‘Two decades ago, that would have gone nowhere. But this time, the higher authorities took notice of it.’

Small injections of information have already had a palpable effect in China. Air pollution comes in two main forms: relatively large particles called PM 10, and relatively small ones called PM 2.5. For years, Chinese cities published only PM 10 figures, and further skewed the statistics by drawing selectively on monitors in less polluted areas. But after independent monitors, including the US Embassy in Beijing, began publishing their own hourly PM 2.5 figures online, and those numbers spread rapidly through social media, public pressure eventually forced a shift in official policy.
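
To make the distortion concrete, here is a minimal, purely illustrative sketch in Python; the station names and readings are invented, and no real monitoring network is implied. It simply shows how much a city-wide average improves when only the cleaner stations are counted.

```python
# Hypothetical hourly PM 2.5 readings (micrograms per cubic metre) from four
# invented monitoring stations in one city.
pm25_readings = {
    "suburban_park": 35,
    "university_campus": 48,
    "ring_road_junction": 180,
    "industrial_district": 220,
}

# Suppose officials report only the two cleanest stations.
reported_stations = ["suburban_park", "university_campus"]

reported_avg = sum(pm25_readings[s] for s in reported_stations) / len(reported_stations)
actual_avg = sum(pm25_readings.values()) / len(pm25_readings)

print(f"reported average: {reported_avg:.0f} µg/m³")  # 42
print(f"actual average:   {actual_avg:.0f} µg/m³")    # 121
```

The gap between those two numbers is the space in which independently published figures, such as the embassy’s hourly readings, changed the public conversation.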

The crucial issue is who gets to see and to use the data. If it’s limited to officials, however pure-minded their intentions, all it will do is reinforce the reach of the state. China’s strong data protection and privacy laws function primarily not to protect citizens from state intrusion, but to shield officials and businessmen from public scrutiny. Resistance to opening up officials’ property registration details is extremely fierce.

Even if opened up, this information means nothing without tools to find it. In China, much of that searching is filtered through the web services and search engine Baidu, which is based in Beijing and commands three-quarters of search revenue on the mainland. Like much Chinese online innovation, Baidu profited from the government’s fears of foreign firms, which created a walled garden in which domestic products thrived. After Google announced it would cease censoring searches in China in 2010, the US giant was effectively blocked on the mainland, its share of searches falling from 36 per cent in 2009 to 1.6 per cent in 2013. But Baidu had to fight off internal competitors too, including ChinaSo, the search engine created last year by the merger of the People’s Daily newspaper and the Xinhua news agency, both state-run.

Baidu recently announced that it would launch a big-data engine to allow the public to search and analyse its available data. The firm already works with the Ministry of Transport, using data drawn from the search results on its map service to predict travel trends and help manage traffic. In a project ‘inspired by’ Google Flu Trends, it’s also working with health authorities to predict epidemic outbreaks.

Baidu is widely criticised for co‑operating with the authorities over censorship, and for its dependence on paid advertising, which puts the highest-paying companies at the top of search results. That’s why, as Ren explained: ‘If you search for public opinion (minyi), you get two pages of car results’ – minyi uses the same characters as an automobile brand.

Yet the firm also puts up some quiet, informal resistance to government intrusion. It maintains less personal information on users than Google does, for instance, partly because it has fewer integrated services, but it also wipes its own records of search histories far more frequently than Western firms. Insiders say that in meetings with the authorities, Baidu plays an active role in speaking up for greater online freedoms.

That might be why Baidu isn’t popular among many in government. ‘The Party’s publicity department invited Zhou Xiaoping [a young, ultra-nationalist blogger] to speak recently,’ Ren told me. ‘Much of the speech was a rant against Baidu, how they were “rightists” [pro-US, civil rights, and free markets]. Do you know, he said, that if you search “police brutality” on Baidu, you get results about China? Why are the results not about the US, he asked. He got rounds and rounds of applause.’

Whatever measures some firms take, intrusion by the state is hard to resist. A draconian new draft of a national security law, likely to be introduced this year, specifies that the state has full access to any data it demands – already the case in practice – and that any foreign firm working in China must keep all its Chinese data inside the country. It also envisages extensive camera networks and the use of facial-recognition software on a vast scale.

Shi described to me how personal banking and credit information ‘is being used as part of the anti-corruption campaign to identify the networks of corrupt officials’, who in China often hide their graft – whether it’s property or cash – by putting it in the names of friends or family. Using data analysis, Shi suggested, the Party’s investigators could root out such previously opaque networks.
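
As a purely hypothetical illustration of the kind of network analysis Shi describes (not a description of any real investigative system), the sketch below builds a small graph of invented money transfers and surfaces everyone within two hops of a flagged official. It assumes the open-source networkx library; all names and amounts are made up.

```python
import networkx as nx

# Each edge is an invented account-to-account transfer (amount in RMB).
transfers = [
    ("official_A", "brother_of_A", 500_000),
    ("brother_of_A", "shell_firm_1", 480_000),
    ("official_A", "college_friend", 200_000),
    ("unrelated_citizen", "corner_shop", 90),
]

G = nx.DiGraph()
for src, dst, amount in transfers:
    G.add_edge(src, dst, amount=amount)

# Treat the graph as undirected and collect everyone reachable within two
# hops of the flagged official: candidate members of the network of friends
# and relatives who may be holding assets on the official's behalf.
flagged = "official_A"
within_two_hops = nx.single_source_shortest_path_length(
    G.to_undirected(), flagged, cutoff=2
)
suspect_network = sorted(node for node in within_two_hops if node != flagged)

print(suspect_network)  # ['brother_of_A', 'college_friend', 'shell_firm_1']
```

The same traversal works just as well when the flagged node is a dissident rather than an official, which is exactly the double edge the next paragraph describes.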

Identifying and targeting friends and family, however, is also a technique that the Chinese state has traditionally used against dissidents and whistleblowers. In earlier times, ideological deviance could cause a man’s entire family to be persecuted, or even executed. Even today, the threat of children forced out of school or spouses fired from jobs is part of the toolset deployed against ‘troublemakers’. In Xinjiang, meanwhile, the network-analysis techniques the authorities use to identify terrorists are also deployed against peaceful independence activists, academic dissidents such as Tohti (whose students were marched out to testify against him), and Islamic teachers.

When I asked Shi about the increasing discussion in the West over government surveillance, he suggested that it would come in time in China. ‘We’re not at that stage yet,’ he said. ‘Right now, we’re just setting up the basic infrastructure. In time, we’ll have the kinds of legal protections that developed countries do.’

That might happen. But I’ve been hearing from well-meaning people ever since I came to China more than a decade ago that the rule of law is right around the corner. The corner’s still there. But now it has a CCTV camera on it.

James Palmer is a British writer and editor who works closely with Chinese journalists. His latest book is The Death of Mao (2012). He lives in Beijing

Source: Will China use big data as a tool of the state?