Janet Amos Pribanic, Chief Operating Officer, John Daniel Associates, Inc.
Business Analytics is changing rapidly. Traditional BI is being challenged by the rate at which we are not only collecting data but also seeking to leverage that data for business advantage.
A recent survey of over 40 technology vendors' clients by one of the largest IT research firms showed that, ultimately, the customer experience matters. The value delivered to the customer is what matters, always! So, how do we get there?
Even if not perfect, get it in the hands of the business fast
When we evaluate successful analytics customers, many of them started with a very inexpensive (read: free trial) solution and leveraged that experience to build a successful solution and architecture. What better way to learn than to fail (or partially fail) and take that experience forward? The term "fail" here does not necessarily mean a solution has been built and thrown away. It means that what has been learned via a trial process or prototype has given us valuable insight into what is most meaningful for moving forward with analytics and a successful analytics architecture. A functional architecture, or methodology, is the best place to start. Here are the 5 steps:
Identify the business problem we are out to solve with data (analytics)
Gather the data
Build the model to support the business
View and explore the data
Deploy the operational insights
And note: this is an iterative process. There is value in getting this out there even if it is not exactly right. Get it out there fast; you will gather feedback sooner, and the business will gain better insight because they can see the results and take corrective action faster. Disparities and incorrect business rules are also exposed faster, and with that exposure the organization can take action. Powerful and cheap architectures, like Hadoop, enable you to take in massive amounts of data and store it very inexpensively. Now you are prepared to take on strategic business analytics.
Letâs look at a possible example. You have identified and gathered data around a problem in your supply chain. After gathering the data, the next move is to explore it. You sample the data in the data set, test it and analyze it. You develop hypotheses which are then tested. For example, you might do analysis to figure out which two or three vendors are late in deliveries, resulting in customer satisfaction issues.
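As a sketch of this exploration step, the late-vendor analysis might start with something as simple as the following. The vendor names, delivery records and the 2-day lateness threshold are all invented for illustration:

```python
# Hypothetical delivery records: (vendor, days_late) pairs standing in for
# real supply-chain data. All names and numbers are illustrative assumptions.
deliveries = [
    ("ABC Company", 5), ("ABC Company", 7), ("ABC Company", 0),
    ("Acme Corp", 0), ("Acme Corp", 1),
    ("Delta Supply", 6), ("Delta Supply", 4),
]

def late_rate_by_vendor(records, threshold_days=2):
    """Fraction of each vendor's deliveries arriving more than threshold_days late."""
    totals, late = {}, {}
    for vendor, days_late in records:
        totals[vendor] = totals.get(vendor, 0) + 1
        if days_late > threshold_days:
            late[vendor] = late.get(vendor, 0) + 1
    return {v: late.get(v, 0) / n for v, n in totals.items()}

rates = late_rate_by_vendor(deliveries)
# Flag the worst offenders (assumed cutoff: more than half of deliveries late).
flagged = sorted((v for v, r in rates.items() if r > 0.5),
                 key=rates.get, reverse=True)
```

From here, each flagged vendor becomes a hypothesis to test against customer satisfaction data.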
After you determine why those vendors are not performing, you operationalize the insights you've gained, building them into your business logic and workflows. If ABC Company is consistently late on deliveries within the supply chain and triggers the conditions that map to the supply chain model's challenge, begin corrective action before additional clients are affected.
Once insights are operationalized, it is important to close the model feedback loop. The models you build could (and likely will) change over time; the factors that caused the supply chain challenge this year may not hold a year from now, as market and other factors change. Test your hypotheses to see if the model still holds or needs adjustment. For example, as new forms of vendor interaction are introduced, the supply chain variables may also change over time.
Advanced analytics should become the mainstay of your secret sauce. The point is to use all of your resources effectively: data modelers and the many people with business domain knowledge.
Get analytics out there fast, even if not perfect: it's a counterintuitive way to make rapid progress. Take in only as much data as you need, analyze and create operational models, and refine those models. Then take those models and begin to build your functional architecture.
To deliver successful analytics, you will also need to plan and staff correctly. To do advanced analytics at scale, there are two approaches from a staffing perspective:
Hire lots of expensive data modelers
Leverage people who are technically savvy in your company with strong business acumen
The first way is certainly challenging. In fact, it can't scale, because there is not an abundant supply of data modelers to hire, even if you had unlimited resources.
The second way takes a different and proven successful approach. Do not be constrained by the technology that enables analytics but instead focus on the strength of logic that powers analytics.
In practical terms, ask data modelers to partner with you to research ways to solve business problems, and then have them build consumable models that solve those problems. With models that serve as templates, businesspeople can perform analytics on their own, working with the models the data modelers developed, and then extend those models to new areas as business logic dictates.
Your company will significantly benefit by this secret sauce. Choose to make it part of your core competency!
Keeping Biases in Check During the Last Mile of Decision Making
Today, a data-driven leader, data scientist, or data expert is constantly put to the test, helping the team solve problems with their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which adds a bias to our judgement and can taint our suggestions. Most skilled professionals understand and handle these biases well, but in a few cases we give in to tiny traps and find our judgement impaired. So it is important to keep intuition bias in check when working on a data problem.
[ DATA SCIENCE Q&A]
Q: What does NLP stand for?
A: * Interaction with human (natural) and computers languages
* Involves natural language understanding
– Machine translation
– Question answering: what's the capital of Canada?
– Sentiment analysis: extract subjective information from a set of documents, identify trends or public opinions in the social media
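To make the sentiment-analysis item concrete, here is a minimal lexicon-based scorer. It is a toy sketch only: the word lists are invented, and real NLP work would use a proper sentiment lexicon or a trained model.

```python
# Toy sentiment lexicons -- invented for illustration, not a real NLP resource.
POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def sentiment(text):
    """Return +1 (positive), -1 (negative) or 0 (neutral) by counting
    positive vs. negative words after stripping basic punctuation."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)
```

Applied to a stream of social-media posts, such a scorer (or a far more robust version of it) is what lets you extract trends in public opinion.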
I have been doing some work on how investment professionals can use customer feedback as part of their valuation process. I include a case study of an investment firm that used customer feedback to help confirm the valuation of the target company (it did), and also where to start in terms of managing the business to secure its future.
Investment professionals take a huge risk when they purchase or make a significant investment in a business. To identify and minimize their investment risk, these professionals conduct due diligence of the business. Due diligence is an investigation or audit of a potential investment. Investors typically examine such matters as the business' finances, proprietary information, employees, insurance, equipment and property, and litigation claims, to name a few. (Entrepreneur.com offers these due diligence questions and a downloadable checklist. Forbes has their own checklist. Inc.com has their own checklist and even offers some advice for conducting due diligence.)
While some due diligence efforts include an examination of customer data, they typically focus on identifying the number and types of customers (e.g., where they are located, size). Even in cases where customer feedback is included in the due diligence process, the business sellers hand-pick a few customers to be interviewed by the buyer, resulting in potentially biased information about the health of the business and inflating its perceived value. If customer feedback is to be of value, a more rigorous approach is needed. In this post, I will outline a more in-depth approach to using systematic customer feedback in the due diligence process.
1. Ask a Representative Sample of Customers
When soliciting customer feedback, take steps to ensure the feedback is representative of all possible feedback from the population of customers. Ask for a complete customer list from the seller and randomly select the customers you want to give you feedback. If you are particularly concerned about specific customer segments, use stratified random sampling (random selection occurs within each customer segment) to ensure you get enough respondents for the segments in question.
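A quick sketch of stratified random sampling using only the standard library. The customer records, segment names and per-segment sample size below are illustrative assumptions, not part of any real customer list:

```python
import random

# Invented customer base with an assumed "segment" field for stratification.
customers = (
    [{"id": i, "segment": "enterprise"} for i in range(20)]
    + [{"id": i, "segment": "smb"} for i in range(20, 100)]
)

def stratified_sample(records, per_segment, seed=0):
    """Randomly select up to per_segment customers within each segment."""
    rng = random.Random(seed)  # fixed seed for reproducibility in this sketch
    by_segment = {}
    for rec in records:
        by_segment.setdefault(rec["segment"], []).append(rec)
    return {seg: rng.sample(recs, min(per_segment, len(recs)))
            for seg, recs in by_segment.items()}

sample = stratified_sample(customers, per_segment=10)
```

Sampling within each segment guarantees that even small but important segments (here, "enterprise") are represented in the survey.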
While a census is unnecessary to get a reliable picture of the entire customer base, I recommend that, when possible, you invite all customers to provide feedback. For B2C companies, surveys need to be targeted to the buyer of the products/services. For B2B companies, due to the nature of the buying process, surveys need to be targeted to all parties who are directly and indirectly involved in buying the company's products/services (e.g., decision makers and decision influencers).
Verify the quality of the sample of customers by comparing the demographic make-up of the sample to that of the entire customer base. The extent to which the sample is representative of the population will determine the quality of the inferences you are able to make about the population. To make any meaningful conclusions about the value of the target company, the customers you ask need to be a representative sample of the population of customers.
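The representativeness check above can be as simple as comparing segment proportions in the sample against the full customer base. The segment counts and the 5-point tolerance below are assumptions for illustration:

```python
# Invented demographic make-up of the population and of the survey sample.
population = {"enterprise": 200, "smb": 800}
sample = {"enterprise": 22, "smb": 78}

def proportions(counts):
    """Convert raw counts into shares of the total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

pop_p, sam_p = proportions(population), proportions(sample)
# Flag segments whose sample share deviates from the population share
# by more than an (assumed) 5-percentage-point tolerance.
skewed = [seg for seg in pop_p if abs(pop_p[seg] - sam_p[seg]) > 0.05]
```

If `skewed` is non-empty, inferences about those segments should be treated with caution, or the sample re-weighted.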
2. Ask about Customer Loyalty
The value of a company is directly impacted by customer loyalty. The greater the customer loyalty, the higher the company value. Customers can increase the value of a company by engaging in three different types of customer loyalty behaviors. As is illustrated in the Customer Loyalty Measurement Framework, these three types of customer loyalty include: 1) retention loyalty (valuable customers stay around for a long time), 2) advocacy loyalty (customers tell family/friends about the company to drive new customer growth) and 3) purchasing loyalty (customers increase their share-of-wallet to drive average revenue per user/customer (ARPU) growth).
Because customers can exhibit their loyalty to a company in different ways, you need to ask the right loyalty questions for your specific needs. Does the target company have a history of high defection rates? If so, ask customers about their intention of staying. Does the target company have stagnant ARPU growth? If so, ask customers about their intention of buying different products. Does the target company historically have low new customer growth? If so, ask customers about their intention of recommending the company to their friends.
As a starting point, consider including a loyalty question for each of the three general types of loyalty behaviors: retention, advocacy and purchasing (see the RAPID loyalty approach):
Likelihood to switch providers (retention)
Likelihood to renew service contract (retention)
Likelihood to recommend (advocacy)
Overall satisfaction (advocacy)
Likelihood to purchase different solutions from <Company Name> (purchasing)
Likelihood to expand use of <Company Name’s> products throughout company (purchasing)
3. Ask about Customer Experience (CX)
Customer loyalty is impacted by the customer experience. According to Wikipedia, the customer experience (CX) is the sum of all experiences a customer has with a supplier of goods and services over the duration of that relationship. Customers who are satisfied with their experience with the supplier stay longer, recommend more, and buy more from the supplier compared to customers who are less satisfied with their experience.
Ask customers about their experience with the company. While you could ask customers literally hundreds of CX questions about each specific aspect of their experience, research shows that you only need a few CX questions to understand what drives their loyalty. For example, ask customers how satisfied they are with the target company across each of these areas:
Ease of doing business
Communications from the Company
Future Product/Company Direction
4. Ask about Relative Performance
Companies do not perform in a vacuum; competitors are vying for the same customers and limited prospects as the target company you are purchasing. If the target company has plenty of competitors in its space, you need to understand where you rank relative to the competition. After all, top ranked companies receive greater share of wallet compared to their bottom-ranked competitors. All things equal, a company that is ranked the lowest is less valuable than a company that is ranked the highest.
Ask customers about how the company compares to its competitors. Toward that end, the Relative Performance Assessment (RPA), a competitive analytics solution, helps investors understand the relative ranking of the target company and identify ways to increase their ranking, and consequently, increase share of wallet. In its basic form, the RPA method requires two questions:
What best describes Company’s performance compared to the competitors you use?
Please tell us why you rank Company’s performance the way you do. This question allows each customer to indicate the reasons behind his/her response about your ranking. The content of the customers’ comments can be examined to identify underlying themes to help diagnose the reasons for high or low rankings.
To understand the value of the company you are purchasing, you need to know how you measure up to the competition. More importantly, after the purchase, the RPA will help you know what you need to do to improve your ranking in the industry.
5. Ask about Company-Specific Issues
Investors may have a need to ask additional questions that are specific to the target company. These questions, driven by specific business needs, can include demographic questions (if not included in their CRM system), open-ended questions, and targeted questions. Typical questions in B2B relationship surveys include:
Time as a customer
Job function (e.g., Marketing, Sales, IT, Service)
Level of influence in purchasing decisions of <Company Name> solutions (Primary decision maker, Decision influencer, No influence)
Include one or two open-ended questions that allow respondents to provide additional feedback in their own words. Depending on how the questions are phrased, customers’ remarks can provide additional insight about the health of the customer relationship. Text analytics help you understand both the primary content of words as well as the sentiment behind them. To understand potential improvement areas, a question I commonly use is:
If you were in charge of <Company Name>, what improvements, if any, would you make?
Customer relationship surveys can be used to collect feedback about specific topics that are of interest to executive management. Give careful consideration to asking additional questions. As with any survey question, you must know exactly how the data from the questions will be used to improve customer loyalty. Some popular topics of interest include measuring 1) perceived benefits of solutions and 2) perceived value. Some sample questions are:
How much improvement did you experience in productivity due to <Company Name’s> solutions?
Satisfaction with price of the solution given the value received
Next, I will present an example of how one investment firm used customer feedback to help in their due diligence process.
An investment firm wanted to expand their portfolio of companies by purchasing an existing B2B company. As part of the due diligence process, the investment firm worked with the target company to acquire their customer email list for a Web-based customer survey. The investment firm used the Customer Relationship Diagnostic (CRD) to collect customer feedback. The CRD is a brief survey that asks customers about different types of customer loyalty, satisfaction with general CX touch points, relative performance and a few company-specific questions.
The response rate for the survey was about 70%. Respondents consisted primarily of decision makers and decision influencers (~80%), and most were Managers, Directors or Executives (~70%).
Case Study: Loyalty Results
Customer loyalty results are located in Figure 1. As you can see, customers reported moderate levels of customer loyalty for most of the loyalty questions (e.g., advocacy and retention). For purchasing loyalty, customers reported low likelihood of buying different products and low likelihood of expanding the use of the target company’s solutions.
Case Study: CX Results
Results of the CX ratings can be found in Figure 2. Based on the survey results, the customers were moderately satisfied with their experiences across the touch points, except for Communications from the Company and Future Product/Company Direction.
Between 20% and 50% of the customers said they were dissatisfied with each of the seven customer touch points.
Case Study: Relative Performance Assessment Results
Results of the Relative Performance Assessment ratings are located in Figure 3. As you can see, only 42% of the customers indicated that the company was better than the competition. Almost 60% of the customers indicated that the company was the same as or worse than most other competitors.
After re-scaling the values of the 5-point rating scale (1 = worst to 5 = best) to a 0-100 scale, I estimated that the target company falls roughly at the 54th percentile in their industry; that is, the company’s performance is typical when compared to their competition.
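The re-scaling step is a simple linear transformation; here is a small sketch with made-up ratings (the actual survey data is not shown in this post):

```python
def rescale_to_100(rating, lo=1, hi=5):
    """Map a rating on a lo..hi scale linearly onto 0..100."""
    return (rating - lo) / (hi - lo) * 100

# Invented 5-point ratings standing in for the survey responses.
ratings = [3, 4, 2, 3, 5, 3, 3, 4, 2, 3]
scores = [rescale_to_100(r) for r in ratings]
mean_score = sum(scores) / len(scores)  # a 0-100 summary of relative performance
```

A mean score near 50 on the 0-100 scale corresponds to "middle of the pack" performance, which is how the case-study company was interpreted.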
Case Study: Determining Dollar Value of Loyalty
To estimate the expected revenue gains/losses of the target company, I worked with the investment firm to translate the customer loyalty ratings into a dollar value. We employed subject matter experts (SMEs) and analyzed existing financial reports of the target company to arrive at our best guess of expected annual revenue gains through new customers (~$300k) and existing customers (through purchasing new/different products, ~$160k), and estimated the annual revenue at risk due to churn (customers who stop using the company, ~$450k).
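A sketch of how loyalty ratings can be translated into dollar figures. Every input below (ARPU, customer count, churn and growth rates) is an assumption chosen so the outputs land near the case-study figures; in practice these inputs come from SMEs and the target's financial reports:

```python
# All inputs are illustrative assumptions, not the case study's actual data.
arpu = 1_000              # assumed average annual revenue per customer, in $
customers_total = 1_500   # assumed size of the customer base

p_churn = 0.30            # share of customers whose ratings suggest defection
p_new_growth = 0.20       # expected new-customer growth implied by advocacy
p_expand = 0.16           # share likely to buy additional products
expand_uplift = 2 / 3     # extra spend per expanding customer, as a fraction of ARPU

revenue_at_risk = customers_total * p_churn * arpu            # churn exposure
new_customer_gain = customers_total * p_new_growth * arpu     # advocacy-driven growth
expansion_gain = customers_total * p_expand * arpu * expand_uplift  # purchasing loyalty
```

With these assumed inputs the three figures come out near the ~$450k at risk, ~$300k from new customers and ~$160k from expansion cited above.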
Case Study: The Decision
Overall, the customer feedback confirmed the valuation of the company. While the target company was perceived to be in the middle of the pack in their industry (ranked at the 54th percentile) and the future direction of their products/company appeared dismal (50% are dissatisfied), investors believed they had the management team that could address these shortcomings. The investment company decided to buy the company.
Case Study: Where to Make Improvements
The investors now became the business owners and, consequently, needed to manage the business to secure its future. The survey results were analyzed to help decide where best to allocate resources in areas that will improve customer loyalty (and revenue) while minimizing improvement costs.
Using driver analysis on the existing data, the investment firm found that there were three key drivers of customer loyalty: 1) product quality, 2) communications from the company and 3) future product/company direction. Again, using SMEs, we were able to estimate the ROI for improving each of the three key drivers. It turns out that the greatest ROI for CX improvements would be achieved by improving communications from the company and future product/company direction.
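Driver analysis can take several forms; a minimal version correlates each CX area's satisfaction ratings with overall loyalty and ranks the areas by correlation strength. The ratings below are invented sample data, and a real analysis would more likely use regression:

```python
import statistics

# Invented per-respondent ratings (0-10): overall loyalty plus two CX areas.
loyalty = [7, 8, 4, 9, 5, 6, 8, 3]
drivers = {
    "product quality":        [6, 8, 3, 9, 4, 6, 7, 2],
    "ease of doing business": [5, 5, 6, 6, 5, 6, 5, 6],
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Rank CX areas by how strongly they track loyalty: the top entries are
# candidate "key drivers" for improvement investment.
ranked = sorted(drivers, key=lambda d: pearson(drivers[d], loyalty), reverse=True)
```

Combining such a ranking with SME-estimated improvement costs is one way to approximate the ROI comparison described above.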
Benefits of Using Customer Feedback in your Due Diligence Process
You can significantly enhance your due diligence process through a systematic approach of collecting and analyzing customer feedback. Using the questions I proposed above, here are some benefits you can achieve when you use customer feedback as part of your due diligence process when purchasing a company:
Identify investment opportunities others miss and avoid investing in poor opportunities. Discover the quality of products and services from the people who matter: The customers.
Estimate revenue gains/losses. Using survey data and financial data, you can estimate annual revenue at risk due to customer churn and revenue growth due to new customers and expanding relationships with current customers.
Understand your competitive advantage/disadvantage. Your relative performance will impact how much incremental money your customers will spend with you. Collecting customer feedback can help you identify what you need to do to beat your competition and improve your growth.
Understand the ROI of different improvement efforts.
Investors can gain valuable insight about a target company they are buying by simply asking customers the right questions. Be sure you ask a representative sample of customers so the feedback you get is meaningful and reflects the entire customer base. Ask customers about different types of loyalty behaviors in which they are likely to engage. This feedback can help you estimate revenue gains and risks. Ask customers about their customer experience to identify company strengths as well as potential problems. Ask customers about the company’s relative performance compared to other companies. This insight can help you understand the competitive landscape in the company’s industry and identify ways to improve/maintain your competitive advantage.
When purchasing a company, a systematic approach to surveying the customers (and analyzing the data correctly) can significantly augment the information in your due diligence process and provide a lot of insight about the value of the company. Asking the customers of the target company could mean the difference between acquiring a valuable company or a lemon.
Learn more about the Customer Relationship Diagnostic (CRD) for your due diligence
Data aids, not replaces, judgement
Data is a tool and a means to help build consensus and facilitate human decision-making, not replace it. Analysis converts data into information; information in context leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start; context and intuition also play a role.
[ DATA SCIENCE Q&A]
Q: How to clean data?
A: 1. First: detect anomalies and contradictions
* Tidy data (Hadley Wickham's paper):
column names are values, not names, e.g. 26-45
multiple variables are stored in one column, e.g. m1534 (male of 15-34 years old age)
variables are stored in both rows and columns, e.g. tmax, tmin in the same column
multiple types of observational units are stored in the same table. e.g, song dataset and rank dataset in the same table
*a single observational unit is stored in multiple tables (can be combined)
* Data-type constraints: values in a particular column must be of a particular type: integer, numeric, factor, boolean
* Range constraints: numbers or dates must fall within a certain range. They have minimum/maximum permissible values
* Mandatory constraints: certain columns can't be empty
* Unique constraints: a field must be unique across a dataset: e.g., each person must have a unique SS number
* Set-membership constraints: the values for a column must come from a set of discrete values or codes: e.g., gender must be female or male
* Regular expression patterns: for example, a phone number may be required to have the pattern (999)999-9999
* Missing values
* Cross-field validation: certain conditions that utilize multiple fields must hold. For instance, in laboratory medicine the different white blood cell percentages must sum to 100. In a hospital database, a patient's date of discharge can't be earlier than the admission date
2. Clean the data using:
* Regular expressions: misspellings, regular expression patterns
* KNN-impute and other missing values imputing methods
* Coercing: data-type constraints
* Melting: tidy data issues
* Date/time parsing
* Removing observations
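Several of the constraint checks above (mandatory, range, set-membership and regex-pattern constraints) can be expressed as simple rules. A sketch with invented records and illustrative rules:

```python
import re

# Invented records and validation rules for illustration only.
records = [
    {"name": "Ada", "age": 34,  "gender": "female",  "phone": "(555)123-4567"},
    {"name": "",    "age": 210, "gender": "unknown", "phone": "5551234"},
]

def violations(rec):
    """Return a list of constraint violations for one record."""
    problems = []
    if not rec["name"]:                                  # mandatory constraint
        problems.append("name is empty")
    if not 0 <= rec["age"] <= 120:                       # range constraint
        problems.append("age out of range")
    if rec["gender"] not in {"female", "male"}:          # set-membership constraint
        problems.append("gender not in allowed set")
    if not re.fullmatch(r"\(\d{3}\)\d{3}-\d{4}", rec["phone"]):  # regex pattern
        problems.append("phone format invalid")
    return problems

issues = {rec["name"] or "<blank>": violations(rec) for rec in records}
```

Records with non-empty violation lists are the anomalies to clean by coercion, imputation or removal, as listed above.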
Oracle has bolstered its database portfolio with the Oracle Data Integrator (ODI), a piece of middleware designed to help analysts sift through big data across a variety of sources.
As the name suggests, the ODI effectively eases the process of linking data in different formats and from diverse databases and clusters, such as Hadoop, NoSQL and relational databases.
This enables Oracle customers to conduct analysis on large and varied datasets without dedicating time and resources to preparing big data in an integrated and secure way prior to analysis.
In effect, the ODI allows huge pools of data to be treated as just another data source to be used alongside more regularly accessed data warehouses and structured databases.
Jeff Pollock, vice president of product management at Oracle, claimed that the ODI allows customers to be experts in extract, transform and load tools without learning the code needed to carry out such actions.
“Oracle is the only vendor that can automatically generate Spark, Hive and Pig transformations from a single mapping which allows our customers to focus on business value and the overall architecture rather than multiple programming languages,” he said.
Avoiding the need for proprietary code means that the ODI can be run natively with a company’s existing Hadoop cluster, bypassing the need to invest in additional development.
Cluster databases like Hadoop and Spark have traditionally been geared towards programmers with knowledge of the coding needed to manipulate them. On the flipside, analysts would mostly use software tools to carry out enterprise-level data analytics.
The ODI gives the non-code savvy analyst the ability to harness Hadoop and other data sources without requiring the coding knowledge to do so.
It also means that a company’s developers need not retrain to handle multiple databases. Oracle is touting this as a way for companies to save money and time on big data analysis.
IBM just released the results of a global study on how businesses can get the most value from Big Data and analytics. They found nine areas that are critical to creating value from analytics. You can download the entire study here.
IBM Institute for Business Value surveyed 900 IT and business executives from 70 countries from June through August 2013. The 50+ survey questions were designed to help translate concepts relating to generating value from analytics into actions.
Nine Levers to Value Creation
The researchers identified nine levers that help organizations create value from data. They compared Leaders (those who identified their organization as substantially outperforming their industry peers) with the rest of the sample. They found that the Leaders (19% of the sample) implement the nine levers to a greater degree than the non-leaders. These nine levers are:
Source of value: Actions and decisions that generate results. Leaders tend to focus primarily on their ability to increase revenue and less so on cost reduction.
Measurement: Evaluating the impact on business outcomes. Leaders ensure they know how their analytics impact business outcomes.
Platform: Integrated capabilities delivered by hardware and software. Sixty percent of Leaders have predictive analytic capabilities, as well as simulation (55%) and optimization (67%) capabilities.
Culture: Availability and use of data and analytics within an organization. Leaders make more than half of their decisions based on data and analytics.
Data: Structure and formality of the organization's data governance process and the security of its data. Two-thirds of Leaders trust the quality of their data and analytics. A majority of Leaders (57%) adopt enterprise-level standards, policies and practices to integrate data across the organization.
Trust: Organizational confidence. Leaders demonstrate a high degree of trust between individual employees (60% between executives, 53% between business and IT executives).
Sponsorship: Executive support and involvement. Leaders (56%) oversee the use of data and analytics within their own departments, guided by an enterprise-level strategy, common policies and metrics, and standardized methodologies, compared to the rest (20%).
Funding: Financial rigor in the analytics funding process. Nearly two-thirds of Leaders pool resources to fund analytic investments. They evaluate these investments through pilot testing, cost/benefit analysis and forecasting KPIs.
Expertise: Development of and access to data management and analytic skills and capabilities. Leaders share advanced analytics subject matter experts across projects, where analytics employees have formalized roles, clearly defined career paths and experience investments to develop their skills.
The researchers state that each of the nine levers has a different impact on the organization's ability to deliver value from data and analytics; that is, all nine levers distinguish Leaders from the rest, but each lever impacts value creation in different ways. Enable levers need to be in place before value can be seen through the Drive and Amplify levers. The nine levers are organized into three levels:
Enable: These levers form the basis for big data and analytics.
Drive: These levers are needed to realize value from data and analytics; lack of sophistication within these levers will impede value creation.
Amplify: These levers boost value creation.
Recommendations: Creating an Analytic Blueprint
Next, the researchers offered a blueprint on how business leaders can translate the research findings into real changes for their own businesses. This operational blueprint consists of three areas: 1) Strategy, 2) Technology and 3) Organization.
Strategy is about the deliberateness with which the organization approaches analytics. Businesses need to adopt practices around Sponsorship, Source of value and Funding to instill a sense of purpose to data and analytics that connects the strategic visions to the tactical activities.
Technology is about the enabling capabilities and resources an organization has available to manage, process, analyze, interpret and store data. Businesses need to adopt practices around Expertise, Data and Platform to create a foundation for analytic discovery to address today’s problems while planning for future data challenges.
Organization is about the actions taken to use data and analytics to create value. Businesses need to adopt practices around Culture, Measurement and Trust to enable the organization to be driven by fact-based decisions.
One way businesses are trying to outperform their competitors is through the use of analytics on their treasure trove of data. The IBM researchers were able to identify the necessary ingredients to extract value from analytics. The current research supports prior research on the benefits of analytics in business:
Analytic innovators 1) use analytics primarily to increase value to the customer rather than to decrease costs/allocate resources, 2) aggregate/integrate different business data silos and look for relationships among once-disparate metrics and 3) secure executive support around the use of analytics that encourages sharing of best practices and data-driven insights throughout their company.
Businesses, to extract value from analytics, need to focus on improving the strategic, technological and organizational aspects of how they treat data and analytics. The research identified nine areas, or levers, executives can use to improve the value they generate from their data.
Grow at the speed of collaboration
Research by Cornerstone OnDemand pointed to the need for better collaboration within the workforce, and the data analytics domain is no different. A rapidly changing and growing industry like data analytics is very difficult for an isolated workforce to keep up with. A good collaborative work environment facilitates a better flow of ideas, improved team dynamics, rapid learning, and a growing ability to cut through the noise. So, embrace collaborative team dynamics.
[ DATA SCIENCE Q&A]
Q: Give examples of bad and good visualizations?
A: Bad visualization:
– Pie charts: difficult to make comparisons between items when area is used, especially when there are lots of items
– Color choice for classes: abundant use of red, orange and blue. Readers can think that the colors mean good (blue) versus bad (orange and red) whereas they are just associated with a specific segment
– 3D charts: can distort perception and therefore skew data
Good visualization:
– Using a solid line in a line chart: dashed and dotted lines can be distracting
– Heat map with a single color: some colors stand out more than others, giving more weight to that data. A single color with varying shades shows the intensity better
– Adding a trend line (regression line) to a scatter plot helps the reader identify trends
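As an illustration of the last point, a minimal sketch (synthetic data; matplotlib is assumed available) that overlays a regression trend line on a scatter plot:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs anywhere
import matplotlib.pyplot as plt

# Hypothetical noisy linear data
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.5, size=x.size)

# A degree-1 polynomial fit gives the regression (trend) line
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.7)                       # the raw points
ax.plot(x, slope * x + intercept, color="black")  # solid trend line
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.savefig("scatter_trend.png")
```

The fitted line summarizes the trend at a glance, which is exactly what the raw point cloud makes hard to see.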
You may have heard the term "big data" in reference to companies like Netflix, Google or Facebook. It's the collection of all those little data points about your choices and decision making process that allows companies to know exactly what movie you're in the mood for when you plop down on your couch with a bowl of popcorn after a long day. Recently, big data has also made a foray into the educational realm. Whether through information gathered through standardized testing or the use of adaptive learning systems, big data is well on its way to completely transforming K-12 education.
Here are 10 ways Big Data is changing K-12 education:
1. Different pace of learning
One of the main challenges that educators currently face is adapting their instruction so it accommodates many different students who learn at different paces. The tools used to collect data, like intelligent adaptive learning systems, are designed to shift the pace of instruction depending on the prior knowledge, abilities and interests of each student. Teachers, in turn, can use this data to inform their pace of instruction going forward.
2. Sharing information
When students change schools or move across state lines, it has often been a challenge for their new teachers to get a firm grasp of what they have covered and which content areas may need more attention. The Common Core standards make data interchangeable across schools and districts.
3. Pinpoint problem areas
A unique feature of big data is that it allows teachers and administrators to pinpoint academic problem areas in students as they learn rather than after they take the test. For example, if a student is working through an adaptive learning program and the data collected reveals that he or she needs more help understanding the fundamental concepts behind fractions, teachers or the adaptive learning system can set aside time to work individually with that student to address and overcome the problem.
4. Need for analysts
Of course, the collection of all of this data isn't helpful for anyone if it just sits there, so school districts are beginning to need analysts to interpret it all. Disparate data sets must be linked so that decision makers in a school district can view, sort and analyze the information to develop both long- and short-term plans for improving education. School districts may also need to set up workshops to show teachers how they can use all of this data effectively.
5. Different means of educational advancement
Traditionally, readiness for educational advancement has been determined more by age than by whether or not the student was ready to learn more challenging material. Gifted students may be advanced, but they often stay in the same class as their peers because information about what they know can only be collected sporadically. Big data allows teachers and administrators to get a continuous sense of where students stand academically, and whether or not they are ready to advance.
6. Smooth transitions
The collection of data is not only allowing for smoother transitions between schools, but also between grade levels. Access to information databases about what exactly students know could prove quite useful to school districts that are in the process of implementing the Common Core State Standards. Because the CCSS are changing academic requirements, some students find that they've inadvertently missed learning something important because it was shifted to the grade below. Data can pinpoint this problem so it can be addressed.
7. Personalized activities
Personalized learning has become a much-heralded approach to education, and big data is helping teachers tailor activities to individual learners. Technology, in particular, is playing a central role. Tech-savvy students can use computer games and adaptive learning programs to complete educational activities that are interactive and take their skill level into account.
8. Using analytics
One significant change that schools are seeing is the increasing use of analytics to inform their approaches. For example, big data can be analyzed to create plans to improve academic results, decrease dropout rates and influence the day-to-day decision making of administrators and teachers.
9. Engage parents and students
It's extremely important for parents to be involved in their children's education, and big data is providing a means of engaging both parents and students. If at parent/teacher conferences, educators can pinpoint exactly where a child is excelling and where more work is needed, and can provide data to back up those claims, parents will have a clearer understanding of what they can do to help their children succeed in school.
10. Customized instruction
Perhaps most exciting for teachers and students alike is the ability for customized instruction that big data provides. This differs greatly from the approach to education in the past, when teachers would deliver one lesson and expect all students to understand, even if they learned in very different ways.
Is your school using big data? What changes are you seeing?
Save yourself from a zombie apocalypse of unscalable models
One zombie lurking in today's analytical models is the absence of error bars. Not every model scales or holds up as data grows. The error bars attached to almost every model should be duly calibrated: as business models take in more data, error bars keep the estimates sensible and in check. If error bars are not accounted for, our models become susceptible to failure, a Halloween nobody wants to see.
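The point can be made concrete with a small sketch (synthetic data; the numbers are illustrative, not from any real model): the standard error, the usual basis for an error bar, should shrink as the sample grows, and tracking it tells you whether the model's uncertainty is behaving sensibly at scale:

```python
import numpy as np

rng = np.random.default_rng(42)
halfwidths = {}

# Hypothetical metric observed at growing data volumes. The point
# estimate barely moves, but the error bar (1.96 * standard error)
# tightens as n grows; an error bar that fails to shrink with more
# data is a warning sign for the model.
for n in (100, 10_000, 1_000_000):
    sample = rng.normal(loc=0.05, scale=0.02, size=n)
    stderr = sample.std(ddof=1) / np.sqrt(n)
    halfwidths[n] = 1.96 * stderr
    print(f"n={n:>9,}  mean={sample.mean():.4f}  error bar ±{halfwidths[n]:.6f}")
```

A model whose reported uncertainty does not follow this kind of trajectory as data accumulates deserves a second look.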
[ DATA SCIENCE Q&A]
Q: Explain what a local optimum is and why it is important in a specific context,
such as K-means clustering. What are specific ways of determining if you have a local optimum problem? What can be done to avoid local optima?
A: * A solution that is optimal within a neighboring set of candidate solutions
* In contrast with global optimum: the optimal solution among all others
* K-means clustering context:
It's proven that the objective cost function will always decrease until a local optimum is reached.
Results will depend on the initial random cluster assignment
* Determining if you have a local optimum problem:
Tendency of premature convergence
Different initialization induces different optima
* Avoid local optima in a K-means context: repeat K-means and take the solution that has the lowest cost
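A minimal NumPy sketch of that advice (synthetic blobs, not from the original Q&A): run Lloyd's algorithm from several random initializations and keep the lowest-cost solution:

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """One run of Lloyd's algorithm from a random initialization."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        # Move each center to the mean of its points (keep it if empty)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    cost = ((X - centers[labels]) ** 2).sum()  # objective (inertia)
    return cost, centers

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs
X = np.concatenate([rng.normal(c, 0.3, size=(50, 2))
                    for c in ([0, 0], [5, 5], [0, 5])])

# Different initializations may converge to different local optima,
# so restart several times and keep the solution with the lowest cost.
costs = [kmeans(X, k=3, rng=rng)[0] for _ in range(10)]
best = min(costs)
```

This is the same idea behind the `n_init` parameter in scikit-learn's `KMeans`, which repeats the fit and returns the best run automatically.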
The role of data and analytics in business continues to grow. To make sense of their plethora of data, businesses are looking to data scientists for help. The job site indeed.com shows continued growth in "data scientist" positions. To better understand the field of data science, we studied hundreds of data professionals.
In that study, we found that data scientists are not created equal. That is, data professionals differ with respect to the skills they possess. For example, some professionals are proficient in statistical and mathematical skills while others are proficient in computer science skills. Still others have strong business acumen. In the current analysis, I want to determine the breadth of talent that data professionals possess to better understand the possibility of finding a single data scientist who is skilled in all areas. First, let's review the study sample and the method of how we measured talent.
Assessing Proficiency in Data Skills
We asked hundreds of data professionals to tell us about their skills in five areas: Business, Technology, Math & Modeling, Programming and Statistics. Each skill area included five specific skills, totaling 25 different data skills in all.
For example, in the Business Skills area, data professionals were asked to rate their proficiency in such specific skills as "Business development" and "Governance & Compliance (e.g., security)." In the Technology Skills area, they were asked to rate their proficiency in such skills as "Big and Distributed Data (e.g., Hadoop, Map/Reduce, Spark)" and "Managing unstructured data (e.g., noSQL)." In the Statistics Skills area, they were asked to rate their proficiency in such skills as "Statistics and statistical modeling (e.g., general linear model, ANOVA, MANOVA, Spatio-temporal, Geographical Information System (GIS))" and "Science/Scientific Method (e.g., experimental design, research design)."
For each of the 25 skills, respondents were asked to tell us their level of proficiency on a six-point scale: Don't Know, Fundamental Awareness, Novice, Intermediate, Advanced and Expert.
The different levels of proficiency are defined around the data scientist's ability to give, or need to receive, help. In the instructions to the data professionals, the "Intermediate" level of proficiency was defined as the ability "to successfully complete tasks as requested." We used that proficiency level (i.e., Intermediate) as the minimum acceptable level of proficiency for each data skill. The proficiency levels below the Intermediate level (i.e., Novice, Fundamental Awareness, Don't Know) were defined by an increasing need for help on the part of the data professional. Proficiency levels above the Intermediate level (i.e., Advanced, Expert) were defined by the data professional's increasing ability to give help or be known by others as "a person to ask."
We looked at the level of proficiency for the 25 different data skills across four different job roles. As is seen in Figure 1, data professionals tend to be skilled in areas that are appropriate for their job role (see green-shaded areas in Figure 1). Specifically, Business Management data professionals show the most proficiency in Business Skills. Researchers, on the other hand, show the lowest level of proficiency in Business Skills and the highest in Statistics Skills.
For many of the data skills, the typical data professional does not have the minimum level of proficiency to be successful at work, no matter their role (see yellow- and red-shaded areas in Figure 1). These data skills include the following: Unstructured data, NLP, Machine Learning, Big and distributed data, Cloud management, Front-end programming, Optimization, Graphic models, Algorithms and Bayesian statistics.
In Search of the Elite Data Scientist
There are a couple of ways an organization can build its data science capability. It can either hire a single individual who is skilled in all data science areas or it can hire a team of data professionals who have complementary skills. In both cases, the organization has all the skills necessary to use data intelligently. However, the likelihood of finding a data professional who is an expert in all five skill areas is quite low (see Figure 2). In our sample, we looked at three levels of proficiency: Intermediate, Advanced and Expert. We found that only 10% of the data professionals indicated they had, at least, an Intermediate level of proficiency in all five skill areas. The picture looks more bleak when you look for data professionals who have advanced or expert proficiencies in data skills. The chance of finding a data professional with Advanced skills or better in all five skill areas drops to less than 1%. There were no data professionals who were considered Experts in all five skill areas.
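Counts like these are straightforward to reproduce on any ratings matrix. A hedged sketch with simulated data (the ratings below are random, not the study's actual responses; it simply shows how rare joint proficiency becomes as the bar rises):

```python
import numpy as np
import pandas as pd

# Simulated survey: one row per respondent, one column per skill area,
# proficiency coded 0-5 (0 = Don't Know ... 3 = Intermediate,
# 4 = Advanced, 5 = Expert). Purely illustrative, not the study's data.
rng = np.random.default_rng(1)
areas = ["Business", "Technology", "Math & Modeling", "Programming", "Statistics"]
ratings = pd.DataFrame(rng.integers(0, 6, size=(500, 5)), columns=areas)

def pct_all_areas(df, min_level):
    """Share (%) of respondents at or above min_level in ALL five areas."""
    return (df >= min_level).all(axis=1).mean() * 100

for name, level in [("Intermediate", 3), ("Advanced", 4), ("Expert", 5)]:
    print(f"{name}+ in all five areas: {pct_all_areas(ratings, level):.2f}%")
```

Because proficiency must hold across all five areas simultaneously, the share drops multiplicatively with each area added, which is why "unicorn" candidates are so scarce.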
We looked at proficiency differences across five industries: Consulting (n = 52), Education / Science (n = 50), Financial (n = 52), Healthcare (n = 50) and IT (n = 95). We identified data professionals who had an advanced level of proficiency across the different skills. We found that data professionals in the Education / Science industry have more advanced skills (54% have at least an advanced level of proficiency in at least one skill area) compared to data professionals in the Financial (37%) and IT (34%) industries.
The term "data scientist" is ambiguous. There are different types of data scientists, each defined by their level of proficiency in one of five skill areas: Business, Technology, Programming, Math & Modeling and Statistics. Data scientists can be defined by the skills they possess. So, when somebody tells you they are a data scientist, be sure you know what type they are.
Finding a data professional who is proficient in all data science skill areas is extremely difficult. As our study shows, data professionals rarely possess proficiency in all five skill areas at the level needed to be successful at work. The chance of finding a data professional with Expert skills in all five areas (or even in 3 or 4 skill areas) is akin to finding a unicorn; they just don't exist. There were very few data professionals who had even the basic minimum level of proficiency (i.e., Intermediate) in all five skill areas. Additionally, our initial findings on industry differences in skill proficiency suggest that skilled data professionals might be easier to find in specific industries. These industry differences could impact the recruitment and management of data professionals. An under-supply of data science talent in one industry could require companies to use more dramatic recruitment efforts to attract data professionals from outside the industry. In industries where there are plenty of skilled data professionals, companies can be more selective in their hiring efforts.
Optimizing the value of business data is dependent on the skills of the data professionals who process the data. We took a skills-based approach to understanding how organizations can extract value from their data. Based on our findings, we recommend that organizations avoid trying to find a single data professional who has the skills that span the entire spectrum of data science. Rather, a better approach is to consider building up your data science capability through the formation of teams of data professionals who have complementary skills.