Sep 13, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Data security (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ NEWS BYTES]

>> Top 10 Cloud Computing Challenges – Datamation (under Cloud Security)

>> Statistics Show More Signs Of The Tourism Slowdown – Reykjavík Grapevine (under Statistics)

>> IoT In Action – Introducing Azure Sphere – Microsoft – Channel 9 (under IoT)

More NEWS? Click Here

[ FEATURED COURSE]

Lean Analytics Workshop – Alistair Croll and Ben Yoskovitz


Use data to build a better startup faster in partnership with Geckoboard… more

[ FEATURED READ]

Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking


Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the “data-analytic thinking” necessary for e… more

[ TIPS & TRICKS OF THE WEEK]

Data aids judgement, it does not replace it
Data is a tool and a means to build consensus and support human decision-making, not a substitute for it. Analysis converts data into information; information, placed in context, leads to insight; insights drive decisions, and decisions produce outcomes that create value. So data is just the start; context and intuition still play a role.

[ DATA SCIENCE Q&A]

Q: Give examples of data that has neither a Gaussian nor a log-normal distribution.
A: * Allocation of wealth among individuals
* Values of oil reserves across oil fields (many small fields, a small number of very large ones); see the sketch below
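
A minimal illustration of such heavy-tailed data (a sketch assuming Python with NumPy and SciPy; the Pareto parameters are illustrative, not part of the original answer): a power-law sample is strongly skewed, and even its logarithm fails a normality test, so it is neither Gaussian nor log-normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Pareto (power-law) sample: a stand-in for wealth or oil-reserve style data,
# where most observations are small and a few are extremely large.
sample = (1 + rng.pareto(a=1.5, size=5000)) * 1000   # minimum value around 1000

# A log-normal sample would look Gaussian after a log transform;
# a power-law sample does not.
for name, x in [("raw", sample), ("log", np.log(sample))]:
    _, p = stats.normaltest(x)
    print(f"{name:>3}: skew={stats.skew(x):6.2f}, normality p-value={p:.2e}")
```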

Source

[ VIDEO OF THE WEEK]

@DrewConway on fabric of an IOT Startup #FutureOfData #Podcast

Subscribe to YouTube

[ QUOTE OF THE WEEK]

Without big data, you are blind and deaf and in the middle of a freeway. – Geoffrey Moore

[ PODCAST OF THE WEEK]

@EdwardBoudrot / @Optum on #DesignThinking & #DataDriven Products #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

We are seeing a massive growth in video and photo data, where every minute up to 300 hours of video are uploaded to YouTube alone.

Sourced from: Analytics.CLUB #WEB Newsletter

November 21, 2016 Health and Biotech analytics news roundup

Here’s the latest in health and biotech analytics, in particular some new partnerships between academia and industry:

Analytical Booster Platform to Deliver “Smarter Healthcare”: THB (Technology, Healthcare, Big Data Analytics) is targeting the Indian healthcare system. They aim to give providers the “right information and tools at the right time.”

Pitt, Pfizer team up on health data analytics: The one-year partnership will use public and private data to find relationships between brain disease, brain imaging, and genetic markers.

Broad Institute Teams Up With Intel To Integrate Genomic Data From Diverse Sources And Enhance Genomic Data Analytic Capabilities: The Intel-Broad Center for Genomic Data Engineering will seek to optimize tools to be used on Intel-based computational platforms. They will also seek to enable collaborations through common workflow models.

UC San Francisco and GE Healthcare Launch Deep Learning Partnership to Advance Care Globally: They will be developing deep learning algorithms to help with many facets of healthcare problem solving, like determining what requires normal care and what requires quick intervention.

Source: November 21, 2016 Health and Biotech analytics news roundup

Innovation at The Power of Incubation

Having worked in corporate innovation and watched innovations evolve across different sectors, I have a clearer picture of what an innovation cycle entails. Some companies do it better than others, but most spend a lot of money on innovation with little visibility into outcomes and returns. And in large corporates, bias persists at every level and can taint a disruptive idea before it is ever executed well. So, how do we fix that?
Organize an incubator to disrupt your own business. Wait, don't panic; let me explain how it can help a Fortune-class company stay in business for the foreseeable future. Incubators are the next level of the business-case competition: more rigorous, longer in duration, and more effective. In an incubator you don't just get the next big idea for the company; you get the one that can be executed most effectively.

Here are my five reasons why it is relevant:

1. It surfaces opportunities that were not visible to your focused eyes: Startup entrepreneurs have an open mind and will try the most amazing and complex projects. They also lack the restrictions (legal, bureaucracy, brand impact, accountability to shareholders) that pose big hurdles to innovation in large corporates, and they are free from many of the biases that plague large corporations. Hence, it is much easier and faster to innovate and try new things in a lean, bureaucracy-free startup environment. Incubators bring out the best of both worlds: the best minds compete to bring the best products to market, while the corporate provides the necessary support without letting its culture taint the ideas.

2. Stay close to the ideas that could disrupt your market, and sleep better: What Salesforce did to Oracle, and what Amazon did to retail stores, is not something you look forward to. If you do not keep your eyes and ears open for the next disruption, you might miss the last boat that keeps you afloat. Incubators act as a breeding ground for disruptive ideas, not just within your current market landscape but also for things that might permanently change the marketplace for your goods. They give you an opportunity to grow on your own turf, as well as in neighboring markets, while investing limited resources.

3. An opportunity to hire entrepreneurs who rarely show up in HR's resume pile: Incubators are a great place to spot talent, especially people who are motivated to endure the unknown and make things happen. It is HR's dream to hire the 20% who lift 80% of the company and take it into the green zone. I have held numerous roles and seen highly variable talent pools; true entrepreneurs always stand out, hustling and giving 110% of their heart and soul to make things happen. Money does not motivate them; creating something useful does. Every company craves people who think outside the box and are lean, fast, and ambitious enough to make things happen. This is the talent every organization needs to sustain innovation and success.

4. It keeps you current, attractive, and relevant: Think of Larry Ellison dismissing cloud and big data as fluff, Steve Ballmer laughing at the iPhone, or BlackBerry's tumble. Every big conglomerate lives in its own bubble, peering at the world through a limited window. Reality distortion holds for almost all big companies: the bigger the size, the thicker the lens and the poorer the vision. Startups tend to stay current and act on the latest and greatest methodologies, and big companies get that know-how almost for free when they associate with startups. They get to be part of what is changing in the world, how it is changing, and what roles startups play, so they can adapt their practices and stay current. Companies do not need to invest millions to stay current; sometimes thousands are all it takes to make the difference.

5. Good karma points, positive PR, and strong brand building: Last but not least, incubators can really help the brand image of a large corporate, since startups are seen as interesting, fresh, and young. They attract youth and early adopters, and the press wants to find and write about the next big idea in the industry. Being attached to an incubator and its startups earns good media coverage and publicity, builds brand awareness, creates positive sentiment among existing customers, reinforces their support for the brand, and attracts new customers.

Many large corporates have leveraged incubation to gain an innovation edge in their industries, among them Pepsi, GE, Nike, and Microsoft. All of these companies are pioneers in their respective fields and have profited from their involvement with incubators and startups.

What should we do?

No, you don't have to get into the incubation business yourself; there are plenty of incubators out there. Find one and partner with it to start fixing some of the innovation gaps that data innovation alone cannot close. Yes, big data innovation is still highly relevant, and yes, incubators can help you innovate as well. So there is more than one easy, cost-effective, and optimal way for big enterprises to innovate.

Here is a quick video by Christie Hefner on designing a corporate culture that is open to all ideas.

Source

Sep 06, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Fake data (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> 5 tips to becoming a big data superhero by analyticsweekpick

>> What Are the 3 Critical Keys to Healthcare Big Data Analytics? by analyticsweekpick

>> How Big Data Is Changing The Entertainment Industry! by analyticsweekpick

Wanna write? Click Here

[ NEWS BYTES]

>> Insurers pay premium for cyber security experts – Financial Times (under Cyber Security)

>> Lawmakers Unveil Plans for Agency Telework and Cloud Security – Nextgov (under Cloud Security)

>> As VMworld nears, virtualization disrupts the cloud application ecosystem – SiliconANGLE News (blog) (under Virtualization)

More NEWS? Click Here

[ FEATURED COURSE]

Artificial Intelligence


This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more

[ FEATURED READ]

Hypothesis Testing: A Visual Introduction To Statistical Significance


Statistical significance is a way of determining if an outcome occurred by random chance, or did something cause that outcome to be different than the expected baseline. Statistical significance calculations find their … more

[ TIPS & TRICKS OF THE WEEK]

Keeping biases in check during the last mile of decision making
Today, a data-driven leader, data scientist, or data expert is constantly put to the test, helping their team solve problems with their skills and expertise. Believe it or not, part of that decision tree is derived from intuition, which introduces a bias into our judgement and taints the resulting suggestions. Most skilled professionals understand and handle these biases well, but occasionally we fall into small traps and find ourselves caught in biases that impair our judgement. So it is important to keep the intuition bias in check when working on a data problem.

[ DATA SCIENCE Q&A]

Q: Explain the likely differences between administrative datasets and datasets gathered from experimental studies. What problems are likely to be encountered with administrative data? How do experimental methods help alleviate these problems? What problems do they bring?
A: Advantages:
– Cost
– Large coverage of population
– Captures individuals who may not respond to surveys
– Regularly updated, allowing a consistent time series to be built up

Disadvantages:
– Restricted to data collected for administrative purposes (limited to administrative definitions. For instance: incomes of a married couple, not individuals, which can be more useful)
– Lack of researcher control over content
– Missing or erroneous entries
– Quality issues (addresses may not be updated, or only a postal code is provided)
– Data privacy issues
– Underdeveloped theories and methods (sampling methods…)

Source

[ VIDEO OF THE WEEK]

Surviving Internet of Things

Subscribe to YouTube

[ QUOTE OF THE WEEK]

It is a capital mistake to theorize before one has data. Insensibly, one begins to twist the facts to suit theories, instead of theories to suit facts. – Arthur Conan Doyle

[ PODCAST OF THE WEEK]

George (@RedPointCTO / @RedPointGlobal) on becoming an unbiased #Technologist in #DataDriven World #FutureOfData #Podcast

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Market research firm IDC has released a new forecast that shows the big data market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015.

Sourced from: Analytics.CLUB #WEB Newsletter

Creating Great Choices to Enable #FutureOfWork by @JenniferRiel #JobsOfFuture #Podcast

 

In this podcast, Jennifer Riel (@JenniferRiel) sat down with Vishal (@Vishaltx from @AnalyticsWeek) to discuss her book “Creating Great Choices: A Leader’s Guide to Integrative Thinking”. She sheds light on the importance of integrative thinking in generating long-lasting solutions, and shares some of the innovative ways businesses can approach creative problem solving to prevent bias and isolation and bring diversity of opinion. Jennifer also speaks about the challenges that tribalism poses to the quality of decision making. This conversation, and her book, is great for anyone looking to create a future-proof organization that makes measured decisions for effective outcomes.

Her Book Link:
Creating Great Choices: A Leader’s Guide to Integrative Thinking by Jennifer Riel (Author), Roger L. Martin (Author) https://amzn.to/2JGeljS

Jennifer’s Recommended Read:
Pride and Prejudice by Jane Austen and Tony Tanner https://amzn.to/2MbHkeb
Thinking, Fast and Slow by Daniel Kahneman https://amzn.to/2sNzgbt
The Righteous Mind: Why Good People Are Divided by Politics and Religion by Jonathan Haidt https://amzn.to/2xUZFZD
Give and Take: Why Helping Others Drives Our Success by Adam M. Grant Ph.D. https://amzn.to/2xYtWHa

Podcast Link:
iTunes: http://math.im/jofitunes
GooglePlay: http://math.im/jofgplay

Here is Jennifer’s Bio:
Jennifer Riel is an adjunct professor at the Rotman School of Management, University of Toronto, specializing in creative problem solving. Her focus is on helping everyone, from undergraduate students to business executives, to create better choices, more of the time.

Jennifer is the co-author of Creating Great Choices: A Leader’s Guide to Integrative Thinking (with Roger L. Martin, former Dean of the Rotman School of Management). Based on a decade of teaching and practice with integrative thinking, the book lays out a practical methodology for tackling our most vexing business problems. Using illustrations from organizations like LEGO, Vanguard and Unilever, the book shows how individuals can leverage the tension of opposing ideas to create a third, better way forward.

An award-winning teacher, Jennifer leads training on integrative thinking, strategy and innovation at organizations of all types, from small non-profits to some of the largest companies in the world.

About #Podcast:
The #JobsOfFuture podcast is a conversation starter that brings leaders, influencers, and leading practitioners on the show to discuss their journeys in creating the work, worker, and workplace of the future.

Want to sponsor?
Email us @ info@analyticsweek.com

Keywords:
#JobsOfFuture
JobsOfFuture
Jobs of future
Future of work
Leadership
Strategy

Source: Creating Great Choices to Enable #FutureOfWork by @JenniferRiel #JobsOfFuture #Podcast by v1shal

Aug 30, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Convincing (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> October 10, 2016 Health and Biotech Analytics News Roundup by pstein

>> Ten Guidelines for Clean Customer Feedback Data by bobehayes

>> Apr 20, 17: #AnalyticsClub #Newsletter (Events, Tips, News & more..) by admin

Wanna write? Click Here

[ NEWS BYTES]

>> Data Analytics Add Value to Healthcare Supply Chain Management – RevCycleIntelligence.com (under Health Analytics)

>> The men’s fashion company that’s part apparel, part big data – Marketplace.org (under Big Data)

>> TV Time’s New Analytics Tool Breaks Down Fan Reaction to Shows … – Variety (under Social Analytics)

More NEWS? Click Here

[ FEATURED COURSE]

Introduction to Apache Spark


Learn the fundamentals and architecture of Apache Spark, the leading cluster-computing framework among professionals…. more

[ FEATURED READ]

On Intelligence


Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more

[ TIPS & TRICKS OF THE WEEK]

Winter is coming, warm your Analytics Club
Yes and yes! As we head into winter, what better time to talk about our increasing dependence on data analytics to support decision making. Data- and analytics-driven decision making is rapidly working its way into our core corporate DNA, yet we are not building practice grounds fast enough to test those models. Snug-looking models can hide nails that cause uncharted pain if they go unchecked. This is the right time to start thinking about setting up an Analytics Club (a data analytics center of excellence) in your workplace, to lab out best practices and provide a test environment for those models.

[ DATA SCIENCE Q&A]

Q: What is an outlier? Explain how you might screen for outliers and what you would do if you found them in your dataset. Also, explain what an inlier is, how you might screen for inliers, and what you would do if you found them in your dataset.
A: Outliers:
– An observation point that is distant from other observations
– Can occur by chance in any distribution
– Often, they indicate measurement error or a heavy-tailed distribution
– Measurement error: discard them or use robust statistics
– Heavy-tailed distribution: high skewness, can’t use tools assuming a normal distribution
– Three-sigma rule (normally distributed data): about 1 in 22 observations will differ from the mean by more than twice the standard deviation
– Three-sigma rule: about 1 in 370 observations will differ from the mean by more than three times the standard deviation

Three-sigma rules example: in a sample of 1000 observations, the presence of up to 5 observations deviating from the mean by more than three times the standard deviation is within the range of what can be expected, being less than twice the expected number and hence within 1 standard deviation of the expected number (Poisson distribution).

If the nature of the distribution is known a priori, it is possible to check whether the number of outliers deviates significantly from what can be expected. For a given cutoff (samples fall beyond the cutoff with probability p), the number of outliers can be approximated by a Poisson distribution with lambda = p·n. Example: with a normal distribution and a cutoff 3 standard deviations from the mean, p = 0.3%, so in a sample of 1000 observations the number of samples whose deviation exceeds 3 sigmas can be approximated by a Poisson with lambda = 3.
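
A minimal sketch of this Poisson approximation (assuming Python with SciPy is available; the sample size and cutoff follow the example above):

```python
from scipy import stats

n = 1000      # sample size from the example above
cutoff = 3    # cutoff, in standard deviations from the mean

p = 2 * stats.norm.sf(cutoff)   # two-sided tail probability, about 0.27%
lam = p * n                     # Poisson rate for the number of outliers

print(f"tail probability p = {p:.4%}, expected outliers lambda = {lam:.2f}")
# Probability of observing at most 5 points beyond 3 sigmas in 1000 samples.
print(f"P(count <= 5) = {stats.poisson.cdf(5, lam):.3f}")
```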

Identifying outliers:
– No rigid mathematical method
– Subjective exercise: be careful
– Boxplots
– QQ plots (sample quantiles Vs theoretical quantiles)

Handling outliers:
– Depends on the cause
– Retention: when the underlying model is confidently known
– Regression problems: only exclude points which exhibit a large degree of influence on the estimated coefficients (Cook’s distance)

Inlier:
– Observation lying within the general distribution of other observed values
– Doesn’t perturb the results but are non-conforming and unusual
– Simple example: observation recorded in the wrong unit (°F instead of °C)

Identifying inliers:
– Mahalanobis distance (see the sketch below)
– Measures the distance of a point from the centre of a distribution, accounting for correlations between variables (the key difference from Euclidean distance)
– Discard them
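
A minimal sketch of inlier screening with the Mahalanobis distance (assuming Python with NumPy and SciPy; the data, the suspect point, and the chi-square threshold are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two positively correlated measurements (values are illustrative).
mean = np.array([20.0, 60.0])
cov = np.array([[4.0, 3.0], [3.0, 9.0]])
data = rng.multivariate_normal(mean, cov, size=500)

# A suspect point: each coordinate lies within its marginal range,
# but the combination breaks the correlation pattern (an inlier-style anomaly).
data = np.vstack([data, [24.0, 54.0]])

# Mahalanobis distance of every point from the sample centre.
centre = data.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
diff = data - centre
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)   # squared distances

# Flag points beyond the 97.5% quantile of the chi-square(2) distribution;
# a handful of genuine tail points may be flagged along with the suspect row.
threshold = stats.chi2.ppf(0.975, df=2)
print("flagged rows:", np.where(d2 > threshold)[0])
```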

Source

[ VIDEO OF THE WEEK]

@TimothyChou on World of #IOT & Its #Future Part 1 #FutureOfData #Podcast

Subscribe to YouTube

[ QUOTE OF THE WEEK]

The world is one big data problem. – Andrew McAfee

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @MichOConnell, @Tibco

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

571 new websites are created every minute of the day.

Sourced from: Analytics.CLUB #WEB Newsletter

CMOs’ Journey from Big Data to Big Profits (Infographic)

The consumer purchase funnel generates vast amounts of data, and as consumers add social media and mobile channels to their decision-making it has become extremely difficult to track and make sense of it all. This fuels ever-mounting pressure on CMOs to show how their budget delivers incremental business value.

Better data management is turning out to be a strong competitive edge and a powerful value-generation tool for organizations, so a well-managed marketing organization will make good use of its data.

This has pushed many marketers firmly toward better big data analytics, which will become a major component of their business over the next several years: according to the Teradata Data-Driven Marketing Survey 2013, released by Teradata earlier this year, 71 percent of marketers say they plan to implement big data analytics within the next two years.

Marketers already rely on a number of common and easily accessible forms of data to drive their marketing initiatives—customer service data, customer satisfaction data, digital interaction data and demographic data. But true data-driven marketing takes it to the next level: Marketers need to collect and analyze massive amounts of complicated, unstructured data that combines the traditional data their companies have collected with interaction data (e.g., data pulled from social media), integrating both online and offline data sources to create a single view of their customer.

Visually and McKinsey & Company co-published this infographic to illustrate the pressures CMOs find themselves under and the potential benefits of leveraging big data.

CMOs’ Journey from Big Data to Big Profits (Infographic)

Source by v1shal

Aug 23, 18: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

Insights (Source)

[ LOCAL EVENTS & SESSIONS]

More WEB events? Click Here

[ AnalyticsWeek BYTES]

>> Big data’s big problem: How to make it work in the real world by analyticsweekpick

>> January 16, 2017 Health and Biotech analytics news roundup by pstein

>> March 6, 2017 Health and Biotech analytics news roundup by pstein

Wanna write? Click Here

[ FEATURED COURSE]

Deep Learning Prerequisites: The Numpy Stack in Python


The Numpy, Scipy, Pandas, and Matplotlib stack: prep for deep learning, machine learning, and artificial intelligence… more

[ FEATURED READ]

On Intelligence


Jeff Hawkins, the man who created the PalmPilot, Treo smart phone, and other handheld devices, has reshaped our relationship to computers. Now he stands ready to revolutionize both neuroscience and computing in one strok… more

[ TIPS & TRICKS OF THE WEEK]

Grow at the speed of collaboration
Research by Cornerstone OnDemand pointed out the need for better collaboration within the workforce, and the data analytics domain is no different. A rapidly changing and growing industry like data analytics is very difficult for an isolated workforce to keep up with. A good collaborative work environment facilitates a better flow of ideas, improved team dynamics, rapid learning, and a greater ability to cut through the noise. So, embrace collaborative team dynamics.

[ DATA SCIENCE Q&A]

Q: How would you detect individual paid accounts shared by multiple users?
A: * Check geographical regions: a login from Paris on Friday morning and a login from Tokyo on Friday evening
* Bandwidth consumption: a user going over some high limit
* Count of live sessions: 100 sessions per day (about 4 per hour) is more than one person can plausibly generate (see the sketch below)
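
A minimal rule-based sketch of these heuristics (assuming Python; the thresholds, field names, and the haversine helper are illustrative, not from the original answer):

```python
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

@dataclass
class Login:
    ts_hours: float   # time of login, in hours since some epoch
    lat: float
    lon: float

def distance_km(a: Login, b: Login) -> float:
    """Great-circle distance between two logins (haversine formula)."""
    dlat, dlon = radians(b.lat - a.lat), radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def account_looks_shared(logins, daily_sessions, daily_gb,
                         max_speed_kmh=900, max_sessions=100, max_gb=50):
    """Flag an account if any simple heuristic fires."""
    # 1. Impossible travel: consecutive logins farther apart than a plane could fly.
    ordered = sorted(logins, key=lambda l: l.ts_hours)
    for prev, cur in zip(ordered, ordered[1:]):
        hours = max(cur.ts_hours - prev.ts_hours, 1e-6)
        if distance_km(prev, cur) / hours > max_speed_kmh:
            return True
    # 2. Bandwidth consumption above a high limit.
    if daily_gb > max_gb:
        return True
    # 3. More live sessions per day than one person can plausibly generate.
    return daily_sessions > max_sessions

# Paris in the morning, Tokyo the same evening: flagged as impossible travel.
logins = [Login(9.0, 48.86, 2.35), Login(19.0, 35.68, 139.69)]
print(account_looks_shared(logins, daily_sessions=12, daily_gb=3))  # True
```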

Source

[ VIDEO OF THE WEEK]

#FutureOfData Podcast: Peter Morgan, CEO, Deep Learning Partnership

Subscribe to YouTube

[ QUOTE OF THE WEEK]

We chose it because we deal with huge amounts of data. Besides, it sounds really cool. – Larry Page

[ PODCAST OF THE WEEK]

#FutureOfData with @CharlieDataMine, @Oracle discussing running analytics in an enterprise

Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Distributed computing (performing computing tasks using a network of computers in the cloud) is very real. Google uses it every day to involve about 1,000 computers in answering a single search query, which takes no more than 0.2 seconds to complete.

Sourced from: Analytics.CLUB #WEB Newsletter

“To Cloud or Not”: Practical Considerations for Disaster Recovery and High Availability in Public Clouds

The perceived benefits of low cost storage, per-usage pricing models, and flexible accessibility—and those facilitated by multi-tenant, public cloud providers in particular—are compelling. Across industries and use cases, organizations are hastening to migrate to the cloud for applications that are increasingly becoming more mission critical with each deployment.

What many fail to realize (until after the fact) is that the question of security is not the only cautionary consideration accompanying this change in enterprise architecture. There are also numerous distinctions related to disaster recovery, failover clustering, and high availability for public cloud use cases which drastically differ from conventional on-premise methods for ensuring business continuity. Most times, businesses are tasked with making significant network configurations to enable these preventive measures which can ultimately determine how successful cloud deployments are.

“Once you’ve made the decision that the cloud is your platform, high availability and security are two things you can’t do without,” explained Dave Bermingham, Senior Technical Evangelist, Microsoft Cloud and Datacenter Management MVP at SIOS. “You have them on-prem. Whenever you have a business critical application, you make sure you take all the steps you can to make sure it’s highly available and your network is secure.”

Availability Realities
The realities of high availability in the cloud differ substantially from the general perception. According to Bermingham, most of the major public cloud providers such as AWS, Google, Azure and others “have multiple data centers across the entire globe. Within each geographic location they have redundancy in what they call zones. Each region is divided so you can have zone one and zone two be entirely independent of one another, so there should be no single point of failure between the zones.” The standard promises of nearly 100 percent availability contained in most service-level agreements are predicated on organizations running instances in more than one zone.

However, for certain critical applications such as database management systems like Microsoft SQL Server, for example, “the data is being written in one instance,” noted Bermingham. “Even if you have a second instance up and running, it’s not going to do you any good because the data written on the primary instance won’t be on the secondary instance unless you take steps to make that happen.” Some large cloud providers don’t have Storage Area Networks (SANs) used for conventional on-premise high availability, while there are also few out-the-box opportunities for failovers between regions. The latter is especially essential when “you have a larger outage that affects an entire region,” Bermingham said. “A lot of what we’ve seen to date has been some user error…that has a far reaching impact that could bring down an entire region. These are also susceptible to natural disasters that are regional specific.”

Disaster Recovery
Organizations can maximize disaster recovery efforts in public clouds or even mitigate the need for them with a couple different approaches. Foremost of these involves SANless clusters, which provide failover capabilities not predicated on SAN. Instead of relying on storage networks not supported by some large public clouds, this approach relies on software to facilitate failovers via an experience that is “the same as their experience on-prem with their traditional storage cluster,” Bermingham mentioned. Moreover, it is useful for standard editions of database systems like SQL Server as opposed to options like Always On availability groups.

The latter enables the replication of databases and failovers, but is a feature of the pricey enterprise edition of database management systems such as SQL Server. These alternative methods to what public clouds offer for high availability can assist with redundancy between regions, as opposed to just between zones. “You really want to have a plan B for your availability beyond just distributed across different zones in the same region,” Bermingham commented. “Being able to get your data and have a recovery plan for an entirely different region, or even from one cloud provider to another if something really awful happened and Google went offline across multiple zones, that would be really bad.”

Business Continuity
Other factors pertaining to inter-region disaster recovery expressly relate to networking differences between public clouds and on-premise settings. Typically, when failing over to additional clusters clients can simply connect to a virtual IP address that moves between servers depending on which node is active at that given point in time. This process involves gratuitous Address Resolution Protocols (ARPs), which are not supported by some of the major public cloud vendors. One solution for notifying clients of an updated IP address involves “creating some host-specific routes in different subnets so each of your nodes would live in a different subnet,” Bermingham said. “Depending upon whichever node is online, it will bring an IP address specific to that subnet online. Then, the routing tables would automatically be configured to route directly to that address with a host-specific route.”

Another option is to leverage an internal load-bouncer for client re-direction, which doesn’t work across regions. According to Bermingham: “Many people want to not only have multiple instances in different zones in a same region, but also some recovery option should there be failure in an entire region so they can stand up another instance in an entirely different region in Google Cloud. Or, they can do a hybrid cloud and replicate back on-prem, then use your on-prem as a disaster recovery site. For those configurations that span regions, the route update method is going to be the most reliable for client re-direction.”

Security Necessities
By taking these dedicated measures to ensure business continuity, disaster recovery, and high availability courtesy of failovers, organizations can truly make public cloud deployments a viable means of extracting business value. They simply require a degree of upfront preparation which many businesses aren’t aware of until they’ve already invested in public clouds. There’s also the issue of security, which correlates to certain aspects of high availability. “A lot of times when you’re talking about high availability, you’re talking about moving data across networks so you have to leverage the tools the cloud provider gives you,” Bermingham remarked. “That’s really where high availability and security intersect: making sure your data is secure in transit, and the cloud vendors will give you tools to ensure that.”

Originally Posted at: “To Cloud or Not”: Practical Considerations for Disaster Recovery and High Availability in Public Clouds

Technology Considerations for CIOs for Data Driven Projects

Making your business data driven requires monitoring far more data than you are used to, so you will have a big-data hump to get over. The bigger the data at play, the greater the need to handle larger interfaces, multiple data sources, and so on, and that calls for a strong data management strategy. Technology considerations vary with the business and the data involved. The areas below are worth considering in a technology strategy for a data-driven project; use them as a basic roadmap, since every business is specific and may have more or fewer things to worry about.

Key technology considerations for today's CIOs include:

Database Considerations:

One of the primary things that makes an entire data-driven project workable is the choice of database, and this choice depends on the risks associated with the database.

Consider the following:

– Coexisting with existing architecture:

One thing you have to ask yourself is how the required technology will square with the existing infrastructure. Technology integration, if not planned well, can toast a well-run business, so careful consideration is needed: it has a direct bearing on the organization's performance and cost. ETL (Extract, Transform and Load) tools must act as a bridge between a relational environment such as Oracle and an analytics data warehouse such as Teradata.
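
A minimal sketch of that bridge pattern (assuming Python; SQLite stands in here for both the source system and the warehouse, and the table and column names are illustrative):

```python
import sqlite3

def extract(source: sqlite3.Connection):
    """Pull raw order rows out of the operational (relational) system."""
    return source.execute(
        "SELECT customer_id, amount, created_at FROM orders"
    ).fetchall()

def transform(rows):
    """Aggregate raw rows into the shape the warehouse expects."""
    totals = {}
    for customer_id, amount, _created_at in rows:
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return list(totals.items())

def load(warehouse: sqlite3.Connection, records):
    """Write the transformed records into the analytics warehouse."""
    warehouse.executemany(
        "INSERT INTO customer_totals (customer_id, total_spend) VALUES (?, ?)",
        records,
    )
    warehouse.commit()

if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE orders (customer_id, amount, created_at)")
    source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                       [(1, 10.0, "2014-01-01"), (1, 5.0, "2014-01-02"),
                        (2, 20.0, "2014-01-02")])

    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE customer_totals (customer_id, total_spend)")

    load(warehouse, transform(extract(source)))
    print(warehouse.execute("SELECT * FROM customer_totals").fetchall())
```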

– Storage and Hardware:

Making the engine work requires a lot of processing around compression, deduplication, and cache management. These functions are critical for efficient data analytics, which is the backbone of most data-driven projects. Various vendors offer tools sophisticated enough to handle up to 45-fold compression and re-inflation, which makes processing and storage demanding. So, consideration of the tools and their hardware and storage needs is critical: each tool must be studied carefully for its footprint on the organization, and resources should be allocated according to the complexity of the tools and tasks.

– Query and Analytics:

Query complexity varies with the use case. Some queries need little pre- or post-processing, while others require deep analytics and heavy pre- and post-processing; each use case brings its own requirements and must be handled accordingly. Some cases may even need visualization tools to make the data consumable. So, careful consideration must be given to the low- and high-bar requirements of each use case. Query and analytics requirements indirectly affect both the cost and the infrastructure the business will need.

– Scale and Manageability:

Businesses often have to accumulate data from disparate sources and analyze it in different environments, which makes the whole model difficult to scale and manage. Understanding the complications around data modeling is another big task: it encompasses the infrastructure, tool, and talent requirements needed to provision for future growth. Deep consideration should be given to infrastructure scalability and manageability for the business once it runs on the data-driven model; it is a delicate task and must be done carefully to get the estimates right.

Data Architecture:
There are many other decisions to be made when considering the information architecture design as it relates to big data storage and analysis. These include choosing between relational or non-relational data stores; virtualized on-premise servers or external clouds; in-memory or disk-based processing; and uncompressed data formats (quicker access) or compressed ones (cheaper storage). Companies also need to decide whether or not to shard – splitting tables by row and distributing them across multiple servers – to improve performance. Other choices include column-oriented versus row-oriented as the dominant processing method, and a hybrid platform versus a greenfield approach. The solution may well be the best mix of the combinations above, so give careful thought to the data requirements.

– Column-oriented Database:

As opposed to row-based relational databases, a column-oriented database groups and stores data that share the same attribute, e.g. one record contains the age of every customer. This type of organization is conducive to performing many selective queries rapidly, a hallmark of big data analytics.
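
A minimal sketch of the layout difference (assuming Python; the data is illustrative): a row store keeps one record per customer, while a column store keeps one array per attribute, so an aggregate over a single attribute touches only that array.

```python
# Row-oriented layout: one record per customer.
rows = [
    {"id": 1, "name": "Ada",    "age": 36, "city": "London"},
    {"id": 2, "name": "Blaise", "age": 39, "city": "Paris"},
    {"id": 3, "name": "Carl",   "age": 77, "city": "Berlin"},
]

# Column-oriented layout: one array per attribute, in the same row order.
columns = {
    "id":   [1, 2, 3],
    "name": ["Ada", "Blaise", "Carl"],
    "age":  [36, 39, 77],
    "city": ["London", "Paris", "Berlin"],
}

# Average age: the row store must walk every record and pick out one field,
# while the column store scans a single contiguous array.
avg_from_rows = sum(r["age"] for r in rows) / len(rows)
avg_from_columns = sum(columns["age"]) / len(columns["age"])
print(avg_from_rows, avg_from_columns)
```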

– In-memory Database:

In-memory is another way to speed up processing: the database platform uses main memory for data storage instead of physical disks. This cuts the number of cycles required for data retrieval, aggregation, and processing, enabling complex queries to execute much faster. These are expensive systems and are best used when a high real-time processing rate is a priority; many trading desks use this model to process real-time trades.

– NoSQL:

“Not Only SQL” provides a semi-structured model of data handling for inconsistent or sparse data. Because the data is not rigidly structured, it does not require fixed table schemas or join operations, and it can scale horizontally across nodes (locally or in the cloud). NoSQL offerings come in different shapes and sizes, with open-source and licensed options, and are designed with the needs of various social and Web platforms in mind.

– Database Appliances:

These are readily usable data nodes: self-contained combinations of hardware and software that extend the storage capabilities of relational systems or provide an engineered system for new big data capabilities such as columnar, in-memory databases.

– Map Reduce:

MapReduce is a technique for distributing the computation of large data sets across a cluster of commodity processing nodes. Processing can be performed in parallel because the workload is broken into discrete, independent operations, allowing some workloads to be delivered most effectively via cloud-based infrastructure. It comes in really handy when tackling big-data problems at low cost.
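
A minimal single-machine sketch of the pattern (assuming Python; in a real cluster the map and reduce phases would run in parallel on different nodes, but the shape of the computation is the same):

```python
from collections import defaultdict

def map_phase(document: str):
    """Map: emit independent (key, value) pairs, here (word, 1)."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key, here a simple sum."""
    return key, sum(values)

documents = ["big data is big", "data beats opinion"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)   # {'big': 2, 'data': 2, 'is': 1, 'beats': 1, 'opinion': 1}
```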

Other Constraints:

Other constraints that are somewhat linked to technology must also be considered: resource requirements, market risks associated with the tools, and so on. This may look like an ad hoc task, but it carries similar pain points and must be taken seriously. Related decisions include whether resources and support are cheap or expensive, the risks of the technologies being adopted, and the affordability of the tools.

– Resource Availability:

As you nail down the technologies needed to fuel the data engine, one question to ask is whether knowledgeable resources are available in abundance or whether it will be a nightmare to find someone to help out. It is always helpful to adopt technologies that are popular and have more people available at a lower cost. It is simple supply-and-demand math, but it helps a lot further down the road.

– Associated Risks with tools:

As we all know, the world is changing at a pace that is difficult to keep up with, and the technological landscape is changing with it. It is crucial to consider the maturity of the tools and technologies under evaluation. Installing something brand new carries the risk of low adoption and hence weaker support, while old-school technologies are always vulnerable to disruption and being overrun by the competition. So, technology that sits somewhere in the middle is usually the safer choice.

– Indirect Costs & Affordability:

Another point linked to technology is the cost and affordability of a particular choice. License agreements, scaling costs, and the cost of managing organizational change are some of the important considerations to take care of. What starts cheap need not stay cheap later on, and vice versa, so carefully planning for customer growth, market change, and so on helps in understanding the complete long-term costs of adopting any technology.

So, getting your organization onto the rails of a data-driven engine is fun, super cool, and sustainable, but it carries some serious technology considerations that must be weighed before jumping on a particular technology.

Here is a video from IDC’s Crawford Del Prete discussing CIO Strategies around Big Data and Analytics

Source