Jan 30, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Correlation-Causation  Source

[ AnalyticsWeek BYTES]

>> 3 Vendors Lead the Wave for Big Data Predictive Analytics by analyticsweekpick

>> THE FUTURE OF BIG DATA by analyticsweekpick

>> Great Learning Data Science Awards by administrator

Wanna write? Click Here

[ FEATURED COURSE]

Process Mining: Data science in Action

image

Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy to use software the course provides data science knowledge that can be ap… more

[ FEATURED READ]

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

image

Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored f… more

[ TIPS & TRICKS OF THE WEEK]

Analytics Strategy that is Startup Compliant
With the right tools, capturing data is easy, but not being able to handle that data can lead to chaos. One of the most reliable startup strategies for adopting data analytics is TUM, or The Ultimate Metric: the metric that matters most to your startup. Some advantages of TUM: it answers the most important business question, it cleans up your goals, it inspires innovation, and it helps you understand the entire quantified business.

[ DATA SCIENCE Q&A]

Q: How to optimize algorithms (parallel processing and/or faster algorithms)? Provide examples of both.
A: Premature optimization is the root of all evil – Donald Knuth

Parallel processing: for instance in R on a single machine.
– the doParallel and foreach packages
– doParallel: registers a parallel backend using n cores of the machine
– foreach: assigns tasks to each core
– using Hadoop on a single node
– using Hadoop on multiple nodes
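For illustration, the same fork/join pattern can be sketched in Python with the standard multiprocessing module (the function f is a hypothetical CPU-bound task); this is the analogue of registering a doParallel backend and distributing work with foreach:

```python
from multiprocessing import Pool, cpu_count

def f(x):
    # stand-in for a CPU-bound task
    return x * x

if __name__ == "__main__":
    # "parallel backend": a pool with one worker per core
    with Pool(processes=cpu_count()) as pool:
        # "foreach": distribute the tasks across the workers
        results = pool.map(f, range(10))
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```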

Faster algorithms:
– In computer science, the Pareto principle often applies: roughly 90% of the execution time is spent executing 10% of the code
– Data structures: the choice of data structure affects performance
– Caching: avoid recomputing work that has already been done
– Source-level improvements
For instance, on early C compilers, WHILE(something) was slower than FOR(;;), because WHILE evaluated “something” and then made a conditional jump testing whether it was true, while FOR used an unconditional jump.
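As a concrete example of the caching point, a short Python sketch using functools.lru_cache to memoize a naively recursive function:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without the cache this recursion is exponential;
    # with memoization each value is computed only once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040, computed almost instantly
```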

Source

[ VIDEO OF THE WEEK]

@SidProbstein / @AIFoundry on Leading #DataDriven Technology Transformation #FutureOfData #Podcast


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Torture the data, and it will confess to anything. – Ronald Coase

[ PODCAST OF THE WEEK]

Dave Ulrich (@dave_ulrich) talks about role / responsibility of HR in #FutureOfWork #JobsOfFuture #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Big data is a top business priority and drives enormous opportunity for business improvement. Wikibon’s own study projects that big data will be a $50 billion business by 2017.

Sourced from: Analytics.CLUB #WEB Newsletter

How to create a business glossary on Talend Data Catalog using API Services and Data Stewardship

We often come across people talking about managing their data through a single means such as a data lake, MDM, or data governance. Modern data management is not only about managing your data but also about making that data useful for the business. Furthermore, it is about providing the ability to relate frequently used business terminology to the data in your systems. Most big enterprises spend months discovering and identifying the impact of any change on their entire data supply chain.

For example, introducing a change to a business terminology can cause a domino effect across dependent systems. Companies usually spend a large amount of time accurately identifying the impact of such a change on downstream systems, and even that does not guarantee a 100% success rate. Incorrect impact analysis usually results in breaks in the lineage and the propagation of incorrect information across the data supply chain. Talend Data Catalog is a one-source platform that helps you leverage a single source of truth, with the flexibility of using APIs to add or remove terminologies and relate them to data within the systems.

With the introduction of the Talend Data Catalog API, we can now use REST calls to automate actions on business terms, such as creating, updating, and deleting them.

A sample job, shown in Figure 5, demonstrates how we can use Talend Data Catalog’s REST APIs from a Talend DI job to set attributes and custom attributes for new business terms in the Talend Data Catalog glossary model as needed. Additionally, we can rely on Talend Data Stewardship to accept or reject changes made to terms in the business glossary.

Figure 1 below shows the Swagger documentation of the Talend Data Catalog APIs available. As seen from the documentation, Talend Data Catalog APIs provide a rich feature set to programmatically access and manipulate metadata content.

Rest API Calls

Figure 1: Talend Data Catalog Rest API Calls

Talend Data Stewardship for business terms

Data stewardship plays a critical role in a successful data-driven glossary across the enterprise. Data stewards make a significant contribution by cleaning, refining, and approving data. Talend provides a Data Stewardship portal and Studio components that let stewards validate and approve the terminologies that should be part of the enterprise glossary. The work of a data steward is organized around two core components: campaigns and tasks. There are four types of campaigns: Arbitration, Resolution, Merging, and Grouping.

To begin, we will use a Resolution campaign and create a data model in the Data Stewardship portal. The data model needs attributes such as name, glossarypath, categorypath, and description, as shown in Figure 2. Data stewards will then explore the data related to their tasks and resolve the tasks either one by one or for a whole set of records.

 
Data Stewardship Portal

Figure 2: Data Stewardship Portal to create Campaign and data model

In the example below, we have created a new Talend Data Integration job to fetch business terms and their corresponding descriptions from a database and assign them to data stewards for approval. As shown in Figure 3, we can leverage an enterprise database with a predefined table of terms and their definitions and push them to data stewards for changes/approvals, or we can pass them in through a file.

Talend Data catalog

Figure 3: Fetch terms from database and assign to stewards.

 

In the tStewardshipTaskOutput component, enter the correct URL of the Data Stewardship portal and the corresponding user credentials. Create columns in the schema matching the attributes of the Data Stewardship data model. Select the campaign type as Resolution. You can assign the task to a particular steward or select “No Assignee”, as shown below.

 

Talend Data catalog

Figure 4: Create steward task for approval

Create another job with components that connect to Talend Data Catalog using REST API calls, as shown in Figure 5. Then select the terms approved by data stewards and add them to the Data Catalog glossary. We can also export all terms as a CSV file using the export API call, and update terms or add custom attributes to them. Finally, close the REST API connection and, if desired, delete the task in Data Stewardship.

fetch approved terms by data Stewards

Figure 5: Job to fetch approved terms by data Stewards and create into Data Catalog Glossary

As shown in Figure 6, create a tRestClient connection to access the Data Catalog portal. Provide the correct URL, set the HTTP method to GET and the Accept type to JSON, and provide the query parameters user, password, and forceLogin set to “true”.
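To make the shape of this login call concrete, here is a rough Python sketch of the same request, together with the token extraction described in Figure 7. The base URL, the login path, and the name of the JSON field holding the token are assumptions; check the Swagger documentation of your own Data Catalog installation (Figure 1) for the exact values:

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://your-tdc-host/MM/rest"  # placeholder for your installation

def login_url(user, password):
    # GET with the same query parameters as the tRestClient component:
    # user, password, and forceLogin=true
    params = urllib.parse.urlencode(
        {"user": user, "password": password, "forceLogin": "true"})
    return f"{BASE_URL}/auth/login?{params}"  # hypothetical path

def extract_token(payload):
    # The tExtractJSONFields step: map the JSON response to Session_token.
    # "result" is an assumed field name.
    return payload["result"]

def login(user, password):
    req = urllib.request.Request(login_url(user, password),
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return extract_token(json.load(resp))
```

The returned token would then be stored for reuse (the global-variable step) and sent on subsequent API calls.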

 

create a REST connection to data catalog

Figure 6: tRestClient component to create a REST connection to data catalog

 

Extract the JSON fields and map them to a “Session_token” column to store the session token returned by the login call for future access, as shown in Figure 7.

 

Talend Data Catalog

Figure 7: Extract access token from response

 

Set a global variable with the access token, and add another key, “id”, corresponding to the object_id of the glossary in Talend Data Catalog, as shown in Figure 8.

 

Talend Data Catalog

Figure 8: Set Global variable with glossary Object_id and access token

 

Stewards should select the tasks assigned to them and approve them by clicking on the corresponding rows and validating their choice to approve or reject, as shown below in Figure 9.

Data stewards portal

Figure 9: Resolving terms by data Stewards in Stewardship portal

 

Using the tDataStewardshipTaskInput component, select all terms approved by stewards (for a particular steward or any assignee) whose State is set to “Resolved” or to a custom state such as “Ready to Publish”, as shown below in Figure 10.

 

Talend Data Catalog

Figure 10: Selecting resolved terms ready to publish in data catalog glossary

 

Use a tREST component to add a term to the Data Catalog glossary. As shown below in Figure 11, provide the correct URL with the API token as a parameter and the Accept type set to JSON.

 

Glossary Talend Data catalog

Figure 11: Adding term to glossary

 

Once you have added terms to the glossary, you can either export them as CSV using the tREST component as shown in Figure 12, or add a custom attribute to a term as shown in Figure 13.

Glossary terms, data Catalog

Figure 12: Downloading glossary terms as CSV file

Attributes Glossary Terms

Figure 13: Setting attributes to a glossary term

 

In summary, the Talend Data Catalog REST API gives businesses a lot of flexibility to populate business terminology into the Talend Data Catalog glossary by various means, and a platform to incorporate data governance and regulatory compliance with the involvement of the right stakeholders and data stewards. This blog is a starting point for exploring ways to use the REST APIs with Talend Data Catalog.

The post How to create a business glossary on Talend Data Catalog using API Services and Data Stewardship appeared first on Talend Real-Time Open Source Data Integration Software.

Source: How to create a business glossary on Talend Data Catalog using API Services and Data Stewardship by analyticsweekpick

Logi Tutorial: How to Create Engaging Visual Content

[youtube https://www.youtube.com/watch?v=WAnfaUD9K6s?feature=oembed&w=640&h=360]

Full Transcript

The analysis grid makes it possible to explore, shape, and visualize data through an easy-to-use web interface. With the analysis grid, users can create engaging visual content, filter and aggregate data, and find meaningful answers without technical skill or help from IT. This report was created in less than 10 minutes. In this video we’re going to reconstruct this report while learning the fundamentals of using the analysis grid. Let’s take a look at the tools and features the analysis grid has to offer.

I’ll start by selecting a data source; there may be multiple options available to you. This is the underlying database we want to connect to and pull data from. You now have a list of tables to choose from, and the data sets can be joined together to provide extra data fields. You may also want to tick the checkboxes next to data field names; this removes them from the data request.

Before we begin creating charts, it’s important to understand how to filter and format and manipulate data. You may have already noticed that there are tabs for formulas and filters as well as tools for adding charts and crosstabs. There’s also an undo and redo feature in case a mistake is made.

Filters are really simple to add. Select the filter column, choose a comparator, and then set the value to filter by. You can create as many filters as needed and you may also remove or edit existing filters. The formulas allow you to make calculations using data fields you’ve selected. The formula help button provides some in depth explanations for the functions and tools available in this feature.

When data is selected, you’re given a table with the resulting data. The table has tooling for showing, hiding, sorting, grouping, aggregating, and paging. It is possible to focus these modifications on a specific column or field of data by clicking on the column header. This lets you select from an option list in the menu; in this case, I’d like to format my freight data as currency. When I create charts or other report content, the freight information will display as currency instead of a basic number.

Now that our data is formatted, we can move on to building a chart. The chart creator tool has many options that help us visualize the data. The analysis grid detects field data types automatically and guides us toward a chart that makes sense. Start by selecting the label column; this will be the label for each bar in the chart. Notice how the tool knows that this is a date field and offers temporal grouping options. Now let’s select the freight field in the data column, which will show us how much freight was paid each quarter. It looks like we need a bit more room to display this visualization correctly, so we can simply click and drag the handles on the bottom or bottom-right corner to size the chart to the desired height and width.

Certain charts allow for forecasting. In this chart, I want to use a logarithmic regression to show what the next two quarters might look like, and we have quite a few options available that fit our use case. Looking good. Now let’s build a crosstab table. Much like the chart creator, the crosstab creator guides us through building useful content. I want to use this table to break down the totals displayed in the bar visualization so I can see how much freight we’re spending for each country. This gives me deeper supplemental information that I might not previously have been aware of.

Each column can be reordered and resized. I’m going to resize the columns in this table to improve the presentation and viewing experience: simply click the handles and move them around, or click the handle on the right-hand side to readjust the size. Now we’re done with the crosstab, and we have two different visual pieces to display to users. In this case, I want my bar chart above the crosstab, and all I had to do was click and drag to rearrange the blocks.

This report can be given a specific name, which can help us identify what content or information it contains when we come back to it, or perhaps share it with other users. The analysis grid has a baked in auto save feature, so any changes we make to the report are saved automatically without the user having to manually click a button. We’ve completed our report and we’re ready to ship it out to other users.

Source by analyticsweek

Jan 23, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Convincing  Source

[ AnalyticsWeek BYTES]

>> May 29, 2017 Health and Biotech analytics news roundup by pstein

>> Part 8 of 9, Big Data/Data Lake Platforms: Removing Silos & Operationalizing Your Data by analyticsweekpick

>> The Modern Data Warehouse – Enterprise Data Curation for the Artificial Intelligence Future by analyticsweek

Wanna write? Click Here

[ FEATURED COURSE]

Master Statistics with R

image

In this Specialization, you will learn to analyze and visualize data in R and create reproducible data analysis reports, demonstrate a conceptual understanding of the unified nature of statistical inference, perform fre… more

[ FEATURED READ]

The Signal and the Noise: Why So Many Predictions Fail–but Some Don’t

image

People love statistics. Statistics, however, do not always love them back. The Signal and the Noise, Nate Silver’s brilliant and elegant tour of the modern science-slash-art of forecasting, shows what happens when Big Da… more

[ TIPS & TRICKS OF THE WEEK]

Data Have Meaning
We live in a Big Data world in which everything is quantified. While the emphasis of Big Data has been focused on distinguishing the three characteristics of data (the infamous three Vs), we need to be cognizant of the fact that data have meaning. That is, the numbers in your data represent something of interest, an outcome that is important to your business. The meaning of those numbers is about the veracity of your data.

[ DATA SCIENCE Q&A]

Q: Provide examples of machine-to-machine communication.
A: Telemedicine
– Heart patients wear specialized monitors which gather information about the heart’s state
– The collected data is sent to an implanted electronic device, which delivers electric shocks to the patient to correct abnormal rhythms

Product restocking
– Vending machines can message the distributor whenever an item is running out of stock

Source

[ VIDEO OF THE WEEK]

#FutureOfData Podcast: Conversation With Sean Naismith, Enova Decisions


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Big Data is not the new oil. – Jer Thorp

[ PODCAST OF THE WEEK]

Harsh Tiwari talks about fabric of data driven leader in Financial Sector #FutureOfData #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

We are seeing a massive growth in video and photo data, where every minute up to 300 hours of video are uploaded to YouTube alone.

Sourced from: Analytics.CLUB #WEB Newsletter

Build an Affordable $600 eSports Gaming PC to Play CS: GO, DotA 2, LoL and Overwatch

eSports games have become more and more popular among both fans and gamers. As a result, many people start dreaming of building eSports careers—many gamers have pretty solid plans to achieve that goal. What does every gamer need to play eSports disciplines on the appropriate level, and to be able to challenge top LoL (League […]

The post Build an Affordable $600 eSports Gaming PC to Play CS: GO, DotA 2, LoL and Overwatch appeared first on TechSpective.

Source: Build an Affordable $600 eSports Gaming PC to Play CS: GO, DotA 2, LoL and Overwatch by administrator

Jan 16, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Data shortage  Source

[ AnalyticsWeek BYTES]

>> Visualizing taxi trips between NYC neighborhoods with Spark and Microsoft R Server by analyticsweek

>> The Wonders of Effectual Metadata Management: Automation by jelaniharper

>> Unraveling the Mystery of Big Data by v1shal

Wanna write? Click Here

[ FEATURED COURSE]

Artificial Intelligence

image

This course includes interactive demonstrations which are intended to stimulate interest and to help students gain intuition about how artificial intelligence methods work under a variety of circumstances…. more

[ FEATURED READ]

Hypothesis Testing: A Visual Introduction To Statistical Significance

image

Statistical significance is a way of determining whether an outcome occurred by random chance or whether something caused that outcome to differ from the expected baseline. Statistical significance calculations find their … more

[ TIPS & TRICKS OF THE WEEK]

Data aids, not replace judgement
Data is a tool and means to help build a consensus to facilitate human decision-making but not replace it. Analysis converts data into information, information via context leads to insight. Insights lead to decision making which ultimately leads to outcomes that brings value. So, data is just the start, context and intuition plays a role.

[ DATA SCIENCE Q&A]

Q: Explain selection bias (with regard to a dataset, not variable selection). Why is it important? How can data management procedures such as missing-data handling make it worse?
A: Selection bias is the selection of individuals, groups, or data for analysis in such a way that proper randomization is not achieved.
Types:
– Sampling bias: systematic error due to a non-random sample of a population, causing some members to be less likely to be included than others
– Time interval: a trial may be terminated early at an extreme value (for ethical reasons), but the extreme value is likely to be reached by the variable with the largest variance, even if all the variables have similar means
– Data: “cherry picking”, when specific subsets of the data are chosen to support a conclusion (e.g., citing examples of plane crashes as evidence that airline flight is unsafe, while ignoring the far more common flights that complete safely)
– Studies: performing experiments and reporting only the most favorable results
– Selection bias can lead to inaccurate or even erroneous conclusions
– Statistical methods can generally not overcome it

Why can missing-data handling make it worse?
– Example: individuals who know or suspect that they are HIV positive are less likely to participate in HIV surveys
– Missing-data handling amplifies this effect, since imputation is then based mostly on HIV-negative respondents
– Prevalence estimates will be inaccurate

Source

[ VIDEO OF THE WEEK]

@CRGutowski from @GE_Digital on Using #Analytics to #Transform Sales #FutureOfData #Podcast


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world. – Atul Butte, Stanford

[ PODCAST OF THE WEEK]

#BigData @AnalyticsWeek #FutureOfData #Podcast with @ScottZoldi, @FICO


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years.

Sourced from: Analytics.CLUB #WEB Newsletter

Where Chief Data Scientist & Open Source Meets – @dandegrazia #FutureOfData #Podcast

[youtube https://www.youtube.com/watch?v=-DkLqKwEhHo]

In this podcast, @DanDeGrazia from @IBM spoke with @Vishaltx from @AnalyticsWeek about the intersection of the chief data scientist role and open source. He sheds light on some of the big opportunities in open source and how businesses could work together to advance data science. Dan also shared the importance of smooth communication for success as a data scientist.

Dan’s Recommended Read:
The Five Temptations of a CEO, Anniversary Edition: A Leadership Fable by Patrick Lencioni https://amzn.to/2Jcm5do
What Every BODY is Saying: An Ex-FBI Agent’s Guide to Speed-Reading People by Joe Navarro, Marvin Karlins https://amzn.to/2J1RXxO

Podcast Link:
iTunes: http://math.im/itunes
GooglePlay: http://math.im/gplay

Dan’s BIO:
Dan has almost 30 years of experience working with large data sets. Starting with the unusual work of analyzing potential jury pools in the 1980s, Dan also did some of the first PC-based voter registration analytics in the Chicago area, including putting the first complete list of registered voters on a PC (as hard as that is to imagine today, a 50-megabyte hard drive on a DOS system was staggering). Interested in almost anything new and technical, he worked at the Chicago Board of Trade, where he taught himself BASIC to write algorithms while working as an arbitrager in financial futures. After the military, Dan moved to San Francisco, where he worked at several small companies and startups designing and implementing some of the first PC-based fax systems (who cares now!) and enterprise accounting software, and working with early middleware connections using early 3GL/4GL languages. Always pursuing the technical edge cases, Dan worked for InfoBright, a column-store database startup, in the US and AMEA; at Lingotek, an In-Q-Tel-funded company working on large data set translations; and at big data analytics companies like Datameer, before his current position as Chief Data Scientist for Open Source in the IBM Channels organization. Dan’s current just-for-fun project is an app that will record and analyze bird songs and provide the user with information on the bird and the specifics of the current song.

About #Podcast:
The #FutureOfData podcast is a conversation starter that brings leaders, influencers, and leading practitioners on the show to discuss their journeys in creating the data-driven future.

Want to sponsor?
Email us @ info@analyticsweek.com

Keywords:
#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Source by v1shal

November 14, 2016 Health and Biotech analytics news roundup

Here’s the latest in health and biotech analytics:

Data Specifics Identified for Prediagnostic Heart Failure Detection: IBM researchers analyzed machine learning models that predict heart failure (paper). Among other findings, they worked out that models perform best with shorter prediction windows.

Will Google Take Over the Medical Industry? Big Questions at CO’s Healthcare Conference: In the keynote speech at the Pulse Healthcare Conference, Andrew Quirk pointed to many new players entering the healthcare industry. Panels at the conference covered topics like patient experiences and the future of hospitals.

Accelerating cancer research with deep learning: Georgia Tourassi is head of Health Data Science at Oak Ridge National Laboratory. Her group is using deep neural networks to extract useful diagnostic data, such as the location of a tumor, from clinical reports.

A student innovation to tackle cognitive challenges in health informatics wins this year’s Sysmex Award: The New Zealand diagnostics company gave the award to Daniel Surkalim, a University of Auckland student. He proposed using “graphical relational integrated databases” to make it easier for providers to access electronic health data.

Originally Posted at: November 14, 2016 Health and Biotech analytics news roundup

Jan 09, 20: #AnalyticsClub #Newsletter (Events, Tips, News & more..)

[  COVER OF THE WEEK ]

image
Accuracy check  Source

[ AnalyticsWeek BYTES]

>> AI Now 2019 report slams government use of facial recognition, biased AI by administrator

>> Talent Analytics: Old Wine In New Bottles? by analyticsweekpick

>> 12 Drivers of BigData Analytics by v1shal

Wanna write? Click Here

[ FEATURED COURSE]

Tackle Real Data Challenges

image

Learn scalable data management, evaluate big data technologies, and design effective visualizations…. more

[ FEATURED READ]

Hypothesis Testing: A Visual Introduction To Statistical Significance

image

Statistical significance is a way of determining whether an outcome occurred by random chance or whether something caused it to differ from the expected baseline. Statistical significance calculations find their … more
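The baseline comparison described above can be sketched in a few lines of standard-library Python. This is a minimal illustration of a two-sided z-test, with made-up numbers (mean, baseline, standard deviation, sample size are all hypothetical):

```python
import math

def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a standard-normal z statistic."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1)
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical example: observed mean 52 vs. expected baseline 50,
# known standard deviation 10, sample size 100.
z = (52 - 50) / (10 / math.sqrt(100))  # z = 2.0
p = two_sided_p_value(z)
print(round(p, 4))  # → 0.0455, below the common 0.05 threshold
```

At p ≈ 0.0455 the outcome would be called statistically significant at the conventional 5% level.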

[ TIPS & TRICKS OF THE WEEK]

Data aids, not replaces, judgment
Data is a tool and a means to help build consensus and facilitate human decision-making, not replace it. Analysis converts data into information; information, via context, leads to insight. Insights lead to decisions, which ultimately lead to outcomes that bring value. So data is just the start; context and intuition also play a role.

[ DATA SCIENCE Q&A]

Q: You are compiling a report for user content uploaded every month and notice a spike in uploads in October, in particular a spike in picture uploads. What might you think is the cause of this, and how would you test it?
A: * Halloween pictures?
* Look at uploads in countries that don’t observe Halloween as a counterfactual analysis
* Compare mean uploads in October with mean uploads in September via hypothesis testing
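The month-over-month comparison in the last step can be sketched as a quick two-sample test. The daily upload counts below are hypothetical, and a normal approximation stands in for a proper t-test (reasonable at roughly 30 observations per group):

```python
import math
from statistics import mean, stdev

def welch_z_test(a, b):
    """Approximate two-sample test on means (normal approximation,
    reasonable for ~30+ observations per group)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p

# Hypothetical daily picture-upload counts (thousands)
september = [100 + (i % 7) for i in range(30)]  # stable baseline
october = [115 + (i % 7) for i in range(31)]    # apparent spike
z, p = welch_z_test(october, september)
print(f"z = {z:.2f}, p = {p:.4g}")
```

A small p-value here would confirm the October spike is unlikely to be noise; the counterfactual check on non-Halloween countries then helps attribute the cause.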

Source

[ VIDEO OF THE WEEK]

Decision-Making: The Last Mile of Analytics and Visualization


Subscribe to  Youtube

[ QUOTE OF THE WEEK]

The most valuable commodity I know of is information. – Gordon Gekko

[ PODCAST OF THE WEEK]

Future of HR is more Relationship than Data - Scott Kramer @ValpoU #JobsOfFuture #Podcast


Subscribe 

iTunes  GooglePlay

[ FACT OF THE WEEK]

Distributed computing (performing computing tasks using a network of computers in the cloud) is very real. Google uses it every day, involving about 1,000 computers in answering a single search query, which takes no more than 0.2 seconds to complete.

Sourced from: Analytics.CLUB #WEB Newsletter

Making Magic with Treasure Data and Pandas for Python

Mirror Moves, by John Hammink

Originally published on Treasure Data blog.

Magic functions, a mainstay of IPython and Jupyter, speed up common tasks by saving you typing. A magic function is invoked with a % prefix (or %% for cell magics). Magic functions were introduced into pandas-td in version 0.8.0! Toru Takahashi from Treasure Data walks us through.

Treasure Data’s cell magics wrap a second % around the usual prefix, so they are invoked with %%. Let’s explore further to see how this works.
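Under the hood, IPython routes a %%-prefixed cell to a registered handler function. Here is a toy sketch of that dispatch mechanism; the registry and handler names are illustrative only, not the actual pandas-td internals:

```python
# Toy sketch of how a %%-style cell magic is dispatched.
# The registry and handler names are illustrative only.
MAGICS = {}

def register_cell_magic(name):
    """Decorator that registers a handler for %%<name> cells."""
    def wrap(fn):
        MAGICS[name] = fn
        return fn
    return wrap

@register_cell_magic("td_presto")
def td_presto(line, cell):
    # A real handler would send `cell` to the Presto query engine;
    # here we just echo what would be executed.
    return f"presto({line.strip() or 'default'}): {cell.strip()}"

def run_cell(source):
    """Dispatch a cell whose first line starts with %%."""
    first, _, rest = source.partition("\n")
    name, _, args = first[2:].partition(" ")
    return MAGICS[name](args, rest)

print(run_cell("%%td_presto --plot\nselect count(1) as cnt from nasdaq"))
# → presto(--plot): select count(1) as cnt from nasdaq
```

In a real notebook, IPython's own registry plays the role of MAGICS, and pandas_td.ipython registers handlers like td_presto when the extension is loaded.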

Until now

We start by creating a connection, importing our relevant libraries, and issuing a basic query, all from python (in Jupyter). Using the sample data, it would look like this:

import os
import pandas_td as td

# Initialize the connection
con = td.connect(apikey=os.environ['TD_API_KEY'], endpoint='https://api.treasuredata.com')
engine = con.query_engine(database='sample_datasets', type='presto')

# Read a Treasure Data query into a DataFrame
df = td.read_td('select * from www_access', engine)

With the magic function

We can now do merely this:

%td_use sample_datasets

%%td_presto
select count(1) as cnt
from nasdaq

If you add the table name nasdaq after %td_use, you can also see the schema:

use_sample_datasets_nasdaq_schema

Even better, you can tab edit the stored column names:

tab_edit_stored_column_names

As long as %matplotlib inline is enabled, you can add --plot to a %%td_presto query and immediately visualize the result!

image

Very convenient!

How to enable it

Set the TD_API_KEY environment variable:
export TD_API_KEY=1234/abcd…

You can then load the magic commands automatically! Save the following to ~/.ipython/profile_default/ipython_config.py:

c = get_config()
c.InteractiveShellApp.extensions = [
    'pandas_td.ipython',
]

Let’s review

Loading your data:
image

Querying your data with Presto:
image

Accessing stored columns:
image

Plotting:
image

Stay tuned for many more useful functions from pandas-td! These tools, including pandas itself, as well as Python and Jupyter, are always changing, so please let us know if anything works differently from what’s shown here.

Magic, by John Hammink

Originally Posted at: Making Magic with Treasure Data and Pandas for Python by john-hammink