Monday, January 9, 2017

What is the difference between Data Analytics, Data Analysis, Data Mining, Data Science, Machine Learning, and Big Data?


Lately, I've been doing some research on Machine Learning, which I find very interesting and impressive. Creating software was always interesting, but coding to "educate" your software so that it can learn from previous experience makes it even more so. However, if you look for information about Machine Learning, you will run into other closely related topics: Data Analytics, Data Analysis, Data Mining, Data Science, and Big Data. So how do they differ from each other?

Here are some core concepts:

Data Analytics: Analytics is about applying a mechanical or algorithmic process to derive insights, for example running through various data sets looking for meaningful correlations between them.

Data Analysis: Analysis is really a heuristic activity, where the analyst gains some insight by scanning through all the data.

Data Mining: This term was most widely used in the late 90's and early 00's, when a business would consolidate all of its data into an Enterprise Data Warehouse. All of that data was brought together to discover previously unknown trends, anomalies and correlations, such as the famed 'beer and diapers' correlation (see "Diapers, Beer, and data science in retail").

Data Science: A combination of mathematics, statistics, programming, the context of the problem being solved, ingenious ways of capturing data that is not being captured right now, plus the ability to look at things 'differently' (as in "Why UPS Trucks Don't Turn Left") and, of course, the significant and necessary activity of cleansing, preparing and aligning the data.

Machine Learning: This is one of the tools used by data scientists. A model is created that mathematically describes a certain process and its outcomes. The model then provides recommendations, monitors the results once those recommendations are implemented, and uses those results to improve itself.
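To make the Data Mining idea a bit more concrete, here is a minimal sketch of how a 'beer and diapers' style correlation could be measured. The baskets below are toy data I invented for illustration, and the helper names `pair_stats` and `frequent_pairs` are hypothetical, not taken from any library. It computes the usual market-basket measures: support (how often both items appear together), confidence (how often the second item appears when the first does) and lift (how much more often they appear together than chance would suggest).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, invented purely for illustration.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "chips"},
    {"diapers", "milk"},
    {"beer", "diapers", "milk"},
]


def pair_stats(baskets, a, b):
    """Return (support, confidence, lift) for the rule a -> b."""
    n = len(baskets)
    count_a = sum(1 for basket in baskets if a in basket)
    count_b = sum(1 for basket in baskets if b in basket)
    count_ab = sum(1 for basket in baskets if a in basket and b in basket)
    support = count_ab / n
    confidence = count_ab / count_a if count_a else 0.0
    lift = confidence / (count_b / n) if count_b else 0.0
    return support, confidence, lift


def frequent_pairs(baskets, min_count=2):
    """Count every item pair and keep those seen at least min_count times."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("beer", "diapers").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


support, confidence, lift = pair_stats(baskets, "beer", "diapers")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.3f}")
print(frequent_pairs(baskets))
```

A lift above 1 suggests the two items show up together more often than they would if they were independent. On real data you would need far more baskets before trusting a number like that.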

In addition, I found a discussion about this topic, and I wanted to share some thoughts from it that I think help to clarify things:

"The way I see it, machine learning is concerned with algorithms whose performance at some task improves as it gains experience at that task, while data mining is concerned with analysing data for the purpose of discovering unforeseen patterns or properties.

So the similarities are obvious, they both look at data, and hope to extract something of value from it. As I see it, the main difference is whether the goal is to reproduce known knowledge (I know that some of these pictures are cats, and some are dogs, now can some algorithm learn that?), or if the goal is to discover unknown knowledge (is there any interesting structure in this data set?). The two are, unsurprisingly, intertwined, as many of the properties or structure one may be searching for in data mining can be identified by machine learning algorithms. For instance, in data mining, one might be interested in determining if clusters of a certain form appear in the data, and could use a machine learning algorithm like k-means. K-means is a learning algorithm, in that if data has a known structure, it can learn it (under specific conditions, blah blah blah).

So data mining is exploratory, machine learning is focused on solving specific tasks well. That's my take on it, anyway." (by: Jordan Frank)
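Since the quote mentions k-means, here is a small sketch of the algorithm in plain Python. It is only an illustration under my own assumptions: the 2-D points are made up, the starting centroids are passed in by hand instead of being chosen at random, and the function names (`assign`, `update`, `kmeans`) are hypothetical. It needs Python 3.8 or later for `math.dist`.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if not members:
            # An empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        dims = len(old)
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dims))
        )
    return new_centroids


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return centroids, labels


# Made-up data: two loose groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[1]])
print(centroids)
print(labels)
```

Nobody tells the algorithm which point belongs to which group. It finds the two groups on its own, which is the "discover structure" side of the quote, while the way it refines its centroids on every pass is the "learning" side.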


The following graphic nicely summarizes everything that is involved in data science.










