What exactly it is, its amazing value, with great use case analogies
Data transformation is nothing more than reducing the dimensionality of data to make it easier to work with and to draw precise insights from it.
A fantastic analogy
Imagine you run a brick-and-mortar store and you would like to track how many people enter your store on a given day. You could hang a bell at the door and, every time you hear it ring, make a mark in a notebook. At the end of the day, you write down the date next to that day's marks, and you repeat the process the next day. To total how many people walked into the store in a particular month, you'll need to add up how many people entered on each day of that month. Once you have that information, you can compare the totals month to month and look at seasonality and how it affects foot traffic.
Now, see how seamlessly your data has transformed!!!
The process you went through when you counted all the markings and turned them into a number is data transformation. The same can be said about the process of totaling the monthly customers. Turning all those markings on a piece of paper into a single number reduces the dimensionality of the collected data, which lowers its complexity before you obtain insights from it.
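The notebook example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `bell_rings` list, its dates, and the helper names `daily_totals` and `monthly_totals` are made up for the example, and a real store would more likely read the marks from a file or a sensor.

```python
from collections import Counter
from datetime import date

# Hypothetical raw data: one entry per bell ring, recording the day it happened.
bell_rings = [
    date(2023, 1, 5), date(2023, 1, 5), date(2023, 1, 6),
    date(2023, 2, 1), date(2023, 2, 1), date(2023, 2, 1),
]

def daily_totals(rings):
    """Collapse individual marks into one count per day."""
    return Counter(rings)

def monthly_totals(daily):
    """Collapse daily counts into one count per (year, month)."""
    monthly = Counter()
    for day, count in daily.items():
        monthly[(day.year, day.month)] += count
    return monthly

daily = daily_totals(bell_rings)
monthly = monthly_totals(daily)

for (year, month), visitors in sorted(monthly.items()):
    print(f"{year}-{month:02d}: {visitors} visitors")
```

Each step throws away detail (first the individual marks, then the individual days) and keeps only the number needed for the month-to-month comparison.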
The mammoth power of transformed data!!!
Data transformation is a key ingredient in many industries. Utility companies, for example, use an array of IoT devices to take remote measurements of different pieces of their infrastructure. Those measurements are sent to a network of computer systems that monitor them and alert stakeholders. This data loses value over time: nobody will place much importance on an alert that happened two years ago because, hopefully, the alert was acted upon and the issue is no longer present. However, storing the alert data, along with any other data points that describe the alert, allows us to analyze it later and perhaps predict future failures.
Data Transformation as the ideal ice breaker
Pull out all the information you need from piles of raw data!!!
Another powerful characteristic of transforming data is the ability to derive information from unstructured, unrelated data that is not specific to any domain. Let's go back to the example of our brick-and-mortar store. If one day a customer comes in and thanks us for our wonderful service, we could write down in our notebook that we received a compliment. If the next day a customer complains about a bad experience with an item they purchased, we could also write down that we got one complaint that day. Now, imagine a large brand with an online presence that receives reviews on its website, Twitter, Instagram, Facebook and other social media. How would it know how many of those are complaints and how many are compliments? Better yet, how would it know which product, service or specific thing the customers are complaining about or complimenting? This is where techniques like Natural Language Processing (NLP) and sentiment analysis can reduce the dimensionality of the data by classifying each textual review as good or bad and counting how many fall into each group. They can also classify what each review is about and group similar reviews together, which helps us understand exactly what we need to improve in our customer experience, our products, and so on.
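To make the idea concrete, here is a deliberately simplified sketch of turning free-text reviews into counts. It is not real NLP: it swaps in a plain keyword lookup as a stand-in for a trained sentiment model, and the word lists, the sample reviews, and the `classify` helper are all assumptions made up for the illustration.

```python
from collections import Counter

# Hypothetical keyword lists; a real system would use a trained sentiment model.
POSITIVE_WORDS = {"great", "wonderful", "love", "thanks", "excellent"}
NEGATIVE_WORDS = {"bad", "broken", "terrible", "refund", "disappointed"}

def classify(review):
    """Label a review by comparing how many positive and negative keywords it contains."""
    words = {word.strip(".,!?").lower() for word in review.split()}
    positive_hits = len(words & POSITIVE_WORDS)
    negative_hits = len(words & NEGATIVE_WORDS)
    if positive_hits > negative_hits:
        return "compliment"
    if negative_hits > positive_hits:
        return "complaint"
    return "unclear"

# Hypothetical reviews collected from different channels.
reviews = [
    "Thanks for the wonderful service!",
    "The blender arrived broken, very disappointed.",
    "Great store, I love it.",
    "Where is my order?",
]

# Many free-text reviews collapse into a handful of counts.
summary = Counter(classify(review) for review in reviews)
print(dict(summary))
```

The shape of the transformation is the same as with the notebook: many messy inputs go in, and a small table of counts comes out. Only the classification step would change if the keyword lookup were replaced with a proper model.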