The Transmogrification Of Your Data… Hidden In Plain Sight
If you’ve been following along in the series, you know that we use an (I) to indicate the information phase of The Analytic Process (TAP). You also know that each of our phases has its own letter. You could say we are really into letters. With that in mind, it is useful symbolism to note that in every (I) there is a (T). Wait, there is no T in our model!
All information is transported, transformed, translated, or transmitted. That is a lot of (T). If you are following along, you will also note that this is a lot of processing (P). For our purposes (and mostly just for this article), we will refer to the hidden processing with a (T). All (I) includes a hidden (T). It is hidden because too many people don’t bother to look for it, account for it, or understand it. Ask the data guys…
Your Local Supermarket Has An Organic Aisle… Good Luck At Your Local Data Warehouse
For data to move from point A to point B, it must be extracted and loaded. These are the E & L in ETL. Data need not be transformed, but quite often it is, and in practice some transformation is both necessary and expected. How do you know what kind of transformation your data went through? Too many analysts assume there is nothing to see here. They are oblivious to the (T).
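To make the hidden (T) concrete, here is a minimal, hypothetical extract-transform-load sketch in Python. The source rows, column names, MM/DD/YYYY date format, 10-character text limit, and rounding to cents are all assumptions chosen for illustration, not any particular vendor's pipeline. The point is that the loaded table quietly differs from what was extracted.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source extract: a CSV feed from some upstream system.
SOURCE_CSV = """order_id,order_date,customer_note,amount
1,03/04/2021,Please deliver after 5pm,19.999
2,12/11/2021,,5
"""

def extract(text):
    """E: read the raw rows exactly as the source sent them."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """T: the step nobody mentions. Every rule here is an assumption."""
    out = []
    for row in rows:
        out.append({
            "order_id": int(row["order_id"]),
            # MM/DD/YYYY is assumed; a DD/MM/YYYY source would be silently misread.
            "order_date": datetime.strptime(row["order_date"], "%m/%d/%Y").date().isoformat(),
            # Truncated to fit a hypothetical 10-character column; empty text becomes NULL.
            "customer_note": row["customer_note"][:10] or None,
            # Integers become real numbers, and everything is rounded to cents.
            "amount": round(float(row["amount"]), 2),
        })
    return out

def load(rows, conn):
    """L: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, order_date TEXT, customer_note TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :order_date, :customer_note, :amount)", rows
    )

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
for row in conn.execute("SELECT * FROM orders ORDER BY order_id"):
    print(row)
```

Nothing in the loaded table flags that a note was cut off, that a date format was assumed, that an empty note became NULL, or that 19.999 became 20.0. You only see the (T) by comparing the target against the source.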
But hidden in the (T) is a lot of information, caveats, and complexity:
- Depending on how your data is transmitted (one of the more necessary Ts), it may be incomplete, intermittent, batched, lagged, or (in analytic fantasy land) perfect, real-time, and complete.
- Once it arrives, it may be translated. Codes and scores, conversions and cross-references are never a problem with complete metadata and a strong data dictionar… sorry, I am laughing too hard here.
- Data is often transformed in mostly harmless ways. Well, at least that is what the vendors tell you. Dates are formatted, text fields are truncated, null values are accounted for (in ways that require more accounting later…). Integers become real numbers and vice versa. Fractions are rounded so that Superman III and Office Space have some plot fodder.
- Then there are the more manual components of the data. Data entry is often a front-line effort: think call center reps, temp workers, or work shipped off to data entry farms in low-cost-of-living countries. Every pair of hands adds its own (T).
- Finally, things change. Platform updates, data migrations, and anything with the word “legacy” or “outage” attached are likely to inject a whole new level of “transmogrification” into your data.
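The rounding bullet is easy to wave away, so here is a small sketch of how per-row rounding adds up. It uses Python's standard decimal module; the transaction amounts are made up, and the two rounding rules compared (half-up and plain truncation to the cent) are assumptions, since a real pipeline might use banker's rounding or something else entirely.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

# Hypothetical amounts carrying fractions of a cent,
# e.g. left over from interest or currency conversion upstream.
amounts = [Decimal("10.004"), Decimal("3.3333"), Decimal("7.125"), Decimal("0.0049")]

CENT = Decimal("0.01")

def to_cents(value, rounding):
    """Quantize a value to whole cents using the given rounding rule."""
    return value.quantize(CENT, rounding=rounding)

exact_total = sum(amounts)
half_up_total = sum(to_cents(a, ROUND_HALF_UP) for a in amounts)
truncated_total = sum(to_cents(a, ROUND_DOWN) for a in amounts)

print("exact:    ", exact_total)
print("half-up:  ", half_up_total)
print("truncated:", truncated_total)
# The fractions of a cent that quietly vanish from the truncated totals.
print("lost to truncation:", exact_total - truncated_total)
```

No single row looks wrong. It is only in aggregate that the totals disagree with the source, and by how much depends on which rounding rule the pipeline happened to use.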
As I have said before:
Analyst, Know Thy Data Process
Okay, I did a little translation on that one. Point being: an understanding of all the (T) inherent in your data can be critical to developing any true insight from it. At a bare minimum, you must couch your findings in a realistic sense that much of this is going on. As an example:
Avoid — People are…
Use — The data indicates people may be…
You may defer (D) a forensic trip through your entire data infrastructure, but you will need to determine (D) when chasing down specific lineage is a requirement. Many a major corporation has thrown millions behind some amazing behavior in the data that turned out to be nothing more than poor data processing. A little secret: many never found out. If the analysts recognize the situation too deep into the effort, well… self-preservation is a powerful force for bad behavior.
A wise man once told me (same guy):
Never attribute to nefarious intent what can easily be explained by incompetence.
Avoid Enron, E Corp, and Evil Corp: know your data process. Recognize the hidden (T) in every (I). Don’t assume that other smart people have taken care of this. Those smart people weren’t hired to provide meaningful insight to the company, and until you verify, you don’t really know how smart, competent, or attentive they truly are.
Next up — we explore another hidden or overlooked sub-layer. Stay tuned for TAP #13. And thanks for reading!
Gurupriyan is a software engineer and technology enthusiast who has been working in the field for the last six years. He is currently focusing on mobile app development and IoT.