Today’s article starts with a shout-out to James Miller and his Substack, The Data MBA. If you’re not already subscribed, you should! It covers the business of data, and one of its key themes is that data should be treated as an asset.
James describes this principle far more eloquently than I could, but I have to say it’s not a totally new concept for me. One of my former heads of data also used to talk up this idea. He observed that somewhere in our organisation there would undoubtedly be a facilities manager keeping a very precise catalogue of all the assets they were responsible for. This person would be accountable for knowing exactly how many desks and chairs were on company premises, and where and how they were being used. Yet somehow, preposterously, we were failing to give the same care and attention to our data assets.
Ok, so at the time he was building a business case for investment in an enterprise data catalogue, but still, I felt it was a fair point that always resonated with me. Reading James’ article reminded me of it. But it also got me thinking of other ways that the facilities management metaphor could be extended for data.
Those in the UK may well be familiar with PAT testing. The name is an example of RAS syndrome, in that PAT stands for Portable Appliance Testing. It is normally carried out periodically by facilities management teams, who go around testing appliances to make sure they are safe to use. This might include the office toaster or kettle, as well as IT equipment such as laptops and monitors.
Items which pass the testing are labelled so that folks can be reassured of their safety.
Isn’t that a great idea for data assets? Do some testing and, if the data is “safe” to use, label it so that consumers know it can be trusted.
The obvious tests to apply here would be for Data Quality. The Data Management Association (DAMA) have already defined six data quality dimensions by which we can assess the quality of data.
Those dimensions are:
Accuracy
Completeness
Uniqueness
Consistency
Timeliness
Validity
These dimensions give a good starting point for what to test for, but how should you do that testing? Quite honestly, I’m not sure I’ve arrived at an answer yet.
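To make the PAT analogy a bit more concrete, here is a minimal sketch of what "test, then label" might look like in plain Python. Everything in it is an assumption for illustration: the sample rows, the column names, the helper names (`completeness`, `uniqueness`, `validity`, `timeliness`, `pat_test`), the crude email rule and the 0.95 pass threshold are all hypothetical. It only covers four of the six dimensions, because accuracy and consistency usually need a trusted reference or a second dataset to compare against.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical sample data: a tiny "customers" table as a list of dicts.
ROWS = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "updated": date(2024, 5, 2)},
    {"id": 2, "email": "c@example", "updated": date(2024, 1, 15)},
]


def completeness(rows, column):
    """Share of rows where the column is populated (rows must be non-empty)."""
    return sum(r[column] is not None for r in rows) / len(rows)


def uniqueness(rows, column):
    """Number of distinct values divided by the number of rows."""
    return len({r[column] for r in rows}) / len(rows)


def validity(rows, column, is_valid):
    """Share of populated values that pass the supplied rule."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(is_valid(v) for v in values) / len(values) if values else 1.0


def timeliness(rows, column, as_of, max_age):
    """Share of rows updated within max_age of the as_of date."""
    return sum(as_of - r[column] <= max_age for r in rows) / len(rows)


def looks_like_email(value):
    """Deliberately crude rule, assumed for illustration only."""
    return "@" in value and "." in value.split("@")[-1]


@dataclass
class Label:
    """The data equivalent of the PAT sticker."""
    dataset: str
    tested_on: date
    scores: dict
    passed: bool


def pat_test(name, rows, as_of, threshold=0.95):
    """Score each dimension and pass only if every score meets the threshold."""
    scores = {
        "completeness": completeness(rows, "email"),
        "uniqueness": uniqueness(rows, "id"),
        "validity": validity(rows, "email", looks_like_email),
        "timeliness": timeliness(rows, "updated", as_of, timedelta(days=30)),
    }
    passed = all(score >= threshold for score in scores.values())
    return Label(name, as_of, scores, passed)


label = pat_test("customers", ROWS, as_of=date(2024, 5, 3))
print(label)
```

With the sample rows above, the label comes back as a fail: there is a missing email, a duplicated id, a malformed address and a stale record. The interesting design question is not the checks themselves but who decides the threshold, and where the label is shown to consumers.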
It feels as though this task most often falls to data teams, and we end up applying testing to the data in our data platforms.
I’d love to see the trend of “shift left” happening more in data quality land and see this testing happen further upstream in data sources before they get consumed.
And I really want to see scalable and automated ways we can do this as well. Products like dbt have the ability to apply data quality tests, but you still have to make a lot of decisions about what tests to apply and how to apply them. The same can be said for Python packages such as Great Expectations.
I’m curious to dive into the open source data quality tooling provided by DataKitchen. I’ve been a long-time fan of the work DataKitchen have done around DataOps principles and from what I’ve seen of this tool, it looks promising… alas, I haven’t had the capacity to explore it in depth. Perhaps watch this space for a more in-depth perspective.
In the meantime though, maybe we data professionals should pay more attention to our facilities management colleagues. We often look to software engineering disciplines to inspire our ways of working, but perhaps we’re missing a trick by not casting a wider net.
One last shout out… in between me drafting this post and it being published, another article on a very similar topic also popped up from the folks at Anmut. This is worthy of your attention too: https://www.anmut.co.uk/comparing-physical-assets-to-data-assets/
Love the article, not just for the mention (too kind), the general idea here of data assurance is really interesting - something that I've thought about too from the perspective of ensuring data assets remain 'fit' to create whatever value they are on the hook for.
I believe the NHS is in the process of selling off NHS data (anonymised) so definitely an asset!