Storage silos and the information duplication dilemma

About a year ago I started a personal project to ‘find a better way to store all digital information’. I know this doesn’t say much yet, but in short I think that today’s storage methodologies are based on old principles that have lost some of their original purpose as the digital landscape changed.

As we all know, there is a strong move toward cloud storage solutions and the use of specialized information networks. The downside is that even with these changes, the methods we use to access and relate information are still mostly based on the fundamental principles we started with years ago.

Don’t get me wrong. I understand why this has happened. Concepts that work have a tendency to get used over and over again. There is just no need to fix something that isn’t broken. And to some extent this is true for the way we now store and work with our information. We have managed to use it to create a worldwide web of information, and we are able to create generic digital products and custom applications for a huge range of usage scenarios. And all of that without any big problems when it comes to storing and retrieving information, right?

Well, that’s the one aspect I seriously doubt. In my opinion our storage systems are at the very least ‘sub-optimal’ when it comes to both maintenance and usage. During usage they require huge amounts of resources: think about the processing power, backups, replication and load balancing needed to keep all of our networked systems running. But that’s not all. During development, something I like to call “data model replication” happens in even the smallest applications. We are bound to duplication whenever we want to relate information in system A to information that originated in system B. In other words: we need to copy the information from system B to system A in order to use it there.

OK, I know that there are some innovative approaches that solve part of this problem. One is the use of ‘triple stores’ to uniquely identify different data ‘entities’ and allow information to be related in new, clever ways. These changes are exciting and will surely lead toward a whole new landscape filled with ‘Big Data’ powered applications.

The problem I have is that I do not see which of those innovations can cope with the ever-increasing amount of personal digital data. There is no way, other than duplication, to relate an email to a holiday photo on disk. The only thing we can do is attach a copy of the photo to the email and send it to someone else. At minimum this leads to 3 copies of the photo floating through our systems (for example the original on disk, the attachment in the sent mail, and the copy in the recipient’s inbox). The same goes for most other systems: if we want to relate or share data, we need to copy it.

Next to that, it is still good practice to make a copy of a document before changing it, in case you want to revert those changes later. Everyone who has ever written an important document in a (text) editor knows that it is wise to work with multiple versions and to keep some kind of backup in case of disk or system failures. It is an enormous frustration to lose days of work. In my opinion these manual ‘guidelines’ are a negative, maybe even hidden, burden of our dependency on IT appliances in today’s society.
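
To make the difference between copying and relating a bit more concrete, here is a minimal Python sketch of the triple idea, applied to the email and holiday photo example. It is only an illustration under stated assumptions: it is not a real triple store product and not the storage concept this post hints at. The class `TripleStore`, the identifiers such as `photo:holiday-beach` and the predicate names `references` and `contains` are all made up for this example.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) triples.

    Hypothetical illustration only; real triple stores offer far more
    (persistence, query languages, globally unique identifiers).
    """

    def __init__(self):
        self._triples = set()
        self._by_subject = defaultdict(set)
        self._by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        """Record that `subject` is related to `obj` via `predicate`."""
        triple = (subject, predicate, obj)
        self._triples.add(triple)
        self._by_subject[subject].add(triple)
        self._by_object[obj].add(triple)

    def objects(self, subject, predicate):
        """Everything `subject` points at through `predicate`."""
        return sorted(o for s, p, o in self._by_subject[subject] if p == predicate)

    def subjects(self, predicate, obj):
        """Everything that points at `obj` through `predicate`."""
        return sorted(s for s, p, o in self._by_object[obj] if p == predicate)


# The photo exists once and is identified by one (made-up) identifier.
PHOTO = "photo:holiday-beach"

store = TripleStore()
# Two emails and an album refer to the photo instead of carrying a copy.
store.add("email:to-anna", "references", PHOTO)
store.add("email:to-bob", "references", PHOTO)
store.add("album:family", "contains", PHOTO)

# Which emails mention this photo?
emails = store.subjects("references", PHOTO)
# What is in the family album?
album_items = store.objects("album:family", "contains")
```

In this sketch the photo is never duplicated: the emails and the album only hold a reference to its identifier, and the relation can be followed in both directions. Whether something like this can work across separate silos (a mail server, a disk, a social network) is exactly the open question.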

Another simple example is our private photo album. I guess most people have some kind of ‘collection’ in which they store all the photos they gather over time. It doesn’t actually matter if this collection is stored on our PC or laptop (on disk), somewhere in the cloud, or spread out over multiple social network silos (like Facebook, Twitter or Instagram). We want to maintain some level of personal control over the collection. The need can arise to structure, categorize or clean up photos when the collection becomes too large. Trying to maintain years and years of family photos is, for example, next to impossible without a decent hierarchical structure. I, for one, am quite curious how people who rely on a social network like Facebook will keep their photos manageable in the years to come. Maybe there will come a moment when some people decide to copy all of their Facebook photos into their local (on-disk or cloud) collection. The question is whether Facebook makes such an action easy.

Last but not least are all those funny videos we all receive. Let’s say you want to share a video you received two months ago with a friend because it somehow relates to this person. Do you still know on which social network you got the tip to check out the video? Not me. I’ve got such items streaming in over Facebook, Twitter, Google+ and private and public email inboxes. And if those funny videos weren’t enough: the need to know where information resides is even bigger in the business world. The firm I work for has introduced its very own social network. This opens up yet another channel over which I receive questions, answers, hints, tips and references. All of those need to be used at some time or another, so I need to remember where to find them. Quite a big load on my mental capacity, if you ask me. And absolutely not how I’d imagine technology working for me.

All in all I think that we should not have to rely on a multitude of information silos and ‘data duplication’ powered information management actions for our everyday life and work. With each ‘pool of data’ we connect to and every ‘data management action’ we apply, the amount of mental referencing capacity we need increases.

This is a bad development, and it will eventually cause stress.

In short, this covers a few of the reasons to migrate away from the current storage paradigms and towards something better. I’m still unable to adequately describe the concept I am working on, but so far the best fitting description would be a ‘unifying graph-based storage method’ without the functional limitations we all take for granted when working with everyday information.

I’ll try to write some technical/philosophical blog posts about some of the weird findings I ran into during this adventurous endeavor. Most probably this will result in a huge pile of mini mind dumps and scribbles of related ideas, theories, concepts and results.

Just my two pennies…

