Purposeful Metadata

When we do taxonomy projects we almost always come to a point where we are having a conversation with our clients about metadata. In fact, we prefer to absorb metadata design into a taxonomy project, simply because it gives us so much more influence over the performance of our taxonomy – metadata design heavily influences the degree to which any given system can exploit a taxonomy in practical ways.

Often, however, the metadata design is already embedded in the technical systems part of a project, and we come to the metadata table late in the day. I find myself all too often meeting the “just in case” school of metadata design, frequently from within IT systems design, where more functionality is better and more metadata means more future capabilities: “If we have the metadata, we can manipulate the content in xyz ways.” More is better. More metadata insures you against future regret (“Oh my god, I wish we had put that metadata field in back when we were designing the system”).

Now, if the collection of this metadata can be automated, you at least avoid imposing friction on contributors, who would otherwise have to supply metadata they see as stupid, needless and never ever going to benefit them. But a lot of metadata collection cannot be automated, or if it can, accuracy will be compromised, because automated systems are not very good at predicting the exceptions and special circumstances that we humans are so good at generating.

So there’s a cost to more metadata, in terms of either accuracy or user friction (with too much metadata friction, content goes elsewhere: users simply route their activity around, rather than through, the system that is supposed to be helping them). And we often behave as though metadata decisions are simply decisions about costs (friction) weighed against benefits (“think of all the different things we can do with it”).

I’m coming to believe that this cost-benefit discussion is simply wrongly framed. I’d like to use three real-world (i.e. non-digital) stories about metadata to argue that having a clear purpose for metadata will guarantee both its usefulness and its use, and that the hypothetical cost-benefit balances then simply disappear. Technically, all three stories are really about data, i.e. structured information about things rather than information content. But if we call it metadata for the sake of argument, I think these three stories throw into relief three major, and neglected, productive purposes for metadata: purposes that could make its necessity self-evident to content contributors, and keep the metadata focused on productivity, performance and outcomes.


The first story is about the dabbawallas of Mumbai. A dabbawalla is someone who comes to your home at a set time every working day, collects your freshly cooked lunch in your own personal lunchbox from your loving wife, and consistently delivers it to your office within minutes of your regular lunch time. He performs this service across a sprawling city of 25 million people, in unpredictable traffic where commutes are long and arduous, and he and his colleagues handle tens of thousands of these lunches in the space of just a couple of hours each day. After lunch he comes back, and the lunchbox travels back across the city to your wife through an intricately coordinated transfer system.

The key to the success of this system in a city as complex as Mumbai is metadata. Your lunchbox carries a set of unique markers that tell the different dabbawallas in the transfer chain (many of them illiterate) which part of the city it comes from, and exactly where the meal should be delivered: on which floor, at which office, at what time. The metadata helps the lunchbox get to where it belongs. And back again.

Now we often talk about metadata as enhancing findability, and this is the most frequent purpose to which we put taxonomy-based metadata. But we are terribly vague when we code subject terms from a generalized vocabulary for generalized, “just in case” use. When I choose a subject category for my document, I am presented with an entire taxonomy to choose from, when only a fraction of it is relevant to the work my department does. We need to get smarter at knowing who the productive consumers of different types of content might be, and at making sure that metadata collection for content is customized to those target communities.

We should not be imposing standard metadata frameworks that apply to all types of content and offer every possible tag option, or asking people to choose from all the terms in a taxonomy and think afresh each time about every possible context of use for their contribution.

We should already know who the likely audiences are and, at the point of contribution in a particular department, provide only the metadata options that are relevant to the purpose: getting the content to its targeted consumers. And we already have a mechanism for finding out the potential audiences for any given content. It’s called a knowledge audit.
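To make the idea a little more concrete, here is a minimal Python sketch of contribution-point filtering. Everything in it is hypothetical: the taxonomy terms, the department names and the audience mapping stand in for whatever a real knowledge audit would turn up. Each taxonomy term is tagged with the audiences it serves, and a contributor sees only the terms whose audiences overlap with the ones their department actually produces content for.

```python
# Hypothetical enterprise taxonomy: each term -> the audiences (consumer groups) it serves.
TAXONOMY: dict[str, set[str]] = {
    "Procurement policy": {"finance", "operations"},
    "Vendor contracts": {"finance", "legal"},
    "Safety incidents": {"operations"},
    "Regulatory filings": {"legal"},
    "Staff onboarding": {"hr"},
}

# Hypothetical knowledge-audit output: the audiences each contributing
# department typically produces content for.
AUDIT_AUDIENCES: dict[str, set[str]] = {
    "finance": {"finance", "legal"},
    "operations": {"operations"},
}


def tag_options(department: str) -> list[str]:
    """Return only the taxonomy terms relevant to this department's known audiences."""
    audiences = AUDIT_AUDIENCES.get(department, set())
    # Keep a term if it serves at least one of the department's audiences.
    return sorted(term for term, served in TAXONOMY.items() if served & audiences)


print(tag_options("finance"))
# ['Procurement policy', 'Regulatory filings', 'Vendor contracts']
```

Returning an empty list for a department the audit hasn't covered is just one possible choice; a real system might instead fall back to the full taxonomy or to a search box.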

This is not to say we shouldn’t also have mechanisms to support serendipitous discovery as well as targeted use, but imposing a high burden of metadata selection on content creators is not the way to support smart serendipity. I’ve written here and here and in my book about smarter mechanisms for doing this.


My second story comes from Atul Gawande’s account of how the US military has, over the last ten years, brought mortality rates from battlefield wounds down from 24% to just 10%. One critical factor, it turns out, is the speed with which an injury can be treated: war wounds involve substantial trauma, and speed of medical aid is critical.

So the military broke medical treatment down into three levels of provision: forward surgical teams (FSTs), carrying only the bare essentials in equipment and supplies, close to the scene of an engagement where intervention can be swift (but not complete); medium-scale combat support hospitals (CSHs) behind the lines, where follow-up can be carried out; and fully equipped specialist hospitals, where extensive treatment, recovery and rehabilitation take place.

Treatment is brought closer to the battlefield, but completeness of treatment is not. The forward units are all about throughput. You don’t fix patients here; you merely stabilize them sufficiently to get them up to the next level, where they can be treated properly. Patients whose injuries call for substantial hospital stays are moved up the chain to the specialist military hospitals in Europe or the US.

As Gawande notes, this involved a substantial change to the surgeon’s mindset, whose instinct is to treat a patient as fully as possible and to trust his or her own judgment more than anyone else’s. This three-tier system has distributed and fragmented treatment, and the thing that holds it together in an environment of great injury, distress and urgency is metadata.

Gawande describes the case of an airman wounded in a mortar attack near a town called Balad in Iraq:

“Bleeding was controlled, resuscitation with intravenous fluids and blood begun, a guillotine amputation at the thigh performed. He received exploratory abdominal surgery and because a ruptured colon was found, a colostomy. His abdomen was left open, with a clear plastic covering sewn on. A note was taped to him explaining exactly what the surgeons had done. He was then taken to Landstuhl [Germany] by an air force critical care transport team.” (Better, p60-1)

The metadata, of course, is the note taped to the injured man. It captures the context of his treatment so far, and enables the next tier of surgeons to pick up where the last one left off.

We also often talk about the administrative function of metadata – metadata which tells the system when and how to perform management actions on content – when it expires or should be archived, when and how workflow actions should be implemented.

We don’t talk enough about metadata as a tool for distributed, collaborative tasks around documents, such as policy development, research and case management: that is, context-setting metadata that captures stage-specific information and opinions so that handoffs run more smoothly.
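The taped note translates fairly directly into a data structure. The following Python sketch is purely illustrative; the class and field names are my own assumptions, not any particular system’s schema. Each stage of a distributed task attaches a short record of what it did and what remains open, and whoever picks the document up next reads the most recent one.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class HandoffNote:
    """Context captured at one stage of a distributed task (the 'taped note')."""
    stage: str   # illustrative stage name, e.g. "drafting" or "legal review"
    author: str
    done: str    # what this stage actually did
    open_issues: list[str] = field(default_factory=list)  # what the next stage must pick up
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class Document:
    title: str
    handoffs: list[HandoffNote] = field(default_factory=list)

    def hand_off(self, note: HandoffNote) -> None:
        """Attach stage-specific context before passing the document on."""
        self.handoffs.append(note)

    def pending(self) -> list[str]:
        """Open issues flagged by the most recent stage, for whoever picks it up next."""
        return list(self.handoffs[-1].open_issues) if self.handoffs else []


doc = Document("Remote-working policy")
doc.hand_off(HandoffNote(stage="drafting", author="Policy team",
                         done="Drafted sections 1-4",
                         open_issues=["Check overtime rules with legal"]))
print(doc.pending())  # ['Check overtime rules with legal']
```

Keeping the whole trail of notes, rather than overwriting a single status field, mirrors the point of the surgeons’ note: each tier sees what the previous one did, not just where the case currently sits.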


The third story also comes from Gawande’s discussion of diligence among army doctors, and it too is linked to the dramatic increase in survival rates. This improved success in saving lives has come in spite of less skilled manpower and an increasing variety (lack of typicality) of injuries, as warfare methods have become more provisional and unpredictable.

The key to an effective and flexible response is the ability to learn, and this depends on a willingness to create metadata, and to analyse the patterns in it.

“… the medical teams took the time, despite the chaos and their fatigue, to fill out their logs describing the injuries and their outcomes… they input more than seventy-five different pieces of information on every casualty – all so they could later analyze the patterns in what had happened to the soldiers and how effective the treatments had been.” (Better, p64)

In this way, medical teams found that blast injuries from improvised explosive devices differed significantly from those caused by traditional munitions, and were consequently able to update first aid kits to deal with them better; they saw that eye injuries were increasing, and found that soldiers weren’t wearing their protective eyewear because it was “uncool”, so the army had the eyewear redesigned and injuries dropped again; and the list goes on. Careful completion of the metadata, and detailed attention to the patterns in it, helped the medical teams (and the army) learn how to prevent and respond to injuries and improve their performance.

Commercial sites such as Amazon collect metadata about their customers’ behaviour and use the patterns in that transactional metadata to learn how to be more responsive and proactive with customers. In the enterprise, though, we still treat this as a luxury, and too rarely collect and analyse metadata about how our content is used, how useful it is, and how people behave around it and apply it. “It’s too complicated,” to quote a previous Green Chameleon blog post.
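By way of illustration, here is a small Python sketch of the kind of use-and-usefulness analysis the enterprise rarely attempts. It assumes, hypothetically, that each time someone uses a document the system logs the consuming department, the action taken, and a one-click “was this useful?” answer; the aggregation then shows which communities are actually getting value from the content.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One hypothetical usage record for a piece of content."""
    doc_id: str
    user_dept: str
    action: str         # e.g. "viewed", "downloaded", "reused"
    found_useful: bool  # assumed to come from a one-click rating prompt


def usefulness_by_dept(events: list[UsageEvent]) -> dict[str, float]:
    """Share of usage events, per consuming department, that were marked useful."""
    totals: Counter = Counter()
    useful: Counter = Counter()
    for e in events:
        totals[e.user_dept] += 1
        if e.found_useful:
            useful[e.user_dept] += 1
    return {dept: useful[dept] / n for dept, n in totals.items()}


events = [
    UsageEvent("d1", "legal", "viewed", True),
    UsageEvent("d1", "legal", "reused", True),
    UsageEvent("d1", "finance", "viewed", False),
    UsageEvent("d2", "finance", "downloaded", True),
]
print(usefulness_by_dept(events))  # {'legal': 1.0, 'finance': 0.5}
```

The same log could just as easily be sliced by document or by action; the point is only that the pattern analysis becomes possible once the usage metadata is captured at all.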

Gawande’s war stories suggest that a sharp focus on performance, improvement and learning when we look at metadata needs can pay off. Metadata, at last, might offer some legitimate reward to its contributors and users, if its purpose and its link to effectiveness and outcomes are explicit and transparent.

4 Comments so far

Clive Flashman

Just a quick note for anyone interested in hearing Atul Gawande deliver a lecture in London on Monday June 4th - for free.

He is talking at Imperial College at 6pm. There are still lots of tickets I believe. Write to ‘events@imperial.ac.uk’ by Monday morning to register. The lecture is being given in G16, Sir Alexander Fleming Building, South Kensington Campus, London.

I’m going - hope to see you all there!

Posted on June 01, 2007 at 11:56 PM

Patrick Lambe

Thanks for this Clive, I wish I could make it! Ask him whether he plans to come to Singapore :)

Posted on June 04, 2007 at 10:16 AM

Clive Flashman

I can now do even better - the video recording of the Atul Gawande lecture this week is now available at the following URL for your viewing pleasure!

Posted on June 06, 2007 at 07:35 PM

Patrick Lambe

Clive, I am forever indebted - must buy you dinner sometime!

Posted on June 07, 2007 at 11:03 AM