Defining “Taxonomy”

Yesterday I made the claim that a taxonomy cannot be defined by its shape, even though that is mostly how it does get defined, e.g. “A taxonomy is a hierarchical arrangement of terms blah blah blah...”. I argued that taxonomies should be defined more by their purpose and use, and less by the structural form they happen to take (which can vary according to circumstance).

What would a more useful definition be? To start with, we need to go back beyond Linnaeus and the rather narrow sense of “taxonomy” developed by biologists. Let’s go back to the Greek roots and see what they deliver.

The word taxonomy itself derives from two Greek stems: taxis and nomos. Liddell and Scott’s Greek-English Lexicon describes the meaning of nomos as: “anything assigned, usage or custom, law or ordinance”.

Taxis, broadly, means the arrangement or ordering of things, but it is used in ancient Greek quite flexibly to encompass the disposition of soldiers in military formation, a battle array, a body of soldiers, the arrangement, order or disposition of objects, order or regularity in general, ordinances, prescriptions or recipes, assessment of tributes or assigned rations (whence comes taxation), political order or constitution, rank, position or station in society, an order or class of men, lists, registers, accounts, payments, and land types, a treatise, a fixed point of time, or a term of office.

So the term taxonomy means in general the rules or conventions of order or arrangement, and the variety of usage we’ve just seen reflects the extent to which taxonomies can enter daily life, from classes of people to the disposition of things, ideas, times and places.

This somewhat loose description will form our background definition, instead of the much narrower sense of taxonomy as it has evolved in the biological sciences. When we come to knowledge management applications, however, we need more specific guidance on what to look for in a good taxonomy.

There are three basic characteristics of a taxonomy for knowledge management, and to be any good at its job, it needs to fulfil all three functions:

1. A taxonomy is a form of classification scheme

Classification schemes are designed to group related things together, so that if you find one thing within a category, it is easy to find other related things in that category. Notice that I am deliberately not using the term “similar” things.

Many classification schemes are indeed based on similarity of attributes, but we organise things in our world on the basis of many kinds of relationship, not just similarity. Examples might be functional proximity (things we do around the same time such as when we go shopping), causal relationships, the relationships embedded in organisational structures, and so on. We are deliberately keeping our definitions broad and flexible, so as not to get trapped into a narrow and unnecessarily limited set of applications for taxonomies.

We use classification in every aspect of our lives. When we go to the supermarket for oranges, we know we are on the right track when we can see vegetables. When our email inbox starts to get overwhelmed with emails, we create named subfolders and sort our emails into them for ease of retrieval later.

Classification schemes can be very informal and ad hoc, such as when we organise our bathroom cabinet, or line up our music CDs by genre. They can also be highly formal and standardised. We might be familiar with some of the better known formal classification schemes, for example the Dewey Decimal Classification (DDC) in libraries, or the North American Industry Classification System (NAICS) used in procurement functions or in official statistics.

2. Taxonomies are semantic

Taxonomies in knowledge management are a little different from formal published classification schemes. In libraries, classifications serve to summarise the subject matter of books and articles in an abbreviated code, which is usually also used as a means of locating the physical item in a fixed sequence on the shelves. Shared codes bring related books together physically. Classifications such as NAICS also focus on the use of shorthand codes as a standardised means of enabling information transfer and data manipulation. These codes can be used in a very wide variety of ways: for example in statistical returns, in company registrations, or in the provision of procurement services in electronic marketplaces.

Taxonomies in knowledge management do not usually rely on codes. They are primarily semantic. That is, they provide a fixed vocabulary to describe an organisation’s knowledge and information assets, and this vocabulary needs to be meaningful and transparent to ordinary users. When content is labelled “Project kickoff”, everybody should know what kinds of documents they can expect to find within that category.

In the language of librarianship and information science, a taxonomy also therefore provides a controlled vocabulary. It is controlled in the sense that the meaning of each label is carefully considered, and ambiguous, alternate or less precise terms are excluded. A new term is admitted to the taxonomy only when it clearly describes a commonly understood category of content, for which there is currently no term.

This usually means that changes to the taxonomy are managed carefully. Changes are not random, and ordinary users cannot change them. This is very different from folksonomies, which are completely user defined.

A taxonomy is also semantic in the sense that it expresses the relationships between terms in the taxonomy. In a taxonomy of driving, CAR : STEERING WHEEL would imply the relationship “is a part of” between STEERING WHEEL and CAR. In the folder structure PROJECT DOCUMENTS : PROJECT KICKOFF we immediately recognise that we will find other types of project documents adjacent to the PROJECT KICKOFF folder, and we expect that they will be linked to the sequence of stages in a project.

If you take all of the labels in a taxonomy, and put them into alphabetical order, you have your controlled vocabulary – a kind of dictionary of your taxonomy.

If you then take each term in your controlled vocabulary and describe its relationships with other terms in the taxonomy, you get a thesaurus. In other words, a thesaurus is simply your taxonomy in dictionary format. A taxonomy on the other hand is a thesaurus with all the labels organised by subject. A taxonomy visually represents or maps the subject as a whole, while a thesaurus explains the topics one by one.
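The step from taxonomy to controlled vocabulary to thesaurus can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the category labels other than PROJECT DOCUMENTS and PROJECT KICKOFF are invented, and the nested-dictionary layout and the `build_thesaurus` helper are hypothetical, not a standard format.

```python
# Hypothetical taxonomy: each label maps to its sub-categories.
# Only PROJECT DOCUMENTS and PROJECT KICKOFF come from the post;
# the other labels are made up for illustration.
TAXONOMY = {
    "PROJECT DOCUMENTS": {
        "PROJECT KICKOFF": {},
        "PROJECT PLANNING": {},
        "PROJECT CLOSURE": {},
    },
    "FINANCE": {
        "INVOICES": {},
    },
}


def build_thesaurus(tree, parent=None, entries=None):
    """Walk the tree and record each term's broader and narrower terms."""
    if entries is None:
        entries = {}
    for label, children in tree.items():
        entries[label] = {"broader": parent, "narrower": sorted(children)}
        build_thesaurus(children, parent=label, entries=entries)
    return entries


# The thesaurus explains the terms one by one...
thesaurus = build_thesaurus(TAXONOMY)

# ...and the controlled vocabulary is just the labels in alphabetical order.
controlled_vocabulary = sorted(thesaurus)

for term in controlled_vocabulary:
    print(term, thesaurus[term])
```

The same labels appear in all three structures. What changes is whether they are arranged by subject (the taxonomy), alphabetically (the vocabulary), or term by term with their relationships spelled out (the thesaurus).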

A good thesaurus goes beyond your base taxonomy, however. It will also include all the other alternative words we use in common language for your “controlled” terms (“motorcars”, for example, as an alternative term for “CAR”) and point them to the authorised controlled term. These terms will not usually appear in your taxonomy, although they are sometimes included in “scope notes”, the explanatory notes describing the meaning of each category label. A good thesaurus will also highlight any other relevant relationships.

An immediate and obvious use for a thesaurus is in a search engine. A good thesaurus can ensure that a whole range of commonly used synonyms that are not part of your controlled vocabulary can still retrieve relevant taxonomy categories. If a synonym means the same thing as a term in your taxonomy and has been associated with that term in the thesaurus, your non-taxonomy keyword will still be recognised and retrieve valid results.
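As a rough sketch of how a search box might use those synonym entries, here is a small Python lookup. The `USE_FOR` table, the `CONTROLLED_TERMS` set and the `resolve` function are all hypothetical names of my own, and the synonyms other than “motorcars” are invented for the example.

```python
# Hypothetical thesaurus entries: non-preferred term -> authorised term.
USE_FOR = {
    "motorcar": "CAR",
    "motorcars": "CAR",
    "automobile": "CAR",
    "kick-off meeting": "PROJECT KICKOFF",
}

# Hypothetical controlled vocabulary (the labels that appear in the taxonomy).
CONTROLLED_TERMS = {"CAR", "PROJECT KICKOFF"}


def resolve(query):
    """Return the controlled term for a query, or None if unrecognised."""
    key = query.strip().lower()
    if key.upper() in CONTROLLED_TERMS:
        # The user already typed an authorised term.
        return key.upper()
    # Otherwise fall back to the synonym table.
    return USE_FOR.get(key)


print(resolve("Motorcars"))
print(resolve("bicycle"))
```

A query that matches neither the controlled vocabulary nor a recorded synonym comes back as `None`, which is a useful signal that the thesaurus may need a new entry.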

A thesaurus can also help you deal with nasty homonyms – when the same word can mean different things. Let’s say I’m searching for cheap flights to Ireland, and I type “Dublin” into the box for destination airport. The page reloads with the question, “Do you mean Dublin Ohio or Dublin Ireland?” The site has a thesaurus, and it is using that thesaurus to disambiguate two homonymous terms.
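I don't know how any particular travel site does this, but the idea can be sketched as follows, assuming the thesaurus stores each ambiguous word alongside its qualified meanings. The `HOMONYMS` table, the “Paris” entry and the `disambiguate` function are hypothetical.

```python
# Hypothetical thesaurus entries: one surface word, one or more qualified terms.
HOMONYMS = {
    "dublin": ["Dublin (Ireland)", "Dublin (Ohio)"],
    "paris": ["Paris (France)"],
}


def disambiguate(query):
    """Return (matches, prompt); prompt is set only when the user must choose."""
    matches = HOMONYMS.get(query.strip().lower(), [])
    if len(matches) > 1:
        return matches, "Do you mean " + " or ".join(matches) + "?"
    return matches, None


matches, prompt = disambiguate("Dublin")
print(prompt)
```

When a word has only one recorded meaning, no question is needed and the search can go straight to that term.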

3. A taxonomy is a kind of knowledge map

The great military writer von Clausewitz speaks of the importance of the “coup d’oeil” for the experienced general. By this he means that with one “cast of the eye” over the military situation, the general can immediately grasp its implications and start to anticipate appropriate courses of action. A good taxonomy should enable the same feat in regard to a knowledge domain for any of its users. With one “coup d’oeil” any user of the taxonomy should immediately have a grasp of the overall structure of the knowledge domain covered by the taxonomy, and be able to accurately anticipate what resources she might find where. The taxonomy should be comprehensive, predictable and easy to navigate.

Many of the taxonomies you will see look like hierarchies or tree structures, like the folder structure in your computer, or the site map in your intranet. The structure of the tree visually represents the nature of the relationships between the categories and sub-categories, and this makes the map predictable, enabling navigation between categories. But as we’ve just seen, a taxonomy doesn’t always have to be a tree or a hierarchy to perform its three key functions.

The question I asked yesterday was “what shape is a taxonomy?” It doesn’t really matter, so long as it does its job of work for you. It should help you find things based on relationships, using vocabulary that expresses concepts and relationships meaningfully to users, and it should help you navigate a knowledge domain in an easy and educational manner.

3 Comments so far

Cool post! To me your rendering and description is of an ontology, rather than a taxonomy. Maybe the distinction is specious, but I see taxonomies as “2-dimensional” and hierarchical, and ontologies breaking the mold and going 3-D, growing in any/every direction.

Posted on April 29, 2006 at 03:51 AM

Patrick

Thanks, Susan. An ontology to me is a much more complex rendering of the knowledge domain, and does not fulfil the third criterion of providing easy to understand navigation of the domain. Although I don’t agree with his limitation to taxonomies having hierarchical form, James Melzer’s comment on the distinction between taxonomy and ontology on the Taxocop discussion group is spot on.

Posted on May 02, 2006 at 10:09 AM

Stan Garfield

Hi, Patrick.

I linked to this entry from http://h20325.www2.hp.com/blogs/garfield/archive/2007/04/11/3077.html

Regards,
Stan

Posted on April 11, 2007 at 10:49 PM
