Tuesday, June 5, 2012

Tags, Tags, and More Tags

In the Didget Realm, every single Didget can have lots of tags attached to it. Tags are similar to extended attributes that have been added to some file systems. It is extra metadata that exists outside of an object's regular metadata and separate from its data stream. While tagging data is nothing new, the approach we take to implement them with Didgets is very different than other previous solutions like extended attributes or database tags.

Extended Attributes

File system extended attributes are simple Key:Value pairs. The key is a simple string without any specific context involved. Just like file names, a file system will not attempt to interpret the meaning of a given key, it is just a simple lookup with no relationship between any two given keys. Likewise, a filesystem will not attempt to impose any restrictions of the value assigned to any given key other than making sure its length does not exceed any imposed limit.

File systems were not designed to allow fast, efficient searches for files based on the existence of extended attributes or based on any particular value assigned. For example, if an application wanted to find all the documents within a given file system volume that had the extended attribute "Author=John" attached to it, it would need to do a brute force search by finding every file with a document extension and examining each one individually to see if it had that particular extended attribute key and value. For a volume with a million or more files in it, such a search can be painfully slow.

Since many file systems do not support extended attributes and using them can be difficult, they are rarely used by applications. If a file with extended attributes is moved or copied to another file system, it is likely that the extended attributes will either be lost or altered in some way.

Database Tags

Some applications allow the user to tag data by storing information inside of a database managed exclusively by that application. Popular data management software like iTunes and Picasa use this technique to tag music and photos. These databases are not meant to be shared openly between applications and if a photo or music file is copied from one volume to another, the tags don't come with it. A user is only able to search based on the tags if the specific application supports it.

Didget Tags

Unlike these other approaches, our tags are designed to be widely used, shared, and searchable. Any application can use our simple API to get a list of tag definitions and attach tag values based on those definitions to any Didget. Any application can then find Didgets based on tags or add their own tags to make a Didget easier to find or manage.

Every tag within a Didget Chamber is defined using a simple schema. Once a tag is defined, any application can use any defined tag to attach a value to a Didget. If an application wants to use a tag that is not currently defined, it can quickly define a new tag which adds its definition to the schema. Applications can search for Didgets that have a certain defined tag attached to it or more specifically, have a certain value assigned.

For example, an application can define a new tag ".person.Nickname" and then attach that tag with the value of "Bubba" to a photograph Didget. Another application can later query the Didget Manager for a list of all Photograph or Document Didgets that have ".person.Nickname = Bubba" attached. The Didget Manager would be able to process that query in just a few seconds even if there were 2 million Photograph Didgets and 3 million Document Didgets mixed in with 5 million other kinds of Didgets and all of the Didgets had some tags attached to them.

Likewise, applications could search for all Didgets that had any tag of category ".person". It could find a list of all Music Didgets where ".person.musician=Billy Joel" and ".date.year=1980". The Didget Manager is able to perform these lightning fast queries without needing a separate database or implementing a complicated query language.

Unlike file extended attributes, tags are not lost when a Didget is copied or moved to another Chamber. This is because all Chambers support the tags and because applications do not perform the actual copy operation. An application will initiate the copy operation by telling the Didget Manager to copy a Didget, but it is the Didget Manager itself that makes sure nothing is lost during the copy. You never have to worry that your tags will be lost because the application forgot to copy them.

Tags are powerful tools to help users and applications to add meaningful metadata to any or all of the Didgets within a Chamber to enable fast searches based on specific values and build lists or menus from the results.

No comments:

Post a Comment