Wednesday, May 30, 2012

Tracking Changes With Event Counters and Marker Didgets

File systems use date and time stamps to keep track of when files are created, last modified, and last accessed. As stated earlier, these values are stored within each file's metadata record and reflect the value of the system clock running on the host system. Any application can later change these values to anything it wants using the file API.

As computers and storage devices became faster and faster, many different files could be created and modified within a single second. This can make it difficult to determine in what order various operations occurred. The solution for file system designers was to increase the granularity of the values stored in the date and time stamps from milliseconds to microseconds and finally to nanoseconds. If the system clock were always guaranteed to be accurate, and no application could set these values to an arbitrary number, then this approach would allow for an accurate accounting of the order of events within a file system. Unfortunately, that is not the case.

The Didget Management System takes an entirely different approach. Each chamber has its own "Event Counter". This event counter is a 48 bit number maintained exclusively by the Didget Manager; it starts at 1 and is incremented each time some event within the chamber occurs. If the current event counter has a value of 100 and ten new Didgets are created, then the first Didget created gets the value of 101 in its "Create" field. The second Didget created gets the value of 102 in its field. This continues until the tenth Didget created gets the value of 110. There is no API that allows any application or operating system to directly change the value of any of the event counter fields stored in the Didget records or mess with the event counter itself.
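
To make the mechanism concrete, here is a rough C++ sketch of how a per-chamber counter might be applied at creation time. The class and field names are illustrative only, not the actual Didget Manager interface (which keeps the counter entirely internal):

```cpp
#include <cstdint>
#include <vector>

// Illustrative sketch of a chamber-wide event counter. The real counter is
// 48 bits and cannot be set or rewound through any public API.
struct DidgetRecord {
    uint64_t id;
    uint64_t createEvent;   // value of the chamber counter at creation
    uint64_t modifyEvent;   // last modification event
    uint64_t accessEvent;   // last access event
};

class Chamber {
public:
    DidgetRecord createDidget() {
        uint64_t event = ++eventCounter_;               // e.g. 100 -> 101
        DidgetRecord rec{nextId_++, event, event, event};
        records_.push_back(rec);
        return rec;
    }
private:
    uint64_t eventCounter_ = 0;   // 48-bit in the real record; 64-bit here for simplicity
    uint64_t nextId_ = 1;
    std::vector<DidgetRecord> records_;
};
```

With this scheme the ordering comes from the records themselves, not from any clock.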

Using this technique, it is always possible to list every Didget in the chamber in the exact order in which they were created. Likewise, you can find the last 100 Didgets that were accessed or the last 10,000 that were modified. You can tell if Didget X was created before or after Didget Y was last modified.

Just like every other field in the Didget table record, these event counter fields can be specified when performing a general search. For example, in a chamber with millions of Didgets, I can get a list of all the Didgets that were created between event counter 100,000 and event counter 300,000 in under a second. A backup program that knows it performed its last backup at event number 1,000,000 can quickly get a list of all Didgets that were either created or modified after that point.
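
As a rough sketch (again with invented names rather than the published API), the backup scenario boils down to a scan like this:

```cpp
#include <cstdint>
#include <vector>

struct DidgetRecord {
    uint64_t id;
    uint64_t createEvent;
    uint64_t modifyEvent;
};

// Return every Didget created or modified after the given event number,
// e.g. the event recorded at the time of the last backup.
std::vector<uint64_t> changedSince(const std::vector<DidgetRecord>& table,
                                   uint64_t lastBackupEvent) {
    std::vector<uint64_t> ids;
    for (const auto& rec : table) {
        if (rec.createEvent > lastBackupEvent || rec.modifyEvent > lastBackupEvent)
            ids.push_back(rec.id);
    }
    return ids;
}
```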

While the event counter allows you to know that event X happened exactly one event before event Y, it doesn't tell you when either of those events actually occurred or how much time passed between them. That functionality is left to special Didgets called Marker Didgets.

A Marker Didget is used to match either a specific date and time (as recorded by the system clock) or a specific type of event with an event counter value. These Marker Didgets are created either by the Didget Manager itself or by applications. Each Marker Didget has a type associated with it. Some types that have been defined so far are "Chamber Created", "Chamber Mounted", "Chamber Dismounted", "Backup Started", "Backup Ended", "Virus Scan Started", "Virus Scan Ended", and "Time Stamp". The Didget Manager code automatically creates one of these markers when the chamber is created, mounted, or dismounted. Applications like backup programs and virus scanners can create them when they start and when they finish. Any application can create a Time Stamp Marker at any time.

When a Marker Didget is created, the current value of the chamber's event counter is stored in that Marker's creation event counter field (just like it is whenever any other kind of Didget is created). In addition, the Didget Manager queries the system clock and also stores the current date and time value in the Marker Didget's metadata record. This allows us to know that event 1000 occurred at 10:00 am on June 4, 2011 and event 2000 occurred two days later at 1:00 (or at least that is what the system clock said the times were). Even if the system clock values are not accurate, the order of events always stays intact.
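
A hedged sketch of what a Marker Didget pairs together, with invented type and field names (the real record obviously stores more than this):

```cpp
#include <cstdint>
#include <ctime>

// Illustrative only: a Marker Didget ties an event counter value to
// whatever the system clock claimed at the moment it was created.
enum class MarkerType { ChamberMounted, BackupStarted, BackupEnded, TimeStamp /* ... */ };

struct MarkerDidget {
    uint64_t    createEvent;   // chamber event counter at creation
    MarkerType  type;
    std::time_t wallClock;     // system clock reading at that moment
};

MarkerDidget createMarker(uint64_t& chamberEventCounter, MarkerType type) {
    return MarkerDidget{ ++chamberEventCounter, type, std::time(nullptr) };
}
```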

The Didget Manager will not only create these Marker Didgets whenever an application specifically commands it, it can also be set to do so automatically by using Policy Didgets (another type of special "Managed Didget" that I will detail in a later post). A user or application could create a policy that commands the Didget Manager to automatically create a "Time Stamp" Marker Didget every 15 minutes. Likewise, a policy could direct it to create a "New Month" Marker every time the Didget Manager detects that a new month has begun.

Marker Didgets can be used in queries to find Didgets that have been created, modified, or accessed either before, after or between Markers. For example, I could query for a list of all Photo Didgets that were created before the chamber was last mounted. I could ask for a list of all Document Didgets that have been accessed since the last backup or last virus scan. I could ask for a list of all Software Didgets that were created after the New Year Marker in 2010 but before the New Year Marker in 2011.

Sunday, May 27, 2012

Lists, Menus, and Collections

In addition to File Didgets (things like photos, documents, music, or video) there are also a number of "Managed Didgets" where the Didget Manager controls the contents of their data streams. I would like to talk about three of them that are used to organize various other groups of Didgets - List Didgets, Menu Didgets, and Collection Didgets.

List Didgets

A List Didget is a Didget that has, as its contents, a simple list of other Didgets that form a logical group. If a Didget has been added to one of these List Didgets, it could be said that it is "a member of that group". Unlike files that must reside within a single folder or directory (the exception being file systems that support hard links), Didgets are generally expected to be members of several different groups. A single music Didget could be a member of the "Music", "Rock and Roll Music", "80's Music", and "My Favorite Songs" groups.

Just like every other Didget in the system, List Didgets can have attributes and tags assigned to them. In addition, List Didgets can be assigned "rules" about what kinds of other Didgets can be added as members. So I could create a List Didget that has a name tag of "My Hawaii Vacation Photos" attached, allows only Photo Didgets in the JPEG format as members, and contains a current list of 100 Photo Didget IDs. This Didget would effectively represent a "Photo Album" with 100 photos of my vacation in it.
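
Here is a minimal sketch of what such a rule-enforcing list might look like. The names and the single-format rule are illustrative assumptions; the real rule mechanism is more general:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Sketch of a List Didget that enforces a membership rule.
enum class FileFormat { JPEG, PNG, MP3 /* ... */ };

struct ListDidget {
    FileFormat requiredFormat;          // rule: only this format may join
    std::vector<uint64_t> memberIds;    // Didget IDs of the members

    void addMember(uint64_t didgetId, FileFormat format) {
        if (format != requiredFormat)
            throw std::invalid_argument("member violates the list's format rule");
        memberIds.push_back(didgetId);
    }
};
```

A "My Hawaii Vacation Photos" album would simply be such a list with the rule set to JPEG, 100 member IDs, and a name tag attached to the List Didget itself.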

List Didgets are especially suitable for lists where you don't care about each member having some kind of label associated with it. In the example given, it is not important that each of the 100 photos is given a label like "Day at the Beach", "Snorkeling" or "Whale Watching". They could be identified only as "Photo 1", "Photo 2", "Photo 3", ... , "Photo 100", or they could have no names at all.

Menu Didgets

Just like List Didgets, Menu Didgets contain references to other Didgets. Their uniqueness lies in the requirement that every member Didget also be given a short label within the menu. I could create a Menu Didget called "Clint Eastwood Movies" and have 10 members that are all Video Didgets. Each member would have a label in the Menu like "Kelly's Heroes", "Magnum Force", or "Space Cowboys".
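
Conceptually, a Menu Didget is just a label-to-member mapping. A tiny sketch, with invented names:

```cpp
#include <cstdint>
#include <map>
#include <string>

// Sketch: every member of a Menu Didget carries a short label.
struct MenuDidget {
    std::map<std::string, uint64_t> entries;   // label -> member Didget ID
};

// Hypothetical usage for a "Clint Eastwood Movies" menu:
//   menu.entries["Kelly's Heroes"] = 10234;
//   menu.entries["Magnum Force"]   = 10871;
```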

Collection Didgets

A Collection Didget is also very similar to a List Didget in that it can have lots of members that do not need any kind of label associated with each of them. With Collection Didgets, members are divided into two distinct categories - Mandatory Members and Optional Members. Collection Didgets are primarily used to track data set completeness. I could have a Collection Didget called "Microsoft Office Software" that contains a list of all the Software Didgets required to run that software package, as well as another list of Didgets that are nice to have (spell checkers, a thesaurus, help files, etc.) but are not needed to run it. A simple utility could be built that checks to see if a given Didget Chamber has everything the package needs to run, without having to start up the program. It will be possible to build a sophisticated "Package Management System" using these Collection Didgets.
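
A sketch of the completeness check such a utility could perform, with invented names; the point is simply that every mandatory member must be present while optional members are ignored:

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_set>
#include <vector>

struct CollectionDidget {
    std::vector<uint64_t> mandatoryMembers;
    std::vector<uint64_t> optionalMembers;
};

// A chamber is "complete" for the collection when every mandatory member
// is present; optional members do not affect the result.
bool isComplete(const CollectionDidget& c,
                const std::unordered_set<uint64_t>& didgetsInChamber) {
    return std::all_of(c.mandatoryMembers.begin(), c.mandatoryMembers.end(),
                       [&](uint64_t id) { return didgetsInChamber.count(id) > 0; });
}
```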

Hierarchy

All these kinds of Didgets can be nested within each other. I can build a hierarchy of menus that has menus inside of other menus. I could mimic a traditional file system folder hierarchy using this technique. A "List of Lists" could likewise be made by nesting List Didgets within other List Didgets. The same goes for a "Collection of Collections". I could mix them up a bit by creating a "List of Collections" or a "Menu of Lists".

Each of these "Managed Didgets" represents a powerful tool for organizing groups of data and giving applications an elegant way of visually representing them to the end user.

Didget Organization

Each Chamber within the Didget Realm is capable of storing billions of Didgets. Unlike files in a file system, a Didget does not have to be "located" within a folder. Each Didget can be assigned certain types, attributes, and tags that can be used to distinguish it from all the other Didgets in the system. Simple queries can quickly sort out all the Didgets that match a given search criteria.

Each Didget can be a member of one or more "data sets" but there is no requirement to do so. This means that I could have a Chamber with 10 million Didgets in it and have none of them categorized into a specific data set. This would be like the early days of file systems where there were no folders and all files were in the "root directory".

Even with folders, it is possible to have lots of files in any given folder. For example, some folders like "Windows", "bin", or "My Documents" can sometimes get populated with several thousand files. In file systems, such a situation can cause some real problems. Name conflicts are more likely to arise as the number of files within a single folder rises, since you can't have two files with the same name in the same folder. Performance also suffers when you try to load the contents of such a directory into a file manager or dump the list to a terminal screen.

There have been tools available for a long time to help users pick a subset of files out of a long list within such a "crowded folder". To pick out a smaller subset of all those files, a user may issue a command like "Dir *.exe" or "ls *.cpp". Just those files with that extension will be listed, giving the user a much more manageable list to navigate.

In the Didget Realm, things are substantially different. Since each Didget has a unique number as its identifier, there is no problem with having 10 million Didgets all in the same "root directory". The interface is designed to perform lightning-fast searches based on criteria much more powerful than just file names or extensions. In this respect it has a lot more in common with a database than a file system.

So if I want to get a quick list of all the Didgets that are photos in JPEG format of my vacation in Hawaii last year, I can just do a simple query against all 10 million Didgets and get a list of all 100 photos in less than a second. I don't have to navigate down through a directory hierarchy to try and find the C:\Photos\Vacations\Hawaii\2011 directory that has all 100 photo files in it. If I want a Didget to be a part of several different queries, I can just attach additional tags like ".people.Group = Family" and ".activity.Sport = Surfing" and it will appear when I search for those things. I don't need to create separate folders like "C:\Photos\Family" or "C:\Photos\Surfing" and either put copies of the photos in them or create hard or soft links to the original photos.
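
A rough sketch of the idea behind such a tag query. The real Didget Manager indexes tags rather than scanning linearly, and these type and function names are invented for the example, not the actual API:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Find every Didget carrying all of the requested Key:Value pairs.
using TagSet = std::map<std::string, std::string>;

struct TaggedDidget {
    uint64_t id;
    TagSet   tags;
};

std::vector<uint64_t> findByTags(const std::vector<TaggedDidget>& didgets,
                                 const TagSet& wanted) {
    std::vector<uint64_t> result;
    for (const auto& d : didgets) {
        bool matches = true;
        for (const auto& [key, value] : wanted) {
            auto it = d.tags.find(key);
            if (it == d.tags.end() || it->second != value) { matches = false; break; }
        }
        if (matches) result.push_back(d.id);
    }
    return result;
}

// e.g. findByTags(all, {{".people.Group", "Family"}, {".activity.Sport", "Surfing"}});
```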

In addition to the quick query capability of our system, Didgets can be organized into different groups or sets that are more persistent than ad hoc queries. There are three special kinds of Didgets that help organize them - List Didgets, Menu Didgets, and Collection Didgets. I will discuss them further in my next post.

Sunday, May 20, 2012

Past Solutions

The Didget Manager was created to solve a number of data management problems. I will attempt to state the biggest problem as best I can and then illustrate a number of different solutions that have either been attempted in the past, or are currently in use.

Problem: How do you properly manage many millions of pieces of structured and unstructured data especially when they are spread across several storage devices?

It is very easy for an individual or business to buy sufficient storage capacity to hold tens of millions of pieces of data. A 3 TB hard disk drive can be purchased for about $150. Small RAID storage devices that can hold 12 TB of data can be built for less than $1000. Flash based USB drives or Solid State Drives cost about $1 per GB.

In my home, I have counted over 20 different storage devices that have a capacity of at least 8 GB. Cameras, computers, PVRs, phones, iPads, and video cameras all come with built-in storage. In addition I have several external storage devices like thumb drives, backup drives, and a NAS.

They are all filling up with data. Photos, home video, documents, downloaded software, music, and other stuff seem to slowly fill any available storage. Portable devices like cameras and phones are often synchronized with my laptop or desktop computer. It can be difficult to tell sometimes if I have only one copy of a given photo or if I have dozens of copies spread around all my storage devices.

So what falls under the definition of "Manage" when it comes to data?

1) Backup. Naturally, we want to ensure that we have proper backups of all important data. The backup can be located locally in case a storage device simply fails, or it can be pushed to a remote site to help ensure a successful disaster recovery procedure.

2) Replication. Backup is a form of replication, but it also includes the placement of data on several different devices to enable convenient access. We always want to be able to access our important data no matter which device we have with us at the time.

3) Search. Even if we have a storage device with us, it doesn't help us if we can't find the document we are looking for among several million others. We want to be able to search for it based on its name, some attributes it may have, or by a set of keywords.

4) Protection. We want to be able to prevent important information from being altered or destroyed by accident or by a malicious program.

5) Data Sets. We want to be able to organize data by placing various pieces of it into different sets. A set can be an album full of pictures, a play list with dozens of songs, or a software package containing a hundred different programs, libraries, or configuration settings. It would be very helpful if every piece of data could be a member of more than just one data set.

6) Synchronization. If we have more than one copy of something, it would be nice to be able to have changes made to one of the copies be synchronized to all the copies. If a new element is added to a replicated set, it would be helpful to also add it to all the copies of that set.

7) Security. We want to make sure a piece of data can be only accessed by those with permission. We want to make sure that security is not compromised just because the data is moved or copied to another location.

8) Inventory. We want to get an accurate accounting of all our pieces of data. We want to know if any of our data sets are incomplete. We want to know how many documents, photos, or videos we have. We want to know if there are any security holes. We want to know what has changed and what devices have not yet been synchronized, backed up, or replicated.

9) Completeness. We want to make sure when a piece of data is copied from one place to another that it is a complete copy. The data stream, as well as all metadata including things like extended attributes, needs to be copied to ensure that the clone is an exact duplicate of the original.

What attempts have been made so far to accomplish some of these data management tasks and what are their limitations?

1) A well-organized file directory tree. This is where applications and users must adhere to a clearly defined plan for grouping all files into appropriate folders or directories. All the operating system files go in C:\Windows; all the user utility programs go in /bin or /usr/bin; or all the user documents go into the C:\users or /home areas. This approach can work fairly well when there are only a few thousand files to deal with. Unfortunately, it requires a lot of work to keep all the files in their proper directories. It also makes it difficult to decide where to put that photo you just downloaded - in the C:\Photos directory or in the C:\Downloads directory.

2) Lots of databases. Since most databases were not built to manage lots of unstructured data like photos, video, and documents (Blobs in database speak), databases are generally used to track and manage files instead. Windows Search, OS X Spotlight, Google's Picasa, iTunes, and iMovie are examples of programs that store file information within a database. When a file is created, its full path along with additional metadata are stored in the database. This allows the user to keep track of millions of files and do very fast queries based on things like keywords or tags. Incremental backups, replication and synchronization functions, and data sets can be tracked using these databases as well. Unfortunately, the databases are completely separate from the file system. It is possible for users or applications to create new files and delete or modify existing files without the database being updated as well. Even if there is a background monitoring tool that has a file system filter driver informing it of every change, it is possible to make changes while that tool is not running. Even if every file system change is accompanied by an accurate update to one or more databases, it can be difficult to manage lots of files when there are dozens of separate databases, each keeping track of just a subset of the whole file system. A separate application is probably managing each database independently and those applications seldom talk to each other.

3) Embedded Data. Some file formats like JPEG allow metadata to be embedded within a file's data stream without disrupting its normal processing. Things like the camera info, the date and time the picture was taken, and the GPS coordinates of the location where the picture was taken can be stored within the Exif data portion of .JPG files. Unfortunately, this data is not always accessible or searchable by all applications and some applications can alter the metadata unintentionally. Few data formats allow this behavior so it has limited application.

4) Extended Attributes. This external metadata can be attached to any file within a file system that supports them. Unfortunately, they are generally not searchable and are not universally available. Only some file systems support them and not all supporting file systems have the same rules for implementation. When an application copies a file from one file system to another, the extended attributes can be lost, altered, or just stripped off because the application forgot to copy them.

Tuesday, May 15, 2012

The Didget Record

As stated earlier, every Didget has a 64 byte metadata record used to track it. The Didget Manager is software that manages all the Didgets in the system. Unlike a file system, the Didget Manager is able to distinguish between different kinds of Didgets.

A file has only one mechanism (outside of the actual bits stored in its data stream) used for classification. That mechanism is the file extension (typically a three or four character string appended to the end of the file name). The file extension may symbolize the format of the data stream but the file system does not try to interpret its meaning. It is completely up to applications to interpret which file extensions belong to a particular category.

For example, there are dozens of different data stream formats that are used to represent a still image (e.g. a photograph). JPG, PNG, GIF, TIFF, BMP, and ICO are all examples of file extensions used to represent images. If a user wanted to know how many total image files were on a system, they would have to run an application that was programmed to find every type of file extension applicable to images. Since there is no way to ask a file system for a list or a count of "all image files", the application would need to perform a separate search for every file extension. If a volume contained millions of files, this simple search could take an hour or more to complete. If a new file extension was created to represent a new image format, the application would need to be updated so that it would look for files with that new extension.

Didgets, on the other hand, have several mechanisms that are used to classify data. Every Didget has a Didget type and a Didget subtype. If the type is File Didget, then it also has a File Didget format assigned. The Didget type and subtype fields are bit fields of 16 bits each. Since each of the 16 Didget types can have 16 different subtypes, there are 256 possible kinds of Didgets in the system.

One Didget type is "File". When files are converted into Didgets, they are assigned to be File Didgets. The other 15 Didget types have special purposes that apply only within the Didget Realm and I will discuss them in further posts.

Of the 16 File Didget subtypes, only 8 have been defined so far. They are Audio, Document, Image, Script, Software, Structured Data, Text, and Video. Each File Didget subtype can be further categorized into its various formats. Unlike the other two byte fields, this two byte field is not a bit field that can only have a single bit set. Instead it is an unsigned short int and can hold up to 65,535 different format types (zero is reserved).
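
To illustrate the field widths just described, here is a sketch of the classification portion of a record. The individual bit assignments and format numbers are made up for the example:

```cpp
#include <cstdint>

// Classification fields as described above: two 16-bit bit fields plus a
// 16-bit format number. Exact bit positions are illustrative assumptions.
struct DidgetClassification {
    uint16_t type;      // bit field: exactly one of 16 Didget types set
    uint16_t subtype;   // bit field: exactly one of 16 subtypes set
    uint16_t format;    // plain number: thousands of formats, 0 reserved
};

// Hypothetical bit assignments and format value
constexpr uint16_t TYPE_FILE     = 1u << 0;
constexpr uint16_t SUBTYPE_IMAGE = 1u << 2;
constexpr uint16_t FORMAT_JPEG   = 17;   // arbitrary example value
```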

Audio File Didgets include every format where the data stream is interpreted as sound. Formats for music, audio books, speeches, instruments, voice mail, and other noises all have the "Audio" bit set in the File Didget subtype field.

Software File Didgets include every kind of compiled computer code. Executable files, shared libraries, device drivers, and every other kind of software, regardless of targeted CPU or operating system, all have the "Software" bit set in the File Didget subtype field. Other kinds of code that must be interpreted like Python, Ruby, Perl, system commands, or shell scripts are categorized as "Script".

The other types of File Didgets are used to categorize the various document formats, still images formats, video formats, database formats, and plain text data formats.

Unlike file systems, the Didget Management System provides simple APIs used to search for all the Didgets that match a given set of search criteria. What this means is that an application can make a single call to the Didget Manager for a list of all the Video File Didgets and get a complete and accurate list very quickly no matter how many different kinds of video formats may be present.

Because the Didget Manager is able to quickly check bits in the bit fields described for every Didget Record in the system, it is able to sort out all the matching Didgets for any particular query in record time. On a system with a quad-core processor and 4 GB of RAM, I am able to sort through about 25 million Didget Records per second. This means I can find 9 million photographs mixed in with 16 million other kinds of Didgets in one second or less.
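
The query itself reduces to a bitwise test per 64-byte record, which is why the scan rate is so high. A simplified sketch (the constants and the truncated record layout are illustrative):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Record {
    uint16_t type;
    uint16_t subtype;
    // ... remaining fields of the 64-byte record ...
};

constexpr uint16_t TYPE_FILE     = 1u << 0;   // hypothetical bit assignments
constexpr uint16_t SUBTYPE_IMAGE = 1u << 2;

// Count every Image File Didget with one bitwise test per record.
std::size_t countImages(const std::vector<Record>& table) {
    std::size_t n = 0;
    for (const auto& r : table)
        if ((r.type & TYPE_FILE) && (r.subtype & SUBTYPE_IMAGE))
            ++n;
    return n;
}
```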

Thursday, May 10, 2012

What is a Didget?

This new data management architecture uses individual objects called Didgets. A Didget (short for Data Widget) has some properties of a conventional file, some properties of items stored in an object store, and a bunch of properties for which I can find no equivalent in any other system.

A Didget has a variable-length data stream just like a file. Any kind of serialized data can be written to this data stream. It can contain a photo, some software, a video stream, or any other structured or unstructured data that can be saved to a file. The number of bytes for this stream can range from zero bytes to just over 18 trillion bytes.

A Didget also has a small set of required metadata that is stored as a fixed-size record within a table. Just like the inodes (Ext2, Ext3, Ext4) and file record segments (NTFS) that file systems use to track files, the Didget Manager keeps track of all the Didgets within a Chamber using these Didget table entries.

The size of each entry is intentionally small. It is only 64 bytes in size. This allows extremely large numbers of Didgets to be managed using a minimal amount of disk reads and RAM for caching. By contrast, the default inode size is 256 bytes and NTFS's records are 1024 bytes (4096 bytes on all the new advanced format hard disk drives). Some other file systems have even larger metadata records. In order for the NTFS file system to read in the entire MFT and store it into memory for quick searches, it would need to read in 10 GB from disk and have 10 GB of RAM if the volume contained 10 million files. For 100 million files, it would need ten times that much memory.

The Didget Manager, on the other hand, could read and store the entire Didget table in just 640 MB when 10 million Didgets are present. Even for 100 million Didgets, it is a very manageable 6.4 GB.

With only 64 bytes to work with, every single bit is important. Painstaking care was taken to ensure that every byte was necessary and yet every field was of sufficient size to allow generous limits. Fields within this structure include the Didget's ID, its type information, its attributes, its security keys, a tag count, and three separate event counter values (analogous to date and time stamps).
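
For illustration only, here is one plausible way the 64 bytes could be laid out, based on the field list and sizes mentioned in these posts (a 64-bit ID, 16-bit type/subtype/format, a tag count, and three 48-bit event counters). How the remaining bytes are split between attributes and security keys is purely an assumption made for this sketch, not the actual production layout:

```cpp
#include <cstdint>

#pragma pack(push, 1)
struct DidgetTableEntry {
    uint64_t id;                 //  8 bytes: never reused, never changes
    uint16_t type;               //  2 bytes: bit field
    uint16_t subtype;            //  2 bytes: bit field
    uint16_t format;             //  2 bytes: File Didget format number
    uint8_t  tagCount;           //  1 byte : up to 255 tags
    uint8_t  createEvent[6];     //  6 bytes: 48-bit event counter
    uint8_t  modifyEvent[6];     //  6 bytes
    uint8_t  accessEvent[6];     //  6 bytes
    uint8_t  attributes[7];      //  7 bytes (assumed split)
    uint8_t  securityKeys[24];   // 24 bytes (assumed split)
};
#pragma pack(pop)
static_assert(sizeof(DidgetTableEntry) == 64, "record must stay 64 bytes");
```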

Absent from this structure is the name of the Didget. It is stored in another structure if the Didget even has one. Unlike files, a Didget does not need to have a name. Its unique identifier is a number. This 64 bit number (the Didget ID) is assigned during Didget creation; it never changes over the life of the Didget; and the number is never recycled if the Didget is deleted and purged from the chamber. This means that if the ID is stored within some other data stream for use by a program, it will always point to the right Didget (unless of course that Didget has been deleted).

A Didget can have a name. In fact it can have lots of them. The name is simply a tag that has been attached to a Didget. A tag is a simple Key:Value pair that is stored in such a way as to enable database-like speeds when searching for Didgets that have certain tags. Each Didget can have up to 255 tags attached to it.

Each tag within the system is defined in a schema. The user or an application can create a new tag definition using a simple API. Once defined, tags can be created and attached to any Didget(s) using that definition.

For example: a Didget containing a photograph taken of Bob in New York City in 2011 may have 3 tags attached to it (.person.FirstName = Bob, .place.City = "New York City", .date.Year = 2011). Any application can issue a query to the Didget Manager for a list of all photographs with these three tags and this Didget will be in the list. If the user wanted to later attach a new tag (e.g. .device.Camera) to this photograph, he could define the tag and then attach the value (.device.Camera = "Canon EOS 7D") to it.
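
The workflow looks something like the following sketch. The interface shown is a stand-in; the actual API calls may be named differently:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical interface mirroring the define/attach/query workflow above.
class IDidgetManager {
public:
    virtual ~IDidgetManager() = default;
    virtual void defineTag(const std::string& tagName) = 0;              // add to schema
    virtual void attachTag(uint64_t didgetId, const std::string& tagName,
                           const std::string& value) = 0;
    virtual std::vector<uint64_t> query(const std::string& tagName,
                                        const std::string& value) = 0;
};

// Example: define the new tag, attach it, and it immediately becomes queryable.
void tagNewPhoto(IDidgetManager& dm, uint64_t photoId) {
    dm.defineTag(".device.Camera");
    dm.attachTag(photoId, ".device.Camera", "Canon EOS 7D");
    auto sameCamera = dm.query(".device.Camera", "Canon EOS 7D");  // includes photoId
    (void)sameCamera;
}
```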

The Didget Manager is designed to be lightning quick at finding all Didgets that match a given query. For example: if a chamber contained 20 million Didgets and 5 million of them were photographs, it could return a list of the Didget IDs of all 5 million photographs in less than 1 second. If the query required matching several tag values, it would still take less than 10 seconds to return the complete list even if every photograph had dozens of tags attached to them.

The Didget Manager is able to accomplish this task without needing a separate database that can become out of sync with the Didget metadata. The entire system has been built around a "10 second rule". That is to say that the algorithms and structures of the system have been designed such that with the right hardware setup, no query should ever require more than 10 seconds to complete even if the chamber contains billions of Didgets.

Wednesday, May 9, 2012

Prerequisites

As stated earlier, I have invented a new system (Didget Management) that I think can eventually replace conventional file systems. Before going into any details about the various features of my new system, I thought I would first discuss the requirements of any data management system that hopes to have any chance of unseating the reigning champion - the traditional file system.

Of course, any departure from over 50 years of computing tradition will be met with a certain amount of pain. No matter how many processes are put in place to ease the migration of data from one system to another, existing users and programs must adapt to a whole new way of managing data. Some features of conventional systems can be emulated to provide some level of backward compatibility, but nevertheless there will be a learning curve and the new system will break some old ways of doing things.

Obviously, the new system must offer some very compelling features in order to make the pain worth it. The new system must not only solve a number of existing problems, but it must also open up lots of new opportunities. As the Internet has proven, if you provide enough value, widespread adoption is possible in spite of many hurdles. The Internet went from a curiosity to an integral part of the computing landscape in a relatively short time once users and developers realized the power of these inter-connected servers and the opportunities they opened up.

If it ain't broke, don't fix it...

Although in a previous post I enumerated quite a few problems with conventional file systems, they also have a number of features that I think work very well. My new system must be able to provide these features with equivalent speeds and ease of use.

1) Block Based Storage.

File systems rely heavily on the block storage nature of the physical storage devices they control. Hard drives, flash drives, and optical disks are all block based storage mediums. Like file systems, the Didget Manager makes heavy use of a block based architecture.

2) Variable-Length Data Streams.

Each file in a file system has a data stream that consists of a set of bytes arranged serially that represents information stored in a digital format. The data stream may contain structured or unstructured data. Numerous formats have been invented over the years that programs rely heavily upon to work properly. Just like a file, a Didget has a variable length byte stream that can contain any kind of data. Any existing file can be converted to a Didget without modifying its data stream.
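
In other words, conversion is just a byte-for-byte copy of the stream, so nothing about the format needs to change. A small sketch, with the chamber and Didget calls left as hypothetical stand-ins:

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Read a file's entire data stream unmodified.
std::vector<uint8_t> readWholeFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<uint8_t>(std::istreambuf_iterator<char>(in),
                                std::istreambuf_iterator<char>());
}

// Illustrative usage (hypothetical API):
//   auto bytes  = readWholeFile("vacation.jpg");
//   auto didget = chamber.createDidget(/* type = File, subtype = Image */);
//   didget.writeStream(bytes.data(), bytes.size());   // unchanged byte stream
```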

3) Robust, Yet Simple API.

Applications must be able to create, delete, access, modify, and perform queries against any number of data elements (e.g. files). Like file systems, the Didget Management System will release a robust set of APIs that make it very easy for applications to create new Didgets and query or otherwise manipulate existing ones.

4) Support for Massive Numbers.

Modern file systems like NTFS can handle billions of files within a single volume if the underlying storage is sufficiently large to hold them. Like volumes do with files, each Chamber can handle billions of individual Didgets.

5) Fast Access.

File systems have been finely tuned over the years to provide quick response to commands from applications to create, open, read, write, and close files. The Didget Manager is able to perform similar operations with just as much speed as conventional file systems. For batch operations where thousands of new Didgets are created at once, we can even do it faster.

Who am I?

The most obvious questions anyone who is taking a serious look at this technology would ask are: Who is this guy? and Does he know what he is talking about?

My name is Andy Lawrence. I have over 20 years of experience designing and implementing file system drivers, custom file systems, disk utilities, and cloud storage solutions.

Hopefully, my posts on this blog will speak for themselves as to whether my ideas have merit. I will leave it up to the readers to judge for themselves if the problems I discuss are real and whether or not my solutions are valid.

As far as my qualifications for delivering storage solutions go, here are my credentials. After graduating from college in the late 80s with a BS in computer science, I joined Novell where among other things I worked on device drivers for the DOS, Windows, and OS/2 operating systems. Here I learned in great detail about how disk drives worked and how file systems handle data streams.

In 1995, I joined a small startup called PowerQuest where just a few of us engineers worked on a disk partitioning product called Partition Magic. During my nearly 8 years at the company, I led the development of a couple of other products, Drive Copy and Drive Image. These were among the first disk imaging solutions to enter the market.

I have written custom file systems, worked on cloud based backup solutions, and designed my own general-purpose data management solution. My current "Day Job" is at Move Networks. I am a Principal Engineer at this company which was acquired by Echostar early last year.

I have recruited a small team of former colleagues to assist me in implementing the various features of this new architecture. They also have regular jobs, so we are working on this project in our spare time.

Tuesday, May 8, 2012

The Problem(s) with File Systems

File systems have been the backbone of data storage systems since the early days of computing. I use the plural term because as everyone knows there isn't just one file system, but there are lots of them. FAT, FAT32, NTFS, Ext2, Ext3, Ext4, HFS+, ZFS, etc., are all examples of such file systems and every few years, one of the operating system vendors or someone in academia comes up with a new one.

Over the years more than 100 different file systems have joined the ranks. Some fill very niche applications, others have gained moderate market acceptance, while yet others are running on computers numbering over 100 million. Each new file system offers at least a few unique features (e.g. long file names, access control lists, extended attributes, journaling, or hard links) that set it apart from the others in the field, but all file systems are constrained by the general file system architecture.

Backward compatibility issues and the desire of application designers to write to a single, unified, file API make it extremely difficult for new file systems to introduce compelling, original features without breaking the mold. Over the years, numerous problems dealing with data storage have surfaced. Some problems have been solved or at least mitigated by the introduction of newer file systems. Other problems continue to plague data managers and require a radical new approach to solve.

In spite of numerous problems, file systems work reasonably well and their endurance is a testament to their designers. However, I believe the time has come to replace file systems with something better. By that, I don't mean we need to just build another file system that does a few things differently than the others. I mean that we need a radically new general-purpose data management system that is not limited by the conventional file system architecture.

So, what's wrong with today's file systems? Let me count the ways...

The biggest problem is that file systems don't actually "manage" files. Sure, they enable hundreds or thousands of different applications to create lots of files, but they don't actually help manage them. File systems only do what applications tell them to do and nothing more. A file system won't create, copy, move, or delete a file without an explicit command to do so from the operating system or an application.

Any application with access can create one or more files within the file system hierarchy and fill them with structured or unstructured data. With today's cheap, high-capacity storage devices, file system volumes can be created which will hold many millions of files. Some file systems are capable of storing several billions of individual files.

While a file system will make sure that every file's data stream is properly stored and kept intact and that every file's metadata fields maintain the last values set by applications, the file system itself knows almost nothing about the files it stores. Every file system treats each of its files like a "black box". A file system can't tell a photo from a document or a database from a video. It doesn't make sure that the file's unique identifier (its full path including the file name) is in any way related to the data it contains. It doesn't care a whit if an application creates a file containing a résumé and names it C:\photos\vacations\MyGame.exe. It also will let a user store music files in their /bin directory or put critical operating system files in a folder called /downloads/tmp.

What this means is that if the user, for example, wants to find all photo files that were created in 2011, the file system will do little to help find them. An application must examine each and every file within the system and compare its data type with known photo formats and then check its date and time stamps to see if it was created in that year. Unlike databases that have sophisticated query languages for lightning fast searches, file systems have things like findFirst and findNext.

When the number of files within a file system grows beyond several thousand, it becomes increasingly difficult for the average user to try and manage them using a file browser and a well defined folder structure. Once the number exceeds a million, the user is generally completely lost without special file management applications to help organize all the files. Basic searches for either a single file or for groups of files can take a very long time since directory tree traversals using string comparison functions are inherently slow. As the number of files grows, the queries take longer and longer.

To combat this problem, users are turning to special purpose data management applications to help them manage a certain subset of all their files. To manage their music files, they get iTunes. To manage their photos they try Picasa or Photoshop. To manage their video streams they install iMovie. Each of these applications offers ways to organize and keep track of their respective data sets. They often allow the user to tag or otherwise add special metadata to every file they manage to help the user classify files or put them into playlists, favorites, or albums. This extra metadata is often stored in a proprietary format or in a special database managed exclusively by the application.

This solution to managing data generally results in a collection of separate "silos of information" that do not interoperate with each other very well. Other applications are not able to easily take advantage of the extra metadata generated by the various data management applications. Many files within a given volume are not part of one of these silos and must be managed independently. Movement of data from one system to another often requires special import and export functions that don't always work. Finally, the management applications often just maintain references to the files they manage. If another application moves, renames, or deletes the underlying files, the management application often runs into problems as it tries to resolve the inconsistencies.

Operating systems like Windows 7 and Mac OS X include special file indexing solutions (Windows Search and Spotlight) to help the user find files. The indexer will comb through some or all of the files in a volume and "Index" the metadata and/or file content it can identify. It will store all the index information within special database files so that the user or applications can quickly find files based on keywords. Unfortunately, these indexers are not tied directly to the file systems they index. It is often the case (especially with portable storage devices) that changes are made to files while the indexer is not currently controlling or monitoring changes. This can happen if the user boots another operating system or plugs the portable drive into another computer. Once the indexer resumes operation, it must go through an extensive operation to try and figure out what changed. In some cases, it just deletes its index and starts over. For volumes with millions of files, it can take many hours to re-index.

Some file systems allow extended attributes to be created by applications and maintained by that file system. Unfortunately, extended attributes are not universally supported and each file system's implementation is different. Copying or moving files with extended attributes between file systems can result in the loss of information or its unexpected alteration. Even those file systems that allow extended attributes do not provide any fast way for applications to search for files based on them. Other than through the indexing services mentioned earlier, it is nearly impossible to find a set of files based on a common extended attribute value.

Another persistent problem is that every file within the system is subject to changes initiated by any application with access. The file API is very open and allows almost every piece of metadata or byte stream to be modified at will. Malicious or inadvertent changes can wreak havoc on a system. A virus that manages to run under the logged-on user is able to modify any file that user has rights to. Such malicious programs could, for example, make random alterations to filenames and/or folder names and thus invalidate any stored path names. A program can change date and time stamps, file attributes, access permissions, and file locations. The file attribute "Read-only" is just a suggestion for applications to leave the data stream alone. Any program with rights can simply change the attribute to "Read-Write", modify the file contents at will, and even change the attribute back once it is finished. What this means is that no file metadata or data stream can be trusted to be either accurate or even reflect its original state. Only a bit-by-bit comparison with another set of original data can assure that any file has not been altered.

For many operations such as file backup or synchronization, a knowledge of the order of operations against a particular data set is crucial. File systems use date and time stamps to keep track of when files are created, accessed, or modified. As was previously pointed out, because each of these time stamps can be altered at will, they may not be accurate. Even if no application alters them, the values they contain may not reflect the proper order of operations. The file system simply queries the value of the system clock controlled by the running operating system when it records date and time stamps. The clock may be off by a few minutes, hours, days, or even longer. The clock can be reset by the user or by synchronization with another computer. A portable drive that is plugged into two different computers during the course of a day, each with a clock that is different, may not record the proper sequence of events with regard to file operations.

Lastly, one of the biggest weaknesses of file systems is the unique identifier that is used for files. The file name and the folder names in its hierarchy make up each file's unique identifier. Every file must have one and only one full path name and it must be unique. Some file names are human readable, others are generated by software and may look like "RcXz12p20.rxz". The human readable names are generally in the language of the creator and cannot be translated without altering the file's unique identifier. Various file organizers, and any other application that wants to keep track of one or more files, often store the full path to the files either within a database or within another file's data stream. If the original file name is altered, any folder in its path is renamed, or the file is moved to a new folder, the stored path becomes invalid. "File Not Found" is among the most common error conditions encountered by users or applications.

Computers are much faster at crunching numbers than they are at string comparisons. It will always be much faster for a file system to find a million files if it is given their iNode numbers than it would be to find them based on a million different full path names.

As block storage devices like hard disk drives and flash memory drives continue to expand in capacity, the number of files within any given file system volume will continue to increase dramatically. As the average number of files a user or business has grows, the issues identified here will become even more problematic.

The Time Has Come to Replace File Systems

I am working on a new general-purpose data management system that I think has the right feature set to make it a viable candidate as a suitable replacement for conventional file systems. It is not just another new file system that adds a few new features but still adheres to the general file system architecture that is now decades old.

Instead, it radically departs from the conventional architecture and presents a whole new way to create, protect, and manage data. This new architecture is the result of years of careful design.  It offers solutions to many problems that have plagued file systems over the years and takes entirely new approaches to data management that will enable new ways of dealing with data.

The architecture is complete and the implementation phase is now over a year along. I have recruited some top talent to help me get enough features functional so users can see the power of this new system. We now have a core data management engine that is capable of performing some amazing feats. We also have a working browser application written on top of our data management API that exposes enough of the feature set to make a very good demo. Our first approach is a simple photo organizer that will help average users to get a handle on their collection of thousands (or millions) of photos.

As with every new startup, we are looking for sources of funding to help us achieve our goals. We are currently seeking a company that is struggling to manage large amounts of data and is looking for a new approach to help them solve their problem. Also, we want to talk with angel investors who want to be a part of a disruptive technology with huge potential to change the way the world looks at data.

Stay tuned for further information as we introduce the world to our new system which consists of a set of data objects called Didgets (short for data widgets) that are organized into containers called Chambers. Each individual chamber is governed by an instance of the Didget Manager. The worldwide collection of all these Didget Chambers comprise the Didget Realm.

Just like the Internet which consists of a worldwide set of inter-connected servers that came out of nowhere to revolutionize the entire computing world, I believe the Didget Realm can significantly alter the data storage landscape and open up whole new industries.