I have made a ton of changes since I last recorded some videos showing what our Didget Browser can do, so I decided to make some new ones. I will add new links and descriptions as I record them. Here is what I have so far:
Introduction: (4 minutes) http://youtu.be/NgPTYsb4LRQ?hd=1
This video shows you how to create new containers that hold our Didget objects. It shows how to wipe out a container and start again from scratch. It also shows you how to configure the browser to only show certain features and how to pre-populate a new container with various Didgets.
Creating Database Tables: (5 minutes) http://youtu.be/rM1KEVe7TVc?hd=1
This video shows you how to create relational tables using Didgets. It shows how to create a table definition from scratch. It also shows how to create them using Json or CSV files. Once a definition is created, it can be used to create relational tables. Tables can also be constructed by extracting data from a local or remote database using a connector.
Querying our Database Tables: (7 minutes) http://youtu.be/T_Y2R4DA9UI?hd=1
This video shows how to query the tables; how to JOIN tables; and how to save any results out to a completely separate, persistent table that can be later queried just like any other table.
Monday, September 18, 2017
Monday, August 28, 2017
How Good is Good Enough?
I can be a perfectionist in some areas. I am passionate about speed when it comes to computer algorithms. Even if I have spent a few days getting some critical function to be 10x faster than it was before, I will often still stay up late if I know I can squeeze a few more percentage points out of it.
But I am also keen to the 'Lean Startup' idea of an MVP (Minimal Viable Product) where you build something that is just good enough without waiting until it is perfect before introducing it to the market. So I struggle with how good a particular feature has to be before I say it is good enough and move on to the next task.
The Didget Management System has a lot of very innovative features that set it apart from other kinds of general-purpose data managers. But speed is its greatest 'Wow!' factor. It can do things thousands of times faster than conventional file systems. It can do many database operations 2x, 3x, or 5x faster than the major relational database management systems. It takes full advantage of multi-core CPUs to make even single operations much faster by breaking them up and running pieces in parallel.
Yet, I have found my biggest challenge has been to get people to commit resources (time, money, effort) toward something that has considerable promise if there is any risk involved. I will show them something that is taking them 10 minutes to do using their PostgreSQL Database and can be done on my system in only 2 minutes and yet have trouble getting them to commit. This is not something trivial that is outside the core function of their business...it is something critical, and still they hesitate. I think this is mainly because it requires change - a step into the unknown.
Everyone knows that risk is the biggest enemy of innovation. All but the most trivial innovations required someone to take a chance and put something on the line to 'make it happen'. Business managers almost always remember some initiative that failed because they tried something out of the mainstream. But they rarely, if ever, know how a passed-up opportunity would have played out to their advantage.
I am keenly aware that in order to convince companies to switch to my system, it has to be a great improvement over their existing solution. When I started this project, I set the bar at twice as good. If I didn't think I could build something that was at least twice as fast, twice as convenient, or had twice the power; I would never have gone far into its development. It has far exceeded my expectations to the point that I think it will be at least 10x better than anything else.
So I plod ahead with the hope that eventually this platform will attract the attention of those innovators who will take a good look at the risk/reward ratio and decide the reward is just too great to pass up. We have some companies that are taking a look at it, but most have yet to do more than dip their toe in the water.
As I look over my list of tasks that are yet to be completed, there are many that are 'refinement' tasks or things that will make features that already work, significantly better. There are others that are features that do not yet work at all and need to be implemented. I have taken the approach to do a mixture of things from both groups. Every time I get one or two new things working, I will go back and enhance one thing that worked before. At the end of the month, I can then say that the product does things it never did before but also does a handful of things better than ever.
But I am also keen to the 'Lean Startup' idea of an MVP (Minimal Viable Product) where you build something that is just good enough without waiting until it is perfect before introducing it to the market. So I struggle with how good a particular feature has to be before I say it is good enough and move on to the next task.
The Didget Management System has a lot of very innovative features that set it apart from other kinds of general-purpose data managers. But speed is its greatest 'Wow!' factor. It can do things thousands of times faster than conventional file systems. It can do many database operations 2x, 3x, or 5x faster than the major relational database management systems. It takes full advantage of multi-core CPUs to make even single operations much faster by breaking them up and running pieces in parallel.
Yet, I have found my biggest challenge has been to get people to commit resources (time, money, effort) toward something that has considerable promise if there is any risk involved. I will show them something that is taking them 10 minutes to do using their PostgreSQL Database and can be done on my system in only 2 minutes and yet have trouble getting them to commit. This is not something trivial that is outside the core function of their business...it is something critical, and still they hesitate. I think this is mainly because it requires change - a step into the unknown.
Everyone knows that risk is the biggest enemy of innovation. All but the most trivial innovations required someone to take a chance and put something on the line to 'make it happen'. Business managers almost always remember some initiative that failed because they tried something out of the mainstream. But they rarely, if ever, know how a passed-up opportunity would have played out to their advantage.
I am keenly aware that in order to convince companies to switch to my system, it has to be a great improvement over their existing solution. When I started this project, I set the bar at twice as good. If I didn't think I could build something that was at least twice as fast, twice as convenient, or had twice the power; I would never have gone far into its development. It has far exceeded my expectations to the point that I think it will be at least 10x better than anything else.
So I plod ahead with the hope that eventually this platform will attract the attention of those innovators who will take a good look at the risk/reward ratio and decide the reward is just too great to pass up. We have some companies that are taking a look at it, but most have yet to do more than dip their toe in the water.
As I look over my list of tasks that are yet to be completed, there are many that are 'refinement' tasks or things that will make features that already work, significantly better. There are others that are features that do not yet work at all and need to be implemented. I have taken the approach to do a mixture of things from both groups. Every time I get one or two new things working, I will go back and enhance one thing that worked before. At the end of the month, I can then say that the product does things it never did before but also does a handful of things better than ever.
Tuesday, May 2, 2017
Design Principles
As we have designed and implemented the Didget Management System, we have done a good job so far at adhering to these basic principles:
1) No dependencies. We purposely stayed away from using any thing that may cause dependency issues down the road. We don't use .NET. We don't use third party libraries other than the standard C++ libraries. Our browser application uses Qt but the manager itself does not.
2) Take full advantage of CPU cores/threads. We want our code to run much faster on a CPU with more cores. Large individual operations are often broken up into multiple pieces and run in parallel using separate threads. We are not just running multiple queries simultaneously like many database servers (we do that too), but we can run a single query faster when more cores are available.
3) Use thread-safe code. Because we do so many things at the same time, we want to allow multiple operations on the same set of data to safely run in parallel. Data integrity is very important whether running many separate queries simultaneously or when many threads are running different parts of the same query.
4) Be operating system independent. We want this code to run equally well on Linux, Windows, or OSX. All operating system calls are confined to a single 'Kernel' module which is easily ported to other operating systems.
5) Be faster than anything else. Use things like maps, hash tables, and very fast algorithms written in efficient C++ code to do everything. No interpreted code.
6) Re-use code whenever possible. Often the same function can be used to manipulate a dozen or so different kinds of Didgets. When we fine tune something, it often improves performance in multiple areas.
1) No dependencies. We purposely stayed away from using any thing that may cause dependency issues down the road. We don't use .NET. We don't use third party libraries other than the standard C++ libraries. Our browser application uses Qt but the manager itself does not.
2) Take full advantage of CPU cores/threads. We want our code to run much faster on a CPU with more cores. Large individual operations are often broken up into multiple pieces and run in parallel using separate threads. We are not just running multiple queries simultaneously like many database servers (we do that too), but we can run a single query faster when more cores are available.
3) Use thread-safe code. Because we do so many things at the same time, we want to allow multiple operations on the same set of data to safely run in parallel. Data integrity is very important whether running many separate queries simultaneously or when many threads are running different parts of the same query.
4) Be operating system independent. We want this code to run equally well on Linux, Windows, or OSX. All operating system calls are confined to a single 'Kernel' module which is easily ported to other operating systems.
5) Be faster than anything else. Use things like maps, hash tables, and very fast algorithms written in efficient C++ code to do everything. No interpreted code.
6) Re-use code whenever possible. Often the same function can be used to manipulate a dozen or so different kinds of Didgets. When we fine tune something, it often improves performance in multiple areas.
Progress
I recently left my day job to work on the Didget system full time. My small team has made a number of changes to the project since the last blog post, but progress has been slow when so many other things get in the way. I have now been able to accelerate the development and testing greatly.
Here is a list of some new features that have been added over the past two years.
1) Added ability to do JOIN operations on our database tables.
2) Added direct connectors to external databases so we can create DB tables by querying those databases directly. In earlier versions, the user had to export the data into CSV files from those databases and then import those files into our system.
3) Added ability to transform data within database tables. We can now create new columns that are transformations of other columns. For example we can uppercase, lowercase, substitute, split, combine, convert, truncate, trim, etc.
4) Added a 'Folder' container type so that the data stream for every Didget is stored in a separate file. This helps with testing and lets us do more 'apples to apples' comparisons with file system operations.
5) Added more complex SQL query operations. We can now combine lots of AND and OR operations like "SELECT * FROM myTable WHERE Name LIKE '%son' AND Address ILIKE '%123%' OR ZipCode < 10000;" We made it very easy to create and save these queries.
6) Tuned a bunch of operations to be faster. SQL queries now execute even faster and often require less data to be read from disk. Many of our database queries are now about twice as fast as MySQL or PostgreSQL when performed on the same data set on the same machine. Again, we don't need any indexes to often outperform a fully indexed table in those other database systems.
Here is a list of some new features that have been added over the past two years.
1) Added ability to do JOIN operations on our database tables.
2) Added direct connectors to external databases so we can create DB tables by querying those databases directly. In earlier versions, the user had to export the data into CSV files from those databases and then import those files into our system.
3) Added ability to transform data within database tables. We can now create new columns that are transformations of other columns. For example we can uppercase, lowercase, substitute, split, combine, convert, truncate, trim, etc.
4) Added a 'Folder' container type so that the data stream for every Didget is stored in a separate file. This helps with testing and lets us do more 'apples to apples' comparisons with file system operations.
5) Added more complex SQL query operations. We can now combine lots of AND and OR operations like "SELECT * FROM myTable WHERE Name LIKE '%son' AND Address ILIKE '%123%' OR ZipCode < 10000;" We made it very easy to create and save these queries.
6) Tuned a bunch of operations to be faster. SQL queries now execute even faster and often require less data to be read from disk. Many of our database queries are now about twice as fast as MySQL or PostgreSQL when performed on the same data set on the same machine. Again, we don't need any indexes to often outperform a fully indexed table in those other database systems.
Subscribe to:
Posts (Atom)