Welcome to the Realm, The World of Didgets
Monday, July 29, 2019
YouTube Channel
https://www.youtube.com/channel/UC-L1oTcH0ocMXShifCt4JQQ
I created a couple of new videos that demonstrate the speed and flexibility of the Didget System's database functions. I loaded in the Chicago crime data, available in CSV or Json format on the city's open data portal. Anyone can download this decent-sized table (nearly 7 million rows and 22 columns).
I loaded this data into Didgets as well as Postgres and ran some benchmarks. On average, the Didget System could query the table in about a quarter of the time Postgres took. I ran dozens of queries and saw results anywhere from twice as fast to ten times as fast. I saw similar results against SQL Server and MySQL.
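For anyone who wants to reproduce the Postgres side of the comparison, here is a minimal timing sketch in C++ using libpq and std::chrono. The table and column names ('crimes', 'primary_type') are assumptions about how the portal's CSV might be loaded, and the query is just one example of the kind I timed; it is a sketch, not the actual benchmark harness.

    #include <libpq-fe.h>
    #include <chrono>
    #include <cstdio>

    // Time one aggregate query against a local Postgres instance holding
    // the Chicago crime table. Table and column names are hypothetical.
    int main() {
        PGconn* conn = PQconnectdb("dbname=chicago");
        if (PQstatus(conn) != CONNECTION_OK) {
            std::fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }
        auto start = std::chrono::steady_clock::now();
        PGresult* res = PQexec(conn,
            "SELECT primary_type, COUNT(*) FROM crimes GROUP BY primary_type;");
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
            std::chrono::steady_clock::now() - start).count();
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            std::printf("%d groups in %lld ms\n", PQntuples(res), (long long)ms);
        PQclear(res);
        PQfinish(conn);
        return 0;
    }

Comparing the wall-clock time of the same query in both systems gives the kind of ratio quoted above.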
Friday, December 21, 2018
Updates
2018 has been a busy year. It has been some time since I last posted, so I didn't want the year to end without an update.
We formed a new company (Didgets.io) and started a simple web page for it. I have added a ton of features and enhanced many of the previously implemented ones. We successfully found our first two paying customers so we had some modest income this year. We are currently working on adding team members and looking for some working capital to speed up development. We have a number of potential customers that we are currently working with to get them on board.
Like every startup founder, I have to wear multiple hats. The 'Documentation Hat' is one I have obviously neglected as other tasks have consumed my time.
To catch up, here is a brief list of a few of the most important changes over the past year:
1) Added extensive Json support to try to capture some of the NoSQL market. Json files can be used to import tables, and we can export values and results into a Json file. Since Json allows any value to be an array, we have been able to test our 'three dimensional table' features: each row/column intersection can hold multiple values that can be treated separately (see the first sketch after this list).
2) Ported everything to Linux and are working on the macOS version too. Updated the build tools to VS2017 and now use Qt 5.11.2 for the browser tool. We can now build in both Visual Studio and Qt Creator.
3) Added the ability to create indexes over sets of text files or over table columns. These are not the typical RDBMS indexes used to speed up queries; they are analytical tools for finding and analyzing patterns in text.
4) Added the ability to catalog other systems. We can now create Didgets (with associated tags) based on files in other systems without importing their data streams.
5) Enabled 'drill down' analytics on relational table results. Given a result set (from either a 'SELECT *' or a more specific query), the user can see all the values represented in each column using the 'show values' option. For example, a query against a customers table might display all customers living in California (say 10,000 rows). Right-clicking the 'city' column header and selecting 'show values' gives a list of all the cities for those 10,000 customers and how many customers are found in each city. Double-clicking one cell within that result set (e.g. 'San Francisco' in row 3 of the city column) pops up a new result set with every customer in that city (e.g. 2,500 rows showing each customer in San Francisco). Continuing this process lets the user drill down to more and more specific criteria (see the second sketch after this list).
6) Added a set of formulas so our table results work much more like a spreadsheet.
7) Added a set of transformations for database tables. The user can now modify the data on a column-by-column basis: uppercase, strip punctuation, trim spaces, truncate, replace, or split values. The resulting transformations can be placed in entirely new columns within the table.
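To make item 1 concrete, here is a minimal sketch of what a 'three dimensional table' means in practice. The C++ types below are illustrative, not our actual internal structures: each row/column intersection holds a list of values rather than exactly one, which is how a Json array maps into a table cell.

    #include <cstdio>
    #include <string>
    #include <vector>

    // Illustrative only: a cell that can hold several values, as when a
    // Json field is an array. A row is one Cell per column, so a single
    // intersection can carry zero, one, or many values.
    using Cell = std::vector<std::string>;
    using Row  = std::vector<Cell>;

    int main() {
        // A customer row whose 'phone' column came from a Json array.
        Row customer = {
            {"Acme Corp"},                        // name: one value
            {"555-0100", "555-0101", "555-0199"}  // phone: three values
        };
        // Each value at the intersection can be treated separately.
        for (const std::string& phone : customer[1])
            std::printf("phone: %s\n", phone.c_str());
        return 0;
    }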
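And here is a sketch of the mechanics behind the 'show values' and drill-down steps in item 5, again with made-up data: first count the distinct values in one column, then build a narrower result set from one of those values.

    #include <cstdio>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct CustomerRow { std::string name; std::string city; };

    int main() {
        // A result set such as "all customers living in California".
        std::vector<CustomerRow> result = {
            {"Ann", "San Francisco"}, {"Bob", "San Jose"},
            {"Cal", "San Francisco"}, {"Dee", "Sacramento"}};

        // 'Show values': each distinct city and how many rows it has.
        std::unordered_map<std::string, int> counts;
        for (const CustomerRow& r : result) ++counts[r.city];
        for (const auto& entry : counts)
            std::printf("%s: %d\n", entry.first.c_str(), entry.second);

        // Drill down: a new result set holding only one city's rows.
        std::vector<CustomerRow> drilled;
        for (const CustomerRow& r : result)
            if (r.city == "San Francisco") drilled.push_back(r);
        std::printf("%zu rows in San Francisco\n", drilled.size());
        return 0;
    }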
Monday, September 18, 2017
Latest Videos
I have made a ton of changes since I last recorded some videos showing what our Didget Browser can do, so I decided to make some new ones. I will add new links and descriptions as I record them. Here is what I have so far:
Introduction: (4 minutes) http://youtu.be/NgPTYsb4LRQ?hd=1
This video shows you how to create new containers that hold our Didget objects. It shows how to wipe out a container and start again from scratch. It also shows you how to configure the browser to only show certain features and how to pre-populate a new container with various Didgets.
Creating Database Tables: (5 minutes) http://youtu.be/rM1KEVe7TVc?hd=1
This video shows you how to create relational tables using Didgets. It shows how to create a table definition from scratch. It also shows how to create them using Json or CSV files. Once a definition is created, it can be used to create relational tables. Tables can also be constructed by extracting data from a local or remote database using a connector.
Querying our Database Tables: (7 minutes) http://youtu.be/T_Y2R4DA9UI?hd=1
This video shows how to query the tables; how to JOIN tables; and how to save any results out to a completely separate, persistent table that can be later queried just like any other table.
Monday, August 28, 2017
How Good is Good Enough?
I can be a perfectionist in some areas. I am passionate about speed when it comes to computer algorithms. Even if I have spent a few days getting some critical function to be 10x faster than it was before, I will often still stay up late if I know I can squeeze a few more percentage points out of it.
But I am also keen on the 'Lean Startup' idea of an MVP (Minimum Viable Product), where you build something that is just good enough rather than waiting until it is perfect before introducing it to the market. So I struggle with how good a particular feature has to be before I say it is good enough and move on to the next task.
The Didget Management System has a lot of very innovative features that set it apart from other kinds of general-purpose data managers. But speed is its greatest 'Wow!' factor. It can do things thousands of times faster than conventional file systems. It can do many database operations 2x, 3x, or 5x faster than the major relational database management systems. It takes full advantage of multi-core CPUs to make even single operations much faster by breaking them up and running pieces in parallel.
Yet I have found my biggest challenge has been getting people to commit resources (time, money, effort) toward something that has considerable promise if there is any risk involved. I will show them something that takes 10 minutes using their PostgreSQL database and only 2 minutes on my system, and yet have trouble getting them to commit. This is not something trivial that is outside the core function of their business... it is something critical, and still they hesitate. I think this is mainly because it requires change - a step into the unknown.
Everyone knows that risk is the biggest enemy of innovation. All but the most trivial innovations required someone to take a chance and put something on the line to 'make it happen'. Business managers almost always remember some initiative that failed because they tried something out of the mainstream. But they rarely, if ever, know how a passed-up opportunity would have played out to their advantage.
I am keenly aware that in order to convince companies to switch to my system, it has to be a great improvement over their existing solution. When I started this project, I set the bar at twice as good. If I hadn't thought I could build something that was at least twice as fast, twice as convenient, or twice as powerful, I would never have gone far into its development. It has far exceeded my expectations, to the point that I think it will be at least 10x better than anything else.
So I plod ahead with the hope that eventually this platform will attract the attention of those innovators who will take a good look at the risk/reward ratio and decide the reward is just too great to pass up. We have some companies that are taking a look at it, but most have yet to do more than dip their toe in the water.
As I look over my list of tasks yet to be completed, many are 'refinement' tasks that will make features that already work significantly better. Others are features that do not yet work at all and need to be implemented. I have taken the approach of doing a mixture from both groups. Every time I get one or two new things working, I go back and enhance one thing that worked before. At the end of the month, I can then say that the product does things it never did before but also does a handful of things better than ever.
Tuesday, May 2, 2017
Design Principles
As we have designed and implemented the Didget Management System, we have done a good job so far at adhering to these basic principles:
1) No dependencies. We purposely stayed away from using anything that might cause dependency issues down the road. We don't use .NET. We don't use third-party libraries other than the standard C++ libraries. Our browser application uses Qt, but the manager itself does not.
2) Take full advantage of CPU cores/threads. We want our code to run much faster on a CPU with more cores. Large individual operations are often broken up into multiple pieces and run in parallel using separate threads. We are not just running multiple queries simultaneously like many database servers (we do that too); we can run a single query faster when more cores are available (see the sketch after this list).
3) Use thread-safe code. Because we do so many things at the same time, we want to allow multiple operations on the same set of data to safely run in parallel. Data integrity is very important whether running many separate queries simultaneously or when many threads are running different parts of the same query.
4) Be operating system independent. We want this code to run equally well on Linux, Windows, or macOS. All operating system calls are confined to a single 'Kernel' module which is easily ported to other operating systems.
5) Be faster than anything else. Use things like maps, hash tables, and very fast algorithms written in efficient C++ code to do everything. No interpreted code.
6) Re-use code whenever possible. Often the same function can be used to manipulate a dozen or so different kinds of Didgets. When we fine-tune something, it often improves performance in multiple areas.
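Principles 2 and 3 are easiest to see in a sketch. The function below is not our actual query engine, just a minimal illustration of the pattern: one scan over a column is split across the available hardware threads, each thread writes only to its own slot (so the shared data is read-only and no locks are needed), and the partial counts are summed after the join.

    #include <algorithm>
    #include <cstddef>
    #include <string>
    #include <thread>
    #include <vector>

    // Count matching values in one column by splitting the scan across
    // hardware threads. Each thread owns one slot in 'partial', so the
    // only shared state is the read-only column itself.
    size_t parallel_count(const std::vector<std::string>& column,
                          const std::string& needle) {
        unsigned n = std::max(1u, std::thread::hardware_concurrency());
        std::vector<size_t> partial(n, 0);
        std::vector<std::thread> workers;
        size_t chunk = (column.size() + n - 1) / n;
        for (unsigned t = 0; t < n; ++t) {
            size_t begin = t * chunk;
            size_t end = std::min(column.size(), begin + chunk);
            workers.emplace_back([&column, &needle, &partial, t, begin, end] {
                for (size_t i = begin; i < end; ++i)
                    if (column[i] == needle) ++partial[t];
            });
        }
        for (std::thread& w : workers) w.join();
        size_t total = 0;
        for (size_t p : partial) total += p;
        return total;
    }

Code built this way gets faster as cores are added without any change to the calling code, which is the behavior principle 2 describes.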
Progress
I recently left my day job to work on the Didget system full time. My small team has made a number of changes to the project since the last blog post, but progress was slow while so many other things got in the way. I have now been able to greatly accelerate development and testing.
Here is a list of some new features that have been added over the past two years.
1) Added ability to do JOIN operations on our database tables.
2) Added direct connectors to external databases so we can create DB tables by querying those databases directly. In earlier versions, the user had to export the data into CSV files from those databases and then import those files into our system.
3) Added the ability to transform data within database tables. We can now create new columns that are transformations of other columns. For example, we can uppercase, lowercase, substitute, split, combine, convert, truncate, trim, etc. (see the sketch after this list).
4) Added a 'Folder' container type so that the data stream for every Didget is stored in a separate file. This helps with testing and lets us do more 'apples to apples' comparisons with file system operations.
5) Added more complex SQL query operations. We can now combine lots of AND and OR operations like "SELECT * FROM myTable WHERE Name LIKE '%son' AND Address ILIKE '%123%' OR ZipCode < 10000;" We made it very easy to create and save these queries.
6) Tuned a bunch of operations to be faster. SQL queries now execute even faster and often require less data to be read from disk. Many of our database queries are now about twice as fast as MySQL or PostgreSQL when performed on the same data set on the same machine. And again, we often outperform a fully indexed table in those other database systems without needing any indexes at all.
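As a sketch of what a column transformation in item 3 looks like (illustrative code, not our actual engine): the source column is read, a transformed copy is produced, and the source is left untouched so the result can land in an entirely new column.

    #include <algorithm>
    #include <cctype>
    #include <string>
    #include <vector>

    // Derive a new column by uppercasing an existing one. The source
    // column is not modified; the transformed values become a new column.
    std::vector<std::string> uppercase_column(const std::vector<std::string>& src) {
        std::vector<std::string> out;
        out.reserve(src.size());
        for (const std::string& v : src) {
            std::string u = v;
            std::transform(u.begin(), u.end(), u.begin(),
                           [](unsigned char c) { return std::toupper(c); });
            out.push_back(std::move(u));
        }
        return out;
    }

The other transformations (trim, split, substitute, and so on) follow the same read-transform-append pattern over a single column.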
Tuesday, September 22, 2015
2015 Demo Videos on YouTube
Links to latest demo videos.
10-minute video that shows file operations as well as database operations:
https://www.youtube.com/watch?v=2uUvGMUyFhY
Shorter, 5-minute video that shows just the database operations. The latest and fastest code in that area is on display here:
https://youtu.be/0X02xpy8ygc