Lava Database Performance

Due to the Distributed nature of the database, it is possible for the Distributed Client to do a large amount of the processing for a particular client applicati o n, rather than passing on the load to the Primary Server. This division of responsibility results in a huge improvement of overall performance, as each user typically has to wait for only the processing he himself has triggered - rather than wait for 10 other users to complete what they had requested first.

Regardless of the advantages of Distributed SQL technology, it is still essential to have very good performance in the underlying database.

In order to boost the performance of the database and typical applications several unique devices were implemented.

  • Memory Tables
  • Conventionally, databases place all data tables on disk. Quite often, application data applies to a single session, and has no value once the session has terminated. For this purpose, both Distributed Client and the Primary Server provide for memory tables - exactly equivalent in all functional aspects to permanent tables, but stored fully and only in memory. The performance of memory tables is many times faster than the disk equivalent, allowing significant performance increases where these are appropriate.

  • Raw Memory Tables
  • A normal database table will leave a "hole" in the data when a row is deleted. Often this is not a problem, but with some algorithms it is imperative that the data be contiguous - no gaps. Also, in a conventional table it is not possible to place an entry (row) between two closely adjacent rows (say rows 10 and 11). Again, with some algorithms this is exactly what you need to do.

    In order to support these requirements, the Lava database provides a Raw table - similar to a conventional table except for the fact that only current data is listed (deleting a row moves all data subsequent to the deletion up one) and that row insertion is supported (inserting a row between 10 and 11 results in row 11 becoming row 12).

  • Table Sorting
  • Where appropriate, the database supports the sorting of memory tables or raw tables by one or more columns. This means that programmers do not have to waste time creating facilities to sort large data sets - in addition, the built-in sort is the fastest available. Even on a typical workstation it is possible to sort a million rows of data in a few seconds.

  • Stack tables
  • Quite often a programming algorithm requires processing of data in stack-fashion (LIFO, or last-in-first-out). The Lava database provides a special kind of table which behaves exactly like a stack, and supports Push and Pop operations in addition to the conventional Truncate, as well as the same facilities as a Raw table - in other words it is possible to remove or add entries halfway down the stack if required.

  • Table buffering
  • Permanent tables of necessity have slower performance than Memory tables. However, the majority of application data requires permanent storage. For this reason, Lava permanent tables are extensively buffered in memory by a highly intelligent delayed-write mechanism, which provides permanent tables performance which in the case of moderately sized tables (no more than 1,000,000 rows) may be as high as 70% to 80% of the performance of memory tables.

  • High performance Indexing
  • One of the weakest links in conventional database performance is the index functionality. Due to the requirement to store indexes permanently, the speed of index access is typically severely compromised.

    To circumvent this problem, the Lava database makes use of an innovative mechanism which in the case of moderately sized tables (no more than 1,000,000 rows) builds and maintains indexes using memory only. The result is index processing and seeking which is multiple times faster that conventional indexes.

    As indexes are not permanently stored, this required facilities to build indexes very rapidly. A proprietary index construction algorithm allows creation of indexes many times faster than in conventional databases - for example, creating an index on a table of 1 million rows can take 20 minutes or more in MySQL. An exactly equivalent index (on the same data and the same hardware) takes approximately 30 seconds in a Lava database.

  • Automatic Indexing
  • All conventional databases use only the indexes defined by the programmer or administrator to improve performance of seeks and complex select statements. If a particular index would enhance performance of a large select by a factor of 10 or 20, it will only be used if it already exists. This imposes a significant burden of specialist understanding on programmers who may not intuitively grasp the intricacies of SQL optimisation.

    To circumvent this problem, the Lava database has a SQL optimiser which can identify potential indexes for optimisation purposes. Where an index is required on a moderate-size table and does not exist, the SQL planner will simply create it on the fly, then use it to optimise the select. Due to the extremely fast creation time of indexes in a Lava database, the first execution of the statement will typically only be delayed for a few seconds. Subsequent execution of this or related queries is at the maximum speed possible for the table in question.

  • Optimised Row Level processing
  • All current databases include the facility to execute a wide range of SQL commands to interact with the data tables. It is true that the SQL processing in a Lava database is typically much faster than conventional databases, and this obviously helps to improve application performance. However, many algorithms in database applications are easier to write using row-level processing rather than SQL. Complicated processing of accounting data, for example, is easier to concieve and implement at row level than to construct the extremely complex and typically slow SQL select statement required to extract exactly the correct data.

    For this reason the Lava database provides a comprehensive set of row level functions, including Add, Delete, Get, Update, Seek (with various options) and more. Even column-level functions are supported (GetColumn, PutColumn). In addition, these functions are highly optimised, providing results in fractions of a millisecond even on very large data tables.

    Using these facilities it is possible to implement complex data processing algorithms with ease, and debug them incrementally - rather than the complex process of achieving the correct result with an impossibly complex SQL statement. As the programmer typically has a better understanding of the interrelationships between data tables than the SQL planner does, a hand-coded solution to the problem will typically outperfrom a complex SQL statement by a very long way.

  • High performance Memory Manager
  • Although the Windows Memory Manager is an adequate mechanism for the average program, in the case of the Lava database the intense requirement for memory processing (memory tables, memory indexes, table buffering ...) demanded something much more powerful.

    For this reason a dedicated Memory Manager was designed and implemented as part of the Lava Database Kernel. Not only is the memory "leakage" in the proprietary Memory Manager only a small fraction of that in the Windows version, the performance is significantly better. Allocating and releasing 50,000 memory buffers in the Windows Memory Manager takes approximately 1.85 seconds on a typcial workstation. That sounds fast - about 37 microseconds per operation. In comparison, the exact same set takes only 0.2 seconds using the Lava Memory Manager, about 4 microseconds per operation.

  • Full Duplex Communication Protocol
  • Due to the distributed nature of the Lava Database, and the fact that the Distributed Client is a fully functional database in its own right, the protocol between the Primary Server and the Client operates on two independent sockets, with client requests to the server using one socket (which also receives the responses for these requests) while the communication from the server to the client uses the alternate socket.

    A highly optimised state machine performs protocol construction and verification on both ends, while multithreading in both server and client allow each to operate at maximum efficiency with no delays resulting from contention.

  • Optimised Socket Layer and Dispatcher
  • In order to ensure maximum efficiency in communication between the server and client, both use the most advanced socket interface available - I/O Completion Ports. Innovative threaded protocol-driven state machines at both ends result in immeasurably small communication delays. Thread pooling on the dispatcher allows the server to utilize multicore processors to best advantage, auto-configuring the pool at mount time by creating the optimal number of dispatcher threads depending on core count.

  • General optimization
  • The entire database implementation has been fine-tuned and optimised using a proprietary and highly sophisticated run-time profiler, which allows detailed examination of processing delays and algorithmic inefficiencies during extended performance trials. The result of this optimisation and the mechanisms listed above (as well as a number of other smaller mechanisms not listed here) is a database kernel which operates at an extremely high level of efficiency, outstripping the performance of competing databases by multiples - 10 or 20 times the performance of conventional databases is common for the greater majority of data operations in a distributed application environment.