Mini SQL 2.0

Beta

General Information

Introduction

Mini SQL 2.0 is the second generation of the mSQL database system. The first generation product, mSQL 1.x, was designed to provide high speed access to small data sets. The original goal was to perform 100 basic operations per second on an average UNIX workstation with small data sets using very few system resources (i.e. memory and CPU cycles). The original design goal was met and the software has proven to be popular because of this functionality.
During mSQL's life, people have used it for applications far beyond the scope of the original design. These high-end applications, containing up to 1 million rows of data, showed a need for better handling of complex queries and large data sets if the package was to be used in this way. The second generation of the mSQL server has been designed to suit these high-end applications while maintaining the original design goals of mSQL 1. It has been designed to meet three main criteria

Provide comparable performance for simple operations as mSQL 1.x.
Provide rapid access to large databases and complex operations.
Provide more of the functionality outlined in the ANSI SQL specification.

Enhanced Indexing

One of the major flaws of mSQL 1.0 when used for larger applications was the simplistic indexing support. The original server supported only a single primary key per table and the key could consist of only one field. The internal use of the key was restricted to queries using a simple equality condition. As such, the vast majority of queries were processed without the aid of the key.
mSQL 2.0 provides much more sophisticated indexing support. Each table can have multiple indices defined for its data, with each index consisting of one to ten fields. Any index can be designated as a unique or non-unique index. The index information is stored in a series of AVL Tree structures that are mapped into the virtual memory address space of the mSQL server process. The use of AVL Trees in this way ensures that access to key data is extremely fast.
Although the Beta 1 release of mSQL 2.0 includes only the AVL indexing scheme, the database engine itself has been written to support multiple indexing formats. The underlying format of the index can be specified in the SQL command used to create the index. Other indexing schemes are under development and will be made available in subsequent releases of mSQL 2.0.
To aid in the use of the indices during query execution, a layer of abstraction know as the "candidate rows" system has been introduced into the server. The concept of the candidate rows abstraction is that during query processing, the module performing the query requests the next row from the data table that is a candidate for the selection criteria specified in the query. The requesting module is not aware of the mechanisms used to determine how that row was chosen or accessed. The "candidate row" routines are responsible for determining the best access method (based on the conditions specified in the where clause) and for retrieving the data rows as they are requested. This ensures that the optimum access method is used whenever a row of data is accessed without replicating the access logic in each module and without any "special case" algorithms.
Because the candidate row abstraction provides a single logic path for the acquisition of data rows, it can also be used to optimise the query. A simple query optimiser has been included in the candidate row abstraction and the functionality of the optimiser will be enhanced over time.

Data Types

Another of the limiting factors of the performance of mSQL 1.0 was the size to which tables grew. Given an increasing number of rows, the amount of data that needed to be manipulated in memory increased proportionally. Unfortunately, the fixed length field structure of mSQL 1.0 usually forced a lot of white space and field padding to be included in the data.
To overcome this problem, mSQL 2.0 includes support for a variable length char type (text). The text type allows an unrestricted amount of data to be inserted into a field by using an overflow buffer scheme to hold data beyond the specified size of the field. This provides the best of both worlds in that the database designer can specify the average size of a char field ensuring that in most cases, the data will be held in the data table. If a value is inserted that is longer than average, it will be split between that data table and the overflow buffers. This eliminates the need to specify overly large fields (e.g. 255 character) for storage of URLs and filenames.
To provide a more complete SQL environment future releases of mSQL will include more of the "standard" data types defined by the SQL standard. These will include date/time, currency, and various other types that are provided by the larger database systems.

System Variable / Pseudo Fields

The 2.0 engine includes a framework for supporting system variable or pseudo fields. These fields are data elements maintained by the engine itself but are accessed using a normal select call. Some of the data pertains to an entire table, some to a particular row, and some to the current session. The 2.0 engine provides support for the following system variables

Name Description

_rowid An internal value used to identify a row based on its location. The _rowid field can be used in where clauses during updates and deletes to specify a particular row.

_timestamp An internal value indicating when the row was last modified

_sysdate The current time and date on the machine running the database engine returned in standard UNIX time format (e.g. seconds since the epoch)

_user The username associated with the session over which the query was submitted

Name	Description
_rowid	An internal value used to identify a row based on its location. The _rowid field can be used in where clauses during updates and deletes to specify a particular row.
_timestamp	An internal value indicating when the row was last modified
_sysdate	The current time and date on the machine running the database engine returned in standard UNIX time format (e.g. seconds since the epoch)
_user	The username associated with the session over which the query was submitted

Sequences

To overcome the problem of trying to manage sequences in client application code, mSQL 2.0 provides in-build, atomic operations for accessing and managing sequences. A sequence is a numeric counter that is automatically adjusted to the next value in the sequence each time it is accessed. The sequence is created using a version of the SQL CREATE command and can be created with a user defined initial value and also a user defined step value (i.e. the value added to the sequence after each access.
Any table can have a sequence created for it but a table can only contain a single sequence. Once the sequence has been created, accessing the sequence value is achieved using the _seq system variable, that is by using a query such as
select _seq from foo
The current sequence value is returned and the sequence is updated to the next value in the sequence. Access to and modification of the sequence is atomic.

Complex Expressions

Unlike version 1.x of mSQL, 2.0 supports the notion of complex expressions in where clauses (i.e. the use of parenthesis and sub conditions in a condition). This removes a major limitation in mSQL as it was impossible to perform queries like
SELECT name FROM staff 
	WHERE (staff_id < 100 OR staff_id > 200) 
	AND dept = 'finance'
mSQL 2.0 supports the nesting of sub-conditions to any depth and there can be any number of sub-conditions in the where clause.

Regular Expressions

ANSI SQL defines a very simple regular expression language for use in matching data with the LIKE operator. mSQL 1.x implemented a superset of this functionality by providing access to the complete UNIX regular expression syntax in a LIKE clause. Processing a UNIX regular expression is far more "expensive" than processing the simplistic functionality defined in the ANSI specification and as such LIKE based searching was quite slow in 1.x. To improve this, mSQL 2.0 provides a standard SQL LIKE operator and also offers the extended UNIX syntax via the RLIKE (i.e. regexp LIKE) operator. Most queries will require only the simple functionality offered by the standard SQL LIKE but the extended functionality is retained for compatibility. It should be noted that the new LIKE operator is much faster than the full UNIX version offered by 1.x.
One of the main uses of the UNIX regular expression syntax in 1.x was for performing case insensitive searches. In standard SQL, the way to perform case insensitive searches is to use functions like UPCASE() in the query. 2.0 does not yet provide functions and performing function calls on each comparison is going to reduce performance greatly. To overcome this problem, 2.0 offers a non-standard operator known as CLIKE which implements a case-insensitive version of the standard SQL LIKE operator which both solves the problem and provides much better performance.

ORDER BY and DISTINCT

Most "real-world" applications utilise the sorting and DISTINCT functionality of an SQL server when presenting a list of data returned from the database. The implementation of ORDER BY and DISTINCT in mSQL 1.x proved to be a performance bottleneck for serious applications. To overcome this, 2.0 offers a new sorting implementation (based on the quicksort algorithm) and also has a faster DISTINCT implementation.

Client Connections

As the popularity of mSQL for use behind web servers increased it became apparent that the limit of 24 simultaneous client connections was a major problem. 2.0 overcomes this by reconfiguring the server's internal client connection tables at run-time to handle the maximum number of connections possible based on the operating system and the way the kernel is configured. On an average OS, mSQL 2.0 will reconfigur itself to handle over 200 simultaneous client connections.

Run-time Configuration

Configuration details, such as the location of the UNIX and TCP ports etc., was hard-coded into the server and API library in mSQL 1.x. To provide more flexibility, all configuration details are now included in a configuration file that is loaded at run-time by both the server and any client applications that talk to the server. By modifying a value in the config file, all applications will use the new value when they are next executed. All programs also offer a run-time flag to allow the loading of a non-default configuration file to allow for testing of new servers or applications.
To reduce the risk associated with root-owned daemon processes, mSQL 2.0 can be configured to run as any user (via the config file). By default, the server will run as a user called msql once it has started. If the server is run as root it will call setuid() to change to the desired user once it has initialised itself and performed any startup operations.

Lite & W3-mSQL 2.0

mSQL 2.0 is bundled with a couple of new tools to aid in the development of applications. W3-mSQL 2.0, the second generation WWW interface package, is included as a standard tool. The new W3-mSQL code provides a complete scripting language, with full access to the mSQL API, within an HTML tag. This tool can be used to develop sophisticated GUI based applications that are platform independent and available as shared resources on a network. Along with the mSQL API, a library of nearly 60 other functions, including file I/O, strings handling and date/time manipulation are available to the scripts within a W3-mSQL enhanced web page.
To solve another problem associated with delivering "real" applications over the web, W3-mSQL provides an enhanced and flexible authentication system. Any page that is accessed via W3-mSQL is subjected to the new W3-auth access scrutiny. Access can be restricted via a combination of username/passwd and requesting host. Configuration of the security system, including management of user groups, definition of secure areas, and creation of authorised users, is via a graphical interface accessed via a web page.
Access to mSQL from scripting languages has become popular and virtually all major scripting languages provide an interface to the original mSQL server. Support for script based access to mSQL becomes standard in mSQL 2.0 with the inclusion of its own scripting language. The language, called Lite, is a stand-alone version of the language provided by W3-mSQL (i.e. the language that W3-mSQL offers inside the special HTML tags is Lite) and includes access to the mSQL API and the other functions mentioned above. Lite, as it's name implies, is a lightweight language yet provides a powerful and flexible programming environment. The syntax of the language will be very familiar to C programmers (and ESL programmers) and provides shell-like ease of use. A future release of Lite will include support for ASCII forms to provide a rapid development environment for non-graphical mSQL-based applications.

Other tools

mSQL 2.0 is bundled with a couple of new tools. To aid migration of data to and from mSQL, two new utilities have been added to the distribution for the 2.0 release. msqlimport and msqlexport provide a mechanism for the import and export of data as formatted text files. Migrating data from other databases into mSQL 2.0 will require just a simple export from the source database and a subsequent import into the new mSQL database. The tools have been developed to be flexible enough to support virtually any text based formatting of the exported data.
Other familiar tools have been modified to reflect the functionality of mSQL 2.0. relshow can provide detailed information about all the table structure elements, including the indices defined on tables. msqladmin has been modified to provide statistical information on a per connection basis (so you can monitor who's doing what and when).

2.0 features not in Beta 1

A multi-process server is currently under design and will appear in a later Beta of 2.0. This will allow multiple queries to be processed at the same time (with the appropriate locking etc).
The framework for value functions (such as UPCASE etc.) and aggregate functions (such as COUNT etc.) has been written but support for functions has not been completed.
Value expressions (such as 'UPDATE foo SET baa = baa + 1') are not in the Beta 1 release.
Enhanced security and access control has not yet been added to the 2.0 code base but will be in a later release.