The data processing efficiency provided by APSCCS is a result of discrete data storage and precision retrieval. The processing of data within a computing system may be efficiently or inefficiently slow. An efficiently slow case involves a large dataset where most or all of the processed data are relevant to generating the result. An example is when an enormous number of requests processed by a server leads to slow responses for end users. An inefficiently slow case occurs when a large dataset is processed as overhead instead of contributing to the result. A large database table that is scanned to retrieve or update a significantly smaller set of records is an example.
The APSCCS interface provides a means to avoid blind scanning and to facilitate retrieval of data subsets with precision. These features contribute toward eliminating efficient, inefficient and any other kind of slow data processing. The data result of an APSCCS compression is guaranteed to be secure. Therefore, data transmissions between endpoints should always be preceded by an APSCCS compression. The data may require filtering at a server to avoid redundant transmissions. The filtering in RAM would apply only to security key files. The precise location of relevant data in a filtering operation depends on the data case. The procedure for structured data, as in a database case, will be explained.
|
Besides, data requests triggered by users of a client system should be sent in chunks aligned with multiples of an output device capacity. Such alignment should account for extra data that prevents a delay due to server response time. This may be achieved by tracking a buffer offset during any client request for output. The client request for data in such a case should occur as a background operation. Furthermore, the importance of minimal disk access to performance cannot be overlooked. The first paragraph under Data Streaming explains factors that contribute to optimal performance of disk access. An optimal chunk size for database-related operations is one that can be processed within a split second but still spans a large amount of data. An APSCCS compression of large data is an important step toward data migration. This step prepares data for fast retrieval in chunk form.
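The following minimal sketch illustrates the offset tracking described above; the device capacity, the prefetch amount and the fetch_chunk call are assumptions rather than part of APSCCS.

    # Minimal sketch: a client tracks a buffer offset so each background request stays
    # aligned with multiples of the output device capacity, with one extra chunk
    # requested ahead of time to mask server response time.
    DEVICE_CAPACITY = 64 * 1024          # assumed output device capacity in bytes
    PREFETCH_CHUNKS = 1                  # extra data requested to hide server latency

    class ChunkedReader:
        def __init__(self, fetch_chunk):
            self.fetch_chunk = fetch_chunk   # hypothetical callable(offset, size) -> bytes
            self.buffer = bytearray()
            self.offset = 0                  # absolute offset already requested

        def ensure(self, needed):
            # Request ahead in device-aligned chunks, as a background-style step.
            target = needed + PREFETCH_CHUNKS * DEVICE_CAPACITY
            while len(self.buffer) < target:
                chunk = self.fetch_chunk(self.offset, DEVICE_CAPACITY)
                if not chunk:
                    break
                self.buffer += chunk
                self.offset += len(chunk)

        def read(self, size):
            self.ensure(size)
            out, self.buffer = self.buffer[:size], self.buffer[size:]
            return bytes(out)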
|
The mechanism that APSCCS provides to facilitate data processing will be illustrated through a database management system scenario. The specified technique can be applied
directly or customized for any environment that reads and writes data (in memory and on disk). The goal of this technique is to minimize searching blindly for data and to
identify relevant memory locations (or equivalents). The combination of avoiding a blind search, identifying memory locations and limiting processed data to an optimum
threshold contributes to efficient performance. The row, column and table of a database are logical units that do not reflect internal processing details.
A database query promotes this disconnection from internal processing by using the logical units in a statement. The APSCCS interface creates an effective bridge between a
query and actual data processing. This bridge may be understood by considering the storage mechanism in a sample database system architecture. The database system may define
a hierarchy of logical structures called Tablespace, Segment, Extent and Page (in descending order). The physical datafile stored on disk is represented by an extent, which
contains contiguous pages. A page may contain full or partial rows of a table (depending on row size).
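The hierarchy can be pictured with the following illustrative sketch; the structure names follow the sample architecture above, while the fields are assumptions.

    # Illustrative sketch of the logical storage hierarchy (Tablespace > Segment >
    # Extent > Page) described for the sample database system; fields are assumptions.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Page:
        rows: List[bytes] = field(default_factory=list)       # full or partial rows

    @dataclass
    class Extent:
        datafile_path: str                                     # physical datafile on disk
        pages: List[Page] = field(default_factory=list)        # contiguous pages

    @dataclass
    class Segment:
        extents: List[Extent] = field(default_factory=list)

    @dataclass
    class Tablespace:
        name: str
        segments: List[Segment] = field(default_factory=list)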
|
Therefore, the system may scan, load or process multiple datafiles to execute a query. The bridge that APSCCS can provide begins with compressing contiguous datafiles. The datafiles would first be combined into a single file that aligns with a logical scope such as a tablespace. An environment would extract all rows for such a scope. The APSCCS interface compresses a file of up to 2GB in size to 80 bytes. A chunk size less than 2GB may be supplied as a parameter to the interface. This compression creates a security key file of the same size as the supplied parameter. Therefore, a compression of combined datafiles may produce file divisions.
The compression of combined datafiles can be expedited through parallel processing, since APSCCS can create an arbitrary number of chunks from multiple positions within a source file in any order. The 80-byte compressed files share a common base name with numerical suffixes from 1 to the number of divisions. The security key files follow the same pattern with a different primary name from the 80-byte files. The APSCCS process of creating file divisions is fast since it involves machine-level operations devoid of data abstractions. Besides, creating such file divisions without APSCCS would be relatively awkward to implement and would give rise to vulnerable files.
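A minimal sketch of this step follows; apsccs_compress is a placeholder for the actual interface call, and the chunk size and file names are assumptions.

    # Minimal sketch of dividing a combined datafile into chunks and compressing each
    # division in parallel. Because APSCCS can read chunks from any position in any
    # order, divisions are independent and can be processed concurrently.
    from concurrent.futures import ProcessPoolExecutor
    import os

    CHUNK_SIZE = 512 * 1024 * 1024       # assumed chunk size below the 2GB maximum

    def apsccs_compress(chunk):
        # Placeholder: the real APSCCS interface returns an 80-byte compressed
        # result and a security key of the chunk-size parameter.
        raise NotImplementedError("bind to the APSCCS interface here")

    def compress_division(source_path, index):
        with open(source_path, "rb") as f:
            f.seek(index * CHUNK_SIZE)
            chunk = f.read(CHUNK_SIZE)
        compressed, key = apsccs_compress(chunk)
        with open(f"tablespace.apsccs.{index + 1}", "wb") as out:
            out.write(compressed)                             # 80-byte division
        with open(f"tablespace.key.{index + 1}", "wb") as out:
            out.write(key)                                    # security key division
        return index + 1

    def compress_combined(source_path):
        divisions = -(-os.path.getsize(source_path) // CHUNK_SIZE)   # ceiling division
        with ProcessPoolExecutor() as pool:
            return list(pool.map(compress_division, [source_path] * divisions, range(divisions)))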
|
|
The APSCCS bridge ultimately leads to acquiring a pair of 80-byte and security key files with the same suffix number for decompression. This retrieval of files with a specific suffix number avoids the slow scanning for rows that can occur with large data. This process is followed by retrieving specific rows from a set of decompressed datafiles. These final steps are made possible by another crucial part of the bridge provided by APSCCS.
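As a minimal sketch, retrieving and decompressing a matching pair by suffix could look as follows; apsccs_decompress is a placeholder for the actual interface call and the file names are assumptions.

    # Minimal sketch of loading a matching pair of 80-byte and security key file
    # divisions by suffix number and decompressing them into raw datafile bytes.
    def apsccs_decompress(compressed, key):
        # Placeholder for the actual APSCCS decompression call.
        raise NotImplementedError("bind to the APSCCS interface here")

    def load_division(suffix):
        with open(f"tablespace.apsccs.{suffix}", "rb") as f:
            compressed = f.read()                 # 80-byte compressed division
        with open(f"tablespace.key.{suffix}", "rb") as f:
            key = f.read()                        # matching security key file
        return apsccs_decompress(compressed, key)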
|
The retrieval of specific rows is made possible by what may be called column reduction. A column reducer is an efficient replacement for the conventional database index. The column reducer comprises three parts of data. The first part contains substrings of a fixed size derived from original column values. A set of row identifiers that align with each column substring makes up the second part. The third part is a set of numerical file positions derived from an APSCCS compression of datafiles. These three parts are aligned contiguously in a vertical sequence of first, second and then third part.
This differs from the sorted set of concatenated column values and row identifiers of a conventional index. Also, unlike an index, there are no column combinations possible with a reducer. An increase in the number of column reducers has no impact on query performance. Only the storage space on disk is affected by an increase. The column reducer is meant to be a mandatory component of query execution rather than an option. Besides, a client application is best suited to limiting the number of column reducers needed within a defined scope of operation.
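An illustrative sketch of this layout follows; the 8-byte substring size matches the later guideline, while the integer encoding and the padding character are assumptions.

    # Illustrative sketch of a column reducer serialized as three contiguous parts:
    # fixed-size column substrings, then row identifiers, then numerical file
    # positions derived from the APSCCS compression of datafiles.
    import struct

    SUBSTRING_SIZE = 8                    # fixed, 8-byte aligned substring size

    def build_column_reducer(substrings, row_ids, file_positions):
        assert len(substrings) == len(row_ids) == len(file_positions)
        part1 = b"".join(s[:SUBSTRING_SIZE].ljust(SUBSTRING_SIZE, b"~")
                         for s in substrings)                           # column substrings
        part2 = b"".join(struct.pack("<Q", r) for r in row_ids)         # row identifiers
        part3 = b"".join(struct.pack("<Q", p) for p in file_positions)  # file positions
        return part1 + part2 + part3      # vertical sequence: first, second, third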
|
The following guideline leans toward migrating an existing database, but applies features relevant to any case. A database system may create column reducers after compressing combined datafiles through APSCCS. The file divisions created by APSCCS from datafiles become source files for generating column reducers. The files are read in sequence to retrieve linked column substrings, row identifiers and numerical suffixes. The substrings should be a minimum of 8 bytes to facilitate 64-bit memory alignment. The size of 8 bytes is optimal for any kind of data. However, any substring that needs to exceed 8 bytes should be a multiple of 8 for optimum performance.
Also, column values less than 8 bytes should be padded with a distinguishing character. The column substrings can be extracted from a decompressed pair of matching source file divisions and then sorted in memory before saving to disk. This operation would perform efficiently in parallel processing due to the APSCCS advantage of decompressing file divisions in arbitrary order. The parallel tasks could each be assigned a sequential set of source file divisions according to environment capacity. This sorting of columns per file division provides a sorted base that makes the entire sorting process more efficient.
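A minimal sketch of this per-division extraction and sort follows; extract_rows and save_sorted stand in for environment-specific steps, load_division is the pair decompression sketched earlier, and the padding character is an assumption.

    # Minimal sketch of extracting 8-byte column substrings per file division,
    # padding short values with a distinguishing character and sorting in memory.
    # Divisions decompress in arbitrary order, so they can be sorted in parallel.
    from concurrent.futures import ProcessPoolExecutor

    SUBSTRING_SIZE = 8
    PAD = b"~"                                   # assumed distinguishing pad character

    def to_substring(column_value):
        return column_value[:SUBSTRING_SIZE].ljust(SUBSTRING_SIZE, PAD)

    def sort_division(suffix):
        rows = extract_rows(load_division(suffix))       # hypothetical: [(value, row_id)]
        triples = sorted((to_substring(v), rid, suffix) for v, rid in rows)
        save_sorted(suffix, triples)                     # hypothetical disk write
        return suffix

    def sort_all(division_suffixes):
        with ProcessPoolExecutor() as pool:
            return list(pool.map(sort_division, division_suffixes))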
|
The sorting process would insert a column substring at its correct position. This may be achieved by shifting data from the target position downward (copying to extra memory). The row identifiers and chunk numbers are inserted at similar positions relative to their first values in a file, to keep them aligned with the column substrings. The process should also consider RAM space by saving sorted data to disk and loading partial data from disk iteratively. The 8-byte aligned substrings make this implementation straightforward. A column substring yet to be retrieved may belong anywhere within a sorted set.
Therefore, column reducers cannot be created until all column substrings have been retrieved. The APSCCS interface would compress a finalized column file to produce 80-byte and security key files in the same manner described for datafiles. This process can also be expedited through parallel processing. The column reducer provides an immediate bridge to a database query.
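The aligned insertion can be pictured with the following minimal sketch, where in-memory lists stand in for the iterative load and save of partial data described above.

    # Minimal sketch of inserting a substring at its sorted position by shifting the
    # tail downward, with the row identifier and chunk number inserted at the same
    # relative position so the three parts of the reducer stay aligned.
    import bisect

    def insert_aligned(substrings, row_ids, chunk_nums, sub, rid, chunk):
        pos = bisect.bisect_left(substrings, sub)    # target position in the sorted part
        substrings.insert(pos, sub)                  # shifts later entries downward
        row_ids.insert(pos, rid)                     # same relative position
        chunk_nums.insert(pos, chunk)
        return pos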
|
|
A data map is necessary to align numerical file suffixes of a finalized column file with column substrings. This map creation should be aligned with the substring extraction process. First, a chunk size is chosen for each file division in an APSCCS compression. The selected chunk size should be a multiple of 8. The default and maximum size is 2GB if no chunk size parameter is passed to APSCCS. The map contains the substrings that coincide with multiples of the chunk size and their corresponding numerical order. This step identifies the source file divisions. The derived map then becomes equipped to align two values with each other. These are the numerical position of a source file division and the corresponding last column substring.
The APSCCS compression of a finalized column file would now be aligned with this derived map. The numerical order of source file divisions and numerical suffixes of APSCCS divisions become aligned with column substrings in a map. A database query would apply the map, APSCCS compressed datafiles and column reducers in the following manner. A typical query may contain one column value associated with a conditional clause meant to limit the range of valid results. A derived map can be loaded to RAM and scanned to match the column value of a query. This matching takes advantage of the alphabetical order of column values in a map. Therefore, a map would be scanned to identify the first column value greater than what is stated in a query.
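A minimal sketch of the map and its lookup follows; the sorted-list representation and the helper names are assumptions.

    # Minimal sketch of a data map pairing each file division's numerical order with
    # its last column substring, and a lookup that scans for the first entry at or
    # after the queried column value.
    import bisect

    def build_map(last_substrings):
        # last_substrings[i] is the last column substring of division i + 1.
        return [(sub, i + 1) for i, sub in enumerate(last_substrings)]

    def find_division(data_map, query_value):
        keys = [sub for sub, _ in data_map]
        pos = bisect.bisect_left(keys, query_value)   # first entry >= query value
        return data_map[pos][1] if pos < len(data_map) else None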
|
The corresponding numerical order of this column value now points to a specific APSCCS column reducer. A decompression of matching 80-byte and security key file divisions for the column
reducer would reveal sorted column values that satisfy a conditional clause. These column values are followed by matching row identifiers and numerical file suffixes. The numerical suffixes
point to file divisions created by an APSCCS compression of combined datafiles.
|
|
A decompression of matching 80-byte and security key file divisions for a corresponding numerical suffix provides the target rows of a query. However, such rows may be a subset of the entire set of rows related to a query. Therefore, retrieval of multiple column reducers may be necessary. This scenario would benefit from the parallel processing that APSCCS facilitates. A conditional clause would determine the sets of column reducers that are relevant to a query. These column reducers may represent either a single file division or consecutive file divisions.
Any insertion to a chunk that has reached its size limit may be viewed as requiring adjustment to subsequent chunks. However, the APSCCS process avoids cascading insertions to contiguous chunks. Instead, an insertion to a chunk that has reached its size limit creates an adjacent file whose name appends A1 to the preceding file name. This feature will be available in the next version of APSCCS. The quantity of attached A's depicts a corresponding number of new chunks relative to an original chunk file. The optimal chunk sizes selected for column reducers and compressed datafiles would span large data that can be processed within a split second. This strategy minimizes the seek time to retrieve data from disk because of the relatively smaller number of file divisions.
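A minimal sketch of the overall query path follows; load_reducer, load_datafile_division and rows_from are hypothetical helpers, and find_division refers to the map sketch above.

    # Minimal sketch of the end-to-end query path: the map selects a column reducer,
    # the reducer yields row identifiers and datafile suffixes that satisfy the
    # conditional clause, and the matching datafile divisions are decompressed in
    # parallel to return the target rows.
    from concurrent.futures import ProcessPoolExecutor

    def execute_query(data_map, query_value):
        division = find_division(data_map, query_value)           # map lookup
        substrings, row_ids, suffixes = load_reducer(division)    # three reducer parts
        matches = [(rid, sfx) for sub, rid, sfx in zip(substrings, row_ids, suffixes)
                   if sub.startswith(query_value)]                # conditional clause
        needed = sorted({sfx for _, sfx in matches})
        with ProcessPoolExecutor() as pool:                       # parallel decompression
            datafiles = dict(zip(needed, pool.map(load_datafile_division, needed)))
        wanted = {rid for rid, _ in matches}
        return [row for sfx in needed for rid, row in rows_from(datafiles[sfx])
                if rid in wanted]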
|
The column reducer facilitates efficient reading from a database of any size. The other database operations rely on reading to acquire precise locations for changing, deleting or inserting data. This creates an opportunity for a process that could be named Focused on Read Operations, or FRO. A background operation that caters to data updates, insertions and deletions would be introduced. The data reading implied by any query is sufficient to fulfill all read and write requests. A sample request to update data in a source would involve a query that either specifies replacement data or includes a read operation for such data.
A foreground operation would then relinquish source updates to the background operation. The response to this request need not wait for a source update to complete. Instead, data returned in a request would apply replacement data specified in a query or the results of an included (perhaps implied) read operation. A data structure that contains elements retrieved by parsing a query would be necessary. The elements of this data structure should align with table columns, non-read operations and column values specified in a query. The foreground operation would create a set of current instances of this data structure from extracted elements belonging to each query or request. It would then mark each instance as ready for processing.
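A minimal sketch of such a data structure and the foreground handoff follows; the field names and the queue are assumptions.

    # Minimal sketch of an FRO-style request record created by the foreground
    # operation from a parsed query and marked ready for the background operation.
    from dataclasses import dataclass
    from queue import Queue
    from typing import Dict, List

    @dataclass
    class ParsedRequest:
        table: str
        columns: List[str]                  # table columns named in the query
        operation: str                      # non-read operation: UPDATE, INSERT or DELETE
        values: Dict[str, str]              # column values specified in the query
        ready: bool = False                 # marked by the foreground operation

    pending: "Queue[ParsedRequest]" = Queue()

    def foreground_accept(parsed):
        # Respond immediately with the data implied by the query, then relinquish
        # the source update to the background operation.
        parsed.ready = True
        pending.put(parsed)
        return parsed.values                # immediate feedback to the client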
|
The background operation first reads every readiness marking before processing the corresponding data structures. The background operation should process a discrete set of requests at a time. The foreground operation should process one request at a time. Also, concurrent operations that warrant synchronization through a lock or other blocking mechanisms can be avoided. The concurrent reading of a source by multiple requests needs no synchronization. The concurrent writing or deleting by multiple requests can be managed effectively as follows. All datasource changes should be initiated by specific users.
The client application should define a column that identifies users. This column would restrict table changes to the data context that belongs to each user. A master context may be necessary to manage multiple or all users. The maximum number of concurrent requests in such a case is 2. The client application would determine an order of precedence for master and normal requests. Any resource that is common to multiple users should be reflected as a quantity to each user. The constituent data for such a resource should be changed only through a background operation. A count update may be reflected to each user in the following manner.
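A minimal sketch of the background pass follows; the batch size, the user_id column name and the apply_to_source call are assumptions.

    # Minimal sketch of the background operation draining a discrete batch of ready
    # requests and grouping them by the user-identifying column, so each change stays
    # within its own data context and needs no lock or other blocking mechanism.
    BATCH_SIZE = 64                          # assumed size of a discrete set of requests

    def background_pass(pending, apply_to_source):
        batch = []
        while len(batch) < BATCH_SIZE and not pending.empty():
            request = pending.get()
            if request.ready:                # read the marking before processing
                batch.append(request)
        by_user = {}
        for request in batch:
            by_user.setdefault(request.values.get("user_id"), []).append(request)
        for user, requests in by_user.items():
            for request in requests:         # one user's data context at a time
                apply_to_source(request)     # hypothetical datasource write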
|
A discrete number of requests could be parsed to identify references to the common resource. The count update provided to users is current within the limits of existing requests and queries. The parsing of queries to fulfill user updates provides an advantage in case multiple users need to share a common credential. The benefit of discrete data access still applies to users of a common credential. The in-memory query parsing enhances this benefit. The parsing of queries prevents performance issues associated with engaging a large datasource in real time. However, the processing of enormous requests in memory can strain resources.
The APSCCS interface provides an exclusive solution for handling enormous requests. A set of requests that exceeds an arbitrary memory size may be saved as a file that is then compressed through APSCCS. The compressed files can be decompressed as chunks to replenish completed requests in memory. The FRO approach satisfies data integrity in a unique manner. A database engine may now delay but guarantee that requested changes to a datasource take place. Meanwhile, the changes expected by a client are accurately reflected in real time.
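A minimal sketch of that spill-and-replenish cycle follows; apsccs_compress_file and apsccs_decompress_chunk are placeholders for the actual interface, and the memory budget, chunk size and record size are assumptions.

    # Minimal sketch of spilling requests that exceed a memory budget to a file of
    # fixed-size records, compressing that file through APSCCS and later decompressing
    # a single chunk to replenish completed requests in memory.
    import json

    MEMORY_BUDGET = 10_000                   # assumed maximum in-memory requests
    RECORD_SIZE = 1024                       # assumed fixed record size per request
    SPILL_CHUNK = 64 * 1024 * 1024           # assumed chunk size (multiple of RECORD_SIZE)

    def spill_excess(requests, spill_path):
        if len(requests) <= MEMORY_BUDGET:
            return requests
        keep, excess = requests[:MEMORY_BUDGET], requests[MEMORY_BUDGET:]
        with open(spill_path, "wb") as f:
            for request in excess:           # requests assumed JSON-serializable
                f.write(json.dumps(request).encode().ljust(RECORD_SIZE, b" "))
        apsccs_compress_file(spill_path, chunk_size=SPILL_CHUNK)     # placeholder call
        return keep

    def replenish(spill_path, chunk_index):
        raw = apsccs_decompress_chunk(spill_path, chunk_index)       # placeholder call
        return [json.loads(raw[i:i + RECORD_SIZE])
                for i in range(0, len(raw), RECORD_SIZE)]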
|
The data extracted by a database engine through FRO provides a client with sufficient feedback. The new background operation of a database engine may now manage changes in a simple manner. All insertions and deletions can take place in any order, while updates occur chronologically. Therefore, data integrity is achieved without the negative impact that immediate datasource feedback imposes on performance.
In conclusion, a consistently efficient query execution arises from the following factors:
- Processing of sorted columns in memory
- Retrieval of data subsets with precision
- Optimal seek time on hard disk
- An FRO strategy involving a foreground operation, background operation and query parsing that provides immediate feedback to each client
|