General Notes
-------------
Access (Jet) does not in general initialize pages to zero before writing them,
so the file will contain a lot of uninitialized data. This makes the task of
figuring out the format a bit more difficult than it otherwise would be.
This document will, generally speaking, provide all offsets and constants in
hex format.
Most multibyte pointers and integers are stored in little endian (LSB-MSB) order.
There is an exception in the case of indexes, see the section on index pages for
details.
Terminology
-----------
This section contains a mix of information about data structures used in the MDB
file format along with general database terminology needed to explain these
structures.
Page - A fixed size region within the file on a 2 or 4K boundary. All
data in the file exists inside pages.
Catalog Table - Tables in Access generally starting with "MSys". See the TDEF
(table definition) pages for the "System Table" field.
Catalog Entry - A row from the MSysObjects table describing another database
object. The MSysObjects table definition page is always at
page 2 of the database, and a phony tdef structure is
bootstrapped to initially read the database.
Page Split - A process in which a row is added to a page with no space left.
A second page is allocated and rows on the original page are
split between the two pages and then indexes are updated. Pages
can use a variety of algorithms for splitting the rows, the
most popular being a 50/50 split in which rows are divided
evenly between pages.
Overflow Page - Instead of doing a full page split with associated index writes,
a pointer to an "overflow" page can be stored at the original
row's location. Compacting a database would normally rewrite
overflow pages back into regular pages.
Leaf Page - The lowest page on an index tree. In Access, leaf pages are of
a different type than other index pages.
UCS-2 - A two-byte Unicode encoding used in Jet4 files.
Covered Query - a query that can be satisfied by reading only index pages. For
instance if the query
"SELECT count(*) from Table1 where Column3 = 4" were run and
Column3 was indexed, the query could be satisfied by reading
only indexes. Because of the way Access hashes text columns
in indexes, covered queries on text columns are not possible.
Pages
-----
At its topmost level, MDB files are organized into a series of fixed-size
pages. These are 2K in size for Jet3 (Access 97) and 4K for Jet4 (Access
2000/2002). All data in MDB files exists within pages, of which there are
a number of types.
The first byte of each page identifies the page type as follows.
0x00 Database definition page. (Always page 0)
0x01 Data page
0x02 Table definition
0x03 Intermediate Index pages
0x04 Leaf Index pages
0x05 Page Usage Bitmaps (extended page usage)
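As a minimal sketch of the list above, the first byte of a page can be mapped to
a readable name. The names PAGE_TYPES and page_type_name are hypothetical and
exist only for this illustration.

```python
# Page type codes copied from the list above.
PAGE_TYPES = {
    0x00: "database definition",
    0x01: "data",
    0x02: "table definition",
    0x03: "intermediate index",
    0x04: "leaf index",
    0x05: "page usage bitmap",
}


def page_type_name(page: bytes) -> str:
    """Return a readable name for a page, keyed on its first byte."""
    return PAGE_TYPES.get(page[0], "unknown (0x%02x)" % page[0])
```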
Database Definition Page
------------------------
Each MDB database has a single definition page located at the beginning of the file.
Not a lot is known about this page, and it is one of the least documented page
types. However, it contains things like Jet version, encryption keys, and name
of the creating program.
Offset 0x14 contains the Jet version of this database: 0x00 for Jet3, 0x01 for Jet4.
This is used by the mdb-ver utility to determine the Jet version.
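The version check can be sketched as follows. This is not the mdb-ver source,
only an illustration of reading the byte at offset 0x14; the function name and
the PAGE_SIZES table (taken from the Pages section) are assumptions of this
example.

```python
JET_VERSION_OFFSET = 0x14

# Page sizes per Jet version, as given in the Pages section.
PAGE_SIZES = {3: 2048, 4: 4096}


def jet_version(first_page: bytes) -> int:
    """Return 3 or 4 based on the byte at offset 0x14 of page 0."""
    code = first_page[JET_VERSION_OFFSET]
    if code == 0x00:
        return 3
    if code == 0x01:
        return 4
    raise ValueError("unrecognized Jet version byte 0x%02x" % code)
```

Only the first 0x15 bytes of the file are needed, so the caller can read just
that much before deciding which page size to use for the rest of the file.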
Data Pages
----------
All data rows are stored in type 0x01 pages.
The header of a Jet3 data page looks like this:
+--------------------------------------------------------------------------+
| Jet3 Data Page Definition |
+------+---------+---------------------------------------------------------+
| data | length | name | description |
+------+---------+---------------------------------------------------------+
| 0x01 | 1 byte | page_type | 0x01 indicates a data page. |
| 0x01 | 1 byte | unknown | |
| ???? | 2 bytes | free_space | Free space in this page |
| ???? | 4 bytes | tdef_pg | Page pointer to table definition |
| ???? | 2 bytes | num_rows | number of records on this page |
+------+---------+---------------------------------------------------------+
| Iterate for the number of records |
+--------------------------------------------------------------------------+
| ???? | 2 bytes | offset_row | The record's location on this page |
+--------------------------------------------------------------------------+
In Jet4, an additional four byte field was added. Its purpose is currently
unknown.
+--------------------------------------------------------------------------+
| Jet4 Data Page Definition |
+------+---------+---------------------------------------------------------+
| data | length | name | description |
+------+---------+---------------------------------------------------------+
| 0x01 | 1 byte | page_type | 0x01 indicates a data page. |
| 0x01 | 1 byte | unknown | |
| ???? | 2 bytes | free_space | Free space in this page |
| ???? | 4 bytes | tdef_pg | Page pointer to table definition |
| ???? | 4 bytes | unknown | Unknown |
| ???? | 2 bytes | num_rows | number of records on this page |
+------+---------+---------------------------------------------------------+
| Iterate for the number of records |
+--------------------------------------------------------------------------+
| ???? | 2 bytes | offset_row | The record's location on this page |
+--------------------------------------------------------------------------+
Notes for offset_row:
- Offsets that have 0x40 in the high order byte point to a location within
the page where a Data Pointer (4 bytes) to another data page (also known as
an overflow page) is stored.
- Offsets that have 0x80 in the high order byte are deleted rows.
(These flags are delflag and lookupflag in source code)
Rows are stored from the end of the page to the top of the page. So, the first
row stored runs from bytes offset_row to page_size - 1. The next row runs from
its offset to the previous row's offset, and so on.
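The header layout and the row boundary rule can be sketched together. Several
things here are assumptions rather than facts from this document: num_rows is
read as a 2-byte field (which matches the 10-byte header used by the LVAL offset
arithmetic later in this document), the low 14 bits of offset_row are treated as
the actual offset, and the names parse_data_page, DELETED_FLAG, OVERFLOW_FLAG and
OFFSET_MASK are hypothetical.

```python
import struct

DELETED_FLAG = 0x8000   # 0x80 in the high order byte: deleted row
OVERFLOW_FLAG = 0x4000  # 0x40 in the high order byte: pointer to another page
OFFSET_MASK = 0x3FFF    # assumption: the remaining bits hold the offset


def parse_data_page(page: bytes, jet_version: int) -> dict:
    """Split a type 0x01 page into header fields and row byte ranges."""
    if page[0] != 0x01:
        raise ValueError("not a data page")
    free_space, tdef_pg = struct.unpack_from("<HI", page, 2)
    # Jet4 inserts one extra 4-byte field before the row count.
    count_pos = 8 if jet_version == 3 else 12
    (num_rows,) = struct.unpack_from("<H", page, count_pos)
    entries = struct.unpack_from("<%dH" % num_rows, page, count_pos + 2)
    rows = []
    end = len(page)  # the first row runs up to page_size - 1
    for entry in entries:
        start = entry & OFFSET_MASK
        rows.append({
            "start": start,
            "end": end,  # exclusive
            "deleted": bool(entry & DELETED_FLAG),
            "overflow": bool(entry & OVERFLOW_FLAG),
        })
        end = start  # the next row ends where this one begins
    return {"free_space": free_space, "tdef_pg": tdef_pg, "rows": rows}
```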
Decoding a row requires knowing the number and types of columns from its TDEF
page. Decoding is handled by the routine mdb_crack_row().
The Jet3 row format is:
+--------------------------------------------------------------------------+
| Jet3 Row Definition |
+------+---------+---------------------------------------------------------+
| data | length | name | description |
+------+---------+---------------------------------------------------------+
| ???? | 1 byte | num_cols | Number of columns stored on this row. |
| ???? | n bytes | fixed_cols | Fixed length columns |
| ???? | n bytes | var_cols | Variable length columns |
| ???? | 1 byte | eod | length of data from beginning of record |
| ???? | n bytes | var_table[]| offset from start of row for each var_col |
| ???? | 1 byte | var_len | number of variable length columns |
| ???? | n bytes | jump_table | Jump table (see description below) |
| ???? | n bytes | null_mask | Null indicator. size is 1 byte per 8 cols |
| | | | 0 indicates a null value. Also used to |
| | | | represent value of boolean type columns |
+--------------------------------------------------------------------------+
Notes:
. A row will always have the number of fixed columns as specified in the table
definition, but may have fewer variable columns, as rows are not updated when
columns are added.
. All fixed length columns are stored first to last, followed by variable length
columns.
. The size of the null_mask is computed by (num_cols - 1)/8 + 1 (integer division)
. Fixed columns can be null (unlike some other databases).
. The var_len field indicates the size of the var_table[].
. The eod field points at the last byte of the var_cols field. It is used to
determine where the last var_col ends.
. For boolean fixed columns, the values are in null_mask: 0 indicates a false
value, 1 indicates a true value.
. A 0xFF stored in the var_table indicates that this column has been deleted.
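The null mask rules in the notes above can be sketched as two small helpers. The
bit ordering is an assumption (column 0 is taken to be the least significant bit
of the first byte); this document does not specify it. The function names are
hypothetical.

```python
def null_mask_size(num_cols: int) -> int:
    """Size of the null mask in bytes: one byte per 8 columns."""
    return (num_cols - 1) // 8 + 1


def is_null(null_mask: bytes, col: int) -> bool:
    """True when the mask bit for this column is 0.

    Assumption: column 0 maps to the least significant bit of byte 0.
    For boolean columns the same bit carries the value instead
    (0 = false, 1 = true).
    """
    bit = (null_mask[col // 8] >> (col % 8)) & 1
    return bit == 0
```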
In Jet3, offsets are stored as 1 byte fields, so a stored offset can address at
most 256 bytes. To get around this, offsets are computed using a jump table. The
jump table stores the number of the first column in each jump segment. If the
size of the data is less than 256 then no jump table will be present.
For example, if the row contains 45 columns and the offset of the 14th column is
more than 256, then the first entry in the jump table will be 0xe (14). If the
23rd column is the first one at offset > 512, the second entry of the jump table
would be 0x17 (23), and so on.
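One way to read the jump table description is that each jump entry at or below a
column's number adds another 256 to that column's stored 1-byte offset. The
sketch below follows that reading. It assumes columns are numbered in the same
way as the jump table entries; full_offset is a hypothetical name.

```python
def full_offset(stored_byte, col_number, jump_table):
    """Rebuild a Jet3 offset from its 1-byte stored value.

    Every jump table entry that is <= col_number means the column lies
    one more 256-byte segment into the row.
    """
    segments = sum(1 for first_col in jump_table if first_col <= col_number)
    return stored_byte + 256 * segments
```

With the jump table [14, 23] from the example, column 13 keeps its stored
offset, column 14 gains 256, and column 23 onward gains 512.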
+--------------------------------------------------------------------------+
| Jet4 Row Definition |
+------+---------+---------------------------------------------------------+
| data | length | name | description |
+------+---------+---------------------------------------------------------+
| ???? | 2 bytes | num_cols | Number of columns stored on this row. |
| ???? | n bytes | fixed_cols | Fixed length columns |
| ???? | n bytes | var_cols | Variable length columns |
| ???? | 2 bytes | eod | length of data from beginning of record |
| ???? | n bytes | var_table[]| offset from start of row for each var_col |
| ???? | 2 bytes | var_len | number of variable length columns |
| ???? | n bytes | null_mask | Null indicator. size is 1 byte per 8 cols |
| | | | 0 indicates a null value. Also used to |
| | | | represent value of bit type columns |
+--------------------------------------------------------------------------+
Notes:
. All offsets are stored as 2 byte fields including the var_table entries.
. the jump table was (thankfully) ditched in Jet4.
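Because the variable-length bookkeeping sits at the end of the row, a Jet4 row is
easiest to read backwards from its last byte. The following is a sketch under
the layout in the table above; it returns the var_table entries in stored order
without interpreting which entry belongs to which column, since that ordering is
not stated here. parse_jet4_row is a hypothetical name.

```python
import struct


def parse_jet4_row(row: bytes) -> dict:
    """Locate the trailing fields of a Jet4 row (all fields 2 bytes wide)."""
    (num_cols,) = struct.unpack_from("<H", row, 0)
    mask_size = (num_cols - 1) // 8 + 1
    null_mask = row[len(row) - mask_size:]
    # var_len sits immediately before the null mask.
    var_len_pos = len(row) - mask_size - 2
    (var_len,) = struct.unpack_from("<H", row, var_len_pos)
    # var_table holds var_len 2-byte entries before var_len.
    var_table_pos = var_len_pos - 2 * var_len
    var_table = list(struct.unpack_from("<%dH" % var_len, row, var_table_pos))
    # eod precedes the var_table.
    (eod,) = struct.unpack_from("<H", row, var_table_pos - 2)
    return {
        "num_cols": num_cols,
        "null_mask": null_mask,
        "var_len": var_len,
        "var_table": var_table,
        "eod": eod,
    }
```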
Each memo column (or other long binary data) in a row begins with the following
12-byte header:
+-------------------------------------------------------------------------+
| Memo Field Definition (12 bytes) |
+------+---------+-------------+------------------------------------------+
| data | length | name | description |
+------+---------+-------------+------------------------------------------+
| ???? | 2 bytes | memo_len | Total length of the memo |
| ???? | 2 bytes | bitmask | See values |
| ???? | 4 bytes | lval_dp | Data pointer to LVAL page (if needed) |
| 0x00 | 4 bytes | unknown | |
+------+---------+-------------+------------------------------------------+
Values for the bitmask:
0x8000= the memo is in a string at the end of this header (memo_len bytes)
0x4000= the memo is in a unique LVAL page in a record type 1
0x0000= the memo is in n LVAL pages in a record type 2
If the memo is in an LVAL page, we use the row_id of lval_dp to find the row.
offset_start of memo = (int16*) LVAL_page[ 10 + row_id * 2]
if (row_id == 0)
offset_stop of memo = 2048 (the Jet3 page size)
else
offset_stop of memo = (int16*) LVAL_page[ 10 + row_id * 2 - 2]
The length (partial if type 2) for the memo is:
memo_page_len = offset_stop - offset_start
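The memo header and the offset arithmetic above can be sketched as follows. The
function names are hypothetical, the bitmask is tested bit by bit (an assumption,
since only three exact values are listed), and row_id is passed in directly
because how it is extracted from lval_dp is not covered in this section. The
10-byte header and the default page size of 2048 correspond to Jet3.

```python
import struct


def parse_memo_header(field: bytes) -> dict:
    """Decode the 12-byte memo header at the start of a memo field."""
    memo_len, bitmask, lval_dp, _unknown = struct.unpack_from("<HHII", field, 0)
    if bitmask & 0x8000:
        storage = "inline"       # data follows the header directly
    elif bitmask & 0x4000:
        storage = "lval_type1"   # one LVAL page, record type 1
    else:
        storage = "lval_type2"   # chain of LVAL pages, record type 2
    result = {"memo_len": memo_len, "storage": storage, "lval_dp": lval_dp}
    if storage == "inline":
        result["data"] = field[12:12 + memo_len]
    return result


def lval_row_bounds(lval_page: bytes, row_id: int, page_size: int = 2048):
    """Return (offset_start, offset_stop) for a row on an LVAL page."""
    (start,) = struct.unpack_from("<H", lval_page, 10 + row_id * 2)
    if row_id == 0:
        stop = page_size
    else:
        (stop,) = struct.unpack_from("<H", lval_page, 10 + row_id * 2 - 2)
    return start, stop
```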
LVAL Pages
----------
(LVAL pages are special data pages used to store long data.)
The header of an LVAL page looks like this (10 bytes):
+------+---------+-------------+------------------------------------------+
| data | length | name | description |
+------+---------+-------------+------------------------------------------+
| 0x01 | 1 byte | page_type | 0x01 indicates a data page |
| 0x01 | 1 byte | unknown | |
| ???? | 2 bytes | free_space | The free space in this page |
| LVAL | 4 bytes | lval_id | The word 'LVAL' |
| ???? | 2 bytes | num_rows | Number of rows in this page |
+-------------------------------------------------------------------------+
| Iterate for the number of records |
+-------------------------------------------------------------------------+
| ???? | 2 bytes | row_offset | The record's location on this page |
+-------------------------------------------------------------------------+
Each memo record type 1 looks like this:
+------+---------+-------------+------------------------------------------+
| data | length | name | description |
+------+---------+-------------+------------------------------------------+
| ???? | n bytes | memo_value | A string which is the memo |
+-------------------------------------------------------------------------+
Each memo record type 2 looks like this:
+------+---------+-------------+------------------------------------------+
| data | length | name | description |
+------+---------+-------------+------------------------------------------+
| ???? | 4 bytes | lval_dp | Next page LVAL type 2 if memo is too long|
| ???? | n bytes | memo_value | A string which is the memo (partial) |
+-------------------------------------------------------------------------+
In an LVAL type 2 data page, you have
10 or 12 bytes for the header of the data page,
2 bytes for an offset,
4 bytes for the next lval_pg
So there is a block of at most 2048 - (10+2+4) = 2032 bytes (Jet3)
or 4096 - (12+2+4) = 4078 bytes (Jet4) in a page.
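The arithmetic above can be checked with a short sketch. The second helper goes
one step further and estimates how many type 2 pages a memo would span; it
assumes every page in the chain carries a single, completely filled chunk, which
this document does not guarantee. Both function names are hypothetical.

```python
def lval_type2_capacity(page_size: int, header_size: int) -> int:
    """Largest memo chunk that fits on one LVAL type 2 page."""
    # header + one 2-byte row offset + 4-byte pointer to the next page
    return page_size - (header_size + 2 + 4)


def min_pages_for_memo(memo_len: int, capacity: int) -> int:
    """Lower bound on chained pages, assuming one full chunk per page."""
    return -(-memo_len // capacity)  # ceiling division
```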
TDEF Pages (Table Definition)
-----------------------------
Every table in the database has a TDEF page. It contains a definition of
the columns, types, sizes, indexes, and similar information.
+-------------------------------------------------------------------------+
| Jet3/Jet4 TDEF Header |
+------+---------+-------------+------------------------------------------+
| data | length | name | description |
+------+---------+-------------+------------------------------------------+
| 0x02 | 1 byte | page_type | 0x02 indicates a tabledef page |
| 0x01 | 1 byte | unknown | |
| 'VC' | 2 bytes | tdef_id | The word 'VC' (Jet3 only, Jet4 unknown) |
| 0x00 | 4 bytes | next_pg | Next tdef page pointer (0 if none) |
+------+---------+-------------+------------------------------------------+
TDEFs can span multiple pages for large tables; this is accomplished using the
next_pg field.
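Following the next_pg chain might look like the sketch below. read_page is a
hypothetical callable that returns the bytes of a given page number; it is not
part of any real library. The loop stops at a next_pg of 0 and also guards
against a cycle in a damaged file.

```python
import struct


def parse_tdef_header(page: bytes) -> dict:
    """Decode the 8-byte header shared by Jet3 and Jet4 TDEF pages."""
    page_type, _unknown, tdef_id, next_pg = struct.unpack_from("<BB2sI", page, 0)
    if page_type != 0x02:
        raise ValueError("not a table definition page")
    return {"tdef_id": tdef_id, "next_pg": next_pg}


def tdef_page_chain(read_page, first_pg: int) -> list:
    """List the page numbers making up one table definition."""
    pages = []
    seen = set()
    pg = first_pg
    while pg != 0 and pg not in seen:
        seen.add(pg)
        pages.append(pg)
        pg = parse_tdef_header(read_page(pg))["next_pg"]
    return pages
```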
+-------------------------------------------------------------------------+
| Jet3 Table Definition Block (35 bytes) |
+------+---------+-------------+------------------------------------------+
| data | length | name | description |
+------+---------+-------------+------------------------------------------+
| ???? | 4 bytes | tdef_len | Length of the data for this page |
| ???? | 4 bytes | num_rows | Number of records in this table |
| 0x00 | 4 bytes | autonumber | value for the next value of the |
| | | | autonumber column, if any. 0 otherwise |
| 0x4e | 1 byte | table_type | 0x4e: user table, 0x53: system table |
| ???? | 2 bytes | max_cols | Max columns a row will have (deletions) |
| ???? | 2 bytes | num_var_cols| Number of variable columns in table |
| ???? | 2 bytes | num_cols | Number of columns in table (repeat) |
| ???? | 4 bytes | num_idx | Number of indexes in table |
| ???? | 4 bytes | num_real_idx| Number of indexes in table (repeat) |
| ???? | 4 bytes | used_pages | Points to a record containing the |
| | | | usage bitmask for this table. |
| ???? | 4 bytes | free_pages | Points to a similar record as above, |
| | | | listing pages which contain free space. |
+-------------------------------------------------------------------------+
| Iterate for the number of num_real_idx (8 bytes per index) |
+-------------------------------------------------------------------------+
| 0x00 | 4 bytes | ??? | |
| ???? | 4 bytes | num_idx_rows| (not sure) |
+-------------------------------------------------------------------------+
| Iterate for the number of num_cols (18 bytes per column) |
+-------------------------------------------------------------------------+
| ???? | 1 byte | col_type | Column Type (see table below) |
| ???? | 2 bytes | col_num | Column Number, (not always) |
| ???? | 2 bytes | offset_V | Offset for variable length columns |
| ???? | 4 bytes | ??? | |
| ???? | 4 bytes | ??? | |
| ???? | 1 byte | bitmask | low order bit indicates variable columns |
| ???? | 2 bytes | offset_F | Offset for fixed length columns |
| ???? | 2 bytes | col_len | Length of the column (0 if memo) |
+-------------------------------------------------------------------------+
| Iterate for the number of num_cols (n bytes per column) |
+-------------------------------------------------------------------------+
| ???? | 1 byte | col_name_len| len of the name of the column |
| ???? | n bytes | col_name | Name of the column |
+-------------------------------------------------------------------------+
| Iterate for the number of num_real_idx (30+9 = 39 bytes) |
+-------------------------------------------------------------------------+
| Iterate 10 times for 10 possible columns (10*3 = 30 bytes) |
+-------------------------------------------------------------------------+
| ???? | 2 bytes | col_num | number of a column (0xFFFF= none) |
| ???? | 1 byte | col_order | 0x01 = ascending order |
+-------------------------------------------------------------------------+
| ???? | 4 bytes | unknown | |
| ???? | 4 bytes | first_dp | Data pointer of the index page |
| ???? | 1 byte | flags | See flags table for indexes |
+-------------------------------------------------------------------------+
| Iterate for the number of num_real_idx |
+-------------------------------------------------------------------------+
| ???? | 4 bytes | index_num | Number of the index |
| | | |(warning: not always in sequential order)|
| ???? | 4 bytes | index_num2 | Number of the index (repeat) |
| 0xFF | 4 bytes | ??? | |
| 0x00 | 4 bytes | ??? | |
| 0x04 | 2 bytes | ??? | |
| ???? | 1 byte | primary_key | 0x01 if this index is primary |
+-------------------------------------------------------------------------+
| Iterate for the number of num_real_idx |
+-------------------------------------------------------------------------+
| ???? | 1 byte | idx_name_len| len of the name of the index |
| ???? | n bytes | idx_name | Name of the index |
+-------------------------------------------------------------------------+
| ???? | n bytes | ??? | |
| 0xFF | 2 bytes | ??? | End of the tableDef ? |
+-------------------------------------------------------------------------+
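The fixed-size parts of the Jet3 definition map directly onto packed structures.
The sketch below reads the 35-byte block, skips the 8-byte per-index entries, and
then reads the 18-byte column entries and the column names. Names are returned as
raw bytes because their encoding is not stated in this section. It assumes the
whole definition sits in one buffer; parse_jet3_tdef and the tuple type names are
hypothetical.

```python
import struct
from collections import namedtuple

Jet3TableDef = namedtuple(
    "Jet3TableDef",
    "tdef_len num_rows autonumber table_type max_cols num_var_cols "
    "num_cols num_idx num_real_idx used_pages free_pages")
Jet3Column = namedtuple(
    "Jet3Column",
    "col_type col_num offset_v unknown1 unknown2 bitmask offset_f col_len")

TABLE_FMT = struct.Struct("<IIIBHHHIIII")  # 35 bytes
COLUMN_FMT = struct.Struct("<BHHIIBHH")    # 18 bytes


def parse_jet3_tdef(buf: bytes, pos: int = 8):
    """Parse table block, column entries and column names after the header."""
    table = Jet3TableDef(*TABLE_FMT.unpack_from(buf, pos))
    pos += TABLE_FMT.size
    pos += 8 * table.num_real_idx  # skip the 8-byte per-index entries
    columns = []
    for _ in range(table.num_cols):
        columns.append(Jet3Column(*COLUMN_FMT.unpack_from(buf, pos)))
        pos += COLUMN_FMT.size
    names = []
    for _ in range(table.num_cols):
        name_len = buf[pos]
        names.append(buf[pos + 1:pos + 1 + name_len])
        pos += 1 + name_len
    return table, columns, names
```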
+-------------------------------------------------------------------------+
| Jet4 Table Definition Block (55 bytes) |
+------+---------+-------------+------------------------------------------+
| data | length | name | description |
+------+---------+-------------+------------------------------------------+
| ???? | 4 bytes | tdef_len | Length of the data for this page |
| ???? | 4 bytes | unknown | unknown |
| ???? | 4 bytes | num_rows | Number of records in this table |
| 0x00 | 4 bytes | autonumber | value for the next value of the |
| | | | autonumber column, if any. 0 otherwise |
| ???? |16 bytes | unknown | unknown |
| 0x4e | 1 byte | table_type | 0x4e: user table, 0x53: system table |
| ???? | 2 bytes | max_cols | Max columns a row will have (deletions) |
| ???? | 2 bytes | num_var_cols| Number of variable columns in table |
| ???? | 2 bytes | num_cols | Number of columns in table (repeat) |
| ???? | 4 bytes | num_idx | Number of indexes in table |
| ???? | 4 bytes | num_real_idx| Number of indexes in table (repeat) |
| ???? | 4 bytes | used_pages | Points to a record containing the |
| | | | usage bitmask for this table. |
| ???? | 4 bytes | free_pages | Points to a similar record as above, |
| | | | listing pages which contain free space. |
+-------------------------------------------------------------------------+
| Iterate for the number of num_real_idx (12 bytes per index) |
+-------------------------------------------------------------------------+
| 0x00 | 4 bytes | ??? | |
| ???? | 4 bytes | num_idx_rows| (not sure) |
| 0x00 | 4 bytes | ??? | |
+-------------------------------------------------------------------------+
| Iterate for the number of num_cols (25 bytes per column) |
+-------------------------------------------------------------------------+
| ???? | 1 byte | col_type | Column Type (see table below) |
| ???? | 4 bytes | unknown | matches first unknown definition block |
| ???? | 2 bytes | col_num | Column Number |
| ???? | 2 bytes | offset_V | Offset for variable length columns |
| ???? | 2 bytes | col_num | Column Number (repeat) |
| ???? | 4 bytes | ??? | |
| ???? | 1 byte | bitmask | low order bit indicates variable columns |
| ???? | 1 byte | ??? | seems to be 1 when variable len |
| 0000 | 4 bytes | ??? | |
| ???? | 2 bytes | offset_F | Offset for fixed length columns |
| ???? | 2 bytes | col_len | Length of the column (0 if memo) |
+-------------------------------------------------------------------------+
| Iterate for the number of num_cols (n*2 bytes per column) |
+-------------------------------------------------------------------------+
| ???? | 2 bytes | col_name_len| len of the name of the column |
| ???? | n bytes | col_name | Name of the column (UCS-2 format) |
+-------------------------------------------------------------------------+
| Iterate for the number of num_real_idx (30+22 = 52 bytes) |
+-------------------------------------------------------------------------+
| ???? | 4 bytes | ??? | |
+-------------------------------------------------------------------------+
| Iterate 10 times for 10 possible columns (10*3 = 30 bytes) |
+-------------------------------------------------------------------------+
| ???? | 2 bytes | col_num | number of a column (0xFFFF= none) |
| ???? | 1 byte | col_order | 0x01 = ascending order |
+-------------------------------------------------------------------------+
| ???? | 4 bytes | unknown | |
| ???? | 4 bytes | first_dp | Data pointer of the index page |
| ???? | 1 byte | flags | See flags table for indexes |
| ???? | 9 bytes | unknown | |
+-------------------------------------------------------------------------+
| Iterate for the number of num_real_idx (27 bytes) |
+-------------------------------------------------------------------------+
| ???? | 4 bytes | unknown | matches first unknown definition block |
| ???? | 4 bytes | index_num | Number of the index |
| | | |(warning: not always in sequential order)|
| ???? | 4 bytes | index_num2 | Number of the index (repeat) |
| 0xFF | 4 bytes | ??? | |
| 0x00 | 4 bytes | ??? | |
| 0x04 | 2 bytes | ??? | |
| ???? | 1 byte | primary_key | 0x01 if this index is primary |
| ???? | 4 bytes | unknown | |
+-------------------------------------------------------------------------+
| Iterate for the number of num_real_idx |
+-------------------------------------------------------------------------+
| ???? | 2 bytes | idx_name_len| len of the name of the index |
| ???? | n bytes | idx_name | Name of the index (UCS-2) |
+-------------------------------------------------------------------------+
| ???? | n bytes | ??? | |
| 0xFF | 2 bytes | ??? | End of the tableDef ? |
+-------------------------------------------------------------------------+
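For Jet4, the field widths in the tables above translate into the packed formats
below, whose sizes can be checked against the stated 55 and 25 bytes. The name
reader treats col_name_len as a byte count and decodes the name as little-endian
UTF-16, which covers UCS-2 text; both points are assumptions of this sketch, and
the names used are hypothetical.

```python
import struct

JET4_TABLE_FMT = struct.Struct("<IIII16sBHHHIIII")  # 55 bytes
JET4_COLUMN_FMT = struct.Struct("<BIHHHIBBIHH")     # 25 bytes


def read_jet4_names(buf: bytes, pos: int, count: int):
    """Read `count` length-prefixed UCS-2 names starting at `pos`.

    Returns the decoded names and the position just past the last one.
    """
    names = []
    for _ in range(count):
        (name_len,) = struct.unpack_from("<H", buf, pos)
        raw = buf[pos + 2:pos + 2 + name_len]
        names.append(raw.decode("utf-16-le"))
        pos += 2 + name_len
    return names, pos
```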
Index flags (not complete):
0x01 Unique
0x02 IgnoreNulls
0x08 Required
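Since the flag list is not complete, a decoder should keep any bits it does not
recognize rather than discard them. A minimal sketch, with hypothetical names:

```python
# Known index flag bits from the list above (not complete).
INDEX_FLAGS = {0x01: "Unique", 0x02: "IgnoreNulls", 0x08: "Required"}


def decode_index_flags(flags: int):
    """Return (names of known flags set, remaining unrecognized bits)."""
    names = [name for bit, name in sorted(INDEX_FLAGS.items()) if flags & bit]
    known_bits = sum(INDEX_FLAGS)  # sums the keys: 0x0B
    return names, flags & ~known_bits
```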