Object Store Management Architectures

Alexandros Biliris and Jack Orenstein

Abstract:

Issues in building storage system for OO DBMS. and persistent languages.
Techniques for placing small and large objects on disk-diskspace management.
Client-Server architecture for OO DBMS.
Alternatives in making programming language persistent-the pointers, three pointers dereference mechanisms: import/export, software and hardware dereference.

Requirements of Advanced DB Applications:

Applications like Electronic CAD, mechanical CAD, GIS, Computer aided pulishing.
Have extremely complex datastructures and complex patterns of computation and navigation.
Computationally intensive and interactive with high performance demands.
Requirements are quite different from traditional ones in which high throughput for simple manipulations of simple data models is the goal.
Associative queries and related features such as views are useful but not essential.

Organising Objects on Disks:

Object Ids (Oid)

Uniquely identify objects for the life of the database.
Physical Oids - large and fast (no need of translation to locate an object).
Logical Oids - more flexible (allow free movement of objects).

Slotted Pages

Physical address consists of volume and page number in which object is stored, and slot number that contains object 's offset within the page.

Files
Heap

Hash files, B-Tree files - an attribute value of the object is used to determine the page in which it is stored.

Sorted File
Clustering of objects

Hints - e.g.: put the object near this object, put the object on that disk volume, etc.
Automatic - OO DBMS automatically clusters related objects based on Oids.

Large objects:

Objects that can't fit entirely in a single page for e.g digital images, audio, video,...
Need virtually unlimietd size.
Need operations that deal with a specific number of bytes within the object: read, write, insert, delete, and append.
Cost of allocation of large number of disk blocks must be minimal.
Requires efficient random access - cost of locating any given byte within the object must be independent of object size.
Good sequential access performance - I/O rates in reading or writing large chunks of bytes must be close to transfer rates.
Small changes should have small impact.
"Count" B-Tree structure - each node of the tree contains a sequence of (page no, count) pairs, indicating the child page-id and the number of bytes stored under this child from the beginning of the node.
Physical vs Logical Oids.

Moving objects
Forwarding

Disk Space Management:

Block -based allocation

Each block is addressed individually - like Unix inodes.
No notion of physical contiguity.
Free blocks can be managed by simple linear linked lists.

Extent -based allocation

Equal sized chunks of contiguous blocks are allocated.
Binary buddy system.
fixed size extents (contain segments).

Variable sized segments
Binary buddy system

Segments of a given size can start only at blocks whose block number is divisible by the size of the segment.
Avoid fragmentation.
Searching (find free).
Allocation (split).
Deallocation (coalescing).

Client-server architecture:

Method execution site

Query shipping

Adv: avoids communication cost to the client if result set is a small subset of collection.
Disadv: practically difficult to employ (needs user code to be linked with the server's code), cache synchronization, server scalability problems.

Data shipping

Adv: server can be ignorant about data format in database (e.g. file servers like NFS mounting etc.)
Disadv: heavy client - has to do all query processing and optimization.

Unit of data transfer - object vs page servers, file server.
Caching and unit of data replication - the client caches both the page and the lock mode acquired on it.

Intra transaction caching - caching within a transaction
Inter transaction caching - caching between transactions.
Cache consistency problem
Items/Locks
Data/No-lock cache.
Callback locking.
Optimistic.

Caching Alternatives:

2PL with no inter-transaction caching.
Data cache + no lock cache.

Locks granted by server only.
Client verifies page is uptodate with lock request

Server tracks latest page versions.
piggyback page with lock grant.

Data + lock cache (Callback locking).

No server access if lock cached locally.
Server calls-back lock on conflicting request.

Optimistic locking (for cache)

Assumes cached pages uptodate.
Verify at commit time.
Can add invalidation from server when server receives write. Verification still needed.

Unit of locking and recovery - access to shared data must be synchronized to ensure the ACID properties.

Locking for serializability Vs locking for cache consistency.
Multiple granularity (page Vs object).
Adaptive granularity

lock de-escalation

Lock at coarse level.
Convert to finer level on data contention (eg page -> object).

Persistence and Programming Languages:

One data model - same type system applies to both transient and persistent data.
No translation between in-memory and on-disk representation to be done by programmer.
OO DBMS can be viewed as a conventional programming language extended with persistence.
Pointers and Object -ids

both serve to identify objects.
pointer is valid only during the execution of a program - specified location with 32 bit address space.
Oid has to be valid for the life time of the object. - typically 64 to 128 bits.
Support to map Oid to pointer needed through software mediation.

Object Layout - object representation in-memory and on-disk varies. Hence reformatting or at best fixup is needed while bringing an objec to memory.
Retrieving and Updating Objects.

Retrieval may be implicit - triggered by a dereference.
Explicit request by programmer.
Detect updates automatically due to virtual memory protection violation.
Or compiler to detect updates when an object becomes dirty and pass flags to underlying storage system.

PLs:

Fortran/COBOL: Easy, no pointers.
Smalltalk/Lisp: No direct access to pointers from PL => easier than C/C++.
C/C++/Java...:

Pointers Vs Oids.
Object layout: Compiler/Architecture dependencies.
Hidden pointers (C++)

Used to implement virtual functions.
Per process (executable) value.
Set during first access/swizzling.

OO DBMS Architectures:

Import / Export objeect managers
Software dereferencing
Hardware dereferencing

Fixing Hidden Pointers:

Modify compiler.
Create and initialize object

Copying overhead.
Where to create.

Related:

The ObjectStore DB System

Fine-Grained Sharing in a Page Server OODBMS

Index Page