Object Store Management Architectures
Alexandros Biliris and Jack Orenstein
Abstract:
-
Issues in building storage system for OO DBMS. and persistent languages.
-
Techniques for placing small and large objects on disk-diskspace management.
-
Client-Server architecture for OO DBMS.
-
Alternatives in making programming language persistent-the pointers, three
pointers dereference mechanisms: import/export, software and hardware dereference.
Requirements of Advanced DB Applications:
-
Applications like Electronic CAD, mechanical CAD, GIS, Computer
aided pulishing.
-
Have extremely complex datastructures and complex patterns of computation
and navigation.
-
Computationally intensive and interactive with high performance demands.
-
Requirements are quite different from traditional ones in which high throughput
for simple manipulations of simple data models is the goal.
-
Associative queries and related features such as views are useful but not
essential.
Organising Objects on Disks:
-
Object Ids (Oid)
-
Uniquely identify objects for the life of the database.
-
Physical Oids - large and fast (no need of translation to locate an object).
-
Logical Oids - more flexible (allow free movement of objects).
-
Slotted Pages
-
Physical address consists of volume and page number in which object
is stored, and slot number that contains object 's offset within the page.
-
Files
-
Heap
-
Hash files, B-Tree files - an attribute value of the object is used to
determine the page in which it is stored.
-
Sorted File
-
Clustering of objects
-
Hints - e.g.: put the object near this object, put the
object on that disk volume, etc.
-
Automatic - OO DBMS automatically clusters related objects based on Oids.
Large objects:
-
Objects that can't fit entirely in a single page for e.g digital images,
audio, video,...
-
Need virtually unlimietd size.
-
Need operations that deal with a specific number of bytes within the object:
read, write, insert, delete, and append.
-
Cost of allocation of large number of disk blocks must be minimal.
-
Requires efficient random access - cost of locating any given byte within
the object must be independent of object size.
-
Good sequential access performance - I/O rates in reading or writing large
chunks of bytes must be close to transfer rates.
-
Small changes should have small impact.
-
"Count" B-Tree structure - each node of the tree contains a sequence of
(page no, count) pairs, indicating the child page-id and the number
of bytes stored under this child from the beginning of the node.
-
Physical vs Logical Oids.
-
Moving objects
-
Forwarding
Disk Space Management:
-
Block -based allocation
-
Each block is addressed individually - like Unix inodes.
-
No notion of physical contiguity.
-
Free blocks can be managed by simple linear linked lists.
-
Extent -based allocation
-
Equal sized chunks of contiguous blocks are allocated.
-
Binary buddy system.
-
fixed size extents (contain segments).
-
Variable sized segments
-
Binary buddy system
-
Segments of a given size can start only at blocks whose block number is
divisible by the size of the segment.
-
Avoid fragmentation.
-
Searching (find free).
-
Allocation (split).
-
Deallocation (coalescing).
Client-server architecture:
-
Method execution site
-
Query shipping
-
Adv: avoids communication cost to the client if result set
is a small subset of collection.
-
Disadv: practically difficult to employ (needs user code to be linked with
the server's code), cache synchronization, server scalability problems.
-
Data shipping
-
Adv: server can be ignorant about data format in database (e.g. file servers
like NFS mounting etc.)
-
Disadv: heavy client - has to do all query processing and optimization.
-
Unit of data transfer - object vs page servers, file server.
-
Caching and unit of data replication - the client caches both the page
and the lock mode acquired on it.
-
Intra transaction caching - caching within a transaction
-
Inter transaction caching - caching between transactions.
-
Cache consistency problem
-
Items/Locks
-
Data/No-lock cache.
-
Callback locking.
-
Optimistic.
-
Caching Alternatives:
-
2PL with no inter-transaction caching.
-
Data cache + no lock cache.
-
Locks granted by server only.
-
Client verifies page is uptodate with lock request
-
Server tracks latest page versions.
-
piggyback page with lock grant.
-
Data + lock cache (Callback locking).
-
No server access if lock cached locally.
-
Server calls-back lock on conflicting request.
-
Optimistic locking (for cache)
-
Assumes cached pages uptodate.
-
Verify at commit time.
-
Can add invalidation from server when server receives write. Verification
still needed.
-
Unit of locking and recovery - access to shared data must be synchronized
to ensure the ACID properties.
-
Locking for serializability Vs locking for cache consistency.
-
Multiple granularity (page Vs object).
-
Adaptive granularity
-
lock de-escalation
-
Lock at coarse level.
-
Convert to finer level on data contention (eg page -> object).
Persistence and Programming Languages:
-
One data model - same type system applies to both transient and persistent
data.
-
No translation between in-memory and on-disk representation to be done
by programmer.
-
OO DBMS can be viewed as a conventional programming language extended with
persistence.
-
Pointers and Object -ids
-
both serve to identify objects.
-
pointer is valid only during the execution of a program - specified location
with 32 bit address space.
-
Oid has to be valid for the life time of the object. - typically 64 to
128 bits.
-
Support to map Oid to pointer needed through software mediation.
-
Object Layout - object representation in-memory and on-disk varies. Hence
reformatting or at best fixup is needed while bringing an objec to memory.
-
Retrieving and Updating Objects.
-
Retrieval may be implicit - triggered by a dereference.
-
Explicit request by programmer.
-
Detect updates automatically due to virtual memory protection violation.
-
Or compiler to detect updates when an object becomes dirty and pass flags
to underlying storage system.
-
PLs:
-
Fortran/COBOL: Easy, no pointers.
-
Smalltalk/Lisp: No direct access to pointers from PL => easier than C/C++.
-
C/C++/Java...:
-
Pointers Vs Oids.
-
Object layout: Compiler/Architecture dependencies.
-
Hidden pointers (C++)
-
Used to implement virtual functions.
-
Per process (executable) value.
-
Set during first access/swizzling.
OO DBMS Architectures:
-
Import / Export objeect managers
-
Software dereferencing
-
Hardware dereferencing
Fixing Hidden Pointers:
-
Modify compiler.
-
Create and initialize object
-
Copying overhead.
-
Where to create.
Related:
The ObjectStore DB System
Fine-Grained Sharing in a Page Server
OODBMS
Index Page