Slides by Kalpesh Kapoor
ObjectStore is an object-oriented database management system (OODBMS)
that provides a tightly integrated language interface to the traditional
DBMS features of persistent storage, transaction management (concurrency
control and recovery), distributed data access, and associative queries.
ObjectStore was designed to provide a unified programmatic interface to
both persistently allocated data (i.e., data that lives beyond the
execution of an application program) and transiently allocated data
(i.e., data that doesn't survive beyond an application's execution),
with object-access speed for persistent data usually equal to that of
an in-memory dereference of a pointer to transient data.
- Close integration with programming language
- Persistence independent of type
- No need to inherit from a base class (in ODMG, a persistent class must
be declared as deriving from public Persistent_Object)
- No new type system
- No translation code, no copying, i.e. no translation code between the
disk-resident representation and the representation used during execution.
Such a mapping of tuples to object data members is needed to bridge the
so-called "impedance mismatch".
- Locking and logging are automatic
- Expressive power of C++
- Reusability and libraries
- Conversion of existing applications to add persistence
- Type checking - Applies to persistent as well as transient data.
- High performance for target applications
- Temporal locality - Many data items will be used "mostly" by one
user over a short period of time.
- Spatial locality - Often only a small portion of the database
is used.
- Fine interleaving - Many database operations are small, so the overhead
per operation, including networking cost, must be low. An operation like
'fetching an object', i.e. dereferencing a pointer, must be fast. To achieve
this, ObjectStore uses indexes, query optimization, log-based recovery,
and so on.
- Fast dereferencing
- Plus the usual DB goals
- C Library
- C++ Library
- Extended C++
- persistent: keyword that specifies a storage class, e.g.
persistent<db> department *engg_dept; // associates a name with a persistent object
- Alternative: department *engg_dept = db->find_object("engg_dept");
- Inverse member relationships
- Automatic maintenance of integrity constraints, e.g.
deleting an object from one side also deletes the inverse link
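What "automatic maintenance" of an inverse relationship amounts to can be sketched in plain C++. This is a hypothetical hand-written emulation, not ObjectStore's API: ObjectStore generates this bookkeeping for an inverse_member declaration, while here the helper set_dept (an assumed name) keeps employee::dept and department::employees consistent explicitly.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical emulation of the inverse_member pair
//   employee::dept  <->  department::employees
// ObjectStore maintains both sides automatically; here it is done by hand.
struct department;

struct employee {
    std::string name;
    department *dept = nullptr;  // inverse of department::employees
    explicit employee(std::string n) : name(std::move(n)) {}
    void set_dept(department *d);  // updates both sides of the relationship
};

struct department {
    std::set<employee *> employees;  // inverse of employee::dept
    void add_employee(employee *e) { e->set_dept(this); }
    void remove_employee(employee *e) {
        if (e->dept == this) e->set_dept(nullptr);
    }
    bool works_here(employee *e) const { return employees.count(e) != 0; }
};

void employee::set_dept(department *d) {
    if (dept == d) return;
    if (dept) dept->employees.erase(this);   // unlink from the old department
    dept = d;
    if (dept) dept->employees.insert(this);  // link into the new department
}
```

Moving an employee to another department removes it from the old department's set, and removing it clears its dept pointer, so neither side can dangle.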
Examples from ObjectStore
class department
{
public:
    // Kept consistent with employee::dept automatically
    os_set<employee *> employees inverse_member employee::dept;
    void add_employee(employee *e)
    {
        employees.insert(e);
    }
    int works_here(employee *e)
    {
        return employees.contains(e);
    }
};
void someFunction()
{
    employee *e;
    department *d;
    // ...
    foreach (e, d->employees)   // ObjectStore iteration over a collection
    {
        e->incrementSalary(5);
    }
}
int main()
{
    database *db;
    persistent<db> department *engg_dept;   // named persistent object in db
    db = database::open("/a/b");
    transaction::begin();
    employee *emp = new (db) employee("Ravi");   // allocated persistently in db
    engg_dept->add_employee(emp);
    emp->salary = 1000;
    transaction::commit();
}
os_set<employee *> all_emp;
os_set<employee *> &overpaid_emp = all_emp[: salary >= 1000 :];
all_emp[: dept->employees[: name == "Fred" :] :];
// all employees in Fred's department
// or, using the library interface:
all_emp.query("employee*", "dept->employees[: name == \"Fred\" :]");
- No joins (as of 1991); OQL is supported now.
- Now permits an SQL-like query language.
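As a rough standard-C++ analogue of the two selection queries, each [: predicate :] keeps the elements of a collection for which the predicate holds. This sketch assumes simplified employee/department structs and reads the nested selection as true when it is non-empty; it is not how ObjectStore evaluates queries (it can use indexes and query optimization), only what they compute.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

struct department;  // simplified stand-ins for the classes in the example

struct employee {
    std::string name;
    int salary;
    department *dept;
};

struct department {
    std::vector<employee *> employees;
};

// Analogue of: all_emp[: salary >= 1000 :]
std::vector<employee *> overpaid(const std::vector<employee *> &all_emp) {
    std::vector<employee *> out;
    std::copy_if(all_emp.begin(), all_emp.end(), std::back_inserter(out),
                 [](const employee *e) { return e->salary >= 1000; });
    return out;
}

// Analogue of: all_emp[: dept->employees[: name == "Fred" :] :]
// An employee qualifies if someone named Fred works in the same department.
std::vector<employee *> in_freds_dept(const std::vector<employee *> &all_emp) {
    std::vector<employee *> out;
    std::copy_if(all_emp.begin(), all_emp.end(), std::back_inserter(out),
                 [](const employee *e) {
                     const auto &colleagues = e->dept->employees;
                     return std::any_of(colleagues.begin(), colleagues.end(),
                                        [](const employee *c) { return c->name == "Fred"; });
                 });
    return out;
}
```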
- Multiple versions of object
- Parallel editing -
If many users want to update the same data concurrently,
they can create alternate versions of their own.
- Merging left to user
- Versioning independent of type
- Check out/Check in - A user checks out a version of an object, makes
changes, and then checks the changes back in to the main development project so
that they are visible to other members of the cooperating team.
- Versioning for groups of objects in Workspaces
- Workspaces - Users can control which version to use for each object
or group of objects of interest by setting up private workspaces that
specify the desired versions. These workspaces can be shared by different
users.
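The check-out/check-in cycle and per-workspace version selection can be modelled in a few lines. This is a hypothetical sketch, not ObjectStore's versioning API: versioned_object, workspace, pin, check_out and check_in are illustrative names, history is kept linear for brevity, and merging parallel edits is left to the user, as on the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of check-out/check-in versioning with workspaces.
struct versioned_object {
    std::vector<std::string> versions;  // versions[i] holds version i
    explicit versioned_object(std::string initial) { versions.push_back(std::move(initial)); }
    std::size_t latest() const { return versions.size() - 1; }
};

struct workspace {
    std::map<versioned_object *, std::size_t> pinned;       // explicit version choices
    std::map<versioned_object *, std::string> checked_out;  // private working copies

    // What this workspace sees: its working copy, else the pinned version,
    // else the latest checked-in version.
    const std::string &read(versioned_object *o) const {
        auto co = checked_out.find(o);
        if (co != checked_out.end()) return co->second;
        auto p = pinned.find(o);
        return o->versions[p != pinned.end() ? p->second : o->latest()];
    }
    void pin(versioned_object *o, std::size_t v) { pinned[o] = v; }

    // Check out: take a private, editable copy of the visible version.
    std::string &check_out(versioned_object *o) {
        std::string copy = read(o);
        std::string &slot = checked_out[o];
        slot = std::move(copy);
        return slot;
    }
    // Check in: publish the working copy as a new version visible to others.
    std::size_t check_in(versioned_object *o) {
        auto it = checked_out.find(o);
        assert(it != checked_out.end() && "check out before checking in");
        o->versions.push_back(std::move(it->second));
        checked_out.erase(it);
        pinned.erase(o);  // follow the newest version again
        return o->latest();
    }
};
```

Two users can edit the same object in parallel because each edits a private copy; whoever checks in last simply produces the newer version in this simplified model.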
- Access protection - ObjectStore requires that dereferencing a pointer to
persistent data compile the same way as dereferencing a pointer to transient
data, and take comparable time. The difficulty is that a program may access
objects that have not yet been retrieved from the database. ObjectStore
therefore takes advantage of the CPU's virtual memory hardware: it marks pages
as no-access, read-only or read-write, so the hardware detects the access
violation and ObjectStore can then fetch the page.
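The hardware mechanism can be illustrated on Linux with POSIX mmap, mprotect and a SIGSEGV handler. This is a minimal sketch under stated assumptions, not ObjectStore's implementation: one anonymous page stands in for a database page, the constant stored_page (hypothetical) stands in for its contents, and the handler's memcpy stands in for a request to the server. The first read traps; the handler grants access, fills the page and returns, so the faulting instruction is retried and succeeds. Later reads run at ordinary memory speed with no software check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

// Sketch of fault-driven page access (not ObjectStore code).
namespace demo {

char *region = nullptr;          // one page of address space, initially no-access
std::size_t page_size = 0;
volatile sig_atomic_t faults = 0;
const char stored_page[] = "persistent employee record";  // hypothetical page contents

void on_fault(int, siginfo_t *info, void *) {
    auto addr = reinterpret_cast<std::uintptr_t>(info->si_addr);
    auto base = reinterpret_cast<std::uintptr_t>(region);
    if (addr < base || addr >= base + page_size) _exit(1);  // a genuine crash
    faults = faults + 1;
    // Make the page accessible, then "fetch" its contents. mprotect is not on
    // POSIX's async-signal-safe list, but this pattern is common on Linux.
    mprotect(region, page_size, PROT_READ | PROT_WRITE);
    std::memcpy(region, stored_page, sizeof stored_page);
}  // on return, the faulting instruction is retried and now succeeds

void setup() {
    page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    void *p = mmap(nullptr, page_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);
    region = static_cast<char *>(p);
    struct sigaction sa {};
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, nullptr);
    sigaction(SIGBUS, &sa, nullptr);  // some systems report such faults as SIGBUS
}

}  // namespace demo
```

Usage: call demo::setup(), then read demo::region like ordinary memory; only the first access to the page goes through the handler.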
- Databases
- in (operating system) files
- in disk partitions
- Client server communication
- Page server implementation
- Concurrency control, log-based recovery
- Two-phase commit
- Communication via net/shared memory/local sockets
- Server knows nothing about page contents
- Client cache in client host
- Flush pages on commit, but retain them in the cache
- All processing at client side - This contrasts with traditional
RDBMSs, in which the server is largely responsible for handling
all query processing, optimization and formatting.
- Client side page cache
- On end of transaction
- Unmap pages
- Write modified pages back to server (still in cache)
- Cache coherence
- Page in client cache in shared or exclusive mode
- Server tracks which clients have page and in what mode
- Call back locking/invalidation
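- On a conflicting request, the server performs callback locking: it calls
the page back from the clients that hold it.
- A client returns the called-back page if/when it is not locked by its
current transaction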
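The callback protocol can be sketched as a small single-threaded model. Everything here is hypothetical (page_server, client, request, commit are illustrative names, not ObjectStore's API): the server records which clients cache each page and in which mode, a conflicting request triggers callbacks, and a client gives the page up only if its current transaction has not locked it. Ending a transaction releases locks but leaves pages in the cache, matching "flush pages on commit, but retain them in the cache".

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical, single-threaded model of callback locking.
enum class mode { shared, exclusive };

struct client {
    int id;
    std::set<int> locked;  // pages locked by this client's current transaction
    // Server callback: return the page unless the current transaction uses it.
    bool on_callback(int page) const { return locked.count(page) == 0; }
    // End of transaction: locks are released, but pages stay cached.
    void commit() { locked.clear(); }
};

class page_server {
    std::map<int, std::map<int, mode>> holders_;  // page -> (client id -> cached mode)
    std::map<int, client *> clients_;
public:
    void attach(client &c) { clients_[c.id] = &c; }

    // Grants the page in mode m, calling it back from conflicting holders first.
    // Returns false if a holder still has it locked (the requester must wait).
    bool request(client &c, int page, mode m) {
        auto &h = holders_[page];
        for (auto it = h.begin(); it != h.end();) {
            bool conflict = it->first != c.id &&
                            (m == mode::exclusive || it->second == mode::exclusive);
            if (!conflict) { ++it; continue; }
            if (!clients_.at(it->first)->on_callback(page)) return false;
            it = h.erase(it);  // holder gave the page back; its cached copy is gone
        }
        h[c.id] = m;
        c.locked.insert(page);
        return true;
    }

    bool caches(int client_id, int page) const {
        auto p = holders_.find(page);
        return p != holders_.end() && p->second.count(client_id) != 0;
    }
};
```

Note that the server only tracks page identifiers and modes; it never looks inside a page, consistent with the page-server design above.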
- Swizzling
- Tag tables - give type information for each page:
- the location of objects in the page
- the types of objects (an index into the type table in the schema)
- To minimize the space overhead while keeping access fast, the tag
table is heavily compressed and indexed.
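One way a tag table supports swizzling is by telling the client where the pointers in a page are, so they can be adjusted when the referenced data is mapped at a different address. The sketch below is hypothetical and simplified: tag and type_desc are assumed layouts (the real tag table is compressed), pointers are stored as integers so the arithmetic is well-defined, and relocate_page adds a fixed offset to every pointer that refers into the moved address range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch of tag-table-driven pointer relocation for one page.
struct type_desc {                              // one entry of the schema's type table
    std::vector<std::size_t> pointer_offsets;   // pointer fields within an object
};

struct tag {                  // one entry of a page's tag table
    std::size_t offset;       // location of the object in the page
    std::size_t type_index;   // index into the schema type table
};

// Rewrite every pointer in the page that refers into [old_base, old_base + len)
// so that it refers to the same object at new_base instead.
void relocate_page(std::vector<unsigned char> &page, const std::vector<tag> &tags,
                   const std::vector<type_desc> &types,
                   std::uintptr_t old_base, std::uintptr_t new_base, std::size_t len) {
    for (const tag &t : tags) {
        for (std::size_t field : types[t.type_index].pointer_offsets) {
            std::uintptr_t p;
            std::memcpy(&p, page.data() + t.offset + field, sizeof p);
            if (p >= old_base && p < old_base + len) {
                p = p - old_base + new_base;
                std::memcpy(page.data() + t.offset + field, &p, sizeof p);
            }
        }
    }
}
```

Without the tag table the client could not tell a pointer from an integer that merely looks like an address; with it, only genuine pointer fields are touched.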
- Placement of objects within the database
- Segments of database
- Segments can be specified when creating an object
- Segment-at-a-time transfer is possible
- Objects can cross page boundaries
- Clustering of frequently accessed objects
- Increases locality: the client cache is used efficiently and fewer
pages need to be transferred to access objects.
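The effect of clustering on transfers is simple arithmetic: count the distinct pages a traversal touches. In this hypothetical example 40 objects fit 10 per page on 4 pages; visiting a department's 10 employees (objects 0 to 9) touches 1 page when they were placed together, but all 4 pages when placement was round-robin. The helpers pages_touched and place are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Distinct pages a client must fetch to visit the accessed objects,
// given the page each object was placed on.
std::size_t pages_touched(const std::vector<std::size_t> &page_of_object,
                          const std::vector<std::size_t> &accessed) {
    std::set<std::size_t> pages;
    for (std::size_t obj : accessed) pages.insert(page_of_object[obj]);
    return pages.size();
}

// Place n objects either clustered (per_page consecutive objects share a page)
// or scattered (round-robin over n_pages pages).
std::vector<std::size_t> place(std::size_t n, std::size_t per_page, bool clustered,
                               std::size_t n_pages) {
    std::vector<std::size_t> page(n);
    for (std::size_t i = 0; i < n; ++i)
        page[i] = clustered ? i / per_page : i % n_pages;
    return page;
}
```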
- Collection facilities
- Cursors - For iterating.
- Collections
- os_collection
- os_set, os_bag, os_list
- various other representations based on a user-supplied access pattern
- Collection representation can change dynamically,
e.g. a set stored as an array/linked list changes to a B-tree as its size grows;
operations are dispatched at run time as virtual functions
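A collection that switches representation as it grows, with operations dispatched through virtual functions, can be sketched as follows. This is an assumption-laden illustration, not ObjectStore's os_set: std::set (typically a red-black tree) stands in for the B-tree, and the threshold parameter of the hypothetical adaptive_set decides when to migrate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Abstract representation; the concrete one is chosen at run time.
struct set_rep {
    virtual ~set_rep() = default;
    virtual bool insert(int x) = 0;  // true if x was not already present
    virtual bool contains(int x) const = 0;
    virtual std::size_t size() const = 0;
    virtual std::vector<int> elements() const = 0;
    virtual const char *kind() const = 0;
};

struct array_rep : set_rep {
    std::vector<int> v;  // unsorted; linear search is fine while small
    bool insert(int x) override {
        if (contains(x)) return false;
        v.push_back(x);
        return true;
    }
    bool contains(int x) const override { return std::find(v.begin(), v.end(), x) != v.end(); }
    std::size_t size() const override { return v.size(); }
    std::vector<int> elements() const override { return v; }
    const char *kind() const override { return "array"; }
};

struct tree_rep : set_rep {
    std::set<int> s;
    bool insert(int x) override { return s.insert(x).second; }
    bool contains(int x) const override { return s.count(x) != 0; }
    std::size_t size() const override { return s.size(); }
    std::vector<int> elements() const override { return std::vector<int>(s.begin(), s.end()); }
    const char *kind() const override { return "tree"; }
};

class adaptive_set {
    std::unique_ptr<set_rep> rep_ = std::make_unique<array_rep>();
    std::size_t threshold_;
    bool is_tree_ = false;
public:
    explicit adaptive_set(std::size_t threshold) : threshold_(threshold) {}
    bool insert(int x) {
        bool added = rep_->insert(x);
        if (added && !is_tree_ && rep_->size() > threshold_) {
            auto tree = std::make_unique<tree_rep>();
            for (int e : rep_->elements()) tree->insert(e);  // migrate the elements
            rep_ = std::move(tree);
            is_tree_ = true;
        }
        return added;
    }
    bool contains(int x) const { return rep_->contains(x); }
    std::size_t size() const { return rep_->size(); }
    std::string representation() const { return rep_->kind(); }
};
```

Callers only see adaptive_set's interface, so the change of representation is invisible to application code.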
- Optimization issues
- Relation statistics are not available at compile time, as collections
are not known by name and may be pointed to or computed as the result
of an expression; thus multiple strategies are needed.
- Optimizing path expression dereference, e.g.
emp->manager->dept
emp->company->dept
Paths are precomputed joins and can have indices.
- Collections may be known at run time only
- ObjectStore's C++ compiler parses and optimizes the query at compile
time. If queries are expressed using the library interface, they are parsed
and optimized at run time.
- Multiple strategies, selection at run time
- Mainly index access decisions
- Paths are precomputed joins, e.g.
foreach (p, r)
    printf("%s\n", p->mgr->spouse->name);
// i.e. access via a secondary index
- Code generation
- Two strategies (scan/index) for each node
- Checks at run time which indices are available, etc.
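The "two strategies per node, chosen at run time" idea can be sketched for a single selection node [: salary == k :]. Both code paths exist in advance, and which one runs depends on whether the collection has an index when the query executes. The collection struct and its optional salary_index (a std::multimap) are hypothetical stand-ins, not ObjectStore types.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct employee {
    std::string name;
    int salary;
};

struct collection {
    std::vector<const employee *> elems;
    const std::multimap<int, const employee *> *salary_index = nullptr;  // may be absent
};

enum class strategy { scan, index };

// Evaluates [: salary == k :]; `used` reports which strategy was chosen.
std::vector<const employee *> select_salary_eq(const collection &c, int k, strategy &used) {
    std::vector<const employee *> out;
    if (c.salary_index) {  // an index exists at run time: use it
        used = strategy::index;
        auto range = c.salary_index->equal_range(k);
        for (auto it = range.first; it != range.second; ++it) out.push_back(it->second);
    } else {               // otherwise fall back to a full scan
        used = strategy::scan;
        for (const employee *e : c.elems)
            if (e->salary == k) out.push_back(e);
    }
    return out;
}
```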
- Join optimization problems occur when two path expressions are compared
- Evaluating regular path expressions is similar to joining via secondary
indices; if the data does not fit in memory this can lead to thrashing, but
ObjectStore ignores this.
- Index maintenance
- Would be very expensive if every assignment needed an index check
- Declaration of data members that could potentially
be used as index keys, e.g.
class Person {
    int age indexable; // does not require that an index exist
};
- Index on path
- Path indices, e.g. a->b->c == 21
- Single step - more work to evaluate
e.g. person->child->school == "IIT"
- Full path - faster for the above query, but makes index update more difficult
- Updates affect more than local indices,
e.g. find people with a child named Fred:
class Person {
    char *name;
    os_set<Person *> child;
};
// Updates can be a new person, a new child for a person, a name change for a child, etc.
- When an indexable data member is updated
- All affected access methods are updated
- All access methods downstream in affected index paths are updated
- An update to a collection triggers updates that may affect all access
methods of all indexes of the collection.
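To see why a full-path index complicates updates, here is a hypothetical sketch of one for "people with a child named Fred", i.e. person->child->name == key. The index maps a child's name to the parents reachable through that path. Because name is treated as indexable, assignments go through set_name, which must fix entries for every parent (so each person keeps parent back-pointers); adding a child adds a path. path_index, set_name, add_child and parents are illustrative names, not ObjectStore's API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

struct person;

// Full-path index for person->child->name == key: child name -> parents.
struct path_index {
    std::map<std::string, std::multiset<person *>> parents_by_child_name;
    void add(const std::string &name, person *parent) { parents_by_child_name[name].insert(parent); }
    void remove(const std::string &name, person *parent) {
        auto &s = parents_by_child_name[name];
        auto it = s.find(parent);
        if (it != s.end()) s.erase(it);  // one occurrence: a parent may have two Freds
    }
    std::set<person *> lookup(const std::string &name) const {
        auto it = parents_by_child_name.find(name);
        if (it == parents_by_child_name.end()) return {};
        return std::set<person *>(it->second.begin(), it->second.end());
    }
};

struct person {
    path_index *index;           // the index this person participates in
    std::string name;            // "indexable": assign only through set_name
    std::set<person *> child;
    std::set<person *> parents;  // back-pointers to find affected index entries

    // Name change for a child: update every path that ends at this person.
    void set_name(const std::string &n) {
        for (person *p : parents) index->remove(name, p);
        name = n;
        for (person *p : parents) index->add(name, p);
    }
    // New child for a person: a new path appears.
    void add_child(person *c) {
        if (!child.insert(c).second) return;
        c->parents.insert(this);
        index->add(c->name, this);
    }
};
```

A single-step index would only need to track each person's own name, but answering the query would then take an extra lookup per step; the full-path index answers it directly at the cost of this downstream maintenance.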
- Performance
- Cold cache at client - Empty cache. Cold-cache times are dominated
by the time required to get data from the disk.
- Warm cache at client - A cold cache becomes warm after a number of
iterations have run on it. Warm-cache times reflect the processing speed
for data that is already present at the client.
- Cattell benchmark (now OO1; see also OO7)
- Conclusion
- ObjectStore was designed to perform complex manipulations on large
databases.
- Ease of use, expressive power, code reusability and tight integration
with the host environment along with high speed performance
- To meet these goals virtual memory architecture is used
- Single type system
- ObjectStore's collection, relationship and query facilities provide
support for conceptual modeling: constructs such as multivalued attributes
and many-to-many relationships can be translated directly into declarative
ObjectStore constructs.