Object Store

Slides by Kalpesh Kapoor

Abstract

ObjectStore is an object-oriented database management system (OODBMS) that provides a tightly integrated language interface to the traditional DBMS features of persistent storage, transaction management (concurrency control and recovery), distributed data access, and associative queries. ObjectStore was designed to provide a unified programmatic interface to both persistently allocated data (i.e., data that lives beyond the execution of an application program) and transiently allocated data (i.e., data that doesn't survive beyond an application's execution), with object-access speed for persistent data usually equal to that of an in-memory dereference of a pointer to transient data.

Goals

Close integration with programming language
- Persistence independent of type
- No need to inherit from a base class; in ODMG, by contrast, a persistent class must be declared with "public Persistent Object" as its base
- No new type system
- No translation code, no copying - i.e. no translation code between the disk-resident representation and the representation used during execution. The need for such a mapping of tuples to object data members is known as the "impedance mismatch".
- Locking and logging is automatic
- Expressive power of C++
- Reusability and libraries
- Conversion from existing application to add persistence
- Type checking - Applies to persistent as well as transient data.
High performance for target applications
- Temporal locality - Many data items will be used "mostly" by one user over a short period of time.
- Spatial locality - Often only a small portion of the database is used.
- Fine interleaving - Many database operations are small, so the per-operation overhead, including network overhead, should be low. Operations such as fetching an object, i.e. dereferencing a pointer, should be fast. To support this, ObjectStore uses indexes, query optimization, log-based recovery, and so on.
- Fast dereferencing
+ Usual DB goals

Application Interface

C Library
C++ Library
Extended C++
- persistent: keyword that specifies a storage class and associates a name with a persistent object, for example
  persistent<db> department *engg_dept;
- Alternative: department *engg_dept = db->find_object("engg_dept");
- Inverse member relationships
  - integrity constraints are maintained automatically: deleting the link on one side also deletes its inverse link
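
As a rough illustration of what an inverse member relationship automates, the following standard C++ sketch maintains both sides of a department/employee link by hand. It is not ObjectStore code: the `set_dept` helper and the use of `std::set` are assumptions made for this example, whereas ObjectStore derives the equivalent bookkeeping from the `inverse_member` declaration.

```cpp
#include <cassert>
#include <set>
#include <string>

struct Department;

struct Employee {
    std::string name;
    Department* dept = nullptr;     // "one" side of the relationship
};

struct Department {
    std::set<Employee*> employees;  // "many" side of the relationship
};

// Hypothetical helper: updates both directions together so the two sides
// can never disagree. Passing nullptr removes the employee from any department.
void set_dept(Employee& e, Department* d) {
    if (e.dept == d) return;
    if (e.dept) e.dept->employees.erase(&e);  // drop the old inverse link
    e.dept = d;
    if (d) d->employees.insert(&e);           // add the new inverse link
    assert(!d || d->employees.count(&e) == 1);
}
```

Moving an employee with `set_dept(ravi, &sales)` removes it from the old department's set in the same step, which is the invariant the declarative form guarantees without any hand-written helper.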

Examples from OBJECTSTORE

	class department
	{
	public:
	    // ObjectStore keeps employee::dept in step with this set.
	    os_set<employee *> employees inverse_member employee::dept;
	
	    void add_employee(employee *e)
	    {
	        employees->insert(e);
	    }
	
	    int works_here(employee *e)
	    {
	        return employees->contains(e);
	    }
	};

	void someFunction()
	{
	    employee *e;
	    department *d;
	    .
	    .
	
	    // Call incrementSalary(5) on every employee of department d.
	    foreach(e, d->employees)
	    {
	        e->incrementSalary(5);
	    }
	}

	int main()
	{
	    database *db;
	    persistent<db> department *engg_dept;   // named persistent object in db
	
	    db = database::open("/a/b");
	
	    transaction::begin();
	
	    // Allocate the new employee in the database, not on the transient heap.
	    employee *emp = new (db) employee("Ravi");
	
	    engg_dept->add_employee(emp);
	
	    emp->salary = 1000;
	
	    transaction::commit();
	}

Associative queries

Selections

	    os_set<employee *> all_emp;
	    os_set<employee *>& overpaid_emp = all_emp[: salary >= 1000 :];

Nested selections

	    // all employees who work in the same department as Fred
	    all_emp[: dept->employees[: name == "Fred" :] :];
	
	    // or, using the library interface:

	    all_emp.query("employee *", "dept->employees[: name == \"Fred\" :]");

No joins (as of 1991).
OQL is supported now, which permits SQL-like queries.
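
To make the meaning of the selection syntax concrete, here is a minimal sketch in standard C++ of what the two queries above compute, written as plain linear scans. The `Employee` and `Department` structs and both function names are assumptions for illustration; a real ObjectStore query may use an index instead of scanning.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Department;

struct Employee {
    std::string name;
    int salary;
    Department* dept;
};

struct Department {
    std::vector<Employee*> employees;
};

// Rough equivalent of all_emp[: salary >= min_salary :].
std::vector<Employee*> select_by_salary(const std::vector<Employee*>& all_emp,
                                        int min_salary) {
    std::vector<Employee*> result;
    for (Employee* e : all_emp)
        if (e->salary >= min_salary) result.push_back(e);
    return result;
}

// Rough equivalent of all_emp[: dept->employees[: name == who :] :]:
// keep every employee whose department has some member named `who`.
std::vector<Employee*> select_colleagues_of(const std::vector<Employee*>& all_emp,
                                            const std::string& who) {
    std::vector<Employee*> result;
    for (Employee* e : all_emp) {
        if (!e->dept) continue;
        for (Employee* c : e->dept->employees) {
            if (c->name == who) { result.push_back(e); break; }
        }
    }
    return result;
}
```

The nested form is an existence test: the inner selection only has to be non-empty for the outer element to qualify, which is why the inner loop stops at the first match.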

Versions

Multiple versions of object
Parallel editing - If many users want to update the same data concurrently, each can create an alternate version of their own.
Merging left to user
Versioning independent of type
Check out/Check in - A user checks out a version of an object, makes changes, and then checks the changes back in to the main development project so that they are visible to other members of the cooperating team.
Versioning for groups of objects in Workspaces
Workspaces - Users can control which version to use for each object or group of objects of interest by setting up private workspaces that specify the desired versions. These workspaces can be shared by different users.

Implementations

Access protection - ObjectStore requires that dereferencing a pointer to persistent data compile the same way as dereferencing a pointer to transient data, and take comparable time. A program must therefore be able to touch objects that have not yet been retrieved from the database. ObjectStore uses the CPU's virtual memory hardware for this: it marks pages as no access, read, write, or read-write, and the virtual memory hardware detects any access violation.
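
The general technique can be sketched with POSIX calls: reserve a page with no access rights, let the first touch fault, and have the fault handler make the page accessible and fill it in. This is only an illustration of the idea, not ObjectStore's implementation. All names are hypothetical, the string stands in for data fetched from the server, and calling `mprotect` from a signal handler is assumed to be acceptable on the target system (it is not on the POSIX async-signal-safe list).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

static char*  g_page = nullptr;              // one "persistent" page
static size_t g_page_size = 0;
static volatile sig_atomic_t g_faults = 0;   // number of faults handled

// Fault handler: plays the role of the client fetching a page on first touch.
static void on_fault(int, siginfo_t* info, void*) {
    char* addr = static_cast<char*>(info->si_addr);
    if (addr < g_page || addr >= g_page + g_page_size) _exit(1);  // unrelated crash
    mprotect(g_page, g_page_size, PROT_READ | PROT_WRITE);        // grant access
    const char data[] = "Ravi";              // stand-in for data read from the server
    for (size_t i = 0; i < sizeof data; ++i) g_page[i] = data[i];
    g_faults = g_faults + 1;
    // On return, the faulting instruction is restarted and now succeeds.
}

// Reserves one page with no access rights and installs the handler.
char* reserve_persistent_page() {
    g_page_size = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    void* p = mmap(nullptr, g_page_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);
    g_page = static_cast<char*>(p);

    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, nullptr);
    sigaction(SIGBUS, &sa, nullptr);         // some systems report SIGBUS instead
    return g_page;
}
```

After `reserve_persistent_page()`, an ordinary read such as `p[0]` triggers exactly one fault; later reads of the same page run at normal memory speed, which is the property the slide describes.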
Databases
- in files
- in disk partitions
Client server communication
- Page server implementation
  - Concurrency control, log based recovery
  - Two phase commit
- Communication via net/shared memory/local sockets
- Server knows nothing about page contents
- Client cache in client host
- Pages are flushed on commit but retained in the cache
- All processing at client side - This contrasts with traditional RDBMSs, in which the server is largely responsible for handling all query processing, optimization and formatting.
- Client side page cache
- On end of transaction
  - Unmap pages
  - Write modified pages back to server ( still in cache )
Cache coherence
- Page in client cache in shared or exclusive mode
- Server tracks which clients have page and in what mode
- Call back locking/invalidation
- On conflicting request server performs callback locking.
- Client returns 'called' page if/when not locked
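
The bookkeeping behind callback locking can be sketched as a small directory kept by the server. The class below is a hypothetical simplification in standard C++: it assumes every called-back client gives the page up immediately, whereas the notes say a client returns the page only if or when it is not locked.

```cpp
#include <cassert>
#include <map>
#include <set>

enum class Mode { Shared, Exclusive };

// Hypothetical server-side directory: which clients cache a page, in what mode.
class PageDirectory {
    struct Entry {
        std::set<int> holders;
        Mode mode = Mode::Shared;
    };
    std::map<int, Entry> pages_;

public:
    // Client asks for `page` in `mode`. Returns the clients that had to be
    // called back (asked to give the page up) before the request was granted.
    std::set<int> request(int page, int client, Mode mode) {
        Entry& e = pages_[page];
        std::set<int> callbacks;
        // Shared/shared is the only combination that does not conflict.
        bool conflict = (mode == Mode::Exclusive) || (e.mode == Mode::Exclusive);
        if (conflict) {
            for (int holder : e.holders)
                if (holder != client) callbacks.insert(holder);
            for (int c : callbacks) e.holders.erase(c);  // assume each complies
        }
        e.holders.insert(client);
        e.mode = mode;
        assert(e.mode == Mode::Shared || e.holders.size() == 1);
        return callbacks;
    }

    bool holds(int page, int client) const {
        auto it = pages_.find(page);
        return it != pages_.end() && it->second.holders.count(client) > 0;
    }
};
```

Two shared readers coexist without any callback; the first exclusive request for the same page calls both of them back.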
Swizzling
- Tag tables - give type implementation
- location of objects in page
- types of objects ( index on type table in schema )
- To minimize the space overhead while keeping access fast, the tag table is heavily compressed and indexed.
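
A minimal sketch of pointer swizzling by relocation, assuming the tag table and schema have already told us which words of a page hold pointers. The notes do not spell out ObjectStore's exact procedure, so the `PageImage` layout and the `swizzle` function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical page image: raw words plus the slots known (from the tag
// table and the schema) to contain pointers.
struct PageImage {
    std::vector<std::uint64_t> words;
    std::vector<std::size_t> pointer_slots;
};

// Relocates every pointer that targets [old_base, old_base + len) so that it
// targets the same offset from new_base. Non-pointer words and pointers into
// other regions are left untouched.
void swizzle(PageImage& page, std::uint64_t old_base, std::uint64_t len,
             std::uint64_t new_base) {
    for (std::size_t slot : page.pointer_slots) {
        assert(slot < page.words.size());
        std::uint64_t& p = page.words[slot];
        if (p >= old_base && p < old_base + len)
            p = new_base + (p - old_base);
    }
}
```

Knowing the type of each object is what makes this safe: without it, an integer that merely looks like an address in the old range could not be told apart from a real pointer.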
Placement of objects within the database
- Segments of database
- Segments can be specified when creating object
- Segment at-a-time transfer possible
- Objects can cross page boundaries

Clustering of objects
- of frequently accessed objects.
- Increases locality - client cache is efficiently used and fewer pages need to be transferred for accessing objects.
- Collection facilities
  - Cursors - For iterating.
  - Collections
    - os_collection
      - os_set, os_bag, os_list
      - various other representations based on a user-supplied access pattern.
    - Collection representation can change dynamically
      e.g. a set stored as an array or linked list changes to a B-tree as its size grows
      operations are dispatched at run time as virtual functions
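
The representation switch can be sketched in standard C++ as a set that starts as an array and migrates to a tree once it grows past a threshold, with every operation dispatched through virtual functions. Here `std::set` (a balanced tree) stands in for a B-tree, and the class names and the threshold are assumptions for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Abstract representation; callers never see which concrete one is in use.
struct SetRep {
    virtual ~SetRep() {}
    virtual bool insert(int v) = 0;
    virtual bool contains(int v) const = 0;
    virtual std::size_t size() const = 0;
    virtual std::vector<int> elements() const = 0;
};

struct ArrayRep : SetRep {   // cheap for small sets, linear search
    std::vector<int> items;
    bool contains(int v) const override {
        return std::find(items.begin(), items.end(), v) != items.end();
    }
    bool insert(int v) override {
        if (contains(v)) return false;
        items.push_back(v);
        return true;
    }
    std::size_t size() const override { return items.size(); }
    std::vector<int> elements() const override { return items; }
};

struct TreeRep : SetRep {    // logarithmic lookup for large sets
    std::set<int> items;
    bool contains(int v) const override { return items.count(v) > 0; }
    bool insert(int v) override { return items.insert(v).second; }
    std::size_t size() const override { return items.size(); }
    std::vector<int> elements() const override {
        return std::vector<int>(items.begin(), items.end());
    }
};

class AdaptiveSet {
public:
    explicit AdaptiveSet(std::size_t threshold = 8)
        : threshold_(threshold), rep_(new ArrayRep) {}

    void insert(int v) {
        rep_->insert(v);
        if (!is_tree_ && rep_->size() > threshold_) {
            // Migrate all elements to the tree representation.
            std::unique_ptr<SetRep> tree(new TreeRep);
            for (int x : rep_->elements()) tree->insert(x);
            assert(tree->size() == rep_->size());
            rep_ = std::move(tree);
            is_tree_ = true;
        }
    }
    bool contains(int v) const { return rep_->contains(v); }
    std::size_t size() const { return rep_->size(); }
    bool is_tree() const { return is_tree_; }

private:
    std::size_t threshold_;
    std::unique_ptr<SetRep> rep_;
    bool is_tree_ = false;
};
```

Client code calls `insert` and `contains` the same way before and after the migration, so the change of representation stays invisible to it.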
Optimization issues
- Relation statistics are not available at compile time: collections are not known by name and may be pointed to or produced as the result of an expression, so multiple strategies are needed.
- Optimizing path expression dereference, for example
  emp -> manager -> dept
  emp -> company -> dept
  paths are precomputed joins and can have indices
- Collections may be known at runtime only
- ObjectStore's C++ compiler parses and optimizes the query at compile time. If queries are expressed using library interface then they are parsed and optimized at run time.
- Multiple strategies, with selection at run time
- Mainly index access decisions
  PATHS are precomputed joins
  foreach ( p in r )
  printf(p -> mgr -> spouse -> name )
  // that is access via secondary index
Code generation
- two strategies (scan/index) for each node
- checks runtime what indices are available, etc.
- Join optimization problems occur when two path expressions are compared
- Regular path expressions resemble joins via secondary indices; if the data does not fit in memory this can lead to thrashing, but ObjectStore ignores this.
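
The "two strategies per node, chosen at run time" idea can be sketched as follows in standard C++. Both a scan and an index lookup are compiled into the query function, and the one that runs depends on whether an index exists when the query executes. The `People` class and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Person {
    std::string name;
    int age;
};

// Hypothetical collection with an optional index on `age`.
class People {
public:
    void add(const Person& p) {
        rows_.push_back(p);
        if (has_age_index_) age_index_.insert({p.age, rows_.size() - 1});
    }

    void create_age_index() {
        age_index_.clear();
        for (std::size_t i = 0; i < rows_.size(); ++i)
            age_index_.insert({rows_[i].age, i});
        has_age_index_ = true;
    }

    // Both strategies are compiled in; the choice is made at run time.
    std::vector<std::string> names_with_age(int age,
                                            std::string* strategy_used = nullptr) const {
        std::vector<std::string> out;
        if (has_age_index_) {
            auto range = age_index_.equal_range(age);
            for (auto it = range.first; it != range.second; ++it)
                out.push_back(rows_[it->second].name);
            if (strategy_used) *strategy_used = "index";
        } else {
            for (const Person& p : rows_)
                if (p.age == age) out.push_back(p.name);
            if (strategy_used) *strategy_used = "scan";
        }
        return out;
    }

private:
    std::vector<Person> rows_;
    std::multimap<int, std::size_t> age_index_;  // age -> row position
    bool has_age_index_ = false;
};
```

The same call returns the same rows either way; only the access path differs, which is why the decision can be deferred until the collection and its indices are known.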
Index maintenance
- Very expensive if every assignment needs an index check
- Data members that could potentially be used as index keys are declared as such, for example
  class Person {
  int age indexable; // an index may be built on age, but none is required to exist
  };
- Index on path
  - Path indices
    e.g. a -> b -> c == 21
  - Single step - more work to evaluate
    e.g. person -> child -> school == "IIT"
  - Full path - faster in above query but makes index update more difficult
  - Update affects not only local indices
    e.g. find people with a child named Fred
    class Person{
    os_set<Person *> child;
    }
    // Updates can be on new person, new child for person, name change for child, etc.
- When an indexable data member is updated
  - All affected access methods are updated
  - All access methods downstream in affected index paths are updated
- An update to a collection triggers updates that may affect all access methods of all indexes of the collection.
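
A minimal sketch of why a full-path index makes updates non-local, using the "people with a child named Fred" case. Renaming one child must touch the index entry of every parent of that child. The `ChildNameIndex` class, the back-pointer list, and the member names are assumptions for this example, not ObjectStore structures.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Person {
    std::string name;
    std::vector<Person*> children;
    std::vector<Person*> parents;   // back links used to find affected entries
};

// Hypothetical path index over the path person -> child -> name.
class ChildNameIndex {
public:
    // Update in the middle of the path: a person gains a child.
    void add_child(Person& parent, Person& child) {
        parent.children.push_back(&child);
        child.parents.push_back(&parent);
        by_child_name_[child.name].insert(&parent);
    }

    // Update at the end of the path: every parent of p is affected.
    void rename(Person& p, const std::string& new_name) {
        const std::string old_name = p.name;
        p.name = new_name;
        for (Person* parent : p.parents) {
            bool still_has_old = false;
            for (Person* c : parent->children)
                if (c->name == old_name) still_has_old = true;
            if (!still_has_old) by_child_name_[old_name].erase(parent);
            by_child_name_[new_name].insert(parent);
        }
    }

    std::set<Person*> parents_of_child_named(const std::string& n) const {
        auto it = by_child_name_.find(n);
        return it == by_child_name_.end() ? std::set<Person*>() : it->second;
    }

private:
    std::map<std::string, std::set<Person*> > by_child_name_;
};
```

The assignment that changes a name looks local, yet the index entries that change belong to other objects, which is the cost the "full path" bullet refers to.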
Performance
- Cold cache at client - The cache is empty. Cold-cache times are dominated by the time required to get data from the disk.
- Warm cache at client - A cold cache becomes warm after a number of iterations have run against it. Warm-cache times reflect the speed of processing data that is already present at the client.
- Cattell benchmark (now OO1 and OO7)
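
The cold/warm distinction can be made concrete by counting server fetches instead of measuring time. In this sketch the `ClientCache` class and its fake `fetch_from_server` function are hypothetical; a real benchmark would measure elapsed time against a real server.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical client-side page cache that counts trips to the server.
class ClientCache {
public:
    int read(int page_id) {
        auto it = pages_.find(page_id);
        if (it == pages_.end()) {
            ++server_fetches_;   // cold: the page must come from the server
            it = pages_.emplace(page_id, fetch_from_server(page_id)).first;
        }
        return it->second;       // warm: served from the local cache
    }
    std::size_t server_fetches() const { return server_fetches_; }

private:
    // Stand-in for a network round trip plus a disk read.
    static int fetch_from_server(int page_id) { return page_id * 10; }

    std::unordered_map<int, int> pages_;
    std::size_t server_fetches_ = 0;
};
```

A first pass over five pages costs five fetches; a second pass over the same pages costs none, which is why warm times reflect only client-side processing speed.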
Conclusion
- ObjectStore was designed to perform complex manipulations on large databases.
- It aims for ease of use, expressive power, code reusability and tight integration with the host environment, along with high performance
- To meet these goals virtual memory architecture is used
- Single type system
- ObjectStore's collection, relationship and query facilities support conceptual modeling constructs such as multivalued attributes and many-to-many relationships, which can be translated directly into declarative ObjectStore constructs.

Related:

Object Store Management Architecture

Fine-Grained Sharing in a Page Server OODBMS