The pluggable filter/loader sub-system is responsible for extracting entities of interest from the target artifacts and storing them in the Spaces registry persistent store. It also discovers additional artifacts of interest that are logically related to the artifacts provided. The sub-system is designed to handle different artifact types, including network-based artifacts and artifacts stored in archives. Finally, the system is designed to be extensible, allowing others to provide filters for extracting entity information from other types of metadata files. The initial implementation will focus on the XSD and WSDL file types.

Architecture

The base architecture consists of four major sub-components (Dispatcher, Filter, Loader and BulkLoader) that receive input from external artifacts and persist the information gathered from them into the registry store. The components form a logical two-stage, sequential process. They are loosely coupled and share their state through an in-memory state-map.

[Figure: entity-property model diagram]

During the first stage, the FilterDispatcher analyses the list of target artifacts and dispatches each to a registered extract-function or Filter. During processing, the Filter stores information about discovered entities in the state-map and dispatches any newly discovered artifacts back to the FilterDispatcher for later processing.

Once the FilterDispatcher has processed all artifacts, the in-memory state is passed to the Loader classes. One or more Loader classes are invoked in sequence, each examining the state-map and storing the entities and properties in the registry persistent store.

The Dispatcher (FilterDispatcher)

As its name suggests, the FilterDispatcher() is the central work-management class of the sub-system. It acts as a router that passes items of work, in the form of Artifacts, to the appropriate Filter classes. Routing is currently done by file type (identified by extension or content). The dispatcher also handles file system directories and archive files, processing their contents. The Dispatcher's unit of work is the Artifact. Each Artifact represents a resource stream that defines entities that may be stored in the Registry store. An Artifact can be a file within an archive, an HTTP resource or a normal local file resource.

The main function of the Dispatcher is to manage an internal work queue. New items of work, in the form of Artifacts, can be added to the queue by calling the Dispatcher's newWork() method. The Dispatcher iterates over the work queue until all of the artifacts in the queue have been visited. At the end of the process, items that have been visited may have the ignored or error flags set as appropriate.
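The queue behaviour described above can be sketched roughly as follows. This is an illustration only, not the actual Dispatcher: the class names QueuedArtifact, ArtifactHandler and WorkQueueSketch are hypothetical, routing is reduced to a single callback, and skipping a location that has already been queued is an assumption made here so that circular references cannot loop forever.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical stand-in for an Artifact: a location plus the flags set when it is visited. */
class QueuedArtifact {
    final String location;
    boolean ignored; // no handler was interested in this artifact
    boolean error;   // a handler failed while processing it

    QueuedArtifact(String location) {
        this.location = location;
    }
}

/** Hypothetical callback standing in for the routing of an artifact to a Filter. */
interface ArtifactHandler {
    /** Returns false when the artifact is of no interest; may queue further work. */
    boolean handle(QueuedArtifact artifact, WorkQueueSketch dispatcher) throws Exception;
}

/** Sketch of the work-queue loop only. */
class WorkQueueSketch {
    private final Queue<QueuedArtifact> queue = new ArrayDeque<>();
    private final Set<String> known = new HashSet<>();

    /** Queues an artifact; a location that was queued before is skipped (assumption). */
    void newWork(String location) {
        if (known.add(location)) {
            queue.add(new QueuedArtifact(location));
        }
    }

    /** Visits artifacts until the queue is empty and returns them in visit order. */
    List<QueuedArtifact> run(ArtifactHandler handler) {
        List<QueuedArtifact> visited = new ArrayList<>();
        QueuedArtifact next;
        while ((next = queue.poll()) != null) {
            try {
                next.ignored = !handler.handle(next, this);
            } catch (Exception e) {
                next.error = true; // record the failure and carry on with the queue
            }
            visited.add(next);
        }
        return visited;
    }
}
```

Because the handler receives the dispatcher, a filter that discovers an imported schema while processing a WSDL file can call newWork() and have the import visited later in the same run.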

The Filter

As the Dispatcher visits files, it routes known file types to specific filters that support the Filter interface and are registered as being interested in the file. The role of these filters is to extract metadata from the artifacts and store it in an in-memory model. Validation of artifacts is outside the scope of Filters. A Filter implementor can therefore focus on the task of extracting the key information from a metadata source, as opposed to implementing a fully validating parser.
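To make the "extract, do not validate" point concrete, here is a minimal sketch. All names in it (ExtractFilter, NamespaceFilter, FilterRegistry) are hypothetical and do not reflect the real Filter interface; the state map is simplified to strings, and routing is by extension only. The filter pulls the targetNamespace attribute out of a schema with a regular expression and makes no attempt to check that the document is well formed.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Filter contract: read an artifact's text and record findings in the state. */
interface ExtractFilter {
    void extract(String artifactName, String content, Map<String, String> state);
}

/** Extracts a schema's targetNamespace with a regular expression; nothing is validated. */
class NamespaceFilter implements ExtractFilter {
    private static final Pattern TARGET_NS =
            Pattern.compile("targetNamespace\\s*=\\s*\"([^\"]*)\"");

    @Override
    public void extract(String artifactName, String content, Map<String, String> state) {
        Matcher m = TARGET_NS.matcher(content);
        if (m.find()) {
            state.put(artifactName, m.group(1));
        }
    }
}

/** Registry that routes by file extension, ignoring case. */
class FilterRegistry {
    private final Map<String, ExtractFilter> byExtension = new HashMap<>();

    void register(String extension, ExtractFilter filter) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), filter);
    }

    /** Returns the filter registered for the artifact's extension, or null if there is none. */
    ExtractFilter lookup(String artifactName) {
        int dot = artifactName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        return byExtension.get(artifactName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

A caller would register new NamespaceFilter() under "xsd", look up the filter for each artifact name, and skip artifacts for which lookup() returns null.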

The Loader

The Loader is the simplest piece of this sub-system and provides a loose coupling between the in-memory state created by the Filter classes and the persisted metamodel within the registry store. The design supports multiple Loader classes, each of which receives a reference to an opaque in-memory map containing the relevant state information. The state map is opaque in the sense that only a Filter and its corresponding Loader need to know the form of the data held in their subset of the global load state. In other words, exactly how a Loader interprets the state map and model provided by its Filter is down to the implementor of those components. It is envisaged that each type of filter will choose how its state information is stored and will have a companion Loader class that knows how to interpret that information and store it in the registry.

The goal of the loose coupling is to allow the filter-loader pairs and the registry store to evolve independently without any direct interaction; the link between the two is encapsulated within the loader classes.
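A rough sketch of this pairing is shown below. The names StateLoader, XsdFindings, XsdLoaderSketch and LoaderChain are hypothetical, and the registry store is stood in for by a plain list of strings. The point illustrated is that each loader picks out only the slice of the shared map it understands and leaves everything else alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical Loader contract: read the shared state, write to the store. */
interface StateLoader {
    void load(Map<Object, Object> state, List<String> store);
}

/** State owned by an imaginary XSD filter; only XsdLoaderSketch knows its shape. */
class XsdFindings {
    final List<String> elementNames = new ArrayList<>();
}

/** Companion loader: understands XsdFindings and ignores the rest of the map. */
class XsdLoaderSketch implements StateLoader {
    @Override
    public void load(Map<Object, Object> state, List<String> store) {
        Object mine = state.get(XsdFindings.class);
        if (!(mine instanceof XsdFindings)) {
            return; // the XSD filter never ran, so there is nothing to load
        }
        for (String name : ((XsdFindings) mine).elementNames) {
            store.add("xsd-element:" + name); // stand-in for a registry write
        }
    }
}

/** Invokes each loader in sequence over the same opaque state map. */
class LoaderChain {
    static void runAll(List<StateLoader> loaders, Map<Object, Object> state, List<String> store) {
        for (StateLoader loader : loaders) {
            loader.load(state, store);
        }
    }
}
```

Under this arrangement a change to the store's API would touch only the loader implementations, while a change to what a filter records would touch only that filter and its companion loader.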

Filter State

The output of the filter stage is held in memory in an abstract state object of type Map, which provides isolation between filter types. The state object persists across all calls to filters within the lifecycle of a single Dispatcher() instance. The aim is that each filter type (e.g. for XSD files) can store its own state object within the Map. This filter-specific state object will typically be created during the first call made to a specific filter type by an instance of Dispatcher(), which is usually the call to its setDispatcher() method.

As an example, the XSD filter creates a shared state object of type XsdState that is added to the Dispatcher's state map using the XsdState Class object as the key. This convention allows multiple state objects to be created and stored in the single shared state object maintained by the Dispatcher() instance. The XSD filter implementation is then free to populate this state object with any information generated by its filter processing. This state information is essentially opaque to the Dispatcher() and to any other class that does not have intimate knowledge of the XSD filter process. Usually, for every filter implementation there will be a corresponding loader class implementation that can interpret the state and process it usefully.
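The Class-keyed convention might look something like the sketch below. XsdState and setDispatcher() are named in the text, but their shapes here are guesses; StateOwningDispatcher, XsdFilterSketch, stateFor(), stateCount() and record() are hypothetical helpers introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Assumed shape for the XSD filter's private state; the real XsdState may differ. */
class XsdState {
    final List<String> schemas = new ArrayList<>();
}

/** Minimal dispatcher stand-in that only owns the shared state map. */
class StateOwningDispatcher {
    private final Map<Class<?>, Object> state = new HashMap<>();

    /** Returns the object stored under the given class, creating it on first use. */
    <T> T stateFor(Class<T> type, Supplier<T> factory) {
        return type.cast(state.computeIfAbsent(type, k -> factory.get()));
    }

    /** Number of distinct filter-specific state objects currently held. */
    int stateCount() {
        return state.size();
    }
}

/** Filter stand-in that obtains its state when it is attached to a dispatcher. */
class XsdFilterSketch {
    private XsdState state;

    void setDispatcher(StateOwningDispatcher dispatcher) {
        // The first filter to be attached creates the XsdState; later ones share it.
        state = dispatcher.stateFor(XsdState.class, XsdState::new);
    }

    void record(String schemaName) {
        state.schemas.add(schemaName);
    }
}
```

Using the Class object as the key means two filter types cannot collide on a key by accident, and the cast in stateFor() keeps the lookup type-safe for the filter and loader that own the entry while the map stays opaque to everyone else.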

Although implementation-specific and not enforced by the architecture, the XsdState class also maintains a separate Map that holds the collections of metadata models discovered by the extraction process. The corresponding loader will navigate through this state and translate these models into the appropriate form to be stored within the registry.

Transformation

The final part of the process involves the Loaders transforming the in-memory state into the Entity/Property metamodel that is used to persist the metadata into the Registry store. Each type of metadata object will use its own metamodel, reusing as much of the core model framework as possible. Core metamodel constructs are documented in a separate tech-note. The load process itself is performed in two phases. The first phase creates the newly discovered entities within the store. The second phase then creates the relationships between them.
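The two-phase load can be illustrated with the sketch below. DiscoveredItem, InMemoryStore and TwoPhaseLoad are hypothetical names, and the store is a toy stand-in for the registry rather than its real API. A presumable benefit of the ordering, shown here, is that a relationship can point at an entity that was discovered later, because every entity already exists by the time relationships are created.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model item: a named entity plus the names it refers to. */
class DiscoveredItem {
    final String name;
    final List<String> references = new ArrayList<>();

    DiscoveredItem(String name, String... references) {
        this.name = name;
        this.references.addAll(Arrays.asList(references));
    }
}

/** Toy stand-in for the registry store. */
class InMemoryStore {
    final Map<String, Integer> entityIds = new LinkedHashMap<>();
    final List<String> relationships = new ArrayList<>();

    /** Creates the entity if it is new and returns its id. */
    int createEntity(String name) {
        Integer id = entityIds.get(name);
        if (id == null) {
            id = entityIds.size() + 1;
            entityIds.put(name, id);
        }
        return id;
    }

    /** Records a relationship; both ends must already exist in the store. */
    void relate(String from, String to) {
        if (!entityIds.containsKey(from) || !entityIds.containsKey(to)) {
            throw new IllegalStateException("both entities must exist before relating them");
        }
        relationships.add(from + " -> " + to);
    }
}

class TwoPhaseLoad {
    static void load(List<DiscoveredItem> items, InMemoryStore store) {
        // Phase one: create every discovered entity.
        for (DiscoveredItem item : items) {
            store.createEntity(item.name);
        }
        // Phase two: create the relationships between the entities.
        for (DiscoveredItem item : items) {
            for (String target : item.references) {
                store.relate(item.name, target);
            }
        }
    }
}
```

In this sketch a reference to a name that was never discovered makes relate() throw; how the real loaders treat dangling references is not covered by this note.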

This high-level introduction to the filter/loader model is not meant as a development guide and only provides an overview of the framework. A subsequent tech-note will provide an end-to-end walk-through of implementing a new filter.