The Metadata Model Service (MMS) provides capabilities for developers using caGrid to generate and add service metadata. Operations for generating Domain Model Metadata provide capabilities for MMS clients to generate caGrid standard Data Service metadata. Related operations for annotating standard caGrid Service Metadata provide capabilities for clients to augment standard caGrid service metadata with information from metadata registries, such as the caDSR.
The purpose of this document is to describe the architecture of the Metadata Model Service (MMS) grid service. It is intended to help developers interested in extending or modifying the MMS.
The MMS is a stateless, single-resource grid service created with Introduce. As such, it follows the standard Introduce package breakdown into client, common, service, and resource. The MMS also contains some additional packages which are specific to its implementation. The organization and purpose of the MMS packages are described in the table below.
|Package in org.cagrid.mms||Description|
|client||The Introduce generated client package|
|common||The Introduce generated common package|
|domain||The MMS domain model|
|service||The Introduce generated service package, with additional classes for the spring-loaded implementation|
|service.impl||The MMS interface definitions|
|service.impl.cadsr||The caDSR-based implementation of the MMS interface|
|stubs||The Introduce/Axis generated protocol|
The service-side implementation of the MMS is a standard Introduce-generated service infrastructure. However, rather than the standard service "*Impl" class being used to implement the business logic, the MMS leverages Spring's dependency injection capabilities to load and configure the business logic from a deploy-time specified configuration file. The default implementation makes use of the caDSR as its sole external metadata registry.
As the MMS provides no "management" operations for the metadata models it expects its clients to reference, it is generally expected to be deployed referencing an external metadata registry. For example, the default implementation is configured to make use of caBIG's Cancer Data Standards Repository (caDSR). This external registry provides the means to add, modify, delete, or otherwise manage the UML models and their correspondence to XML Schemas which the MMS leverages. The details of that registry are of no concern to MMS clients; they only need to know that a particular MMS deployment is able to leverage it as a metadata registry (which is described in the Model Source Metadata). The MMS can similarly be configured for any other metadata registry which meets the requirements below.
In order to integrate an external metadata registry with the MMS, there are some basic requirements the registry must be able to satisfy. First, the MMS's main purpose is annotating existing Service Metadata, using a registry-specific means of looking up UML models for XML Schemas, and generating full UML model representations for an identified subset of a particular project in the registry. As such, the registry is required to provide an identifiable collection of UML Models in sufficient detail to reproduce a UML diagram (e.g. DomainModel). It is strongly suggested that the registry also provide a mapping of those models to their representative XML Schema manifestations (in support of the annotateServiceMetadata operation). This is not strictly required, as the annotateServiceMetadata operation can simply return the provided ServiceMetadata instance if no mapping is possible. The second main requirement is that the registry be able to identify its managed UML Projects through consistent means. That is, the MMS client API uses UMLProjectIdentifier (see the MMS Domain Model below) to identify projects of interest. As such, the registry must assign an identifier and optional version to each project, and where this identifier is not sufficient to uniquely identify a project, it may make use of optional or required additional registry-specific properties. For example, the table below shows how the caDSR MMS implementation maps a caDSR Project to a UMLProjectIdentifier.
|caDSR Grid Service||MMS Service|
|gov.nih.nci.cadsr.umlproject.domain.Project class||org.cagrid.mms.domain.UMLProjectIdentifier class|
|shortName attribute||identifier attribute|
|version attribute||version attribute|
|gmeNamespace attribute||gmeNamespace additional source property|
|longName attribute||longName additional source property|
|publicID attribute||publicID additional source property|
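The mapping in the table can be sketched in a few lines of Java. This is only an illustration: `CaDsrProject` and `ProjectIdentifier` are hypothetical, simplified stand-ins for the real `Project` and `UMLProjectIdentifier` classes (whose actual APIs may differ), and all sample values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for gov.nih.nci.cadsr.umlproject.domain.Project
// and org.cagrid.mms.domain.UMLProjectIdentifier; the real classes may differ.
class ProjectIdentifierMappingSketch {

    /** Only the caDSR Project attributes listed in the table above. */
    static final class CaDsrProject {
        final String shortName, version, gmeNamespace, longName, publicID;

        CaDsrProject(String shortName, String version, String gmeNamespace,
                     String longName, String publicID) {
            this.shortName = shortName;
            this.version = version;
            this.gmeNamespace = gmeNamespace;
            this.longName = longName;
            this.publicID = publicID;
        }
    }

    /** Identifier, optional version, and registry-specific name/value properties. */
    static final class ProjectIdentifier {
        final String identifier;
        final String version; // may be null
        final Map<String, String> sourceProperties = new LinkedHashMap<>();

        ProjectIdentifier(String identifier, String version) {
            this.identifier = identifier;
            this.version = version;
        }
    }

    /** Applies the table's mapping; attributes that are null are not added. */
    static ProjectIdentifier toIdentifier(CaDsrProject project) {
        ProjectIdentifier id = new ProjectIdentifier(project.shortName, project.version);
        putIfPresent(id.sourceProperties, "gmeNamespace", project.gmeNamespace);
        putIfPresent(id.sourceProperties, "longName", project.longName);
        putIfPresent(id.sourceProperties, "publicID", project.publicID);
        return id;
    }

    private static void putIfPresent(Map<String, String> target, String name, String value) {
        if (value != null) {
            target.put(name, value);
        }
    }

    public static void main(String[] args) {
        // All values here are made up for illustration.
        CaDsrProject project = new CaDsrProject(
                "SampleProject", "1.0", "gme://sample.example/1.0", "Sample Project", null);
        ProjectIdentifier id = toIdentifier(project);
        System.out.println(id.identifier + " " + id.version + " " + id.sourceProperties);
    }
}
```

Keeping the registry-specific attributes in a generic name/value map is what lets the same identifier type serve registries other than the caDSR.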
The MMS service relies heavily on the standard caGrid Metadata Models for its operations, but has additional classes for specifying parameters to its operations, and advertising its capabilities via service metadata (ModelSourceMetadata). The actual model itself is generated automatically by Axis, from the XML Schemas used by the MMS. As such, the object model is serialized automatically by Axis when exchanged between client and service.
As a general framework for creating and annotating caGrid service metadata, the MMS can be configured to communicate with nearly any external metadata repository. In order for clients to ascertain the metadata repositories available for a given MMS service, the MMS provides Model Source Metadata which can be retrieved by a client and interrogated for such information.
The Model Source Metadata contains a defaultSourceIdentifier and a collection of supported Model Sources, which describe the requirements and capabilities of the external metadata repositories. Each repository, a Model Source, is described by a SourceDescriptor in the Model Source Metadata. The Model Source is identified by its identifier, the unique name used to refer to it in the MMS's operations. Its description attribute is used solely for human-readable presentation of the source. Each Model Source also has an optional collection of PropertyDescriptors, each of which describes a Property (a name-value pair) available for use in UMLProjectIdentifiers within the described external metadata repository. That is, the name of the descriptor corresponds to the name that would be used in a Property. The PropertyDescriptor has two additional attributes: a description attribute which describes what the Property represents, and a required boolean attribute which indicates whether UMLProjectIdentifiers for the given source are required to provide a value for this Property.
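A client could use this metadata to check a project identifier before calling the service. The sketch below assumes simplified, hypothetical versions of SourceDescriptor and PropertyDescriptor that carry only the attributes described above; the helper `missingRequiredProperties` is not part of the MMS API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the Model Source Metadata types.
class ModelSourceMetadataSketch {

    static final class PropertyDescriptor {
        final String name;
        final String description;
        final boolean required;

        PropertyDescriptor(String name, String description, boolean required) {
            this.name = name;
            this.description = description;
            this.required = required;
        }
    }

    static final class SourceDescriptor {
        final String identifier;
        final String description;
        final List<PropertyDescriptor> properties;

        SourceDescriptor(String identifier, String description,
                         List<PropertyDescriptor> properties) {
            this.identifier = identifier;
            this.description = description;
            this.properties = properties;
        }
    }

    /** Returns the names of required properties that the client did not supply. */
    static List<String> missingRequiredProperties(SourceDescriptor source,
                                                  Map<String, String> supplied) {
        List<String> missing = new ArrayList<>();
        for (PropertyDescriptor descriptor : source.properties) {
            if (descriptor.required && !supplied.containsKey(descriptor.name)) {
                missing.add(descriptor.name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative source description; names and values are made up.
        SourceDescriptor source = new SourceDescriptor(
                "sample-registry", "An illustrative registry",
                Arrays.asList(
                        new PropertyDescriptor("publicID", "Registry-assigned id", true),
                        new PropertyDescriptor("longName", "Human readable name", false)));
        Map<String, String> supplied = new HashMap<>();
        supplied.put("longName", "Sample Project");
        System.out.println(missingRequiredProperties(source, supplied)); // prints [publicID]
    }
}
```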
The main service-side implementation classes of the MMS are shown below. The service implementation class, MetadataModelServiceImpl, reads the settings in the MetadataModelServiceConfiguration class, and uses them to instantiate an implementation of the MMS interface via Spring. The MMS implementation class implements the core business logic of the MMS, and defines the standard operations and their use of the MMSGeneralException. These components are described in more detail in the following sections.
The MMS interface mirrors the service operations of the MMS fairly directly, and allows the general service to be easily used for any external metadata registry which can map between XML Schema and UML Models. These operations are described in detail in the service operation section of the MMS Developers Guide.
The MMS service implementation (MetadataModelServiceImpl) uses Spring's XmlBeanFactory to load and configure the MMS interface implementation, using a configuration file and a property file which are specified via Introduce service properties. The MMS Administrators Guide configuration section provides extensive documentation of this process. As such, the MMS business logic can easily be modified by plugging in custom code or configuration settings, without needing to modify any of the MMS service's code or its build/deploy process.
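The idea of selecting the business logic at deploy time can be illustrated without Spring. The sketch below is a plain-Java stand-in, not the MMS's actual wiring: it reads an implementation class name from a `Properties` object and instantiates it reflectively. The `ModelService` interface, the `SampleRegistryService` class, and the `mms.implementation.class` key are all hypothetical.

```java
import java.util.Properties;

// Plain-Java stand-in for configuration-driven loading; the real MMS uses a
// Spring bean definition file rather than this reflective lookup.
class PluggableImplementationSketch {

    /** Hypothetical, heavily reduced version of the MMS interface. */
    interface ModelService {
        String defaultSourceIdentifier();
    }

    /** Hypothetical implementation, selected only through configuration. */
    static final class SampleRegistryService implements ModelService {
        SampleRegistryService() {
        }

        @Override
        public String defaultSourceIdentifier() {
            return "sample-registry";
        }
    }

    /** Instantiates whichever implementation the configuration names. */
    static ModelService load(Properties config) {
        String className = config.getProperty("mms.implementation.class");
        if (className == null) {
            throw new IllegalStateException("mms.implementation.class is not set");
        }
        try {
            Class<?> type = Class.forName(className);
            return (ModelService) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        // In a deployment this value would come from a file, not from code.
        config.setProperty("mms.implementation.class",
                SampleRegistryService.class.getName());
        ModelService service = load(config);
        System.out.println(service.defaultSourceIdentifier());
    }
}
```

The service code depends only on the interface, so swapping the registry means changing configuration rather than recompiling the service.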
The classes involved in the default implementation of the MMS interface, which leverages the caDSR as its external metadata registry, are shown below.
The CaDSRMMSImpl is the implementation of the MMS interface, which acts as the entry point to the caDSR-based implementation. It is responsible for constructing the appropriate MetadataModelSource instance, which indicates the allowable caDSR-specific properties the service will support. For generating domain models and annotating service metadata, it delegates to the DomainModelBuilder and ServiceMetadataAnnotator respectively. Both of these components leverage the caCORE ApplicationService for communicating with the caDSR.
The caCORE 4.0 ApplicationService is used by various other components to access information in the caDSR. It provides a simple query API which is available for remote access. The component APIs that make use of the ApplicationService generally take an instance of it in their constructor. The CaDSRMMSImpl creates an instance by using the ApplicationServiceProvider.getApplicationServiceFromUrl(url) call, where the URL is looked up in a mapping of MMS Source Identifier to URL provided via the Spring configuration.
The ApplicationService is used to issue both HQL (Hibernate Query Language) queries and simple "query by example" queries. The two caCORE models which are consulted are the gov.nih.nci.cadsr.domain model, which represents the caDSR information, and the gov.nih.nci.cadsr.umlproject.domain model, which represents a UML-like view of information in the caDSR.
The DomainModelBuilder provides the ability to generate caGrid standard Data Service metadata instances for projects registered in the caDSR. The Data Service metadata describes the information model being exposed by a Data Service. For more information on the model, consult the caGrid metadata overview. The DomainModelBuilder uses the ApplicationService to access the necessary information from the caDSR. It mostly uses optimized HQL queries, utilizing eager association fetching where appropriate. The majority of the work performed by the DomainModelBuilder is simply aggregating and transforming information in the caDSR into the format necessary to describe the DomainModel metadata of Data Services.
As much of the necessary extraction and transformation is independent, and the information is located in a remote system where network delays slow down computation, the DomainModelBuilder benefits greatly from parallelism. In order to achieve this parallelism, the builder employs a work/thread pool. The commonj WorkManager, distributed with Globus, is used for this purpose. In this framework, the work to be done is modeled as implementations of the Work interface, and Work items are scheduled with the WorkManager for execution by a configurable pool of threads. This provides a mechanism to manage the amount of system resources consumed by the service for the purposes of scheduling background tasks. Each task scheduled beyond the maximum number of worker threads is placed in a priority queue for processing once a currently executing task completes and a thread becomes available. The DomainModelBuilder utilizes the WorkManager by creating Work items for each UML Class and UML Attribute it processes in the model.
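The work-pool pattern can be sketched with the JDK's `java.util.concurrent` package, used here as a substitute for the commonj WorkManager so that the example is self-contained. Two differences are worth noting: the fixed thread pool below queues excess tasks in FIFO order rather than in a priority queue, and `describeClass` is a hypothetical placeholder for the remote caDSR lookups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of per-class work items run on a bounded pool. java.util.concurrent is
// a stand-in for the commonj WorkManager described above.
class ParallelModelBuildSketch {

    /** Hypothetical placeholder for a remote lookup and transformation. */
    static String describeClass(String className) {
        return "UMLClass(" + className + ")";
    }

    /** Schedules one task per class and collects results in input order. */
    static List<String> buildAll(List<String> classNames, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (final String name : classNames) {
                Callable<String> work = () -> describeClass(name);
                // Tasks beyond poolSize wait in the pool's queue for a free thread.
                futures.add(pool.submit(work));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that work item is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while building model", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Work item failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Class names other than Gene are made up for illustration.
        List<String> names = Arrays.asList(
                "gov.nih.nci.cabio.domain.Gene", "example.ClassA", "example.ClassB");
        for (String description : buildAll(names, 2)) {
            System.out.println(description);
        }
    }
}
```

Bounding the pool size caps how many remote queries are in flight at once, which is the resource-management benefit the paragraph above describes.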
The DomainModelBuilder provides four variants of the operation used for creating domain model instances. Each method takes as input a representation of the caDSR Project for which the DomainModel should be created.
- createDomainModelForProject takes only the project description, and generates a model which describes the entire domain model being exposed for the project.
- createDomainModelForPackage additionally takes an array of Strings which represent the names of the UML Packages in the Project which should be exposed. It generates a model which exposes all UML Classes in the named UML Packages. Any associations to UML Classes outside of the specified packages are not exposed.
- createDomainModelForClasses instead takes an array of Strings which represent the fully qualified names of the UML Classes which should be exposed in the model. Any association which references a class not specified is omitted.
- createDomainModelForClassesWithExcludes takes the same array of fully qualified UML Class names, and also takes an array of UMLAssociationExcludes which can be used to exclude specific associations from the model (in addition to the already excluded associations which reference classes not specified in the array of class names).
The UMLAssociationExclude class allows the client to specify a sourceRoleName, sourceClassName, targetRoleName, and targetClassName. Any UML Association which would otherwise be included in the computed subset of the DomainModel is omitted if it meets the criteria described by any of the UMLAssociationExcludes. The value of any attribute of the UMLAssociationExclude can be the wildcard "*", which indicates it should match anything. As such, specifying an exclude with "*" as the value for all attributes would effectively omit all associations from the DomainModel. By using no wildcards, a single association can be omitted, and by using a combination of some values and some wildcards, groups of associations can be omitted. For example, specifying an exclude instance with a sourceClassName value of "gov.nih.nci.cabio.domain.Gene" and wildcards for all other attributes would effectively omit any associations from the DomainModel where gov.nih.nci.cabio.domain.Gene was the source of the association.
Using these methods, in combination with the service-provided methods for finding all Projects, Packages, Classes, and Associations, a DomainModel exposing any subset of Classes and Associations can be created. The DomainModelBuilder API allows a client to simply identify the items to be exposed in the Project, and it does the work to create the conforming DomainModel instance.
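The association filtering rules can be sketched as follows. This is an illustration under stated assumptions, not the DomainModelBuilder's code: `Association` and `Exclude` are hypothetical simplifications, the wildcard is assumed to be written `"*"`, and every class name other than gov.nih.nci.cabio.domain.Gene is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the subset rules: keep an association only if both ends are exposed
// and no exclude matches it.
class AssociationExcludeSketch {

    static final String WILDCARD = "*"; // assumed wildcard token

    static final class Association {
        final String sourceClassName, sourceRoleName, targetClassName, targetRoleName;

        Association(String sourceClassName, String sourceRoleName,
                    String targetClassName, String targetRoleName) {
            this.sourceClassName = sourceClassName;
            this.sourceRoleName = sourceRoleName;
            this.targetClassName = targetClassName;
            this.targetRoleName = targetRoleName;
        }
    }

    static final class Exclude {
        final String sourceClassName, sourceRoleName, targetClassName, targetRoleName;

        Exclude(String sourceClassName, String sourceRoleName,
                String targetClassName, String targetRoleName) {
            this.sourceClassName = sourceClassName;
            this.sourceRoleName = sourceRoleName;
            this.targetClassName = targetClassName;
            this.targetRoleName = targetRoleName;
        }

        /** An exclude matches only when all four attributes match. */
        boolean matches(Association a) {
            return fieldMatches(sourceClassName, a.sourceClassName)
                    && fieldMatches(sourceRoleName, a.sourceRoleName)
                    && fieldMatches(targetClassName, a.targetClassName)
                    && fieldMatches(targetRoleName, a.targetRoleName);
        }
    }

    static boolean fieldMatches(String pattern, String value) {
        return WILDCARD.equals(pattern) || (pattern != null && pattern.equals(value));
    }

    static List<Association> select(List<Association> all, Set<String> exposedClasses,
                                    List<Exclude> excludes) {
        List<Association> kept = new ArrayList<>();
        for (Association a : all) {
            boolean bothEndsExposed = exposedClasses.contains(a.sourceClassName)
                    && exposedClasses.contains(a.targetClassName);
            boolean excluded = false;
            for (Exclude exclude : excludes) {
                excluded = excluded || exclude.matches(a);
            }
            if (bothEndsExposed && !excluded) {
                kept.add(a);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String gene = "gov.nih.nci.cabio.domain.Gene";
        List<Association> all = Arrays.asList(
                new Association(gene, "gene", "example.Protein", "proteins"),
                new Association("example.Protein", "protein", gene, "genes"));
        Set<String> exposed = new HashSet<>(Arrays.asList(gene, "example.Protein"));
        // Omit every association whose source is Gene.
        List<Exclude> excludes = Arrays.asList(new Exclude(gene, "*", "*", "*"));
        System.out.println(select(all, exposed, excludes).size()); // prints 1
    }
}
```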
The ServiceMetadataAnnotator is similar to the DomainModelBuilder in that it creates caGrid metadata instances. However, the ServiceMetadataAnnotator produces the standard ServiceMetadata common to all caGrid services, and it requires the client to supply a partially populated model as input. The caGrid common service metadata specifies information about a grid service and its operations. For more information on the model, consult the caGrid metadata overview. The ServiceMetadataAnnotator takes this model and populates the UML and semantically oriented components by querying the caDSR appropriately. Specifically, it populates the semantically annotated UML Class information (similar to the type used in Data Service Domain Model metadata) for each input and output type of every operation the service provides. It does this by examining the XML Qualified Name (QName) of each type used in the signature of the operation and locating its UML equivalent in the caDSR using the caDSR 4.0 GME Namespace feature. In this approach, UML items (Projects, Classes, Attributes, etc.) in the caDSR are annotated with their representative XML Schema namespace. The ServiceMetadataAnnotator uses the QName to construct an appropriate "prototype" object (e.g. UMLClassMetadata), populating the gmeNamespace and gmeName attributes, and passes it to the ApplicationService.search operation.
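The prototype construction step can be sketched as below. `ClassPrototype` is a hypothetical stand-in for the caDSR UMLClassMetadata class, and mapping the QName's local part to gmeName is an assumption made for illustration; the subsequent search call is omitted because it requires a live ApplicationService.

```java
import javax.xml.namespace.QName;

// Sketch of turning an XML QName into a "query by example" prototype.
class GmePrototypeSketch {

    /** Hypothetical stand-in carrying only the two GME attributes. */
    static final class ClassPrototype {
        final String gmeNamespace;
        final String gmeName;

        ClassPrototype(String gmeNamespace, String gmeName) {
            this.gmeNamespace = gmeNamespace;
            this.gmeName = gmeName;
        }
    }

    /** Assumes namespace URI maps to gmeNamespace and local part to gmeName. */
    static ClassPrototype prototypeFor(QName type) {
        return new ClassPrototype(type.getNamespaceURI(), type.getLocalPart());
    }

    public static void main(String[] args) {
        // The namespace below is made up for illustration.
        QName type = new QName("gme://sample.example/1.0/example.domain", "Gene");
        ClassPrototype prototype = prototypeFor(type);
        System.out.println(prototype.gmeNamespace + " / " + prototype.gmeName);
    }
}
```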
In addition to the ServiceMetadata instance, the annotator also takes a Map of XML Namespace to QualifiedProject (which specifies an ApplicationService and UML Project to use) that the CaDSRMMSImpl creates based on the NamespaceToProjectMapping array passed to the annotateServiceMetadata operation. The NamespaceToProjectMapping provides a way for clients to give a hint to the MMS as to which Project (and Metadata Source) should be used for a particular Namespace. The caGrid Metadata Introduce Extension is an example where this capability is leveraged. The caDSR Data Type Discovery component of Introduce allows users to add models (XML Schemas) to their project by browsing the caDSR and selecting a Project/Package. When such a model is added, the component records its caDSR information (e.g. Project name and version). Later, when the caGrid Metadata Introduce Extension uses the MMS to annotate the service's metadata, it reads this information and creates a series of NamespaceToProjectMapping instances which map each XML Schema's Namespace to the appropriate caDSR Project. The MMS then uses this information to ensure the correct metadata is associated with the operations which make use of types from those XML Schemas. As such, the NamespaceToProjectMapping array gives clients the ability to control the MMS behavior when multiple external metadata repositories are supported and/or a given XML Schema Namespace ambiguously maps to multiple UML Models (i.e. multiple projects are reusing the same XML Schema).
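Converting the client's hints into a namespace lookup might look like the sketch below. The types are hypothetical simplifications of NamespaceToProjectMapping and QualifiedProject; in particular, falling back to a default source when a mapping names none is an assumption, and the real QualifiedProject also carries an ApplicationService, which is left out here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of building a Map of XML namespace to project from client hints.
class NamespaceMappingSketch {

    /** Client-supplied hint; sourceIdentifier may be null (assumed to mean default). */
    static final class NamespaceHint {
        final String namespace, sourceIdentifier, projectIdentifier, projectVersion;

        NamespaceHint(String namespace, String sourceIdentifier,
                      String projectIdentifier, String projectVersion) {
            this.namespace = namespace;
            this.sourceIdentifier = sourceIdentifier;
            this.projectIdentifier = projectIdentifier;
            this.projectVersion = projectVersion;
        }
    }

    /** Resolved project plus the metadata source that should be queried for it. */
    static final class QualifiedProject {
        final String sourceIdentifier, projectIdentifier, projectVersion;

        QualifiedProject(String sourceIdentifier, String projectIdentifier,
                         String projectVersion) {
            this.sourceIdentifier = sourceIdentifier;
            this.projectIdentifier = projectIdentifier;
            this.projectVersion = projectVersion;
        }
    }

    static Map<String, QualifiedProject> buildLookup(List<NamespaceHint> hints,
                                                     String defaultSourceIdentifier) {
        Map<String, QualifiedProject> lookup = new HashMap<>();
        for (NamespaceHint hint : hints) {
            String source = hint.sourceIdentifier != null
                    ? hint.sourceIdentifier : defaultSourceIdentifier;
            lookup.put(hint.namespace,
                    new QualifiedProject(source, hint.projectIdentifier, hint.projectVersion));
        }
        return lookup;
    }

    public static void main(String[] args) {
        // Namespaces, sources, and project names are made up for illustration.
        Map<String, QualifiedProject> lookup = buildLookup(Arrays.asList(
                new NamespaceHint("gme://sample.example/1.0/a", null, "SampleProject", "1.0"),
                new NamespaceHint("gme://sample.example/2.0/a", "other-registry",
                        "SampleProject", "2.0")),
                "sample-registry");
        QualifiedProject hit = lookup.get("gme://sample.example/1.0/a");
        System.out.println(hit.sourceIdentifier + " " + hit.projectIdentifier);
        // A namespace without a hint yields null; the annotator then has no hint to apply.
        System.out.println(lookup.get("gme://unmapped.example"));
    }
}
```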
The MMS Client API is a standard Introduce-generated client API, which mirrors the operations of the service, the details of which can be found in the relevant section of the MMS Developers Guide.
The design and use of the client are otherwise just as described in the Introduce documentation.
The MMS provides no direct user interface components, though the caDSR Type Discovery Introduce Extension makes use of the MMS to generate DomainModels for UML Packages in the caDSR. Information on the use of this component can be found in the metadata documentation.
The GME has an extensive suite of unit and system tests. The unit tests live within the GME project itself (globalModelExchange), but the system tests are within the caGrid system tests module (tests), which is not distributed with the release, in the globalModelExchangeTests project. Both sets of tests run automatically on a nightly basis on the caGrid Quality Dashboard.
The MMS currently has a small collection of unit tests that test the following aspects.
|Test package||Purpose|
|org.cagrid.mms||Tests the codebase for cyclic package dependencies|
|org.cagrid.mms.metadata||Tests that the grid service's metadata validates to the standard schema|
|org.cagrid.mms.test||Testing utilities and test base classes|
These tests can be run locally by typing the following command from the mms project:
This will run every unit test, and produce an error message upon any failures.
The MMS currently has a single system/integration test. The test case exercises the following steps:
- sets up a Tomcat container
- deploys the MMS service to the container
- checks the ModelSourceMetadata (non-null, has a default source which has details specified)
- stops the Tomcat container
- deletes the Tomcat container