Data Services 1.4 Design Guide
Architecture
BaseServiceImpl is the central Java class of the data services server-side infrastructure. This abstract class provides the basic implementation of the grid service and its auditing support methods. The standard query mechanisms, such as query and queryWithEnumeration, are implemented by concrete classes that extend this base class.
When the data service starts up, the service class configures itself by reading from the ServiceConfigUtil class. This class wraps the ServiceConfiguration class that Introduce generates dynamically, and provides accessors for data-service-specific configuration properties. At this point, the data service initializes any required CQL validators, instantiates and configures its auditors, and creates a configured instance of the CQL query processor.
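The startup configuration step can be pictured as a thin, typed wrapper over raw key/value settings. The following is a minimal sketch, assuming the generated configuration can be modeled as a `Map<String, String>`; the class name `DataServiceConfigSketch` and every property key in it are hypothetical and do not correspond to the actual ServiceConfigUtil API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch of a ServiceConfigUtil-style wrapper around generated configuration.
// ASSUMPTION: the raw configuration is a Map<String, String>; every property
// name below is hypothetical, not a real caGrid key.
final class DataServiceConfigSketch {
    static final String VALIDATION_ENABLED = "example.validation.enabled";
    static final String VALIDATOR_CLASSES = "example.validator.classes";
    static final String CQL1_PROCESSOR_CLASS = "example.cql1.processor.class";
    static final String CQL2_PROCESSOR_CLASS = "example.cql2.processor.class";

    private final Map<String, String> raw;

    DataServiceConfigSketch(Map<String, String> raw) {
        // Defensive, read-only copy so later changes to the caller's map are not seen.
        this.raw = Collections.unmodifiableMap(new HashMap<>(raw));
    }

    // Any value other than "true" (case-insensitive), or no value at all, disables validation.
    boolean isValidationEnabled() {
        return Boolean.parseBoolean(raw.get(VALIDATION_ENABLED));
    }

    // Comma-separated validator class names; surrounding whitespace and empty entries are dropped.
    List<String> getValidatorClassNames() {
        List<String> names = new ArrayList<>();
        for (String part : raw.getOrDefault(VALIDATOR_CLASSES, "").split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    // Processor class names are optional; callers decide what a missing one means.
    Optional<String> getCql1ProcessorClassName() {
        return Optional.ofNullable(raw.get(CQL1_PROCESSOR_CLASS));
    }

    Optional<String> getCql2ProcessorClassName() {
        return Optional.ofNullable(raw.get(CQL2_PROCESSOR_CLASS));
    }
}
```

Keeping the parsing (comma splitting, boolean conversion) inside the wrapper means the service class only ever sees typed values, which is the main benefit of wrapping generated configuration rather than reading it directly.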
When a CQL or CQL 2 query is passed to the data service, the configuration is consulted to determine whether the query needs to be validated. If validation is enabled, the validator implementations named in the configuration are instantiated and the query is passed to them. If validation fails, an exception indicating the cause of the failure is thrown and returned to the client; otherwise, query processing proceeds.
Once validation is complete, the query is passed along to the CQL query processor implementation specified by the configuration. In the case of a CQL 2 query, the CQL 2 query processor is always invoked. If the service configuration does not specify a CQL 2 query processor implementation, query processing will fail at this point with an appropriate error returned to the client. CQL 1 queries may be processed by the CQL 2 processor after conversion, or by the CQL 1 query processor if one has been specified in the configuration. If the query fails at this point, an exception indicating the nature of the problem is thrown and returned to the client. Otherwise, the results of the query processing operation are returned.
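The validate-then-process flow described in the two paragraphs above can be sketched as follows. This is an illustration under simplifying assumptions, not the caGrid implementation: queries are plain XML strings, results are a list of strings, and all types (`QueryValidatorSketch`, `QueryProcessorSketch`, `QueryPipelineSketch` and the two exception classes) are hypothetical stand-ins. Unlike the description above, the sketch receives its validators already constructed rather than instantiating them per query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: a query is its XML text and a result set is a list of strings.
class MalformedQuerySketchException extends Exception {
    MalformedQuerySketchException(String message) { super(message); }
}

class QueryProcessingSketchException extends Exception {
    QueryProcessingSketchException(String message) { super(message); }
}

interface QueryValidatorSketch {
    // Throws, with a message explaining the cause, if the query is not acceptable.
    void validate(String query) throws MalformedQuerySketchException;
}

interface QueryProcessorSketch {
    List<String> process(String query) throws QueryProcessingSketchException;
}

final class QueryPipelineSketch {
    private final boolean validationEnabled;
    private final List<QueryValidatorSketch> validators;
    private final QueryProcessorSketch processor;

    QueryPipelineSketch(boolean validationEnabled, List<QueryValidatorSketch> validators,
                        QueryProcessorSketch processor) {
        this.validationEnabled = validationEnabled;
        this.validators = new ArrayList<>(validators);
        this.processor = processor;
    }

    List<String> query(String query) throws MalformedQuerySketchException, QueryProcessingSketchException {
        // Step 1: validate only if the configuration asks for it. The first failing
        // validator's exception propagates unchanged to the caller (the client).
        if (validationEnabled) {
            for (QueryValidatorSketch validator : validators) {
                validator.validate(query);
            }
        }
        // Step 2: hand the query to the configured processor; its failures propagate too.
        if (processor == null) {
            throw new QueryProcessingSketchException("No query processor is configured");
        }
        return processor.process(query);
    }
}
```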
Throughout this process, data service auditors are kept apprised of the current processing status through a series of callbacks. The caGrid Data Services tooling provides a single concrete implementation of the auditor interface, which logs information to the local file system. Service developers may add auditors that perform custom functionality.
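One way to structure the auditor callbacks is an interface with one method per processing event, plus a broadcaster that forwards each event to every registered auditor. The sketch below assumes that design; the interface and method names are hypothetical, and the recording auditor keeps events in memory as a stand-in for the bundled auditor, which logs to the local file system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical auditor callback interface (not the caGrid API). Default no-op
// methods let a custom auditor override only the events it cares about.
interface AuditorSketch {
    default void queryBegins(String caller, String query) {}
    default void validationFailed(String caller, String query, String reason) {}
    default void processingFailed(String caller, String query, String reason) {}
    default void resultsReturned(String caller, String query, int resultCount) {}
}

// Forwards each callback to every registered auditor, in registration order.
final class AuditorBroadcasterSketch implements AuditorSketch {
    private final List<AuditorSketch> auditors = new ArrayList<>();

    void register(AuditorSketch auditor) {
        auditors.add(auditor);
    }

    @Override
    public void queryBegins(String caller, String query) {
        for (AuditorSketch a : auditors) a.queryBegins(caller, query);
    }

    @Override
    public void validationFailed(String caller, String query, String reason) {
        for (AuditorSketch a : auditors) a.validationFailed(caller, query, reason);
    }

    @Override
    public void processingFailed(String caller, String query, String reason) {
        for (AuditorSketch a : auditors) a.processingFailed(caller, query, reason);
    }

    @Override
    public void resultsReturned(String caller, String query, int resultCount) {
        for (AuditorSketch a : auditors) a.resultsReturned(caller, query, resultCount);
    }
}

// In-memory stand-in for a logging auditor; it records only two of the four events.
final class RecordingAuditorSketch implements AuditorSketch {
    final List<String> events = new ArrayList<>();

    @Override
    public void queryBegins(String caller, String query) {
        events.add("begin " + caller);
    }

    @Override
    public void resultsReturned(String caller, String query, int resultCount) {
        events.add("results " + resultCount + " for " + caller);
    }
}
```

The default methods are a design choice of this sketch: adding a new callback later does not break existing custom auditors.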
Based on Standard Grid Services
The caGrid Data Services architecture is designed as a specialization of standard grid services. As such, some basic requirements for Data Services are immediately met:
- Security integration:
  - All security concepts that apply to any other grid service are immediately available and enforced on caGrid data services.
- Simplified creation tooling:
  - The Introduce Toolkit allows creation of grid services, including configuration of their metadata, security, and service functionality, through a simple graphical interface.
  - Introduce also provides a pluggable back-end architecture and graphical user interface, which allows specialized services to be created with minimal interaction from end users. Data Services use this extensibility to create services with a standard query interface, metadata, and core implementation.
Specialization of Features
Further requirements are met through implementation of a standardized query schema and client tooling to manipulate it:
- The standard query language is CQL 2, defined by a schema registered in the GME.
  - Java objects representing the CQL 2 query structure can be created and passed to the caGrid data service's executeQuery method.
  - The CQL 2 Java object model can be populated to describe the target data type and all qualifications and restrictions that the requested objects must meet.
  - Queries may be imported from an XML representation, whether stored on disk or obtained from any other source of String input.
- Backwards-compatible support for CQL 1:
  - Data services support both CQL 1 and CQL 2, but need only provide a CQL 2 query processor implementation to support both languages. In that case, the service infrastructure converts CQL 1 queries to CQL 2 and converts CQL 2 results back to CQL 1.
  - Services may optionally supply both a CQL 1 and a CQL 2 query processor. In that case, each query is handled by the processor for its own language version.
- Query results are described by a CQL 2 Results schema, also registered in the GME.
  - Client tooling is implemented as an iterator over the result set. The iterator can deserialize the XML returned in the CQL result object into a series of registered objects; an alternative implementation returns the XML without any processing.
- Data services expose a metadata document known as a domain model.
  - The domain model defines the data types exposed by the data service and their relationships to one another.
  - The model also contains semantic information, which allows data services to be discovered on the grid by concept code.
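The result-iterator idea in the list above can be sketched as an `Iterator` that deserializes each XML item only when it is requested, with the raw-XML variant obtained by plugging in an identity function. This minimal sketch assumes each result item arrives as a string of XML and that deserialization is a caller-supplied `Function`; `ResultIteratorSketch` and `rawXml` are hypothetical names, not the caGrid client API.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical client-side result iterator (not the caGrid client API). Each result
// item is assumed to arrive as one string of serialized XML.
final class ResultIteratorSketch<T> implements Iterator<T> {
    private final Iterator<String> xmlItems;
    private final Function<String, T> deserializer;

    ResultIteratorSketch(List<String> xmlItems, Function<String, T> deserializer) {
        this.xmlItems = xmlItems.iterator();
        this.deserializer = deserializer;
    }

    // The raw-XML variant: identical iteration, with no processing applied to each item.
    static ResultIteratorSketch<String> rawXml(List<String> xmlItems) {
        return new ResultIteratorSketch<String>(xmlItems, Function.<String>identity());
    }

    @Override
    public boolean hasNext() {
        return xmlItems.hasNext();
    }

    // Deserialization is lazy: an item is converted only when it is requested.
    @Override
    public T next() {
        if (!xmlItems.hasNext()) {
            throw new NoSuchElementException("No more results");
        }
        return deserializer.apply(xmlItems.next());
    }
}
```

Converting lazily means a client that stops early never pays for deserializing the remaining items.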
Service Interface
All caGrid data services implement a standard interface, defined in a WSDL document, that contains the standard 'query' method. This method takes a single CQL Query parameter and returns a single CQL Result Set object. All data services must follow this pattern, but are free to include additional methods, such as domain-specific querying and data upload capabilities.
To both simplify creation of data services and ensure interoperability between data services, the basic implementation of this query operation is provided by the caGrid data services infrastructure, and is imported into user-generated services as they are created.
The query result schema wraps the serialized XML of registered data objects. These objects are identified by their schemas, which are included in the WSDL of the caGrid Data Service. This enables clients to discover which data types are available from a given service.
Query Processors
Because caGrid data services are intended as an abstraction over an arbitrary underlying data resource, the data services infrastructure provides a way to customize how queries are executed against that resource. It supplies an abstract base class, known as the CQL query processor, which data providers implement to query their data source with CQL. A query processor implementation is expected to take a single CQL query and produce an appropriate result set. Query processors are pluggable into the data service infrastructure at runtime and are loaded via reflection. An implementation may declare the configuration properties it requires; the service developer can set these through a graphical interface in the Introduce toolkit, and they can also be set when the service is deployed. At runtime, these properties and their values are passed to the query processor implementation.
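The pluggable query processor pattern can be sketched as an abstract class that advertises its required properties with defaults, receives configured values at startup, and is instantiated by class name through reflection. All names here (`QueryProcessorBaseSketch`, `getRequiredParameters`, `load`, `PrefixingProcessorSketch`, the `prefix` property) are hypothetical, and queries and results are simplified to strings.

```java
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Hypothetical analogue of the abstract CQL query processor base class.
// Queries and results are simplified to strings; names are illustrative only.
abstract class QueryProcessorBaseSketch {
    private Properties configuration = new Properties();

    // Each implementation advertises the properties it needs, with default values,
    // so tooling can show them to the service developer or deployer.
    abstract Properties getRequiredParameters();

    abstract List<String> processQuery(String cqlQueryXml);

    // Supplied (developer or deploy-time) values override the advertised defaults.
    final void configure(Properties supplied) {
        Properties merged = new Properties();
        merged.putAll(getRequiredParameters());
        merged.putAll(supplied);
        configuration = merged;
    }

    protected final String getParameter(String name) {
        return configuration.getProperty(name);
    }

    // Instantiates the named class reflectively, checks its type, and configures it.
    static QueryProcessorBaseSketch load(String className, Properties supplied) {
        try {
            Class<? extends QueryProcessorBaseSketch> type =
                    Class.forName(className).asSubclass(QueryProcessorBaseSketch.class);
            QueryProcessorBaseSketch processor = type.getDeclaredConstructor().newInstance();
            processor.configure(supplied);
            return processor;
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot load query processor " + className, e);
        }
    }
}

// Hypothetical example processor: answers every query with one row carrying a configurable prefix.
class PrefixingProcessorSketch extends QueryProcessorBaseSketch {
    @Override
    Properties getRequiredParameters() {
        Properties defaults = new Properties();
        defaults.setProperty("prefix", "result:");
        return defaults;
    }

    @Override
    List<String> processQuery(String cqlQueryXml) {
        return Collections.singletonList(getParameter("prefix") + cqlQueryXml);
    }
}
```

Layering supplied values over the advertised defaults means a deployer only needs to set the properties that differ; a class name that cannot be loaded, or that does not extend the base class, fails at startup rather than on the first query.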
To help move existing silver-level data sources onto the grid, the caGrid data service infrastructure includes several implementations of the CQL query processor that query caCORE SDK-generated data sources. Serialization of SDK-generated objects is also configured automatically when the service is created through the Introduce Toolkit.
Query Processing Decision Tree
Since CQL 2 query processors are new for caGrid 1.4, some services upgraded from earlier versions of caGrid may not have CQL 2 query processors. However, since CQL 2 is a superset of CQL 1, only a CQL 2 query processor is required, and the data service infrastructure will translate the query and results appropriately. The data service infrastructure makes a best effort to handle both CQL 1 and 2 given the query processors it has available according to the following decision tree:
[Figure: query processing decision tree]
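Because the decision-tree figure is not reproduced here, the sketch below follows the rules stated in the Architecture section instead: CQL 2 queries always go to the CQL 2 processor and fail without one, while CQL 1 queries use a CQL 1 processor if one is configured, and otherwise are converted to CQL 2, processed, and have their results converted back. The processors and converters are modeled as plain functions over strings; all names are hypothetical, and the actual tree may include cases this sketch omits.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical dispatcher following the rules in the Architecture section.
// Processors and converters are plain functions over strings; either processor may be null.
final class QueryDispatcherSketch {
    enum Language { CQL1, CQL2 }

    private final Function<String, List<String>> cql1Processor;
    private final Function<String, List<String>> cql2Processor;
    private final UnaryOperator<String> cql1QueryToCql2;
    private final UnaryOperator<List<String>> cql2ResultsToCql1;

    QueryDispatcherSketch(Function<String, List<String>> cql1Processor,
                          Function<String, List<String>> cql2Processor,
                          UnaryOperator<String> cql1QueryToCql2,
                          UnaryOperator<List<String>> cql2ResultsToCql1) {
        this.cql1Processor = cql1Processor;
        this.cql2Processor = cql2Processor;
        this.cql1QueryToCql2 = cql1QueryToCql2;
        this.cql2ResultsToCql1 = cql2ResultsToCql1;
    }

    List<String> dispatch(Language language, String query) {
        if (language == Language.CQL2) {
            // CQL 2 queries always go to the CQL 2 processor; there is no fallback.
            if (cql2Processor == null) {
                throw new IllegalStateException("No CQL 2 query processor is configured");
            }
            return cql2Processor.apply(query);
        }
        // CQL 1 query: a native CQL 1 processor, when configured, takes precedence.
        if (cql1Processor != null) {
            return cql1Processor.apply(query);
        }
        // Otherwise convert the query up to CQL 2, process it, and convert the results back.
        if (cql2Processor != null) {
            List<String> cql2Results = cql2Processor.apply(cql1QueryToCql2.apply(query));
            return cql2ResultsToCql1.apply(cql2Results);
        }
        throw new IllegalStateException("No query processor can handle a CQL 1 query");
    }
}
```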
Data Service Styles
Main Article: Data Service Styles
The data service creation and modification system is pluggable, allowing for specialized data services to be easily created and configured with the Introduce toolkit. The style concept insulates the data service core infrastructure from the complexities associated with supporting various data sources, such as the caCORE SDK.
Querying
Data Services are accessed via their query() method, which takes a single CQL Query parameter, or their executeQuery() method, which takes a single CQL 2 query parameter. Both methods can throw a MalformedQueryException or a QueryProcessingException to indicate error conditions.
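On the client side, the two exception types let a caller tell a rejected query apart from a failure during processing. The sketch below assumes hypothetical stand-in types (`QueryableServiceStandIn`, `MalformedQueryStandIn`, `QueryProcessingStandIn`, `QueryClientSketch`) in place of the generated client and fault classes, and models results as strings.

```java
import java.util.List;

// Hypothetical stand-ins for the generated client and the two fault types.
class MalformedQueryStandIn extends Exception {
    MalformedQueryStandIn(String message) { super(message); }
}

class QueryProcessingStandIn extends Exception {
    QueryProcessingStandIn(String message) { super(message); }
}

interface QueryableServiceStandIn {
    List<String> executeQuery(String cql2QueryXml) throws MalformedQueryStandIn, QueryProcessingStandIn;
}

final class QueryClientSketch {
    enum Outcome { OK, BAD_QUERY, SERVICE_ERROR }

    // Runs one query, appending any results to sink and classifying the outcome.
    static Outcome run(QueryableServiceStandIn service, String query, List<String> sink) {
        try {
            sink.addAll(service.executeQuery(query));
            return Outcome.OK;
        } catch (MalformedQueryStandIn e) {
            // The query itself was rejected: resubmitting it unchanged will not help.
            return Outcome.BAD_QUERY;
        } catch (QueryProcessingStandIn e) {
            // The query was accepted but failed while the service was processing it.
            return Outcome.SERVICE_ERROR;
        }
    }
}
```

Treating a malformed query as a caller error (fix and resubmit) and a processing failure as a service-side problem is a suggested convention, not something this guide prescribes.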
Alternate delivery mechanisms, such as WS-Enumeration and caGrid Transfer, are also supported; each has its own specialized query methods and standard WSDL interface.
Creation
caGrid Data Services can be built with a set of extensions to the Introduce Toolkit. These extensions give grid service developers a simple, well-defined starting point for creating caBIG gold-compliant Data Services.
From 1.3 to 1.4
caGrid 1.4 maintains the principle of backwards compatibility with previous versions of caGrid, and this holds true for caGrid data services as well. Data services developed with caGrid 1.0, 1.1, 1.2, and 1.3 can be queried with the 1.4 data service client-side tooling. However, as with all other caGrid services, the libraries and internal APIs differ enough that data services developed with caGrid 1.4 should not be deployed alongside services from other versions. All new services should be developed with caGrid 1.4 to take advantage of new features and tooling, as well as bug fixes.
New features
New features exist in caGrid 1.4 data services which are not present in earlier versions. For clients to take advantage of these features, they must make use of the 1.4 client libraries or otherwise find ways to integrate with the new tooling.
CQL 2
CQL 2 is a revamped and enhanced version of CQL which draws on the experiences and community feedback regarding CQL in its earlier incarnation. All data service operations which previously handled CQL queries now have parallel operations which handle CQL 2. For a full overview of CQL 2 and its capabilities and tooling, please see the CQL 2 documentation.
caGrid Transfer results retrieval
For caGrid 1.4, data services now support the use of caGrid Transfer to retrieve CQL and CQL 2 query results. Please review the client API information regarding the use of data services with Transfer.
Integration with caCORE SDK 4.1+
The 4.1 / 4.1.1 release of caCORE SDK integrates CQL processing natively by using the CQL to HQL translator from caGrid 1.2's support for caCORE SDK version 4.0. This allows the new query processor for caCORE 4.1 to leverage this functionality and move the processing of CQL queries into the caCORE SDK application directly. The new features and functionality from caGrid 1.2's support for caCORE SDK 4.0 carry forward to the new version.
A new wizard has been created which simplifies creation of caGrid 1.4 data services backed by caCORE SDK 4.1.1 and 4.2. Service developers wishing to create data services backed by a caCORE SDK 4.1.1 or 4.2 system should begin by taking the tutorial. Query support for caCORE SDK 4.3 is accomplished by another caGrid library written for use by the caCORE SDK to process CQL queries against the ISO 21090 data types the caCORE SDK supports.
In all cases, support for CQL 2 query processing is accomplished in caGrid specific code outside of the caCORE SDK.