Data Services 1.2 Developers Guide
The purpose of this guide is to describe the caGrid Data Service Infrastructure such that developers can make programmatic use of its client tools and extension points.
Overview
caGrid Data Services provide an object view of a data resource across the grid. The data resource is exposed through a well defined query method, which also relies on well defined query language objects to perform queries and return results as a strongly typed set. caGrid Data Services are designed to expose objects whose XML schemas are registered in the GME, and also expose metadata about those data objects derived from the caDSR. Data Services also provide support for integration with alternate results delivery mechanisms such as WS-Enumeration. Enabling these features adds new query methods to the Grid-facing API.
CQL
The caGrid Query Language, or CQL, is the query language used by all caGrid Data Services to express object-oriented queries against a data service. A query is an XML document conforming to a well defined schema with the URI http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery.
Essentially, a CQL query is a request for instances of a particular class. The instances returned can be constrained by the values of their attributes and by the kinds of objects they are associated with; the associated objects can in turn be constrained by the values of their own attributes. A CQL query also specifies which attributes should be included in the returned instances, or whether the result should just be a count of the instances.
The following XML schema diagram shows the required structure of a CQL query.
A CQL query consists of some or all the following XML elements:
- CQLQuery
A simple top-level XML element at the head of every CQL query document. A CQLQuery has no attributes. It contains a Target element and an optional QueryModifier element.
<CQLQuery>
<Target …> … </Target>
<QueryModifier …> … </QueryModifier>
</CQLQuery>
- Target
The Target XML element specifies the class that the query's result objects will be an instance of. It may contain elements that constrain which instances of the target class are included in the result.
A Target XML element has a required attribute named name. The value of the Target element's name attribute is the name of the class whose instances the query will return. We refer to the class named by the name attribute as the query's target class. The target class must exactly match a class named in the data service's domain model.
A Target element may contain either an Attribute, Association or Group element to constrain which instances of the target class are to be included in the result.
<Target name="foo.bar"/>
or
<Target name="foo.bar">
<Attribute …/>
</Target>
or
<Target name="foo.bar">
<Association …>
…
</Association>
</Target>
or
<Target name="foo.bar">
<Group …>
…
</Group>
</Target>
- QueryModifier
A QueryModifier is an XML element that specifies how objects will be included in the query's results. A QueryModifier element has a required attribute named countOnly. The value of the countOnly attribute must be either true or false. If the value of the countOnly attribute is true, then the result of the query will contain just the number of objects that match the query and will not contain any of the actual result objects.
If the value of a QueryModifier element's countOnly attribute is false, then the query's results will contain instances of the target class that match the query. Which of the class's attributes are included in the results depends on which AttributeNames elements, if any, the QueryModifier contains.
A QueryModifier element may contain zero or more AttributeNames elements. If the QueryModifier element contains zero AttributeNames elements, then the returned objects will contain just their id attribute. Every class in a data service's domain model should have an id attribute whose value uniquely identifies an instance of the class within the data service.
An AttributeNames element contains just the name of a single attribute of the target class. An AttributeNames element has no attributes or child elements. If a QueryModifier element contains any AttributeNames elements, then the query results will include each of the attributes named by the AttributeNames elements.
A QueryModifier element may also contain zero or more DistinctAttribute elements. A DistinctAttribute element contains just the name of a single attribute of the target class. A DistinctAttribute element has no attributes or child elements.
If a QueryModifier element contains DistinctAttribute elements, then the results of the query will only include instances of the target class whose attributes named by the DistinctAttribute elements have a unique combination of values. This means that some instances of the target class will be excluded from the query results. Which instances of the target class are excluded is left up to the data service.
The inclusion of a QueryModifier element in a query is optional. If a query does not contain a QueryModifier element, then the results of the query will contain the instances of the target class that match the query. The instances of the target class will contain just their id attribute.
Note that CQL does not know or care about the type of value (number, string, date, etc.) that a class's attributes may have.
<QueryModifier countOnly="true" />
or
<QueryModifier countOnly="false">
<AttributeNames>foo</AttributeNames>
<AttributeNames>bar</AttributeNames>
<AttributeNames>zip</AttributeNames>
⋮
<DistinctAttribute>foo</DistinctAttribute>
<DistinctAttribute>bar</DistinctAttribute>
</QueryModifier>
- Association
An Association XML element can appear as a child element of a Target, Association or Group XML element. Target, Association and Group elements have a target class. If a Target, Association or Group element has any child Association elements, the presence of the Association child elements places a constraint on which instances of the enclosing element's target class will be considered by the query. Instances of the enclosing XML element's target class are considered for a query only if they are associated with an instance of the child Association element's target class through the association identified by the child Association element.
Association XML elements have two attributes that are named roleName and name. The value of the roleName attribute must be the name of an association in the data service's data model that can be navigated away from the enclosing XML element's target class. The value of the name attribute must be the name of the class that the association can be used to navigate to. This class is considered to be the target class of the Association element.
The CQL schema specifies that the roleName attribute is optional and that the name attribute is required. However, it is good practice to always specify both attributes. The value of just one of the attributes may be insufficient to uniquely identify an association in the data service's data model. Also, some data services treat the roleName attribute as required.
Only those instances of the Association XML element's target class that are connected to instances of the enclosing XML element's target class through the association identified by the Association element will be considered in satisfying the Association element's constraint.
An Association XML element can have a child element that is an Association, Attribute or Group element. If an Association element does have a child element, then the child element is used to further constrain which instances of the Association element's target class will be considered by the query.
<Association
roleName="owner"
name="org.foo.id"/>
or
<Association
roleName="owner"
name="org.foo.id">
<Association …>
…
</Association>
</Association>
or
<Association
roleName="owner"
name="org.foo.id">
<Attribute … />
</Association>
or
<Association
roleName="owner"
name="org.foo.id">
<Group …>
…
</Group>
</Association>
- Attribute
An Attribute XML element can appear as a child element of a Target, Association or Group XML element. Target, Association and Group elements have a target class. If a Target, Association or Group element has any child Attribute elements, the presence of the Attribute child elements places a constraint on which instances of the enclosing element's target class will be considered by the query. Instances of the enclosing XML element's target class are considered for a query only if their value for the attribute named by the Attribute element passes the test specified by the Attribute element.
An Attribute element has three XML attributes, which define its constraint. The attribute name is required. Its value is the name of the attribute of the enclosing XML element's target class to be restricted.
The attribute value is required. The value of the XML value attribute is compared to the value of the named attribute in instances of the enclosing element's target class.
The attribute predicate describes the type of test to be performed on the value of the named attribute in instances of the enclosing element's target class. Allowable values for this include: EQUAL_TO, NOT_EQUAL_TO, LIKE, LESS_THAN, LESS_THAN_EQUAL_TO, GREATER_THAN and GREATER_THAN_EQUAL_TO. These tests all involve a comparison between the value of an instance attribute and the value of the value XML attribute. The exact nature of the comparison that corresponds to each value of predicate is not defined by CQL; it is determined by the particular data service.
The two other permitted values for the predicate attribute are IS_NOT_NULL and IS_NULL. These values imply a test only for the presence or absence, respectively, of the instance attribute's value. They do not constrain the value of the instance attribute. The value of the value XML attribute is ignored when the value of the predicate attribute is IS_NULL or IS_NOT_NULL.
<Attribute name="size" predicate="EQUAL_TO" value="3"/>
Always specify a value for predicate
The CQL schema specifies that the predicate attribute is optional with a default value of EQUAL_TO. However, some data services treat the predicate attribute as required. It is recommended that the value of the predicate attribute is always specified.
- Group
Group XML elements combine two or more constraints under a single Target or Association element. A Group XML element contains two or more Attribute, Association or Group elements.
Group elements have an attribute named logicOperator, whose value may be either AND or OR. The value of the logicOperator attribute determines how the constraints of the Group element's child elements are combined into a single constraint for the Group element's enclosing element.
If the value of the logicOperator attribute is AND, then the constraints of the Group element's child elements are combined into a single constraint that is satisfied only if the constraints of all of the child elements are satisfied.
If the value of the logicOperator attribute is OR, then the constraints of the Group element's child elements are combined into a single constraint that is satisfied if the constraint of at least one of the child elements is satisfied.
Combining AND and OR
Sometimes you may want to combine constraints in ways that involve both AND and OR logic. This can be accomplished by nesting Group elements inside each other, with each nested Group element specifying a different value for its logicOperator attribute than the enclosing Group element.
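To make the Attribute and Group semantics described above concrete, the following is a minimal sketch of how a data service might evaluate them against one in-memory instance. All types here (Constraint, Pred, AttributeConstraint, GroupConstraint) are hypothetical stand-ins and are not part of the caGrid CQL object model. Because CQL leaves the meaning of each predicate to the data service, the string comparison and the treatment of '%' as a wildcard for LIKE are assumptions, as is the "organism" attribute name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-ins for illustration; NOT the caGrid CQL object model.
interface Constraint {
    boolean matches(Map<String, String> instance);
}

enum Pred { EQUAL_TO, NOT_EQUAL_TO, LIKE, IS_NULL, IS_NOT_NULL }

class AttributeConstraint implements Constraint {
    private final String name;
    private final Pred predicate;
    private final String value;

    AttributeConstraint(String name, Pred predicate, String value) {
        this.name = name;
        this.predicate = predicate;
        this.value = value;
    }

    @Override
    public boolean matches(Map<String, String> instance) {
        String actual = instance.get(name);
        switch (predicate) {
            case IS_NULL:      return actual == null;   // the value attribute is ignored
            case IS_NOT_NULL:  return actual != null;   // the value attribute is ignored
            case EQUAL_TO:     return actual != null && actual.equals(value);
            case NOT_EQUAL_TO: return actual != null && !actual.equals(value);
            case LIKE:         return actual != null && like(value).matcher(actual).matches();
            default:           throw new IllegalStateException("unhandled predicate " + predicate);
        }
    }

    // Assumption: '%' matches any run of characters, as in the pattern "IL%".
    private static Pattern like(String pattern) {
        StringBuilder regex = new StringBuilder();
        String[] parts = pattern.split("%", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }
}

class GroupConstraint implements Constraint {
    private final boolean and;
    private final List<Constraint> children;

    GroupConstraint(String logicOperator, Constraint... children) {
        if (!"AND".equals(logicOperator) && !"OR".equals(logicOperator)) {
            throw new IllegalArgumentException("logicOperator must be AND or OR");
        }
        if (children.length < 2) {
            throw new IllegalArgumentException("a Group needs two or more children");
        }
        this.and = "AND".equals(logicOperator);
        this.children = Arrays.asList(children);
    }

    @Override
    public boolean matches(Map<String, String> instance) {
        for (Constraint child : children) {
            boolean ok = child.matches(instance);
            if (and && !ok) return false;   // AND fails on the first unsatisfied child
            if (!and && ok) return true;    // OR succeeds on the first satisfied child
        }
        return and;
    }
}

class CqlGroupSketch {
    // symbol LIKE 'IL%' AND (organism = 'human' OR organism = 'mouse')
    static Constraint sample() {
        return new GroupConstraint("AND",
            new AttributeConstraint("symbol", Pred.LIKE, "IL%"),
            new GroupConstraint("OR",
                new AttributeConstraint("organism", Pred.EQUAL_TO, "human"),
                new AttributeConstraint("organism", Pred.EQUAL_TO, "mouse")));
    }

    static Map<String, String> instance(String symbol, String organism) {
        Map<String, String> map = new HashMap<>();
        map.put("symbol", symbol);
        map.put("organism", organism);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(sample().matches(instance("IL2", "human")));  // true
        System.out.println(sample().matches(instance("IL6", "rat")));    // false
        System.out.println(sample().matches(instance("TP53", "mouse"))); // false
    }
}
```

The nested OR group inside the AND group mirrors the nesting of Group elements with different logicOperator values; a real data service would typically translate such constraints into a query against its backend rather than evaluate them in memory.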
Overview of the Process of Constructing a CQL Query
To construct a CQL query, first identify the data type that you would like to retrieve. This data type (the class from the UML model) becomes the Target in your CQL query. Next, identify the criteria that you would like to use to retrieve only a subset of all available data. For example, if you specify the "Gene" class as the Target, you will retrieve all Gene objects in the database, which probably is not what you want. To limit the subset of Gene objects that you retrieve, you must identify "filtering" criteria.
To specify "filtering criteria", use the Group, Attribute and Association CQL elements. For example, to retrieve only Gene objects where the Gene name matches a given pattern, you would specify an attribute filter on the Gene class (the "name" attribute, for example).
If the attributes that you would like to filter upon are in another class, specify an association from the Target to the associated class. At that point, you can specify Attributes in the associated class to filter on.
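The steps above can be sketched with the standard Java DOM API, without any caGrid classes. The example builds a query document that targets the Gene class, filters on its symbol attribute, and adds a filter through an association; since a Target holds only one constraint child, the two filters are wrapped in a Group. The Taxon class name, the "taxon" role name and the "scientificName" attribute are assumptions made for illustration and may not match any real domain model.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CqlXmlSketch {
    // Namespace URI of the CQL schema, as given in the CQL section.
    static final String CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery";

    // Creates an element in the CQL namespace and appends it to the parent.
    private static Element child(Document doc, Element parent, String name) {
        Element element = doc.createElementNS(CQL_NS, name);
        parent.appendChild(element);
        return element;
    }

    static String buildGeneQuery(String symbolPattern, String taxonName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element query = doc.createElementNS(CQL_NS, "CQLQuery");
            doc.appendChild(query);

            // Step 1: the data type to retrieve becomes the Target.
            Element target = child(doc, query, "Target");
            target.setAttribute("name", "gov.nih.nci.cabio.domain.Gene");

            // A Group combines the two filters below.
            Element group = child(doc, target, "Group");
            group.setAttribute("logicOperator", "AND");

            // Step 2: filter on an attribute of the target class.
            Element symbol = child(doc, group, "Attribute");
            symbol.setAttribute("name", "symbol");
            symbol.setAttribute("predicate", "LIKE");
            symbol.setAttribute("value", symbolPattern);

            // Step 3: filter on an attribute of an associated class.
            // The role, class and attribute names here are hypothetical.
            Element association = child(doc, group, "Association");
            association.setAttribute("roleName", "taxon");
            association.setAttribute("name", "gov.nih.nci.cabio.domain.Taxon");
            Element taxon = child(doc, association, "Attribute");
            taxon.setAttribute("name", "scientificName");
            taxon.setAttribute("predicate", "EQUAL_TO");
            taxon.setAttribute("value", taxonName);

            // Serialize the DOM tree to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build query", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildGeneQuery("IL%", "Homo sapiens"));
    }
}
```

The resulting text could then be passed through a Reader to a deserializer, or the same structure could be built with the CQL Java object model instead.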
CQL Examples
Several example CQL queries are available on this wiki that demonstrate how to create a few common types of CQL queries.
Creating a CQL Query In Code
Data services in caGrid use CQL to compose queries. A query can be produced programmatically, building up parts of the query using the supplied object model:
Programmatic query building
The data services project in caGrid provides a Java object model for CQL which can be used to build queries.
gov.nih.nci.cagrid.cqlquery.CQLQuery query =
new gov.nih.nci.cagrid.cqlquery.CQLQuery();
gov.nih.nci.cagrid.cqlquery.Object target =
new gov.nih.nci.cagrid.cqlquery.Object();
target.setName(gov.nih.nci.cabio.domain.Gene.class.getName());
gov.nih.nci.cagrid.cqlquery.Attribute symbolAttribute =
new gov.nih.nci.cagrid.cqlquery.Attribute(
"symbol",
gov.nih.nci.cagrid.cqlquery.Predicate.LIKE,
"IL%");
target.setAttribute(symbolAttribute);
query.setTarget(target);
Load CQL query from a Reader
Alternatively, a CQL query can be loaded from a Reader. The code examples below illustrate loading XML from a string or from a file on disk and deserializing it into the object model:
// from a string
CQLQuery query2 = (CQLQuery) gov.nih.nci.cagrid.common.Utils.deserializeObject(
new StringReader("<CQLQuery ... />"), CQLQuery.class);
// from a file
CQLQuery query3 = (CQLQuery) gov.nih.nci.cagrid.common.Utils.deserializeObject(
new FileReader(cqlFile), CQLQuery.class);
Write CQL query out to a file
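The following code illustrates how to serialize a CQL query to XML. The Utils.serializeObject method accepts any Writer as the destination, so the StringWriter used here can be replaced with a FileWriter to write the query out to a file.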
// obtain a CQL query instance
CQLQuery someQuery = ...;
// open a writer to send XML to
StringWriter writer = new StringWriter();
// serialize
Utils.serializeObject(someQuery, DataServiceConstants.CQL_QUERY_QNAME, writer);
// print XML to the console
System.out.println(writer.getBuffer().toString());
Schemas
The CQL schemas are available on this wiki.
Caveats
- CQL does not permit querying for attributes with values that are XML schema complex types.
- Only values that can be represented as XML schema simple types are allowed.
- CQL Attribute Results cannot contain attribute values which are XML schema complex types.
- Only values that can be represented as XML schema simple types are allowed.
- CQL does not provide a facility for returning object instances other than the targeted data type.
- This includes subclasses of the targeted data type. These cannot be returned because their XML representation will differ from that of the requested object, which violates the expected results schema.
- CQL cannot return populated associations on instances of the targeted data type.
- This has some implications when dealing with uni-directional associations. For example:
Assume two classes:
Person
-name
Address
-street
And there is a uni-directional association between Person and Address (Person->Address).
With CQL you can say: "Give me the name of the Person at '123 Main St'". You can just write the CQL with target Person, and criteria of Association to Address where Address.street='123 Main St'.
But you can't say: "Give me the Address of 'Scott'", because Address needs to be the target, and there is no way to express constraints on the Person (as there is no association from Address to Person).
However, if the association is bi-directional, you CAN do both. To do the second query, you would express the query as target Address, and criteria of Association to Person where Person.name='Scott'. You basically need to "invert" the criteria.
Generic Data Service Clients
The caGrid Data Services infrastructure supplies three basic client classes which can be used to invoke any arbitrary caGrid Data Service. This capability is due to the query methods of all data services being defined in a common, well known WSDL which each unique service instance imports.
The basic data service client, which is capable of invoking any caGrid Data Service, is the class gov.nih.nci.cagrid.data.client.DataServiceClient. The class defines the query() method, which takes a CQL Query as its single parameter and returns a CQL Query Results instance. A sample usage of this class is provided below:
import gov.nih.nci.cagrid.common.Utils;
import gov.nih.nci.cagrid.cqlquery.CQLQuery;
import gov.nih.nci.cagrid.cqlquery.Object;
import gov.nih.nci.cagrid.cqlresultset.CQLQueryResults;
import gov.nih.nci.cagrid.data.DataServiceConstants;
import gov.nih.nci.cagrid.data.client.DataServiceClient;

public class SampleDataServiceInvocation {
    public static void main(String[] args) {
        try {
            // args[0] is the URL of the data service to query
            DataServiceClient client = new DataServiceClient(args[0]);
            // build a query for all instances of the target class
            CQLQuery query = new CQLQuery();
            Object target = new Object();
            target.setName("some.class.name");
            query.setTarget(target);
            // invoke the service and save the results to disk as XML
            CQLQueryResults results = client.query(query);
            Utils.serializeDocument("myResults.xml", results,
                DataServiceConstants.CQL_RESULT_COLLECTION_QNAME);
        } catch (Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }
    }
}
This small sample will create a new data service client using a URL specified on the command line and submit a query to it for all objects of the type "some.class.name". The results will be stored on disk in an XML document named "myResults.xml". The DataServiceConstants class used in this example provides static Strings and QNames used throughout the data service infrastructure. The constant CQL_RESULT_COLLECTION_QNAME is the QName which defines the XML type for result sets returned from the data service's query method.
Additionally, the caGrid Data Services infrastructure provides clients that can connect to data services which support WS-Enumeration and the caGrid Bulk Data Transfer infrastructure. Respectively, these clients are gov.nih.nci.cagrid.data.enumeration.client.EnumerationDataServiceClient and gov.nih.nci.cagrid.data.bdt.client.BDTDataServiceClient. These clients provide public APIs which return an EnumerationContext instance or a BulkDataHandlerReference respectively. These return types may be used to initialize an instance of the Globus provided ClientEnumIter class, or make use of the caGrid Bulk Data Transfer Client directly.
The client classes provided with the data service infrastructure, as well as any other clients generated by the Introduce toolkit, should not be assumed to be thread safe. Each thread communicating with a data service should have its own instance of the client class. Since client instances are unique, multiple data service clients may be used within the same thread or JVM to communicate with multiple data services simultaneously.
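One common way to give each thread its own client instance is a ThreadLocal. The sketch below shows the pattern using only the Java standard library; StubClient is a hypothetical placeholder for a real client class such as DataServiceClient, and the service URL is a made-up value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical placeholder for a client class that is not thread safe.
class StubClient {
    private final String url;

    StubClient(String url) {
        this.url = url;
    }

    String query(String cql) {
        return url + " <- " + cql;
    }
}

class PerThreadClients {
    private final ThreadLocal<StubClient> clients;

    PerThreadClients(String url) {
        // Each thread lazily creates its own client on first use.
        this.clients = ThreadLocal.withInitial(() -> new StubClient(url));
    }

    StubClient get() {
        return clients.get();
    }

    // Starts the given number of threads and counts how many distinct
    // client instances they ended up using.
    static int distinctClientsUsed(int threadCount) {
        PerThreadClients holder = new PerThreadClients("http://localhost:8080/placeholder");
        Set<StubClient> seen = ConcurrentHashMap.newKeySet();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread thread = new Thread(() -> {
                StubClient first = holder.get();
                StubClient second = holder.get();
                if (first != second) {
                    throw new IllegalStateException("a thread should reuse its own client");
                }
                first.query("<CQLQuery/>");
                seen.add(first);
            });
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctClientsUsed(4)); // prints 4
    }
}
```

Within a single thread the same instance is reused across calls, while separate threads never share one.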
Client Side Utilities
The caGrid Data Services infrastructure provides a number of utility classes to make invocation of remote data services a simpler process for the application developer. The package gov.nih.nci.cagrid.data.utilities contains utilities that can invoke standard, enumeration, and BDT data services, as well as tools for handling domain models and working with wsdd and castor mapping files.
Iterating Query Results
When a query is performed using the standard caGrid Data Service client's query method, a CQLQueryResults object is returned. This object is a container for both the results themselves and some information pertaining to their type. This container can contain object results, attribute name/value pairs, or a count of the total number of items in the result set. Working directly with a container that may hold such a variety of result types is awkward, so the data service infrastructure provides an iterator class to handle it.
The interface DataServiceIterator specifies a single query() method, which takes a CQL Query and returns an instance of java.util.Iterator. The iterator can be used to walk through the results of a query issued to a data service. Three concrete implementations of the DataServiceIterator interface are provided, each for communicating with a different type of data service. The DataServiceHandle class can be used to invoke a standard caGrid Data Service, while the EnumerationDataServiceHandle and BdtDataServiceHandle classes are designed for WS-Enumeration and Bulk Data Transfer supporting data services, respectively.
Additionally, this package contains an Iterator utility for handling CQLQueryResults instances directly. The class CQLQueryResultsIterator implements the java.util.Iterator interface, and has three constructors. The choice of constructor affects the behavior of calling the next() method.
- CQLQueryResultsIterator(CQLQueryResults)
- Creates an Iterator over the results which will return materialized objects deserialized using the default type mappings.
- CQLQueryResultsIterator(CQLQueryResults, boolean)
- Creates an Iterator over the results, and the value of the Boolean parameter indicates if XML strings should be returned from the next() method.
- If the Boolean value is true, XML text of each item is returned, otherwise the results will be deserialized using the default type mappings.
- CQLQueryResultsIterator(CQLQueryResults, InputStream)
- Creates an Iterator over the results, and expects the InputStream will point to a client or server side wsdd.
- The contents of this wsdd file will be used to configure deserialization of the objects contained in the results.
The class gov.nih.nci.cagrid.data.utilities.CQLQueryResultsIterator implements the java.util.Iterator interface, and so can be used in a while() loop like any other iterator over a Java collection. Depending on what the query to the data service asked for, calls to the next() method of this iterator will return different types of objects.
- If the query was for object results, then:
- The iterator returns objects of the type specified as the target for the query.
- Objects which require custom serialization and/or deserialization require that the iterator be configured with an InputStream to the client-config.wsdd file containing the type mappings for the objects.
- Alternatively, the iterator can be configured to return only the XML representation of those objects.
- If the query was for attribute results, including distinct attributes, then:
- Each successive call to next() returns an array of TargetAttribute types.
- These types contain the name of an attribute and its value.
- The value in the TargetAttribute instance will be null if the value was null on the object satisfying the query.
- Each array of TargetAttributes corresponds to one object instance which satisfied the CQL query criteria.
- If the query was for a count of object instances, then:
- The iterator returns a single java.lang.Long value.
An example usage of this iterator is below:
import gov.nih.nci.cagrid.cqlquery.CQLQuery;
import gov.nih.nci.cagrid.cqlresultset.CQLQueryResults;
import gov.nih.nci.cagrid.data.client.DataServiceClient;
import gov.nih.nci.cagrid.data.utilities.CQLQueryResultsIterator;
import java.util.Iterator;

public class SampleDataServiceInvocation {
    public static void main(String[] args) {
        try {
            // args[0] is the URL of the data service to query
            DataServiceClient client = new DataServiceClient(args[0]);
            CQLQuery query = new CQLQuery();
            // build up the query
            CQLQueryResults results = client.query(query);
            // configure deserialization with the client-config.wsdd on the classpath
            Iterator iter = new CQLQueryResultsIterator(results,
                SampleDataServiceInvocation.class.getResourceAsStream("client-config.wsdd"));
            while (iter.hasNext()) {
                java.lang.Object result = iter.next();
                // do something with the result object
            }
        } catch (Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }
    }
}
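Because next() can return three different shapes of result, code that does not know in advance what the query asked for can branch on the runtime type. The sketch below illustrates this with plain Java; NameValue is a hypothetical stand-in for the TargetAttribute type, and the iterator here walks an ordinary list rather than real query results.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for TargetAttribute: an attribute name and its value.
class NameValue {
    final String name;
    final String value;

    NameValue(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

class ResultDispatchSketch {
    // Describes one item returned from an iterator over query results.
    static String describe(Object result) {
        if (result instanceof Long) {
            // count query: a single Long
            return "count=" + result;
        }
        if (result instanceof NameValue[]) {
            // attribute query: one array per matching object instance
            StringBuilder text = new StringBuilder("attributes:");
            for (NameValue pair : (NameValue[]) result) {
                text.append(' ').append(pair.name).append('=').append(pair.value);
            }
            return text.toString();
        }
        // object query: an instance of the target class
        return "object of type " + result.getClass().getName();
    }

    public static void main(String[] args) {
        List<Object> results = Arrays.asList(
            Long.valueOf(42),
            new NameValue[] { new NameValue("symbol", "IL2"), new NameValue("id", null) },
            "a domain object would appear here");
        Iterator<Object> iter = results.iterator();
        while (iter.hasNext()) {
            System.out.println(describe(iter.next()));
        }
    }
}
```

A null attribute value is printed as "null" here; real code would decide how to treat attributes that were null on the matching object.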
More information on the caGrid Data Services client APIs may be found on this wiki.
Utility Classes
Utilities
The caGrid data services infrastructure includes several utility classes which can be used to ease development and use of data services. These classes are found in the gov.nih.nci.cagrid.data.utilities package distributed with the data service infrastructure.
CQLResultsCreationUtil
This class provides convenience methods for creating CQLQueryResults instances for object results, attribute results, and a counting result. A convenience method for identifier results may be added in the future. The class provides three public static methods, one for each type of results currently supported.
- public static CQLQueryResults createObjectResults(List objects, String targetName, Mappings classToQname)
- objects - a list of Java objects to be placed in a new CQLQueryResults object.
- targetName - the name of the class targeted by the query which produced these object results. All items in the objects list should be of this type.
- classToQname - a mapping from class name to QName. This is a generated Java bean from the XML schema for the data service infrastructure and contains an array of name/value pairs that map class names to QNames.
- public static CQLQueryResults createAttributeResults(List attribArrays, String targetClassname, String[] attribNames)
- attribArrays - a List of Object arrays. Each array should hold one value per attribute of a single object. These values may be null. The values must be in an order corresponding to the ordering of the attribute names.
- targetClassname - the name of the class targeted by the query which produced these attribute results. All attribute arrays should have come from instances of this type.
- attribNames - the names of the attributes returned by the query. These should be in the same ordering used by the attribute arrays.
- public static CQLQueryResults createCountResults(long count, String targetClassname)
- count - the number of resulting items (objects, attribute sets) from a query
- targetClassname - the name of the class which was the target of the query
DataServiceIterator
The data service iterator is an interface which allows a query to be submitted to a data service and an Iterator over the result set to be returned. Three implementations of this interface are provided: one for the standard data service, one for data services with enumeration enabled, and one for data services with BDT enabled.
- DataServiceHandle
- The data service handle is the implementation of the data service iterator interface for a base caGrid Data Service. It has three constructors, all of which take a DataServiceClient instance. The simplest constructor needs only this parameter. The other two constructors should be used when custom serialization and deserialization of types has been specified for the service. The extra parameter can be either the filename of a wsdd file containing this mapping information, or an InputStream to the same information.
- EnumDataServiceHandle
- The enum data service handle is the implementation of the data service iterator interface for a WS-Enumeration enabled caGrid Data Service. It has two constructors, both of which take an enumeration data service client instance. The default constructor needs only this parameter. The second constructor takes an IterationConstraints instance, which contains information about how data should be requested from the enumeration data service.
- BdtDataServiceHandle
- The BDT data service handle is an implementation of the data service iterator interface to be used with a BDT-enabled caGrid Data Service. Its behavior is the same as that of the enum data service handle, except that it handles the additional invocation of the BDT context to support enumeration internally.
DomainModelUtils
The domain model utils provide a means to extract useful information from a domain model.
- public static UMLClass getReferencedUMLClass(DomainModel model, UMLClassReference reference)
- To save on document size, domain models do not duplicate class information when an association is defined, but rather use class references based on ID values. These reference values can be traced back to their original UML Class instance with this function.
- public static UMLClass[] getAllSuperclasses(DomainModel model, String className)
- Superclasses of a UML Class can be determined by traversing UML class references and generalization information. There are two methods which perform this task in the Domain Model Utils class. One uses a class name and the other extracts the name from a UMLClass instance and passes it to the first.
WsddUtil
The wsdd utility class contains functions to set parameters on a wsdd file. This class is used internally to the Introduce data service extension to edit the wsdd files and change the castor mapping file name.
- public static void setGlobalClientParameter(String clientWsddFile, String key, String value)
- clientWsddFile - the name of the client side wsdd file to edit. When edits are complete, the changed file is saved to the same location.
- key - the key of the parameter. This is the name by which the parameter can be accessed.
- value - the value stored in the parameter
- public static void setServiceParameter(String serverWsddFile, String serviceName, String key, String value)
- serverWsddFile - the name of the server side wsdd file to edit. When edits are complete, the changed file will be saved to the same location
- serviceName - the name of the service within the wsdd file whose parameter is being set
- key - the key of the parameter. This is the name by which the parameter can be accessed
- value - the value stored in the parameter
Validation Tools
The caGrid Data Services infrastructure provides for validation of queries with respect to the domain model exposed by a service and the CQL schema, as well as query results for validity with respect to the exposed data types.
CQL Query Syntax
The caGrid Data Service infrastructure provides mechanisms to validate CQL queries for syntactic correctness. While the Axis engine prevents malformed XML from ever being turned into CQL objects, it does not handle XML that does not conform to certain schema restrictions. For example, Axis does not prevent populating multiple child elements of an XML schema 'choice'. For this reason, CQL syntax validation can be enabled on a caGrid data service. This mechanism will reject invalid queries before they ever reach a CQL Query Processor implementation, saving the processor's developer from having to handle them. This same validation can be performed either on the client side or completely offline by using the query validation utilities.
For syntax validation, the interface gov.nih.nci.cagrid.data.cql.validation.CqlStructureValidator is provided, as are two implementations of this interface. The interface provides the validateCqlStructure() method, which takes a single CQLQuery instance parameter, and throws a MalformedQueryException if an error is encountered. The default implementation of this interface is the gov.nih.nci.cagrid.data.cql.validation.ObjectWalkingCQLValidator class. As its name suggests, this class walks through the CQL object model, seeking out inconsistencies with the published CQL schema. This class also has a main() method, which allows it to be run from the command line with a list of CQL query XML files specified as arguments.
The data service infrastructure uses this class by default when query validation is enabled. This can be changed to any other class which implements the CqlStructureValidator interface by editing the value of the dataService_cqlValidatorClass service property in a generated data service.
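As an illustration of the kind of check an object-walking validator performs, the sketch below rejects a node that populates more than one branch of a 'choice'. It is not the ObjectWalkingCQLValidator implementation. QueryNode and MalformedQuerySketchException are hypothetical, simplified stand-ins: a node holds at most one of an attribute, an association, or a group, and a group is reduced to a list of attribute names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical unchecked stand-in for a malformed query error.
class MalformedQuerySketchException extends RuntimeException {
    MalformedQuerySketchException(String message) {
        super(message);
    }
}

// Hypothetical, simplified query node: a class name plus the three
// mutually exclusive constraint slots of the schema 'choice'.
class QueryNode {
    final String className;
    String attribute;              // stands in for a child Attribute
    QueryNode association;         // stands in for a child Association
    List<String> groupAttributes;  // stands in for a child Group

    QueryNode(String className) {
        this.className = className;
    }
}

class ChoiceValidatorSketch {
    // Walks the node tree and throws if any node violates the rules.
    static void validate(QueryNode node) {
        int populated = 0;
        if (node.attribute != null) populated++;
        if (node.association != null) populated++;
        if (node.groupAttributes != null) populated++;
        if (populated > 1) {
            throw new MalformedQuerySketchException(
                node.className + " populates " + populated + " branches of a choice");
        }
        if (node.groupAttributes != null && node.groupAttributes.size() < 2) {
            throw new MalformedQuerySketchException(
                node.className + " has a group with fewer than two children");
        }
        if (node.association != null) {
            validate(node.association); // recurse into the associated node
        }
    }

    static QueryNode validSample() {
        QueryNode address = new QueryNode("Address");
        address.attribute = "street";
        QueryNode person = new QueryNode("Person");
        person.association = address;
        return person;
    }

    static QueryNode invalidSample() {
        QueryNode person = validSample();
        // A second branch of the choice is populated on the nested node.
        person.association.groupAttributes = Arrays.asList("city", "zip");
        return person;
    }

    public static void main(String[] args) {
        validate(validSample());
        System.out.println("valid sample accepted");
        try {
            validate(invalidSample());
        } catch (MalformedQuerySketchException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```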
Domain Model Conformance
The Data Service infrastructure also provides mechanisms to validate a structurally sound CQL query against a Domain Model to ensure its restrictions are supported by the domain model's exposed structure. Domain Model validation may be enabled for a caGrid data service, and will be performed on every query submitted to the service before it is passed to the CQL query processor.
The interface gov.nih.nci.cagrid.data.cql.validation.CqlDomainValidator is provided, along with a single implementation. The interface provides the validateDomainModel() method, which takes a single CQLQuery instance parameter, and throws a MalformedQueryException if an error is encountered. The lone implementation provided with the caGrid Data Service infrastructure is the gov.nih.nci.cagrid.data.cql.validation.DomainModelValidator class. Like the CQL structure validator, this class has a main() method, which allows it to be run from a command line. The arguments should be first a domain model XML file, then a list of CQL query files to be validated.
The data service infrastructure uses this class when domain model validation is enabled. This implementation may be substituted for another by editing the value of the dataService_domainModelValidatorClass service property in a generated data service.
Results Validation
The data service infrastructure also provides a means to both validate the results of a CQL query against a known set of targets, and to determine what target data types are allowed to be returned by a caGrid Data Service. Every data service exposes a schema through its WSDL that enumerates the data types which may be returned by the data service. This schema appears in generated services under the schemas/<ServiceName> directory as <ServiceName>_CQLResultTypes.xsd.
The utility class gov.nih.nci.cagrid.data.utilities.validation.CQLQueryResultsValidator has been provided to both retrieve this file and verify that a CQLQueryResults instance conforms to this schema. An instance of this class can be constructed with either the full path to a data service's WSDL file, or an endpoint reference to a running data service.
The validator exposes two public methods:
- public void saveRestrictedCQLResultSetXSD(File fileLocation) throws SchemaValidationException
- This method locates the restriction XSD file and saves its contents into the file specified.
- fileLocation - a file into which the restriction XSD will be saved.
- public void validateCQLResultSet(CQLQueryResults resultSet) throws SchemaValidationException
- resultSet - a set of results generated by a query into a caGrid Data Service. The object contents of this result set will be processed against the restriction XSD.
The CQLQueryResultsValidator class also has a main() method, which takes two arguments. The first argument is a URL to a caGrid Data Service, which will be used to retrieve the result restriction schema. The second argument should be the filename of a CQLQueryResults instance serialized to an XML document.
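The general idea of checking a serialized result document against a restriction schema can be sketched with the standard JDK javax.xml.validation API. This is not how CQLQueryResultsValidator is implemented; the inline XSD, the element names, and the XsdCheckSketch class are hypothetical placeholders standing in for a service's actual result restriction schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Sketch only: generic XSD validation with a made-up schema. */
class XsdCheckSketch {

    // Hypothetical restriction schema that allows only Gene elements in a result set.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Results'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='Gene' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    /** Returns true when the XML document validates against the schema above. */
    static boolean conforms(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed XML
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(conforms("<Results><Gene>BRCA1</Gene></Results>"));
        System.out.println(conforms("<Results><Protein>p53</Protein></Results>"));
    }
}
```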
CQL Query Processors
The CQL Query Processor is a pluggable implementation which handles the details of processing CQL against some backend data source and produces a CQLQueryResults instance. The particular implementation used is determined by a value in the service's deployment properties, and an instance of the processor is loaded at runtime via reflection. The query processor may optionally supply a set of properties via the getRequiredParameters() method. These properties may be configured prior to deployment of the service, and are passed into the query processor via the initialize() method when it is first instantiated. Additionally, the query processor may implement the method getPropertiesFromEtc(). This method returns a java.util.Set containing a subset of keys from the getRequiredParameters() method whose values should be returned as file system paths relative to the etc directory of the deployed grid service.
When a query is issued to the data service, the query is passed along to the CQL Query Processor instance's processQuery() method. This method may throw a QueryProcessingException if an error occurs while handling the query, or a MalformedQueryException if the query is found to be invalid for any reason.
Implementation
See Also: How-to Implement CQL Query Processors
All query processor implementations are required to extend the abstract base class gov.nih.nci.cagrid.data.cql.CQLQueryProcessor. This base class declares several methods which are meant to be overridden, but the only method a query processor is required to implement is processQuery(). This method takes a CQL query and returns a CQLQueryResults instance. It is declared abstract in the base class, which enforces this implementation requirement. Generally, this method should translate CQL into whatever native query language is required by the back end data resource, and translate the result set into a CQLQueryResults instance.
CQL Query Processors are designed to be configurable at runtime by a set of properties. These properties are modifiable via the data service extension to the Introduce toolkit, or manually by editing a configuration file once a service has been built. The base CQL query processor class provides a method to retrieve required configuration parameters and their associated default values:
- public Properties getRequiredParameters()
- This method is provided by default and returns an empty java.util.Properties instance. Query processor implementers who require properties to be configured should override this method to return a populated Properties instance. If a property is optional, its value should be set to an empty string. All property keys must be valid Java identifiers, meaning that a key cannot contain spaces or punctuation.
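A minimal sketch of building such a Properties instance and checking its keys is shown below. The property names and default values are invented for illustration, and the check uses the standard Character.isJavaIdentifierStart() and Character.isJavaIdentifierPart() methods; whether the data service tooling performs the same check is an assumption.

```java
import java.util.Properties;

/** Sketch only: hypothetical property names and a simple key check. */
class ParameterKeySketch {

    /** Example of what an overridden getRequiredParameters() might return. */
    static Properties exampleParameters() {
        Properties params = new Properties();
        params.setProperty("jdbcUrl", "jdbc:hsqldb:mem:example"); // hypothetical default value
        params.setProperty("queryTimeoutSeconds", "30");          // hypothetical default value
        params.setProperty("cacheDirectory", "");                 // optional, so empty string
        return params;
    }

    /** True when the key is non-empty and made up only of Java identifier characters. */
    static boolean isJavaIdentifier(String key) {
        if (key == null || key.isEmpty() || !Character.isJavaIdentifierStart(key.charAt(0))) {
            return false;
        }
        for (int i = 1; i < key.length(); i++) {
            if (!Character.isJavaIdentifierPart(key.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    /** True when every key in the given properties passes the identifier check. */
    static boolean allKeysValid(Properties params) {
        for (String key : params.stringPropertyNames()) {
            if (!isJavaIdentifier(key)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("example keys valid: " + allKeysValid(exampleParameters()));
        System.out.println("'jdbc url' valid: " + isJavaIdentifier("jdbc url"));
    }
}
```

Note that this character-level check does not reject Java keywords, which are also not legal identifiers.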
Additionally, a method is provided to specify which subset of those properties are meant to be file locations:
- getPropertiesFromEtc()
- This method returns a java.util.Set containing a subset of keys from the getRequiredParameters() method whose values should be returned as file system paths relative to the etc directory of the deployed grid service.
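One way to picture the effect of getPropertiesFromEtc() is the sketch below, which rewrites the values of the named keys as paths under an etc directory. How the data service infrastructure actually performs this resolution is not shown in this guide, so the logic, the EtcPathSketch class, and the property names are assumptions for illustration only.

```java
import java.io.File;
import java.util.Collections;
import java.util.Properties;
import java.util.Set;

/** Sketch only: an assumed resolution of selected property values against etc. */
class EtcPathSketch {

    /**
     * Returns a copy of the configured properties in which the values of the
     * given keys are prefixed with the etc directory. Empty values stay empty.
     */
    static Properties resolveAgainstEtc(Properties configured, Set<String> keysFromEtc, File etcDir) {
        Properties resolved = new Properties();
        for (String key : configured.stringPropertyNames()) {
            String value = configured.getProperty(key);
            if (keysFromEtc.contains(key) && !value.isEmpty()) {
                value = new File(etcDir, value).getPath();
            }
            resolved.setProperty(key, value);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Properties configured = new Properties();
        configured.setProperty("mappingFile", "mapping.xml"); // hypothetical file-valued property
        configured.setProperty("queryTimeoutSeconds", "30");  // hypothetical plain property

        Set<String> fromEtc = Collections.singleton("mappingFile");
        Properties resolved = resolveAgainstEtc(configured, fromEtc, new File("etc"));
        System.out.println(resolved.getProperty("mappingFile"));
        System.out.println(resolved.getProperty("queryTimeoutSeconds"));
    }
}
```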
The query processor base class has two protected methods which provide access to any user configured parameters and to an input stream for the server side WSDD configuration file. The method getConfiguredParameters() returns a java.util.Properties instance containing all the keys defined in the properties returned by getRequiredParameters(), with either the default or a developer configured value associated with each. The method getConfiguredWsddStream() returns an InputStream instance which reads the contents of the server side WSDD configuration file. The call to the query processor's initialize() method, and in turn the population of these values, occurs when the data service is first instantiated, typically when the container is started. Calls to these methods before this time will return null. For this reason, the constructor of a CQL Query Processor implementation must be kept simple, and initialization of any required resources must be deferred until the initialize() method has been called.
/**
 * Processes the CQL Query
 * @param cqlQuery
 * @return The results of processing a CQL query
 * @throws MalformedQueryException
 *     Should be thrown when the query itself does not conform to the
 *     CQL standard or attempts to perform queries outside of
 *     the exposed domain model
 * @throws QueryProcessingException
 *     Thrown for all exceptions in query processing not related
 *     to the query being malformed
 */
public abstract CQLQueryResults processQuery(CQLQuery cqlQuery)
    throws MalformedQueryException, QueryProcessingException;
The only method that CQL query processors are absolutely required to implement is processQuery(). This is the method which executes the CQL query against its data source and generates an appropriate set of results. There are utilities (discussed earlier) to make generation of this result set a simpler process. At the time this method is called, the return values of getConfiguredParameters() and getConfiguredWsddStream() will be non-null.
The processQuery() method can throw either a MalformedQueryException or a QueryProcessingException. A MalformedQueryException should be thrown when the query is syntactically incorrect, or uses features of the CQL language which are not yet supported by the query processor implementation. If query syntax validation is enabled in the data service infrastructure, then it may be assumed that all queries reaching the processQuery() method are at least well formed CQL. A QueryProcessingException should be thrown when some error occurs which prevents successful resolution of the query request. These conditions may include database errors, file system problems, or misconfiguration of properties.
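The lifecycle described in this section can be sketched with a stand-in base class. Nothing below is the real gov.nih.nci.cagrid.data.cql.CQLQueryProcessor: the initialize() signature, the use of a String for the query and a List for the results, and all class and property names are simplifying assumptions chosen so that the example is self-contained.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Sketch only: stand-in types that mimic the shape of a query processor. */
class ProcessorLifecycleSketch {

    static class MalformedQueryException extends Exception {
        MalformedQueryException(String message) { super(message); }
    }

    static class QueryProcessingException extends Exception {
        QueryProcessingException(String message) { super(message); }
    }

    /** Stand-in for the abstract base class; the signatures are assumed. */
    abstract static class SketchProcessorBase {
        private Properties configured; // stays null until initialize() is called

        public Properties getRequiredParameters() { return new Properties(); }

        public void initialize(Properties parameters) { this.configured = parameters; }

        protected Properties getConfiguredParameters() { return configured; }

        public abstract List<String> processQuery(String targetClass)
            throws MalformedQueryException, QueryProcessingException;
    }

    /** Trivial processor: the constructor does nothing, resources are built in initialize(). */
    static class InMemoryProcessor extends SketchProcessorBase {
        private Map<String, List<String>> store;

        @Override
        public Properties getRequiredParameters() {
            Properties params = new Properties();
            params.setProperty("knownClass", "Gene"); // hypothetical property and default
            return params;
        }

        @Override
        public void initialize(Properties parameters) {
            super.initialize(parameters);
            store = new HashMap<>();
            store.put(getConfiguredParameters().getProperty("knownClass"),
                      Arrays.asList("BRCA1", "TP53"));
        }

        @Override
        public List<String> processQuery(String targetClass)
                throws MalformedQueryException, QueryProcessingException {
            if (targetClass == null || targetClass.isEmpty()) {
                throw new MalformedQueryException("Query has no target");
            }
            if (store == null) {
                throw new QueryProcessingException("Processor has not been initialized");
            }
            List<String> results = store.get(targetClass);
            if (results == null) {
                throw new MalformedQueryException("Target outside the exposed model: " + targetClass);
            }
            return results;
        }
    }

    /** Runs a query and summarizes which of the three outcomes occurred. */
    static String outcome(SketchProcessorBase processor, String targetClass) {
        try {
            return "ok:" + processor.processQuery(targetClass).size();
        } catch (MalformedQueryException e) {
            return "malformed";
        } catch (QueryProcessingException e) {
            return "processing";
        }
    }

    public static void main(String[] args) {
        InMemoryProcessor processor = new InMemoryProcessor();
        System.out.println(outcome(processor, "Gene")); // before initialize()
        processor.initialize(processor.getRequiredParameters());
        System.out.println(outcome(processor, "Gene"));
        System.out.println(outcome(processor, "Protein"));
    }
}
```

In this sketch a misconfigured or uninitialized processor surfaces as a processing error, while a query that targets something the processor does not expose surfaces as a malformed query, mirroring the distinction drawn above.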
Service Styles Architecture
See also: Data Service Styles
Data service styles may be added to the data service extension to provide additional functionality to the service creation and configuration processes, and are selected by the service developer when a service is first created. Styles may be installed at any time after the primary caGrid Data Services extension has been installed by adding to the styles directory found in the installed data service extension directory. Each style must provide its own directory in which files it uses will be placed, but no restriction is made on the naming of these directories. At the top level of each style directory, a style.xml file must exist, describing the style. This document describes the style's name, which caGrid and Data Service versions it is compatible with, and information on which classes are to be loaded for each component of the style. If the service developer selects no style at service creation time, the service is created with only the standard data services components and query method, and ready to have a custom domain model, query processor, and other data service requirements selected and configured.
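The directory layout rule above (one directory per style, each with a style.xml at its top level) can be illustrated with a short sketch that scans a styles directory. This is not the data service extension's own loading code; the StyleDirectorySketch class, the directory names, and the placeholder file content are assumptions, and the real style.xml format is not reproduced here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch only: finds subdirectories that contain a top-level style.xml. */
class StyleDirectorySketch {

    /** Returns the sorted names of subdirectories holding a style.xml file. */
    static List<String> findStyleDirectories(Path stylesDir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(stylesDir)) {
            for (Path child : children) {
                if (Files.isDirectory(child) && Files.isRegularFile(child.resolve("style.xml"))) {
                    names.add(child.getFileName().toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    /** Builds a throwaway layout: "alpha" has a style.xml, "beta" does not. */
    static Path createExampleLayout() {
        try {
            Path styles = Files.createTempDirectory("styles");
            Path alpha = Files.createDirectory(styles.resolve("alpha"));
            // Placeholder content only; not the real style descriptor format.
            Files.write(alpha.resolve("style.xml"), "<style/>".getBytes(StandardCharsets.UTF_8));
            Files.createDirectory(styles.resolve("beta"));
            return styles;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findStyleDirectories(createExampleLayout()));
    }
}
```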
Functionality Extended by Styles
Data Service styles may add functionality to any or all of the following areas of service development with the Introduce toolkit:
- Creation Wizard
- The service style may supply a list of wizard panels to be displayed and chained together in a wizard-like fashion to break the setup process for the service style into a series of steps. These panels will be shown in a wizard dialog when a service style is selected at service creation time.
- Post-creation processing
- Just as Introduce extensions may add functionality to the service creation process, data service styles may add processing capabilities to this step.
- Modification User Interface
- The style may supply a graphical panel which will be added to the data service tab in the Introduce service modification viewer. This panel can be used to configure any style-specific options in the service.
- Post-code generation processing
- The style may add functionality to the code generation process of service modification. This processing will be invoked each time the service is modified and saved in Introduce.
Implementation of a Style
Main article: Data Service Style Creation






