Federated Query Processor


Federated Query Processor 1.2 Administrators Guide



Prerequisites


The Federated Query Processor service does not require any software or special system configuration beyond the standard caGrid stack.

The performance of the FQP service may benefit from large amounts of RAM and multiple processor cores for handling concurrent query operations, but this is not strictly required to deploy the service.

Obtaining the Service

The Federated Query Processor service is available in the caGrid release, and can be found in the directory $CAGRID_LOCATION/projects/fqp. If you have obtained a source release or checkout of caGrid, the FQP service must first be compiled. From the directory $CAGRID_LOCATION, execute the command ant all to compile all of caGrid including the FQP service, or ant build-project -Dsingle.project.name=fqp to build just the FQP service and the projects on which it depends.

Installing the Software


Install caGrid and a Container

In this step you will download and install the FQP service and a grid service container using the caGrid Installer. If you already have caGrid 1.2 installed on your machine, and a suitable container, you may proceed to the next section.

Once you have installed caGrid, the FQP software can be found in the caGrid/projects/fqp directory under the location where you installed caGrid. This guide refers to that location as FQP_HOME.

To install caGrid/FederatedQueryProcessor and set up a container, please see the Secure Container section of the caGrid 1.2 Installer User's Guide.

Configuration


To deploy the FQP service with the default configuration, the only thing you need to edit is the service's standard ServiceMetadata, as described in the Service Metadata section below.

Service Properties


The Federated Query Processor service may be configured by changing values specified in the service.properties file found in the root directory of the FQP distribution.

  • threadPoolSize
    • Default value: 10
    • Type: Integer
    • Controls the size of the thread pool used by the FQP service to execute DCQL queries against target data services and to perform final query aggregation. Increasing this value may improve performance and responsiveness of the FQP service at the expense of potentially using more server resources.
  • initialResultLeaseInMinutes
    • Default value: 30
    • Type: Integer
    • Controls the initial time-to-live (lease time) of FederatedQueryResults resources. The value is specified in minutes. Unless the client explicitly requests a termination time for their results resource more distant in the future, after this time has elapsed, the resource will be destroyed. When the resource is destroyed, any remaining query execution tasks are terminated and any DCQL query results are lost.

These properties may be configured at deployment time by the Introduce service deployment GUI, or by directly editing the service.properties file before deploying it.
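
Before deploying, it can be useful to confirm that the values in service.properties parse as expected. The following is a minimal sketch and not part of caGrid: the class FqpPropertiesCheck and its readPositiveInt helper are hypothetical, and both the fallback to the documented defaults when a key is absent and the requirement that values be positive are assumptions made for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical helper (not part of caGrid): reports the two FQP settings
// found in a service.properties file.
class FqpPropertiesCheck {

    // Default values as documented for the FQP service.
    static final int DEFAULT_THREAD_POOL_SIZE = 10;
    static final int DEFAULT_LEASE_MINUTES = 30;

    // Returns the integer value of a property. Assumption: a missing or blank
    // key is reported as the documented default, and values must be positive.
    static int readPositiveInt(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        if (raw == null || raw.trim().length() == 0) {
            return defaultValue;
        }
        // Throws NumberFormatException if the value is not an integer.
        int value = Integer.parseInt(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(key + " must be positive, got " + value);
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        // Path to service.properties; defaults to the current directory.
        String path = args.length > 0 ? args[0] : "service.properties";
        Properties props = new Properties();
        InputStream in = new FileInputStream(path);
        try {
            props.load(in);
        } finally {
            in.close();
        }
        System.out.println("threadPoolSize = "
            + readPositiveInt(props, "threadPoolSize", DEFAULT_THREAD_POOL_SIZE));
        System.out.println("initialResultLeaseInMinutes = "
            + readPositiveInt(props, "initialResultLeaseInMinutes", DEFAULT_LEASE_MINUTES));
    }
}
```

Run it from the root directory of the FQP distribution, or pass the path to service.properties as the first argument.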

Service Metadata


The FQP service provides a Resource Property, which acts as metadata for its clients and describes the service's capabilities and where it is being hosted. This Resource Property is the caGrid standard ServiceMetadata. It is loaded from the file serviceMetadata.xml, located in the FQP_HOME/etc directory, which is deployed along with the service. This file is fully populated, using dummy data for the information about where the service is being hosted. Before deploying the service, you are strongly encouraged to edit this file and provide the information that describes your organization.

If you aren't comfortable editing XML, you can use Introduce's graphical editor instead when you deploy the service.

Below is the relevant section of the file which you should edit.

 <ns1:hostingResearchCenter>
  <ns15:ResearchCenter displayName="The Ohio State University" shortName="OSU" xmlns:ns15="gme://caGrid.caBIG/1.0/gov.nih.nci.cagrid.metadata.common">
   <ns15:Address country="US" postalCode="43210" stateProvince="OH" street1="333 W. 10th Ave"/>
   <ns15:pointOfContactCollection>
    <ns15:PointOfContact affiliation="OSU" email="oster@bmi.osu.edu" firstName="Scott" lastName="Oster" role="maintainer"/>
   </ns15:pointOfContactCollection>
  </ns15:ResearchCenter>
 </ns1:hostingResearchCenter>
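
After editing, one way to double-check the hosting information is to read the file back and print what it contains. This is a sketch under assumptions: MetadataCheck is a hypothetical helper that uses only the JDK's DOM parser, and it relies on the element names, attribute names, and namespace URI shown in the fragment above.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of caGrid): summarizes the hosting
// information found in a serviceMetadata.xml file.
class MetadataCheck {

    // Namespace of the ResearchCenter and PointOfContact elements,
    // taken from the fragment shown above.
    static final String COMMON_NS =
        "gme://caGrid.caBIG/1.0/gov.nih.nci.cagrid.metadata.common";

    // Returns one line per ResearchCenter and per PointOfContact element.
    static String describe(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the *NS lookups below
        Document doc = factory.newDocumentBuilder().parse(source);

        StringBuilder out = new StringBuilder();
        NodeList centers = doc.getElementsByTagNameNS(COMMON_NS, "ResearchCenter");
        for (int i = 0; i < centers.getLength(); i++) {
            Element rc = (Element) centers.item(i);
            out.append("ResearchCenter: ").append(rc.getAttribute("displayName"))
               .append(" (").append(rc.getAttribute("shortName")).append(")\n");
        }
        NodeList contacts = doc.getElementsByTagNameNS(COMMON_NS, "PointOfContact");
        for (int i = 0; i < contacts.getLength(); i++) {
            Element poc = (Element) contacts.item(i);
            out.append("PointOfContact: ").append(poc.getAttribute("firstName"))
               .append(' ').append(poc.getAttribute("lastName"))
               .append(" <").append(poc.getAttribute("email")).append(">\n");
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Defaults to etc/serviceMetadata.xml relative to the current directory.
        File file = new File(args.length > 0 ? args[0] : "etc/serviceMetadata.xml");
        System.out.print(describe(new InputSource(file.toURI().toString())));
    }
}
```

If the output still shows the dummy organization and contact, the file has not been updated for your site.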

Deployment


If the Federated Query Processor service is to be used with the caGrid Transfer tools, it must be deployed to a Tomcat service container along with the transfer service.

The Federated Query Processor is an Introduce-created service and, as such, supports all the standard Introduce deployment processes, which are described in the Introduce Administrator's Guide.

For example, to deploy the service to Tomcat from the command line, you can type the following command from the FQP_HOME directory:

> ant deployTomcat

You will then want to start up Tomcat so your service is available. For Tomcat, you can run this command:

Linux / Unix

> $CATALINA_HOME/bin/startup.sh

Windows

> %CATALINA_HOME%\bin\startup.bat

You must supply the portions of the service metadata that the FQP service instance will use to register with caGrid index services, specifically the Point of Contact information and the Research Center information. This information can be set via the Introduce service deployment GUI, or by manually editing $CAGRID_LOCATION/projects/fqp/etc/serviceMetadata.xml and deploying via the command line.

Once the FQP service has been deployed, start the service container and verify that no error messages are printed to the console.

Validation


To validate that the service has been deployed and is functioning correctly, use this simple client code to invoke a DCQL query and verify its results:

import gov.nih.nci.cagrid.common.Utils;
import gov.nih.nci.cagrid.dcql.DCQLQuery;
import gov.nih.nci.cagrid.dcqlresult.DCQLQueryResultsCollection;
import gov.nih.nci.cagrid.fqp.client.FederatedQueryProcessorClient;

import java.io.FileReader;
import java.io.StringWriter;

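public class SimpleFQP {

    public static void main(String[] args) {
        // Expect the FQP service URL and the path to a DCQL query file
        if (args.length != 2) {
            System.err.println("Usage: SimpleFQP <service url> <dcql query file>");
            System.exit(1);
        }
        String url = args[0];
        String queryFile = args[1];

        try {
            FederatedQueryProcessorClient client =
                new FederatedQueryProcessorClient(url);

            // Deserialize the DCQL query from the XML file,
            // closing the reader even if deserialization fails
            FileReader reader = new FileReader(queryFile);
            DCQLQuery query;
            try {
                query = (DCQLQuery) Utils.deserializeObject(
                    reader, DCQLQuery.class);
            } finally {
                reader.close();
            }

            // Submit the query to the FQP service
            System.out.println("Querying " + url);
            DCQLQueryResultsCollection results = client.execute(query);

            // Serialize the results back to XML and print them
            StringWriter writer = new StringWriter();
            Utils.serializeObject(
                results,
                DCQLQueryResultsCollection.getTypeDesc().getXmlType(),
                writer);
            System.out.println(writer.getBuffer().toString());
            System.out.println("Done");
        } catch (Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }
    }
}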

The simple main method takes two parameters. The first is the URL of the Federated Query Processor service, and the second is the filename of a DCQL query to execute. Try running the following query, which simply queries the caArray data service for Publications with an ID less than or equal to 10.

<ns1:DCQLQuery xmlns:ns1="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql">
 <ns1:TargetObject name="gov.nih.nci.caarray.domain.publication.Publication">
  <ns1:Attribute name="id" predicate="LESS_THAN_EQUAL_TO" value="10"/>
 </ns1:TargetObject>
 <ns1:targetServiceURL>http://array.nci.nih.gov:80/wsrf/services/cagrid/CaArraySvc</ns1:targetServiceURL>
</ns1:DCQLQuery>
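
It can save time to check a query file before handing it to the client. The sketch below is a hypothetical pre-flight check, not part of caGrid: DcqlQueryCheck uses only the JDK's DOM parser to confirm that the file is well-formed XML with a DCQLQuery root element in the DCQL namespace, and lists the target service URLs it names. It does not validate the query against the DCQL schema.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of caGrid): basic sanity check of a DCQL query file.
class DcqlQueryCheck {

    // Namespace used by the sample query in this guide.
    static final String DCQL_NS = "http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql";

    // Parses the query and returns the text of every targetServiceURL element.
    // Throws a SAXException if the XML is malformed, or an
    // IllegalArgumentException if the root element is not a DCQLQuery.
    static List<String> targetServiceUrls(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder().parse(source).getDocumentElement();

        if (!DCQL_NS.equals(root.getNamespaceURI())
                || !"DCQLQuery".equals(root.getLocalName())) {
            throw new IllegalArgumentException(
                "Root element is not a DCQLQuery: " + root.getNodeName());
        }

        List<String> urls = new ArrayList<String>();
        NodeList nodes = root.getElementsByTagNameNS(DCQL_NS, "targetServiceURL");
        for (int i = 0; i < nodes.getLength(); i++) {
            urls.add(nodes.item(i).getTextContent().trim());
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // First argument: path to the DCQL query file.
        File file = new File(args[0]);
        List<String> urls = targetServiceUrls(new InputSource(file.toURI().toString()));
        System.out.println("Query targets " + urls.size() + " service(s): " + urls);
    }
}
```

A file that passes this check can still be rejected by the service, but a file that fails it will not deserialize into a DCQLQuery.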

Executing this query should produce results similar to the following:

<ns1:DCQLQueryResultsCollection xmlns:ns1="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcqlresult">
 <ns1:DCQLResult targetServiceURL="http://array.nci.nih.gov:80/wsrf/services/cagrid/CaArraySvc">
  <ns2:CQLQueryResultCollection targetClassname="gov.nih.nci.caarray.domain.publication.Publication" xmlns:ns2="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLResultSet">
   <ns2:ObjectResult>
    <ns2:Publication id="1" authors="Calvo A, Xiao N, Kang J, Best CJ, Leiva I, Emmert-Buck MR, Jorcyk C, Green JE" pages="5325-35" publication="Cancer Research" pubMedId="12235003" title="Alterations in gene expression profiles during prostate cancer progression" uri="http://cancerres.aacrjournals.org/cgi/content/full/62/18/5325" volume="62" year="2002" xmlns:ns2="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.publication">
     <ns2:status>
      <ns3:Term id="680" value="Published" xmlns:ns3="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.vocabulary">
       <ns3:categories/>
      </ns3:Term>
     </ns2:status>
    </ns2:Publication>
   </ns2:ObjectResult>
   <ns2:ObjectResult>
    <ns4:Publication id="10" authors="Rygaard K, Sorenson GD, Pettengill OS, Cate CC, Spang-Thomsen M." pages="5312-5317" publication="Cancer Research" pubMedId="2167152" title="Abnormalities in structure and expression of the retinoblastoma gene in small cell lung cancer cell lines and xenografts in nude mice" volume="50" year="1990" xmlns:ns4="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.publication">
     <ns4:status>
      <ns5:Term id="680" value="Published" xmlns:ns5="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.vocabulary">
       <ns5:categories/>
      </ns5:Term>
     </ns4:status>
     <ns4:type>
      <ns6:Term id="47" accession="MO_430" url="http://mged.sourceforge.net/ontologies/MGEDontology.php#journal_article" value="journal_article" xmlns:ns6="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.vocabulary">
       <ns6:categories/>
      </ns6:Term>
     </ns4:type>
    </ns4:Publication>
   </ns2:ObjectResult>
  </ns2:CQLQueryResultCollection>
 </ns1:DCQLResult>
</ns1:DCQLQueryResultsCollection>
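
For larger result sets, reading the XML by eye becomes impractical. As a rough aid, the following sketch counts the ObjectResult elements returned per target service. DcqlResultSummary is a hypothetical helper, not part of caGrid, and it assumes the element names and namespace URIs that appear in the sample output above.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of caGrid): counts object results
// per target service in a serialized DCQLQueryResultsCollection.
class DcqlResultSummary {

    // Namespaces as they appear in the sample output above.
    static final String DCQL_RESULT_NS =
        "http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcqlresult";
    static final String CQL_RESULT_NS =
        "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLResultSet";

    // Maps each targetServiceURL to the number of ObjectResult elements under it.
    static Map<String, Integer> countObjectResults(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(source);

        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        NodeList results = doc.getElementsByTagNameNS(DCQL_RESULT_NS, "DCQLResult");
        for (int i = 0; i < results.getLength(); i++) {
            Element result = (Element) results.item(i);
            String url = result.getAttribute("targetServiceURL");
            int n = result.getElementsByTagNameNS(CQL_RESULT_NS, "ObjectResult").getLength();
            // Sum counts if the same service appears in more than one DCQLResult.
            Integer previous = counts.get(url);
            counts.put(url, Integer.valueOf(previous == null ? n : previous.intValue() + n));
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // First argument: path to a file containing only the results XML.
        File file = new File(args[0]);
        Map<String, Integer> counts =
            countObjectResults(new InputSource(file.toURI().toString()));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue() + " object result(s)");
        }
    }
}
```

Note that SimpleFQP prints a "Querying" line before the XML and a "Done" line after it, so those lines need to be removed before saving its output to a file for this check.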