
Federated Query Processor


Federated Query Processor 1.4 Administrators Guide



Prerequisites


The Federated Query Processor service does not require any software or special system configuration beyond the standard caGrid stack.

To use the caGrid Transfer infrastructure for results retrieval, the transfer project is required, and the FQP service must be deployed to the same Tomcat container as the transfer service.

The performance of the FQP service may benefit from large amounts of RAM and multiple processor cores for handling concurrent query operations, but this is not strictly required to deploy the service.

Obtaining the Service


The Federated Query Processor service is available in the caGrid release, and can be found in the directory $CAGRID_LOCATION/projects/fqp. If you have obtained a source release or checkout of caGrid, the FQP service must first be compiled. From the directory $CAGRID_LOCATION, execute the command ant all to compile all of caGrid, including the FQP service, or ant build-project -Dsingle.project.name=fqp to build just the FQP service and the projects on which it depends.

Installing the Software


Installation of the Federated Query Processor can be accomplished using the caGrid Installer, or manually.

Install caGrid and a Container

In this step you will download and install the FQP service and a grid service container using the caGrid Installer. If you already have caGrid 1.4 installed on your machine, and a suitable container, you may proceed to the next section.

Once you have installed caGrid, the FQP software can be found in the caGrid/projects/fqp directory under the location where you installed caGrid. This guide refers to that location as FQP_HOME.

To install caGrid/FederatedQueryProcessor and set up a container, see the following sections.

Installer Prerequisites

The caGrid Installer installs all prerequisites except for Java and MySQL.

  • Java 6 JDK
    • Make sure the JAVA_HOME environment variable is set and points to the location where the JDK has been installed.
  • (Optional) If you are deploying caGrid core services locally, you may also need a MySQL database.
    Note
    MySQL is only required for the security services and GME. You can use 4.x (with transaction enabled; i.e., use InnoDB engine) or 5.x.

Install caGrid and Configure a Secure Container Using the caGrid 1.4 Installer

  1. Download the caGrid 1.4 Installer. The downloaded installer should be contained in the file caGrid-installer-1.4.zip.
  2. Unzip the file caGrid-installer-1.4.zip. This should create the directory caGrid-installer-1.4; from this point forward, this guide refers to that directory as CAGRID_INSTALLER_LOCATION.
  3. From a command prompt launch the installer:
     > cd CAGRID_INSTALLER_LOCATION
     > java -jar caGrid-installer-1.4.jar
  4. Select the I agree to this license checkbox and click Next.
  5. Select the Install/Configure caGrid Software and Install/Configure Grid Service Container checkboxes and click Next.
  6. The installer detects whether or not you have already installed Ant. It installs or reinstalls it, depending on the installation status. In either case, you must specify where you want to install Ant.
  7. The installer detects whether or not you have already installed Globus. It installs or reinstalls it, depending on your installation status. In either case, you must specify where you want to install Globus.
  8. The installer prompts you to specify where you want to install caGrid. Specify a location and then click Next.
  9. The installer displays a list of tasks that the installer will perform. Click Next to start the installation process. The installer downloads, builds, and installs several components. Note: This process takes several minutes.
  10. Once the installer has completed installing all the components, click Next.
  11. The installer asks which Grid you would like to configure your installation to use. The installer supports configuring caGrid to work out of the box with many community Grid environments. For testing and development purposes we recommend selecting the Training Grid. If you do not want to configure caGrid to work with an existing Grid, you may select that option as well. The installer can also be modified to support additional Grids.
  12. The installer shows a summary of the tasks to be completed. Click Next to configure caGrid to use the selected target Grid. Note: This process takes several minutes.
  13. Once the installer has finished configuring caGrid to use the target Grid, click Next.
  14. Select the Container to which you want to deploy your service. This guide provides instructions for using the Tomcat container. Check the Should this container be secure? option and then click Next.
  15. In the hostname box, enter the hostname of your server; this should match the hostname you used in creating your host credentials. Click Next.
    If you plan on using this container to deploy a service that registers to an existing grid, you must use a publicly resolvable DNS name (or static IP). If you do not, you will have to manually edit configuration files later.
  16. From the Obtain host credentials method list, select Browse host credentials on the file system. Click Next.
    If you do not have credentials for your service yet, see Request Credentials.
  17. Enter the location of your host certificate into the Certificate box. Enter the location of your private key into the Key box. Click Next.
  18. The next screen prompts you to specify where you want to install Tomcat. In the Directory box, enter the installation location and then click Next.
  19. The next screen displays a list of tasks that the installer will perform to install and configure Tomcat. Click Next.
  20. Once the installer has completed installing all the components, click Next.
  21. Click Next. The final screen reminds you to set your ANT_HOME, GLOBUS_LOCATION and CATALINA_HOME environment variables. Set these variables immediately and then click Finish.
    These instructions are also written to a file called CAGRID_POST_INSTALLATION.txt in the directory from which you ran the installer.
  22. Close the installer by clicking Close.
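
Before moving on, it can help to confirm that the environment variables named in this section are actually visible to new processes. The following is a minimal sketch, not part of caGrid: the class name EnvCheckSketch and its method are hypothetical, and the list of variables is simply the set mentioned in this guide (JAVA_HOME, ANT_HOME, GLOBUS_LOCATION, CATALINA_HOME).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative helper, not part of caGrid: reports which expected variables are unset. */
final class EnvCheckSketch {

    // Names taken from this guide: JAVA_HOME plus the three the installer reminds you to set.
    static final String[] REQUIRED = {
        "JAVA_HOME", "ANT_HOME", "GLOBUS_LOCATION", "CATALINA_HOME"
    };

    /** Returns the names from REQUIRED that are absent or blank in the given environment. */
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<String>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().length() == 0) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> unset = missing(System.getenv());
        if (unset.isEmpty()) {
            System.out.println("All expected environment variables are set.");
        } else {
            System.out.println("Not set: " + unset);
        }
    }
}
```

Run it from a freshly opened command prompt, since variables set in one shell session are not necessarily visible in another.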

Configuring the Service


To deploy the FQP service with the default configuration, you only need to edit the service's standard ServiceMetadata, as described under Edit Service Metadata.

Edit Service Properties

The Federated Query Processor service may be configured by changing values specified in the service.properties file found in the root directory of the FQP distribution.

  • maxTargetServicesPerQuery
    • Default value: 12
    • Type: Integer
    • Controls the maximum number of target data services which may be included in any single DCQL query. If a client attempts to execute a query which specifies more than this number of target data services, an exception will be thrown and the query will not execute. If this value is set to zero (0), the number of services is unlimited.
  • maxRetryTimeout
    • Default value: 300
    • Type: Integer
    • Controls the maximum number of seconds a client may request the FQP service to wait between retrying queries to target data services which failed to respond correctly. If the client specifies a value greater than this, an exception will be thrown and the query will not execute. If this value is set to zero (0), the maximum timeout is unlimited.
  • maxRetries
    • Default value: 4
    • Type: Integer
    • Controls the maximum number of retries a client may request the FQP service to perform when retrying to execute queries to target data services which failed to respond correctly. If the client specifies a value greater than this, an exception will be thrown and the query will not execute. If this value is set to zero (0), the maximum number of retries is unlimited.
  • threadPoolSize
    • Default value: 10
    • Type: Integer
    • Controls the size of the thread pool used by the FQP service to perform DCQL and DCQL 2 queries and perform final query aggregation against target data services. Increasing this value may improve performance and responsiveness of the FQP service at the expense of potentially using more server resources.
  • initialResultLeaseInMinutes
    • Default value: 30
    • Type: Integer
    • Controls the initial time-to-live (lease time) of FederatedQueryResults resources. The value is specified in minutes. Unless the client explicitly requests a termination time for their results resource more distant in the future, after this time has elapsed, the resource will be destroyed. When the resource is destroyed, any remaining query execution tasks are terminated and any query results it may have contained are lost.

These properties may be configured at deployment time by the Introduce service deployment GUI, or by directly editing the service.properties file before deploying it.
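
The "zero means unlimited" rule shared by maxTargetServicesPerQuery, maxRetryTimeout and maxRetries can be sketched as follows. This is an illustration of the rule as described above, not the FQP implementation: the class FqpLimitsSketch and its methods are hypothetical, and only the property names and default values come from this guide.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Properties;

/** Illustrative only: mirrors the limit rules described in this guide; not FQP source code. */
final class FqpLimitsSketch {

    /** Reads an integer property, falling back to the documented default when it is absent. */
    static int readLimit(Properties props, String key, int defaultValue) {
        String raw = props.getProperty(key);
        if (raw == null || raw.trim().length() == 0) {
            return defaultValue;
        }
        return Integer.parseInt(raw.trim());
    }

    /** A configured maximum of 0 means unlimited; otherwise the request must not exceed it. */
    static boolean isAllowed(int requested, int configuredMax) {
        return configuredMax == 0 || requested <= configuredMax;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the service.properties file you want to inspect
        Properties props = new Properties();
        FileInputStream in = new FileInputStream(args[0]);
        try {
            props.load(in);
        } finally {
            in.close();
        }
        int maxServices = readLimit(props, "maxTargetServicesPerQuery", 12);
        int maxRetries = readLimit(props, "maxRetries", 4);
        System.out.println("13 target services allowed? " + isAllowed(13, maxServices));
        System.out.println("5 retries allowed? " + isAllowed(5, maxRetries));
    }
}
```

With the default values, both checks print false; setting either property to 0 makes the corresponding check print true.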

Edit Service Metadata

FQP provides service metadata to clients and other services that describes information about the service, operations supported by the service, and information on the organization hosting the service.

Edit the service metadata to reflect your organization as follows:

  1. Open the FQP service metadata file, FQP_HOME/etc/serviceMetadata.xml.
  2. In the hostingResearchCenter element near the bottom of the file, do the following.
    1. Supply your ResearchCenter information.
    2. Supply your Address. This is the address that is used when mapping your service on the caGrid Portal.
    3. Supply the PointOfContact. This is the person responsible for maintaining the service.
      A completed example:
      <ns1:hostingResearchCenter>
        <ns53:ResearchCenter displayName="Ohio State University" shortName="OSU" xmlns:ns53="gme://caGrid.caBIG/1.0/gov.nih.nci.cagrid.metadata.common">
         <ns53:Address country="US" locality="Columbus" postalCode="43210" stateProvince="OH" street1="3190 Graves Hall" street2="333 W. 10th Ave."/>
         <ns53:pointOfContactCollection>
          <ns53:PointOfContact affiliation="OSU" email="John.Doe@osumc.edu" firstName="John" lastName="Doe" phoneNumber="(555) 555-5555" role="Maintainer"/>
         </ns53:pointOfContactCollection>
        </ns53:ResearchCenter>
       </ns1:hostingResearchCenter>
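
After editing the file, a quick check that it is still well-formed XML and that the ResearchCenter entry carries your values can save a failed deployment. The sketch below uses only the standard JDK XML parser; the class MetadataCheckSketch is hypothetical, and it matches elements by local name so it does not depend on the namespace prefixes used in your file.

```java
import java.io.FileInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative helper, not part of caGrid: summarizes the hosting research center entry. */
final class MetadataCheckSketch {

    /** Parses the XML and returns a one-line summary; throws if the XML is not well-formed. */
    static String describe(InputSource source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(source);
            // "*" matches any namespace, so only the local element name matters here.
            NodeList centers = doc.getElementsByTagNameNS("*", "ResearchCenter");
            if (centers.getLength() == 0) {
                return "no ResearchCenter element found";
            }
            Element center = (Element) centers.item(0);
            int contacts = center.getElementsByTagNameNS("*", "PointOfContact").getLength();
            return center.getAttribute("displayName") + " (" + center.getAttribute("shortName")
                + "), contacts: " + contacts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse metadata: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the metadata file, for example FQP_HOME/etc/serviceMetadata.xml
        System.out.println(describe(new InputSource(new FileInputStream(args[0]))));
    }
}
```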
      
Note
By default, FQP registers with and publishes its service metadata to the Index Service. The default Index Service is the Index Service of the target grid you selected when you installed caGrid. You can find configuration details on registering and publishing to the Index Service, including disabling registration and changing which Index Service to register with, on the Registration and Discovery page.

Starting FQP


The Federated Query Processor service requires that the caGrid Transfer Service be deployed to the same Tomcat or JBoss container the FQP service will be deployed to. The Transfer service must be deployed first.

Once a container has been set up, the caGrid Transfer service must be deployed. Change to the caGrid transfer project directory ($CAGRID_LOCATION/projects/transfer) and execute the command "ant deployTomcat". Once this completes, the Federated Query Processor service may be deployed by changing to the $CAGRID_LOCATION/projects/fqp directory and executing "ant deployTomcat".

The Federated Query Processor is an Introduce-created service, so it supports all the standard Introduce deployment processes, which are described in the Introduce Administrator's Guide.

For example, to deploy the service to Tomcat from the command line, type the following command from the FQP_HOME directory:

 > ant deployTomcat 

Start up Tomcat so that your service is available. For Tomcat, run this command:

Linux / Unix

 > $CATALINA_HOME/bin/startup.sh 
Windows

 > %CATALINA_HOME%\bin\startup.bat 

Validating FQP


To validate the service has been deployed and is functioning correctly, use this simple client code to invoke a DCQL query and verify its results:

import gov.nih.nci.cagrid.common.Utils;
import gov.nih.nci.cagrid.dcql.DCQLQuery;
import gov.nih.nci.cagrid.dcqlresult.DCQLQueryResultsCollection;
import gov.nih.nci.cagrid.fqp.client.FederatedQueryProcessorClient;

import java.io.FileReader;
import java.io.StringWriter;

/**
 * Minimal client that runs a DCQL query through an FQP service and prints the results.
 * Usage: java SimpleFQP <fqp-service-url> <dcql-query-file>
 */
public class SimpleFQP {

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: SimpleFQP <fqp-service-url> <dcql-query-file>");
            System.exit(1);
        }
        String url = args[0];
        String queryFile = args[1];

        try {
            FederatedQueryProcessorClient client =
                new FederatedQueryProcessorClient(url);

            // Deserialize the DCQL query from its XML file; always release the file handle.
            FileReader reader = new FileReader(queryFile);
            DCQLQuery query;
            try {
                query = (DCQLQuery) Utils.deserializeObject(
                    reader, DCQLQuery.class);
            } finally {
                reader.close();
            }

            // Execute the query and wait for the aggregated results.
            System.out.println("Querying " + url);
            DCQLQueryResultsCollection results = client.execute(query);

            // Serialize the results back to XML and print them.
            StringWriter writer = new StringWriter();
            Utils.serializeObject(
                results,
                DCQLQueryResultsCollection.getTypeDesc().getXmlType(),
                writer);
            System.out.println(writer.getBuffer().toString());
            System.out.println("Done");
        } catch (Exception ex) {
            ex.printStackTrace();
            System.exit(1);
        }
    }
}

The simple main method takes two parameters. The first is the URL of the Federated Query Processor service, and the second is the filename of a DCQL query to execute. Try running the following query, which simply queries the caArray data service for Publications with an ID less than or equal to 10.

<ns1:DCQLQuery xmlns:ns1="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcql">
 <ns1:TargetObject name="gov.nih.nci.caarray.domain.publication.Publication">
  <ns1:Attribute name="id" predicate="LESS_THAN_EQUAL_TO" value="10"/>
 </ns1:TargetObject>
 <ns1:targetServiceURL>http://array.nci.nih.gov:80/wsrf/services/cagrid/CaArraySvc</ns1:targetServiceURL>
</ns1:DCQLQuery>

Executing this query should produce results similar to the following:

<ns1:DCQLQueryResultsCollection xmlns:ns1="http://caGrid.caBIG/1.0/gov.nih.nci.cagrid.dcqlresult">
 <ns1:DCQLResult targetServiceURL="http://array.nci.nih.gov:80/wsrf/services/cagrid/CaArraySvc">
  <ns2:CQLQueryResultCollection targetClassname="gov.nih.nci.caarray.domain.publication.Publication" xmlns:ns2="http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLResultSet">
   <ns2:ObjectResult>
    <ns2:Publication id="1" authors="Calvo A, Xiao N, Kang J, Best CJ, Leiva I, Emmert-Buck MR, Jorcyk C, Green JE" pages="5325-35" publication="Cancer Research" pubMedId="12235003" title="Alterations in gene expression profiles during prostate cancer progression" uri="http://cancerres.aacrjournals.org/cgi/content/full/62/18/5325" volume="62" year="2002" xmlns:ns2="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.publication">
     <ns2:status>
      <ns3:Term id="680" value="Published" xmlns:ns3="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.vocabulary">
       <ns3:categories/>
      </ns3:Term>
     </ns2:status>
    </ns2:Publication>
   </ns2:ObjectResult>
   <ns2:ObjectResult>
    <ns4:Publication id="10" authors="Rygaard K, Sorenson GD, Pettengill OS, Cate CC, Spang-Thomsen M." pages="5312-5317" publication="Cancer Research" pubMedId="2167152" title="Abnormalities in structure and expression of the retinoblastoma gene in small cell lung cancer cell lines and xenografts in nude mice" volume="50" year="1990" xmlns:ns4="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.publication">
     <ns4:status>
      <ns5:Term id="680" value="Published" xmlns:ns5="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.vocabulary">
       <ns5:categories/>
      </ns5:Term>
     </ns4:status>
     <ns4:type>
      <ns6:Term id="47" accession="MO_430" url="http://mged.sourceforge.net/ontologies/MGEDontology.php#journal_article" value="journal_article" xmlns:ns6="gme://caArray.caBIG/2.1/gov.nih.nci.caarray.domain.vocabulary">
       <ns6:categories/>
      </ns6:Term>
     </ns4:type>
    </ns4:Publication>
   </ns2:ObjectResult>
  </ns2:CQLQueryResultCollection>
 </ns1:DCQLResult>
</ns1:DCQLQueryResultsCollection>
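
Result documents like the one above can be long, so when validating a deployment it may be easier to look at a per-service count of returned objects than at the raw XML. The following sketch assumes you have saved the printed results to a file; the class ResultSummarySketch is hypothetical and uses only the standard JDK XML parser, matching the DCQLResult and ObjectResult elements by local name.

```java
import java.io.FileInputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative helper, not part of caGrid: counts ObjectResult entries per target service. */
final class ResultSummarySketch {

    /** Maps each DCQLResult's targetServiceURL to the number of ObjectResult elements in it. */
    static Map<String, Integer> countByService(InputSource source) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(source);
            Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
            NodeList results = doc.getElementsByTagNameNS("*", "DCQLResult");
            for (int i = 0; i < results.getLength(); i++) {
                Element result = (Element) results.item(i);
                int objects = result.getElementsByTagNameNS("*", "ObjectResult").getLength();
                counts.put(result.getAttribute("targetServiceURL"), Integer.valueOf(objects));
            }
            return counts;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse results: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a file containing a saved DCQLQueryResultsCollection document
        System.out.println(countByService(new InputSource(new FileInputStream(args[0]))));
    }
}
```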