Bulk Data Transfer


Design


Shannon Hastings

  • Inst 8-14-06
  • Mod 9-1-06
  • Mod 10-27-06

Overall Architecture:


The BulkDataHandler service aims to gather useful standard transfer mechanisms behind one standard service interface. The service will advertise metadata describing the transfer mechanisms that it supports. The BulkDataHandler architecture uses WSRF (Web Services Resource Framework) so that the data provider can create resources which are later used to transmit results to the user via the BulkDataHandler service. The data provider will be required to implement a resource interface on the server so that the BulkDataHandler can transmit the provider's data. The current version of the BulkDataHandler supports three transfer mechanisms: WS-Transfer, WS-Enumeration, and an accessor for obtaining GridFTP URLs. We believe that this set of exposed functionality will enable users to move larger data sets effectively from service to client.

WS-Transfer and WS-Enumeration are standard data transport specifications for grid-based computing and will fully adhere to the security and data standards requirements of caBIG. GridFTP, however, is a transfer technology rather than a standard for what is transferred; using GridFTP does not by itself ensure interoperability between the server and the client. In this version we therefore assume that the transfer of large data over the GridFTP channel is not guaranteed to be interoperable with the client. In our next version we plan to address this issue by specifying the "wire contents" of the GridFTP channel (for example, requiring an XML representation, or adopting a binary format such as BINX or DFDL).

Figure 1: Bulk Data Transfer Grid Service Architecture

Any service that wants to use the bulk data transfer service will be required to provide the BDTMetadata. This metadata will describe the methods that provide BDT services for transferring results, as well as the available transfer mechanisms. The service must create and initialize a BDTResource for each call to any BDT-enabled method, that is, any method that returns a BDTHandlerReference. The BDTResource will be implemented by the data service provider so that the BDT capability can be hooked into the back-end data resource in the most efficient way. This style of resource creation is known as the factory pattern. Any method that uses the BulkDataTransferService to transmit its results must create a BDTResource, initialize it, and return an EPR (endpoint reference) for that resource to the caller, who can then use the BulkDataTransferService methods to retrieve the results via the desired mechanism. Each such method must return a BDTHandlerReference to the BulkDataTransferService, which will use the referenced resource to handle the transfer of the results.
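The factory pattern described above can be sketched in plain Java. This is only an illustration under stated assumptions: the types BdtResource, BdtHandlerReference, BdtResourceHome, and ExampleDataService are hypothetical stand-ins, not the generated BDT classes, and the in-memory map stands in for whatever the WSRF container would manage in a real deployment.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the resource the data provider implements. */
interface BdtResource {
    /** Number of result items this resource can transfer. */
    int size();
}

/** Hypothetical stand-in for a BDTHandlerReference: carries the key an EPR would carry. */
final class BdtHandlerReference {
    private final String resourceKey;

    BdtHandlerReference(String resourceKey) {
        this.resourceKey = resourceKey;
    }

    String getResourceKey() {
        return resourceKey;
    }
}

/** Hypothetical in-memory resource home (assumption: replaces the WSRF container). */
final class BdtResourceHome {
    private final Map<String, BdtResource> resources = new ConcurrentHashMap<>();

    /** Registers a resource and returns a reference that identifies it. */
    BdtHandlerReference register(BdtResource resource) {
        String key = UUID.randomUUID().toString();
        resources.put(key, resource);
        return new BdtHandlerReference(key);
    }

    /** Returns the referenced resource, or null if it does not exist. */
    BdtResource resolve(BdtHandlerReference reference) {
        return resources.get(reference.getResourceKey());
    }
}

/** A BDT-enabled method: create a resource, initialize it, return a reference to it. */
final class ExampleDataService {
    private final BdtResourceHome home;

    ExampleDataService(BdtResourceHome home) {
        this.home = home;
    }

    BdtHandlerReference queryWithBulkTransfer(String query) {
        // Initialize the resource from whatever the back end produced for this call.
        final String[] results = runQuery(query);
        BdtResource resource = () -> results.length;
        return home.register(resource);
    }

    private String[] runQuery(String query) {
        // Placeholder back end: always returns two fake rows.
        return new String[] { query + "-result-1", query + "-result-2" };
    }
}
```

Each call produces its own resource and its own reference, so two callers issuing the same query do not share transfer state.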

BDTMetadata:


Figure 2: Bulk Data Transfer Service Metadata


Figure 3: Transfer Provider Metadata

The BDTMetadata, as described above, will accompany any service that can return BDTHandlerService EPRs. The BDTMetadata will describe the 3rd party transfer capabilities available in the BDTService, as well as the operations that create and return a reference to a BDTResource. The metadata will be stored as resource properties (WSRF) of the main service. These resource properties can optionally be registered with an index service to aid in service discovery.
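As a rough picture of what this metadata carries, the sketch below models it as a plain Java value object. The names TransferMechanism and BdtMetadataSketch and their methods are assumptions made for illustration; the actual metadata schema is the one shown in Figures 2 and 3, and a real service would expose it as WSRF resource properties rather than as a Java object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/** The three mechanisms named in this design (enum name is hypothetical). */
enum TransferMechanism { WS_TRANSFER, WS_ENUMERATION, GRIDFTP_URL }

/** Hypothetical, immutable view of the metadata a BDT-enabled service advertises. */
final class BdtMetadataSketch {
    private final Set<TransferMechanism> mechanisms;
    private final List<String> bdtOperations;

    BdtMetadataSketch(Set<TransferMechanism> mechanisms, List<String> bdtOperations) {
        // Defensive copies so the advertised metadata cannot change after construction.
        EnumSet<TransferMechanism> copy = EnumSet.noneOf(TransferMechanism.class);
        copy.addAll(mechanisms);
        this.mechanisms = Collections.unmodifiableSet(copy);
        this.bdtOperations = Collections.unmodifiableList(new ArrayList<>(bdtOperations));
    }

    /** Transfer mechanisms the transfer service offers for this service's results. */
    Set<TransferMechanism> getMechanisms() {
        return mechanisms;
    }

    /** Names of the operations that create and return a reference to a resource. */
    List<String> getBdtOperations() {
        return bdtOperations;
    }

    boolean supports(TransferMechanism mechanism) {
        return mechanisms.contains(mechanism);
    }

    boolean isBdtOperation(String operationName) {
        return bdtOperations.contains(operationName);
    }
}
```

A client could consult such an object before calling the service, for example to confirm that queryWithBulkTransfer is BDT-enabled and that enumeration is offered.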

BDTResource:


The BDTResource will be the container class responsible for maintaining information about the result set, how to retrieve it, and what has been transferred. An instance of this resource class must be created any time an operation is called that returns a BDTHandlerReference EPR. The BDTResource class will be automatically generated as a stub for the developer, who will only need to fill in the implementation to suit the desired application. The methods in the user's service that return a BDTHandlerReference will likewise be generated automatically, and will only need to be customized to construct the resource with whatever information is needed from the input of the method being called. The BDTResource is a lifetime-managed (WS-ResourceLifetime) resource. The designer can set the termination time or remove the resource as the application requires.

Figure 4: BDTResource Class Stub
Figure 5: Sample Instance Method Using BDT Image
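The responsibilities listed for the BDTResource (holding the result set, tracking what has been transferred, and supporting a termination time or explicit removal) can be sketched as follows. This is not the generated stub from Figure 4; the class name LifetimeManagedResultResource and all of its methods are hypothetical, and the current time is passed in explicitly only to keep the example deterministic.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical resource: result set plus transfer progress plus lifetime. */
final class LifetimeManagedResultResource {
    private final List<String> results;
    private int transferred;          // how many items have been handed out so far
    private Instant terminationTime;  // null means no scheduled termination
    private boolean destroyed;

    LifetimeManagedResultResource(List<String> results, Instant terminationTime) {
        this.results = new ArrayList<>(results);
        this.terminationTime = terminationTime;
    }

    /** Returns the next chunk of at most maxItems results and records the progress. */
    synchronized List<String> next(int maxItems, Instant now) {
        if (maxItems < 0) {
            throw new IllegalArgumentException("maxItems must not be negative");
        }
        if (isExpired(now)) {
            throw new IllegalStateException("resource has been terminated");
        }
        int end = Math.min(results.size(), transferred + maxItems);
        List<String> chunk = new ArrayList<>(results.subList(transferred, end));
        transferred = end;
        return chunk;
    }

    synchronized int getTransferredCount() {
        return transferred;
    }

    /** Lets the designer extend or shorten the lifetime. */
    synchronized void setTerminationTime(Instant newTerminationTime) {
        this.terminationTime = newTerminationTime;
    }

    /** Removes the resource immediately, regardless of the termination time. */
    synchronized void destroy() {
        destroyed = true;
    }

    synchronized boolean isExpired(Instant now) {
        return destroyed || (terminationTime != null && !now.isBefore(terminationTime));
    }
}
```

Tracking the transferred count inside the resource is what allows a caller to pull results in several requests without the service re-running the query.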

Possible Usage Scenario:

The user calls queryWithBulkTransfer(CQLQuery query) on the service. The data service will run the query, create a BDTResource to represent the results of the query, and return a BDTResource EPR to the caller. The user will then be able to call the desired 3rd party transfer method on the BulkDataTransferService. The resource to which the EPR refers will then be used to transfer the data to the caller.
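Seen from the client side, the scenario amounts to two steps: obtain a reference, then hand it to the transfer method of choice. The sketch below collapses the data service and the transfer service into a single in-memory class purely for illustration. ScenarioSketch, its string keys (standing in for EPRs), and the enumerate and get methods are assumptions; the query is a plain string rather than a CQLQuery, and the result rows are fabricated placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical data service and transfer service collapsed into one class. */
final class ScenarioSketch {
    private final Map<String, List<String>> resourcesByKey = new HashMap<>();
    private int nextKey = 0;

    /** Step 1: run the query, create a resource for the results, return its key. */
    String queryWithBulkTransfer(String query) {
        // Placeholder results; a real service would query its back end here.
        List<String> results = Arrays.asList(query + "#0", query + "#1", query + "#2");
        String key = "bdt-" + (nextKey++);
        resourcesByKey.put(key, results);
        return key;
    }

    /** Step 2, option A: enumeration-style retrieval, one item at a time. */
    Iterator<String> enumerate(String key) {
        return lookup(key).iterator();
    }

    /** Step 2, option B: retrieve the whole result in a single call. */
    List<String> get(String key) {
        return new ArrayList<>(lookup(key));
    }

    private List<String> lookup(String key) {
        List<String> results = resourcesByKey.get(key);
        if (results == null) {
            throw new IllegalArgumentException("unknown resource: " + key);
        }
        return results;
    }

    public static void main(String[] args) {
        ScenarioSketch grid = new ScenarioSketch();
        String reference = grid.queryWithBulkTransfer("q");
        // The caller picks the mechanism; the reference stays the same either way.
        for (Iterator<String> it = grid.enumerate(reference); it.hasNext();) {
            System.out.println(it.next());
        }
    }
}
```

The point of the design is visible in main: the query operation does not return data, only a reference, and the choice of transfer mechanism is deferred to the second call.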
