
caGrid Transfer


caGrid Transfer 1.2 Design Guide



Architecture


The architecture of the caGrid Transfer Service is simple yet powerful. It consists of four main components:

  1. Transfer Service
  2. Transfer Service Helper
  3. Transfer WebApp
  4. Transfer Client Helper

Each component plays a role in staging the data, persisting the information that describes the data to be transferred, securing the data, transferring it from service to client or from client to service, or cleaning it up.

Figure: Transfer Architecture

Components


caGrid Transfer Service

The caGrid Transfer Service is a WSRF-based grid service responsible for creating resources that represent the data to be held and transferred. It uses the WS-Resource Framework to create a unique, stateful resource instance for each data item to be transferred. Grid Transfer is designed to run in the same container as the invoking service, and it is better suited than SOAP for moving large data items, because the data itself travels over HTTP(S) through the Transfer Webapp rather than inside SOAP messages.
The calling service passes either a pointer to a file or the data itself to the TransferServiceHelper. The helper creates an instance of the TransferServiceContext resource (via its ResourceHome), and the data and a pointer to it are stored until either the client picks the data up or the resource is destroyed:

  • Data is stored as a file on the file system
  • The resource persists the information it needs to operate as a ResourceProperty of type DataStorageDescriptor on the service, so it can survive a container or system restart.
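The two storage points above (data as a file, plus persisted metadata that survives a restart) can be illustrated with a minimal sketch. It assumes a secure container, so a creator DN is always known, and it uses a Java properties file as a stand-in for the DataStorageDescriptor resource property; the StagedResource class, its methods, and the property names are hypothetical and are not part of the caGrid API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical sketch: names are illustrative, not the caGrid implementation.
class StagedResource {
    final String userDN;  // DN of the creator (assumes a secure container)
    final Path dataFile;  // where the staged data lives on disk

    StagedResource(String userDN, Path dataFile) {
        this.userDN = userDN;
        this.dataFile = dataFile;
    }

    // Writes the data to <id>.data and persists its descriptor to <id>.descriptor.
    static StagedResource stage(Path stagingDir, String id, byte[] data, String userDN) throws IOException {
        Path dataFile = stagingDir.resolve(id + ".data");
        Files.write(dataFile, data);
        StagedResource resource = new StagedResource(userDN, dataFile);
        resource.persist(stagingDir.resolve(id + ".descriptor"));
        return resource;
    }

    // Saves the fields needed to serve the data later (owner DN, file location).
    void persist(Path descriptorFile) throws IOException {
        Properties props = new Properties();
        props.setProperty("userDN", userDN);
        props.setProperty("location", dataFile.toAbsolutePath().toString());
        try (OutputStream out = Files.newOutputStream(descriptorFile)) {
            props.store(out, "staged transfer resource");
        }
    }

    // Rebuilds the resource from its descriptor, e.g. after a container restart.
    static StagedResource reload(Path descriptorFile) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(descriptorFile)) {
            props.load(in);
        }
        return new StagedResource(props.getProperty("userDN"), Paths.get(props.getProperty("location")));
    }
}
```

Because only the DN and the file location are needed to serve the data, reloading them is enough to resume serving a staged item after a restart.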

Grid Transfer also supports the WS-Notification specification: by subscribing to the DataStorageDescriptor, clients are notified whenever the data staging status of the resource changes.
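The subscribe-and-notify idea can be pictured as a plain in-process observer. This is only an analogy, not WS-Notification itself: the StatusNotifier class is hypothetical, and the two Status values are assumptions modeled on the Status.Staged value mentioned later in this guide.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical in-process analogy of status-change notification (not WS-Notification).
class StatusNotifier {
    // Assumed status values, for illustration only.
    enum Status { Staging, Staged }

    private final List<BiConsumer<Status, Status>> subscribers = new CopyOnWriteArrayList<>();
    private Status status = Status.Staging;

    // Registers a listener that receives (oldStatus, newStatus) on every change.
    void subscribe(BiConsumer<Status, Status> listener) {
        subscribers.add(listener);
    }

    synchronized Status getStatus() {
        return status;
    }

    // Updates the status and notifies subscribers only if the value actually changed.
    void setStatus(Status newStatus) {
        Status old;
        synchronized (this) {
            old = status;
            if (old == newStatus) {
                return;
            }
            status = newStatus;
        }
        for (BiConsumer<Status, Status> s : subscribers) {
            s.accept(old, newStatus);
        }
    }
}
```

Listeners are called outside the lock, so a slow subscriber cannot block other status updates.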
If deployed into a secure container, the service uses the caller's identity to protect the data. The caller's distinguished name (DN) and the file location are written into the resource's persisted data; the TransferServlet uses them to verify that whoever retrieves the data is authorized to do so. The service has a public method, getDataTransferDescriptor(), which returns a DataTransferDescriptor object containing the URL from which the data can be retrieved over HTTP(S).

Transfer Webapp

The Transfer Webapp is a Java servlet deployed into the same container as Globus. It delivers the data to the consumer over an HTTP or GSI-based HTTPS connection:

  • If Globus is deployed to this container in non-secure mode, plain HTTP sockets are used and security is not enforced (i.e., anyone with the URL to the data item can retrieve it).
  • If the container is secure, the servlet communicates over HTTPS using GSI secure sockets (it uses the same Tomcat or JBoss connector as Axis/Globus). The secure connection carries the caller's credentials, so the Transfer Webapp can compare the caller's identity with the identity recorded for the requested resource. The Transfer Webapp looks up the resource in the TransferServiceContext (whose DataStorageDescriptor represents this data) and compares its userDN attribute with the caller's authenticated DN. If they match, the caller holds the same credentials as the creator of the data item, and the data is streamed back to the caller. If not, the connection is dropped and the data remains protected on the server.
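The access decision in the two cases above reduces to a small rule, sketched here under stated assumptions: extracting the caller's DN from the GSI connection is out of scope, and TransferAccessCheck and mayStream are hypothetical names rather than part of the Transfer Webapp.

```java
// Hypothetical sketch of the Transfer Webapp's access rule, not its actual code.
class TransferAccessCheck {
    /**
     * @param secureContainer whether the container enforces GSI security
     * @param ownerDN         userDN recorded when the resource was created (may be null)
     * @param callerDN        authenticated DN of the caller, or null if none
     * @return true if the data may be streamed back to the caller
     */
    static boolean mayStream(boolean secureContainer, String ownerDN, String callerDN) {
        if (!secureContainer) {
            // No security: anyone holding the URL can retrieve the data.
            return true;
        }
        // Secure: require an authenticated caller whose DN equals the creator's DN.
        return ownerDN != null && callerDN != null && ownerDN.equals(callerDN);
    }
}
```

This sketch compares DNs as exact strings. In practice the same identity can be written in more than one DN format (for example, Globus-style slash-separated versus RFC 2253 comma-separated), so a real check may need to normalize both sides first.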

APIs


Transfer Service Helper

The Transfer Service Helper is a server-side API used to create the TransferServiceContextResource for the data being transferred. It uses the ResourceHome of the TransferServiceContext to create an instance of TransferServiceContextResource, which records the user's identity and the file location of the data to be transferred. The helper provides several createTransferContext methods for creating a transfer resource. For download scenarios, they accept a byte array, an input stream, or a file; for upload scenarios, no data is needed up front. Each method returns a TransferServiceContextReference, which contains the endpoint reference (EPR) of the resource. The caller uses this EPR to obtain the resource's DataTransferDescriptor for retrieval or submission. The API is as follows:

package org.cagrid.transfer.context.service.helper;

import java.io.File;
import java.io.InputStream;
import java.rmi.RemoteException;

import org.cagrid.transfer.context.stubs.types.TransferServiceContextReference;
// DataDescriptor and DataStagedCallback come from the caGrid Transfer libraries (imports omitted).

public class TransferServiceHelper {

    // Sends data from an existing file. deleteFileOnDestroy controls whether the
    // file is deleted when the transfer resource is destroyed.
    public static TransferServiceContextReference createTransferContext(File file, DataDescriptor dd, boolean deleteFileOnDestroy) throws RemoteException {
        // ...
    }

    // Sends data supplied as a byte array.
    public static TransferServiceContextReference createTransferContext(byte[] data, DataDescriptor dd) throws RemoteException {
        // ...
    }

    // Sends data read from an input stream.
    public static TransferServiceContextReference createTransferContext(InputStream is, DataDescriptor dd) throws RemoteException {
        // ...
    }

    // Receives an upload: no data is supplied; the callback is invoked once the
    // client marks the uploaded data as staged.
    public static TransferServiceContextReference createTransferContext(DataDescriptor dd, DataStagedCallback callback) throws RemoteException {
        // ...
    }
}
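One plausible way the three download overloads could share a code path is sketched below. This is a hypothetical illustration, not the caGrid implementation: StagingSketch and StagedItem are invented names. It assumes a caller-supplied file is referenced in place (which is what a deleteFileOnDestroy flag suggests), while byte arrays and streams are copied into a staging file that the resource owns.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of normalizing the download inputs; not the caGrid code.
class StagingSketch {
    // Where the data lives and whether destroying the resource should delete it.
    static final class StagedItem {
        final Path location;
        final boolean deleteOnDestroy;

        StagedItem(Path location, boolean deleteOnDestroy) {
            this.location = location;
            this.deleteOnDestroy = deleteOnDestroy;
        }

        void destroy() throws IOException {
            if (deleteOnDestroy) {
                Files.deleteIfExists(location);
            }
        }
    }

    // A caller-supplied file is used in place; the flag decides whether destroy() removes it.
    static StagedItem fromFile(Path file, boolean deleteFileOnDestroy) {
        return new StagedItem(file, deleteFileOnDestroy);
    }

    // In-memory data is routed through the stream path.
    static StagedItem fromBytes(byte[] data, Path stagingDir) throws IOException {
        return fromStream(new ByteArrayInputStream(data), stagingDir);
    }

    // Streamed data is copied into a staging file owned by the resource, so it is always cleaned up.
    static StagedItem fromStream(InputStream in, Path stagingDir) throws IOException {
        Path target = Files.createTempFile(stagingDir, "transfer-", ".data");
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return new StagedItem(target, true);
    }
}
```

Routing byte arrays through the stream path keeps a single copy routine, and recording ownership per item makes cleanup on destroy a simple flag check.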

Transfer Client Helper


The Transfer Client Helper is a client-side API for retrieving or submitting the data item. The data item is either created by the grid service and held by the servlet (waiting to be delivered), or will later be received by the servlet and held for the service to process. In the download scenario, if the container is secure, the user calls the operation below (if the container is insecure, null can be passed for the credentials):


public static InputStream getData(DataTransferDescriptor desc, GlobusCredential creds) throws Exception

Alternatively, in an insecure container, the user can call:


public static InputStream getData(DataTransferDescriptor desc) throws Exception

This call opens the appropriate socket connection to the URL in the DataTransferDescriptor, which points to the data item to be retrieved (or, for uploads, to the location where the data can be posted). If the connection opens successfully and the user is authorized, an InputStream to the data is returned, which the user reads to obtain the data. Once all of the data has been read, the user can call destroy() on the TransferServiceContextClient to let the server know it can remove the cached data.
Uploads work the same way but use different methods; in a secure container, the user calls:
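For an insecure container (where null credentials are passed), retrieval amounts to an HTTP GET on the descriptor's URL. The sketch below uses only the JDK's HttpURLConnection and assumes the URL is available as a plain string; PlainDownload and its methods are hypothetical names, and the GSI/HTTPS path used in secure containers is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical sketch of an unauthenticated download; not the caGrid client code.
class PlainDownload {
    // Opens the data URL and returns the response body as a stream.
    static InputStream openData(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        int code = conn.getResponseCode();
        if (code != HttpURLConnection.HTTP_OK) {
            conn.disconnect();
            throw new IOException("Unexpected HTTP status " + code + " from " + url);
        }
        // The caller reads this stream fully and closes it; the resource can then be destroyed.
        return conn.getInputStream();
    }

    // Reads the whole stream into memory: fine for small items, use a file for large ones.
    static byte[] readAll(InputStream in) throws IOException {
        try (InputStream s = in) {
            return s.readAllBytes();
        }
    }
}
```

Checking the status code before returning the stream means an unauthorized or missing item surfaces as an explicit error instead of an empty read.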


public static void putData(InputStream is, long contentLength, DataTransferDescriptor desc, GlobusCredential creds) throws Exception

Alternatively, in an insecure container, the user can call:


public static void putData(InputStream is, long contentLength, DataTransferDescriptor desc) throws Exception

In this case, after the data has been read from the InputStream and sent, the user should call setStatus(Status.Staged) on the TransferServiceContextClient to let the service know that the data is present and can be processed. If the service registered a DataStagedCallback when it created the resource with the TransferServiceHelper, that callback is then invoked.
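The contentLength parameter of putData suggests the body is streamed with a known length. The sketch below shows that idea for an insecure container using the JDK's HttpURLConnection: a fixed-length streaming mode sends a Content-Length header without buffering the whole upload in memory. PlainUpload is a hypothetical name, and the use of POST is an assumption based on the guide's wording that data "can be posted".

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical sketch of an unauthenticated upload; not the caGrid client code.
class PlainUpload {
    // Streams exactly contentLength bytes from is to the given URL (POST is assumed).
    static void putData(InputStream is, long contentLength, String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        // Known length: send Content-Length and stream the body instead of buffering it.
        conn.setFixedLengthStreamingMode(contentLength);
        try (OutputStream out = conn.getOutputStream()) {
            byte[] buf = new byte[8192];
            long remaining = contentLength;
            while (remaining > 0) {
                int n = is.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new IOException("Stream ended " + remaining + " bytes early");
                }
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
        int code = conn.getResponseCode();
        conn.disconnect();
        if (code / 100 != 2) {
            throw new IOException("Upload failed with HTTP status " + code);
        }
        // Next step (not shown): mark the data as staged, e.g. setStatus(Status.Staged).
    }
}
```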

Performance


The chart below plots transfer time against data size. It shows that:

  • Transfer time grows linearly with data size.
  • GSI-encrypted HTTPS is, on average, about an order of magnitude slower than plain HTTP.

Figure: Transfer Performance
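One way to summarize the two trends compactly is shown below. The symbols are introduced here only for illustration: s is the data size, t_0 a fixed per-transfer overhead, and B the effective throughput of the transport. The ratio simply restates the order-of-magnitude observation; no fitted values for t_0 or B are given in these measurements.

```latex
t(s) \approx t_0 + \frac{s}{B},
\qquad
\frac{t_{\mathrm{HTTPS}}(s)}{t_{\mathrm{HTTP}}(s)} \approx 10
```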

Tests were run with the client and server on the same machine to eliminate variation due to network traffic. This isolates the latency trends of the software itself from effects caused by network anomalies or limited bandwidth.

Last edited by Sarah Honacki