This guide explains how to install a local version of caGrid to support a new cooperative grid.
|Keep Detailed Notes|
As you work through all of the following steps, it is highly recommended that you keep detailed notes of everything that you do. Small mistakes in one step can cause large problems in a later step. Having detailed notes makes it easier to solve these problems by retracing your steps.
Most of the services available on a grid are there because they perform a function that is needed by the grid's users. However, there are a few services that are needed as grid infrastructure. The Dorian service is needed to issue credentials to users that prove their authenticated identity. Since the user credentials issued by Dorian are normally valid for twelve hours, a single instance of Dorian can support many users.
Another service that is needed for grid infrastructure is the Grid Trust Service (GTS). The purpose of GTS is to tell the other grid services and clients which instances of Dorian to trust and whether any credentials issued by a trusted Dorian have been revoked by that Dorian. Since other services and clients ask GTS for updates every few minutes, a moderately large grid may overwhelm a single GTS service.
To avoid the grid-wide performance problem that an overwhelmed GTS would cause, the installation instructions for caGrid describe how to configure a grid with a master and slave GTS. When configured this way, only the slave GTS gets updates from the master GTS and everything else gets updates from the slave GTS.
Having a grid with a master and slave GTS allows the GTS to support a much larger grid. There can be multiple slave GTS instances in a grid. When grid growth causes the slave GTS to approach its capacity, you can add another slave GTS.
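To see why the master/slave layout spreads the load, the rough arithmetic below counts sync requests per minute. It is a sketch under stated assumptions: every service and client polls on the same fixed interval, services and clients are divided evenly among the slave instances, and the counts in the example are hypothetical. Real GTS load also depends on how much trust data each request returns.

```python
def requests_per_minute(pollers: float, poll_interval_min: float) -> float:
    """Average number of sync requests per minute from `pollers` parties."""
    if poll_interval_min <= 0:
        raise ValueError("poll interval must be positive")
    return pollers / poll_interval_min


def single_gts_load(pollers: int, poll_interval_min: float) -> float:
    # Every service and client polls the one GTS directly.
    return requests_per_minute(pollers, poll_interval_min)


def master_slave_load(pollers: int, slaves: int,
                      poll_interval_min: float) -> tuple[float, float]:
    """Return (master load, load on each slave) in requests per minute."""
    if slaves < 1:
        raise ValueError("need at least one slave GTS")
    # Only the slaves poll the master.
    master = requests_per_minute(slaves, poll_interval_min)
    # Assumption: services and clients are divided evenly among the slaves.
    per_slave = requests_per_minute(pollers / slaves, poll_interval_min)
    return master, per_slave


if __name__ == "__main__":
    print(single_gts_load(600, 5))       # 120.0
    print(master_slave_load(600, 2, 5))  # (0.4, 60.0)
```

With 600 services and clients polling every 5 minutes, a single GTS receives about 120 requests per minute. With two slaves, the master receives only 0.4 requests per minute and each slave about 60, which is why adding slaves scales the grid while the master stays lightly loaded.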
Including a slave GTS in a new grid configuration makes the grid more complicated to set up. If you expect your grid to grow indefinitely, then it is best to configure it with a slave GTS now. If you expect your grid to have fewer than 40 services and only a moderate number of clients at any one time, then you can save some time now by including only a single GTS instance in your grid.
The decision not to include a slave GTS in a grid can be changed after the grid is in production. However, this change requires making the grid unavailable for a short time, and the change process is not yet well documented.
Based on these guidelines, decide whether you want to configure your grid with a single GTS or a master and slave GTS.
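If you want to record this decision alongside your notes, the guidelines can be written down as a small function. This is only a paraphrase of the guidelines above: whether the number of clients is "moderate" is a judgment you pass in, and treating a grid that is neither expected to grow indefinitely nor small (fewer than 40 services with moderate client load) as needing a slave GTS is a conservative reading, not a rule stated in this guide.

```python
def recommend_gts_layout(expected_services: int, moderate_clients: bool,
                         expect_indefinite_growth: bool) -> str:
    """Paraphrase of this guide's sizing guidelines for the GTS layout."""
    if expect_indefinite_growth:
        # A grid expected to keep growing should get a slave GTS now.
        return "master and slave GTS"
    if expected_services < 40 and moderate_clients:
        # Small grid: a single GTS saves setup time.
        return "single GTS"
    # Assumption: any other case is treated conservatively.
    return "master and slave GTS"


print(recommend_gts_layout(25, moderate_clients=True,
                           expect_indefinite_growth=False))  # single GTS
```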
Choose a one word name for your new grid. Write this name down. You will incorporate this name into other names. In this documentation, we refer to this one-word grid name as [GRID_NAME]. Wherever you see [GRID_NAME], replace it with the actual one-word name that you have chosen.
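Because [GRID_NAME] appears in many later names, it can help to check the chosen name once and substitute it mechanically into your notes or templates. The sketch below assumes the name is a lowercase word of letters and digits starting with a letter, so that it also works as a DNS label; that restriction is an assumption, not a caGrid requirement.

```python
import re

# Assumption: one lowercase word of letters and digits, starting with a letter.
GRID_NAME_RE = re.compile(r"[a-z][a-z0-9]*")


def substitute_grid_name(text: str, grid_name: str) -> str:
    """Replace every literal [GRID_NAME] placeholder in text with grid_name."""
    if not GRID_NAME_RE.fullmatch(grid_name):
        raise ValueError(f"not a one-word lowercase grid name: {grid_name!r}")
    return text.replace("[GRID_NAME]", grid_name)


print(substitute_grid_name("dorian.[GRID_NAME].example.org", "abc"))
# dorian.abc.example.org
```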
We recommend that you configure one virtual machine (VM) per service and that you then clone that machine for future installations.
Begin by planning how you will configure the host and each VM. For example, plan the following:
- Host names
- IP addresses
- DNS names
Next, determine the grid service deployment layout for each virtual machine, as described below.
The following table was used when planning to install and configure the caGrid 1.3 Training Grid. Record your information in a similar manner.
caGrid 1.3 Example Grid Layout
|External Hostname||Port||IP||Secure||Database||VM Disk Space||VM RAM|
|portal.training.cagrid.org||80 / 443||184.108.40.206||NO/YES||YES||60GB||4GB|
The recommended naming convention for grid host names combines the service name, the grid name, and your organization's domain.
For example, if the name you have chosen for your grid is "abc", then the name of the host for the dorian service might be dorian.abc.example.org.
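Following the pattern in the example above (service name, then grid name, then your domain), the planned host names can be generated rather than typed by hand. This is a minimal sketch: the service list and domain in the example are hypothetical, and the label check follows the general DNS rule (1 to 63 letters, digits, or hyphens per label, not starting or ending with a hyphen).

```python
import re

# One DNS label: 1-63 letters, digits, or hyphens, no leading or trailing hyphen.
DNS_LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def grid_host_name(service: str, grid_name: str, domain: str) -> str:
    """Build <service>.<grid name>.<domain>, checking each DNS label."""
    labels = [service, grid_name, *domain.split(".")]
    for label in labels:
        if not DNS_LABEL_RE.fullmatch(label):
            raise ValueError(f"invalid DNS label: {label!r}")
    return ".".join(labels).lower()


def plan_host_names(services: list[str], grid_name: str, domain: str) -> dict[str, str]:
    """Map each planned service to its host name."""
    return {s: grid_host_name(s, grid_name, domain) for s in services}


print(plan_host_names(["dorian", "gts"], "abc", "example.org"))
# {'dorian': 'dorian.abc.example.org', 'gts': 'gts.abc.example.org'}
```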
The caGrid Knowledge Center has already created a caGrid 1.3 virtual machine that you can use if its configuration is appropriate for your environment. It is suitable for VMware ESX, which was used as the host for each service. The VM was configured with a Grid user account and the required software listed in caGrid 1.3 Host Configuration.
One service was deployed per VM on the caGrid 1.3 Training Grid.
If there is a firewall running on the VM (recommended), be sure to configure the firewall to allow connections on the port for the service(s) that will be running on the VM.
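After opening the port, it is worth confirming from a different machine (so that the VM's firewall is actually in the path) that the port accepts TCP connections. The sketch below uses only Python's standard `socket` module. A successful connection shows only that the port is open, not that the service behind it is healthy.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, or name lookup failed.
        return False
```

For example, `port_is_reachable("dorian.abc.example.org", 8443)` checks one service; the host name and port there are hypothetical, so substitute the values from your own layout table.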
Details such as how to configure the VMs and network that support a grid are beyond the scope of this document. However, there are some general points that you should consider in your planning to help ensure the availability of your grid and the services that it supports.
- Use network monitoring software to send alerts to the appropriate support people when a service becomes unavailable. The sooner the right people know that a service is down, the sooner they will be able to fix it. The caGrid Knowledge Center does not endorse any particular network monitoring software.
- You should have plans for the effective backup and restore of the databases used by the grid services. These services keep their most frequently changing data in their databases. Daily backup for these databases is recommended.
We recommend that GME and security-related services be configured with a dedicated database that runs on the same host VM as the service. Core caGrid services that have their own MySQL database include:
- You should have plans for the effective backup and restore of the file systems used by the grid services. The core services do not keep frequently changing data as files, so weekly or monthly backups may be appropriate.
- You should have plans for creating and managing snapshots of VMs. These may be needed to restore the configuration of a VM if the VM becomes corrupted. Since the configuration of VMs does not change often, new snapshots are not needed often.
- You will need a way to manage passwords. As you configure the caGrid core services, you will create a variety of passwords. The caGrid Knowledge Center uses an open source password vault program called KeePass for this purpose.
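As one way to act on the daily database backup recommendation, the sketch below builds a `mysqldump` command that writes one date-stamped file per database; a scheduler such as cron could run it once a day. The database name, backup directory, and option-file path in the example are hypothetical. Credentials are read from a MySQL option file so that the password does not appear on the command line, and `--single-transaction` gives a consistent snapshot only for InnoDB tables.

```python
import datetime
import pathlib


def mysqldump_command(database: str, backup_dir: str, options_file: str,
                      day: datetime.date) -> list[str]:
    """Build a mysqldump command line that writes <database>-<YYYY-MM-DD>.sql."""
    out_file = pathlib.Path(backup_dir) / f"{database}-{day.isoformat()}.sql"
    return [
        "mysqldump",
        # Must come first; reads user and password from an option file
        # so the password is not visible in the process list.
        f"--defaults-extra-file={options_file}",
        # Consistent snapshot without long table locks (InnoDB tables only).
        "--single-transaction",
        f"--result-file={out_file}",
        database,
    ]


print(mysqldump_command("dorian", "/var/backups/mysql",
                        "/root/backup.cnf", datetime.date(2024, 1, 15)))
```

To actually take the backup, pass the list to `subprocess.run(..., check=True)` so that a failed dump raises an error your scheduler or monitoring can report.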
The Dorian and GTS services are essential to the functioning of other services on a grid. These two services are needed for the other services to know who and what to trust. In order for Dorian and GTS to trust each other, they must share certificates issued by a certificate authority (CA) that does not rely on either of them. This is the trust fabric CA.
The trust fabric CA exists solely to support GTS and Dorian, but it does not rely on either of them in any way. The trust fabric CA is a set of Ant scripts, rather than a grid service.
Create the trust fabric CA. Then you will be ready to begin configuring caGrid services.
The next step is to modify the caGrid installer to know how to configure caGrid for your new grid.
Make sure that you use the new installer on the same GTS host on which you created the trust fabric CA, so that it targets the correct grid.
One of the steps in setting up each service is editing the file etc/serviceMetadata.xml. In this file you add information about the organization that hosts the service and the points of contact for questions or problems with the service instance. The contents of this file are published to the grid's index service and so will be accessible to anyone using the grid.
setMetadata.pl.zip contains a Perl script you can use to populate the service metadata for caGrid services. Instructions are provided at the top of the script. It was tested on Mac and Linux.
When you are setting up a new grid, there will usually be a number of services that are hosted by the same organization and have the same people as points of contact. Since all of this information appears together towards the end of the serviceMetadata.xml file, after you have entered the information for one service, you may find it most convenient to copy and paste it into the other services' serviceMetadata.xml files.
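If you would rather script this copy than paste by hand, the sketch below replaces one section of a target serviceMetadata.xml with the same section from a file you have already filled in. The section is matched by element name, ignoring namespaces. Which element holds the organization and contact information in your files is something to check yourself; the name in the usage note is hypothetical. ElementTree drops XML comments and may rewrite namespace prefixes (for example to `ns0`) when it serializes, so compare the result with the original before using it.

```python
import copy
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    # ElementTree writes namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def copy_metadata_section(source_xml: str, target_xml: str, section: str) -> str:
    """Replace the first `section` element in target_xml with a copy from source_xml."""
    src_root = ET.fromstring(source_xml)
    dst_root = ET.fromstring(target_xml)
    src = next((e for e in src_root.iter() if _local(e.tag) == section), None)
    if src is None:
        raise ValueError(f"<{section}> not found in source")
    for parent in dst_root.iter():
        for index, child in enumerate(list(parent)):
            if _local(child.tag) == section:
                # Swap in a deep copy at the same position, then stop.
                parent.remove(child)
                parent.insert(index, copy.deepcopy(src))
                return ET.tostring(dst_root, encoding="unicode")
    raise ValueError(f"<{section}> not found in target")
```

For example, `copy_metadata_section(done_text, new_text, "hostingResearchCenter")` copies a hypothetical `hostingResearchCenter` section from the text of a completed file into the text of a new one.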
In this step, you will install the master GTS service. You do this before installing Dorian because the installation is easier if GTS is already running. However, you cannot finish configuring GTS until Dorian is running, because you need appropriate user credentials from Dorian to administer GTS. The details for installing the master GTS are in the Master GTS Installation Guide.
Begin this step by installing Dorian. The instructions for this are in the Dorian Installation Guide.
If your planned grid configuration includes a slave GTS service, then now is the time to install the Slave GTS service.
Before you used gaards-ui to configure the GTS service(s) to trust the Dorian certificate authority, you were instructed to temporarily add the Dorian certificate authority to the ExcludeCAs element in the GTS container's etc/cagrid_SyncGTS/sync-description.xml file. Now remove these entries, as they should no longer be needed.
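If you prefer to make this edit with a script, the sketch below removes entries with a given CA subject DN from every ExcludeCAs element. It assumes that each excluded CA is a child element whose text is the CA's subject DN; check your sync-description.xml for the actual layout. Keep a copy of the original file, since ElementTree drops comments and may rename namespace prefixes when the file is rewritten.

```python
import xml.etree.ElementTree as ET


def _local(tag: str) -> str:
    # Drop any "{namespace}" prefix from an ElementTree tag.
    return tag.rsplit("}", 1)[-1]


def remove_excluded_ca(xml_text: str, ca_dn: str,
                       container: str = "ExcludeCAs") -> tuple[str, int]:
    """Remove children of every `container` element whose text equals ca_dn."""
    root = ET.fromstring(xml_text)
    # Collect the containers first so the tree is not changed mid-iteration.
    containers = [e for e in root.iter() if _local(e.tag) == container]
    removed = 0
    for parent in containers:
        for child in list(parent):
            # Assumption: each excluded CA is one child whose text is its subject DN.
            if (child.text or "").strip() == ca_dn:
                parent.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed
```

Before writing the returned text back to the file, check that the returned count matches the number of entries you added earlier.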
The following services should be installed as needed for your specific Grid applications. If you are installing Grid applications, please refer to your application documentation to determine which optional services you need to install. The list of optional services that add functionality to the Grid include the following:
- Credential Delegation Service (recommended if installing WebSSO)
- Federated Query Processor
- Grid Grouper
- Index Service (recommended for service discovery capabilities)
- Authentication Service
- Global Model Exchange
- Metadata Model Service
- Taverna Workflow
- BPEL Workflow
All services should be configured to automatically start when their host VM boots.