
Identifiers


Prefix Authority



Deployment Planning

Prior to deployment, the identifier prefix must be established. This involves determining the URL endpoints of the naming authority and the prefix authority, as well as the PURL top-level domain that maps to the naming authority.

It is particularly important to choose appropriate domain names for the prefix authority (PURL server) and the PURL top-level domain, because together these two components make up the identifier prefix, which is not expected to change after deployment. Identifiers are, by definition, permanent URIs.

The naming authority URL can change at any time, because it is hidden behind the prefix authority. When such a change occurs, the mapping from the PURL domain to the naming authority is updated to point to the new endpoint.

The rest of this guide assumes the following domains:

Component                                   Value
Prefix Authority (PURL Server) End Point    http://identifiers-pa.nci.nih.gov
Naming Authority ID (PURL Domain)           production
Naming Authority Web App End Point          https://identifiers-na.nci.nih.gov/namingauthority/NamingAuthorityService
Naming Authority Grid Service End Point     https://identifiers-na.nci.nih.gov:8443/wsrf/services/cagrid/IdentifiersNAService

With the settings above, the identifier prefix becomes:

http://identifiers-pa.nci.nih.gov/production/

For example, when an identifier such as

http://identifiers-pa.nci.nih.gov/production/7e82e853-c972-4d63-a891-cbe0260316c2

is "followed" (resolved), the prefix authority (PURL) redirects the client to

https://identifiers-na.nci.nih.gov/namingauthority/NamingAuthorityService/7e82e853-c972-4d63-a891-cbe0260316c2

for resolution services.
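The redirect above amounts to replacing the identifier prefix with the naming authority end point while keeping the remainder of the identifier unchanged. The following shell function is a minimal illustrative model of that mapping, assuming the prefix and web app end point from the table above. The name resolve_target is hypothetical; PURLZ performs the actual redirect itself.

```shell
#!/bin/sh
# Illustrative model (not PURLZ code) of the prefix-to-naming-authority mapping.
# Assumed values, taken from the deployment plan above.
PREFIX="http://identifiers-pa.nci.nih.gov/production/"
TARGET="https://identifiers-na.nci.nih.gov/namingauthority/NamingAuthorityService/"

# resolve_target IDENTIFIER
# Prints the URL the client is redirected to, or returns 1 if the identifier
# does not start with the configured prefix.
resolve_target() {
    id="$1"
    case "$id" in
        "$PREFIX"*)
            # Strip the literal prefix and append the remainder (e.g. a UUID).
            printf '%s%s\n' "$TARGET" "${id#"$PREFIX"}"
            ;;
        *)
            # Not under this PURL domain: no mapping applies.
            return 1
            ;;
    esac
}

# Example:
#   resolve_target http://identifiers-pa.nci.nih.gov/production/7e82e853-c972-4d63-a891-cbe0260316c2
```

Because only the prefix is rewritten, moving the naming authority means changing TARGET (in PURLZ, the PURL's target URL) while every issued identifier stays valid.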

Prefix Authority Deployment


PURLZ is the official PURL server by OCLC. It provides a level of indirection that allows the underlying web addresses of resources to change over time without negatively affecting systems that depend on them. This capability provides continuity of references to network resources that may migrate from machine to machine.

caGrid's identifiers framework leverages PURLZ as its prefix authority.

Installation

Download PURLZ: http://persistenturls.googlecode.com/files/PURLZ-Server-1.6.2.jar

Double-click the jar file, or run "java -jar PURLZ-Server-1.6.2.jar" from a terminal window, to start the installer.

Click Next.

Accept the terms of the license agreement and click Next.

Specify an installation path and click Next.

Enter the host name and port number. Then click Next.

Choose "Use MySQL" and click Next.

Enter MySQL connectivity parameters and click Next.

In controlled environments such as production, it is recommended that a PURL administrator be designated to approve user- and top-level domain registrations. Click Next.

Accept defaults and click Next.

The installer proceeds to complete the installation. Click Next twice and then Done.

Configuration

Server Name

The host name identifiers-pa.nci.nih.gov must be added to the server configuration before it can be used. Otherwise, the web interface and redirection services only work when localhost is used in the URL.

Open /Applications/PURL-Server-1.6.1/modules/mod-purl-virtualhost/module.xml (adjusted to the installation path chosen above) for editing, and add the host name after localhost as follows (note the ".*" after the host name):

  <export>
    <!--
    ***********
    Export all of host address space - note could export multiple hosts here.
    (Note have added localhost so you can test it)
    ***********
    -->
    <uri>
      <match>jetty://localhost.*</match>
      <match>jetty://identifiers-pa.nci.nih.gov.*</match>
      <!-- Add any other jetty://<servername> matches that you want
           to match. -->
      <match>ffcpl:/etc/HTTPBridgeConfig.xml</match>
    </uri>
  </export>
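After editing, the change can be double-checked from a terminal. The sketch below assumes a POSIX shell with grep; has_host_match is a hypothetical helper that searches module.xml for the literal <match> line shown above (including the trailing ".*").

```shell
#!/bin/sh
# has_host_match FILE HOST
# Hypothetical helper: succeeds if FILE contains the jetty:// <match> entry
# for HOST exactly as written above. -F makes grep treat the ".*" literally.
has_host_match() {
    file="$1"
    host="$2"
    grep -F -q "<match>jetty://$host.*</match>" "$file"
}

# Example (the path depends on the installation directory chosen earlier):
#   has_host_match /Applications/PURL-Server-1.6.1/modules/mod-purl-virtualhost/module.xml \
#       identifiers-pa.nci.nih.gov && echo "host name configured"
```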
Running on Port 80

In the installation wizard above, port 8080 was entered along with the desired host name. In most setups this is not desirable, because the port number would then have to be part of every identifier.

The difficulty is that PURLZ appears to lack support for running on ports that operating systems such as Linux consider privileged (ports below 1024, e.g., 80). Even when the server is started as root, which is undesirable in itself, it exhibits other undocumented run-time issues.

PURLZ uses an embedded Jetty server, and the Jetty documentation describes a solution that sets a runtime user ID after the port has been bound. This would allow the server to be started as root (so that it can bind to port 80), after which Jetty would switch to the specified runtime user ID. The drawback of this solution is that it requires rebuilding Jetty from source, which, again, is undesirable.

Therefore, the recommended approach in this guide is to let PURLZ run on port 8080 (or another non-privileged port) and configure the operating system to redirect port 80 to the PURLZ port. The following configuration has been tested to work on CentOS Linux.

 1.- Create the file /etc/xinetd.d/http with the following contents:

       service http
       {
           disable         = no
           flags           = REUSE
           socket_type     = stream
           wait            = no
           user            = root
           redirect        = 127.0.0.1 8080
           log_on_failure  += USERID
       }

 2.- Restart xinetd:

       $ /etc/init.d/xinetd restart
Startup

The server can be started in the foreground as follows (or with startup.bat if using MS Windows):

 $ cd /Applications/PURLZ-Server-1.6.1/bin
 $ ./start.sh
It may also be convenient to start the server as a daemon when the system starts. The following is a sample init script for CentOS Linux.

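#!/bin/sh
#
# Startup script for PURLZ
#
# chkconfig: - 85 15
# description: PURLZ server
# processname: purlz
# pidfile: /var/run/purlz.pid
##############################################################################
# Load the CentOS init function library (provides the daemon function).
. /etc/init.d/functions

# JAVA_HOME is exported so that start.sh can see it.
export JAVA_HOME=/usr/local/java
PURLZ_HOME=/home/purlz/ext/purlz
PURLZ_LOG=$PURLZ_HOME/log/console.log
PURLZ_USER=purlz
PID_FILE=/var/run/purlz.pid

case "$1" in
  start)
    # Create the console log up front, owned by the PURLZ user, so that
    # the unprivileged server process can append to it.
    touch "$PURLZ_LOG"
    chown "$PURLZ_USER:$PURLZ_USER" "$PURLZ_LOG"
    chmod 644 "$PURLZ_LOG"
    # Run start.sh as the PURLZ user in the background; daemon passes the
    # quoted command string to a shell, so the redirection and & apply there.
    daemon --pidfile="$PID_FILE" --user="$PURLZ_USER" \
        "$PURLZ_HOME/bin/start.sh >> $PURLZ_LOG 2>&1 &"
    RETVAL=$?
    # Record the JVM's PID for "stop". Assumes the purlz account runs no
    # other java process; the delay gives start.sh time to launch the JVM.
    sleep 5
    pgrep -u "$PURLZ_USER" java > "$PID_FILE"
    exit $RETVAL
    ;;
  stop)
    # Stop the process recorded at startup and remove the PID file.
    kill $(cat "$PID_FILE") && rm -f "$PID_FILE"
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac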

Once the server is started, verify that it is running by pointing your browser to http://identifiers-pa.nci.nih.gov. The PURLZ home page should be displayed.
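The same check can be run from a terminal, which is convenient on a headless server. The sketch below assumes curl is installed; http_status is a hypothetical helper that prints the status code from the first line of the response headers, so a 2xx (or redirect) code indicates that the server is answering on port 80.

```shell
#!/bin/sh
# http_status
# Hypothetical helper: reads HTTP response headers on stdin and prints the
# numeric status code from the status line (e.g. "HTTP/1.1 200 OK" -> 200).
http_status() {
    # Keep only the status line, drop the CR of the CRLF ending, print field 2.
    head -n 1 | tr -d '\r' | awk '{ print $2 }'
}

# Example, assuming curl is installed:
#   curl -sI http://identifiers-pa.nci.nih.gov/ | http_status
```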

Log on to the server as 'admin' with password 'password', and then change the password.

Top-Level Domain Creation

A PURL domain is needed to identify the target naming authority. The domain binds the identifier prefix to the naming authority. Consequently, a single prefix authority (PURL server) can act as the authority for multiple naming authorities by defining one domain for each.

Following the deployment plan above creates this mapping:

     production => http://identifiers-na.nci.nih.gov/namingauthority/NamingAuthorityService

Here, production is the PURL domain, and the mapping itself is a partial-redirect PURL. The domain must be created before any PURL can be placed in it.

Log in as 'admin' and click the Domains tab. Choose "Create a new domain" from the drop-down menu on the left, and enter the information for the production domain. Click Submit to create the domain.

Now create a PURL that will redirect resolution of our identifiers to their corresponding naming authority host.

Click the PURLs tab. Choose "Create an advanced PURL" from the drop-down menu on the left, and enter the information for a partial-redirect PURL in the production domain. Note that the full Target URL is "http://identifiers-na.nci.nih.gov/namingauthority/NamingAuthorityService". Click Submit to create the PURL.
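Once the PURL exists, resolution can be tested without a browser by requesting an identifier and inspecting the Location header of the redirect. The sketch below assumes curl is installed; location_header is a hypothetical helper, and the identifier in the example comment is the sample identifier from the deployment plan. The printed URL should be the Target URL followed by the rest of the identifier.

```shell
#!/bin/sh
# location_header
# Hypothetical helper: reads HTTP response headers on stdin and prints the
# value of the first Location header (header names are case-insensitive).
location_header() {
    tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Example, assuming curl is installed and the PURL above has been created:
#   curl -sI http://identifiers-pa.nci.nih.gov/production/7e82e853-c972-4d63-a891-cbe0260316c2 \
#       | location_header
```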

Last edited by
Sarah Honacki (582 days ago) , ...