Phenotype Database installation guide, GSCF

21-08-2012

This post will help you install (parts of) the Phenotype Database project [1]. We will be using a CentOS installation [2]. CentOS is built from the source code of the commercial Red Hat Enterprise Linux project [3]. The project's main component already has an installation guide, written for a Debian GNU/Linux installation. That guide can be found in the repository [4].

The Phenotype Foundation's software project consists of multiple components, which are called modules. These modules generally have the same requirements. We will start by looking at the main module, called the Generic Study Capture Framework (GSCF). The commands provided here assume that the account you will be using for the installation has the required permissions.

Java

The project uses the Java Virtual Machine. At the moment, there is no dependency on a particular vendor's Java implementation. If no Java runtime has been installed yet, we will have to install one. We can test whether we have one by running the following command:

java -version

If this prints a version, we have a Java runtime. We need Java 1.6 or later. If you have an older version, or no version at all, you will need to install one. You can install the open source implementation OpenJDK or the commercial version by Oracle. We can install the open source version with the following command:

yum install java-1.6.0-openjdk
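If you want to script the "1.6 or later" check instead of reading the output by eye, the following bash functions are a minimal sketch. The function names are hypothetical, and the sketch assumes that `java -version` prints the version in double quotes on its first line, as in `java version "1.6.0_24"`.

```shell
# Hypothetical helper: print the version string from `java -version`, for
# example 1.6.0_24. Java writes this output to standard error, hence 2>&1.
java_version_string() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if a version string means Java 1.6 or later.
# Handles the old scheme (1.6.0_24, 1.8.0_292) and the newer one (11.0.2).
java_at_least_16() {
  local ver="$1" major minor
  major=${ver%%.*}           # "1" from 1.6.0_24, "11" from 11.0.2
  minor=${ver#*.}            # "6.0_24" (whole string if there is no dot)
  minor=${minor%%.*}         # "6"
  major=${major%%[!0-9]*}    # drop suffixes such as "-ea"
  minor=${minor%%[!0-9]*}
  [ -n "$major" ] && [ -n "$minor" ] || return 1   # no recognizable version
  if [ "$major" -gt 1 ]; then
    return 0
  fi
  [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]
}

# Example:
#   java_at_least_16 "$(java_version_string)" && echo "Java is recent enough"
```

The version parsing is deliberately kept separate from the comparison, so the comparison can be checked without Java being installed.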

PostgreSQL

The database we will be using is PostgreSQL. We can install this with the following command:

yum install postgresql-server

To see if the postgresql service is running, we do:

service postgresql status

You may get a negative answer, as I did, in which case we can issue the following command to start the service:

service postgresql start

Either way, since we are configuring a server, we will want this service to start when we start up the server. In my case, the installer did not configure this for me. You can easily test this out for yourself, either by restarting your server, logging in and then checking if the service has been started, or by looking at the output of

chkconfig --list postgresql

and confirming that runlevels 2, 3 and 4 are on.
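Checking those runlevels can also be done from a script. This is a minimal sketch that assumes the usual `chkconfig --list` output of one line per service with entries such as `3:on`; the function name runlevels_on is hypothetical.

```shell
# Hypothetical helper: succeed if every given runlevel is "on" in one line
# of `chkconfig --list` output, e.g.
#   postgresql  0:off 1:off 2:on 3:on 4:on 5:on 6:off
runlevels_on() {
  local line="$1" level
  shift
  for level in "$@"; do
    case " $line " in
      *[[:space:]]"$level:on"[[:space:]]*) ;;   # this runlevel is on
      *) return 1 ;;                             # off or not listed
    esac
  done
  return 0
}

# Example:
#   runlevels_on "$(chkconfig --list postgresql)" 2 3 4 && echo "starts at boot"
```

The same check works for any other service, such as tomcat7 or httpd later on.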
This is one way to set the service to start at boot:

chkconfig --add postgresql; chkconfig --level 234 postgresql on

If you were to reboot after running this command, you would likely be prompted by a 'setup agent' - you can safely ignore this and allow booting to resume by choosing the 'exit' option.
To configure postgres, we will switch to the postgres user account:

su - postgres

We will start the PostgreSQL interactive terminal:

psql

We will enter some commands into this terminal. If the terminal responds to you with text that starts with "ERROR:", then, yes, something is going wrong. Make sure to use the correct " and ' characters, as shown in the following command examples.
We will create a postgres user called 'mydbuser' with password 'mydbpassword':

create user mydbuser password 'mydbpassword';

Of course, you should replace these authentication details with values that seem sensible to you.
Now we will create a database:

create database "mytestdb";

Finally, we will tell postgres that our new user has all privileges for the new database, and that our new user owns that database:

grant all privileges on database mytestdb to mydbuser; alter database mytestdb owner to mydbuser;

We are done with the psql program. We will exit psql:

\q

Now we will log out of the postgres account and go back to our own account:

exit
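If you prefer not to type the statements from the psql step by hand, they can also be generated and piped into psql. The bash function below is a minimal sketch; its name is hypothetical, it assumes plain lower-case user and database names, and it assumes the password contains no backslashes (older PostgreSQL versions treat a backslash in a string literal as an escape character). Single quotes in the password are doubled, which is how SQL writes a quote inside a string literal.

```shell
# Hypothetical helper: print the SQL statements from the psql step
# for a given database user, password and database name.
gscf_db_sql() {
  local user="$1" pass="$2" db="$3" q="'"
  # Inside an SQL string literal a single quote is written as two quotes.
  pass=${pass//$q/$q$q}
  printf "create user %s password '%s';\n" "$user" "$pass"
  printf 'create database "%s";\n' "$db"
  printf 'grant all privileges on database "%s" to %s;\n' "$db" "$user"
  printf 'alter database "%s" owner to %s;\n' "$db" "$user"
}
```

Run as root, something like `gscf_db_sql mydbuser mydbpassword mytestdb | su - postgres -c psql` would feed the statements to psql without an interactive session.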

We will change some PostgreSQL settings, to make our install more secure. The file we will edit is /var/lib/pgsql/data/pg_hba.conf. We issue the following command:

nano /var/lib/pgsql/data/pg_hba.conf

Now we will scroll to the bottom. We should find something like this:

# TYPE DATABASE USER CIDR-ADDRESS METHOD

# "local" is for Unix domain socket connections only
local all all ident sameuser
# IPv4 local connections:
host all all 127.0.0.1/32 ident sameuser
# IPv6 local connections:
host all all ::1/128 ident sameuser

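We will change the last two entries (the IPv4 and IPv6 host lines) so that they use password authentication instead of ident sameuser. GSCF connects to the database over TCP with the username and password from its configuration file, whereas ident authentication would instead check the name of the operating system user. We end up with the following: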

# TYPE DATABASE USER CIDR-ADDRESS METHOD
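Because GSCF logs in to PostgreSQL with a username and password over TCP, the IPv4 and IPv6 host entries for localhost need a password-based method. The sketch below assumes md5 as that method (check the client authentication documentation for your PostgreSQL version) and assumes the entries look like the defaults shown above; the function name is hypothetical. It leaves the local entry on ident sameuser, so su - postgres followed by psql keeps working.

```shell
# Hypothetical helper: rewrite the IPv4 and IPv6 "host all all" entries for
# localhost to use md5 password authentication. Reads the files given as
# arguments (or standard input) and writes the result to standard output.
pg_hba_use_md5() {
  local s='[[:space:]][[:space:]]*'   # one or more spaces/tabs (portable BRE)
  sed -e "s#^\(host${s}all${s}all${s}127\.0\.0\.1/32${s}\).*#\1md5#" \
      -e "s#^\(host${s}all${s}all${s}::1/128${s}\).*#\1md5#" "$@"
}
```

For example, copy /var/lib/pgsql/data/pg_hba.conf to a backup location first, then run `pg_hba_use_md5 /root/pg_hba.conf.bak > /var/lib/pgsql/data/pg_hba.conf`; writing into the existing file keeps its ownership.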

To make sure that our postgresql service is aware of these changes, we restart it:

service postgresql restart

Tomcat

GSCF is distributed as a WAR file (web application archive), a single archive that bundles all of the application's files. We need a program that can serve the contents of such an archive, a so-called servlet container. We will be using the Apache Software Foundation's Tomcat, version 7.

yum install tomcat7

This version may not be in the repositories that your CentOS version uses. In that case you will have to install it manually. In this guide we will assume that your OS does in fact have tomcat7 in one of its repositories. Either way, don't forget to check whether the service starts along with the server, as we did with the postgresql service, and make sure it does.

Installing our application

First, we will stop the tomcat service from running. If you have a proper install, you will probably use the following command to do so:

service tomcat7 stop

Depending on how exactly you installed tomcat, this init script may not exist yet, or it may be named without the version number (tomcat instead of tomcat7). We will place the GSCF WAR file (which can be downloaded from GitHub) in tomcat's webapps directory. This directory is probably located at /var/lib/tomcat7/webapps. Confirm the location of the webapps directory on your system, then copy the WAR file to that location:

cp gscf-www.war /var/lib/tomcat7/webapps/gscf.war
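Tomcat derives the application's context path from the WAR file name, which is why the file is copied as gscf.war: it will be served under /gscf. The function below is a simplified sketch of that mapping, handy when you are unsure at which URL a given WAR file will end up. Its name is hypothetical, and it only handles plain names and ROOT.war, not Tomcat 7's `#` (nested path) or `##` (version) naming conventions.

```shell
# Hypothetical helper: print the URL path Tomcat will use for a WAR file
# (simplified: ignores Tomcat's "#" and "##" naming conventions).
war_context_path() {
  local name
  name=$(basename "$1" .war)
  if [ "$name" = "ROOT" ]; then
    echo "/"            # ROOT.war is served at the server root
  else
    echo "/$name"       # e.g. gscf.war -> /gscf
  fi
}
```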

GSCF Configuration file

GSCF requires a configuration file. The following is an example of what its contents could look like:

# server URL

grails.serverURL=http://test.dbxp.org

# DATABASE
dataSource.driverClassName=org.postgresql.Driver
dataSource.dialect=org.hibernate.dialect.PostgreSQLDialect
dataSource.url=jdbc:postgresql://localhost:5432/mytestdb
dataSource.dbCreate=update
dataSource.username=mydbuser
dataSource.password=mydbpassword
#dataSource.logSql=false

# SpringSecurity E-Mail Settings
grails.plugins.springsecurity.ui.forgotPassword.emailFrom=gscfproject@gmail.com

# module configuration
#modules.sam.url=http://sam.test.dbxp.org
#modules.metabolomics.url=http://metabolomics.test.dbxp....
#modules.metagenomics.url=http://metagenomics.test.dbxp....

# default application users
authentication.users.admin.username=admin
authentication.users.admin.password=admiN123!
authentication.users.admin.email=admin@dbnp.org
authentication.users.admin.administrator=true
authentication.users.user.username=user
authentication.users.user.password=useR123!
authentication.users.user.email=user@dbnp.org
authentication.users.user.administrator=false

# override application title
application.title=Phenotype Database

# use shibboleth authentication?
authentication.shibboleth=false

You will have to modify the contents of the file so that they correspond to your setup: at least the server URL, and the database connection details if they differ from the ones used above. The use of the modules is optional. The file has to be placed in a .gscf directory in the home directory of the user under which the tomcat process runs (e.g. /home/tomcat7/.gscf).
As you can see, we have set the grails.serverURL property to http://test.dbxp.org. We will be using this address to access our application through the Apache webserver. If you wish to try the application locally or without Apache, set this property to the server's IP address with port number 8080, followed by a slash and the application name, for example http://192.168.0.100:8080/gscf. The application name follows from the name of the WAR file: a file deployed as gscf-0.9.0.war would give http://192.168.0.100:8080/gscf-0.9.0. If you are unsure what the application name is, the tomcat log will tell you when the application is deployed (see "Starting our application"). The locations of additional modules have been commented out in this example. This is not a problem, as modules can be added at any time through GSCF. Make sure the right database details are set. Change the authentication details listed under "default application users" to whatever authentication details you wish to use. New users can be added at any time through GSCF.
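Since at least the server URL and the database settings have to be right, a quick check that those keys are present (and not commented out) can save a failed deployment. This is a minimal sketch; the function name and the list of keys are assumptions based on the example file, so pass it the path of your own configuration file.

```shell
# Hypothetical helper: report which required GSCF keys are missing from a
# properties file. Commented-out lines (starting with #) do not count.
check_gscf_config() {
  local file="$1" key status=0
  for key in grails.serverURL dataSource.url dataSource.username dataSource.password; do
    if ! awk -F= -v k="$key" '
        /^[ \t]*#/ { next }                             # skip comments
        { name = $1; gsub(/^[ \t]+|[ \t]+$/, "", name) }  # trim the key
        name == k { found = 1 }
        END { exit !found }' "$file"; then
      echo "missing: $key"
      status=1
    fi
  done
  return "$status"
}
```

It only checks that the keys exist; whether the values are correct for your setup still has to be verified by hand.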

Tomcat's permissions

Your tomcat application should have been set up such that it is started by the tomcat user. We need to make sure that the tomcat user has all the permissions that it needs to. One way to do that could be as follows:

cd /usr/share/tomcat7/

chown tomcat:tomcat . -R

cd ./webapps

chmod gu+rx *.war -R

This particular chown command makes all files in and under this directory owned by user tomcat and associated with the tomcat group. This particular chmod command makes all .war files in its directory (the webapps directory) readable and executable by the file's owner and by members of the file's group. Remember that new .war files should be made readable and executable for the tomcat user and tomcat group too.
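To spot WAR files that were added later without the right permissions, you could list the ones that are not readable by both their owner and their group. A sketch, assuming GNU or BSD find; it checks only the permission bits, not whether the owner is actually tomcat, and the function name is hypothetical.

```shell
# Hypothetical helper: list *.war files in a directory that lack the
# owner-read or group-read permission bit (mode bits 0440).
unreadable_wars() {
  # -perm -0440 matches files that have at least these bits set;
  # negating it lists the files that are missing one of them.
  find "$1" -maxdepth 1 -name '*.war' ! -perm -0440 -print
}

# Example:
#   unreadable_wars /var/lib/tomcat7/webapps
```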

Starting our application

The log files for tomcat are probably located in the folder /usr/share/tomcat7/logs. We will open two sessions to our server: one to start tomcat and one to look at its main log. One way to look at a log file and keep following changes to it is to use the tail program with the -f option.

tail /usr/share/tomcat7/logs/catalina.out -n 500 -f

This command prints the last 500 lines of the catalina.out file and then keeps printing new lines as they are written. We will be watching this output to see if our application starts properly. In a second session, we start tomcat (for example with service tomcat7 start). At some point, an entry like the following should appear:

INFO: Deploying web application directory /usr/share/tomcat7/webapps/gscf

Finally, the log should say something like this:

INFO: Server startup in 62939 ms
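Rather than watching the log by eye, you could also wait for the startup line from a script. This is a minimal sketch; the function name is hypothetical, and it assumes the startup message contains the text "Server startup in" as in the example above. Because catalina.out keeps older startups, it can skip the lines that were already in the log before tomcat was started.

```shell
# Hypothetical helper: wait up to TIMEOUT seconds until FILE contains a line
# with the fixed string PATTERN, ignoring the first SKIP lines of the file.
wait_for_log_line() {
  local file="$1" pattern="$2" timeout="$3" skip="${4:-0}" waited=0
  while :; do
    if awk -v s="$skip" -v p="$pattern" '
        NR > s && index($0, p) { found = 1; exit }
        END { exit !found }' "$file" 2>/dev/null; then
      return 0
    fi
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}

# Example (as root):
#   before=$(wc -l < /usr/share/tomcat7/logs/catalina.out)
#   service tomcat7 start
#   wait_for_log_line /usr/share/tomcat7/logs/catalina.out \
#     "Server startup in" 300 "$before" && echo "tomcat is up"
```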

We can test if our application's homepage does indeed load, by browsing to the application's address. It should be something like http://localhost:8080/gscf. If you want to confirm this from the server but don't want to install any browser, you can probably do something like this:

wget http://localhost:8080/gscf --output-document="/dev/null"

This command should display several lines of output. If the application can be found, you will find lines like the following among the output:

--2012-08-21 13:38:24-- http://localhost:8080/gscf/

Connecting to localhost|127.0.0.1|:8080... connected.

HTTP request sent, awaiting response... 200 OK

The 200 OK indicates that the homepage could be loaded just fine.
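If the server first answers with a redirect (for instance to the same address with a trailing slash), wget logs one "HTTP request sent" line per request, and only the last status matters. A minimal sketch that extracts it, assuming wget's English log messages; the function name is hypothetical. Note that wget writes its log to standard error, so it has to be redirected into the pipe.

```shell
# Hypothetical helper: read wget's log from standard input and print the
# status code from the last "HTTP request sent" line.
last_http_status() {
  sed -n 's/.*HTTP request sent, awaiting response\.\.\. \([0-9][0-9][0-9]\).*/\1/p' | tail -n 1
}

# Example:
#   wget http://localhost:8080/gscf --output-document="/dev/null" 2>&1 | last_http_status
```

An empty result means no HTTP response was received at all, for example because tomcat is not listening yet.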

Setting up access to our application

We probably want our application to be accessible from outside our local network, at a specific URL. We have already set the DNS record for this URL to point to our server. We will be setting up the Apache webserver (httpd) for this. First, we will install it:

yum install httpd

Again, we should now make sure that the service is started on boot.

We will make sure httpd is not running:

service httpd stop

You may get an error message that says FAILED; this is fine. It just means that the service wasn't running yet.
We will configure the Apache webserver to load the modules we need. The installation directory is assumed to be /etc/httpd/. First, we will list the rewrite and proxy modules that are available:

ls /etc/httpd/modules | grep -e "_rewrite" -e "_proxy"

We will now check if these are listed somewhere at the top of the httpd.conf file. This file can be opened by issuing this command:

nano /etc/httpd/conf/httpd.conf

After pressing the "Page Down" key a few times, the "Dynamic Shared Object (DSO) Support" section should scroll into view. We should check whether the previously listed files are loaded here. If not, we should add them in the same way as the other entries; for example, the file located at /etc/httpd/modules/mod_rewrite.so should be listed as follows:

LoadModule rewrite_module modules/mod_rewrite.so

The pattern for these entries is as follows, where the name may itself contain underscores (as in mod_proxy_http.so):

LoadModule NAME_module modules/mod_NAME.so

LoadModule EXTRA_LONG_NAME_module modules/mod_EXTRA_LONG_NAME.so
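Following that pattern, the LoadModule line for a module file can be generated, and compared against httpd.conf, with a couple of small functions. This is a sketch under the assumption that the module follows the mod_NAME.so naming scheme (the rewrite and proxy modules do, but not every module necessarily does); the function names are hypothetical, and the comparison is an exact whole-line match, so an entry with different spacing would also be reported.

```shell
# Hypothetical helper: build the LoadModule line for a module file
# (mod_NAME.so -> NAME_module).
loadmodule_line() {
  local file name
  file=$(basename "$1")   # e.g. mod_proxy_http.so
  name=${file#mod_}       # proxy_http.so
  name=${name%.so}        # proxy_http
  printf 'LoadModule %s_module modules/%s\n' "$name" "$file"
}

# Hypothetical helper: print the LoadModule lines that do not yet appear
# as an exact, uncommented line in the given httpd.conf.
report_missing_modules() {
  local conf="$1" so line
  shift
  for so in "$@"; do
    line=$(loadmodule_line "$so")
    grep -qxF -- "$line" "$conf" || echo "$line"
  done
}

# Example:
#   report_missing_modules /etc/httpd/conf/httpd.conf \
#     /etc/httpd/modules/mod_rewrite.so /etc/httpd/modules/mod_proxy*.so
```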

Now that Apache knows to load the modules we want, we will configure Apache to serve up our web application.

We will be using the address test.dbxp.org, and we will use that address to name our configuration file. We will create this as yet non-existent file in /etc/httpd/conf.d, whose *.conf files are included by the default CentOS httpd.conf, by "touching" it:

touch /etc/httpd/conf.d/test.dbxp.org.conf

The following is an example of what this file could contain. It uses the URLs and directories mentioned earlier, so adjust it to match your own setup.

<VirtualHost *:80>
  ServerName test.dbxp.org
  ServerAlias test.gscf.dbxp.org

  ErrorLog /var/log/httpd/gscf-test-error.log
  CustomLog /var/log/httpd/gscf-test-access.log combined

  RewriteEngine on

  # keep listening for the serveralias, but redirect to
  # servername instead to make sure only one user session
  # is created (tomcat will create one user session per
  # domain which may lead to two (or more) usersessions
  # depending on the number of serveraliases)
  # see gscf ticket #321
  RewriteCond %{HTTP_HOST} ^test.gscf.dbxp.org$ [NC]
  RewriteRule ^(.*)$ http://test.dbxp.org$1 [R=301,L]

  # rewrite the /gscf-a.b.c-environment/ part of the url
  RewriteCond %{HTTP_HOST} ^test.dbxp.org$ [NC]
  RewriteRule ^/gscf/(.*)$ /$1 [L,PT,NC,NE]

  <Location />
    ProxyPass http://localhost:8080/gscf/
    ProxyPassReverse http://localhost:8080/gscf/
  </Location>
</VirtualHost>

Information on load balancing GSCF properly can be found at the bottom of the INSTALLATION.md file.

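We will keep an eye on the logs while we issue the command to start httpd.
These log files will be located in /var/log/httpd. Right now this directory might be empty, but once httpd has started it will contain files such as the following (the ssl_* files are only created if the mod_ssl package is installed), as well as the gscf-test-error.log and gscf-test-access.log files named in our configuration:

access_log

error_log

ssl_access_log

ssl_error_log

ssl_request_log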

To look at those log files we might use a command like this, using a different session:

tail /var/log/httpd/error_log -n 500 -f

The error_log will be the most interesting one right now, so let's look at that one.

Let's start the webserver:

service httpd start

Your GSCF instance should now be up and running at the URL (and alias URL) you have chosen. Keep an eye on the tomcat and httpd logs; they may help you with troubleshooting.
httpd's error_log probably tells us something like the following:

Apache/2.2.3 (CentOS) configured -- resuming normal operations