CyVerse_logo

Home_Icon Learning Center Home

OSG Tool Integration in DE

Goal

This quickstart contains the complete instructions for integrating OpenScienceGrid tools in CyVerse Discovery Environment. OSG tools are intended for high-throughput processing of many files.


Prerequisites

Downloads, access, and services

In order to complete this tutorial you will need access to the following services/software

Prerequisite Preparation/Notes Link/Download
CyVerse account You will need a CyVerse account to complete this exercise CyVerse User Portal
Atmosphere Access (Optional – for building containers remotely) Atmosphere access is by request only Check or request access: CyVerse User Portal
Installation of Docker Docker on the machine you’re building containers Atmosphere Installation instructions
Installation of Singularity Singularity on the machine you’re building containers Atmosphere Installation instructions

Platform(s)

We will use the following CyVerse platform(s):

Platform Interface Link Platform Documentation Quick Start
Discovery Environment Web/Point-and-click Discovery Environment DE Manual Discovery Environment Guide
Atmosphere Cloud Virtual Machine Atmosphere Atmosphere Manual  

Input and example data

In order to complete this quickstart you will need to have the following inputs prepared

Input File(s) Format Preparation/Notes Example Data
Read1.fastq fastq Make sure it is a fastq file and not fasta file Read1.fastq

Get started

  • Create a Docker image for your tool/software of interest
  • Build the Docker image for your tool/software
  • Test Docker image
  • Submit a pull request to OSG github repo
  • Integrate DE tool using “Add Tools” option in DE

1. Create a Docker image for your tool

This is the first step in the process of making OSG tool integration in DE. The minimum requirements for creating a Docker image include the following dependencies (apart from the dependencies that are needed for your tool of interest)

Operating System Directories Dependencies Files
Debian /work icommands wrapper
Ubuntu 16+ /cvmfs wget upload-files
Centos7+   python3 create-ticket.sh

For this example, we will use fastq-sample.

Let’s first create a Dockerfile using your favorite editor which satisfies the above requirements

$ mkdir fastq-sample-osg && cd fastq-sample-osg

$ wget https://data.cyverse.org/dav-anon/iplant/projects/osg/upload-files

$ wget https://data.cyverse.org/dav-anon/iplant/projects/osg/wrapper
$ vi Dockerfile

FROM ubuntu:xenial
MAINTAINER User Name <[email protected]>

RUN mkdir /cvmfs /work

RUN apt-get update \
    && apt-get install -y lsb curl apt-transport-https python3 python-requests libfuse2 wget gcc make libpcre3-dev libz-dev

# Install fastq-tools
RUN wget http://homes.cs.washington.edu/~dcjones/fastq-tools/fastq-tools-0.8.tar.gz
RUN tar xvf fastq-tools-0.8.tar.gz
WORKDIR fastq-tools-0.8
RUN ./configure
RUN  make install fastq==0.8

WORKDIR /work

# Define the iRODS package
ENV ICMD_BASE="https://files.renci.org/pub/irods/releases/4.1.12/ubuntu14"
ENV ICMD_PKG="irods-icommands-4.1.12-ubuntu14-x86_64.deb"

# Install icommands.
RUN curl -o "$ICMD_PKG" "$ICMD_BASE/$ICMD_PKG" \
        && dpkg -i "$ICMD_PKG" \
        && rm -f "$ICMD_PKG"

# Install the wrapper script and the script to upload the output files.
ADD wrapper /usr/bin/wrapper
ADD upload-files /usr/bin/upload-files

# Make the wrapper script the default command.
CMD ["wrapper"]

Note

The Dockerfile and wrapper files are specific for this example fastq-sample tool.

If you want to create OSG tool for your tool of interest, replace the specific parts of the script

in wrapper script append lines # 117-122 with your job information:

   # Run the job.
def run_job(arguments, output_filename, error_filename):
    with open(output_filename, "w") as out, open(error_filename, "w") as err:
        rc = subprocess.call(["fastq-sample"] + arguments, stdout=out, stderr=err)
        if rc != 0:
            raise Exception("fastq-sample returned exit code {0}".format(rc))

Note

Be sure the upload-files and wrapper scripts are executable. This can be done by adding the following line to your Dockerfile:

# Make scripts executable:
RUN chmod +x /usr/bin/upload-files /usr/bin/wrapper

2. Build and push the Docker image to Dockerhub

Once you create the Dockerfile, next step is to build the Docker image and push it to Dockerhub manually (you can also do an automated build)

$ docker build -t upendradevisetty/fastq-sample-osg:0.8 .

$ docker push upendradevisetty/fastq-sample-osg:0.8

3. Test Docker image

Testing of OSG-fastq-sample docker image can be done in two ways: Locally using Singularity and on Open Science Grid (OSG). Since many users don’t have access to OSG, we recommed that you test it locally.

Important

This is very important step as it is very hard to troubleshoot after you integrate the OSG tool in DE

3.1 Create a folder with input file(s) and output folder on CyVerse Datastore

For this example, the only input file is Read1.fastq. I have this input file in this path on Datastore /iplant/home/upendra_35/fastq-sample-osg/Read1.fastq and output folder in this path on Datastore /iplant/home/upendra_35/fastq-sample-osg/output

3.2 Create input and output path files

The next step is to create an input and output path files that contains the paths to the input and output respectively.

$ cat input-paths.txt
  /iplant/home/upendra_35/fastq-sample-osg/Read1.fastq

$ cat output-paths.txt
  /iplant/home/upendra_35/fastq-sample-osg/output

3.3 Create input and output tickets from input and output paths files

Using create-tickets.sh <https://github.com/upendrak/fastq-sample-osg/blob/master/create-tickets.sh>, create tickets for both inputs and outputs

$ wget https://raw.githubusercontent.com/upendrak/fastq-sample-osg/master/create-tickets.sh

$ mkdir sample_data

$ bash create-tickets.sh -r input-paths.txt > sample_data/input_ticket.list
$ cat sample_data/input_ticket.list
  # application/vnd.de.path-list+csv; version=1
  c0837571b6b7416fb998c5b9b226ea,/iplant/home/upendra_35/fastq-sample-osg/Read1.fastq

$ bash create-tickets.sh -w output-paths.txt > sample_data/output_ticket.list
$ cat sample_data/output_ticket.list
  # application/vnd.de.path-list+csv; version=1
  3fe4ea0dab5241cfb69420335c0902,/iplant/home/upendra_35/fastq-sample-osg/output

3.4 Create a config.json file in the sample_data folder

Here is an example of config.json for the fastq-sample-osg tool

$ vi sample_data/config.json
 {
     "arguments": [
         "-n",
         "10",
   "Read1.fastq"
     ],
     "irods_host": "davos.cyverse.org",
     "irods_port": 1247,
     "irods_job_user": "upendra_35",
     "irods_user_name": "job",
     "irods_zone_name": "",
     "input_ticket_list": "input_ticket.list",
     "output_ticket_list": "output_ticket.list",
     "status_update_url": "https://de.cyverse.org/job/bd1a1b53-9a7e-4031-bf0c-227a0c63f555/status",
     "stdout": "out.txt",
     "stderr": "err.txt"
 }

This is similar to running on the commandline like this..

$ fastq-sample -n 10 Read1.fastq

3.5 Pull the Docker image as singularity file (.sif)

Important

You must have Singularity installed to complete this step.

$ singularity pull docker://upendradevisetty/fastq-sample-osg:0.8

This will create fastq-sample-osg_0.8.sif singularity image in your working directory

3.6 Test the singularity image

Important

Remove your current iRODS password: rm ~/.irods/.irodsA

Running this container will overwrite the ~/.irods/irods_environment.json file on the localhost. To regenerate the previous iRODS environment, rm -rf ~/.irods/irods_environment.json and run iinit again.

Once you have created the input_ticket.list, output_ticket.list and config.json files in the /sample_data directory, you are ready for a test with the Singularity image:

$ cd sample_data

$ singularity exec ../fastq-sample-osg_0.8.sif ../wrapper
  running: configuration successfully loaded
  running: initializing the iRODS connection
  running: downloading the input files
  Enter your current iRODS password:
  running: processing the input files
  running: uploading the output files
  Enter your current iRODS password:
  Enter your current iRODS password:Enter your current iRODS password:

  Enter your current iRODS password:
  Enter your current iRODS password:
  Enter your current iRODS password:Enter your current iRODS password:
  Enter your current iRODS password:Enter your current iRODS password:completed: job completed successfully

Note

It will prompt you to enter your irods passwords several times, if so, then keep pressing the ENTER until the job is successfully finished. The output files will be uploaded to your output folder in datastore.

Once your job has finished, you should expect to see the input (Read1.fastq) and output (sample.fastq) files in the current working directory and also in the in the output directory.

4. Submit a pull request to OSG github repo for fastq-sample-osg tool

Once the Singularity run works, add your Docker image in here. For this particular example, we will add upendradevisetty/fastq-sample-osg:0.8 in there.

Note

You will have to fork and do a PR for this to work

Here is a screenshot of fastq-sample-osg:0.8 pull request to OSG github repo

osg-cvmfs

After the PR is merged, it takes few hours for the image to be available on CVMFS.

5. Integrate DE tool using “Add Tools” option in DE

After the image is available on OSG, it is now ready to be integrated into DE.

5.1 Log-in to CyVerse Discovery Environment and click on the “Apps” window

5.2 Click “Manage Tools” -> “Tools” -> “Add Tool” and fill the details about your Docker image

add_tool_osg

Procede here to create an app interface in the CyVerse Discovery Environment


Additional information, help

Post your question using the intercom button on the bottom right of this page:


Fix or improve this documentation


Home_Icon Learning Center Home