Roslin Configuration

nf-core pipelines sarek, rnaseq, chipseq, mag, differentialabundance and isoseq have all been tested on the University of Edinburgh Eddie HPC with the test profile.

Getting help

There is a Teams group dedicated to Nextflow users: Nextflow Teams. You can also find help at the coding club held each Wednesday: Code Club Teams.

Using the Roslin config profile

To use, run the pipeline with -profile roslin (one hyphen). This will download and launch the roslin.config file which has been pre-configured with a setup suitable for the University of Edinburgh Eddie HPC.

The configuration file supports running nf-core pipelines with Docker containers running under Singularity by default. Conda is not currently supported.

nextflow run nf-core/PIPELINE -profile roslin # ...rest of pipeline flags

Before running the pipeline, you will need to load Nextflow from the module system or activate your Nextflow conda environment. Generally, the most recent version will be the one you want.

To list versions:

module avail | grep nextflow

To load the most recent version (08/08/2024):

module load igmm/bac/nextflow/24.04.2

This config enables Nextflow to manage pipeline jobs via the SGE job scheduler and to use Singularity for software management.

Singularity setup

The roslin profile is set to use /exports/cmvm/eddie/eb/groups/alaw3_eb_singularity_cache as the Singularity cache directory. This directory is made available to Roslin Institute Nextflow/nf-core users by the Roslin Bioinformatics group led by Andy Law. It is writable by all, and all new containers will be cached in it. If you encounter any problem with the Singularity cache, please contact Sébastien Guizard, Donald Dunbar and Andy Law, with the Roslin Bioinformatics group in CC.

By default, Singularity will create a .singularity directory in your $HOME directory on Eddie. Space on $HOME is very limited, so it is a good idea to create a directory somewhere else with more room and link the two locations:

cd $HOME
mkdir /exports/eddie/path/to/my/area/.singularity
ln -s /exports/eddie/path/to/my/area/.singularity .singularity

SGE project setup

By default, users’ jobs are started with the uoe_baseline project, which gives access to free nodes. If you have a project code that gives you access to paid nodes, it can be used for jobs submitted by Nextflow. To do so, you need to set an environment variable called NFX_SGE_PROJECT:

export NFX_SGE_PROJECT="<PROJECT_NAME_HERE>"

If you wish, you can place this variable declaration in the .bashrc file in your home directory so that it is set automatically each time you log on to Eddie.
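
For example (the project code below is a placeholder; use your own):

echo 'export NFX_SGE_PROJECT="my_paid_project"' >> $HOME/.bashrc # replace my_paid_project with your own project code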

NB: This will work only with the roslin profile.

Node exclusion

The roslin profile excludes some specific nodes. The superdome node is reserved for special applications, and access to it must be requested. Eddie’s nodes are being migrated to Rocky Linux 9, and some of them are already online; these are not fully set up yet, and jobs have trouble running on them. Until those nodes are stable, the profile excludes them.
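
The exclusions are implemented through the SGE resource requests included in the clusterOptions of the config below:

-l rl9=false -l h=!node1d01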

Running Nextflow

On a login node

You can use a qlogin to run Nextflow if you request more than the default 2 GB of memory. Unfortunately, you can’t submit the initial Nextflow run process as a job, as you can’t qsub within a qsub. If your Eddie terminal disconnects, your Nextflow job will stop. You can run qlogin in a screen session to prevent this.

Start a new screen session.

screen -S <session_name>

Start an interactive job with qlogin.

qlogin -l h_vmem=8G

You can leave your screen session by typing Ctrl + A, then d.

To list existing screen sessions, use:

screen -ls

To reconnect to an existing screen session, use:

screen -r <session_name>
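
Putting it together, a typical interactive run might look like this (the session name and pipeline are examples; adjust to your needs):

screen -S nf_run
qlogin -l h_vmem=8G
module load igmm/bac/nextflow/24.04.2
nextflow run nf-core/rnaseq -profile roslin,test --outdir results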

On the wild west node

The Wild West node has relaxed restrictions compared to regular nodes, which allows Nextflow to be run directly on it. Access to the Wild West node must be requested from Andy Law and IS. As with the qlogin option, it is advised to run Nextflow within a screen session.
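
For example (the session name and pipeline are illustrative):

screen -S nf_run
module load igmm/bac/nextflow/24.04.2
nextflow run nf-core/sarek -profile roslin,test --outdir results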

Using iGenomes references

A local copy of the iGenomes resource has been made available on the Eddie HPC for those with access to /exports/igmm/eddie/BioinformaticsResources, so you should be able to run the pipeline against any reference available in igenomes.config. You can do this by simply using the --genome <GENOME_ID> parameter.
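
For example (the pipeline and genome are illustrative; any genome key defined in igenomes.config will work):

nextflow run nf-core/sarek -profile roslin --genome GRCh38 --outdir results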

Config file

See config file on GitHub

roslin.config
//Profile config names for nf-core/configs
params {
    config_profile_description = 'University of Edinburgh (Eddie) cluster profile for Roslin Institute provided by nf-core/configs.'
    config_profile_contact = 'Sebastien Guizard (@sguizard) and Donald Dunbar (@ddunbar)'
    config_profile_url = 'https://www.ed.ac.uk/information-services/research-support/research-computing/ecdf/high-performance-computing'
}
 
executor {
    name = "sge"
}
 
process {
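    // Symlink input files into task work directories instead of copying them,
    // run tasks directly in the work directory (no scratch space),
    // and request the sharedmem parallel environment for multi-CPU tasks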
    stageInMode = 'symlink'
    scratch = 'false'
    penv = { task.cpus > 1 ? "sharedmem" : null }
 
    // To date (16/08/2024), the FastQC module is still broken.
    // More details here: https://github.com/nf-core/modules/pull/6156
    // Until the Pull Request is accepted and the new version of the module is integrated into pipelines,
    // we force the amount of memory here.
    withName: 'FASTQC.*' {
        cpus = 5
        memory = 5.GB
        // Check if an environment variable NFX_SGE_PROJECT exists, if yes, use the stored value for -P option
        // Otherwise set the project to uoe_baseline
        if (System.getenv('NFX_SGE_PROJECT')) {
            clusterOptions = {"-l rl9=false -l h=!node1d01 -l h_vmem=10G -pe sharedmem 5 -P $NFX_SGE_PROJECT"}
        } else {
            clusterOptions = {"-l rl9=false -l h=!node1d01 -l h_vmem=10G -pe sharedmem 5 -P uoe_baseline"}
        }
    }
 
    // This withName selector overrides the clusterOptions of all jobs (except FASTQC jobs, cf. above)
    // This is necessary to allow jobs to run on Eddie for many users
    // For each job, we add an extra 8 GB of memory on top of task.memory.
    // For example, if a process asks for 16 GB of RAM (task.memory), the job will reserve 24 GB of RAM.
    // The process will still use 16 GB (task.memory), leaving 8 GB for other system processes.
    // This is very useful for Java programs, which allocate task.memory RAM for their Virtual Machine.
    // It also leaves enough memory for Singularity to unpack images.
    // Note: h_vmem is a per-slot limit under SGE, hence the division by task.cpus.
    withName: '!.*FASTQC.*' {
        // Check if an environment variable NFX_SGE_PROJECT exists, if yes, use the stored value for -P option
        // Otherwise set the project to uoe_baseline
        if (System.getenv('NFX_SGE_PROJECT')) {
            clusterOptions = {"-l rl9=false -l h=!node1d01 -l h_vmem=${(task.memory + 8.GB).bytes/task.cpus} -P $NFX_SGE_PROJECT"}
        } else {
            clusterOptions = {"-l rl9=false -l h=!node1d01 -l h_vmem=${(task.memory + 8.GB).bytes/task.cpus} -P uoe_baseline"}
        }
    }
 
    // common SGE error statuses
    errorStrategy = {task.exitStatus in [143,137,104,134,139,140] ? 'retry' : 'finish'}
    maxErrors = '-1'
    maxRetries = 3
 
    beforeScript =
    """
    . /etc/profile.d/modules.sh
    module load singularity
    export SINGULARITY_TMPDIR="\$TMPDIR"
    """
}
 
params {
    // iGenomes reference base
    igenomes_base = '/exports/igmm/eddie/BioinformaticsResources/igenomes'
}
 
env {
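    // Limit glibc to a single malloc arena; multi-threaded tools otherwise
    // create many arenas, inflating virtual memory and tripping SGE's h_vmem limit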
    MALLOC_ARENA_MAX=1
}
 
singularity {
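    // Pass the TMPDIR variables through to the container environment,
    // run the container in a new PID namespace (-p) and bind-mount the node's $TMPDIR (-B)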
    envWhitelist = "SINGULARITY_TMPDIR,TMPDIR"
    runOptions = '-p -B "$TMPDIR"'
    enabled = true
    autoMounts = true
    cacheDir = '/exports/cmvm/eddie/eb/groups/alaw3_eb_singularity_cache'
}