Using PDSF for ATLAS Work

From Atlas Wiki


If you find that any of these instructions are incomplete or obsolete, please send mail to Zach.


Getting an Account

Have a look at this page for information about how to get an account. If you need access to additional resources, or if you are not sure if your account is approved, you can ask Zach. If you need a password reset, then you need to contact NERSC directly.

Mailing Lists

We have our own mailing list for ATLAS-specific PDSF issues and general requests for help. You can sign up for it through eGroups: atlas-lbl-pdsf-users. Please cc this list on support requests that you think would be of interest to the whole group.

PDSF maintains two mailing lists. The first, pdsf-users, automatically includes all users. The second, pdsf-status, is used to announce downtimes, meetings, etc. You are advised to join this list.

Recent Information/News

Monthly meetings are held, and the minutes are here. You can also find some helpful information in this Tuesday meeting talk.

Note that PDSF is going through some transitions. We try to keep the information here up to date, so please send a note if you think something needs an update or if you get confused.

OS environment

PDSF has several operating systems installed. One can switch between the systems using CHOS. The suggested OS is sl64 (Scientific Linux 6.4). If you have a need to use sl53 then it's still available, but everyone is encouraged to move to SL6 as soon as they can, and for certain tasks (panda job submission, DQ2, using athena release 19 and higher) SL5 is no longer supported.

To use CHOS in (Ba)sh


To use CHOS in (t)csh

  setenv CHOS sl53

If you find that the default shell isn't in your preferred OS, then set the CHOS variable in your ~/.chos file.
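For bash, a minimal sketch of the counterpart to the tcsh line above. It assumes that CHOS reads the same CHOS environment variable in both shells, and it uses sl64 because that is the OS this page recommends:

```shell
# Assumption: bash equivalent of the tcsh "setenv CHOS ..." line.
# sl64 is the OS recommended on this page; use sl53 only if you must.
export CHOS=sl64
echo "CHOS is set to: ${CHOS}"
```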

Login shell

You can configure your login shell on PDSF; log in with your PDSF username and password to do so. The recommended shell for ATLAS is bash. Csh is known to break in some cases because of the large number of environment settings that cmt creates.

Access ATLAS software releases

Last reviewed Feb. 2011 by Yushu Yao

All ATLAS releases are managed by CernVM-FS, a web-based, read-only file system. New releases are installed from the central CERN location and maintained by the ATLAS Release Management Team. No local installation or customization is needed.

To use ATLAS Software on PDSF, always do the following first:

   source /common/atlas/scripts/

The above should be your first line in a batch job as well.

To show the available ATLAS software releases:

   showVersions --show athena

To show the available ATLAS DB releases (Note: you should not need DBReleases any more!):

   showVersions --show dbrelease

To set up an ATLAS release with a custom test area:

   asetup --testarea=$HOME/mytestarea

The ATLAS software is managed by the ATLASLocalRootBase/ManageTier3SW package; please refer to its documentation for more features. The available ATLAS releases are listed on this website. Please allow 3-5 days after the announcement of a new release for it to be installed and updated on CernVM-FS at PDSF.

Running Batch Jobs

PDSF uses the SLURM batch system, which differs from the LSF batch system that most of you are used to from lxplus at CERN.

The SLURM commands are not enabled by default. Load them using

 module load slurm

Submit your jobs simply using


Below is a sample job script to use CernVM-FS provided ATLAS Releases:

% cat
#SBATCH --output=slurm-%j.out
#SBATCH --error=slurm-%j.err
#SBATCH --account=atlas
#SBATCH --partition=shared-chos
#SBATCH --time=48:00:00
#SBATCH --mem=1800
#SBATCH --constraint=
#SBATCH --image=custom:pdsf-chos-sl64:v4 --export=NONE
CHOS=sl64 chos

The options after #SBATCH can also be passed directly to sbatch, but it is more convenient to store them in the submission script header. The important options are the account, partition and mem. The mem option requests 1.8 GB of memory per job, which is the maximum available per core. Anything larger than that will take up two slots, reducing the resources available to everyone else. Do not request more memory unless you are confident that you need it!
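As an illustration of passing the same options on the command line, here is a dry-run sketch that only prints the sbatch command instead of submitting anything. The script name myjob.sh is hypothetical, and only the account, partition, time and mem options from the header above are shown:

```shell
# Dry run: assemble the sbatch command line and print it.
# "myjob.sh" is a hypothetical script name, not one defined on this page.
job_script="myjob.sh"
sbatch_opts="--account=atlas --partition=shared-chos --time=48:00:00 --mem=1800"
sbatch_cmd="sbatch ${sbatch_opts} ${job_script}"
echo "${sbatch_cmd}"
# To really submit, run the printed command after "module load slurm".
```

Options given on the command line take precedence over #SBATCH lines in the script, so this is a convenient way to override a single setting for one submission.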

CHOS is needed to run ATLAS code on a batch node. It executes a script under the specific environment supported by ATLAS software releases.

% cat
shopt -s expand_aliases
source /common/atlas/scripts/
asetup,noTest,gcc48 blabla

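Note that the line with "shopt" has to be there: a non-interactive bash shell does not expand aliases by default, and the ATLAS setup commands are defined as aliases.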

Eventually, CHOS will be replaced by Shifter. If you would like to use Shifter now, change the partition to "shared" and execute your code using shifter as follows:

shifter --volume=/global/project:/project --volume=/global/projecta:/projecta /bin/bash

However, until that transition occurs, only a very limited number of nodes will be configured to use Shifter.

Modules in Batch Jobs

In order to use modules in batch jobs, it is a good idea to include these lines in your batch scripts:

 source /usr/share/Modules/init/bash
 module use /usr/common/usg/Modules/modulefiles

EventLoop Batch Jobs

The EventLoop SLURM driver needs to be configured to work with Shifter and the NERSC environment. All of the necessary settings are available starting in release 21.

The following is an example of using the SlurmDriver in Python.

import ROOT

driver = ROOT.EL.SlurmDriver()
# "setup-athena-code" is a placeholder for the shell command that sets up your release
driver.shellInit = "setup-athena-code"
driver.SetJobName("blah-i-use-submitdir")
driver.SetAccount("atlas")
driver.SetRunTime("24:00:00")
driver.SetMemory("1800")
# Extra #SBATCH line selecting the image, and the wrapper that runs the job under CHOS
driver.setString(ROOT.EL.Job.optBatchSlurmExtraConfigLines, "#SBATCH --image=custom:pdsf-chos-sl64:v4 --export=NONE")
driver.setString(ROOT.EL.Job.optBatchSlurmWrapperExec, "export TMPDIR=\\${SLURM_TMP}; CHOS=sl64 chos ")

Starting in release 21, the EventLoop batch drivers require the user to explicitly set up the release on each node. If you are using a stable release, setup-athena-code needs to be replaced with the following line:

export AtlasSetupSite=${AtlasSetupSite}; export AtlasSetup=${AtlasSetup}; source ${AtlasSetup}/scripts/ ${AtlasProject},${AtlasVersion}; source ${WorkDir_DIR}/

This sets up the same release on the nodes as was used at submission time.

If you are running in release 20.7, you need a patched version of EventLoop. You also do not need the shellInit driver option, as the RootCore release is set up automatically by the batch driver. To obtain a patched version of EventLoop, run the following commands in your work directory:

rc checkout_pkg EventLoop
rc find_packages
cd EventLoop/
patch -p0 < ~spgriso/patch.EventLoop-00-01-55.SlurmPDSF
rc compile

Nx Server for PDSF Use

Connecting to pdsf using the NX server is highly recommended, particularly for those working at CERN. To set it up, follow the instructions here.

Setting Up SVN Access

For password-less access to svn, create a file called config in your ~/.ssh directory containing:

 Host svn
         Protocol 2,1
         GSSAPIAuthentication yes
         GSSAPIDelegateCredentials yes
         ForwardX11Trusted yes
         ForwardX11 yes 

If your username on that host is different from your PDSF username, then add the following line under that host entry:

         User mhance

replacing "mhance" with the username that the host expects.

In your ~/.bashrc file add

 export KRB5_CONFIG="/common/atlas/setup_files/krb5.conf"

In each PDSF session, run

 kinit username@CERN.CH

After that, you should not have to type your password for every svn directory.
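To avoid typing your password more often than needed, the kinit step can be wrapped in a small check. This is a sketch only: ensure_cern_ticket is a hypothetical helper name, and it assumes an MIT-style klist, whose -s option prints nothing and returns success only when a valid ticket exists.

```shell
export KRB5_CONFIG="/common/atlas/setup_files/krb5.conf"

# Hypothetical helper: request a ticket only if no valid one is present.
# Argument: your CERN username.
ensure_cern_ticket() {
    if ! command -v klist >/dev/null 2>&1; then
        echo "klist not found; is Kerberos installed?" >&2
        return 2
    fi
    # Assumption: "klist -s" is silent and returns 0 only for a valid ticket
    if klist -s; then
        echo "Valid Kerberos ticket found"
    else
        kinit "$1@CERN.CH"
    fi
}

# usage: ensure_cern_ticket username
```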

Disk space

Location                    Filesystem  Quota or Size  Comments
/project/projectdirs/atlas  GPFS        100 TB         No backup (backup service in development). Can be used for data files or SW.
/common/atlas               GPFS        1.5 TB(?)      Nightly backup. This disk is for software, not for data files.
/eliza1/atlas               GPFS        28 TB          No backup. Scratch only, disappearing soon!
/eliza2/atlas               GPFS        35 TB          No backup. Scratch only, disappearing soon!
/eliza11/atlas              GPFS        110 TB         No backup. Free for data or code.
/eliza18/atlas              GPFS        350 TB         No backup. Put data files here. Also grid endpoint.
/oldeliza/scratch           GPFS        142 TB         No backup. Scratch only, will be around until disks fail.

  • To check group disk usage and quotas (-G displays quotas in GB instead of MB):
   myquota -G -g atlas
  • To check your own disk usage and quota (-G displays quotas in GB instead of MB):
   myquota -G
  • In addition to a disk space quota, there is also a quota on the number of files (inodes). A kit for a single ATLAS release requires about 80k inodes, so please check the inode quota before installing a new kit.
  • Note that when you are near your quota on /home (>90%), your batch jobs will not run. So please keep your home area clear.
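
The 90% threshold mentioned above is easy to check with a little arithmetic. The sketch below uses made-up numbers; usage_percent is a hypothetical helper, and the used and quota values must be read off the myquota output by hand:

```shell
# usage_percent USED LIMIT -> integer percentage of the quota in use
usage_percent() {
    echo $(( 100 * $1 / $2 ))
}

# Hypothetical numbers in GB; replace them with the values myquota reports.
used_gb=37
quota_gb=40
pct=$(usage_percent "$used_gb" "$quota_gb")
if [ "$pct" -gt 90 ]; then
    echo "WARNING: home area is ${pct}% full; batch jobs will not run"
else
    echo "Home area is ${pct}% full"
fi
```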

Getting/Distributing data locally: using rucio-xxx on PDSF

You are free to use interactive pdsf nodes for small transfers, like getting single files to test jobs on. For any large transfers, please use the Rucio Rule Definition Droid (successor to DaTRI):

After finding your datasets in Step 1, enter NERSC_LOCALGROUPDISK in Step 2. In Step 3, enter a finite lifetime and enter a comment when asking for approval.

Following the transfer, the files will appear in e.g.


where the "5a/7d" varies for each file. Here is a useful command to find all the files associated with a dataset

  rucio list-file-replicas datasetname/ --rse NERSC_LOCALGROUPDISK --protocol srm | awk '{print $12}' | grep SFN | sed -r s_".*SFN=(.*)"_"\1"_g 
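
To see what the grep and sed stages of that pipeline do, here is a sketch that runs them on a single made-up replica URL. The host and path are invented for illustration, extract_sfn is a hypothetical helper name, and sed -E is used as the portable spelling of -r:

```shell
# Hypothetical SURL; the host and path are made up for illustration.
surl='srm://se.example.org:8443/srm/v2/server?SFN=/data/rucio/user/5a/7d/file.root'

# Keep only lines containing SFN and strip everything up to and including "SFN="
extract_sfn() {
    grep SFN | sed -E 's_.*SFN=(.*)_\1_'
}

echo "$surl" | extract_sfn
```

What remains is the local file path, which is the part you can hand to athena or ROOT.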

To get small files, set up the environment:

  source /common/atlas/scripts/
  localSetupDQ2Client --skipConfirm
  voms-proxy-init --voms=atlas

Now you can use dq2-ls, dq2-get, etc. Documentation for the tools is here.

If you have problems with "certificate out of date" errors, please do the following and re-try the dq2-get:

  export X509_CERT_DIR=/usr/common/nsg/etc/certificates

If you want to create a new dataset visible to grid users from data that is local on PDSF you can use dq2-put:

  dq2-put -L NERSC_SCRATCHDISK -s sourceDir datasetName

Please refer to the dq2 twiki page linked above for details and for the format of the dataset name (it must be of the form user.[UserName].*). On PDSF the above functionality should work. However, you may encounter errors when writing files to the disk, which look like:

>> Transfer of file MC11_7TeV.107499.singlepart_empty.pileup_Pythia8_A2M_noslim_2011BS.mu9.VTXD3PD.root to SE: FAILED

In this case please open a NERSC ticket and ask for the destination directory to be made group-writable. The destination directory in case you are writing to NERSC_SCRATCHDISK will be like:


where [YourUserName] is your nickname on the grid and [XXX] is the part of the dataset name between dots following your nickname, e.g. user.[YourUserName].[XXX].someOtherInfo.v1.0/
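
Before running dq2-put, it can save a failed attempt to check that the dataset name has the required user.[UserName].* form. A minimal sketch, where is_user_dataset is a hypothetical helper and "jdoe" is a made-up nickname:

```shell
# is_user_dataset NAME NICKNAME
# Succeeds if NAME starts with "user.NICKNAME." followed by at least one character.
is_user_dataset() {
    case "$1" in
        "user.$2."?*) return 0 ;;
        *) return 1 ;;
    esac
}

# "jdoe" is a made-up grid nickname.
if is_user_dataset "user.jdoe.mytest.someOtherInfo.v1.0/" jdoe; then
    echo "dataset name looks fine"
else
    echo "dataset name must start with user.jdoe."
fi
```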

Using the PDSF grid server

To list datasets local to PDSF: set up the DQ2 environment, then


The files are physically located on the /eliza18 disk, so you can also login to PDSF and use "ls":

  ls /eliza18/atlas/atlasdata/atlaslocalgroupdisk/rucio

to find files, and use the files directly in your athena or ROOT jobs.

With the new Rucio system, finding files with "ls" is basically no longer possible. You need to get the physical file locations for a given dataset using dq2-ls, e.g.:

  dq2-ls -pfL NERSC_LOCALGROUPDISK mc15_13TeV.my_dataset_name  | grep "srm\:" | sed s_"srm\://"__g 

To request a dataset to PDSF, follow these instructions:

To get data to NERSC_LOCALGROUPDISK you may need an explicit approval if the dataset is large (TB's). Ian can do that if you see your request is awaiting approval for a long time.

If you copied data in LOCALGROUPDISK and don't need it any more, please use

  dq2-delete-replicas  -d dataset-name NERSC_LOCALGROUPDISK

to clean up. Everyone should be able to delete datasets they requested to PDSF. Otherwise ask Ian. The "-d" is necessary to actually delete the files, as opposed to just erasing the listing of the dataset at NERSC from the server.

If you get "permission denied" errors when trying to transfer or write data to our grid endpoints, then you may need to either (a) register with DaTRI, or (b) request the "usatlas" role for your grid certificate.

(a) To register with DaTRI, visit the following page and follow the instructions there:

(b) To get the "usatlas" role, follow the instructions on the following page, starting from "In addition, you should request to join the group associated to your country...."

Frontier DB access on pdsf

If you use CernVM-FS based ATLAS releases, the following frontier server is set up automatically.

To get very fast online DB access you should set up Frontier access. This will work for 15.5.X and later. Set up the ATLAS software, then do

   export FRONTIER_SERVER='(serverurl='

then run athena and marvel at how fast it is.

Using Kerberos at PDSF

To be able to check out packages in CVS or use your cern afs space, you need to have kerberos authentication. CERN has now switched to Kerberos v5. To be properly authenticated, you need to define a variable:

 export KRB5_CONFIG=/common/atlas/setup_files/krb5.conf

and use

 kinit username@CERN.CH

to authenticate yourself.

Using pAthena on PDSF: Running Grid Jobs

  • Using pathena is an efficient way to do analysis over the grid. The documentation is quite good and should be able to guide you through using it. Here are some quick installation instructions.

You can use pathena through the AtlasLocalRootBase setup:

  setupATLAS
  setup # or your favorite athena release
  localSetupPandaClient
  pathena --help

Now pathena is ready to use. If you're using prun instead, you likely do not need to set up athena:

  setupATLAS
  localSetupPandaClient
  prun --help

Using eos (from CERN) on PDSF

To set up eos access on PDSF, just do:

  source /afs/
  export EOS_MGM_URL=root://
  export KRB5_CONFIG="/common/atlas/setup_files/krb5.conf"
  kinit username@CERN.CH

From there, you can access files on eos as you would on lxplus, except that you must replace eosatlas with For example:

root -l root://

Useful Atlas Tools on pdsf

  • Valgrind is a good way to debug code, find memory leaks and understand seg faults. To set it up on PDSF SL4, set up your normal software and then do
 source /afs/

To run valgrind, do

 valgrind --leak-check=yes --trace-children=yes --num-callers=8 --show-reachable=yes 
 `which` >! valgrind.log 2>&1

You can decode the output using the ATLAS documentation.

Other Tips for Easy Usage

Compiling is quite slow with the atlas software on PDSF. If you need to compile something and are not adding any new header or source files, you can do

 make QUICK=1

You can't use this if you've just checked out a package and are compiling for the first time. This is only to be used when you've made relatively minor changes.

There are default soft limits on the address space users are allowed to consume in the interactive machines. This can, for instance, cause problems when opening large ROOT files for browsing or with a MakeClass script. To get around this, try executing this command in your bash shell:

  ulimit -v 5242880
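
The number passed to ulimit -v is in kilobytes, so the value above corresponds to 5 GB of address space. The sketch below shows the arithmetic and, in a comment, one way to apply the limit to a single command only; the file name myfile.root is hypothetical:

```shell
# ulimit -v takes kilobytes: 5 GB = 5 * 1024 * 1024 KB
limit_kb=$(( 5 * 1024 * 1024 ))
echo "limit in KB: ${limit_kb}"

# Running inside a subshell keeps the new limit from affecting the rest of
# your session ("myfile.root" is a hypothetical file name):
#   ( ulimit -v "${limit_kb}"; root -l myfile.root )
```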

Known Problems with using ATLAS specific software on PDSF

  • None today :-)