Showing posts with label emsl. Show all posts
Showing posts with label emsl. Show all posts

28 October 2012

267. ECCE client connecting to remote site via reverse port/local port forwarding

The situation I'm about describe is quite specific, yet I don't think it's that unusual.

A. I've got a computer at work which is behind a firewall so that I can't connect directly to it from the outside. This will be referred to as Work.

B. I've got a laptop at home which is connected to a wireless  router. This will be referred to as Home.

C. The router is a Linksys/tomato router, which is accessible from the outside ( This will be referred to as Router.

I'd like to connect from home to my ecce server at work so that I can monitor and submit jobs.

At Work:
ssh -R 19997:localhost:8096

At Home:
ssh -L 5555:localhost:19997

We're basically tying together port 5555 at Home with port 8096 at Work, via an intermediary server.

At Home, edit your ecce/apps/siteconfig/Dataservers and change the relevant lines to

    <desc>ECCE Data Server--remote</desc>


Note that submission actually happens from your ecce client, not your server (i.e. from Home, not Work), so to get your submission scripts in order you may have to do a bit of fiddling. E.g. if you ecce server is also the queue master for an SGE batch system:

ssh -R 19999:localhost:22

ssh -L 5454:localhost:19999

Edit /apps/siteconfig/ and add
ssh_p5454: ssh -XC -p 5454|scp -P 5454|xterm

But you can read more about that here:

23 October 2012

264. Upgrade to ECCE 6.4 on Debian Testing

There's little reason to upgrade from 6.3 to 6.4 since
Other than open source availability and bundling the latest NWChem 6.1.1, the ECCE 6.4 release is otherwise equivalent to the 6.3 release.  
But it's always nice with some new and shiny.

For general instructions on how to install ecce from scratch, see e.g.

Go here to download the latest release. You will be asked to supply your name, email address and the name of your institution. However, you no longer need to register.

Download the file install_ecce.v6.4.rhel5-gcc4.1.2-m64.csh (full binary + builder). And yes, you can use wget for that.
Stop the ecce server if it's running:

Make your ecce install file executable and run it:
chmod +x install_ecce.v6.4.rhel5-gcc4.1.2-m64.csh

which launches the installation:
Extracting ECCE distribution from ./install_ecce.v6.4.rhel5-gcc4.1.2-m64.csh...

Main ECCE installation menu
1) Help on main menu options
2) Prerequisite software check
3) Full install
4) Full upgrade
5) Application software install
6) Application software upgrade
7) Server install
8) Server upgrade

IMPORTANT: If you are uncertain about any aspect of installing
or running ECCE at your site, please refer to the detailed
ECCE Installation and Administration Guide at

Hit  at prompts to accept the default value in brackets.

Selection: [1] 4
Host name: [beryllium] 
New application installation directory: [/home/me/tmp/ecce-v6.4/apps] /home/me/.ecce/ecce-v6.4/apps
Existing application directory to upgrade: /home/me/.ecce/ecce-v6.3e/apps

Backup existing server user data (yes/no)? [yes]

ECCE v6.4 will be installed using the settings:

  Installation type: [full upgrade]
  Host name: [beryllium]
  Application installation directory: [/home/me/.ecce/ecce-v6.4/apps]
  Application directory to upgrade: [/home/me/.ecce/ecce-6.3e/apps]
  Server installation directory: [/home/me/.ecce/ecce-v6.4/server]
  Server directory to upgrade: [/home/me/.ecce/ecce-6.3e/server]
  Backup existing server user data: [yes]

Are these choices correct (yes/no/quit)? [yes] 
Installing ECCE application software in /home/me/.ecce/ecce-v6.4/apps...
  Extracting application distribution...
  Extracting NWChem binary distribution...
  Extracting NWChem common distribution...
  Extracting client WebHelp distribution...
  Configuring application software...
  Configuring NWChem...

Installing ECCE server in /home/me/.ecce/ecce-v6.4/server...
  Extracting data server in /home/me/.ecce/ecce-v6.4/server/httpd...
  Extracting data libraries in /home/me/.ecce/ecce-v6.4/server/data...
  Extracting Java Messaging Server in /home/me/.ecce/ecce-v6.4/server/activemq...
  Configuring ECCE server...
  Copying user data from server to be upgraded...
  Copying share data from server to be upgraded...

ECCE installation succeeded.

!! You MUST perform the following steps in order to use ECCE !!
-- Unless only the user 'me' will be running ECCE,
   start the ECCE server as 'me' with:

-- To register machines to run computational codes, please see
   the installation and compute resource registration manuals

-- Before running ECCE each user must source an environment
   setup script.  For csh/tcsh users add this to ~/.cshrc:
     if ( -e /home/me/.ecce/ecce-v6.4/apps/scripts/runtime_setup ) then
       source /home/me/.ecce/ecce-v6.4/apps/scripts/runtime_setup
   For sh/bash users, add this to ~/.profile or ~/.bashrc:
     if [ -e /home/me/.ecce/ecce-v6.4/apps/scripts/ ]; then
       . /home/me/.ecce/ecce-v6.4/apps/scripts/

And then
/home/me/.ecce/ecce-v6.4/server/httpd/bin/apachectl start: httpd started
[1] 25382
INFO  BrokerService         
- ActiveMQ 5.1.0 JMS Message Broker (localhost) is starting
INFO  BrokerService      
- ActiveMQ JMS Message Broker (localhost, ID:beryllium-46481-1350964505499-0:0) started

Put the following in your ~/.bashrc

export ECCE_HOME=/home/me/.ecce/ecce-v6.4/apps
export PATH=${ECCE_HOME}/scripts:${ECCE_HOME}/scripts/parsers:${PATH}$

And run:
source ~/.bashrc

07 June 2012

179. Building ECCE on Debian Testing/Wheezy

UPDATE: Build went fine. Upgrade went fine. But the organizer doesn't show my jobs properly i.e. the files are there but they aren't recognised as jobs. I haven't had a solid look at this yet, and it might just be because I need to restart more services than just the http server. It's been a long day...

UPDATE 2: An update on a different computer went without a hitch, with all the old job files being imported properly.

UPDATE 3: I started ECCE and let the data manger chew on it for four hours. No luck on the troublesome computer. The only difference is the java -- openjdk 7 worked fine, oracle jre/jdk didn't. Dunno if this is the reason, but currently in the process of installing the binaries to see whether that works better. Updates will come...

UPDATE 4: Installing the prebuilt ECCE binaries did the trick. In summary: as far as I know you MUST use openjdk 7. SUN/Oracle Java does not appear to work. It's exhibited in a lack of ability to recognise old jobs as rather than just folders and files.

I'm trying to document everything I'm doing these days, no matter how simple or (at least in retrospect) obvious it is.

Here's how to build the 4th of June 2012 version of ecce v6.3.

You need to be registered with EMSL/PNNL to download ecce. There are plans to open-source the software properly (i.e. no need to register) sometime this coming northern summer. But for now you need to be an academic group leader to have access.

(I originally posted a somewhat different post where I recommended making some changes to the build scripts re ECCE_HOME. Eventually I saw the light and realised the error of my ways)

Download ecce-v6.3.src.tar.bz2, and put it in a suitable folder, e.g. ~/tmp
cd ~/tmp
tar -xvf ecce-v6.3-src.tar.bz2
cd ecce-v6.3/
export ECCE_HOME=`pwd`
cd build/

The first time you run ./build_ecce you'll be asked a series of questions relating to installed packages. If it's all good, answer
Do you want to skip these checks for future build_ecce invocations (y/n)? y
If anything came up, then read the message carefully and install the missing package.

NOTE: on one box I noticed different version of java and javac being found, as I had both openjdk 6 and 7 installed. I couldn't set javac to 6 but I could do
sudo update-alternatives --config java
and set it to openjdk 7.

[From my small, statistically unsound sample set Oracle/SUN java will NOT work.]

Then do ./build_ecce again. And again. And again. In all, I think you do it six or seven times - each time a new package is built.
I always get a
lib: No such file or directory.
at the end of the httpd build. Not sure why, but everything seems to be ok in spite of that.
Anyway, you know that you're done running ./build_ecce when you get
ECCE built and distribution created in /home/verahill/tmp/ecce-v6.3
At this point, you are ready to install
DO NOT USE ./install_ecce


But that's a different story. install_ecce will give you weird error messages about missing tar files. install_ecce.v6.3.csh on the other hand will work fine.

01 June 2012

169. ECCE, node hopping and cluster nodes without direct access to WAN

NOTE: I'm still working on this. But this sort of works for now. More details coming soon.
Update:  I've posted better solutions more recently for use with SGE. See aspects of e.g. and

It's not easy choosing a good title for this post which describes the purpose of it succinctly yet clearly (sort of like this sentence), so here's what we're dealing with:

The Problem:
You can access the submit node from off-site. You can't access the subnodes directly from off-site. This post shows you how you can submit to each subnode directly. A better technical solution is obviously to use qsub on the main node. Having said that, with very little modification this method can also be adapted to the situation described here:

A low-tech, home-built example is the following:
-eth0 - Main  node - eth1 - (subnodes: node0, node1, node2, node3)/

eth0 is WAN, and eth1 is LAN. You can ssh to the main node from the 'internet'. There's no queue manager like SGE installed -- submission of jobs is done by logging onto each node and executing the commands there.

The example:
Main node:

The most obvious solution:
do port redirection. The downside is that it requires some technical skills of the users, and anything with networking and ssh is a PITA if they insist on using windows.

The smarter solution:
I was informed of this solution here.

ROCKS-specific stuff
If your cluster is running ROCKS 5.4.3 and you're having issues opening csh shells, just move /etc/csh.cshrc out of the way as a crude fix.
sudo mv /etc/csh.cshrc ~/
Don't forget to do this on all the nodes if they have local /etc folders! And it's not that easy -- the passwords aren't the same on the nodes as on the main node.
So, on the main node:
rocks set host sec_attr compute-0-1 attr=root_pw
sudo su
# cat /etc/hosts
#mv /etc/csh.cshrc .

As the user you'll be running as, edit/create your ~/.cshrc

setenv GAUSS_SCRDIR /tmp
setenv ECCE_HOME /share/apps/ecce/apps
set PATH= (/share/apps/nwchem/nwchem-6.1/bin/LINUX64 $PATH)
setenv LD_LIBRARY_PATH /opt/openmpi/lib:/share/apps/openblas/lib
Repeat on all nodes.

SIDE NOTE: What I'm ultimately looking to achieve on the ROCKS cluster is front-node managed SGE submissions. Easy, you say? Well, ECCE submits a
setenv PATH /opt/gridengine/bin/lx26-amd64/:/usr/bin/:/bin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/bin/X11:${PATH}
and for some reason the ROCKS/CentOS (t)csh can't handle (most of the time) adding ${PATH} to itself since it's too long unless "s are added (it seems) i.e. this works consistently:
setenv PATH "/opt/gridengine/bin/lx26-amd64/:/usr/bin/:/bin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/bin/X11:${PATH}"
On my debian system either works fine (bsd-csh).

ANYWAY -- continue

Start setting it up using
ecce -admin

After that, do some editing in ecce-6.3/apps/siteconfig/

xappsPath: /usr/X11R6/bin
NWChem: /share/apps/nwchem/nwchem-6.1/bin/LINUX64/nwchem
Gaussian-03: /share/apps/gaussian/g09/g09
sourceFile:  /etc/csh.login

compute-0-0 compute-0-0.local Dell HPCS3 Phase 1 Cluster AMD 2.6 GHz Opteron 40:5 ssh :NWChem:Gaussian-03MN:RD:SD:UN:PW