Monday, February 13, 2006

BGU grid computers maintenance

Welcome to my Grid Computing and other stuff blog!
Your comments will be most appreciated.

Today's activities
1) Set up passwordless SSH access between the nodes.
2) Install Condor 6.6.10 under /usr/local on grid8, which will be the central manager:
[root@grid8 local]# gzip -d condor-6.6.10-linux-x86-glibc23-dynamic.tar.gz
[root@grid8 local]# tar xvf ./condor-6.6.10-linux-x86-glibc23-dynamic.tar
[root@grid8 local]# hostname
[root@grid8 local]# cd condor-6.6.10
[root@grid8 condor-6.6.10]# ./condor_install
My answers to the Condor installer:
Full installation.
Multiple machines.
Machines do not share files via a file server.
There is no release directory yet.
Installation dir: /usr/local/condor
Create that directory.
Notify by Email to:
Mail path: /bin/mail
Are all the machines in domain ""? Yes.
Unique UID - No.
Enable Java support: Yes
Java exists under: /usr/bin/java
Create links to other directories: Yes
"bin" will go to /usr/local/bin
Full name of the central manager: (this node)
Condor directories will go to: /home/condor
Local config file:
Creating config files in "/home/condor" ... done.

Configuring global condor config file ... done.
Created /usr/local/condor/etc/condor_config.

Pool name: "BGU grid"
Should I put in a soft link from /home/condor/condor_config to
/usr/local/condor/etc/condor_config [yes] yes
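To confirm that these answers ended up in the generated configuration, `condor_config_val` (one of the Condor tools linked into /usr/local/bin) can print individual settings. This is a minimal sketch: it assumes the answers map to the macros named in the loop (for example, the pool name to `COLLECTOR_NAME`), and the function name `show_condor_config` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the configuration values the installer answers should have set.
# Assumption: these macro names are the ones condor_install writes for the
# questions above; adjust the list if your condor_config differs.
show_condor_config() {
    for var in CONDOR_HOST RELEASE_DIR LOCAL_DIR COLLECTOR_NAME CONDOR_ADMIN MAIL JAVA; do
        # condor_config_val prints the value of a single configuration macro
        printf '%s = %s\n' "$var" "$(condor_config_val "$var" 2>/dev/null)"
    done
}
```

With the answers above, `RELEASE_DIR` should point at /usr/local/condor, `LOCAL_DIR` at /home/condor and `COLLECTOR_NAME` at "BGU grid".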

As "root" start up Condor: /usr/local/condor/etc/examples/condor.boot start
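A quick way to check whether the daemons actually came up is to look for `condor_master`, the daemon that starts the others, and then ask the pool for its status. A minimal sketch; the function names are hypothetical, and the log location assumes Condor's default of a `log` directory under the local directory (/home/condor here).

```shell
#!/bin/sh
# Sketch: check whether condor.boot actually started the daemons.
# condor_master is the daemon that spawns the others (collector, negotiator, ...).
condor_running() {
    # -e: all processes; -o comm=: command name only, without a header line
    ps -e -o comm= | grep -qx 'condor_master'
}

check_condor() {
    if condor_running; then
        condor_status    # query the collector on the central manager
    else
        # Assumption: logs live under LOCAL_DIR/log, i.e. /home/condor/log here
        echo "condor_master is not running; see /home/condor/log" >&2
        return 1
    fi
}
```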
Unfortunately, Condor did not start.
It seems that the hard disk has a problem; here are a few lines from /var/log/messages:
Feb 13 09:58:58 grid8 kernel: end_request: I/O error, dev 03:02 (hda), sector 591512
Feb 13 10:04:02 grid8 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Feb 13 10:04:02 grid8 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=800358, sector=59
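Errors like these can be pulled out of the system log with a simple `grep`. A minimal sketch: the patterns are taken from the three lines above, and other kernels or disk drivers may word their errors differently; the function name `scan_disk_errors` is hypothetical.

```shell
#!/bin/sh
# Sketch: list kernel messages that point to a failing IDE disk.
# The patterns match the error signatures seen on grid8 (assumption: other
# drivers or kernel versions may log different text).
scan_disk_errors() {
    log="${1:-/var/log/messages}"
    # -E: extended regex, alternation between the two kinds of messages
    grep -E 'end_request: I/O error|dma_intr: (status|error)=' "$log"
}
```

Run as root, `scan_disk_errors /var/log/messages` prints every matching line, which makes it easy to see whether the errors keep coming back.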

The hard disk will be replaced, and I lost a working hour :(

The Condor central manager installation will be repeated on Grid9.
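Step 1 of today's list, passwordless SSH between the nodes, needs to be in place on the new machine too. A minimal sketch, assuming OpenSSH with `ssh-keygen` and `ssh-copy-id` installed; the function names `ensure_key` and `push_key` are hypothetical.

```shell
#!/bin/sh
# Sketch of passwordless SSH between nodes, assuming OpenSSH with
# ssh-keygen and ssh-copy-id available.

# Create an RSA key with an empty passphrase, unless one already exists.
ensure_key() {
    key="${1:-$HOME/.ssh/id_rsa}"
    mkdir -p "$(dirname "$key")"
    [ -f "$key" ] || ssh-keygen -q -t rsa -N "" -f "$key"
}

# Append the public key to ~/.ssh/authorized_keys on each node; ssh-copy-id
# asks for the password once per node. The BatchMode login then fails
# instead of prompting if key-based login still does not work.
push_key() {
    key="$1"; shift
    for node in "$@"; do
        ssh-copy-id -i "$key.pub" "$node" &&
            ssh -o BatchMode=yes "$node" true ||
            echo "key login to $node failed" >&2
    done
}
```

For example, `ensure_key && push_key "$HOME/.ssh/id_rsa" grid9` would set up grid8's key on grid9; further node names are added to the same call.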
