Wednesday, November 08, 2006

The 4th IGT Grid-HPC Work group meeting



The 4th Grid-HPC Work group meeting

Date: Thursday, November 9th

Location: IGT Offices, Maskit 4, 5th Floor, Hertzliya

Agenda:

14:00 - 14:15: Opening. Avner & Guy

14:15 - 15:00: Grid Mathematica.

Speaker: Haim Ricas, M.Sc. Applied Mathematics

General Manager Tashtit Scientific Consultants Ltd

Wolfram Research Distributors in Israel

Abstract:

gridMathematica implements many parallel programming primitives and includes high-level commands for parallel execution of matrix operations, plotting, and other functions. It comes with sample applications of many popular new programming approaches, such as parallel Monte Carlo simulation, visualization, searching, and optimization.

The implementations for all high-level parallel processing commands are provided in Mathematica source form, so they can also serve as templates for users to build their own parallel programs.

At the meeting I'll present a general overview, demonstrate a few powerful examples, and discuss the main benefits of Mathematica and gridMathematica.

Speaker: Zvi Tannenbaum from ACS (http://www.acs-grid.com/) will talk about:

High-Performance Computing with Mathematica: The PoochMPI Toolkit

The new PoochMPI Toolkit for Mathematica enables Wolfram Research's Mathematica to be combined with the easy-to-use, supercomputer-compatible Pooch clustering technology of Dauger Research. This fusion applies the parallel computing paradigm of today's supercomputers and the ease of use of Pooch to Mathematica, enabling possibilities none of these technologies could achieve alone.

Closely following the supercomputing industry-standard Message-Passing Interface (MPI), the PoochMPI Toolkit creates a standard way for every Mathematica kernel in the cluster to communicate with the others directly while performing computations. In contrast to typical grid implementations that are solely master-slave or server-client, this solution instead has all kernels communicate with each other directly and collectively, the way modern supercomputers do.

15:00 - 15:10: Break

15:10 - 15:40: Intel Xeon Dual Core Processor Roadmap.

Speaker: Moshe Yaacov, Eastronics, Technical Support Manager.

Abstract: TBA

15:40 - 16:15: Discussion and concluding remarks

To register, please send your contact details to info@grid.org.il

Thank you!

Guy Tel-Zur

Grid-HPC WG Director

IGT

www.Grid.org.il

Sunday, October 29, 2006

Installing WiFi on my DELL Latitude 420 running Fedora Core 6

I just installed Fedora Core 6. The WiFi was not detected (once again :( ).
Here are the steps to overcome this problem:


1) Find the type of card installed in your laptop with lspci -v:
0c:00.0 Network controller: Broadcom Corporation Dell Wireless 1390 WLAN Mini-PCI Card (rev 01)

Subsystem: Dell Unknown device 0007

Flags: bus master, fast devsel, latency 0, IRQ 177

Memory at dfdfc000 (32-bit, non-prefetchable) [size=16K]

Capabilities: [40] Power Management version 2

Capabilities: [58] Message Signalled Interrupts: 64bit- Queue=0/0 Enable-
Capabilities: [d0] Express Legacy Endpoint IRQ 0

Capabilities: [100] Advanced Error Reporting
Capabilities: [13c] Virtual Channel


2) Download and install the latest ndiswrapper (ndiswrapper-1.27.tar in my case)

# cd ndiswrapper-1.27

# make uninstall

# make
# make install
# lspci -n

[root@localhost DRIVER]# ndiswrapper -h
install/manage Windows drivers for ndiswrapper
usage: ndiswrapper OPTION
-i inffile install driver described by 'inffile'
-a devid driver use installed 'driver' for 'devid'
-r driver remove 'driver'
-l list installed drivers
-m write configuration for modprobe
-ma write module alias configuration for all devices
-mi write module install configuration for all devices
-v report version information
where 'devid' is either PCIID or USBID of the form XXXX:XXXX,
as reported by 'lspci -n' or 'lsusb' for the card

3) Download the corresponding Windows driver, check this link.
In my case I downloaded the following driver: http://ftp.us.dell.com/network/R115321.EXE.
Then proceed as follows:
# unzip R115321.EXE
# cd DRIVER

[root@localhost wifi_driver]# ll ./DRIVER/
total 992
-rw-r--r-- 1 root root 10102 Nov 14 2005 bcm43xx.cat
-rw-r--r-- 1 root root 555872 Nov 2 2005 bcmwl5.inf
-rw-r--r-- 1 root root 424320 Nov 2 2005 bcmwl5.sys
[root@localhost wifi_driver]# cd DRIVER/
[root@localhost DRIVER]# ndiswrapper -i bcmwl5.inf
installing bcmwl5 ...
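
If 'wlan0' does not show up in step 4 below, the ndiswrapper kernel module probably still needs to be loaded and registered with modprobe. A minimal sketch, using only the options listed in the usage output above:

# ndiswrapper -l
# ndiswrapper -m
# modprobe ndiswrapper
# dmesg | grep ndiswrapper

ndiswrapper -l should report the driver as installed (and the hardware as present) before you continue.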

4) Check that it works and can be configured:
# iwconfig
lo no wireless extensions.
eth0 no wireless extensions.
sit0 no wireless extensions.
wlan0 IEEE 802.11g ESSID:"Polaris" Nickname:"Polaris"
Mode:Managed Frequency:2.437 GHz Access Point: 00:14:7C:B6:EC:14
Bit Rate=54 Mb/s Tx-Power:32 dBm
RTS thr=2347 B Fragment thr=2346 B
Encryption key:xxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xx Security mode:restricted
Power Management:off
Link Quality:78/100 Signal level:-46 dBm Noise level:-96 dBm
Rx invalid nwid:0 Rx invalid crypt:0 Rx invalid frag:0
Tx excessive retries:0 Invalid misc:0 Missed beacon:0

Comment: at this stage you still won't see some of the details because you are not connected yet, but at least you should see an entry for 'wlan0'.

5) Install the WiFi Radar:

# bzip2 -d ./wifi-radar-1.9.7.tar.bz2

# tar xvf ./wifi-radar-1.9.7.tar
# cd wifi-radar-1.9.7
# cp ./wifi-radar /usr/sbin
# mkdir /etc/wifi-radar
# cp ./wifi-radar.conf /etc/wifi-radar/.
# vi /etc/wifi-radar/wifi-radar.conf and change eth2 to wlan0!!!
# ./wifi-radar
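
If you prefer a non-interactive edit over vi for the configuration step above, a one-liner like this should do (assuming the stock config really lists eth2 as the interface, as it did for me):

# sed -i 's/eth2/wlan0/g' /etc/wifi-radar/wifi-radar.conf
# grep wlan0 /etc/wifi-radar/wifi-radar.conf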

Here are a few screen shots:

Click on "connect":

...and BINGO!



Good luck!!!

Guy

Tuesday, October 24, 2006

Installing WiFi on my ThinkPad

I installed CentOS 4.4 and was surprised to find out that the wireless connection was not working.
After some Internet searching I did the following two steps, which solved the problem:
1) Installed the IPW2200 firmware
2) Installed the WiFi Radar
Enclosed are screenshots that summarize all the ingredients and the successful happy ending.
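
The firmware step is roughly the following (a sketch only; the firmware file name and version here are examples, so pick the one that matches the ipw2200 driver version reported by dmesg):

# tar xzf ipw2200-fw-2.4.tgz -C /lib/firmware
# modprobe -r ipw2200
# modprobe ipw2200
# iwconfig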


Monday, August 07, 2006

How to extract your DN (Distinguished Name) from your certificate

The recipe below may not be the simplest, but this is what I discovered while learning this subject.

From the browser which holds your certificate, make a backup that will save it in p12 format.
If you are working under Windows, this is the right moment to FTP it to
Linux, where life is easier. Then proceed from your Linux computer by converting the *.p12 certificate to *.pem format:

openssl pkcs12 -info -in guy_cert_iucc_expired.p12 -out ~/.globus/usercert.pem

(don't forget BTW to chmod 644 this file)

Then extract the DN with:

/usr/local/globus-4.0.2/bin/grid-cert-info -subject

In my case the result is:

/C=IL/O=IUCC/CN=IUCC/emailAddress=ca@mail.iucc.ac.il

which is my DN.

Actually, you can simply grep the string "subject" from usercert.pem.
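
Another option that skips the Globus tools altogether is to ask openssl for the subject directly:

openssl x509 -in ~/.globus/usercert.pem -noout -subject

The line it prints (after the "subject=" prefix) is the same DN.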

Good luck

Monday, July 31, 2006

Today I want to document setting up a CA for my Globus project

The following work was done on "titan":


[globus@titan globus-4.0.2]$ export GLOBUS_LOCATION=/usr/local/globus-4.0.2
[globus@titan globus-4.0.2]$ $GLOBUS_LOCATION/setup/globus/setup-simple-ca

WARNING: GPT_LOCATION not set, assuming:
GPT_LOCATION=/usr/local/globus-4.0.2



C e r t i f i c a t e A u t h o r i t y S e t u p

This script will setup a Certificate Authority for signing Globus
users certificates. It will also generate a simple CA package
that can be distributed to the users of the CA.

The CA information about the certificates it distributes will
be kept in:

/home/globus/.globus/simpleCA/

ERROR: It looks like a CA has already been setup at this location.
Do you want to overwrite this CA? (y/n) [n]:y

The unique subject name for this CA is:

cn=Globus Simple CA, ou=simpleCA-titan, ou=GlobusTest, o=Grid

Do you want to keep this as the CA subject (y/n) [y]:

Enter the email of the CA (this is the email where certificate
requests will be sent to be signed by the CA):tel-zur@computer.org

The CA certificate has an expiration date. Keep in mind that
once the CA certificate has expired, all the certificates
signed by that CA become invalid. A CA should regenerate
the CA certificate and start re-issuing ca-setup packages
before the actual CA certificate expires. This can be done
by re-running this setup script. Enter the number of DAYS
the CA certificate should last before it expires.
[default: 5 years (1825 days)]:

Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:


creating CA config package...done.


A self-signed certificate has been generated
for the Certificate Authority with the subject:

/O=Grid/OU=GlobusTest/OU=simpleCA-titan/CN=Globus Simple CA

If this is invalid, rerun this script

/usr/local/globus-4.0.2/setup/globus/setup-simple-ca

and enter the appropriate fields.

-------------------------------------------------------------------

The private key of the CA is stored in /home/globus/.globus/simpleCA//private/cakey.pem
The public CA certificate is stored in /home/globus/.globus/simpleCA//cacert.pem
The distribution package built for this CA is stored in

/home/globus/.globus/simpleCA//globus_simple_ca_89aac96f_setup-0.19.tar.gz

This file must be distributed to any host wishing to request
certificates from this CA.

CA setup complete.

The following commands will now be run to setup the security
configuration files for this CA:

$GLOBUS_LOCATION/sbin/gpt-build /home/globus/.globus/simpleCA//globus_simple_ca_89aac96f_setup-0.19.tar.gz

$GLOBUS_LOCATION/sbin/gpt-postinstall
-------------------------------------------------------------------



setup-ssl-utils: Configuring ssl-utils package
Running setup-ssl-utils-sh-scripts...

***************************************************************************

Note: To complete setup of the GSI software you need to run the
following script as root to configure your security configuration
directory:

/usr/local/globus-4.0.2/setup/globus_simple_ca_89aac96f_setup/setup-gsi

For further information on using the setup-gsi script, use the -help
option. The -default option sets this security configuration to be
the default, and -nonroot can be used on systems where root access is
not available.

***************************************************************************

setup-ssl-utils: Complete

[globus@titan globus-4.0.2]$

as root:

[root@titan root]# /usr/local/globus-4.0.2/setup/globus_simple_ca_89aac96f_setup/setup-gsi
setup-gsi: Configuring GSI security
Making /etc/grid-security...
mkdir /etc/grid-security
Making trusted certs directory: /etc/grid-security/certificates/
mkdir /etc/grid-security/certificates/
Installing /etc/grid-security/certificates//grid-security.conf.89aac96f...
Running grid-security-config...
Installing Globus CA certificate into trusted CA certificate directory...
Installing Globus CA signing policy into trusted CA certificate directory...
setup-gsi: Complete
[root@titan root]#
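
A quick sanity check after setup-gsi: the trusted-certificates directory should now contain the CA certificate and its signing policy (named after the CA hash, 89aac96f in this case), alongside the grid-security.conf file installed above.

[root@titan root]# ls /etc/grid-security/certificates/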

Thursday, June 29, 2006

How to start MonALISA - A reminder for myself

cd to: /home/condor/MonaLisa/Service/CMD
then type:
[condor@grid4 CMD]$ ./MLD start
Password:
Starting UPDATE
..........OK
Trying to start MonaLisa.Please wait...STARTED [ PID == 4427 ]

or better, as root:
Usage: /etc/rc.d/init.d/MLD [start|stop|restart]
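
To verify that the service really came up, a plain process check is enough (nothing MonALISA-specific here):

[condor@grid4 CMD]$ ps -ef | grep -i MonaLisa | grep -v grep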

Then enjoy the plot from the client:

Monday, June 19, 2006

How to create a big file?

We wanted to do some benchmarks of file transfers between Israel and Singapore.
Here is my way to create a big file, using Octave (Matlab should work as well):

-bash-2.05b$ cat ./big.m
mat=ones(1,1000*1024*1024/8);
size(mat)
save -binary OneGig.dat mat

# here are a few examples:

-bash-2.05b$ ls -l
-rw-r--r-- 1 tel-zur tel-zur 1000M Jun 23 18:19 OneGig.dat

You are welcome to try, but hey, give me credit, OK? :)

Here is an update to this post (Nov. 16, 2006):
The method above can be called the "Physicist's Way". Now I will describe the "Computer Geek's Way", which is much more elegant, of course:

dd if=/dev/zero of=one_gig_file bs=1M count=1000

That's it!
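
One caveat for transfer benchmarks: a file full of zeros (or of ones, in the Octave version) compresses extremely well, so if anything along the path compresses the stream, the numbers may look too optimistic. For incompressible test data I would use /dev/urandom instead, at the price of slower file creation:

dd if=/dev/urandom of=one_gig_file bs=1M count=1000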

Monday, June 12, 2006

The 3rd IGT HPC work group meeting

The 3rd IGT HPC work group meeting will take place on Wednesday, June 21, 15:00-17:00.
For further details, click on the link or visit the IGT web site and then click on "Next IGT Events".

The wonderful VPN

Using VPN I can now do lots of things as if I were connected from inside the campus:


Bingo! I am connected and able to submit Condor jobs from the departmental terminal server and, of course, browse magazines from the university library.




Wednesday, May 24, 2006

Distributed Debugger Tool

DDT is now installed on the grid nodes and works fine!
Here is a screenshot of my first test run. This test was performed using OpenMPI v1.0.2.

Friday, April 28, 2006

So many new things to learn

Below is a list of technologies that I want to take a look at some time.....

1) wxPython, more references: Ref #1.
2) Ruby; Ruby in a Nutshell by Yukihiro Matsumoto, available from ACM/Safari.
What the hell is Ruby on Rails?

The next items were mentioned in the April issue of Linux Journal:

3) RH clones: CentOS, Scientific Linux, Tao Linux. A) Which is better? B) How do you define "better"?
4) openswan
5) squid, SquidGuard
6) tinyca2
7) Ethereal
8) snort
9) freeradius
10) Planet
11) xoops

Thursday, April 27, 2006

Things I did today

1) Corrected the Java configuration in the Windows machines and posted a comment about it to the condor-users mailing list.
2) Discovered that my Ph.D. dissertation is cited in a paper called "Spin as an additional tool for QGP investigations"
3) Searched for my papers using Google Scholar.
4) Participated together with Avner and Nati in a meeting with the MAGNET committee.
5) Condor week 2006 presentation slides are available here.
6) An interesting web site with lots of material: MSc in e-Science Web Site
7) A book to read: Grid Computing: The Savvy Manager's Guide
8) The new edition of Physica Plus was published today. Strangely, the opening article, about R. P. Feynman, was written by Yuval Neeman, who died yesterday at age 81; his funeral took place earlier today.

Wednesday, April 26, 2006

Things I did today

1) "Condorize" application for Itzhak Orion: delivered ~100 jobs at the BGU Condor pool.

What worries me is the long tail of a few jobs that never end. They go back and forth between the computing nodes. Below is the load of today's test:

2) "Condorize"application for Chen Keasar. Part 1, successfully run the Java application on my notebook, it consumes lots of CPU. Next step will be to run it under "Personal Condor" and then to install everything in the big pool.
3) Downloaded and read about Fluka.
4) Made a conference call about allowing Nova application to run at the BGU Condor pool.
5) Took a look at "condor_status -xml", then downloaded the missing classads.dtd and tried to see if XMLSpy Home Edition can make the very long output more readable. I need to learn more about parsing XML (see the quick sketch below).
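
A quick and dirty way to make that XML readable from the command line, assuming libxml2's xmllint is installed, is simply to pretty-print it:

condor_status -xml | xmllint --format -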

Wednesday, April 19, 2006

Ganglia @ home


Ganglia is now installed on my Linux nodes at home.
You can see it live by visiting my website tel-zur.org.
A screenshot is on the left. On two of the nodes (FC 5 and Scientific Linux) installing the gmond rpm was enough, but on the third node (SuSE 10) I had to compile from source.
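
In case I need to repeat this, the two flavours boil down to something like the following sketch (the package and tarball names are examples, not necessarily the exact versions I used):

# rpm -ivh ganglia-gmond-3.0.3-1.i386.rpm
# service gmond start

and on the SuSE node, from the source tarball:

# tar xzf ganglia-3.0.3.tar.gz
# cd ganglia-3.0.3
# ./configure
# make
# make install
# gmond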

Monday, March 20, 2006

Password-less ssh connection

Follow these steps:
1) mkdir ~/.ssh
2) cd .ssh
3) ssh-keygen -t dsa (no passphrase, just press Enter twice)
4) cat id_dsa.pub >> authorized_keys2
5) Repeat step #4 for all the public keys of the desired nodes (a sketch for copying a key to a remote node appears after this list).
6) chmod 600 ./authorized_keys2
* Enjoy
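
For step #5, pushing the local public key to another node can be done in one line (grid9.bgu.ac.il is just an example host):

cat ~/.ssh/id_dsa.pub | ssh grid9.bgu.ac.il 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys2 && chmod 600 ~/.ssh/authorized_keys2'
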
Testing the network connection between the BGU and my new collaborators in Singapore

Monday, February 13, 2006

Condor is now installed on the Grid nodes

Condor version and platform:
$CondorVersion: 6.6.10 Jun 13 2005 $
$CondorPlatform: I386-LINUX_RH9 $


Condor is now installed on Grid4, 5, 6, 7 and 9. All the nodes are both compute and submit nodes; Grid9 is the Central Manager.
Grid8 is down due to a hard disk failure.

[condor@grid4 condor]$ condor_status

Name OpSys Arch State Activity LoadAv Mem ActvtyTime

vm1@grid4.bgu LINUX INTEL Owner Idle 0.000 501 [?????]
vm2@grid4.bgu LINUX INTEL Unclaimed Idle 0.000 501 0+00:39:33
vm1@grid5.bgu LINUX INTEL Owner Idle 0.000 249 [?????]
vm2@grid5.bgu LINUX INTEL Owner Idle 0.000 249 [?????]
vm1@grid6.bgu LINUX INTEL Owner Idle 0.000 501 [?????]
vm2@grid6.bgu LINUX INTEL Owner Idle 0.000 501 [?????]
vm1@grid7.bgu LINUX INTEL Owner Idle 0.000 501 [?????]
vm2@grid7.bgu LINUX INTEL Owner Idle 0.000 501 [?????]
vm1@grid9.bgu LINUX INTEL Owner Idle 0.000 501 0+00:15:09
vm2@grid9.bgu LINUX INTEL Unclaimed Idle 0.000 501 0+00:00:05

Machines Owner Claimed Unclaimed Matched Preempting

INTEL/LINUX 10 8 0 2 0 0

Total 10 8 0 2 0 0


Condor_view is available here.

The Grid computers specifications

Grid4: Processor: Dual AMD Athlon(tm) MP 2000+, 1666MHz, cache: 256 KB, RAM: 1GB, HD: 40GB, OS: Scientific Linux 3.03, Kernel: 2.4.21-20.ELsmp

Grid5: Processor: Intel(R) Pentium(R) 4 CPU 3.00GHz, cache: 1MB, HD:80GB, OS: Scientific Linux, Kernel 2.4.21-37.ELsmp.

Grid6: Processor: Dual AMD Athlon(tm) MP 1900+, 1600MHz, cache: 256 KB, RAM: 1GB, HD: 40GB, OS: Scientific Linux, Kernel: 2.4.21-20.ELsmp

Grid7: Processor: Dual AMD Athlon(tm) MP 2000+, 1666MHz, cache: 256 KB, RAM: 1GB, HD: 40GB, OS: Scientific Linux 3.03, Kernel: 2.4.21-20.ELsmp

Grid8: Down!!!

Grid9: Dual AMD Athlon(tm) MP 2000+, 1666MHz, cache: 256 KB, RAM: 1GB, HD: 40GB, OS: Scientific Linux 3.03, Kernel: 2.4.21-20.ELsmp

A New Condor Application at the BGU

Collaboration with Chen Keasar
In the forthcoming weeks I will try to "Condorize" his computer code, Meshi.

BGU grid computers maintenance

Welcome to my Grid Computing and other stuff blog!
Your comments will be most appreciated.

Today's activities
1) Set up ssh access without a password between the nodes.
2) Installed Condor 6.6.10 on grid8.bgu.ac.il as the central manager:
under /usr/local
[root@grid8 local]# gzip -d condor-6.6.10-linux-x86-glibc23-dynamic.tar.gz
[root@grid8 local]# tar xvf ./condor-6.6.10-linux-x86-glibc23-dynamic.tar
[root@grid8 local]# hostname grid8.bgu.ac.il
[root@grid8 local]# cd condor-6.6.10
[root@grid8 condor-6.6.10]# ./condor_install
My answers to the Condor installer:
Full installation.
Multiple machines.
Machines do not share files via a file server.
There is no release dir yet.
Installation dir: /usr/local/condor
Create that directory.
Notify by Email to: tel-zur@ee.bgu.ac.il
Mail path: /bin/mail
Are all the machines from domain "bgu.ac.il"? - Yes.
Unique UID - No.
Enable Java support: Yes
Java exists under: /usr/bin/java
Create links to other directories: Yes
"bin" will go to /usr/local/bin
Full name of the central manager: grid8.bgu.ac.il (this node)
Condor directories will go to: /home/condor
Local config file:
Creating config files in "/home/condor" ... done.

Configuring global condor config file ... done.
Created /usr/local/condor/etc/condor_config.

Pool name: "BGU grid"
Should I put in a soft link from /home/condor/condor_config to
/usr/local/condor/etc/condor_config [yes] yes

As "root" start up Condor: /usr/local/condor/etc/examples/condor.boot start
Unfortunately, Condor did not start.
It seems that the HD has a problem; here are a few lines from /var/log/messages:
Feb 13 09:58:58 grid8 kernel: end_request: I/O error, dev 03:02 (hda), sector 591512
Feb 13 10:04:02 grid8 kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Feb 13 10:04:02 grid8 kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=800358, sector=591512

The hard disk will be replaced, and I lost a working hour :(

The Condor central manager installation will be repeated on Grid9.