Monday, December 24, 2007
I added the following comment:
"I liked the example and I think this kind of demonstration really helps explain where bottlenecks may occur in parallel code. I would like to add two comments:
1. Solving the matrix and obtaining 0 for the startup time, after already neglecting the bandwidth, means that the problem is totally Embarrassingly Parallel, and this is indeed the case for MC calculations. It would be interesting to repeat the exercise for a communication-intensive application, e.g. the Laplace equation.
2. I think it is worth mentioning the Collective Communication commands, i.e. Bcast and Reduce, in this context. In particular, Reduce with summation can be more efficient than the loop over Recv because it can do the summation out of order, and this may hide some of the waiting/communication time. Another alternative is to use MPI_ANY_SOURCE in the Recv command."
Tuesday, December 18, 2007
Wednesday, December 12, 2007
Doodle - Event scheduling
Moodle - A free open source course management system (CMS)
Schnoodle - A Schnoodle is a Poodle hybrid that is a cross-breed of a Poodle and a Schnauzer
Saturday, December 08, 2007
I searched for a few Buzz Words and then repeated the search with a combination of them.
The number of entries returned is not in accordance with the known laws of arithmetic!
"Grid Computing" 171,000 entries
where "XXX YYY" means with the exact phrase XXX YYY!
virtualization 1,530,000 entries
"Grid Computing" Virtualization 1,300,000 entries
where space between the terms means with all of the words.
virtualization -"grid computing" 2,580,000 entries
where "-" means exclude the term from the search
-virtualization "grid computing" 4,510,000 entries
In order to verify this mystery I repeated the test with two other terms:
Israel 26,600,000 entries
Jerusalem 4,190,000 entries
Jerusalem Israel 1,120,000 entries
jerusalem -israel 2,150,000 entries
-jerusalem israel 45,300,000 entries
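The mismatch can be checked with a little shell arithmetic. This is only a sketch, assuming the counts are meant as set sizes, so that the hits for a term should equal the hits containing both terms plus the hits containing that term without the other. The helper name check_count is made up for this illustration; the numbers are the ones quoted above.

```shell
#!/bin/sh
# Consistency check: for two sets A and B, |A| = |A and B| + |A without B|.

# check_count LABEL TOTAL BOTH ONLY
# Prints the reported total next to the sum of its two disjoint parts.
check_count() {
    label=$1; total=$2; both=$3; only=$4
    sum=$((both + only))
    diff=$((sum - total))
    echo "$label: reported=$total parts=$sum difference=$diff"
}

check_count "virtualization"   1530000  1300000  2580000
check_count "grid computing"    171000  1300000  4510000
check_count "jerusalem"        4190000  1120000  2150000
check_count "israel"          26600000  1120000 45300000
```

A difference of 0 would mean the three counts for a term are consistent; with the figures above, none of the four is.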
To my understanding the situation can be demonstrated as in the following plot:
Does this mean we should make a compromise in East Jerusalem's territory
(have less Jerusalem and get more Israel) ???
Comments are welcome to shed light on this mystery
Note: If you want to reproduce my test take into account:
1. There may be a small change in the number of entries found. This fluctuation is however negligible.
2. There is another small difference in the number of entries if you try capital letters instead of small letters or change the order of the words.
3. Drawing was produced using the free tool Dia.
Wednesday, November 28, 2007
If Grid Computing is so good, yet what we see in practice is complete chaos in the Data Center, then there must exist an Anti-Grid that cancels all its benefits.
If one tries to impose order by splitting the Data Center into two separate sites, then Grid - Anti-Grid pairs are immediately produced out of the vacuum and chaos is restored.
Monday, November 26, 2007
In the table below I mention a few.
Friday, November 23, 2007
1. Verify that the device is recognized in 'dmesg', write down the device name. e.g. /dev/sdb
2. mkdir /disk_on_key
3. mount -t vfat /dev/sdb /disk_on_key
4. at the end un-mount by: umount /disk_on_key
Sunday, November 18, 2007
I installed Globus version gt4.0.4-x86_64_rhas_4-installer on an Opteron node running CentOS 4.4.
output of uname -a is:
2.6.9-55.0.2.ELsmp #1 SMP Tue Jun 26 14:14:47 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
In the globus-gatekeeper.log there is the following error message:
TIME: Thu Oct 25 18:16:50 2007
PID: 14589 -- Notice: 6: globus-gatekeeper pid=14589 starting at Thu Oct 25 18:16:50 2007
TIME: Thu Oct 25 18:16:50 2007
PID: 14589 -- Notice: 6: Got connection xxx.xxx.xxx.xxx at Thu Oct 25 18:16:50 2007
GSS authentication failure
GSS Major Status: General failure
GSS Minor Status Error Chain:
globus_gsi_gssapi: Error during delegation: Delegation protocol violation
Failure: GSS failed Major:000d0000 Minor:00000002 Token:00000000
TIME: Thu Oct 25 18:16:50 2007
PID: 14589 -- Failure: GSS failed Major:000d0000 Minor:00000002 Token:00000000
I would appreciate any help resolving this problem
Then, I received the following reply from Charles Bacon:
That's the sign of a client who disconnected because it didn't trust the gatekeeper. From the gatekeeper's point of view, the disconnect of the client is a violation of protocol. It doesn't indicate anything wrong with your gatekeeper.
Thank you for your reply!
Thursday, November 15, 2007
I keep references to relevant links and documents there.
You are welcome to join the group: Facebook | Software Vulnerability
My page in Facebook
(Note: In order to access the group you first need to log in to your Facebook account)
Sunday, November 11, 2007
The Israel Physical Society (IPS) on-line magazine, PhysicaPlus, is published a few times a year both in English and in Hebrew and is an excellent source for semi-popular Physics articles.
You are invited to visit PhysicaPlus
Sunday, November 04, 2007
Wednesday, October 31, 2007
"EGEE-III will last for 24 months, with a total manpower bid of almost 10,000 person months and an EC budget of Euro 36 million. As with EGEE-II, partners will provide extra effort to the project beyond that funded by the EC, bringing the total project budget up to Euro 70 million, and also contributing a further estimated Euro 50 million worth of computing resources."
and I ask myself what is more justified: a single extended 3 PFLOPS BlueGene/P with 884,736 processors, or thousands of assorted old 32-bit boxes distributed all over the world running Scientific Linux 3.0.X and gLite?
For years we were told that no single data center site would be able to cope with the few PB of data per year that the LHC will produce. Nice PowerPoint presentations showed a 20 km tower of CD-ROMs, higher than Mont Blanc, indicating that only a distributed Grid Computing environment would save us from a catastrophe.
Perhaps it is time to ask, in the spirit of Byron Katie's book Loving What Is, her four questions:
- Is it true?
- Can you absolutely know that it's true?
- How do you react when you believe that thought?
- Who would you be without the thought?
*Inspired by the lyrics from Laurie Anderson's song, SmokeRings: "Que es mas macho, pineapple o knife?"
Sunday, October 21, 2007
We are now four years later and something is not going well with "Grid Computing".
An indication that there is a problem can easily be seen by looking at the "Google Trends" plot for the term "Grid Computing":
This finding can be compared with another buzz word, "Virtualization", which is older than "Grid Computing" and yet is gaining more and more momentum:
There is, however, one exception. The Academic Grid still enjoys lots of glory thanks to the huge, heavily funded European (EGEE) and US projects. When LHC data taking starts at CERN it will reach its peak importance. But it seems that for other scientific projects Grid Computing is not going to be such a success. It will remain "nice to have" but will never replace High-Performance Computing (HPC) on one hand, or classical distributed computing tools such as Condor, which has existed for more than 20 years, on the other.
Once the government funding is removed, all the hype of academic Grid Computing will decline very quickly as well.
As was pointed out in an interesting talk by Fabrizio Gagliardi about the future of grid computing, given at the GridKa07 School, other kinds of Grid Computing infrastructures that stand on stable financial ground may emerge as the successors, for example Amazon's S3 and EC2 and the joint IBM and Google cloud computing initiative.
Thursday, September 13, 2007
Wednesday, September 12, 2007
Upon installing this JDK from SUN there was a strange error message that "run-java-tool is not available". Then, I noticed that /usr/bin/java was pointing to /usr/bin/run-java-tool. I know that it was invented with some good reasons but in order to start playing quickly with Eclipse I just removed this link and put instead:
"ln -s /usr/lib/jvm/sun-jdk-1.6/bin/java /usr/bin/java", then Eclipse was happy and me too!!!
Sunday, June 24, 2007
June 28th, 14:00-16:00 - Grid-HPC Work Group meeting
Location: IGT Offices, Maskit 4, 5th Floor, Hertzliya
14:00-14:15: OPENING - Avner & Guy
DEBUGGING AND OPTIMIZING APPLICATIONS FOR MULTICORE MPP ARCHITECTURES
Jacques Philouze, Vice President Sales & Marketing, Allinea
As two, four and potentially eight-core processors become the norm, the de facto HPC architecture is tending towards large clusters of modest 8-16 core shared-memory servers, potentially with co-processing devices (e.g. GPGPUs, FPGAs, Clearspeed). Programming these machines optimally presents a number of challenges, and applications that use mixed programming models are now becoming commonplace.
In this presentation we will discuss the challenges facing today's HPC application developers, and the need for simple tools that can address mixed programming models. We will present new multicore features of Allinea's Distributed Debugging Tool (DDT) and Optimisation and Profiling Tool (OPT), and discuss our aims to provide a consolidated, scalable, yet intuitive framework for HPC developers.
FastDL - Cluster Computing with IDL
Timely Visualization and Analysis of Large Data Sets Using IDL and Parallel Computing
Arie Rubin M.Sc.E.E.
IIT-Image Information Technologies (Represents ITT VIS, Boulder Colorado USA)
Scientists exploring fluid and particle dynamics, high-energy and plasma physics, astrophysics and space sciences, biophysics, protein folding and medical science are challenged to visualize and analyze increasingly complex data. With FastDL, scientists and developers can run IDL visualization and analysis applications in parallel on cost-effective Linux clusters, significantly shortening the time required to get results. FastDL comprises two independent components that address the varying needs of parallel data analysis and visualization applications: TaskDL and mpiDL. TaskDL allows users to run IDL procedures on multiple machines simultaneously by collecting different remote processors together as a task farm. mpiDL provides a Message Passing Interface (MPI) within IDL for synchronizing and passing data between nodes during program execution.
GRIDL - Grid Computing with IDL: running parallel IDL applications on a set of nodes communicating over WAN.
15:35-15:45: DISCUSSION AND CONCLUDING REMARKS
To register, please send your contact details to: firstname.lastname@example.org
We are looking forward to seeing you!
Grid-HPC WG Director, IGT
Saturday, June 16, 2007
Friday, June 15, 2007
I found the following reference useful when I installed VMware:
How To Install VMware Server On A Fedora 7 Desktop | HowtoForge - Linux Howtos and Tutorials
However, there still was a problem: After defining a new virtual machine I got this error message:
"Unable to change virtual machine power state. The "/usr/lib/vmware/bin/vmware-vmx" process did not start properly...."
Any ideas what to do next????
I think this is because the latest VMware is not ready yet for the advanced F7 and its latest kernel.
What I did was to fall back to CentOS 5.
Wednesday, May 30, 2007
[root@grid02]# more /diskless/centos44_diskless_ver_03/root/etc/rc.local
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
#--------added manually by Guy, 30/5/2007:
mount -t nfs 192.168.1.2:/home /home
2) In order to document the installation procedure, here are the files and directory structure on the server that are relevant to the PXE-boot and TFTP stage:
[root@grid02 ~]# ls /tftpboot/
[root@grid02 ~]# ls /tftpboot/linux-install/
centos44_diskless_ver_01 centos44_diskless_ver_03 pxelinux.0
centos44_diskless_ver_02 msgs pxelinux.cfg
[root@grid02 ~]# ls /tftpboot/linux-install/centos44_diskless_ver_03
[root@grid01 ~]# ls /tftpboot/linux-install/pxelinux.cfg/
C0A80101 C0A80106 C0A80109 C0A8010C C0A8010F C0A80112 default
C0A80103 C0A80107 C0A8010A C0A8010D C0A80110 C0A80113 pxeos.xml
C0A80104 C0A80108 C0A8010B C0A8010E C0A80111 C0A80114
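The C0A801xx names follow the pxelinux convention of naming a client's configuration file after its IPv4 address written as eight upper-case hexadecimal digits. Here is a small sketch of the conversion, assuming the clients sit on the 192.168.1.x network (the same one as the NFS server at 192.168.1.2). The helper name ip_to_pxe_name is hypothetical.

```shell
#!/bin/sh
# ip_to_pxe_name A.B.C.D
# Prints the IPv4 address as 8 upper-case hex digits, the form used for
# the per-client file names in pxelinux.cfg/.
ip_to_pxe_name() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the dotted address into its four octets
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_pxe_name 192.168.1.1    # prints C0A80101
ip_to_pxe_name 192.168.1.20   # prints C0A80114
```

Read this way, the listing covers clients .1 and .3 through .20, plus the "default" file that pxelinux falls back to when no address-specific file matches.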
Typical content of a C0A.... file:
[telzur@grid02 pxelinux.cfg]$ more C0A80101
append initrd=centos44_diskless_ver_03/initrd.img root=/dev/ram0 init=disklessrc NFSROOT=192.168.1.2:/diskless/centos44_diskless_ver_03 ramdisk_size=16254
[telzur@grid02 pxelinux.cfg]$ more pxeos.xml
Monday, May 21, 2007
Enter GRID pass phrase:
Your identity: /C=IL/O=IUCC/OU=BGU/CN=Guy Tel-Zur
Cannot find file or dir: /home/telzur/.glite/vomses
Creating temporary proxy ........................................................ Done
Contacting lcg-voms.cern.ch:15004 [/C=CH/O=CERN/OU=GRID/CN=host/lcg-voms.cern.ch] "dteam" Done
Creating proxy .......................................................... Done
Your proxy is valid until Wed May 23 05:27:43 2007
[telzur@cs-grid4 tests]$ voms-proxy-info
subject : /C=IL/O=IUCC/OU=BGU/CN=Guy Tel-Zur/CN=proxy
issuer : /C=IL/O=IUCC/OU=BGU/CN=Guy Tel-Zur
identity : /C=IL/O=IUCC/OU=BGU/CN=Guy Tel-Zur
type : proxy
strength : 512 bits
path : /tmp/x509up_u33335
timeleft : 11:59:55
[telzur@cs-grid4 tests]$ glite-job-submit hello2.jdl
Selected Virtual Organisation name (from proxy certificate extension): dteam
Connecting to host g01.phy.bg.ac.yu, port 7772
Logging to host g01.phy.bg.ac.yu, port 9002
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use glite-job-status command to check job current status. Your job identifier is:
[telzur@cs-grid4 tests]$ glite-job-status https://g01.phy.bg.ac.yu:9000/ZUMjYMdDBZW4cH9pa9aLyg
Status info for the Job : https://g01.phy.bg.ac.yu:9000/ZUMjYMdDBZW4cH9pa9aLyg
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Submitted: Tue May 22 17:28:18 2007 IDT
[telzur@cs-grid4 tests]$ glite-job-output --dir . https://g01.phy.bg.ac.yu:9000/ZUMjYMdDBZW4cH9pa9aLyg
Retrieving files from host: g01.phy.bg.ac.yu ( for https://g01.phy.bg.ac.yu:9000/ZUMjYMdDBZW4cH9pa9aLyg )
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
have been successfully retrieved and stored in the directory:
[telzur@cs-grid4 tests]$ cd telzur_ZUMjYMdDBZW4cH9pa9aLyg/
[telzur@cs-grid4 telzur_ZUMjYMdDBZW4cH9pa9aLyg]$ ls
[telzur@cs-grid4 telzur_ZUMjYMdDBZW4cH9pa9aLyg]$ more ./hw.err
[telzur@cs-grid4 telzur_ZUMjYMdDBZW4cH9pa9aLyg]$ more ./hw.out
So far it looks GOOD!!!
Tuesday, May 15, 2007
Here is a screenshot showing Windows XP and CentOS 5 running as guest operating systems (click on the image to enlarge):
Sunday, May 13, 2007
Monday, March 12, 2007
Sunday, March 11, 2007
rsync -a -e ssh golden_client:/ /diskless/whatever/root/
Use the following syntax which does not produce error messages:
rsync -v -a -e ssh --exclude='/sys/*' --exclude='/proc/*' golden_client:/ /diskless/whatever/root/
Other comments (which I found critical in my experience):
1) Make sure the following RPM is installed in the client: busybox-anaconda
(In my CentOS4.4 case it is: busybox-anaconda-1.00.rc1-5.x86_64.rpm)
2) Make sure there are no active NFS mounted partitions on the client before executing 'rsync'.
3) Make sure that the following 2 directories exist:
and that in the tftp definition file /etc/xinetd.d/tftp the following line is set correctly:
server_args = -s /tftpboot/linux-install
4) If pxelinux.0 does not exist under /tftpboot/linux-install, copy it:
cp /usr/lib/syslinux/pxelinux.0 /tftpboot/linux-install/.
5) Pay attention to the directory hierarchy:
if /etc/xinetd.d/tftp contains the line shown above, server_args = -s /tftpboot/linux-install
then in /etc/dhcpd.conf the reference to pxelinux.0 should be:
filename "linux-install/pxelinux.0"; or any other variation of it. which will make a failure in booting the diskless nodes.
Friday, March 09, 2007
1) Install the JRE package from SUN (download and then 'chmod u+x' to the *.bin file).
2) cd to /usr/lib/firefox-18.104.22.168/plugins
[root@guydell plugins]# ln -s /usr/java/jre1.5.0_11/plugin/i386/ns7/libjavaplugin_oji.so
[root@guydell plugins]# ls -l
lrwxrwxrwx 1 root root 58 Mar 9 17:16 libjavaplugin_oji.so -> /usr/java/jre1.5.0_11/plugin/i386/ns7/libjavaplugin_oji.so
-rwxr-xr-x 1 root root 14288 Jul 29 2006 libnullplugin.so
-rwxr-xr-x 1 root root 7564 Jul 29 2006 libunixprintplugin.so
Saturday, February 24, 2007
Monday, March 26th, 2007
IGT Offices, Maskit 4, 5th Floor, Hertzliya
14:00-14:15: OPENING - Avner & Guy
14:15-14:50: JPPF, JAVA, OPEN-SOURCE AND GRID: BEYOND THE TRADITIONAL GRID
Speaker: Laurent Cohen, ILOG, Inc. and JPPF founder
Traditional Grid architectures rely on a concept of job submission that comes with a set of constraints regarding the nature of the compute nodes, the ease of deployment of the jobs, and the effort required to use the Grid, which could otherwise be utilized to work on the problems to solve. JPPF offers an alternative, enabling a true heterogeneous nature of the Grid components, an ease of use that permits engineers, developers and scientists to focus on their domain rather than on the grid infrastructure, while retaining the benefits of the Grid technology to solve heavy and complex computational problems.
We will present how the design and architectural choices in JPPF, in terms of programming language, installation, administration, dynamic configuration, updates automation, application code deployment and security policy automation, bring outstanding benefits in the areas of the cost of adoption, level of effort at the operational and organizational levels, as well as the resulting ease of use for the end-users.
15:00-15:35: INTERACTIVE, USER FRIENDLY, PARALLEL COMPUTING ON CLUSTERS
Speaker: Yoel Jacobsen/E&M CTO
Productivity boost brought by Star-P to the teams of MATLAB-skilled domain experts is matched by the economic benefits of getting the most out of the computing power of next generation servers and clusters. Star-P enables interactive workflow for large-scale scientific and engineering computing, eliminating the need for intermediate steps of reprogramming the code in C, Fortran, and MPI, and dramatically shortening the time to insight.
15:35-15:45: DISCUSSION AND CONCLUDING REMARKS
To register, please send your contact details to email@example.com
We are looking forward to seeing you!
Grid-HPC WG Director
Monday, January 22, 2007
There were 4 meetings with an average number of about 20 attendees.
If your company/research group would like to give a presentation in this forum please contact me.
Monday, January 08, 2007
Sunday, January 07, 2007
I was inspired by the first example in the article "Casting Your Net with OpenVPN" by Paul Duncan, published in Linux Magazine.
First I created configuration files at both ends; see the screen dump of the client configuration file:
Then I started OpenVPN on both computers:
and similarly at the remote server:
After establishing the connection, I could connect via my private network:
I did SSH from my client (10.55.55.2) to my server (10.55.55.1):