Monday, July 20, 2009

Bioinformatics Computer Skills Chapter 1

The first chapter explains that biological information is now being collected far faster than it can be analyzed or understood. Huge databases are being built and stored, but the only thing that can make all this information useful is fast computing.

The first tools discussed are programs for searching and finding DNA sequences, such as BLAST and FASTA, which takes us to a web interface at the National Center for Biotechnology Information (NCBI). The book details what these are and what they mean. I will not go into that because my intention is not to copy the book. I could get into a lot of trouble, and I’m sure the authors know much more about this than I do.

Suffice it to say that the first tool used is a web browser. OpenSolaris comes complete with Firefox, one of the most widely used web browsers and the one provided by most Linux distributions.

The book spells out that to use most of the powerful programs available you will need to know Unix. OpenSolaris is also Unix, and it comes from the very stable Solaris base, which has been around as long as or longer than Linux. The book suggests you will probably need to set up your computer as a dual-boot system because you will want access to software such as word processors, graphics programs and visual programming tools. With OpenSolaris you have a complete office suite, OpenOffice, which is compatible with the Microsoft Office suite. If some tool is only available for Windows, then you can run a VirtualBox virtual machine with Windows for that one piece of software. With VirtualBox there is no need to reboot back and forth between Windows and OpenSolaris. Both operating systems run simultaneously, and you can move files between the two systems or cut and paste data and text from one to the other.

 

Summary of tools and systems from Chapter 1

Web Browser – Firefox, already there in OpenSolaris.

Office Suite – OpenOffice, which is provided with OpenSolaris.

Dual boot – Not needed. Should Windows be absolutely required, run it in a virtual machine with VirtualBox.

 

Thanks and stay tuned for tools and systems for chapter 2.

Bioinformatics Computer Skills

I’m reading the book Bioinformatics Computer Skills by Cynthia Gibas and Per Jambeck. The book is very well written and details how biology is being advanced by computing systems. The authors describe computing tools and systems that researchers can use for all kinds of biological research, simulation and modeling. Everything is presented with examples from the perspective of hardware running Linux. I thought it would be very interesting to go through and see how much of the same is available, or could be made available, with OpenSolaris. So for the next while I’ll be blogging chapter by chapter on doing everything they show with OpenSolaris.

I hope I’ve not bitten off way too much work for myself.

 

Regards 

Parallel Processing with Sun HPC Developer

The other day I thought I’d try out the parallel programming environment in Sun’s HPC Developer. I selected a very popular example from the LLNL website: “Simple Heat Equation Parallel Solution 1” by Blaise Barney, posted here. It requires processes in multiples of 4. The web site goes into more detail about the science behind the code, which is a very interesting read.

Retrieve Sample Code 

Building Parallel Program

Start SunStudio in the Developer VM and create a new project:
File -> New Project
Choose Categories: GridEngine, Projects: Grid Engine C Project
Click Next
Set Project Name: GridEngineCProject_HeatExample and set the Parallel Environment to orte
The project is created with a new C source file named gridEngineMain.c.
Open the project tree so you can see the Source Files, then double-click gridEngineMain.c to open it.
Copy the sample code from Blaise Barney and paste it over all the code in gridEngineMain.c.

Set up the header and library file locations:
Right-click the project name in the Projects tree and select Properties from the menu. This brings up a configuration panel.
Under Categories select C Compiler. Under General, use the browse button to set Include Directories to /opt/SUNWhpc/include.
Under Categories select Linker. Under Libraries, browse to add the dynamic library file /opt/SUNWhpc/lib/libmpi.so.
Click OK

Now in the main SunStudio menu select Run -> Clean and Build Main Project. This should build successfully with exit value 0.

Running the Parallel Program

To make the program run in a parallel environment for the first time we will use mpirun. Open a console and change directory to the executable.

$ cd /export/home/hpcuser/SunStudioProjects/GridEngineCProject_HeatExample/dist/Debug/SunStudio-Solaris-Sparc

List the directory contents:

$ ls

You should see the executable we just built, gridenginecproject_heatexample.

Set the environment variable LD_LIBRARY_PATH so the runtime linker knows where to find the MPI library:

$ LD_LIBRARY_PATH=/opt/SUNWhpc/lib

$ export LD_LIBRARY_PATH
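
The two commands above overwrite whatever LD_LIBRARY_PATH already held. As a hedged alternative, here is a minimal sketch that prepends the MPI directory and keeps the existing value. The variable name MPI_LIB_DIR is only an illustrative helper, not something the Sun tools define.

```shell
# Directory holding libmpi.so, as given in the text.
# MPI_LIB_DIR is an illustrative name, not a variable the Sun tools use.
MPI_LIB_DIR=/opt/SUNWhpc/lib

# Prepend the MPI directory only if it is not already on the path,
# and keep whatever was there before.
case ":${LD_LIBRARY_PATH:-}:" in
  *":$MPI_LIB_DIR:"*) ;;   # already present, nothing to do
  *) LD_LIBRARY_PATH="$MPI_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac
export LD_LIBRARY_PATH

echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

Running it twice in the same shell leaves the path unchanged the second time, which is handy if the lines end up in a login profile.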

Run the MPI example program locally with 4 processes.

$ mpirun -n 4 ./gridenginecproject_heatexample

You should see something similar to this output:

MPI task 3 has started...
MPI task 2 has started...
MPI task 1 has started...
MPI task 0 has started...
Initialized array sum = 1.335708e+14
Sent 4000000 elements to task 1 offset= 4000000
Task 1 mysum = 4.884048e+13
Sent 4000000 elements to task 2 offset= 8000000
Task 2 mysum = 7.983003e+13
Sent 4000000 elements to task 3 offset= 12000000
Task 0 mysum = 1.598859e+13
Task 3 mysum = 1.161867e+14
Sample results:
  0.000000e+00  2.000000e+00  4.000000e+00  6.000000e+00  8.000000e+00
  8.000000e+06  8.000002e+06  8.000004e+06  8.000006e+06  8.000008e+06
  1.600000e+07  1.600000e+07  1.600000e+07  1.600001e+07  1.600001e+07
  2.400000e+07  2.400000e+07  2.400000e+07  2.400001e+07  2.400001e+07
*** Final sum= 2.608458e+14 ***
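
The offsets in the output above are consistent with an even split of the array across the tasks. A minimal sketch of that arithmetic follows, assuming the 4 tasks and 4000000 elements per task reported in the output. The variable names are illustrative and are not taken from the sample code.

```shell
ntasks=4            # number of MPI tasks passed to mpirun -n
chunk=4000000       # elements per task, as reported in the output
total=$((ntasks * chunk))

echo "total elements = $total"

# Each task starts at (task number) x (chunk size).
task=0
while [ "$task" -lt "$ntasks" ]; do
  offset=$((task * chunk))
  echo "task $task handles elements $offset to $((offset + chunk - 1))"
  task=$((task + 1))
done
```

The offsets this prints for tasks 1 to 3 match the "Sent 4000000 elements to task N offset=" lines in the output.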

 

Now we want to run the program on the integrated grid compute nodes. Request the parallel environment orte with 2 slots for a run time of 10 minutes:
$ qrsh -pe orte 2 -l h_rt=00:10:00

This opens a console on the compute grid with the parallel environment set up. There are 2 compute machines.

Now run the MPI example program, starting 4 processes:
$ /opt/SUNWhpc/bin/mpirun -n 4 ./gridenginecproject_heatexample

 

Parallel environments

Grid Engine uses the concept of a parallel environment, which defines how a parallel job should be run. A parallel environment (PE) is used to initialize the cluster for parallel execution of your code. To specify a parallel environment use -pe <pe_name> <num>, where pe_name is one of the defined PEs and num is the requested number of CPUs.

The <num> parameter can be a single integer, a range such as 4-12, or a comma-separated list. With a list, for example -pe dmp4 4,8,12, the scheduler tries to allocate one of the listed counts, so at least 4 CPUs and at most 12.
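
To make the slot request more concrete, here is a simplified model of how a request such as 4,8,12 or 4-12 could be matched against the free slots. It is only a sketch of the idea and not Grid Engine's actual scheduling logic. pick_slots is a hypothetical helper, and it assumes plain numbers and closed ranges (no open-ended forms).

```shell
# pick_slots SPEC AVAILABLE
# Print the largest slot count allowed by SPEC that fits in AVAILABLE,
# or 0 if nothing fits. SPEC is a comma-separated list of numbers
# and/or closed ranges, for example "4,8,12" or "4-12".
pick_slots() {
  spec=$1
  available=$2
  best=0
  old_ifs=$IFS
  IFS=,
  for item in $spec; do
    case $item in
      *-*) lo=${item%-*}; hi=${item#*-} ;;   # range lo-hi
      *)   lo=$item; hi=$item ;;             # single value
    esac
    if [ "$available" -ge "$hi" ]; then
      candidate=$hi
    elif [ "$available" -ge "$lo" ]; then
      candidate=$available
    else
      candidate=0
    fi
    [ "$candidate" -gt "$best" ] && best=$candidate
  done
  IFS=$old_ifs
  echo "$best"
}

pick_slots 4,8,12 10   # list request with 10 free slots
pick_slots 4-12 10     # range request with 10 free slots
```

With 10 free slots the list request settles on 8, while the range request can use all 10.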

 

 

 

Sources:

“Simple Heat Equation Parallel Solution 1” from the LLNL site, written by Blaise Barney
https://computing.llnl.gov/tutorials/parallel_comp/#ExamplesHeat
C Code Example
https://computing.llnl.gov/tutorials/mpi/samples/C/mpi_heat2D.c

The same code written in Fortran, should you like to try it:

https://computing.llnl.gov/tutorials/mpi/samples/Fortran/mpi_array.f

 

 

Friday, July 17, 2009

Using HPC Developer 1.0 with VirtualBox

Sun has recently released a virtual image of a complete development environment for HPC software. This image contains all the basic developer tools along with a multi-node grid system for testing. The image was created to load with VMware Server, Desktop or VMware Fusion. However, the image can also be used with VirtualBox, a completely open source virtual machine system. The image needs a little tweaking to be loaded in VirtualBox, so here are the instructions.

Things you will need

Sun HPC Developer Image.

VirtualBox Software.

Software to mount Sun HPC Developer Image.

Linux and MacOS – Software included in OS.

Windows – Try Virtual CloneDrive. Some easy instructions on how to use it here.

Software to unzip and untar Image file.

Linux and MacOS – Software included in OS.

Windows – Try 7-Zip.

Download the Image

Watch a video about the project location and how to download the image.

or go straight to the download location here.

Unpacking the iso file

The image downloads in iso file format. You can burn this directly to a DVD and use it that way, or mount the iso using the proper tools for your system. The mounted iso will contain these files:

[Screenshot: contents of the mounted iso]

Enter the Images directory

[Screenshot: contents of the Images directory]

Unzip the file with your tool for bzip2 files. Once it is unzipped, untar the resulting tar file. This produces a directory named HPC-Distro-Developer-VMware, which will contain these files:

[Screenshot: unpacked files in HPC-Distro-Developer-VMware]

The file that will be important to remember is the HPC-Distro-Developer-VMware.vmdk (6,248,960 KB). This is the image file which will be loaded in VirtualBox later.

Load Image

By this point I’m going to assume you have downloaded and installed VirtualBox. If not, download the version you need here.

Watch the video on how to load Developer Image:

Part 1

Part 2

Or follow these steps:

File -> Virtual Media Manager
Add the vmdk file: press the Add button and navigate to the location where you unpacked the HPC-Distro-Developer-VMware.vmdk file.

In VirtualBox make a new VM.
Press the New button, or use the menu Machine -> New.
Give it the name you wish, for example: HPC Developer 1.0
Operating System: Solaris
Version: OpenSolaris (64 bit), or OpenSolaris for 32 bit.
Click Next
Memory should be set to at least 1024 MB; more (for example 2048 MB) will make it run better.
Click Next
For the hard disk choose "Use existing hard disk". The Virtual Media Manager will pop up again. Select the HPC-Distro-Developer-VMware.vmdk file and press the Select button.
Click Next
Click Finish

Now, on the left side of the VirtualBox Machine Manager, select the VM you just created and press the Settings button.

Under System settings:
  Motherboard:
    Enable ACPI
    Disable IO APIC
  Processor:
    Enable PAE/NX
  Acceleration:
    Enable VT-x/AMD-V
    Enable Nested Paging

Under Display settings:
  Video:
    Change Video Memory to 18 - 20 MB
    Enable 3D Acceleration

Under Storage settings:
  Hard Disks:
    IDE Controller Type: leave as PIIX4
    Enable Additional Controller and set it to SCSI (LsiLogic)
    Attachments should be Slot SCSI port 0 and Hard Disk HPC-Distro-Developer-VMware.vmdk (Normal)

Under Network settings:
  Adapter 1:
    Enable Network Adapter
    Adapter Type: Intel PRO/1000 MT ...
    Attached to: NAT

Press OK
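
If you prefer the command line, the same settings can be sketched with VBoxManage. This is a hedged sketch, not a tested recipe: it assumes a VirtualBox release whose VBoxManage accepts the options shown (option spellings have changed between releases), the VMDK path is a hypothetical location, and run and configure_vm are illustrative helpers. By default it only prints the commands. The network adapter type is left at its default because the exact model is not spelled out above.

```shell
VM_NAME="HPC Developer 1.0"
# Hypothetical location; point this at your unpacked .vmdk file.
VMDK="$HOME/HPC-Distro-Developer-VMware/HPC-Distro-Developer-VMware.vmdk"

# Print each command; actually run it only when DRY_RUN=0 is set.
run() {
  echo "+ $*"
  [ "${DRY_RUN:-1}" = 1 ] || "$@"
}

configure_vm() {
  # Create and register a 64-bit OpenSolaris guest.
  run VBoxManage createvm --name "$VM_NAME" --ostype OpenSolaris_64 --register
  # System, display and network settings from the lists above.
  run VBoxManage modifyvm "$VM_NAME" --memory 2048 --vram 20 \
      --acpi on --ioapic off --pae on \
      --hwvirtex on --nestedpaging on --accelerate3d on --nic1 nat
  # Additional SCSI (LsiLogic) controller with the image on port 0.
  run VBoxManage storagectl "$VM_NAME" --name "SCSI" --add scsi --controller LSILogic
  run VBoxManage storageattach "$VM_NAME" --storagectl "SCSI" \
      --port 0 --device 0 --type hdd --medium "$VMDK"
}

configure_vm
```

Review the printed commands first, then rerun with DRY_RUN=0 in the environment to apply them.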

You should be able to start the Virtual Machine now.

Once it is up and started, open a console and su (the password is hpcdistro).

Check that you have interfaces:
e1000g1
e1000g1:1
e1000g1:2
e1000g1:3
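
The check can be scripted. Below is a minimal sketch that reads ifconfig -a style text and reports which of the expected interfaces are missing. missing_interfaces is a hypothetical helper, the sample text is made up for illustration, and the match assumes each interface starts a line as "name: ", which is how Solaris ifconfig prints it.

```shell
# Interface names the text expects to see.
EXPECTED="e1000g1 e1000g1:1 e1000g1:2 e1000g1:3"

# Read `ifconfig -a` style text on stdin and print every expected
# interface that has no line starting with "name: ".
missing_interfaces() {
  listing=$(cat)
  for name in $EXPECTED; do
    printf '%s\n' "$listing" | grep "^$name: " >/dev/null || echo "$name"
  done
}

# Made-up sample in which only the physical interface is present.
sample='lo0: flags=(omitted)
e1000g1: flags=(omitted)'
printf '%s\n' "$sample" | missing_interfaces
```

On the VM itself the call would be ifconfig -a | missing_interfaces; empty output means all four interfaces are there.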

If they are not all there, halt the compute zones:
# zoneadm -z node1 halt
# zoneadm -z node2 halt

Edit the zone configuration files node1.xml and node2.xml in /etc/zones.
Search for e1000g0 and change it to e1000g1.
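
The search and replace can also be done with sed instead of an editor. Here is a minimal sketch, where fix_zone_nic is a hypothetical helper that keeps a .bak copy of the file it changes. The demonstration runs on a scratch file containing a made-up line, not on a real zone configuration.

```shell
# Replace the NIC name in one file, keeping a .bak copy of the original.
# A copy plus plain sed is used because not every sed supports -i.
fix_zone_nic() {
  file=$1
  cp "$file" "$file.bak" &&
  sed 's/e1000g0/e1000g1/g' "$file.bak" > "$file"
}

# Demonstration on a scratch file with a made-up line.
demo=$(mktemp)
echo '<network physical="e1000g0"/>' > "$demo"
fix_zone_nic "$demo"
cat "$demo"
rm -f "$demo" "$demo.bak"
```

On the VM, with the zones halted and as root, the helper would be called once for /etc/zones/node1.xml and once for /etc/zones/node2.xml.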

Now boot the zones again
# zoneadm -z node1 boot
# zoneadm -z node2 boot

Give the zones a little time to boot and then check that they are up by doing an ifconfig -a and pinging the addresses of e1000g1:1, e1000g1:2 and e1000g1:3.

Here is another blog page that details some other issues involved in converting VMware images to VirtualBox. http://blogs.sun.com/mf/entry/opensolaris_hpcdistro_1_0_in

I hope this blog page will help you more easily get started developing parallel HPC software on OpenSolaris.