Can we afford more INCITEful spending?
January 29th, 2010
The DOE's INCITE program awarded 1.6 billion CPU hours to 69 researchers. With a $400,000,000 price tag, do you feel taxpayers are making an investment worth the interest on the debt? Taking the government out of the picture makes ten times more computing power available while reducing costs by 90%.
Follow up:
Do you feel the DOE should decide what research is "cutting-edge", and deny resources to others? We all want nano solar cells and advancements in nuclear power. Never mind that more than a decade has passed since the last US nuclear power plant went online [source].
These federally funded supercomputing projects are great... if you are one of the 69 chosen few who are allowed to use the resource in 2010. But what about the rest of the research community? They have to wait until 2011 and complete extensive grant proposals to get their research done. A million of the hours are going to IBM to develop drivers and software for supercomputers with 10 million processors [source, page 3]. Our government already paid IBM millions to develop and build the machine at Argonne National Laboratory; now the machine is being used to help IBM design an even larger monstrosity that only our federal government can afford.
The argument is made that our government must fund programs like this to create American jobs. But when I read the latest jobs newsletter from IBM, it was promoting "hot jobs" in India, China, Brazil, Romania, Argentina, Philippines, Egypt & Vietnam. Did you notice any countries missing from the list? I'm all for a "smarter planet". I just feel there are much smarter ways of achieving it than taxing the working people of the United States to fund a bureaucracy that decides it would be prudent to pay IBM to build a massive computer, and then grants the use of that computer back to IBM to research how to build even larger and more expensive machines.
The INCITE program budget request for fiscal year 2010 is over 400 MILLION dollars [source]. Divide that by the 1.6 billion hours and it comes out to 25 cents per awarded CPU hour. By comparison, Amazon Web Services charges 8.5 cents an hour for CPU time [source], roughly a third of the price. That is for a "cloud" resource, which is not "tightly coupled" the way a supercomputer is, but it sure beats waiting another year and hoping for a time allocation in 2011. And we the people are often willing to use our home computers to help with research as well. When volunteer computing like this is used, the cost drops another tenfold.
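For anyone who wants to check the arithmetic, here it is in a few lines of Python. The budget, hours, and AWS rate are the figures quoted above; applying the "another tenfold" volunteer discount to the AWS rate is my assumption about what that drop is measured against.

```python
# Figures quoted in the post.
INCITE_BUDGET_USD = 400_000_000      # FY2010 budget request ("over 400 MILLION")
INCITE_HOURS = 1_600_000_000         # CPU hours awarded for 2010
AWS_USD_PER_HOUR = 0.085             # quoted Amazon Web Services rate


def cost_per_hour(budget_usd: float, hours: float) -> float:
    """Average dollars spent per awarded CPU hour."""
    return budget_usd / hours


incite_rate = cost_per_hour(INCITE_BUDGET_USD, INCITE_HOURS)
ratio = incite_rate / AWS_USD_PER_HOUR
# Assumption: volunteer computing is roughly 10x cheaper than the AWS rate.
volunteer_rate = AWS_USD_PER_HOUR / 10

print(f"INCITE:    ${incite_rate:.2f} per CPU hour")
print(f"AWS:       ${AWS_USD_PER_HOUR:.3f} per CPU hour ({ratio:.1f}x cheaper)")
print(f"Volunteer: ${volunteer_rate:.4f} per CPU hour (assumed)")
```

Since the budget is "over" 400 million, 25 cents is a floor on the per-hour cost, not a ceiling.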
If each of us took a few minutes and ran one of these research projects on our home computers for just a week, ten times more computing resources would be available to researchers, and we the people could decide which projects are worthy of our support. The University of California at Berkeley has developed the infrastructure that makes it easy for home users to do just that, and DeepSci.com offers a service that makes it easy for researchers to use such volunteer computing resources.
Our country will be paying interest on the 400 million dollars of INCITE funding for the rest of my lifetime. NOT doing the research is not a viable option. Doing research with tenfold more computing power at one tenth of the cost is the smart step needed for progress.
When Blue Waters aren't wet.
December 14th, 2009
Blue Waters is a supercomputer that will be used to tackle research challenges in proteomics, genomics, modeling of macromolecular complex structures, and other problems that simply cannot be handled by any other available computing resource.
Blue Waters is a petascale supercomputer funded by the National Science Foundation and the University of Illinois. It will be under construction until 2011 and located at the University of Illinois at Urbana-Champaign.
The machine is based on IBM's POWER7 processors and will be the first to use a technology IBM calls PERCS (Productive, Easy-to-use, Reliable Computing System). Blue Waters will employ more than 200,000 processor cores and will provide more than 800 terabytes of memory.
A two-hour webinar on December 17 at 2 PM Eastern time will discuss opportunities for scientists to access the resources of the NSF-funded Blue Waters petascale computing system.
A permanent list of collaborators is being formed:
http://www.nigms.nih.gov/bluewaters/input
The next round of proposals is due March 17, 2010.
Scientific areas of interest include, but are not limited to:
> Protein structure prediction from sequence
> Prediction of protein-ligand and protein-protein interactions
> Modeling of macromolecular complex structures by combinations of experimental methods
> Simulation of enzymatic mechanisms and their coupling to macromolecular dynamics
> Analysis and simulation of biological systems of material and information processing
> Genetics of genes, proteins, organisms and populations, and their evolutionary pathways
> Patterns of infectious disease communication and development of resistance
Each year, NSF expects to allocate computing time to 10-12 projects that simply cannot be performed on any other available computing resource. If you want to advance your research before 2011, or are not one of the lucky dozen with access to the 200,000 processors, you might consider tapping into volunteer computing resources and accessing more than 500,000 home computers. Some algorithms, simulations and models can be adapted to this sort of loosely coupled architecture. http://DeepSci.com can help you do just that.
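What does "loosely coupled" mean in practice? The job has to split into independent chunks (work units) that never need to talk to each other while they run. Here is a minimal Python sketch of that idea. The `WorkUnit` structure, the toy scoring function and the chunk size are all hypothetical, and a thread pool stands in for the volunteer machines; this is not BOINC's or DeepSci.com's actual API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkUnit:
    """One independent chunk of a larger job (hypothetical structure)."""
    unit_id: int
    params: tuple[float, ...]


def make_work_units(all_params: list[float], chunk_size: int) -> list[WorkUnit]:
    """Split a parameter sweep into chunks that need no communication with each other."""
    return [
        WorkUnit(i // chunk_size, tuple(all_params[i:i + chunk_size]))
        for i in range(0, len(all_params), chunk_size)
    ]


def run_work_unit(unit: WorkUnit) -> tuple[int, float]:
    """Stand-in for the real model: best (lowest) toy score within this chunk."""
    scores = [(p - 3.7) ** 2 for p in unit.params]  # toy objective, minimum at p = 3.7
    return unit.unit_id, min(scores)


params = [x / 10 for x in range(100)]  # 0.0, 0.1, ..., 9.9
units = make_work_units(params, chunk_size=10)

# Each worker thread plays the role of a volunteer host. Units never exchange
# data while running; only the small results come back to be compared.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(run_work_unit, units))

best_unit = min(results, key=results.get)
print(len(units), "work units; best result came from unit", best_unit)
```

A tightly coupled simulation (say, a fluid model where every cell needs its neighbors' values at every time step) does not split this cleanly, which is why not every code is a good fit for home computers.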
Protein folding is no bologna
December 7th, 2009
When we hear the word protein, most of us think "lunch!", but proteins within the body affect your health, and the health of many disease cells you may be harboring. The H1N1 virus (like the flu virus in any given year) is essentially the flu of years past with just a few minor protein changes.
Follow up:
Indeed, proteins are found on the surface of viruses such as HIV, and on cancer cells. These can be identified by their protein wrappers, which makes them viable targets for treatments and vaccines. But it all hinges on the ability to determine the shape of the protein, and therefore the shape your treatment must be. If your treatment successfully binds to the disease protein, then you have changed that protein and quite possibly are preventing it from replicating, or from conveying the message it used to. So, if you can bind to a protein on an HIV particle, you modify what it used to do, hopefully in a way that is no longer harmful to the body. So the ability to predict the shape of a protein (and of RNA, as in the image) is key to attacking these viruses and diseases.
So why are there no HIV vaccines? With only 20 amino acids used to build all proteins, it would seem that the problem of predicting structure should be fairly straightforward. It is anything but.
An overview of the process is described in this video lecture by Dr. David Baker at the University of Washington. Proteins are assembled from the 20 standard amino acids (AAs), and as each joins to the next, they can join at various torsion angles, in any dimension. The number of angles varies for different AAs but can be conservatively estimated at 3, so the connection from one AA to the next can occur in any of 3 different directions.
Dr. Baker uses an energy function rooted in the physical properties of the atoms that make up the AAs to test any given orientation and score its likelihood of being the correct structure (i.e. the same as the structure you would find in nature). No problem! Just run your energy function against all 3 orientations and see which scores best! ...if only it were that simple.
You see, you can't just look at two adjacent AAs and get a picture of the total structure. As the chain folds around and back on itself, various portions of it will interact with one another. In fact, some of the possible conformations would violate the law of physics that, at any given time, only one object can occupy a given space.
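To make the "one object per space" rule concrete, here is a deliberately tiny Python model. It is my own simplification, not anything from Dr. Baker's lecture: each AA sits on one site of a 2D square grid, and the 3 orientations at each junction are "straight", "left" and "right". Every individual turn is legal, yet a combination of turns can fold the chain back onto itself.

```python
from itertools import product

# Unit steps on a 2D square lattice, in counter-clockwise order: +x, +y, -x, -y.
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]
# At each junction the chain goes Straight, turns Left, or turns Right:
# a stand-in for the post's "3 orientations" per amino acid.
TURNS = {"S": 0, "L": 1, "R": -1}


def fold(moves: str) -> list[tuple[int, int]]:
    """Place residues on the lattice: residue 0 at (0, 0), residue 1 at (1, 0),
    then one more residue per move."""
    coords = [(0, 0), (1, 0)]
    heading = 0  # index into DIRS; the chain starts pointing along +x
    for m in moves:
        heading = (heading + TURNS[m]) % 4
        dx, dy = DIRS[heading]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def is_physical(coords: list[tuple[int, int]]) -> bool:
    """Only one residue may occupy a given lattice site."""
    return len(set(coords)) == len(coords)


print(is_physical(fold("LLR")))  # True
print(is_physical(fold("LLL")))  # False: residue 4 lands back on residue 0

# Checking every combination for a 5-residue chain (3 junctions):
valid = sum(is_physical(fold("".join(m))) for m in product("SLR", repeat=3))
print(valid, "of", 3 ** 3)  # 25 of 27
```

Even in this toy, you can only tell whether a turn is acceptable by looking at the whole chain, not just at its neighbors.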
And so the total number of possible conformations to consider grows exponentially with the length of the protein (i.e. the number of AAs). Each additional AA multiplies the count by (in our simplified estimate) another 3 possible orientations. No problem, just have a supercomputer run through each one and see which has the best energy score!
Proteins vary dramatically in size, but are often over 100 amino acids long. With our estimate of 3 torsion angles at each junction, the search space is 3 to the 100th power, roughly a 5 with 47 zeros after it. If our hypothetical supercomputer takes just a millisecond to run through the energy calculations for all of the atoms in the model, considering every combination of torsion angles would still take more than 10^37 years, vastly longer than the roughly 14 billion years the universe has existed. And we haven't yet taken into consideration the orientations of the 100 side chains! Then there is the fact that thousands of proteins would need to be studied, including those that have not been discovered yet. This is one example of the type of problem I was referring to when I said that Gen-x is going to need 10x (the computing power).
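Here is that back-of-the-envelope arithmetic in Python, using the post's simplified assumptions (3 choices per amino acid, 100 amino acids, one millisecond per energy evaluation). The universe age of about 13.8 billion years is the commonly cited estimate.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year
UNIVERSE_AGE_YEARS = 13.8e9             # approximate age of the universe

length = 100                            # amino acids, per the post's 3**100 estimate
conformations = 3 ** length             # 3 torsion choices at each position
seconds = conformations * 1e-3          # 1 millisecond per energy evaluation
years = seconds / SECONDS_PER_YEAR

print(len(str(conformations)))          # 48 digits: "a 5 with 47 zeros"
print(f"{conformations:.2e}")           # 5.15e+47
print(f"{years:.1e} years")             # 1.6e+37 years
print(f"{years / UNIVERSE_AGE_YEARS:.0e} times the age of the universe")
```

Even a machine a billion times faster would only knock nine zeros off that 10^37.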
A better approach must be found: some means of avoiding the calculation of every possible combination. Dr. Baker has devised some unique approaches to doing just that, but protein structure prediction is still a very computationally intense problem.
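I won't guess at the details of Dr. Baker's own methods here. As a generic illustration of skipping the exhaustive search, here is a Metropolis Monte Carlo search on a toy 2D lattice "HP" model (H = hydrophobic, P = polar), a textbook simplification swapped in purely for illustration. Instead of scoring all 3^18 turn combinations for a 20-residue chain, it changes one junction at a time, rejects conformations where two residues share a site, and remembers the best energy it has seen. The sequence, temperature and step count are arbitrary choices.

```python
import math
import random

DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]   # lattice steps: +x, +y, -x, -y
TURNS = {"S": 0, "L": 1, "R": -1}           # straight / left / right at each junction


def fold(moves):
    """Lattice coordinates for a chain described by its turn at each junction."""
    coords = [(0, 0), (1, 0)]
    heading = 0
    for m in moves:
        heading = (heading + TURNS[m]) % 4
        dx, dy = DIRS[heading]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def energy(seq, coords):
    """Toy HP-model score: -1 for every pair of H residues that touch on the
    lattice without being chain neighbors. None if the chain overlaps itself."""
    if len(set(coords)) != len(coords):
        return None
    site_to_index = {c: i for i, c in enumerate(coords)}
    e = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        for dx, dy in DIRS:
            j = site_to_index.get((x + dx, y + dy))
            if j is not None and j > i + 1 and seq[j] == "H":
                e -= 1  # each contact counted once, from its lower index
    return e


def monte_carlo_fold(seq, steps=20_000, temperature=0.5, seed=0):
    """Metropolis search: change one turn at a time instead of enumerating 3**n."""
    rng = random.Random(seed)
    moves = ["S"] * (len(seq) - 2)          # start from a fully extended chain
    current = energy(seq, fold(moves))
    best, best_moves = current, moves[:]
    for _ in range(steps):
        trial = moves[:]
        k = rng.randrange(len(trial))
        trial[k] = rng.choice([t for t in "SLR" if t != trial[k]])
        e = energy(seq, fold(trial))
        if e is None:
            continue                        # overlapping conformations are rejected
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if e <= current or rng.random() < math.exp((current - e) / temperature):
            moves, current = trial, e
            if current < best:
                best, best_moves = current, moves[:]
    return best, "".join(best_moves)


seq = "HPHPPHHPHPPHPHHPPHPH"   # a 20-residue H/P test sequence
best, path = monte_carlo_fold(seq)
print(best, path)
```

Because the search is randomized, it is not guaranteed to find the true optimum. That is the trade-off: a tiny fraction of the work in exchange for a good answer rather than a proven best one.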
Dr. Baker uses volunteer computing (a form of grid computing where the general public contributes the use of their home computers to help this basic research). By doing so, as he points out in the video, he no longer buys more computers, for which I'm sure the University of Washington is thankful, because Dr. Baker's Rosetta@home project is now using 80,000 machines on a daily basis to continue its basic research. He does this using open source software that originated at the University of California, Berkeley, called the Berkeley Open Infrastructure for Network Computing (BOINC), which is supported by the National Science Foundation.
Your home computer, even when you are "using it" (as you are at this very moment), is idle over 90% of the time. BOINC gives you an easy way to put that valuable resource to work advancing science. Your computer can be used to help Rosetta@home and a large array of other research... but more on that in future posts.
As you can imagine, managing an environment of tens of thousands of machines becomes a challenge as well, and to address that problem researchers can now turn to services such as DeepSci.com to manage such projects.
Gen-x will need 10x (the computing power)
December 5th, 2009
It is my perception that researchers are increasingly adopting computer modeling and simulation in their work. And as communities get more familiar with their respective tools of the day, the next generation of tools will require 10 times more computing power, because those tools will tackle a more diverse set of variables as soon as the available computing power allows it.
Follow up:
I picture, for example, an implantable medical device whose makers used computer modeling to predict that it will not introduce problems with blood clotting; yet to create this model, many assumptions were made and values were fixed. And so I picture the day when the device makers are on the witness stand in a malpractice lawsuit, testifying that they only modeled "the average person" and made no effort to specifically model the patient, or even the broad category of patient, in whom the device was implanted.
Did you model an obese individual? Did you model someone with both hypertension and high cholesterol? Did you model a person who was dehydrated and overexerted on a hot summer day? No, we only ran the one model of the average person. If you turn 3 or 4 previously fixed values into variables, each with even ten possible values, you quickly find yourself with 1,000 times (10 × 10 × 10) or more combinations to review with your model.
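Here is a quick Python sketch of how fast those scenarios multiply. The three patient variables and their ten levels each are purely illustrative assumptions, not clinical ranges; the point is only the product.

```python
import math
from itertools import product

# Hypothetical levels for three inputs an "average person" model would hold fixed.
bmi_levels = [18 + 3 * i for i in range(10)]            # 18, 21, ..., 45
systolic_bp_levels = [100 + 10 * i for i in range(10)]  # mmHg
ldl_levels = [70 + 20 * i for i in range(10)]           # mg/dL

# Every combination is a separate model run.
scenarios = list(product(bmi_levels, systolic_bp_levels, ldl_levels))
print(len(scenarios))                # 1000 model runs instead of 1

# A fourth variable (say, hydration) with ten levels multiplies it again.
print(math.prod([10, 10, 10, 10]))   # 10000
```

If each run of the model takes hours, going from 1 run to 1,000 or 10,000 is exactly where the 10x (and more) computing demand comes from.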
What do you think? Are we on the cusp of a new age in scientific computing?
