Tags: boinc
Protein folding is no bologna
December 7th, 2009
When we hear the word protein, most of us think "lunch!", but proteins within the body affect your health, and the health of many disease cells you may be harboring. The H1N1 virus (like the flu virus in any given year) is essentially just a few minor protein changes away from the flu of years past.
Indeed, proteins are found on the surface of viruses such as HIV, and on cancer cells. Viruses and cells can be identified by their protein wrappers, and this makes them viable targets for treatments and vaccines. But it all hinges on the ability to determine the shape of the protein, and therefore the shape your treatment must be. If your treatment successfully links in to the disease, then you have changed the disease protein and quite possibly are preventing it from replicating, or from conveying the message it used to. So, if you can bind to a protein on HIV, you modify what it used to do, hopefully in a way that is no longer harmful to the body. So the ability to predict the shape of a protein (and RNA such as in the image) is key to attacking these viruses and diseases.
So why are there no HIV vaccines? With only 20 amino acids used to build all proteins, it would seem that the problem of predicting structure should be fairly straightforward. It is anything but.
An overview of the process is described in this video lecture by Dr. David Baker at the University of Washington. Proteins are an assembly of the 20 amino acids (AAs), and as each joins to the next they can join at various torsion angles, in any dimension. The number of angles varies for different AAs but can be conservatively estimated at 3. So the connection from one AA to the next can occur in any of 3 different directions.
Dr. Baker uses an energy function rooted in the physical properties of the atoms that make up the AAs to test any given orientation and score its likelihood of being the correct structure (i.e. the same as the structure you would find in nature). No problem! Just run your energy function against all 3 orientations and see which scores the best! ...if only it were that simple.
You see, you can't just look at two adjacent AAs and get a picture of the total structure. As the chain folds around and back on itself, various portions of it will interact with one another. In fact, some of the possible conformations would violate the law of physics that at any given time, only one object can occupy a given space.
And so the total number of possible conformations to consider grows exponentially with the length of the protein (i.e. the number of AAs). Each AA multiplies (in our simplified estimate) the number of conformations by another 3. No problem, just have a supercomputer run through each one and see which has the best energy score!
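To make the growth concrete, here is a minimal sketch in Python. It is a toy, not Dr. Baker's method: it assumes a chain on a flat square grid where each junction can turn right, go straight, or turn left (standing in for our 3 torsion angles), and it only checks the "one object per space" rule rather than scoring any energy. The names trace_chain and count_conformations are made up for this illustration.

```python
from itertools import product

# Unit steps on a square grid, listed in counter-clockwise order.
DIRECTIONS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def trace_chain(turns):
    """Return the grid coordinates of a chain built from relative turns.

    Each turn is -1 (right), 0 (straight) or +1 (left): the three
    hypothetical 'torsion angles' of this toy model.
    """
    heading = 0
    x, y = 0, 0
    coords = [(x, y)]
    for turn in turns:
        heading = (heading + turn) % 4
        dx, dy = DIRECTIONS[heading]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def count_conformations(n_junctions):
    """Count every conformation, and those where no two AAs share a site."""
    total = 0
    clash_free = 0
    for turns in product((-1, 0, 1), repeat=n_junctions):
        total += 1
        coords = trace_chain(turns)
        # A clash means some grid site appears more than once in the chain.
        if len(set(coords)) == len(coords):
            clash_free += 1
    return total, clash_free


for n in (2, 4, 6, 8, 10):
    total, clash_free = count_conformations(n)
    print(f"{n:2d} junctions: {total:6d} conformations, {clash_free:6d} clash-free")
```

Even in this toy, throwing out the physically impossible chains trims the count only modestly; the total still triples with every junction added.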
Proteins vary dramatically in size, but are often over 100 amino acids long. And so with our estimate of 3 torsion angles at each junction the search space is 3 to the 100th power. That is roughly a 5 with 47 zeros after it. If our hypothetical supercomputer takes just a millisecond to run through the energy calculations for all of the atoms in one model, considering every combination of torsion angles will still take far more time than the universe has been in existence: on the order of 10^37 years, against an age of roughly 14 billion. And we haven't yet taken into consideration the orientation of the 100 side chains! And then there is the fact that thousands of proteins would need to be studied, including those that have not been discovered yet. This is one example of the type of problem I was referring to when I said that Gen-x is going to need 10x (the computing power).
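The arithmetic is easy to check yourself. The short Python sketch below uses the same simplified assumptions as above (3 options per junction, 100 junctions, one millisecond per energy evaluation); the age of the universe is taken as roughly 13.8 billion years, and the function name brute_force_years is just an illustrative label.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate figure, assumed for comparison


def brute_force_years(n_junctions, options_per_junction=3, seconds_per_eval=1e-3):
    """Years needed to score every conformation one after another."""
    conformations = options_per_junction ** n_junctions
    return conformations * seconds_per_eval / SECONDS_PER_YEAR


conformations = 3 ** 100
years = brute_force_years(100)

print(f"conformations to score: {conformations:.2e}")
print(f"years at 1 ms each:     {years:.2e}")
print(f"ages of the universe:   {years / AGE_OF_UNIVERSE_YEARS:.2e}")
```

Speeding the machine up a thousandfold, or running a million of them side by side, barely dents a number this size, which is why brute force is off the table.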
A better approach must be found. Some means of avoiding the calculation of each of the possible combinations. Dr. Baker has devised some unique approaches to doing just that. But protein structure prediction is still a very computationally intense problem.
Dr. Baker uses volunteer computing (a form of grid computing where the general public contributes the use of their home computers to help this basic research). By doing so, as he points out in the video, he no longer buys more computers. For which I'm sure the University of Washington is thankful, because Dr. Baker's Rosetta@home project is now using 80,000 machines on a daily basis to continue its basic research. He does this by using open source software developed at the University of California, Berkeley, called the Berkeley Open Infrastructure for Network Computing (BOINC), which is supported by the National Science Foundation.
Your home computer, even when you are "using it" (such as this very moment), is idle over 90% of the time. BOINC gives you an easy way to utilize that valuable resource to further advance science. Your computer can be used to help Rosetta@home and a large array of other research... but more on that in future blogs.
As you can imagine, managing an environment of tens of thousands of machines becomes a challenge as well, and to address that problem researchers can now turn to services such as DeepSci.com to manage such projects.
