If
we know the primary sequence of a protein, can
we predict its three-dimensional structure by
computational approaches? This is one of the most
important and difficult problems in computational
molecular biology and has tremendous implications
for the field of proteomics.
In
this talk, I will present two efficient algorithms
to solve two key problems of protein structure
prediction: backbone prediction by protein threading
technique and side-chain prediction. Both problems
are NP-hard and hard to approximate.
First,
I will present a linear integer programming approach
to the protein threading problem. With a theoretically-grounded
formulation, this approach will lead to an amazing
result: almost all the relaxed linear program
instances will produce integral solutions directly,
without using branch-and-bound. This result indicates
that the protein threading problem is tractable
empirically although it is hard theoretically.
Next, I will describe a novel approach to the
protein side-chain packing problem, based on the
tree-decomposition of a protein structure. This
algorithm not only runs very efficiently in practice,
but also yields a polynomial-time approximation
scheme for some types of objective functions.
I
have implemented the proposed algorithms combined
with several other components as two protein structure
prediction programs RAPTOR (for backbone prediction)
and SCATD (for side-chain prediction). RAPTOR
ranks first among all the programs of its category
in CASP5 and third in CASP6, the most recognized
competition in the area of protein structure prediction.
Without losing any prediction accuracy, SCATD
achieves an average speed five times faster than
SCWRL 3.0, a widely-used side-chain packing program,
and runs up to 90 times faster on large proteins.