Publication Date:
2014-12-13
Description:
Proteins fold into complex three-dimensional shapes. Simplified representations of their shapes are central to rationalise, compare, classify, and interpret protein structures. Traditional methods to abstract protein folding patterns rely on representing their standard secondary structural elements (helices and strands of sheet) using line segments. This results in ignoring a significant proportion of structural information. The motivation of this research is to derive mathematically rigorous and biologically meaningful abstractions of protein folding patterns that maximize the economy of structural description and minimize the loss of structural information. We report on a novel method to describe a protein as a non-overlapping set of parametric three dimensional curves of varying length and complexity. Our approach to this problem is supported by information theory and uses the statistical framework of minimum message length (MML) inference. We demonstrate the effectiveness of our non-linear abstraction to support efficient and effective comparison of protein folding patterns on a large scale.
Print ISSN:
1545-5963
Electronic ISSN:
1557-9964
Topics:
Biology
,
Computer Science
Published by
Institute of Electrical and Electronics Engineers (IEEE)
on behalf of
The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.