Publication Date:
2018-02-07
Description:
This study proposes a new method to determine the functions of an unannotated protein. The proteins and amino acid residues mentioned in biomedical texts associated with an unannotated protein $p$ can be considered characteristic terms for $p$, and they are highly predictive of the potential functions of $p$. Similarly, proteins and amino acid residues mentioned in biomedical texts associated with proteins annotated with a functional category $f$ can be considered characteristic terms of $f$. This paper introduces an information extraction system called IFP_IFC that predicts the functions of an unannotated protein $p$ by representing $p$ and each functional category $f$ as a vector of weights. Each weight reflects the degree of association between a characteristic term and $p$ (or between a characteristic term and $f$). First, IFP_IFC constructs a network whose nodes represent the different functional categories and whose edges represent the interrelationships between the nodes. Then, it determines the functions of $p$ by employing random walks with restarts on this network, with the vector of $p$ acting as the walker. Finally, $p$ is assigned to the functional categories of the nodes in the network that the walker visits most. We evaluated the quality of IFP_IFC by comparing it experimentally with two other systems. Results showed marked improvement.
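The random-walk-with-restarts step described above can be illustrated with a minimal sketch. This is not the IFP_IFC implementation: the function name `random_walk_with_restart`, the restart probability of 0.3, the toy three-category network, and the use of a normalized weight vector as the restart distribution are all assumptions made for illustration, since the abstract does not specify them.

```python
def random_walk_with_restart(adjacency, restart, restart_prob=0.3,
                             tol=1e-12, max_iter=10000):
    """Return steady-state visit scores for each node (hypothetical helper).

    adjacency    -- square list of lists of non-negative edge weights
    restart      -- non-negative weights saying where the walker restarts
    restart_prob -- probability of jumping back to the restart distribution
                    (0.3 is an assumed value, not taken from the paper)
    """
    n = len(adjacency)
    # Column-normalize the adjacency matrix into transition probabilities.
    # A node with no outgoing weight keeps an all-zero column.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    # Normalize the restart weights so they form a probability distribution.
    total = sum(restart)
    r = [x / total for x in restart]

    scores = r[:]
    for _ in range(max_iter):
        # One step: follow an edge with prob. (1 - c), restart with prob. c.
        nxt = [
            (1 - restart_prob)
            * sum(transition[i][j] * scores[j] for j in range(n))
            + restart_prob * r[i]
            for i in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(nxt, scores)) < tol
        scores = nxt
        if converged:
            break
    return scores


# Toy network (assumed): three functional categories linked in a chain 0-1-2.
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
# Assumed restart weights: the protein is associated only with category 0.
restart = [1.0, 0.0, 0.0]

scores = random_walk_with_restart(adjacency, restart)
# Rank categories from most to least visited.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

In this sketch the highest-ranked nodes in `ranking` play the role of the functional categories "visited most by the walker". How many of them are assigned to the protein is a design choice the abstract leaves open.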
Print ISSN:
1545-5963
Electronic ISSN:
1557-9964
Topics:
Biology, Computer Science
Published by
Institute of Electrical and Electronics Engineers (IEEE)
on behalf of
The IEEE Computational Intelligence Society ; The IEEE Computer Society ; The IEEE Control Systems Society ; The IEEE Engineering in Medicine and Biology Society ; The Association for Computing Machinery.