ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

A Low-Cost Named Entity Recognition Research Based on Active Learning (2018)

Huang, Han ; Wang, Hongyu ; Jin, Dawei

Hindawi

In: Scientific Programming. 2018; 2018: 1-10. Published 2018 Dec 18. doi: 10.1155/2018/1890683.

Details

Publication Date: 2018-12-18

Description: Named entity recognition (NER) is an essential part of many natural language processing technologies, such as information extraction, information retrieval, and intelligent Q&A. This paper describes the development of the AL-CRF model, a NER approach based on active learning (AL). The AL-CRF model proceeds as follows. First, the samples are clustered using the k-means approach. Then, stratified sampling is performed on the resulting clusters to obtain initial samples, which are used to train the basic conditional random field (CRF) classifier. Next, a selection process based on the entropy criterion begins: the samples with the highest entropy values are added to the training set. The CRF classifier is then retrained on the enlarged training set. Learning and selection are repeated iteratively until the harmonic mean F stabilizes and the final NER model is obtained. Several NER experiments are performed on legislative and medical cases to validate the AL-CRF performance. The testing data include Chinese judicial documents and Chinese electronic medical records (EMRs). Testing indicates that our proposed algorithm has better recognition accuracy and recall rate than the conventional CRF model. Moreover, the main advantage of our approach is that it requires fewer manually labelled training samples while being more effective, which can result in a more cost-effective and more reliable process.
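
The entropy-based selection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how entropy is computed for a whole sequence, so scoring a sample by the mean entropy of its per-token label distributions is an assumption, and the names `token_entropy`, `sample_entropy`, `select_most_uncertain`, and the toy `pool` are hypothetical.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def sample_entropy(token_distributions):
    """Mean per-token entropy of one sample (assumed scoring rule)."""
    total = sum(token_entropy(d) for d in token_distributions)
    return total / len(token_distributions)

def select_most_uncertain(pool, k):
    """Return the ids of the k unlabelled samples with the highest entropy."""
    ranked = sorted(pool, key=lambda sid: sample_entropy(pool[sid]), reverse=True)
    return ranked[:k]

# Toy pool: sample id -> per-token label distributions, as a trained
# classifier might report them (values invented for illustration).
pool = {
    "s1": [[0.98, 0.01, 0.01], [0.90, 0.05, 0.05]],  # confident
    "s2": [[0.34, 0.33, 0.33], [0.50, 0.25, 0.25]],  # very uncertain
    "s3": [[0.70, 0.20, 0.10], [0.60, 0.30, 0.10]],  # moderately uncertain
}

# These samples would be labelled manually and added to the training set
# before the classifier is retrained.
selected = select_most_uncertain(pool, 2)
print(selected)
```

In a full loop, the selected samples would be labelled, merged into the training set, and the classifier retrained until the F score stops improving.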

Print ISSN: 1058-9244

Electronic ISSN: 1875-919X

Topics: Computer Science, Media Resources and Communication Sciences, Journalism

Published by Hindawi

2

HPGraph: High-Performance Graph Analytics with Productivity on the GPU (2018)

Yang, Haoduo ; Su, Huayou ; Lan, Qiang ; [et al.]

Hindawi

In: Scientific Programming. 2018; 2018: 1-11. Published 2018 Dec 11. doi: 10.1155/2018/9340697.

Details

Publication Date: 2018-12-11

Description: The growing use of graphs in many fields has sparked broad interest in developing high-level graph analytics programs. Existing GPU implementations deliver limited performance while also compromising on productivity. HPGraph, our high-performance bulk-synchronous graph analytics framework for the GPU, provides an abstraction that maps vertex programs to generalized sparse matrix operations, with the GPU as the backend. HPGraph strikes a balance between performance and productivity by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that lets users implement various graph algorithms with relatively little effort. We evaluate the performance of HPGraph on four graph primitives (BFS, SSSP, PageRank, and TC). Our experiments show that HPGraph matches or even exceeds the performance of high-performance GPU graph libraries such as MapGraph, nvGraph, and Gunrock. HPGraph also runs significantly faster than advanced CPU graph libraries.
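
The idea of expressing a vertex program as a sparse matrix operation can be sketched on the CPU in plain Python. This is only an illustration of the general technique, not HPGraph's API or its GPU kernels: each BFS step is treated as a boolean sparse matrix-vector product between the adjacency structure and the frontier, followed by a mask of visited vertices. The function name `bfs_levels` and the toy `graph` are hypothetical.

```python
def bfs_levels(adj, source):
    """BFS in which each step acts as a boolean sparse matrix-vector product.

    adj maps each vertex to the list of its out-neighbours, i.e. the
    non-zero columns of that vertex's row in the adjacency matrix.
    """
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # "Multiply": OR together the rows selected by the frontier vector,
        # then mask out vertices that already have a level.
        next_frontier = set()
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in levels:
                    next_frontier.add(v)
        for v in next_frontier:
            levels[v] = depth
        frontier = next_frontier
    return levels

# Small directed graph with edges 0->1, 0->2, 1->3, 2->3, 3->4.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
levels = bfs_levels(graph, 0)
print(levels)
```

On a GPU the inner loops would be replaced by a parallel sparse kernel; the sketch only shows why a frontier expansion fits the matrix-vector form.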

Print ISSN: 1058-9244

Electronic ISSN: 1875-919X

Topics: Computer Science, Media Resources and Communication Sciences, Journalism

Published by Hindawi

3

Multiobjective Glowworm Swarm Optimization-Based Dynamic Replication Algorithm for Real-Time Distributed Databases (2018)

Thalij, Saadi Hamad ; Hakkoymaz, Veli

Hindawi

In: Scientific Programming. 2018; 2018: 1-16. Published 2018 Dec 04. doi: 10.1155/2018/2724692.

Details

Publication Date: 2018-12-04

Description: Distributed systems offer geographically distributed access to resources for the large-scale data requests of different users. In many cases, replicating vital data files and storing the replicas in multiple locations accessible to the requesting clients is essential for improving data availability, reliability, and security and for reducing execution time. It is important that real-time distributed databases maintain the consistency constraints and also guarantee the time constraints required by client requests. However, as the size of the distributed system increases, the user access time also tends to increase, which in turn increases the importance of replica placement. Thus, the primary issues that emerge are deciding upon an optimal replication number and identifying perfect locations to store the replicated data. This study addresses these open challenges by developing a dynamic data replication algorithm for real-time distributed databases using a multiobjective glowworm swarm optimization (MGSO) strategy. The proposed algorithm adapts to the random patterns of the read-write requests and employs a dynamic window mechanism for replication. It also models the replica number and placement problem as a multiobjective optimization problem and uses MGSO to solve it. Cost models are presented to ensure that time constraints are satisfied when servicing user requests. The performance of the MGSO dynamic data replication algorithm has been studied using competitive analysis, and the results show the efficiency of the proposed algorithm for distributed databases.

Print ISSN: 1058-9244

Electronic ISSN: 1875-919X

Topics: Computer Science, Media Resources and Communication Sciences, Journalism

Published by Hindawi

4

Improving I/O Efficiency in Hadoop-Based Massive Data Analysis Programs (2018)

Lee, Kyong-Ha ; Kang, Woo Lam ; Suh, Young-Kyoon

Hindawi

In: Scientific Programming. 2018; 2018: 1-9. Published 2018 Dec 02. doi: 10.1155/2018/2682085.

Details

Publication Date: 2018-12-02

Description: Apache Hadoop has been a popular parallel processing tool in the era of big data. While practitioners have rewritten many conventional analysis algorithms to tailor them to Hadoop, the issue of inefficient I/O in Hadoop-based programs has been repeatedly reported in the literature. In this article, we address the problem of I/O inefficiency in Hadoop-based massive data analysis by introducing our efficient modification of Hadoop. We first incorporate a columnar data layout into the conventional Hadoop framework, without any modification of the Hadoop internals. We also provide Hadoop with indexing capability to save a large amount of I/O when processing not only selection predicates but also star-join queries, which are often used in analysis tasks.

Print ISSN: 1058-9244

Electronic ISSN: 1875-919X

Topics: Computer Science, Media Resources and Communication Sciences, Journalism

Published by Hindawi

5

A 64-Line Lidar-Based Road Obstacle Sensing Algorithm for Intelligent Vehicles (2018)

Wang, Hai ; Lou, Xinyu ; Cai, Yingfeng ; [et al.]

Hindawi

In: Scientific Programming. 2018; 2018: 1-7. Published 2018 Nov 21. doi: 10.1155/2018/6385104.

Details

Publication Date: 2018-11-21

Description: An object detection and classification algorithm based on a 64-line lidar sensor is proposed that is both effective and capable of running in real time. First, a multifeature, multilayer lidar point map is used to separate the road, obstacles, and suspended objects. Then, obstacle grids are clustered by a grid-clustering algorithm with a dynamic distance threshold. After that, the clustering results are corrected by combining the motion state information of two adjacent frames. Finally, an SVM classifier is used to classify obstacles based on the position and attitude features of the clustered objects. Experiments demonstrate the accuracy and real-time performance of the algorithm and show that it can meet the real-time requirements of intelligent vehicles.

Print ISSN: 1058-9244

Electronic ISSN: 1875-919X

Topics: Computer Science, Media Resources and Communication Sciences, Journalism

Published by Hindawi

6

SDN Programming for Heterogeneous Switches with Flow Table Pipelining (2018)

Wang, Junchang ; Cheng, Shaojin ; Fu, Xiong

Hindawi

In: Scientific Programming. 2018; 2018: 1-13. Published 2018 Nov 21. doi: 10.1155/2018/2848232.

Details

Publication Date: 2018-11-21

Description: High-level programming is one of the critical building blocks for the effective use of software-defined networking (SDN). Existing solutions, however, either (1) cannot utilize state-of-the-art switches with flow table pipelining, a key technique to prevent flow rule set explosion, or (2) force programmers to manually organize and manage hardware flow table pipelines, which is time-consuming and error-prone. This paper presents a high-level SDN programming framework to address these issues. The framework can automatically (1) generate rule sets for heterogeneous switches with different flow table pipelining designs and (2) update installed rules when the network state changes. As a result, the framework can not only generate efficient rule sets for switches but also provide programmers with a centralized, intuitive, and hence easy-to-use programming API. Experiments show that the framework can generate compact rule sets that are 29–116 times smaller than those generated by other open-source SDN controllers. In addition, the framework recovers from network link failures 5 times faster than other controllers.

Print ISSN: 1058-9244

Electronic ISSN: 1875-919X

Topics: Computer Science, Media Resources and Communication Sciences, Journalism

Published by Hindawi

7

Improving POI Recommendation via Dynamic Tensor Completion (2018)

Liao, Jinzhi ; Tang, Jiuyang ; Zhao, Xiang ; [et al.]

Hindawi

In: Scientific Programming. 2018; 2018: 1-11. Published 2018 Nov 13. doi: 10.1155/2018/3907804.

Details

Publication Date: 2018-11-13

Description: Point-of-interest (POI) recommendation is important in various real-life applications, especially in location-based services such as check-in social networks. In this paper, we propose to solve POI recommendation through a novel dynamic tensor model, which is among the first of its kind. To deliver timely recommendations, we predict POIs using a fast low-rank tensor completion algorithm. In particular, the dynamic tensor structure is complemented by the fast low-rank tensor completion algorithm to achieve better prediction performance, with parameter optimization performed by a pigeon-inspired heuristic algorithm. In short, our dynamic tensor method can exploit the intrinsic characteristics of check-in data because multimode features such as current categories, subsequent categories, temporal information, and seasonal variations are all integrated into the model. Extensive experimental results not only validate the superiority of our proposed method but also indicate its application prospects in large-scale, real-time POI recommendation environments.

Print ISSN: 1058-9244

Electronic ISSN: 1875-919X

Topics: Computer Science, Media Resources and Communication Sciences, Journalism

Published by Hindawi

8

Scientific Programming Techniques and Algorithms for Data-Intensive Engineering Environments (2018)

Alor-Hernández, Giner ; Mejía-Miranda, Jezreel ; Álvarez-Rodríguez, José María

Hindawi

In: Scientific Programming. 2018; 2018: 1-3. Published 2018 Nov 05. doi: 10.1155/2018/1351239.

Details

Publication Date: 2018-11-05

Print ISSN: 1058-9244

Electronic ISSN: 1875-919X

Topics: Computer Science, Media Resources and Communication Sciences, Journalism

Published by Hindawi

9

Feature Reduction Based on Hybrid Efficient Weighted Gene Genetic Algorithms with Artificial Neural Network for Machine Learning Problems in the Big Data (2018)

Mohammed, Tareq Abed ; Alhayali, Shaymaa ; Bayat, Oguz ; [et al.]

Hindawi

In: Scientific Programming. 2018; 2018: 1-10. Published 2018 Oct 30. doi: 10.1155/2018/2691759.

Details

Publication Date: 2018-10-30

Description: A large amount of data is being generated from different sources, and analyzing and extracting useful information from these data has become a very complex task. The difficulty of dealing with big data arises from many factors, such as the high number of features, the existence of missing data, and the variety of data. One of the most effective ways to cope with the huge volume of big data is feature reduction. In this paper, a set of hybrid and efficient algorithms is proposed to classify datasets with a large number of features by combining genetic algorithms with artificial neural networks. The genetic algorithms are used as a preliminary step to significantly reduce the feature size of the analyzed data before that data is handled by machine learning techniques. Reducing the number of features simplifies the task of classifying the analyzed data and enhances the performance of the machine learning algorithms used to extract valuable information from big data. The proposed algorithms use a new gene-weight mechanism that can significantly enhance performance and decrease the required search time. The proposed algorithms are applied to different datasets to pick the most relevant and important features before the artificial neural network algorithm is applied, and the results show that our proposed algorithms can effectively enhance classification performance on the tested datasets.

Print ISSN: 1058-9244

Electronic ISSN: 1875-919X

Topics: Computer Science, Media Resources and Communication Sciences, Journalism

Published by Hindawi

10

Survey of Scientific Programming Techniques for the Management of Data-Intensive Engineering Environments (2018)

Álvarez-Rodríguez, Jose María ; Alor-Hernández, Giner ; Mejía-Miranda, Jezreel

Hindawi

In: Scientific Programming. 2018; 2018: 1-21. Published 2018 Oct 30. doi: 10.1155/2018/8467413.

Details

Publication Date: 2018-10-30

Description: The present paper introduces and reviews existing technology and research in the field of scientific programming methods and techniques for data-intensive engineering environments. More specifically, this survey aims to collect the relevant approaches that have faced the challenge of delivering more advanced and intelligent methods that take advantage of existing large datasets. Although existing tools and techniques have demonstrated their ability to manage complex engineering processes for the development and operation of safety-critical systems, there is an emerging need to know how existing computational science methods will behave when managing large amounts of data. For this reason, the authors review existing open issues in the context of engineering, with a special focus on scientific programming techniques and hybrid approaches. A total of 1193 journal papers were found to be representative of these areas; 935 were screened, and 122 were finally reviewed in full. Afterwards, a comprehensive mapping between techniques and engineering and nonengineering domains was conducted to classify and perform a meta-analysis of the current state of the art. As the main result of this work, a set of 10 challenges for future data-intensive engineering environments has been outlined.

Print ISSN: 1058-9244

Electronic ISSN: 1875-919X

Topics: Computer Science, Media Resources and Communication Sciences, Journalism

Published by Hindawi
