ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Articles  (1,410)
  • Oxford University Press  (1,410)
  • 2020  (1,410)
  • Computer Science  (1,410)
  • 1
    Publication Date: 2020-10-22
    Description: Events represent a tipping point that affects users’ opinions and varies with their popularity, from local to international. Indeed, social media offer users platforms to express their opinions about and commitments to events that attract them. However, owing to the volume of data, users find it difficult to access their preferred events according to the features stored in their social network profiles. To overcome this limitation, multiple event recommendation systems have appeared. Nevertheless, these systems use a limited number of event dimensions and user features. Besides, they consider user features stored in a single user profile and disregard semantics. In this research, an approach for multi-dimensional event recommendation is set forward to recommend events to users based on several event dimensions (engagement, location, topic, time and popularity) and several user features (demographic data, position and user’s/friends’ interests) stored in multiple user profiles, taking into account the semantic relationships between user features, specifically user interests. The performance of our approach was assessed using error rate measurements (mean absolute error, root mean squared error and cross-validation). Experimental results on real-world event datasets confirmed that our approach recommends events that fit the user better than previous approaches, with the lowest error rate values.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
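The entry above reports accuracy with mean absolute error and root mean squared error. As a minimal, illustrative sketch (not code from the paper), these two error measures can be computed as follows; the toy rating values are invented for the example.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy example: observed vs. predicted user ratings for five events.
observed = [4.0, 3.5, 5.0, 2.0, 4.5]
predicted = [3.8, 3.9, 4.6, 2.4, 4.4]
print(mae(observed, predicted), rmse(observed, predicted))
```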
  • 2
    Publication Date: 2020-04-06
    Description: Although the Central Dogma states the destiny of a gene as ‘DNA makes RNA and RNA makes protein’, nucleic acids not only store and transmit genetic information but also, surprisingly, take part in vital intracellular processes as regulators of gene expression. Bioinformatics has contributed to knowledge of a series of emerging novel nucleic acid molecules. In typical cases, microRNA (miRNA), long noncoding RNA (lncRNA) and circular RNA (circRNA) play crucial roles in regulating vital biological processes, especially in malignant diseases. Owing to the extraordinary heterogeneity among malignancies, hepatocellular carcinoma (HCC) faces enormous limitations in diagnosis and therapy. Mechanistic, diagnostic and therapeutic nucleic acids for HCC that have emerged in the past twenty years are systematically reviewed here. In particular, we have organized recent advances on nucleic acids of HCC into three facets: (i) summarizing diverse nucleic acids and their modifications (miRNA, lncRNA, circRNA, circulating tumor DNA and DNA methylation) acting as potential biomarkers in HCC diagnosis; (ii) describing different patterns of three key noncoding RNAs (miRNA, lncRNA and circRNA) in gene regulation and (iii) outlining the progress of these novel nucleic acids for HCC diagnosis and therapy in clinical trials and discussing their potential for clinical applications. All in all, this review takes a detailed look at the advances of novel nucleic acids over the past 20 years, from their potential as biomarkers and the elaboration of mechanisms to early clinical application.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 3
    Publication Date: 2020-02-11
    Description: Together with various hosts and environments, ubiquitous microbes interact closely with each other forming an intertwined system or community. Of interest, shifts of the relationships between microbes and their hosts or environments are associated with critical diseases and ecological changes. While advances in high-throughput Omics technologies offer a great opportunity for understanding the structures and functions of microbiome, it is still challenging to analyse and interpret the omics data. Specifically, the heterogeneity and diversity of microbial communities, compounded with the large size of the datasets, impose a tremendous challenge to mechanistically elucidate the complex communities. Fortunately, network analyses provide an efficient way to tackle this problem, and several network approaches have been proposed to improve this understanding recently. Here, we systemically illustrate these network theories that have been used in biological and biomedical research. Then, we review existing network modelling methods of microbial studies at multiple layers from metagenomics to metabolomics and further to multi-omics. Lastly, we discuss the limitations of present studies and provide a perspective for further directions in support of the understanding of microbial communities.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 4
    Publication Date: 2020-01-10
    Description: Recent advances in single-cell RNA sequencing (scRNA-seq) enable characterization of transcriptomic profiles with single-cell resolution and circumvent averaging artifacts associated with traditional bulk RNA sequencing (RNA-seq) data. Here, we propose SCDC, a deconvolution method for bulk RNA-seq that leverages cell-type specific gene expression profiles from multiple scRNA-seq reference datasets. SCDC adopts an ENSEMBLE method to integrate deconvolution results from different scRNA-seq datasets that are produced in different laboratories and at different times, implicitly addressing the problem of batch-effect confounding. SCDC is benchmarked against existing methods using both in silico generated pseudo-bulk samples and experimentally mixed cell lines, whose known cell-type compositions serve as ground truths. We show that SCDC outperforms existing methods with improved accuracy of cell-type decomposition under both settings. To illustrate how the ENSEMBLE framework performs in complex tissues under different scenarios, we further apply our method to a human pancreatic islet dataset and a mouse mammary gland dataset. SCDC returns results that are more consistent with experimental designs and that reproduce more significant associations between cell-type proportions and measured phenotypes.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
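SCDC itself integrates multiple scRNA-seq references in an ENSEMBLE framework; the sketch below only illustrates the underlying idea of reference-based bulk deconvolution with non-negative least squares on simulated data. The signature matrix, proportions and noise level are invented for the example and are not part of SCDC.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
signature = rng.gamma(2.0, 1.0, size=(200, 3))            # 200 genes x 3 cell types (simulated reference)
true_props = np.array([0.6, 0.3, 0.1])                     # ground-truth cell-type proportions
bulk = signature @ true_props + rng.normal(0, 0.05, 200)   # simulated bulk RNA-seq sample

coef, _ = nnls(signature, bulk)       # non-negative least squares fit
proportions = coef / coef.sum()       # normalise to proportions
print(np.round(proportions, 2))       # approximately [0.6, 0.3, 0.1]
```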
  • 5
    Publication Date: 2020-02-11
    Description: Type III secretion systems (T3SS) can be found in many pathogenic bacteria, such as Dysentery bacillus, Salmonella typhimurium, Vibrio cholerae and pathogenic Escherichia coli. The routes of infection of these bacteria include the T3SS transferring a large number of type III secreted effectors (T3SE) into host cells, thereby blocking or adjusting the communication channels of the host cells. Therefore, the accurate identification of T3SEs is a precondition for the further study of pathogenic bacteria. In this article, a new T3SE ensemble predictor was developed, which can accurately distinguish T3SEs among unknown proteins. In the course of the experiments, methods and models are rigorously trained and tested. Compared with other methods, EP3 demonstrates better performance, including the absence of overfitting, strong robustness and powerful predictive ability. EP3 (an ensemble predictor that accurately identifies T3SEs) is designed to simplify users’ (especially nonprofessional users’) access to T3SEs for further investigation, which will have a significant impact on understanding the progression of pathogenic bacterial infections. Based on the proposed integrated model, a web server has been established to distinguish T3SEs from non-T3SEs, offering two models, EP3_1 and EP3_2. Users can choose the model according to the species of the samples to be tested. Our related tools and data can be accessed through the link http://lab.malab.cn/∼lijing/EP3.html.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 6
    Publication Date: 2020-07-07
    Description: Deciphering microRNA (miRNA) targets is important for understanding the function of miRNAs as well as miRNA-based diagnostics and therapeutics. Given the highly cell-specific nature of miRNA regulation, recent computational approaches typically exploit expression data to identify the most physiologically relevant target messenger RNAs (mRNAs). Although effective, those methods usually require a large sample size to infer miRNA–mRNA interactions, thus limiting their applications in personalized medicine. In this study, we developed a novel miRNA target prediction algorithm called miRACLe (miRNA Analysis by a Contact modeL). It integrates sequence characteristics and RNA expression profiles into a random contact model, and determines the target preferences by relative probability of effective contacts in an individual-specific manner. Evaluation by a variety of measures shows that fitting TargetScan, a frequently used prediction tool, into the framework of miRACLe can improve its predictive power with a significant margin and consistently outperform other state-of-the-art methods in prediction accuracy, regulatory potential and biological relevance. Notably, the superiority of miRACLe is robust to various biological contexts, types of expression data and validation datasets, and the computation process is fast and efficient. Additionally, we show that the model can be readily applied to other sequence-based algorithms to improve their predictive power, such as DIANA-microT-CDS, miRanda-mirSVR and MirTarget4. MiRACLe is publicly available at https://github.com/PANWANG2014/miRACLe.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 7
    Publication Date: 2020-07-07
    Description: With the development of single-cell RNA sequencing (scRNA-seq) technology, it has become possible to perform large-scale transcript profiling for tens of thousands of cells in a single experiment. Many analysis pipelines have been developed for data generated from different high-throughput scRNA-seq platforms, bringing a new challenge to users to choose a proper workflow that is efficient, robust and reliable for a specific sequencing platform. Moreover, as the amount of public scRNA-seq data has increased rapidly, integrated analysis of scRNA-seq data from different sources has become increasingly popular. However, it remains unclear whether such integrated analysis would be biassed if the data were processed by different upstream pipelines. In this study, we encapsulated seven existing high-throughput scRNA-seq data processing pipelines with Nextflow, a general integrative workflow management framework, and evaluated their performance in terms of running time, computational resource consumption and data analysis consistency using eight public datasets generated from five different high-throughput scRNA-seq platforms. Our work provides a useful guideline for the selection of scRNA-seq data processing pipelines based on their performance on different real datasets. In addition, these guidelines can serve as a performance evaluation framework for future developments in high-throughput scRNA-seq data processing.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 8
    Publication Date: 2020-04-21
    Description: The fast accumulation of biological data calls for their integration, analysis and exploitation through more systematic approaches. The generation of novel, relevant hypotheses from this enormous quantity of data remains challenging. Logical models have long been used to answer a variety of questions regarding the dynamical behaviours of regulatory networks. As the number of published logical models increases, there is a pressing need for systematic model annotation, referencing and curation in community-supported and standardised formats. This article summarises the key topics and future directions of a meeting entitled ‘Annotation and curation of computational models in biology’, organised as part of the 2019 [BC]2 conference. The purpose of the meeting was to develop and drive forward a plan towards the standardised annotation of logical models, review and connect various ongoing projects of experts from different communities involved in the modelling and annotation of molecular biological entities, interactions, pathways and models. This article defines a roadmap towards the annotation and curation of logical models, including milestones for best practices and minimum standard requirements.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 9
    Publication Date: 2020-02-25
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 10
    Publication Date: 2020-02-26
    Description: Circular RNAs (circRNAs) are a unique class of RNA molecule identified more than 40 years ago which are produced by a covalent linkage via back-splicing of linear RNA. Recent advances in sequencing technologies and bioinformatics tools have led directly to an ever-expanding field of types and biological functions of circRNAs. In parallel with technological developments, practical applications of circRNAs have arisen including their utilization as biomarkers of human disease. Currently, circRNA-associated bioinformatics tools can support projects including circRNA annotation, circRNA identification and network analysis of competing endogenous RNA (ceRNA). In this review, we collected about 100 circRNA-associated bioinformatics tools and summarized their current attributes and capabilities. We also performed network analysis and text mining on circRNA tool publications in order to reveal trends in their ongoing development.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 11
    Publication Date: 2020-03-16
    Description: Motivation: MicroRNAs (miRNAs) regulate target genes and are responsible for lethal diseases such as cancers. Accurately recognizing and identifying miRNA–gene pairs could help in deciphering the mechanism by which miRNAs affect and regulate the development of cancers. Embedding methods and deep learning methods have shown excellent performance in traditional classification tasks in many scenarios, but few attempts have adapted and merged these two methods into miRNA–gene relationship prediction. Hence, we propose a novel computational framework. We first generate representational features for miRNAs and genes using both sequence and geometrical information and then leverage a deep learning method for association prediction. Results: We used long short-term memory (LSTM) to predict potential relationships and showed that our method outperforms other state-of-the-art methods. Our framework SG-LSTM achieved an area under the curve of 0.94 and was superior to other methods. In the case study, we predicted the top 10 miRNA–gene relationships and recommended the top 10 potential genes for hsa-miR-335-5p with SG-LSTM-core. We also tested our model on a larger dataset, from which 14 668 698 miRNA–gene pairs were predicted; the top 10 unknown pairs are also listed. Availability: Our work can be downloaded at https://github.com/Xshelton/SG_LSTM. Contact: luojiawei@hnu.edu.cn. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 12
    Publication Date: 2020-02-17
    Description: The locations of the initiation of genomic DNA replication are defined as origins of replication sites (ORIs), which regulate the onset of DNA replication and play significant roles in the DNA replication process. The study of ORIs is essential for understanding the cell-division cycle and gene expression regulation. Developing computational methods for the accurate identification of ORIs will provide important clues for DNA replication research and drug development. In this paper, the first integrated predictor, named iORI-Euk, was built to identify ORIs in multiple eukaryotes and multiple cell types. For the predictor, ORI data for seven eukaryotes (Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Pichia pastoris, Schizosaccharomyces pombe and Kluyveromyces lactis) were collected from public databases to construct benchmark datasets. Subsequently, three feature extraction strategies, namely k-mer, binary encoding and the combination of k-mer and binary encoding, were used to formulate DNA sequence samples. We also compared the performance of different classification algorithms. The best results were obtained by using a support vector machine in the 5-fold cross-validation test and the independent dataset test. Based on the optimal model, an online web server called iORI-Euk (http://lin-group.cn/server/iORI-Euk/) was established for novel ORI identification.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
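The abstract above describes k-mer and binary encodings of DNA sequences as features. A minimal sketch of both encodings (a generic illustration, not the iORI-Euk code) could look like this:

```python
from itertools import product

BASES = "ACGT"

def kmer_features(seq, k=3):
    """Normalised frequency of every possible k-mer (4**k features)."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in counts:
            counts[sub] += 1
    total = max(len(seq) - k + 1, 1)
    return [counts[m] / total for m in kmers]

def binary_encoding(seq):
    """One-hot ('binary') encoding: A=1000, C=0100, G=0010, T=0001."""
    table = {b: [int(b == x) for x in BASES] for b in BASES}
    return [bit for base in seq for bit in table.get(base, [0, 0, 0, 0])]

seq = "ACGTACGTGGCA"
features = kmer_features(seq) + binary_encoding(seq)   # combined k-mer + binary feature vector
```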
  • 13
    Publication Date: 2020-01-25
    Description: How to accurately estimate protein–ligand binding affinity remains a key challenge in computer-aided drug design (CADD). In many cases, it has been shown that the binding affinities predicted by classical scoring functions (SFs) cannot correlate well with experimentally measured biological activities. In the past few years, machine learning (ML)-based SFs have gradually emerged as potential alternatives and outperformed classical SFs in a series of studies. In this study, to better recognize the potential of classical SFs, we have conducted a comparative assessment of 25 commonly used SFs. Accordingly, the scoring power was systematically estimated by using the state-of-the-art ML methods that replaced the original multiple linear regression method to refit individual energy terms. The results show that the newly-developed ML-based SFs consistently performed better than classical ones. In particular, gradient boosting decision tree (GBDT) and random forest (RF) achieved the best predictions in most cases. The newly-developed ML-based SFs were also tested on another benchmark modified from PDBbind v2007, and the impacts of structural and sequence similarities were evaluated. The results indicated that the superiority of the ML-based SFs could be fully guaranteed when sufficient similar targets were contained in the training set. Moreover, the effect of the combinations of features from multiple SFs was explored, and the results indicated that combining NNscore2.0 with one to four other classical SFs could yield the best scoring power. However, it was not applicable to derive a generic target-specific SF or SF combination.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
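The study above refits the individual energy terms of classical scoring functions with machine learning regressors such as random forest and GBDT. The sketch below shows the general pattern on synthetic data only; the feature count, weights and dataset are invented stand-ins, not PDBbind data or the authors' pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))   # 6 energy terms per protein-ligand complex (synthetic)
y = X @ np.array([0.8, -0.5, 0.3, 0.0, 0.2, -0.1]) + rng.normal(0, 0.3, 500)  # synthetic affinities

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for model in (RandomForestRegressor(n_estimators=200, random_state=0),
              GradientBoostingRegressor(random_state=0)):
    model.fit(X_tr, y_tr)                                            # refit the energy terms
    print(type(model).__name__, round(model.score(X_te, y_te), 3))   # R^2 on held-out complexes
```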
  • 14
    Publication Date: 2020-10-14
    Description: Reliability evaluation of interconnection networks is of significant importance to the design and maintenance of interconnection networks. The component connectivity is an important parameter for the reliability evaluation of interconnection networks and is a generalization of the traditional connectivity. The $g$-component connectivity $c\kappa_g(G)$ of a non-complete connected graph $G$ is the minimum number of vertices whose deletion results in a graph with at least $g$ components. Determining the $g$-component connectivity is still an unsolved problem in many interconnection networks. Let $Q_{n,k}$ ($1\leq k\leq n-1$) denote the $(n, k)$-enhanced hypercube. In this paper, for $n\geq 7$ and $1\leq k \leq n-5$, we determine $c\kappa_{g}(Q_{n,k}) = g(n + 1) - \frac{1}{2}g(g + 1) + 1$ for $2 \leq g \leq n$. The previous result in Zhao and Yang (2019, Conditional connectivity of folded hypercubes. Discret. Appl. Math., 257, 388–392) is extended.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
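The closed-form result stated in the abstract can be written out as below; the numerical line is only an illustrative evaluation of that formula, not part of the paper.

```latex
c\kappa_{g}(Q_{n,k}) = g(n+1) - \tfrac{1}{2}\,g(g+1) + 1,
\qquad 2 \le g \le n,\; n \ge 7,\; 1 \le k \le n-5
% Example: n = 7, g = 3 gives 3 \cdot 8 - \tfrac{1}{2}\cdot 3 \cdot 4 + 1 = 24 - 6 + 1 = 19.
```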
  • 15
    Publication Date: 2020-07-16
    Description: A general framework to investigate the interference and coverage probability is proposed in this paper for indoor terahertz (THz) communications with beamforming antennas. Due to the multipath effects of THz band (0.1–10 THz), the line of sight and non-line of sight interference from users and access points (APs) (both equipped with beamforming antennas) are separately analyzed based on distance-dependent probability functions. Moreover, to evaluate the effects of obstacles in real applications, a Poisson distribution blockage model is implemented. Moreover, the coverage probability is derived by means of signal to interference plus noise ratio (SINR). Numerical results are conducted to present the interference and coverage probability with different parameters, including the indoor area size, SINR threshold, numbers of interfering users and APs and half-power bandwidth of beamforming antenna.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 16
    Publication Date: 2020-07-07
    Description: We consider the problem of real-time scheduling in uniprocessor devices powered by energy harvesters. In particular, we focus on mixed sets of tasks with time and energy constraints: hard deadline periodic tasks and soft aperiodic tasks without deadlines. We present an optimal aperiodic servicing algorithm that minimizes the response times of aperiodic tasks without compromising the schedulability of hard deadline periodic tasks. The server, called Slack Stealing with energy Preserving (SSP), is designed based on a slack stealing mechanism that profits whenever possible from available spare processing time and energy. We analytically establish the optimality of SSP. Our simulation results validate our theoretical analysis.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 17
    Publication Date: 2020-07-25
    Description: Reversible data hiding (RDH) with contrast enhancement (RDH-CE) is a special type of RDH in improving the subjective visual perception by enhancing the image contrast during the process of data embedding. In RDH-CE, data hiding is achieved via pairwise histogram expansion, and the embedding rate can be increased by performing multiple cycles of histogram expansions. However, when embedding rate gets high, human visible image degradation is observed. Previous work designed an upper bound of the embedding level for RDH-CE, which effectively avoids image over-sharping but offers limited embedding capacity. In this paper, a better tunable bound is designed to enhance the embedding capacity of RDH-CE by exploiting the characteristics of histogram distribution. Furthermore, the objective distortion introduced by histogram pre-shifting is minimized when the embedding level is no more than the upper bound, and the human visible degradation is minimized when the embedding level exceeds the limitation of the proposed upper bound. Experimental results validate that the proposed method provides appropriate upper bound of the embedding level, increases the effective embedding capacity and offers better image contrast.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 18
    Publication Date: 2020-07-01
    Description: The rapid and widespread adoption of internet of things-related services advances the development of the cloud-edge framework, including multiple cloud datacenters (CDCs) and edge micro-datacenters (EDCs). This paper aims to apply analytical modeling techniques to assess the effectiveness of cloud-edge computing resource allocation policies from the perspective of improving the performance of cloud-edge service. We focus on two types of physical device (PD)-allocation policies that define how to select a PD from a CDC/EDC for service provision. The first is randomly selecting a PD, denoted as RandAvail. The other is denoted as SEQ, in which an available idle PD is selected to serve client requests only after the waiting queues of all busy PDs are full. We first present the models in the case of an On–Off request arrival process and verify the approximate accuracy of the proposed models through simulations. Then, we apply analytical models for comparing RandAvail and SEQ policies, in terms of request rejection probability and mean response time, under various system parameter settings.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
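One plausible reading of the two PD-allocation policies described above, as a sketch; the class name, queue model and tie-breaking are assumptions for illustration, not the paper's definitions.

```python
import random

class PD:
    """A physical device with a bounded waiting queue."""
    def __init__(self, queue_limit=5):
        self.busy = False
        self.queue = []
        self.queue_limit = queue_limit

def rand_avail(pds):
    """RandAvail: pick uniformly at random among PDs that can still accept a request."""
    candidates = [p for p in pds if not p.busy or len(p.queue) < p.queue_limit]
    return random.choice(candidates) if candidates else None

def seq(pds):
    """SEQ: keep filling busy PDs' queues; use an idle PD only once all those queues are full."""
    busy_not_full = [p for p in pds if p.busy and len(p.queue) < p.queue_limit]
    if busy_not_full:
        return busy_not_full[0]
    idle = [p for p in pds if not p.busy]
    return idle[0] if idle else None
```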
  • 19
    Publication Date: 2020-07-01
    Description: The amount of online video content is increasing exponentially, which spurs demand for access to it. Providing optimal quality of service (QoS) for this ever-increasing video data is a challenging task due to the number of QoS constraints. The system resources, the distributed system platform and the transport protocol thus all need to collaborate to guarantee an acceptable level of QoS for the optimal video streaming process. In this paper, we present a comprehensive survey on QoS management for video-on-demand systems. First, we focus on load management and replication algorithms in content delivery networks and peer-to-peer (P2P) networks and discuss their shortcomings. We also address the problem of admission control and resource allocation with the objectives of congestion avoidance and frame-loss reduction. In addition, we introduce and discuss various replication schemes. For both the client–server architecture and P2P networks, we highlight the need for a specific storage management policy to preserve system reliability and content availability. We also focus on the scaling of content distribution and streaming protocols. We deduce that content availability is linked to the characteristics and the performance of the streaming protocols. Finally, we create a comparison table that presents the different contributions of the discussed approaches as well as their limitations. We believe that such a comprehensive survey provides useful insights and contributes to the related domains.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 20
    Publication Date: 2020-05-08
    Description: The inability to scale is one of the most concerning problems looming in blockchain systems, where every node has to store all contents of the ledger database locally, leading to centralization and higher operation costs. In this paper, we propose a model named virtual block group (VBG), which aims at addressing the node storage scalability problem. Adopting the VBG model, each node only needs to store part of block data and saves the VBG storage index to distributed hash table by taking block data as a resource, thus improving the query efficiency of block data. With the incentive mechanism of block data storage, and the storage verification and audit mechanism of block data, the security and reliability of block data storage can be ensured. The analysis and calculation show that this model saves hard drive storage space of the node to a greater extent with a shorter time of requesting block data, in the premise of ensuring secure and reliable block data. Compared to other technologies such as sharding, our model does not change the consensus mechanism or the network topology and retains the reliability and security of the original blockchain system.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 21
    Publication Date: 2020-04-30
    Description: Flying ad hoc networks (FANETs) are a collection of unmanned aerial vehicles that communicate without any predefined infrastructure. FANET, being one of the most researched topics nowadays, finds its scope in many complex applications like drones used for military applications, border surveillance systems and other systems like civil applications in traffic monitoring and disaster management. Quality of service (QoS) performance parameters for routing e.g. delay, packet delivery ratio, jitter and throughput in FANETs are quite difficult to improve. Mobility models play an important role in evaluating the performance of the routing protocols. In this paper, the integration of two selected mobility models, i.e. random waypoint and Gauss–Markov model, is implemented. As a result, the random Gauss integrated model is proposed for evaluating the performance of AODV (ad hoc on-demand distance vector), DSR (dynamic source routing) and DSDV (destination-Sequenced distance vector) routing protocols. The simulation is done with an NS2 simulator for various scenarios by varying the number of nodes and taking low- and high-node speeds of 50 and 500, respectively. The experimental results show that the proposed model improves the QoS performance parameters of AODV, DSR and DSDV protocol.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 22
    Publication Date: 2020-06-12
    Description: The exponential growth in mobile broadband data traffic with demand for faster data connectivity has become one of the most pressing challenges for mobile operators. They are facing an enormous data load in the core network and are seeking new solutions to offload data to other complementary technologies. Mobile data offloading using device-to-device (D2D) communication stands out as a promising and low-cost solution to reduce the burden on the cellular network. Data offloading is the process of reducing the load in the cellular medium by using alternative wireless technologies for bearing data through opportunistic assignment of nodes. In this paper, iNHeRENT, a Novel HybRid user equipment (UE) selection scheme using D2D communication in next generation wireless networks that provides better offloading efficiency and throughput than the existing schemes, is proposed. Here, a small set of Wi-Fi-enabled hybrid user equipment ($UE_H^*$) is chosen to offload cellular data in an efficient way. The objective of the work is to use a minimum number of $UE_H^*$ to cover the maximum number of UEs in the serving area of an evolved Node B and to offload the maximum amount of data. A $UE_H^*$ is a special UE with both cellular and Wi-Fi interfaces enabled to offload data. The coverage, throughput, packet delivery ratio and offloading efficiency metrics for the selected number of $UE_H^*$ are considered, and it is found that an offloading efficiency of 95.45% was achieved with a minimum of 7% $UE_H^*$ using iNHeRENT.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 23
    Publication Date: 2020-10-17
    Description: In the modern era, Internet usage has become a basic necessity in people’s lives. Nowadays, people can shop online and check other customers’ views about products purchased online. Social networking services enable users to post opinions on public platforms. Analyzing people’s opinions helps corporations to improve the quality of products and provide better customer service. However, analyzing this content manually is a daunting task. Therefore, we implemented sentiment analysis to automate the process. The entire process includes data collection, pre-processing, word embedding, sentiment detection and classification using deep learning techniques. Twitter was chosen as the data source, and tweets were collected automatically using Tweepy. In this paper, three deep learning techniques were implemented: CNN, Bi-LSTM and CNN-Bi-LSTM. Each model was trained on three datasets consisting of 50K, 100K and 200K tweets. The experimental results revealed that, as the amount of training data increased, the performance of the models improved, especially that of the Bi-LSTM model. When trained on the 200K dataset, the model achieved about 3% higher accuracy than on the 100K dataset and about 7% higher accuracy than on the 50K dataset. Finally, the Bi-LSTM model scored the highest performance in all metrics, achieving an accuracy of 95.35%.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
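A minimal Bi-LSTM tweet classifier in the spirit of the abstract above, sketched with Keras; the vocabulary size, embedding dimension and layer sizes are arbitrary assumptions, not the authors' configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, EMBED_DIM = 20000, 128   # assumed values

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),   # word embedding of token ids
    layers.Bidirectional(layers.LSTM(64)),     # bidirectional LSTM encoder
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),     # positive vs. negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(padded_token_ids, labels, epochs=5, validation_split=0.1)
```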
  • 24
    Publication Date: 2020-10-13
    Description: In this research, an automated analysis is performed on students’ chat and text data generated by social media platforms over the course of one semester and thoroughly analyzed for potential feedback about teaching, exams, and course contents. A data crawler is developed that performs horizontal and vertical samplings of the data. After data crawling, a few preprocessing steps are performed including text extraction, noise removal, stop-word removal, word stemming, text classification, and feature extraction. The intensity of a review is determined using four measures containing knowledge and understanding, course contents, teaching style, and assessment procedures for a specific course. The proposed system contains features from text mining and web mining to automatically identify a review whenever a user writes comments on their studies. This system aims to provide curriculum development committees with valuable online student feedback and assist in curriculum improvements. By comparing these automated reviews to results obtained from manual student survey forms, we found that the automated system yields the same output but at a fraction of the cost and time typically spent on collecting and analyzing manual student surveys.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
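A compact sketch of the preprocessing steps listed above (noise removal, stop-word removal, stemming); the tiny stop-word list is illustrative only, not the authors' resource.

```python
import re
from nltk.stem import PorterStemmer   # stemming; no extra corpus download needed

STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of", "in", "on"}  # illustrative subset
stemmer = PorterStemmer()

def preprocess(text):
    """Noise removal -> tokenisation -> stop-word removal -> stemming."""
    text = re.sub(r"http\S+|[^a-zA-Z\s]", " ", text.lower())   # strip URLs, digits and punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [stemmer.stem(t) for t in tokens]

print(preprocess("The lectures are great, but the exams feel rushed!!"))
```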
  • 25
    Publication Date: 2020-06-23
    Description: As a nature-inspired algorithm, artificial bee colony (ABC) is an optimization algorithm inspired by the search behaviour of honey bees. The main aim of this study is to examine the effects of an ABC-based feature selection algorithm on classification performance for cyberbullying, which has become a significant worldwide social issue in recent years. For this purpose, the classification performance of the proposed ABC-based feature selection method is compared with three traditional methods: information gain, ReliefF and chi square. Experimental results show that the ABC-based feature selection method outperforms the three traditional methods for the detection of cyberbullying. The macro-averaged F-measure on the dataset increases from 0.659 to 0.8 using the proposed ABC-based feature selection method.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
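The comparison baselines named above (information gain, chi square) and the macro-averaged F-measure can be reproduced with standard tooling; the sketch below runs on synthetic data and does not implement the ABC wrapper search itself.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for a bag-of-words cyberbullying dataset.
X, y = make_classification(n_samples=600, n_features=300, n_informative=30, random_state=0)
X = MinMaxScaler().fit_transform(X)   # chi2 and MultinomialNB need non-negative features

for name, score_fn in [("chi square", chi2), ("information gain", mutual_info_classif)]:
    X_sel = SelectKBest(score_fn, k=50).fit_transform(X, y)
    f1 = cross_val_score(MultinomialNB(), X_sel, y, scoring="f1_macro", cv=5).mean()
    print(name, round(f1, 3))         # macro-averaged F-measure
```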
  • 26
    Publication Date: 2020-08-19
    Description: Stock market prices are dynamic in nature, which makes predicting future trending stocks a serious challenge for sellers and buyers. To ensure effective prediction of the stock market, the chronological penguin Levenberg–Marquardt-based nonlinear autoregressive network (CPLM-based NARX) is employed, and the prediction is devised on the basis of the past and recent rank of the market. Initially, the input data are subjected to feature extraction based on technical indicators, such as WILLR, ROCR, MOM, RSI, CCI, ADX, TRIX, MACD, OBV, TSF, ATR and MFI. These technical indicators are adopted for predicting the stock market. Wrapper-enabled feature selection is employed for selecting the highly significant features generated using the technical indicators. The highly significant features of the data are fed to the prediction module, which is developed using the NARX model. The NARX model uses the CPLM algorithm, formed by integrating the chronological-based penguin search optimization algorithm and the Levenberg–Marquardt algorithm. Prediction using the proposed CPLM-based NARX shows superior performance in terms of mean absolute percentage error and root mean square error, with values of 0.96 and 0.805, respectively.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
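Two of the technical indicators listed above (MOM and RSI) computed with pandas, as an illustrative sketch on a simulated price series; the window lengths are conventional defaults, not the paper's settings.

```python
import numpy as np
import pandas as pd

def momentum(close, n=10):
    """MOM: difference between the current close and the close n periods ago."""
    return close.diff(n)

def rsi(close, n=14):
    """RSI from average gains and losses over an n-period window."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(n).mean()
    loss = (-delta.clip(upper=0)).rolling(n).mean()
    return 100 - 100 / (1 + gain / loss)

close = pd.Series(100 + np.cumsum(np.random.default_rng(2).normal(0, 1, 200)))  # simulated prices
features = pd.DataFrame({"MOM": momentum(close), "RSI": rsi(close)}).dropna()
```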
  • 27
    Publication Date: 2020-06-16
    Description: Location-based services have attracted much attention in both academia and industry. However, protecting user’s privacy while providing accurate service for users remains challenging. In most of the existing research works, a semi-trusted proxy is employed to act on behalf of a user to minimize the computation and communication costs of the user. However, user privacy, e.g. location privacy, cannot be protected against the proxy. In this paper, we design a new blind filter protocol where a user can employ a semi-trusted proxy to determine whether a point of interest is within a circular area centered at the user’s location. During the protocol, neither the proxy nor the location-based service provider can obtain the location of the user and the query results. Moreover, each type of query is controlled by an access tree and only the users whose attributes satisfy this access tree can complete the specific type of query. Security analysis and efficiency experiments validate that the proposed protocol is secure and efficient in terms of the computation and communication overhead.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 28
    Publication Date: 2020-08-17
    Description: In this paper, we present a full-reference quality assessment metric based on visual saliency information. The saliency information is provided in the form of degrees associated with each vertex of the surface mesh. From these degrees, statistical attributes reflecting the structures of the reference and distorted meshes are computed. These are used by four genetically optimized comparison functions that quantify the structural differences between a reference and a distorted mesh. We also present a statistical comparison study of six full-reference quality assessment metrics for 3D meshes. We compare the objective metric results with human subjective quality scores, considering the 3D meshes on the one hand and the distortion types on the other. We also show which metrics are statistically superior to their counterparts. For these comparisons we use the Spearman rank-ordered correlation coefficient and Student’s t-test. To attest to the pertinence of the proposed approach, a comparison with ground-truth saliency and an application to assessing the visual rendering of smoothing algorithms are presented. Experimental results show that the proposed metric is very competitive with the state of the art.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
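The statistical comparison described above relies on the Spearman rank-ordered correlation coefficient and Student's t-test; both are available in SciPy. The scores below are invented placeholders, and a paired t-test is assumed here.

```python
import numpy as np
from scipy import stats

# Hypothetical objective metric scores vs. mean human opinion scores for ten distorted meshes.
metric_scores = np.array([0.12, 0.35, 0.40, 0.55, 0.61, 0.70, 0.72, 0.80, 0.88, 0.95])
human_scores = np.array([1.0, 2.1, 2.0, 3.2, 3.0, 3.8, 4.1, 4.0, 4.6, 4.9])
rho, p_rho = stats.spearmanr(metric_scores, human_scores)
print("SROCC:", round(rho, 3), "p =", p_rho)

# Paired Student's t-test on the per-mesh prediction residuals of two competing metrics.
residuals_a = np.array([0.10, 0.05, 0.20, 0.08, 0.12, 0.07, 0.15, 0.09, 0.11, 0.06])
residuals_b = np.array([0.30, 0.25, 0.28, 0.22, 0.35, 0.27, 0.31, 0.24, 0.29, 0.26])
t, p_t = stats.ttest_rel(residuals_a, residuals_b)
```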
  • 29
    Publication Date: 2020-06-15
    Description: Diagnosability and connectivity are important metrics for the reliability and fault diagnosis capability of interconnection networks, respectively. The g-extra connectivity of a graph G, denoted by $\kappa_g(G)$, is the minimum number of vertices whose deletion disconnects the network such that every remaining component has more than $g$ vertices. The g-extra conditional diagnosability of a graph G, denoted by $t_g(G)$, is the maximum number of faulty vertices that the graph G can guarantee to identify under the condition that every fault-free component contains at least g+1 vertices. In this paper, we first determine that the g-extra connectivity of the DQcube is $\kappa_g(G)=(g+1)(n+1)-\frac{g(g+3)}{2}$ for $0\leq g\leq n-3$ and then show that the g-extra conditional diagnosability of the DQcube under the PMC model $(n\geq 4, 1\leq g\leq n-3)$ and the MM$^\ast$ model $(n\geq 7, 1\leq g\leq \frac{n-3}{4})$ is $t_g(G)=(g+1)(n+1)-\frac{g(g+3)}{2}+g$, respectively.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
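Written out, the two closed-form results from the abstract read as below; the numerical line is only an illustrative evaluation, not part of the paper.

```latex
\kappa_g(G) = (g+1)(n+1) - \tfrac{g(g+3)}{2}, \quad 0 \le g \le n-3;
\qquad
t_g(G) = (g+1)(n+1) - \tfrac{g(g+3)}{2} + g
% Example: n = 7, g = 2 gives \kappa_2 = 3 \cdot 8 - 5 = 19 and t_2 = 21.
```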
  • 30
    Publication Date: 2020-06-15
    Description: In this paper, using Mixed-Integer Linear Programming, a new automatic search tool for truncated differential characteristic is presented. Our method models the problem of finding a maximal probability truncated differential characteristic, being able to distinguish the cipher from a pseudo-random permutation. Using this method, we analyze Midori64, SKINNY64/X and CRAFT block ciphers, for all of which the existing results are improved. In all cases, the truncated differential characteristic is much more efficient than the (upper bound of) bit-wise differential characteristic proven by the designers, for any number of rounds. More specifically, the highest possible rounds, for which an efficient differential characteristic can exist for Midori64, SKINNY64/X and CRAFT are 6, 7 and 10 rounds, respectively, for which differential characteristics with maximum probabilities of $2^{-60}$, $2^{-52}$ and $2^{-62.61}$ (may) exist. Using our new method, we introduce new truncated differential characteristics for these ciphers with respective probabilities $2^{-54}$, $2^{-4}$ and $2^{-24}$ at the same number of rounds. Moreover, the longest truncated differential characteristics found for SKINNY64/X and CRAFT have 10 and 12 rounds, respectively. This method can be used as a new tool for differential analysis of SPN block ciphers.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 31
    Publication Date: 2020-06-15
    Description: Action recognition is a challenging task. Deep learning models have been investigated to solve this problem. Setting up a new neural network model is a crucial and time-consuming process. Alternatively, pre-trained convolutional neural network (CNN) models offer rapid modeling. The selection of the hyperparameters of CNNs is a challenging issue that heavily depends on user experience. The parameters of CNNs should be carefully selected to get effective results. For this purpose, the artificial bee colony (ABC) algorithm is used for tuning the parameters to get optimum results. The proposed method includes three main stages: the image preprocessing stage involves automatic cropping of the meaningful area within the images in the data set, the transfer learning stage includes experiments with six different pre-trained CNN models and the hyperparameter tuning stage using the ABC algorithm. Performance comparison of the pre-trained CNN models involving the use and nonuse of the ABC algorithm for the Stanford 40 data set is presented. The experiments show that the pre-trained CNN models with ABC are more successful than pre-trained CNN models without ABC. Additionally, to the best of our knowledge, the improved NASNet-Large CNN model with the ABC algorithm gives the best accuracy of 87.78% for the overall success rate-based performance metric.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
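A minimal transfer-learning sketch in the spirit of the abstract above, using a pre-trained NASNet-Large backbone from Keras; the learning rate and dropout are exactly the kind of hyperparameters the paper tunes with ABC, and the values here are arbitrary assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 40                      # Stanford 40 action classes
LEARNING_RATE, DROPOUT = 1e-4, 0.3    # placeholders for ABC-tuned hyperparameters

base = tf.keras.applications.NASNetLarge(include_top=False, weights="imagenet",
                                         input_shape=(331, 331, 3), pooling="avg")
base.trainable = False                # use the pre-trained CNN as a fixed feature extractor

model = models.Sequential([
    base,
    layers.Dropout(DROPOUT),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(LEARNING_RATE),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```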
  • 32
    Publication Date: 2020-08-05
    Description: At present, the tactile perception of 3D geometric bumps (such as sinusoidal bumps, Gaussian bumps, triangular bumps, etc.) on touchscreens is mainly realized by mapping the local gradients of rendered virtual surfaces to lateral electrostatic friction, while maintaining the constant normal feedback force. The latest study has shown that the recognition rate of 3D visual objects with electrovibration is lower by 27% than that using force-feedback devices. Based on the custom-designed tactile display coupling with electrovibration and mechanical vibration stimuli, this paper proposes a novel tactile rendering algorithm of 3D geometric bumps, which simultaneously generates the lateral and the normal perceptual dimensions. Specifically, a mapping relationship with the electrostatic friction proportional to the gradient of 3D geometric bumps is firstly established. Then, resorting to the angle between the lateral friction force and the normal feedback force, a rendering model of the normal feedback force using mechanical vibration is further determined. Compared to the previous works with electrovibration, objective evaluations with 12 participants showed that the novel version significantly improved recognition rates of 3D bumps on touchscreens.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 33
    Publication Date: 2020-08-04
    Description: In context-aware recommendation systems, most existing methods encode users’ preferences by mapping item and category information into the same space, which is just a stack of information. The item and category information contained in the interaction behaviours is not fully utilized. Moreover, since users’ preferences for a candidate item are influenced by changes in temporal and historical behaviours, it is unreasonable to predict correlations between users and candidates by using users’ fixed features. In this paper, we propose a framework based on fine-grained and coarse-grained information, which considers multi-granularity information from users’ historical behaviours. First, a parallel structure is provided to mine users’ preference information under different granularities. Then, self-attention and attention mechanisms are used to capture the dynamic preferences. Experiment results on two publicly available datasets show that our framework outperforms state-of-the-art methods across the calculated evaluation metrics.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 34
    Publication Date: 2020-09-29
    Description: In recent years, with the continuous development of internet of things and cloud computing technologies, data-intensive applications have received more and more attention. In the distributed cloud environment, access to massive data is often the performance bottleneck. It is therefore important to propose a suitable data deployment algorithm for improving the utilization of cloud servers and the efficiency of task scheduling. In order to reduce data access cost and data deployment time, an optimal data deployment algorithm is proposed in this paper. By modeling and analyzing the data deployment problem, the problem is solved using an improved genetic algorithm. After the data are deployed, a task-progress-aware scheduling algorithm is proposed to improve the efficiency of task scheduling by making the speculative execution mechanism more accurate. First, thresholds for detecting slow tasks and fast nodes are set. Then, the slow tasks and fast nodes are detected by calculating the remaining time of the tasks and the real-time processing ability of the nodes, respectively. Finally, backup execution of the slow tasks is performed on the fast nodes. The experimental results show that, while satisfying system load balancing, the proposed algorithms can markedly reduce data access cost, service-level agreement (SLA) default rate and system execution time, and optimize data deployment to improve scheduling efficiency in distributed clouds.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 35
    Publication Date: 2020-01-02
    Description: For a high level of data availability and reliability, a common strategy for cloud service providers is to rely on replication, i.e. storing several replicas onto different servers. To provide cloud users with a strong guarantee that all replicas required by them are actually stored, many multi-replica integrity auditing schemes have been proposed. However, most existing solutions are not resource economical since users need to create and upload replicas of their files by themselves. A multi-replica solution called Mirror was presented to overcome these problems, but we find that it is vulnerable to a storage saving attack, by which a dishonest provider can considerably save storage costs compared to the costs of storing all the replicas honestly, while still passing any challenge successfully. In addition, we find that Mirror is easily subject to substitution and forgery attacks, which pose new security risks for cloud users. To address these problems, we propose some simple yet effective countermeasures and an improved proofs of retrievability and replication scheme, which can resist the aforesaid attacks and maintain the advantages of Mirror, such as economical bandwidth and efficient verification. Experimental results show that our scheme exhibits comparable performance with Mirror while achieving high security.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 36
    Publication Date: 2020-06-08
    Description: A fuzzy extractor derives uniformly random strings from noisy sources that are neither reliably reproducible nor uniformly random. The basic definition of fuzzy extractor was first formally introduced by Dodis et al. and has achieved various applications in cryptographic systems. However, it has been proved that a fuzzy extractor could become totally insecure when the same noisy random source is extracted multiple times. To solve this problem, the reusable fuzzy extractor is proposed. In this paper, we propose the first reusable fuzzy extractor based on the LPN assumption, which is efficient and resilient to linear fraction of errors. Furthermore, our construction serves as an alternative post-quantum reusable fuzzy extractor.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
  • 37
    Publication Date: 2020-03-17
    Description: A microarray dataset contains thousands of DNA spots covering almost every gene in the genome. Microarray-based gene expression helps with the diagnosis, prognosis and treatment of cancer. The nature of diseases frequently changes, which in turn generates a considerable volume of data. The main drawback of microarray data is the curse of dimensionality. It hinders useful information and leads to computational instability. The main objective of feature selection is to extract and remove insignificant and irrelevant features to determine the informative genes that cause cancer. Random forest is a well-suited classification algorithm for microarray data. To enhance the importance of the variables, we proposed out-of-bag (OOB) cases in every tree of the forest to count the number of votes for the exact class. The incorporation of random permutation in the variables of these OOB cases enables us to select the crucial features from high-dimensional microarray data. In this study, we analyze the effects of various random forest parameters on the selection procedure. ‘Variable drop fraction’ regulates the forest construction. The higher variable drop fraction value efficiently decreases the dimensionality of the microarray data. Forest built with 800 trees chooses fewer important features under any variable drop fraction value that reduces microarray data dimensionality.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
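The abstract above builds random forests with out-of-bag cases and random permutation of variables to rank genes. A generic sketch of that idea with scikit-learn, using synthetic data standing in for a microarray matrix (not the authors' code):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a microarray matrix: 100 samples x 2000 genes.
X, y = make_classification(n_samples=100, n_features=2000, n_informative=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=800, oob_score=True, random_state=0)
forest.fit(X_tr, y_tr)
print("OOB accuracy:", round(forest.oob_score_, 3))

# Permutation importance: shuffle one gene at a time and measure the drop in accuracy.
imp = permutation_importance(forest, X_te, y_te, n_repeats=5, random_state=0)
top_genes = np.argsort(imp.importances_mean)[::-1][:10]   # indices of the ten most informative genes
```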
  • 38
    Publication Date: 2020-06-08
    Description: Tactile feedback added to touchscreens provides users with a high-quality interactive experience, and its effect on typical interaction gestures needs to be evaluated. With a custom-designed electrostatic tactile feedback device, we explore the effects of tactile feedback on zoom-in/out gestures and examine the relationship between completion time (CT) and index of difficulty (ID). Specifically, we compare the effect of electrostatic tactile feedback on the efficiency and accuracy of zoom-in/out gestures under three conditions: no tactile feedback, tactile feedback force increasing linearly over the operation process, and tactile feedback only in a target area. Then, we study the relationship between CT and ID with tactile feedback added to the target area. Results of experimental data from 12 participants show that tactile feedback added only to a target area can significantly increase the operational efficiency and accuracy of zoom-in/out gestures. Furthermore, the relationship between CT and ID agrees well with Fitts’ law, and the correlation coefficient is larger than 0.9.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
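For reference, the standard Shannon formulation of Fitts' law relates completion time and index of difficulty as follows; the paper may use a different variant of ID, and the constants a and b are fitted empirically.

```latex
ID = \log_2\!\left(\frac{D}{W} + 1\right), \qquad CT = a + b \cdot ID
% D: movement distance (amplitude), W: target width; a, b obtained by linear regression.
```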
  • 39
    Publication Date: 2020-01-02
    Description: Artificial intelligence, which aims to make machines and computers ‘smart’, is one of the most discussed topics in the field of Computer Science. There are many diverse technical and specialized research areas associated with it. Due to the accelerating rate of technological change, artificial intelligence has taken over a lot of human jobs and is giving excellent results that are more efficient and effective than those of humans. However, there has often been concern about the following: will artificial intelligence surpass human intelligence in the near future? Are computers’ ever-accelerating abilities to outpace human jobs and skills a matter of concern? The different views and myths on the subject have made it more than just a topic of discussion. In this research paper, we study the existing facts and literature to understand the true definitions of artificial intelligence (AI) and human intelligence (HI) by classifying each of their types separately and analyzing the extent of their full capabilities. Later, we discuss the possibility that AI could eventually replace human jobs in the market. Finally, we synthesize and summarize the results and findings of why artificial intelligence cannot completely surpass human intelligence in the future.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 40
    Publication Date: 2020-03-06
    Description: The 5G mobile communication system (IMT-2020) comes with the main objective of increasing current data rates to several gigabits per second. The EIA and EEA algorithms face challenges in meeting the accompanying demand for super-high-speed encryption. The 3GPP standardization organization expects to raise the security level to a 256-bit key length, and the international cryptographic community has responded actively with cipher designs and standard applications. SNOW-V is such a proposal, offered by the SNOW family design team as a revision of the SNOW 3G architecture in terms of its linear feedback shift register (LFSR) and finite state machine (FSM): the LFSR part is new, operates at eight times the speed of the FSM and consists of two shift registers, each feeding into the other, while the FSM grows to three 128-bit registers and employs two instances of the full AES encryption round function for its update. The cipher takes a 128-bit IV, employs an 896-bit internal state and produces 128-bit keystream blocks. It is competitive in a pure software environment, making use of both AES-NI and AVX acceleration instructions. A security evaluation of SNOW-V is therefore essential and urgent, since there is scarcely any definite security bound for it. In this paper, we propose a byte-based guess-and-determine attack on SNOW-V with complexity $2^{406}$ using only seven keystream blocks. We first improve the heuristic guessing-path auto-searching algorithm based on dynamic programming by adding an initial guessing set, which is iteratively modified by sieving out unnecessary guessing variables, in order to correct the guessing path according to the cipher structure and finally launch a smaller guessing basis. For the specific design, we split all the computing units into bytes and rewrite all the internal operations correspondingly. We establish a backward-clock linear equation system according to the circular construction of the LFSR part, and then further simplify the equations to fit the input requirements of the heuristic guessing-path auto-searching algorithm. Finally, the derived guessing path is modified for pre-simplification and post-reduction. This is the first complete guess-and-determine attack on SNOW-V as well as the first specific security evaluation of the full cipher.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 41
    Publication Date: 2020-01-02
    Description: Perceptible visual tracking is an important module for distinct perception tasks of autonomous robots, and better features make the decision-making process easier. Evaluating tracked objects, their dynamic positions and the visual information in the results is a difficult task. Until now, most real-time visual tracking algorithms have suffered from poor robustness and low efficiency when dealing with complex real-world data. In this paper, we propose a more robust and faster visual tracking framework that uses the scale invariant feature transform (SIFT) and optical flow within a belief propagation (BP) algorithm for efficient processing in real scenarios. A new feature-based optical flow, together with the BP algorithm, is used to compute the affine matrix of a regional center on SIFT key points across frames. Experimental results show that the proposed approach is more efficient and more robust than state-of-the-art tracking algorithms in more complex scenarios. (A hedged OpenCV sketch of the SIFT/optical-flow step follows this record.)
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
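    A minimal OpenCV sketch of the feature side of such a tracker: SIFT key points in one frame, sparse optical flow into the next frame, and an affine motion estimate for the tracked region. The frame file names are hypothetical, and the belief-propagation refinement described above is omitted.
      import cv2
      import numpy as np

      # Hypothetical consecutive grayscale frames; in practice these come from a video stream.
      prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
      curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

      # 1) SIFT key points in the previous frame.
      sift = cv2.SIFT_create()
      keypoints = sift.detect(prev, None)
      pts = np.float32([kp.pt for kp in keypoints]).reshape(-1, 1, 2)

      # 2) Sparse optical flow to follow those key points into the current frame.
      next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None)
      good_prev = pts[status.ravel() == 1]
      good_next = next_pts[status.ravel() == 1]

      # 3) Affine matrix describing the region's motion between frames.
      affine, _ = cv2.estimateAffinePartial2D(good_prev, good_next, method=cv2.RANSAC)
      print("Estimated affine motion:\n", affine)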
  • 42
    Publication Date: 2020-09-15
    Description: To comprehensively evaluate the achievements of the 'Belt and Road' initiative in integrated transportation, researchers need to optimize the method of generating evaluation indices and construct the framework of a 'Belt and Road' transportation index system. This paper used the GDELT database as the data source and obtained full-text data of English news from 25 countries along the 'Belt and Road'. It introduced topic models, combining an unsupervised method (latent Dirichlet allocation, LDA) and a supervised method (labeled LDA), to mine the topics contained in the news data. It then constructed a transportation development model and analyzed the development trend of transportation in the various countries. The study found that the development trend of transportation in the countries along the route is unbalanced and can be divided into four types: rapid development, stable development, slow development and lagging development. The method of this paper can effectively extract the temporal and spatial variation of news events, discover potential risks in various countries, support real-time and dynamic monitoring of the social development situation of the countries along the route and provide auxiliary decision support for implementation of the 'Belt and Road' initiative, which has important application value. (A hedged LDA sketch follows this record.)
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
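    A minimal sketch of the unsupervised topic-modelling step with scikit-learn's LatentDirichletAllocation on a few hypothetical news snippets standing in for GDELT full-text records; the supervised labeled-LDA step described above is not part of scikit-learn and is not shown.
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.decomposition import LatentDirichletAllocation

      # Hypothetical English news snippets about transportation.
      docs = [
          "new railway line opens linking port and inland logistics hub",
          "highway construction delayed by funding and land acquisition issues",
          "airport expansion boosts cargo throughput and passenger traffic",
          "container port reports record shipping volume this quarter",
      ]

      vec = CountVectorizer(stop_words="english")
      X = vec.fit_transform(docs)

      lda = LatentDirichletAllocation(n_components=2, random_state=0)
      lda.fit(X)

      terms = vec.get_feature_names_out()
      for k, topic in enumerate(lda.components_):
          top_terms = [terms[i] for i in topic.argsort()[::-1][:5]]
          print(f"topic {k}: {top_terms}")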
  • 43
    Publication Date: 2020-09-14
    Description: Social media is believed to have played a central role in the mobilization of Algerian citizens to peaceful protest against their country's corrupt regime. Since no one foresaw these protests (called 'The Revolution of Smiles' or 'The Hirak Movement'), this research conducted social media analysis to elicit vital insights about both the intensity of sentiment and the influence of social media on this unexpected instigation of political protest. This work built a deep learning model and analysed the influence of content, sentiment and user features on information spread. The model used the learning capability of a long short-term memory network to predict 'retweetability'. Experiments were conducted on two real-world datasets (Hirak and Brexit) collected from Twitter. User features were found to be a key element in the diffusion of information. The strongest feelings about event context actively influenced the spread of tweets. The Twitter emotion corpus was found to improve the predictive ability of the model developed in this study. (A hedged sketch of an LSTM-based retweet predictor follows this record.)
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
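    A minimal Keras sketch of a retweetability predictor in the spirit described above: an LSTM over tweet tokens combined with user features feeding a binary output. The vocabulary size, sequence length and number of user features are hypothetical placeholders, not the study's configuration.
      import tensorflow as tf
      from tensorflow.keras import layers, Model

      # Hypothetical dimensions: tweets padded to 50 tokens, 20k-word vocabulary, 8 user features.
      MAX_LEN, VOCAB, N_USER = 50, 20000, 8

      text_in = layers.Input(shape=(MAX_LEN,), name="tokens")
      user_in = layers.Input(shape=(N_USER,), name="user_features")

      x = layers.Embedding(VOCAB, 128)(text_in)
      x = layers.LSTM(64)(x)                          # long short-term memory over tweet content
      x = layers.concatenate([x, user_in])            # content features joined with user features
      out = layers.Dense(1, activation="sigmoid")(x)  # probability that the tweet is retweeted

      model = Model([text_in, user_in], out)
      model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
      model.summary()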
  • 44
    Publication Date: 2020-09-12
    Description: This paper introduces a new approach to semantic image retrieval using shape descriptors, such as dispersion and moments, in conjunction with a discriminative classifier model based on latent-dynamic conditional random fields (LDCRFs). The target region is first localized via a background subtraction model. In the second stage, the dispersion and moment features are fed into k-means clustering to extract the object's features. The learning process is then carried out by LDCRFs. Finally, SPARQL, the simple protocol and RDF (resource description framework) query language, is applied to the input text or image query to retrieve semantic images through the sequential processing of a query engine, a matching module and an ontology manager. Experimental findings show that our approach successfully retrieves images from the mammals benchmark with a retrieval rate of 98.11%. These results compare favorably with those reported in the literature by other researchers.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 45
    Publication Date: 2020-05-22
    Description: Recommender systems nowadays play an important role in providing helpful information for users, especially in ecommerce applications. Many of the proposed models use users' rating histories in order to predict unknown ratings. Recently, users' reviews, as a valuable source of knowledge, have attracted the attention of researchers in this field, and a new category denoted review-based recommender systems has emerged. In this study, we make use of the information included in user reviews as well as the available rating scores to develop a review-based rating prediction system. The proposed scheme attempts to handle the uncertainty of the rating histories by fuzzifying the given ratings. Another advantage of the proposed system is the use of a word embedding representation model for textual reviews, instead of traditional models such as the binary bag of words and the TF-IDF vector space. It also makes use of the helpfulness voting scores in order to prune the data and achieve better results. The effectiveness of the rating prediction scheme as well as the final recommender system was evaluated against the Amazon dataset. Experimental results revealed that the proposed recommender system outperforms its counterparts and can be used as a suitable tool in ecommerce environments. (A hedged sketch of rating fuzzification follows this record.)
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
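    A minimal sketch of fuzzifying 1-5 star ratings into overlapping fuzzy sets, illustrating the uncertainty-handling idea described above; the membership shapes below are simple illustrative ramps and a triangle, not the paper's exact membership functions.
      import numpy as np

      def fuzzify_rating(r):
          """Map a 1-5 star rating to memberships in three fuzzy sets (illustrative shapes)."""
          low = float(np.clip((3.0 - r) / 2.0, 0.0, 1.0))               # 1 star -> 1.0, 3 stars -> 0.0
          high = float(np.clip((r - 3.0) / 2.0, 0.0, 1.0))              # 5 stars -> 1.0, 3 stars -> 0.0
          medium = float(np.clip(1.0 - abs(r - 3.0) / 2.0, 0.0, 1.0))   # triangular peak at 3 stars
          return {"low": low, "medium": medium, "high": high}

      for rating in (1, 2.5, 4, 5):
          print(rating, fuzzify_rating(rating))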
  • 46
    Publication Date: 2020-05-19
    Description: The concept of smart systems, supported by different technologies, enables many algorithms used in Machine Learning (ML) and the world of the Internet of Things (IoT). In a modern city, many different sensors can be used for information collection. Algorithms used in Machine Learning improve the capabilities and intelligence of a system as the amount of data collected increases. In this research, we propose a TCC-SVM system model to analyse traffic congestion in the environment of a smart city. The proposed model comprises an ML-enabled, IoT-based road traffic congestion control system whereby the occurrence of congestion at a specific point is notified. (A hedged SVM sketch of such a congestion classifier follows this record.)
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
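    A minimal sketch of an SVM-based congestion classifier on hypothetical road-sensor features, in the spirit of the system described above; the feature set, readings and labels are illustrative placeholders, not the authors' TCC-SVM pipeline.
      import numpy as np
      from sklearn.svm import SVC
      from sklearn.preprocessing import StandardScaler
      from sklearn.pipeline import make_pipeline

      # Hypothetical sensor readings: [vehicle count per minute, average speed km/h, lane occupancy %].
      X = np.array([
          [10, 62, 12], [14, 55, 18], [45, 18, 70], [52, 12, 85],
          [8, 70, 9],  [38, 22, 64], [20, 48, 25], [60, 9, 92],
      ])
      y = np.array([0, 0, 1, 1, 0, 1, 0, 1])   # 1 = congestion at this point, 0 = free flow

      clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
      clf.fit(X, y)

      new_reading = np.array([[48, 15, 78]])
      print("Congestion probability:", clf.predict_proba(new_reading)[0, 1])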
  • 47
    Publication Date: 2020-05-09
    Description: Existing work on public opinion propagation is mostly based on single-layer propagation networks and rarely considers double-layer network structures or the evolution of negative opinions. This paper proposes a new susceptible-infected-vaccinated-susceptible negative opinion information propagation model with preventive vaccination, constructed on a double-layer network topology. First, a continuous-time Markov chain is used to simulate the negative public opinion information propagation process and the nonlinear dynamic equation of the model is derived; second, the steady-state condition of propagation in the model is proposed and mathematically proved; finally, the Monte Carlo method is applied to the proposed model to study how its parameters affect negative public opinion information propagation, and the derived results are verified by computer simulation. The simulation results show that, compared with the traditional susceptible-infected-susceptible model, the proposed model has a larger threshold of public opinion information propagation, controls the scale of negative public opinion more effectively, reduces the density of negative public opinion information propagation and suppresses negative public opinion information. It also provides a scientific method and research approach, based on probability and statistics, for the study of negative public opinion information propagation in complex networks. (A hedged Monte Carlo sketch of an SIS-style spread with a vaccinated fraction follows this record.)
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
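    A simplified Monte Carlo sketch of a susceptible-infected-susceptible-style spread with a vaccinated (immune) fraction on one random contact layer. It illustrates the simulation approach only; the paper's exact SIVS dynamics, its parameters and its double-layer coupling are not reproduced.
      import numpy as np

      rng = np.random.default_rng(1)
      N, beta, mu, vacc = 500, 0.06, 0.2, 0.3    # nodes, infection rate, recovery rate, vaccinated fraction
      A = rng.random((N, N)) < 0.02              # random contact layer
      A = np.triu(A, 1)
      A = A | A.T                                # symmetric adjacency matrix

      state = np.zeros(N, dtype=int)             # 0 susceptible, 1 spreading negative opinion, 2 vaccinated
      state[rng.choice(N, int(vacc * N), replace=False)] = 2
      state[rng.choice(np.flatnonzero(state == 0), 5, replace=False)] = 1

      for t in range(200):
          infected = state == 1
          pressure = A[:, infected].sum(axis=1)              # number of spreading neighbours
          p_inf = 1.0 - (1.0 - beta) ** pressure             # per-step infection probability
          new_inf = (state == 0) & (rng.random(N) < p_inf)
          recovered = infected & (rng.random(N) < mu)
          state[new_inf] = 1
          state[recovered] = 0                               # recovered nodes become susceptible again

      print("final fraction spreading negative opinion:", (state == 1).mean())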
  • 48
    Publication Date: 2020-05-09
    Description: Twitter is an extensively used micro-blogging site for publishing users' views on recent happenings. This wide reachability of messages over a large audience poses a threat, as the degree of personally identifiable information disclosed might lead to user regrets. The Tweet-Scan-Post system scans tweets contextually for sensitive messages. The tweet repository was generated using cyber-keywords for personal, professional and health tweets. Rules of Sensitivity and Contextuality were defined based on standards established by various national regulatory bodies. The naive sensitivity regression function uses a Bag-of-Words model built from short text messages. The imbalanced classes in the dataset, 25% sensitive versus 75% insensitive tweets, result in misclassification. The system adopts stacked classification to combat the problem of imbalanced classes. Various state-of-the-art algorithms applied initially predicted 26% of the tweets to be sensitive; the proposed stacked classification approach increased the overall proportion of sensitive tweets to 35%. The system contributes a vocabulary set of 201 Sensitive Privacy Keywords built using a boosting approach for three tweet categories. Finally, the system formulates a sensitivity scaling called TSP's Tweet Sensitivity Scale based on Senti-Cyber features composed of Sensitive Privacy Keywords and Cyber-keywords together with Non-Sensitive Privacy Keywords and Non-Cyber-keywords to detect the degree of disclosed sensitive information. (A hedged stacked-classifier sketch follows this record.)
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
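    A minimal sketch of stacked classification over bag-of-words features, in the spirit of the stacking step described above. The example messages, labels and choice of base learners are illustrative placeholders, not the Tweet-Scan-Post configuration.
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.ensemble import StackingClassifier, RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      # Hypothetical short messages: 1 = discloses sensitive personal information, 0 = does not.
      tweets = [
          "my new phone number is 555 0101 call me",
          "just diagnosed with diabetes starting treatment tomorrow",
          "my home address is 12 elm street apartment 4",
          "great weather for a run this morning",
          "watching the football game with friends tonight",
          "coffee with the team before the standup meeting",
      ]
      labels = [1, 1, 1, 0, 0, 0]

      # Two base learners whose predictions are combined by a logistic-regression meta-learner.
      stack = StackingClassifier(
          estimators=[("nb", MultinomialNB()), ("rf", RandomForestClassifier(random_state=0))],
          final_estimator=LogisticRegression(),
          cv=3,
      )
      model = make_pipeline(CountVectorizer(), stack)
      model.fit(tweets, labels)
      print(model.predict(["sharing my medical test results with everyone"]))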
  • 49
    Publication Date: 2020-01-30
    Description: Fog computing has become an emerging environment that provides data storage, computing and other services at the edge of the network. It can not only acquire data from terminal devices, but also provide computing services to users by opening up computing resources. Compared with cloud computing, fog devices can collaborate to provide users with powerful computing services through resource allocation. However, as many fog devices are not monitored, there are security problems. For example, since the fog server processes and maintains user information, device information, task parameters and so on, it may perform illegal resource allocation for extra benefit. In this paper, we propose a secure computing resource allocation framework for open fog computing. In our scheme, the fog server is responsible for processing computing requests and resource allocations, and the cloud audit center is responsible for auditing the behaviors of the fog servers and fog nodes. Based on the proposed security framework, our scheme can resist the attack of a single malicious node and the collusion attack of the fog server and computing devices. Furthermore, the experiments show that our scheme is efficient. For example, when the number of initial idle service devices is 40, the rejection rate of allocated tasks is 10% and the total number of sub-tasks changes from 150 to 200, the total allocation time of our scheme only changes from 15 ms to 25 ms; additionally, when the task of multiplying 5000-order matrices is tested on 10 service devices, the total computing time of our scheme is $\sim$250 s, which is better than that of a single computer (which needs more than 1500 s). Therefore, our scheme has obvious advantages for tasks that require greater computational cost, such as complex scientific computing, distributed massive data query, distributed image processing and so on.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 50
    Publication Date: 2020-08-04
    Description: Automatic search methods have been widely used for the cryptanalysis of block ciphers, especially for the most classic methods, differential and linear cryptanalysis. However, automatic search methods, whether based on MILP, SMT/SAT or CP techniques, can be inefficient when the search space is too large. In this paper, we propose three new methods to improve Matsui's branch-and-bound search algorithm, which is known as the first generic algorithm for finding the best differential and linear trails. The three methods, namely reconstructing the DDT and LAT according to weight, executing linear layer operations at minimal cost and merging two 4-bit S-boxes into one 8-bit S-box, efficiently speed up the search by shrinking the search space as much as possible and reducing the cost of executing linear layer operations. We apply our improved algorithm to DESL and GIFT, which remain hard instances for automatic search methods. As a result, we find the best differential trails for DESL (up to 14 rounds) and GIFT-128 (up to 19 rounds). The best linear trails for DESL (up to 16 rounds), GIFT-128 (up to 10 rounds) and GIFT-64 (up to 15 rounds) are also found. To the best of our knowledge, these security bounds for DESL and GIFT under the single-key scenario are given for the first time, and these are the longest exploitable (differential or linear) trails for DESL and GIFT. Furthermore, benefiting from the efficiency of the improved algorithm, we run experiments demonstrating that the clustering effect of differential trails is weak for both 13-round DES and DESL. (A hedged DDT-by-weight sketch follows this record.)
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
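    A small sketch of computing the difference distribution table (DDT) of a 4-bit S-box and grouping its non-trivial transitions by probability weight, illustrating the "reconstructing DDT according to weight" idea. The S-box below is the well-known PRESENT S-box used purely for illustration; DESL and GIFT would use their own tables.
      from collections import defaultdict
      from math import log2

      # 4-bit S-box used purely for illustration (the PRESENT S-box).
      SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

      # Difference distribution table: DDT[a][b] = #{x : S(x) ^ S(x ^ a) == b}
      DDT = [[0] * 16 for _ in range(16)]
      for a in range(16):
          for x in range(16):
              DDT[a][SBOX[x] ^ SBOX[x ^ a]] += 1

      # Group non-trivial transitions by weight (-log2 of probability), as in the search speed-up above.
      by_weight = defaultdict(list)
      for a in range(1, 16):
          for b in range(16):
              if DDT[a][b]:
                  by_weight[-log2(DDT[a][b] / 16)].append((a, b))

      for w in sorted(by_weight):
          print(f"weight {w:.1f}: {len(by_weight[w])} transitions")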
  • 51
    Publication Date: 2020-08-05
    Description: As the size of a multiprocessor system grows, the probability that faults occur in this system increases. One measure of the reliability of a multiprocessor system is the probability that a fault-free subsystem of a certain size still exists in the presence of individual faults. In this paper, we use the probabilistic fault model to establish the subgraph reliability for $AG_n$, the $n$-dimensional alternating group graph. More precisely, we first analyze the probability $R_n^{n-1}(p)$ that at least one subgraph with dimension $n-1$ is fault-free in $AG_n$, when given a uniform probability of a single vertex being fault-free. Since subgraphs of $AG_n$ intersect in rather complicated manners, we resort to the principle of inclusion–exclusion by considering intersections of up to five subgraphs and obtain an upper bound of the probability. Then we consider the probabilistic fault model when the probability of a single vertex being fault-free is nonuniform, and we show that the upper bound under these two models is very close to the lower bound obtained in a previous result, and it is better than the upper bound deduced from that of the arrangement graph, which means that the upper bound we obtained is very tight. (A generic statement of the truncated inclusion–exclusion bound follows this record.)
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
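    As a generic illustration of the truncated inclusion–exclusion used above (not the $AG_n$-specific computation), let $E_i$ denote the event that the $i$-th $(n-1)$-dimensional subgraph is fault-free; truncating the Bonferroni expansion after an odd number of terms, here the fifth-order intersections, yields an upper bound:
    $$P\Big(\bigcup_i E_i\Big) \le \sum_i P(E_i) - \sum_{i<j} P(E_i \cap E_j) + \sum_{i<j<k} P(E_i \cap E_j \cap E_k) - \sum_{i<j<k<l} P\Big(\bigcap_{t \in \{i,j,k,l\}} E_t\Big) + \sum_{i<j<k<l<m} P\Big(\bigcap_{t \in \{i,j,k,l,m\}} E_t\Big).$$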
  • 52
    Publication Date: 2020-08-05
    Description: Recently, Yang et al. proposed the first certificateless proxy signature scheme against malicious-but-passive key generation center (MKGC) attacks. They proved that their scheme can resist the MKGC attacks in the standard model. In this paper, we point out that their scheme cannot achieve this security because the adversary can forge valid signatures.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 53
    Publication Date: 2020-08-05
    Description: In this study, a cotton disease diagnosis method that uses a combined algorithm of case-based reasoning (CBR) and fuzzy logic was designed and implemented. It focuses on the prevention, diagnosis and control of diseases affecting cotton production in China. Conventional methods of disease diagnosis are primarily based on CBR with reference to user-provided symptoms; however, in most cases, user-provided symptoms do not fully meet the requirements of CBR. To address this problem, fuzzy logic is incorporated into CBR to allow for more flexible and accurate models. With the help of CBR and fuzzy reasoning, three diagnostic results can be obtained by the cotton disease diagnosis system (CDDS) constructed in this study: success, success but not exact and failure. To verify the reliability of the CDDS and its ability to diagnose cotton diseases, its diagnostic accuracy and stability were analyzed and compared with the results obtained by the traditional expert scoring method. The analysis results reveal that the CDDS can achieve a high diagnostic success rate (above 90%) and better diagnostic stability than the traditional expert scoring method when at least four disease symptoms are input. The CDDS provides an independent and objective source of information to assist farmers in the diagnosis and prevention of cotton diseases.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 54
    Publication Date: 2020-08-04
    Description: Thermal sensors are an emerging technology in image processing applications such as face recognition, fault detection, object detection and classification, and navigation. Owing to their versatility, they have recently attracted considerable attention from researchers. Thermal sensors can sense an object regardless of the lighting conditions. Exploiting this advantage, we propose a novel scheme for spotting the object targeted by a specific thermal camera. Accomplishing this task paves the way for adequately guiding visually impaired (VI) people within an indoor environment, since identifying the obstacles in the user's path is a prerequisite for their navigation. The image of the object is captured using the thermal camera and pre-processed to enhance its quality by suppressing the background, tuning the colour channels and so on. Noise in the thermal image is eradicated to a certain extent using a Gaussian smoothing process, followed by a Markov random field for constructing the Gaussian mixture model. The pattern is then deduced and classified using a least-squares support-vector machine. The experiment was tested at different times and distances, and the optimum solution was obtained. The main motivation behind this fused concept is to achieve an accurate outcome with a short estimation period at affordable size and cost. (A hedged smoothing-and-mixture sketch follows this record.)
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
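    A minimal sketch of the pre-processing and mixture-modelling steps on a hypothetical thermal frame: Gaussian smoothing with OpenCV, then a two-component Gaussian mixture over pixel intensities to separate the warm object from the cooler background. The file name is a placeholder, and the Markov random field refinement and LS-SVM classification described above are omitted.
      import cv2
      import numpy as np
      from sklearn.mixture import GaussianMixture

      # Hypothetical single-channel thermal frame.
      thermal = cv2.imread("thermal_frame.png", cv2.IMREAD_GRAYSCALE)

      # Gaussian smoothing to suppress sensor noise before modelling.
      smoothed = cv2.GaussianBlur(thermal, (5, 5), 1.0)

      # Two-component Gaussian mixture over pixel intensities: warm object vs. cooler background.
      pixels = smoothed.reshape(-1, 1).astype(np.float64)
      gmm = GaussianMixture(n_components=2, random_state=0).fit(pixels)
      labels = gmm.predict(pixels).reshape(smoothed.shape)

      # Keep the component with the higher mean intensity as the candidate warm object.
      warm = int(np.argmax(gmm.means_.ravel()))
      mask = (labels == warm).astype(np.uint8) * 255
      cv2.imwrite("object_mask.png", mask)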
  • 55
    Publication Date: 2020-10-24
    Description: Enhancer-promoter interactions (EPIs) play an important role in transcriptional regulation. Recently, machine learning-based methods have been widely used in the genome-scale identification of EPIs due to their promising predictive performance. In this paper, we propose a novel method, termed EPI-DLMH, for predicting EPIs with the use of DNA sequences only. EPI-DLMH consists of three major steps. First, a two-layer convolutional neural network is used to learn local features, and a bidirectional gated recurrent unit network is used to capture long-range dependencies on the sequences of promoters and enhancers. Second, an attention mechanism is used for focusing on relatively important features. Finally, a matching heuristic mechanism is introduced for the exploration of the interaction between enhancers and promoters. We use benchmark datasets in evaluating and comparing the proposed method with existing methods. Comparative results show that our model is superior to currently existing models in multiple cell lines. Specifically, we found that the matching heuristic mechanism introduced into the proposed model mainly contributes to the improvement of performance in terms of overall accuracy. Additionally, compared with existing models, our model is more efficient with regard to computational speed. (A hedged CNN/BiGRU sketch follows this record.)
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
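    A minimal Keras sketch of the first step described above: convolutional layers for local sequence features followed by a bidirectional GRU for long-range dependencies, applied to one-hot encoded enhancer and promoter windows. The window length and layer sizes are hypothetical, and the attention and matching-heuristic stages of EPI-DLMH are not reproduced.
      import tensorflow as tf
      from tensorflow.keras import layers, Model

      SEQ_LEN = 3000   # hypothetical enhancer/promoter window length, one-hot A/C/G/T

      def sequence_branch(name):
          inp = layers.Input(shape=(SEQ_LEN, 4), name=name)
          x = layers.Conv1D(64, 40, activation="relu")(inp)   # local sequence features
          x = layers.MaxPooling1D(20)(x)
          x = layers.Conv1D(64, 20, activation="relu")(x)
          x = layers.MaxPooling1D(10)(x)
          x = layers.Bidirectional(layers.GRU(32))(x)          # long-range dependencies
          return inp, x

      enh_in, enh_feat = sequence_branch("enhancer")
      pro_in, pro_feat = sequence_branch("promoter")

      merged = layers.concatenate([enh_feat, pro_feat])
      out = layers.Dense(1, activation="sigmoid")(merged)      # probability of an EPI

      model = Model([enh_in, pro_in], out)
      model.compile(optimizer="adam", loss="binary_crossentropy",
                    metrics=[tf.keras.metrics.AUC()])
      model.summary()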
  • 56
    Publication Date: 2020-07-30
    Description: Motivation Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions. Results We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace. Availability and implementation https://github.com/smelton/SMD. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 57
    Publication Date: 2020-10-26
    Description: Motivation Polymerase chain reaction (PCR) has been a revolutionary biomedical advancement. However, for PCR to be appropriately used, one must spend a significant amount of effort on PCR primer design. Carefully designed PCR primers not only increase sensitivity and specificity, but also decrease effort spent on experimental optimization. Computer software removes the human element by performing and automating the complex and rigorous calculations required in PCR primer design. Classification and review of the available software options and their capabilities should be a valuable resource for any PCR application. Results This paper focuses on currently available free PCR primer design software and their major functions (https://pcrprimerdesign.github.io/). The software are classified according to their PCR applications, such as Sanger sequencing, reverse transcription quantitative PCR, single nucleotide polymorphism detection, splicing variant detection, methylation detection, microsatellite detection, multiplex PCR and targeted next generation sequencing, and conserved/degenerate primers to clone orthologous genes from related species, new gene family members in the same species, or to detect a group of related pathogens. Each software is summarized to provide a technical review of their capabilities and utilities.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 58
    Publication Date: 2020-10-18
    Description: Motivation Network diffusion and label propagation are fundamental tools in computational biology, with applications like gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, due to concerns that network topology can bias diffusion scores. This opens the question of the statistical properties and the presence of bias of such diffusion processes in each of their applications. In this work, we characterised some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels. Results Diffusion scores starting from binary labels were affected by the label codification, and exhibited a problem-dependent topological bias that could be removed by the statistical normalisation. Parametric and non-parametric normalisation addressed both points by being codification-independent and by equalising the bias. We identified and quantified two sources of bias (mean value and variance) that yielded performance differences when normalising the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Although none of the proposed scores systematically outperformed the others, normalisation was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities. Availability The code is publicly available at https://github.com/b2slab/diffuBench Supplementary information Supplementary data are available at Bioinformatics online. (A hedged diffusion-and-permutation sketch follows this record.)
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
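    A generic sketch of the permutation analysis discussed above: binary labels are propagated over a small random graph with a regularised Laplacian kernel, and each node's raw score is converted into a z-score against scores obtained from permuted labels. This illustrates the normalisation idea only; it is not one of the diffuBench scoring functions.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 50
      A = rng.random((n, n)) < 0.08
      A = np.triu(A, 1)
      A = (A | A.T).astype(float)                                # random undirected network
      D = np.diag(A.sum(axis=1))
      K = np.linalg.inv(np.eye(n) + 0.5 * (D - A))               # regularised Laplacian diffusion kernel

      y = np.zeros(n)
      y[rng.choice(n, 5, replace=False)] = 1.0                   # binary input labels (seed nodes)
      raw = K @ y                                                # raw diffusion scores

      # Permutation null: rediffuse randomly permuted labels to estimate per-node mean and variance.
      perms = np.stack([K @ rng.permutation(y) for _ in range(1000)])
      z = (raw - perms.mean(axis=0)) / perms.std(axis=0)         # normalised (z-score) diffusion scores

      print("top node by raw score:", int(raw.argmax()), "after normalisation:", int(z.argmax()))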
  • 59
    Publication Date: 2020-07-29
    Description: Motivation MicroRNAs (miRNAs) are a class of non-coding RNAs that play critical roles in various biological processes. Many studies have shown that miRNAs are closely related to the occurrence, development and diagnosis of human diseases. Traditional biological experiments are costly and time consuming. As a result, effective computational models have become increasingly popular for predicting associations between miRNAs and diseases, which could effectively boost human disease diagnosis and prevention. Results We propose a novel computational framework, called AEMDA, to identify associations between miRNAs and diseases. AEMDA applies a learning-based method to extract dense and high-dimensional representations of diseases and miRNAs from integrated disease semantic similarity, miRNA functional similarity and heterogeneous related interaction data. In addition, AEMDA adopts a deep autoencoder that does not need negative samples to retrieve the underlying associations between miRNAs and diseases. Furthermore, the reconstruction error is used as a measurement to predict disease-associated miRNAs. Our experimental results indicate that AEMDA can effectively predict disease-related miRNAs and outperforms state-of-the-art methods. Availability and implementation The source code and data are available at https://github.com/CunmeiJi/AEMDA. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 60
    Publication Date: 2020-05-19
    Description: Motivation Different from traditional linear RNAs (containing 5′ and 3′ ends), circular RNAs (circRNAs) are a special type of RNAs that have a closed ring structure. Accumulating evidence has indicated that circRNAs can directly bind proteins and participate in a myriad of different biological processes. Results For identifying the interaction of circRNAs with 37 different types of circRNA-binding proteins (RBPs), we develop an ensemble neural network, termed PASSION, which is based on the concatenated artificial neural network (ANN) and hybrid deep neural network frameworks. Specifically, the input of the ANN is the optimal feature subset for each RBP, which has been selected from six types of feature encoding schemes through incremental feature selection and application of the XGBoost algorithm. In turn, the input of the hybrid deep neural network is a stacked codon-based scheme. Benchmarking experiments indicate that the ensemble neural network reaches the average best area under the curve (AUC) of 0.883 across the 37 circRNA datasets when compared with XGBoost, k-nearest neighbor, support vector machine, random forest, logistic regression and Naive Bayes. Moreover, each of the 37 RBP models is extensively tested by performing independent tests, with the varying sequence similarity thresholds of 0.8, 0.7, 0.6 and 0.5, respectively. The corresponding average AUC obtained are 0.883, 0.876, 0.868 and 0.883, respectively, highlighting the effectiveness and robustness of PASSION. Extensive benchmarking experiments demonstrate that PASSION achieves a competitive performance for identifying binding sites between circRNA and RBPs, when compared with several state-of-the-art methods. Availability and implementation A user-friendly web server of PASSION is publicly accessible at http://flagship.erc.monash.edu/PASSION/. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 61
    Publication Date: 2020-09-25
    Description: Summary The COVID-19 crisis has elicited a global response by the scientific community that has led to a burst of publications on the pathophysiology of the virus. However, without coordinated efforts to organize this knowledge, it can remain hidden away from individual research groups. By extracting and formalizing this knowledge in a structured and computable form, as in the form of a knowledge graph, researchers can readily reason and analyze this information on a much larger scale. Here, we present the COVID-19 Knowledge Graph, an expansive cause-and-effect network constructed from scientific literature on the new coronavirus that aims to provide a comprehensive view of its pathophysiology. To make this resource available to the research community and facilitate its exploration and analysis, we also implemented a web application and released the KG in multiple standard formats. Availability The COVID-19 Knowledge Graph is publicly available under CC-0 license at https://github.com/covid19kg and https://bikmi.covid19-knowledgespace.de. Supplementary information Supplementary data are available online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 62
    Publication Date: 2020-01-30
    Description: Motivation One of the most important problems in drug discovery research is to precisely predict a new indication for an existing drug, i.e. drug repositioning. Recent recommendation system-based methods have tackled this problem using matrix completion models. The models identify latent factors contributing to known drug-disease associations, and then infer novel drug-disease associations by the correlations between latent factors. However, these models have not fully considered the various drug data sources and the sparsity of the drug-disease association matrix. In addition, using the global structure of the drug-disease association data may introduce noise, and consequently limit the prediction power. Results In this work, we propose a novel drug repositioning approach by using Bayesian inductive matrix completion (DRIMC). First, we embed four drug data sources into a drug similarity matrix and two disease data sources in a disease similarity matrix. Then, for each drug or disease, its feature is described by similarity values between it and its nearest neighbors, and these features for drugs and diseases are mapped onto a shared latent space. We model the association probability for each drug-disease pair by inductive matrix completion, where the properties of drugs and diseases are represented by projections of drugs and diseases, respectively. As the known drug-disease associations have been manually verified, they are more trustworthy and important than the unknown pairs. We assign higher confidence levels to known association pairs compared with unknown pairs. We perform comprehensive experiments on three benchmark datasets, and DRIMC improves prediction accuracy compared with six state-of-the-art approaches. Availability and implementation Source code and datasets are available at https://github.com/linwang1982/DRIMC. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 63
    Publication Date: 2020-10-24
    Description: Motivation Neural methods to extract drug-drug interactions (DDIs) from literature require a large number of annotations. In this study, we propose a novel method to effectively utilize external drug database information as well as information from large-scale plain text for DDI extraction. Specifically, we focus on drug description and molecular structure information as the drug database information. Results We evaluated our approach on the DDIExtraction 2013 shared task data set. We obtained the following results. First, large-scale raw text information can greatly improve the performance of extracting DDIs when combined with the existing model, and it shows state-of-the-art performance. Second, the drug description and the molecular structure information are each helpful in further improving DDI performance for some specific DDI types. Finally, the simultaneous use of the drug description and molecular structure information can significantly improve the performance on all the DDI types. We showed that the plain text, the drug description information and the molecular structure information are complementary and that their effective combination is essential for the improvement. Availability https://github.com/tticoin/DESC_MOL-DDIE
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 64
    Publication Date: 2020-04-24
    Description: Summary Bulk RNA sequencing studies have demonstrated that human leukocyte antigen (HLA) genes may be expressed in a cell type-specific and allele-specific fashion. Single-cell gene expression assays have the potential to further resolve these expression patterns, but currently available methods do not perform allele-specific quantification at the molecule level. Here, we present scHLAcount, a post-processing workflow for single-cell RNA-seq data that computes allele-specific molecule counts of the HLA genes based on a personalized reference constructed from the sample’s HLA genotypes. Availability and implementation scHLAcount is available under the MIT license at https://github.com/10XGenomics/scHLAcount. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 65
    Publication Date: 2020-03-30
    Description: Motivation Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence. Results We developed a new algorithm, FUpred, which predicts protein domain boundaries utilizing contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts, while minimizing the number of inter-domain contacts from the contact maps. FUpred was tested on a large-scale dataset consisting of 2549 proteins and generated correct single- and multi-domain classifications with a Matthews correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins. Availability and implementation https://zhanglab.ccmb.med.umich.edu/FUpred. Supplementary information Supplementary data are available at Bioinformatics online. (A hedged contact-map split-scoring sketch follows this record.)
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
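    A small sketch of the core objective stated above: choose a single domain boundary that maximises intra-domain contacts while minimising inter-domain contacts on a toy "predicted" contact map. FUpred's actual scoring, its neural-network contact maps and its handling of discontinuous domains are considerably more involved.
      import numpy as np

      rng = np.random.default_rng(0)
      L = 120                                             # hypothetical protein length

      # Toy contact map: dense within residues 0-59 and 60-119, sparse between the two blocks.
      cmap = (rng.random((L, L)) < 0.02).astype(float)
      cmap[:60, :60] += rng.random((60, 60)) < 0.15
      cmap[60:, 60:] += rng.random((60, 60)) < 0.15
      cmap = np.clip(cmap + cmap.T, 0, 1)

      def split_score(cmap, b):
          """Intra-domain contacts minus inter-domain contacts for a boundary at residue b."""
          intra = cmap[:b, :b].sum() + cmap[b:, b:].sum()
          inter = cmap[:b, b:].sum()
          return intra - 2.0 * inter                      # the weight on inter-domain contacts is a free choice

      scores = {b: split_score(cmap, b) for b in range(20, L - 20)}
      best = max(scores, key=scores.get)
      print("predicted two-domain boundary near residue", best)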
  • 66
    Publication Date: 2020-04-07
    Description: Motivation Protein structure and function are essentially determined by how the side-chain atoms interact with each other. Thus, accurate protein side-chain packing (PSCP) is a critical step toward protein structure prediction and protein design. Despite the importance of the problem, however, the accuracy and speed of current PSCP programs are still not satisfactory. Results We present FASPR for fast and accurate PSCP by using an optimized scoring function in combination with a deterministic searching algorithm. The performance of FASPR was compared with four state-of-the-art PSCP methods (CISRR, RASP, SCATD and SCWRL4) on both native and non-native protein backbones. For the assessment on native backbones, FASPR achieved a good performance by correctly predicting 69.1% of all the side-chain dihedral angles using a stringent tolerance criterion of 20°, compared favorably with SCWRL4, CISRR, RASP and SCATD which successfully predicted 68.8%, 68.6%, 67.8% and 61.7%, respectively. Additionally, FASPR achieved the highest speed for packing the 379 test protein structures in only 34.3 s, which was significantly faster than the control methods. For the assessment on non-native backbones, FASPR showed an equivalent or better performance on I-TASSER predicted backbones and the backbones perturbed from experimental structures. Detailed analyses showed that the major advantage of FASPR lies in the optimal combination of the dead-end elimination and tree decomposition with a well optimized scoring function, which makes FASPR of practical use for both protein structure modeling and protein design studies. Availability and implementation The web server, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/FASPR and https://github.com/tommyhuangthu/FASPR. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 67
    Publication Date: 2020-04-21
    Description: Motivation Next-generation sequencing is rapidly improving diagnostic rates in rare Mendelian diseases, but even with whole genome or whole exome sequencing, the majority of cases remain unsolved. Increasingly, RNA sequencing is being used to solve many cases that evade diagnosis through sequencing alone. Specifically, the detection of aberrant splicing in many rare disease patients suggests that identifying RNA splicing outliers is particularly useful for determining causal Mendelian disease genes. However, there is as yet a paucity of statistical methodologies to detect splicing outliers. Results We developed LeafCutterMD, a new statistical framework that significantly improves the previously published LeafCutter in the context of detecting outlier splicing events. Through simulations and analysis of real patient data, we demonstrate that LeafCutterMD has better power than the state-of-the-art methodology while controlling false-positive rates. When applied to a cohort of disease-affected probands from the Mayo Clinic Center for Individualized Medicine, LeafCutterMD recovered all aberrantly spliced genes that had previously been identified by manual curation efforts. Availability and implementation The source code for this method is available under the opensource Apache 2.0 license in the latest release of the LeafCutter software package available online at http://davidaknowles.github.io/leafcutter. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 68
    Publication Date: 2020-04-04
    Description: Motivation Programmed DNA elimination (PDE) plays a crucial role in the transitions between germline and somatic genomes in diverse organisms ranging from unicellular ciliates to multicellular nematodes. However, software specific for the detection of DNA splicing events is scarce. In this paper, we describe Accurate Deletion Finder (ADFinder), an efficient detector of PDEs using high-throughput sequencing data. ADFinder can predict PDEs with relatively low sequencing coverage, detect multiple alternative splicing forms in the same genomic location and calculate the frequency for each splicing event. This software will facilitate research of PDEs and all down-stream analyses. Results By analyzing genome-wide DNA splicing events in two micronuclear genomes of Oxytricha trifallax and Tetrahymena thermophila, we prove that ADFinder is effective in predicting large scale PDEs. Availability and implementation The source codes and manual of ADFinder are available in our GitHub website: https://github.com/weibozheng/ADFinder. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 69
    Publication Date: 2020-10-29
    Description: The development of new drugs is costly, time consuming and often accompanied by safety issues. Drug repurposing can avoid the expensive and lengthy process of drug development by finding new uses for already approved drugs. In order to repurpose drugs effectively, it is useful to know which proteins are targeted by which drugs. Computational models that estimate the interaction strength of new drug–target pairs have the potential to expedite drug repurposing. Several models have been proposed for this task. However, these models represent the drugs as strings, which is not a natural way to represent molecules. We propose a new model called GraphDTA that represents drugs as graphs and uses graph neural networks to predict drug–target affinity. We show that graph neural networks not only predict drug–target affinity better than non-deep learning models, but also outperform competing deep learning methods. Our results confirm that deep learning models are appropriate for drug–target binding affinity prediction, and that representing drugs as graphs can lead to further improvements. Availability of data and materials The proposed models are implemented in Python. Related data, pre-trained models, and source code are publicly available at https://github.com/thinng/GraphDTA. All scripts and data needed to reproduce the post-hoc statistical analysis are available from https://doi.org/10.5281/zenodo.3603523. (A hedged molecule-to-graph sketch follows this record.)
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
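    A minimal RDKit sketch of representing a drug as a graph (atoms as nodes, bonds as edges), the kind of input a graph neural network such as GraphDTA consumes. The SMILES string is an arbitrary example (aspirin), the single node feature is a simplification, and the network and affinity prediction head are not shown.
      import numpy as np
      from rdkit import Chem

      smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"        # aspirin, used only as an example ligand
      mol = Chem.MolFromSmiles(smiles)

      # Node features: one simple descriptor per atom (atomic number); real models use richer features.
      node_features = np.array([[atom.GetAtomicNum()] for atom in mol.GetAtoms()], dtype=float)

      # Edge list: each bond contributes two directed edges so the graph is symmetric.
      edges = []
      for bond in mol.GetBonds():
          i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
          edges += [(i, j), (j, i)]
      edge_index = np.array(edges, dtype=int).T   # shape (2, num_edges), the layout used by graph libraries

      print("atoms:", node_features.shape[0], "directed edges:", edge_index.shape[1])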
  • 70
    Publication Date: 2020-04-06
    Description: Summary Fully realizing the promise of personalized medicine will require rapid and accurate classification of pathogenic human variation. Multiplexed assays of variant effect (MAVEs) can experimentally test nearly all possible variants in selected gene targets. Planning a MAVE study involves identifying target genes with clinical impact, and identifying scalable functional assays for that target. Here, we describe MaveQuest, a web-based resource enabling systematic variant effect mapping studies by identifying potential functional assays, disease phenotypes and clinical relevance for nearly all human protein-coding genes. Availability and implementation MaveQuest service: https://mavequest.varianteffect.org/. MaveQuest source code: https://github.com/kvnkuang/mavequest-front-end/. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 71
    Publication Date: 2020-10-16
    Description: Motivation Nearly 40% of the genes in sequenced genomes have no experimentally- or computationally-derived functional annotations. To fill this gap, we seek to develop methods for network-based gene function prediction that can integrate heterogeneous data for multiple species with experimentally-based functional annotations and systematically transfer them to newly-sequenced organisms on a genomewide scale. However, the large sizes of such networks pose a challenge for the scalability of current methods. Results We develop a label propagation algorithm called FastSinkSource. By formally bounding its rate of progress, we decrease the running time by a factor of 100 without sacrificing accuracy. We systematically evaluate many approaches to construct multi-species bacterial networks and apply FastSinkSource and other state-of-the-art methods to these networks. We find that the most accurate and efficient approach is to pre-compute annotation scores for species with experimental annotations, and then to transfer them to other organisms. In this manner, FastSinkSource runs in under three minutes for 200 bacterial species. Availability and Implementation An implementation of our framework and all data used in this research are available at https://github.com/Murali-group/multi-species-GOA-prediction. Contact murali@cs.vt.edu Supplementary Information Supplementary information is available at Bioinformatics online. (A hedged label-propagation sketch follows this record.)
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
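    A generic sketch of the iterative propagation at the heart of SinkSource-style label propagation: scores for unannotated proteins are repeatedly updated from their neighbours until the change falls below a tolerance, the quantity whose rate of decrease FastSinkSource bounds to cut running time. The network and annotations below are hypothetical, and this is not the authors' implementation.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 200
      W = (rng.random((n, n)) < 0.03).astype(float)
      W = np.triu(W, 1)
      W = W + W.T                                           # hypothetical functional-association network

      y = np.zeros(n)
      y[rng.choice(n, 10, replace=False)] = 1.0             # proteins with an experimental annotation
      alpha = 0.9

      # Row-normalised propagation: s <- alpha * P s + (1 - alpha) * y, iterated to near convergence.
      P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
      s = y.copy()
      for it in range(1000):
          s_new = alpha * (P @ s) + (1.0 - alpha) * y
          if np.abs(s_new - s).max() < 1e-6:                # bounding this residual is what saves time
              break
          s = s_new

      print(f"converged after {it + 1} iterations; top candidates: {np.argsort(s)[::-1][:5]}")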
  • 72
    Publication Date: 2020-10-16
    Description: Motivation Infection with strains of different subtypes and the subsequent crossover reading between the two strands of genomic RNAs by host cells' reverse transcriptase are the main causes of the vast HIV-1 sequence diversity. Such inter-subtype genomic recombinants can become circulating recombinant forms (CRFs) after widespread transmission in a population. Complete prediction of all the subtype sources of a CRF strain is a complicated machine learning problem. It is also difficult to understand whether a strain is an emerging new subtype and, if so, how to accurately identify the new components of the genetic source. Results We introduce a multi-label learning algorithm for the complete prediction of multiple sources of a CRF sequence as well as the prediction of its chronological number. The prediction is strengthened by a voting of various multi-label learning methods to avoid biased decisions. In our approach, both frequency and position features of the sequences are extracted to capture signature patterns of pure subtypes and CRFs. The method was applied to 7185 HIV-1 sequences, comprising 5530 pure subtype sequences and 1655 CRF sequences. Results have demonstrated that the method can achieve very high accuracy (reaching 99%) in the prediction of the complete set of labels of HIV-1 recombinant forms. A few wrong predictions are actually incomplete predictions, very close to the complete set of genuine labels. Availability https://github.com/Runbin-tang/The-source-of-HIV-CRFs-prediction Contact yuzuguo@aliyun.com;jinyan.li@uts.edu.au Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 73
    Publication Date: 2020-08-05
    Description: Motivation Despite the lack of a folded structure, intrinsically disordered regions (IDRs) of proteins play versatile roles in various biological processes, and many nonsynonymous single nucleotide variants (nsSNVs) in IDRs are associated with human diseases. The continuous accumulation of nsSNVs resulting from the wide application of NGS has driven the development of disease-association prediction methods for decades. However, their performance on nsSNVs in IDRs remains inferior, possibly due to the domination of nsSNVs from structured regions in training data. Therefore, there is a strong demand for a disease-association predictor built specifically for nsSNVs in IDRs with better performance. Results We present IDRMutPred, a machine learning-based tool specifically for predicting disease-associated germline nsSNVs in IDRs. Based on 17 selected optimal features that are extracted from sequence alignments, protein annotations, hydrophobicity indices and disorder scores, IDRMutPred was trained using three ensemble learning algorithms on the training dataset containing only IDR nsSNVs. The evaluation on the two testing datasets shows that all three prediction models outperform 17 other popular general predictors significantly, achieving an ACC between 0.856 and 0.868 and an MCC between 0.713 and 0.737. IDRMutPred will prioritize disease-associated IDR germline nsSNVs more reliably than general predictors. Availability and implementation The software is freely available at http://www.wdspdb.com/IDRMutPred. Contact yezq@pku.org.cn or ydwu@pku.edu.cn Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 74
    Publication Date: 2020-10-29
    Description: Motivation The rapid development of sequencing technologies has enabled us to generate a large number of metagenomic reads from genetic materials in microbial communities, making it possible to gain deep insights into the differences between the genetic materials of different groups of microorganisms, such as bacteria, viruses and plasmids. Computational methods based on k-mer frequencies have been shown to be highly effective for classifying metagenomic sequencing reads into different groups. However, such methods usually use all the k-mers as features for prediction without selecting relevant k-mers for the different groups of sequences, i.e. unique nucleotide patterns carrying biological significance. Results To select k-mers for distinguishing different groups of sequences with guaranteed false discovery rate (FDR) control, we develop KIMI, a general framework based on model-X Knockoffs, regarded as the state-of-the-art statistical method for FDR control, for sequence motif discovery with an arbitrary target FDR level, such that reproducibility can be theoretically guaranteed. KIMI is shown through simulation studies to be effective in simultaneously controlling FDR and yielding high power, outperforming the broadly used Benjamini-Hochberg (B-H) procedure and the q-value method for FDR control. To illustrate the usefulness of KIMI in analyzing real datasets, we take the viral motif discovery problem as an example and implement KIMI on a real dataset consisting of viral and bacterial contigs. We show that the accuracy of predicting viral and bacterial contigs can be increased by training the prediction model only on relevant k-mers selected by KIMI. Availability Our implementation of KIMI is available at https://github.com/xinbaiusc/KIMI. Supplementary information Supplementary Materials are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 75
    Publication Date: 2020-04-14
    Description: Motivation The failure of therapeutic peptides in clinical trials can often be attributed to toxicity profiles such as hemolytic activity, which hamper their further progress as drug candidates. Accurate prediction of hemolytic peptides (HLPs) and their activity from a given peptide sequence is a challenging task in immunoinformatics and is essential for drug development and basic research. Although a few computational methods have been proposed for this task, none of them can identify HLPs and their activities simultaneously. Results In this study, we proposed a two-layer prediction framework, called HLPpred-Fuse, that can accurately and automatically predict both hemolytic peptides (HLPs or non-HLPs) and HLP activity (high or low). More specifically, a feature representation learning scheme was used to generate 54 probabilistic features by integrating six different machine learning classifiers and nine different sequence-based encodings. The 54 probabilistic features were then fused to provide sufficiently converged sequence information, which was used as input to an extremely randomized tree for the development of two final prediction models that independently identify HLPs and their activity. Performance comparisons against state-of-the-art methods, based on empirical cross-validation analysis, an independent test and a case study, demonstrate that HLPpred-Fuse consistently outperformed these methods in the identification of hemolytic activity. Availability and implementation For the convenience of experimental scientists, a web-based tool has been established at http://thegleelab.org/HLPpred-Fuse. Contact glee@ajou.ac.kr or watshara.sho@mahidol.ac.th or bala@ajou.ac.kr Supplementary information Supplementary data are available at Bioinformatics online.
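    The two-layer idea can be sketched as a stacking pipeline: out-of-fold class probabilities from several base classifiers, one per sequence encoding, are concatenated and passed to an extremely randomized tree meta-classifier. The encodings, base classifiers and data below are placeholders, not the published configuration.

```python
# Stacking sketch: probabilistic features from base models fused into an ExtraTrees meta-model.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(1)
n = 400
encodings = [rng.normal(size=(n, 40)), rng.normal(size=(n, 20))]   # stand-ins for AAC, DPC, ...
y = rng.integers(0, 2, size=n)                                     # 1 = HLP, 0 = non-HLP

base_models = [LogisticRegression(max_iter=1000),
               RandomForestClassifier(n_estimators=200, random_state=1),
               SVC(probability=True, random_state=1)]

prob_features = []
for X_enc in encodings:
    for clf in base_models:
        # out-of-fold probability of the positive class: one feature per (encoding, model) pair
        prob_features.append(cross_val_predict(clf, X_enc, y, cv=5, method="predict_proba")[:, 1])

meta_X = np.column_stack(prob_features)
meta = ExtraTreesClassifier(n_estimators=500, random_state=1).fit(meta_X, y)
print("fused probabilistic feature matrix:", meta_X.shape)
```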
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 76
    Publication Date: 2020-10-28
    Description: Motivation Identification of blood-brain barrier (BBB) permeability of a compound is a major challenge in neurotherapeutic drug discovery. Conventional approaches for measuring BBB permeability are expensive, time-consuming and labor-intensive. BBB permeability is associated with diverse chemical properties of compounds; however, existing BBB permeability prediction models have been developed using small datasets and limited features, and are usually impractical due to their low coverage of compound chemical diversity. The aim of this study is to develop a BBB permeability prediction model using a large dataset for practical applications. This model can be used for facilitated compound screening in the early stage of brain drug discovery. Results A dataset of 7162 compounds with BBB permeability (5453 BBB+ and 1709 BBB-) was compiled from the literature, where BBB+ and BBB- denote BBB-permeable and non-permeable compounds, respectively. We trained a machine learning model based on the Light Gradient Boosting Machine (LightGBM) algorithm and achieved an overall accuracy of 89%, an area under the curve (AUC) of 0.93, a specificity of 0.77 and a sensitivity of 0.93 under ten-fold cross-validation. The model was further evaluated using 74 central nervous system (CNS) compounds (39 BBB+ and 35 BBB-) obtained from the literature and showed an accuracy of 90%, a sensitivity of 0.85 and a specificity of 0.94. Our model outperforms existing BBB permeability prediction models. Availability The prediction server is available at http://ssbio.cau.ac.kr/software/bbb.
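    A hedged sketch of the modelling setup described above (not the authors' code): a LightGBM classifier evaluated by ten-fold cross-validated AUC on a placeholder matrix of compound descriptors.

```python
# LightGBM BBB+/- classifier with 10-fold cross-validated AUC (placeholder data).
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 50))     # stand-in for molecular descriptors / fingerprints
y = rng.integers(0, 2, size=1000)   # 1 = BBB+, 0 = BBB-

clf = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05, random_state=2)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)
auc = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print("mean 10-fold AUC: %.3f" % auc.mean())
```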
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 77
    Publication Date: 2020-01-08
    Description: Summary ChemBioServer 2.0 is the advanced sequel of a web server for filtering, clustering and networking of chemical compound libraries facilitating both drug discovery and repurposing. It provides researchers the ability to (i) browse and visualize compounds along with their physicochemical and toxicity properties, (ii) perform property-based filtering of compounds, (iii) explore compound libraries for lead optimization based on perfect match substructure search, (iv) re-rank virtual screening results to achieve selectivity for a protein of interest against different protein members of the same family, selecting only those compounds that score high for the protein of interest, (v) perform clustering among the compounds based on their physicochemical properties providing representative compounds for each cluster, (vi) construct and visualize a structural similarity network of compounds providing a set of network analysis metrics, (vii) combine a given set of compounds with a reference set of compounds into a single structural similarity network providing the opportunity to infer drug repurposing due to transitivity, (viii) remove compounds from a network based on their similarity with unwanted substances (e.g. failed drugs) and (ix) build custom compound mining pipelines. Availability and implementation http://chembioserver.vi-seem.eu.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 78
    Publication Date: 2020-10-27
    Description: Motivation Structured semantic resources, for example biological knowledge bases (KBs) and ontologies, formally define biological concepts, entities and their semantic relationships, manifested as structured axioms and unstructured texts (e.g. textual definitions). These resources contain accurate expressions of biological reality and have been used by machine-learning models to support intelligent applications such as knowledge discovery. Current methods use both the axioms and the definitions as plain text in representation learning. However, because the axioms are machine-readable whereas natural language is written for humans, differences in token meaning and structure prevent such representations from encoding the desired biological knowledge. Results We propose ERBK, a representation learning model of bio-entities. Instead of treating the axioms and definitions as a single textual corpus, our method uses a knowledge graph embedding method and deep convolutional neural models to encode the axioms and the definitions, respectively. The resulting representations not only encode more of the underlying biological knowledge but can also be applied in zero-shot settings where existing approaches fall short. Experimental evaluations show that ERBK outperforms existing methods for predicting protein-protein interactions and gene-disease associations, and that it maintains promising performance under the zero-shot setting. We believe the representations and the method are general and could be extended to other types of bio-relations. Availability The source code is available at the GitLab repository https://gitlab.com/BioAI/erbk Supplementary information Supplementary data are available at Bioinformatics online.
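    To make the axiom-embedding component concrete, here is a toy TransE-style knowledge-graph embedding update in plain NumPy: true triples (head, relation, tail) are pulled toward h + r ≈ t, corrupted triples are pushed apart. It illustrates the general technique only; ERBK's actual model and its convolutional encoder for textual definitions are not reproduced here, and the tiny graph is invented.

```python
# Toy TransE-style training loop on a miniature biological knowledge graph.
import numpy as np

rng = np.random.default_rng(3)
entities = {"GeneA": 0, "DiseaseX": 1, "ProteinB": 2}
relations = {"associated_with": 0, "encodes": 1}
triples = [("GeneA", "associated_with", "DiseaseX"), ("GeneA", "encodes", "ProteinB")]

dim, lr, margin = 16, 0.01, 1.0
E = rng.normal(scale=0.1, size=(len(entities), dim))    # entity embeddings
R = rng.normal(scale=0.1, size=(len(relations), dim))   # relation embeddings

def dist(h, r, t):
    return np.sum((E[h] + R[r] - E[t]) ** 2)             # squared L2: small for true triples

for _ in range(200):
    for h_name, r_name, t_name in triples:
        h, r, t = entities[h_name], relations[r_name], entities[t_name]
        t_neg = rng.integers(len(entities))               # corrupt the tail as a negative sample
        if t_neg == t or margin + dist(h, r, t) - dist(h, r, t_neg) <= 0:
            continue
        g_pos = 2 * (E[h] + R[r] - E[t])                  # pull the true triple together
        E[h] -= lr * g_pos; R[r] -= lr * g_pos; E[t] += lr * g_pos
        g_neg = 2 * (E[h] + R[r] - E[t_neg])              # push the corrupted triple apart
        E[h] += lr * g_neg; R[r] += lr * g_neg; E[t_neg] -= lr * g_neg

print("dist(GeneA, associated_with, DiseaseX) = %.3f" % dist(0, 0, 1))
```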
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 79
    Publication Date: 2020-10-14
    Description: Motivation Despite the widespread prevalence of somatic structural variants (SVs) across most tumor types, understanding of their molecular implications often remains poor. SVs are extremely heterogeneous in size and complexity, hindering the interpretation of their pathogenic role. Tools integrating large SV datasets across platforms are required to fully characterize the somatic landscape of a cancer. Results The svpluscnv R package is a Swiss army knife for the integration and interpretation of orthogonal datasets, including copy number variant (CNV) segmentation profiles and sequencing-based structural variant calls (SVC). The package implements analysis and visualization tools to evaluate chromosomal instability and ploidy, identify genes harboring recurrent SVs and detect complex rearrangements such as chromothripsis and chromoplexia. Further, it allows systematic identification of hot-spot shattered genomic regions, showing reproducibility across alternative detection methods and datasets. Availability https://github.com/ccbiolab/svpluscnv Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 80
    Publication Date: 2020-10-14
    Description: Motivation We present flexiMAP (flexible Modeling of Alternative PolyAdenylation), a new beta-regression-based method implemented in R, for discovering differential alternative polyadenylation events in standard RNA-seq data. Results We show, using both simulated and real data, that flexiMAP exhibits a good balance between specificity and sensitivity and compares favourably to existing methods, especially at low fold changes. In addition, the tests on simulated data reveal some hitherto unrecognised caveats of existing methods. Importantly, flexiMAP allows modeling of multiple known covariates that often confound the results of RNA-seq data analysis. Availability The flexiMAP R package is available at: https://github.com/kszkop/flexiMAP Scripts and data to reproduce the analysis in this paper are available at: https://doi.org/10.5281/zenodo.3689788 Supplementary information Supplementary data are available at Bioinformatics online.
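    As a rough illustration of the modelling idea (not flexiMAP itself, which fits a beta regression in R), one can model the fraction of reads supporting the proximal polyadenylation site as a function of condition and known covariates with a weighted binomial GLM in statsmodels. All column names and data below are made up.

```python
# Proportion-of-proximal-site-usage model with a binomial GLM (stand-in for beta regression).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 60
df = pd.DataFrame({
    "proximal": rng.integers(10, 200, size=n),   # reads supporting the proximal site
    "distal": rng.integers(10, 200, size=n),     # reads supporting the distal site
    "condition": rng.integers(0, 2, size=n),     # e.g. control vs treatment
    "batch": rng.integers(0, 3, size=n),         # a known covariate to adjust for
})
df["usage"] = df["proximal"] / (df["proximal"] + df["distal"])
df["total"] = df["proximal"] + df["distal"]

fit = smf.glm("usage ~ condition + C(batch)", data=df,
              family=sm.families.Binomial(), var_weights=df["total"]).fit()
print(fit.summary().tables[1])   # the 'condition' coefficient tests for differential usage
```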
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 81
    Publication Date: 2020-10-14
    Description: Motivation The characterization of enzymatic activities between molecules remains incomplete, hindering biological engineering and limiting biological discovery. In this work we develop a technique, Enzymatic Link Prediction (ELP), for predicting the likelihood of an enzymatic transformation between two molecules. ELP models enzymatic reactions catalogued in the KEGG database as a graph. ELP improves on prior work by using graph embedding to learn molecular representations that capture not only molecular and enzymatic attributes but also graph connectivity. Results We explore transductive (test nodes included in the training graph) and inductive (test nodes not part of the training graph) learning models. We show that ELP achieves high AUC when learning node embeddings using both graph connectivity and node attributes. Further, we show that graph embedding improves link prediction by 30% in AUC over fingerprint-based similarity approaches and by 8% over support vector machines. We compare ELP against rule-based methods. We also evaluate ELP for predicting links in pathway maps and for reconstructing edges in reaction networks of four common gut microbiota phyla: actinobacteria, bacteroidetes, firmicutes and proteobacteria. To emphasize the importance of graph embedding in the context of biochemical networks, we illustrate how graph embedding can guide visualization. Availability The code and datasets are available through https://github.com/HassounLab/ELP
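    A generic link-prediction sketch in the spirit of this setup (not the authors' implementation): embed the nodes of a graph with a truncated SVD of its adjacency matrix and score candidate links with a logistic regression on Hadamard edge features. A built-in NetworkX graph stands in for the molecule/reaction graph.

```python
# Link prediction from node embeddings: SVD embedding + Hadamard edge features.
import numpy as np
import networkx as nx
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

G = nx.karate_club_graph()                    # stand-in for the KEGG-derived molecule graph
nodes = list(G.nodes())
A = nx.to_numpy_array(G, nodelist=nodes)
emb = TruncatedSVD(n_components=8, random_state=5).fit_transform(A)

rng = np.random.default_rng(5)
pos = list(G.edges())                         # observed links = positive examples
non_edges = list(nx.non_edges(G))
neg = [non_edges[i] for i in rng.choice(len(non_edges), size=len(pos), replace=False)]

def edge_feature(u, v):
    return emb[nodes.index(u)] * emb[nodes.index(v)]   # Hadamard product of node embeddings

X = np.array([edge_feature(u, v) for u, v in pos + neg])
y = np.array([1] * len(pos) + [0] * len(neg))
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy: %.2f" % clf.score(X, y))
```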
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 82
    Publication Date: 2020-10-14
    Description: Summary Electronic health records (EHRs) linked with a DNA biobank provide unprecedented opportunities for biomedical research in precision medicine. The Phenome-wide association study (PheWAS) is a widely used technique for the evaluation of relationships between genetic variants and a large collection of clinical phenotypes recorded in EHRs. PheWAS analyses are typically presented as static tables and charts of summary statistics obtained from statistical tests of association between a genetic variant and individual phenotypes. Comorbidities are common and typically lead to complex, multivariate gene-disease association signals that are challenging to interpret. Discovering and interrogating multimorbidity patterns and their influence in PheWAS is difficult and time-consuming. We present PheWAS-ME: an interactive dashboard to visualize individual-level genotype and phenotype data side-by-side with PheWAS analysis results, allowing researchers to explore multimorbidity patterns and their associations with a genetic variant of interest. We expect this application to enrich PheWAS analyses by illuminating clinical multimorbidity patterns present in the data. Availability A demo PheWAS-ME application is publicly available at https://prod.tbilab.org/phewas_me/. Sample datasets are provided for exploration with the option to upload custom PheWAS results and corresponding individual-level data. Online versions of the appendices are available at https://prod.tbilab.org/phewas_me_info/. The source code is available as an R package on GitHub (https://github.com/tbilab/multimorbidity_explorer). Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 83
    Publication Date: 2020-10-14
    Description: Motivation Over the past years, many computational methods have been developed to incorporate information about phenotypes for the disease gene prioritization task. These methods generally compute the similarity between a patient's phenotypes and a database of gene-phenotype associations to find the most phenotypically similar match. Their main limitation is their reliance on knowledge about the phenotypes associated with particular genes, which is incomplete in humans as well as in many model organisms such as mouse and fish. Information about the functions of gene products and the anatomical site of gene expression is available for more genes and can also be related to phenotypes through ontologies and machine learning models. Results We developed a novel graph-based machine learning method for biomedical ontologies which is able to exploit axioms in ontologies and other graph-structured data. Using our machine learning method, we embed genes based on their associated phenotypes, the functions of the gene products and the anatomical location of gene expression. We then develop a machine learning model to predict gene–disease associations based on the associations between genes and multiple biomedical ontologies, and this model significantly improves over state-of-the-art methods. Furthermore, we significantly extend phenotype-based gene prioritization methods to all genes that are associated with phenotypes, functions or site of expression. Availability Software and data are available at https://github.com/bio-ontology-research-group/DL2Vec.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 84
    Publication Date: 2020-10-14
    Description: Summary Here we present PhyloWGA, an open source R package for conducting phylogenetic analysis and investigation of whole genome data. Availability Available at GitHub (https://github.com/radamsRHA/PhyloWGA).
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 85
    Publication Date: 2020-10-14
    Description: Motivation Efficient and accurate alignment of DNA/RNA sequence reads to each other or to a reference genome/transcriptome is an important problem in genomic analysis. Nanopore sequencing has emerged as a major sequencing technology and many long-read aligners have been designed for aligning nanopore reads. However, the high error rate makes accurate and efficient alignment difficult. Properly utilizing the noise and error characteristics inherent in the sequencing process can play a vital role in constructing a robust aligner. In this paper, we design QAlign, a pre-processor that can be used with any long-read aligner for aligning long reads to a genome/transcriptome or to other long reads. The key idea in QAlign is to convert the nucleotide reads into discretized current levels that capture the error modes of the nanopore sequencer before running them through a sequence aligner. Results We show that QAlign is able to improve alignment rates from around 80% up to 90% with nanopore reads when aligning to the genome. We also show that QAlign improves the average overlap quality by 9.2%, 2.5% and 10.8% in three real datasets for read-to-read alignment. Read-to-transcriptome alignment rates are improved from 51.6% to 75.4% and from 82.6% to 90% in two real datasets. Availability https://github.com/joshidhaivat/QAlign.git Supplementary information Supplementary data are available at Bioinformatics online.
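    The core conversion step can be sketched as follows: translate a nucleotide read into a sequence of quantized current levels using a k-mer pore model, so that a standard aligner can operate in "current space". The pore-model values and quantization granularity below are invented for illustration and are not QAlign's.

```python
# Read-to-current-level conversion sketch with a hypothetical 3-mer pore model.
from itertools import product
import numpy as np

K = 3
rng = np.random.default_rng(6)
pore_model = {"".join(p): rng.uniform(60, 120) for p in product("ACGT", repeat=K)}  # pA per k-mer

def to_current_levels(read, n_levels=4):
    levels = np.array([pore_model[read[i:i + K]] for i in range(len(read) - K + 1)])
    # quantize currents into n_levels bins spanning the pore-model range
    edges = np.linspace(min(pore_model.values()), max(pore_model.values()), n_levels + 1)
    return np.digitize(levels, edges[1:-1])

read = "ACGTTGCAAC"
print(to_current_levels(read))   # discrete symbols that a sequence aligner can consume
```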
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 86
    Publication Date: 2020-10-14
    Description: Motivation A phylogenetic tree reconciliation is a mapping of one phylogenetic tree onto another which represents the co-evolution of two sets of taxa (e.g. parasite-host co-evolution, gene-species co-evolution). The reconciliation framework was extended to allow modeling the co-evolution of three sets of taxa, such as transcript-gene-species co-evolution. Several web-based tools have been developed for the display and manipulation of phylogenetic trees and co-phylogenetic trees involving two trees, but there is currently no tool for visualizing the joint reconciliation between three phylogenetic trees. Results Here, we present DoubleRecViz, a web-based tool for visualizing double reconciliations between phylogenetic trees at three levels: transcript, gene and species. DoubleRecViz extends the RecPhyloXML model, developed for gene-species tree reconciliation, to represent joint transcript-gene and gene-species tree reconciliations. It is implemented using the Dash library, a toolbox that provides dynamic visualization functionality for web applications in Python. Availability and implementation DoubleRecViz is available through a web server at https://doublerecviz.cobius.usherbrooke.ca. The source code and information about installation procedures are also available at https://github.com/UdeS-CoBIUS/DoubleRecViz. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 87
    Publication Date: 2020-10-13
    Description: Summary FCSlib is an open-source R tool for Fluorescence Fluctuation Spectroscopy data analysis. It encompasses techniques such as Fluorescence Correlation Spectroscopy, Number and Brightness, Pair Correlation Function and Pair Correlation of Molecular Brightness. Availability https://cran.r-project.org/web/packages/FCSlib/ for Linux, Windows and macOS platforms. Supplementary information Available at https://github.com/FCSlib/FCSlib and Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 88
    Publication Date: 2020-10-14
    Description: Motivation Recent advancements in high-dimensional single-cell technologies, such as mass cytometry, enable longitudinal experiments to track dynamics of cell populations and identify change points where the proportions vary significantly. However, current research is limited by the lack of tools specialized for analyzing longitudinal mass cytometry data. In order to infer cell population dynamics from such data, we developed a statistical framework named CYBERTRACK2.0. The framework’s analytic performance was validated against synthetic and real data, showing that its results are consistent with previous research. Availability CYBERTRACK2.0 is available at https://github.com/kodaim1115/CYBERTRACK2. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 89
    Publication Date: 2020-01-23
    Description: Motivation The development of clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) technology has provided a simple yet powerful system for targeted genome editing. In recent years, this system has been widely used for various gene editing applications. The CRISPR editing efficacy is mainly dependent on the single guide RNA (sgRNA), which guides Cas9 for genome cleavage. While there have been multiple attempts at improving sgRNA design, there is a pressing need for greater sgRNA potency and generalizability across various experimental conditions. Results We employed a unique plasmid library expressed in human cells to quantify the potency of thousands of CRISPR/Cas9 sgRNAs. Differential sequence and structural features among the most and least potent sgRNAs were then used to train a machine learning algorithm for assay design. Comparative analysis indicates that our new algorithm outperforms existing CRISPR/Cas9 sgRNA design tools. Availability and implementation The new sgRNA design tool is freely accessible as a web application, http://crispr.wustl.edu. Supplementary information Supplementary data are available at Bioinformatics online.
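    As a hedged sketch of the kind of model described here (the published tool's exact features and learner are not specified in this summary), one can one-hot encode spacer sequences and fit a gradient boosting regressor to measured potency scores. Sequences and scores below are random placeholders.

```python
# Sequence-to-potency regression sketch: one-hot encoded 20-nt spacers + gradient boosting.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
BASES = "ACGT"

def one_hot(seq):
    x = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        x[i, BASES.index(base)] = 1.0
    return x.ravel()

seqs = ["".join(rng.choice(list(BASES), size=20)) for _ in range(300)]
potency = rng.uniform(0, 1, size=300)          # placeholder potency scores

X = np.array([one_hot(s) for s in seqs])
model = GradientBoostingRegressor(random_state=7)
print("5-fold CV R^2: %.3f" % cross_val_score(model, X, potency, cv=5).mean())
```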
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 90
    Publication Date: 2020-01-06
    Description: Summary ipyrad is a free and open source tool for assembling and analyzing restriction site-associated DNA sequence datasets using de novo and/or reference-based approaches. It is designed to be massively scalable to hundreds of taxa and thousands of samples, and can be efficiently parallelized on high performance computing clusters. It is available both as a command line interface and as a Python package with an application programming interface, the latter of which can be used interactively to write complex, reproducible scripts and implement a suite of downstream analysis tools. Availability and implementation ipyrad is a free and open source program written in Python. Source code is available from the GitHub repository (https://github.com/dereneaton/ipyrad/), and Linux and MacOS installs are distributed through the conda package manager. Complete documentation, including numerous tutorials, and Jupyter notebooks demonstrating example assemblies and applications of downstream analysis tools are available online: https://ipyrad.readthedocs.io/.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 91
    Publication Date: 2020-01-17
    Description: Motivation The birth-death (BD) model constitutes the theoretical backbone of most phylogenetic tools for reconstructing speciation/extinction dynamics over time. Performing simulations of reconstructed trees (linking extant taxa) under the BD model in backward time, conditioned on the number of species sampled at present day and, in some cases, a specific time interval since the most recent common ancestor (MRCA), is needed for assessing the performance of reconstruction tools, for parametric bootstrapping and for detecting data outliers. The few simulation tools that exist scale poorly to large modern phylogenies, which can comprise thousands or even millions of tips (and rising). Results Here I present efficient software for simulating reconstructed phylogenies under time-dependent BD models in backward time, conditioned on the number of sampled species and (optionally) on the time since the MRCA. On large trees, my software is 1000–10 000 times faster than existing tools. Availability and implementation The presented software is incorporated into the R package ‘castor’, which is available on The Comprehensive R Archive Network (CRAN). Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 92
    Publication Date: 2020-01-23
    Description: Motivation Systematic identification of molecular targets among known drugs plays an essential role in drug repurposing and understanding of their unexpected side effects. Computational approaches for prediction of drug–target interactions (DTIs) are highly desired in comparison to traditional experimental assays. Furthermore, recent advances of multiomics technologies and systems biology approaches have generated large-scale heterogeneous, biological networks, which offer unexpected opportunities for network-based identification of new molecular targets among known drugs. Results In this study, we present a network-based computational framework, termed AOPEDF, an arbitrary-order proximity embedded deep forest approach, for prediction of DTIs. AOPEDF learns a low-dimensional vector representation of features that preserve arbitrary-order proximity from a highly integrated, heterogeneous biological network connecting drugs, targets (proteins) and diseases. In total, we construct a heterogeneous network by uniquely integrating 15 networks covering chemical, genomic, phenotypic and network profiles among drugs, proteins/targets and diseases. Then, we build a cascade deep forest classifier to infer new DTIs. Via systematic performance evaluation, AOPEDF achieves high accuracy in identifying molecular targets among known drugs on two external validation sets collected from DrugCentral [area under the receiver operating characteristic curve (AUROC) = 0.868] and ChEMBL (AUROC = 0.768) databases, outperforming several state-of-the-art methods. In a case study, we showcase that multiple molecular targets predicted by AOPEDF are associated with mechanism-of-action of substance abuse disorder for several marketed drugs (such as aripiprazole, risperidone and haloperidol). Availability and implementation Source code and data can be downloaded from https://github.com/ChengF-Lab/AOPEDF. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 93
    Publication Date: 2020-07-03
    Description: Motivation The Robinson–Foulds (RF) metric is widely used by biologists, linguists and chemists to quantify similarity between pairs of phylogenetic trees. The measure tallies the number of bipartition splits that occur in both trees—but this conservative approach ignores potential similarities between almost-identical splits, with undesirable consequences. ‘Generalized’ RF metrics address this shortcoming by pairing splits in one tree with similar splits in the other. Each pair is assigned a similarity score, the sum of which enumerates the similarity between two trees. The challenge lies in quantifying split similarity: existing definitions lack a principled statistical underpinning, resulting in misleading tree distances that are difficult to interpret. Here, I propose probabilistic measures of split similarity, which allow tree similarity to be measured in natural units (bits). Results My new information-theoretic metrics outperform alternative measures of tree similarity when evaluated against a broad suite of criteria, even though they do not account for the non-independence of splits within a single tree. Mutual clustering information exhibits none of the undesirable properties that characterize other tree comparison metrics, and should be preferred to the RF metric. Availability and implementation The methods discussed in this article are implemented in the R package ‘TreeDist’, archived at https://dx.doi.org/10.5281/zenodo.3528123. Supplementary information Supplementary data are available at Bioinformatics online.
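    To make the baseline concrete, the self-contained sketch below computes the classic RF distance: each tree is reduced to its set of non-trivial bipartitions (splits), and the distance is the size of the symmetric difference of the two sets. The generalized, information-theoretic metrics in 'TreeDist' refine this by scoring pairs of similar splits rather than requiring exact matches. The nested-tuple tree encoding is purely illustrative.

```python
# Robinson-Foulds distance from sets of non-trivial bipartitions.
def splits(tree, all_taxa):
    """tree is a nested tuple, e.g. ((('A','B'),'C'),('D','E')); returns canonical splits."""
    result = set()

    def leaves(node):
        if isinstance(node, tuple):
            return frozenset().union(*(leaves(child) for child in node))
        return frozenset([node])

    def walk(node):
        if isinstance(node, tuple):
            clade = leaves(node)
            if 1 < len(clade) < len(all_taxa) - 1:                 # skip trivial splits
                result.add(min(clade, frozenset(all_taxa) - clade, key=sorted))
            for child in node:
                walk(child)

    walk(tree)
    return result

taxa = {"A", "B", "C", "D", "E"}
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print("RF distance:", len(splits(t1, taxa) ^ splits(t2, taxa)))    # symmetric difference
```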
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 94
    Publication Date: 2020-10-26
    Description: Motivation The existence of more than 100 public Galaxy servers with service quotas is indicative of the need for an increased availability of compute resources for Galaxy to use. The GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules and the resources can be dynamically acquired from any of 4 popular cloud providers (AWS, Azure, GCP, or OpenStack) in an automated fashion. Availability GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 95
    Publication Date: 2020-05-21
    Description: Motivation Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach to understanding the relationships between the collected datasets and the complex trait of interest is to analyse each OMICS dataset separately from the rest, or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets together, instead of analysing them separately, improves our understanding of the relationships between them as well as the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p≫n) data, such as OMICS. The sparse variant of canonical correlation analysis (CCA) is a promising approach that penalizes the canonical variables to produce sparse latent variables while achieving maximal correlation between the datasets. In recent years, a number of approaches for implementing sparse CCA (sCCA) have been proposed; they differ in their objective functions and in the iterative algorithms used to obtain the sparse latent variables, and they make different assumptions about the original datasets. Results Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al., the penalized matrix decomposition CCA proposed by Witten and Tibshirani and its extension proposed by Suo et al. The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding relationships between datasets, we have recast the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were also extended to allow for multiple (more than two) datasets, where the trait was included as one of the input datasets. Both strategies showed improvement over conventional predictive models that include one or multiple datasets. Availability and implementation https://github.com/theorod93/sCCA. Supplementary information Supplementary data are available at Bioinformatics online.
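    A baseline sketch using ordinary CCA from scikit-learn on two small OMICS-like matrices, with the latent canonical variables then reused as predictors of a trait (the supervised recasting described above). The sparse CCA variants compared in the paper add L1-type penalties to the canonical weights, which plain sklearn CCA does not do; all data below are random placeholders.

```python
# Ordinary CCA followed by trait prediction from the latent canonical variables.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n = 100
X = rng.normal(size=(n, 20))    # e.g. gene expression (placeholder)
Y = rng.normal(size=(n, 15))    # e.g. methylation (placeholder)
trait = rng.normal(size=n)      # complex trait of interest

cca = CCA(n_components=2).fit(X, Y)
Xc, Yc = cca.transform(X, Y)                 # latent canonical variables
Z = np.hstack([Xc, Yc])

reg = LinearRegression().fit(Z, trait)       # supervised use of the latent variables
print("in-sample R^2: %.3f" % reg.score(Z, trait))
```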
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 96
    Publication Date: 2020-10-12
    Description: Motivation The combinatorial sequential Monte Carlo (CSMC) method has been demonstrated to be an efficient complement to standard Markov chain Monte Carlo (MCMC) for Bayesian phylogenetic tree inference using biological sequences. It is appealing to combine CSMC and MCMC in the framework of the particle Gibbs (PG) sampler to jointly estimate phylogenetic trees and evolutionary parameters. However, the Markov chain of the particle Gibbs sampler may mix poorly for high-dimensional problems (e.g. phylogenetic trees). Remedies, including particle Gibbs with ancestor sampling and interacting particle MCMC, have been proposed to improve the PG, but they either cannot be applied to, or remain inefficient for, the combinatorial tree space. Results We introduce a novel CSMC method with a more efficient proposal distribution. It can also be combined with the particle Gibbs sampler framework to infer parameters of the evolutionary model. The new algorithm can be easily parallelized by allocating samples over different computing cores. We validate through numerical experiments that the developed CSMC can sample trees more efficiently in various particle Gibbs samplers. Availability Our implementation is available at https://github.com/liangliangwangsfu/phyloPMCMC Supplementary information Supplementary materials are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 97
    Publication Date: 2020-10-12
    Description: Summary The C++ library HOPS (Highly-Optimized Polytope Sampling) provides implementations of efficient and scalable algorithms for sampling convex-constrained models that are equipped with arbitrary target functions. For uniform sampling, substantial performance gains were achieved compared to the state of the art. The ease of integration and the utility of non-uniform sampling are showcased in a Bayesian inference setting, demonstrating how HOPS interoperates with third-party software. Availability and Implementation Source code is available at https://github.com/modsim/hops/, tested on Linux and MS Windows, and includes unit tests, detailed documentation, example applications and a Dockerfile. Supplementary information Supplementary data are available at Bioinformatics online.
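    A minimal hit-and-run sampler over a polytope {x : Ax <= b}, written in Python for illustration, shows the kind of convex-constrained sampling HOPS implements; HOPS itself is a C++ library with far more efficient and numerically careful algorithms, and this sketch is not derived from its API.

```python
# Hit-and-run sampling of (approximately) uniform points from a bounded polytope {x : Ax <= b}.
import numpy as np

def hit_and_run(A, b, x0, n_samples=1000, rng=None):
    rng = rng or np.random.default_rng(9)
    x, samples = np.asarray(x0, dtype=float), []
    for _ in range(n_samples):
        d = rng.normal(size=x.size)
        d /= np.linalg.norm(d)                        # random direction on the unit sphere
        Ad, slack = A @ d, b - A @ x                  # feasible step sizes t satisfy t * Ad <= slack
        upper = np.min(slack[Ad > 0] / Ad[Ad > 0])
        lower = np.max(slack[Ad < 0] / Ad[Ad < 0])
        x = x + rng.uniform(lower, upper) * d         # uniform point on the feasible chord
        samples.append(x.copy())
    return np.array(samples)

# unit box [0, 1]^2 written as Ax <= b
A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = np.array([1., 0., 1., 0.])
S = hit_and_run(A, b, x0=[0.5, 0.5])
print("sample mean (should be near [0.5, 0.5]):", S.mean(axis=0).round(2))
```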
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 98
    Publication Date: 2020-10-08
    Description: Motivation Since its launch in 2010, Identifiers.org has become an important tool for the annotation and cross-referencing of Life Science data. In 2016, we established the Compact Identifier (CID) scheme (prefix: accession) to generate globally unique identifiers for data resources using their locally assigned accession identifiers. Since then, we have developed and improved services to support the growing need to create, reference and resolve CIDs, in systems ranging from human readable text to cloud based e-infrastructures, by providing high availability and low latency cloud-based services, backed by a high quality, manually curated resource. Results We describe a set of services that can be used to construct and resolve CIDs in Life Sciences and beyond. We have developed a new front end for accessing the Identifiers.org registry data and APIs to simplify integration of Identifiers.org CID services with third party applications. We have also deployed the new Identifiers.org infrastructure in a commercial cloud environment, bringing our services closer to the data. Availability https://identifiers.org Supplementary information Supplementary data are available at Bioinformatics online.
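    A small usage sketch: construct a Compact Identifier (prefix:accession) and let the Identifiers.org resolver redirect it to the resource hosting the record. The example prefix/accession and the use of the `requests` library are illustrative choices, not part of the service description above.

```python
# Resolve a Compact Identifier by following the Identifiers.org redirect.
import requests

def resolve_cid(prefix, accession):
    cid = f"{prefix}:{accession}"
    response = requests.get(f"https://identifiers.org/{cid}", allow_redirects=True, timeout=30)
    return cid, response.url      # final URL is the provider hosting the record

cid, landing = resolve_cid("taxonomy", "9606")
print(cid, "->", landing)
```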
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 99
    Publication Date: 2020-10-08
    Description: Here we present an automated pipeline for downloading NCBI GenBank entries (DONE) and continuously updating a local sequence database based on user-specified queries. The database can be created with either protein or nucleotide sequences, containing all entries or complete genomes only. The pipeline can automatically clean the database by removing entries with matches to a database of user-specified sequence contaminants. The default contamination entries include sequences from the UniVec database of plasmids, marker genes and sequencing adapters from NCBI, an E. coli genome, rRNA sequences, vectors and satellite sequences. Furthermore, duplicates are removed, and the database is automatically screened for sequences from green fluorescent protein (GFP), luciferase and antibiotic resistance genes that might be present in some GenBank viral entries and could lead to false positives in virus identification. We also describe an approach for dealing with possible human contamination when using the database. We show the applicability of DONE by downloading a virus database comprising 37 virus families, observing an average increase of 16,776 new entries downloaded per month for the 37 families. Additionally, we demonstrate the utility of a custom database compared to a standard reference database for classifying both simulated and real sequence data. Availability The DONE pipeline for downloading and cleaning is deposited in a publicly available repository (https://bitbucket.org/genomicepidemiology/done/src/master/). Supplementary information Supplementary data are available at Bioinformatics online.
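    For orientation only, the sketch below shows the general approach of querying NCBI and fetching matching nucleotide records with Biopython's Entrez module; it is not the DONE pipeline, and the query string and email address are placeholders.

```python
# Query NCBI GenBank and fetch matching nucleotide records in FASTA format.
from Bio import Entrez, SeqIO

Entrez.email = "you@example.org"    # required by NCBI's usage policy

query = 'Coronaviridae[Organism] AND "complete genome"[Title]'
handle = Entrez.esearch(db="nucleotide", term=query, retmax=5)
ids = Entrez.read(handle)["IdList"]
handle.close()

handle = Entrez.efetch(db="nucleotide", id=",".join(ids), rettype="fasta", retmode="text")
records = list(SeqIO.parse(handle, "fasta"))
handle.close()
print(f"downloaded {len(records)} records")
```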
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 100
    Publication Date: 2020-10-08
    Description: Summary Population studies such as genome-wide association study (GWAS) have identified a variety of genomic variants associated with human diseases. To further understand potential mechanisms of disease variants, recent statistical methods associate functional omic data (e.g., gene expression) with genotype and phenotype and link variants to individual genes. However, how to interpret molecular mechanisms from such associations, especially across omics, is still challenging. To address this problem, we developed an interpretable deep learning method, Varmole, to simultaneously reveal genomic functions and mechanisms while predicting phenotype from genotype. In particular, Varmole embeds multi-omic networks into a deep neural network architecture and prioritizes variants, genes and regulatory linkages via biological drop-connect without needing prior feature selections. Availability and implementation Varmole is available as a Python tool on GitHub at https://github.com/daifengwanglab/Varmole. Supplementary information Supplementary data are available at Bioinformatics online.
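    A conceptual sketch of a biologically masked input layer in the spirit of the "biological drop-connect" mentioned above (this is not Varmole's released code): connections from input variants/genes to first-layer units are permitted only where a prior biological adjacency matrix, e.g. eQTL or regulatory links, is nonzero. The mask and dimensions below are placeholders.

```python
# First neural-network layer whose connections are restricted by a fixed biological mask.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, mask):
        super().__init__()
        out_dim, in_dim = mask.shape
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_dim))
        self.register_buffer("mask", mask.float())      # fixed biological connectivity

    def forward(self, x):
        return x @ (self.weight * self.mask).t() + self.bias

n_snps, n_genes = 100, 20
mask = torch.rand(n_genes, n_snps) < 0.1                # placeholder prior network
model = nn.Sequential(MaskedLinear(mask), nn.ReLU(), nn.Linear(n_genes, 1))
print(model(torch.randn(8, n_snps)).shape)              # outputs for a batch of 8 genotype vectors
```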
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...