ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
Collection
  • Articles  (298)
Publisher
  • Institute of Electrical and Electronics Engineers (IEEE)  (298)
  • Elsevier
Journal
  • IEEE Transactions on Computers (T-C)  (298)
Years
  • 2015-2019  (298)
  • 1990-1994
  • 1945-1949
Year
  • 2016  (298)
Topic
  • Computer Science  (298)
  • Geography
  • Mechanical Engineering, Materials Science, Production Engineering, Mining and Metallurgy, Traffic Engineering, Precision Mechanics
  • 1
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-12
    Description: During at-speed test of high performance sequential ICs using scan-based Logic BIST, the IC activity factor (AF) induced by the applied test vectors is significantly higher than that experienced during its in field operation. Consequently, power droop (PD) may take place during both shift and capture phases, which will slow down the circuit under test (CUT) signal transitions. At capture, this phenomenon is likely to be erroneously recognized as due to delay faults. As a result, a false test fail may be generated, with consequent increase in yield loss. In this paper, we propose two approaches to reduce the PD generated at capture during at-speed test of sequential circuits with scan-based Logic BIST using the Launch-On-Shift scheme. Both approaches increase the correlation between adjacent bits of the scan chains with respect to conventional scan-based LBIST. This way, the AF of the scan chains at capture is reduced. Consequently, the AF of the CUT at capture, thus the PD at capture, is also reduced compared to conventional scan-based LBIST. The former approach, hereinafter referred to as Low-Cost Approach (LCA), enables a 50 percent reduction in the worst case magnitude of PD during conventional logic BIST. It requires a small cost in terms of area overhead (of approximately 1.5 percent on average), and it does not increase the number of test vectors over the conventional scan-based LBIST to achieve the same Fault Coverage (FC). Moreover, compared to three recent alternative solutions, LCA features a comparable AF in the scan chains at capture, while requiring lower test time and area overhead. The second approach, hereinafter referred to as High-Reduction Approach (HRA), enables scalable PD reductions at capture of up to 87 percent, with limited additional costs in terms of area overhead and number of required test vectors for a given target FC, over our LCA approach. Particularly, compared to two of the three recent alternative solutions mentioned above, HRA enables a significantly lower AF in the scan chains during the application of test vectors, while requiring either a comparable area overhead or a significantly lower test time. Compared to the remaining alternative solutions mentioned above, HRA enables a similar AF in the scan chains at capture (approximately 90 percent lower than conventional scan-based LBIST), while requiring a significantly lower test time (approximately 4.87 times on average lower number of test vectors) and comparable area overhead (of approximately 1.9 percent on average).
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
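The activity factor the record above relies on can be made concrete with a few lines of code: the sketch below simply counts adjacent-bit transitions in a scan vector, assuming AF is measured as the fraction of neighbouring scan-chain bits that differ. It only illustrates the metric, not the LCA/HRA test hardware.

```python
def activity_factor(vector: str) -> float:
    """Fraction of adjacent bit pairs that differ (a simple switching-activity proxy)."""
    transitions = sum(a != b for a, b in zip(vector, vector[1:]))
    return transitions / (len(vector) - 1)

# An uncorrelated pseudo-random pattern versus one with correlated adjacent bits:
print(activity_factor("0110100101"))  # high switching activity
print(activity_factor("0001111000"))  # adjacent-bit correlation lowers the AF
```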
  • 2
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The advent of cloud computing has made storage outsourcing a rising trend, which in turn makes secure remote data auditing a hot topic in the research literature. Recently, some research has considered the problem of secure and efficient public data integrity auditing for shared dynamic data. However, these schemes are still not secure against collusion between the cloud storage server and revoked group users during user revocation in a practical cloud storage system. In this paper, we identify the collusion attack in the existing scheme and provide an efficient public integrity auditing scheme with secure group user revocation based on vector commitment and verifier-local revocation group signature. We design a concrete scheme based on our scheme definition. Our scheme supports public checking and efficient user revocation, as well as other desirable properties such as confidentiality, efficiency, countability and traceability of secure group user revocation. Finally, the security and experimental analysis show that, compared with relevant schemes, our scheme is also secure and efficient.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 3
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: To select an appropriate level of error protection in caches, the impact of various protection schemes on the cache Failure In Time (FIT) rate must be evaluated for a target benchmark suite. However, while many simulation tools exist to evaluate area, power and performance for a set of benchmark programs, there is a dearth of such tools for reliability. This paper introduces a new cache reliability model called PARMA+ that has unique features which distinguish it from previous models. PARMA+ estimates a cache's FIT rate in the presence of spatial multi-bit faults, single-bit faults, temporal multi-bit faults and different error protection schemes including parity, ECC, early write-back and bit-interleaving. We first develop the model formally, then we demonstrate its accuracy. We have run reliability simulations for many distributions of large and small fault patterns and have compared them with accelerated fault injection simulations. PARMA+ has high accuracy and low computational complexity.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 4
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Although the travel time is the most important information in road networks, many spatial queries, e.g., $k$-nearest-neighbor ($k$-NN) and range queries, for location-based services (LBS) are only based on the network distance. This is because it is costly for an LBS provider to collect real-time traffic data from vehicles or roadside sensors to compute the travel time between two locations. With the advance of web mapping services, e.g., Google Maps, Microsoft Bing Maps, and MapQuest Maps, there is an invaluable opportunity for using such services for processing spatial queries based on the travel time. In this paper, we propose a server-side Spatial Mashup Service (SMS) that enables the LBS provider to efficiently evaluate $k$-NN queries in road networks using the route information and travel time retrieved from an external web mapping service. Due to the high cost of retrieving such external information, the usage limits of web mapping services, and the large number of spatial queries, we optimize the SMS for a large number of $k$-NN queries. We first discuss how the SMS processes a single $k$-NN query using two optimizations, namely, direction sharing and parallel requesting. Then, we extend them to process multiple concurrent $k$-NN queries and design a performance tuning tool to provide a trade-off between the query response time and the number of external requests and more importantly, to prevent a starvation problem in the parallel requesting optimization for concurrent queries. We evaluate the performance of the proposed SMS using MapQuest Maps, a real road network, real and synthetic data sets. Experimental results show the efficiency and scalability of our optimizations designed for the SMS.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
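As a rough illustration of the "parallel requesting" optimization named in the record above, the sketch below issues several travel-time look-ups concurrently instead of one by one. fetch_travel_time is a hypothetical stand-in for a call to an external web mapping service; the function names and the dummy cost formula are assumptions, not the paper's SMS design.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_travel_time(origin, destination):
    # Stand-in for an external web-mapping call; the formula is fabricated
    # so that the sketch runs without any network access.
    return abs(origin - destination) * 1.5

def knn_by_travel_time(query_point, candidates, k=3, max_workers=8):
    """Rank candidate sites by externally retrieved travel time, fetching in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        timed = list(pool.map(lambda c: (fetch_travel_time(query_point, c), c), candidates))
    return [c for _, c in sorted(timed)[:k]]

print(knn_by_travel_time(10, [2, 7, 15, 30, 11]))  # the 3 "closest" sites by travel time
```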
  • 5
    Publication Date: 2016-07-08
    Description: Several recent works have studied mobile vehicle scheduling to recharge sensor nodes via wireless energy transfer technologies. Unfortunately, most of them overlooked important factors of the vehicles’ moving energy consumption and limited recharging capacity, which may lead to problematic schedules or even stranded vehicles. In this paper, we consider the recharge scheduling problem under such important constraints. To balance energy consumption and latency, we employ one dedicated data gathering vehicle and multiple charging vehicles. We first organize sensors into clusters for easy data collection, and obtain theoretical bounds on latency. Then we establish a mathematical model for the relationship between energy consumption and replenishment, and obtain the minimum number of charging vehicles needed. We formulate the scheduling into a Profitable Traveling Salesmen Problem that maximizes profit - the amount of replenished energy less the cost of vehicle movements, and prove it is NP-hard. We devise and compare two algorithms: a greedy one that maximizes the profit at each step; an adaptive one that partitions the network and forms Capacitated Minimum Spanning Trees per partition. Through extensive evaluations, we find that the adaptive algorithm can keep the number of nonfunctional nodes at zero. It also reduces transient energy depletion by 30-50 percent and saves 10-20 percent energy. Comparisons with other common data gathering methods show that we can save 30 percent energy and reduce latency by two orders of magnitude.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
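A minimal sketch of the greedy strategy described in the record above: at each step the charging vehicle visits the sensor whose replenished energy minus travel cost is largest, until the energy budget runs out. Coordinates, energy deficits and the unit travel cost are illustrative assumptions; the adaptive, Capacitated-Minimum-Spanning-Tree variant is not shown.

```python
import math

def greedy_recharge_tour(depot, sensors, budget, cost_per_unit=1.0):
    """Greedy profit-maximising visit order. sensors: name -> (x, y, energy_deficit)."""
    pos, tour, remaining = depot, [], dict(sensors)
    while remaining:
        def profit(item):
            _, (x, y, deficit) = item
            return deficit - cost_per_unit * math.dist(pos, (x, y))
        name, (x, y, deficit) = max(remaining.items(), key=profit)
        travel = cost_per_unit * math.dist(pos, (x, y))
        if deficit - travel <= 0 or travel + deficit > budget:
            break                      # no profitable (or affordable) sensor left
        budget -= travel + deficit     # energy spent on moving plus energy delivered
        tour.append(name)
        pos = (x, y)
        del remaining[name]
    return tour

print(greedy_recharge_tour((0, 0), {"s1": (1, 1, 5), "s2": (4, 0, 9), "s3": (0, 6, 2)}, budget=25))
```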
  • 6
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The capability of selectively sharing encrypted data with different users via public cloud storage may greatly ease security concerns over inadvertent data leaks in the cloud. A key challenge to designing such encryption schemes lies in the efficient management of encryption keys. The desired flexibility of sharing any group of selected documents with any group of users demands different encryption keys to be used for different documents. However, this also implies the necessity of securely distributing to users a large number of keys for both encryption and search, and those users will have to securely store the received keys, and submit an equally large number of keyword trapdoors to the cloud in order to perform search over the shared data. The implied need for secure communication, storage, and complexity clearly renders the approach impractical. In this paper, we address this practical problem, which is largely neglected in the literature, by proposing the novel concept of key-aggregate searchable encryption and instantiating the concept through a concrete KASE scheme, in which a data owner only needs to distribute a single key to a user for sharing a large number of documents, and the user only needs to submit a single trapdoor to the cloud for querying the shared documents. The security analysis and performance evaluation both confirm that our proposed schemes are provably secure and practically efficient.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 7
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Infrastructure-as-a-service (IaaS) cloud providers offer tenants elastic computing resources in the form of virtual machine (VM) instances to run their jobs. Recently, providing predictable performance (i.e., performance guarantee) for tenant applications is becoming increasingly compelling in IaaS clouds. However, the hardware heterogeneity and performance interference across the same type of cloud VM instances can bring substantial performance variation to tenant applications, which inevitably stops the tenants from moving their performance-sensitive applications to the IaaS cloud. To tackle this issue, this paper proposes Heifer, a Heterogeneity and interference-aware VM provisioning framework for tenant applications, by focusing on MapReduce as a representative cloud application. It predicts the performance of MapReduce applications by designing a lightweight performance model using the online-measured resource utilization and capturing VM interference. Based on such a performance model, Heifer provisions the VM instances of the good-performing hardware type (i.e., the hardware that achieves the best application performance) to achieve predictable performance for tenant applications, by explicitly exploring the hardware heterogeneity and capturing VM interference. With extensive prototype experiments in our local private cloud and a real-world public cloud (i.e., Microsoft Azure) as well as complementary large-scale simulations, we demonstrate that Heifer can guarantee the job performance while saving the job budget for tenants. Moreover, our evaluation results show that Heifer can improve the job throughput of cloud datacenters, such that the revenue of cloud providers can be increased, thereby achieving a win-win situation between providers and tenants.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 8
    Publication Date: 2016-07-08
    Description: Gaussian normal bases (GNBs) are a special set of normal bases (NBs) which yield low complexity $GF(2^{m})$ arithmetic operations. In this paper, we present new architectures for the digit-level single, hybrid-double, and hybrid-triple multiplication of $GF(2^{m})$ elements based on the GNB representation for odd values of $m > 1$. The proposed fully-serial-in single multipliers perform multiplication of two field elements and offer high throughput when the data-path capacity for entering inputs is limited. The proposed hybrid-double and hybrid-triple digit-level GNB multipliers perform, respectively, two and three field multiplications using the same latency required for a single digit-level multiplier, at the expense of increased area. In addition, we present a new eight-ary field exponentiation architecture which does not require precomputed or stored intermediate values.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 9
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Shamir's secret sharing scheme is an effective way to distribute a secret to a group of shareholders. The security of the unprotected sharing scheme, however, can be easily broken by cheaters or attackers who maliciously feed incorrect shares during the secret recovery stage or inject faults into the hardware computing the secret. In this paper, we propose cheater detection and identification schemes based on robust and algebraic manipulation detection (AMD) codes and m-disjunct matrices (superimposed codes). We present the constructions of codes for cheater detection and identification and describe how the cheater identification problem can be related to the classic group testing algorithms based on m-disjunct matrices. Simulation and synthesis results show that the proposed architecture can improve the security level significantly even under strong cheating attack models with reasonable area and timing overheads.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
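For reference, a bare-bones version of the Shamir scheme the record above starts from: split a secret with a random degree-(k-1) polynomial and recover it by Lagrange interpolation. The prime modulus and parameters are illustrative, and none of the paper's AMD-code cheater detection is included.

```python
import random

P = 2**127 - 1   # Mersenne prime used as the field modulus (illustrative choice)

def make_shares(secret, k, n):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def poly(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, poly(x)) for x in range(1, n + 1)]

def recover(shares):
    """Lagrange interpolation of the sharing polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = make_shares(123456789, k=3, n=5)
assert recover(shares[:3]) == 123456789   # any 3 of the 5 shares suffice
```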
  • 10
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Cloud platforms encompass a large number of storage services that can be used to manage the needs of customers. Each of these services, offered by a different provider, is characterized by specific features, limitations and prices. In the presence of multiple options, it is crucial to select the best solution fitting the customer requirements in terms of quality of service and costs. Most of the available approaches are not able to handle uncertainty in the expression of subjective preferences from customers, and can result in wrong (or sub-optimal) service selections in the presence of rational/selfish providers that expose untrustworthy indications concerning the quality of service levels and prices associated with their offers. In addition, due to its multi-objective nature, the optimal service selection process becomes a very complex task that should be managed, when possible, in a distributed way for well-known scalability reasons. In this work, we aim at facing the above challenges by proposing three novel contributions. Fuzzy set theory is used to express vagueness in the subjective preferences of the customers. The service selection is resolved with the distributed application of fuzzy inference or the Dempster-Shafer theory of evidence. The selection strategy is also complemented by the adoption of a game-theoretic approach for promoting truth-telling among service providers. We present empirical evidence of the effectiveness of the proposed solution through properly crafted simulation experiments.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 11
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 12
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: As cloud computing technology has developed over the last decade, outsourcing data to cloud services for storage has become an attractive trend, which spares the effort of heavy data maintenance and management. Nevertheless, since the outsourced cloud storage is not fully trustworthy, it raises security concerns on how to realize data deduplication in the cloud while achieving integrity auditing. In this work, we study the problem of integrity auditing and secure deduplication on cloud data. Specifically, aiming at achieving both data integrity and deduplication in the cloud, we propose two secure systems, namely SecCloud and SecCloud$^+$. SecCloud introduces an auditing entity that maintains a MapReduce cloud and helps clients generate data tags before uploading, as well as audit the integrity of data stored in the cloud. Compared with previous work, the computation by the user in SecCloud is greatly reduced during the file uploading and auditing phases. SecCloud$^+$ is motivated by the fact that customers always want to encrypt their data before uploading, and it enables integrity auditing and secure deduplication on encrypted data.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 13
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Cache compression improves the performance of a multi-core system by being able to store more cache blocks in a compressed format. Compression is achieved by exploiting data patterns present within a block. For a given cache space, compression increases the effective cache capacity. However, this increase is limited by the number of tags that can be accommodated at the cache. Prefetching is another technique that improves system performance by fetching the cache blocks ahead of time into the cache and hiding the off-chip latency. Commonly used hardware prefetchers, such as stream and stride, fetch multiple contiguous blocks into the cache. In this paper we propose prefetched blocks compaction (PBC) wherein we exploit the data patterns present across these prefetched blocks. PBC compacts the prefetched blocks into a single block with a single tag, effectively increasing the cache capacity. We also modify the cache organization to access these multiple cache blocks residing in a single block without any need for extra tag look-ups. PBC improves the system performance by 11.1 percent with a maximum of 43.4 percent on a four-core system.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 14
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Multi-core processors achieve a trade-off between the performance and the power consumption by using Dynamic Voltage Scaling (DVS) techniques. In this paper, we study the power efficient scheduling problem of real-time tasks in an identical multi-core system, and present Node Scaling model to achieve power-aware scheduling. We prove that there is a bound speed which results in the minimal power consumption for a given task set, and the maximal value of task utilization, $u_{max}$, in a task set is a key element to decide its minimal power consumption. Based on the value $u_{max}$, we classify task sets into two categories: the bounded task sets and the non-bounded task sets, and we prove the lower bound of power consumption for each type of task set. Simulations based on Intel Xeon X5550 and PXA270 processors show Node Scaling model can achieve power efficient scheduling by applying to existing algorithms such as EDF-FF and SPA2. The ratio of power reduction depends on the multi-core processor's property which is defined as the ratio of the bound speed to the maximal speed of the cores. When the ratio of speeds decreases, the ratio of power reduction increases for all the power efficient algorithms.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 15
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Existing secure and privacy-preserving schemes for vehicular communications in vehicular ad hoc networks face some challenges, e.g., reducing the dependence on ideal tamper-proof devices, building efficient member revocation mechanisms and avoiding computation and communication bottlenecks. To cope with those challenges, we propose a highly efficient secure and privacy-preserving scheme based on identity-based aggregate signatures. Our scheme enables hierarchical aggregation and batch verification. The individual identity-based signatures generated by different vehicles can be aggregated and verified in a batch. The aggregated signatures can be re-aggregated by a message collector (e.g., traffic management authority). With our hierarchical aggregation technique, we significantly reduce the transmission/storage overhead of the vehicles and other parties. Furthermore, existing batch verification based schemes in vehicular ad hoc networks require vehicles to wait for enough messages to perform a batch verification. In contrast, we assume that vehicles will generate messages (and the corresponding signatures) in certain time spans, so that vehicles only need to wait for a very short period before they can start the batch verification procedure. Simulation shows that a vehicle can verify the received messages with very low latency and fast response.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 16
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Computer vision applications have a large disparity in operations, data representation and memory access patterns from the early vision stages to the final classification and recognition stages. A hardware system for computer vision has to provide high flexibility without compromising performance, exploiting massively spatial-parallel operations but also keeping a high throughput on data-dependent and complex program flows. Furthermore, the architecture must be modular, scalable and easy to adapt to the needs of different applications. Keeping this in mind, a hybrid SIMD/MIMD architecture for embedded computer vision is proposed. It consists of a coprocessor designed to provide fast and flexible computation of demanding image processing tasks of vision applications. A 32-bit 128-unit device was prototyped on a Virtex-6 FPGA which delivers a peak performance of 19.6 GOP/s and 7.2 W of power dissipation.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 17
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The key to reducing static energy in supercomputers is switching off their unused components. Routers are the major components of a supercomputer. Whether routers can be effectively switched off or not has become the key to static energy management for supercomputers. For many typical applications, the routers in a supercomputer exhibit low utilization. However, there is no effective method to switch the routers off when they are idle. By analyzing the router occupancy in time and space, for the first time, we present a routing-policy guided topology partitioning methodology to solve this problem. We propose topology partitioning methods for three kinds of commonly used topologies (mesh, torus and fat-tree) equipped with the three most popular routing policies (deterministic routing, directionally adaptive routing and fully adaptive routing). Based on the above methods, we propose the key techniques required in this topology partitioning based static energy management in supercomputer interconnection networks to switch off unused routers in both time and space dimensions. Three topology-aware resource allocation algorithms have been developed to handle effectively different job-mixes running on a supercomputer. We validate the effectiveness of our methodology by using Tianhe-2 and a simulator for the aforementioned topologies and routing policies. The energy savings achieved on a subsystem of Tianhe-2 range from 3.8 to 79.7 percent. This translates into a yearly energy cost reduction of up to half a million US dollars for Tianhe-2.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 18
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: This paper proposes several designs of approximate restoring dividers; two different levels of approximation (cell and array levels) are employed. Three approximate subtractor cells are utilized for integer subtraction as the basic step of division; these cells trade accuracy in subtraction against other metrics, such as circuit complexity and power dissipation. At the array level, exact cells are either replaced or truncated in the approximate divider designs. A comprehensive evaluation of approximation at both cell and array (divider) levels is pursued using error analysis and HSPICE simulation; different circuit metrics including complexity and power dissipation are evaluated. Different applications are investigated by utilizing the proposed approximate arithmetic circuits. The simulation results show that, with extensive savings in power dissipation and circuit complexity, the proposed designs offer better error-tolerant capabilities for quotient-oriented applications (image processing) than for remainder-oriented applications (modulo operations). The proposed approximate restoring divider is significantly better than the approximate non-restoring scheme presented in the technical literature.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
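For orientation, the exact restoring-division recurrence that the approximate designs above start from is sketched below: shift in one dividend bit, subtract the divisor, and restore the partial remainder when the trial subtraction underflows. This is a plain software model of the exact algorithm, not one of the approximate cell or array designs.

```python
def restoring_divide(dividend, divisor, n_bits):
    """Unsigned restoring division producing an n_bits-wide quotient and a remainder."""
    assert divisor > 0
    remainder, quotient = 0, 0
    for i in range(n_bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down the next bit
        remainder -= divisor                                  # trial subtraction
        if remainder < 0:
            remainder += divisor                              # restore: subtraction failed
            quotient = (quotient << 1) | 0
        else:
            quotient = (quotient << 1) | 1
    return quotient, remainder

assert restoring_divide(100, 7, 8) == (14, 2)
```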
  • 19
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Wireless sensor networks (WSNs) have been considered to be the next-generation paradigm of structural health monitoring (SHM) systems due to their low cost, high scalability and ease of deployment. Due to the intrinsically energy-intensive nature of the sensor nodes in SHM applications, it is highly preferable that they can be divided into subsets and take turns to monitor the condition of a structure. This approach is generally called ‘coverage-preserving scheduling’ and has been widely adopted in existing WSN applications. The problem of partitioning the nodes into subsets is generally called the ‘maximum lifetime coverage problem’ (MLCP). However, existing solutions to the MLCP cannot be directly applied to SHM applications. In contrast to other WSN applications, we cannot define a specific coverage area independently for each sensor node in SHM, which is however the basic assumption in all existing solutions to the MLCP. In this paper, we propose two approaches to solve the MLCP in SHM. The performance of the methods is demonstrated through both extensive simulations and real experiments.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 20
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: NAND flash memory is widely used for the secondary storage of computer systems. The flash translation layer (FTL) is the firmware that manages and operates a flash-based storage device. One of the FTL's modules manages the RAM buffer of the flash device, which is now large enough to be used for both address mapping and data buffering. Because this buffer is the fastest component of the flash device, its effective management has a significant impact on the performance of data storage and access. This paper proposes a novel scheme called TreeFTL for this purpose. TreeFTL organizes address translation pages and data storage pages in a tree-like structure in the RAM buffer. The tree enables TreeFTL to adapt to the access behaviors of workloads by dynamically adjusting the partitions for address mapping and data buffering. Furthermore, TreeFTL employs a lightweight mechanism to evict the least-recently-used victim pages when the need arises. Our experiments show that TreeFTL spends 46.6 and 49.0 percent less service time over various workloads than two state-of-the-art algorithms, respectively, for a 64 MB RAM buffer.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
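The least-recently-used eviction that TreeFTL's victim selection builds on can be sketched in a few lines; the tree-structured split between mapping pages and data pages is not modelled here, and the capacity and page numbers are arbitrary illustrative values.

```python
from collections import OrderedDict

class LRUBuffer:
    """A tiny RAM page buffer that evicts the least-recently-used page when full."""
    def __init__(self, capacity):
        self.capacity, self.pages = capacity, OrderedDict()

    def access(self, page_no, data=None):
        if page_no in self.pages:
            self.pages.move_to_end(page_no)                 # mark as most recently used
        else:
            if len(self.pages) >= self.capacity:
                victim, _ = self.pages.popitem(last=False)  # evict the LRU victim page
                print("evict page", victim)
            self.pages[page_no] = data
        return self.pages[page_no]

buf = LRUBuffer(capacity=2)
for p in [1, 2, 1, 3]:   # page 2 is the least recently used when page 3 arrives
    buf.access(p)
```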
  • 21
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Whereas clustered microarchitectures themselves have been extensively studied, the memory units for these clustered microarchitectures have received relatively little attention. This article discusses some of the inherent challenges of clustered memory units and shows how these can be overcome. Clustered memory pipelines work well with the late allocation of load/store queue entries and physically unordered queues. Yet this approach has characteristic problems such as queue overflows and allocation patterns that lead to deadlocks. We propose techniques to solve each of these problems and show that a distributed memory unit can offer significant energy savings and speedups over a centralized unit. For instance, compared to a centralized cache with a load/store queue of 64/24 entries, our four-cluster distributed memory unit with load/store queues of 16/8 entries each consumes 31 percent less energy and performs 4.7 percent better on SPECint and consumes 36 percent less energy and performs 7 percent better for SPECfp.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 22
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: In Broadcast Encryption (BE) systems like Pay-TV, AACS, online content sharing and broadcasting, reducing the header length (communication overhead per session) is of practical interest. The Subset Difference (SD) scheme due to Naor-Naor-Lotspiech (NNL) is the most popularly used BE scheme. We introduce the $(a,b,\gamma)$ augmented binary tree subset difference ($(a,b,\gamma)$-ABTSD) scheme which is a generalization of the NNL-SD scheme. By varying the parameters $(a,b,\gamma)$, it is possible to obtain $O(n\log n)$ different schemes. The average header length achieved by the new schemes is smaller than all known schemes having the same decryption time as that of the NNL-SD scheme and achieving non-trivial trade-offs between the user storage and the header size. The amount of key material that a user is required to store increases. For the earlier mentioned applications, reducing header size and achieving fast decryption is perhaps more of a concern than the user storage.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 23
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: We propose a new optimal data placement technique to improve the performance of MapReduce in cloud data centers by considering not only the data locality but also the global data access costs. We first conducted an analytical and experimental study to identify the performance issues of MapReduce in data centers and to show that MapReduce tasks that are involved in unexpected remote data access have much greater communication costs and execution time, and can significantly deteriorate the overall performance. Next, we formulated the problem of optimal data placement and proposed a generative model to minimize global data access cost in data centers and showed that the optimal data placement problem is NP-hard. To solve the optimal data placement problem, we propose a topology-aware heuristic algorithm by first constructing a replica-balanced distribution tree for the abstract tree structure, and then building a replica-similarity distribution tree for detail tree construction, to construct an optimal replica distribution tree. The experimental results demonstrated that our optimal data placement approach can improve the performance of MapReduce with lower communication and computation costs by effectively minimizing global data access costs, more specifically reducing unexpected remote data access.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 24
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: This paper presents a Ternary Content-addressable Memory (TCAM) design which is based on the use of floating-gate (flash) transistors. TCAMs are extensively used in high speed IP networking, and are commonly found in routers in the internet core. Traditional TCAM ICs are built using CMOS devices, and a single TCAM cell utilizes 17 transistors. In contrast, our TCAM cell utilizes only two flash transistors, thereby significantly reducing circuit area. We cover the chip-level architecture of the TCAM IC briefly, focusing mainly on the TCAM block which does fast parallel IP routing table lookup. Our flash-based TCAM (FTCAM) block is simulated in SPICE, and we show that it has a significantly lowered area compared to a CMOS based TCAM block, with a speed that can meet current ($\sim$400 Gb/s) data rates that are found in the internet core.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
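A software model of the ternary match a TCAM row performs may help here: each stored entry has a value and a mask, masked-out bits are "don't care", and the first matching entry wins. The routing entries below are illustrative; the flash-transistor cell design itself is of course not representable this way.

```python
def tcam_lookup(key, table):
    """table: list of (value, mask, result); the first matching entry wins."""
    for value, mask, result in table:
        if (key & mask) == (value & mask):
            return result
    return None

# Longest-prefix-style routing entries (most specific first), as in an IP router.
routes = [
    (0xC0A80100, 0xFFFFFF00, "next-hop A"),   # 192.168.1.0/24
    (0xC0A80000, 0xFFFF0000, "next-hop B"),   # 192.168.0.0/16
    (0x00000000, 0x00000000, "default route"),
]
print(tcam_lookup(0xC0A80142, routes))        # 192.168.1.66 -> next-hop A
```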
  • 25
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The Booth multiplier has been widely used for high performance signed multiplication by encoding and thereby reducing the number of partial products. A multiplier using the radix-4 (or modified Booth) algorithm is very efficient due to the ease of partial product generation, whereas the radix-8 Booth multiplier is slow due to the complexity of generating the odd multiples of the multiplicand. In this paper, this issue is alleviated by the application of approximate designs. An approximate 2-bit adder is deliberately designed for calculating the sum of $1\times$ and $2\times$ of a binary number. This adder requires a small area, a low power and a short critical path delay. Subsequently, the 2-bit adder is employed to implement the less significant section of a recoding adder for generating the triple multiplicand with no carry propagation. In the pursuit of a trade-off between accuracy and power consumption, two signed $16\times 16$-bit approximate radix-8 Booth multipliers are designed using the approximate recoding adder with and without the truncation of a number of less significant bits in the partial products. The proposed approximate multipliers are faster and more power efficient than the accurate Booth multiplier. The multiplier with 15-bit truncation achieves the best overall performance in terms of hardware and accuracy when compared to other approximate Booth multiplier designs. Finally, the approximate multipliers are applied to the design of a low-pass FIR filter and they show better performance than other approximate Booth multipliers.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
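The exact radix-4 (modified Booth) recoding that serves as the baseline in the record above can be reproduced in software: each overlapping 3-bit group of the multiplier selects a partial product in {-2, -1, 0, +1, +2} times the multiplicand. This sketch shows only the accurate algorithm; the approximate recoding adder and truncation are not modelled.

```python
# Digit selected by each overlapping 3-bit group (b[i+1], b[i], b[i-1]).
BOOTH4 = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
          0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

def booth_radix4_multiply(a, b, n_bits=16):
    """Signed n_bits x n_bits multiplication via radix-4 Booth recoding of b."""
    b_u = b & ((1 << n_bits) - 1)            # two's-complement view of the multiplier
    b_ext = b_u << 1                         # append the implicit 0 to the right
    product = 0
    for i in range(0, n_bits, 2):            # one recoded digit per two multiplier bits
        group = (b_ext >> i) & 0b111
        product += BOOTH4[group] * a << i    # add the selected partial product
    return product

for x, y in [(1234, -567), (-25, -31), (32767, 32767)]:
    assert booth_radix4_multiply(x, y) == x * y
```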
  • 26
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Solid State Drives (SSDs) have been extensively deployed as the cache of hard disk-based storage systems. The SSD-based cache generally supplies ultra-large capacity, whereas managing so large a cache introduces excessive memory overhead, which in turn makes the SSD-based cache neither cost-effective nor energy-efficient. This work aims to reduce the memory overhead introduced by the replacement policy of an SSD-based cache. Traditionally, the data structures involved in a cache replacement policy reside in main memory. However, these in-memory data structures are no longer suitable for an SSD-based cache, since the cache is much larger than ever. We propose a memory-efficient framework which keeps most data structures in the SSD while leaving only a memory-efficient data structure (i.e., a new Bloom filter variant proposed in this work) in main memory. Our framework can be used to implement any LRU-based replacement policy under negligible memory overhead. We evaluate our proposals via theoretical analysis and prototype implementation. Experimental results demonstrate that our framework is practical for implementing most replacement policies for large caches, and is able to reduce the memory overhead by about $10\times$.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
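A compact Bloom filter of the kind such a framework keeps in DRAM (so that the bulk of the replacement metadata can stay on the SSD) looks roughly like the sketch below; the hash construction and sizes are illustrative assumptions, not the paper's specific design.

```python
import hashlib

class BloomFilter:
    """Probabilistic set membership: no false negatives, occasional false positives."""
    def __init__(self, n_bits=1 << 16, n_hashes=4):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8)

    def _positions(self, key):
        digest = hashlib.sha256(str(key).encode()).digest()
        for i in range(self.n_hashes):      # derive k positions from one digest
            yield int.from_bytes(digest[4 * i:4 * i + 4], "little") % self.n_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

bf = BloomFilter()
bf.add("block-42")
assert bf.might_contain("block-42")      # never a false negative
print(bf.might_contain("block-7"))       # usually False; false positives are possible
```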
  • 27
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: A large portion of existing multithreaded embedded software has been programmed for symmetric shared-memory platforms where a monolithic memory block is shared by all cores. Such platforms accommodate popular parallel programming models such as POSIX threads and OpenMP. However, with the growing number of cores in modern manycore embedded architectures, such platforms present a bottleneck related to their centralized memory accesses. This paper proposes a solution tailored for the efficient execution of applications defined with shared-memory programming models on on-chip distributed-memory multicore architectures. It shows how performance, area and energy consumption are significantly improved thanks to the scalability of these architectures. This is illustrated in an open-source realistic design framework, including tools from ASIC to microkernel.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 28
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: This paper describes a procedure that computes seeds for $LFSR$-based generation of partially-functional broadside tests. Existing $LFSR$-based test data compression methods compute seeds based on incompletely-specified test cubes. Functional broadside tests are fully-specified, and they have fully-specified scan-in states. This is the main challenge that the test generation procedure described in this paper needs to address. It addresses it by using a process that modifies an initial seed $s_i$ in order to reduce the Hamming distance between the scan-in state $p_i$ that $s_i$ creates and a reachable state $r_j$. When the Hamming distance is reduced to zero, the seed can be used for generating functional broadside tests. When the distance is larger than zero, the tests are partially-functional. Experimental results are presented for transition faults in benchmark circuits to demonstrate the resulting distances and fault coverage.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
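The basic mechanism behind seed computation is the expansion of an LFSR seed into the bit stream shifted into a scan chain. A minimal Fibonacci-style LFSR sketch follows; the feedback taps, seed and lengths are arbitrary illustrative choices, and the Hamming-distance-driven seed modification described above is not shown.

```python
def lfsr_expand(seed_bits, taps, n_out):
    """Fibonacci LFSR: `taps` are the state positions XORed into the feedback bit."""
    state = list(seed_bits)
    out = []
    for _ in range(n_out):
        out.append(state[-1])                    # bit shifted into the scan chain
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = [feedback] + state[:-1]          # shift the register, insert feedback
    return out

# An 8-bit seed expanded into a 20-bit scan-in stream.
print(lfsr_expand([1, 0, 0, 1, 1, 0, 1, 0], taps=[0, 2, 3, 7], n_out=20))
```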
  • 29
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: A new apparatus for fast multiplication of two numbers is introduced. Inputs are split into partitions, and one number is replaced by two with zeros interlaced in every other partition. Products are computed with no carries between partitions, in the time required to multiply the short partitions and add the partial sums. Component adders and multipliers can be chosen to trade off area and speed. A new graphical tool is used to compare this multiplier to existing ones based on CMOS VLSI simulations.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 30
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Covert channels are widely considered as a major risk of information leakage in various operating systems, such as desktop, cloud, and mobile systems. The existing works of modeling covert channels have mainly focused on using finite state machines (FSMs) and their transforms to describe the process of covert channel transmission. However, a FSM is rather an abstract model, where information about the shared resource, synchronization, and encoding/decoding cannot be presented in the model, making it difficult for researchers to realize and analyze the covert channels. In this paper, we use the high-level Petri Nets (HLPN) to model the structural and behavioral properties of covert channels. We use the HLPN to model the classic covert channel protocol. Moreover, the results from the analysis of the HLPN model are used to highlight the major shortcomings and interferences in the protocol. Furthermore, we propose two new covert channel models, namely: (a) two channel transmission protocol (TCTP) model and (b) self-adaptive protocol (SAP) model. The TCTP model circumvents the mutual inferences in encoding and synchronization operations; whereas the SAP model uses sleeping time and redundancy check to ensure correct transmission in an environment with strong noise. To demonstrate the correctness and usability of our proposed models in heterogeneous environments, we implement the TCTP and SAP in three different systems: (a) Linux, (b) Xen, and (c) Fiasco.OC. Our implementation also indicates the practicability of the models in heterogeneous, scalable and flexible environments.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 31
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Recovery from sudden power-off (SPO) is one of the primary concerns among practitioners which bars the quick and wide deployment of flash storage devices. In this work, we propose Metadata Embedded Write (MEW), a novel scheme for handling the sudden power-off recovery in modern flash storage devices. Given that a large fraction of commercial SSDs employ compression technology, MEW exploits the compression-induced internal fragmentation in the data area to store rich metadata for fast and complete recovery. MEW consists of (i) a metadata embedding scheme to harbor SSD metadata in a physical page together with multiple compressed logical pages, (ii) an allocation chain based fast recovery scheme, and (iii) a light-weight metadata logging scheme which enables MEW to maintain the metadata for incompressible data, too. We performed extensive experiments to examine the performance of MEW. The performance overhead of MEW is 3 percent in the worst case, in terms of the write amplification factor, compared to the pure compression-based FTL that does not have any recovery scheme.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 32
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Scaling the CMOS devices deep into the nanorange reduces their reliability margins significantly. Consequently, accurately calculating the reliability of digital nanocircuits is becoming a necessity for investigating design alternatives to optimize the trade-offs between area-power-delay and reliability. However, accurate reliability calculation of large and highly connected circuits is complex and very time consuming. This paper proposes a progressive consensus-based algorithm for identifying the worst reliability input vectors and the associated critical logic gates. Improving the reliability of the critical gates helps circuit designers to effectively improve the circuit's overall reliability while having a minimal impact on the traditional power-area-delay design parameters. The accuracy and efficiency of the algorithm can be tuned to fit a variety of applications. The algorithm scales well with circuit size, and is independent of the interconnect complexity and the logic depth. Extensive computational results show that the accuracy and the efficiency of the proposed algorithm are better than the most recent results reported in the literature.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 33
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: The Serial-out bit-level multiplication scheme is characterized by an important latency feature. It has an ability to sequentially generate an output bit of the multiplication result in each clock cycle. However, the computational complexity of the existing serial-out bit-level multipliers in $GF(2^m)$ using normal basis representation limits its usefulness in many applications; hence, an optimized serial-out bit-level multiplier using polynomial basis representation is needed. In this paper, we propose new serial-out bit-level Mastrovito multiplier schemes. We show that in terms of the time complexities, the proposed multiplier schemes outperform the existing serial-out bit-level schemes available in the literature. In addition, using the proposed multiplier schemes, we present new hybrid-double multiplication architectures. To the best of our knowledge, this is the first time such a hybrid multiplier structure using the polynomial basis is proposed. Prototypes of the presented serial-out bit-level schemes and the proposed hybrid-double multiplication architectures (10 schemes in total) are implemented over both $GF(2^{163})$ and $GF(2^{233})$, and experimental results are presented.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
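Polynomial-basis multiplication in $GF(2^m)$, the representation these multiplier schemes build on, reduces to shift-and-XOR followed by reduction modulo an irreducible polynomial. The sketch below uses the NIST B-163 pentanomial as an example; it is a plain software model, not the serial-out or hybrid-double hardware architectures.

```python
M = 163
IRRED = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1   # x^163 + x^7 + x^6 + x^3 + 1

def gf2m_multiply(a, b, m=M, irred=IRRED):
    """Shift-and-XOR multiplication of field elements, then modular reduction."""
    result = 0
    for i in range(m):
        if (b >> i) & 1:
            result ^= a << i                 # add a * x^i (carry-free addition is XOR)
    for i in range(2 * m - 2, m - 1, -1):    # reduce modulo the irreducible polynomial
        if (result >> i) & 1:
            result ^= irred << (i - m)
    return result

x = (1 << 162) | (1 << 5) | 1
y = (1 << 100) | (1 << 3)
print(hex(gf2m_multiply(x, y)))
```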
  • 34
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Though a cooperative broadcast scheme has been proposed for fading environments, it has two defects: First, it only handles a packet flow from a single source node in the network, but does not consider the scenario of multiple packet flows simultaneously broadcast from different source nodes. Second, it only allows a single relay node to forward a packet in each time slot, though multiple relay nodes forwarding in a time slot can significantly reduce broadcast latency. In this paper, we aim to achieve low-latency multi-flow broadcast in wireless multi-hop networks with fading channels. To describe the interference among the transmissions in different flows, we incorporate the Rayleigh fading model into the signal-to-noise ratio (SNR) model. Then, we introduce a cooperative diversity scheme which allows multiple relays to forward in a time slot to reduce broadcast latency. We then formulate an interesting problem: In a fading environment, what is the optimal relay allocation schedule to minimize the broadcast latency? We propose a warm-up heuristic algorithm for single-flow cooperative broadcast, based on which we further propose a heuristic algorithm for multi-flow cooperative broadcast. Simulation results demonstrate that the two algorithms achieve lower broadcast latency than a previous method.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 35
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: On modern multicore machines, the memory management typically combines address interleaving in hardware and random allocation in the operating system (OS) to improve performance of both memory and cache. The conventional solutions, however, are increasingly strained as a wide variety of workloads run on complicated memory hierarchy and cause contention at multiple levels. We describe a new framework (named HVR) in OS memory management to support a flexible policy space for tackling diverse application needs, integrating vertical partitioning across layers, horizontal partitioning and random-interleaved allocation at a single layer. We exhaustively study the performance of these policies for over 2,000 workloads and correlate performance with application characteristics. Based on this correlation we derive several practical rules of memory allocation that we integrate into the unified HVR framework to guide resource partitioning and sharing for dynamic and diverse workloads. We implement our approach in Linux kernel 2.6.32 as a restructured page indexing system plus a series of kernel modules. Experimental results show that our framework consistently outperforms the unmodified Linux kernel, with up to 21 percent performance gains, and outperforms prior solutions at individual levels of the memory hierarchy.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 36
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: With the explosive growth in data volume, the I/O bottleneck has become an increasingly daunting challenge for big data analytics in the Cloud. Recent studies have shown that moderate to high data redundancy clearly exists in primary storage systems in the Cloud. Our experimental studies reveal that data redundancy exhibits a much higher level of intensity on the I/O path than that on disks due to relatively high temporal access locality associated with small I/O requests to redundant data. Moreover, directly applying data deduplication to primary storage systems in the Cloud will likely cause space contention in memory and data fragmentation on disks. Based on these observations, we propose a performance-oriented I/O deduplication, called POD, rather than a capacity-oriented I/O deduplication, exemplified by iDedup, to improve the I/O performance of primary storage systems in the Cloud without sacrificing capacity savings of the latter. POD takes a two-pronged approach to improving the performance of primary storage systems and minimizing performance overhead of deduplication, namely, a request-based selective deduplication technique, called Select-Dedupe, to alleviate the data fragmentation and an adaptive memory management scheme, called iCache, to ease the memory contention between the bursty read traffic and the bursty write traffic. We have implemented a prototype of POD as a module in the Linux operating system. The experiments conducted on our lightweight prototype implementation of POD show that POD significantly outperforms iDedup in the I/O performance measure by up to 87.9 percent with an average of 58.8 percent. Moreover, our evaluation results also show that POD achieves comparable or better capacity savings than iDedup.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
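The core of request-level deduplication described above can be sketched as: fingerprint each incoming write and skip the device write when the fingerprint is already indexed. The class and names below are illustrative; POD's Select-Dedupe selectivity and iCache memory management are not modelled.

```python
import hashlib

class DedupStore:
    """Content-addressed block store: identical blocks are written only once."""
    def __init__(self):
        self.index = {}      # fingerprint -> physical block id
        self.blocks = []     # stand-in for the backing storage device

    def write(self, data: bytes):
        fp = hashlib.sha1(data).hexdigest()
        if fp in self.index:                 # duplicate request: no device write needed
            return self.index[fp], True
        self.blocks.append(data)             # new content: allocate a physical block
        self.index[fp] = len(self.blocks) - 1
        return self.index[fp], False

store = DedupStore()
print(store.write(b"A" * 4096))   # (0, False) -> new block written
print(store.write(b"A" * 4096))   # (0, True)  -> deduplicated, write skipped
```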
  • 37
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: This manuscript proposes three classes of codes for error correction in a storage system in which the memory cells do not have the same number of levels, i.e., a multiscale storage (MSS). The proposed codes, namely a column-deleted SMSEC code, an element-compacted SMSEC code, and a product SMSEC code, are single multiscale-symbol error correction (SMSEC) codes capable of correcting any error occurring on a single memory cell. In the proposed codes, the codewords are divided into two partitions: the elements of the first partition are over GF(2^b1), while those of the remaining partition are over GF(2^b2). This paper also gives guidelines for selecting among the three SMSEC codes to meet the desired hardware overhead in the parallel decoder for realistic parameters of the partition pair, such as (b1, b2) = (4,3), (4,2), and (3,2). Moreover, it is shown that the best choice for an MSS system is the SMSEC code with the shortest check bit length; if the check bit lengths of at least two codes are equal, then the element-compacted SMSEC code incurs the smallest hardware overhead.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 38
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Multi-cloud storage can provide better features such as availability and scalability. Current works use multiple cloud storage providers with erasure coding to achieve certain benefits, including improving fault tolerance or avoiding vendor lock-in. However, these works only use multi-cloud storage in ad-hoc ways, and none of them considers the optimization issue in general. In fact, the key to optimizing multi-cloud storage is to effectively choose providers and erasure coding parameters. Meanwhile, the data placement should satisfy system or application developers’ requirements. As developers often demand various objectives to be optimized simultaneously, such complex requirement optimization cannot be easily fulfilled in ad-hoc ways. This paper presents Triones, a systematic model that formally formulates data placement in multi-cloud storage using erasure coding. Firstly, Triones addresses the problem of data placement optimization by applying non-linear programming and geometric space abstraction. It can satisfy complex requirements involving multi-objective optimization. Secondly, Triones can effectively balance different objectives in the optimization and is scalable to incorporate new ones. The effectiveness of the model is demonstrated by extensive experiments on multiple cloud storage providers in the real world. For simple requirements, Triones can achieve a 50 percent access latency reduction compared with the model in µLibCloud. For complex requirements, Triones can improve the fault-tolerance level by 2x and reduce the access latency and vendor lock-in level by 30 to 70 percent and 49.85 percent respectively, with about 19.19 percent more cost, compared with the model only optimizing cost in Scalia.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 39
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: In this paper, we propose a two-factor data security protection mechanism with factor revocability for cloud storage system. Our system allows a sender to send an encrypted message to a receiver through a cloud storage server. The sender only needs to know the identity of the receiver but no other information (such as its public key or its certificate). The receiver needs to possess two things in order to decrypt the ciphertext. The first thing is his/her secret key stored in the computer. The second thing is a unique personal security device which connects to the computer. It is impossible to decrypt the ciphertext without either piece. More importantly, once the security device is stolen or lost, this device is revoked. It cannot be used to decrypt any ciphertext. This can be done by the cloud server which will immediately execute some algorithms to change the existing ciphertext to be un-decryptable by this device. This process is completely transparent to the sender. Furthermore, the cloud server cannot decrypt any ciphertext at any time. The security and efficiency analysis show that our system is not only secure but also practical.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 40
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: The digit-recurrence division algorithm is used in several high-performance processors because it provides good tradeoffs in terms of latency, area and power dissipation. In this work we develop a minimally redundant radix-8 divider for binary64 (double-precision) aiming at obtaining better energy efficiency in the performance-per-watt space. The results show that the radix-8 divider, when compared to radix-4 and radix-16 units, requires less energy to complete a division for high clock rates.
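    For readers unfamiliar with digit recurrence, the sketch below shows a generic radix-8 recurrence with the minimally redundant digit set {-4, ..., 4}; the digit selection here is an idealized rounding rather than the table-based selection a hardware divider would use, and operand pre-scaling is assumed.
```python
from fractions import Fraction

def radix8_divide(x, d, digits=20):
    """Digit-recurrence division with radix r = 8 and the minimally
    redundant digit set {-4, ..., 4}. Assumes operands are pre-scaled
    so that 0 <= x <= d/2; returns an approximation of x/d."""
    r = 8
    w = Fraction(x)              # partial remainder w[0] = x
    q = Fraction(0)
    weight = Fraction(1)
    for _ in range(digits):
        # idealized digit selection: nearest digit to r*w/d, clamped to [-4, 4]
        qj = max(-4, min(4, round(Fraction(r) * w / d)))
        w = r * w - qj * d       # recurrence: w[j+1] = r*w[j] - q[j+1]*d
        weight /= r
        q += qj * weight         # accumulate the quotient digit
    return q

print(float(radix8_divide(1, 3)))   # ~0.3333333333333333
```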
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 41
    Publication Date: 2016-02-09
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 42
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: To achieve high reliability in on-chip networks, it is necessary to test the network as frequently as possible to detect physical failures before they lead to system-level failures. A main obstacle is that the circuit under test has to be isolated, resulting in network cuts and packet blockage which limit the testing frequency. To address this issue, we propose a comprehensive network-level approach which can test multiple routers simultaneously at high speed without blocking or dropping packets. We first introduce a reconfigurable router architecture allowing the cores to keep their connections with the network while the routers are under test. A deadlock-free and highly adaptive routing algorithm is proposed to support reconfigurations for testing. In addition, a testing sequence is defined to allow testing multiple routers while avoiding the dropping of packets. A procedure is proposed to control the behavior of the affected packets during the transition of a router from the normal to the testing mode and vice versa. This approach neither interrupts the execution of applications nor has a significant impact on the execution time. Experiments with the PARSEC benchmarks on an 8×8 NoC-based chip multiprocessor show only a 3 percent execution time increase with four routers simultaneously under test.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 43
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Feedback bridging faults may give rise to oscillations within integrated circuits. This work mainly investigates the propagation of oscillations, a behavior that may have a significant impact on fault detection. We propose both a logic-level model of the faulty circuit and two techniques aimed at generating high-quality test sequences.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 44
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: In-field degradation of manycore processors poses a grand challenge to core management, largely because the degradation is hard to quantify. We propose a novel core-level degradation quantification scheme, CoreRank, to facilitate this management. We first develop a new degradation metric, called “healthy condition”, to capture the implication of the performance degradation of a core with specific degraded components. Then, we propose a performance sampling scheme that uses micro-operation streams, called snippets, to statistically quantify a core’s healthy condition. We find that similar snippets exhibit a stable performance distribution, which makes them ideal micro-benchmarks for assessing core-level healthy conditions. We develop a hardware-implemented version of CoreRank based on a Bloom filter and a hash table. Unlike the traditional “faulty” or “fault-free” judgement, CoreRank provides a key facility to make better use of imperfect cores that suffer from various progressive aging mechanisms such as NBTI and HCI. Experimental results show that CoreRank successfully hides significant performance degradation of a defective manycore processor even when more than half of the cores are salvaged from various defects.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 45
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Aggressive technology scaling triggers novel challenges to the design of multi-/many-core systems, such as limited power budget and increased reliability issues. Today’s many-core systems employ dynamic power management and runtime mapping strategies trying to offer optimal performance while fulfilling power constraints. On the other hand, due to the reliability challenges, online testing techniques are becoming a necessity in current and near future technologies. However, state-of-the-art techniques are not aware of the other power/performance requirements. This paper proposes a power-aware non-intrusive online testing approach for many-core systems. The approach schedules software based self-test routines on the various cores during their idle periods, while honoring the power budget and limiting delays in the workload execution. A test criticality metric, based on a device aging model, is used to select cores to be tested at a time. Moreover, power and reliability issues related to the testing at different voltage and frequency levels are also handled. Extensive experimental results reveal that the proposed approach can i) efficiently test the cores within the available power budget causing a negligible performance penalty, ii) adapt the test frequency to the current cores’ aging status, and iii) cover available voltage and frequency levels during the testing.
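    A toy sketch of the scheduling idea described above (the selection rule and all numbers are illustrative assumptions, not the paper's actual algorithm): idle cores are considered in decreasing order of test criticality, and a self-test routine is launched only while the power budget allows it.
```python
def schedule_tests(cores, power_budget):
    """Pick idle cores for software-based self-test, most critical first,
    without exceeding the remaining power budget (toy illustration).
    Each core is a dict: {'id', 'idle', 'criticality', 'test_power'}."""
    used = sum(c['test_power'] for c in cores if c.get('under_test'))
    selected = []
    for core in sorted(cores, key=lambda c: c['criticality'], reverse=True):
        if not core['idle'] or core.get('under_test'):
            continue
        if used + core['test_power'] <= power_budget:
            used += core['test_power']
            core['under_test'] = True
            selected.append(core['id'])
    return selected

cores = [
    {'id': 0, 'idle': True,  'criticality': 0.9, 'test_power': 2.0},
    {'id': 1, 'idle': True,  'criticality': 0.4, 'test_power': 2.0},
    {'id': 2, 'idle': False, 'criticality': 0.8, 'test_power': 2.0},
]
print(schedule_tests(cores, power_budget=3.0))   # [0]
```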
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 46
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Software-Based Self-Test is an effective methodology for devising the online testing of Systems-on-Chip. In the automotive field, a set of test programs to be run during mission mode is also called a Core Self-Test library. This paper introduces several new contributions: (1) it illustrates the issues that need to be taken into account when generating test programs for online execution; (2) it proposes an overall development flow based on ordered generation of test programs that minimizes the computational effort; and (3) it provides guidelines for allowing the Core Self-Test library to coexist with the mission application while guaranteeing execution robustness. The proposed methodology has been evaluated on a large industrial case study. The coverage level reached after one year of team work is over 87 percent stuck-at fault coverage, and the execution time is compliant with the ISO 26262 specification. Experimental results suggest that alternative approaches may require excessive evaluation time, thus making the generation flow unfeasible for large designs.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 47
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Scaling supply voltage to values near the threshold voltage allows a dramatic decrease in the power consumption of processors; however, the lower the voltage, the higher the sensitivity to process variation, and, hence, the lower the reliability. Large SRAM structures, like the last-level cache (LLC), are extremely vulnerable to process variation because they are aggressively sized to satisfy high density requirements. In this paper, we propose Concertina, an LLC designed to enable reliable operation at low voltages with conventional SRAM cells. Based on the observation that for many applications the LLC contains large amounts of null data, Concertina compresses cache blocks in order that they can be allocated to cache entries with faulty cells, enabling use of 100 percent of the LLC capacity. To distribute blocks among cache entries, Concertina implements a compression- and fault-aware insertion/replacement policy that reduces the LLC miss rate. Concertina reaches the performance of an ideal system implementing an LLC that does not suffer from parameter variation with a modest storage overhead. Specifically, performance degrades by less than 2 percent, even when using small SRAM cells, which implies over 90 percent of cache entries having defective cells, and this represents a notable improvement on previously proposed techniques.
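    The placement constraint can be illustrated with a small sketch (an illustrative simplification, not Concertina's actual mechanism): null sub-blocks are not stored, so a block fits a degraded cache entry as long as its non-null sub-blocks do not outnumber the healthy sub-entries.
```python
def fits(block, faulty_mask, subblock_size=8):
    """Can `block` (bytes) be stored in a cache entry whose faulty
    sub-entries are flagged in `faulty_mask` (list of booleans)?
    Null (all-zero) sub-blocks are not stored, so only the non-null
    sub-blocks need healthy sub-entries."""
    subblocks = [block[i:i + subblock_size]
                 for i in range(0, len(block), subblock_size)]
    non_null = sum(1 for sb in subblocks if any(sb))
    healthy = sum(1 for faulty in faulty_mask if not faulty)
    return non_null <= healthy

# 64-byte block with a single non-null 8-byte sub-block
block = bytes(24) + b"\x01" * 8 + bytes(32)
print(fits(block, faulty_mask=[True] * 7 + [False]))   # True: one healthy slot suffices
```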
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 48
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Due to more aggressive design rules adopted by memories than logic circuits, memories have been considered as the major technology driver of advanced logic circuits, so far as CMOS process technology is concerned. Memory failure pattern identification therefore is important, and is traditionally considered a key task that can help improve the efficiency of memory diagnosis and failure analysis. Critical failure patterns (that are the yield killers), however, may change in different memory designs and process technologies. It is difficult to identify critical failure patterns from high-volume memory failure bitmaps if they are not predefined. To solve this problem, we propose a local parallel search algorithm for efficient memory failure pattern identification. In addition, the proposed system integrates the defect-spectrum-based and coordinate-distance-based methods to identify critical memory failure patterns from a large amount of memory failure bitmaps automatically, even if they are not defined in advance. In our experiment for 132,488 4-MB memory failure bitmaps, the proposed system can automatically identify six critical yet undefined failure patterns in minutes, in addition to all known patterns. In comparison, the state-of-the-art commercial tools need manual inspection of the memory failure bitmaps to identify the same failure patterns.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 49
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: This paper proposes a novel scheme for a low-power non-volatile (NV) memory that exploits a two-level arrangement for attaining single-event/multiple-bit upset (SEU/MBU) tolerance. Low-power hardened NVSRAM cell designs are initially utilized at the first level; these designs increase the critical charge and decrease power consumption by providing a positive (virtual) ground level voltage. A soft error rate (SER) analysis is also pursued to confirm the findings of the critical charge-based analysis. Simulation of these cells shows that their operation has very high SEU tolerance; the charges at the nodes of the circuits for non-volatile storage and gate leakage current reduction have very high values, thus ensuring that an SEU is highly unlikely to affect correct operation. A novel memory scheme with the proposed NVSRAM cells is proposed for tolerating MBUs; in this scheme, only the error detection circuitry is required, because error correction is provided by the non-volatile elements of the NVSRAM cells. Simulation results show that the proposed scheme is very efficient in terms of delay and number of transistors (as a measure of complexity). Moreover, the very high critical charge of some of the proposed cell designs reduces the number of MBUs appearing as errors at the outputs of the memory, thus further reducing the error detection hardware required by the proposed scheme. An extensive evaluation and comparison of different schemes are presented.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 50
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Graphics processing units (GPUs) are increasingly attractive for both safety-critical and High-Performance Computing applications. GPU reliability is a primary concern for both the automotive and aerospace markets and is becoming an issue also for supercomputers. In fact, the high number of devices in large data centers makes the probability of having at least one corrupted device very high. In this paper, we aim at giving novel insights on GPU reliability by evaluating the neutron sensitivity of modern GPUs’ memory structures, highlighting pattern dependence and multiple-error occurrences. Additionally, a wide set of parallel codes is exposed to controlled neutron beams to measure GPUs’ operational error rates. From experimental data and algorithm analysis we derive general insights on the reliability of parallel algorithms and programming approaches. Finally, error-correcting code, algorithm-based fault tolerance, and duplication-with-comparison hardening strategies are presented and evaluated on GPUs through radiation experiments. We present and compare both the reliability improvement and the overhead imposed by the selected hardening solutions.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 51
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Auction-style pricing policies can effectively reflect the underlying trends in demand and supply for cloud resources, and have thereby attracted research interest recently. In particular, a desirable cloud auction design should be (1) online, to timely reflect the fluctuation of supply-demand relations, (2) expressive, to support heterogeneous user demands, and (3) truthful, to discourage users from cheating behaviors. Meeting these requirements simultaneously is non-trivial, and most existing auction mechanism designs do not directly apply. To meet these goals, this paper conducts the first work on a framework for truthful online cloud auctions where users with heterogeneous demands can come and leave on the fly. Concretely speaking, we first design a novel bidding language, wherein users’ heterogeneous requirements on their desired allocation time, application type, and even how they value different possible allocations can be flexibly and concisely expressed. Besides, building on top of our bidding language we propose COCA, an incentive-Compatible (truthful) Online Cloud Auction mechanism. To ensure truthfulness with heterogeneous and online user demand, the design of COCA is driven by a monotonic payment rule and a utility-maximizing allocation rule. Moreover, our theoretical analysis shows that the worst-case performance of COCA can be well-bounded, and our further discussion shows that COCA performs well when some other important factors in online auction design are taken into consideration. Finally, in simulations the performance of COCA is seen to be comparable to the well-known off-line Vickrey-Clarke-Groves (VCG) mechanism [19].
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 52
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Traffic hotspots, a severe form of network congestion, can arise unexpectedly in a network-on-chip (NoC) due to the inherent spatio-temporal unevenness of application traffic. Hotspots reduce the NoC’s effective throughput; in the worst-case scenario, network traffic flows can be frozen indefinitely. To alleviate this problematic phenomenon several adaptive routing algorithms employ online load-balancing schemes, aiming to reduce the possibility of hotspots arising. Since most are not explicitly hotspot-agnostic, they cannot completely prevent hotspot formation(s) as their reactive capability to hotspots is merely passive. This paper presents a pro-active Hotspot-Preventive Routing Algorithm (HPRA) which uses the advance knowledge gained from network-embedded artificial neural network-based (ANN) hotspot predictors to guide packet routing in mitigating any unforeseen near-future hotspot occurrences. First, these ANN-based predictors are trained offline and during multicore operation they gather online statistical data to predict about-to-be-formed hotspots, promptly informing HPRA to take appropriate hotspot-preventive action(s). Next, in a holistic approach, additional ANN training is performed with data acquired after HPRA interferes, so as to further improve hotspot prediction accuracy; hence, the ANN mechanism does not only predict hotspots, but is also aware of changes that HPRA imposes upon the interconnect infrastructure. Evaluation results, including utilizing real application traffic traces gathered from parallelized workload executions onto a chip multiprocessor architecture, show that HPRA can improve network throughput by up to 81 percent when compared with the prior art. Hardware synthesis results affirm the HPRA mechanism’s moderate overhead requisites.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 53
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Today, a considerable portion of embedded systems, e.g., automotive and avionic, comprise several control applications. Guaranteeing the stability of these control applications in embedded systems, or cyber-physical systems, is perhaps the most fundamental requirement while implementing such applications. This is different from the classical hard real-time systems where often the acceptance criterion is meeting the deadline. In other words, in the case of control applications, guaranteeing stability is considered to be a main design goal, which is linked to the amount of delay and jitter a control application can tolerate before instability. This advocates the need for new design and analysis techniques for embedded real-time systems running control applications. In this paper, the analysis and design of such systems considering a server-based resource reservation mechanism are addressed. The benefits of employing servers are manifold: providing a compositional and scalable framework, protection against other tasks’ misbehaviors, and systematic bandwidth assignment and co-design. We propose a methodology for designing bandwidth-optimal servers to stabilize control tasks. The pessimism involved in the proposed methodology is both discussed theoretically and evaluated experimentally.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 54
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: TSV-to-TSV coupling is known to be a significant detriment to signal integrity in three-dimensional (3D) IC architectures. Designing a reliable Through-Silicon Via is critical in order to support better performance. This paper explores the challenges brought on by capacitive and inductive TSV-to-TSV coupling in TSV-based 3D ICs. Based on our analyses, we propose two approaches to mitigate these effects. In one approach, a novel coding technique that adjusts the current flow pattern is proposed to mitigate the inductive coupling effects. In another approach an alternative architecture, wrapping around the TSVs, is proposed to greatly reduce the capacitive coupling effect. The efficiency of the proposed coding methods and supporting architectures is demonstrated by comprehensive simulations at both the hardware and system levels.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 55
    Publication Date: 2016-02-09
    Description: Mainstream chip multiprocessors already include a significant number of cores that make straightforward snooping-based cache coherence less appropriate. Further increase in core count will almost certainly require more sophisticated tracking of data sharing to minimize unnecessary messages and cache snooping. Directory-based coherence has been the standard solution for large-scale shared-memory multiprocessors and is a clear candidate for on-chip coherence maintenance. A vanilla directory design, however, suffers from inefficient use of storage to keep coherence metadata. The result is a high storage overhead for larger scales. Reducing this overhead leads to saving of resources that can be redeployed for other purposes. In this paper, we exploit familiar characteristics of coherence metadata, but with novel angles and propose two practical techniques to increase the expressiveness of directory entries, particularly for chip-multiprocessors. First, it is well known that the vast majority of cache lines have a small number of sharers. We exploit a related fact with a subtle but important difference: that a significant portion of directory entries only need to track one node. We can thus use a hybrid representation of sharers list for the directory. Second, contiguous memory regions often share the same coherence characteristics and can be tracked by a single entry. We propose an adaptive multi-granular mechanism that does not rely on any profiling, compiler, or operating system support to identify such regions. Moreover, it allows co-existence of line and region entries in the same locations, thus making regions more applicable. We show that both techniques improve the expressiveness of directory entries, and, when combined, can reduce directory storage by more than an order of magnitude with negligible loss of precision.
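    A toy sketch of the hybrid sharers representation motivated above (field names are illustrative, and the paper's multi-granular region tracking is not shown): an entry tracks a single sharer with one pointer and is promoted to a full bit vector only when a second distinct sharer appears.
```python
class DirEntry:
    """Hybrid sharers representation: one pointer or a full bit vector."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.mode = 'single'     # 'single' uses self.owner, 'vector' uses self.bits
        self.owner = None
        self.bits = 0

    def add_sharer(self, node):
        if self.mode == 'single':
            if self.owner is None or self.owner == node:
                self.owner = node
                return
            # a second distinct sharer appears: promote to a full bit vector
            self.mode = 'vector'
            self.bits = (1 << self.owner) | (1 << node)
        else:
            self.bits |= 1 << node

    def sharers(self):
        if self.mode == 'single':
            return [] if self.owner is None else [self.owner]
        return [n for n in range(self.num_nodes) if self.bits >> n & 1]

entry = DirEntry(num_nodes=16)
entry.add_sharer(3)
print(entry.mode, entry.sharers())   # single [3]
entry.add_sharer(7)
print(entry.mode, entry.sharers())   # vector [3, 7]
```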
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 56
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: The importance of the Internet for society is increasing. To ensure a functional Internet, its routers need to operate correctly. However, the need for router flexibility has led to the use of software-programmable network processors in routers, which exposes these systems to data plane attacks. Recently, hardware monitors have been introduced into network processors to verify the expected behavior of processor cores at run time. If instruction-level execution deviates from the expected sequence, an attack is identified, triggering processor core recovery efforts. In this manuscript, we describe a scalable network processor monitoring system that supports the reallocation of hardware monitors to processor cores in response to workload changes. The scalability of our monitoring architecture is demonstrated using theoretical models, simulation, and router system-level experiments implemented on an FPGA-based hardware platform. For a system with four processor cores and six monitors, the monitors result in a 6 percent logic and 38 percent memory bit overhead versus the processor’s core logic and instruction storage. No slowdown of system throughput due to monitoring is reported.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 57
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: As a variation of random linear network coding, segmented network coding (SNC) has attracted great interest in data dissemination over lossy networks due to its low computational cost. In order to guarantee the success of decoding, SNC can adopt a feedbackless forward error correction (FEC) approach by applying a linear block code to the input packets before segmentation at the source node. In particular, if the empirical rank distribution of transfer matrices of segments is known in advance, several classes of coded SNC can achieve close-to-optimal decoding performance. However, the empirical rank distribution in the absence of feedback has been little investigated yet, making the whole performance of the FEC approach unknown. To close this gap, in this paper, we present the first comprehensive study on the transmission scheduling issue for the FEC approach, aiming at optimizing the rank distribution of transfer matrices with little control overhead. We propose an efficient adaptive scheduling framework for coded SNC in lossy unicast networks. This framework is one-sided (i.e., each network node forwards the segments adaptively only according to its own state) and scalable (i.e., its buffer cost will not keep on growing when the number of input packets goes to infinity). The performance of the framework is further optimized based on a linear programming approach. Extensive numerical results show that our framework performs near-optimally with respect to the empirical rank distribution.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 58
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Routing algorithm design for on-chip networks (OCNs) has become increasingly challenging due to high levels of integration and complexity of modern systems-on-chip (SoCs). The inherent unreliability of components, embedded oversized IP blocks, and fine-grained voltage-frequency islands (VFIs) management among others, raise several challenges in OCNs: (a) network topologies become irregular or asymmetric making circular route dependencies that lead to deadlock hard to detect; and (b) routing algorithms that lack strong load-balancing properties often saturate prematurely. In order to address the aforementioned deadlock and load-balancing problems, we propose the traffic balancing oblivious routing (TBOR) algorithm. It is a two-phase routing algorithm consisting of: (1) construction of the weighted acyclic channel dependency graph (CDG) for the OCN to efficiently maximize available resource utilization; and (2) channel ordering across turn models to keep the underlying CDG cycle-free to guarantee deadlock-freedom using one or more turn-models. Channel bandwidth utilization and traffic balancing are achieved through static virtual channel allocation according to residual bandwidth of healthy links. In addition, we introduce in this work two schemes of different granularity of fault detection and analysis while guaranteeing in-order packet delivery by assigning a unique path to each flow. Extensive experiments demonstrate the proposed routing methodology outperforms previous algorithms.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 59
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: The approach to fast application relaunching on the current Android system is to cache background applications in memory. This mechanism is limited by the available memory size. In addition, the application state may not be easily recovered. We propose a prototype system, MARS, to enable page swapping and cache more applications. MARS can speed up application relaunching and restore the application state. As a new page swapping design for optimizing application relaunching, MARS isolates the Android runtime Garbage Collection (GC) from page swapping for compatibility and employs several flash-aware techniques for swap-in speedup. The two main components of MARS are page slot allocation and read/write control. Page slot allocation reorganizes page slots in the swap area to produce sequential reads and improve the performance of swap-in. Read/write control addresses the read/write interference issue by reducing concurrent and extra internal writes. Compared to the conventional Linux page swapping, these two components can increase the read bandwidth by up to about 3.8 times. Application tests on a Google Nexus 4 phone show that MARS reduces the launching time of applications by 50 to 80 percent. The modified page swapping mechanism can outperform the conventional Linux page swapping by up to four times.
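    A toy sketch of the page slot allocation idea (an illustrative simplification, not the MARS kernel code): giving each application a contiguous extent of swap slots turns its later swap-ins into sequential reads.
```python
class SwapArea:
    """Toy swap-slot allocator that keeps each app's slots contiguous,
    so relaunching the app produces sequential reads."""

    def __init__(self, num_slots, extent=64):
        self.extent = extent                            # slots reserved per extent
        self.free = list(range(0, num_slots, extent))   # start offsets of free extents
        self.extents = {}                               # app -> (start, slots used)

    def allocate(self, app):
        start, used = self.extents.get(app, (None, self.extent))
        if used == self.extent:                         # need a fresh contiguous extent
            start, used = self.free.pop(0), 0
        slot = start + used
        self.extents[app] = (start, used + 1)
        return slot

swap = SwapArea(num_slots=4096)
print([swap.allocate("maps") for _ in range(3)])      # [0, 1, 2], contiguous
print([swap.allocate("browser") for _ in range(2)])   # [64, 65], its own extent
```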
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 60
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: With the rapid proliferation of RFID technologies, RFID has been introduced into applications such as inventory and sampling inspection. Conventionally, in RFID systems, the reader usually identifies all the RFID tags in the interrogation region with the maximum power. However, some applications may only need to identify the tags in a specified area, which is usually smaller than the reader’s default interrogation region. An example could be identifying the tags in a box, while ignoring the tags out of the box. In this paper, we present two solutions to identify the tags in the specified area. The principle of the solutions can be compared to the picture-taking process of an auto-focus camera, which first focuses on the target automatically and then takes the picture. Similarly, our solutions first focus on the specified area and then shoot the tags. The design of the two solutions is based on an extensive empirical study of RFID tags. Realistic experiment results show that our solutions can reduce the execution time by 44 percent compared to the baseline solution, which identifies the tags with maximum power. Furthermore, we improve the proposed solutions to make them work well in more complex environments.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 61
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: We show that no existing continuous-time, binary value-domain model for digital circuits is able to correctly capture glitch propagation. Prominent examples of such models are based on pure delay channels (P), inertial delay channels (I), or the elaborate Delay Degradation Model (DDM) channels proposed by Bellido-Díaz et al. We accomplish our goal by considering the border between solvability and non-solvability of a simple problem called short-pulse filtration (SPF), which is closely related to arbitration and synchronization. On one hand, we prove that SPF is solvable in bounded time in any such model that provides channels with non constant delay, like I and DDM. This is in opposition to the impossibility of solving bounded SPF in real (physical) circuit models. On the other hand, for binary circuit models with constant-delay channels, we prove that SPF cannot be solved even in unbounded time; again in opposition to physical circuit models. Consequently, indeed none of the binary value-domain models proposed so far (and that we are aware of) faithfully captures glitch propagation of real circuits. We finally show that these modeling mismatches do not hold for the weaker eventual SPF problem.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 62
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Spin-transfer torque RAM (STT-RAM) has emerged as an energy-efficient and high-density alternative to SRAM for large on-chip caches. However, its high write energy has been considered a serious drawback. Hybrid caches mitigate this problem by incorporating a small SRAM cache for write-intensive data along with an STT-RAM cache. In such architectures, choosing the cache blocks to be placed into the SRAM cache is the key to their energy efficiency. This paper proposes a new hybrid cache architecture called the prediction hybrid cache. The key idea is to predict the write intensity of cache blocks at the time of cache misses and determine block placement based on the prediction. We design a write intensity predictor that realizes this idea by exploiting a correlation between the write intensity of blocks and the memory access instructions that incur the cache misses of those blocks. It includes a mechanism to dynamically adapt the predictor to application characteristics. We also design a hybrid cache architecture in which write-intensive blocks identified by the predictor are placed into the SRAM region. Evaluations show that our scheme reduces the energy consumption of hybrid caches by 28 percent (31 percent) on average compared to an existing hybrid cache architecture in a single-core (quad-core) system.
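    A minimal sketch of a PC-indexed write intensity predictor in the spirit described above (table size, counter width, and threshold are illustrative assumptions, not the paper's parameters):
```python
class WriteIntensityPredictor:
    """PC-indexed saturating-counter predictor (toy model). Blocks predicted
    write-intensive are placed in SRAM, the others in STT-RAM."""

    def __init__(self, entries=1024, threshold=2, counter_max=3):
        self.entries = entries
        self.threshold = threshold
        self.counter_max = counter_max
        self.table = [0] * entries

    def _index(self, pc):
        return hash(pc) % self.entries

    def predict(self, miss_pc):
        return 'SRAM' if self.table[self._index(miss_pc)] >= self.threshold else 'STT-RAM'

    def train(self, miss_pc, was_write_intensive):
        i = self._index(miss_pc)
        if was_write_intensive:
            self.table[i] = min(self.counter_max, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

predictor = WriteIntensityPredictor()
predictor.train(0x400abc, True)
predictor.train(0x400abc, True)
print(predictor.predict(0x400abc))   # SRAM
print(predictor.predict(0x400f00))   # STT-RAM
```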
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 63
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: In online social networks (OSNs), to evaluate trust from one user to another indirectly connected user, the trust evidence in the trusted paths (i.e., paths built through intermediate trustful users) should be carefully treated. Some paths may overlap with each other, leading to a unique challenge of path dependence , i.e., how to aggregate the trust values of multiple dependent trusted paths. OSNs bear the characteristic of high clustering, which makes the path dependence phenomenon common. Another challenge is trust decay through propagation, i.e., how to propagate trust along a trusted path, considering the possible decay in each node. We analyze the similarity between trust propagation and network flow, and convert a trust evaluation task with path dependence and trust decay into a generalized network flow problem. We propose a modified flow-based trust evaluation scheme GFTrust , in which we address path dependence using network flow, and model trust decay with the leakage associated with each node. Experimental results, with the real social network data sets of Epinions and Advogato, demonstrate that GFTrust can predict trust in OSNs with a high accuracy, and verify its preferable properties.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 64
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Indexing key-value data on persistent storage is an important factor for NoSQL databases. Most key-value storage engines use tree-like structures for data indexing, but their performance and space overhead rapidly get worse as the key length becomes longer. This also affects the merge or compaction cost which is critical to the overall throughput. In this paper, we present ForestDB, a key-value storage engine for a single node of large-scale NoSQL databases. ForestDB uses a new hybrid indexing scheme called HB+-trie, which is a disk-based trie-like structure combined with B+-trees. It allows for efficient indexing and retrieval of arbitrary length string keys with relatively low disk accesses over tree-like structures, even though the keys are very long and randomly distributed in the key space. Our evaluation results show that ForestDB significantly outperforms the current key-value storage engine of Couchbase Server [1], LevelDB [2], and RocksDB [3], in terms of both the number of operations per second and the amount of disk writes per update operation.
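    A toy illustration of the hybrid trie idea (greatly simplified; each trie node here is a plain dictionary standing in for a B+-tree, and the chunk size is an arbitrary assumption): long keys are consumed in fixed-size chunks, so each level only indexes a short chunk rather than the whole key.
```python
CHUNK = 4   # bytes of the key consumed per trie level (arbitrary choice)

class Node:
    def __init__(self):
        self.children = {}   # key chunk -> Node  (stand-in for a B+-tree node)
        self.values = {}     # key chunk -> value (terminal entries)

class ChunkTrie:
    """Toy trie over fixed-size key chunks: each level indexes only one
    short chunk of the key instead of the whole key."""

    def __init__(self):
        self.root = Node()

    def insert(self, key: bytes, value):
        node = self.root
        while len(key) > CHUNK:
            node = node.children.setdefault(key[:CHUNK], Node())
            key = key[CHUNK:]
        node.values[key] = value

    def get(self, key: bytes):
        node = self.root
        while len(key) > CHUNK:
            node = node.children.get(key[:CHUNK])
            if node is None:
                return None
            key = key[CHUNK:]
        return node.values.get(key)

trie = ChunkTrie()
trie.insert(b"user:0001:name", "alice")
print(trie.get(b"user:0001:name"))   # alice
```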
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 65
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: Parallelization is an efficient approach to accelerate multi-core, multi-processor and cluster architecture simulators. Nevertheless, frequent synchronization can significantly hinder the performance of a parallel simulator. A common practice in alleviating synchronization cost is to relax synchronization using lengthened synchronous steps. However, as a side effect, simulation accuracy deteriorates considerably. Through analyzing various factors contributing to the causality error in lax synchronization, we observe that a coherent speed across all nodes is critical to achieve high accuracy. To this end, we propose wall-clock based synchronization (WBSP), a novel mechanism that uses wall-clock time to maintain a coherent running speed across the different nodes by periodically synchronizing simulated clocks with the wall clock within each lax step. Our proposed method only results in a modest precision loss while achieving performance close to lax synchronization. We implement WBSP in a many-core parallel simulator and a cluster parallel simulator. Experimental results show that at a scale of 32-host threads, it improves the performance of the many-core simulator by 4.3× on average with less than a 5.5 percent accuracy loss compared to the conservative mechanism. On the cluster simulator with 64 nodes, our proposed scheme achieves an 8.3× speedup compared to the conservative mechanism while yielding only a 1.7 percent accuracy loss. Meanwhile, WBSP outperforms the recent proposed adaptive mechanism on simulations that exhibit heavy traffic.
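    A minimal sketch of the wall-clock throttling idea (the slowdown factor and per-step granularity are illustrative assumptions): a simulation thread that has run ahead of the wall clock, scaled by a target slowdown, sleeps off the difference so all threads progress at a coherent speed.
```python
import time

def throttle_to_wall_clock(sim_time_ns, start_wall, slowdown=1000.0):
    """If this thread's simulated time is ahead of the wall clock scaled by
    the target slowdown, sleep off the difference so all threads keep a
    coherent speed (toy version of wall-clock based synchronization)."""
    target_wall = start_wall + sim_time_ns * 1e-9 * slowdown
    ahead = target_wall - time.time()
    if ahead > 0:
        time.sleep(ahead)

start = time.time()
sim_ns = 0
for _ in range(5):
    sim_ns += 200_000                      # simulate 200 microseconds of target time
    throttle_to_wall_clock(sim_ns, start)  # slowdown 1000x -> about 1 s of wall time total
print(f"wall elapsed ~{time.time() - start:.2f} s for {sim_ns} simulated ns")
```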
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 66
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-02-09
    Description: With increasing transistor density on a single chip, processor design in the nanoscale era is hitting power and frequency walls. Due to these challenges, processors not only need to run fast, but must also remain cool and consume less energy. At this juncture, where no further improvement in clock frequency is possible, data dependent latching through timing speculation provides a silver lining. In this paper, we (a) present a novel power level switching mechanism using reliable and aggressive designs that support overclocking; using the proposed framework, we achieve a 40 percent speed-up and observe 75 percent energy-delay squared product (ED^2) savings relative to the base architecture; (b) showcase the loss of efficiency in current chip multiprocessor systems due to excess power supplied, and propose utilization-aware task scheduling (UTS), a power management scheme that increases the energy efficiency of chip multiprocessors; and (c) demonstrate that UTS, along with aggressive timing speculation, extracts ample performance from the system without loss of efficiency, and without breaching power and thermal constraints. We demonstrate that UTS improves performance by 12 percent due to aggressive power level switching and yields over 50 percent ED^2 savings compared to traditional power management techniques.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 67
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: The papers in this special section focus on next generation urbanization that incorporates smart city development. The development of smart cities is viewed as the key to the next generation urbanization process for improving the efficiency, reliability, and security of a traditional city. The concept of smart city includes various aspects such as environmental sustainability, social sustainability, regional competitiveness,natural resources management, cybersecurity, and quality of life improvement. With the massive deployment of networked smart devices/sensors, an unprecedentedly large amount of sensory data can be collected and processed by advanced computing paradigms, which are the enabling techniques for smart city. For example, given historical environmental, population, and economic information, salient modeling and analytics are needed to simulate the impact of potential city planning strategies, which will be critical for intelligent decision-making.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 68
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: Buildings account for nearly 40 percent of the total energy consumption in the United States. As a critical step toward smart cities, it is essential to intelligently manage and coordinate the building operations to improve the efficiency and reliability of overall energy system. With the advent of smart meters and two-way communication systems, various energy consumptions from smart buildings can now be coordinated across the smart grid together with other energy loads and power plants. In this paper, we propose a comprehensive framework to integrate the operations of smart buildings into the energy scheduling of bulk power system through proactive building demand participation . This new scheme enables buildings to proactively express and communicate their energy consumption preferences to smart grid operators rather than passively receive and react to market signals and instructions such as time varying electricity prices. The proposed scheme is implemented in a simulation environment. The experiment results show that the proactive demand response scheme can achieve up to 10 percent system generation cost reduction and 20 percent building operation cost reduction compared with passive demand response scheme. The results also demonstrate that the system cost savings increase significantly with more flexible load installed and higher percentage of proactive customers participation level in the power network.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 69
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: To guarantee reliability, bulk data dissemination relies on a negotiation scheme in which senders and receivers negotiate the transmission schedule through a three-way handshake procedure. However, we find that negotiation incurs a long dissemination time and seriously delays network-wide convergence. On the other hand, the flooding approach, which is conventionally considered inefficient and energy-consuming, can facilitate bulk data dissemination if appropriately incorporated. This motivates us to pursue a delicate tradeoff between negotiation and flooding in bulk data dissemination. We propose SurF (Survival of the Fittest), a bulk data dissemination protocol which adaptively adopts negotiation and leverages flooding opportunistically. SurF incorporates a time-reliability model to estimate the time efficiencies (flooding versus negotiation) and dynamically selects the fittest one to facilitate the dissemination process. We implement SurF in TinyOS 2.1.1 and evaluate its performance with 40 TelosB nodes. The results show that SurF, while retaining the dissemination reliability, reduces the dissemination time by 40 percent on average, compared with state-of-the-art protocols.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 70
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: With the proliferation of smart phones and an increasing number of services provisioned by clouds, it is commonplace for users to request cloud services from their mobile devices. Accessing services directly from Internet data centers inherently incurs high latency due to long RTTs and possible congestion in the WAN. To lower the latency, some researchers propose to ‘cache’ the services at edge clouds or smart routers in the access network, which are closer to end users than the Internet cloud. Although ‘caching’ is a promising technique, placing the services and dispatching users’ requests in a way that can minimize the users’ access delay and the service providers’ cost has not been addressed so far. In this paper, we study the joint optimization of service placement and load dispatching in mobile cloud systems. We show this problem differs from both the traditional caching problem in mobile networks and the content distribution problem in content distribution networks. We develop a set of efficient algorithms for service providers to achieve various trade-offs between the average latency of mobile users’ requests and the cost of service providers. Our solution utilizes users’ mobility patterns and service access patterns to predict the distribution of users’ future requests, and then adapts the service placement and load dispatching online based on the prediction. We conduct extensive trace-driven simulations. The results show our solution not only achieves much lower latency than directly accessing services from remote clouds, but also outperforms other classical benchmark algorithms in terms of latency, cost, and algorithm running time.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 71
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: DDoS attack source traceback is an open and challenging problem. Deterministic packet marking (DPM) is a simple and effective traceback mechanism, but the current DPM based traceback schemes are not practical due to their scalability constraint. We observe that only a limited number of computers and routers are involved in an attack session. Therefore, we only need to mark these involved nodes for traceback purposes, rather than marking every node of the Internet as existing schemes do. Based on this finding, we propose a novel marking on demand (MOD) traceback scheme based on the DPM mechanism. In order to trace back to the involved attack sources, all we need to do is mark the involved ingress routers using the traditional DPM strategy. Similar to existing schemes, we require participating routers to install a traffic monitor. When a monitor notices a surge of suspicious network flows, it will request a unique mark from a globally shared MOD server and mark the suspicious flows with that mark. At the same time, the MOD server records the information of the marks and their related requesting IP addresses. Once a DDoS attack is confirmed, the victim can obtain the attack sources by querying the MOD server with the marks extracted from attack packets. Moreover, we use the marking space in a round-robin style, which essentially addresses the scalability problem of the existing DPM based traceback schemes. We establish a mathematical model for the proposed traceback scheme and thoroughly analyze the system. Theoretical analysis and extensive real-world data experiments demonstrate that the proposed traceback method is feasible and effective.
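    A toy sketch of the marking-on-demand message flow as summarized above (the data structures and the round-robin reuse policy are illustrative simplifications):
```python
class MODServer:
    """Globally shared mark server (toy model of marking on demand)."""

    def __init__(self, mark_space=2**16):
        self.mark_space = mark_space
        self.next_mark = 0
        self.mark_to_router = {}

    def request_mark(self, router_ip):
        mark = self.next_mark
        self.next_mark = (self.next_mark + 1) % self.mark_space   # round-robin reuse
        self.mark_to_router[mark] = router_ip
        return mark

    def resolve(self, marks):
        """Victim-side lookup: marks extracted from attack packets -> routers."""
        return {self.mark_to_router[m] for m in marks if m in self.mark_to_router}

server = MODServer()
mark = server.request_mark("203.0.113.7")    # a monitor sees a suspicious surge
packet = {"payload": b"...", "mark": mark}   # the suspicious flow gets stamped
print(server.resolve({packet["mark"]}))      # {'203.0.113.7'}
```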
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 72
    Publication Date: 2016-04-08
    Description: As technology scales deep into the sub-micron regime, transistors become less reliable. Future systems are widely predicted to suffer from considerable aging and wear-out effects. This ominous threat has urged system designers to develop effective run-time testing methodologies that can monitor and assess the system's health. In this work, we investigate the potential of online software-based functional testing at the granularity of individual microprocessor core components in multi-/many-core systems. While existing techniques monolithically test the entire core, our approach aims to reduce testing time by avoiding the over-testing of under-utilized units. To facilitate fine-grained testing, we introduce DaemonGuard, a framework that enables the real-time observation of individual sub-core modules and performs on-demand selective testing of only the modules that have recently been stressed. Moreover, we investigate the impact of the cache hierarchy on the testing process and we develop a cache-aware selective testing methodology that significantly expedites the execution of memory-intensive test programs. The monitoring and test-initiation process is orchestrated by a transparent, minimally-intrusive, and lightweight operating system process that observes the utilization of individual datapath components at run-time. We perform a series of experiments using a full-system, execution-driven simulation framework running a commodity operating system, real multi-threaded workloads, and test programs. Our results indicate that operating-system-assisted selective testing at the sub-core level leads to substantial savings in testing time and very low impact on system performance. Additionally, the cache-aware testing technique is shown to be very effective in exploiting the memory hierarchy to further minimize the testing time.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 73
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: Mergesort is the most widely known external sorting algorithm, used when the data being sorted do not fit into the available main memory. There have been several attempts to improve mergesort by reducing I/O time, since mergesort is I/O intensive. However, these methods assumed that mergesort runs on hard disk drives (HDDs). Flash-based solid state drives (SSDs) are emerging as next-generation storage devices and becoming alternatives to HDDs. SSDs outperform HDDs in access latency because they have no physical arms to move. In addition, SSDs benefit from their inner structure by exploiting internal parallelism, resulting in high I/O bandwidth. Previous methods for improving mergesort focused on reducing random access cost, which is insignificant on SSDs. In this paper we propose an external mergesort algorithm for SSDs called FMsort. FMsort calculates a block read order, i.e., the order in which blocks are needed in the merge phase. Using this order, the blocks required during the merge phase are read into main memory via multiple asynchronous I/Os. Our experiments show that FMsort outperforms other mergesort algorithms, at a negligible cost of calculating the block read order.
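    The block-read-order idea can be illustrated with a small sketch: in a k-way merge of sorted runs, each block is first needed when the merge output reaches its leading key, so the blocks are consumed in nondecreasing order of their first keys. The following Python sketch computes such an order from per-block leading keys; it is only an illustration of the concept, not FMsort's implementation, and the asynchronous prefetching itself is omitted.

```python
# Sketch of the block-read-order idea (not the paper's exact implementation):
# during a k-way merge of sorted runs, a block is first needed when the merge
# output reaches that block's leading key, so blocks are consumed in
# nondecreasing order of their first keys. Knowing this order in advance lets
# the merge phase prefetch the next blocks with asynchronous reads.

def block_read_order(runs):
    """runs: list of runs, each a list of blocks, each block a sorted list of keys.
    Returns (run_id, block_id) pairs in the order the merge will need them."""
    entries = []
    for r, run in enumerate(runs):
        for b, block in enumerate(run):
            entries.append((block[0], r, b))      # key the block by its first element
    entries.sort()                                 # sort blocks by leading key
    return [(r, b) for _, r, b in entries]

# Example with two sorted runs split into blocks of two keys each.
runs = [[[1, 4], [9, 12]],
        [[2, 3], [5, 10]]]
print(block_read_order(runs))   # -> [(0, 0), (1, 0), (1, 1), (0, 1)]
```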
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 74
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: Sharing resources in hierarchical real-time systems implemented with reservation servers requires the adoption of special budget management protocols that preserve the bandwidth allocated to a specific component. In addition, blocking times must be accurately estimated to guarantee both the global feasibility of all the servers and the local schedulability of applications running on each component. This paper presents two new local schedulability tests to verify the schedulability of real-time applications running on reservation servers under fixed-priority and EDF local schedulers. Reservation servers are implemented with the BROE algorithm. A simple extension to the SRP protocol is also proposed to reduce the blocking time of the server when accessing global resources shared among components. The performance of the new schedulability tests is compared with other solutions proposed in the literature, showing the effectiveness of the proposed improvements. Finally, an implementation of the main protocols on a lightweight RTOS is described, highlighting the main practical issues that have been encountered.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 75
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: Maintaining timeliness and data freshness for real-time data objects has long been recognized as an important problem in real-time database research. Despite years of active research, most of the past work focuses on uniprocessor systems. In this paper, we study the workload-aware temporal consistency maintenance problem upon multiprocessor platforms. We consider the problem of how to partition a set of update transactions to $m \ge 2$ processors to maintain the temporal consistency of real-time data objects under both earliest deadline first (EDF) and deadline monotonic (DM) scheduling in each processor, while minimizing the total workload on the $m$ processors. Firstly, we consider only the feasibility aspect of the problem by proposing two polynomial-time partitioning schemes, Temporal Consistency Partitioning under EDF ($\mathsf{TCP}_{\mathsf{EDF}}$) and Temporal Consistency Partitioning under DM ($\mathsf{TCP}_{\mathsf{DM}}$), and formally showing that the resource augmentation bounds of both $\mathsf{TCP}_{\mathsf{EDF}}$ and $\mathsf{TCP}_{\mathsf{DM}}$ are $(3 - \frac{1}{m})$. Secondly, we address the partition problem globally by proposing a polynomial-time heuristic, Density factor Balancing Fit ($\mathsf{DBF}$), in which density factor balancing plays a major role in producing workload-efficient partitionings. Finally, we experimentally evaluate the feasibility and workload performance of $\mathsf{DBF}$ versus other heuristics of comparable quality.
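    As a rough illustration of density-factor balancing, the sketch below partitions update transactions onto processors by always placing the next-largest density on the least-loaded processor. The density formula ($2C_i/V_i$, i.e., the common half-half period/deadline assignment derived from the WCET $C_i$ and validity interval $V_i$) and the omission of the EDF/DM feasibility checks are assumptions made for illustration; the paper's DBF heuristic is more refined.

```python
# A sketch of density-factor balancing for partitioning update transactions onto
# m processors. Assumptions (not taken from the paper): each transaction i has a
# WCET C_i and a data validity interval V_i; under the common half-half scheme
# its period and relative deadline are V_i / 2, giving a density factor of
# 2 * C_i / V_i. The heuristic assigns transactions, largest density first, to
# the currently least-loaded processor, balancing density across processors.
# Per-processor EDF/DM feasibility checks are omitted.

def density_balancing_partition(transactions, m):
    """transactions: list of (C, V) pairs; m: number of processors.
    Returns a list of m buckets of transaction indices."""
    densities = sorted(
        ((2.0 * c / v, idx) for idx, (c, v) in enumerate(transactions)),
        reverse=True)                                   # largest density first
    load = [0.0] * m
    buckets = [[] for _ in range(m)]
    for d, idx in densities:
        target = min(range(m), key=lambda p: load[p])   # least-loaded processor
        load[target] += d
        buckets[target].append(idx)
    return buckets

# Example: four update transactions (C, V) partitioned onto two processors.
print(density_balancing_partition([(1, 10), (2, 8), (1, 5), (3, 30)], 2))
```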
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 76
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: As the power consumption of the Internet has been growing quickly in recent years, saving energy has become an important problem in networking research, for which the most promising solution is to find minimum-power network subsets and shut down the other, unnecessary network devices and links while satisfying changing traffic loads. However, in traditional networks it is difficult to implement a coordinated strategy among the network devices due to their distributed network control. On the other hand, the new networking paradigm, software defined networking (SDN), provides an efficient way of having a centralized controller with a global network view control the power states. As an emerging technology, SDNs usually coexist with traditional networks at present. Therefore, we need to investigate how to save energy in partially deployed SDNs. In this paper, we formulate the optimization problem of finding minimum-power network subsets in partially deployed SDNs. After proving the problem is NP-hard, we propose a heuristic solution to approach its exact solution. Through extensive simulations, we demonstrate that our heuristic algorithm performs well: in a full SDN it saves about 50 percent of total power consumption on average, coming within 5 percent of the exact solution's power consumption. It also achieves good performance in partially deployed SDNs, saving about 40 percent of total power consumption on average when about 60 percent of the nodes are SDN nodes. Meanwhile, it runs significantly faster than a general linear solver for this problem, reducing the computation time for networks containing hundreds of nodes by at least 100×.
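    A crude sketch of the minimum-power-subset idea is shown below: greedily try to switch off the least-utilized links and keep them off only while the topology remains connected. Connectivity is used here as a stand-in for the real constraint of routing all traffic within link capacities, and the SDN/legacy node distinction is ignored, so this only illustrates the flavor of such a heuristic, not the authors' algorithm.

```python
# A greedy sketch of the "minimum-power subset" idea: repeatedly try to switch
# off the lowest-utilized link, and keep it off only if the remaining topology
# still connects all nodes (a crude stand-in for "all traffic demands can still
# be routed"). This is an illustrative simplification, not the paper's heuristic
# for partially deployed SDNs, which also respects link capacities and the mix
# of SDN and legacy nodes.
import networkx as nx

def greedy_sleep_links(graph, utilization):
    """graph: networkx Graph; utilization: dict mapping (u, v) edges to load.
    Returns the set of links that can be put to sleep."""
    sleeping = set()
    for (u, v) in sorted(graph.edges(), key=lambda e: utilization.get(e, 0.0)):
        graph.remove_edge(u, v)
        if nx.is_connected(graph):
            sleeping.add((u, v))          # safe to power down this link
        else:
            graph.add_edge(u, v)          # needed for connectivity, restore it
    return sleeping

G = nx.cycle_graph(4)                     # a 4-node ring: one link is redundant
print(greedy_sleep_links(G, {e: 0.1 for e in G.edges()}))   # -> {(0, 1)}
```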
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 77
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: In a smart city, all kinds of user data are stored in electronic devices to make everything intelligent. The smartphone is the most widely used electronic device and is the pivot of all smart systems. However, current smartphones are not competent to manage users' sensitive data, and they are exposed to privacy leakage caused by data over-collection. Data over-collection, in which smartphone apps collect more of a user's data than their original function requires while staying within their permission scope, is rapidly becoming one of the most serious potential security hazards in a smart city. In this paper, we study the current state of data over-collection and examine the most frequent cases of over-collected data. We present a mobile-cloud framework as an active approach to eradicating data over-collection. By putting all users' data into a cloud, the security of users' data can be greatly improved. We have conducted extensive experiments, and the results demonstrate the effectiveness of our approach.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 78
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: We consider the crucial problem of maximizing the system-wide performance of a participatory sensing system with cooperative smartphones, taking into account request processing throughput, smartphone user experience, and system stability. Three important controls need to be made: 1) request admission control, 2) task allocation, and 3) task scheduling on smartphones. It is highly challenging to achieve optimal system-wide performance, given arbitrary unknown arrivals of sensing requests, the intrinsic tradeoff between request processing throughput and smartphone user experience degradation, and heterogeneous requests. Little existing work has studied this crucial problem of maximizing the system-wide performance of a participatory sensing system as a whole. In response to these challenges, we propose an optimal online control approach to maximize the system-wide performance of a participatory sensing system. Exploiting stochastic Lyapunov optimization techniques, it derives the optimal online control strategies for request admission control, task allocation, and task scheduling on smartphones. The most salient feature of our approach is that the achieved system-wide performance is arbitrarily close to the optimum, despite unpredictable and arbitrary request arrivals. Rigorous theoretical analysis and comprehensive simulation evaluation jointly demonstrate the efficacy of our online control approach.
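    The kind of online control the paper builds on can be sketched with a generic Lyapunov drift-plus-penalty rule: admit an arriving request only when its utility, scaled by a control parameter V, outweighs the current backlog. The toy simulation below (arrival rate, service rate, and utility model are all assumptions) only illustrates the throughput-versus-stability tradeoff governed by V, not the authors' controller.

```python
# A generic drift-plus-penalty sketch of Lyapunov-based online admission control,
# the class of technique the paper builds on (this is not the authors' exact
# controller). Each slot, the controller admits an arriving sensing request only
# if its utility, scaled by the control parameter V, outweighs the backlog it
# would add; larger V favors throughput, smaller V favors queue stability.
import random

def run_admission(slots=10_000, V=50.0, service=0.6):
    queue = 0.0                       # backlog of admitted-but-unserved requests
    admitted = 0
    for _ in range(slots):
        arrival = random.random() < 0.8                       # does a request arrive?
        utility = random.uniform(0.0, 1.0)                    # its value if admitted
        if arrival and V * utility >= queue:                  # drift-plus-penalty rule
            queue += 1.0
            admitted += 1
        queue = max(queue - service, 0.0)                     # smartphones serve tasks
    return admitted / slots, queue

rate, backlog = run_admission()
print(f"admission rate ~ {rate:.2f}, final backlog ~ {backlog:.1f}")
```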
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 79
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: Audio represents one of the most appealing yet least exploited modalities in wireless sensor networks, due to the potentially extremely large data volumes and limited wireless capacity. Therefore, how to effectively collect audio sensing information remains a challenging problem. In this paper, we propose a new paradigm of audio information collection based on the concept of audio-on-demand. We consider a sink-free environment targeted at disaster management, where audio chunks are stored inside the network for retrieval. The difficulty is to guarantee a high search success rate without infrastructure support. To solve the problem, we design a novel replication algorithm that deploys an optimal number of $O(\sqrt{n})$ replicas across the sensor network. We prove the optimality of the energy consumption of the algorithm. We implement a sink-free audio-on-demand (SAoD) WSN system and conduct extensive simulations to evaluate the performance and efficiency of our design. The experimental results show that our design can provide satisfactory audio-on-demand service quality with short startup latency and slight playback jitter. Extensive simulation results show that this design achieves a search success rate of 98 percent while reducing the search energy consumption by an order of magnitude compared with existing schemes.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 80
    Publication Date: 2016-04-08
    Description: One of the solutions to improve mobile Delay Tolerant Network (DTN) performance is to place additional stationary nodes, called throwboxes, to create a greater number of contact opportunities. In this paper, we study a key optimization problem in a time-evolving throwbox-assisted DTN, throwbox selection, to answer questions such as “how many active throwboxes do I need?” and “which throwboxes should be activated?” We formally define two throwbox optimization problems, the min-throwbox problem and the k-throwbox problem, for time-evolving DTNs modeled by weighted space-time graphs. We show that the min-throwbox problem is NP-hard and propose a set of greedy algorithms that efficiently provide quality solutions for both challenging problems.
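    A standard greedy for coverage-style selection problems gives a feel for the k-throwbox problem: model each candidate throwbox by the set of contact opportunities it would enable and repeatedly pick the candidate with the largest marginal gain. The sketch below is illustrative only; the paper's algorithms operate on weighted space-time graphs and may use a different gain function.

```python
# A greedy sketch for k-throwbox selection: each candidate throwbox is modeled
# by the set of contact opportunities (here, (node, node, time) triples) it
# would enable, and the candidate with the largest marginal gain is picked
# repeatedly. This is the standard greedy for coverage-style objectives and is
# only meant to illustrate the approach, not the paper's exact algorithm.

def greedy_throwboxes(candidates, k):
    """candidates: dict box_id -> set of contact opportunities it enables."""
    chosen, covered = [], set()
    for _ in range(min(k, len(candidates))):
        best = max(candidates, key=lambda b: len(candidates[b] - covered))
        gain = candidates[best] - covered
        if not gain:
            break                      # no remaining candidate adds new contacts
        chosen.append(best)
        covered |= gain
    return chosen, covered

boxes = {
    "T1": {("a", "b", 1), ("a", "c", 2)},
    "T2": {("a", "c", 2), ("b", "d", 3)},
    "T3": {("b", "d", 3)},
}
print(greedy_throwboxes(boxes, k=2))   # picks T1 first, then T2
```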
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 81
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: Chip multiprocessors (CMPs) require a low-latency interconnect fabric network-on-chip (NoC) to minimize processor stall time on instruction and data accesses that are serviced by the last-level cache (LLC). While packet-switched mesh interconnects sacrifice performance of many-core processors due to NoC-induced delays, existing circuit-switched interconnects do not offer lower network delays as they cannot hide the time it takes to set up a circuit. To address this problem, this work introduces CIMA—a hybrid circuit-switched and packet-switched mesh-based interconnection network that affords low LLC access delays at a small area cost. CIMA uses virtual cut-through (VCT) switching for short request packets, and benefits from circuit switching for longer, delay sensitive response packets. While a request is being served by the LLC, CIMA attempts to set up a circuit for the corresponding response packet. By the time the request packet is served and the response gets ready, a circuit has already been prepared, and as a result, the response packet experiences short delay in the network. A detailed evaluation targeting a 64-core CMP running scale-out workloads reveals that CIMA improves system performance by 21 percent over the state-of-the-art hybrid circuit-packet-switched network.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 82
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: The Hierarchical Real-Time Scheduling (HiRTS) technique helps improve overall resource utilization in real-time embedded systems. With HiRTS, a computation resource is divided into a group of temporal resource partitions, each of which accommodates multiple real-time tasks. Besides the computation resource partitioning problem, real-time task scheduling on resource partitions is also a major problem of HiRTS. The existing scheduling techniques for dedicated resources, like schedulability tests and utilization bounds, are unable to work without changes on temporal resource partitions in most cases. In this paper, we show how to achieve maximal transparency for task scheduling on Regular Partitions, a type of resource partition introduced by the Regularity-based Resource Partition (RRP) Model. We show that several classes of real-time scheduling problems on a regular partition can be transformed into equivalent problems on a dedicated single resource, such that comprehensive single-resource scheduling techniques provide optimal solutions. Furthermore, this transformation method could be applied to different types of real-time tasks such as periodic tasks, sporadic tasks and aperiodic tasks.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 83
    Publication Date: 2016-08-09
    Description: Stereo matching, a key element towards extracting depth information from stereo images, is widely used in several embedded consumer electronic and multimedia systems. Such systems demand high processing performance and accurate depth perception, while their deployment in embedded and mobile environments implies that cost, energy and memory overheads need to be minimized. Hardware acceleration has been demonstrated in efficient embedded stereo vision systems. To this end, this paper presents the design and implementation of a hardware-based stereo matching system able to provide high accuracy and concurrently high performance for embedded vision devices, which are associated with limited hardware and power budget. We first implemented a compact and efficient design of the guided image filter, an edge-preserving filter, which reduces the hardware complexity of the implemented stereo algorithm, while at the same time maintains high-quality results. The guided filter design is used in two parts of the stereo matching pipeline, showing that it can simplify the hardware complexity of the Adaptive Support Weight aggregation step, and efficiently enable a powerful disparity refinement unit, which improves matching accuracy, even though cost aggregation is based on simple, fixed support strategies. We implemented several variants of our design on a Kintex-7 FPGA board, which was able to process HD video (1,280 × 720) in real-time (60 fps), using ∼57.5k and ∼71k of the FPGA's logic (CLB) and register resources, respectively. Additionally, the proposed stereo matching design delivers leading accuracy when compared to state-of-the-art hardware implementations based on the Middlebury evaluation metrics (at least 1.5 percent less bad matching pixels).
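    For reference, the grayscale guided image filter that the hardware pipeline accelerates can be written in a few lines of NumPy/SciPy: the output is a locally linear transform of the guidance image, q = a*I + b, with the coefficients estimated per window and then averaged. The radius and regularization eps below are illustrative choices, and this floating-point software version is only meant to clarify the algorithm, not to mirror the fixed-point FPGA design.

```python
# A minimal software reference of the (grayscale) guided image filter used in
# the stereo pipeline: the output is a locally linear transform of the guidance
# image I, q = a*I + b, with a and b estimated per window and then averaged.
# Radius and eps are illustrative; this floating-point version only clarifies
# the algorithm, not the paper's fixed-point FPGA implementation.
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, radius=4, eps=1e-3):
    """I: guidance image, p: image to be filtered (float arrays in [0, 1])."""
    size = 2 * radius + 1

    def box(x):                                 # mean filter over the window
        return uniform_filter(x, size=size)

    mean_I, mean_p = box(I), box(p)
    cov_Ip = box(I * p) - mean_I * mean_p
    var_I = box(I * I) - mean_I * mean_I
    a = cov_Ip / (var_I + eps)                  # per-pixel linear coefficients
    b = mean_p - a * mean_I
    return box(a) * I + box(b)                  # average coefficients, apply to I

img = np.random.rand(64, 64)
print(guided_filter(img, img).shape)            # (64, 64); edge-preserving smoothing
```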
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 84
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: Simulation speed is crucial in virtual platforms, in order to enhance the design flow with early validation and design space exploration. This work tackles this challenge by focusing on two main techniques for speeding up virtual platform simulation, namely an efficient implementation of data types and a novel scheduling technique. Both optimizations are obtained through code manipulation. The target language is C++ and its extensions (i.e., SystemC), which are the most widespread languages for virtual platform modeling and simulation. The optimization techniques are orthogonal, as they target different aspects of the simulated code. Experimental results prove the effectiveness both of the individual techniques and of their combined application on complex case studies, reaching a maximum speedup of 70$\times$ in the simulation of a virtual platform.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 85
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: Although most current cloud providers, such as Amazon Web Services (AWS) and Microsoft Azure, offer different types of computing instances with different capacities, cloud users tend to hire a cluster of instances of a particular type to ensure performance predictability for their applications. Nowadays, many large-scale applications, including big data analytics applications, feature workload patterns with heterogeneous resource demands, for which accounting for the heterogeneity of the allocated virtual cloud instances would be highly advantageous to application performance. However, performance predictability has always been an issue in such clusters of heterogeneous resources. In particular, how to precisely decide which instances of which types to include in a cluster, such that the desired performance is attained, remains an open question. To this end, we devise a resource allocation mechanism formulated as a mixed-integer programming optimization model. Our resource allocation mechanism incorporates predictable average performance as a unified performance metric, which captures two key performance-related issues: (a) performance variation within same-type instances, and (b) correlations of performance variability across different types. Our experimental results demonstrate that target performance is predictable and attainable for clusters of heterogeneous resources. Our mechanism constructs clusters whose performance is within 95 percent of the performance of optimal ones, hence deadlines are always met. By reoptimization, our mechanism can react to performance mispredictions and support autoscaling for varying workloads. We experimentally verify our findings using clusters on Amazon EC2 with MapReduce workloads, as well as on a private cloud, and we conduct comparison experiments with a recent resource allocation approach from the literature.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 86
    Publication Date: 2016-08-09
    Description: A binary de Bruijn sequence of order $n$ is a sequence of zeros and ones of period $2^n$ that contains every binary $n$-tuple exactly once in a period of the sequence. A composited construction of a de Bruijn sequence is a construction of a nonlinear feedback shift register (NLFSR) that generates a de Bruijn sequence, where the composited feedback function of the NLFSR is the sum of a feedback function with $k$th order composition and a sum of $(k+1)$ product-of-sum terms. The goals of this article are to perform a profound analysis of composited de Bruijn sequences for use in cryptography and to find an efficient implementation of the composited feedback function. We first determine the lower bound of the linear complexity of a composited de Bruijn sequence and then conduct a profound analysis of the composited construction by introducing the notion of the higher order $D$-morphic preimages of a binary sequence. Our analysis aims at the reconstruction of a composited de Bruijn sequence from a segment, known as $k$th order $D$-morphic order $n$ de Bruijn preimages ($(n,k)$-DMDPs) of length $(2^n+k)$, and $k$th order $D$-morphic order $n$ $m$-sequence preimages ($(n,k)$-DMMPs).
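    To make the object concrete, the sketch below generates a plain binary de Bruijn sequence of order n with the classic FKM (Lyndon-word concatenation) algorithm and verifies that every n-tuple appears exactly once per period. This is a standard construction used only for illustration; it is not the composited NLFSR construction analyzed in the paper.

```python
# Generate a binary de Bruijn sequence of order n with the classic FKM
# (Fredricksen-Kessler-Maiorana) algorithm, i.e., concatenation of Lyndon words.
# Included only to make the object concrete; this is not the paper's composited
# NLFSR construction.

def de_bruijn(n, k=2):
    """Return one period of a de Bruijn sequence of order n over a k-symbol alphabet."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])    # append the Lyndon word just built
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence

seq = de_bruijn(4)
cyc = seq + seq[:3]                             # wrap around for cyclic windows
windows = {tuple(cyc[i:i + 4]) for i in range(len(seq))}
print(len(seq), len(windows))                   # 16 16: every 4-tuple appears once
```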
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 87
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: As the number of cores increases, more master components can simultaneously access main memory. In real-time systems, this ongoing trend is leading to crippling pessimism when computing the worst-case cache miss time, since a memory request could potentially contend with requests coming from every other core in the system. CPU-centric scheduling policies, therefore, are no longer sufficient to guarantee schedulability without introducing unacceptable pessimism for memory-intensive task sets. For this reason, we believe a shift is needed towards real-time scheduling approaches that can prevent timing interference from memory contention, while still making efficient use of the multicore platform. Previously, we have demonstrated the practicality of the PREM task model, where each job consists of a sequence of phases, some of which access memory and some of which perform only computation on cached data. In this work, we present the first global memory-centric scheduling policy for memory-intensive task sets whose jobs can be modeled as a sequence of memory-intensive (memory phase) and execution-intensive (execution phase) phases. The proposed policy is parameterizable based on the number of cores which are allowed to concurrently access main memory without saturating it. Building upon results from multicore response-time analysis, we introduce the notion of virtual memory cores as a fundamental technique for performing phase-based response-time analysis for memory-intensive task sets. Finally, we use synthetic task set generation to demonstrate that the proposed scheduling policy and the related schedulability bound do indeed schedule memory-intensive task sets better than state-of-the-art multicore scheduling.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 88
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: Graphics processing units (GPUs) provide massively parallel computational power and encourage the use of general-purpose computing on GPUs (GPGPU). The distinguished design of discrete GPUs helps them provide the high throughput, scalability, and energy efficiency needed for GPGPU applications. Despite previous studies on GPU virtualization, the tradeoffs between the virtualization approaches remain unclear, because of a lack of designs for, or quantitative evaluations of, hypervisor-level virtualization for discrete GPUs. Shedding light on these tradeoffs and the technical requirements for hypervisor-level virtualization would facilitate the development of an appropriate GPU virtualization solution. $\mathsf{GPUvm}$, an open architecture for hypervisor-level GPU virtualization with a particular emphasis on using the Xen hypervisor, is presented in this paper. $\mathsf{GPUvm}$ offers three virtualization modes: full virtualization, naive para-virtualization, and high-performance para-virtualization. $\mathsf{GPUvm}$ exposes low- and high-level interfaces, such as memory-mapped I/O and DRM APIs, to the guest virtual machines (VMs). Our experiments using a relevant commodity GPU showed that $\mathsf{GPUvm}$ incurs different overheads as the level of the exposed interfaces is changed. The results also showed that coarse-grained fairness on the GPU among multiple VMs can be achieved using GPU scheduling.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 89
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: A 3D stacked network-on-chip (NOC) promises the integration of a large number of cores in a many-core system-on-chip (SOC). The NOC can be used to test the embedded cores in such SOCs, whereby the added cost of dedicated test-access hardware can be avoided. However, a potential problem associated with 3D NOC-based test access is the emergence of hotspots due to stacking and the high toggle rates associated with structural test patterns used for manufacturing test. High temperatures and hotspots can lead to the failure of good parts, resulting in yield loss. We describe a unicast-based multicast approach and a thermal-driven test scheduling method to avoid hotspots, whereby the full NOC bandwidth is used to deliver test packets. Test delivery is carried out using a new unicast-based multicast scheme. Experimental results highlight the effectiveness of the proposed method in reducing test time under thermal constraints.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 90
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: Power budgeting is an essential aspect of networks-on-chip (NoCs) for meeting the power constraint on on-chip communication while assuring the best possible overall system performance. For simplicity and ease of implementation, existing NoC power budgeting schemes treat all the individual routers uniformly when allocating power to them. However, such homogeneous power budgeting schemes ignore the fact that the workloads of different NoC routers may vary significantly, and thus may provide excess power to routers with low workloads and insufficient power to those with high workloads. In this paper, we formulate the NoC power budgeting problem in order to optimize network performance under a power budget through per-router frequency scaling. We take into account the heterogeneous workloads across different routers imposed by variations in traffic. Correspondingly, we propose a fine-grained solution using an agile algorithm with low time complexity. The frequency of each router is set individually according to its contribution to the average network latency while meeting the power budget. Experimental results have confirmed that, with fairly low runtime and hardware overhead, the proposed scheme can save up to 50 percent of application execution time when compared with the latest proposed methods.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 91
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: One of the main reasons why Byzantine fault-tolerant (BFT) systems are currently not widely used lies in their high resource consumption: $3f+1$ replicas are required to tolerate only $f$ faults. Recent works have been able to reduce the minimum number of replicas to $2f+1$ by relying on trusted subsystems that prevent a faulty replica from making conflicting statements to other replicas without being detected. Nevertheless, having been designed with the focus on fault handling, during normal-case operation these systems still use more resources than actually necessary to make progress in the absence of faults. This paper presents Resource-efficient Byzantine Fault Tolerance (ReBFT), an approach that minimizes the resource usage of a BFT system during normal-case operation by keeping $f$ replicas in a passive mode. In contrast to active replicas, passive replicas neither participate in the agreement protocol nor execute client requests; instead, they are brought up to speed by verified state updates provided by active replicas. In case of suspected or detected faults, passive replicas are activated in a consistent manner. To underline the flexibility of our approach, we apply ReBFT to two existing BFT systems: PBFT and MinBFT.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 92
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: This paper presents a set of eight novel configurations for the design of single-event soft error (SE) tolerant latches. Each latch uses a three-transistor building block called 1P-2N and its complementary block 2P-1N. It is shown that all proposed latches have better soft error rate (SER) performance than the SE-tolerant latches reported to date. It is also shown that the proposed configurations provide a more relaxed tradeoff between SER and other specifications, mainly delay, power dissipation, and area. An RTL implementation of a proposed latch is also presented to verify its behavior under transient faults. The benefit of implementing an SE-tolerant circuit in VHDL is the ability to exhaustively check the immunity of the circuit against transient faults at every sensitive node by writing simple Boolean expressions for each element in the circuit. The proposed configurations and a few selected previously reported configurations have also been designed, laid out, and extracted post-layout in a 90 nm CMOS logic technology. Post-layout simulations have been performed on all proposed latch configurations with a clock frequency of 500 MHz, and performance comparison results are presented.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 93
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: Large integer multiplication is a major performance bottleneck in fully homomorphic encryption (FHE) schemes over the integers. In this paper two optimised multiplier architectures for large integer multiplication are proposed. The first of these is a low-latency hardware architecture of an integer-FFT multiplier. Secondly, the use of low Hamming weight (LHW) parameters is applied to create a novel hardware architecture for large integer multiplication in integer-based FHE schemes. The proposed architectures are implemented, verified and compared on the Xilinx Virtex-7 FPGA platform. Finally, the proposed implementations are employed to evaluate the large multiplication in the encryption step of FHE over the integers. The analysis shows a speed improvement factor of up to 26.2 for the low-latency design compared to the corresponding original integer-based FHE software implementation. When the proposed LHW architecture is combined with the low-latency integer-FFT accelerator to evaluate a single FHE encryption operation, the performance results show that a speed improvement by a factor of approximately 130 is possible.
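    The textbook algorithm behind an integer-FFT multiplier can be sketched in software: split the operands into small digits, multiply the digit polynomials by FFT-based convolution, then propagate carries. The Python sketch below uses a floating-point FFT with base $2^8$ (8-bit) digits, which is exact only for moderate operand sizes; the paper's hardware architectures and the LHW optimisation are not reproduced here.

```python
# A software sketch of FFT-based large-integer multiplication, the textbook
# algorithm behind an integer-FFT multiplier: split each operand into base-2^8
# digits, multiply the digit polynomials via FFT convolution, then propagate
# carries. Floating-point FFT is exact only up to moderate sizes; real designs
# (and the paper's hardware) use number-theoretic transforms or careful
# fixed-point arithmetic instead.
import numpy as np

def fft_multiply(x, y, base_bits=8):
    base = 1 << base_bits

    def digits(v):                                  # little-endian base-2^8 digits
        out = []
        while v:
            out.append(v & (base - 1))
            v >>= base_bits
        return out or [0]

    dx, dy = digits(x), digits(y)
    size = 1
    while size < len(dx) + len(dy):
        size <<= 1                                  # FFT length: next power of two
    conv = np.fft.irfft(np.fft.rfft(dx, size) * np.fft.rfft(dy, size), size)

    result, carry = 0, 0
    for i, c in enumerate(np.rint(conv).astype(np.int64)):
        total = int(c) + carry                      # carry propagation
        result += (total % base) << (base_bits * i)
        carry = total // base
    return result + (carry << (base_bits * size))

a, b = 123456789123456789, 987654321987654321
print(fft_multiply(a, b) == a * b)                  # True
```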
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 94
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: Novel 4D CORDIC algorithms and hardware architecture for multiplying quaternions are presented, aimed at constant-coefficient multipliers. In our solution, microrotations are multiplications by hypercomplex numbers with only one non-zero imaginary part. Such transformations can be described using sparse matrices and implemented using less hardware resources than the iteration of the known quaternion CORDIC algorithm. Chip area can be saved because sparse microrotations can be computed using a permutation network and two-operand adders, without four-operand additions, which are necessary in the known solution. The obtainable area savings depend on the implementation technology and the architectures of adders and bit shifters but can be as large as 45 and 25 percent for ASIC and FPGA circuits, respectively. The circuitry simplifications come at the cost of slowing down computations: our approach improves the area-delay product, and a single sparse iteration can be executed up to 15 percent faster than the conventional one, but more microrotations are usually necessary to achieve the required accuracy. On the other hand, a sparse 4D iteration can be identified with microrotations of two 2D CORDICs which work in parallel. This makes convergence analysis easier than in the previous 4D solution, as it is sufficient to consider only two dimensions.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 95
    Publication Date: 2016-08-09
    Description: Field-Programmable Gate Arrays (FPGAs) are rapidly gaining popularity as implementation platforms for complex space-borne computing systems. However, such systems are exposed to cosmic radiation with levels orders of magnitude higher than terrestrial levels which can cause transient and even permanent hardware faults in on-board computing platforms. Because of this, development of effective fault mitigation methods and self-repair mechanisms has become a vital aspect for FPGA-based space-borne computing platforms. This work presents a novel method for transient and permanent fault mitigation and run-time fault recovery for commercial-grade FPGA devices with partially reconfigurable tile-based architectures. The proposed method ensures the same pre-determined recovery time for transient and permanent hardware faults through dynamic on-chip component relocation regardless of the fault type. The method makes use of fully distributed control, communication, self-synchronization and self-integration mechanisms embedded in each on-chip hardware component. Run-time collaboration between components provides relocation & fault mitigation procedures. The distributed nature of the above mechanisms excludes most central failure points which could cause non-restorable system faults. This method has been implemented, tested and verified on a Xilinx Kintex-7 FPGA platform. Results show that the proposed method is significantly more resource efficient when compared with Triple-Module Redundancy or central, software-based control mechanisms.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 96
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: Manipulating logic functions via majority operators recently drew the attention of researchers in computer science. For example, circuit optimization based on majority operators enables superior results as compared to traditional synthesis tools. Also, the Boolean satisfiability problem finds new solution approaches when described in terms of majority decisions. To support computer logic applications based on majority, a sound and complete set of axioms is required. Most of the recent advances in majority logic deal only with ternary majority (MAJ-3) operators because the axiomatization with solely MAJ-3 and complementation operators is well understood. However, it is of interest to extend such an axiomatization to $n$-ary majority operators (MAJ-$n$) from both the theoretical and practical perspectives. In this work, we address this issue by introducing a sound and complete axiomatization of MAJ-$n$ logic. Our axiomatization naturally includes the existing MAJ-3 and MAJ-5 axiomatic systems. Based on this general set of axioms, computer applications can now fully exploit the expressive power of majority logic.
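    A small example makes the MAJ-$n$ operator concrete: on Boolean inputs it returns 1 exactly when more than half of its (odd number of) inputs are 1, and classic MAJ-3 identities such as <x x y> = x and the distributive law can be checked exhaustively. The sketch below is only illustrative and does not reproduce the paper's axiomatization.

```python
# A small illustration of the n-ary majority operator MAJ-n on Boolean inputs,
# plus an exhaustive check of two classic MAJ-3 identities (the majority law
# <x x y> = x, and distributivity) over all Boolean assignments. This only
# makes the operator concrete; the paper's MAJ-n axiomatization is more general.
from itertools import product

def maj(*args):
    """MAJ-n for odd n: returns 1 iff more than half of the inputs are 1."""
    assert len(args) % 2 == 1
    return int(sum(args) > len(args) // 2)

print(maj(1, 0, 1), maj(0, 1, 1, 0, 0))          # 1 0

# Majority law: <x x y> = x
print(all(maj(x, x, y) == x for x, y in product((0, 1), repeat=2)))
# Distributivity: <x y <u v z>> = <<x y u> <x y v> z>
print(all(maj(x, y, maj(u, v, z)) == maj(maj(x, y, u), maj(x, y, v), z)
          for x, y, u, v, z in product((0, 1), repeat=5)))
```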
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 97
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: Diagnosis by comparison is a realistic approach to detect faults of multiprocessor systems. This paper considers a pessimistic diagnostic strategy for hypercube-like multiprocessor systems under the comparison model. The pessimistic strategy is a diagnostic process whereby all faulty nodes can be correctly identified and at most one fault-free node may be misjudged as a faulty node. We propose a pessimistic diagnosis algorithm based on the largest component in the faulty system. For a system with $N=2^n$ nodes and $n \ge 5$, when the number of faulty nodes is bounded by $2n-2$, the algorithm can correctly identify all nodes except at most one node left undiagnosed. The time complexity of the algorithm is $O(N \log_2 N)$.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 98
    Publication Date: 2016-08-09
    Description: The market trend of flash memory chips has been toward high density but with low reliability. The rapidly increasing bit error rates and emerging reliability issues of the coming triple-level cell and even three-dimensional flash chips will expose users to extremely high risks for storing data in such low reliability storage media. With these concerns in mind, this paper rethinks the layer design of flash devices and proposes a complete paradigm shift to re-configure physical flash chips of potentially massive parallelism into better “virtual chips”, in order to improve the data recoverability in a modular and low-cost way. The concept of virtual chips is realized by reinforcing the hardware abstraction layer without continually complicating the conventional flash management software of the flash translation layer. The capability and compatibility of the proposed design were verified by both property analysis and a series of experiments with encouraging results.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 99
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: Technology scaling makes reliability a primary concern in Network-on-Chip (NoC) design. We observe that, due to the routing algorithm, some routers age much faster than others, which becomes a bottleneck for NoC lifetime. In this paper, lifetime is modeled as a resource consumed over time. A lifetime budget metric is associated with each router, indicating the maximum allowed workload for the current period. Since the heterogeneity in router lifetime reliability has a strong correlation with the routing algorithm, we define a problem to optimize the lifetime by routing packets along the path with maximum lifetime budgets. The problem is then extended to optimize both performance and lifetime reliability. The lifetime is optimized on a long-term time scale, while performance is optimized on a short-term time scale. Two dynamic-programming-based adaptive routing algorithms (lifetime-aware routing and multi-objective routing) are proposed to solve the problems. In the experiments, the lifetime-aware routing and multi-objective routing algorithms are evaluated with synthetic traffic and real benchmarks, respectively. The experimental results show that lifetime-aware routing improves the minimal lifetime by around 20, 45, and 55 percent compared with XY routing, NoP routing, and Odd-Even routing, respectively. In addition, the multi-objective adaptive routing algorithm can effectively improve both performance and lifetime.
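    The idea of routing on lifetime budgets can be illustrated with a small dynamic program on a 2D mesh: among the minimal east/north paths, pick the one that maximizes the smallest remaining lifetime budget along the way. The sketch below illustrates this max-min objective only; the paper's adaptive routing algorithms and budget model are more elaborate.

```python
# A dynamic-programming sketch of lifetime-budget-aware routing on a 2D mesh:
# among the minimal (monotone east/north) paths from source to destination,
# choose the one that maximizes the smallest remaining lifetime budget of the
# routers it crosses (a max-min objective). Illustration only; not the paper's
# adaptive algorithms or budget model.

def max_min_budget_route(budget, src, dst):
    """budget: 2D list of router lifetime budgets; src/dst: (x, y) with dst >= src."""
    (sx, sy), (dx, dy) = src, dst
    best = {src: budget[sx][sy]}                 # (x, y) -> best bottleneck budget
    choice = {}                                  # (x, y) -> last step taken ("E"/"N")
    for x in range(sx, dx + 1):
        for y in range(sy, dy + 1):
            for px, py, step in ((x - 1, y, "E"), (x, y - 1, "N")):
                if (px, py) in best:
                    cand = min(best[(px, py)], budget[x][y])
                    if cand > best.get((x, y), -1):
                        best[(x, y)], choice[(x, y)] = cand, step
    path, cur = [], dst
    while cur != src:                            # walk the recorded choices back
        path.append(cur)
        x, y = cur
        cur = (x - 1, y) if choice[cur] == "E" else (x, y - 1)
    return best[dst], [src] + path[::-1]

budget = [[9, 2, 7],
          [8, 1, 6],
          [5, 4, 8]]
print(max_min_budget_route(budget, (0, 0), (2, 2)))   # bottleneck 4 via (1,0),(2,0),(2,1)
```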
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 100
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-08-09
    Description: Homomorphic encryption (HE) systems enable computations on encrypted data, without decrypting and without knowledge of the secret key. In this work, we describe an optimized Ring Learning With Errors (RLWE) based implementation of a variant of the HE system recently proposed by Gentry, Sahai and Waters (GSW). Although this system was widely believed to be less efficient than its contemporaries, we demonstrate quite the opposite behavior for a large class of applications. We first highlight and carefully exploit the algebraic features of the system to achieve significant speedup over the state-of-the-art HE implementation, namely the IBM homomorphic encryption library (HElib). We introduce several optimizations on top of our HE implementation, and use the resulting scheme to construct a homomorphic Bayesian spam filter, secure multiple keyword search, and a homomorphic evaluator for binary decision trees. Our results show a factor of $10\times$ improvement in performance (under the same security settings and CPU platforms) compared to IBM HElib for these applications. Our system is built to be easily portable to GPUs (unlike IBM HElib), which results in an additional speedup of up to a factor of $103.5\times$, offering an overall speedup of $1{,}035\times$.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science