ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Articles  (825)
  • 2015-2019  (825)
  • 1945-1949
  • IEEE Transactions on Computers (T-C)  (825)
  • 1288
  • Computer Science  (825)
  • 1
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Digital circuits are expected to increasingly suffer from more hard faults due to technology scaling. Especially, a single hard fault in ALU (Arithmetic Logic Unit) might lead to a total failure in processors or significantly reduce their performance. To address these increasingly important problems, we propose a novel cost-efficient fault-tolerant mechanism for the ALU, called LIZARD. LIZARD employs two half-word ALUs, instead of a single full-word ALU, to perform computations with concurrent fault detection. When a fault is detected, the two ALUs are partitioned into four quarter-word ALUs. After diagnosing and isolating a faulty quarter-word ALU, LIZARD continues its operation using the remaining ones, which can detect and isolate another fault. Even though LIZARD uses narrow ALUs for computations, it adds negligible performance overhead through exploiting predictability of the results in the arithmetic computations. We also present the architectural modifications when employing LIZARD for scalar as well as superscalar processors. Through comparative evaluation, we demonstrate that LIZARD outperforms other competitive fault-tolerant mechanisms in terms of area, energy consumption, performance and reliability.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 2
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Information searches are the most common application within social networks. Normally, the social network is modeled as a network graph, consisting of nodes representing users within the network and edges representing relationships between users (in the rest of the paper, unless otherwise specified, the terms “user” and “node” are used interchangeably). Choosing the appropriate nodes to form an auxiliary structure that supports effective spreading of query messages can reduce troublesome repeated queries. To accomplish this, a hybrid search (HS) scheme is proposed. If the query message is received by a node belonging to the auxiliary structure constructed by dynamic weighted distributed label clustering (DW-DLC), it is flooded to all neighbors of the visited node; otherwise, it is forwarded to one neighbor of the visited node. The DW-DLC based auxiliary structure can accelerate the process of obtaining required information within the network. The simulation results show that the HS+DW-DLC scheme can reduce the average searching delay time, even in a required-information-scarce social network. In addition, the proposed scheme generates a relatively low number of repeated messages, so social network users are asked repeatedly less often.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
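The flood-or-forward rule described in this record can be illustrated with a minimal Python sketch. It is not the authors' HS+DW-DLC implementation: the function name `hybrid_search`, the `ttl` hop limit, the toy graph, and the choice of "first neighbor that is not the sender" as the single forwarding target are all assumptions made for illustration, and the auxiliary set is supplied by hand rather than built by DW-DLC.

```python
from collections import deque

def hybrid_search(graph, auxiliary, source, target, ttl=8):
    """Return (found, messages_sent) for a query started at `source`.

    Nodes in `auxiliary` flood the query to all neighbors except the sender;
    all other nodes forward it to a single neighbor (hypothetical rule: the
    first neighbor that is not the sender).
    """
    queue = deque([(source, None, ttl)])
    messages = 0
    while queue:
        node, sender, hops = queue.popleft()
        if node == target:
            return True, messages
        if hops == 0:
            continue
        candidates = [n for n in graph[node] if n != sender]
        if node not in auxiliary:
            candidates = candidates[:1]  # forward to one neighbor only
        for nxt in candidates:
            queue.append((nxt, node, hops - 1))
            messages += 1
    return False, messages

# Toy network (assumed): "b" is a hub that belongs to the auxiliary structure.
GRAPH = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b"],
    "d": ["b", "e"],
    "e": ["d"],
}
with_aux = hybrid_search(GRAPH, {"b"}, "a", "e")
without_aux = hybrid_search(GRAPH, set(), "a", "e")
```

In this toy graph the query reaches "e" only when the hub "b" floods it; with an empty auxiliary set the single forwarded copy ends in the dead end "c".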
  • 3
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: This paper presents a derivation of four radix-2 division algorithms by digit recurrence. Each division algorithm selects a quotient digit from the over-redundant digit set {−2, −1, 0, 1, 2}, and the selection of each quotient digit depends only on the two most-significant digits of the partial remainder in a redundant representation. Two algorithms use a two’s complement representation for the partial remainder and carry-save additions, and the other two algorithms use a binary signed-digit representation for the partial remainder and carry-free additions. Three algorithms are novel. The fourth algorithm has been presented before. Results from the synthesized netlists show that two of our fastest algorithms achieve an improvement of 10 percent in latency per iteration over a standard radix-2 SRT algorithm at the cost of 36 percent more power and 50 percent more area.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
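For orientation, the digit-recurrence idea behind this record can be sketched with the textbook radix-2 SRT recurrence w[j+1] = 2·w[j] − q[j+1]·d, which is the baseline the description compares against. This is a hedged sketch, not any of the four algorithms in the paper: it uses the ordinary digit set {−1, 0, 1} rather than the over-redundant set {−2, −1, 0, 1, 2}, it compares exact rational values instead of inspecting the two most-significant digits of a redundant remainder, and the function name `srt_radix2_divide` is an assumption.

```python
from fractions import Fraction

def srt_radix2_divide(x, d, n):
    """Run n iterations of radix-2 SRT division.

    Assumes 1/2 <= d < 1 and |x| <= d. Returns (q, w) such that
    x == q * d + w / 2**n, with |w| <= d.
    """
    half = Fraction(1, 2)
    w = Fraction(x)          # partial remainder
    d = Fraction(d)
    q = Fraction(0)          # accumulated quotient
    for j in range(1, n + 1):
        shifted = 2 * w
        # Quotient-digit selection from {-1, 0, 1}.
        if shifted >= half:
            digit = 1
        elif shifted < -half:
            digit = -1
        else:
            digit = 0
        w = shifted - digit * d
        q += Fraction(digit, 2 ** j)
    return q, w

q, w = srt_radix2_divide(Fraction(3, 8), Fraction(3, 4), 10)
```

Exact fractions are used so that the invariant x = q·d + w·2^(−n) can be checked without rounding error; a hardware design would instead keep the remainder in carry-save or signed-digit form, as the description notes.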
  • 4
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: We present WaFS, a user-level file system, and a related scheduling algorithm for scientific workflow computation in the cloud. WaFS’s primary design goal is to automatically detect and gather the explicit and implicit data dependencies between workflow jobs, rather than high-performance file access. Using WaFS’s data, a workflow scheduler can either make effective cost-performance tradeoffs or improve storage utilization. Proper resource provisioning and storage utilization on pay-as-you-go clouds can be more cost effective than the use of resources in traditional HPC systems. WaFS and the scheduler control the number of concurrent workflow instances at runtime so that the storage is well used, while the total makespan (i.e., turnaround time for a workload) is not severely compromised. We describe the design and implementation of WaFS and the new workflow scheduling algorithm based on our previous work. We present empirical evidence of the acceptable overheads of our prototype WaFS and describe a simulation-based study, using representative workflows, to show the makespan benefits of our WaFS-enabled scheduling algorithm.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 5
    Publication Date: 2015-08-07
    Description: This paper presents an anomaly detection model that is granular and distributed to accurately and efficiently identify sensed data anomalies within wireless sensor networks. A more decentralised mechanism is introduced with wider use of in-network processing on a hierarchical sensor node topology, resulting in a robust framework for dynamic data domains. This efficiently addresses the big data issue that is encountered in large scale industrial sensor network applications. Data vectors in each node’s observation domain are first partitioned using an unsupervised approach that adapts to dynamic data streams using cumulative point-wise entropy and average relative density. Second order statistical analysis applied to average relative densities and mean entropy values is then used to differentiate anomalies through robust and adaptive thresholds that are responsive to a dynamic environment. Anomaly detection is then performed in a non-parametric and non-probabilistic manner over the different network tiers in the hierarchical topology, offering increased granularity for evaluation. Experiments were performed extensively using both real and artificial data distributions representative of different dynamic and multi-density observation domains. Results demonstrate detection accuracies of more than 94 percent, accompanied by a desirable reduction of more than 85 percent in communication costs when compared to existing centralized methods.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 6
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: The problem of securing data present on USB memories and SD cards has not been adequately addressed in the cryptography literature. While the formal notion of a tweakable enciphering scheme (TES) is well accepted as the proper primitive for secure data storage, the real challenge is to design a low cost TES which can perform at the data rates of the targeted memory devices. In this work, we provide the first answer to this problem. Our solution, called STES, combines a stream cipher with an XOR universal hash function. The security of STES is rigorously analyzed in the usual manner of the provable security approach. By carefully defining appropriate variants of the multi-linear hash function and the pseudo-dot product based hash function we obtain controllable trade-offs between area and throughput. We combine the hash function with the recent hardware oriented stream ciphers, namely Mickey, Grain and Trivium. Our implementations are targeted towards two low cost FPGAs: Xilinx Spartan 3 and Lattice ICE40. Simulation results demonstrate that the speeds of encryption/decryption match the data rates of different USB and SD memories. We believe that our work opens up the possibility of actually putting FPGAs within controllers of such memories to perform low-level in-place encryption.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 7
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Cellular automata (CAs) have been widely used to model and simulate physical systems and processes. CAs have also been successfully used as a VLSI architecture that proved to be very efficient, at least in terms of silicon-area utilization and clock-speed maximization. Quantum cellular automata (QCAs), one of the promising emerging technologies for nanoscale and quantum computing circuit implementation, provide very high scale integration, very high switching frequency and extremely low power characteristics. In this paper we present a new automated design architecture and a tool, namely DATICAQ (Design Automation Tool of 1-D CAs using QCAs), that builds a bridge between 1-D CAs as models of physical systems and processes and 1-D QCAs as nanoelectronic architecture. The QCA implementation of CAs not only brings the already developed CA circuits into the nanoelectronics era but also improves their performance significantly. The inputs of the proposed architecture are CA dimensionality, size, local rule, and initial and boundary conditions imposed by the particular problem. DATICAQ produces as output the layout of the QCA implementation of the particular 1-D CA model. Simulations of CA models for zero and periodic boundary conditions and the corresponding QCA circuits showed that the CA models have been successfully implemented.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
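The CA inputs listed in this record (size, local rule, initial state, and zero or periodic boundary conditions) can be made concrete with a small software sketch of one update step of a 1-D, two-state, nearest-neighbor CA. This only models the CA itself and says nothing about the QCA layout that DATICAQ generates; the function name `ca_step` and the use of Wolfram rule numbering for the local rule are assumptions.

```python
def ca_step(cells, rule, boundary="periodic"):
    """Apply one step of an elementary 1-D cellular automaton.

    cells    -- list of 0/1 cell states
    rule     -- local rule as a Wolfram rule number (0..255)
    boundary -- "periodic" (ring) or "zero" (fixed 0 beyond both ends)
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        if boundary == "periodic":
            left, right = cells[(i - 1) % n], cells[(i + 1) % n]
        else:  # zero boundary: cells outside the array are treated as 0
            left = cells[i - 1] if i > 0 else 0
            right = cells[i + 1] if i < n - 1 else 0
        # The neighborhood (left, center, right) selects one bit of the rule.
        index = 4 * left + 2 * cells[i] + right
        nxt.append((rule >> index) & 1)
    return nxt

state = [1, 0, 0, 0, 1]
periodic_next = ca_step(state, 90, "periodic")
zero_next = ca_step(state, 90, "zero")
```

With rule 90 (each cell becomes the XOR of its two neighbors), the two boundary conditions already give different results on this five-cell example, which is why the boundary condition has to be part of the design input.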
  • 8
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Role-based access control is an important access control method for securing computer systems. A role-based access control policy can be implemented incorrectly due to various reasons, such as programming errors. Defects in the implementation may lead to unauthorized access and security breaches. To reveal access control defects, this paper presents a model-based approach to automated generation of executable access control tests using predicate/transition nets. Role-permission test models are built by integrating declarative access control rules with functional test models or contracts (preconditions and postconditions) of the associated activities (the system functions). The access control tests are generated automatically from the test models to exercise the interactions of access control activities. They are transformed into executable code through a model-implementation mapping that maps the modeling elements to implementation constructs. The approach has been implemented in an industry-adopted test automation framework that supports the generation of test code in a variety of languages. The full model-based testing process has been applied to three systems implemented in Java. The effectiveness is evaluated through mutation analysis of role-based access control rules. The experiments show that the model-based approach is highly effective in detecting the seeded access control defects.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 9
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Heterogeneous multiprocessor systems, which are composed of a mix of processing elements, such as commodity multicore processors, graphics processing units (GPUs), and others, have been widely used in the scientific computing community. Software applications incorporate code designed and optimized for different types of processing elements in order to exploit the computing power of such heterogeneous computing systems. In this paper, we consider the problem of optimal distribution of the workload of data-parallel scientific applications between processing elements of such heterogeneous computing systems. We present a solution that uses functional performance models (FPMs) of processing elements and FPM-based data partitioning algorithms. Efficiency of this approach is demonstrated by experiments with parallel matrix multiplication and numerical simulation of lid-driven cavity flow on hybrid servers and clusters.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
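As a rough illustration of the workload-distribution problem in this record, the sketch below splits a number of equal work units across processing elements in proportion to their speeds. It is a deliberate simplification and not the FPM-based algorithm: a functional performance model describes speed as a function of problem size, whereas this sketch assumes one constant speed per device. The function name `partition` and the largest-remainder rounding are assumptions.

```python
def partition(total_units, speeds):
    """Split `total_units` work units proportionally to constant `speeds`.

    Returns a list of integer allocations that sums to `total_units`.
    Leftover units after rounding down go to the devices with the
    largest fractional shares (largest-remainder rounding).
    """
    total_speed = sum(speeds)
    shares = [total_units * s / total_speed for s in speeds]
    alloc = [int(x) for x in shares]
    leftover = total_units - sum(alloc)
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: shares[i] - alloc[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc

# Hypothetical example: a GPU three times as fast as a CPU core.
cpu_gpu = partition(100, [1.0, 3.0])
```

With size-dependent speeds the proportions themselves change as the allocation changes, which is the case that partitioning algorithms built on performance models are meant to handle.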
  • 10
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: In this paper, we propose a new notion called $k$-times attribute-based anonymous access control, which is particularly designed for supporting cloud computing environments. In this new notion, a user can authenticate himself/herself to the cloud computing server anonymously. The server only knows that the user possesses some required attributes, yet it does not know the identity of this user. In addition, we provide a $k$-times limit for anonymous access control. That is, the server may limit a particular set of users (i.e., those users with the same set of attributes) to access the system at most $k$ times within a period or an event. Any further access will be denied. We also prove the security of our instantiation. Our implementation result shows that our scheme is practical.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 11
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: In face of high partial and complete disk failure rates and untimely system crashes, the executions of low-priority background tasks become increasingly frequent in large-scale data centers. However, the existing algorithms are all reactive optimizations and only exploit the temporal locality of workloads to reduce the user I/O requests during the low-priority background tasks. To address the problem, this paper proposes Intelligent Data Outsourcing (IDO), a zone-based and proactive data migration optimization, to significantly improve the efficiency of the low-priority background tasks. The main idea of IDO is to proactively identify the hot data zones of RAID-structured storage systems in the normal operational state. By leveraging the prediction tools to identify the upcoming events, IDO proactively migrates the data blocks belonging to the hot data zones on the degraded device to a surrogate RAID set in the large-scale data centers. Upon a disk failure or crash reboot, most user I/O requests addressed to the degraded RAID set can be serviced directly by the surrogate RAID set rather than the much slower degraded RAID set. Consequently, the performance of the background tasks and user I/O performance during the background tasks are improved simultaneously. Our lightweight prototype implementation of IDO and extensive trace-driven experiments on two case studies demonstrate that, compared with the existing state-of-the-art approaches, IDO effectively improves the performance of the low-priority background tasks. Moreover, IDO is portable and can be easily incorporated into any existing algorithms for RAID-structured storage systems.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 12
    Publication Date: 2015-08-07
    Description: Cloud computing, which provides elastic computing and storage resources on demand, has become increasingly important due to the emergence of “big data”. Cloud computing resources are a natural fit for processing big data streams as they allow big data applications to run at the scale required for handling their complexities (data volume, variety and velocity). With the data no longer under users’ direct control, data security in cloud computing is becoming one of the main concerns in the adoption of cloud computing resources. In order to improve data reliability and availability, storing multiple replicas along with original datasets is a common strategy for cloud service providers. Public data auditing schemes allow users to verify their outsourced data storage without having to retrieve the whole dataset. However, existing data auditing techniques suffer from efficiency and security problems. First, for dynamic datasets with multiple replicas, the communication overhead for update verifications is very large, because each update requires updating of all replicas, where verification for each update requires O(log n) communication complexity. Second, existing schemes cannot provide public auditing and authentication of block indices at the same time. Without authentication of block indices, the server can build a valid proof based on data blocks other than the blocks the client requested to verify. In order to address these problems, in this paper, we present a novel public auditing scheme named MuR-DPA. The new scheme incorporates a novel authenticated data structure (ADS) based on the Merkle hash tree (MHT), which we call MR-MHT. To support full dynamic data updates and authentication of block indices, we included rank and level values in the computation of MHT nodes. In contrast to existing schemes, level values of nodes in MR-MHT are assigned in a top-down order, and all replica blocks for each data block are organized into the same replica sub-tree. Such a configuration allows efficient verification of updates for multiple replicas. Compared to existing integrity verification and public auditing schemes, theoretical analysis and experimental results show that the proposed MuR-DPA scheme can not only incur much less communication overhead for both update verification and integrity verification of cloud datasets with multiple replicas, but also provide enhanced security against dishonest cloud service providers.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
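The O(log n) proof size and the role of the block index mentioned in this record can be seen in a plain Merkle hash tree. The sketch below is an ordinary MHT over SHA-256, not the MR-MHT of the paper: it has no rank or level values and no replica sub-trees, and the helper names (`build_levels`, `prove`, `verify`) and the rule of duplicating the last node on odd-sized levels are assumptions.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(blocks):
    """Return all tree levels, leaves first, root level last."""
    levels = [[h(b) for b in blocks]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:
            cur.append(cur[-1])  # assumed padding rule for odd-sized levels
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path

def verify(block, index, path, root):
    """Recompute the root from a block, its claimed index, and the proof."""
    node = h(block)
    for sibling in path:
        # The index bits decide whether the node is a left or right child,
        # so a proof only verifies for the position it was built for.
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
levels = build_levels(blocks)
root = levels[-1][0]
proof = prove(levels, 2)
```

For four blocks the proof holds two hashes, i.e. log2(4); because `verify` folds the index into the hash order, presenting the same block and proof under a different index fails.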
  • 13
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: Bandwidth reservation has been recognized as a value-added service to the cloud provider in recent years. We consider an open market of cloud bandwidth reservation, in which cloud providers offer bandwidth reservation services to cloud tenants, especially online streaming service providers, who have strict requirements on the amount of bandwidth to guarantee their quality of services. In this paper, we model the open market as a double-sided auction, and propose the first family of STrategy-proof double Auctions for multi-cloud, multi-tenant bandwidth Reservation (STAR). STAR contains two auction mechanisms. The first one, STAR-Grouping, divides the tenants into groups in a bid-independent way, and carefully matches the cloud providers with the tenant groups to form good trades. The second one, STAR-Padding, greedily matches the cloud providers with the tenants, and fills the partially reserved cloud provider(s) with a novel virtual padding tenant who can be a component of the auctioneer. Our analysis shows that both auction mechanisms achieve strategy-proofness and ex-post budget balance. Our evaluation results show that they achieve good performance in terms of social welfare, cloud bandwidth utilization, and tenant satisfaction ratio.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 14
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: In a distributed real-time system (DRTS), jobs are often executed on a number of processors and must complete by their end-to-end deadlines. Job deadline requirements may be violated if resource competition among different jobs on a given processor is not considered. This paper introduces a distributed, locally optimal algorithm to assign local deadlines to the jobs on each processor without any restrictions on the mappings of the applications to the processors in the distributed soft real-time system. Improved schedulability results are achieved by the algorithm since disparate workloads among the processors due to competing jobs having different paths are considered. Given its distributed nature, the proposed algorithm is adaptive to dynamic changes of the applications and avoids the overhead of global clock synchronization. In order to make the proposed algorithm more practical, two derivatives of the algorithm are proposed and compared. Simulation results based on randomly generated workloads indicate that the proposed approach outperforms existing work both in terms of the number of feasible jobs (between 51% and 313% on average) and the number of feasible task sets (between 12% and 71% on average).
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 15
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: Reproducibility, i.e. getting bitwise identical floating point results from multiple runs of the same program, is a property that many users depend on either for debugging or correctness checking in many codes [10]. However, the combination of dynamic scheduling of parallel computing resources, and floating point nonassociativity, makes attaining reproducibility a challenge even for simple reduction operations like computing the sum of a vector of numbers in parallel. We propose a technique for floating point summation that is reproducible independent of the order of summation. Our technique uses Rump’s algorithm for error-free vector transformation [7], and is much more efficient than using (possibly very) high precision arithmetic. Our algorithm reproducibly computes highly accurate results with an absolute error bound of $n \cdot 2^{-28} \cdot \mathrm{macheps} \cdot \max_i |v_i|$ at a cost of $7n$ FLOPs and a small constant amount of extra memory usage. Higher accuracies are also possible by increasing the number of error-free transformations. As long as all operations are performed in to-nearest rounding mode, results computed by the proposed algorithms are reproducible for any run on any platform. In particular, our algorithm requires the minimum number of reductions, i.e. one reduction of an array of six double precision floating point numbers per sum, and hence is well suited for massively parallel environments.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
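Two ingredients named in this record, floating point nonassociativity and error-free transformation, can be demonstrated in a few lines of Python. This is only a sketch of the background, not the reproducible summation algorithm of the paper: `two_sum` is Knuth's classic error-free addition, shown here as the simplest example of an error-free transformation, and `math.fsum` from the standard library is used as an order-independent reference because it returns the correctly rounded sum.

```python
import math

def two_sum(a, b):
    """Error-free transformation of a + b (Knuth's TwoSum).

    Returns (s, e) where s is the rounded sum and e is the rounding
    error, so that a + b == s + e holds exactly in real arithmetic
    (round-to-nearest, no overflow).
    """
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e

# Nonassociativity: the same three numbers, two summation orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The 1.0 lost when rounding 1e16 + 1.0 is recovered as the error term.
s, e = two_sum(1e16, 1.0)

# A correctly rounded sum does not depend on the order of the inputs.
values = [1e16, 1.0, -1e16, 0.5]
forward = math.fsum(values)
backward = math.fsum(reversed(values))
```

A parallel reduction that combines partial sums in a scheduler-dependent order behaves like the first example, which is the reproducibility problem the description refers to.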
  • 16
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: In recent years, embedded dynamic random-access memory (eDRAM) technology has been implemented in last-level caches due to its low leakage energy consumption and high density. However, the fact that eDRAM presents slower access time than static RAM (SRAM) technology has prevented its inclusion in higher levels of the cache hierarchy. This paper proposes to mingle SRAM and eDRAM banks within the data array of second-level (L2) caches. The main goal is to achieve the best trade-off among performance, energy, and area. To this end, two main directions have been followed. First, this paper explores the optimal percentage of banks for each technology. Second, the cache controller is redesigned to deal with performance and energy. Performance is addressed by keeping the most likely accessed blocks in fast SRAM banks. In addition, energy savings are further enhanced by avoiding unnecessary destructive reads of eDRAM blocks. Experimental results show that, compared to a conventional SRAM L2 cache, a hybrid approach requiring similar or even lower area speeds up performance by 5.9 percent on average, while total energy savings reach 32 percent. For a 45 nm technology node, the energy-delay-area product confirms that a hybrid cache is a better design than the conventional SRAM cache regardless of the number of eDRAM banks, and also better than a conventional eDRAM cache when the number of SRAM banks is an eighth of the total number of cache banks.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 17
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: Nearly all of the currently used signature schemes, such as RSA or DSA, are based either on the factoring assumption or the presumed intractability of the discrete logarithm problem. As a consequence, the appearance of quantum computers or algorithmic advances on these problems may lead to the unpleasant situation that a large number of today’s schemes will most likely need to be replaced with more secure alternatives. In this work we present such an alternative: an efficient signature scheme whose security is derived from the hardness of lattice problems. It is based on recent theoretical advances in lattice-based cryptography and is highly optimized for practicability and use in embedded systems. The public and secret keys are roughly $1.5$ kB and $0.3$ kB long, while the signature size is approximately $1.1$ kB for a security level of around $80$ bits. We provide implementation results on reconfigurable hardware (Spartan/Virtex-6) and demonstrate that the scheme is scalable, has low area consumption, and even outperforms classical schemes.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 18
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: With the rising demands on cloud services, the electricity consumption has been increasing drastically as the main operational expenditure (OPEX) to data center providers. The geographical heterogeneity of electricity prices motivates us to study the task placement problem over geo-distributed data centers. We exploit the dynamic frequency scaling technique and formulate an optimization problem that minimizes OPEX while guaranteeing the quality-of-service, i.e., the expected response time of tasks. Furthermore, an optimal solution is discovered for this formulated problem. The experimental results show that our proposal achieves much higher cost-efficiency than the traditional resizing scheme, i.e., by activating/deactivating certain servers in data centers.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 19
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: A new methodology for DRAM performance analysis has been proposed based on accurate characterization of DRAM bus cycles. The proposed methodology allows cycle-accurate performance analysis of arbitrary DRAM traces, obviates the need for functional simulations, allows accurate estimation of DRAM performance maximum, and enables root causing of suboptimal DRAM operation.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 20
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-12
    Description: During at-speed test of high performance sequential ICs using scan-based Logic BIST, the IC activity factor (AF) induced by the applied test vectors is significantly higher than that experienced during its in field operation. Consequently, power droop (PD) may take place during both shift and capture phases, which will slow down the circuit under test (CUT) signal transitions. At capture, this phenomenon is likely to be erroneously recognized as due to delay faults. As a result, a false test fail may be generated, with consequent increase in yield loss. In this paper, we propose two approaches to reduce the PD generated at capture during at-speed test of sequential circuits with scan-based Logic BIST using the Launch-On-Shift scheme. Both approaches increase the correlation between adjacent bits of the scan chains with respect to conventional scan-based LBIST. This way, the AF of the scan chains at capture is reduced. Consequently, the AF of the CUT at capture, thus the PD at capture, is also reduced compared to conventional scan-based LBIST. The former approach, hereinafter referred to as Low-Cost Approach (LCA), enables a 50 percent reduction in the worst case magnitude of PD during conventional logic BIST. It requires a small cost in terms of area overhead (of approximately 1.5 percent on average), and it does not increase the number of test vectors over the conventional scan-based LBIST to achieve the same Fault Coverage (FC). Moreover, compared to three recent alternative solutions, LCA features a comparable AF in the scan chains at capture, while requiring lower test time and area overhead. The second approach, hereinafter referred to as High-Reduction Approach (HRA), enables scalable PD reductions at capture of up to 87 percent, with limited additional costs in terms of area overhead and number of required test vectors for a given target FC, over our LCA approach. 
Particularly, compared to two of the three recent alternative solutions mentioned above, HRA enables a significantly lower AF in the scan chains during the application of test vectors, while requiring either a comparable area overhead or a significantly lower test time. Compared to the remaining alternative solution mentioned above, HRA enables a similar AF in the scan chains at capture (approximately 90 percent lower than conventional scan-based LBIST), while requiring a significantly lower test time (on average approximately 4.87 times fewer test vectors) and comparable area overhead (of approximately 1.9 percent on average).
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 21
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The advent of cloud computing has made storage outsourcing a rising trend, which in turn has made secure remote data auditing a hot topic in the research literature. Recently, some research has considered the problem of secure and efficient public data integrity auditing for shared dynamic data. However, these schemes are still not secure against the collusion of the cloud storage server and revoked group users during user revocation in a practical cloud storage system. In this paper, we identify the collusion attack on the existing scheme and provide an efficient public integrity auditing scheme with secure group user revocation based on vector commitment and verifier-local revocation group signature. We design a concrete scheme based on our scheme definition. Our scheme supports public checking and efficient user revocation, as well as some desirable properties such as confidentiality, efficiency, countability and traceability of secure group user revocation. Finally, the security and experimental analysis show that, compared with related schemes, our scheme is also secure and efficient.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 22
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: To select an appropriate level of error protection in caches, the impact of various protection schemes on the cache Failure In Time (FIT) rate must be evaluated for a target benchmark suite. However, while many simulation tools exist to evaluate area, power and performance for a set of benchmark programs, there is a dearth of such tools for reliability. This paper introduces a new cache reliability model called PARMA+ that has unique features which distinguish it from previous models. PARMA+ estimates a cache's FIT rate in the presence of spatial multi-bit faults, single-bit faults, temporal multi-bit faults and different error protection schemes including parity, ECC, early write-back and bit-interleaving. We first develop the model formally, then we demonstrate its accuracy. We have run reliability simulations for many distributions of large and small fault patterns and have compared them with accelerated fault injection simulations. PARMA+ has high accuracy and low computational complexity.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 23
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Although the travel time is the most important information in road networks, many spatial queries, e.g., $k$-nearest-neighbor ($k$-NN) and range queries, for location-based services (LBS) are only based on the network distance. This is because it is costly for an LBS provider to collect real-time traffic data from vehicles or roadside sensors to compute the travel time between two locations. With the advance of web mapping services, e.g., Google Maps, Microsoft Bing Maps, and MapQuest Maps, there is an invaluable opportunity for using such services for processing spatial queries based on the travel time. In this paper, we propose a server-side Spatial Mashup Service (SMS) that enables the LBS provider to efficiently evaluate $k$-NN queries in road networks using the route information and travel time retrieved from an external web mapping service. Due to the high cost of retrieving such external information, the usage limits of web mapping services, and the large number of spatial queries, we optimize the SMS for a large number of $k$-NN queries. We first discuss how the SMS processes a single $k$-NN query using two optimizations, namely, direction sharing and parallel requesting. Then, we extend them to process multiple concurrent $k$-NN queries and design a performance tuning tool to provide a trade-off between the query response time and the number of external requests and, more importantly, to prevent a starvation problem in the parallel requesting optimization for concurrent queries. We evaluate the performance of the proposed SMS using MapQuest Maps, a real road network, real and synthetic data sets. Experimental results show the efficiency and scalability of our optimizations designed for the SMS.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 24
    Publication Date: 2016-07-08
    Description: Several recent works have studied mobile vehicle scheduling to recharge sensor nodes via wireless energy transfer technologies. Unfortunately, most of them overlooked important factors of the vehicles’ moving energy consumption and limited recharging capacity, which may lead to problematic schedules or even stranded vehicles. In this paper, we consider the recharge scheduling problem under such important constraints. To balance energy consumption and latency, we employ one dedicated data gathering vehicle and multiple charging vehicles. We first organize sensors into clusters for easy data collection, and obtain theoretical bounds on latency. Then we establish a mathematical model for the relationship between energy consumption and replenishment, and obtain the minimum number of charging vehicles needed. We formulate the scheduling into a Profitable Traveling Salesmen Problem that maximizes profit - the amount of replenished energy less the cost of vehicle movements, and prove it is NP-hard. We devise and compare two algorithms: a greedy one that maximizes the profit at each step; an adaptive one that partitions the network and forms Capacitated Minimum Spanning Trees per partition. Through extensive evaluations, we find that the adaptive algorithm can keep the number of nonfunctional nodes at zero. It also reduces transient energy depletion by 30-50 percent and saves 10-20 percent energy. Comparisons with other common data gathering methods show that we can save 30 percent energy and reduce latency by two orders of magnitude.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 25
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The capability of selectively sharing encrypted data with different users via public cloud storage may greatly ease security concerns over inadvertent data leaks in the cloud. A key challenge to designing such encryption schemes lies in the efficient management of encryption keys. The desired flexibility of sharing any group of selected documents with any group of users demands different encryption keys to be used for different documents. However, this also implies the necessity of securely distributing to users a large number of keys for both encryption and search, and those users will have to securely store the received keys, and submit an equally large number of keyword trapdoors to the cloud in order to perform search over the shared data. The implied need for secure communication, storage, and complexity clearly renders the approach impractical. In this paper, we address this practical problem, which is largely neglected in the literature, by proposing the novel concept of key-aggregate searchable encryption and instantiating the concept through a concrete KASE scheme, in which a data owner only needs to distribute a single key to a user for sharing a large number of documents, and the user only needs to submit a single trapdoor to the cloud for querying the shared documents. The security analysis and performance evaluation both confirm that our proposed schemes are provably secure and practically efficient.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 26
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Infrastructure-as-a-service (IaaS) cloud providers offer tenants elastic computing resources in the form of virtual machine (VM) instances to run their jobs. Recently, providing predictable performance (i.e., performance guarantee) for tenant applications is becoming increasingly compelling in IaaS clouds. However, the hardware heterogeneity and performance interference across the same type of cloud VM instances can bring substantial performance variation to tenant applications, which inevitably stops the tenants from moving their performance-sensitive applications to the IaaS cloud. To tackle this issue, this paper proposes Heifer, a Heterogeneity and interference-aware VM provisioning framework for tenant applications, by focusing on MapReduce as a representative cloud application. It predicts the performance of MapReduce applications by designing a lightweight performance model using the online-measured resource utilization and capturing VM interference. Based on such a performance model, Heifer provisions the VM instances of the good-performing hardware type (i.e., the hardware that achieves the best application performance) to achieve predictable performance for tenant applications, by explicitly exploring the hardware heterogeneity and capturing VM interference. With extensive prototype experiments in our local private cloud and a real-world public cloud (i.e., Microsoft Azure) as well as complementary large-scale simulations, we demonstrate that Heifer can guarantee the job performance while saving the job budget for tenants. Moreover, our evaluation results show that Heifer can improve the job throughput of cloud datacenters, such that the revenue of cloud providers can be increased, thereby achieving a win-win situation between providers and tenants.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 27
    Publication Date: 2016-07-08
    Description: Gaussian normal bases (GNBs) are a special set of normal bases (NBs) which yield low complexity $GF\left(2^{m}\right)$ arithmetic operations. In this paper, we present new architectures for the digit-level single, hybrid-double, and hybrid-triple multiplication of $GF\left(2^{m}\right)$ elements based on the GNB representation for odd values of $m > 1$. The proposed fully-serial-in single multipliers perform multiplication of two field elements and offer high throughput when the data-path capacity for entering inputs is limited. The proposed hybrid-double and hybrid-triple digit-level GNB multipliers perform, respectively, two and three field multiplications using the same latency required for a single digit-level multiplier, at the expense of increased area. In addition, we present a new eight-ary field exponentiation architecture which does not require precomputed or stored intermediate values.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 28
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Shamir's secret sharing scheme is an effective way to distribute a secret to a group of shareholders. The security of the unprotected sharing scheme, however, can be easily broken by cheaters or attackers who maliciously feed incorrect shares during the secret recovery stage or inject faults into the hardware computing the secret. In this paper, we propose cheater detection and identification schemes based on robust and algebraic manipulation detection (AMD) codes and m-disjunct matrices (superimposed codes). We present the constructions of codes for cheater detection and identification and describe how the cheater identification problem can be related to the classic group testing algorithms based on m-disjunct matrices. Simulation and synthesis results show that the proposed architecture can improve the security level significantly even under strong cheating attack models with reasonable area and timing overheads.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
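The weakness this abstract describes can be illustrated with a minimal sketch of plain (unprotected) Shamir secret sharing. It is not the paper's detection scheme: the prime field, the function names `make_shares` and `recover_secret`, and the parameters are assumptions chosen for illustration only. The sketch shows that any `threshold` honest shares recover the secret, while a single tampered share silently yields a wrong value.

```python
import secrets

# Assumption for illustration: arithmetic over the prime field GF(2**127 - 1).
PRIME = 2**127 - 1


def make_shares(secret, threshold, num_shares):
    """Split `secret` into points on a random polynomial of degree threshold - 1."""
    # The constant term is the secret; the other coefficients are random.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        # Horner evaluation of the polynomial modulo PRIME.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, num_shares + 1)]


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = make_shares(1234, threshold=3, num_shares=5)
    print(recover_secret(shares[:3]) == 1234)   # honest shares recover the secret
    x, y = shares[0]
    cheated = [(x, (y + 1) % PRIME)] + shares[1:3]
    print(recover_secret(cheated) == 1234)      # one altered share breaks recovery
```

Nothing in the recovery step signals that a share was altered, which is the gap that cheater detection and identification codes are meant to close.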
  • 29
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Cloud platforms encompass a large number of storage services that can be used to manage the needs of customers. Each of these services, offered by a different provider, is characterized by specific features, limitations and prices. In the presence of multiple options, it is crucial to select the best solution fitting the customer requirements in terms of quality of service and costs. Most of the available approaches are not able to handle uncertainty in the expression of subjective preferences from customers, and can result in wrong (or sub-optimal) service selections in the presence of rational/selfish providers, exposing untrustworthy indications concerning the quality of service levels and prices associated with their offers. In addition, due to its multi-objective nature, the optimal service selection process results in a very complex task to be managed, when possible, in a distributed way, for well-known scalability reasons. In this work, we aim at facing the above challenges by proposing three novel contributions. Fuzzy set theory is used to express vagueness in the subjective preferences of the customers. The service selection is resolved with the distributed application of fuzzy inference or the Dempster-Shafer theory of evidence. The selection strategy is also complemented by the adoption of a game theoretic approach for promoting truth-telling among service providers. We present empirical evidence of the proposed solution's effectiveness through properly crafted simulation experiments.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 30
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 31
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: As cloud computing technology has developed over the last decade, outsourcing data to cloud services for storage has become an attractive trend, which spares the effort of heavy data maintenance and management. Nevertheless, since outsourced cloud storage is not fully trustworthy, it raises security concerns on how to realize data deduplication in the cloud while achieving integrity auditing. In this work, we study the problem of integrity auditing and secure deduplication on cloud data. Specifically, aiming at achieving both data integrity and deduplication in the cloud, we propose two secure systems, namely SecCloud and SecCloud$^+$. SecCloud introduces an auditing entity that maintains a MapReduce cloud, which helps clients generate data tags before uploading as well as audit the integrity of data stored in the cloud. Compared with previous work, the computation by the user in SecCloud is greatly reduced during the file uploading and auditing phases. SecCloud$^+$ is motivated by the fact that customers always want to encrypt their data before uploading, and enables integrity auditing and secure deduplication on encrypted data.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 32
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Cache compression improves the performance of a multi-core system by being able to store more cache blocks in a compressed format. Compression is achieved by exploiting data patterns present within a block. For a given cache space, compression increases the effective cache capacity. However, this increase is limited by the number of tags that can be accommodated at the cache. Prefetching is another technique that improves system performance by fetching the cache blocks ahead of time into the cache and hiding the off-chip latency. Commonly used hardware prefetchers, such as stream and stride, fetch multiple contiguous blocks into the cache. In this paper we propose prefetched blocks compaction (PBC) wherein we exploit the data patterns present across these prefetched blocks. PBC compacts the prefetched blocks into a single block with a single tag, effectively increasing the cache capacity. We also modify the cache organization to access these multiple cache blocks residing in a single block without any need for extra tag look-ups. PBC improves the system performance by 11.1 percent with a maximum of 43.4 percent on a four-core system.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 33
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Multi-core processors achieve a trade-off between performance and power consumption by using Dynamic Voltage Scaling (DVS) techniques. In this paper, we study the power efficient scheduling problem of real-time tasks in an identical multi-core system, and present the Node Scaling model to achieve power-aware scheduling. We prove that there is a bound speed which results in the minimal power consumption for a given task set, and that the maximal value of task utilization, $u_{max}$, in a task set is a key element in deciding its minimal power consumption. Based on the value of $u_{max}$, we classify task sets into two categories: the bounded task sets and the non-bounded task sets, and we prove the lower bound of power consumption for each type of task set. Simulations based on Intel Xeon X5550 and PXA270 processors show that the Node Scaling model can achieve power efficient scheduling when applied to existing algorithms such as EDF-FF and SPA2. The ratio of power reduction depends on the multi-core processor's property, which is defined as the ratio of the bound speed to the maximal speed of the cores. When the ratio of speeds decreases, the ratio of power reduction increases for all the power efficient algorithms.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 34
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Existing secure and privacy-preserving schemes for vehicular communications in vehicular ad hoc networks face some challenges, e.g., reducing the dependence on ideal tamper-proof devices, building efficient member revocation mechanisms and avoiding computation and communication bottlenecks. To cope with those challenges, we propose a highly efficient secure and privacy-preserving scheme based on identity-based aggregate signatures. Our scheme enables hierarchical aggregation and batch verification. The individual identity-based signatures generated by different vehicles can be aggregated and verified in a batch. The aggregated signatures can be re-aggregated by a message collector (e.g., traffic management authority). With our hierarchical aggregation technique, we significantly reduce the transmission/storage overhead of the vehicles and other parties. Furthermore, existing batch verification based schemes in vehicular ad hoc networks require vehicles to wait for enough messages to perform a batch verification. In contrast, we assume that vehicles will generate messages (and the corresponding signatures) in certain time spans, so that vehicles only need to wait for a very short period before they can start the batch verification procedure. Simulation shows that a vehicle can verify the received messages with very low latency and fast response.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 35
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Computer vision applications have a large disparity in operations, data representation and memory access patterns from the early vision stages to the final classification and recognition stages. A hardware system for computer vision has to provide high flexibility without compromising performance, exploiting massively spatial-parallel operations but also keeping a high throughput on data-dependent and complex program flows. Furthermore, the architecture must be modular, scalable and easy to adapt to the needs of different applications. Keeping this in mind, a hybrid SIMD/MIMD architecture for embedded computer vision is proposed. It consists of a coprocessor designed to provide fast and flexible computation of demanding image processing tasks of vision applications. A 32-bit 128-unit device was prototyped on a Virtex-6 FPGA which delivers a peak performance of 19.6 GOP/s and 7.2 W of power dissipation.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 36
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The key to reducing static energy in supercomputers is switching off their unused components. Routers are the major components of a supercomputer. Whether routers can be effectively switched off or not has become the key to static energy management for supercomputers. For many typical applications, the routers in a supercomputer exhibit low utilization. However, there is no effective method to switch the routers off when they are idle. By analyzing the router occupancy in time and space, for the first time, we present a routing-policy guided topology partitioning methodology to solve this problem. We propose topology partitioning methods for three kinds of commonly used topologies (mesh, torus and fat-tree) equipped with the three most popular routing policies (deterministic routing, directionally adaptive routing and fully adaptive routing). Based on the above methods, we propose the key techniques required in this topology partitioning based static energy management in supercomputer interconnection networks to switch off unused routers in both time and space dimensions. Three topology-aware resource allocation algorithms have been developed to handle effectively different job-mixes running on a supercomputer. We validate the effectiveness of our methodology by using Tianhe-2 and a simulator for the aforementioned topologies and routing policies. The energy savings achieved on a subsystem of Tianhe-2 range from 3.8 to 79.7 percent. This translates into a yearly energy cost reduction of up to half a million US dollars for Tianhe-2.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 37
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: This paper proposes several designs of approximate restoring dividers; two different levels of approximation (cell and array levels) are employed. Three approximate subtractor cells are utilized for integer subtraction as the basic step of division; these cells tend to trade accuracy in subtraction against other metrics, such as circuit complexity and power dissipation. At the array level, exact cells are either replaced or truncated in the approximate divider designs. A comprehensive evaluation of approximation at both cell and array (divider) levels is pursued using error analysis and HSPICE simulation; different circuit metrics including complexity and power dissipation are evaluated. Different applications are investigated by utilizing the proposed approximate arithmetic circuits. The simulation results show that, with extensive savings in power dissipation and circuit complexity, the proposed designs offer better error tolerant capabilities for quotient oriented applications (image processing) than for remainder oriented applications (modulo operations). The proposed approximate restoring divider is significantly better than the approximate non-restoring scheme presented in the technical literature.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
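For readers unfamiliar with the baseline, the following is a minimal software sketch of exact restoring division for unsigned integers, the algorithm whose subtractor cells the paper approximates. It does not reproduce the paper's approximate cells or array designs; the function name `restoring_divide` and the `width` parameter are assumptions made for illustration.

```python
def restoring_divide(dividend, divisor, width=8):
    """Exact restoring division of unsigned `width`-bit integers.

    Returns (quotient, remainder). Each iteration corresponds to one row of
    a restoring array divider: shift in a dividend bit, try a subtraction,
    and restore the partial remainder if the result is negative.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")

    quotient = 0
    remainder = 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit (MSB first) into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        trial = remainder - divisor          # trial subtraction
        if trial >= 0:
            remainder = trial                # keep the difference, quotient bit = 1
            quotient |= 1 << bit
        # Otherwise the partial remainder is "restored" by leaving it
        # unchanged, and the quotient bit stays 0.
    return quotient, remainder


if __name__ == "__main__":
    print(restoring_divide(200, 7))  # (28, 4)
```

In hardware the trial subtraction is performed by a row of subtractor cells, so replacing or truncating those cells perturbs the quotient and the remainder in different ways, which is why the abstract evaluates the two kinds of application separately.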
  • 38
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Wireless sensor networks (WSNs) have been considered to be the next generation paradigm of structural health monitoring (SHM) systems due to their low cost, high scalability and ease of deployment. Due to the intrinsically energy-intensive nature of the sensor nodes in SHM applications, it is highly preferable that they can be divided into subsets and take turns to monitor the condition of a structure. This approach is generally called ‘coverage-preserving scheduling’ and has been widely adopted in existing WSN applications. The problem of partitioning the nodes into subsets is generally called the ‘maximum lifetime coverage problem (MLCP)’. However, existing solutions to the MLCP cannot be directly applied to SHM applications. Unlike in other WSN applications, we cannot define a specific coverage area independently for each sensor node in SHM, which is, however, the basic assumption in all existing solutions to the MLCP. In this paper, we propose two approaches to solve the MLCP in SHM. The performance of the methods is demonstrated through both extensive simulations and real experiments.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 39
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: NAND flash memory is widely used for the secondary storage of computer systems. The flash translation layer (FTL) is the firmware that manages and operates a flash-based storage device. One of the FTL's modules manages the RAM buffer of the flash device. Now this RAM buffer is sufficient to be used for both address mapping and data buffering. As the fastest component of the flash layer interface, effective management of this buffer has a significant impact on the performance of data storage and access. This paper proposes a novel scheme called TreeFTL for this purpose. TreeFTL organizes address translation pages and data storage pages in a tree-like structure in the RAM buffer. The tree enables TreeFTL to adapt to the access behaviors of workloads by dynamically adjusting the partitions for address mapping and data buffering. Furthermore, TreeFTL employs a lightweight mechanism to evict the least-recently-used victim pages when the need arises. Our experiments show that TreeFTL is able to spend 46.6 and 49.0 percent less service time over various workloads than two state-of-the-art algorithms, respectively, for a 64 MB RAM buffer.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 40
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Whereas clustered microarchitectures themselves have been extensively studied, the memory units for these clustered microarchitectures have received relatively little attention. This article discusses some of the inherent challenges of clustered memory units and shows how these can be overcome. Clustered memory pipelines work well with the late allocation of load/store queue entries and physically unordered queues. Yet this approach has characteristic problems such as queue overflows and allocation patterns that lead to deadlocks. We propose techniques to solve each of these problems and show that a distributed memory unit can offer significant energy savings and speedups over a centralized unit. For instance, compared to a centralized cache with a load/store queue of 64/24 entries, our four-cluster distributed memory unit with load/store queues of 16/8 entries each consumes 31 percent less energy and performs 4.7 percent better on SPECint, and consumes 36 percent less energy and performs 7 percent better on SPECfp.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 41
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: In Broadcast Encryption (BE) systems like Pay-TV, AACS, online content sharing and broadcasting, reducing the header length (communication overhead per session) is of practical interest. The Subset Difference (SD) scheme due to Naor-Naor-Lotspiech (NNL) is the most popularly used BE scheme. We introduce the $(a,b,\gamma)$ augmented binary tree subset difference ($(a,b,\gamma)$-ABTSD) scheme which is a generalization of the NNL-SD scheme. By varying the parameters $(a,b,\gamma)$, it is possible to obtain $O(n\log n)$ different schemes. The average header length achieved by the new schemes is smaller than all known schemes having the same decryption time as that of the NNL-SD scheme and achieving non-trivial trade-offs between the user storage and the header size. The amount of key material that a user is required to store increases. For the earlier mentioned applications, reducing header size and achieving fast decryption is perhaps more of a concern than the user storage.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 42
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: We propose a new optimal data placement technique to improve the performance of MapReduce in cloud data centers by considering not only the data locality but also the global data access costs. We first conducted an analytical and experimental study to identify the performance issues of MapReduce in data centers and to show that MapReduce tasks that are involved in unexpected remote data access have much greater communication costs and execution time, and can significantly deteriorate the overall performance. Next, we formulated the problem of optimal data placement and proposed a generative model to minimize global data access cost in data centers and showed that the optimal data placement problem is NP-hard. To solve the optimal data placement problem, we propose a topology-aware heuristic algorithm by first constructing a replica-balanced distribution tree for the abstract tree structure, and then building a replica-similarity distribution tree for detail tree construction, to construct an optimal replica distribution tree. The experimental results demonstrated that our optimal data placement approach can improve the performance of MapReduce with lower communication and computation costs by effectively minimizing global data access costs, more specifically reducing unexpected remote data access.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 43
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: This paper presents a Ternary Content-addressable Memory (TCAM) design which is based on the use of floating-gate (flash) transistors. TCAMs are extensively used in high speed IP networking, and are commonly found in routers in the internet core. Traditional TCAM ICs are built using CMOS devices, and a single TCAM cell utilizes 17 transistors. In contrast, our TCAM cell utilizes only two flash transistors, thereby significantly reducing circuit area. We cover the chip-level architecture of the TCAM IC briefly, focusing mainly on the TCAM block which does fast parallel IP routing table lookup. Our flash-based TCAM (FTCAM) block is simulated in SPICE, and we show that it has a significantly lower area than a CMOS-based TCAM block, with a speed that can meet the current data rates ($\sim$400 Gb/s) found in the internet core.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
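The ternary match that a TCAM block performs for IP routing table lookup can be illustrated in software. The following is a minimal sketch, not the FTCAM circuit from the abstract: each entry is assumed to be a (value, mask) pair in which a mask bit of 0 means "don't care", entries are assumed to be ordered by descending prefix length so that the first hit is the longest prefix, and a sequential loop stands in for the parallel hardware compare. The function names and the sample routes are hypothetical.

```python
import ipaddress


def prefix_entry(cidr, next_hop):
    """Turn a CIDR route into a ternary entry: (value, mask, next_hop, prefix_len)."""
    net = ipaddress.ip_network(cidr)
    return (int(net.network_address), int(net.netmask), next_hop, net.prefixlen)


def build_table(routes):
    """Order entries by descending prefix length so the first match is the longest."""
    entries = [prefix_entry(cidr, hop) for cidr, hop in routes]
    entries.sort(key=lambda e: e[3], reverse=True)
    return entries


def lookup(table, address):
    """Return the next hop of the first entry whose cared-about bits equal the key."""
    key = int(ipaddress.ip_address(address))
    for value, mask, next_hop, _ in table:
        # Bits where mask == 0 are "don't care" and are ignored by the compare.
        if key & mask == value:
            return next_hop
    return None


table = build_table([
    ("10.0.0.0/8", "A"),
    ("10.1.0.0/16", "B"),
    ("0.0.0.0/0", "default"),
])
print(lookup(table, "10.1.2.3"))
```

A real TCAM evaluates all entries simultaneously and uses a priority encoder to pick the winning row; the sorted list above only mimics that priority order.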
  • 44
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The Booth multiplier has been widely used for high performance signed multiplication by encoding and thereby reducing the number of partial products. A multiplier using the radix-$4$ (or modified Booth) algorithm is very efficient due to the ease of partial product generation, whereas the radix-$8$ Booth multiplier is slow due to the complexity of generating the odd multiples of the multiplicand. In this paper, this issue is alleviated by the application of approximate designs. An approximate $2$-bit adder is deliberately designed for calculating the sum of $1\times$ and $2\times$ of a binary number. This adder requires a small area, low power and a short critical path delay. Subsequently, the $2$-bit adder is employed to implement the less significant section of a recoding adder for generating the triple multiplicand with no carry propagation. In the pursuit of a trade-off between accuracy and power consumption, two signed $16\times 16$ bit approximate radix-8 Booth multipliers are designed using the approximate recoding adder with and without the truncation of a number of less significant bits in the partial products. The proposed approximate multipliers are faster and more power efficient than the accurate Booth multiplier. The multiplier with 15-bit truncation achieves the best overall performance in terms of hardware and accuracy when compared to other approximate Booth multiplier designs. Finally, the approximate multipliers are applied to the design of a low-pass FIR filter and they show better performance than other approximate Booth multipliers.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
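To see why the triple multiplicand is the costly step in radix-8 Booth multiplication, the standard recoding can be sketched in a few lines. This is an exact textbook version, assuming a signed 16-bit multiplier; it does not model the approximate 2-bit adder or the truncation described in the abstract, and the function names are hypothetical. Each recoded digit lies in [-4, 4]; the multiples 1x, 2x and 4x are plain shifts, while 3x needs a real addition (1x + 2x).

```python
def booth_radix8_digits(y, bits=16):
    """Recode a signed `bits`-wide integer into radix-8 Booth digits in [-4, 4]."""
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    groups = -(-bits // 3)            # ceil(bits / 3) digit groups
    width = 3 * groups
    u = y & ((1 << width) - 1)        # two's complement, sign-extended to `width`

    def bit(j):
        return 0 if j < 0 else (u >> j) & 1

    # digit_i = -4*y[3i+2] + 2*y[3i+1] + y[3i] + y[3i-1], with y[-1] = 0
    return [
        -4 * bit(3 * i + 2) + 2 * bit(3 * i + 1) + bit(3 * i) + bit(3 * i - 1)
        for i in range(groups)
    ]


def booth_radix8_multiply(x, y, bits=16):
    """Multiply x by y by summing one shifted partial product per Booth digit."""
    triple = x + (x << 1)             # the "hard" odd multiple 3x (exact here)
    multiples = {0: 0, 1: x, 2: x << 1, 3: triple, 4: x << 2}
    total = 0
    for i, digit in enumerate(booth_radix8_digits(y, bits)):
        partial = multiples[abs(digit)]
        total += (-partial if digit < 0 else partial) << (3 * i)
    return total


print(booth_radix8_digits(3))
print(booth_radix8_multiply(1234, -567))
```

A 16-bit multiplier yields six partial products here, compared with eight for radix-4 recoding, which is the saving that makes the 3x adder worth optimizing.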
  • 45
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Solid State Drives (SSDs) have been extensively deployed as the cache of hard disk-based storage systems. The SSD-based cache generally supplies ultra-large capacity, but managing so large a cache introduces excessive memory overhead, which in turn makes the SSD-based cache neither cost-effective nor energy-efficient. This work aims to reduce the memory overhead introduced by the replacement policy of the SSD-based cache. Traditionally, data structures involved in the cache replacement policy reside in main memory. However, these in-memory data structures are no longer suitable for SSD-based caches, since the cache is much larger than ever. We propose a memory-efficient framework which keeps most data structures in SSD while leaving only a memory-efficient data structure (i.e., a new Bloom filter proposed in this work) in main memory. Our framework can be used to implement any LRU-based replacement policy with negligible memory overhead. We evaluate our proposals via theoretical analysis and prototype implementation. Experimental results demonstrate that our framework is practical for implementing most replacement policies for large caches, and is able to reduce the memory overhead by about $10\times$.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 46
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: A large portion of existing multithreaded embedded software has been programmed according to symmetric shared memory platforms where a monolithic memory block is shared by all cores. Such platforms accommodate popular parallel programming models such as POSIX threads and OpenMP. However, with the growing number of cores in modern manycore embedded architectures, they present a bottleneck related to their centralized memory accesses. This paper proposes a solution tailored for an efficient execution of applications defined with shared-memory programming models onto on-chip distributed-memory multicore architectures. It shows how performance, area and energy consumption are significantly improved thanks to the scalability of these architectures. This is illustrated in an open-source realistic design framework, including tools from ASIC to microkernel.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 47
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: This paper describes a procedure that computes seeds for LFSR-based generation of partially-functional broadside tests. Existing LFSR-based test data compression methods compute seeds based on incompletely-specified test cubes. Functional broadside tests are fully-specified, and they have fully-specified scan-in states. This is the main challenge that the test generation procedure described in this paper needs to address. It addresses it by using a process that modifies an initial seed $s_i$ in order to reduce the Hamming distance between the scan-in state $p_i$ that $s_i$ creates and a reachable state $r_j$. When the Hamming distance is reduced to zero, the seed can be used for generating functional broadside tests. When the distance is larger than zero, the tests are partially-functional. Experimental results are presented for transition faults in benchmark circuits to demonstrate the resulting distances and fault coverage.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
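The idea of modifying a seed $s_i$ to bring the scan-in state $p_i$ closer to a reachable state $r_j$ can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not the procedure from the paper: it uses a small Fibonacci LFSR with hypothetical tap positions, expands a seed directly into a scan-in state, and applies a greedy single-bit-flip search that keeps a flip only when the Hamming distance drops. The search may stop at a nonzero distance, which corresponds to the partially-functional case.

```python
def lfsr_expand(seed, taps, length):
    """Shift a Fibonacci LFSR `length` times and collect the output bits."""
    state = list(seed)
    out = []
    for _ in range(length):
        out.append(state[-1])            # last register bit is shifted out
        feedback = 0
        for t in taps:                   # XOR of the tapped register bits
            feedback ^= state[t]
        state = [feedback] + state[:-1]
    return out


def hamming(a, b):
    """Number of positions in which two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def improve_seed(seed, taps, reachable):
    """Greedily flip seed bits while the distance to `reachable` decreases."""
    seed = list(seed)
    best = hamming(lfsr_expand(seed, taps, len(reachable)), reachable)
    improved = True
    while improved and best > 0:
        improved = False
        for i in range(len(seed)):
            seed[i] ^= 1
            dist = hamming(lfsr_expand(seed, taps, len(reachable)), reachable)
            if dist < best:
                best, improved = dist, True
            else:
                seed[i] ^= 1             # undo a flip that did not help
    return seed, best


TAPS = [0, 3]                            # hypothetical tap positions
reachable_state = lfsr_expand([1, 0, 1, 1], TAPS, 12)
new_seed, distance = improve_seed([0, 0, 0, 0], TAPS, reachable_state)
print(new_seed, distance)
```

A distance of zero means the expanded state equals the reachable state, so the seed would yield a functional broadside test in the abstract's terminology.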
  • 48
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: A new apparatus for fast multiplication of two numbers is introduced. Inputs are split into partitions, and one number is replaced by two with zeros interlaced in every other partition. Products are computed with no carries between partitions, in the time required to multiply the short partitions and add the partial sums. Component adders and multipliers can be chosen to trade off area and speed. A new graphical tool is used to compare this multiplier to existing ones based on CMOS VLSI simulations.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 49
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Covert channels are widely considered a major risk of information leakage in various operating systems, such as desktop, cloud, and mobile systems. Existing work on modeling covert channels has mainly focused on using finite state machines (FSMs) and their transforms to describe the process of covert channel transmission. However, an FSM is a rather abstract model, where information about the shared resource, synchronization, and encoding/decoding cannot be presented in the model, making it difficult for researchers to realize and analyze the covert channels. In this paper, we use high-level Petri Nets (HLPN) to model the structural and behavioral properties of covert channels. We use the HLPN to model the classic covert channel protocol. Moreover, the results from the analysis of the HLPN model are used to highlight the major shortcomings and interferences in the protocol. Furthermore, we propose two new covert channel models, namely: (a) two channel transmission protocol (TCTP) model and (b) self-adaptive protocol (SAP) model. The TCTP model circumvents the mutual interferences in encoding and synchronization operations, whereas the SAP model uses sleeping time and redundancy check to ensure correct transmission in an environment with strong noise. To demonstrate the correctness and usability of our proposed models in heterogeneous environments, we implement the TCTP and SAP in three different systems: (a) Linux, (b) Xen, and (c) Fiasco.OC. Our implementation also indicates the practicability of the models in heterogeneous, scalable and flexible environments.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 50
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Recovery from sudden power-off (SPO) is one of the primary concerns among practitioners which bars the quick and wide deployment of flash storage devices. In this work, we propose Metadata Embedded Write (MEW), a novel scheme for handling the sudden power-off recovery in modern flash storage devices. Given that a large fraction of commercial SSDs employ compression technology, MEW exploits the compression-induced internal fragmentation in the data area to store rich metadata for fast and complete recovery. MEW consists of (i) a metadata embedding scheme to harbor SSD metadata in a physical page together with multiple compressed logical pages, (ii) an allocation chain based fast recovery scheme, and (iii) a light-weight metadata logging scheme which enables MEW to maintain the metadata for incompressible data, too. We performed extensive experiments to examine the performance of MEW. The performance overhead of MEW is 3 percent in the worst case, in terms of the write amplification factor, compared to the pure compression-based FTL that does not have any recovery scheme.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 51
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Scaling the CMOS devices deep into the nanorange reduces their reliability margins significantly. Consequently, accurately calculating the reliability of digital nanocircuits is becoming a necessity for investigating design alternatives to optimize the trade-offs between area-power-delay and reliability. However, accurate reliability calculation of large and highly connected circuits is complex and very time consuming. This paper proposes a progressive consensus-based algorithm for identifying the worst reliability input vectors and the associated critical logic gates. Improving the reliability of the critical gates helps circuit designers to effectively improve the overall circuit reliability while having a minimal impact on the traditional power-area-delay design parameters. The accuracy and efficiency of the algorithm can be tuned to fit a variety of applications. The algorithm scales well with circuit size, and is independent of the interconnect complexity and the logic depth. Extensive computational results show that the accuracy and the efficiency of the proposed algorithm are better than the most recent results reported in the literature.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 52
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: The serial-out bit-level multiplication scheme is characterized by an important latency feature: it can sequentially generate an output bit of the multiplication result in each clock cycle. However, the computational complexity of the existing serial-out bit-level multipliers in $GF(2^m)$ using normal basis representation limits their usefulness in many applications; hence, an optimized serial-out bit-level multiplier using polynomial basis representation is needed. In this paper, we propose new serial-out bit-level Mastrovito multiplier schemes. We show that in terms of the time complexities, the proposed multiplier schemes outperform the existing serial-out bit-level schemes available in the literature. In addition, using the proposed multiplier schemes, we present new hybrid-double multiplication architectures. To the best of our knowledge, this is the first time such a hybrid multiplier structure using the polynomial basis has been proposed. Prototypes of the presented serial-out bit-level schemes and the proposed hybrid-double multiplication architectures (10 schemes in total) are implemented over both $GF(2^{163})$ and $GF(2^{233})$, and experimental results are presented.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
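For readers unfamiliar with polynomial basis arithmetic in $GF(2^m)$, the following sketch shows the textbook MSB-first bit-serial multiplication with interleaved reduction. It is not the serial-out Mastrovito scheme from the abstract: this version delivers the full product only after $m$ iterations rather than one output bit per clock cycle. Field elements are assumed to be Python integers whose bits are polynomial coefficients, and the function name is hypothetical. The pentanomial $x^{163}+x^7+x^6+x^3+1$ used at the end is the NIST reduction polynomial for the 163-bit binary field.

```python
def gf2m_multiply(a, b, m, reduction_poly):
    """Multiply a and b in GF(2^m) using the polynomial basis.

    `reduction_poly` is the irreducible polynomial including its x^m term,
    and both operands must be smaller than 2**m.
    """
    result = 0
    for i in range(m - 1, -1, -1):       # scan the bits of b, MSB first
        result <<= 1                     # multiply the accumulator by x
        if (result >> m) & 1:            # degree reached m: reduce once
            result ^= reduction_poly
        if (b >> i) & 1:                 # add a (XOR) when the bit of b is set
            result ^= a
    return result


# Worked example from the AES specification: {57} * {83} = {c1} in GF(2^8)
AES_POLY = 0x11B                         # x^8 + x^4 + x^3 + x + 1
print(hex(gf2m_multiply(0x57, 0x83, 8, AES_POLY)))

# The same routine applied to the 163-bit field
NIST_163 = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
product = gf2m_multiply(0x1234567, 0x89ABCDEF, 163, NIST_163)
```

Addition in these fields is a bitwise XOR, so the loop contains no carries; the cost is dominated by the $m$ sequential shift-and-reduce steps.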
  • 53
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Though a cooperative broadcast scheme has been proposed for fading environments, it has two defects. First, it only handles a packet flow from a single source node in the network, and does not consider the scenario of multiple packet flows simultaneously broadcast from different source nodes. Second, it only allows a single relay node to forward a packet in each time slot, though multiple relay nodes forwarding in a time slot can significantly reduce broadcast latency. In this paper, we aim to achieve low-latency multi-flow broadcast in wireless multi-hop networks with fading channels. To describe the interference among the transmissions in different flows, we incorporate the Rayleigh fading model into the signal to noise ratio (SNR) model. Then, we introduce a cooperative diversity scheme which allows multiple relays to forward in a time slot to reduce broadcast latency. We then formulate an interesting problem: in a fading environment, what is the optimal relay allocation schedule to minimize the broadcast latency? We propose a warm-up heuristic algorithm for single-flow cooperative broadcast, based on which we further propose a heuristic algorithm for multi-flow cooperative broadcast. Simulation results demonstrate that the two algorithms achieve lower broadcast latency than a previous method.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 54
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: On modern multicore machines, the memory management typically combines address interleaving in hardware and random allocation in the operating system (OS) to improve performance of both memory and cache. The conventional solutions, however, are increasingly strained as a wide variety of workloads run on complicated memory hierarchy and cause contention at multiple levels. We describe a new framework (named HVR) in OS memory management to support a flexible policy space for tackling diverse application needs, integrating vertical partitioning across layers, horizontal partitioning and random-interleaved allocation at a single layer. We exhaustively study the performance of these policies for over 2,000 workloads and correlate performance with application characteristics. Based on this correlation we derive several practical rules of memory allocation that we integrate into the unified HVR framework to guide resource partitioning and sharing for dynamic and diverse workloads. We implement our approach in Linux kernel 2.6.32 as a restructured page indexing system plus a series of kernel modules. Experimental results show that our framework consistently outperforms the unmodified Linux kernel, with up to 21 percent performance gains, and outperforms prior solutions at individual levels of the memory hierarchy.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 55
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: With the explosive growth in data volume, the I/O bottleneck has become an increasingly daunting challenge for big data analytics in the Cloud. Recent studies have shown that moderate to high data redundancy clearly exists in primary storage systems in the Cloud. Our experimental studies reveal that data redundancy exhibits a much higher level of intensity on the I/O path than that on disks due to relatively high temporal access locality associated with small I/O requests to redundant data. Moreover, directly applying data deduplication to primary storage systems in the Cloud will likely cause space contention in memory and data fragmentation on disks. Based on these observations, we propose a performance-oriented I/O deduplication, called POD, rather than a capacity-oriented I/O deduplication, exemplified by iDedup, to improve the I/O performance of primary storage systems in the Cloud without sacrificing capacity savings of the latter. POD takes a two-pronged approach to improving the performance of primary storage systems and minimizing performance overhead of deduplication, namely, a request-based selective deduplication technique, called Select-Dedupe, to alleviate the data fragmentation and an adaptive memory management scheme, called iCache, to ease the memory contention between the bursty read traffic and the bursty write traffic. We have implemented a prototype of POD as a module in the Linux operating system. The experiments conducted on our lightweight prototype implementation of POD show that POD significantly outperforms iDedup in the I/O performance measure by up to 87.9 percent with an average of 58.8 percent. Moreover, our evaluation results also show that POD achieves comparable or better capacity savings than iDedup.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 56
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: This manuscript proposes three classes of codes for error correction in a storage system in which the memory cells do not have the same number of levels, i.e., a multiscale storage. The proposed codes, namely a column-deleted SMSEC code, an element-compacted SMSEC code and a product SMSEC code, are single multiscale-symbol error correction (SMSEC) codes capable of correcting any errors occurring on a single memory cell. In the proposed codes, the codewords are divided into two partitions: the elements on the first partition are over $GF(2^{b_1})$, while those on the remaining partition are over $GF(2^{b_2})$. This paper also gives guidelines for selection among the three SMSEC codes to meet the desired hardware overhead in the parallel decoder for realistic parameters of the partition pair, such as $(b_1, b_2) = (4,3)$, $(4,2)$ and $(3,2)$. Moreover, it is shown that the best choice for an MSS system is the SMSEC code with the shortest check bit length; if the check bit lengths of at least two codes are equal, then the use of the element-compacted SMSEC code incurs the smallest hardware overhead.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 57
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Multi-cloud storage can provide better features such as availability and scalability. Current works use multiple cloud storage providers with erasure coding to achieve certain benefits, including improved fault tolerance or avoidance of vendor lock-in. However, these works only use the multi-cloud storage in ad-hoc ways, and none of them considers the optimization issue in general. In fact, the key to optimizing the multi-cloud storage is to effectively choose providers and erasure coding parameters. Meanwhile, the data placement should satisfy system or application developers’ requirements. As developers often demand various objectives to be optimized simultaneously, such complex requirement optimization cannot be easily fulfilled in ad-hoc ways. This paper presents Triones, a systematic model to formally formulate data placement in multi-cloud storage by using erasure coding. Firstly, Triones addresses the problem of data placement optimization by applying non-linear programming and geometric space abstraction. It can satisfy complex requirements involving multi-objective optimization. Secondly, Triones can effectively balance among different objectives in optimization and is scalable to incorporate new ones. The effectiveness of the model is demonstrated by extensive experiments on multiple cloud storage providers in the real world. For simple requirements, Triones can achieve 50 percent access latency reduction, compared with the model in $\mu$LibCloud. For complex requirements, Triones can improve fault-tolerance level by $2\times$ and reduce access latency and vendor lock-in level by 30$\sim$70 percent and 49.85 percent respectively with about 19.19 percent more cost, compared with the model only optimizing cost in Scalia.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
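The coupling between provider choice and erasure coding parameters can be made concrete with a brute-force toy. This sketch is not the Triones model, which the abstract describes as using non-linear programming and geometric space abstraction. It assumes an ideal (n, k) erasure code in which each of n providers stores a chunk of 1/k of the object and any k chunks suffice to rebuild it, so the placement tolerates n - k provider failures. Provider names and per-GB prices are hypothetical, and storage cost is the only objective.

```python
from itertools import combinations


def cheapest_placement(prices, min_failures):
    """Return (cost_per_gb, providers, k) with the lowest storage cost that
    still tolerates at least `min_failures` unavailable providers."""
    best = None
    for n in range(1, len(prices) + 1):
        for subset in combinations(sorted(prices), n):
            for k in range(1, n + 1):
                if n - k < min_failures:
                    continue                 # not enough redundancy
                # Each chosen provider stores 1/k of every gigabyte.
                cost = sum(prices[p] for p in subset) / k
                candidate = (cost, subset, k)
                if best is None or candidate < best:
                    best = candidate
    return best


# Hypothetical monthly prices per stored GB
prices = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 5.0}
print(cheapest_placement(prices, min_failures=1))
```

With these made-up prices, tolerating one failure is cheapest with three providers and k = 2 rather than with plain two-way replication, which shows why the provider set and the coding parameters have to be chosen together.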
  • 58
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: In this paper, we propose a two-factor data security protection mechanism with factor revocability for cloud storage systems. Our system allows a sender to send an encrypted message to a receiver through a cloud storage server. The sender only needs to know the identity of the receiver but no other information (such as its public key or its certificate). The receiver needs to possess two things in order to decrypt the ciphertext. The first is his/her secret key stored in the computer. The second is a unique personal security device which connects to the computer. It is impossible to decrypt the ciphertext without either piece. More importantly, once the security device is stolen or lost, it is revoked and can no longer be used to decrypt any ciphertext. This is done by the cloud server, which immediately executes some algorithms to change the existing ciphertext so that it cannot be decrypted by this device. This process is completely transparent to the sender. Furthermore, the cloud server cannot decrypt any ciphertext at any time. The security and efficiency analysis shows that our system is not only secure but also practical.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 59
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: $(t,k)$-Diagnosis, which is a generalization of sequential diagnosis, requires that at least $k$ faulty processors be identified and repaired in each iteration when there are at most $t$ faulty processors, where $t\ge k$. Based on the assumption that each vertex is adjacent to at least one fault-free vertex, the conditional $(t,k)$-diagnosis of graphs was investigated by using the comparison diagnosis model. Lower bounds on the conditional $(t,k)$-diagnosability of graphs were derived, and applied to obtain the following results. 1) Symmetric $d$-dimensional grids are conditionally $(\frac{N}{2d+1}-1,2d-1)$-diagnosable when $d\ge 2$ and $N$ (the number of vertices) $\ge 4^d$. 2) Symmetric $d$-dimensional tori are conditionally $(\frac{1}{5}(N+\min\lbrace \frac{8}{5} N^{\frac{2}{3}},\frac{2N-20}{15}\rbrace -2),6)$-diagnosable when $d=2$ and $N\ge 49$ and $(\frac{N}{2d+1}-1,4d-2)$-diagnosable when $d\ge 3$ and $N\ge 4^d$.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
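The stated bounds are easy to evaluate for concrete networks. The sketch below simply plugs numbers into the two expressions of the form $(\frac{N}{2d+1}-1, \cdot)$ quoted in the abstract, for grids with $d\ge 2$ and for tori with $d\ge 3$. The abstract does not say how a non-integer first component should be rounded, so the value is returned as an exact fraction; the function name and the `torus` flag are hypothetical.

```python
from fractions import Fraction


def conditional_tk(n_vertices, d, torus=False):
    """Evaluate the (t, k) pair quoted in the abstract for a symmetric
    d-dimensional grid (d >= 2) or torus (d >= 3) with N >= 4**d vertices."""
    min_d = 3 if torus else 2
    if d < min_d or n_vertices < 4 ** d:
        raise ValueError("outside the parameter range stated in the abstract")
    t = Fraction(n_vertices, 2 * d + 1) - 1
    k = 4 * d - 2 if torus else 2 * d - 1
    return t, k


# A 5 x 5 grid: d = 2, N = 25
print(conditional_tk(25, 2))
# A 5 x 5 x 5 torus: d = 3, N = 125
print(conditional_tk(125, 3, torus=True))
```

For the 5 x 5 grid this gives $t = 4$ and $k = 3$: as long as at most four processors are faulty, at least three of them can be identified in each iteration under the conditional model.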
  • 60
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: Deadline guaranteed packet scheduling for switches is a fundamental issue for providing guaranteed QoS in digital networks. It is a historically difficult NP-hard problem if three or more deadlines are involved. All existing algorithms have too low a throughput to be used in practice. A key reason is that they use packet deadlines as default priorities to decide which packets to drop whenever conflicts occur. Although such a priority structure can ease the scheduling by focusing on one deadline at a time, it hurts the throughput greatly. Since deadlines do not necessarily represent the actual importance of packets, we can greatly improve the throughput if deadline induced priority is not enforced. This paper first presents an algorithm that guarantees the maximum throughput for the case where only two different deadlines are allowed. Then, an algorithm called iterative scheduling with no priority (ISNOP) is proposed for the general case where k > 2 different deadlines may occur. Not only does this algorithm have dramatically better average performance than all existing algorithms, but it also guarantees an approximation ratio of 2. ISNOP would provide a good practical solution for the historically difficult packet scheduling problem.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 61
    Publication Date: 2015-05-13
    Description: With the increasing complexity of digital systems, verification and debugging of such systems have become a major problem and economic issue. Although many computer aided design (CAD) solutions have been suggested to enhance the efficiency of existing debugging approaches, they still fail to provide a small set of potential error locations or automatic correction mechanisms. On the other hand, the ever-growing usage of digital signal processing (DSP), computer graphics and embedded systems applications that can be modeled as polynomial computations in their datapath designs necessitates an effective method to deal with their verification, debugging and correction. In this paper, we introduce a formal debugging approach based on static slicing and dynamic ranking methods to derive a reduced ordered set of potential error locations. In addition, to speed up finding true errors in the presence of multiple design errors, error candidates are sorted in decreasing order of their probability of being an error. After that, a mutation-based technique is employed to automatically correct bugs even in the case of multiple bugs. In order to evaluate the effectiveness of our approach, we have applied it to several industrial designs. The experimental results show that the proposed technique enables us to locate and correct even multiple bugs with high confidence in a short run time even for complex designs of up to several thousand lines of RTL code.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 62
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: The series of published works related to differential fault attack (DFA) against the Grain family require quite a large number (hundreds) of faults and also several assumptions on the locations and the timings of the faults injected. In this paper, we present a significantly improved scenario from the adversarial point of view for DFA against the Grain family of stream ciphers. Our model is the most realistic one so far, as it considers that the cipher has to be re-keyed only a few times and faults can be injected at any random location and at any random point of time, i.e., no precise control is needed over the location and timing of fault injections. We construct equations based on the algebraic description of the cipher by introducing new variables so that the degrees of the equations do not increase. In line with algebraic cryptanalysis, we accumulate such equations based on the fault-free and faulty key-stream bits and solve them using the SAT Solver Cryptominisat-2.9.5 installed with SAGE 5.7. In a few minutes we can recover the state of Grain v1, Grain-128 and Grain-128a with as few as 10, 4 and 10 faults respectively.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 63
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: Several papers have studied fault attacks on computing a pairing value $e(P,Q)$, where $P$ is a public point and $Q$ is a secret point. In this paper, we observe that these attacks are in fact effective only on a small number of pairing-based protocols, and even then only when the protocols are implemented with specific symmetric pairings. We demonstrate the effectiveness of the fault attacks on a public-key encryption scheme, an identity-based encryption scheme, and an oblivious transfer protocol when implemented with a symmetric pairing derived from a supersingular elliptic curve with embedding degree 2.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 64
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: The key benefits of using the smartphone accelerometer for human mobility analysis, with or without location determination based upon GPS, Wi-Fi or GSM, are that it is energy-efficient, provides real-time contextual information and has high availability. Using measurements from an accelerometer for human mobility analysis presents its own challenges, as we all carry our smartphones differently and the measurements are body placement dependent. It also often relies on an on-demand remote data exchange for analysis and processing, which is less energy-efficient, has higher network costs and is not real-time. We present a novel accelerometer framework based upon a probabilistic algorithm that neutralizes the effect of different smartphone on-body placements and orientations to allow human movements to be more accurately and energy-efficiently identified. Using solely the embedded smartphone accelerometer, without the need for referencing historical data or accelerometer noise filtering, our method can identify the human mobility state in real time with a time constraint of 2 seconds. The method achieves an overall average classification accuracy of 92 percent when evaluated on a dataset gathered from fifteen individuals that classified nine different urban human mobility states.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 65
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: Nanoscale process variations in conventional SRAM cells are known to limit voltage scaling in microprocessor caches. Recently, a number of novel cache architectures have been proposed that substitute faulty words of one cache line with healthy words of others to tolerate these failures at low voltages. These schemes rely on fault maps to identify faulty words, inevitably increasing the chip area. Moreover, the relationship between word sizes and cache failure rates is not well studied in these works. In this paper, we analyze the word substitution schemes by employing a Fault Tree Model and a Collision Graph Model. A novel cache architecture (Macho) is then proposed based on this model. Macho is dynamically reconfigurable and is locally optimized (tailored to local fault density) using two algorithms: 1) a graph coloring algorithm for moderate fault densities and 2) a bipartite matching algorithm to support high fault densities. An adaptive matching algorithm enables on-demand reconfiguration of Macho to concentrate available resources on cache working sets. As a result, voltage scaling down to 400 mV is possible, tolerating bit failure rates reaching 1 percent (one failure in every 100 cells). This near-threshold voltage (NTV) operation achieves 44 percent energy reduction in our simulated system (CPU + DRAM models) with a 1 MB L2 cache.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 66
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 67
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: We present a custom architecture for realizing the Gentry-Halevi fully homomorphic encryption (FHE) scheme. This contribution presents the first full realization of FHE in hardware. The architecture features an optimized multi-million bit multiplier based on the Schönhage-Strassen multiplication algorithm. Moreover, a number of optimizations, including spectral techniques as well as a precomputation strategy, are used to significantly improve the performance of the overall design. When synthesized using 90 nm technology, the presented architecture performs the encryption, decryption, and recryption operations in 18.1 msec, 16.1 msec, and 3.1 sec, respectively, and occupies a footprint of less than 30 million gates.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 68
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: Providing deadline-sensitive services is a challenge in data centers. Because of the conservativeness in additive increase congestion avoidance, current transmission control protocols are inefficient in utilizing the super high bandwidth of data centers. This may cause many deadline-sensitive flows to miss their deadlines before achieving their available bandwidths. We propose an Adaptive-Acceleration Data Center TCP, A$^2$DTCP, which takes into account both network congestion and latency requirement of application service. By using congestion avoidance with an adaptive increase rate that varies between additive and multiplicative, A$^2$DTCP accelerates bandwidth detection thus achieving high bandwidth utilization efficiency. At-scale simulations and real testbed implementations show that A$^2$DTCP significantly reduces the missed deadline ratio compared to D$^2$TCP and DCTCP. In addition, A$^2$DTCP can co-exist with conventional TCP as well without requiring more changes in switch hardware than D$^2$TCP and DCTCP.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 69
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: The design of cache memories is a crucial part of the design cycle of a modern processor, since they are able to bridge the performance gap between the processor and the memory. Unfortunately, caches with low degrees of associativity suffer a large number of conflict misses. Although increasing their associativity can remove a significant fraction of these misses, this comes at a high cost in power, area, and access time. In this work, we address the problem of the high number of conflict misses in low-associative caches by proposing an indexing policy that adaptively selects the bits from the block address used to index the cache. The basic premise of this work is that the non-uniformity in the set usage is caused by a poor selection of the indexing bits. Instead, by selecting at run time those bits that disperse the working set more evenly across the available sets, a large fraction of the conflict misses (85 percent, on average) can be removed. This leads to IPC improvements of 10.9 percent for the SPEC CPU2006 benchmark suite. By reducing the number of accesses to the L2 cache, our proposal also reduces the energy consumption of the cache hierarchy by 13.2 percent. These benefits come with a negligible area overhead.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
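    The idea of choosing index bits that spread the working set across the sets can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not the policy from the paper: the direct-mapped cache model, the exhaustive search, and the names `set_index`, `count_misses` and `best_index_bits` are all hypothetical.

```python
from itertools import combinations


def set_index(block_addr, bit_positions):
    """Build a set index from the chosen block-address bits (first bit = LSB)."""
    index = 0
    for i, pos in enumerate(bit_positions):
        index |= ((block_addr >> pos) & 1) << i
    return index


def count_misses(trace, bit_positions):
    """Count misses of a toy direct-mapped cache indexed by the given bits."""
    lines = {}  # set index -> block address currently stored there
    misses = 0
    for block in trace:
        idx = set_index(block, bit_positions)
        if lines.get(idx) != block:
            misses += 1
            lines[idx] = block
    return misses


def best_index_bits(trace, num_index_bits, addr_bits):
    """Exhaustively pick the index bits giving the fewest misses (toy search only)."""
    candidates = combinations(range(addr_bits), num_index_bits)
    return min(candidates, key=lambda bits: count_misses(trace, bits))


# Four blocks with stride 4, touched repeatedly: address bits 0 and 1 are always 0,
# so conventional low-order indexing maps every block to the same set.
trace = [0, 4, 8, 12] * 8
conventional = (0, 1)
chosen = best_index_bits(trace, num_index_bits=2, addr_bits=4)
```

    On this strided trace, the low-order bits send all four blocks to one set, so every access misses, whereas bits 2 and 3 give each block its own set. A hardware policy would have to pick the bits at run time rather than by replaying the whole trace.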
  • 70
    Publication Date: 2015-05-13
    Description: This paper concentrates on high-level data-flow optimization and synthesis techniques for datapath intensive designs such as those in Digital Signal Processing (DSP), computer graphics and embedded systems applications, which are modeled as polynomial computations over $Z_{2^{n_1}} \times Z_{2^{n_2}} \times \cdots \times Z_{2^{n_d}}$ to $Z_{2^m}$. Our main contribution in this paper is proposing an optimization method based on functional decomposition of multivariate polynomials in the form of $f(x) = g(x) \circ h(x) + f_{0} = g(h(x)) + f_{0}$ to obtain good building blocks, and vanishing polynomials over $Z_{2^m}$ to add/delete redundancy to/from given polynomial functions to extract further common sub-expressions. Experimental results for combinational implementation of the designs have shown an average saving of 38.85 and 18.85 percent in the number of gates and critical path delay, respectively, compared with the state-of-the-art techniques. Regarding the comparison with our previous works, the area and delay are improved by 10.87 and 11.22 percent, respectively. Furthermore, experimental results of sequential implementations have shown an average saving of 39.26 and 34.70 percent in the area and the latency, respectively, compared with the state-of-the-art techniques.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 71
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: Abnormalities in sensed data streams indicate the spread of malicious attacks, hardware failure and software corruption among the different nodes in a wireless sensor network. These factors of node infection can affect generated and incoming data streams, resulting in high chances of inaccurate data, misleading packet translation, wrong decision making and severe communication disruption. This problem is detrimental to real-time applications having stringent quality-of-service (QoS) requirements. The sensed data from other uninfected regions might also get stuck in an infected region if no prior alternative arrangements are made. Although several existing methods (BOUNDHOLE and GAR) can be used to mitigate these issues, their performance is bounded by some limitations, mainly the high risk of falling into routing loops and involvement in unnecessary transmissions. This paper provides a solution that bypasses the infected nodes dynamically using a twin rolling balls technique and also diverts the packets that are trapped inside the identified area. The identification of infected nodes is done by adapting a fuzzy data clustering approach which classifies the nodes based on the fraction of anomalous data that is detected in individual data streams. This information is then used in the proposed by-passed routing (BPR), which rotates two balls in two directions simultaneously: clockwise and counter-clockwise. The first uninfected node that hits any ball in any direction is selected as the next hop. We are also concerned with the incoming packets or the packets-on-the-fly that may be affected when this problem occurs. Besides solving both of the problems in the existing methods, the proposed BPR technique greatly improves the studied QoS parameters, as shown by an almost 40 percent increase in the overall performance.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 72
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: This paper presents a compositional framework to address the state explosion problem in model checking of concurrent systems. This framework takes as input a system model described as a network of communicating components in a high-level description language, finds the local state transition models for each individual component where local properties can be verified, and then iteratively reduces and composes the component state transition models to form a reduced global model for the entire system where global safety properties can be verified. The state space reductions used in this framework result in a reduced model that contains exactly the same set of observably equivalent executions as the original model; therefore, no false counter-examples result from the verification of the reduced model. This approach allows designs that cannot be handled monolithically or with partial-order reduction to be verified without difficulty. The experimental results show significant scale-up of this compositional verification framework on a number of non-trivial concurrent system models.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 73
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: While NUMA systems are widely used as target machines for virtualization, each data access request produced by a virtual machine (VM) on a NUMA system may have a different access time depending not only on remote access conditions, but also on shared resource contention. Mainly because of this, each VM running on the NUMA system will have irregular data access performance at different times. Because existing hypervisors, such as KVM, VMware, and Xen, have yet to consider this, users of VMs cannot predict their data access performance or even recognize the data access performance they have experienced. In this paper, we propose a novel VM placement technique to resolve this issue of irregular data access performance of VMs running on the NUMA system. The hypervisor with our technique provides the illusion of a private memory subsystem to each VM, which guarantees the data access latency required by each VM on average. To enable this feature, we periodically evaluate the average data access latency of each VM using hardware performance monitoring units. After every evaluation, our Mcredit-based VM migration algorithm tries to migrate the VCPU or memory of any VM not meeting its required data access latency to another node that gives the VM a lower data access latency. We implemented the prototype for the KVM hypervisor on Linux 3.10.10. Experimental results show that, in a four-node NUMA system, our technique keeps the required data access performance levels of VMs running various workloads while consuming less than 1 percent of the cycles of a core and 0.3 percent of the system memory bandwidth.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 74
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: The dramatic growth of mobile multimedia communications imposes new requirements on quality-of-service and energy efficiency in wireless networks. In this paper, we study the energy- and spectrum-efficient cooperative communication (ESCC) problem by exploiting the benefits of cooperative communication (CC) for mobile multimedia applications in multi-channel wireless networks. In a static network, it is formulated as a mixed-integer nonlinear programming problem. To solve this problem, we use linearization and reformulation techniques to transform it into a mixed-integer linear programming problem that is solved by a branch-and-bound algorithm with enhanced performance. To deal with the problem in dynamic networks, we propose an online algorithm with low computational complexity and deployment overhead. Extensive simulations are conducted to show that the proposed algorithm can significantly improve the performance of energy efficiency in both static and dynamic networks.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 75
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: Question and Answer (Q&A) websites such as Yahoo! Answers provide a platform where users can post questions and receive answers. These systems take advantage of the collective intelligence of users to find information. In this paper, we analyze the online social network (OSN) in Yahoo! Answers. Based on a large amount of collected data, we study the OSN’s structural properties, which reveal strikingly distinct features such as low link symmetry and weak correlation between indegree and outdegree. After studying the knowledge base and behaviors of the users, we find that a small number of top contributors answer most of the questions in the system. Also, each top contributor focuses only on a few knowledge categories. In addition, the knowledge categories of the users are highly clustered. We also study the knowledge base in a user’s social network, which reveals that the members in a user’s social network share only a few knowledge categories. Based on the findings, we provide guidance in the design of spammer detection algorithms and distributed Q&A systems. We also propose a friendship-knowledge oriented Q&A framework that synergistically combines current OSN-based Q&A and web Q&A. We believe that the results presented in this paper are crucial in understanding the collective intelligence in the web Q&A OSNs and lay a cornerstone for the evolution of next-generation Q&A systems.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 76
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: The density of flash memory chips has doubled every two years in the past decade and the trend is expected to continue. The increasing capacity of NAND flash memory leads to a large RAM footprint for address mapping management. This paper proposes a novel Demand-based block-level Address mapping scheme with a two-level Caching mechanism (DAC) for large-scale NAND flash storage systems. The objective is to reduce the RAM footprint without excessively compromising system response time. In our technique, the block-level address mapping table is stored in fixed pages (called the translation pages) in the flash memory. Considering the temporal locality that workloads exhibit, we maintain one cache in RAM to store the on-demand address mapping entries. Meanwhile, by exploring both spatial locality and access frequency of workloads with another two caches, the second-level cache is designed to cache selected translation pages. In this way, both the most-frequently-accessed and sequentially accessed address mapping entries can be stored in the cache, so the cache hit ratio can be increased and the system response time can be improved. To the best of our knowledge, this is the first work to reduce the RAM cost by employing the demand-based approach on block-level address mapping schemes. The experiments have been conducted on a real embedded platform. The experimental results show that our technique can effectively reduce the RAM footprint while maintaining similar average system response time compared with previous work.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 77
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: SpiNNaker is a multi-core computing engine, with a bespoke and specialised communication infrastructure that supports almost perfect scalability up to a hard limit of $2^{16} \times 18 = 1,\!179,\!648$ cores. This remarkable property is achieved at the cost of ignoring memory coherency, global synchronisation and even deterministic message passing, yet it is still possible to perform meaningful computations. Whilst we have yet to assemble the full machine, the scalability properties make it possible to demonstrate the capabilities of the machine whilst it is being assembled; the more cores we connect, the larger the problems become that we are able to attack. Even with isolated printed circuit boards of 864 cores, interesting capabilities are emerging. This paper is the third of a series charting the development trajectory of the system. In the first two, we outlined the hardware build. Here, we lay out the (rather unusual) low-level foundation software developed so far to support the operation of the machine.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 78
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: By leveraging virtual machine (VM) technology, we optimize cloud system performance based on refined resource allocation, in processing user requests with composite services. Our contribution is three-fold. (1) We devise a VM resource allocation scheme with a minimized processing overhead for task execution. (2) We comprehensively investigate the best-suited task scheduling policy with different design parameters. (3) We also explore the best-suited resource sharing scheme with adjusted divisible resource fractions on running tasks in terms of the Proportional-share model (PSM), which can be split into absolute mode (called AAPSM) and relative mode (RAPSM). We implement a prototype system over a cluster environment deployed with 56 real VM instances, and summarize valuable experience from our evaluation. When the system runs in short supply, lightest workload first (LWF) is mostly recommended because it can minimize the overall response extension ratio (RER) for both sequential-mode tasks and parallel-mode tasks. In a competitive situation with over-commitment of resources, the best choice is combining LWF with both AAPSM and RAPSM. It outperforms other solutions in the competitive situation by 16+% w.r.t. the worst-case response time and by 7.4+% w.r.t. the fairness.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 79
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: Many multicore processors are capable of decreasing the voltage and clock frequency to save energy at the cost of an increased delay. While a large part of the theory-oriented literature focuses on local dynamic voltage and frequency scaling (local DVFS), where every core’s voltage and clock frequency can be set separately, this article presents an in-depth theoretical study of the more commonly available global DVFS that makes such changes for the entire chip. This article shows how to choose the optimal clock frequencies that minimize the energy for global DVFS, and it discusses the relationship between scheduling and optimal global DVFS. Formulas are given to find this optimum under time constraints, including proofs thereof. The problem of simultaneously choosing clock frequencies and a schedule that together minimize the energy consumption is discussed, and based on this a scheduling criterion is derived that implicitly assigns frequencies and minimizes energy consumption. Furthermore, this article studies the effectiveness of a large class of scheduling algorithms with regard to the derived criterion, and a bound on the maximal relative deviation is given. Simulations show that with our techniques an energy reduction of 30% can be achieved with respect to state-of-the-art research.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 80
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: In smart building systems, the automatic control of devices relies on matching the sensed environment information to customized rules. With the development of wireless sensor and actuator networks (WSANs), low-cost and self-organized wireless sensors and actuators can enhance smart building systems, but produce abundant sensing data. Therefore, a rule engine capable of efficient rule matching is the foundation of WSAN-based smart building systems. However, traditional rule engines mainly focus on the complex processing mechanism and ignore the volume of sensing data, which makes them unsuitable for large-scale WSAN-based smart building systems. To address these issues, we build an efficient rule engine. Specifically, we design an atomic event extraction module for extracting atomic events from data messages, and then build a $\beta$-network to acquire the atomic conditions for parsing the atomic trigger events. Taking the atomic trigger events as the key set of the MPHF, we construct a minimal perfect hash table which can filter out the majority of the unused atomic events with O(1) time overhead. Moreover, a rule engine adaptation scheme is proposed to minimize the rule matching overhead. We implement the proposed rule engine in a practical smart building system. The experimental results show that the rule engine performs efficiently and flexibly with high data throughput and a large rule set.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 81
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: This paper presents the Just in Time/Just Enough Energy Management (JEM) methodology that is applicable to a broad range of computing systems. The conventional concept of a fixed voltage supply ($V_{DD}$) scheme for both performance and power saving modes of computing systems is revisited and improved with JEM. JEM consists of an efficient DC/DC converter and a Power Management Integrated Circuit (PMIC) with feedback to monitor the activities within a given computing system, providing a new means for dynamic voltage scaling at the system level. JEM is tested and validated on a blade server, resulting in 15.11 percent power savings at the motherboard level. A significant thermal improvement of 9.0°C is also measured in a 16 GB memory module of the blade server. Moreover, a JEM-enabled CMOS circuit shows a remarkable reduction in the supply current. Furthermore, JEM is compared to a conventional power supply design, with significant improvement in the processor performance and considerable power savings in the blade server.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 82
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: This paper presents a novel method named the Unified Mitchell-based Approximation (UMA) to obtain an optimized Mitchell-based logarithmic conversion circuit for any desired conversion accuracy up to 14 bits. UMA is the first method that is able to obtain a conversion circuit when a specific accuracy is required. In this work, we study and analyze five design parameters and their impact on accuracy and hardware merits. We formulate the hardware model of the error correction circuit in the conversion circuit for performance evaluation. Given an accuracy requirement, the proposed method explores the design space of the five design parameters. As the design space is theoretically huge, we propose constraints for the range of the parameter values and develop a systematic search algorithm for exploring the design space. UMA is able to obtain an area-delay product optimized circuit for each of the conversion accuracies achieved by the existing Mitchell-based designs. Synthesis results in 90 nm CMOS technology show that the circuits obtained are comparable to or better than the existing Mitchell-based designs with the same accuracy objective. Nine of the fifteen circuits obtained achieve a better area-delay product by more than 50 percent. In addition, UMA is able to obtain circuits for any accuracy from 4 to 14 bits, while the best accuracy achieved by the existing Mitchell-based methods is less than 12 bits.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
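    For orientation, the uncorrected Mitchell approximation that such conversion circuits start from can be sketched in a few lines. This is a software illustration only, assuming the classic form log2(n) ≈ k + m for n = 2^k (1 + m); the function name `mitchell_log2` is hypothetical, and the error correction circuit and the five UMA design parameters are not described in the abstract and are not modeled here.

```python
import math


def mitchell_log2(n):
    """Approximate log2(n) for a positive integer n as k + m,
    where n = 2**k * (1 + m) and 0 <= m < 1 (no error correction)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = n.bit_length() - 1          # position of the leading one bit
    m = (n - (1 << k)) / (1 << k)   # remaining bits read as a fraction
    return k + m


# Largest gap between the exact logarithm and the approximation for 12-bit inputs.
worst = max(math.log2(n) - mitchell_log2(n) for n in range(1, 1 << 12))
```

    The approximation is exact at powers of two and underestimates in between, with a worst-case gap of roughly 0.086 near m ≈ 0.44. That residual is what an error correction stage has to shrink to reach a given accuracy target.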
  • 83
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: Earliest Deadline First (EDF) is the most widely studied optimal dynamic scheduling algorithm for uniprocessor real-time systems. For realistic programs, tasks must be allowed to exchange data and use other forms of resources that must be accessed under mutual exclusion. With EDF scheduled systems, access to such resources is usually controlled by the use of Baker’s Stack Resource Protocol (SRP). In this paper we propose an alternative scheme based on deadline inheritance. Shared resources are assigned a relative deadline equal to the minimum (floor) of the relative deadlines of all tasks that use the resource. On entry to the resource, a task’s current absolute deadline is subject to an immediate reduction to reflect the resource’s deadline floor. On exit, the original deadline for the task is restored. We show that the worst-case behaviour of the new protocol (termed DFP, the Deadline Floor inheritance Protocol) is the same as that of SRP. Indeed, it leads to the same blocking term in the scheduling analysis. We argue, however, that the new scheme is more intuitive and removes the need to support preemption levels, and we demonstrate that it can be implemented more efficiently.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
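    The entry and exit rules described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Task` and `Resource` classes and the `hold` method are hypothetical, and the reduction rule used on entry (the earlier of the current deadline and the lock time plus the floor) is an assumption, since the abstract only says the deadline is reduced to reflect the floor.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    relative_deadline: int
    absolute_deadline: int = 0


class Resource:
    def __init__(self, users):
        # Deadline floor: minimum relative deadline over all tasks using the resource.
        self.deadline_floor = min(t.relative_deadline for t in users)

    @contextmanager
    def hold(self, task, now):
        saved = task.absolute_deadline
        # Assumed reduction rule: deadline becomes no later than now + floor.
        task.absolute_deadline = min(saved, now + self.deadline_floor)
        try:
            yield
        finally:
            # On exit the original deadline is restored.
            task.absolute_deadline = saved


fast = Task("fast", relative_deadline=5)
slow = Task("slow", relative_deadline=20, absolute_deadline=20)  # released at t = 0
shared = Resource([fast, slow])

with shared.hold(slow, now=3):
    inside = slow.absolute_deadline  # deadline while the resource is held
```

    While `slow` holds the resource from time 3, its deadline is pulled in to 8, so under EDF a job of `fast` released in the meantime (deadline at least 3 + 5 = 8) would not preempt it inside the critical section under this assumed rule.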
  • 84
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: Wireless networks of nano-nodes will play a critical role in future medical, quality control, environmental monitoring and military applications. Nano-nodes are invisible or marginally visible to the human eye, ranging in size from approximately 100 $\mu\text{m}$ to a few nanometers. Nano-networking poses unique challenges, requiring ground-breaking solutions. First, the nano-scale imposes severe restrictions on the computational and communication capabilities of the nodes. Second, nano-nodes are not accessible for programming, configuration and debugging in the classical sense. Thus, a nano-network should be self-configuring, resilient and adaptive to environmental changes. Finally, all nano-networking protocols should be ultra-scalable, since a typical nano-network may comprise billions of nodes. The study contributes a novel paradigm for data dissemination in networking nano-machines, addressing these unique challenges. Relying on innovative analytical results on lattice algebra and nature-inspired processes, a novel data dissemination method is proposed. The nano-nodes exploit their environmental feedback and mature adaptively into network backbone or remain single network users. Such a process can be implemented as an ultra-scalable, low complexity, multi-modal nano-node architecture (physical layer), providing efficient networking and application services at the same time. Requiring existing manufacturing technology, the proposed architecture constitutes the first candidate solution for realizable nano-networking.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 85
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: Quantitatively estimating the relationship between the workload and the corresponding power consumption of a multicore processor is an essential step towards achieving energy proportional computing. Most existing and proposed approaches use Performance Monitoring Counters (Hardware Monitoring Counters) for this task. In this paper we propose a complementary approach that employs the statistics of CPU utilization (workload) only. Hence, we model the workload and the power consumption of a multicore processor as random variables and exploit the monotonicity property of their distribution functions to establish a quantitative relationship between the random variables. We will show that for a single-core processor the relationship is best approximated by a quadratic function whereas for a dual-core processor, the relationship is best approximated by a linear function. We will demonstrate the plausibility of our approach by estimating the power consumption of both custom-made and standard benchmarks (namely, the SPEC power benchmark and the Apache benchmarking tool) for Intel and AMD processors.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 86
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: In sensor networks, skeleton extraction has emerged as an appealing approach to support many applications such as load-balanced routing and location-free segmentation. While significant advances have been made for 2D cases, so far skeleton extraction for 3D sensor networks has not been thoroughly studied. In this paper, we conduct the first work of a unified framework providing a connectivity-based and distributed solution for line-like skeleton extraction in both 2D and 3D sensor networks. We highlight its practice as: 1) it has linear time/message complexity; 2) it provides reasonable skeleton results when the network has low node density; 3) the obtained skeletons are robust to shape variations, node densities, boundary noise and communication radio model. In addition, to confirm the effectiveness of the line-like skeleton, a 3D routing scheme is derived based on the extracted skeleton, which achieves balanced traffic load, guaranteed delivery, as well as low stretch factor.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 87
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: In this work, we developed a novel multithreaded variable size chunking method, MUCH, which exploits the multicore architecture of modern microprocessors. The legacy single-threaded variable size chunking method leaves much to be desired in terms of effectively exploiting the bandwidth of state-of-the-art storage devices. MUCH guarantees chunking invariability: the result of chunking does not change regardless of the degree of multithreading or the segment size. This is achieved by inter- and intra-segment coalescing at the master thread and Dual Mode Chunking at the client thread. We developed an elaborate performance model to determine the optimal multithreading degree and the segment size. MUCH is implemented in a prototype deduplication system. By fully exploiting the available CPU cores (quad-core), we achieved up to a $4\times$ increase in the chunking performance (MByte/sec). MUCH successfully addresses the performance issues of file chunking, which is one of the performance bottlenecks in modern deduplication systems, by parallelizing the file chunking operation while guaranteeing chunking invariability.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 88
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: Multi-processor system on chip (MPSoC) has been widely applied in embedded systems in the past decades. However, heterogeneous instruction set architectures (ISA), programming interfaces and software tool chains make it very challenging to efficiently design and implement a rapid prototype for diverse applications. In order to solve the problem, this paper proposes a novel high level architecture support for automatic out-of-order (OoO) task execution on FPGA based heterogeneous MPSoCs. The architecture support is composed of a hierarchical middleware with an automatic task level OoO parallel execution engine. Incorporated with a hierarchical OoO layer model, the middleware is able to identify the parallel regions and generate the source code automatically. Besides, a runtime middleware, Task-Scoreboarding, analyzes the inter-task data dependencies and automatically schedules and dispatches the tasks with parameter renaming techniques. The middleware has been verified by a prototype built on an FPGA platform. Examples and a JPEG case study demonstrate that our model can largely ease the burden of programmers as well as uncover the task level parallelism.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 89
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: Hardware variability is predicted to increase dramatically over the coming years as a consequence of continued technology scaling. In this paper, we apply the Underdesigned and Opportunistic Computing (UnO) paradigm by exposing system-level power variability to software to improve energy efficiency. We present ViPZonE, a memory management solution in conjunction with application annotations that opportunistically performs memory allocations to reduce DRAM energy. ViPZonE’s components consist of a physical address space with DIMM-aware zones, a modified page allocation routine, and a new virtual memory system call for dynamic allocations from userspace. We implemented ViPZonE in the Linux kernel with GLIBC API support, running on a real x86-64 testbed with significant access power variation in its DDR3 DIMMs. We demonstrate that on our testbed, ViPZonE can save up to 27.80 percent memory energy, with no more than 4.80 percent performance degradation across a set of PARSEC benchmarks tested with respect to the baseline Linux software. Furthermore, through a hypothetical “what-if” extension, we predict that in future non-volatile memory systems which consume almost no idle power, ViPZonE could yield even greater benefits, demonstrating the ability to exploit memory hardware variability through opportunistic software.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 90
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: To avoid data corruption, error correction codes (ECCs) are widely used to protect memories. ECCs introduce a delay penalty in accessing the data as encoding or decoding has to be performed. This limits the use of ECCs in high-speed memories. This has led to the use of simple codes such as single error correction double error detection (SEC-DED) codes. However, as technology scales multiple cell upsets (MCUs) become more common and limit the use of SEC-DED codes unless they are combined with interleaving. A similar issue occurs in some types of memories like DRAM that are typically grouped in modules composed of several devices. In those modules, the protection against a device failure rather than isolated bit errors is also desirable. In those cases, one option is to use more advanced ECCs that can correct multiple bit errors. The main challenge is that those codes should minimize the delay and area penalty. Among the codes that have been considered for memory protection are Reed-Solomon (RS) codes. These codes are based on non-binary symbols and therefore can correct multiple bit errors. In this paper, single symbol error correction codes based on Reed-Solomon codes that can be implemented with low delay are proposed and evaluated. The results show that they can be implemented with a substantially lower delay than traditional single error correction RS codes.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 91
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: Given a set of events and a set of robots, the dispatch problem is to allocate one robot for each event to visit it. In a single round, each robot may be allowed to visit only one event (matching dispatch), or several events in a sequence (sequence dispatch). In a distributed setting, each event is discovered by a sensor and reported to a robot. Here, we present novel algorithms aimed at overcoming the shortcomings of several existing solutions. We propose a pairwise distance based matching algorithm (PDM) to eliminate long edges by pairwise exchanges between matching pairs. Our sequence dispatch algorithm (SQD) iteratively finds the closest event-robot pair, includes the event in the dispatch schedule of the selected robot and updates its position accordingly. When event-robot distances are multiplied by robot resistance (inverse of the remaining energy), the corresponding energy-balanced variants are obtained. We also present generalizations which handle multiple visits and timing constraints. Our localized algorithm MAD is based on information mesh infrastructure and local auctions within the robot network for obtaining the optimal dispatch schedule for each robot. The simulations conducted confirm the advantages of our algorithms over other existing solutions in terms of average robot-event distance and lifetime.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
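The greedy loop that the abstract above attributes to SQD (repeatedly pick the closest event-robot pair, append the event to that robot's schedule, move the robot to the event) might look roughly like the sketch below. The function name `sequence_dispatch`, the dictionary-based inputs and the use of Euclidean distance in the plane are assumptions for illustration; the energy-balanced variant and the timing constraints are not modelled.

```python
import math


def sequence_dispatch(robots, events):
    """Greedy sequence dispatch.

    robots: dict mapping robot id -> (x, y) start position
    events: dict mapping event id -> (x, y) position
    Returns a dict mapping robot id -> ordered list of event ids.
    """
    positions = dict(robots)          # current position of each robot
    pending = dict(events)            # events not yet scheduled
    schedule = {r: [] for r in robots}
    while pending:
        # Closest (robot, event) pair under the robots' current positions.
        robot, event = min(
            ((r, e) for r in positions for e in pending),
            key=lambda pair: math.dist(positions[pair[0]], pending[pair[1]]),
        )
        schedule[robot].append(event)
        # The robot is assumed to continue from the event it just visited.
        positions[robot] = pending.pop(event)
    return schedule
```

The pair search is quadratic per step, which keeps the sketch short; a practical version would likely use a spatial index or a priority queue.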
  • 92
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: Speedup models are powerful analytical tools for evaluating and predicting the performance of parallel applications. Unfortunately, the well-known speedup models like Amdahl’s law and Gustafson’s law do not take reliability into consideration and therefore cannot accurately account for application performance in the presence of failures. In this study, we enhance Amdahl’s law and Gustafson’s law by considering the impact of failures and the effect of coordinated checkpointing/restart. Unlike existing analytical studies relying on Exponential failure distribution alone, in this work we consider both Exponential and Weibull failure distributions in the construction of our reliability-aware speedup models. The derived reliability-aware models are validated through trace-based simulations under a variety of parameter settings. Our trace-based simulations demonstrate these models can effectively quantify failure impact on application speedup. Moreover, we present two case studies to illustrate the use of these reliability-aware speedup models.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
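For reference, the two classical speedup laws named in the abstract above can be written as short functions. This covers only the standard failure-free forms; the paper's reliability-aware extensions are not given in the abstract and are not reproduced here. The parameter names `parallel_fraction` and `processors` are chosen for illustration.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: fixed problem size, speedup = 1 / ((1 - p) + p / N)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


def gustafson_speedup(parallel_fraction, processors):
    """Gustafson's law: scaled problem size, speedup = (1 - p) + p * N."""
    serial = 1.0 - parallel_fraction
    return serial + parallel_fraction * processors
```

With 90 percent parallel work on 10 processors, Amdahl's law gives a speedup of about 5.26, while Gustafson's law gives 9.1.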
  • 93
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: Higher radix values of the form $\beta = 2^r$ have been employed traditionally for recoding of multipliers, and for determining quotient- and root-digits in iterative division and square root algorithms, usually only for quite moderate values of $r$, like 2 or 3. For fast additions, in particular for the accumulation of many terms, generally redundant representations are employed, most often binary carry-save or borrow-save, but in a number of publications it has been suggested to recode the addends into a higher radix. It is shown that there are no speed advantages in doing so if the radix is a power of 2; on the contrary, there are significant savings in using standard 4-to-2 adders, even saving half of the operations in multi-operand addition.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 94
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: Journaling file systems are widely used in modern computer systems as they provide high reliability at reasonable cost. However, existing journaling file systems are not efficient for emerging PCM (phase-change memory) storage because they are optimized for hard disks. Specifically, the large amount of data that they write during journaling seriously degrades the performance of PCM storage, which has a long write latency. In this paper, we present a new journaling file system for PCM, called Shortcut-JFS, that reduces write traffic to PCM by more than half compared with existing journaling file systems running on block I/O interfaces. To do this, we devise two novel schemes that can be used under byte-addressable I/O interfaces: 1) differential logging that journals only the modified part of a block and 2) in-place checkpointing that eliminates the overhead of block copying. We implement Shortcut-JFS on Linux 2.6.32 and compare its performance with that of existing journaling and log-structured file systems. The results show that the performance improvement of Shortcut-JFS against Ext4 and LFS is 54 and 96 percent, respectively, on average.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
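The idea of differential logging mentioned in the abstract above (journal only the modified part of a block) can be illustrated with a small byte-level sketch. This is not Shortcut-JFS code: the functions `diff_records` and `replay`, and the choice of one record per run of changed bytes, are assumptions made for illustration.

```python
def diff_records(old_block, new_block):
    """Return (offset, data) records covering only the bytes that changed."""
    if len(old_block) != len(new_block):
        raise ValueError("blocks must have the same size")
    records = []
    i, n = 0, len(old_block)
    while i < n:
        if old_block[i] != new_block[i]:
            j = i
            # Extend the record over the whole run of differing bytes.
            while j < n and old_block[j] != new_block[j]:
                j += 1
            records.append((i, new_block[i:j]))
            i = j
        else:
            i += 1
    return records


def replay(old_block, records):
    """Rebuild the new block by applying journal records to the old block."""
    buf = bytearray(old_block)
    for offset, data in records:
        buf[offset:offset + len(data)] = data
    return bytes(buf)
```

In this sketch a block with three changed bytes produces a journal of three payload bytes plus offsets, instead of a copy of the whole block.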
  • 95
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: As energy has become one of the key operating costs in running a data center and power waste commonly exists, it is essential to reduce energy inefficiency inside data centers. In this paper, we develop an innovative framework, called PowerTracer, for diagnosing energy inefficiency and saving power. Inside the framework, we first present a resource tracing method based on request tracing in multi-tier services of black boxes. Then, we propose a generalized methodology of applying a request tracing approach for energy inefficiency diagnosis and power saving in multi-tier service systems. With insights into service performance and resource consumption of individual requests, we develop (1) a bottleneck diagnosis tool that pinpoints the root causes of energy inefficiency, and (2) a power saving method that enables dynamic voltage and frequency scaling (DVFS) with online request tracing. We implement a prototype of PowerTracer, and conduct extensive experiments to validate its effectiveness. Our tool analyzes several state-of-the-practice and state-of-the-art DVFS control policies and uncovers existing energy inefficiencies. Meanwhile, the experimental results demonstrate that PowerTracer outperforms its peers in power saving.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 96
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-04-08
    Description: Cloud computing is proposed as an open and promising computing paradigm where customers can deploy and utilize IT services in a pay-as-you-go fashion while saving huge capital investment in their own IT infrastructure. Due to the openness and virtualization, various malicious service providers may exist in these cloud environments, and some of them may record service data from a customer and then collectively deduce the customer's private information without permission. Therefore, from the perspective of cloud customers, it is essential to take certain technical actions to protect their privacy at client side. Noise obfuscation is an effective approach in this regard by utilizing noise data. For instance, noise service requests can be generated and injected into real customer service requests so that malicious service providers would not be able to distinguish which requests are real ones if these requests’ occurrence probabilities are about the same, and consequently related customer privacy can be protected. Currently, existing representative noise generation strategies have not considered possible fluctuations of occurrence probabilities. In this case, the probability fluctuation could not be concealed by existing noise generation strategies, and it is a serious risk for the customer's privacy. To address this probability fluctuation privacy risk, we systematically develop a novel time-series pattern based noise generation strategy for privacy protection on cloud. First, we analyze this privacy risk and present a novel cluster based algorithm to generate time intervals dynamically. Then, based on these time intervals, we investigate corresponding probability fluctuations and propose a novel time-series pattern based forecasting algorithm. Lastly, based on the forecasting algorithm, our novel noise generation strategy can be presented to withstand the probability fluctuation privacy risk. 
    The simulation evaluation demonstrates that our strategy can significantly improve the effectiveness of such cloud privacy protection to withstand the probability fluctuation privacy risk.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 97
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-04-08
    Description: The digit-recurrence division algorithm is used in several high-performance processors because it provides good tradeoffs in terms of latency, area and power dissipation. In this work we develop a minimally redundant radix-8 divider for binary64 (double-precision) aiming at obtaining better energy efficiency in the performance-per-watt space. The results show that the radix-8 divider, when compared to radix-4 and radix-16 units, requires less energy to complete a division for high clock rates.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 98
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-12-11
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 99
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-12-11
    Description: The continuing decrease in dimensions and operating voltage of transistors has increased their sensitivity against radiation phenomena, making soft errors an important challenge in future microprocessors. New techniques for detecting errors in the logic and memories that allow meeting the desired failure rate are key to keep harnessing the benefits of Moore's law. This paper proposes a low-cost dynamic particle strike detection mechanism based on acoustic wave detectors. Our results show that the proposed mechanism can protect the whole chip, including both the logic and the memory arrays, and detect all the soft errors caused by particle strikes with minimal hardware overhead and performance cost.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 100
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-12-11
    Description: This paper presents a fixed-point reconfigurable parallel VLSI hardware architecture for real-time Electrical Capacitance Tomography (ECT). It is modular and consists of a front-end module which performs precise capacitance measurements in a time multiplexed manner using the Capacitance to Digital Converter (CDC) technique. Another FPGA module performs the inverse steps of the tomography algorithm. Dual-port built-in memory banks store the sensitivity matrix, the actual value of the capacitances, and the actual image. A two dimensional (2D) core multi-processing elements (PE) engine intercommunicates with these memory banks via parallel buses. A hardware-software codesign methodology was conducted using commercially available tools in order to concurrently tune the algorithms and hardware parameters. Hence, the hardware was designed down to the bit-level in order to reduce both the hardware cost and power consumption, while satisfying the real-time constraint. Quantization errors were assessed against the image quality, and bit-level simulations demonstrate the correctness of the design. Further simulations indicate that the proposed architecture achieves a speed-up of up to three orders of magnitude over the software version when the reconstruction algorithm runs on 2.53 GHz-based Pentium processor or DSP Ti's Delphino TMS320F32837 processor. More specifically, a throughput of 17.241 Kframes/sec for both the Linear-Back Projection (LBP) and modified Landweber algorithms and 8.475 Kframes/sec for the Landweber algorithm with 200 iterations could be achieved. This performance was achieved using an array of [2 × 2] × [2 × 2] processing units. This satisfies the real-time constraint of many industrial applications. To the best of the authors’ knowledge, this is the first embedded system which explores the intrinsic parallelism which is available in modern FPGAs for ECT.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science