ALBERT

All Library Books, journals and Electronic Records Telegrafenberg


  • 1
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Digital circuits are expected to suffer from increasingly many hard faults due to technology scaling. In particular, a single hard fault in the ALU (Arithmetic Logic Unit) might lead to a total failure in processors or significantly reduce their performance. To address these increasingly important problems, we propose a novel cost-efficient fault-tolerant mechanism for the ALU, called LIZARD. LIZARD employs two half-word ALUs, instead of a single full-word ALU, to perform computations with concurrent fault detection. When a fault is detected, the two ALUs are partitioned into four quarter-word ALUs. After diagnosing and isolating a faulty quarter-word ALU, LIZARD continues its operation using the remaining ones, which can detect and isolate another fault. Even though LIZARD uses narrow ALUs for computations, it adds negligible performance overhead by exploiting the predictability of results in arithmetic computations. We also present the architectural modifications needed when employing LIZARD for scalar as well as superscalar processors. Through comparative evaluation, we demonstrate that LIZARD outperforms other competitive fault-tolerant mechanisms in terms of area, energy consumption, performance and reliability.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 2
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Information searches are the most common application within social networks. Normally, the social network is modeled as a network graph consisting of nodes (in the rest of the paper, unless otherwise specified, we will use the terms “user” and “node” interchangeably) representing users within the network and edges representing relationships between users. Choosing appropriate nodes to form an auxiliary structure that supports effective query message spreading can reduce troublesome repeated queries. To accomplish this, a hybrid search (HS) scheme is proposed. If the query message is received by a node belonging to the auxiliary structure constructed by dynamic weighted distributed label clustering (DW-DLC), it is flooded to all neighbors of the visited node; otherwise, it is forwarded to one neighbor of the visited node. The DW-DLC based auxiliary structure can accelerate the process of obtaining the required information within the network. The simulation results show that the HS+DW-DLC scheme can reduce the average searching delay time, even in a required-information-scarce social network. In addition, the proposed scheme generates a relatively low number of repeated messages, reducing repeated queries to social network users. A minimal sketch of this flood-or-forward rule appears after this entry.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
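    A minimal sketch (not from the paper) of the flood-or-forward rule described above: nodes in an assumed auxiliary set flood the query to every neighbour, while all other nodes forward it to a single randomly chosen neighbour. The adjacency-dict graph, the TTL limit and the random neighbour choice are illustrative assumptions.

    import random

    def hybrid_search(graph, start, holders, auxiliary, ttl=6):
        """Spread a query: flood at auxiliary-structure nodes, walk elsewhere.

        graph:     dict node -> list of neighbour nodes
        holders:   set of nodes that hold the requested information
        auxiliary: set of nodes belonging to the auxiliary structure
        """
        visited, frontier, messages = set(), [(start, ttl)], 0
        while frontier:
            node, hops = frontier.pop()
            if node in visited or hops == 0:
                continue
            visited.add(node)
            if node in holders:
                return node, messages                  # information found
            neighbours = [n for n in graph[node] if n not in visited]
            if not neighbours:
                continue
            if node in auxiliary:                      # flood to all neighbours
                next_nodes = neighbours
            else:                                      # forward to one neighbour
                next_nodes = [random.choice(neighbours)]
            messages += len(next_nodes)
            frontier.extend((n, hops - 1) for n in next_nodes)
        return None, messages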
  • 3
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: This paper presents a derivation of four radix-2 division algorithms by digit recurrence. Each division algorithm selects a quotient digit from the over-redundant digit set {−2, −1, 0, 1, 2}, and the selection of each quotient digit depends only on the two most-significant digits of the partial remainder in a redundant representation. Two algorithms use a two’s complement representation for the partial remainder and carry-save additions, and the other two algorithms use a binary signed-digit representation for the partial remainder and carry-free additions. Three algorithms are novel. The fourth algorithm has been presented before. Results from the synthesized netlists show that two of our fastest algorithms achieve an improvement of 10 percent in latency per iteration over a standard radix-2 SRT algorithm at the cost of 36 percent more power and 50 percent more area.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 4
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: We present WaFS, a user-level file system, and a related scheduling algorithm for scientific workflow computation in the cloud. WaFS’s primary design goal is to automatically detect and gather the explicit and implicit data dependencies between workflow jobs, rather than high-performance file access. Using WaFS’s data, a workflow scheduler can either make effective cost-performance tradeoffs or improve storage utilization. Proper resource provisioning and storage utilization on pay-as-you-go clouds can be more cost effective than the use of resources in traditional HPC systems. WaFS and the scheduler control the number of concurrent workflow instances at runtime so that storage is well used, while the total makespan (i.e., turnaround time for a workload) is not severely compromised. We describe the design and implementation of WaFS and the new workflow scheduling algorithm based on our previous work. We present empirical evidence of the acceptable overheads of our prototype WaFS and describe a simulation-based study, using representative workflows, to show the makespan benefits of our WaFS-enabled scheduling algorithm.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 5
    Publication Date: 2015-08-07
    Description: This paper presents an anomaly detection model that is granular and distributed to accurately and efficiently identify sensed data anomalies within wireless sensor networks. A more decentralised mechanism is introduced, with wider use of in-network processing on a hierarchical sensor node topology, resulting in a robust framework for dynamic data domains. This efficiently addresses the big data issue encountered in large scale industrial sensor network applications. Data vectors in each node’s observation domain are first partitioned using an unsupervised approach that adapts to dynamic data streams using cumulative point-wise entropy and average relative density. Second order statistical analysis applied to average relative densities and mean entropy values is then used to differentiate anomalies through robust and adaptive thresholds that are responsive to a dynamic environment. Anomaly detection is then performed in a non-parametric and non-probabilistic manner over the different network tiers in the hierarchical topology, offering increased granularity for evaluation. Experiments were performed extensively using both real and artificial data distributions representative of different dynamic and multi-density observation domains. Results demonstrate detection accuracies of more than 94 percent, accompanied by a desirable reduction of more than 85 percent in communication costs when compared to existing centralized methods.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 6
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: The problem of securing data present on USB memories and SD cards has not been adequately addressed in the cryptography literature. While the formal notion of a tweakable enciphering scheme (TES) is well accepted as the proper primitive for secure data storage, the real challenge is to design a low cost TES which can perform at the data rates of the targeted memory devices. In this work, we provide the first answer to this problem. Our solution, called STES, combines a stream cipher with a XOR universal hash function. The security of STES is rigorously analyzed in the usual manner of provable security approach. By carefully defining appropriate variants of the multi-linear hash function and the pseudo-dot product based hash function we obtain controllable trade-offs between area and throughput. We combine the hash function with the recent hardware oriented stream ciphers, namely Mickey, Grain and Trivium. Our implementations are targeted towards two low cost FPGAs—Xilinx Spartan 3 and Lattice ICE40. Simulation results demonstrate that the speeds of encryption/decryption match the data rates of different USB and SD memories. We believe that our work opens up the possibility of actually putting FPGAs within controllers of such memories to perform low-level in-place encryption.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 7
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Cellular automata (CAs) have been widely used to model and simulate physical systems and processes. CAs have also been successfully used as a VLSI architecture that proved to be very efficient, at least in terms of silicon-area utilization and clock-speed maximization. Quantum cellular automata (QCAs), one of the promising emerging technologies for nanoscale and quantum computing circuit implementation, provide very high scale integration, very high switching frequency and extremely low power characteristics. In this paper we present a new automated design architecture and a tool, namely DATICAQ (Design Automation Tool of 1-D CAs using QCAs), that builds a bridge between 1-D CAs as models of physical systems and processes and 1-D QCAs as a nanoelectronic architecture. The QCA implementation of CAs not only carries the already developed CA circuits into the nanoelectronics era but also improves their performance significantly. The inputs of the proposed architecture are CA dimensionality, size, local rule, and the initial and boundary conditions imposed by the particular problem. DATICAQ produces as output the layout of the QCA implementation of the particular 1-D CA model. Simulations of CA models for zero and periodic boundary conditions and the corresponding QCA circuits showed that the CA models have been successfully implemented. A minimal sketch of a 1-D CA update rule appears after this entry.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
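    For readers unfamiliar with the underlying model, the sketch below (not part of DATICAQ) steps an elementary one-dimensional binary CA under a Wolfram rule number, supporting the zero and periodic boundary conditions mentioned above. The rule number, lattice size and text rendering are illustrative choices.

    def step_1d_ca(cells, rule, periodic=True):
        """One update of an elementary (radius-1, binary) CA given a Wolfram rule
        number 0-255; periodic=False uses zero boundary conditions."""
        n = len(cells)
        nxt = []
        for i in range(n):
            left = cells[(i - 1) % n] if periodic else (cells[i - 1] if i > 0 else 0)
            right = cells[(i + 1) % n] if periodic else (cells[i + 1] if i < n - 1 else 0)
            idx = (left << 2) | (cells[i] << 1) | right   # neighbourhood as a 3-bit index
            nxt.append((rule >> idx) & 1)                 # look the new state up in the rule
        return nxt

    row = [0] * 15
    row[7] = 1                                            # single seed cell in the middle
    for _ in range(8):
        print("".join("#" if c else "." for c in row))
        row = step_1d_ca(row, rule=90, periodic=False)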
  • 8
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Role-based access control is an important access control method for securing computer systems. A role-based access control policy can be implemented incorrectly due to various reasons, such as programming errors. Defects in the implementation may lead to unauthorized access and security breaches. To reveal access control defects, this paper presents a model-based approach to automated generation of executable access control tests using predicate/transition nets. Role-permission test models are built by integrating declarative access control rules with functional test models or contracts (preconditions and postconditions) of the associated activities (the system functions). The access control tests are generated automatically from the test models to exercise the interactions of access control activities. They are transformed into executable code through a model-implementation mapping that maps the modeling elements to implementation constructs. The approach has been implemented in an industry-adopted test automation framework that supports the generation of test code in a variety of languages. The full model-based testing process has been applied to three systems implemented in Java. The effectiveness is evaluated through mutation analysis of role-based access control rules. The experiments show that the model-based approach is highly effective in detecting the seeded access control defects.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 9
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: Heterogeneous multiprocessor systems, which are composed of a mix of processing elements, such as commodity multicore processors, graphics processing units (GPUs), and others, have been widely used in the scientific computing community. Software applications incorporate code designed and optimized for different types of processing elements in order to exploit the computing power of such heterogeneous computing systems. In this paper, we consider the problem of optimal distribution of the workload of data-parallel scientific applications between processing elements of such heterogeneous computing systems. We present a solution that uses functional performance models (FPMs) of processing elements and FPM-based data partitioning algorithms. The efficiency of this approach is demonstrated by experiments with parallel matrix multiplication and numerical simulation of lid-driven cavity flow on hybrid servers and clusters. A simplified partitioning sketch appears after this entry.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
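    The sketch below shows the simplest form of speed-proportional data partitioning. It assumes constant per-device speeds, whereas the functional performance models mentioned above let speed vary with problem size; the function name and example figures are illustrative.

    def partition_workload(total, speeds):
        """Split `total` work units across devices in proportion to measured speed.

        speeds: measured throughput (work units per second) of each device.
        """
        s = sum(speeds)
        shares = [int(total * v / s) for v in speeds]
        remainder = total - sum(shares)
        # hand the rounding remainder to the fastest devices
        for i in sorted(range(len(speeds)), key=lambda j: -speeds[j])[:remainder]:
            shares[i] += 1
        return shares

    print(partition_workload(1000, [120.0, 80.0, 40.0]))   # [501, 333, 166]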
  • 10
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: In this paper, we propose a new notion called $k$-times attribute-based anonymous access control, which is particularly designed for supporting cloud computing environments. In this new notion, a user can authenticate himself/herself to the cloud computing server anonymously. The server only knows that the user possesses some required attributes, yet it does not know the identity of this user. In addition, we provide a $k$-times limit for anonymous access control. That is, the server may limit a particular set of users (i.e., those users with the same set of attributes) to access the system at most $k$ times within a period or an event. Any further access will be denied. We also prove the security of our instantiation. Our implementation results show that our scheme is practical.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 11
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-08-07
    Description: In the face of high partial and complete disk failure rates and untimely system crashes, the execution of low-priority background tasks has become increasingly frequent in large-scale data centers. However, the existing algorithms are all reactive optimizations and only exploit the temporal locality of workloads to reduce the user I/O requests during the low-priority background tasks. To address this problem, this paper proposes Intelligent Data Outsourcing (IDO), a zone-based and proactive data migration optimization, to significantly improve the efficiency of low-priority background tasks. The main idea of IDO is to proactively identify the hot data zones of RAID-structured storage systems in the normal operational state. By leveraging prediction tools to identify upcoming events, IDO proactively migrates the data blocks belonging to the hot data zones on the degraded device to a surrogate RAID set in the large-scale data centers. Upon a disk failure or crash reboot, most user I/O requests addressed to the degraded RAID set can be serviced directly by the surrogate RAID set rather than the much slower degraded RAID set. Consequently, the performance of the background tasks and user I/O performance during the background tasks are improved simultaneously. Our lightweight prototype implementation of IDO and extensive trace-driven experiments on two case studies demonstrate that, compared with the existing state-of-the-art approaches, IDO effectively improves the performance of low-priority background tasks. Moreover, IDO is portable and can be easily incorporated into any existing algorithms for RAID-structured storage systems.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 12
    Publication Date: 2015-08-07
    Description: Cloud computing, which provides elastic computing and storage resources on demand, has become increasingly important due to the emergence of “big data”. Cloud computing resources are a natural fit for processing big data streams as they allow big data applications to run at the scale required for handling their complexities (data volume, variety and velocity). With the data no longer under users’ direct control, data security in cloud computing is becoming one of the main concerns in the adoption of cloud computing resources. In order to improve data reliability and availability, storing multiple replicas along with original datasets is a common strategy for cloud service providers. Public data auditing schemes allow users to verify their outsourced data storage without having to retrieve the whole dataset. However, existing data auditing techniques suffer from efficiency and security problems. First, for dynamic datasets with multiple replicas, the communication overhead for update verifications is very large, because each update requires updating all replicas, and verification of each update requires O(log n) communication complexity. Second, existing schemes cannot provide public auditing and authentication of block indices at the same time. Without authentication of block indices, the server can build a valid proof based on data blocks other than the blocks the client requested to verify. In order to address these problems, in this paper, we present a novel public auditing scheme named MuR-DPA. The new scheme incorporates a novel authenticated data structure (ADS) based on the Merkle hash tree (MHT), which we call MR-MHT. To support fully dynamic data updates and authentication of block indices, we include rank and level values in the computation of MHT nodes. In contrast to existing schemes, level values of nodes in MR-MHT are assigned in a top-down order, and all replica blocks for each data block are organized into a same-replica sub-tree. Such a configuration allows efficient verification of updates for multiple replicas. Compared to existing integrity verification and public auditing schemes, theoretical analysis and experimental results show that the proposed MuR-DPA scheme not only incurs much less communication overhead for both update verification and integrity verification of cloud datasets with multiple replicas, but also provides enhanced security against dishonest cloud service providers. A minimal Merkle-hash-tree sketch appears after this entry.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
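    As background for the MR-MHT structure above, here is a minimal plain Merkle hash tree with proof generation and verification. The rank/level annotations and replica sub-trees of MR-MHT are not modelled, and the block contents are illustrative.

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_root(blocks):
        """Root hash over the blocks, duplicating the last node on odd levels."""
        level = [h(b) for b in blocks]
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]

    def merkle_proof(blocks, index):
        """Sibling hashes needed to verify blocks[index] against the root."""
        level = [h(b) for b in blocks]
        proof = []
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])
            sibling = index ^ 1
            proof.append((level[sibling], sibling < index))   # (hash, sibling-is-left)
            level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            index //= 2
        return proof

    def verify(block, proof, root):
        node = h(block)
        for sibling, sibling_is_left in proof:
            node = h(sibling + node) if sibling_is_left else h(node + sibling)
        return node == root

    data = [b"block-%d" % i for i in range(5)]
    root = merkle_root(data)
    assert verify(data[3], merkle_proof(data, 3), root)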
  • 13
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: Bandwidth reservation has been recognized as a value-added service to the cloud provider in recent years. We consider an open market of cloud bandwidth reservation, in which cloud providers offer bandwidth reservation services to cloud tenants, especially online streaming service providers, who have strict requirements on the amount of bandwidth to guarantee their quality of services. In this paper, we model the open market as a double-sided auction, and propose the first family of STrategy-proof double Auctions for multi-cloud, multi-tenant bandwidth Reservation (STAR). STAR contains two auction mechanisms. The first one, STAR-Grouping, divides the tenants into groups in a bid-independent way, and carefully matches the cloud providers with the tenant groups to form good trades. The second one, STAR-Padding, greedily matches the cloud providers with the tenants, and fills the partially reserved cloud provider(s) with a novel virtual padding tenant who can be a component of the auctioneer. Our analysis shows that both of the two auction mechanisms achieve strategy-proofness and ex-post budget balance. Our evaluation results show that they achieve good performance in terms of social welfare, cloud bandwidth utilization, and tenant satisfaction ratio.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 14
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: In a distributed real-time system (DRTS), jobs are often executed on a number of processors and must complete by their end-to-end deadlines. Job deadline requirements may be violated if resource competition among different jobs on a given processor is not considered. This paper introduces a distributed, locally optimal algorithm to assign local deadlines to the jobs on each processor without any restrictions on the mappings of the applications to the processors in the distributed soft real-time system. Improved schedulability results are achieved by the algorithm since disparate workloads among the processors, due to competing jobs having different paths, are considered. Given its distributed nature, the proposed algorithm is adaptive to dynamic changes of the applications and avoids the overhead of global clock synchronization. In order to make the proposed algorithm more practical, two derivatives of the algorithm are proposed and compared. Simulation results based on randomly generated workloads indicate that the proposed approach outperforms existing work both in terms of the number of feasible jobs (between 51% and 313% on average) and the number of feasible task sets (between 12% and 71% on average). A sketch of the classic proportional deadline-assignment baseline appears after this entry.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
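    For context, the sketch below shows the classic proportional deadline assignment that distributed schemes such as the one above improve upon; splitting the end-to-end deadline in proportion to per-stage execution times is a textbook baseline, not the paper's algorithm, and the numbers are illustrative.

    def proportional_deadlines(exec_times, end_to_end_deadline):
        """Split an end-to-end deadline over the processors a job visits,
        proportionally to the per-processor execution times."""
        total = sum(exec_times)
        return [end_to_end_deadline * c / total for c in exec_times]

    # A job running 3 ms, 1 ms and 2 ms on three processors, deadline 12 ms:
    print(proportional_deadlines([3.0, 1.0, 2.0], 12.0))   # [6.0, 2.0, 4.0]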
  • 15
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: Reproducibility, i.e. getting bitwise identical floating point results from multiple runs of the same program, is a property that many users depend on either for debugging or correctness checking in many codes [10]. However, the combination of dynamic scheduling of parallel computing resources and floating point nonassociativity makes attaining reproducibility a challenge even for simple reduction operations like computing the sum of a vector of numbers in parallel. We propose a technique for floating point summation that is reproducible independent of the order of summation. Our technique uses Rump’s algorithm for error-free vector transformation [7], and is much more efficient than using (possibly very) high precision arithmetic. Our algorithm reproducibly computes highly accurate results with an absolute error bound of $n \cdot 2^{-28} \cdot \mathrm{macheps} \cdot \max_i |v_i|$ at a cost of $7n$ FLOPs and a small constant amount of extra memory usage. Higher accuracies are also possible by increasing the number of error-free transformations. As long as all operations are performed in round-to-nearest mode, results computed by the proposed algorithms are reproducible for any run on any platform. In particular, our algorithm requires the minimum number of reductions, i.e. one reduction of an array of six double precision floating point numbers per sum, and hence is well suited for massively parallel environments. A sketch of the underlying error-free transformation appears after this entry.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
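    The building block behind such summation schemes is an error-free transformation of a single addition. The sketch below uses Knuth's TwoSum together with a simple compensated accumulation; it is not Rump's reproducible algorithm, and the example data are illustrative.

    def two_sum(a, b):
        """Error-free transformation: returns (s, e) with s = fl(a + b) and
        a + b = s + e exactly, assuming round-to-nearest IEEE arithmetic."""
        s = a + b
        bb = s - a
        e = (a - (s - bb)) + (b - bb)
        return s, e

    def compensated_sum(values):
        """Accumulate with a running error term built from TwoSum."""
        s = err = 0.0
        for v in values:
            s, e = two_sum(s, v)
            err += e
        return s + err

    data = [1e16, 1.0, -1e16, 1.0]
    print(sum(data), compensated_sum(data))   # 1.0 vs 2.0 (the exact sum is 2.0)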
  • 16
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: In recent years, embedded dynamic random-access memory (eDRAM) technology has been implemented in last-level caches due to its low leakage energy consumption and high density. However, the fact that eDRAM presents slower access time than static RAM (SRAM) technology has prevented its inclusion in higher levels of the cache hierarchy. This paper proposes to mingle SRAM and eDRAM banks within the data array of second-level (L2) caches. The main goal is to achieve the best trade-off among performance, energy, and area. To this end, two main directions have been followed. First, this paper explores the optimal percentage of banks for each technology. Second, the cache controller is redesigned to deal with performance and energy. Performance is addressed by keeping the most likely accessed blocks in fast SRAM banks. In addition, energy savings are further enhanced by avoiding unnecessary destructive reads of eDRAM blocks. Experimental results show that, compared to a conventional SRAM L2 cache, a hybrid approach requiring similar or even lower area speeds up performance on average by 5.9 percent, while saving 32 percent of the total energy. For a 45 nm technology node, the energy-delay-area product confirms that a hybrid cache is a better design than the conventional SRAM cache regardless of the number of eDRAM banks, and also better than a conventional eDRAM cache when the number of SRAM banks is an eighth of the total number of cache banks.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 17
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: Nearly all of the currently used signature schemes, such as RSA or DSA, are based either on the factoring assumption or the presumed intractability of the discrete logarithm problem. As a consequence, the appearance of quantum computers or algorithmic advances on these problems may lead to the unpleasant situation that a large number of today’s schemes will most likely need to be replaced with more secure alternatives. In this work we present such an alternative—an efficient signature scheme whose security is derived from the hardness of lattice problems. It is based on recent theoretical advances in lattice-based cryptography and is highly optimized for practicability and use in embedded systems. The public and secret keys are roughly 1.5 kB and 0.3 kB long, while the signature size is approximately 1.1 kB for a security level of around 80 bits. We provide implementation results on reconfigurable hardware (Spartan/Virtex-6) and demonstrate that the scheme is scalable, has low area consumption, and even outperforms classical schemes.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 18
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: With the rising demands on cloud services, the electricity consumption has been increasing drastically as the main operational expenditure (OPEX) to data center providers. The geographical heterogeneity of electricity prices motivates us to study the task placement problem over geo-distributed data centers. We exploit the dynamic frequency scaling technique and formulate an optimization problem that minimizes OPEX while guaranteeing the quality-of-service, i.e., the expected response time of tasks. Furthermore, an optimal solution is discovered for this formulated problem. The experimental results show that our proposal achieves much higher cost-efficiency than the traditional resizing scheme, i.e., by activating/deactivating certain servers in data centers.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 19
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-06-09
    Description: A new methodology for DRAM performance analysis has been proposed based on accurate characterization of DRAM bus cycles. The proposed methodology allows cycle-accurate performance analysis of arbitrary DRAM traces, obviates the need for functional simulations, allows accurate estimation of DRAM performance maximum, and enables root causing of suboptimal DRAM operation.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 20
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-12
    Description: During at-speed test of high performance sequential ICs using scan-based Logic BIST, the IC activity factor (AF) induced by the applied test vectors is significantly higher than that experienced during its in-field operation. Consequently, power droop (PD) may take place during both shift and capture phases, which will slow down the circuit under test (CUT) signal transitions. At capture, this phenomenon is likely to be erroneously recognized as due to delay faults. As a result, a false test fail may be generated, with consequent increase in yield loss. In this paper, we propose two approaches to reduce the PD generated at capture during at-speed test of sequential circuits with scan-based Logic BIST using the Launch-On-Shift scheme. Both approaches increase the correlation between adjacent bits of the scan chains with respect to conventional scan-based LBIST. This way, the AF of the scan chains at capture is reduced. Consequently, the AF of the CUT at capture, thus the PD at capture, is also reduced compared to conventional scan-based LBIST. The former approach, hereinafter referred to as Low-Cost Approach (LCA), enables a 50 percent reduction in the worst case magnitude of PD during conventional logic BIST. It requires a small cost in terms of area overhead (of approximately 1.5 percent on average), and it does not increase the number of test vectors over the conventional scan-based LBIST to achieve the same Fault Coverage (FC). Moreover, compared to three recent alternative solutions, LCA features a comparable AF in the scan chains at capture, while requiring lower test time and area overhead. The second approach, hereinafter referred to as High-Reduction Approach (HRA), enables scalable PD reductions at capture of up to 87 percent, with limited additional costs in terms of area overhead and number of required test vectors for a given target FC, over our LCA approach. Particularly, compared to two of the three recent alternative solutions mentioned above, HRA enables a significantly lower AF in the scan chains during the application of test vectors, while requiring either a comparable area overhead or a significantly lower test time. Compared to the remaining alternative solutions mentioned above, HRA enables a similar AF in the scan chains at capture (approximately 90 percent lower than conventional scan-based LBIST), while requiring a significantly lower test time (approximately 4.87 times on average lower number of test vectors) and comparable area overhead (of approximately 1.9 percent on average).
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 21
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The advent of cloud computing has made storage outsourcing a rising trend, which has turned secure remote data auditing into a hot topic in the research literature. Recently, some research has considered the problem of secure and efficient public data integrity auditing for shared dynamic data. However, these schemes are still not secure against collusion between the cloud storage server and revoked group users during user revocation in practical cloud storage systems. In this paper, we describe the collusion attack on the existing scheme and provide an efficient public integrity auditing scheme with secure group user revocation based on vector commitment and verifier-local revocation group signatures. We design a concrete scheme based on our scheme definition. Our scheme supports public checking and efficient user revocation, as well as other desirable properties, such as confidentiality, efficiency, countability and traceability of secure group user revocation. Finally, the security and experimental analysis show that, compared with related schemes, our scheme is secure and efficient.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 22
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: To select an appropriate level of error protection in caches, the impact of various protection schemes on the cache Failure In Time (FIT) rate must be evaluated for a target benchmark suite. However, while many simulation tools exist to evaluate area, power and performance for a set of benchmark programs, there is a dearth of such tools for reliability. This paper introduces a new cache reliability model called PARMA+ that has unique features which distinguish it from previous models. PARMA+ estimates a cache's FIT rate in the presence of spatial multi-bit faults, single-bit faults, temporal multi-bit faults and different error protection schemes including parity, ECC, early write-back and bit-interleaving. We first develop the model formally, then we demonstrate its accuracy. We have run reliability simulations for many distributions of large and small fault patterns and have compared them with accelerated fault injection simulations. PARMA+ has high accuracy and low computational complexity.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 23
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Although the travel time is the most important information in road networks, many spatial queries, e.g., $k$-nearest-neighbor ($k$-NN) and range queries, for location-based services (LBS) are only based on the network distance. This is because it is costly for an LBS provider to collect real-time traffic data from vehicles or roadside sensors to compute the travel time between two locations. With the advance of web mapping services, e.g., Google Maps, Microsoft Bing Maps, and MapQuest Maps, there is an invaluable opportunity for using such services for processing spatial queries based on the travel time. In this paper, we propose a server-side Spatial Mashup Service (SMS) that enables the LBS provider to efficiently evaluate $k$-NN queries in road networks using the route information and travel time retrieved from an external web mapping service. Due to the high cost of retrieving such external information, the usage limits of web mapping services, and the large number of spatial queries, we optimize the SMS for a large number of $k$-NN queries. We first discuss how the SMS processes a single $k$-NN query using two optimizations, namely, direction sharing and parallel requesting. Then, we extend them to process multiple concurrent $k$-NN queries and design a performance tuning tool to provide a trade-off between the query response time and the number of external requests and, more importantly, to prevent a starvation problem in the parallel requesting optimization for concurrent queries. We evaluate the performance of the proposed SMS using MapQuest Maps, a real road network, real and synthetic data sets. Experimental results show the efficiency and scalability of our optimizations designed for the SMS.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 24
    Publication Date: 2016-07-08
    Description: Several recent works have studied mobile vehicle scheduling to recharge sensor nodes via wireless energy transfer technologies. Unfortunately, most of them overlooked important factors of the vehicles’ moving energy consumption and limited recharging capacity, which may lead to problematic schedules or even stranded vehicles. In this paper, we consider the recharge scheduling problem under such important constraints. To balance energy consumption and latency, we employ one dedicated data gathering vehicle and multiple charging vehicles. We first organize sensors into clusters for easy data collection, and obtain theoretical bounds on latency. Then we establish a mathematical model for the relationship between energy consumption and replenishment, and obtain the minimum number of charging vehicles needed. We formulate the scheduling into a Profitable Traveling Salesmen Problem that maximizes profit, namely the amount of replenished energy less the cost of vehicle movements, and prove it is NP-hard. We devise and compare two algorithms: a greedy one that maximizes the profit at each step, and an adaptive one that partitions the network and forms Capacitated Minimum Spanning Trees per partition. Through extensive evaluations, we find that the adaptive algorithm can keep the number of nonfunctional nodes at zero. It also reduces transient energy depletion by 30-50 percent and saves 10-20 percent energy. Comparisons with other common data gathering methods show that we can save 30 percent energy and reduce latency by two orders of magnitude. A sketch of a profit-greedy tour appears after this entry.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
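    The sketch below illustrates the profit-greedy idea in its simplest form: the charging vehicle repeatedly visits the node whose replenished energy minus travel cost is largest while it can still afford the trip back to the depot. The distance metric, battery accounting and all constants are illustrative assumptions, not the paper's formulation.

    import math

    def greedy_recharge_tour(depot, deficits, battery, cost_per_m, max_transfer):
        """Greedy tour over `deficits` (node position -> energy deficit)."""
        def dist(a, b):
            return math.hypot(a[0] - b[0], a[1] - b[1])

        pos, tour, profit = depot, [], 0.0
        remaining = dict(deficits)
        while remaining:
            best, best_gain = None, 0.0
            for p, deficit in remaining.items():
                transfer = min(deficit, max_transfer)
                round_trip = (dist(pos, p) + dist(p, depot)) * cost_per_m
                gain = transfer - dist(pos, p) * cost_per_m
                if transfer + round_trip <= battery and gain > best_gain:
                    best, best_gain = p, gain
            if best is None:                       # nothing profitable and reachable
                break
            transfer = min(remaining.pop(best), max_transfer)
            battery -= transfer + dist(pos, best) * cost_per_m
            profit += best_gain
            tour.append(best)
            pos = best
        return tour, profit

    deficits = {(10.0, 0.0): 30.0, (0.0, 20.0): 50.0, (40.0, 40.0): 80.0}
    print(greedy_recharge_tour((0.0, 0.0), deficits, battery=200.0,
                               cost_per_m=1.0, max_transfer=60.0))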
  • 25
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The capability of selectively sharing encrypted data with different users via public cloud storage may greatly ease security concerns over inadvertent data leaks in the cloud. A key challenge to designing such encryption schemes lies in the efficient management of encryption keys. The desired flexibility of sharing any group of selected documents with any group of users demands different encryption keys to be used for different documents. However, this also implies the necessity of securely distributing to users a large number of keys for both encryption and search, and those users will have to securely store the received keys, and submit an equally large number of keyword trapdoors to the cloud in order to perform search over the shared data. The implied need for secure communication, storage, and complexity clearly renders the approach impractical. In this paper, we address this practical problem, which is largely neglected in the literature, by proposing the novel concept of key-aggregate searchable encryption and instantiating the concept through a concrete KASE scheme, in which a data owner only needs to distribute a single key to a user for sharing a large number of documents, and the user only needs to submit a single trapdoor to the cloud for querying the shared documents. The security analysis and performance evaluation both confirm that our proposed schemes are provably secure and practically efficient.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 26
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Infrastructure-as-a-service (IaaS) cloud providers offer tenants elastic computing resources in the form of virtual machine (VM) instances to run their jobs. Recently, providing predictable performance (i.e., performance guarantee) for tenant applications is becoming increasingly compelling in IaaS clouds. However, the hardware heterogeneity and performance interference across the same type of cloud VM instances can bring substantial performance variation to tenant applications, which inevitably stops the tenants from moving their performance-sensitive applications to the IaaS cloud. To tackle this issue, this paper proposes Heifer, a Heterogeneity and interference-aware VM provisioning framework for tenant applications, by focusing on MapReduce as a representative cloud application. It predicts the performance of MapReduce applications by designing a lightweight performance model using the online-measured resource utilization and capturing VM interference. Based on such a performance model, Heifer provisions the VM instances of the good-performing hardware type (i.e., the hardware that achieves the best application performance) to achieve predictable performance for tenant applications, by explicitly exploring the hardware heterogeneity and capturing VM interference. With extensive prototype experiments in our local private cloud and a real-world public cloud (i.e., Microsoft Azure) as well as complementary large-scale simulations, we demonstrate that Heifer can guarantee the job performance while saving the job budget for tenants. Moreover, our evaluation results show that Heifer can improve the job throughput of cloud datacenters, such that the revenue of cloud providers can be increased, thereby achieving a win-win situation between providers and tenants.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 27
    Publication Date: 2016-07-08
    Description: Gaussian normal bases (GNBs) are a special set of normal bases (NBs) which yield low complexity $GF\left(2^{m}\right)$ arithmetic operations. In this paper, we present new architectures for the digit-level single, hybrid-double, and hybrid-triple multiplication of $GF\left(2^{m}\right)$ elements based on the GNB representation for odd values of $m > 1$. The proposed fully-serial-in single multipliers perform multiplication of two field elements and offer high throughput when the data-path capacity for entering inputs is limited. The proposed hybrid-double and hybrid-triple digit-level GNB multipliers perform, respectively, two and three field multiplications using the same latency required for a single digit-level multiplier, at the expense of increased area. In addition, we present a new eight-ary field exponentiation architecture which does not require precomputed or stored intermediate values.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 28
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Shamir's secret sharing scheme is an effective way to distribute a secret to a group of shareholders. The security of the unprotected sharing scheme, however, can be easily broken by cheaters or attackers who maliciously feed incorrect shares during the secret recovery stage or inject faults into hardware computing the secret. In this paper, we propose cheater detection and identification schemes based on robust and algebraic manipulation detection (AMD) codes and m-disjunct matrices (superimposed codes). We present the constructions of codes for cheater detection and identification and describe how the cheater identification problem can be related to the classic group testing algorithms based on m-disjunct matrices. Simulation and synthesis results show that the proposed architecture can improve the security level significantly even under strong cheating attack models with reasonable area and timing overheads. A sketch of plain Shamir sharing appears after this entry.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
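    As background, the sketch below implements plain (k, n) Shamir secret sharing over a prime field with Lagrange interpolation at zero. It omits the AMD-code cheater detection that is the paper's contribution; the field prime and parameters are illustrative (modular inverses via pow(x, -1, p) need Python 3.8+).

    import random

    P = 2**127 - 1   # a Mersenne prime, large enough for a short secret

    def make_shares(secret, k, n):
        """Split `secret` into n shares so that any k of them recover it."""
        coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
        def f(x):                      # evaluate the degree-(k-1) polynomial mod P
            y = 0
            for c in reversed(coeffs):
                y = (y * x + c) % P
            return y
        return [(x, f(x)) for x in range(1, n + 1)]

    def recover(shares):
        """Lagrange interpolation at x = 0 over GF(P)."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num = den = 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = (num * -xj) % P
                    den = (den * (xi - xj)) % P
            secret = (secret + yi * num * pow(den, -1, P)) % P
        return secret

    shares = make_shares(123456789, k=3, n=5)
    assert recover(shares[:3]) == 123456789
    assert recover(shares[1:4]) == 123456789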
  • 29
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Cloud platforms encompass a large number of storage services that can be used to manage the needs of customers. Each of these services, offered by a different provider, is characterized by specific features, limitations and prices. In the presence of multiple options, it is crucial to select the best solution fitting the customer requirements in terms of quality of service and costs. Most of the available approaches are not able to handle uncertainty in the expression of subjective preferences from customers, and can result in wrong (or sub-optimal) service selections in the presence of rational/selfish providers exposing untrustworthy indications concerning the quality of service levels and prices associated with their offers. In addition, due to its multi-objective nature, the optimal service selection process is a very complex task, to be managed, when possible, in a distributed way for well-known scalability reasons. In this work, we aim at facing the above challenges by proposing three novel contributions. The fuzzy sets theory is used to express vagueness in the subjective preferences of the customers. The service selection is resolved with the distributed application of fuzzy inference or the Dempster-Shafer theory of evidence. The selection strategy is also complemented by the adoption of a game theoretic approach for promoting truth-telling among service providers. We present empirical evidence of the effectiveness of the proposed solution through properly crafted simulation experiments.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 30
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 31
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: As cloud computing technology has developed during the last decade, outsourcing data to cloud services for storage has become an attractive trend, which spares users the effort of heavy data maintenance and management. Nevertheless, since the outsourced cloud storage is not fully trustworthy, it raises security concerns about how to realize data deduplication in the cloud while achieving integrity auditing. In this work, we study the problem of integrity auditing and secure deduplication on cloud data. Specifically, aiming at achieving both data integrity and deduplication in the cloud, we propose two secure systems, namely SecCloud and SecCloud$^+$. SecCloud introduces an auditing entity that maintains a MapReduce cloud, which helps clients generate data tags before uploading as well as audit the integrity of data having been stored in the cloud. Compared with previous work, the computation by the user in SecCloud is greatly reduced during the file uploading and auditing phases. SecCloud$^+$ is motivated by the fact that customers always want to encrypt their data before uploading, and enables integrity auditing and secure deduplication on encrypted data.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 32
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Cache compression improves the performance of a multi-core system by being able to store more cache blocks in a compressed format. Compression is achieved by exploiting data patterns present within a block. For a given cache space, compression increases the effective cache capacity. However, this increase is limited by the number of tags that can be accommodated at the cache. Prefetching is another technique that improves system performance by fetching the cache blocks ahead of time into the cache and hiding the off-chip latency. Commonly used hardware prefetchers, such as stream and stride, fetch multiple contiguous blocks into the cache. In this paper we propose prefetched blocks compaction (PBC) wherein we exploit the data patterns present across these prefetched blocks. PBC compacts the prefetched blocks into a single block with a single tag, effectively increasing the cache capacity. We also modify the cache organization to access these multiple cache blocks residing in a single block without any need for extra tag look-ups. PBC improves the system performance by 11.1 percent with a maximum of 43.4 percent on a four-core system.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 33
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Multi-core processors achieve a trade-off between performance and power consumption by using Dynamic Voltage Scaling (DVS) techniques. In this paper, we study the power-efficient scheduling problem of real-time tasks in an identical multi-core system, and present the Node Scaling model to achieve power-aware scheduling. We prove that there is a bound speed which results in the minimal power consumption for a given task set, and that the maximal value of task utilization, $u_{max}$, in a task set is a key element in deciding its minimal power consumption. Based on the value $u_{max}$, we classify task sets into two categories, the bounded task sets and the non-bounded task sets, and we prove the lower bound of power consumption for each type of task set. Simulations based on Intel Xeon X5550 and PXA270 processors show that the Node Scaling model can achieve power-efficient scheduling when applied to existing algorithms such as EDF-FF and SPA2. The ratio of power reduction depends on the multi-core processor's property, which is defined as the ratio of the bound speed to the maximal speed of the cores. When the ratio of speeds decreases, the ratio of power reduction increases for all the power-efficient algorithms. A worked bound-speed example under a simple power model appears after this entry.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
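    The notion of a bound speed can be illustrated with the common static-plus-cubic power model; the model, the constants and the closed-form minimum below are illustrative assumptions, not the paper's derivation.

    def bound_speed(p_static, c_dynamic, s_max):
        """Speed minimizing energy per unit of work for P(s) = p_static + c_dynamic * s**3.

        Energy per work unit is E(s) = P(s) / s, which is minimized at
        s = (p_static / (2 * c_dynamic)) ** (1/3), clipped to the frequency range.
        """
        s_opt = (p_static / (2.0 * c_dynamic)) ** (1.0 / 3.0)
        return min(s_opt, s_max)

    # 0.5 W static power, dynamic coefficient 0.9 W at normalized speed 1:
    print(bound_speed(p_static=0.5, c_dynamic=0.9, s_max=1.0))   # ~0.653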
  • 34
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Existing secure and privacy-preserving schemes for vehicular communications in vehicular ad hoc networks face some challenges, e.g., reducing the dependence on ideal tamper-proof devices, building efficient member revocation mechanisms and avoiding computation and communication bottlenecks. To cope with those challenges, we propose a highly efficient secure and privacy-preserving scheme based on identity-based aggregate signatures. Our scheme enables hierarchical aggregation and batch verification. The individual identity-based signatures generated by different vehicles can be aggregated and verified in a batch. The aggregated signatures can be re-aggregated by a message collector (e.g., traffic management authority). With our hierarchical aggregation technique, we significantly reduce the transmission/storage overhead of the vehicles and other parties. Furthermore, existing batch verification based schemes in vehicular ad hoc networks require vehicles to wait for enough messages to perform a batch verification. In contrast, we assume that vehicles will generate messages (and the corresponding signatures) in certain time spans, so that vehicles only need to wait for a very short period before they can start the batch verification procedure. Simulation shows that a vehicle can verify the received messages with very low latency and fast response.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 35
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Computer vision applications have a large disparity in operations, data representation and memory access patterns from the early vision stages to the final classification and recognition stages. A hardware system for computer vision has to provide high flexibility without compromising performance, exploiting massively spatial-parallel operations but also keeping a high throughput on data-dependent and complex program flows. Furthermore, the architecture must be modular, scalable and easy to adapt to the needs of different applications. Keeping this in mind, a hybrid SIMD/MIMD architecture for embedded computer vision is proposed. It consists of a coprocessor designed to provide fast and flexible computation of demanding image processing tasks of vision applications. A 32-bit 128-unit device was prototyped on a Virtex-6 FPGA which delivers a peak performance of 19.6 GOP/s and 7.2 W of power dissipation.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
  • 36
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The key to reducing static energy in supercomputers is switching off their unused components. Routers are the major components of a supercomputer. Whether routers can be effectively switched off or not has become the key to static energy management for supercomputers. For many typical applications, the routers in a supercomputer exhibit low utilization. However, there is no effective method to switch the routers off when they are idle. By analyzing the router occupancy in time and space, for the first time, we present a routing-policy guided topology partitioning methodology to solve this problem. We propose topology partitioning methods for three kinds of commonly used topologies (mesh, torus and fat-tree) equipped with the three most popular routing policies (deterministic routing, directionally adaptive routing and fully adaptive routing). Based on the above methods, we propose the key techniques required in this topology partitioning based static energy management in supercomputer interconnection networks to switch off unused routers in both time and space dimensions. Three topology-aware resource allocation algorithms have been developed to handle effectively different job-mixes running on a supercomputer. We validate the effectiveness of our methodology by using Tianhe-2 and a simulator for the aforementioned topologies and routing policies. The energy savings achieved on a subsystem of Tianhe-2 range from 3.8 to 79.7 percent. This translates into a yearly energy cost reduction of up to half a million US dollars for Tianhe-2.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 37
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: This paper proposes several designs of approximate restoring dividers; two different levels of approximation (cell and array levels) are employed. Three approximate subtractor cells are utilized for integer subtraction as the basic step of division; these cells trade accuracy in subtraction against other metrics, such as circuit complexity and power dissipation. At the array level, exact cells are either replaced or truncated in the approximate divider designs. A comprehensive evaluation of approximation at both cell and array (divider) levels is pursued using error analysis and HSPICE simulation; different circuit metrics including complexity and power dissipation are evaluated. Different applications are investigated by utilizing the proposed approximate arithmetic circuits. The simulation results show that, with extensive savings in power dissipation and circuit complexity, the proposed designs offer better error-tolerant capabilities for quotient-oriented applications (image processing) than for remainder-oriented applications (modulo operations). The proposed approximate restoring divider is significantly better than the approximate non-restoring scheme presented in the technical literature. (An illustrative sketch of the underlying restoring-division recurrence follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
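    The abstract above builds on the classic restoring-division recurrence. The following Python fragment is a hedged, purely illustrative sketch (not the paper's circuits): it performs bit-serial restoring division and lets a crude stand-in for an inexact subtractor handle a configurable number of least-significant steps; the function name and the approx_lsb_steps parameter are invented for illustration.
      # Illustrative restoring division; the 'approximate' branch is a crude
      # stand-in for an inexact subtractor cell, not the paper's design.
      def restoring_divide(dividend, divisor, width=8, approx_lsb_steps=0):
          assert 0 < divisor and 0 <= dividend < (1 << width)
          remainder, quotient = 0, 0
          for i in range(width - 1, -1, -1):
              remainder = (remainder << 1) | ((dividend >> i) & 1)   # shift in next bit
              if i < approx_lsb_steps:
                  quotient = (quotient << 1) | 1                     # guess the bit
                  remainder = max(remainder - divisor, 0)            # inexact subtract
              else:
                  if remainder >= divisor:                           # exact restoring step
                      remainder -= divisor
                      quotient = (quotient << 1) | 1
                  else:
                      quotient = quotient << 1
          return quotient, remainder

      # with no approximation the recurrence matches ordinary integer division
      assert restoring_divide(200, 7) == (200 // 7, 200 % 7)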
  • 38
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Wireless sensor networks (WSNs) have been considered the next-generation paradigm of structural health monitoring (SHM) systems due to their low cost, high scalability and ease of deployment. Because of the intrinsically energy-intensive nature of the sensor nodes in SHM applications, it is highly preferable that they can be divided into subsets that take turns monitoring the condition of a structure. This approach is generally called 'coverage-preserving scheduling' and has been widely adopted in existing WSN applications. The problem of partitioning the nodes into subsets is generally called the 'maximum lifetime coverage problem (MLCP)'. However, existing solutions to the MLCP cannot be directly applied to SHM. Unlike other WSN applications, in SHM we cannot define a specific coverage area independently for each sensor node, which is the basic assumption in all existing solutions to the MLCP. In this paper, we propose two approaches to solve the MLCP in SHM. The performance of the methods is demonstrated through both extensive simulations and real experiments.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 39
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: NAND flash memory is widely used for the secondary storage of computer systems. The flash translation layer (FTL) is the firmware that manages and operates a flash-based storage device. One of the FTL's modules manages the RAM buffer of the flash device, which nowadays is large enough to be used for both address mapping and data buffering. Because this buffer is the fastest component of the flash layer interface, its effective management has a significant impact on the performance of data storage and access. This paper proposes a novel scheme called TreeFTL for this purpose. TreeFTL organizes address translation pages and data storage pages in a tree-like structure in the RAM buffer. The tree enables TreeFTL to adapt to the access behaviors of workloads by dynamically adjusting the partitions for address mapping and data buffering. Furthermore, TreeFTL employs a lightweight mechanism to evict the least-recently-used victim pages when the need arises. Our experiments show that TreeFTL is able to spend 46.6 and 49.0 percent less service time over various workloads than two state-of-the-art algorithms, respectively, for a 64 MB RAM buffer. (An illustrative sketch of a shared, LRU-managed buffer follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
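    As context for the record above, the sketch below shows, in a hedged and simplified way, a single RAM budget shared by translation pages and data pages with least-recently-used eviction; it is not TreeFTL's tree-structured design, and the class and parameter names are invented.
      # Toy shared buffer with LRU eviction (illustrative only).
      from collections import OrderedDict

      class SharedFlashBuffer:
          def __init__(self, capacity_pages):
              self.capacity = capacity_pages
              self.pages = OrderedDict()        # key -> (kind, payload)

          def access(self, key, kind, payload=None):
              if key in self.pages:
                  self.pages.move_to_end(key)   # mark as most recently used
              else:
                  if len(self.pages) >= self.capacity:
                      self.pages.popitem(last=False)   # evict the LRU victim
                      # a real FTL would write back a dirty victim here
                  self.pages[key] = (kind, payload)
              return self.pages[key]

      buf = SharedFlashBuffer(capacity_pages=4)
      for lpn in [1, 2, 3, 1, 4, 5]:
          buf.access(('map', lpn // 2), 'map')    # translation page for this LPN
          buf.access(('data', lpn), 'data')       # buffered data page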
  • 40
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Whereas clustered microarchitectures themselves have been extensively studied, the memory units for these clustered microarchitectures have received relatively little attention. This article discusses some of the inherent challenges of clustered memory units and shows how these can be overcome. Clustered memory pipelines work well with the late allocation of load/store queue entries and physically unordered queues. Yet this approach has characteristic problems such as queue overflows and allocation patterns that lead to deadlocks. We propose techniques to solve each of these problems and show that a distributed memory unit can offer significant energy savings and speedups over a centralized unit. For instance, compared to a centralized cache with a load/store queue of 64/24 entries, our four-cluster distributed memory unit with load/store queues of 16/8 entries each consumes 31 percent less energy and performs 4.7 percent better on SPECint, and consumes 36 percent less energy and performs 7 percent better on SPECfp.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 41
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: In Broadcast Encryption (BE) systems such as Pay-TV, AACS, online content sharing and broadcasting, reducing the header length (communication overhead per session) is of practical interest. The Subset Difference (SD) scheme due to Naor-Naor-Lotspiech (NNL) is the most popularly used BE scheme. We introduce the $(a,b,\gamma)$ augmented binary tree subset difference ($(a,b,\gamma)$-ABTSD) scheme, which is a generalization of the NNL-SD scheme. By varying the parameters $(a,b,\gamma)$, it is possible to obtain $O(n\log n)$ different schemes. The average header length achieved by the new schemes is smaller than that of all known schemes having the same decryption time as the NNL-SD scheme, achieving non-trivial trade-offs between user storage and header size. The amount of key material that a user is required to store increases; however, for the applications mentioned above, reducing the header size and achieving fast decryption is arguably a greater concern than user storage.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 42
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: We propose a new optimal data placement technique to improve the performance of MapReduce in cloud data centers by considering not only data locality but also global data access costs. We first conduct an analytical and experimental study to identify the performance issues of MapReduce in data centers and to show that MapReduce tasks involved in unexpected remote data access incur much greater communication costs and execution time, and can significantly deteriorate overall performance. Next, we formulate the problem of optimal data placement, propose a generative model to minimize the global data access cost in data centers, and show that the optimal data placement problem is NP-hard. To solve it, we propose a topology-aware heuristic algorithm that first constructs a replica-balanced distribution tree as the abstract tree structure and then refines it into a replica-similarity distribution tree, yielding an optimal replica distribution tree. The experimental results demonstrate that our data placement approach can improve the performance of MapReduce with lower communication and computation costs by effectively minimizing global data access costs, and more specifically by reducing unexpected remote data access.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 43
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: This paper presents a Ternary Content-Addressable Memory (TCAM) design based on the use of floating-gate (flash) transistors. TCAMs are extensively used in high-speed IP networking and are commonly found in routers in the internet core. Traditional TCAM ICs are built using CMOS devices, and a single TCAM cell utilizes 17 transistors. In contrast, our TCAM cell utilizes only two flash transistors, thereby significantly reducing circuit area. We cover the chip-level architecture of the TCAM IC briefly, focusing mainly on the TCAM block which performs fast parallel IP routing table lookup. Our flash-based TCAM (FTCAM) block is simulated in SPICE, and we show that it has a significantly lower area than a CMOS-based TCAM block, with a speed that can meet the current ($\sim$400 Gb/s) data rates found in the internet core.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 44
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: The Booth multiplier has been widely used for high-performance signed multiplication by encoding and thereby reducing the number of partial products. A multiplier using the radix-4 (or modified Booth) algorithm is very efficient due to the ease of partial product generation, whereas the radix-8 Booth multiplier is slow due to the complexity of generating the odd multiples of the multiplicand. In this paper, this issue is alleviated by the application of approximate designs. An approximate 2-bit adder is deliberately designed for calculating the sum of $1\times$ and $2\times$ of a binary number. This adder requires a small area, a low power and a short critical path delay. Subsequently, the 2-bit adder is employed to implement the less significant section of a recoding adder for generating the triple multiplicand with no carry propagation. In the pursuit of a trade-off between accuracy and power consumption, two signed $16\times 16$-bit approximate radix-8 Booth multipliers are designed using the approximate recoding adder with and without the truncation of a number of less significant bits in the partial products. The proposed approximate multipliers are faster and more power efficient than the accurate Booth multiplier. The multiplier with 15-bit truncation achieves the best overall performance in terms of hardware and accuracy when compared to other approximate Booth multiplier designs. Finally, the approximate multipliers are applied to the design of a low-pass FIR filter and they show better performance than other approximate Booth multipliers. (An illustrative radix-8 recoding sketch follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
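    To make the recoding concrete, here is a hedged Python sketch of standard radix-8 Booth recoding (digit set {-4, ..., 4}); the triple() helper highlights the 3x = 1x + 2x term that the paper forms with a partly approximate recoding adder, whereas the addition here is exact and the function names are illustrative.
      # Radix-8 Booth recoding and multiplication (exact reference version).
      def booth8_digits(y, nbits):
          # recode the two's-complement value y (nbits wide) into radix-8 digits
          bit = lambda i: 0 if i < 0 else (y >> min(i, nbits - 1)) & 1
          return [-4 * bit(j + 2) + 2 * bit(j + 1) + bit(j) + bit(j - 1)
                  for j in range(0, nbits, 3)]

      def triple(x):
          # 3x = 1x + 2x; the paper's approximate 2-bit adder targets the low
          # bits of this sum, while here the addition is exact.
          return x + (x << 1)

      def booth8_multiply(x, y, nbits=16):
          multiples = {0: 0, 1: x, 2: 2 * x, 3: triple(x), 4: 4 * x}
          product = 0
          for j, d in enumerate(booth8_digits(y, nbits)):
              pp = multiples[abs(d)] if d >= 0 else -multiples[abs(d)]
              product += pp << (3 * j)
          return product

      assert booth8_multiply(123, -45) == 123 * -45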
  • 45
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: Solid State Drives (SSDs) have been extensively deployed as the cache of hard disk-based storage systems. An SSD-based cache generally supplies ultra-large capacity, but managing so large a cache introduces excessive memory overhead, which in turn makes the SSD-based cache neither cost-effective nor energy-efficient. This work aims to reduce the memory overhead introduced by the replacement policy of an SSD-based cache. Traditionally, the data structures involved in a cache replacement policy reside in main memory. However, these in-memory data structures are no longer suitable for an SSD-based cache, since the cache is much larger than before. We propose a memory-efficient framework which keeps most data structures in the SSD while leaving only a memory-efficient data structure (i.e., a new Bloom-filter-based structure proposed in this work) in main memory. Our framework can be used to implement any LRU-based replacement policy with negligible memory overhead. We evaluate our proposals via theoretical analysis and a prototype implementation. Experimental results demonstrate that our framework is practical for implementing most replacement policies for large caches, and is able to reduce the memory overhead by about $10\times$. (A textbook Bloom filter sketch follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
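    As a point of reference for the in-memory structure mentioned above, the snippet below is a textbook Bloom filter; the paper proposes its own memory-efficient variant, which this sketch does not reproduce.
      # Standard Bloom filter: compact set membership with one-sided error.
      import hashlib

      class BloomFilter:
          def __init__(self, nbits, nhashes):
              self.nbits, self.nhashes = nbits, nhashes
              self.bits = bytearray((nbits + 7) // 8)

          def _positions(self, key):
              for i in range(self.nhashes):
                  h = hashlib.sha256(f"{i}:{key}".encode()).digest()
                  yield int.from_bytes(h[:8], "big") % self.nbits

          def add(self, key):
              for p in self._positions(key):
                  self.bits[p // 8] |= 1 << (p % 8)

          def might_contain(self, key):
              # False means definitely absent; True may be a false positive.
              return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

      bf = BloomFilter(nbits=8192, nhashes=4)
      bf.add("lba:42")
      assert bf.might_contain("lba:42")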
  • 46
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: A large portion of existing multithreaded embedded software has been programmed for symmetric shared-memory platforms, where a monolithic memory block is shared by all cores. Such platforms accommodate popular parallel programming models such as POSIX threads and OpenMP. However, with the growing number of cores in modern manycore embedded architectures, they present a bottleneck related to their centralized memory accesses. This paper proposes a solution tailored for the efficient execution of applications defined with shared-memory programming models onto on-chip distributed-memory multicore architectures. It shows how performance, area and energy consumption are significantly improved thanks to the scalability of these architectures. This is illustrated in an open-source realistic design framework, including tools ranging from ASIC to microkernel.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 47
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: This paper describes a procedure that computes seeds for LFSR-based generation of partially-functional broadside tests. Existing LFSR-based test data compression methods compute seeds based on incompletely-specified test cubes. Functional broadside tests are fully-specified, and they have fully-specified scan-in states. This is the main challenge that the test generation procedure described in this paper needs to address. It does so by using a process that modifies an initial seed $s_i$ in order to reduce the Hamming distance between the scan-in state $p_i$ that $s_i$ creates and a reachable state $r_j$. When the Hamming distance is reduced to zero, the seed can be used for generating functional broadside tests. When the distance is larger than zero, the tests are partially-functional. Experimental results are presented for transition faults in benchmark circuits to demonstrate the resulting distances and fault coverage. (An illustrative seed-tuning sketch follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
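    The following is a hedged illustration of the seed-tuning idea described above: a simple Fibonacci LFSR expands a seed into a scan-in state, and greedy single-bit flips on the seed are accepted whenever they reduce the Hamming distance to a target reachable state. The feedback taps and the greedy loop are invented for illustration and are not the paper's procedure.
      # Expand an LFSR seed and greedily reduce the Hamming distance to a target state.
      def lfsr_expand(seed_bits, length, taps=(0, 2)):
          state = list(seed_bits)
          out = []
          for _ in range(length):
              out.append(state[-1])                 # shift one bit into the scan chain
              fb = 0
              for t in taps:
                  fb ^= state[t]                    # XOR feedback
              state = [fb] + state[:-1]
          return out

      def hamming(a, b):
          return sum(x != y for x, y in zip(a, b))

      def tune_seed(seed_bits, reachable_state, scan_len):
          best = list(seed_bits)
          best_d = hamming(lfsr_expand(best, scan_len), reachable_state)
          improved = True
          while improved and best_d > 0:
              improved = False
              for i in range(len(best)):
                  cand = best[:]
                  cand[i] ^= 1                      # try flipping one seed bit
                  d = hamming(lfsr_expand(cand, scan_len), reachable_state)
                  if d < best_d:
                      best, best_d, improved = cand, d, True
          return best, best_d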
  • 48
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-07-08
    Description: A new apparatus for fast multiplication of two numbers is introduced. Inputs are split into partitions, and one number is replaced by two numbers with zeros interlaced in every other partition. Products are computed with no carries between partitions, in the time required to multiply the short partitions and add the partial sums. Component adders and multipliers can be chosen to trade off area and speed. A new graphical tool is used to compare this multiplier to existing ones based on CMOS VLSI simulations. (A small numeric illustration of the interlaced splitting follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
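    A small numeric check of the splitting idea described above, under an assumed partition width and illustrative operand values: one operand is decomposed into two numbers whose nonzero partitions alternate with zero partitions, and the product is recovered as the sum of the two sub-products.
      # Split operand b (partition width w) into interlaced halves and verify
      # a*b == a*b_even + a*b_odd; widths and values are illustrative.
      def split_interlaced(b, w, nparts):
          mask = (1 << w) - 1
          b_even = b_odd = 0
          for k in range(nparts):
              part = (b >> (k * w)) & mask          # extract partition k
              if k % 2 == 0:
                  b_even |= part << (k * w)
              else:
                  b_odd |= part << (k * w)
          return b_even, b_odd

      a, b, w, nparts = 0xBEEF, 0xCAFE, 4, 4
      b_even, b_odd = split_interlaced(b, w, nparts)
      assert b == b_even + b_odd
      assert a * b == a * b_even + a * b_odd        # halves multiply independently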
  • 49
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Covert channels are widely considered as a major risk of information leakage in various operating systems, such as desktop, cloud, and mobile systems. The existing works of modeling covert channels have mainly focused on using finite state machines (FSMs) and their transforms to describe the process of covert channel transmission. However, a FSM is rather an abstract model, where information about the shared resource, synchronization, and encoding/decoding cannot be presented in the model, making it difficult for researchers to realize and analyze the covert channels. In this paper, we use the high-level Petri Nets (HLPN) to model the structural and behavioral properties of covert channels. We use the HLPN to model the classic covert channel protocol. Moreover, the results from the analysis of the HLPN model are used to highlight the major shortcomings and interferences in the protocol. Furthermore, we propose two new covert channel models, namely: (a) two channel transmission protocol (TCTP) model and (b) self-adaptive protocol (SAP) model. The TCTP model circumvents the mutual inferences in encoding and synchronization operations; whereas the SAP model uses sleeping time and redundancy check to ensure correct transmission in an environment with strong noise. To demonstrate the correctness and usability of our proposed models in heterogeneous environments, we implement the TCTP and SAP in three different systems: (a) Linux, (b) Xen, and (c) Fiasco.OC. Our implementation also indicates the practicability of the models in heterogeneous, scalable and flexible environments.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 50
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Recovery from sudden power-off (SPO) is one of the primary concerns among practitioners which bars the quick and wide deployment of flash storage devices. In this work, we propose Metadata Embedded Write (MEW), a novel scheme for handling the sudden power-off recovery in modern flash storage devices. Given that a large fraction of commercial SSDs employ compression technology, MEW exploits the compression-induced internal fragmentation in the data area to store rich metadata for fast and complete recovery. MEW consists of (i) a metadata embedding scheme to harbor SSD metadata in a physical page together with multiple compressed logical pages, (ii) an allocation chain based fast recovery scheme, and (iii) a light-weight metadata logging scheme which enables MEW to maintain the metadata for incompressible data, too. We performed extensive experiments to examine the performance of MEW. The performance overhead of MEW is 3 percent in the worst case, in terms of the write amplification factor, compared to the pure compression-based FTL that does not have any recovery scheme.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 51
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Scaling CMOS devices deep into the nanometer range reduces their reliability margins significantly. Consequently, accurately calculating the reliability of digital nanocircuits is becoming a necessity for investigating design alternatives to optimize the trade-offs between area-power-delay and reliability. However, accurate reliability calculation of large and highly connected circuits is complex and very time consuming. This paper proposes a progressive consensus-based algorithm for identifying the worst-reliability input vectors and the associated critical logic gates. Improving the reliability of the critical gates helps circuit designers to effectively improve the overall circuit reliability while having a minimal impact on the traditional power-area-delay design parameters. The accuracy and efficiency of the algorithm can be tuned to fit a variety of applications. The algorithm scales well with circuit size, and is independent of the interconnect complexity and the logic depth. Extensive computational results show that the accuracy and the efficiency of the proposed algorithm are better than the most recent results reported in the literature.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 52
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: The serial-out bit-level multiplication scheme is characterized by an important latency feature: it can sequentially generate one output bit of the multiplication result in each clock cycle. However, the computational complexity of existing serial-out bit-level multipliers in $GF(2^m)$ using the normal basis representation limits their usefulness in many applications; hence, an optimized serial-out bit-level multiplier using the polynomial basis representation is needed. In this paper, we propose new serial-out bit-level Mastrovito multiplier schemes. We show that in terms of time complexity, the proposed multiplier schemes outperform the existing serial-out bit-level schemes available in the literature. In addition, using the proposed multiplier schemes, we present new hybrid-double multiplication architectures. To the best of our knowledge, this is the first time such a hybrid multiplier structure using the polynomial basis has been proposed. Prototypes of the presented serial-out bit-level schemes and the proposed hybrid-double multiplication architectures (10 schemes in total) are implemented over both $GF(2^{163})$ and $GF(2^{233})$, and experimental results are presented. (A generic polynomial-basis $GF(2^m)$ multiplication sketch follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
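    For readers unfamiliar with polynomial-basis arithmetic, the fragment below is a generic shift-and-add $GF(2^m)$ multiplication with modular reduction; it is a plain reference, not the serial-out Mastrovito formulation of the paper, and the example field $GF(2^8)$ with the irreducible polynomial $x^8+x^4+x^3+x+1$ is chosen only for brevity.
      # Polynomial-basis GF(2^m) multiplication, MSB-first shift-and-add.
      def gf_mul(a, b, m=8, red_poly=0x11B):
          result = 0
          for i in range(m - 1, -1, -1):       # consume b bit-serially, MSB first
              result <<= 1
              if result & (1 << m):            # reduce modulo the field polynomial
                  result ^= red_poly
              if (b >> i) & 1:
                  result ^= a                  # conditional addition (XOR) of a
          return result

      # sanity check in GF(2^8): x * x^7 = x^8 = x^4 + x^3 + x + 1 (0x1B)
      assert gf_mul(0x02, 0x80) == 0x1B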
  • 53
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Though a cooperative broadcast scheme has been proposed for fading environments, it has two defects: first, it only handles a packet flow from a single source node in the network, and does not consider the scenario of multiple packet flows simultaneously broadcast from different source nodes; second, it only allows a single relay node to forward a packet in each time slot, although multiple relay nodes forwarding in a time slot can significantly reduce broadcast latency. In this paper, we aim to achieve low-latency multi-flow broadcast in wireless multi-hop networks with fading channels. To describe the interference among the transmissions of different flows, we incorporate the Rayleigh fading model into the signal-to-noise ratio (SNR) model. Then, we introduce a cooperative diversity scheme which allows multiple relays to forward in a time slot to reduce broadcast latency. We then formulate an interesting problem: in a fading environment, what is the optimal relay allocation schedule that minimizes the broadcast latency? We propose a warm-up heuristic algorithm for single-flow cooperative broadcast, based on which we further propose a heuristic algorithm for multi-flow cooperative broadcast. Simulation results demonstrate that the two algorithms achieve lower broadcast latency than a previous method.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 54
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: On modern multicore machines, the memory management typically combines address interleaving in hardware and random allocation in the operating system (OS) to improve performance of both memory and cache. The conventional solutions, however, are increasingly strained as a wide variety of workloads run on complicated memory hierarchy and cause contention at multiple levels. We describe a new framework (named HVR) in OS memory management to support a flexible policy space for tackling diverse application needs, integrating vertical partitioning across layers, horizontal partitioning and random-interleaved allocation at a single layer. We exhaustively study the performance of these policies for over 2,000 workloads and correlate performance with application characteristics. Based on this correlation we derive several practical rules of memory allocation that we integrate into the unified HVR framework to guide resource partitioning and sharing for dynamic and diverse workloads. We implement our approach in Linux kernel 2.6.32 as a restructured page indexing system plus a series of kernel modules. Experimental results show that our framework consistently outperforms the unmodified Linux kernel, with up to 21 percent performance gains, and outperforms prior solutions at individual levels of the memory hierarchy.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 55
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: With the explosive growth in data volume, the I/O bottleneck has become an increasingly daunting challenge for big data analytics in the Cloud. Recent studies have shown that moderate to high data redundancy clearly exists in primary storage systems in the Cloud. Our experimental studies reveal that data redundancy exhibits a much higher level of intensity on the I/O path than that on disks due to relatively high temporal access locality associated with small I/O requests to redundant data. Moreover, directly applying data deduplication to primary storage systems in the Cloud will likely cause space contention in memory and data fragmentation on disks. Based on these observations, we propose a performance-oriented I/O deduplication, called POD, rather than a capacity-oriented I/O deduplication, exemplified by iDedup, to improve the I/O performance of primary storage systems in the Cloud without sacrificing capacity savings of the latter. POD takes a two-pronged approach to improving the performance of primary storage systems and minimizing performance overhead of deduplication, namely, a request-based selective deduplication technique, called Select-Dedupe, to alleviate the data fragmentation and an adaptive memory management scheme, called iCache, to ease the memory contention between the bursty read traffic and the bursty write traffic. We have implemented a prototype of POD as a module in the Linux operating system. The experiments conducted on our lightweight prototype implementation of POD show that POD significantly outperforms iDedup in the I/O performance measure by up to 87.9 percent with an average of 58.8 percent. Moreover, our evaluation results also show that POD achieves comparable or better capacity savings than iDedup.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 56
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: This manuscript proposes three classes of codes for error correction in a storage system in which the memory cells do not have the same number of levels, i.e., a multiscale storage. The proposed codes are single multiscale-symbol error correction (SMSEC) codes and are capable of correcting any errors occurring on a single memory cell, namely a column-deleted SMSEC code, an element-compacted SMSEC code and a product SMSEC code. In the proposed codes, the codewords are divided into two partitions: the elements of the first partition are over $GF(2^{b_1})$, while those of the remaining partition are over $GF(2^{b_2})$. This paper also gives guidelines for selecting among the three SMSEC codes to meet the desired hardware overhead in the parallel decoder for realistic parameters of the partition pair, such as $(b_1, b_2) = (4,3)$, $(4,2)$ and $(3,2)$. Moreover, it is shown that the best choice for a multiscale storage (MSS) system is the SMSEC code with the shortest check bit length; if the check bit lengths of at least two codes are equal, then the use of the element-compacted SMSEC code incurs the smallest hardware overhead.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 57
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: Multi-cloud storage can provide better features such as availability and scalability. Current works use multiple cloud storage providers with erasure coding to achieve certain benefits, including improving fault tolerance or avoiding vendor lock-in. However, these works only use multi-cloud storage in ad-hoc ways, and none of them considers the optimization issue in general. In fact, the key to optimizing multi-cloud storage is to effectively choose providers and erasure coding parameters. Meanwhile, the data placement should satisfy system or application developers' requirements. As developers often demand that various objectives be optimized simultaneously, such complex requirement optimization cannot be easily fulfilled in ad-hoc ways. This paper presents Triones, a systematic model that formally formulates data placement in multi-cloud storage using erasure coding. First, Triones addresses the problem of data placement optimization by applying non-linear programming and geometric space abstraction. It can satisfy complex requirements involving multi-objective optimization. Second, Triones can effectively balance different objectives in optimization and is scalable to incorporate new ones. The effectiveness of the model is proved by extensive experiments on multiple cloud storage providers in the real world. For simple requirements, Triones can achieve a 50 percent access latency reduction compared with the model in $\mu$LibCloud. For complex requirements, Triones can improve the fault-tolerance level by $2\times$ and reduce access latency and vendor lock-in level by 30 to 70 percent and 49.85 percent respectively, at about 19.19 percent more cost, compared with the model that only optimizes cost in Scalia.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 58
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016-05-10
    Description: In this paper, we propose a two-factor data security protection mechanism with factor revocability for cloud storage system. Our system allows a sender to send an encrypted message to a receiver through a cloud storage server. The sender only needs to know the identity of the receiver but no other information (such as its public key or its certificate). The receiver needs to possess two things in order to decrypt the ciphertext. The first thing is his/her secret key stored in the computer. The second thing is a unique personal security device which connects to the computer. It is impossible to decrypt the ciphertext without either piece. More importantly, once the security device is stolen or lost, this device is revoked. It cannot be used to decrypt any ciphertext. This can be done by the cloud server which will immediately execute some algorithms to change the existing ciphertext to be un-decryptable by this device. This process is completely transparent to the sender. Furthermore, the cloud server cannot decrypt any ciphertext at any time. The security and efficiency analysis show that our system is not only secure but also practical.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 59
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: File replication is widely used in structured P2P systems to avoid hot spots in servers and enhance file availability. The number of replicas and replication distance affect the file replication cost. These two elements and the replica update frequency determined in the file replication stage also affect the cost of subsequent consistency maintenance. However, most existing file replication protocols focus on improving file lookup efficiency without considering its cost and its subsequent influence on consistency maintenance. This paper studies the problem about how a server chooses files to replicate and where to replicate files to achieve low cost in both file replication and consistency maintenance stages without compromising the effectiveness of file replication. This paper presents a lightweight and Cooperative multifactOr considered file Replication Protocol (CORP) to achieve this goal. CORP simultaneously takes into account multiple factors including file popularity, update rate, node available capacity, file load, and node locality, aiming to minimize the number of replicas, update frequency, and replication distance. CORP also dynamically adjusts the number of replicas based on ever-changing file popularity and visit pattern. Extensive experimental results from simulation and PlanetLab real-world testbed demonstrate the efficiency and effectiveness of CORP in comparison with other file replication protocols. It dramatically reduces the overhead of both file replication and consistency maintenance. In addition, it exhibits high adaptiveness to skewed lookups and yields significant improvement in reducing overloaded nodes. Specifically, compared to the other replication protocols, CORP can reduce more than 71 percent of file replicas, 84 percent of overloaded nodes, 94 percent of consistency maintenance cost, and 72 percent of file replication and consistency maintenance latency.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 60
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: Power consumption has become a limiting factor in designing next-generation network routers. Recent observations show that IP lookup engines dominate the power consumption of core routers. Previous work on reducing the power consumption of routers mainly focused on network- and system-level optimizations. This paper represents the first thorough study of data structure optimization for lowering the power consumption of static random access memory (SRAM)-based IP lookup engines. Three different SRAM-based IP lookup architectures are discussed: nonpipelined, simple pipelined, and memory-balanced pipelined architectures. For each architecture, we formulate the problem of power minimization by revisiting the time-space tradeoff in multibit tries. Two distinct multibit trie algorithms are investigated: the expanded trie and the tree bitmap trie, which are widely used in SRAM-based IP lookup solutions. A theoretical framework is proposed to determine the optimal strides for building a multibit trie so that the worst-case power consumption of the IP lookup architecture is minimized. Experiments using real-life routing tables including both IPv4 and IPv6 data sets demonstrate that careful selection of strides in building the multibit tries can reduce the power consumption dramatically. We believe our methodology can be applied to other variants of multibit tries and can help in designing more power-efficient SRAM-based IP lookup architectures. (An illustrative fixed-stride multibit trie sketch follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
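    To make the trie structure concrete, here is a hedged sketch of a fixed-stride expanded (multibit) trie with longest-prefix matching; the strides below are arbitrary, whereas the paper selects strides to minimize worst-case power, and all names are illustrative.
      # Fixed-stride multibit trie with controlled prefix expansion (illustrative).
      class MultibitTrie:
          def __init__(self, strides):
              self.strides = strides                               # e.g., [8, 8, 8, 8]
              self.root = self._new_node(strides[0])

          @staticmethod
          def _new_node(stride):
              # each entry: [next_hop, matched_prefix_len, child]
              return {"stride": stride,
                      "entries": [[None, -1, None] for _ in range(1 << stride)]}

          def insert(self, prefix_bits, plen, next_hop):
              node, consumed = self.root, 0
              for level, stride in enumerate(self.strides):
                  if plen <= consumed + stride:
                      rest = plen - consumed                       # prefix ends at this level
                      head = prefix_bits & ((1 << rest) - 1)       # remaining prefix bits
                      for tail in range(1 << (stride - rest)):     # expand over the level
                          e = node["entries"][(head << (stride - rest)) | tail]
                          if plen > e[1]:                          # keep the longest prefix
                              e[0], e[1] = next_hop, plen
                      return
                  chunk = (prefix_bits >> (plen - consumed - stride)) & ((1 << stride) - 1)
                  e = node["entries"][chunk]
                  if e[2] is None:
                      e[2] = self._new_node(self.strides[level + 1])
                  node, consumed = e[2], consumed + stride

          def lookup(self, addr_bits, addr_len=32):
              node, consumed, best = self.root, 0, None
              while node is not None:
                  stride = node["stride"]
                  chunk = (addr_bits >> (addr_len - consumed - stride)) & ((1 << stride) - 1)
                  e = node["entries"][chunk]
                  if e[0] is not None:
                      best = e[0]                                  # best match so far
                  node, consumed = e[2], consumed + stride
              return best

      t = MultibitTrie([8, 8, 8, 8])
      t.insert(0xC0A80000 >> 16, 16, "hopA")   # 192.168.0.0/16
      t.insert(0xC0A80100 >> 8, 24, "hopB")    # 192.168.1.0/24
      assert t.lookup(0xC0A80105) == "hopB"
      assert t.lookup(0xC0A8FF01) == "hopA"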
  • 61
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: As new transport protocols are being proposed and standardized, choosing the best communication service for applications to deliver their data when distributed is becoming increasingly complex. Application developers need considerable knowledge of how a protocol works to decide whether or not it can be used to fulfill their requirements. Moreover, the performance of the service provided by a given communication protocol is highly dependent on the network context. The Autonomic Transport Protocol presented in this paper is aware of the application requirements and uses learning techniques to adapt the service it provides to best satisfy these requirements as the network conditions vary.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 62
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: Hardware Trojan attack in the form of malicious modification of a design has emerged as a major security threat. Side-channel analysis has been investigated as an alternative to conventional logic testing to detect the presence of hardware Trojans. However, these techniques suffer from decreased sensitivity toward small Trojans, especially because of the large process variations present in modern nanometer technologies. In this paper, we propose a novel noninvasive, multiple-parameter side-channel analysis-based Trojan detection approach. We use the intrinsic relationship between dynamic current and maximum operating frequency of a circuit to isolate the effect of a Trojan circuit from process noise. We propose a vector generation approach and several design/test techniques to improve the detection sensitivity. Simulation results with two large circuits, a 32-bit integer execution unit (IEU) and a 128-bit advanced encryption standard (AES) cipher, show a detection resolution of 1.12 percent amidst $\pm 20$ percent parameter variations. The approach is also validated with experimental results. Finally, the use of a combined side-channel analysis and logic testing approach is shown to provide high overall detection coverage for hardware Trojan circuits of varying types and sizes.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 63
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: Public-key encryption with keyword search (PEKS) is a versatile tool. It allows a third party knowing the search trapdoor of a keyword to search encrypted documents containing that keyword without decrypting the documents or knowing the keyword. However, it is shown that the keyword will be compromised by a malicious third party under a keyword guess attack (KGA) if the keyword space is in a polynomial size. We address this problem with a keyword privacy enhanced variant of PEKS referred to as public-key encryption with fuzzy keyword search (PEFKS). In PEFKS, each keyword corresponds to an exact keyword search trapdoor and a fuzzy keyword search trapdoor. Two or more keywords share the same fuzzy keyword trapdoor. To search encrypted documents containing a specific keyword, only the fuzzy keyword search trapdoor is provided to the third party, i.e., the searcher. Thus, in PEFKS, a malicious searcher can no longer learn the exact keyword to be searched even if the keyword space is small. We propose a universal transformation which converts any anonymous identity-based encryption (IBE) scheme into a secure PEFKS scheme. Following the generic construction, we instantiate the first PEFKS scheme proven to be secure under KGA in the case that the keyword space is in a polynomial size.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 64
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: This paper presents techniques for low-power addition/subtraction in the logarithmic number system (LNS) and quantifies their impact on digital filter VLSI implementation. The impact of partitioning the look-up tables required for LNS addition/subtraction on the complexity, performance, and power dissipation of the corresponding circuits is quantified. Two design parameters are exploited to minimize complexity, namely the LNS base and the organization of the LNS word. A roundoff noise model is used to demonstrate the impact of base and word length on the signal-to-noise ratio of the output of finite impulse response (FIR) filters. In addition, techniques for the low-power implementation of LNS multiply-accumulate (MAC) units are investigated. Furthermore, it is shown that the proposed techniques can be extended to cotransformation-based circuits that employ interpolators. The results are demonstrated by evaluating the power dissipation, complexity and performance of several FIR filter configurations comprising one, two or four MAC units. Simulations of placed and routed VLSI LNS-based digital filters using a 90-nm 1.0 V CMOS standard-cell library reveal that significant power dissipation savings are possible by using optimized LNS circuits at no performance penalty, when compared to linear fixed-point two's-complement equivalents. (A reference formulation of LNS addition/subtraction follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
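    As background for the record above, the snippet below gives the standard reference formulation of LNS addition/subtraction; the correction term that hardware implementations read from lookup tables is simply computed directly here, so this is not the paper's table-partitioned circuit, and the names are illustrative.
      # LNS add/subtract: given X = log_b|x| and Y = log_b|y|, return the
      # logarithm of |x|+|y| (add) or of the absolute difference (subtract).
      import math

      def lns_add(X, Y, base=2.0, subtract=False):
          X, Y = max(X, Y), min(X, Y)
          d = X - Y                                  # only the difference is needed
          if subtract:
              if d == 0:
                  raise ValueError("result is zero and has no LNS representation")
              return X + math.log(1.0 - base ** (-d), base)
          return X + math.log(1.0 + base ** (-d), base)

      # check against a linear-domain computation
      x, y, b = 13.5, 4.25, 2.0
      X, Y = math.log(x, b), math.log(y, b)
      assert abs(b ** lns_add(X, Y, b) - (x + y)) < 1e-9
      assert abs(b ** lns_add(X, Y, b, subtract=True) - (x - y)) < 1e-9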
  • 65
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: NAND flash-based storage device is becoming a viable storage solution for mobile and desktop systems. Because of the erase-before-write nature, flash-based storage devices require garbage collection that causes significant performance degradation, incurring a large number of page migrations and block erasures. To improve I/O performance, therefore, it is important to develop an efficient garbage collection algorithm. In this paper, we propose a novel garbage collection technique, called buffer-aware garbage collection (BAGC), for flash-based storage devices. The BAGC improves the efficiency of two main steps of garbage collection, a block merge step and a victim block selection step, by taking account of the contents of a buffer cache, which is typically used to enhance I/O performance. The buffer-aware block merge (BABM) scheme eliminates unnecessary page migrations by evicting dirty data from a buffer cache during a block merge step. The buffer-aware victim block selection (BAVBS) scheme, on the other hand, selects a victim block so that the benefit of the buffer-aware block merge is maximized. Our experimental results show that BAGC improves I/O performance by up to 43 percent over existing buffer-unaware schemes for various benchmarks.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 66
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: Workflow-based workloads usually consist of multiple instances of the same workflow, which are jobs with control or data dependencies that carry out a well-defined scientific computation task, with each instance acting on its own input data. To maximize performance, a high degree of concurrency is usually achieved by running multiple instances simultaneously. However, since the amount of storage is limited on most systems, deadlock due to oversubscribed storage requests is a potential problem. To address this problem, we integrate two novel concepts with the traditional problem of deadlock avoidance by proposing two algorithms that maximize active (not just allocated) resource utilization and minimize makespan. Our approach is based on the well-known banker's algorithm, but our algorithms make the important distinction between active and inactive resources, which is not part of previous approaches. The central idea is to leverage the data-flow information to dynamically approximate a localized maximum claim (i.e., the resource requirements of the remaining jobs of the instance) in order to improve either interinstance or intrainstance concurrency and still avoid deadlock. Through simulation-based studies, we show how our proposed algorithms are better than the classic banker's algorithm and the more recent Lang's algorithm in terms of makespan and active storage resource utilization. (A sketch of the classic banker's safety check follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
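    For reference, the sketch below is the classic single-resource banker's safety check that the abstract says is extended with an active/inactive distinction and localized maximum claims; the extension itself is not reproduced here, and the names are illustrative.
      # Classic banker's safety check over one resource type (storage units).
      def is_safe(free, allocated, claims):
          # claims[i] is the worst-case additional demand of instance i
          held, need = list(allocated), list(claims)
          finished = [False] * len(need)
          while True:
              progressed = False
              for i in range(len(need)):
                  if not finished[i] and need[i] <= free:
                      free += held[i]            # instance i finishes and releases storage
                      finished[i] = True
                      progressed = True
              if all(finished):
                  return True                    # a safe completion order exists
              if not progressed:
                  return False                   # remaining instances could deadlock

      # grant a storage request only if the post-grant state is still safe
      assert is_safe(free=3, allocated=[2, 3], claims=[2, 4]) is True
      assert is_safe(free=1, allocated=[2, 3], claims=[2, 4]) is False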
  • 67
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: The security of today's networks relies heavily on network intrusion detection systems (NIDSs). The ability to promptly update the supported rule sets and detect new emerging attacks makes field-programmable gate arrays (FPGAs) a very appealing technology. An important issue is how to scale FPGA-based NIDS implementations to ever faster network links. Whereas a trivial approach is to balance traffic over multiple, but functionally equivalent, hardware blocks, each implementing the whole rule set (several thousand rules), the obvious drawback is the linear increase in resource occupation. In this work, we promote a different, traffic-aware, modular approach to the design of FPGA-based NIDSs. Instead of purely splitting traffic across equivalent modules, we classify and group homogeneous traffic, and dispatch it to differently capable hardware blocks, each supporting a (smaller) rule set tailored to the specific traffic category. We implement and validate our approach using the rule set of the well-known Snort NIDS, and we experimentally investigate the emerging trade-offs and advantages, showing resource savings of up to 80 percent based on real-world traffic statistics gathered from an operator's backbone.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 68
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: Reliability evaluation of interconnection networks is important to the design and maintenance of multiprocessor systems. Extra connectivity determination and the structural analysis of faulty networks are two important aspects of the reliability evaluation of interconnection networks. An $n$-dimensional bijective connection network (in brief, BC network), denoted by $X_n$, is an $n$-regular graph with $2^n$ vertices and $n2^{n-1}$ edges. The hypercubes, Möbius cubes, crossed cubes, and twisted cubes are some examples of BC networks. By exploring the boundary problem of BC networks, we prove that for $n\ge 4$ and $0\le h\le n-4$ the $h$-extra connectivity of an $n$-dimensional BC network $X_n$ is $\kappa_{h}(X_n)=n(h+1)-\frac{1}{2}h(h+3)$. Furthermore, there exists a large connected component and the remaining small components have at most $h$ vertices in total if the total number of faulty vertices is strictly less than the $h$-extra connectivity. As an application, the results on the $h$-extra connectivity and the structure of faulty networks are obtained for hypercubes, Möbius cubes, crossed cubes, and twisted cubes. (A direct evaluation of the stated formula follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
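    The stated bound can be evaluated directly; the helper below simply encodes the formula from the abstract and checks a few small values for n = 6.
      # h-extra connectivity of an n-dimensional BC network, per the stated formula.
      def extra_connectivity(n, h):
          assert n >= 4 and 0 <= h <= n - 4
          return n * (h + 1) - h * (h + 3) // 2      # h*(h+3) is always even

      # e.g., n = 6 (such as the 6-dimensional hypercube): h = 0, 1, 2
      assert [extra_connectivity(6, h) for h in range(3)] == [6, 10, 13]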
  • 69
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: New applications based on cloud computing, such as data synchronization for large chain department stores and bank transaction records, require very high-speed data transport. Although a number of high-bandwidth networks have been built, existing transport protocols or their variants over such networks cannot fully exploit the network bandwidth. Our experiments show that the fixed-size application-level buffer employed on the receiver side is a major cause of this deficiency. A buffer that is either too small or too large impairs the transfer performance. Due to the varying nature of network conditions and of the real-time packet processing (i.e., consuming) speed at the receiver, it is important to ensure that the buffer size is dynamically adjusted according to the perceived execution situation during runtime. In this paper, we propose Rada, a dynamic receiving buffer adaptation scheme for high-speed data transfer. Rada employs an exponential-moving-average-aided scheme to quantify the data arrival rate and consumption rate in the buffer. Based on these two rates, we develop a linear aggressive-increase conservative-decrease scheme to adjust the buffer size dynamically. Moreover, a weighted mean function is employed to make the adjustment adaptive to the available memory in the receiver. Theoretical analysis is provided to demonstrate the rationale and parameter bounds of Rada. The performance of Rada is also theoretically compared with potential alternatives. We implement Rada on a Linux platform and extensively evaluate its performance in a variety of scenarios. Experimental results conform to the theoretical results, and show that Rada outperforms the static buffer scheme in terms of throughput, memory footprint, and fairness. (A sketch of the described control loop follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
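    The control loop described above can be pictured with the hedged sketch below: exponentially smoothed arrival and consumption rates drive a linear aggressive-increase / conservative-decrease adjustment weighted by the receiver's free memory. All constants and names are invented for illustration and are not Rada's actual parameters.
      # EMA-driven buffer adaptation (illustrative constants).
      class BufferAdapter:
          def __init__(self, size, alpha=0.2, inc_step=4096, dec_step=1024):
              self.size = size
              self.alpha = alpha                        # EMA smoothing factor
              self.inc_step, self.dec_step = inc_step, dec_step
              self.arrival = self.consume = 0.0         # smoothed rates (bytes/s)

          def update(self, arrived, consumed, interval, free_mem):
              self.arrival = self.alpha * (arrived / interval) + (1 - self.alpha) * self.arrival
              self.consume = self.alpha * (consumed / interval) + (1 - self.alpha) * self.consume
              weight = min(1.0, free_mem / (10.0 * self.size))   # temper growth by free memory
              if self.arrival > self.consume:
                  self.size += int(self.inc_step * weight)       # aggressive increase
              elif self.arrival < self.consume:
                  self.size = max(self.dec_step, self.size - self.dec_step)  # conservative decrease
              return self.size

      buf = BufferAdapter(size=64 * 1024)
      buf.update(arrived=5_000_000, consumed=4_000_000, interval=1.0, free_mem=512 * 1024 * 1024)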
  • 70
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: This paper introduces SymPLFIED, a program-level framework that allows specification of arbitrary error detectors and the verification of their efficacy against hardware errors. SymPLFIED comprehensively enumerates all transient hardware errors in registers, memory, and computation (expressed symbolically as value errors) that potentially evade detection and cause program failure. The framework uses symbolic execution to abstract the state of erroneous values in the program and model checking to comprehensively find all errors that evade detection. We demonstrate the use of SymPLFIED on a widely deployed aircraft collision avoidance application, tcas. Our results show that the SymPLFIED framework can be used to uncover hard-to-detect catastrophic cases caused by transient errors in programs that may not be exposed by random fault injection-based validation. Further, the errors exposed by the framework help us formulate a set of error detectors for the application to avoid the catastrophic case and other incorrect outcomes.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 71
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: In delay tolerant networks (DTNs), the lack of continuous connectivity, network partitioning, and long delays make design of network protocols very challenging. Previous DTN research mainly focuses on routing and information propagation. However, with a large number of wireless devices' participation, it becomes crucial regarding how to maintain efficient and dynamic topology of the DTN. In this paper, we study the topology control problem in a predictable DTN, where the time-evolving network topology is known a priori or can be predicted. We first model such time-evolving network as a directed space-time graph that includes both spacial and temporal information. The aim of topology control is to build a sparse structure from the original space-time graph such that 1) the network is still connected over time and supports DTN routing between any two nodes; 2) the total cost of the structure is minimized. We prove that this problem is NP-hard, and propose two greedy-based methods that can significantly reduce the total cost of topology while maintaining the connectivity over time. We also introduce another version of the topology control problem by requiring that the least cost path for any two nodes in this constructed structure is still cost-efficient compared with the one in the original graph. Two greedy-based methods are provided for such a problem. Simulations have been conducted on both random DTN networks and real-world DTN tracing data. Results demonstrate the efficiency of the proposed methods.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 72
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: To achieve secure group communication, one-time session keys need to be shared among group members in a secure and authenticated manner. In this paper, we propose an improved authenticated key transfer protocol based on Shamir's secret sharing. The proposed protocol achieves key confidentiality due to the security of Shamir's secret sharing and provides key authentication by broadcasting a single authentication message to all members. Furthermore, the proposed scheme resists both insider and outsider attacks. (A textbook Shamir secret-sharing sketch follows this record.)
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
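    The building block named above is Shamir's (t, n) secret sharing; the snippet below is the textbook construction over a prime field, not the full authenticated key transfer protocol, and the field modulus is chosen only for illustration.
      # Textbook Shamir secret sharing: split and reconstruct over GF(P).
      import random

      P = 2**127 - 1            # a Mersenne prime used as the field modulus

      def make_shares(secret, t, n):
          coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
          f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
          return [(x, f(x)) for x in range(1, n + 1)]

      def reconstruct(shares):
          # Lagrange interpolation at x = 0 recovers the constant term (the secret)
          secret = 0
          for i, (xi, yi) in enumerate(shares):
              num = den = 1
              for j, (xj, _) in enumerate(shares):
                  if i != j:
                      num = num * (-xj) % P
                      den = den * (xi - xj) % P
              secret = (secret + yi * num * pow(den, P - 2, P)) % P
          return secret

      shares = make_shares(secret=123456789, t=3, n=5)
      assert reconstruct(shares[:3]) == 123456789
      assert reconstruct(shares[1:4]) == 123456789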
  • 73
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: Recent cost analysis shows that the server cost still dominates the total cost of high-scale data centers or cloud systems. In this paper, we argue for a new twist on the classical resource provisioning problem: heterogeneous workloads are a fact of life in large-scale data centers, and current resource provisioning solutions do not act upon this heterogeneity. Our contributions are threefold: first, we propose a cooperative resource provisioning solution, and take advantage of differences of heterogeneous workloads so as to decrease their peak resources consumption under competitive conditions; second, for four typical heterogeneous workloads: parallel batch jobs, web servers, search engines, and MapReduce jobs, we build an agile system PhoenixCloud that enables cooperative resource provisioning; and third, we perform a comprehensive evaluation for both real and synthetic workload traces. Our experiments show that our solution could save the server cost aggressively with respect to the noncooperative solutions that are widely used in state-of-the-practice hosting data centers or cloud systems: for example, EC2, which leverages the statistical multiplexing technique, or RightScale, which roughly implements the elastic resource provisioning technique proposed in related state-of-the-art work.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 74
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: A flash translation layer (FTL) provides file systems with transparent access to NAND flash memory. Although many applications running on it require real-time guarantees, it is difficult to provide tight worst-case execution time (WCET) bounds with conventional static WCET analysis, since an FTL exhibits a large variance in execution time depending on its runtime state. Parametric WCET analysis could be an effective alternative, but it is also challenging to formulate a parametric WCET function for an FTL program because traditional FTL architectures do not properly model the runtime availability of flash resources in their code structure. To overcome this limitation, we propose a Petri net-based FTL architecture in which a Petri net explicitly specifies the dependencies between FTL operations and the runtime resource availability. It comes with an FTL operation sequencer that derives at runtime the shortest sequence of FTL operations for servicing an incoming FTL request under the current resource availability. The sequencer computes the WCET of the request by simply summing the WCETs of the FTL operations in that sequence. Our experimental results show the effectiveness of our FTL architecture: it allowed for tight WCET estimation, yielding WCETs shorter by a factor of 54 than statically analyzed ones.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 75
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: Process variations in integrated circuits have a significant impact on their performance, leakage, and stability. This is particularly evident in large, regular, and dense structures such as DRAMs. DRAMs are built using minimized transistors with presumably uniform speed in an organized array structure. Process variation, however, can introduce latency disparity among different memory arrays. With the proliferation of 3D stacking technology, DRAMs become a favorable choice for stacking on top of a multicore processor as a last-level cache offering large capacity, high bandwidth, and low power. Hence, variations in bank speed create a unique problem of nonuniform cache accesses in 3D space. In this paper, we investigate cache management techniques for tolerating process variation in a 3D DRAM stacked onto a multicore processor. We model the process variation in a four-layer DRAM memory, including the cell transistor, capacitor trench, and peripheral circuit, to characterize the latency and retention time variations among different banks. As a result, the notion of fast and slow banks from the core's standpoint is no longer associated with their physical distance from the banks; it is determined by the different bank latencies caused by process variation. We develop cache migration schemes that utilize fast banks while limiting the cost of migration. Our experiments show that there is a great performance benefit in exploiting fast memory banks through migration. On average, variation-aware management can improve the performance of a workload by 16.5 percent over the baseline, where the slowest bank speed is assumed for all banks, and comes within 0.8 percent of the performance of an ideal memory with no process variation.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 76
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-09-28
    Description: We address pairwise and (for the first time) triple key establishment problems in wireless sensor networks (WSN). Several types of combinatorial designs have already been applied to key establishment. A $BIBD(v,b,r,k,\lambda)$ (or $t$-$(v,b,r,k,\lambda)$) design can be mapped to a sensor network, where $v$ represents the size of the key pool, $b$ represents the maximum number of nodes that the network can support, and $k$ represents the size of the key chain. Any pair (or $t$-subset) of keys occurs together uniquely in exactly $\lambda$ nodes; $\lambda = 2$ and $\lambda = 3$ are used to establish unique pairwise or triple keys. We use several known constructions of designs with $\lambda = 2$ to predistribute keys in sensors. We also describe a new construction of a design called a strong Steiner trade and use it for pairwise key establishment. To the best of our knowledge, this is the first paper on the application of trades to key distribution. Our scheme is highly resilient against node capture attacks (achieved by key refreshing) and is applicable to mobile sensor networks (as key distribution is independent of the connectivity graph), while preserving low storage, computation, and communication requirements. We introduce a novel concept of triple key distribution, in which three nodes share common keys, and discuss its application in secure forwarding, detecting malicious nodes, and key management in clustered sensor networks. We present a polynomial-based and a combinatorial approach (using trades) for triple key distribution. We also extend our construction to simultaneously provide pairwise and triple key distribution, and apply it to secure data aggregation.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 77
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-04-03
    Description: This paper proposes a generalization of the square root rule for optimal periodic scheduling. The rule defines the ratio of item occurrences in a schedule that minimizes the mean serving time. However, the actual number of occurrences of each item must be an integer, so the square root rule assumes large schedules in order for the ratio to hold with acceptable precision. This paper introduces an analytically derived formula connecting the mean serving time with the size of the schedule. The relation shows that small schedules can also achieve near-optimal serving times. The analysis is validated through comparison with simulation and brute-force results. Finally, it is shown that minimizing the size of the schedule is also an efficient way of optimizing the aggregate scheduling cost. A small worked example of the square-root rule follows this record.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
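    A small worked example, under the usual idealization that each item's occurrences are evenly spaced within the cycle; the paper's exact formula relating schedule size to mean serving time is not reproduced here, and the access probabilities and the 10-slot schedule are illustrative.

      from math import sqrt

      def ideal_mean_wait(probs, slots):
          # If item i occupies slots[i] of the cycle's sum(slots) positions, evenly spaced,
          # a request for it waits on average half of its inter-occurrence gap.
          cycle = sum(slots)
          return 0.5 * sum(p * cycle / s for p, s in zip(probs, slots))

      probs = [0.6, 0.3, 0.1]                      # illustrative access probabilities

      # Real-valued square-root rule: share of the cycle proportional to sqrt(p_i).
      shares = [sqrt(p) for p in probs]
      total = sum(shares)
      shares = [s / total for s in shares]
      lower_bound = 0.5 * sum(sqrt(p) for p in probs) ** 2   # analytic optimum of the mean wait
      print("analytic optimum:", lower_bound)

      # A small integer schedule of 10 slots rounded from those shares.
      slots = [max(1, round(10 * s)) for s in shares]
      print("10-slot schedule:", slots, "mean wait:", ideal_mean_wait(probs, slots))

    For this toy distribution the rounded 10-slot schedule comes within roughly one percent of the analytic optimum, which is the flavor of the small-schedule result the abstract describes.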
  • 78
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-04-03
    Description: Numerous works have addressed efficient parallel $GF(2^m)$ multiplication based on polynomial basis or some of its variants. For those field degrees where neither irreducible trinomials nor Equally Spaced Polynomials (ESPs) exist, the best area/time performance has been achieved for special-type irreducible pentanomials, which however do not exist for all degrees. In other words, no multiplier architecture proposed so far achieves the best performance while being general enough to support any field degree. In this paper, we propose a new representation, based on what we call Generalized Polynomial Bases (GPBs), covering polynomial bases and the so-called Shifted Polynomial Bases (SPBs) as special cases. In order to study the new representation, we introduce a novel formulation for polynomial basis and its variants, which concisely expresses all implementation aspects of interest, i.e., gate count, subexpression sharing, and time delay. The methodology enabled by the new formulation is completely general and repetitive in its application, allowing the development of an ad-hoc software tool that derives proofs for area complexity and time delay automatically. As the central contribution of this paper, we introduce some new types of irreducible pentanomials and an associated GPB. Based on the above formulation, we prove that carefully chosen GPBs yield multiplier architectures matching, or even outperforming, the best special-type pentanomials from both the area and the time point of view. Most importantly, the proposed GPB architectures require pentanomials that exist for all degrees of practical interest. A list of suitable irreducible pentanomials for all degrees less than 1,000 is given in the appendix (Fig. 5 and Tables 4-11 are provided in a separate file containing the body of the Appendix, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TC.2012.63).
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 79
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-04-03
    Description: Low-complexity solutions that provide deterministic quality over packet-switched networks while achieving high resource utilization have been an open research issue for many years. Service differentiation combined with resource overprovisioning has been considered an acceptable compromise and widely deployed, given that the amount of traffic requiring quality guarantees has been limited. This approach is no longer viable, though, as new bandwidth-hungry applications, such as video on demand, telepresence, and virtual reality, populate networks and invalidate the rationale that made it acceptable so far. Time-driven priority represents a potentially interesting solution. However, the fact that network operation is based on a time reference shared by all nodes raises concerns about node complexity, from the point of view of both hardware and software architecture. This work analyzes the implications that the timing requirements of time-driven priority have on network nodes and shows how proper operation can be ensured even when system components introduce timing uncertainties. Experimental results on a time-driven priority router implemented on a personal computer both validate the analysis and demonstrate the feasibility of the technology even on an architecture that is not designed to operate under timing constraints.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 80
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-04-03
    Description: In recent years, we have experienced a wave of DDoS attacks threatening the welfare of the Internet. These are launched by malicious users whose only incentive is to degrade the performance of other, innocent, users. Traditional systems turn out to be quite vulnerable to these attacks. The objective of this work is to take a first step toward closing this fundamental gap, laying a foundation that can be used in future computer and network designs that take malicious users into account. Our approach is based on proposing a metric that evaluates the vulnerability of a system. We then use this vulnerability metric to evaluate a data structure commonly used in network mechanisms: the hash table. We show that Closed Hash is much more vulnerable to DDoS attacks than Open Hash, even though the two systems are considered equivalent by traditional performance evaluation. We also apply the metric to queuing mechanisms common to computer and communications systems. Furthermore, we apply it to the practical case of a hash table whose requests are controlled by a queue, showing that even after the attack has ended, the regular users still suffer from performance degradation or even a total denial of service. A small illustrative comparison of the two hashing schemes follows this record.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
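    An illustrative comparison only, not the paper's vulnerability metric: an attacker inserts many keys that all hash to the same bucket, and we count the probes an innocent lookup then needs under separate chaining ("open hash") versus linear probing ("closed hash"). Table size, key counts, and slot numbers are arbitrary.

      M = 1024                      # table size

      def chaining_lookup_cost(attack_keys, victim_slot):
          # With separate chaining only the attacked bucket grows; a lookup whose key
          # hashes elsewhere scans a bucket of length 0 or 1.
          table = [[] for _ in range(M)]
          for key in attack_keys:
              table[0].append(key)            # all attacker keys hash to bucket 0
          return 1 + len(table[victim_slot])  # probes to find an absent innocent key

      def probing_lookup_cost(attack_keys, victim_slot):
          # With linear probing the attacker keys form one long cluster starting at slot 0,
          # so any lookup whose home slot lies inside the cluster walks to the cluster's end.
          table = [None] * M
          for key in attack_keys:
              i = 0
              while table[i] is not None:
                  i = (i + 1) % M
              table[i] = key
          probes, i = 1, victim_slot
          while table[i] is not None:
              probes += 1
              i = (i + 1) % M
          return probes

      attackers = list(range(200))            # 200 colliding insertions
      print("chaining :", chaining_lookup_cost(attackers, victim_slot=50))
      print("probing  :", probing_lookup_cost(attackers, victim_slot=50))

    With chaining the damage stays confined to the attacked bucket, while with probing the attacker's cluster also slows every lookup whose home slot falls inside it; this asymmetry is the flavor of the Open Hash versus Closed Hash comparison in the abstract.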
  • 81
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-04-03
    Description: In Network Intrusion Detection Systems (NIDSs), string pattern matching demands exceptionally high performance to match the content of network traffic against a predefined database (or dictionary) of malicious patterns. Much work has been done in this field; however, most of the prior work results in low memory efficiency (defined as the ratio of the required storage in bytes to the size of the dictionary in number of characters). Due to such inefficiency, state-of-the-art designs cannot support large dictionaries without using high-latency external DRAM. We propose an algorithm called "leaf-attaching" to preprocess a given dictionary without increasing the number of patterns. The resulting set of postprocessed patterns can be searched using any tree-search data structure. We also present a scalable, high-throughput, Memory-efficient Architecture for large-scale String Matching (MASM) based on a pipelined binary search tree. The proposed algorithm and architecture achieve a memory efficiency of 0.56 (for the Roget's dictionary) and 1.32 (for the Snort dictionary). As a result, our design scales well to support larger dictionaries. Implementations on 45 nm ASIC and a state-of-the-art FPGA device (for the latest Roget's and Snort dictionaries) show that our architecture achieves 24 and 3.2 Gbps, respectively. The MASM module can simply be duplicated to accept multiple characters per cycle, leading to throughput that scales with the number of characters processed in each cycle. Dictionary update involves simply rewriting the content of the memory, which can be done quickly without reconfiguring the chip.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 82
    Publication Date: 2013-04-03
    Description: In this paper, an analytic model is proposed for the performance evaluation of vehicular safety-related services in the dedicated short range communications (DSRC) system on highways. The generation and service of safety messages in each vehicle are modeled by a generalized M/G/1 queue. The overall model is a set of interacting M/G/1 queues, one per vehicle, which interact because they share a single server: the contention-based medium. To make the model scalable, we use a semi-Markov process (SMP) model to capture the shared server's behavior from one tagged vehicle's perspective, accounting for this vehicle's medium contention and backoff behavior as well as the influence of other vehicles. This SMP interacts with the tagged vehicle's own M/G/1 queue through fixed-point iteration, and proofs of the existence, uniqueness, and convergence of the fixed point are provided. Based on the fixed-point solution, performance indices including mean transmission delay, packet delivery ratio (PDR), and packet reception ratio (PRR) are derived. Analytic-numeric results are verified through extensive simulations under various network parameters. Compared with existing models, the proposed SMP model facilitates a more precise analysis of the impact of the hidden terminal problem on the PDR and PRR. A generic illustration of the fixed-point coupling follows this record.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
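    A generic, heavily simplified illustration of the fixed-point coupling only; it is not the paper's SMP/M/G/1 model. The idea shown: per-vehicle utilization raises the shared-medium busy probability, the busy probability inflates the effective service time, and a damped iteration is run until the two agree. All parameter values are arbitrary.

      def solve_fixed_point(n_vehicles, arrival_rate, base_service, damping=0.5, tol=1e-9):
          rho = 0.1                                   # initial guess: per-vehicle utilization
          for _ in range(10_000):
              busy = 1.0 - (1.0 - rho) ** n_vehicles  # probability the shared medium is busy
              service = base_service / max(1e-9, 1.0 - busy)   # contention-inflated service time
              new_rho = min(0.999, arrival_rate * service)     # utilization of one vehicle's queue
              if abs(new_rho - rho) < tol:
                  break
              rho = (1 - damping) * rho + damping * new_rho    # damped update for stability
          return rho, busy, service

      rho, busy, service = solve_fixed_point(n_vehicles=20, arrival_rate=2.0, base_service=0.001)
      print(f"utilization={rho:.4f}  busy={busy:.4f}  mean service={service * 1e3:.3f} ms")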
  • 83
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-04-03
    Description: We present a comprehensive, self-contained, and mechanically verified proof of correctness of a maximally redundant SRT design for floating-point division and square root extraction, supported by verified procedures that 1) test the admissibility of a proposed digit selection table, 2) determine the minimal dimensions of an admissible table for a given arbitrary radix, and 3) generate these tables. For square root extraction, we also provide a verified procedure for generating an initial approximation that meets the accuracy requirement of the algorithm and ensures that the digit selection index derived from successive partial roots remains static throughout the computation. A radix-8 instantiation of these algorithms has been implemented in the floating-point unit of the AMD processor code-named Steamroller. To ensure their correctness, all of our results and procedures have been formalized and mechanically checked by the ACL2 prover. We present evidence of the value of this approach by comparing it to that of a more conventional published paper that reports similar results, which are shown to be fatally flawed.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 84
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-04-03
    Description: Multicore chips currently dominate the microprocessor market as designs that improve performance while keeping power consumption sustainable. However, complex core features must still be considered to provide good performance for existing sequential applications. An effective approach to reduce core complexity without dramatically sacrificing performance is to distribute critical processor structures by using clustered microarchitectures. In these designs, communication latency among clusters is a critical performance bottleneck, and a good steering algorithm is required to reduce intercluster communication. In this paper, we propose a new energy-efficient microarchitectural approach that reduces intercluster communication by detecting and generating independent chains of instructions, referred to as subtraces, from the execution of sequential programs. The devised mechanism has been modeled on an x86-based trace-cache processor, where subtraces are built in the fill unit, stored in a trace cache, and individually steered to different clusters. Experimental results show that the proposal reaches performance speedups of around 7 and 15 percent for point-to-point and bus-based interconnects, respectively, while achieving energy savings of up to 12 percent.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 85
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2013-04-03
    Description: Decimal multiplication is one of the most important decimal arithmetic operations, with growing demand in the areas of commercial, financial, and scientific computing. In this paper, we propose a parallel decimal multiplication algorithm with three components: partial product generation, partial product reduction, and a final digit-set conversion. First, a redundant number system is applied to recode not only the multiplier but also the multiples of the multiplicand in signed-digit (SD) numbers. Furthermore, we present a multioperand SD addition algorithm to reduce the partial product array. Finally, a digit-set conversion algorithm with a hybrid prefix network that decreases the number of logic gates on the critical path is discussed. An analysis of the timing delay and an HDL model synthesized under 90 nm technology show that, by considering the tradeoffs among the three components, the proposed $16 \times 16$-digit multiplier achieves about 11 percent less delay with 2 percent less area compared to the current fastest design.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 86
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: $(t,k)$-Diagnosis, which is a generalization of sequential diagnosis, requires that at least $k$ faulty processors be identified and repaired in each iteration when there are at most $t$ faulty processors, where $t \ge k$. Based on the assumption that each vertex is adjacent to at least one fault-free vertex, the conditional $(t,k)$-diagnosis of graphs was investigated using the comparison diagnosis model. Lower bounds on the conditional $(t,k)$-diagnosability of graphs were derived and applied to obtain the following results. 1) Symmetric $d$-dimensional grids are conditionally $(\frac{N}{2d+1}-1,\ 2d-1)$-diagnosable when $d \ge 2$ and $N$ (the number of vertices) $\ge 4^d$. 2) Symmetric $d$-dimensional tori are conditionally $(\frac{1}{5}(N + \min\{\frac{8}{5}N^{\frac{2}{3}}, \frac{2N-20}{15}\} - 2),\ 6)$-diagnosable when $d = 2$ and $N \ge 49$, and $(\frac{N}{2d+1}-1,\ 4d-2)$-diagnosable when $d \ge 3$ and $N \ge 4^d$.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 87
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: Deadline-guaranteed packet scheduling for switches is a fundamental issue for providing guaranteed QoS in digital networks, and it is a historically difficult NP-hard problem when three or more deadlines are involved. All existing algorithms have throughput too low for practical use. A key reason is that they use packet deadlines as default priorities to decide which packets to drop whenever conflicts occur. Although such a priority structure can ease scheduling by focusing on one deadline at a time, it hurts throughput greatly. Since deadlines do not necessarily represent the actual importance of packets, throughput can be greatly improved if deadline-induced priority is not enforced. This paper first presents an algorithm that guarantees the maximum throughput for the case where only two different deadlines are allowed. Then, an algorithm called iterative scheduling with no priority (ISNOP) is proposed for the general case where $k > 2$ different deadlines may occur. Not only does this algorithm have dramatically better average performance than all existing algorithms, but it also guarantees an approximation ratio of 2. ISNOP thus provides a good practical solution to the historically difficult packet scheduling problem.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 88
    Publication Date: 2015-05-13
    Description: As the complexity of digital systems increases, their verification and debugging have become a major technical and economic issue. Although many computer-aided design (CAD) solutions have been suggested to enhance the efficiency of existing debugging approaches, they still fail to provide a small set of potential error locations together with automatic correction mechanisms. At the same time, the ever-growing use of digital signal processing (DSP), computer graphics, and embedded systems applications, whose datapath designs can be modeled as polynomial computations, necessitates an effective method for their verification, debugging, and correction. In this paper, we introduce a formal debugging approach based on static slicing and dynamic ranking methods to derive a reduced, ordered set of potential error locations. In addition, to speed up finding true errors in the presence of multiple design errors, error candidates are sorted in decreasing order of their probability of being an error. A mutation-based technique is then employed to automatically correct bugs, even when multiple bugs are present. To evaluate the effectiveness of our approach, we have applied it to several industrial designs. The experimental results show that the proposed technique locates and corrects even multiple bugs with high confidence and in a short run time, even for complex designs of up to several thousand lines of RTL code.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 89
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: The published works on differential fault attacks (DFA) against the Grain family require quite a large number (hundreds) of faults and also several assumptions on the locations and timings of the injected faults. In this paper, we present a significantly improved scenario from the adversarial point of view for DFA against the Grain family of stream ciphers. Our model is the most realistic one so far, as it requires the cipher to be re-keyed only a few times and allows faults to be injected at any random location and at any random point in time, i.e., no precise control is needed over the location and timing of fault injections. We construct equations based on the algebraic description of the cipher by introducing new variables so that the degrees of the equations do not increase. In line with algebraic cryptanalysis, we accumulate such equations based on the fault-free and faulty key-stream bits and solve them using the SAT solver Cryptominisat-2.9.5 installed with SAGE 5.7. In a few minutes we can recover the state of Grain v1, Grain-128, and Grain-128a with as few as 10, 4, and 10 faults, respectively.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 90
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: Several papers have studied fault attacks on computing a pairing value $e(P,Q)$, where $P$ is a public point and $Q$ is a secret point. In this paper, we observe that these attacks are in fact effective only on a small number of pairing-based protocols, and even then only when the protocols are implemented with specific symmetric pairings. We demonstrate the effectiveness of the fault attacks on a public-key encryption scheme, an identity-based encryption scheme, and an oblivious transfer protocol when implemented with a symmetric pairing derived from a supersingular elliptic curve with embedding degree 2.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 91
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: The key benefits of using the smartphone accelerometer for human mobility analysis, with or without location determination based upon GPS, Wi-Fi, or GSM, are that it is energy-efficient, provides real-time contextual information, and has high availability. Using measurements from an accelerometer for human mobility analysis presents its own challenges, as we all carry our smartphones differently and the measurements are body-placement dependent. Such analysis also often relies on on-demand remote data exchange for processing, which is less energy-efficient, incurs higher network costs, and is not real-time. We present a novel accelerometer framework based upon a probabilistic algorithm that neutralizes the effect of different smartphone on-body placements and orientations, allowing human movements to be identified more accurately and energy-efficiently. Using solely the embedded smartphone accelerometer, without referencing historical data or filtering accelerometer noise, our method can identify the human mobility state in real time within a time constraint of 2 seconds. The method achieves an overall average classification accuracy of 92 percent when evaluated on a dataset gathered from fifteen individuals and covering nine different urban human mobility states. A minimal sketch of the orientation-neutral feature idea follows this record.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
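    A minimal sketch of the orientation-neutral idea only: the Euclidean magnitude of the three-axis signal does not change when the phone is rotated in a pocket or bag, so windowed statistics of the magnitude can drive a simple classifier. The thresholds and state labels below are hypothetical and are not the paper's classifier.

      from math import sqrt
      from statistics import mean, pstdev

      def magnitude_features(samples, window=100):
          # samples: list of (ax, ay, az) in m/s^2 at a fixed rate; yields one
          # (mean, std) feature pair per non-overlapping window.
          mags = [sqrt(x * x + y * y + z * z) for x, y, z in samples]
          for start in range(0, len(mags) - window + 1, window):
              w = mags[start:start + window]
              yield mean(w), pstdev(w)

      def naive_state(feat):
          # Hypothetical thresholds, for illustration only.
          _, std = feat
          if std < 0.5:
              return "still"
          if std < 3.0:
              return "walking"
          return "running/vehicle"

      stream = [(0.1, 9.7, 0.3)] * 250          # a fake stationary trace
      print([naive_state(f) for f in magnitude_features(stream)])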
  • 92
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: Nanoscale process variations in conventional SRAM cells are known to limit voltage scaling in microprocessor caches. Recently, a number of novel cache architectures have been proposed which substitute faulty words of one cache line with healthy words of others in order to tolerate these failures at low voltages. These schemes rely on fault maps to identify faulty words, inevitably increasing the chip area, and the relationship between word size and cache failure rate is not well studied in these works. In this paper, we analyze word substitution schemes by employing a Fault Tree Model and a Collision Graph Model. A novel cache architecture (Macho) is then proposed based on this model. Macho is dynamically reconfigurable and is locally optimized (tailored to local fault density) using two algorithms: 1) a graph coloring algorithm for moderate fault densities and 2) a bipartite matching algorithm to support high fault densities. An adaptive matching algorithm enables on-demand reconfiguration of Macho to concentrate available resources on cache working sets. As a result, voltage scaling down to 400 mV is possible, tolerating bit failure rates reaching 1 percent (one failure in every 100 cells). This near-threshold voltage (NTV) operation achieves a 44 percent energy reduction in our simulated system (CPU + DRAM models) with a 1 MB L2 cache. A small sketch of the word-matching idea follows this record.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
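    A small sketch of the bipartite-matching idea only, not Macho's actual algorithm or data layout: faulty word positions of victim cache lines are matched to healthy words offered by donor lines, one donor word per faulty word, using augmenting paths. Line names and word positions are illustrative.

      def match_faulty_words(faulty, donors):
          # faulty: list of (victim_line, word_pos) needing a healthy donor word.
          # donors: {donor_line: set of healthy word positions it can lend}.
          # Returns {(victim_line, word_pos): (donor_line, word_pos)} or None if impossible.
          options = {f: [(d, f[1]) for d, healthy in donors.items() if f[1] in healthy]
                     for f in faulty}
          match = {}                                    # donor word -> faulty entry it covers

          def try_assign(f, seen):
              for word in options[f]:
                  if word in seen:
                      continue
                  seen.add(word)
                  # Take a free donor word, or re-route the current holder along another path.
                  if word not in match or try_assign(match[word], seen):
                      match[word] = f
                      return True
              return False

          for f in faulty:
              if not try_assign(f, set()):
                  return None                           # some faulty word cannot be covered
          return {f: w for w, f in match.items()}

      # Two victim lines both lost word 1; donors A and B each have word 1 healthy.
      faulty = [("victim1", 1), ("victim2", 1)]
      donors = {"lineA": {0, 1}, "lineB": {1, 3}}
      print(match_faulty_words(faulty, donors))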
  • 93
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 94
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: We present a custom architecture for realizing the Gentry-Halevi fully homomorphic encryption (FHE) scheme. This contribution presents the first full realization of FHE in hardware. The architecture features an optimized multi-million-bit multiplier based on the Schönhage-Strassen multiplication algorithm. Moreover, a number of optimizations, including spectral techniques as well as a precomputation strategy, are used to significantly improve the performance of the overall design. When synthesized using 90 nm technology, the presented architecture realizes the encryption, decryption, and recryption operations in 18.1 msec, 16.1 msec, and 3.1 sec, respectively, and occupies a footprint of fewer than 30 million gates.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 95
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: Providing deadline-sensitive services is a challenge in data centers. Because of the conservativeness of additive-increase congestion avoidance, current transmission control protocols are inefficient at utilizing the very high bandwidth of data centers. This may cause many deadline-sensitive flows to miss their deadlines before reaching their available bandwidth. We propose an Adaptive-Acceleration Data Center TCP, A$^2$DTCP, which takes into account both network congestion and the latency requirements of application services. By using congestion avoidance with an adaptive increase rate that varies between additive and multiplicative, A$^2$DTCP accelerates bandwidth detection, thus achieving high bandwidth utilization efficiency. At-scale simulations and real testbed implementations show that A$^2$DTCP significantly reduces the missed-deadline ratio compared to D$^2$TCP and DCTCP. In addition, A$^2$DTCP can co-exist with conventional TCP without requiring more changes in switch hardware than D$^2$TCP and DCTCP. A hedged sketch of such an adaptive window update follows this record.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
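    A hedged sketch only; the abstract does not give A$^2$DTCP's actual update rule. The idea illustrated is a window increase step that slides between additive and multiplicative as deadline urgency rises and the observed congestion level (for instance, an ECN-marked fraction as in DCTCP) falls. The function name, the blending formula, and the thresholds are assumptions made for illustration.

      def next_cwnd(cwnd, congestion, urgency):
          # cwnd in segments; congestion and urgency are both in [0, 1].
          if congestion > 0.9:                      # heavy congestion: back off
              return max(1.0, cwnd * (1 - congestion / 2))
          blend = urgency * (1 - congestion)        # 0 -> purely additive, 1 -> fully multiplicative
          additive = 1.0                            # the classic +1 segment per RTT
          multiplicative = cwnd                     # a doubling step, i.e., slow-start-like growth
          return cwnd + (1 - blend) * additive + blend * multiplicative

      w = 10.0
      for congestion, urgency in [(0.0, 0.1), (0.0, 0.9), (0.5, 0.9), (0.95, 0.9)]:
          w_next = next_cwnd(w, congestion, urgency)
          print(f"congestion={congestion:.2f} urgency={urgency:.2f} -> cwnd {w:.1f} -> {w_next:.1f}")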
  • 96
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: The design of cache memories is a crucial part of the design cycle of a modern processor, since they bridge the performance gap between the processor and the memory. Unfortunately, caches with low degrees of associativity suffer a large number of conflict misses. Although a significant fraction of these misses can be removed by increasing associativity, this comes at a high cost in power, area, and access time. In this work, we address the high number of conflict misses in low-associativity caches by proposing an indexing policy that adaptively selects which bits of the block address are used to index the cache. The basic premise of this work is that the non-uniformity in set usage is caused by a poor selection of the indexing bits. By instead selecting at run time those bits that disperse the working set more evenly across the available sets, a large fraction of the conflict misses (85 percent, on average) can be removed. This leads to IPC improvements of 10.9 percent for the SPEC CPU2006 benchmark suite. With fewer accesses to the L2 cache, our proposal also reduces the energy consumption of the cache hierarchy by 13.2 percent. These benefits come with a negligible area overhead. An illustrative bit-selection sketch follows this record.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
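    An illustrative sketch, not the paper's selection heuristic: greedily pick the block-address bits whose values split a sample of addresses most evenly, i.e., maximize the entropy of the induced set index, instead of always taking the low-order bits. The strided address trace is synthetic.

      from math import log2
      from collections import Counter

      def entropy(counter, total):
          return -sum((c / total) * log2(c / total) for c in counter.values() if c)

      def pick_index_bits(addresses, n_bits, candidate_bits=range(32)):
          chosen = []
          for _ in range(n_bits):
              best_bit, best_h = None, -1.0
              for bit in candidate_bits:
                  if bit in chosen:
                      continue
                  trial = chosen + [bit]
                  # Distribution of the set index induced by this trial bit selection.
                  sets = Counter(tuple((a >> b) & 1 for b in trial) for a in addresses)
                  h = entropy(sets, len(addresses))
                  if h > best_h:
                      best_bit, best_h = bit, h
              chosen.append(best_bit)
          return chosen

      # A strided working set: the low bits are constant, so conventional low-bit
      # indexing would pile everything into very few sets.
      addrs = [base * 4096 + 0x40 for base in range(256)]
      print(pick_index_bits(addrs, n_bits=4))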
  • 97
    Publication Date: 2015-05-13
    Description: This paper concentrates on high-level data-flow optimization and synthesis techniques for datapath-intensive designs, such as those in Digital Signal Processing (DSP), computer graphics, and embedded systems applications, which are modeled as polynomial computations from $Z_{2^{n_1}} \times Z_{2^{n_2}} \times \cdots \times Z_{2^{n_d}}$ to $Z_{2^m}$. Our main contribution is an optimization method based on functional decomposition of multivariate polynomials in the form $f(x) = (g \circ h)(x) + f_0 = g(h(x)) + f_0$ to obtain good building blocks, and on vanishing polynomials over $Z_{2^m}$ to add or remove redundancy in the given polynomial functions so as to extract further common subexpressions. Experimental results for combinational implementations of the designs show an average saving of 38.85 and 18.85 percent in the number of gates and critical path delay, respectively, compared with state-of-the-art techniques. Compared with our previous work, area and delay are improved by 10.87 and 11.22 percent, respectively. Furthermore, experimental results for sequential implementations show an average saving of 39.26 and 34.70 percent in area and latency, respectively, compared with state-of-the-art techniques. A small worked instance of the decomposition follows this record.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
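    A small worked instance (not taken from the paper) of the decomposition form and of a vanishing polynomial over $Z_{2^m}$:

      \begin{align*}
      f(x) &= x^4 + 2x^3 + x^2 + 5, \qquad h(x) = x^2 + x, \quad g(y) = y^2, \quad f_0 = 5,\\
      g(h(x)) + f_0 &= (x^2 + x)^2 + 5 = x^4 + 2x^3 + x^2 + 5 = f(x),
      \end{align*}

    so a squarer fed by one small adder block can replace a general degree-4 datapath. Over $Z_{2^m}$, a polynomial such as $2^{m-1}x(x+1)$ evaluates to zero for every integer $x$ (because $x(x+1)$ is always even), so adding or removing such vanishing terms changes the written form of $f$ without changing its function, which is how further common subexpressions can be exposed.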
  • 98
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: Abnormalities in sensed data streams indicate the spread of malicious attacks, hardware failure, and software corruption among the nodes of a wireless sensor network. These sources of node infection can affect generated and incoming data streams, resulting in a high chance of inaccurate data, misleading packet translation, wrong decision making, and severe communication disruption. This problem is detrimental to real-time applications with stringent quality-of-service (QoS) requirements, and sensed data from uninfected regions might also get stuck in an infected region if no alternative arrangements are made in advance. Although several existing methods (BOUNDHOLE and GAR) can be used to mitigate these issues, their performance is bounded by some limitations, mainly the high risk of falling into routing loops and the involvement in unnecessary transmissions. This paper provides a solution that dynamically bypasses infected nodes using a twin rolling balls technique and also diverts packets that are trapped inside the identified area. Infected nodes are identified by adapting a fuzzy data clustering approach that classifies nodes based on the fraction of anomalous data detected in their individual data streams. This information is then used in the proposed by-passed routing (BPR), which rotates two balls simultaneously in opposite directions, clockwise and counter-clockwise; the first uninfected node hit by either ball is selected as the next hop. We are also concerned with incoming packets, or packets in flight, that may be affected when the problem occurs. Besides solving both problems of the existing methods, the proposed BPR technique greatly improves the studied QoS parameters, showing an almost 40 percent increase in overall performance.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 99
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: This paper presents a compositional framework to address the state explosion problem in model checking of concurrent systems. This framework takes as input a system model described as a network of communicating components in a high-level description language, finds the local state transition models for each individual component where local properties can be verified, and then iteratively reduces and composes the component state transition models to form a reduced global model for the entire system where global safety properties can be verified. The state space reductions used in this framework result in a reduced model that contains the exact same set of observably equivalent executions as in the original model, therefore, no false counter-examples result from the verification of the reduced model. This approach allows designs that cannot be handled monolithically or with partial-order reduction to be verified without difficulty. The experimental results show significant scale-up of this compositional verification framework on a number of non-trivial concurrent system models.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 100
    facet.materialart.
    Unknown
    Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2015-05-13
    Description: While NUMA systems are widely used as target machines for virtualization, each data access request produced by a virtual machine (VM) on a NUMA system may have a different access time depending not only on whether the access is remote, but also on shared-resource contention. Mainly due to this, each VM running on the NUMA system experiences irregular data access performance over time. Because existing hypervisors, such as KVM, VMware, and Xen, have yet to consider this, users of VMs can neither predict their data access performance nor even recognize the data access performance they have experienced. In this paper, we propose a novel VM placement technique to resolve this irregular data access performance of VMs running on NUMA systems. With our technique, the hypervisor provides the illusion of a private memory subsystem to each VM, guaranteeing the data access latency required by each VM on average. To enable this feature, we periodically evaluate the average data access latency of each VM using hardware performance monitoring units. After every evaluation, our Mcredit-based VM migration algorithm tries to migrate the VCPU or memory of any VM not meeting its required data access latency to another node that gives the VM lower data access latency. We implemented a prototype for the KVM hypervisor on Linux 3.10.10. Experimental results show that, in a four-node NUMA system, our technique keeps the required data access performance levels of VMs running various workloads while consuming less than 1 percent of the cycles of one core and 0.3 percent of the system memory bandwidth. A sketch of such a migration decision loop follows this record.
    Print ISSN: 0018-9340
    Electronic ISSN: 1557-9956
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
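    A hedged sketch of the decision loop only; the Mcredit accounting and the actual KVM integration are not described in enough detail in this record to reproduce. The idea shown: periodically compare each VM's measured average access latency (as read from hardware performance counters) against its target and plan a migration of the worst offender to the node currently offering it the lowest latency. All names and numbers are illustrative.

      def rebalance(vms, measured_latency, target_latency, node_latency, max_migrations=1):
          # measured_latency[vm] and target_latency[vm] in ns; node_latency[vm][node] is the
          # latency the VM would see on each node. Returns [(vm, destination_node)].
          violators = [(measured_latency[vm] - target_latency[vm], vm)
                       for vm in vms if measured_latency[vm] > target_latency[vm]]
          plan = []
          for _, vm in sorted(violators, reverse=True)[:max_migrations]:
              dest = min(node_latency[vm], key=node_latency[vm].get)   # node with lowest latency
              plan.append((vm, dest))
          return plan

      vms = ["vm0", "vm1"]
      measured = {"vm0": 180.0, "vm1": 95.0}
      target = {"vm0": 120.0, "vm1": 120.0}
      per_node = {"vm0": {"n0": 170.0, "n1": 110.0}, "vm1": {"n0": 90.0, "n1": 140.0}}
      print(rebalance(vms, measured, target, per_node))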